
HANDBOOK OF

ARTIFICIAL INTELLIGENCE IN
BIOMEDICAL ENGINEERING
Biomedical Engineering: Techniques and Applications


Edited by
Saravanan Krishnan, PhD
Ramesh Kesavan, PhD
B. Surendiran, PhD
G. S. Mahalakshmi, PhD
First edition published 2021
Apple Academic Press Inc.
1265 Goldenrod Circle, NE, Palm Bay, FL 32905, USA
4164 Lakeshore Road, Burlington, ON, L7L 1A4, Canada

CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742, USA
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN, UK

© 2021 Apple Academic Press, Inc.


Apple Academic Press exclusively co-publishes with CRC Press, an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the authors, editors, and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors, editors, and publishers have attempted
to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so
we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any
form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright
Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC
please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification
and explanation without intent to infringe.
Library and Archives Canada Cataloguing in Publication
Title: Handbook of artificial intelligence in biomedical engineering / edited by Saravanan Krishnan, PhD,
Ramesh Kesavan, PhD, B. Surendiran, PhD., G. S. Mahalakshmi, PhD.
Names: Saravanan, Krishnan, 1982- editor. | Kesavan, Ramesh, editor. | Surendiran, B., editor. | Mahalakshmi, G. S., editor.
Series: Biomedical engineering series (Boca Raton, Fla.)
Description: Series statement: Biomedical engineering: techniques and applications | Includes bibliographical references
and index.
Identifiers: Canadiana (print) 20200316303 | Canadiana (ebook) 20200316737 | ISBN 9781771889209 (hardcover) |
ISBN 9781003045564 (ebook)
Subjects: LCSH: Artificial intelligence—Medical applications. | LCSH: Biomedical engineering.
Classification: LCC R859.7.A78 H36 2021 | DDC 610.28563—dc23
Library of Congress Cataloging-in-Publication Data
Names: Saravanan, Krishnan, 1982- editor. | Kesavan, Ramesh, editor. | Surendiran, B., editor. | Mahalakshmi, G. S., editor.
Title: Handbook of artificial intelligence in biomedical engineering / edited by Saravanan Krishnan, Ramesh Kesavan, B.
Surendiran, G. S. Mahalakshmi.
Other titles: Biomedical engineering (Apple Academic Press)
Description: Palm Bay, FL : Apple Academic Press, [2021] | Series: Biomedical engineering: techniques and applications |
Includes bibliographical references and index. | Summary: “Handbook of Artificial Intelligence in Biomedical
Engineering focuses on recent AI technologies and applications that provide some very promising solutions and enhanced
technology in the biomedical field. Recent advancements in computational techniques, such as machine learning, Internet
of Things (IoT), and big data, accelerate the deployment of biomedical devices in various healthcare applications. This
volume explores how artificial intelligence (AI) can be applied to these expert systems by mimicking the human expert’s
knowledge in order to predict and monitor the health status in real time. The accuracy of the AI systems is drastically
increasing by using machine learning, digitized medical data acquisition, wireless medical data communication, and
computing infrastructure AI approaches, helping to solve complex issues in the biomedical industry and playing a
vital role in future healthcare applications. The volume takes a multidisciplinary perspective of employing these new
applications in biomedical engineering, exploring the combination of engineering principles with biological knowledge
that contributes to the development of revolutionary and life-saving concepts. Topics include: Security and privacy issues
in biomedical AI systems and potential solutions Healthcare applications using biomedical AI systems Machine learning
in biomedical engineering Live patient monitoring systems Semantic annotation of healthcare data This book presents a
broad exploration of biomedical systems using artificial intelligence techniques with detailed coverage of the applications,
techniques, algorithms, platforms, and tools in biomedical AI systems. This book will benefit researchers, medical and
industry practitioners, academicians, and students”-- Provided by publisher.
Identifiers: LCCN 2020038313 (print) | LCCN 2020038314 (ebook) | ISBN 9781771889209 (hardcover) |
ISBN 9781003045564 (ebook)
Subjects: MESH: Artificial Intelligence | Biomedical Engineering--methods | Medical Informatics Applications
Classification: LCC R855.3 (print) | LCC R855.3 (ebook) | NLM W 26.55.A7 | DDC 610.285--dc23
LC record available at https://lccn.loc.gov/2020038313
LC ebook record available at https://lccn.loc.gov/2020038314
ISBN: 978-1-77188-920-9 (hbk)
ISBN: 978-1-00304-556-4 (ebk)
ABOUT THE BOOK SERIES:
BIOMEDICAL ENGINEERING:
TECHNIQUES AND APPLICATIONS

This new book series aims to cover important research issues and concepts in biomedical engineering, in alignment with the latest technologies and applications. The books in the series include chapters on recent research developments in the field of biomedical engineering. The series explores various real-time/offline medical applications that directly or indirectly rely on medical and information technology. Books in the series include case studies in fields of medical science, such as biomedical engineering and medical information security, along with the interdisciplinary and modern tools and technologies used.

Coverage & Approach

• In-depth information about biomedical engineering along with applications
• Technical approaches in solving real-time health problems
• Practical solutions through case studies in biomedical data
• Health and medical data collection, monitoring, and security

The editors welcome book chapters and book proposals on all topics in the
biomedical engineering and associated domains, including Big Data, IoT,
ML, and emerging trends and research opportunities.

Book Series Editors:


Raghvendra Kumar, PhD
Associate Professor, Computer Science & Engineering Department,
GIET University, India
Email: raghvendraagrawal7@gmail.com

Vijender Kumar Solanki, PhD
Associate Professor, Department of CSE, CMR Institute of Technology (Autonomous), Hyderabad, India
Email: spesinfo@yahoo.com

Noor Zaman, PhD
School of Computing and Information Technology, Taylor’s University, Selangor, Malaysia
Email: noorzaman650@hotmail.com

Brojo Kishore Mishra, PhD
Professor, Department of CSE, School of Engineering, GIET University, Gunupur, Odisha, India
Email: bkmishra@giet.edu

FORTHCOMING BOOKS IN THE SERIES

The Congruence of IoT in Biomedical Engineering: An Emerging Field of Research in the Arena of Modern Technology
Editors: Sushree Bibhuprada B. Priyadarshini, Rohit Sharma, Devendra Kumar Sharma, and Korhan Cengiz

Handbook of Artificial Intelligence in Biomedical Engineering
Editors: Saravanan Krishnan, Ramesh Kesavan, and B. Surendiran

Handbook of Deep Learning in Biomedical Engineering and Health Informatics
Editors: E. Golden Julie, S. M. Jai Sakthi, and Harold Y. Robinson

Biomedical Devices for Different Health Applications
Editors: Garima Srivastava and Manju Khari

Handbook of Research on Emerging Paradigms for Biomedical and Rehabilitation Engineering
Editors: Manuel Cardona and Cecilia García Cena

High-Performance Medical Image Processing
Editors: Sanjay Saxena and Sudip Paul
ABOUT THE EDITORS

Saravanan Krishnan, PhD, is Senior Assistant Professor in the Department of Computer Science & Engineering at Anna University, Regional Campus, Tirunelveli, Tamil Nadu, India. He has 14 years of experience in academia and the IT industry and has published papers in 14 international conferences and 24 international journals. He has also written six book chapters and has edited three books with international publishers. He has conducted four research projects and two consultancy projects with a total worth of Rs. 70 lakhs. He is an active researcher and academician, and he is a reviewer for many reputed journals. He also received an outstanding reviewer certificate from Elsevier, Inc. He is a Mentor of Change for the Atal Tinkering Lab of NITI Aayog and holds professional memberships with several organizations. He previously worked at Cognizant Technology Solutions Pvt. Ltd. as a software associate. He completed his ME (Software Engineering) in 2007 and earned his PhD in 2015.

Ramesh Kesavan, PhD, is Assistant Professor in the Department of Computer Applications, Anna University Regional Campus, Tirunelveli, India. His areas of research include cloud computing, big data analytics, data mining, and machine learning. He earned his PhD degree in Computer Science from Anna University, Chennai, India.

B. Surendiran, PhD, is Associate Dean (Academic) and Assistant Professor in the Department of Computer Science and Engineering at the National Institute of Technology, Puducherry, Karaikal, India. His research interests include medical imaging, machine learning, dimensionality reduction, and intrusion detection. He has published over 20 papers in international journals and has several conference publications to his credit. He is an active reviewer for various SCI and Scopus journals. He earned his PhD at the National Institute of Technology, Tiruchirappalli, India.

G. S. Mahalakshmi, PhD, is Associate Professor in the Department of Computer Science and Engineering at the College of Engineering, Guindy, Anna University, Chennai, India. She has vast research experience and has published 180 papers in reputed journals and international conferences. She is also Deputy Director of the Centre for Entrepreneurship Development, Anna University. She is an active reviewer for various SCI and Scopus journals. Her research interests include machine learning, artificial intelligence, text mining, and natural language processing.
CONTENTS

Contributors .......................................................... xiii
Abbreviations ......................................................... xvii
Preface ............................................................... xxiii

1. Design of Medical Expert Systems Using Machine Learning Techniques ............ 1
S. Anto, S. Siamala Devi, K. R. Jothi, and R. Lokeshkumar

2. From Design Issues to Validation: Machine Learning in Biomedical Engineering ............ 31
Christa I. L. Sharon and V. Suma

3. Biomedical Engineering and Informatics Using Artificial Intelligence ............ 51
K. Padmavathi and A. S. Saranya

4. Hybrid Genetic Algorithms for Biomedical Applications ............ 73
Srividya P. and Rajendran Sindhu

5. Healthcare Applications of the Biomedical AI System ............ 99
S. Shyni Carmel Mary and S. Sasikala

6. Applications of Artificial Intelligence in Biomedical Engineering ............ 125
Puja Sahay Prasad, Vinit Kumar Gunjan, Rashmi Pathak, and Saurabh Mukherjee

7. Biomedical Imaging Techniques Using AI Systems ............ 147
A. Aafreen Nawresh and S. Sasikala

8. Analysis of Heart Disease Prediction Using Machine Learning Techniques ............ 173
N. Hema Priya, N. Gopikarani, and S. Shymala Gowri

9. A Review on Patient Monitoring and Diagnosis Assistance by Artificial Intelligence Tools ............ 195
Sindhu Rajendran, Meghamadhuri Vakil, Rhutu Kallur, Vidhya Shree, Praveen Kumar Gupta, and Lingaiya Hiremat

10. Semantic Annotation of Healthcare Data ............ 217
M. Manonmani and Sarojini Balakrishanan

11. Drug Side Effect Frequency Mining over a Large Twitter Dataset Using Apache Spark ............ 233
Dennis Hsu, Melody Moh, Teng-Sheng Moh, and Diane Moh

12. Deep Learning in Brain Segmentation ............ 261
Hao-Yu Yang

13. Security and Privacy Issues in Biomedical AI Systems and Potential Solutions ............ 289
G. Niranjana and Deya Chatterjee

14. LiMoS—Live Patient Monitoring System ............ 311
T. Ananth Kumar, S. Arunmozhi Selvi, R. S. Rajesh, P. Sivananaintha Perumal, and J. Stalin

15. Real-Time Detection of Facial Expressions Using k-NN, SVM, Ensemble Classifier, and Convolution Neural Networks ............ 331
A. Sharmila, B. Bhavya, K. V. N. Kavitha, and P. Mahalakshmi

16. Analysis and Interpretation of Uterine Contraction Signals Using Artificial Intelligence ............ 353
P. Mahalakshmi and S. Suja Priyadharsini

17. Enhanced Classification Performance of Cardiotocogram Data for Fetal State Anticipation Using Evolutionary Feature Reduction Techniques ............ 371
Subha Velappan, Manivanna Boopathi Arumugam, and Zafer Comert

18. Deployment of Supervised Machine Learning and Deep Learning Algorithms in Biomedical Text Classification ............ 401
G. Kumaravelan and Bichitrananda Behera

19. Energy Efficient Optimum Cluster Head Estimation for Body Area Networks ............ 423
P. Sundareswaran and R. S. Rajesh

20. Segmentation and Classification of Tumour Regions from Brain Magnetic Resonance Images by Neural Network-Based Technique ............ 447
J. V. Bibal Benifa and G. Venifa Mini

21. A Hypothetical Study in Biomedical Based Artificial Intelligence Systems Using Machine Language (ML) Rudiments ............ 469
D. Renuka Devi and S. Sasikala

22. Neural Source Connectivity Estimation Using Particle Filter and Granger Causality Methods ............ 493
Santhosh Kumar Veeramalla and T. V. K. Hanumantha Rao

23. Exploration of Lymph Node-Negative Breast Cancers by Support Vector Machines, Naïve Bayes, and Decision Trees: A Comparative Study ............ 509
J. Satya Eswari and Pradeep Singh

Index ............ 525


CONTRIBUTORS

S. Anto
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

Manivanna Boopathi Arumugam
Instrumentation & Chemicals Division, Bahrain Training Institute, Kingdom of Bahrain

Sarojini Balakrishanan
Department of Computer Science, Avinashilingam Institute for Home Science and
Higher Education for Women, Coimbatore 641043, India

B. Bhavya
Deloitte Consulting India Private Limited, Bengaluru, Karnataka

Bichitrananda Behera
Department of Computer Science, Pondicherry University, Karaikal, India

J. V. Bibal Benifa
Department of Computer Science and Engineering, Indian Institute of Information Technology,
Kottayam, India

Deya Chatterjee
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
Kattankulathur, Chennai 603203, India
Zafer Comert
Department of Software Engineering, Samsun University, Turkey

D. Renuka Devi
Department of Computer Science, IDE, University of Madras, Chennai 600005, Tamil Nadu, India

S. Siamala Devi
Department of Computer Science and Engineering, Sri Krishna College of Technology,
Coimbatore, India

J. Satya Eswari
Department of Biotechnology, National Institute of Technology Raipur, Raipur,
Chhattisgarh 492010, India

N. Gopikarani
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore,
Tamil Nadu

S. Shymala Gowri
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore,
Tamil Nadu

Vinit Kumar Gunjan
Department of Computer Science & Engineering, CMRIT, Hyderabad, India

Praveen Kumar Gupta
Department of Biotechnology, R. V. College of Engineering, Bangalore, India

Lingaiya Hiremat
Department of Biotechnology, R. V. College of Engineering, Bangalore, India

Dennis Hsu
Department of Computer Science, San Jose State University, San Jose, CA, USA

K. R. Jothi
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

Rhutu Kallur
Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India

K. V. N. Kavitha
School of Electronics Engineering, Vellore Institute of Technology, Vellore, India

T. Ananth Kumar
Department of Computer Science and Engineering, IFET college of Engineering, Tamil Nadu, India

G. Kumaravelan
Department of Computer Science, Pondicherry University, Karaikal, India

R. Lokeshkumar
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

P. Mahalakshmi
Department of Electronics and Communication Engineering, Anna University Regional Campus,
Tirunelveli, Tamil Nadu, India
M. Manonmani
Department of Computer Science, Avinashilingam Institute for Home Science and
Higher Education for Women, Coimbatore 641043, India
S. Shyni Carmel Mary
Department of Computer Science, IDE, University of Madras, Chepauk, Chennai 600 005,
Tamil Nadu, India

G. Venifa Mini
Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education,
Kumaracoil, India

Diane Moh
College of Pharmacy, Touro University, Vallejo, CA, USA

Melody Moh
Department of Computer Science, San Jose State University, San Jose, CA, USA

Teng-Sheng Moh
Department of Computer Science, San Jose State University, San Jose, CA, USA

Saurabh Mukherjee
Banasthali Vidyapith Banasthali, Rajasthan, India

Srilakshmi Mutyala
Stratalycs Technologies Pvt. Ltd., Bangalore, India

A. Aafreen Nawresh
Department of Computer Science, Institute of Distance Education, University of Madras, Chennai, India
E-mail: anawresh@gmail.com

G. Niranjana
Department of Computer Science and Engineering, SRM Institute of Science and Technology

K. Padmavathi
Department of Computer Science, PSG College of Arts and Science, Coimbatore 641014,
Tamil Nadu, India
Rashmi Pathak
Siddhant College of Engineering, Sudumbre, Pune, Maharashtra, India

P. Sivananaintha Perumal
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India

Puja Sahay Prasad
Department of Computer Science & Engineering, GCET, Hyderabad, India

N. Hema Priya
Department of Information Technology, PSG College of Technology, Coimbatore, Tamil Nadu

S. Suja Priyadharsini
Department of Electronics and Communication Engineering, Anna University Regional Campus,
Tirunelveli, Tamil Nadu, India
Sindhu Rajendran
Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India

R. S. Rajesh
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India

T. V. K. Hanumantha Rao
Department of Electronics and Communication Engineering, National Institute of Technology,
Warangal, Telangana 506004, India, E-mail: tvkhrao75@nitw.ac.in
A. S. Saranya
Department of Computer Science, PSG College of Arts and Science, Coimbatore 641014,
Tamil Nadu, India

S. Sasikala
Department of Computer Science, IDE, University of Madras, Chepauk, Chennai 600 005,
Tamil Nadu, India

S. Arunmozhi Selvi
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India

A. Sharmila
School of Electrical Engineering, Vellore Institute of Technology, Vellore, India

Christa I. L. Sharon
Department of Information Science and Engineering, Dayananda Sagar College of Engineering,
Bangalore, Karnataka, India

Vidhya Shree
Department of Electronics and Instrumentation, R. V. College of Engineering, Bangalore, India

Rajendran Sindhu
Department of Electronics and Communication, R.V. College of Engineering, Bangalore 560059, India

Pradeep Singh
Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur,
Chhattisgarh 492010, India

P. Srividya
Department of Electronics and Communication, R.V. College of Engineering, Bangalore 560059, India
J. Stalin
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India

V. Suma
Department of Information Science and Engineering, Dayananda Sagar College of Engineering,
Bangalore, Karnataka, India

P. Sundareswaran
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
Meghamadhuri Vakil
Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India

Santhosh Kumar Veeramalla
Department of Electronics and Communication Engineering, National Institute of Technology, Warangal, Telangana 506004, India

Subha Velappan
Department of Computer Science & Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
Hao-Yu Yang
CuraCloud Corporation, Seattle, WA, USA
ABBREVIATIONS

ABC artificial bee colony
ACHE adaptive energy efficient cluster head estimation
ACO ant colony optimization
ADEs adverse drug events
ADHD attention deficit hyperactivity disorder
AI artificial intelligence
ANNs artificial neural networks
API application program interface
apriori TID apriori algorithm for transaction database
ARM augmented reality microscope
AUC area under curve
BANs body area networks
BCOA binary cuckoo optimization algorithm
BIDMC Beth Israel Deaconess Medical Center
BP backpropagation
BPN backpropagation network
CAD computer-aided diagnosis
CADs coronary artery diseases
CART classification and regression tree
CBR case-based reasoning
CCA clear channel assessment
CDSS clinical decision support systems
CFCNs cascade of fully convolutional networks
CKD chronic kidney disease
CM classification measure
CNN convolutional neural network
CRFs conditional random fields
CT computed tomography
CTG cardiotocography
CV cross-validation
CVS cross-validation score
DAG directed acyclic graph
DBN deep belief network
DILI drug-induced liver injury
DIP digital image processing
DME diabetic macular edema
DNN deep neural network
DP data preprocessing
DPW distributional profile of a word
DPWC distributional profile of multiple word categories
DRE digital rectal examination
DT decision tree
DTC decision tree classifier
DTFs directed transfer functions
DWT discrete wavelet transform
EC ensemble classifier
ECG electrocardiography/electrocardiogram
EEG electro-encephalography
EHG electrohysterography
EHR electronic health record
ELM extreme learning machine
EMG electromyography
EOG electrooculogram
FCN fully convolutional network
FE feature extraction
FER facial expression recognition
FFNNs feedforward neural networks
FHR fetal heart rate
fMRI functional MRI
FN false negative
FP false positive
FP frequent pattern
FS feature selection
FS Fisher score
GA genetic algorithm
GAN generative adversarial network
GA-SA genetic algorithm-simulated annealing
GC Granger causality
GC grouping and choosing
GD Gaussian distribution
GE General Electric
GLCM gray level co-occurrence matrices
GNB Gaussian naïve Bayes
HBA heartbeat analysis
HBC human body communication
HEDIS Healthcare Effectiveness Data and Information Set
ID3 iterative dichotomiser-3
IG-OBFA opposition-based firefly algorithm combined with information gain
IMFs intrinsic mode functions
IR information retrieval
KDE kernel density estimation
KNN k-Nearest neighbor
LDA linear discriminant analysis
LGR logistic regression
LNMF local non-negative matrix factorization
LOS line of sight
LOSO leave-one-subject-out
LS least square
LSI latent semantic indexing
LSSVM least squares support vector machine
LSTM long short-term memory
LYNA lymph node assistant
MAC medium access control
MC-CNN multi-crop convolutional neural network
MEG magnetoencephalography
ML machine learning
MLP multilayer perceptron
MMK medical monitor kit
MRF Markov random field
MRI magnetic resonance imaging
MTRs medical transportation robots
MVAR multivariate autoregressive
NB naïve Bayes
NB narrow band
NEE neural edge enhancer
NFs network filters
NLP natural language processing
NMF non-negative matrix factorization
NN neural network
OBFA Opposition-Based Firefly Algorithm
OBL opposition-based learning
OWC optical wireless communication
OWL web ontology language
PA passive–aggressive
PCA principal component analysis
PDC partial directed coherence
PET positron emission tomography
PF particle filter
PID Pima Indians Diabetes
POS Part of Speech
PPDM privacy-preserving data mining
PPN perceptron
PSA prostate-specific antigen
PSG polysomnography
PWM position weight matrices
QSTR quantitative structure–toxicity relationship
RADBAS radial basis function
RBF radial basis function
RBM restricted Boltzmann machine
RBNN radial basis function neural network
RC Rocchio classifier
RDD resilient distributed dataset
RDF resource description framework
RElim recursive elimination
ReLU rectified linear unit
REM rapid eye movement
RF random forest
RFC random forest classifier
RFSs random finite sets
RNN recurrent neural network
ROI region of interest
SA simulated annealing
SGD stochastic gradient descent
SI swarm intelligence
SLFN single hidden layer feed forward network
SOM self-organizing map
SVM support vector machine
TEEN threshold sensitive energy efficient network
TF term frequency
TN true negative
TP true positive
TRIBAS triangular basis function
TS training set
UMLS Unified Medical Language System
UWB ultra-wide band
VAE variational autoencoder
VLC visible light communication
VSM vector space method
VSVM vicinal support vector machine
WAC weighted associative classifier
WBAN wireless body area network
WDBC Wisconsin Diagnostic Breast Cancer
WSNs wireless sensor networks
PREFACE

Biomedical engineering is a multidisciplinary field that applies engineering principles and materials to medicine and healthcare. The combination of engineering principles with biological knowledge has contributed to the development of revolutionary and life-saving concepts. Artificial intelligence (AI) is an area of computer science and engineering that provides intelligence to machines. AI in biomedical engineering uses machine-learning algorithms and software to analyze complicated medical data and perform automatic diagnosis.
With the recent rapid advancement in digitized medical data acquisition,
wireless medical data communication, and computing infrastructure, AI has
drastically changed medical practice. AI has wide applications in the field
of biomedical engineering, namely, health applications, wearable devices,
medical image processing, telemedicine, and surgical robots.
Biomedical engineering applications are associated with many domains,
such as Big Data, IoT, machine learning, and AI. Many technologies,
modern tools, and methodologies have been adopted in this field. Informa-
tion technology solutions empower biomedical engineering and healthcare.
AI contributes to many research advancements in medical applications.
This book focuses on recent AI technologies and applications that contribute to the biomedical field. This edited book explores both the applications and the corresponding research solutions. The book is organized into 23 chapters as follows:
Chapter 1 explores an expert system based on fuzzy logic and ant colony
optimization (ACO) for automatic diagnosis and clinical decision-making. It
also proposes an expert system based on hybrid genetic algorithm-simulated
annealing and support vector machine (GASA-SVM) for disease diagnosis.
Finally, it suggests a decision-support system based on the Fisher score, extreme learning machine, and simulated annealing.
Chapter 2 describes the data acquisition prospects in biomedical systems and the different approaches that can be followed in knowledge representation. Further, it describes the design issues and feature-selection techniques that can be adopted to achieve an optimal learning model. It also addresses the design and validation challenges faced when adopting machine learning techniques in the biomedical engineering specialization.
Chapter 3 describes the uses of AI and related techniques in biomedicine and healthcare. This chapter also explores the field of biomedical informatics within the branches of AI.

Chapter 4 gives an insight into the diagnosis of heart disease using a hybrid genetic algorithm. It describes the basis of the genetic algorithm and its limitations. A hybrid genetic algorithm is proposed by combining the genetic algorithm with classification techniques. It also briefly covers classical decision algorithms, including the CHAID, CART, ID3, C4.5, and C5.0 algorithms. Finally, it describes a novel hybrid genetic algorithm devised by combining it with image processing techniques.
Chapter 5 gives a comprehensive description of healthcare applications. Furthermore, a number of applications, such as breast mass lesion detection, pharmaceuticals, and rehabilitation robotics, are included. Finally, it provides a summary of the healthcare applications of biomedical AI systems.
Chapter 6 introduces the current status of AI in healthcare. Motivated by the widespread applications of AI techniques in the biomedical field, it discusses the current applications and their issues in detail. It also reviews real-time applications of AI.
Chapter 7 gives a clear insight into basic biomedical research, translational research, and clinical practice. It also gives a detailed survey of biomedical image capture techniques such as X-ray, computed tomography (CT), positron emission tomography (PET), magnetic resonance imaging (MRI), and ultrasound. It also illustrates AI-based biomedical imaging components developed by various companies. Finally, it concludes with the challenges in the field of biomedical imaging.
Chapter 8 deals with the prediction of heart disease. A comparative study is done with the KNN, decision tree, support vector machine, random forest, and multilayer perceptron algorithms for heart disease detection. It concludes that the random forest algorithm gives the best prediction ratio.
Chapter 9 is a review on an AI-based patient monitoring system. It
discusses the key biomedical challenges and their wide range of AI-enabled
medical applications. The chapter also discusses different segmentation
approaches and their advantages and disadvantages. It also gives a short insight into prognostic models and the usage of prognostic scores to quantify the severity or intensity of diseases.
Chapter 10 deals with the management of heterogeneous healthcare data. It describes data handling based on a semantic annotation process and methods for incorporating semantic models in an AI expert system for the prediction of
chronic diseases. The main objective envisaged in this chapter is to propose
a semantic annotation model for identifying patients suffering from chronic
kidney disease (CKD). This chapter stresses the need for achieving semantic
annotation by surpassing the implementation challenges by using ontology.

Chapter 11 describes a drug side effect analysis system based on reviews from social media. It details the techniques to extract information from the social media website Twitter using sentiment analysis and machine learning.
Then, it describes the procedure to process and handle large datasets using
distributed computing through Apache Spark. This is followed by an
experimental results analysis of the best and most efficient ways to correctly
extract the frequency of adverse drug events using the techniques previously
described. Afterward, a detailed pharmaceutical analysis is provided for the
results, with insight from a domain expert.
Chapter 12 focuses on deep learning applications in brain image segmentation. It starts with a brief introduction to brain imaging modalities and image segmentation, followed by the essential image processing procedures. Then it introduces the basic building blocks of CNNs and lays the foundation for modern CNN architectural designs. Next, it reviews the state-of-the-art deep learning models for brain segmentation and draws comparisons with traditional machine learning methods. Finally, this chapter
discusses the current state, challenges of clinical integration, and future
trends of deep learning in brain segmentation.
Chapter 13 deals with security and privacy issues in biomedical AI systems.
It outlines various ranges of security and privacy threats to biomedical AI
systems such as linkage attacks, inference attacks, adversarial examples,
and so on. Similarly, solutions to such problems have been discussed, such
as conventional techniques like auditing, etc., and newer advancements in
research like differential privacy and federated learning.
Chapter 14 proposes a real-time patient monitoring system. It describes
a visible-light-based wireless technology called Li-Fi for real-time moni-
toring. It proposes a Li-Fi Monitoring System framework (LiMoS) and also
gives a detailed note on the components of the system. It also describes the
procedure for experimental setup. Finally, it concludes with a result analysis
to justify the performance of the system.
Chapter 15 involves a comparative analysis of facial expression recognition
techniques using the classic machine learning algorithms—k-nearest neighbor
(KNN), support vector machine (SVM), ensemble classifiers, and the most
advanced deep learning technique using convolutional neural networks.
Chapter 16 offers insights into the development of an intelligent system
for the early diagnosis of premature birth by correctly identifying true/false
labor pains. Raw term-preterm electrohysterography (TPEHG) signals from
PhysioNet were analyzed in this work. The performance of classifiers such
as the SVM, ELM, KNN, ANN, radial basis function neural network, and
random forest is individually evaluated in terms of classifying EHG signals.

Chapter 17 briefly presents feature selection techniques to improve the performance of the SVM classifier. Cardiotocography (CTG) data is examined
to diagnose fetal hypoxia for fetal state anticipation. In this chapter, three
efficient feature selection techniques based on evolutionary methodologies
such as firefly algorithm (FA), opposition-based firefly algorithm (OBFA),
and opposition-based firefly algorithm melded with information gain
(IG-OBFA) are presented in detail.
Chapter 18 investigates the deployment of the state-of-the-art ML algo-
rithms like decision tree, k-nearest neighborhood, Rocchio, ridge, passive–
aggressive, multinomial naïve Bayes, Bernoulli naïve Bayes, support vector
machine, and artificial neural network classifiers such as perceptron, random
gradient descent, and backpropagation neural network in the automatic classification of biomedical text documents on benchmark datasets like BioCreative Corpus III (BC3), Farm Ads, and the TREC 2006 Genomics Track.
Chapter 19 gives a detailed note on body area networks (BAN) and the
properties of wearable devices. To extend the life of wearable sensors, it
proposes a novel protocol to find the best possible cluster heads for a single
round of operations in reactive body area networks.
In Chapter 20, a practical solution is proposed through a novel algorithm that separates the tumor and healthy parts of an MR image, based on image segmentation and self-organizing neural networks. The segmentation algorithm identifies the tumor regions, and boundary parameters are then gathered from the segmented images and fed into the neural network system.
Chapter 21 explores the fundamentals and application of machine
learning in biomedical domain. It interprets various aspects of AI and
machine learning (ML)-enabled technologies, prodigies, and their independent
applications. Finally, it reviews the research developments and challenges in
biomedical engineering.
Chapter 22 suggests a new strategy to identify brain sources with their corresponding locations and amplitudes based on a particle filter. Time-series modeling is used to detect movement and time dependence among the brain sources. Finally, Granger causality techniques have been
applied to assess directional causal flow across the sources. It provides a
framework to test the analytical pipeline on real EEG information.
—Saravanan Krishnan
Ramesh Kesawan
B. Surendiran
G. S. Mahalakshmi
CHAPTER 1

DESIGN OF MEDICAL EXPERT SYSTEMS USING MACHINE LEARNING TECHNIQUES

S. ANTO1*, S. SIAMALA DEVI2, K. R. JOTHI3, and R. LOKESHKUMAR4

1,3,4 School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

2 Department of Computer Science and Engineering, Sri Krishna College of Technology, Coimbatore, India

* Corresponding author. E-mail: anto.s@vit.ac.in

ABSTRACT

The number of qualified doctors in India is around 7 per 10,000 people, which leads to the various challenges faced by the medical field such as inadequacy of physicians, the slower rate of diagnosis, and unavailability of medical assistance to common people in time. In this scenario, as clinical decision demands the utmost accuracy of diagnosis, it is a tedious and challenging task for physicians. These issues can be addressed by an automated system that helps the clinicians in disease diagnosis. Attaining the maximum accuracy is an open challenge in this scenario. The scope for improving the accuracy of diagnosis has motivated many researchers to propose medical expert systems using various machine learning and optimization techniques. This leads to proposing four different medical expert system designs as follows.

As a first step, an expert system based on fuzzy logic and ant colony optimization (ACO) is used. This system uses the ACO algorithm to generate fuzzy classification rules from the training patterns. The stochastic behavior of the ACO algorithm encourages the ants to find more accurate rules. Second, an expert system based on a hybrid genetic algorithm–simulated annealing and support vector machine is proposed for the disease diagnosis. The hybrid GASA is used for selecting the most significant feature subset as well as to optimize the kernel parameters of the SVM classifier. As a next step, a decision support system based on Fisher score–extreme learning machine (ELM)–simulated annealing is proposed. The ELM-based learning machine uses a single-hidden layer feedforward neural network. Finally, an expert system based on least square–support vector machine–simulated annealing (LS-SVM-SA) is proposed. FS is used for the selection of the most significant features. To improve the performance of the system, LS-SVM with radial basis function is used for classification and the SA is used for the optimization of the kernel parameters of the LS-SVM.

1.1 INTRODUCTION

1.1.1 ARTIFICIAL INTELLIGENCE

Artificial intelligence (AI) is a replication of human intelligence by computer systems. It is an interdisciplinary field that embraces a number of sciences, professions, and specialized areas of technology. To be precise, AI will not replace people but will augment their capabilities. AI helps to devise an intelligent agent in which an agent observes its environment and takes actions to maximize the probability of success. AI encompasses an extensive range of areas in the decision-making process. One of the subfields of AI is automated diagnosis. It is related to the development of algorithms and techniques to confirm the behavior of a system. Thus, the developed algorithm should be capable enough to discover its cause, whenever something goes wrong.

1.1.2 EXPERT SYSTEMS

An expert system is "a computer program that represents and reasons with knowledge of some specialist subject with a view to solving problems or giving advice" (Jackson, 1999).

It consists of a knowledge source and a mechanism that solves problems and returns a response based on the information provided by the query. Direct input from domain experts and evidence from literature are the sources of knowledge to the expert systems. To solve expert-level problems, efficient access to a substantial domain knowledge base and a reasoning mechanism to apply the knowledge to the problems is mandatory.

Knowledge acquisition, the process of transforming human knowledge to machine-usable form, is considered a bottleneck (Feigenbaum, 1977) as it
demands more time and labor. Further, maintaining the knowledge base is also a challenging task (Coenen and Bench-Capon, 1992; Watson et al., 1992). Techniques such as case-based reasoning (CBR) (Watson and Marir, 1994) and machine learning (ML) methods based on data are used for inference as they avoid the knowledge acquisition problem. In CBR, the knowledge consists of preceding cases that include the problem, solution, and the outcome stored in the case library. To obtain a solution for a new case, it is needed to identify a case that resembles the problem in the case library and adopt the proposed solution from the retrieved case. Similar to CBR, ML-based expert systems avoid the bottleneck of knowledge acquisition as knowledge is directly obtained from data. Recommendations are generated by nonlinear forms of knowledge and easily updated by simply adding new cases.

1.1.2.1 MEDICAL EXPERT SYSTEM

A decision support system is computer software that attempts to act like a human being. During the past few years, medical expert systems for the diagnosis of different diseases have received more attention (Kourou et al., 2015; Fan et al., 2011). Knowledge discovery in patients' data and machine learning are used to design such expert systems. These decision support systems can play a major role in assisting the physicians while making complex clinical decisions, and thereby can improve the accuracy of diagnosis. Such systems have higher optimization potential and reduced financial costs. Pattern recognition and data mining are the techniques used in these expert systems that allow retrieval of meaningful information from large-scale medical data.

1.1.3 MACHINE LEARNING

ML, a subfield of computer science and statistics, is a scientific discipline that deals with the design and study of algorithms to learn from data and to make autonomous decisions. It has strong ties to data mining (Mannila and Heikki, 1996), AI, and optimization. It does not explicitly program computers to acquire knowledge but emphasizes the development of computer programs that grow and change by teaching themselves when exposed to new data. Further, it focuses more on exploratory data analysis (Friedman, 1998) and on the improvement of machine projects that develop and change when given new information. Knowledge representation and generalization are the core of ML.
To be precise, the algorithms do not involve programmed instructions but build a model based on inputs and make predictions or decisions on their own. ML comprises a set of methods that automatically detect patterns in data, use the apparent patterns to predict the data, and perform better for exact decision making. It is employed in applications like spam filtering, optical character recognition (Wernick et al., 2010), search engines, and computer vision, where designing explicit rule-based algorithms is not feasible.

1.1.4 DATASET FOR PERFORMANCE EVALUATION

The datasets that are considered for analyzing the performance of the proposed systems are given below. These datasets are taken from the UCI ML Repository.

• Breast Cancer Wisconsin (Original) Dataset
  Wisconsin Diagnostic Breast Cancer (WDBC; Original) is one of the standard datasets considered for breast cancer diagnosis, and it has 699 instances, out of which 458 are benign and 241 are malignant, with 11 attributes including the class attribute.

• PIMA Indians Diabetes (PID) Dataset
  The PID dataset for diabetes is used, in which the patients are females above 21 years of age having Pima Indian heritage. There are 768 instances with nine attributes including the "class" variable. The attributes are numeric-valued. A total of 500 instances belong to Class "0" and 268 instances belong to Class "1."

• Breast Cancer Wisconsin (Diagnostic) Dataset
  WDBC includes 569 instances, out of which 357 are benign and 212 are malignant, with 32 attributes including ID, diagnosis, and 30 real-valued input features.

• Hepatitis Dataset
  The Hepatitis domain consists of mostly Boolean or numeric-valued attribute types and has 155 instances with 20 attributes including the "class" attribute. The "BILIRUBIN" attribute is continuously valued. A total of 32 instances belong to the "DIE" class and 123 instances belong to the "LIVE" class.

• Cardiac Arrhythmia Dataset
  The Cardiac Arrhythmia dataset records the presence/absence of cardiac arrhythmia and classifies it
in one of the 16 groups. The database has 452 instances with 279 attributes, out of which 206 are linear valued and the rest are nominal.

1.1.5 PERFORMANCE METRICS

To evaluate the performance of the medical expert system model, the following performance metrics are used, and the results are given in Table 1.3.

(i) Confusion Matrix: The confusion matrix holds both the actual and predicted instances classified by the classifier system. A confusion matrix for a classification problem with two classes is of size 2×2 as shown in Table 1.1.

TABLE 1.1 Confusion Matrix

             Actual
Predicted    Positive               Negative
Positive     TP (true positive)     FP (false positive)
Negative     FN (false negative)    TN (true negative)

TP is the correct prediction of an instance as positive. TN is the correct prediction of an instance as negative. FP is the incorrect prediction of a negative instance as positive. FN is the incorrect prediction of a positive instance as negative.

(ii) Cross-Validation (CV): CV is a widely used statistical method to evaluate the classifier's performance by splitting a data set into two sets as training and testing. In CV, the training and the testing sets must cross over in successive rounds, and in this way, each record has a chance of being validated against.

(iii) Classification Accuracy: Classification accuracy is the most commonly used measure for determining the performance of classifiers. It is the ratio of the number of correct predictions made by a model over a data set, as shown in Equation (1.1)

    Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1.1)

(iv) Sensitivity and Specificity: Sensitivity is the true positive rate, and specificity is the true negative rate. They are defined in (1.2) and (1.3)

    Sensitivity = TP / (TP + FN)    (1.2)

    Specificity = TN / (FP + TN)    (1.3)
6 Handbook of Artificial Intelligence in Biomedical Engineering

1.2 MEDICAL EXPERT SYSTEM k. rcf


S= (1.4)
BASED ON FUZZY CLASSIFIER ( k + k ( k −1) rff )
WITH ANT COLONY
OPTIMIZATION (ACO) where,
k is the number of attributes in the
A medical decision support system set S;
based on fuzzy logic and ACO rcf models the correlation of the
is proposed for the diagnosis of attributes to the class label;
various disease datasets like breast rff is the inter-correlation
cancer (original), breast cancer between attributes.
(diagnostic), diabetes, heart disease It selects a constant value of the
(Niranjana and Anto, 2014), and 10 most significant features of the
hepatitis. The datasets are accessed given medical dataset.
from the UCI ML repository. A set
of fuzzy rules is extracted from the
patient’s dataset using fuzzy logic 1.2.2 FUZZY-ACO CLASSIFIER
and ACO (Anto and Chandramathi,
2015). ACO algorithm optimizes The proposed system uses the ACO
these extracted fuzzy rules and algorithm to generate fuzzy classifi-
generates an optimized set of rules. cation rules from training patterns of
The fuzzy inference system uses the dataset. The artificial ants make
these optimized rules to perform candidate fuzzy rules gradually in
the classification of the test data. A search space. The stochastic behavior
10-fold cross-validation procedure is of the ACO algorithm encourages
used to evaluate the performance of the ants to find more accurate rules.
the system in terms of the classifica- These optimized rules are used
tion accuracy. by the fuzzy inference engine to
perform decision making on testing
patters as shown in Figure 1.1.
1.2.1 FEATURE SELECTION

The selection of the most significant 1.2.3 MEMBERSHIP


features is done by using correlation- VALUE ASSIGNMENT BY
based feature selection. It assigns NORMALIZATION OF DATASET
higher scores to feature subsets that
are highly correlated to the class Normalization of dataset between
labels but uncorrelated to each other. 0.0 and 1.0 is done using the min–
The merit “S” of an attribute set max normalization method as shown
is given by Equation (1.4) in Equation (1.5)
FIGURE 1.1 Stages of the proposed system.

    Normalize(X) = (X − X_min) / (X_max − X_min)    (1.5)

where
μ(X) is the membership function;
X is the linguistic value;
X_min is the least linguistic value;
X_max is the maximum linguistic value.

FIGURE 1.2 Antecedent fuzzy sets.
The domain of each attribute is homogeneously partitioned into symmetric triangular fuzzy sets. The membership function of each linguistic value is determined from the domain as shown in Figure 1.2. The antecedent fuzzy sets that are used are small (S), medium small (MS), medium (M), medium large (ML), and large (L).

1.2.4 RULE GENERATION

In the training stage, a set of fuzzy rules is generated using the training patterns. The fuzzy rules are of the form:

    Rule Rj: If x1 is Aj1 and … and xn is Ajn, then Class Cj with CF = CFj

where "Rj" is the label of the jth fuzzy IF–THEN rule, Aj1 … Ajn are the antecedent fuzzy sets in the unit interval [0, 1], "Cj" is the consequent class, and CFj is the grade of certainty of the fuzzy IF–THEN rule.

The rule learning process is done separately for each class. The list of discovered rules is initially empty, and the training set consists of all the training samples. The first ant constructs the rule "Rj" randomly by adding one term at a time. The ants modify the rule "Rj" according to the maximum change parameter.

1.2.5 RULE MODIFICATION

During the initial iteration (t = 0), the ant creates a rule, and in subsequent iterations (t ≥ 1), the ants modify the rule. In an iteration (t ≥ 1), the maximum number of times each ant modifies the rule is decided by the maximum possible change value. Each ant chooses term_ij to modify based on the probability given in Equation (1.6)

    P_ij = (τ_ij(t) · η_ij) / ( Σ_{i∈I} Σ_{j=1..b_i} τ_ij(t) · η_ij ),  ∀i ∈ I    (1.6)

where
τ_ij(t) is the pheromone currently available on the path between attribute "i" and antecedent fuzzy set "j";
η_ij is the heuristic value for term_ij;
"a" is the total number of attributes;
"b_i" is the total number of antecedent fuzzy sets for attribute_i;
"I" is the set of attributes that are not yet used by the ant.

Corresponding to the quality of the modified rules, a pheromone is assigned to each trail of the ant. The ants choose the trail with high pheromone density.

1.2.6 HEURISTIC INFORMATION

The ants modify the rule using heuristic information and the amount of pheromone. The model uses a set of two-dimensional matrices as heuristic information for each class. The rows represent the attributes and
the columns represent the fuzzy values. These matrices help the ants to choose more accurate rules.

1.2.7 PHEROMONE UPDATE RULE

Pheromone update is carried out only when the modification improves the quality of the rule. It is carried out using the following equations:

    ΔQ = Q_i(after modification) − Q_i(before modification)    (1.7)

    τ_ij(t+1) = τ_ij(t) + τ_ij(t) · (Δ_iQ · C)    (1.8)

where "Δ_iQ" shows the difference between the quality of the rule after and before modification, and "C" is the parameter to regulate the influence of "Δ_iQ" to increase the pheromone.

1.2.8 FUZZY INFERENCE

The fuzzy-ACO (Anto and Chandramathi, 2015) system generates and optimizes a set of fuzzy rules to classify the test data. To compute the certainty grade of each fuzzy IF–THEN rule, the following steps are followed:

1. The compatibility of each training pattern xp = (xp1, xp2, …, xpn) is determined with the fuzzy IF–THEN rule "Rj" using the following product operation:

    μj(Xp) = μj1(Xp1) × … × μjn(Xpn),  p = 1, 2, …, m    (1.9)

where μji(xpi) is the membership function of the ith attribute of the pth pattern, and "m" is the total number of patterns.

2. The relative sum of compatibility grades of the training patterns is computed with each fuzzy IF–THEN rule "Rj":

    β_classh(Rj) = Σ_{xp∈classh} μj(xp) / N_classh,  h = 1, 2, …, c    (1.10)

where
β_classh(Rj) is the sum of the compatibility grades of the training patterns in class h;
Rj is the fuzzy rule;
N_classh is the number of training patterns.

3. The grade of certainty CFj is determined as follows:

    CFj = ( β_classĥ(Rj) − β̄ ) / Σ_{h=1..c} β_classh(Rj)    (1.11)

where
    β̄ = ( Σ_{h≠ĥ} β_classh(Rj) ) / (c − 1)    (1.12)

The certainty grade for any combination of antecedent fuzzy sets can be specified. Combinations of antecedent fuzzy sets for generating a rule set with high classification ability are to be generated by the fuzzy classification system. When a rule set is given, an input pattern is classified by a single rule as given below:

    μj(xp) · CFj = max{ μj(xp) · CFj | Rj }    (1.13)

The winner rule has the maximum product of the compatibility grade and certainty grade CFj.

1.3 MEDICAL EXPERT SYSTEM BASED ON SVM AND HYBRID GENETIC ALGORITHM (GA)–SIMULATED ANNEALING (SA) OPTIMIZATION

1.3.1 FEATURE SELECTION USING GA AND SA

Feature selection is an optimization problem, which is based on the principle of picking a subset of attributes that are most significant in deciding the class label. It reduces the dimension of the data. When the input to an algorithm is too large to be processed and is suspected to be extremely redundant, the input data is represented with a reduced set of features. The selection of the most significant subset of features from a dataset is an optimization problem. In this system, the feature selection is done using a hybrid GA–SA local search mechanism.

1.3.2 OPTIMIZATION USING HYBRID GA–SA

This hybrid GA–SA optimization technique is used for feature selection and SVM parameter optimization. The performance of an SVM classifier depends mainly on the values of the kernel function parameter, gamma (γ), and the penalty function parameter (C). Finding the best values of these two parameters to achieve the maximum classification accuracy of the SVM classifier is an optimization problem. A hybrid GA–SA algorithm is used to solve this problem and find the optimal values of "C" and "γ."

1.3.2.1 STEPS OF GA

1. Randomly generate an initial source population with "n" chromosomes.
2. Calculate the fitness function f(x) of all chromosomes in the source population using

    min f(x) = 100 · (x(1)² − x(2))² + (1 − x(1))²
3. Create an empty successor population and then repeat the following steps until "n" chromosomes have been created.
4. Using the fitness value, select two chromosomes "x1" and "x2" from the source population.
   • Apply crossover to "x1" and "x2" to obtain a child chromosome "n."
   • Apply mutation to "n" to produce a dissimilar new offspring.
   • Place the new offspring in the new population.
   • Replace the source population with the successor population.
5. If the termination criterion is satisfied, stop the algorithm. Otherwise, go to step 2.

1.3.2.2 ACCEPTANCE FUNCTION

The calculation chooses the best answer to avoid local optimums. Initially, the neighbor solution is checked to see whether it improves the current solution. If so, the current solution is accepted. Else, the following couple of factors are considered:

• How much worse the neighboring solution is?
• How high the current temperature of our system is?

At high temperatures, the system is more likely to accept worse solutions.

1.3.2.3 STEPS OF SA

Table 1.2 lists the terminologies used in the SA algorithm.

TABLE 1.2 Terminology of the SA Algorithm

Terminology    Explanation
X              Design vector
fc             System energy (i.e., objective function value)
T              Temperature
Δ              The difference in system energy between two configuration vectors

Step 1: Choose a random "Xi," select the initial temperature "t1," and specify the cooling schedule.
Step 2: Evaluate fc(Xi) using a simulation model.
Step 3: Perturb "Xi" to get a neighboring design vector (Xi+1).
Step 4: Evaluate fc(Xi+1) using a simulation model.
Step 5: If fc(Xi+1) < fc(Xi), Xi+1 is the new current solution.
Step 6: If fc(Xi+1) > fc(Xi), then accept Xi+1 as the new current solution with a probability exp(−Δ/t), where Δ = fc(Xi+1) − fc(Xi).
Step 7: Reduce the system temperature according to the cooling schedule.
Step 8: Terminate the algorithm.

1.3.2.4 NEIGHBORHOOD SEARCH (HYBRID GA–SA)

In the hybrid GA–SA algorithm, the best-obtained solution in each GA generation is transferred to SA to improve the quality of the solution through neighborhood search, which produces a solution close to the current solution in the search space by randomly choosing one gene in a chromosome, removing it from its original position, and inserting it at another random position in the same chromosome. According to this criterion, even when the value of the next solution is worse, the solution can be accepted based on the current temperature, to avoid the algorithm getting stuck in a local optimum. In the cooling phase, the new temperature is determined by the decrement function t.

1.3.3 CLASSIFICATION USING SVM

The main objective of SVM in classification is to separate data into two different classes with maximum margin. Here, SVM is applied for classification, and GA–SA is applied for optimizing the SVM parameters. The overall flow of the proposed system, including feature selection, classification, and optimization, is shown in Figure 1.3.

FIGURE 1.3 Flow diagram of the proposed GASA-SVM model.
Given the training sample of instance-label pairs (xi, yi), i = 1, …, l, xi ∈ R^n, yi ∈ {1, −1}, SVMs require the solution of the following (primal) problem (Keerthi and Lin, 2003):

    min_{w,b,ε} (1/2) W^T W + C Σ_{i=1..l} ε_i    (1.14)

subject to yi(W^T zi + b) ≥ 1 − εi, εi ≥ 0, i = 1, …, l,

where the training vector "xi" is mapped onto a high-dimension space by the mapping function φ as zi = φ(xi). C > 0 is the penalty parameter of the error term.

Usually, Equation (1.14) is resolved by sorting out the following dual problem:

    min_α F(α) = (1/2) α^T Q α − e^T α    (1.15)

subject to 0 ≤ αi ≤ C, i = 1, …, l, and

    y^T α = 0    (1.16)

where "e" is the vector of all 1's and "Q" is a positive semidefinite matrix. The (i, j)th element of "Q" is given by

    Q_ij ≡ yi yj K(xi, xj)    (1.17)

The kernel function is

    K(xi, xj) ≡ φ^T(xi) φ(xj)    (1.18)

{αi}_{i=1..l} are the Lagrange multipliers, and W = Σ_{i=1..l} αi yi φ(xi) is the weight vector. The classification decision function is given as

    sgn(W^T Φ(x) + b) = sgn( Σ_{i=1..l} αi yi K(xi, x) + b )    (1.19)

The kernel function K(xi, xj) has manifold forms. In this work, the Gaussian kernel function as shown in Equation (1.20) or (1.21) is used:

    K(x, xi) = exp(−γ ||x − xi||²)    (1.20)

    K(x, xi) = exp(−(1/σ²) ||x − xi||²)    (1.21)

Both Equations (1.20) and (1.21), which are in the same context, can transform between the parameters "γ" and "σ²." The Gaussian kernel parameter "γ" is determined by γ = 1/σ².

The parameters of SVMs with the Gaussian radial basis function (RBF) kernel refer to the pair of the error penalty parameter "C" and the Gaussian kernel parameter "γ," usually depicted as (C, γ).

1.4 MEDICAL EXPERT SYSTEM BASED ON ELM AND SA

The key problem in a neural network (NN) is determining the number of hidden nodes, which affects accuracy. To overcome this problem, the proposed system uses ELM on a single
hidden layer feedforward network (SLFN), in which the hidden nodes are randomly selected and the optimal number of hidden nodes is determined by SA. The performance of an ELM is mainly decided by the number of nodes present in the hidden layer. This parameter (the number of nodes) is optimized by the SA.

1.4.1 FEEDFORWARD NEURAL NETWORKS (FFNNS)

FFNNs are the most widely used models in classification problems. The model here is a single hidden layer feedforward network with inputs "xi" and outputs "Oj" (Figure 1.4). Each arrow symbolizes a parameter in the network. It is an artificial neural network (ANN; Vinotha et al., 2017) where a directed cycle is not formed in the connections between the units. The network consists of three layers, namely, the input layer, the hidden layer, and the output layer.

• The input layer consists of the inputs to the network.
• The hidden layer consists of neurons or hidden units placed in parallel. Each neuron performs a weighted summation of the inputs and then passes it through a nonlinear activation function called the neuron function.
• The output layer.

Standard FFNNs with hidden nodes have universal approximation and separation capabilities.

FIGURE 1.4 Single hidden layer feedforward network.
Design of Medical Expert Systems 15

1.4.2 FEATURE SELECTION

The most significant features in the dataset are to be selected. The dimensionality of the dataset has a high influence on the accuracy of the system. Hence, it is necessary to consider aspects that reduce the complexity of the system (Refaeilzadeh, 2007). The Fisher score (FS) is the most suitable method for selecting the best features of all four medical datasets.

1.4.2.1 FISHER SCORE

FS selects the most relevant "m" features from the given set of features. The datasets consist of (xi, yi) for "N" instances, where "xi" is the input vector with "p" features and "yi" is the class label. To find the most discriminating features, two basic steps are involved:

1. Calculation of the feature score for all features.
2. Selection of the top "m" features based on the score.

FS is computed using Equation (1.22):

F(xj) = Σ (k=1 to c) nk (µkj − µj)² / (σj)², j = 1, 2, …, p (1.22)

where

• nk is the number of instances in each class k;
• µkj is the jth feature mean of class k;
• µj is the jth feature mean of the whole dataset;
• σj is the jth feature standard deviation (SD) of the whole dataset.

The SD is computed as shown in Equation (1.23):

(σj)² = Σ (k=1 to c) nk (σkj)² (1.23)

1.4.2.2 NORMALIZATION

Scaling is done to avoid the dominance of attributes with greater numerical values over those with smaller values by computing a linear transformation of the numerical values within a range. The values of the selected features from the dataset are normalized from 0 to 1 as shown in Equation (1.24):

Xnorm = (X − Xmin)/(Xmax − Xmin) × (upperbound − lowerbound) (1.24)

where "X" is the original data, "Xmax" is the maximum value of X, "Xmin" is the minimum value in X, and "Xnorm" is the normalized value within the given upper and lower bound.
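Equations (1.22)–(1.24) can be sketched as follows (a hypothetical illustration on toy data; the class structure, variable names, and the "+ lower" shift in the scaling are assumptions of this sketch):

```python
import numpy as np

def fisher_scores(X, y):
    """Equation (1.22): F(x_j) = sum_k n_k (mu_kj - mu_j)^2 / sigma_j^2,
    with sigma_j^2 = sum_k n_k sigma_kj^2 as in Equation (1.23)."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / den

def min_max_scale(X, lower=0.0, upper=1.0):
    # Equation (1.24): linear rescaling of each feature into [lower, upper]
    Xmin, Xmax = X.min(axis=0), X.max(axis=0)
    return (X - Xmin) / (Xmax - Xmin) * (upper - lower) + lower

# Toy data: feature 0 separates the classes, feature 1 is pure noise
rng = np.random.default_rng(1)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([y + 0.1 * rng.normal(size=100), rng.normal(size=100)])

scores = fisher_scores(X, y)
m = 1
top_m = np.argsort(scores)[::-1][:m]   # indices of the top "m" features
X_norm = min_max_scale(X[:, top_m])    # normalized into [0, 1]
```

The discriminative feature receives a far higher score than the noise feature, so it is the one selected and scaled.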

may be randomly chosen and fixed, followed by analytically determining the output weights. The parameters of the hidden nodes are independent not only of the target functions and the training datasets but also of each other. The parameters of ELM are determined analytically instead of being tuned. Once the weights of the SLFN are randomly assigned, the SLFN can be treated as a linear system, and the output weights are obtained analytically by a generalized inverse operation on the hidden layer output matrix.

In contrast to conventional learning methods, which see the training data before generating the hidden neuron parameters, ELM randomly generates the hidden neuron parameters even before seeing the training data. ELMs have a fast learning speed, are easy to implement, and involve minimal human intervention. They appear to be a feasible alternative for large-scale computing and ML.

1.4.3.1 ELM ALGORITHM

The formal ELM algorithm is given below.

Given:

• A training set of input/output values: (xi, ti) ∈ R^n × R^m, for i = 1, 2, …, N. (1.25)
• An activation function: G(ai, Xj, bi) = g(bi ||Xj − ai||) (1.26)
• The number of hidden nodes L.

Step 1: Using a continuous sampling distribution, assign the hidden nodes by randomly generating the parameters (ai, bi), for i = 1, 2, …, N.

Step 2: Compute the hidden layer output matrix "H."

Step 3: Compute the output weight "β" by using the relation

β = H#T (1.27)

1.4.4 OPTIMIZATION USING SIMULATED ANNEALING

The number of hidden nodes (L) has a high impact on the performance of the ELM-based classification system. Computing the optimal value of "L" is a demanding task. Here, SA is employed to compute the optimal value of "L" and thereby improve the performance of ELM. SA is one of the most popular optimization techniques used for finding solutions to optimization problems.

It is a local heuristic search algorithm that employs a nongreedy method for finding optimal solutions, and it usually does not settle in local maxima. The strength of SA is that it does not get caught at local maxima. It searches for the best solution by

generating a random initial solution and exploring the area nearby. If a neighboring solution is better than the current one, the algorithm moves to it. It is a kind of Monte Carlo method used for examining the state and frozen state of n-body systems.

In the hierarchical SA search, the classification measure (CM) is used as the basic optimization parameter. The classification accuracy of ELM is considered as the CM. Initially, the SA and the ELM parameters are initialized. Then, the neighbors of the ELM parameters are selected. The neighbors are tuned using the SA optimization search. The output of the first stage is given as the input of the next stage. This decides whether the parameter is acceptable or not. If it is not acceptable, the parameter is tuned further.

A maximum of "kmax" iterations are performed until maximum accuracy is achieved. The CallNeighbor() function finds the ensuing values of "N." The CallRandom() function generates a random value between "0" and "1." The basic SA process is shown in Figure 1.5. To divide the set into training and testing sets, the k-fold cross-validation procedure is applied to the selected feature set.
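The k-fold split and per-fold accuracy can be sketched as follows (an illustrative sketch: a trivial majority-class stand-in replaces the chapter's ELM classifier, and the toy dataset and fold count are assumptions):

```python
import numpy as np

def kfold_accuracy(X, y, train_and_predict, k=10, seed=0):
    """Average per-fold accuracy over k nonoverlapping equal-sized subsets."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)          # the k subsets D_i
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = train_and_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    return float(np.mean(scores))

def majority(X_tr, y_tr, X_te):
    # Stand-in "classifier": always predicts the majority training label
    vals, counts = np.unique(y_tr, return_counts=True)
    return np.full(len(X_te), vals[np.argmax(counts)])

X = np.zeros((100, 3))
y = np.array([0] * 70 + [1] * 30)
cm = kfold_accuracy(X, y, majority, k=10)   # 0.7 for this 70/30 toy labeling
```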
FIGURE 1.5 Simulated annealing search algorithm.
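The search loop depicted in Figure 1.5 can be expressed generically as follows (the objective, neighbor move, and cooling schedule are toy assumptions; in the chapter the score would be the CM returned by an ELM run for a candidate L):

```python
import math
import random

def simulated_annealing(score, neighbor, init, t0=1.0, t_end=0.01,
                        cooling=0.95, rng=random.Random(3)):
    """Maximize score(x): always accept better neighbors, and accept worse
    ones with probability exp(delta / T), so the search can escape local
    maxima (the nongreedy ingredient of SA)."""
    current, best = init, init
    t = t0
    while t > t_end:
        cand = neighbor(current, rng)
        delta = score(cand) - score(current)
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = cand
        if score(current) > score(best):
            best = current
        t *= cooling              # gradually lower the temperature
    return best

# Toy stand-in for the CM: a bumpy function of the hidden-node count L
score = lambda L: -((L - 120) ** 2) / 1000.0 + math.sin(L / 3.0)
neighbor = lambda L, rng: max(1, min(200, L + rng.choice([-5, -1, 1, 5])))

best_L = simulated_annealing(score, neighbor, init=10)
```

The neighbor move keeps L inside the 1–200 range searched in the text; the returned best_L never scores worse than the starting point.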



The cross-validation technique returns a CM for "k" classifiers built by the ELM algorithm. Each fold of the dataset is optimized using the hyperparameter search strategy. The procedure for cross-validation is as follows:

• Division of datasets: The medical datasets are divided into training and testing sets. The k nonoverlapping equal-sized subsets are formed from the given dataset "Di," where i = 1, 2, …, k.
• Classifier training: "k−1" folds are trained using the classifier algorithm and the remaining fold is tested on the trained classifier. Each classifier output generates an accuracy for the predicted sets. The class performance is analyzed using performance parameters.
• CM: The CM is obtained by Equation (1.28):

CM = Number of True Records Predicted / Number of Total Records (1.28)

CM is calculated for every sequential increase in the number of hidden nodes (L).

• Optimization parameters: To find the optimal number of neurons (L), the SA optimization technique is used. The CM is calculated by increasing the number of neurons from 1 to 200. The value of the parameter "L" for which the maximum CM is obtained is chosen as the best value.

1.4.5 FISHER SCORE-EXTREME LEARNING MACHINE-SIMULATED ANNEALING (FS-ELM-SA)

In the proposed ELM-based learning machine that uses SA for optimization, the SLFN has "L" hidden nodes. It can approximate the given "N" pairs of input/output values, namely (xi, tj) ∈ R^n × R^m, with zero error. The overall flow and working of the proposed system are depicted in Figure 1.6.

Σ (i=1 to p) βi G(ai, Xj, bi) = tj, for j = 1, 2, …, L (1.29)

where (ai, bi) is the parameter associated with the "ith" hidden node, and "βi" is the output weight that links the "ith" hidden node to the output node. In this work, a nonlinear activation function called RBF, as shown in Equation (1.30), is used:

G(ai, Xj, bi) = g(bi ||Xj − ai||) (1.30)

Hence, Equation (1.29) can be rewritten as

Hβ = T (1.31)

FIGURE 1.6 Flow diagram of the proposed FS-ELM-LSSVM-SA.

where

H = [ G(a1, X1, b1)  G(a2, X1, b2)  …  G(ap, X1, bp)
      G(a1, X2, b1)  G(a2, X2, b2)  …  G(ap, X2, bp)
      ⋮
      G(a1, XN, b1)  G(a2, XN, b2)  …  G(ap, XN, bp) ] (1.32)

β = [β1^T, β2^T, β3^T, …, βp^T]^T (1.33)

T = [t1^T, t2^T, t3^T, …, tN^T]^T (1.34)

β̂ = H#T (1.35)

Here "β̂" is used as the estimated value of "β," where "H#" is the Moore–Penrose generalized inverse of the hidden layer output matrix "H" (Serre, 2002).

1.5 MEDICAL EXPERT SYSTEM BASED ON LS-SVM AND SA

To improve the classification performance of the expert system further, LS-SVM with an RBF kernel is used for classification and SA is used for the optimization of the kernel parameters of the LS-SVM. The performance of the SVM classifier is highly influenced by the kernel

function parameter gamma (γ) and the penalty function parameter (C). The critical parameters "C" and "γ" are optimized using SA to get the best combination of kernel parameters, leading to the highest classification accuracy.

1.5.1 FEATURE SELECTION USING FISHER SCORE

The FS algorithm is used in many supervised learning systems to determine the most relevant and discriminative features for classification (Yilmaz, 2013). Based on the prominence of the attributes in the dataset, it generates a score for each attribute, and vital features are selected based on the scores. It uses discriminative methods and generative statistical models to perform feature selection.

1.5.2 CLASSIFICATION USING LEAST SQUARE SUPPORT VECTOR MACHINE (LS-SVM)

The main objective of SVM in classification is to separate data into two different classes with maximum margin. A high computational load due to the quadratic programming problem is a challenge in SVM. To balance this, Suykens and Vandewalle (1999) proposed LS-SVM, which uses linear equations instead of the quadratic programming of SVM.

A linear SVM is a binary classifier used to separate data between two classes, yi ∈ {1, −1}. Separating hyperplanes are formed using Equation (1.36), and the inequality for both classes is given by Equation (1.37):

D(x) = (w · x) + w0 (1.36)

yi [(w · xi) + w0] ≥ 1, i = 1, …, n (1.37)

where "xi" is the input vector, "m" is the number of features, and "yi" is the class label. Support vectors are the margin values that are formed when the equality in Equation (1.37) holds. Classification of data is done using these support vectors.

1.5.3 OPTIMIZATION USING SIMULATED ANNEALING

The performance of the LS-SVM classifier (Aishwarya and Anto, 2014) is also sensitive to the values of "C" and "γ." As finding the best values of these parameters manually is tedious, optimization techniques are used along with LS-SVM. SA is one of the most popular optimization techniques used for finding a solution to an optimization problem. It is a local heuristic search algorithm that uses a nongreedy method for finding an optimal solution.
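Equations (1.36) and (1.37) can be checked numerically (the values of w, w0, and the data points below are toy assumptions chosen so the hyperplane separates the classes):

```python
import numpy as np

def decision(X, w, w0):
    # Equation (1.36): D(x) = (w . x) + w0
    return X @ w + w0

# A toy separable problem with an assumed separating hyperplane
w, w0 = np.array([1.0, 1.0]), -1.0
X = np.array([[2.0, 2.0], [1.5, 0.5], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1, 1, -1, -1])

# Equation (1.37): every point satisfies y_i * D(x_i) >= 1
margins = y * decision(X, w, w0)
support_vectors = X[np.isclose(margins, 1.0)]   # equality holds on the margin
```

The two points whose margin equals exactly 1 lie on the margin hyperplanes and are the support vectors.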

FIGURE 1.7 Simulated annealing search.

In the least squares support vector machine (LS-SVM), simulated annealing (SA) delivers the best values for "C" and "γ" by trying random variations of the current (local) solution. SA is a kind of Monte Carlo method used for finding the global minimum of a nonlinear error function. Figure 1.7 shows the basic SA search algorithm.

Iteration "I0" is initialized and continues to a maximum of "kmax" steps, where "kmax" is the maximum number of iterations performed until maximum accuracy is achieved. The function Neighbor() finds the next values of "C" and "γ." The function Random() generates a random value between "0" and "1."

The cross-validation score (CVS) is the basic parameter that should be maximized in the hierarchical SA search optimization technique. CVS gives the classification accuracy of the LS-SVM classifier. To commence with, the SA and SVM parameters are initialized. The neighbors of the SVM parameters can be selected and tuned using the SA optimization search. This helps in deciding whether the parameter is acceptable or more tuning is required.

The cross-validation procedure involves the following steps.

1. Division of datasets: The diabetes dataset is divided into training and testing sets. The "k" nonoverlapping

equal-sized subsets are formed from the given dataset "Di," where i = 1, 2, …, k.

2. Classifier training: The "k−1" folds are trained using the classifier and the remaining fold is tested using the trained classifier. The output of each classifier generates an accuracy for the predicted sets. The classifier performance is analyzed using performance metrics.

3. Calculation of CVS: It is calculated for each combination of "C" and "γ" values. CVS is obtained from Equation (1.38):

CVS = # Records Predicted True / # Total Records (1.38)

4. Optimization parameters: Finding an optimal solution for the kernel parameters "C" and "γ" is tedious in the case of any SVM. Hence, the SA optimization technique is used. Here, the kernel values are varied between 2^−5 and 2^15 for "C" and between 2^−15 and 2^5 for "γ."

1.5.4 FS-LSSVM-SA

The proposed FS-LSSVM-SA (Aishwarya and Anto, 2014) involves the following steps, namely, feature selection using FS, classification using LSSVM, and optimization using SA.

The FS is computed using Equation (1.39):

F(xj) = Σ (k=1 to c) nk (µkj − µj)² / (σj)², j = 1, 2, …, p (1.39)

where "nk" is the number of instances in class "k," "µj" is the mean of the whole dataset for the jth feature, and "σj" is the SD of the whole dataset for the jth feature. The SD is given by

(σj)² = Σ (k=1 to c) nk (σkj)² (1.40)

The values of the selected features from the datasets are normalized between "0" and "1" as shown below:

Xnormalized = (X − Xmin)/(Xmax − Xmin) × (Upper Bound − Lower Bound) (1.41)

where "X" is the original data, "Xmax" is the maximum value of X, "Xmin" is the minimum value of "X," and "Xnormalized" is the normalized value within the given upper and lower bounds.

The following inequality holds for the margins (support vectors) of the hyperplanes:

yk × D(xk) / ||w|| ≥ Γ, k = 1, …, n (1.42)

The margin (Γ) is inversely proportional to "||w||"; thus, minimizing "||w||" maximizes the margin.

Equation (1.43) is written to reduce the number of solutions for the norm of "w":

Γ × ||w|| = 1 (1.43)

Minimizing "w" amounts to minimizing

(1/2) ||w||² (1.44)

Slack variables "ξi" are added to Equation (1.37) and expression (1.44), giving Equations (1.45) and (1.46):

yi [(w · xi) + w0] ≥ 1 − ξi (1.45)

C Σ (i=1 to n) ξi + (1/2) ||w||² (1.46)

SVMs are used to classify linear data. In SVM, it is difficult to achieve good classification for nonlinear data. To overcome this problem, SVM uses kernel functions. Input datasets are distributed in a nonlinear dimensional space; these are converted into a high-dimensional linear feature space by using kernels. The RBF is used for such a mapping of the medical datasets, as given in Equation (1.47):

RBF kernel: K(x, x′) = exp(−||x − x′||² / σ²) (1.47)

The major difference between SVM and LSSVM is that LSSVM uses linear equations, whereas SVM uses a quadratic optimization problem. In LS-SVM, Equations (1.45) and (1.46) become

yi [(w · xi) + w0] = 1 − ξi, i = 1, …, n (1.48)

(1/2) ||w||² + (C/2) Σ (i=1 to n) ξi² (1.49)

Based on Equations (1.48) and (1.49), the dual problem can be built as shown in Equation (1.50):

L(w, b, α, ξ) = (1/2) ||w||² + (C/2) Σ (i=1 to n) ξi² − Σ (i=1 to n) αi { yi [(w · xi) + w0] − 1 + ξi } (1.50)

The Lagrange multiplier "αi" can be either positive or negative for LSSVM, whereas it should be positive for SVM. The LS-SVM classifier can be expressed as

f(x) = sign( Σ (i=1 to N) yi αi K(x, xi) + b ) (1.51)

The k-fold cross-validation procedure is applied to the selected feature set to divide the set into training and testing sets. The cross-validation technique returns the CVS for the "k" classifiers built by the LS-SVM algorithm.
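A minimal LS-SVM classifier in the spirit of Equations (1.47)–(1.51) can be sketched by solving the standard LS-SVM linear system of Suykens and Vandewalle (1999); the block system written in the comment, the hyperparameter values, and the toy two-blob data are assumptions of this sketch:

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Equation (1.47): K(x, x') = exp(-||x - x'||^2 / sigma^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def lssvm_train(X, y, C=10.0, sigma=1.0):
    """Training is a single linear solve (no quadratic program):
    [[0, y^T], [y, Omega + I/C]] [b; alpha] = [0; 1],
    with Omega_ij = y_i y_j K(x_i, x_j)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                  # b, alpha

def lssvm_predict(X_new, X, y, b, alpha, sigma=1.0):
    # Equation (1.51): f(x) = sign(sum_i y_i alpha_i K(x, x_i) + b)
    return np.sign(rbf(X_new, X, sigma) @ (alpha * y) + b)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1.0] * 40 + [1.0] * 40)
b, alpha = lssvm_train(X, y)
acc = (lssvm_predict(X, X, y, b, alpha) == y).mean()
```

Unlike SVM, every αi here is generally nonzero (and may be negative), which is the trade-off LS-SVM makes for replacing the quadratic program with a linear system.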

1.6 CONCLUSION

It has been observed that people in the medical field face many challenges in the timely delivery of diagnosis and medication. This situation has led to delayed or unavailable medical assistance for patients, especially those suffering from chronic diseases. The introduction of an automated computing system to assist physicians can drastically improve the reach of a medical facility to the common public. Recent advances in the field of AI have created an opportunity for researchers to investigate several intelligent systems to assist physicians in disease diagnosis. These systems are designed using knowledge discovery in patient data and ML. The major components of such a system include feature selection, classification, and optimization. Several research works in the field of medical expert systems aim at improving the performance of these components.

The core objective is to improve the diagnosis accuracy of the medical decision support system using several ML techniques. With this objective, four different decision support system designs were proposed. The metrics that are considered to evaluate the performance of the proposed medical expert system model are discussed in Section 1.1.5.

The performance of the proposed systems is evaluated based on classification accuracy, sensitivity, and specificity by constructing a 2 × 2 matrix named the confusion matrix using the predicted values of the classifier. The prediction accuracy, also known as classification accuracy, is calculated from the values of the constructed confusion matrix. The performance of a classification system is evaluated by using all the instances of the database as test sets; that is, the splitting of the dataset is done using k-fold cross-validation. The dataset was divided into 10 equal partitions; 1 partition out of 10 is kept as the test set and the remaining instances are used for training the classifier.

The datasets that are used for analyzing the performance of the proposed systems are discussed in Section 1.1.4.

The performance of the proposed fuzzy-ACO system for all the datasets is shown in terms of maximum accuracy. The performance of the proposed system is compared with other existing systems such as GA–SVM (Tan et al., 2009), GA and grid algorithm (Martens et al., 2007), MLPNN (Nauck and Kruse, 1999), AR2+NN (Dorigo and Blum, 2005), SVM (Sartakhti et al., 2012), ANN (Pradhan et al., 2011; Vinotha et al., 2017), RIPPER, C4.5 and 1NN (Martens et al., 2007), Naïve Bayes, and KNN (Sartakhti et al., 2012). It

is found that the fuzzy–ACO system performs better when compared to the existing methodologies for all the datasets.

As a next step, a clinical decision support system based on SVM and a hybrid GA–SA is used for diagnosis. The SVM with a Gaussian RBF kernel performs the classification process. The hybrid GA–SA is used for two purposes: one is to select the most significant feature subset of the dataset, and the other is to optimize the kernel parameters of the SVM. While the existing RST-based model offered an accuracy of 85.46%, the proposed GASA-SVM yields the maximum accuracy of 93.6% for the breast cancer (diagnostic) dataset. For the diabetes dataset, SVM offers the least accuracy of 74% while the proposed GASA-SVM yields the maximum accuracy of 91.2%. On the hepatitis dataset, the proposed GASA-SVM gives the maximum accuracy of 87%. The SVM-Gaussian kernel model yields the minimum accuracy of 76.1% while the proposed GASA-SVM yields the maximum accuracy of 89.3% for the cardiac arrhythmia dataset.

In GA, an average of 30 generations is taken. The best fitness value is found to be 0.1481, and the mean fitness values for the four datasets are calculated as 0.17 for PID, 0.19 for breast cancer, 0.18 for hepatitis, and 0.18 for cardiac arrhythmia. For SA, the initial temperature is set as 6.455 and the final temperature as 0.333. The temperature of SA is gradually reduced from the initial value to the final value in 50 cycles. GA receives the best chromosome with the help of SA. The performance of the proposed system in terms of classification accuracy is compared with existing systems such as the grid algorithm (Huang et al., 2006), RST (Azar et al., 2014), decision tree (Zangooei et al., 2014), BP (Orkcu and Bal, 2011), SVM NSGA-II (Zangooei et al., 2014), LDA-ANFIS (Dogantekin et al., 2010), PSO, and SVM (Sartakhti et al., 2012).

Subsequently, a medical expert system based on ELM and SA is proposed. Classification is performed using ELM while optimization of the ELM parameter is carried out by the SA heuristic. The performance of the proposed model is compared with several existing works. The RST-based system offers an accuracy of 85.46% while the proposed FS-ELM-SA yields the maximum accuracy of 94.39% for the breast cancer (diagnostic) dataset. The SVM-based system offers the least accuracy of 77.73% while the proposed FS-ELM-SA yields the maximum accuracy of 96.45% for the diabetes dataset. For the hepatitis dataset, the Naïve Bayes system yields a minimum accuracy of 82.05% while the proposed FS-ELM-SA yields the maximum

accuracy of 81.08%. For the cardiac arrhythmia dataset, KNN-HITON yields the minimum accuracy of 65.3% while the proposed FS-ELM-SA yields a maximum accuracy of 76.09%.

Experimental results show the highest accuracy on the best folds, the average accuracy over 10 folds, and the sensitivity and specificity of the proposed system for the four medical datasets. The classification accuracy of the proposed system is compared with existing systems such as RST (Azar et al., 2014), CART (Ster and Dobnikar, 1996), GRID algorithm (Chen et al., 2012), a GA-based approach, MKS-SSVM (Purnami et al., 2009), MABC fuzzy (Fayssal and Chikh, 2013), VFI5-GA (Yilmaz, 2013), RF-CBFS (Ozcift, 2011), and AIRS-FWP (Polat and Gunes, 2009).

Finally, a medical decision support system based on LSSVM and the SA heuristic for disease diagnosis is proposed. The FS method is used to select the most significant features from the given feature set. LS-SVM with RBF is used for classification and SA for the optimization of the kernel parameters of the LS-SVM. For the breast cancer dataset, the existing RST-based system offered the least accuracy of 85.46% while the proposed LSSVM-SA yields the maximum accuracy of 97.54%. The SVM-based approach offers the minimum accuracy of 77.73% while the proposed LSSVM-SA yields the maximum accuracy of 99.29% for the diabetes dataset. For the hepatitis dataset, the existing GRNN model offers the least accuracy of 80% while the proposed LSSVM-SA yields the maximum accuracy of 90.26%. The VFI5-GA system offers a minimum accuracy of 68% while the proposed LSSVM-SA yields the maximum accuracy of 77.96% for the cardiac arrhythmia dataset.

To conclude, it is observed that the medical expert systems proposed in this chapter, applied over the breast cancer, PID, hepatitis, and cardiac arrhythmia datasets, produced improved classification accuracy when compared with other existing systems, as shown in Table 1.3. The proposed system based on LSSVM-SA produced the highest accuracy over the breast cancer, PID, and hepatitis datasets. Moreover, the proposed system based on GASA-SVM gives the maximum accuracy over the cardiac arrhythmia dataset. Since clinical decision making requires the utmost accuracy of diagnosis, medical expert systems designed with the highest classification accuracy can help physicians to carry out an accurate diagnosis of diseases.
TABLE 1.3 Accuracies of the Proposed Systems and the Existing Systems

Breast Cancer Dataset (Method: Accuracy %):
RIPPER: 79.75; C4.5: 79.34; 1NN: 80.37; SVM ACO: 81.93; RST: 85.46; GRID ALGORITHM: 90.78; CART: 93.5; DECISION TREE: 92.81; BP: 93.1; PROPOSED FUZZY-ACO: 87.19; PROPOSED GASA-SVM: 93.6; PROPOSED FS-ELM-SA: 94.39; PROPOSED LSSVM-SA: 97.54

Diabetes Dataset (Method: Accuracy %):
SVM: 74; GA–SVM: 82.98; ANN: 73.4; LDA-ANFIS: 84.61; SVM NSGA-II: 86.13; ANN: 73.4; SVM: 77.73; GA-SVM: 71.64; GA BASED APPROACH: 82.98; MKS-SSVM: 93.2; MABC FUZZY: 84.21; PROPOSED FUZZY-ACO: 85.38; PROPOSED GASA-SVM: 91.2; PROPOSED FS-ELM-SA: 96.45; PROPOSED LSSVM-SA: 99.29

Hepatitis Dataset (Method: Accuracy %):
SVM: 74; KNN: 75; C4.5: 83.6; NAIVE BAYES: 82.05; KNN: 83.45; GA–SVM: 86.12; PSO: 82.66; SVM: 84.67; GRNN: 80; PROPOSED FUZZY-ACO: 84.95; PROPOSED GASA-SVM: 87; PROPOSED FS-ELM-SA: 81.08; PROPOSED LSSVM-SA: 90.26

Cardiac Arrhythmia Dataset (Method: Accuracy %):
VFI5-GA: 68; PRUNING APPROACH: 61.4; KNN-HITON: 65.3; KDFW-KNN: 70.66; AIRS-FWP: 76.2; SVM-GAUSSIAN KERNEL: 76.1; HLVQ: 76.92; NEWFM: 81.32; RF-CBFS: 76.3; PROPOSED FUZZY-ACO: 75.7; PROPOSED GASA-SVM: 89.3; PROPOSED FS-ELM-SA: 76.09; PROPOSED LSSVM-SA: 77.96

KEYWORDS

• medical expert systems
• machine learning
• classifier optimization
• clinical decision making

REFERENCES

Aishwarya, S & Anto, S, 2014, 'A medical decision support system based on genetic algorithm and least square support vector machine for diabetes disease diagnosis,' International Journal of Engineering Sciences & Research Technology, vol. 3, no. 4, pp. 4042–4046.
Anto, S & Chandramathi, S, 2015, 'An expert system for breast cancer diagnosis using fuzzy classifier with ant colony optimization,' Australian Journal of Basic and Applied Sciences, vol. 9, no. 13, pp. 172–177.
Azar, AT, Elshazly, HI, Hassanien, AE & Elkorany, AM, 2014, 'A random forest classifier for lymph diseases,' Computer Methods and Programs in Biomedicine, vol. 113, no. 2, pp. 465–473.
Chen, HL, Yang, B, Wang, G, Wang, SJ, Liu, J & Liu, DY, 2012, 'Support vector machine based diagnostic system for breast cancer using swarm intelligence,' Journal of Medical Systems, vol. 36, no. 4, pp. 2505–2519.
Coenen, F & Bench-Capon, TJM, 1992, 'Maintenance and maintainability in regulation based systems,' ICL Technical Journal, vol. 5, pp. 76–84.
Dogantekin, E, Dogantekin, A, Avci, D & Avci, L, 2010, 'An intelligent diagnosis system for diabetes on linear discriminant analysis and adaptive network based fuzzy inference system: LDA-ANFIS,' Digital Signal Processing, vol. 20, no. 4, pp. 1248–1255.
Dorigo, M & Blum, C, 2005, 'Ant colony optimization theory: A survey,' Theoretical Computer Science, vol. 344, no. 2, pp. 243–278.
Fan, CY, Chang, PC, Lin, JJ & Hsieh, JC, 2011, 'A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification,' Applied Soft Computing, vol. 11, no. 1, pp. 632–644.
Feigenbaum, EA, 1977, 'The art of artificial intelligence. 1. Themes and case studies of knowledge engineering' (No. STAN-CS-77-621), Stanford University, CA, Department of Computer Science.
Friedman, JH, 1998, 'Data mining and statistics: What's the connection?,' Computing Science and Statistics, vol. 29, no. 1, pp. 3–9.
Huang, GB, Zhu, QY & Siew, CK, 2004, 'Extreme learning machine: A new learning scheme of feedforward neural networks,' Proceedings of the IEEE International Joint Conference on Neural Networks, pp. 985–990.
Jackson, P, 1999, 'Introduction to Expert Systems,' 3rd Edition, Addison Wesley, Reading, MA, USA.
Kourou, K, Exarchos, TP, Exarchos, KP, Karamouzis, MV & Fotiadis, DI, 2015, 'Machine learning applications in cancer prognosis and prediction,' Computational and Structural Biotechnology Journal, vol. 13, pp. 8–17.
Mannila, H, 1996, 'Data mining: Machine learning, statistics, and databases,' in SSDBM, pp. 2–9.
Martens, D, De Backer, M, Haesen, R, Vanthienen, J, Snoeck, M & Baesens, B, 2007, 'Classification with ant colony optimization,' IEEE Transactions on Evolutionary Computation, vol. 11, no. 5, pp. 651–665.
Nauck, D & Kruse, R, 1997, 'A neuro-fuzzy method to learn fuzzy classification rules from data,' Fuzzy Sets and Systems, vol. 89, no. 3, pp. 277–281.

Niranjana Devi, Y & Anto, S, 2014, 'An evolutionary-fuzzy expert system for the diagnosis of coronary artery disease,' International Journal of Advanced Research in Computer Engineering & Technology, vol. 3, no. 4, pp. 1478–1484.
Orkcu, HH & Bal, H, 2011, 'Comparing performances of backpropagation and genetic algorithms in the data classification,' Expert Systems with Applications, vol. 38, no. 4, pp. 3703–3709.
Polat, K & Güneş, S, 2009, 'A new feature selection method on classification of medical datasets: Kernel F-score feature selection,' Expert Systems with Applications, vol. 36, no. 7, pp. 10367–10373.
Pradhan, M & Sahu, RK, 2011, 'Predict the onset of diabetes disease using artificial neural network (ANN),' International Journal of Computer Science & Emerging Technologies, vol. 2, no. 2, pp. 2044–6004.
Purnami, SW, Embong, A, Zain, JM & Rahayu, SP, 2009, 'A new smooth support vector machine and its applications in diabetes disease diagnosis,' Journal of Computer Science, vol. 5, no. 12, p. 1003.
Refaeilzadeh, P, Tang, L & Liu, H, 2007, 'On comparison of feature selection algorithms,' Proceedings of the Association for the Advancement of Artificial Intelligence, pp. 35–39.
Sartakhti, JS, Zangooei, MH & Mozafari, K, 2012, 'Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA),' Computer Methods and Programs in Biomedicine, vol. 108, no. 2, pp. 570–579.
Ster, B & Dobnikar, A, 1996, 'Neural networks in medical diagnosis: Comparison with other methods,' in Proceedings of the International Conference EANN, pp. 427–430.
Suykens, JA & Vandewalle, J, 1999, 'Least squares support vector machine classifiers,' Neural Processing Letters, vol. 9, no. 3, pp. 293–300.
Tan, KC, Teoh, EJ, Yu, Q & Goh, KC, 2009, 'A hybrid evolutionary algorithm for attribute selection in data mining,' Expert Systems with Applications, vol. 36, no. 4, pp. 8616–8630.
Vinotha, PG, Uthra, V & Anto, S, 2017, 'Medoid based approach for missing values in the data sets using AANN classifier,' International Journal of Advanced Research in Computer Science and Software Engineering, vol. 7, no. 3, pp. 51–55.
Watson, I, Basden, A & Brandon, P, 1992, 'The client-centred approach: Expert system maintenance,' Expert Systems, vol. 9, no. 4, pp. 189–196.
Watson, I & Marir, F, 1994, 'Case-based reasoning: A review,' The Knowledge Engineering Review, vol. 9, no. 4, pp. 327–354.
Wernick, MN, Yang, Y, Brankov, JG, Yourganov, G & Strother, SC, 2010, 'Machine learning in medical imaging,' IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 25–31.
Yilmaz, E, 2013, 'An expert system based on Fisher score and LS-SVM for cardiac arrhythmia diagnosis,' Computational and Mathematical Methods in Medicine, vol. 5, pp. 1–6.
Zangooei, MH, Habibi, J & Alizadehsani, R, 2014, 'Disease diagnosis with a hybrid method SVR using NSGA-II,' Neurocomputing, vol. 136, pp. 14–29.
CHAPTER 2

FROM DESIGN ISSUES TO VALIDATION:


MACHINE LEARNING IN BIOMEDICAL
ENGINEERING
CHRISTA I. L. SHARON* and V. SUMA
Department of Information Science and Engineering,
Dayananda Sagar College of Engineering, Bangalore, Karnataka, India
*Corresponding author. E-mail: sharonchrista-ise@dayanandasagar.edu

ABSTRACT

This chapter is about the design and validation challenges that are faced when adapting machine learning techniques in the biomedical engineering specialization. The issues, from the acquisition of data from biomedical sources to concerns in the design and validation of the machine learning models, are wrapped up and presented. The chapter describes the data acquisition prospects in biomedical systems and the different approaches that can be followed in knowledge representation. Further, the design issues and feature selection techniques that can be followed to achieve an optimum learning model are presented. The different ways in which the outcome of biomedical-based machine learning models can be validated are also depicted.

2.1 INTRODUCTION

With the integration of software systems into day-to-day life, by 2020, 44 trillion GB of data will be accumulated, of which a huge share is from the biomedical sector. The computing system has crossed the era of limited, slower, and expensive memory storage; therefore, storage is not a hurdle anymore. The advancements in technology enabled storing huge amounts of data. Data in the repository can be utilized for analytics that gives insights on common trends, anomalies, patterns, and relationships. Researchers and scientists all over the world are using

advanced technology and algorithms to predict and detect diseases. Although the proposal to integrate AI and biological data emerged in the early 1970s, the approach became popular only in the 1980s, and even then it had limited results because the apt technology was unavailable. Watson, a machine learning-based research initiative by IBM, and similar machine learning systems produce results that are at times more accurate than those of doctors themselves. Precision medicine using artificial intelligence techniques is an emerging field. For any such technology to work in the desired manner, the availability of data is essential.

Data in the right format enables one to apply the available advanced technologies to produce the desired results. While data analytics enables the researcher to identify patterns and trends, data analysis is the process that acts as a support system for analytics: it cleans and transforms the data into the required format. This process is especially important for biomedical data because of the complicated and diverse nature of medical records. Computerizing test results of different kinds and hand-written medical notes, and organizing and automating the same, is quite challenging. The computerized data that is available has types that range from binary and image types to temporal and fuzzy types.

While developing an AI-based system for biomedical data, the application is very important. The accuracy of the models will be based on the initialization parameters, the data input type, and the required output (Villanueva et al., 2019). These are considered the design issues that need to be addressed. Further, the analysis, the factors to be considered in the analysis, and the methods of validating and evaluating the output are application specific.

The chapter gives an introduction to and details of the design issues that have to be considered in modeling an artificial intelligence-based biomedical system. Design issues cover the approach to be considered in identifying the model and its objectives. An approach to identify the right data source, followed by different approaches to extract data from medical records, is put forth. Data collection from available datasheets and its processing is presented. Since missing data is a major concern and affects the accuracy of the models, the approaches to filling in the missing data are presented next. This is followed by the parameters that can be considered in identifying the number of data points, as well as the requirements for training and testing the data. Further, the factors that have to be considered in designing the model
From Design Issues to Validation 33

as well as what factors have to be considered are presented in the next part of the chapter as analyses of the data model.

Approaches for the validation and evaluation of the data models are presented, along with the issues that have to be considered, for which the different aspects of the system are presented further. The analysis followed to identify the appropriateness of the model will give an insight into the factors that have to be considered before adopting a training set as well as the model itself. The practical considerations, as well as the appropriateness of the algorithm based on the performance evaluation and the different ways to perform it, conclude the chapter.

2.2 DATA AND KNOWLEDGE ACQUISITION

An AI-based system has two parts: a knowledge base that is fed with data acquired from different sources along with inputs from domain experts, and an expert system that derives inferences from the knowledge base. The development of a knowledge base in a required domain is a major issue, since these knowledge bases require inputs from experts and domain-associated databases. To develop the knowledge base, the data of biomedical signals, such as bioelectric signals, blood oxygenation, and respiration, are acquired by different biomedical instruments, such as EEG and ECG. These data need to be stored for the proper performance of the AI models.

The different ways in which a knowledge base can be populated are presented further.

The raw biomedical signals collected from different biomedical instruments are stored in the different forms presented in Section 2.3.1. These data are stored in structured storage systems called databases. The relationships between the different variables of the databases, on the other hand, are maintained in the knowledge base; identifying them requires inputs from the domain experts. The initial step is to populate the database and then establish the relationships.

There are different approaches by which the knowledge base can be populated. A rule-based knowledge base is one of the typical and primary approaches adopted. In the rule-based approach, the rules are derived according to criteria based on inputs from emergency room (ER) details. Figure 2.1 shows sample ER data, and the rules based on that sample are presented in Figure 2.2.

FIGURE 2.1 ER data and the rules.

FIGURE 2.2 ER-based rules.
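As a minimal sketch of ER-derived rules of the kind Figure 2.2 describes, a small rule function can map observations to actions. The thresholds and field names below are hypothetical illustrations, not values taken from the figures.

```python
# Illustrative IF-THEN rules over one ER record.
# Thresholds and field names are hypothetical, not copied from Figure 2.1.

def triage_rules(record):
    """Apply simple condition -> action rules and return the fired actions."""
    actions = []
    if record["systolic_bp"] >= 180:
        actions.append("flag: hypertensive crisis")
    if record["temperature_c"] >= 38.0 and record["heart_rate"] > 100:
        actions.append("flag: possible infection")
    if not actions:
        actions.append("routine follow-up")
    return actions

patient = {"systolic_bp": 185, "temperature_c": 37.1, "heart_rate": 88}
print(triage_rules(patient))  # ['flag: hypertensive crisis']
```

Such rules can be modified or replaced as evaluation data comes in, which is exactly the revision loop described in the text.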

Once the rules are developed, the models are implemented and then evaluated based on the data. The rules can be modified or changed, if required, as per the inputs from the evaluation data.

Another approach is to extract knowledge from the experts. Eliciting information from the experts requires sound knowledge of the requirements and of the type of knowledge that is needed. The questions raised to the experts should map closely onto the decision-making process, and the actual decision-making process should be very well reflected. Another important factor is to streamline the data recording process, so that the factors being recorded are represented in the same way as is followed in the medical field. The outcome of knowledge elicitation should lead from a knowledge state to a decision-making state.

When multiple experts are involved in knowledge elicitation, different factors should be taken care of: experts will follow different strategies to diagnose and treat the same diseases.

Furthermore, the rules specified by different experts can contradict one another and be inconsistent. Such inconsistencies should be removed to obtain the most accurate and optimum model: contradictory information present in the knowledge base, as well as rules that would lead to inconsistent conclusions, should be removed (Parmar, 2018).

One of the major concerns with a knowledge base and its related model is that the model cannot be complete. All the different cases of symptoms cannot be included, and there are boundaries to the expert system that should be accepted. The boundary can be overcome to some extent by continuously updating the information in the knowledge base as the field changes and knowledge expands.

An alternative to expert elicitation is to learn through examples. Rote learning, advice taking, and induction are three different types of learning through examples. Rote learning is simply storing the information presented to the system, similar to human learning; at this level, knowledge organization is required. In an advice-taking system, the steps involved are based on deducing and understanding the consequences of actions and updating the knowledge base according to the inputs on those actions; the evaluation enables a better understanding of the knowledge base. Induction is based on example-based learning, which involves supervised and unsupervised learning methods, the representation of simple and complex concepts, etc. In all cases, the representation of the primary knowledge and the associated rules is the biggest challenge: each of the training data points must be mapped to a rule (Li, 2018).

Knowledge acquisition requires information associated with the knowledge base, which can be acquired through a better understanding of the meta-data. Meta-data is data that deals with the domain knowledge; it represents the depth of understanding of the knowledge base, the associated rules, and their correctness. Meta-data further represents the process and steps taken by the AI-based models to reason and reach a decision.

2.3 KNOWLEDGE REPRESENTATION

Biomedical data takes different forms and has to be derived from different forms of medical records. The different types of data encountered are patient history, physical exams, laboratory tests, pathology

reports, imaging reports, and different tests. The conversion of medical data to electronic data is important, since models developed using erroneous data will be faulty. The different data forms that can be considered while converting biomedical data to electronic data are specified in Section 2.3.1. The conversion of medical data to electronic data is the first step and is discussed in Section 2.3.2. The way knowledge is represented in AI-based systems plays a major role in the decision support system; there are various approaches to it, and Section 2.3.3 onward gives the details of the different approaches to knowledge representation.

2.3.1 DATA FORMS

The different forms of data that can be derived from medical records are presented further.

• Binary/bipolar data with two possible responses that can assume the values 0/1 or −1/+1.
• Categorical data with values that can be ranked, for example, from worst to best. These types of data need to be numerically coded in ordered form.
• Integer and continuous data include variables, such as temperature, with an inherent order. The main considerations for such data are precision and accuracy.
• Fuzzy data can be used to represent information that is associated with imprecision.
• Temporal data represents findings related to the biomedical data over a period of time; time-series data is a category of temporal data that records and presents the patterns of readings associated with different biological functions.
• Image data: digitized images are the outcome of different medical imaging technologies used in the diagnosis of different diseases. Images are represented as pixels that determine their resolution, and image data is analyzed using pixel values.

FIGURE 2.3 Patient record.
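The data forms listed in Section 2.3.1 can be coded numerically before modeling. A small sketch follows; the field names and the category ordering are illustrative assumptions, not taken from any figure.

```python
# Numeric coding of the data forms: binary as 0/1, ordered categories by
# rank, continuous values kept as floats. Names and ordering are assumed.

severity_order = ["worst", "bad", "fair", "good", "best"]  # ordered categorical

def encode(record):
    """Code one record numerically for model input."""
    return {
        "smoker": 1 if record["smoker"] else 0,                  # binary data
        "condition": severity_order.index(record["condition"]),  # ordinal code
        "temperature": float(record["temperature"]),             # continuous
    }

print(encode({"smoker": True, "condition": "fair", "temperature": 37.2}))
# {'smoker': 1, 'condition': 2, 'temperature': 37.2}
```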

2.3.2 MEDICAL DATA TO ELECTRONIC DATA

Figure 2.3 shows a sample patient record from which data need to be extracted. Typically, patients will have medical records of different volumes. The extraction of medical information from handwritten records is challenging, since they can be illegible and open to multiple interpretations (Sarjapu, 2016). Therefore, the format should be designed in such a way that the information relevant to the current decision-making process is specified. A sample data collection format associated with a model for cardiac diseases is presented in Figure 2.4 (Hudson and Cohen, 2000).

FIGURE 2.4 ER data converted to electronic data.
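A collection format of the kind Figure 2.4 illustrates can be sketched as a fixed set of decision-relevant fields onto which the extracted chart data is mapped. The field names below are assumptions for illustration, not copied from the figure.

```python
# Hypothetical collection format for a cardiac model: only the fields
# relevant to the decision are kept; everything else is dropped.

CARDIAC_FIELDS = ["age", "sex", "systolic_bp", "cholesterol", "chest_pain_type"]

def to_electronic(raw_notes):
    """Map a free-form dict extracted from a chart onto the fixed format;
    unknown fields are dropped and missing ones are recorded as None."""
    return {field: raw_notes.get(field) for field in CARDIAC_FIELDS}

chart = {"age": 61, "systolic_bp": 142, "handwriting_note": "illegible"}
print(to_electronic(chart))
# {'age': 61, 'sex': None, 'systolic_bp': 142, 'cholesterol': None, 'chest_pain_type': None}
```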

2.3.3 DIFFERENT REPRESENTATIONS

The data, once derived from the biomedical records, need to be processed and represented in the proper format based on the type of information that is available (Friedman, 2018). These different ways of representing the data will aid in building the decision support systems of the AI models. There are different approaches that can be adopted to represent the knowledge based on the requirements.

2.3.3.1 DATABASE

Databases provide primary physical storage at a logical level of data organization that makes storage of the records more efficient; that is, a database is used in the logical organization of data. There are different database organizations based on mathematical relations that are being

used at different levels to reduce the complexity of data retrieval, such as the hierarchical, relational, and object-oriented structures. Different database products and service providers offer databases based on the relational structure, since it has a strong mathematical basis. Data in the database is stored as tables, with each characteristic of the biomedical signal stored as an attribute. The records of each patient are stored as a row in the table. The rows are termed tuples and are typically unordered; attributes are also unordered. Typically, the entities represented as tables will be related to each other, and the relationships are specified. Further, when the databases follow a hierarchical structure, top-level databases have fields that refer to databases at the lower levels (Liu, 2018).

In general, a database follows a relational structure. Figure 2.5 shows a sample patient relation (Healthcare, 2019). The advantage of a database over a traditional file structure is that a database provides data consistency and integrity. Relational databases, however, fail to integrate the knowledge of the domain with the raw data. To achieve this, an advanced database architecture can be adopted: the object-oriented database. The features of object-oriented modeling, such as the combination of procedures and data, hierarchy, and inheritance, are adapted to integrate the knowledge base with the raw database. Further, object-oriented databases are highly adaptable and can be used for the inclusion of temporal data.

FIGURE 2.5 Sample patient relation (Reprinted from Khan, 2010).

The database enables efficient addition, deletion, and retrieval of data, accomplished through the querying language. The query language is based on relational algebra. Logical constructs, such as AND, OR, and NOT, are used in the language for data manipulation. Further, constructs such as SELECT, WHERE, COUNT, AVERAGE, and MAX are used to derive the data in the desired form (Agrawal, 2018).
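The query constructs named above can be exercised against a small in-memory patient relation; the schema and values below are made up for illustration (note that standard SQL spells the average aggregate AVG).

```python
import sqlite3

# In-memory patient relation (schema and values are illustrative).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patient (id INTEGER, age INTEGER, systolic_bp INTEGER)")
con.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                [(1, 54, 130), (2, 67, 150), (3, 45, 118)])

# SELECT ... WHERE restricts the tuples; COUNT, AVG, MAX aggregate them.
row = con.execute(
    "SELECT COUNT(*), AVG(age), MAX(systolic_bp) FROM patient "
    "WHERE systolic_bp >= 120"
).fetchone()
print(row)  # (2, 60.5, 150)
```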

Based on location, a database can be centralized or distributed: a database present in a single location is termed centralized, and one spread across multiple locations is distributed. The adoption of the different types of databases is mainly based on the requirements and other factors that concern the client.

2.3.3.2 DATABASE AND KNOWLEDGE-BASED SYSTEMS FOR TEMPORAL DATA

The focus of storing temporal data is to identify the occurrence of an event and the time at which the event occurred. These data are termed point events. Further, the point events are classified into simple, complex, and interval events, where complex events are a combination of simple and interval events, and interval events are events that happen over an interval of time. An example of a simple event is an ECG reading. Interval events can be stress, blood pressure, etc. A complex reading can be chest pain, where the heartbeat reading and BP are also considered. The representation of these in the database requires the time to be captured along with the event. In knowledge-based systems, the temporal variable, wherever relevant, should be taken and recorded in a usable format.

2.3.3.3 FRAMES

A frame is another approach to knowledge representation that includes general information related to an event. It can be modified and adjusted to fit different situations. This mimics knowledge representation in humans, where preliminary knowledge of a situation, when encountered, is used to perform actions, and the person's knowledge is updated accordingly (Priyadarshini and Hemanth, 2018). This is achieved by adopting the object-oriented property called inheritance. Each frame associated with a user inherits the general information in the mainframe. Further, user-specified details are filled into the associated fields and subframes. Triggers can further be used to follow a data-driven approach in the frames; the triggers are invoked in case of a change in the frame field values. Figure 2.6 shows a general frame structure and the inherited user-specific frame.

FIGURE 2.6 Sample general frame structure and inherited user-specific frame.
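The frame behavior described above (a general mainframe, inherited user-specific frames, and triggers fired on field changes) can be sketched with class inheritance. The class and slot names are illustrative assumptions.

```python
# Frame-style knowledge representation via inheritance (names illustrative).

class GeneralFrame:
    """Mainframe holding default, general information."""
    clinic = "General Medicine"          # general slot inherited by subframes

    def __init__(self):
        self.fields = {}

    def set_field(self, name, value):
        old = self.fields.get(name)
        self.fields[name] = value
        if old is not None and old != value:
            self.on_change(name, old, value)     # data-driven trigger

    def on_change(self, name, old, new):
        print(f"trigger: {name} changed {old} -> {new}")

class PatientFrame(GeneralFrame):
    """User-specific frame; inherits the general slots of the mainframe."""
    def __init__(self, patient_id):
        super().__init__()
        self.patient_id = patient_id

p = PatientFrame("P-001")
p.set_field("temperature", 37.0)
p.set_field("temperature", 38.4)  # prints: trigger: temperature changed 37.0 -> 38.4
print(p.clinic)                   # inherited default: General Medicine
```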

2.3.3.4 PRODUCTION RULES

The other approach to representing the knowledge base is termed production rules. It is one of the earliest, most flexible, and most important knowledge representation approaches. A production rule-based approach can be divided into two parts: the first identifies the different situations that can be encountered in a domain, and the second identifies the actions that can be taken in those situations. The situation, in general, is termed a condition, and it is followed by actions.

The system works in such a way that the occurrence of a condition triggers the action. There are cases where multiple conditions occur; in that case, the conditions are conjoined and the actions are performed only if all the specified conditions are true. To confirm the occurrence of the conditions, the system directs questions to the patient/user. Once the inputs for the conditions are obtained, the model gives the specified outcome (Panyam, 2018).

Even with all the associated rules and specifications, processing the language is very challenging. Since expert systems are domain specific, the language used will only be a subset of natural language. Further, to overcome this, there are systems that allow some conditions to be matched and give responses accordingly. Rule-based systems should be data driven, where the user is allowed to enter information that can then be matched against the conditions.

Rules or combinations of rules are represented logically with an AND/OR tree. The tree presents a logical grouping of the conditions and the actions: the leaf nodes of the tree represent the actions, and the root and internal nodes represent facts or conditions. Figure 2.7 presents a general structure of an AND/OR tree based on the production rules.

FIGURE 2.7 AND/OR tree. Pi is the production rule and Ai is the action. The edges from the same node represent the AND operation.
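The AND/OR grouping described above can be sketched as a flat rule list: conditions within one rule are conjoined (AND), while separate rules for the same action act as OR branches. The conditions and actions below are hypothetical.

```python
# Minimal AND/OR evaluation of production rules (conditions/actions assumed).

RULES = [
    # (action, conditions that must ALL hold -- one AND branch)
    ("order ECG", ["chest_pain", "age_over_50"]),
    ("order ECG", ["palpitations"]),          # alternative OR branch
    ("order X-ray", ["persistent_cough"]),
]

def fired_actions(facts):
    """Return actions whose AND branch is fully satisfied by the facts."""
    out = []
    for action, conditions in RULES:
        if all(c in facts for c in conditions) and action not in out:
            out.append(action)
    return out

print(fired_actions({"chest_pain", "age_over_50"}))          # ['order ECG']
print(fired_actions({"palpitations", "persistent_cough"}))   # ['order ECG', 'order X-ray']
```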

In situations where the values of multiple conditions are considered, the order of the conditions is analyzed to see whether it matters; if so, the order of the conditions is set. Further, in case of conflict among conditions, the priority in which the conditions should be considered is set.

The main drawback of production rules is that the model is limited to the factors presented in the rules; a scenario in which other conditions are present will not be considered by such a system. The rule search strategies are limited to forward chaining and backward chaining. In certain rule-based systems, the rules allow standard conditions, and the inclusion of AND, OR, NOT, COUNT, etc. along with a certainty factor will provide more accurate results. Each production rule is associated with a certainty factor, a value between zero and one. The AND nodes in the AND/OR tree have a certainty factor that is calculated based on the outcome of the situation and carried forward; 1 − certainty factor is considered for the evaluation of the outcome (Yu, 2015). This applies mainly to cases where an ad hoc approach is used with uncertain information. Even then, there are factors that make production rule-based systems inefficient, primarily the restriction to a formal specification and domain. On the other hand, such a system can be used to keep track of the rule searching order, and it mimics human reasoning in standard conditions with a uniform structure.

2.4 FEATURE SELECTION

The primary factor that influences the design and development of an AI model is the input variables. The process of identifying these input variables is the preliminary step in the design of an AI model and is called feature extraction. The process is uniform for supervised and unsupervised learning models.

The first step of any feature extraction phase is to identify the types of variables that can be used to develop models that will give optimized results. The feature selection process starts with the feature extraction process and depends on the type of model and type of variable. Biomedical systems have continuous variables as well as time series variables; the models developed should be able to accept such inputs.

One aspect of feature extraction is to identify all the possible variables that will contribute positively to the development of AI-based models. Even though only a limited number of variables are allowed to be included as input variables,

an adequate amount of data for each variable should be available for training the models. In the learning process, the weights of the unimportant variables will approach zero. An example of the features that can be considered while modeling using brain signals (Chen et al., 2016) is:

• WM stage is a label.
• Frequency is a continuous variable.
• Feature weight is a discrete variable.

All these variables will be enumerated to develop the model. Of the three variables presented in the example, feature weight is an integer, WM stage changes based on the electrode used and is thus categorical, and frequency has continuous values. The parameters can be clinical and nonclinical; again, the consideration of parameters is purely based on the requirements of the model. Further, images and image-related data play an important role in medical diagnosis; therefore, feature extraction from medical images depends on the goal the model has to achieve. The recognition and classification of different patterns in the images are very important for the accurate development of the models. Researchers have worked in this area and have put forth best practices that can be adopted for classifying the images (Duda and Hart, 1973). Changes in gray levels (Vannier et al., 1992), areas with irregular borders (Wang, 2011), changes from previous images of the same patient (Zhao and Peng, 2010), and asymmetry are some of the features.

Further, moving on to time series data, biomedical signals are recorded in different ways and can be classified as signals with built-in patterns and without built-in patterns (Beam, 2018). Body signals with built-in patterns can be recorded based on their normal variations from the standards (Pyrkov, 2018), whereas signals without built-in patterns need the consultation of medical experts to identify and extract the information that has to be recorded (Peng, 2018).

When modeling the system, the number of variables corresponds to the number of dimensions over which the model must generalize, and it is practically not possible to visualize models with more than three dimensions. An increase in the number of variables therefore results in an increase in dimensions, which makes training the model more time-consuming. Hence, feature selection is crucial in building AI-based models with better accuracy and reliability.

2.5 DESIGN ISSUES

Once the required data is stored in the proper format, AI models can be developed. To achieve this, it is important to consider different issues related to designing the system. The availability of data and the identification of the required outputs and objectives play a major role in system design. Further, a major concern is the specific models that can be adopted for a particular application.

The parameters that will yield the most suitable model are another factor that has an impact on the design of the AI-based models. The decision-making process involved in AI-based models requires appropriate information sources. The different approaches that can be considered are the knowledge-based approach and the data-driven approach; the choice depends mainly on the availability of the data. Expert opinions, if required, can also be integrated in the case of the knowledge-based approach (Sniecinski, 2018).

The input to the system will be the data extracted from the medical records, encoded as per the model requirements. As mentioned in Section 2.3, data represented in the different formats will be manipulated for the modeling. Multiple responses, such as medicines prescribed, categorical data, such as symptoms, and fuzzy data, such as a range of blood sugar, are represented as raw data that need to be encoded in the most suitable way to get the most accurate outcome.

Further, one of the major features of biomedical data is the presence of missing data. If the decision-making AI models have to rely on the charts, then protocols have to be established to deal with the missing data. One protocol is to remove the cases where data fields are missing; the integrity of the data is ensured in this case, but the number of data points is reduced. Alternatively, wherever the data is missing, the minimum possible value or the maximum possible value can be entered, or the average value can be used as the entry in the records. The usage of the maximum, minimum, or average depends mainly on the significance. In the case of a missing blood pressure value, an average of the previous and the next reading can be considered as the input for the missing field (Shah and Chircu, 2018).

Time series data as well as image data need special consideration. The different categories of time-series data include readings from ECG, EEG, biosensors, etc. The related monitoring AI models require real-time analysis and decision-making; therefore, the models should have the least trade-offs in terms of analysis time,

execution time, and interpretation time. When it comes to image data, different imaging techniques are in use for the proper diagnosis of diseases; the image encoding and analysis are the primary concerns, along with the determination of anomalies (Hudson and Cohen, 2000).

Further, the choice of the learning algorithm and of its features should be considered for the stability and convergence of the AI-based system. The integration and testing of multiple algorithms are very much required to ensure that the model selected for the application produces the best possible outcome, rather than producing meaningless outcomes or simply failing to produce an outcome.

The different approaches that can be considered in the development of the learning models are supervised learning models and unsupervised learning models. To develop a well-rounded supervised learning model, a very detailed labeled dataset is required. The model produced will be able to give results under two different categories, namely, classification and regression; supervised learning algorithms can be used, for example, in the classification of malignant tumors or in the prediction of heart failure. Unsupervised learning algorithms, on the other hand, are built on an unlabeled dataset. The models built with such a dataset come under clustering or self-organizing networks, where they can be used to determine the type of malignancy in tumors. These systems have little or zero control over the mapping of patterns to the clusters. Since the data is unlabeled, the models are highly sensitive to the presence of noisy data, and it cannot be determined with fair accuracy whether the solution is correct or not (Yu, 2018).

The results can be interpreted based on various factors, and the approach depends on the type of algorithm. In the case of a supervised learning algorithm, the process of developing the model is divided into the training phase, the testing phase, and the validation phase. In the training phase, the most important parameters to be considered for modeling are determined, and the different cases that have to be classified are also determined; an analysis of whether the classification cases to be considered are present in the dataset should be performed. Further, the model performance is measured based on the outcome of the models during the testing phase: the outcome classified by the models is compared with the already-known outcome to determine the error (Yu, 2018). In the case of the unsupervised learning models, the primary objectives are to identify the number of

clusters that can be derived, and the models need to be evaluated against generally available knowledge. The establishment of different models enables a more accurate differential diagnosis when the new data presented to the model is similar to the original data.

To get the most out of the models, it is always beneficial to scale the data to the same range. Since different attributes have different ranges of data, their implications will differ; therefore, the data and the different attributes need to be scaled down to the same range. To achieve this, normalization of the data can be performed, where all the attribute values are narrowed down to the range 0 to 1 or −1 to +1.

A major limitation of the development of AI-based models is their dependence on the training dataset. The performance of the models mainly depends on the specification of the training model. The design issues presented in Section 2.5 are summarized in Table 2.1.

TABLE 2.1 Design Issues Associated with Building AI Models

Sr. No.  Design Issue
1        Information source/availability of data
2        Parameter selection
3        Encoding approach required for different types of data
4        Handling the missing data
5        Choosing the most appropriate learning algorithm
6        Choosing the required number of classes/clusters and verifying their presence in the available data
7        Determining the most appropriate data preprocessing techniques

2.6 VALIDATION

Validation of AI-based decision support systems involves problems at different levels. Primarily, the outcome of the systems is based on the knowledge base and the parameters that should be considered. The problems and the points that need to be considered in validating the outcome of a decision support system are discussed here; accordingly, the approaches that can be adopted in evaluating performance as well as evaluating the model are presented further in this chapter.

2.6.1 INPUTS FROM THE DOMAIN EXPERT

The knowledge base and the algorithms are not interrelated. But it

can influence the outcome of the expert systems. The development of the knowledge base is based on the inputs from domain experts. The inputs from the domain expert guarantee neither completeness nor consistency of the knowledge base (Ozel, 2016). Upon completion of the development of the knowledge base, it should be verified by the domain expert(s) for its appropriateness.

2.6.2 VALIDATION OF THE DATA AND TRAINING ALGORITHM

Checking of the data can be performed by different approaches. The accuracy of the model based on the training data can be verified against reviews from databases and charts, and studies on the specific research area can be conducted; the data collected in each case can be considered in verifying the accuracy of the models. How appropriate the training data is also needs to be verified, to determine whether the most appropriate parameters are included. To achieve this, statistical analysis of the training dataset needs to be performed. The accuracy of the data, the scaling consistency of the dataset, and the standards to be followed are verified, since the model will perform well only if the training dataset is appropriate and optimized, with all the required attributes present. Further, in cases where standards exist for the correct classification but are not followed, assumptions need to be made; thus, the model can produce inaccurate results. Such cases need to be considered when it comes to validating the appropriateness of the training dataset (Wehbe, 2018).

The validation of the learning algorithm requires the developed model to be tested thoroughly. To ensure this, the points to be considered are whether the selected algorithms are suitable for the input data and whether the model outcome is interpretable. Also, one of the major trade-offs of AI-based models is the training time. To get a well-rounded model, other approaches need to be considered, and the performance of the developed model needs to be compared with other existing models.

2.6.3 PERFORMANCE EVALUATION

The main factor to be considered in measuring the performance is the accuracy of the output generated by the model. The performance of the developed models can be verified by analyzing the outcome of the model using a test dataset.
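Verifying a model's outcome against a test dataset, as described here, can be sketched with the standard error measures; this is an illustrative example, not code from the chapter, and the actual and predicted values below are made up.

```python
# Illustrative sketch (not from the chapter): the two error measures used
# in this section, computed for two hypothetical models' predictions
# against the same test dataset.

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    # expressed in percent; assumes no actual value is zero
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 420.0]   # hypothetical model A predictions
model_b = [150.0, 260.0, 300.0]   # hypothetical model B predictions
# the model with the minimum error is preferred for deployment
print(mean_absolute_error(actual, model_a))   # 13.33...
print(mean_absolute_error(actual, model_b))   # 70.0
```

Model A's smaller mean absolute error would make it the better-performing candidate under this measure.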
48 Handbook of Artificial Intelligence in Biomedical Engineering

The model's outcome can then be compared with the results obtained from other models for the same test dataset. Different error measures, such as the mean absolute error and the mean absolute percentage error, can be considered in verifying the performance of the models. The values of the error measures should be one of the benchmarks in the evaluation of the models, and the model with the minimum error will be considered the best performing model for deployment (Emmert, 2016). When it comes to unsupervised models, apart from all the points mentioned, the applicability of other datasets to the model should also be considered. To further narrow down this applicability, the other datasets should have characteristics similar to those of the training dataset, with the same parameters and accuracy.

2.7 CONCLUSION

In the process of developing AI-based biomedical systems, the primary factor to be considered is the availability of the data and its representation. This chapter presents an overview of the different ways in which biomedical data can be represented and the associated knowledge can be acquired. Further, the different approaches to representing the knowledge are presented, as are the criteria that have to be considered in the selection of features and the design of AI models. It should be noted that the concepts presented are not definitive prescriptions for realizing an AI-based model. Furthermore, the evaluation of a number of parameters is required, and different methodologies can also be beneficial when combined. The performance measure should be considered the evaluation parameter of the model. AI-based models have the capability to adapt to unseen scenarios; even then, the performance of the models in such cases depends mainly on the similarity to the training dataset.

KEYWORDS

• data processing
• data acquisition
• biomedical data source
• feature selection
• data validation

REFERENCES

Agrawal, S., Khan, A. and Kumar, K., International Business Machines Corp, 2018. Query modification in a database management system. U.S. Patent Application 15/424,769.
Ahmed, I.O., Ibraheem, B.A. and Mustafa, Z.A., 2018. Detection of eye melanoma using artificial neural network. Journal of Clinical Engineering, 43(1), pp. 22–28.
Beam, A.L. and Kohane, I.S., 2018. Big data and machine learning in health care. JAMA, 319(13), pp. 1317–1318.
Chen, C.M.A., Johannesen, J.K., Bi, J., Jiang, R. and Kenney, J.G., 2016. Machine learning identification of EEG features predicting working memory performance in schizophrenia and healthy adults. Neuropsychiatric Electrophysiology, 2(1), p. 3.
Codd, E.F., 1970. A relational model of data for large shared data banks. Communications of the ACM, 13(6), pp. 377–387.
Duda, R.O. and Hart, P.E., 1973. Pattern Classification and Scene Analysis. New York, NY, USA: John Wiley & Sons.
Emmert-Streib, F., Dehmer, M. and Yli-Harja, O., 2016. Against dataism and for data sharing of big biomedical and clinical data with research parasites. Frontiers in Genetics, 7, p. 154.
Friedman, C., 2018. Mobilizing Computable Biomedical Knowledge Conference, October 18, 2017. Overview/Opening Remarks. http://slideplayer.com/slide/6207174/20/images/28/Healthcare+example+of+relational+databases.jpg, accessed on 06/05/2019.
Hudson, D.L. and Cohen, M.E., 2000. Neural Networks and Artificial Intelligence for Biomedical Engineering. Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers.
Ju, Z., Wang, J. and Zhu, F., 2011. Named entity recognition from biomedical text using SVM. In 2011 5th International Conference on Bioinformatics and Biomedical Engineering (pp. 1–4). IEEE.
Khan, R.S. and Saber, M., 2010. Design of a hospital-based database system (A case study of BIRDEM). International Journal on Computer Science and Engineering, 2(8), pp. 2616–2621.
Li, B., Li, J., Lan, X., An, Y., Gao, W. and Jiang, Y., 2018. Experiences of building a medical data acquisition system based on two-level modeling. International Journal of Medical Informatics, 112, pp. 114–122.
Liu, Z.H., Lu, J., Gawlick, D., Helskyaho, H., Pogossiants, G. and Wu, Z., 2018. Multi-Model Database Management Systems—A Look Forward. In Heterogeneous Data Management, Polystores, and Analytics for Healthcare (pp. 16–29). Springer, Cham.
Ozel, T., Bártolo, P.J., Ceretti, E., Gay, J.D.C., Rodriguez, C.A. and Da Silva, J.V.L. eds., 2016. Biomedical Devices: Design, Prototyping, and Manufacturing. New York: John Wiley & Sons.
Panyam, N.C., Verspoor, K., Cohn, T. and Ramamohanarao, K., 2018. Exploiting graph kernels for high performance biomedical relation extraction. Journal of Biomedical Semantics, 9(1), p. 7.
Parmar, C., Barry, J.D., Hosny, A., Quackenbush, J. and Aerts, H.J., 2018b. Data analysis strategies in medical imaging. Clinical Cancer Research, 24(15), pp. 3492–3499.
Peng, L., Peng, M., Liao, B., Huang, G., Li, W. and Xie, D., 2018. The advances and challenges of deep learning application in biological big data processing. Current Bioinformatics, 13(4), pp. 352–359.
Priyadarshini, S.J. and Hemanth, D.J., 2018. Investigation and reduction methods of specific absorption rate for biomedical applications: A survey. International Journal of RF and Microwave Computer-Aided Engineering, 28(3), p. e21211.
Pyrkov, T.V., Slipensky, K., Barg, M., Kondrashin, A., Zhurov, B., Zenin, A., Pyatnitskiy, M., Menshikov, L., Markov, S. and Fedichev, P.O., 2018. Extracting biological age from biomedical data via deep learning: too much of a good thing? Scientific Reports, 8(1), p. 5210.
Sarjapur, K., Suma, V., Christa, S. and Rao, J., 2016. Big data management system for personal privacy using SW and SDF. In Information Systems Design and Intelligent Applications (pp. 757–763). New Delhi, India: Springer.
Shah, R. and Chircu, A., 2018. IOT and AI in healthcare: a systematic literature review. Issues in Information Systems, 19(3), pp. 33–41.
Sniecinski, I. and Seghatchian, J., 2018. Artificial intelligence: A joint narrative on potential use in pediatric stem and immune cell therapies and regenerative medicine. Transfusion and Apheresis Science, 57(3), pp. 422–424.
Vannier, M.W., Yates, R.E. and Whitestone, J.J. (eds.), 1992. Electronic Imaging of the Human Body. Wright Patterson Air Force Base, Ohio, USA: CSERIAC.
Villanueva, A.G., Cook-Deegan, R., Koenig, B.A., Deverka, P.A., Versalovic, E., McGuire, A.L. and Majumder, M.A., 2019. Characterizing the Biomedical Data-Sharing Landscape. The Journal of Law, Medicine & Ethics, 47(1), pp. 21–30.
Wehbe, Y., Al Zaabi, M. and Svetinovic, D., 2018, November. Blockchain AI Framework for Healthcare Records Management: Constrained Goal Model. In 2018 26th Telecommunications Forum (TELFOR) (pp. 420–425). IEEE.
Yu, H., Jung, J., Lee, D. and Kim, S., 2015, October. What-if Analysis in Biomedical Networks based on Production Rule System. In Proceedings of the ACM Ninth International Workshop on Data and Text Mining in Biomedical Informatics (pp. 28–28). ACM.
Yu, K.H., Beam, A.L. and Kohane, I.S., 2018. Artificial intelligence in healthcare. Nature Biomedical Engineering, 2(10), p. 719.
Zhao, Q., Peng, H., Hu, B., Liu, Q., Liu, L., Qi, Y. and Li, L., 2010, August. Improving individual identification in security check with an EEG based biometric solution. In International Conference on Brain Informatics (pp. 145–155). Springer, Berlin, Heidelberg.
CHAPTER 3

BIOMEDICAL ENGINEERING AND INFORMATICS USING ARTIFICIAL INTELLIGENCE
K. PADMAVATHI* and A. S. SARANYA
Department of Computer Science, PSG College of Arts and Science,
Coimbatore 641014, Tamil Nadu, India
*Corresponding author. E-mail: padmasakthivel@gmail.com

ABSTRACT

In recent decades, artificial intelligence (AI) has been widely used in various fields of human life. One of the most promising areas of AI is medical imaging. Medical imaging provides an increasing number of features derived from different types of analysis, including AI, neural networks, fuzzy logic, etc. The selection of image processing features and AI technologies can serve as a medical diagnostic tool. AI is used to help and improve numerous aspects of the healthcare system. AI tools and techniques provide considerable insight to power predictive visual analysis. The recent growing applications of AI techniques in biomedical engineering and informatics use knowledge-based reasoning in disease classification, which is used to learn and discover novel biomedical knowledge for disease treatment. AI is used to detect disease at an earlier stage and to guide diagnosis for early treatment with imaging technologies. AI applications are implemented in various biomedical fields for analyzing diseases like myocardial infarction, skin disorders, etc. The tools and techniques of AI are useful for solving many biomedical problems with the use of suitably equipped computer hardware and software applications. This chapter provides a thorough overview of the ongoing evolution in the application of biomedical engineering and informatics using AI techniques and tools. It gives a deeper insight

into the technological background of AI and the impacts of new and emerging technologies on biomedical engineering and informatics.

3.1 INTRODUCTION

Artificial intelligence (AI) is the human–computer interaction and development system that uses human intelligence to do various tasks like visual perception, speech recognition, language translation, robotics, and decision-making. AI and its related technologies offer real practical benefits and innovations in many research areas and their applications. AI is a revolutionary technology that combines intelligent machines and software that work and react like human beings. AI and its applications are used in various fields of human life to solve complex problems in areas like science, engineering, business, and medicine. Recent technological developments in related areas like biomedical engineering, medical informatics, and biomedicine use innovative computer-based systems for decision-making. AI is also used in fields like biology, engineering, and medicine, where it has a great impact through machine learning, neural networks (NNs), expert systems, fuzzy logic, and genetic algorithms.

In biomedical engineering, AI can be used to aid doctors in making decisions without consulting the specialists directly. AI and related decision-support systems help health professionals make clinical decisions. They use medical data and knowledge domains in diagnosis to analyze patients' conditions as well as to recommend suitable treatments for the patients. In addition, AI and related decision-support systems help the patient and the medical practitioner to improve the quality of medical decision-making, increase patient compliance, and minimize iatrogenic disease and medical errors. This chapter presents a detailed approach to each application of AI in biomedical engineering, such as diagnosis, medical imaging, waveform analysis, outcome prediction, and clinical pharmacology (Singh et al., 2014).

3.2 ARTIFICIAL INTELLIGENCE IN MEDICAL IMAGING AND DIAGNOSIS

Medical imaging is a collection of techniques that are used to create visual representations of the body interior for clinical analysis and medical intervention. Medical imaging plays an important role in medical diagnosis, treatment, and medical applications; it seeks to reveal internal structures hidden by

the skin and bones for diagnosing and treating disease. In medicine, AI is used to identify a diagnosis and to give therapy recommendations. In medical diagnosis, artificial neural networks (ANNs) are used to obtain the result of the diagnosis. The ANN has achieved an extraordinary level of success in the medical field and has been applied to various areas of medicine like disease diagnosis, biochemical analysis, image analysis, etc. In recent years, medical image processing has used ANNs for analyzing medical images. The main components of medical image processing that heavily depend on ANNs are medical image object detection and recognition, medical image segmentation, and medical image preprocessing. The various AI imaging technologies help to examine various factors of the human body using radiography, MRI, nuclear medicine, ultrasound imaging, tomography, cardiography, and so on (Smita et al., 2012).

3.2.1 COMMON ARTIFICIAL NEURAL NETWORKS IN MEDICAL IMAGE PROCESSING

In recent years, NN algorithms and techniques have been used in medical image processing because of their good performance in classification and function approximation. NN techniques are mostly used in image preprocessing (e.g., reconstruction and restoration), segmentation, registration, and recognition. Table 3.1 shows the different types of NNs used in the medical field (Rajesh et al., 2016; Yasmin et al., 2013).

TABLE 3.1 Neural Networks Used in the Medical Field

Neural Network               Preprocessing   Segmentation   Registration   Recognition
Hopfield NN                  √               √              –              √
Radial basis function NN     –               –              √              √
Feedforward NN               √               √              –              √
Self-organizing feature NN   √               √              –              √
Probabilistic NN             –               √              –              √
Fuzzy NN                     √               √              –              √
Neural ensemble              √               √              –              √
Massive training NN          √               –              –              √

Image segmentation is an indispensable process in outlining the boundaries of organs and tumors and in the visualization of human tissues during clinical analysis. Segmentation in medical image processing is very important for clinical data analysis, diagnosis, and applications, leading to the requirement for robust, reliable, and adaptive segmentation techniques. Image segmentation and edge detection often follow image registration and can serve as an additional preprocessing step in multistep medical imaging applications (Pratyush et al., 2014). The following subsections describe applications of ANNs where segmentation or edge detection is the primary goal.

3.2.2 USES OF ARTIFICIAL NEURAL NETWORKS IN MEDICAL IMAGE PROCESSING

Medical image processing is a collection of techniques used for disease diagnosis in the medical field. Medical image processing techniques and applications meet these challenges and provide an enduring bridge in the field of medical imaging. It supports quantitative image analysis using authoritative resources and sophisticated techniques, with a focus on medical applications. The common techniques used in medical image processing are

(i) preprocessing,
(ii) segmentation, and
(iii) object recognition.

Preprocessing is used to improve image data by suppressing unwanted distortions and by enhancing or highlighting image features for further processing. Medical image segmentation is a process used to divide the image into meaningful regions using homogeneous properties. It performs operations on images to detect patterns and to retrieve information from them. Object recognition is used to recognize one or several prespecified or learned objects or object classes in medical images.

(i) Preprocessing

In medical image processing, preprocessing is used to enhance the quality of the images when medical images have a poor signal-to-noise ratio. Image reconstruction and image restoration are the two categories that use neural networks to reconstruct and restore medical images. In medical diagnosis, the Hopfield NN and the Kohonen NN are used for image reconstruction, and neural network filters (NFs) are used for image restoration.

The Hopfield NN is a special type of feedback NN: a fully interconnected network of artificial neurons in which each neuron is connected to every other neuron. All artificial neurons have N inputs, in which each input i has an associated weight wi. The weight wi is computed and not changed.

All artificial neurons also have an output. Since every neuron has both an input and an output, all neurons are connected to each other in both directions using patterns. The inputs are received simultaneously by all the neurons, and they output to each other continuously until a stable state is reached.

FIGURE 3.1 Hopfield neural network.

Medical image reconstruction is the process of reconstructing an image from a number of parameters acquired from sensors. The Hopfield NN (Figure 3.1) is used to reconstruct medical images; the task can always be conceptualized as an optimization problem, driving the network to converge to a stable position while at the same time minimizing the energy function. Electrical impedance tomography reconstruction on noisy data requires the solution to employ nonlinear inverse concepts. The problem is generally ill-conditioned and needs regularization based on prior knowledge or simplifying assumptions. Another ANN method is the self-organizing Kohonen NN, a feedforward NN that provides the greatest advantage for the problem of medical image reconstruction as compared to other methods.

FIGURE 3.2 Kohonen network.

The Kohonen network (Figure 3.2) consists of an input layer, which distributes the inputs to each node in a second layer, the so-called competitive layer. In the competitive layer, each node acts as an output node. Each neuron in the competitive layer is connected to other neurons in its neighborhood, and feedback is restricted to neighbors through these lateral connections. Neurons in the competitive layer have excitatory connections to immediate neighbors and inhibitory connections to more distant neurons. All neurons in the competitive layer receive a mixture of excitatory and inhibitory signals from the input layer neurons and from other competitive layer neurons.
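The competitive layer just described can be illustrated with a small sketch (not from the chapter): a one-dimensional Kohonen layer in which the winning node and its immediate neighbors move toward each input. The node count, learning rate, neighborhood radius, and two-cluster toy data are made-up assumptions.

```python
# Illustrative sketch (not from the chapter): competitive learning in a
# tiny one-dimensional Kohonen (self-organizing) layer. The node whose
# weight vector is closest to the input wins; it and its immediate
# neighbours are pulled toward the input.

def winner(weights, x):
    # competition: index of the node closest to the input
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def train_som(weights, inputs, epochs=50, lr=0.2):
    for _ in range(epochs):
        for x in inputs:
            j = winner(weights, x)
            for n in (j - 1, j, j + 1):      # neighbourhood of radius 1
                if 0 <= n < len(weights):
                    weights[n] = [w + lr * (xi - w)
                                  for w, xi in zip(weights[n], x)]
    return weights

# two clusters of 2-D inputs; four competitive nodes
inputs = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
weights = [[0.2, 0.2], [0.4, 0.4], [0.6, 0.6], [0.8, 0.8]]
weights = train_som(weights, inputs)
print(winner(weights, [0.05, 0.05]) != winner(weights, [0.95, 0.95]))
```

After training, inputs from the two clusters excite different regions of the competitive layer, which is the behavior the lateral excitation/inhibition scheme is designed to produce.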

This method is used to compute the linear approximation associated with the inverse problem directly from the forward problem of finite element simulation. A high number of NN applications in medical image preprocessing are concentrated in medical reconstruction. In medical image restoration, NFs are used to remove noise from the image. NFs use a neural edge enhancer (NEE), which is based on a modified multilayer NN, to enhance the desired edges. The NEE is robust against noise and enhances the continuous edges found in noisy images.

(ii) Image segmentation

Image segmentation is used to extract a specific part from the medical images. In AI-based medical image segmentation, feedforward NNs are used to formulate image segments. Segmentation is the partitioning of an image into smaller parts that are coherent according to some criteria. In this classification task, segmentation is achieved by assigning labels to individual pixels.

In a feedforward network, information flows in one direction from the input layer to the final output layer via the intermediate hidden layers. The feedforward network uses the backpropagation (BP) supervised learning algorithm to dynamically alter the weight and bias values for each neuron in the network. A multilayer perceptron (MLP) is a special type of feedforward network employing three or more layers, with nonlinear transfer functions in the hidden-layer neurons. MLPs are able to associate training patterns with outputs for nonlinearly separable data. In medical image processing, the MLP performs segmentation directly on the pixel data or on image features. Therefore, the MLP performs segmentation in two different ways: (a) pixel-based segmentation and (b) feature-based segmentation.

(a) Pixel-based segmentation: A feedforward ANN (Figure 3.3) is used to segment the medical images using image pixels. The BP supervised learning algorithm classifies and segments the medical image content based on

• texture or a combination of texture and local shape,
• connecting edge pixels,
• identification of surfaces, and
• clustering of pixels.

In medical applications, supervised classifiers are used to perform the desired segmentation.

(b) Feature-based segmentation: Feedforward ANNs classify and segment the medical images based on

• texture or a combination of texture and local shape,
• estimation of ranges,
• connecting edges and lines, and
• region growing.

Texture segregation is one of the segmentations most frequently performed by feature-based ANNs. In feedforward NN-based

segmentation, segmented images appear less noisy, and the classifier is also less sensitive to the selection of the training sets.

FIGURE 3.3 Multilayer feedforward network.

(iii) Object recognition

Object recognition consists of locating the positions, and possibly the orientations and scales, of instances of objects in an image. The purpose of object recognition is to assign a class label to a detected object. In medical applications, ANNs are used to recognize and locate individual objects based on pixel data. Recurrent ANNs (RNNs) are used for object recognition in medical image processing. RNNs are supervised machine learning models made of artificial neurons with one or more feedback loops; the feedback loops are recurrent cycles over time or sequence. An RNN is trained to minimize the difference between the output and target pairs (i.e., the loss value) by optimizing the weights of the network for object recognition. The use of recurrence in an RNN amounts to averaging, which helps to give a more robust performance in object recognition.

3.3 WAVEFORM ANALYSIS

A waveform is the graphical representation of a signal in the form of a wave, derived by plotting a characteristic of the wave against time. Various inputs are used to create a waveform. Waveforms are used to represent different things in various fields like science, biochemistry, and medicine. In the medical field, different kinds of waveforms are used to diagnose diseases. The most frequently used waveforms are

• ECG—electrocardiography,
• EEG—electroencephalography, and
• EMG—electromyography.

These waveforms are bioelectric signals produced in human bodies by the coordinated activity of a large group of body cells. ECG, EEG, and EMG systems are used to measure the cell activity of the heart, the brain, and muscle/nerve, respectively (Table 3.2). These waveforms are measured as bioelectric potentials on the surface of active tissue.

TABLE 3.2 Waveforms—Frequency Ranges

Bioelectric Signal    Frequency Range (µV)
ECG                   50–5
EEG                   2–100
EMG                   20–5

ECG: ECG is the process of recording the electrical activity of the heart over a period of time using electrodes placed on the human body (Figure 3.4). These electrodes detect the electrical changes on the membrane that arise from the heart muscle depolarizing during the heartbeat. The electrocardiogram is a diagnostic tool that is used to measure the electrical and muscular functions of the heart. It is used to measure the rate and rhythm of the heartbeat and to provide evidence of blood flow to the heart muscle.

FIGURE 3.4 ECG waveform.

EEG: An electroencephalogram is a process that is used to detect the electrical activity of the brain using small metal discs (electrodes) attached to the scalp (Figure 3.5). Human brain cells communicate via electrical impulses, and these pulses are active all the time, even when a human is asleep. This brain activity shows as wavy lines and can be recorded as an EEG. The EEG is used to measure changes in brain activity and is one of the main diagnostic tools for assessing brain disorders like epilepsy, a seizure disorder. An EEG can also help in diagnosing and treating the following brain disorders:

• Brain tumor
• Brain damage from a head injury
• Brain dysfunction that can have a variety of causes (encephalopathy)
• Inflammation of the brain (encephalitis)
• Stroke
• Sleep disorders

FIGURE 3.5 EEG waveforms.

EMG: EMG is a method used to assess the health of the muscles and the nerve cells of the body. EMG can identify nerve dysfunction, muscle dysfunction, or other problems with nerve-to-muscle signal transmission. In the human body, motor neurons transmit electrical signals between the nerves and muscles. EMG uses electrodes to translate these signals into graphs, sounds, or numerical values (Figure 3.6). The following nerve or muscle disorders can be identified by using EMG:

• muscular dystrophy, polymyositis, or myasthenia gravis;
• nerve disorders like carpal tunnel syndrome or peripheral neuropathies;
• amyotrophic lateral sclerosis or polio; and
• a herniated disk in the spine.

FIGURE 3.6 EMG waveform.

3.3.1 WAVEFORM ANALYSIS IN THE MEDICAL FIELD

3.3.1.1 DIAGNOSIS OF HEART DISEASE USING ANN AND ECG

The ECG is used to measure the bioelectrical activity of the heart. Variations in the signal amplitude and duration of the ECG are used to detect cardiac abnormalities. These variations are implemented in a computer-aided diagnosis system that can help in monitoring and diagnosing cardiac health status. Information extraction can be done easily using the ECG because of its nonlinear dynamic behavior. ANNs are effectively used for detecting morphological changes in nonlinear signals such as the ECG signal because ANNs use a pattern-matching technique based on nonlinear input–output mapping. A feedforward multilayer NN with the error BP learning algorithm is used to investigate, monitor, recognize, and diagnose heart disease using ECG signals. The following steps are used to identify heart disease using a feedforward multilayer NN with the error BP learning algorithm (Sayad et al., 2014; Olaniyi et al., 2015) (Figure 3.7).

FIGURE 3.7 Common steps used in heart disease detection.
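The detection steps shown in Figure 3.7 can be sketched as a toy pipeline; the moving-average filter, the fixed detection threshold, and the synthetic spiky signal below are simplifying assumptions for illustration, not the chapter's actual method.

```python
# Illustrative sketch (not from the chapter): skeleton of the first two
# detection steps (noise removal, then QRS/R-peak detection) on a toy
# sampled ECG-like signal.

def preprocess(signal, width=3):
    # (i) noise removal with a simple moving-average filter
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out

def detect_r_peaks(signal, threshold=0.3):
    # (ii) crude QRS detection: a rising edge above a fixed threshold,
    # followed by a plateau or fall (one detection per beat)
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > threshold
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]

# toy signal: two beats, with spikes at sample indices 5 and 15
ecg = [0.0] * 20
ecg[5] = 1.0
ecg[15] = 1.0
filtered = preprocess(ecg)
peaks = detect_r_peaks(filtered)
print(len(peaks))   # one detection per beat: 2
```

Real QRS detectors use slope, amplitude, and width jointly, as the text below describes; this skeleton only fixes the overall shape of the pipeline.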

3.3.2 STEPS USED IN HEART DISEASE DETECTION

(i) Preprocessing: ECG signals may be corrupted by various kinds of noise, such as baseline wander, alternating current power noise, muscular contraction noise, and

electrode contact noise. To remove these noises, ECG signals are preprocessed using a filtering algorithm.

(ii) QRS complex detection: QRS complex detection is used to detect and recognize QRS complexes based on analyses of the slope, amplitude, and width. The recognition of the QRS onset and offset is necessary for computer-based analysis to assess the QRS complexes in the ECG. The QRS onset identifies the beginning of the Q wave, or of the R wave if no Q wave is present. The QRS offset identifies the end of the S wave. The edges are detected as the point with zero slope where there is a sudden change, or by a minimum-distance method.

(iii) ST-segment analysis: The ST segment represents the period of the ECG after depolarization in the QRS complex and before repolarization in the T wave. Changes in the ST segment of the ECG indicate a deficiency in the blood supply to the heart muscle. An ST-segment analyzer is used to make measurements of the ST segment.

(iv) Classification: In classification, various ECG characteristics and features are used.

Heart rate: It is the interval between two successive QRS complexes.

Change in heart rate: It is the difference between two successive heart rates.

QRS complex width: It is the duration between the QRS complex onset and offset.

Normalized source entropy of the QRS complex: It is determined by the part of the signal contained in the QRS complex.

Normalized source entropy of the ST wave: It is determined by the part of the signal contained in the ST segment.

Complexity parameter for the QRS complex: It is a Lempel–Ziv temporal complexity parameter that is determined by the part of the signal containing the QRS complex.

Complexity parameter for the ST wave: It is a Lempel–Ziv temporal complexity parameter that is determined by the part of the ECG signal containing the ST segment.

Spectral entropy: It is Shannon's spectral entropy, determined by using the entire heartbeat.

RT interval: It is the time between the occurrence of the R peak and the T peak.

ST segment length: the ST segment deviation and the ST segment angle of deviation.

Using these characteristics and features as the input data set, the BP learning algorithm is implemented as the classifier. In the BP learning algorithm, the system weights are randomly assigned at the beginning and then progressively modified in the light of the desired outputs for a set of training inputs. The difference between the desired output and the actual output is calculated for every input, and the weights are altered in proportion to the error factor. The process is continued until the system error is reduced to an acceptable limit.

The modified weights correspond to the boundary between the various classes, and to draw this boundary accurately, the ANN requires a large training data set that is evenly spread throughout the class domain. For quick and effective training, it is desirable to feed the data from each class in a routine sequence, so that the correct message about the class boundaries is communicated to the ANN.

In the BP algorithm, the modifications are effected starting from the output layer and progress towards the input. The weight increments are calculated according to the following equations:

Δwij = η δj oi    (3.1)

δj = fj′(netj) (tj − oj)    if unit j is an output unit    (3.2)

δj = fj′(netj) Σk δk wjk    if unit j is a hidden unit    (3.3)

where η denotes the learning factor (a constant); δj denotes the error (the difference between the real output and the teaching input) of unit j; tj denotes the teaching input of unit j; oi denotes the output of the preceding unit i; i denotes the index of the predecessor to the current unit j, with link wij from i to j; j denotes the index of the current unit; and k denotes the index of a successor to the current unit j, with link wjk from j to k.

The BP algorithm requires both the activation function of the neuron and its (first) derivative to be of finite magnitude and single valued. The input layer consists of nodes to accept data, and the subsequent layers process the data using the activation function. The output layer has four neurons, giving rise to an output domain of 16 possible classes.

3.3.1.2 DIAGNOSIS OF BRAIN TUMOR USING ANN AND EEG

ANN and EEG help to identify brain tumors with image processing techniques. Brain tumor classification is done in two stages: feature extraction and classification. In the ANN, a backpropagation network (BPN) classifier is used to evaluate the performance and classification accuracy.

the EEG signal is recorded and stored in digital form. The necessary features are extracted using principal component analysis (PCA). The steps of the PCA algorithm are as follows:

Step 1: Prepare the data:
• Center the data: subtract the mean from each variable. This produces a data set whose mean is zero.
• Scale the data: if the variances of the variables in your data are significantly different, it is a good idea to scale the data to unit variance. This is achieved by dividing each variable by its standard deviation.

Step 2: Calculate the covariance/correlation matrix.

Step 3: Calculate the eigenvectors and the eigenvalues of the covariance matrix.

Step 4: Choose the principal components: the eigenvectors are ordered by eigenvalue from the highest to the lowest. The number of chosen eigenvectors will be the number of dimensions of the new data set: Eigenvectors = (eig_1, eig_2, …, eig_n).

Step 5: Compute the new data set:
• Transpose the eigenvectors: rows are eigenvectors.
• Transpose the adjusted data (rows are variables and columns are individuals).
• NewData = eigenvectors.Transposed * adjustedData.Transposed.

In the BPN classifier, a two-layer feedforward network is used for classification. The feedforward network covers smaller regions for classification because it has a set of hidden layers and the output nodes. The steps used in BPN are
• storing,
• sampling,
• finding similarity matching,
• updating,
• repeating the four steps again, and
• spreads chosen by normalization.

3.3.1.3 USES OF ANN AND EMG IN DIAGNOSIS OF MUSCULAR DISORDERS

EMG is used to find the function of the muscles and the nerves of the human body. EMG signal studies are used to help in the diagnosis of muscle and nerve disorders like dystrophies and neuropathies. The classification of EMG diseases is done by various techniques like the feedforward network, BPN, PCA, and the support vector machine (SVM). PCA is used for extracting the desired information from relevant or irrelevant data sets because it uses a nonparametric method for information extraction. Classification of EMG diseases is a challenging and complex task.
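The five PCA steps listed above can be sketched with NumPy. This is a minimal, generic illustration on a made-up data matrix (rows are variables, columns are observations), not the chapter's EEG feature-extraction pipeline; the optional unit-variance scaling of Step 1 is omitted for brevity.

```python
import numpy as np

def pca(data, n_components):
    # Step 1: center each variable (subtract its mean) so every row has zero mean
    centered = data - data.mean(axis=1, keepdims=True)
    # Step 2: covariance matrix of the variables (rows of `data`)
    cov = np.cov(centered)
    # Step 3: eigenvalues and eigenvectors (eigh suits the symmetric covariance matrix)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: order eigenvectors by eigenvalue, highest first, keep n_components
    order = np.argsort(eigvals)[::-1][:n_components]
    eigenvectors = eigvecs[:, order]
    # Step 5: NewData = eigenvectors.Transposed * adjustedData
    return eigenvectors.T @ centered

# two variables observed five times (arbitrary example values)
data = np.array([[2.5, 0.5, 2.2, 1.9, 3.1],
                 [2.4, 0.7, 2.9, 2.2, 3.0]])
new_data = pca(data, n_components=1)  # one principal component, five scores
```

Because the data are centered before projection, each row of the transformed data set also has zero mean.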
Biomedical Engineering and Informatics Using Artificial Intelligence 63

The classification is done by a BPN with a feedforward network and an SVM classifier.

3.4 OUTCOME PREDICTION USING ANN

The ability to predict the pathological stage of a patient with prostate cancer is important because it enables clinicians to better determine the optimal treatment and management strategies. This is to the patient's considerable benefit, as many of the therapeutic options can be associated with significant short- and long-term side effects. For example, radical prostatectomy, that is, the surgical removal of the prostate gland, offers the best chance of curing the disease when prostate cancer is localized, and the accurate prediction of the pathological stage is fundamental to determining which patients would benefit most from this approach. Currently, clinicians use nomograms to predict a prognostic clinical outcome for prostate cancer, and these are based on statistical methods such as logistic regression. However, cancer staging continues to present significant challenges to the clinical community (Koprowsk et al., 2012).

The prostate cancer staging nomograms that are used to predict the pathological stage of the cancer are based on results from the clinical tests. Cancer prediction systems that consider various variables for the prediction of an outcome require computationally intelligent methods for efficient prediction outcomes. Although computational intelligence approaches have been used to predict prostate cancer outcomes, very few models for predicting the pathological stage of prostate cancer exist. In essence, classification models based on computational intelligence are utilized for prediction tasks. Classification is a form of data analysis that extracts classifier models describing data classes and uses these models to predict categorical labels (classes) or numeric values. When the classifier is used to predict a numeric value, as opposed to a class label, it is referred to as a predictor. Classification and numeric prediction are both types of prediction problems, and classification models are widely adopted to analyze patient data and extract a prediction model in the medical setting. Computational intelligence approaches, and in particular fuzzy-based approaches, are based on mathematical models that are specially developed for dealing with the uncertainty and imprecision typically found in the clinical data used for the prognosis and diagnosis of diseases in patients. These characteristics make these algorithms a suitable platform on which to base new strategies for diagnosing and staging prostate cancer. In prostate cancer, staging prediction is a process for estimating the likelihood that the disease has spread before treatment
is given to the patient. Cancer staging evaluation occurs before (i.e., at the prognosis stage) and after (i.e., at the diagnosis stage) the tumor is removed. There are three primary clinical-stage tests for prostate cancer:

Biopsy: It is used to detect the presence of cancer in the prostate and to evaluate the degree of cancer aggressiveness.

Digital Rectal Examination (DRE): It is a physical examination that can determine the existence of disease and possibly provide sufficient information to predict the stage of cancer.

Prostate-Specific Antigen (PSA): The PSA test is a blood test that measures the level of PSA in the bloodstream. The PSA test is currently the best method for identifying an increased risk of localized prostate cancer. PSA values tend to rise with age, and the total PSA levels (ng/ml) recommended by the Prostate Cancer Risk Management Programme are as follows:

50–59 years: PSA ≥ 3.0,
60–69 years: PSA ≥ 4.0, and
70 and over: PSA > 5.0.

Abnormally raised PSA levels may indicate the presence of prostate cancer, and screening for prostate cancer can reduce deaths from prostate cancer. A limitation of the PSA test is that abnormally high PSA levels may not necessarily indicate the presence of prostate cancer, nor might normal PSA levels reflect the absence of prostate cancer. Pathological staging can be determined following surgery and the examination of the removed tumor tissue and is likely to be more accurate than clinical staging, as it allows direct insight into the extent and nature of the disease.

3.4.1 PRIMARY AND SECONDARY GLEASON PATTERNS

A tissue sample (biopsy) is used to detect the presence of cancer in the prostate and to evaluate its aggressiveness. The results from a prostate biopsy are usually provided in the form of the Gleason grade score. For each biopsy sample, pathologists examine the most common tumor pattern (primary Gleason pattern) and the second most common pattern (secondary Gleason pattern), with each pattern being given a grade of 3–5. These grades are then combined to create the Gleason score, which is used to describe how abnormal the glandular architecture appears under a microscope.

For example, if the most common tumor pattern is grade 3, and the next most common tumor pattern is grade 4, the Gleason score is 3 + 4 or 7. A score of 6 is regarded as a low-risk disease, as it poses little danger of becoming aggressive, and a score of 3 + 4 = 7 indicates intermediate risk. Because the first number represents the majority of abnormal tissue in the
biopsy sample, a 3 + 4 is considered less aggressive than a 4 + 3. Scores of 4 + 3 = 7 or 8–10 indicate that the glandular architecture is increasingly more abnormal and associated with a high-risk disease that is likely to be aggressive.

3.4.2 CLINICAL AND PATHOLOGICAL STAGES

The clinical stage is an estimate of the prostate cancer stage, and this is based on the results of the DRE. The pathological stage can be determined if a patient has had surgery and hence is based on the examination of the removed tissue. Pathological staging is likely to be more accurate than clinical staging because it can provide direct insight into the extent of the disease. At the clinical stage, there are four categories for describing the local extent of a prostate tumor (T1–T4). Clinical and pathological staging uses the same categories, except that the T1 category is not used for pathological staging.

• The stages T1 and T2 describe a cancer that is probably organ-confined.
• T3 describes a cancer that is beginning to spread outside the prostate.
• T4 describes a cancer that has likely begun to spread to nearby organs.

Category T1 is when the tumor cannot be felt during the DRE or be seen with imaging such as transrectal ultrasound. Category T1 has three subcategories:

• T1a: Cancer is found incidentally during a transurethral resection of the prostate (TURP), which will have been performed for the treatment of benign prostatic hyperplasia, and the cancer is present in no more than 5% of the tissue removed.
• T1b: Cancer is found during a TURP but is present in more than 5% of the tissue removed.
• T1c: Cancer is found in a needle biopsy that has been performed due to an elevated PSA level.

Category T2 is when the tumor can be felt during a DRE or seen with imaging but still appears to be confined to the prostate gland. Category T2 has three subcategories:

• T2a: Cancer is in one half or less of only one side (left or right) of the prostate.
• T2b: Cancer is in more than half of only one side (left or right) of the prostate.
• T2c: Cancer is on both sides, that is, the left and right sides, of the prostate.
• T3a: Cancer can extend outside the prostate.
• T3b: Cancer may spread to the seminal vesicles.

Category T4 cancer has grown into tissues next to the prostate like the urethral sphincter, the rectum, the bladder, and/or the wall of the pelvis. The TNM staging is the most widely used system for prostate cancer staging and aims to determine the extent of the

• T-stage: primary tumor,
• N-stage: the absence or presence of regional lymph node involvement, and
• M-stage: the absence or presence of distant metastases.

Most medical facilities use the TNM system as an important method for cancer reporting. In prostate cancer staging prediction, classification can be done by various classification algorithms:

• classification using the ANN classifier,
• classification using the naive Bayes classifier, and
• classification using the SVM classifier.

3.4.3 CLASSIFICATION USING THE ARTIFICIAL NEURAL NETWORK CLASSIFIER

In prostate cancer staging prediction, the ANN is trained to recognize the patients who have organ-confined disease or extra-prostatic disease. The pattern recognition NN is used as a two-layer feedforward network, in which the first layer has a connection from the network input and is connected to the output layer that produces the network's output. A log-sigmoid transfer function is embedded in the hidden layer, and a softmax transfer function is embedded in the output layer.

A neuron has R inputs, where R is the number of elements in an input vector. Let an input vector X be a patient record Xi belonging to the class organ-confined disease or extra-prostatic disease. Each input Xi is weighted with an appropriate weight w. The sum of the weighted inputs and the bias forms the input to the transfer function f. Neurons can use a differentiable transfer function f to generate their output. The log-sigmoid function, which generates outputs between 0 and 1 as the neuron's net input goes from negative to positive infinity, is used. The softmax neural transfer function is used to calculate a layer's output from its net input. Softmax functions convert a raw value into a posterior probability, and this provides a measure of certainty. The maximum number of repetitions is set to ϵ = 200 and, to avoid over-fitting, training stops when the maximum number of repetitions is reached. The ANN is trained using the scaled conjugate gradient for fast supervised learning, which is suitable for large-scale problems. The process of training the
ANN involves tuning the values of the weights and biases of the network to optimize network performance, which is measured by the mean squared error network function.

3.4.4 CLASSIFICATION USING THE NAIVE BAYES CLASSIFIER

The naive Bayes classifier is designed for use when the predictors within each class are independent of one another. The naive Bayes classifier classifies data in two steps: training and prediction. The training step uses the training data, which are patient cases and their corresponding pathological cancer stage (i.e., organ-confined disease or extra-prostatic disease), to estimate the parameters of a probability distribution, assuming the predictors are conditionally independent given the class. In the prediction step, the classifier takes any unseen test data and computes the posterior probability of that sample belonging to each class. It subsequently classifies the test data according to the largest posterior probability. The following naive Bayes description is used in the classification process.

Let P(ci|X) be the posterior probability that a patient record Xi will belong to a class ci (the class can be organ-confined disease or extra-prostatic disease), given the attributes of vector Xi. Let P(ci) be the prior probability that a patient's record will fall in a given class regardless of the record's characteristics; and let P(X) be the prior probability of record X, and hence the probability of the attribute values of each record. The naive Bayes classifier predicts that a record Xi belongs to the class ci having the highest posterior probability, conditioned on Xi, if and only if P(ci|X) > P(cj|X) for 1 ≤ j ≤ m, j ≠ i, maximizing P(ci|X). The class ci for which P(ci|X) is maximized is called the maximum a posteriori hypothesis. The classifier predicts that the class label of record Xi is the class ci if and only if

P(X|ci)P(ci) > P(X|cj)P(cj) for 1 ≤ j ≤ m, j ≠ i (3.4)

The naive Bayes outcome is that each patient's record, which is represented as a vector Xi, is mapped to exactly one class ci, where ci = 1, …, n, and n is the total number of classes, that is, n = 2. The naive Bayes classification function can be tuned on the basis of an assumption regarding the distribution of the data. The naive Bayes classifier uses two functions for classification:

• Gaussian distribution (GD) and
• kernel density estimation (KDE).

GD assumes that the variables are conditionally independent given the class label and thereby exhibit a multivariate normal distribution,
whereas KDE does not assume a normal distribution and, hence, is a nonparametric technique.

3.4.5 CLASSIFICATION USING THE SUPPORT VECTOR MACHINE CLASSIFIER

The SVM classification method uses nonlinear mapping to transform the original training data (i.e., the patient dataset) into a higher dimensional feature space. It determines the best separating hyperplane, which serves as a boundary separating the data of the two classes. The best separating hyperplane for an SVM means the one with the largest margin between the two classes: the bigger the margin, the better the generalization error of the linear classifier defined by the separating hyperplane. Support vectors are the points that reside on the canonical hyperplanes and are the elements of the training set that would change the position of the dividing hyperplane if removed. As with all supervised learning models, an SVM is initially trained on existing data records, after which the trained machine is used to classify (predict) new data. Various SVM kernel functions can be utilized to obtain satisfactory predictive accuracy.

The SVM finds the maximum marginal hyperplane and the support vectors using a Lagrangian formulation and solving the equation using the Karush–Kuhn–Tucker conditions. Once the SVM has been trained, the classification of new unseen patient records is based on the Lagrangian formulation. For many "real-world" practical problems, using a linear boundary to separate the classes may not reach an optimal separation of hyperplanes. However, SVM kernel functions that are capable of performing linear and nonlinear hyperplane separation exist. The outcome of applying the SVM for prediction is that each patient record, represented as a vector Xi, is mapped to exactly one class label yi, where yi = ±1, such that (X1, y1), (X2, y2), …, (Xm, ym); hence, yi can take one of two values, either −1 or +1, corresponding to the classes organ-confined disease and extra-prostatic disease.

3.5 ANN IN CLINICAL PHARMACOLOGY

In the pharmaceutical process, the NN finds preformulation parameters for predicting the physicochemical properties of drug substances because of its nonlinear relationships. It is also used in applications of pharmaceutical research, medicinal chemistry, quantitative structure–activity relationship (QSAR) studies, and pharmaceutical instrumental engineering. Its multiobjective concurrent optimization is adopted in the drug discovery process and protein structure analysis. This
section describes the uses of ANN in clinical pharmacology.

3.5.1 STRUCTURAL SCREENING IN THE DRUG DISCOVERY PROCESS

In the drug discovery process, ANN helps to predict how active a chemical compound will be against a given target in the development of new medicines. ANN-based QSAR models are used as the forecasting strategies in virtual screening. In patient care, AI helps to find and examine the picture of the drug; this approach is referred to as virtual screening. For example, if a basic structure of a compound is the input to a NN, it displays various structures similar to that compound; by screening over 1000 compounds in this way, three compounds with high biological activity can be identified.

3.5.2 TOXICITY PREDICTION

ANN can be used as an integral part of pharmacotoxicology, especially in quantitative structure–toxicity relationship (QSTR) studies. QSTR is a connection between a substance's descriptors and its toxicological activity and can be used to predict the lethality of compounds. Like QSAR, the molecular descriptors of QSTR are predicted from the physicochemical properties of the compounds and related to a toxicological activity of interest through ANN. For example, with the topology method used as the input mode of a network, an ANN-QSTR model was validated on 23 substituted benzene derivatives; the correlation coefficient between the predicted and actual toxicological activities of these compounds was observed to be 0.9088.

3.5.3 DESIGNING OF PREFORMULATION PARAMETERS

ANN modeling has been utilized to enhance the preformulation parameters and to estimate the physicochemical properties of amorphous polymers. Such models predict the absorption, glass temperatures, and viscosities of different hydrophilic polymers and their physical mixtures. This demonstrated the potential of ANN as a preformulation tool by predicting the relationship between the composition of a polymer substance and the water uptake profiles, the viscosity of polymer solutions, the moisture content, and the glass transition temperatures. It has been valuable in preformulation design and would help to decrease the cost and length of preformulation studies.
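A single-hidden-layer feedforward mapping of the kind described in this section can be sketched as follows. The three "composition descriptors," the random weights, and the idea of estimating a viscosity-like property are illustrative assumptions, not a trained model from the chapter.

```python
import numpy as np

def logsig(x):
    # log-sigmoid activation for the hidden layer
    return 1.0 / (1.0 + np.exp(-x))

def predict_property(descriptors, w_hidden, b_hidden, w_out, b_out):
    # forward pass: descriptors -> hidden layer (log-sigmoid) -> scalar property estimate
    hidden = logsig(w_hidden @ descriptors + b_hidden)
    return float(w_out @ hidden + b_out)

rng = np.random.default_rng(0)
x = np.array([0.2, 0.5, 0.3])   # hypothetical composition descriptors
w_h = rng.normal(size=(4, 3))   # untrained (random) hidden-layer weights
b_h = rng.normal(size=4)
w_o = rng.normal(size=4)        # output-layer weights
estimate = predict_property(x, w_h, b_h, w_o, 0.1)
```

In practice, the weights would be fitted to measured property data by backpropagation, following the update rules of Equations (3.1) and (3.2).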
3.5.4 OPTIMIZATION OF PHARMACEUTICAL FORMULATION

ANN addresses the multiobjective concurrent optimization issues in the pharmaceutical industry, establishing the relationship between response variables and causal factors. The prediction of pharmaceutical responses by the polynomial equation and response surface methodology has been broadly used as a part of formulation optimization. However, this prediction is limited owing to a low success rate of estimation. An optimization method incorporating an ANN has been created to overcome these shortcomings, to predict the release profile, and to improve the design of different drug formulations. The results calculated by the trained ANN model agree well with the observed values, including the in vitro release pattern, which helps to improve the effectiveness of process and formulation variables.

3.5.5 QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIP (QSAR)

ANN is a useful tool to establish quantitative structure–activity relationships and predict the activities of new compounds. QSAR links the physicochemical parameters of compounds with their chemical or biological activities. These parameters include molecular weight, log P value, electronic properties, hydrophobicity, steric effects, hydrogen donors, molar volume, and molar refractivity. Experimental determination of such properties can be a time-consuming process. An initial phase in QSAR studies is computing a massive number of structural descriptors that are used as mathematical representations of chemical structure. The relation of structure and activity with the physicochemical descriptors and topological parameters can be determined by computational techniques. For example, ANN has been applied to predict the quantitative structure–pharmacokinetic relationship (QSPR) of beta-adrenoceptor antagonists in humans. In that examination, an ANN built on a set of ten beta-blockers with well-established pharmacokinetic parameters was developed and tested for its ability to predict the pharmacokinetic parameters. Testing an enormous number of possible combinations of descriptors might take a lifetime to succeed. The BP algorithm with topological indices, molecular connectivity, and novel physicochemical descriptors helps to predict the structure–activity relationship of a large series of analogs. It generates valuable models of the aqueous solubility for a set of fundamentally related
medications with the necessary structural parameters. ANN-predicted properties exhibit a better correlation with the experimentally determined values than those predicted by various multiple regression methods. ANN is also valuable in the QSAR investigation of the antitumor activity of acridinone derivatives. Moreover, the developed model is able to recognize the critical variables contributing to the antitumor activity, such as lipophilicity. Therefore, ANN is not just valuable in predicting QSARs but also in identifying the role of the particular components relevant to the activity of interest. ANN can also be helpful in predicting the chemical properties of compounds. Predictive neural network models have been published for alkanes, alkenes, and assorted hydrocarbons. These models commonly demonstrate good fitting and prediction statistics with basic descriptors.

3.5.6 PHARMACOKINETICS AND PHARMACODYNAMICS

ANN predicts human pharmacokinetic parameters from a set of data on the physicochemical properties of medications, such as the partition coefficient, protein binding, the dissociation constant, and animal pharmacokinetic parameters. Medication doses and drug choices are decided by knowledge of the drug's pharmacokinetics and pharmacodynamics. ANNs are adapted to estimate pharmacodynamic profiles precisely for a broad assortment of pharmacokinetic and pharmacodynamic relationships without requiring any data on active metabolites. As they do not require any structural information, they provide an advantage over the usual model-dependent conventional methods. ANN is a quick and straightforward method for predicting and identifying covariates; for example, the rate of clearance, the protein-bound fraction of a drug, and the volume of distribution can be determined.

3.6 CONCLUSION

In this chapter, the branches of AI are explored within the field of biomedical engineering and informatics. The information is presented in a concise way, and the performance of some AI systems that are employed in the biomedical and healthcare domain is investigated. Through this chapter, we explore the various AI techniques in different domains and make the field of AI more robust and applicable in the sense of performance in healthcare. In particular, this chapter describes the uses of AI and related techniques in biomedicine and healthcare.
KEYWORDS

• artificial intelligence
• diagnosis
• medical imaging
• waveform analysis
• outcome prediction
• clinical pharmacology

REFERENCES

Duda, R; Hart, P; Stork, D; Pattern Classification, second edition. New York, NY, USA: John Wiley & Sons, Inc., 2001.
Koprowsk, I. R; Zieleźnik, W; Wróbel, Z; Małyszek, J; Stepien, B; Wójcik, W; Assessment of significance of features acquired from thyroid ultrasonograms in Hashimoto's disease, BioMedEngOnLine, 2012, 11, pp. 1–20.
Kunchewa, LI; Combining Pattern Classifiers, Methods and Algorithms. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2004.
Olaniyi, EO; Oyedotun, OK; Heart diseases diagnosis using neural networks arbitration, International Journal of Intelligent Systems and Applications, 2015, 7(12), pp. 75–82.
Pratyush, Rn. M; Satyaranjan, M; Rajashree, S; The improved potential of neural networks for image processing in medical domains, International Journal of Computer Science and Technology, 2014, pp. 69–74.
Rajesh, G; Muthukumaravel, A; Role of artificial neural networks (ANN) in image processing, International Journal of Innovative Research in Computer and Communication Engineering, 2013, 4(8), pp. 14509–14516.
Sayad, A.T; Halkarnikar, P.P; Diagnosis of heart disease using neural network approach, International Journal of Advances in Science Engineering and Technology, 2014, pp. 88–92.
Shi, Z; He, L; Application of neural networks in medical image processing, Proceedings of the Second International Symposium on Networking and Network Security, China, 2010, 2, pp. 023–026.
Singh, M; Verma, R.K; Kumar, G; Singh, S; Machine perception in biomedical applications: An introduction and review, Journal of Biological Engineering, 2014, 1, pp. 20–24.
Smialowski, P; Frishman, D; Kramer, S; Pitfalls of supervised feature selection, Bioinformatics, 2010, 26, pp. 40–44.
Smita, SS; Sushil, S; Ali, MS; Artificial intelligence in medical diagnosis, International Journal of Applied Engineering Research, 2012, 7(11), pp. 1539–1543.
Tadeusiewicz, R; Ogiela, MR; Automatic understanding of medical images: new achievements in syntactic analysis of selected medical images, Biocybernetics and Biomedical Engineering, 2002, 22, pp. 17–29.
Vyas, M; Thakur, S; Riyaz, B; Bansal, B.B; Tomar, B; Mishra, V; Artificial intelligence: the beginning of a new era in pharmacy profession, Asian Journal of Pharmaceutics, 2018, 12(2), pp. 72–76.
Yasmin, M; Sharif, M; Mohsin, S; Neural networks in medical imaging applications: a survey, World Applied Sciences Journal, 2013, 22, pp. 85–93.
CHAPTER 4

HYBRID GENETIC ALGORITHMS FOR BIOMEDICAL APPLICATIONS

P. SRIVIDYA* and RAJENDRAN SINDHU

Department of Electronics and Communication, R.V. College of Engineering, Bangalore 560059, India

*Corresponding author. E-mail: srividyap@rvce.edu.in
ABSTRACT

In the era of growing technology, where digital data is playing a vital role, artificial intelligence (AI) has emerged as a key player. The process of simulating human intelligence using machines is called AI. To accomplish a given task, the machines are trained with activities like reasoning, speech recognition, planning, manipulating, and task solving. Feeding the machines with ample information is the core part of AI. Inculcating common sense, thinking capabilities, and task solving in machines is a tedious task. AI is a combination of various thrust areas like computer science, sociology, philosophy, psychology, biology, and mathematics.

Machine learning, natural language processing (NLP), vision, and robotics are the different ways to develop AI in biomedical applications. In the machine learning technique, the machine itself learns the steps to achieve the set goal by gaining experience. In NLP, the software automatically manipulates natural language, like text and speech. The vision technique enables the machine to see, capture, and analyze the captured image, comparing it with the image captured by human eyesight. Robots are often used to perform the tasks that humans find difficult to achieve.

AI finds application in different sectors due to the recent progress in digitalization. It has been expanding in areas such as marketing, banking, finance, agriculture, and the health care sector. With the development of data acquisition, data computing, and machine learning, AI is causing a gradual change in medical
practice. To act like intelligent systems, machines have to be fed with a huge amount of data. Algorithms play an important role in AI, as they provide instructions to the machine to execute the required task by analyzing the data provided.

In this chapter, the limitations of genetic algorithms (GAs) are discussed, and different classification techniques along with hybrid GAs for biomedical applications will be presented by identifying the challenges in biomedical applications using AI.

4.1 ARTIFICIAL INTELLIGENCE IN THE HEALTH CARE SECTOR

Artificial intelligence (AI) in healthcare is the usage of complex algorithms and software to assess human cognition in the analysis of complicated medical data. AI is the capability of algorithms to estimate conclusions without human involvement. Healthcare sectors are under pressure to reduce costs; hence, an efficient way to use the data has to be devised. At present, very little software and hardware equipment is available to analyze the existing huge volume of medical data. Diagnosing a disease and its cure can be simplified if the patterns within the clinical data are identified.

The field of medical diagnostics uses AI and algorithms like genetic algorithms (GAs) for discovering the hidden patterns in the data sets to predict the possibility of a disease. Different types of classification techniques along with data mining have proved to provide useful information for better treatment of diseases.

In the medical field, GAs find extensive use in the fields of gynecology, cardiology, oncology, radiology, surgery, and pulmonology.

4.1.1 GENETIC ALGORITHM

The GA (presented by Holland) is a method applied to optimization and search-related problems to provide the most enhanced solution. The basis for the GA is the theory of natural evolution by Charles Darwin. According to the theory, offspring for the next generation will be produced by selecting the fittest individuals at random. A set of solutions for a task will be considered, and among these solutions, the best ones will be selected. The GA is divided into the following five stages:

1. Evolution
2. Fitness function
3. Selection
4. Crossover
5. Mutation

The evolution stage starts from a population, and it is an iterative process. In each iteration, the fitness of the individual is assessed. The fit individual's genome is used to
create the next generation. This new generation is then used in the next iteration, and the process continues until the desired fitness level is achieved. The crossover points are selected arbitrarily within the genes, and offspring are produced by exchanging genes among the parents up to the crossover points. Mutation is then applied by flipping some of the bits in the string. Convergence and the degree of accuracy in obtaining a solution are governed by the probabilities of crossover and mutation. The algorithm terminates when the required criteria are met, when the required number of generations has been produced, or by manual inspection. In addition to the above five stages, heuristics can be applied to speed up the process.

GAs are more efficient than traditional methods in that they provide a number of solutions for a task. They prove better when a vast number of parameters is available and show good performance in global search. However, they quite often show more latency while converging to the global optimum. In addition, each time the algorithm is run, the output might vary for the same set of inputs. The problem arises because the GA assumes the population size to be infinite, whereas in practice the population size is finite. This affects the sampling capacity of a GA and hence its performance. Uniting a GA with a local search method combats most of the problems that arise due to finite population sizes. The blend of a local search method with the GA accelerates the optimization process.

Adaptive GAs are a favorable variation of GAs: they are GAs with adaptive parameters. Instead of using fixed values, the crossover and mutation rates vary based on the solutions' fitness values.

Clustering-based adaptive GA is also a variant of the GA. Here, the population's optimization states are judged using clustering analysis, and the crossover and mutation depend on the optimization states. For effective implementations, GAs can be combined with other optimization methods to create a hybrid GA.

4.1.2 HYBRID GENETIC ALGORITHM

Even though the performance of GAs in global searching is superior, they take a long time to converge to an optimum value. Local search methods, on the other hand, converge to an optimum value very quickly for smaller search spaces, though their performance as global searchers is poor.

To improve the performance of GAs, a number of variations have been devised. Hybrid GAs are one such variation. Hybrid GAs are a combination of a GA with other
optimization and search techniques, producing a hybrid that forms the best combination of algorithms for problem solving (El-Mihoub et al., 2006). These algorithms help in improving genetic search performance by capturing the best of both schemes. Hybrid GAs can be applied in various medical fields like radiology, cardiology, gynecology, oncology, and other healthcare sectors to find solutions for complex problems. The algorithms can be applied in the screening of diseases and the planning of treatment.

Hybrid GAs can be used either to enhance the search capability or to optimize the parameters of the GA. Search capability can be improved by combining the GA with a proper choice of local method that is specific to the problem. This helps in improving the quality of the solution and the efficiency.

Davis (1991) claims that a hybrid GA can produce the best results only when the GA is combined with correct local search methods, or else the result obtained might be worse than using the GA alone.

According to Holland (1975), the quality of the solution can be enhanced by performing the initial search using the GA, as shown in Figure 4.1, and later using a suitable local search algorithm to enhance the final population. Local search algorithms should have the ability to identify the local optima with greater accuracy. The efficiency can be in terms of memory or of the time consumed to reach a global optimum. The Lamarckian and Baldwinian strategies are found to be the most suitable approaches to combine the local search algorithm with the GA.

FIGURE 4.1 Genetic algorithm.
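The hybrid strategy described above — a GA for the initial global search, followed by a local search that refines the final population — can be sketched as follows. This is a minimal illustration rather than any cited author's implementation: the one-max fitness function, the hill-climbing local search, and all parameter values are assumptions chosen for brevity.

```python
import random

def hybrid_ga(fitness, n_bits=16, pop_size=20, generations=40,
              cx_prob=0.9, mut_prob=0.02, seed=1):
    """GA for the initial global search, then a hill-climbing
    local search that refines the final population (hybrid step)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def select():
        # tournament selection: keep the fitter of two random individuals
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < cx_prob:           # one-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):               # bit-flip mutation
                for i in range(n_bits):
                    if rng.random() < mut_prob:
                        child[i] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]

    def hill_climb(ind):
        # local search: accept any single-bit flip that improves fitness
        best, improved = fitness(ind), True
        while improved:
            improved = False
            for i in range(n_bits):
                ind[i] ^= 1
                f = fitness(ind)
                if f > best:
                    best, improved = f, True
                else:
                    ind[i] ^= 1                  # undo a non-improving flip
        return ind

    pop = [hill_climb(ind) for ind in pop]       # refine the final population
    return max(pop, key=fitness)

best = hybrid_ga(sum)    # one-max: fitness = number of 1-bits
print(sum(best))         # the local search drives this to n_bits (16)
```

Here the refined individuals replace the originals, which corresponds to the Lamarckian strategy discussed below; evaluating the improved fitness while leaving the genome unchanged would give the Baldwinian variant.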


4.1.2.1 LAMARCKIAN AND BALDWINIAN APPROACHES FOR GENETIC ALGORITHM

The Lamarckian approach is built on learning. The theory states that the characteristics possessed by an individual are acquired from the previous generation. In this approach, the genetic structure reflects the results obtained by the local search: the genetic structure and the fitness of the individual are modified to suit the solution found by the local search technique. Thus, in this approach, the local search technique alters the genetic structure and places the operator back in the genetic population. The Lamarckian approach thereby helps in accelerating the search procedure of the GA. However, the drawback is that by altering the genetic structure, the GA's discovering abilities are badly affected. This leads to early convergence.

The Baldwinian approach was proposed by Hinton and Nowlan. According to the Baldwinian approach, the characteristics and the genetic structure of the next generation are unaltered; only the fitness is changed. The local search method is used to create a new fitness value, which is used by the GA to improve the individual's ability. Thus, the fitness value is enhanced by applying the local search method. Even though the learning strategies become more effective using this approach, it is slower than the Lamarckian strategy. Baldwinian search can also hamper the evolution process because of the confusion in genetic differences.

The hybridization of the Lamarckian and Baldwinian approaches outperforms the individual approaches. This hybridization is done either at the individual level or at the gene level. At the individual level, it is done by creating some individuals using the Lamarckian and some using the Baldwinian approach. At the gene level, a few genes are evolved using the Lamarckian and a few using the Baldwinian approach.

4.1.2.2 STEPS TO BE FOLLOWED TO DEVELOP A HYBRID GENETIC ALGORITHM

1. Define the fitness function and set the different GA parameters like population size, selection method, parent to offspring ratio, required number of crossovers, and mutation rate (Wan and Birch, 2013).
2. Generate the current population at random and an objective function for each individual.
3. Using GA operators, create the next generation.
4. For each individual, evaluate the objective function.
5. Apply a local search method on each individual of the next generation and evaluate the fitness. If there is some considerable improvement in the solution, then replace the individual.
6. Halt the process if the stopping criterion is met.

Many options are available in selecting a local search method. Some of the most popular methods are classification techniques and image processing techniques, which are further discussed in Sections 4.2 and 4.3, respectively.

4.2 HYBRID GENETIC ALGORITHM DEVELOPED BY COMBINING GENETIC ALGORITHM AND CLASSIFICATION TECHNIQUES

As the demand for achieving high accuracy in medical diagnosis grows, the usage of hybrid GAs ensures improved performance over the GA. Classification is a technique in data mining to extract the required information from a huge amount of data and group the data into different classes. This helps in identifying the class to which a datum belongs. Classification is required when some decisions are to be made based on current situations. Classification is important for the preliminary diagnosis of disease in a patient and helps in deciding the immediate treatment.

Machine learning is one of the main approaches to classification. Machine learning provides automatic learning ability to machines. It allows machines to improve from experience without being programmed explicitly and without human intervention. Two different categories of machine learning are supervised and unsupervised learning. Under supervised learning, the machine is trained with some labeled data, meaning data that carries the correct answers. Once this is completed, the machine is provided with new examples so that the supervised learning algorithm produces the correct output by analyzing the labeled training data. In unsupervised learning, the machine is trained using information that is neither labeled nor classified, and the algorithm must act on the information without any guidance. Unsupervised learning is classified into two types: clustering and association.

In a clustering algorithm, the inherent groupings in the data are identified, whereas in an association algorithm, rules that describe the greater portions of the data are discovered.
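As a concrete illustration of the supervised setting described above, the following sketch trains a toy nearest-centroid classifier on labeled examples and then classifies new, unlabeled measurements. The feature values, class names, and the nearest-centroid rule itself are illustrative assumptions, not methods taken from this chapter.

```python
from statistics import mean

def train_centroids(samples, labels):
    """Supervised learning in miniature: summarize each labeled class
    by the centroid (mean) of its training samples."""
    classes = {}
    for s, lab in zip(samples, labels):
        classes.setdefault(lab, []).append(s)
    return {lab: tuple(map(mean, zip(*pts))) for lab, pts in classes.items()}

def predict(centroids, x):
    """Assign a new, unlabeled example to the nearest class centroid."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(centroids[lab], x)))

# hypothetical labeled measurements (e.g., two clinical features)
samples = [(1.0, 1.2), (0.8, 1.0), (3.0, 3.1), (3.2, 2.9)]
labels = ["benign", "benign", "malignant", "malignant"]

centroids = train_centroids(samples, labels)
print(predict(centroids, (0.9, 1.1)))   # -> benign
print(predict(centroids, (3.1, 3.0)))   # -> malignant
```

An unsupervised method, by contrast, would have to discover the two groupings from the coordinates alone, without the label column.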
Different types of machine learning classifiers are given as follows:

• neural network
• decision trees
• support vector machines
• Bayesian network
• K-nearest neighbor
• fuzzy logic

The GA can be used for the initial search. To enhance the quality of the search, classification techniques are then applied. The results obtained with the hybrid algorithm show improved classification performance in a reasonable time.

4.2.1 NEURAL NETWORK

The neural network is not an algorithm by itself. It acts as an outline by which different machine learning algorithms process complex input data. As shown in Figure 4.2, a neural network comprises three layers, namely, the input layer, the hidden layer, and the output layer, with each layer consisting of nodes. Computations occur at the nodes: the input data combine with the coefficient values in the node, and the products from the nodes are then added and passed on to the next layer through the activation function. The hidden layers help in transforming the input into a form that can be used by the output layer. The network is classified as a deep neural network if it has two or more hidden layers. Each of the hidden layers gets trained depending on the inputs given from the previous layers.

FIGURE 4.2 Interconnections in neural networks.

4.2.2 DECISION TREE ALGORITHM

Decision tree algorithms are supervised learning algorithms that find use in resolving both regression and classification problems. In decision tree classifiers, a training model or decision tree is generated depending on the prior data or instances (Neelamegam et al., 2013). The model consists of an initial node called the root node. The root node has zero incoming edges; it is followed by nodes with one inward edge and outward edges, called internal nodes, and by nodes with no outgoing edge, called terminal nodes, as shown in Figure 4.3. The root node and internal nodes have attributes associated with them, whereas terminal nodes have classes associated with them. For each attribute, the internal nodes
have an outgoing branch. The class for each new instance is determined only at the terminal node, after passing through the internal nodes.

The steps followed in a decision tree algorithm are given as follows:

1. The best attribute should be positioned at the root.
2. The training set should be split into subsets such that every subset contains the same attribute value.
3. Repeat steps 1 and 2 on all the subsets until the terminal node is reached.

FIGURE 4.3 Structure of a decision tree.

Attributes decide the estimation criterion in decision tree algorithms. The attributes give a measure of how well the input sequence achieves the target classification. Hence, the selection of attributes at each node is a challenge in decision tree algorithms.

Even though decision trees are easy to understand, they have lower prediction accuracy relative to other machine learning algorithms. They give a biased response when attributes have a larger number of categories.

The chi-square automatic interaction detection (CHAID) algorithm, the classification and regression tree (CART) algorithm, the iterative dichotomiser-3 (ID3) algorithm, the C4.5 algorithm, and C5.0 are the main types of classical decision algorithms. Among them, the C5.0 algorithm provides higher accuracy, consumes less memory, is highly adaptable, produces smaller decision trees, and is less complex compared to the other algorithms. Owing to these advantages, it is the most preferred algorithm for different applications.

4.2.2.1 CHAID ALGORITHM

CHAID creates all possible cross-tabulations for each predictor until further splitting is unachievable and the best outcome is obtained. The target variable is selected as the root node and is then split into two or more groups, called the initial or parent nodes. The groups belonging to a parent node are called child nodes, and the last group is called a terminal node. In CHAID, the group that influences the most emanates first and the groups that have lesser influence emanate last.

4.2.2.2 CART ALGORITHM

Classification tree algorithms are used when the target variable is
fixed. The algorithms then identify the class to which the target variable belongs. The regression tree algorithm is used when the values of the target variable are to be predicted using independent variables.

CART is a structured algorithm in which a set of questions is asked, and the answers to these questions help in framing the next set of questions. This tree structure continues until no more questions can be formed.

The main stages involved in the CART algorithm are as follows:

1. Based on the value of the independent variable, the data at the node are split.
2. The branch is terminated when further splitting is not possible.
3. At each terminal node, the target value is predicted.

4.2.2.3 ID3 ALGORITHM

In the iterative dichotomiser-3 (ID3) algorithm, the original set acts as the root node. On each iteration, the algorithm runs through each unused attribute of the set and calculates the entropy for that attribute. The attribute with the smallest entropy value (or maximum information gain) is then selected, and the entire set is partitioned using the selected attribute to create the data subsets; that is, the decision tree is built using the entropy. The algorithm then recurs, considering only attributes that were never selected earlier.

Example: the root node (exam score) can be fragmented into child nodes depending on whether the marks are less than 40 or greater than 40. These nodes can be further divided based on the marks scored, as shown in Figure 4.4.

FIGURE 4.4 Example to show the ID3 algorithm.
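The entropy and information-gain computation at the heart of ID3 can be made concrete with the exam-score example of Figure 4.4. The marks and pass/fail labels below are hypothetical; because splitting on whether the mark is below 40 separates the classes perfectly, the gain equals the full entropy of the labels.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy after splitting on attribute `attr`."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# hypothetical exam-score data mirroring Figure 4.4: one attribute
# records whether the mark is below 40, the class is pass/fail
rows = [{"below_40": m < 40} for m in (25, 35, 55, 70, 90, 38)]
labels = ["fail", "fail", "pass", "pass", "pass", "fail"]

print(information_gain(rows, labels, "below_40"))   # 1.0 (perfect split)
```

ID3 would compute this gain for every unused attribute and place the highest-scoring one at the current node.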


4.2.2.4 C4.5 ALGORITHM

The C4.5 algorithm is an extension of the ID3 algorithm. ID3 is excessively sensitive to features with large value sets; C4.5 can be used to overcome this limitation. As in the ID3 algorithm, at every node of the tree the data are sorted to find the best attribute, and at each node the one attribute that best separates the data set into subsets is selected. In decision making, the attribute with the maximum normalized information gain is selected. To mitigate overfitting, C4.5 inherently employs a single-pass pruning process. It handles both discrete and continuous attributes: a threshold is created and the attributes are listed as being above, equal to, or below the threshold. It also handles data with missing attributes.

In short, C4.5 is implemented recursively as given below:

1. Verify whether the termination criteria of the algorithm are satisfied.
2. Compute the information-theoretic criteria for all attributes.
3. Choose the finest attribute according to the information-theoretic criteria.
4. Create a decision node on the basis of the finest attribute of step 3.
5. Induce (i.e., split) the dataset on the basis of the new decision node created in step 4.
6. To get a subtree, call the C4.5 algorithm on all the subdatasets of step 5 (recursive call).
7. Attach the decision node created in step 4 to the tree obtained after the execution of step 6.
8. Return the tree.

Additional features of C4.5 are tree pruning, improvements in using continuous attributes, handling of missing values, and the induction of rulesets.

4.2.2.5 C5.0 ALGORITHM

This algorithm is faster than C4.5, has improved memory usage, and produces smaller decision trees than C4.5. The field that provides the highest information is split into subsamples. These subsamples are further split again based on a different field, and this continues until a stage where further splitting cannot be done. The lowest-level subsamples that do not contribute to the model can be removed. Two different models can be built using the C5.0 algorithm:

1. The decision tree-based model—here only one prediction is possible for each data value.
2. The rule-set-based model—from the decision trees, rule
sets are derived. The rule sets need not have the same properties as the decision tree. For a particular data value, one or more rule sets, or no rule, may apply. If multiple rules apply, weighted votes are given to the data values and then added; if no rule set applies, a default value is assigned. The error rates are lower on rule sets, thereby helping to improve the accuracy of the result. The algorithm also automatically removes attributes that are not helpful.

4.2.3 SUPPORT VECTOR MACHINE

In the support vector machine (SVM) technique, every data point is plotted in an n-dimensional space, where n denotes the number of features. The classification is then achieved by finding a hyperplane that clearly splits the data points or features, as shown in Figure 4.5. The selected hyperplane must satisfy the following requirements:

1. Clearly separate the data points.
2. Maximize the distance between the nearest data points and the hyperplane.

SVMs can be used to perform both linear and nonlinear classification. If the data points overlap, then either a hyperplane with tolerance or a hyperplane with zero tolerance can be used. The important parameters in SVMs are the margin, kernel, regularization, and gamma. By varying these parameters, a hyperplane with nonlinear classification can be achieved in a reasonable time. Finding a perfect class when there are many training data sets consumes a lot of time.

FIGURE 4.5 Linear classification of features using a hyperplane.

Although SVMs work well with both structured and unstructured data, give better results compared to ANNs, solve complex problems given a suitable kernel function, and reduce the risk of overfitting, they have a few disadvantages as well. Choosing the best kernel function is a difficult task, the time required to train large data sets is high, and the memory requirement is large.

4.2.4 BAYESIAN NETWORK

A Bayesian network is a directed acyclic graph (DAG) belonging to a
family of probabilistic distributions; it is used to represent not only the variables but also the conditional dependencies of the variables on a directed graph. The DAG consists of a probability table, nodes, and edges. Attributes or arbitrary variables are represented as nodes, and the conditional dependencies of the variables are represented as edges. Unconnected nodes represent independent attributes. If an edge connects two nodes A and B, then for all the values of A and B the probability P(B|A) should be known to draw an inference. All the probability values are specified beside the node in a probability table. For example, let two different events A and B cause an event C, and among A and B let B be more dominant, that is, when B occurs, A is not active. The representation of this condition is shown in Figure 4.6. The variables have two possible values, true (T) or false (F).

FIGURE 4.6 DAG used in the Bayesian network.

Bayesian networks are probabilistic modeling techniques that are well suited to consider a past event and predict the cause of its happening. For example, if the symptoms are known, the probable diseases can be predicted by the network.

4.2.5 K-NEAREST NEIGHBOR

K-nearest neighbor is built upon the basis of learning by comparison. An n-dimensional space is used for storing the training samples, and every sample is denoted by a point in this space. When a test sample is provided, the K-nearest neighbor classifier searches for the K samples that are closest to the test sample and then classifies the test sample accordingly.

For example, consider two classes 1 and 2, as shown in Figure 4.7. Let the blue star be the test sample that has to be classified. If K = 5, then dotted square 1 is selected. In this square, the number of class 1 samples is greater than that of class 2; hence, the test sample is assigned to class 1. If K = 7, then dotted square 2 is selected. In this square, the number of class 2 samples is greater than that of class 1; hence, the test sample is assigned to class 2.
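The dependence of the assigned class on K, as in the Figure 4.7 example, can be reproduced in a few lines. The coordinates below are made up so that a small neighborhood votes for class 1 while a larger one votes for class 2; only the majority-vote rule itself is taken from the text.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest
    training samples (Euclidean distance)."""
    by_dist = sorted(train, key=lambda s: math.dist(s[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# toy samples echoing Figure 4.7: two classes around a test point
train = [((1, 0), 1), ((0, 1), 1),
         ((2, 0), 2), ((0, 2), 2), ((2, 2), 2)]
query = (0, 0)

print(knn_predict(train, query, k=3))   # smaller neighborhood -> class 1
print(knn_predict(train, query, k=5))   # larger neighborhood  -> class 2
```

As in the figure, enlarging the neighborhood flips the majority and hence the assigned class.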
FIGURE 4.7 Class assignment in K-nearest neighbor.

4.2.6 FUZZY LOGIC

Fuzzy logic can be used for problems with uncertain values at the input. It is a multivalued logic in which the truth values range from zero to one. Fuzzy logic takes the different possibilities of the input values to give a definite output; it deals with uncertainties and provides acceptable reasoning.

The four major modules involved in a fuzzy logic system are as follows:

Fuzzifier module: it splits the input into five steps: large positive, medium positive, small, medium negative, or large negative.

Information base module: used to store the guidelines provided by experts.

Inference module: a fuzzy inference is made on the input based on the information stored in the information base module.

Defuzzification module: the output is obtained from this module by the transformation of the obtained fuzzy set.

The main modules involved in fuzzy logic are shown in Figure 4.8.

FIGURE 4.8 Fuzzy logic modules.

In fuzzy logic, a membership function is devised based on the problem and the fuzzy set. This function provides a graphical representation of the fuzzy set (Santhanam et al., 2015). The y-axis shows the membership degree in the interval [0, 1] and the x-axis shows the universe of discourse. Triangular, Gaussian, trapezoidal, and linear are a few common shapes for the membership function.
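A triangular membership function — one of the common shapes just mentioned — can be written directly from its definition and used to fuzzify a crisp input into the five linguistic steps named for the fuzzifier module. The numeric ranges of the fuzzy sets below are assumed purely for illustration.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)    # rising edge
    return (c - x) / (c - b)        # falling edge

# hypothetical fuzzifier: map a crisp input onto the five linguistic
# steps named in the text (the (a, b, c) ranges are assumptions)
fuzzy_sets = {
    "large negative":  (-15, -10, -5),
    "medium negative": (-10, -5, 0),
    "small":           (-5, 0, 5),
    "medium positive": (0, 5, 10),
    "large positive":  (5, 10, 15),
}

def fuzzify(x):
    return {name: triangular(x, *abc) for name, abc in fuzzy_sets.items()}

# 'small' and 'medium positive' each fire with degree 0.5 for x = 2.5
print(fuzzify(2.5))
```

The inference module would then apply the expert rules to these membership degrees, and the defuzzification module would turn the resulting fuzzy set back into a crisp output.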
Algorithm:

1. Define the variables for input and output.
2. Build the membership functions for the input and output variables.
3. Construct the base rules.
4. Obtain the fuzzy values.
5. Perform defuzzification.

4.3 HYBRID GENETIC ALGORITHM DEVELOPED BY COMBINING GENETIC ALGORITHM AND IMAGE PROCESSING TECHNIQUES

The hybrid medical image retrieval system uses a GA approach for the selection of a dimensionality-reduced set of features. The development of the system comprises two phases.

1. In phase one, three distinct algorithms are used to extract the important features from the images. The algorithms used for the extraction of the features are the intrinsic pattern extraction algorithm, the Texton-based contour gradient extraction algorithm, and a modified shift-invariant feature transformation algorithm.
2. Phase two is based on feature selection, used to identify the potential feature vector for the GA. A few of the approaches are the "branch and bound algorithm" and the "artificial bee colony algorithm," applied to brain tumor, breast cancer, and thyroid images. In this genetic-based feature selection method, the dimensionality problem of the existing system is reduced (Shanmugapriya and Palanisamy, 2012; Li et al., 2012).

4.3.1 FEATURE EXTRACTION

To extract the most dominating and important features that represent the image, feature extraction methodologies are used for the analyses. The algorithms formulated for the extraction of the features are the intrinsic pattern extraction algorithm, the Texton-based contour gradient extraction algorithm, and a modified shift-invariant feature transformation algorithm. There are different feature extraction techniques available. Some of them are as follows:

1. Intrinsic pattern extraction algorithm

Texture in this context is the intensity variation pattern in an image; a texture pattern cannot be analyzed from a single pixel point. Images require consideration of the neighboring pixel points' intensities, for which an
algorithm called the intrinsic pattern extraction algorithm was introduced, originating from the basics of principal component analysis (PCA). The input for the intrinsic pattern extraction algorithm is the medical images available in the data sets, and the output is a positive intrinsic pattern feature vector for each of the input images. The size of the identified feature vector, which represents the intrinsic pattern of the image provided, and the computation are reduced in this approach. PCA is a statistical model to classify a few discrete patterns for any given dataset. From the identified pattern design, PCA implements a linear system. This linear system, derived by applying linear algebra, is used to identify the potential feature vectors from the pattern design. Later, these feature vectors are normalized and can be used for indexing the medical images as per their specified domain.

2. Texton-based contour gradient extraction algorithm

The gradient of the edge pixels was also analyzed for a more effective feature detecting system. The fundamentals of the gradient are derived from the concept of derivatives, which describes the variation of a functional variable mathematically. The gradient is one such concept, used here to identify the variation in image pixel intensity values in a two-dimensional space at an (i, j) location. Gradients are usually vector values whose magnitude determines the change of the pixel intensity values and whose direction specifies the direction in which the changes take place. In the Texton-based contour gradient extraction algorithm (TCGR), the input is the medical images from the datasets and the obtained output is the TCGR feature vector for each input image.

Texton is one of the latest evolving concepts derived from texture analysis to obtain the exact contour gradient of the image provided. It uses spatial filter transformation to extract syntactic features of any user-defined image sets. Complex patterns can be analyzed by Textons and used to develop extensible texture models for an image (Julesz, 1981). These Texton vector features are analyzed, formalizing a tactic to analyze the continuity in the data sets.

4.3.2 FEATURE SELECTION

This is a process used to choose a criteria-based subset of features. Feature selection is used to remove irrelevant data, reduce the cost of data, improve the efficiency of learning, reduce storage space, and
reduce the cost of computation. Hence, the selection method must be powerful in discarding the excess, irrelevant, and noisy features.

A hybrid of the branch and bound method and the artificial bee colony (ABC) algorithm is considered for the optimal feature choice. The branch and bound algorithm builds a binary search tree where the root depicts the set of all features and the leaves depict feature subsets. While traversing the tree down to the leaves, the algorithm progressively eliminates single features from the present set of "candidates." The algorithm retains the data about the current best subset and the criterion value it yields; this value is depicted as a bound value. The ABC algorithm begins with the generation of a population of binary strings (or bees): introduce the population and assess its fitness. Select different features from neighborhood features in the underlying population and compare them to assess their fitness. If the chosen features do not fulfill the fitness function, then eliminate those features from the population. Thus, every one of the bees is compared with the fitness function to form the ideal feature set. If not even one of the features fulfills the fitness function, find a new fitness function and then proceed with the search for the ideal value. The proposed hybrid approach combines the features of both the branch and bound algorithm and the artificial bee colony algorithm.

4.3.2.1 BRANCH AND BOUND FEATURE REDUCTION ALGORITHM

This is one of the feature selection algorithms, proposed for maximizing the accuracy and reducing the computation time. In this algorithm, the input is taken as a medical image feature vector and the output obtained is a reduced feature vector. Consecutive tree levels are created and tested with the evaluation function; while traversing down the tree, the algorithm removes single features and the bound values are updated. This algorithm allows an efficient search for optimal solutions and is well suited for discrete or binary data.

4.3.2.2 ABC FEATURE REDUCTION ALGORITHM

In the ABC algorithm, the input is the same as in the branch and bound feature reduction algorithm and the output obtained is also the same as in the branch and bound algorithm. In this algorithm, the first stage is the generation of a population of binary strings: initialize the population and evaluate its fitness. In the initial population, the other features from the neighborhood are compared with the evaluated fitness; if the features do not satisfy the function, remove them
from the population; thereby, all the binary strings are compared and discarded, and this continues until the optimal value is obtained.

4.3.3 DIVERSE DENSITY RELEVANCE FEEDBACK

In the relevance feedback method for medical image retrieval, the system first receives a query image that the user submits. The system uses the Euclidean distance to calculate the similarity between images and return the initial query results. Using the user feedback information, the results are marked positive and negative. On the basis of the user's interest and the feedback, the query image feature selection is refined on a repeated basis. The diverse density algorithm is used to achieve relevance feedback. The input images can be both positive and negative images, and the output obtained is the features that the user is interested in.

In this algorithm, the image content is considered as a set of features, and the algorithm finds the features within the feature space with the greatest diverse density. The diverse density is a measure that refers to the more positive examples around a point and the fewer negative examples. The co-occurrence of similar features from different images is measured using a function called the DD function. The main objective of DD is to find features that are closest to all the positive images and farthest from all the negative images.

4.4 APPLICATIONS OF HYBRID GENETIC ALGORITHMS IN CLINICAL DIAGNOSIS

As the human population increases, diseases are also increasing at a rapid pace. Human death due to malignancy and heart attack has increased worldwide, and accurate clinical diagnosis is the demand of the hour. In most cancer patients, the malignant tumors are diagnosed late or are misdiagnosed, leading to the death of the patient.

4.4.1 HYBRID GENETIC ALGORITHM IN RADIOLOGY

Magnetic resonance imaging (MRI), computed tomography (CT) scans, positron emission tomography (PET) scans, mammography, X-rays, and ultrasound are some of the imaging modalities used in the field of medicine. The imaging modalities are used in detecting and diagnosing disease. The time required for detection and diagnosis has reduced after
the invention of computer-aided detection and diagnosis (CAD). CAD systems also help in improving the detection performance of the imaging modalities.

Even though CAD systems assist in diagnosis, the images captured by imaging modalities are affected by noise. This noise affects the diagnosis process by the radiologists. Hence, to diagnose the disease, the detection machines have to process and interpret the captured images using algorithms based on classification techniques (Ghaheri et al., 2015).

1. In treating cancer patients, exact tumor size and volume determination play a vital role. This can be done using imaging techniques. An MRI of the organ can be captured and a GA can be applied for image segmentation. An artificial neural network can then be applied to reduce the false-positive results. This technique was adopted by Zhou et al. to predict tongue carcinoma using head and neck MRIs.
2. The genetic algorithm (GA) can be applied for feature extraction from mammograms or PET scans to identify the region of interest (Mohanty et al., 2013). Feature extraction is a process of selecting appropriate features by removing irrelevant features and constructing a model. This helps in reducing the time, complexity, and cost involved in the computation of irrelevant features. By applying feature extraction, the region of interest can be identified as malignant or not.
3. GAs can also be applied in image fusion to combine two different images captured using different imaging modalities. For example, a CT scan image can be merged with an MRI image, or a CT scan image can be merged with a PET scan image. This helps in easy diagnosis, since each image possesses different information acquired under different conditions.

4.4.2 HYBRID GENETIC ALGORITHM WITH NEURAL NETWORK FOR BREAST CANCER DETECTION

In malignancy detection, a GA can be used to set the weights and the neural networks can be used to give an accurate diagnosis, as elucidated by Schaffer et al. (1992). The population of n chromosomes along with their fitness values is maintained
Hybrid Genetic Algorithms for Biomedical Applications 91

by GA. Based on the fitness values, (e) Stop when a fixed number of
the parents are selected to produce generations is reached.
the next generation. Crossover and
2. Artificial neural network:
mutation are then applied to obtain
the best solution. GA discards all (a) Input the training samples
bad proposals and considers only the and the class of the sample.
good once. Thus, the end result is not (b) Compare the output with
the known class and adjust
affected.
the weight of the training
Neural networks are used in
sample to meet the purpose
solving classification problems.
of classification.
They are capable of learning from
previous experiences and improvise 3. Algorithm for malignant cell
on the behavior when they are detection using GA and neural
trained. Neural networks mimic the network:
human brain. It consists of neurons (a) Initial solutions are gener-
joined using connecting links. The ated using GA.
weight of each link in the network is (b) It is then fed as input to
multiplied by the transmitted signal. neural network.
Every node in the network forms (c) The output from the neural
the output node of the network, the network is then evaluated
lines form the input, and the inter- using a fitness function.
mediate layer forms the hidden layer. (d) If the stop condition is not
The output of the hidden layer is the reached, a new selection,
input to the output layer (Alalayah et crossover,, and mutations are
al., 2018; Ahmad et al., 2010).). performed and fed back to
The below steps are involved in the neural network. Else, the
classifying malignancy: process is stopped.
1. GA:
(a) Selects the optimal weights 4.4.3 HYBRID GENETIC
and bias values for the neural ALGORITHM FOR HEART
network. DISEASE DETECTION
(b) Evaluation of fitness value.
(c) On the basis of fitness value, Heart diseases usually occur due
parents are selected. to improper pumping of the blood,
(d) The new population is block in arteries, high blood pres-
formed from the parents sure, diabetes, etc. It has become a
using a cross over and prominent cause of death these days.
mutation. Hence, predicting its occurrence

has gained prominence in the health field. Shortness of breath, fatigue, and swollen feet are some of the symptoms of heart disease. Early diagnosis of heart disease helps in reducing the risk of heart attacks and death (Durga, 2015). The common method of diagnosing heart disease is based on a cardiologist examining and analyzing the patient's medical history and symptoms. This diagnosis is time-consuming and not very precise (Al-Absi et al., 2011).

The difficulties involved in manual examination can be addressed by using predictive models based on GA combined with machine learning techniques like SVM, K-nearest neighbor, decision tree, naive Bayes, fuzzy logic, artificial neural networks, and others. The data mining repository of the University of California, Irvine, also called the Cleveland heart disease dataset, provides the input dataset for the hybrid algorithm. The dataset provides samples of 303 patients with 76 features and a few missing values. It can be used for investigations related to heart diseases. This dataset can be taken as input to different algorithms, and classification techniques can be applied to refine the investigation.

General steps involved in developing a hybrid algorithm for heart disease diagnosis include the following:

1. Data preprocessing by replacing the missing values in the dataset.
2. Selection of the best attributes using GA. Attribute reduction helps in enhancing accuracy.
3. Classifier creation using the reduced attributes.
4. Heart disease prediction using the created classifier.

4.4.4 ELECTROCARDIOGRAM (ECG) EXAMINATION USING THE HYBRID GENETIC ALGORITHM AND CLASSIFICATION TECHNIQUE

The electrical activity of the heartbeat is measured using ECG. For every heartbeat, a wave is sent through the heart as shown in Figure 4.9. The wave causes the heart to pump the blood by squeezing the muscle. An ECG has the following waves:

P-wave: created by the right and left atria.
QRS complex: created by the right and left ventricles.
T-wave: made when the wave returns to the state of rest.

The time duration taken by the wave to traverse through the heart can be determined by measuring the time intervals on the ECG. This helps in finding out the electrical activity of the heart.

FIGURE 4.9 Normal ECG.

GA can be used to detect the QRS complex in an ECG, as explained by Diker et al. (2019). The QRS complex is the main and central spike seen on the ECG line. Interpreting the QRS complex in an ECG allows the analysis of heart rate variations. P-waves and T-waves from the ECG can then be examined to diagnose the disorder.

4.4.5 HYBRID GENETIC ALGORITHM AND DEEP BELIEF NETWORK FOR HEART DISEASE DETECTION

A deep belief network (DBN) is a class of neural networks that helps in performance improvement by maintaining several hidden layers and by passing the information to those layers. It consists of two different types of neural networks—belief networks and restricted Boltzmann machines. Belief networks are composed of stochastic binary units along with weighted connections. A restricted Boltzmann machine (RBM) is organized as layers. Deep belief networks are used in recognizing, clustering, and generating images, motion capture data, and video sequences. Once the RBM receives the input data, it is trained and passes the result to the next layer as input. The initial weights are assigned by the DBN, and then the error propagation algorithm is performed to obtain optimized performance. However, using only a deep belief network will not result in an optimum value for the number of nodes and layers (Lim et al., 2018). Combining DBN with GA will prove to be better in achieving an optimum solution. The steps involved are as follows:

1. Apply GA to find optimal values by using selection, crossover, and mutation. Feed this training set into the deep belief network.
2. Using unsupervised learning, construct the RBM network.
3. Learn the backpropagation algorithm using supervised learning.

4.4.6 HYBRID GENETIC ALGORITHM WITH FUZZY LOGIC FOR PREDICTING HEART DISEASES

GA can be used for feature extraction and fuzzy logic for classification and prediction. GA is applied to

extract the relevant information from the entire data set fed into it. It thus helps in reducing the count of the attributes present in the dataset, which in turn reduces the search. The steps involved are as follows:

1. Select the required features using GA.
2. Develop fuzzy rules.
3. Fuzzify the input values.
4. Combine the fuzzy inputs and rules to generate rule strengths.
5. Generate the output distribution by combining the rule strengths with the output function.
6. Defuzzify the output.

4.4.7 HYBRID GENETIC ALGORITHM AND ARTIFICIAL NEURAL NETWORKS IN ORTHODONTICS

In growing children, predicting the size of an unemerged tooth becomes very important (Moghimi et al., 2012). By predicting the size of the unemerged tooth, dentists can evaluate whether the vacant space available is sufficient for the growth of the permanent tooth in proper alignment. Three different methods are available for prediction:

1. Using radiographs for measuring the unemerged teeth.
2. Using the prediction equations and prediction tables for calculations.
3. A combination of both of the above methods.

With the considerable development of medical databases, the available traditional methods should be upgraded to more efficient methods for computation.

Artificial neural networks find major application in analyzing medical data and in dental predictions. GA, when combined with artificial neural networks, helps in predicting the size of an unemerged tooth. In the prediction process, the GA finds the best reference and the artificial neural network predicts the size of the tooth based on the information provided by the GA.

The algorithm for prediction is described as follows:

1. GA introduces the reference tooth value during every iteration into the artificial neural network.
2. Reference input values are mapped to the target values.
3. Stopping criteria are checked.
4. If the obtained results are satisfactory, the results are displayed. Else, the genetic algorithm shifts to the next generation and again

searches for a better match among the reference teeth.
5. The cycle repeats until the stopping criteria are met or until the predefined value is exceeded by the number of generations.

Once training is complete, the GA is not used any further; the reference tooth it introduced is discarded. The artificial neural network will further use the data to predict the output by using the best function for mapping.

4.5 CONCLUSION

The need for digitalization has been rising day by day. AI has contributed to many fields such as medicine, automobiles, education, etc. Research to extract the huge amount of information available in clinical data to improve the diagnosis of a disease is critical. GA helps to find an optimal solution for complex data in a reasonable time. Hence, its usage in the field of medicine helps the physician to solve complex diagnosis problems. The search ability of GA can be increased by a proper blend of GA with a local search method. This blend is called hybrid GA.

In this chapter, an insight into how hybrid GA can be used to diagnose the presence of malignancy and heart disease in a patient is provided in detail. The different GAs often used are explained along with their various approaches and applications. These approaches illustrate that hybridizing is a potential method to build a capable GA that cracks tough problems swiftly, consistently, and precisely. The chapter also discusses the different real-time applications of these algorithms, specifically related to the diagnosis of diseases.

KEYWORDS

• artificial intelligence
• genetic algorithms
• hybrid genetic algorithms
• classification techniques
• image processing

REFERENCES

Ahmad F., Mat-Isa N. A., Hussain Z., Boudville R., Osman M. K. Genetic algorithm-artificial neural network (GA-ANN) hybrid intelligence for cancer diagnosis. Proceedings of the 2nd International Conference on Computational Intelligence, Communication Systems and Networks, July 2010, 78–83.
Alalayah K. M. A., Almasani S. A. M., Qaid W. A. A., Ahmed I. A. Breast cancer diagnosis based on genetic algorithms and neural networks. International Journal of Computer Applications (0975-8887), 180, 2018, 42–44.

Al-Absi H. R. H., Abdullah A., Hassan M. I., Shaban K. B. Hybrid intelligent system for disease diagnosis based on artificial neural networks, fuzzy logic, and genetic algorithms. ICIEIS, 252, 2011, 128–139.
Cagnoni S., Dobrzeniecki A., Poli R., Yanch J. Genetic algorithm based interactive segmentation of 3D medical images. Image and Vision Computing, 17(12), 1999, 881–895.
Davis L. The Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, USA, 1991.
da Silva S. F., et al. Improving the ranking quality of medical image retrieval using a genetic feature selection method. Decision Support Systems, 51(810), 2012, 810–820.
Diker A., Avci D., Avci E., Gedikpinar M. A new technique for ECG signal classification: genetic algorithm wavelet kernel extreme learning machine. Optik, 180, 2019, 46–55.
Durga Devi A. Enhanced prediction of heart disease by genetic algorithm and RBF network. International Journal of Advanced Information in Engineering Technology (IJAIET), 2(2), 2015, 29–36.
El-Mihoub T. A., Hopgood A. A., Nolle L., Battersby A. Hybrid genetic algorithms: a review. Engineering Letters, 13(3), 2006, 124–137.
Ghaheri A., Shoar S., Naderan M., Hosein S. S. The applications of genetic algorithms in medicine. Oman Medical Journal, 30(6), 2015, 406–416.
Grosan C. and Abraham A. Hybrid evolutionary algorithms: methodologies, architectures and reviews. Studies in Computational Intelligence, 75, 2007, 1–17.
Haq A. Ul, Li J. P., Memon M. H., Nazir S., Sun R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Information Systems, 2018, 2018, 1–21.
Holland J. H. Adaptation in Natural and Artificial Systems. The University of Michigan, 1975.
Idrissi M. A. J., Ramchoun H., Ghanou Y., Ettaouil M. Genetic algorithm for neural network architecture optimization. Proceedings of the 3rd International Conference on Logistics Operations Management (GOL), May 2016, 1–4.
Julesz B. Textons, the elements of texture perception, and their interactions. Nature, 290(5802), 1981, 91–97.
Lei L., Peng J., Yang B. Image feature selection based on genetic algorithm. Proceedings of the International Conference on Information Engineering and Applications (IEA), 8(25), 2012.
Lim K., Lee B. M., Kang U., Lee Y. An optimized DBN-based coronary heart disease risk prediction. International Journal of Computers, Communications & Control (IJCCC), 13(14), 2018, 492–502.
Lowe D. G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 2004, 91–110.
Minu R. I., Thyagharajan K. K. Semantic rule based image visual feature ontology. International Journal of Automation and Computing, 11(5), 2014, 489–499.
Moghimi S., Talebi M., Parisay I. Design and implementation of a hybrid genetic algorithm and artificial neural network system for predicting the sizes of unerupted canines and premolars. European Journal of Orthodontics, 34(4), 2012, 480–486.
Mohanty A. K., et al. A novel image mining technique for classification of mammograms using hybrid feature selection. Neural Computer and Application, 22(1151), 2013, 1151–1161.
Nagarajan G., Minu R. I. Fuzzy ontology based multi-modal semantic information retrieval. Procedia Computer Science, 48, 2015, 101–106.
Neelamegam S., Ramaraj E. Classification algorithm in data mining: an overview.

International Journal of P2P Network Trends and Technology, 3(5), 2013, 1–5.
Santhanam T., Ephzibah E. P. Heart disease prediction using hybrid genetic fuzzy model. Indian Journal of Science and Technology, 8(9), 2015, 797–803.
Schaffer J. D., Whitley D., Eshelman L. J. Combinations of genetic algorithms and neural networks: a survey of the state of the art. Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92), June 1992, 1–37.
Shunmugapriya S., Palanisamy A. Artificial bee colony approach for optimizing feature selection. International Journal of Computer Science Issues, 9(3), 2012, 432–438.
Smith L. I. A Tutorial on Principal Components Analysis. Cornell University, 2002.
Wan W., Birch J. B. An improved hybrid genetic algorithm with a new local search procedure. Journal of Applied Mathematics, 2013, Article ID 103591, 10 pages.
CHAPTER 5

HEALTHCARE APPLICATIONS USING
BIOMEDICAL AI SYSTEM

S. SHYNI CARMEL MARY* and S. SASIKALA

Department of Computer Science, IDE, University of Madras, Chepauk, Chennai 600 005, Tamil Nadu, India

*Corresponding author. E-mail: shynipragasam@gmail.com

ABSTRACT

Artificial intelligence (AI) in healthcare is an emerging trend identified as a collection of technologies programmed to sense, comprehend, act, and learn. Its performance is appreciated for administrative and clinical healthcare functions and also for research and training purposes. AI includes natural intelligence, machine learning, deep learning, neural networks, robotics, multiagent systems, innovation in healthcare, and fuzzy reasoning. Health includes healthcare, diagnostics, hospitals and telemedicine, medications, medical devices and supplies, health insurance, and medical data. Healthcare programs are restricted by the unavailability of clinicians and inadequate capacity, and AI amends these insufficiencies. In India, AI has become a key technology for improving the effectiveness, value, cost, and reach of healthcare. The ethical issues of applying AI in healthcare must be analyzed and standardized. The challenges to the use of AI in healthcare were recognized primarily through an analysis of literature, interviews, and roundtable inputs.

5.1 INTRODUCTION

In its new paradigm, artificial intelligence is an online-enabled technology that sets guiding principles and recommendations for the community, to help those who are involved in decision making in all domains. Especially in the medical domain, AI is used to

evaluate the data, predict diseases, and prescribe medication accordingly. It is interlinked with the lifestyle of individuals and their related data. The evolution of artificial intelligence (AI) technology leads to thinking and generating knowledge based on a continuous data analysis process, and to predicting as human experts do, by adopting various algorithmic processes, including Industry 4.0.

The life span of humans varies in various ways because of this contemporary world. Current and upcoming science and technology affect lifestyle through both advancement and defect. The revolutions happening in this modern era are because of the prominent invention of computer processing. Computing and knowledge-processing applications expand living conditions in many areas and in many ways. Medical science is one of the potential domains where the computing process is used for the advancement of human lifestyle. Many research programs are carried out using a combination of computing processes and medical applications for the betterment of human health (Jiang et al., 2017). Most of the medical applications used with computations aid practitioners in taking decisions on their drug recommendations and on the identification of diseases. In all of these fields, computing technologies are well established and applied. Medical science is also using technology, but it defines proper ways to implement technology, and the processing depends on the requirements of the medical applications, which is a challenging task for researchers.

The experts use computational processes in the medical science field (He et al., 2019) as a tool, but the tools are advanced by computer specialists with different technical procedures. The technical experts derive the concept and incorporate it in several medical science applications. Data and image analyses are calculated by the medical field computational process. The research process comprises analysis and design of the solution and implements the identified existing algorithms in medical science.

5.2 APPLICATIONS OF BIOMEDICAL AI SYSTEM

AI is playing a vital role in the Industry 4.0 revolution. AI is the replication of human intelligence processes by machines, especially computer systems. These methods include acquisition of information and procedures for using the information, reaching approximate or definite conclusions, and self-correction. AI technologies are used in big data analytics, autonomous robots, simulation, the Internet of
Healthcare Applications Using Biomedical AI System 101

Things, etc. The industrial world is adopting revolutionary technologies in the name of "Industry 4.0" for scaling up socioeconomic applications. These technologies are applied to enhance the scientific processes of existing analysis and prediction in the science, business, and healthcare domains.

FIGURE 5.1 Applications of AI and robotics in healthcare.

When a model and its simulation states are applied in biomedical processes, it is called a biomedical system (Salman et al., 2017). Its requirements and process states differ according to the simulation environment integrated with medical applications (Yu et al., 2018). An AI-based system integrated with a biomedical model produces a biomedical AI system. The primary objective of AI is providing a cognitive construct to humans that will mimic human knowledge and also the cognitive process (Pesapane et al., 2018). In healthcare, AI is predominantly used for diagnosis and clinical decisions. It provides diagnosis and therapeutic interventions in different fields of medical science, such as cardiology, genetics, neurology, radiology, etc., for identifying the variation of abnormalities, analyses, and predictions. AI can definitely assist physicians in making effective clinical decisions, or even replace human judgment in certain functional areas of biomedical healthcare processes, like radiology (Mazurowski et al., 2019) for image processing.

The AI process helps to keep humans well based on the analysis and observation of their regular lifestyle. Wellness starts from the newborn baby vaccination process and continues throughout the entire life of humans. AI recommends the vaccination process in accordance with the analysis of regional factors and family history. The AI techniques generate a customized individual vaccination record as per the child's immune system. The welfare process recommends the food chart, growth chart, and regular physical and mental growth verification for the newborn child. The well-being process is an integrated analysis of the human physical system (Faust et al., 2018) and its ability to perform a sensitive process, that is, an integrated and trained set of AI algorithmic processes as per the regional and common factors. The AI technology recommends a rule-based approach and

generates a good, systematic approach for regular assessment and the health development process.

Further, AI technology can be applied for early detection based on the AI rule-based approach and can develop new rules as per the regional factors. There are two ways the detection process is carried out in the medical phenomena, namely, data analysis and image analysis. The early detection process collects a set of text and numerical data, matches the similarity with current existing cases, and predicts the possibility of the disease as per the similarity occurrence. The AI logical and predictive approach finds the similarities to determine the possible diseases. This approach adopts and applies predictive algorithms using personal data. Processing personal data involves the use of big data algorithms (Lo'ai et al., 2016), such as clustering and classification, for a more accurate early detection process. It depends on the appropriate handling of the questions of data scope, quality, quantity, and trustworthiness. In this process, the questions must be framed carefully by deciding the objective of the prediction and its process. The algorithmic process determines the appropriateness of the data needed for the research and aims for the detection of diseases. The early detection process adopts supervised learning algorithms (Ramesh et al., 2016) such as decision trees, naive Bayes classification, ordinary least squares regression, logistic regression, support vector machines, and ensemble methods. It also adopts a few unsupervised algorithms such as clustering algorithms, principal component analysis, singular value decomposition, and independent component analysis.

In the diagnosis process, AI plays a vital role in accuracy and prediction (Paul et al., 2018). The AI techniques can diagnose patients without a doctor. The software produces a diagnostic result, detecting the fulfilment or partial fulfilment of the conditions. The AI approach diagnoses via passive and nonpassive approaches. The results are compared with the standard values, and identification of variations in these variants leads to determining the diseases. AI algorithms such as crunchers, guides, advisors, predictors, tacticians, strategists, lifters, partners, okays, and supervisory approaches are applied for detecting the diseases through diagnoses.

The decision-making process in the medical domain is highly supported by AI techniques. It is applied based on the training data set and uses the current and captured data to decide the factors on which the decisions are based. It is a machine that is able to replicate human cognitive functions to solve the problems focused on healthcare

issues (Rodellar et al., 2018; Xu et al., 2019; William et al., 2018; Johnson et al., 2018). It helps the well-being industry in two ways: by making decisions for experts and for common people. Common people use these self-recommended decisions, based on captured data and with the support of AI, for instant remedies. At the same time, it helps the experts to decide complex cases.

Treatment is a process of prescribing medication or surgical procedures in the medical industry. The AI techniques support prescribing medication based on the preceding cases and training. It is highly recommended to rely on the training data set with all possible comparisons of the rules that are applicable according to the cases. The AI system will consider the regional factors, experts' opinions, and previous cases with a similar pattern, and will play a major role in prescribing the treatment. AI can be used for the life care process with the support of robotics. Robots are used in emergency situations, for rescue purposes, as emergency servants, and in all other possible manners. In the current scenario, they also perform activities as expert tutors and train the experts too.

With the availability of biomedical data and the rapid development of device knowledge, deep learning, natural language processing (NLP), robotics, and computer vision techniques have made it possible to create successful applications of AI in biomedical and healthcare settings. The authoritative AI techniques can reveal medically relevant knowledge hidden in the massive volume of data, which in turn can contribute to decision making.

FIGURE 5.2 Industry 4.0 technologies.
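The case-matching idea described above—supporting a treatment decision by comparing a new patient against previous cases with a similar pattern—can be sketched as a simple nearest-neighbor lookup. All case data, feature values, and treatment names below are hypothetical, not from the chapter:

```python
def most_similar_case(new_case, past_cases):
    """Return the past case whose feature vector is closest to the new case.

    Similarity here is plain Euclidean distance over numeric features; a real
    system would weight features, e.g., by regional factors or expert opinion.
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    return min(past_cases, key=lambda case: distance(case["features"], new_case))

# Hypothetical past cases: features = (age, systolic BP, cholesterol).
past = [
    {"features": (45, 130, 200), "treatment": "medication A"},
    {"features": (62, 160, 260), "treatment": "medication B"},
]
match = most_similar_case((60, 155, 250), past)
print(match["treatment"])  # medication B
```

In practice the matched case would only support, not replace, the expert's decision, in line with the decision-support role described in this section.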

Medical data mining used by AI searches for relationships and patterns within the medical data that will provide useful knowledge for effective medical diagnosis. Estimation of disease probability will become more effective, and early detection of disease will aid in increased exposure to required patient care and improved cure rates using computational applications. The application of data mining methods in the medical domain is to improve medical diagnosis. Some examples include predicting breast cancer survivability by using data mining

techniques, application of data mining to discover subtle factors affecting the success or failure of back surgery that led to improvements in care, data-mining classification techniques (Kim) for medical diagnosis decision support in a clinical setting, and data mining techniques used to search for relationships in a large clinical database.

Many research works have focused on the development of data mining algorithms to learn the regularities in these rich, mixed medical data. The success of data mining on medical data sets is affected by many factors. Knowledge discovery during training is more difficult if information is irrelevant or redundant, or if the data is noisy and unreliable. Feature selection is an important process for detecting and eradicating as much of the inappropriate and redundant information as possible. This necessary preprocessing step for analyzing these data, i.e., feature selection, is often considered, as this method can reduce the dimensionality of the data sets and often leads to better analysis. Research shows that the reasons for feature selection include improvement in performance prediction, reduction in computational requirements, reduction in data storage requirements, reduction in the cost of future measurements, and improvement in data or model understanding.

5.3 HEALTHCARE APPLICATIONS OF THE BIOMEDICAL AI SYSTEM

The AI approach is used for prediction from a massive amount of data, for generating a configuration, and for drawing knowledge from experience independently. Medical practitioners try to analyze related groups of subjects, associations between subject features, and outcomes of interest. In the diagnostic domain, a substantial quantity of the existing AI-analyzed records of diagnostic imaging, genetic testing, and electrodiagnosis is used to determine diseases based on available clinical data, for the benefit of human healthcare systems.

In this chapter, the healthcare applications of the biomedical AI system are reviewed in the following aspects:

1. The motivation behind using AI in healthcare.
2. Different techniques of AI.
3. Role of AI in various healthcare applications.

Knowing the importance of AI in healthcare, along with the technologies that support different aspects of healthcare, will help in understanding how things are developed and utilized, and will also lead to new innovations.
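The feature-selection benefits discussed earlier in this section can be illustrated with a minimal variance-threshold sketch—one simple way to discard uninformative attributes before building a model. The patient records and the threshold are hypothetical, not from the chapter:

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_features(rows, threshold):
    """Keep indices of feature columns whose variance exceeds `threshold`.

    Near-constant columns carry little information for classification, so
    discarding them reduces dimensionality, storage, and computation.
    """
    n_features = len(rows[0])
    columns = [[row[i] for row in rows] for i in range(n_features)]
    return [i for i, col in enumerate(columns) if variance(col) > threshold]

# Hypothetical patient records: the middle feature is constant (variance 0),
# so only the first and third columns are kept.
data = [(1.0, 5.0, 0.2), (2.0, 5.0, 0.9), (3.0, 5.0, 0.4)]
print(select_features(data, 0.01))  # [0, 2]
```

Real feature selectors would also consider relevance to the class label (e.g., the GA-based selection described in the previous chapter), not variance alone.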

5.3.1 THE MOTIVATION OF USING AI IN HEALTHCARE

In the recent era, medical issues challenge society as well as practitioners in identification and decision-support-related treatments. There are many reasons for the emergence of AI in the field of medicine. The factors considered most important are the increase in available healthcare data, the need for more efficient practitioners and healthcare systems owing to the rising external burden in medicine and diagnosis, and people demanding fast and more customized care. For this, AI has been introduced into the field of healthcare using a collection of algorithms. It provides the sophistication to learn and develop patterns from large volumes of data more accurately, aids the physician in gaining more knowledge and understanding of the day-to-day updates in the field, and reduces the errors or flaws occurring in human clinical practice in diagnostics and therapeutics.

5.3.2 DIFFERENT TECHNIQUES OF ARTIFICIAL INTELLIGENCE

To understand AI, it is useful to know the techniques and algorithms and how they are processed for solving complicated tasks or correcting the mistakes made through human intervention. The implementation of AI in healthcare includes machine learning (ML), a division of AI that is used for clustering, classification, and predictive analysis of medical data. Its enhanced application, neural networks, is introduced with deep learning algorithms and genetic algorithms.

FIGURE 5.3 Applications of AI techniques in healthcare.

Big data analytics is the recent trend for handling medical healthcare data collected from mass screening, diagnosis, and treatment monitoring and evaluation through demographics, therapeutic notes, electronic recording of medical strategies, and physical examinations. Cyber security is an ethical concern when AI is applied in healthcare. The most important techniques of artificial intelligence are described below.
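As a concrete taste of the techniques described below, the sketch that follows (toy one-feature data in plain NumPy; an illustrative assumption rather than an algorithm taken from this chapter) contrasts a supervised nearest-centroid classifier, which learns from labeled input-output pairs, with unsupervised k-means-style clustering, which must discover the same group structure without any labels.

```python
import numpy as np

# --- Supervised: learn from labeled examples, then predict new cases ---
def nearest_centroid_fit(X, y):
    """Store one centroid per class from labeled training data."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def nearest_centroid_predict(centroids, x):
    """Assign a new sample to the class with the closest centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# --- Unsupervised: no labels; discover groups from the data alone ---
def kmeans(X, k=2, iters=20):
    """K-means with centers initialized from evenly spaced samples."""
    centers = X[np.linspace(0, len(X) - 1, k, dtype=int)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return labels

# Toy one-feature "clinical measurements": two well-separated groups.
X = np.array([[1.0], [1.2], [0.9], [5.0], [5.3], [4.8]])
y = np.array([0, 0, 0, 1, 1, 1])             # labels available: supervised setting
centroids = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(centroids, np.array([1.1])))  # -> 0
print(kmeans(X))  # -> [0 0 0 1 1 1]: the groups rediscovered without labels
```

On such cleanly separated data both settings agree; the practical difference is that the supervised model can only predict categories it was shown, while the clustering result still has to be interpreted by a human.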
106 Handbook of Artificial Intelligence in Biomedical Engineering

5.3.2.1 MACHINE LEARNING

ML is a technique that must be understood to assist AI. Its algorithms represent the data structure for the clustering and classification of abnormalities and use statistical inference to give more accuracy in prediction and classification. ML algorithms are categorized into supervised and unsupervised algorithms. A supervised algorithm learns from inputs and outputs and produces the prediction and classification result by mapping; based on the trained data, the result for new data is produced. The unsupervised algorithm, however, is ideal for exploring the dispersal of the data to learn more about it: there is no output, and the algorithm by itself has to discover and deliver the interesting patterns in the data.

The popular image analysis and ML algorithms are water immersion, the Hough transform, seed-based region growing, the genetic algorithm, active contour, K-means, fuzzy logic, etc. Each algorithm works efficiently and gives a unique result for different applications. Water immersion is a segmentation technique used to find the closed boundaries of the region of interest in the data, but it is not suitable for overlapping cells. The Hough transform is used to find similarities in shapes, and it is suitable only for round-shaped images. The seed-based region growing algorithm identifies the edges, but it depends on user-defined parameters. The genetic algorithm is very slow in execution, but it tries to search for the best solution. Active contour is used to detect the boundaries of the image, and its drawback is that it takes much time for energy minimization. The K-means algorithm detects the threshold, but overlapped cells cannot be separated from the background. Fuzzy logic can discover robustness and uncertainty in the data, but it is not flexible because of firm data. Classification algorithms such as the decision tree perform well only when the data are trained carefully. The Bayesian network understands the data based on statistical inference; k-nearest neighbor (KNN) and artificial neural networks are very fast, although they face difficulty when used with large numbers of attributes and noisy data. Finally, the support vector machine is suitable only for linear data; unless multiple parameters and the kernel trick are used, multiclass classification is not possible.

5.3.2.2 NEURAL NETWORKS

Neural networks are stimulated by information technology and by biological systems that have distributed communication nodes. There are many algorithms used in a neural network. Very few are important, such as
the radial basis function network, perceptron, backpropagation, logistic regression, gradient descent, and the Hopfield network. The radial basis function network has an activation function, called the radial basis function, in the hidden layer. It has an input layer, a hidden layer, and a linear output, and it is used for time series prediction, classification, and system control. The perceptron is a linear supervised binary classifier. There are two kinds of perceptron, the single layer and the multilayer; the multilayer perceptron is called a neural network. It has input-layer weights and bias, a net sum, and an activation function.

Backpropagation is used for classification and is essential for neural network training. It propagates information about the error, produced by the wrong guesses of the neural network, backward through the network; according to this error information, the parameters of the neural network are adjusted one step at a time. Logistic regression is a nonlinear binary classification method that translates the signal into the space from 0 to 1. It is used to calculate the probability that a set of inputs matches the label. Gradient descent is a neural network algorithm used to adjust the weights based on the error rate and to find the local minimum of the function, known as the optimum. The Hopfield network has feedback connections, which is why it is called recurrent. Hopfield networks are fully interconnected neural networks and are applied to image segmentation only if it poses an optimization problem.

5.3.2.3 DEEP LEARNING

Deep learning is an extension of ML based on neural network concepts; such a network can be supervised, semi-supervised, or unsupervised. Many deep learning algorithms are available, such as the deep neural network (DNN), deep belief network (DBN), recurrent neural network (RNN), convolutional neural network (CNN), restricted Boltzmann machine (RBM), autoencoder network, and long short-term memory (LSTM). LSTM is a special type of RNN technique; its advantage is that the expected output is compared with the model's output and the weights are updated accordingly. All of these algorithms are used in computer vision, speech recognition, NLP, drug design, and medical image analysis. The DNN is a feedforward network with complex nonlinear relationships, where data flow from input to output without looping back; it has multiple layers between the input and the output. The DBN is an unsupervised probabilistic algorithm that consists of multiple layers of stochastic
latent variables. The latent variables are binary and are also called feature detectors or hidden units.

The RNN is a type of neural network that has an in-built memory, so all the information about the sequence is stored, and the output is produced as an input of another step of the network. This process does not exist in other traditional neural networks, and it is also used to reduce the number of parameters. The CNN is a deep artificial neural network that is used for classification, performed only after clustering by image similarity, together with image recognition. The RBM is a simple type of neural network algorithm that is used for dimensionality reduction and classification, and it also works like a perceptron. The autoencoder is an unsupervised algorithm that works like a backpropagation algorithm: it is trained to feed its input to its output with the help of a hidden layer that holds the code representing the input. It has been used for dimensionality reduction and feature extraction. LSTM is an artificial neural network that works like an RNN; it is used for classification and, in particular, for predictions based on time series.

Deep learning is the most popular technique in the recent era because of its advancements. One of its advantages is that the learned features can be used as input for a supervised learning model. All these algorithms are used for the predictive representation of patients in an unsupervised learning method from the electronic health record. However, there are some limitations: a large volume of data and expensive computation are required to train the models, and interpretability is difficult.

5.3.2.4 BIG DATA

The term big data represents a huge volume of data. These data can be in any of the following forms: data that cannot be stored in memory, and data that cannot be managed and retrieved because of their size. Big data gives great hope in healthcare for the data collected from biological experiments for medical drugs and the discovery of new ways of treatment, efficient patient care, inpatient and outpatient management, insurance claims, and payment reimbursement. There are thus different types of healthcare data. The different types of biomedical data are the integrated electronic health record (EHR), genomics-electronic health record, genomics-connectomes, and insurance claims data. Big data algorithms and architectures are used to manage the collection of data and promise tremendous changes toward effective results.

Some of the important big data concepts and their applications commonly used are discussed here.
The map-reduce method with the stochastic gradient descent algorithm is used for prediction. Logistic regression is used to train on EHR data with the stochastic gradient descent algorithm. Many architectures have been developed for monitoring personal health, such as the Meta Cloud Data Storage architecture, which is used to transfer the collected data into cloud storage. To store a huge set of data, the Hadoop Distributed File System is used. To secure the big data collected and transferred into the cloud, the data are protected using an integrated architecture model called the Grouping and Choosing (GC) architecture with the Meta Fog Redirection architecture. Logistic regression embedded with the MetaFog architecture is used for the prediction of diseases from the historical record when the dependent variable exists.

5.3.2.5 COGNITIVE TECHNOLOGIES

Cognitive technologies are prominently introduced in healthcare to reduce the burden of human decision making and have the potential to correct for human error in giving care. Medical errors are the third leading cause of death in India and in the world at large, but they are mostly not due to exceptionally bad clinicians. Instead, they are frequently ascribed to cognitive errors (such as failures in perception, failed heuristics, and biases) and to the absence or underuse of safety nets and other protocols.

The use of AI technologies promises to reduce the cognitive workload for doctors, thereby improving care, diagnostic accuracy, clinical and operational efficiency, and the overall patient experience. While there are reasonable concerns and discussions about AI taking over human jobs, there is limited evidence to date that AI will replace people in healthcare. For example, many studies have proposed that computer-aided readings of radiological images are as accurate as (or more accurate than) readings performed by human radiologists.

Rather than large-scale job loss resulting from the automation of human work, we propose that AI provides an opportunity for a more human-centric approach to augmentation. Unlike automation, it is increasingly presumed that smart people and smart machines can coexist and make way for better results than either alone. AI systems may perform some healthcare tasks with limited human intervention,
thereby freeing clinicians to perform higher-level tasks.

5.3.2.6 CYBER SECURITY

AI applications are applied in medical decision making with highly visible results because of the advancement of sophisticated algorithms combined with flexible access to computational resources. Such a framework plays a major role in processing information and helps the authorities make the right choices. Doctors who handle the medical and radiological devices that provide sensitive services require appropriate training and certification. Obviously, the applications are judged by the standards to which artificial systems are held, so that the procedures and techniques can prove their conformance. Broadly speaking, the AI system provides forward thinking about every potential scenario and every possible consideration, whereas humans, within their limitations, consider only what is apparent to their own minds.

AI medical devices facilitate the medical field for various reasons. They may be adopted for generating revenue for manufacturers, and physicians will possibly have additional tools at their disposal. Whenever an AI device is proposed, an appropriate policy infrastructure must be developed for introducing the device, because of the lack of stability and of a unanimous definition or instantiation of AI. AI devices ought to work and perform more rapidly and precisely even though they bear only a similarity to human neural systems and think in a different way, unlike humans. AI medical devices have more than enough opportunities in every available situation where, in contrast, humans are incapable of proper processing and of making the right decisions. Soon such a device starts making autonomous determinations concerning analyses and medicines, moving beyond its role as just a support instrument, and an issue comes up as to whether its designer can be held responsible for its choices. The primary question to address is: who will be sued in case an AI-based device makes a mistake?

In this AI framework, questions remain unanswered about the fiduciary relationship between patients and medical frameworks under the most recent policy initiatives, the regulation of data protection and cyber security, and the debate about the unusual responsibilities and duties. So, the trustworthiness of the data has to be addressed. Of course, AI is a good application because of its power, assistance, and value, but on the contrary, it
must be taken seriously so that no bad or unethical uses of the technology are formulated. Any unethical use of the technology would be dangerous, so all the authorities, especially physicians, patients, and other technicians, must work together to prevent any evil in this scenario. If there is a good spirit of cooperation in the healthcare system, we can find a balanced situation in providing security and privacy protection. It is also important to maintain the ethical use of sensitive information to give assurance about both the human and structured management of patients. It is mandatory that the ethical views and challenges be made very clear and be safeguarded against social evils. There is also a possibility of ethnic biases being built into medical algorithms, since healthcare outcomes already vary by ethnicity.

So, the use of AI leads to two problems with regard to data collection from various devices. One is providing security for the data collected and stored using these devices; the second is that the same data are endangered by cyber hackers when such devices are used. Hence, there is an urgent need to provide data protection and cyber security for AI device applications.

As of now, the European and American government administrations have formulated legislation for data protection and cyber security. The European government shaped the General Data Protection Regulation over data protection and cyber security orders, medical device regulation, and in vitro diagnostic medical device regulations. In the United States, the Health Insurance Portability and Accountability Act (HIPAA) establishes a person's right to privacy over health data and guidance on how to handle sensitive information. The Food and Drug Administration (FDA) guidance on cyber security permits the manufacturer to describe the hazards and the steps taken to limit vulnerability. So, the rapid development of AI should provide safety and trustworthiness in healthcare.

5.3.3 ROLE OF AI IN VARIOUS HEALTHCARE APPLICATIONS

AI approaches are used for prediction from a large amount of data to generate patterns and draw knowledge from experience independently. These techniques are effectively used to develop biomedical healthcare systems. They are applied for the detection, diagnosis, and characterization of diseases and for interpreting medical images.

Rich semantics is a collection of intrinsically described methods for information extraction
from clinical texts; the extraction of events serves to analyze clinical treatments or drugs, primarily for the purpose of clinical coding and the detection of drug interactions or contraindications. ML and NLP are the successful techniques for providing semantics that include a description of clinical events and of relations among clinical events (e.g., causality relations). They additionally support subjectivity, polarity, emotion, or even comparison for changes in health status, unforeseen circumstances or particular medical conditions that affect the patient's life, the result or effectiveness of treatment, and the certainty of a diagnosis. The DNN is used for speech recognition with a Gaussian mixture model. It is utilized to extract features from raw data and for speech emotion recognition. For each emotion segment, the emotion probability distribution is created.

FIGURE 5.4 Heart disease.

The artificial neural network has also demonstrated its capacity by working on the classification of heart diseases. In this procedure, for the classification of a stroke, the sensor input is given to a framework that employs a feedforward network with the backpropagation rule as a possible strategy. The effective classification result is given by the simulation framework.

Heartbeat classification and congestive heart failure detection are done using a convolutional neural network deep learning method. It also helps in two-dimensional (2D) image classification for three-dimensional (3D) anatomy localization. Heart diseases of both invasive and noninvasive types are detected using deep learning algorithms. Indicators of heart disease are the end-systolic and end-diastolic volumes of the left ventricle (LV) and the ejection fraction (EF). These are all segmented using the U-Net deep learning architecture, a convolutional neural network that is commonly applied for the segmentation of biomedical images.

MRI brain tumor examination is conducted using artificial neural network procedures for the classification of images in diagnostic science. A general regression neural network (GRNN) is utilized as a 3D method of classification, along with pooling-free fully
convolutional networks with dense skip connections for the image of the brain tumor.

FIGURE 5.5 MRI brain tumor.

Many methods and algorithms are used for the classification of deformations in the acquired image. Notably, the least squares support vector machine (LSSVM) is a well-known method used for the diagnosis of normal and abnormal regions of the brain from MRI data. By its independent way of classifying the MRI image, it gives an effective result with greater accuracy than other classifiers.

The prognosis of Alzheimer's disease is made by a combination of sparse regression and deep learning called the deep ensemble sparse regression network. The residual deep neural network (ResNet) architecture is used to assess the capacity to predict O6-methylguanine methyltransferase gene status for the segmentation of the brain tumor. A deep convolutional encoder network is applied for the segmentation of sclerosis. Deep learning convolutional neural networks are applied to brain structure segmentation in distinctive styles of architecture, such as the patch-wise CNN architecture, the semantic-wise CNN architecture, and the cascaded CNN architecture. These architectures contain pooling, activation, and classification (fully connected) layers. The convolutional layer yields feature maps by applying a kernel over the input image. The pooling layer in the architecture is used to downsample the results. The activation functions are the rectified linear unit (ReLU) and leaky ReLU.

Brain glioma segmentation can be done using a deep neural network with max pooling or without pooling. Cervical cancer detection and classification of Pap-smear test images are done using ML methods. Numerous studies have used ML algorithms for feature segmentation, selection, and classification. Very common unsupervised segmentation and classification algorithms are the K-NN method, C4.5 and logistic regression, SVM, random forest, DT, Bayesian, FCM clustering, and Otsu's method. All these algorithms are applied to this type of data and achieved the
maximum accuracy. Algorithms such as K-NN, a metaheuristic algorithm with a genetic algorithm, and the second-order neural network training algorithms Levenberg–Marquardt with adaptive momentum (LMAM) and optimized Levenberg–Marquardt with adaptive momentum (OLMAM) are used. Different methods applied for identifying cervical cancer are two-level cascade classification using C4.5 and logistic regression; KNN, fuzzy KNN, SVM, and random-forest-based classifiers; the iterative thresholding method; and pixel-level classification such as pixel-level analysis.

Classification results can be obtained not only by the classification algorithm but also after implementing segmentation algorithms such as SVM block-wise feature extraction with Otsu's segmentation method. Cervical cell segmentation is generally performed by the scene segmentation method, decision tree, Bayesian, SVM, and a combination of the three individual posterior probabilities.

Deep learning algorithms play a critical part in extracting information from physiological signals in electromyogram (EMG)-, electroencephalogram (EEG)-, electrocardiogram (ECG)-, and electrooculogram (EOG)-based healthcare applications. The EMG signal is measured by electrical sensors, called electrodes, placed on the skin to detect the activation, force, and state of the muscles. EEG is measured by placing electrodes on the cranium to detect brain activity. ECG is measured by placing the electrodes on the chest to record the action of the human heart. EOG measures the corneoretinal potential, with electrodes placed behind and in front of the eye, to record eye movements.

A deep learning algorithm used with EMG signals for limb movement estimation employs an RNN, which performs well and estimates a 3D trajectory. The CNN is used for neuroprosthesis control, movement intention decoding, and gesture recognition. Deep learning algorithms with EEG signals serve applications such as EEG decoding and visualization, decoding of movement intention, outcome prediction for patients with a post-anoxic coma after cardiac arrest, discrimination of brain activity, brain–computer interfaces, seizure detection, tracking of neural dynamics, prediction of drivers' cognitive performance, response representation, feature extraction, epileptic seizure prediction, and motor imagery classification with a convolutional neural network. The autoencoder is applied to identify the sleep state. The RBM is used for motor imagery classification and affective state recognition. A deep learning network
is applied for emotion recognition. The DBM performs well in applications such as emotion classification, motor imagery, and detecting target images. Deep learning algorithms with ECG signals are applied in applications like arrhythmia detection using distinct intervals of tachycardia ECG segments, coronary artery disease signal identification, coronary artery disease detection, screening for paroxysmal atrial fibrillation, monitoring and identifying atrial fibrillation, detection of myocardial infarction, and identification of ventricular arrhythmias, all of which comprise a convolutional neural network. LSTM is utilized for coronary artery disease signal identification.

The RNN is applied for sleep apnea detection. The RBM algorithm is implemented for signal quality classification and heartbeat classification. The autoencoder is used with the EOG and EEG signals for the driving fatigue detection application. The CNN is used with EOG, EEG, and ECG for momentary mental workload classification and with the EOG signal for drowsiness detection. The DBN is applied with EOG, EEG, and EMG for the classification of sleep stages. Deep learning is applied from one-dimensional physiological signals to 2D medical images for image analysis, using these signals to determine and derive a conclusion with computerized organization and detection of Alzheimer's disease, breast cancer, and lung cancer.

Lesions in various parts of the human body are detected and classified using different neural network architectures, such as convolutional layers, CNN, AlexNet, and VGG, with small changes in the weights and layers. Layer performance is tested using the area under the curve (AUC) to reduce overtraining. ResNet and the Inception architecture are the best networks used for presurgical MRI of the brain tumor. The Inception V3 network seems to be the best one for feature extraction, rather than GoogLeNet, which is used for classification.

Breast mass lesion classification is done using deep learning methods such as an RNN with LSTM for training the data, with the features extracted using VGGNet. The performance is tested using the ROC curve (AUC) method to distinguish benign and malignant lesions. Classification of genomic subtypes is performed using three distinct deep learning approaches that classify the tumor according to its molecular subtypes: the tumor patches are learned as a pretraining strategy, and standard features are extracted by neural networks used for classification with a support vector machine. The architecture of neural networks is applied in GoogLeNet, VGG, and CIFAR. The 10-fold cross-validation strategy is
used for validation, and the area under the receiver operating characteristic (AUC) is used to measure its performance.

Digitized mammograms are among the most difficult medical images to read due to their low contrast and the differences among the types of tissue. Important visual clues of breast cancer include preliminary signs of masses and clusters of calcifications. Unfortunately, in the early stages of breast cancer, these signs are very subtle and varied in appearance, making diagnosis difficult and challenging even for specialists. This is the main reason for the development of classification systems to assist specialists in medical institutions. Owing to the importance of automated image categorization in assisting physicians and radiologists, much research in the field of medical image classification has been done recently. With all this effort, there is still no widely used method to classify medical images. This is because the medical domain requires high accuracy, and especially the rate of false negatives must be exceptionally low. In addition, another important factor that affects the success of classification methods is working in a team with medical experts, which is desirable but frequently not achievable. The consequences of errors in detection or classification are costly.

Mammography alone cannot demonstrate whether a suspicious area is malignant or benign. To decide that, the tissues must be removed for examination using breast biopsy procedures. A false positive detection may cause an unnecessary biopsy. Statistics show that 20–30 percent of breast biopsy cases are proved to be cancerous. In a false negative detection, an actual tumor remains undetected, which might lead to higher costs or even cost a human life. Here is the tradeoff involved in creating a classification system that may directly influence human life. In addition, tumor appearance differs: tumors have diverse shapes, and a few of them share the characteristics of normal tissue. The density level of a tumor decides the level of the stage. The stage determination process is explained in the next section.

Lung cancer is recognized and analyzed using computed tomography images, which have been used with a CNN as an automated classifier recognizing predictive features. Deep feature extraction with a pretrained CNN is more efficient than the decision tree classifier. A VGG-f pretrained CNN is used for linear unit feature extraction, and the result is better when a feature ranking algorithm is followed by a random forest classifier. Fine-tuning a CNN that has been pretrained with a huge set of labeled
preparing information, which is the The convolutional neural arrange


issue confronted in the profound is used to identify and classify
convolutional neural arrange (CNN). districts of intrigued as threatening or
In lung knob classification, multi- kind. A profound convolution neural
edit convolution neural organize arrange (DCNN) is utilized with FP,
(MC-CNN) is connected to extricate which incorporates four convolu-
knob notable data by employing a tional layers and three completely
novel multi-crop pooling approach associated layers connected to
that crops different locales from CNN mammogram breast pictures for
including maps with max-pooling. mass discovery. The problem occur-
Programmed division of the liver and ring in layer implementation is over-
its injury utilizes a profound learning fitting, and this can be solved using
cascade of completely convolutional Jittering and dropout techniques. For
neural systems (CFCNs) and thick the DCNN-based approach and FP
3D conditional arbitrary areas. CT reduction, the prescreening stage is
gut pictures of fifteen hepatic tumors the same in both systems in the FP
are recognized and approved by reduction stage. CAD is a feature-
utilizing twofold cross-validation. based system consisting of 3D clus-
The probabilistic boosting tree tering, an active contour method for
is used to classify lesions or paren- segmentation, morphological, gray
chyma in the liver. Convolutional level, and texture features for the
layer with a fully connected layer linear discriminant classifier to mark
detect lesions and liver morphology. the detected masses. Breast tumor
Drug-induced liver damage (DILI) is is identified by utilizing the vicinal
recognized with the DL engineering back vector machine (VSVM), which
comprising profound neural organize is an improvement learning calcula-
(DNN), convolutional neural arrange tion. The preparing information is
(CNN), and repetitive or recursive clustered into diverse delicate vicinal
neural arrange (RNN). Liver tumor zones in highlight space through the
from CT is distinguished by utilizing part-based deterministic tempering
convolution neural organize that (KBDA) strategy. The collector
can discover fluffy boundaries, working characteristics (ROC) are
inhomogeneity densities, and shapes utilized to determine the degree of
and sizes of injuries. Irregular execution of the calculation.
timberlands, K-NN, and back vector Segmentation, quantitative
machine (SVM) with spiral premises features, and classification of the
work less and the lattice looks are malignant blood cell and blast cell
utilized for central liver injuries with and irregular lymphoid cells are
contrast-enhanced ultrasound. called abnormal lymphoid cells
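As an illustration of the ROC analysis mentioned above, the curve and its area (AUC) can be computed directly from classifier scores and ground-truth labels. The following minimal sketch uses toy scores, not data from the studies surveyed here:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC points and area under the curve from scores and 0/1 labels."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)[np.argsort(-scores)]  # sort by descending score
    pos = labels.sum()
    neg = len(labels) - pos
    tpr = np.concatenate(([0.0], np.cumsum(labels) / pos))       # sensitivity
    fpr = np.concatenate(([0.0], np.cumsum(1 - labels) / neg))   # 1 - specificity
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)) # trapezoid rule
    return fpr, tpr, auc

# Toy example: six lesions scored by a classifier, three truly malignant (1)
fpr, tpr, auc = roc_auc([0.9, 0.8, 0.7, 0.4, 0.3, 0.1], [1, 1, 0, 1, 0, 0])
```

The closer the AUC is to 1, the better the classifier separates malignant from benign cases; an AUC of 0.5 corresponds to chance.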
Segmentation, quantitative feature extraction, and classification are applied to malignant blood cells; blast cells and irregular lymphoid cells are known as abnormal lymphoid cells (ALC). Lymphocytic disorders such as chronic lymphocytic leukemia (CLL), splenic marginal zone lymphoma, hairy cell leukemia, mantle cell lymphoma, follicular lymphoma, B and T prolymphocytic lymphoma, large granular lymphocyte lymphoma, Sezary syndrome, plasma cell leukemia, blast cell leukemia (interrelated with both myeloid and lymphoid acute leukemia), reactive lymphocytes (RL) linked to viral infection, and normal lymphocytes (NL) are identified using ML algorithms. For the segmentation, two regions of interest (ROIs) are considered in the image: the nucleus and the cytoplasm. Morphological operations with watershed segmentation algorithms are used to find the ROIs. Features are extracted with the gray-level co-occurrence matrix (GLCM). They are then classified using a supervised algorithm, the support vector machine (SVM).

Robotics: Biomedical AI systems can be trained to interpret clinical data such as screening, diagnosis, and treatment assessment. Commonly used robots have improved and exhibit enhanced proficiency for the biomedical field, having proved their greater mobility and portability. Robots in medical science are used for surgery, rehabilitation, biological science, telepresence, pharmacy automation, companionship, and disinfection. Differences between AI programs and robots are as follows:

AI programs

• They are used to function in a computer-simulated universe.
• Input is given in the form of images and rules.
• To run these programs, general-purpose computers are required.

Robots

• Robots are used to function in the real physical world.
• Inputs are given in the form of analog signals, such as speech waveforms.
• For their working, special hardware with sensors and effectors is needed.

Robotic arm guidance is performed with seven gestures using a CNN deep learning network. Surgical robots are a type of robot used to assist in surgical procedures, whether by way of automated surgery, computer-assisted surgery, or robotically assisted surgery. Robotically assisted surgery is used to improve the performance of doctors during open surgery and to enable advanced minimally invasive surgery such as laparoscopy. It permits a broad range of movement and greater accuracy.
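As an aside to the blood-cell analysis described earlier on this page, the gray-level co-occurrence matrix (GLCM) and two classic texture features derived from it can be sketched as follows; the 4 x 4 "image" is a toy example, not real cell data:

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for pixel offset (dx, dy)."""
    g = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            g[image[y, x], image[y + dy, x + dx]] += 1  # count level pairs
    return g / g.sum()

def texture_features(p):
    """Two classic Haralick-style features of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))  # local intensity variation
    energy = float(np.sum(p ** 2))              # uniformity of the texture
    return contrast, energy

cell = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 2],
                 [2, 2, 3, 3]], dtype=int)
p = glcm(cell, levels=4)
contrast, energy = texture_features(p)
```

Feature vectors built from such GLCM statistics are what a supervised classifier like the SVM mentioned above would consume.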
Master controls permit the surgeons to control the instrument arms, translating the surgeon's natural hand and wrist movements into corresponding, precise, and scaled motions. One objective of the robotic surgery field is to design a robot that can be used to perform brain tumor surgery. Three surgical robots have been used recently, notably for heart surgery:

• da Vinci Surgical System,
• ZEUS Robotic Surgical System, and
• AESOP Automated System.

Robotic specialists such as the da Vinci surgical system and ZEUS play a critical part in heart surgery. Rehabilitation robotics provides rehabilitation through robotic devices, assisting different sensorimotor functions of the arm, hand, leg, and ankle. It also provides therapeutic training and serves as a therapy aid rather than an assistive device. Biorobots are designed and used in different fields, notably in genetic engineering, to mimic the cognition of humans and animals. Telepresence robots provide a remote network connection to communicate from distant locations. Pharmacy automation handles pharmacy distribution and inventory management and also involves pharmacy tasks such as counting small objects and measuring liquids for compounding. A companion robot emotionally engages with and provides company to users, and alerts them if there is any problem with their health. A disinfection robot acts as an agent that resists bacterial spores in a place or room rather than merely killing the microorganisms.

In automated fraud detection in the healthcare sector, the system monitors the employees involved in medical billing. Based on the prescription of the experts or practitioners, the bill has to be claimed for the delivery of the medicine; if medicines are wrongly keyed in or misspelt, this can lead to serious consequences. The trustworthiness of medical product invoices is important because delivery has moved online in recent years, so the analysis must be done carefully for the data associated with the different medicines issued for different problems. For online medical billing and medicine delivery systems, implementing AI efficiently will produce useful feedback, and ordinary people will learn from it automatically on their own. This biomedical application is integrated with health indicators and the users' working environment.

In pharmaceuticals, AI is implemented as an automatic machine used in manufacturing or discovering medicines. A deep neural network is implemented in the machines, with big data concepts, using a huge set of molecular
compound ratios for each medicine with different combinations. Medicines are formulated based on chemical compounds; the number of molecules and their ratio values determine how a medicine is produced. It is very difficult to produce a huge volume of medicine through manpower alone to meet the demand for medicine availability, so making this production an automated process saves time and avoids human errors. Although automated systems have been introduced in pharmaceuticals, monitoring is very important, and this can be done only by humans.

5.4 SUMMARY OF HEALTHCARE APPLICATIONS OF BIOMEDICAL AI SYSTEM

Healthcare applications of the biomedical AI system have been analyzed. Based on the literature survey of numerous studies, a summary of the major applications of biomedical AI systems is organized in Table 5.1.

TABLE 5.1 Survey of Healthcare Applications of Biomedical AI System

| Application | AI Algorithms | Description |
| Rich semantics | ML and NLP | Diagnoses, clinical treatments, or medications |
| Speech recognition | Deep learning network | Speech emotion recognition |
| Heart diseases | Artificial neural network, CNN, feed-forward network | Classification of stroke, heartbeat classification and congestive heart failure detection, invasive and noninvasive heart diseases |
| Brain tumor | Artificial neural network, GRNN, CNN, LSSVM, regression and deep learning, ResNet | Classification of images, normal and abnormal regions of the brain from MRI data, diagnosis and prognosis of Alzheimer's disease, segmentation of brain tumors |
| Cervical cancer | ML techniques: K-NN, C4.5, logistic regression, SVM, random forest, decision tree, Bayesian, FCM, Otsu | Detection and classification of Pap-smear test images |
| Physiological signals | Deep learning algorithms, RNN; deep learning used with EMG signals for limb movement | Skin sensing to find muscle activation, force and state of the muscle, and heart activity |
| Sleep state | RNN, CNN, RBM, DBM, LSTM, DNN | Emotion classification, arrhythmia detection, coronary artery disease signal identification and detection |
| Breast mass lesions | RNN with LSTM, VGGNet | Benign and malignant lesions, tumor classification |
| Breast tumor | Neural networks (GoogleNet, VGG, and CIFAR), SVM, DCNN, VSVM | Mammogram breast mass detection |
| Lung cancer | CNN, random forest classifier, MC-CNN | Detection and diagnosis |
| Liver and its lesions | CFCNs, DNN, CNN, RNN, random forest, K-NN, SVM, DCNN | Liver tumors, liver lesions |
| Malignant blood cells | ML algorithms, GLCM, SVM | Abnormal lymphoid cell and chronic lymphocytic leukemia diagnosis |
| Robotics | CNN | Surgical, rehabilitation, telepresence, pharmacy, companion, disinfection |
| Fraud detection | DNN | Medical billing |
| Pharmaceuticals | DNN | Medicine production |

5.5 CONCLUSION

AI techniques are applied to the biomedical healthcare system for analysis and prediction, where they provide the technical support required for human decision-making processes or automated systems. In the current era, AI has made steady improvements in providing greater support and assistance for a better lifestyle and living. The key points at the start of AI development must be clearly known so that new ideas emerge for the production of new and efficient AI systems. Basically, in AI, decisions are taken by the use of algorithms, and the use of algorithms has proved itself in giving the right results. Invariably, AI plays a major role in healthcare in taking the right decision, especially within a short span of time. An AI machine is a secure system, with predefined algorithms, that can secure or protect humans from
unsure situations; unlike human beings, machines are faster and can run longer. Since machines do not get tired, they can function continuously without a break.

The data collected from the implementation of AI machines will be huge, but only a small part of it can be used. As machine usage becomes massive, humans' dependency keeps increasing. Universally, the massive use of machines makes humans fade away in their research, experience, and scientific knowledge. Unlike other countries, India still manages to give jobs to the maximum number of people without giving machines too much importance. Once machines are used for every task and for automatic functioning, humans will face a severe crisis in finding jobs, which will create chaotic situations in their livelihood. A machine functions on commands, in which no creativity can be found; practically speaking, creativity alone can create a new and vibrant society. Machines are a creation of human creativity, and if we lose the crown of creativity by using machines, we will lose a vibrant and innovative society. Although AI has a few disadvantages, through its proper functioning the human mind is stimulated in cognitive processes and behaviors. It is a machine, but it is the creation of human effort and expertise. Although humans may be experts for a particular period of time, it is the machines that capture and preserve the human expertise. Machines also have the ability to comprehend large amounts of data quickly and can give fast responses to related queries. Finally, AI has to provide trustworthiness when data are reused for training or research purposes, whether to discover new things or to analyze existing ones for prediction.

KEYWORDS

• machine learning
• artificial intelligence
• deep learning
• big data
• cognitive technologies
• neural networks
• natural language processing
• cyber security
• robotics
• electronic health record system
• clinical decision support
• imaging modalities

REFERENCES

Faust, Oliver, et al. "Deep learning for healthcare applications based on physiological signals: a review." Computer Methods and Programs in Biomedicine 161 (2018): 1–13.
He, Jianxing, et al. "The practical implementation of artificial intelligence
technologies in medicine." Nature Medicine 25, 1 (2019): 30.
Jiang, Fei, et al. "Artificial intelligence in healthcare: past, present and future." Stroke and Vascular Neurology 2, 4 (2017): 230–243.
Johnson, Kipp W., et al. "Artificial intelligence in cardiology." Journal of the American College of Cardiology 71, 23 (2018): 2668–2679.
Kim, Jae Won. "Classification with Deep Belief Networks."
Lo'ai, A. Tawalbeh, et al. "Mobile cloud computing model and big data analysis for healthcare applications." IEEE Access 4 (2016): 6171–6180.
Mazurowski, Maciej A., et al. "Deep learning in radiology: an overview of the concepts and a survey of the state of the art with focus on MRI." Journal of Magnetic Resonance Imaging 49, 4 (2019): 939–954.
Paul, Yesha, et al. "Artificial intelligence in the healthcare industry in India." The Centre for Internet and Society, India (2018).
Pesapane, Filippo, et al. "Artificial intelligence as a medical device in radiology: ethical and regulatory issues in Europe and the United States." Insights into Imaging 9, 5 (2018): 745–753.
Ramesh, Dharavath, Pranshu Suraj, and Lokendra Saini. "Big data analytics in healthcare: a survey approach." Proceedings of the 2016 International Conference on Microelectronics, Computing and Communications (MicroCom). IEEE, 2016.
Rodellar, J., et al. "Image processing and machine learning in the morphological analysis of blood cells." International Journal of Laboratory Hematology 40 (2018): 46–53.
Salman, Muhammad, et al. "Artificial intelligence in bio-medical domain: an overview of AI based innovations in medical." International Journal of Advanced Computer Science and Applications 8, 8 (2017): 319–327.
William, Wasswa, et al. "A review of image analysis and machine learning techniques for automated cervical cancer screening from pap-smear images." Computer Methods and Programs in Biomedicine 164 (2018): 15–22.
Xu, Jia, et al. "Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives." Human Genetics 138, 2 (2019): 109–124.
Yu, Kun-Hsing, Andrew L. Beam, and Isaac S. Kohane. "Artificial intelligence in healthcare." Nature Biomedical Engineering 2, 10 (2018): 719.
CHAPTER 6

APPLICATIONS OF ARTIFICIAL INTELLIGENCE IN BIOMEDICAL ENGINEERING

PUJA SAHAY PRASAD1*, VINIT KUMAR GUNJAN2, RASHMI PATHAK3, and SAURABH MUKHERJEE4

1 Department of Computer Science & Engineering, GCET, Hyderabad, India
2 Department of Computer Science & Engineering, CMRIT, Hyderabad, India
3 Siddhant College of Engineering, Sudumbre, Pune, Maharashtra, India
4 Banasthali Vidyapith, Banasthali, Rajasthan, India
* Corresponding author. E-mail: puja.s.prasad@gmail.com

ABSTRACT

Artificial intelligence (AI) has now become very popular in various healthcare sectors. It deals with both structured and unstructured medical data. Common AI techniques include machine learning methods for structured data, such as neural networks and the classical support vector machine (SVM), as well as natural language processing and modern deep learning for unstructured data. The main disease areas where AI tools have been used are cancer, cardiology, and neurology. The development of pharmaceuticals via clinical trials can take a long time, even decades, and is very costly; therefore, making the process quicker and less expensive is the main objective of AI start-ups. AI thus has wide application in the field of biomedical engineering. AI can also help in carrying out repetitive tasks, which are time-consuming processes. Tasks such as computed tomography (CT) scans, X-ray scans, analyzing different tests, data entry, etc. can be done faster and more precisely by robots. Cardiology and radiology
are two such areas where analyzing the amount of data can be time-consuming and overwhelming. In fact, AI will transform healthcare in the near future. There are various health-related apps on phones that use AI, like Google Assistant, but there are also apps like Ada Health Companion that use AI to learn by asking smart questions, to help people feel better, to let them take control of their health, and to predict diseases based on symptoms. In expert systems, AI acts as an expert within a computer system that emulates the decision-making ability of a human expert; expert systems like MYCIN for bacterial diseases and CaDET for cancer detection are widely used. Image processing is very critical when it comes to healthcare because disease must be detected from X-ray, MRI, and CT scan images, so an AI system that detects minute tumor cells is really handy for the early detection of diseases. One of the biggest achievements is the surgical robot, the most interesting and definitely a revolutionary invention, one that can change surgery completely.

However, before AI systems can be deployed in healthcare applications, they need to be "trained" through data generated from clinical activities, such as screening, diagnosis, treatment assignment, and so on, so that they can learn similar groups of subjects, associations between subject features, and outcomes of interest. These clinical data often exist in, but are not limited to, the form of demographics, medical notes, electronic recordings from medical devices, physical examinations, and clinical laboratories. AI is intended to analyze medical reports and prescriptions from a patient's file, medical expertise, as well as external research, to assist in selecting the right, individually customized treatment pathway. Nuance Communications provides a virtual assistant solution that enhances interactions between clinicians and patients, improving the overall patient experience and reducing physician stress. The platform enables conversational dialogue and prebuilt capabilities that automate clinical workflows. The healthcare virtual assistant employs voice recognition, electronic health record integrations, strategic health IT relationships, voice biometrics, and text-to-speech, with prototype smart speakers customized for a secure platform. IBM Medical Sieve is an ambitious long-term exploratory project that plans to build a next-generation "cognitive assistant" capable of analytics and reasoning over a vast range of clinical knowledge. Medical Sieve can help in taking clinical decisions regarding cardiology and radiology, a "cognitive health assistant" in other terms, and can analyze radiology images to detect problems reliably and speedily.
6.1 INTRODUCTION

Artificial intelligence (AI) technology has opened massive opportunities in the medical healthcare system. The advantages of AI have fueled a lively discussion about whether AI doctors will in due course replace human doctors. It is difficult to believe that human doctors will be fully replaced by AI machines in the foreseeable future; rather, AI can definitely assist physicians so that they can make better clinical decisions, and it may even replace human judgment in certain functional areas of healthcare, such as radiology. The growing availability of medical healthcare data and the rapid development of big data analytic methods have made possible the recent successful applications of AI in medical healthcare. Mining medical records is the most obvious application of AI in medicine; collecting, storing, normalizing, and tracing the lineage of these data are the first steps in revolutionizing existing healthcare systems. Guided by relevant clinical questions, powerful AI techniques can unlock clinically relevant information hidden in huge amounts of data, which in turn assists medical decision making. This chapter discusses the current status of AI in healthcare, as well as its importance in the future, in the following subsections:

• motivations of applying AI in healthcare,
• AI techniques,
• disease types that the AI communities are currently tackling,
• real-time applications of AI, and
• future models.

6.2 MOTIVATIONS OF APPLYING AI IN HEALTHCARE

A number of advantages of AI have now been widely reported in the medical literature. AI can use refined algorithms to "learn" features from a large volume of healthcare data and then use them to build models that help in medical practice. AI is also equipped with learning and self-correcting abilities to improve its precision based on feedback. An AI system can help physicians by providing up-to-date medical evidence from textbooks, journals, and clinical practices to inform proper patient care. Another benefit of an AI system is that it helps to decrease the therapeutic and diagnostic errors that are inevitable in human medical practice. Furthermore, an AI system can extract valuable information from a large number of patients to help in making real-time inferences for health outcome prediction and health risk alerts. There are numerous advantages to using AI machines in healthcare, and AI services have found
application in many industries due to these advantages. Some of the advantages of AI are as follows:

• AI machines reduce errors in operations. They are highly accurate and have a high degree of precision.
• Robotics and AI machines have helped in surgical treatments, predicting disease from symptoms, monitoring patients, etc. As AI machines can work in any environment, exploration becomes easier.
• AI machines reduce the risk factor, as they can perform tasks that involve risks to human lives.
• AI is also a good digital assistant. It can interact with the user in the form of text, speech, or biometrics and can perform simple tasks that a human assistant would do.

Before AI systems can be deployed in health-related applications, they need to be "trained" using data produced from different clinical activities, such as diagnosis, screening, treatment assignment, and so on. The main objective is that they become able to learn similar groups of subjects, relations between different subject features, and the outcomes of interest. These medical data frequently exist in, but are not limited to, the form of medical notes, electronic recordings from medical devices, demographics, physical examinations, clinical laboratories, and images. Especially in the diagnosis stage, a considerable proportion of the AI literature analyzes data from genetic testing, diagnostic imaging, and electrodiagnosis.

6.3 AI TECHNIQUES

The AI techniques mostly used in the healthcare system are

• machine learning (ML) and
• natural language processing (NLP).

In this section, the main focus is on the different AI techniques that have been found useful in different types of medical applications. We categorize them into three subgroups:

• classical ML techniques,
• recent deep learning techniques, and
• NLP methods.

Classical ML builds data analytical algorithms that extract features from data. Inputs to ML algorithms include the qualities or "traits" of patients, as well as, sometimes, the medical results or outcomes of interest.
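As a minimal illustration of such inputs, patient traits can be encoded as fixed-length numeric vectors before being passed to an ML algorithm. The record fields below are hypothetical, chosen only to show the idea:

```python
import numpy as np

# Hypothetical patient records; field names are illustrative only.
patients = [
    {"age": 63, "gender": "F", "history": ["diabetes"], "tumor_size_mm": 12.0},
    {"age": 55, "gender": "M", "history": [], "tumor_size_mm": 7.5},
]
HISTORY_VOCAB = ["diabetes", "hypertension", "smoking"]

def encode(p):
    """Turn one patient record into a fixed-length numeric trait vector."""
    gender = 1.0 if p["gender"] == "F" else 0.0                      # binary trait
    history = [1.0 if h in p["history"] else 0.0 for h in HISTORY_VOCAB]
    return np.array([p["age"], gender, *history, p["tumor_size_mm"]])

X = np.vstack([encode(p) for p in patients])  # design matrix: one row per patient
```

Each row of `X` is then a point in trait space on which the algorithms below can operate.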
A patient's traits include disease history, age, allergies, gender, and so on, as well as disease-specific data like gene expressions, diagnostic imaging, physical examination results, electrophysiological (EP) tests, medications, and clinical symptoms. Not only the traits but also patients' medical outcomes are often collected in medical research; these consist of survival times, disease indicators, and quantitative disease levels like tumor size. ML algorithms can be divided into two major classes:

• unsupervised learning and
• supervised learning.

Unsupervised learning is well known for dimension reduction and clustering, while supervised learning is suitable for classification and regression. Semi-supervised learning is a hybrid of the two, suitable for circumstances where the outcome is missing for certain subjects.

FIGURE 6.1 Different algorithms of machine learning.

The main aim of supervised learning is to classify data and to train models for a particular outcome. Regression and classification are the two main methods of supervised learning. The best-known example of supervised learning is the automated analysis of the EKG used by cardiologists, in which pattern recognition is performed to select from a limited set of diagnoses: a classification task. In the field of radiology, automatic detection of a lung nodule in a chest X-ray is also supervised learning; here the computer approximates, with high accuracy, what a trained doctor already does. Supervised learning is also used to estimate risk: the Framingham risk score for coronary heart disease, which predicts 10-year cardiovascular risk, is a supervised, gender-specific ML algorithm. Principal component analysis (PCA) and clustering are two major methods of unsupervised learning. Regression is a significant statistical method for analyzing medical data; it helps in identifying and describing the relationships among multiple factors.

Clustering is the method of grouping subjects with similar traits together into groups, without using the outcome information.
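A minimal sketch of supervised classification in the sense described above is the nearest-neighbor rule: a new patient is labeled by a majority vote of the most similar training patients. The traits and labels here are toy values, not clinical data:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote of its k nearest training patients."""
    d = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each patient
    nearest = np.argsort(d)[:k]              # indices of the k closest patients
    return int(np.round(y_train[nearest].mean()))  # majority vote for 0/1 labels

# Toy traits: [age (scaled), tumor size (scaled)]; label 1 = disease present
X_train = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.2]])
y_train = np.array([1, 1, 0, 0])
prediction = knn_predict(X_train, y_train, np.array([0.85, 0.85]))  # -> 1
```

The same interface generalizes to any supervised classifier: traits in, predicted outcome out.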
Clustering algorithms output cluster labels for the patients by maximizing the similarity of patients within clusters and minimizing it between clusters. General clustering algorithms include hierarchical clustering, k-means clustering, and Gaussian mixture clustering. PCA is mainly used for dimension reduction, especially when a trait is recorded over a large number of dimensions, such as the number of genes in a genome-wide association study. PCA projects the data onto a few principal component (PC) directions without losing too much information about the subjects. Sometimes one can first use PCA to decrease the dimensionality of the data and then use a clustering algorithm to cluster the subjects.

Supervised learning, by contrast, considers the subjects' outcomes together with their traits; it goes through a training process to determine the best outputs, namely those associated with the corresponding inputs that are closest to the observed results on average. Usually, the output form varies with the outcome of interest: for example, the outcome can be the expected value of a disease level, the probability of experiencing a particular clinical event, or the expected survival time. Compared with unsupervised learning, supervised learning delivers more clinically applicable results, and this is why supervised learning is the more popular of the two in AI applications in healthcare. Significant techniques comprise logistic regression, naive Bayes, linear regression, decision tree, random forest, discriminant analysis, nearest neighbor, SVM, and neural network.

6.3.1 DEEP LEARNING

Deep learning is an extension of the classical neural network, with the ability to discover more complex nonlinear patterns in data. Another reason for the fresh popularity of deep learning is the increase in the complexity and volume of data. Deep learning differs from the typical neural network in that it consists of more hidden layers, so the algorithms can handle complex data with various structures. The convolutional neural network (CNN) is the most commonly used, followed by the recurrent neural network and the deep neural network. The CNN was developed in view of the ineffectiveness of classical ML algorithms when handling complex, high-dimensional data, that is, data with a large number of traits. Traditionally, ML algorithms are designed to analyze data when the number of traits is small. However, image data are naturally high-dimensional because each image contains thousands of pixels as traits. Dimension reduction is one solution, but applying ML algorithms this way risks losing information contained in the images.
Heuristic feature selection procedures may likewise lose information in the images. Unsupervised learning methods such as PCA or clustering can be used for data-driven dimension reduction. The CNN was first proposed and advocated for the analysis of high-dimensional images. The inputs to a CNN are the appropriately normalized pixel values of the images. The CNN then transforms the pixel values through weighting in the convolution layers and sampling in the subsampling layers. The final output is a recursive function of the weighted inputs.

6.3.2 NATURAL LANGUAGE PROCESSING

The main focus of NLP is to turn narrative text into a machine-understandable form. A great deal of clinical information, such as physical examinations, laboratory reports, discharge summaries, and operative notes, is unstructured and incomprehensible to a computer program. For this type of unstructured data, NLP's main target is to extract meaningful information from the narrative text so that clinical decision making becomes easier. (ML algorithms are more useful for genetic and EP data, as these data are easily understandable by the machine after quality control or preprocessing.) The main aim of NLP is thus to assist decision making by processing narrative text. The two main components of NLP are

(a) text processing and
(b) classification.

In text processing, disease-significant keywords are identified based on historical data. After that, a keyword subset is selected by inspecting the keywords' effects on the classification of abnormal and normal cases. The validated keywords then enter and enrich the structured clinical data to assist in medical decision making. NLP pipelines have been established to assist medical decision making in monitoring adverse effects, alerting treatment arrangements, and so on. Introducing NLP to analyze chest X-ray reports, for example, can help an antibiotic assistant system alert doctors to the need for anti-infective treatment. Laboratory-based adverse effects can also be automatically monitored using NLP. NLP also helps to diagnose diseases: for example, 14 variables associated with cerebral aneurysm disease were found to be successfully usable for classifying persons with cerebral disease versus normal persons. NLP has also been used to mine peripheral arterial disease-associated keywords from narrative clinical notes.
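The convolution-plus-subsampling computation described in Section 6.3.1 can be sketched as follows, using a single hand-chosen edge kernel rather than learned weights:

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: the 'subsampling' layer."""
    h, w = fmap.shape
    trimmed = fmap[:h - h % size, :w - w % size]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6) / 35.0  # normalized "pixels"
kernel = np.array([[-1.0, 1.0]])                       # horizontal gradient filter
fmap = np.maximum(conv2d(img, kernel), 0.0)            # convolution + ReLU
pooled = max_pool(fmap)                                # subsampled feature map
```

A real CNN stacks many such layers with learned kernels and ends in fully connected layers, so the final output is indeed a recursive function of the weighted inputs.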
132 Handbook of Artificial Intelligence in Biomedical Engineering

normal persons and patients with peripheral arterial disease, with 91% accuracy.

6.4 DISEASE TYPES CURRENTLY TACKLED BY AI COMMUNITIES

Stroke is a frequently occurring disease that affects more than 500 million persons worldwide. China and North America are among the regions with the highest death rates due to stroke. Medical expenses due to stroke are also high and put a heavy burden on families and countries. So, research on the prevention as well as the treatment of stroke has great significance. AI methods have been used more and more in stroke-related studies, as stroke is one of the main causes of death worldwide. The three main areas are prediction of the disease by early diagnosis, treatment, and outcome prediction as well as prognosis evaluation. Because early stroke symptoms are hard to detect and judge, only a few patients receive treatment on time. For predicting early stroke, movement-detecting devices already exist; PCA and a genetic fuzzy finite-state ML algorithm were implemented in such a device to build a solution. The detection procedure includes a stroke-onset detection stage and a human-movement recognition stage. The movement recognition stage helps to recognize abnormal behavior whose movement differs from the normal pattern. Collecting data about pathological gaits is also helpful for predicting a stroke. Hidden Markov models and SVMs are used here, and they could appropriately classify 90.5% of the subjects into the correct group.

MRI and CT are good neuroimaging techniques for disease evaluation. Several literature works found that applying ML methods to neuroimaging is useful for diagnosis. Some used a support vector machine on MRI data, which helps in identifying endophenotypes of motor disability after stroke. Some researchers also use a three-dimensional CNN for lesion segmentation in multimodal brain MRI; in this work, a fully connected conditional random field is used for postprocessing the CNN segmentation maps. Gaussian process regression is also used on anatomical MRI images of stroke patients. ML is also used to analyze the CT scans of patients.

After a stroke, a free-floating intraluminal thrombus may form as a lesion, which is hard to distinguish from a carotid plaque in CT imaging. For this, researchers use three ML algorithms to classify these two kinds by quantitative shape: SVM, linear discriminant analysis, and an artificial neural network. Treatment using ML has also been useful for analyzing and predicting stroke treatment performance.
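The classifiers surveyed above (hidden Markov models and SVMs for gait, SVM/LDA/ANN for thrombus versus plaque) all reduce to a decision rule over extracted features. A minimal sketch of such a rule, here a simple perceptron standing in for the SVM, with gait features, weights, and training data invented purely for illustration:

```python
# Minimal linear classifier over hand-crafted gait features, illustrating the
# kind of normal-vs-pathological decision the studies above perform.
# Features, training data, and labels are invented for illustration.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights w and bias b so that sign(w.x + b) matches the labels."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):        # y is +1 (pathological) or -1
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                        # update only on mistakes
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

# Each sample: (stride-time variability, gait asymmetry) -- toy values.
X = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
score = sum(wi * xi for wi, xi in zip(w, (0.85, 0.75))) + b
print("pathological" if score > 0 else "normal")  # prints "pathological"
```

A real system would learn such weights from labeled gait recordings; an SVM additionally maximizes the margin of this decision boundary, which is what gives it the robustness reported in these studies.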
Applications of Artificial Intelligence in Biomedical Engineering 133

The outcome of intravenous thrombolysis has a strong relationship with prognosis and the survival rate. Some researchers use an SVM to predict whether patients receiving tPA treatment would develop a symptomatic intracranial hemorrhage, based on CT scans. For this, full-brain images are used as the input to the SVM, which achieved more improvement than conventional radiology-based approaches. Another researcher proposed a stroke treatment model to improve the medical or clinical decision-making procedure of tPA treatment. Yet another stroke treatment model analyzes practice strategies, clinical trials, and meta-analyses using a Bayesian belief network; this model consists of 57 different variables and three conclusions to analyze the process of diagnosis, treatment, and outcome prediction. One researcher performed subgroup analyses using interaction trees to discover a suitable tPA dosage based on patient features, taking into account both treatment efficacy and the risk of bleeding.

Prognosis evaluation and outcome prediction are factors that can affect disease mortality and stroke prognosis. Compared to conventional approaches, ML procedures have benefits in improving forecast performance. To better support medical decision-making processes, a model was developed that predicts the three-month treatment result from the physiological factors recorded during the 48 hours after stroke, using a method called logistic regression. Some researchers accumulated a database of the clinical information of 110 patients with acute posterior and anterior circulation stroke who underwent intra-arterial therapy.

6.5 REAL-TIME APPLICATIONS OF AI

The growing number of applications of ML in healthcare allows us to glimpse a future where data, analysis, and innovation work hand in hand to help countless patients without them ever realizing it. Soon, it will be quite common to find ML-based applications embedded with real-time patient data available from different healthcare systems in multiple countries, thereby increasing the efficacy of new treatment options that were unavailable before.

6.5.1 ELECTRONIC HEALTH RECORDS

Handling medical records and other medical data is a very important part of the medical field. Clinical records are one of the most significant parts of managing patients. There are two reasons why records must be kept safely. First, they help in evaluating the patients.
Second, they help in planning treatment protocols, which is important for the doctor so that the clinical records of all patients are properly maintained. One more advantage is that records also assist the legal system, which relies mostly on documentary proof in cases of any medical negligence. So, it is important that clinical records are properly preserved and written to serve the interests of both the doctor and the patient.

A digital form of a patient's chart, called an electronic health record (EHR), is a patient-centered, real-time record that makes the available information securely and instantly accessible to authorized users. An EHR contains a patient's diagnoses, medical history, treatment plans, medications, treatment history, laboratory and test results, radiology images, and immunization dates. An EHR system allows access to evidence-based practice tools that help in making decisions for patients. It also streamlines and automates provider workflow.

One of the significant characteristics of an EHR is that facts related to a patient can be created and managed by certified or authorized providers in a proper digital format that can be shared with additional providers across more than one healthcare organization. A key aim of EHRs is to share information with other healthcare organizations and providers, such as specialists, medical imaging facilities, laboratories, pharmacies, emergency facilities, schools, and workplace clinics, so that they encompass information from all clinicians involved in a patient's care.

EHRs, as very large and networked healthcare delivery systems, are regularly seen as monolithic, inflexible, costly to configure, and difficult to use. They are typically obtained from commercial vendors and require significant time, consulting assistance, money, and support for implementation. The most popular systems are often built around older underlying technologies, and it often shows in their ease of use. Many healthcare systems find these systems complex and difficult to navigate, and it is rare that the EHR system is a good fit with their preferred care delivery processes.

As delivery networks grow and deploy broad enterprise EHR platforms, the challenge of making them help rather than obstruct clinicians is increasing. Clinicians' knowledge extends far beyond their clinical domain (care procedure knowledge, patient context knowledge, administrative process knowledge), and it is rare that EHRs can capture all of it efficiently or make it easily available (Simpson and Demner-Fushman, 2012).
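The record contents and cross-organization sharing described in this section can be modeled minimally as a data structure. The field and method names below are illustrative only and do not follow any real EHR standard (such as HL7 FHIR):

```python
from dataclasses import dataclass, field
from typing import List

# Minimal model of the EHR contents listed above. Field names are illustrative
# and do not follow any particular EHR standard.

@dataclass
class EHRecord:
    patient_id: str
    diagnoses: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
    lab_results: List[str] = field(default_factory=list)
    radiology_images: List[str] = field(default_factory=list)
    immunization_dates: List[str] = field(default_factory=list)
    authorized_providers: List[str] = field(default_factory=list)

    def share_with(self, provider: str) -> None:
        """Grant another provider access, mirroring cross-organization sharing."""
        if provider not in self.authorized_providers:
            self.authorized_providers.append(provider)

record = EHRecord(patient_id="P-001", diagnoses=["hypertension"])
record.share_with("imaging-lab")
record.share_with("pharmacy")
print(record.authorized_providers)  # ['imaging-lab', 'pharmacy']
```

In practice, sharing also involves authentication, audit logging, and consent management, which this sketch omits.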
Introducing AI into existing EHRs makes the EHR system more effective, intelligent, and flexible (Bernstam et al., 2010). A number of works are introducing AI capabilities into the medical field, but more work is required in this direction. Using ML algorithms on the text data of individual patients, their specific illnesses, and the treatment methods for those diseases is still at a developing stage.

Biotech and pharmaceutical groups are already under pressure to use resources as efficiently as possible. This pressure forces them to search for any opportunity to streamline processes so that they are more secure, faster, and lower in cost. Many life science companies are targeting biologics and precision medicine therapeutics, with focus shifting toward smaller, geographically distributed patient segments (Patel et al., 2009; Wilson, 2011). This shift has resulted in an increasing mandate to capture data from previously unavailable, nontraditional sources, including mobile devices, IoT devices, and in-home and clinical devices. Life science companies merge data from these sources with data from traditional clinical trials to build robust evidence around the safety and efficacy of a drug. Clinical decision support, an important capability that endorses treatment strategies, was rule-based and generic in the past (Zeng et al., 2010). To enable more personalized care, companies such as Change Healthcare, IBM Watson, and Allscripts are using learned data obtained from different clinical sources.

The main aims of introducing AI into an EHR system are to personalize treatment, to support data discovery and extraction, and to make the system more user friendly. Current EHRs are complicated and hard to use and are often seen as contributing to doctors' burnout. Nowadays, customizing EHRs to make them easier for doctors is mainly a manual and time-consuming process, and the systems' rigidity is a real obstacle to their improvement. AI, and ML specifically, could help EHRs continuously adapt to users' choices, improving both outcomes and clinicians' lives (Alexander, 2006).

Most current AI offerings are "encapsulated" as separate aids rather than integrated ones and require physicians to learn new interfaces. However, EHR vendors are now beginning to add AI capabilities to make their systems easier to use. Firms like Epic, Cerner, Allscripts, and Athena are adding capabilities like NLP and ML for clinical decision support, integration with telehealth technologies, and automated imaging analysis. This will provide integrated interfaces, access to data held within the systems, and multiple other
benefits, although this will probably happen slowly (Lymberis and Olsson, 2003).

Future EHRs should also be developed with the integration of telehealth technologies in mind (as is the EHR at One Medical). As healthcare costs rise and new healthcare delivery methods are tested, home devices such as glucometers or blood pressure cuffs that automatically measure and send results from the patient's home to the EHR are gaining momentum. Some companies even have more advanced devices, such as the smart t-shirts of Hexoskin, which can measure several cardiovascular metrics and are being used in clinical studies and at-home disease monitoring. Electronic patient-reported outcomes and personal health records are also being leveraged more and more as providers emphasize the importance of patient-centered care and self-disease management; all of these data sources are most useful when they can be integrated into the existing EHRs.

Most delivery networks will probably want to use a hybrid strategy: waiting for vendors to produce AI capabilities in some areas and relying on a third party or in-house development for AI offerings that improve patient care and the work lives of providers. Starting from scratch, however, is probably not an option for them. However necessary and desirable, it seems likely that the transition to dramatically better and smarter EHRs will require many years to be fully realized.

6.5.2 ROBOTIC APPLICATIONS IN MEDICINE

A wide range of robots has been developed to provide help in different roles within the clinical environment. Two main types of robots, rehabilitation robots and surgical robots, now specialize in human treatment. The field of therapeutic and assistive robotic devices is also growing rapidly. These robots help patients rehabilitate from severe or serious conditions like strokes; empathic robots help in the care of mentally or physically challenged elderly individuals; and in industry, robots help with a large range of routine tasks, such as delivering medical supplies and medications and sterilizing rooms and equipment. The areas where robots are engaged are given below.

• Telepresence

In telepresence, doctors use robots to help them examine and treat patients in remote locations and rural areas. Consultants or specialists on call can use robots to answer health-related questions and also guide therapy from far or remote locations. These robotic devices
have navigation capability within the electronic record and built-in sophisticated cameras for the physical examination.

• Surgical Assistants

Surgical assistant robots have been present since 1985 for remote surgery (also called unmanned surgery) and minimally invasive surgery. Robotic surgical assistance has many advantages, including decreased blood loss, smaller incisions, quicker healing time, less pain, and the ability to pinpoint positions very precisely. In this type of surgery, the main trademark is that remote-controlled hands are manipulated by an operator seated outside the operating theatre. Further applications for surgical-assistant robots are constantly being developed to give surgeons enhanced natural stereo visualization with augmented technology, as well as the spatial references required for very complex surgeries.

• Rehabilitation

People with disabilities, whose needs include improved mobility, coordination, strength, and quality of life, are assisted by rehabilitation robots that can be automated to adapt to the situation or condition of each patient separately as they recover from traumatic brain injuries or strokes, spinal cord injuries, or neurobehavioral and neuromuscular diseases like multiple sclerosis. Virtual reality integrated with rehabilitation robots also improves gait, balance, and motor functions.

• Medical Transportation Robots (MTRs)

This type of robot delivers meals and medications to staff and patients. By using MTRs, communication between hospital staff members, doctors, and patients is optimized. Self-navigation capability is one of the important characteristics of this type of robot (Horvitz et al., 1988). There is still a need for highly advanced indoor navigation based on sensor-fusion location technology so that the navigational capabilities of transportation robots become more robust.

• Sanitation and Disinfection Robots

These robots can disinfect a clinic or room containing viruses and bacteria within minutes. With outbreaks of infections such as Ebola and the increase in antibiotic-resistant strains, more healthcare facilities need robots that can disinfect and clean surfaces. Presently, the major methods used for disinfection are hydrogen peroxide vapors and ultraviolet light, so there is a need to introduce robots in this area.
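The sensor-fusion localization mentioned for transportation robots can be illustrated with the classic inverse-variance combination of two noisy position estimates; the sensors, readings, and variances below are invented for illustration:

```python
# Toy illustration of sensor-fusion localization: two noisy position estimates
# (e.g., wheel odometry and an indoor beacon) are combined by inverse-variance
# weighting. All numbers are invented for illustration.

def fuse(est_a, var_a, est_b, var_b):
    """Combine two estimates; the less noisy sensor gets the larger weight."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

# Odometry says 10.0 m (variance 0.04); the beacon says 10.6 m (variance 0.16).
pos, var = fuse(10.0, 0.04, 10.6, 0.16)
print(f"fused position = {pos:.2f} m, variance = {var:.3f}")
```

The fused variance is smaller than either sensor's own variance, which is why fusing multiple indoor location sensors makes a transport robot's navigation more robust; a full system would apply this update repeatedly over time, as in a Kalman filter.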
• Robotic Prescription Dispensing Systems

The major advantages of robots include accuracy and speed, two features that are very important for pharmacies (Reggia and Sutton, 1988; Patel et al., 1988). Robots in automated dispensing systems handle liquids, powders, as well as highly viscous materials with much higher accuracy and speed.

6.5.3 VIRTUAL NURSES

AI is a very big driving force that has started to change the way we do our day-to-day activities. Virtual nurses are one example of the application of AI to the nursing job, as there is always a shortage of nursing staff. Despite a lack of time and long distances from the hospital, people can get assistance and medical advice immediately through video conferencing, phone, e-mail, and chat, with instant gratification. Healthcare applications combined with AI could save the United States healthcare economy as much as $150 billion within five years, and among them, virtual nursing aides could save around $20 billion. A big challenge for the healthcare business is to upgrade commitment and care quality for patients by working together with developers to make solutions using AI that can be easily set up in the care delivery process.

Nowadays, people are too busy with their day-to-day activities, so home services for routine check-ups are in ever more demand. With this busy schedule, people want nurses to visit their homes and provide services. Medical assistant apps may be very useful in this type of situation. AI apps learn the behavior of nurses and medical staff, give people the proper required care, and also connect them with the proper care providers. Virtual Nurse is an AI that can provide useful and educational information on hundreds of illnesses, injuries, and medications. Medications are now listed on Virtual Nurse: it can tell a person about an illness and give detailed information about medications, including side effects and instructions. Users can also get first-aid advice, ask for medical advice, and know exactly what to do in the next emergency with Virtual Nurse on their devices.

6.5.3.1 CHARACTERISTICS OF VIRTUAL NURSE

• Helps in answering medical, health, and wellness-related questions.
• Gives reminders for appointments/follow-ups and medication administration.
• Symptom checkers are also present, which help diagnose related medical problems by enquiring about symptoms, searching the symptoms in a medical dictionary, and arriving at a probable diagnosis.
• Provides cardiopulmonary resuscitation (CPR) instructions to the person using it.
• Provides information about what an expecting mother will require at every stage of pregnancy in terms of food, medicine, and fluids.
• Instructs patients about the side effects caused by an active drug and advises how to cope with the drug.
• Helps patients by using visual and audio aids to do basic nursing work like changing bandages.
• Books an appointment with a general practitioner.
• Improves treatment observance rates through 24 × 7 availability.
• Helps to make lifestyle changes as well as corrections to encourage healthier living.
• Ensures data collection and helps in analysis, as humans just cannot make so many calls, record abundant information, and conduct analysis.

Chronic diseases like heart failure, cancer, diabetes, and chronic obstructive pulmonary disease have a continuous presence in a person's life. One virtual avatar app monitors a patient's health and also helps the patient with hospital readmissions, revisits, etc. It also helps patients get medicinal instructions after interacting with doctors (Tiwana et al., 2012; Poon and Vanderwende, 2010). For patients with chronic diseases who spend their lives in and around healthcare facilities, virtual nursing apps could be a blessing.

An AI-based Virtual Nurse application is compatible with various age groups. Such an app is based on a rule-based engine whose algorithms follow commonly acknowledged medical protocols for diagnosing and dealing with specific chronic diseases. The protocols and contents for the app can be delivered by partner clinics and hospitals. The patient's mood and modulations are used to qualify the app's answers. Such apps are being developed to collect signals about a patient's health and serve as an anonymous database of symptoms. One of the benefits of these apps is that they are able to integrate emotion analytics tools, conversational chat and voice assistants, and emotion recognition tools into the AI platform.
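The rule-based engine just described, in which acknowledged protocols map reported symptoms to a probable diagnosis, can be sketched minimally as follows; the rules and symptom lists are invented for illustration and are not medical advice:

```python
# Minimal sketch of a rule-based symptom checker like the one described above:
# symptom sets are matched against simple protocol rules to reach a probable
# diagnosis. The conditions and symptom lists are invented for illustration.

RULES = {
    "possible influenza": {"fever", "cough", "body ache"},
    "possible migraine": {"headache", "nausea", "light sensitivity"},
}

def probable_diagnoses(symptoms):
    """Return conditions whose required symptoms are all reported."""
    reported = set(symptoms)
    return [cond for cond, required in RULES.items() if required <= reported]

print(probable_diagnoses(["fever", "cough", "body ache", "fatigue"]))
```

A deployed engine would weight symptoms, ask follow-up questions, and escalate uncertain cases to a clinician rather than require an exact rule match.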
The virtual avatar app of nurses can also be programmed to perform detailed counselling for a behavioral health problem. This type of app talks to patients through a smartphone about their health condition. The patient has no need to type: they only talk to the virtual avatar about their health condition; the conversation can then be recorded and transcribed, and afterward it is reviewed by the health provider. The virtual avatars speak to the person empathetically and naturally, which might also benefit people who are ailing, elderly, or living with chronic diseases. By hearing an unusual voice or detecting an unusual emotional tone of a patient, such as depression or anxiety, the app performs emotional analysis and alerts the health provider, who may prescribe medications.

6.6 FUTURE MODELS

Research on developing advanced robots continues for an ever-expanding variety of applications in the healthcare area. For instance, a research team led by Gregory Fischer is developing a high-precision, compact surgical robot that will operate only within the MRI scanner bore, as well as the software and electronic control systems present in it, to improve the accuracy of prostate biopsy. Here, the main aim is to develop a robot for an MRI scanner that can work inside it. However, there are a number of physical challenges in placing a robot inside the scanner: the scanner uses a powerful magnet, so the robot must be made of nonferrous materials. Most of the technical difficulties have already been overcome by this team. Besides this, they need to develop the software interfaces and communication protocols for properly controlling the robot with planning systems and higher-level imaging (Holzinger, 2012). For the nontechnical surgical team, the robot must also be easily sterilized, easily placed, and easily set up in the scanner. Because of all this, it is a huge system integration assignment that requires many iterations of the software and hardware to get to that point.

In other projects, a rehabilitation robot is integrated with virtual reality to expand the variety of therapy exercises, increasing physical treatment effects and motivation. Nowadays, discoveries are being made using nanomaterials and nanoparticles; for example, nanoparticles easily traverse the blood–brain barrier. In the coming future, nanodevices can be
filled with "treatment payloads" of required medicine, injected into the body, and automatically guided to the exact target sites inside the body. Very soon, broadband-enabled digital tools will be accessible that use wireless technology to monitor internal reactions to medications. AI is playing a prominent role in the healthcare industry, from wearable technologies that can monitor heart rate and the number of calories you have burnt to apps that combat stress and anxiety. The number of advancements that AI pioneers have made in the healthcare industry is nothing short of spectacular. Are you wondering who the innovators and creators behind the technology that is vastly changing the medical industry are? Well, here are the top 10 companies that are using their advanced technology to make a positive difference in healthcare.

6.6.1 GOOGLE DEEPMIND HEALTH

Google is obviously one of the most renowned AI companies in the world. The AI division of the company is focusing more on applying AI models to advance healthcare. These new models are capable of evaluating patient data and forming diagnoses. Google believes that their technology will be able to develop entirely new diagnostic procedures in the near future.

6.6.2 BIO BEATS

Bio Beats is a leading app developer that specializes in improving patient well-being. Their apps focus on dealing with stress, which has a debilitating effect on health. Bio Beats uses sophisticated AI algorithms to better understand the factors that contribute to stress and the steps that people need to take to mitigate it. Their app is programmed to help people develop better resilience to stress, which will help improve their long-term health. The app offers a variety of features, which include helping promote better sleep, encouraging more physical activity, and conducting a regular assessment of your mood.

6.6.3 JVION

Jvion has a similar approach to Bio Beats. It is a company that focuses on improving long-term physical health by addressing stress and other cognitive factors. According to their website, the Jvion Machine has been used to help over two million patients. While the machine uses AI to identify physical risk factors that were previously undetected, the majority of risk factors are cognitive. The technology used to run this machine has found interesting correlations between various psychological factors and physical ailments. One
54-year-old man was suffering from deep vein thrombosis and was at risk of the problem exacerbating into an embolism. Fortunately, the machine was able to identify the socioeconomic problems that were contributing to his disease, and the doctors were able to develop a customized treatment plan to help save him.

6.6.4 LUMIATA

Lumiata is a company that offers a unique cost and return analytics technology that is used to improve the efficiency and cost-effectiveness of healthcare. While most other cutting-edge AI companies focus on developing AI models to improve the effectiveness of healthcare diagnostics, this company focuses on the financial side of the equation. Lumiata is a relatively new company, but it is already making a splash in AI-based healthcare. They secured $11 million in funding from several prominent investors this summer. The company is expected to get even more funding in the future as a growing number of investors see opportunities to use deep learning to improve healthcare analytics. This should significantly change the economic models behind the healthcare industry.

6.6.5 DREAMED

Managing diabetes can be incredibly frustrating and complex. The biggest problem is that every diabetic patient's needs are different: they have to account for gender, weight, and metabolic factors, and their diet also plays a key role in their diabetes management plan. DreaMed is a new company that has found new ways to improve the delivery of diabetes management services. It uses carefully tracked biometric data to help develop a custom treatment plan for every patient.

6.6.6 HEALINT

Healint is a company that focuses on helping patients manage chronic diseases. Their technology uses a wide range of data sources, which include information from mobile phones, wearable fitness devices, and patient records. This technology is able to provide some of the timeliest healthcare services available.

6.6.7 ARTERYS

Arterys is one of the most sophisticated AI solutions for medical imaging. This technology leverages cloud technology to perform some of the fastest medical computations imaginable. The medical imaging
analytics platform is known for its unprecedented speed and quality of imaging.

6.6.8 ATOMWISE

Atomwise is helping pharmaceutical companies make breakthroughs in rolling out new drugs. They have already helped companies find a number of new drugs much more quickly. The company is forming partnerships with other leading healthcare organizations to use AI to improve healthcare models. Last month, Atomwise entered into a relationship with Pfizer. A few months ago, they sponsored a new project that is helping over 100 academic institutes around the world to develop better data.

6.6.9 HEALTH FIDELITY

Processing payments is a key aspect of any healthcare system. Unfortunately, some healthcare payments involve a higher level of risk than others. Health Fidelity uses data to help mitigate the risks of taking these types of payments.

6.6.10 GINGER.IO

Ginger.io is one of the most cutting-edge data-driven solutions to improving mental health. They keep track of different patient inputs to establish causal links between various mental health issues and create custom patient plans.

6.6.11 IBM WATSON SYSTEM

The IBM Watson System is a good pioneer in this field. The Watson System includes both ML and NLP processing modules and has made encouraging progress in the field of oncology. Treatment recommendations for cancer patients from Watson are about 99% coherent with doctors' decisions. For AI genetic diagnostics, Watson collaborated with Quest Diagnostics to offer a better solution; in addition, the collaboration shows the impact on real-life clinical practices. By analyzing a patient's genetic code in Japan, Watson effectively identified a rare secondary leukemia triggered by myelodysplastic syndromes.

CC-Cruiser is a web-based application that supports decision proposals for ophthalmologists, as it enables high-quality medical care as well as individualized treatment for people in developing areas. Besides this, the software can also be used in teaching activities for junior ophthalmology students. One more research prototype, the cloud-based CC-Cruiser, connects the AI
system's back-end clinical activities with front-end input data. All the clinical data, like images, blood pressure, genetic results, medical notes, and so on, and demographic information, like age, sex, etc., are collected into the AI system. Using this information, the AI app produces suggestions, and these suggestions are sent to the physician to assist in clinical decision making. Feedback on whether the suggestions were right or wrong is also collected and fed back into the AI system to keep improving its accuracy.

6.7 CONCLUSION AND DISCUSSION

This chapter gave the motivation for using AI in healthcare organizations. Using various categories of healthcare data, it analyzed and charted the key diseases that take advantage of AI. After that, the two most classical AI techniques, SVM and neural networks, along with modern deep learning techniques, were discussed in the process of developing different AI apps for the healthcare industry. In any successful AI app, the ML component is used for handling different categories of data, such as genetic data, while the NLP component is used for mining or handling unstructured data. The healthcare data are used for training with more and more sophisticated algorithms before apps or systems are given to physicians to assist in diagnosing diseases and providing medical treatment suggestions.

KEYWORDS

• principal component analysis
• naïve Bayes
• linear regression
• decision tree
• random forest
• NLP

REFERENCES

Alexander, C.Y., 2006. Methods in biomedical ontology. Journal of Biomedical Informatics, 39(3), pp. 252–266.
Bernstam, E.V., Smith, J.W. and Johnson, T.R., 2010. What is biomedical informatics. Journal of Biomedical Informatics, 43(1), pp. 104–110.
Holzinger, A., 2012. On knowledge discovery and interactive intelligent visualization of biomedical data. In Proceedings of the International Conference on Data Technologies and Applications, DATA (pp. 5–16).
Horvitz, E.J., Breese, J.S. and Henrion, M., 1988. Decision theory in expert systems and artificial intelligence. International Journal of Approximate Reasoning, 2(3), pp. 247–302.
Lymberis, A. and Olsson, S., 2003. Intelligent biomedical clothing for personal health and disease management: state of the art and future vision. Telemedicine Journal and e-Health, 9(4), pp. 379–386.
Patel, V.L., Groen, G.J. and Scott, H.M., 1988. Biomedical knowledge in explanations of clinical problems by medical students. Medical Education, 22(5), pp. 398–406.
Patel, V.L., Shortliffe, E.H., Stefanelli, M., Szolovits, P., Berthold, M.R., Bellazzi, R. and Abu-Hanna, A., 2009. The coming of age of artificial intelligence in medicine. Artificial Intelligence in Medicine, 46(1), pp. 5–17.
Poon, H. and Vanderwende, L., 2010, June. Joint inference for knowledge extraction from biomedical literature. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 813–821). Association for Computational Linguistics.
Reggia, J.A. and Sutton, G.G., 1988. Self-processing networks and their biomedical implications. Proceedings of the IEEE, 76(6), pp. 680–692.
Simpson, M.S. and Demner-Fushman, D., 2012. Biomedical text mining: a survey of recent progress. In Mining Text Data (pp. 465–517). Springer, Boston, MA, USA.
Tiwana, M.I., Redmond, S.J. and Lovell, N.H., 2012. A review of tactile sensing technologies with applications in biomedical engineering. Sensors and Actuators A: Physical, 179, pp. 17–31.
Wilson, E.A., 2011. Affect and Artificial Intelligence. University of Washington Press.
Zeng, D., Chen, H., Lusch, R. and Li, S.H., 2010. Social media analytics and intelligence. IEEE Intelligent Systems, 25(6), pp. 13–16.
CHAPTER 7

BIOMEDICAL IMAGING TECHNIQUES USING AI SYSTEMS

A. AAFREEN NAWRESH1* and S. SASIKALA2

1Department of Computer Science, Institute of Distance Education, University of Madras, Chennai, India

2Department of Computer Science, Institute of Distance Education, University of Madras, Chennai, India

*Corresponding author. E-mail: anawresh@gmail.com

ABSTRACT

Artificial intelligence (AI) can be defined as that which makes machines think, work, and achieve tasks that humans generally do. AI built for medical diagnosing systems has promised to work at the rate of performing 1025 operations per second. AI will be helpful in providing much unconditional help that usually takes humans long hours a day to complete. AI will automate a heavy amount of manual work and speed up processing. The digital world would experience simple yet powerful systems that get tasks done in a jiffy. AI has not only enhanced medical diagnosis but also groomed the relevant areas by developing applications in biotechnology, biophysics, bioinformatics, genomics, and so on. One steady development is seen in medical imaging systems, since there is an increase in requirements whose satisfaction makes AI-based systems highly valuable. AI will help doctors, surgeons, and physicians to see more and do more at an earlier stage so that patient outcomes can be improved. AI-incorporated systems and applications would surely bring great change to the medical world and its evolutionary practices.

7.1 INTRODUCTION

Biomedical imaging focuses on capturing the images useful for both
148 Handbook of Artificial Intelligence in Biomedical Engineering

diagnostic and healing purposes. Imaging techniques provide unique, important details about tissue composition, skeletal damage and breakage, and also an explanation of many elemental biological procedures. In recent times, biomedical imaging science has transformed into a diverse and logical set of information and models, and it has attained a position of central importance in the field of medical research.

There were times when people used to travel long distances from their village or town to take blood tests and scans, and furthermore wait for the reports to come, which used to take a week or even a month. Magnetic resonance imaging (MRI) machines were less available in nearby hospitals, and many hospitals did not favor running them continuously for 24 h a day due to huge electricity bills. Computed tomography (CT) scans were used only in emergency situations, in the case of trauma or accidents. A CT scan generally costs a little more than the existing X-ray and also takes time in producing results. On the contrary, to get a clear and deep image of the tissues and nerves, a full-body MRI machine was invented, which was named "indomitable," meaning impossible to defeat. True to its name, it has been the best imaging system since then.

In this era of vast development, we can see that there are essential components in a medical device where patients can monitor themselves. Devices are being equipped with advanced techniques that help people perform tasks that were once done only in clinical test centers.

There are several promising applications of AI systems in the medical devices sector, and professionals are looking to take advantage of the impact of this technology. The attempt to build medical devices that are more reliable, compact, accurate, and automated is producing a growing interest in finding ways to incorporate AI systems. Medical imaging is an area that is progressively developing and improving devices to support the management and treatment of chronic diseases and will most likely continue to be a major area of focus.

In general, an AI system is a machine that will achieve jobs that a human can perform on their own. The task of achieving or completing objectives depends on the training phase given to the systems, where it learns to follow certain rules or protocols when input data is given. It also involves targeting self-realization of errors and correcting them to attain proper and accurate results. Artificial intelligence (AI) is an emerging field that invokes research into human reasoning in AI systems. Researchers have tried to set a major pace in the development of successful AI applications and systems. Apart from creating exquisite machines to
Biomedical Imaging Techniques Using AI Systems 149

make the work of capturing the region of interest easier, AI has also entered into creating possibilities of making other areas of research in medicine fascinating. Among the potential and efficient discoveries of AI, the establishment has been in the field of medicine; the various products discovered in the fields of biomedical research, translational research, and clinical practice are given in the figure below (Figure 7.1).

The scope involved in making tasks easier is by establishing an interface between the user and the application/system. This ensures a promising platform for adaptability to the tech world. Some of the applications that made tasks easier in the domain of medical science and research are discussed in the sections below.

FIGURE 7.1 Applications of AI in medicine.
7.1.1 BASIC BIOMEDICAL RESEARCH

Automated experiments: Accessing scientific data is clearly a very vast task, which needs efficient and rapid systems. If the systems are equipped, then there is efficient and quick recording of data, allowing experiments to perform exercises effectively. During the development phase, scientists and researchers augmented by computers were able to retrieve information from the data by creating an interface or bridge between science and data

science. And at this point of time, the complete size, dimensionality, and amount of scientific data have become so enormous that dependence on intelligent and automated systems is becoming the need of the hour. Algorithms are predominantly integrated to provide faster access and accurate results; the greater the requirement, the more efficient the algorithms need to be to provide better discovery in the processing of data.

Automated data collection: AI has started showing a great impact in the field of health care; it is expected to provide machines that will help doctors, nurses, and other technicians to save time on tasks. Techniques such as voice-to-text transcription will now help order tests, prescribe medicines, and provide chart notes. IBM's Watson has started to provide an opportunity to mine data and help physicians to bring out the best and most efficient treatment for patients. It renders help by analyzing millions of medical papers using natural language processing to provide treatment plans.

Gene function annotation: The major tasks involved in gene function annotation are to identify enriched biological subjects, ascertain improved function-related gene groups, group repeated annotation expressions, categorize and display the related many-genes-to-many-terms in a 2D view, search for existing related or similar genes that are not available in the list, define the proteins that interact with one another, provide names of genes in groups, list out the genes connected with diseases, emphasize protein functional domains and their motifs, highlight related literature reviews, and convert gene identifiers.

Prediction of transcription factor binding sites: Transcription factors are essential gene regulators that have a characteristic task in development, cell signaling and cycling, and their association with various diseases. There are thousands of position weight matrices (PWMs) that are accessible to choose from for the detection of explicit binding sites. This process is mainly used for prediction based on the PWMs, which have false-positive rates.

Simulation of molecular dynamics: Molecular dynamics is a computer simulation technique useful for analyzing the physical movements of atoms and molecules. Molecular dynamics simulation allows the study of complex and dynamic procedures that happen in a biological system. The study includes conformational changes, protein stability, protein folding, ion transport in biological systems, molecular recognition such as of proteins, DNA, and membranes, and also provides an urge to perform other studies such as drug designing,

structure determination by X-ray and nuclear magnetic resonance.

Literature mining: With an increase in research studies in the biomedical field every day, there is a need for automated systems that can be helpful to retrieve unknown facts and information from already published articles and journals. This automated system is known as text mining, which helps in preprocessing of documents, natural language processing, searching and retrieval of documents, and techniques useful for clustering and classification of the gathered information. Text mining methods will ease the mining of a huge amount of information on a topic from published research publications/articles and provide a concise conclusion that is otherwise impossible to get.

7.1.2 TRANSLATIONAL RESEARCH

Biomarker discovery: Biomarkers are biological descriptors that are efficiently measured and evaluated as indicators of biological procedures and pharmacologic responses to a therapeutic intervention. Biomarkers are effectively used to indicate disease occurrence, progress, and effectiveness of the medicinal treatment, patient vulnerability to developing a particular disease, or even to predict the usefulness of treatment at the stage of a particular disease. The most popular biomarker is the protein molecular biomarker because of the availability of a large range of systematic instrumentation, which is used to identify and quantify proteins in a compound biological sample.

Drug-target prioritization: Target prioritization helps to identify suitable targets like proteins, genes, and nonpeptide gene products for classification, since this is the most difficult step in biology, especially in annotating gene function, discovering drugs, and providing a molecular basis for diseases. For a gene, the treatment-versus-control differential expression level will be determined. To integrate information on the functional associations between proteins, the differential expression values are mapped to the STRING network. Genes are ranked with respect to the differential expression of their nearest neighbors in the network using the kernel diffusion method. The steps through the network depend on the size of the neighborhood considered. The correlation diffusion method is used to rank the genes using their differential expression and the tough connectivity correlation.

Drug discovery: In existing research, drugs were discovered either by identifying the active compound from existing remedies or through unexpected or

fortunate discovery. At present, new approaches have been made, first, to recognize how the disease and the particular infection can be controlled at the molecular and physiological level, and then to target the attributes based on the information gathered. The process involves recognition of candidates, synthesis, characterization, screening, and attempts at therapeutic value. If a compound has revealed its value in the tests, it will start the procedure for the development of drugs before trials.

Drug repurposing: Drug repurposing (also known as drug reprofiling or drug repositioning) is the technique of redeveloping a compound for usage in a different disease. This method is based on the fact that many of the recognized drugs and neglected compounds have already been tested on humans and their relevant information is available on their pharmacology, formulation, dosage, and possible toxicity. Drug repurposing has advantages over the established existing drug discovery approaches in that it considerably reduces the cost and creation time, and it also lessens the need to undergo clinical trials on humans.

Prediction of chemical toxicity: The prediction of complex toxicities is a significant component of the drug design and development procedure. Computational toxicity inferences are not only quicker than the determination of toxic dosage measures in animals but can also help to decrease the number of animal trials. As these are helpful in rapidly predicting compound drug-like features in the present decision-making system of drug discovery, AI technologies are being extensively accepted in creating fast and high-throughput in silico analysis. It is accepted that early screening of chemical attributes can effectively decrease the heavy costs related to late-stage failures of drugs due to poor properties. The toxicity prediction procedures and structure–activity correlation depend on the exact estimation and depiction of physicochemical and toxicological properties.

Genetic variant annotation: Genetic variant annotation and its management offer a solution for collecting and annotating genetic data results with the help of a huge impact on public resource domains. The system is efficient in handling thousands of millions of rows, filtering out from the input data the SNPs and genetic variants of interest. The queries that arise are to display genetic variants that are identified as probably pathogenic from a repository database while limiting the search to only those possibilities present in exon coding regions. They present rare genetic varieties that contain population possibilities that are lesser than 5% in the 1000

genomes when restricting the search variants to display their associations in public source studies. It is also helpful in annotating the association analysis results, rapidly displaying particular genetic types of a particular interest, also with linkage disequilibrium plots.

7.1.3 CLINICAL PRACTICE

Disease diagnosis: As is well known, diseases like cancer, meningitis, heart failure, brain tumors, lung cancers or strokes, kidney failure, liver sclerosis, and many other infectious diseases are being diagnosed effectively. With the rapid development of diagnostic techniques and systems, people easily get the information associated with their symptoms on the Internet. One must not get confused by the common symptoms and conclude that it could be such and such a disease, but rather visit a doctor immediately when the symptoms become severe. Nowadays, it is a common practice for physicians to give either an antibiotic or a sedative based on the patient's situation and balance their condition for a particular period. One must be aware of the fact that some diseases like meningitis need to be taken care of immediately rather than waiting for severe symptoms to show, as they have to be checked immediately and correctly using a CT scan and lumbar puncture to confirm whether the disease is caused by a bacterium or a virus. The application of machine learning in diagnostics has just begun, where the determined systems assess a disease or its growth with the help of multiple data sources such as CT, MRI, genomics and proteomics, patient data, and even handwritten medical chart records. AI systems will be essential in differentiating between normal, cancerous, and malignant lesions/patterns, which will be helpful to the doctors in drawing an inference from those predictions.

Interpretation of patient genomes: The interpretation of genomic data is an even more complex task than producing and organizing the data. The genome does not signify the identical thing for each person at every stage of time. The implication of particular variations depends on age, health conditions, and other contextual aspects through many life phases. Using cancer genomes, the sequence reveals a huge amount of data on the variations that can be used in classification, prediction, and therapeutic management. Therefore, it is practical to ask whether, in the future, the easy, cost-effective, and efficient assay will be the predictive power of tumor genome sequencing or even sequencing key variants. An important research aspect is whether every cancer is diverse; if true, then it is compulsory to skim

through huge amounts of genomic data to know each patient's disease.

Automated surgery: Currently, surgical robots are human–slave machines without data-driven AI. Computer vision systems will soon encompass surgical depth imaging for difficult tasks. The robots will use sensors to accumulate data during a surgical process, and hence they become semi-autonomous. Particularly, robots must have sensors that will help to analyze the surgical setting and recognize the placement of tools and instruments with respect to the setting in real time. Surgical automation also requires understanding how and where to navigate the instruments with a good level of accuracy at the time of operation. Three-dimensional (3D) integrated technologies will be used to provide precise depth interpretation of the surgical surface and also to estimate the position of the surgical instruments and the camera in contact with the surface. Such information/facts are required to build a data-driven platform for surgical robots. The information received from computer vision systems will ensure a better digital understanding of the surgical field. The information gathered creates a base for computer-driven robotic surgery, as robots will now be able to see through and comprehend the area where they are operating. This process surely enables the robot-assisted surgeon to perform the surgery securely without the risk of destroying vital structures through accidental crashes or unknown anatomy. In upcoming advancements, surgeons will be able to use robots to perform recurring jobs like providing first aid and suturing, which may free up time for the more essential areas of the operation. Patients belonging to rural hospitals who receive fewer facilities will also benefit from this safe and complication-free surgery.

Patient monitoring: Uninterrupted readings of a patient's parameters like blood pressure, respiratory rate, heart rate and rhythm, blood–oxygen saturation, and other parameters are essential factors to look for in critical patients. Quick and accurate decision making is essential for efficient patient care, where electronic monitoring is done consistently to gather and display physiological information. The data are gathered using harmless sensors from patients at hospitals, medical centers, delivery and labor wards, clinics, or even from patients' homes to sense any unpredicted serious life situations or even to document routine necessary data capably. Patient monitoring is generally defined as frequent or nonstop observations or measurements of the patient, their physiological functioning, and also the functioning of the life support

device, which is used for conducting and managing decisions like therapeutic recommendations and evaluation of those interventions.

Patient risk stratification for primary prevention: Risk stratification helps providers to recognize the accurate level of concern and service for diverse groups of patients. It is generally a process of assigning a risk status to a patient and then using the accumulated facts to direct care and improve the health conditions. The main aim of risk stratification is to partition/group patients into separate groups of similar complications and care needs. The groupings may be classified into highly complex, high-risk, rising-risk, and low-risk patients. Special care modules and strategies are used for every group.

1. Highly complex: This is a group containing a small number of patients, which needs intensive care. This group may contain a count of about 5% who have multiple complex illnesses, which may include psychosocial needs or barriers. Care models for this group of patients need exclusive, proactive intensive care management. The objective of this group is to make use of low-cost care management facilities to attain better health results while avoiding high-cost crises or unwanted acute care services.
2. High risk: This group contains 20% of the population count and includes patients with numerous risk factors that, when left unseen, will result in the shifting of patients into the highly complex group. This cluster of patients is suitable to hold in a planned care management program that will offer one-to-one support in the supervision of medical, community, and social requirements. A care manager assigned to work for this group ensures that every patient gets appropriate disease supervision and also preventive measures.
3. Rising risk: This group consists of patients who suffer from one or more chronic conditions or threats and those who do not have a balanced condition, i.e., sometimes they are fine and sometimes their condition goes down. The analysis done on this group showed that rendering care management services to this group of patients will reduce the count of patients who have moved to high-risk groups by 12%, which is a reduction compared to a 10% decrease

in the overall costs. The general risk factors include smoking, obesity, blood pressure, and cholesterol level monitoring. Recognizing these risk issues will enable the team to target the root cause of multiple conditions.
4. Low risk: This group belongs to the patients who follow a proper and healthy diet and are stable in condition. The patients in this group have slight conditions that can be managed effortlessly. The main objective of this low-risk model is to maintain the health care system and to keep the patients healthy.

7.2 LITERATURE REVIEW

Biomedical imaging in the field of radiography initiated the development of X-ray, CT scan, positron emission tomography (PET), MRI, and ultrasound, which produced good visualization of the affected areas.

X-ray: Medical imaging was initiated during the year 1895 when Wilhelm Conrad Roentgen discovered X-ray.1 X-ray is the basic machine commonly used to analyze bone and chest fractures (Figure 7.2). Many times radiologists found it difficult to correctly analyze the area of interest because of the interference of tissue or muscle mass, so fluoroscopy was used to overcome such situations. In the late 1920s, radiologists gave patients radio-opaque barium to swallow to see where the barium traversed as it entered the gastrointestinal tract. This was helpful in analyzing cancers formed in the stomach, esophagus, and bowel, as well as ulcers, diverticulitis, and appendicitis. Many diseases that were once diagnosed with fluoroscopy are now being easily analyzed using a CT scan. X-ray tomography was established in the 1940s, where "tomograms" or slices were obtained through tissues without an over- or underlying tissue being captured. This was attained by rotating the X-ray tube such that only the required region of interest was focussed on and captured during the rotation of the tube. In the current era, tomography is no longer being used, having been replaced with CT scans. To get to the point, both CT and MRI are tomography techniques that are useful to display the anatomy in slices rather than through projections like the working of X-ray. In the late 1950s, a new technique known as nuclear medicine entered the diagnostic imaging procedures. In these procedures, the source of X-rays was not X-ray tubes but radioactive compounds, which emit gamma rays as they start to decay. The test that is generally used today

is PET, where the isotopes emit positrons (positively charged electrons) instead of emitting gamma rays. Commonly, PET is based on the positron-emitting isotope of fluorine that is integrated into glucose, called fluorodeoxyglucose.

FIGURE 7.2 X-ray procedure of the back.

Computed tomography (CT): Before the initiation of CT in 1973, there were only plane films of the head showing the bones, or angiography showing the masses when the vessels of the brain were banished from their original position. Fundamentally, there was no way to directly image the brain. In CT, an X-ray tube rotates around the patient and various detectors pick up the X-rays that are not absorbed, reflected, or refracted as they pass through the body (Figure 7.3). Early CT units produced crude images on a 64 × 64 matrix. Early computers took all night to process these images. Today's multidetector-row CTs acquire multiple sub-millimeter spatial resolution slices with processing speeds measured in milliseconds rather than hours. Iodinated contrast agents are used with CT since they block X-rays based on their density compared with that of normal tissue.

FIGURE 7.3 Computed tomography scanning procedure.

Magnetic resonance imaging (MRI): MRI evolved in the 1970s, initially producing images with low spatial resolution through resistive magnets that had weak magnetic fields.2 The soft tissue evaluation of MRI was better than that of CT when an early diagnosis was made. MRI had an advantage in that it did not involve ionizing radiation like the X-ray-based CT scanners. Most medical MRI nowadays uses the hydrogen nucleus, since it is so plentiful in water and because its nucleus has a property known as spin. Another functional imaging method is called

"magnetoencephalography," or "MEG." MEG is similar to the well-known electroencephalography (EEG), and it is better than EEG at localizing signals coming from the brain. The electrical signals of EEG are distorted by the scalp and other tissues between the brain and the electrodes on the skin. The electrical currents picked up by EEG also create weak magnetic fields picked up by MEG, but without the interference of the scalp. The magnetic fields from the brain are various orders of magnitude weaker than the earth's magnetic field; MEG needs to be carried out in a special magnetically shielded room. Functional MRI is based on blood flow, and its resolution is on the order of seconds; MEG, on the other hand, works on the order of milliseconds. The magnetic signals identified by MEG are typically presented on a 3D MRI that has been blown up like a balloon.

Magnetic resonance imaging (MRI) is a diagnostic system used to generate images of the body (Figure 7.4).3 It requires no radiation, since it is based on the magnetic fields of the hydrogen atoms in the body. MRI is proficient at presenting computer-generated images of the human's internal organs and tissues. MRI generally scans the body in an axial plane (splitting the body into slices from front to back). The images are generally in 2D, meaning the MRI images are accessible in slices from top to bottom. Nevertheless, by means of useful computer computation, the 2D slices can be fixed together to construct a 3D model of the area of interest that was scanned; thus, it is called 3D MRI. MRI is available for a huge variety of analyses like heart-vessel functioning, liver-bile duct diseases, chest imaging, nerve conditions in the brain, and orthopedic situations like shoulder and hip injuries. 3D MRI will give clear information on conditions like cardiovascular pathology and hepatobiliary pathology. The 3D reconstruction of the nervous system is also done. MRI utilizes strong magnets that create a powerful magnetic field that forces protons in the body to line up with that field.4 When a radiofrequency current is pulsed into a patient, the protons get excited and turn out of equilibrium, straining against the pull of the magnetic field. When the radiofrequency current is turned off, the MRI sensors will be able to perceive the energy that is released as the protons realign with the magnetic field. The time required for the protons to realign with the magnetic field and the amount of energy released change according to the environment and the chemical nature of the molecules. The differences between various types of tissues are distinguished by the physicians based on the magnetic properties.

FIGURE 7.4 Magnetic resonance imaging machine.

Ultrasound: Ultrasound was initially used medically during the 1970s. Unlike X-ray and nuclear medicine, ultrasound uses only sound waves and no ionizing radiation (Figure 7.5). When the sound waves pass through the tissue and are reflected back, tomography images are produced and tissues can be categorized. For example, a lump or a mass found on a mammogram can be additionally categorized as solid (cancer) or cystic (benign). Ultrasound is, in addition, valuable for the noninvasive imaging of the abdomen and pelvis, as well as imaging the fetus during pregnancy. Early medical ultrasound units were large equipment with articulated arms that formed low-resolution images. These days, ultrasound is performed by a portable unit no larger than a laptop.

FIGURE 7.5 Ultrasound machine.

Sphygmomanometer: A sphygmomanometer is a device that is used to measure blood pressure.5 It consists of an inflatable rubber cuff wrapped around the arm (Figure 7.6). A measuring device indicates the cuff's pressure. A bulb inflates the cuff and a valve releases pressure. A stethoscope is further used to listen to arterial blood flow sounds. Blood is forced through the arteries as the heart beats, and this causes a rise in pressure known as systolic pressure, which is followed by a decrease in pressure as the heart ventricles get ready to perform another beat; this low pressure is known as diastolic pressure. The

systolic and diastolic pressures are stated as systolic "over" diastolic, for example, 120 over 80. Blood flow sounds are known as Korotkoff sounds. There are three types of sphygmomanometers: mercury, aneroid, and digital (Figure 7.6). Digital sphygmomanometers are automated to provide a blood pressure reading without making someone operate the manually used cuff or even listen to the sound of blood flow through a stethoscope. Though physicians use a digital sphygmomanometer for testing, they still prefer manual sphygmomanometers for validating the readings in some situations. On the other hand, the manual sphygmomanometers comprise aneroid and mercury devices. The operation of the aneroid and mercury devices is the same, except for the aneroid device requiring periodic calibration.

FIGURE 7.6 Types of sphygmomanometer devices.

At present, we can see that many Android applications are being created that predict blood pressure and monitor the heart rate with the flash of the mobile. Applications such as heart rate monitor, blood pressure diary, instant heart rate, cardiac diagnosis, and many more provide support in an emergency situation, but how accurate they are in giving results is still a question for the digital world.

Computers provided aid to the world of medical imaging from the early 1970s with the start of the CT scan and then with the MRI scan. CT was the main advancement that primarily allowed numerous tomography images (slices) of the brain to be obtained. As technology advancement started to develop, there have been many changes, both dimensionally and algorithmically, only to increase the speed and accuracy
Biomedical Imaging Techniques Using AI Systems 161

of the system and application, and also to decrease the workload of physicians and doctors. The technologies that have provided good services to the medical field are discussed below.

7.2.1 BIOMEDICAL IMAGING TECHNIQUES USING AI

Companies are incorporating AI-driven proposals in medical scanning devices to improve image clarity and clinical results while reducing exposure to radiation. General Electric (GE) Healthcare, in collaboration with the NVIDIA AI platform, claims to improve the speed and accuracy of existing CT scans. The algorithms are powered to recognize small patterns in the damaged organ that seem to have gone unnoticed when the physician was skimming the scan. The finer details captured can help support faster diagnosis and reduced error rates. GE Healthcare claims that the CT scan system developed will be two times faster than the existing system, and it will probably be useful for quickly detecting liver and kidney lesions because of the high volume of data accessible through NVIDIA’s AI platform.

Machine learning algorithms can be trained to see patterns in the same way doctors see them. The main difference is that algorithms require a lot of real examples, thousands and millions, to learn. Also, these examples have to be precisely digitized, as machines cannot read between the lines in textbooks. Hence, machine learning is mainly useful in areas where the analytical diagnostic data that a doctor examines is already digitized: identifying lung cancer using CT scans, assessing the risk of unexpected cardiac death or other heart ailments using electrocardiograms and cardiac MRI images, classifying skin lesions in skin images, and finding markers of diabetic retinopathy in eye images.

The newly developed devices tend to provide quick and accurate medical decisions to help avoid loss of life. One such handy device is the Accu-Chek,6 which provides a complete solution for diabetes monitoring: people can test frequently for better knowledge of their condition and control it accordingly. It gives the result in just a few seconds at temperatures from 57 to 104°F. The device works by having the patient insert a biosensor into the meter and drop in a tiny drop of blood. The biosensor collects the drop of blood and performs a sequence of tests, which undergo an enzymatic chemical reaction followed by an electrochemical reaction.
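The measurement chain just described, an enzymatic reaction producing a current that the meter converts into a glucose reading, can be sketched as follows. The calibration coefficients here are invented for illustration; a real meter’s calibration comes from the test-strip manufacturer.

```python
def glucose_mg_dl(current_ua, slope_ua_per_mg_dl=0.025, intercept_ua=0.5):
    # Invert a linear calibration curve: current = slope * concentration + intercept.
    # Slope and intercept are hypothetical values for illustration only.
    if current_ua < intercept_ua:
        raise ValueError("current below the sensor's baseline")
    return (current_ua - intercept_ua) / slope_ua_per_mg_dl

# With these made-up coefficients, a measured current of 3.0 microamperes
# maps to (3.0 - 0.5) / 0.025 = 100 mg/dL.
```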
On the other hand, the electronic meter is used for measurement, storage, and communication. It applies potential differences in a programmed manner to the sensor, collects biamperometric current data, and later records and displays the results.

Medtronic is one of the medical device manufacturers making its way into the AI world, aiming to help diabetic patients handle their situation more efficiently.7 In September 2016, Medtronic announced its Sugar.IQ app, developed in collaboration with IBM Watson, a mobile assistant that helps to track the amount of glucose in food items, specify a diet chart, and suggest therapy-related actions through the sensors. It continuously monitors blood glucose levels by examining the data generated from Medtronic glucose sensors and insulin pumps, which are attached to the patient’s or user’s body. The application’s features consist of a smart food logging system, motivational insights, a glycemic aide, and an information tracker. The analysis is done with the help of machine learning algorithms and the Internet of Things to predict the outcomes. The company also developed the MiniMed 670G system, which delivers the insulin level (self-adjusting baseline insulin) needed at a given time, every five minutes. The Sugar.IQ app is made accessible to consumers of the Medtronic Guardian Connect system. It is the first smart standalone Continuous Glucose Monitoring system4 intended to empower individuals with diabetes who take multiple daily injections, giving them actionable equipment to help manage high and low glucose levels. With support for customizable predictive alerts up to 60 min before a high or low occurs, plus the Sugar.IQ support, the Guardian Connect continuous glucose monitoring system offers individuals more options to stay in range than any other continuous glucose monitoring system.

Apple Watch users can now take a look at their heart rhythm just by holding the crown of the device.8 The software update provided to the Apple Watch Series 4 adds a new feature to identify atrial fibrillation and also provides extra passive monitoring. People aged 22 and above can utilize the features provided to differentiate between a normal heart rate, atrial fibrillation, and sinus rhythm. An optical heart sensor uses green LED lights paired with light-sensitive photodiodes to detect blood volume pulses in the wrist using a “photoplethysmography”-based algorithm, an easy and inexpensive optical method that detects blood volume variation in the microvascular area of tissue.
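The photoplethysmography idea above, detecting a pulse peak for each heartbeat and deriving the rate from the spacing between peaks, can be sketched as below. The synthetic waveform, sampling rate, and thresholds are illustrative assumptions, not Apple’s actual algorithm.

```python
import math

def detect_peaks(signal, fs, threshold=0.5, refractory_s=0.3):
    # Local maxima above a threshold, at least one refractory period apart.
    peaks, last = [], -10**9
    for i in range(1, len(signal) - 1):
        if (signal[i] > threshold and signal[i] >= signal[i - 1]
                and signal[i] > signal[i + 1] and (i - last) / fs >= refractory_s):
            peaks.append(i)
            last = i
    return peaks

def heart_rate_bpm(signal, fs):
    peaks = detect_peaks(signal, fs)
    # Inter-beat intervals in seconds; their mean gives the beat period.
    ibis = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 / (sum(ibis) / len(ibis))

# Synthetic PPG-like pulse train: 1.25 beats per second, i.e., 75 bpm.
fs = 100
signal = [max(0.0, math.sin(2 * math.pi * 1.25 * i / fs)) ** 3
          for i in range(4 * fs)]
```

On this synthetic trace the estimate comes out at about 75 beats per minute; a real wrist signal would first need filtering for motion artifacts.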
To verify heart rate variability, Apple Watch captures a tachogram, a plot of the time between heartbeats, every 2–4 h. It also allows the user to send messages to friends showing their heart rate level in case of emergency, and it recommends that the user visit a doctor to seek proper medical care if irregular symptoms are seen. The watch works with iPhone models 5 and above; though it can work with fewer features when not connected to the phone, the iPhone has to be connected for it to work efficiently. The Apple Watch integrates innovation in hardware, software, and user interface design such that users can interact with it through sight, sound, and touch. Apart from monitoring the heartbeat rate, people can answer calls, send messages, keep track of appointments and meetings, listen to music, find routes to destinations, and even unlock doors. It is customizable with a lot of user-desired accessories.

Entrepreneur Jonathan Rothberg of Butterfly Network proposed to create a new handheld medical-imaging device that aims to make both MRI and ultrasound scans easier and cheaper.9 Butterfly iQ’s device uses semiconductor chips instead of piezoelectric crystals. It uses ultrasound scanners to create 3D images, then sends the images to a cloud service, which further works on enhancement, zooming in on identifying features in the images, and helping to automate the diagnoses. Measurements on images can be performed with up to four linear measurements and one elliptical dimension. Butterfly iQ has a built-in battery to reduce drawing power away from the mobile. A wireless charging base is provided: one places the iQ on the charger base with the battery indicator facing upwards, and charging completes in less than 5 h. The battery lasts up to a 10-h shift or nearly 2 h of continuous scanning. The service will incorporate deep learning techniques into the device. The ultrasound-on-chip technology will replace the existing transducer and system with a silicon chip. The plan is to automate many of the medical imaging procedures, and the device will be made available for usage in clinics, retail pharmacies, and poorer regions of the world at an affordable price of about $1999. One added advantage is that Butterfly Cloud is provided to users of the Butterfly iQ mobile app: the cloud is a storage web application that helps users upload case studies to an Internet-based storage system. Butterfly Cloud is available for purchase by an individual and also by a team. The offering comprises limitless archiving, anonymous sharing, and secure access through an iPhone or a laptop. All local
and in-country privacy and security regulations are observed, certifying that the data is protected and secure.

DeepMind has taken on many healthcare projects across the world; now, in collaboration with UCL’s radiotherapy department, it has set out to reduce the amount of time taken to plan treatments.10 Through machine learning, DeepMind has been given access to 1 million images of eye scans, along with their patient data. It sets out to train itself to read the scans and spot early signs that may indicate the occurrence of degenerative eye disease, and also to reduce the time taken for diagnosis to one fourth. The convolutional neural network that was built for this system was the first deep learning model intended to effectively learn control policies straight from a high-dimensional sensory input using the reinforcement learning algorithm. This remarkable accomplishment was soon improved by succeeding forays into gaming. During 2015, DeepMind, in collaboration with the Royal Free NHS Trust, created a patient safety application called “Streams,” which reviews clinical test results to check for signs of sickness and sends notifications and alerts to staff instantly if an emergency examination is required. The application also helps physicians to rapidly check for other critical or serious conditions like acute kidney injury, and it provides results of blood tests, X-rays, and scans at the press of a button. Nurses and other assistants said that the application saved them up to 2 h in a day.

Google Glass was initially helpful for recognizing text and translating it, recognizing objects and searching for relevant matches, looking at posters and playing videos, and getting directions on the go, all of this happening in front of the eye.11 Some editions of Google Glass had no lenses in them; what all editions had was a thick area of the frame over the right eye, where Google had inserted the screen for the glasses. To look upon the screen, one has to peek up with the eyes. The region of placement was quite important, since a screen inserted in the direct line of vision could result in serious problems. The display has a resolution of about 640 × 360 pixels, making it on the low side for mobile devices. The camera has about 5-megapixel quality, and it records videos at about 720p. The only issue is the battery life, which lasts for about 5 h of average usage; taking a longer video or using the glass for a longer time might drain the battery quickly. Google Glass has a storage capacity of about 16 GB, and it also synchronizes with Google Drive for added accessibility to the videos and photos taken by
the user. It is also equipped with a micro-USB port for transferring files and charging the device. The frame is generally lightweight, and it has a replaceable nose pad in case of accidental breakage. Sounds of phone calls and other notifications are produced through bone conduction, passing vibrations directly to the skull and thus transmitting sound to the ears. The glass is an optical head-mounted display worn as a pair of spectacles. Its multitasking capability and responsiveness to hands-free voice and motion commands gained acknowledgment in the medical field, where doctors can actually use it during surgery as a surgical navigation display. The first-ever surgery using Google Glass was done by Dr. Marlies P. Schijven in 2013, at the Academic Medical Centre, Netherlands. In the operation theatre, doctors can see the medical data without even having to turn away from the patients. Researchers have found that the navigation options helped surgeons find tumors, though the display can also produce a form of tunnel vision or inattentional blindness that could make them miss unconnected lesions or hazards around them. Google Glass will further be helpful in recording the surgery for documentation purposes, to keep track of the patient’s medical record, and to assess surgical competency for ongoing professional skill development and certification. The start of Google Glass will provide a technical change in the way people get to understand the world.

An ultrasonography exam takes quite a lot of time in identifying the planes in the brain, which needs an ample amount of training and manual work.12 There could also be a missed or delayed diagnosis. Now, with AI systems, users will just need to find a starting point in the fetal brain, and the device will automatically take measurements after identifying the standard planes of the brain. The data and documentation are maintained, as the patient may visit for examination some other day; this will help in a more positive diagnosis.

EchoNous has developed a convolutional neural network for the automatic detection of the urinary bladder with the help of high-quality ultrasound images captured with Uscan, using the advantage of the “high spatial density fanning technique.”13 With the help of the captured image, one can compute the urinary bladder volume with much higher accuracy. Uscan actively recognizes the contours of the bladder, and the measurements are more accurate than the results obtained from existing scanners. EchoNous Vein is an ultrasound-based device intended particularly for nurses to improve peripheral IV catheter placements.
It is being developed to handle a wide variety of patients, including both adults and children. EchoNous Vein offers immediate, crisp images at depths from 1 to 5 cm for rapidly visualizing superficial and deeper veins, with just two buttons to control.

Cancer can be diagnosed promptly with the help of deep learning and AI concepts.14 A Chinese start-up named “Infervision” uses image recognition technology and deep learning to efficiently diagnose the signs of lung cancer from X-rays, the same way Facebook recognizes faces in photographs. Infervision trains its algorithms routinely with data obtained from varied sources. Usually, it takes doctors almost 20 min to analyze an image, but Infervision’s AI helps process the available visuals and generate a report within 30 s. Data is passed through networks of nodes, where these networks adapt to the data that has been processed from node to node. In this manner, the neural networks effectively process the next bit of data that comes along while keeping a record of the data that came before it. This ability to learn from the data and effectively teach itself is what made deep learning become dominant. In the initial training phase, X-ray images were used to teach the system to predict whether an image was normal or abnormal.

In May 2015, a Canadian ophthalmologist announced that he had created a bionic lens that could correct a person’s vision for life, effectively resulting in vision three times better than 20/20.15 The product came to life after eight years of research and about $3 million in funding; the Ocumetics Bionic Lens is said to involve a painless 8-min in-office procedure that requires no anesthesia. The researcher folds up the custom-made lens like a tiny taco (a Mexican dish) to fit it into a saline-filled syringe, then uses the syringe to place it in the eye through a super-small incision and leaves it there to unravel over about 10 s. And it is finally done. The bionic lens is made to replace the eye’s natural lens; the process may get rid of any risk of cataracts in the future. As cataracts may release chemicals that raise the risk of glaucoma and other issues, it also helps in protecting overall eye health.

Diagnosing a disease or illness needs a lot of facts about the patient, and AI seems well suited to collecting information, as it has a lot of memory and energy and does not even need to sleep or rest.16 One such AI system has been developed at the University of California, San Diego, where Professor Kang Zhang and his team trained an AI system on the medical records of over 1.3 million patients who had visited medical clinics and hospitals in Guangzhou, China.
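A system like the one described above learns a mapping from annotated chart text to diagnoses. The sketch below is a deliberately tiny stand-in, a naive Bayes bag-of-words classifier over invented symptom notes; it is not the actual model or data used by Zhang’s team.

```python
import math
from collections import Counter, defaultdict

# Toy training records: free-text symptoms annotated with a diagnosis
# (invented data, mimicking the annotated charts described above).
RECORDS = [
    ("fever itchy rash fluid blisters", "chickenpox"),
    ("fever sore throat swollen glands fatigue", "glandular fever"),
    ("high fever dry cough body aches chills", "influenza"),
    ("mouth sores rash on hands and feet fever", "hand-foot-mouth disease"),
]

def train(records):
    word_counts = defaultdict(Counter)   # per-diagnosis token counts
    vocab = set()
    for text, label in records:
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, vocab

def predict(text, word_counts, vocab):
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = 0.0  # equal priors, so only token log-likelihoods are summed
        for tok in text.split():
            # Laplace smoothing so unseen tokens do not zero out a class.
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

WORD_COUNTS, VOCAB = train(RECORDS)
```

With these toy records, a query such as “itchy blisters and fever” scores highest for chickenpox.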
The patient data taken for training belonged to age groups all under 18 years old, covering visits to doctors between January 2016 and January 2017. The records associated with every patient contained medical charts written as free text by doctors, along with a few laboratory test inferences. To ease the work performed by the AI, Zhang and his team had doctors annotate the records to mark the parts of the text linked with the patient’s issues, the period of illness, and the tests performed. When the testing phase began using unseen cases, the AI was efficient enough to identify roseola, chickenpox, glandular fever, influenza, and hand–foot–mouth disease, giving an accuracy rate of about 90%–97%. It may not be a perfect score, but we should remember that even doctors cannot predict correctly at times. The performance was compared with that of some 20 pediatricians with varying years of clinical experience. The AI outperformed the junior doctors, while the senior doctors performed better than the AI. When doctors are busy looking after about 60–80 patients in a day, they can accumulate only a little information, and fatigue may lead them to make mistakes in recognizing the seriousness of a disease or illness. That is where an AI can be counted on. AI can be efficiently used to check patients in emergency sections, provided the AI is able to predict the level of illness from the available data and also confirm whether the patient needs to visit a doctor or it is just a common cold. It is likely that junior doctors who depend on this AI system could miss out on their learning and on checking patterns in patients’ queries. The team looks forward to training AI systems that can also diagnose adult diseases.

A team from Beth Israel Deaconess Medical Center and Harvard Medical School has developed an AI system to predict disease, based on training the system to investigate pathological images and perform pathological diagnosis.17 The AI-powered systems incorporate machine learning and deep learning algorithms, training the machines to understand the complex patterns and structures found in real-world data by creating a multilayer perceptron neural network, a procedure that mimics the way learning occurs in layers of neurons. In an evaluation where researchers were given slides of lymph node cells and required to identify whether they were cancerous or not, the automated diagnosis method gave an accuracy of about 92%, which nearly matched the success rate of the pathologist, who gave an accuracy of about 96%. Recognizing the presence or absence of metastatic cancer in a patient’s lymph nodes is important work done by pathologists.
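A multilayer perceptron of the kind mentioned above computes its output layer by layer. The sketch below runs a forward pass with hand-set weights on two invented cell features (nuclear size and shape irregularity); real pathology models learn millions of weights from labeled slides, so every number here is illustrative only.

```python
import math

# Hand-set weights for a 2-input, 2-hidden-unit, 1-output perceptron.
# A real system learns these values from training slides.
W1, b1 = [[1.0, 1.0], [-1.0, -1.0]], [0.0, 1.0]
W2, b2 = [[2.0, -2.0]], [-1.0]

def dense(x, W, b):
    # One fully connected layer: each output is a weighted sum plus a bias.
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(W, b)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_predict(features):
    hidden = relu(dense(features, W1, b1))   # first layer of "neurons"
    logit = dense(hidden, W2, b2)[0]         # output layer
    return sigmoid(logit)                    # probability-like score
```

With these weights, a feature vector like [1.5, 1.0] (large, irregular nuclei) scores above 0.5, while [0.2, 0.1] scores below it.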
Looking into the microscope to skim through several normal cells to find a few malignant cells is an arduous task using conventional methods. This task can be done efficiently by a computer with high processing power, and that was the case proved here. The system was trained to differentiate between normal and cancerous tumor regions through a deep multilayer convolutional network. The system extracted millions of small training samples and used deep learning to build a computational model to classify them. The team later identified the specific training samples that the system failed to classify and retrained it with a greater number of these difficult training samples, thereby improving the performance of the computer. It is a good sign that AI will change the way we see pathological images in the coming years.

Emedgene is the world’s first fully automated genetic interpretation platform, incorporating advanced AI technology to considerably restructure the interpretation and evidence presentation processes.18 Emedgene is connected with advanced AI technology to radically scale genetic interpretations and findings. The AI knowledge graph codifies the complex and endlessly updated web of variants, genes, mechanisms, and phenotypes that reside at the heart of the interpretation process. At the center of this determined mission lies Emedgene’s capability to ingest and analyze unstructured information from current scientific publications, adding a wealth of data to the accessible structured data. A set of custom machine learning algorithms frequently discovers clinically significant associative models within the AI knowledge graph. With the help of these algorithms, Emedgene can effectively identify the potentially contributing mutation for both recognized and unidentified genes. Science is deeply embedded in each and every aspect of the Emedgene platform. A meticulous standard operating procedure guarantees the quality and accuracy of the AI knowledge graph. The genomic research department leads to the improved development of clinically successful variant discovery algorithms. Emedgene keeps working to solve the unsolved cases routinely: the samples are reanalyzed when applicable new scientific results enter the knowledge graph, and the AI knowledge graph is regularly updated, integrating the structured and unstructured data from the latest scientific literature. It contains an automatic evidence builder that presents all of the available data points that lead to a particular variant identification, such that it can offer the most time-effective decision support system for geneticists.
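The retraining strategy used by the pathology team above, finding the samples the current model misclassifies and training again with extra copies of them, can be sketched with a toy one-dimensional nearest-centroid classifier. All data here is invented; real systems do the analogous thing with gradient retraining on mined image patches.

```python
def centroid(values):
    return sum(values) / len(values)

def train(data):
    # One centroid per class label.
    return {label: centroid(xs) for label, xs in data.items()}

def predict(x, centroids):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def misclassified(data, centroids):
    return [(label, x) for label, xs in data.items()
            for x in xs if predict(x, centroids) != label]

# Toy 1-D "texture score" per image patch (invented numbers).
data = {"normal": [1.0, 2.0, 3.0, 8.0], "tumor": [10.0, 11.0, 12.0]}
model = train(data)
hard = misclassified(data, model)   # the atypical normal patch (8.0) is missed
for label, x in hard:               # oversample hard examples and retrain
    data[label].extend([x] * 3)
model = train(data)
```

After one round of this mining loop, the previously missed patch is classified correctly.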
The AI interpretation engine locates the reason behind the genetic disease and provides proof for an obvious path to clinical conclusions. The clinical workbench is a fully featured lab solution that takes the user from analysis to reporting. Using Emedgene, healthcare providers are able to offer individual care to more and more patients, with the help of high resolution rates and probably improved yield.

7.2.2 CHALLENGES

As there are progressive developments in medical AI systems, there will be a foreseeable demand in their medical use and operation, which may create new social, economic, and legal problems. Geoffrey Hinton, one of the great researchers in neural networks, predicts that AI will bring extreme changes in the field of medical science and practice. AI will improve the value of care by decreasing human mistakes and reducing doctors’ tiredness caused by routine clinical practice. But still, it may not decrease the workload of a doctor, since clinical rules may recommend that diagnosis necessarily be carried out very quickly for high-risk patients. Even though AI systems and applications are well equipped to provide quick solutions, in every support and decision-making situation nothing is possible without the intervention of a doctor. Getting to the point, AI can never replace a doctor at any point in time, but can only provide support.

7.3 CONCLUSION

The field of AI has developed its knowledge over the last 50 years. AI is poised to transform many portions of present medical practice in the foreseeable future. AI systems can improve medical decision-making, ease disease analysis, recognize formerly unrecognized imaging or genomic patterns related to patient phenotypes, and aid in surgical interventions for a variety of human diseases. AI applications even include the possibility of carrying medical knowledge to isolated areas where experts are inadequate or not accessible.

Even though AI promises to change medical tradition, many technical disputes lie ahead. As machine-learning-based processes depend deeply on the accessibility of a huge quantity of premium-quality training data, care must be taken to collect data that is representative of the end patient group. Data from various healthcare centers will contain different types of noise and bias, which will cause a model trained on one hospital’s data to fail to generalize to a different one.
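The generalization failure just described can be made concrete with a toy experiment: train a decision threshold on scores from one hospital, then apply it to a second hospital whose scanner shifts every score. All numbers are invented for illustration.

```python
def train_threshold(pos, neg):
    # Decision threshold at the midpoint between the two class means.
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(pos) + mean(neg)) / 2.0

def accuracy(threshold, pos, neg):
    correct = sum(x > threshold for x in pos) + sum(x <= threshold for x in neg)
    return correct / (len(pos) + len(neg))

# Hospital A: model trained and evaluated on the same site's score scale.
a_pos, a_neg = [8.0, 9.0, 10.0], [1.0, 2.0, 3.0]
threshold = train_threshold(a_pos, a_neg)   # midpoint = 5.5
acc_a = accuracy(threshold, a_pos, a_neg)

# Hospital B: a different scanner biases every score downward by 5 units,
# so the threshold learned at hospital A no longer separates the classes.
b_pos = [x - 5.0 for x in a_pos]
b_neg = [x - 5.0 for x in a_neg]
acc_b = accuracy(threshold, b_pos, b_neg)
```

Accuracy is perfect in-distribution but collapses at the second site, which is why site-specific recalibration or domain adaptation is needed in practice.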
Motivated by knowledge disparities inherited from past and future human civilizations, AI may quickly reveal the kind of potential it could offer to human development a decade, a century, or a millennium from now. Many high-end machine-learning models produce outcomes that are complicated for unaided humans to understand. An AI system would absorb an enormous quantity of extra hardware once it attains a certain threshold of proficiency. A faster-emerging artificial intelligence system represents a greater scientific and technological challenge.

In this era of advancement, we can see that machines are learning while people are hooked on their mobiles, knowing the fact that it is simply a tool we fiddle with. As advancement gradually moves up just to ease humongous tasks with the help of machines, people may become jobless. AI-based applications and machines are now being used to clean the dishes, serve as a waiter for 24 h on a single charge, check the cash balance at the bank, talk when you are bored and answer your mysterious questions, help you in cooking, recite a poem or even sing a lullaby when you are insomniac, suggest that you drink water, check the recent missed calls and dial a number, give you the recent weather updates, drive you home safe in a driverless car, check for the shortest route to reach your destination quickly, and what not.

AI has improved medical analysis and decision-making performance in numerous medical fields. Physicians may necessarily adapt to their fresh practice as data accumulators, presenters, and patient supporters, and the medical education system may have to give them the equipment and techniques to do so. How AI-enhanced applications and systems will impact existing medical practice, including disease analysis, detection, and treatment, will probably be determined by how AI applications combine with healthcare systems that are under revolutionary financial development with the adaptation of molecular and genomic science. Who will benefit from, or be controlled by, AI applications and systems is yet to be determined, but striking a balance between rigid regulatory safeguards and market services, so as to certify that people and patients benefit the most, should be of great priority. AI is a one-road challenge, and it is the road we will end up taking.

In this article, we came across plenty of applications and systems that have started to create wonders in the field of medical science, such as drug discovery, automated data collection, literature
mining, simulation of molecular dynamics, prediction of chemical toxicity, disease diagnosis, automated surgery, patient monitoring, and many more. Every potential discovery approached has helped a lot of researchers and medical practitioners in making their work easier and also providing competent efficiency.

This is just the beginning of a storm. The more medical data is digitized and unified, the more AI will be used to help us discover important patterns and features; these can be used to make accurate, cost-effective decisions in composite analytical processes.

KEYWORDS

• artificial intelligence
• medical diagnosis
• digital world

ENDNOTES

1. Bradley, William G. History of medical imaging. Proceedings of the American Philosophical Society 152(3) (2008): 349–361. Retrieved from http://www.umich.edu/~ners580/ners-bioe_481/lectures/pdfs/2008-09-procAmerPhilSoc_Bradley-MedicalImagingHistory.pdf
2. Retrieved from https://www.lanl.gov/museum/news/newsletter/2016-12/x-ray.php
3. Retrieved from https://www.myvmc.com/investigations/3d-magnetic-resonance-imaging-3d-mri/
4. Retrieved from https://www.khanacademy.org/test-prep/mcat/physical-sciences-practice/physical-sciences-practice-tut/e/the-effects-of-ultrasound-on-different-tissue-types
5. Retrieved from https://www.practicalclinicalskills.com/sphygmomanometer
6. Hill, Brian. Accu-Chek Advantage: Electrochemistry for Diabetes Management. CurrentSeparations.com and Drug Development 21(2) (2005). Retrieved from http://currentseparations.com/issues/21-2/cs21-2c.pdf
7. Retrieved from https://www.mobihealthnews.com/content/medtronic-ibm-watson-launch-sugariq-diabetes-assistant
8. Retrieved from https://electronics.howstuffworks.com/gadgets/high-tech-gadgets/apple-watch.htm
9. Retrieved from https://www.butterflynetwork.com
10. Retrieved from https://deepmind.com
11. Retrieved from https://electronics.howstuffworks.com/gadgets/other-gadgets/project-glass.htm
12. Retrieved from https://www.diagnosticimaging.com/article/how-ai-changing-ultrasounds
13. Retrieved from https://www.businesswire.com/news/home/20180711005836/en/EchoNous-Vein-Receives-FDA-Approval-New-Innovation
14. Retrieved from https://www.bernardmarr.com/default.asp?contentID=1269
15. Retrieved from http://ocumetics.com/
16. Retrieved from https://www.newscientist.com/article/2193361-ai-can-diagnose-childhood-illnesses-better-than-some-doctors/
17. Retrieved from https://healthcare-in-europe.com/en/news/artificial-intelligence-diagnoses-with-high-accuracy.html
18. Retrieved from https://emedgene.com/
CHAPTER 8

ANALYSIS OF HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUES

N. HEMA PRIYA,* N. GOPIKARANI, and S. SHYMALA GOWRI

Department of Computer Science, PSG College of Technology, Coimbatore-14, Tamil Nadu

*Corresponding author. E-mail: nhemapriya@gmail.com

ABSTRACT

Analytical models are automated with the method of machine learning, a predominant evolution of artificial intelligence. In health care, the availability of data is high, and so is the need to extract knowledge from it for effective diagnosis, treatment, etc. This chapter deals with heart disease, which is considered to be an obvious reason for the increase in the mortality rate. A method to detect the presence of heart disease in a cost-effective way becomes essential. The algorithms considered are K-nearest neighbors, decision tree, support vector machine, random forest, and multilayer perceptron. The performances of these algorithms are analyzed, and the best algorithm for heart disease prediction is identified. The results show that the machine learning algorithms work well for heart disease prediction. The model is trained by applying different algorithms: each algorithm fits and trains the model using a training dataset, and the model is then tested on a separate test dataset for each of the algorithms. The accuracy is highest with the random forest algorithm, at 89%, which is considered to be better than the other techniques.

8.1 INTRODUCTION

Health care is one of the most important areas of huge knowledge. Extracting medical data progressively becomes more and more necessary for predicting and treating high-death-rate diseases such as heart attack.
a heart attack. Hospitals can make use of appropriate decision support systems, thus minimizing the cost of clinical tests. Nowadays, hospitals employ hospital information systems to manage patient data, and terabytes of data are produced every day. To avoid the impact of poor clinical decisions, quality services are needed. The huge data generated by the health-care sector must be filtered, for which effective methods to extract the relevant data are needed. The mortality rate in India increases due to noncommunicable diseases. Data from health organizations like the World Health Organization and the Global Burden of Disease study state that most deaths are due to cardiovascular diseases.

Heart disease is a predominant reason for the increase in the mortality rate. A method to detect the presence of heart disease in a cost-effective way becomes essential. The objective of this chapter is to compare the performance of various machine learning algorithms to construct a model that gives better accuracy in terms of prediction.

8.2 OVERVIEW OF HEART DISEASE

Heart diseases or cardiovascular diseases are a class of diseases that involve the heart and blood vessels. Cardiovascular disease includes coronary artery diseases (CADs) like angina and myocardial infarction (commonly known as a heart attack). There is another heart disease called coronary heart disease, in which a waxy substance called plaque develops inside the coronary arteries, which are primarily responsible for supplying oxygen-rich blood to the heart muscle. When plaque accumulates in these arteries, the condition is termed atherosclerosis. The development of plaque happens over many years. Over time, these plaque deposits harden or rupture (break open), which eventually narrows the coronary arteries and in turn reduces the flow of oxygen-rich blood to the heart. Because of these ruptures, blood clots form on the plaque surface, and the size of the blood clot makes the situation more severe: a larger blood clot can block flow through the coronary artery. As time passes, the ruptured plaque hardens and eventually results in the narrowing of the coronary arteries. If the blood flow is stopped and not restored very quickly, that portion of the heart muscle begins to die.

When this condition is not treated as an emergency, a heart attack occurs, leading to serious health problems and even death. A heart attack is a common cause of death worldwide. Some of the symptoms of the heart
attack (Sayad & Halkarnikar, 2014) are listed below.

• Chest pain: The most common symptom used to diagnose a heart attack is chest pain. If someone has a blocked artery or is having a heart attack, he may feel pain, tightness, or pressure in the chest.
• Nausea, indigestion, heartburn, and stomach pain: These are some of the often overlooked symptoms of a heart attack. Women tend to show these symptoms more than men.
• Pain in the arms: The pain often starts in the chest and then moves towards the arms, especially on the left side.
• Dizziness and lightheadedness: Things that lead to the loss of balance.
• Fatigue: Simple chores that begin to set off a feeling of tiredness should not be ignored.
• Sweating.

Some other cardiovascular diseases that are quite common are stroke, heart failure, hypertensive heart disease, rheumatic heart disease, cardiomyopathy, cardiac arrhythmia, congenital heart disease, valvular heart disease, aortic aneurysms, peripheral artery disease, and venous thrombosis.

Heart diseases develop due to certain abnormalities in the functioning of the circulatory system or may be aggravated by certain lifestyle choices like smoking, certain eating habits, a sedentary life, and others. If a heart disease is detected earlier, it can be treated properly and kept under control. Here, early detection is the main objective. Being well informed about the whys and whats of the symptoms that present will help in prevention (Chen et al., 2011).

Health-care researchers note that the risk of heart disease is high and that it changes the life of the patient all of a sudden. There are various causes of heart disease; some of them are changes in lifestyle, genes, and smoking. Numerous genetic variations increase heart disease risk. When heart disease is accurately predicted, its treatment, which includes the intake of cholesterol-lowering drugs, insulin, and blood pressure medications, can be started.

Prediction is not easy; the accurate prediction of a heart attack needs constant follow-up of cholesterol and blood pressure for a lifetime. The formation of the plaques that cause a heart attack needs to be identified, which is much more sensitive for the patient. The health-care industry has made its contribution to the early and accurate diagnosis of heart disease through various research
activities. The function of the heart is affected by various conditions that are termed heart disease. Some of the common heart diseases are CAD, cardiac arrest, congestive heart failure, arrhythmia, stroke, and congenital heart disease.

The symptoms that predict heart disease depend upon the type of heart disease; each type has its own symptoms. For example, chest pain is one of the symptoms of coronary artery disease, but not all people have the same symptoms as others; some may have chest pain as a symptom of indigestion. The doctor confirms the heart disease with the diagnostic report of the patient and various other parameters. Some of the most common heart diseases are listed in Table 8.1 with their description.

TABLE 8.1 Most Common Cardiac Diseases^a

Sr. No | Cardiac Disease | Explanation
1 | Coronary artery disease (CAD) | The condition where the circulatory vessels that supply oxygenated blood to the heart get narrowed. This occurs due to a deposition of plaque.
2 | Cerebrovascular disease (CVD) | A type of CVD associated with the circulatory vessels that supply blood to the brain, causing the patient to have a stroke.
3 | Congenital heart disease (CHD) | Most commonly identified as birth defects in newborn children.
4 | Peripheral arterial disease (PAD) | A condition caused by reduced blood supply to the limbs due to atherosclerosis.

^a Source: Sachith Paramie Karunathilake and Gamage Upeksha Ganegoda, 2018, "Secondary Prevention of Cardiovascular Diseases and Application of Technology for Early Diagnosis," Hindawi BioMed Research International, Article ID 5767864.

8.3 MOTIVATION

It is evident from recent statistics that the major cause of death for both males and females is heart disease. In 2017, nearly 616,000 deaths were caused by heart disease. Hence, the need for an efficient and accurate prediction of heart disease is increasingly high.

The motivation for this problem comes from the World Health Organization estimation. According to the World Health Organization, until 2030 very nearly 23.6 million individuals will die because of heart diseases. So, to minimize the danger, the prediction of coronary illness ought to be carried out. The diagnosis of coronary illness is typically in
view of signs, symptoms, and the physical examination of a patient. The most troublesome and complex task in the medical services area is finding the right ailment.

Providing quality service at an affordable cost is a major challenge for health-care organizations and medical centers. Quality service includes the proper diagnosis of patients leading to the administration of effective treatments. The main objective of this chapter is to analyze the performance of various machine learning algorithms for heart disease prediction. The future expects the usage of the above techniques for eliminating the existing drawbacks and improving the prediction rate, thus providing a way to improve the survival rate for the well-being of mankind.

8.4 LITERATURE REVIEW

Each individual cannot be equally skilled, and hence specialists are needed. Each specialist will not have similar talents, and hence we do not have effortless access to authorized specialists.

8.4.1 MACHINE LEARNING

Machine learning (Hurwitz, 2018) is a powerful set of technologies that can help organizations transform their understanding of data. This technology is totally different from the ways in which companies normally present data. Rather than beginning with business logic and then applying data, machine learning techniques enable the data to create the logic. One of the greatest benefits of this approach is to remove business assumptions and biases that can cause leaders to adopt a strategy that might not be the best. Machine learning requires a focus on managing the right data that is well prepared. Proper algorithms must be selected to create well-designed models. Machine learning requires a perfect combination of data, modeling, training, and testing.

In this chapter, we focus on the technology underpinning that supports machine learning solutions. Analytical models are automated with the method of machine learning. A machine learns and adapts from learning. Machine learning helps computers to learn and act accordingly without an explicit program. It helps the computer to learn a complex model and make predictions on the data. Machine learning has the ability to calculate complex mathematics on big data. Independent adaptation to new data is achieved through the iterative aspect of machine learning (Ajam, 2015). Machine learning helps to analyze huge, complex data accurately. Heart disease predicting systems
built using machine learning will be precise, and they reduce the unknown risk. Machine learning techniques take completely different approaches and build different models relying upon the sort of information concerned. The value of machine learning technology is recognized in the health-care industry with its large amount of data. It helps medical experts to predict the disease and leads to improved treatment.

Predictive analytics (Hurwitz, 2018) helps anticipate changes based on understanding the patterns and anomalies within data. Using such models, research must be done to compare and analyze a number of related data sources to predict outcomes. Predictive analytics leverages sophisticated machine learning algorithms to gain ongoing insights. A predictive analytics tool requires that the model is constantly provided with new data that reflects the business change. This approach improves the ability of the business to anticipate subtle changes in customer preferences, price erosion, market changes, and other factors that will impact future business outcomes.

Creating a machine learning application or operationalizing a machine learning algorithm is an iterative process. The learning phase has to be started as clean as a whiteboard. The same level of training done while developing a model is needed after a model is built. The machine learning cycle is continuous, and choosing the correct machine learning algorithm is just one of the steps.

The steps that must be followed in the machine learning cycle are as follows:

Identify the data: Identifying the relevant data sources is the first step in the cycle. In addition, in the process of developing a machine learning algorithm, one should plan for expanding the target data to improve the system.

Prepare data: The data must be cleaned, secured, and well-governed. If a machine learning application is built on inaccurate data, the chance of it failing is very high.

Select the machine learning algorithm: Several machine learning algorithms are available, out of which the one best suited to the data and the business challenge must be chosen.

Train: To create the model, depending on the type of data and algorithm, the training process may be supervised, unsupervised, or reinforcement learning.

Evaluate: Evaluate the models to find the best algorithm.

Deploy: Machine learning algorithms create models that can be
deployed to both cloud and on-premises applications.

Predict: Once deployed, predictions are made based on new input data.

Assess predictions: Assess the validity of your predictions. The information you gather from analyzing the validity of predictions is then fed back into the machine learning cycle to help improve accuracy.

Figure 8.1 demonstrates the relationship between consumers and service providers—hospitals.

FIGURE 8.1 Transformation in health care.
(Source: http://www.mahesh-vc.com/blog/understanding-whos-paying-for-what-in-the-healthcare-industry)
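The cycle described above can be sketched end to end with scikit-learn. This is a minimal illustration on synthetic placeholder data, not the chapter's actual pipeline; the dataset, model choice, and thresholds here are assumptions for demonstration only.

```python
# A minimal sketch of the machine learning cycle (identify, prepare, train,
# evaluate, deploy/predict) using scikit-learn on synthetic placeholder data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Identify the data: here a synthetic stand-in for a patient dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 13))              # 13 hypothetical clinical attributes
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # 1 = disease, 0 = no disease

# 2. Prepare the data: hold out a test set and scale the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3-4. Select an algorithm and train it (supervised learning here).
model = LogisticRegression().fit(X_train, y_train)

# 5. Evaluate on held-out data.
acc = accuracy_score(y_test, model.predict(X_test))

# 6-7. Deploy and predict on new input data.
new_patient = scaler.transform(rng.normal(size=(1, 13)))
prediction = model.predict(new_patient)[0]
print(f"test accuracy {acc:.2f}, prediction for new input: {prediction}")
```

In a real deployment the "assess predictions" step would feed the observed outcomes for `new_patient` back into the training data, closing the loop.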

8.4.2 DEEP LEARNING

A system that makes its own decisions based on intelligence is highly needed in today's life. A breakthrough addressing one of the biggest shortcomings of artificial neural networks (ANNs) became a major reason for the rise of the new approach called deep learning. Deep learning arises from machine learning to meet the inner goals of artificial intelligence. The world needs a deep learning system that thinks like a neocortex. Various types of data are learned by the system, and patterns are recognized much as they are by the real senses. To train a large neural network, deep learning is the best-suited approach. The performance measured with large data
sets is higher with deep learning than with other approaches. The advantage of deep learning is that it allows automatic feature extraction and makes feature learning easier. This approach helps to discover structure in the data. Problems in unsupervised and supervised learning are tackled with the help of deep learning algorithms.

Machine learning algorithms differ from other algorithms. With most algorithms, a programmer starts by inputting the algorithm. However, this is not the case with machine learning: the process is reversed, and the data that is to be processed creates the model. When the size of the data becomes large, the algorithm becomes more refined and polished. As a machine learning algorithm is exposed to more and more data, it is able to become increasingly accurate.

8.4.3 HEART DISEASE PREDICTION

In Khemphila (2011), heart disease is classified by hybridizing the backpropagation and multilayer perceptron algorithms. To select the appropriate features from the patient dataset, information gain along with biomedical test values is used. A total of 13 attributes were chosen to classify the heart disease. The experimental results show that the 13 attributes were reduced to 8 attributes using information gain. The accuracy obtained as a result of this work is 1.1% for the training data set and 0.82% for the validation data set.

In "Heart Attack Prediction System Using Data Mining and Artificial Neural Network" (2011), the weighted associative classifier (WAC) is used. WAC (Soni et al., 2011) is introduced to diagnose whether the patient is affected by any cardiovascular disease or not. A GUI-based interface was designed to enter the patient record. The authors took 13 attributes and 303 records of data for training and testing, and incorporated a small modification in the dataset: instead of considering 5 class labels (4 for different types of heart disease and 1 for no heart disease), they considered only 2 class labels, "1" for the presence of heart disease and "0" for its absence. In this work, a new WAC technique has been proposed to get the exact significant rules instead of flooding with insignificant relations. From the results, it is observed that WAC outperforms other associative classifiers with high efficiency and accuracy. The WAC classifier obtained an accuracy of 81.51% with support and confidence values of 25% and 80%, respectively.
Kim and Lee (2017) have used the dataset taken from the sixth Korea National Health and Nutrition Examination Survey to diagnose heart-related diseases. The feature extraction is done by statistical analysis. For the classification, a deep belief network is used, which obtained an accuracy of 83.9%.

Olaniyi and Oyedotun (2015) have taken 270 samples, divided into a training dataset and a testing dataset. This division is based on 60:40, that is, 162 training samples and 108 testing samples for the network input. The target of the network is coded as (0 1) if there is presence of heart disease and (1 0) if heart disease is absent. The dataset used is taken from the UCI machine learning repository. A feedforward multilayer perceptron and a support vector machine (SVM) are used for the classification of the heart disease. The results obtained from the work are 85% in the case of the feedforward multilayer perceptron and 87.5% in the case of the SVM.

In the work "Prediction of Heart Disease Using Deep Belief Network" (2017), the deep belief network is utilized for the prediction of heart disease likely to occur in human beings. It was developed in the MATLAB 8.1 development environment. This proposed solution is then compared with the results of a CNN for the same task. Results yield 90% accuracy in the prediction of heart diseases, whereas the CNN achieves only 82% accuracy, thus enhancing the heart disease prediction.

Ajam (2015) chooses a feedforward backpropagation neural network for classifying the absence and presence of heart disease. This proposed solution used 13 neurons in the input layer, 20 neurons in the hidden layer, and 1 output layer neuron. The data here is also taken from the UCI machine learning repository. The dataset is separated into two categories, input and target. The input and target samples are divided randomly into a 60% training dataset, 20% validation dataset, and 20% testing dataset. The training set is presented to the network, and the network weights and biases are adjusted according to its error during training. The presence and the absence of the disease are indicated by the target outputs 1 and 0, respectively. The proposed solution was shown experimentally to give 88% accuracy.

In "Diagnosis of heart disease based on BCOA," the UCI dataset is used to evaluate heart attacks. This dataset includes the test results of 303 people and contains two classes, one class for healthy people and the other class for people with heart disease. In this work, a binary cuckoo optimization
algorithm (BCOA) is used for feature selection, and an SVM is used for constructing the model. The final model of this work has an accuracy of 84.44%, a sensitivity of 86.49%, and a specificity of 81.49%.

Chitra (2013) has used a cascaded neural network for classification. For the accurate prediction of heart disease, the cascaded correlation neural network was considered. The proposed work took a total of 270 data samples, of which 150 were taken for training and the remainder for testing, to simulate the network architecture. The number of input neurons in the proposed work is 13, and the number of output neurons is 1. A training set accuracy of 72.6% and a testing accuracy of 79.45% are obtained by using an ANN with a backpropagation algorithm; it is 78.55% and 85% for testing and training, respectively, in the case of the CNN. Experimental results prove that the accuracy of the CNN increased by 3% over the ANN. When the performance of the above is analyzed, the cascaded correlation neural network provided accurate results with minimum time complexity in comparison with the ANN.

The work by Lu et al. (2018) proposes a cardiovascular disease prediction model based on an improved deep belief network (DBN). The independent determination of the network depth is done by using the reconstruction error, and unsupervised training and supervised optimization are combined. A total of 30 independent experiments were done on the Statlog (heart) and heart disease database data sets in the UCI database. The results obtained include a mean and prediction accuracy of 91.26% and 89.78%, respectively.

Ajam (2015) has studied that ANNs show significant results in heart disease diagnosis. The activation function used is the tangent sigmoid for hidden layers and a linear transfer function for the output layer.

8.5 METHODOLOGY

There is a lot of data put away in stores that can be utilized effectively to guide medical practitioners in decision making in human services. Some of the information obtained from health care is hidden, as it collects a huge amount of medical-related data of patients, which in turn is used for making effective decisions. Hence, advanced data mining techniques are used to obtain the appropriate results.

Here, an effective heart disease prediction system is developed. It uses a neural network for accurate prediction of the risk level of heart disease. The 14 attributes, including
age, sex, blood pressure, cholesterol, etc., are used by the system. Quality service includes the proper diagnosis of patients leading to the administration of effective treatments. The process flow is depicted in Figure 8.2.

FIGURE 8.2 Flow diagram.

The dataset has been collected from the UCI machine learning repository (Cleveland clinic dataset). The dataset includes 303 records with 14 attributes. The types of attributes present in the chosen dataset are as follows (Table 8.2):

• Input attributes
• Key attributes
• Predictable attributes

TABLE 8.2 Dataset Description

Input attributes:
1. Age in years
2. Sex, represented by values 0—males, 1—females
3. Chest pain type
4. Resting blood pressure
5. Serum cholesterol in mg/dL
6. Fasting blood sugar (value 1: >120 mg/dL, value 0: <120 mg/dL)
7. Resting electrocardiographic results (value 0: normal, value 1: having ST-wave abnormality, value 2: showing probable or definite left ventricular hypertrophy)
8. Maximum heart rate achieved
9. Exercise-induced angina (value 1: yes, value 0: no)
10. Old peak: the ST depression induced by exercise relative to rest
11. The slope of the peak exercise ST segment (value 1: upsloping, value 2: flat, value 3: downsloping)
12. Number of major vessels colored by fluoroscopy (value 0–3)
13. Thal (value 3 = normal, value 6 = fixed defect, value 7 = reversible defect)
14. Obesity
15. Smoking

Key attribute:
Patient's ID: the patient's identification number

Predictable attribute:
Diagnosis: Value 1 = <50%, no heart disease; Value 0 = >50%, heart disease
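The attribute encoding of Table 8.2 can be illustrated with pandas. The column names below follow a common convention for the Cleveland file and the sample rows are made up for illustration (the chapter does not list either); the raw diagnosis field, often called `num` and ranging 0-4, is collapsed here to the two class labels used by the prediction model, following the common convention that any value above 0 indicates disease.

```python
# Sketch: encoding Cleveland-style attributes with pandas.
# Column names and sample rows are illustrative placeholders.
import pandas as pd

columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]

# Three made-up records; in practice these come from the UCI repository file.
rows = [
    [63, 1, 1, 145, 233, 1, 2, 150, 0, 2.3, 3, 0, 6, 0],
    [67, 1, 4, 160, 286, 0, 2, 108, 1, 1.5, 2, 3, 3, 2],
    [41, 0, 2, 130, 204, 0, 2, 172, 0, 1.4, 1, 0, 3, 0],
]
df = pd.DataFrame(rows, columns=columns)

# Collapse the raw 0-4 diagnosis field to binary class labels:
# 0 = no heart disease, 1 = heart disease present.
df["target"] = (df["num"] > 0).astype(int)
print(df["target"].tolist())   # → [0, 1, 0]
```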

8.5.1 PREPROCESSING

Data can be very intimidating for a data scientist. When working with data, there are many ways in which data can be used for analysis, and preprocessing is the mandatory initial step. Preprocessing is done to transform raw data into an understandable format. Raw data (real-world data) is often incomplete, and such data cannot be sent through a model as it would lead to errors.
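Two of the most common preprocessing repairs, filling missing values and encoding categorical data, can be sketched as follows. The toy columns here are assumptions for illustration, not the chapter's actual dataset.

```python
# Sketch: handling missing values and categorical data before modeling.
import pandas as pd

df = pd.DataFrame({
    "age": [63, None, 41],          # numeric column with a missing value
    "chol": [233, 286, 204],
    "cp": ["typical", "asymptomatic", "atypical"],   # categorical column
})

# Missing numeric values: impute with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Categorical values: one-hot encode into numeric indicator columns.
df = pd.get_dummies(df, columns=["cp"])

print(df.isnull().any().sum())   # 0 missing values remain
```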
Normally, the following steps must be followed in data preprocessing:

1. Import libraries
2. Read data
3. Check for missing values
4. Check for categorical data
5. Standardize the data
6. Principal component analysis (PCA) transformation
7. Data splitting

8.5.2 IMPORT DATA

As the main libraries, Pandas, NumPy, and time can be used.

Pandas: used for data manipulation and data analysis.
NumPy: a basic package for scientific computing with Python.

For visualization, Matplotlib and Seaborn are generally used. For the data preprocessing techniques and algorithms, the Scikit-learn libraries can be made use of.

# main libraries
import pandas as pd
import numpy as np
import time

# visual libraries
from matplotlib import pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
plt.style.use('ggplot')

# sklearn libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, matthews_corrcoef, classification_report, roc_curve
from sklearn.externals import joblib
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

Read data

Read the data in the CSV file using pandas:

df = pd.read_csv('../input/creditcard.csv')
df.head()

Checking for missing values

df.isnull().any().sum()
> 0

Checking for categorical data

The features not in numerical format are converted to categorical data, with nominal and ordinal values. Seaborn distplot() can be used to visualize the distribution of features in the dataset.

Standardize the data

The dataset contains only numerical input variables, which are the result
of a PCA transformation. Features V1, V2, …, V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are "Time" and "Amount". PCA is affected by scale, so we need to scale the features in the data before applying PCA. For the scaling, Scikit-learn's StandardScaler() is used. To fit the scaler, each feature should be reshaped with reshape(-1, 1) into a single column.

# standardizing the features
df['Vamount'] = StandardScaler().fit_transform(df['Amount'].values.reshape(-1, 1))
df['Vtime'] = StandardScaler().fit_transform(df['Time'].values.reshape(-1, 1))
df = df.drop(['Time', 'Amount'], axis=1)
df.head()

Now all the features are standardized to unit scale (mean = 0 and variance = 1).

PCA transformation

PCA is mainly used to reduce the size of the feature space while retaining as much of the information as possible. Here, all the features are transformed using PCA.

X = df.drop(['Class'], axis=1)
y = df['Class']
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X.values)
principalDf = pd.DataFrame(data=principalComponents, columns=['principal component 1', 'principal component 2'])
finalDf = pd.concat([principalDf, y], axis=1)
finalDf.head()

Data splitting

# splitting the feature array and label array, keeping 80% for the training set
X_train, X_test, y_train, y_test = train_test_split(feature_array, label_array, test_size=0.20)

# normalize: scale input vectors individually to unit norm (vector length)
X_train = normalize(X_train)
X_test = normalize(X_test)

For the model building, the algorithms considered are K-nearest neighbors (KNN), decision tree, support vector machine, random forest, and DBN. The performances of these algorithms are analyzed, and the best algorithm for heart disease prediction is identified (Table 8.3).

After we build the model, we can use predict() to predict the labels for the testing set.
TABLE 8.3 Prediction Model

The model for heart disease prediction is as follows:

def train_model(X_train, y_train, X_test, y_test, classifier, **kwargs):
    # instantiate model
    model = classifier(**kwargs)
    # train model
    model.fit(X_train, y_train)
    # check accuracy and print out the results
    fit_accuracy = model.score(X_train, y_train)
    test_accuracy = model.score(X_test, y_test)
    print(f"Train accuracy: {fit_accuracy:0.2%}")
    print(f"Test accuracy: {test_accuracy:0.2%}")
    return model
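The helper of Table 8.3 can then be called once per algorithm to compare them. The sketch below repeats the helper so it is self-contained and runs it on synthetic placeholder data, not the chapter's actual heart dataset.

```python
# Sketch: comparing several classifiers with the train_model() helper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def train_model(X_train, y_train, X_test, y_test, classifier, **kwargs):
    model = classifier(**kwargs)
    model.fit(X_train, y_train)
    print(f"Train accuracy: {model.score(X_train, y_train):0.2%}")
    print(f"Test accuracy: {model.score(X_test, y_test):0.2%}")
    return model

# Synthetic placeholder data standing in for the 303-record dataset.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 13))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

knn = train_model(X_train, y_train, X_test, y_test, KNeighborsClassifier)
tree = train_model(X_train, y_train, X_test, y_test,
                   DecisionTreeClassifier, random_state=2606)
forest = train_model(X_train, y_train, X_test, y_test,
                     RandomForestClassifier, random_state=2606)

# Once a model is built, predict() gives labels for the testing set.
predictions = forest.predict(X_test)
```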

8.5.3 DECISION TREE

The decision tree is a classification algorithm in which the attributes in the dataset are recursively partitioned. A decision tree contains many branches and leaf nodes. The branches express conjunctions of the attributes that lead to the target class or class labels, and the leaf nodes contain the class labels, telling to which class a tuple belongs.

This serves as a very powerful algorithm for the prediction of heart disease. There are several decision tree algorithms available to classify the data, including C4.5, C5, CHAID, ID3, J48, and CART. When the decision tree algorithm is applied to the dataset, the accuracy obtained is 94.6%, which proves that the decision tree algorithm is a powerful prediction technique.

The tree can be built as:

• The attribute splits decide the attribute to be selected.
• The decisions about each node: whether to represent it as a terminal node or to continue splitting the node.
• The assignment of each terminal node to a class.

Impurity measures such as information gain, gain ratio, Gini index, etc., decide the attribute splits done on the tree. After pruning, the tree is checked against overfitting and noise; as a result, the tree becomes an optimized tree. The main advantage of the tree structure is that it is very easy to understand and interpret. The algorithm is also very robust to outliers. The structure of the decision tree is shown in Figure 8.3.
FIGURE 8.3 Binary DS.

The equation for the decision tree is:

model = train_model(X_train, y_train, X_test, y_test, DecisionTreeClassifier, random_state=2606)

8.5.4 K-NEAREST NEIGHBOR

The k-nearest neighbor algorithm, known as KNN, is a nonparametric method used for purposes such as regression and classification. The output of classification with KNN is a class label that describes the class or group to which the object belongs. In KNN, membership is assigned based on a majority vote by the neighbors, whose number is decided by the k value. The input consists of the k closest training examples in the feature space. The output depends on whether KNN is used for classification or regression.

The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near to each other. The object is assigned to the class that is most common among its neighbors. For example, if k = 2, then the query point is assigned to the class to which the nearest two neighbors belong. Being the simplest machine learning algorithm, KNN requires no explicit training step; the neighbors are taken from the set of objects for which the class is known, which can be considered the training step. The algorithm is very sensitive to the local structure of the data. Euclidean distance for continuous variables and Hamming distance for discrete variables are the most widely used metrics.

However, the use of specialized algorithms such as large margin nearest neighbor or neighborhood components analysis helps improve the accuracy. The k value is chosen based on the data, and an appropriate value must be chosen so that most of the tuples are classified correctly. Heuristic techniques such as hyperparameter optimization are used because a larger k value reduces the effect of noise but makes the boundary between the classes less distinct.

The performance degrades as the noise in the data increases. The accuracy achieved with KNN is about 95% with an appropriate k value. KNN is shown in Figure 8.4.
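The voting rule described above (rank the stored examples by Euclidean distance, then take a majority vote among the k closest) can be sketched in a few lines of plain Python. This is only an illustration: the chapter's own train_model helper is not defined in the text, and the data and function names below are hypothetical, not the authors' code.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest neighbors."""
    # Rank every stored example by its Euclidean distance to the query point.
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    # The most common label among the k closest examples wins the vote.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy feature space: two clusters standing in for two diagnostic classes.
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["no", "no", "no", "yes", "yes", "yes"]

print(knn_predict(X, y, (1.1, 0.9)))  # → no
print(knn_predict(X, y, (5.1, 5.0)))  # → yes
```

Note that there is no fitting step at all, matching the text's observation that the stored training set itself plays the role of the model.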
Analysis of Heart Disease Prediction Using Machine Learning Techniques 189

FIGURE 8.4 K-nearest neighbor.
Source: https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

The equation is as given below:

model = train_model(X_train, y_train, X_test, y_test, KNeighborsClassifier)

8.5.5 SUPPORT VECTOR MACHINE

The support vector machine, known as SVM in machine learning, is a supervised learning model that analyzes the data used for classification and regression analysis using associated learning algorithms. The SVM, an emerging approach, is a powerful machine learning technique for classifying cases. It has been employed in a range of problems and has had successful applications in pattern recognition in bioinformatics, cancer diagnosis, and more.

In an SVM, an N-dimensional space is considered for plotting all the data points. SVM performs not only linear classification but also nonlinear classification, where the inputs are implicitly mapped to a high-dimensional feature space. SVM is considered advantageous because of a unique technique called the kernel trick, where a low-dimensional space is converted to a high-dimensional space and classified. Thus, the SVM constructs a hyperplane or set of hyperplanes. The construction is done in a high- or infinite-dimensional space for use in classification, regression, or outlier detection.

The hyperplane that has the largest functional margin is said to achieve a good separation; thus, the generalization error of the classifier is reduced. The SVM gives an accuracy of 97% when employed on the dataset. The hyperplane that is used for classification purposes is shown in Figure 8.5.

FIGURE 8.5 Support vector machine (SVM).

The equation is given as

model = train_model(X_train, y_train, X_test, y_test, SVC, C=0.05, kernel='linear')

8.5.6 RANDOM FOREST

For the classification of whether a case is malignant or benign, the random forest algorithm could be employed. Based on some type of randomization, a random forest is built as an ensemble of decision trees. The random forest is most widely used for its flexibility. It is a supervised learning algorithm in which the algorithm creates a forest with many trees.

The accuracy increases with a large number of trees. The advantage of using random forest is that it can be used for both classification and regression. It can also handle missing values, and it will not overfit as the number of trees grows. Random forest works by taking the test features, predicting the outcome with the randomly created trees based on rules, and then storing the results. The votes for each predicted target are counted, since each tree produces a different prediction.

Finally, the target receiving the highest vote is considered the final prediction. The random forest is capable of processing a huge amount of data at very high speed. The structure of each tree in a random forest is binary and is created in a top-down manner. The random forest achieves a faster convergence rate. The depth and the number of trees are considered the most important parameters, and increasing the depth evidently increases the performance. Thus, random forest is considered the best classification algorithm in terms of processing time and accuracy. The random forest algorithm is applied on the dataset and the accuracy obtained is 97.34%. The overall view of the random forest algorithm is shown in Figure 8.6.

FIGURE 8.6 Random forest.

The equation is as:

model = train_model(X_train, y_train, X_test, y_test, RandomForestClassifier, n_estimators=110, random_state=2606)

8.5.7 MULTILAYER PERCEPTRON

A multilayer perceptron is a feed-forward ANN that is used for classification. The layers present in the multilayer perceptron are an input

layer, a hidden layer, and an output layer. The multilayer perceptron adopts a supervised learning technique for training, where the data are associated with class labels. The MLP differs from other traditional models in its multiple layers and nonlinear activation. The MLP consists of a minimum of three layers; in some cases, there may be more than three, with an input and an output layer and one or more hidden layers. Since MLPs are fully connected, each node in one layer connects with a certain weight to every node in the succeeding layer.

The amount of error is calculated at each iteration and the connection weights are updated; therefore, it is an example of a supervised learning algorithm. The backpropagation that is carried out is based on the least mean square algorithm in the linear perceptron. A multilayer network containing input, hidden, and output neurons is called an ANN. An ANN is a computing system that performs tasks in a way similar to the working of the brain and is used for solving problems. The collection of nodes connected in a network transmits signals and processes them through the artificial neurons, and each activation node contains weight values. A neural network is trained to teach the problem-solving technique to the network using training data. The accuracy obtained from the constructed model using a multilayer perceptron is 96.5% (Figure 8.7).

FIGURE 8.7 Multilayer perceptron.

for i in range(1,8):
    print("max_depth = " + str(i))
    train_model(X_train, y_train, X_test, y_test, MultilayerPerceptronClassifier, max_depth=i, random_state=2606)

8.6 RESULT ANALYSIS

Decision Tree:

FIGURE 8.8 Accuracy of decision tree.

The prediction model is trained using a decision tree algorithm and the obtained results are shown in Figure 8.8. The attributes of the dataset are recursively partitioned, and we obtained an accuracy of 85% for the selected training dataset.

K-nearest neighbor (KNN):
The k value is chosen based on the data. The appropriate k value must be chosen so that most of the tuples

are classified correctly. The accuracy obtained is 83% as shown in Figure 8.9, which can also be improved by using other algorithms.

FIGURE 8.9 Accuracy of KNN.

Support vector machine:
All the data points are plotted in an N-dimensional space. SVM uses linear classification, and in addition to that, it also uses nonlinear classification. The obtained accuracy is 87% as shown in Figure 8.10.

FIGURE 8.10 Accuracy of SVM.

Random forest:
Based on some type of randomization, a random forest is built as an ensemble of decision trees. The random forest is used most widely due to its flexibility. The accuracy increases with a large number of trees; the accuracy obtained is 89% as shown in Figure 8.11.

FIGURE 8.11 Accuracy of random forest.

Multilayer perceptron:
A neural network is trained to teach the problem-solving technique to the network using training data. The accuracy obtained from the constructed model using a multilayer perceptron is 60% as shown in Figure 8.12.

FIGURE 8.12 Accuracy of MLP.

8.7 DISCUSSION

The setup was built on a hardware arrangement of an Intel i7 CPU with 16 GB RAM, using Python. The results show that the machine learning algorithms work well for heart disease prediction. The model is trained by applying different algorithms: each algorithm fits and trains the model using the training dataset, and the model is then tested on the test data for each of the algorithms. The accuracy that is obtained is listed. By examining the results, we found that random forest produced the highest accuracy when compared to all other algorithms (Figure 8.13). Since it constructs a multitude of decision trees and chooses a subset of features for classification, it has the highest accuracy of all (Table 8.4).
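The train and test accuracies of Table 8.4, and the fit times of Table 8.5, come from running each model through a common helper. A dependency-free sketch of what a train_model helper of the kind used in this chapter might look like is given below; the MajorityClassifier stand-in, the toy data, and all names are assumptions, not the authors' code.

```python
import time

class MajorityClassifier:
    """Stand-in estimator with a scikit-learn-style fit/predict interface."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)  # most frequent training class
        return self

    def predict(self, X):
        return [self.label] * len(X)

def train_model(X_train, y_train, X_test, y_test, model_cls, **params):
    """Fit the model, then report train accuracy, test accuracy, and fit time (ms)."""
    model = model_cls(**params)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    def accuracy(X, y):
        preds = model.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

    return accuracy(X_train, y_train), accuracy(X_test, y_test), elapsed_ms

# Toy split standing in for the heart disease train/test data.
X_tr, y_tr = [[0], [1], [2], [3]], ["yes", "yes", "yes", "no"]
X_te, y_te = [[4], [5]], ["yes", "no"]
train_acc, test_acc, ms = train_model(X_tr, y_tr, X_te, y_te, MajorityClassifier)
print(train_acc, test_acc)  # → 0.75 0.5
```

With a real library classifier passed as model_cls, the same loop would fill in one row of Table 8.4 and one row of Table 8.5 per algorithm.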

TABLE 8.4 Accuracy Evaluation

Techniques      Train Accuracy   Test Accuracy
Random forest   100              89.01
SVM             84.91            87.91
MLP             78.77            60.44
Decision tree   85.38            85.71
KNN             90.57            83.52

FIGURE 8.13 Accuracy for various ML techniques.

Time complexity is measured for all the algorithms, and the results are given in Table 8.5.

TABLE 8.5 Time Complexity Evaluation

Techniques      Time Complexity (ms)
Random forest   550
SVM             608
MLP             700
Decision tree   730
KNN             1000

The graphical representation is shown in the form of a graph in Figure 8.14.

FIGURE 8.14 Analysis of time complexity.

The graph shows the response time evaluation of the various machine learning techniques, and random forest proves to suit the task better than the other methods. The hybridization of SVM with the random forest is expected to serve the purpose even better.

8.8 CONCLUSION

Data from various health organizations like the World Health Organization and the Global Burden of Disease study state that most deaths are due to cardiovascular diseases. When emergency treatment is given within the correct duration, the chances of survival increase. A lot of work has already been done in making models that can predict whether a patient is likely to develop heart disease or not. Various analyses and studies were made of the research papers on heart disease prediction. The random forest has the widest usage. Its accuracy increases with a large number of trees and reaches 89%, which is considered to be better than the other

techniques. The prediction accuracy of the existing system can be further improved, so in the future, new hybrid algorithms are to be developed that overcome the drawbacks of the existing system. In the future, an intelligent cognitive system may be developed that can lead to the selection, well ahead, of proper treatment methods for a patient diagnosed with heart disease. Also, the same methodologies can be applied to prenatal, oncological, and rare disease predictions to enhance overall health care.

KEYWORDS

• health care
• heart disease
• prediction
• machine learning
• accuracy
CHAPTER 9

A REVIEW ON PATIENT MONITORING AND DIAGNOSIS ASSISTANCE BY ARTIFICIAL INTELLIGENCE TOOLS
SINDHU RAJENDRAN1*, MEGHAMADHURI VAKIL1, RHUTU KALLUR1, VIDHYA SHREE2, PRAVEEN KUMAR GUPTA3, and LINGAIYA HIREMAT3

1 Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India
2 Department of Electronics and Instrumentation, R. V. College of Engineering, Bangalore, India
3 Department of Biotechnology, R. V. College of Engineering, Bangalore, India
* Corresponding author. E-mail: sindhur@rvce.edu.in

ABSTRACT

In today's world, digitalization is becoming more popular in all aspects. One of the keen developments that have taken place in the 21st century in the field of computer science is artificial intelligence (AI). AI has a wide range of applications in the medical field. Imaging, on the other hand, has become an indispensable component of several fields in medicine, biomedical applications, biotechnology, and laboratory research, by which images are processed and analyzed. Segmentation of images can be performed using both fuzzy logic and AI. As a result of combining AI and imaging, the tools and techniques of AI are useful for solving many biomedical problems; by using suitably equipped computer-based hardware–software applications for understanding images, researchers and clinicians can enhance their ability to study, diagnose, monitor, understand, and treat medical disorders.

Patient monitoring by automated data collection has created new challenges in the health sector for extracting information from raw data. To interpret data quickly and accurately, in particular high-frequency physiologic data, the use of AI tools and statistical methods is required. Technology supporting human motion analysis has advanced dramatically, and yet its clinical application has not grown at the same pace. The issue of its clinical value is related to the length of time it takes to perform interpretation, the cost, and the quality of the interpretation. Techniques from AI such as neural networks and knowledge-based systems can help overcome these limitations. In this chapter, we will discuss the key approaches using AI for biomedical applications and their wide range of applications in the medical field, the different segmentation approaches with their advantages and disadvantages, and a short review of prognostic models and the usage of prognostic scores to quantify the severity or intensity of diseases.

9.1 INTRODUCTION

Artificial intelligence (AI), termed today's smart technology, has entered broadly across all possible sectors, from manufacturing and financial services to population estimates. AI has a major hold in the healthcare industry, as it has a wide level of applicability because of its accuracy in a variety of tasks. It has helped to uncover some of the hidden insights in clinical decision-making, to communicate with patients in other parts of the world, and to extract meaning from the medical history of patients, which is often held in inaccessible, unstructured data sets. The productivity and potential of pathologists and radiologists have accelerated because of AI. Medical imaging is among the most complex data, and a potential source of information about the patient is acquired using AI.

Health management includes fault prognostics, fault diagnostics, and fault detection. Prognostics has been the latest addition to this game-changing technology and goes beyond the existing limits of systems health management.

The main reason for the exponential boom of AI is that the scope of its innumerable care-based applications has widened within healthcare. It is seen that the healthcare market using AI has a growth rate of almost 40% in a decade. Right from delivering advanced vital information to physicians promptly to initiating informed choices, customized real-time treatment facilities are the basis for the revolutionizing care of patients that is appraised among the applications of AI.
estimates. AI has a major hold in the of AI.
A Review on Patient Monitoring and Diagnosis Assistance 197

An extensive quantity of data is made available to clinical specialists, ranging from details of clinical symptoms to numerous types of biochemical data and outputs of imaging devices. Every form of data provides information that has to be evaluated and assigned to a particular pathology throughout the diagnostic process. To streamline the diagnostic method in daily routine and avoid misdiagnosis, computing methods may be utilized. These adaptive learning neural network algorithms can handle various types of medical data and integrate the classified outputs. The variety of applications of AI in today's world is as follows:

Diagnosis: Diagnosis of disease in healthcare is one of the challenging parts. With the assistance of AI, machines are powered up with the flexibility to search large data sets from existing medical images, indicating early detection of many disorders through diagnostic imaging using neural networks. It has numerous applications in proactive diagnosis of the probability of tumor growth, stroke, etc.

Biomarkers: For choosing ideal medications and to assess treatment sensitivity, biomarkers are used that automatically provide accurate data on the patients, as both audio and video of the important health parameters. The precision and fastness of biomarkers make them the most ideal diagnosis tools, used to highlight the possibility of any disorder. By using these, there is promptness in diagnosing diseases.

AI and drug discovery: The study of multiple drug molecules and their structure, accurately and promptly, is done using AI to predict their pharmacological activity, their adverse effects, and their potency, which are the most cost-effective routes of drug discovery. This is mainly used across pharmaceutical companies, thereby reducing the cost of medications drastically. AI-based drug discovery has led to the auxiliary treatment of neurodegenerative disorders and cancer.

AI-enabled hospital care: Smart monitoring of intensive care units and IV solutions in hospitals using AI has simplified care delivery. Robot-assisted surgeries have been booming with the intervention of AI in routine phlebotomy procedures. Other applications of AI in the hospital are patient medication tracking, nursing staff performance assessment systems, patient alert systems, and patient movement tracking. The main advantage is a decrease in dosage errors and an increase in the productivity of the nursing staff.

9.1.1 BENEFITS OF AI IN THE MEDICINE FIELD

The importance of AI is booming, especially in the medical field; considering that the results it delivers are extremely precise and fast, the advantages of this stream, arising from the technical advancement in all sectors, are as follows:

• Quick and precise diagnosis: For diagnosis of certain diseases, immediate action should be taken before they become serious. AI has the power to learn from past situations, and it has been shown that these networks can quickly diagnose diseases such as malignant melanoma and eye-related problems.

• Diminish human mistakes: It is normal for humans to make small mistakes, and the profession of doctors is extremely demanding: since they focus on a great number of patients, the work can be very exhausting, leading to a lack of attentiveness that compromises patient safety. In this case, AI plays a major part in assisting physicians by reducing human errors.

• Low prices: With the increase in technological advancements, patients can get assistance from doctors without visiting clinics/hospitals. AI provides assistance to patients online and with their past wellness documents, thereby reducing the cost.

• Virtual presence: Technology has rolled out so much that this assistance can be given to clients living in remote locations by utilizing a technology called telemedicine.

9.1.2 ARTIFICIAL NEURAL NETWORKS

Synthetic networks such as artificial neural networks (ANNs) have various applications in branches of chemistry and biology, as well as physics. They are also used extensively in neuroscience and technology. ANNs have an extensive range of applications in chemical dynamics, modeling the kinetics of drug release, and agricultural classification. ANNs are also widely used in predicting the behavior of industrial reactors and in agricultural determination. Generally speaking, an understanding of biological item classification and knowledge of chemical kinetics, if not clinical parameters, are handled in an identical way. Machine strategies

such as ANN apply different inputs in the initial stage (Abu-Hanna and Lucas, 1998). These input files are forms of processes that lie well within the context of the formerly known history of the defined database, used to generate an appropriate, expected output (Figure 9.1).

FIGURE 9.1 Application of ANN in the medical sector.
(Source: Reprinted from Abu-Hanna and Lucas, 1998.)

9.1.2.1 FUNDAMENTAL STEPS IN ANN-BASED MEDICAL DIAGNOSIS

The flow diagram describing the analysis of ANN starts with clinical situations and is shown in Figure 9.2. This flow diagram gives us an overview of the processes or steps involved in utilizing ANN in medical diagnosis (Ahmed, 2005).

The first step is initiated with the network receiving a patient's data to make a prediction of the diagnosis. The next step is feature selection, which provides the necessary information to differentiate between the health conditions of the patient who is being evaluated. The next step is the building of the database itself; all the data available are validated and finally preprocessed. With the help of ANN, the training and verification of the database using training algorithms can be used to predict the diagnosis. The end diagnosis as predicted by the network itself is further evaluated by a qualified physician.

FIGURE 9.2 Flow diagram of steps in ANN-based diagnosis.
(Source: Reprinted from Abu-Hanna and Lucas, 1998.)

1. Feature Selection: The exact recognition and designation of a disease is normally predicted using varied and mostly incoherent or confusing

data. The factors that significantly affect any kind of diagnosis are instrumental data and laboratory data, largely determined by the convenience of the practitioner. Clinicians are provided with ample training to enable them to extract the required relevant information from every sort of data and point out any kind of diagnosis that can be made. In ANN, this particular information is referred to as features. Features may range from biochemical symptoms to any other information that gives insight into what the ailment could possibly be. The final diagnosis is linked to the level of expertise of the skilled clinician. ANNs have a higher flexibility, and their ability to compare the information with formerly stored samples is what has enabled fast medical diagnosis. Some varieties of neural networks are feasible for solving perceptual problems, whereas others are adapted for functional information modeling and approximation. Irrespective of the features chosen, the ones selected to train the neural system must be robust and clear indicators of a given clinical scenario or pathology. The choice of features depends upon medical expertise choices made formerly. Thus, any short, nonspecific information that is redundant to the investigation itself is avoided. Selection/extraction of appropriate features among the other obtainable ones is sometimes carried out using varied approaches. The primary tools that can be utilized for variable selection are as follows:

a. Powerful mathematical means of information mining.
b. Principal component analysis.
c. A genetic algorithm program.

With the help of suitable training examples, we train the network with the "example" data of one patient that is fed, examined, and collected as a feature. The major component that affects the prediction of diagnosis, the quality of training, and the overall result is the training sample used. Enough samples whose diagnosis is well known must be within the database used for training to enable the network to extract the information hidden within the database. The network employs this knowledge when assessing the new cases. Moreover, laboratory data received from clinics should be easily transferable to other programs for computer-aided diagnosis (Aleksander and Morton, 1995).

2. Building the database: Multilayer feed-forward neural networks, such as Bayesian, stochastic, recurrent, and fuzzy, are used. The optimal neural network architecture giving the maximum values for both training

and verification should be selected in the first stage. These are obtained by testing networks that have multiple layers with hidden nodes in them.

3. Training algorithm: There are multiple techniques for carrying out the necessary training. The most common algorithm is backpropagation. The backpropagation rule requires two training parameters, namely, the learning and momentum rates. In many cases, below-par generalization ability of the network is the result of high parameter values: high-value parameters cause learning instability, and therefore the performance is generally flawed. The training parameter values depend on the studied system's complexity. The value of the momentum is kept below that of the learning rate, and the sum of their values should be equal to 1.

a. Verification: ANN-based medical diagnosis is confirmed by means of a dataset that is not utilized for training.

Robustness of ANN-based approaches: It has been well established that ANNs have the potential to tolerate noise that is present within the data and usually offer sufficient accuracy of the end result, for example in the prediction of a diagnosis. On the other hand, this noise can possibly cause deceptive results. This anomaly is discovered to occur while modeling complicated systems, for example, human health. The noise would not only impact the uncertainty that is conventional for measured data but also influence secondary facets, including the presence of more than one disease. Crossed effects cannot be foreseen unless they are considered throughout the development of the training database (Alkim et al., 2012). Any issue that alters or affects the outward indications of the condition under study should be accounted for by including such instances in the database. By this means, the network can precisely classify the in-patient. Of course, a technique to avoid this can be a combination of the expertise of the clinical specialist and the power of ANN-aided approaches.

4. Testing in medical practice: The last step in ANN-aided diagnosis is testing in medical practice. The outcome of this system is carefully examined by a practitioner for every patient. Medical diagnosis data of patients without any error may finally be enclosed inside the training database. Nonetheless, an in-depth analysis of ANN-assisted diagnosis applications in the clinical environment is important, even across completely different establishments. Verified ANN-aided diagnoses that have medical applications in clinical settings are a necessary condition for additional growth in medicine.
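The training-parameter guideline in step 3 (a momentum rate kept below the learning rate, the two summing to 1) can be illustrated on a single sigmoid neuron trained with a least-mean-square-style delta rule plus a momentum term. This is a toy sketch only; the task, values, and every name in it are hypothetical, not drawn from the chapter's cited systems.

```python
from math import exp

def train_perceptron(samples, lr=0.6, momentum=0.4, epochs=50):
    """Delta-rule training of one sigmoid neuron with a momentum term.

    Following the guideline above: momentum < lr, and lr + momentum == 1."""
    w, b = 0.0, 0.0
    dw_prev, db_prev = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1.0 / (1.0 + exp(-(w * x + b)))     # sigmoid activation
            err = (target - out) * out * (1.0 - out)  # LMS-style error gradient
            dw = lr * err * x + momentum * dw_prev    # momentum reuses the last step
            db = lr * err + momentum * db_prev
            w, b = w + dw, b + db
            dw_prev, db_prev = dw, db
    return w, b

# Toy task: fire (1) for positive inputs, stay silent (0) for negative ones.
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_perceptron(data)
print(w > 0)  # → True: a positive weight separates the two classes
```

Raising lr and momentum well above these values makes the weight updates oscillate, which is the learning instability the text warns about.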

9.2 APPROACHES OF AI IN BIOMEDICAL APPLICATIONS

AI has allowed the development of various algorithms and their implementation within the field of medical imaging systems. Based on physical, geometrical, statistical, and functional strategies, a wide range of algorithms and methods are developed using AI, and these strategies are used to solve the problems of feature extraction, image texture, segmentation, motion measurements, image-guided surgery, computational anatomy, computational physiology, telemedicine with medical images, and many more by using pattern image datasets. In recent studies, medical information analysis is increasing because various large data sets are collected, and these large data sets are analyzed on a daily basis (Amato et al., 2013). The manual analysis of such vast information can easily result in human errors; to avoid such mistakes, automated analysis of the medical information needs to be implemented. Thus, automated medical information analysis is currently in demand. As a result, the usage of AI techniques specifically in medicine proves useful, because it can store huge amounts of information, retrieve data, and provide the most effective use of data analysis for better decision-making in solving problems (Shukla et al., 2016).

9.2.1 MEDICAL IMAGE ANALYSIS

Many image processing techniques are employed in the automatic analysis of medical images. Before segmenting the image, preprocessing operations have to be undertaken, for example, noise removal, image enhancement, edge detection, and so on. After the preprocessing operations, the image is ready for analysis. Extraction of the region of interest (ROI) from the image is completed within the segmentation stage by a combination of intelligent methods. Later, feature extraction, or even feature selection, is performed to spot and recognize the ROI, which can be a neoplasm, lesion, abnormality, and so on. Figure 9.3 shows automated medical image segmentation using intelligent methods.

FIGURE 9.3 Automatic medical image segmentation by intelligent methods.


(Source: Adapted from Rastgarpour and Shanbehzadeh, 2011.)
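The preprocess-then-segment sequence described above (noise removal first, then extraction of the region of interest) can be sketched with a 3x3 mean filter followed by simple thresholding. This is a deliberately minimal stand-in for the intelligent methods of Figure 9.3; the synthetic image, the threshold, and the function names are all assumptions for illustration.

```python
def smooth(img):
    """3x3 mean filter: a minimal stand-in for the noise-removal step."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[a][b]
                      for a in range(max(0, i - 1), min(h, i + 2))
                      for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(window) / len(window)
    return out

def roi_mask(img, threshold):
    """Crude segmentation: flag pixels brighter than `threshold` as the ROI."""
    return [[1 if v > threshold else 0 for v in row] for row in img]

# Synthetic 5x5 "scan": a bright 2x2 lesion on a dark background.
scan = [[10.0] * 5 for _ in range(5)]
for i in (2, 3):
    for j in (2, 3):
        scan[i][j] = 200.0

mask = roi_mask(smooth(scan), threshold=50.0)
print(mask[2][2], mask[0][0])  # → 1 0 (lesion center flagged, background not)
```

A real pipeline would replace both stages with the learned, intelligent methods the section goes on to describe, but the ordering of the steps is the same.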

9.2.2 IMAGE SEGMENTATION

In the healthcare system, for medical imaging, it is necessary to diagnose the disease accurately and provide the correct treatment for the disease. To solve such a complex issue, algorithms for automatic medical image analysis are used, which provide a broader and more accurate perception of medical images with high reliability (Costin and Rotariu, 2011). As a result, by applying these intelligent methods, correct analysis and specific identification of biological features can be done. AI of this type is suggested to use digital image processing techniques and medical image analysis along with machine learning, pattern recognition, and fuzzy logic to improve the efficiency. Figure 9.4 shows the general scheme of the medical image analysis system.

FIGURE 9.4 General representation of medical image analysis system.


(Source: Adapted from Rastgarpour and Shanbehzadeh, 2013.)
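One way fuzzy logic enters segmentation, as mentioned above, is through membership functions: each pixel receives a degree of membership in a tissue class rather than a hard label, and a cut on that degree yields the segmented region. The sketch below is illustrative only; the linear membership shape, the cut value, and the names are assumptions, not a method from the cited works.

```python
def membership_bright(v, low=0.0, high=255.0):
    """Fuzzy membership in the 'bright tissue' class, rising linearly from low to high."""
    return max(0.0, min(1.0, (v - low) / (high - low)))

def fuzzy_segment(img, cut=0.5):
    """Label a pixel as part of the region when its 'bright' membership exceeds `cut`."""
    return [[1 if membership_bright(v) > cut else 0 for v in row] for row in img]

# Tiny 2x3 intensity image: dark background with a bright patch on the right.
img = [[30.0, 40.0, 220.0],
       [35.0, 210.0, 230.0]]
print(fuzzy_segment(img))  # → [[0, 0, 1], [0, 1, 1]]
```

Keeping the graded memberships, instead of cutting them immediately, is what lets fuzzy methods express the partial, overlapping boundaries that hard thresholds miss.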

It is tough to investigate the disease in automatic medical image analysis, because an incorrect image segmentation can usually result in incorrect analysis of the image; improper segmentation also affects the later steps, particularly ROI representation and extraction in medical image analysis. So, to get a correct diagnosis of diseases or consequent lesions in medical image analysis, segmentation of the ROIs must be done accurately. Thus, a correct segmentation methodology is important. In medical applications, segmentation identifies the boundaries of ROIs such as tumors, lesions, abnormalities, bony structures, blood vessels, brain elements, breast calcifications, the prostate, the iris, the abdomen, lung fissures, knee cartilage, and so on.

9.2.3 METHODS OF IMAGE SEGMENTATION

1. Image Segmentation Using Machine Learning Techniques

(a) Artificial neural network (ANN): ANN is a statistical model that works on the principle of neural networks. There are three layers in the neural network model; the first one is the primary layer, called the input layer, the next one is the middle layer, which includes a number of hidden layers, and the last

one is the output layer. As the number of hidden layers grows, the capability of solving the problem accurately also increases (Deepa and Devi, 2011). ANN uses an iterative training method that solves problems by adjusting a representation of weights; hence, this method is a very popular classification method. It has a simple physical implementation arrangement, and its class distribution, though complex, can be mapped easily. There are two types of techniques used by a neural network: (1) supervised classification techniques and (2) unsupervised classification techniques (Jiji et al., 2009). Medical image segmentation based on clusters and classification is achieved by using "self-organizing maps" of neural networks, which are very helpful in decision-making applications such as "computer-assisted diagnosis" and categorization. In ANN, supervised classification uses various methods such as Bayesian decision theory, support vector machine (SVM), linear discriminant analysis (LDA), etc. Each method uses a different approach to provide a unique classification result. Usually, the data are divided into two phases, training and test phases, to classify and validate the image (Figure 9.5).

FIGURE 9.5 ANN applied on an image document. The left figure shows the noisy scanned image, and the right figure is the cleaned image after applying the ANN.
(Source: Adapted from Shukla et al., 2016.)
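The iterative, weight-adjusting training and the train/test split described above can be illustrated with a minimal single-neuron sketch (a full ANN stacks many such units into hidden layers); the toy samples, learning rate, and epoch count are hypothetical.

```python
# Minimal sketch of ANN-style iterative weight training on hypothetical
# data: a single neuron trained by repeated weight updates, illustrating
# the supervised train-then-test procedure described above.

def train_neuron(samples, labels, lr=0.1, epochs=50):
    """Iteratively adjust weights so the neuron separates two classes."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred                 # error drives the weight update
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Toy "training phase": bright samples (class 1) vs. dark samples (class 0).
train_x = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
train_y = [1, 1, 0, 0]
w, b = train_neuron(train_x, train_y)

# "Test phase": validate on unseen samples.
print(predict(w, b, (0.95, 0.85)))  # -> 1
print(predict(w, b, (0.05, 0.15)))  # -> 0
```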

(b) Linear discriminant analysis (LDA): The LDA method is used as a dimensionality reduction technique that reduces a feature space that is very large. LDA computes the best transformation by reducing the within-class distance and increasing the between-class distance at the same time, consequently achieving maximum class discrimination. By applying eigen decomposition on the scatter matrices, the best attainable transformation in LDA may be readily computed (Amin et al., 2005). It has been widely used in numerous applications involving high-dimensional data. On the other hand, classical LDA requires the total scatter matrix to be nonsingular. In some applications that involve high-dimensional, low-sample-size data, the total scatter matrix may be singular: the data points come from a very high-dimensional space, and therefore the sample size may not exceed this dimension. This is often called the singularity problem in LDA (Mandelkow et al., 2016).

(c) Naive Bayes (NB): The NB method is a straightforward technique for constructing classifier models; class labels, drawn from some finite set, are assigned to instances of the problem represented as vectors of feature values. This type of classifier does not use a single algorithm; rather, there is a family of algorithms sharing a common principle: all NB classifiers assume that, given the class variable, the features are independent of each other. In simple words, it can be explained with an example: using the NB method, a fruit can be recognized as an apple based on its shape if it is round, its color if it is red, and its diameter if it is about 10 cm, where every feature value, such as color, diameter, and roundness, contributes independently of the others (Zhang, 2016).

(d) Support vector machine (SVM): Supervised learning models of the machine learning technique are referred to as SVMs; an SVM understands the data with the help

of different learning algorithms for classification and regression analysis. SVM behaves as a nonprobabilistic binary linear classifier: given a set of training examples, each marked as belonging to one of two categories, the SVM method assigns a new example to one or the other category. The SVM method represents the examples as points in a space, mapped so that the examples of the two categories are separated by some distance; when a new example is fed in, it falls on one side of that separation and is assigned the corresponding category.

2. Image Segmentation Using Fuzzy Logic

Fuzzy image processing is the collection of all methodologies that represent, recognize, and process medical images, their features, and segments as fuzzy sets. The processing and representation depend upon the chosen fuzzy sets and the problem to be settled. The three elementary stages are as follows:

• image fuzzification,
• image defuzzification, and
• modification of membership values (Amin et al., 2005).

Fuzzification refers to the coding of medical image data, and the decoding of the results is referred to as defuzzification; these are the two steps that make it possible to treat a medical image in terms of fuzzy sets. The middle step, modification of membership values, is the core of fuzzy image processing: once the image data are transformed from the gray-level plane to the membership plane (fuzzification), appropriate fuzzy approaches modify the membership values.

Fuzzy logic offers two completely different approaches: one is region-based segmentation, and the other is contour-based segmentation. In region-based segmentation, one looks for the attributes, division, and growing of the region by using a threshold type of classification. In contour-based segmentation, local discontinuities are examined using mathematical morphology, derivative operators, etc. (Khan et al., 2015).

3. Image Segmentation Using Pattern Recognition

Supervised methods include many pattern recognition techniques. Several pattern recognition methods assume explicit distributions of the features and are known as parametric methods. For instance, the maximum likelihood (ML) technique commonly assumes multivariate Gaussian distributions. This means that the covariance matrices for every tissue are estimated from a user-provided training set, typically found by drawing regions of interest

(ROI) on the images. The remaining pixels are then classified by calculating the likelihood of each tissue class and selecting the tissue type with the highest likelihood. Parametric methods are only useful when the feature distributions for the different classes are well known, which is not necessarily the case for medical images. Nonparametric methods, like k-nearest neighbors (kNN), do not depend on predefined distributions but rather on the actual distribution of the training samples themselves. kNN has given superior results in terms of accuracy and reproducibility compared to parametric methods. Also reported is the use of ANNs and decision tree approaches (Secco et al., 2016).

4. Image Segmentation Using Textural Classification

Based on the texture of the image, textural classification of image segmentation is divided into four categories: geometrical, statistical, model-based, and signal processing. In this classification method, a number of sample images are trained based on their texture, and the test image is assigned to one of the trained texture classes for accurate classification and segmentation. This type of classification method is efficient for almost all types of digital image classification. Another important part of textural classification is the wavelet transform, in which spatial frequencies are evaluated at multiple scales by designing a wavelet function. "Markov random fields" are among the most accepted methods to model images so as to extract spatial information; these models assume that the intensity at every pixel in the image depends only on the intensities of the neighboring pixels. The next important step is feature selection or feature extraction. This is very important because image segmentation and classification are purely based on which type of feature is selected; the machine learning algorithm uses the statistical feature extraction method, in which features such as entropy, energy, standard deviation, mean, homogeneity, and so on are extracted to achieve the best classification result.

5. Image Segmentation Using Data Mining Technique

The data mining technique is used for extracting the features of the image, segmenting the image, and also for the classification of the segmented image. As this technique is also part of AI, it performs classification along with other methods such as pattern recognition, machine learning, databases, and statistics to extract image information from large sets of data. The data mining technique is used for many applications that have a large set of images.

Such a large set of images is analyzed by one of the popular methods, called the clustering method, in which features that have the same characteristics are combined into one large unit for analysis. The C-means algorithm is the preferred method based on cluster analysis.

The decision tree algorithm is another method used for classification of images by the data mining technique. This type of method uses association rules. The algorithm follows a tree pattern that helps the user segment and classify the image efficiently and makes the process easy.

9.3 MEDICAL DIAGNOSIS USING WORLD WIDE WEB

Figure 9.6 shows web-based medical diagnosis and the corresponding prediction. It involves four components, namely, a diagnosis module, a prediction module, a user interface, and a database; the model can both diagnose and predict. It consists of two sets of databases, namely, a patient database and a disease database. The patient database comprises personal information such as name, address, and medical history of the patient, if any. The disease database consists of all the information regarding the patient's illness, which includes the type of disease, treatments taken and suggested, and tests encountered. This is mainly bifurcated to improve the storage of patients' records, which helps other departments utilize the records when patients are referred to them, thereby providing centralized information access while securing the information from unauthorized users. The prediction module makes use of the neural network technique for prediction of the illness and condition of the patient based on similar previous cases; the data obtained via the databases are used for training and testing. On the other hand, the diagnosis module comprises specialist systems and fuzzy logic techniques for performing its tasks. The specialist system follows a set of rules defined based on the two databases and information about the disease; it uses these rules to identify the patient's ailments or diseases based on their current conditions or symptoms. The fuzzy logic techniques are used along with the system to improve the performance and enhance the reasoning. In Figure 9.6, the World Wide Web (WWW) is an essential part for interaction: the figure describes how the WWW acts as an interface between the system and patients, taking data from the specified databases.

9.4 AI FOR PROGNOSTICS

Out of the various methods being currently utilized and developed, prognostic models are also constructed.

FIGURE 9.6 Web-based diagnosis and prediction.


(Source: Reprinted from Ishak and Siraj, 2008.)
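The rule-based specialist module shown in Figure 9.6 can be sketched as a simple forward-matching rule set; all rules, symptom names, and disease labels below are hypothetical illustrations, not clinical guidance.

```python
# Hypothetical sketch of the specialist (rule-based) diagnosis module:
# each rule maps a set of symptoms to a candidate ailment, mirroring the
# "set of rules defined based on the two databases" described above.
# All rules, symptoms, and disease names are illustrative only.

RULES = [
    ({"fever", "cough", "breathlessness"}, "possible pneumonia"),
    ({"thirst", "frequent urination"}, "possible diabetes"),
    ({"fever", "headache"}, "possible viral infection"),
]

def diagnose(symptoms):
    """Return every candidate whose required symptoms are all present."""
    found = [disease for required, disease in RULES if required <= symptoms]
    return found or ["no rule matched; refer to physician"]

print(diagnose({"fever", "cough", "breathlessness", "fatigue"}))
# -> ['possible pneumonia']
```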

An integrated system health management (ISHM) takes certain input values from the signals and is able to provide fault detection, fault identification, and ultimately fault prognostics. An effective diagnostic should provide both fault identification and isolation; this is more effective than just flagging an unusual pattern in the sensor values. Prognostics may be defined as a method of detecting a failure and also the time period before which an impending failure occurs. To reduce damage, we must be able to carry out a diagnosis before performing prognostics. ISHM also gives a response command to the detected failure, and these responses may vary from changing the configuration of the hardware components to even recalibration of sensor values (Figure 9.7 and Table 9.1).

A very crude model of prognostics has been used. This makes use of statistical information regarding the time required for a failure to occur in a certain component of a system. This statistical information can also support the life prediction of all the other components present in that system. However, these data only help predict the life of a component; the reason for failure is not predicted using this basic model (Asch et al., 1990).

Modeling the system via ANN is the widely used approach to prognostics. ANN models create a mapping or relationship between the input and the output. These relationships or parameters may be modified to get the optimum result. This modification process is usually done by exposing the network to a

FIGURE 9.7 Taxonomy of ISHM algorithms.


(Source: Reprinted from Schwabacher and Goebel, nd.)

TABLE 9.1 Sample Methods for Each Algorithm Type (Rows) and ISHM Problem (Columns) in Data-Driven Prognostics

                         Fault Detection     Diagnostics           Prognostics
Physics based            System theory                             Damage propagation models
AI-model based                               Expert systems        Finite state machines
Conventional numerical   Linear regression   Logistic regression   Kalman filters
Machine learning         Clustering          Decision trees        Neural networks
(Source: Reprinted from Schwabacher and Goebel, nd.)
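The basic statistical prognostic model described above, which predicts a component's remaining life from recorded failure times, can be sketched as follows; the component, its failure history, and the service hours are hypothetical values used only for illustration.

```python
# Hypothetical sketch of the basic statistical prognostic model described
# above: estimate a component's expected time-to-failure from historical
# failure times, then report the remaining life of a unit in service.
# Note: as the text says, this model predicts life, not the failure cause.

def mean_time_to_failure(failure_hours):
    """Average of historical failure times for one component type."""
    return sum(failure_hours) / len(failure_hours)

def remaining_life(failure_hours, hours_in_service):
    """Expected hours left before failure (never negative)."""
    return max(0.0, mean_time_to_failure(failure_hours) - hours_in_service)

# Recorded failure times (hours) for a hypothetical pump component.
history = [980.0, 1020.0, 1000.0]

print(mean_time_to_failure(history))   # -> 1000.0
print(remaining_life(history, 600.0))  # -> 400.0
```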

number of examples and recording its response to those situations; this helps minimize the errors. A popular method to reduce these errors is gradient descent. Another approach is novelty detection: the model learns the nominal behavior of the system, and when the sensor output fails to tally with the given or expected data, an anomaly is indicated. Other data mining techniques also try to discover a pattern in the dataset.

Fuzzy logic is also a popular method used in AI. Using it, we can translate or derive meaning from qualitative knowledge and solve the problem at hand. Using linguistic variables, dynamic systems are modeled; such a variable is interpreted as an elastic constraint that is propagated by fuzzy inference operations. This mechanism of reasoning gives fuzzy logic incredible robustness with respect to variations and disturbances in the data and parameters. Fuzzy logic can be applied to prognostics in addition to machine learning algorithms; this helps reduce the uncertainty that most prognostic estimations fail to address. A learning algorithm to nullify any uncertainty, such as lazy learning or Q-learning, is implemented.
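The novelty-detection approach mentioned above, in which sensor output that fails to tally with expected behavior signals an anomaly, can be sketched as follows; the healthy readings and the three-standard-deviation threshold are hypothetical choices.

```python
# Hypothetical sketch of novelty detection for prognostics: learn the
# normal behavior (mean and spread) of a sensor from healthy data, then
# flag readings that deviate beyond k standard deviations as anomalies.
import math

def fit_normal_model(readings):
    """Mean and standard deviation of known-healthy sensor output."""
    mean = sum(readings) / len(readings)
    var = sum((r - mean) ** 2 for r in readings) / len(readings)
    return mean, math.sqrt(var)

def is_anomaly(reading, mean, std, k=3.0):
    """True when the reading fails to tally with expected behavior."""
    return abs(reading - mean) > k * std

healthy = [10.0, 10.2, 9.8, 10.1, 9.9]   # hypothetical healthy sensor log
mean, std = fit_normal_model(healthy)

print(is_anomaly(10.05, mean, std))  # -> False (within the normal band)
print(is_anomaly(14.0, mean, std))   # -> True  (possible impending fault)
```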

9.4.1 APPLICATION OF PROGNOSTICS

Several systems have seen an upward growth when it comes to utilizing prognostic models. These include different engineered systems: actuators, bearings, aircraft, electronic devices, turbines, helicopter gearboxes, hydraulic pumps, etc. A joint strike fighter, which will be used by the US Air Force, is being developed; the idea is to implement prognostics and fault detection mechanisms into every system available in the aircraft. Using a rule-based architecture along with model-based and data-driven methods, we can improve safety and also significantly reduce damage and cost.

Prognostic models are used in various applications, such as

(a) deriving healthcare policy by creating global predictive scenarios,
(b) supporting comparative review among hospitals by (case-mix adjusted) mortality predictions,
(c) defining research eligibility of patients for groundbreaking treatments,
(d) outlining inclusion requirements for medical studies to control for variation in prognosis (e.g., cost reimbursement programs), and
(e) selecting appropriate tests and methods of therapy in individual patient management, including supportive decisions on withdrawing or withholding treatment.

As supportive decision-making tools, it is useful to differentiate between two kinds of prognostic models: models at the level of the patient population and models at the person-specific level. These two levels imply different needs regarding which prognostic techniques can be employed for building and assessing the model.

The population-versus-individual distinction reflects two levels of Shortliffe's typology of decision-support functions. Shortliffe differentiates between three types of decision-help functions, which are as follows:

• information management,
• concentrating attention, and
• patient-specific assessment.

A prognostic model at the population level can serve, for example, in a quality assurance system that brings about the recognition of a discrepancy between the anticipated and actual rates of complications arising after surgery in a group of patients, and thereby helps spot the reasons for this discrepancy. A prognostic model can also be used as a foundation for providing treatment advice to the patient, which is certainly connected with

patient-specific assessment functions. The notion of "group" is, needless to say, coarse, as it covers situations where patients are stratified into a large number of different subgroups, for example, risk groups, as well as situations where the entire population is taken.

9.4.2 MODEL BUILDING

Many ways can be suggested to represent prognostic models, for example, quantitative and probabilistic approaches. Models need to be integrated in the medical management of the subject or patient that needs assessment.

There already exists a massive body of experience in developing prognostic models in the field of medical statistics. Some of the techniques that can be used are decision rules that depend on a prognostic score and the Bayes rule. The simplest prognostic instrument used in medicine is assessment using a prognostic score that helps classify a patient into a future risk category. This scoring system gives us a method to quantify the seriousness of the disease at hand: the higher the score, the greater the severity. The factors that contribute to or influence this scoring system are as follows:

1. physiological variables,
2. demographic variables, and
3. laboratory findings.

Each factor contributes "penalty points" to the prognostic scoring system, and this prognostic score may be used independently in medical diagnosis.

Survival analysis consists of a collection of statistical methods for time-dependent data, and a few of these methods have been employed for this type of analysis. We are basically interested only in the timeline, that is, when the event will occur. Although predicting the survival of patients sounds condescending, this technique can be used to predict the time of complete recovery after treatment, or the usual expected time of onset of a particular symptom after exposure to certain environments, in order to study patients and diseases well.

9.4.3 ASSESSMENT OF PROGNOSTIC MODELS

Over the past few years, the art of evaluating prognostic models has become crucial. The one question that arises for such a model is: will it work on a population that is different from the one used to develop the model itself? This is the main challenge faced by the model. Let us understand how to solve this problem.

There are two types of evaluations, namely, clinical and laboratory evaluations. The criterion for a model to pass a laboratory test is the comparison with statistical tests collected previously.

This kind of analysis has so many disadvantages, listed as follows:

a. inaccuracy,
b. imprecision,
c. inseparability, and
d. resemblance.

In clinical evaluation, the model is checked to see whether it satisfies its clinical purpose and whether its performance is efficient enough to deal with clinical prediction. The other factor that governs the choice of models is whether the model is needed to operate on the individual or the population level. For example, predicting deaths from infection over a large risk group (or various risk groups) and classifying the vital status of an individual patient cannot be evaluated under the same bracket.

These evaluation methods require a great deal of care. In the machine learning community, many models are being developed, each better than the one before. The next challenge that arises is: which is the correct statistical test to use? Thus, tests have to be modified and adapted according to the requirement.

9.5 APPLICATIONS OF PATIENT DIAGNOSIS AND MONITORING

All aspects of the universe have some very favorable advantages; however, they also have their own characteristic disadvantages. Medical diagnosis requires complex systems to perform tasks for applications such as health maintenance, and these systems are guided by ANNs. The flow of the procedure is given below.

ANNs have the following advantages:

• They provide a workspace to accomplish analysis, association, representation, and categorization of medical information.
• They generate innovative tools for supporting medical decision making and investigation.
• They combine activities in the health, computer, and cognitive sciences.
• They provide rich content for the upcoming technical therapeutic field.

There were a lot of challenges in the fields of education, reasoning, and language processing. The wide range of applications of AI has addressed these challenges by introducing a modern and digital way of getting things done. Some of the applications of AI in the medical field are discussed below.

1. Virtual inquiry system: This modern technology is used for teaching in hospitals and colleges, for medical aspirants as well as residents. Clinicians make use of this online system to interact with the patients.

The system accumulates data of thousands of patients, which is then interpreted by experts and AI as subject cases, giving an overview of clinical issues. This system provides an opportunity for medical students to diagnose, propose treatment plans, and develop problem-solving skills in clinical aspects, while the teachers understand the students' perception and adjust the course accordingly. Through interactions with multiple cases, the students are able to acquire the skills involved in disease diagnosis. Simultaneously, the system has the ability to detect any error that the students make while studying a case. The system can solve all of these difficulties with the help of deep learning and analysis. The psychological steps can be tracked by using a tool named the intelligent tutor system; this tool diagnoses the concepts that are wrongly interpreted, approximates the student's extent of understanding in the field, and guides accordingly based on the feedback provided.

2. Medical distance learning: Web-based teaching methods are used for communication, sharing, and learning. In the field of medicine, mobile nursing and clinical practice teaching play a vital role. Teaching methods such as microblogging and virtual simulation training have seen applications in the remote transmission of pathological films and imaging, as well as in instant transfer technology, active monitoring, online storage technology, integrated platform technology, self-healing technology, and computer-aided diagnosis. All of these have had a significant impact on the methods of teaching. At present (as of 2019), 50 state-level medical education projects have been approved, and more than 4000 projects have been announced every year. Most medical colleges have already been equipped with this facility. Each year, more than 1500 experts contribute to distance continuing medical education covering more than 20 secondary disciplines and 74 tertiary disciplines.

3. Influence of AI on distance medical education: Using information technology, resource libraries and data centers can be constructed for the recruitment of students. These can also be used for training process management as well as evaluation, which can help improve the service level and efficiency of medical education.

9.6 CONCLUSION

With the growing population and the need for technology, AI has played a vital role in the healthcare sector by providing different technological developments covering both urban and rural sectors in terms of the assistance given by doctors. ANNs have proved appropriate for reliable

identification of a variety of diseases; additionally, their use makes the diagnosis much more consistent and thus increases patient satisfaction. However, despite their extensive application in modern diagnosis, they should be thought of solely as a tool to facilitate the final decision of a practitioner, who is ultimately responsible for critical analysis of the output obtained. In this chapter, we discussed the need for AI, the benefits of using AI in the medical industry, one of its budding fields, prognostics, along with the prototype systems for making predictions, and the different approaches and neural network algorithms used mainly for the diagnosis of a wide variety of diseases. The chapter concluded with the applications of AI technology in the field of healthcare.

9.7 FUTURE VISION

With the recent advancements in the techniques of AI discussed in this chapter, fuzzy logic, image processing, ML, and neural networks have focused on superior augmentation of diagnosis data by computer. Key features such as reliability, less dependency on the operator, robustness, and accuracy are achieved using the image segmentation and classification algorithms of AI. Blue Brain, Human Brain, and Google Brain are some of the several ongoing projects in the field of AI. The integration of science and mathematics has invariably resulted in groundbreaking advancements within the medical industry. AI technology has seen notable advancements, particularly in biomedical research and medicine, thereby raising the hopes of society to a higher level. It has far more to offer in the coming years, given that support and sufficient funding are made available.

KEYWORDS

• artificial intelligence
• fuzzy logic
• neural networks
• prognostics
• biomedical
• applications

REFERENCES

Abu-Hanna A, Lucas PJF. Intelligent prognostic methods in medical diagnosis and treatment planning. In: Borne P, Iksouri, el Kamel A, eds. Proceedings of Computational Engineering in Systems Applications (IMACS-IEEE), UCIS, Lille, 1998, pp. 312–317.
Ahmed F. Artificial neural networks for diagnosis and survival prediction in colon cancer. Mol Cancer. 4: 29, 2005.
Aleksander I, Morton H. An Introduction to Neural Computing. International Thomson Computer Press, London, 1995.
Alkim E, Gürbüz E, Kiliç E. A fast and adaptive automated disease diagnosis method with an innovative neural network model. Neural Netw. 33: 88–96, 2012.
Amato F, López A, Peña-Méndez EM, Vaňhara P, Hampl A, Havel J. Artificial neural networks in medical diagnosis. J Appl Biomed. 11: 47–58, 2013. doi:10.2478/v10136-012-0031-x. ISSN 1214-0287.
Amin S, Byington C, Watson M. Fuzzy Inference and Fusion for Health, 2005.
Asch D, Patton J, Hershey J. Knowing for the sake of knowing: the value of prognostic information. Med Decis Mak. 10: 47–57, 1990.
Costin H, Rotariu H. Medical image processing by using soft computing methods and information fusion. In: Recent Researches in Computational Techniques, Non-Linear Systems and Control. ISBN: 978-1-61804-011-4.
Deepa SN, Aruna Devi B. A survey on artificial intelligence approaches for medical image classification. Indian J Sci Technol. 4(11): 1583–1595, 2011. ISSN: 0974-6846.
Goebel K, Eklund N, Bonanni P. Fusing competing prediction algorithms for prognostics. In: Proceedings of the 2006 IEEE Aerospace Conference. New York: IEEE, 2006.
Ishak WHW, Siraj F. Artificial Intelligence in Medical Application: An Exploration. 2008. https://www.researchgate.net/publication/240943548_artificial_intelligence_in_medical_application_an_exploration
Jiji GW, Ganesan L, Ganesh SS. Unsupervised texture classification. J Theor Appl Inf Technol. 5(4): 371–381, 2009.
Khan A, Li J-P, Shaikh RA. Medical image processing using fuzzy logic. IEEE, 2015.
Mandelkow H, de Zwart JA, Duyn JH. Linear discriminant analysis achieves high classification accuracy for the BOLD fMRI response to naturalistic movie stimuli. Front Hum Neurosci. 10: 128, 2016.
Rastgarpour M, Shanbehzadeh J. Application of AI techniques in medical image segmentation and novel categorization of available methods and tools. In: Proceedings of the International Multiconference of Engineers and Computer Scientists, Vol 1, IMECS, 2011. http://www.iaeng.org/publication/IMECS2011/IMECS2011_pp519-523.pdf
Rastgarpour M, Shanbehzadeh J. The status quo of artificial intelligence methods in automatic medical image segmentation. Int J Comput Theory Eng. 5(1), February 2013. https://pdfs.semanticscholar.org/e5f7/e56f19cf3efc460ba5d4ce0188cee664a735.pdf
Schwabacher M, Goebel K. A Survey of Artificial Intelligence for Prognostics. NASA Ames Research Center. https://ti.arc.nasa.gov/m/pub-archive/1382h/1382%20(Schwabacher).pdf
Schwabacher M, Goebel K. A survey of artificial intelligence for prognostics. In: Proceedings of the AAAI Fall Symposium—Technical Report, 2007.
Secco J, Farina M, Demarchi D, Corinto F, Gilli M. Memristor cellular automata for image pattern recognition and clinical applications. In: Proceedings of the 2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016, pp. 1378–1381. doi:10.1109/ISCAS.2016.7527506.
Shiyou L. Introduction to Artificial Intelligence (2nd edition). Xi'an University of Electronic Science and Technology Press, 2002.
Shukla S, Lakhmani A, Agarwal AK. Approaches of artificial intelligence in biomedical image processing. IEEE, 2016.
Smitha P, Shaji L, Mini MG. A review of medical image classification techniques. In: Proceedings of the International Conference on VLSI, Communication & Instrumentation (ICVCI), 2011.
Taoshen L. Artificial Intelligence. Chongqing University Press, 2002.
Zhang Z. Naive Bayes classification in R. Ann Transl Med. 4(12): 241, 2016.
Zixing C, Guangyou X. Artificial Intelligence and Its Applications (2nd edition). Tsinghua University Press, 1996.
CHAPTER 10

SEMANTIC ANNOTATION OF HEALTHCARE DATA
M. MANONMANI* and SAROJINI BALAKRISHANAN
Department of Computer Science, Avinashilingam Institute for Home
Science and Higher Education for Women, Coimbatore 641043, India
*Corresponding author. E-mail: manonmaniatcbe@gmail.com

ABSTRACT

In recent times, there has been more research aiming at providing personalized healthcare by combining heterogeneous data sources in the medical domain. The need to integrate multiple, distributed, heterogeneous data in the medical domain is a challenging task for data analysts. The integrated data is of utmost use to physicians for providing remote monitoring and assistance to patients. Apart from integrating multiple data sources, there is a large amount of information in the form of structured and unstructured documents in the medical domain that poses challenges in diagnosing the medical data. Medical diagnosis from a heterogeneous medical database can be handled by a semantic annotation process. Semantic annotation techniques help to assign meaningful information and relationships between different sources of data. Incorporating semantic models in an artificial intelligence (AI) expert system for the prediction of chronic diseases will enhance the accuracy of prediction, since the meaning of the data and the relationships among the data can be clearly described. This chapter aims to address the challenges of processing heterogeneous medical data by proposing a lightweight semantic annotation model. Semantic models have the capability to tag the incoming healthcare data and attach meaningful relationships between the user interface and the healthcare cloud server. From the relationships, prediction and classification of chronic diseases can be easy and at the earliest possible time with enhanced
218 Handbook of Artificial Intelligence in Biomedical Engineering

accuracy and low computation time. The main objective envisaged in this chapter is to propose a semantic annotation model for identifying patients suffering from chronic kidney disease (CKD). The purpose of the semantic annotation is to enable the medical sector to process disease diagnosis with the help of an Ontograf that shows the relationship between the attributes representing the presence of CKD (ckd) or its absence (not_ckd), and to attach meaningful relationships among the attributes in the dataset. The semantic annotation model will help to increase the classification accuracy of the machine learning algorithm in disease classification. A collaborative approach of semantic annotation and feature selection can be applied in biomedical AI systems to handle voluminous and heterogeneous healthcare data.

10.1 INTRODUCTION

Semantic annotation of healthcare data aids in processing the keywords attached to the data attributes and deriving relationships between the attributes for effective disease classification (Du et al., 2018). Semantic annotation implemented on the basis of artificial intelligence (AI) expert systems will bring accurate and timely management of healthcare data, given the ever-increasing quantity of medical data and documents. Effective knowledge discovery can be envisaged with the foundation of AI expert systems in the field of medical diagnosis. The complexity of medical data hinders communication between the patient and the physician; this can be simplified with semantic annotation, which provides a meaningful abstraction of the features in the diagnosis of chronic diseases.

Proper implementation of semantic annotation in medical diagnosis will ensure that every tiny detail regarding the health of the patient is taken care of and that important decisions regarding their health are delivered to patients through remote access. Heterogeneous medical information, such as pharmaceutical information, prescription information, doctors' notes or clinical records, and healthcare data, is generated continuously in a cycle. The collected healthcare data need to be harnessed in the right direction by ensuring the integrity of heterogeneous data for diagnosing chronic illness and for further analysis. Otherwise, the integrity of the vast medical data poses a major problem in the healthcare sector. Semantic annotation of the incoming healthcare data forms the basis of the research motivation, so that the problem of late diagnosis can be overcome.

10.2 MEDICAL DATA MINING

Medical data mining deals with the creation and manipulation of a medical knowledgebase for clinical decision support. The knowledgebase is generated by discovering hidden patterns in the stored clinical data. The generated knowledge and interpretations of decisions help the physician to recognize new interrelations and regularities among the features that could not be seen in an explicit form. However, there are many challenges in processing medical data, such as high dimensionality, heterogeneity, voluminous data, and imprecise and inaccurate data (Househ and Aldosari, 2017).

10.2.1 SEMANTIC ANNOTATION OF MEDICAL DATA

10.2.1.1 MEANING

Clinical care and research are increasingly relying on digitized patient information. A typical patient record consists of a lot of composite and heterogeneous data, ranging from lab results, patients' personal data, and scan images (including CT and MRI) to physician notes, gene information, appointment details, treatment procedures, and follow-ups. The main challenge encountered in using this huge amount of valuable medical records lies in integrating the heterogeneous data and making use of the results, which in turn helps data analysts to exploit the advantages of medical research. Harnessing this wide range of information for prediction and diagnosis can provide more personalized patient care and follow-up. In many organizations, the data about a patient is maintained in different departments, and at the receiving end there is a lack of interconnection between the departments, which leads to incomplete and unreliable information about the patient. At the physician's end, the data about a particular patient is updated only with regard to the treatment procedures and follow-up of that department. This results in a fragmented view of a particular patient, and there is a lack of coordination among the different departments in the hospital to get an overall picture of the health of the patient. On the other hand, the clinical decision support system aims to provide increased healthcare assistance with enhanced quality and efficiency (Du et al., 2018). Efficient access to patient data becomes challenging with the use of different formats, storage structures, and semantics. Hence, it becomes reasonably advantageous for the medical sector to capture the semantics of the patient data to enable seamless integration and enhanced use of diverse biomedical information.
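The kind of hidden pattern that medical data mining surfaces from stored clinical data can be illustrated with a toy co-occurrence count. The records, findings, and the candidate rule below are invented for illustration only and are not drawn from any real dataset:

```python
from collections import Counter
from itertools import combinations

# Toy clinical records (hypothetical findings, for illustration only).
records = [
    {"hypertension", "diabetes", "ckd"},
    {"hypertension", "anemia", "ckd"},
    {"diabetes", "ckd"},
    {"hypertension", "diabetes", "ckd"},
    {"anemia", "notckd"},
    {"diabetes", "notckd"},
]

# Count how often each pair of findings co-occurs across records.
pair_counts = Counter()
for rec in records:
    for pair in combinations(sorted(rec), 2):
        pair_counts[pair] += 1

# Support and confidence for the candidate rule hypertension -> ckd.
n = len(records)
support = pair_counts[("ckd", "hypertension")] / n
confidence = (pair_counts[("ckd", "hypertension")]
              / sum(1 for r in records if "hypertension" in r))
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In these toy records the rule hypertension → ckd holds in half of all records (support 0.5) and in every record that mentions hypertension (confidence 1.0); a real mining run would score many such candidate rules over the stored clinical data.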

Semantic annotation is based on the concept of tagging the data with conceptual knowledge so that it can be represented in a formal way using an ontology, making the information understandable and usable by computers. The basis of semantic annotation is the creation of tags that are computer-understandable and give a clear picture of the role of the features in a given dataset. In the medical domain, semantic annotation models can help to overcome the issue of integrating different hardware components by using a high-end ontology structure so that the interaction between the user interfaces can be handled easily (Liu et al., 2017).

Semantic annotation can be used to enrich data, and it can also be a viable solution for semiautomatic and automatic systems interoperability (Liao et al., 2011). Moreover, semantic feature selection processes, when applied to medical datasets, provide inference about feature dependencies at the semantic level rather than the data level, so that a reduced feature set is arrived at, which in turn aids in the early detection of diseases for patients suffering from chronic illness.

10.2.1.2 NEED FOR SEMANTIC ANNOTATION

When a patient is admitted to the hospital, he is entangled by many medical devices, including glucose level sensing devices, heart monitors, blood pressure monitors, and IVs. But the management of these devices and the recording of information from them consume time and are sometimes even prone to errors.

In the present scenario, with the advent of biomedical devices and AI expert systems, handling of patients' data can be done automatically with the help of electronic health record (EHR) systems. These systems provide accurate information about the patient and save time for the nurses so that they can spend valuable time with the patient in providing more care. Added to the advantages provided by biomedical devices and AI expert systems, semantic annotation of the medical data can enhance the wide range of solutions provided in the medical field. Physicians are loaded with a lot of patient information, and proper diagnostic solutions can be arrived at if the data received from heterogeneous devices are semantically annotated. Semantic analysis of medical data provides meaningful relationships between the symptoms of any disease, and these relationships aid in the proper diagnosis of the disease (Zhang et al., 2011).
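The idea of attaching meaningful, machine-readable relationships between device readings, symptoms, and diseases can be sketched as plain subject-predicate-object triples. All identifiers below are hypothetical and are not taken from any standard medical ontology:

```python
# Hypothetical subject-predicate-object triples; the names are illustrative
# assumptions, not terms from a published vocabulary.
triples = set()

def annotate(subject, predicate, obj):
    """Tag a piece of incoming data with a machine-readable relationship."""
    triples.add((subject, predicate, obj))

# Annotate readings arriving from heterogeneous devices with their meaning.
annotate("patient_1", "hasReading", "bp_reading_42")
annotate("bp_reading_42", "measuredBy", "blood_pressure_monitor")
annotate("bp_reading_42", "indicatesSymptomOf", "hypertension")
annotate("hypertension", "isRiskFactorFor", "chronic_kidney_disease")

# A computer can now follow the relationships instead of guessing at them.
def related(subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}

print(related("bp_reading_42", "indicatesSymptomOf"))  # {'hypertension'}
```

The same reading, once tagged, can be traced from the device through the symptom to the disease, which is the chain of meaning that supports diagnosis.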

10.2.1.3 INTEGRATING MACHINE LEARNING TECHNIQUES AND BIOMEDICAL EXPERT SYSTEMS INTO THE SEMANTIC ANNOTATION PROCESS

Semantic annotation of medical data plays an important role in healthcare organizations, enabling ubiquitous forms of representing the information that is collected from various heterogeneous sources. By integrating machine learning techniques and biomedical expert system information with the semantic annotation process, complex queries regarding emergency medical situations can be handled with ease, and reliable information sharing can be achieved (Pech et al., 2017).

Effective healthcare delivery systems require the automation of data sharing among interconnected computers so that information sharing is done quickly and with high-speed response times. In the healthcare sector, data interconnection systems and the underlying schema of the devices enable data sharing between and across the stakeholders of the medical domain, surpassing vendor and application details. However, a lack of interoperability across organizational boundaries hinders information exchange between systems via a complex network developed by divergent manufacturers.

In medicine, data mining techniques applied in ontology-based semantic annotation processes enable accurate data transfer, secure data sharing, and consistent data exchange, surpassing the underlying hardware and physical devices involved in data management. The clinical or operational data are preserved with meaningful relationships so that these data can be accessed and processed at any time when required.

The semantic annotation process provides a clear classification scheme of the features that are relevant in the interpretation of medical data. Semantic annotation models describe the resource provided by the user through the process of annotating the data, as represented in Figure 10.1.

FIGURE 10.1 Generic annotation model.
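A minimal sketch of a generic annotation model of the kind shown in Figure 10.1, in which an annotation binds a user-supplied resource to a concept from a shared ontology. The class and field names here are assumptions made for illustration, not a published schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A concept drawn from a shared ontology (names are illustrative)."""
    ontology: str
    label: str

@dataclass(frozen=True)
class Annotation:
    """Binds an annotated data item to an ontology concept."""
    resource: str   # identifier of the annotated data item
    concept: Concept
    source: str     # who or what produced the annotation

# Annotate one lab result with its meaning in a hypothetical ontology.
ann = Annotation(
    resource="lab_result_007",
    concept=Concept(ontology="hospital-ontology", label="SerumCreatinine"),
    source="annotation_service",
)
print(ann.concept.label)  # SerumCreatinine
```

Because the annotation record, not the raw value, carries the ontology concept, two systems that share the ontology can exchange the data regardless of the underlying device or vendor format.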



In a semantic annotation model, the meanings and relations of the data terms are delivered to the computer using ontology objects, which in fact enrich the resource information. Web Ontology Language (OWL) is considered a standard language for ontology representation for the semantic web (Jabbar et al., 2017). The proposed research work is implemented on the Protégé 5.0 ontology editor, which provides various options for querying and structuring. The query process built into the Protégé tool enables users to perform a detailed analysis of the input and output data. The result of the semantic annotation process is an OWL instance file, which is available to the end-users of the system; the output file can be processed further for more pattern analysis and knowledge discovery.

10.3 LITERATURE SURVEY

Ringsquandl et al. (2015) have presented two semantic-guided feature selection approaches for different data scenarios in industrial automation systems. This model was executed to handle the high-dimensional data produced by automation systems. To eliminate highly correlated features and to speed up processing, the authors have presented a semantic-guided feature selection model, developing semantic guidance for feature selection, which is in fact an intensive task in semantic annotation models. The authors have analyzed the cases of high-dimensional data and have operated on the features in two different approaches. In the first approach, a large number of features and instances are found. In this case, the authors have used the Web Ontology Language Version 2 Query Language (OWL 2 QL) for Ontology-Based Data Access, which uses TBox-level feature selection instead of an instance-level approach. In the second approach, there are fewer instances than features. In this case, the authors have introduced an embedded feature selection approach that provided a semantic annotation model for handling engineering knowledge. The results of the semantic feature selection have been represented in a Resource Description Framework (RDF) graph that specifies the reduced feature set, which can be fed to machine learning models for further analysis.

Jabbar et al. (2017) have proposed an IoT-based semantic interoperability model to provide semantic annotations to the features in a medical dataset, which helps to resolve the problem of interoperability among the various heterogeneous sensor devices in the healthcare sector. The health status

of the patients is monitored and reported to the medical assistants with the help of sensor devices that operate remotely. The data collected through the sensors is semantically annotated to provide meaningful relationships between the features and the class that determines whether the patient is affected by heart disease or not. The dataset taken for evaluation is the heart disease dataset available in the UCI repository, and an RDF graph is generated to show the relationships between the features and the final class.

Gia et al. (2015) have used fog computing at smart gateways to provide a high-end health monitoring system. The authors have exploited the advantages of embedded data processing, distributed storage, and notification services in their work. In many cardiac diseases, electrocardiogram (ECG) feature extraction plays an important role, and hence ECG feature extraction was selected as the case study in this research work. The vital signs of the patients were recorded, including ECG signals, P waves, and T waves; a lightweight wavelet transform mechanism was adopted to record them. Experiments conducted reveal 90% bandwidth efficiency using fog computing. At the edge of the network, the latency of the real-time response was also good for the experiments that were undertaken using fog computing. In this paper, the authors have provided an augmented health monitoring system based on fog computing. Apart from providing fog computing at a gateway, the authors have offered better interoperability, a graphical user interface with access management, a distributed database, a real-time notification mechanism, and location awareness. In addition, they have introduced a versatile and lightweight template for extracting ECG features. The results reveal that fog computing at a gateway has in fact produced better results compared to previous achievements.

Chui et al. (2017) have presented methods of diagnosing diseases in smart healthcare. This paper gives a summary of recent algorithms based on optimization and machine learning. The optimization techniques discussed in the paper include stochastic optimization, evolutionary optimization, and combinatorial optimization. The authors have focused their research on diagnosing diseases like cardiovascular disease, diabetes mellitus, Alzheimer's disease and other forms of dementia, and tuberculosis. In addition, the issues and challenges encountered in the classification of disease in the healthcare sector have also been dealt with in this paper.

Antunes and Gomes (2018) have presented the demerits of

the existing storage structure and analytical solutions in the field of data mining and IoT. An automated system that can process word categories easily was devised as an extension to the unsupervised model. They have aimed to provide a solution for semantic annotation, and the Miller–Charles dataset and an IoT semantic dataset were used to evaluate the undertaken research work. Against human classification, the correlation achieved was 0.63. The reference dataset, that is, the Miller–Charles dataset, was used to find the semantic similarity. A total of 38 human subjects were analyzed to construct the dataset, which comprises 30 word pairs. A scaling system of 0 for no similarity and 4 for perfect synonymy was adopted to rate the word pairs. Then, 20 frequently used terms were collected and ordered into 30 word pairs. Each pair was rated on a scale from 0 to 4 by five fellow researchers. The correlation result was 0.8 for human classification. Unsupervised training methods were used by the authors to mark the groups and also to improve accuracy. The model was evaluated based on the mean squared error. For a given target word u and different neighborhood dimensions, the performance of the distributional profile of a word, represented as DPW(u), and the distributional profile of multiple word categories, represented as DPWC(u), was calculated both with affinity and without affinity. Co-occurrence cluster metrics and cosine similarity cluster metrics were evaluated. To take multiple word categories into consideration, the model was extended further to accommodate multiple words, and a novel unsupervised learning method was developed. Issues like noisy dimensions from distributional profiles and sense conflation were taken into consideration, and to curb these, dimensionality reduction filters and clustering were employed in the model. By this method, the accuracy can be increased and the model can be put to wider use. A correlation of 0.63 was achieved after evaluating the results against the Miller–Charles dataset and an IoT semantic dataset.

Sharma et al. (2015) have presented an evaluation of stemming and stop word techniques on a text classification problem. They have summarized the impact of stop word removal and stemming on feature selection. The experiment was conducted with 64 documents having 9998 unique terms. The experiments have been conducted using nine document frequency threshold values (sparsity values in %) of 10, 20, 30, 40, 50, 60, 70, 80, and 90; the threshold is the proportion value rather than the sparsity value. Experimental results show that the removal of stop words decreases the size of the feature set. They have found the

maximum decrement in the feature set, at sparsity value 0.9, to be 90%. The results also indicate that the stemming process significantly affects the size of the feature set at different sparsity values. As they increased the sparsity value, the size of the feature set also increased; only for sparsity value 0.9 did the feature set decrease, from 9793 to 936 terms. The results further reveal an important fact: stemming, even though it is very important, makes only a marginal difference in the number of terms selected. From the experimental results, it could be seen that preprocessing has a huge impact on the performance of classification. The goal of preprocessing, cutting back the number of features, was successfully met by the chosen techniques.

Piro et al. (2016) in their paper have tried to overcome the difficulties encountered in computing the challenging parts of the Healthcare Effectiveness Data and Information Set (HEDIS) measures by applying semantic technologies in healthcare. RDFox was used with the rule language of the RDF triple store, and a clean, structured, and legible encoding of HEDIS was derived. The reasoning ability of RDFox and SPARQL queries were employed in computing and extracting the results. An ontology schema was developed using the RIM modeling standard, which depicts the raw data in an RDF graph format. The authors have implemented a data model in RDFox Datalog, which was used to express HEDIS CDC in a rule ontology that is in fact close to the specification language. The RDF data format proved to be flexible during the development of the rule ontology. Experimental results in fact exceeded the HEDIS data analysts' expectations due to the capability of the semantic technology employed in the work. The results seemed to be exceedingly good when the rules were evaluated on the patients' records. By using the high efficiency of RDFox triple store technology, it was possible to fit the details of about 466,000 patients easily into memory, and the time taken was 30 min, which was low compared to previous works. The discrepancies in the raw data were easily traced using the explanation facilities in RDFox. This enabled a reduction in the number of development cycles, and the problems in the vendor solution were resolved. The HEDIS analyst approved the results and solutions derived by the authors based on the application of RDFox.

Lakshmi and Vadivu (2017) have used multicriterion decision analysis for extracting association rules from EHRs. In this paper, the authors have arrived at a novel approach by extracting association rules from patients' health records based on

the results of the best association rule mining algorithm implemented for different criteria. The main aim of the research work is to identify the correlation between diseases, diseases and symptoms, and diseases and medicines. The apriori algorithm, frequent pattern (FP) growth, equivalence class clustering and bottom-up lattice traversal, the apriori algorithm for transaction databases (apriori TID), recursive elimination (RElim), and Close algorithms are employed to extract the association rules. Once the association rules are generated, the best algorithm is chosen using multicriteria decision analysis; the method used is the ELECTRE 1 method. The three parameters used to evaluate the algorithms are P1: performance, P2: memory space, and P3: response time. The performance metric is calculated based on the average number of rules generated for different values of support and different values of confidence. Results indicate that the rules generated by the association rule mining algorithms are more or less the same. Initially, the maximum and minimum support values for generating rules were 100 and 10, respectively; the support values were increased and the confidence values were decreased in the next stage. Compared to the other algorithms, the RElim, apriori TID, and FP-growth algorithms proved to be more effective under the multiple criteria decision analysis techniques. To get better classification results, lift, which is a correlation measure, is applied to the rules generated by these algorithms.

10.4 SEMANTIC ANNOTATION MODEL FOR HEALTHCARE DATA

The main objective of this research is to provide a semantic annotation model in medical diagnosis that identifies the healthcare data, analyzes keywords, attaches important relationships between the features with the help of a knowledge graph, and disambiguates similar entities. The proposed research work is depicted in Figure 10.2.

In the proposed research work, as depicted in Figure 10.2, the healthcare data from the UCI repository is loaded into the ontology editor. Then, fuzzy rule mining with if–then conditions is applied to the dataset to identify the keywords in the dataset. The keywords extracted from the dataset are in fact tokens that deliver information regarding the medical data in the healthcare sector. The tokens are sent to the tagging section, which associates tags with the attribute values. These tags are semantically annotated using the fuzzy rules in the ontology. The

rule mining algorithm checks each feature in the dataset and generates a keyword for the satisfied condition. There exists a relationship between a selected feature and the binary classification type "yes" if the feature satisfies the given if–then condition specified in the rule mining algorithm. If the condition is not satisfied, then a relationship is expressed between the selected feature and the binary classification type "no." The result is context-aware data represented in the form of an Ontograf, which aids in the analysis of the medical diagnosis.

FIGURE 10.2 Proposed semantic annotation model.
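The tagging step of the proposed model can be sketched with crisp if-then rules standing in for the fuzzy rules held in the ontology. The thresholds and keywords below are illustrative assumptions, not the chapter's actual rule base:

```python
# Crisp if-then rules standing in for the fuzzy rules in the ontology.
# Feature thresholds and emitted keywords are hypothetical.
RULES = [
    # (feature, condition, keyword emitted when the condition holds)
    ("albumin", lambda v: v >= 2, "high_albumin"),
    ("hemoglobin", lambda v: v < 11, "low_hemoglobin"),
    ("serum_creatinine", lambda v: v > 1.2, "high_creatinine"),
]

def annotate_record(record):
    """Check each feature and relate its tag to the binary class yes/no."""
    tags = []
    for feature, condition, keyword in RULES:
        if feature in record and condition(record[feature]):
            tags.append((keyword, "relatedTo", "yes"))   # supports ckd
        else:
            tags.append((feature, "relatedTo", "no"))    # supports not_ckd
    return tags

patient = {"albumin": 3, "hemoglobin": 9.8, "serum_creatinine": 1.4}
for tag in annotate_record(patient):
    print(tag)
```

Each emitted triple ties a keyword (token) to the "yes" or "no" side of the binary classification, which is the relationship the Ontograf later visualizes.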

10.5 DATASET DESCRIPTION

Experiments were undertaken on the CKD dataset taken from the UCI repository, which consists of clinical data of patients suffering from kidney disease. The dataset consists of the records of 400 patients. There are a total of 25 features, of which 11 are numeric and 14 are categorical. The features present in the CKD dataset are age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cells, pus cell clumps, bacteria, blood glucose random, blood urea, serum creatinine, sodium, potassium, hemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes mellitus,

coronary artery disease, appetite, pedal edema, anemia, and class. The last feature is a binary output variable represented as "ckd" or "notckd."

10.6 EXPERIMENTS AND RESULTS

The dataset is analyzed in the OWL language by creating semantic relationships between the features and the class. The output of the semantic annotation process is an Ontograf that shows the relationships in an ontology graph. Every attribute in the Ontograf has a specific color for identification. The results of the asserted hierarchy of the entire dataset are shown in Figure 10.3.

FIGURE 10.3 Ontograf showing the relationship between the attributes and class.
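The asserted hierarchy shown in the Ontograf can be approximated as a small edge set built from the annotated records. Node and edge labels here are illustrative, not the exact Protégé output:

```python
# Sketch: assembling Ontograf-style edges from annotated patient records.
# "hasSubclass" and "hasStatus" are hypothetical edge labels.
edges = set()

def assert_class(patient_id, has_ckd):
    """Place a patient under the Patient class and attach its binary class."""
    cls = "ckd" if has_ckd else "notckd"
    edges.add(("Patient", "hasSubclass", patient_id))  # asserted hierarchy
    edges.add((patient_id, "hasStatus", cls))

assert_class("patient1", has_ckd=False)
assert_class("patient383", has_ckd=True)

# Query the graph: which patients carry the ckd class?
ckd_patients = sorted(s for s, p, o in edges
                      if p == "hasStatus" and o == "ckd")
print(ckd_patients)  # ['patient383']
```

A query over the edge set retrieves exactly the relationships the graphical Ontograf displays, which is what makes the annotated output machine-processable as well as visual.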

In Figure 10.4, the overall relationship between the main OWL instance, the binary class of CKD, the patient details, and the features related to these classes is depicted.

FIGURE 10.4 Ontograf showing overall relationship between the main class, subclass, and
features.

In Figure 10.5, the Ontograf is simulated based on the conditions satisfied in the rule mining algorithm. All the features in the dataset are semantically annotated using the fuzzy rule mining algorithm, which shows the interrelationship between the features and the classes.

FIGURE 10.5 Ontograf that depicts the semantic relationship between the class and the
binary outputs “ckd” and “not ckd.”

In Figure 10.6, the Ontograf shows the semantic relationship between the patient with id 383 and the patient with id 1, who have ckd and do not have ckd, respectively.

FIGURE 10.6 Ontograf showing the semantic relationship between patient383 and patient1
and binary outputs “ckd” and “notckd.”
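Annotated relationships of the kind shown in Figure 10.6 can drive a simple prediction step for a new patient. The majority-vote scheme below is an assumption made for illustration; the chapter instead feeds the annotated data to a machine learning classifier:

```python
# Hypothetical prediction step over annotated (tag, relation, class) triples.
def predict(tags):
    """Majority vote over the 'yes'/'no' relationships of a patient."""
    yes = sum(1 for _, _, cls in tags if cls == "yes")
    no = sum(1 for _, _, cls in tags if cls == "no")
    return "ckd" if yes > no else "notckd"

# Illustrative tags for an incoming patient (invented values).
tags_383 = [("high_albumin", "relatedTo", "yes"),
            ("low_hemoglobin", "relatedTo", "yes"),
            ("serum_creatinine", "relatedTo", "no")]
print(predict(tags_383))  # ckd
```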

The clinical data of all patients are annotated semantically based on fuzzy rule mining, and the relationships indicate the semantic relationship between the features and the binary class "ckd" or "notckd." Thus, semantically annotated data proves to be informative in providing the relationships between the features of the medical dataset that aid in knowledge discovery and early diagnosis of chronic diseases.

10.7 CONCLUSION

This chapter aims to address the challenges of extracting a relevant feature subset through the process of semantic annotation in healthcare data by proposing a lightweight semantic annotation model. The CKD dataset was used to analyze the experiments of the semantic annotation process. The features present in the dataset were semantically annotated using fuzzy rule mining with if–then conditions. The result is an Ontograf, a graphical representation in OWL that shows the relationships between the features to predict whether the patient is suffering from CKD or not. This semantic annotation model can be used to predict the diagnosis of chronic illness for any incoming patient. In addition, data from IoT and sensor devices can be fed to this proposed

model to annotate the data and predict if a particular patient is affected or not affected by any chronic disease. The issues of information sharing between the various devices and allied components in the healthcare sector are handled with the help of ontology-based semantic analysis of the healthcare data (Tchechmedjiev et al., 2018). This semantic annotation model will prove to be effective in identifying the relevant features that aid in disease diagnosis, mainly for patients who are suffering from chronic diseases. This chapter stresses the need for achieving semantic annotation by surpassing the implementation challenges with the use of ontology. The semantic analysis provides meaningful relationships between the features that help in the early diagnosis of chronic illness.

KEYWORDS

• semantic annotation
• heterogeneous models
• medical data mining
• artificial intelligence (AI)
• expert system
• machine learning algorithm

REFERENCES

Antunes, M., Gomes, D., and Aguiar, R. (2018). Towards IoT data classification through semantic features. Future Generation Computer Systems, 86, 792–798.
Ashrafi, N., et al. (2017). Semantic interoperability in healthcare: challenges and roadblocks. In Proceedings of STPIS'18, vol. 2107, pp. 119–122.
Chui et al. (2017). Disease diagnosis in smart healthcare: innovation, technologies and applications. Sustainability, 9(12), 2309.
Du, Y., et al. (2018). Making semantic annotation on patient data of depression. In Proceedings of the 2nd International Conference on Medical and Health Informatics (ICMHI 2018), Association for Computing Machinery, pp. 134–137.
Gefen, D., et al. (2018). Identifying patterns in medical records through latent semantic analysis. Communications of the ACM, 61(6), 72–77.
Gia, T. N., et al. (2015). Fog computing in healthcare internet of things: a case study on ECG feature extraction. In 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, Liverpool, pp. 356–363.
Guerrero-Contreras, G., et al. (2017). A collaborative semantic annotation system in health: towards a SOA design for knowledge sharing in ambient intelligence. Mobile Information Systems, 2017, Article ID 4759572, 10 pages.
He, Z., Tao, C., Bian, J., Dumontier, M., and Hogan, W. R. (2017). Semantics-powered healthcare engineering and data analytics. Journal of Healthcare Engineering, 2017, 7983473. doi:10.1155/2017/7983473.
Househ, M. and Aldosari, B. (2017). The hazards of data mining in healthcare. Studies in Health Technology and Informatics, 238, 80–83.
Jabbar, S., Ullah, F., et al. (2017). Semantic interoperability in heterogeneous IoT infrastructure for healthcare. Wireless
232 Handbook of Artificial Intelligence in Biomedical Engineering

Communications and Mobile Computing, P. et al. (eds) The Semantic Web—ISWC


2017, Article ID 9731806, 10 pages. 2016. ISWC 2016. Lecture Notes in
Lakshmi, K. S., and Vadivub, G. (2017). Computer Science, vol. 9982. Springer,
Extracting association rules from medical Cham
health records using multi-criteria Ringsqunadi, M., et al. (2015). Semantic-
decision analysis. In Proceedings of 7th guided feature selection for industrial
International Conference on Advances in automation systems. International
Computing & Communications, August Semantic Web Conference, Springer
2017, Cochin, India, vol. 115, pp. 290–295. (2015), LNCS 9367, pp. 225–240.
Liao, Y., Lezoche, M., et al. (2011). Why, Sharma, D., Jain, S. (2015). Evaluation of
where and how to use semantic annotation stemming and stop word techniques on
for systems interoperability. 1st UNITE text classification problem, International
Doctoral Symposium, Bucarest, Romania, Journal of Scientific Research in Computer
pp 71–78. hal-00597903 Science and Engineering, 3(2), 1–4.
Liu, F., Li, P., and Deng, D. (2017). Device- Tchechmedjiev, A., et al. (2018). SIFR
oriented automatic semantic annotation in annotator: ontology-based semantic
IoT. Journal of Sensors, 2017, Article ID annotation of French biomedical text and
9589064, 14 pages. clinical notes. BMC Bioinformatics, 19,
Pech, F., et al. (2017). Semantic annotation 405, 26 pages.
of unstructured documents using concepts Zhang, L., Wang, T., Liu, Y., and Duan, Q.
similarity. Scientific Programming, 2017, (2011). A semi-structured information
10 pages. semantic annotation method for Web pages.
Piro, R., et al. (2016). Semantic technologies Neural Computing and Applications, 2017,
for data analysis in health care. In: Groth Article ID 7831897, 10 pages.
CHAPTER 11

DRUG SIDE EFFECT FREQUENCY MINING OVER A LARGE TWITTER DATASET USING APACHE SPARK

DENNIS HSU1*, MELODY MOH1*, TENG-SHENG MOH1, and DIANE MOH2

1 Department of Computer Science, San Jose State University, San Jose, CA, USA
2 College of Pharmacy, Touro University, Vallejo, CA, USA
* Corresponding author. E-mail: dennis.hsu@gmail.com, melody.moh@sjsu.edu
ABSTRACT

Despite clinical trials by pharmaceutical companies as well as current Food and Drug Administration reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. To extract the frequency of drug side effects from tweets, a pipeline can be used with sentiment analysis-based mining and text processing. Machine learning is applied in the form of a new ensemble classifier that uses a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects. In addition, the frequency count for the side effects is also provided. Furthermore, we have also implemented the same pipeline in Apache Spark to improve the speed of processing of tweets by 2.5 times, as well as to support the processing of large tweet datasets. As the frequency count of drug side effects opens a wide door for further analysis, we present a preliminary
study on this issue, including the side effects of simultaneously using two drugs and the potential danger of using less common combinations of drugs. We believe the pipeline design and the results presented in this work would have helpful implications for studying drug side effects and for big data analysis in general. With the help of domain experts, it may be used to further analyze drug side effects, medication errors, and drug interactions.

11.1 INTRODUCTION

Monitoring drug side effects is an important task for both the Food and Drug Administration (FDA) and the pharmaceutical companies developing the drugs. Missing these side effects can lead to potential health hazards that are costly, forcing a drug withdrawal from the market. Most of the important side effects are caught during the drug clinical trials, but even those trials do not have a large enough sample size to catch all the side effects. As for drugs that are already on the market, current reporting systems for those drugs use voluntary participation, such as the FDA Adverse Event Reporting System (FAERS), which monitors reports of drug side effects from healthcare providers 0. Thus, the system only catches side effects that are considered severe while missing side effects that are not reported by the average consumer.

To solve this problem, one solution is to use a much larger database where many more reports of side effects can be found: social media. With the current widespread use of social media, the amount of data provided by platforms such as LinkedIn, Facebook, Google, and Twitter is enormous. Social media has been used in many different fields of study due to both its large sample size and its ease of access. For mining drug side effects, social media has many different users who report their daily use of the drugs they are taking as well as any side effect they get, and most of these reports are in the form of communications with other users.

To achieve this goal, machine learning can be used to design and implement a pipeline that will aid in mining Twitter for the frequency of reported drug side effects. The pipeline also has to be fast enough and have the ability to support large datasets through a distributed framework such as Apache Spark. The data used will come from Twitter, which has its own set of unique features. In the pipeline, Twitter was chosen because of its ease of access to the data in the form of tweets through the Twitter Application Program Interface (API). Also, the tweets are only 140 characters long, making them easy to process and store.
Extracting drug side effects from Twitter comes with numerous challenges that are hard to overcome. There are previous works in this regard 00; 0 0, all of which have excellent explorations into different ways of classification and extraction. The work in this chapter expands on extraction, focusing mostly on sentiment analysis (opinion mining) tools. Sentiment analysis is the process of identifying and categorizing opinions expressed in text (0), and in our case these opinions are used to classify the tweets as positive or negative. To get the frequency of tweets, identifying tweets with drug side effects is required, and sentiment analysis tools use features such as reactions to taking a drug to provide such identification. Some other challenges of extraction with tweets include reducing the amount of noise in tweets. Tweets usually contain incomplete sentences as well as acronyms and general slang. Tweets also must be filtered properly to remove spam such as advertisements by drug companies or announcements by news organizations. Finally, the dataset mined in this work is larger than in earlier works 0; 0; 0. To process this dataset, Apache Spark is used to speed up the pipeline. Apache Spark is an open-source distributed cluster framework that can provide parallel processing to speed up extraction from the dataset 0.

In this chapter, we introduce the importance of monitoring adverse drug events (ADEs), the current lack of methods to report these events, and the potential of social media as a new source of reports. Next, we will go into techniques to extract these reports from the social media website Twitter using sentiment analysis and machine learning. Then, we will show how to process and handle the large datasets that come from Twitter using distributed computing through Apache Spark. This is followed by an analysis of experiment results on the best and most efficient ways to correctly extract the frequency of ADEs using the techniques previously described. Afterward, a detailed pharmaceutical analysis is provided for the results, with insight from a domain expert, as well as references to actual medical and pharmacy databases (WebMD LLC, 2019; Medline Plus, 2017; Shapiro and Brown, 2016). Finally, we will end with a discussion of possible applications of using the frequency of ADEs as well as future research directions and ways to extend our work.

11.2 BACKGROUND AND PREVIOUS WORK

This section will provide background information for understanding the pipeline, including ADEs, the concept of sentiment analysis for
text processing, and distributed computing through Apache Spark.

11.2.1 ADVERSE DRUG EVENTS

ADEs refer to any type of injury or harm caused by taking a drug for its intended medical purpose. Catching and monitoring ADEs is extremely important to the FDA to make sure drugs on the market are safe. For further clarification, ADEs can be divided into three types (Shapiro and Brown, 2016). An adverse drug reaction encompasses all unintended pharmacologic effects of a drug when it is administered correctly and used at recommended doses. A side effect is a predicted response to a drug. A medication error occurs when the wrong drug or wrong dose is administered to the patient. However, most of the research and studies into ADEs rely on voluntary self-reports either by the patient or by nurses and hospitals. One study focused on finding the incidence rate and preventability of ADEs in hospitals, but relied on doctor and nurse reports 0. The study found most ADEs were common and preventable, and most occurred due to the wrong ordering of the drug, such as an incorrect dosage. There has been research in automating identification of ADEs reported in hospital settings 0, but the ADEs still come from voluntary reports while missing out on users who do not visit hospitals or clinics. The pipeline described in this chapter is a new way to get more reports of ADEs.

11.2.2 SENTIMENT ANALYSIS USING N-GRAMS

Sentiment analysis, also known as opinion mining, from text has been a popular tool for extracting text features used in machine learning. Sentiment analysis can be used on n-gram features, which are sequences of letters or words in the text. n-Grams have been around for two decades. Cavnar and Trenkle first introduced the concept of n-grams for text categorization of documents 0. There are two types of n-grams: word grams and character grams. Word grams convert documents into token count sequences based upon different words in the document, while character grams break the document into sets of n-character sequences. The reasoning behind using n-characters is to be tolerant of errors in the text, especially with spelling. They were able to achieve a high accuracy of 80% in categorizing texts from news articles into groups. Using character n-grams is especially useful for Twitter, as tweets from users often have incorrect spelling as well as acronyms and short-hand words. n-Grams,
from unigrams, which is one word or letter, all the way to four grams, a sequence of four words or letters, are used in our work.

11.2.3 SENTIMENT ANALYSIS ON TWITTER FOR DRUG SIDE EFFECTS

Several previous works have explored mining Twitter for drug side effects 00; 00.

Jiang and Zheng (2013) extracted drug side effects with the use of MetaMap 0. Using five different drugs as their dataset, they developed a machine learning classifier to automate classification of tweets with drug-caused side effects, followed by extraction of drug side effects using MetaMap. They used user experience as the main feature for correct classification of the tweets.

Wu et al. (2015) focused on using opinion lexicons and subjective corpuses as features for classifying tweets. They first constructed a pipeline for extracting drug side effects from tweets, but focused only on a small sample size of four drugs. The features that were used in this approach were syntactic features, such as question marks and negations, as well as the sentiment scores from the different corpuses. For the four drugs, they were able to achieve an f-measure score of 79.5% using a support vector machine (SVM) as the machine learning classifier 0.

Yu et al.'s (2016) work took a different approach and focused instead on the cause-effect relations between the drug and the side effect. Tweets containing drugs that directly caused the side effect were the ones identified as positive. To extract this relation, n-grams were used as features. Lemmatization of the tweets was also used to reduce the noise of the text to allow for better n-gram features. Using unigram and bigram words, a 76.9% accuracy was achieved for a large sample of drugs.

The work of 0 further improved the techniques of the 0 approach. They continued to focus more on specifically capturing only tweets related to five different drugs. Their experiment results gave a better detection rate, five times more than the original, as well as simplifying the classification techniques.

The current approach in this chapter focuses on combining the techniques from the two earlier approaches 00 for further improvement. The sentiment features from the lexicons of the first approach 0 as well as the n-gram features of the second approach 0 are both used as features. Also, more machine learning classifiers are explored and combined to test the best combination of these features. In addition, our approach also uses MetaMap to extract drug side effects 0 to
calculate the frequency for further analysis and applications.

11.2.4 APACHE SPARK

Apache Spark is the distributed framework used in this chapter to process large datasets. It is a cluster computing system that has become widely used in the past decade. Spark is an improvement over Hadoop's MapReduce paradigm in terms of speed of batch processing 0. Spark distributes the workload over a cluster for distributed, parallel processing.

Apache Spark's core feature is the resilient distributed dataset (RDD), a read-only dataset distributed over the whole cluster. RDDs can be stored in memory for faster repeat batch processing instead of being stored on the system's hard disk. RDDs are also fault tolerant and can be used for the same tasks that Hadoop handles, such as mapping and reducing. Spark has an extensive set of supported tools, and its machine learning library is widely used and integrates well with the RDD paradigm.

Apache Spark is extremely useful when processing large datasets. In the work by 0, Spark is used to improve the speed of identification of potential drug targets to be studied in clinical trials. The original pipeline was changed to process the drug compounds in parallel by running the potential targets through multiple machine learning predictors and calculating a combined score as the identifying feature for the compound. The predictors gave a score for the compound based on how well the compound could target a protein, and this interaction was based on how well the compound's shape complemented the protein's shape. They partitioned their data into multiple chunks to process their dataset of compounds in parallel. The results of their work showed that the time for processing their large dataset decreased linearly with the number of nodes used in Spark. Similarly, in this chapter, Spark is used to process the large dataset by splitting the tweet dataset into chunks for parallel processing to improve pipeline speed.

11.3 DESIGN AND APPROACH

This section will go into detail on how the current improved pipeline works based on different machine learning classifiers and distributed computing. The pipeline should start by identifying whether a tweet contains a drug-caused side effect and at the end output an updated count of the different side effects reported for each drug. There are five parts to the pipeline, as shown in Figure 11.1. First, the tweets are mined and filtered. Then, the tweets
are preprocessed before features are extracted. Finally, the classifier uses the features to identify the drug side-effect-related tweets, and then the frequency of the side effects is extracted and updated. These steps are explained in the following subsections.

FIGURE 11.1 Pipeline for extracting frequency of drug side effects from Twitter.

11.3.1 MINING AND FILTERING TWITTER THROUGH LIVESTREAM

In the first step, tweets are mined from Twitter through a livestream. Tweepy, a Python library, was used to access the Twitter streaming API 0. The stream was mined for 9 days in December 2016. The tweets were then stored in a csv file for bulk processing. The stream was filtered for keywords containing drug names taken from the most popular drugs on the drugs website 0, totaling 462 different drug names. The drug names used were their most commonly used names instead of always using their scientific names. The dataset was further cleaned using other filters to remove spam and to narrow down the tweets we want, as follows:

• No retweets: only tweets from users who are self-reporting drug side effects were mined. Most of the retweets contained advertisements from pharmaceutical companies.
• Tweets from users with <10,000 followers: users with more were usually organizations or celebrities. Our target was the average consumer.
• Only English tweets were considered, for ease of text processing as well as natural language processing.

Filtering by drug names and the other filters over the 9 days returned a total of 486,689 tweets as the initial dataset that we need to train our machine learning classifier.
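The filter rules above can be sketched as a small Python predicate. This is a minimal illustration, not the code used in the study: the dict keys (text, followers, lang) are simplified stand-ins for the actual Twitter API fields, and the three drug names are placeholders for the full 462-name list.

```python
# Minimal sketch of the livestream filters; field names and the drug list
# are illustrative placeholders, not the actual Twitter API schema.
DRUG_NAMES = {"aspirin", "ibuprofen", "metformin"}  # stand-in for 462 names

def keep_tweet(tweet):
    text = tweet["text"]
    if text.startswith("RT"):        # no retweets (mostly drug-company ads)
        return False
    if tweet["followers"] >= 10000:  # keep average consumers, not organizations
        return False
    if tweet["lang"] != "en":        # English tweets only
        return False
    # keep only tweets that mention one of the tracked drug names
    words = {w.strip("#.,!?").lower() for w in text.split()}
    return bool(words & DRUG_NAMES)
```

In the actual pipeline, checks of this kind were applied to the Tweepy livestream before the surviving tweets were written to the csv file.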
11.3.2 DATA PREPROCESSING

In the second step of the pipeline, the tweets in the csv file were further preprocessed to reduce noise. The preprocessing steps that were used on the data included the following:

• Any tweet that started with "RT" was removed. The Twitter API does not completely filter out all retweets, so the tweets had to be checked a second time.
• All hashtag pound symbols and usernames were removed (hashtags remained).
• All nonalphanumeric characters and punctuation were removed to allow for easier text processing. The characters were all converted to lowercase as well.

All drug names in the tweet were replaced with the keyword "drug." Due to the different distribution of drug tweets in the dataset, normalization of the drug name was required to balance the dataset 0.

The words in the tweet were lemmatized. The Natural Language Toolkit 0 was used to lemmatize the words down to their base form to further reduce noise. The words in the tweet were tokenized and labeled with Part of Speech (POS) tags before lemmatization.

Stop words were not removed due to the small length of each tweet.

11.3.3 FEATURE EXTRACTION

After the data was preprocessed, the features were then extracted for classification using sentiment analysis, specifically with two separate methods. Previous works only used n-gram cause-and-effect relations 0 or opinion lexicons 0, but not both. This step of the pipeline uses both n-grams and lexicons as features to train the classifier.

For the n-gram classification, the experiment tested combinations of unigrams, bigrams, trigrams, and four grams. Both word and character n-grams were tested. The n-grams used do not have punctuation, and the words are all in their base form due to lemmatization. For the machine learning classifier to understand the features, the n-grams were converted into vectors based on term frequency and inverse document frequency (tfidf). The term frequency is how frequently (and thus how importantly) an n-gram appears in a document, in this case the document being the tweet. The inverse document frequency is how frequently the n-gram appears in different tweets in the whole dataset. Combining these two frequencies together gives us the tfidf statistic feature for how important the n-gram is relative to
the others. This is one of the features that was passed to the machine learning classifier.

For the opinion mining lexicon features, the number of words covered by any single lexicon is small. Thus, multiple lexicons were used to get better coverage of words. The experiment tested combinations of four different lexicons:

SentiWordNet 0: This lexicon assigns each word a positive, neutral, or negative score. SentiWordNet also uses POS tagging to distinguish between different forms of words 0.

AFINN 0: This lexicon rates each word with a sentiment score in the range [−5, +5].

MPQA 0: The Multi-Perspective Question Answering subjectivity lexicon rates each word as strong/weak positive or negative. In this experiment, we had a "strong" label be a magnitude of five while a "weak" label be a magnitude of one in the ratings.

Bing-Liu 0: This lexicon contains more slang words and jargon than the other lexicons. The lexicon splits the words into positive and negative lists, which in our experiment were given a score of positive one and negative one, respectively.

Each of the four lexicons assigns a number based on its sentiment grouping, which is used as a feature.

11.3.4 MACHINE LEARNING CLASSIFICATION

The features from sentiment analysis were used to train the machine learning classifiers through supervised learning. This requires a labeled training dataset to be run through the classifier, followed by validation and test sets to evaluate how well our classifier has learned. A total of 1000 tweets were manually labeled for the training dataset, with half of the tweets positively identified as having drug-caused side effects, while the other half were negatively identified as not having drug-caused side effects. This is to provide a balanced training dataset. A total of 1000 tweets were chosen as the dataset for comparison with results from previous works 0. Different combinations of n-gram and lexicon features were used to train the following different classifiers:

• Gaussian naïve Bayes (GNB): a simple classifier using Bayes' theorem that is usually used as the baseline
• Logistic regression (LGR): uses a logistic function as a base and is popular for binary classification
• SVM: attempts to separate data using a line (called a hyperplane) that maximizes the margin between the data group points
• Stochastic gradient descent (SGD): a linear classifier that attempts to find a minimum one sample at a time
• k-Nearest neighbor (kNN): attempts to separate the data into groups that form the nearest neighbors with each other
• Decision tree classifier (DTC): uses a tree structure where each data point goes through different decision paths that are used to classify it
• Random forest classifier (RFC): uses an ensemble of multiple decision tree classifiers to form the final classification
• Ensemble classifier (NB, LGR, SVM, SGD, kNN, DTC): a combination of multiple classifiers making the decision

The ensemble classifier is actually a combination of the first six classifiers taken together with either a majority vote (hard vote) or a prediction of probabilities (soft vote), as shown in Figure 11.2. The ensemble classifier provides a better overall predictive accuracy than any of the classifiers it uses by itself, and the use of ensemble classifiers has not been previously tested 0; 0. By tweaking the weights of the classifiers, the ensemble's best accuracy can be found.

FIGURE 11.2 Ensemble classifier with six classifiers and a soft or hard majority vote of the classifier predictions.
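The two voting schemes in Figure 11.2 can be illustrated with a short plain-Python sketch. The experiments themselves used library implementations of the six classifiers; here the votes are just 0/1 labels and probabilities, and the optional weights correspond to the weight tweaking mentioned above.

```python
# Illustrative sketch of hard and soft voting over the base classifiers;
# labels are 0 (no drug-caused side effect) or 1 (drug-caused side effect).
from collections import Counter

def hard_vote(predictions, weights=None):
    """Majority vote over the base classifiers' 0/1 predictions."""
    weights = weights or [1] * len(predictions)
    tally = Counter()
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return tally.most_common(1)[0][0]

def soft_vote(probabilities, weights=None):
    """Label from the weighted average of per-classifier P(positive)."""
    weights = weights or [1] * len(probabilities)
    average = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return 1 if average >= 0.5 else 0
```

For example, six classifiers predicting [1, 1, 0, 1, 0, 1] yield a hard vote of 1, while soft voting averages the classifiers' predicted probabilities before thresholding, so a few confident classifiers can outvote several uncertain ones.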
11.3.5 FREQUENCY EXTRACTION

After training the classifier and identifying the drug side-effect-related tweets, a frequency count is taken. To get the frequency, the text extraction of drug side effects in each tweet is done by MetaMap. MetaMap is a tool for recognizing medical concepts from the Unified Medical Language System (UMLS) 0; 0. Currently, there are 15 different semantic group types in MetaMap, each with multiple subcategories, with no change from what was used in the previous work 0, as shown in Table 11.1. MetaMap extracts medical text from the tweet and maps it to a UMLS medical term with a certain confidence. In experimenting on the pipeline, a confidence of 850 out of 1000 was set as the lower bound for accepting a mapping by MetaMap. Once extracted, the side effects were then grouped by each drug for analysis. The most common side effects as well as rare side effects could then be observed.

TABLE 11.1 MetaMap Semantic Groups and Abbreviations

Abbreviation  MetaMap Group
ACTI          Activities and Behaviors
ANAT          Anatomy
CHEM          Chemicals and Drugs
CONC          Concepts and Ideas
DEVI          Devices
DISO          Disorders
GENE          Genes and Molecular Sequences
GEOG          Geographic Areas
LIVB          Living Beings
OBJC          Objects
OCCU          Occupations
ORGA          Organizations
PHEN          Phenomena
PHYS          Physiology
PROC          Procedures

11.3.6 LARGE DATASET PROCESSING WITH SPARK

After creating the pipeline initially in a Python environment through Scikit-learn, a Spark pipeline was then created to process large datasets, as shown in Figure 11.3. Spark's RDD-based distributed framework allows for parallel processing of all the tweets 0.

For Spark, we first trained Spark's classifier using the same feature sets used in the original pipeline. Next, the Spark pipeline was implemented in the following steps:

• The large input dataset RDD was partitioned and mapped out to the nodes. The classifier on each node identified if the tweet contained a drug-caused side effect.
• The positively identified tweets were then reduced back into an RDD containing all the tweets with drug-caused side effects.
• The tweets were then labeled with a key that is the drug name associated with the tweet.
• Frequency extraction with MetaMap was then run on the RDD, and the frequency counts for each side effect were returned as (side effect, count) pairs.
• The (side effect, count) pairs were then reduced back into one RDD and outputted to a data text file.

FIGURE 11.3 Apache Spark pipeline with 12 cores for distributed processing. RDDs are stored in memory between each step. Spark assigns tasks by automatically partitioning at each step: data preprocessing (DP), feature extraction (FE), SVM classifier, and frequency extraction with MetaMap.

11.4 EXPERIMENT SETUP

To test the pipeline to determine which features and classifiers work the best, an experiment (Hsu, 2017) was set up where two separate pipelines were constructed: one for testing the different machine learning classifiers that does not use parallelism (as shown in Figure 11.1) and the other for testing Apache Spark on large datasets that does use parallelism (as shown in Figure 11.3). The pipelines were then compared for speed from the starting point of the Twitter dataset to the final output of the side effect frequencies.

11.4.1 PIPELINE SETUP

For the experiment pipeline in Figure 11.1, the goal was to test which set of features as well as which machine learning classifier performed the best. This section describes how the Scikit-learn pipeline was set up.

In the initial stream through Tweepy, 486,689 tweets in total were mined over 9 days using the filters mentioned in Section 11.3.1. After removing the retweets not caught by the filter, duplicate tweets were removed. Using the sequence matcher from Python's difflib, all tweets that had 0.6 similarity or above were removed, leaving 226,834 tweets as our dataset. Using regular
expressions (Regex), the tweets were preprocessed using the steps shown in Section 11.3.2. NLTK was then used to lemmatize each tweet further to remove noise.

Next, sentiment scores were extracted from each of the four lexicons. For each lexicon, the sum of the sentiment scores for each word in the tweet was calculated as the feature. The sentiment scores were then categorized using a one-hot encoder to provide better feature weight against the n-gram features. For the n-gram features, Scikit-learn's tfidf vectorizer was used to create unigrams through four grams 0. After the n-grams were created, they were then converted to feature vectors based on the tfidf frequencies.

The tweet's extracted features were then run through the machine learning classifier and were classified as having drug-related side effects or not. If a tweet was positively identified, it was then passed to MetaMap, and the side effects extracted by MetaMap were then stored in a dictionary for the drug along with their counts. At the end, the frequency of the side effects for each drug was then outputted.

11.4.2 SPARK SETUP

To test the benefits of distributed computing, a Spark pipeline, as shown in Figure 11.3, was created using Spark's machine learning library MLlib 0. Spark supports Scala, Java, and Python, and for the experiment, PySpark (Python) was used for preprocessing, FE, and machine learning classification, and the tools used were the same as in the Scikit-learn pipeline 0. NLTK was used for DP and the sentiment score features, while MLlib's vectorizers were used to extract n-gram features as well as for one-hot encoding of the sentiment scores. For testing the Spark pipeline speed, SVM was used as the comparison between the Scikit-learn pipeline and the Spark pipeline 0. The output tweets identified by Spark's classifier were then stored as a permanent RDD in memory with the persist function. The RDD was then passed through the Java API of MetaMap for side effect mapping, and the output was then collected and reduced to get the frequency output of the side effects reported for each drug.

For splitting up the dataset, a Spark configuration of two nodes running on two virtual machines was implemented to allow for parallel processing. The dataset was partitioned automatically over the two nodes, which had a combined total of 12 cores, giving 12 partitions of approximately 18,902 tweets per partition. A map to the two nodes was called to allow Spark to run the predictions in parallel, but an
TABLE 11.2 Classifier Accuracies (f-Measure Score Weighted) for Different Combinations of Features with the Best for Each in Bold

Features                             SVC     GNB     LGR     SGD     kNN     DTC     RFC     Ens.(Soft)  Ens.(Hard)
f1+f2 (char_wb)                      0.6202  0.5080  0.6094  0.5955  0.5273  0.5080  0.5009  0.5389      0.6136
f1+f2+f3 (word)                      0.6280  0.6036  0.6096  0.5599  0.5670  0.5036  0.5663  0.5692      0.6066
f1+f2+f3+f4 (char_wb)                0.6532  0.6262  0.6352  0.5767  0.6311  0.6449  0.5338  0.6710      0.6686
f1+f2+f3+f4+f5 (char_wb)             0.6792  0.6768  0.6899  0.5734  0.6576  0.6131  0.6446  0.6468      0.7128
f2+f3+f4+f5+f6+f7 (word)             0.7229  0.6827  0.7099  0.6913  0.7036  0.6746  0.6291  0.7097      0.7449
f1+f2+f5+f6+f7 (char_wb)             0.7186  0.6460  0.7260  0.6311  0.7332  0.7467  0.6873  0.6671      0.7467
f1+f2+f3+f4+f5+f6+f7 (char_wb)       0.7392  0.7219  0.7347  0.6797  0.7032  0.6825  0.6174  0.7568      0.7760
f1+f2+f3+f4+f5+f6+f7+f8 (char_wb)    0.7028  0.6973  0.7147  0.6907  0.6643  0.5939  0.6881  0.7219      0.6992

f1, unigram; f2, bigram; f3, trigram; f4, four grams; f5, SentiWordNet; f6, AFINN; f7, MPQA; f8, Bing-Liu.
Drug Side Effect Frequency Mining over a Large Twitter Dataset 247

inner map call was used to allow the predictions to occur on each node sequentially. The predicted tweets were then run through MetaMap on another Spark job, due to MetaMap being supported only with a Java API. The side effect counts were extracted before being merged together into an output text file.

11.5 RESULTS

In the following subsections, results from experiments on the pipeline are presented for the accuracy, processing speed-up, and frequency of drug side effects.

11.5.1 ACCURACY

For testing the Scikit-learn pipeline, a fivefold cross validation was used on different combinations of features. The weighted f1 score was then calculated for each of the machine learning classifiers for comparison, as shown in Table 11.2. The experiment with unigram and bigram features was used as a baseline for comparison with the other features.

The best classifier was the ensemble classifier with hard voting, with an f1 measure score of 0.7760. Different weights were tested for the ensemble classifier, and the optimal weights were double-weighted for both SVM and LGR compared with the other four classifiers. RFC was excluded from the ensemble classifier as RFC itself is an ensemble classifier. Using Yu et al.'s (2016) work as a baseline, with their best f1 score of 0.7690 with SVM, our ensemble classifier had a small improvement. The best nonensemble classifier was the DTC, with an f1 measure score of 0.7467, which was still a small improvement over the previous work's decision tree classifier f1 score of 0.7447.

The best features to use were all four n-grams from unigram to four-gram plus three of the lexicons: SentiWordNet, AFINN, and MPQA. The trend of the data shows that more features give better accuracy up to a certain point. Adding the final lexicon, Bing Liu, as a feature gave a lower accuracy, which is most likely caused by overfitting.

11.5.2 PIPELINE SPEED COMPARISON

Next, the speed of the Scikit-learn pipeline (shown in Figure 11.1) and the Apache Spark pipeline (shown in Figure 11.3), both using the SVM classifier, was compared. From the dataset, 200,000 tweets were run through both pipelines and the time was recorded upon completion, as shown in Table 11.3.
TABLE 11.3 Total Time to Extract Frequency of Drug Side Effects for Both Pipelines

Pipeline        Total Time (min)
Scikit-learn    257.88
Apache Spark    105.63

Spark was faster than the Scikit-learn pipeline by around 2.5 times, due to Spark's parallel processing capabilities.

11.5.3 FREQUENCY OF DRUG SIDE EFFECTS

The pipeline outputs the frequency of drug side effects. For the experiment, out of the 200,000 tweets, 78,242 tweets were predicted as tweets containing drug side effects. Table 11.4 shows the top ten drugs with the most side effects reported.

TABLE 11.4 Top 10 Most Reported Drugs by Twitter Users

Drug        No. of Tweets Predicted    No. of Side Effects Reported
Xanax       12,081                     27,289
Adderall    7958                       16,906
Ibuprofen   7822                       16,050
Melatonin   5873                       14,274
Benadryl    5259                       13,708
Tylenol     5263                       13,469
Insulin     5070                       12,248
Nicotine    4819                       11,763
Aspirin     3185                       7638
Morphine    3028                       7223

To further investigate these drugs and their side effects, and to compare with the previous work, the top five drugs (plus two extras for comparison) are analyzed in Table 11.5, each with their five most reported negative side effects. These were manually examined and extracted from the list of side effects to remove side effects that were alleviated by the drug and those not caused by the drug. Each of the top five side effects was manually checked to make sure the drug did cause the side effect in their respective tweets.

Most of the side effects were from the MetaMap semantic groups "Disorder" and "Physiology." Note that the side effects reported do not consider whether the side effects were directly caused by the drug. The predicted tweets based on the training dataset were geared more toward false positives, as missing side effects were considered more detrimental than over-reporting. MetaMap also had problems in extracting side effects due to catching all medical terms, thus requiring the filter of the semantic groups.
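The per-drug tallies behind Table 11.4 have a natural map/reduce shape: each partition counts its own (drug, side effect) pairs and the partial counts are then merged, which is what the Spark jobs in Section 11.4.2 express with their map and reduce steps. The sketch below shows that logic in pure Python so it runs without a Spark cluster; the sample records and two-partition split are invented for illustration.

```python
from collections import Counter
from functools import reduce

# (drug, side effect) pairs as MetaMap might emit them for predicted tweets,
# split into two partitions as in the two-node Spark setup.
partitions = [
    [("xanax", "drowsiness"), ("xanax", "withdrawal"), ("adderall", "insomnia")],
    [("xanax", "drowsiness"), ("adderall", "insomnia"), ("xanax", "blackout")],
]

def count_partition(pairs):
    """Map step: tally the (drug, side effect) pairs within one partition."""
    return Counter(pairs)

def merge_counts(left, right):
    """Reduce step: merge two partial tallies into one."""
    return left + right

totals = reduce(merge_counts, map(count_partition, partitions))
print(totals[("xanax", "drowsiness")])   # 2
```

In PySpark the same shape would typically be written with `rdd.mapPartitions(...)` followed by `reduceByKey(...)` over the per-partition counts.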
TABLE 11.5 Frequency of Side Effects Reported, Showing the Top Five Reported Side Effects per Drug with Number Reported

Drug Name   Drug Use              Side Effect 1           Side Effect 2            Side Effect 3               Side Effect 4         Side Effect 5
Xanax       Anxiety               Drowsiness/sleep (291)  Abnormally high (76)     Addictive behavior (66)     Blackout (13)         Withdrawal (9)
Adderall    ADHD                  Emotions (122)          Addictive behavior (29)  Insomnia (26)               Tired (17)            Binge eating disorder (16)
Ibuprofen   Fever, headache/pain  Emotions (169)          Drowsiness/sleep (126)   Binge eating disorder (17)  Abnormally high (16)  Allergic reaction (14)
Melatonin   Insomnia              Emotions (89)           Nightmares (25)          Binge eating disorder (21)  Weight loss (11)      Anxiety (7)
Benadryl    Allergy               Drowsiness (107)        Tiredness (23)           Dry throat (13)             Nausea (3)            Dizziness (2)
Vyvanse     ADHD                  Emotions (25)           Abnormally high (12)     Weight loss (6)             Short breath (6)      Chest pain (2)
Gabapentin  Seizure/pain          Emotions (6)            Insomnia (5)             Hot flushes (2)             Confusion (2)         Dryness (1)

Another problem with MetaMap was that side effects not caused by the drug within the tweet were also extracted along with the actual drug-caused side effects. The side effects extracted by MetaMap had to be manually examined to remove noncaused side effects, especially when analyzing the side effects shown in Table 11.5 as well as in the analysis of multiple drug interactions in Section 11.6. This is further explained in Section 11.5.5.
TABLE 11.6 Side Effects and Number Reported for Xanax in Subcategory "Sign and Symptom" with Keys for Reference

Side Effect      Number Reported    Side Effect            Number Reported
Chills (Relax)   98                 Unwanted Hair          3
Spells           39                 Agitation              3
Malaise          21                 Sleeplessness          3
Blackout         13                 Muscle Twitch          3
Catch            13                 Pruritus               3
Halitosis        11                 Sighing Respiration    3
Tired            8                  Clumsiness             3
Hunger           6                  Headache               2
Blurred Vision   6                  Nausea                 2
Muscle Cramp     5                  Memory Loss            2
Earache          4                  Drooling               2
Withdrawal       4                  Seizures               2
Vomiting         3                  Other                  24
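Restricting MetaMap output to the semantic groups of interest, as done here for "Disorder" and "Physiology," is in essence a membership test over each extracted concept's group. A minimal sketch of that filter, with invented concept records rather than real MetaMap output:

```python
# MetaMap-style concept records: (concept name, semantic group).
# Both the records and the group labels are illustrative.
concepts = [
    ("drowsiness", "Disorder"),
    ("aspirin", "Chemicals & Drugs"),
    ("blackout", "Disorder"),
]

# Only these semantic groups are kept as candidate side effects.
KEEP_GROUPS = {"Disorder", "Physiology"}

side_effects = [name for name, group in concepts if group in KEEP_GROUPS]
print(side_effects)   # ['drowsiness', 'blackout']
```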
Table 11.6 shows a further examination of Xanax, the drug with the most reports out of the predicted tweets. Within the semantic group "Disorder" (referring to Table 11.1), Table 11.6 shows all side effects in the subcategory "Sign or Symptom," including side effects that Xanax is supposed to alleviate. There were other side effects in other subcategories of "Disorder," such as "Finding" or "Mental or Behavioral Dysfunction," that are not shown here, such as the side effect "Abnormally High." In relation to Table 11.5, the only side effect with a matching number of reports was "Blackout." Other side effects were not caught under the category "Disorder," and the side effect "Withdrawal" had reports in multiple categories, with "Disorder" only catching four of them.

The reported frequencies of side effects included both those caused by Xanax as well as side effects caught in the tweet that were not caused by the drug. For example, the side effect for relaxation, "chills," was attributed to Xanax despite not being a negative side effect, because it was mentioned alongside an actual negative side effect in the same tweet. "Chills" in the tweets was considered a positive side effect, as people who take Xanax are using it to relax without anxiety, but MetaMap in this case caught this side effect as well. Thus, both side effects were extracted. These are also discussed in Section 11.5.5.

The pipeline was able to output the frequency of drug-caused side effects for all 462 drugs, such as with Xanax, showing both commonly and uncommonly reported side effects, which can be compared with Xanax's known side effects from medical sources.
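Comparing the mined side effects for a drug against a reference list of its known side effects reduces to simple set operations. A sketch with invented lists (in practice the reference set would come from a medical source):

```python
# Side effects mined for one drug versus a hypothetical reference list.
mined = {"drowsiness", "withdrawal", "blackout", "chills"}
known = {"drowsiness", "withdrawal", "memory loss"}

confirmed = mined & known   # corroborated by the reference list
novel = mined - known       # candidates worth a closer look
missing = known - mined     # known effects not seen in the tweets

print(sorted(novel))        # ['blackout', 'chills']
```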
11.5.4 PHARMACEUTICAL ANALYSIS

Based on the results from the pipeline, a domain expert provided some analysis and insight on the different drug side effects and their frequency. The following drugs are discussed based on information from Medscape (WebMD LLC, 2019) and Medline Plus (Medline Plus, 2017).

It is not surprising that Xanax, a benzodiazepine utilized for anxiety, causes drowsiness as a side effect. Its rapid action on the GABA receptors in the limbic system and reticular formation also explains how the dopamine surge in response contributes to addictive behavior, abnormal highs, and withdrawal symptoms. Overdoses of benzodiazepines, especially in the elderly, or use in combination with alcohol or other central nervous system depressants, can cause anterograde amnesia and loss of consciousness.

Adderall is a stimulant used in the treatment of ADHD. Adderall is composed of two amphetamine salts, dextroamphetamine and amphetamine, which are controlled substances due to being highly addictive; thus, it is not surprising that they cause addictive behavior. Similarly, Vyvanse is also an amphetamine salt, lisdexamfetamine, and is also highly addictive. Both Adderall and Vyvanse cause irritability, anxiety, and emotional lability, a.k.a. mood swings. This may occasionally cause those taking Vyvanse to feel high.

Adderall causes insomnia due to its very nature as a stimulant that keeps people awake, similar to caffeine. Tiredness is a side effect of Adderall in the sense that once the stimulant effect wears off, the patient will crash. Binge eating is a paradoxical side effect of Adderall, as a stimulant usually suppresses the appetite and causes weight loss, as seen with Vyvanse. Chest pain and shortness of breath are serious but rare side effects seen with Vyvanse, often necessitating emergency medical care.

Ibuprofen is a nonsteroidal anti-inflammatory drug. There is no past precedent of it causing emotional complications, abnormal highs, drowsiness, or binge eating disorders, although allergic rashes up to and including Stevens–Johnson syndrome and toxic epidermal necrolysis have been reported in the literature.

Melatonin is a natural supplement used over the counter for the treatment of insomnia. In the body, it is a hormone produced by the pineal gland. It has been shown to cause irritability, anxiety, and nightmares, but not to cause binge eating disorder and weight loss.

Benadryl is a first-generation antihistamine utilized not only for allergies but very commonly as a sleep aid. Thus, it causes dizziness,
drowsiness and fatigue, and as it also has strong anticholinergic effects, it causes dry mouth and throat as well. Nausea is not defined as a side effect; in fact, Benadryl is used in combination to treat motion sickness.

Gabapentin is utilized for seizure and neuropathic pain. Hot flushes, dryness, confusion, insomnia, and irritability as well as depression have all been reported, as well as other emotions. Insomnia has especially been reported upon discontinuation of the drug, along with anxiety.

11.5.5 CHALLENGES AND LIMITATIONS

There have been challenges and limitations to the pipeline concerning the extraction of drug-caused side effects.

First, all of the subcategories for the MetaMap group "Disorder" had to be used, as leaving out any subcategories might cause side effects to be missed. Secondly, MetaMap extracts all side effects from the tweets, both those caused and those not caused by the drug, thus also requiring manual examination to identify the drug-caused side effect. However, an external dataset containing all possible side effects that are alleviated by the drug can be used to remove some of these extra noncaused side effects.

For example, "Chills," the most reported side effect of Xanax, in context means "to relax," but to MetaMap, the concept means "shivers." Thus, extracting negative side effects of the drugs required both reducing by MetaMap category and manual examination, to correctly identify which side effects were negative. Furthermore, each tweet usually contained more than one side effect besides the negative side effect caused by the drug, requiring further manual examination to determine which side effect within the tweet is the one caused by the drug.

Other complications include tweets with multiple drugs, as associating the side effect with the correct drug(s) requires manual examination as well. It is not known whether the side effects in these cases are caused by one of the drugs, both drugs, or some form of interaction between the drugs. These lead to the preliminary work in Section 11.6.

11.6 NEXT STEPS: APPLICATIONS OF DRUG SIDE EFFECT FREQUENCY ANALYSIS

Using the frequency extracted from the proposed pipeline, one can make some observations on the most common side effects as well as rare side effects reported. One can also observe the side effects that may be
caused by two or more drugs taken together; some may be side effects caused by rare drug pairs that might be potentially dangerous. The following results required manual examination to remove side effects that were not caused by the drug or were alleviated by the drug, as well as any other side effect that was incorrectly reported. This required going through the tweets manually to make sure the side effect was caused by the drug(s). Some preliminary studies and observations are reported in the following subsections.

11.6.1 MOST FREQUENTLY REPORTED SIDE EFFECTS

The top three side effects were drowsiness/tiredness, emotions, and being abnormally high. "Drowsiness" is considered a mild side effect that affects most people, thus being commonly reported. People who have reported being emotional can be inferred as being more likely to share their emotions on Twitter, which is probably the cause of the large number of reports. Finally, "abnormally high" was largely reported because of the large number of tweets related to drugs that cause this side effect, most notably Xanax and other drugs used for anxiety.

An example of a rare side effect that was less reported but was seen in all the top 10 drugs was nausea. For example, Adderall and Vyvanse are known to cause nausea, as they are stimulants that suppress the appetite. Ibuprofen is also known to cause nausea, especially when taken without food, as it is a nonsteroidal anti-inflammatory drug and its effects on inhibiting COX also inhibit stomach protective enzymes. Xanax and Gabapentin cause nausea. Melatonin causes abdominal cramps and would be associated with nausea. Tylenol is not listed as causing nausea but can cause gastrointestinal hemorrhage, so this is new information that would be forwarded to FAERS. Also, Benadryl had two reports of nausea, which would also be forwarded to FAERS.

11.6.2 SIDE EFFECTS CAUSED BY MORE THAN ONE DRUG

Next, predicted tweets where more than one drug was used were examined. Having multiple drugs makes it hard to correctly identify which side effect is caused by which drug. Out of the predicted tweets, 2678 contained more than one drug. Table 11.7 lists the top five drugs that were mentioned most in these tweets containing two or more drugs.
TABLE 11.7 List of Top Five Drugs Mentioned with Other Drugs in a Tweet and Top Two Side Effects

Drug        Tweets    Side Effect 1      Side Effect 2
Tylenol     274       Emotions (6)       Drowsiness (3)
Xanax       203       Addiction (4)      Drowsiness (4)
Ibuprofen   196       Drowsiness (4)     Allergic (2)
Adderall    50        Addiction (3)      High (2)
Benadryl    99        Drowsiness (12)    Insomnia (5)
Most of the tweets with multiple drugs did not specify which of the drugs caused the side effect. Also, some of the tweets focused on one of the drugs not working or causing a side effect that required the second drug (or even a third) to solve their problem.
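Tallying how often each drug pair co-occurs with each side effect, which underlies Tables 11.8 and 11.9, can be sketched as follows; the sample tweets are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Per multi-drug tweet: the set of drugs mentioned and the extracted effects.
tweets = [
    ({"adderall", "vyvanse"}, ["insomnia"]),
    ({"adderall", "vyvanse"}, ["insomnia"]),
    ({"adderall", "xanax"}, ["insomnia"]),
]

pair_counts = Counter()
for drugs, effects in tweets:
    # Sorting makes each unordered drug pair map to one canonical key.
    for pair in combinations(sorted(drugs), 2):
        for effect in effects:
            pair_counts[(pair, effect)] += 1

print(pair_counts[(("adderall", "vyvanse"), "insomnia")])   # 2
```

Note that this only counts co-occurrence; as the text stresses, deciding which drug (or which interaction) actually caused the effect still required manual examination.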

11.6.3 SIDE EFFECTS CAUSED BY THE MOST POPULAR DRUG PAIRS

As shown in Table 11.8, most of the tweets with multiple drugs focused on competing drugs. Ibuprofen, also known as Advil or Motrin, competes with acetaminophen (Tylenol) for relieving pain and headaches. The "emotions" side effect related mostly to anger caused by the ineffectiveness of Ibuprofen or Tylenol at alleviating the pain.

Another pair of drugs, Adderall and Vyvanse, used for attention deficit hyperactivity disorder (ADHD), unfortunately caused insomnia, as the drugs providing focus also stopped the users from sleeping. The same conclusion can be made for Adderall (for ADHD) and Xanax (for anxiety), which caused insomnia in those patients who really need to sleep as well. Although Xanax usually causes drowsiness, paradoxically, it can also cause insomnia, especially at higher doses (WebMD LLC, 2019).

TABLE 11.8 List of Top Five Most Mentioned Drug Pairs with Top Side Effect

Drug Pair             Tweets    Side Effect
Ibuprofen, Tylenol    131       Emotions (2)
Adderall, Vyvanse     54        Insomnia (2)
Adderall, Xanax       36        Insomnia (2)
Mucinex, Tylenol      25        Drowsiness (2)
Benadryl, Melatonin   23        Drowsiness (4)
Finally, another example, Benadryl and Melatonin, had the same common side effect of drowsiness: people who took Benadryl, used to relieve allergies, usually became drowsy, and they wanted an extra Melatonin, used as a sleeping pill, for extra effect to fall asleep at night. Without manual examination, it would have been hard to figure out whether the side effect was caused by multiple drugs or just one of the drugs in the tweet, and this is something that future works might improve on.

TABLE 11.9 Side Effects of Drug Pairs Not Commonly Associated with Either Drug

Drug Pair            Side Effect        Side Effect Count
Klonopin, Zoloft     Emotions           1
Tylenol, Ativan      Abnormally high    1
Adderall, Benadryl   Drowsiness         2

11.6.4 POTENTIAL DANGER: SIDE EFFECTS ASSOCIATED WITH UNCOMMON DRUG PAIRS

Finally, there was an examination of the side effects that were rare and not usually associated with a certain drug, due to taking a combination of drugs. As seen in Table 11.9, three different pairs are shown that had side effects that were considered rare and abnormal for both drugs when taken together.

As the first example, a user took Klonopin to treat his anxiety, but at the same time it caused him to feel depression. He then took Zoloft for the depression, and instead began to feel emotional, as shown in the tweet:

    yeah klonopin make it so my depression be way more evident but when I try take zoloft w/it [messed with] me and make me manic so idk

Trying to treat both depression and anxiety with this drug pair made him feel "manic" and crazy. From this tweet, the fact that the patient mentioned he was going "manic" indicates that he is treating his bipolar depression with Zoloft (sertraline), which he should not be doing. Bipolar depression should be treated with a mood stabilizer and an atypical antipsychotic, or a mood stabilizer in combination with an antidepressant, and not solely an antidepressant: treating bipolar depression with an antidepressant will cause a switch from the depressive phase into the manic phase, thus possibly precipitating a manic event and the need for hospitalization, per Medscape (WebMD LLC, 2019).

In another pair, Tylenol (used for headaches) and Ativan (used to treat seizures) caused the user to feel "high," which is not a particularly uncommon side effect for Ativan, as it is often abused for this very
purpose due to its rapid onset of action. Tylenol has no drug–drug interaction with Ativan in this case; it is merely mentioned in the tweet due to the user's unfamiliarity with the medications:

    so apparently mixing tylenol and ativan makes you extremely high

In the last example, Adderall is used to treat ADHD and is used for focus, but Benadryl made the user fall asleep instead of remaining focused. The tweet, shown below, shows the user saying that they became drowsy, favoring the side effect of Benadryl (sleepy) instead of Adderall (insomnia, focused). Adderall typically causes insomnia only when used near bedtime, and Benadryl causes drowsiness regardless of the time of day it is used:

    felt a stuffy so I took a benadryl with my coffee and adderall. I'll be fallin asleep and an inch from death today

From the above examples, we see that finding uncommon side effects from a combination of drugs is important and can be expanded on further in the future.

11.7 CONCLUSION AND FUTURE WORK

Mining the frequency of adverse drug side effects is important for finding side effects that are more common as well as those that are rare but potentially dangerous. In this chapter, an improvement on previous pipelines for extracting drug side effects from Twitter was discussed. A pipeline was created to first identify tweets that contained drug-caused side effects, followed by extracting the frequency of those side effects. An increase in the accuracy of the classifier compared to previous works was achieved. Finally, the pipeline was also implemented in Apache Spark to improve the speed of extraction as well as for processing large datasets.

Future work for this pipeline includes extending the preliminary study of applications of frequency analysis of drug side effects described in this chapter. More studies would be beneficial for finding side effects of concurrently taking two or more drugs, and the proposed Apache Spark-based pipeline may further contribute in this direction.

There are also challenges and limitations of the experiments and analysis, and the work may be extended to address and overcome them by involving domain experts and improving the machine learning methods. For example, a domain expert has advised that it is important to split the analysis of adverse drug effects into side effects, medication errors, and adverse drug reactions and to categorize them correctly. There
may also be drug interactions. In addition, it is wise to properly identify which side effects come from which drug.

In addition, the following may be applied to technically improve the proposed pipelines and experiments. First, the pipeline could be fed live streams, allowing for constant updates on drug side effects over a certain time period. Next, implementation of our Scikit-learn ensemble classifier in Apache Spark (currently unsupported) can be done to take advantage of distributed processing with the majority vote classifier. More nodes can be added to Apache Spark to speed up the pipeline even further. Also, more tweets with different drug names can be added to the training dataset, because those tweets would contain even more different side effects to further improve our classifier accuracy. Furthermore, for the side effects, a dictionary can be made to remove side effects that are alleviated by the drug instead of being marked as caused by the drug. Tweets with multiple drugs can also be tested specifically to see that the side effect corresponds to the correct drug. Finally, the frequency output of the pipeline can be used to compare with FAERS to see if there are any common side effects that have not been reported to the FDA.

KEYWORDS

• classification
• machine learning
• sentiment analysis
• opinion mining
• Apache Spark
• Twitter
• natural language processing
• supervised learning
• adverse drug event

REFERENCES

Agarwal, A.; Xie, B.; Vovsha, I.; Rambow, O.; Passonneau, R. "Sentiment analysis of Twitter data." In Proceedings of the Workshop on Languages in Social Media (LSM '11). Association for Computational Linguistics: Stroudsburg, PA, USA. 2011. 30–38.
Aronson, A. R. "Effective mapping of biomedical text to the UMLS metathesaurus: The MetaMap program." In Proceedings of the AMIA Symposium. 2001. 7–21.
Baccianella, S.; Esuli, A.; Sebastiani, F. "SENTIWORDNET 3.0: An enhanced lexical resource for sentiment analysis and opinion mining." In Proceedings of the LREC Conference. 2015.
Banerjee, R.; Ramakrishnan, I. V.; Henry, M.; Perciavalle, M. "Patient centered identification, attribution, and ranking of adverse drug events." In Proceedings of the International Conference on Healthcare Informatics. Dallas, TX, USA. 2015. 18–27.
Bates, D.; Cullen, D.; Laird, N.; Petersen, L.; Small, S.; Servi, D. et al. "Incidence of adverse drug events and potential
adverse drug events: Implications for prevention." JAMA. 1995; 274(1): 29–34.
Bodenreider, O.; Hole, W. T.; Humphreys, B. L.; Roth, L. A.; Srinivasan, S. "Customizing the UMLS metathesaurus for your applications." In Proceedings of the AMIA Symposium. November 2002.
Burges, C. "A tutorial on support vector machines for pattern recognition." Data Mining and Knowledge Discovery. 1998; 2: 121–167.
Cavnar, W. B.; Trenkle, J. M. "N-gram-based text categorization." In Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval, SDAIR-94. Las Vegas, NV, USA. 1994. 161–175.
Deng, L.; Wiebe, J. "MPQA 3.0: An entity/event-level sentiment corpus." In Proceedings of NAACL-HLT. 2015.
Drugs.com. "Popular Drugs," from Drug Index A to Z. https://www.drugs.com/drug_information.html (accessed December 14, 2016).
FDA Adverse Event Reporting System (FAERS). https://www.fda.gov/drugs/surveillance/fda-adverse-event-reporting-system-faers (accessed August 12, 2018).
Harnie, D.; Vapirev, A. E.; Wegner, J. K.; Gedich, A.; Steijaert, M.; Wuyts, R.; Meuter, W. D. "Scaling machine learning for target prediction in drug discovery using Apache Spark." In Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Shenzhen. 2015. 871–879.
Hsu, D.; Moh, M.; Moh, T.-S. "Mining frequency of drug side effects over a large twitter dataset using Apache Spark." In Proceedings of the 9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM. Sydney, Australia. July 2017. 915–924.
Jiang, K.; Zheng, Y. "Mining Twitter data for potential drug effects." In Advanced Data Mining and Applications. Springer: Berlin, Heidelberg. 2013. 434–443.
Liu, B. "Sentiment Analysis: Mining Opinions, Sentiments, and Emotions." Cambridge University Press. 2015. https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html (accessed December 21, 2016).
Medline Plus Database. https://medlineplus.gov/ (accessed December 13, 2017).
Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D. et al. "MLlib: Machine learning in Apache Spark." J. Mach. Learn. Res. 2016; 17(1): 1235–1241.
Nielsen, F. Å. "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs." In Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big Things Come in Small Packages, CEUR Workshop Proceedings Vol. 718. May 2011. 93–98.
NLTK (Natural Language Toolkit). www.nltk.org (accessed December 15, 2016).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O. et al. "Scikit-learn: Machine learning in Python." J. Mach. Learn. Res. 2011; 12: 2825–2830.
Peng, Y.; Moh, M.; Moh, T.-S. "Efficient adverse drug event extraction using Twitter sentiment analysis." In Proceedings of the 8th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM. San Francisco, California. August 2016. 1101–1018.
Pyspark. Spark Python API. http://spark.apache.org/docs/latest/api/python/index.html (accessed December 21, 2016).
Roesslein, J. Tweepy (an easy-to-use Python library for accessing the Twitter API). http://www.tweepy.org (accessed August 12, 2019).
Shapiro, K.; Brown, S. Rx Prep Course Book. 2016. 100–123.
Tabassum, N.; Ahmed, T. "A theoretical study on classifier ensemble methods and
its applications." In Proceedings of the 3rd International Conference on Computing for Sustainable Global Development (INDIACom). New Delhi. 2016. 374–378.
Toutanova, K.; Klein, D.; Manning, C. D.; Singer, Y. "Feature-rich part-of-speech tagging with a cyclic dependency network." In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL '03, Vol. 1. Association for Computational Linguistics: Stroudsburg, PA, USA. 2003. 173–180.
WebMD LLC. Medscape database. 2019. https://reference.medscape.com/ (accessed August 13, 2019).
Wu, L.; Moh, T.-S.; Khuri, N. "Twitter opinion mining for adverse drug reactions." In Proceedings of the IEEE International Conference on Big Data (BigData). Santa Clara, California. October 2015. 1570–1574.
Yu, F.; Moh, M.; Moh, T.-S. "Towards extracting drug-effect relation from twitter: A supervised learning approach." In Proceedings of the IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). New York, NY. 2016. 339–344.
Zaharia, M.; Chowdhury, M.; Das, T.; Dave, A.; Ma, J.; McCauley, M.; Franklin, M. J.; Shenker, S.; Stoica, I. "Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing." In Proceedings of NSDI '12. April 2012.
CHAPTER 12

DEEP LEARNING IN BRAIN SEGMENTATION

HAO-YU YANG*
CuraCloud Corporation, Seattle, WA, USA
*Corresponding author. E-mail: deronmonta@gmail.com

ABSTRACT

Segmentation of normal tissue and lesions is a crucial first step in automated brain image analysis. Segmentation distills key information about the patient, such as lesion volume, the affected area, and structural visualization. Traditionally, brain segmentation is done by human annotation, which may be time-consuming and requires highly knowledgeable annotators. While some machine learning methods such as region growing may provide a certain level of automation for segmentation, oftentimes human editing is still required. In recent years, developments in deep neural networks have led to promising results in fully automated brain image segmentation. Deep neural networks have a higher model capacity compared to classical machine learning models and are therefore more suitable for handling highly diversified data such as medical images.

In this chapter, we first introduce some common brain imaging modalities and fundamental concepts in image segmentation. Then, we'll discuss modern deep neural network approaches to brain segmentation. Finally, we'll take a look at state-of-the-art methods in this field.

12.1 INTRODUCTION

Analysis of brain images, specifically segmentation, is a standard procedure for quantitative analysis in clinical diagnosis. Noninvasive imaging techniques such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) are some routine imaging modalities for obtaining images of
the brain. Brain segmentation can be further divided into two main tasks: segmentation of normal brain tissues and segmentation of brain lesions. Anatomical segmentation of normal brain tissues involves classifying voxels of 3D images or pixels of two-dimensional (2D) images into heterogeneous structures like gray matter, white matter, or cerebrospinal fluid. Segmentation of brain lesions aims to detect abnormal regions of the brain such as a tumor, multiple sclerosis, and stroke lesions.

While the current gold standard for brain segmentation is manual labeling, this approach requires intensive labor from highly experienced experts. Furthermore, it is subject to other issues like human error and varying labeling standards across different institutions. Hence, tremendous efforts have been devoted to developing automated algorithms for segmenting brain tissues and lesions with little to no human intervention. Automated image segmentation is a computer vision task that aims to partition an image into meaningful segments for quantitative analysis. It is often the foundation of medical image analysis, as useful information, for example, the volume and position of anatomical structures, can be extracted from the results of segmentation.

Intensity thresholding, pixel clustering, and histogram-based methods are known as conventional image processing segmentation methods. These methods rely solely on the intensity information and distribution of pixels in the image to segment objects of interest. The convoluted structures of the brain and the subtle discrepancy between normal tissues and lesions prove to be too complex for these simple methods to capture.

Machine learning, a branch of artificial intelligence (AI) that uses pattern recognition techniques and learnable parameters, has shown the capability to replace simple image processing as the method of choice for image segmentation. Methods without stacked layers or deep representation are known as "traditional" machine learning, which includes algorithms like the support vector machine (SVM), random forest (RF), and Markov random field (MRF). While demonstrating better performance than image processing methods, traditional machine learning still does not scale well to high-dimensional data such as three-dimensional (3D) or four-dimensional (4D) MRI sequences. Furthermore, it requires meticulous manual feature extraction and may show poor generalization across different scanners.

Deep neural network (DNN) or deep learning (DL) is a subclass of machine learning algorithms that consists of multiple layers of nonlinear operations. These layers are used to represent different levels of
Deep Learning in Brain Segmentation 263

abstraction. In contrast to traditional machine learning models, DNNs eliminate the need for hand-crafted features by incorporating data representation as a part of the training. With the recent rapid development of DL, there has been a growing interest in applying DNNs in medical image analysis. With a diverse choice of network specializations, the convolutional neural network (CNN) is by far the most regularly utilized structure in the field of computer vision and image recognition. In its basic form, a CNN consists of convolution layers, pooling layers, and fully connected layers.

This chapter focuses on DL applications in brain image segmentation. We start off with a brief introduction to brain imaging modalities and image segmentation, followed by the essential image processing procedures. We will then introduce the basic building blocks for CNNs and lay the foundation for modern CNN architectural designs. Next, we will direct our attention to the publicly available datasets in neuroimaging. Training techniques such as data augmentation and transfer learning will also be discussed in this section. We then review state-of-the-art DL models for brain segmentation and draw comparisons with traditional machine learning methods. Finally, we conclude the chapter with discussions regarding the current state, the challenges of clinical integration, and future trends of DL in brain segmentation.

12.2 BRAIN IMAGING MODALITIES

Brain imaging (also known as neuroimaging) involves using noninvasive techniques to obtain structural or functional images of the brain. Structural imaging of the brain refers to obtaining the structure of the central nervous system. Structural imaging is useful for detecting large-scale intracranial diseases such as a tumor or brain trauma. Two of the most common structural brain imaging techniques used in clinical settings are MRI and CT.

Functional imaging is the measurement of the brain's functionality. It is effective in identifying metabolic-related diseases such as Alzheimer's. Functional imaging techniques are also often employed in cognitive studies, as such research is concerned with the functional connectivity of the brain. Functional MRI (fMRI), positron emission tomography (PET), and functional near-infrared spectroscopy are some examples of functional imaging methods.

Understanding the theoretical background of each imaging modality is a fundamental prerequisite for developing a medical image segmentation algorithm. For example, the pixel values of a CT scan represent physical meanings that correspond to different tissues such as bones, normal tissues, or
micro-bleedings. Differentiating the nuances of each modality can help developers build a more efficient and error-proof system. In this section, we provide succinct descriptions and applications of routine imaging modalities in brain imaging. As we introduce each imaging modality, the advantages and disadvantages of each will also be covered.

12.2.1 MAGNETIC RESONANCE IMAGING

MRI is one of the most common modalities for brain imaging. It provides a distinctive contrast between different tissues of the central nervous system. A brain MRI study uses magnetic gradients to differentiate heterogeneous tissues of the central nervous system. A subject undergoing an MRI study is placed inside a confined scanner. The scanner forms a strong magnetic field that excites hydrogen atoms, which are the most abundant atoms in humans and organisms in general. Hydrogen atoms in organisms typically exist in the form of water or fat. Therefore, by alternating hydrogen atoms between the excitation and resting states, the scanner can collect gradient signals and map the location of water and fat in human tissue.

The difference in magnetization direction during the process of returning from the excitation state to the equilibrium state creates two types of contrasting images: the T1-weighted and T2-weighted MRI. The T1 image is created when the magnetization of the tissue aligns with the static magnetic field, while the T2 image is created by transverse magnetization of the tissue and the field. The two contrasting MRIs demonstrate distinct signal characteristics. A T1-weighted study is useful for establishing a baseline for normal brain structures, while a T2-weighted study can be used to detect inflammation and tumors.

Compared to radioactive imaging modalities that expose patients to small doses of ionizing radiation, like X-rays and CTs, MRI studies do not involve radiation and present no known health risk. Moreover, MRI offers better contrast of soft tissues, which make up the majority of tissue in the brain. Hence, MRI has gained widespread popularity in neuroimaging since its introduction in the late 1970s.

12.2.2 COMPUTED TOMOGRAPHY

CT integrates multiple X-ray images from various directions to form a tomographic volume image. The "slices" taken from each position are stitched together with digital geometry processing. Although other forms of imaging modalities like PET also utilize computer-generated tomography, the term "CT" typically refers
to X-ray CT. Since its introduction in the 1970s, CT has gained increasing popularity in diagnosing head-related trauma. A CT scan can be used to detect hemorrhage, tumors, and calcifications of the brain. The advantage of CT over MRI is the shorter time of the study. As such, in acute critical conditions like trauma and stroke, CT is the preferred imaging modality. One of the main disadvantages of CT is the high level of radiation. While providing high-resolution volumetric images, standard-dose CT scans may present 1000 times the radiation of 2D X-rays.

FIGURE 12.1 Example CT image of a stroke patient.

12.2.3 POSITRON EMISSION TOMOGRAPHY

PET is an imaging modality that generates a functional map of the brain by measuring the distribution of radiolabeled chemical markers. PET is one of the mainstream methods besides fMRI for obtaining the functional structure of the brain. PET can also be used for early diagnosis of diseases involving significant changes in the brain's metabolism such as Alzheimer's disease, brain tumors, and strokes. For example, there is little to no noticeable structural shift of the brain in the early stages of Alzheimer's disease. However, PET can measure regional glucose usage and detect subtle changes between patients with Alzheimer's disease and normal patients.

12.3 NEUROIMAGE SEGMENTATION

Image segmentation refers to the task of partitioning the pixels of a 2D image or the voxels of a 3D image into semantically meaningful regions. Ideally, the pixels or voxels in the same semantic group should share similar characteristics or present physical significance. The purpose of performing segmentation is to transform the image into a concise representation that can be easily manipulated in further quantitative analysis such as area measurement, object detection, and boundary finding. We refer to the result of image segmentation as a "segmentation mask" or simply "mask." Applications of image segmentation span multiple disciplines, ranging from automated video surveillance and face detection to pathology localization.

There are many ways to categorize image segmentation methods.
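Before turning to those categories, the notion of a mask as a concise, quantifiable representation can be made concrete with a small sketch (illustrative only; the mask values and the assumed 0.5 mm pixel spacing are made up for the example):

```python
import numpy as np

# Hypothetical 2D binary mask: 1 = segmented object, 0 = background.
mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
])

pixel_area_mm2 = 0.5 * 0.5          # assumed in-plane pixel spacing of 0.5 mm
area = float(mask.sum() * pixel_area_mm2)  # area measurement from the mask

rows, cols = np.nonzero(mask)       # coordinates of the segmented pixels
# Object bounding box (min row, min col, max row, max col).
bbox = tuple(int(v) for v in (rows.min(), cols.min(), rows.max(), cols.max()))

print(area)   # 5 pixels * 0.25 mm^2 = 1.25
print(bbox)   # (1, 1, 3, 2)
```

The same indexing extends directly to 3D masks, where the pixel count becomes a voxel count and the area becomes a volume.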
In the context of this chapter, we classify the segmentation techniques into simple image-processing-based methods and machine-learning-based methods with learnable parameters. Pure image processing segmentation techniques involve little to no learning process. These methods include region growing, active snakes, and intensity thresholding. In contrast to image processing segmentation techniques, model-based approaches allow learnable parameters to fit the data. Extensive evaluation of a newly developed segmentation algorithm is crucial to understanding its performance. Two metrics commonly applied to evaluate image segmentation are the Sørensen–Dice coefficient and the Hausdorff distance.

Quantitative analysis of a neuroimage requires segmentation of the target of interest. Based on the nature of the target, brain segmentation can be categorized into two task groups. Anatomical brain segmentation is the identification of distinct brain structures such as the cerebrospinal fluid, hippocampus, and white matter. Brain lesion segmentation, on the other hand, attempts to segment brain lesions from normal tissue. For instance, stroke, brain tumor, and microbleeding are targets for lesion segmentation in the brain. This section is an introduction to image segmentation as well as its application in neuroimaging.

12.3.1 SEGMENTATION EVALUATION METRICS

12.3.1.1 SØRENSEN–DICE

The Sørensen–Dice coefficient, commonly known as the Dice coefficient, is one of the most popular metrics used for evaluating the segmentation quality of medical images. The coefficient is essentially a similarity measure of two sets X and Y:

Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|)        (12.1)

where X denotes the predicted mask and Y denotes the ground truth labels. The Dice coefficient calculates the intersection of the two sets while taking the sizes of each set into account.

12.3.1.2 HAUSDORFF DISTANCE

In addition to the Dice score, the Hausdorff distance is another metric for quantitative evaluation of segmentation performance. The Hausdorff distance of two sets X and Y measures the maximal distance between one point in a set and the nearest point in the other set. It can be calculated with the following equation:

d_H(X, Y) = max_{x ∈ X} min_{y ∈ Y} d(x, y)        (12.2)
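As a concrete illustration, both metrics can be computed for small binary masks represented as sets of pixel coordinates (a minimal sketch; Eq. 12.2 is the directed variant given above, with `math.dist` playing the role of the Euclidean distance d):

```python
import math

def dice(x: set, y: set) -> float:
    """Sørensen–Dice coefficient of two pixel sets (Eq. 12.1)."""
    if not x and not y:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(x & y) / (len(x) + len(y))

def directed_hausdorff(x: set, y: set) -> float:
    """Directed Hausdorff distance (Eq. 12.2): the farthest point of X
    from its nearest neighbor in Y, under Euclidean distance."""
    return max(min(math.dist(p, q) for q in y) for p in x)

# Toy 2D masks given as sets of (row, col) pixel coordinates.
pred  = {(0, 0), (0, 1), (1, 0)}
truth = {(0, 0), (0, 1), (1, 1)}

print(dice(pred, truth))                # 2 * 2 / (3 + 3) ≈ 0.667
print(directed_hausdorff(pred, truth))  # (1, 0) is 1.0 away from its nearest match
```

In practice, production implementations (e.g., the one in SciPy's spatial module) also symmetrize the Hausdorff distance by taking the maximum of both directions.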
12.3.2 NEUROIMAGING APPLICATIONS

12.3.2.1 ANATOMICAL BRAIN SEGMENTATION

Segmentation of anatomical brain structures like the corpus callosum, lateral ventricle, and hippocampus is useful for establishing quantitative assessments of brain development. A popular method for segmentation of anatomical structures of the brain was the multiatlas approach. In an atlas-based method, several atlases that are most similar to the query image are selected and matched. With the recent developments of DL, neural networks have been replacing the atlas-based methods, which also discards the need for creating atlases.

12.3.2.2 BRAIN LESION SEGMENTATION

Segmentation of brain lesions is the foundation of lesion volume measurement, disease progression tracking, and treatment assessment. Since the lesion typically takes up a smaller area in proportion to the whole brain, lesion segmentation is often done with detection-based methods. One of the most routine applications of brain lesion segmentation is the brain tumor. Brain tumors are uncontrolled growths of cells in the central nervous system. There are over 100 subtypes of brain tumors, with gliomas being one of the most common subtypes. Depending on the aggressiveness of the cancer cells, gliomas can be further classified into different grades. Early diagnosis of a brain tumor is crucial to treatment management and recovery. It is important to identify the area affected by the lesion, as preoperation segmentation of normal and lesion tissues can help surgeons safely remove the tumor area while avoiding healthy tissue during an operation. Stroke is the cutoff of blood flow to the brain and is another frequent target of brain lesion segmentation.

12.4 IMAGE PROCESSING FOR BRAIN SEGMENTATION

The performance of automated brain imaging analysis is heavily affected by factors like scanner noise, subject-wise variabilities, and manufacturer inhomogeneities. The purpose of preprocessing is, therefore, to standardize the images before running the automated algorithms, in hopes of eliminating as many inconsistencies and artifacts as possible. The preprocessing steps ensure that images acquired from different institutions' scanners are quantitatively comparable. We introduce common image processing steps used in neuroimage analysis such as intensity normalization, registration, and skull stripping.
12.4.1 IMAGE REGISTRATION

Image registration is the task of aligning images in different coordinates into a unified system. Image registration is an essential step for comparing sequential images obtained from the same source, i.e., sequential CT scans from the same patient. Image registration is a ubiquitous operation in brain imaging, since tracking a lesion in the brain will likely require multiple studies over an indefinite time span.

When registering two images, one image is set as the target or fixed image and the other is set as the moving image. The objective is to perform some transformation that matches the coordinates of the moving image to those of the fixed image. There are two types of image registration: intensity-based methods and feature-based methods. Intensity-based registration matches the intensity information in the two images with correlation metrics and transforms the moving image accordingly. In a feature-based registration algorithm, the first step is to identify distinctive feature points in both the fixed image and the moving image. The transformation is done by aligning the feature points of the moving image to those of the fixed image.

12.4.2 INTENSITY NORMALIZATION

Intensity normalization is the projection of intensities across different images onto a single comparable scale. As the pixel intensities produced by different modalities present physical meanings, intensity normalization should be handled with the modality information in mind. As an example, the intensity of a CT image may range from negative values to 4000. Not selecting the appropriate intensity window is detrimental to the automated algorithm.

12.4.3 SKULL STRIPPING

Skull stripping refers to removing the skull pixels of the head. In some modalities such as CT, the bones present a strong signal in the image and may affect the focus of the segmentation model.

12.5 TRADITIONAL MACHINE LEARNING

Machine learning is a branch of AI that relies on statistical models and pattern recognition to make predictions on unseen data. By fitting their parameters according to the training data, these methods require minimal explicit instructions on how to perform a certain task.

Automated approaches are necessary for large-scale brain segmentation because manual delineation of anatomical structures and lesions is time-consuming and subject to large observer bias. It is also infeasible to conduct population-level research involving thousands of images that
requires a large number of expert annotators. As mentioned earlier, image processing segmentation methods are not sufficient to capture the complexity of the brain structure. With learnable parameters that adjust according to the data, machine learning has been proposed for automated brain segmentation.

The subject of this section is machine learning methods without multilayer representation. We refer to these methods as "traditional" machine learning methods, in contrast to the DL models which will be the topic of the next section. While we mainly focus on DL approaches in this chapter, it is imperative to understand the basics of traditional machine learning methods and how DL methods compare against the conventional approach.

12.5.1 FEATURE SELECTION

In conventional machine learning, algorithms operate on features extracted from the image rather than the raw image itself. A feature can be thought of as a summary statistic regarding the subject image. Average pixel intensity and standard deviation are some examples of scalar features. Features can also be matrices, like the results from a neighboring pixel filter such as Gaussian filters, Haar filters, etc. Features are discussed further in Section 12.11. In summary, there are no rigorous rules for which features to extract or to look for, and the choice often boils down to expert domain knowledge and previous research experience.

12.5.2 LINEAR DISCRIMINANT ANALYSIS

Linear discriminant analysis (LDA) is a machine learning method that uses a linear combination of features to determine the class of the inputs. The main principle of LDA can be summarized as projecting data from the extracted feature space onto a single-dimension space. The single-dimension space can then be classified into two classes by thresholding. The major disadvantage of using LDA is the linear nature of the classifier. In most practical cases, the data cannot be separated in the feature space by a linear function.

12.5.3 SUPPORT VECTOR MACHINE

The SVM is a class of supervised learning models that can be used for both classification and regression tasks. With a set of labeled data points, an SVM learns from this dataset and assigns an unseen example to a category without assigning probabilities. Hence, the SVM is a nonprobabilistic classifier. With an SVM, each data point is represented as an n-dimensional vector. The SVM classifies data points by what is known as a "hyperplane" that
creates the largest margin between the different classes.

12.5.4 RANDOM FOREST

A decision tree is a simple classifier aimed at performing classification tasks by forming a series of decision nodes in a tree-like structure. A decision tree consists of multiple branches whose quantity is based on the number of input features. Typically, each decision node in a decision tree is represented by the values of the input variables. The dependent (target) variable is positioned at the leaves of the tree. Though a simple yet powerful technique, decision trees are prone to poor generalization. The RF improves upon a single decision tree by assembling the results from multiple independently trained decision trees. An RF is trained by bootstrap aggregating, meaning that each tree has only a sampled subset of the data to train on.

12.6 DEEP NEURAL NETWORKS

DNNs, or DL, are a specialized branch of machine learning that uses stacked layers of nonlinear operations for processing information. In contrast to the aforementioned traditional machine learning methods, DNNs do not require task-specific nor handcrafted features. Instead, a DNN incorporates feature extraction as part of the learning process. The theoretical proofs of neural networks as universal function approximators were first introduced in the 1980s. Due to the limitation of computational power, neural networks were not given much attention until the early 2010s. With the recent significant advancement of specialized hardware designed to perform matrix multiplication, like the graphical processing unit, the computational limitation that was hindering the development of neural networks is now gone. A tremendous amount of research has been devoted to DNNs in the past decade, backed up by state-of-the-art performance in a wide variety of disciplines including natural language processing, computer vision, and medical imaging.

There exists a wide variety of neural network architectures such as fully connected neural networks, recurrent neural networks (RNNs), convolutional neural networks (CNNs), and restricted Boltzmann machines. These varying structures are designed to optimize the information flow of different data types. RNNs are ideal for processing data with sequential dependencies such as time series and texts. CNNs are the first choice when it comes to data with a spatial relationship such as images. As such, CNNs are by far the most widely adopted in the computer vision domain and will be the focus of our discussions regarding neural
FIGURE 12.2 Training pipeline for conventional machine learning.

FIGURE 12.3 Training pipeline for deep learning.

networks. In this section, we describe the basic operations that serve as the building blocks for modern CNNs. These layers can be regularly seen across state-of-the-art DL models in different combinations and variants.

12.6.1 CONVOLUTIONAL NEURAL NETWORK

CNNs are a type of neural network specialized in handling high-dimensional data with spatial resolution, such as images. In a traditional multilayer feedforward neural network, the inputs are long vectors with no spatial or temporal correlation between adjacent elements. Using a vanilla multilayer perceptron neural network for analyzing images will destroy the spatial information and require a large number of redundant parameters. In contrast, CNNs preserve the spatial resolution by using an efficient representation of the encoded information. CNNs are formed by a series of convolution layers, activation functions, and pooling layers. Convolution layers produce feature maps by sliding convolution kernels across the image. Pooling layers are used for downsampling the feature space. The end layer of a CNN is usually a fully connected layer.
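The interplay of these three operations can be sketched in a few lines of NumPy (a didactic sketch with a hand-written valid "convolution" of stride 1, implemented as cross-correlation as is conventional for CNN layers; not an optimized implementation):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (stride 1, no padding), producing one feature map."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Element-wise rectified linear unit activation."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the maximum of each size x size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36.0).reshape(6, 6)          # toy 6x6 input "image"
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # a simple vertical-edge kernel
fmap = max_pool(relu(conv2d(image, kernel)))   # conv -> activation -> pooling
print(fmap.shape)  # (2, 2): 6x6 input -> 5x5 feature map -> 2x2 after pooling
```

Real convolution layers apply many such kernels in parallel (the depth factor discussed below) and learn the kernel weights during training.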
12.6.1.1 CONVOLUTION LAYER

The convolution layer is the most essential part of a CNN and typically the layer that accepts the input image. The outputs of a convolution layer are often referred to as feature maps in DL terminology. A convolution layer consists of multiple convolution filters (also known as kernels). Convolution filters have a fixed spatial arrangement in the same layer, for example, 5 × 5 × 3 pixels. During the forward pass, each filter is slid across the image, and a discrete convolution, commonly denoted with an asterisk (∗), is performed at each sliding position. A general definition of convolution is an operation between two functions that generates a third function. In the case of a CNN, the input image (and the input feature maps in the subsequent layers) can be thought of as the first function, and the convolution kernels are therefore the second function. The kernels will learn various visual features and create high activations if the area contains the feature they are looking for.

The size of the feature maps is determined by three factors: depth, kernel stride, and padding. The depth factor is simply the number of kernels at a specific layer. The kernel stride size is how much the kernel is moved from one convolution position to the next. A higher stride size means more area of the image is skipped, thereby creating a smaller feature map. Padding refers to inserting zeros around the edge of the image. This is useful for retaining the desired feature map size after a convolution layer. In contrast to fully-connected neural networks, where connections are made across all neurons, the CNN utilizes an idea called parameter sharing. By sharing the same convolution kernel across an image, a good amount of redundant weights can be avoided, therefore allowing deeper network structures.

12.6.1.2 ACTIVATION FUNCTION

After a convolution layer, the feature maps are passed through an activation function. The activation function introduces element-wise nonlinearities to the model. Without the activation function, the weights and biases from the previous layer would be passed onto the next merely in the form of some linear combination. Linear combinations are limited in terms of their ability to fit the data. The choice of activation function remains an open research topic. While the sigmoid function was the standard preference for the activation function when DL started to gain attention,
the rectified linear unit (ReLU) and the leaky ReLU have mostly replaced the sigmoid function as the default activation function. The size of the feature map remains unchanged after the activation function.

12.6.1.3 POOLING LAYER

Pooling layers are typically inserted in between convolution layers. The purpose of pooling layers is to downsample the effective spatial resolution and therefore reduce the total number of parameters. In other words, pooling layers provide summary statistics of the previous layers using a different mechanism. For example, the max-pooling layer returns the maximum activation within a rectangular area. Average pooling, on the other hand, returns the average of the pixels within the designated rectangular area. Another benefit of using pooling layers is to introduce translation and spatial shift invariance.

12.6.1.4 FULLY CONNECTED LAYER

The last layer of a CNN is usually a fully connected layer. The feature maps from the previous convolutional layers are reshaped to fit the fully connected layer, where the output is an N-dimensional vector. N corresponds to the number of assigned classes in the ground truth labels. For instance, if the task is to classify voxels in an MRI into the background, normal brain tissue, and tumor tissue, then N = 3.

12.6.1.5 SKIPPED CONNECTION

Before we introduce the skipped connection (also known as a residual connection), we need to understand the vanishing gradients problem and build the intuition as to why such structural designs are necessary. As mentioned in the earlier section, modern neural networks are becoming deeper, with the number of layers well exceeding hundreds. However, with more layers, training neural networks (which we will go into in detail in a later section) becomes difficult. This is because the gradients of the parameters with respect to the loss become too small after repeated matrix multiplications, making the earlier layers close to the inputs nearly impossible to adjust.

The residual neural network (ResNet) has been proposed as a solution to the vanishing gradient problem. ResNets utilize a skip-connection between neighboring layers. The skip-connection takes the inputs from the previous layer and adds them to the results of the current layer, bypassing computation in the current layer. The skipped
connection solves the vanishing gradients problem by letting the gradients propagate through the identity function. The skipped connection can be implemented in almost any CNN structure and proves to be effective in improving CNN training.

FIGURE 12.4 Example convolution block with skipped connections.

12.6.1.6 TRANSPOSED CONVOLUTION AND UPSAMPLING

Transposed convolutions (sometimes called deconvolutions or fractionally strided convolutions) and upsampling layers are regularly used in CNNs with a bottleneck structure, such as the U-net. Networks with a bottleneck structure share a similar trait in that the outputs of these models have the same dimension as the inputs. After repeated convolution and pooling layers, the feature maps have much smaller heights and widths compared to the input image. Therefore, it is necessary to upsample the feature maps back to the original dimension. Upsampling can be done by parameterless operations like nearest-neighbor interpolation or by a learnable function such as the transposed convolution. Transposed convolution can be thought of as a reverse operation to convolution. With convolution, there is a many-to-one relationship between the inputs and outputs, as the pixels under the convolution area are "squashed" into one value. Transposed convolution, on the other hand, shows a one-to-many relationship between the inputs and outputs. A single pixel is mapped to pixels within the kernel size of the transposed convolution. Take a 3 × 3 transposed convolution, for example: the center pixel is upsampled to nine pixels.

12.7 NEURAL NETWORKS FOR SEGMENTATION

In the previous section, we discussed the common operations in a convolutional neural network. The topic of this section is the architectural design of CNNs specifically for image segmentation purposes. In contrast to the image classification task, where the outputs are
discrete-valued binary labels, image segmentation produces segmented masks that share the same dimensions as the input images, thus making this task computationally demanding. Furthermore, imaging of the brain often involves three-dimensional images such as CTs and MRIs, which aggravates the issue of the computational burden. Designing a segmentation network for volumetric images, therefore, requires additional attention to the hardware constraints. Common CNN structures used for brain image analysis can be roughly categorized into three subclasses: patch-based, semantic-based, and triplanar. Semantic-based neural networks output dense, voxel-wise predictions that assign probabilities to each individual voxel. Since their introduction, fully convolutional neural networks (FCNNs) such as the U-Net have been the workhorse of major segmentation algorithms and have demonstrated state-of-the-art performance.

12.7.1 PATCH-BASED CNN

The patch-based CNN, as its name suggests, is a CNN training paradigm where the original images are divided into smaller patches. Each patch contains a center pixel, and segmentation is done by classifying the center pixel with the neighboring pixels as context information. Since dealing with volumetric images is the norm in neuroimaging, the patch-based method is an efficient way to handle large memory consumption.

FIGURE 12.5 Schematic overview of a patch-based CNN.
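The patch-extraction step of this paradigm can be sketched as follows (an illustrative sketch; the volume contents, patch size, and center coordinate are arbitrary):

```python
import numpy as np

def extract_patch(volume, center, size=3):
    """Extract a cubic patch of odd side `size` around a center voxel;
    the patch supplies the context for classifying that center voxel."""
    r = size // 2
    z, y, x = center
    return volume[z - r:z + r + 1, y - r:y + r + 1, x - r:x + r + 1]

volume = np.random.rand(8, 8, 8)          # toy 3D "scan"
patch = extract_patch(volume, (4, 4, 4))  # context around voxel (4, 4, 4)
print(patch.shape)                        # (3, 3, 3)
print(patch[1, 1, 1] == volume[4, 4, 4])  # True: the center voxel is preserved
```

A full pipeline would also handle voxels near the border (e.g., by padding the volume) and batch many such patches together for the classifier.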


12.7.2 FULLY CONVOLUTIONAL CNN

A fully convolutional CNN model contains only convolution, max-pooling, and transpose convolution layers. This kind of network predicts a whole probability map of each pixel belonging to a certain class. A CNN with a fully-connected layer attached at the end only accepts a fixed dimension of inputs, since the size of the last layer determines the size of the final output. The fully convolutional nature of the network, in contrast, allows images of varying dimensions and is ideal for performing segmentation.

FIGURE 12.6 Schematic overview of a fully-convolutional CNN.
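The fixed-input-size limitation can be made concrete with the standard output-size formulas, out = ⌊(in + 2·pad − kernel)/stride⌋ + 1 for convolution and pooling, and out = (in − 1)·stride − 2·pad + kernel for transpose convolution. The helpers below are an illustrative sketch (the layer sizes are arbitrary, not from the chapter):

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Spatial output size of a convolution or max-pooling layer.
    return (size + 2 * pad - kernel) // stride + 1

def tconv_out(size, kernel, stride=1, pad=0):
    # Spatial output size of a transpose ("up") convolution layer.
    return (size - 1) * stride - 2 * pad + kernel

# A fully convolutional pipeline produces a valid probability map for
# ANY input size; a fully-connected head would require exactly one size.
for n in (64, 96, 128):
    h = conv_out(n, kernel=3, pad=1)       # 3x3 conv, pad 1: size preserved
    h = conv_out(h, kernel=2, stride=2)    # 2x2 max-pool: size halved
    h = tconv_out(h, kernel=2, stride=2)   # transpose conv: size restored
    assert h == n   # output map matches the input dimensions
```

A fully-connected layer trained on, say, 64×64 feature maps has a weight matrix sized for exactly 64·64 inputs, which is why swapping it out for convolutions removes the size constraint.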

12.7.2.1 U-NET

U-net is a CNN-based neural network introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox and was given the name due to its “U”-shaped architecture (Ronneberger et al., 2015). It was originally designed to perform semantic segmentation on biomedical images and is now widely adopted for general segmentation tasks. U-net is an example of the fully convolutional network (FCN), where no fully connected layers are involved. A typical U-net consists of a downsampling path and an upsampling path, with a concatenation of the feature maps in between the two paths. The downsampling path contains stacked convolutional layers, activation functions, and pooling layers until it reaches the bottleneck section of the network. As the image is gradually passed down the downsampling path, the spatial resolution of the feature maps decreases while the feature resolution increases. After another convolution block, the feature maps are passed onto the upsampling path. The upsampling path consists of transpose convolution blocks that upsample the feature maps back to their original dimensions.

FIGURE 12.7 Schematic illustration of the U-net.
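The shape bookkeeping of the two paths can be traced without any deep learning library. The three-level design and the channel counts (64, 128, 256, with a 512-channel bottleneck) are typical U-net choices used here for illustration, not values taken from the chapter:

```python
# Encoder: each level applies a conv block, stores the feature map for
# its skip connection, then halves the spatial resolution by pooling.
c, h, w = 1, 256, 256                 # single-channel input image
skips = []
for level_c in (64, 128, 256):
    skips.append((level_c, h, w))     # saved for the skip connection
    h, w = h // 2, w // 2             # 2x2 max-pooling

bottleneck = (512, h, w)
assert bottleneck == (512, 32, 32)    # lowest resolution, richest features

# Decoder: transpose convolutions double the resolution; the saved
# encoder maps are concatenated channel-wise before each conv block.
c, h, w = bottleneck
for skip_c, _, _ in reversed(skips):
    h, w = h * 2, w * 2               # transpose convolution upsamples
    c = c + skip_c                    # channel-wise skip concatenation
    c = skip_c                        # conv block reduces the channels

assert (c, h, w) == (64, 256, 256)    # mask matches the input resolution
```

The trace makes the U shape explicit: resolution falls from 256 to 32 and climbs back, while the skip connections reinject the high-resolution encoder features on the way up.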

12.7.3 TRIPLANAR CNN

An alternative approach to the patch-based CNN in solving the memory consumption issue is the triplanar CNN. A volumetric medical image (or any 3D image) can be viewed from three planes: sagittal, coronal, and transaxial. The triplanar CNN takes the 2D image from each view and combines the results to form a single 3D segmentation mask. The triplanar CNN is sometimes called a 2.5D method, as the third dimension is not used in its entirety. Compared to a 3D patch-based method, the triplanar CNN offers a larger field of view while still incorporating context information along the third dimension.

12.8 TRAINING AND EVALUATION

The two fundamental aspects of DL are the structural design of the network and parameter learning, with the latter often referred to as “training.” Having discussed the structural formation of DNNs in the previous sections, we now shift our attention to the dataset, training, and validation aspects of DL.

In the realm of machine learning, a typical paradigm for tuning parameters and validating models is to partition the available data into three nonoverlapping splits: training, validation, and testing. The training dataset is used for fitting the model

parameters. The validation set is used for adjusting the hyperparameters such as the learning rate, number of kernels, etc. The model’s performance on unseen data is evaluated using the test dataset. For the training and validation sets, both images and ground truth labels are included. Ground truth labels, whether binary class labels, bounding boxes, or segmentation masks, are obtained from expert annotation as the "gold standard" for the model.

A typical workflow of designing an automated algorithm for brain image analysis can be summarized as follows: First, the developers formulate the problem by examining the dataset and understanding the objective. Second, developers conduct thorough literature research on current methodologies and design a network architecture tailored for the specific task. Then, model training, debugging, and hyperparameter tuning take place. Finally, the model is evaluated using a test dataset that is not seen by the model.

The aforementioned convolution layers and fully-connected layers all contain learnable parameters in the form of weight matrices. Training a neural network is effectively adjusting these weight matrices according to the training data. The performance of a neural network undoubtedly depends on the training process. We will be covering a couple of common training techniques for CNNs in this section. In-depth discussions regarding the theoretical details of backpropagation are beyond the scope of this chapter. Readers can refer to Bengio et al. (2017) for detailed theoretical proofs.

12.8.1 TRAINING NEURAL NETWORKS

Training a neural network is formulated as an error-minimization problem where the goal is to minimize the distance between the ground truth label and the predicted results from the model. Training a neural network is essentially a sequential four-step procedure that is repeated iteratively until the network reaches convergence. These steps are the forward pass, loss calculation, backward pass, and parameter update. In the forward pass step, an image is fed as the input to the network and passed through the layers of the CNN. The results of the forward pass are then compared to the actual label to produce a continuous-valued error. This is called the loss calculation step. In the next phase, the backward pass (also known as backpropagation), each parameter in the network is attributed a gradient that represents how much the parameter is responsible for the error calculated in the previous step. Finally, the parameter update step adjusts

the weights in the convolutional layers and fully connected layers with respect to the gradients from the backward pass step.

12.8.2 HYPERPARAMETERS

The hyperparameters in a neural network refer to those parameters that are not updated in the backward pass and parameter update steps. Since hyperparameters are not learned in the training process, they need to be manually adjusted by the developers. Take the learning rate, for example. The learning rate is how much the parameters are adjusted each time during the parameter update step. Balancing the learning rate is important because a low learning rate leads to slow convergence and too high a learning rate leads to unstable training. Hence, experimenting with some sensible choices of hyperparameters is crucial to the performance of the model and requires a solid understanding of CNNs. Other hyperparameters include the number of layers, layer arrangements, stride size, kernel size, padding size, etc.

12.8.3 TRAINING TECHNIQUES

12.8.3.1 DATA AUGMENTATION

While having access to more training data improves the performance of a machine learning algorithm, collecting new data may be expensive and sometimes infeasible, as in the case of medical images. Data augmentation is a powerful training technique that extends the existing dataset at hand. It is widely adopted in DNNs and machine learning algorithms in general due to its effectiveness in preventing overfitting and improving model performance. The technique creates small perturbations by applying geometric and color transformations to the input image. For instance, randomly cropping small portions of the image 10 times effectively adds 10 different images to the dataset. Other forms of data augmentation include image translation, random flipping, color jittering, and random affine transformation.

12.8.3.2 TRANSFER LEARNING

Training a neural network from scratch can be time-consuming, and random weight initialization will likely lead to models converging to different local minima. Images generally share similar low-level features like edges, lines, and geometric shapes, and these features will likely be learned across any image dataset. Transfer learning is an effective training technique that reuses the weights of a CNN previously trained on a large dataset for a secondary

task. By reusing the weights from a large dataset, we can ensure that the model learns the low-level features from a robust source. Generally speaking, the fully connected layer at the end of the CNN is the only structural modification that needs to be done for transfer learning.

12.8.3.3 DROPOUT

Dropout is a training technique that is proven to be effective against overfitting. In simple terms, executing dropout in training is to ignore neurons at random. The neurons being dropped out do not participate in the forward pass or the loss calculation in the backpropagation pass. By dropping some neurons during training, the model is forced to learn from the strongest signals and is therefore less likely to overfit.

12.8.3.4 BATCH NORMALIZATION

Batch normalization is a technique for improving the training of neural networks in terms of speed and convergence. Essentially, batch normalization adjusts each mini-batch by subtracting the batch mean from each sample and then dividing by the batch standard deviation. There are two major advantages of using batch normalization. First, by normalizing the activations at each layer, the interlayer interaction can be reduced, meaning that an over-activation in one layer is less likely to affect the upcoming layer. Second, without batch normalization, choosing a high learning rate will likely cause unstable training. In short, batch normalization helps stabilize the training process.

12.9 DATASETS AND STATE OF THE ART

Having a large dataset with high-quality labels is just as crucial as the design of the model for data-driven methods like DNNs. There are several challenges regarding the availability of data that are unique to the medical imaging domain. First, compared to tasks involving natural images where abundant data exists, publicly accessible medical images are scarce. The lack of data is especially problematic for DL models, which are notorious for being "data hungry." Second, developers need to be aware of the privacy issues regarding sensitive patient information when collecting medical data for training machine learning models. The Health Insurance Portability and Accountability Act (HIPAA) is a United States legislation that aims at safeguarding sensitive medical information. Any clinical application involving patient data needs to be HIPAA compliant.
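The mini-batch arithmetic described in Section 12.8.3.4 (subtract the batch mean, divide by the batch standard deviation) is only a few lines. The toy activations and the small epsilon guarding against division by zero are illustrative, and the learnable scale-and-shift parameters of the full technique are omitted for brevity:

```python
import math

def batch_normalize(batch, eps=1e-5):
    """Normalize one mini-batch of activations to zero mean, unit variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

activations = [2.0, 4.0, 6.0, 8.0]        # one unit's outputs over a batch
normalized = batch_normalize(activations)

batch_mean = sum(normalized) / len(normalized)
batch_var = sum(x ** 2 for x in normalized) / len(normalized)
assert abs(batch_mean) < 1e-9             # centered at zero
assert abs(batch_var - 1.0) < 1e-3        # unit variance (up to eps)
```

Because each layer now sees inputs on a comparable scale regardless of what the previous layer did, higher learning rates become usable, which is the stabilizing effect noted above.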

Recently, online challenges with publicly available datasets have been gaining considerable attention. An online challenge is a data science related competition that offers public datasets and invites participants to develop algorithms for them. Typically, an online challenge undergoes a sequence of phases. The first phase is usually the “training phase,” where the training dataset alongside expert-annotated ground truths is released to the participants. Participants use this data to design and evaluate their automated algorithms. After a designated period of time, the organizers release a set of test data with no accompanying labels. The participants are asked to either submit their program or the results on the test set to the challenge website. Before the widespread use of online challenges, algorithms were trained and evaluated on different datasets, making it extremely difficult to quantitatively compare the effectiveness of each algorithm. Specifically, it is hard to pinpoint whether the improvements came from a well-designed model or simply from having a larger dataset with better quality. Online challenges resolve these issues by offering benchmark datasets and a robust evaluation method for the participants. By using the same dataset for training, different algorithms share the same platform for evaluation.

In this section, we introduce several state-of-the-art publicly available datasets related to brain segmentation. The modalities of these datasets range from MRI to CT, with different segmentation targets such as tumor and stroke. Furthermore, these datasets are often used as benchmarks for task-specific performance. For example, the Multimodal Brain Tumor Segmentation Challenge (BraTS) has been a gold standard for developing automated brain tumor segmentation algorithms.

12.9.1 BRATS

BraTS is an annual online challenge held in conjunction with one of the largest conferences on computer-assisted medical imaging, the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). The goal of the BraTS challenge is to identify state-of-the-art methods for automated brain tumor segmentation. Each year, the organizers release an increasingly larger number of MR scans with different contrasting methods. The last BraTS challenge (BraTS 2018) featured 135 glioblastoma studies and 108 low-grade glioma studies collected from 19 institutions. The expert-annotated labels included different tumor subtypes like gadolinium enhancing, peritumoral edema, and necrotic and nonenhancing tumor core. The imaging

study from a single patient consists of four contrast modalities: native T1, postcontrast T1-weighted (T1Gd), native T2, and T2 fluid attenuated inversion recovery (T2-FLAIR). In the 2018 challenge, the dataset also included overall survival data, where the participants are asked to predict the overall survival time using pre-operation scans.

The state-of-the-art method for brain tumor segmentation is the first-place method proposed by NVIDIA (Myronenko, 2018) in the BraTS 2018 challenge. The method consists of a regular U-net branch and a variational autoencoder (VAE) branch that share partial parameters. The VAE branch takes the multimodality MRIs as inputs and seeks to reconstruct identical images as outputs. The VAE acts as a regularizer to ensure segmentation quality. The Dice coefficients for this method are 0.76, 0.88, and 0.81 for enhancing tumor, whole tumor, and tumor core, respectively.

FIGURE 12.8 Example data from the BraTS challenge.

12.9.2 ISLES

Ischemic stroke lesion segmentation (ISLES) is a medical image segmentation challenge aimed at identifying stroke lesion locations. The ISLES challenge is also held in conjunction with MICCAI. While the target of segmentation has remained the ischemic stroke lesion, the imaging modality for ISLES has changed over the years. For ISLES 2016 and 2017, the dataset provided by the organizers consisted of 51 MRIs from stroke patients. In the 2018 challenge, the organizers provided acute stroke CT and multiple perfusion maps using contrast agents. For instance, cerebral blood volume, cerebral blood flow, and time to peak of the residue function (Tmax) are some perfusion maps included in the dataset.

The best-performing method for ISLES (Tong, 2018) used a multiscale 3D U-net with dilated convolution. The multiscale approach combines low-level features and high-level features to ensure precise pixel-wise localization.

12.9.3 MRBRAINS

The grand challenge on MR brain segmentation (MRBrainS) is another challenge held with MICCAI. In contrast to BraTS and ISLES, which are both brain lesion segmentation tasks, the MRBrainS challenge is

an example of anatomical brain segmentation.

In the MRBrainS18 challenge, the organizers provided 30 fully annotated multimodality MRI sequences. The modalities include T1, T1 inversion recovery, and T2-FLAIR. The ground truth covered 11 labels including the basal ganglia, white matter, ventricles, brain stem, etc.

The state-of-the-art method for MRBrainS is a 3D patch-based U-net proposed by Luna et al. The modification in this approach is that instead of directly concatenating the feature maps between the downsampling path and the upsampling path of the U-net, the authors added a transition layer in the bridge section.

12.10 DISCUSSION

Despite being only recently applied to brain image analysis, DNNs have demonstrated immense potential for accelerating clinical workflow, improving segmentation accuracy, and conserving medical resources. DL approaches have surpassed conventional machine learning methods in terms of generalizability and accuracy on numerous occasions. Compared to traditional machine learning methods, DL models do not need hand-engineered features, which leads to faster and more robust development. With the widespread use of online challenges and datasets, benchmarking the performance of new segmentation algorithms can be done with little effort. Methods employing the U-net architecture with a variety of modifications can be found across almost all semantic segmentation challenges. Developers usually start with a U-net as the baseline method for the segmentation task.

Although DL approaches, especially CNN-based models, have demonstrated state-of-the-art performances for the segmentation of brain images, it is still challenging to develop a general-purpose model that is robust to scanner, task, and modality variations. Currently, these variations are handled by meticulously sought out preprocessing steps that differ from task to task. In the following section, we are going to discuss some challenges DL models face in terms of clinical integration and adapting to the medical domain.

12.10.1 CHALLENGES

One of the main criticisms of the DNN is the lack of interpretability, as it has often been coined a “black box” approach. The black box refers to a model in which it is difficult to trace the predictions back to the inputs. In other words, it is challenging to know why the black box model made a certain decision by examining the inputs. Interpretability of

a computer-aided diagnosis (CAD) system is especially important since applications in the medical domain involve the well-being of human lives. Physicians employing a CAD system should be able to trace the decision process from the input image to the predicted probabilities of disease. While there are other black box machine learning algorithms such as SVMs, a large number of DNN applications in medical imaging are much more likely to take place in the foreseeable future. Addressing the interpretability issue is therefore imperative to the development of DNNs in the medical domain.

The first convolutional layer perhaps holds the highest interpretability amongst all layers in a CNN. By checking the activations of feature maps from the initial layer, human readers can tell whether a certain kernel detects edges, curved lines, or a specific geometric shape. However, the feature maps beyond the first layer are too abstract for human comprehension. Currently, the mainstream approach for interpreting the predictions of CNNs is the use of attention mechanisms. A CNN with an attention module will place higher weights on different parts of the image. By examining the feature maps through the form of a weighted heat map, human readers can distinguish which part of the image contributes the most to a certain prediction of the network. However, some drawbacks, such as time consumption and the limited spatial resolution of the attention maps, still exist.

Another issue with a DNN is the potential risk of overfitting. Overfitting occurs when the model parameters fit a limited amount of data points too closely. Real-world data often contains random noise, and an overfitting model will likely consider this noise part of the signal. For instance, the machine manufacturer, institutional imaging protocols, and meta-information will all affect the neural network, as it will mistakenly associate these signals with the true label if not enough data is presented. Furthermore, a neural network that has overfitted the training data may produce contradictory results given visually similar inputs. There are two aspects to addressing the overfitting issue. The first solution is to present the model with more data to ensure the model learns from the true data distribution. Though online challenges have mitigated the lack of medical data, the number of images in medical datasets is still relatively limited compared to natural image datasets like ImageNet and CIFAR-10. The second solution is to use regularization techniques. The regularization techniques can be training related, such as weight decay or learning rate decay. Regularization can also be an explicit design in the neural network, like dropout and batch normalization.

12.10.2 FUTURE TRENDS

DL is one of the fastest growing scientific disciplines, with substantial breakthroughs each year. Since the topic of this chapter is segmentation, we have been focusing on segmentation networks, specifically CNNs. However, the field of DL is broad, and innovations in seemingly unrelated domains often have a far-reaching effect. Here, we take a look at some cutting-edge DL trends that were not proposed for solving segmentation tasks but ended up having a significant impact.

12.10.2.1 REGIONAL CNN

Regional CNN (R-CNN) is a family of models based on region proposal networks where regions of interest are detected through bounding box regression. These methods are widely adopted in the general computer vision community, where object detection is much more common than object segmentation. The first generation of R-CNN produces only bounding boxes. However, with the introduction of Mask-RCNN, its application has extended to segmentation as well. There have been recent studies employing Mask-RCNNs in brain segmentation that have shown promising results.

12.10.2.2 GENERATIVE ADVERSARIAL NETWORKS

The generative adversarial network (GAN) is a relatively new concept in DL where a generative network and a discriminator network are trained in a min–max game fashion. The purpose of the generator is to produce realistic samples, and it is the discriminator's duty to identify fabricated images. Previously, GANs have been mostly utilized in generating realistic natural images. Recent research has shown that GANs can also be employed in image segmentation. There are two ways that GANs can be used for image segmentation. The first approach is to use the discriminator as a critic network for refining the segmentation results of a base segmentation network. For example, a classification network takes the ground truth segmentation mask and a U-net predicted segmentation mask as inputs and tries to distinguish whether the mask is real or generated. The gradients are then fed back to the U-net to create masks that resemble the ground truth more closely.

The second approach is to use a conditional GAN to transform the original image into a “mask” domain. In contrast to a vanilla GAN, where the network produces images based on random Gaussian noise, a conditional GAN generates a new image conditioned on a reference image. In a segmentation

scenario, the reference image is the original image that needs to be segmented, and the output is a mask with discrete values. The conditional GAN transforms the input image from its original domain to the masked domain. The discriminator network holds a similar objective to the first approach, where it tries to differentiate whether the transformed mask comes from the true mask distribution.

12.11 SUMMARY

In this chapter, we have provided a comprehensive discussion of DNN applications in brain image segmentation. The first section of this chapter is an introduction to brain imaging and image segmentation. We have then introduced DNNs and frequent operations such as convolution, max-pooling, batch-norm, etc. We then proceeded to show how these operations are arranged to form a functioning network for segmentation purposes. Model training and various training techniques were covered in the subsequent section. Finally, an overview of notable public datasets and state-of-the-art models for brain segmentation was provided.

KEYWORDS

• brain segmentation
• lesion segmentation
• deep learning
• neural networks

REFERENCES

Bakas, Spyridon, et al. “Advancing the Cancer Genome Atlas Glioma MRI Collections with Expert Segmentation Labels and Radiomic Features.” Scientific Data, 5 Sept. 2017, www.ncbi.nlm.nih.gov/pubmed/28872634.
Bengio, Yoshua, et al. Deep Learning. MIT Press, 2017.
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018—Part I. Springer, 2019.
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018—Part II. Springer, 2019.
Brant, William E., and Clyde A. Helms. Fundamentals of Diagnostic Radiology. Wolters Kluwer/Lippincott Williams & Wilkins, 2012.
Brebisson, Alexandre De, and Giovanni Montana. “Deep Neural Networks for Anatomical Brain Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW’15), 2015, doi:10.1109/cvprw.2015.7301312.
Brosch, Tom, et al. “Deep 3D Convolutional Encoder Networks With Shortcuts for Multiscale Feature Integration Applied to Multiple Sclerosis Lesion Segmentation.” IEEE Transactions on Medical Imaging, May 2016, 35, 1229–1239, www.ncbi.nlm.nih.gov/pubmed/26886978.
Caceres, J. Alfredo, and Joshua N. Goldstein. “Intracranial Hemorrhage.” Emergency Medicine Clinics of North America, Aug. 2012, www.ncbi.nlm.nih.gov/pmc/articles/PMC3443867.
Cancer Imaging Archive Wiki. Segmentation Labels and Radiomic Features for the

Pre-Operative Scans of the TCGA-LGG Collection—TCIA DOIs—Cancer Imaging Archive Wiki, wiki.cancerimagingarchive.net/display/DOI/Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-LGG collection.
Cancer Imaging Archive Wiki. Segmentation Labels and Radiomic Features for the Pre-Operative Scans of the TCGA-GBM Collection—TCIA DOIs—Cancer Imaging Archive Wiki, doi.org/10.7937/K9/TCIA.2017.KLXWJJ1Q.
Caselles, Vicent, et al. Geodesic Active Contours. Kluwer Academic Publishers, doi.org/10.1023/A:1007979827043.
CIFAR-10 and CIFAR-100 Datasets, www.cs.toronto.edu/~kriz/cifar.html.
Current Methods in Medical Image Segmentation. Annual Reviews, www.annualreviews.org/doi/10.1146/annurev.bioeng.2.1.315.
Deng, Jia, et al. “ImageNet: A Large-Scale Hierarchical Image Database.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, doi:10.1109/cvprw.2009.5206848.
Feng, Xue, et al. “Brain Tumor Segmentation Using an Ensemble of 3D U-Nets and Overall Survival Prediction Using Radiomic Features.” Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes on Computer Science, 2019, pp. 279–288, doi:10.1007/978-3-030-11726-9_25.
Gaillard, Frank. “Ischemic Stroke.” Radiopaedia Blog RSS, radiopaedia.org/articles/ischaemic-stroke.
Havaei, Mohammad, et al. “Brain Tumor Segmentation with Deep Neural Networks.” Medical Image Analysis, Jan. 2017, www.ncbi.nlm.nih.gov/pubmed/27310171.
He, Kaiming, et al. “Deep Residual Learning for Image Recognition.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16), 2016, doi:10.1109/cvpr.2016.90.
Ho, Tin Kam. “Random Decision Forests.” Proceedings of the 3rd International Conference on Document Analysis and Recognition, doi:10.1109/icdar.1995.598994.
Kamnitsas, Konstantinos, et al. “Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate Brain Lesion Segmentation.” Medical Image Analysis, 2017, 36, 61–78, www.ncbi.nlm.nih.gov/pubmed/27865153.
Long, Jonathan, et al. “Fully Convolutional Networks for Semantic Segmentation.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15), 2015, doi:10.1109/cvpr.2015.7298965.
Mcrobbie, Donald W., et al. MRI from Picture to Proton. Cambridge University Press, 2016.
Menze, Bjoern H., et al. “The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS).” IEEE Transactions on Medical Imaging, 2015, 34, 1993–2024, www.ncbi.nlm.nih.gov/pubmed/25494501.
Moeskops, Pim, et al. “Automatic Segmentation of MR Brain Images with a Convolutional Neural Network.” IEEE Transactions on Medical Imaging, 2016, 35, 1252–1261, www.ncbi.nlm.nih.gov/pubmed/27046893.
Morra, Jonathan, et al. “Machine Learning for Brain Image Segmentation.” Machine Learning, 2012, pp. 851–874, doi:10.4018/978-1-60960-818-7.ch408.
Myronenko, Andriy. “3D MRI Brain Tumor Segmentation Using Autoencoder Regularization.” Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes on Computer Science, 2019, pp. 311–320, doi:10.1007/978-3-030-11726-9_28.
Pham, D.L., Xu, C., and Prince, J.L. Computer Vision. Springer, 2014.
Poldrack, Russell A., and Rebecca Sandak. “Introduction to This Special Issue: The

Cognitive Neuroscience of Reading.” Scientific Studies of Reading, 2018, 8, pp. 199–202, doi:10.4324/9780203764442-1.
Prasoon, Adhish, et al. “Deep Feature Learning for Knee Cartilage Segmentation Using a Triplanar Convolutional Neural Network.” Advanced Information Systems Engineering, Lecture Notes on Computer Science, 2013, pp. 246–253, doi:10.1007/978-3-642-40763-5_31.
Razzak, Muhammad Imran, et al. “Deep Learning for Medical Image Processing: Overview, Challenges and the Future.” Lecture Notes on Computational Vision and Biomechanics, Classification in BioApps, 2017, pp. 323–350, doi:10.1007/978-3-319-65981-7_12.
Ronneberger, Olaf, et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” Medical Image Computing and Computer-Assisted Intervention—MICCAI’15, Lecture Notes on Computer Science, 2015, pp. 234–241, doi:10.1007/978-3-319-24574-4_28.
Roth, Holger R., et al. “A New 2.5D Representation for Lymph Node Detection Using Random Sets of Deep Convolutional Neural Network Observations.” Medical Image Computing and Computer-Assisted Intervention—MICCAI 2014, Lecture Notes on Computer Science, 2014, pp. 520–527, doi:10.1007/978-3-319-10404-1_65.
Schroff, F., et al. “Object Class Segmentation Using Random Forests.” Proceedings of the British Machine Vision Conference, 2008, doi:10.5244/c.22.54.
Ioffe, Sergey, and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” arXiv.org, 2 Mar. 2015, arxiv.org/abs/1502.03167.
“Special Section on Deep Learning in Medical Applications.” IEEE Transactions on Medical Imaging, vol. 34, no. 8, 2015, pp. 1769–1769, doi:10.1109/tmi.2015.2460431.
“Support Vector Machine.” SpringerReference, doi:10.1007/springerreference_63661.
Tanoue, L.T. “Computed Tomography—An Increasing Source of Radiation Exposure.” Yearbook of Pulmonary Disease, vol. 2009, 2009, pp. 154–155, doi:10.1016/s8756-3452(08)79173-4.
Wang, Mei, and Weihong Deng. “Deep Visual Domain Adaptation: A Survey.” Neurocomputing, vol. 312, 2018, pp. 135–153, doi:10.1016/j.neucom.2018.05.083.
Yang, Hao-Yu, and Junlin Yang. “Automatic Brain Tumor Segmentation with Contour Aware Residual Network and Adversarial Training.” Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes on Computer Science, 2019, pp. 267–278, doi:10.1007/978-3-030-11726-9_24.
Yang, Hao-Yu. “Volumetric Adversarial Training for Ischemic Stroke Lesion Segmentation.” Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes on Computer Science, 2019, pp. 343–351, doi:10.1007/978-3-030-11723-8_35.
Zhou, S. Kevin, et al. Deep Learning for Medical Image Analysis. Academic Press, 2017.
CHAPTER 13

SECURITY AND PRIVACY ISSUES


IN BIOMEDICAL AI SYSTEMS AND
POTENTIAL SOLUTIONS
G. NIRANJANA* and DEYA CHATTERJEE
Department of Computer Science and Engineering, SRM Institute of
Science and Technology, Kattankulathur, Chennai 603203, India
*Corresponding author. E-mail: niranjag@srmist.edu.in

ABSTRACT

Machine learning and artificial intelligence (AI) have been rapidly progressing in several fields, and one of these is healthcare and biomedical systems, where AI and deep learning algorithms have shown massive success for various applications and use cases, such as virtual healthcare assistants, smart medical homes, automated diagnosis, processing of pathology reports, drug discovery, implantable medical devices, and many more. At this stage of progress, AI can reshape the healthcare workforce and bring about a drastic positive change in the understanding and handling of biomedical data in automated systems. However, as these artificially intelligent systems continue to gain more importance and surpass the state-of-the-art systems in terms of performance in biomedical and healthcare problems, the aspect of preserving the privacy and ensuring the security of our data and the users becomes extremely pertinent. Deep learning techniques may be at risk of learning and memorizing confidential information (often termed overfitting) instead of generalizing well to it, which, in fact, is the main aim. Biomedical data is inherently complex and involves sensitive and private, often confidential, information of patients and users, which is easily vulnerable in the face of attacks by malicious actors. Often, the handling of clinical data requires more context than the standard for usual applications of deep learning
these artificially intelligent systems usual applications of deep learning
290 Handbook of Artificial Intelligence in Biomedical Engineering

algorithms, such as patient history, patient preferences, social perspectives, etc. Moreover, the "black box" nature of deep learning algorithms results in a lack of model interpretability and gives rise to confusion regarding exactly how an AI model can achieve the kind of performance it does. It is hence not easy to identify the weaknesses of the model or the reasons for the weakness, or even to extract additional biological explanations from the results. This leads to potential misuse of algorithms by attackers and poses potential threats to user/patient security in biomedical AI systems. Moreover, biomedical systems usually leverage third-party cloud platforms due to scalability, storage, and performance benefits, and privacy compromises are also likely to happen in such situations unless secure sharing schemes and suitable encryption techniques are devised. The problem of security also arises in the case of data integration and adoption that is needed to develop large-scale biomedical expert systems. Some ways in which user data privacy can be jeopardized include indirect data leakage, data poisoning (i.e., including fake data samples in the training set to drastically change the accuracy), linkage attacks (i.e., recognizing the actual identities of anonymized users), dataset reconstruction from published results, adversarial examples (i.e., adding noise to data to mislead the algorithm), transferability attacks, model theft, etc. These are pertinent questions from the perspective of general machine learning security, and are even more important in the biomedical domain, considering the stake of human life and health. As far as biomedical AI systems are concerned, traditional security and privacy mechanisms are not suitable to be adopted, due to the ever-changing nature of research and the complexity of medical data. To counter such security threats, many studies have suggested the adoption of a set of best practices when working with biomedical data, and also to ensure the optimal use of predictive models in research, especially to discourage inadequate studies with inaccurate results that may compromise the credibility of important and valid research in the field. There have been recent studies pointing to the practice of keeping training data private while simultaneously building accurate AI models. Two important techniques, in this case, are differential privacy and federated learning, which may serve as potential solutions for the problem. To counter linkage attacks and security threats of a similar sort, which is especially important in the wake of healthcare services in mobile devices that can compromise user identity and location data, recent studies have suggested private record linkage and
Security and Privacy Issues in Biomedical AI Systems and Potential Solutions 291

entity resolution techniques, such as deriving unique fingerprints from genomes to preserve patient identity. Finally, it is extremely important to test AI models in real-time clinical situations (which are often complex and noisy) to further understand the fragility of such models and where their vulnerabilities can be exploited, so that better security schemes can be devised to counter the problem. After all, it is always essential to understand the problem thoroughly to come up with actual and effective solutions. In conclusion, addressing the problem of security and privacy in biomedical AI systems is complex, multidisciplinary, and also involves ethical and legal perspectives. As newer and better machine learning and deep learning algorithms are devised to tackle the problems in the healthcare and medical domain, newer security threats will also emerge. We are confident that research in this field will result in quality solutions to achieve the true balance between performance and privacy that is conducive to users and patients in the healthcare and biomedical domains.

13.1 INTRODUCTION

Machine learning and artificial intelligence (AI) have been rapidly progressing in several fields, and one of these is healthcare and biomedical systems, where AI and deep learning algorithms have shown massive success for various applications and use cases, such as virtual healthcare assistants, smart medical homes, automated diagnosis, processing of pathology reports, drug discovery, implantable medical devices, and many more. Thus, as biomedical systems become increasingly dependent on technology, and especially on AI and data analytics, artificially intelligent systems continue to surpass the state of the art in terms of performance on complex clinical problems, and healthcare surveillance has become easier and better. AI can reshape the healthcare workforce and bring about a drastic positive change in the understanding and handling of biomedical data in automated systems.
However, patient concerns regarding their sensitive medical information, such as medical history, genetic markers, etc., that is fed into AI systems to make predictions about treatment or medical diagnosis, or given to third-party services or less-trustworthy smaller organizations (Shokri et al., 2017), are likely to rise. As it is, deep learning techniques may be at risk of learning and memorizing confidential information (a failure often termed overfitting) instead of generalizing well to it, which is, in fact, their main aim. On top of that, confidential medical data are already available

on public platforms and Internet-based research sites, encouraged by open data policies in present times. Although the sharing of healthcare data is extremely important for research purposes, privacy and security also become key concerns in the wake of medical fraud and AI-based attacks.
Hence, the need for improved privacy-preserving techniques and systems to overcome the variety of security and privacy risks for users and patients in biomedical AI systems becomes extremely pertinent.

13.2 BIOMEDICAL AI SYSTEMS

13.2.1 AI IN BIOMEDICAL DATA

Biomedical data is inherently complex and involves sensitive and private, often confidential, information of patients. Moreover, biomedical and healthcare data may be of several types and, more often than not, contain a lot of unstructured data, upon which it is even more difficult to carry out conventional privacy-preserving analytics (Ronneberger et al., 2015).
In present times, AI systems are being increasingly used for medical diagnosis and overall healthcare surveillance and, thus, for the benefit of patients and medical practitioners (for the purpose of furthering research) alike. However, in this regard, we would like to choose the types of AI techniques that do not memorize confidential patient data from the dataset, like private biographical details or specific medical histories.
For some reference, the difference between "security" and "privacy"—two key terms that are of interest in this chapter—must be understood first to be able to fully grasp the threats and their possible solutions. Abouelmehdi et al. suggest that while privacy refers to the appropriate utilization of patient information and the authority to decide and restrict, if needed, the flow of that information, security refers to the fact that such decisions should be followed with due respect to all parties involved, and it strives to protect the data from external attacks by malicious actors.
Apart from the usual advances in AI for different tasks like image classification, there have also been great advances in research pertaining to AI systems capable of handling complex biomedical data. One of the most important and widely used examples is the U-net, an architectural variant of convolutional neural networks developed by Ronneberger et al. (2015) and used widely in biomedical image segmentation, which has received several best

scores and surpassed the state-of-the-art examples in challenges related to biomedical data.

13.2.2 VULNERABILITIES OF BIOMEDICAL AI SYSTEMS

It is important to understand and pinpoint the vulnerabilities of biomedical AI systems to realize the security threats at large and also how to overcome or resist them. The unique features and complexities of biomedical data, which we have discussed in the previous section, not only make the use cases more interesting and complex but also vulnerable to potential security and privacy issues in AI systems.
Often, the handling of clinical data requires more context than the standard for usual applications of deep learning algorithms, such as patient history, patient preferences, social perspectives, etc., which, in turn, makes it more vulnerable to security threats by malicious attackers. Moreover, the "black box" nature of deep learning algorithms results in a lack of model interpretability and gives rise to confusion regarding exactly how an AI model can achieve the kind of performance it does. It is, therefore, not easy to exactly identify the weaknesses of the model or the reasons for the weakness, or even to extract additional biological explanations from the results. This leads to potential misuse of algorithms by attackers and poses potential threats to user/patient security in biomedical AI systems. Moreover, biomedical systems usually leverage third-party cloud platforms due to scalability, storage, and performance benefits, and privacy compromises are also likely to happen in such situations, due to the confidential information being stored in the cloud, unless secure sharing schemes and suitable encryption techniques are devised. The problem of security also arises in the case of the data integration and adoption that is needed to develop large-scale biomedical expert systems.
Thus, it is important for researchers and AI engineers to understand how the sharing of sensitive patient data works. For example, to prevent or reduce the effect of inference attacks by adversaries, the use of polyinstantiation techniques is helpful, separating the dataset into smaller sets and developing "data silos" to avoid disclosing the whole data.

13.3 SECURITY AND PRIVACY ISSUES IN BIOMEDICAL AI SYSTEMS

There are several security and privacy risks associated with biomedical AI systems, and as research advances in this area, the probability of

privacy breaches and attacks on sensitive patient data will increase, with more advanced techniques of attack. Commonly observed types of attacks and security issues are discussed in this section. Identity theft (Elmisery et al., 2010) and blackmailing schemes, coupled with commonly occurring cases of healthcare reimbursement fraud, are some of the problems that plague biomedical systems with respect to patient privacy.
More often than not, hospitals, which are stores of highly confidential and immutable biomedical data, are targets of data breaches and data stealing, which may be used to commit medical insurance fraud, etc., by malicious actors in intended acts of cybercrime.
In recent times, with the rapid proliferation of deep learning and AI and their increased application in the biomedical field and healthcare use cases, AI is being used to combat such cybersecurity threats by proposing novel mechanisms for biomedical systems to defend themselves. For example, deep Q-networks (a deep learning technique) were studied for their use in minimizing malware attacks on medical Internet of Things-based devices, and a privacy-preserving online medical diagnosis system may be built with nonlinear SVMs, a machine learning technique. However, attackers can also exploit AI, using AI-based security threats, like the adversarial examples discussed later, to hack the system and pose more dangerous security threats to patients and practitioners alike. Nowadays, with the proliferation of ubiquitous healthcare and the need for constant automated sensor-based monitoring of patients' health in real time, vulnerabilities can also be exposed in biomedical wearable and implantable devices (Gu et al., 2014) that contain a trove of biometric and sensor-based physiological data of the wearer, which may face cybersecurity failure and attract lethal security threats to the wearer. Mobile healthcare networks are related in this regard, and security and privacy issues also plague such devices from the perspective of "quality of protection." Thus, most importantly, the healthcare and biomedical sectors utilizing AI mechanisms are very vulnerable to privacy and security threats. This is potentially dangerous, as the lives of innumerable patients are at stake.

13.3.1 ADVERSARIAL ATTACKS

Adversarial attacks (Bos et al., 2016; Konečný et al., 2016a) are very common security issues in the machine learning field. Studies have shown that "adversarial fooling of neural networks" may happen or may be caused by malicious actors,

wherein very small, almost imperceptible (to human sense) changes are made in the data to make the neural network classifier misclassify the data, that is, make the wrong prediction. In the case of biomedical AI systems, this poses a very big threat to medical practitioners and, more importantly, patients; for mere modification of some pixels in medical images (Alnemari et al., 2017) may cause the deep learning algorithm/neural network to predict a benign tumor as malignant or a malignant one as benign, both of which cases are very unfortunate and life-threatening, as they can cause the wrong diagnosis and wrong treatment. Moreover, since such types of attacks are very subtle due to the imperceptible noise added to fool the network, detecting the presence of an error is hard. Celik et al. (2017) have also noted that such adversarial attacks may guide the attacker toward the susceptible locations in biomedical images, which may be distorted to cause the system to misclassify. Such identification of vulnerable pixels in images may then be applied to obtain a "susceptibility score" to alert biomedical systems and make them more resistant to such attacks.
Finlayson et al. (2018) have pointed at several factors in the specific case of medical data that make biomedical AI systems more vulnerable to adversarial attacks, e.g., ambiguous ground truth due to disagreement among human medical specialists, and the dearth of diversity in the neural network architectures utilized. The authors also noted that adversarial patch attack techniques are more powerful as well as universal. IEEE Spectrum (Abouelmehdi et al., 2018) also reports that there may be a lot of "incentives" behind such attacks, e.g., existing cases of healthcare fraud and the enormous revenue generated by the global healthcare economy (Abuwardih et al., 2016), which makes the situation even more threatening. Some studies indicate that ethical hacking can come to our help in this regard as well, since the expertise of ethical hackers lies in feigning attacks on the data or the system to ultimately understand how to protect the system against real attackers.
Studies have suggested that careful auditing of biomedical AI systems and several rounds of testing by cybersecurity specialists and AI experts can help detect such vulnerabilities in the system. There is also the need to develop algorithmic and infrastructural defense mechanisms against these adversarial attacks (Alnemari et al., 2017). In the long run, we must prioritize the development of robust machine learning/deep learning models that are resistant, or not susceptible, to such kinds of attacks (Bose, 2016).
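To make the mechanics concrete, the following minimal sketch applies a perturbation in the style of the well-known fast gradient sign method (a standard attack not detailed in this chapter). The linear "tumor classifier," its weights, and the perturbation size are all hypothetical; real attacks on deep image models use far smaller per-pixel changes, which is what makes them imperceptible:

```python
import numpy as np

# Hypothetical linear "tumor classifier": score > 0 -> malignant (1),
# otherwise benign (0). Real attacks target deep networks, but the
# mechanics are the same.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return int(x @ w + b > 0)

x = np.array([0.5, 1.0, 0.2])   # a sample the model calls benign

# FGSM-style step: for a linear model, the gradient of the score with
# respect to the input is just w, so nudging every feature by
# eps * sign(gradient) pushes the score upward as fast as possible
# under a bounded per-feature change.
eps = 0.9
x_adv = x + eps * np.sign(w)

print(predict(x))      # benign (0)
print(predict(x_adv))  # flipped to malignant (1) by a bounded nudge
```

The key point of the sketch is that no feature changed by more than `eps`, yet the prediction flipped; in a high-dimensional medical image, thousands of such tiny nudges accumulate into a large score change while remaining invisible to a human reader.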

Other security threats may be privacy breaches—considering that the medical information of a patient is and should be confidential, the breach of privacy through various intentional means results in the vulnerability of the patients' data being leaked into the public domain, which is not only unethical but also poses a danger to the patients' lives. Data poisoning techniques include model skewing, the weaponization of feedback, etc. (Cooper and Elstun, 2018).

13.3.2 DATA POISONING AND MODEL THEFT

Mozaffari-Kermani et al. (2015) studied systematic data poisoning (Chen et al., 2015) of machine learning systems related to healthcare, which may introduce either targeted errors or arbitrary errors, and the measures that may be taken to counter them. Targeted errors refer to the manipulation of the algorithm to yield results pertaining to a specific label or class (predetermined by the malicious actor in cases of fraud).
Another type of attack is model theft or model stealing (Juuti et al., 2018; Tramèr et al., 2016; Wang et al., 2018). It refers to the technique of duplicating machine learning and deep learning models, via membership leakage of data points (to understand whether a certain data point belongs to the training set or not) or by extracting information, for example, the hyperparameters tuned on a confidential model, in order to reconstruct the model. This poses a high threat to security, as it could disclose patient data or allow the attacker to gain enough insight into the model to attack it efficiently. With respect to data leaks, appropriate leakage detection techniques also need to be devised; leak detection may be carried out along with privacy-guaranteeing methods.

13.3.3 OTHER TECHNIQUES OF PRIVACY THREATS

There may be a variety of other threats, related or similar to adversarial examples or otherwise, which may be equally harmful to biomedical AI systems.
Transferability attacks have been studied by Papernot et al. (2016) and many other researchers in relation to the transferability of adversarial examples and how that can harm the model and breach privacy. Tracing attacks refer to the technique of deducing that a data point is in the targeted dataset just by gaining API access to a deep learning model learned on it, which is very dangerous.
Dataset reconstruction may be exploited in a different way

altogether, in this regard, that is, to build privacy-preserving techniques to protect sensitive biomedical data. Reconstruction attacks may also be staged to harm the model (Nasr et al., 2018).
Linkage attacks are another important type of attack on AI systems. Data linkage refers to the idea that data in the public domain can be easily related to sensitive information (say, confidential patient data) that can, thus, be accessed by malicious actors. The linkage can be of several types, such as attribute linkage or table linkage (Kieseberg et al., 2014).
To counter linkage attacks and security threats of a similar sort, which is especially important in the wake of healthcare services in mobile devices that can compromise user identity and location data, recent studies have suggested private record linkage and entity resolution techniques, such as deriving unique fingerprints from genomes to preserve patient identity. On the other hand, inference attacks (Shokri et al., 2017; Nasr et al., 2018) are data mining techniques wherein sensitive and robust information can be deduced or "inferred" from trivial information in a database with a high confidence value by malicious actors, hence the name. Membership inference attacks are made by adversaries by querying whether a specific example was a part of the training data, or by querying the summary statistics to reveal the underlying distribution and other useful statistical features (Shokri et al., 2017), and, thus, exploiting the predictions made by the AI model or computing some sensitive attribute of the dataset in question. The attack model may be trained with the help of what are called "shadow models," which are themselves trained on either real or artificially generated fake data, or both. The attack model is then trained on the outputs of such shadow models—it has been suggested in some studies that the greater the number of shadow models, the better the accuracy, though there is a cost disadvantage in such situations; however, such claims have not yet been supported with valid proofs. The difference between such types of attacks and reconstruction attacks (described above) is that they work even when the specific example does not belong to the membership set.

13.4 POSSIBLE SOLUTIONS TO SECURITY AND PRIVACY ISSUES IN BIOMEDICAL AI SYSTEMS

13.4.1 GENERAL TECHNIQUES

General techniques of database security, such as auditing, that is, rechecking for detection of

anomalous activity, bug reporting, and authentication, may be utilized for biomedical AI systems as well. However, given the nature of the issue and the threat to human life and the very private information of patients, newer advanced algorithms must be exploited in this regard.
Encryption mechanisms must also be studied in this regard, both conventional and advanced techniques like homomorphic encryption, which is especially useful for ensuring privacy in cloud-based biomedical AI systems, as it allows computations on data without the need to decrypt it first. Sensitive patient information may be encrypted before being uploaded to the cloud service via suitable encryption schemes. Bos et al. (2014) studied the particular case of cardiovascular patient data and encryption-based privacy-preserving methods, with considerable accuracy and with the result that the cloud has no knowledge of the encrypted data.
Authentication techniques have also been widely researched, especially with respect to biomedical systems (He et al., 2015; Wu et al., 2018; Punithavathi et al., 2019), such as hashing techniques and Kerberos (Valdez and Ziefle, 2019). Access control methods like role-based access control and attribute-based access control are widely used conventional techniques in this regard.

13.4.2 DIFFERENTIAL PRIVACY AS A SOLUTION TO SECURE BIOMEDICAL AI MODELS

Secure AI models dealing with biomedical data can be designed using the idea of a probabilistic theory called differential privacy (Abadi et al., 2016; Dwork et al., 2015; Ji et al., 2015), which extends its general idea from the principles of cryptography and, of course, mathematics and, in some cases, even game theory, and deals with the idea of quantifying the notion of privacy.
In the words of Papernot and Goodfellow, "Differential privacy is a framework for measuring the privacy guarantees provided by an algorithm." Differential privacy (DP) can be added as a constraint to any algorithm; for example, differentially private stochastic gradient descent (DP-SGD) can ensure that no information extracted from a database can be uniquely attributed to specific users. The basic SGD algorithm that is commonly used in deep learning systems for training the neural network may be modified to make it DP, such as by clipping and randomizing the gradients. The differential privacy

algorithm/technique provides a "privacy guarantee" and a "privacy budget," which are related by the idea that if the privacy budget is smaller, the privacy guarantee will be larger, which is advantageous from a security point of view.
Experts have also pointed towards the composition property of this technique, meaning the privacy loss of any AI technique applied to a biomedical system can be added over subsequent queries made to the training data, making it easier to estimate the total privacy loss. These properties make differential privacy so powerful and an important benchmark in privacy-preserving experiments.
Several variations of this concept exist; other than pure differential privacy, there exist useful variants like Rényi differential privacy (Mironov et al., 2017; Geumlek et al., 2017). Another technique for designing DP algorithms is the exponential mechanism (McSherry et al., 2007). The ideas of "local differential privacy" versus centralized differential privacy, "multiparty differential privacy," and differential privacy with interactive mechanisms have also been studied in this regard. Another example of an adaptive algorithm based on differential privacy has been suggested by Alnemari et al. (2017), where it is shown that the partitioning technique can help improve the privacy guarantee. Also, considering the sensitivity of the queries prior to making predictions is shown to be essential in this regard.
Moreover, the private aggregation of teacher ensembles (PATE) algorithm, based on the idea of differential privacy, is worth mentioning here; it has been studied extensively, with some studies suggesting variations and extensions of the original algorithm (Papernot et al., 2018; Jordon et al., 2018), which is discussed in the later sections.
Privacy concerns are everywhere, and more so in AI systems, but are amplified when it comes to AI systems for clinical surveillance dealing with biomedical data, because of the specific vulnerabilities of this type of data.
Biomedical AI systems such as healthcare recommender services (Valdez and Ziefle, 2019) need to use privacy-aware recommendation techniques, which are conducive to the sparsity of data commonly encountered in this regard. Valdez and Ziefle (2019) have studied the user's, that is, the patient's, point of view in this regard, which is often neglected in contemporary research.
Differential privacy has been used successfully in the context of healthcare systems in previous studies (Lin et al., 2016; Dankar and El Emam, 2012), such as for sensor data in body area networks.
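The budget/guarantee trade-off described above can be sketched with the classical Laplace mechanism for a count query, a standard DP building block; the cohort query and the epsilon values below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def private_count(true_count, epsilon):
    """Release a count query answer with epsilon-differential privacy.

    Adding or removing one patient record changes a count by at most 1
    (sensitivity = 1), so Laplace noise of scale 1/epsilon suffices.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical query: "How many patients in this cohort are diabetic?"
true_count = 120
strict = [private_count(true_count, epsilon=0.1) for _ in range(1000)]
loose = [private_count(true_count, epsilon=5.0) for _ in range(1000)]

# A smaller budget (stronger guarantee) yields much noisier answers.
print(np.std(strict), np.std(loose))
```

Under basic sequential composition, the epsilons of repeated queries add up to the total privacy loss, which is why the repeated-query weakness of differential privacy noted in the text is a real concern in practice.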

However, like all other good techniques, it also has a few limitations. For example, one such disadvantage is that an adversary may be able to estimate the sensitive information that we are trying to protect if repeated queries, that is, multiple attempts, are made, and a privacy breach may then happen quite easily.

13.4.3 PRIVACY PRESERVATION AS A SOLUTION TO SECURE BIOMEDICAL AI MODELS

13.4.3.1 PRIVACY-PRESERVING DATA MINING (PPDM)

Privacy-preserving data mining (PPDM), and the similar concept of privacy-preserving machine learning, are important paradigms that must be understood and utilized with respect to security threats in biomedical AI systems. Privacy-preserving techniques have already been utilized in biomedical use cases, especially those involving distributed computing, such as cluster analysis of healthcare data (Fung et al., 2018).

13.4.3.2 DATA PERTURBATION

Data perturbation involves the addition of random noise, which makes it harder for the adversary to attack the sensitive patient data. These techniques include additive and multiplicative noise modeling as well as other techniques like geometric and random space perturbation.

13.4.3.3 ANONYMIZATION

Appropriate anonymization techniques (Szarvas et al., 2007; Shin et al., 2018), that is, those dealing with the removal of private information and also the linking of information in such cases, must be applied to ensure the protection of patients' confidential information in biomedical data. Examples of such techniques include the FAST algorithm (Mohammadian et al., 2014) and identity-based anonymization (Abouelmehdi et al., 2018).
However, anonymizing data is not always sufficient, and the privacy it provides quickly degrades as adversaries obtain auxiliary information about the individuals represented in the dataset. A much-cited example of such circumstances was studied by Narayanan and Shmatikov (2008) in relation to "breaking the anonymity" of a Netflix dataset. The authors experimented with the fact that privacy breaches may occur by way of the revelation of anonymized identities via linkage attacks.
A technique to achieve anonymization is anatomization, which deals with splitting the data into

separate records and rendering the linkage ambiguous.
In pseudonymization, the data linkage concern described above is addressed by using pseudonyms (of reversible or irreversible types), such as cryptographic keys, hashes, or other products of encryption techniques, as a link between private and public data that can be reidentified only by authorized people (Kieseberg et al., 2014).
The broader idea related to this can be traced to traditional data deidentification and data reidentification processes that prohibit data leakage by employing various techniques.
Data reidentification is the technique of matching deidentified or anonymous data with public or auxiliary data to identify the owners via different techniques like generalizing, masking, etc. The likelihood of reidentification is a factor in determining the probability that the patient's information has been compromised. The probability of reidentification can also be determined by the uniqueness factor, which may be estimated by several methods. Dankar et al. (2012) have studied different uniqueness estimators, such as Pitman's estimator and the Zayatz estimator, on biomedical datasets and concluded that an appropriate "decision rule" developed by the combination of different such estimator mechanisms may be able to achieve the best possible accuracy in the predictions.
However, these may also harm biomedical AI systems, as it is thought that if institutions release the data after deidentification, it may be in the public domain and susceptible to adversary attacks. Often, healthcare data, in the form of electronic health records or otherwise, is encouraged by organizations and government initiatives based on open data policies to be shared and disseminated publicly to help low-cost medical research and investigations. This gives rise to privacy attacks and risks based on linkage and reidentification by prospective attackers. Some studies (Loukides et al., 2014) have experimented with dissociation to prevent wrongful data linkage in such cases, with lower privacy loss than other existing methods.
Some techniques to combat the disadvantages of reidentification and deidentification are more uniform standards related to deidentification, and deidentification with appropriate privacy-preserving techniques for the same. Proper rules and legislation need to be passed to prevent reidentification from being used for malicious purposes.
However, some studies tend to dispel this notion, as the risk of reidentification by adversaries may be low in most instances.
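A minimal sketch of irreversible pseudonymization with a keyed hash (HMAC) follows; the secret key and the record fields are purely illustrative. Unlike a plain hash of the identifier, the keyed variant cannot be reversed simply by enumerating candidate patient IDs unless the key itself is also compromised:

```python
import hmac
import hashlib

# Illustrative escrow key; in practice, held only by a trusted authority.
SECRET_KEY = b"hospital-escrow-key"

def pseudonymize(patient_id: str) -> str:
    """Map a direct identifier to a stable, irreversible pseudonym.

    The same patient always yields the same pseudonym, so records can
    still be linked for research without exposing the raw identifier.
    """
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

record = {"patient_id": "MRN-004211", "diagnosis": "hypertension"}
released = {"pseudonym": pseudonymize(record["patient_id"]),
            "diagnosis": record["diagnosis"]}
print(released)
```

Because the mapping is deterministic, this is pseudonymization rather than full anonymization: as the surrounding text cautions, linkage through quasi-identifiers in the remaining fields can still reidentify patients, so it must be combined with the other safeguards discussed in this section.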

k-Anonymity (Sweeney, 2002) is a concept used frequently in relation to PPDM. Related concepts include p-sensitive anonymity (Wibowo, 2018; Cooper and Elstun, 2018); l-diversity (Tu et al., 2019; Machanavajjhala et al., 2006), a group-based anonymization technique that focuses on decreasing the “granularity” of the dataset; t-closeness, which is basically an improvement over the former; and quasi-identifiers, attributes that may have linkage with an external dataset (Veličković et al., 2017). However, these methods are not foolproof, and we often need to look at newer techniques like differential privacy to ensure maximal privacy and security. For example, in situations where one has auxiliary information from secondary sources, these methods fail. They also tend to overfit or overgeneralize, resulting in misleading predictions, which is very harmful to biomedical AI systems. A relevant study in this regard is Pycroft and Aziz (2018), where the idea of k-anonymity has been improved by introducing “semantic linkage k-anonymity” to balance privacy loss and accuracy. However, it is not advisable to use anonymization techniques in such situations; rather, differential privacy would be a good weapon against adaptive attacks on biomedical data.

13.4.4 FEDERATED LEARNING AS A SOLUTION TO SECURE BIOMEDICAL AI MODELS

Federated learning (Konečný et al., 2016b) is a decentralized way of learning a model collaboratively, that is, by several clients among which no training data is exchanged. The original idea behind federated learning stems from the premise of this concept being utilized in mobile phones, such as predictive typing for mobile keyboards being dependent on the particular user’s history and not on the global users’ history.

Moreover, the concept of differential privacy, as mentioned in the previous subsection, can be integrated with the idea of federated learning, as proposed by Geyer et al. (2017). In techniques concerned with “differential privacy-preserving federated optimization” (Geyer et al., 2017), a breach of privacy in the data is less likely, as the loss of privacy and the model performance are balanced better.

This manner of training AI or deep learning models in a privacy-conscious fashion has brought about a paradigm shift in the way machine learning is traditionally done, which usually fails to treat privacy as a priority issue, and it must be duly extended and employed for biomedical systems.
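One round of the federated idea described above can be sketched as follows. This is a deliberately simplified illustration (clip each client's update, average, add noise), not Geyer et al.'s actual algorithm; the model size, clipping bound, and noise scale are arbitrary placeholder values:

```python
import random

random.seed(0)  # deterministic noise for the illustration

def clip(update, bound):
    """Scale a client's model update down so its L2 norm is at most `bound`."""
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, bound / norm) if norm > 0 else 1.0
    return [u * scale for u in update]

def dp_federated_round(global_model, client_updates, clip_bound=1.0, noise_std=0.1):
    """One round of federated averaging: clients send model updates (never raw
    training data); the server clips each update, averages them, adds Gaussian
    noise, and applies the result to the global model."""
    clipped = [clip(u, clip_bound) for u in client_updates]
    n = len(clipped)
    avg = [sum(components) / n for components in zip(*clipped)]
    noisy = [a + random.gauss(0.0, noise_std / n) for a in avg]
    return [w + d for w, d in zip(global_model, noisy)]

# Three hypothetical clients computed updates on their own local data.
model = [0.0, 0.0]
updates = [[0.9, -0.3], [1.4, 0.2], [0.7, -0.1]]
model = dp_federated_round(model, updates)
print(model)  # the global model moved by the noisy, clipped average
```

Clipping bounds any single patient's influence on the aggregate, and the added noise masks what remains, which is how the privacy loss and model performance trade off against each other.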
Security and Privacy Issues in Biomedical AI Systems and Potential Solutions 303

13.5 BEST PRACTICES TO ENSURE SECURITY AND PRIVACY IN BIOMEDICAL AI SYSTEMS AND FUTURE DIRECTIONS

To counter such security threats, many studies have suggested the adoption of a set of best practices when working with biomedical data, and also to ensure the optimal use of predictive models in research, especially to discourage inadequate studies with inaccurate results that may compromise the credibility of important and valid research in the field.

Moreover, working with regulators is important, and emphasizing the aspect of regulatory affairs in biomedical AI will help in assessing the reliability of machine learning and deep learning techniques in this field.

With regard to advanced technologies such as AI finding use in biomedical systems, a few important factors like regulation compliance, safety assurance, and risk identification and management are also cropping up, as is commonly done in other industries. Moreover, from the software perspective, developers must analyze the “data security lifecycle” for efficient decision-making and also to ensure the other factors central to a project, like software reuse, cost-effectiveness, budget issues, etc. (Abouelmehdi et al., 2018). Due to the sensitive nature of the subject, it is also essential for institutions to recognize and respect the various data protection laws that exist with regard to maintaining and safeguarding legal and ethical responsibilities toward the patients (such laws vary from country to country, for example, the HIPAA Act and the Patient Safety and Quality Improvement Act in the USA and the IT Act in India).

It is extremely important to test AI models in real-time clinical situations, which are often complex and noisy, to further understand the fragility of such models and where their vulnerabilities can be exploited, so that better security schemes can be devised to counter the problems. After all, it is always essential to understand the problem thoroughly to come up with actual and effective solutions.

13.6 CONCLUSION

Biomedical AI systems suffer from the threats of security and privacy risks in various capacities due to the complexities and vulnerabilities of clinical data, as well as AI-based attacking techniques gaining advancement over normal cases of healthcare fraud. The users or patients are at stake in this regard the most, and such attacks may even prove fatal to them because such an attack

misleads AI systems into making wrong predictions regarding the diagnosis of a disease.

This problem must be looked at from several different perspectives, the most important one being the patient’s point of view: whether the patient is willing to have sensitive personal medical data shared or not, and whether they fear the risk of reidentification from publicly shared data records. Studies have been carried out in this regard, such as the one by Ziefle and Valdez (2018), where patients’ decisions regarding whether or not to share their private medical data for scientific purposes were investigated and analyzed from various perspectives, like the age of the patient and the nature of the illness. Studies in this direction must be encouraged and adopted more widely by researchers and healthcare institutions alike, to understand and respect the perception of privacy by patients and users, who are most at stake in biomedical AI systems suffering from a lack of security and privacy. However, patients often withhold sensitive information, which does not help AI-based biomedical systems either. In this regard, some studies have suggested other techniques like privacy distillation, which gives patients control over the amount of personal sensitive data that is fed to these systems (Celik et al., 2017).

In this chapter, we have outlined the various ranges of security and privacy threats to biomedical AI systems, such as linkage attacks, inference attacks, adversarial examples, and so on. Similarly, solutions to such problems have been discussed, from conventional techniques, like auditing, to newer advancements in research, like differential privacy and federated learning.

It is to be reiterated that practically usable biomedical and healthcare systems utilizing AI techniques have a long way to go with regard to security and privacy issues, and that advances in research do not necessarily translate into advancements or improvements in the systems that patients and medical practitioners make use of in real life. There are always practical challenges in applying the research directions and results to privacy-preserving real-time systems, but efforts are being made in the right direction (Sharma et al., 2018).

However, it must be understood that ideal privacy, or an absolute guarantee of patient privacy, is impossible to achieve; hence, the focus should be on maximizing the security and privacy guarantee, prioritizing users, that is, patients, and balancing the tradeoff between performance/accuracy and privacy loss.

Hence, not only must research in this regard be prioritized (which is, of course, the dominant aspect, since it drives us toward newer advanced algorithms to fight privacy breaches), but a diverse and multidisciplinary range of professionals, belonging to fields ranging from software, risk management, and cybersecurity to regulatory affairs, machine learning validation, and database applications, must also be welcomed to truly succeed in ensuring security and privacy guarantees in biomedical AI systems. However, research must be given prime importance. We are confident that suitable advancements in research in this field will result in quality solutions to achieve a true balance between performance and privacy that is conducive to users and patients in the healthcare and biomedical domains.

In conclusion, addressing the problem of security and privacy in biomedical AI systems is complex and multidisciplinary, and it also involves ethical and legal perspectives. As newer and better machine learning and deep learning algorithms are devised to tackle the problems in the healthcare and medical domain, newer security threats will also emerge.

KEYWORDS

• machine learning security
• privacy
• differential privacy
• adversarial examples

REFERENCES

Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H. B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of ACM SIGSAC Conference on Computer and Communications Security, 2016, 308–318.
Abouelmehdi, K.; Beni-Hessane, A.; Khaloufi, H. Big healthcare data: preserving security and privacy. Journal of Big Data, 2018, 5(1), 1.
Abuwardih, L. A.; Shatnawi, W. E.; Aleroud, A. Privacy preserving data mining on published data in healthcare: A survey. In Proceedings of the IEEE 7th International Conference on Computer Science and Information Technology (CSIT), 2016, 1–6.
Alnemari, A.; Romanowski, C. J.; Raj, R. K. An adaptive differential privacy algorithm for range queries over healthcare data. In Proceedings of IEEE International Conference on Healthcare Informatics (ICHI), 2017, 397–402.
Bos, J. W.; Lauter, K.; Naehrig, M. Private predictive analysis on encrypted medical data. Journal of Biomedical Informatics, 2014, 50, 234–243.
Bose, R. Intelligent technologies for managing fraud and identity theft. In Proceedings of 3rd International Conference on Information Technology:

New Generations (ITNG'06), 2006, 446–451.
Celik, Z. B.; Lopez-Paz, D.; McDaniel, P. Patient-driven privacy control through generalized distillation. In Proceedings of IEEE Symposium on Privacy-Aware Computing (PAC), 2017, 1–12.
Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S. S.; Brox, T.; Ronneberger, O. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2016, 424–432.
Cooper, N.; Elstun, A. User-controlled generalization boundaries for p-sensitive k-anonymity, 2018.
Dankar, F. K.; El Emam, K. The application of differential privacy to health data. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, ACM, 2012, 158–166.
Dankar, F. K.; El Emam, K.; Neisa, A.; Roffey, T. Estimating the re-identification risk of clinical data sets. BMC Medical Informatics and Decision Making, 2012, 12(1), 66.
Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4), 2014, 211–407.
Elmisery, A. M.; Fu, H. Privacy preserving distributed learning clustering of healthcare data using cryptography protocols. In Proceedings of IEEE 34th Annual Computer Software and Applications Conference Workshops, 2010, 140–145.
Finlayson, S. G.; Chung, H. W.; Kohane, I. S.; Beam, A. L. Adversarial attacks against medical deep learning systems. arXiv:1804.05296, 2018.
Fung, C.; Yoon, C. J.; Beschastnikh, I. Mitigating Sybils in federated learning poisoning. arXiv:1808.04866, 2018.
Geumlek, J.; Song, S.; Chaudhuri, K. Renyi differential privacy mechanisms for posterior sampling. In Proceedings of Advances in Neural Information Processing Systems, 2017, 5289–5298.
Geyer, R. C.; Klein, T.; Nabi, M. Differentially private federated learning: A client level perspective. arXiv:1712.07557, 2017.
Goodfellow, I. J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv:1412.6572, 2014.
Gu, S.; Rigazio, L. Towards deep neural network architectures robust to adversarial examples. arXiv:1412.5068, 2014.
He, D.; Kumar, N.; Chen, J.; Lee, C. C.; Chilamkurti, N.; Yeo, S. S. Robust anonymous authentication protocol for health-care applications using wireless medical sensor networks. Multimedia Systems, 21(1), 2015, 49–60.
Jordon, J.; Yoon, J.; van der Schaar, M. PATE-GAN: Generating synthetic data with differential privacy guarantees, 2018.
Juuti, M.; Szyller, S.; Dmitrenko, A.; Marchal, S.; Asokan, N. PRADA: Protecting against DNN model stealing attacks. arXiv:1805.02628, 2018.
Kieseberg, P.; Hobel, H.; Schrittwieser, S.; Weippl, E.; Holzinger, A. Protecting anonymity in the data-driven medical sciences. Interactive Knowledge Discovery and Data Mining: State-of-the-Art and Future Challenges in Biomedical Informatics, Springer Lecture Notes in Computer Science LNCS 8401, 2014, 303–318.
Kohl, J.; Neuman, C. The Kerberos network authentication service (V5) (No. RFC 1510), 1993.
Konečný, J.; McMahan, H. B.; Ramage, D.; Richtárik, P. Federated optimization: Distributed machine learning for on-device intelligence. arXiv:1610.02527, 2016a.
Konečný, J.; McMahan, H. B.; Yu, F. X.; Richtárik, P.; Suresh, A. T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv:1610.05492, 2016b.
Lacharité, M. S.; Minaud, B.; Paterson, K. G. Improved reconstruction attacks on encrypted data using range query leakage.

In Proceedings of IEEE Symposium on Security and Privacy (SP), 2018, 297–314.
Li, N.; Li, T.; Venkatasubramanian, S. t-Closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of IEEE 23rd International Conference on Data Engineering, 2007, 106–115.
Li, X.; Qin, J. Protecting privacy when releasing search results from medical document data. In Proceedings of 51st Hawaii International Conference on System Sciences, 2018.
Lin, C.; Song, Z.; Song, H.; Zhou, Y.; Wang, Y.; Wu, G. Differential privacy preserving in big data analytics for connected health. Journal of Medical Systems, 2016, 40(4), 97.
Loukides, G.; Liagouris, J.; Gkoulalas-Divanis, A.; Terrovitis, M. Disassociation for electronic health record privacy. Journal of Biomedical Informatics, 2014, 50, 46–61.
Lu, Y.; Sinnott, R. O.; Verspoor, K.; Parampalli, U. Privacy-preserving access control in electronic health record linkage. In Proceedings of the 17th IEEE International Conference on Trust, Security and Privacy/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), 2018, 1079–1090.
Machanavajjhala, A.; Gehrke, J.; Kifer, D.; Venkitasubramaniam, M. l-Diversity: Privacy beyond k-anonymity. In Proceedings of IEEE 22nd International Conference on Data Engineering (ICDE'06), 2006, 24–24.
McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of Foundations of Computer Science (FOCS), vol. 7, 2007, 94–103.
Mironov, I. Rényi differential privacy. In Proceedings of IEEE 30th Computer Security Foundations Symposium (CSF), 2017, 263–275.
Mohammadian, E.; Noferesti, M.; Jalili, R. FAST: Fast anonymization of big data streams. In Proceedings of ACM International Conference on Big Data Science and Computing, 2014, 23.
Mozaffari-Kermani, M.; Sur-Kolay, S.; Raghunathan, A.; Jha, N. K. Systematic poisoning attacks on and defenses for machine learning in healthcare. IEEE Journal of Biomedical and Health Informatics, 19(6), 2015, 1893–1905.
Narayanan, A.; Shmatikov, V. Robust de-anonymization of large datasets (how to break anonymity of the Netflix Prize dataset). University of Texas at Austin, 2008.
Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive privacy analysis of deep learning: stand-alone and federated learning under passive and active white-box inference attacks. arXiv:1812.00910, 2018.
O'Keefe, C. M.; Westcott, M.; O'Sullivan, M.; Ickowicz, A.; Churches, T. Anonymization for outputs of population health and health services research conducted via an online data center. Journal of the American Medical Informatics Association, 24(3), 2016, 544–549.
Papernot, N.; McDaniel, P.; Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv:1605.07277, 2016.
Papernot, N.; Song, S.; Mironov, I.; Raghunathan, A.; Talwar, K.; Erlingsson, Ú. Scalable private learning with PATE. arXiv:1802.08908, 2018.
Punithavathi, P.; Geetha, S.; Karuppiah, M.; Islam, S. H.; Hassan, M. M.; Choo, K. K. R. A lightweight machine learning-based authentication framework for smart IoT devices. Information Sciences, 484, 2019, 255–268.
Pycroft, L.; Aziz, T. Z. Security of implantable medical devices with wireless connections: the dangers of cyber-attacks, 2018.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical

image segmentation. In Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, Cham, 2015, 234–241.
Sevastopolsky, A. Optic disc and cup segmentation methods for glaucoma detection with modification of U-Net convolutional neural network. Pattern Recognition and Image Analysis, 27(3), 2017, 618–624.
Shakeel, P. M.; Baskar, S.; Dhulipala, V. S.; Mishra, S.; Jaber, M. M. Maintaining security and privacy in health care system using learning based deep-Q-networks. Journal of Medical Systems, 42(10), 2018, 186.
Sharma, S.; Chen, K.; Sheth, A. Toward practical privacy-preserving analytics for IoT and cloud-based healthcare systems. IEEE Internet Computing, 22(2), 2018, 42–51.
Shin, H. C.; Tenenholtz, N. A.; Rogers, J. K.; Schwarz, C. G.; Senjem, M. L.; Gunter, J. L.; Michalski, M. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In Proceedings of International Workshop on Simulation and Synthesis in Medical Imaging, Springer, Cham, 2018, 1–11.
Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of IEEE Symposium on Security and Privacy (SP), 2017, 3–18.
Sweeney, L. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 2002, 571–588.
Sweeney, L. k-Anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05), 2002, 557–570.
Szarvas, G.; Farkas, R.; Busa-Fekete, R. State-of-the-art anonymization of medical records using an iterative machine learning framework. Journal of the American Medical Informatics Association, 14(5), 2007, 574–580.
Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M. K.; Ristenpart, T. Stealing machine learning models via prediction APIs. In Proceedings of 25th USENIX Security Symposium (USENIX Security 16), 2016, 601–618.
Tu, Z.; Zhao, K.; Xu, F.; Li, Y.; Su, L.; Jin, D. Protecting trajectory from semantic attack considering anonymity, diversity, and closeness. IEEE Transactions on Network and Service Management, 16(1), 2019, 264–278.
Valdez, A. C.; Ziefle, M. The users' perspective on the privacy-utility trade-offs in health recommender systems. International Journal of Human-Computer Studies, 121, 2019, 108–121.
Veličković, P.; Lane, N. D.; Bhattacharya, S.; Chieh, A.; Bellahsen, O.; Vegreville, M. Scaling health analytics to millions without compromising privacy using deep distributed behavior models. In Proceedings of ACM 11th EAI International Conference on Pervasive Computing Technologies for Healthcare, 2017, 92–100.
Wang, B.; Gong, N. Z. Stealing hyperparameters in machine learning. In Proceedings of IEEE Symposium on Security and Privacy (SP), 2018, 36–52.
Wibowo, W. C. A. Distributional model of sensitive values on p-sensitive in multiple sensitive attributes. In Proceedings of IEEE 2nd International Conference on Informatics and Computational Sciences (ICICoS), 2018, 1–5.
Wu, F.; Li, X.; Sangaiah, A. K.; Xu, L.; Kumari, S.; Wu, L.; Shen, J. A lightweight and robust two-factor authentication scheme for personalized healthcare systems using wireless medical sensor networks. Future Generation Computer Systems, 82, 2018, 727–737.

Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.
Zhang, K.; Yang, K.; Liang, X.; Su, Z.; Shen, X.; Luo, H. H. Security and privacy for mobile healthcare networks: from a quality of protection perspective. IEEE Wireless Communications, 22(4), 2015, 104–112.
Zhu, H.; Liu, X.; Lu, R.; Li, H. Efficient and privacy-preserving online medical prediagnosis framework using nonlinear SVM. IEEE Journal of Biomedical and Health Informatics, 21(3), 2017, 838–850.
Ziefle, M.; Valdez, A. C. Decisions about medical data disclosure in the internet: an age perspective. In Proceedings of International Conference on Human Aspects of IT for the Aged Population, Springer, Cham, 2018, 186–201.
CHAPTER 14

LIMOS—LIVE PATIENT MONITORING SYSTEM

T. ANANTH KUMAR1*, S. ARUNMOZHI SELVI2, R. S. RAJESH2, P. SIVANANAINTHA PERUMAL2, J. STALIN2

1 Department of Computer Science and Engineering, IFET College of Engineering, Tamilnadu, India
2 Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli, India
* Corresponding author. E-mail: ananth.eec@gmail.com

ABSTRACT

In this Hi-Tech world, the utilization of time is a very important parameter in every human’s life. For a physician, it is all the more important, to save as many lives as possible. The most talented resource persons are indeed very busy to get an appointment with, but their services are very important to the patients in intensive care struggling for their lives. Of late, healthcare monitoring of patients has been the most crucial part of the medical field. In real time, only the medical reports are translated and sent to the physician. In our proposed work, a complete setup for analyzing a patient’s health parameters has been established online, and the important data are reported immediately. Finally, the report and prescription will be sent in reply to the patient. A system is developed to monitor the patient’s health continuously by a medical monitor kit that includes hemodynamic, cardiac, blood glucose level, body temperature, pulse oximetry (respiration rate), and stress measurements. Practically, nowadays the measured information is noted down and sent to the medical physician over the Internet through Wi-Fi technology, which generates harmful radiation in the form of electromagnetic waves. This can cause health disorders to patients, such as tissue damage, blindness, sterility, and heating of tissues. To

overcome these critical issues, Li-Fi technology is proposed to transmit the online health information to the physician without exposing the patient to any harmful radiation. A general physician can get the critical health parameters of the patient from a remote place. The live patient monitoring system (LiMoS) framework is installed in the systems of the physician and the ICU, where the framework interlinks the patient monitoring units with a Li-Fi-enabled Arduino board, from which live data is uploaded and updated at all times to the other side of the monitoring unit of the LiMoS. Based on the given information, the physicians send back the report of the patient to the ICU unit. Thus, the LiMoS framework contributes good performance monitoring of the patient for this emerging medical field.

14.1 INTRODUCTION

14.1.1 OVERVIEW OF PATIENT MONITORING

The patient monitoring systems started blooming in the mid-1960s. The Technicon Medical Information System was the first and most successful system, begun in 1965 as a collaborative project between Lockheed and El Camino Hospital in Mountain View, California (Pramila et al., 2014). Patient monitoring is the examination of the medical condition of the critical organs and other vital parameters of a patient over a particular period of time. It makes it possible to evaluate the condition of a patient’s health and to keep any disease or ailment from further impeding the body. The vital health parameters include heart rate, respiration rate, body temperature, glucose level, stress, hypertension, and blood pressure. These parameters are continuously observed using the respective sensors, and the monitored information is transmitted to the medical practitioner using different technologies like Wi-Fi, GSM, and wireless sensor networks. If any deformities or any variation from the normal value of such parameters are detected, then an alert or warning signal is sent to the medical practitioner and nurse, so the medical practitioner is able to provide treatment for the health disorder at an earlier stage. Thus, the healthcare system continuously monitors the patients’ information, and, especially in case of any potential irregularities in the emergency phase, the alert system connected to it gives an audio and video cautionary signal that the patient needs immediate attention. The different stages involved in patient monitoring, drawn from (Pramila et al., 2014; Agarwal, 2013; Adivarekar et al., 2013), are represented in Table 14.1.
Live Patient Monitoring System 313

TABLE 14.1 Stages of Patient Monitoring System

Stage 1
  Method of monitoring: Single-parameter (ECG) or multiparameter (ECG, respiration rate, and blood pressure) monitoring in the hospital
  Type of communication: Only SMS is sent through mobile phones
  Limitations: Only a minimal amount of basic information is communicated; no complete healthcare report is given

Stage 2
  Method of monitoring: In-home patient monitoring system with sensors and RF components
  Type of communication: Gathers data from the sensors and forwards the patient's information via GPRS
  Limitations: More sharing of data can be generated when worked on a PC or laptop

Stage 3
  Method of monitoring: In-home patient monitoring system
  Type of communication: The need for a computer is removed by a GPS-enabled smartphone
  Limitations: Radiation issues are not considered

Stage 4
  Method of monitoring: Real-time health monitoring of multiple patients
  Type of communication: Wireless structured system with Zigbee
  Limitations: Radiation issues are not considered
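The alert behavior that recurs across these stages, comparing each monitored vital sign against a normal range and raising a notification on any deviation, reduces to a simple check. The ranges and parameter names below are illustrative placeholders, not clinically validated thresholds:

```python
# Illustrative normal ranges only; a real system would use clinically
# validated, patient-specific thresholds.
NORMAL_RANGES = {
    "heart_rate_bpm": (60, 100),
    "body_temp_c": (36.1, 37.2),
    "respiration_rate_bpm": (12, 20),
}

def check_vitals(reading):
    """Return an alert message for every parameter outside its normal
    range; an empty list means no alert is raised."""
    alerts = []
    for param, value in reading.items():
        low, high = NORMAL_RANGES[param]
        if not (low <= value <= high):
            alerts.append(f"ALERT: {param}={value} outside [{low}, {high}]")
    return alerts

reading = {"heart_rate_bpm": 118, "body_temp_c": 36.8, "respiration_rate_bpm": 14}
for message in check_vitals(reading):
    print(message)  # would be forwarded to the practitioner (alarm/SMS/email)
```

The interesting design question in the stages above is not this check itself but how its result reaches the practitioner: SMS, GPRS, Zigbee, or, as this chapter proposes, a radiation-free Li-Fi link.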

All the above stages work with wireless technology that uses radio frequency (RF) radiation around the patients being monitored. In particular, areas like the ICU cannot be entertained with RF signals. To overcome all the above issues, a radiation-free technology can be introduced in hospitals for monitoring the patient even more securely.

14.1.2 WI-FI ROLE IN PATIENT MONITORING

The health parameters are constantly monitored and recorded. If a parameter value is observed to be abnormal, the alert will be sent to the medical practitioner through an alarm, SMS, or email using Wi-Fi, which is an IEEE 802.11 standard that stands for “wireless fidelity.” It is a popular wireless networking technology introduced by NCR Corporation/AT&T in the Netherlands (1991) (LaMonica, 1991). With the help of this technology, the collected information can be easily exchanged or shared between one or more devices. Wireless technology is used in all home appliances such as mobiles, televisions, DVD players, digital cameras, laptops, smartphones, etc. Communication with Wi-Fi can be through client-to-client communication or

access-point-to-client communication. It is an optimal option for home and business networks. The data is converted into a radio signal, which is in turn transferred via an RF antenna to users through a computer’s wireless adapter.

Previously, patient monitoring was implemented using wireless technology that uses RF waves, that is, electromagnetic waves, to transmit the sensed data collected by various sensors. Wi-Fi commonly uses a single band (2.4 GHz) or dual band (5.8 GHz) RF that works best under line-of-sight conditions. Some common materials can absorb or reflect the radio waves, which restricts the range of the signal. Wi-Fi uses a half-duplex shared configuration where all stations can transmit and receive the signal on the same channel.

14.2 LI-FI TECHNOLOGY

Li-Fi is a wireless technology that uses visible light as the communication medium, standardized as IEEE 802.15.7. Li-Fi was proposed by Harald Haas in 2011 (Li-Fi, 2019). Li-Fi refers to an innovative wireless system of visible light communication (VLC) technology. VLC technology can deliver bidirectional communication with high-speed data rates and networked mobile communication by using an LED as the light source. The LED transmits the binary form of data as light pulses and is thus an optical wireless communication (OWC) system (Li-Fi, 2019). Li-Fi technology is based on a visible-light wireless communication system that lies between violet (800 THz) and red (400 THz). Li-Fi uses the optical spectrum, that is, the visible-light part of the electromagnetic spectrum, whereas Wi-Fi uses the RF part of the electromagnetic spectrum. It uses fast pulses of LED light to transmit data, which cannot be noticed by the normal human eye. VLC’s features include wide bandwidth: the optical spectrum guarantees more than 10,000 times the bandwidth of the conventional, harmful RF frequencies. The LED lights work rapidly, transmitting the binary data by switching the LED on and off, and there are no interfering frequencies like those of the radio frequencies in Wi-Fi. In Li-Fi, the LED in the transmitter is connected to the data network (the Internet through the modem), and the receiver (a photodetector/solar panel) on the receiving end obtains the data as light signals, decodes the information, and then displays it on the device connected to the receiver (An Internet of Light, 2014). In the early stage, the data transfer speed was 15 Mbps.
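The on-off keying idea described above, with the LED held on for a 1 bit and off for a 0 bit at a rate too fast for the eye to notice, can be sketched end to end. Here the light channel is modeled as a plain list of 0/1 samples rather than real hardware:

```python
def to_bits(data: bytes):
    """Serialize bytes into a bit stream, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def transmit(bits):
    """Model the LED: a 1 bit is a light pulse (on), a 0 bit is no pulse (off)."""
    return list(bits)  # each sample stands for one LED on/off interval

def receive(samples):
    """Model the photodetector: regroup sampled light levels into bytes."""
    out = bytearray()
    for i in range(0, len(samples), 8):
        byte = 0
        for bit in samples[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

message = b"SpO2=97"
print(receive(transmit(to_bits(message))))  # b'SpO2=97'
```

A real receiver would also need clock recovery, framing, and error handling; the point here is only that the byte stream survives the round trip through the modeled light channel.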

Later, many commercial luminaires helped to increase the speed of Li-Fi up to almost 10 Gbps, which has overcome the speed of 802.11ad. IEEE 802.15.7 is a specific wireless network standard that defines the working of the physical and data link layers, including a media access layer that defines the mobility of optical transmission and its coexistence with the present architecture. There are many features of Li-Fi related to modulation, illumination, and dimming schemes, which are the first concern (Wikipedia, 2016). In December 2017, Velmenn introduced an advanced Li-Fi USB adapter for use in the communication of USB components and Li-Fi-enabled LED lights (LaMonica et al., 2019). This technology, a light-based form of data transmission, is highly radiation-free and plays an energetic role in the medical field. Its advantages include efficiency, availability, and security (Rohner et al., 2015).

14.2.1 APPLICATIONS

Li-Fi applications are many in a broader sense due to its key features such as high data rate capability, directional lighting, high-level efficiency in energy consumption, intrinsic security, signal blockage by walls, and combined networking capability (Techopedia, 2019). Some interesting applications that can evolve in the future provide effective security, high data rate speed in dense urban environments, cellular communication, electromagnetic-interference-sensitive environments such as aircraft travel, augmented reality, localized advertising, especially for the abled, underwater communication, safety environments, intelligent transportation systems, and indoor navigation (Li-Fi-centre, 2019).

14.2.2 LIMITATIONS OF LI-FI TECHNOLOGY

Although there are many merits of Li-Fi technology, some demerits also exist. The Internet cannot be used without a light source. Full operation of Li-Fi technology cannot be imposed in every area, but in places where Wi-Fi fails or is restricted, Li-Fi can replace it. A whole new infrastructure would need to be proposed solely for Li-Fi. One area where Li-Fi fails to operate is duplex transmission, where both the sending and receiving of data are in the form of light and the light may interfere. In Li-Fi, the transmitter should be able to maintain a directional link during transmission (Ramadhani et al., 2018; Classen et al., 2015). As of now, the uplink is in another mode of transmission, like Wi-Fi and mobile communication. A certain area of
316 Handbook of Artificial Intelligence in Biomedical Engineering

research has not been touched, like security; the transmission scenario in the outdoor, which poses a high challenge, needs to be implemented.

14.3 PROPOSED IDEA

• The main aim is to improve the health of patients by continuously monitoring them, and the monitored data has to be sent to the medical practitioner. If any abnormality is detected, the alarm signal will be given to the medical practitioner.
• The second objective of this proposal is to transmit the data without exposing the patients to any harmful radiation.

14.3.1 WORKFLOW OF THE PROPOSED IDEA

The proposed idea is pictorially represented in Figure 14.1, which consists of the workflow of the proposed Li-Fi-based system. The

FIGURE 14.1 Workflow of the proposed Li-Fi-based system.



three phases of work have to be explored, namely, the entering phase, intermediate phase, and transmission phase.
The first phase is the entering phase, which is a design interaction of the live patient monitoring system (LiMoS) framework. In this phase, two types of frameworks are designed, namely, a system app and a mobile app. The system app will be used by both physicians and the ICU unit section. For the second phase, the framework of the medical monitor kit (MMK) has to be designed, which comprises all the instruments used for monitoring; the MMK covers all the medical support given to patients for observation. The corresponding sensors are embedded with the Arduino board, namely, a respiration sensor, a heart rate sensor, and temperature sensors, from which the data is generated for transfer. Finally, the third phase covers the interoperability of how the Arduino board interacts with the Li-Fi environment and the communication scenario for the backhaul network to transfer data through the Internet. The performance is compared with the real-time patient monitoring system in terms of cost, by reducing the physicians' official visits for treatment, and in terms of time management for diagnostic testing procedures.
The workflow falls into three categories:

1. entering phase,
2. intermediate phase, and
3. transmission phase.

The entering phase is the LiMoS framework, which is the gateway for the online patient monitoring system, from which the patient's information is extracted by a physician outside the hospital and the corresponding interactions are made by an internal doctor in the hospital.
The intermediate phase is the MMK, which consists of all the medical components integrated with ICU patients. This phase works in the Li-Fi zone. The MMK covers the hemodynamics, cardiac activity, blood glucose level, body temperature, pulse oximetry (respiration rate), and stress level of a patient. The results of all these components are transmitted live to the transmission phase.
The transmission phase is the most important part of the Li-Fi zone, which transmits and receives information from the MMK to the Wi-Fi zone. This phase comprises LED, transmitter, and receiver units for transmission.
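The three-phase flow described above can be sketched as a minimal pipeline (an illustrative sketch only; the function names, field names, and values are ours, not part of the LiMoS design):

```python
# Minimal sketch of the three LiMoS phases: the MMK (intermediate phase)
# produces a vitals reading, the transmission phase serializes it for the
# Li-Fi/Wi-Fi link, and the entering phase (system/mobile app) displays it.
# All names and values here are illustrative.

def mmk_read_vitals():
    # Intermediate phase: sensor values collected on the Arduino board.
    return {"heart_rate_bpm": 72, "respiration_bpm": 16, "temperature_c": 36.8}

def lifi_transmit(reading):
    # Transmission phase: serialize the reading for the optical link.
    return ";".join(f"{k}={v}" for k, v in sorted(reading.items()))

def app_display(payload):
    # Entering phase: the system/mobile app parses and shows the vitals.
    return dict(item.split("=") for item in payload.split(";"))

reading = mmk_read_vitals()
shown = app_display(lifi_transmit(reading))
print(shown["heart_rate_bpm"])  # prints 72 (as a string after transport)
```

A real deployment would replace the serialization with the optical framing described later in Section 14.5; the sketch only mirrors the phase boundaries.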

14.3.2 LIMOS FRAMEWORK (LI-FI MONITORING SYSTEM): ENTERING PHASE

The terminology named for our proposed idea has the basic requirements for the process of monitoring a patient in an ICU. This framework is composed of two modules:

1. System app and
2. Mobile app.

14.3.2.1 SYSTEM APP

The system app gives the operational use of the physician and the ICU unit, hereafter mentioned as E-Doc and I-Doc/patient caretaker (PCT), respectively. E-Doc refers to the physician outside the hospital, whereas I-Doc refers to the internal physician who is inside the hospital, and PCT refers to the patient's caretaker who is inside the ICU all the time. The homepage is the opening screen of the system app for the doctor who opens the LiMoS framework. It then offers doctor and admin options, where doctor refers to the doctor login portal, in which each physician has been given a separate password to identify the authenticated person. The E-Doc authentication information is created by the admin. The assignment of the respective doctors to the respective ICU beds is also done in this phase. The E-Doc and I-Doc have separate login information that provides authentication for the security of data. Figure 14.2 shows the main source page of the LiMoS from the E-Doc side.

FIGURE 14.2 Main source page of the LiMoS from the E-Doc side.
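The admin-created credentials and per-doctor bed assignments described above reduce to a small access-control pattern. A sketch (the record names and values are hypothetical; the actual LiMoS app is a GUI, not this code):

```python
# Sketch of LiMoS entering-phase access control: the admin creates E-Doc
# credentials and assigns doctors to ICU beds; a successful login then
# reveals only the beds allocated to that doctor. All data is invented.

credentials = {}       # doctor id -> password, created by the admin
bed_assignments = {}   # doctor id -> list of ICU bed numbers

def admin_register(doctor_id, password, beds):
    credentials[doctor_id] = password
    bed_assignments[doctor_id] = list(beds)

def login(doctor_id, password):
    # Separate login information provides authentication for data security.
    return credentials.get(doctor_id) == password

def visible_beds(doctor_id):
    # The main source page highlights only the beds allocated to this doctor.
    return bed_assignments.get(doctor_id, [])

admin_register("edoc_01", "s3cret", beds=[3, 7])
assert login("edoc_01", "s3cret")
assert not login("edoc_01", "wrong")
print(visible_beds("edoc_01"))  # prints [3, 7]
```

A production system would of course hash the passwords rather than store them in plain text; the sketch only shows the role separation.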

Figure 14.3 shows the monitoring page of the LiMoS from the E-Doc side. The main source page of the LiMoS from the E-Doc side shows all the bed information in the ICU, and the bed allocated to the particular doctor is highlighted. The E-Doc checks in on the bed to view the status of the patient. The monitoring page gives information about the patient that includes the hemodynamics, cardiac activity, blood glucose level, body temperature, pulse oximetry (respiration rate), and stress level of the patient. All this information is updated live to the physicians.

FIGURE 14.3 Monitoring page of the LiMoS from E-Doc side.

14.3.2.2 MOBILE APP

The mobile app is especially for the E-Doc who is outside the hospital. In case of any emergency, even if the E-Doc is out of station or not in the hospital, he can update the status through the mobile app too. Certain features are not present in the mobile app. Figure 14.4 indicates the allocation of a bed for the patient in the LiMoS system. These features play a minor role and can be omitted when out of station. This page represents the same page as in the system app, which is the front page of the app for the mobile user. Figure 14.5 shows the detailed information of the patient monitored.

FIGURE 14.4 ICU information—LiMoS.

FIGURE 14.5 Patient information system—LiMoS.

14.4 MMK: INTERMEDIATE PHASE

The MMK works in the Li-Fi zone and has a common board known as Arduino. This is the motherboard with which all the components are integrated, and their data is transmitted to the Li-Fi transmitter, which is placed near the ICU bed; the receiver, placed in the line of sight (LOS) of the transmitter, receives the data from a distance of 20 m. After that, the data is sent through the optical cable communication link to the Wi-Fi zone outside the ICU that is normally provided in hospitals. The MMK is the most sensitive region, where RF radiation must be avoided. The Li-Fi luminaries are the normal lighting used inside the ICU. Thus, this part of the MMK is a radiation-free environment in which the instruments are no longer disturbed and are free from errors caused by the noise generated by RF radiation. The general setup is shown in Figure 14.6.

FIGURE 14.6 The block diagram of patient's MMK in ICU.

14.4.1 COMPONENTS USED

In this patient healthcare monitoring, some of the vital health parameters are continuously monitored and recorded by a device called a medical monitor. These health parameters include the hemodynamics (blood pressure and blood flow), blood glucose level (blood sugar level), temperature, cardiac activity (electrocardiogram), and pulse oximetry (respiration rate) of a patient. These parameters are sent to the medical practitioner to improve the health condition of the patient.

14.4.2 RESPIRATION SENSOR

A respiration sensor measures the respiration rate of a human per minute. Breath rate is characterized as the number of breaths an individual takes every minute, and it is estimated when the individual is at rest. The rate will increase with fever, illness, and other medical conditions. The most well-known strategy for the estimation of breath rate is physical evaluation: watching the movement of an individual's rib cage and counting the number of breaths over one minute. Only limited data can be obtained from breath rate estimation, whereas the actual respiration pattern exposes the necessary details and other characteristics. The respiration rate differs from breathing. For an adult, the normal respiration rate is usually 12–25 breaths per minute, and if the respiration rate is above 25 or below 12, it is considered an abnormal respiration rate.

14.4.3 TEMPERATURE SENSOR

In the proposed design, a thermocouple or resistance temperature detector is used to measure the body temperature. It is very essential to measure the temperature, as it reveals the body's metabolic rate and hormonal health, and it is mandatory to measure body temperature regularly. Certain diseases can be examined by measuring body temperature, and the efficiency of an initiated treatment can be analyzed by the physician. An abnormal change in body temperature can be related to fever (high temperature) or hypothermia (low temperature). These sensors are normally used in apertures, medical incubators, blood analysers, anesthesia transfer machines, sleep apnoea machines, and temperature monitoring and control for organ transplant systems.

14.4.4 HEART RATE SENSOR

A heart rate sensor is used for measuring a person's heart rate in real time and recording it in a system intended for future use. Every person's fitness level can
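The normal ranges quoted in this chapter, 12–25 breaths per minute for respiration and 60–100 bpm for the pulse rate (Section 14.4.4), reduce to simple range checks. A sketch (the function and dictionary names are ours):

```python
# Flag abnormal vital signs using the ranges stated in the chapter:
# respiration outside 12-25 breaths/min is abnormal, and a pulse outside
# 60-100 bpm is outside the normal resting range (Section 14.4.4).

NORMAL_RANGES = {
    "respiration_bpm": (12, 25),
    "pulse_bpm": (60, 100),
}

def is_abnormal(vital, value):
    low, high = NORMAL_RANGES[vital]
    return value < low or value > high

print(is_abnormal("respiration_bpm", 16))  # False: within 12-25
print(is_abnormal("respiration_bpm", 28))  # True: above 25
print(is_abnormal("pulse_bpm", 55))        # True: below 60
```

In the proposed system, a True result here is what would trigger the alarm signal sent to the medical practitioner.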

be gauged by their heart rate. The heart rate is maintained to reduce the risk of injury and mental fatigue. For measuring and displaying the heart rate continuously, a heart rate monitor device displays the data as the number of beats per minute. The pulse rate ranges from 60 to 100 bpm; it may fluctuate and rises with exercise, illness, injury, and emotions.

14.4.5 ARDUINO UNO

Arduino is an open-source-based microcontroller that can be easily programmed/reprogrammed at any time. It also acts as a miniature CPU, taking inputs and controlling the outputs for a variety of electronic devices, and it is also capable of observing and forwarding data over the Internet through various Arduino shields.

14.4.6 LED

The most important requirement for a light source used in Li-Fi is the ability to be switched on and off repeatedly in very short intervals of time. LEDs are suitable light sources for Li-Fi due to their ability to be switched on and off quickly. The variation in data rate with the dimensions of LEDs is very important in Li-Fi technology. Different sized LEDs can produce different data rates, where a micro-LED bulb can itself transmit 3.5 Gbps.

14.5 TRANSMISSION PHASE: LI-FI IN NETWORKING

The Li-Fi module has two submodules: one is the transmitter, and the other is the receiver module (Figure 14.7).

14.5.1 TRANSMITTER AND RECEIVER MODULES

FIGURE 14.7 Block diagram of transmitter and receiver modules.

14.5.1.1 TRANSMITTER SECTION

The data received from the sensors such as heart rate, temperature, and respiration are given as input to the Arduino board, where these data are converted into a digital signal. This digital signal is then given to the Li-Fi transmitter part. The transmitter section is used to convert digital data into visible light. The general concept behind this is that the light intensity

of the LED is modulated; that is, the intensity of the light corresponds to the data transmitted. The Arduino is not able to provide the right amount of current to make the light intensity strong and fast enough for transmitting the data as light. To overcome this problem, a transistor is used as a switch to turn the LED on and off, which makes it possible to switch a larger current faster. Figure 14.8 shows the components used in the Li-Fi transmitter setup, which consists of the Li-Fi transmitter module and the LED light source.

FIGURE 14.8 Li-Fi transmitter module.

14.5.1.2 RECEIVER SECTION

The receiver module has two submodules: a demodulation circuit and a microcontroller. The transmitted optical pulse is retrieved back into an electrical signal using a photodiode that is built into the demodulation circuit. The photodiode is a semiconductor that converts the light signal into electric current. The photodiode is needed in this receiver section for its rapid response time, spectral sensitivity in the visible spectrum, and large radiation-sensitive area. The converted electrical signal is feeble and overwhelmed by noise. It then undergoes demodulation through envelope detection to recover the data from the carrier signal. The receiver does the filtering and then amplifies the signal. After amplification, the signal is in an analog form and is fed into an analog-to-digital converter before being sent to the Arduino board. The photodiode generates a very low current; hence, a high-value resistor is used to convert this current into a voltage. The voltage is further amplified by a comparator circuit to give properly transmitted bits. The amplitude of the amplified voltage is the output of the 741 op-amp. Then, the voltage comparator transforms the signal into a digital format before feeding it into the microcontroller, which transmits the data serially to another device. Figure 14.9 shows the receiver module, which transmits data serially at a 38,400 baud rate. It covers a distance of 5–15 ft. The coverage area can be increased by changing the LED wattage.
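The transmit/receive chain just described, an LED intensity keyed by the data bits and a photodiode level thresholded by a comparator back into bits, can be mimicked end to end. The sketch below is idealized: the attenuation and threshold values are our own illustrative choices, and the noise, amplification, and ADC stages of the real circuit are omitted:

```python
# On-off keying sketch of the Li-Fi link: bits drive the LED intensity
# (transmitter), the photodiode sees a proportional analog level, and a
# comparator threshold recovers the bits (receiver).

LED_ON, LED_OFF = 1.0, 0.0     # normalized LED intensity levels
CHANNEL_GAIN = 0.4             # optical path attenuation (illustrative)
THRESHOLD = 0.2                # comparator reference level (illustrative)

def transmit(bits):
    return [LED_ON if b else LED_OFF for b in bits]

def optical_channel(intensities):
    return [CHANNEL_GAIN * i for i in intensities]

def receive(levels):
    # Comparator: anything above the reference is read as a 1.
    return [1 if v > THRESHOLD else 0 for v in levels]

data = [1, 0, 1, 1, 0, 0, 1]
recovered = receive(optical_channel(transmit(data)))
print(recovered == data)  # True: the idealized link is error free
```

With noise added, the comparator threshold would have to sit between the attenuated "on" level and the noise floor, which is exactly the role of the op-amp and comparator stages in the receiver above.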

FIGURE 14.9 Receiver module.

14.5.3 LI-FI OPTICAL CHANNEL CHARACTERISTICS

The transmitter and the receiver sections communicate through an optical wireless path. The Li-Fi channel is generally called OWC, as shown in Figure 14.10 (Dimitrov et al., 2015). The optical channel is defined in Figure 14.10. It includes LED nonlinearity, a dispersive optical wireless channel, and AWGN. This path is the light medium where the data is transmitted much faster and at a high data rate. This type of medium follows the specific channel model for communication.

14.6 CHANNEL MODEL

OWC can be described by the following continuous-time model for a noisy communication link:

i(t) = x(t) * F(j(t)) + y(t)

where i(t) represents the received distorted replica of the transmitted signal, j(t), which is subject to the nonlinear distortion function, F(j(t)), of the transmitter frontend. The nonlinearly distorted transmitted signal is convolved with the channel impulse response, x(t), and it is distorted by AWGN, y(t), at the receiver. Here, * denotes linear convolution. The generalized model of the OWC link in the time domain is illustrated in Figure 14.10. The path of the OWC is to be considered in greater depth at a later stage. The general path, LOS, has to be studied for basic Li-Fi operation. LOS stands for line of sight.

FIGURE 14.10 Generalized block diagram of the OWC link in the time domain.
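A discrete-time version of this model, with the nonlinearity F applied to the input, convolution with the channel impulse response x, and the additive noise term y, can be written directly. In the sketch below, the clipping nonlinearity and the two-tap dispersive channel are our own illustrative choices, and the noise term is zeroed so the output is deterministic:

```python
# Discrete-time sketch of i(t) = x(t) * F(j(t)) + y(t):
# F models the LED frontend (here: simple clipping), x is the channel
# impulse response (here: an illustrative two-tap dispersive channel),
# and y is the AWGN term (zeroed here for clarity).

def F(sample, clip_level=1.0):
    # LED nonlinearity: the frontend cannot exceed its peak output.
    return max(0.0, min(sample, clip_level))

def convolve(signal, impulse_response):
    # Linear convolution, the '*' in the channel equation.
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += s * h
    return out

j = [0.5, 1.5, 0.25]               # transmitted signal (1.5 will be clipped)
x = [1.0, 0.3]                     # dispersive channel impulse response
y = [0.0] * (len(j) + len(x) - 1)  # AWGN term, zeroed for this sketch

distorted = [F(s) for s in j]
i = [c + n for c, n in zip(convolve(distorted, x), y)]
print([round(v, 4) for v in i])  # [0.5, 1.15, 0.55, 0.075]
```

Replacing the zeroed y with Gaussian samples turns this into a Monte Carlo simulation of the noisy link.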

Generally, the origin of research in Li-Fi is the light communication that can be transmitted through the LOS of the receiver. Later on, the research evolved and was defined as LOS and NLOS.

14.6.1 LOS—LINE OF SIGHT

This is the direct path for communication from the source to the destination, without any distraction. If the light is obstructed, the communication is lost. This type of communication is possible in some applications where the devices are fixed and not moving.

14.6.2 NLOS—NON-LINE OF SIGHT

This is the redirected path for communication from the source to the destination, with some distraction. The distraction may be the wall inside a room or an obstacle, which in turn redirects the light by reflecting on that surface. Here, the angle at which the light reflects at the obstacle from the LOS of the transmitter is known as the angle of reflection ɵinc, and the angle from the obstacle to the destination is ɵobs in Figure 14.10. The mutual orientation of the transmitter and the receiver and their orientation toward the reflecting surface are described by means of observation and incident angles with respect to the normal direction.

14.7 PATH LOSS

The modeling of the path loss of the optical wireless channel attracted significant interest after the ground-breaking work of Gfeller and Bapst (1979). They offered an analytical model for the received optical power in LOS and single-reflection NLOS in OWC. A design of the geometry of the optical wireless communication scenario is shown in Figure 14.10. Parameters describing the mutual orientation of the transmitter and the receiver, as well as their orientation toward the reflecting surface, are included. As a common trend in the literature, the channel characterization is represented by the following equation:

x(D) = x(o)LOS + x(o)NLOS + noise component (14.1)

In Equation (14.1), the first addend represents the LOS received optical power from the direct path, while the second addend calculates the NLOS received optical power after a single reflection on the reflecting surface. Here, the received optical power is obtained after integration over the reflective surface in the x and y directions and integration of the BRDF over the θ and φ angles, ranging over 0 ≤ θ ≤ π/2 and 0 ≤ φ ≤ 2π. In addition, θTx denotes the

observation angle of the transmitter on the direct path, and θRx is the direct incident angle of the receiver. Observation and incident angles are computed with respect to the normal directions of the radiating, reflecting, or detecting elements. The distance between the transmitter and receiver on the direct path is given by d, and A is the photosensitive area of the PD. On the nondirect path, θTx is the observation angle of the transmitter toward the reflective surface, θRx is the incident angle of the receiver from the reflective surface, and ρ denotes the reflection coefficient of the surface. The distance between the transmitter and the reflective surface is given by d1, while d2 is the distance between the receiver and the reflective surface:

x(o)LOS = ((m + 1)A / (2πd²)) cos^m(φ) Ts(Ψ) g(Ψ) cos(Ψ), 0 ≤ Ψ < Ψc; 0, Ψ ≥ Ψc (14.2)

x(o)NLOS = (ρAh / (π(h² + d²))) Ts(Ψ) g(Ψ) cos(Ψ), 0 ≤ Ψ < Ψc; 0, Ψ ≥ Ψc (14.3)

where
A—area of the detector
d—distance between the source and detector
Ψ—angle of incidence
h—distance above the receiver
ρ—reflectivity
φ—angle of irradiance
Ts(Ψ)—optical filter gain

Thus, x(D) represents the generalized model of the optical wireless channel. This model of communication is simulated in the transmission phase of the Li-Fi zone. The input information of the intermediate phase is communicated by the above-mentioned channel model. This communication leads to good efficiency and hassle-free, radiation-free communication.

14.8 RESULT ANALYZING PARAMETERS

The LiMOS framework system analysis is performed based on three basic metrics: throughput, cost, and time.

14.8.1 THROUGHPUT

Throughput is the ratio of the actual data transferred to the receiver to the actual data sent by the transmitter. When used in the context of communication networks, like LAN or packet radio, throughput or network throughput is the rate of successful message delivery over a communication channel. The experimental setup includes the transmitter and receiver sections, shown in Figures 14.8 and 14.9, of the Li-Fi unit integrated with the patient monitoring unit.
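Equation (14.2) translates directly into code. The sketch below evaluates the LOS term for illustrative parameter values; the Lambertian order m, the filter gain values Ts and g, the field-of-view angle Ψc, and the geometry are our own assumptions, not figures from the chapter:

```python
import math

# LOS channel gain of Eq. (14.2):
#   ((m + 1) * A / (2 * pi * d^2)) * cos^m(phi) * Ts * g * cos(psi)
# for 0 <= psi < psi_c, and 0 otherwise. All parameter values below
# are illustrative assumptions.

def los_gain(m, A, d, phi, psi, Ts=1.0, g=1.0, psi_c=math.radians(60)):
    if psi >= psi_c:
        return 0.0          # receiver outside the field of view
    return ((m + 1) * A / (2 * math.pi * d ** 2)
            * math.cos(phi) ** m * Ts * g * math.cos(psi))

# Detector of 1 cm^2, 2 m from the LED, both angles at 30 degrees.
gain = los_gain(m=1, A=1e-4, d=2.0, phi=math.radians(30), psi=math.radians(30))
print(gain > 0)                                        # True: within psi_c
print(los_gain(1, 1e-4, 2.0, 0.0, math.radians(75)))   # 0.0: beyond psi_c
```

Note the inverse-square dependence on d: doubling the distance cuts the LOS gain by a factor of four, which is why the receiver placement near the ICU bed matters.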

The throughput is measured when the MMK unit information is transmitted from the transmitter to the receiver. The throughput performance of existing wireless communication technologies is also measured with real-time results. Comparatively, Li-Fi achieves better performance and has advantages over other technologies. Normally, the value will fall between 0 and 1. The threshold limit is 0.5. If the throughput is above 0.5, it is considered to be a good result. Here, the values for both communications fall under the above category, but the better performance is achieved by the VLC for our LiMOS framework. Figure 14.11 shows a comparative analysis of the patient monitoring system based on RF and Li-Fi communication. Practically possible metrics, such as cost and time, are considered for the test cases taken for analysis in the hospital.

FIGURE 14.11 Throughput analysis of the RF and Li-Fi communication.
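The throughput metric and its 0.5 acceptance threshold amount to a one-line ratio. A sketch (the byte counts are invented for illustration):

```python
# Throughput = data actually delivered to the receiver / data sent by the
# transmitter; the value lies in [0, 1], and > 0.5 counts as a good result.

def throughput(bytes_delivered, bytes_sent):
    if bytes_sent == 0:
        raise ValueError("no data sent")
    return bytes_delivered / bytes_sent

ratio = throughput(bytes_delivered=930, bytes_sent=1000)  # invented counts
print(ratio)        # 0.93
print(ratio > 0.5)  # True: above the 0.5 threshold
```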



Cost: This metric is a practical parameter that is calculated based on the hospital's management of the doctors, such as the travel cost of the doctor in times of emergency and the special visit cost for both the hospital and the patient.

Time: This metric plays a vital role in this analysis. Time management becomes easy when the doctor's visits are reduced for both the hospital and the doctor, especially when the doctor is out of station. Emergency cases are handled more easily with online support.

14.9 CONCLUSION

Thus, for complete monitoring of the patient and early observation of any physical deterioration in patient health, continuous health monitoring is needed, and a radiation-free environment is especially important. In ICUs, these types of setup hold good results, especially where RF disturbance of equipment leads to noise and a high error component. This methodology is a very useful feature for ICUs in hospitals. The time-consuming factors and cost-effectiveness issues are taken into account from the hospital's and the patient's side, respectively. This system is more effective for future-generation medical fields, where practitioners running short of time can take care of more patients.

14.10 FUTURE ENHANCEMENT

Although the area is emerging and only test cases are evolving, in the near future, more research will give many ideas in this Li-Fi networking. Li-Fi has not reached its 100%; as of now, its use complements the lags of Wi-Fi. Li-Fi and Wi-Fi together will run the world at the back of them. This part of the work, which has been concentrated on the ICU section, can be extended to whole hospitals one by one, especially in the areas of scanning and X-ray, where full radiation is already being used. Most of the demerits can be overcome by Li-Fi. In the scanning center, X-ray can also be equipped. The data transfer speed at present may be comparatively low, but this can be overcome as research gets stronger. In the coming era, artificial intelligence techniques can be embedded for the betterment of the LiMoS framework in the MMK unit. In the future, this prototype can be developed on a system on chip that can be used commercially.

KEYWORDS

• LiMoS
• Li-Fi
• IoT
• radiation-free device
• online patient monitoring

REFERENCES

Adivarekar J. S., Chordia A. D., Baviskar H. H., Aher P. V., Gupta S., “Patient monitoring system using GSM technology,” International Journal of Mathematics and Computer Research, vol. 1, no. 2, pp. 73–78, 2013. ISSN: 2320-7167.

Agarwal T., “Project on remote patient monitoring system,” Microcontroller Based Project on Patient Monitoring System, 2013. https://www.elprocus.com/microcontroller-based-project-on-patient-monitoring-system/

An IEEE Standard for Visible Light Communications, archived 29 August 2013 at the Wayback Machine, visiblelightcomm.com, dated April 2011. It is superfast modern internet technology, accessed on 02/03/2017.

An Internet of Light: Going online with LEDs and the first Li-Fi smartphone, archived 11 January 2014 at the Wayback Machine, Motherboard Beta, Brian Merchant.

Breton J., “Li-Fi smartphone to be presented at CES 2014,” Digital Versus, 20 December 2013. Archived from the original on 8 January 2014. Retrieved on 16 January 2014.

Classen J., Chen J., Steinmetzer D., Hollick M., Knightly E., “The spy next door: Eavesdropping on high throughput visible light communications,” in Proceedings of the 2nd International Workshop on Visible Light Communications Systems, pp. 9–14, 2015.

Chow C.-W., Chen C.-Y., Chen S.-H., “Visible light communication using mobile-phone camera with data rate higher than frame rate,” Optics Express, vol. 23, 26085, 2015.

Dimitrov S., Haas H., Principles of LED Light Communications: Towards Networked Li-Fi, Cambridge University Press, 2015.

Gfeller F. R., Bapst U., “Wireless in-house data communication via diffuse infrared radiation,” Proceedings of the IEEE, vol. 67, no. 11, pp. 1474–1486, 1979.

Guha P., “Light fidelity: technical overview and its applications,” International Journal of Mobile Ad-hoc and Sensor Networks, vol. 4, no. 1, 2014.

https://www.lifi-centre.com/about-li-fi/applications, accessed on 02 February 2019.

https://www.techopedia.com/7/31772/technology-trends/what-are-the-advantages-and-disadvantages-of-li-fi-technology, accessed on 02 February 2019.

http://iopscience.iop.org/article/10.1088/1757-899X/325/1/012013/pdf, accessed on 07 April 2019.

https://en.wikipedia.org/wiki/Li-Fi, accessed on 02 August 2016.

Jeya A. S., Venket S., Kumar V. L., “Data transmission by Ceaser Cipher wheel encryption using LiFi,” International Journal of Advance Research, Ideas and Innovations in Technology, vol. 4, no. 2, pp. 512–517.

Kumar A., Raj A., Lokesh V., Sugacini M., “IoT enabled by Li-Fi technology,” Proceedings of the National Conference on Communication and Informatics, 2016, pp. 214–243. ISSN: 2320-0790.

LaMonica M., “Philips creates shopping assistant with LEDs and smart phone,” IEEE Spectrum, 18 February 2014. Archived from the original on 17 February 2019.

Lee S. J., Jung S. Y., “A SNR analysis of the visible light channel environment for visible light communication,” Proceedings of the 18th Asia-Pacific Conference on Communication: Green and Smart Communication for IT Innovation (APCC 2012), 2012, pp. 709–712.

Liu J., Chen Y., Wang Y., Chen X., Cheng J., Yang J., “Monitoring vital signs and postures during sleep using WiFi signals,” IEEE Internet of Things Journal, vol. 5, no. 3, pp. 2071–2084, June 2018.

Pottoo S. N., Wani T. M., Dar M. A., Mir S. A., “IoT enabled by Li-Fi technology,” International Journal of Scientific Research in Computer Science, Engineering and Information Technology, vol. 4, no. 1, 2018, pp. 106–110. ISSN: 2456-3307.

Pramila R. S., Nargunam A. S., “Secure patient monitoring system,” Journal of Theoretical and Applied Information Technology, vol. 62, no. 1, April 2014.

Purwita A. A., Soltani M. D., Safari M., Haas H., “Terminal orientation in OFDM-based LiFi systems,” vol. 18, no. 8, pp. 4003–4016, 2018.

Ramadhani E., Mahardika G. P., IOP Conference Series: Materials Science and Engineering, vol. 325, 012013, 2018.

Rigg J., “Smartphone concept incorporates LiFi sensor for receiving light-based data,” Engadget, 11 January 2014. Archived from the original on 15 January 2014. Retrieved on 16 January 2014.

Rohner C., Raza S., Puccinelli D., Voigt T., “Security in visible light communication: novel challenges and opportunities,” Sensors & Transducers, vol. 192, no. 9, September 2015, pp. 9–15.

Study Paper on LiFi (Light Fidelity) & Its Applications, FN Division, TEC.

LiFi Data. Lumisense Technologies, Chennai.

Sudha S., Indumathy D., Lavanya A., Nishanthi M., Sheeba D. M., Anand V., “Patient monitoring in the hospital management using Li-Fi,” Proceedings of IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR), Chennai, 2016, pp. 93–96.

Tsonev D., Sinanovic S., Haas H., “Complete modeling of nonlinear distortion in OFDM-based optical wireless communication,” IEEE Journal of Lightwave Technology, vol. 31, no. 18, pp. 3064–3076, 2013.

Van Camp J., “Wysips solar charging screen could eliminate chargers and Wi-Fi,” Digital Trends, 19 January 2014. Archived from the original on 7 November 2015. Retrieved on 29 November 2015.

Yadav S., Mishra P., Velapure M., Togrikar P. S., “LI-FI technology for data transmission through LED,” Imperial Journal of Interdisciplinary Research, vol. 2, no. 6, pp. 21–24, 2016.
CHAPTER 15

REAL-TIME DETECTION OF FACIAL


EXPRESSIONS USING K-NN,
SVM, ENSEMBLE CLASSIFIER AND
CONVOLUTION NEURAL NETWORKS
A. SHARMILA1, B. BHAVYA1, K. V. N. KAVITHA2*, and P. MAHALAKSHMI1

1School of Electrical Engineering, Vellore Institute of Technology, Vellore, India

2Deloitte Consulting India Private Limited, Bengaluru, Karnataka

*Corresponding author. E-mail: kvnkavitha@yahoo.co.in

ABSTRACT

Identifying human emotions is important in facilitating communication and interactions between individuals. They are also used as an important means for studying behavioral science and psychological changes. There are many applications that use facial expressions to evaluate human nature, feelings, judgment, and opinions. Since recognizing human facial expressions is not a simple task, because of circumstances such as illumination, facial occlusions, and face color/shape, this chapter will provide the best method that can be adopted for noninvasive real-time facial expression detection. This chapter involves a comparative analysis of facial expression recognition techniques using the classic machine learning algorithms—k-nearest neighbor, support vector machine, and ensemble classifiers—and the most advanced deep learning technique using convolutional neural networks. Successful and satisfactory results have been obtained, giving future researchers in this field an insight into which technique could be used to get the desired results.

15.1 INTRODUCTION

Facial expressions of a person are more meaningful than the words spoken by the person, because they express all his feelings, and a person can easily understand unspoken words through facial expressions. In many diseases, such as stroke and paralysis, facial motion disorders are an early symptom. Patients with stroke might have swallowing disorders in the initial stage. In the case of paralysis resulting from accident or illness, the facial muscle could be the last failed motor unit. Many prosthetic devices can control facial emotions. Hence, facial emotion recognition can provide a valuable reference in medical diagnosis and biomedical applications. Keeping in mind the advancements in the field of digital image processing, facial expression recognition (FER) is one of the most important applications being utilized in today's world. In the current decade, a lot of progressive improvements have been made in domains like face recognition, face tracking, face retrieval, and FER. An FER system involves different measures. The basic framework of an FER system is shown in Figure 15.1.

FIGURE 15.1 Basic framework of an FER system.
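As a toy illustration of the final classification phase of the framework in Figure 15.1, the sketch below runs a pure-Python k-nearest-neighbor vote over precomputed feature vectors. The two-dimensional features and labels are invented; a real FER system would extract far richer features from the detected face region:

```python
# Minimal k-NN classifier for the FER classification phase: each sample
# is a (feature_vector, emotion_label) pair; a query is labeled by the
# majority vote of its k nearest training samples. All data is invented.
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_predict(train, query, k=3):
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [
    ((0.9, 0.1), "happy"), ((0.8, 0.2), "happy"), ((0.7, 0.1), "happy"),
    ((0.1, 0.9), "sad"),   ((0.2, 0.8), "sad"),   ((0.1, 0.7), "sad"),
]
print(knn_predict(train, (0.85, 0.15)))  # happy
print(knn_predict(train, (0.15, 0.85)))  # sad
```

The SVM, ensemble, and CNN approaches compared later in the chapter replace this voting rule with learned decision boundaries, but they consume the same kind of feature representation.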

• Face detection is the first phase in FER.
• The next step is to extract features from the detected face; then, from these features, the best features are selected and irrelevant features are eliminated in the feature selection phase (Sharmila and Geethanjali, 2016a, b).
• Finally, we use it for classification, where different expressions are classified into seven different emotions, such as angry, happy, sad, surprise, neutral, fear, and disgust.

FER was introduced in 1978, and it creates the main point for face detection with feature extraction, image alignment, normalization, and categorization. FER is employed in various kinds of applications of human–computer interaction, like face image processing, automatic recognition of expressions, video
Real-Time Detection of Facial Expressions 333

surveillance systems, artificial intelligence, and different challenging tasks during this decade. Researchers Ekman and Friesen identified six basic facial expressions (emotions): happy, surprise, disgust, sad, angry, and fear. As per Mehrabian's work, 55% of communicative cues can be judged from facial expressions; hence, recognition of facial expressions becomes a major modality. In this chapter, we have developed an FER system based on different classifiers and neural networks.

15.1.1 CONVOLUTION NEURAL NETWORKS

A neural network is a system of interconnected artificial "neurons" that exchange messages between each other. The connections have numeric weights that are tuned during the training process so that a properly trained network will respond correctly when presented with an image or pattern to recognize (Shan et al., 2017). The network consists of multiple layers of feature-detecting "neurons." Each layer has many neurons that respond to different combinations of inputs from the previous layers. As shown in Figure 15.1, the layers are built up so that the first layer detects a set of primitive patterns in the input, the second layer detects patterns of patterns, the third layer detects patterns of those patterns, and so on. Typical CNNs use 5–25 distinct layers of pattern recognition. A CNN is a special case of the neural network described above. A CNN consists of one or more convolutional layers, often with a subsampling layer, which are followed by one or more fully connected layers as in a standard neural network. These networks have been some of the most influential innovations in the field of computer vision, decreasing the classification error record from 26% to 15%, an astounding improvement at the time.

15.1.2 OBJECTIVE

• The objective of this work is to generate an FER system that could be used to detect the different expressions of the human face, namely, happy, sad, anger, neutral, disgust, fear, and surprise, in real time using the concepts of machine learning and deep learning.
• FER is one of the most important applications being utilized in today's world.
• Looking at how interactive the human world and computers have become, the FER system is used in artificial intelligence to create a better human–computer interaction. For example,

the FER is used in robots so that they can understand the emotions of the human they are interacting with and react accordingly.
• FER systems have been used in studying psychological and behavioral sciences. Whatever the face shape, color, skin texture, or age of the person, FER systems detect the emotion represented by the face.
• Since the proposed method (CNN model) can be used to detect multiple faces in a single frame, this work can be used by teachers to check students' emotions in class while teaching and improve their methods accordingly.
• FER systems, when hardware interfaced, can also be used for further applications like in ATMs to prevent ATM robberies; if the person withdrawing is scared, the system will not dispense cash.

15.1.3 BACKGROUND

Pantic and Rothkrantz (2004) developed an algorithm that recognized facial expressions in frontal and profile views. They took a sample of 25 subjects in an MMI database. They used a rule-based classifier that had an accuracy of 86%. They proposed a way to do automatic coding in profile images, but not in real time. This involved the extraction of frontal and profile facial points.

Buciu and Pitas (2004) performed principal component analysis (PCA) for a comparison purpose. Local non-negative matrix factorization (LNMF) outperformed both PCA and non-negative matrix factorization (NMF), whereas NMF performed the poorest. They discovered that the cumulative learning system (CSM) classifier is more reliable than the Matthews correlation coefficient (MCC) and gives better recognition. They used the Cohn–Kanade database, which had a sample size of 164 samples, and the JAFFE database, which had 150 samples. The Cohn–Kanade database gave the highest accuracy of 81.4%, whereas the accuracy of the JAFFE database ranged from 55% to 68% for all three methods. They applied a nearest-neighbor classifier using CSM and MCC. Their feature extraction included image representation using NMF and LNMF.

Pantic and Patras (2005) developed an algorithm that recognized 27 AUs and was invariant to occlusions like glasses and facial hair. It was shown to give a better performance than the AFA system. It had an overall average recognition of 90%. They used the Cohn–Kanade and MMI databases. The Cohn–Kanade database had 90 images, whereas

the MMI database had 45 images. They tracked a set of 20 facial fiducial points using temporal rules.

Zheng et al. (2006) used kernel canonical correlation analysis (KCCA) to recognize facial expressions. The singularity problem of the Gram matrix was tackled using an improved KCCA algorithm. Their accuracy on the JAFFE database using semantic information was 85.79% with leave-one-image-out cross-validation and 74.32% with leave-one-subject-out (LOSO) cross-validation, and on Ekman's database it was 78.13%. They used the JAFFE and Ekman's pictures of affect databases. Their JAFFE database consisted of 183 images, and Ekman's database had 96 images. Neutral expressions were not chosen from either database. The correlation is used to estimate the semantic expression vector, which is then used for classification. They converted 34 landmark points into a labeled graph using the Gabor wavelet transform. Then, a semantic expression vector is built for each training face.

Tian et al. (2001) developed an algorithm that recognized posed expressions. It was a real-time system. They used the Cohn–Kanade and Ekman–Hager Facial Action Exemplars databases. They used 50 upper face samples from 14 subjects performing 7 AUs and 63 lower face sample sequences from 32 subjects performing 11 AUs. The accuracy of recognition of upper face AUs was 96.4%, and the accuracy of lower face AUs was 96.7%. The permanent features extracted were optical flow, Gabor wavelets, and multistate models, and the transient feature extracted was Canny edge detection.

Bourel et al. (2001) developed an algorithm that deals with recognizing facial expressions in the presence of occlusions. It also proposed the use of modular classifiers instead of monolithic classifiers. Classification is done locally, and then the classifier output is fused. The Cohn–Kanade database was used, and there were 30 subjects. A total of 25 sequences for 4 expressions (a total of 100 video sequences) were taken. Local spatiotemporal vectors were obtained from the Kanade–Lucas–Tomasi tracker algorithm. They used the modular classifier with data fusion. Local classifiers are rank-weighted KNN classifiers.

Pardas and Bonafonte (2002) developed an algorithm for automatic extraction of MPEG-4 facial animation parameters (FAPs). This proves that FAPs convey the necessary information required to extract the emotions. An overall efficiency of 84% was observed across six prototypic expressions. They used the whole Cohn–Kanade database and an HMM classifier. MPEG-4 FAPs were extracted using an improved active contour algorithm and motion estimation.

Cohen et al. (2003) developed a real-time system. It suggests the use of HMMs to automatically segment a video

into different expression segments. They used the Cohn–Kanade database and their own database. They took 53 subjects under Cohn–Kanade and 5 subjects under their own database. They used NB, TAN, and ML-HMM classifiers. They extracted a vector of motion units using the piecewise Bézier volume deformation model (PBVD) tracker.

Cohen et al. (2003) also made a real-time system that used semi-supervised learning to work with some labeled data and a large amount of unlabeled data. They used the Cohn–Kanade database and the Chen–Huang database. They also extracted a vector of motion units using the PBVD tracker.

Sebe et al. (2007) developed an algorithm that recognizes spontaneous expressions. They created an authentic database in which subjects are showing their natural facial expressions. They used a spontaneous emotions database and also the Cohn–Kanade database. The sample for their database consisted of 28 subjects showing mostly neutral, joy, surprise, and delight, whereas the Cohn–Kanade database consisted of 53 subjects. They used Bayesian net classifiers, SVMs, and decision trees. They extracted MUs generated from the PBVD tracker.

The algorithm of Kotsia and Pitas (2007) recognizes either six basic facial expressions or a set of chosen AUs. Very high recognition rates have been shown. They used the whole Cohn–Kanade database. It had an accuracy of 99.7% for FER and 95.1% for FER based on AU detection. They used multiclass SVM for expression recognition and six classes of SVM, one for each expression. The feature extracted was the geometric displacement of Candide nodes.

Wang and Yin (2007) proposed a topographic modeling approach in which the gray-scale image is treated as a 3D surface. They analyzed the robustness of the detected face region and the different intensities of facial expressions. They used the Cohn–Kanade database and the MMI database. They took 53 subjects, and 4 images per subject were taken for each expression, which made a total of 864 images. They used QDC, LDA, SVC, and NB classifiers that extracted topographic context expression descriptors. It had an accuracy of 92.78% with QDC, 93.33% with LDA, and 85.56% with NB on the Cohn–Kanade database.

Dornaika and Davoine (2008) proposed a framework for simultaneous face tracking and expression recognition. Two AR models per expression gave better mouth tracking and, in turn, better performance. The video sequences contained posed expressions. They created their own database and used several video sequences. Also, they

created a challenging 1600-frame test video, in which subjects were allowed to display any expression in any order for any duration. Results have been spread across different graphs and charts. First, the head pose is determined using online appearance models, and then expressions are recognized using a stochastic approach. They extracted the Candide face model to track features.

15.2 DESCRIPTION AND GOALS

The work focuses on developing an FER system based on machine learning and deep learning algorithms. The entire work is divided into two parts: FER systems based on classifiers and FER systems based on CNNs.

15.2.1 FER BASED ON CLASSIFIERS

Using the concept of machine learning, we created a training dataset that contains seven different file classes: happy, sad, angry, disgust, surprise, fear, and neutral. This was created by dividing the JAFFE image database into seven class files. Using the feature extraction technique, the features from each emotion class were extracted. Different classifiers were trained on the features extracted, and the validation accuracy was generated. Based on the trained classifier one chooses to export to the model, the real-time emotion is recognized in comparison to the amount of accuracy achieved. The block diagram of the proposed method is shown in Figure 15.2.

FIGURE 15.2 System block diagram followed by the classifier-based FER systems.
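The "validation accuracy" stage in Figure 15.2 comes from a holdout split: Section 15.3 states that 10% of the training data is reserved for validation. A minimal pure-Python sketch of such a split, using hypothetical feature vectors and labels (the chapter itself uses MATLAB's Classification Learner):

```python
import random

def holdout_split(samples, labels, holdout_frac=0.10, seed=0):
    """Shuffle once, then reserve `holdout_frac` of the data for validation."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(idx) * holdout_frac))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([samples[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val

# 50 hypothetical feature vectors spread over 5 emotion classes
feats = [[float(i)] for i in range(50)]
labs = [i % 5 for i in range(50)]
(train_x, train_y), (val_x, val_y) = holdout_split(feats, labs)
```

With a 10% holdout, 5 of the 50 samples are set aside; the classifier never sees them during training, so its accuracy on them estimates real-world performance.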

15.2.2 FER BASED ON CNN

The Kaggle dataset has been used for expression recognition; it has 48×48 pixel labeled images (supervised learning) divided into training (28,709 images) and validation (3589 images) datasets. Using the Keras library in Python, we have built the required CNN model. The trained CNN model and the weights are saved and loaded into the model we will work with. The input image is loaded into the model that has been loaded with pretrained weights imported from the Keras model, and finally, this unlabeled image, whose class needs to be detected, is passed through the model, making the predictions that are displayed. The block diagram of the proposed method is shown in Figure 15.3.

FIGURE 15.3 System block diagram followed by the CNN-based FER systems.
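The convolution layers this pipeline relies on (Sections 15.1.1 and 15.3.1) slide a small kernel over the image and respond to local patterns. A minimal pure-Python sketch of one "valid" convolution with a hand-picked vertical-edge kernel; this is illustrative only, since in the chapter the kernel weights are learned by Keras during training:

```python
def conv2d_valid(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge kernel responds where intensity changes left-to-right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_valid(image, edge_kernel)
```

Every output location sees only a 3×3 patch of the input, which is exactly the "local receptive field" property noted in Section 15.3.1.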

15.3 DATASET SPECIFICATION

The JAFFE image dataset comprises 213 images of different emotion classes that are well labeled, as supervised learning requires labeled images for training the classifier. The FER2013 image dataset from Kaggle comprises 48×48 pixel labeled images (supervised learning) divided into training (28,709 images) and validation (3589 images) datasets.

In this work, all classifiers were used with the help of the Classification Learner on MATLAB, and the holdout validation was kept at 10%. This means that 10% of the training data is used by the classifier for validation purposes, for the classifier to check and improve its accuracy.

15.3.1 CONVOLUTION NEURAL NETWORKS

• In a CNN, convolution layers play the role of a feature extractor. However, they are not hand designed. Convolution filter kernel weights are decided as part of the training process. Convolutional layers are able to extract the local features because they restrict the receptive fields of the hidden layers to be local.
• The technical specifications of the CNN model are given in Table 15.1.

TABLE 15.1 Technical Specifications of the CNN Model
Batch size | 100
Epochs | 20
Optimizer | Adam
Loss function | Categorical cross entropy

• For real-time detection, the image is converted into gray scale and 48×48 pixels, as the CNN model takes in only images of the specified size.

In this work, the CNN was built using Keras, and for validation, 3853

images of the training data are used by the CNN for validation purposes, for the CNN to check and improve its accuracy.

15.4 DESIGN APPROACH AND DETAIL

15.4.1 DESIGN APPROACH: CLASSIFIER-BASED MODEL

15.4.1.1 INPUT DATA

Using the concept of machine learning, we created a training dataset, for which we used the JAFFE image database that comprises 213 images of different emotion classes, namely, angry, sad, happy, surprise, disgust, and neutral. Figure 15.4 represents a sample of the facial expressions from the JAFFE image database.

15.4.1.2 FEATURE EXTRACTION

Feature extraction is the method of extracting certain characteristics from the signal. Using the SURF feature extraction technique, the
FIGURE 15.4 Sample of the JAFFE image database.

features from each emotion class were extracted, and a total of 500 features were used among all the features extracted.

15.4.1.3 FEATURES LOADED TO CLASSIFIER

Features were converted from an array into a table, and the table was loaded into the classifier. This was done with the help of the Classification Learner toolbox of MATLAB. The holdout validation was kept at 10%, that is, 10% of the data was kept for the validation of the classifier after the training.

15.4.1.4 TRAINING THE CLASSIFIER AND GENERATING THE VALIDATION ACCURACY

A classifier is a pattern recognition algorithm that is used to define whether or not the test data belongs to a certain class, based on the training set and the labels given in the training set. The classifiers used in this work are elaborately explained in the technical specifications. The

different classifiers were trained on the features extracted, and validation accuracy was generated for each classifier. The scatter plot, ROC curve, and confusion matrix can also be generated.

15.4.1.5 IDENTIFYING IMAGE IN REAL TIME

Based on the trained classifier one chooses to export, the real-time emotion is recognized in comparison to the amount of accuracy achieved.

15.4.1.6 RESULT

The results obtained and the accuracy achieved are discussed in detail in the following sections. Also, the results were tabulated.

15.4.2 DESIGN APPROACH: CNN-BASED MODEL

15.4.2.1 INPUT DATA

The Kaggle dataset, a sample of which is shown in Figure 15.5, has been used for expression recognition; it has 48×48 pixel labeled images (supervised learning) divided into training (28,709 images) and validation (3589 images) datasets.

15.4.2.2 CONSTRUCTING THE CNN MODEL

Using the Keras library in Python, we have built the required CNN model. The CNN model used in this work is shown in Figure 15.6.

FIGURE 15.5 Sample from the Kaggle database.
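Because the network accepts only 48×48 grayscale inputs, any other image must be downscaled first. A minimal nearest-neighbor resize sketch on a hypothetical frame (illustrative only; the actual pipeline uses standard image tooling around the Keras model):

```python
def resize_nearest(img, out_h=48, out_w=48):
    """Nearest-neighbor resize of a 2D grayscale image (list of rows)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# Hypothetical 96x96 grayscale frame with a simple intensity gradient
frame = [[(r + c) % 256 for c in range(96)] for r in range(96)]
small = resize_nearest(frame)
```

Each output pixel simply copies the nearest source pixel; production code would typically use area or bilinear interpolation for better quality.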



FIGURE 15.6 Architecture of the CNN model.

15.4.2.3 TRAINING THE CNN MODEL

The CNN model is trained with all the training images in every epoch. The optimizer used is the Adam optimizer, and the loss function used is categorical cross entropy. A batch size of 100 was given for each epoch, and the final trained CNN model was saved along with the trained weights.

15.4.2.4 IMAGE CAPTURE FROM LIVE VIDEO AND PROCESSING

The camera is accessed to capture the image from live video, and the image is converted into gray scale. The face is detected using a Haar cascade capable of detecting multiple faces in a single frame. The image is converted into 48 × 48 pixels, as the CNN model takes in only images of the specified size.

15.4.2.5 EMOTION RECOGNITION

The image is loaded into the model that has been loaded with pretrained weights imported from the Keras model, and finally, this unlabeled image, whose class needs to be detected, is passed through the model, making the predictions that are displayed.

15.5 VARIOUS MODEL RESULTS AND DISCUSSION

15.5.1 KNN MODEL

The KNN model, while making a prediction on new (unlabeled) data, takes into account the nearest K training samples, and the majority class in that neighborhood decides the class of the data (Wang et al., 2015). This KNN classifier gives an accuracy of 87% on the JAFFE image database. Figure 15.7 represents the confusion matrix of the KNN classifier. Moreover, the ROC curves of the respective emotions are represented in Figure 15.8(a)–(f). It was observed that the ROC curves were a perfect right angle for emotions that represented 100% accuracy.
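The majority-vote rule just described can be sketched in a few lines of pure Python, using hypothetical 2D feature vectors (the chapter's classifier actually runs on SURF features inside MATLAB's Classification Learner):

```python
from collections import Counter
import math

def knn_predict(train_feats, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    dists = sorted(
        (math.dist(f, query), lbl) for f, lbl in zip(train_feats, train_labels)
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2D feature vectors for two emotion classes
train_feats = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]]
train_labels = ["sad", "sad", "sad", "happy", "happy"]
pred = knn_predict(train_feats, train_labels, [0.15, 0.15], k=3)
```

The query point sits inside the "sad" cluster, so all three nearest neighbors vote for that class.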

FIGURE 15.7 Confusion matrix—KNN classifier.
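A confusion matrix like the one in Figure 15.7 counts, for each actual class, how often each class was predicted. A minimal sketch with hypothetical labels:

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        m[index[t]][index[p]] += 1
    return m

classes = ["happy", "sad"]
cm = confusion_matrix(["happy", "sad", "sad"], ["happy", "sad", "happy"], classes)
```

Off-diagonal entries are misclassifications; in the chapter's results these cluster around visually similar classes such as disgust, neutral, and sad.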

FIGURE 15.8 (a) ROC of happy, (b) ROC of sad, (c) ROC of disgust, (d) ROC of surprise,
(e) ROC of neutral, and (f) ROC of anger.

15.5.2 SVM MODEL

The SVM model searches for the closest points, which it calls the "support vectors," draws a line connecting them, and then declares the optimal hyperplane to be a plane that bisects, and is perpendicular to, the connecting line dividing the classes. When new test data is fed as input, the model declares which class it belongs to, as the division has already been made (Kumbhar et al., 2012). Not all SVM problems are linearly separable; at times, the data distribution is such that on applying a simple linear SVM the accuracy will be very low. So, it is very important to select the correct kernel function that gives better accuracy in segregating the data in the training dataset. Table 15.2 shows the accuracy results of the SVM classifier when different kernels are used.

TABLE 15.2 Accuracy Results When Different SVM Kernels Were Chosen
Classifier Type | Accuracy (%)
SVM classifier (linear) | 73.9
SVM classifier (quadratic) | 87
SVM classifier (cubic) | 91.3

Thus, from Table 15.2, it can be concluded that the cubic SVM is the best fit, and hence, the cubic SVM model is exported for real-time emotion detection. The confusion matrix of the model is represented in Figure 15.9. Also, the ROC curves of the respective emotions are represented in Figure 15.10(a)–(f). It was observed that the ROC curves were a perfect right angle for emotions that represented 100% accuracy.

FIGURE 15.9 Confusion matrix—cubic SVM model.
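The "quadratic" and "cubic" SVMs in Table 15.2 correspond to polynomial kernels of degree 2 and 3, which implicitly lift the data into a higher-order feature space where a linear separator may exist. A minimal sketch of the polynomial kernel with illustrative values (MATLAB computes this internally during training):

```python
def poly_kernel(x, y, degree=3, c=1.0):
    """Polynomial kernel (x.y + c)^degree; degree 3 is the 'cubic' case."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** degree

x, y = [1.0, 2.0], [0.5, -1.0]
linear = poly_kernel(x, y, degree=1, c=0.0)  # reduces to the plain dot product
cubic = poly_kernel(x, y)                    # implicit degree-3 feature space
```

Raising the degree lets the decision boundary curve; this is why the cubic kernel separates the JAFFE features better than the linear one in Table 15.2.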



FIGURE 15.10 (a) ROC of happy, (b) ROC of sad, (c) ROC of disgust, (d) ROC of surprise,
(e) ROC of neutral, and (f) ROC of anger.

15.5.3 ENSEMBLE SUBSPACE KNN MODEL

The ensemble-based subspace KNN model makes use of the predictions of individual models that train on random parts of the training dataset and generates results by comparing the prediction results of these individual models. The ensemble subspace model classifies the real-time new image that is given for prediction. In our case, the individual model was the KNN model. The accuracy achieved was 95.7%. The confusion matrix of the ensemble subspace KNN model is represented in Figure 15.11. Moreover, the ROC curves of the respective emotions are represented in Figure 15.12(a)–(f). It was observed that the ROC curves were a perfect right angle for emotions that represented 100% accuracy.
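The subspace idea above (each weak learner sees only a random subset of the features, and the final label is a majority vote over the learners) can be sketched as follows, using hypothetical 4D features and a 1-NN base learner:

```python
from collections import Counter
import random

def make_subspace_learner(train_feats, train_labels, feat_idx):
    """A 1-NN learner restricted to the feature subset `feat_idx`."""
    def predict(query):
        def dist(f):
            return sum((f[i] - query[i]) ** 2 for i in feat_idx)
        best = min(range(len(train_feats)), key=lambda j: dist(train_feats[j]))
        return train_labels[best]
    return predict

def ensemble_predict(learners, query):
    votes = Counter(learner(query) for learner in learners)
    return votes.most_common(1)[0][0]

# Hypothetical 4D feature vectors, two classes
feats = [[0, 0, 5, 5], [0, 1, 5, 4], [9, 9, 0, 1], [8, 9, 1, 0]]
labels = ["neutral", "neutral", "surprise", "surprise"]
rng = random.Random(1)
subsets = [rng.sample(range(4), 2) for _ in range(5)]  # 5 random 2-feature subspaces
learners = [make_subspace_learner(feats, labels, s) for s in subsets]
pred = ensemble_predict(learners, [0, 1, 4, 5])
```

Because each learner specializes in a different feature subspace, the vote is more robust than any single KNN, which is consistent with the 95.7% accuracy reported above.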

FIGURE 15.11 Confusion matrix—ensemble-based subspace KNN model.

FIGURE 15.12 (a) ROC of happy, (b) ROC of sad, (c) ROC of disgust, (d) ROC of surprise,
(e) ROC of neutral, and (f) ROC of anger.

The simulation results of detection of different real-time emotions like happy, sad, angry, etc. are summarized in Figures 15.13–15.17.

FIGURE 15.13 Real-time emotion recognized—sad.

FIGURE 15.14 Real-time emotion recognized—happy.

FIGURE 15.15 Real-time emotion recognized—anger.

FIGURE 15.16 Real-time emotion recognized—surprise.

FIGURE 15.17 Real-time emotion recognized—disgust.

15.5.4 CNN MODEL

The CNN model was made on Keras, and the trained weights were loaded onto our FER model. The CNN model has greater prediction accuracy compared to the classifiers mentioned above, and with an increase in the number of epochs, one can see that the loss reduces constantly and the accuracy increases (Figures 15.18–15.20).

FIGURE 15.18 Loss vs epoch graph (left) and accuracy vs epoch graph (right).
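The loss plotted in Figure 15.18 is categorical cross entropy (Section 15.4.2.3). A minimal sketch of how it is computed over a batch, with hypothetical one-hot targets and predicted probabilities:

```python
import math

def categorical_cross_entropy(targets_onehot, predicted_probs, eps=1e-12):
    """Mean of -sum(t * log(p)) over a batch of one-hot targets."""
    total = 0.0
    for t_row, p_row in zip(targets_onehot, predicted_probs):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(targets_onehot)

# Two samples, three classes: one confident-correct, one confident-wrong prediction
targets = [[1, 0, 0], [0, 1, 0]]
probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
loss = categorical_cross_entropy(targets, probs)
```

The loss falls as the probability assigned to the true class rises, which is why the curve in Figure 15.18 decreases as training proceeds.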

FIGURE 15.19 (a) Emotion recognized—happy, (b) emotion recognized—neutral, (c) emotion recognized—surprise, (d) emotion recognized—sad, (e) emotion recognized—angry, and (f) emotion recognized—fear.

FIGURE 15.20 Emotion detection of multiple images.

15.5.5 COMPARISON OF THE ACCURACY RESULTS

It was observed that the CNN model gave the best prediction results (not validation accuracy) compared to the classifier models. Table 15.3 represents the accuracies of the different models. The accuracy values do not match with the said conclusion because the classifiers are being implemented on the JAFFE image database that contains only 213 images, whereas the CNN model works on approximately 29,000 images, and the obtained results are only for 20 epochs, which will improve when the number of epochs is increased.

TABLE 15.3 Comparison of Various Models
Classifier | Accuracy (%)
KNN classifier | 87
SVM classifier (linear) | 73.9
SVM classifier (quadratic) | 87
SVM classifier (cubic) | 91.3
Ensemble subspace KNN | 95.7
CNN model | 60 (20 epochs)

Table 15.4 shows a comparison of research work conducted by other researchers. It can be seen that the JAFFE image database on the whole achieves very low accuracy, as the number of images is not sufficient to train the model. So, on the whole, achieving a good accuracy on JAFFE is difficult, which our model has achieved up to a certain extent. Moreover, the CNN model that we use gives considerable accuracy at 20 epochs, which is comparable to the accuracy that the neural network model has obtained. CNN models different from the ones listed below are observed to show an accuracy of around 90%, which even
because the classifiers are being accuracy of around 90%, which even

the model proposed in our work can achieve once the number of epochs is increased; this will be our future work.

TABLE 15.4 Comparison of Various Methods Adopted by Researchers on FER

Researcher | Year | Methods | Dataset | Accuracy (%)
Pantic et al. | 2004 | Rule-based classifier on frontal and profile views | MMI dataset | 86
Buciu et al. | 2004 | Nearest-neighbor classifier using CSM and MCC | JAFFE | 55–68
Sebe et al. | 2007 | Bayesian net classifiers, SVMs, decision trees | Cohn–Kanade | 72.46–93.06
Kotsia and Pitas | 2007 | Multiclass SVM based on AU detection | Cohn–Kanade | 95.1
Dornaika and Davoine | 2008 | Head pose determined using online appearance models, then expressions recognized using a stochastic approach | Own database | 56.3
Sarode et al. | 2010 | 2D appearance-based model, radial symmetry transform | JAFFE | 81
Samad et al. | 2011 | Gabor wavelet, PCA, multiclass SVM | FEEDTUM database | 81.7 (avg)
Kumbhar et al. | 2012 | Neural network | JAFFE | 60 (avg)
Wang et al. | 2015 | SVM | JAFFE | 85.74
Wang et al. | 2015 | KNN | JAFFE | 83.91
Savoiu et al. | 2017 | SVM classifier baseline with CNN (VGG-16 and ResNet50) | Kaggle dataset | 31.8
Savoiu et al. | 2017 | Ensemble-based classifier baseline | Kaggle dataset | 67.2
Shan et al. | 2017 | KNN classifier | JAFFE | 65.1163
Shan et al. | 2017 | CNN | JAFFE | 76.7
This work | 2018 | KNN classifier | JAFFE | 87
This work | 2018 | SVM (cubic) classifier | JAFFE | 91.3
This work | 2018 | Ensemble-based KNN classifier | JAFFE | 95.7
This work | 2018 | Convolution-neural-network-based FER | Kaggle | 60 (20 epochs)
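The accuracy figures compared in Tables 15.3 and 15.4 are simply the percentage of correctly classified samples. A minimal sketch with hypothetical labels:

```python
def accuracy_percent(true_labels, pred_labels):
    """Percentage of samples whose predicted class matches the actual class."""
    correct = sum(t == p for t, p in zip(true_labels, pred_labels))
    return 100.0 * correct / len(true_labels)

acc = accuracy_percent(["happy", "sad", "fear", "sad"], ["happy", "sad", "sad", "sad"])
```

Three of the four hypothetical predictions match, giving 75%; the published figures are the same metric computed over each study's test set.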

15.6 CONCLUSIONS

It has been found that the real-time FER models based on machine learning algorithms are successful in detecting emotions. The proposed FER system was trained on different classifiers—KNN, SVM, and ensemble-based subspace KNN models—and it has been found that the ensemble-based subspace KNN model gives a maximum accuracy of 95.7%, compared to the KNN model that gives an accuracy of 87% and the SVM model (cubic) that gave an accuracy of 91.3%. In real-time detection, it is observed that the model was getting confused between disgust, neutral, and sad. This is because the JAFFE database on which the classifier is trained has very similar images in these three emotion classes, which is why even the accuracy of the classifiers faced a decrease, as clearly represented by the confusion matrix. So, in the future, more images added into the model for training will give a clearer distinction during detection. (Apart from Japanese women, different other faces need to be added.) Comparing it to the already existing research in this field, our model gives maximum accuracy on the JAFFE image database.

It is observed in our research that the convolution neural network model's performance was way better than that of the classifiers; the emotion detection was more accurate, and the results were more reliable. Also, multiple faces can be detected and emotions can be recognized. Although working on the CNN model has been more strenuous and time-consuming due to the bulky nature of the dataset that it requires for training, the results are really satisfactory. For instance, the accuracy obtained was only 60% at 20 epochs, which can be increased by increasing the number of epochs, but the prediction at 60% is way better than the classifiers because the CNN model trains on a large amount of data, and unlike the classifiers that are trained on JAFFE, this model is trained on the Kaggle dataset that has a variety of facial expressions. Therefore, we can conclude that the deep-learning-based CNN model works better than the machine-learning-based FER system.

KEYWORDS

• facial expression
• k-nearest neighbor
• support vector machine and deep learning technique

REFERENCES

Bourel, F., C.C. Chibelushi, and A.A. Low, "Recognition of Facial Expressions in the Presence of Occlusion." Proc. of the 12th

British Machine Vision Conference, vol. 1, pp. 213–222, 2001.
Buciu, I. and I. Pitas, "Application of Non-Negative and Local Non-Negative Matrix Factorization to Facial Expression Recognition." Proc. of the ICPR, pp. 288–291, Cambridge, UK, August 23–26, 2004.
Cohen, I., N. Sebe, F. Cozman, M. Cirelo, and T. Huang, "Learning Bayesian Network Classifiers for Facial Expression Recognition Using Both Labeled and Unlabeled Data." Proc. of the IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. I-595–I-604, 2003.
Cohen, I., N. Sebe, A. Garg, L.S. Chen, and T.S. Huang, "Facial Expression Recognition From Video Sequences: Temporal and Static Modeling." Comput. Vis. Image Understand., vol. 91, pp. 160–187, 2003.
Dornaika, F. and F. Davoine, "Simultaneous Facial Action Tracking and Expression Recognition in the Presence of Head Motion." Int. J. Comput. Vision, vol. 76, no. 3, pp. 257–281, 2008.
Ekman, P. and W.V. Friesen, "Constants across Cultures in the Face and Emotion." J. Pers. Soc. Psychol., vol. 17, no. 2, pp. 124–129, 1971.
Kotsia, I. and I. Pitas, "Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines." IEEE Trans. Image Process., vol. 16, no. 1, pp. 172–187, 2007.
Kumbhar, M., A. Jadhav, and M. Patil, "Facial Expression Recognition Based on Image Feature." Int. J. Comput. Commun. Eng., vol. 1, pp. 117–119, 2012.
Mehrabian, A., "Communication without Words." Psychol. Today, vol. 2, no. 4, pp. 53–56, 1968.
Pantic, M. and I. Patras, "Detecting Facial Actions and Their Temporal Segments in Nearly Frontal-View Face Image Sequences." Proc. IEEE Conf. Syst., Man Cybern., vol. 4, pp. 3358–3363, October 2005.
Pantic, M. and J.M. Rothkrantz, "Facial Action Recognition for Facial Expression Analysis from Static Face Images." IEEE Trans. Systems, Man Cybernet., Part B, vol. 34, no. 3, pp. 1449–1461, 2004.
Pardas, M. and A. Bonafonte, "Facial Animation Parameters Extraction and Expression Recognition Using Hidden Markov Models." Signal Process.: Image Commun., vol. 17, pp. 675–688, 2002.
Samad, R. and H. Sawada, "Extraction of the Minimum Number of Gabor Wavelet Parameters for the Recognition of Natural Facial Expressions." Artif. Life Robot., vol. 16, no. 1, pp. 21–31, 2011.
Sarode, N. and S. Bhatia, "Facial Expression Recognition." Int. J. Comput. Sci. Eng., vol. 2, no. 5, pp. 1552–1557, 2010.
Savoiu, A. and J. Wong, "Recognizing Facial Expressions Using Deep Learning." 2017.
Sebe, N., M.S. Lew, Y. Sun, I. Cohen, T. Gevers, and T.S. Huang, "Authentic Facial Expression Analysis." Image Vis. Comput., vol. 25, pp. 1856–1863, 2007.
Shan, K., J. Guo, W. You, D. Lu, and R. Bie, "Automatic Facial Expression Recognition Based on a Deep Convolutional-Neural-Network Structure." Proc. of the IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, pp. 123–128, 2017.
Sharmila, A. and P. Geethanjali, "Detection of Epileptic Seizure from EEG Based on Feature Ranking and Best Feature Subset Using Mutual Information Estimation." J. Med. Imag. Health Informat., vol. 6, pp. 1850–1864, 2016a.
Sharmila, A. and P. Geethanjali, "DWT Based Epileptic Seizure Detection from EEG Signals Using Naïve Bayes and KNN Classifiers." IEEE Access, vol. 4, pp. 7716–7727, 2016b.

Shih, F.Y., C.F. Chuang, and P.S.P. Wang, "Performance Comparisons of Facial Expression Recognition in JAFFE Database." Int. J. Pattern Recognit. Artificial Intell., vol. 22, no. 3, pp. 445–459, 2008.
Shrivastava, D. and L. Bhambu, "Data Classification Using Support Vector Machine." J. Theor. Appl. Inf. Technol., vol. 12, no. 1, pp. 1–7, 2010.
Tian, Y., T. Kanade, and J. Cohn, "Recognizing Action Units for Facial Expression Analysis." IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 97–115, 2001.
Wang, J. and L. Yin, "Static Topographic Modeling for Facial Expression Recognition and Analysis." Comput. Vis. Image Understand., vol. 108, pp. 19–34, 2007.
Wang, X.-H., A. Liu, and S.-Q. Zhang, "New Facial Expression Recognition Based on FSVM and KNN." Optik, vol. 126, pp. 3132–3134, 2015.
Zheng, W., X. Zhou, C. Zou, and L. Zhao, "Facial Expression Recognition Using Kernel Canonical Correlation Analysis (KCCA)." IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 233–238, January 2006.
CHAPTER 16

ANALYSIS AND INTERPRETATION OF UTERINE CONTRACTION SIGNALS USING ARTIFICIAL INTELLIGENCE

P. MAHALAKSHMI and S. SUJA PRIYADHARSINI*

Department of Electronics and Communication Engineering, Anna University Regional Campus-Tirunelveli, Tirunelveli, Tamil Nadu, India.

*Corresponding author. E-mail: sujapriya_moni@yahoo.co.in

ABSTRACT

Electrohysterography (EHG) is the technique used to monitor the activity of uterine signals. EHG signals are acquired from the abdominal surface of pregnant women, and the readings are used to study the electrical activity produced by the uterus. The electrical signal obtained from the abdominal surface helps to differentiate true labor pain from false labor pain. EHG signals are recorded from three channels. The objective of the proposed work is to differentiate true and false labor pain from the EHG signal of the pregnant woman. The proposed work employs the EHG signals available in the PhysioNet database. Three channels are used to extract the EHG signal. The dataset consists of 300 records, in which each record consists of three signals recorded through three channels from the abdomen of a pregnant woman. The dataset contains 160 true labor signals and 140 false labor signals. The obtained signals are then filtered using a fifth-order Butterworth band pass filter with cut-off frequencies of 0.34–1 Hz.

The proposed work uses features such as the mean, median, maximum frequency, median frequency, kurtosis, skewness, energy and entropy, extracted from the signals, to identify true/false labor pain signals. Different classifiers are employed individually to classify the signals into true or false labor pain
354 Handbook of Artificial Intelligence in Biomedical Engineering

signals, based on the values of the features extracted.

This chapter compares the performance of different machine learning algorithms, namely the support vector machine (SVM), extreme learning machine (ELM), k-nearest neighbor (KNN), artificial neural network (ANN), radial basis function neural network (RBFNN) and random forest (RF) classifiers, in identifying true and false labor pain signals. The performance of each classifier is evaluated individually. The performance of the SVM classifier is evaluated with different kernel functions: linear, polynomial, radial basis function (RBF) and multilayer perceptron (MLP). The SVM yields accuracies of 58%, 55%, 57%, and 55% for the linear, polynomial, RBF and MLP kernel functions, respectively. The performance of the KNN classifier is evaluated with the 1-norm distance metric, for which it yields an accuracy of 77%. The accuracies of the ANN, RBFNN, RF and ELM classifiers are 96%, 97%, 82% and 98%, respectively. The highest accuracy, 98%, is obtained with the ELM classifier; hence, the ELM outperforms the other classifiers.

This chapter thus demonstrates the possibility of employing artificial intelligence techniques with EHG signals for differentiating true and false labor pain signals. Early diagnosis of premature delivery helps to delay the delivery through proper, timely treatment, and thus helps to prevent premature birth and its associated health issues and risk of death.

16.1 INTRODUCTION

EHG, or uterine electromyography (EMG), records the electrical activity signals responsible for the involuntary contractions of the uterus. EHG signals are recorded from the abdominal surface of pregnant women, and the readings are used to study the electrical activity produced by the uterus. EHG is a measure of the electrical potential generated by the uterine muscles during pregnancy and labor. The technique consists of placing electrodes on the maternal abdomen and recording the electrical activity. Monitoring uterine signals is critical during pregnancy to determine whether the onset of labor pains is indicative of true or false labor.

EHG signals make it possible to detect uterine activity related to contractions during both gestation and active labor. EHG signals are mostly used to predict true/false labor pains and prevent premature birth. The aim of the proposed work is to demonstrate the efficacy of machine learning algorithms in classifying EHG signals as indicators of true or false labor pains. In the proposed work, the EHG signal

dataset from the PhysioNet database is used for evaluation.

Delivery prior to the completion of 37 weeks of gestation is referred to as preterm, and is currently a challenge in obstetrics. A preterm delivery, with its associated complications, is a major cause of infant deaths, with studies reporting thousands of infant deaths daily (Zorz et al., 2008).

Efforts have been made to alleviate the effects of preterm births. Consequently, it is important to predict, or distinguish between, true and false labor pains. EHG signals occur as a result of the propagation of electrical activity between the muscular cells of the uterine walls, and reveal potential differences between electrodes. EHG signal studies help establish an ambulatory monitoring system for risky pregnancies, and provide alerts if a premature pregnancy threat occurs (Fergus et al., 2015).

The objective of prenatal care is to sustain the health of both mother and fetus, and retain the fetus in the uterus until a healthy birth. Monitoring uterine contractility is crucial during pregnancy in order to differentiate normal contractions from those causing early stretching of the cervix (Fergus et al., 2015).

To obtain a better performance, the three-channel EHG signals are band pass-filtered using a fifth-order Butterworth band pass filter before analysis.

Methods presently used in obstetrics are not accurate enough to detect the risk of labor early. Therefore, a more consistent method is needed for the early recognition and prevention of false labor threats (Zorz et al., 2008).

The EHG signal recorded superficially represents the internal uterine electrical activity. Although the EHG represents the uterine electrical activity, it is contaminated by noise that distorts the signal. Hence, a preprocessing step is required to eliminate the noise before further processing of the signal.

In the proposed work, features such as the mean, median, kurtosis, skewness, peak frequency, median frequency, energy, and entropy are extracted from EHG signals. The extracted features are applied to different machine learning algorithms individually to determine whether the signals herald true or false labor pains.

This work aims to evaluate the features extracted from EHG signals, in conjunction with several advanced artificial intelligence algorithms, to assess their ability to distinguish between true and false labor pains.

16.2 A REVIEW OF LITERATURE

In this section, a brief analysis of some significant contributions of

the existing literature on EHG signal processing and artificial intelligence is presented.

Radomski et al. (2008) proposed a method based on a nonlinear feature analysis of EHG signals. Monitoring uterine electromyographic signals during pregnancy is critical to clinical medicine. This work evaluated the possibility of a nonlinear analysis of electrohysterographic signals to assess uterine contractile activity during pregnancy. The analysis was based on sample entropy statistics, and the initial results confirmed that the method could provide clinically useful information for obstetrical care.

Hassan et al. (2010) developed a method to distinguish active labor from normal pregnancy contractions. Labor prediction using the EHG has far-reaching clinical applications. Different linear methods, such as classic spectral analysis, fail to offer significantly beneficial clinical results. This work presented three useful methods: one linear and two nonlinear. The linear method is based on the mean power frequency, and the two nonlinear methods on approximate entropy and time reversibility. The comparisons demonstrate that time reversibility is an excellent method for classifying pregnancy and labor signals. The results show that time reversibility is a very promising tool for distinguishing between labor and physiological contractions during pregnancy.

Ivancevic et al. (2008) proposed an analysis of EHG signals to assess modern nonlinearity methods used in preterm birth analysis. A nonlinear analysis of uterine contraction signals furnishes information on the physiological changes undergone during the menstrual cycle and pregnancy, which can be used for both preterm birth prediction and preterm labor control.

Baghamoradi et al. (2011) proposed a method to predict preterm labor and evaluated the application of cepstral analysis for the classification of both term and preterm labor. In all, 20 EHG records of term delivery (pregnancy duration ≥37 weeks) and preterm delivery (pregnancy duration <37 weeks) were analyzed. A multilayer perceptron (MLP) neural network was used to classify the two groups. An improved classification accuracy of 72.73% was obtained using the sequential forward feature selection scheme.

Arora and Garg (2012) proposed a discrete wavelet transform, based on a pyramid set of rules, to decompose EHG signals and obtain the final feature vector matrix. The EHG signals are classified into two groups, term and preterm. Classification is carried out with the SVM, dividing the data into testing and training sets. It is validated on a standard database from

PhysioNet. The experimental results illustrate that the technique gives an accuracy of 97.8% and can be a constructive tool for investigating the risk of preterm labor.

Hassan et al. (2012) analyzed the propagation of uterine EMG signals using the nonlinear correlation coefficient. EMG signals from 49 women (36 during pregnancy and 13 in labor) at different gestational ages were recorded by placing a 4 × 4 matrix of electrodes on the abdomen. Receiver operating characteristic curves were used to evaluate the various methods in differentiating between contractions recorded during pregnancy and labor. The results indicate that the nonlinear correlation analysis performs better than classical frequency parameters in distinguishing labor contractions from normal pregnancy contractions. The paper concludes that analyzing the propagation of uterine electrical activity using the nonlinear correlation coefficient underscores the usefulness of uterine EMG signals for clinical purposes, such as monitoring pregnancy, detecting labor, and predicting preterm labor.

Li et al. (2013) proposed a method for EEG signal recognition using empirical mode decomposition (EMD) and an SVM. Automatic seizure detection is vital to monitoring epilepsy, in addition to its diagnosis and treatment. This paper proposed a new method for EEG feature extraction and pattern recognition based on the EMD and the SVM. EEG signals are decomposed into intrinsic mode functions (IMFs) using the EMD, followed by the extraction of features including the coefficient of variation and the fluctuation index of the IMFs. The task of recognizing EEG signals is undertaken by classifying the features using the SVM classifier. The experimental results report that the algorithm delivers 97.00% sensitivity and 96.25% specificity for ictal EEGs, and 98.00% sensitivity and 99.40% specificity for normal EEGs on the Bonn datasets.

Hussain (2013) proposed a method to predict preterm deliveries in pregnant women using an immune algorithm. Identifying preterm deliveries, and treating the preterm infants, improves the chance of survival. This work employed an immune algorithm that helps in diagnosing and classifying the EHG signal in true labor and in predicting preterm delivery. The work focuses on determining whether a delivery is term or preterm. The machine learning classifier produces an overall accuracy of 90%.

Shulgin and Shepel (2014) proposed a method to detect and characterize uterine activity. The fetal heart rate (FHR) and uterine contraction activity (UA) during

pregnancy and labor are normally monitored using external tocography and ultrasound, respectively. Given that these methods are not accurate and sensitive enough, more precise methods are called for. Abdominal electrocardiography and EHG are safe and noninvasive techniques that monitor the FHR and uterine contractions during pregnancy. Signal processing methods are developed for three algorithms, based on amplitude demodulation, the spectrogram, and the root mean square. The algorithms extract uterine activity signals from multichannel abdominal signals.

Far et al. (2015) proposed a method that predicts preterm labor using statistical and nonlinear features. Predicting preterm labor plays a key role in decreasing neonatal deaths. Statistical and nonlinear features extracted from EHG signals are classified into term and preterm labor signals using the SVM. A dataset comprising 26 records from term delivery (pregnancy duration ≥37 weeks) and 26 records from preterm delivery (pregnancy duration <37 weeks) was used. The results show that the highest accuracy can be attained by using the four statistical features of mean, standard deviation, median and zero crossing from channel 1.

Fatima et al. (2017) proposed a method for EHG signal classification for true and false pregnancy analysis. A baby born at forty weeks of pregnancy is called a normal, healthy baby; a premature baby is born in the period after the 20th week and before 37 weeks of pregnancy. In this work, linear features (mean, root mean square) and nonlinear features (entropy and cepstrum) were extracted, and the EHG signal was classified into term and preterm pregnancy using the SVM classifier.

16.3 METHODOLOGY

This section describes the method adopted for the classification of EHG signals into true and false labor pains using artificial intelligence. Figure 16.1 describes the proposed work in detail.

FIGURE 16.1 Flow diagram of the proposed work.
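As a concrete illustration of the preprocessing stage in this flow (Section 16.3.2: a fifth-order Butterworth band pass filter with a 0.34–1 Hz pass band at the 20 Hz sampling rate of the records), a hedged sketch follows. It assumes SciPy is available; the zero-phase `filtfilt` pass and the synthetic test signal are choices of this sketch, not details specified in the chapter.

```python
# Sketch of the band-pass preprocessing step: fifth-order Butterworth,
# 0.34-1 Hz pass band, 20 Hz sampling rate (assumes SciPy is installed).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 20.0            # sampling frequency of each EHG record (Hz)
LOW, HIGH = 0.34, 1.0  # band-pass cut-off frequencies (Hz)

def preprocess(signal, fs=FS, low=LOW, high=HIGH, order=5):
    """Band-pass filter one EHG channel (zero-phase fifth-order Butterworth)."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="bandpass")
    return filtfilt(b, a, signal)  # filtfilt -> no phase distortion

# Synthetic stand-in for a raw channel: slow baseline drift (0.05 Hz,
# outside the band) plus an in-band 0.6 Hz component.
t = np.arange(0, 120, 1 / FS)
raw = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.6 * t)
clean = preprocess(raw)  # drift removed, 0.6 Hz component preserved
```

After filtering, the 0.05 Hz drift is strongly attenuated while the 0.6 Hz component (near the pass-band center) passes through almost unchanged.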



16.3.1 RAW DATA COLLECTION

In this work, the EHG signals used for the analysis were obtained from PhysioNet. Signals were recorded from pregnant women by placing electrodes on the abdominal surface at horizontal and vertical distances of between 2.5 and 7 cm apart. The EHG signals were preprocessed to denoise them.

16.3.2 PREPROCESSING

EHG signals are tainted, owing to contact with different noise sources during recording. The recorded raw EHG signals therefore contain noise, and are preprocessed to denoise them. This is done with a digital fifth-order Butterworth band pass filter with cut-off frequencies ranging from 0.34 to 1 Hz. The sampling rate of each signal is 20 Hz.

16.3.3 FEATURE EXTRACTION

The features extracted from the EHG signals and used in the present work are described below. The energy, entropy, kurtosis, median frequency, mean, median, peak frequency and skewness are extracted from the power spectrum of the EHG signals. The power spectrum is computed using the fast Fourier transform.

Energy:
The energy of a signal is defined as

E = Σ [x(n)]²    (16.1)

where x(n) represents the signal samples.

Kurtosis:
Kurtosis is a measure of the peakedness and tailedness of a probability distribution (Zorz et al., 2008)

k = (1/n) Σ ((x − µ)/σ)⁴ − 3    (16.2)

where µ represents the mean, σ the standard deviation, and n the number of samples.

Mean:
The mean is the average value of a signal, indicated by µ. All the samples are added together and the sum is divided by N (Far et al., 2015)

µ = (1/N) Σ xᵢ,  i = 0, …, N − 1    (16.3)

where the xᵢ are the values of the signal.

Median:
The median is the value separating the higher half of a data sample from the lower half: a number is equally likely to fall above or below it (Far et al., 2015). For a sorted sample, the median is the value at position

M = (n + 1)/2    (16.4)

where n is the number of samples in the set.

Skewness:
Skewness estimates the symmetry of a distribution, that is, the relative frequency of positive and negative extreme values (Zorz et al., 2008)

s = (1/n) Σ ((x − µ)/σ)³    (16.5)

where σ represents the standard deviation and µ is the mean.

16.4 CLASSIFIERS

In the proposed work, classifiers such as the SVM, ELM, KNN, RF, ANN, and RBFNN are used individually to categorize the EHG signals into true or false labor pains. The performance of each classifier is evaluated individually and a comparison is drawn.

16.4.1 K-NEAREST NEIGHBORS

The KNN is a nonparametric method used for classification and regression. The input consists of the "k" closest training samples in the feature space, and the output depends on whether the method is used for classification or regression. For classification, the contributions of the neighbors can be weighted so that the nearest neighbors contribute more to the vote than the distant ones. The KNN is a simple algorithm that stores all available cases and classifies new ones based on a similarity measure (https://datascience.com).

16.4.2 SUPPORT VECTOR MACHINES

An SVM is a discriminative classifier formally defined by a separating hyperplane. Support vector machines are supervised learning models that analyze the data used for classification and regression analysis. Each training sample is marked as belonging to one of two categories, and the SVM efficiently performs linear and nonlinear classification using a kernel function. In the linear case, if the training data are linearly separable, two parallel hyperplanes that separate the two classes of data are selected. By varying the kernel function, the accuracy can be improved (Li et al., 2013).

16.4.3 EXTREME LEARNING MACHINES

ELMs are feedforward neural networks used for classification and regression analysis. A feedforward neural network trained with back-propagation is slow: training with a gradient-based algorithm requires many iterative steps to reach an enhanced

performance. The ELM, which is designed to overcome these issues, is a single-hidden-layer feedforward neural network. The ELM randomly chooses the input weights and hidden unit biases. Training the network by finding the least squares solution of the linear system, the ELM analytically determines the output weights and provides good predictive performance at an extremely fast learning speed (Chen et al., 2017).

16.4.4 ARTIFICIAL NEURAL NETWORKS

An ANN is based on a collection of connected units or nodes called artificial neurons. Each connection between artificial neurons can transmit a signal from one node to another. The artificial neural network receives the signal and processes it. Artificial neurons and connections typically have weights that adjust as learning proceeds. A weight increases or decreases depending on the strength of the signal at a connection (Fergus et al., 2015).

16.4.5 RADIAL BASIS FUNCTION NEURAL NETWORKS (RBFNN)

The RBF network is an ANN that uses RBFs as activation functions. The output of the network is a linear combination of the RBFs of the inputs and the neuron parameters. The RBFNN typically has three layers: an input layer, a hidden layer with a nonlinear RBF activation function, and a linear output layer (https://datascience.com).

16.4.6 RANDOM FOREST (RF)

The RF is a flexible, easy-to-use machine learning algorithm that produces good results, and is used for both classification and regression. It is a supervised learning algorithm, largely used because of its simplicity. It builds multiple decision trees and merges them together to obtain stable predictions, adding additional randomness while growing the trees: rather than searching for the most important feature overall, it looks for the best feature among a random subset of features (https://datascience.com).

16.5 RESULTS AND DISCUSSION

This section presents the results and analyses the performance of the different classifiers in grouping the signals into true and false labor pains.
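Before turning to the results, the simplest of the classifiers above can be made concrete. The sketch below implements a k-nearest-neighbor vote under the 1-norm (Manhattan) distance, the metric reported for the KNN experiments in this chapter; the toy feature vectors and labels are invented for illustration and are not the chapter's EHG features.

```python
# Minimal k-nearest-neighbor classifier using the 1-norm (Manhattan)
# distance.  The training points below are toy two-dimensional features
# standing in for "true"/"false" labor feature vectors.
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Returns the majority label among the k training points nearest to
    `query` under the 1-norm distance."""
    dist = lambda u, v: sum(abs(a - b) for a, b in zip(u, v))
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.1, 0.2), "false"), ((0.2, 0.1), "false"),
         ((0.9, 1.0), "true"),  ((1.0, 0.8), "true"), ((0.8, 0.9), "true")]
print(knn_predict(train, (0.95, 0.9)))   # prints "true"
```

With k = 3, the query point near the second cluster is assigned the "true" label by majority vote.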

16.5.1 PERFORMANCE METRICS

Evaluating the artificial intelligence algorithms for the classification of EHG signals into true and false labor pains is essential. The performance is evaluated using the three measures of accuracy, sensitivity, and specificity.

16.5.1.1 CLASSIFICATION ACCURACY

Accuracy = (TP + TN)/(TP + FP + FN + TN) × 100    (16.6)

where
TP (true positive) represents a correct classification, that is, true labor signals classified as true labor;
TN (true negative) represents a correct classification, that is, false labor signals classified as false labor;
FP (false positive) represents a misclassification, that is, false labor signals misclassified as true labor; and
FN (false negative) represents a misclassification, that is, true labor signals misclassified as false labor.

16.5.1.2 SENSITIVITY

A measure of the performance of a binary classification test, sensitivity is also called the true positive rate. It measures the proportion of actual positives that are correctly identified as such

Sensitivity = TP/(TP + FN) × 100    (16.7)

16.5.1.3 SPECIFICITY

Specificity is also a measure of the performance of a binary classification test, and is called the true negative rate. It measures the proportion of actual negatives that are correctly identified as such

Specificity = TN/(FP + TN) × 100    (16.8)

16.5.2 DATA COLLECTION FOR UTERINE CONTRACTION

Raw EHG signals, obtained from the PhysioNet database, were recorded using four bipolar electrodes. Each signal was recorded either early, before the 26th week of gestation (at around 23 weeks), or later, during or after the 26th week (at around 31 weeks). Within the dataset, three signals per record were obtained simultaneously by recording them through three different channels (Zorz et al., 2008).

In the proposed work, EHG signals from the PhysioNet database were used for analysis. Raw EHG signals obtained from the PhysioNet

database were recorded using four bipolar electrodes, stuck to the abdominal surface and spaced at horizontal and vertical distances of between 2.5 and 7 cm apart. The database contains a total of 300 signals, of which 160 are indicative of true labor and 140 of false labor. The raw EHG signals were preprocessed for denoising.

16.5.3 RESULTS

Figure 16.2 depicts the raw EHG uterine signals, containing the output of the three channels, with each channel comprising three signals; the EHG signals therefore constitute a total of nine.

FIGURE 16.2 Raw EHG signals.
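From signals such as these, the statistical features of Section 16.3.3 (eqs 16.1–16.5) are computed; they can be written directly in a few lines of pure Python. This sketch uses population (biased) moments; the chapter does not state whether sample corrections were applied.

```python
# Pure-Python versions of the statistical features of Section 16.3.3
# (eqs 16.1-16.5), using population (biased) moments.
import math

def features(x):
    n = len(x)
    mean = sum(x) / n                                   # eq. (16.3)
    var = sum((v - mean) ** 2 for v in x) / n
    sd = math.sqrt(var)
    xs = sorted(x)
    mid = n // 2
    median = xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2  # eq. (16.4)
    return {
        "energy":   sum(v * v for v in x),                          # eq. (16.1)
        "mean":     mean,
        "median":   median,
        "skewness": sum(((v - mean) / sd) ** 3 for v in x) / n,     # eq. (16.5)
        "kurtosis": sum(((v - mean) / sd) ** 4 for v in x) / n - 3, # eq. (16.2)
    }

f = features([1, 2, 3, 4, 5])  # symmetric toy sample: skewness is zero
```

For this symmetric toy sample, the mean and median coincide at 3, the energy is 55, and the skewness vanishes.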

Figure 16.3 shows the filtered response of the EHG signals. The signals are filtered using the fifth-order Butterworth band pass filter, with cut-off frequencies ranging between 0.34 and 1 Hz.

Figure 16.4 represents the power spectral density plot of the EHG signals, obtained by applying the fast Fourier transform to the signals.

FIGURE 16.3 Filtered response of the EHG signals.

FIGURE 16.4 Power spectral density of the EHG signal.
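The power spectrum behind a plot like Figure 16.4 comes from the Fourier transform. A minimal, standard-library-only sketch of the idea follows; it uses a plain discrete Fourier transform rather than the FFT that would be used in practice, and the 0.75 Hz test tone is an invented example.

```python
# Power spectral estimate via a plain DFT (stdlib only).  spec[k] is the
# power at frequency k * fs / n; in practice an FFT routine is used.
import cmath, math

def power_spectrum(x, fs):
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):  # one-sided spectrum up to Nyquist
        s = sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        spec.append(abs(s) ** 2 / n)
    return spec

fs, n = 20.0, 80                      # 20 Hz sampling, 4 s of signal
x = [math.sin(2 * math.pi * 0.75 * m / fs) for m in range(n)]  # 0.75 Hz tone
spec = power_spectrum(x, fs)
peak = max(range(len(spec)), key=spec.__getitem__)
print(peak * fs / n)                  # prints 0.75
```

The spectral peak lands in bin k = 3, which maps back to the tone frequency via f = k · fs / n = 3 · 20/80 = 0.75 Hz.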



In this work, the performance of classifiers such as the SVM, ELM, KNN, RF, ANN and RBFNN is evaluated individually, in terms of accuracy, in classifying the EHG signals into true and false labor pains. EHG records from the Term–Preterm Electrohysterogram Database of PhysioNet were used for evaluation. It comprises 300 records of pregnant women, of which 262 are full-term pregnancies and 38 are premature births. Here, 162 records were taken before the 26th week of pregnancy and 138 records after, collected from 1997 until 2005 at the Department of Obstetrics and Gynaecology, Ljubljana University Medical Centre, Slovenia. The records were acquired from the general population, in addition to patients admitted to the hospital with a diagnosis of anticipated preterm labor. One record per pregnancy was recorded, with a sampling frequency (Fs) of 20 Hz. The records were gathered from the abdominal surface using four AgCl2 electrodes, placed in two horizontal rows, equally below and above the umbilicus, and spaced 7 cm apart. Three channels were derived from the four electrodes: the first channel acquired a signal combining electrodes E2–E1, the second a signal combining electrodes E2–E3, and the third a signal combining electrodes E4–E3 (Zorz et al., 2008).

The proposed work employs a dataset of 300 signals, in which 160 and 140 signals belong to true and false labor pains, respectively. The classifiers are trained to classify the signals into true and false labor pains.

The performance of the different classifiers, the SVM, ELM, KNN, RF, ANN, and RBFNN, is evaluated for different features. The SVM and ELM classifiers are evaluated using different kernel functions. The SVM is evaluated using four kernel functions: linear, polynomial, RBF and multilayer perceptron (MLP). The ELM is evaluated using kernel functions such as the sine (sin), sigmoid, radial basis function (RADBAS), and triangular basis function (TRIBAS).

A performance analysis of the different classifiers in classifying the recorded EHG signals into true and false labor, with the different feature sets, is shown in Table 16.1.

16.5.4 DISCUSSIONS

From the tabulation, it is evident that the ELM classifier outperforms the others in classifying EHG signals for all kernel functions, except the sigmoid and sin kernels on the energy and entropy features. Next to the ELM, the RBFNN classifier excels in classifying the EHG signals for every feature set. For the different feature

TABLE 16.1 A Performance Analysis of Different Classifiers

Features              SVM kernel functions             ELM kernel functions              RBFNN  ANN  KNN  RF
                      Linear  Polynomial  RBF   MLP    Sigmoid  sin   TRIBAS  RADBAS
Maximum frequency,    52%     55%         54%   48%    98%      98%   98%     98%       97%    96%  76%  81%
median frequency
Mean, median          58%     50%         57%   55%    98%      98%   98%     98%       96%    94%  77%  82%
Kurtosis, skewness    49%     49%         49%   44%    98%      98%   98%     98%       85%    72%  70%  68%
Energy, entropy       49%     50%         49%   52%    87%      89%   98%     98%       91%    91%  71%  81%
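The accuracy figures in Table 16.1 follow from the confusion-matrix measures of eqs (16.6)–(16.8); a small helper makes the definitions concrete. The counts below are illustrative and are not taken from the chapter's experiments.

```python
# Accuracy, sensitivity and specificity (eqs 16.6-16.8) from a binary
# confusion matrix.  tp/tn/fp/fn follow the definitions in Section 16.5.1.
def metrics(tp, tn, fp, fn):
    return {
        "accuracy":    100.0 * (tp + tn) / (tp + tn + fp + fn),  # eq. (16.6)
        "sensitivity": 100.0 * tp / (tp + fn),                   # eq. (16.7)
        "specificity": 100.0 * tn / (tn + fp),                   # eq. (16.8)
    }

m = metrics(tp=150, tn=120, fp=20, fn=10)  # illustrative counts
print(m["accuracy"])     # prints 90.0
```

With these counts, 270 of 300 signals are classified correctly, giving 90.0% accuracy, 93.75% sensitivity and about 85.7% specificity.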

sets, the ANN classifier produces a good accuracy, except for the kurtosis and skewness feature set. The RF classifier yields a higher accuracy than the SVM classifier. In the case of the SVM classifier, among the different kernel functions and features, the mean and median features, along with the linear kernel function, yield the highest accuracy in comparison with the other kernel functions and feature sets. In the proposed work, the SVM classifier is inferior to the other classifiers in classifying the EHG signals into true/false labor pain.

16.6 CONCLUSION

This chapter offers insights into the development of an intelligent system for the early diagnosis of premature birth by correctly identifying true/false labor pains. Raw term–preterm electrohysterography signals from PhysioNet were analyzed in this work. The database contains a total of 300 signals. In this study, the raw EHG signals are preprocessed before analysis and filtered using the fifth-order Butterworth band pass filter. The raw EHG signals obtained from the three channels comprise nine signals, which are filtered with cut-off frequencies of between 0.34 and 1 Hz. The power spectrum of the signals is computed by applying the fast Fourier transform.

Features are extracted from the preprocessed EHG signals: the energy, entropy, kurtosis, mean, median, maximum frequency, median frequency and skewness are calculated. After the features are extracted, the classifiers are trained and tested using the different feature sets. The performance of classifiers such as the SVM, ELM, KNN, ANN, RBFNN, and RF is evaluated individually in terms of classifying EHG signals into true and false labor pains, and an analysis is carried out. The accuracies obtained from these classifiers are 58%, 98%, 77%, 96%, 97%, and 82%, respectively. It is evident from the results that the ELM classifier outperforms the others in classifying EHG signals into true and false labor pains.

16.7 FUTURE WORK

In the present work, EHG signals are classified using different classifiers. In the future, an embedded system can be developed by integrating sensors for signal acquisition with an artificial intelligence algorithm, using suitable software, to differentiate between true and false labor pains from EHG signals.

KEYWORDS

• electrohysterography (EHG)
• support vector machine (SVM)
• extreme learning machine (ELM)

REFERENCES

Acharya RU, Sudarshan KV, Rong QS, Tan Z, Min CL, Koh EWJ, Nayak S, Bhandary VS. Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals, Computers in Biology and Medicine, 2017, 85, 33–42.
Arora S and Garg G. A novel scheme to classify EHG signal for term and preterm pregnancy analysis, International Journal of Computer Application, 2012, 51(18), 37–41.
Askar A, Jumeily AD, Jager F, Fergus P. Dynamic neural network architecture inspired by the immune algorithm to predict preterm deliveries in pregnant women, Neurocomputing, 2015, 151, 963–974.
Baghamoradi SMS, Naji M and Aryadoost H. Evaluation of cepstral analysis of EHG signals to prediction of preterm labor, Iranian Conference of Biomedical Engineering, 2011.
Batra A, Chandra A, Matoria V. Cardiotocography analysis using conjunction of machine learning algorithms, International Conference on Machine Vision and Information Technology, 2017.
Chen L and Hao Y. Feature extraction and classification of EHG between pregnancy and labour group using Hilbert–Huang transform and extreme learning machine, Computational and Mathematical Methods in Medicine, 2017, 2017, 1–9.
Chudacek V, Spilka J, Bursa M, Janku P, Hruban L, Huptych M and Lhotska L. Open access intrapartum CTG database, BMC Pregnancy and Childbirth, 2014.
Far TD, Beiranvand M and Shahbakthi M. Prediction of preterm labor from EHG signals using statistical and non-linear features, Biomedical Engineering International Conference, 2015.
Fatima U, Goskula T. EHG signal classification for true and false pregnancy analysis, International Journal on Recent and Innovation Trends in Computing and Communication, 2017, 5(6), 811–814.
Fergus P, Dowu I, Hussain A, Dobbins C. Advanced artificial neural network classification for detecting preterm births using EHG records, Neurocomputing, 2015, 188, 42–49.
Hassan M, Terrien J, Alexanderson, Marque C, Karlsson B. Nonlinearity of EHG signals used to distinguish active labor from normal pregnancy contractions, 32nd Annual International Conference of the IEEE, 2010.
Hassan M, Terrien J, Muszynski C, Alexanderson A, Marque C and Karlsson B. Better pregnancy monitoring using nonlinear correlation analysis of external uterine electromyography, IEEE Transactions on Biomedical Engineering, 2013, 60(4), 1160–1166.
Huang ML, Hsu YY. Fetal distress prediction using discriminant analysis, decision tree, and artificial neural network, Journal of Biomedical Science and Engineering, 2012, 5, 526–533.
Ivancevic T, Jain CL, Pattison EJ, Hariz A. Preterm birth analysis using nonlinear methods, Recent Patents on Biomedical Engineering, 2008, 160–170.
Jezewski M, Czabanski R, Wrobel J and Horoba K. Analysis of extracted cardiotocographic signal features to
Analysis and Interpretation of Uterine Contraction Signals 369

improve automated prediction of fetal Shulgin V and Shepel O, Electrohysterographic


outcome, Biocybernetics and Biomedical signals processing for uterine activity
Engineering, 2010, 30(4), 29–47. detection and characterization, International
Li S, Zhou W, Yuan Q, Geng S, Cai D, Scientific Conference Electronics and
Feature extraction and recognition of ictal Nanotechnology, 2014.
EEG using EMD and SVM, Computers Sundar C, Chitradevi M, Geetharamani G.
Classification of cardiotocogram data
in Biology and Medicine, 2013, 43(7),
using neural network based machine
807–816.
learning technique, International Journal
Murray M. Antepartal and intrapartal fetal
of Computer Applications, 2012, 47(14),
monitoring. 3rd ed. Springer, 2006. 19–25.
Radomski D, Grzanka A, Graczyk S, and Zorz F.G, Kavsek G, Novak Z, Franc Jager.
Przelaskowski A. Assessment of uterine A, comparison of various linear and non
contractile activity during a pregnancy linear signal processing techniques to
based on a nonlinear analysis of the uterine separate uterine EMG records of term
electromyographic signal, Information and preterm delivery groups. Medical &
Technologies in Biomedicine Springer, Biological Engineering & Computing,
2008, 47, 325–331. 2008, 46(9), 911–922.
CHAPTER 17

ENHANCED CLASSIFICATION PERFORMANCE OF CARDIOTOCOGRAM DATA FOR FETAL STATE ANTICIPATION USING EVOLUTIONARY FEATURE REDUCTION TECHNIQUES

SUBHA VELAPPAN,1* MANIVANNA BOOPATHI ARUMUGAM,2 and ZAFER COMERT3

1 Department of Computer Science & Engineering, Manonmaniam Sundaranar University, Tirunelveli, India
2 Instrumentation & Chemicals Division, Bahrain Training Institute, Kingdom of Bahrain
3 Department of Software Engineering, Samsun University, Turkey
* Corresponding author. E-mail: subha_velappan@msuniv.ac.in

ABSTRACT

The role of computers has become inevitable in the healthcare sector, and computers with information and communication technologies are widely used for assessment, patient monitoring, documentation, and telemedicine. Data mining is a field which helps to obtain knowledge from massive amounts of data from any industry or organization. Cardiotocography (CTG) is a test that is done during the third trimester of pregnancy to measure the heart rate and movements of the fetus; it helps to monitor the contractions in the uterus, and thereby the signs of any distress, before the delivery of the baby and during labour. The physical interpretation of information from CTG is found to be a challenging task, and any contradictory interpretation will lead to erroneous

diagnosis on fetal condition, which may even lead to fetal death. Feature selection is the process in which an optimal subset of features is selected based on some defined criterion; it helps to considerably improve the performance of classification in terms of learning speed, accuracy of prediction, simplicity of rules, etc. Also, the reduction in the size of the feature subset helps to remove noise and irrelevant features. Several approaches have been introduced for improving the performance of computerized classification of CTG data, which leads to an improved diagnosis of fetal status. In this chapter, filter and wrapper feature selection techniques are applied to the CTG dataset available in the UCI machine learning repository. Evolutionary algorithms, namely the genetic algorithm, the firefly algorithm, and a hybrid technique incorporating information gain and the opposition-based firefly algorithm, have been used to improve the classification performance on the CTG dataset. The results of simulations show that the proposed methodologies are highly promising when compared to other existing methods. To assess the performance of these proposed methodologies, various performance measures, namely accuracy, sensitivity (or recall), specificity, precision (or positive predictive value), negative predictive value, geometric mean, F-measure, and area under ROC, have been used, and the hybrid model incorporating information gain and the opposition-based firefly algorithm proves to perform better than the other techniques.

17.1 INTRODUCTION

Healthcare is one of the major sectors which exploit computers and modern information technology for efficient patient information storage, management, retrieval, documentation, diagnosis, etc. Data mining techniques are employed in clinical decision support systems (CDSSs) for efficiently handling these huge amounts of healthcare data, in order to assist the industry in identifying good practices of patient monitoring, hospital administration, diagnosis, treatment, and documentation. This eventually brings the cost down by almost 30% (HealthCatalyst, 2019). However, identifying and employing efficient data mining techniques for this purpose still remains a challenge because of the complex nature of healthcare data and the inability to adapt to new technologies. Knowledge-based CDSSs make use of if–then rules in the knowledge base, with an inference system and a communication system, in order to obtain inferences by combining the if–then rules with the patient data. The CDSSs which do not rely on a knowledge base utilize

machine learning techniques such as support vector machines (SVMs), artificial neural networks (ANNs), etc. (Wagholikar et al., 2012), instead of a prewritten knowledge base.

Cardiotocography (CTG) is a popular CDSS used for monitoring the well-being of the fetus in the mother's womb. A CTG recording contains the information of the heart rate (HR) of the fetus and the uterine contractions (UC). Fetal hypoxia is an abnormality in the fetal condition which results from the scarcity of oxygen for the fetus (Chudáček et al., 2010). If not diagnosed well and treated promptly, this torment of the fetus may lead to a severe neurological disorder or death (CÖMERT et al., 2018b).

Manual interpretation of the information from CTG signals is a difficult task for physicians. A poor and inconsistent interpretation of CTG signals will lead to poor diagnosis and treatment, and eventually to fetal death. Using computers with machine learning capabilities on the attributes of CTG data, the fetal condition can be classified more efficiently and accurately. The performance of this classification task can be improved by the feature selection (FS) process, by which the appropriate attributes of the CTG data are identified and selected.

The attribute reduction performed by the FS process helps to improve the learning speed and the accuracy of prediction using simple rules, and the ability to visualize the data for selection of a model. Further, it also helps to reduce the dimensionality and remove the noise present in the data.

The FS process also assists the knowledge discovery process, which is performed in three stages, namely preprocessing of data, data mining, and postprocessing. The FS process ensures that data of good quality are supplied by the preprocessing stage to the mining stage, a powerful and systematic process of mining, and meaningful knowledge being delivered by the postprocessing stage. An excessive number of features present in the data generally leads to relatively ineffective mining results, and hence the number of features is reduced to the possible extent without compromising the quality of mining.

Even though there are many methodologies available for the interpretation of CTG data, their prediction accuracy is still not up to the mark (Liu and Motoda, 2000). Hence, it is still a challenge to develop an efficient and effective methodology with excellent accuracy of prediction.

In this chapter, three new and efficient FS techniques based on evolutionary methodologies, namely the Firefly Algorithm (FA), the Opposition-Based Firefly Algorithm (OBFA), and OBFA melded with Information Gain (IG-OBFA), are presented in detail. The first two methods employ the wrapper method, and the third one employs both filter and wrapper

methods. These FS techniques are used to find an optimal feature subset and are then combined with SVMs in order to enhance the accuracy of the classification done by the SVM. The CTG data set that is widely used by researchers (UCI Machine Learning Repository, 2019) has been used for the experiments.

Various performance measures such as accuracy, sensitivity, specificity, positive predictive value, negative predictive value, geometric mean, F1-measure, and area under ROC have been evaluated to assess the performance of these methods. Figure 17.1 shows the classification of the classifiers used for the classification of the CTG data set.

FIGURE 17.1 Classifiers for CTG data set.

Most of the existing attribute reduction techniques exploit the benefits of soft computing techniques such as ANNs, fuzzy logic, the genetic algorithm (GA), and combinations of these, such as neuro-fuzzy, genetic-neuro, etc.

Among the techniques employing ANNs, a neuro-fuzzy system to recognize the accelerative and decelerative patterns of the fetal heart rate signal (Romero et al., 2002), a neural-network-based classifier with eliminated potential outliers (Chitradevi et al., 2013; Tang et al., 2018), a machine learning technique (Sahin and Subasi, 2015), modular neural network models (Jadhav et al., 2011), models using discriminant analysis, decision tree (DT), and ANN (Huang and Hsu, 2012), particle swarm optimization (PSO) and GA-aided BPNNs (Hongbiao and Genwang, 2012), an optimal neural

network (ONN) (Bryan et al., 2012), ANN-based clustering with adaptive resonance theory 2 (ART2) and fuzzy decision trees (Ping et al., 2012), and the adaptive neuro-fuzzy inference system (ANFIS) (Hasan and Ertunc, 2013) are found to be appreciable in certain aspects of classification performance.

Further, the methods based on genetic algorithms, such as a CDSS using the improved adaptive genetic algorithm (IAGA) and extreme learning machine (ELM) (Sindhu et al., 2015), an SVM classifier with GA (Hasan, 2013; Subha et al., 2015), an intelligent heart disease decision support system (Ratnakar et al., 2013), an enhanced heart disease prediction system using GA (Anbarasi et al., 2010), a decision support system using SVM and the integer-coded genetic algorithm (ICGA) (Bhatia et al., 2008), a GA- and ANN-based algorithm (ElAlami, 2009), two-stage optimisation using GA (Huang et al., 2007), GA with the linear and nonlinear Great Deluge algorithm (Jaddi and Abdullah, 2013), mining and FS using GA (Sikora and Piramuthu, 2007), K-nearest neighbor (KNN) with the GA algorithm (Deekshatulu and Chandra, 2013), differential evolution (DE) with GA (Bharathi and Subashini, 2014), a wrapper method using GA and SVM (Zhuo et al., 2008), integer- and binary-coded GA-based SVMs (Nithya et al., 2013; DİKER et al., 2018), intrusion detection systems using GA (Aziz et al., 2013), a correlation-based GA method (Tiwari and Singh, 2010), and an ANN trained by the backpropagation algorithm combined with GA (Venkatesan and Premalatha, 2012), have contributed remarkable performance improvements.

The contributions to improve the performance of FS for CTG classification to predict fetal well-being using other optimization techniques include the ant colony optimization (ACO) technique with SVM (Abd-Alsabour and Randall, 2010; Al-Ani, 2005), a complementary particle swarm optimization algorithm (Chuang et al., 2013), the Gray-Wolf Optimization technique (Emary et al., 2015), a complementary binary particle swarm optimization algorithm (Chuang et al., 2013), multiobjective algorithms (Xue et al., 2012a), multiobjective binary PSO (Xue et al., 2012b), a combination of ACO and GA for SVM (Imani et al., 2012), the bat algorithm and optimum-path forest (Rodrigues et al., 2014), hybridized PSO, PSO-based relative reduct and PSO-based Quick Reduct (Inbarani et al., 2014), artificial bee colony (ABC) (Schiezaro and Pedrini, 2013), the bat algorithm for attribute reduction (Taha and Tang, 2013), a combination of the binary bat algorithm with the Optimum-Path Forest classifier (Nakamura et al., 2012), binary cuckoo search (Rodrigues et al., 2013), binary

particle swarm optimisation (Xue et al., 2012c), modified multi-swarm PSO (Liu et al., 2011), FA with rough set theory (Banati and Bajaj, 2011), and modified FA optimization (Emary et al., 2015).

In addition to the above-mentioned methods, which use soft computing techniques, there are other methodologies with remarkable contributions to the improvement of classification performance, namely information gain and adaptive models, an evolutionary neural network FS algorithm (Chudacek et al., 2008), adaptive boosting decision trees and a machine learning algorithm (Karabulut and Ibrikci, 2014), knowledge discovery in databases in machine learning, statistics, and databases (Fayyad et al., 1996), a wrapper-around-random-forest classification feature selection algorithm using the Boruta package (Kursa and Rudnicki, 2010), mutual-information-based greedy feature selection (Hoque et al., 2014), an empirical mode decomposition-based approach (Krupa et al., 2011; CÖMERT et al., 2018a), combined system identification and machine learning methods (Warrick et al., 2010), Random Forest, REPTree, and linear discriminant analysis-based algorithms (Tomáš et al., 2013), FS methods for the naïve Bayes (NB) classifier (Menai et al., 2013), continuous CTG monitoring using electronic fetal monitoring (Alfirevic et al., 2006), clustering, outlier detection, and classification by random tree and Quinlan's C4.5 algorithm (Jacob and Ramani, 2012), a supervised ANN (Sundar et al., 2013), medical decision support systems using NB and machine learning (Sontakke et al., 2019), multilayer perceptron and C4.5 (Aftarczuk, 2007), a fetal heart rate (FHR) baseline estimation algorithm (Nidhal et al., 2010), an algorithm using memory-less fading statistics (Rodrigues et al., 2011), a non-stress-test-based algorithm (Ergun et al., 2012), data mining techniques for heart disease prediction (Bhatla and Jyoti, 2012), SVM for comparative genomic hybridized data (Liu et al., 2008), a two-stage decision support system for diabetes disease (Ambica et al., 2013), chi-square and t-tests (Jeyachidra and Punithavalli, 2013), a Fisher ratio and mutual-information-based method (Vidyavathi and Ravikumar, 2008), the separability index matrix (Han et al., 2013), margin-based feature selection (Bachrach et al., 2004), the multiobjective genetic algorithm (MOGA) and a multiobjective version of forward sequential selection (Pappa et al., 2002), a dynamic mutual-information-based algorithm (Liu et al., 2009), multiclass SVM using GA (Agarwal and Bala, 2007), association rule mining with a decision tree algorithm (Rajendran and Madheswaran, 2010), cAnt-Miner2 and Max-Relevance and

Min-Redundancy feature selection algorithms (Michelakos et al., 2010), the Levenberg–Marquardt algorithm (Buck and Zhang, 2006), the wavelet transform (CÖMERT and Kocamaz, 2019), a prognostic model (CÖMERT et al., 2018c), and a hybrid data mining model (Ha and Joo, 2010).

17.2 CARDIOTOCOGRAPHY (CTG)

To monitor the UC and FHR, and thereby the condition of the fetus (such as fetal hypoxia), cardiotocography (CTG) is used (Patient.info, 2019). It is obtained from the abdomen of the pregnant woman using two external probes of the cardiotocograph instrument during the last trimester and also at intrapartum. Interpretation of the CTG signal is done based on the parameters of the structure "DR C BRaVADO," which are listed in Table 17.1 (Geeky.medics, 2019). Figure 17.2 shows a typical CTG recorded signal containing FHR and UC (Healthnetconnections, 2019).

TABLE 17.1 Parameters of CTG Signal

Parameter   Description
DR          Define risk
C           Contractions
BRa         Baseline rate
V           Variability
A           Accelerations
D           Decelerations
O           Overall impression

(Source: Adapted from Geeky.medics, 2019.)

FIGURE 17.2 Example of a CTG recorded signal.


(Source: Adapted from Healthnetconnections, 2019.)

Sometimes, internal monitoring of CTG is done when external monitoring is difficult or not possible. During internal CTG monitoring, the probe is inserted into the mother's womb to touch the scalp of

the fetus to get the FHR. The UC is measured using a catheter, a flexible tube inserted into the uterus.

17.2.1 UCI CTG DATASET

The Centre for Machine Learning and Intelligent Systems, Bren School of Information and Computer Science, University of California at Irvine, USA, maintains a free-to-access data bank named the UCI Machine Learning Repository (UCI Machine Learning Repository, 2019). The CTG data available in this data bank are one of the most widely referred data sets. There are 2126 fetal CTG recordings classified into three classes, namely Normal (N), Suspect (S), and Pathologic (P). There are 21 attributes for each CTG, with 1 attribute to represent the class. More details of these attributes and classes can be found in (UCI Machine Learning Repository, 2019).

17.3 CLASSIFIERS AND PERFORMANCE

Classifiers are the models used to classify the given input data into a class, by fitting the data with the label of a class based on the relationship between the attributes and the label. Some of the popular classifiers are SVMs, NB classifiers, decision tree classifiers, rule-based classifiers, neural networks, etc. Usually, the two stages of the classification process are learning the model from the training data set, which has class labels, and applying the model to the test set. The table containing the results of classification is called the "confusion matrix." The confusion matrix shows the number of times the classes are predicted correctly and wrongly, which gives the accuracy of classification. From the confusion matrix, a number of performance measures, such as accuracy, error rate, sensitivity, specificity, negative predictive value, geometric mean, precision (or positive predictive value), F-measure, area under ROC, etc., are evaluated in order to specify the effectiveness of the classifier. Following are the expressions for these performance metrics measured from the confusion matrix given in Table 17.2.

TABLE 17.2 Confusion Matrix

                         Predicted
                         Class A               Class B
ACTUAL   Class A         TRUE POSITIVE (TP)    FALSE NEGATIVE (FN)
         Class B         FALSE POSITIVE (FP)   TRUE NEGATIVE (TN)

Accuracy = (TP + TN) / (TP + TN + FP + FN) = Number of correct predictions / Total number of predictions (17.1)


Error rate = (FP + FN) / (TP + TN + FP + FN) = Number of wrong predictions / Total number of predictions (17.2)

Negative predictive value: NPV = TN / (TN + FN) (17.3)

Precision (or positive predictive value): PPV = TP / (TP + FP) (17.4)

Sensitivity (or recall) = TP / (TP + FN) (17.5)

Specificity = TN / (TN + FP) (17.6)

Geometric mean: Gmean = √(specificity × sensitivity) (17.7)

F-measure = 2 × (precision × sensitivity) / (precision + sensitivity) (17.8)

Area under ROC = (sensitivity + specificity) / 2 (17.9)

17.4 FEATURE SELECTION

FS, also called variable selection or attribute selection, eliminates the features which are irrelevant, unnecessary, or noisy, and as a result creates a reduced feature subset. The reduced size of the feature subset helps to improve the performance of classification. FS is done either by scalar selection or by vector selection (Dua and Du, 2011). Since scalar selection involves selecting features individually, it is a simple but less efficient method. On the other hand, vector selection establishes a relationship across the features based on a wrapper or a filter, and hence it is relatively complex and efficient. The filter-based selection of features is done based on the statistical properties of the features (John et al., 1994). Since it does not employ any learning algorithms for this task, it is very simple and fast. However, because it does not consider the dependencies of any feature on other features, it may result in a poor classification. The wrapper-based feature selection is done using repeated learning, which looks for the optimal or a near-optimal subset, followed by cross-validation. As a result, it is more computationally expensive but also more accurate.

Methods like information gain, the chi-square test, the Fisher score, the correlation coefficient and variance threshold, the gain ratio attribute evaluator (Kantardzic, 2013), the information gain attribute evaluator (Liu and Motoda, 2008), etc., are the popular filter-based methods. Genetic algorithms, multiclass SVM classifiers (Ahuja and Yadav, 2012), recursive elimination, sequential selection algorithms, etc., are examples of wrapper-based selection methods.
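The performance measures of Section 17.3 (Eqs. 17.1–17.9) are simple arithmetic on the four counts of Table 17.2. A minimal Python sketch (the function name `ctg_metrics` and the illustrative counts are ours, not from the chapter):

```python
import math

def ctg_metrics(tp, fn, fp, tn):
    """Measures of Eqs. (17.1)-(17.9) from the counts of Table 17.2."""
    total = tp + tn + fp + fn
    sens = tp / (tp + fn)              # Eq. (17.5): sensitivity (recall)
    spec = tn / (tn + fp)              # Eq. (17.6): specificity
    prec = tp / (tp + fp)              # Eq. (17.4): precision (PPV)
    return {
        "accuracy": (tp + tn) / total,             # Eq. (17.1)
        "error_rate": (fp + fn) / total,           # Eq. (17.2)
        "npv": tn / (tn + fn),                     # Eq. (17.3)
        "ppv": prec,
        "sensitivity": sens,
        "specificity": spec,
        "g_mean": math.sqrt(spec * sens),          # Eq. (17.7)
        "f_measure": 2 * prec * sens / (prec + sens),  # Eq. (17.8)
        "auc": (sens + spec) / 2,                  # Eq. (17.9)
    }

# illustrative counts for one class treated as the positive class
m = ctg_metrics(tp=80, fn=20, fp=10, tn=90)
```

For the three-class CTG problem (N/S/P), such counts are typically obtained per class in a one-vs-rest fashion and then averaged.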

17.5 FEATURE SELECTION USING FIREFLY ALGORITHM

17.5.1 FIREFLY ALGORITHM

Artificial intelligence (AI) techniques based on the behavior of swarms such as bees, birds, fishes, ants, etc., are called swarm intelligence (SI). These swarms possess an organized behavior due to the interactivity among the individuals. One of the latest SI methods is based on the behavior of fireflies and is called the FA. It was introduced by Yang (Yang, 2010) as a meta-heuristic and stochastic algorithm to solve optimization problems. The flashing behavior of fireflies is considered as randomization for searching for the optimum solutions. The FA has the advantage that it prevents the searching process from being confined in a local optimum, and the candidate solution is improved by the local search process until the algorithm attains the optimum solution, even in fast-changing and noisy environments. It is found that the FA, its modified versions, and hybridized versions are widely used in a variety of single- and multiobjective engineering optimization problems (Yang, 2010; Fister et al., 2013; Manivanna Boopathi and Abudhahir, 2015; Mohamed Ali et al., 2018). It is also found that FAs are efficiently employed for problems of classification (Subha and Murugan, 2014).

Fireflies emit flashing light from their body in an orderly manner (Yang, 2010), which is the result of a bioluminescent reaction that happens in their body. They are capable of producing the light as high-intensity, disjunct flashes. These beetles use the flashing light to attract their partners for copulation and also as a cautioning message to other fireflies when required. It is an interesting fact that, upon seeing the flashing light from the male firefly, the female partner generates a response light flash comprising information about its identity and gender.

When the distance from which the flashing light of a firefly is seen decreases, the intensity of the light being seen increases, and vice versa. Hence, the attractiveness is directly proportional to the intensity. The female fireflies are more attracted by the male firefly's light of higher intensity. However, it is to be noted that the female fireflies are not capable of differentiating a larger intensity of light from a longer distance and a smaller intensity from a shorter distance. In addition to the above facts being considered for developing the Firefly Algorithm, three facts are assumed: (i) all fireflies are genderless; (ii) the light intensity of each firefly has a direct relationship with its attractiveness; (iii) the

nature of the fitness function has an influence on the light intensity of a firefly.

The overall idea of the FA rests on the relationship that the light intensity (I) is in an inverse-square relationship with the distance (x). It means that the light is seen much brighter than the actual intensity at the source (Is) when the distance from which it is seen decreases. This relationship can be mathematically written as

I(x) = Is / x²   (17.10)

Hence, the fitness or objective function of the FA is evaluated in such a way that the solutions are represented by the light intensity of each firefly, which is directly proportional to the value of the fitness or objective function.

In the FA, a random initial population is initialized with the defined values of parameters such as the randomization parameter (r), the attractiveness (a), and the coefficient of absorption (c). With these arrangements, the solutions are searched by determining the fitness values continuously for the given number of iterations.

The light intensity of a firefly seen by another firefly in a medium varies with the distance (x) as

I = Is e^(−cx)   (17.11)

where Is is the intensity of light at the source. Hence, it can be written from the above two equations of light intensity that

I(x) = Is e^(−cx²)   (17.12)

However, the attractiveness equation relating it with the light intensity can be written, using the attractiveness at zero distance (as), as

a = as e^(−cx²)   (17.13)

The Euclidean distance between two fireflies, namely py and pz, can be written by representing the mth component of the spatial coordinate of py as py,m and that of pz as pz,m:

x_yz = ‖py − pz‖ = √( Σ_{m=1}^{n} (py,m − pz,m)² )   (17.14)

The attraction of firefly y by another firefly z can be written, using ψy as a vector containing Gaussian-distributed random numbers in the space [0, 1], as

py = py + as e^(−c x_yz²) (pz − py) + r ψy   (17.15)

The present state of the yth firefly, the attraction of the yth firefly by another firefly, and the movement of the yth firefly in a random manner are the three factors which are considered to represent the firefly's movement.

Therefore, these three parameters have to be adjusted in order to improve the performance of the FA. The whole sequence of the FA for finding the optimum solution in the given number of iterations is presented in the form of pseudocode in Figure 17.3 (Manivanna Boopathi and Abudhahir, 2015; Mohamed Ali et al., 2018; Subha and Murugan, 2014).

FIGURE 17.3 Firefly algorithm — pseudocode.
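Since the pseudocode of Figure 17.3 is not reproduced here, the core of the method can be sketched as follows: a toy illustration of the movement rule of Eq. (17.15) with the chapter's settings (population 25, 100 iterations, r = 0.5, attractiveness 0.2, c = 1). The stand-in objective and the uniform noise term are our simplifications (the chapter specifies Gaussian-distributed ψ); this is not the authors' MATLAB implementation:

```python
import math
import random

def firefly_step(pop, fitness, a_s=0.2, c=1.0, r=0.5):
    """One FA generation: each firefly y moves toward every brighter
    firefly z, following Eq. (17.15); attractiveness (Eq. 17.13)
    decays with the squared distance of Eq. (17.14)."""
    n, dim = len(pop), len(pop[0])
    light = [fitness(p) for p in pop]          # intensity = objective value
    for y in range(n):
        for z in range(n):
            if light[z] > light[y]:            # z is brighter, so it attracts y
                x2 = sum((pop[y][m] - pop[z][m]) ** 2 for m in range(dim))
                beta = a_s * math.exp(-c * x2)                 # Eq. (17.13)
                pop[y] = [pop[y][m] + beta * (pop[z][m] - pop[y][m])
                          + r * (random.random() - 0.5)        # random walk term
                          for m in range(dim)]
                light[y] = fitness(pop[y])
    return pop

random.seed(1)
fit = lambda p: -sum(v * v for v in p)         # toy objective, maximum at origin
pop = [[random.uniform(-4, 4) for _ in range(2)] for _ in range(25)]
initial_best = max(map(fit, pop))
for _ in range(100):
    pop = firefly_step(pop, fit)
final_best = max(map(fit, pop))                # never worse than initial_best
```

Because the brightest firefly is never moved, the best objective value is monotonically non-decreasing over the run; in the chapter, the objective is the SVM classification accuracy over the selected feature subset (Section 17.5.2).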

17.5.2 FEATURE SUBSET SELECTION USING FIREFLY ALGORITHM

In order to improve the classification accuracy and the other performance measures, FS is employed. The process of FS results in a feature subset containing g features out of the whole feature set containing h features. As mentioned earlier, the FS methods

can be based on either wrapper or filter techniques.

Feature subsets are found using various optimization techniques such as ABC (Uzer et al., 2013; Schiezaro and Pedrini, 2013), ACO, PSO, etc. The feature selection method using SVM with the FA is presented here, which results in a better classification performance compared to the above-mentioned methods.

Every feature present in the UCI CTG data set is represented either by 1 or by 0, indicating the presence or absence of that particular feature, respectively. For the FA to find the optimum feature subset, the objective function is taken as the accuracy of classification of the SVM.

The FA is developed and run with an initial population of 25 for 100 iterations, which is set as the stopping criterion for the algorithm. The objective function of the FA is to maximize the light intensity of a firefly, which in turn maximizes the classification accuracy. The other parameters of the FA, namely the randomization (r), attractiveness (a), and coefficient of absorption (c), are selected as 0.5, 0.2, and 1, respectively. At the end of 100 iterations, an optimal feature subset is found by the FA which improves the accuracy of classification of the SVM. In total, 25 trials were done using the FA, and the best results of these 25 runs are presented.

17.5.3 RESULTS OF SIMULATION EXPERIMENTS

The results of the simulation experiments performed in MATLAB, with both the actual full feature set (without FA-based FS) and the reduced feature set (with FA-based FS), are presented in Table 17.3.

To substantiate the better performance of SVM with FA-based FS, other performance measures, namely sensitivity, specificity, positive predictive value, and negative predictive value, are also evaluated and presented in Table 17.4.

TABLE 17.3 Average Accuracy of SVM With and Without FA-based FS

Class        Accuracy (%)
             Actual Full Feature Set   Reduced Optimal Feature Set
             (Without FS)              (With FA-based FS)
Normal       94.44                     95.64
Suspect      66.77                     77.62
Pathologic   72.15                     81.25
Average      88.75                     91.92

TABLE 17.4 Other Performance Metrics of SVM With and Without FA-based FS

Performance Metrics (%)   Without FS   With FA-based FS
Sensitivity               77.79        84.83
Specificity               90.22        93.78
PPV                       78.29        83.14
NPV                       90.70        93.26

TABLE 17.4 (Continued)

Performance Metrics (%)   Without FS   With FA-based FS
G-Mean                    83.77        89.19
F-Measure                 78.08        83.94
Area under ROC            84.00        89.30

Tables 17.3 and 17.4 show that the FA-based FS with SVM exhibits an appreciable improvement of performance in all aspects. The measures of these performance metrics are also presented graphically in Figures 17.4 and 17.5.

FIGURE 17.4 Accuracy of SVM classification with and without FA-based FS.

FIGURE 17.5 Other performance metrics of SVM with and without FA-based FS.

The percentage increases in the performance of SVM using FA-based FS for each of the performance metrics are consolidated in Table 17.5 and graphically presented in Figure 17.6.
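The improvement figures consolidated in Table 17.5 are the relative changes between the "without FS" and "with FA-based FS" columns of Tables 17.3 and 17.4, i.e., 100 × (new − old)/old; a quick check in Python:

```python
# (without FS, with FA-based FS) value pairs from Tables 17.3 and 17.4
pairs = {
    "Average accuracy": (88.75, 91.92),
    "Sensitivity":      (77.79, 84.83),
    "Specificity":      (90.22, 93.78),
    "PPV":              (78.29, 83.14),
    "NPV":              (90.70, 93.26),
    "G-Mean":           (83.77, 89.19),
    "F-Measure":        (78.08, 83.94),
    "Area under ROC":   (84.00, 89.30),
}
improvement = {name: round(100 * (new - old) / old, 2)
               for name, (old, new) in pairs.items()}
# reproduces Table 17.5, e.g. Sensitivity -> 9.05, NPV -> 2.82
```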

TABLE 17.5 Percentage Increase in Performance of SVM Using FA-based FS

Sl. No.   Performance Metric   Improvement (%)
1.        Average accuracy     3.57
2.        Sensitivity          9.05
3.        Specificity          3.95
4.        PPV                  6.19
5.        NPV                  2.82
6.        G-Mean               6.47
7.        F-Measure            7.51
8.        Area under ROC       6.31

From the results shown above, it is clear that using FA-based FS has improved the performance of the SVM classifier, from a minimum increase of 2.82% in NPV to a maximum increase of 9.05% in sensitivity.

17.6 FEATURE SELECTION USING OPPOSITION-BASED FIREFLY ALGORITHM

17.6.1 OPPOSITION-BASED FIREFLY ALGORITHM

It is encouraging that the performance of the simple FA is much better than that of the other evolutionary algorithms. However, it is still a challenge in the FA to prevent premature convergence before reaching the optimum solution.

To get rid of this challenge and expedite the convergence, the FA has been modified with added features. The popular modified FAs are the Fuzzy

FIGURE 17.6 Percentage increase in performance of SVM using FA-based FS.

FA, Lévy-flight FA, Jumper FA, chaotic FA, and self-adaptive step FA (Uzer et al., 2013). Another efficient modified FA is named the opposition-based FA, which uses opposition-based learning (OBL) (Schiezaro and Pedrini, 2013; Draa et al., 2015; Subha and Murugan, 2016; Tizhoosh, 2006; Xu et al., 2011; Yu et al., 2015). The other optimization algorithms, such as GA, ACO, PSO, biogeography optimization,

the differential evolution algorithm, the gravitational search algorithm, and simulated annealing, have also been modified using OBL for improving their performance.

The OBL is developed by taking both a solution and its opposite into consideration. In simple words, the OBL-based FA uses the opposite of a worst firefly as a new firefly to replace that worst firefly. This process expels the worst firefly from its actual path so that it goes out of the local optimum.

17.6.2 OPPOSITION-BASED LEARNING

As mentioned earlier, the OBL simultaneously uses a solution and also its equivalent, opposite solution. For instance, if a real number is denoted as a ∈ [b, c], then its opposite number can be denoted as a′ = b + c − a. Extending this idea to larger dimensions can be done as follows. For a vector A(a1, a2, ..., an) of dimension n such that ai ∈ [bi, ci] with i = 1, 2, ..., n, the opposite vector can be obtained as A′(a′1, a′2, ..., a′n) with a′i = bi + ci − ai.

In the OBL-based FA developed for feature selection, the OBL has been used in two stages: during population initialization and while creating new generations. In the OBFA, firstly, an initial population of size n is created together with the opposite positions of its members, resulting in a total of 2n fireflies. From these 2n fireflies, the n fittest fireflies are identified based on their fitness values. The overall OBFA sequence is given as pseudocode in Figure 17.7.

The OBL is then used to update the positions of the fireflies in every iteration by replacing the fireflies with the worst fitness by their opposite ones. During the initial stage of optimization, the number of worst fireflies (w) is kept large in order to perform a productive global search. However, as the number of iterations increases, the number of worst fireflies (w) being considered is gradually reduced to ensure local exploitation. The way in which the number of worst fireflies (w) is chosen as the iterations progress is given by the following equation, using g as the present generation, Gmax as the maximum limit of generations, and the function Round() to round the number to its nearest integer:

w = Round( 0.33 n (Gmax − g) / Gmax )   (17.16)

17.6.3 FEATURE SUBSET SELECTION USING OPPOSITION-BASED FIREFLY ALGORITHM

To perform feature selection, the OBFA is used with the SVM classifier on the UCI CTG data set. The
Enhanced Classifcation Performance of Cardiotocogram Data 387

FIGURE 17.7 Opposition-based firefly algorithm—pseudocode.
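The opposition operation of Section 17.6.2 and the shrinking worst-firefly count of equation (17.16) can be sketched in plain Python as follows. This is a minimal illustration, not the chapter's exact implementation: the function names are illustrative, and a larger fitness value is assumed to mean a fitter firefly (e.g., classification accuracy).

```python
import random

def opposite(position, lower, upper):
    # OBL: for each coordinate a_i in [b_i, c_i], the opposite is a'_i = b_i + c_i - a_i
    return [b + c - a for a, b, c in zip(position, lower, upper)]

def obfa_init(n, dim, lower, upper, fitness):
    # Stage 1: create n random fireflies plus their n opposites (2n candidates),
    # then keep the n fittest (larger fitness value = fitter firefly).
    pop = [[random.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n)]
    candidates = pop + [opposite(p, lower, upper) for p in pop]
    candidates.sort(key=fitness, reverse=True)
    return candidates[:n]

def num_worst(n, g, g_max):
    # Equation (17.16): the number of worst fireflies w replaced by their
    # opposites shrinks from about 0.33n toward 0 as g approaches g_max.
    return int(round(0.33 * n * (g_max - g) / g_max))
```

For n = 20 fireflies and Gmax = 100 generations, w starts at 7 and falls to 0 in the final generation, matching the global-search-then-local-exploitation behavior described above.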

The presence and absence of a feature in the data set are represented by 1 and 0, respectively. The data set is divided into two parts in such a way that three-fourths of the data set is used for training the classifier and the remaining one-fourth is used to test it. The FA parameters, such as the objective function, initial population, randomization, attractiveness, coefficient of absorption, and number of iterations, are taken the same as those of the standard FA used earlier. Also, 25 trials of simulations were performed using the OBFA, and the best of these 25 runs are presented.

17.6.4 RESULTS OF SIMULATION EXPERIMENTS

Two SVM classification experiments are performed using the full data set and the data set reduced by the OBFA. The accuracy of these two classifications is presented in Table 17.6. Further, the other measured performance metrics for these classifiers are consolidated in Table 17.7.
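The wrapper evaluation just described — a binary mask marks the selected features, and a classifier is trained on three-fourths of the data and tested on the remaining fourth — can be sketched in plain Python. The nearest-centroid classifier and the unshuffled split below are simplifications standing in for the chapter's SVM and sampling procedure, and the helper names are illustrative:

```python
def apply_mask(rows, mask):
    # Keep only the feature columns whose mask bit is 1
    return [[v for v, m in zip(row, mask) if m] for row in rows]

def wrapper_fitness(X, y, mask, train_frac=0.75):
    """Fitness of a feature mask: train on the first 75% of the samples and
    return accuracy on the remaining 25% (nearest-centroid stand-in for SVM)."""
    Xm = apply_mask(X, mask)
    cut = int(len(Xm) * train_frac)
    X_tr, y_tr, X_te, y_te = Xm[:cut], y[:cut], Xm[cut:], y[cut:]
    centroids = {}
    for c in set(y_tr):
        members = [x for x, lab in zip(X_tr, y_tr) if lab == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    predictions = [min(centroids, key=lambda c: dist2(x, centroids[c]))
                   for x in X_te]
    return sum(p == t for p, t in zip(predictions, y_te)) / len(y_te)
```

In the OBFA wrapper, a fitness of this kind would be evaluated for each firefly's feature mask in every generation.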

TABLE 17.6 Average Accuracy of SVM With and Without OBFA-based FS

Data set                                            Average accuracy (%)
Actual full feature set (without FS)                88.75
Reduced optimal feature set (with OBFA-based FS)    92.85

TABLE 17.7 Other Performance Metrics of SVM With and Without OBFA-based FS

Performance Metrics (%)   Without FS   With OBFA-based FS
Sensitivity               77.79        83.81
Specificity               90.22        93.72
PPV                       78.29        85.45
NPV                       90.70        95.02
G-mean                    83.77        88.62
F-measure                 78.08        84.62
Area under ROC            84.00        88.76

It is found that the average accuracy is 88.75% with the full feature set, 91.92% with the optimal feature set produced by FA, and 92.85% with the optimal feature set produced by OBFA.

Figures 17.8 and 17.9 are the graphical presentations of the results of the OBFA-based SVM classifier for the UCI CTG data set.

FIGURE 17.8 Average accuracy of SVM with and without OBFA-based FS.

FIGURE 17.9 Other performance metrics of SVM with and without OBFA-based FS.
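All the metrics tabulated in this chapter follow the standard confusion-matrix definitions, and each improvement percentage is the relative gain of the with-FS value over the without-FS baseline. A small sketch (helper names are illustrative):

```python
import math

def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)            # sensitivity (recall)
    spec = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sens,
        "specificity": spec,
        "PPV": ppv,
        "NPV": npv,
        "G-mean": math.sqrt(sens * spec),
        "F-measure": 2 * ppv * sens / (ppv + sens),
    }

def improvement(without_fs, with_fs):
    # Relative gain (%) of a metric after feature selection
    return (with_fs - without_fs) / without_fs * 100

# Accuracy rising from 88.75% (no FS) to 92.85% (OBFA-based FS, Table 17.6)
# corresponds to the 4.62% improvement reported in Table 17.8:
assert round(improvement(88.75, 92.85), 2) == 4.62
```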

As presented for the FA-based FS, the increase in performance of the OBFA-based SVM is given in terms of the performance metrics in Table 17.8 and Figure 17.10.

TABLE 17.8 Percentage Increase in Performance of SVM Using OBFA-based FS

Sl. No.  Performance Metric   Improvement (%)
1.       Average accuracy     4.62
2.       Sensitivity          7.74
3.       Specificity          3.88
4.       PPV                  9.15
5.       NPV                  4.76
6.       G-Mean               5.79
7.       F-Measure            8.38
8.       Area under ROC       5.67

The OBFA-based FS has resulted in a maximum increase of 9.15% in PPV and a minimum increase of 3.88% in specificity. Hence, the OBFA performs well on feature selection.

17.7 FEATURE SELECTION USING OPPOSITION-BASED FIREFLY ALGORITHM MELDED WITH INFORMATION GAIN

17.7.1 IG-OBFA-BASED FEATURE SELECTION

It is always important that any feature in a data set which contains useful information about it should not be ignored during the feature selection process. Removing such an apposite feature will lead to poor classification and thereby poor prediction too. Hence, it is good practice to employ techniques that evaluate the relevancy of each feature to the data set before ignoring it for feature reduction (Zhang et al., 2018). One of the successful filter-based techniques for assessing the relevance of

FIGURE 17.10 Percentage increase in performance of SVM using OBFA-based FS.



a feature to its associated data set is IG (Sui, 2013; Azhagusundari and Thanamani, 2013; Mitchell, 1997; Porkodi, 2014; Subha et al., 2017).

In order to further improve the classification performance of the OBFA-based SVM classifier, a new melded method is presented here which employs IG with the OBFA for the SVM classifier to classify the UCI CTG data set.

In the IG-OBFA-based feature selection process, the IG of all features in the data set is determined, and the features are arranged in descending order based on their IG values. Then, the top 15 features are taken as the reduced feature set and presented as the initial population for the OBFA, with 1's and 0's representing the presence and absence of a feature in the data set, respectively, in order to produce the optimum feature set. As performed for FA and OBFA, 25 trials were done using IG-OBFA too, and the best of the results are presented here.

17.7.2 RESULTS OF SIMULATION EXPERIMENTS

As done in the previous experiments, the training and testing data are selected in a 75:25 ratio of the full data set. Classification experiments are performed with the full feature set, the reduced feature set produced by IG only, and the reduced optimum feature set produced by IG-OBFA, and the results are presented below.

TABLE 17.9 Average Accuracy of SVM Without FS, with IG and with IG-OBFA-based FS

Data Set                                       Average Accuracy (%)
Actual full feature set (without FS)           88.75
Reduced feature set (with IG-based FS)         89.47
Reduced feature set (with IG-OBFA-based FS)    96.24

Table 17.9 shows that using IG alone for feature selection slightly increases the average accuracy compared with using the full feature set. However, there is a great improvement in average accuracy, from 88.75% to 96.24%, when using IG-OBFA-based FS instead of the full feature set. The other performance metrics, namely specificity, sensitivity, PPV, NPV, G-mean, F-measure, and area under ROC, are also measured for these classifications and presented in Table 17.10.

Figures 17.11 and 17.12 are the graphical presentations of the results of the IG and IG-OBFA-based SVM classifiers for the classification of the UCI CTG data set.
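The IG ranking step described above can be sketched for discrete-valued features as follows, computing IG(S, A) = H(S) − Σv (|Sv|/|S|) H(Sv) as in Mitchell (1997). Continuous CTG features would first need discretization, and the function names here are illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy H(S) of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    # IG(S, A) = H(S) - sum over feature values v of |S_v|/|S| * H(S_v)
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def top_k_features(feature_columns, labels, k=15):
    # Rank features by IG in descending order and keep the top k,
    # which then seed the OBFA's initial population.
    ranked = sorted(range(len(feature_columns)),
                    key=lambda j: info_gain(feature_columns[j], labels),
                    reverse=True)
    return ranked[:k]
```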

TABLE 17.10 Other Performance Metrics of SVM Without FS, with IG and with IG-OBFA-based FS

Performance Metrics (%)   Without FS   With IG-based FS   With IG-OBFA-based FS
Sensitivity               77.79        81.07              96.26
Specificity               90.22        91.14              91.92
PPV                       78.29        78.48              93.33
NPV                       90.70        91.29              97.44
G-mean                    83.77        85.96              94.06
F-measure                 78.08        79.75              92.61
Area under ROC            84.00        86.11              94.09

FIGURE 17.11 Average accuracy of SVM without FS, with IG and with IG-OBFA-based
FS.

FIGURE 17.12 Other performance metrics of SVM without FS, with IG and with
IG-OBFA-based FS.

TABLE 17.11 Performance Measures of Classification Using Various Feature Selection Methods for UCI CTG Data Set

                          Without FS                    With FS
                                                        Filter Techniques                  Wrapper Techniques
Performance Metrics (%)   DT     MLP    NB     SVM      Chi-Squared  Gain Ratio  IG        GA     FA     OBFA   IG-OBFA
Average Accuracy          88.15  83.45  79.69  88.75    87.40        86.46       89.47     91.35  91.92  92.85  96.24
Sensitivity               78.60  72.21  69.74  77.79    74.77        74.00       81.07     80.71  84.83  83.81  91.92
Specificity               90.99  90.12  86.83  90.22    89.33        89.2        91.14     92.50  93.78  93.72  96.26
PPV                       75.94  71.60  62.38  78.29    74.92        72.44       78.48     83.06  83.14  85.45  93.33
NPV                       90.09  86.43  83.11  90.70    89.78        88.89       91.29     93.77  93.26  95.02  97.44
G-Mean                    84.12  79.93  77.34  83.77    80.90        80.46       85.96     85.92  89.19  88.62  94.06
F-Measure                 77.14  71.45  65.28  78.08    74.78        73.03       79.75     81.87  83.94  84.62  92.61
Area under ROC            84.79  81.17  78.29  84.00    82.05        81.60       86.11     86.61  89.30  88.76  94.09

FIGURE 17.13 Performance measures of classification using various feature selection methods for UCI CTG data.

17.8 CONCLUSION

This chapter has briefly presented feature selection techniques to improve the performance of SVM classification of the UCI CTG data set. By performing feature selection, the size of the feature set is reduced, and thereby the task of classification is made less computationally expensive and more efficient. Optimization techniques such as FA, OBFA, and IG-OBFA have been used to find the optimum feature set containing only the most influential features of the whole feature set. These three feature selection techniques and their performance when combined with the SVM classifier were presented in this chapter.

An overall experimental comparison of the performances of various classification methods has also been carried out. These methods include classifiers without any feature selection, such as DT, multilayer perceptron (MLP), NB, and SVM; classifiers using filter techniques for feature selection, such as the chi-squared, gain ratio, and IG methods; and classifiers using wrapper techniques, such as the GA, FA, OBFA, and IG-OBFA methods. Various performance measures, namely average accuracy, sensitivity, specificity, positive predictive value (PPV), negative

predictive value (NPV), G-mean, F-measure, and area under ROC, are evaluated for all these classifiers. The results of all these classifications for the UCI CTG data set are summarized in Table 17.11. This table shows that the classification performance is highly improved when employing IG-OBFA.

KEYWORDS

• cardiotocogram (CTG)
• feature selection
• classification
• performance metrics
• fetal heart rate
• uterine contraction
• clinical decision support system
• firefly algorithm
• opposition-based learning
• information gain
• support vector machines

REFERENCES

Abd-Alsabour, N.; Randall, M. Feature selection for classification using an ant colony system. Sixth IEEE International Conference on e-Science Workshops, 2010, 86–91.
Aftarczuk, K. Evaluation of selected data mining algorithms implemented in medical decision support systems. Master Thesis, Blekinge Institute of Technology, Sweden, 2007.
Agarwal, RK.; Bala, R. A hybrid approach for selection of relevant features for microarray datasets. International Journal of Computer and Information Engineering. 2007, 1(2), 1319–1325.
Ahmed, Al-Ani. Feature subset selection using ant colony optimization. International Journal of Computational Intelligence. 2005, 2(1), 53–58.
Ahuja, Y.; Yadav, SK. Multiclass classification and support vector machine. Global Journal of Computer Science and Technology Interdisciplinary, 2012, 12(11), 14–20.
Alfirevic, Z.; Devane, D.; Gyte, GM.; Cuthbert, A. Continuous cardiotocography (CTG) as a form of electronic fetal monitoring (EFM) for fetal assessment during labour. Cochrane Database of Systematic Reviews, 2006, 2(2). DOI: 10.1002/14651858.CD006066.pub3.
Ambica, A.; Gandi, S.; Kothalanka, A. An efficient expert system for diabetes by naive Bayesian classifier. International Journal of Engineering Trends and Technology. 2013, 4(10), 4634–4639.
Anbarasi, M.; Anupriya, E.; Iyengar, N. Enhanced prediction of heart disease with feature subset selection using genetic algorithm. International Journal of Engineering Science and Technology. 2010, 2(10), 5370–5376.
Arumugam, Manivanna Boopathi.; A, Abudhahir. Firefly algorithm tuned fuzzy set-point weighted PID controller for antilock braking systems. Journal of Engineering Research. 2015, 3(2), 79–94.
Azhagusundari, B.; Thanamani, Antony Selvadoss. Feature selection based on information gain. International Journal of Innovative Technology and Exploring Engineering, 2013, 2(2), 18–21.
Aziz, ASA.; Azar, AT.; Salama, MA.; Hassanien, AE.; Hanafy, SEO. Genetic algorithm with different feature selection techniques for anomaly detectors generation. Federated Conference on

Computer Science and Information Systems (FedCSIS), 2013, 769–774.
Bachrach, RG.; Navot, A.; Tishby, N. Margin based feature selection—theory and algorithms. Proceedings of the Twenty-First International Conference on Machine Learning. 2004, 43–51.
Banati, H.; Bajaj, M. Firefly based feature selection approach. International Journal of Computer Science Issues. 2011, 8(4), 473–480.
Bharathi, PT.; Subashini, P. Differential evolution and genetic algorithm based feature subset selection for recognition of river ice types. Journal of Theoretical & Applied Information Technology. 2014, 67(1), 254–262.
Bhatia, S.; Prakash, P.; Pillai, GN. SVM based decision support system for heart disease classification with integer-coded genetic algorithm to select critical features. Proceedings of the World Congress on Engineering and Computer Science (WCECS), 2008, 22–24.
Bhatla, N.; Jyoti, K. An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering. 2012, 1(8), 1–4.
Buck, TE.; Zhang, B. SVM kernel optimization: An example in yeast protein subcellular localization prediction. Project Report, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 2006.
Chitradevi, Muthusamy.; Sundar, Chinnasamy.; Geetharamani, Gopal. An outlier based bi-level neural network classification system for improved classification of cardiotocogram data. Life Science Journal, 2013, 10(1), 244–251.
Chuang, LY.; Jhang, HF.; Yang, CH. Feature selection using complementary particle swarm optimization for DNA microarray data. Proceedings of International Conference of Engineers and Computer Scientists, Hong Kong, 2013.
Chudáček, V.; Spilka, J.; Huptych, M.; Georgoulas, G.; Lhotská, L.; Stylios, C.; Koucky, M.; Janku, P. Linear and non-linear features for intrapartum cardiotocography evaluation. Computing in Cardiology, 2010, 37, 999–1002.
Chudacek, V.; Spilka, J.; Rubackova, B.; Koucky, M.; Georgoulas, G.; Lhotska, L.; Stylios, C. Evaluation of feature subsets for classification of cardiotocographic recordings. Computers in Cardiology, 2008, 845–848.
Cömert, Zafer.; Kocamaz, AF.; Subha, Velappan. Prognostic model based on image-based time-frequency features and genetic algorithm for fetal hypoxia assessment. Computers in Biology and Medicine, 2018, 99, 85–97.
Cömert, Z.; Kocamaz, AF. Using wavelet transform for cardiotocography signals classification. 25th Signal Processing and Communications Applications Conference (SIU), Turkey, 2017.
Cömert, Zafer.; Yang, Zhan.; Subha, Velappan.; Kocamaz, Adnan Fatih.; Manivanna Boopathi, Arumugam. Performance evaluation of empirical mode decomposition and discrete wavelet transform for computerized hypoxia detection and prediction. 26th IEEE Signal Processing and Communication Applications (SIU) Conference, Turkey, 2018a.
Cömert, Zafer.; Yang, Zhan.; Subha, Velappan.; Kocamaz, Adnan Fatih.; Manivanna Boopathi, Arumugam. The influences of different window functions and lengths on image-based time-frequency features of fetal heart rate signals. 26th IEEE Signal Processing and Communication Applications (SIU) Conference, Turkey, 2018b.
Deekshatulu, BL.; Chandra, P. Classification of heart disease using k-nearest neighbor and genetic algorithm. Procedia Technology. 2013, 10, 85–94.

Diker, Aykut.; Cömert, Zafer.; Subha, Velappan.; Avci, Engin. Intelligent system based on genetic algorithm and support vector machine for detection of myocardial infarction from ECG signals. 26th IEEE Signal Processing and Communication Applications (SIU) Conference, Turkey, 2018.
Draa, Amer.; Benayad, Zeyneb.; Djenna, Fatima Zahra. An opposition-based firefly algorithm for medical image contrast enhancement. International Journal of Information and Communication Technology. 2015, 7(4/5), 385–405.
ElAlami, ME. A filter model for feature subset selection based on genetic algorithm. Knowledge-Based Systems. 2009, 22(5), 356–362.
Emary, E.; Zawbaa, HM.; Ghany, KKA.; Hassanien, AE.; Parv, B. Firefly optimization algorithm for feature selection. Proceedings of the 7th Balkan Conference on Informatics, 2015, 26.
Emary, E.; Zawbaa, HM.; Grosan, C.; Hassenian, AE. Feature subset selection approach by gray-wolf optimization. Afro-European Conference for Industrial Advancement. 2015, 1–13.
Ergun, B.; Sen, S.; Kilic, Y.; Kuru, O.; Ozsurmeli, M. The role of non-stress test to decision making procedure in pregnant women with Cesarean delivery "outcomes of our clinic and literature review". Turkish Journal of Obstetrics and Gynecology. 2012, 9(1), 59–64.
Fayyad, U.; Shapiro, GP.; Smyth, P. From data mining to knowledge discovery in databases. AI Magazine, 1996, 17(3), 37–54.
Fontenla, Romero.; Guijarro, Berdiñas.; Alonso, Betanzos. Symbolic, neural and neuro-fuzzy approaches for pattern recognition in cardiotocograms. Advances in Computational Intelligence and Learning, 2002, 18, 489–500.
Geeky Medics. http://geekymedics.com/how-to-read-a-ctg/ (accessed June 21, 2019).
Ha, SH.; Joo, SH. A hybrid data mining method for the medical classification of chest pain. International Journal of Computer and Information Engineering. 2010, 4(1), 33–38.
Han, JS.; Lee, SW.; Bien, Z. Feature subset selection using separability index matrix. Information Sciences. 2013, 223, 102–118.
HealthCatalyst. https://www.healthcatalyst.com/data-mining-in-healthcare (accessed June 21, 2019).
Healthnetconnections. http://www.hnc.net/products/trium-ctg-monitoring/ (accessed June 21, 2019).
Hongbiao, Zhou.; Ying, Genwang. Identification of CTG based on BP neural network optimized by PSO. 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science (DCABES), 2012, 108–111.
Hoque, N.; Bhattacharyya, DK.; Kalita, JK. MIFS-ND: A mutual information-based feature selection method. Expert Systems with Applications. 2014, 41(14), 6371–6385.
Huan, Liu.; Hiroshi, Motoda. Computational Methods of Feature Selection. Data Mining and Knowledge Discovery Series. Chapman & Hall/CRC, 2008.
Huan, Liu.; Hiroshi, Motoda. Feature Selection for Knowledge Discovery and Data Mining. Springer Science & Business Media, New York, 2000.
Huang, J.; Cai, Y.; Xu, X. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognition Letters. 2007, 28(13), 1825–1844.
Huang, Mei-Ling.; Yung-Yan, Hsu. Fetal distress prediction using discriminant analysis, decision tree, and artificial neural network. Journal of Biomedical Science and Engineering, 2012, 5(9), 526.

Huang, Yo-Ping.; Shin-Liang, Lai.; Frode Eika, Sandnes.; Shen-Ing, Liu. Improving classifications of medical data based on fuzzy ART 2 decision trees. International Journal of Fuzzy Systems. 2012, 14(3), 444–453.
Imani, MB.; Pourhabibi, T.; Keyvanpour, MR.; Azmi, R. A new feature selection method based on ant colony and genetic algorithm on Persian font recognition. International Journal of Machine Learning and Computing. 2012, 2(3), 278–282.
Inbarani, HH.; Azar, AT.; Jothi, G. Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. Computer Methods and Programs in Biomedicine. 2014, 113(1), 175–185.
Iztok, Fister.; Iztok, Fister Jr.; Xin-She, Yang.; Janez, Brest. A comprehensive review of firefly algorithms. Swarm and Evolutionary Computation. 2013, 13(1), 34–46.
Jacob, SG.; Ramani, RG. Evolving efficient classification rules from cardiotocography data through data mining methods and techniques. European Journal of Scientific Research. 2012, 78(3), 468–480.
Jaddi, NS.; Abdullah, S. Hybrid of genetic algorithm and great deluge algorithm for rough set attribute reduction. Turkish Journal of Electrical Engineering & Computer Sciences. 2013, 21(6), 1737–1750.
Jadhav, S.; Nalbalwar, S.; Ghatol, A. Modular neural network model based foetal state classification. IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), 2011, 915–917.
Jeyachidra, J.; Punithavalli, M. A study on statistical based feature selection methods for classification of gene microarray dataset. Journal of Theoretical and Applied Information Technology. 2013, 53(1), 107–114.
John, GH.; Kohavi, Ron.; Pfleger, Karl. Irrelevant features and the subset selection problem. Machine Learning: Proceedings of the 11th International Conference, 1994, 121–129.
Johnson, Bryan.; Alex, Bennett.; Myungjae, Kwak.; Anthony, Choi. Automated evaluation of fetal cardiotocograms using neural network. IEEE International Conference on Systems, Man, and Cybernetics (SMC). 2012, 408–413.
Karabulut, EM.; Ibrikci, T. Analysis of cardiotocogram data for fetal distress determination by decision tree based adaptive boosting approach. Journal of Computer and Communications. 2014, 2(9), 32–37.
Krupa, N.; Ali, M.; Zahedi, E.; Ahmed, S.; Hassan, FM. Antepartum fetal heart rate feature extraction and classification using empirical mode decomposition and support vector machine. Biomedical Engineering Online, 2011, 10(1), 1–15.
Kursa, MB.; Rudnicki, WR. Feature selection with the Boruta package. Journal of Statistical Software, 2010, 36(11), 1–13.
Liu, H.; Sun, J.; Liu, L.; Zhang, H. Feature selection with dynamic mutual information. Pattern Recognition. 2009, 42(7), 1330–1339.
Liu, J.; Ranka, S.; Kahveci, T. Classification and feature selection algorithms for multi-class CGH data. Bioinformatics. 2008, 24(13), i86–i95.
Liu, Y.; Wang, G.; Chen, H.; Dong, H.; Zhu, X.; Wang, S. An improved particle swarm optimization for feature selection. Journal of Bionic Engineering. 2011, 8(2), 191–200.
Mehmed, Kantardzic. Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons, 2011.
Menai, MEB.; Mohder, FJ.; Al-mutairi, F. Influence of feature selection on naive Bayes classifier for recognizing patterns in cardiotocograms. Journal of Medical and Bioengineering, 2013, 2(1), 66–70.
Michelakos, I.; Papageorgiou, E.; Vasilakopoulos, M. A hybrid classification

algorithm evaluated on medical data. 19th IEEE International Workshop on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE). 2010, 98–103.
Mitchell, T. Machine Learning. McGraw-Hill, New York, 1997.
Mohamed Ali, EA.; Abudhahir, A.; Manivanna Boopathi, A. Firefly algorithm optimized PI controller for pressure regulation in PEM fuel cells. Journal of Computational and Theoretical Nanoscience, 2018, 15(1), 1–9.
Nakamura, RYM.; Pereira, LAM.; Costa, KA.; Rodrigues, D.; Papa, JP.; Yang, XS. BBA: A binary bat algorithm for feature selection. 25th SIBGRAPI Conference on Graphics, Patterns and Images, 2012, 291–297.
Nidhal, S.; Ali, MAM.; Najah, H. A novel cardiotocography fetal heart rate baseline estimation algorithm. Scientific Research and Essays. 2010, 5(24), 4002–4010.
Nithya, D.; Suganya, V.; RSI, Mary. Feature selection using integer and binary coded genetic algorithm to improve the performance of SVM classifier. Journal of Computer Applications. 2013, 6(3), 57–61.
Ocak, Hasan. A medical decision support system based on support vector machines and the genetic algorithm for the evaluation of fetal well-being. Journal of Medical Systems. 2013, 37(2), 1–9.
Ocak, Hasan.; Huseyin Metin, Ertunc. Prediction of fetal state from the cardiotocogram recordings using adaptive neuro-fuzzy inference systems. Neural Computing and Applications. 2013, 23(6), 1583–1589.
Pappa, GL.; Freitas, AA.; Kaestner, CAA. A multiobjective genetic algorithm for attribute selection. Proceedings of the 4th International Conference on Recent Advances in Soft Computing (RASC-2002). 2002, 116–121.
Patient.info. http://patient.info/in/health/cardiotocography (accessed June 21, 2019).
Porkodi, R. Comparison of filter based feature selection algorithms: an overview. International Journal of Innovative Research in Technology & Science. 2014, 2(2), 108–113.
Rajendran, P.; Madheswaran, M. Hybrid medical image classification using association rule mining with decision tree algorithm. Journal of Computing. 2010, 2(1), 127–136.
Ravindran, Sindhu.; Asral Bahari, Jambek.; Hariharan, Muthusamy.; Siew-Chin, Neoh. A novel clinical decision support system using improved adaptive genetic algorithm for the assessment of fetal well-being. Computational and Mathematical Methods in Medicine. 2015, 2015, 283532.
Rodrigues, D.; Pereira, LAM.; Almeida, TNS.; Papa, JP.; Souza, AN.; Ramos, CCO.; Yang, XS. BCS: A binary cuckoo search algorithm for feature selection. IEEE International Symposium on Circuits and Systems (ISCAS 2013), 2013, 465–468.
Rodrigues, D.; Pereira, LAM.; Nakamura, RYM.; Costa, KAP.; Yang, XS.; Souza, AN.; Papa, JP. A wrapper approach for feature selection based on bat algorithm and optimum-path forest. Expert Systems with Applications. 2014, 41(5), 2250–2258.
Rodrigues, PP.; Sebastiao, R.; Santos, CC. Improving cardiotocography monitoring: a memory-less stream learning approach. LEMEDS'11 Learning from Medical Data Streams. 2011, 12.
Sahin, H.; Subasi, A. Classification of the cardiotocogram data for anticipation of fetal risks using machine learning techniques. Applied Soft Computing, 2015, 33, 231–238.
Schiezaro, M.; Pedrini, H. Data feature selection based on artificial bee colony algorithm. EURASIP Journal on Image and Video Processing. 2013, 47(1), 1–8.
Shruti, Ratnakar.; K, Rajeshwari.; Rose, Jacob. Prediction of heart disease using genetic algorithm for selection of optimal

reduced set of attributes. International Journal of Advanced Computational Engineering and Networking, 2013, 1(2), 2106–2320.
Sikora, R.; Piramuthu, S. Framework for efficient feature selection in genetic algorithm based data mining. European Journal of Operational Research. 2007, 180(2), 723–737.
Sontakke, S.; Lohokare, J.; Dani, R.; Shivagaje, P. Classification of cardiotocography signals using machine learning. Proceedings of the 2018 Intelligent Systems Conference (IntelliSys), Volume 2, 2019.
Subha, V.; Murugan, D. Foetal state determination using support vector machine and firefly optimisation. International Journal of Knowledge Based Computer System, 2014, 2(2), 7–12.
Subha, V.; Murugan, D. Opposition-based firefly algorithm optimized feature subset selection approach for fetal risk anticipation. Machine Learning and Applications: An International Journal. 2016, 3(2), 55–64.
Subha, V.; Murugan, D.; Manivanna Boopathi, A. A hybrid filter-wrapper attribute reduction approach for fetal risk anticipation. Asian Journal of Research in Social Sciences and Humanities. 2017, 7(2), 1094–1106.
Subha, Velappan.; Murugan, D.; Prabha, S.; Manivanna Boopathi, Arumugam. Genetic algorithm based feature subset selection for fetal state classification. Journal of Communications Technology, Electronics and Computer Science, 2015, 2, 13–17.
Sui, Bangsheng. Information gain feature selection based on feature interactions. M.S. thesis, University of Houston, 2013.
Sumeet, Dua.; Xian, Du. Data Mining and Machine Learning in Cybersecurity. CRC Press, 2011.
Sundar, C.; Chitradevi, M.; Geetharamani, G. An overview of research challenges for classification of cardiotocogram data. Journal of Computer Science. 2013, 9(2), 198–206.
Taha, AM.; Tang, AYC. Bat algorithm for rough set attribute reduction. Journal of Theoretical and Applied Information Technology. 2013, 51(1), 1–8.
Tang, H.; Wang, T.; Li, M.; Yang, X. The design and implementation of cardiotocography signals classification algorithm based on neural network. Computational and Mathematical Methods in Medicine, 2018, 2018, 12.
Tiwari, R.; Singh, MP. Correlation-based attribute selection using genetic algorithm. International Journal of Computer Applications. 2010, 4(8), 28–34.
Tizhoosh, HR. Opposition-based learning: a new scheme for machine intelligence. International Conference on Computational Intelligence for Modelling, Control and Automation Jointly with International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'05), 2006, 1, 695–701.
Tomáš, P.; Krohova, J.; Dohnalek, P.; Gajdoš, P. Classification of cardiotocography records by random forest. 36th International Conference on Telecommunications and Signal Processing (TSP), 2013, 620–923.
UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/index.php (accessed June 21, 2019).
Uzer, MS.; Yilmaz, Nihat.; Inan, Onur. Feature selection method based on artificial bee colony algorithm and support vector machines for medical datasets classification. The Scientific World Journal. 2013, 2013, 419187.
Venkatesan, P.; Premalatha, V. Genetic-neuro approach for disease classification. International Journal of Science and Technology. 2012, 2(7), 473–478.
Vidyavathi, BM.; Ravikumar, CN. A novel hybrid filter feature selection method for data mining. Ubiquitous Computing

and Communication Journal. 2008, 3(3), 118–121.
Wagholikar, Kavishwar.; V, Sundararajan.; Ashok, Deshpande. Modeling paradigms for medical diagnostic decision support: a survey and future directions. Journal of Medical Systems, 2012, 36(5), 3029–3049.
Warrick, PA.; Hamilton, EF.; Kearney, RE.; Precup, D. Classification of normal and hypoxic fetuses using system identification from intrapartum cardiotocography. IEEE Transactions on Biomedical Engineering, 2010, 57(4), 771–779.
Xin-She, Yang. Nature-Inspired Metaheuristic Algorithms. 2nd edition, Luniver Press, UK, 2010.
Xue, B.; Cervante, L.; Shang, L.; Browne, WN.; Zhang, M. A multi-objective particle swarm optimisation for filter-based feature selection in classification problems. Connection Science, 2012, 24(2–3), 91–116.
Xue, B.; Zhang, M.; Browne, WN. Multi-objective particle swarm optimisation (PSO) for feature selection. Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, 2012, 81–88.
Xue, B.; Zhang, M.; Browne, WN. New fitness functions in binary particle swarm optimisation for feature selection. IEEE Congress on Evolutionary Computation, 2012, 1–8.
Xu, Q.; Wang, L.; Baomin, H.; Wang, N. Modified opposition-based differential evolution for function optimization. Journal of Computational Information Systems, 2011, 7(5), 1582–1591.
Yu, Shuhao.; Zhu, Shenglong.; Ma, Yan.; Mao, Demei. Enhancing firefly algorithm using generalized opposition-based learning. Computing. 2015, 97(7), 741–754.
Zhang, Zhongheng.; Trevino, Victor.; Hoseini, Sayed Shahabuddin.; Belciug, Smaranda.; Manivanna Boopathi, Arumugam.; Gorunescu, Florin.; Subha, Velappan. Variable selection in logistic regression model with genetic algorithm. Annals of Translational Medicine. 2018, 6(3), 45.
Zhuo, L.; Zheng, L.; Li, X.; Wang, F.; Ai, B.; Qian, J. A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine. Geoinformatics and Joint Conference on GIS and Built Environment: Classification of Remote Sensing Images, 2008, 71471, 71471J.
Xue, B.; Zhang, M.; Browne, WN. New
fitness functions in binary particle swarm
CHAPTER 18

DEPLOYMENT OF SUPERVISED
MACHINE LEARNING AND DEEP
LEARNING ALGORITHMS IN
BIOMEDICAL TEXT CLASSIFICATION
G. KUMARAVELAN* and BICHITRANANDA BEHERA
Department of Computer Science, Pondicherry University,
Karaikal, India
*Corresponding author. E-mail: gkumaravelanpu@gmail.com

ABSTRACT

Document classification is a prevalent task in natural language processing with broad applications in the biomedical domain, including biomedical literature indexing, automatic diagnosis code assignment, tweet classification for public health topics, patient safety report classification, etc. In recent years, the categorization of biomedical literature has played a vital role in biomedical engineering. Nevertheless, the manual classification of the biomedical papers published every year into predefined categories is a cumbersome task. Hence, building an effective automatic document classifier for biomedical databases emerges as a significant task for the scientific community. This chapter therefore investigates the deployment of state-of-the-art machine learning (ML) algorithms such as the decision tree, k-nearest neighborhood, Rocchio, ridge, passive–aggressive, multinomial naïve Bayes (NB), Bernoulli NB, and support vector machine classifiers, together with artificial neural network classifiers such as the perceptron, stochastic gradient descent, and BPN, in the automatic classification of biomedical text documents on benchmark datasets like BioCreative Corpus III (BC3), Farm Ads, and the TREC 2006 Genomics Track. Finally, the performance of all the said classifiers is compared and evaluated by means of well-defined metrics like accuracy, error rate, precision, recall, and f-measure.
402 Handbook of Artificial Intelligence in Biomedical Engineering

18.1 INTRODUCTION

Biomedical engineering introduces different innovative techniques and materials into medicine and healthcare for the development of novel biomedical tools. In the era of Internet-connected devices, a tremendous amount of biomedical data is generated every minute with high throughput. More specifically, biomedical research publishes numerous scientific articles in electronic text form that focus on innovative research. Consequently, the manual classification of these biomedical documents becomes a cumbersome task, and building an automatic classifier model for these biomedical documents remains an important area of research.

In general, an automatic document classification algorithm assigns a predefined label to the instances of the text documents (test data set) based on a classifier model developed using a machine learning (ML) algorithm. ML is a subfield of artificial intelligence that disseminates intelligence to the classifier model from the training data set, so that the built classifier model captures the inherent patterns and relationships based on the corresponding labels assigned to the given text documents (training data set).

Depending on the usage of the ML algorithm, the automatic document classification task is often divided into three broad classes, specifically supervised, unsupervised, and semisupervised document classification. In supervised document classification, an external mechanism manually supplies the classifier model with information related to the precise classification of the documents. In unsupervised document classification, there is no such external mechanism to guide the classification model toward the correct document classification. In semisupervised document classification, a partial amount of the documents is labeled by an external mechanism. This chapter focuses on the deployment of state-of-the-art supervised ML algorithms for biomedical text classification.

The classifier models built using supervised ML algorithms are broadly divided into two forms, namely multiclass and multilabel classification. Multiclass classification is the one where a single class label out of many is assigned to an instance. The decision tree (DT) classifier, k-nearest neighborhood (k-NN) classifier, Rocchio classifier (RC), ridge classifier, passive–aggressive (PA) classifier, multinomial naïve Bayes (M_NB) classifier, Bernoulli naïve Bayes (B_NB) classifier, support vector machine (SVM) classifier, and artificial neural network
Deployment of Supervised Machine Learning and Deep Learning Algorithms 403

(ANN) classifiers including the perceptron (PPN), stochastic gradient descent (SGD), and BPN are the most prominent classifiers found in the literature of the supervised ML community. However, multilabel classification assigns more than one class label to an instance and is considered more complex than multiclass classification. Specifically, multilabel classification falls into two main categories, namely problem adaptation and algorithm adaptation. The problem adaptation method transforms the multilabel problem into single-label or multiclass problem(s); the main aim of this type of transformation is to fit the data to the multiclass algorithm.

This chapter provides an overview of the deployment of state-of-the-art supervised ML in biomedical text document classification. The performance of the built classifiers is compared and empirically evaluated using well-defined metrics such as accuracy, error rate, precision, recall, and f-measure on publicly available benchmark biomedical data sets like BioCreative Corpus III (BC3), Farm Ads, and the TREC 2006 Genomics Track. Except for the Farm Ads data set, the remaining BC3 and TREC 2006 Genomics Track datasets consist of documents extracted from the PubMed Central digital repository.

In the literature, only a few analysis works have been carried out that examine the progressive ML algorithms on the benchmark biomedical datasets in one platform. Therefore, the primary aim of this book chapter is to perform an end-to-end performance analysis of all the distinguished supervised ML algorithms for automatic document classification in the biomedical domain.

The organization of this chapter is as follows: Section 18.2 elaborates the background details of the text document classification process, including preprocessing along with document representation, document classification with its mathematical formulation, and a literature review of biomedical text document classification using ML algorithms. Section 18.3 depicts, in a nutshell, the various ML algorithms used in this book chapter for document classification. Section 18.4 describes the experiments conducted towards the deployment of ML solutions in document classification. Section 18.5 gives the conclusion and suggests topics for further research.

18.2 BACKGROUND

18.2.1 TEXT CLASSIFICATION PROCESS

Biomedical text classification deals with unstructured text documents

from different biomedical repositories like PubMed and MedLine, web blogs, e-newspapers, medical reports, and social media. The major aim of the text classification process is to predict the class label of a given test document with the prior knowledge of the trained dataset. In general, the text classification process involves three important steps: text preprocessing, text classification, and postprocessing. Figure 18.1 shows the various steps involved in building an automatic document classification model.

18.2.1.1 TEXT PREPROCESSING

Generally, in document classification model development, the first and most important component is text preprocessing, which has a great impact on classification performance. It normally consists of three tasks, namely feature extraction, feature reduction, and document representation.

Feature Extraction: It includes many activities such as tokenization, filtering or stop-word removal, lemmatization, and stemming of words to scale down the document complexity and to present the classification method in an accessible manner.

FIGURE 18.1 Text document classification process using ML algorithm.

• Tokenization: The input for the tokenization activity is the raw text data or text document. It breaks the sequence of strings from the given raw text data into small pieces that can be distinctive words, phrases, or keywords known as tokens (Webster & Kit, 2010).

• Filtering: It removes unwanted words from the documents so that more focus is given to the important words in the document. Stop-word removal is a well-known filtering method in which words that are often used without meaningful content are removed (Saif et al., 2014; Silva, 2003). Examples of such stop-words are prepositions, conjunctions, and determiners.
• Lemmatization: In documents, there are varied inflected forms of words whose meanings are almost identical. In such a situation, lemmatization is the task of grouping words having similar meanings into one word by using vocabulary and morphological analysis of the words in that cluster.
• Stemming: It is the task of reducing derived words to their base or root form; in effect, it is a crude chopping of affixes. For example, words like "running" and "runs" will be reduced to their base form "run." Several stemming algorithms have been developed over time. In the field of text mining, the Porter stemmer is the most widely used stemming technique (Porter, 1980; Hull, 1996).

Feature Reduction: Normally, in a text document, the number of words, otherwise called features, is incredibly large, and those words play a vital role in document representation. Therefore, it is necessary to use feature reduction methods to make an effective representation of the given text documents without changing the meaning of the text data. Feature reduction methods are loosely divided into two categories, namely feature selection and feature transformation.

• Feature selection: It involves the selection of a subset of features that can equivalently represent the original physical meaning with a better understanding of the data, which leads to an elegant learning process (Liu & Motoda, 1998). The major goal of the feature selection method is to reduce the curse of dimensionality and make the training dataset smaller, which leads to lower computational time. A further advantage of reducing the curse of dimensionality is to increase the classification accuracy and to decrease the over-fitting problem. There are different types of feature selection methods available

in the text mining literature, namely term frequency (TF), mutual information, information gain, the Chi-square statistic, and term strength (Yang & Pederson, 1995).
• Feature Transformation: It generates a new and smaller set of features by transforming or mapping the original set of features. Some well-known feature transformation methods are Latent Semantic Indexing (LSI) (Deerwester et al., 1990), PLSI (Hofmann, 1999), linear discriminant analysis (Fisher, 1936; Chakrabarti et al., 2003), and generalized singular value decomposition methods (Howland et al., 2003, 2004).

Document Representation: Once the features are extracted from the raw text data, all the given documents are normalized to unit length to perform classification in an economical manner. Basically, there are three widely used models in the literature for document representation, namely the vector space method (VSM) (Salton et al., 1975), probabilistic models (Manning et al., 2008), and the inference network model (Turtle & Croft, 1989). Among the three, VSM is the most used model, and the following section describes it briefly.

VSM was initially used for indexing and information retrieval (IR). It converts documents into numerical vectors using the document set D, the vocabulary set V, and the term vector t_d for document d. The set D = {d_1, d_2, ..., d_|D|} is a collection of documents, the set V = {w_1, w_2, ..., w_v} is the set of unique words or terms in D, and the term vector is t_d = (f_d(w_1), f_d(w_2), ..., f_d(w_v)), where f_d(w) represents the frequency of term w ∈ V in the document d ∈ D and f_D(w) represents the number of documents that contain the word w.

In VSM, the Boolean model and TF-IDF are the two term-weighting schemes used to calculate the weight of each feature. The Boolean model assigns w_ij > 0 to each term w_i if w_i ∈ d_j and assigns w_ij = 0 if w_i ∉ d_j. However, the TF-IDF scheme calculates the term weight of each word w ∈ d as follows:

q(w) = f_d(w) × log(|D| / f_D(w))   (18.1)

where |D| is the number of documents in the set D.

18.2.1.2 TEXT CLASSIFICATION STEP

Mathematically, the text classification problem requires three sets to be defined. The first is the training document set D = {d_1, d_2, ..., d_n}, the second is the category label set C = {c_1, c_2, ..., c_m}, and the third is the
C={c1,c2,…,cn} and third one is the

test document set T = {d_1, d_2, ..., d_n}. Every document d_i of the training document set D is labeled with a category label c_i from the category label set C; however, the documents of the test document set T are not labeled. The main aim of text classification is to construct a text classification model, that is, a text classifier, from the training document set by relating the features within the text documents to one of the target class labels. Once the classification model is trained, it predicts the category labels of the test document set. The mathematical formulation of the text classification algorithm for both training and testing is given as

f: D → C, f(d) = c   (18.2)

In Equation 18.2, the classifier assigns the proper class label to a new document d (test instance). If a single class label is assigned to the test instance, then this sort of classification is termed hard or multiclass classification; on the other hand, classification is termed soft if a probability value is assigned to the test instance. In multilabel classification, multiple class labels are allotted to a test instance.

18.2.1.3 POSTPROCESSING STEP

In the postprocessing step, the evaluation of the classifier is performed. The evaluation of the classification models is performed through various elegant performance measures like accuracy, precision, recall, and F1 scores.

18.2.2 LITERATURE REVIEW

The organization of and access to biomedical information are in great demand nowadays because of the exponential growth of biomedical documents evolving from different biomedical research publications and clinical trials. Sebastiani thoroughly surveys the various types of text document classification and its applications, and discusses the role of ML algorithms in automatic text document classification (Sebastiani, 2002). Cohen developed a new classification algorithm by combining SVM with rejection sampling and the chi-square feature selection technique for automatic document classification (Cohen, 2006). The TREC 2005 Genomics Track biomedical dataset was used to compare the classification performance of this classifier with different variants of the SVM classifier. Almeida et al. applied supervised ML approaches like NB, SVM, and Logistic Model Trees to perform text classification of PubMed abstracts, to support the triage of documents (Almeida et al., 2014).

García et al. developed a bag-of-concepts representation of documents and applied an ML algorithm, SVM, for biomedical document classification (García et al., 2015). Nguyen et al. proposed an improved feature weighting technique for document representation, with SVM as the classifier (Nguyen et al., 2016). The proposed document representation technique provides the best classification performance compared to documents represented in bag-of-words or TF-IDF form.

Samal et al. measured the performance of most of the supervised classifiers for sentiment analysis using a movie review dataset and concluded that SVM classifiers performed best among all classifiers for large movie review datasets (Samal et al., 2017).

Mishu et al. analyzed the performance of various supervised ML algorithms such as multinomial NB, B_NB, logistic regression, stochastic gradient descent, SVM, and BPN for classification on the Reuters corpus, Brown corpus, and movie review corpus, and concluded that BPN is the best among them (Mishu et al., 2016).

Jiang et al. applied NB and random forest (RF) for classifying biomedical publication documents associated with the mouse gene expression database (Jiang et al., 2017).

18.3 SUPERVISED ML ALGORITHMS FOR TEXT DOCUMENT CLASSIFICATION

18.3.1 DECISION TREE (DT) CLASSIFIER

In the DT classification model, the instances are the documents, and the attributes of every document are its bag of words or terms. The DT classifier (Li & Jain, 1998) performs a hierarchical decomposition of the text documents of the training dataset by labeling its internal nodes with terms of the text documents, the branches of the tree with test conditions on terms, and the leaves of the tree with categories (labels). The test condition on terms can be of two varieties depending on the document representation model. The first category of test checks whether or not a selected term is present within the document. The second kind of test examines the weight of the terms within the text document. The first category of test is used if the document representation is of the binary or Boolean form, and the second category is used if the document representation is of the TF-IDF form. During the training phase, the DT is built from the training dataset; while building the DT from the training data set, different splitting

criteria are used, and most DT classifiers use a single-attribute split, where one attribute is employed to perform the division (Aggarwal, 2012). The attribute or term whose information gain is highest is chosen as the base (root) node, and the procedure is repeated accordingly for choosing the remaining nodes. Meanwhile, in the testing phase, to predict the category label of a new untagged document, the DT classifier tests the terms of the document against the DT, starting from the root node (base node) until it reaches a leaf node, and assigns the category label of that leaf node.

18.3.2 NAÏVE BAYES (NB) CLASSIFIER

The NB classifier is a probabilistic classifier based on the Bayesian posterior probability distribution. It holds the restriction of an independence relationship among the attributes through conditional probability. There are two variants of the NB classifier, namely the multivariate Bernoulli model (B_NB) and the multinomial model (M_NB) (McCallum & Nigam). The multivariate Bernoulli naïve Bayes model works only on binary data. Hence, in the document preprocessing steps, each attribute corresponding to the list of documents in the VSM must be either one or zero depending on the presence or absence of that particular attribute in the document (Lewis, 1998). However, the multinomial model works on the frequencies of attributes available in the VSM representation of the documents (McCallum, 1998). If the vocabulary size is small, the Bernoulli model performs better than the multinomial model.

18.3.3 K-NEAREST NEIGHBORHOOD CLASSIFIER (K-NN)

Most of the classifiers in the literature spend more time in the training phase for building the classification model and are considered eager learners. However, the k-NN classifier spends more time in the testing phase for predicting the category label of the new untagged test document; hence, it is known as a lazy learner.

In the training phase of model construction, the k-NN classifier stores all the training documents together with their target classes. Meanwhile, in the testing phase, once a new test document whose target class is unknown comes for classification, the k-NN classifier finds the distance of the test document from all the training documents and assigns the category label of the training documents that are nearest or most similar to the unknown

document (Sebastiani, 2002; Han et al., 2001). For this reason, the k-NN classifier is regarded as an instance-based learning algorithm (Han et al., 2001). Euclidean distance and cosine similarity are the most frequently used approaches for measuring similarity in order to find the nearest neighbors.

18.3.4 SUPPORT VECTOR MACHINE (SVM)

The SVM is a kind of classifier that has the potential to classify both linear and nonlinear data (Cortes & Vapnik, 1995). The core idea behind the SVM classifier is that it first nonlinearly maps the initial training data into a sufficiently higher dimension, say n, so that the data in the higher dimension can be separated simply by an (n−1)-dimensional decision surface known as a hyperplane. Out of all hyperplanes, the SVM classifier determines the best hyperplane, the one with the maximum margin from the support vectors. Thanks to the nonlinear mapping, the SVM classifier works efficiently on large data sets and has been successfully applied in text classification (Drucker, 1999).

18.3.5 ARTIFICIAL NEURAL NETWORK (ANN)

An ANN is a nonlinear data processing model resembling the structure of the brain, and it learns from the existing training data to perform tasks like categorization, prediction or forecasting, decision-making, visualization, and others. It consists of a collection of nodes, otherwise known as neurons, which are the centers of data processing in the ANN. With respect to the problem statement, these neurons are organized into three different kinds of layers, specifically the input layer, the output layer, and the hidden layers. In the context of text classification, the number of words or terms determines the number of neurons in the input layer, and the classes (class labels) of the documents define the number of neurons in the output layer. An ANN has a minimum of one input layer and one output layer; however, it may have several hidden layers depending upon the chosen problem. All links from the input layer to the output layer through the hidden layers are assigned weights that represent the dependence relation between the nodes. Once a neuron receives weighted data, it calculates the weighted sum, and a well-known activation function processes it. The output value from the activation function is fed forward to the neurons of the next layer until the proper neuron in the output layer is reached. Some examples of well-known activation functions are the Binary step, Sigmoid, TanH, Softmax, and

Rectifier linear unit functions. An ANN can be made more versatile and more powerful by employing additional hidden layers. In particular, the PPN, the SGD neural network, and BPN are the three widespread neural-network-based classifiers that are extensively used for text classification.

18.3.6 ROCCHIO CLASSIFIER (RC)

The Rocchio classification algorithm is built on the concept of relevance feedback theory established in the field of IR (Rocchio, 1971). It uses centroid and similarity measure computations among the documents in the training and testing phases of model construction and usage, respectively. Let D = <d_1, d_2, ..., d_n> represent the document set that holds all the training documents, and let C = <c_1, c_2, ..., c_m> represent the class set that holds all the distinct class labels. For each class c_i ∈ C, D_ci represents all the documents of the set D that belong to class c_i, and v_d represents the VSM document representation of each document. In the training phase, the Rocchio classifier computes the centroid µ(c_i) for each class from the relevant documents and establishes the centroid of each class as its representative. The RC computes the centroid µ(c_i) for the class c_i using the equation

µ(c_i) = (1 / |D_ci|) Σ_{d ∈ D_ci} v_d   (18.3)

In the testing phase, to predict the category label c_i ∈ C of an untagged test document d ∉ D, the Rocchio classifier calculates its Euclidean distance from the centroid µ(c_i) of every class and assigns the class label that has the minimum distance from the untagged test document using the following equation:

c = arg min_{c_i} ||µ(c_i) − v_d||   (18.4)

18.3.7 RIDGE CLASSIFIER (RIDGE)

The ridge classification algorithm relies on the subspace assumption, which states that samples of a specific class lie on a linear subspace and that a new test sample of a category can be described as a linear combination of the training samples of the relevant class (He et al., 2014). The ridge classification algorithm is presented in Figure 18.2.

18.3.8 PASSIVE–AGGRESSIVE (PA) CLASSIFIER

The PA classifiers belong to the family of large-scale learning

FIGURE 18.2 Ridge classification algorithm.

algorithms (Crammer et al., 2006). The working principle of this kind of classifier is similar to that of the perceptron classifier; however, PA classifiers do not require a learning rate, although they include a regularization parameter c. Figure 18.3 shows the pseudocode description of the passive–aggressive classifier.

FIGURE 18.3 Pseudocode for PA classifier.
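As a complement to the pseudocode of Figure 18.3, the classical PA-I update of Crammer et al. (2006) can be sketched as follows. This is a minimal illustration only, assuming NumPy and binary labels y ∈ {−1, +1}; the function names are not from the chapter's experiments.

```python
import numpy as np

def pa_train(X, y, C=1.0, epochs=5):
    """Train a binary PA-I classifier.

    X: (n_samples, n_features) array; y: labels in {-1, +1};
    C: the aggressiveness (regularization) parameter mentioned above.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            loss = max(0.0, 1.0 - yi * np.dot(w, xi))  # hinge loss
            if loss > 0.0:                              # passive if margin is met
                tau = min(C, loss / np.dot(xi, xi))     # PA-I step size
                w += tau * yi * xi                      # aggressive update
    return w

def pa_predict(w, X):
    """The sign of the decision value gives the predicted label."""
    return np.where(X @ w >= 0.0, 1, -1)
```

Note that the step size tau adapts to the size of each mistake, which is exactly why, unlike the perceptron, no learning rate is needed.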



18.3.9 RANDOM FOREST (RF)

The RF classifier is a bagging-type ensemble learning algorithm. Figure 18.4 shows the overall architecture of the random forest classifier. In the training phase, it builds several DT classifiers from random subsamples of the documents. In the testing phase, each DT performs a prediction for the new test document, and the class label predicted by most of the DT classifiers is assigned. The main advantage of the random forest over the DT is that it eliminates the problem of over-fitting and increases the classification accuracy.

FIGURE 18.4 Random forest classification.
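The train-several-learners-and-vote procedure above can be sketched in pure Python. This is only an illustration of bagging with majority voting; the base learner below is a deliberately trivial word-overlap stub standing in for a real decision tree, and all names and data are hypothetical.

```python
import random
from collections import Counter

def bagging_predict(train_docs, train_labels, test_doc, n_trees=5, seed=0):
    """Bagging-and-majority-vote sketch of the random forest idea.

    train_docs: list of token lists; train_labels: their class labels;
    test_doc: token list to classify. The "tree" here is a trivial stub.
    """
    rng = random.Random(seed)
    n = len(train_docs)
    votes = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]   # bootstrap subsample
        # trivial base learner: label of the most word-overlapping sample
        best = max(sample,
                   key=lambda i: len(set(train_docs[i]) & set(test_doc)))
        votes.append(train_labels[best])
    return Counter(votes).most_common(1)[0][0]          # majority vote
```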

18.4 RESULTS AND DISCUSSION

18.4.1 EXPERIMENTAL SETUP

The ML solutions for document classification have been implemented in Python 3.6.7, and the experimentation is performed on a machine having an Intel® Pentium® CPU 3825U processor at 1.90 GHz with 4.00 GB of RAM. Four benchmark biomedical datasets, namely BioCreative Corpus III (BC3, parts 1 and 2), Farm Ads, and the TREC 2006 Genomics Track, have been used to perform an empirical evaluation of the various ML algorithms mentioned in Section 18.3. The summary of these datasets is presented in Table 18.1,

and their descriptions are detailed below:

• BioCreative Corpus III (BC3): The BC3 dataset was created by the BioCreative III interactive task of the BioCreative workshop conducted in 2010. The BC3 dataset is divided into the BC3-part 1 and BC3-part 2 datasets. Both BC3-part 1 and BC3-part 2 are originally in XML format and have sizes of 32.5 MB and 46.5 MB, respectively. For document classification, the abstract and the respective class label of each document are extracted from the XML file and represented in a CSV file. For BC3-part 1, the CSV file is of size 3.12 MB and holds 2280 article abstracts with the class label of each abstract. Similarly, the BC3-part 2 CSV file has size 5.73 MB and holds 4000 article abstracts with their class labels. This dataset is available at https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-iii-corpus/
• Farm Ads dataset: This dataset contains 4143 farm-ad text documents that represent various topics of farm animals. This is a binary classification problem where each of the documents either approves the ads or not. The dataset has size 12.4 MB and is available at the UCI ML repository (Lichman, 2013).
• TREC 2006 Genomics Track dataset: This dataset is a collection of biomedical full-text HTML documents from 49 journals in the area of the Genomics Track. In this experiment, 1077 biomedical article abstracts or documents are collected from five journals. The number of documents collected from each of the five journals is presented in Table 18.2.

TABLE 18.1 Summary of Four Biomedical Text Datasets

Dataset                     Classes   Number of Documents
BC3—part 1                  2         2280
BC3—part 2                  2         4000
Farm Ads                    2         4143
TREC 2006 Genomics Track    5         1077

TABLE 18.2 TREC 2006 Genomics Track Dataset

Journal Name                                  No. of Documents
Cerebral Cortex (CC)                          201
Glycobiology (GLY)                            203
Alcohol and Alcoholism (AA)                   202
International Journal of Epidemiology (IJE)   206
International Immunology (II)                 265
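Once the abstracts and class labels are in CSV form as described above, each document can be weighted according to Equation 18.1 before classification. A minimal pure-Python sketch of that weighting (the token lists are toy examples, not actual BC3 abstracts):

```python
import math

def tfidf(docs):
    """TF-IDF weights per Equation 18.1: q(w) = f_d(w) * log(|D| / f_D(w)).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    D = len(docs)
    df = {}                                   # f_D(w): document frequency
    for d in docs:
        for w in set(d):
            df[w] = df.get(w, 0) + 1
    weights = []
    for d in docs:
        # f_d(w) is the raw count of w in d; df[w] documents contain w
        q = {w: d.count(w) * math.log(D / df[w]) for w in set(d)}
        weights.append(q)
    return weights
```

A term appearing in every document gets weight zero, since log(|D| / f_D(w)) vanishes; this is the dimensionality-reducing effect the weighting is meant to have.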

Extensive experimentation was carried out with eighty percent of each dataset used for training and the remaining twenty percent reserved for testing. The Python scikit-learn ML library (Pedregosa et al., 2011) and its TfidfVectorizer handle all the text preprocessing routines to build a dictionary and finally to transform all the documents to the VSM representation. Subsequently, classification is performed with the different ML algorithms, and finally the classifiers are evaluated using the well-established performance measures.

18.4.2 PERFORMANCE MEASURE

Measures such as accuracy, error rate, precision, recall, and F1-score are used to evaluate the performance of the classifiers (Sokolova & Lapalme, 2009). The aforementioned measures are defined by means of the following quantities, which define the properties of the confusion matrix, as shown in Table 18.3.

TABLE 18.3 Confusion Matrix for Class Ci

                        Predicted Class
Actual Class            Ci                     Not Ci
Ci                      True positive (TP)     False negative (FN)
Not Ci                  False positive (FP)    True negative (TN)

a. True Positive (tp_i): The documents which belong to class C_i and are correctly predicted to class C_i by the classifier.
b. True Negative (tn_i): The documents which do not belong to the class C_i and are correctly predicted to a class other than C_i.
c. False Positive (fp_i): The documents which do not belong to the class C_i but are wrongly predicted to the class C_i.
d. False Negative (fn_i): The documents which belong to the class C_i but are wrongly predicted to a class other than C_i.

Now the performance measures are defined as follows:

• Accuracy: It is the average over classes of the ratio of correctly classified documents to the total documents.

Accuracy = (1/n) Σ_{i=1}^{n} (tp_i + tn_i) / (tp_i + fp_i + fn_i + tn_i)   (18.5)

• Error Rate: It is the average over classes of the ratio of incorrectly classified documents to the total documents.

Error Rate = (1/n) Σ_{i=1}^{n} (fp_i + fn_i) / (tp_i + fp_i + fn_i + tn_i)   (18.6)

• Precision: It is the average over classes of the ratio of true positive predictions to total positive predictions.

Precision = (1/n) Σ_{i=1}^{n} tp_i / (tp_i + fp_i)   (18.7)

• Recall: It is the average over classes of the ratio of true positive predictions to the total number of actual positive documents in the test set.

Recall = (1/n) Σ_{i=1}^{n} tp_i / (tp_i + fn_i)   (18.8)

• F1-Score:

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (18.9)

In all the above cases, n is the number of classes or labels in the dataset.

18.4.3 HYPER-PARAMETERS FOR DIFFERENT CLASSIFIERS

The initialization of the input parameters of the different classifiers has a great impact on the classification performance measurements. Table 18.4 highlights the respective parameter settings adopted in the experimental process of building the corresponding classifiers.

TABLE 18.4 Hyper-parameter Settings of Different Classifiers

Classifier  Parameters
DT          splitting="gini", splitter="best", min_samples_split=2
M_NB        alpha=0.01, fit_prior=True, class_prior=None
B_NB        alpha=0.01, binarize=0.0, fit_prior=True
K-NN        K=10, metric="minkowski", weights="uniform"
SVM         penalty="l2", tolerance (tol)=1e-4, loss="hinge"
PPN         max_iter=50, tolerance (tol)=1e-3, n_iter_no_change=5
SGD         alpha=0.0001, max_iter=50, loss="hinge"
Ridge       solver="sag", tolerance (tol)=1e-2, max_iter=None
RC          metric="euclidean", shrink_threshold=None
PA          max_iter=50, tolerance (tol)=1e-3, loss="hinge"
RF          n_estimator=100, splitting="gini", min_samples_split=2
BPN         max_iter=200, hidden_layer_size=1000, activation function=relu
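Assuming the experiments were run with scikit-learn (Pedregosa et al., 2011), the rows of Table 18.4 map onto estimator constructor arguments; in scikit-learn's API, the table's "Splitting" corresponds to `criterion`, "K" to `n_neighbors`, and "n_estimator" to `n_estimators`. A sketch of that mapping for a few of the classifiers (the dictionary layout is ours):

```python
# Hyper-parameter settings from Table 18.4, expressed as keyword
# arguments for the corresponding scikit-learn estimator constructors.
HYPER_PARAMS = {
    "DT":   {"criterion": "gini", "splitter": "best", "min_samples_split": 2},
    "M_NB": {"alpha": 0.01, "fit_prior": True, "class_prior": None},
    "B_NB": {"alpha": 0.01, "binarize": 0.0, "fit_prior": True},
    "K-NN": {"n_neighbors": 10, "metric": "minkowski", "weights": "uniform"},
    "SGD":  {"alpha": 0.0001, "max_iter": 50, "loss": "hinge"},
    "RF":   {"n_estimators": 100, "criterion": "gini", "min_samples_split": 2},
}

# e.g. sklearn.naive_bayes.MultinomialNB(**HYPER_PARAMS["M_NB"])
#      sklearn.linear_model.SGDClassifier(**HYPER_PARAMS["SGD"])
print(sorted(HYPER_PARAMS))
```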


Deployment of Supervised Machine Learning and Deep Learning Algorithms 417

18.4.4 PERFORMANCE ANALYSIS

An extensive experiment is conducted on different ML solutions or algorithms, namely the DT, M_NB, B_NB, K-NN, SVM, PPN, SGD, Ridge, RC, PA, RF, and BPN algorithms, over biomedical benchmark datasets such as BC3-part 1, BC3-part 2, Farm ads, and TREC 2006 Genomics Track.

The execution time of the different algorithms is provided in Table 18.5. Execution time is the sum of the training and testing time of the classification algorithm. Execution time plays a great role, along with the performance measures, in comparing different classification algorithms.

TABLE 18.5 Performance of Classifiers with Respect to Execution Time


Algorithms Execution Time in Seconds for Each Dataset
BC3-p1 BC3-p2 Farm TREC
DT 1.674046 2.537979 2.484233 0.212883
M_NB 0.005981 0.021965 0.013005 0.006006
B_NB 0.010983 0.033979 0.034966 0.008998
K-NN 0.144941 0.413833 0.471888 0.041975
SVM 0.075956 0.232876 0.295832 0.114936
PPN 0.013019 0.048980 0.042972 0.019991
SGD 0.349801 0.819924 0.458874 0.314822
Ridge 0.130532 0.386032 0.338842 0.243861
RC 0.028429 0.033968 0.052979 0.009999
PA 0.025990 0.089968 0.102929 0.031979
RF 2.736432 4.748942 6.632810 1.309248
BPN 186.247590 271.775795 918.549152 92.036547
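Execution time as reported in Table 18.5 is the sum of training and testing time. A minimal timing sketch; the stand-in classifier and helper below are illustrative, not the chapter's models:

```python
import time

def execution_time(fit, predict, X_train, y_train, X_test):
    """Return training time + testing time, as reported in Table 18.5."""
    t0 = time.perf_counter()
    fit(X_train, y_train)            # training phase
    t1 = time.perf_counter()
    predict(X_test)                  # testing phase
    t2 = time.perf_counter()
    return (t1 - t0) + (t2 - t1)     # i.e., t2 - t0

# Stand-in "classifier": memorizes the majority label, predicts it everywhere.
class MajorityClassifier:
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
    def predict(self, X):
        return [self.label] * len(X)

clf = MajorityClassifier()
t = execution_time(clf.fit, clf.predict, [[0], [1], [2]], ["a", "a", "b"], [[3], [4]])
print(f"execution time: {t:.6f} s")
```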

The performance of all the ML solutions is evaluated using different performance measures like accuracy, error rate, precision, recall, and F1 score. These performance measurements provide a general overview of each ML solution's performance from a different perspective. Table 18.6 shows the performance measurements of the ML solutions on the various benchmark biomedical datasets for automatic document classification.

The results for the BC3-part 1 dataset in Table 18.6 show that the RC classifier performs best among all the classifiers with respect to all the classification performance measures. The classification accuracy of the RC classifier is 62.94%.
TABLE 18.6 Performance of Classifiers (in %) Using Different Datasets
Columns (classification algorithms): DT, M_NB, B_NB, K-NN, SVM, PPN, SGD, Ridge, RC, PA, RF, BPN

BC3-p 1:
Accuracy 53.73 57.68 62.28 57.02 58.11 57.68 59.43 60.31 62.94 57.46 60.31 57.56
Error rate 46.27 42.32 37.72 42.98 41.89 42.32 40.57 39.69 37.06 42.54 39.69 42.54
Precision 53.83 57.66 62.27 57.26 58.10 57.65 59.42 60.32 62.93 57.43 60.36 57.45
Recall 53.80 57.66 62.25 57.14 58.04 57.61 59.41 60.31 62.89 57.39 60.34 57.37
F1-measure 53.68 57.66 62.25 56.88 57.99 57.59 59.41 60.21 62.89 57.36 60.30 57.30

BC3-p 2:
Accuracy 77.28 85.02 84.64 83.65 86.89 84.89 86.89 86.39 79.40 84.89 83.65 84.89
Error rate 22.72 14.98 15.36 13.35 13.11 15.11 13.11 13.61 20.60 15.11 16.35 15.11
Precision 78.24 83.89 84.42 80.03 85.52 84.09 85.50 85.22 85.10 83.71 83.59 82.95
Recall 77.28 85.02 84.64 83.65 86.89 84.89 86.89 86.39 79.40 84.89 83.65 84.89
F1-measure 77.74 84.31 84.53 78.57 85.12 84.43 85.41 83.67 81.24 84.15 76.89 83.34

Farm ads:
Accuracy 85.04 91.44 86.25 85.28 90.71 89.63 90.35 90.59 86.13 90.47 89.51 89.14
Error rate 14.96 8.56 13.25 14.72 9.29 10.37 9.65 9.41 13.87 9.53 10.49 10.86
Precision 85.28 91.61 86.72 85.26 90.71 89.61 90.35 90.60 86.51 90.46 89.52 89.13
Recall 85.28 91.44 86.25 85.28 90.71 89.63 90.35 90.59 86.13 90.47 89.51 89.14
F1-measure 85.09 91.37 86.30 85.25 90.71 89.62 90.33 90.56 85.95 90.46 89.47 89.13

TREC 2006 Genomics Track:
Accuracy 84.26 96.30 94.44 21.30 97.69 96.30 93.98 97.69 95.83 97.22 95.37 98.15
Error rate 15.74 3.70 5.56 78.70 2.31 3.70 6.02 2.31 4.17 2.78 4.63 1.85
Precision 84.36 96.48 95.21 64.34 97.74 96.30 94.02 97.74 96.00 97.25 95.75 98.21
Recall 84.26 96.30 94.44 21.30 97.69 96.30 93.98 97.69 95.83 97.22 95.37 98.15
F1-measure 84.29 96.23 94.36 10.62 97.69 96.27 93.96 97.69 95.86 97.23 95.39 98.12
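The dataset-by-dataset rankings discussed in the following analysis can be reproduced from the accuracy rows of Table 18.6. A small sketch (accuracy values transcribed from the table; the dictionary layout and variable names are ours):

```python
# Accuracy (%) per classifier, transcribed from Table 18.6.
ACCURACY = {
    "BC3-p1":   {"DT": 53.73, "M_NB": 57.68, "B_NB": 62.28, "K-NN": 57.02,
                 "SVM": 58.11, "PPN": 57.68, "SGD": 59.43, "Ridge": 60.31,
                 "RC": 62.94, "PA": 57.46, "RF": 60.31, "BPN": 57.56},
    "BC3-p2":   {"DT": 77.28, "M_NB": 85.02, "B_NB": 84.64, "K-NN": 83.65,
                 "SVM": 86.89, "PPN": 84.89, "SGD": 86.89, "Ridge": 86.39,
                 "RC": 79.40, "PA": 84.89, "RF": 83.65, "BPN": 84.89},
    "Farm ads": {"DT": 85.04, "M_NB": 91.44, "B_NB": 86.25, "K-NN": 85.28,
                 "SVM": 90.71, "PPN": 89.63, "SGD": 90.35, "Ridge": 90.59,
                 "RC": 86.13, "PA": 90.47, "RF": 89.51, "BPN": 89.14},
    "TREC":     {"DT": 84.26, "M_NB": 96.30, "B_NB": 94.44, "K-NN": 21.30,
                 "SVM": 97.69, "PPN": 96.30, "SGD": 93.98, "Ridge": 97.69,
                 "RC": 95.83, "PA": 97.22, "RF": 95.37, "BPN": 98.15},
}

# No single classifier wins on every dataset:
best = {ds: max(scores, key=scores.get) for ds, scores in ACCURACY.items()}
print(best)
# -> {'BC3-p1': 'RC', 'BC3-p2': 'SVM', 'Farm ads': 'M_NB', 'TREC': 'BPN'}
# (SVM and SGD tie at 86.89 on BC3-p2; max() picks the first of the two.)
```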

The B_NB classifier performs well next to the RC classifier. After B_NB, the Ridge and RF classifiers have the same classification accuracy of 60.31, but if both are compared with respect to precision, recall, and F1 score, then the Ridge classifier shows better performance. However, the DT classifier yields the lowest classification performance among all the classifiers for the BC3-part 1 dataset. Meanwhile, the remaining classifiers provide an average classification performance.

For the BC3-part 2 dataset, the SVM and SGD classifiers stand top among all the classifiers with the same 86.89% classification accuracy, 13.11% error rate, and 86.89% recall. But SVM works better than SGD with respect to precision; on the other hand, SGD outperforms SVM with respect to F1 score. Next to SVM and SGD, the Ridge and M_NB classifiers perform well. Meanwhile, the PA, PPN, and BPN classifiers have an equal classification accuracy of 84.89%. However, if precision and F1 score are taken for ranking the classifiers, then the PPN classifier performs better than the PA and BPN classifiers. For the BC3-part 2 dataset, the DT classifier generates the lowest classification performance.

The M_NB classifier shows the best classification performance among all the classifiers for the Farm ads dataset. The classification accuracy of the M_NB classifier is 91.44%. Next to M_NB, SVM shows good performance with a classification accuracy of 90.71%. Next to SVM, the Ridge, PA, and SGD classifiers provide good classification performance.

For the TREC 2006 Genomics Track dataset, the BPN classifier reaches more than 98% classification accuracy, precision, recall, and F1 score. The SVM, Ridge, and PA classifiers have good classification performance next to BPN. For this dataset, the KNN classifier shows the least performance among all the classifiers.

Thus, from Table 18.6, it is clear that no one classifier is best for all the benchmarking datasets; from dataset to dataset, the performance of the classifiers varies. However, among all the classifiers, BPN, SGD, Ridge, PA, M_NB, B_NB, and SVM provide good classification performances, and they possess approximately the same classification performance. In particular, the BPN classifier provides good classification accuracy, but it consumes more execution time.

The comparison of the different algorithms with respect to accuracy and execution time is shown in Figure 18.5. In Figure 18.5, the accuracy values are in the range of 0 to 1, and the execution times of the various ML algorithms on each dataset are normalized by dividing the execution time of each algorithm by the maximum execution time of any algorithm for

the concerned dataset. The main purpose of having a normalized execution time is to allow it to be compared alongside the accuracy of the respective algorithms.
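The normalization just described (dividing each algorithm's execution time by the maximum execution time of any algorithm on the same dataset) can be sketched with the BC3-p1 column of Table 18.5:

```python
# Execution times (s) on BC3-p1, transcribed from Table 18.5.
times = {"DT": 1.674046, "M_NB": 0.005981, "B_NB": 0.010983, "K-NN": 0.144941,
         "SVM": 0.075956, "PPN": 0.013019, "SGD": 0.349801, "Ridge": 0.130532,
         "RC": 0.028429, "PA": 0.025990, "RF": 2.736432, "BPN": 186.247590}

# Normalize into (0, 1] by the slowest algorithm on this dataset,
# so that time can be plotted on the same axis as accuracy (Figure 18.5).
t_max = max(times.values())
normalized = {alg: t / t_max for alg, t in times.items()}

print(normalized["BPN"])  # 1.0  (the slowest algorithm normalizes to 1)
```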

FIGURE 18.5 Performance comparison in accuracy and execution time.

18.5 CONCLUSION AND FUTURE SCOPE

Medical document classification is a multidisciplinary field of research in biomedical engineering. Many supervised ML algorithms have been successfully applied to the automatic classification of biomedical literature, but only a few authors have addressed the performance measurements of all the classifiers on one platform. Hence, this book chapter summarizes in detail the procedures involved in the automatic document classification process, exemplifies the working logic of the state-of-the-art supervised ML algorithms, and empirically evaluates all the ML algorithms, constituted to act as classifiers, on the benchmark biomedical datasets. In particular, classifiers like SGD, Ridge, PA, BPN, and SVM provide good results on the given datasets compared to the other classifiers. However, the KNN and decision tree classifiers have shown poor results for the chosen datasets compared to the other classifiers. Meanwhile, the other classifiers have an average classification performance. The future scope is to make an

improvement among those classifiers to adapt well to large-scale datasets. As a result, the application of deep learning-based models like multilayer feedforward neural networks, convolutional neural networks, recurrent neural networks, and ensemble deep learning models becomes an inevitable avenue of further research.

KEYWORDS

• text mining
• machine learning
• documents classification
• information retrieval
• information extraction

REFERENCES

Aggarwal, C.C.; Zhai, C.X. Mining Text Data. Springer, 2012.
Almeida, H.; Meurs, M.J.; Kosseim, L.; Butler, G.; Tsang, A. Machine learning for biomedical literature triage. PLoS One. 2014, 9(12).
Chakrabarti, S.; Roy, S.; Soundalgekar, M. Fast and accurate text classification via multiple linear discriminant projections. VLDB Journal. 2003, 12(2), 172–185.
Cohen, A.M. An effective general purpose approach for automated biomedical document classification. AMIA Annual Symposium Proceedings. 2006, 161–165.
Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning. 1995, 20, 273–297.
Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online passive aggressive algorithms. Journal of Machine Learning Research. 2006, 7, 551–585.
Deerwester, S.; Dumais, S.; Landauer, T.; Furnas, G.; Harshman, R. Indexing by latent semantic analysis. JASIS. 1990, 41(6), 391–407.
Drucker, H.; Wu, D.; Vapnik, V. Support vector machines for spam categorization. IEEE Transactions on Neural Networks. 1999, 10(5), 1048–1054.
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011, 12, 2825–2830.
Fisher, R. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936, 7, 179–188.
García, M.A.M.; Rodríguez, R.P.; Rifón, L.E.A. Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach. PeerJ. 2015, 3, e1279.
Han, E.S.; Karypis, G.; Kumar, V. Text categorization using weight adjusted k-nearest neighbor classification. Springer. 2001.
He, J.; Ding, L.; Jiang, L.; Ma, L. Kernel ridge regression classification. Proceedings of the International Joint Conference on Neural Networks. 2014, 2263–2267.
Hofmann, T. Probabilistic latent semantic indexing. ACM SIGIR Conference, 1999.
Howland, P.; Jeon, M.; Park, H. Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition. SIAM Journal of Matrix Analysis and Applications. 2003, 25(1), 165–179.
Howland, P.; Park, H. Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004, 26(8), 995–1006.
Hull, D.A. Stemming algorithms: A case study for detailed evaluation. JASIS. 1996, 47(1), 70–84.
Jiang, X.; Ringwald, M.; Blake, J.; Shatkay, H. Effective biomedical document classification for identifying publications

relevant to the mouse Gene Expression Database (GXD). 2017.
Lewis, D.D. Naive (Bayes) at forty: The independence assumption in information retrieval. In Machine Learning: ECML-98. Springer. 1998, 4–15.
Li, Y.; Jain, A. Classification of text documents. The Computer Journal. 1998, 41(8), 537–546.
Lichman, M. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. https://archive.ics.uci.edu/ml/datasets.html, 2013.
Liu, H.; Motoda, H. Feature Extraction, Construction, and Selection: A Data Mining Perspective. Boston, MA, USA: Kluwer Academic Publishers, 1998.
Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval. Cambridge University Press, Cambridge. 2008, 1.
Mishu, S.Z.; Rafiuddin, S.M. Performance analysis of supervised machine learning algorithms for text classification. 19th Int. Conf. Comput. Inf. Technol. 2016, 409–413.
Nguyen, D.B.; Shenify, M.; Al-Mubaid, H. Biomedical text classification with improved feature weighting method. BICOB 2016, April 4–6, 2016, Las Vegas, Nevada, USA. 2016.
Porter, M.F. An algorithm for suffix stripping. Program: Electronic Library and Information Systems. 1980, 14(3), 130–137.
Rocchio, J.J. Relevance feedback in information retrieval. The SMART Retrieval System. 1971, 313–323.
Saif, H.; Fernández, M.; He, Y.; Alani, H. On stopwords, filtering and data sparsity for sentiment analysis of Twitter. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland, 26–31 May 2014.
Salton, G.; Wong, A.; Yang, C.S. A vector space model for automatic indexing. Commun. ACM. 1975, 18(11), 613–620.
Samal, B.R.; Behera, A.K.; Panda, M. Performance analysis of supervised machine learning techniques for sentiment analysis. Proceedings of the 1st ICRIL International Conference on Sensing, Signal Processing and Security (ICSSS). Piscataway, IEEE. 2017, 128–133.
Sebastiani, F. Machine learning in automated text categorization. ACM Computing Surveys. 2002, 34(1).
Silva, C.; Ribeiro, B. The importance of stop word removal on recall values in text categorization. In Proceedings of the International Joint Conference on Neural Networks. Portland, OR, USA. 2003, 3, 1661–1666.
Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Information Processing and Management. 2009, 45(4), 427–437.
Turtle, H.; Croft, W.B. Inference networks for document retrieval. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM. 1989, 1–24.
Webster, J.J.; Kit, C. Tokenization as the initial phase in NLP. In Proceedings of the 14th Conference on Computational Linguistics. Association for Computational Linguistics. 2010, 4, 1106–1110.
Yang, Y.; Pederson, J.O. A comparative study on feature selection in text categorization. ACM SIGIR Conference, 1995.
CHAPTER 19

ENERGY EFFICIENT OPTIMUM


CLUSTER HEAD ESTIMATION FOR
BODY AREA NETWORKS
P. SUNDARESWARAN* and R.S. RAJESH
Department of Computer Science and Engineering, Manonmaniam
Sundaranar University, Tirunelveli, India
Corresponding author. E-mail: psundareswaran@msuniv.ac.in
*

ABSTRACT

Wireless sensor networks (WSNs) are becoming increasingly familiar since they are an inevitable part of human-centric applications. WSNs are also used in health applications. They have small, low-energy sensing devices that can sense data as required and send them to the collecting base station. Since the sensor nodes have limited energy, and in most applications the sensors are not replaceable or rechargeable, energy conservation of sensor nodes is the primary goal in the design of WSNs. Although a lot of techniques are available for energy conservation, clustering is the important method used for preserving energy. Body area networks (BANs) are one of the specific applications of WSNs. This type of network is normally used to monitor the health of the human body and the functionality of its various organs. The sensors are generally implanted or positioned in the human body. Therefore, BANs help the medical attendants to access the patient's condition regularly and keep track of the medical data of the person. As BANs consist of different types of sensors, they can sense a patient's pulse rate, sugar level, blood pressure, etc. Similarly, wearable devices also have sensors to watch the behaviour of the organs of the body. These sensors sense the data and send them to the base station, where they will be analyzed and processed by the medical experts at any time. If any emergency occurs, the system will alert the medical assistants

and rapid actions will be taken. As the sensors are implanted, it is not possible to replace them. Since they are energy limited, the conservation of energy is very important in this type of application. In this work, the sensors implanted in the body or used in wearables are considered as reactive sensors. A reactive sensor triggers and sends the sensed data when the value is beyond the hard threshold or when the difference between two consecutive sensed values is greater than the soft threshold. These sensors are grouped into several clusters so that they can send the data to the cluster heads instead of the base station, to preserve energy. The cluster heads in turn send the received data to the base station after aggregation is performed. The triggered nodes, otherwise called active nodes, are the only busy nodes in the corresponding round; the other sensors are kept in an inoperative state. Hence, the active sensors are considered for the calculation of the optimum number of cluster heads. The optimum number of clusters plays a crucial role in deciding the clusters in each round of operation, as it can also save the energy consumption of the sensor nodes (cluster heads). The numbers of active nodes present in the earlier round and the current round are determined, and the change in the total network energy between two consecutive rounds is also computed. The ratio of active nodes between two consecutive rounds and the ratio of total network energy between two consecutive rounds are calculated. These values play a significant role in computing the suitable cluster heads for each round. After the cluster head count is computed, the LEACH method is used to elect the cluster heads. If the elected cluster heads are below the optimum number, the balance cluster heads are selected from among the remaining eligible nodes. If the cluster heads elected are greater than the optimum number, the cluster heads above the optimum number with the least energy are converted to normal nodes. Simulations are performed and the results are compared with the existing protocols. The experimental results show that the proposed protocol outperforms the existing protocols in terms of network lifetime and throughput.

19.1 INTRODUCTION

A wireless sensor network (WSN) is a centralized, distributed network having a lot of tiny, self-directed, little-powered devices called sensor nodes. Each sensor node has an RF transceiver, multiple types of memory, and a power supply, and accommodates different types of sensors and actuators with processing

depending upon the application in 2008), and transport and agriculture


which they are being used. These (Feng et al., 2008; Coates et al.,
nodes can exchange data among 2013; Wang et al., 2006). Sensors
themselves through the wireless are used in hazardous environments,
mode and are self-organized after where the maintenance and replace-
beginning to function. The sensor ment of components by the human
devices are energy efficient and beings is tricky and dangerous. Since
include the potential for multifunc- the sensor devices are provided
tionality. WSNs are classified into with limited energy, energy saving
homogeneous and heterogeneous is an important criteria as far as
WSNs. Sensor devices within the design of a WSN or BAN is
the homogeneous WSNs have an concerned. During the last decade,
equivalent amount of initial energy. considerable amount of research
Heterogeneous sensor network in the area of BANs has focused
nodes are provided with dissimilar on issues associated with wireless
initial energy. Widespread industrial sensor design, size of the sensor
applications (Akyildiz et al., 2002) devices, power-aware sensor circuit
use the services of the wireless design, cost-effective sensors, signal
sensors. The group of sensors collect synthesizing methods, and design of
information from the environment to communications protocols.
carry out and process them to meet In the earlier works, it is reported
the particular application objectives. that the sensor devices transmit the
Wireless sensors are commonly acquired data directly to the sink.
used in industrial, social, and Therefore, these devices would
commercial applications because of become dead rapidly that leads
their advances in processing power, to the death of the total network.
communication ability, and capacity Alternatively, with the introduction
to utilize minimum power. Sensor of clustering techniques, the sensor
nodes are used to sense environ- nodes transmit the data directly
mental metrics like heat, pressure, to their respective cluster head
humidity, noise, vibration, etc. They rather than sending them to the far
also have the capability of sensing away from the base station. This
the air particles (carbon dioxide, approach thus reduces a consider-
etc.) and underwater components. able quantity of energy utilization
The important application areas of that affects the increase of network
sensor nodes include industry (Lin lifetime. Threshold sensitive
et al., 2004; Gungor et al., 2009), energy efficient network (TEEN)
military (Hussain et al., 2009), protocol (Manjeshwar et al., 2001)
environment monitoring (Yick et al., is the significant cluster-based

mechanism designed for reactive sensor networks. In reactive sensor networks, the sensors are kept idle; they are triggered to activate, sense, and send the information based on sudden changes in the sensed attributes, thus preserving energy. The computation of the number of clusters for each round of operation has been an important issue in reactive sensor networks and BANs using clustering protocols. In the equally distributed cluster head methodology (DECH; Sundareswaran et al., 2015), the cluster heads are distributed equally within the clusters when the cluster heads are placed close to each other. In this work, in addition to DECH, we have computed the change in the total energy of the network between successive rounds and the change in the number of active nodes between successive rounds. An inverse exponential function has been employed to compute the optimum number of cluster heads for each round of operation, which leads to improvement in network lifetime and stability period.

As the wireless body area network (WBAN) is one of the applications of the WSN, it is assumed that this research is based on the WSN. All the sensor nodes are considered as wireless sensor nodes, and it is assumed that the nodes have the characteristics of a WSN node. The rest of this chapter is organized as follows: Section 19.2 deals with BANs, Section 19.3 explains the related works, and Section 19.4 describes the concept of the proposed methodology. Section 19.5 discusses the results, and concluding remarks are provided in Section 19.6.

19.2 BODY AREA NETWORKS

The incorporation of new technologies with WSNs leads to the development of BANs. A BAN, also referred to as a WBAN or otherwise called a body sensor network, is a wireless network having wearable computing devices and sensor devices implanted within the human body. The BAN is one of the significant applications of WSNs, normally used for monitoring the human physical condition (Bao et al., 2006). The most noticeable application of WBANs is in the medical field, human body care, and patient caring. The BAN devices may be implanted inside the body or otherwise surface-mounted on the body at a permanent spot (Darwish et al., 2011). These sensors can be used for continuously watching the movement of a person, reading essential factors like heart rate, ECG, EEG, blood pressure, etc., and sensing the neighboring atmosphere (Chen et al., 2011). Even though many of the existing healthcare systems work based on wired connections, a BAN can be a very effective solution in a healthcare system where a patient

needs to be observed constantly and needs mobility. A BAN differs from other WSNs in a few significant aspects. Mobile sensors are mostly used in BANs: the patient wearing the sensor devices can move either within a predefined environment or through different environments. The BAN nodes support low energy usage, and their cost is also cheaper when compared with WSN nodes. Considering the aspects of reliability, node complexity, and density, BAN nodes are however traditional. In a battlefield situation with a large number of soldiers, it is essential to watch the signs of the soldiers and the amount of stress induced by temperature or similar factors, to read the physical and psychological performance of the troops (Jovanov et al., 2003). In order to save the energy of the sensors in WBANs, reactive sensor nodes can be used. The reactive sensors transmit the acquired information to the sink when the sensed data deviate from the threshold values.

19.2.1 CHARACTERISTICS OF BODY AREA NETWORK

A node of the WBAN is defined (Movassaghi et al., 2014) as a physical entity that has the capacity of communication with others and possesses a limited ability to process data. The components present in a typical WBAN include the following:

Personal device: The responsibility of this device is to acquire the data transmitted from the sensors and actuators. It is also used as an interface to communicate with other users. It is otherwise called the body control unit or sink.

Sensors: Sensors are used in a WBAN to measure the given properties internally or externally. Based on the physical stimuli, the sensors read the information, process it, and transmit it to the personal device. These sensors are classified as physiological, ambient, or biokinetic sensors.

Actuators: The role of the actuator is to cooperate with the user upon getting data from the sensor nodes. Based on the sensed data, it gives feedback to the network.

Implant nodes: These tiny sensor nodes are kept inside the human body.

Body surface nodes: These nodes are mounted on the surface of the human body or a few centimeters away from the body.

External nodes: These nodes will not be in contact with the patient's body. They are kept away from the body by more than 5 cm but within 5 m.

The other components present in WBANs, which are mainly used for communication purposes, are as follows:

The coordinator: The function of the coordinator node is to act as a gateway to the rest of the world for the WBAN and as a security-based trust center. In general, the coordinator of a WBAN is the access point or a PDA device, which can be used for the entire communication among the sensor nodes.

Relay: These are transitional nodes used for transferring the messages. The relay nodes have parent and child nodes.

End nodes: The end nodes are developed such that they can be used for the specific application only. They do not have the capability of passing the data.

In a BAN with a large number of nodes, as in war places, the topology used is dynamic. If a WBAN has a limited number of sensor nodes, the IEEE 802.15.6 standard recommends the star topology to operate among the nodes. There are two types of star topology, namely, the single-hop star topology and the double-hop star topology. The star topology uses two types of communication: beacon mode and nonbeacon mode. In the beacon mode, periodic beacons are broadcast by the network coordinator to define the start and the finish of a frame, to permit network association control and synchronization of the devices. In the nonbeacon mode, carrier sense multiple access with collision avoidance is used by the nodes to send data to the coordinator.

19.2.2 BODY AREA NETWORK ARCHITECTURE

The BAN is divided into three tiers of devices based on function and communication. Figure 19.1 illustrates the architecture of a WBAN. Tier 1 consists of body sensor nodes, which may be planted on the body, within the body, or near it, and which are of two types. The first type are sensors set to react and gather data based on physical stimuli, and then process and transmit the quantity to the portable device. The second type are actuators that perform the patient's medical supervision. The sensors within the same network send the information to the actuator, or the user can interact with the actuator. Two methods of communication are performed in tier 1 of the BAN: one is communication between the body sensor nodes available in tier 1; the other is between the sensors and a portable personal server or device (PS/PD). In tier 2, the PD or access points are used to collect the data sent by the body sensors. The information sensed by the sensors and acquired by the actuators is collected at the

PD and sent through access points components that helps in the design
to outside networks wirelessly. The of body sensors nodes, position and
communication standards used in locates the sensor nodes in WBAN,
this tier include bluetooth/bluetooth signal processing, data storage
low energy, ZigBee, ultra-wide and feedback mechanism, power
band (UWB), cellular, and WLAN. source, energy harvesting technolo-
The outside users can communicate gies, dynamic control, and antenna
with the BAN using a gateway. design. (b) Protocol stack for radio
Therefore, the medical supervisors and wireless transmissions, channel
remotely attending the patients can modeling, interfaces with other
get the information immediately wireless communication standards,
through wireless communication interference, efficient medium access
or the Internet. Therefore, the state control (MAC) protocols, error
of the patient is clearly monitored, correction methods, and cross-layer
and on emergency conditions, the techniques. (c) Position and mobility
ambulance that is connected to the deal with the position and movable
outside WBAN is informed. The property of the sensor nodes. (d)
important design areas in the WBAN Security issues related to integrity,
architecture are (a) sensors, energy confidentiality, authentication, and
or power, and network hardware secured communication.

FIGURE 19.1 BAN architecture.


(Source: Adapted from Movassaghi et al., 2014.)
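The reactive-sensing rule used throughout this chapter, in which a node transmits only when the reading exceeds a hard threshold or drifts by more than a soft threshold since the last transmission (as in the TEEN protocol), can be sketched as follows; the function name, threshold values, and readings are illustrative assumptions, not values from the chapter:

```python
def should_transmit(value, last_sent, hard_threshold, soft_threshold):
    """Reactive trigger: send when the sensed value exceeds the hard
    threshold, or when it has drifted from the last transmitted value
    by more than the soft threshold."""
    if value >= hard_threshold:
        return True
    return last_sent is not None and abs(value - last_sent) > soft_threshold

# Illustrative pulse-rate stream (bpm); hard = 100 bpm, soft = 5 bpm.
last_sent = None
sent = []
for reading in [72, 74, 81, 99, 104, 96]:
    if last_sent is None or should_transmit(reading, last_sent, 100, 5):
        sent.append(reading)   # node becomes "active" and transmits
        last_sent = reading
print(sent)  # [72, 81, 99, 104, 96]
```

Only the readings that satisfy the trigger are transmitted, so nodes stay inoperative for small fluctuations, which is how the reactive scheme preserves energy.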

19.2.3 BODY AREA NETWORK APPLICATIONS

BANs have a huge potential to change the future of health care monitoring and patient health information by identifying critical health conditions and providing real-time tracking of patient health. The applications of BANs are categorized into medical and nonmedical applications. The medical applications include assessing soldier fatigue and battle readiness, aiding professional and amateur sports guidance, sleep staging, diabetes and asthma control, and patient monitoring. The nonmedical applications are real-time streaming, emergency (nonmedical), entertainment applications, emotion measurement, and secure authentication.

19.2.4 BODY AREA NETWORK LAYERS

The communication used in a BAN environment should have high reliability, low complexity, low price, ultra-low energy consumption, and short range. Since the existing layers of the IEEE standards do not meet the requirements of WBANs exactly, new physical (PHY) and MAC layers exclusively for BANs have been defined by the IEEE 802.15.6 (WBAN) working group.

Physical layer: The PHY layer of IEEE 802.15.6 designed for BANs performs the following tasks: activation and deactivation of the radio transceiver built into the device, clear channel assessment (CCA) within the current channel, and data transmission and reception. Depending upon the application, the physical layer is selected. Three types of physical layers have been specified by IEEE 802.15.6, classified as human body communication (HBC), narrowband (NB), and UWB. The NB PHY is specifically meant for data communication by a WBAN node, activation or deactivation of the radio transceiver within the node, and CCA in the existing channel. The HBC PHY issues the electrostatic field communication requirements that encapsulate modulation, the preamble/start frame delimiter, and the packet structure. The UWB physical layer is mainly focused on communication between on-body devices and transmission/reception between both on-body and off-body devices. Since the HBC PHY has been defined with different bandwidths for different countries, the finalization of the physical layer frequency bands was the key issue in the development of the IEEE 802.15.6 standard.

MAC layer: The IEEE 802.15.6 standard working group places the MAC layer above the PHY layer so that channel access can
Energy Efficient Optimum Cluster Head Estimation 431

be easily controlled. The whole channel is divided into a series of superframes for the purpose of time-referenced resource allocations. This is performed by the hub, otherwise called the coordinator, which is responsible for channel access coordination. The coordinator performs this through any one of the following access modes: beacon mode with beacon period superframe boundaries, non-beacon mode with superframe boundaries, and non-beacon mode without superframe boundaries. The responsibilities of the MAC layer in IEEE 802.15.6 are the same as in other wireless communication standards, topped with additional responsibilities.

19.2.5 ROUTING IN BAN

A lot of routing protocols are available for ad-hoc networks and WSNs. Even though BANs have functionalities identical to WSNs and ad-hoc networks, they have their own distinct characteristics. As BANs have more stringent energy restrictions in terms of energy transmission compared to traditional sensor and ad-hoc networks, node replacement, specifically for implant nodes (nodes within the human body), is not viable and might in particular scenarios require surgery. Therefore, to avoid the charging of the power source and replacing the batteries, the network lifetime of the BAN should be improved. An important issue in communication among the BAN devices is the reliability of the transmission, since the monitored data should properly reach the medical professionals.

The routing protocols in BANs are classified as follows:

(a) Cluster-based algorithms: These algorithms allot the nodes in a WBAN into separate groups named clusters, and a cluster head is allotted for every single group. These cluster heads are used to collect the data from the nodes and forward them to the sink. Therefore, multihop transmission is avoided and the number of direct communications from the nodes to the sink is also decreased, which saves a considerable amount of energy. The direct transmission of data from a sensor node to a faraway base station requires more energy than sending the data to a nearby cluster head. However, the huge overhead and delay related to cluster and cluster head selection are the main drawbacks of these protocols.

(b) Probabilistic algorithms: These types of algorithms

use a cost function to establish a path between nodes. Protocols using this type of algorithm use link-state information to renew the cost function, and construct the path between nodes with minimum cost. The disadvantage of these algorithms is that a lot of transmissions are required to update the link-state information.

(c) Cross-layer algorithms: These algorithms combine the network layer interface difficulties with the neighboring layers. The advantages of this type of algorithm are minimum energy utilization, good throughput, and constant end-to-end delay. However, a network with high path loss and mobility will perform poorly while using these algorithms.

(d) Temperature-based algorithms: Wireless communications generate radio signals that in turn produce electromagnetic fields. These electromagnetic fields pass into the human body and cause a rise in body temperature, which decreases blood flow and damages the sensitive organs due to the rise in heat. Therefore, the primary concern of all thermal-based algorithms for BANs is to stay away from routing through hot-spots. The tissue temperature of the human body also varies due to the electromagnetic fields, and heavy data traffic is one of the main causes of tissue heating. Preventive mechanisms against tissue heating are the effective implementation of traffic control mechanisms and decreasing the power of the transceivers.

(e) QoS-based routing algorithms: The final classification among the routing algorithms is the QoS routing protocols. A modular method is followed here, providing an individual module for each QoS parameter. These modules function cooperatively and share resources among themselves. The modules used in QoS-based routing algorithms are the reliability-sensitive module, the power efficiency module, the neighbor manager, and the delay-sensitive module. Hence, these algorithms provide higher reliability, lower end-to-end delay, and better throughput.

WSNs have a great number of clustering algorithms while few

algorithms are addressed for BANs. This research therefore focuses on clustering algorithms: studying the existing clustering methods and how to improve their performance with respect to network lifetime, throughput, and packet delivery ratio under specific environmental conditions.

19.3 RELATED WORKS

A lot of methods have been developed toward energy conservation of nodes in WSNs, and diverse mechanisms have been suggested in the literature. Duty cycling and data-driven approaches (Giuseppe et al., 2009; Rezaei et al., 2012) are mainly applied in sensor nodes. The duty cycling approach switches the transceiver off into sleep mode when data are not transmitted and makes the nodes ready to receive information as soon as it is available. The time duration of the nodes in the active state is called the duty cycle. A collection of energy-efficient MAC protocols has evolved (Pei et al., 2013; Demirkol et al., 2006; Naik, 2004; Batra, 2016) to preserve energy. Mobility (Sara et al., 2014) also plays a role in energy conservation in sensor networks. As the traffic around the sink in a network is always greater than that in the rest of the area, the nodes around the base station die soon. Therefore, mobile base stations can be used to collect the data. Another approach to conserve energy is the clustering approach, in which the information read by the sensor nodes is directly routed to the cluster heads. A considerable amount of energy is spent to transmit the data, depending on the distance between the sensor node and the sink; therefore, the far-away nodes would drain earlier, which affects the performance of the network in the initial rounds. The clustering technique avoids this situation by selecting the higher-energy nodes as cluster heads, while the remaining nodes send their data to the nearby cluster heads. The cluster head in turn collects and aggregates the data and sends it to the base station, so minimum energy is required to transmit the data to the nearest cluster head. Data aggregation (Nakamura et al., 2007) is a useful method performed by the cluster heads so that the redundant data are eliminated instead of being sent to the sink.

Many of the clustering algorithms (Heinzelman et al., 2002; Manjeshwar et al., 2002; Younis et al., 2004; Qing et al., 2006) focus on the selection of the cluster heads among the sensor devices. LEACH (Heinzelman et al., 2002) is the pioneer among protocols used for clustering WSN nodes. In the

LEACH protocol, the nodes are selected as cluster heads depending on a probability value. Each node is assigned a probability Pi(t) at time t. A sensor will be elected as cluster head only when it has not been a cluster head in the most recent r mod (N/k) rounds, and it presumably has higher energy than the other sensors. The probability of becoming a cluster head is thus calculated as:

    Pi(t) = k / (N − k · (r mod (N/k))),  if Ci(t) = 1
    Pi(t) = 0,  otherwise   (19.1)

where N is the number of nodes present in the WSN, r is the current round, k is the likely number of cluster heads at round r, and Ci(t) = 1 indicates that node i has not served as a cluster head in the recent rounds. Variants of LEACH (Salim et al., 2014; Batra et al., 2016; Arumugam et al., 2015) were developed in later stages with improved performance. A coordinator-based cluster head election method was proposed (Wu et al., 2011), and the network performed in a better way. Facility location theory has been used to resolve the uncapacitated facility location problem (Jain et al., 2001), and the clustering model saved energy to a specific extent.

Another distributed clustering scheme, HEED (Younis et al., 2004), selects cluster heads from the deployed sensors based on a hybrid of communication cost and energy. In HEED, each sensor is directly connected with only one cluster head. Ding et al. (2005) devised another algorithm, DWEHC, to overcome the drawback of HEED. Every node finds its weight after identifying its neighbor nodes in the neighboring vicinity; the weight is a combination of energy and closeness to the neighbors. A node having a larger weight among the others will be elected as the cluster head. Even though HEED and DWEHC look similar, the cluster heads are more evenly distributed in DWEHC than in HEED.

WSNs whose nodes have different energy levels at the beginning are called heterogeneous WSNs. Two classes of sensors with dissimilar energy levels, called normal nodes and advanced nodes, are employed in the Stable Election Protocol (Smaragdakis et al., 2004). The advanced nodes have (1 + α) times the energy of the normal nodes, and the threshold value for finding the eligible cluster heads is calculated based on the weighted probabilities. Another distributed energy-efficient clustering protocol, called distributed energy efficient clustering (DEEC) (Qing et al., 2006), has been developed for heterogeneous WSNs. In DEEC, a probability ratio between the residual energy of each node in the network and the average energy of the network is used to select the cluster head. A node having more

initial and residual energy will be having a better chance of becoming a cluster head among the nodes.

Threshold-sensitive energy efficient sensor network (TEEN), an example of the homogeneous clustering methods used in WSNs, uses two threshold values, namely, a hard threshold and a soft threshold. If the acquired value crosses the hard threshold, the sensor immediately gets activated and transmits the sensed value. The soft threshold is the minimum deviation between two acquired values beyond which the node is triggered to transmit. The number of clusters is calculated based on the probability function. In most applications using TEEN, the nodes are triggered into the active state only if the sensed value exceeds the soft and hard threshold values; otherwise, the nodes are kept in an idle state. In a real-time deployment of thousands of sensor nodes, only a selected number of nodes at a specific location may be activated. For example, in sensor networks monitoring temperature, the nodes around a place where the temperature exceeds the threshold value get activated. The TEEN protocol considers all nodes for cluster head computation, whereas the TEEN-DECH approach detects the closely placed cluster heads and distributes them equally. A system-level energy utilization model related to communication distance and communication speed has been designed (Yi et al., 2015) for on-body wireless communication.

In this research work, an attempt is made to analyze the role of network energy and active nodes in cluster head computation. Therefore, the ratio of active nodes present in successive rounds and the network energy are considered to decide the optimum number of clusters for each round.

19.4 PROPOSED METHODOLOGY

The suggested technique uses the first-order radio model for experimental purposes. This work proposes a new methodology called adaptive energy efficient cluster head estimation (ACHE) to compute the optimum number of cluster heads at each round. The subsequent section discusses the radio network model, followed by the description of the proposed TEEN-ACHE methodology.

19.4.1 NETWORK MODEL

It is assumed that all the sensors in the WSN have the same initial energy, that is, the nodes are homogeneous. The first-order radio model (Heinzelman et al., 2000) is considered for the simulation study. Depending upon the distance between the source and destination, these

radio models are classified into the multipath and free space models. In multipath propagation, the radio signals propagate from the source and reach the receiving antenna over two or more paths. Causes of multipath are ionospheric reflection and refraction, atmospheric ducting, and reflection from water bodies and terrestrial objects like mountains and buildings. Phase shifting of the signal and constructive and destructive interference are the effects of multipath. Multipath signals are received in a terrestrial environment, that is, where different types of propagation are present and the signals reach the receiving station via different paths; multipath interference therefore occurs and causes multipath fading. In the free space propagation model, the transmitting and receiving antennas are kept in an obstacle-free environment, and absorbing obstacles and reflecting surfaces are not considered. The distance d0 is calculated as

    d0 = sqrt(Efs / Emp)   (19.2)

where Efs is the energy needed to send data within free space and Emp is the energy needed for sending the data in multipath networks. It is assumed that Efs and Emp are the amplifier energies of the respective media, and the distance d0 is considered as a threshold value for selecting the medium.

Energy used for sending the data: Let ET be the energy required to transmit a packet of size P over a distance d. The values of d and d0 are used to select the medium for sending the packet. If d ≤ d0, the free space amplifier is used for sending the data; otherwise, the multipath amplifier is considered:

    If d ≤ d0,
        ET = (Eel × P) + (Efs × P × d^2)   (19.3)
    else
        ET = (Eel × P) + (Emp × P × d^4)   (19.4)

where Eel = Et + EAG. EAG is the energy used for performing aggregation; it is counted only for cluster heads, and for normal sensors this value is zero. The cluster heads collect the data from the nodes within their cluster, process and aggregate them, and transmit a single packet to the sink instead of the complete data received from each node, which reduces the energy spent on transmission. Et is the energy needed for transmitting 1 bit/m2.

Energy requirement for receiving the data:

Let ER be the energy spent on receiving a packet of size P over the distance d:

    ER = (Er × P)   (19.5)

where Er is the energy required to receive one bit/m2. Since receiving a bit of a message consumes a significant amount of energy, the protocols focus on minimizing the transmission distances and the number of transmission/reception operations for each message. The radio model used here is assumed to be symmetric, so the energy needed for transmitting a packet from one sensor to another is identical in both directions.

19.4.2 ADAPTIVE ENERGY EFFICIENT CLUSTER HEAD ESTIMATION METHODOLOGY

In real-time applications, sensor networks have thousands of sensor nodes. In reactive WSNs, the nodes are stimulated when the sensed value exceeds the threshold; therefore, the assumption is made that only part of the sensor devices is activated at a time. For example, in applications like forest fire detection or temperature monitoring, only the sensor nodes around the affected area may be activated while the remaining nodes stay in an idle state. In a similar way, a WBAN having thousands of nodes is implanted within the soldiers in a war zone; these sensor nodes implanted in the bodies of the soldiers are considered reactive sensors.

In the earlier works, and also in TEEN, the cluster count is calculated based on the probability ratio and on the nodes with higher energy that have not been selected in the recent r mod (N/k) rounds. The nodes in these cases are the normal nodes that periodically sense and broadcast information to the sink. In this work, the nodes are not always sending data; they remain in idle mode, which saves energy. When the sensed information is beyond the defined threshold value, these nodes get activated and start sending the information to the base station or sink, otherwise called the controller. Hence, every round of operation has active nodes whose number is less than the total number of alive nodes. The nodes that are alive but idle in the corresponding round are called normal nodes. In the earlier works, the normal nodes are also considered for finding the optimum number of clusters. As this work focuses on reactive sensor nodes, the normal nodes have not been considered for the computation of the optimum cluster number. Therefore, the number of clusters found using the earlier method is not optimal when a smaller number of nodes is activated. If the cluster heads are placed close with

each other, the energy dissipation becomes uneven, which leads to a shortfall of the network lifetime. The TEEN-DECH method has been used to resolve this drawback.

In this chapter, a novel methodology is suggested to compute the optimum number of cluster heads needed for each round. The number of cluster heads for each round is calculated from the ratio of the total energy of all the nodes in the WSN at the current round to that at the previous round. The ratio of the number of active nodes at the current round to that at the previous round is also taken into consideration. If Ei is the energy of the ith node in the WSN, the total energy Etot(r) of the WSN at round r is calculated as

    Etot(r) = Σ(i = 1..n) Ei   (19.6)

The change in the total energy level (Ec) of the sensor network between two consecutive rounds can be found as

    Ec = Etot(r) / Etot(r−1)   (19.7)

where Etot(r) is the total energy of the network in round r and Etot(r−1) is that in the previous round. Let Mt be the ratio of the active nodes between the current and previous rounds:

    Mt = Mr / Mr−1   (19.8)

where Mr is the total number of active sensors in round r and Mr−1 is that in round r−1. The rate of change of the above-said parameters can be computed as

    X = [Ec · Mt × 10]   (19.9)

Now, the given expression is used to find the optimum cluster count:

    Y = 1 − e^(−αX),  0 < α < 1   (19.10)

The optimum cluster count is computed as

    CHopt = ⌈p · na · Y⌉   (19.11)

where p is the probability value and na is the total number of alive sensor nodes in the network. CHopt gives the total number of optimum cluster heads to be selected in the respective round. If the number of cluster heads already computed is less than the optimal number CHopt, the balance cluster heads are elected from the sensors with more energy that have not been elected as cluster heads in the recent r mod (N/k) rounds. Otherwise, if the clusters already computed exceed the optimum cluster count, the number of excess cluster heads is found, and among the existing cluster heads those with the least energy are converted back to normal nodes. After the cluster heads are selected, the

TEEN-DECH method is applied to find the closely placed cluster heads. If cluster heads are found to be closely placed, they are replaced by the members within the same cluster having the minimum link cost.

19.5 SIMULATION RESULTS

Modifications have been made to the existing TEEN and TEEN-DECH protocols, and the simulations have been performed using MATLAB R2013a. A number of simulations with different parameters were executed to compare the performance of the TEEN and TEEN-DECH protocols with the proposed methodology. Experimental results show that the proposed methodology outperforms the existing protocols in terms of network lifetime and throughput. A field dimension of 100 × 100 with 100 nodes has been considered as the simulation environment for the experiment. The various network parameters assumed for this simulation are given in Table 19.1.

19.5.1 SIMULATION ENVIRONMENT

Every researcher wants an easy, reliable, flexible, and error-free simulation tool for new prototype development, modification, and testing. Such a tool should incorporate appropriate analysis of output data, integrate various models and finite mathematical functions, and ensure the statistical accuracy of the simulation results. The tool used for simulating the WSN running the developed protocol is MATLAB R2013a; the required simulation procedures for the transmitting nodes, the communication channel model, and the receiving node architecture are available in it. This tool is an easy-to-use environment for beginners, where problems and solutions are expressed in recognizable mathematical expressions. Due to this, the tool is recognized as one of the benchmark network simulation environments and has a significant number of users, including students and researchers.

TABLE 19.1 Network Parameters

Parameter                                            Value
Base station position                                50 × 50
Efs (amplifier used in the free space model)         10 × 10^−12 J
Emp (amplifier used in the multipath fading model)   0.0013 × 10^−12 J
EAG (aggregation energy)                             5 × 10^−9 J
Eo (initial energy)                                  0.01 J
Size of packet                                       2000 bits
Size of control packet                               100 bits
Po (cluster head election probability)               0.1
Number of rounds                                     500
Number of sensor nodes (n)                           100
440 Handbook of Artificial Intelligence in Biomedical Engineering
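The radio model of Section 19.4.1 (Equations 19.2–19.5), instantiated with the Table 19.1 constants, can be sketched in Python (standing in for the chapter's MATLAB scripts). The per-bit electronics/reception energy `E_EL` is an illustrative assumption — the chapter does not list a value for Et or Er — and the function names are ours:

```python
import math

# Amplifier and aggregation constants from Table 19.1.
E_FS = 10e-12       # free-space amplifier energy, J/bit/m^2
E_MP = 0.0013e-12   # multipath amplifier energy, J/bit/m^4
E_AG = 5e-9         # aggregation energy EAG, J/bit (cluster heads only)
E_EL = 50e-9        # assumed electronics energy Et (and Er), J/bit

def d0():
    """Threshold distance separating the two media (Eq. 19.2)."""
    return math.sqrt(E_FS / E_MP)

def transmit_energy(P, d, is_cluster_head=False):
    """Energy ET to transmit a P-bit packet over distance d (Eqs. 19.3-19.4)."""
    e_el = E_EL + (E_AG if is_cluster_head else 0.0)   # Eel = Et + EAG
    if d <= d0():
        return e_el * P + E_FS * P * d ** 2            # free-space amplifier
    return e_el * P + E_MP * P * d ** 4                # multipath amplifier

def receive_energy(P):
    """Energy ER to receive a P-bit packet (Eq. 19.5); distance independent."""
    return E_EL * P

# A 2000-bit packet over a 30 m hop to a nearby cluster head versus a
# 120 m transmission straight to the sink: the short hop uses the d^2
# term, the long one the much steeper d^4 term.
near = transmit_energy(2000, 30)
far = transmit_energy(2000, 120)
```

For these constants the crossover distance d0 is roughly 87.7 m, so the 30 m hop is far cheaper than the 120 m one — the energy rationale behind clustering discussed in Section 19.2.5.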

19.5.2 PERFORMANCE METRICS

The objectives of the performance analysis are to learn and analyze the performance of the work, identify performance problems, identify the factors that influence expectations, and understand the relationship between the expected output and the performance analysis. Performance metrics measure and analyze the actions that lead to the result, and this is the key data needed to make proper decisions and set research directions. The metrics given below have been used to analyze the efficiency of the proposed TEEN-ACHE technique against the available TEEN-DECH and TEEN protocols.

1. Network lifetime: The time until all nodes are drained of their energy. Network lifetime is alternatively defined as the time at which the first network node drains its energy to send a packet.

2. Throughput: The cumulative number of packets, sent by the sensor devices in a WBAN, that reach the base station or controller.

3. Stability period: The time duration from the start of operation of a BAN to the demise of its first sensor node.

19.5.3 RESULT ANALYSIS

Several rounds of iterations were performed in this experiment with α ranging from 0.1 to 0.9 in the expression 1 − e^(−αX) to find the best result for the optimal number of clusters. Experimental results make it clear that the proposed technique performs best at α = 0.6. Figure 19.2 illustrates the number of cluster heads in each round of operation used by the proposed method and the existing protocols. According to the algorithm, the cluster heads are first computed using Equation (19.1), and then the algorithm finds the optimal quantity of cluster heads. If the number of cluster heads already computed is greater than the optimum number, the difference, taken from the cluster heads with the least energy, is changed back to normal nodes. When the optimal quantity of clusters is greater than the already computed cluster heads, the balance clusters are regenerated using Equation (19.1). TEEN and TEEN-DECH calculate the cluster heads based on the probability value, but the TEEN-ACHE method computes the cluster headcount using the ratio of active nodes and the change in network energy between two successive rounds. The numbers of active nodes Mr at round r and Mr−1 at round r−1 are definitely not equal to the total number of nodes; therefore, the active nodes are significant in the computation of the cluster count.

The outcome of the suggested TEEN-ACHE methodology is observed using the metrics given in Section

19.5.2. In addition, the performance of the proposed method is compared with that of the existing TEEN-DECH and TEEN protocols, and the graphs are shown in Figures 19.2–19.5. Figure 19.2 illustrates the number of cluster heads in each iteration using TEEN-ACHE and the existing TEEN-DECH and TEEN protocols. It indicates that the TEEN-ACHE methodology optimally computes the cluster heads so that the energy dissipation is minimized, which leads to an increase in the lifetime of the BAN. Figure 19.3 illustrates the network lifetime of the BAN. Since the optimum number of cluster heads has been computed and used for the simulation, the unnecessary usage of nodes as cluster heads is averted. This avoids the energy loss due to nodes being cluster heads and improves the network lifetime. Similarly, a small number of cluster heads inside a network makes these cluster heads drain their energy rapidly due to heavy traffic; an energy hole is thereby created inside the network, which decreases the network lifetime. It is observed that up to round 320 the TEEN-ACHE performs better than the other protocols, and in the last stages the TEEN-DECH has slightly more live nodes.
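The computation behind these curves — Equation (19.1) for the initial election and Equations (19.6)–(19.11) for CHopt — can be condensed into a short Python sketch. This is our reading of the chapter's procedure (the brackets in Eq. 19.9 are taken as plain multiplication and Eq. 19.11 as a ceiling), with α = 0.6 as reported in Section 19.5.3; the function names are ours:

```python
import math

def leach_probability(N, k, r, eligible):
    """LEACH election probability Pi(t) (Eq. 19.1): nonzero only for nodes
    that have not served as cluster head in the last r mod (N/k) rounds."""
    if not eligible:                       # Ci(t) = 0
        return 0.0
    return k / (N - k * (r % (N // k)))

def optimum_cluster_count(E_prev, E_curr, M_prev, M_curr, p, n_alive, alpha=0.6):
    """TEEN-ACHE optimum cluster-head count CHopt (Eqs. 19.6-19.11)."""
    E_c = E_curr / E_prev                  # Eq. 19.7: network-energy ratio
    M_t = M_curr / M_prev                  # Eq. 19.8: active-node ratio
    X = E_c * M_t * 10                     # Eq. 19.9: combined rate of change
    Y = 1 - math.exp(-alpha * X)           # Eq. 19.10: inverse exponential
    return math.ceil(p * n_alive * Y)      # Eq. 19.11: CHopt

# Example round: total energy fell from 1.00 J to 0.95 J and the number
# of active (triggered) nodes fell from 40 to 30; p = 0.1 as in Table 19.1.
ch_opt = optimum_cluster_count(1.00, 0.95, 40, 30, p=0.1, n_alive=100)
```

If the heads already elected via Equation (19.1) exceed `ch_opt`, the surplus lowest-energy heads are demoted back to normal nodes; if they fall short, extra heads are drawn from eligible higher-energy sensors, as described in Section 19.4.2.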

FIGURE 19.2 Cluster head vs rounds.



FIGURE 19.3 Network lifetime.

Figure 19.4 illustrates the number of data packets received by the base station or sink from the sensor devices available in the BAN. It is inferred that the sink receives a much larger amount of data than that received using TEEN and TEEN-DECH. Since the network lifetime is improved using TEEN-DECH, the nodes are alive till round 350; therefore, the total quantity of packets received by the base station is increased. The number of packets received by the cluster heads from their respective member sensors in each round is shown in Figure 19.5. It states that the cluster heads receive more data from the sensor nodes when the TEEN-ACHE methodology is applied. Since optimal cluster heads are selected, the network lifetime is enhanced so that more data are transmitted. Using the optimal cluster count, the usage of either too many or too few clusters is prevented; this avoids unnecessary use of clusters, so the energy loss due to the clusters is avoided. Similarly, too few clusters cause heavy traffic at the cluster heads, which leads to their rapid death. Therefore, it is understood that the TEEN-ACHE algorithm functions superior to TEEN-DECH and TEEN with respect to throughput and network lifetime.
more data from the sensor nodes

FIGURE 19.4 Packets sent to base station.

FIGURE 19.5 Packets sent to cluster head.



19.6 CONCLUSION

BAN research is striving to find novel methods to minimize the energy utilization of the nodes. Although numerous methods have been used for the conservation of energy, clustering is a premier mechanism used in WSNs and BANs. Clustering techniques reduce energy consumption and packet drop rate, and extend scalability and network lifetime. There is a requirement to identify or compute the correct number of clusters to nullify the network overhead, which is a demanding task for an energy-efficient BAN, and choosing the most advantageous number of clusters for each round of operation manually is difficult. Finding the appropriate clusters to reduce energy utilization is thus an important challenge in BANs. In this work, a novel methodology is suggested to find the best possible cluster heads for each round of operation in reactive BANs. The change in total network energy between consecutive rounds and the change in the number of active nodes between consecutive rounds of operation play a major role in the calculation of the optimal cluster headcount. The inverse exponential function is used to optimally find the cluster count for each round. The outcome of the experiments makes clear that during the final rounds of operation the nodes die quickly, and the proposed method sustained its level with TEEN-DECH at the last stage. The experimental results, which depend upon the topology and network metrics, illustrate that the network's functionality is enhanced with respect to network lifetime and throughput when using the proposed methodology. For a future extension of this work, fuzzy logic will be used in selecting the cluster head. Since heterogeneous devices are connected in a BAN, the algorithm used for electing the cluster head and finding the optimum number should support the heterogeneity of the BAN.

KEYWORDS

• clustering
• sensor networks
• energy efficiency
• body area networks
• TEEN

REFERENCES

Akyildiz, Ian F., et al. "A survey on sensor networks." IEEE Communications Magazine 40.8 (2002): 102–114.
Anastasi, Giuseppe, et al. "Energy conservation in wireless sensor networks: A survey." Ad Hoc Networks 7.3 (2009): 537–568.
Arumugam, Gopi Saminathan, and Thirumurugan Ponnuchamy. "EE-LEACH: Development of energy-efficient LEACH Protocol for data gathering in

WSN." EURASIP Journal on Wireless Communications and Networking 2015.1 (2015): 1–9.
Bao, Shu-Di, Yuan-Ting Zhang, and Lian-Feng Shen. "Physiological signal based entity authentication for body area sensor networks and mobile healthcare systems." Proceedings of the 2005 27th Annual Conference on IEEE Engineering in Medicine and Biology. IEEE, 2006.
Batra, Payal Khurana, and Krishna Kant. "A clustering algorithm with reduced cluster head variations in LEACH protocol." International Journal of Systems, Control and Communications 7(4) (2016): 321–336.
Batra, Payal Khurana, and Krishna Kant. "LEACH-MAC: A new cluster head selection algorithm for wireless sensor networks." Wireless Networks 22(1) (2016): 49–60.
Chen, M., S. Gonzalez, A. Vasilakos, H. Cao, and V. C. Leung. "Body area networks: A survey." Mobile Networks and Applications 16(2) (2011): 171–193.
Coates, Robert W., et al. "Wireless sensor network with irrigation valve control." Computers and Electronics in Agriculture 96 (2013): 13–22.
Darwish, Ashraf, and Aboul Ella Hassanien. "Wearable and implantable wireless sensor network solutions for healthcare monitoring." Sensors 11(6) (2011): 5561–5595.
Demirkol, Ilker, Cem Ersoy, and Fatih Alagoz. "MAC protocols for wireless sensor networks: A survey." IEEE Communications Magazine 44(4) (2006): 115–121.
Ding, Ping, JoAnne Holliday, and Aslihan Celik. "Distributed energy-efficient hierarchical clustering for wireless sensor networks." International Conference on Distributed Computing in Sensor Systems.
"… on wireless sensor networks for stem diameter microvariation." Transactions of the Chinese Society of Agricultural Engineering 2008(11) (2008): 7–12.
Gungor, Vehbi C., and Gerhard P. Hancke. "Industrial wireless sensor networks: Challenges, design principles, and technical approaches." IEEE Transactions on Industrial Electronics 56(10) (2009): 4258–4265.
Heinzelman, Wendi B., Anantha P. Chandrakasan, and Hari Balakrishnan. "An application-specific protocol architecture for wireless microsensor networks." IEEE Transactions on Wireless Communications 1(4) (2002): 660–670.
Heinzelman, Wendi Rabiner, Anantha Chandrakasan, and Hari Balakrishnan. "Energy-efficient communication protocol for wireless microsensor networks." Proceedings of the 33rd Annual Hawaii International Conference on System Sciences. IEEE, 2000.
Huang, Pei, et al. "The evolution of MAC protocols in wireless sensor networks: A survey." IEEE Communications Surveys & Tutorials 15(1) (2013): 101–120.
Hussain, Md Asdaque, and Kwak Kyung Sup. "WSN research activities for military application." Proceedings of the 11th International Conference on Advanced Communication Technology, ICACT'2009. Vol. 1. IEEE, 2009.
Jain, Kamal, and Vijay V. Vazirani. "Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation." Journal of the ACM 48(2) (2001): 274–296.
Jovanov, Emil, et al. "Stress monitoring using a distributed wireless intelligent sensor system." IEEE Engineering in Medicine and Biology Magazine 22(3) (2003): 49–55.
Springer, Berlin, Heidelberg, 2005. Lin, Ruizhong, Zhi Wang, and Youxian Sun.
Feng, Gao, et al. "Preliminary study on "Wireless sensor networks solutions for
crop precision irrigation system based real time monitoring of nuclear power
446 Handbook of Artificial Intelligence in Biomedical Engineering

plant." Proceedings of the 5th World Salim, Ahmed, Walid Osamy, and Ahmed
Congress on Intelligent Control and M. Khedr. "IBLEACH: Intra-balanced
Automation, WCICA’2004. Vol. 4. IEEE, LEACH protocol for wireless sensor
2004. networks." Wireless Networks 20(6)
Manjeshwar, Arati, and Dharma P. Agrawal. (2014): 1515–1525.
"APTEEN: A hybrid protocol for efficient Sara, Getsy S., and D. Sridharan. "Routing in
routing and comprehensive information mobile wireless sensor network: A survey."
retrieval in wireless sensor networks." Telecommunication Systems 57(1) (2014):
Proceedings of the International Parallel 51–79.
and Distributed Processing Symposium, Smaragdakis, Georgios, Ibrahim Matta, and
IPDPS. Vol. 2. 2002. Azer Bestavros. SEP: A Stable Election
Manjeshwar, Arati, and Dharma P. Agrawal. Protocol for Clustered Heterogeneous
"TEEN: A routing protocol for enhanced Wireless Sensor Networks. Boston
efficiency in wireless sensor networks." University Computer Science Department,
2004.
Proceedings of the International Parallel
Sundareswaran, P., K. N. Vardharajulu, and
and Distributed Processing Symposium,
R. S. Rajesh. "DECH: Equally distributed
IPDPS. Vol. 1. 2001.
cluster heads technique for clustering
Movassaghi S, Abolhasan M, Lipman
protocols in WSNs." Wireless Personal
J, Smith D, Jamalipour A. Wireless
Communications 84(1) (2015): 137–151.
body area networks: A survey. IEEE
Wang, Ning, Naiqian Zhang, and Maohua
Communications Surveys & Tutorials Wang. "Wireless sensors in agriculture
16(3) (2014): 1658–86. and food industry—Recent development
Naik, Piyush, and Krishna M. Sivalingam. and future perspective." Computers and
"A survey of MAC protocols for sensor Electronics in Agriculture 50(1) (2006):
networks." Wireless Sensor Networks. 1–14.
Springer, New York, NY, USA, 2004. pp. Wu, Shan-Hung, Chung-Min Chen, and
93–107. Ming-Syan Chen. "Collaborative wakeup
Nakamura, Eduardo F., Antonio AF Loureiro, in clustered ad hoc networks." IEEE Journal
and Alejandro C. Frery. "Information on Selected Areas in Communications
fusion for wireless sensor networks: 29(8) (2011): 1585–1594.
Methods, models, and classifications." Yi, Chenfu, Lili Wang, and Ye Li. "Energy
ACM Computing Surveys 39(3) (2007): 9. efficient transmission approach for WBAN
Qing, Li, Qingxin Zhu, and Mingwen Wang. based on threshold distance." IEEE
"Design of a distributed energy-efficient Sensors Journal 15(9) (2015): 5133–5141.
clustering algorithm for heterogeneous Yick, Jennifer, Biswanath Mukherjee, and
wireless sensor networks." Computer Dipak Ghosal. "Wireless sensor network
Communications 29(12) (2006): survey." Computer Networks 52(12)
2230–2237. (2008): 2292–2330.
Rezaei, Zahra, and Shima Mobininejad. Younis, Ossama, and Sonia Fahmy. "HEED:
"Energy saving in wireless sensor A hybrid, energy-efficient, distributed
networks." International Journal of clustering approach for ad hoc sensor
Computer Science and Engineering networks." IEEE Transactions on Mobile
Survey 3(1) (2012): 23. Computing. 4 (2004): 366–379.
CHAPTER 20

SEGMENTATION AND CLASSIFICATION OF TUMOUR REGIONS FROM BRAIN MAGNETIC RESONANCE IMAGES BY NEURAL NETWORK-BASED TECHNIQUE

J. V. BIBAL BENIFA1* and G. VENIFA MINI2

1 Department of Computer Science and Engineering, Indian Institute of Information Technology, Kottayam, India
2 Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education, Kumaracoil, India

* Corresponding author. E-mail: benifa.john@gmail.com

ABSTRACT

Brain tumor imaging and interpretation is a challenging problem in the field of medical sciences. In general, healthy brain tissue as well as tumor tissue appear similar, without any differences, in brain images captured with magnetic resonance technology. This creates several complications in the course of diagnosis and treatment of affected people. Presently, image-processing segmentation algorithms are extensively employed to detect the tumor regions in MRI; however, they are incompetent to differentiate tumor tissue from the healthy tissues. In this chapter, a practical solution is proposed through a novel algorithm that segments the tumor part and the healthy part, working on the basis of image segmentation and self-organizing neural networks (NNs). The segmentation algorithm identifies the tumor regions, and boundary parameters are then gathered from the segmented images and fed into the neural network system. The classification efficiency of the proposed algorithm is also improved with the aid of neural networks. The NN-based

tumor segmentation algorithm is implemented using MATLAB 12b, and it has been tested with a dataset consisting of real images. The results indicate that the proposed neural network-based algorithm for finding brain tumors is a promising candidate with intensive practical application capabilities.

20.1 INTRODUCTION

Cancer is a life-threatening syndrome that occurs because of the uncontrolled growth of cells causing a lump called a tumor (Autier, 2015). Typically, the tumor region grows and spreads in an uncontrollable manner that leads to casualty. A brain tumor, or neoplasm within the cranium, happens when anomalous cells grow inside the brain region (Gonzales, 2012). Tumors are broadly classified into two categories, namely malignant tumors and benign tumors. Further, cancerous or malignant tumors in the brain are categorized into primary tumors and secondary tumors. Here, primary tumors originate within the brain, while secondary tumors (metastases) spread from any other location to the brain. The symptoms of brain tumors differ based on the affected regions of the brain and may comprise headaches, seizures, vision-related issues, and mental disorders. Other typical symptoms include difficulty in walking, utterance, and frequent unconsciousness. Hence, detection and segmentation of tumor regions from magnetic resonance images through advanced image processing techniques are essential in the present scenario.

Magnetic resonance imaging (MRI) technology helps in acquiring images of the brain, and it differentiates healthy and tumor tissue in the acquired image. Tissue-level differentiation is always challenging due to the unavailability of accurate image classification techniques (Nabizadeh and Kubat, 2015; Breiter et al., 1996). This occurs because of the variation in size, location, and image intensities in real images, so tumor regions cannot be easily detected by computer segmentation algorithms. Conversely, manual segmentation is observed to be a complex and time-consuming process. MRI is a common technique used for observing and collecting detailed images of the brain. Segmentation of tumor regions from MR images plays a vital role in diagnosis and in forecasting the tumor cell growth rate, as well as in planning effective treatment strategies. Tumors such as meningiomas can be easily segmented, while gliomas and glioblastomas are more complicated to localize on MRI. Further, these tumors are always hard to segment because the MR images are always diffused and

in low contrast, with extended tentacle-like structures. The shape of a tumor is always irregular, and its size varies at regular intervals, which poses additional complications. Hence, the voxel values for images obtained through MR cannot be standardized like X-ray computed tomography (CT) scans. The type of MR machine employed (tesla intensity level) and the acquisition protocol (voxel resolution, gradient strength, field-view magnitude, and b0 value) influence the picture resolution. For instance, different MR machines offer drastically different grayscale values for the same type of tumor (examined at different cancer diagnosis centers). Healthy brains comprise three types of tissue: (i) white matter, (ii) gray matter, and (iii) cerebrospinal fluid. The objective of brain tumor segmentation is to perceive the tumor regions known as active tumorous tissue, necrotic tissue, and edema (the swelling area adjacent to the tumor). This is done by subtracting the normal tissues from the abnormal areas through MRI. However, the borders of infiltrative tumors like glioblastomas are always fuzzy and complex in nature to discern from normal tissues (Hall et al., 1992).

Multifaceted MRI modality offers better brain tumor segmentation through various pulse sequences (e.g., T1 (spin-lattice relaxation), T2 (spin-spin relaxation), and diffusion MRI (DMRI) pulse sequences). The dissimilarity among these values (modalities) yields a sole signature for every class of tissue. Most recent automated brain tumor segmentation techniques employ tailored features using a classical machine learning (ML) pipeline. In these methods, image features are initially extracted and then introduced into a classifier algorithm, and the training practice does not influence the characteristics of those extracted features (Bengio et al., 2013). A feature description method should be selected based on its competency to learn intricate features directly from in-domain data. For brain tumor region segmentation, as well as to merge data across MRI modalities, deep neural networks (DNNs) are employed to learn feature hierarchies (Menze et al., 2015). Recently, convolutional neural networks (CNNs) became a foundation for computer vision researchers because of their excellent performance in the ImageNet Large-Scale Visual Recognition Challenge. The CNN is productively used for segmentation problems in nonmedical research areas as well. In this chapter, numerous CNN-based tumor segmentation architectures are presented, which include Maxout hidden units and Dropout regularization (Havaei et al., 2017).
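The extract-then-classify pipeline described above can be sketched end to end. The following is an illustrative Python/NumPy toy (the chapter's own implementation is in MATLAB): the synthetic patches, the two hand-picked intensity features, and the nearest-centroid classifier are all assumptions made for the sketch, not the authors' method.

```python
import numpy as np

# Illustrative classical ML pipeline: features are computed first, then
# handed to a classifier whose training does not alter the features.
rng = np.random.default_rng(0)

def extract_features(patch):
    """Hand-crafted features for one image patch: mean and std intensity."""
    return np.array([patch.mean(), patch.std()])

# Synthetic "healthy" (dark, smooth) and "tumor" (bright, textured) patches.
healthy = [rng.normal(0.2, 0.05, (8, 8)) for _ in range(50)]
tumor = [rng.normal(0.8, 0.20, (8, 8)) for _ in range(50)]

X = np.array([extract_features(p) for p in healthy + tumor])
y = np.array([0] * 50 + [1] * 50)

# Nearest-centroid classifier: one centroid per class in feature space.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def classify(feats):
    return int(np.argmin(np.linalg.norm(centroids - feats, axis=1)))

pred = np.array([classify(f) for f in X])
accuracy = (pred == y).mean()
print(accuracy)
```

Because the classifier only consumes the finished feature vectors, swapping the feature extractor leaves the classifier stage untouched, which is exactly the decoupling the paragraph above attributes to classical pipelines.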

The drawback associated with a few machine learning methods is that they carry out pixel classification without including the local dependencies of labels. To overcome this problem, structured output methods are usually employed; however, these methods are computationally expensive compared with other methods. Conditional random fields (CRFs) are a typical example of structured output methods. Alternatively, label dependencies can be modeled by considering the pixel-wise probability approximation of an initial CNN as additional input to specific layers of a second DNN, which forms a cascaded architecture. Since convolutions have the potential to execute the process competently, this would be significantly faster than a CRF (He and Garcia, 2009).

Neural networks (NNs) execute the classification process by learning from data and without using rule sets. A NN generalizes the variables using previous data, learns from past experience, and performs well on complex, multivariate, and noisy domains such as brain tissue segmentation. The self-organizing map (SOM) is a typical NN that uses an unsupervised competitive learning algorithm; the SOM automatically organizes itself based on the input data using a similarity factor such as Euclidean distance. Studies show that the employment of the SOM is essential to group the output relevant to the topology because it has additional output neurons beyond the types of tissue to be segmented. Topological relations of the input are preserved in the SOM, and contiguous inputs are connected to adjoining neurons. Clustering the analogous output neurons is generally done by an extra NN that exploits the weight vectors as the input parameter (Specht, 1991).

20.2 RELATED WORK

Demirhan et al. (2010) performed a study to detach the brain tumor region with SOM, where the image segmentation procedure is done by separating the images into segments known as classes or subsets according to the features or background subtraction. Subsequently, the images were segmented by means of SOM networks and gray-level co-occurrence matrices (GLCM). Previously, the performance of SOM networks and GLCM methods on image segmentation was evaluated by many researchers, and it has shown a more than 90% success rate in image segmentation applications.

Kaus et al. (1999) described a novel iterative technique for the programmed segmentation of MRI images of the brain. This iterative technique integrates a statistical taxonomy scheme and anatomical understanding from an aligned

digital atlas. For justification, the iterative method was applied to 10 tumor cases at various places in the brain, including meningiomas and astrocytomas (grades 1–3). The brain and tumor segmentation outputs were evaluated against physical segmentations done by four autonomous medical experts. It was then confirmed that the algorithm offers better accuracy than the manual segmentation process in a short period.

Reddick et al. (1997) proposed a classification method that does not necessitate any former understanding of anatomic structures and simply uses the subpixel precision in the zone of interest. They evaluated the potential of this novel algorithm in the course of segmentation of anatomic structures on simulated as well as actual brain MR images. Further, the CFM was evaluated against the level-set-based methodology in segmenting objects in a range of brain MR images. The experimental results through MRI signify that the CFM algorithm attains excellent segmentation results for brain and tumor applications.

Song et al. (2007) employed SOM by means of a weighted probabilistic NN to separately fragment the T1 and T2 MR images. The fractional contributions of every direction vector to the various target classes are estimated using the training sets, and posteriori probabilities are estimated using the Bayesian theorem. Their parametric technique assumes a probability density function of the tissues, which is not precise and does not match the real data distribution. Subsequently, Iftekharuddin (2005) employed a feedforward NN combined with an automated Bayesian regularization method for classification and further used SOM for clustering.

A low signal-to-noise ratio or contrast-to-noise ratio is found to reduce the correct segmentation ratio irrespective of the technique used. Filtering methods, such as space-invariant low-pass filtering techniques, can be applied to the images to resolve this problem. The limitations of conventional filtering techniques are blurred object boundaries and vital features, along with suppression of fine structural points such as small lesions in the images. This constraint is offset by using space-variant filters with feature-dependent techniques. Gerrig and Murphy (1992) investigated the performance of the anisotropic diffusion filter proposed by Perona and Malik and compared the results with a wide range of filters used to get rid of the arbitrary noise of the MRI. It is observed that the anisotropic diffusion filter smooths homogeneous regions, boosts the signal-to-noise ratio, and sharpens the object borders.
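A Perona-Malik-style anisotropic diffusion step of the kind evaluated above can be sketched as follows. This NumPy version is illustrative only: the conductance g(s) = 1/(1 + (s/kappa)^2), the parameter values, and the wrap-around boundary handling are assumptions of the sketch, not the filter configuration used in the chapter.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=0.2, gamma=0.2):
    """Perona-Malik diffusion sketch: smooths homogeneous regions while
    preserving edges, using conductance g(s) = 1 / (1 + (s/kappa)**2)."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbors (wrap-around borders).
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Edge-stopping conductance: small across strong gradients, so the
        # step edge diffuses far less than the noise does.
        cn = 1.0 / (1.0 + (dn / kappa) ** 2)
        cs = 1.0 / (1.0 + (ds / kappa) ** 2)
        ce = 1.0 / (1.0 + (de / kappa) ** 2)
        cw = 1.0 / (1.0 + (dw / kappa) ** 2)
        u += gamma * (cn * dn + cs * ds + ce * de + cw * dw)
    return u

# Noisy step edge: diffusion should reduce noise but keep the edge contrast.
rng = np.random.default_rng(1)
step = np.hstack([np.zeros((32, 16)), np.ones((32, 16))])
noisy = step + rng.normal(0, 0.1, step.shape)
smooth = anisotropic_diffusion(noisy)
print(noisy[:, :12].std(), smooth[:, :12].std())
```

On the synthetic step image, the within-region noise variance drops while the two plateau means stay near 0 and 1, which is the "smooth homogeneous regions, preserve borders" behavior described in the text.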

This type of filter diminishes the noise and reduces partial volume effects. It is evident that accurate segmentation of the images relies on automated feature extraction methods that decide the best features to differentiate dissimilar tissues (Gerrig and Murphy, 1992).

The wavelet transform method is generally employed in feature extraction for brain MR image segmentation because it offers efficient localization in both the spectral and spatial domains. The drawback of this method is its translation-variant characteristic: it generates different features from the same two images with a minor realignment.

From the literature, brain tumor segmentation methods for MRI are broadly divided into two categories: (i) generative models and (ii) discriminative models (Gerrig and Murphy, 1992). Generative models require domain-specific prior information about the healthy and tumorous tissues of brain images, as was discussed extensively by Prastawa et al. (2003). The appearance of tissues is complex to characterize using generative models, and generally the tumor is identified as a shape or an indication that is different from a typical brain image (Clark et al., 1998).

Specifically, brain tumor segmentation methods depend on anatomical models that are acquired subsequent to the alignment of the 3D MR image on an atlas estimated from several healthy brains' images (Doyle et al., 2013). The method specified by the International Consortium for Brain Mapping atlas computes the posterior probabilities of healthy tissues, including white matter, gray matter, and cerebrospinal fluid. Then, the tumorous regions are identified by localizing the voxels where the posterior probability estimate is under a definite threshold. Subsequently, a postprocessing action is employed to guarantee an excellent spatial regularity in the results (Gerrig and Murphy, 1992). Prastawa et al. (2003) registered brain images on top of an atlas to obtain a likelihood map for deformity, and hence an active curve is initialized on the map. Subsequently, the above-mentioned process is iterated until the posterior probability goes under a threshold.

Several active contour methods rely on left-right brain symmetry features or alignment-oriented features (Khotanlou, 2009; Cobzas et al., 2007). Alternatively, brain tumor segmentation can be done by employing discriminative models. These methods need comparatively less understanding of the brain's anatomy and necessitate extracting the image features during postprocessing. Subsequently, the relationships among these extracted features will be modeled and labeled.
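The atlas-based rule just described, flagging voxels whose best healthy-tissue posterior falls below a threshold, can be sketched numerically. The posterior values, the lesion placement, and the 0.5 threshold below are synthetic assumptions for illustration, not values from the ICBM pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (8, 8)

# Synthetic per-voxel posteriors for the three healthy classes
# (white matter, gray matter, CSF). Healthy voxels are well explained
# by one class; the 3x3 "lesion" block in the corner is not.
wm = rng.uniform(0.80, 0.95, shape)
gm = (1 - wm) * 0.7
csf = (1 - wm) * 0.3
lesion = np.zeros(shape, dtype=bool)
lesion[:3, :3] = True
for m in (wm, gm, csf):
    m[lesion] = 0.25          # no healthy class explains these voxels well

THRESH = 0.5                   # hypothetical threshold, for illustration
best_healthy = np.maximum.reduce([wm, gm, csf])
tumor_mask = best_healthy < THRESH   # candidate tumor voxels

print(tumor_mask.sum())
```

The mask recovers exactly the nine lesion voxels: everywhere else some healthy class has a high posterior, so the voxel is left unflagged.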

The features include the values of the original input pixels, local histograms, and surface features such as Gabor filter banks or alignment-oriented features. Here, the considered alignment-oriented features are the inter-image gradient, region shape distinction, and symmetry analysis (Tustison, 2013).

The discriminative models use hand-designed features, and the classifier is specifically trained to differentiate healthy from nonhealthy tissues. It is also assumed that the key features have elevated discriminative power because the behaviour of the classifier algorithm is self-regulating, independent of the characteristics of those extracted features. The complexity of these hand-designed features requires the computation of a huge number of features to ensure accuracy when employed with conventional ML methods. Efficient techniques always use fewer features, thereby employing dimensionality reduction or feature identification methods for better accuracy. Preliminary analysis has shown that deep CNNs are a proficient technique for brain tumor image segmentation (Havaei et al., 2017). Hinton et al. (2012) proposed a CNN-based method that consists of a series of convolutional layers for feature detection. Similarly, Nabizadeh and Kubat (2015) investigated various classification methods for tumor segmentation in brain images and suggested that neural network-based methods offer the best results compared to other techniques for the above-stated problem.

20.3 PROPOSED WORK

The brain tumor regions in MR images are differentiated from the healthy tissues by image segmentation, and thereby the NN is trained to obtain the classification efficiency. In order to enhance the accuracy of differentiation, a NN-based classification is proposed, and its classification efficiency is computed for brain tumor images. The results obtained through the proposed method are more robust, and it supports accurate segmentation of brain tumor regions.

The sequence of operations involved in the brain tumor segmentation process is presented in Figure 20.1. The input images are selected from standard cancer image databases and are commonly known as MR images. Here, the MRI of the brain is balanced by converting it into a grayscale image and resizing it to a standard resolution. The real two-dimensional (2D) MRI scan image of a brain from the DICOM data is obtained from the 64-slice CT scan machine. The sequences of images are taken from different projections by rotating the gantry. This image provides the complete view of a brain, and it is used for the subsequent analysis.
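The first preprocessing steps above, grayscale conversion and resizing to a standard resolution, can be sketched as follows. The luminance weights, nearest-neighbor resizing, and 0-1 intensity normalization are illustrative choices for this NumPy sketch, not the chapter's MATLAB routines.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion of an H x W x 3 array."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize to a standard resolution."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def normalize(img):
    """Scale intensities to the 0-1 range."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

rng = np.random.default_rng(3)
scan = rng.uniform(0, 255, (60, 40, 3))          # synthetic RGB "scan"
gray = to_grayscale(scan)
std = normalize(resize_nearest(gray, 32, 32))    # standard 32 x 32 slice
print(std.shape, float(std.min()), float(std.max()))
```

Every input thus reaches the later segmentation stages at the same resolution and intensity scale, which is what "balancing" the MRI refers to above.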

In the course of analysis, the brain image is segmented to measure the cancerous area thoroughly inside the brain. The regions except the cancerous area are considered to be noise and have to be filtered.

The input MRI images were normalized, preprocessed, and registered initially. Subsequently, the noise in the image, such as the strip of the skull or any other clinically unwanted portion, is removed. Once the irrelevant part is removed, anisotropic diffusion filtering is applied all over the image (Nair et al., 2019). Anisotropic diffusion filters outperform isotropic ones in a few applications, such as de-noising of highly degraded edges. Anisotropic diffusion filters typically employ spatial regularization strategies by considering the modulus of the edge detector, ∇u, and its direction. The orthonormal system of eigenvectors u1, u2 of the diffusion tensor is estimated by u1 ∥ ∇u. In order to achieve smoothing along the edge, Galic et al. (2008) considered selecting the relevant eigenvalues λ1 and λ2 as

    λ1(∇u) := g(|∇u|²)    (20.1)

FIGURE 20.1 Block diagram of the proposed system.

In mathematical morphology, dilation and erosion are the operations usually performed on binary images; dilation expands the boundaries consisting of foreground pixels (Chen and Haralick, 1995). Hence, the regions of foreground pixels increase in size, while the holes inside those areas become smaller in size. The inputs to the dilation operator are an image to be dilated and a set of coordinate points called a kernel. A kernel is a structuring element that decides the accurate consequence of the dilation operator on the input MRI.

In a mathematical insight, a Gaussian function is applied to minimize the noise existing in the MRI images (Chaddad, 2015). When an image is blurred by the influence of a Gaussian function, it is called Gaussian blur or Gaussian smoothing.
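Dilation by a structuring element, as described above, can be sketched in a few lines. The pure-NumPy implementation and the 3x3 cross-shaped kernel are assumed choices for the illustration, standing in for a real morphology library.

```python
import numpy as np

def dilate(image, kernel):
    """Binary dilation: an output pixel becomes foreground if the kernel,
    centered on it, covers at least one foreground pixel of the input."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))   # zero (background) border
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = int(np.any(window & kernel))
    return out

# 3x3 cross-shaped structuring element (illustrative choice).
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]])

img = np.zeros((7, 7), dtype=int)
img[3, 3] = 1                      # single foreground pixel
grown = dilate(img, cross)
print(grown.sum())
```

A lone foreground pixel grows into the shape of the kernel itself, which makes concrete how the structuring element "decides the consequence" of the operator.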

The visual outcome of the Gaussian blurring method is a smooth blur that is similar to the visualization of an MRI through a transparent screen. Gaussian smoothing can also be exploited in the preprocessing phase of computer vision-based algorithms to improve the image structures. Further, it minimizes the high-frequency components of an input image. One of the essential characteristics of Gaussian filters is to have zero overshoot to a step-function input while the rise and fall duration is minimized. A Gaussian filter is a filter whose impulse response is a Gaussian function; it transforms the input signal by convolution, and this transformation is called the Weierstrass transform (Kavitha and Chellamuthu, 2013).

The background pixels of a binary input image along with its structuring element are used for computing the dilation as well as the erosion factors. The structuring element is placed over the input image such that the origin of the element matches the corresponding input pixel positions in the background. If at least one pixel in the structuring element matches a foreground pixel in the base image, then the input pixel is set to the corresponding foreground value (Chen and Haralick, 1995). Similarly, if all the corresponding pixels in the base image lie in the backdrop, then the input pixel is fixed to the background value. The structuring element should be assigned as a small binary image or in a unique matrix format for postprocessing. In general, the dilation and erosion operations are performed prior to the feature extraction and image segmentation processes. Here, the shapes and different objects existing in the images are analysed along with the boundary properties.

20.3.1 FEATURE EXTRACTION THROUGH DISCRETE WAVELET TRANSFORM (DWT)

The DWT is a potential mathematical means of feature extraction, and it has been employed to filter the wavelet coefficients from MR images (Arizmendi, 2012). Typically, wavelets are localized basis functions derived from mother wavelets, and they provide localized frequency information about the function of a signal. The continuous wavelet transform (w_ψ) of a signal x(t), with respect to a real-valued wavelet ψ(t), is expressed as

    w_ψ(a, b) = ∫_{−∞}^{∞} x(t) ψ*_{a,b}(t) dt    (20.2)

where "a" is the dilation factor, "b" is the translation parameter, and

    ψ_{a,b}(t) = (1/√a) ψ((t − b)/a)    (20.3)

The wavelet ψ_{a,b} is estimated from the mother wavelet ψ by the process of translation and dilation.
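Equation (20.3), building ψ_{a,b} from a mother wavelet by translation and dilation, can be checked numerically. The Haar mother wavelet below is an assumed choice for this NumPy sketch.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), else 0."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def daughter(t, a, b):
    """psi_{a,b}(t) = (1/sqrt(a)) * psi((t - b) / a), i.e., Eq. (20.3)."""
    return haar((t - b) / a) / np.sqrt(a)

t = np.linspace(-4, 4, 8001)
dt = t[1] - t[0]
psi = daughter(t, a=2.0, b=1.0)     # dilated by 2, translated to start at 1

# Dilation stretches the support to width 2; translation moves it to [1, 3).
support = t[np.abs(psi) > 0]
print(support.min(), support.max())

# The zero-mean property of the mother wavelet is preserved.
print(psi.sum() * dt)
```

The 1/sqrt(a) factor keeps the energy of every daughter wavelet equal to that of the mother wavelet, so coefficients at different scales remain comparable.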

It is assumed that the mother wavelet ψ satisfies the condition of zero mean, and the equation can be discretized by restricting "a" and "b" to a discrete lattice to represent the DWT. The DWT is a linear transformation method that functions on a data vector whose dimension is an integer power of two, and it transforms it into various vectors of equivalent length (Arizmendi, 2012). Using the DWT, the data is divided into multiple frequency components, and it can be expressed as

    x(n) → { d_{j,k} = Σ_n x(n) h*_j(n − 2^j k),
             a_{j,k} = Σ_n x(n) g*_j(n − 2^j k) }    (20.4)

The coefficients d_{j,k} are the key elements of the signal x(n) and correspond to the wavelet function. The h(n) and g(n) in Equation (20.4) signify the coefficients of the high-pass and low-pass filters, while the other two factors, j and k, correspond to the wavelet scale and translation factors. The key aspect of the DWT is the multiresolution expression of a function: it can be analyzed at different ranges of resolution through wavelets. The input image is analyzed along the x and y axes by the h(n) and g(n) functions, which is the row-wise demonstration of the actual image. The output of these transformations is summarized as sub-bands such as LL, LH, HH, and HL.

Texture feature extraction using GLCM simply distinguishes normal and abnormal (malignant) tissues (Joshi et al., 2010). In first-order numerical texture analysis, texture data is filtered from the histogram that represents the image intensity. The GLCM method determines the occurrence of a specific gray level at an arbitrary image position and does not include the correlations among pixels. The geometric features from MRI are acquired through the gray-level spatial dependence matrix. The GLCM approach is a 2D histogram whose (i, j)th node is the occurrence of event "i" coinciding with "j." It has a function with distance d = 1 and angles of 0° (horizontal), 45° (along the +ve transverse), 90° (normal), and 135° (along the −ve transverse) to calculate pixel intensity. The GLCM approach has been formulated to capture texture features including contrast, correlation, energy, homogeneity, and entropy that can be obtained from the LH and HL sub-bands.

20.4 IMAGE SEGMENTATION

The segmentation method isolates an image into numerous parts, and it is exercised to recognize an object or other applicable information existing in digital images.
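A minimal GLCM in the spirit described above (distance d = 1, angle 0°), together with two of the named texture features, can be sketched as follows. This NumPy toy with a tiny 4-level image is illustrative, not the chapter's MATLAB implementation.

```python
import numpy as np

def glcm(img, levels):
    """Gray-level co-occurrence matrix for d = 1, angle 0 degrees:
    counts horizontal neighbor pairs (i, j) of gray levels."""
    m = np.zeros((levels, levels), dtype=float)
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[a, b] += 1
    return m / m.sum()              # normalize to co-occurrence probabilities

def contrast(p):
    """Weights each pair by its squared gray-level difference."""
    i, j = np.indices(p.shape)
    return float(((i - j) ** 2 * p).sum())

def energy(p):
    """Sum of squared probabilities; high for uniform textures."""
    return float((p ** 2).sum())

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(img, levels=4)
print(round(contrast(p), 3), round(energy(p), 3))
```

The other angles named in the text (45°, 90°, 135°) only change which neighbor offset is paired, so the same counting loop applies with shifted slices.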

Image segmentation can be performed through multiple strategies, which include (i) the thresholding method (Marquez, 2016), (ii) color-based segmentation methods (Marquez, 2016), and (iii) transform methods (Arizmendi, 2012). The segmentation results obtained through Otsu's thresholding method, color-based segmentation such as K-means clustering (Selvakumar, 2012), and transform methods are presented in Figure 20.2(a), (b), and (c), respectively.

FIGURE 20.2 Sample results for image segmentation techniques.

In the presented results (Figure 20.2), the boundary features along with their values are acquired and fed into a NN classification tool. Primarily, the SOM is trained to map the input image against the relevant tissue regions based on their features, taking into account their usual clusters in the matrix (Kohonen, 1998). The SOM reduces the dimension and clusters linked regions jointly for understanding large image datasets. The SOM consists of two layers, namely (i) input nodes (in layer 1) and (ii) output nodes (in layer 2). Here, the output nodes are presented in the form of a 2D grid, and there are adaptable weights between every output. The map characterizes various features with the finest accuracy using a constrained set of clusters that is confirmed by multidimensional observation. At the final training stage, the clusters turn into an organized grid so that similar clusters are close to each other and unlike clusters are isolated from each other. At the end of training, a confusion matrix is plotted in order to assess the efficiency of the NN classifiers.

The classifier employed in this chapter is a back-propagation-based NN classifier (Nekovei, 1995). Each neuron admits a signal from the neurons in the preceding layer, and the signals are multiplied by an assigned weight value. The weighted values are aggregated and passed through a limiting function that scales the result to a finite range of values. Each connection among the neurons has an exclusive weighting value, and the inputs from the previous neurons are independently weighted and then summed. The successive results are nonlinearly scaled between the values 0 and +1, and the output is forwarded to the neurons in the subsequent layer.
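The forward pass just described (independent weighting, summation, and a limiting function scaling results into (0, 1)) and the backpropagation update discussed below can be sketched as a tiny network. The layer sizes, learning rate, and logical-OR training set are assumptions of this illustrative NumPy toy, not the chapter's MATLAB classifier.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # limiting function, output in (0, 1)

# Toy training set (logical OR) with correctly known outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [1]], dtype=float)

# Small 2-3-1 network; sizes and learning rate are illustrative choices.
W1 = rng.normal(0, 0.5, (2, 3)); b1 = np.zeros(3)
W2 = rng.normal(0, 0.5, (3, 1)); b2 = np.zeros(1)
lr = 1.0

for epoch in range(5000):
    # Forward pass: weighted sums passed through the limiting function.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    err = y - t                              # drives the mean-squared error
    # Backward pass: propagate the error and adjust weights layer by layer.
    d2 = err * y * (1 - y)
    d1 = (d2 @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(axis=0)
    W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(axis=0)

mse = float((err ** 2).mean())
print(mse)
```

The loop is exactly the cyclic process described in the next paragraphs: apply sample cases, compare against the known outputs, back-propagate the mean-squared error, and repeat until the error magnitude is small.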
458 Handbook of Artificial Intelligence in Biomedical Engineering

assigned weights among the neurons. Hence, a mechanism is required for regulating the assignment of weights for the aforementioned problems. Typically, the backpropagation (BP) network learning algorithm is employed for such problems, and it learns from a few input examples (training dataset) and the correctly known output for each case.

The BP learning process functions in a repetitive manner: sample cases are applied to the NN, which generates the output related to the present state of the assigned weights. This output is validated against a sample output, and a mean-squared error signal is determined. Subsequently, the determined error value is back propagated through the NN, and the weights in each layer are regulated to the minimum level. The weight differences are estimated to minimize the error signal, and the entire process is repeated for each sample case. This cyclic process is done until the net error magnitude drops below the encoded threshold. At this stage, the NN has learned the function and asymptotically advances towards the ideal function.

20.5 EXPERIMENTAL RESULTS AND DISCUSSION

The objective of this chapter is to identify the brain tumor tissue in an MRI image by differentiating it from the healthy surrounding tissues. To achieve the proposed objective, the following evaluations are done through the sequence of operations mentioned in Figure 20.1.

1. To identify tumorous parts by differentiating them from the healthy tissue region in the preprocessing phase by MRI segmentation.
2. To obtain the boundary parameters and coefficients of the tumor area from the MRI data.
3. To train the NN with past classification data and obtain performance parameters.
4. To analyze the overall efficiency and performance parameters of the proposed system to compare the performance with the state-of-the-art methods.

The proposed work has been simulated using the MATLAB environment and the results are presented in the subsequent sections with discussion. The results are classified into three categories, namely preprocessing, segmentation, and NN performance evaluation results.

20.5.1 RESULTS FOR PREPROCESSING PHASE

In the preprocessing phase, the training images have been collected
Segmentation and Classifcation of Tumour Regions 459

from the National Cancer Archives of the United States of America database and used as the input for this research work (Kinahan, 2019). The image dataset for the proposed work is presented in Figure 20.3.

FIGURE 20.3 Dataset used in the proposed research work.
(Source: Used with permission from Kinahan et al., 2019.)

The input MRI of the brain is taken using an MR scanner, which is considered to be the test input for the proposed work. The test input MRI is presented in Figure 20.4(a) and the corresponding gray-scale converted image is displayed in Figure 20.4(b). Here, the conversion is essential so that the input image can be processed easily by the upcoming blocks. Subsequently, a filtering operation is performed to remove the unwanted noise present in the image. Once the grey-scale conversion is completed and the unwanted noise is removed, a Sobel edge detector is applied. These edge detectors identify the edges present in the image using the Sobel algorithm (Rulaningtyas, 2009). The output image obtained from the edge detector block is presented in Figure 20.5(a). Once the edge detection is completed, the output will be utilized for creating the dilated image with a gradient mask, as highlighted in Figure 20.5(b).

FIGURE 20.4 Input MRI of brain and converted images.
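The grayscale conversion and Sobel step described above can be sketched in a few lines. The following is an illustrative pure-Python toy, not the authors' MATLAB code; the Sobel kernels are the standard 3x3 ones, and the tiny 4x4 RGB image with a vertical edge is entirely hypothetical.

```python
# Illustrative sketch: grayscale conversion and Sobel gradient magnitude
# on a tiny image stored as nested lists (no image libraries assumed).

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def to_gray(rgb):
    """Luminosity-weighted grayscale conversion of an RGB pixel grid."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]

def sobel_magnitude(gray):
    """Approximate gradient magnitude; border pixels are left at zero."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * gray[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A hypothetical 4x4 image with a vertical edge: left dark, right bright.
img = [[(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)]] * 4
edges = sobel_magnitude(to_gray(img))
```

Interior pixels next to the dark/bright boundary receive a large gradient magnitude, while border pixels stay at zero; thresholding such a map yields the binary edge image passed to the dilation stage.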


FIGURE 20.5 Preprocessed and dilated images for edge detector block.

20.5.2 RESULTS FOR SEGMENTATION PHASE

Image dilation is applied for identifying the neighbouring connected components of an object present in an image. The edge-detected image is dilated using a gradient mask in order to identify the neighbourhood connected components. After the first-level dilation is completed, stage-2 dilation is performed to thicken the connected components, and the corresponding output is presented in Figure 20.6(a). After the image dilation process is completed, a binary gradient mask is applied to the second-level dilated image (Zacharaki et al., 2009). The output image after the binary gradient mask is applied is presented in Figure 20.6(b).

FIGURE 20.6 Dilated images with gradient masks.
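The two-stage dilation described above can be illustrated with a minimal sketch, again a pure-Python toy rather than the MATLAB pipeline used in this work; the 3x3 structuring element and the small 0/1 mask standing in for the gradient mask are assumptions for illustration.

```python
# Illustrative sketch: binary dilation with a 3x3 structuring element
# on a small 0/1 mask (a hypothetical stand-in for the gradient mask).

def dilate(mask):
    """Set a pixel if any pixel in its 3x3 neighbourhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                mask[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if 0 <= y + dy < h and 0 <= x + dx < w))
    return out

thin_edge = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
thick = dilate(thin_edge)   # first-level dilation
thicker = dilate(thick)     # stage-2 dilation thickens the component further
```

Each pass grows the connected component by one pixel in every direction, which is exactly why stage-2 dilation produces the visibly thicker components in Figure 20.6(a).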


FIGURE 20.7 Eroded image and final marked image after segmentation process.

An image erosion procedure is applied after the binary gradient mask phase is completed. The abnormal peaks of intensities are faded away from the dilated image in the course of the erosion process, and the output is shown in Figure 20.7(a). The eroded image is segmented for abnormal features and the tumorous tissues are marked. Once the segmentation and marking process is completed, the object-wise features such as contours, area, and their properties are extracted using the region properties algorithm (Hazem, 1988). The feature properties are obtained for the entire image in the dataset and will be further fed into the NN classifier.

20.5.3 NEURAL NETWORK CLASSIFIER PERFORMANCE EVALUATION

The object properties obtained for all the images are assigned as input to the NN classifier. The extracted object features include white matter, grey matter, and edema along with their structural and intensity features (Nabizadeh and Kubat, 2015).

FIGURE 20.8 Error histogram and confusion matrix of NN.
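The "region properties" extraction mentioned above can be sketched as connected-component labelling followed by per-region measurements. This is an illustrative simplification in pure Python (the chapter's pipeline uses MATLAB-style regionprops-like measurements); the 4-connectivity choice and the tiny segmented mask are assumptions.

```python
# Illustrative sketch: label 4-connected components in a binary mask and
# report each region's area and bounding box (object-wise features).

def region_properties(mask):
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, pixels = [(y, x)], []
                seen[y][x] = True
                while stack:                       # flood fill one region
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                regions.append({"area": len(pixels),
                                "bbox": (min(ys), min(xs), max(ys), max(xs))})
    return regions

segmented = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
props = region_properties(segmented)
```

Each dictionary in `props` corresponds to one marked region; the area and bounding-box values are the kind of object-wise features that are flattened into the input vector for the NN classifier.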


A histogram is an approximation of the probability distribution of a continuous variable, and generally it is represented using a graphical distribution of numerical data. To create a histogram, initially "bin" the range of values and subsequently identify the values that fall into each bin. The bins are commonly specified as successive, nonoverlapping intervals of a variable, and the intervals should be uniformly located at equal distances. In this work, the classification error that occurred at each interval is plotted in the histogram displayed in Figure 20.8(a). In particular, at the error interval from 0.01419 to 0.956, there are totally 8 instances that occur with a probability of zero error. This is known as the optimum interval of the classification using the NN classifier.

The NN confusion matrix is vital for performance evaluation of the NN-based classification, as highlighted in Figure 20.8(b). It is observed that 70% of the tumor images were successfully classified by the NN-based classifier through the BP algorithm based on the image features. Outside this 70%, there are three images that are true positives but could not be classified correctly by the classifier. From the confusion matrix plot, it is found that the classifier was competent to achieve 0% error while classifying true negative images. Thus, the overall classification efficiency is maintained at 70% and the false classification is about 30% during the first instance of the training process. This level of classification is a promising one, which can be improved substantially with a larger dataset and additional training.

The training state statistics show that the NN classifier achieves an optimal minimum value at point 10 and the state is maximum at points 4 and 5. The gradient peak achieved without error is about 0.011789, and the training state statistics are presented in Figure 20.9.

FIGURE 20.9 Training state statistics.

The overall performance analysis curve is given in Figure 20.10. It shows that the validation curve achieves its peak without error at epoch 4. The peak best performance occurred at epoch 4 with a score of 0.37558. From the training state statistics, it is observed that the overall validation efficiency is constantly above the required level. The state of testing and training varies exponentially until epoch 4, and it is stabilized afterwards.

FIGURE 20.10 Overall performance of the proposed research work.

20.5.4 COMPARISON WITH THE STATE-OF-THE-ART METHODS

Four supervised robust classification techniques are applied for the comparison and validation of the present work. The techniques such as SVM, KNN, NSC, and k-means clustering are compared with the NN-based method used for the brain tumor segmentation process (Nabizadeh and Kubat, 2015). The number of healthy frames is more than the number of tumor frames, which makes the training set fully unbalanced. To minimize the difficulties caused while handling unbalanced training datasets, an equal number of healthy and tumor frames was preferred for the experimental analysis.

The accuracy of different classification methods was compared with the proposed work using statistical features with noise reduction, as presented in Figure 20.11. In addition, the number of features along with the recognition rate for SVM and NN based methods are presented in Figure 20.12. The results show that the proposed NN-based segmentation and classification methods are efficient for determining the tumorous tissues from the brain MRI and classifying them precisely.
FIGURE 20.11 Accuracy of various classification methods through statistical features with noise reduction.

FIGURE 20.12 Feature recognition rate.
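The confusion-matrix figures reported above can be reduced to a few standard metrics. The sketch below is illustrative Python (not the MATLAB evaluation code), and the counts are hypothetical, chosen only to mirror the reported case: three tumor (positive) images missed, zero error on healthy (negative) images, 70% overall efficiency on ten test images.

```python
# Illustrative sketch: deriving classification efficiency measures from a
# 2x2 confusion matrix (tp/fn/fp/tn counts below are hypothetical).

def confusion_metrics(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total                     # overall efficiency
    sensitivity = tp / (tp + fn) if tp + fn else 0.0 # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0 # true negative rate
    return accuracy, sensitivity, specificity

acc, sens, spec = confusion_metrics(tp=2, fn=3, fp=0, tn=5)
```

With these hypothetical counts, accuracy is 0.7 and specificity is 1.0, matching the pattern of "70% overall, 0% error on true negatives" discussed in Section 20.5.3.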

20.6 CONCLUSION

The brain tumor detection is done by an image segmentation process and NN-based classification. The input brain MR images are preprocessed by filtering and noise removal, followed by the removal of unwanted data present in the image, such as the skull strip. Here, the NN-based classification method is proficient for segmentation, and it differentiates the brain tumor area from the healthy tissue region. The proposed algorithm is able to obtain the object features present in the image, which can subsequently be fed into the NN classifier. The BP-based NN classifier trains itself with the input dataset and achieves subsequent learning with feed-forward learning. Once the learning is completed, the performance efficiency of the system is evaluated with the confusion matrix plot and performance curves. This system is known to be outperforming
the other existing techniques in terms of accuracy and performance efficiency. The present chapter shall be extended with a larger dataset by including prior knowledge about shape and model features in the tumor segmentation. In addition, it can be extended further by including the morphological structure related data of the input brain MRI to train the NN. Hence, such detailed input data helps the NN tool to perceive greater information from the MRI for extensive medical applications.

KEYWORDS

• brain tumor
• segmentation
• discrete wavelet transform
• neural networks

REFERENCES

Autier, P. Risk factors and biomarkers of life-threatening cancers. Ecancer Medical Science. 2015, 9:596, 1–8.
Ahmad Chaddad. Automated feature extraction in brain tumor by magnetic resonance imaging using Gaussian mixture models. International Journal of Biomedical Imaging. 2015, 2015, 1–11.
Breiter, Hans C., Scott L. Rauch, Kenneth K. Kwong, John R. Baker, Robert M. Weisskoff, David N. Kennedy, Adair D. Kendrick et al. Functional magnetic resonance imaging of symptom provocation in obsessive-compulsive disorder. Archives of General Psychiatry. 1996, 53:7, 595–606.
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013, 35:8, 1798–1828.
Clark, M.C., Hall, L.O., Goldgof, D.B., Velthuizen, R., Murtagh, F.R., Silbiger, M.S. Automatic tumor-segmentation using knowledge-based techniques. IEEE Transactions on Medical Imaging. 1998, 117, 187–201.
Cobzas, D., Birkbeck, N., Schmidt, M., Jagersand, M. and Murtha, A. 3D variational brain tumor segmentation using a high dimensional feature set. Mathematical Methods in Biomedical Image Analysis (MMBIA 2007). 2007, 1–8.
Carlos Arizmendi, Alfredo Vellido and Enrique Romero. Classification of human brain tumours from MRS data using discrete wavelet transform and Bayesian neural networks. Expert Systems with Applications. 2012, 39:5, 5223–5232.
Chen, S. and Haralick, R. M. Recursive erosion, dilation, opening, and closing transforms. IEEE Transactions on Image Processing. 1995, 4:3, 335–345.
Demirhan, Ayse, Memduh Kaymaz, Raşit Ahıska, and Inan Guler. A survey on application of quantitative methods on analysis of brain parameters changing with temperature. Journal of Medical Systems. 2010, 34:6, 1059–1071.
Doyle, S., Vasseur, F., Dojat, M., and Forbes, F. Fully automatic brain tumor segmentation from multiple MR sequences using hidden Markov fields and variational EM. Procs. NCI-MICCAI BraTS. 2013, 18–22.
Galic, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A., Seidel, H.P. Image compression with anisotropic diffusion. Journal of Mathematical Imaging and Vision. 2008, 31:2–3, 255–269.
Gerrig, Richard J., and Gregory L. Murphy. Contextual influences on the comprehension of complex concepts. Language and Cognitive Processes. 1992, 7:3–4, 205–230.
Hall, Lawrence O., Amine M. Bensaid, Laurence P. Clarke, Robert P. Velthuizen, Martin S. Silbiger, and James C. Bezdek. A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain. IEEE Transactions on Neural Networks. 1992, 3:5, 672–682.
Havaei, Mohammad, Axel Davy, David Warde-Farley, Antoine Biard, Aaron Courville, Yoshua Bengio, Chris Pal, Pierre-Marc Jodoin, and Hugo Larochelle. Brain tumor segmentation with deep neural networks. Medical Image Analysis. 2017, 35, 18–31.
Hazem M. Raafat, Andrew K.C. Wong. A texture information-directed region growing algorithm for image segmentation and region classification. Computer Vision, Graphics, and Image Processing. 1988, 43:1, 1–21.
He, Haibo, and Edwardo A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering. 2009, 21:9, 1263–1284.
Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. Neural and Evolutionary Computing. https://arxiv.org/abs/1207.0580, 2012.
Iftekharuddin, K.M. On techniques in fractal analysis and their applications in brain MRI. Medical Imaging Systems: Technology and Applications, Analysis and Computational Methods. 2005, 1, 63–86.
Joshi, D. M., Rana, N. K. and Misra, V. M. Classification of brain cancer using artificial neural network. 2nd International Conference on Electronic Computer Technology, Kuala Lumpur. 2010, 112–116.
Kaus, M. R., Simon K. Warfield, Arya Nabavi, E. Chatzidakis, Peter M. Black, Ferenc A. Jolesz, and Ron Kikinis. Segmentation of meningiomas and low-grade gliomas in MRI. International Conference on Medical Image Computing and Computer-assisted Intervention, Springer, Berlin, Heidelberg. 1999, 1–10.
Kavitha, A. R., Chellamuthu, C. Detection of brain tumour from MRI image using modified region growing and neural network. The Imaging Science Journal. 2013, 61:7, 556–567.
Kinahan, P., Muzi, M., Bialecki, B., Herman, B., and Coombs, L. Data from ACRIN-DSC-MR-Brain [Data set]. The Cancer Imaging Archive. 2019, DOI: https://doi.org/10.7937/tcia.2019.zr1pjf4i.
Khotanlou, Hassan, Olivier Colliot, Jamal Atif, Isabelle Bloch. 3D brain tumor segmentation in MRI using fuzzy classification, symmetry analysis and spatially constrained deformable models. Fuzzy Sets and Systems. 2009, 160:10, 1457–1473.
Menze, Bjoern H., Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Transactions on Medical Imaging. 2015, 34:10, 1993–2024.
Marquez, Cristian. Brain tumor extraction from MRI images using MATLAB. International Journal of Electronics, Communication and Soft Computing Science and Engineering (IJECSCSE). 2016, 2:1, 1.
Michael Gonzales. Classification and pathogenesis of brain tumors. Brain Tumors (Third Edition). 2012, 36–58.
Nabizadeh, Nooshin and Miroslav Kubat. Brain tumors detection and segmentation in MR images: Gabor wavelet vs. statistical features. Computers and Electrical Engineering. 2015, 45, 286–301.
Nair, R.R., David, E. and Rajagopal, S. A robust anisotropic diffusion filter with low arithmetic complexity for images. EURASIP Journal on Image and Video Processing. 2019, 1, 48.
Nekovei, R. and Ying Sun. Back-propagation network and its configuration for blood vessel detection in angiograms. IEEE Transactions on Neural Networks. 1995, 6:1, 64–72.
Prastawa, Marcel, Elizabeth Bullitt, Nathan Moon, Koen Van Leemput, and Guido Gerig. Automatic brain tumor segmentation by subject specific modification of atlas priors. Academic Radiology. 2003, 10:12, 1341–1348.
Reddick, Wilburn E., John O. Glass, Edwin N. Cook, T. David Elkin, and Russell J. Deaton. Automated segmentation and classification of multispectral magnetic resonance images of brain using artificial neural networks. IEEE Transactions on Medical Imaging. 1997, 16:6, 911–918.
Rulaningtyas, R. and Ain, K. Edge detection for brain tumor pattern recognition. International Conference on Instrumentation, Communication, Information Technology, and Biomedical Engineering. 2009, 1–3.
Specht, Donald F. A general regression neural network. IEEE Transactions on Neural Networks. 1991, 2:6, 568–576.
Song, Bo, Weinong Chen, Yun Ge, and Tusit Weerasooriya. Dynamic and quasi-static compressive response of porcine muscle. Journal of Biomechanics. 2007, 40:13, 2999–3005.
Selvakumar, J., Lakshmi, A. and Arivoli, T. Brain tumor segmentation and its area calculation in brain MR images using K-mean clustering and fuzzy C-mean algorithm. IEEE-International Conference on Advances in Engineering, Science and Management (ICAESM—2012), Nagapattinam. 2012, 186–190.
Teuvo Kohonen. The self-organizing map. Neurocomputing. 1998, 21:1–3, 1–6.
Tustison, N. J. Instrumentation bias in the use and evaluation of scientific software: Recommendations for reproducible practices in the computational sciences. Front. Neurosci. 2013, 7, 162.
Zacharaki, E.I., Wang, S., Chawla, S., Soo Yoo, D., Wolf, R., Melhem, E.R., Davatzikos, C. Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magnetic Resonance in Medicine. 2009, 62:6, 1609–1618.
CHAPTER 21

A HYPOTHETICAL STUDY IN BIOMEDICAL BASED ARTIFICIAL INTELLIGENCE SYSTEMS USING MACHINE LEARNING (ML) RUDIMENTS

D. RENUKA DEVI* and S. SASIKALA

Department of Computer Science, IDE, University of Madras, Chennai 600005, Tamil Nadu, India

*Corresponding author. E-mail: renukadevi.research@gmail.com

ABSTRACT

Artificial intelligence (AI) is the recreation of the human intelligence mechanism by machines and specific intelligence systems. These advances include learning, perceptive consideration, and self-rectification. Some practices of AI include expert systems, speech recognition, machine vision, and learning. AI is advancing dramatically, and it is already transforming our world socially, economically, and politically. There has been a paradigm shift in the way patients are clinically treated by doctors, with AI assisting in handling excessive amounts of data. With the degree and dimensions of data rising at a confounding pace, conventional diagnostic approaches have been modernized and there has been a change in clinical decision-making techniques. The promising technological developments in AI help to tackle critical health issues with quicker benefits, even in the shortcoming of the issue. On the same regard, machine learning (ML) is also trying to oversee the applications and possibilities of medical theories with assistive symptomatic services, which will drastically improve the accessibility and the accuracy of medical services for the common man and mankind. In this
chapter, we explore the fundamentals and applications of ML in the biomedical domain. We also discuss the research developments and challenges.

21.1 INTRODUCTION

Artificial intelligence (AI) was framed by John McCarthy, an American computer researcher, in 1956 at the Dartmouth Conference, where the notion was born. Currently, AI is an umbrella terminology that comprehends everything from robotic process automation to medical progress. AI can accomplish errands such as identifying patterns and behavior in data with more precision and efficiency than humans, enabling industries to gain more insight out of their data. With the assistance of AI, enormous amounts of data can be analyzed and processed to map poverty, track climate change, and automate agricultural production and irrigation, along with tailoring health care and learning to predict consumption patterns. Creating a mutual AI milieu, in which human and machine work impeccably together, takes more than a smart machine. Organizing smart systems in ways we humans find normal and instinctual is both a science and an art form. Streamlining energy usage and waste management and learning behavioral patterns are among the greatest outcomes of AI in contemporary technology. AI is gradually changing medical practice. With modern advancements in digitized data procurement, machine learning (ML), and computing infrastructure, the applications of AI are intensifying into areas that were previously thought to be only the province of human expertise. In recent years, there has been an enlarged focus on the practical use of AI in various domains to resolve multifaceted issues. AI is an anthology of numerous technical expertise that impersonates human cognitive functions. Furthermore, AI is having a greater impact on health care and allied sectors by exploiting ML algorithms (genetic, fuzzy, expert systems, and so on).

It is potentially possible for AI systems to handle both structured and unstructured databases, as health-care data contains not only formatted inputs but also images, videos, mails, and unformatted data. The technical approach and the technology are extensively used in all varied kinds of health-oriented facets, with considerable application to investigate health care data in cardiology, neurology, and oncology. Nurses and medical practitioners have already been using this assistive technology in a great way to achieve quick and precise treatment of patients. An anthology and investigation of medical records, history, a variety of tests, scans, cardiology, and radiology reports is
being efficiently managed by AI and digital automation. One of the best examples to portray this would be the Babylon app, which uses AI to provide medical consultations and recommend precautionary action using the individual medical record with an elevated medical knowledge. Molly is another classic example of what we have extracted from AI: an ML-based digital nurse that monitors the patient's health condition and keeps track of the follow-ups. This is a remarkable payoff that can resolve a million health monitoring issues. The conception of drugs is another milestone in the field of AI, which saves time, money, and sometimes life too. During the Ebola virus scare, a program powered by AI was used to dissect existing medicines to fight the disease and finally found two medications that may lessen Ebola effectively within a calendar day.

Digital image processing (DIP), a branch of AI applications, has been in use for image analysis and processing. DIP has been an evolving area of study for researchers and academicians over the past years. This technique is intended for deep analysis of images from different sources, namely scan images, MRI, CT, and so on. The potential of these techniques is to categorize the features in the image, classify whether the patient is affected by disease or not, and even present the level of risk factor.

ML techniques that were developed for retinal image analysis have exclusive benefits for patients affected by diabetic retinopathy. Some of the features in the retina, like exudates and microaneurysms, are extracted using AI and ML. The convolutional neural network (CNN)-based AlexNet with deep neural networks (DNNs) (Mansour, 2018) is employed for optimal diabetic retinopathy detection, a computer-aided diagnostic solution which serves as a key contribution to medical history on retinal issues. Google DeepMind has allied on a project with Moorfields Eye Hospital, London, to diagnose the causes of diabetic retinopathy and age-related macular degeneration.

Beritelli et al. (2018) recommended a system based on neural networks (NN) for heartbeat analysis (HBA). HBA is usually done for the patient to uncover heart diseases. The phonocardiogram records the heart sounds. The NN is trained by the known data samples, over three thousand in number, followed by feature extraction with the Gram polynomial method.

The Annals of Oncology recently published statistics showing that AI was able to identify skin cancer patients more precisely than doctors. The ML algorithms were applied to the images of patients. The results demonstrate the efficacy of these algorithms in finding the affected patients better than human doctors. Billah et al. (2018) recommended an improved feature extraction technique which implemented the polyp
detection method for the polyp miss rate, thus aiding doctors to pay attention and focus on a specific region. A deep learning NN framework is implemented to address the critical issues of neonatal care. Harpreet et al. (2017) proposed a cloud-based integrated Neonatal Intensive Care Unit data analytics framework. This model integrates and tracks the complete assessment sheet of preterm babies.

One of the significant areas where AI is successfully moving to transform is the impact of medicines and providing assistance to pharmacology. It helps in discovering new combinations of drugs. It may become an assistive technology that will empower medical researchers and practitioners to provide better treatments to serve their patients having some critical diseases. According to an Accenture report, the AI market growth is massive in 2021, ten times that of previous years.

21.2 MACHINE LEARNING (ML)

AI has its impact in the area of ML, where algorithms are developed to learn the similarities in data and develop decision rules from them. Data mining problems have embedded ML algorithms, constantly combined with statistical methods, to extract knowledge from the given data. The foundation of all the algorithms is a statistically based mathematical model, contributing to various fields: NN, deep learning (DL), support vector machine (SVM), decision tree, naïve Bayes, random forest, etc. These models are used for analytics and decision-making processes. The conceptual model is shown in Figure 21.1.

FIGURE 21.1 Biomedical intelligence.
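The train-and-predict workflow behind such a conceptual model can be sketched concretely. The following is a hypothetical, minimal Python illustration using a nearest-centroid rule; the two-feature samples and the "detected"/"not detected" labels are entirely made up for the example.

```python
# Minimal sketch of an ML model's train -> predict loop, using a
# nearest-centroid rule on made-up 2-feature disease samples.

def train(samples):
    """Average the feature vectors of each class into a centroid."""
    centroids = {}
    for features, label in samples:
        sums, count = centroids.setdefault(label, ([0.0] * len(features), 0))
        centroids[label] = ([s + f for s, f in zip(sums, features)], count + 1)
    return {label: [s / count for s in sums]
            for label, (sums, count) in centroids.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared distance)."""
    return min(centroids,
               key=lambda lbl: sum((f - c) ** 2
                                   for f, c in zip(features, centroids[lbl])))

training_set = [                       # hypothetical labelled data
    ([0.9, 0.8], "detected"), ([0.8, 0.9], "detected"),
    ([0.1, 0.2], "not detected"), ([0.2, 0.1], "not detected"),
]
model = train(training_set)
decision = predict(model, [0.85, 0.75])   # a new, unseen sample
```

The `train` step corresponds to learning from labelled inputs, and `predict` to the decision-making step applied to new patient data.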


ML also finds its way into numerous health care domains, including diabetes, cancer, cardiology, mental health, and so on. In research, most of the established ML models and tools have explored the potential of prognosis, diagnosis, or differentiation of clinical groups (a pathology group and a healthy control group, or groups with pathologies), thus signifying capacity towards building computer-based decision support tools. These systems require adequately large datasets and appropriate labels constructed with enough participants and variable inputs provided by clinical experts. The idea is to identify the data structures or variables, such as clinical, behavioral, and demographic variables, that can be associated with the target outcome, to say whether a person has cancer or not. Hence, useful knowledge can be derived from the accessible medical data after applying ML techniques. This can empower patients to constantly monitor their health status, and it also supports health-care professionals in decision making in areas concerning the management, treatment, and follow-up interventions if required.

21.2.1 ML MODELS

An ML model is a complete abstraction of the entire system under study. The objective of building a model is to predict the outcome based on the learned inputs or to completely optimize the system with the intended objectives. For instance, we build a model for a disease detection system, then train it and classify inputs into two possibilities: detected or not. This completely involves training, learning, and decision making. This is depicted in Figure 21.2.

FIGURE 21.2 ML model.

Until now, the development in the field of modeling originates from a human perspective. The model reflects upon the knowledge and understanding from this sort of viewpoint. The neurons, molecules, and the immune system are being conceptualized this way. In addition to this type of modeling, it can also be extended using computational methods to elaborate data and method. Learning is classified


474 Handbook of Artificial Intelligence in Biomedical Engineering

into supervised and unsupervised. it was done manually, it required


Supervised leaning establishes the more participation from the side of
relationship between the samples of known output. In unsupervised learning, the output variable is unknown; the algorithm discovers patterns on its own and derives predictions. The majority of ML algorithms fall into one of the above-mentioned modes of learning and prediction.

21.2.2 USES OF ML: BIOMEDICAL RESEARCH

In the field of biomedicine, many questions arise that can be efficiently handled by ML techniques. In certain cases, ML is useful for predictions, answering questions related to drug discovery, such as whether a compound is competent to cure a cancer or not. In other cases, it is used to develop a system that overcomes the shortcomings of a manual system. In other instances, the understanding of a system may be enhanced by ML through the revelation of variables that are shared between the system components.

The interpretation of medical images using ML has been an authoritative tool in the diagnosis and assessment of diseases. Legacy systems had discriminative features manually designed for the classification of abnormalities and lesions, along with the segmentation of regions of interest such as organs and tissues, for different medical applications. Such features were designed by expert physicians. Nevertheless, given the complexity and ambiguity in the images, which limit the knowledge available for the interpretation of medical images, automated ML systems are used in this domain.

21.2.2.1 MODEL PREDICTION

Basically, ML makes a prediction based on measurable, trained inputs. For instance, studies in psychiatric medicine have used smartphone recordings of a person's day-to-day acts, such as their wake-up time and the duration of their exercise, to predict their mood using ML. In neuroscience, decoding the brain's neural activity to infer intentions from brain measurements is a common problem. This application is useful for executing the movement of organs when developing interactive prosthetic devices, by taking measurements from the brain of a paralyzed subject. Many such problems exist in biomedical research, especially in areas such as cancer detection, preventive medicine, and medical diagnostics. Such problems call for prediction outputs with high precision. Machine learning methods are built to reach that objective, so to obtain accurate predictions it is best to rely on such methods.
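The supervised/unsupervised split described above can be sketched with two toy estimators; the data, function names, and clustering behaviour below are illustrative assumptions, not taken from any cited study.

```python
# Toy contrast between the two learning modes.
# Supervised: labeled pairs (x, y) let a 1-nearest-neighbour rule map new inputs
# to known outputs. Unsupervised: with no labels, a simple two-centroid step
# discovers the grouping on its own.

def nearest_neighbour_predict(samples, labels, x):
    """Supervised: predict the label of the closest training sample."""
    best = min(range(len(samples)), key=lambda i: abs(samples[i] - x))
    return labels[best]

def two_means_cluster(values, iters=10):
    """Unsupervised: split 1-D values into two groups around two centroids."""
    c1, c2 = min(values), max(values)  # initial centroids at the extremes
    for _ in range(iters):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Hypothetical tumour sizes (cm); stage labels exist only in the supervised case.
sizes = [0.5, 0.8, 1.0, 3.2, 3.8, 4.1]
stages = ["early", "early", "early", "late", "late", "late"]

print(nearest_neighbour_predict(sizes, stages, 0.7))  # -> early
print(two_means_cluster(sizes))  # -> ([0.5, 0.8, 1.0], [3.2, 3.8, 4.1])
```

The supervised rule needs the `stages` labels to predict; the clustering step recovers the same two groups from the sizes alone, which is why unsupervised methods are favoured for feature discovery when labeled output is unavailable.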
A Hypothetical Study in Biomedical Based Artificial Intelligence Systems 475

21.2.2.2 BENCHMARK

Other than describing and predicting data, another goal is to make the model completely understandable and precise. ML can be exceptionally advantageous by providing a benchmark. The challenge arises when we evaluate the model, as the model prediction is prone to errors. As ML is useful for making predictions, it may provide results close to human-produced models. Model evaluation and testing are done repeatedly to achieve a precise outcome, because the model has to be on par with the human-generated model with minimal variation. Conversely, if the model varies completely, then the basis of the model is heavily misguided.

21.3 MACHINE LEARNING TECHNOLOGIES

In this section, we explore the ML technologies (refer Figure 21.3) that are useful in medical applications. In general, they are categorized into (Jiang et al., 2017):

• conventional ML
• deep learning (DL)
• natural language processing (NLP).

21.3.1 CONVENTIONAL ML

ML algorithms are built on top of analyzing data. Generally, these kinds of algorithms tend to infer meaningful insights from medical datasets. The ML technologies are applied to

FIGURE 21.3 Machine learning technologies (source: https://www.ibm.com/analytics/machine-learning).
476 Handbook of Artificial Intelligence in Biomedical Engineering

the patient's database to infer the desired output, that is, the condition of health status. The patient's database comprises basic elements of information like name, age, disease, and lab report parameters relevant to the disease. It is a combination of transactional and image data, such as scanned images, CT scans, MRIs, and medication facts, collectively.

Furthermore, the outcomes are researched for the level of the disease; this is the output parameter (Y). For instance, in tumor disease prediction, the Y parameter is the size or stage of the disease based on the input parameters (X) of the patient. ML algorithms can be divided into two major categories based on whether the outcomes are incorporated or not, namely:

• Supervised
• Unsupervised

The unsupervised algorithms are generally used for feature extraction, whereas a supervised algorithm establishes the relationship between input (X) and output (Y) by predicting the output from the input. A semisupervised algorithm combines both supervised and unsupervised approaches.

21.3.2 AN EMERGING LEAP IN ML: DEEP LEARNING (DL)

With the leaps and bounds of emerging technology, ML can be extended to DL, which abstracts the data model by deploying a number of deep layers (deep neural networks, DNNs), thus making the prediction accurate and concise even if the data are high dimensional. Here, the process is formulated in two steps:

• Multiple layers process the data. DL extracts the data abstraction through the multilayer learning process; thus, a huge amount of data is processed to extract meaningful insights. As the layers are deep and consecutive, the learning of abstractions proceeds layer by layer in a hierarchical mechanism: the output of each layer is given as the input to the next layer.
• The final output data representation developed by DL algorithms provides constructive information. It is indeed a simpler model working resourcefully on complex data sets. DL also interprets varied types of data: text, image, audio, and video. This system is further extended to derive relational and semantic knowledge from raw data.

The foundations of DNN layers are established on artificial neural networks, which contain multiple deep layers. Having this as the nature

of the mechanism, DL can discover more complex nonlinear patterns in the given data. The surge in the volume and complexity of data is one more reason for the mounting call for deep learning. The efficacy of these layers is found in handling complex medical data, in contrast to a CNN. Big medical databases in reality create a challenge in managing and analyzing complex data. Many research efforts are intended for medical big data analytics.

21.3.3 NATURAL LANGUAGE PROCESSING (NLP)

The images, the EP, and the genetic data are machine understandable, and thus ML algorithms can be directly employed after preprocessing or quality-control processing. But when it comes to clinical information, a large proportion of the information is in unformatted form: running text, doctors' notes, lab reports, and discharge summaries. To handle this type of unstructured data, NLP replaces the conventional ML approaches. NLP processes any form of unstructured data to extract meaningful information.

NLP encompasses two major parts:

• text processing
• classification

21.3.3.1 TEXT PROCESSING

In this step, NLP uncovers the series of disease-relevant keywords from the unstructured clinical notes. The keywords are identified with reference to the historical databases.

21.3.3.2 CLASSIFICATION

All the keywords identified in the text processing stage are grouped into a number of different subsets. Then these subsets are analyzed and classified into two possible outcomes: normal or abnormal. These decisions aid the treatment methodology, monitor health progress, and extend clinical support in the future.

Fiszman et al. (2000) showed that an NLP-based system provided an antibiotic-assistance system for an anti-infective therapy alert. Miller et al. (2017) used an NLP laboratory monitoring system for lab adverse-effect prediction. NLP pipelines also help in disease diagnosis. Castro et al. (2017) developed an NLP-based system for the identification of 14 cerebral aneurysms from unstructured clinical notes. The inferences from this approach collectively developed key variables for the classification of normal and affected patients, with accuracies of 95 and 86 percent on training and validation of the model, respectively. Afzal et al.

(2017) suggested an NLP keyword extraction system for arterial disease from the narrative clinical notes. The inferred keywords classify patients as having arterial disease or not, with an accuracy of 90%.

21.4 ADVANCED INTELLIGENCE WITH BIG DATA ANALYTICS

The analysis of health care big data with ML extends the ascendancy of data science. The emerging health records, growing in bytes per second, lead to innovation in handling and processing these data. The electronic health records (EHR) are expected to exceed a trillion in number by the end of this year. This poses diverse challenges, such as parallel processing infrastructure and frameworks, storage management of complex data, and highly developed fault-tolerant systems. Cloud-technology-based big data frameworks are well suited for complex health care management systems (Ngiam et al., 2019). The incorporated health care system possibly caters to the needs of all stakeholders of the system. The AI-based smart hospital system is one such modern development to monitor the patient remotely and give personalized medication.

The fundamental properties of health care big data (Hassan et al., 2019) are tabulated in Table 21.1.

TABLE 21.1 Big Data Characteristics

Characteristics  Explanation
Variety    This is the huge collection of data sets of different kinds, for example, image, text, reports, EHR, and so on. The sources of data are wearables, sensors, surveys, clinical notes, socioeconomic data, psychosocial data, etc.
Volume    The huge accumulation of data creates datasets ranging from millions to trillions of KB in size.
Velocity    The rate at which the data are collected and analyzed. Real-time analytics requires the data analytics tools to process the data as soon as they arrive.
Veracity    As health care decisions are significant steps in data mining, the authenticity of the data is identified by analyzing different metrics such as originality, origin, and the data collection methods involved.
Value    The objective of big data analytics is to derive meaningful insights and decision making. Interpretation of medical data and deriving decisions are very crucial for diagnosis.
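As a rough illustration of how the characteristics in Table 21.1 might be checked programmatically, the sketch below audits one hypothetical batch of incoming records; the record fields, source names, and trusted-source whitelist are invented for illustration.

```python
# Minimal audit of a record batch against three of the "Vs" in Table 21.1:
# variety (mix of data kinds), volume (accumulated size), and veracity
# (share of records from trusted origins). All fields are hypothetical.
from collections import Counter

TRUSTED_SOURCES = {"wearable", "clinical_note", "lab_report"}  # assumed whitelist

def audit_batch(records):
    """Summarise variety, volume, and veracity for one batch of records."""
    variety = Counter(r["kind"] for r in records)      # data-type mix
    volume_kb = sum(r["size_kb"] for r in records)     # accumulated size
    trusted = [r for r in records if r["source"] in TRUSTED_SOURCES]
    veracity = len(trusted) / len(records)             # trusted-origin share
    return {"variety": dict(variety), "volume_kb": volume_kb, "veracity": veracity}

batch = [
    {"kind": "image", "size_kb": 512, "source": "lab_report"},
    {"kind": "text",  "size_kb": 4,   "source": "clinical_note"},
    {"kind": "text",  "size_kb": 2,   "source": "social_media"},  # untrusted origin
    {"kind": "ehr",   "size_kb": 16,  "source": "wearable"},
]

print(audit_batch(batch))
# -> {'variety': {'image': 1, 'text': 2, 'ehr': 1}, 'volume_kb': 534, 'veracity': 0.75}
```

Velocity would enter such a sketch as the rate at which batches arrive, which is a property of the ingestion pipeline rather than of any single batch.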

The different types of analytics (Abidi et al., 2019) that are applied to health care data are:

• Decision based
§ Analytics-based decisions aid the practitioners in improved medication. The ML decision models are used to generate decision rules, personalized medications, care plans, risk alerts, and management.
• Predictive based
§ This kind of analytics processes the entire medical records and predicts the outcome. Based on the analysis of past history, a prediction is made for the near future.
• Prescriptive based
§ This is intensive analytics based on complex decision rules. It thus requires deep knowledge and constraints to analyze big datasets. This is still in the thriving stage as a research domain.
• Comparative based
§ Comparative analytics assimilates two or more input methods to derive the conclusion/outcome. The model is generally based on probabilistic methods. In this scenario, both clinical and administrative databases are compared and analyzed for enhanced decisions.
• Semantics based
§ This methodology identifies the semantic correlation between the input variables.

Thus, the combination of AI, big data, IoT, and ML is ultimately changing the perspective of conventional analytics to combat the issues of scalability, heterogeneity of data, real-time analytics, single unit/patient-level analytics, and complex methodology. Even cloud-based systems cannot sustain low latency when the data are enormous. This issue is handled by the fog computing infrastructure (Anawar et al., 2018), which provides virtualized fog nodes to which IoT devices are connected. The fog infrastructure provides low latency with a rapid response. To develop a complex health care system, fog computing is always considered one of the best options for real-time streaming, decision making, and high response time.

21.5 APPLICATIONS

In this section, we explore the applications and proficient ML algorithms in the different facets of the biomedical domain. Generally, we uncover the applications that are constructive in

accumulation; extracting appropriate elements from the huge repository of data (both text and image); reliable treatment, as this technique analyses billions of pieces of information very quickly and precisely diagnoses with minimal errors; precise system development; reduced cost of treatment; reduced time to diagnose; achieving objectivity; keeping abreast of technology; simple gadgets and self-monitoring or self-assisted guides; in-home monitoring; embryo selection for IVF; genome interpretation for sick newborns; voice medical coaches; potassium blood level prediction; mental health analytics; paramedic diagnosis; promoting patient safety; and death prediction in hospital.

21.5.1 NEUROSCIENCE

Neuroscience is the analysis of the human nervous system. In neuroscience, decoding and encoding are major studies of brain activity, which is depicted in Figure 21.4. Recent research interests are fascinated towards cognitive and behavioral neuroscience (Jean-Rémi King et al., 2018).

21.5.1.1 DECODING

The decoding mechanism aims to identify human intentions based on brain activity; for instance, to analyze the activity in the brain and predict the intended movement when subjects move an exoskeleton with their thoughts. The standard mechanism uses a Wiener filter, where all the brain signals are linearly combined for prediction. However, the advent of many ML approaches provides different mechanisms for handling the same task, which give enhanced performance over the conventional techniques. The ML techniques of linear Wiener filter, nonlinear extension

FIGURE 21.4 Encoding and decoding mechanism (Reprinted from Wen, Haiguang, et al. Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex. 2017, 28(12), 4136-4160. https://arxiv.org/ftp/arxiv/papers/1608/1608.03425.pdf)

(Wiener cascade), Kalman filter, nonlinear SVM, extreme gradient-boosted trees, and NN have been used for the decoding mechanism.

21.5.1.2 ENCODING

Neural encoding (tuning curve analysis) relates to the study of information representation in a neuron or a brain region, to recognize how they transmit sensory stimuli. Such a study will lead to deriving insights. The ML model for encoding plots the signal as a function of external visual stimuli or brain movements. This simple model makes the decoding mechanism proficient.

21.5.2 DIABETIC EYE RETINOPATHY

Diabetic retinopathy (DR) is an increasingly common consequence of diabetes, and a preventable cause of blindness if treated on time. It is normally found by examining a retinal scan by a trained doctor. If the early stage of this ailment is found, progressive treatment can be given to the patient at the precise time and the issue effectively handled, thus avoiding an irreversible blindness condition. Early recognition of this state is significant for better diagnosis. So, the real challenge is predicting the syndrome and acting accordingly. This syndrome is prevalent in the working-age group of adults. ML plays a vital role in diabetic retinopathy diagnosis, where CNNs are used for diabetic retinopathy staging. The color fundus images are processed by a CNN, and the outcome is the prediction of the syndrome in a more accurate way.

A fully automated system for identifying the syndrome thus provides an opportunity to prevent vision loss in our population globally. Many research works are advancing towards combating this issue by providing leading ML technology; one such is CNN algorithms. The reason behind using this architecture is its efficacy in processing huge images, even when learning is made from the raw pixels. This model is not simply for diabetic retinopathy; it is also used to diagnose other diseases as well.

The Google-Net project developed by Google evaluates the strengths and limitations of the CNN architecture. The 22-layer architecture, with higher accuracy, processes the given image with heterogeneously sized spatial filters in combination with low-dimensional embeddings. The deeper layers learn the system precisely by analyzing the deep features across multiple layers. Every layer is responsible for exploring certain features, i.e., the first layer identifies the edges, the second interprets exudates, a classification of

the features present. The activation function is applied on the top layer, which maps the input and output variables. This is followed by the normalization procedure of each convolution layer; this is carried out as batch normalization when the features are elevated.

The scanned images are mixed with both macroscopic and subtle features. Many research methodologies were developed for identifying the major features; however, the subtle features are crucial for diagnosis. The architectures developed and tested on the ImageNet dataset only exploited the macroscopic features. This leads to a new paradigm of model that is capable of identifying even subtle features. The two-stage CNN model uses a pipeline of feature localization followed by classification. Preprocessing is done to eliminate the nonrelevant features, and network weights are adjusted to deal with the class imbalance. So this model screens and identifies everything from mild disease up to multigrade disease detection.

Google developed a brain project (refer Figure 21.5), based on a DL algorithm that can inspect huge numbers of fundus images and automatically discover DR and diabetic macular edema (DME) with an elevated accuracy. The system was tested with two batches of images (11,711), and produced a sensitivity of 96.1% and 97.5% for diabetic retinopathy and a specificity of 93.1% for DME. It is the greatest milestone in Google's research accomplishment to produce high sensitivity and specificity, with a minimal elimination of diseased patients.

FIGURE 21.5 (A) Healthy retinal fundus image on the left. (B) On the right, red spots signify the affected retina due to DR (source: https://ai.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html).

21.5.3 PROSTATE CANCER

Prostate cancer is likely uncommon and non-aggressive in nature. However, the identification of this type of cancer poses a challenge in treating the patient, by either surgical method or radiation therapy. So, technology acts as a key factor in the measurement of the risk factor. In Gleason grading (refer Figure 21.6), the risk stratification parameter is identified, which leads to further diagnosis of how closely the cancer cells resemble normal ones under microscopic study. This conventional method has a major importance in clinical diagnosis, but it is a very complex and subjective technique. This is evident from the studies and reports of interpathologist disagreements.
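The interpathologist disagreement mentioned above is commonly quantified with an agreement statistic such as Cohen's kappa, which corrects raw agreement for chance. The minimal sketch below computes it for two hypothetical pathologists' grade-group assignments; the slide grades are invented for illustration.

```python
# Cohen's kappa: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
# agreement between two raters and p_e is the agreement expected by chance
# from each rater's marginal label frequencies.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical Gleason grade groups assigned by two pathologists to 8 slides.
pathologist_1 = [1, 2, 2, 3, 4, 5, 3, 2]
pathologist_2 = [1, 2, 3, 3, 4, 5, 2, 2]

print(round(cohens_kappa(pathologist_1, pathologist_2), 3))  # -> 0.667
```

Here the two raters agree on 6 of 8 slides (p_o = 0.75) while chance alone would give p_e = 0.25, so kappa is 0.667 — substantial but imperfect agreement, which is the kind of gap an automated grading system is meant to narrow.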

Besides, the trained pathologists are minimal in number when compared to the global call for prostate cancer treatment across the globe. From the latest guidelines, the pathologists' reports deviate by a small percentage in identifying the different Gleason patterns. This calls for a parallel or supporting technology for precise identification, treatment, and clinical management. In general, these issues recommend an improved DL method, very much similar to Google's metastatic breast cancer detection.

DLS (Nagpal et al., 2019)-based Gleason scoring of prostate cancer possibly will improve the accuracy and neutrality of Gleason grading of prostate cancer in prostatectomy specimens. The proposed model categorizes the parts of the slide into Gleason patterns and identifies how closely the tumor region resembles the normal cell. It produces and suggests two grades based on Gleason patterns; the higher the grade, the greater the risk factor, and thus the patient needs utmost care in the treatment. Hence, the DLS competently combats this issue.

21.5.4 METASTATIC BREAST CANCER

The microscopic study done by the pathologist is generally considered as the common and effective procedure for cancer diagnosis and treatment. The challenging factor is in analyzing the cancer spread from the affected region to the nearby lymph nodes. Thus, identification of the metastasized part is critical. In TNM cancer staging, the identification of nodal metastasis is considered.

FIGURE 21.6 Gleason pattern (source: https://training.seer.cancer.gov/prostate/abstract-code-stage/morphology.html).
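As a sketch of how the Gleason patterns in Figure 21.6 translate into a single risk grade, the function below maps primary/secondary patterns to the commonly used ISUP-style grade groups. This is an illustrative simplification of the usual published scheme, not a clinical tool, and should be checked against current guidelines before any real use.

```python
# The Gleason score is the sum of the primary and secondary pattern (each 3-5).
# The ISUP-style grade group compresses it to 1-5; higher group = higher risk.

def grade_group(primary, secondary):
    score = primary + secondary
    if score <= 6:
        return 1                         # well differentiated, lowest risk
    if score == 7:
        return 2 if primary == 3 else 3  # 3+4 carries lower risk than 4+3
    if score == 8:
        return 4
    return 5                             # scores 9-10, highest risk

print(grade_group(3, 3))  # -> 1
print(grade_group(3, 4))  # -> 2
print(grade_group(4, 3))  # -> 3
print(grade_group(4, 5))  # -> 5
```

The 3+4 versus 4+3 distinction is the reason the DLS described above reports the pattern pair rather than only the summed score: the same score of 7 maps to two different risk groups depending on which pattern dominates.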

In nodal metastasis, the cancer cells from the core region where they are created tend to spread across other parts of the body through the lymph system. The metastasis process majorly influences breast cancer detection and follow-up treatment such as radiation, chemotherapy, and surgical methods. The timeline for identifying nodal metastases imparts greater significance.

The DL-based lymph node assistant (LYNA) provides improved accuracy with the gigapixel slides of lymph nodes from patients. This issue was addressed by researchers to develop a well-optimized algorithm for cancer detection. As these developed methodologies are tested on real-time data, the model has to be trained effectively for a random number of test cases and samples before applying the actual data. This demonstrates the strengths and weaknesses of any proposed system. The system should be evaluated by the pathologists to infer whether it is beneficial or not.

The research developments in the breast cancer detection LYNA algorithm efficiently support breast cancer staging and the assessment of its impact in diagnostic pathology. Still, these methods have limitations when it comes to larger datasets involving multiple slides. This raises a future challenge for DL algorithms.

FIGURE 21.7 LYNA. (Source: https://ai.googleblog.com/2018/10/applying-deep-learning-to-metastatic.html)

In Figure 21.7, on the left side is the view of a lymph node; on the right side is the image where LYNA identifies the tumor region in red, and the blue-colored areas are the non-tumor region.

21.5.5 ELECTRONIC HEALTH RECORD (EHR)

EHR maintains the complete history of the patient from entry into the hospital until discharge, which is presented in Figure 21.8. The EHR is the digitized collection of patient information; it not only contains the medical information but also the complete treatment plans (Islam et al., 2018) and history, and acts as a tool for decision making. Many parameters are taken into consideration and predictions made: the date of the patient's discharge from hospital, the recovery time, readmission probability, potential treatment methods, and suggestions. ML techniques aid doctors to accomplish the aforementioned objectives for clinical health

care management system. Such a system has the following properties:

• Scalability: Any digitized system should be competent in handling data from minimum to maximum size. As data escalate every second, a huge amount of information is being accumulated, which should be handled and managed efficiently. The medical records are voluminous in size, as they are a blend of both text and images.
• Accuracy: This is the important factor for any proposed system. The prediction accuracy affects the decision making; a wrong prediction leads to improper diagnosis and treatment. Much significance is given to gauging how well the system performs in terms of accuracy metrics.

The DL-based EHR system lessens the burden of the manual system. Generally, much effort is given to collecting, cleaning, and analyzing the clinical data. This system takes up the raw data (raw images in their original form), followed by preprocessing, and then finally manipulates the relevant records into meaningful insights. The complex EHR data are efficiently handled by the DL system with enhanced accuracy. The decision-making system significantly affects the outcome of the patient data; thus, highly developed algorithms are desirable and tested by an intensified procedure.

As mentioned for the ML model, we do not have to manually select the input parameters. Instead, it establishes the correlation between the input and output parameters by learning the test data. It learns from newest to oldest data, then predicts the outcome. Moreover, this process involves huge databases; in such cases, recurrent neural networks and

FIGURE 21.8 Electronic health record.

feedforward networks are suggested for decision making.

21.5.6 AUGMENTED REALITY MICROSCOPE FOR CANCER DETECTION

Virtual reality and augmented reality (AR) have evolved as among the promising technologies for future medical developments. Many research works are inclined towards this area of expertise. In particular, radiological images are important in analyzing the follow-up surgical procedure. Izard et al. (2019) proposed a study based on the AR technique. The proposed system enhanced the visualization of radiological images from a 2D to a 3D model. This makes the visualization procedure effective, where multidimensional analysis is possible. The images under study are segmented by computer vision and the AR technique, providing effective analysis from all dimensions.

The recent development, the augmented reality microscope (ARM) (refer Figure 21.9), can possibly help the pathologists in assisting with and exploiting the DL-based technology. This technology has been more significant in clinical analysis, highly appreciated by pathologists around the globe. The inbuilt light microscope enables real-time image analysis and provides the results after applying the ML technique to it. The lightweight property is essential, because it can be mounted on the existing microscopes available in hospitals. The low-cost, readily available components make this system accessible and utilizable across the country.

Modern computational DL built upon TensorFlow will permit broader technologies to run on this platform. In contrast to the conventional analog microscope, the

FIGURE 21.9 ARM (Source: https://ai.googleblog.com/2018/04/an-augmented-reality-microscope.html)

digitized projection superimposes on the raw image to give a better understanding in quantifying features of interest. Significantly, the system model updates quickly, at 10 frames/second, as the analyst moves the slide. The robust technology makes the analysis in a very quick time and with enhanced prediction accuracy.

21.5.7 DRUG INVENTION

Drug discovery is the process of identifying new medications after long-run research. Many years have been invested in combing possible methods and testing for new discoveries. With the growing population across the globe, new diseases are also emerging, leading to the toughest challenges in treating the same. This identification of compounds involves testing millions of drug-like compounds on the verge of finding new combinations. So, sophisticated tools and technologies are involved in this process.

In recent times, NN (Chen et al., 2018) has been applied in drug identification and screening. Virtual drug screening provides testing for all possible combinations, and the computational process is done at high speed with a reduced time factor. The virtual screening is done on the samples of the test experiment under study. The potential increase in data under screening continues to grow, which can be handled with multitask neural networks. The enhanced ML methods in virtual drug screening increase the impact on this entire process of drug discovery.

21.5.8 SLEEP STAGING

Sleep is imperative for the good health of a human being. Sleep staging (refer Figure 21.10) is vital for evaluating sleep disorders. In most sleep studies, contact sensors are used, which may affect natural sleep and provide biased results. Availability of sleep study research is limited, thus elevating the complexity and demand. A novel approach is required for rapid eye movement (REM), non-REM, and wake staging (macro-sleep stages, MSS) estimation based on sleep sound analysis (Dafna et al., 2018). The different stages of sleep are categorized into:

• awake
• rapid eye movement
• non-rapid eye movement

These cycles of different stages occur about every ninety minutes. However, these cyclical changes get affected due to modern lifestyle, stressful environments, and poor eating habits, which can lead to cardiovascular and cerebrovascular diseases. Assessment and assistive technology are considered essential to monitor sleep.

The conventional polysomnography (PSG) method measures the spectral analysis of EEG signals. The disadvantage of this method is

its bulkiness in size, and it presents some technical challenges. Thus, collecting the signal recording during sleep is cumbersome. To overcome this method, many researchers have suggested various other parameters like heart rate, inhalation rate, and the electrocardiogram (ECG). Alternatively, ECG signals are easier to record and diagnose. Recent advancements in DNN architecture are used in sleep analysis, which classifies the sleep stages into wake (W), REM, and non-REM. The DNN algorithms competently categorize the sleeping stages and ascertain if any abnormalities are present. In comparison with the conventional PSG method, this approach is considered efficient in terms of reduced physiological parameters and reduced factors that affect the sleeping state. This helps to better understand the measurement of usual sleep. This study has great projection in the advancement of monitoring for sleep disorders and respiratory diseases.

21.5.9 WEARABLE TECH (IOT)

In this technically fast and modern era, the factor of time is crucial in all aspects. Automation of devices and interconnection is another milestone in digital technological development. Thus, IoT, ML, and AI are fabricated into small gadgets with a simple interface, likely to enhance life eminence and expectancy. This is represented in Figure 21.11. Wearables play a vital role nowadays in tracking and monitoring health data in an uncomplicated and user-friendly manner. Let us explore how these technologies use deep analytics and make predictions. Wearables range from arm bands, watches, smart clothing, shoes, smart glasses, and so on. Generally, they are used to track pulse rate, blood pressure, and temperature. The recordings and readings from the gadgets are remotely monitored, and the real-time data are further analyzed by the AI system. Many hospitals incorporated this tracking method after a patient is discharged from the hospital and successfully reduced the readmission rate and emergency visits to the hospital.

From the report of the WHO, it is evident that sixty percent of health-related problems are directly related to the individual life cycle. If every individual is keen on tracking health measures regularly, life expectancy will increase in a greater way. AI-aided wearables enable monitoring vital signs, setting reminders for medications, and tracking other health parameters.

21.5.10 CROWDSOURCING—AI

Crowdsourcing is an up-and-coming area of expertise in research. AI researchers are nowadays inclined to use crowd opinion to build advanced prediction ML

FIGURE 21.10 Sleep stages.


(Source: https://www.tuck.com/stages/#how_your_sleep_cycle_changes_with_age)

FIGURE 21.11 Wearable technology.


(Source: https://mobisoftinfotech.com/resources/blog/wearable-technology-in-healthcare/)

model based on informed decisions. The crowdsourced-AI tools collaboratively gather information via a common platform. The group members can analyze and share ideas and visualizations. The outcome of such a discussion converges on optimized decisions.

“Speech by Crowd” is one of the NLP-based platforms to enable the crowdsourcing approach. But the challenge is how to structure the discussions and extract the significant insights. Swarm-intelligence-based artificial intelligence systems are coupled to attempt these challenges.
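A minimal sketch of converging such crowd opinions is simple majority voting over per-item labels, a common aggregation baseline; the item IDs, annotators, and labels below are hypothetical.

```python
# Aggregate crowd annotations by majority vote: for each item, the label
# chosen by the most annotators wins. All data here are hypothetical.
from collections import Counter

def aggregate(votes_per_item):
    """Return the majority label for each item from the crowd's votes."""
    return {item: Counter(votes).most_common(1)[0][0]
            for item, votes in votes_per_item.items()}

crowd_votes = {
    "scan-001": ["abnormal", "abnormal", "normal"],
    "scan-002": ["normal", "normal", "normal"],
    "scan-003": ["abnormal", "normal", "abnormal"],
}

print(aggregate(crowd_votes))
# -> {'scan-001': 'abnormal', 'scan-002': 'normal', 'scan-003': 'abnormal'}
```

Majority voting treats every annotator as equally reliable; the swarm-intelligence systems mentioned above go further by weighting or iterating on the group's input rather than taking one static tally.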

21.6 RESEARCH CHALLENGES AND FUTURE DIRECTIONS

Given the scope and application of AI–ML-based systems in various domains of health care, this area of expertise is growing vibrantly on a large scale. The challenges include:
• Data management
Health care data management involves data collection, storage, processing, and administration. Data collection methodologies comprise different tools and techniques, for instance, surveys, medical records, community health care centers, lab reports, physician notes, and administrative records. A framework has to be strengthened for complex data handling and data management. The advanced health care systems are deployed on cloud and big data technologies.
• Reduced complexity with enhanced and simple usage
Data analytics tools are generally used for generating reports and visualizing data. To implement an AI-based health care system, the key challenge is a user-friendly interface; the complexity of the system implementation should be completely hidden from the user/stakeholder perspective. User-friendly apps and graphical interface systems are added advantages for predictive analytics.
• Constructive to the society
Every system has a flip side. AI systems are capable of complex decision making but, at the same time, might be a threat to humankind if used for vicious purposes. A newly developed system should be a benefaction to society.
• Government AI implementation strategy—reach to the common man
An index of artificial intelligence readiness compiled by Oxford along with the International Development Research Centre has thrown light on government policies for AI implementation strategy. Two basic queries were raised:
• Is the national government implementing AI-enabled public services?
• Is it beneficial to the community?
The index is measured by taking different metrics of AI systems, grouped into four significant categories: ascendancy, infrastructure, readiness, and services. Recent technologies provide an opportunity to enhance the conventional health care system; hence, governments should be prepared to fully employ this technology. Governments are instigated to
take up these challenges while avoiding the associated risks.
• Not a replacement for health experts, but an aiding tool
Indeed, the latest AI-enabled treatments enhance accuracy, but they do not replace doctors. AI is always considered an aiding tool, not a replacement.
• Reduced cost
This assistive technology is only made available to all when it is completely affordable. This factor is important if the technology is to be advantageous to society.

21.7 CONCLUSION

In this chapter, we have analyzed and interpreted various aspects of AI and machine learning (ML) enabled technologies, prodigies, and their independent applications. In the current research scenario, the developments in AI and ML across various domains, along with the applications of these technologies, are prominent in biomedical study and the health care industry, with proven working methods. A potential machine or deep learning system has the competence to adapt to the rapid growth in data accumulation; beyond textual data, it also embraces and examines image, genetic, and electrophysiology data. There is a need and demand to develop sophisticated algorithms and to train on the data rigorously before the system can assist in prediction and in suitable treatment suggestions and refinements. AI may become an assertive technology that will empower and tutor medical researchers, practitioners, and academicians to provide better study programs, along with prescribed treatments, to serve their patients having terminal illnesses and acute critical diseases. Technology is ever growing and changing with time, and with it grow countless challenges that have to be handled with advanced, specific, and core AI models. In the same regard, ML is also trying to oversee the applications and possibilities of medical theories with assertive symptomatic services, which will drastically improve the accessibility and the accuracy of medical services for the common man and mankind.

KEYWORDS

• artificial intelligence
• machine learning
• biomedical AI
• deep learning
• big data analytics

REFERENCES

Abidi, Syed Sibte Raza.; Samina Raza Abidi. Intelligent health data analytics: A convergence of artificial intelligence
and big data. Healthcare Management Forum. Sage CA: Los Angeles, CA: SAGE Publications. 2019.
Afzal, Naveed.; et al. Mining peripheral arterial disease cases from narrative clinical notes using natural language processing. Journal of Vascular Surgery. 2017, 65(6), 1753–1761.
Anawar, Muhammad Rizwan.; et al. Fog computing: An overview of big IoT data analytics. Wireless Communications and Mobile Computing. 2018.
Beritelli, F.; Capizzi, G.; Sciuto, G. L.; Napoli, C.; Scaglione, F. Automatic heart activity diagnosis based on gram polynomials and probabilistic neural networks. Biomedical Engineering Letters. 2018, 8(1), 77–85.
Billah, Mustain.; Sajjad Waheed. Gastrointestinal polyp detection in endoscopic images using an improved feature extraction method. Biomedical Engineering Letters. 2018, 8(1), 69–75.
Castro, VM.; Dligach, D.; Finan, S.; et al. Large-scale identification of patients with cerebral aneurysms using natural language processing. Neurology. 2017, 88, 164–168.
Chen, Hongming.; et al. The rise of deep learning in drug discovery. Drug Discovery Today. 2018, 23(6), 1241–1250.
Dafna, E.; Tarasiuk, A.; Zigel, Y. Sleep staging using nocturnal sound analysis. Scientific Reports. 2018, 8, 1.
Fiszman, M.; Chapman, WW.; Aronsky, D. Automatic detection of acute bacterial pneumonia from chest X-ray reports. Journal of the American Medical Informatics Association. 2000, 7, 593–604.
Hassan, Mohammed K.; et al. Big data challenges and opportunities in healthcare informatics and smart hospitals. In Security in Smart Cities: Models, Applications, and Challenges. Springer, Cham. 2019, 3–26.
Islam, Md.; et al. A systematic review on healthcare analytics: Application and theoretical perspective of data mining. Healthcare. Multidisciplinary Digital Publishing Institute. 2018, 6, 2.
Izard, Santiago González.; et al. Applications of virtual and augmented reality in biomedical imaging. Journal of Medical Systems. 2019, 43(4), 102.
Jean-Rémi King.; Laura Gwilliams.; Chris Holdgraf.; Jona Sassenhagen.; Alexandre Barachant. Encoding and decoding neuronal dynamics: Methodological framework to uncover the algorithms of cognition. 2018, hal-01848442.
Jiang, Fei.; Yong Jiang.; Hui Zhi.; Yi Dong.; Hao Li.; Sufeng Ma.; Yilong Wang.; Qiang Dong.; Haipeng Shen.; Yongjun Wang. Artificial intelligence in healthcare: Past, present and future. Stroke and Vascular Neurology. 2017, 2(4), 230–243.
Mansour, Romany F. Deep-learning-based automatic computer-aided diagnosis system for diabetic retinopathy. Biomedical Engineering Letters. 2018, 8(1), 41–57.
Miller, TP.; Li, Y.; Getz, KD. Using electronic medical record data to report laboratory adverse events. British Journal of Haematology. 2017, 177, 283–286.
Nagpal, Kunal.; et al. Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. npj Digital Medicine. 2019, 2(1), 48.
Ngiam, Kee Yuan.; Wei Khor. Big data and machine learning algorithms for healthcare delivery. The Lancet Oncology. 2019, 20(5), e262–e273.
Singh, Harpreet.; et al. iNICU-Integrated Neonatal Care Unit: Capturing neonatal journey in an intelligent data way. Journal of Medical Systems. 2017, 41(8), 132.
CHAPTER 22

NEURAL SOURCE CONNECTIVITY ESTIMATION USING PARTICLE FILTER AND GRANGER CAUSALITY METHODS

SANTHOSH KUMAR VEERAMALLA and T. V. K. HANUMANTHA RAO*

Department of Electronics and Communication Engineering, National Institute of Technology, Warangal, Telangana 506004, India

*Corresponding author. E-mail: tvkhrao75@nitw.ac.in

ABSTRACT

Connectivity is one of the major concerns in human brain mapping. It reflects the connections across different brain regions through the nervous system. Until now, the connectivity between electroencephalogram (EEG) signals has been calculated without taking volume conduction into consideration. Even though some methods show the flow across the scalp sources, they need a prior assumption about the active brain regions. In this chapter, we suggest a new strategy to identify brain sources with their corresponding locations and amplitudes based on a particle filter. Multivariate autoregressive time-series modeling is used to detect movement and time dependence among the brain sources. Finally, Granger causality techniques are applied to assess the directional causal flow across the sources. We provide a framework to test the analytical pipeline on real EEG data. The results indicate that the suggested strategy is useful for evaluating the directional connections between EEG neural sources.

22.1 INTRODUCTION

Connectivity is considered to be an anatomical (structural), functional
(symmetrical), or effective (asymmetric) interaction between cerebral subsystems (Bullmore and Sporns, 2009). Anatomical connectivity analyzes the physical composition of the brain at a given moment by evaluating the anatomical connections within it. Symmetric or functional connectivity is the evolving relationship between areas of the brain during the movement of neuronal data; it can be estimated either in the time domain or in the frequency domain from the connections between neurons. Effective or asymmetric connectivity examines the effects of one region of the brain on another; the aim is to detect which regions of the brain can drive other systems during the variation of neuronal data (Nunez and Srinivasan, 2006). The distinction between symmetric and asymmetric connectivities is that asymmetrical connectivity is depicted as a driver's relationship with a recipient, while symmetrical connectivity is defined as a relation between cerebral neurons (Haufe, 2011). Electroencephalogram (EEG) information has been used to discover directional flow between neural sources by effective connectivity techniques such as Granger causality (GC) models. Because of the complex relationship between cerebral areas and their functions, it is important to understand the causal connections between cerebral activation and the effective networks among cerebrum neurons (Blinowska, 1992; Haufe and Ewald, 2016; Haufe et al., 2013; Kaminski and Blinowska, 1991; Barnett and Seth, 2015; Liao et al., 2011).

Granger causality constitutes one of the designs that depend on efficient methods for network estimates (Ding et al., 2006). This technique is used with a linear prediction model, such as the multivariate autoregressive (MVAR) model. Directed transfer functions (DTFs) and partial directed coherence (PDC) constitute some of the measures available for this model (Barnett and Seth, 2013).

When examining connectivity between cerebral areas, the crucial issue is that associations between scalp EEG signals cannot match the connections between the hidden neural sources, because EEG signals do not deliver the median activity within the electrode region. This means that each sensor collects a linear superposition of signals from across the brain instead of measuring activity at only one brain site. This superposition of signals creates immediate associations in the data and can lead to the detection of spurious connectivity, because the locations of the channels cannot be taken as the physiological locations of the sources.

Dynamic sources should be found, and the network between them estimated, in the initial stage of effective connectivity assessment. The dipoles can be
Neural Source Connectivity Estimation 495

located using anatomical or physiological information or estimated using various dipole localization approaches. At present, there are two main fields of study in neural generator modeling. The first modeling technique involves the use of imaging models that illustrate the data through a dense group of current dipoles at fixed locations. The second technique is a parametric approach that uses a single equivalent current dipole to replace the dense sets of existing dipoles (Makeig et al., 2002). While imaging-based methods can provide a comprehensive map of neuronal behavior in the brain, a parametric method reduces the EEG readings directly to a small number of parameters (Mosher et al., 1992). With the parametric approach, equivalent current dipole models may be more intuitive in their interpretation, explaining the electrical activity of the brain and promoting it in emerging technologies such as brain–computer interface systems. The assessment of locations for the corresponding dipole sources in the 3D brain volume using EEG measurements collected from the scalp is a significant task of this parametric strategy. In EEG source localization, most of the previous work (Antelis and Minguez, 2009; Galka et al., 2004; Gordon et al., 1992; Mosher and Leahy, 1999; Mohseni et al., 2008; Sorrentino et al., 2007; Van Veen and Buckley, 1988) was based on two hypotheses: (1) the number of dipoles must be fixed and (2) the source locations are static in time. The estimation of the location and strength of the electrical current underlying EEG signals is in any case ill-posed (Miao et al., 2013), and the number of neural dipoles and their positions generally vary with time. In Sorrentino et al. (2009), at each step of the measured data, the number of neural dipoles and their positions are dynamically estimated and updated; the approach is based on random finite sets (RFSs) to handle the unknown number of dipoles. In this chapter, we used a particle filter (PF) for localization, estimating the positions and amplitudes of the current dipoles from EEG signals.

The communication between the sources should be shown in connectivity evaluations after retrieval of the dipoles with their locations. Connectivity is calculated using dipole communication modeling and model-order estimation. The MVAR procedure shows the connection between the sources in the brain, especially in terms of the effect of one variable on another.

To the best of the authors' knowledge, there is no systematic empirical study on neural source connectivity estimation relying on both PF and GC techniques. In this chapter, we suggest a new strategy to estimate neural source
space connectivity. We propose a framework in which the connectivity study is based on a two-step approach. The first step involves an estimation of the brain sources and their time courses using an inverse method; the connectivity metrics are then calculated using the estimated time courses of the brain sources. This approach can be applied to any sort of EEG data, as it can identify the directional connections between dipole sources without knowing the EEG source locations beforehand. This chapter mainly aims to identify the areas of the brain that can carry statistical dependence in cerebral neurons.

The rest of this chapter is organized as follows. In Section 22.2, we briefly describe the EEG source localization model and introduce the PF with systematic resampling for source localization; we then discuss the effective connectivity measures for source connectivity. The results obtained using the proposed technique are demonstrated in Section 22.3. Finally, the chapter is concluded in Section 22.4.

22.2 METHODS

22.2.1 EEG SOURCE LOCALIZATION MODEL

Localization of neural sources based on EEG utilizes scalp potential data to infer the location of the underlying neural activity. This methodology involves a forward problem and an inverse problem. The forward problem consists of creating a model of how electrical activity propagates from the source space to the sensor space, and its result is the lead field matrix. Once this model is available, the inverse problem consists of estimating cortical activations given the sensor (scalp) measurements, by imposing some constraints (due to the ill-posed nature of the problem).

The volume conduction model of the head consists of nested concentric spheres (scalp, skull, and brain) with constant conductivity (Miao et al., 2013; Federica et al., 2010). To develop a solution for the source localization model in EEG, consider that N dipoles in the cerebrum exhibit electrical activity. To evaluate such activity, the multichannel EEG data \(z_t \in \mathbb{R}^{n_s}\) can be used; with \(n_s\) sensors at a given time t, the forward EEG model is given by

\[ z_t = \sum_{i=1}^{N} L_i\left(x_t(i)\right) s_t(i) + v_t \quad (22.1) \]

wherein the 3D localization vector is represented as \(x_t(i)\), and the lead field matrix \(L_i\left(x_t(i)\right) \in \mathbb{R}^{n_s \times 3}\) and the 3D moment vector (the source signal) \(s_t(i)\) are given for dipole i. The EEG model has observation noise represented by \(v_t\). To compute
a forward model \(z_t\), a priori knowledge of the head geometry, electrode positions, and dipole localization is needed. The N dipoles are represented as \(x_t = \left[x_t(1), \ldots, x_t(N)\right]\), where each single geometric position in 3D is given as \(x_t(i) = \left[x(i)\; y(i)\; z(i)\right]^{T}, i = 1, 2, \ldots, N\). As there are N dipoles, the lead field matrix \(L(x_t) \in \mathbb{R}^{n_s \times 3N}\) can be written as \(L(x_t) = \left[L\left(x_t(1)\right), \ldots, L\left(x_t(N)\right)\right]\), and it depends on the location of the dipole \(x_t(i)\) at time t. The vector of moments \(s_t \in \mathbb{R}^{3N \times 1}\) is \(s_t = \left[s_t(1), \ldots, s_t(N)\right]^{T}\), where each single moment, or brain source signal, in 3D is given as \(s_t(i) = \left[s_x(i)\; s_y(i)\; s_z(i)\right]^{T}\). Now, (22.1) can be rewritten in matrix form as

\[ Z = L(X)\,S + V \quad (22.2) \]

This equation can be taken as the measurement equation. For the state equation, since how the states evolve is unknown, we can take a random walk model for the brain source localization problem:

\[ x_t = x_{t-1} + u_t \quad (22.3) \]

Equations (22.2) and (22.3) can be considered as the measurement (observation) equation and the state equation, respectively. Based on these two equations, the EEG can be modeled as a state-space model. To evaluate the dynamic parameters, that is, the location (x, y, and z directions) in the cerebrum, we use the PF by considering the measured signal \(z_t\) at time t (Ebinger et al., 2015).

22.2.2 PARTICLE FILTER

The state-space paradigm is well adapted to problems in neural analysis, where an observed signal is influenced by certain unknown factors that change with time. The unidentified signals are referred to as latent states. This approach allows us to solve issues related to the estimation of latent signals, the adaptation of models between latent and observed signals, and statistical testing of their relationship. We must identify a pair of statistical models in order to build a state-space model. The first model, the state model, describes the dynamics of the latent states. The second model, the observation model, explains how the latent state affects, at each time, the probability distribution of the observation process. By considering the measurements, the PF is used to assess the dynamic state variables by approximating the posterior likelihood density function of the unidentified state parameters at each time point (Arulampalam et al., 2002). For such a dynamic system, the state-space model can be described in terms of \(x_t\) and \(z_t\) as
\[ x_t = f\left(x_{t-1}\right) + u_t \quad (22.4) \]

\[ z_t = h\left(x_t\right) + v_t \quad (22.5) \]

where \(x_t\) consists of \(N_x\) unidentified parameters at time t, \(z_t\) contains \(N_s\) observations at time t, and \(f(\cdot)\) and \(h(\cdot)\) are considered to be nonlinear functions: \(f(\cdot)\) describes the state transition, and \(h(\cdot)\) relates the state vector to the observation vector; \(u_t\) is the state model error, and \(v_t\) is the observation noise. The PF can be used to estimate the joint posterior likelihood density function of \(x_t\) at time t using N random particles \(x^{(i)}_t\) along with weights \(w^{(i)}_t, i = 1, 2, \ldots, N\), as

\[ p\left(x_t \mid z_t\right) \approx \sum_{i=1}^{N} w^{(i)}_t\, \delta\left(x_t - x^{(i)}_t\right) \quad (22.6) \]

where \(\delta(\cdot)\) is the Dirac delta function. Using (22.6), the estimated state vector can be written as

\[ \hat{x}_t \approx \sum_{i=1}^{N} w^{(i)}_t\, x^{(i)}_t \quad (22.7) \]

To assign the weights to the particles, the importance density plays the vital role; based on the importance density, the sequential importance resampling PF can be used to estimate the states. It consists of three phases: particle generation, weight calculation, and resampling.

22.2.2.1 PARTICLE GENERATION

The first phase is the generation of the particles, denoted by \(x^{(i)}_t\). Particles are drawn from \(q\left(x_t \mid x^{(i)}_{t-1}, z_{1:t}\right)\), which is an importance density function, where \(z_{1:t} = \left\{z_1, \ldots, z_t\right\}\).

22.2.2.2 WEIGHT CALCULATION

The particle weights can be computed as follows:

\[ w^{(i)}_t \propto w^{(i)}_{t-1}\, \frac{p\left(z_t \mid x^{(i)}_t\right)\, p\left(x^{(i)}_t \mid x^{(i)}_{t-1}\right)}{q\left(x^{(i)}_t \mid x^{(i)}_{t-1}, z_{1:t}\right)} \quad (22.8) \]

The sum of all weights should be equal to 1, that is, \(\sum_{i=1}^{N} w^{(i)}_t = 1\), provided that the importance density is taken in such a way that \(q\left(x_t \mid x^{(i)}_{t-1}, z_{1:t}\right) = p\left(x_t \mid x^{(i)}_{t-1}\right)\). Then, (22.8) becomes \(w^{(i)}_t \propto w^{(i)}_{t-1}\, p\left(z_t \mid x^{(i)}_t\right)\).

22.2.2.3 RESAMPLING

Resampling is used to avoid rapid degeneracy of the particles, in which the few particles with high weights dominate the particles with smaller weights. This degeneracy leads to a poor posterior likelihood density function. To avoid particle degeneration, we used systematic resampling. Systematic resampling is also called universal sampling; in this popular technique, we draw only one random number, that is,
one direction on the "wheel," for one particle, with the other N − 1 directions being fixed at 1/N increments from that randomly picked direction. First, \(w^{(1)}_t\) is drawn from the uniform distribution on \(\left(0, \tfrac{1}{N}\right]\), and the rest of the \(w_t\) values are obtained deterministically (Bolic et al., 2004), that is,

\[ w^{(1)}_t \sim U\!\left(0, \tfrac{1}{N}\right], \qquad w^{(n)}_t = w^{(1)}_t + \frac{n-1}{N}, \quad n = 2, 3, \ldots, N \quad (22.9) \]

22.2.3 EFFECTIVE CONNECTIVITY MEASURES

Effective connectivity measures estimate the frequency-domain directional association between cerebrum regions, which can be attained from spectral measures by utilizing MVAR models. Subsequently, directed interactions can be measured by fitting the MVAR model to the time courses of the estimated sources. These measures follow Granger's proposal that, for two signals, if the first signal can be predicted by using the previous information of the second signal, then the second signal can be stated as causal to the first (Granger, 1969). Based on the Granger theory, effective connectivity measures can be categorized as GC, PDC (Baccala et al., 2007), and DTF (Kaminski et al., 2001). The time-series MVAR model equation can be written as

\[ s(t) = \sum_{k=1}^{p} r(k)\, s(t-k) + a(t) \quad (22.10) \]

Here, s(t) represents the M × 1 neural source time series, where M is the number of sources; r(k) is the M × M coefficient matrix, which can be attained from the autoregressive (AR) model; and p is the model order of the AR process, which can be calculated by the Akaike information criterion (Akaike, 1974) or the Bayesian information criterion (Schwarz, 1978). Rearranging (22.10), we obtain

\[ a(t) = \sum_{k=0}^{p} \hat{r}(k)\, s(t-k) \quad (22.11) \]

where \(\hat{r}(k) = -r(k)\) and \(\hat{r}(0) = I\). Converting (22.11) to the frequency domain, we obtain \(A(f) = R(f)\, S(f)\). Multiplying both sides by \(R^{-1}(f)\) gives \(S(f) = Q(f)\, A(f)\), where \(R^{-1}(f) = Q(f)\). Here, S(f) is the spectrum of the multivariate process and Q(f) is the transfer function of the system. This transfer function gives information about the structure of the modeled system.

Now, the GC connectivity measure can be calculated as

\[ \mathrm{GC}_{mn} = \ln\!\left( \frac{\operatorname{var}\left(s^{m}_{t} \mid \hat{s}^{m}\right)}{\operatorname{var}\left(s^{m}_{t} \mid \hat{s}^{m}, \hat{s}^{n}\right)} \right) \quad (22.12) \]
where \(\mathrm{GC}_{mn}\) represents the relation from n → m and \(\hat{s}^{m}\) denotes the past values of source m.

PDC is given as

\[ \mathrm{PDC}_{mn}(f) = \frac{\left| R_{mn}(f) \right|}{\sqrt{\sum_{i=1}^{M} \left| R_{in}(f) \right|^{2}}} \quad (22.13) \]

where \(\mathrm{PDC}_{mn}\) gives the causality from n → m. \(\mathrm{PDC}_{mn}\) lies in the range from 0 to 1, where 0 represents no connectivity between n and m and 1 represents full connectivity between n and m.

The DTF is given as

\[ \mathrm{DTF}_{mn}(f) = \frac{\left| Q_{mn}(f) \right|}{\sqrt{\sum_{l=1}^{M} \left| Q_{ml}(f) \right|^{2}}} \quad (22.14) \]

where \(\mathrm{DTF}_{mn}\) gives the causal influence from source n to source m at frequency f; it lies in the range \(0 \leq \mathrm{DTF}_{mn} \leq 1\), where 0 represents no connectivity between n and m and 1 represents full connectivity between n and m.

22.2.4 IMPLEMENTATION OF THE PROPOSED APPROACH

The process of connectivity estimation is as follows:
• Calculate the lead field matrix and the source space.
• The number of dipole sources is unknown and time variant. For this situation, we model the multiple dipole sources as RFSs.
• Start analyzing the data at time t = 1.
• Based on the probabilistic criteria, initialize the particles.
• Assign the weights to each particle and normalize the weights.
• The few particles with high weights will dominate the low-weight particles, and this degeneracy leads to a poor posterior likelihood density function. Use resampling methods to discard the low-weight particles.
• Prepare the dipole configuration for the next time step.
• Extract the dipoles and their time series using the PF for the EEG data.
• The MVAR procedure portrays the connection between the sources inside the brain, particularly in terms of the effect of one variable on another. MVAR allows us to derive time- and frequency-domain pictures of causality through the model-order coefficients and through their spectral representation. The connectivity is calculated from the dipole communication modeling and the estimation of the model order.
• Connectivity measures able to compute causality in the time and frequency domains are GC and PDC/DTF, respectively. Apply these measures and compute the directional connectivity between the sources.
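One concrete step in this procedure, the systematic resampling of (22.9), can be sketched in NumPy. This is a minimal illustrative sketch under our own function name and interface, not the authors' implementation:

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic (universal) resampling per (22.9): draw one uniform
    number on (0, 1/N], then space the remaining pointers at fixed
    1/N increments and map them through the cumulative weights."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    u0 = rng.uniform(0.0, 1.0 / n)            # single random draw, w_t^(1)
    pointers = u0 + np.arange(n) / n          # w_t^(n) = w_t^(1) + (n-1)/N
    cumulative = np.cumsum(weights)
    return np.searchsorted(cumulative, pointers)  # indices of kept particles

# toy example: particle 2 carries most of the weight, so it is replicated
w = np.array([0.05, 0.05, 0.80, 0.10])
idx = systematic_resample(w, rng=np.random.default_rng(0))
```

Passing a seeded `numpy.random.Generator` makes the single draw reproducible; the returned indices replicate high-weight particles and discard low-weight ones.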
22.3 RESULTS AND DISCUSSIONS

For simulation of the proposed method, the data were obtained from the Brainstorm EEG/Epilepsy dataset (Tadel et al., 2011). The data were recorded at a sampling frequency of 256 Hz using 29 channels (FP1, FP2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T7, T8, P7, P8, Fz, Cz, Pz, T1, T2, FC1, FC2, FC5, FC6, CP1, CP2, CP5, and CP6) placed as per the international 10/20 system. The simulation was performed for the EEG data from 10,000 to 20,000 ms over 921,600 samples, as illustrated in Figure 22.1. Note that for real EEG data, we do not have the accurate positions of the source dipoles, because the problem is nonlinear. The connectivity procedure contains two phases, that is, source localization and connectivity estimation. In source localization, first, the lead field matrix is computed using the FieldTrip toolbox (Oostenveld et al., 2011). The combination of the lead field matrix and the EEG data is applied to the PF for estimating the positions and time series of the sources. Figure 22.2a and b demonstrates the source locations and amplitudes obtained from the EEG data. It can be observed that, at the given time interval, there are five different sources, and the locations of these sources are given in Table 22.1.

FIGURE 22.1 Real EEG data.

TABLE 22.1 Locations (x, y, and z directions) of the Extracted Sources from the EEG
Source x y z
Source 1 –0.0043 –0.0074 0.0163
Source 2 –0.0160 0.0165 0.0359
Source 3 0.0063 0.0454 0.0280
Source 4 –0.0030 –0.0443 0.0401
Source 5 0.0663 –0.0015 0.0158
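The connectivity stage that follows fits an MVAR model to such estimated source time series and evaluates the variance-ratio GC of (22.12). The following is a toy sketch on synthetic data, with helper names of our own; it is not the chapter's actual pipeline, which ran on the EEG-derived sources above:

```python
import numpy as np

def residual_variance(target, predictors, order=2):
    """Least-squares AR fit of target on `order` lags of each predictor;
    returns the residual variance, i.e., the var(.) terms in (22.12)."""
    T = len(target)
    X = np.array([np.concatenate([p[t - order:t] for p in predictors])
                  for t in range(order, T)])
    y = target[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ coef)

def granger_mn(s_m, s_n, order=2):
    """GC_mn = ln(var(s_m | own past) / var(s_m | own past and past of s_n))."""
    restricted = residual_variance(s_m, [s_m], order)
    full = residual_variance(s_m, [s_m, s_n], order)
    return np.log(restricted / full)

# synthetic pair: s2 is driven by the lagged s1, so GC for 1 -> 2 is large
rng = np.random.default_rng(0)
s1 = rng.standard_normal(2000)
s2 = np.zeros(2000)
for t in range(1, 2000):
    s2[t] = 0.9 * s1[t - 1] + 0.1 * rng.standard_normal()
gc_1_to_2 = granger_mn(s2, s1)   # influence of source 1 on source 2
gc_2_to_1 = granger_mn(s1, s2)   # influence of source 2 on source 1
```

Because the full model nests the restricted one, the ratio is at least 1, so the GC value is nonnegative; it is large only in the driving direction.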
After source localization, the source time series were applied to the MVAR model to find the interactions over time. Using GC methods, effective connectivity measures were obtained; Figure 22.3 shows the connectivity measures from GC. The relations between source 1 and source 4, source 4 and source 5, and source 2 and source 3 are demonstrated. Figure 22.4 demonstrates the estimation of PDC with respect to frequency. In Figure 22.4a, the connectivity is

FIGURE 22.2 (a) Sources extracted by using the PF. (b) Source amplitudes.

FIGURE 22.3 Connectivity measures by GC.
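The frequency-resolved PDC of (22.13), as plotted in Figure 22.4, is obtained from the Fourier transform of the MVAR coefficient matrices. A minimal sketch with toy coefficients of our own (the 256 Hz default matches the sampling rate of the dataset described above):

```python
import numpy as np

def pdc(coeffs, freqs, fs=256.0):
    """PDC_mn(f) per (22.13). coeffs has shape (p, M, M), where
    coeffs[k] is the MVAR matrix r(k+1); entry [m, n] is the flow n -> m."""
    p, M, _ = coeffs.shape
    out = np.empty((len(freqs), M, M))
    for i, f in enumerate(freqs):
        # R(f) = I - sum_k r(k) exp(-j 2 pi f k / fs), following (22.11)
        R = np.eye(M, dtype=complex)
        for k in range(p):
            R = R - coeffs[k] * np.exp(-2j * np.pi * f * (k + 1) / fs)
        denom = np.sqrt((np.abs(R) ** 2).sum(axis=0))  # column-wise norm over i
        out[i] = np.abs(R) / denom
    return out  # values in [0, 1]; each column's squares sum to 1

# toy MVAR(1) with two sources, source 1 driving source 2
r1 = np.array([[0.5, 0.0],
               [0.7, 0.2]])
P = pdc(r1[None, :, :], freqs=[2.0, 8.0])
```

Here `P[:, 1, 0]` (flow 1 → 2) is large while `P[:, 0, 1]` (flow 2 → 1) vanishes, mirroring the one-directional coupling built into the toy coefficients.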


FIGURE 22.4 Connectivity measure by PDC (a) up to 4 Hz and (b) from 4 to 16 Hz.

shown from source 1 to source 4 and from source 3 to source 2 up to 4 Hz. Figure 22.4b shows the connectivity between source 1 and source 4 from 4 to 16 Hz. Above 16 Hz, PDC does not show any connectivity between the sources. In Figure 22.5, the DTF connectivity estimation is illustrated. In Figure 22.5a, connectivity is shown from source 1 to source 2, source 1 to source 3, source 1 to source 4, and source 1 to source 5 up to 4 Hz. Figure 22.5b shows the connectivity between source 1 and source 4 and between source 3 and source 2 from 4 to 8 Hz. Figure 22.5c shows the connectivity between source 1 and source 4 from 8 to 16 Hz. Above 16 Hz, the DTF does not show any connectivity between the sources. From the effective connectivity estimations, it can be observed that
FIGURE 22.5 Connectivity measure by DTF (a) up to 4 Hz, (b) from 4 to 8 Hz, and (c)
from 8 to 16 Hz.

GC, PDC, and DTF demonstrate the directional connectivity between source 1 and source 4.

22.4 CONCLUSION

Focusing on the relations and communications among regions of the brain and their functional activities is one of the imperative fields in investigating how the brain works. These communications are called brain connectivity. The approach proposed in this chapter was utilized for assessing the effective connectivity of the cerebrum by using PF and GC
methods. The PF was utilized for evaluating the locations of the sources by considering the EEG signals as the measurement model, and MVAR modeling with GC was used for estimating the connectivity between the sources. The proposed strategy is based on dynamic source activation localization; therefore, it is more precise than other strategies that utilize static source activation methods. This technique does not require any predefined data (unlike other model-based strategies). The PF was applied to extract the sources and their amplitudes, and the multivariate model was applied to the estimated sources, using GC techniques, to obtain the effective connectivity measures of the given data. The simulated results show the directed flow among the sources. Multivariate GC-based measures give a suitable framework for building up causal relations between neural populations. GC-based effective connectivity measures deliver the correlation of frequency-specific coupling in neural congregations, and they are robust with regard to volume conduction. In addition, they offer the possibility of following dynamical changes of correlation among cerebrum structures. These measures give a significant methodology to investigate large-scale neural synchronization and its dynamics.

Financial Disclosure: The authors state no funding involved.
Conflict of Interest: The authors declare no potential conflict of interests.
Ethical Approval: The conducted research is not related to either human or animal use.

KEYWORDS

• electroencephalography
• inverse problems
• connectivity
• Granger causality
• particle filter

REFERENCES

Akaike H. (1974). A new look at the statistical model identification. IEEE Trans Autom Control; 19:716–723.
Antelis J, Minguez J. (2009). Dynamic solution to the EEG source localization problem using Kalman filters and particle filters. In Int Conf IEEE Eng Med Biol Soc (pp. 77–80).
Arulampalam MS, Maskell S, Gordon N, Clapp T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans Signal Process; 50:174–188.
Baccala LA, Sameshima K, Takahashi DY. (2007). Generalized partial directed coherence. In: Proc 15th Int Conf Digital Signal Process (Cardiff: IEEE). pp. 162–166.
Barnett L, Seth AK. (2013). The MVGC multivariate Granger causality toolbox: A new approach to Granger-causal inference. J Neurosci Methods; 223:50–68.
Barnett L, Seth AK. (2015). Granger causality for state space models.
506 Handbook of Artificial Intelligence in Biomedical Engineering

Neural Source Connectivity Estimation 507

CHAPTER 23

EXPLORATION OF LYMPH NODE-NEGATIVE BREAST CANCERS BY SUPPORT VECTOR MACHINES, NAÏVE BAYES, AND DECISION TREES: A COMPARATIVE STUDY

J. SATYA ESWARI1,* and PRADEEP SINGH2

1 Department of Biotechnology, National Institute of Technology Raipur, Raipur, Chhattisgarh 492010, India
2 Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh 492010, India
* Corresponding author. E-mail: satyaeswarij.bt@nitrr.ac.in

ABSTRACT cancer based on their gene expres-


sion signatures using a number of
Background: In classification, classification algorithms, namely,
when the distribution class data are support vector machine (SVM),
uneven, the classification accuracy naïve Bayes, and decision tree.
is biased by the majority class. In Since the data are imbalanced, we
the case of the imbalanced data combined these algorithms with the
set, the classification methods synthetic minority oversampling
are likely to perform poorly for
technique (SMOTE) to handle the
minority class examples because
problem of classifying imbalanced
they are aimed to optimize the
data and to demonstrate that these
overall accuracy of the class rather
techniques provide assistance for
than considering the relative distri-
classification of imbalanced data
bution of each class.
sets. The performance of the SMOTE
Procedure: In this chapter, we to balance gene expression signature
classify lymph node-negative breast data was examined.
Results: The results demonstrated that the accuracy increased by 8.15%, 16.78%, and 15.92% for SVM, naïve Bayes, and decision tree, respectively, and the area under the receiver operating characteristic curve increased by 32.4% in SVM, 45% in naïve Bayes, and 22.5% in decision tree after balancing the training data.

Conclusion: The results of this study show that classification algorithms combined with the SMOTE can provide a significant solution for the class imbalance problem. This work also explains the probable application of classification techniques for the diagnosis of breast cancer and the identification of candidate genes in cancer patients.

23.1 INTRODUCTION

Metastasis is an advanced stage of tumor and is treated with aggressive therapy; the recommendation of such therapy has led to a substantial decrease in breast cancer death rates. Hence, nowadays, the majority of patients with breast cancer do not rely on the traditional prognostic factors (Alizadeh et al., 2000; Ben-Dor et al., 2000). Even though they may not require chemotherapy, early-stage and intermediate-stage lymph node-negative patients undergo chemotherapy (Sehgal et al., 2011). It is therefore necessary to identify predictive markers that are linked to the disease and can estimate the risk of metastasis in individual patients more precisely. There is also a need to accurately classify an individual patient's risk of recurrence to be confident that the patient has received suitable therapy. In the past few decades, genome-wide expression profiles with various numbers of prognostic markers have been recognized (Fisher et al., 1983; Mitra et al., 2006). Microarray is a recent high-throughput technique for the detection of cancer. It helps to analyze large quantities of samples with new or previously generated data to test markers present in tumors, and its development has transformed cancer research. Cancer can be molecularly characterized by studying the mRNA expression profiles of a large number of genes (Bojarczuka et al., 2011; Perou et al., 2000; Venkateswaru et al., 2012; Yang et al., 2013). This helps in determining alterations of gene expression between different types of tissue in healthy (control) subjects and cancer patients. Identification of a subset of genes responsible for cancer can be performed by analyzing this vast amount of mRNA expression data. The identification of genes subsequently leads to their use as a prognostic or diagnostic biomarker for treatment of the
Exploration of Lymph Node-Negative Breast Cancers 511

cancer. The classification of tumors into different types of cancer has been made possible by the development of molecular classifiers. This method allows separation of tumors into relevant molecular subtypes, which is not feasible by pathological methods. Still, the reliable prediction of a robust gene signature for the detection of cancer remains a challenge due to the presence of a large number of genes and a small number of patient samples (Eswari et al., 2013, 2015).

Various statistical and computational tools have been used for classification of cancer at the molecular level (Eswari et al., 2017; Eswari and Dhagat, 2018). A discrimination of acute lymphoblastic leukemia from acute myeloid leukemia was proposed by Golub et al. (1999). This discrimination was based on a weighted voting scheme, which identified 50 genes; they also predicted the membership of new leukemia cases. The same data set was utilized by Mukherjee et al. to develop a support vector machine (SVM) classifier for classification of samples into acute lymphoblastic leukemia and acute myeloid leukemia (Vuong et al., 2014). Khan et al. classified different forms of small round blue cell tumors using an artificial neural network based on 96 genes (Batista et al., 2004). A 70-gene predictor was developed by Veer et al. for the prediction of breast cancer (Sehgal et al., 2011). A tree-based method for classification of leukemia and lymphoma was presented in the works of Zhang et al. (2003), who constructed random forests for classification of cancers. All of these methods require the usage of statistical techniques for classification of tumors and, hence, pose a hindrance in the determination of nonlinear relationships between genes. In contrast, complex models do not provide relationships between the genes responsible for tumors. To evaluate gene expression data, naïve Bayes (NB), SVM, and decision trees (DTs) have been rigorously tested for their capability in making a distinction among cancers belonging to various diagnostic categories.

Outdated labor-intensive practices are not competent enough to investigate enormous and multifarious gene classifications encrypting massive volumes of evidence. Implementation of new classification tools such as NB, SVM, and DTs has driven the expanse of abstraction of biologically noteworthy structures in the gene sequences (Elouedi et al., 2014; Eswari et al., 2013; Hedenfalk et al., 2001; Khan et al., 2001; Liu et al., 2013; Ma et al., 2009). Neural networks and evolutionary algorithms such as differential evolution and genetic algorithms are exploited for optimization and
classification of biological systems (Eswari and Venkateswarlu, 2012; Hong and Cho, 2006; Laurikkala, 2001; Quinlan, 2014; Suryawanshi et al., 2020). Some researchers have used feature selection methods for microarray analysis (He et al., 2014) and hierarchical clustering (Tan et al., 2003). NB, SVM, and DTs are efficient methodologies for the optimization of systems as classifiers. Microarray technology is used for finding patterns of expression of a large number of genes from all samples of cancer and metastasis against the expression of the same genes in controls. Every gene has its own gene expression profile, and a cancer gene also has a unique gene expression profile. Many samples of patients have been taken for appraisal; after comparison, they have been placed in groups to find out conjoint genes. Hence, microarray gene expression profiling has the capability of assessing many genes simultaneously, and classification of numerous cancers has been performed. Currently, only a few diagnostic tools are available to recognize patients at cancer risk. In this work, we intended to build up a gene-expression-based methodology and to apply it to provide quantitative predictions on disease outcome for patients with lymph-node-negative breast cancer. Certain investigators, such as Taghipour et al., performed studies on modeling breast cancer progression and evaluating screening policies (Li et al., 2001). In this study, however, classification of breast cancers based on gene expression profiles is carried out by using SVM, NB, and DTs.

23.2 METHODS AND ALGORITHMS

Research has been done elaborately on breast cancer classification by taking physical features into consideration. These approaches use neural networks, fuzzy logic, NB, and SVM. However, molecular-level classification using gene expression profiles has been reported less often. Hence, in our study, we rigorously compared NB and SVM to classify lymph node-negative breast cancers using MATLAB software.

23.2.1 DATA GENERATION AND MICROARRAY DATABASE CONSTRUCTION

The classification of lymph node-negative breast cancer requires generation of data and construction of the database. The method for data generation and database preparation was performed using the following steps.
23.2.1.1 DATABASE

A total of 40 gene signatures for estrogen receptor-positive and -negative patients, as reported by Wang et al. (Ben-Dor et al., 2000), were selected for the study. The sequences of the gene signatures were taken from the National Center for Biotechnology Information database, where each gene signature has its unique accession number. The obtained data were used for preprocessing and generation of data sets.

23.2.1.2 PREPROCESSING OF DATA AND GENERATION OF DATA SETS

Data encoding is a very important step in improving network performance for the classification of data. The DNA gene signatures of lymph node-negative breast cancer are composed of four nucleotide bases. These four bases are assigned a numerical vector for training SVM, NB, and DT for classification of these sequences. The numerical values of 0, 0.5, 1, and 1.5 were chosen for encoding A, T, G, and C, respectively. In the following, Section 23.2.2 presents the basic idea of SVM, Section 23.2.3 discusses the NB algorithm, and Section 23.2.4 describes DT.

23.2.2 SUPPORT VECTOR MACHINE

The SVM algorithm is based on structural risk minimization theory, whose target is to minimize the generalization error. The sequential minimal optimization version of SVM is used in this experiment. We used the complexity parameter C = 1 to set the tolerance degree to errors. A radial basis function kernel, which is an efficient technique for classification, is used in this study. SVM as one of the well-known classifiers can be found in (Eswari and Venkateswarlu, 2016). SVM is a popular machine learning tool for tasks involving novelty detection, regression, or classification.

23.2.2.1 BRIEF ALGORITHM OF SVM

Vapnik proposed the SVM algorithm for density estimation, regression, and classification of data sets (Van't et al., 2002). In SVM, a hyperplane w·x + b = 0, with xi ∈ Rn, is found. This hyperplane separates the xi data points of a given class that lie on the same side of the plane. These data points correspond to the following decision rule: g(x) = sign(w·x + b). To determine the plane, a separating hyperplane, denoted by w·x + b = 0, is chosen by SVM. The plane is chosen such that it is far from the data points xi and has maximum margin. A hyperplane far away from the data points minimizes the chance of wrong decisions during the classification of new data. In other words, the distance of the data points that are closest to the hyperplane is maximized in SVM (Naseriparsa and Kashani, 2014).

23.2.3 NAÏVE BAYES

The NB algorithm is a classification technique based on probability. The advantages of this technique are that it is easy to apply to different types of data series and provides better results than existing ones. Suppose Xip is a data sample without any class label and H is a hypothesis such that Xip falls into a class specified as C. We aim to establish Pro(H|Xip), the posterior probability representing our confidence in the hypothesis after the observation Xip is given. The Bayesian theorem offers a technique for computing Pro(H|Xip) using the probabilities Pro(Xip), Pro(H), and Pro(Xip|H) (Ma et al., 2009). The generalized Bayes relation is

Pro(H|Xip) = Pro(Xip|H) · Pro(H) / Pro(Xip)    (23.1)

Consider a set of m samples String = {String1, String2, . . ., Stringm}, where each sample in the training data set is an n-dimensional feature vector {Xip1, Xip2, . . ., Xipn}. The values Xipi correspond to features Fe1, Fe2, . . ., Fen. There are n classes cl1, cl2, . . ., cln, and every sample belongs to one of these classes. In our model, the value of n is 4, as there are four classes.

When an extra data sample Xip with an unknown class is provided, its class can be predicted by the highest conditional probability Pro(Clk|Xip), where k = 1, 2, …, n. The Bayes model is represented as follows:

Pro(Clk|Xip) = Pro(Xip|Clk) · Pro(Clk) / Pro(Xip)    (23.2)

Here, Pro(Xip) is constant for all cl, so the product Pro(Xip|Clk) · Pro(Clk) needs to be maximized. The prior probabilities of the cl can be estimated by

Pro(Clk) = (number of training instances of class Clk) / m    (23.3)

where m is the total number of training instances. Taking into account the conditional-probability (C-Pro) independence assumption between features, the C-Pro can be written as follows:

Pro(Xip|Clk) = ∏t=1…n P(Xipt|Clk)    (23.4)

where Xipt represents the t-th feature of sample Xip. The probabilities P(Xipt|Clk) can be estimated from the training data set and are calculated for each attribute column.
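As an illustration of eqs. (23.2)–(23.4), the following minimal Python sketch estimates the priors and per-column likelihoods by counting and predicts the class with the largest product. The four-sample training set, its labels, and the two-class setup are invented for illustration only (the chapter's actual experiments used MATLAB and the 40-signature data set); the feature values reuse the A/T/G/C encoding of Section 23.2.1.2.

```python
from collections import Counter, defaultdict

# Toy training set: numerically encoded nucleotide sequences
# (A=0, T=0.5, G=1, C=1.5 as in Section 23.2.1.2) with hypothetical labels.
train = [
    ([0, 0.5, 1, 1], "metastasis"),
    ([0, 0.5, 1, 1.5], "metastasis"),
    ([1.5, 1, 0, 0], "control"),
    ([1.5, 1, 0, 0.5], "control"),
]

m = len(train)
classes = {label for _, label in train}

# Eq. (23.3): prior Pro(Cl_k) = (# training instances of class Cl_k) / m
prior = {c: sum(1 for _, y in train if y == c) / m for c in classes}

# Per-column value counts for estimating P(Xip_t | Cl_k)
likelihood = defaultdict(Counter)   # key: (class, column) -> Counter of values
for x, y in train:
    for t, v in enumerate(x):
        likelihood[(y, t)][v] += 1

def predict(x):
    """Eqs. (23.2) and (23.4): argmax_k Pro(Cl_k) * prod_t P(x_t | Cl_k)."""
    scores = {}
    for c in classes:
        p = prior[c]
        for t, v in enumerate(x):
            counts = likelihood[(c, t)]
            p *= counts[v] / sum(counts.values())  # zero if value unseen
        scores[c] = p
    return max(scores, key=scores.get)

print(predict([0, 0.5, 1, 1.5]))  # resembles the metastasis samples
```

Note that a count of zero for an unseen feature value zeroes the whole product; practical implementations usually add smoothing, which the chapter does not discuss.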
23.2.4 DECISION TREES

DTs predict responses of the data set by following the decisions in the tree from the root down to a leaf node. The responses for classification trees are in the form of "true" or "false," whereas the responses are numerical in regression trees. DT adopts a greedy approach and constructs the tree in a top-down recursive divide-and-conquer manner. The calculations create a training set (TS) with the class labels. The TS is recursively divided into smaller subsets as the tree is being built. C4.5, an extension of the ID3 DT algorithm, is used in this experiment. The additional features of C4.5 account for missing values, continuous attribute value ranges, DT pruning, and rule derivation. These properties are used as an extension to information gain, named the gain ratio; hence, bias toward certain attributes can be reduced (Ma et al., 2009).

This technique is represented by three elements, namely, a decision node for a test feature, a branch for one of the feature values, and a leaf that contains objects of the same or similar class. Formation of a DT involves building the tree and classifying the objects in that tree. The tree is built by placing a feature into an appropriate node and assigning a class to each of the leaves. The class of an object is determined by starting from the root of the tree, following the branches, and finally reaching a leaf. The class of the leaf is assigned as the class of the object (Howland et al., 2013).

23.2.5 TENFOLD CROSS-VALIDATION FOR THE METHODS OF SVM, NB, AND DT

We compare each classification algorithm's performance with class imbalance and after removing the imbalance from the data using the synthetic minority oversampling technique (SMOTE), and we record their various performance measures. For the purpose of reliable and stable results in the experiments, a K-fold cross-validation strategy is used. A K-fold scheme is generally used for classification accuracy measurement. In this validation, K partitions are made, and out of these, one is used for testing and the rest are used for training. The data set can be shown as follows:

T1 = X1, P1 = X2 ∪ X3 ∪ … ∪ Xk
T2 = X2, P2 = X1 ∪ X3 ∪ … ∪ Xk
…
Tk = Xk, Pk = X1 ∪ X2 ∪ … ∪ Xk−1    (23.5)

Here, T1, T2, …, Tk are the partitions for testing and P1, P2, …, Pk are for training. K is typically 10 or 30. In this experiment, k = 10 has been used. In the 10-fold cross-validation
process, the data are split into 10 identical disjoint parts. Nine of the 10 parts are used for training the framework, and one is used for testing. This is performed 10 times, and each time a different part of the data is used for testing. In this work, 40 lymph node-negative gene signatures of metastasis were given for 10-fold cross-validation with three techniques: SVM, NB, and DT. In each fold, 36 gene signatures from the DNA microarrays were taken for training and four were used for validation, and so on. Hence, imbalance in the data sets may be taken into account. We were able to apply the methods successfully with the 10-fold cross-validation, and the results are presented in the following sections.

23.3 RESULTS

This study presents SVM, NB, and DT to precisely identify the risk of tumor recurrence. This is beneficial for classifying patients into low- and high-risk groups in the case of lymph node-negative breast cancer. Various accuracy measures have been carried out in comparative studies to identify and classify better gene expression profiles of lymph node-negative breast cancer. Measures such as the receiver operating characteristic (ROC), precision, accuracy, recall/sensitivity, and root-mean-square error have been computed, and a better method for identification of gene expression profiles is concluded.

23.3.1 ACCURACY MEASURES

Different measures have been proposed for two-class problems, wherein the four possible cases (TP, FP, FN, and TN) can be represented in a confusion matrix (see Table 23.1).

23.3.1.1 ACCURACY

The accuracy is measured as

Accuracy (A) = correct classifications / total number of instances.    (23.6)

In software fault prediction terms, A does not disclose the discrepancy between FP and FN. Overall, the accuracy that is determined generally has less relevance than recall and precision.

TABLE 23.1 Confusion Matrix

Predicted \ Actual     Faulty Module          Not Faulty Module
Faulty Module          TP (True positive)     FP (False positive)
Not Faulty Module      FN (False negative)    TN (True negative)
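Given counts laid out as in Table 23.1, the measures of eqs. (23.6)–(23.8) can be computed directly. The counts below are hypothetical (100 modules in total), chosen only to show the arithmetic:

```python
# Hypothetical confusion-matrix counts in the layout of Table 23.1
TP, FP, FN, TN = 30, 5, 8, 57

total = TP + FP + FN + TN

# Eq. (23.6): accuracy = correct classifications / total number of instances
accuracy = (TP + TN) / total

# Eq. (23.7): recall (sensitivity) = TP / actually faulty instances
recall = TP / (TP + FN)

# Eq. (23.8): precision = TP / instances predicted faulty
precision = TP / (TP + FP)

print(f"accuracy={accuracy:.3f} recall={recall:.3f} precision={precision:.3f}")
```

With these counts, accuracy is 0.87 even though recall is only about 0.79, which is exactly the discrepancy the text warns accuracy can hide.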
23.3.1.2 RECALL (SENSITIVITY) ABILITY

Recall, also known as the true positive rate or probability of fault detection, is represented as

Recall = correctly predicted faulty modules / total number of actually faulty modules.    (23.7)

23.3.1.3 PRECISION ABILITY

Precision, which is also referred to as correctness, can be given as

Precision = correctly predicted faulty modules / total number of predicted faulty modules.    (23.8)

A high precision value indicates that less effort is needed for inspection and testing.

23.3.2 AREA UNDER THE ROC CURVE (AUC)

To compare the results, we have taken the average value of AUC and accuracy over the 10-fold cross-validation. The AUC represents the most informative and objective indicator of predictive accuracy, as reported by other researchers (Van't et al., 2002). A better classifier should produce a higher AUC. We have also used the AUC for our study.

23.3.2.1 CRITICAL ANALYSIS FOR CONSIDERING IMBALANCED DATA

In recent years, researchers have observed that an imbalanced data set can prove to be an obstacle for learning and accuracy calculation for learning algorithms (Chawla et al., 2002; Singh et al., 2014; Sweilam et al., 2010; Mahata, 2010). In order to handle the class imbalance problem, SMOTE oversampling is used, and the same procedure as followed in (Gu et al., 2008) is used for implementation. SMOTE (Singh et al., 2014) is an oversampling technique used to increase the number of minority class instances by generating synthetic instances with the nearest neighbor method. As this method generates synthetic class instances instead of replicating minority class instances, it overcomes the problem of overfitting.

The first step of the algorithm is the assignment of the initial data, named D. This proceeds with the minority subset M. Every instance in M is taken as x. Picking one of the k nearest neighbors of x at random leads to the generation of the difference, Diff. A random number rand, ranging from 0 to 1, is generated, and a synthetic instance is produced as n = x + Diff * rand. After the addition of n to D, the procedure is repeated, and finally the function ends.
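The loop just described can be sketched in a few lines of Python. The toy 2-D minority points, the choice k = 3, and the fixed random seed are illustrative only, not the chapter's settings:

```python
import math
import random

random.seed(0)

def smote(minority, n_new, k=3):
    """Sketch of the SMOTE step above: each synthetic point is
    n = x + rand * Diff, where Diff is the offset from x to one of its
    k nearest minority-class neighbours and rand is uniform in [0, 1]."""
    synthetic = []
    for _ in range(n_new):
        x = random.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(p, x),
        )[:k]
        nb = random.choice(neighbours)
        diff = [b - a for a, b in zip(x, nb)]
        r = random.random()
        synthetic.append([a + r * d for a, d in zip(x, diff)])
    return synthetic

minority = [[0.0, 0.5], [0.1, 0.6], [0.2, 0.4], [1.0, 1.5]]
new_points = smote(minority, n_new=4)
print(len(new_points))  # 4 synthetic minority instances
```

Because each synthetic point is a convex combination of two existing minority points, it always lies on the line segment between them, which is what distinguishes SMOTE from simple replication.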
23.3.2.2 THE SMOTE ALGORITHM

SMOTE is a technique to generate an additional data set, wherein the minority samples are oversampled by 192% to obtain a ratio of 1:1. The classical method of determining accuracy cannot be used to judge the performance on the unbalanced data set, as the impact of the majority class on accuracy is greater than that of the minority class (Gu et al., 2008). Some researchers have used the SMOTE for classifying imbalanced cancer data (Blagus, 2012; Ramaswamy et al., 2003; Kothandan, 2015). Thus, alternative measures, such as accuracy, recall, precision, and AUC, should be used for evaluating the performance on the imbalanced data set. From Table 23.2, the original unbalanced data set gives an accuracy of 62.75% with DT classification, 72.55% with NB classification, and 74.51% with SVM.

TABLE 23.2 Performance Measures of Various Algorithms for Classification of Cancer Data—With Imbalance

                     Decision Tree   Naïve Bayes   SVM
ROC                  0.59            0.45          0.5
Precision            0.76            0.74          0.745
Accuracy             62.75           72.55         74.51
Recall/sensitivity   0.73            0.97          1

For the three classifiers, the sensitivity values are 73%, 97%, and 100%, respectively. The high accuracy indicates that the classifiers classify the majority class samples well but can be unsuccessful in classifying the minority set. The data set used in the experiment was balanced by the SMOTE by generating synthetic data for the minority class and combining them with all the majority samples to obtain an equal proportion of each class. The cancer classification outcome was better on the data balanced by the SMOTE than on the original unbalanced data. With the DT classification of the microarray data, the sensitivity value was found to be about 0.73 even after balancing the data, as shown in Table 23.3.

In this study, the SMOTE proved to be a good balancing technique for all three classifiers: SVM, NB, and DT. From Tables 23.2 and 23.3, it can be observed that accuracy increased by 15.92% for DT, 16.78% for NB, and 8.15% for SVM (see Figure 23.1).

TABLE 23.3 Performance Measures of Various Algorithms for Classification of Cancer Data—Without Imbalance

                     Decision Tree   Naïve Bayes   SVM
ROC                  0.82            0.90          0.82
Precision            0.82            0.85          0.74
Accuracy             78.67           89.33         82.67
Recall/sensitivity   0.737           1             1
In this study, the major issue in building a good prediction model was found to be not only the ratio of majority to minority samples, but also the requirement for good training samples that show properties of the data consistent with the class label assigned to them. In the majority of cases, the records present in clinical data sets do not truly represent the properties of the data consistent with the corresponding outcome label. By comparison of the results, it can be seen that the AUC increased by 22.5% in DT, 45% in NB, and 32.4% in SVM after balancing the training data (see Figure 23.2).

FIGURE 23.1 Comparisons on the basis of accuracy for the original data and the data balanced by the SMOTE.

FIGURE 23.2 Comparisons on the basis of AUC for the original data and the data balanced by the SMOTE.
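As a cross-check, the improvements quoted in this section follow directly from Tables 23.2 and 23.3 (the small differences from the quoted 22.5% and 32.4% AUC gains come from rounding in the tables):

```python
# Accuracy (%) and AUC with and without balancing, copied from
# Tables 23.2 and 23.3 of this chapter
acc_imbalanced = {"DT": 62.75, "NB": 72.55, "SVM": 74.51}
acc_balanced   = {"DT": 78.67, "NB": 89.33, "SVM": 82.67}
auc_imbalanced = {"DT": 0.59, "NB": 0.45, "SVM": 0.50}
auc_balanced   = {"DT": 0.82, "NB": 0.90, "SVM": 0.82}

for clf in ("DT", "NB", "SVM"):
    d_acc = acc_balanced[clf] - acc_imbalanced[clf]
    d_auc = (auc_balanced[clf] - auc_imbalanced[clf]) * 100
    print(f"{clf}: accuracy +{d_acc:.2f} points, AUC +{d_auc:.1f} points")
```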
23.4 DISCUSSION

In research, molecular classification has been applied to group tumors based on their gene expression, but its practical application in clinics has been hindered. This is because a large number of feature genes are needed to construct a discriminative classifier. The need of the hour is the construction of molecular classifiers consisting of a small number of genes. This would be helpful for clinical diagnosis, where a diagnostic assay for the evaluation of several genes in a single test cannot be performed (Dhanasekaran et al., 2001; Eswari and Chimmiri, 2013; Ho et al., 2006; Kharya et al., 2014; Mukherjee et al., 1999; Park et al., 2012; Taghipour et al., 2013). In this study, a molecular classification system using SVM, NB, and DT was developed for lymph node-negative breast cancer.

He et al., in their work, discussed the possible issues that affect the prognosis of lymph node-negative breast cancer. These include the type of tumor, age and tissue type, the status of hormone receptors, and tumor diameters (Golub et al., 1999). Fisher et al. described that the major predictive feature for breast cancer has been axillary lymph node involvement (ALNI) (Jones et al., 2013). Classification of molecular subtype by computational tools fails to accurately predict ALNI (Ng and Dash, 2006; Vapnik, 2013), but the occurrence of ALNI appears to be higher in human epidermal growth factor receptor 2-positive tumors and lower in luminal A tumors (Vapnik, 2013; Wang et al., 2005). The identification of predictive factors will help physicians determine the suitable therapeutic approach. To achieve this, computational tools provide a feasible approach to study gene expression profiles through molecular-level classification.

In this work, combining the SMOTE with SVM, NB, and DT was found to improve classification: the accuracy increased by 8.15%, 16.78%, and 15.92% for SVM, NB, and DT, respectively, and the AUC improved by 32%, 45%, and 22.5% in SVM, NB, and DT, respectively, after the training data were balanced.

23.5 CONCLUSION

This chapter presents techniques to predict lymph node-negative breast cancers. Generally, the original data are imbalanced, and when a data set is extremely imbalanced, the currently existing classification techniques do not function well on minority class examples. Therefore, a sampling strategy is one of the possible solutions for the imbalanced class problem. In this study, the SMOTE
was examined with three machine learning algorithms, DT, NB, and SVM, to classify the test data sets for lymph node-negative breast cancer. The results of classification were calculated and compared for the class imbalance problem, recall, and AUC. The results obtained from this study indicate that these techniques combined with the SMOTE generally perform better than training on imbalanced data.

ACKNOWLEDGMENT

The authors are thankful to the National Institute of Technology Raipur for providing the necessary computational facility to analyze and prepare the manuscript and for permission to publish it.

KEYWORDS

• DNA microarray-based gene expression profile
• classification
• breast cancer
• machine learning

REFERENCES

Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature. 403 (2000) p. 503.
Batista, G.E., Prati, R.C., Monard, M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter. 6 (2004) pp. 20–29.
Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., Yakhini, Z., Tissue classification with gene expression profiles, Journal of Computational Biology. 7 (2000) pp. 559–583.
Blagus, R., Lusa, L., Evaluation of SMOTE for high-dimensional class-imbalanced microarray data. In 11th International Conference on Machine Learning and Applications. (2012) pp. 89–94.
Bojarczuk, C.C., Lopes, H.S., Freitas, A.A., Data mining with constrained-syntax genetic programming: Applications in medical data set. Algorithms. 6 (2001) p. 7.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research. 16 (2002) pp. 321–357.
Dhanasekaran, S.M., Barrette, T.R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K.J., Rubin, M.A., Chinnaiyan, A.M., Delineation of prognostic biomarkers in prostate cancer, Nature. 412 (2001) p. 822.
Elouedi, H., Meliani, W., Elouedi, Z., Amor, N.B., A hybrid approach based on decision trees and clustering for breast cancer classification. In 6th International Conference of Soft Computing and Pattern Recognition. (2014) pp. 226–231.
Eswari, J.S., Anand, M., Venkateswarlu, C., Optimum culture medium composition for rhamnolipid production by Pseudomonas aeruginosa AT10 using a novel multi-objective optimization method, Journal of Chemical Technology and Biotechnology. 88 (2013) pp. 271–279.
522 Handbook of Artificial Intelligence in Biomedical Engineering

Eswari, J.S., Chimmiri, V., Evaluation of kinetic parameters of an anaerobic biofilm reactor treating pharmaceutical industry wastewater by ant colony optimization, Environmental Engineering Science. 30 (2013) p. 527.
Eswari, J.S., Venkateswarlu, C., Optimization of culture conditions for Chinese hamster ovary (CHO) cells production using differential evolution, International Journal of Pharmacy and Pharmaceutical Sciences. 4 (2012) pp. 465–470.
Eswari, J.S., Venkateswarlu, C., Dynamic modelling and metabolic flux analysis for optimized production of rhamnolipids, Chemical Engineering Communications. 203 (2016) pp. 326–338.
Eswari, J.S., Dhagat, S., Surfactin-assisted synthesis of silver nanoparticles and drug design aspects of surfactin synthetase, Advances in Natural Sciences: Nanoscience and Nanotechnology. (2018), accepted.
Eswari, J.S., Dhagat, S., Kaser, S., Tiwari, A., Molecular docking and homology modelling studies of bacillomycin and iturin synthetases for the production of therapeutic lipopeptides, Current Drug Discovery Technologies. (2017).
Fisher, B., Bauer, M., Wickerham, D.L., Redmond, C.K., Fisher, E.R., Cruz, A.B., Foster, R., Gardner, B., Lerner, H., Margolese, R., Relation of number of positive axillary nodes to the prognosis of patients with primary breast cancer: An NSABP update, Cancer. 52 (1983) pp. 1551–1557.
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science. 286 (1999) pp. 531–537.
Gu, Q., Cai, Z., Zhu, L., Huang, B., Data mining on imbalanced data sets, in International Conference on Advanced Computer Theory and Engineering. (2008) pp. 1020–1024.
He, J., Wang, H., Ma, F., Feng, F., Lin, C., Qian, H., Prognosis of lymph node-negative breast cancer: Association with clinicopathological factors and tumor associated gene expression, Oncology Letters. 8 (2014) pp. 1717–1724.
Hedenfalk, I., Duggan, D., Chen, Y., Radmacher, M., Bittner, M., Simon, R., Meltzer, P., Gusterson, B., Esteller, M., Raffeld, M., Gene-expression profiles in hereditary breast cancer, New England Journal of Medicine. 344 (2001) pp. 539–548.
Ho, S.-Y., Hsieh, C.-H., Chen, H.-M., Huang, H.-L., Interpretable gene expression classifier with an accurate and compact fuzzy rule base for microarray data analysis, Biosystems. 85 (2006) pp. 165–176.
Hong, J.-H., Cho, S.-B., The classification of cancer based on DNA microarray data that uses diverse ensemble genetic programming, Artificial Intelligence in Medicine. 36 (2006) pp. 43–58.
Howland, N.K., Driver, T.D., Sedrak, M.P., Wen, X., Dong, W., Hatch, S., Eltorky, M.A., Chao, C., Lymph node involvement in immunohistochemistry-based molecular classifications of breast cancer, Journal of Surgical Research. 185 (2013) pp. 697–703.
Jones, T., Neboori, H., Wu, H., Yang, Q., Haffty, B.G., Evans, S., Higgins, S., Moran, M.S., Are breast cancer subtypes prognostic for nodal involvement and associated with clinicopathologic features at presentation in early-stage breast cancer? Annals of Surgical Oncology. 20 (2013) pp. 2866–2872.
Khan, J., Wei, J.S., Ringner, M., Saal, L.H., Ladanyi, M., Westermann, F., Berthold, F., Schwab, M., Antonescu, C.R., Peterson, C., Classification and diagnostic prediction of cancers using gene expression profiling
and artificial neural networks, Nature Medicine. 7 (2001) p. 673.
Kharya, S., Agrawal, S., Soni, S., Naïve Bayes classifiers: A probabilistic detection model for breast cancer, International Journal of Computer Applications. 92 (2014) pp. 26–31.
Kothandan, R., Handling class imbalance problem in miRNA dataset associated with cancer, Bioinformation. 11 (2015) p. 6.
Laurikkala, J., Improving identification of difficult small classes by balancing class distribution, in Conference on Artificial Intelligence in Medicine in Europe. (2001) pp. 63–66.
Li, L., Weinberg, C.R., Darden, T.A., Pedersen, L.G., Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method, Bioinformatics. 17 (2001) pp. 1131–1142.
Liu, H.-C., Peng, P.-C., Hsieh, T.-C., Yeh, T.-C., Lin, C.-J., Chen, C.-Y., Hou, J.-Y., Shih, L.-Y., Liang, D.-C., Comparison of feature selection methods for cross-laboratory microarray analysis, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 10 (2013) pp. 593–604.
Ma, J., Nguyen, M.N., Rajapakse, J.C., Gene classification using codon usage and support vector machines, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 6 (2009) pp. 134–143.
Mahata, P., Exploratory consensus of hierarchical clusterings for melanoma and breast cancer, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 7 (2010) pp. 138–152.
Mitra, A.P., Almal, A.A., George, B., Fry, D.W., Lenehan, P.F., Pagliarulo, V., Cote, R.J., Datar, R.H., Worzel, W.P., The use of genetic programming in the analysis of quantitative gene expression profiles for identification of nodal status in bladder cancer, BMC Cancer. 6 (2006) p. 159.
Mukherjee, S., Tamayo, P., Slonim, D., Verri, A., Golub, T., Mesirov, J., Poggio, T., Support vector machine classification of microarray data, (1999).
Naseriparsa, M., Kashani, M.M.R., Combination of PCA with SMOTE resampling to boost the prediction rate in lung cancer dataset, (2014) arXiv:1403.1949v1.
Ng, W., Dash, M., An evaluation of progressive sampling for imbalanced data sets, in 6th IEEE International Conference on Data Mining—Workshops. (2006) pp. 657–661.
Ooi, C.H., Tan, P., Genetic algorithms applied to multi-class prediction for the analysis of gene expression data, Bioinformatics. 19 (2003) pp. 37–44.
Park, S., Koo, J.S., Kim, M.S., Park, H.S., Lee, J.S., Lee, J.S., Kim, S.I., Park, B.-W., Characteristics and outcomes according to molecular subtypes of breast cancer as classified by a panel of four biomarkers using immunohistochemistry, The Breast. 21 (2012) pp. 50–57.
Perou, C.M., Sørlie, T., Eisen, M.B., Van De Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., Molecular portraits of human breast tumours, Nature. 406 (2000) p. 747.
Quinlan, J.R., C4.5: Programs for Machine Learning. Amsterdam, The Netherlands: Elsevier. 2014.
Ramaswamy, S., Ross, K.N., Lander, E.S., Golub, T.R., A molecular signature of metastasis in primary solid tumors, Nature Genetics. 33 (2003) p. 49.
Patel, S., Ahmed, S., Eswari, J.S., Therapeutic cyclic lipopeptides mining from microbes: Strides and hurdles, World Journal of Microbiology and Biotechnology. 31 (2015) pp. 1177–1193.
Sehgal, A.K., Das, S., Noto, K., Saier, M., Elkan, C., Identifying relevant data for a biological database: Handcrafted rules versus machine learning, IEEE/ACM
Transactions on Computational Biology and Bioinformatics. 8 (2011) pp. 851–857.
Singh, P., Verma, S., Vyas, O., Software fault prediction at design phase, Journal of Electrical Engineering & Technology. 9 (2014) pp. 1739–1745.
Suryawanshi, N., Sahu, J., Moda, Y., Eswari, J.S., Optimization of process parameters for improved chitinase activity from Thermomyces sp. by using artificial neural network and genetic algorithm, Preparative Biochemistry & Biotechnology. (2020), in press.
Sweilam, N.H., Tharwat, A., Moniem, N.A., Support vector machine for diagnosis cancer disease: A comparative study, Egyptian Informatics Journal. 11 (2010) pp. 81–92.
Taghipour, S., Banjevic, D., Miller, A., Montgomery, N., Jardine, A., Harvey, B., Parameter estimates for invasive breast cancer progression in the Canadian National Breast Screening Study, British Journal of Cancer. 108 (2013) p. 542.
Tan, K.C., Yu, Q., Heng, C., Lee, T.H., Evolutionary computing for knowledge discovery in medical diagnosis, Artificial Intelligence in Medicine. 27 (2003) pp. 129–154.
Van't Veer, L.J., Dai, H., Van De Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., Van Der Kooy, K., Marton, M.J., Witteveen, A.T., Gene expression profiling predicts clinical outcome of breast cancer, Nature. 415 (2002) p. 530.
Vapnik, V., The Nature of Statistical Learning Theory. New York, NY, USA: Springer Science & Business Media. (2013).
Venkateswarlu, K. Kiran, Eswari, J., A hierarchical artificial neural system for genera classification and species identification in mosquitoes, Applied Artificial Intelligence. 26 (2012) pp. 903–920.
Vuong, D., Simpson, P.T., Green, B., Cummings, M.C., Lakhani, S.R., Molecular classification of breast cancer, Virchows Archiv. 465 (2014) pp. 1–14.
Wang, Y., Klijn, J.G., Zhang, Y., Sieuwerts, A.M., Look, M.P., Yang, F., Talantov, D., Timmermans, M., Meijer-van Gelder, M.E., Yu, J., Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer, The Lancet. 365 (2005) pp. 671–679.
Yang, C.-H., Lin, Y.-D., Chaung, L.-Y., Chang, H.-W., Evaluation of breast cancer susceptibility using improved genetic algorithms to generate genotype SNP barcodes, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 10 (2013) pp. 361–371.
Zhang, H., Yu, C.-Y., Singer, B., Cell and tumor classification using gene expression data: Construction of forests, Proceedings of the National Academy of Sciences. 100 (2003) pp. 4168–4172.
INDEX

A

ABC algorithm, 88–89
Adaptive energy efficient cluster head estimation (ACHE) methodology, 437–439
ANNs. See Artificial neural networks (ANNs)
Anonymization techniques, 300–302
Antecedent fuzzy sets, 7
Arduino, 322
Arterys, 142–143
Artificial neural network classifier, 66–67
Artificial neural networks (ANNs)
  accuracies of, 27
  ANN-based diagnosis, flow diagram of steps, 199
  application in medical sector, 198–201
  brain tumor, diagnosis of, 61–62
  in clinical pharmacology, 68–71
  diagnosis of muscular disorders, 62–63
  in electrohysterography, 361
  heart disease, diagnosis of, 59
  image segmentation, 203–206
  medical image processing, 54–57
  outcome prediction, 63–68
Atomwise, 143
Augmented reality microscope (ARM), 486–487
Auto encoder network, 107

B

Backpropagation, 56
Baldwinian approach, 77
Basic biomedical research
  automated data collection, 150
  automated experiments, 149–150
  gene function annotation, 150
  literature mining, 151
  molecular dynamics, simulation of, 150–151
  transcription factor binding sites, prediction of, 150
Bayesian network, 83–84
Big data, 108–109
Big data analytics, 478–479
  characteristics of, 478
  comparative analytics, 479
  decision based, 479
  fundamental properties of, 478
  predictive based, 479
  prescriptive based, 479
  semantics based, 479
  types of, 479
BILIRUBIN attribute, 4
Bio Beats, 141
BioCreative Corpus III (BC3) dataset, 414
Biomarkers, 151
Biomedical AI systems
  adversarial attacks, 294–296
  in data, 292–293
  model stealing, 296
  security and privacy issues in, 293–297
  solutions for, 297–302
  systematic data poisoning, 296
  transferability attacks, 296
  vulnerabilities of, 293
Biomedical applications
  image segmentation, 203
  medical image analysis, 202
  representation of medical image analysis system, 203
Biomedical imaging techniques
  applications of AI in medicine, 149, 161–169
  basic biomedical research, 149–151
  challenges, 169
  clinical practice, 153–156
  computed tomography, 148, 157
  indomitable, 148
  magnetic resonance imaging (MRI), 157–159
  sphygmomanometer, 159–161
  translational research, 151–153
  ultrasound, 159
  X-ray, 156–157
Biomedical intelligence, 472
Biopsy, 64
Body area networks, 426–427
  applications, 430
  architecture of, 428–429
  characteristics of, 427–428
  cluster-based algorithms, 431
  cross-layer algorithms, 432
  layers, 430–431
  probabilistic algorithms, 431–432
  QoS-based routing algorithms, 432
  routing protocols, 431–433
  temperature-based algorithms, 432
Brain imaging modalities, 263–264. See also Brain segmentation
  computed tomography, 264–265
  magnetic resonance imaging, 264
  positron emission tomography, 265
Brain MR image segmentation, 282–283, 448–450
  block diagram of, 454
  conditional random fields (CRFs), 450
  converted images and, 459
  dataset used in, 459
  dilated images with gradient masks, 460
  edge detector block, 460
  eroded image and final marked image, 461
  error histogram and confusion matrix, 461
  feature extraction through discrete wavelet transform (DWT), 455–456
  feature recognition rate, 464
  image segmentation, 456–458
  neural network classifier performance evaluation, 461–463
  preprocessing phase, 458–460
  proposed research work, performance of, 463
  sample results for, 457
  segmentation phase, 460–461
  self-organizing map, 450
  state-of-the-art methods, 463–464
  training state statistics, 462
  wavelet transform method, 452
Brain segmentation. See also Brain MR image segmentation
  anatomical brain segmentation, 267
  brain imaging modalities, 263–264
  brain lesion, segmentation of, 267
  computed tomography, 264–265
  deep neural networks, 270–274
  hyperparameters, 279
  image processing for, 267–268
  image registration, 268
  intensity normalization, 268
  magnetic resonance imaging, 264
  neural networks for, 274–277
  neuroimage segmentation, 265–266
  positron emission tomography, 265
  segmentation evaluation metrics, 266
  skull stripping, 268
  traditional machine learning, 268–270
  training neural networks, 278–279
Branch and bound feature reduction algorithm, 88
BraTS, 281–282
Breast mass lesion classification, 115–116
C

C4.5 algorithm, 82
C5.0 algorithm, 82–83
Cancer, 448
Cardiotocography (CTG), 373
  classifiers for, 374
  example of, 377
  feature selection, 373
  parameters of, 377
  UCI CTG dataset, 378
CART algorithm. See Classification and regression tree (CART) algorithm
CC-Cruiser, 143
CHAID algorithm. See Chi-square automatic interaction detection (CHAID) algorithm
Chi-square automatic interaction detection (CHAID) algorithm, 80
Classification accuracy, 5
Classification and regression tree (CART) algorithm, 80–81
Classifier-based FER systems, 337
  feature extraction, 339
  features loaded to, 339
  input data, 339
  real-time emotion identification, 340
  results, 340
  system block diagram, 337
  training, 339–340
Classifiers and performance, 378–379
Clinical decision support system, 25
Clinical pharmacology, ANN in, 68
  optimization, 70
  pharmacokinetics and pharmacodynamics, 71
  preformulation parameters, 69
  quantity structure–activity relationship, 70–71
  structural screening in drug discovery process, 69
  toxicity prediction, 69
Clinical practice
  automated surgery, 154
  disease diagnosis, 153
  interpretation of genomic data, 153–154
  patient monitoring, 154–155
  patient risk stratification, 155–156
Clustering, 129–130
Clustering algorithm, 78
CNN-based model, 340–341
  accuracy vs epoch graph, 346
  architecture of, 340–341
  construction of, 340–341
  emotion recognition, 341
  image capture, 341
  input data, 340
  Kaggle dataset, 340
  loss vs epoch graph, 346
  system block diagram, 338
  technical specifications of, 338
Cognitive technologies, 109–110
Computed tomography, 157
  for brain imaging, 264–265
Computerized mammograms, 116
Conditional random fields (CRFs), 450
Confusion metrics, 5
Convolutional neural arrange, 117
Convolutional neural network (CNN), 275
  activation function, 272–273
  convolution layer, 272
  dataset specification, 338–339
  facial expressions, detection of, 333–334
  fully connected layer, 273
  fully convolutional CNN model, 276
  objective of, 333–334
  patch-based CNN, 275
  pooling layers, 273
  results and discussion, 346–347
  skipped connection, 273
  transposed convolutions, 274
  triplanar CNN, 277
  U-net, 276
Cross-validation, 5
Crowdsourcing, 488–489
Cyber security, 110–111
D

Data and knowledge acquisition, 33–35
  databases, 33
  ER-based rules, 34
  ER data and rules, 34
  meta-data, 35
  rule-based knowledge base, 33
Data perturbation, 300
Data reidentification, 301
Dataset for performance evaluation
  Breast Cancer Wisconsin (Diagnostic), 4
  Breast Cancer Wisconsin (Original), 4
  Cardiac Arrhythmia, 4–5
  Hepatitis, 4
  PIMA Indians Diabetes (PID), 4
Decision tree algorithms, 79–80
  C5.0 algorithm, 82–83
  CART algorithm, 80–81
  CHAID algorithm, 80
  ID3 algorithm, 81
  structure of, 80
Decision tree (DT) classifier, 408–409
Deep belief network (DBN), 107
Deep learning, 107–108, 130–131
  advancements, 108
  black box nature, 290
  in brain segmentation, 261–286
  breast mass lesion, 115–116
  in EMG signals, 114
  emotion recognition, 114–115
  heart diseases detection, 112
  Infervision, 166
  prognosis of Alzheimer’s disease, 113
Deep neural network (DNN), 107
Defuzzification module, 85–86
Design issues. See also Biomedical engineering, machine learning in
  associated with building AI models, 46
  decision-making process, 44
  features of biomedical data, 44
  learning algorithm, choice of, 45
  supervised learning algorithm, 45
  time series data, 44
  unsupervised learning algorithms, 45–46
Diabetic retinopathy (DR), 481–482
Diagnosis
  artificial neural networks, 198–199
  benefits of AI, in medicine field, 198
  biomarkers, 197
  brain tumor using ANN and EEG, 61–62
  diagnosis of disease, 197
  distance medical education, influence of AI, 214
  drug discovery, 197
  future perspective, 215
  heart disease using ANN and ECG, 59
  medical distance learning, 214
  muscular disorders, 62–63
  virtual inquiry system, 213–214
Differential privacy (DP), 298–300
Digital Rectal Examination (DRE), 64
Directed acyclic graph (DAG), 83–84
DreaMed, 142
Drug invention, 487
Drug reprofiling/drug repositioning. See Drug repurposing
Drug repurposing, 152
Drug side effect frequency mining, 234–256
  adverse drug events, 236
  Apache Spark, 238
  challenges and limitations, 252
  data preprocessing, 240
  ensemble classifier, 242
  extract frequency, time to, 248
  feature extraction, 240–241
  frequency extraction, 243
  frequency of, 248–250
  large dataset processing with spark, 243–244
  machine learning classification, 241–242
  MetaMap Semantic groups and abbreviations, 243
  mining and filtering twitter through livestream, 239
  most reported drugs by Twitter users, 248
  n-grams, sentiment analysis, 236–237
  pharmaceutical analysis, 251–252
  pipeline for extracting frequency of, 239
  pipeline setup, 244–245
  side effects and number reported for Xanax, 249–250
  spark setup, 245–247
  twitter for drug side effects, 237–238
Duty cycling and data-driven approaches, 433

E

Eager learner, 409
EchoNous vein, 165–166
Effective connectivity measures, 499–500
Electrocardiography (ECG), 58
  hybrid genetic algorithm and classification technique, 92–93
Electroencephalography (EEG), 58
  diagnosis of brain tumor, 62
  waveforms, 57
  waveforms–frequency ranges, 57
Electrohysterography (EHG), 354–355
  accuracy classification, 362
  artificial neural networks, 361
  classification of, 358
  classifiers, 360
  data collection for uterine contraction, 362–363
  energy, 359
  extreme learning machines, 360–361
  filtered response of, 364
  flow diagram of, 358
  K-nearest neighbors, 360
  Kurtosis, 359
  mean, 359
  median, 359
  performance metrics, 362, 366
  power spectral density of, 364
  preprocessing, 359
  random forest (RF), 361
  raw data collection, 359
  raw signals, 363
  review of literature, 355–358
  sensitivity, 362
  skewness, 360
  specificity, 362
  support vector machines, 360
Electromyography (EMG), 58–59
  in diagnosis of muscular disorders, 62–63
  waveform, 59
  waveforms–frequency ranges, 57
Electronic health record (EHR), 484–486
  accuracy, 485
  scalability, 485
Electronic health records, 133–136
Emedgene, 168
EMG. See Electromyography (EMG)
Ensemble-based subspace KNN model, 344–346
  confusion matrix, 345
Expert system, 2–3
  definition, 2
  knowledge acquisition, 2–3
  medical expert system, 3
Extreme learning machine and simulated annealing (ELM and SA), 13–14
  feature selection, 15
  feedforward neural networks (FFNNS), 14
  fisher score, 15
  normalization, 15
  single hidden layer feedforward network, 14
F

Facial expression recognition (FER), 332
  architecture of the CNN model, 341
  background, 334–337
  based on CNN, 337
  basic framework of, 332
  classifier-based systems, 337
  comparison of models, 347
  convolution neural networks, 334
  dataset specification, 338–339
  emotion detection of, 347
  goals, 337
  JAFFE image database, 339
  Kaggle database, 340
  methods, comparison of, 348
  real-time emotion, 346
Farm Ads dataset, 414
Feature-based segmentation, 56
Feature selection, 6, 42–43, 379
  classification with and without FA-based, 384
  FA-based FS with SVM, 383–384
  feature subset selection, 382–383
  firefly algorithm, 380–382
  melded with information gain, 389–393
  opposition-based firefly algorithm, 385–389
  opposition-based learning, 385
  performance metrics, 383–385
  results of simulation experiments, 383–385
Federated learning, 302
Feedforward neural networks (FFNNS), 14
Firefly algorithm, 380–382
  feature subset selection, 382–383
  Fireflies (FA), 380
  pseudocode, 382
  results of simulation experiments, 383–385
Fisher score, 15
Fisher score-extreme learning machine-simulated annealing (FS-ELM-SA), 18–19
  flow diagram of, 19
Frequency of drug side effects, 248–250
  applications of, 252–253
  challenges and limitations, 252
  most frequently reported side effects, 253
  most popular drug pairs, side effect by, 254–255
  most reported drugs, 248
  pharmaceutical analysis, 251–252
  side effects caused by more than one drug, 253–254
  top five reported side effects per drug with number reported, 249
  uncommon drug pairs, side effects associated with, 255–256
  Xanax, side effects and number reported for, 249–250
Fundamental steps in ANN-based medical diagnosis, 199–201
  database building, 200–201
  feature selection, 199–200
  flow diagram of, 199
  robustness, 201
  testing in medical practice, 201
  training algorithm, 201
Fuzzifier module, 85
Fuzzy-ACO classifier, 6
Fuzzy classifier
  with ant colony optimization (ACO), 6
  antecedent fuzzy sets, 7
  feature selection, 6
  fuzzy-ACO classifier, 6
  fuzzy inference, 9–10
  heuristic information, 8–9
  image segmentation, 206
  normalization of dataset, 6–8
  pheromone update, 9
  rule generation, 8
  rule modification, 8
Fuzzy inference, 9–10
Fuzzy logic modules, 85–86

G

GASA-SVM model, 12
Gaussian radial basis function, 13
Generative adversarial network (GAN), 285–286
Genetic algorithm, 74–75
Ginger.io, 143
Gleason pattern, 483
Google deep mind health, 141
Google Glasses, 164
Granger causality (GC) models, 493–504
Gray level co-occurrence matrices (GLCM), 450

H

Hausdorff distance, 266
Healint, 142
Healthcare applications of biomedical AI system
  auto encoder, 114
  as automatic machine, 119–120
  in Autonomous robots, 100
  big data, 108–109
  in Big Data Analytics, 100
  biomedical system, 101
  brain glioma segmentation, 113
  breast mass lesion classification, 116
  cognitive innovations, 109–110
  computerized mammograms, 116
  convolutional neural arrange, 117
  cyber security, 110–111
  data and image analysis process, 102
  deep learning, 107–108
  in diagnosis process, 102–103
  drug-induced liver damage (DILI), 117
  heart beat classification, 112
  heart disease, 112
  industry 2.4 technologies, 103
  Internet of Things, 100
  lesions in human body, 116
  for life care process, 103
  lung cancer, 116–117
  machine learning, 106
  motivation in healthcare, 105
  MRI brain tumor examination, 112–113
  neural networks, 106–107
  pharmacy automation, 119
  probabilistic boosting tree, 117
  processing personal data, 102
  prognosis of Alzheimer’s disease, 113
  rehabilitation robotics, 119
  robotics, 118–119
  in simulation, 100
  survey of, 120–121
  techniques of, 105
Health care system, 128–130
  algorithms of machine learning, 129
  deep learning, 130–131
  definition, 2
  disease types, tackled by, 132–133
  electronic health records, 133–136
  future models, 140–144
  in health care sector, 74–78
  in medical imaging and diagnosis, 52–57
  motivations of, 127–128
  natural language processing, 131–132
  outcome prediction using ANN, 63–68
  robotic applications in medicine, 136–138
  techniques of, 106–111
  virtual nurses, 138–140
  waveform analysis, 57–63
Health Fidelity, 143
Heart disease detection, 59–63
  ANN and EMG in diagnosis of muscular disorders, 62–63
  diagnosis of brain tumor using ANN and EEG, 61–62
  hybrid genetic algorithm for, 91–92
  image classification, 60–61
  preprocessing, 59–60
  QRS complex detection, 60
  ST-segment analyzation, 60
Heart diseases. See also Heart disease detection
  accuracy evaluation, 193
  analysis of time complexity, 193
  angina and myocardial infarction, 174
  cause of, 175
  common cardiac diseases, 176
  dataset, 183
  decision trees, 187–188
  deep learning, 179
  flow diagram, 183
  import data, 185–186
  K-nearest neighbor, 188–189
  machine learning, 177–179
  motivation, 176–177
  multilayer perceptron, 190–191
  overview of, 174–176
  prediction model, 187
  prediction of, 180–182
  preprocessing, 184–185
  random forest, 190
  result analysis, 191–192
  support vector machine, 189–190
  symptoms of, 174–175
  time complexity evaluation, 193
  transformation in health care, 179
Heart rate sensor, 321–322
Hepatitis dataset, 4
Hopfield neural network, 55
Hybrid GA–SA optimization technique, 10
  acceptance function, 11
  neighborhood search (hybrid GA–SA), 11–12
  steps of GA, 10–11
  terminology of SA, 11–12
Hybrid genetic algorithms
  artificial intelligence, 74–78
  artificial neural networks and, in orthodontics, 94–95
  Baldwinian approach, 77
  classification technique and, 92–93
  in clinical diagnosis, 89–95
  combining genetic algorithm and classification techniques, 78–86
  combining genetic algorithm and image processing techniques, 86–89
  deep belief network and, for heart disease detection, 93
  electrocardiogram (ECG) examination, 92–93
  with fuzzy logic for predicting heart diseases, 93–94
  in health care sector, 75–76
  heart disease detection, 91–92
  Lamarckian approach, 77
  machine learning, 78–79
  malignant cell detection, 91
  with neural network for breast cancer detection, 90–91
  in radiology, 89–90
  steps to develop, 77–78

I

IBM Watson System, 143–144
Image processing for brain segmentation, 267
  image registration, 268
  intensity normalization, 268
  skull stripping, 268
Image registration, 268
Image segmentation methods, 56–57
  data mining technique, 207–208
  fuzzy logic, 206
  machine learning techniques, 203–206
  pattern recognition, 206–207
  textural classification, 207
Inference module, 85
“Infervision”, 166
Information base module, 85
Integrated system health management (IHSM), 209
  sample method for, 210
  taxonomy of, 210
Intensity normalization, 268
Intrinsic pattern extraction algorithm, 86–87
Ischemic stroke lesion segmentation (ISLES), 282
Iterative Dichotomiser-3 (ID3) algorithm, 81

J

Jvion, 141–142

K

Karush–Kuhn–Tucker conditions, 68
Kernel function, 13
KNN model, 341–342
Knowledge representation, 35–36
  data forms, 36
  different representations, 38–42
  ER data converted to electronic data, 38
  medical data to electronic data, 37
  patient record, 37
Kohonen network, 55

L

Lamarckian approach, 77
Late slightest squares bolster vector machine (LSSVM), 113
Lazy learner, 409
Li-Fi technology, 314–315. See also Medical monitor kit (MMK)
  applications, 315
  block diagram of transmitter and receiver modules, 322
  cost, 328
  entering phase, 317
  future perspective, 328
  intermediate phase, 317, 320–322
  limitations of, 315–316
  optical channel characteristics, 324
  receiver module, 323–324
  throughput, 326–327
  time, 328
  transmission phase, 317, 322–324
  transmitter section, 322–323
  workflow of, 315–316
Live patient monitoring system (LiMoS), 318–320
  ICU information, 320
  main source page of, 318
  mobile app, 319
  monitoring page of, 319
  patient information system, 320
  System app, 318
  throughput, 326–327
Long short-term memory (LSTM), 107
LS-SVM and SA model, 19–20
  classification, 20
  cross-validation, 21–22
  feature selection, using Fisher score, 20
  FS-LSSVM-SA, 22–23
  optimization, 20–21
Lumiata, 142
Lymph node assistant (LYNA), 484
Lymph node-negative breast cancers, 510–520
  accuracy measures, 516–517
  area under the ROC curve (AUC), 517–519
  confusion matrix, 516
  database, 513
  decision trees, 515
  microarray database construction, 512–513
  Naïve Bayes, 514
  preprocessing of data and generation of data sets, 513
  support vector machine, 513–514
  tenfold cross-validation for, 515–516

M

Machine learning (ML), 3–4, 31–33, 106, 472–473
  algorithms, 161
  applications of, 479–489
  benchmark, 475
  in biomedical research, 474
  complexity with enhanced and simple usage, 490
  constructive to society, 490
  conventional, 475–476
  data and knowledge acquisition, 33–35
  data management, 490
  deep learning, 476–477
  design issues, 44–46
  feature selection, 42–43
  future perspective, 490–491
  government AI implementation strategy, 490
  knowledge representation, 35–42
  model prediction, 474
  models of, 473–474
  natural language processing, 477–478
  reduced cost, 491
  replacement for health experts, 491
  research challenges, 490–491
  technologies of, 475–478
  validation, 46–48
Magnetic resonance imaging (MRI), 157–159, 448–450
  for brain imaging, 264
  brain tumor segmentation, 449
Magnetoencephalography, 158
Maximum posteriori hypothesis, 67
Medical data mining, 219
Medical expert system. See also specific entries
  ELM and SA, 13–19
  fuzzy classifier with ant colony optimization (ACO), 6–10
  LS-SVM and SA, 19–23
  proposed systems and the existing systems, accuracies of, 27
  SVM and hybrid genetic algorithm (GA)-SA optimization, 10–13
Medical imaging and diagnosis, 52–53
  artificial neural network in, 54–57
  artificial neural networks (ANNs), 53
  common artificial neural networks, 53–54
  image segmentation, 56–57
  neural network used in, 53
  object recognition, 57
  preprocessing, 54–56
Medical monitor kit (MMK), 320–322
  Arduino, 322
  block diagram of, 320
  components, 321
  heart rate sensor, 321–322
  LEDs, 322
  resistance temperature detector, 321
  respiration sensor, 321
Medical Transportation Robots (MTRs), 137
Medtronic, 162
Medtronic Guardian Connect system, 162
Metastasis, 510–512
Metastatic breast cancer, 483–484
ML. See Machine learning (ML)
MRI brain tumor, 112–113
Multilayer feedforward network, 57
Multilayer perceptron (MLP), 56

N

Naïve Bayes (NB) classifier, 67–68, 409
Natural language processing (NLP), 131–132, 477–478
  classification, 477–478
  text processing, 477
K-nearest neighbor, 84–85
Network model, 435–436
Neural networks, 79, 106–107
  hyperparameters, 279
  training, 278
Neural source connectivity estimation, 493–504
  EEG source localization model, 496–497
  effective connectivity measures, 499–500
  electroencephalogram (EEG) information, 494
  locations of sources, 501
  particle filter, 497–499
  process of, 500
  real EEG data, 501
Neuroimage segmentation, 265–266
Neuroimaging applications
  anatomical brain segmentation, 267
  brain lesion segmentation, 267
Neuroscience, 480–481
  decoding mechanism, 480–481
  neural encoding, 481
Nodal metastasis, 484
Normalization of dataset, 6–8

O

Object recognition, 57
Opposition-based firefly algorithm, 385–386
  melded with information gain, 389–393
  pseudocode, 387
  results of simulation experiments, 387–389
Opposition-based learning, 386
Optical wireless channel (OWC), 324–325
  block diagram of, 324–325
  line of sight, 324–325
  non-line of sight, 324–325
  path loss, 325–326
Outcome prediction using ANN, 63–64
  artificial neural network classifier, 66–67
  biopsy, 64
  clinical and pathological stages, 65–66
  Digital Rectal Examination (DRE), 64
  naive Bayes classifier, 67–68
  primary and secondary Gleason patterns, 64–65
  Prostate-Specific Antigen (PSA), 64
  support vector machine classifier, 68

P

Particle filter, 497–499
  particle generation, 498
  resampling, 498–499
Patient monitoring. See also Diagnosis assistance
  Li-Fi technology, 314–316
  LIMOS framework, 318–320
  overview to, 312–313
  proposed idea, 316–320
  stages of, 313
  Wi-Fi role in, 313–314
Performance metrics
  classification accuracy, 5
  confusion, 5
  cross-validation, 5
  sensitivity, 5
  specificity, 5
Pharmacokinetics and pharmacodynamics, 71
Pharmacy automation, 119
Pipeline speed comparison, 247–248
Pixel-based segmentation, 56
Positron emission tomography
  for brain imaging, 265
Preformulation parameters, 69
Preprocessing, 54–56
Privacy-preserving data mining (PPDM), 300
Probabilistic boosting tree, 117
Prognostics model, 208–213
  application of, 211–212
  assessment of, 212–213
  model building, 212
Prostate cancer, 482–483
Prostate-Specific Antigen (PSA), 64
Pseudonymization, 301

Q

Quantity structure–activity relationship (QSAR), 70–71
R

Radial basis function neural networks (RBFNN), 361
Random forest (RF), 361
Recurrent neural network (RNN), 107
Regional CNN (R-CNN), 285
Rehabilitation, 119, 137
Relevance feedback method, 89
Representations, 38–42
  action and production rule, 41
  databases, 38–40
  frames, 40
  frame structure and inherited user specific frame, 40
  knowledge-based systems for temporal data, 40
  production rules, 41–42
  sample patient relation, 40
Resistance temperature detector, 321
Respiration sensor, 321
Restricted Boltzmann machine (RBM), 107
Robotic prescription dispensing systems, 138

S

Sanitation and disinfection robots, 137
Security and privacy, in biomedical AI systems, 293–297
  adversarial attacks, 294–296
  anonymization, 300–302
  authentication techniques, 298
  best practices to ensure, 303
  data perturbation, 300
  dataset reconstruction, 296
  differential privacy, 298–300
  encryption mechanisms, 298
  federated learning, 302
  linkage attacks, 297
  model theft or model stealing, 296
  privacy-preserving data mining (PPDM), 300
  solutions for, 297–302
  systematic data poisoning, 296
  transferability attacks, 296
Segmentation evaluation metrics
  Hausdorff distance, 266
  Sørensen–Dice coefficient, 266
Self-organizing map (SOM), 450
Semantic annotation of healthcare data, 218–230
  experiments and results, 228–230
  generic annotation model, 221
  literature survey, 222–226
  machine learning and biomedical expert systems, 221
  meaning, 219–220
  need for, 220
  objective of, 226–227
  ontograf of attributes and class, 228
  proposed semantic annotation model, 227
  relationship between main class, subclass, and features, 229
Simulated annealing (SA)
  cross-validation, 18
  search algorithm, 17
  terminology of, 11
Single hidden layer feedforward network (SLFN), 13–14
Skull stripping, 268
Sleep staging, 487–488, 489
SMOTE algorithm, 518–519
Sørensen–Dice coefficient, 266
Sphygmomanometer, 159–161
Structural screening in drug discovery process, 69
Sugar.IQ app, 162
Supervised learning, 129
Supervised ML algorithms, in biomedical text document classification, 408
  artificial neural network (ANN), 410–411
  Confusion Matrix, 415
  decision tree (DT) classifier, 408–409
  experimental setup, 413–415
  future scope, 420–421
  hyper-parameters Settings, 416
  K-nearest neighborhood classifier (K-NN), 409–410
  Naïve Bayes (NB) classifier, 409
  passive–aggressive (PA) classifier, 411–412
  performance analysis, 417–420
  performance measure, 415–416
  random forest classifier, 413
  Ridge classification algorithm, 411
  Rocchio classification algorithm, 411, 412
  support vector machine (SVM), 410
Support vector machine (SVM) model, 83
  accuracy results, 343
  classifier, 68
  confusion matrix, 343
  detection of facial expressions, 343–344
  in lymph node-negative breast cancers, 513–514
  text document classification, for, 410
  for uterine contraction signals, 360
Surgical Assistants, 137
SVM and hybrid genetic algorithm (GA)-simulated annealing (SA) optimization
  classification, 12–13
  feature selection, 10
  optimization using hybrid GA–SA, 10–12
SVM model. See Support vector machine (SVM) model
Swarm intelligence (SI), 380

T

Telepresence, 136–137
Text classification process, 403–407. See also Supervised ML algorithms
  artificial neural network (ANN), 410–411
  classification step, 406–407
  decision tree (DT) classifier, 408–409
  K-nearest neighborhood classifier (K-NN), 409–410
  Naïve Bayes (NB) classifier, 409
  passive–aggressive (PA) classifier, 411–412
  performance of classifiers, 418
  postprocessing step, 407
  preprocessing, 404–406
  random forest (RF), 413
  Ridge classification algorithm, 411, 412
  Rocchio classification algorithm, 411
  support vector machine (SVM), 410
Texton-based contour gradient extraction algorithm (TCGR), 87
Text preprocessing
  document representation, 406
  feature extraction, 404
  feature reduction, 405
  feature selection, 405
  feature transformation, 406
  filtering, 405
  lemmatization, 405
  stemming, 405
  stop-words removal, 405
  tokenization, 404
Texture segregation, 56–57
Threshold-sensitive energy efficient sensor network (TEEN), 435
Throughput, 326–327
  analysis of the RF and Li-Fi communication, 327
Toxicity prediction, 69
Traditional machine learning, 268–269
  feature selection, 269
  linear discriminant analysis, 269
  random forest, 270
  support vector machine, 269
Training techniques
  batch normalization, 280
  data augmentation, 279
  dropout, 280
  transfer learning, 279–280
Translational research
  biomarker discovery, 151
  drug repurposing, 152
  drug-target prioritization, 151–152
  genetic variant annotation, 152–153
  prediction of complex toxicities, 152
Treatment payloads, 141
TREC 2006 Genomics Track dataset, 414
Twitter Application Program Interface (API), 234

U

Ultrasound, 159
Unsupervised learning, 129
Uterine electromyography (EMG), 354

V

Validation, 46. See also Biomedical engineering, machine learning in
  data and training algorithm, 47
  domain expert, inputs from, 46–47
  performance evaluation, 47–48
Virtual nurses, 138
  characteristics of, 138–140

W

Waveform analysis, 57
  ECG, 58
  EEG, 58
  EMG, 58–59
  heart disease detection, steps used in, 59–63
  medical field, in, 59
Waveforms–frequency ranges, 57
Wavelet transform method, 452
Wearable tech (IOT), 488
Web-based medical diagnosis, 208
Wireless sensor network (WSN), 424–426. See also Body area networks
  adaptive energy efficient cluster head estimation methodology, 437–439
  amount of data packets, 443
  body area networks, 426–433
  cluster head vs rounds, 441
  network life time, 442
  network model, 435–437
  network parameters, 439
  performance metrics, 440
  related works, 433–435
  result analysis, 440–443
  simulation environment, 439

X

X-ray, 156–157
