Handbook of Artificial Intelligence in Biomedical Engineering
Biomedical Engineering: Techniques and Applications
HANDBOOK OF
ARTIFICIAL INTELLIGENCE IN
BIOMEDICAL ENGINEERING
Edited by
Saravanan Krishnan, PhD
Ramesh Kesavan, PhD
B. Surendiran, PhD
G. S. Mahalakshmi, PhD
First edition published 2021
Apple Academic Press Inc.
1265 Goldenrod Circle, NE, Palm Bay, FL 32905 USA
4164 Lakeshore Road, Burlington, ON, L7L 1A4 Canada
CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742 USA
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN UK
This new book series covers important research issues and concepts in
biomedical engineering, in alignment with the latest technologies and
applications. The books in the series include chapters on recent research
developments in the field of biomedical engineering. The series explores
various real-time and offline medical applications that directly or
indirectly rely on medical and information technology. Books in the series
include case studies in the fields of medical science, i.e., biomedical
engineering and medical information security, and cover interdisciplinary
tools along with the modern tools and technologies used.
The editors welcome book chapters and book proposals on all topics in the
biomedical engineering and associated domains, including Big Data, IoT,
ML, and emerging trends and research opportunities.
S. Anto
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Sarojini Balakrishanan
Department of Computer Science, Avinashilingam Institute for Home Science and
Higher Education for Women, Coimbatore 641043, India
B. Bhavya
Deloitte Consulting India Private Limited, Bengaluru, Karnataka
Bichitrananda Behera
Department of Computer Science, Pondicherry University, Karaikal, India
J. V. Bibal Benifa
Department of Computer Science and Engineering, Indian Institute of Information Technology,
Kottayam, India
Deya Chatterjee
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
Kattankulathur, Chennai 603203, India
Zafer Comert
Department of Software Engineering, Samsun University, Turkey
D. Renuka Devi
Department of Computer Science, IDE, University of Madras, Chennai 600005, Tamil Nadu, India
S. Siamala Devi
Department of Computer Science and Engineering, Sri Krishna College of Technology,
Coimbatore, India
J. Satya Eswari
Department of Biotechnology, National Institute of Technology Raipur, Raipur,
Chhattisgarh 492010, India
N. Gopikarani
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore,
Tamil Nadu
S. Shymala Gowri
Department of Computer Science and Engineering, PSG College of Technology, Coimbatore,
Tamil Nadu
Lingaiya Hiremat
Department of Biotechnology, R. V. College of Engineering, Bangalore, India
Dennis Hsu
Department of Computer Science, San Jose State University, San Jose, CA, USA
K. R. Jothi
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Rhutu Kallur
Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India
K. V. N. Kavitha
School of Electronics Engineering, Vellore Institute of Technology, Vellore, India
T. Ananth Kumar
Department of Computer Science and Engineering, IFET college of Engineering, Tamil Nadu, India
G. Kumaravelan
Department of Computer Science, Pondicherry University, Karaikal, India
R. Lokeshkumar
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
P. Mahalakshmi
Department of Electronics and Communication Engineering, Anna University Regional Campus,
Tirunelveli, Tamil Nadu, India
M. Manonmani
Department of Computer Science, Avinashilingam Institute for Home Science and
Higher Education for Women, Coimbatore 641043, India
S. Shyni Carmel Mary
Department of Computer Science, IDE, University of Madras, Chepauk, Chennai 600 005,
Tamil Nadu, India
G. Venifa Mini
Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education,
Kumaracoil, India
Diane Moh
College of Pharmacy, Touro University, Vallejo, CA, USA
Melody Moh
Department of Computer Science, San Jose State University, San Jose, CA, USA
Teng-Sheng Moh
Department of Computer Science, San Jose State University, San Jose, CA, USA
Saurabh Mukherjee
Banasthali Vidyapith Banasthali, Rajasthan, India
Srilakshmi Mutyala
Stratalycs Technologies Pvt. Ltd., Bangalore, India
A. Aafreen Nawresh
Department of Computer Science, Institute of Distance Education, University of Madras, Chennai, India
E-mail: anawresh@gmail.com
Contributors xv
G. Niranjana
Department of Computer Science and Engineering, SRM Institute of Science and Technology
K. Padmavathi
Department of Computer Science, PSG College of Arts and Science, Coimbatore 641014,
Tamil Nadu, India
Rashmi Pathak
Siddhant College of Engineering, Sudumbre, Pune, Maharashtra, India
P. Sivananaintha Perumal
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
N. Hema Priya
Department of Information Technology, PSG College of Technology, Coimbatore, Tamil Nadu
S. Suja Priyadharsini
Department of Electronics and Communication Engineering, Anna University Regional Campus,
Tirunelveli, Tamil Nadu, India
Sindhu Rajendran
Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India
R. S. Rajesh
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
T. V. K. Hanumantha Rao
Department of Electronics and Communication Engineering, National Institute of Technology,
Warangal, Telangana 506004, India, E-mail: tvkhrao75@nitw.ac.in
A. S. Saranya
Department of Computer Science, PSG College of Arts and Science, Coimbatore 641014,
Tamil Nadu, India
S. Sasikala
Department of Computer Science, IDE, University of Madras, Chepauk, Chennai 600 005,
Tamil Nadu, India
S. Arunmozhi Selvi
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
A. Sharmila
School of Electrical Engineering, Vellore Institute of Technology, Vellore, India
Christa I. L. Sharon
Department of Information Science and Engineering, Dayananda Sagar College of Engineering,
Bangalore, Karnataka, India
Vidhya Shree
Department of Electronics and Instrumentation, R. V. College of Engineering, Bangalore, India
Rajendran Sindhu
Department of Electronics and Communication, R.V. College of Engineering, Bangalore 560059, India
Pradeep Singh
Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur,
Chhattisgarh 492010, India
P. Srividya
Department of Electronics and Communication, R.V. College of Engineering, Bangalore 560059, India
J. Stalin
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
V. Suma
Department of Information Science and Engineering, Dayananda Sagar College of Engineering,
Bangalore, Karnataka, India
P. Sundareswaran
Department of Computer Science and Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
Meghamadhuri Vakil
Department of Electronics and Communication, R. V. College of Engineering, Bangalore, India
Subha Velappan
Department of Computer Science & Engineering, Manonmaniam Sundaranar University,
Tirunelveli, India
Hao-Yu Yang
CuraCloud Corporation, Seattle, WA, USA
ABBREVIATIONS
TP true positive
TRIBAS triangular basis function
TS training set
UMLS Unified Medical Language System
UWB ultra-wide band
VAE variational autoencoder
VLC visible light communication
VSM vector space method
VSVM vicinal support vector machine
WAC weighted associative classifier
WBAN wireless body area network
WDBC Wisconsin Diagnostic Breast Cancer
WSNs wireless sensor networks
PREFACE
machine is proposed for the disease diagnosis. The hybrid GASA is used for selecting the most significant feature subset as well as to optimize the kernel parameters of the SVM classifier. As a next step, a decision support system based on Fisher score-extreme learning machine (ELM)-simulated annealing is proposed. The ELM-based learning machine uses a single-hidden layer feedforward neural network. Finally, an expert system based on least square–support vector machine–simulated annealing (LS-SVM-SA) is proposed. FS is used for the selection of the most significant features. To improve the performance of the system, LS-SVM with radial basis function is used for classification and the SA is used for the optimization of the kernel parameters of the LS-SVM.

1.1 INTRODUCTION

1.1.1 ARTIFICIAL INTELLIGENCE

Artificial intelligence (AI) is a replication of human intelligence by computer systems. It is an interdisciplinary field that embraces a number of sciences, professions, and specialized areas of technology. To be precise, AI will not replace people but will augment their capabilities. AI helps to devise an intelligent agent, which observes its environment and takes actions to maximize the probability of success. AI encompasses an extensive range of areas in the decision-making process. One of the subfields of AI is automated diagnosis. It is related to the development of algorithms and techniques to confirm the behavior of a system. Thus, the developed algorithm should be capable enough to discover the cause whenever something goes wrong.

1.1.2 EXPERT SYSTEMS

An expert system is “a computer program that represents and reasons with knowledge of some specialist subject with a view to solving problems or giving advice” (Jackson, 1999).

It consists of a knowledge source and a mechanism that solves problems and returns a response based on the information provided by the query. Direct input from domain experts and evidence from literature are the sources of knowledge for expert systems. To solve expert-level problems, efficient access to a substantial domain knowledge base and a reasoning mechanism to apply the knowledge to the problems are mandatory.

Knowledge acquisition, the process of transforming human knowledge to machine-usable form, is considered a bottleneck (Feigenbaum, 1977) as it
Design of Medical Expert Systems 3
demands more time and labor. Further, maintaining the knowledge base is also a challenging task (Coenen and Bench-Capon, 1992; Watson et al., 1992). Techniques such as case-based reasoning (CBR) (Watson and Marir, 1994) and machine learning (ML) methods based on data are used for inference as they avoid the knowledge acquisition problem. In CBR, the knowledge consists of preceding cases that include the problem, solution, and the outcome stored in the case library. To obtain a solution for a new case, a case that resembles the problem is identified in the case library and the proposed solution is adopted from the retrieved case. Similar to CBR, ML-based expert systems avoid the bottleneck of knowledge acquisition as knowledge is directly obtained from data. Recommendations are generated by nonlinear forms of knowledge and are easily updated by simply adding new cases.

1.1.2.1 MEDICAL EXPERT SYSTEM

A decision support system is computer software that attempts to act like a human being. During the past few years, medical expert systems for the diagnosis of different diseases have received more attention (Kourou et al., 2015; Fan et al. 2011). Knowledge discovery in patient’s data and machine learning are used to design such expert systems. These decision support systems can play a major role in assisting physicians while making complex clinical decisions and can thereby improve the accuracy of diagnosis. Such systems have higher optimization potential and reduced financial costs. Pattern recognition and data mining are the techniques used in these expert systems that allow retrieval of meaningful information from large-scale medical data.

1.1.3 MACHINE LEARNING

ML, a subfield of computer science and statistics, is a scientific discipline that deals with the design and study of algorithms to learn from data and to make autonomous decisions. It has strong ties to data mining (Mannila and Heikki, 1996), AI, and optimization. It does not explicitly program computers to acquire knowledge but emphasizes the development of computer programs that grow and change by teaching themselves when exposed to new data. Further, it focuses more on exploratory data analysis (Friedman, 1998) and on the improvement of machine projects that develop and change when given new information. Knowledge representation and generalization are the core of ML.
$$\bar{\beta} = \frac{\sum_{h \neq \hat{h}} \beta_{\text{class } h}(R_j)}{C - 1} \tag{1.12}$$

The certainty grade for any combination of antecedent fuzzy sets can be specified. Combinations of antecedent fuzzy sets for generating a rule set with high classification ability are to be generated by the fuzzy classification system. When a rule set is given, an input pattern is classified by a single rule as given below:

$$\mu_{\hat{j}}(x_p) \cdot CF_{\hat{j}} = \max \left\{ \mu_j(x_p) \cdot CF_j \mid R_j \right\} \tag{1.13}$$

The winner rule has the maximum product of the compatibility and certainty grade CFj.

1.3 MEDICAL EXPERT SYSTEM BASED ON SVM AND HYBRID GENETIC ALGORITHM (GA)-SIMULATED ANNEALING (SA) OPTIMIZATION

1.3.1 FEATURE SELECTION USING GA AND SA

Feature selection is an optimization problem, which is based on the principle of picking a subset of attributes that are most significant in deciding the class label. It reduces the dimension of the data. When the input to an algorithm is too large to be processed and is suspected to be extremely redundant, then the input data is represented with a reduced set of features. The selection of the most significant subset of features from a dataset is an optimization problem. In this system, the feature selection is done using hybrid GA–SA local search mechanism.

1.3.2 OPTIMIZATION USING HYBRID GA–SA

This hybrid GA–SA optimization technique is used for feature selection and SVM parameter optimization. The performance of an SVM classifier depends mainly on the values of the kernel function parameter, Gamma (γ), and penalty function parameter (C). Finding the best values of these two parameters to achieve a maximum classification accuracy of the SVM classifier is an optimization problem. A hybrid GA–SA algorithm is used to solve this problem and find the optimal values of “C” and “γ.”

1.3.2.1 STEPS OF GA

1. Randomly generate an initial source population with “n” chromosomes.
2. Calculate the fitness function f(x) of all chromosomes in the source population using

$$\min f(x) = 100\left(x(1)^2 - x(2)\right)^2 + \left(1 - x(1)\right)^2$$
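Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-gene chromosome encoding, the sampling bounds of −2 to 2, the population size of 20, and the function names are all assumptions made for the example. Only the fitness formula comes from the text.

```python
import random


def fitness(x):
    """Fitness from step 2: 100*(x1^2 - x2)^2 + (1 - x1)^2 (lower is better)."""
    x1, x2 = x
    return 100.0 * (x1 ** 2 - x2) ** 2 + (1.0 - x1) ** 2


def init_population(n, low=-2.0, high=2.0, seed=0):
    """Step 1: draw n random two-gene chromosomes (bounds are assumed)."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]


def rank_population(population):
    """Step 2: score every chromosome and sort from best (smallest f) to worst."""
    scored = [(fitness(c), c) for c in population]
    return sorted(scored, key=lambda pair: pair[0])


population = init_population(n=20)
ranked = rank_population(population)
best_score, best_chromosome = ranked[0]
```

The ranked list is what later GA steps (selection, crossover, mutation) would consume. In the chapter's setting a chromosome would encode a feature subset or a (C, γ) pair rather than two raw numbers.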
$$\min_{W, b, \varepsilon} \; \frac{1}{2} W^T W + C \sum_{i=1}^{l} \varepsilon_i \tag{1.14}$$

$$\text{Subject to } y_i \left( W^T Z_i + b \right) \geq 1 - \varepsilon_i, \quad \varepsilon_i \geq 0, \quad i = 1, \ldots, l.$$

where the training vector “xi” is mapped onto a high dimension space by mapping function φ as zi = φ(xi). C > 0 is the penalty parameter of the error term.

Usually, Equation (1.14) is resolved by solving the following dual problem:

$$\min_{\alpha} F(\alpha) = \frac{1}{2} \alpha^T Q \alpha - e^T \alpha \tag{1.15}$$

$$\text{Subject to } 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, l$$

$$y^T \alpha = 0 \tag{1.16}$$

where “e” is the vector of all 1’s and “Q” is a positive semidefinite matrix. The (i, j)th element of “Q” is given by

$$Q_{i,j} \equiv y_i y_j K(x_i, x_j) \tag{1.17}$$

The kernel function K(xi, xj) has manifold forms. In this work, the Gaussian kernel function shown in Equation (1.20) or (1.21) is used:

$$K(x, x_i) = \exp\left( -\gamma \lVert x - x_i \rVert^2 \right) \tag{1.20}$$

$$K(x, x_i) = \exp\left( -\frac{1}{\sigma^2} \lVert x - x_i \rVert^2 \right) \tag{1.21}$$

Equations (1.20) and (1.21) are equivalent forms, and the parameters “γ” and “σ²” can be converted into one another. The Gaussian kernel parameter “γ” is determined by γ = 1/σ².

The parameters of SVMs with Gaussian radial basis function (RBF) kernel refer to the pair formed by the error penalty parameter “C” and the Gaussian kernel parameter “γ,” usually depicted as (C, γ).

1.4 MEDICAL EXPERT SYSTEM BASED ON ELM AND SA
• Optimization parameters:

$$G(a_i, X_j, b_i) = g\left( b_i \lVert X_j - a_i \rVert \right) \tag{1.30}$$

where

$$H = \begin{bmatrix}
G(a_1, X_1, b_1) & G(a_2, X_1, b_2) & \cdots & G(a_p, X_1, b_p) \\
G(a_1, X_2, b_1) & G(a_2, X_2, b_2) & \cdots & G(a_p, X_2, b_p) \\
\vdots & \vdots & & \vdots \\
G(a_1, X_N, b_1) & G(a_2, X_N, b_2) & \cdots & G(a_p, X_N, b_p)
\end{bmatrix}$$

“$\hat{\beta}$” is used as the estimated value of “β,” where “H#” is the Moore–Penrose generalized inverse of the hidden layer output matrix “H” (Serre, 2002).
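To make the structure of the hidden layer output matrix concrete, the sketch below builds H for a toy dataset using the RBF node of Equation (1.30). The activation g(z) = exp(−z²), the random centers a_i, the impact factors b_i, and the function names are assumptions for illustration; the text does not specify them.

```python
import math
import random


def rbf_node(a, x, b):
    """Equation (1.30): G(a, x, b) = g(b * ||x - a||), with assumed g(z) = exp(-z^2)."""
    dist = math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))
    return math.exp(-(b * dist) ** 2)


def hidden_layer_matrix(X, centers, impacts):
    """Build the N x p matrix H: row j holds the p node outputs for sample X_j."""
    return [[rbf_node(a, x, b) for a, b in zip(centers, impacts)] for x in X]


rng = random.Random(0)
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 4 toy samples
p = 3                                                  # number of hidden nodes
centers = [[rng.random(), rng.random()] for _ in range(p)]  # a_i, drawn at random
impacts = [rng.uniform(0.5, 1.5) for _ in range(p)]         # b_i, drawn at random
H = hidden_layer_matrix(X, centers, impacts)
```

In the standard ELM formulation the output weights are then estimated from H and the training targets through the Moore–Penrose inverse. A numerical library routine such as NumPy's `numpy.linalg.pinv` would be one way to compute it.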
1.5.1 FEATURE SELECTION USING FISHER SCORE

The FS algorithm is used for many supervised learning systems to determine the most relevant and discriminative features for classification (Yilmaz, 2013). Based on the prominence of the attributes in the dataset, it generates a score for each attribute and vital features are selected based on the scores. It uses discriminative methods and generative statistical models to perform feature selection.

1.5.2 CLASSIFICATION USING LEAST SQUARE SUPPORT VECTOR MACHINE (LS-SVM)

The main objective of SVM in classification is to separate data into two different classes with maximum margin. A higher computational load due to the quadratic programming problem is a challenge in SVM. To balance this, Suykens and Vandewalle (1999) have proposed LS-SVM. LS-SVM uses linear equations instead of the quadratic programming of SVM.

$$D(x) = (w \ast x) + w_0 \tag{1.36}$$

$$y_i \left[ (w \ast x_i) + w_0 \right] \geq 1, \quad i = 1, \ldots, n. \tag{1.37}$$

where “xi” is the input vector, “m” the number of features and “yi” is the class label. Support vectors are the margin values that are formed when the equality of Equation (1.37) holds. Classification of data is done using these support vectors.

1.5.3 OPTIMIZATION USING SIMULATED ANNEALING

The performance of the LS-SVM classifier (Aishwarya and Anto, 2014) also depends on the values of ‘C’ and ‘γ’. As finding the best values of these parameters is tedious, optimization techniques are used along with LS-SVM. SA is one of the most popular optimization techniques used for finding a solution to the optimization problem. It is a local heuristic search algorithm that uses a greedy method for finding an optimal solution.
remaining one fold is tested using the trained classifier. The output of each classifier generates accuracy for the predicted sets. The classifier performance is analyzed using performance metrics.

3. Calculation of CVS: It is calculated for each combination of “C” and “γ” values. CVS is obtained from Equation (1.38)

$$CVS = \frac{\#\,\text{Records Predicted True}}{\#\,\text{Total Records}}. \tag{1.38}$$

4. Optimization parameters: Finding an optimal solution for the kernel parameters “C” and “γ” is tedious for any SVM. Hence, the SA optimization technique is used. Here, the kernel values are varied between 2^−5 and 2^15 for “C” and between 2^−15 and 2^5 for “γ.”

1.5.4 FS-LSSVM-SA

The proposed FS-LSSVM-SA (Aishwarya and Anto, 2014) involves the following steps namely, feature selection using FS, classification

$$F(x^j) = \frac{\sum_{k=1}^{c} n_k \left( \mu_k^j - \mu^j \right)^2}{\left( \sigma^j \right)^2}, \quad j = 1, 2, \ldots, p. \tag{1.39}$$

where “nk” is the number of instances in class “k,” “µj” is the mean of the whole dataset for jth feature, “σj” is the SD of the whole dataset for jth feature. Here $\mu_k^j$ and $\sigma_k^j$ denote the mean and SD of the jth feature within class “k.” The SD is given by

$$\left( \sigma^j \right)^2 = \sum_{k=1}^{c} n_k \left( \sigma_k^j \right)^2. \tag{1.40}$$

The values of the selected features from the datasets are normalized between the range “0” and “1” as shown below

$$X_{normalized} = \frac{X - X_{min}}{X_{max} - X_{min}} \left( \text{Upper Bound} - \text{Lower Bound} \right). \tag{1.41}$$

where “X” is the original data, “Xmax” is the maximum value of “X,” “Xmin” is the minimum value of “X” and “Xnormalized” is the normalized value within the given upper and lower bounds.

The following inequality holds for the margins (support vectors) of the hyper-planes:

$$\frac{y_k \times D(x_k)}{\lVert w \rVert} \geq \Gamma, \quad k = 1, \ldots, n. \tag{1.42}$$

Margin (Γ) is inversely proportional to “‖w‖,” thus minimizing “‖w‖” maximizes the margin.
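The Fisher score of Equations (1.39) and (1.40) can be computed directly from class-wise means and variances. The sketch below is a minimal pure-Python version under stated assumptions: the toy data, the use of population variance for the within-class SD, and the function name `fisher_scores` are illustrative choices, not taken from the chapter.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score feature j as sum_k n_k (mu_k^j - mu^j)^2 / sum_k n_k (sigma_k^j)^2."""
    classes = sorted(set(y))
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = mean(column)
        between, within = 0.0, 0.0
        for k in classes:
            values = [row[j] for row, label in zip(X, y) if label == k]
            n_k = len(values)
            between += n_k * (mean(values) - overall_mean) ** 2
            within += n_k * pvariance(values)  # (sigma_k^j)^2, population form assumed
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
y = [0, 0, 0, 1, 1, 1]
scores = fisher_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Features at the front of `ranking` have class means that are far apart relative to their within-class spread, so they are the ones a score-based selector would keep.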
to Equation (1.38) and expression (1.44)

$$\frac{1}{2} \lVert w \rVert^2 + \frac{C}{2} \sum_{i=1}^{n} \xi_i^2 \tag{1.49}$$

$$\frac{1}{2} \lVert w \rVert^2 + \frac{C}{2} \sum_{i=1}^{n} \xi_i^2 - \sum_{i=1}^{n} \alpha_i \left\{ y_i \left[ (w x_i) + w_0 \right] - 1 + \xi_i \right\} \tag{1.50}$$

SVMs are used to classify linear data. In SVM, it is difficult to achieve better classification for nonlinear data. To overcome this problem, SVM uses kernel functions. Input datasets are distributed in nonlinear dimensional space. These are converted into high dimensional

Lagrange Multiplier “αi” can be either positive or negative for LSSVM, whereas it should be positive for SVM. LS-SVM can be expressed as
is found that the fuzzy–ACO system performs better when compared to the existing methodologies for all the datasets.

As a next step, a clinical decision support system based on SVM and hybrid GA–SA is used for diagnosis. The SVM with Gaussian RBF kernel performs the classification process. The hybrid GA–SA is used for two purposes, one is to select the most significant feature subset of the dataset, and the other is to optimize the kernel parameters of SVM. While the existing RST based model offered an accuracy of 85.46%, the proposed GASA-SVM yields the maximum accuracy of 93.6% for breast cancer (diagnostic) dataset. For the diabetes dataset, SVM offers the least accuracy of 74% while the proposed GASA-SVM yields the maximum accuracy of 91.2%. On the hepatitis dataset, the proposed GASA-SVM gives the maximum accuracy of 87%. SVM-Gaussian kernel model yields the minimum accuracy of 76.1% while the proposed GASA-SVM yields the maximum accuracy of 89.3% for cardiac arrhythmia dataset.

In GA, an average of 30 generations is taken. The best fitness value is found to be 0.1481 and the mean fitness values for the four datasets are also calculated as 0.17 for PID, 0.19 for breast cancer, 0.18 for hepatitis, and 0.18 for cardiac arrhythmia. For SA, the initial temperature is set as 6.455, and the final temperature as 0.333. The temperature of SA is gradually reduced from the initial value to the final in 50 cycles. GA receives the best chromosome with the help of SA. The classification accuracy of the proposed system is compared with existing systems such as grid algorithm (Huang et al., 2006), RST (Azar et al., 2014), decision tree (Zangooei et al., 2014), BP (Orkcu and Bal 2011), SVM NSGA-II (Zangooei et al., 2014), LDA-ANFIS (Dogantekin et al., 2010), PSO, and SVM (Sartakhti et al., 2012).

Subsequently, a medical expert system based on ELM and SA is proposed. Classification is performed using ELM while optimization of ELM parameter is carried out by SA heuristic. The performance of the proposed model is compared with several existing works. The RST based system offers an accuracy of 85.46% while the proposed ELM-SA yields the maximum accuracy of 94.39% for breast cancer (diagnostic) dataset. The SVM based system offers the least accuracy of 77.73% while the proposed GASA-SVM yields the maximum accuracy of 96.45% for the diabetes dataset. For the hepatitis dataset, the Naïve Bayes system yields a minimum accuracy of 82.05% while the proposed GASA-SVM yields the maximum
accuracy of 81.08%. For the cardiac arrhythmia dataset, KNN-HITON yields the minimum accuracy of 65.3% while the proposed GASA-SVM yields a maximum accuracy of 76.09%.

Experimental results show the highest accuracy on the best folds, the average accuracy over 10-folds, sensitivity, and specificity of the proposed system for the four medical datasets. The classification accuracy of the proposed system is compared with the existing systems such as RST (Azar et al., 2014), CART (Ster and Dobnikar, 1996), GRID algorithm (Chen et al., 2012), GA-based approach, MKS-SSVM (Purnami et al., 2009), MABC fuzzy (Fayssal and Chikh, 2013), VFI5-GA (Yilmaz, 2013), RF-CBFS (Ozcift 2011), and AIRS-FWP (Polat and Gunes, 2009).

Finally, a medical decision support system based on LSSVM and SA heuristic for the disease diagnosis is proposed. FS method is used to select the most significant features from the given feature set. LS-SVM with RBF is used for classification and the SA for optimization of the kernel parameters of the LS-SVM. For breast cancer dataset the existing RST-based system offered the least accuracy of 85.46% while the proposed LSSVM-SA yields the maximum accuracy of 97.54%. The SVM based approach offers the minimum accuracy of 77.73% while the proposed LSSVM-SA yields the maximum accuracy of 99.29% for the diabetes dataset. For the hepatitis dataset, the existing GRNN model offers the least accuracy of 80% while the proposed LSSVM-SA yields the maximum accuracy of 90.26%. The VFI5-GA system offers a minimum accuracy of 68% while the proposed LSSVM-SA yields the maximum accuracy of 77.96% for the cardiac arrhythmia dataset.

To conclude, it is observed that the medical expert systems proposed in this chapter applied over breast cancer, PID, hepatitis, and cardiac arrhythmia dataset produced improved classification accuracy when compared with other existing systems as shown in Table 1.3. The proposed system based on LSSVM-SA produced the highest accuracy over breast cancer, PID, and hepatitis dataset. Moreover, the proposed system based on GASA-SVM gives maximum accuracy over the cardiac arrhythmia dataset. Since clinical decision making requires the utmost accuracy of diagnosis, medical expert systems design with the highest classification accuracy can help the physicians to carry out an accurate diagnosis of diseases.
TABLE 1.3 Accuracies of the Proposed Systems and the Existing Systems

Breast Cancer Dataset    Diabetes Dataset           Hepatitis Dataset          Cardiac Arrhythmia Dataset
Methods   Accuracy (%)   Methods      Accuracy (%)  Methods      Accuracy (%)  Methods           Accuracy (%)
RIPPER    79.75          SVM          74            SVM          74            VFI5-GA           68
C4.5      79.34          GA–SVM       82.98         KNN          75            PRUNING APPROACH  61.4
1NN       80.37          ANN          73.4          C4.5         83.6          KNN-HITON         65.3
SVM ACO   81.93          LDA-ANFIS    84.61         NAIVE BAYES  82.05         KDFW-KNN          70.66
RST       85.46          SVM NSGA-II  86.13         KNN          83.45         AIRS-FWP          76.2
LSSVM-SA
Niranjana Devi Y & Anto S, 2014, ‘An evolutionary-fuzzy expert system for the diagnosis of coronary artery disease,’ International Journal of Advanced Research in Computer Engineering & Technology, vol. 3, no. 4, pp. 1478–1484.
Orkcu, HH & Bal, H, 2011, ‘Comparing performances of backpropagation and genetic algorithms in the data classification,’ Expert Systems with Applications, vol. 38, no. 4, pp. 3703–3709.
Polat, K & Güneş, S, 2009, ‘A new feature selection method on classification of medical datasets: Kernel F-score feature selection,’ Expert Systems with Applications, vol. 36, no. 7, pp. 10367–10373.
Pradhan, M & Sahu, RK, 2011, ‘Predict the onset of diabetes disease using artificial neural network (ANN),’ International Journal of Computer Science & Emerging Technologies, vol. 2, no. 2, pp. 2044–6004.
Purnami, SW, Embong, A, Zain, JM & Rahayu, SP, 2009, ‘A new smooth support vector machine and its applications in diabetes disease diagnosis,’ Journal of Computer Science, vol. 5, no. 12, 1003.
Refaeilzadeh, P, Tang, L & Liu, H, 2007, ‘On comparison of feature selection algorithms,’ Proceedings of Association for the Advancement of Artificial Intelligence, pp. 35–39.
Sartakhti, JS, Zangooei, MH & Mozafari, K, 2012, ‘Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA),’ Computer Methods and Programs in Biomedicine, vol. 108, no. 2, pp. 570–579.
Ster, B & Dobnikar, A, 1996, ‘Neural networks in medical diagnosis: Comparison with other methods,’ In Proceedings of the International Conference EANN, pp. 427–430.
Suykens, JA & Vandewalle, J, 1999, ‘Least squares support vector machine classifiers,’ Neural Processing Letters, vol. 9, no. 3, pp. 293–300.
Tan, KC, Teoh, EJ, Yu, Q & Goh, KC, 2009, ‘A hybrid evolutionary algorithm for attribute selection in data mining,’ Expert Systems with Applications, vol. 36, no. 4, pp. 8616–8630.
Vinotha PG, Uthra V & Anto S, 2017, ‘Medoid Based Approach for Missing Values in the Data Sets Using AANN Classifier,’ International Journal of Advanced Research in Computer Science and Software Engineering, vol. 7, no. 3, pp. 51–55.
Watson, I, Basden, A & Brandon, P, 1992, ‘The client-centred approach: Expert system maintenance,’ Expert Systems, vol. 9, no. 4, pp. 189–196.
Watson, I & Marir, F, 1994, ‘Case-based reasoning: A review,’ The Knowledge Engineering Review, vol. 9, no. 04, pp. 327–354.
Wernick, MN, Yang, Y, Brankov, JG, Yourganov, G & Strother, SC, 2010, ‘Machine learning in medical imaging,’ IEEE Signal Processing Magazine, vol. 27, no. 4, pp. 25–31.
Yilmaz, E, 2013, ‘An expert system based on Fisher score and LS-SVM for cardiac arrhythmia diagnosis,’ Computational and Mathematical Methods in Medicine, vol. 5, pp. 1–6.
Zangooei, MH, Habibi, J & Alizadehsani, R, 2014, ‘Disease diagnosis with a hybrid method SVR using NSGA-II,’ Neurocomputing, vol. 136, pp. 14–29.
CHAPTER 2
Once the rules are developed, the models are implemented. Further, the models are evaluated based on the data. The rules can be modified or changed if required as per the inputs from the evaluation data.

Another approach is to extract knowledge from the experts. Eliciting information from the experts requires sound knowledge of the requirements and the type of knowledge that is required. The questions raised to the experts should map closely to the decision-making process. The actual decision-making process should be very well-reflected. Another important factor is to streamline the data recording process, where the factors that are being recorded should be represented in the same way as is followed in the medical field. The outcome of the knowledge elicitation is to lead to a state where the knowledge state leads to the decision-making state.

When multiple experts are involved in knowledge elicitation, the factors that should be taken care of are different; experts will follow different strategies to diagnose and treat some diseases.
From Design Issues to Validation 35
FIGURE 2.7 AND/OR tree. Pi is the production rule and Ai is the action. The edges from the same node represent an AND operation.
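An AND/OR tree over production rules with certainty factors between zero and one can be evaluated recursively. The sketch below is only one plausible reading: taking the minimum at AND nodes and the maximum at OR nodes is an assumed, MYCIN-style convention that the chapter does not specify. The node layout, the rule names P1 and P2, and the symptom facts are hypothetical.

```python
def evaluate(node, facts):
    """Return a certainty in [0, 1] for a node of a hypothetical AND/OR tree."""
    kind = node["type"]
    if kind == "fact":
        return facts.get(node["name"], 0.0)
    child_values = [evaluate(child, facts) for child in node["children"]]
    if kind == "and":
        combined = min(child_values)   # assumed: weakest premise limits an AND node
    elif kind == "or":
        combined = max(child_values)   # assumed: strongest alternative wins at an OR node
    else:
        raise ValueError(f"unknown node type: {kind}")
    return combined * node.get("cf", 1.0)  # scale by the rule's own certainty factor


# Hypothetical rules: P1 = (fever AND cough) -> flu, P2 = (positive_test) -> flu.
tree = {
    "type": "or",
    "children": [
        {"type": "and", "cf": 0.8, "children": [
            {"type": "fact", "name": "fever"},
            {"type": "fact", "name": "cough"},
        ]},
        {"type": "and", "cf": 0.95, "children": [
            {"type": "fact", "name": "positive_test"},
        ]},
    ],
}
facts = {"fever": 0.9, "cough": 0.6, "positive_test": 0.5}
flu_certainty = evaluate(tree, facts)
```

With these inputs P1 contributes min(0.9, 0.6) × 0.8 and P2 contributes 0.5 × 0.95, and the OR node keeps the larger of the two.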
based on the production rules. In situations where the values of multiple conditions are considered, the order of the conditions is analyzed to see whether it matters. In that case, the order of the conditions is set. Further, in case of conflict among the conditions, the priority in which the conditions should be considered is set.

The main drawback of the production rule is that the model is limited to the factors presented in the production rules. Scenarios in which other conditions are present will not be considered in such a system. The rule search strategies are limited to forward chaining and backward chaining. In certain rule-based systems, the rules allow standard conditions, and the inclusion of AND, OR, NOT, COUNT, etc., along with the certainty factor, provides more accurate results. Each production rule is associated with a certainty factor, which is a value between zero and one. The AND nodes in the AND/OR tree have a certainty factor that is calculated based on the outcome of the situation and is carried forward. The value (1 − certainty factor) is considered for the evaluation of the outcome (Yu, 2015). This can be considered mainly in cases where an ad hoc approach is used with uncertain information. Even then, there are factors that make the production rule-based system inefficient. Primarily, it is due to the restriction to the formal specification and domain. Also, it can be used in keeping track of the rule searching order. Further, it mimics human reasoning in the standard conditions with a uniform structure.

2.4 FEATURE SELECTION

The primary factor that influences the design and development of an AI model is the input variables. The process of identification of these input variables is the preliminary step for the design of an AI model and is called feature extraction. The process is uniform for supervised and unsupervised learning models.

The first step of any feature extraction phase is to identify the types of variables that can be used to develop the models that will give optimized results. The feature selection process starts with the feature extraction process and depends on the type of model and type of variable. Biomedical systems have continuous variables as well as time series variables; the models developed should be able to accept such inputs.

One aspect of feature extraction is to identify all the possible variables that will contribute positively to the development of AI-based models. Even though only a limited number of variables are allowed to be included as input variables,
clusters that can be derived and the models need to be evaluated based on
generally available knowledge. The establishment of different models enables
a more accurate differential diagnosis where the new data presented for the
modeling will be almost similar to the original data.

To get the most out of the models, it is always beneficial to scale the data
to the same range. Since different attributes have different ranges of data,
their influence on the model will differ. Therefore, the data and the
different attributes need to be scaled down to the same range. To achieve
this, normalization of the data can be performed, where all the attribute
values are narrowed down to values between 0 and 1 or between −1 and +1.

A major limitation of the development of AI-based models is their dependence
on the training dataset. The performance of the models mainly depends on the
specification of the training model. The design issues presented in Section 5
are summarized in Table 2.1.
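
The scaling of attributes to a common range, discussed earlier in this
section, can be illustrated with a short sketch. It assumes simple min-max
normalization; the function name min_max_scale and the sample attribute
values are hypothetical and are not taken from the chapter.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale a list of numbers into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant attribute has no spread; map it to the midpoint.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


# Hypothetical attributes measured on very different scales.
heart_rate = [60, 75, 90, 120]        # beats per minute
cholesterol = [150, 200, 250, 300]    # mg/dL

hr_unit = min_max_scale(heart_rate)                # scaled into [0, 1]
chol_sym = min_max_scale(cholesterol, -1.0, 1.0)   # scaled into [-1, +1]
```

After scaling, both attributes span the same interval, so neither dominates
a distance or weight computation merely because of its units.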
can influence the outcome of the expert systems. The development of the
knowledge base is based on the inputs from domain experts. The inputs from the
domain experts guarantee neither completeness nor consistency of the knowledge
base (Ozel, 2016). Upon completion of the development of the knowledge base,
it should be verified by the domain experts for its appropriateness.

2.6.2 VALIDATION OF THE DATA AND TRAINING ALGORITHM

Further, checking of the data can be performed by different approaches. The
accuracy of the model based on the training data can be verified against
reviews from databases and charts. Further, studies on the specific research
area can be conducted. The data collected in each case can be considered in
verifying the accuracy of the models. Further, the appropriateness of the
training data needs to be verified to determine whether the most appropriate
parameters are included. To achieve this, statistical analysis of the training
dataset needs to be performed. The accuracy of the data, the scaling
consistency of the dataset, and the standards to be followed are verified,
since the model will perform well only if the training dataset is the most
appropriate and is optimized with the presence of all the required attributes.
Further, where standards exist for the correct classification, cases in which
the standards are not followed require assumptions to be made. Thus, the model
can produce inaccurate results. Such cases need to be considered when
validating the appropriateness of the training dataset (Wehbe, 2018).

The validation of the learning algorithm requires the developed model to be
tested thoroughly. To ensure this, the points to be considered are whether the
selected algorithms are suitable for the input data and whether the model
outcome is interpretable. Also, one of the major trade-offs of AI-based models
is the training time. To get a well-rounded model, other approaches need to be
considered, and the performance of the developed model needs to be compared
with that of other existing models.

2.6.3 PERFORMANCE EVALUATION

The factor that can be considered in measuring the performance is the accuracy
of the output generated by the model. The performance of the developed models
can be verified by analyzing the outcome of the model using a test dataset.
The same can be compared with the results obtained
from other models for the same test dataset. Different error measures, such as
mean absolute error and mean absolute percentage error, can be considered in
verifying the performance of the models. The values of the error measures
should be one of the benchmarks in the evaluation of the models. The error
measures are considered in the verification of the model, and the model with
the minimum error will be considered the best performing model for deployment
(Emmert, 2016). When it comes to unsupervised models, apart from all the
points mentioned, the applicability of other datasets to the model should be
the parameter that is considered. To further narrow down the applicability of
other datasets, they should have characteristics that are similar to the
training dataset, with the same parameters and accuracy.

2.7 CONCLUSION

presented. The criteria that have to be considered in the selection of
features and in designing the AI models are presented. It should be noted that
the concepts presented are not definitive for realizing an AI-based model.
Furthermore, the evaluation of a number of parameters is required. Different
methodologies can also be beneficial when combined. The performance measure
should be considered as the evaluation parameter of the model. AI-based models
have the capability to adapt to unseen scenarios; even then, the performance
of the models in such cases depends mainly on the similarity to the training
dataset.

KEYWORDS

data processing
data acquisition
biomedical data source
feature selection
data validation
REFERENCES

Beam, A.L. and Kohane, I.S., 2018. Big data and machine learning in health care. JAMA, 319(13), pp. 1317–1318.
Chen, C.M.A., Johannesen, J.K., Bi, J., Jiang, R. and Kenney, J.G., 2016. Machine learning identification of EEG features predicting working memory performance in schizophrenia and healthy adults. Neuropsychiatric Electrophysiology, 2(1), p. 3.
Codd, E.F., 1970. A relational model of data for large shared data banks. Communications of the ACM, 13(6), pp. 377–387.
Duda, R.O. and Hart, P.E., 1973. Pattern Classification and Scene Analysis. New York, NY, USA: John Wiley & Sons.
Emmert-Streib, F., Dehmer, M. and Yli-Harja, O., 2016. Against dataism and for data sharing of big biomedical and clinical data with research parasites. Frontiers in Genetics, 7, p. 154.
Friedman, C., 2018. Mobilizing Computable Biomedical Knowledge Conference October 18, 2017. Overview/Opening Remarks.
http://slideplayer.com/slide/6207174/20/images/28/Healthcare+example+of+relational+databases.jpg accessed on 06/05/2019
Hudson, D.L. and Cohen, M.E., 2000. Neural Networks and Artificial Intelligence for Biomedical Engineering. Institute of Electrical and Electronics Engineers. Piscataway, NJ, USA.
Ju, Z., Wang, J. and Zhu, F., 2011. Named entity recognition from biomedical text using SVM. In 2011 5th International Conference on Bioinformatics and Biomedical Engineering (pp. 1–4). IEEE.
Khan, R.S. and Saber, M., 2010. Design of a hospital-based database system (A case study of BIRDEM). International Journal on Computer Science and Engineering, 2(8), pp. 2616–2621.
Li, B., Li, J., Lan, X., An, Y., Gao, W. and Jiang, Y., 2018. Experiences of building a medical data acquisition system based on two-level modeling. International Journal of Medical Informatics, 112, pp. 114–122.
Liu, Z.H., Lu, J., Gawlick, D., Helskyaho, H., Pogossiants, G. and Wu, Z., 2018. Multi-Model Database Management Systems—A Look Forward. In Heterogeneous Data Management, Polystores, and Analytics for Healthcare (pp. 16–29). Springer, Cham.
Ozel, T., Bártolo, P.J., Ceretti, E., Gay, J.D.C., Rodriguez, C.A. and Da Silva, J.V.L. eds., 2016. Biomedical Devices: Design, Prototyping, and Manufacturing. John Wiley & Sons, New York.
Panyam, N.C., Verspoor, K., Cohn, T. and Ramamohanarao, K., 2018. Exploiting graph kernels for high performance biomedical relation extraction. Journal of Biomedical Semantics, 9(1), p. 7.
Parmar, C., Barry, J.D., Hosny, A., Quackenbush, J. and Aerts, H.J., 2018b. Data analysis strategies in medical imaging. Clinical Cancer Research, 24(15), pp. 3492–3499.
Peng, L., Peng, M., Liao, B., Huang, G., Li, W. and Xie, D., 2018. The advances and challenges of deep learning application in biological big data processing. Current Bioinformatics, 13(4), pp. 352–359.
Priyadarshini, S.J. and Hemanth, D.J., 2018. Investigation and reduction methods of specific absorption rate for biomedical applications: A survey. International Journal of RF and Microwave Computer-Aided Engineering, 28(3), p. e21211.
Pyrkov, T.V., Slipensky, K., Barg, M., Kondrashin, A., Zhurov, B., Zenin, A., Pyatnitskiy, M., Menshikov, L., Markov, S. and Fedichev, P.O., 2018. Extracting biological age from biomedical data via deep learning: too much of a good thing? Scientific Reports, 8(1), p. 5210.
Sarjapur, K., Suma, V., Christa, S. and Rao, J., 2016. Big data management system for personal privacy using SW and SDF. In Information Systems Design and
the skin and bones for diagnosing and treating disease. In medicine, AI is
used to identify diagnoses and give therapy recommendations. In medical
diagnosis, artificial neural networks (ANNs) are used to obtain the result of
the diagnosis. ANNs provide an extraordinary level of achievement in the
medical field. ANNs have been applied to various areas in medicine, such as
disease diagnosis, biochemical analysis, and image analysis. In recent years,
medical image processing has used ANNs for analyzing medical images. The main
components of medical image processing that heavily depend on ANNs are medical
image object detection and recognition, medical image segmentation, and
medical image preprocessing. The various AI imaging technologies help to
examine various factors of the human body using radiography, MRI, nuclear
medicine, ultrasound imaging, tomography, cardiography, and so on (Smita et
al., 2012).

3.2.1 COMMON ARTIFICIAL NEURAL NETWORKS IN MEDICAL IMAGE PROCESSING

In recent years, NN algorithms and techniques have been used in medical image
processing because of their good performance in classification and function
approximation. NN techniques are mostly used in image preprocessing (e.g.,
construction and restoration), segmentation, registration, and recognition.
Table 3.1 shows the different types of NNs used in the medical field (Rajesh
et al., 2016; Yasmin et al., 2013).
ANN involves tuning the values of the weights and biases of the network to
optimize network performance, which is measured by the mean squared error
network function.

3.4.4 CLASSIFICATION USING THE NAIVE BAYES CLASSIFIER

The naive Bayes classifier is designed for use when predictors are
independent of one another within each class. Naive Bayes classifies data in
two steps: training and prediction. The training step uses the training data,
which are patient cases and their corresponding pathological cancer stage
(i.e., organ-confined disease or extra-prostatic disease), to estimate the
parameters of a probability distribution, assuming predictors are
conditionally independent given the class. In the prediction step, the
classifier takes any unseen test data and computes the posterior probability
of that sample belonging to each class. It subsequently classifies the test
data according to the largest posterior probability. The following naive Bayes
description is used in the classification process.

Let P(ci|X) be the posterior probability that a patient record Xi will belong
to a class ci (the class can be organ-confined disease or extra-prostatic
disease), given the attributes of vector Xi. Let P(ci) be the prior
probability that a patient's record will fall in a given class regardless of
the record's characteristics, and let P(X) be the prior probability of record
X, and hence the probability of the attribute values of each record. The naive
Bayes classifier predicts that a record Xi belongs to the class ci having the
highest posterior probability, conditioned on Xi, if and only if
P(ci|X) > P(cj|X) for 1 ≤ j ≤ m, j ≠ i, maximizing P(ci|X). The class ci for
which P(ci|X) is maximized is called the maximum a posteriori hypothesis.
Since, by Bayes' theorem, P(ci|X) = P(X|ci)P(ci)/P(X) and P(X) is the same for
every class, the classifier predicts that the class label of record Xi is the
class ci if and only if

P(X|ci)P(ci) > P(X|cj)P(cj) when 1 ≤ j ≤ m, j ≠ i    (3.4)

The naive Bayes outcome is that each patient's record, which is represented as
a vector Xi, is mapped to exactly one class ci, where ci = 1,…, n and n is the
total number of classes, that is, n = 2. The naive Bayes classification
function can be tuned on the basis of an assumption regarding the distribution
of the data. The naive Bayes classifier used two functions for classification:

• Gaussian distribution (GD) and
• kernel density estimation (KDE).

GD assumes that the variables are conditionally independent given the class
label and thereby exhibit a multivariate normal distribution,
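
The training and prediction steps, together with the decision rule of
Equation (3.4), can be sketched as follows. This is only an illustrative
sketch under the Gaussian distribution assumption: the function names, the
toy records, and the use of log-probabilities (a numerical convenience) are
assumptions made here and do not come from the chapter.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(records, labels):
    """Training step: estimate the prior P(c) and per-attribute mean/variance."""
    by_class = defaultdict(list)
    for x, c in zip(records, labels):
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[c] = (n / len(records), means, variances)
    return model


def log_gaussian(v, mean, var):
    """Log of the normal density with the given mean and variance at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def predict(model, x):
    """Prediction step: return the class c that maximizes P(X|c) P(c)."""
    scores = {}
    for c, (prior, means, variances) in model.items():
        scores[c] = math.log(prior) + sum(
            log_gaussian(v, m, s) for v, m, s in zip(x, means, variances))
    return max(scores, key=scores.get)


# Hypothetical two-attribute records; labels 0 and 1 stand in for the two stages.
X = [[4.1, 6.0], [3.8, 5.5], [4.5, 6.2], [9.7, 8.1], [10.4, 7.9], [11.0, 8.4]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
```

Calling predict(model, record) on an unseen record returns the label with
the largest posterior score, which mirrors the comparison in Equation (3.4)
because P(X) is common to all classes and can be dropped.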
practice. To act like intelligent systems, machines have to be fed with a huge
amount of data. Algorithms play an important role in AI, as they provide
instructions to the machine to execute the required task by analyzing the data
provided.

In this chapter, the limitations of genetic algorithms (GAs) are discussed,
and different classification techniques along with hybrid GAs for biomedical
applications are presented by identifying the challenges in biomedical
applications using AI.

4.1 ARTIFICIAL INTELLIGENCE IN HEALTH CARE SECTOR

Artificial intelligence (AI) in healthcare is the use of complex algorithms
and software to emulate human cognition in the analysis of complicated medical
data. AI is the capability of algorithms to estimate conclusions without human
involvement. Healthcare sectors are under pressure to reduce cost. Hence, an
efficient way to use the data has to be devised. At present, very little
software and hardware equipment is available to analyze the existing huge
volume of medical data. Diagnosing a disease and its cure can be simplified if
the patterns within the clinical data are identified.

The field of medical diagnostics uses AI and algorithms like genetic
algorithms (GAs) for discovering the hidden patterns in the data sets to
predict the possibility of a disease. Different types of classification
techniques along with data mining have proved to provide useful information
for better treatment of diseases. In the medical field, GAs find extensive use
in gynecology, cardiology, oncology, radiology, surgery, and pulmonology.

4.1.1 GENETIC ALGORITHM

The GA (presented by Holland) is a method applied to optimization and
search-related problems to provide the most enhanced solution. The basis for
the GA is the theory of natural evolution by Charles Darwin. According to the
theory, offspring for the next generation are produced by selecting the
fittest individuals at random. A set of solutions for a task is considered,
and among these solutions, the best ones are selected. GA is divided into the
following five stages:

1. Evolution
2. Fitness function
3. Selection
4. Crossover
5. Mutation.

The evolution stage starts from a population and is an iterative process. In
each iteration, the fitness of each individual is assessed. The fit
individual's genome is used to
Hybrid Genetic Algorithms for Biomedical Applications 75
create the next generation. This new generation is then used in the next
iteration. This continues until the desired fitness level is achieved. The
crossover points are selected arbitrarily within the genes, and offspring are
produced by exchanging genes among the parents until the crossover points are
reached. Mutation is then applied if bits in the string are to be flipped.
Convergence and the degree of accuracy in obtaining a solution are governed by
the probabilities of crossover and mutation. The algorithm terminates if the
required criteria are almost met, if the required number of generations is
produced, or by manual inspection. In addition to the above five stages,
heuristics can be applied to speed up the process.

GAs are more efficient than traditional methods, as they provide a number of
solutions for a task. They prove to be better when a vast number of parameters
is available and show good performance for a global search. However, they
quite often have more latency while converging to the global optimum. In
addition, each time the algorithm is run, the output might vary for the same
set of inputs. The problem arises because in the GA the population size is
assumed to be infinite. However, in practice the population size is finite.
This affects the sampling capacity of a GA and hence its performance. Uniting
a GA with a local search method combats most of the problems that arise due to
finite population sizes. The blend of the local search method with the GA
accelerates the optimization process.

Adaptive GAs are a favorable variation of GAs. They are GAs with adaptive
parameters. Instead of using fixed values, crossover and mutation vary based
on the solution's fitness values.

Clustering-based adaptive GA is also a variant of the GA. Here, the
population's optimization states are judged using clustering analysis. The
crossover and mutation depend on the optimization states. For effective
implementations, GAs can be combined with other optimization methods to create
a hybrid GA.

4.1.2 HYBRID GENETIC ALGORITHM

Even though the performance of GAs for global searching is superior, they take
a long time to converge to an optimum value. Local search methods, on the
other hand, converge to an optimum value very quickly for a smaller search
space, though their performance as global searchers is poor.

To improve the performance of GAs, a number of variations have been devised.
Hybrid GAs are one such variation. Hybrid GAs are a combination of GA with
other
    sets are derived. The rule sets need not have the same properties as the
    decision tree. For a particular data value, one or more rule sets, or no
    rule, may apply. If multiple rules apply, weighted votes are given to the
    data values and then added. If no rule set applies, a default value is
    assigned. The error rates are lower on rule sets, thereby helping to
    improve the accuracy of the result. It also automatically removes
    attributes that are not helpful.

the data points overlap then either a hyperplane with tolerance or a
hyperplane with zero tolerance can be used. The important parameters in SVMs
are margin, kernel, regularization, and gamma. By varying these parameters, a
hyperplane with nonlinear classification can be achieved in a reasonable time.
Finding a perfect class when there are many training data sets consumes a lot
of time.
by GA. Based on the fitness values, the parents are selected to produce the
next generation. Crossover and mutation are then applied to obtain the best
solution. GA discards all bad proposals and considers only the good ones.
Thus, the end result is not affected.

Neural networks are used in solving classification problems. They are capable
of learning from previous experiences and improve their behavior when they are
trained. Neural networks mimic the human brain. A network consists of neurons
joined using connecting links. The weight of each link in the network is
multiplied by the transmitted signal. Every node in the network forms the
output node of the network, the lines form the input, and the intermediate
layer forms the hidden layer. The output of the hidden layer is the input to
the output layer (Alalayah et al., 2018; Ahmad et al., 2010).

The below steps are involved in classifying malignancy:

1. GA:
   (a) Selects the optimal weights and bias values for the neural network.
   (b) Evaluation of fitness value.
   (c) On the basis of fitness value, parents are selected.
   (d) The new population is formed from the parents using crossover and
       mutation.
   (e) Stop when a fixed number of generations is reached.
2. Artificial neural network:
   (a) Input the training samples and the class of the sample.
   (b) Compare the output with the known class and adjust the weight of the
       training sample to meet the purpose of classification.
3. Algorithm for malignant cell detection using GA and neural network:
   (a) Initial solutions are generated using GA.
   (b) They are then fed as input to the neural network.
   (c) The output from the neural network is then evaluated using a fitness
       function.
   (d) If the stop condition is not reached, a new selection, crossover, and
       mutation are performed and fed back to the neural network. Else, the
       process is stopped.

4.4.3 HYBRID GENETIC ALGORITHM FOR HEART DISEASE DETECTION

Heart diseases usually occur due to improper pumping of the blood, blocks in
arteries, high blood pressure, diabetes, etc. They have become a prominent
cause of death these days. Hence, predicting its occurrence
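
The three-part procedure listed earlier for malignant cell detection (the GA
proposes weights and a bias, the neural network is evaluated with a fitness
function, and selection, crossover, and mutation repeat until a stop
condition) can be sketched as follows. This is a hedged illustration only: a
single sigmoid neuron stands in for the neural network, the six samples are
invented, and elitism (carrying the best individual forward unchanged) is an
addition made here that the chapter does not mention.

```python
import math
import random

# Hypothetical toy data: two features per sample, label 1 = "malignant".
SAMPLES = [([0.2, 0.1], 0), ([0.4, 0.3], 0), ([0.3, 0.2], 0),
           ([0.8, 0.9], 1), ([0.7, 0.6], 1), ([0.9, 0.7], 1)]


def neuron(weights, x):
    # Single sigmoid neuron: two input weights plus a bias (the last gene).
    z = weights[0] * x[0] + weights[1] * x[1] + weights[2]
    z = max(-60.0, min(60.0, z))          # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def fitness(weights):
    # Fraction of samples whose thresholded output matches the known class.
    hits = sum((neuron(weights, x) >= 0.5) == bool(y) for x, y in SAMPLES)
    return hits / len(SAMPLES)


def evolve(pop_size=20, generations=40, seed=1):
    rng = random.Random(seed)
    # Initial candidate weight vectors generated by the GA.
    pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):          # stop after a fixed number of generations
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]     # selection on the basis of fitness
        children = [pop[0]]               # elitism: best individual kept as is
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, 3)     # single-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.3:        # mutation: perturb one gene
                child[rng.randrange(3)] += rng.gauss(0, 1)
            children.append(child)
        pop = children
    return max(pop, key=fitness), history


weights, history = evolve()
```

The list history records the best fitness per generation; because the best
individual is always carried forward, these values never decrease.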
4.5 CONCLUSION

The need for digitalization has been rising day by day. AI has contributed to
many fields such as medicine, automobiles, and education. Research to extract
the huge amount of information available in clinical data to improve the
diagnosis of a disease is critical. GA helps to find an optimal solution for
complex data in a reasonable time. Hence, its usage in the field of medicine
helps the physician to solve complex diagnostic problems. The search ability
of GA can be increased by the proper blend of GA with a local search method.
This blend is called hybrid GA.

In this chapter, an insight into how hybrid GA can be used to

KEYWORDS

artificial intelligence
genetic algorithms
hybrid genetic algorithms
classification techniques
image processing

REFERENCES

Ahmad F., Mat-Isa N. A., Hussain Z., Boudville R., Osman M. K. Genetic algorithm-artificial neural network (GA-ANN) hybrid intelligence for cancer diagnosis. Proceedings of the 2nd International Conference on Computational Intelligence, Communication Systems and Networks, July 2010, 78–83.
Alalayah K. M. A., Almasani S. A. M., Qaid W. A. A., Ahmed I. A. Breast cancer diagnosis based on genetic algorithms and neural networks. International Journal of Computer Applications (0975-8887), 180, 2018, 42–44.
evaluate the data, predict diseases, and prescribe medication accordingly. It
is interlinked with the lifestyle of the individuals and their related data.
The evolution of artificial intelligence (AI) technology makes it possible to
reason and generate knowledge based on continuous data analysis and to predict
as human experts do, by adopting various algorithmic processes, including
those of Industry 4.0.

The life span of humans varies in various ways because of this contemporary
world. Current and upcoming science and technology affect lifestyle through
both advancement and defect. The revolutions happening in this modern era are
due to the prominent invention of computer processing. Computing and knowledge
processing applications improve living conditions in many areas and in many
ways. Medical science is one of the potential domains where computing is used
for the advancement of human lifestyle. Many research programs are carried out
using a combination of computing processes and medical applications for the
betterment of human health (Jiang et al., 2017). Most of the medical
applications that use computation aid practitioners in taking decisions on
drug recommendations and the identification of diseases. In all of these
fields, computing technologies are well established and applied. Medical
science is also using technology, but it defines proper ways to implement
technology, and the processing depends on the requirements of the medical
applications, which is a challenging task for researchers.

The experts use computation processes in the medical science field (He et al.,
2019) as a tool, but the tools are advanced by computer specialists with
different technical procedures. The technical experts derive the concept and
incorporate it in several medical science applications. Data and image
analyses are carried out by the medical field computational process. The
research process comprises analysis and design of the solution and implements
the identified existing algorithms in medical science.

5.2 APPLICATIONS OF BIOMEDICAL AI SYSTEM

AI is playing a vital role in the Industry 4.0 revolution. AI is the
replication of human intelligence processes by machines, especially computer
systems. These processes include acquisition of information and procedures for
using the information, reaching approximate or definite conclusions, and
self-correction. AI technologies are used in Big Data Analytics, Autonomous
robots, Simulation, Internet of
Healthcare Applications Using Biomedical AI System 101
issues (Rodellar et al., 2018; Xu et al., 2019; William et al., 2018; Johnson
et al., 2018). It helps the well-being industry in two ways: making decisions
for experts and for common people. Common people use these self-recommended
decisions, based on captured data and with the support of AI, for instant
remedies. At the same time, it helps the experts to decide a complex case.

Treatment is a process of prescribing medication or surgical processes in the
medical industry. AI techniques support prescribing medication based on
preceding cases and training. It is highly recommended to rely on a training
data set with all possible comparisons of the rules that are applicable
according to the cases. The AI system will consider the regional factors,
experts' opinions, and previous cases with a similar pattern and will play a
major role in prescribing the treatment. AI can be used for the life care
process with the support of robotics. Robots are used in emergency situations,
for rescue purposes, as emergency servants, and in all other possible manners.
In the current scenario, they are performing activities as an expert tutor and
train the experts too.

Owing to the availability of biomedical data and the rapid development of
machine learning, deep learning, natural language processing (NLP), robotics,
and computer vision techniques, it has become possible to create successful
applications of AI in biomedical and healthcare settings. Powerful AI
techniques can reveal medically relevant knowledge hidden in the massive
volume of data, which in turn can contribute to decision making.
radial basis function network, perceptron, backpropagation, logistic
regression, gradient descent, and Hopfield network. The radial basis function
network has an activation function called the radial basis function in the
hidden layer. It has an input layer, a hidden layer, and a linear output; it
is used for time series prediction, classification, and system control. The
perceptron is a linear supervised binary classifier. There are two types of
perceptron: single layer and multilayer. The multilayer perceptron is called
the neural network. It has input layer weights and bias, a net sum, and an
activation function.

Backpropagation is used for classification and is essential for neural network
training. It propagates backward through the network the information about the
error made by wrong guesses of the neural network. According to the error
information, the parameters of the neural network are then changed one step at
a time. Logistic regression is a binary classification method that maps the
signal into the range from 0 to 1 through a nonlinear function. It is used to
calculate the probability that a set of inputs matches the label. Gradient
descent is an algorithm used in neural networks to adjust the weights based on
the error rate and to find a local minimum of the function, which is known as
the optimum of the function. The Hopfield network has feedback connections and
is therefore called recurrent. Hopfield networks are fully interconnected
neural networks and are applied to image segmentation only if it is posed as
an optimization problem.

5.3.2.3 DEEP LEARNING

Deep learning is an extension of ML based on neural network concepts that can
be supervised, semi-supervised, or unsupervised. There are many deep learning
algorithms available, such as the deep neural network (DNN), deep belief
network (DBN), recurrent neural network (RNN), convolutional neural network
(CNN), restricted Boltzmann machine (RBM), autoencoder network, and long
short-term memory (LSTM). LSTM is a special type of RNN. The advantage is that
the expected output is compared with the model's output, with updates in
weights. All of these algorithms are used in computer vision, speech
recognition, NLP, drug design, and medical image analysis. A DNN is a
feedforward network with a complex nonlinear relationship, where data flows
from input to output without looping back. It has multiple layers between the
input and the output. A DBN is an unsupervised probabilistic algorithm that
consists of a multilayer of the stochastic
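
The descriptions of logistic regression and gradient descent given earlier on
this page can be combined in one short sketch: a logistic model maps a
weighted sum into the range 0 to 1, and gradient descent adjusts the weights
step by step according to the error. The function names, learning rate,
number of epochs, and the one-feature data set are illustrative assumptions,
not values from the chapter.

```python
import math


def sigmoid(z):
    # Maps any real-valued signal into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.3, epochs=3000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y                   # error signal for this sample
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the gradient, averaged over the samples.
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b


# Hypothetical one-feature data set: small values are class 0, large are class 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
p_low = sigmoid(w[0] * 1.0 + b)    # estimated probability of class 1 at x = 1
p_high = sigmoid(w[0] * 4.0 + b)   # estimated probability of class 1 at x = 4
```

A predicted probability above 0.5 is read as class 1 and below 0.5 as class
0; the learning rate controls how far each step moves against the gradient.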
The MapReduce method with the stochastic gradient descent algorithm is used
for prediction. Logistic regression is trained on EHR data with the stochastic
gradient descent algorithm. There are many architectures developed for
monitoring personal health, such as the Meta Cloud Data Storage architecture,
which is used to move the collected data into cloud storage. To store a huge
set of data, the Hadoop Distributed File System is used. To secure the big
data that are collected and moved into the cloud, the data are protected by
using an integrated architecture model called the Grouping and Choosing (GC)
architecture with the Meta Fog Redirection architecture. Logistic regression
is integrated with the MetaFog architecture and used for the prediction of
diseases from the historical record when the dependent variable exists.

5.3.2.5 COGNITIVE TECHNOLOGIES

Cognitive technologies are widely introduced in health care to reduce human
decision making and have the potential to correct for human error in providing
care. Medical errors are the third leading cause of death all over India and
in the world, but they are not for the most part due to exceptionally bad
clinicians. Instead, they are frequently attributed to cognitive errors (such
as failures in perception, failed heuristics, and biases) and to an absence or
underuse of safety nets and other protocols.

The use of AI technologies promises to reduce the cognitive workload for
doctors, thereby improving care, diagnostic accuracy, clinical and operational
productivity, and the overall patient experience. While there are reasonable
concerns and discussions around AI taking over human occupations, there is
limited evidence to date that AI will replace people in health care. For
example, many studies have suggested that computer-aided readings of
radiological images are just as accurate as (or more accurate than) the
readings performed by human radiologists.

Rather than large-scale job loss resulting from the automation of human work,
we propose that AI provides an opportunity for a more human-centric approach
to augmentation. Unlike automation, it is increasingly presumed that smart
people and smart machines can coexist and make way for better results than
either alone. AI systems may perform a few health care tasks with limited
human intervention,
utilized for validation, and the area under the receiver operating characteristic curve (AUC) is used to measure its performance.

Digital mammograms are among the most difficult medical images to read because of their low contrast and the differences among tissue types. Important visual clues of breast cancer include preliminary signs of masses and calcification clusters. Unfortunately, in the early stages of breast cancer, these signs are very subtle and varied in appearance, making diagnosis difficult and challenging even for specialists. This is the main reason for developing classification systems to help specialists in medical institutions. Because of the importance of automated image categorization in assisting physicians and radiologists, much research in the field of medical image classification has been done recently. Despite all this effort, there is still no widely used method to classify medical images. This is because the medical domain requires high accuracy and, in particular, a very low rate of false negatives. In addition, another important factor that influences the success of classification methods is working in a team with medical specialists, which is desirable but often not achievable. The consequences of errors in detection or classification are costly. Mammography alone cannot prove that a suspicious area is malignant or benign. To decide that, the tissue has to be removed for examination using breast biopsy techniques. A false-positive detection may cause an unnecessary biopsy. Statistics show that 20–30 percent of breast biopsy cases are proved cancerous. In a false-negative detection, an actual tumor remains undetected, which could lead to higher costs or even cost a human life. This is the trade-off that arises in developing a classification system that could directly affect human life. In addition, tumor appearance varies: tumors have different shapes, and some of them have the characteristics of normal tissue. The density level of a tumor determines the stage. The stage determination process is explained in the next section.

Lung cancer is detected and analyzed using computed tomography images with a CNN as a computerized classifier that identifies predictive features. Deep feature extraction with a pretrained CNN is more efficient than a decision tree classifier. A VGG-f pretrained CNN is used for linear unit feature extraction, and the result is better when a feature ranking algorithm is followed by a random forests classifier. Fine-tuning a CNN is a strategy applied to a network that has been pretrained with a huge set of labeled
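
The AUC measure and the false-positive/false-negative trade-off described on this page can be made concrete with a small sketch. It is not the evaluation code of any system cited in the chapter: the labels, scores, thresholds, and function names below are all invented for illustration, and the AUC is computed with the simple pairwise (rank-based) definition rather than a library routine.

```python
# Toy evaluation of a binary classifier (1 = malignant, 0 = benign).
# All labels and scores are hypothetical values chosen for illustration.

def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1  # missed tumor (false negative)
        elif predicted == 1:
            fp += 1  # unnecessary biopsy (false positive)
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7]

strict = sensitivity_specificity(y_true, scores, 0.5)    # fewer false positives
lenient = sensitivity_specificity(y_true, scores, 0.33)  # fewer false negatives
area = auc(y_true, scores)
print(strict, lenient, area)
```

Lowering the threshold in this toy data removes the missed tumor at the price of one extra false alarm, which is the trade-off the text describes; the AUC summarizes performance across all thresholds.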
Healthcare Applications Using Biomedical AI System 117
APPLICATIONS OF ARTIFICIAL
INTELLIGENCE IN BIOMEDICAL
ENGINEERING
PUJA SAHAY PRASAD1*, VINIT KUMAR GUNJAN2,
RASHMI PATHAK3, and SAURABH MUKHERJEE4
1 Department of Computer Science & Engineering, GCET, Hyderabad, India
2 Department of Computer Science & Engineering, CMRIT, Hyderabad, India
3 Siddhant College of Engineering, Sudumbre, Pune, Maharashtra, India
4 Banasthali Vidyapith Banasthali, Rajasthan, India
* Corresponding author. E-mail: puja.s.prasad@gmail.com
are two such areas where analyzing the amount of data can be time-consuming and overwhelming. In fact, AI will transform healthcare in the near future. Various health-related phone apps use AI, such as Google Assistant, and there are also apps like Ada Health Companion that use AI to learn by asking smart questions, helping people feel better, take control of their health, and predict diseases based on symptoms. In expert systems, AI acts as an expert within a computer system that emulates the decision-making ability of a human expert. Expert systems like MYCIN for bacterial diseases and CaDET for cancer detection are widely used. Image processing is critical in healthcare because disease has to be detected from X-ray, MRI, and CT images, so an AI system that detects minute tumor cells is very useful for the early detection of diseases. One of the biggest achievements is the surgical robot, a revolutionary invention that can change surgery completely.

However, before AI systems can be deployed in healthcare applications, they need to be “trained” with data generated from clinical activities, such as screening, diagnosis, treatment assignment, and so on, so that they can learn similar groups of subjects and associations between subject features and outcomes of interest. These clinical data often exist in, but are not limited to, the form of demographics, medical notes, electronic recordings from medical devices, physical examinations, and clinical laboratory results. AI is intended to analyze medical reports and prescriptions from a patient’s file, medical expertise, and external research to assist in selecting the right, individually customized treatment pathway. Nuance Communications provides a virtual assistant solution that enhances interactions between clinicians and patients, improving the overall patient experience and reducing physician stress. The platform enables conversational dialogue and prebuilt capabilities that automate clinical workflows. The healthcare virtual assistant employs voice recognition, electronic health record integrations, strategic health IT relationships, voice biometrics, text-to-speech, and prototype smart speakers customized for a secure platform.

IBM Medical Sieve is an ambitious long-term exploratory project that plans to build a next-generation “cognitive assistant” capable of analytics and reasoning over a vast range of clinical knowledge. Medical Sieve can help in making clinical decisions in cardiology and radiology; in other terms, it is a “cognitive health assistant.” It can analyze radiology images to detect problems reliably and quickly.
heuristic feature selection procedures may lose information in the images. Unsupervised learning methods such as principal component analysis (PCA) or clustering can be used for data-driven dimension reduction. CNN was first proposed and advocated for the analysis of high-dimensional images. The inputs to a CNN are the appropriately normalized pixel values of the images. The CNN then transforms the pixel values by alternately weighting them in the convolution layers and sampling them in the subsampling layers. The final output is a recursive function of the weighted input.

6.3.2 NATURAL LANGUAGE PROCESSING

The main focus of NLP is to process narrative text into a machine-understandable form. Much clinical information, such as physical examination notes, laboratory reports, discharge summaries, and operative notes, is unstructured and incomprehensible to a computer program. For this type of unstructured data, the main target of NLP is to extract meaningful information from the narrative text so that clinical decisions become easier. ML algorithms are more useful for genetic data and EP data, as these data are easily understood by the machine after quality control or preprocessing. So, the main aim of NLP is to assist in decision making by processing the narrative text. The two main components of NLP are

(a) text processing and
(b) classification.

In text processing, disease-relevant keywords are identified based on historical data. After that, a keyword subset is selected by inspecting the keywords' effects on the classification of abnormal and normal cases. The confirmed keywords then enrich the structured clinical data to assist in medical decision making. NLP pipelines have been established to assist medical decision making by monitoring adverse effects, alerting treatment arrangements, and so on. Introducing NLP to analyze chest X-ray reports would help an antibiotic assistant system alert doctors to the need for anti-infective treatment. Laboratory-based adverse effects can also be monitored automatically by using NLP. NLP also helps to diagnose diseases. For example, 14 variables associated with cerebral aneurysm disease were successfully used to classify persons with cerebral diseases and normal persons. NLP has also been used to mine peripheral arterial disease-associated keywords from narrative clinical notes. These keywords are then used for classification between the
normal person and the patients with peripheral arterial disease, with 91% accuracy.

6.4 DISEASE TYPES CURRENTLY TACKLED BY AI COMMUNITIES

Stroke is a frequently occurring disease that affects more than 500 million persons worldwide. China and North America are among the regions with the highest death rates due to stroke. Medical expenses due to stroke are also high and put a heavy burden on families and countries. So, research on the prevention and treatment of stroke is of great significance. AI methods have been used more and more in stroke-related studies, as stroke is one of the main causes of death throughout the country. The three main areas of stroke care are early disease prediction and diagnosis, treatment, and outcome prediction and prognosis evaluation. Because early stroke symptoms often go undetected or unjudged, only a few patients receive treatment on time. For predicting early stroke, movement-detecting devices are already available. PCA and a genetic fuzzy finite state machine algorithm were implemented in the device to build the solution. The detection procedure included a stroke-onset detection stage and a human movement recognition stage. The movement recognition stage helps to recognize abnormal behavior in which movement differs from the normal pattern. Collecting data about pathological gaits is also helpful for predicting a stroke. Hidden Markov models and SVM are used here, and they could correctly classify 90.5% of the subjects to the right group. MRI and CT are good neuroimaging techniques for disease evaluation. Some works in the literature found that applying ML methods to neuroimaging is useful for diagnosis. Some used a support vector machine on MRI data, which helps in identifying endophenotypes of motor disability after stroke. Some researchers also use a three-dimensional CNN for lesion segmentation in multimodal brain MRI. Here, a fully connected conditional random field is used for postprocessing the CNN segmentation maps. Gaussian process regression is also used for anatomical MRI images of stroke. ML is also used to analyze the CT scans of patients.

After stroke, a free-floating intraluminal thrombus may form as a lesion, which is hard to distinguish from a carotid plaque in CT imaging. For this, a researcher used three ML algorithms to classify these two kinds of lesion by quantitative shape analysis: SVM, linear discriminant analysis, and an artificial neural network. Treatment using ML has been useful for analyzing
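
As a rough illustration of one of the three classifiers named above, the following sketch fits a two-class Fisher linear discriminant on two quantitative shape features. It is only a minimal example under stated assumptions, not the cited researcher's method: the feature values, the class labels (0 = plaque, 1 = thrombus), and all function names are hypothetical, and a real study would use many more samples and features.

```python
# Two-class Fisher linear discriminant analysis (LDA) on 2-D features, pure Python.
# The "shape features" below are invented numbers, used only to show the mechanics.

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 within-class scatter: sum of outer products of the centred samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_lda(class0, class1):
    """Return (w, threshold) with w = Sw^-1 (m1 - m0) and the midpoint threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold


def predict(w, threshold, x):
    """Project x onto w and compare with the threshold (1 = thrombus, 0 = plaque)."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


plaque = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]    # hypothetical class 0
thrombus = [(4.0, 5.0), (4.5, 5.5), (5.0, 4.8), (4.2, 5.2)]  # hypothetical class 1

w, threshold = fit_lda(plaque, thrombus)
train_predictions = [predict(w, threshold, x) for x in plaque + thrombus]
new_predictions = [predict(w, threshold, (1.1, 2.0)), predict(w, threshold, (4.8, 5.1))]
print(train_predictions, new_predictions)
```

The discriminant reduces each lesion to a single projected score, which is why LDA is often used as a simple baseline next to SVMs and neural networks.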
tools, conversational chat and voice assistants, and emotion recognition tools into the artificial platform. The virtual nurse avatar app can also be programmed to perform detailed counselling for a behavioral health problem. This type of app talks to patients through a smartphone about their health condition. The patient has no need to type: they only talk to the virtual avatar about their health condition, and the conversation can then be recorded and transcribed for later review by the health provider. The virtual avatars speak to the person empathetically and naturally, which might also benefit people who are elderly, ailing, or living with chronic diseases.

By hearing an unusual voice or detecting an unusual emotional tone in a patient, such as depression or anxiety, the app performs emotional analysis and alerts the health provider, who may prescribe medications.

6.6 FUTURE MODELS

Research on developing advanced robots continues for an ever-expanding variety of applications in the healthcare area. For instance, a research team led by Gregory Fischer is developing a high-precision, compact surgical robot that will operate entirely within the MRI scanner bore, as well as the software and electronic control systems that go with it, to improve the accuracy of prostate biopsy.

The main aim is to develop a robot that can work inside an MRI scanner. However, there are a number of physical challenges in placing a robot inside the scanner: because the scanner uses a powerful magnet, the robot must be made of nonferrous materials. Most of the technical difficulties have already been overcome by this team. Besides this, they need to develop the software interfaces and communication protocols for properly controlling the robot with planning systems and higher level imaging (Holzinger, 2012). For the nontechnical surgical team, the robot must also be easy to sterilize, easy to place, and easy to set up in the scanner. Because of all this, it is a huge system integration task that requires many iterations of the software and hardware to get to that point.

In other projects, a rehabilitation robot is integrated with virtual reality to expand the variety of therapy exercises, increasing the effects of physical treatment and patient motivation. Nowadays, discoveries are being made using nanomaterials and nanoparticles. For example, nanoparticles can easily traverse the “blood–brain barrier.” In the coming future, nanodevices can be
and future vision. Telemedicine Journal and e-Health, 9(4), pp. 379–386.
Patel, V.L., Groen, G.J. and Scott, H.M., 1988. Biomedical knowledge in explanations of clinical problems by medical students. Medical Education, 22(5), pp. 398–406.
Patel, V.L., Shortliffe, E.H., Stefanelli, M., Szolovits, P., Berthold, M.R., Bellazzi, R. and Abu-Hanna, A., 2009. The coming of age of artificial intelligence in medicine. Artificial Intelligence in Medicine, 46(1), pp. 5–17.
Poon, H. and Vanderwende, L., 2010, June. Joint inference for knowledge extraction from biomedical literature. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 813–821). Association for Computational Linguistics.
Reggia, J.A. and Sutton, G.G., 1988. Self-processing networks and their biomedical implications. Proceedings of the IEEE, 76(6), pp. 680–692.
Simpson, M.S. and Demner-Fushman, D., 2012. Biomedical text mining: a survey of recent progress. In Mining text data (pp. 465–517). Springer, Boston, MA, USA.
Tiwana, M.I., Redmond, S.J. and Lovell, N.H., 2012. A review of tactile sensing technologies with applications in biomedical engineering. Sensors and Actuators A: Physical, 179, pp. 17–31.
Wilson, E.A., 2011. Affect and Artificial Intelligence. University of Washington Press.
Zeng, D., Chen, H., Lusch, R. and Li, S.H., 2010. Social media analytics and intelligence. IEEE Intelligent Systems, 25(6), pp. 13–16.
CHAPTER 7
science. At this point in time, the sheer size, dimensionality, and amount of scientific data have become so enormous that dependence on intelligent and automated systems is becoming the need of the hour. Algorithms are predominantly integrated to provide faster access and accurate results; the greater the requirement, the more efficient the algorithms must be to provide better discovery in the processing of data.

Automated data collection: AI has started showing a great impact in the field of health care; it is expected to provide machines that will help doctors, nurses, and other technicians save time on tasks. Techniques such as voice-to-text transcription will now help order tests, prescribe medicines, and provide chart notes. IBM’s Watson has started to provide an opportunity to mine data and help physicians find the best and most efficient treatment for their patients. It helps by analyzing millions of medical papers using natural language processing to provide treatment plans.

Gene function annotation: The major tasks involved in gene function annotation are to identify enriched biological themes, discover enriched function-related gene groups, cluster redundant annotation terms, categorize and display related many-genes-to-many-terms relationships in a 2D view, search for other related or similar genes that are not in the list, identify proteins that interact with one another, provide names of genes in groups, list the genes connected with diseases, highlight protein functional domains and their motifs, highlight related literature, and convert gene identifiers.

Prediction of transcription factor binding sites: Transcription factors are essential gene regulators that play a characteristic role in development, cell signaling, and the cell cycle, and they are associated with various diseases. Thousands of position weight matrices (PWMs) are available to choose from for the detection of specific binding sites. Prediction is mainly based on these PWMs, which are prone to false positives.

Simulation of molecular dynamics: Molecular dynamics is a computer simulation technique useful for analyzing the physical movements of atoms and molecules. Molecular dynamics simulation allows the study of complex and dynamic processes that happen in a biological system. The study includes conformational changes, protein stability, protein folding, ion transport in biological systems, and molecular recognition of proteins, DNA, and membranes, and it also motivates other studies such as drug designing,
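
To make the idea of molecular dynamics concrete, the sketch below integrates Newton's equations of motion with the velocity Verlet scheme, a time-stepping method commonly used in molecular dynamics codes. It is a deliberately tiny toy under stated assumptions: a single particle on a harmonic "bond" in one dimension, with a hypothetical spring constant, mass, and time step in arbitrary units. Real biomolecular simulations use force fields with many interacting atoms.

```python
import math

# Velocity Verlet integration of one particle on a harmonic spring (1-D toy model).
# K, MASS, and the time step are hypothetical values in arbitrary units.

K = 1.0      # spring constant
MASS = 1.0   # particle mass


def spring_force(x):
    """Restoring force of a harmonic bond with equilibrium position 0."""
    return -K * x


def total_energy(x, v):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * MASS * v * v + 0.5 * K * x * x


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position and velocity for `steps` time steps; return the trajectory."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


trajectory = velocity_verlet(1.0, 0.0, spring_force, MASS, 0.01, 1000)
x_end, v_end = trajectory[-1]
energy_drift = abs(total_energy(x_end, v_end) - total_energy(1.0, 0.0))
print(x_end, v_end, energy_drift)
```

For this oscillator the exact position is cos(t), so the final position can be compared with cos(10); the small energy drift is the reason this integrator is popular for long simulations.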
Biomedical Imaging Techniques Using AI Systems 151
is PET, where the isotopes emit positrons (positively charged electrons) instead of gamma rays. Commonly, PET is based on a positron-emitting isotope of fluorine that is incorporated into glucose as fluoro-deoxyglucose.

these images. Today’s multidetector row CTs acquire multiple slices at sub-millimeter spatial resolution, with processing speeds measured in milliseconds rather than hours. Iodinated contrast agents are used with CT since they block X-rays based on their density compared with that of normal tissue.
and in-country privacy and security regulations are provided that certify that the data is protected and secure.

DeepMind has taken on many healthcare projects across the world. In collaboration with UCL’s radiotherapy department, it has begun work to reduce the amount of time taken to plan treatments.10 For machine learning, DeepMind has been given access to 1 million eye scan images, along with the related patient data. It is set to train itself to read the scans and spot early signs that may indicate degenerative eye conditions, and also to reduce the time taken for diagnosis to one fourth. The convolutional neural network that was built for this system was the first deep learning model intended to learn control policies directly from high-dimensional sensory input using a reinforcement learning algorithm. This remarkable accomplishment was soon improved upon by succeeding forays into gaming. During 2015, DeepMind, in collaboration with the Royal Free NHS Trust, created a patient safety application called “Streams.” It reviews clinical test results to check for signs of sickness and sends notifications and alerts to staff instantly if an emergency examination is required. The application also helps physicians to rapidly check for other critical or serious conditions such as acute kidney injury, and it provides the results of blood tests, X-rays, and scans at the press of a button. Nurses and other assistants said that the application saved them up to 2 h a day.

Google Glass was initially helpful for recognizing and translating text, recognizing objects and searching for relevant matches, looking at posters and playing videos, and getting directions on the go, all of this happening in front of the eye.11 Some editions of Google Glass had no lenses in them; what all editions had is a thick area of the frame over the right eye, where Google had inserted the screen for the glasses. To look at the screen, one has to glance up with the eyes. The placement was quite important, since a screen inserted in the direct line of vision may result in serious problems. The display has a resolution of about 640 × 360 pixels, which is on the low side for mobile devices. The camera has about 5-megapixel quality, and it also records video at about 720p. The only issue is the battery life, which lasts for about 5 h of average usage; taking a longer video or using the glass for a longer time might drain the battery quickly. Google Glass has a storage capacity of about 16 GB, and it also synchronizes with Google Drive for added accessibility to the videos and photos taken by
the user. It is also equipped with a micro-USB port for transferring files and charging the device. The frame is generally lightweight, and it has a replaceable nose pad in case of accidental breakage. Sounds of phone calls and other notifications are produced through bone-conduction transfer, that is, by passing vibrations directly to the skull and thus transmitting sound to the ears. The glass is an optical head-mounted display worn as a pair of spectacles. With its multitasking capability and responsiveness to hands-free voice and motion commands, it has gained acknowledgment in the medical field, where doctors can use it as a surgical navigation display while performing surgery. The first-ever surgery using Google Glass was done by Dr. Marlies P. Schijven in 2013, at the Academic Medical Centre, Netherlands. In the operating theatre, doctors can see the medical data without even having to turn away from the patient. Researchers have found that the navigation options helped surgeons find tumors, but that they can also induce a form of tunnel vision, or blindness to part of the field, that could make surgeons miss unconnected lesions or problems around them. Google Glass will further be helpful in recording the surgery for documentation purposes, to keep track of the patient’s medical record, and to assess surgical competency in order to check ongoing professional skill development and certification. The arrival of Google Glass will bring a technical change in the way people get to understand the world.

An ultrasonography exam takes quite a lot of time in identifying the planes in the brain, which needs an ample amount of training and manual work.12 There could also be a missed or delayed diagnosis. Now, with AI systems, users will just need to find a starting point in the fetal brain, and the device will automatically take measurements after identifying the standard planes of the brain. The data or documentation is retained, as the patient may visit for examination some other day; this will help in a more reliable diagnosis.

EchoNous has developed a convolutional neural network for the automatic detection of the urinary bladder in high-quality ultrasound images captured with Uscan, taking advantage of the “high spatial density fanning technique”.13 With the help of the captured image, one can compute the urinary bladder volume with much higher accuracy. Uscan actively recognizes the contours of the bladder, and its measurements are more accurate than the results obtained from existing scanners. EchoNous Vein is an ultrasound-based device intended particularly for nurses to improve peripheral IV catheter placements. It is being developed for handling
old, who visited doctors between January 2016 and January 2017. The medical charts associated with every patient contained text written by doctors and a few laboratory test results. To ease the work performed by the AI, Zhang and his team had doctors annotate the records to mark the parts of the text linked with the patient's complaints, the period of illness, and the tests performed. When the testing phase began using unseen cases, the AI was able to identify roseola, chickenpox, glandular fever, influenza, and hand–foot–mouth disease with an accuracy of about 90%–97%. It may not be a perfect score, but even doctors cannot predict correctly at times. The performance was compared with that of some 20 pediatricians with various years of clinical experience. The AI outperformed the junior doctors, while the senior doctors performed better than the AI. When doctors are busy seeing about 60–80 patients a day, they can gather only a little information about each one, and they may therefore make mistakes in recognizing the seriousness of the disease or illness. That is where an AI can be counted on. AI can be efficiently used to triage patients in emergency departments, provided that it can predict the level of illness with sufficient data and can also confirm whether the patient needs to visit a doctor or just has a common cold. It is likely that junior doctors who depend on this AI system could miss out on learning to recognize patterns in patients’ complaints. The team looks forward to training AI systems that can also diagnose adult diseases.

A team from Beth Israel Deaconess Medical Center and Harvard Medical School has developed an AI system that predicts disease after being trained to examine pathological images and perform pathological diagnosis.17 The AI-powered system incorporates machine learning and deep learning algorithms, which train the machine to understand the complex patterns and structures found in real-time data by building a multilayer perceptron neural network, a procedure that resembles the learning that occurs in layers of neurons. In an evaluation in which slides of lymph node cells had to be classified as cancerous or not, the automated diagnosis method gave an accuracy of about 92%, which nearly matched the pathologist's accuracy of about 96%. Recognizing the presence or absence of metastatic cancer in the patient’s lymph nodes is an important task performed by pathologists. Looking into
knowledge disparities got from the past and future human civilizations, AI will quickly discover the type of potential that cliché would provide to human development for a decade, century, or millennium starting from now. Many high-end machine-learning models produce outcomes that are difficult for unaided humans to understand. An AI would absorb an enormous quantity of extra hardware after attaining a certain threshold of proficiency. A faster-emerging artificial intelligence system represents a greater scientific and technological challenge.

In this era of advancement, we can see that machines are learning while people are hooked on their mobile phones, knowing that the phone is simply a tool we fiddle with. As technology gradually advances to ease enormous tasks with the help of machines, people may become jobless. AI-based applications and machines are now being used to clean the dishes, serve as a waiter for 24 h on a single charge, check the cash balance at the bank, talk when you are bored, answer your mysterious questions, help you in cooking, recite a poem or even sing a lullaby when you cannot sleep, remind you to drink water, check recent missed calls and dial a number, give you recent weather updates, drive you home safely in a driverless car, find the shortest route to reach your destination quickly, and much more.

AI has improved medical analysis and decision-making performance in numerous medical fields. Physicians may need to adapt to their new roles as data accumulators, presenters, and patient supporters, and the medical education system may have to give them the equipment and techniques to do so. How much AI-enhanced applications and systems will affect existing medical practice, including disease analysis, detection, and treatment, will probably be determined by how AI applications combine with healthcare systems that are under revolutionary financial development with the adoption of molecular and genomic science. Who benefits from, or is controlled by, AI applications and systems is yet to be determined, but balancing rigid regulatory safeguards and market services to ensure that people and patients benefit the most should be of great priority. AI is a one-way road, and it is the road we will end up taking.

In this article, we came across a large number of applications and systems that have started to create wonders in the field of medical science, such as drug discovery, automated data collection, literature
a heart attack. Hospitals can make use of appropriate decision support systems, thus minimizing the cost of clinical tests. Nowadays, hospitals employ hospital information systems to manage patient data. Terabytes of data are produced every day. To avoid the impact of poor clinical decisions, quality services are needed. The huge volume of data generated by the healthcare sector must be filtered, for which effective methods of extracting the useful data are needed. The mortality rate in India is increasing due to noncommunicable diseases. Data from health organizations such as the World Health Organization and the Global Burden of Disease study state that most deaths are due to cardiovascular diseases.

Heart disease is a predominant reason for the increase in the mortality rate. A method to detect the presence of heart disease in a cost-effective way has become essential. The objective of the article is to compare the performance of various machine learning algorithms in order to construct a model that gives better prediction accuracy.

8.2 OVERVIEW OF HEART DISEASE

Heart diseases or cardiovascular diseases are a class of diseases that involve the heart and blood vessels. Cardiovascular disease includes coronary artery diseases (CADs) like angina and myocardial infarction (commonly known as a heart attack). In coronary heart disease, a waxy substance called plaque develops inside the coronary arteries, which are primarily responsible for supplying oxygen-rich blood to the heart muscle. When plaque accumulates in these arteries, the condition is termed atherosclerosis. The development of plaque happens over many years. Over time, the plaque deposits harden or rupture (break open). Hardened plaque eventually narrows the coronary arteries, which in turn reduces the flow of oxygen-rich blood to the heart. When plaque ruptures, blood clots form on its surface. The size of the blood clot also determines the severity of the situation: a larger clot can block the flow through the coronary artery. If the blood flow has stopped and is not restored very quickly, that portion of the heart muscle begins to die.

When this condition is not treated as an emergency, a heart attack occurs, leading to serious health problems and even death. A heart attack is a common cause of death worldwide. Some of the symptoms of the heart
Analysis of Heart Disease Prediction Using Machine Learning Techniques 175
activities. The function of the heart is affected by various conditions that are collectively termed heart disease. Some of the common heart diseases are CAD, cardiac arrest, congestive heart failure, arrhythmia, stroke, and congenital heart disease.

The symptoms used to predict heart disease depend upon the type of heart disease; each type has its own symptoms. For example, chest pain is one of the symptoms of coronary artery disease, but not all people have the same symptoms, and some may have chest pain as a symptom of indigestion. The doctor confirms heart disease from the patient’s diagnostic report and various other parameters. Some of the most common heart diseases are listed in Table 8.1 with their description.
build using machine learning will be precise and it reduces the unknown risk. Machine learning techniques take different approaches and build different models depending on the sort of information concerned. The value of machine learning technology is recognized in the health-care industry, which has a large amount of data. It helps medical experts to predict the disease and leads to improved treatment.

Predictive analytics (Hurwitz, 2018) helps anticipate changes based on understanding the patterns and anomalies within the data. Using such models, research must be done to compare and analyze a number of related data sources to predict outcomes. Predictive analytics leverages sophisticated machine learning algorithms to gain ongoing insights. A predictive analytics tool requires that the model is constantly provided with new data that reflects the business change. This approach improves the ability of the business to anticipate subtle changes in customer preferences, price erosion, market changes, and other factors that will impact future business outcomes.

Creating a machine learning application or operationalizing a machine learning algorithm is an iterative process, referred to as the machine learning cycle. The learning phase has to start as clean as a whiteboard. The same level of training that is needed while developing a model is also needed after the model is built. The machine learning cycle is continuous, and choosing the correct machine learning algorithm is just one of the steps.

The steps that must be followed in the machine learning cycle are as follows:

Identify the data: Identifying the relevant data sources is the first step in the cycle. In addition, in the process of developing a machine learning algorithm, one should plan for expanding the target data to improve the system.

Prepare data: The data must be cleaned, secured, and well-governed. If a machine learning application is built on inaccurate data, the chance of failure is very high.

Select the machine learning algorithm: Several machine learning algorithms are available, out of which the one best suited to the data and business challenges must be chosen.

Train: The model is created by training. Depending on the type of data and algorithm, the training process may be supervised, unsupervised, or reinforcement learning.

Evaluate: Evaluate the models to find the best algorithm.

Deploy: Machine learning algorithms create models that can be
Kim and Lee (2017) used the dataset taken from the sixth Korea National Health and Nutrition Examination Survey to diagnose heart-related diseases. Feature extraction is done by statistical analysis. For classification, a deep belief network is used, which obtained an accuracy of 83.9%.

Olaniyi and Oyedotun (2015) took 270 samples and divided them into a training dataset and a testing dataset in a 60:40 ratio, that is, 162 training samples and 108 testing samples for the network input. The target of the network is coded as (0 1) if heart disease is present and (1 0) if heart disease is absent. The dataset is taken from the UCI machine learning repository. A feedforward multilayer perceptron and a support vector machine (SVM) are used to classify heart disease. The results obtained are 85% for the feedforward multilayer perceptron and 87.5% for the SVM.

In the work “Prediction of Heart Disease Using Deep Belief Network” (2017), a deep belief network is utilized to predict heart disease that is likely to occur in human beings. It was developed in the MATLAB 8.1 development environment. The proposed solution is then compared with the results of a CNN for the same task. The proposed solution yields 90% accuracy in the prediction of heart disease, whereas the CNN achieves only 82% accuracy, thus enhancing heart disease prediction.

Ajam (2015) chose a feed-forward back propagation neural network for classifying the absence and presence of heart disease. The proposed solution used 13 neurons in the input layer, 20 neurons in the hidden layer, and 1 neuron in the output layer. The data is also taken from the UCI machine learning repository. The dataset is separated into input and target categories. The input and target samples are divided randomly into a 60% training dataset, a 20% validation dataset, and a 20% testing dataset. The training set is presented to the network, and the network weights and biases are adjusted according to its error during training. The presence and the absence of the disease are indicated by the target outputs 1 and 0, respectively. The proposed solution gave 88% accuracy experimentally.

In “Diagnosis of heart disease based BCOA,” the UCI dataset is used to evaluate heart attack. This dataset includes test results of 303 people and contains two classes, one for healthy people and the other for people with heart disease. In this work, a binary cuckoo optimization
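The 60:40 and 60/20/20 partitions reported in the studies above can be illustrated with a shuffled index split. This is a minimal sketch using only the Python standard library; the function name `split_indices`, the fixed seed, and the sample count of 100 used for the three-way split are assumptions made for illustration and are not taken from the cited works.

```python
import random

def split_indices(n_samples, fractions, seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into consecutive parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    parts, start = [], 0
    for i, fraction in enumerate(fractions):
        if i == len(fractions) - 1:
            # The last part takes whatever remains so no sample is dropped.
            end = n_samples
        else:
            end = start + round(n_samples * fraction)
        parts.append(indices[start:end])
        start = end
    return parts

# 270 samples split 60:40, as in Olaniyi and Oyedotun (2015).
train, test = split_indices(270, [0.6, 0.4])
print(len(train), len(test))  # 162 108

# A 60/20/20 training/validation/testing split on a hypothetical 100 samples.
train3, val3, test3 = split_indices(100, [0.6, 0.2, 0.2])
print(len(train3), len(val3), len(test3))  # 60 20 20
```

Shuffling before cutting keeps the split random while guaranteeing that every sample lands in exactly one part.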
The dataset has been collected from the UCI machine learning repository (Cleveland clinic dataset). The dataset includes 303 records with 14 attributes. The types of attributes present in the dataset chosen are as follows (Table 8.2):

• Input attributes
• Key attribute
• Predictable attribute
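The three attribute roles can be illustrated with a short sketch. The column names below are the names commonly used for the 14 Cleveland attributes; the `patient_id` key, the binary collapsing of the diagnosis, and the sample record are assumptions introduced only to show how one record separates into key, input, and predictable parts.

```python
# Commonly used names for the 14 Cleveland attributes (13 inputs + 1 target).
INPUT_ATTRIBUTES = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]
PREDICTABLE_ATTRIBUTE = "num"   # diagnosis: 0 = no disease, 1-4 = disease
KEY_ATTRIBUTE = "patient_id"    # hypothetical identifier, not a UCI column

def split_record(record):
    """Separate one patient record into (key, inputs, label)."""
    key = record[KEY_ATTRIBUTE]
    inputs = [record[name] for name in INPUT_ATTRIBUTES]
    # Collapse the 0-4 diagnosis into a binary label (an assumption here).
    label = 1 if record[PREDICTABLE_ATTRIBUTE] > 0 else 0
    return key, inputs, label

# Hypothetical record, for illustration only.
example = {
    "patient_id": 1, "age": 63, "sex": 1, "cp": 1, "trestbps": 145,
    "chol": 233, "fbs": 1, "restecg": 2, "thalach": 150, "exang": 0,
    "oldpeak": 2.3, "slope": 3, "ca": 0, "thal": 6, "num": 0,
}
key, inputs, label = split_record(example)
```

Keeping the key attribute out of the input vector prevents an identifier from being treated as a predictive feature.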
FIGURE 8.4 K-nearest neighbor.
Source: https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/

The equation is as given below:

model = train_model(X_train, y_train, X_test, y_test, KNeighborsClassifier)

8.5.5 SUPPORT VECTOR MACHINE

The support vector machine, also known as SVM, is a supervised learning model with associated learning algorithms that analyze data for classification and regression analysis. The SVM is a powerful machine learning technique for classifying cases. It has been employed in a range of problems, with successful applications in pattern recognition in bioinformatics, cancer diagnosis, and more.

In an SVM, an N-dimensional space is considered for plotting all the data points. The SVM performs not only linear classification but also nonlinear classification, where the inputs are implicitly mapped to a high-dimensional feature space. SVM is considered advantageous because of a technique called the kernel trick, in which a low-dimensional space is converted to a high-dimensional space and classified. Thus, the SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space for use in classification, regression, or outlier detection.

A good separation is achieved by the hyperplane that has the largest functional margin, which reduces the generalization error of the classifier. The SVM gives an accuracy of 97% when employed on the dataset. The hyperplane that is used for classification purposes is shown in Figure 8.5.

FIGURE 8.5 Support vector machine (SVM).

The equation is given as
such as ANN apply different inputs in the initial stage (Abu-Hanna and Lucas, 1998). These input files are forms of processes that are well within the context of the formerly known history of the defined database to generate an appropriate output that is expected (Figure 9.1).

The first step is initiated with the network receiving a patient’s data to make a prediction of the diagnosis. The next step would be feature selection. Once the diagnosis is completed, feature selection takes place. Feature selection provides the necessary information to differentiate between the health conditions of the patient who is being evaluated. The next step is the building of the database itself. All the available data are validated and finally preprocessed. With the help of ANN, the training and verification of the database using training algorithms can be used to predict the diagnosis. The end diagnosis predicted by the network is further evaluated by a qualified physician.
data. The factors that significantly affect any kind of diagnosis are instrumental data and laboratory data, and these are largely determined by the practitioner. Clinicians are trained to extract the relevant information from every sort of data and to identify any diagnosis that can be made. In ANN, this information is referred to as features. Features may range from biochemical symptoms to any other information that gives insight into what the ailment could be. The final diagnosis is linked to the level of expertise of the skilled clinician. ANNs have higher flexibility, and their ability to compare the information with formerly stored samples is what has enabled fast medical diagnosis. Varieties of neural networks are feasible to solve sensory activity problems, whereas some are adapted for purposeful information modeling and approximation. Irrespective of the features chosen, those selected to train the neural system must be robust and clear indicators of a given clinical scenario or pathology. The choice of features depends upon earlier choices based on medical expertise. Thus, any short, nonspecific information that is redundant to the investigation itself is avoided. Selection/extraction of appropriate features among the ones that are obtainable is sometimes carried out using various approaches. The primary tools that can be utilized for variable selection are as follows:

a. Powerful mathematical means of information mining.
b. Principal component analysis.
c. A genetic algorithm program.

With the help of suitable training examples, we train the network with the “example” data of one patient that is fed, examined, and collected as a feature. The major component that affects the prediction of diagnosis, the quality of training, and the overall result is the training sample used. A sufficient number of samples whose diagnosis is well known must be in the database used for training to enable the network to extract the information hidden within the database. The network then employs this knowledge when assessing new cases. Despite this, laboratory data received from clinics should be easily transferable to other programs for computer-aided diagnosis (Aleksander and Morton, 1995).

2. Building the database: Multilayer feed-forward neural networks, such as Bayesian, stochastic, recurrent, and fuzzy are used. The optimal neural network architecture for the maximum values for both training
A Review on Patient Monitoring and Diagnosis Assistance 201
FIGURE 9.5 ANN applied on an image document. The left figure shows the noisy scan of the image, and the right figure is the clear image after using ANN.
(Source: Adapted from Shuka, et al., 2016.)
TABLE 9.1 Sample Method for Each Set of Algorithm Type (Rows) and ISHM Problem (Columns) in Data-Driven Prognostics
Algorithm Type | Fault Detection | Diagnostics | Prognostics
Physics based | System theory | | Damage propagation models
AI-model based | Expert systems | Finite state machines |
Conventional numerical | Linear regression | Logistic regression | Kalman filters
Machine learning | Clustering | Decision trees | Neural networks
(Source: Reprinted from Schwabacher and Goebel, nd.)
SEMANTIC ANNOTATION OF
HEALTHCARE DATA
M. MANONMANI* and SAROJINI BALAKRISHANAN
Department of Computer Science, Avinashilingam Institute for Home
Science and Higher Education for Women, Coimbatore 641043, India
*Corresponding author. E-mail: manonmaniatcbe@gmail.com
accuracy and low computation time. The main objective envisaged in this chapter is to propose a semantic annotation model for identifying patients suffering from chronic kidney disease (CKD). The purpose of the semantic annotation is to enable the medical sector to process disease diagnosis with the help of an Ontograf that shows the relationship between the attributes and the presence of CKD, represented as (ckd), or the absence of CKD, represented as (not_ckd), and to attach meaningful relationships among the attributes in the dataset. The semantic annotation model will help in increasing the classification accuracy of the machine learning algorithm in disease classification. A collaborative approach of semantic annotation model and feature selection can be applied in biomedical AI systems to handle the voluminous and heterogeneous healthcare data.

10.1 INTRODUCTION

Semantic annotation of healthcare data aids in processing the keywords attached to the data attributes and deriving relationships between the attributes for effective disease classification (Du et al., 2018). Semantic annotation implemented on the basis of artificial intelligence (AI) expert systems will bring accurate and timely management of healthcare data, given the ever-increasing quantity of medical data and documents. Effective knowledge discovery can be envisaged with the foundation of AI expert systems in the field of medical diagnosis. The complexity of the medical data hinders communication between the patient and the physician, which can be simplified with semantic annotation by providing a meaningful abstraction of the features in the diagnosis of chronic diseases.

Proper implementation of semantic annotation in medical diagnosis will ensure that every tiny detail regarding the health of the patient is taken care of and that important decisions regarding their health are delivered to the patients through remote access. Heterogeneous medical information such as pharmaceutical information, prescription information, doctor’s notes or clinical records, and healthcare data is generated continuously in a cycle. The collected healthcare data need to be harnessed in the right direction by providing integrity of heterogeneous data for diagnosing chronic illness and for further analysis. Otherwise, the integrity of the vast medical data poses a major problem in the healthcare sector. Semantic annotation of the incoming healthcare data forms the basis of the research motivation, so that the problem of late diagnosis can be overcome.
the existing storage structure and analytical solutions in the field of data mining and IoT. An automated system that can process word categories easily was devised as an extension to the unsupervised model. They aimed to provide a solution for semantic annotation, and the Miller–Charles dataset and an IoT semantic dataset were used to evaluate the research work. Among human classification, the correlation achieved was 0.63. The reference dataset, that is, the Miller–Charles dataset, was used to find the semantic similarity. A total of 38 human subjects were analyzed to construct the dataset, which comprises 30 word pairs. A scale from 0 for no similarity to 4 for perfect synonym was adopted to rate the word pairs. Then, 20 frequently used terms were collected and ordered into 30 word pairs. Each pair was rated on a scale from 0 to 4 by five fellow researchers. The correlation result was 0.8 for human classification. Unsupervised training methods were used by the author to mark the groups and also to improve accuracy. The model was evaluated based on the mean squared error. For a given target word u and different neighborhood dimensions, the performance of the distributional profile of a word, represented as DPW(u), and the distributional profile of multiple word categories, represented as DPWC(u), was calculated both with affinity and without affinity. Co-occurrence cluster metrics and cosine similarity cluster metrics were evaluated. To take multiple word categories into consideration, the model was extended further and a novel unsupervised learning method was developed. Issues such as noisy dimensions from distributional profiles and sense-conflation were taken into consideration, and to curb them, dimensional reduction filters and clustering were employed in this model. By this method, the accuracy can be increased, and the model can be used more effectively. A correlation of 0.63 was achieved after evaluating the results against the Miller–Charles dataset and an IoT semantic dataset.

Sharma et al. (2015) presented an evaluation of stemming and stop word techniques on a text classification problem. They summarized the impact of stop word removal and stemming on feature selection. The experiment was conducted with 64 documents having 9998 unique terms. The experiments were conducted using nine documents with frequency threshold values (sparsity value in %) of 10, 20, 30, 40, 50, 60, 70, 80, and 90. The threshold is the proportion value instead of the sparsity value. Experimental results show that the removal of stop-words decreases the size of the feature set. They have found the
rule mining algorithm checks each feature in the dataset and generates a keyword for the satisfied condition. A relationship exists between the selected feature and the binary classification type “yes” if the feature satisfies the given if–then condition specified in the rule mining algorithm. If the condition is not satisfied, then a relationship is expressed between the selected feature and the binary classification type “no.” The result is context-aware data represented in the form of an Ontograf that aids in the analysis of the medical diagnosis.
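The if–then annotation step described above can be sketched as follows. This is an illustration only: the feature names, thresholds, and keywords in `RULES` are hypothetical placeholders (not clinical cut-offs and not the rules used in this chapter), and the output is a plain list of triples rather than an ontology.

```python
# Hypothetical if-then rules: feature name -> (condition, keyword).
# Thresholds are illustrative placeholders, not clinical guidance.
RULES = {
    "serum_creatinine": (lambda value: value > 1.2, "high_creatinine"),
    "hemoglobin": (lambda value: value < 12.0, "low_hemoglobin"),
}

def annotate(record):
    """Return (feature, keyword, relation) triples for one patient record.

    The relation is "yes" when the feature satisfies its rule and "no"
    otherwise; a keyword is generated only for satisfied conditions.
    """
    triples = []
    for feature, (condition, keyword) in RULES.items():
        if feature not in record:
            continue  # no rule can be checked for a missing feature
        satisfied = condition(record[feature])
        triples.append((feature,
                        keyword if satisfied else None,
                        "yes" if satisfied else "no"))
    return triples

# Hypothetical patient record.
annotations = annotate({"serum_creatinine": 2.0, "hemoglobin": 13.5})
```

Each triple could then be turned into a relationship between a feature node and a “yes” or “no” class node when the ontology graph is built.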
FIGURE 10.3 Ontograf showing the relationship between the attributes and class.
In Figure 10.4, the overall relationship between the main OWL instance, the binary class of CKD, the patient details, and the features related to these classes is depicted.
FIGURE 10.4 Ontograf showing overall relationship between the main class, subclass, and
features.
FIGURE 10.5 Ontograf that depicts the semantic relationship between the class and the
binary outputs “ckd” and “not ckd.”
FIGURE 10.6 Ontograf showing the semantic relationship between patient383 and patient1
and binary outputs “ckd” and “notckd.”
model to annotate the data and predict if a particular patient is affected or not affected by any chronic disease. The issues of information sharing between the various devices and allied components in the healthcare sector are handled with the help of ontology-based semantic analysis of the healthcare data (Tchechmedjiev et al. 2018). This semantic annotation model will prove to be effective in identifying the relevant features that aid in disease diagnosis, mainly for patients who are suffering from chronic diseases. This chapter stresses the need for achieving semantic annotation by overcoming the implementation challenges with the use of ontology. The semantic analysis provides meaningful relationships between the features that help in the early diagnosis of chronic illness.

KEYWORDS

semantic annotation
heterogeneous models
medical data mining
artificial intelligence (AI)
expert system
machine learning algorithm

REFERENCES

Antunes, M., Gomes, D., and Aguiar, R. (2018). Towards IoT data classification through semantic features. Future Generation Computer Systems, 86, 792–798.
Ashrafi, N., et al. (2017). Semantic interoperability in healthcare: challenges and roadblocks. In Proceedings of STPIS’18, vol. 2107, pp. 119–122.
Chui et al. (2017). Disease diagnosis in smart healthcare: innovation, technologies and applications. Sustainability, 9(12), 2309.
Du, Y., et al. (2018). Making semantic annotation on patient data of depression. Proceedings of 2018 the 2nd International Conference on Medical and Health Informatics (ICMHI 2018), Association of Computing Machinery, pp. 134–137.
Gefen, D., et al. (2018). Identifying patterns in medical records through latent semantic analysis. Communications of the ACM, 61(6), 72–77.
Gia, T. N., et al. (2015). Fog computing in healthcare internet of things: a case study on ECG feature extraction. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, Liverpool, 2015, pp. 356–363.
Guerrero-Contreras, G., et al. (2017). A collaborative semantic annotation system in health: towards a SOA design for knowledge sharing in ambient intelligence. Mobile Information Systems, 2017, Article ID 4759572, 10 pages.
He, Z., Tao, C., Bian, J., Dumontier, M., and Hogan, W. R. (2017). Semantics-powered healthcare engineering and data analytics. Journal of Healthcare Engineering, 2017, 7983473. doi:10.1155/2017/7983473.
Househ, M. and Aldosari, B. (2017). The hazards of data mining in healthcare. Studies in Health Technology and Informatics, 238, 80–83.
Jabbar, S., Ullah, F., et al. (2017). Semantic interoperability in heterogeneous IoT infrastructure for healthcare. Wireless
melody.moh@sjsu.edu
study on this issue, including the side effects of simultaneously using two drugs, and the potential danger of using less common combinations of drugs. We believe the pipeline design and the results presented in this work would have helpful implications for studying drug side effects and for big data analysis in general. With the help of domain experts, it may be used to further analyze drug side effects, medication errors, and drug interactions.

11.1 INTRODUCTION

Monitoring drug side effects is an important task for both the Food and Drug Administration (FDA) and the pharmaceutical companies developing the drugs. Missing these side effects can lead to potential health hazards that are costly, forcing a drug withdrawal from the market. Most of the important side effects are caught during the drug clinical trials, but even those trials do not have a large enough sample size to catch all the side effects. As for drugs that are already on the market, current reporting systems for those drugs use voluntary participation, such as the FDA Adverse Event Reporting System (FAERS), which monitors reports of drug side effects from healthcare providers 0. Thus, the system only catches side effects that are considered severe while missing side effects that are not reported by the average consumer.

To solve this problem, one solution is to use a much larger database where many more reports of side effects can be found: social media. With the current widespread use of social media, the amount of data provided by platforms such as LinkedIn, Facebook, Google, and Twitter is enormous. Social media has been used in many different fields of study due to both its large sample size and its ease of access. For mining drug side effects, social media has many different users who report their daily use of the drugs they are taking as well as any side effects they get, and most of these reports are in the form of communications with other users.

To achieve this goal, machine learning can be used to design and implement a pipeline that will aid in mining Twitter for the frequency of reported drug side effects. The pipeline also has to be fast enough and have the ability to support large datasets through a distributed framework such as Apache Spark. The data used will come from Twitter, which has its own set of unique features. In the pipeline, Twitter was chosen because of its ease of access to the data in the form of tweets through the Twitter Application Program Interface (API). Also, the tweets are only 140 characters long, making them easy to process and store.
Drug Side Effect Frequency Mining over a Large Twitter Dataset 235
calculate the frequency for further analysis and applications.

11.2.4 APACHE SPARK

Apache Spark is the distributed framework used in this chapter to process large datasets. It is a cluster computing system that has become widely used in the recent decade. Spark is an improvement over Hadoop’s MapReduce paradigm in terms of speed of batch processing 0. Spark distributes the workload over a cluster for distributed, parallel processing.

Apache Spark’s core feature is the resilient distributed dataset (RDD), a read-only dataset over the whole cluster. RDDs can be stored in memory for faster repeat batch processing instead of being stored on the system’s hard disk. RDDs are also fault tolerant and can be used in the same tasks that Hadoop can do, such as mapping and reducing. Spark supports an extensive set of tools, and its machine learning library is widely used and integrates well with the RDD paradigm.

Apache Spark is extremely useful when processing large datasets. In the work by 0, Spark is used to improve the speed of identification of potential drug targets to be studied in clinical trials. The original pipeline was changed to process the drug compounds in parallel by running the potential targets through multiple machine learning predictors and calculating a combined score as the identifying feature for the compound. The predictors gave a score for the compound based on how well the compound could target a protein, and this interaction was based on how well the compound’s shape complemented the protein shape. They partitioned their data into multiple chunks to process their dataset of compounds in parallel. The results of their work showed that the time for processing their large dataset decreased linearly with the number of nodes used in Spark. Similarly, in this chapter, Spark is used to process the large dataset by splitting the tweet dataset into chunks for parallel processing to improve pipeline speed.

11.3 DESIGN AND APPROACH

This section will go into detail on how the current improved pipeline works based on different machine learning classifiers and distributed computing. The pipeline should start by identifying whether a tweet contains a drug-caused side effect and end by outputting an updated count of the different side effects reported for each drug. There are five parts to the pipeline, as shown in Figure 11.1. First, the tweets are mined and filtered. Then, the tweets
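The idea of splitting the tweet dataset into chunks, processing each chunk independently, and merging the partial results can be sketched without a cluster. The example below is a stand-in that uses Python’s standard-library thread pool rather than Apache Spark, so it shows only the map-and-reduce pattern, not Spark’s API or its performance; the function names, the drug list, and the sample tweets are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def count_drug_mentions(chunk, drugs=("xanax", "adderall")):
    """Map step: count how many tweets in one chunk mention each drug."""
    counts = Counter()
    for tweet in chunk:
        text = tweet.lower()
        for drug in drugs:
            if drug in text:
                counts[drug] += 1
    return counts

def parallel_count(tweets, chunk_size=2, workers=4):
    """Process chunks concurrently, then reduce the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_drug_mentions,
                                 chunked(tweets, chunk_size)))
    return reduce(lambda a, b: a + b, partials, Counter())

# Hypothetical tweets, for illustration only.
tweets = [
    "Xanax made me so drowsy",
    "adderall gave me a headache",
    "no meds today",
    "xanax and adderall together",
    "Took Xanax again",
]
mention_counts = parallel_count(tweets)
```

Because each chunk is counted independently and the partial counts are simply added, the same structure carries over to a distributed map followed by a reduce.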
FIGURE 11.1 Pipeline for extracting frequency of drug side effects from Twitter.
FIGURE 11.2 Ensemble classifier with six classifiers and a soft or hard majority vote of
the classifier predictions.
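The difference between a hard and a soft majority vote can be shown with a small sketch. The functions below are a plain-Python illustration, not the ensemble implementation used in this chapter; the example predictions, probabilities, and weights are hypothetical.

```python
from collections import Counter

def hard_vote(predictions, weights=None):
    """Weighted majority vote over predicted labels, one per classifier."""
    weights = weights or [1] * len(predictions)
    tally = Counter()
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return tally.most_common(1)[0][0]

def soft_vote(probabilities, weights=None):
    """Pick the class with the highest weighted average probability.

    `probabilities` holds one list of class probabilities per classifier.
    """
    weights = weights or [1] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=averaged.__getitem__)

# Three hypothetical classifiers: two lean weakly toward class 1,
# one is strongly confident in class 0.
probs = [[0.40, 0.60], [0.45, 0.55], [0.90, 0.10]]
labels = [1, 1, 0]           # each classifier's most likely class
hard_result = hard_vote(labels)
soft_result = soft_vote(probs)

# Six classifiers with the first two given double weight.
weighted_result = hard_vote([1, 1, 1, 0, 0, 0], weights=[2, 2, 1, 1, 1, 1])
```

Here the hard vote returns class 1 because two of three classifiers predict it, while the soft vote returns class 0 because the single confident classifier dominates the averaged probabilities.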
FIGURE 11.3 Apache Spark pipeline with 12 cores for distributed processing. RDDs are stored in memory between each step. Spark assigns tasks by automatically partitioning at each step: data preprocessing (DP), feature extraction (FE), SVM classifier, and frequency extraction with MetaMap.
TABLE 11.2 Classifier Accuracies (f-Measure Score Weighted) for Different Combinations of Features with the Best for Each in Bold
Features | SVC | GNB | LGR | SGD | kNN | DTC | RFC | Ensemble (Soft) | Ensemble (Hard)
f1+f2 (char_wb) | 0.6202 | 0.5080 | 0.6094 | 0.5955 | 0.5273 | 0.5080 | 0.5009 | 0.5389 | 0.6136
f1+f2+f3 (word) | 0.6280 | 0.6036 | 0.6096 | 0.5599 | 0.5670 | 0.5036 | 0.5663 | 0.5692 | 0.6066
f1+f2+f3+f4 (char_wb) | 0.6532 | 0.6262 | 0.6352 | 0.5767 | 0.6311 | 0.6449 | 0.5338 | 0.6710 | 0.6686
f1+f2+f3+f4+f5 (char_wb) | 0.6792 | 0.6768 | 0.6899 | 0.5734 | 0.6576 | 0.6131 | 0.6446 | 0.6468 | 0.7128
f2+f3+f4+f5+f6+f7 (word) | 0.7229 | 0.6827 | 0.7099 | 0.6913 | 0.7036 | 0.6746 | 0.6291 | 0.7097 | 0.7449
f1+f2+f5+f6+f7 (char_wb) | 0.7186 | 0.6460 | 0.7260 | 0.6311 | 0.7332 | 0.7467 | 0.6873 | 0.6671 | 0.7467
f1+f2+f3+f4+f5+f6+f7 (char_wb) | 0.7392 | 0.7219 | 0.7347 | 0.6797 | 0.7032 | 0.6825 | 0.6174 | 0.7568 | 0.7760
f1+f2+f3+f4+f5+f6+f7+f8 (char_wb) | 0.7028 | 0.6973 | 0.7147 | 0.6907 | 0.6643 | 0.5939 | 0.6881 | 0.7219 | 0.6992
f1, unigram; f2, bigram; f3, trigram; f4, four grams; f5, SentiWordNet; f6, AFINN; f7, MPQA; f8, Bing-Liu.
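Since Table 11.2 reports weighted f-measure scores, a short sketch of how such a score is computed may help. The function below averages the per-class F1 scores weighted by the number of true samples in each class, which is the usual meaning of a weighted f1 score; the function name and the labels in the example are hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when nothing is predicted.
        f1 = 2 * tp / denominator if denominator else 0.0
        total += count * f1
    return total / len(y_true)

# Hypothetical labels: 1 = tweet reports a side effect, 0 = it does not.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = weighted_f1(y_true, y_pred)  # (2 * 2/3 + 2 * 4/5) / 4 = 11/15
```

Weighting by support keeps a rare class from being ignored while still letting the larger class count for more in the final score.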
inner map call was used to allow the predictions to occur on each node sequentially. The predicted tweets were then run through MetaMap in another Spark job because MetaMap is supported only through a Java API. The side effect counts were extracted before being merged together into an output text file.

11.5 RESULTS

In the following subsections, results from experiments on the pipeline are presented for the accuracy, processing speed-up, and frequency of drug side effects.

11.5.1 ACCURACY

For testing the Scikit-learn pipeline, a fivefold cross validation was used on different combinations of features. The weighted f1 score was then calculated for each of the machine learning classifiers for comparison, as shown in Table 11.2. The experiment with unigram and bigram was used as a baseline for comparison with the other features.

The best classifier was the ensemble classifier with hard voting, with an f1 measure score of 0.7760. Different weights were tested for the ensemble classifier, and the optimal weights were double weights for both SVM and LGR compared with the other four classifiers. RFC was excluded from the ensemble classifier as RFC itself is an ensemble classifier. Using Yu et al.’s (2016) work as a baseline, with their best f1 score of 0.7690 with SVM 0, our ensemble classifier had a small improvement. The best nonensemble classifier was the DTC with an f1 measure score of 0.7467, which was still a small improvement over the previous work’s decision tree classifier f1 score of 0.7447 0.

The best features to use were all four n-grams from unigram to four-gram plus three of the lexicons: SentiWordNet, AFINN, and MPQA. The trend of the data shows that more features give better accuracy up to a certain point. Adding the final lexicon, Bing Liu, gave a lower accuracy, which is most likely caused by overfitting.

11.5.2 PIPELINE SPEED COMPARISON

Next, the speeds of the Scikit-learn pipeline (shown in Figure 11.1) and the Apache Spark pipeline (shown in Figure 11.3) using the SVM classifier were compared. From the dataset, 200,000 tweets were run through both pipelines and the time was recorded upon completion, as shown in Table 11.3.
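The speed-up implied by the two totals reported in Table 11.3 (257.88 min for Scikit-learn and 105.63 min for Apache Spark) can be checked with a few lines of arithmetic. The variable names are illustrative.

```python
# Total times reported in Table 11.3, in minutes.
scikit_learn_minutes = 257.88
spark_minutes = 105.63

speedup = scikit_learn_minutes / spark_minutes        # about 2.44
time_saved = scikit_learn_minutes - spark_minutes     # 152.25 minutes
reduction = time_saved / scikit_learn_minutes         # about 0.59

print(f"speed-up: {speedup:.2f}x, "
      f"time saved: {time_saved:.2f} min ({reduction:.0%})")
```

The ratio works out to roughly 2.4, that is, the Spark pipeline needed about 41% of the time taken by the Scikit-learn pipeline on the same 200,000 tweets.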
TABLE 11.3 Total Time to Extract Frequency of Drug Side Effects for Both Pipelines

Pipeline        Total Time (min)
Scikit-learn    257.88
Apache Spark    105.63

Spark was faster than the Scikit-learn pipeline by a factor of about 2.4 (257.88 min versus 105.63 min) due to Spark’s parallel processing capabilities.

11.5.3 FREQUENCY OF DRUG SIDE EFFECTS

The pipeline outputs the frequency of drug side effects. For the experiment, out of the 200,000 tweets, 78,242 tweets were predicted as containing drug side effects. Table 11.4 shows the top ten drugs with the most side effects reported.

TABLE 11.4 Top 10 Most Reported Drugs by Twitter Users

Drug         No. of Tweets Predicted    No. of Side Effects Reported
Xanax        12,081                     27,289
Adderall     7,958                      16,906
Ibuprofen    7,822                      16,050
Melatonin    5,873                      14,274
Benadryl     5,259                      13,708
Tylenol      5,263                      13,469
Insulin      5,070                      12,248
Nicotine     4,819                      11,763
Aspirin      3,185                      7,638
Morphine     3,028                      7,223

To further investigate these drugs and their side effects, and to compare with the previous work, the top five drugs (plus two extras for comparison) are analyzed in Table 11.5, each with its five most reported negative side effects. These were manually examined and extracted from the list of side effects to remove side effects that were alleviated by the drug and those not caused by the drug. Each of the top five side effects was manually checked to make sure the drug did cause the side effect in the respective tweets.

Most of the side effects were from the MetaMap semantic groups “Disorder” and “Physiology.” Note that the side effects reported do not consider if the side effects were directly caused by the drug. The predicted tweets based on the training dataset were geared more toward false positives, as missing side effects were considered more detrimental than overreporting. MetaMap also had problems in extracting side effects because it catches all medical terms, thus requiring the filter on the semantic groups.
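The semantic-group filtering and per-drug frequency counting described above might look roughly like the following. The record layout (drug, concept, semantic group) is a hypothetical stand-in for parsed MetaMap output, and the sample records are invented for illustration; only the group names "Disorder" and "Physiology" come from the text.

```python
from collections import Counter, defaultdict

# Semantic groups kept as candidate side effects (named in the text).
KEPT_GROUPS = {"Disorder", "Physiology"}


def side_effect_frequencies(records, kept_groups=KEPT_GROUPS):
    """Count concepts per drug, keeping only the chosen semantic groups.

    records: iterable of (drug, concept, semantic_group) tuples, a
    hypothetical simplification of what a MetaMap run would yield.
    """
    counts = defaultdict(Counter)
    for drug, concept, group in records:
        if group in kept_groups:
            counts[drug][concept] += 1
    return counts


# Invented example records, not data from the chapter.
records = [
    ("xanax", "drowsiness", "Disorder"),
    ("xanax", "drowsiness", "Disorder"),
    ("xanax", "pharmacy", "Organization"),  # dropped: not a kept group
    ("xanax", "sleep", "Physiology"),
    ("benadryl", "drowsiness", "Disorder"),
]
frequencies = side_effect_frequencies(records)
top_xanax = frequencies["xanax"].most_common(1)
```

A filter of this kind only narrows the candidate list; as the text notes, it cannot tell whether a retained concept was actually caused by the drug, so manual examination is still needed.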
Drug Side Effect Frequency Mining over a Large Twitter Dataset 249
TABLE 11.5 Frequency of Side Effects Reported, Showing the Top Five Reported Side Effects per Drug with Number Reported

Drug Name   Drug Use              Side Effect 1           Side Effect 2            Side Effect 3               Side Effect 4         Side Effect 5
Xanax       Anxiety               Drowsiness/sleep (291)  Abnormally high (76)     Addictive behavior (66)     Blackout (13)         Withdrawal (9)
Adderall    ADHD                  Emotions (122)          Addictive behavior (29)  Insomnia (26)               Tired (17)            Binge eating disorder (16)
Ibuprofen   Fever, headache/pain  Emotions (169)          Drowsiness/sleep (126)   Binge eating disorder (17)  Abnormally high (16)  Allergic reaction (14)
Melatonin   Insomnia              Emotions (89)           Nightmares (25)          Binge eating disorder (21)  Weight loss (11)      Anxiety (7)
Benadryl    Allergy               Drowsiness (107)        Tiredness (23)           Dry throat (13)             Nausea (3)            Dizziness (2)
Gabapentin  Seizure/pain          Emotions (6)            Insomnia (5)             Hot flushes (2)             Confusion (2)         Dryness (1)
drowsiness and fatigue, and as it also has strong anticholinergic effects, it causes dry mouth and throat as well. Nausea is not defined as a side effect; in fact, Benadryl is used in combination to treat motion sickness.

Gabapentin is utilized for seizure and neuropathic pain. Hot flushes, dryness, confusion, insomnia, and irritability as well as depression have all been reported, as well as other emotions. Insomnia has especially been reported upon discontinuation of the drug, along with anxiety.

11.5.5 CHALLENGES AND LIMITATIONS

There have been challenges and limitations to the pipeline concerning the extraction of drug-caused side effects.

First, all of the subcategories for the MetaMap group “Disorder” had to be used, as leaving out any subcategories might cause side effects to be missed. Second, MetaMap extracts all side effects from the tweets, both those caused and those not caused by the drug, thus also requiring manual examination to identify the drug-caused side effect. However, an external dataset containing all possible side effects that are alleviated by the drug can be used to remove some of these extra non-caused side effects. For example, “Chills,” the most reported side effect of Xanax, in context means “to relax,” but to MetaMap the concept means “shivers.” Thus, extracting negative side effects of the drugs required both filtering by MetaMap category and manual examination to correctly identify which side effects were negative. Furthermore, each tweet usually contained more than one side effect besides the negative side effect caused by the drug, requiring further manual examination to determine which side effect within the tweet is the one caused by the drug.

Other complications include tweets with multiple drugs, as associating the side effect with the correct drug(s) requires manual examination as well. It is not known whether the side effects in these cases are caused by one of the drugs, both drugs, or some form of interaction between the drugs. These complications lead to the preliminary work in Section 11.6.

11.6 NEXT STEPS: APPLICATIONS OF DRUG SIDE EFFECT FREQUENCY ANALYSIS

Using the frequency extracted from the proposed pipeline, one can make some observations on the most common side effects as well as rare side effects reported. One can also observe the side effects that may be
caused by two or more drugs taken together; some may be side effects caused by rare drug pairs that might be potentially dangerous. The following results required manual examination to remove side effects that were not caused by the drug or were alleviated by the drug, as well as any other side effect that was incorrectly reported. This required going through the tweets manually to make sure the side effect was caused by the drug(s). Some preliminary studies and observations are reported in the following subsections.

11.6.1 MOST FREQUENTLY REPORTED SIDE EFFECTS

The top three side effects were drowsiness/tiredness, emotions, and being abnormally high. “Drowsiness” is considered a mild side effect that affects most people, thus being commonly reported. People who have reported being emotional can be inferred as being more likely to share their emotions on Twitter, which is probably the cause of the large number of reports. Finally, “abnormally high” was largely reported because of the large number of tweets related to drugs that cause this side effect, most notably Xanax and other drugs used for anxiety.

An example of a rare side effect that was less reported but was seen in all the top 10 drugs was nausea. For example, Adderall and Vyvanse are known to cause nausea as they are stimulants that suppress the appetite. Ibuprofen is also known to cause nausea, especially when taken without food, as it is a nonsteroidal anti-inflammatory drug and its effects on inhibiting COX also inhibit stomach protective enzymes. Xanax and Gabapentin cause nausea. Melatonin causes abdominal cramps and would be associated with nausea. Tylenol is not listed as causing nausea but can cause gastrointestinal hemorrhage, so this is new information that would be forwarded to FAERS. Also, Benadryl had two reports of nausea, which would also be forwarded to FAERS.

11.6.2 SIDE EFFECTS CAUSED BY MORE THAN ONE DRUG

Next, predicted tweets where more than one drug was used were examined. Having multiple drugs makes it hard to correctly identify which side effect is caused by which drug. Out of the predicted tweets, 2678 contained more than one drug. Table 11.7 lists the top five drugs that were mentioned most in these tweets containing two or more drugs.
TABLE 11.7 List of Top Five Drugs Mentioned with Other Drugs in a Tweet and Top Two Side Effects

Drug         Tweets    Side Effect 1      Side Effect 2
Tylenol      274       Emotions (6)       Drowsiness (3)
Xanax        203       Addiction (4)      Drowsiness (4)
Ibuprofen    196       Drowsiness (4)     Allergic (2)
Adderall     50        Addiction (3)      High (2)
Benadryl     99        Drowsiness (12)    Insomnia (5)
Most of the tweets with multiple drugs did not specify which of the drugs caused the side effect. Also, some of the tweets focused on one of the drugs not working or causing a side effect that required a second (or even third) drug to solve the problem.

As shown in Table 11.8, most of the tweets with multiple drugs focused on competing drugs. Ibuprofen, also known as Advil or Motrin, competes with acetaminophen (Tylenol) for relieving pain and headaches. The “emotions” side effect related mostly to anger caused by the ineffectiveness of Ibuprofen or Tylenol at alleviating the pain.

Another pair of drugs, Adderall and Vyvanse, both used for attention deficit hyperactivity disorder (ADHD), caused insomnia, as the drugs providing focus also stopped the users from sleeping. The same conclusion can be drawn for Adderall (for ADHD) and Xanax (for anxiety), which also caused insomnia in patients who need to sleep. Although Xanax usually causes drowsiness, it can paradoxically also cause insomnia, especially at higher doses (WebMD LLC, 2019).
TABLE 11.8 List of Top Five Most Mentioned Drug Pairs with Top Side Effect

Drug Pair              Tweets    Side Effect
Ibuprofen, Tylenol     131       Emotions (2)
Adderall, Vyvanse      54        Insomnia (2)
Adderall, Xanax        36        Insomnia (2)
Mucinex, Tylenol       25        Drowsiness (2)
Benadryl, Melatonin    23        Drowsiness (4)
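Counts of drug pairs like those in Table 11.8 could be obtained along the following lines. This is a minimal sketch, assuming a simple token match against a drug lexicon; the lexicon, the tokenization, and the example tweets are invented for illustration and are not the chapter's implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical drug lexicon; the chapter's actual drug list is not shown here.
DRUGS = {"adderall", "benadryl", "ibuprofen", "melatonin", "tylenol", "xanax"}


def drugs_in(tweet, lexicon=DRUGS):
    """Return the sorted list of lexicon drugs mentioned in a tweet."""
    tokens = {tok.strip(".,!?:;\"'").lower() for tok in tweet.split()}
    return sorted(tokens & lexicon)


def count_drug_pairs(tweets):
    """Count unordered drug pairs over tweets mentioning two or more drugs."""
    pair_counts = Counter()
    for tweet in tweets:
        mentioned = drugs_in(tweet)
        if len(mentioned) >= 2:
            pair_counts.update(combinations(mentioned, 2))
    return pair_counts


# Invented tweets for illustration only.
tweets = [
    "ibuprofen and tylenol did nothing for this headache",
    "Tylenol, then ibuprofen. still hurts",
    "took a benadryl with melatonin and passed out",
    "xanax makes me sleepy",
]
pairs = count_drug_pairs(tweets)
```

Sorting the mentioned drugs before pairing makes ("ibuprofen", "tylenol") and ("tylenol", "ibuprofen") count as the same pair. Attributing a side effect to one drug of the pair is a separate problem that, as the text explains, still required manual examination.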
purpose due to its rapid onset of action. Tylenol has no drug–drug interaction with Ativan in this case; it is merely mentioned in the tweet due to the user’s unfamiliarity with the medications:

    so apparently mixing tylenol and ativan makes you extremely high

In the last example, Adderall is used to treat ADHD and to improve focus, but Benadryl made the user fall asleep instead of remaining focused. In the tweet shown below, the user reports becoming drowsy, so the side effect of Benadryl (sleepiness) prevailed over that of Adderall (insomnia, focus). Adderall typically causes insomnia only when used near bedtime, and Benadryl causes drowsiness regardless of the time of day it is used:

    felt a stuffy so I took a benadryl with my coffee and adderall. I’ll be fallin asleep and an inch from death today

From the above examples, we see that finding uncommon side effects from a combination of drugs is important and can be expanded on further in the future.

11.7 CONCLUSION AND FUTURE WORK

Mining the frequency of adverse drug side effects is important for finding side effects that are more common as well as those that are rare but potentially dangerous. In this chapter, an improvement on previous pipelines for extracting drug side effects from Twitter was discussed. A pipeline was created to first identify tweets that contained drug-caused side effects and then extract the frequency of those side effects. An increase in the accuracy of the classifier compared to previous works was achieved. Finally, the pipeline was also implemented in Apache Spark to improve the speed of extraction as well as for processing large datasets.

Future work for this pipeline includes the preliminary study of the application of frequency analysis of drug side effects described in this chapter. More studies would be beneficial for finding side effects of concurrently taking two or more drugs, and the proposed Apache Spark-based pipeline may further contribute in this direction.

There are also challenges and limitations of the experiments and analysis, and work may be extended to address and overcome them by involving domain experts and improving the machine learning methods. For example, a domain expert has advised that it is important to split the analysis of adverse drug effects into side effects, medication errors, and adverse drug reactions and categorize them correctly. There
adverse drug events implications for prevention.” JAMA. 1995; 274(1): 29–34.
Bodenreider, O.; Hole, W. T.; Humphreys, B. L.; Roth, L. A.; Srinivasan, S. “Customizing the UMLS metathesaurus for your applications.” In Proceedings of the AMIA Symposium. November 2002.
Burges, C. “A tutorial on support vector machines for pattern recognition.” Data Mining and Knowledge Discovery. 1998; 2: 121–167.
Cavnar, W. B.; Trenkle, J. M. “N-gram-based text categorization.” In Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval, SDAIR-94. Las Vegas, NV, USA. 1994. 161–175.
Deng, L.; Wiebe, J. “MPQA 3.0: An entity/event-level sentiment corpus.” In Proceedings of the NAACL-HLT, 2015.
Drugs.com. “Popular Drugs” from Drug Index A to Z. https://www.drugs.com/drug_information.html (accessed December 14, 2016).
FDA Adverse Event Reporting System (FAERS). https://www.fda.gov/drugs/surveillance/fda-adverse-event-reporting-system-faers (accessed August 12, 2018).
Harnie, D.; Vapirev, A. E.; Wegner, J. K.; Gedich, A.; Steijaert, M.; Wuyts, R.; Meuter, W. D. “Scaling machine learning for target prediction in drug discovery using Apache Spark.” In Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Shenzhen. 2015. 871–879.
Hsu, D.; Moh, M.; Moh, T.-S. “Mining frequency of drug side effects over a large twitter dataset using Apache Spark.” In Proceedings of the 9th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM. Sydney, Australia, July 2017. 915–924.
Jiang, K.; Zheng, Y. “Mining Twitter data for potential drug effects.” In Advanced Data Mining and Applications. Springer: Berlin, Heidelberg. 2013. 434–443.
Liu, B. “Sentiment Analysis: Mining Opinions, Sentiments, and Emotions.” Cambridge University Press. 2015. https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html (accessed December 21, 2016).
Medline Plus Database. https://medlineplus.gov/ (accessed December 13, 2017).
Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D. et al. “MLlib: Machine learning in Apache Spark.” J. Mach. Learn. Res. 2016; 17(1): 1235–1241.
Nielsen, F. Å. “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs.” In Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: Big Things Come in Small Packages 718 in CEUR Workshop Proceedings. May 2011. 93–98.
NLTK (Natural Language Toolkit). www.nltk.org (accessed December 15, 2016).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O. et al. “Scikit-learn: Machine learning in Python.” J. Mach. Learn. Res. 2011; 12: 2825–2830.
Peng, Y.; Moh, M.; Moh, T.-S. “Efficient adverse drug event extraction using Twitter sentiment analysis.” In Proceedings of the 8th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM. San Francisco, California. Aug. 2016. 1101–1018.
Pyspark. Spark Python API. http://spark.apache.org/docs/latest/api/python/index.html (accessed December 21, 2016).
Roesslein, J. Tweepy (An easy-to-use Python library for accessing the Twitter API). http://www.tweepy.org (accessed on August 12, 2019).
Shapiro, K.; Brown, S. Rx Prep Course Book. 2016. 100–123.
Tabassum, N.; Ahmed, T. “A theoretical study on classifier ensemble methods and
its applications.” In Proceedings of the 3rd International Conference on Computing for Sustainable Global Development (INDIACom). New Delhi. 2016. 374–378.
Toutanova, K.; Klein, D.; Manning, C. D.; Singer, Y. “Feature-rich part-of-speech tagging with a cyclic dependency network.” In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL'03, Vol. 1. Association for Computational Linguistics. Stroudsburg, PA, USA. 2003. 173–180.
WebMD LLC, Medscape database. 2019. https://reference.medscape.com/ (accessed August 13, 2019).
Wu, L.; Moh, T.-S.; Khuri, N. “Twitter opinion mining for adverse drug reactions.” In Proceedings of the IEEE International Conference on Big Data (BigData). Santa Clara, California. October 2015. 1570–1574.
Yu, F.; Moh, M.; Moh, T.-S. “Towards extracting drug-effect relation from twitter: A supervised learning approach.” In Proceedings of the IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). New York, NY. 2016. 339–344.
Zaharia, M.; Chowdhury, M.; Das, T.; Dave, A.; Ma, J.; McCauley, M.; Franklin, M. J.; Shenker, S.; Stoica, I. “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing.” In Proceedings of the NSDI’12. April 2012.
CHAPTER 12
requires a large number of expert annotators. As mentioned earlier, image processing segmentation methods are not sufficient to capture the complexity of the brain structure. With learnable parameters that adjust according to the data, machine learning has been proposed for automated brain segmentation.

The subject of this section is machine learning methods without multilayer representation. We refer to these methods as “traditional” machine learning methods, in contrast to the DL models that will be the topic of the next section. While we mainly focus on DL approaches in this chapter, it is imperative to understand the basics of traditional machine learning methods and how DL methods compare against the conventional approach.

12.5.1 FEATURE SELECTION

In conventional machine learning, algorithms operate on features extracted from the image rather than the raw image itself. A feature can be thought of as a summary statistic regarding the subject image. Average pixel intensity and standard deviation are some examples of scalar features. Features can also be matrices, like the results from a neighboring pixel filter such as Gaussian filters, Haar filters, etc. Features can also be study in Section 12.11. In summary, there are no rigorous rules for which feature to extract or to look for, and the choice often boils down to expert domain knowledge and previous research experience.

12.5.2 LINEAR DISCRIMINANT ANALYSIS

Linear discriminant analysis (LDA) is a machine learning method that uses a linear combination of features to determine the class of the inputs. The main principle of LDA can be summarized as projecting data from the extracted feature space onto a one-dimensional space. The projected values can then be classified into two classes by thresholding. The major disadvantage of using LDA is the linear nature of the classifier. In most practical cases, the data cannot be separated in the feature space by a linear function.

12.5.3 SUPPORT VECTOR MACHINE

SVM is a class of supervised learning models that can be used for both classification and regression tasks. Given a set of labeled data points, an SVM learns from this dataset and assigns an unseen example to a category without assigning probabilities. Hence, SVM is a nonprobabilistic classifier. With SVM, each data point is represented as an n-dimensional vector. SVM classifies data points by what is known as a “hyperplane” that
the rectified linear unit (ReLU) and leaky ReLU have mostly replaced the sigmoid function as the definitive activation function. The size of the feature map remains unchanged after the activation function.

12.6.1.3 POOLING LAYER

assigned classes in the ground truth labels. For instance, if the task is to classify voxels in an MRI into the background, normal brain tissue, and tumor tissue, then N = 3.

12.6.1.5 SKIPPED CONNECTION
12.9.3 MRBRAINS
scores and surpassed the state-of-the-art examples in challenges related to biomedical data.

13.2.2 VULNERABILITIES OF BIOMEDICAL AI SYSTEMS

It is important to understand and pinpoint the vulnerabilities of biomedical AI systems to realize the security threats at large and also how to overcome or resist them. The unique features and complexities of biomedical data, which we have discussed in the previous section, not only make the use cases more interesting and complex but also make them vulnerable to potential security and privacy issues in AI systems.

Often, the handling of clinical data requires more context than is standard for usual applications of deep learning algorithms, such as patient history, patient preferences, social perspectives, etc., which, in turn, makes it more vulnerable to security threats by malicious attackers. Moreover, the “black box” nature of deep learning algorithms results in a lack of model interpretability and gives rise to confusion regarding exactly how an AI model can achieve the kind of performance it does. It is, therefore, not easy to exactly identify the weaknesses of the model or the reasons for the weakness, or even to extract additional biological explanations from the results. This leads to potential misuse of algorithms by attackers and poses potential threats to user/patient security in biomedical AI systems. Moreover, biomedical systems usually leverage third-party cloud platforms due to scalability, storage, and performance benefits, and privacy compromise is also likely to happen in such situations due to the confidential information being stored in the cloud unless secure sharing schemes and suitable encryption techniques are devised. The problem of security also arises in the case of data integration and adoption that is needed to develop large-scale biomedical expert systems.

Thus, it is important for researchers and AI engineers to understand how the sharing of sensitive patient data works. For example, to prevent or reduce the effect of inference attacks by adversaries, polyinstantiation techniques are helpful: the dataset is separated into smaller sets and “data silos” are developed to avoid disclosing the whole data.

13.3 SECURITY AND PRIVACY ISSUES IN BIOMEDICAL AI SYSTEMS

There are several security and privacy risks associated with biomedical AI systems, and as research advances in this area, the probability of
very small, almost imperceptible (to human sense) changes are made in the data to make the neural network classifier misclassify the data, that is, make the wrong prediction. In the case of biomedical AI systems, this poses a very big threat to medical practitioners and, more importantly, patients, for mere modification of some pixels in medical images (Alnemari et al., 2017) may cause the deep learning algorithm/neural network to predict a benign tumor as malignant or a malignant one as benign, both of which are very unfortunate and life-threatening as they can cause the wrong diagnosis and wrong treatment. Moreover, since such types of attacks are very subtle due to the imperceptible noise added to fool the network, detecting the presence of an error is hard. Celik et al. (2017) have also noted that such adversarial attacks may guide the attacker toward the susceptible locations in biomedical images, which may be distorted to cause the system to misclassify. Such techniques for identifying vulnerable pixels in images may then be applied to obtain a “susceptibility score” to alert biomedical systems and make them more resistant to such attacks.

Finlayson et al. (2018) have pointed at several factors in the specific case of medical data that make biomedical AI systems more vulnerable to adversarial attacks, e.g., ambiguous ground truth due to disagreement among human medical specialists and the dearth of diversity in the neural network architectures utilized. The authors also noted that adversarial patch attack techniques are more powerful as well as universal. IEEE Spectrum (Abouelmehdi et al., 2018) also reports that there may be a lot of “incentives” behind such attacks, e.g., existing cases of healthcare fraud and the enormous revenue generated by the global healthcare economy (Abuwardih et al., 2016), which make the situation even more threatening. Some studies indicate that ethical hacking can help in this regard as well, since the expertise of ethical hackers lies in feigning attacks on the data or the system to ultimately understand how to protect the system against real attackers.

Studies have suggested that careful auditing of biomedical AI systems and several rounds of testing by cybersecurity specialists and AI experts can help detect such vulnerabilities in the system. There is also the need to develop algorithmic and infrastructural defense mechanisms against these adversarial attacks (Alnemari et al., 2017). In the long run, we must prioritize the development of robust machine learning/deep learning models that are resistant to or not susceptible to such kinds of attacks (Bose, 2016).
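To make the idea of a small adversarial change concrete, here is a minimal sketch on a toy logistic-regression classifier, using a gradient-sign perturbation (the approach usually called the fast gradient sign method). The text does not specify an attack method; the weights, input values, and step size below are invented, and real attacks target deep networks on images rather than a three-feature linear model.

```python
import math

# Invented model parameters; not taken from any real system.
WEIGHTS = [0.5, -0.25, 0.75]
BIAS = 0.0


def predict_proba(x):
    """Probability of class 1 under a logistic-regression model."""
    score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-score))


def predict(x):
    """Hard class decision at the 0.5 probability threshold."""
    return 1 if predict_proba(x) >= 0.5 else 0


def _sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def gradient_sign_attack(x, true_label, epsilon):
    """Move each feature by epsilon in the direction that raises the loss.

    For the logistic loss, d(loss)/d(x_i) = (p - y) * w_i.
    """
    error = predict_proba(x) - true_label
    grad = [error * w for w in WEIGHTS]
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]


x = [0.2, 0.1, 0.1]  # toy input whose true label is assumed to be 1
x_adv = gradient_sign_attack(x, true_label=1, epsilon=0.15)
```

Here no feature moves by more than `epsilon`, yet `predict(x)` and `predict(x_adv)` disagree. Robustness checks of the kind discussed above would probe how small such a perturbation can be before the prediction changes.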
altogether, in this regard, that is, to build privacy-preserving techniques to protect sensitive biomedical data. Reconstruction attacks may also be staged to harm the model (Nasr et al., 2018).

Linkage attacks are another important type of attack in AI systems. Data linkage refers to the idea that data in the public domain can be easily related to sensitive information (say, confidential patient data) that can, thus, be accessed by malicious actors. The linkage can be of several types, such as attribute linkage or table linkage (Kieseberg et al., 2014).

To counter linkage attacks and security threats of a similar sort, which is especially important in the wake of healthcare services on mobile devices that can compromise user identity and location data, recent studies have suggested private record linkage and entity resolution techniques, such as deriving unique fingerprints from genomes to preserve patient identity. On the other hand, inference attacks (Shokri et al., 2017; Nasr et al., 2018) are data mining techniques wherein sensitive and robust information can be deduced or “inferred” by malicious actors from trivial information in a database with a high confidence value, hence the name. Membership inference attacks are made by adversaries by querying whether a specific example was a part of the training data or on the summary statistics to reveal the underlying distribution and other useful statistical features (Shokri et al., 2017) and, thus, exploit the predictions made by the AI model or compute some sensitive attribute of the dataset in question. The attack model may be trained with the help of what are called “shadow models,” which are trained on either real or artificially generated fake data or both. The attack model is trained on the outputs of such shadow models, which imitate the target model. Some studies have suggested that the larger the number of shadow models, the better the attack accuracy, though at additional cost; however, such claims have not yet been supported with valid proofs. The difference between such types of attacks and reconstruction attacks (described above) is that they work even when the specific example does not belong to the membership set.

13.4 POSSIBLE SOLUTIONS TO SECURITY AND PRIVACY ISSUES IN BIOMEDICAL AI SYSTEMS

13.4.1 GENERAL TECHNIQUES

General techniques to database security such as auditing, that is, rechecking for detection of
However, like all other good techniques, it also has a few limitations. For example, an adversary may be able to estimate the sensitive information that we are trying to protect by issuing repeated queries, that is, by making multiple attempts, so a privacy breach may happen quite easily.

13.4.3 PRIVACY PRESERVATION AS A SOLUTION TO SECURE BIOMEDICAL AI MODELS

13.4.3.1 PRIVACY-PRESERVING DATA MINING (PPDM)

Privacy-preserving data mining (PPDM) and a similar concept, privacy-preserving machine learning, are important paradigms that must be understood and utilized with respect to security threats in biomedical AI systems. Privacy-preserving techniques have already been utilized in biomedical use cases, especially those involving distributed computing, such as cluster analysis of healthcare data (Fung et al., 2018).

13.4.3.1 DATA PERTURBATION

Data perturbation involves the addition of random noise, which makes it harder for the adversary to attack the sensitive patient data. These techniques include additive and multiplicative noise modeling as well as other techniques like geometric and random space perturbation.

13.4.3.2 ANONYMIZATION

Appropriate anonymization techniques (Szarvas et al., 2007; Shin et al., 2018), that is, those dealing with the removal of private information and also the linking of information for such cases, must be applied to ensure the protection of patients’ confidential information in biomedical data. Examples of such techniques include the FAST algorithm (Mohammadian et al., 2014) and identity-based anonymization (Abouelmehdi et al., 2018).

However, anonymizing data is not always sufficient, and the privacy it provides quickly degrades as adversaries obtain auxiliary information about the individuals represented in the dataset. A much-cited example of such circumstances was studied by Narayanan and Shmatikov (2008) in relation to “breaking the anonymity” of a Netflix dataset. The authors demonstrated that privacy breaches may occur when anonymized identities are revealed via linkage attacks.

A technique to achieve anonymization is anatomization, which deals with splitting the data into
Security and Privacy Issues in Biomedical AI Systems and Potential Solutions 301
Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. In Proceedings of IEEE Transactions on Neural Networks and Learning Systems. 2019.
Zhang, K.; Yang, K.; Liang, X.; Su, Z.; Shen, X.; Luo, H. H. Security and privacy for mobile healthcare networks: from a quality of protection perspective. IEEE Wireless Communications, 22(4), 2015, 104–112.
Zhu, H.; Liu, X.; Lu, R.; Li, H. Efficient and privacy-preserving online medical prediagnosis framework using nonlinear SVM. IEEE Journal of Biomedical and Health Informatics, 21(3), 2017, 838–850.
Ziefle, M.; Valdez, A. C. Decisions about medical data disclosure in the internet: an age perspective. In Proceedings of International Conference on Human Aspects of IT for the Aged Population, 2018, 186–201, Springer, Cham.
CHAPTER 14
All the above stages work with wireless technology that uses radio frequency (RF) radiation for the patients when it is monitored. Especially, the areas like ICU cannot be entertained with RF signals. To overcome all the above issues, a radiation-free technology can be introduced in the hospitals for monitoring the patient even more securely.

14.1.2 WI-FI ROLE IN PATIENT MONITORING

The health parameters are constantly monitored and recorded. If the parameter value is observed abnormal, the alert will be sent to the medical practitioner through an alarm, SMS, or email using Wi-Fi that is an IEEE 802.11 standard that stands for “wireless fidelity.” It is a popular wireless networking technology introduced by NCR Corporation/AT&T in the Netherlands (1991) (LaMonica, 1991). With the help of this technology, the collected information can be easily exchanged or shared between one or more devices. Wireless technology is required for using all home appliances such as mobiles, televisions, DVD players, digital cameras, laptops, smartphones, etc. The probability of communication with Wi-Fi shall be through the client to client communication or
access point to client communication. It is an optimal option for home and business networks. The data is converted into a radio signal, which in turn transfers the data into an RF antenna for users using a computer’s wireless adapter.

Previously, patient monitoring was implemented using wireless technology that uses RF waves, that is, electromagnetic waves, to transmit the sensed data collected by various sensors. Wi-Fi commonly uses a single band (2.4 GHz) or dual band (2.4 GHz and 5 GHz) RF that works best in line-of-sight conditions. Some common materials can absorb or reflect the radio waves, which restricts the range of the signal. Wi-Fi uses a half-duplex shared configuration where all stations can transmit and receive the signal on the same channel.

14.2 LI-FI TECHNOLOGY

Li-Fi is a wireless technology that uses visible light as the communication medium of standard IEEE 802.15.7. Li-Fi was proposed by Harald Haas in 2011 (Li-Fi, 2019). Li-Fi refers to an innovative wireless system of visible light communication (VLC) technology. The VLC technology can deliver bidirectional communication with high-speed data rates and networked mobile communication by using an LED as the light source. The LED transmits the binary form of data in the form of light pulses and thus is a form of optical wireless communication (OWC) (Li-Fi, 2019). Li-Fi technology is also based on a visible-light wireless communication system that lies between the violet color (800 THz) and red color (400 THz). Li-Fi uses the optical spectrum, that is, the visible light part of the electromagnetic spectrum, whereas Wi-Fi uses the RF part of the electromagnetic spectrum. It uses rapid pulses of LED light to transmit data, which cannot be noticed by the normal human eye. VLC's features include wide bandwidth, that is, the optical spectrum guarantees more than 10,000 times better bandwidth than the conventional, harmful RF frequencies. The LED lights work rapidly for transmitting the binary data by switching the LED on and off because it has no interfering light frequencies like that of the radio frequencies in Wi-Fi. In Li-Fi, the LED in the transmitter is connected to the data network (Internet through the modem), and the receiver (photodetector/solar panel) on the receiving end obtains the data as light signals, decodes the information, and then displays it on the device connected to the receiver (An Internet of Light, 2014). In the early stage, the data transfer speed was 15 Mbps.
Live Patient Monitoring System 315
research has not been touched like security; transmission scenario in the outdoor that has a high challenge needs to be implemented.

14.3 PROPOSED IDEA

signal will be given to the medical practitioner.
• The second objective of this proposal is to transmit the data without exposing any harmful radiation to the patients.
FIGURE 14.2 Main source page of the LiMoS from the E-Doc side.
Figure 14.3 shows the monitoring page of the LiMoS from E-Doc side. The main source page of the LiMoS from the E-Doc side shows all the bed information in the ICU, and the bed allocated to the particular doctor is highlighted. The E-Doc checks in the bed to view the status of the patient. The monitoring page gives information about the patient that includes hemodynamic, cardiac, blood glucose level, body temperature, pulse oximetry (respiration rate), and the stress of a patient. All this information is live updated to the physicians.
be restrained by its heart rate. The heart rate is maintained to reduce the risk of injury and mental fatigue. For measuring and displaying the heart rate continuously, a heart rate monitor device is used to display the data as the number of beats per minute. The pulse rate ranges from 60 to 100 bpm, which may fluctuate and rise with exercise, illness, injury, and emotions.

14.4.6 LED SECTION

The most important requirement for a light source used in Li-Fi is the ability to be switched on and off repeatedly in very short intervals of time. LEDs are suitable light sources for Li-Fi due to their ability to be switched on and off quickly. The variations in rate with the dimensions of LEDs are very important in Li-Fi technology. Different sized LEDs can produce different data rates, where a micro-LED bulb can itself transmit 3.5 Gbps.

14.5 TRANSMISSION PHASE: LI-FI IN NETWORKING

Li-Fi module has two submodules: one is the transmitter, and the other is the receiver module (Figure 14.7).

14.5.1.1 TRANSMITTER

The data received from the sensors such as heart rate, temperature, and respiration are given as input to the Arduino board, where these data are converted into a digital signal. This digital signal is then given to the Li-Fi transmitter part. The transmitter section is used to convert digital data into visible light. The general concept behind this is that the light intensity
of the LED is modulated, that is, the intensity of the light corresponds to the data transmitted. The Arduino is not able to provide the right amount of current to make the light intensity strong and fast enough for transmitting the data as light. To overcome this problem, a transistor is used as a switch to turn the LED on and off, which makes it possible to switch a larger current faster. Figure 14.8 shows the components used in the Li-Fi transmitter setup that consists of the Li-Fi transmitter module and LED light source.

FIGURE 14.8 Li-Fi transmitter module.

14.5.1.2 RECEIVER SECTION

The receiver module has two modules: a demodulation circuit and a microcontroller. The transmitted optical pulse is retrieved back into an electrical signal using a photodiode that is inbuilt in the demodulation circuit. The photodiode is a semiconductor that converts the light signal into electric current. The photodiode in this section needs a rapid response time with spectral sensitivity in the visible spectrum and a large radiation-sensitive area. The converted electrical signal is feeble and overwhelmed by noise. Then, it undergoes demodulation through envelope detection to demodulate the data from the carrier signal. The receiver does the filtering and then amplifies the signal. After amplification, the signal will be in an analog form that is fed into an analog-to-digital converter before being sent to the Arduino board. The photodiode generates the current at a very low value; hence, for converting this current into a voltage, a high-value resistor is used. The voltage is then amplified again by a comparator circuit to give properly transmitted bits. The amplitude of the amplified voltage is the output of the 741 op-amp. Then, the voltage comparator transforms the signal into a digital format before feeding it into the microcontroller that transmits the data serially to another device. Figure 14.9 shows the receiver module that transmits data serially at a baud rate of 38,400. It covers a distance of 5–15 ft. The coverage area can be increased by changing the LED wattage.
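The receiver chain described in this section (photocurrent, conversion to a voltage across a high-value resistor, amplification, comparator) can be sketched numerically. This is an idealised Python illustration only: the resistor value, gain, reference voltage, and photocurrent samples are hypothetical, and no noise, filtering, or envelope detection is modelled. Only the 38,400 baud rate comes from the text.

```python
def photocurrent_to_voltage(current_amps, resistor_ohms):
    """Ohm's law: a high-value resistor turns a small photocurrent into a voltage."""
    return current_amps * resistor_ohms


def amplify(voltage, gain):
    """Idealised amplifier stage (no saturation or noise modelled)."""
    return voltage * gain


def comparator(voltage, reference_volts):
    """Voltage comparator: logic 1 above the reference, otherwise logic 0."""
    return 1 if voltage > reference_volts else 0


def receive_bits(photocurrents, resistor_ohms=1e6, gain=10.0, reference_volts=2.5):
    """Run one photocurrent sample per bit through the three stages."""
    bits = []
    for current in photocurrents:
        volts = amplify(photocurrent_to_voltage(current, resistor_ohms), gain)
        bits.append(comparator(volts, reference_volts))
    return bits


# Hypothetical samples: about 0.5 uA with the LED on, 0.02 uA of ambient light with it off.
samples = [0.5e-6, 0.02e-6, 0.5e-6, 0.5e-6, 0.02e-6]
bits = receive_bits(samples)

BAUD_RATE = 38_400             # serial rate quoted in the text
bit_time_us = 1e6 / BAUD_RATE  # duration of one bit in microseconds
```

At 38,400 baud each bit lasts roughly 26 microseconds, which gives a sense of how quickly the LED, photodiode, and comparator must respond.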
FIGURE 14.10 Generalized block diagram of the OWC link in the time domain.
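\[
x(o)_{\mathrm{NLOS}} =
\begin{cases}
\dfrac{\rho A h}{\pi \left( h^{2} + d^{2} \right)} \, T_s(\psi) \, g(\psi) \cos(\psi), & 0 \le \psi < \psi_c \\[4pt]
0, & \psi \ge \psi_c
\end{cases}
\tag{14.2}
\]

14.8.1 THROUGHPUT

Throughput is the ratio of the actual data transferred to the receiver to the actual data sent by the transmitter.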
is measured when the MMK unit information is transmitted from the transmitter to the receiver. The throughput performance of existing wireless communication technologies is also measured with real-time results. Comparatively, Li-Fi achieves better performance and has advantages over other technologies. Normally, the value will fall between 0 and 1. The threshold limit is 0.5. If the throughput is above 0.5, it is considered to be a good result. Here, the values of both communication technologies fall under the above category, but the better performance is achieved by the VLC for our LiMoS framework.

Figure 14.11 shows a comparative analysis of the patient monitoring system based on RF and Li-Fi communication. Practically possible metrics, such as cost- and time-based metrics, are considered for the test cases in the hospital that are taken for analysis.
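The throughput ratio and its 0.5 threshold can be expressed in a few lines. The Python sketch below is only an illustration of the definition given in the text; the function names are assumptions, and the byte counts are made-up values, not measurements from the LiMoS experiments.

```python
GOOD_THROUGHPUT_THRESHOLD = 0.5  # threshold stated in the text


def throughput_ratio(units_received, units_sent):
    """Ratio of data that reached the receiver to data that was sent (0 to 1)."""
    if units_sent <= 0:
        raise ValueError("units_sent must be positive")
    if not 0 <= units_received <= units_sent:
        raise ValueError("units_received must lie between 0 and units_sent")
    return units_received / units_sent


def is_good(ratio):
    """A throughput strictly above the threshold counts as a good result."""
    return ratio > GOOD_THROUGHPUT_THRESHOLD


# Hypothetical counts for illustration only.
example_ratio = throughput_ratio(940, 1000)
example_ok = is_good(example_ratio)
```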
Archived from the original on 17 February 2019.
Lee S. J., Jung S. Y., A SNR analysis of the visible light channel environment for visible light communication. Proceedings of the 18th Asia-Pacific Conference on Communication: Green and Smart Communication for IT Innovation (APCC 2012), 2012, pp. 709–12.
Liu J., Chen Y., Wang Y., Chen X., Cheng J., Yang J., “Monitoring vital signs and postures during sleep using WiFi signals,” IEEE Internet of Things Journal, vol. 5, no. 3, pp. 2071–2084, June 2018.
Pottoo S. N., Wani T. M., Dar M. A., Mir S. A., “IoT enabled by Li-Fi technology,” International Journal of Scientific Research in Computer Science, Engineering and Information Technology, vol. 4, no. 1, 2018, pp. 106–110. ISSN:2456-3307.
Pramila R. S., Nargunam A. S., “Secure patient monitoring system,” Journal of Theoretical and Applied Information Technology, vol. 62, no. 1, April 2014.
Purwita A. A., Soltani M. D., Safari M., Harald H. Terminal orientation in OFDM-based LiFi systems, vol. 18, no. 8, pp. 4003–4016, 2018.
Ramadhani E., Mahardika G. P., IOP Conference Series: Material Science Engineering, vol. 325, 012013, 2018.
Rigg J., “Smartphone concept incorporates LiFi sensor for receiving light-based data.” Engadget. 11 January 2014. Archived from the original on 15 January 2014. Retrieved on 16 January 2014.
Rohner C., Raza S., Puccinelli D., and Voigt T., “Security in visible light communication: novel challenges and opportunities,” Sensors & Transducers, vol. 192, no. 9, September 2015, pp. 9–15.
Study Paper on LiFi (Light Fidelity) & Its Applications, FN Division, TEC.
LiFi Data. Lumisense Technologies, Chennai.
Sudha S., Indumathy D., Lavanya A., Nishanthi M., Sheeba D. M., Anand V., “Patient monitoring in the hospital management using Li-Fi”, Proceedings of IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR), Chennai, 2016, pp. 93–96.
Tsonev D., Sinanovic S., Haas H., “Complete modeling of nonlinear distortion in OFDM-based optical wireless communication,” IEEE Journal of Lightwave Technology, vol. 31, no. 18, pp. 3064–3076, 2013.
Van Camp J., “Wysips solar charging screen could eliminate chargers and Wi-Fi.” Digital Trends, 19 January 2014. Archived from the original on 7 November 2015. Retrieved on 29 November 2015.
Yadav S., Mishra P., Velapure M., Togrikar P. S., “LI-FI technology for data transmission through LED,” Imperial Journal of Interdisciplinary Research, vol. 2, no. 6, pp. 21–24, 2016.
CHAPTER 15
15.1.1 CONVOLUTION NEURAL NETWORKS

15.1.2 OBJECTIVE
the MMI database had 45 images. They tracked a set of 20 facial fiducial points using temporal rules.

Zheng et al. (2006) used KCCA to recognize facial expressions. The singularity problem of the Gram matrix was tackled using an improved KCCA algorithm. Their accuracy on the JAFFE database using semantic info was 85.79% with leave-one-image-out cross-validation and 74.32% with leave-one-subject-out (LOSO) cross-validation, and their accuracy on Ekman's database was 78.13%. They used the JAFFE and Ekman's pictures of affect databases. Their JAFFE database consisted of 183 images, and Ekman's database had 96 images. Neutral expressions were not chosen from either database. The correlation is used to estimate the semantic expression vector, which is then used for classification. They converted 34 landmark points into a labeled graph using the Gabor wavelet transform. Then, a semantic expression vector is built for each training face.

Tian et al. (2001) developed an algorithm that recognized posed expressions. It was a real-time system. They used Cohn–Kanade and Ekman–Hager Facial Action Exemplars. They used 50 upper face samples from 14 subjects performing 7 AUs and 63 lower face sample sequences from 32 subjects performing 11 AUs. The accuracy of recognition of upper face AUs was 96.4%, and the accuracy of lower face AUs was 96.7%. The permanent features extracted were optical flow, Gabor wavelets, and multistate models, and the transient feature extracted was Canny edge detection.

Bourel et al. (2001) developed an algorithm that deals with recognizing facial expressions in the presence of occlusions. It also proposed the use of modular classifiers instead of monolithic classifiers. Classification is done locally, and then the classifier output is fused. The Cohn–Kanade database was used, and there were 30 subjects. A total of 25 sequences for 4 expressions (a total of 100 video sequences) were taken. Local spatiotemporal vectors were obtained from the Kanade–Lucas–Tomasi tracker algorithm. They used the modular classifier with data fusion. Local classifiers are rank-weighted KNN classifiers.

Pardas and Bonafonte (2002) developed an algorithm for automatic extraction of MPEG-4 FAPs. This proves that FAPs convey the necessary information that is required to extract the emotions. An overall efficiency of 84% was observed across six prototypic expressions. They used the whole Cohn–Kanade database and an HMM classifier. MPEG-4 FAPs were extracted using an improved active contour algorithm and motion estimation. Cohen et al. (2003) developed a real-time system. It suggests use of HMMs to automatically segment a video
into different expression segments. They used Cohn–Kanade and their own database. They took 53 subjects under Cohn–Kanade, and 5 subjects under their own database. They used NB, TAN, and ML-HMM classifiers. They extracted a vector of motion units using the piecewise Bézier volume deformation model (PBVD) tracker.

Cohen et al. (2003) made a real-time system that used semi-supervised learning to work with some labeled data and a large amount of unlabeled data. They used the Cohn–Kanade database and the Chen–Huang database. They also extracted a vector of motion units using the PBVD tracker.

Sebe et al. (2007) developed an algorithm that recognizes spontaneous expressions. They created an authentic DB where subjects are showing their natural facial expressions. They used the spontaneous emotions database and also the Cohn–Kanade database. The sample for the database consisted of 28 subjects showing mostly neutral, joy, surprise, and delight, whereas the Cohn–Kanade database consisted of 53 subjects. They used a Bayesian net classifier, SVM, and decision tree. They extracted MUs generated from the PBVD tracker.

The algorithm of Kotsia and Pitas (2007) recognizes either six basic facial expressions or a set of chosen AUs. Very high recognition rates have been shown. They used the whole Cohn–Kanade database. It had an accuracy of 99.7% for FER and 95.1% for FER based on AU detection. They used multiclass SVM for expression recognition and six classes of SVM, one for each expression. The feature extracted was the geometric displacement of Candide nodes.

Wang and Yin (2007) proposed a topographic modeling approach in which the gray-scale image is treated as a 3D surface. They analyzed the robustness of detected face region and the different intensities of facial expressions. They used the Cohn–Kanade database and the MMI database. They took 53 subjects, and 4 images per subject were taken for each expression, which made a total of 864 images. They used QDC, LDA, SVC, and NB classifiers with the extracted topographic context expression descriptors. It had an accuracy of 92.78% with QDC, 93.33% with LDA, and 85.56% with NB on the Cohn–Kanade database.

Dornaika and Davoine (2008) proposed a framework for simultaneous face tracking and expression recognition. Two AR models per expression gave better mouth tracking and in turn better performance. The video sequences contained posed expressions. They created their own database and used several video sequences. Also, they
Real-Time Detection of Facial Expressions 337
FIGURE 15.2 System block diagram followed by the classifier-based FER systems.
and finally, this unlabeled image whose class needs to be detected is passed through the model making the predictions that are displayed. The block diagram of the proposed method is shown in Figure 15.3.
FIGURE 15.3 System block diagram followed by the CNN based FER systems.
images of the training data are used by the CNN for validation purpose for the CNN to check and improve its accuracy.

15.4 DESIGN APPROACH AND DETAIL

15.4.1 DESIGN APPROACH: CLASSIFIER-BASED MODEL

for which we used the JAFFE image database that comprises 213 images of different emotion classes, namely, angry, sad, happy, surprise, disgust, and neutral. Figure 15.4 represents the sample of the facial expressions from the JAFFE image database.

15.4.1.2 FEATURE EXTRACTION
features from each emotion class were extracted and a total of 500 features were used among all the features extracted.

15.4.1.3 FEATURES LOADED TO CLASSIFIER

Features were converted from an array into a table, and the table was loaded into the classifier. This was done using the help of a classification learner toolbox of MATLAB. The holdout validation was kept as 10%, that is, 10% of the data was kept for the validation purpose of the classifier post the training.

15.4.1.4 TRAINING THE CLASSIFIER AND GENERATING THE VALIDATION ACCURACY

A classifier is a set pattern recognition algorithm that is used to define whether or not the test data belongs to a certain class based on the training set and the labels given in the training set. The classifiers used in this work are elaborately explained in the technical specifications. The
FIGURE 15.8 (a) ROC of happy, (b) ROC of sad, (c) ROC of disgust, (d) ROC of surprise,
(e) ROC of neutral, and (f) ROC of anger.
FIGURE 15.10 (a) ROC of happy, (b) ROC of sad, (c) ROC of disgust, (d) ROC of surprise,
(e) ROC of neutral, and (f) ROC of anger.
15.5.3 ENSEMBLE SPACE KNN MODEL

The ensemble-based subspace KNN model uses the predictions of individual models, each trained on a random part of the training dataset. By comparing the prediction results of these individual models, the ensemble subspace model classifies the real-time new image that is given for prediction. In our case, the individual model was the KNN model. The accuracy achieved was 95.7%. The confusion matrix of the ensemble space KNN model is represented in Figure 15.11. Moreover, the ROC curves of the respective emotions are represented in Figure 15.12(a)–(f). It was observed that the ROC curves formed a perfect right angle for the emotions that were classified with 100% accuracy.
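The ensemble idea described above can be sketched in plain Python. This is not the MATLAB classification learner used in the chapter: it assumes that the "random parts" are random subsets of the features (the random subspace method), and the function names, the toy four-feature data, the number of learners, and k = 3 are all hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, features, k=3):
    """Plain k-nearest-neighbour vote using only the given feature indices."""
    def squared_distance(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)

    order = sorted(range(len(train_x)),
                   key=lambda i: squared_distance(train_x[i], query))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def build_subspaces(n_features, n_learners, subspace_size, seed=0):
    """Draw one random feature subset per individual KNN learner."""
    rng = random.Random(seed)
    return [rng.sample(range(n_features), subspace_size) for _ in range(n_learners)]


def ensemble_predict(train_x, train_y, query, subspaces, k=3):
    """Each learner votes from its own subspace; the majority label wins."""
    votes = [knn_predict(train_x, train_y, query, feats, k) for feats in subspaces]
    return Counter(votes).most_common(1)[0][0]


# Toy feature vectors standing in for extracted image features (hypothetical).
train_x = [[0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.1], [0.0, 0.1, 0.2, 0.2],
           [0.9, 1.0, 0.8, 0.9], [1.0, 0.9, 0.9, 0.8], [0.8, 0.8, 1.0, 1.0]]
train_y = ["sad", "sad", "sad", "happy", "happy", "happy"]

subspaces = build_subspaces(n_features=4, n_learners=5, subspace_size=2)
prediction = ensemble_predict(train_x, train_y, [0.95, 0.9, 0.85, 0.9], subspaces)
```

Because each learner sees only part of the feature set, the learners make partly independent errors, which is the usual motivation for voting across them.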
FIGURE 15.12 (a) ROC of happy, (b) ROC of sad, (c) ROC of disgust, (d) ROC of surprise,
(e) ROC of neutral, and (f) ROC of anger.
the model proposed in our work can achieve once the number of epochs is increased, which will be our future work.
British Machine Vision Conference, vol. 1, pp. 213–222, 2001.
Buciu, I. and I. Pitas, “Application of Non-Negative and Local Non Negative Matrix Factorization to Facial Expression Recognition.” Proc. of the ICPR, pp. 288–291, Cambridge, UK, August 23–26, 2004.
Cohen, I., N. Sebe, F. Cozman, M. Cirelo, and T. Huang, “Learning Bayesian Network Classifiers for Facial Expression Recognition Using Both Labeled and Unlabeled Data.” Proc. of the IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. I-595–I-604, 2003.
Cohen, I., N. Sebe, A. Garg, L.S. Chen, and T.S. Huang, “Facial Expression Recognition From Video Sequences: Temporal and Static Modeling.” Comput. Vis. Image Understand., vol. 91, pp. 160–187, 2003.
Dornaika F. and F. Davoine, “Simultaneous Facial Action Tracking and Expression Recognition in the Presence of Head Motion.” Int. J. Comput. Vision, vol. 76, no. 3, pp. 257–281, 2008.
Ekman, P. and W.V. Friesen, “Constants across Cultures in the Face and Emotion.” J. Pers. Soc. Psychol., vol. 17, no. 2, pp. 124–129, 1971.
Kotsia I. and I. Pitas, “Facial Expression Recognition in Image Sequences Using Geometric Deformation Features and Support Vector Machines.” IEEE Trans. Image Process., vol. 16, no. 1, pp. 172–187, 2007.
Kumbhar, M., A. Jadhav, and M. Patil, “Facial Expression Recognition Based on Image Feature.” Int. J. Comput. Commun. Eng., vol. 1, pp. 117–119, 2012.
Mehrabian, A., “Communication without Words.” Psychol. Today, vol. 2, no. 4, pp. 53–56, 1968.
Pantic, M. and I. Patras, “Detecting Facial Actions and Their Temporal Segments in Nearly Frontal-View Face Image Sequences.” Proc. IEEE Conf. Syst., Man Cybern., vol. 4, pp. 3358–3363, October 2005.
Pantic M. and J.M. Rothkrantz, “Facial Action Recognition for Facial Expression Analysis from Static Face Images.” IEEE Trans. Systems, Man Cybernet., Part B, vol. 34, no. 3, pp. 1449–1461, 2004.
Pardas, M. and A. Bonafonte, “Facial animation parameters extraction and expression recognition using Hidden Markov Models.” Signal Process.: Image Commun., vol. 17, pp. 675–688, 2002.
Samad, R. and H. Sawada. “Extraction of the Minimum Number of Gabor Wavelet Parameters for the Recognition of Natural Facial Expressions.” Artif. Life Robot., vol. 16, no. 1, pp. 21–31, 2001.
Sarode, N. and S. Bhatia. “Facial Expression Recognition.” Int. J. Comput. Sci. Eng., vol. 2, no. 5, pp. 1552–1557, 2010.
Savoiu, Alexandru, and James Wong. “Recognizing facial expressions using deep learning.” (2017).
Sebe, N., M.S. Lew, Y. Sun, I. Cohen, T. Gevers, and T.S. Huang, “Authentic Facial Expression Analysis.” Image Vis. Comput., vol. 25, pp. 1856–1863, 2007.
Shan, K., J. Guo, W. You, D. Lu, and R. Bie, “Automatic Facial Expression Recognition Based on a Deep Convolutional-Neural-Network Structure.” Proc. of the IEEE 15th International Conference on Software Engineering Research, Management and Applications (SERA), London, pp. 123–128, 2017.
Sharmila A. and P. Geethanjali, “Detection of Epileptic Seizure from EEG Based on Feature Ranking and Best Feature Subset Using Mutual Information Estimation.” J. Med. Imag. Health Informat., vol. 6, 1850–1864, 2016a.
Sharmila A. and P. Geethanjali, “DWT Based Epileptic Seizure Detection from EEG Signals Using Naïve Bayes and KNN Classifiers.” IEEE Access, vol. 4, 7716–7727, 2016b.
Shih, F.Y., Chuang C.F., and Wang P.S.P., “Performance comparisons of facial expression recognition in JAFFE database,” Int. J. Pattern Recognit. Artificial Intell., vol. 22, no. 3, pp. 445–459, 2008.
Shrivastava, D. and L. Bhambu, “Data Classification Using Support Vector Machine.” J. Theor. Appl. Inf. Technol., vol. 12, no. 1, pp. 1–7, 2010.
Tian, Y., T. Kanade, and J. Cohn, “Recognizing Action Units for Facial Expression Analysis.” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 2, pp. 97–115, 2001.
Wang J. and L. Yin, “Static Topographic Modeling for Facial Expression Recognition and Analysis.” Comput. Vis. Image Understand., vol. 108, pp. 19–34, 2007.
Wang, X.-H., A. Liu, and S.-Q. Zhang. “New Facial Expression Recognition Based on FSVM and KNN.” Optik, vol. 126, pp. 3132–3134, 2015.
Zheng, W., X. Zhou, C. Zou, and L. Zhao, “Facial Expression Recognition Using Kernel Canonical Correlation Analysis (KCCA).” IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 233–238, January 2006.
CHAPTER 16
Maximum frequency, Median frequency   52%  55%  54%  48%  98%  98%  98%  98%  97%  96%  76%  81%
Mean, median                          58%  50%  57%  55%  98%  98%  98%  98%  96%  94%  77%  82%
Kurtosis, skewness                    49%  49%  49%  44%  98%  98%  98%  98%  85%  72%  70%  68%
Energy entropy                        49%  50%  49%  52%  87%  89%  98%  98%  91%  91%  71%  81%
Analysis and Interpretation of Uterine Contraction Signals 367
ENHANCED CLASSIFICATION
PERFORMANCE OF
CARDIOTOCOGRAM DATA FOR
FETAL STATE ANTICIPATION
USING EVOLUTIONARY FEATURE
REDUCTION TECHNIQUES
SUBHA VELAPPAN,1* MANIVANNA BOOPATHI ARUMUGAM,2 and
ZAFER COMERT3
1 Department of Computer Science & Engineering,
diagnosis on fetal condition, which may lead to the extent of fetal death. Feature selection is the process in which an optimal subset of features is selected based on some defined criterion which helps to considerably improve the performance of classification in terms of learning speed, accuracy of prediction, simplicity of rules, etc. Also, the reduction in size of feature subset helps to remove noise and irrelevant features. Several approaches have been introduced for improving the performance of computerized classification of CTG data which leads to an improved diagnosis of fetal status. In this chapter, Filter and Wrapper feature selection techniques are applied to the CTG dataset available in the UCI machine learning repository. Evolutionary algorithms such as genetic algorithm, firefly algorithm, and a hybrid technique incorporating information gain and opposition-based firefly algorithm have been used to improve the classification performance of the CTG dataset. The results of simulations show that the proposed methodologies are highly promising when compared to the other existing methods. To assess the performance of these proposed methodologies, various performance measures, namely accuracy, sensitivity (or) recall, specificity, precision (or) positive predictive value, negative predictive value, geometric mean, F-measure, and area under ROC, have been used, and the hybrid model incorporating information gain and opposition-based firefly algorithm shows better performance than the other techniques.

17.1 INTRODUCTION

Healthcare is one of the major sectors which exploits computers and modern techniques of information technology for efficient patient information storage, management, retrieval, documentation, diagnosis, etc. Data mining techniques are employed in clinical decision support systems (CDSS) for efficiently handling this huge amount of healthcare data in order to assist this industry in identifying good practices of patient monitoring, hospital administration, diagnosis, treatment and documentation. This eventually brings the cost down by almost 30% (HealthCatalyst, 2019). However, identifying and employing efficient data mining techniques for this purpose still remains a challenge because of the complex nature of healthcare data and inability to adapt to new technologies. The knowledge-based CDSSs make use of if-then rules in the knowledge base with an inference system and a communication system in order to obtain the inferences by combining the if-then rules with the patient data. The CDSSs which do not rely on knowledge base, utilize the
the fetus to get the FHR. The UC is measured using a catheter, a flexible tube inserted into the uterus.

17.2.1 UCI CTG DATASET

The Centre for Machine Learning and Intelligent Systems, Bren School of Information and Computer Science, University of California at Irvine, USA maintains a free-to-access data bank named UCI Machine Learning Repository (UCI Machine Learning Repository, 2019). The CTG data available in this data bank is one of the widely referred data sets. There are 2126 fetal CTG recordings classified into three classes, namely Normal (N), Suspect (S), and Pathologic (P). There are 21 attributes for each CTG with 1 attribute to represent the class. More details of these attributes and classes can be found in (UCI Machine Learning Repository, 2019).

17.3 CLASSIFIERS AND PERFORMANCE

Classifiers are the models used to

neural networks, etc. Usually, the two stages of the classification process are learning the model from the training data set which has class labels and applying the model to the test set. The table containing the results of classification is called the “confusion matrix.” The confusion matrix shows the number of times the classes are predicted correctly and wrongly, which gives the accuracy of classification. From the confusion matrix, a number of performance measures such as accuracy, error rate, sensitivity, specificity, negative predictive value, geometric mean, precision (or) positive predictive value, F-measure, area under ROC, etc., are evaluated in order to specify the effectiveness of the classifier. Following are the expressions for these performance metrics measured from the confusion matrix given in Table 17.2.

TABLE 17.2 Confusion Matrix

                    Predicted Class A       Predicted Class B
Actual Class A      TRUE POSITIVE (TP)      FALSE NEGATIVE (FN)
Actual Class B      FALSE POSITIVE (FP)     TRUE NEGATIVE (TN)
(17.1)
$$\text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN} = \frac{\text{Number of wrong predictions}}{\text{Total number of predictions}} \tag{17.2}$$

$$\text{Negative predictive value: } NPV = \frac{TN}{TN + FN} \tag{17.3}$$

$$\text{Precision (or) positive predictive value: } PPV = \frac{TP}{TP + FP} \tag{17.4}$$

$$\text{Sensitivity (or) Recall} = \frac{TP}{TP + FN} \tag{17.5}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{17.6}$$

$$\text{Geometric mean: } G_{mean} = \sqrt{\text{specificity} \times \text{sensitivity}} \tag{17.7}$$

$$\text{F-measure} = 2 \times \frac{\text{precision} \times \text{sensitivity}}{\text{precision} + \text{sensitivity}} \tag{17.8}$$

$$\text{Area under ROC} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \tag{17.9}$$

17.4 FEATURE SELECTION

The FS, or variable selection or attribute selection, eliminates the features which are irrelevant, unnecessary, or noisy and, as a result, creates a reduced feature subset. The reduced size of the feature subset helps to improve the performance of classification. FS is done either by scalar selection or by vector selection (Dua and Du, 2011). Since the scalar selection involves selection of features individually, it is a simple but the least efficient method. On the other hand, the vector selection establishes a relationship across the features based on a wrapper or a filter, and hence it is relatively complex and efficient. The filter-based selection of features is done based on the statistical properties of the features (John et al., 1994). Since it does not employ any learning algorithm for this task, it is very simple and fast. However, because it does not consider the dependencies of any feature on other features, it may result in poor classification. The wrapper-based feature selection is done using repeated learning, which looks for the optimal or near-optimal subset and is followed by cross-validation. As a result, it is computationally more expensive but more accurate too.

Methods like information gain, chi-square test, Fisher score, correlation coefficient and variance threshold, gain ratio attribute evaluator (Kantardzic, 2013), information gain attribute evaluator (Liu and Motoda, 2008), etc., are the popular filter-based methods. Genetic algorithms, multiclass SVM classifiers (Ahuja and Yadav, 2012), recursive elimination, sequential selection algorithms, etc., are examples of wrapper-based selection methods.
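As an illustration of the confusion-matrix measures in Eqs. (17.2) to (17.9), the following minimal Python sketch evaluates them for a two-class problem. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the chapter; the geometric mean is taken as the square root of the product of specificity and sensitivity, which is the usual definition.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Evaluate the measures of Eqs. (17.2)-(17.9) from two-class counts."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # recall, Eq. (17.5)
    specificity = tn / (tn + fp)   # Eq. (17.6)
    ppv = tp / (tp + fp)           # precision, Eq. (17.4)
    return {
        "error_rate": (fp + fn) / total,                            # Eq. (17.2)
        "npv": tn / (tn + fn),                                      # Eq. (17.3)
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "gmean": math.sqrt(specificity * sensitivity),              # Eq. (17.7)
        "f_measure": 2 * ppv * sensitivity / (ppv + sensitivity),   # Eq. (17.8)
        "auc": (sensitivity + specificity) / 2,                     # Eq. (17.9)
    }


# Hypothetical counts, chosen only for illustration.
example = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
```

With these illustrative counts the sensitivity is 0.8 and the specificity is 0.9, so the area under ROC of Eq. (17.9) evaluates to 0.85.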
380 Handbook of Artificial Intelligence in Biomedical Engineering
nature of fitness function has the influence in the light intensity of a firefly.

The overall idea of FA is based on the relationship that the light intensity (I) is in inverse-square relationship with the distance (x). It means that the light from a source of intensity Is is seen brighter when the distance from which it is seen decreases. This relationship can be mathematically written as

$$I(x) = \frac{I_s}{x^2} \tag{17.10}$$

Hence, the fitness or objective function of FA is evaluated in such a way that the solutions are represented by the light intensity of each firefly, which is directly proportional to the value of the fitness or objective function.

In FA, a random initial population is initialized with the defined values of parameters such as randomization parameter (r), attractiveness (a) and coefficient of absorption (c). With these arrangements, the solutions are searched by determining the fitness values continuously for the given number of iterations.

The light intensity of a firefly seen by another firefly in a medium varies with distance (x) as

$$I = I_s e^{-cx} \tag{17.11}$$

where Is is the intensity of light at source.

Hence, it can be written from the above two equations of light intensity that

$$I(x) = I_s e^{-cx^2} \tag{17.12}$$

However, the attractiveness equation relating it with the light intensity can be written using the attractiveness at zero distance (as) as

$$a = a_s e^{-cx^2} \tag{17.13}$$

The Euclidean distance between two fireflies, namely py and pz, can be written by representing the mth component of spatial coordinate py as py,m and of pz as pz,m:

$$x_{yz} = \lVert p_y - p_z \rVert = \sqrt{\sum_{m=1}^{n} \left(p_{y,m} - p_{z,m}\right)^2} \tag{17.14}$$

The attraction of firefly y by another firefly z can be written using ψy as a vector containing Gaussian distributed random numbers in the space [0,1] as

$$p_y = p_y + a_s e^{-c x_{yz}^2}\left(p_z - p_y\right) + r\psi_y \tag{17.15}$$

The present state of the yth firefly, the attraction of the yth firefly by another firefly and the movement of the yth firefly in a random manner are the three factors which are considered to represent the firefly's movement.
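The movement rule of Eq. (17.15), together with the distance of Eq. (17.14) and the attractiveness of Eq. (17.13), can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the random vector ψy is drawn here from a uniform distribution on [0, 1), which is an assumption made for simplicity rather than the chapter's stated choice.

```python
import math
import random


def distance(p_y, p_z):
    """Euclidean distance x_yz between two fireflies, Eq. (17.14)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_y, p_z)))


def attractiveness(a_s, c, x):
    """Attractiveness at distance x, Eq. (17.13)."""
    return a_s * math.exp(-c * x ** 2)


def move_firefly(p_y, p_z, a_s, c, r, rng):
    """One move of firefly y towards firefly z, Eq. (17.15)."""
    a = attractiveness(a_s, c, distance(p_y, p_z))
    # Assumption: psi_y is sampled uniformly in [0, 1) for this sketch.
    psi = [rng.random() for _ in p_y]
    return [py + a * (pz - py) + r * ps for py, pz, ps in zip(p_y, p_z, psi)]


# With no absorption (c = 0), full attractiveness (a_s = 1) and no
# randomization (r = 0), firefly y lands exactly on firefly z.
moved = move_firefly([0.0, 0.0], [3.0, 4.0], a_s=1.0, c=0.0, r=0.0,
                     rng=random.Random(0))
```

In a full FA run this move would be applied for every pair of fireflies in which z is brighter than y, and repeated for the chosen number of iterations.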
TABLE 17.4 (Continued)
Performance Metrics (%)    Without FS    With FA-based FS
G-Mean                     83.77         89.19
F-Measure                  78.08         83.94
Area under ROC             84.00         89.30

Tables 17.3 and 17.4 show that the FA-based FS with SVM exhibits an appreciable improvement of performance in all aspects. The measures of these performance metrics are also presented graphically in Figures 17.4 and 17.5.
FIGURE 17.4 Accuracy of SVM classification with and without FA-based FS.
FIGURE 17.5 Other performance metrics of SVM with and without FA-based FS.
FA, Lévy-flight FA, Jumper FA, chaotic FA and self-adaptive step FA (Uzer et al., 2013). Another efficient modified FA is named opposition-based FA which uses opposition-based learning (OBL) (Schiezaro and Pedrini, 2013; Draa et al., 2015; Subha and Murugan, 2016; Tizhoosh, 2006; Xu et al., 2011; Yu et al., 2015). The other optimization algorithms such as GA, ACO, PSO, bio-geography optimization,
presence and absence of a feature in the data set is represented by 1 and 0, respectively. The data set is divided into two parts in such a way that three-fourths of the data set is used for training the classifier and the remaining one-fourth is used to test it. The FA parameters such as objective function, initial population, randomization, attractiveness, coefficient of absorption, and number of iterations are taken to be the same as those of the standard FA used earlier. Also, 25 trials of simulations were performed using OBFA and the best of these 25 runs are presented.

17.6.4 RESULTS OF SIMULATION EXPERIMENTS

Two SVM classification experiments are performed using the full data set and the data set reduced by OBFA. The accuracies of these two classifications are presented in Table 17.6. Further, the other measured performance metrics for all these three classifiers are consolidated in Table 17.7.
TABLE 17.6 Average Accuracy of SVM With and Without OBFA-based FS
Data set                                            Average accuracy (%)
Actual full feature set (without FS)                88.75
Reduced optimal feature set (with OBFA-based FS)    92.85

TABLE 17.7 Other Performance Metrics of SVM With and Without OBFA-based FS
Performance Metrics (%)    Without FS    With OBFA-based FS
Sensitivity                77.79         83.81
Specificity                90.22         93.72
PPV                        78.29         85.45
NPV                        90.70         95.02
G-mean                     83.77         88.62
F-measure                  78.08         84.62
Area under ROC             84.00         88.76

It is found that the average accuracy is 88.75% with full feature set and the same is achieved as 91.92% with optimal feature set produced by FA and as 92.85% with optimal feature set produced by OBFA.

Figures 17.8 and 17.9 are the graphical presentations of the results of OBFA-based SVM classifier for UCI CTG data set.
FIGURE 17.8 Average accuracy of SVM with and without OBFA-based FS.
FIGURE 17.9 Other performance metrics of SVM with and without OBFA-based FS.
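The gain obtained with OBFA-based FS can be expressed as a percentage change of each metric in Table 17.7 with respect to its value without FS. The short Python sketch below computes these changes; treating the gain as a relative change, (after − before) / before × 100, is an assumption made here, and the dictionary and function names are illustrative only.

```python
# Values transcribed from Table 17.7 (in %).
WITHOUT_FS = {"Sensitivity": 77.79, "Specificity": 90.22, "PPV": 78.29,
              "NPV": 90.70, "G-mean": 83.77, "F-measure": 78.08,
              "Area under ROC": 84.00}
WITH_OBFA = {"Sensitivity": 83.81, "Specificity": 93.72, "PPV": 85.45,
             "NPV": 95.02, "G-mean": 88.62, "F-measure": 84.62,
             "Area under ROC": 88.76}


def relative_gain(before, after):
    """Percentage change of a metric relative to its baseline value."""
    return (after - before) / before * 100


gains = {name: round(relative_gain(WITHOUT_FS[name], WITH_OBFA[name]), 2)
         for name in WITHOUT_FS}
```

Under this assumption the sensitivity improves by 7.74% and the specificity by 3.88% relative to the full feature set.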
2. Sensitivity         7.74
3. Specificity         3.88
4. PPV                 9.15
5. NPV                 4.76
6. G-Mean              5.79
7. F-Measure           8.38
8. Area under ROC      5.67

The OBFA-based FS has resulted in a maximum of 8.8% increase in PPV and the minimum of 3.9% increase

It is always important that any feature in a data set which contains useful information about it should not be ignored during the feature selection process. Removing such an apposite feature will lead to poor classification and thereby poor prediction too. Hence, it is a good practice to employ some techniques to evaluate the relevance of each feature to the data set before ignoring it for feature reduction (Zhang et al., 2018). One of the successful filter-based techniques for assessing the relevance of a feature to its associated data set is IG (Sui, 2013; Azhagusundari and Thanamani, 2013; Mitchell, 1997; Porkodi, 2014; Subha et al., 2017).

In order to improve the classification performance of the OBFA-based SVM classifier further, a new melded method is presented here which employs IG with the OBFA for the SVM classifier to classify the UCI CTG data set.

In the IG-OBFA-based feature selection process, the IG of all features in the data set is determined and these features are arranged in descending order based on their IG values. Then, the top 15 features are taken as the reduced feature set and presented as the initial population for the OBFA, with 1's and 0's representing the presence and absence of a feature in the data set, respectively, in order to produce the optimum feature set. As performed in FA and OBFA, 25 trials were done using IG-OBFA too and the best of the results are presented here.

17.7.2 RESULTS OF SIMULATION EXPERIMENTS

As done in the previous experiments, the training and testing data are selected in a 75:25 ratio from the full data set. Classification experiments are performed with the full feature set, the reduced feature set produced by IG only, and the reduced optimum feature set produced by IG-OBFA, and the results are presented below.

TABLE 17.9 Average Accuracy of SVM Without FS, with IG and with IG-OBFA-based FS
Data Set                                       Average Accuracy (%)
Actual full feature set (without FS)           88.75
Reduced feature set (with IG-based FS)         89.47
Reduced feature set (with IG-OBFA-based FS)    96.24

Table 17.9 shows that using IG alone for feature selection has slightly increased the average accuracy compared with using the full feature set. However, there has been a great improvement in average accuracy from 88.75% to 96.24% when using IG-OBFA-based FS instead of using a full feature set. The other performance metrics such as specificity, sensitivity, PPV, NPV, G-mean, F-measure and area under ROC are also measured for these classifications and presented in Table 17.10.

Figures 17.11 and 17.12 are the graphical presentations of the results of IG and IG-OBFA-based SVM classifiers for the classification of UCI CTG data set.
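The IG ranking step of the IG-OBFA process (compute the IG of every feature, sort in descending order, keep the top-ranked features) can be sketched as follows. This is a minimal illustration under stated assumptions: the features are treated as discrete values, so continuous CTG attributes would first need to be discretized, and the function names are hypothetical rather than the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG of one discrete feature: class entropy minus conditional entropy."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Indices of the k features with the highest IG, in descending IG order."""
    scored = [(information_gain(col, labels), j) for j, col in enumerate(columns)]
    scored.sort(key=lambda s: (-s[0], s[1]))   # ties broken by feature index
    return [j for _, j in scored[:k]]


# Toy example: feature 1 determines the class, feature 0 is uninformative.
labels = [0, 0, 1, 1]
columns = [[0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
selected = top_k_features(columns, labels, k=2)
```

In the setting described above, k would be 15 and the selected indices would define the initial 1/0 feature mask handed to OBFA.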
TABLE 17.10 Other Performance Metrics of SVM Without FS, with IG and with IG-OBFA-based FS
Performance Metrics (%)    Without FS    With IG-based FS    With IG-OBFA-based FS
Sensitivity                77.79         81.07               96.26
Specificity                90.22         91.14               91.92
PPV                        78.29         78.48               93.33
NPV                        90.70         91.29               97.44
G-mean                     83.77         85.96               94.06
F-measure                  78.08         79.75               92.61
Area under ROC             84.00         86.11               94.09

FIGURE 17.11 Average accuracy of SVM without FS, with IG and with IG-OBFA-based FS.
FIGURE 17.12 Other performance metrics of SVM without FS, with IG and with IG-OBFA-based FS.
TABLE 17.11 Performance Measures of Classification Using Various Feature Selection Methods for UCI CTG Data Set
Performance       Without FS                  With FS
Metrics (%)                                   Filter Techniques               Wrapper Techniques
                  DT     MLP    NB     SVM    Chi-Squared  Gain Ratio  IG     GA     FA     OBFA   IG-OBFA
Average Accuracy  88.15  83.45  79.69  88.75  87.40        86.46       89.47  91.35  91.92  92.85  96.24
Sensitivity       78.60  72.21  69.74  77.79  74.77        74.00       81.07  80.71  84.83  83.81  91.92
Specificity       90.99  90.12  86.83  90.22  89.33        89.2        91.14  92.50  93.78  93.72  96.26
PPV               75.94  71.60  62.38  78.29  74.92        72.44       78.48  83.06  83.14  85.45  93.33
NPV               90.09  86.43  83.11  90.70  89.78        88.89       91.29  93.77  93.26  95.02  97.44
G-Mean            84.12  79.93  77.34  83.77  80.90        80.46       85.96  85.92  89.19  88.62  94.06
F-Measure         77.14  71.45  65.28  78.08  74.78        73.03       79.75  81.87  83.94  84.62  92.61
Area under ROC    84.79  81.17  78.29  84.00  82.05        81.60       86.11  86.61  89.30  88.76  94.09
Computer Science and Information Systems (FedCSIS), 2013, 769–774.
Bachrach, RG.; Navot, A.; Tishby, N. Margin based feature selection-theory and algorithms. Proceedings of the Twenty-First International Conference on Machine Learning. 2004, 43–51.
Banati, H.; Bajaj, M. Firefly based feature selection approach. International Journal of Computer Science Issues. 2011, 8(4), 473–480.
Bharathi, PT.; Subashini, P. Differential evolution and genetic algorithm based feature subset selection for recognition of river ice types. Journal of Theoretical & Applied Information Technology. 2014, 67(1), 254–262.
Bhatia, S.; Prakash, P.; Pillai, GN. SVM based decision support system for heart disease classification with integer-coded genetic algorithm to select critical features. Proceedings of the World Congress on Engineering and Computer Science (WCECS), 2008, 22–24.
Bhatla, N.; Jyoti, K. An analysis of heart disease prediction using different data mining techniques. International Journal of Engineering. 2012, 1(8), 1–4.
Buck, TE.; Zhang, B. SVM kernel optimization: An example in yeast protein subcellular localization prediction. Project Report, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 2006.
Chitradevi, Muthusamy.; Sundar, Chinnasamy.; Geetharamani, Gopal. An outlier based Bi-level neural network classification system for improved classification of cardiotocogram data. Life Science Journal, 2013, 10(1), 244–251.
Chuang, LY.; Jhang, HF.; Yang, CH. Feature selection using complementary particle swarm optimization for DNA microarray data. Proceedings of International Conference of Engineers and Computer Scientists, Hong Kong, 2013.
Chudáček, V.; Spilka, J.; Huptych, M.; Georgoulas, G.; Lhotská, L.; Stylios, C.; Koucky, M.; Janku, P. Linear and non-linear features for intrapartum cardiotocography evaluation. Computing in Cardiology, 2010, 37, 999–1002.
Chudacek, V.; Spilka, J.; Rubackova, B.; Koucky, M.; Georgoulas, G.; Lhotska, L.; Stylios, C. Evaluation of feature subsets for classification of cardiotocographic recordings. Computers in Cardiology, 2008, 845–848.
Cömert, Zafer.; Kocamaz, AF.; Subha, Velappan. Prognostic model based on image-based time-frequency features and genetic algorithm for fetal hypoxia assessment. Computers in Biology and Medicine, 2018, 99, 85–97.
Cömert, Z.; Kocamaz, AF. Using wavelet transform for cardiotocography signals classification, 25th Signal Processing and Communications Applications Conference (SIU), Turkey, 2017.
Cömert, Zafer.; Yang, Zhan.; Subha, Velappan.; Kocamaz, Adnan Fatih.; Manivanna Boopathi, Arumugam. Performance evaluation of empirical mode decomposition and discrete wavelet transform for computerized hypoxia detection and prediction. 26th IEEE Signal Processing and Communication Applications (SIU) Conference, Turkey, 2018a.
Cömert, Zafer.; Yang, Zhan.; Subha, Velappan.; Kocamaz, Adnan Fatih.; Manivanna Boopathi, Arumugam. The influences of different window functions and lengths on image-based time-frequency features of fetal heart rate signals. 26th IEEE Signal Processing and Communication Applications (SIU) Conference, Turkey, 2018b.
Deekshatulu, BL.; Chandra, P. Classification of heart disease using k-nearest neighbor and genetic algorithm. Procedia Technology. 2013, 10, 85–94.
and Communication Journal. 2008, 3(3), 118–121.
Wagholikar, Kavishwar.; V, Sundararajan.; Ashok, Deshpande. Modeling paradigms for medical diagnostic decision support: a survey and future directions. Journal of Medical Systems, 2012, 36(5), 3029–3049.
Warrick, PA.; Hamilton, EF.; Kearney, RE.; Precup, D. Classification of normal and hypoxic fetuses using system identification from intrapartum cardiotocography. IEEE Transactions on Biomedical Engineering, 2010, 57(4), 771–779.
Xin-She, Yang. Nature-Inspired Metaheuristic Algorithms. 2nd edition, Luniver Press, UK, 2010.
Xue, B.; Cervante, L.; Shang, L.; Browne, WN.; Zhang, M. A multi-objective particle swarm optimisation for filter-based feature selection in classification problems, Connection Science, 2012, 24(2–3), 91–116.
Xue, B.; Zhang, M.; Browne, WN. Multi-objective particle swarm optimisation (PSO) for feature selection. Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, 2012, 81–88.
Xue, B.; Zhang, M.; Browne, WN. New fitness functions in binary particle swarm optimisation for feature selection. IEEE Congress on Evolutionary Computation, 2012, 1–8.
Xu, Q.; Wang, L.; Baomin, H.; Wang, N. Modified opposition-based differential evolution for function optimization. Journal of Computational Information Systems, 2011, 7(5), 1582–1591.
Yu, Shuhao.; Zhu, Shenglong.; Ma, Yan.; Mao, Demei. Enhancing firefly algorithm using generalized opposition-based learning. Computing. 2015, 97(7), 741–754.
Zhang, Zhongheng.; Trevino, Victor.; Hoseini, Sayed Shahabuddin.; Belciug, Smaranda.; Manivanna Boopathi, Arumugam.; Gorunescu, Florin.; Subha, Velappan. Variable selection in logistic regression model with genetic algorithm. Annals of Translational Medicine. 2018, 6(3):45.
Zhuo, L.; Zheng, L.; Li, X.; Wang, F.; Ai, B.; Qian, J. A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine. Geoinformatics and Joint Conference on GIS and Built Environment: Classification of Remote Sensing Images, 2008, 71471, 71471J.
CHAPTER 18
DEPLOYMENT OF SUPERVISED
MACHINE LEARNING AND DEEP
LEARNING ALGORITHMS IN
BIOMEDICAL TEXT CLASSIFICATION
G. KUMARAVELAN* and BICHITRANANDA BEHERA
Department of Computer Science, Pondicherry University,
Karaikal, India
*Corresponding author. E-mail: gkumaravelanpu@gmail.com
criteria are used, and most of the DT classifiers use a single-attribute split in which one attribute is employed to perform the division (Aggarwal, 2012). The attribute or term whose information gain is highest is considered as the base node, and the procedure is repeated successively for choosing the remaining nodes. Meanwhile, in the testing phase, to predict the category label of a new untagged document, the DT classifier tests the terms of the document against the DT, ranging from the root node (base node) until it reaches a leaf node, and assigns the category label of the leaf node.

18.3.2 NAÏVE BAYES (NB) CLASSIFIER

NB classifier is a probabilistic classifier based on Bayesian posterior probability distribution. It assumes, through conditional probability, that the attributes are independent of one another. There are two variants of the NB classifier, namely the multivariate Bernoulli model (B_NB) and the multinomial model (M_NB) (McCallum & Nigam). The multivariate Bernoulli naive Bayes model works only on binary data. Hence, in the document preprocessing steps, each attribute corresponding to the list of documents in VSM must be either one or zero depending on the presence or absence of that particular attribute in that document (Lewis, 1998). However, the multinomial model works on the frequencies of attributes available in the VSM representation of the documents (McCallum, 1998). If the vocabulary size is small, then the Bernoulli model performs better than the multinomial model.

18.3.3 K-NEAREST NEIGHBORHOOD CLASSIFIER (K-NN)

Most of the classifiers in the literature spend more time in the training phase for building the classification model and are therefore considered eager learners. However, the k-NN classifier spends more time in the testing phase for predicting the category label of a new untagged test document. Hence, it is known as a lazy learner. In the training phase of the model construction, the k-NN classifier stores all the training documents together with their target class. Meanwhile, in the testing phase, once any new test document whose target class is unknown comes for classification, the k-nearest-neighborhood classifier finds the distance of the test document from all the training documents and assigns the category label of the training documents that are nearest or most like the unknown
document (Sebastiani, 2002; Han et al., 2001). For this reason, the k-NN classifier is thought of as an instance-based learning algorithm (Han et al., 2001). Euclidean distance and cosine similarity are the most frequently used approaches for measuring the similarity quotient to find the NN.

18.3.4 SUPPORT VECTOR MACHINE (SVM)

SVM is a kind of classifier that has the potential to classify both linear and nonlinear data (Cortes & Vapnik, 1995). The core idea behind the SVM classifier is that it first non-linearly maps the initial training data into a sufficiently higher dimension, say n, so that the data within the higher dimension is separated simply by an (n−1)-dimensional decision surface known as a hyperplane. Out of all hyperplanes, the SVM classifier determines the best hyperplane, the one that has the maximum margin from the support vectors. Thanks to the nonlinear mapping, the SVM classifier works efficiently on a large data set and has been successfully applied in text classification (Drucker, 1999).

18.3.5 ARTIFICIAL NEURAL NETWORK (ANN)

ANN is a kind of nonlinear data processing model resembling the structure of the brain, and it can learn from the available training data to perform tasks like categorization, prediction or forecasting, decision-making, visualization, and others. It consists of a collection of nodes, otherwise known as neurons, that are the centre of data processing in ANN. Depending on the problem statement, these neurons are organized into three different layers, specifically the input layer, the output layer, and the hidden layer. Within the context of text classification, the number of words or terms defines the number of neurons in the input layer, and the classes (class labels) of documents define the number of neurons in the output layer. An ANN has a minimum of one input layer and one output layer; however, it may have several hidden layers depending on the chosen problem. All links from the input layer to the output layer through the hidden layers are assigned weights that represent the dependence relation between the nodes. Once a neuron gets the weighted data, it calculates the weighted sum, and a well-known activation function processes it. The output value from the activation function is fed forward to all the neurons in the next layer to map the proper neuron in the output layer. Some examples of well-known activation functions are Binary step, Sigmoid, TanH, Softmax, and
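The feed-forward computation described above (weighted sum, activation, and propagation towards the output layer) can be sketched in plain Python. This is a minimal illustration with one hidden layer; the bias terms, the choice of Sigmoid for the hidden layer and Softmax for the output layer, and all function names are assumptions made for the sketch and are not specified in the text.

```python
import math


def sigmoid(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Softmax activation, shifted by the maximum for numerical stability."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sums(inputs, weights, biases):
    """Weighted sum of the inputs for every neuron of a layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Feed the input forward through one hidden layer to the output layer."""
    hidden = [sigmoid(z) for z in weighted_sums(inputs, hidden_w, hidden_b)]
    return softmax(weighted_sums(hidden, out_w, out_b))


# Two input terms, two hidden neurons, two output classes (toy weights).
probs = forward([1.0, 2.0],
                hidden_w=[[0.0, 0.0], [0.0, 0.0]], hidden_b=[0.0, 0.0],
                out_w=[[1.0, 1.0], [0.0, 0.0]], out_b=[0.0, 0.0])
```

In a text classifier the input vector would hold one value per term and the output vector one probability per document class.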
Deployment of Supervised Machine Learning and Deep Learning Algorithms 411
and their descriptions are detailed below:
• BioCreative Corpus III (BC3): The BC3 dataset was created by the BioCreative III interactive task of the BioCreative workshop that was conducted in 2010. The BC3 dataset is divided into BC3-part 1 and BC3-part 2 datasets. Both BC3-part 1 and BC3-part 2 datasets are originally in XML format and have size 32.5 MB and 46.5 MB, respectively. For document classification, the abstract and respective class label of each document are extracted from the XML file and represented in a CSV file. For BC3-part 1, the CSV file is of size 3.12 MB and represents 2280 article abstracts with the class label of each abstract. Similarly, the BC3-part 2 CSV file has size 5.73 MB and holds 4000 article abstracts with their class labels. This dataset is available at https://biocreative.bioinformatics.udel.edu/resources/corpora/biocreative-iii-corpus/
• Farm Ads dataset: This dataset contains 4143 farm ad text documents that represent various topics of farm animals. This is a binary classification problem where each of the documents or content either approves the ads or not. This dataset has size 12.4 MB and is available at the UCI ML repository (Lichman, 2013).
• TREC 2006 Genomics Track dataset: This dataset is the collection of biomedical full-text HTML documents from 49 journals in the area of Genomics Track. In this experiment, 1077 biomedical article abstracts or documents are collected from five journals. The number of documents collected from each of the five journals is presented in Table 18.2.

TABLE 18.1 Summary of Four Biomedical Text Datasets
Dataset                     Classes    Number of Documents
BC3-part 1                  2          2280
BC3-part 2                  2          4000
Farm Ads                    2          4143
TREC 2006 Genomics Track    5          1077

TABLE 18.2 TREC 2006 Genomics Track Dataset
Journal Name                                 No. of Documents
Cerebral Cortex CC                           201
Glycobiology GLY                             203
Alcohol and Alcoholism AA                    202
International Journal of Epidemiology IJE    206
International Immunology II                  265
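The BC3 preparation step described above, in which each abstract and its class label are extracted from the XML file and written to a CSV file, could look roughly like the following Python sketch. The element names `document`, `abstract` and `label`, as well as the sample XML string, are hypothetical placeholders; the actual BC3 schema is not given in the text and would have to be substituted.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text, doc_tag="document", text_tag="abstract",
                label_tag="label"):
    """Collect (abstract, class label) pairs; tag names are assumptions."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter(doc_tag):
        abstract = (doc.findtext(text_tag) or "").strip()
        label = (doc.findtext(label_tag) or "").strip()
        rows.append((abstract, label))
    return rows


def rows_to_csv(rows):
    """Serialize the pairs as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["abstract", "label"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical two-document sample in the assumed layout.
SAMPLE = """<collection>
  <document><abstract>Text A</abstract><label>1</label></document>
  <document><abstract>Text B</abstract><label>0</label></document>
</collection>"""

rows = xml_to_rows(SAMPLE)
csv_text = rows_to_csv(rows)
```

The resulting CSV text has one header line followed by one line per document, which matches the two-column abstract/label layout described for the BC3 CSV files.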
Classifiers Parameters
RC metric="Euclidean" shrink_threshold="None"
improvement among those classifiers to adapt well in connection to the large-scale dataset. As a result, application of deep learning-based models like multilayer feedforward neural networks, convolutional neural networks, recurrent neural networks and ensemble deep learning models becomes an inevitable avenue of further research.

KEYWORDS

text mining
machine learning
documents classification
information retrieval
information extraction

REFERENCES

Aggarwal, C.C.; Zhai, C. X. Mining Text Data, Springer. 2012.
Almeida, H.; Meurs, M. J.; Kosseim, L.; Butler, G.; Tsang, A. Machine learning for biomedical literature triage, PLoS One. 2014, 9(12).
Chakrabarti, S.; Roy, S.; Soundalgekar, M. Fast and accurate text classification via multiple linear discriminant projections, VLDB Journal. 2003, 12(2), 172–185.
Cohen, AM. An effective general purpose approach for automated biomedical document classification. AMIA Annual Symposium Proceedings. 2006, 161–165.
Cortes, C.; Vapnik, V. Support-vector networks. Machine Learning. 1995, 20, 273–297.
Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online passive aggressive algorithms, Journal of Machine Learning Research. 2006, 7, 551–585.
Deerwester, S.; Dumais, S.; Landauer, T.; Furnas, G.; Harshman, R. Indexing by latent semantic analysis. JASIS. 1990, 41(6), 391–407.
Drucker, H.; Wu, D.; Vapnik, V. Support vector machines for spam categorization, IEEE Transactions on Neural Networks. 1999, 10(5), 1048–1054.
Pedregosa, F. et al. Scikit-learn: Machine learning in Python, Journal of Machine Learning Research. 2011, 12, 2825–2830.
Fisher, R. The use of multiple measurements in taxonomic problems. Annals of Eugenics. 1936, 7, 179–188.
García, M.A.M.; Rodríguez, R.P.; Rifón, L.E.A. Biomedical literature classification using encyclopedic knowledge: a Wikipedia-based bag-of-concepts approach. PeerJ. 2015, 3, e1279.
Han, E.S.; Karypis, G.; Kumar, V. Text categorization using weight adjusted k-nearest neighbor classification. Springer. 2001.
He, J.; Ding, L.; Jiang, L.; Ma, L. Kernel ridge regression classification. Proceedings of the International Joint Conference on Neural Networks. 2014, 2263–2267.
Hofmann, T. Probabilistic latent semantic indexing. ACM SIGIR Conference, 1999.
Howland, P.; Jeon, M.; Park, H. Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition. SIAM Journal of Matrix Analysis and Applications. 2003, 25(1), 165–179.
Howland, P.; Park, H. Generalizing discriminant analysis using the generalized singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004, 26(8), 995–1006.
Hull, D.A. Stemming algorithms: A case study for detailed evaluation. JASIS 47. 1996, 1, 70–84.
Jiang, X.; Ringwald, M.; Blake, J.; Shatkay, H. Effective biomedical document classification for identifying publications
PD and sent through access points to outside networks wirelessly. The communication standards used in this tier include Bluetooth/Bluetooth low energy, ZigBee, ultra-wide band (UWB), cellular, and WLAN. The outside users can communicate with the BAN using a gateway. Therefore, the medical supervisors remotely attending the patients can get the information immediately through wireless communication or the Internet. Thus, the state of the patient is clearly monitored, and in emergency conditions, the ambulance that is connected to the outside WBAN is informed. The important design areas in the WBAN architecture are (a) sensors, energy or power, and network hardware components that help in the design of body sensor nodes, positioning and locating the sensor nodes in WBAN, signal processing, data storage and feedback mechanism, power source, energy harvesting technologies, dynamic control, and antenna design. (b) Protocol stack for radio and wireless transmissions, channel modeling, interfaces with other wireless communication standards, interference, efficient medium access control (MAC) protocols, error correction methods, and cross-layer techniques. (c) Position and mobility, which deal with the position and movable property of the sensor nodes. (d) Security issues related to integrity, confidentiality, authentication, and secured communication.
algorithms are addressed for BANs. This research points towards the clustering algorithms to study the existing clustering methods and how to improve the performance of these algorithms with respect to network lifetime, throughput, and packet delivery ratio on specific environmental conditions.

19.3 RELATED WORKS

A lot of methods have been developed toward energy conservation of nodes in the WSNs. Diverse mechanisms have been suggested in the literature for conserving the energy in WSNs. Duty cycling and data-driven approaches (Giuseppe et al., 2009; Rezaei et al., 2012) are mainly applied in sensor nodes. The duty cycling approach switches the transceiver to the sleep mode when data are not transmitted and makes the nodes ready to receive the information as soon as it is available. The time duration of the nodes in the active state is called a duty cycle. A collection of energy-efficient MAC protocols has evolved (Pei et al., 2013; Demirkol et al., 2006; Naik., 2004; Batra., 2016) to preserve the energy. Mobility (Sara et al., 2014) also plays a role in energy conservation in sensor networks. As the traffic around the sink in a network is always greater than that in the rest of the area, the nodes around the base station die soon. Therefore, mobile base stations can be used to collect the data. Another approach to conserve the energy is the clustering approach, in which the information read by the sensor nodes is directly routed to the cluster heads. A considerable amount of energy will be spent to transmit the data depending on the distance between the sensor node and the sink. Therefore, the far-away nodes would drain earlier. This will affect the performance of the network at the initial rounds. The clustering technique perhaps avoids this situation by selecting the higher energy nodes as cluster heads, and the remaining nodes send the data to the nearby cluster heads. The cluster head in turn collects and aggregates the data and sends it to the base station. Therefore, minimum energy is required to transmit the data to the nearest cluster head. Data aggregation (Nakamura et al., 2007) is a useful method performed by the cluster heads so that the redundant data are eliminated instead of being sent to the sink.

Many of the clustering algorithms (Heinzelman et al., 2002; Manjeshwar et al., 2002; Younis et al., 2004; Qing et al., 2006) focus on the selection of the cluster heads between the sensor devices. LEACH (Heinzelman et al., 2002) is the pioneer in protocols used for clustering the WSN nodes. In the
434 Handbook of Artificial Intelligence in Biomedical Engineering
LEACH protocol, the nodes are selected as cluster heads depending on a probability value. Each node i is assigned a probability P_i(t) at time t. A sensor is elected as cluster head only when it has not been a cluster head in the most recent r mod (N/k) rounds, so that it presumably has higher energy than the other sensors. The probability of becoming a cluster head is thus calculated as:

$$P_i(t) = \begin{cases} \dfrac{k}{N - k \left( r \bmod \frac{N}{k} \right)}, & C_i(t) = 1 \\ 0, & \text{otherwise} \end{cases} \tag{19.1}$$

where N is the number of nodes present in the WSN, r is the current round, k is the expected number of cluster heads at round r, and C_i(t) = 1 indicates that node i has not been a cluster head in the most recent r mod (N/k) rounds. Variants of LEACH (Salim et al., 2014; Batra et al., 2016; Arumugam et al., 2015) were developed later with improved performance. A coordinator-based cluster head election method was proposed (Wu et al., 2011), and the network performed better with it. Facility location theory has been used to solve the uncapacitated facility location problem (Jain et al., 2011), and the resulting clustering model saved energy to a certain extent. Another distributed clustering scheme, HEED (Younis et al., 2004), selects cluster heads from the deployed sensors based on a hybrid of communication cost and energy. In HEED, each sensor is directly connected with only one cluster head. Ding et al. (2005) devised another algorithm, DWEHC, to overcome the drawback of HEED. Every node finds its weight after identifying the neighbor nodes in its vicinity. The weight is a combination of energy and closeness to the neighbors. The node having the largest weight among the others is elected as the cluster head. Even though HEED and DWEHC look similar, the cluster heads are more evenly distributed in DWEHC than in HEED.

WSNs with nodes having different energy levels at the beginning are called heterogeneous WSNs. Two classes of sensors with dissimilar energy levels are employed in the Stable Election Protocol (Smaragdakis et al., 2004), and they are called normal nodes and advanced nodes. The advanced nodes have (1 + α) times the energy of the normal nodes. The threshold value for finding the eligible cluster heads is calculated based on the weighted probabilities. Another distributed energy-efficient clustering protocol has been developed for heterogeneous WSNs, called distributed energy efficient clustering (DEEC) (Qing et al., 2006). In DEEC, the ratio between the residual energy of each node and the average energy of the network is used as the probability for selecting the cluster head. A node having more initial and residual energy will be
Energy Efficient Optimum Cluster Head Estimation 435
radio models are classified into the multipath and free space models. When the radio signals propagate from the source and reach the receiving antenna over two or more paths, the phenomenon is known as multipath propagation. Multipath is caused by ionospheric reflection and refraction, atmospheric ducting, and reflection from water bodies and terrestrial objects like mountains and buildings. Phase shifting of the signal and constructive and destructive interference are the effects of multipath. Multipath signals are received in a terrestrial environment, that is, where different types of propagation are present and the signals reach the receiving station via different paths. Therefore, multipath interference occurs here and causes multipath fading. In the free space propagation model, the transmitting and receiving antennas are assumed to be in an obstacle-free environment; absorbing obstacles and reflecting surfaces are not considered. The distance d0 is calculated as

$$d_0 = \sqrt{\frac{E_{fs}}{E_{mp}}} \tag{19.2}$$

where E_fs is the energy needed to send data in free space and E_mp is the energy needed for sending the data in multipath networks. Therefore, it is assumed that E_fs and E_mp are the amplifier types in the respective media. The distance d0 is considered as a threshold value for selecting the media.

Energy used for sending the data:

Let E_T be the energy required to transmit a packet of size P over a distance d. The values of d and d0 are used to select the media for sending the packet. If d ≤ d0, the free space amplifier type is used for sending the data; otherwise, the multipath amplifier is considered.

If d ≤ d0

$$E_T = (E_{el} \times P) + (E_{fs} \times P \times d^2) \tag{19.3}$$

else

$$E_T = (E_{el} \times P) + (E_{mp} \times P \times d^4) \tag{19.4}$$

where E_el = E_t + E_AG. E_AG is the energy used for performing aggregation. E_AG is considered only for cluster heads; for the normal sensors, this value is zero. The cluster heads collect the data from the nodes within their cluster, process and aggregate it, and transmit a single data item to the sink instead of sending the complete data received from each node. This reduces the energy spent on transmitting the data. E_t is the energy needed for transmitting 1 bit/m².

Energy requirement for receiving the data:
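The transmission-energy model of Eqs. (19.2)-(19.4) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the chapter's simulation code: the function names and all numeric constants are hypothetical placeholders of the kind often used with first-order radio models, and the threshold d0 is taken as the square root of E_fs/E_mp so that both branches give the same energy at d = d0.

```python
import math

# Assumed illustrative constants in joules per bit (not taken from the chapter).
E_T = 50e-9        # electronics energy per bit (Et)
E_AG = 5e-9        # aggregation energy per bit (EAG), cluster heads only
E_FS = 10e-12      # free-space amplifier energy per bit per m^2 (Efs)
E_MP = 0.0013e-12  # multipath amplifier energy per bit per m^4 (Emp)


def threshold_distance(e_fs=E_FS, e_mp=E_MP):
    """Eq. (19.2): distance at which the two amplifier models cross over."""
    return math.sqrt(e_fs / e_mp)


def transmit_energy(packet_bits, d, is_cluster_head=False):
    """Eqs. (19.3)/(19.4): energy to send packet_bits over distance d."""
    # Eel = Et + EAG; the aggregation term applies to cluster heads only.
    e_el = E_T + (E_AG if is_cluster_head else 0.0)
    if d <= threshold_distance():
        # Free-space branch, energy grows with d^2.
        return e_el * packet_bits + E_FS * packet_bits * d ** 2
    # Multipath branch, energy grows with d^4.
    return e_el * packet_bits + E_MP * packet_bits * d ** 4
```

With these placeholder values the threshold distance is roughly 88 m, so a 4000-bit packet sent over 10 m uses the free-space branch, while the same packet sent over 100 m uses the multipath branch.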
each other, the energy dissipation becomes uneven, which shortens the network lifetime. The TEEN-DECH method has been used to resolve the above drawback.

In this chapter, a novel methodology is suggested to compute the optimum number of cluster heads needed for each round. The number of cluster heads for each round is calculated from the ratio of the total energy of all the nodes in the WSN at the current round to that at the previous round. The ratio of the number of active nodes at the current round to that at the previous round is also taken into consideration. If E_i is the energy of the ith node in the WSN, the total energy E_tot(r) of the WSN at round r is calculated as

$$E_{tot}(r) = \sum_{i=1}^{n} E_i \tag{19.6}$$

The change in the total energy level (E_c) of the sensor network between two consecutive rounds can be found as:

$$E_c = \frac{E_{tot}(r)}{E_{tot}(r-1)} \tag{19.7}$$

where E_tot(r) is the total energy of the network in round r and E_tot(r−1) is the total energy of the network in the previous round. Let M_t be the ratio of the active nodes between the current and previous rounds

$$M_t = \frac{M_r}{M_{r-1}} \tag{19.8}$$

where M_r is the total number of active sensors in round r and M_{r−1} is the total number of active sensors in round r−1. The rate of change of the above-said parameters can be computed as

$$X = [E_c \, M_t \times 10] \tag{19.9}$$

Now, the following expression is used to find the optimum cluster count

$$Y = 1 - e^{-\alpha X}; \quad 0 < \alpha < 1 \tag{19.10}$$

The optimum cluster count is computed as

$$CH_{opt} = [\,p \ast n_a \ast Y\,] \tag{19.11}$$

where p is the probability value and n_a is the total number of alive sensor nodes in the network. CH_opt gives the optimum number of cluster heads to be selected in the respective round. If the number of clusters already computed is less than CH_opt, the remaining cluster heads are elected from the sensors that have more energy and have not been elected as cluster head in the recent r mod (N/k) rounds. Otherwise, if the number of clusters already computed is greater than the optimum cluster count, the number of excess cluster heads is found and, among the existing cluster heads, those with the least energy are converted to normal nodes. After the cluster heads are selected, the
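The estimation in Eqs. (19.6)-(19.11) can be sketched as a short Python function. This is a hedged illustration rather than the authors' implementation: the function name, the default values p = 0.05 and alpha = 0.5, and the rule that a node with no remaining energy counts as inactive are assumptions. The brackets in Eq. (19.9) are treated as plain grouping and those in Eq. (19.11) as rounding to the nearest integer, which is also an assumption.

```python
import math


def optimum_cluster_heads(energies_now, energies_prev, p=0.05, alpha=0.5):
    """Estimate the cluster-head count for the current round.

    energies_now / energies_prev hold the residual energy of every node in
    the current and previous round; energy <= 0 is treated as a dead node.
    """
    e_tot_now = sum(e for e in energies_now if e > 0)     # Eq. (19.6), round r
    e_tot_prev = sum(e for e in energies_prev if e > 0)   # Eq. (19.6), round r-1
    alive_now = sum(1 for e in energies_now if e > 0)     # M_r (also n_a)
    alive_prev = sum(1 for e in energies_prev if e > 0)   # M_{r-1}
    if e_tot_prev == 0 or alive_prev == 0:
        return 0                                          # network already dead
    e_c = e_tot_now / e_tot_prev                          # Eq. (19.7)
    m_t = alive_now / alive_prev                          # Eq. (19.8)
    x = e_c * m_t * 10                                    # Eq. (19.9)
    y = 1 - math.exp(-alpha * x)                          # Eq. (19.10)
    return round(p * alive_now * y)                       # Eq. (19.11)
```

Because Y saturates toward 1 while the network is healthy, the count stays near p times the number of alive nodes in early rounds and shrinks as both the total energy and the number of active nodes fall.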
19.5.2. In addition, the performance of the proposed method is compared with that of the existing TEEN-DECH and TEEN protocols, and the graphs are shown in Figures 19.2–19.5. Figure 19.2 illustrates the number of cluster heads in each iteration using TEEN-ACHE and the existing TEEN-DECH and TEEN protocols. It indicates that the TEEN-ACHE methodology optimally computes the cluster heads so that the energy dissipation is minimized, which leads to an increase in the life of the BAN. Figure 19.3 illustrates the network lifetime of the BAN. Since the optimum number of cluster heads has been computed and used for the simulation, the unnecessary usage of nodes as cluster heads is averted. This avoids the energy loss due to nodes being cluster heads and improves the network lifetime. Conversely, a small number of cluster heads inside a network rapidly drains the energy of these cluster heads due to heavy traffic. An energy hole is then created inside the network, which decreases the network lifetime. It is observed that, up to round 320, TEEN-ACHE performs better than the other protocols, whereas in the last stages TEEN-DECH has slightly more live nodes.
plant." Proceedings of the 5th World Congress on Intelligent Control and Automation, WCICA’2004. Vol. 4. IEEE, 2004.
Manjeshwar, Arati, and Dharma P. Agrawal. "APTEEN: A hybrid protocol for efficient routing and comprehensive information retrieval in wireless sensor networks." Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS. Vol. 2. 2002.
Manjeshwar, Arati, and Dharma P. Agrawal. "TEEN: A routing protocol for enhanced efficiency in wireless sensor networks." Proceedings of the International Parallel and Distributed Processing Symposium, IPDPS. Vol. 1. 2001.
Movassaghi S, Abolhasan M, Lipman J, Smith D, Jamalipour A. Wireless body area networks: A survey. IEEE Communications Surveys & Tutorials 16(3) (2014): 1658–86.
Naik, Piyush, and Krishna M. Sivalingam. "A survey of MAC protocols for sensor networks." Wireless Sensor Networks. Springer, New York, NY, USA, 2004. pp. 93–107.
Nakamura, Eduardo F., Antonio AF Loureiro, and Alejandro C. Frery. "Information fusion for wireless sensor networks: Methods, models, and classifications." ACM Computing Surveys 39(3) (2007): 9.
Qing, Li, Qingxin Zhu, and Mingwen Wang. "Design of a distributed energy-efficient clustering algorithm for heterogeneous wireless sensor networks." Computer Communications 29(12) (2006): 2230–2237.
Rezaei, Zahra, and Shima Mobininejad. "Energy saving in wireless sensor networks." International Journal of Computer Science and Engineering Survey 3(1) (2012): 23.
Salim, Ahmed, Walid Osamy, and Ahmed M. Khedr. "IBLEACH: Intra-balanced LEACH protocol for wireless sensor networks." Wireless Networks 20(6) (2014): 1515–1525.
Sara, Getsy S., and D. Sridharan. "Routing in mobile wireless sensor network: A survey." Telecommunication Systems 57(1) (2014): 51–79.
Smaragdakis, Georgios, Ibrahim Matta, and Azer Bestavros. SEP: A Stable Election Protocol for Clustered Heterogeneous Wireless Sensor Networks. Boston University Computer Science Department, 2004.
Sundareswaran, P., K. N. Vardharajulu, and R. S. Rajesh. "DECH: Equally distributed cluster heads technique for clustering protocols in WSNs." Wireless Personal Communications 84(1) (2015): 137–151.
Wang, Ning, Naiqian Zhang, and Maohua Wang. "Wireless sensors in agriculture and food industry—Recent development and future perspective." Computers and Electronics in Agriculture 50(1) (2006): 1–14.
Wu, Shan-Hung, Chung-Min Chen, and Ming-Syan Chen. "Collaborative wakeup in clustered ad hoc networks." IEEE Journal on Selected Areas in Communications 29(8) (2011): 1585–1594.
Yi, Chenfu, Lili Wang, and Ye Li. "Energy efficient transmission approach for WBAN based on threshold distance." IEEE Sensors Journal 15(9) (2015): 5133–5141.
Yick, Jennifer, Biswanath Mukherjee, and Dipak Ghosal. "Wireless sensor network survey." Computer Networks 52(12) (2008): 2292–2330.
Younis, Ossama, and Sonia Fahmy. "HEED: A hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks." IEEE Transactions on Mobile Computing. 4 (2004): 366–379.
CHAPTER 20
surface features that include Gabor filter banks, or alignment-oriented features. Here, the alignment-oriented features considered are inter-image gradient, region shape distinction, and symmetry analysis (Tustison, 2013).

The discriminative models use hand-designed features, and the classifier is specifically trained to differentiate healthy from nonhealthy tissues. It is also assumed that the key features have elevated discriminative power because the behaviour of the classifier algorithm is independent of the characteristics of those extracted features. The complexity of these hand-designed features requires the computation of a huge number of features to ensure accuracy when employed with conventional ML methods. Efficient techniques use fewer features by employing dimensionality reduction or feature identification methods for better accuracy. Preliminary analysis has shown that the use of deep CNNs for brain tumor image segmentation is a proficient technique (Havaei et al., 2017). Hinton et al. (2012) proposed a CNN-based method that consists of a series of convolutional layers for feature detection. Similarly, Nabizadeh and Kubat (2015) investigated various classification methods for tumor segmentation in brain images and suggested that neural network-based methods offer the best results as compared to other techniques for the above-stated problem.

20.3 PROPOSED WORK

The brain tumor regions in MR images are differentiated from the healthy tissues by image segmentation, and the NN is then trained to obtain the classification efficiency. In order to enhance the accuracy of differentiation, a NN-based classification is proposed and its classification efficiency is computed for brain tumor images. The results obtained through the proposed method are more robust and support accurate segmentation of brain tumor regions.

The sequence of operations involved in the brain tumor segmentation process is presented in Figure 20.1. The input images, commonly known as MR images, are selected from standard cancer image databases. Here, the MRI of the brain is balanced by converting it into a grey scale image and resizing it to a standard resolution. The real two-dimensional (2D) MRI scan image of a brain from the DICOM data is obtained from the 64-slice CT scan machine. The sequences of images are taken from different projections by rotating the gantry. This image provides the complete view of a brain and is used for the subsequent analysis. In the course of analysis, the brain
(
Š1 ( Œu ) := g Œu
2
) (20.1)
In mathematical morphology, dilation and erosion are the operations usually performed on binary images to expand the boundaries consisting of foreground pixels (Chen and Haralick, 1995). Hence, the

A Gaussian function is applied to minimize the noise that exists in the MRI images (Chaddad, 2015). When an image is blurred by a Gaussian function, the result is called Gaussian blur or Gaussian smoothing. The visual outcome of the Gaussian blurring method is a smooth blur that is similar to the
Segmentation and Classification of Tumour Regions 455
base image, then the input pixel will be set to the corresponding foreground value (Chen and Haralick, 1995). Similarly, if all the corresponding pixels in the base image lie in the background, then the input pixel is set to the background value. The structuring element should be assigned as a small binary image or in a unique

where "a" is the dilation factor and "b" is the translation parameter

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \tag{20.3}$$

The wavelet (ψ_{a,b}) is estimated from the mother wavelet (ψ) by the process of translation and dilation. It
from the National Cancer Archives of United States of America database and used as the input for this research work (Kinahan, 2019). The image dataset for the proposed work is presented in Figure 20.3.
FIGURE 20.5 Preprocessed and dilated images for edge detector block.
FIGURE 20.7 Eroded image and final marked image after segmentation process.
peak without error at epoch 4. The best performance occurs at epoch 4 with a score of 0.37558. From the training state statistics, it is observed that the overall validation efficiency is constantly above the required level. The state of testing and training varies exponentially until epoch 4 and stabilizes afterwards.

20.5.4 COMPARISON WITH THE STATE-OF-THE-ART METHODS

Four supervised robust classification techniques are applied for the comparison and validation of the present work. Techniques such as SVM, KNN, NSC, and k-means clustering are compared with the NN-based method used for the brain tumor segmentation process (Nabizadeh and Kubat, 2015). The number of healthy frames is greater than the number of tumor frames, which makes the training set fully unbalanced. To minimize the difficulties caused by handling unbalanced training datasets, an equal number of healthy and tumor frames was used for the experimental analysis.

The accuracy of different classification methods was compared with the proposed work using statistical features with noise reduction, as presented in Figure 20.11. In addition, the number of features along with the recognition rate for SVM and NN based methods is presented in Figure 20.12. The results show that the proposed NN based segmentation and classification methods are efficient for determining the tumorous tissues from the brain MRI and classifying them precisely.
FIGURE 20.11 Accuracy of various classification methods through statistical features with
noise reduction.
than the other existing techniques in terms of accuracy and performance efficiency. The present chapter shall be extended with a larger dataset by including prior knowledge about shape and model features to the tumor segmentation. In addition, it can be extended further by including the morphological structure related data of the input brain MRI to train the NN. Hence, such a detailed input data helps the NN tool to perceive greater information from the MRI for extensive medical applications.

KEYWORDS

• brain tumor
• segmentation
• discrete wavelet transform
• neural networks

REFERENCES

Autier, P. Risk factors and biomarkers of life-threatening cancers. Ecancer Medical Science. 2015, 9:596, 1–8.
Ahmad Chaddad. Automated feature extraction in brain tumor by magnetic resonance imaging using Gaussian mixture models. International Journal of Biomedical Imaging. 2015, 2015, 1–11.
Breiter, Hans C., Scott L. Rauch, Kenneth K. Kwong, John R. Baker, Robert M. Weisskoff, David N. Kennedy, Adair D. Kendrick et al. Functional magnetic resonance imaging of symptom provocation in obsessive-compulsive disorder. Archives of General Psychiatry. 1996, 53:7, 595–606.
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013, 35:8, 1798–1828.
Clark, M.C., Hall, L.O., Goldgof, D.B., Velthuizen, R., Murtagh, F.R., Silbiger, M.S., Automatic tumor-segmentation using knowledge-based techniques. IEEE Transactions on Medical Imaging. 1998, 117, 187–201.
Cobzas, D., Birkbeck, N., Schmidt, M., Jagersand, M and Murtha, A. 3D variational brain tumor segmentation using a high dimensional feature set. Mathematical Methods in Biomedical Image Analysis (MMBIA 2007). 2007, 1–8.
Carlos Arizmendi, Alfredo Vellido and Enrique Romero, Classification of human brain tumours from MRS data using discrete wavelet transform and Bayesian neural networks. Expert Systems with Applications. 2012, 39:5, 5223–5232.
Chen S. and Haralick R. M. Recursive erosion, dilation, opening, and closing transforms, IEEE Transactions on Image Processing, 1995, 4:3, 335–345.
Demirhan, Ayse, Memduh Kaymaz, Raşit Ahıska, and Inan Guler. A survey on application of quantitative methods on analysis of brain parameters changing with temperature. Journal of Medical Systems. 2010, 34:6, 1059–1071.
Doyle S., Vasseur F., Dojat M., and Forbes F. Fully automatic brain tumor segmentation from multiple MR sequences using hidden Markov fields and variational EM. Procs. NCI-MICCAI BraTS. 2013, 18–22.
Galic, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A., Seidel, H.P. Image compression with anisotropic diffusion. Journal of Mathematical Imaging and Vision. 2008, 31:2–3, 255–269.
Gerrig, Richard J., and Gregory L. Murphy. Contextual influences on the comprehension of complex concepts.
A HYPOTHETICAL STUDY IN
BIOMEDICAL BASED ARTIFICIAL
INTELLIGENCE SYSTEMS USING
MACHINE LANGUAGE (ML)
RUDIMENTS
D. RENUKA DEVI* and S. SASIKALA
Department of Computer Science, IDE, University of Madras, Chennai
600005, Tamil Nadu, India
*Corresponding author. E-mail: renukadevi.research@gmail.com
detection method for polyp miss rate, thus aiding doctors to pay attention and focus on a specific region. A deep learning NN framework is implemented to address critical neonatal issues. Harpreet et al. (2017) proposed a cloud-based integrated Neonatal Intensive Care Unit data analytics framework. This model integrates and tracks the complete assessment sheet of preterm babies.

Another significant area that AI is successfully transforming is medicine, where it provides assistance to pharmacology. It helps in discovering new combinations of drugs. It may become an assistive technology that empowers medical researchers and practitioners to provide better treatments to patients having critical diseases. According to an Accenture report, the AI market growth is massive in 2021, by ten times compared to previous years.

21.2 MACHINE LEARNING (ML)

AI has its impact in the area of ML, where algorithms are developed to learn the similarities in data and develop decision rules from them. Data mining problems have embedded ML algorithms, constantly combined with statistical methods, to extract knowledge from the given data. The foundation of all these algorithms is a statistics-based mathematical model, which contributes to various fields such as NN, Deep Learning (DL), support vector machine (SVM), decision tree, naïve Bayes, random forest, etc. These models are used for analytics and decision-making processes. The conceptual model is shown in Figure 21.1.
the patient’s database, to infer the desired output, that is, the condition of health status. The patient’s database comprises basic elements of information like name, age, disease, and lab report parameters relevant to the disease. It is a combination of transactional and image data like scanned images, CT scans, MRI, and medication facts.

Furthermore, the outcomes are examined for the level of the disease; this is the output parameter (Y). For instance, in tumor disease prediction, the Y parameter is the size or stage of the disease based on the input parameter (X) of the patient. ML algorithms can be divided into two major categories based on whether the outcomes can be incorporated or not, namely,

• Supervised
• Unsupervised

The unsupervised algorithms are generally used for feature extraction, whereas a supervised algorithm establishes the relationship between the input (X) and the output (Y) by predicting the output from the input. The semisupervised algorithm combines both supervised and unsupervised approaches.

21.3.2 AN EMERGING LEAP IN ML: DEEP LEARNING (DL)

With the leaps and bounds of emerging technology, ML can be extended to DL, which abstracts the data model by deploying a number of deep layers (DNNs), thus making the prediction accurate and concise even for high-dimensional data. Here, the process is formulated in two steps,

• Multiple layers process the data. DL extracts the data abstraction through the multilayer learning process. Thus, a huge amount of data is processed to extract meaningful insights. As the layers are deep and consecutive, the abstraction is learned layer by layer following a hierarchical mechanism, so that the output of each layer is given as the input of the next layer.
• The final output data representation developed by DL algorithms provides constructive information. It is indeed a simpler model working resourcefully on complex data sets. DL also handles varied types of data: text, image, audio, and video. This system is further extended to derive relational and semantic knowledge from raw data.

The foundations of DNN layers are established on artificial neural networks, which contain multiple deep layers. Having this as a nature
FIGURE 21.4 Encoding and decoding mechanism (Reprinted from Wen, Haiguang, et al.
Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex.
2017, 28(12), 4136-4160. https://arxiv.org/ftp/arxiv/papers/1608/1608.03425.pdf)
the features present. The activation function is applied on the top layer, which maps the input and output variables. This is followed by the normalization procedure of each convolution layer. This is carried out as batch normalization when the features are elevated.

The scanned images contain both macroscopic and subtle features. Many research methodologies have been developed for identifying the major features; however, the subtle features are crucial for diagnosis. The architectures that were developed and tested on the ImageNet dataset only exploited the macroscopic features. This leads to a new paradigm of model that is capable of identifying even subtle features. The two-stage CNN model uses a pipeline for feature localization followed by classification. Preprocessing is done to eliminate the nonrelevant features, and network weights are adjusted to deal with the class imbalance. Thus, this model screens and identifies disease ranging from mild to multigrade.

Google developed the brain project (refer Figure 21.5), based on a DL algorithm that can inspect huge numbers of fundus images and automatically discover DR and diabetic macular edema (DME) with an elevated accuracy. The system was tested with two batches of images (11,711) and produced a sensitivity of 96.1% and 97.5% for diabetic retinopathy and a specificity of 93.1% for DME. It is the greatest milestone in Google’s research accomplishment to produce high sensitivity and specificity, with a minimal elimination of diseased patients.

FIGURE 21.5 (A) Healthy retinal fundus image on the left (B) On the right, red spots signify the affected retina due to DR (source: https://ai.googleblog.com/2016/11/deep-learning-for-detection-of-diabetic.html).

21.5.3 PROSTATE CANCER

Prostate cancer is likely uncommon and non-aggressive in nature. However, the identification of this type of cancer poses a challenge in treating the patient, by either surgical method or radiation therapy. So, technology acts as a key factor in the measurement of the risk factor. In the Gleason grade (refer Figure 21.6), the risk stratification parameter is identified, which leads to a further diagnosis of how closely cancer cells resemble normal ones under microscopic study. Although this conventional method is of major importance in clinical diagnosis, it is a very complex and subjective technique. This is evident from the studies and reports of interpathologist disagreements.
a forward model “z_t,” the a priori knowledge of head geometry, electrode position, and dipole localization is to be known. The representation of N dipoles is $x_t = [x_t(1), \ldots, x_t(N)]$, where each single geometric position in 3D is given as $x_t(i) = [x(i)\; y(i)\; z(i)]^T$, $i = 1, 2, \ldots, N$.

As there are N dipoles, the lead field matrix $L_i(x_t) \in \Re^{N \times n_s}$ can be written as $L(x_t) = [L(x_t(1)), \ldots, L(x_t(N))]$, and it depends on the location of the dipole $x_t(i)$ at time t. The vector of moments $s_t \in \Re^{3N \times 1}$ is $s_t = [s_t(1), \ldots, s_t(N)]^T$, where each single moment or brain source signal in 3D is given as $s_t(i) = [s_x(i)\; s_y(i)\; s_z(i)]^T$. Now, (22.1) can be rewritten in a matrix form as

$$Z = L(X)\, S + V \tag{22.2}$$

This equation can be taken as a measurement equation. For the state equation, since how the states evolve is unknown, we can take a random walk model in the brain source localization problem

$$x_t = x_{t-1} + u_t \tag{22.3}$$

Equations (22.2) and (22.3) can be considered as the measurement (or observation) equation and the state equation, respectively. Based on these two equations, the EEG can be modeled as a state-space model. To evaluate the dynamic parameters, that is, the location (x, y, and z directions) in the cerebrum, we use the PF by considering the measured signal z_t at time t (Ebinger et al., 2015).

22.2.2 PARTICLE FILTER

The state-space paradigm is well adapted to problems in neural analysis, where an observed signal is influenced by certain unknown factors that change with time. These unidentified signals are referred to as latent states. This approach allows us to solve issues related to the estimation of latent signals, the fitting of models between latent and observed signals, and the statistical testing of their relationship. We must identify a couple of statistical models in order to build a state-space model. The first model, the state model, describes the dynamics of the latent states. The second model, the observation model, explains how the latent state affects, at each time, the probability distribution of the observation process. By considering the measurements, the PF is used to assess the dynamic state variables by approximating the posterior likelihood density function of the unidentified state parameters at each time point (Arulampalam et al., 2002).

For such a dynamic system, the state-space model can be described in terms of x_t and z_t as
$$p(x_t \mid z_t) \approx \sum_{i=1}^{N} w_t^{(i)}\, \delta\!\left(x_t - x_t^{(i)}\right) \tag{22.6}$$

Now, (22.8) becomes $w_t^{(i)} \propto w_{t-1}^{(i)}\, p\!\left(z_t \mid x_t^{(i)}\right)$.
one direction in the “wheel,” for one particle and other particles with the N − 1 directions being fixed at 1/N increments from that randomly picked direction. Here, $w_t^{(1)}$ is drawn from the uniform distribution on $\left(0, \frac{1}{N}\right]$, and the remaining values are obtained deterministically (Bolic et al., 2004), that is,

$$w_t^{(1)} \sim U\!\left(0, \frac{1}{N}\right), \qquad w_t^{(n)} = w_t^{(1)} + \frac{n-1}{N}, \quad n = 2, 3, \ldots, N \tag{22.9}$$

22.2.3 EFFECTIVE CONNECTIVITY MEASURES

Effective connectivity measures estimate the frequency-domain directional association between cerebrum regions, which can be attained from spectral measures by utilizing MVAR models. Subsequently, directed communications can be measured by fitting the MVAR model to the time courses of the evaluated sources. These measures follow the idea proposed by Granger: for two signals, if the first signal can be predicted by using the previous information of the second signal, then the second signal can be stated as causal to the first signal (Granger, 1969). Based on the Granger theory, effective connectivity measures can be categorized as GC, PDC (Baccala et al., 2007), and DTF (Kaminski et al., 2001). The time-series MVAR model equation can be written as

$$s(t) = \sum_{k=1}^{p} r(k)\, s(t-k) + a(t) \tag{22.10}$$

Here, s(t) represents the M×1 neural sources time series, where M is the number of sources. r(k) is the M×M coefficient matrix, which can be attained from the autoregressive (AR) model, and p is the model order of the AR process, which can be calculated by the Akaike information criterion (Akaike, 1974) and the Bayesian information criterion (Schwarz, 1978). Rearranging (22.10), we obtain

$$a(t) = \sum_{k=0}^{p} \hat{r}(k)\, s(t-k) \tag{22.11}$$

where $\hat{r}(k) = -r(k)$ and $\hat{r}(0) = I$. Converting (22.11) to the frequency domain, we obtain

$$A(f) = R(f)\, S(f).$$

Multiplying both sides with $R^{-1}(f)$ gives

$$S(f) = Q(f)\, A(f), \quad \text{where } R^{-1}(f) = Q(f).$$

Here, S(f) is the matrix of the multivariate process and Q(f) is the transfer function of the system. This transfer function gives information about the structure of the modeled system.

Now, the GC connectivity measure can be calculated as

$$GC_{mn} = \ln\!\left( \frac{\operatorname{var}\!\left(s_t^m \mid \hat{s}^m\right)}{\operatorname{var}\!\left(s_t^m \mid \hat{s}^m, \hat{s}^n\right)} \right) \tag{22.12}$$
where $GC_{mn}$ represents the relation from n → m and $\hat{s}^m$ is the past value of the source m.

PDC is given as

$$PDC_{mn}(f) = \frac{\left|R_{mn}(f)\right|}{\sqrt{\sum_{i=1}^{M} \left|R_{in}(f)\right|^{2}}} \tag{22.13}$$

where $PDC_{mn}$ gives the causality from n → m and the sum runs over the M sources. $PDC_{mn}$ lies in the range from 0 to 1, where 0 represents no connectivity between n and m and 1 represents full connectivity between n and m.

The DTF is given as

$$DTF_{mn}(f) = \frac{\left|Q_{mn}(f)\right|}{\sqrt{\sum_{l=1}^{M} \left|Q_{ml}(f)\right|^{2}}} \tag{22.14}$$

1. Start analyzing the data at time t = 1.
2. Based on the probabilistic criteria, initialize the particles.
3. Assign a weight to each particle.
4. Normalize the weights.
5. Most of the particles with high weights will dominate the low-weight particles. This degeneracy leads to a poor approximation of the posterior density function. Use resampling methods to discard the low-weight particles.
6. Prepare the dipole configuration for the next time step.
7. Extract the dipoles and their time series using the PF for EEG data.
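The weight normalization and resampling steps, together with the rule in (22.9), can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names `normalize` and `systematic_resample` are hypothetical, the offset is drawn from [0, 1/N) rather than (0, 1/N] because that is what Python's `random()` provides, and the weights in the usage note are toy values.

```python
import random
from itertools import accumulate


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def systematic_resample(weights, rng=random):
    """Return N particle indices chosen by systematic resampling.

    A single offset u1 is drawn uniformly from [0, 1/N); the remaining
    positions follow deterministically as u1 + (n - 1)/N, as in (22.9).
    """
    n = len(weights)
    cdf = list(accumulate(normalize(weights)))
    cdf[-1] = 1.0  # guard against floating-point round-off
    u1 = rng.random() / n
    indices, j = [], 0
    for k in range(n):
        u = u1 + k / n
        # advance to the first particle whose cumulative weight exceeds u
        while j < n - 1 and cdf[j] <= u:
            j += 1
        indices.append(j)
    return indices
```

For example, `systematic_resample([0.7, 0.1, 0.1, 0.1])` returns four indices in which particle 0 appears at least twice, so the high-weight particle is duplicated and some low-weight particles are discarded.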
TABLE 22.1 Locations (x, y, and z directions) of the Extracted Sources from the EEG
Source x y z
Source 1 –0.0043 –0.0074 0.0163
Source 2 –0.0160 0.0165 0.0359
Source 3 0.0063 0.0454 0.0280
Source 4 –0.0030 –0.0443 0.0401
Source 5 0.0663 –0.0015 0.0158
FIGURE 22.2 (a) Source extracted by using the PF. (b) Source amplitudes.
FIGURE 22.4 Connectivity measure by PDC (a) up to 4 Hz and (b) from 4 to 16 Hz.
FIGURE 22.5 Connectivity measure by DTF (a) up to 4 Hz, (b) from 4 to 8 Hz, and (c)
from 8 to 16 Hz.
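The PDC and DTF measures of (22.13) and (22.14) can be computed from fitted MVAR coefficients roughly as sketched here. Several things are assumptions not taken from the chapter: the function names, the sampling-rate parameter `fs`, the Fourier convention used to form R(f) from the coefficients of (22.11), and the toy 2-source coefficient matrix in the usage note.

```python
import numpy as np


def mvar_frequency_matrix(r, f, fs=1.0):
    """Build R(f) from MVAR coefficients r with shape (p, M, M).

    Uses r_hat(0) = I and r_hat(k) = -r(k), so that
    R(f) = I - sum_k r(k) * exp(-2j*pi*f*k/fs)  (assumed convention).
    """
    p, m, _ = r.shape
    R = np.eye(m, dtype=complex)
    for k in range(1, p + 1):
        R -= r[k - 1] * np.exp(-2j * np.pi * f * k / fs)
    return R


def pdc(R):
    """PDC[m, n]: |R_mn| normalized over column n (influence n -> m)."""
    mag = np.abs(R)
    return mag / np.sqrt((mag ** 2).sum(axis=0, keepdims=True))


def dtf(R):
    """DTF[m, n]: |Q_mn| normalized over row m, with Q = inverse of R."""
    mag = np.abs(np.linalg.inv(R))
    return mag / np.sqrt((mag ** 2).sum(axis=1, keepdims=True))
```

As a usage illustration, take `r = np.array([[[0.5, 0.0], [0.4, 0.5]]])`, a toy order-1 model in which source 0 drives source 1 but not the reverse. With `R = mvar_frequency_matrix(r, f=0.1)`, `pdc(R)[1, 0]` is nonzero while `pdc(R)[0, 1]` is zero, and `dtf(R)` shows the same one-way pattern.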
GC, PDC, and DTF demonstrate the directional connectivity between source 1 and source 4.

22.4 CONCLUSION

Studying the relations and communications among regions of the brain and their functional activities is one of the important fields in investigating how the brain works. These communications are called brain connectivity. The approach proposed in this chapter was utilized for assessing effective connectivity of the brain by using PF and GC
Trans Biomed Eng; 39(6):541–557, DOI: 10.1109/10.141192
Nunez PL, Srinivasan R. (2006). Electric Fields of the Brain: The Neurophysics of EEG. Oxford University Press: Oxford, UK.
Oostenveld R, Fries P, Maris E, Schoffelen JM. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intel Neurosci; 2011:156869.
Schwarz G. (1978). Estimating the dimension of a model. Ann Statist; 6:461–464.
Sorrentino A, Parkkonen L, Pascarella A, Campi C, Piana M. (2009). Dynamical MEG source modeling with multi-target Bayesian filtering. Human Brain Mapping; 30:1911–1921.
Sorrentino A, Parkkonen L, Piana M. (2007). Particle filters: A new method for reconstructing multiple current dipoles from MEG data. In Int Congr Ser; 1300:173–176.
Tadel F, Baillet S, Mosher JC, Pantazis D, Leahy RM. (2011). Brainstorm: A user-friendly application for MEG/EEG analysis. Comput Intel Neurosci; 2011:879716.
Van Veen BD, Buckley K. (1988). Beamforming: A versatile approach to spatial filtering. IEEE ASSP Mag 5:4–24.
CHAPTER 23
chosen such that it is far from the data point xi and has maximum margin. A hyperplane far away from the data points minimizes the chance of wrong decisions during the classification of new data. In other words, the distance from the hyperplane to the closest data points is maximized in SVM (Naseriparsa and Kashani, 2014).

23.2.3 NAÏVE BAYES

The NB algorithm is a classification technique based on probability. Its advantages are that it is easy to apply to different types of data series and that it can provide better results than existing techniques. Suppose Xip is a data sample without any class label and H is a hypothesis such that Xip falls into a class specified as C. We aim to establish Pro(H|Xip) as the posterior probability representing our confi-

Values Xipi correspond to features Fe1, Fe2, ..., Fen.
There are n classes cl1, cl2, ..., cln, and every sample belongs to one of these classes.
In our model, the value of n is 4 as there are four classes.
When an extra data sample Xip with an unknown class is provided, its class can be predicted as the one with the highest conditional probability Pro(clk|Xip), where k = 1, 2, ..., n. The Bayes model is represented as follows:

$$\mathrm{Pro}(cl_k \mid X_{ip}) = \frac{\mathrm{Pro}(X_{ip} \mid cl_k)\,\mathrm{Pro}(cl_k)}{\mathrm{Pro}(X_{ip})} \tag{23.2}$$

Here, Pro(Xip) is constant for all classes, so the product Pro(Xip|clk) Pro(clk) needs to be maximized. The prior probabilities of the classes can be estimated by

$$\mathrm{Pro}(cl_k) = \frac{\text{number of training instances of class } cl_k}{m} \tag{23.3}$$
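The decision rule in (23.2) and the prior estimate in (23.3) can be illustrated with a small categorical classifier. This is a sketch under assumptions rather than the chapter's model: it takes m to be the total number of training instances, it factorizes Pro(Xip|clk) over features (the usual naïve independence assumption), it uses raw frequency estimates without smoothing, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(samples, labels):
    """Estimate class priors as in (23.3) and per-feature value counts."""
    m = len(labels)
    class_counts = Counter(labels)
    priors = {c: n / m for c, n in class_counts.items()}
    # cond[c][i][v] = number of class-c samples whose feature i equals v
    cond = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            cond[c][i][v] += 1
    return priors, cond, class_counts


def predict(x, priors, cond, class_counts):
    """Return the class maximizing Pro(x | cl) * Pro(cl), plus all scores.

    The denominator Pro(x) of (23.2) is the same for every class,
    so it is omitted from the comparison.
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= cond[c][i][v] / class_counts[c]
        scores[c] = score
    return max(scores, key=scores.get), scores


# Toy data: two categorical features, two classes (illustrative only).
samples = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y")]
labels = ["cl1", "cl1", "cl2", "cl2"]
model = fit_naive_bayes(samples, labels)
label, scores = predict(("a", "y"), *model)
```

Here both priors are 0.5, and the sample `("a", "y")` is assigned to `cl1` because the value `"a"` never occurs in `cl2`, which drives that class's score to zero.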
process, the data are split into 10 equal-sized disjoint parts. Nine of the 10 parts are used for training the framework, and one is used for testing. This is performed 10 times, and each time a different part of the data is used for testing. In this work, 40 lymph node-negative gene signatures of metastasis were given for 10-fold cross validation with three techniques: SVM, NB, and DT. In each fold, 36 gene signatures from DNA microarrays were taken for training and four were used for validation, and so on. Hence, imbalance in the data sets may be taken into account. We were able to apply the methods successfully with the 10-fold cross validation, and the results are presented in the following sections.

23.3 RESULTS

This study presents SVM, NB, and DT to precisely identify the risk of tumor recurrence. This is beneficial for classifying patients into low- and high-risk groups in the case of lymph node-negative breast cancer. Various accuracy measures have been carried out using comparative studies to identify and classify better gene expression profiles of lymph node-negative breast cancer. Various accuracy measures, such as receiver operating characteristic (ROC), precision, accuracy, recall/sensitivity, and root-mean-square error, have been computed to determine the better method for identification of gene expression profiles.

23.3.1 ACCURACY MEASURES

Different measures have been proposed for two-class problems, wherein four possible cases (TP, FP, FN, and TN) can be represented in a confusion matrix (see Table 23.1).

23.3.1.1 ACCURACY

The accuracy is measured as

Accuracy (A) = number of correct classifications / total number of samples = (TP + TN) / (TP + FP + FN + TN). (23.6)

In the context of software fault prediction, A does not disclose the discrepancy between FP and FN. Overall, accuracy generally has less relevance than recall and precision.
In this study, the major issue in building a good prediction model was found to be not only the ratio of majority and minority samples, but also the requirement for good training samples that can show the properties of data consistent with the corresponding class label assigned to them. In the majority of the cases, the records present in clinical data sets do not truly represent the properties of data consistent with the corresponding outcome label. By the comparison of the results, it can be seen that AUC has increased by 22.5% in DT, 45% in NB, and 32.4% in SVM after balancing the training data (see Figure 23.2).
FIGURE 23.1 Comparisons on the basis of accuracy for original data and data used on the
basis of SMOTE.
FIGURE 23.2 Comparisons on the basis of AUC for original data and data used on the basis
of SMOTE.
was examined with three machine learning algorithms to classify the test data sets, including DT, NB, and SVM, for lymph node-negative breast cancer. The results of classification were calculated and compared with respect to the class imbalance problem, recall, and AUC. The results obtained from this study indicate that these techniques combined with the SMOTE generally perform better than when training is done with imbalanced data.

ACKNOWLEDGMENT

The authors are thankful to the National Institute of Technology Raipur for providing the necessary computational facility to analyze and prepare the manuscript and for permission to publish it.

KEYWORDS

DNA microarray-based gene expression profile
classification
breast cancer
machine learning

REFERENCES

Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature. 403 (2000) p. 503.
Batista, G.E., Prati, R.C., Monard, M.C., A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter. 6 (2004) pp. 20–29.
Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., Yakhini, Z., Tissue classification with gene expression profiles, Journal of Computational Biology. 7 (2000) pp. 559–583.
Blagus, R., Lusa, L., Evaluation of SMOTE for high-dimensional class-imbalanced microarray data. In 11th International Conference on Machine Learning and Applications. (2012) pp. 89–94.
Bojarczuk, C.C., Lopes, H.S., Freitas, A.A., Data mining with constrained-syntax genetic programming: Applications in medical data set. Algorithms. 6 (2001) p. 7.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research. 16 (2002) pp. 321–357.
Dhanasekaran, S.M., Barrette, T.R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K.J., Rubin, M.A., Chinnaiyan, A.M., Delineation of prognostic biomarkers in prostate cancer, Nature. 412 (2001) p. 822.
Elouedi, H., Meliani, W., Elouedi, Z., Amor, N.B., A hybrid approach based on decision trees and clustering for breast cancer classification. In 6th International Conference of Soft Computing and Pattern Recognition. (2014) pp. 226–231.
Eswari, J.S., Anand, M., Venkateswarlu, C., Optimum culture medium composition for rhamnolipid production by Pseudomonas aeruginosa AT10 using a novel multi-objective optimization method, Journal of Chemical Technology and Biotechnology. 88 (2013) pp. 271–279.
and artificial neural networks, Nature Medicine. 7 (2001) p. 673.
Kharya, S., Agrawal, S., Soni, S., Naïve Bayes classifiers: A probabilistic detection model for breast cancer, International Journal of Computer Applications. 92 (2014) pp. 26–31.
Kothandan, R., Handling class imbalance problem in miRNA dataset associated with cancer, Bioinformation. 11 (2015) p. 6.
Laurikkala, J., Improving identification of difficult small classes by balancing class distribution. In Conference on Artificial Intelligence in Medicine in Europe. (2001) pp. 63–66.
Li, L., Weinberg, C.R., Darden, T.A., Pedersen, L.G., Gene selection for sample classification based on gene expression data: Study of sensitivity to choice of parameters of the GA/KNN method, Bioinformatics. 17 (2001) pp. 1131–1142.
Liu, H.-C., Peng, P.-C., Hsieh, T.-C., Yeh, T.-C., Lin, C.-J., Chen, C.-Y., Hou, J.-Y., Shih, L.-Y., Liang, D.-C., Comparison of feature selection methods for cross-laboratory microarray analysis, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 10 (2013) pp. 593–604.
Ma, J., Nguyen, M.N., Rajapakse, J.C., Gene classification using codon usage and support vector machines, IEEE/ACM Transactions on Computational Biology and Bioinformatics. 6 (2009) pp. 134–143.
Mahata, P., Exploratory consensus of hierarchical clusterings for melanoma and breast cancer, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 7 (2010) pp. 138–152.
Mitra, A.P., Almal, A.A., George, B., Fry, D.W., Lenehan, P.F., Pagliarulo, V., Cote, R.J., Datar, R.H., Worzel, W.P., The use of genetic programming in the analysis of quantitative gene expression profiles for identification of nodal status in bladder cancer, BMC Cancer. 6 (2006) p. 159.
Mukherjee, S., Tamayo, P., Slonim, D., Verri, A., Golub, T., Mesirov, J., Poggio, T., Support vector machine classification of microarray data, (1999).
Naseriparsa, M., Kashani, M.M.R., Combination of PCA with SMOTE resampling to boost the prediction rate in lung cancer dataset, (2014) arXiv:1403.1949v1.
Ng, W., Dash, M., An evaluation of progressive sampling for imbalanced data sets. In 6th IEEE International Conference on Data Mining - Workshops. (2006) pp. 657–661.
Ooi, P. Tan, Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics. 19 (2003) pp. 37–44.
Park, S., Koo, J.S., Kim, M.S., Park, H.S., Lee, J.S., Lee, J.S., Kim, S.I., Park, B.-W., Characteristics and outcomes according to molecular subtypes of breast cancer as classified by a panel of four biomarkers using immunohistochemistry, The Breast. 21 (2012) pp. 50–57.
Perou, C.M., Sørlie, T., Eisen, M.B., Van De Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R., Ross, D.T., Johnsen, H., Akslen, L.A., Molecular portraits of human breast tumours, Nature. 406 (2000) p. 747.
Quinlan, J.R., C4.5: Programs for Machine Learning. Amsterdam, The Netherlands: Elsevier. 2014.
Ramaswamy, S., Ross, K.N., Lander, E.S., Golub, T.R., A molecular signature of metastasis in primary solid tumors, Nature Genetics. 33 (2003) p. 49.
Seema Patel, Shadab Ahmed, Eswari, J.S., Therapeutic cyclic lipopeptides mining from microbes: Strides and hurdles, World Journal of Microbiology and Biotechnology. 31 (2015) pp. 1177–1193.
Sehgal, A.K., Das, S., Noto, K., Saier, M., Elkan, C., Identifying relevant data for a biological database: Handcrafted rules versus machine learning, IEEE/ACM