
Expert Systems With Applications 221 (2023) 119720


Phonocardiogram signal classification for the detection of heart valve diseases using robust conglomerated models

Sunil Kumar Prabhakar, Dong-Ok Won *
Department of Artificial Intelligence Convergence, Hallym University, Chuncheon 24252, South Korea

ARTICLE INFO

Keywords: PCG; NMF; Feature selection; Machine learning; Deep learning

ABSTRACT

The diagnosis of cardiovascular diseases is quite important in the medical community. The heart sound is an important physiological signal of the human body; it arises from blood turbulence and the pulsing of cardiac structures. For the early diagnosis of heart diseases, the analysis of heart sounds plays an important role, as they contain a large quantity of pathological information associated with the heart. To record heart sounds, the Phonocardiogram (PCG) is used, as it is a highly useful, non-invasive technique whose output can be analyzed easily. In this paper, efficient models are proposed for the classification of PCG signals. Two robust conglomerated models are proposed initially. The first strategy utilizes semi-supervised Non-negative Matrix Factorization (NMF) along with the Brain Storming (BS) optimization algorithm; an advanced version of BS, termed Advanced BS (ABS), is also proposed, and both are merged with Genetic Programming (GP) to form the new BS-GP and ABS-GP algorithms, whose selected features are finally fed to machine learning classification. The second strategy utilizes three dimensionality reduction techniques along with Fuzzy C-Means (FCM) clustering, after which an Advanced Sine-Cosine (ASC) optimization algorithm with three different modifications is proposed for feature selection before classification. Deep learning techniques were also employed in the study, namely an Attention-based Bidirectional Long Short-Term Memory (A-BLSTM), an Ordinal Variational Autoencoder (O-VAE), Conditional Variational Autoencoders (CVAE), a Hyperspherical CVAE (H-CVAE) and a Restricted Boltzmann Machine based Deep Belief Network (RBM-DBN) for the classification of PCG signals. The experiment is conducted on a publicly available dataset, and the results show that a high classification accuracy of 95.39% is obtained for the semi-supervised NMF concept with the ABS-GP technique and a Support Vector Machine (SVM) classifier.

1. Introduction

Health care has witnessed evolutionary growth, especially in the past two decades (Zhao et al., 2015). The heart, the most essential organ for pumping blood throughout the body, comprises two important cavities called ventricles (Springer et al., 2016). For the complete pumping of blood and its circulation to all the essential organs of the human body, components such as the atria, veins and valves are utilized. During the cardiac cycle, when the valves open and close, heart sounds are produced. A method called auscultation occupies a large space in research on heart disease detection (Liu et al., 2013). Heart abnormalities arise from the turbulent flow of blood in the blood vessels, and the most commonly used technique for detecting them is auscultation, as it is quite cost effective and non-invasive in nature. Among the cardiovascular diseases (CVDs), a high mortality rate is evident for heart valve diseases (HVD) (Schmidt et al., 2010). These diseases occur when the heart valves are damaged. To prevent the backward flow of blood, there are four valves in the human heart: the aortic valve, mitral valve, pulmonary valve, and tricuspid valve (Tang et al., 2016). For the heart to function properly, it is essential that the heart valves open and close appropriately so that the mechanical activity of the heart is well maintained. During this mechanical activity, a sound is produced, and physicians usually utilize a stethoscope to listen to it. The PCG provides a graphical record of the mechanical activity of the heart and supplies most valuable information for the diagnosis of various defects associated with the heart (Ari et al., 2010). Thus, the PCG serves as a low-cost

* Corresponding author.
E-mail address: dongok.won@hallym.ac.kr (D.-O. Won).

https://doi.org/10.1016/j.eswa.2023.119720
Received 4 October 2022; Received in revised form 18 January 2023; Accepted 17 February 2023
Available online 24 February 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

diagnostic tool for the detection of defects in the heart. A visual display of the heart sound waveform is provided by the PCG, and it has proved to give most valuable information about heart conditions. The PCG helps to analyze the different cardiac murmurs and heart sounds for the diagnosis of heart diseases.

PCG signal processing usually involves the analysis and evaluation of normal heart sounds and murmurs, which requires an in-depth understanding of their amplitude, frequency, duration, and other characteristics (Dokur & Ölmez, 2008). When diagnostic decisions on heart disease are to be made, the PCG assists the other traditional methods such as Magnetic Resonance Imaging (MRI) and cardiac echocardiography. For the diagnosis of heart diseases, the auscultation technique is quite subjective, as it depends on the physician's interpretation based on his/her experience and knowledge. Physicians are required to have very high levels of diagnostic experience with auscultation techniques, which take years to acquire, and therefore automated diagnosis of heart disorders is highly essential (Ölmez & Dokur, 2003). In recent years, with the advent of Artificial Intelligence (AI) techniques, heart abnormalities can be detected easily by classifying PCG signals accurately. The important clinical information related to the PCG signal comprises the shape and amplitude of sounds, the systolic and diastolic durations, and the presence of murmurs and sounds (Akay et al., 1994). For the diagnosis of HVDs, long-term recordings of PCG signals are utilized, usually collected from the Intensive Care Unit (ICU) of hospitals. A large quantity of PCG data can be procured from such recordings, and medical practitioners therefore find diagnosis extremely challenging; computer-aided diagnosis is consequently utilized. In the past two decades, tremendous research efforts have been devoted to the automated diagnosis of heart disorders with the help of PCG signals using AI techniques. Some of the most important, recent, and prominent works on PCG signal classification are reviewed as follows.

A wavelet decomposition method and morphological transform analysis were implemented with a Support Vector Machine (SVM) classifier, reporting an accuracy of 91.43% for a representative global dataset of 198 heart sound signals from both healthy cases and cases with heart valve diseases (Maglogiannis et al., 2009). A quality assessment using homomorphic filters and a localization algorithm was classified with SVM, K-nearest neighbours (KNN) and Decision Trees (DT) for the Pascal Classifying Heart Sound Challenge Dataset, reporting an accuracy of 97% (Mubarak et al., 2018). Heart sounds were classified based on wavelet fractals and twin SVM for the Physionet/CinC Challenge 2016 dataset, reporting an accuracy of 90.40% (Li et al., 2019). Using wavelet threshold denoising, S transforms and discrete time–frequency energy features, heart sounds were classified with SVM on the authors' own collected dataset, reporting an accuracy of 94.12% (Chen & Zhang, 2020). Similarly, a wavelet packet decomposition tree yielding a Renyi entropy basis selection with SVM was performed, generating a classification accuracy of 99.74% on their own dataset (Safara & Ramaiah, 2020). An ensemble learning technique based on the bagging and boosting concept was implemented for the Physionet 2016 challenge dataset and reported a classification accuracy of 86.60% (Baydoun et al., 2020). By means of a fusion of temporal and cepstral features, SVM was employed for the classification of PCG signals, reporting a classification accuracy of 95.24% for the Egeneral Medical Heart Murmur Database and the University of Michigan Heart Sound Database (Aziz et al., 2020). A fractional Fourier transform with Mel Frequency Cepstral Coefficients (MFCC) was analysed with KNN and SVM, reporting a classification accuracy of 94.69% for the Physionet 2016 challenge database (Abduh et al., 2020). Another important work utilizing the Physionet 2016 challenge database reported the usage of MFCC features and a Parallel Recurrent Convolutional Neural Network (PR-CNN), with an accuracy of 98.34% (Deng et al., 2020). For a private PCG dataset, a Peterson graph pattern technique was employed for feature generation; the result was then decomposed into low- and high-level features and finally classified with KNN, and a classification accuracy of 100% was obtained (Tuncer et al., 2021). For this same private PCG dataset, MFCC and Discrete Wavelet Transform (DWT) features were utilized with KNN, SVM and deep learning, and a classification accuracy of 97% was obtained (Yaseen, Son and Kwon, 2018). 2D scalograms with Continuous Wavelet Transforms (CWT) and a deep learning CNN were utilized for the Physionet/CinC 2016 challenge dataset and reported accuracies above 90% (Singh et al., 2020). A neuro-fuzzy based modelling with CNN, Artificial Neural Networks (ANN), KNN, SVM and DT was utilized for the Physionet/CinC 2016 challenge dataset, and an accuracy of 93% was obtained (Soares et al., 2020). A fusion framework model based on multi-domain features and deep learning features of PCG was utilized for 175 subjects; with the help of MFCC coefficients and CNN, it produced an accuracy of 90.43% (Li et al., 2020). Principal Component Analysis (PCA) along with SVM and KNN classifiers, for different kernels and numbers of neighbours, was utilized for two private PCG datasets; an accuracy of 96% was obtained for the first and 100% for the second (el Badlaoui et al., 2020). A Fano-factor constrained tunable quality wavelet transform with a genetic algorithm based light Gradient Boosting Machine (GBM) was utilized to obtain an accuracy of 90.25% on the Physionet/CinC challenge 2016 dataset and the PASCAL dataset (Sawant et al., 2021). A Savitzky-Golay filter along with an ensemble CNN model was used to construct a phonocardiogram prediction model, and an accuracy of 89.81% was obtained with a 10-fold cross validation technique for the Physionet/CinC 2016 challenge dataset (Wu et al., 2019). A time-growing neural network model was developed for detecting the systolic pathological patterns of PCG, reporting an accuracy of 84.2% for PCG datasets collected directly from a hospital (Gharehbaghi et al., 2019). MFCC was utilized with the Shannon energy envelope and Hilbert Transform (HT) for feature extraction, a genetic algorithm was then utilized for feature selection, and the result was finally classified with KNN, reporting an accuracy of 98.78% for the PCG recordings from the Ziaee hospital database (Rujoie et al., 2020). Novel features were developed from the synchrosqueezing transform and then classified to obtain an accuracy of 83.48%, with the analysis reported on the authors' own dataset (Pathak et al., 2020). A chirplet transform with multiclass composite classifiers was utilized for a public dataset, reporting a classification accuracy of 98.33% (Ghosh et al., 2020). Temporal quasi-periodic and more discriminative features were analyzed with LSTM on the Physionet/CinC 2016 challenge dataset, reporting an accuracy of 94.84% (Zhang et al., 2019). Using simple signal processing techniques and machine learning algorithms, heart sound classification was implemented for the Physionet/CinC 2016 challenge dataset and the PASCAL dataset, and a best accuracy of 87.5% was obtained for the GBM technique (Zeinali & Niaki, 2022). A total of 515 features were extracted from nine different domains; when classified with SVM, a high accuracy of 88% was obtained for the Physionet/CinC 2016 challenge dataset (Tang et al., 2018). An envelope optimization model was utilized with SVM for two public PCG datasets, and classification accuracies of more than 96% were obtained for both datasets (Yang et al., 2020). The classic Hidden Markov Model (HMM) with MFCC was implemented to analyze PCG signals and reported an accuracy of 95.08% when the HMM has about four states (Wu & Kim, 2010). An investigation of time–frequency features with CNN based automatic heart sound classification reported an accuracy of 86.02% for the Physionet challenge database (Bozkurt et al., 2018), and another famous study on the same dataset deals with ensemble features with deep learning, reporting an accuracy of 85% (Potes et al., 2016).

The main contributions proposed in this work are as follows:

a) Once the basic pre-processing of the PCG signals is carried out using the Savitzky-Golay filter, the first strategy utilizes the concept of semi-supervised Non-negative Matrix Factorization (NMF) along with the Brain Storming optimization algorithm (BS) and an advanced version of BS termed Advanced BS (ABS), and then it is merged with Genetic Programming (GP) so that new algorithms such as BS-


GP and ABS-GP are formed, and finally the features selected through them are fed to classification through machine learning classifiers such as SVM, KNN, the Naïve Bayesian Classifier (NBC), Adaboost and Random Forest (RF).
b) The second strategy utilizes three dimensionality reduction techniques, namely Maximum Variance Unfolding (MVU), Laplacian Eigenmaps (LE) and Kernel Principal Component Analysis (K-PCA), along with Fuzzy C-Means (FCM) clustering; an Advanced Sine-Cosine (ASC) optimization algorithm with three different modifications and versions is then proposed for the purpose of feature selection, and the result is finally classified through machine learning.
c) Deep learning techniques were also employed in the study, namely an Attention based Bidirectional Long Short-Term Memory (A-BLSTM), Ordinal Variational Autoencoders (O-VAE), Conditional Variational Autoencoders (CVAE), Hyperspherical CVAE (H-CVAE) and the Restricted Boltzmann Machine based Deep Belief Network (RBM-DBN).
d) The above proposed robust conglomerated models are the first of their kind to be proposed in PCG signal classification for the detection of heart valve problems. The work deals with an interesting conglomeration of efficient models so that a better classification accuracy can be obtained in order to improve the detection of heart valve diseases.

2. Proposed approach 1: Semi-supervised NMF based hybrid BS-GP and hybrid ABS-GP algorithm with classification

Consider a PCG data matrix P = [p_1, ..., p_n] \in R^{m \times n}, where the feature vector is indicated by p_i, the number of samples is represented by n and the dimension of the feature vectors is indicated by m. The original high-dimensional non-negative data matrix P is decomposed by NMF into two low-rank non-negative factors A \in R^{m \times c} and B \in R^{n \times c}, with c \ll \min\{m, n\}, so that the matrix P is approximated by AB^T as closely as possible, specified by P \approx AB^T. The quality of the approximation is quantified by two widely used cost functions: the divergence between the two matrices and the square of the Frobenius norm of the matrix difference (Kotsia et al., 2007). For the Frobenius norm cost function, the problem is expressed as:

\min_{A,B} C_F = \|P - AB^T\|_F^2, \quad \text{s.t. } A \geq 0, B \geq 0   (1)

Once the divergence cost function is used, the problem is expressed as:

\min_{A,B} C_{div} = \sum_{i,j} \left( p_{ij} \log \frac{p_{ij}}{\sum_k a_{ik} b_{jk}} - p_{ij} + \sum_k a_{ik} b_{jk} \right), \quad \text{s.t. } A \geq 0, B \geq 0   (2)

The objective functions C_F in (1) and C_div in (2) are convex with respect to A alone and B alone, but not with respect to both jointly; hence finding the global minimum of either C_F or C_div is unrealistic. Updating rules whose convergence can be proven are therefore presented for the two minimization problems. To minimize the objective function C_F in (1), the method is as follows:

a_{ik} \leftarrow a_{ik} \frac{(PB)_{ik}}{(AB^T B)_{ik}}, \quad b_{jk} \leftarrow b_{jk} \frac{(P^T A)_{jk}}{(BA^T A)_{jk}}   (3)

To minimize the objective function C_div in (2), the method is as follows:

a_{ik} \leftarrow a_{ik} \frac{\sum_j \left( p_{ij} b_{jk} / \sum_s a_{is} b_{js} \right)}{\sum_j b_{jk}}   (4)

b_{jk} \leftarrow b_{jk} \frac{\sum_i \left( p_{ij} a_{ik} / \sum_s a_{is} b_{js} \right)}{\sum_i a_{ik}}   (5)

2.1. Semi-supervised NMF

The label information of the initial 'd' datapoints p_i (1 \leq i \leq d) is assumed without loss of generality, and there are in total 'z' classes. An indicator matrix M \in R^{d \times z} is introduced so that the available label information can be incorporated as follows:

m_{ij} = \begin{cases} 1, & p_i \text{ belongs to the } j\text{-th cluster} \\ 0, & \text{otherwise} \end{cases}   (6)

A label constraint matrix Q is defined with the help of the indicator matrix M as follows:

Q = \begin{bmatrix} M & 0 \\ 0 & I_{n-d} \end{bmatrix}   (7)

where I_{n-d} denotes an (n-d) \times (n-d) identity matrix, and the 0s are zero matrices of compatible dimensions. An auxiliary matrix S is introduced so that B = QS, and the objective function with the label constraints is minimized by the Constrained NMF (CNMF) algorithm, for both the Frobenius norm cost function and the divergence cost function (Cai et al., 2011):

\min_{A,S} C_F = \|P - AS^T Q^T\|_F^2, \quad \text{s.t. } A \geq 0, S \geq 0   (8)

\min_{A,S} C_{KL} = \sum_{i,j} \left( p_{ij} \log \frac{p_{ij}}{(AS^T Q^T)_{ij}} - p_{ij} + (AS^T Q^T)_{ij} \right), \quad \text{s.t. } A \geq 0, S \geq 0   (9)

Data points with the same label are thereby mapped into the same group of the low-dimensional space by the CNMF. For the matrices A, S, the iterative updating rules for (8) are expressed as:

a_{ij} \leftarrow a_{ij} \frac{(PQS)_{ij}}{(AS^T Q^T QS)_{ij}}   (10)

s_{ij} \leftarrow s_{ij} \frac{(Q^T P^T A)_{ij}}{(Q^T QSA^T A)_{ij}}   (11)

A partly clustering indicator matrix L is introduced, L = \left( M^T \; O_{z \times (n-d)} \right), so that the following objective function is minimized by the Discriminative NMF (DNMF) algorithm, represented as:

\min_{Q,A,B} C_F = \|P - AB^T\|_F^2 + \alpha \|L - QB_d^T\|_F^2, \quad \text{s.t. } A \geq 0, B \geq 0   (12)

where \alpha > 0, Q \in R^{z \times c} may take negative values, and B_d denotes the matrix formed by setting all rows of B except 1, 2, ..., d to zero. For the matrices A, B, Q the iterative updating rules are expressed as:

a_{ij} \leftarrow a_{ij} \frac{(PB)_{ij}}{(AB^T B)_{ij}}   (13)

b_{ij} \leftarrow b_{ij} \frac{\left( P^T A + \alpha (B_d Q^T Q)^- + \alpha (L^T Q)^+ \right)_{ij}}{\left( BA^T A + \alpha (B_d Q^T Q)^+ + \alpha (L^T Q)^- \right)_{ij}}   (14)

Q = L B_d \left( B_d^T B_d \right)^{-1}   (15)

Thus, to improve the classification accuracy, the CNMF and DNMF techniques are widely used. Fig. 1 shows the illustration of the first proposed strategy. The obtained NMF and semi-supervised NMF features are merged; then, using the proposed feature selection techniques, the best features are selected and fed to classification.
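The multiplicative updates in Eq. (3) translate almost line-for-line into NumPy. The sketch below is an illustrative implementation, not the authors' code; the small epsilon added to the denominators is our numerical-stability guard, and the random initialization is an assumption.

```python
import numpy as np

def nmf_frobenius(P, c, n_iter=200, eps=1e-9, seed=0):
    """Minimise ||P - A B^T||_F^2 via the multiplicative updates of Eq. (3)."""
    rng = np.random.default_rng(seed)
    m, n = P.shape
    A = rng.random((m, c))          # basis factor, m x c
    B = rng.random((n, c))          # coefficient factor, n x c
    for _ in range(n_iter):
        A *= (P @ B) / (A @ (B.T @ B) + eps)    # a_ik <- a_ik (PB)_ik / (AB^T B)_ik
        B *= (P.T @ A) / (B @ (A.T @ A) + eps)  # b_jk <- b_jk (P^T A)_jk / (BA^T A)_jk
    return A, B
```

Because both factors stay non-negative under these updates, the standard convergence result for multiplicative NMF updates applies; the CNMF/DNMF rules in Eqs. (10)-(15) follow the same multiplicative pattern with the label-constraint matrices inserted.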


Fig. 1. Proposed strategy 1: Semi-supervised Non-negative Matrix Factorization (NMF) based Hybrid BS-GP and Hybrid ABS-GP algorithm with classification.

2.2. BS optimization algorithm

The BS algorithm simulates the collective behavior of human beings when solving problems (Shi, 2011). Even multi-peaked high-dimensional function problems are solved by this BS optimization algorithm. In a specific population there are many individuals, and every individual indicates a prospective solution to the problem. The framework is expressed in Algorithm 1.

Algorithm 1. Pseudocode of the BS algorithm

Input: The population size Ps, number of clusters Nc, current number of iterations cit and maximum number of iterations mit.
Output: Best individual and the respective fitness value
  Initialize randomly the Ps individuals
  Evaluate the fitness value for each individual
  Cluster center disruption stage: assess whether a central individual is replaced randomly
  While cit < mit do
    Cluster the Ps individuals into Nc clusters using K-means clustering
    Generate new individuals: select an individual randomly, or hybridize two individuals, so that new individuals are generated as per equation (16)
    Selection stage: compare the newly generated individuals with the old individuals and select the superior individuals
    Enter the next iteration
  End While

The generation of the Ps solutions of the problem to be solved is initiated. With the help of clustering algorithms, these Ps individuals are split into Nc clusters, where Nc denotes a pre-set parameter
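Algorithm 1 can be condensed into a short, self-contained sketch. Everything below (the search range, the parameter values, and the simplified disruption step) is illustrative rather than the paper's exact configuration; the perturbation uses the step-size rule of Eqs. (16)-(18).

```python
import numpy as np

def kmeans_labels(X, k, iters=10, rng=None):
    """Minimal K-means used for the clustering stage of Algorithm 1."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def bso(fitness, dim, ps=30, nc=3, mit=50, p_replace=0.2, p_cluster=0.8, k=20.0, seed=0):
    """One possible reading of Algorithm 1 (minimisation)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, (ps, dim))           # initialise Ps individuals
    fit = np.array([fitness(x) for x in pop])
    for t in range(mit):
        labels = kmeans_labels(pop, nc, rng=rng)      # split into Nc clusters
        if rng.random() < p_replace:                  # disruption: re-seed one individual
            i = rng.integers(ps)
            pop[i] = rng.uniform(-1.0, 1.0, dim)
            fit[i] = fitness(pop[i])
        # step size xi(t) from Eqs. (17)-(18): logsig((0.5*T - t)/k) * rand()
        xi = 1.0 / (1.0 + np.exp(-(0.5 * mit - t) / k)) * rng.random()
        for i in range(ps):
            if rng.random() < p_cluster:              # base drawn from the same cluster...
                same = np.flatnonzero(labels == labels[i])
                base = pop[rng.choice(same)]
            else:                                     # ...or from the whole population
                base = pop[rng.integers(ps)]
            new = base + xi * rng.normal(0.0, 1.0, dim)   # Eq. (16)
            f_new = fitness(new)
            if f_new < fit[i]:                        # selection: keep the superior one
                pop[i], fit[i] = new, f_new
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

Minimising a simple sphere function with this sketch drives the best fitness well below the initial random level within a few dozen iterations, which is enough to see the cluster-then-perturb-then-select cycle at work.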

Fig. 2. Schematic illustration of GP algorithm.


in BSO. Based on the fitness values, the individuals in each cluster are ranked. The paradigmatic individual in every cluster is assigned as the central individual of the cluster. The central individual of the cluster is then treated in a random manner: depending on the probability value p, it is decided whether another randomly generated individual replaces it. Then the population is updated. To control whether a new individual is generated from a single individual or by amalgamating two individuals, p_cluster is used in every individual update iteration. Whether the generated new individual stems from a central individual or an ordinary individual is decided by p_central. Finally, the fitness values of the newly generated individuals are calculated; they are compared and contrasted with the current individuals, and the better individuals are retained. Based on equations (16) and (17), the generation of the new individuals is done as:

p_i^{new} = p_i^{old} + \xi(t) \cdot N(\mu, \sigma^2)   (16)

\xi(t) = \mathrm{logsig}\left( \frac{0.5 \cdot T - t}{k} \right) \cdot \mathrm{rand}()   (17)

where the i-th new individual is represented as p_i^{new} and the individual to be updated is represented as p_i^{old}. Here the random number drawn from a normal distribution is expressed as N(\mu, \sigma^2). The random function is expressed as rand(), and its random values are generated within (0, 1). The maximum number of iterations is specified by T and the current number of iterations by t, respectively. To control the logsig() function, the coefficient k is used. It helps to modify the search step-size of \xi(t) so that the convergence speed of the procedure can be balanced. The transfer function is assessed as follows:

\mathrm{logsig}(\alpha) = \frac{1}{1 + \exp(-\alpha)}   (18)

The individuals are selected from one cluster so that new individuals are generated, and therefore the evolution area of BSO is managed and controlled well. Moreover, to create and generate new individuals, the merging individuals are selected from multiple clusters. The population diversity of the algorithm is maintained through new individuals that depend on the information of multiple individuals.

2.3. Advanced BS (ABS) optimization algorithm

To retain the supreme individuals in the BS optimization algorithm, two strategies can be considered so that the information exchange between the individuals is enhanced (Shi et al., 2013): a nearest neighbour strategy and a global optimal strategy. In the nearest neighbour strategy, the generation of the new individuals is guided by the neighbouring individuals, so the information exchange between similar individuals is increased and thereby the exploration capability of the algorithm is improved. The mathematical expression is as follows:

p_i^{new} = p_i^{old} + (p_{near} - p_i^{old}) \cdot N(\mu, \sigma^2)   (19)

where p_near is the closest individual to the i-th individual, traced by the nearest neighbour algorithm. In the global optimal strategy, the generation of the new individuals is guided by the globally optimal individuals, so more data about the optimal individual is preserved. The mathematical expression is given as follows:

p_i^{new} = p_i^{old} + (p_{optimal} - p_i^{old}) \cdot N(\mu, \sigma^2)   (20)

where the global optimal individual is represented by p_optimal, and it is obtained by comparing the individuals at the cluster centers. To decide whether the new individual is generated by the nearest neighbour strategy or the global optimal strategy, the probability value p_cluster is used.

2.4. GP algorithm

In order to ascertain a mathematical correspondence between dependent variables and independent variables, a famous evolutionary algorithm used is GP (Keijzer, 2004). A collection of terminator sets and
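The two ABS generation strategies of Eqs. (19)-(20) both amount to pulling an individual towards a guide point. A minimal sketch follows; the per-dimension Gaussian draw and the default probability value are illustrative assumptions of this sketch, not the paper's stated choices.

```python
import numpy as np

def abs_new_individual(pop, fit, i, p_cluster=0.5, rng=None):
    """Generate a new i-th individual with the ABS strategies of Eqs. (19)-(20)."""
    rng = rng or np.random.default_rng(0)
    x = pop[i]
    if rng.random() < p_cluster:
        # Nearest neighbour strategy, Eq. (19): guide towards the closest individual.
        d = np.linalg.norm(pop - x, axis=1)
        d[i] = np.inf                      # exclude the individual itself
        guide = pop[np.argmin(d)]
    else:
        # Global optimal strategy, Eq. (20): guide towards the best individual found.
        guide = pop[np.argmin(fit)]
    # N(mu, sigma^2) is drawn per dimension here (an assumption of this sketch).
    return x + (guide - x) * rng.normal(0.0, 1.0, size=x.shape)
```

The same p_cluster probability that the text describes decides which of the two guides is used on each call.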

Fig. 3. Schematic illustration of the Hybrid BS-GP algorithm and the Hybrid ABS-GP algorithm.


Fig. 4. Second Proposed Strategy: Dimension Reduction techniques with FCM and Advanced Sine-Cosine Optimization based classification.

function sets are utilized to assemble a mathematical expression in the GP algorithm. A random selection of both the terminator sets and function sets is made, depending on the evolutionary operations of GP. By fitting the data, the mathematical function is constructed, and it has no explicit physical meaning. Variables, function relationships and constants are included in the terminator sets. Boolean operations, arithmetic symbols, conditional expressions, and mathematical functions are included in the function sets. A function tree is utilized to specify every individual in GP. Based on some rules, the nodes of the tree are selected from the function set and terminator set. The association between dependent variables and independent variables is specified by a sequence of formula trees in the GP algorithm, and then the initial individuals are generated. Basic evolution operations such as selection, crossover and mutation are applied to the formula trees. The assessment of the formula trees, which determines the other evolution operations, is judged by the selection operation. Tournament selection is the most commonly used selection technique. For the individuals selected and picked by the tournament, the crossover and mutation operations are implemented. Point mutation, subtree mutation and hoist mutation are the three types of mutation operations. The succeeding generation is produced with the help of these operations. This process is iterated until the terminator condition is satisfied. The main implementation of GP is shown in Fig. 2.

2.4.1. Selection operation
The individuals which can evolve into the next generation are decided by this selection operation. The tournament selection method is principally considered in GP. Initially, from the parent population, a competitive population of tournament size t_size is selected randomly. In the competitive population, the individual with the best fitness value is selected so that it can join the offspring population. This operation is repeated until the offspring population size reaches the original population size. The selection process is controlled by the tournament size t_size.

2.4.2. Crossover operation
In this operation, genes are mixed between the selected individuals, and the process is governed by the crossover probability cp_1. The winners of the tournament-dependent selection procedure are considered, so two parents are formed initially. The crossover operation between them is carried out on randomly selected subtrees: in order to generate two individuals of the next generation, one subtree of a winner is exchanged with a corresponding subtree of another winner.

2.4.3. Subtree mutation
The mutation probability mp_2 is used to control the subtree mutation operation. Eliminated individuals can be reintroduced into the population by this operation, and the diversity of the population is thereby easily perpetuated. One of the subtrees of the tournament winner, along with another subtree of similar specification, is selected randomly in this process. In order to create the new individuals, the randomly generated subtree replaces the original subtree.

2.4.4. Hoist mutation
The unbounded expansion of the tree is controlled by the hoist mutation operation, governed by the hoist mutation probability hmp_3. To produce a new individual, a lower subtree of a main subtree of the tournament winner is selected randomly and then hoisted to the position of the original subtree.

2.4.5. Point mutation
The probability of point mutation pm_4 is used to control the point mutation. A node of the tournament winner is selected randomly in this operation initially. Then, in order to create the new individual, it is replaced by a randomly generated node.

2.5. Proposed hybrid BS-GP and hybrid ABS-GP algorithm

In the proposed BS-GP algorithm, BS and its improvised version are utilized to optimize the four main controlling parameters of GP: the crossover probability cp_1, subtree mutation probability mp_2, hoist mutation probability hmp_3 and point mutation probability pm_4. The main procedure of the proposed Hybrid BS-GP and Hybrid ABS-GP is as follows. The initial conditions are determined, such as the population size Ps, current number of iterations cit, maximum number of iterations mit and the number of clusters Nc. The maximum iterations of BS and ABS are also set, along with the lower and upper bounds. The maximum iterations of GP, the population size and the function set/terminator set of the BS-GP and ABS-GP algorithm are determined. The preliminary conditions are prioritized into two different categories: (i) the computing conditions of the BS-GP and ABS-GP algorithm, and (ii) the controlling parameters of BS, ABS and GP. The four major controlling parameters of GP are the optimized parameters. Therefore, there are 'four' optimized parameters in our experiment, and their upper and lower bounds are determined in advance, as they can have a significant influence on the execution time. To establish the model function, the function set and terminator set are utilized. The controlling parameters of BS, ABS and GP are set beforehand. The individual positions are randomly initialized within the upper and lower bounds. By utilizing the selected fitness function, the fitness values are computed. Generally, Mean Square Error (MSE) and Root Mean Square Error (RMSE) are widely used as fitness functions; in our research, MSE is implemented. The decision variables are nothing but the parameters which have to be optimized. The initial values randomly generated by the BS and ABS algorithm are assigned as the initial parameters of the GP algorithm. The relationship between variables can be indicated by the formula tree generated by GP. Once the relationship is generated and established, the corresponding fitness value is computed utilizing the fitness functions. The position of the individual with the optimal fitness value is considered as the initial position of the BS and ABS algorithm. Every individual in BS and ABS specifies a collection of versatile
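The tournament selection of Section 2.4.1, the mechanism that feeds parents to the crossover and mutation operators above, can be sketched as follows. A minimising fitness convention and the parameter names are illustrative choices of this sketch.

```python
import numpy as np

def tournament_select(fitness, t_size, n_offspring, rng=None):
    """Section 2.4.1: draw t_size competitors at random without replacement,
    keep the index of the fittest (lowest fitness here), and repeat until
    n_offspring parents are chosen."""
    rng = rng or np.random.default_rng(0)
    f = np.asarray(fitness)
    winners = []
    for _ in range(n_offspring):
        competitors = rng.choice(len(f), size=t_size, replace=False)
        winners.append(int(competitors[np.argmin(f[competitors])]))
    return winners
```

With t_size equal to the population size every tournament returns the global best, which illustrates how t_size controls the selection pressure that the text describes.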


optimized parameters. The individual values of BS and ABS are then transferred to the inner loop as controlling parameters. The GP individuals are generated and then their fitness values are computed. A formula tree is characterized by every individual, and it indicates a relationship between dependent and independent variables. The fitness value indicates nothing but the quality of the formula tree. The GP operations continue until the termination conditions are satisfied. Once the fitness values are obtained after the termination of the inner loop, the best value obtained is considered as the fitness value of the current generation. This implies that the optimal formula tree of this generation has been found by the BS-GP and ABS-GP algorithm. When the maximum number of iterations is reached by the GP algorithm, the inner loop is terminated. The inner loop can also be terminated if the error of one generation is much less than 10^{-4}. The next BS/ABS iteration begins unless the termination condition of the outer loop is satisfied. Therefore, for all the generations, the optimal formula tree is found. When the maximum number of iterations is reached by the BS/ABS algorithm, the termination criterion of the outer loop is achieved. The optimized parameters can be easily obtained once the termination conditions of the outer loop are satisfied, and therefore the

(a) k_{ii} - 2k_{ij} + k_{jj} = \|p_i - p_j\|^2 for \forall (i, j) \in G
(b) \sum_{ij} k_{ij} = 0
(c) K \geq 0

By implementing a Singular Value Decomposition (SVD), the low-dimensional data representation Q can be obtained from the solution K of the SDP.

3.1.2. Kernel PCA
When the traditional linear PCA is reformulated using a kernel function construction in a high dimensional space, it is called Kernel PCA (Moghaddasi et al., 2014). The principal eigenvectors of the kernel matrix are computed by the Kernel PCA. The PCA is reformulated in a kernel space in a straightforward manner using the kernel function, because the kernel matrix is quite similar to the inner product of the respective datapoints. When PCA is implemented in the kernel space, it forms a Kernel PCA, whereby non-linear mappings can be constructed very easily.
For the datapoints p_i, the kernel matrix K is computed by the Kernel
best features are selected to feed it to classification. The flowchart for the PCA. The definition in the entries of the Kernel matrix is expressed as:
proposed BS-GP and ABS-GP algorithm is shown in Fig. 3. (
kij = ke pi , pj
)
(22)

3. Proposed approach 2: Dimension reduction techniques with where ke represents a kernel function. Utilizing the following modifi­
FCM and advanced Sine-Cosine optimization based classification cation, the centering of the Kernel matrix K is done as follows:

3.1. Dimensionality reduction techniques 1∑ 1∑ 1∑


kij = kij − kil − kjl + 2 klm (23)
n l n l n lm
In this work, the non-linear dimension reduction is expressed as
Subtraction of the mean of the features in the standard PCA is cor­
follows. The specification of a PCG dataset in a n × D matrix format P is
responded by the centering operation. The features in the high-
considered initially where there are ‘m’ data vectors pi (i ∈ {1, 2, ..., n} )
dimensional space which is expressed by means of using the kernel
with a distinct dimensionality D. When d < D or d≪D, then an inherent
functions that are zero-mean. The computation of the principal d eigen
dimensionality is obtained by this dataset. With the help of dimen­ ( )
vectors evi of the kernel matrix which is centered are done. Then the
sionality reduction, the dataset P with a dimensionality D is remodeled
computation of the eigen vectors of the covariance matrix α1 is done as
into a contemporary dataset Q with a dimensionality d. Fig. 4 shows the
they share a good relation to the eigen vector of the kernel matrix evi by
second proposed strategy.
using the following equation as:
3.1.1. Maximum variance unfolding (MVU) 1
αi = √̅̅̅̅ Pevi (24)
In MVU, a neighbourhood graph is defined on the PCG data and so λi
the pairwise distances are retained in the obtained graph. The unfolding
The projection of the data onto the eigen vector of the covariance
of the data manifold is attempted explicitly by the MVU (Luo et al.,
matrix αi is done so that the low-dimensional data representation is
2013). In between the datapoints, the Euclidean distances are maxi­
obtained. Therefore, the low dimensional data representation Q is
mized under the condition that there is no change in the distances of the
expressed as a result of the projection and it is given as:
neighbourhood graph. Therefore, there is no distortion about the local
{ }
geometry of the data manifold. Then by utilizing Semi-Definite Pro­ ∑n
( ) ∑
n
( )
gramming (SDP), the solving of the optimization problem can be done qi = αj1 ke pj , pi , ..., αjd ke pj , pi (25)
easily. A neighbourhood graph ‘G’ is initially constructed where the
j=1 j=1

connection of every datapoint pi is done to its k-nearest neighbour pij (j = j


1, 2, 3, ...., k). In between all the datapoints, the sum of the squared where the jth value belonging to the vector α1 is specified using α1 and
Euclidean distances is maximized by the MVU, so that the preservation the kernel function and is expressed as ke and utilized for the kernel
of the distances which are to innermost region of the neighbourhood matrix computation. Based on the choice of the kernel function ke , the
graph G are done. In simple words, the optimization problem is carried mapping is implemented. Here in this work, Gaussian kernel is used
out as follows: instead of Linear kernel and Polynomial kernel.
∑ ⃦⃦
⃦2

Maximize ij ⃦qi − qj ⃦ subject to: 3.1.3. Laplacian Eigenmaps
⃦ ⃦ ⃦ ⃦ By means of preserving the local properties of the manifold, a low
⃦qi − qj ⃦2 = ⃦pi − pj ⃦2 for∀(i, j) ∈ G (21) dimensional data representation is found out by Laplacian Eigenmaps
which are quite similar to LLE (Wang et al., 2015). The pairwise dis­
By means of defining a matrix K as an inner product specification of
tances between the near neighbours are utilized to construct the local
the low dimensional data representation Q, the optimization problem is
properties in Laplacian eigenmaps. The Laplacian Eigenmaps helps to
reformulated as a semi-definite programming problem by MVU. The
compute a low-dimensional representation of the data where the mini­
following SDP has the similarity with the optimization problem and is
mization of the distance between a datapoint and its k-nearest neighbour
represented as:
are implemented successfully. In between a datapoint and its initial
The trace (K) subject to the conditions (a), (b) and (c) expressed
nearest neighbours, the distance in the low dimensional data represen­
below is maximized with:
tation attributes highly to the cost function than it does when assessing
the distance between the specific datapoint and its second nearest

Fig. 5. Simplified Illustration of the Attention based BLSTM networks.

The cost function minimization is considered as an eigenproblem by means of utilizing the spectral graph theory. Every datapoint pi is connected to its k nearest neighbours by means of the construction of a neighbour graph G initially. By utilizing the Gaussian kernel function, the weights of the edges linking all points pi and pj in graph G are computed, so that a sparse adjacency matrix W is found out. When the low dimensional representations qi are computed, the cost function minimized is expressed as:

φ(Q) = Σ_ij (qi − qj)² wij  (26)

Between the datapoints pi and pj, the large weights wij match closely to small distances. Therefore, a large contribution to the cost function is offered by the differences between their low-dimensional representations qi and qj. Therefore, the neighbouring parts in the high dimensional space are closely spaced and brought in conjunction with the low dimensional representation. For expressing the minimization problem as an eigenproblem, the degree matrix M and the graph Laplacian L of the graph W are computed. The degree matrix M of W is a diagonal matrix whose entries are the row sums of W (i.e., mii = Σ_j wij). The computation of the graph Laplacian L is done as follows:

L = M − W  (27)

The following equation holds well:

φ(Q) = Σ_ij (qi − qj)² wij = 2 Q^T L Q  (28)

Therefore, minimizing φ(Q) is proportional to reducing Q^T L Q. By means of solving the generalized eigenvalue problem Lev = λMev for the d smallest nonzero eigenvalues, the low dimensional data representation Q is easily found. The d eigenvectors evi corresponding to the smallest nonzero eigenvalues form the low-dimensional data representation Q.

3.2. Fuzzy C-means clustering

Once the dimensionality reduction is performed, FCM, the most successful among the fuzzy clustering algorithms, is applied (Cheng et al., 2010). By means of optimizing the objective function J(A, B), the membership degrees of the PCG data points to the clustering centers are obtained, so that the class of the PCG data points can be assessed and hence the automatic categorization of the sample data can be done easily. The basic expression of it is as follows:

J(A, B) = Σ_{i=1}^{c} Σ_{j=1}^{N} a_{ij}^m d²(pj, bi)  (29)

where the object pj belongs to the sample set P = {p1, p2, ..., pN}, the number of sample data is represented as N, and the number of clusters is expressed as c ∈ [1, N]. The membership degree of pj in the ith cluster is indicated as a_{ij}. The distance measure between the object pj and the cluster center bi is expressed as d(pj, bi) and the fuzzifier constant is indicated as m. The FCM clustering algorithm is outlined as follows:

Step 1: The values of m, c, N are set along with the accuracy of the objective function ε.
Step 2: The fuzzy partition matrix A is initialized.
Step 3: The fuzzy partition matrix A and the cluster centers B are updated by (30) and (31) as follows:

a_{ij} = 1 / Σ_{k=1}^{c} [d²(pj, bi) / d²(pj, bk)]^{2/(m−1)}  (30)

bi = Σ_{j=1}^{N} a_{ij}^m pj / Σ_{j=1}^{N} a_{ij}^m  (31)

Step 4: If |J(t) − J(t + 1)| < ε, then the algorithm is terminated; otherwise Step 3 of FCM is repeated.

Once the FCM is implemented, the best features are selected by the proposed versions of the Sine-Cosine Algorithm (SCA).

3.3. Sine Cosine Algorithm (SCA)

An optimization technique dependent on both the sine and cosine mathematical functions is SCA (Lin et al., 2019). There are two stages in the optimization procedure of SCA, namely exploitation and exploration. The methodology of searching an extensive area of the search space is the first step of this metaheuristic algorithm. Therefore, in the exploration stage, the promising regions are found out with an excessive rate of randomness. The exploitation stage has a procedure of tracing the regions of a particular search space within a specific neighbourhood of earlier visited points. The updating of the candidate solution positions in SCA (for both stages) is expressed as follows:

P_i^(t+1) = P_i^t + r1 × sin(r2) × |r3 P_b^t − P_i^t|  (32)

P_i^(t+1) = P_i^t + r1 × cos(r2) × |r3 P_b^t − P_i^t|  (33)
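The SCA position update of Eqs. (32)-(35) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the sampling ranges of r2, r3 and r4 follow the common SCA convention and are assumptions here:

```python
import numpy as np

def sca_step(positions, best, t, T, c=2.0, rng=None):
    """One position update of the Sine Cosine Algorithm, Eqs. (32)-(35).

    positions : (n, d) array of candidate solutions P_i^t
    best      : (d,) array, best solution found so far P_b^t
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = positions.shape
    r1 = c * (1.0 - t / T)                        # Eq. (35): shrinks over iterations
    r2 = rng.uniform(0.0, 2.0 * np.pi, (n, d))    # direction of movement
    r3 = rng.uniform(0.0, 2.0, (n, d))            # random weight on the best position
    r4 = rng.uniform(0.0, 1.0, (n, d))            # switch between sine and cosine
    dist = np.abs(r3 * best - positions)          # |r3 * P_b^t - P_i^t|
    sine = positions + r1 * np.sin(r2) * dist     # Eq. (32)
    cosine = positions + r1 * np.cos(r2) * dist   # Eq. (33)
    return np.where(r4 < 0.5, sine, cosine)       # Eq. (34): r4-controlled switch
```

Note that at t = T the factor r1 vanishes, so the candidates stop moving; this is the exploration-to-exploitation decay the text describes.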
where the position of the current solutions is indicated as P_i^t. The random numbers are indicated by r1, r2, r3 respectively. The position of the best solution is indicated by P_b^t. The dimension is denoted by i and the iteration is specified by t. Equations (32) and (33) are utilized based on the conditions in the following equation:

P_i^(t+1) = { P_i^t + r1 × sin(r2) × |r3 P_b^t − P_i^t|,  r4 < 0.5
            { P_i^t + r1 × cos(r2) × |r3 P_b^t − P_i^t|,  r4 ⩾ 0.5  (34)

Regarding the random numbers in the updating equations of the SCA, there are four parameters (r1 − r4). The r1 parameter is used to assess the area of the next position of a candidate solution as follows:

r1 = c(1 − t/T)  (35)

where the maximum number of iterations is specified by T, and the current iteration number is specified by t. The constant is denoted by c. In order to evaluate the direction of movement for the next solution, the r2 parameter is utilized. With the help of the r3 parameter, the random weights for P_b^t are found out. Between the two updating mechanisms, the switches are provided by the r4 parameter. The pseudocode of the original SCA is given in Algorithm 2 as follows:

Algorithm 2
Pseudocode of SCA algorithm:

Initialize a set of candidate solution positions randomly
Compute the objective function value for every candidate
Save the best solution
Save the position of the best candidate
While (t < maximum number of iterations)
  Update r1 parameter
  For i = 1 to population size do
    Update random parameters (r2, r3, r4)
    If (r4 < 0.5)
      Position updation of candidate solution is carried out by equation (32)
    Else
      Position updation of candidate solution is carried out by equation (33)
    End if
    Compute the objective value for each updated candidate
    Update the best solution
    Update the position of the best solution
  End for i
End while
Return the best solution at the loop end and collect the best features

3.4. Proposed hybrid and surrogate SCA

The improvised version of the original SCA is presented here and expressed as the ASC algorithm. Slow convergence, the local optima problem and a long run time are some of the deficiencies of the original SCA (Long et al., 2018). A bad host value could be obtained by the refurbished position of the candidate solution, and the updated positions can easily go beyond the search space in the exploration phase. Therefore, to enhance the search performance of SCA and to overcome these demerits, the update implementation procedure of the original SCA is focused on and then a hybrid surrogate model is developed. Therefore, there is a substantial improvement in the search ability of the algorithm in both phases. The most propitious areas in the search space are explored locally by the proposed update mechanisms. In the exploitation stage, the mutation implementation strategy is expressed as follows:

P_i^(t+1) = P_i^t + 0.1 × (p_max − p_min) × r5  (36)

where the upper boundary of the candidate solution position is indicated by p_max and the lower boundary by p_min respectively. A normally distributed random number is expressed by r5.

3.4.1. Advanced Sine Cosine Algorithm – Version 1 (ASC-1)
In the first proposed version of the ASC algorithm, the standard position updating mechanisms (Equations (32) and (33)) of the original SCA are replaced by the combined average of the sine function and its surrogate sin(r2′), and of the cosine function and its surrogate cos(r2′). The candidate solution position in the ASC algorithm is expressed as:

P_i^(t+1) = P_i^t + r1 × 0.25 × [sin(r2)·sin(r2′) + cos(r2)·cos(r2′)] × |r3 P_b^t − P_i^t|  (37)

The purpose of adding surrogate data is that it helps in reproducing various statistical properties of a dataset, and so it is used here. The global best is represented by P_b^t and is the end solution amongst all the P_i^t solution positions. For the updating of the positions, the mean of the sine, sine surrogate, cosine and cosine surrogate functions is used for the random parameter r2.

3.4.2. Advanced Sine Cosine Algorithm – Version 2 (ASC-2)
In this proposed version, the candidate solution position is randomly determined from the population initially. It is utilized in lieu of the destination or end position's candidate solution. In this version, the position updating equation is expressed as:

P_i^(t+1) = P_i^t + r1 × 0.25 × [sin(r2)·sin(r2′) + cos(r2)·cos(r2′)] × |r3 P_r^t − P_i^t|  (38)

Here, the randomly selected position of a candidate solution from the population is expressed by P_r^t. In the original SCA algorithm, there may be no change in the global best solution for many iterations. Therefore, the gathering of all the candidate solutions occurs after a certain iteration, and in fact the population comes to resemble the global best solution position very closely. Due to this, the local optima cannot be escaped, and so late convergence is caused. In order to improve the diversity of the population, P_r^t is used instead of P_b^t so that random data with respect to the search space can be easily traced.

3.4.3. Advanced Sine Cosine Algorithm – Version 3 (ASC-3)
A simple weighted average is applied to the updated position and the proposed update mechanism is expressed as follows:

P1 = P_i^t + r1 × [sin(r2) + cos(r2)] × |r3 P_b^t − P_i^t|  (39)

P2 = P_i^t + r2 × [sin(r2) + cos(r2)] × |r3 P_r^t − P_i^t|  (40)

P_i^(t+1) = 0.5 × (r4 P1 + r5 P2)  (41)

where the updated positions with the sine and cosine functions are represented as P1 and P2 respectively. The normally distributed random numbers are expressed by r4 and r5. In the traditional SCA, Equation (34) determines the mechanism of position updating, but in our experiment, three versions were proposed to find the updated positions P1 and P2 respectively. The pseudocode of the ASC algorithm for all the three versions is given in Algorithm 3 as follows:

Algorithm 3
Pseudocode of ASC algorithm (for all the three versions)

Initialize a set of positions of candidate solutions randomly
Compute the objective function value for every candidate
Save the best solution
Save the best position of the candidate
While (t < maximum number of iterations)
  Old candidates = count candidates
  Updation of the r1 parameter is done
  For i = 1 to population size do
    The random parameters r2, r3 are updated
(continued below)
Fig. 6. Simplified Illustration of the Deep Learning Modules with CVAE, HVAE and RBM-DBN networks.

Algorithm 3 (continued)

    Positions of candidate solutions are updated using equation (37) in ASC-1
    Positions of candidate solutions are updated using equation (38) in ASC-2
    Positions of candidate solutions are updated using equations (39), (40), (41) in ASC-3
    Position updation using Mutation operator
    Boundary check mechanism is implemented
    Candidate sent inside search space
    Compute the objective function value for every updated candidate
    If (cost value of updated candidate < cost value of the old candidate)
      Next candidate = updated candidate
    Else
      Next candidate = old candidate
    End if
    Best solution is updated
    Best position is updated
  End for i
End while t
Return the best solution
End the loop

Thus, the best features obtained are selected by the proposed versions of SCA and fed to the machine learning classifiers.

4. Deep learning based classification

The deep learning techniques utilized here are the Attention based BLSTM networks, Conditional Variational Autoencoders (CVAE) and Restricted Boltzmann Machine based Deep Belief Networks (RBM-DBN). To the PCG signals, the processes of pre-emphasis, Fast Fourier Transform (FFT), Log Mel spectrogram and Discrete Wavelet Transform (DWT) are applied; then MFCC are obtained and fed to an A-BLSTM network.

4.1. Attention based BLSTM networks

In order to apprehend the features of context information, BLSTM networks are chosen (Jay & Manollas, 2020). Two subnetworks are included in the BLSTM which are utilized for the left and right sequence contexts. The two subnetworks used here are the forward and backward layers. The input features are considered and represented as X = {x1, x2, ..., xn}. The context information can be obtained by BLSTM, but due to the distance between entities and the uncertainty of utterance length, an attention layer is added after the BLSTM layer so that the information is captured effectively. Therefore, the spatio-temporal feature information along with the sequence information can be extracted easily with the help of BLSTM and attention. The attention mechanism focuses on the salient regions. The outputs of the networks are concatenated so that a new feature sequence is formed. To procure the higher-class probabilities, the utterance level features are utilized as a fully connected layer input. There are four important components in the attention based BLSTM. The input layer comprises the features fed to the model. A BLSTM layer extracts the high-level representations from the input features. The attention layer helps to generate a weight vector initially. In each time step, the weight vector is multiplied with the features of the frame level, so that a discourse level feature vector is formed. The generated utterance level features are projected as output in the output layer. The BLSTM with attention layer is expressed in the section below. Fig. 5 depicts the overall description of the proposed deep learning model.

4.1.1. BLSTM
To deal and dispense with capricious length sequences, RNN has quite a natural ability. The transition function is applied to the internal hidden state vector in a recursive manner. The back-propagation algorithm's gradient vectors tend to grow or decay exponentially when a sequence is quite overlong during the training process, and therefore, in order to address this problem, the LSTM network was utilized. Long-range dynamic dependencies can be simulated by LSTM as it has three gates, namely the input gate, forget gate and output gate. The sequential PCG data can be handled in only one direction by the standard LSTM, and so to overcome this problem, BLSTM has been proposed. The processing of the input is done in both the standard and reverse orders in BLSTM, and thereby at every time step the past and future information can be combined by this network.

The weights are processed separately by the 2 LSTM layers of a BLSTM component so that the hidden states h→ and the cell states c→ of an LSTM are produced, which handles the input for the forward direction, and to
process the input in reverse order, the hidden state h← and the cell state c← are used. To create the output series of the BLSTM layers, both h→ and h← are utilized at time step t. The mathematical function is expressed as follows:

y_t = W_{h→y} h→_t + W_{h←y} h←_t + b_y  (42)

where the bias at the output is indicated by b_y, W_{h→y} specifies the weight from the forward LSTM layer to the output layer, and W_{h←y} specifies the weight from the backward LSTM layer to the output layer.

To generate the output sequence of the BLSTM layer, the cell states of the 2 LSTM layers in the BLSTM layer are utilized as follows:

y_t = W_{c→y} c→_t + W_{c←y} c←_t + b_y  (43)

where the weight from the forward LSTM layer to the output layer is denoted as W_{c→y} and the weight from the backward LSTM layer to the output layer is indicated as W_{c←y}.

4.1.2. Attention layer
To enhance the BLSTM output, the attention mechanism can be implemented so that the memory cells can be updated (Mnih et al., 2014). It has shown good performance in learning the necessary feature representations. Each vector entry xi in the input sequence x is calculated so that the attention factors αi are obtained as follows:

αi = exp(f(xi)) / Σ_j exp(f(xj))  (44)

where the scoring function is represented by f(x). Here, f(x) = W^T x is considered as a linear scoring function, where the trainable parameters are expressed as W. The result of the attention layer is the weighted sum of the input sequences. The mathematical formula for calculating the attention is expressed as follows:

attentive_x = Σ_i αi xi  (45)

4.2. Conditional Variational Autoencoder (CVAE)

A simple extension of an Autoencoder (AE) and a Variational Autoencoder (VAE) is the neural network model called CVAE (Zhao et al., 2017). The AE comprises an encoder and a decoder. The features are extracted from the input data by means of the encoder, and the data is reconstructed from these features with the help of the decoder. The dimension of the data is mitigated by the encoder and this compressed feature is termed a latent vector. The corresponding space of this latent vector is termed the latent space. The mathematical expressions of the encoder and decoder are given as follows:

Encoder_AE: f_AE(pi) = zi  (46)

Decoder_AE: g_AE(zi) = qi  (47)

where the ith data is expressed by pi, the reconstructed data is expressed by qi and the latent vector is expressed by zi. The latent space in an AE is the Euclidean space, i.e., R^d. The training of the AE model is done so that the same data as the input is given as the output. Minimizing the reconstruction loss is the primary function of the training and is expressed as Loss_rec = ‖p − q‖², where the input vector is represented by p and the output vector is represented by q respectively. A probability distribution is utilized by the VAE in the latent space, and a normal distribution in the Euclidean space is utilized by the ordinal VAE. In our work, the ordinal VAE is indicated as O-VAE. With the help of the equation p = μ + O_d(0, I)σ, the implementation of the normal distribution is done, where O_d(0, I) represents a d-dimensional normal distribution and I indicates the variance-covariance matrix, which is an identity matrix. A mean vector is represented by μ ∈ R^d and a variance vector is represented by σ ∈ R^d. The training of the VAE model is done so that the same data as the input is given as the output, but at the same time there exists a difference in the objective function from that of the AE. The objective function L is represented as the sum of the reconstruction loss and the KL divergence L_KL(a‖b):

L = Loss_rec + L_KL(a‖b)  (48)

Between the two probability distributions a and b, the difference is measured by the KL divergence. The distribution of the latent vector is indicated by a and the standard normal distribution is indicated by b in this VAE model. To the input of the encoder and decoder, the addition of a label node is done and hence it becomes a CVAE model. During the training of the model, the same value is provided as input to both the nodes. The dimension of the input data is reduced with the help of the encoder by means of the label s. With the information of the label s, the reconstruction of the data is done by the decoder. The CVAE model is expressed as follows:

CV-Encoder_AE: f_CVAE(pi; s = si) = zi  (49)

CV-Decoder_AE: g_CVAE(zi; s = si) = qi  (50)

where si denotes the ith label. The objective functions of both the VAE and CVAE are identical to each other.

4.2.1. H-CVAE
The Hyperspherical VAE was originally proposed by Davidson et al., and in this work it is utilized for the CVAE and termed Hyperspherical CVAE (H-CVAE) (Davidson, 2018). The original VAE's network architecture has a profound influence on the H-CVAE's network architecture and it is somewhat identical in nature. A von Mises-Fisher (vMF) distribution is utilized by the H-CVAE instead of the normal distribution used by the O-VAE. For a vMF distribution, the probability density function is represented as:

f(z) = C exp(k μ^T z)  (51)

where z is expressed as z ∈ H = {z ∈ R^(d+1) | ‖z‖ = 1}. C represents a constant factor and is utilized to normalize the distribution. The concentration parameter is identified as k ⩾ 0 and the mean of the distribution is represented by μ ∈ H. If the value of μ is fixed, then z is concentrated at μ and the variance of the distribution is expressed by k. The d-dimensional hypersphere is specified by H. On the hypersphere, the vMF distribution is assumed to be a probability distribution. For the H-CVAE, the objective function is represented by (3). For the vMF distribution, the uniform distribution expressed as vMF(⋅, 0) is utilized as the prior and vMF(μ, k) is utilized as the posterior. The calculation of the KL divergence is then done accordingly. The KL divergence is independent of μ but is represented as a function of k, as there is a uniform distribution of the prior on H. The prior is usually N(0, 1) and the KL divergence is a function of μ in a normal distribution, as the location of the mean of a normal distribution is attained at the origin in a normal VAE. In order to avoid the KL collapse, the H-VAE is primarily used. When a decoder projects the output as identical data even though various latent vectors are given as input, the KL collapse situation happens. To prevent the KL collapse, the H-VAE is utilized in this work. The methodology to achieve the classification task with O-CVAE and H-CVAE is expressed as follows. The procedural steps are:

a) The PCG signal vectors and the respective feature coefficients are constructed {(si, Fi)}.
b) The data is split into test data and train data.
Table 1
Performance Analysis of NMF features with proposed feature selection algorithms and classifiers.

Method  | SVM                      | KNN                      | NBC                      | Adaboost                 | RF
        | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)
BS      | 76.45   75.56   76.005   | 76.45   76.66   76.555   | 74.45   72.44   73.445   | 73.22   72.11   72.665   | 74.22   72.02   73.12
ABS     | 78.47   76.92   77.695   | 77.12   78.98   78.05    | 75.09   77.56   76.325   | 77.23   75.23   76.23    | 77.48   78.33   77.905
BS-GP   | 90.19   91.21   90.7     | 86.03   84.81   85.42    | 82.81   80.89   81.85    | 82.55   83.78   83.165   | 84.98   83.68   84.33
ABS-GP  | 93.29   92.98   93.135   | 88.69   89.61   89.15    | 85.37   84.12   84.745   | 86.89   85.31   86.1     | 87.11   85.91   86.51

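Reading the tables: the Acc column is the arithmetic mean of the Sen and Spe columns (e.g., (76.45 + 75.56)/2 = 76.005 for BS with SVM), i.e., a balanced accuracy. A small sketch of how these metrics follow from a binary confusion matrix; the counts in the usage note are hypothetical:

```python
def sen_spe_acc(tp, fn, tn, fp):
    """Sensitivity, specificity and their mean, all in percent.

    Sen = TP/(TP+FN), Spe = TN/(TN+FP); the Acc column of
    Tables 1-5 equals (Sen + Spe) / 2.
    """
    sen = 100.0 * tp / (tp + fn)
    spe = 100.0 * tn / (tn + fp)
    return sen, spe, 0.5 * (sen + spe)
```

For example, sen_spe_acc(80, 20, 90, 10) returns (80.0, 90.0, 85.0).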
Table 2
Performance Analysis of Semi-supervised NMF features with proposed feature selection algorithms and classifiers.

Method  | SVM                      | KNN                      | NBC                      | Adaboost                 | RF
        | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)
BS      | 78.35   77.24   77.795   | 78.22   77.12   77.67    | 76.23   78.02   77.125   | 76.02   75.99   76.005   | 78.02   76.11   77.065
ABS     | 80.46   79.67   80.065   | 79.34   78.56   78.95    | 79.45   78.34   78.895   | 79.34   78.87   79.105   | 79.34   79.23   79.285
BS-GP   | 90.89   88.98   89.935   | 91.78   90.87   91.325   | 89.87   88.67   89.27    | 89.56   87.12   88.34    | 90.58   90.78   90.68
ABS-GP  | 95.21   95.38   95.295   | 93.99   92.02   93.005   | 90.81   89.09   89.95    | 90.89   90.37   90.63    | 91.71   90.21   90.96

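For the training points themselves, the Gaussian-kernel PCA of Section 3.1.2 (Eqs. (22)-(25)) reduces to an eigendecomposition of the double-centered kernel matrix. A minimal sketch, where the kernel width sigma is an assumed free parameter (the paper does not state its value):

```python
import numpy as np

def gaussian_kernel_pca(P, d=2, sigma=1.0):
    """Gaussian-kernel PCA sketch following Eqs. (22)-(25).

    P : (n, D) data matrix. Returns the (n, d) low-dimensional
    representation Q of the training points.
    """
    n = P.shape[0]
    sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))          # Eq. (22), Gaussian kernel
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J                                # double centering, Eq. (23)
    w, V = np.linalg.eigh(Kc)                     # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]                 # keep the d largest
    # For training points, the projection of Eq. (25) reduces to scaling
    # each leading eigenvector by the square root of its eigenvalue.
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

Because the kernel matrix is centered, each recovered coordinate is zero-mean, mirroring the mean subtraction of standard PCA noted in the text.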
Table 3
Performance Analysis of MVU technique and FCM with proposed ASC feature selection algorithms and classifiers.

Method  | SVM                      | KNN                      | NBC                      | Adaboost                 | RF
        | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)   | Sen(%)  Spe(%)  Acc(%)
SCA     | 76.22   76.21   76.215   | 75.11   76.22   75.665   | 76.02   74.02   75.02    | 77.22   75.02   76.12    | 77.01   75.09   76.05
ASC 1   | 86.45   85.34   85.895   | 85.23   86.34   85.785   | 85.33   84.21   84.77    | 83.35   82.22   82.785   | 83.22   84.11   83.665
ASC 2   | 90.68   89.68   90.18    | 89.47   89.78   89.625   | 89.57   88.23   88.9     | 90.76   91.35   91.055   | 88.47   89.26   88.865
ASC 3   | 92.09   92.92   92.505   | 93.87   94.91   94.39    | 90.89   90.56   90.725   | 90.11   90.89   90.5     | 90.89   90.78   90.835

c) By utilizing the train data, the O-CVAE and H-CVAE are trained.
d) The train and test data are reconstructed as follows:
   (a) {z_i} ← f_CVAE{{s_i} : f = {F_i}}
   (b) {q_i} ← g_CVAE{{z_i} : f = {F_i}}
e) The train and test data are evaluated as follows:
   (i) The feature coefficients are recomputed as F_i^r = F^r(q_i)
   (ii) The error is evaluated by L_F^{train/test} = (1/N) Σ_{i=1}^{N} (f_i^l − F_i^r)^2
f) The new set of latent vectors is generated:
   (i) A collection of the latent vectors {z_i} along with its respective set of labels {F_i} is chosen.
   (ii) The new features are generated by: {q_i} ← g_CVAE({z_i}; f^l = {F_i})
g) The training/test data is evaluated again.
   (i) The feature coefficients are recomputed as F_i^r = F^r(q_i)
   (ii) The error is evaluated by L_F^{gen} = (1/N) Σ_{i=1}^{N} (f_i^l − F_i^r)^2

The simplified illustration of the procedure is shown in Fig. 6. To the pre-processed PCG signals, multi-domain feature extraction was applied, and the preliminary features were then selected with a simple Genetic Algorithm (GA) procedure. In parallel, pre-emphasis, the Fast Fourier Transform (FFT), the Log Mel spectrogram and the Discrete Wavelet Transform (DWT) are applied to the PCG signals, and the MFCCs are then obtained. Both these feature sets were fused and given as input to the deep learning models to obtain the in-depth features, after which classification proceeds. A 10-fold cross-validation scheme with 80 percent training, 10 percent validation, and 10 percent testing has been utilized in our experiment. The training data is used to train the model, and the test data is used to assess the extent of the trained model's capacity to reconstruct unseen data. To train the models successfully, the training data is fed into the CVAE models. Once the model is trained with the help of the train/validation/test sets, the reconstructed features are projected as output. Once the test data is reconstructed, it shows that the unseen features can easily be traced by this model. The sum of both the reconstruction loss and the KL loss gives
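A minimal sketch of the data flow in steps c)–g) above. The encoder/decoder here are random linear maps standing in for the trained O-CVAE/H-CVAE networks, and recompute_F is a hypothetical stand-in for the feature-coefficient operator F^r; only the shape of the procedure is illustrated, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained CVAE encoder/decoder of steps c)-g);
# a real O-CVAE/H-CVAE would be trained networks, not random projections.
W_enc = rng.standard_normal((8, 4))
W_dec = rng.standard_normal((4, 8))

def f_cvae(s, f):          # encoder: signal features + label -> latent vectors z
    return s @ W_enc

def g_cvae(z, f):          # decoder: latent vectors z + label -> reconstructed features q
    return z @ W_dec

def recompute_F(q):        # stand-in for F^r: recompute feature coefficients from q
    return q.mean(axis=1)

s = rng.standard_normal((16, 8))      # feature vectors {s_i}
f_label = recompute_F(s)              # labels {F_i}

z = f_cvae(s, f_label)                # step d)(a): {z_i} <- f_CVAE
q = g_cvae(z, f_label)                # step d)(b): {q_i} <- g_CVAE
F_r = recompute_F(q)                  # step e)(i):  F_i^r = F^r(q_i)
L_F = np.mean((f_label - F_r) ** 2)   # step e)(ii): L_F = (1/N) sum (f_i^l - F_i^r)^2
print(f"reconstruction error L_F = {L_F:.4f}")
```

Steps f)–g) repeat the same decode-and-evaluate pass on latent vectors sampled for generation, producing L_F^{gen}.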

Table 4
Performance Analysis of K-PCA technique and FCM with proposed feature selection algorithms and classifiers.

        SVM                        KNN                        NBC                        Adaboost                   RF
        Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)
SCA     77.21   76.09   76.65      79.12   79.02   79.07      80.11   81.09   80.6       79.21   77.02   78.115     78.01   77.22   77.615
ASC 1   82.34   81.54   81.94      83.34   82.12   82.73      84.23   83.87   84.05      81.35   80.11   80.73      82.21   81.36   81.785
ASC 2   95.56   94.32   94.94      92.87   92.45   92.66      89.76   88.11   88.935     91.67   90.25   90.96      90.33   91.77   91.05
ASC 3   93.77   94.51   94.14      91.54   92.77   92.155     89.87   88.25   89.06      88.86   87.69   88.275     89.58   87.81   88.695

12
S.K. Prabhakar and D.-O. Won Expert Systems With Applications 221 (2023) 119720

Table 5
Performance Analysis of Laplacian Eigenmap technique and FCM with proposed feature selection algorithms and classifiers.

        SVM                        KNN                        NBC                        Adaboost                   RF
        Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)     Sen(%)  Spe(%)  Acc(%)
SCA     79.98   78.21   79.095     79.02   77.11   78.065     80.22   77.01   78.615     80.22   78.02   79.12      80.11   78.11   79.11
ASC 1   82.44   86.22   84.33      84.34   82.23   83.285     83.34   82.22   82.78      83.46   83.33   83.395     82.24   81.23   81.735
ASC 2   88.56   87.35   87.955     88.78   87.67   88.225     88.58   86.34   87.46      87.89   86.68   87.285     89.68   88.79   89.235
ASC 3   93.78   91.78   92.78      93.91   92.83   93.37      89.01   86.68   87.845     90.21   87.51   88.86      91.52   89.01   90.265

Table 6
Performance Analysis of Deep learning techniques.

                        Sensitivity (%)   Specificity (%)   Accuracy (%)
Attention based BLSTM   88.02             87.23             87.62
Ordinal VAE             89.34             88.15             88.745
CVAE                    94.12             93.96             94.04
H-CVAE                  94.98             93.8              94.39
RBM-DBN                 91.13             90.11             90.62

us the objective function. The reconstructed features may not project the same F that is needed by the label. For the reconstructed features, the numerical computations of F are done so that the error of F can be evaluated. Depending on the MSE, the error between the label and the recalculated F is expressed as:

L_F = (1/N) Σ_{i=1}^{N} (f_i^l − F_i^r)^2    (52)

where F_i^r indicates the recalculated F. For the training data, the testing data and the generated data, the error is defined as L_F^{train}, L_F^{test} and L_F^{gen} respectively. The generation of the new features is done by utilizing the decoder section. Initially, the latent vector z along with the desired feature f is chosen. In the O-VAE concept, the latent space is embedded on the hypersphere, so the latent vector norm should be equal to 1, i.e., ‖z‖ = 1. To the decoder, z and f are fed so that the features are projected as output by the decoder. For the generated features, L_F is computed, and the error between the label f^l and the actual L_F of the reconstructed feature is evaluated. By utilizing the MSE, the difference between the label and the recalculated data is computed. Using the following process, the variation of the generated features is evaluated. If the set of generated features is considered as {p_i | (i = 1, 2, ..., N)}, the features in this set that indicate a large error of L_F are eliminated, as they are not suited to be design candidates. The condition on the generated features is expressed as G = {p_i | (f_i^l − F_i^r)^2 ⩽ ε}, where the tolerance of the error is indicated by ε; in our experiment it is set as 0.05. The mean of the retained features, μ = (1/|G|) Σ_{p_i∈G} p_i, is computed, and the distance of each feature from the mean can then also be easily computed.

4.3. Restricted Boltzmann machine based deep belief networks

To obtain a vigorous and notable representation of the PCG signals, a deep belief network structure was utilized here. The DBN structure can easily be implemented by forming multiple layers of autoencoders or stacked Restricted Boltzmann Machines (RBMs).

4.3.1. RBM
Every RBM is composed of a visible layer, a hidden layer, and connection weights between the two layers (Salakhutdinov et al., 2007). It is usually trained greedily in an unsupervised mode. The neurons utilized in the RBM are stochastic binary units. The input data is received by the visible layer, which has many directed connections to the hidden layer neurons; at the same time, neurons within the same layer are not connected. The reconstruction of the input layer is the responsibility of the hidden layer, and it is done by repeatedly tuning the biases and connection weights. For PCG signal classification, every visible neuron indicates a spectral feature, which is assumed to follow a Gaussian distribution. For the two layers, a joint configuration is expressed in terms of an energy function as:

Performance Comparison of Classification Accuracy (%) for NMF and Semi-


supervised NMF with proposed Feature Selection techniques and Classifiers
120

100

80

60

40

20

0
SVM KNN NBC Adaboost RF SVM KNN NBC Adaboost RF
NMF features with Proposed FS techniques and Classifiers Semi-supervised NMF features with Proposed FS
BS ABS BS-GP ABS-GP techniques and Classifiers

Fig. 7. Performance Analysis of Classification Accuracy for NMF and Semi-supervised NMF with Proposed FS selection techniques and Classifiers.


Fig. 8. Performance Analysis of Classification Accuracy for Dimensionality Reduction techniques and FCM with Proposed FS selection techniques and Classifiers.


E(y, x) = − Σ_{i=1}^{m} b_i y_i − Σ_{j=1}^{n} c_j x_j − Σ_{i=1}^{m} Σ_{j=1}^{n} y_i x_j w_{ij}    (53)

where the binary state of the visible neuron is expressed by y_i and the binary state of the hidden neuron is expressed by x_j. The corresponding biases of the neurons are expressed as b_i and c_j respectively, and the connecting weight between them is expressed by w_{ij}. For a pair of hidden and visible layers, a joint probability is determined depending on the Boltzmann distribution and the energy function, and is expressed as follows:

p(y, x) = (1/P_F) e^{−E(y, x)}    (54)

where P_F = Σ_{y,x} e^{−E(y, x)} indicates the partition function.
As there are no connections between the hidden neurons, they are conditionally independent. If a visible vector y is considered, the conditional probability of neuron x_j being active is expressed as follows:

p(x_j = 1 | y) = σ(c_j + Σ_i y_i w_{ij})    (55)

Now if a hidden vector x is considered, the conditional probability of the visible neuron y_i is assessed as:

p(y_i = 1 | x) = σ(b_i + Σ_j x_j w_{ij})    (56)

where σ(⋅) denotes the logistic function. For the training dataset T = {t^1, t^2, ..., t^{n_t}}, the number of training samples is represented by n_t. To fit the training samples, the parameters of the RBM, namely the biases b and c along with the connection weights w, are trained by maximizing the log-likelihood function:

L_T = Σ_{i=1}^{n_t} log p(y, x)    (57)

With respect to the weights w, the derivative of the log-likelihood is formulated with the help of gradient ascent and contrastive divergence techniques, and is expressed as:

∂ log p(y, x)/∂w_{ij} = E_data[−∂E(y, x)/∂w_{ij}] − E_model[−∂E(y, x)/∂w_{ij}]    (58)

where E_data indicates the expectation under the distribution of the training dataset and E_model specifies the expectation under the distribution of the trained model. Moreover, since −∂E(y, x)/∂w_{ij} = y_i x_j, the gradient can be rewritten as:

∂ log p(y, x)/∂w_{ij} = E_data[y_i x_j] − E_model[y_i x_j]    (59)

Table 7
Performance Comparison of the proposed results with previous works on the International Competition Physionet/CinC challenge 2016 challenge dataset.

Author                   Techniques Utilized                                                Classification Accuracy (%)
Baydoun et al. (2020)    Ensemble learning with Bagging and Boosting                        86.60
Abduh et al. (2020)      Fractional Fourier Transform and MFCC with KNN, SVM and            94.69
                         ensemble classifiers
Deng et al. (2020)       MFCC with Paralleling Recurrent Neural Networks                    98.34
Bozkurt et al. (2018)    MFCC with CNN                                                      86.02
Li et al. (2019)         Wavelet Norm and Fractal Dimension with Twin SVM                   90.40
Potes et al. (2016)      Adaboost with CNN ensemble                                         85
Wu et al. (2019)         Savitzky-Golay filter with ensemble CNN                            89.81
Zhang et al. (2019)      STFT with LSTM                                                     94.84
Tang et al. (2018)       Multidomain features with SVM classifier                           88
Soares et al. (2020)     CNN with ANN, KNN, SVM and DT                                      93.00
Proposed Results         NMF + ABS-GP + SVM                                                 93.13
                         Semi-supervised NMF + ABS-GP + SVM                                 95.29
                         MVU with FCM + ASC 3 + KNN                                         94.39
                         K-PCA with FCM + ASC 2 + SVM                                       94.94
                         LE with FCM + ASC 3 + KNN                                          93.37
                         H-CVAE                                                             94.39
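The conditionals of Eqs. (55)–(56) and the gradient estimate of Eq. (59) can be sketched as a single contrastive-divergence (CD-1) update step in NumPy. The layer sizes, learning rate and synthetic binary batch below are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                         # visible (y) and hidden (x) layer sizes (illustrative)
W = 0.01 * rng.standard_normal((m, n))
b, c = np.zeros(m), np.zeros(n)     # visible / hidden biases
eta = 0.1                           # learning rate (illustrative)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def cd1_delta_w(y0):
    """One CD-1 estimate of the weight update for a batch of visible vectors."""
    # Eq. (55): p(x_j = 1 | y) = sigma(c_j + sum_i y_i w_ij)
    px0 = sigmoid(c + y0 @ W)
    x0 = (rng.random(px0.shape) < px0).astype(float)   # Gibbs sample of hidden units
    # Eq. (56): p(y_i = 1 | x) = sigma(b_i + sum_j x_j w_ij)  -> "model" reconstruction
    py1 = sigmoid(b + x0 @ W.T)
    px1 = sigmoid(c + py1 @ W)
    # Eq. (59): gradient ~ E_data[y_i x_j] - E_model[y_i x_j], averaged over the batch
    grad = (y0.T @ px0 - py1.T @ px1) / len(y0)
    return eta * grad                                  # weight update, as in the learning rule

y_batch = (rng.random((32, m)) < 0.5).astype(float)    # synthetic binary visible data
W += cd1_delta_w(y_batch)
print("updated weight matrix shape:", W.shape)
```

The data-dependent term uses the training batch directly, while the model-dependent term is approximated from the one-step Gibbs reconstruction, which is exactly the role the contrastive divergence and Gibbs sampling techniques play in the text.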


For the approximate estimation of the expectation E_data[y_i x_j], the contrastive divergence technique is used. To compute the expectation E_model[y_i x_j], the Gibbs sampling technique is adopted. For the connection weights, the learning rule is assessed as follows:

Δw_{ij} = η(E_data[y_i x_j] − E_model[y_i x_j])    (60)

For the biases, the updating rules are as follows:

Δb_i = ε(E_data[y_i] − E_model[y_i])    (61)

and

Δc_j = ε(E_data[x_j] − E_model[x_j])    (62)

where the learning rates are indicated by η and ε respectively. For the reconstruction of the input data, the RBM is trained in an unsupervised manner depending on these parameter updating rules.

4.3.2. Deep Belief Network
To construct a DBN, three layers of RBM were utilized with a softmax regression layer. The raw input data is fed to the bottom layer of RBM. The output of the hidden layer of the lower RBM is fed to the visible layer of the higher RBM. Compared with logistic regression, the softmax regression concept can statistically estimate the likelihood of the class to which a particular sample belongs. The two important stages used primarily in a DBN are the pre-training phase and the fine-tuning phase (Huang et al., 2016). The pre-training phase is regulated in every layer of the RBM so that the initial parameters of the DBN are obtained. To obtain the prediction error, softmax regression is added so that in the fine-tuning stage the parameters can be optimized by the backpropagation algorithm. Additionally, in order to avoid overfitting, some constraint terms, namely a sparsity constraint and weight decay, were incorporated into the cost function of the softmax regression.

5. Results and discussion

The dataset utilized in this work is provided by the International Competition Physionet/CinC challenge 2016, which can be downloaded from the website (http://www.physionet.org/challenge/2016). The PCG recordings of both healthy and pathological patients are included in this database, and they were acquired in either a clinical or a non-clinical environment. From 764 patients, a total of 3153 heart sound recordings were collected in "*.wav" format, lasting from 5 s to 120 s. The recordings were divided into two classes, normal sounds and abnormal sounds, with the aid of cardiac diagnosis specialists. The abnormal recordings are represented by 665 recordings and the normal recordings by 2488 recordings. For every recording, a skilled cardiologist with rich experience has evaluated the quality of the PCG signal. The in-depth details of the dataset are provided in Liu et al. (2016). The performance metrics utilized in our experiment were Sensitivity, Specificity and Accuracy, represented by the following formulae:

Sensitivity = TP / (TP + FN)    (63)

Specificity = TN / (TN + FP)    (64)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (65)

where TP indicates True Positive, TN indicates True Negative, FP specifies False Positive and FN specifies False Negative respectively. A 10-fold cross-validation scheme with 80 percent training, 10 percent validation, and 10 percent testing has been utilized in our experiment for classification to obtain the results for the proposed works.

As far as the proposed hybrid BS-GP algorithm and the hybrid ABS-GP algorithm are concerned, the parameter settings are as follows. The comparison algorithms are repeated 30 times; the number of function evaluations is set to 1000 and the population size is set to 100. The number of clusters assigned here is four and the value of p_cluster is set to 0.5 in the BS and ABS sections of the proposed hybrid BS-GP and hybrid ABS-GP respectively. For the GP part, the function set and the terminator set were selected in the range of [−10, 10]. As far as the proposed ASC algorithm is concerned, the parameter settings were as follows. The population size was again set at 100. The number of iterations was set to 200 and the number of runs was set to 20 in our experiment. The search domain had a value in the range of [0, 1] and the α parameter in the fitness function is defined as 0.98. As far as the A-BLSTM architecture is concerned, the number of hidden layers in the BLSTM is set as 30, and a dropout layer is implemented to prevent over-fitting. A maximum of 100 epochs is utilized and the batch size is set to 32 with dropout regularization. The loss function utilized is the cross-entropy, the optimizer used is Adam and the initial learning rate assigned is 0.001. For the O-VAE, C-VAE and H-CVAE methods, the hyperparameters are as follows. A learning rate of 0.005 is set and the batch size was set to 50. The optimization algorithm utilized is Adam, the dropout ratio is set as 0.4 and the number of latent vectors assigned here is 60. For the RBM-DBN, the value of the sparsity constraint was set as 0.4 and the weight decay was set to 0.08. For the connection weights, the learning rate is set as 0.45. The determination of these parameters was done on a trial-and-error basis by conducting the experiment several times, and only the values which gave the best results were finally selected and are expressed here.

Table 1 shows the Performance Analysis of NMF features with the proposed feature selection algorithms and classifiers, and it is evident that a high classification accuracy of 93.13% is obtained for the proposed ABS-GP method with SVM classification. Table 2 shows the Performance Analysis of Semi-supervised NMF features with the proposed feature selection algorithms and classifiers, and it is observed that a high classification accuracy of 95.29% is obtained for the proposed ABS-GP method with SVM classification. Table 3 shows the Performance Analysis of the MVU technique and FCM with the proposed feature selection algorithms and classifiers, and the results show that a high classification accuracy of 94.39% is obtained when the ASC 3 method is utilized with KNN classification. Table 4 shows the Performance Analysis of the K-PCA technique and FCM with the proposed feature selection algorithms and classifiers, and it is evident that a high classification accuracy of 94.94% is obtained for the ASC 2 method with SVM classification. Table 5 shows the Performance Analysis of the Laplacian Eigenmap technique and FCM with the proposed feature selection algorithms and classifiers, and the results prove that a high classification accuracy of 93.37% is obtained for the ASC 3 method with KNN classifiers. Table 6 shows the comparative Performance Analysis of the deep learning techniques, where a high classification accuracy of 94.39% is obtained for the H-CVAE method. Fig. 7 exhibits the Performance Analysis of Classification Accuracy for NMF and Semi-supervised NMF with the proposed FS techniques and classifiers. Fig. 8 exhibits the Performance Analysis of Classification Accuracy for the dimensionality reduction techniques and FCM with the proposed FS techniques and classifiers. On the analysis of Fig. 7, it is evident that a high classification accuracy is obtained for the proposed ABS-GP technique with the SVM classifier, and on the analysis of Fig. 8, it is once again evident that the proposed ASC-2 and ASC-3 techniques perform well across most of the classifiers.

5.1. Comparison with previous works

The comparison of our proposed results with the previous works computed on the same International Competition Physionet/CinC challenge 2016 challenge dataset is tabulated in Table 7.
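The performance metrics of Eqs. (63)–(65) map directly onto confusion-matrix counts. A small helper is sketched below; the counts themselves are invented for illustration (chosen only so that TP + FN = 665 abnormal and TN + FP = 2488 normal recordings, matching the dataset sizes, not any result from the paper):

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)                    # Eq. (63)

def specificity(tn, fp):
    return tn / (tn + fp)                    # Eq. (64)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)   # Eq. (65)

# Illustrative counts only: 665 abnormal (TP + FN) and 2488 normal (TN + FP).
tp, tn, fp, fn = 600, 2300, 188, 65
print(f"Sen = {100 * sensitivity(tp, fn):.2f}%")
print(f"Spe = {100 * specificity(tn, fp):.2f}%")
print(f"Acc = {100 * accuracy(tp, tn, fp, fn):.2f}%")
```

In a 10-fold scheme these counts would be accumulated over the test folds before the three ratios are computed.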


On the analysis of Table 7, it is evident that the proposed works have performed well: NMF + ABS-GP + SVM performs well with a classification accuracy of 93.13%. When Semi-supervised NMF + ABS-GP + SVM is implemented, a classification accuracy of 95.29% is obtained, and a high classification accuracy of 94.39% is obtained for the MVU with FCM + ASC 3 + KNN combination. A high classification accuracy of 94.94% is obtained for the K-PCA with FCM + ASC 2 + SVM method and a high classification accuracy of 93.37% is obtained for the LE with FCM + ASC 3 + KNN technique. Finally, the H-CVAE deep learning method produces a high classification accuracy of 94.39% when compared to the other proposed deep learning methods. The NMF and semi-supervised NMF with BS/ABS and machine learning classifiers produced a computational complexity of O(n³ log n), while the NMF and semi-supervised NMF with the proposed BS-GP/ABS-GP and machine learning classifiers produced a computational complexity of O(n³ log² n). The dimensionality reduction techniques with FCM, SCA and machine learning classifiers produced a computational complexity of O(n³ log² n). The dimensionality reduction techniques with FCM, the ASC algorithm and machine learning classifiers produced a computational complexity of O(n⁴ log² n). The deep learning techniques implemented in the paper produced a computational complexity of O(n⁶ log n).

5.2. Advantages and limitations of the study

The study follows a streamlined method to implement the experiments. Generally, NMF is used to impute missing data in statistics, as it can accommodate the missing data while the cost function is minimized. Optimization techniques help to lower the risk of errors and to improve the model accuracy. When NMF and semi-supervised NMF are incorporated with the optimization algorithms and their modified versions, and then classified with ML classifiers, good results are obtained initially. Secondly, with the help of dimensionality reduction techniques, there are only a few features to process, so less storage space is required along with less computational time. FCM provides the flexibility for data points to belong to more than one cluster. So, the combination of dimensionality reduction and FCM, aided by the optimization algorithms and their modified versions and then classified with ML classifiers, again produces good results. The study also analyzes the usage of deep learning techniques, as they have the ability to execute feature engineering by themselves. As far as the limitations of the study are concerned, the classification accuracy could still be improved by selecting a better strategy to implement the experiment. For instance, other metaheuristic algorithms or modifications of them could replace the algorithms used in the paper so that a better classification accuracy could be obtained. More advanced deep learning models like capsule neural networks, Siamese neural networks, transformers etc. could replace the deep learning models used in the paper so that a better classification accuracy could be obtained in future.

6. Conclusion and future works

One of the most common causes of mortality every year is CVD. CVD can be well controlled before it reaches its final stages by means of an early diagnosis. For detecting heart abnormalities, the auscultation technique acts as a vital indicator. Auscultation is highly useful for the evaluation of heart murmurs and for clinical diagnosis, as it saves a lot of time. In this paper, two robust conglomerate techniques are utilized for the efficient classification of heart valve disease from the PCG signals. The best results show that when semi-supervised NMF is utilized with ABS-GP and SVM, the highest classification accuracy is reported in this work. The second-best classification accuracy is obtained when K-PCA with FCM is utilized alongside ASC technique 2 and SVM. The deep learning model H-CVAE gives the third-best classification accuracy. The proposed methods are quite simple to implement, and detailed mathematical modelling is also explained for every procedure in this work. Future works aim to incorporate advanced feature selection techniques by hybridizing more metaheuristic models, and classification through advanced deep learning models like Graphical Neural Networks, Capsule Neural Networks, Transformer models, Siamese Neural Networks and Temporal Graph Convolution Networks. Future plans aim to develop this same model for other biosignal processing datasets such as Electroencephalography, Electromyography, Photoplethysmography, Electrocardiography etc. Future plans also aim to incorporate more efficient metaheuristic algorithms and advanced innovative deep learning models so that a better classification accuracy can be obtained. The work implemented can also be extended to telemedicine applications in future so that it could serve humanity in a better manner.

CRediT authorship contribution statement

Sunil Kumar Prabhakar: Conceptualization, Methodology, Software, Data curation, Validation, Formal analysis, Investigation, Visualization. Dong-Ok Won: Investigation, Visualization, Supervision, Writing – review & editing, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1A5A8019303) and partly supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02068, Artificial Intelligence Innovation Hub).

References:

Abduh, Z., Nehary, E. A., Abdel Wahed, M., & Kadah, Y. M. (2020). Classification of heart sounds using fractional Fourier transform based mel-frequency spectral coefficients and traditional classifiers. Biomedical Signal Processing and Control, 57, Article 101788. https://doi.org/10.1016/j.bspc.2019.101788
Akay, Y., Akay, M., Welkowitz, W., & Kostis, J. (1994). Noninvasive detection of coronary artery disease. IEEE Engineering in Medicine and Biology Magazine, 13(5), 761–764.
Ari, S., Hembram, K., & Saha, G. (2010). Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier. Expert Systems with Applications, 37(12), 8019–8026.
Aziz, S., Khan, M. U., Alhaisoni, M., Akram, T., & Altaf, M. (2020). Phonocardiogram signal processing for automatic diagnosis of congenital heart disorders through fusion of temporal and cepstral features. Sensors, 20(13), 3790. https://doi.org/10.3390/s20133790
Baydoun, M., Safatly, L., Ghaziri, H., & el Hajj, A. (2020). Analysis of heart sound anomalies using ensemble learning. Biomedical Signal Processing and Control, 62, Article 102019. https://doi.org/10.1016/j.bspc.2020.102019
Bozkurt, B., Germanakis, I., & Stylianou, Y. (2018). A study of time-frequency features for CNN based automatic heart sound classification for pathology detection. Computers in Biology and Medicine, 100, 132–134.
Cai, D., He, X., Han, J., & Huang, T. S. (2011). Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1548–1560.
Chen, P., & Zhang, Q. (2020). Classification of heart sounds using discrete time-frequency energy feature based on S transform and the wavelet threshold denoising. Biomedical Signal Processing and Control, 57, Article 101684. https://doi.org/10.1016/j.bspc.2019.101684
Cheng, L. T., Wang, S. G., & Wei, X. (2010). New fuzzy c-means clustering model based on the data weighted approach. Data & Knowledge Engineering, 69, 881–900.
Davidson, T. R. (2018). Hyperspherical variational auto-encoders. arXiv. https://arxiv.org/abs/1804.00891
Deng, M., Meng, T., Cao, J., Wang, S., Zhang, J., & Fan, H. (2020). Heart sound classification based on improved MFCC features and convolutional recurrent neural networks. Neural Networks, 130, 22–32. https://doi.org/10.1016/j.neunet.2020.06.015
Dokur, Z., & Ölmez, T. (2008). Heart sound classification using wavelet transform and incremental self-organizing map. Digital Signal Processing, 18(6), 951–959.
el Badlaoui, O., Benba, A., & Hammouch, A. (2020). Novel PCG analysis method for discriminating between abnormal and normal heart sounds. IRBM, 41(4), 223–228. https://doi.org/10.1016/j.irbm.2019.12.003
Gharehbaghi, A., Lindén, M., & Babic, A. (2019). An artificial intelligent-based model for detecting systolic pathological patterns of phonocardiogram based on time-growing neural network. Applied Soft Computing, 83, Article 105615. https://doi.org/10.1016/j.asoc.2019.105615
Ghosh, S. K., Ponnalagu, R., Tripathy, R., & Acharya, U. R. (2020). Automated detection of heart valve diseases using chirplet transform and multiclass composite classifier with PCG signals. Computers in Biology and Medicine, 118, Article 103632. https://doi.org/10.1016/j.compbiomed.2020.103632
http://www.physionet.org/challenge/2016/, Nov. 16, 2016.
Huang, Y., Wu, A., Zhang, G., & Li, Y. (2016). Speech emotion recognition based on deep belief networks and wavelet packet cepstral coefficients. International Journal of Simulation: Systems, Science and Technology, 17(28), 28.1–28.5.
Jay, S., & Manollas, M. (2020). Effective deep CNN-BiLSTM model for network intrusion detection. In Proceedings of the Third International Conference on Artificial Intelligence and Pattern Recognition, Xiamen, Fujian, China, p. 9.
Keijzer, M. (2004). Scaled symbolic regression. Genetic Programming and Evolvable Machines, 5(3), 259–269.
Kotsia, I., Zafeiriou, S., & Pitas, I. (2007). Novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems. IEEE Transactions on Information Forensics and Security, 2(3), 588–595.
Li, H., Wang, X., Liu, C., Zeng, Q., Zheng, Y., Chu, X., et al. (2020). A fusion framework based on multi-domain features and deep learning features of phonocardiogram for coronary artery disease detection. Computers in Biology and Medicine, 120, Article 103733. https://doi.org/10.1016/j.compbiomed.2020.103733
Li, J., Ke, L., & Du, Q. (2019). Classification of heart sounds based on the wavelet fractal and twin support vector machine. Entropy, 21(5), 472. https://doi.org/10.3390/e21050472
Lin, A., Wu, Q., Heidari, A. A., et al. (2019). Predicting intentions of students for master programs using a chaos-induced sine cosine-based fuzzy K-nearest neighbor classifier. IEEE Access, 7, 67235–67248.
Liu, C., Li, K., Zhao, L., et al. (2013). Analysis of heart rate variability using fuzzy measure entropy. Computers in Biology and Medicine, 43(2), 100–108.
Liu, C., Springer, D., Li, Q., et al. (2016). An open access database for the evaluation of heart sound algorithms. Physiological Measurement, 37(12), 2181–2213.
Long, W., Wu, T., Liang, X., & Xu, S. (2018). Solving high-dimensional global optimization problems using an improved sine cosine algorithm. Expert Systems with Applications, 123, 108–126.
Luo, L., Zhang, C., Qin, Y., & Zhang, C. (2013). Maximum variance hashing via column generation. Mathematical Problems in Engineering, 1–10. https://doi.org/10.1155/2013/379718
Maglogiannis, I., Loukis, E., Zafiropoulos, E., & Stasis, A. (2009). Support Vectors Machine-based identification of heart valve diseases using heart sounds. Computer Methods and Programs in Biomedicine, 95(1), 47–61.
Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems, 2204–2212.
Moghaddasi, Z., Jalab, H. A., Md Noor, R., & Aghabozorgi, S. (2014). Improving RLRN image splicing detection with the use of PCA and Kernel PCA. The Scientific World Journal, 1–10. https://doi.org/10.1155/2014/606570
Mubarak, Q. U. A., Akram, M. U., Shaukat, A., Hussain, F., Khawaja, S. G., & Butt, W. H. (2018). Analysis of PCG signals using quality assessment and homomorphic filters
Rujoie, A., Fallah, A., Rashidi, S., Rafiei Khoshnood, E., & Seifi Ala, T. (2020). Classification and evaluation of the severity of tricuspid regurgitation using phonocardiogram. Biomedical Signal Processing and Control, 57, Article 101688. https://doi.org/10.1016/j.bspc.2019.101688
Safara, F., & Ramaiah, A. R. A. (2020). RenyiBS: Renyi entropy basis selection from wavelet packet decomposition tree for phonocardiogram classification. The Journal of Supercomputing, 77(4), 3710–3726. https://doi.org/10.1007/s11227-020-03413-9
Salakhutdinov, R., Mnih, A., & Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning (ICML '07), Corvallis, Oregon, vol. 227, pp. 791–798.
Sawant, N. K., Patidar, S., Nesaragi, N., & Acharya, U. R. (2021). Automated detection of abnormal heart sound signals using Fano-factor constrained tunable quality wavelet transform. Biocybernetics and Biomedical Engineering, 41(1), 111–126. https://doi.org/10.1016/j.bbe.2020.12.007
Schmidt, S. E., Holst-Hansen, C., Graff, C., Toft, E., & Struijk, J. J. (2010). Segmentation of heart sound recordings by a duration-dependent hidden Markov model. Physiological Measurement, 31(4), 513–529.
Shi, Y. (2011). An optimization algorithm based on brainstorming process. International Journal of Swarm Intelligence Research, 2(4), 35–62.
Shi, Y., Xue, J., & Wu, Y. (2013). Multi-objective optimization based on brain storm optimization algorithm. International Journal of Swarm Intelligence Research, 4(3), 1–21.
Singh, S. A., Meitei, T. G., & Majumder, S. (2020). Short PCG classification based on deep learning. In Deep Learning Techniques for Biomedical and Health Informatics (pp. 141–164).
Soares, E., Angelov, P., & Gu, X. (2020). Autonomous Learning Multiple-Model zero-order classifier for heart sound classification. Applied Soft Computing, 94, Article 106449. https://doi.org/10.1016/j.asoc.2020.106449
Springer, D. B., Tarassenko, L., & Clifford, G. D. (2016). Logistic regression-HSMM-based heart sound segmentation. IEEE Transactions on Biomedical Engineering, 63(4), 822–832.
Tang, H., Chen, H., Li, T., & Zhong, M. (2016). Classification of normal/abnormal heart sound recordings based on multi-domain features and back propagation neural network. In Proceedings of the 43rd Computing in Cardiology Conference, Vancouver, Canada, vol. 43.
Tang, H., Dai, Z., Jiang, Y., Li, T., & Liu, C. (2018). PCG classification using multidomain features and SVM classifier. BioMed Research International, 1–14. https://doi.org/10.1155/2018/4205027
Tuncer, T., Dogan, S., Tan, R. S., & Acharya, U. R. (2021). Application of Petersen graph pattern technique for automated detection of heart valve diseases with PCG signals. Information Sciences, 565, 91–104. https://doi.org/10.1016/j.ins.2021.01.088
Wang, G., Luo, J., He, Y., & Chen, Q. (2015). Fault diagnosis of supervision and homogenization distance based on local linear embedding algorithm. Mathematical Problems in Engineering, 1–8. https://doi.org/10.1155/2015/981598
Wu, H., Kim, S., & Bae, K. (2010). Hidden Markov Model with heart sound signals for identification of heart diseases. In Proceedings of the 20th International Congress on Acoustics, ICA, Sydney, Australia.
Wu, J. M. T., Tsai, M. H., Huang, Y. Z., Islam, S. H., Hassan, M. M., Alelaiwi, A., et al. (2019). Applying an ensemble convolutional neural network with Savitzky-Golay filter to construct a phonocardiogram prediction model. Applied Soft Computing, 78, 29–40. https://doi.org/10.1016/j.asoc.2019.01.019
Yang, L., Li, S., Zhang, Z., & Yang, X. (2020). Classification of phonocardiogram signals based on envelope optimization model and support vector machine. Journal of Mechanics in Medicine and Biology, 20(1), 1950062. https://doi.org/10.1142/s0219519419500623
Yaseen, Son, G. Y., & Kwon, S. (2018). Classification of heart sound signal using multiple features. Applied Sciences, 8, 2344.
Zeinali, Y., & Niaki, S. T. A. (2022). Heart sound classification using signal processing and machine learning algorithms. Machine Learning with Applications, 7, Article 100206. https://doi.org/10.1016/j.mlwa.2021.100206
Zhang, W., Han, J., & Deng, S. (2019). Abnormal heart sound detection using temporal
for localization and classification of heart sounds. Computer Methods and Programs in quasi-periodic features and long short-term memory without segmentation.
Biomedicine, 164, 143–157. https://doi.org/10.1016/j.cmpb.2018.07.006 Biomedical Signal Processing and Control, 53, Article 101560. https://doi.org/
Ölmez, T., & Dokur, Z. (2003). Classification of heart sounds using an artificial neural 10.1016/j.bspc.2019.101560
network. Pattern Recognition Letters, 24(1–3), 617–629. Zhao, L., Wei, S., Zhang, C., et al. (2015). Determination of sample entropy and fuzzy
Pathak, A., Samanta, P., Mandana, K., & Saha, G. (2020). Detection of coronary artery measure entropy parameters for distinguishing congestive heart failure from normal
atherosclerotic disease using novel features from synchrosqueezing transform of sinus rhythm subjects. Entropy, 17(9), 6270–6288.
phonocardiogram. Biomedical Signal Processing and Control, 62, Article 102055. Zhao, T., Zhao, R., & Eskenazi, M. (2017). Learning discourse-level diversity for neural
https://doi.org/10.1016/j.bspc.2020.102055 dialog models using conditional variational autoencoders. In Proceedings of the 55th
Potes, C., Parvaneh, S., Rahman, A., & Conray, B. (2016). Ensemble of feature-based and Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
deep-learning based classifiers for detection of abnormal heart sounds, Computing in Papers), Vancouver, Canada, pp. 654-664.
Cardiology Conference, pp. 621-524.
