
Biomedical Engineering Advances 4 (2022) 100048

Contents lists available at ScienceDirect

Biomedical Engineering Advances


journal homepage: www.journals.elsevier.com/biomedical-engineering-advances

Multi-classification neural network model for detection of abnormal heartbeat audio signals
Hassaan Malik a, *, Umair Bashir b, Adnan Ahmad b
a Department of Computer Science, University of Management and Technology, Lahore 54000, Pakistan
b Department of Computer Science, National College of Business Administration and Economics Lahore, Sub Campus, Multan 60000, Pakistan

ARTICLE INFO

Keywords: Abnormal heart sound; Heartbeat signals; RNN; LSTM; Neural network

ABSTRACT

Nowadays, heart disease is the leading cause of death. The high mortality rate and escalating occurrence of heart diseases worldwide call for fast and efficient diagnosis of such ailments. The purpose of this study is to design an automated system for the classification of abnormal heartbeat audio signals to assist cardiologists. To the best of our knowledge, this is the first study that uses a single neural network model for the classification of eight different types of heartbeat audio signals. The proposed recurrent neural network (RNN) model using long short-term memory (LSTM) is developed on two publicly available databases, the PASCAL challenge and the 2017 PhysioNet challenge. Mel frequency cepstrum coefficient (MFCC) is applied to extract the dominant features, and a bandpass filter is used to remove the noise from both datasets. Afterward, a downsampling technique is used to fix the sampling rate of each sound signal to 20 kHz and 300 Hz for the Pascal and PhysioNet databases, respectively. The proposed model is compared with a multi-layer perceptron (MLP) in terms of different performance evaluation metrics. Furthermore, the outcomes of five machine learning (ML) models are also analyzed. The proposed model achieved the highest classification accuracy of 0.9971 on the Pascal database and 0.9870 on the PhysioNet challenge dataset, which is consistently superior to its competitor approaches. The proposed model provides significant assistance to the cardiac consultant in detecting heart valve diseases.

1. Introduction

In the past, diagnosticians diagnosed cardiovascular diseases (CVDs) by simply listening to a patient's heartbeat directly from the chest; this method, however, was found to be very unscientific, inefficient, and unethical. Therefore, the stethoscope was invented by Laennec in 1816, and "mediate auscultation" is now used by diagnosticians in the diagnosis of CVDs [1]. In the medical field, this instrument was rapidly adopted for the diagnosis of heart disease. Despite its accurate measurement, the stethoscope requires many years of experience to master [2]. The heart sound of a normal patient consists of two sounds, S1 (lub) and S2 (dub). The S1 segment relates to the closing of the valves at systole, while the S2 segment is associated with the opening of the valves at diastole [3]. By listening to heartbeat sounds with a stethoscope, health experts can detect heart diseases. Heart rate, also called pulse, is the number of times a person's heart beats per minute. The normal heart rate varies from person to person. According to the Mayo Clinic [3], the standard heart rate range for adults is 60–100 beats per minute. The sound pattern produced by the heart, "lub dub, dub lub", is the clear and healthy sequence of the normal heartbeat, where the time from dub to lub is longer than the time from lub to dub. In murmur heartbeats, a noisy sound occurs between lub and dub or between dub and lub, which is a sign of a cluster of heart ailments. Furthermore, the sequence of extra-systole heart sounds contains the patterns "lub-lub dub, lub dub-dub" and is generally present in children [4]. CVDs are one of the major threats to human life. The term covers a cluster of different heart diseases and is commonly used to describe the range of disorders that affect the heart. Narrowing or thinning of the major blood vessels is the main factor in heart problems [5]. According to World Health Organization (WHO) research, an estimated 17.9 million deaths are due to CVDs [6]. This rapid growth of CVDs motivates the automatic detection of disease from the heartbeat, which supports effective and efficient treatment procedures.

The visual monitoring of heartbeat signals is time-consuming and prone to inaccurate detection. AI approaches can be applied to overcome these

* Corresponding author.
E-mail address: f2019288004@umt.edu.pk (H. Malik).

https://doi.org/10.1016/j.bea.2022.100048
Received 27 April 2022; Received in revised form 14 July 2022; Accepted 26 July 2022
Available online 27 July 2022
2667-0992/© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
H. Malik et al. Biomedical Engineering Advances 4 (2022) 100048

limitations. ML is a sub-branch of AI that is mainly used for feature selection, statistical analysis, and classification [7]. ML models are commonly applied to different heartbeat audio signal databases with several feature extraction approaches [8–12]. However, these ML methods are subjective and time-consuming because of the handcrafted feature selection process. To overcome these limitations, neural network models are used for the detection of abnormal heartbeat audio signals. These models are capable of extracting the features automatically and performing the classification task. The databases used in the present study were collected from the Pascal challenge [13] and the PhysioNet 2017 challenge [14], with the aim of achieving the best classification accuracy of heartbeat signals with the lowest error rate. To the best of our knowledge, this is the first study that uses a single neural model for the classification of eight (8) different types of heartbeat audio signals. These signals, obtained from both databases, are normal, murmur, extrasystole, artifacts, noisy, atrial fibrillation (AFib), fusion beat, and other beats. A novel RNN using the LSTM model is constructed to categorize abnormal heartbeat audio into the respective groups. We also compared the outcomes of the proposed model with MLP and five different ML models, i.e., Random Forest (RF), K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT). For the proposed model, the frame rate (FR) features are extracted from the heartbeat signal, and noise is removed by using bandpass filters.

Below are the contributions of the underlying study:

We designed a novel neural network architecture of RNN using LSTM to automatically detect eight different heartbeat audio patterns. Furthermore, we performed a detailed comparison with baseline ML and DL models.

Our proposed model obtained significant classification accuracy on two benchmark datasets: 99.71% on the Pascal Challenge and 98.7% on the PhysioNet 2017 dataset.

As discussed in Section 2, we have reviewed the recent state-of-the-art methods of artificial neural networks and the traditional methods used to classify abnormal heartbeat audio.

The proposed model can take advantage of the power of parallel computing architectures, such as a graphics processing unit (GPU). We therefore compare the computation times of a GPU and a central processing unit (CPU), demonstrating that in heartbeat audio categorization the GPU implementation beats its CPU counterpart. The GPU outperforms the CPU even in the training phase of the classifier, i.e., the proposed system can be trained rapidly with a GPU.

Additionally, we developed a novel framework that can detect abnormal heartbeat sounds by using neural network methods. Our framework can also be utilized for other heart tract disease classification applications.

This paper is organized as follows: Section 2 reviews the state of the art. Section 3 describes the methodology and proposed framework. Section 4 contains the results obtained from the experiments, and Section 5 concludes the study.

2. State-of-the-art

A lot of work has been conducted on the classification of cardiac diseases by using deep learning on different medical databases. Most of these studies achieved remarkable accuracy.

Bruser et al. [15] devised a novel method that uses bed-mounted sensors to detect aberrant pulse patterns from cardiac vibration signals (CVS) and remotely monitor patients. They tested the detection performance of their proposed model using a variety of machine learning classifiers. Xiong et al. [16] used the k-means technique to find atrial fibrillation (AFib) in the PubMed database. According to their findings, the maximum entropy method outperforms the other ML algorithms. In [17], the authors created a framework that can detect AF using PPG signals without any feature engineering techniques. However, the system's fundamental flaw is a lack of comparison to contemporary, cutting-edge methods.

Hurnanen et al. [18] used a linear classification approach to build a system for SCG signals. The effectiveness of the system was tested on 13 patients, and it performed admirably on low-quality SCG motions. Using the Shannon energy algorithm (SEA), Gomes et al. [13] created a system that converts the heartbeat audio signal into distinct segmentation sections. After segmentation, they used two well-known machine learning methods, J48 and MLP. Unfortunately, they were only able to complete Challenge 1 (segmentation) and were unable to complete Challenge 2 (classification); the two classifiers did not show good results on Challenge 2.

Using a one-dimensional convolutional neural network (1D-CNN) and a feed-forward neural network (F-NN), Krishnan et al. [19] presented a methodology for classifying unsegmented phonocardiogram (PCG) data. Without any feature engineering or segmentation of heartbeat sound data, the F-NN attained the highest diagnosis accuracy of 0.8575. Gomes and Pereira [4] described a method for classifying heartbeat audio from the PASCAL challenge. They used algorithms to evaluate cardiac sound patterns, such as S1 (lub) and S2 (dub). They used MATLAB to implement the decimate approach, followed by a band-pass filter to remove the noise. After that, they used SEA to determine the upper and lower values of the heartbeat sound, and J48 and MLP to train the model, and they were able to predict the sound with acceptable accuracy.

Zheng et al. [20] demonstrated a novel feature approach to identify abnormal patterns from heartbeat sounds. These features included the heart murmur energy fraction (HMEF), the maximum energy fraction of the heart sound frequency sub-band (HSEFmax), and the S1-S2EF entropy. After that, a wavelet packet was used to normalize and decompose the signals. Furthermore, a support vector machine (SVM) algorithm was used to obtain outstanding diagnostic accuracy on a limited dataset with a total of 247 sounds, including 80 normal and 167 pathological murmur sounds. Deng and Han [21] proposed a framework that uses an auto-correlation procedure instead of segmentation techniques to precisely classify heart sounds. With discrete wavelet decomposition (DWT), this feature was retrieved from the signal's sub-band coefficients. After extracting the features, they used SVM and obtained a precision of 77%, 76%, and 50% for normal, murmur, and extra-systole, respectively, indicating an unsatisfactory result.

To determine the discriminative features of heartbeat sound, Zhang et al. [22] employed tensor decomposition and a scaled spectrogram. An SVM model was used, and it achieved a moderate result of 0.74. In [23], heartbeat sound categorization was done using the DWT and MFCC feature extraction techniques. The authors recorded the digitized heartbeat sound with a phonocardiogram (PCG) electric stethoscope. They used a deep convolutional neural network (DCNN), SVM, and K-Nearest Neighbor in their work, and they also combined the DWT and MFCC approaches to obtain significant outcomes. The three key procedures employed by researchers to achieve heartbeat sound classification are segmentation, feature extraction, and the classification model. The initial segmentation step has been carried out using amplitude threshold [24–26] and probabilistic techniques [27,28]. The second step is to use feature engineering based on the time-frequency domain [29,30], and the third step is to use the classification model. MLP [13] and SVM [20] are the most well-known and widely used classification models.

Raza et al. [31] applied a bandpass filter to extract the features from the sound signal and used the downsampling technique for data framing. A simple spectrogram was used to visualize the sequence of amplitudes. In their proposed system, multiple models such as random forest (RF), MLP [13], DT, and RNN were used for the classification of heartbeat sound. CNN was evaluated on large-scale audio datasets by Hershey et al. [32], and the results were good. In the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge, a gated CNN presented by Thiyagaraja et al. and Xu et al. [33,34] won first position. Li et al. [35] also applied the BLSTM model to DCASE to achieve enhanced results. Xu et al. [36] evaluated an ensemble approach based on CNN that provides enhanced


Fig. 1. Different visual representations of heartbeat sound signals; (a) Normal, (b) AFib, and (c) Noisy.

classification accuracy.

Most of the researchers [32–34] used CNN to achieve significant diagnostic accuracy in medical image classification. RNN [37] also produced acceptable results on sequence data. Most medical images are captured in the form of a 2D array that contains the picture elements of the image, while audio data is recorded and collected in the shape of a 1-dimensional array that contains the frequency values. For this study, we selected RNN instead of CNN because the dataset contains heartbeat sound samples that have been converted into a 1D array, and RNN is a more suitable classifier for temporal data. In addition, most researchers [31,37,38,39,40] consider that CNN is mainly used for spatial data, while RNN is applied to time series (sequential) data.

3. Materials and methods

This section contains the methodology of the study.

3.1. Data descriptions

The PASCAL challenge competition dataset consists of two portions (A and B) that were generated from two different sources. Dataset-A [13] was obtained from the general public via the iStethoscope Pro iPhone application, and Dataset-B [13] was generated from clinical examinations in hospitals using DigiScope. Dataset-A is further divided into five categories, i.e., normal, murmur, extra-systole, artifacts, and unlabeled, with a total of 176 heart sound samples across all categories in wav format. The unlabeled category has been excluded from this study. In Dataset-A, each sample has a sampling frequency rate of 4000; the normal folder contains 44,100 (4.97 s) and the murmur folder contains a total of 110,250 (4.96 s) sampling frequency rate. The different visual representations of the heartbeats are shown in Fig. 1. The PhysioNet database is popular and has been widely used by other researchers for abnormal heartbeat signal classification and the diagnosis of heart tract diseases. This database consists of a total of 8528 single-channel ECG waveforms. The time duration of each sample waveform was between 9 (s) and 61

Table 1
Description of heartbeat sound databases.

Dataset                    Category        Recordings               Time length in seconds (s)
                                           Dataset A   Dataset B    Mean   Standard Deviation   Max    Min
Pascal Challenge           Normal          31          320          21.9   8.1                  30.0   1.0
                           Murmur          34          95           21.6   8.5                  20.0   22.0
                           Extra-Systole   19          46           24.1   9.0                  25.1   15.2
                           Artifacts       40          N/A          25.1   10.8                 18.0   9.1
                           Unlabeled       52          195          23.8   8.9                  17.8   10.1
                           Total           176         656          23.3   9.1                  30.0   1.0
PhysioNet Challenge 2017   Normal          5154                     31.9   10.0                 61.0   9.0
                           AFib            771                      31.6   12.5                 60     10.0
                           Noisy           46                       27.1   9.0                  60     10.2
                           Other Sound     2557                     34.1   11.8                 60.9   9.1
                           Total           8528                     32.5   10.9                 61.0   9.0


Fig. 2. Visual depiction of power spectrogram of heartbeat audio sound signal.

(s) having a sampling frequency of 300 Hz. The average length of the heartbeat audio signal was 30 (s). Each audio signal was labeled into one of four classes: normal (N), AFib, noisy, and other heartbeats. The total statistics of both datasets are presented in Table 1.

3.2. Data pre-processing and visualization

The audio data of both databases were pre-processed before applying the classification approaches. There are multiple pre-processing techniques, such as the perceptual feature MFCC, frequency domain features (amplitude of individual frequencies), and the discrete cosine transform (DCT) [41], which are mainly used for selecting dominant features from audio data. In the present study, the MFCC technique is used for extracting useful data from sound signals [40]. The fast Fourier transform (FFT) of the heartbeat audio signals is derived by using MFCC. Furthermore, DCT is also applied to define the finite sequence of heartbeat audio signal points in terms of cosine functions oscillating at 22 Hz frequency. The power spectrogram is also considered one of the most significant and efficient features for the visual illustration of audio sound signals [40]. The power spectrogram of the first 5 s of the heartbeat signals is represented in Fig. 2.

3.3. Data normalization

The sound signals of the dataset have a sampling frame rate (SFR) of 4000, defined per second of audio. The SFR gives the number of frame values of the heartbeat audio signal within 1 second (s), and the total number of frames is obtained by multiplying the sampling rate (SR) by the duration (T) of each audio file, as in Eq. (1):

$$\mathrm{TotalFrame}\,(TF) = SR \times T \tag{1}$$

For example, if a heartbeat audio file has a duration of 5 s and an SR of 5000, then the TF of the audio file is calculated by using Eq. (1), i.e., TF = 5000 × 5 = 25,000. Each heartbeat audio file has a different number of frames; consequently, it is not possible to apply deep learning classification approaches directly because of the varying feature length [42]. So, the sampling rate of each audio file has been fixed by employing a downsampling technique. This technique reduces the sampling frequency of each heartbeat audio file to 20,000 Hz and 300 Hz for the Pascal and PhysioNet challenge databases, respectively. In addition, noise is removed from the heartbeat signals by using a bandpass filter, and the signals are then zero-padded and normalized by Eq. (2):

$$H = \frac{H - \mu}{\sigma} \tag{2}$$

where H denotes the signals with shape (8528, 1800), and µ and σ are the mean and standard deviation (SD), respectively.

3.4. Data sampling

There are 50,000 frame rates in the heartbeat audio database. It is quite difficult for us to deal with such a high frame rate on our GPU. To solve this problem, we reduce the features of each heartbeat audio file by using down-sampling algorithms so that sound files are processed in a shorter amount of time. In digital signal processing (DSP), the term downsampling is associated with the process of resampling in a multi-rate signal system. It describes the entire process of bandwidth reduction (filtering) and sample-rate reduction. When applied to a sequence of heartbeat audio sound signals, the technique generates a close approximation of the sequence that would have been obtained by sampling the data at a lower rate. Accordingly, a decimate down-sampling approach is employed to lower the frame rate of the cardiac audio sound stream while maintaining its efficiency [5]. Using an 8 × 8 low pass filter, the downsampling technique transforms a 50 kHz sampling frame rate to an 800 Hz sound signal frequency for the Pascal Challenge dataset and 300 Hz for the PhysioNet dataset. The low pass filter allows low-frequency signals to pass easily and is rated at a cutoff frequency of 1.6% of the sampling frame rate. When dealing with filters, however, it is critical to monitor the response of the filter, which is based on component value and load impedance. Now we can use the proposed model to solve the heartbeat audio categorization problem with minimal processing power and execution time.

Table 2
Dimension and operations of Models.

Classifiers   Parameters
RF            Shuffle=true, Random_state=0, Max_depth=15, min_samples_leaf = 5
KNN           return_distance = false, leaf_size = 30, n_neigbhour =10, centers = 2
SVM           SVC, Kernel linear, degree=8, gamma = "auto"
NB            Random_state=0, Metric_param=dict
DT            Random_state=0, Max_depth=15
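The pre-processing arithmetic of Sections 3.3 and 3.4 can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the zero-padding target length, and the moving-average low-pass filter are assumptions chosen only for illustration (the paper does not specify its filter beyond "8 × 8 low pass").

```python
import statistics


def total_frames(sampling_rate, duration_s):
    """Eq. (1): total frames TF = SR x T."""
    return int(round(sampling_rate * duration_s))


def zscore(signal):
    """Eq. (2): subtract the mean and divide by the standard deviation."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    if sigma == 0.0:
        # A constant signal has no spread; return zeros instead of dividing by 0.
        return [0.0] * len(signal)
    return [(x - mu) / sigma for x in signal]


def zero_pad(signal, target_len):
    """Truncate or right-pad with zeros so every file has target_len frames."""
    clipped = list(signal[:target_len])
    return clipped + [0.0] * (target_len - len(clipped))


def decimate(signal, factor, taps=8):
    """Crude decimation: causal moving-average low-pass, then keep every
    factor-th sample. The moving average is a stand-in (assumption), not the
    filter used in the paper."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - taps + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed[::factor]


if __name__ == "__main__":
    print(total_frames(5000, 5))           # 25000, the example under Eq. (1)
    frames = decimate([float(i % 7) for i in range(200)], factor=10)
    fixed = zero_pad(zscore(frames), target_len=32)
    print(len(frames), len(fixed))         # 20 32
```

In practice a library routine such as `scipy.signal.decimate`, which applies an anti-aliasing filter before discarding samples, would likely be preferred over the hand-written moving average shown here.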

3.5. Machine learning classifiers

Different ML classifiers, namely RF, KNN, SVM, NB, and DT, are used to classify the heartbeat audio data.

3.5.1. Random forest (RF)

The RF method is a type of supervised learning that uses labeled information for training in order to accurately classify unlabeled data [38]. We applied the RF classifier to the datasets with a max depth value of 5 and a random state of 0. Then, we applied the entropy (E) function to determine the output, as shown in Eq. (3).

$$E = \sum_{n=1}^{C} -P_n \log_2 P_n \tag{3}$$

where P_n is simply the frequentist probability of a class 'n' in our sound signal data. The dataset used in this study contains four different classes, i.e., normal, murmur, extra-systole, and artifacts. Therefore, 'n' here could be either normal, murmur, extra-systole, or artifacts. Entropy (E) is a measure of disorder or uncertainty. In addition, the entropy function has been used in the present study to assess heart rate variability, a noninvasive marker of cardiovascular autonomic regulation.

3.5.2. K-nearest neighbour (KNN)

KNN is a well-known ML classifier that is used for regression and classification tasks [43]. When applying KNN to the datasets, we select K = 10 (n_neigbhour) and set the return distance and metric parameters to false and dist, respectively. We used the Euclidean distance to measure the similarity between data points by using Eq. (4).

$$D = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2} \tag{4}$$

3.5.3. Support vector machine (SVM)

SVM is one of the best-known ML classifiers and is used to find the best hyperplane to separate the data in n-dimensional space [44]. In SVM, w represents the vector of the decision portion, and u and b are the unknown and the constraint value, respectively. Eqs. (5) and (6) give the conditions for the positive (+) and negative (−) classes.

$$\text{Positive}: \; w \ast u + b > 0 \tag{5}$$

$$\text{Negative}: \; w \ast u + b < 0 \tag{6}$$

3.5.4. Naïve Bayes (NB)

Naïve Bayes (NB) is one of the significant and appropriate classifiers based on Bayes' theorem with the "naive" assumption of conditional independence between each pair of features [45]. For the present study, we applied GaussianNB to classify 4 different kinds of abnormal patterns of heartbeat audio using Eq. (7).

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right) \tag{7}$$

where µ_y and σ_y were estimated by maximum likelihood.

3.5.5. Decision tree (DT)

DT is a non-parametric supervised learning approach that is also used for classification problems. The purpose is to define a model that can predict the target value by using the decisions obtained from the structure of the data [46]. DT is applied to the datasets by setting the value of max depth to 5 and the random state to false.

Table 2 gives a detailed summary of the parameters used to train the machine learning models.

3.6. Proposed method

The proposed model is constructed using neurons as its fundamental element. When analyzing sequential data, this method is particularly effective because each neuron has access to its internal memory. As an illustration, the output of a neuron at one stage is used as an input for the stage that comes after it. The output of one neuron in the proposed model acts as the input for another neuron, but a single neuron is ultimately responsible for performing the classification of heartbeat audio signals. The proposed RNN (LSTM) model uses the magnitude of the spectrogram of heartbeat audio signals as a feature. Fig. 3 shows the architecture of the proposed model for the classification of abnormal heartbeat audio patterns, and Table 3 presents the detailed parameters and composition of the proposed model.

Our designed model consists of multiple layers: LSTM, Dropout, Dense, and Softmax. Each heartbeat audio file is passed through the input layer with a shape of 1 × 800 × 1 and contains feature extraction techniques. When the input audio file is processed by the first LSTM layer, the shape is transformed from 1 × 800 × 1 to 1 × 800 × 64; afterward, a dropout layer is added to the model with a suitable weight to reduce the chances of overfitting. The weight value randomly halts approximately half of the neurons from training and alters the training process during each step to prevent the hidden layers from being dependent on the provided inputs [43]. A second LSTM layer then changes the structure to a size of 1 × 1 × 32, followed by another dropout layer that does not affect the shape. In the training of the RNN model, it is crucial to define the loss function and select a suitable optimizer. Simultaneously, a batch normalization layer is also added to the model after each convolutional layer to normalize the non-linear transformation. The resulting feature is fed to the last two fully connected layers. The first dense layer contains multiple nodes, and the second layer has a size of 1 × 1 × 3, which is used to classify the abnormal patterns of heartbeat audio into their four respective categories for both datasets. The final layer of the network required only three neurons to classify all classes of the dataset. Additionally, in the last layer, we added the softmax layer to obtain the class probabilities of each heartbeat audio sample.

Next, compiling the proposed model takes two parameters: the optimizer and the loss function. The optimizer manages the learning rate (LR). For the present study, we applied "Adam" as our optimizer, which adjusts the LR throughout the training process. The LR determines how fast the optimal weights for the proposed model are computed. A smaller LR may lead to more accurate weights (up to a certain point). An LR of 0.45 is used for training the proposed model. The cross-entropy loss function is applied to measure the loss of the proposed model at the time of training. The cross-entropy loss is calculated as the negative logarithm of the probability that the model assigns to the true class, averaged over the training samples. The closer the loss is to zero, the better the model performs. The total number of parameters is 12,064,147, divided into two groups: 12,063,892 trainable parameters and 255 non-trainable parameters. Trainable parameters are updated during training, while non-trainable parameters are not updated or optimized during training. So, the non-trainable parameters do not take part in the classification process.

Fig. 3. Proposed RNN model of the present study.

Table 3
Dimension and operations of the proposed model.

RNN Layers   Layer (Type)                     Output Shape          Parameters
1            Input                            (None, 800, 1, 1)     10,024,384
2            LSTM                             (None, 800, 1, 64)    0
3            Dropout (learning rate = 0.45)   (None, 800, 1, 64)    1,638,523
4            LSTM                             (None, 800, 1, 32)    0
5            Dropout (learning rate = 0.45)   (None, 800, 1, 32)    400,728
6            Dense                            (None, 1, 1, 3)       0
7            Softmax                          (None, 1, 1, 3)       512
Total Parameters: 12,064,147
Trainable Parameters: 12,063,892
Non-Trainable Parameters: 255

3.6.1. Long short-term memory (LSTM)

The LSTM [47] is a type of RNN and is commonly applied in artificial neural networks (ANN). The LSTM contains joined blocks known as memory blocks, and every memory block consists of three gates: read (R), write (W), and reset (R). These gates perform different functions: the input gate performs the read function, the output gate performs the write function, and the last gate of the memory block, the forget gate, performs reset operations. More precisely, the output produced by the cell is multiplied by the output gate, and the forget gate is multiplied by the output of the previous memory cell. When the input heartbeat signal is provided to the model, each LSTM layer holds a memory c_t at a specific time t. The activation h_t is calculated by using Eq. (8):

$$h_t = o_t \times \sigma_h(c_t) \tag{8}$$

Eq. (9) is used to compute the output gate o_t.

$$o_t = \sigma_s\left(w_o x_t + u_o h_{(t-1)} + b_o\right) \tag{9}$$

Eq. (10) is used to control the memory of the system by using the forget gate f_t and transforming the data of the existing memory c_t with the new information.

Fig. 4. LSTM memory cell with three gates.
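As a rough illustration of the gate equations of this section (Eqs. (8) to (12)), the sketch below runs a single LSTM memory cell with scalar weights in plain Python. It is a simplification under stated assumptions: the paper's layers use 64 and 32 units with learned weight matrices, whereas here each gate has one hypothetical scalar weight per term, σ_s is taken as the logistic sigmoid, and σ_h as tanh. The parameter dictionary and function names are illustrative only.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used for the gate activations (sigma_s)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step. p maps names like 'w_f', 'u_f', 'b_f' to floats."""
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])   # Eq. (10)
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])   # Eq. (11)
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])   # Eq. (9)
    candidate = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev + p["b_c"])
    c_t = f_t * c_prev + i_t * candidate                           # Eq. (12)
    h_t = o_t * math.tanh(c_t)                                     # Eq. (8)
    return h_t, c_t


def run_lstm(sequence, p):
    """Feed a sequence through the cell, starting from c0 = 0 and h0 = 0."""
    h_t, c_t = 0.0, 0.0
    for x_t in sequence:
        h_t, c_t = lstm_step(x_t, h_t, c_t, p)
    return h_t, c_t


if __name__ == "__main__":
    names = [g + "_" + s for g in ("w", "u", "b") for s in ("f", "i", "o", "c")]
    params = {name: 0.1 for name in names}   # hypothetical weights
    print(run_lstm([0.2, -0.1, 0.4], params))
```

Because the forget gate scales the previous cell state and the input gate scales the new candidate, the cell state carries information across time steps, which is the property the section relies on for sequential heartbeat data.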


$$f_t = \sigma_s\left(w_f x_t + u_f h_{(t-1)} + b_f\right) \tag{10}$$

Eq. (11) handles the incoming new information of the heartbeat signal, which is managed by the input gate.

$$i_t = \sigma_s\left(w_i x_t + u_i h_{(t-1)} + b_i\right) \tag{11}$$

The existing memory of our proposed LSTM block contains the information of the heartbeat audio signal in the network, which is updated by applying Eq. (12):

$$c_t = f_t \times c_{(t-1)} + i_t \times \sigma_h\left(w_c x_t + u_c h_{(t-1)} + b_c\right) \tag{12}$$

where h_t and c_t are known as the hidden vector and cell vector, respectively, σ_s is the sigmoid function, σ_h is the hyperbolic tangent function, and w and b denote the weight parameters and bias vector, respectively, with the initial values c_0 = 0 and h_0 = 0. The graphical representation of the LSTM memory cell is illustrated in Fig. 4.

3.6.2. Dropout layer

The main problem observed when training deep learning classifiers is over-fitting: DL classifiers show satisfactory results in the training phase, but when the trained classifiers are later applied to the testing dataset they produce poor results. It occurs when two or more neurons perceive similar results repetitively [48], and we need to remove the neurons that affect the output as shown in Eq. (13). As mentioned earlier in the proposed method section, to reduce the chance of overfitting, we fine-tuned the overall model by changing the value of the learning rate (r), such as 0.001, 0.05, and 0.45. In this study, r = 0.45 produced significant and appropriate results in training the proposed classifiers.

$$Z_i^L = W_i^L Y^L + B^L \tag{13}$$

where Z^L represents the input layer, L is a hidden layer, and W^L, B^L, and Y^L denote the weight, biases, and output vector, respectively.

3.6.3. Dense layer

The dense layer is applied to transform the dimensions of a layer. This layer contains conventional neurons, which take input data such as weights, apply a linear function, and then pass the resulting output to the next layer, where all neurons are connected to the input layer. We applied the batch normalization function (BNF) after the dense layer to normalize the learned distribution and enhance the training efficiency [49]. Furthermore, to manage the effect of initialization, we repeated the process for each combination of hyperparameters five times and applied the averaged value for validation, as shown in Eq. (14).

$$X = f(Y \times w + b) \tag{14}$$

where b represents the bias vector, f is the activation function, w contains the weights, and Y and X denote the input and output layers, respectively [29].

3.6.4. Softmax layer

$$S(y_i) = \frac{e^{y_i}}{\sum_{k} e^{y_k}}, \quad k = 1, 2, 3, 4, \ldots, n \tag{15}$$

where S and y denote the output and input, respectively.

The detailed description of our proposed model is outlined in Algorithm 1.

Algorithm 1: Classification of abnormal heartbeat audio using the proposed model
Parameters: Recurrent neural network (R), training (tr), testing (te), Performance evaluation (Pe), and Confusion matrix (Cm).
1.  n ← heartaudio_samples
2.  Pp ← Pre-processing
3.  For h ← 0 to n do:
4.      For c ← 1 to Pp do:
5.          D ← data (Pp + c)
6.          For i ← 1 to D do:
7.              tr = partition (rand (D = 0.70))
8.              te = partition (rand (D = 0.30))
9.              M ← R (tr)
10.         End i
11.     End c
12.     For i ← 1 : size (te) do:
13.         For h ← 1 : output_class:
14.             For j ← 1 : count (M (h))
15.                 D (h,j) = M (j,h) – te(i)
16.             End j
17.             V(h) ← min (D (h))
18.         End h
19.         O ← predict (V)
20.         For B ← 0 to O do:
21.             O ← Pe
22.             Pe ← Cm
23.         End B
24.     End i
25. End h

3.6.5. Evaluation criteria

The performance of the ML and DL classifiers is evaluated on the testing set of both heartbeat sound signal datasets. The performance of the proposed model on test data is evaluated in terms of accuracy, sensitivity, specificity, ROC, and precision-recall curve [51]. Eqs. (16), (17), and (18) give the accuracy, sensitivity, and specificity values, respectively, where k represents the number of classes:

$$\text{Accuracy} = \frac{\sum_{i=1}^{k} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}}{k} \tag{16}$$

$$\text{Sensitivity} = \frac{\sum_{i=1}^{k} TP_i}{\sum_{i=1}^{k} (TP_i + FN_i)} \tag{17}$$

$$\text{Specificity} = \frac{\sum_{i=1}^{k} TN_i}{\sum_{i=1}^{k} (TN_i + FP_i)} \tag{18}$$
k
(TNi + FPi )
The softmax activation function is exceedingly important and it de­ i=1
cides whether the neurons are active or not. Softmax is effectively
The confusion matrix is used to calculate the performance of a
handling multi-class classification problems [50]. The main purpose of
classifier on the set of test data for which the true or correct values are
the softmax function is to spot the largest value in neurons and assign the
known. True positive (TP) and False Positive (FP) represent the value of
weight is one to that neuron, and the rest of the neurons are assigned
correctly and incorrectly classified images, respectively. Similarly, True
zero value. It can be mathematically written as in Eq. (15). The results
Negative (TN) and False Negative (FN) contains the value of the correct
obtained from our proposed RNN (LSTM) model are directed toward the
and incorrect instances of images, respectively. The AU(ROC) curve was
number of classes. A softmax function is normally used to get the
also calculated.
resultant vector into the probability containing the values in [0,1]. This
function classifies the output sound into one of the heartbeat audio
4. Results and discussions
diseases classes: normal, murmur, extrasystole, artifacts, noisy, AFib,
Fusion beat, and other beats.
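As a concrete reference for how the reported figures are obtained, the softmax output of Eq. (15) and the evaluation criteria of Eqs. (16)-(18) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the one-vs-rest counting of TP, TN, FP, and FN, and the toy scores for three hypothetical classes are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Eq. (15): map raw scores to probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Return the index of the class with the largest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

def per_class_counts(y_true, y_pred, k):
    """One-vs-rest (TP, TN, FP, FN) for each of the k classes (assumed convention)."""
    counts = []
    for c in range(k):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = len(y_true) - tp - fp - fn
        counts.append((tp, tn, fp, fn))
    return counts

def evaluate(y_true, y_pred, k):
    counts = per_class_counts(y_true, y_pred, k)
    # Eq. (16): per-class accuracy averaged over the k classes
    accuracy = sum((tp + tn) / (tp + tn + fp + fn)
                   for tp, tn, fp, fn in counts) / k
    # Eq. (17): summed TP over summed (TP + FN)
    sensitivity = (sum(tp for tp, _, _, _ in counts)
                   / sum(tp + fn for tp, _, _, fn in counts))
    # Eq. (18): summed TN over summed (TN + FP)
    specificity = (sum(tn for _, tn, _, _ in counts)
                   / sum(tn + fp for _, tn, fp, _ in counts))
    return accuracy, sensitivity, specificity

# Toy data: four samples, three hypothetical classes
scores = [[2.0, 0.5, 0.1], [0.2, 1.9, 0.3], [0.1, 0.4, 2.2], [1.5, 1.6, 0.0]]
y_true = [0, 1, 2, 0]
y_pred = [predict(row) for row in scores]
acc, sens, spec = evaluate(y_true, y_pred, k=3)
```

Note that softmax itself yields graded probabilities; the hard class decision comes from taking the largest probability, which is what `predict` does here.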
We feed a heartbeat audio signal into the proposed model, which classifies the audio into one of four categories. The batch size was set to 64, and the proposed model was trained for 200 epochs. The grid search technique was used to tune the hyperparameters of the proposed model and the ML classifiers. For each class label, the accuracy, precision, recall, AUC, and F1-score of the proposed model, MLP, and ML classifiers were evaluated.

4.1. Experimental setup

This research experiment evaluates the proposed model as well as various ML classifiers. The Keras framework is used to implement the proposed model. Moreover, the approaches that are not directly linked with neural networks are programmed in Python. The experiment was executed on a Windows-based operating system with an NVIDIA GeForce GTX GPU with 11 GB of memory and 32 GB of RAM.

4.2. Results analysis

Fig. 5 depicts the training and validation accuracy in terms of epochs. The maximum obtained accuracy for training was 99.96%, and that for validation was 95.33%. These values indicate that the proposed model trained well and could correctly classify abnormal cases. The training loss was 0.021, and the validation loss was 0.0668.

Fig. 5. Training and validation accuracy and loss; (a) represents the model loss and (b) represents the model accuracy.

The purpose of this study is to design an automated system to classify abnormal heartbeat audio signals accurately by using deep learning methods. A preliminary diagnosis of heart problems assists the medical expert in deciding on the further course of treatment. In this study, two well-known publicly available databases, i.e., the Pascal Challenge and PhysioNet databases, have been used, which contain 8 different types of heartbeat sounds. Initially, we transformed the datasets and converted the recordings to two fixed durations, i.e., 10.0 s (10,000 frames) for the Pascal dataset and 25.0 s (20,000 frames) for the PhysioNet dataset. Then, the proposed model and the MLP model were applied to the datasets to classify abnormal heartbeat audio. In addition, ML classifiers were also applied, and their outcomes were observed. Table 4 summarizes the output of the ML and DL classifiers in diagnosing abnormal heartbeat signal patterns using the Pascal Challenge database.

The results in Table 4 show that the proposed deep learning algorithm attained a better classification accuracy of 99.71%, specificity of 99.3%, sensitivity of 98.6%, and f1-score of 98.9% as compared to the MLP as well as traditional ML classifiers. The strong performance of the proposed model is attributed to its ability to extract features automatically. MLP is one of the earlier classification algorithms and has been applied to a wide range of classification problems. RF is an ensemble learning approach and is mainly used for categorical and text classification tasks [52]. Therefore, RF performed well on the categorical heartbeat dataset and achieved 92.0% classification accuracy, which is superior to MLP (85.0% accuracy).

Table 4
Results of ML and DL classifiers on the Pascal challenge dataset.
Classifiers      Accuracy  Specificity  Sensitivity  F1-Score  AUC
RF               0.92      0.90         0.89         0.91      0.90
KNN              0.74      0.72         0.71         0.73      0.75
SVM              0.72      0.70         0.69         0.70      0.74
NB               0.77      0.76         0.73         0.71      0.76
DT               0.70      0.68         0.65         0.66      0.70
MLP              0.85      0.84         0.80         0.82      0.85
Proposed Model   0.9971    0.993        0.986        0.989     0.983

The performance of the proposed model is also evaluated on the PhysioNet 2017 challenge dataset, and the detailed results are presented in Table 5. On the PhysioNet dataset, the proposed model achieved an overall accuracy of 98.7%, specificity of 99%, sensitivity of 98.5%, and f1-score of 98.8%. An MLP with 100 hidden layers was also applied to the PhysioNet dataset and achieved an accuracy of 77.4%, specificity of 77.0%, sensitivity of 76.0%, and f1-score of 75.0%. Moreover, increasing the number of hidden layers increases the accuracy up to a point, beyond which larger layers no longer affect the accuracy [41].

Table 5
Results of ML and DL classifiers on the PhysioNet Challenge dataset.
Classifiers      Accuracy  Specificity  Sensitivity  F1-Score  AUC
RF               0.702     0.77         0.76         0.78      0.76
KNN              0.646     0.68         0.67         0.68      0.69
SVM              0.812     0.82         0.81         0.826     0.80
NB               0.74      0.73         0.72         0.70      0.73
DT               0.697     0.69         0.69         0.69      0.68
MLP              0.774     0.77         0.76         0.75      0.75
Proposed Model   0.987     0.99         0.985        0.989     0.988

Table 6
Technical specifications of the CPU and GPU for the present study.
Specifications         CPU                     GPU
Processor              Intel® Core™ i7-10700   NVIDIA Titan X (Pascal), 3584 CUDA cores
Memory                 32 GB                   12 GB
Base Frequency         2.90 GHz                350 MHz
Max Memory Bandwidth   45.8 GB/s               480 GB/s

Besides providing a detailed characterization of the heartbeat audio
signal based on the proposed model, our work also aims at achieving computational times that allow for the real-time processing of heartbeat audio data [53–55]. In particular, we implemented the proposed model independently in a non-parallel Python version for the CPU and a Python/CUDA version for the GPU. Python is an object-oriented programming language, and CUDA is a parallel computing platform built by Nvidia to communicate with their GPUs. The technical specifications of the CPU and GPU employed in this study are listed in Table 6.

Ensembles are intrinsically self-contained, which makes them good candidates for parallel multi-processor implementations. Owing to the large matrix products and non-linear mapping functions in the reservoir paradigm, serial implementations are also ideal for the exploration of computationally fast approaches. Such methods, for example GPU implementations, can reduce latency while increasing throughput. To explore the computational time, a series of training and classification procedures was analyzed for both heartbeat audio databases. The Python implementations benefit from the Scikit Learn library [19], while Python/CUDA uses TensorFlow, Python cuBLAS, and a CUDA kernel implemented for the non-linear mapping [56]. Fig. 6a and b illustrate the computational times of a training and a testing realization for both heartbeat audio datasets versus the number of neurons. The GPU and CPU comparison shows the benefit of using a GPU implementation, with significantly lower training times. Fig. 6a presents the computational times including the random non-linear mapping (NM) of the input onto the reservoir and the output weights over the entire training dataset. Fig. 6b depicts the computational time for the last classification steps that calculate the output on the test dataset. As expected, the processing time increases with the number of neurons, especially during the training process. Moreover, the piece-wise linear trend in the GPU classification product demonstrates the impact of small products on cuBLAS scaling, which is inherent to the library.

Fig. 6. (a) Computational times for training the proposed model on the heartbeat audio signals datasets, and (b) Computational times for testing the proposed model.

From Tables 4 and 5, it can be observed that the proposed RNN using the LSTM model with the support of softmax, dense, and dropout layers achieves significant accuracy in classifying different types of abnormal heartbeat sounds. The structure of the LSTM is based on time-series data [57], and its memory cells remember the previous data. At each step, the gates of the memory cell control the flow of data along with the pattern, so that the data can be stored more precisely for a long time [58]. This ability to remember previous data for a long time when solving the present problem is the reason for the appropriate results. We applied different optimizers, namely Adam, Adagrad, and SGD, and found that the Adam optimizer produced significant results as compared to the other optimizers. The dropout layer was applied to reduce over-fitting by randomly dropping connections between the layers [59–61]. Additionally, we fine-tuned the hyperparameters by using different values of the dropout rate {0.05, 0.35, 0.45}; each dropout rate produced different outputs, as shown in Table 7.

Table 7
Hyper-parameter tuning of the proposed model.
Datasets             Dropout Rate  Precision  Loss
Pascal Database      0.05          0.958      0.39
                     0.35          0.979      0.31
                     0.45          0.989      0.27
PhysioNet Database   0.05          0.970      0.41
                     0.35          0.980      0.34
                     0.45          0.987      0.29

Tables 4 and 5 show that the RNN achieved an accuracy of 0.9971 on the 10.0-s sample sound files of the Pascal dataset and an accuracy of 0.987 on the 25.0-s audio sample files of the PhysioNet challenge dataset. On the Pascal dataset, random forest (RF) achieved 92.0% accuracy and the MLP 85.0% accuracy. RF thus gained remarkable classification accuracy as compared to MLP and the other ML classifiers in classifying abnormal heartbeat audio signals. Moreover, the overall
results of the RNN with the 10.0-s heartbeat audio sample files of the Pascal Challenge are greater than the results with the 25.0-s sample files of the PhysioNet dataset. Cross-validation (CV) is a re-sampling approach that is applied to evaluate classification methods on limited data samples. The estimate produced by K-fold CV is less optimistic than that of other techniques, such as a single training and testing split. The term K signifies the number of folds into which the sample dataset is randomly divided, each of equal size. We applied 5-fold CV (K = 5) to calculate the classification performance of the RNN model. The outputs generated by the RNN with 5-fold CV on the Pascal and PhysioNet challenge datasets are shown in Figs. 7 and 8, respectively.

Fig. 7. 5-fold cross-validation for 10.0-s of heartbeat audio signal of Pascal database.

Fig. 8. 5-fold cross-validation for 10.0-s of heartbeat audio signal of PhysioNet Challenge database.

4.3. Comparison with state-of-the-art methods

Tables 8 and 9 compare the classification performance of the proposed model with other state-of-the-art methods on both publicly available databases, i.e., the Pascal challenge and the 2017 PhysioNet challenge. As demonstrated, the proposed RNN (LSTM) model achieved the best classification performance in terms of accuracy, sensitivity, specificity, and f1-score.

Table 8
Comparison of the proposed model with state-of-the-art classifiers on the Pascal Database.
Refs.             Models       Accuracy (%)  Specificity (%)  Sensitivity (%)  F1-Score (%)
[62]              CNN          95.3          99.29            97.87            –
[63]              MLP          97.5          96.2             94.5             –
[64]              CNN          95.0          81.0             69.0             –
[65]              CNN          95.3          96.6             97.9             –
[21]              SVM          74.0          50.0             75.0             56
[66]              DNN          98.9          98.9             98.8             98.9
[67]              SE-ResNet    99.61         93.87            93.78            94.6
[68]              DeepNet      99.56         98.9             97.8             97.6
[69]              CNN          99.13         98.22            97.49            97.8
Proposed Method   RNN (LSTM)   99.71         99.31            98.6             98.3

Table 9
Comparison of the proposed model with state-of-the-art classifiers on the PhysioNet Challenge Dataset.
Refs.             Models       Accuracy (%)  Specificity (%)  Sensitivity (%)  F1-Score (%)
[31]              RNN          80.45         80.1             –                –
[70]              CNN          80.0          80.0             79.0             80.0
[71]              LSTM         77.16         77.0             76.0             76.0
[22]              SVM          76.0          –                –                –
[72]              CNN          86.5          0.86             85.0             86.0
[73]              DCNN         97.1          93.3             98.3             –
Proposed Method   RNN (LSTM)   98.7          99.0             98.5             98.9
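The 5-fold cross-validation protocol described in Section 4.2 can be sketched as follows. This is a minimal, hedged illustration in plain Python rather than the procedure actually used in the study: `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical names introduced here, and the fixed shuffling seed is an assumption made so the split is reproducible.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

def cross_validate(n_samples, fit_and_score, k=5, seed=0):
    """Average a score over k train/test splits.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that
    trains a model on train_idx and returns its score on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k, scores

# Usage with a dummy scorer that only reports the held-out fraction
folds = k_fold_indices(23, k=5)
mean_score, fold_scores = cross_validate(
    23, lambda tr, te: len(te) / (len(tr) + len(te)), k=5)
```

Each sample is held out exactly once, so the averaged score uses every recording for testing without ever testing on data seen during the corresponding training run.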

4.4. Discussions

The proposed model, MLP, and different ML classifiers are evaluated in this study. The Pascal and PhysioNet Computing in Cardiology Challenge datasets consist of several patterns of heartbeat sounds gathered from different countries. We detected the heart states from each ECG waveform and segmented these heart signals into smaller chunks to reduce the computational cost [72,74]. MFCCs were computed from these segmented chunks, and both the ML and DL classifiers were trained on 70% of the heart sound data, while 10% and 20% of the data were used for validation and testing, respectively. In this work, the classification accuracy of the proposed RNN has been tested on both cardiology databases. The RNN model consistently performed well as compared to other traditional machine learning and deep learning classifiers on both datasets (see Tables 4 and 5). DL algorithms perform well when they learn good representations from labeled sound data. Although both datasets used in this study are clinically annotated, they still contain a lot of noise and variability that may produce anomalies in the heart signals and affect the performance of the proposed classifier. Therefore, a bandpass filter was applied to remove the noise, and invariant features can be obtained by combining time-frequency analysis with a mechanism that can classify the abnormal heartbeat patterns with appropriate performance. Thus, to compare the classification performance of the proposed RNN (LSTM), we selected five powerful ML classifiers, RF, KNN, SVM, NB, and DT, and a DL method, MLP, as baseline models. For SVM, we tested linear kernels to obtain the best classification accuracy. The RF and DT classifiers were run with a maximum depth of 15. Furthermore, the KNN classifier used a leaf size of 30 with 10 neighbors. The hyperparameters of the other classifiers are discussed in detail elsewhere (see Table 2). The experimental results (see Tables 4 and 5) reveal that the proposed RNN using the LSTM technique achieved significant results in classifying abnormal heartbeat audio on both datasets and can substantially assist medical experts. The output of the proposed method was also compared to previous work (see Tables 8 and 9) on the Pascal and PhysioNet challenge datasets. Arslan and Karhan [66], Li et al. [67], Sai and Kumari [68], Rubin et al. [70], Raza et al. [31], Warrick et al. [71], Gomes et al. [13], Liaqat et al. [72], Deng et al. [21], and Zhang et al. [22] reported overall performances of their machine learning and deep learning methods of 0.989 (DNN), 0.9961 (SE-ResNet), 0.9956 (DeepNet), 0.80 (CNN), 0.80 (RNN), 0.7716 (LSTM), 0.70 (MLP), 0.865 (CNN), 0.74 (SVM), and 0.76 (SVM), respectively. Our method can identify and extract the discriminative patterns to classify heart abnormalities from audio samples, and it achieved the highest accuracy of 99.71% and 98.7% on the Pascal and PhysioNet datasets, respectively.

5. Conclusion

In this study, we identified abnormal patterns of heartbeat audio signals from two well-known publicly available databases based on data framing, reducing the sample size, and employing different ML and DL approaches including the proposed model. The proposed method works effectively and efficiently to diagnose heart problems from sound signals and provides appropriate information to health experts for determining whether further treatment is required. This study has been performed on the Pascal and PhysioNet challenge databases, and the noise was removed by applying different filtering techniques. Furthermore, the sampling frame of each heartbeat sound signal was converted to a fixed length of 10.0 s, and downsampling techniques were used to extract the most dominant and discriminative patterns. The proposed model achieved the highest accuracy of 0.9971 and 0.987 on the Pascal and PhysioNet datasets, respectively, which indicates that our methodology for identifying abnormal heartbeat audio is effective. In the future, we are planning to design a real-time process to detect abnormal heartbeat patterns without using labeled data.

Declaration of Competing Interest

The authors declare they have no conflict of interest.

Availability of data and material

All data and materials are presented in this study.

Funding

No funding was received.

References

[1] I.R. Hanna, M.E. Silverman, A history of cardiac auscultation and some of its contributors, Am. J. Cardiol. 90 (2002) 259–267.
[2] Z. Jiang, S. Choi, A cardiac sound characteristic waveform method for in-home heart disorder monitoring with electric stethoscope, Expert Syst. Appl. 31 (2006) 286–298.
[3] D. Kumar, P. Carvalho, M. Antunes, P. Gil, J. Henriques, L. Eugenio, A new algorithm for detection of S1 and S2 heart sounds, in: Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing 2, IEEE, Toulouse, France, 2006, pp. 1180–1183.
[4] E.F. Gomes, E. Pereira, Classifying heart sounds using peak location for segmentation and feature construction, Aistats (2012) 1–5.
[5] J. Díaz-García, P. Brunet, I. Navazo, P.P. Vázquez, Downsampling methods for medical datasets, in: Proceedings of the International Conferences on Computer Graphics, Visualization, Computer Vision and Image Processing 2017 and Big Data Analytics, Data Mining and Computational Intelligence - Part of the Multi Conference on Computer Science and Info, Lisbon, Portugal, 2017, pp. 12–20, 23 July.
[6] M. Genussov, I. Cohen, Musical genre classification of audio signals using geometric methods, Eur. Signal Process. Conf. 10 (2010) 497–501.
[7] O. Faust, R. Acharya U, S.M. Krishnan, L.C. Min, Analysis of cardiac signals using spatial filling index and time-frequency domain, Biomed. Eng. Online 3 (2004) 1–11.
[8] G.Y.Son Yaseen, S. Kwon, Classification of heart sound signal using multiple features, Appl. Sci. 8 (12) (2018).
[9] S. Patidar, R.B. Pachori, Classification of cardiac sound signals using constrained tunable-Q wavelet transform, Expert Syst. Appl. 41 (16) (2014) 7161–7170.
[10] S. Ari, K. Hembram, G. Saha, Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier, Expert Syst. Appl. 37 (12) (2010) 8019–8026.
[11] J. Li, L. Ke, Q. Du, Classification of heart sounds based on the wavelet, Entropy 21 (5) (2019) 472.
[12] F. Safara, S. Doraisamy, A. Azman, A. Jantan, A.R. Abdullah Ramaiah, Multi-level basis selection of wavelet packet decomposition tree for heart sound classification, Comput. Biol. Med. 43 (10) (2013) 1407–1414.
[13] E.F. Gomes, P.J. Bentley, M. Coimbra, E. Pereira, Y. Deng, Classifying heart sounds: approaches to the PASCAL challenge, in: Proceedings of the HEALTHINF 2013 - Proceedings of the International Conference on Health Informatics, Barcelona, Spain, 11–14 February, 2013, pp. 337–340.
[14] G.D. Clifford, C. Liu, B. Moody, H.L. Li-wei, I. Silva, Q. Li, R.G. Mark, AF Classification from a short single lead ECG recording: the PhysioNet/computing in cardiology challenge 2017, in: Proceedings of the 2017 Computing in Cardiology (CinC), IEEE, 2017, pp. 1–4.
[15] C. Bruser, J. Diesel, M.D. Zink, S. Winter, P. Schauerte, S. Leonhardt, Automatic detection of atrial fibrillation in cardiac vibration signals, IEEE J. Biomed. Health Inf. 17 (2012) 162–171.
[16] Z. Xiong, T. Liu, G. Tse, M. Gong, P.A. Gladding, B.H. Smaill, M.K. Stiles, A.M. Gillis, J. Zhao, A machine learning aided systematic review and meta-analysis of the relative risk of atrial fibrillation in patients with diabetes mellitus, Front. Physiol. 9 (2018) 835.
[17] K. Aschbacher, D. Yilmaz, Y. Kerem, S. Crawford, D. Benaron, J. Liu, M. Eaton, G.H. Tison, J.E. Olgin, Y. Li, et al., Atrial fibrillation detection from raw photoplethysmography waveforms: a deep learning application, Heart Rhythm. O2 1 (2020) 3–9.
[18] T. Hurnanen, E. Lehtonen, M.J. Tadi, T. Kuusela, T. Kiviniemi, A. Saraste, T. Vasankari, J. Airaksinen, T. Koivisto, M. Pänkäälä, Automated detection of atrial fibrillation based on time–frequency analysis of seismocardiograms, IEEE J. Biomed. Health Inf. 21 (2016) 1233–1241.
[19] P.T. Krishnan, P. Balasubramanian, S. Umapathy, Automated heart sound classification system from unsegmented phonocardiogram (PCG) using deep neural network, Phys. Eng. Sci. Med. 43 (2020) 505–515, https://doi.org/10.1007/s13246-020-00851-w.

[20] Y. Zheng, X. Guo, X. Ding, A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification, Expert Syst. Appl. 42 (2015) 2710–2721.
[21] S.W. Deng, J.Q. Han, Towards heart sound classification without segmentation via autocorrelation feature and diffusion maps, Future Gener. Comput. Syst. 60 (2016) 13–21.
[22] W. Zhang, J. Han, S. Deng, Heart sound classification based on scaled spectrogram and tensor decomposition, Exp. Syst. Appl. 84 (2017) 220–231.
[23] G.Y.S. Yaseen, S. Kwon, Classification of heart sound signal using multiple features, Appl. Sci. 8 (2018) 2344.
[24] T. Chen, K. Kuan, L. Celi, G.D. Clifford, Intelligent heartsound diagnostics on a cellphone using a hands-free kit, AAAI Spring Symp. Ser. 2010 (2010) 26–31.
[25] Y. Liu, C.C.Y. Poon, Y.T. Zhang, A hydrostatic calibration method for the design of wearable PAT-based blood pressure monitoring devices, in: Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS’08 - “Personalized Healthcare through Technology”, Vancouver, BC, Canada, 22–25 August 24, 2008, pp. 1308–1310.
[26] A. Moukadem, A. Dieterlen, N. Hueber, C. Brandt, A robust heart sounds segmentation module based on S-transform, Biomed. Signal Process. Control 8 (2013) 273–281.
[27] S.E. Schmidt, C. Holst-Hansen, C. Graff, E. Toft, J.J. Struijk, Segmentation of heart sound recordings by a duration-dependent hidden Markov model, Physiol. Meas. 31 (2010) 513–529.
[28] D.B. Springer, L. Tarassenko, G.D. Clifford, Logistic regression-HSMM-based heart sound segmentation, IEEE Trans. Biomed. Eng. 63 (2016) 822–832.
[29] S. Ari, K. Hembram, G. Saha, Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier, Expert Syst. Appl. 37 (2010) 8019–8026.
[30] F. Safara, S. Doraisamy, A. Azman, A. Jantan, A.R. Abdullah Ramaiah, Multi-level basis selection of wavelet packet decomposition tree for heart sound classification, Comput. Biol. Med. 43 (2013) 1407–1414.
[31] A. Raza, A. Mehmood, S. Ullah, M. Ahmad, G. Choi, B. On, Heartbeat sound signal classification using deep learning, Sensors 19 (21) (2019) 4819, https://doi.org/10.3390/s19214819.
[32] S. Hershey, S. Chaudhuri, D.P.W. Ellis, J.F. Gemmeke, A. Jansen, R.C. Moore, M. Plakal, D. Platt, R.A. Saurous, B. Seybold, et al., CNN architectures for large-scale audio classification, in: Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, 5–9 March, 2017, pp. 131–135.
[33] S.R. Thiyagaraja, R. Dantu, P.L. Shrestha, A. Chitnis, M.A. Thompson, P.T. Anumandla, T. Sarma, S. Dantu, A novel heart-mobile interface for detection and classification of heart sounds, Biomed. Signal Process. Control 45 (2018) 313–324.
[34] Y. Xu, Q. Kong, W. Wang, M.D. Plumbley, Large-scale weakly supervised audio classification using gated convolutional neural network, in: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April, 2018, pp. 121–125.
[35] Y. Li, X. Li, Y. Zhang, W. Wang, M. Liu, X. Feng, Acoustic scene classification using deep audio feature and BLSTM network, in: Proceedings of the ICALIP 2018 - 6th International Conference on Audio, Language and Image Processing, Shanghai, China, 16–17 July, 2018, pp. 371–374.
[36] K. Xu, B. Zhu, Q. Kong, H. Mi, B. Ding, D. Wang, H. Wang, General audio tagging with ensembling convolutional neural networks and statistical features, J. Acoust. Soc. Am. 145 (2019) EL521–EL527.
[37] G. Keren, B. Schuller, Convolutional RNN: an enhanced model for extracting features from sequential data, in: Proceedings of the International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July, 2016, pp. 3412–3419.
[38] Z. Masetic, A. Subasi, Congestive heart failure detection using random forest classifier, Comput. Methods Prog. Biomed. 130 (2016) 54–64.
[39] H. Malik, M.S. Farooq, A. Khelifi, A. Abid, J.N. Qureshi, M. Hussain, A comparison of transfer learning performance versus health experts in disease diagnosis from medical imaging, IEEE Access (2022), https://doi.org/10.1109/ACCESS.2020.3004766.
[40] M.S. Naeem, Farooq, A. Khelifi, A. Abid, Malignant melanoma classification using deep learning: datasets, performance measurements, challenges and opportunities, IEEE Access 8 (2020) 110575–110597, https://doi.org/10.1109/ACCESS.2020.3001507.
[41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[42] A. Raza, A. Mehmood, S. Ullah, M. Ahmad, G.S. Choi, B.W. On, Heartbeat sound signal classification using deep learning, Sensors 19 (2019) 4819.
[43] J. Park, K. Lee, K. Kang, Arrhythmia detection from heartbeat using k-nearest neighbor classifier, in: Proceedings of the 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE, 2013, pp. 15–22.
[44] G. Sannino, G. De Pietro, A deep learning approach for ECG-based heartbeat classification for arrhythmia detection, Fut. Gener. Comput. Syst. 86 (2018) 446–455.
[45] R. Harper, J. Southern, A bayesian deep learning framework for end-to-end prediction of emotion from heartbeat, IEEE Trans. Affect. Comput. (2020).
[46] F.I. Alarsan, M. Younes, Analysis and classification of heart diseases using heartbeat features and machine learning algorithms, J. Big. Data 6 (1) (2019) 1–15.
[47] P. Gastaldo, J. Redi, Machine learning solutions for objective visual quality assessment, in: Proceedings of the 6th International Workshop on Video Processing and Quality Metrics for Consumer Electronics. VPQM, Scottsdale, AZ, USA, 19–20 January 12, 2012, pp. 2451–2471.
[48] A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Netw. 18 (2005) 602–610.
[49] S. Ioffe, C. Szegedy, arXiv preprint, 2015.
[50] Amin Ullah, et al., Classification of arrhythmia by using deep learning with 2-D ECG spectral image representation, Remote Sens. 12 (10) (2020) 1685 (Basel).
[51] S. Nitish, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (2014) 1929–1958.
[52] M. Yu, Q. Huang, H. Qin, C. Scheele, C. Yang, Deep learning for real-time social media text classification for situation awareness–using hurricanes sandy, harvey, and irma as case studies, Int. J. Dig. Earth (2019) 1–18.
[53] M. Sokolova, N. Japkowicz, S. Szpakowicz, AI 2006: advances in artificial intelligence, in: Proceedings of the 19th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, 4–8 December, 2006.
[54] K. Kowsari, K.J. Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, D. Brown, Text classification algorithms: a survey, Information 10 (2019) 1–68.
[55] Z. Ebrahimi, et al., A review on deep learning methods for ECG arrhythmia classification, Exp. Syst. Appl. X (2020), 100033.
[56] M. Alfaras, M.C. Soriano, S. Ortín, A fast machine learning model for ECG-based heartbeat classification and arrhythmia detection, Front. Phys. 7 (2019) 103.
[57] S. Minaee, I. Bouazizi, P. Kolan, H. Najafzadeh, Ad-Net: audio-visual convolutional neural network for advertisement detection in videos, arXiv (2018), arXiv:1806.08612.
[58] V. Ravi, D. Pradeepkumar, K. Deb, Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms, Swarm Evol. Comput. 36 (2017) 136–149.
[59] Z. Zhao, W. Chen, X. Wu, P.C.Y. Chen, J. Liu, LSTM network: a deep learning approach for short-term traffic forecast, IET Intel. Transp. Syst. 11 (2017) 68–75.
[60] R. Zhao, R. Yan, J. Wang, K. Mao, Learning to monitor machine health with convolutional Bi-directional LSTM networks, Sensors 17 (2017) 273.
[61] Zümray Dokur, Tamer Ölmez, Heartbeat classification by using a convolutional neural network trained with walsh functions, Neural Comput. Appl. (2020) 1–20.
[62] G. Petmezas, K. Haris, L. Stefanopoulos, V. Kilintzis, A. Tzavelis, J.A. Rogers, N. Maglaveras, Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets, Biomed. Signal Process. Control 63 (2021), 102194.
[63] M. Al-dabag, H.S. ALRikabi, R. Al-Nima, Anticipating Atrial Fibrillation Signal Using Efficient Algorithm, 2021.
[64] S. Raghunath, J.M. Pfeifer, A.E. Ulloa-Cerna, A. Nemani, T. Carbonati, L. Jing, C.M. Haggerty, Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation–related stroke, Circulation 143 (13) (2021) 1287–1298.
[65] B.M. Mathunjwa, Y.T. Lin, C.H. Lin, M.F. Abbod, J.S. Shieh, ECG arrhythmia classification by using a recurrence plot and convolutional neural network, Biomed. Signal Process. Control 64 (2021), 102262.
[66] Ö. Arslan, M. Karhan, Effect of Hilbert-Huang transform on classification of PCG signals using machine learning, J. King Saud Univ. Comput. Inf. Sci. (2022).
[67] X. Li, F. Zhang, Z. Sun, D. Li, X. Kong, Y. Zhang, Automatic heartbeat classification using S-shaped reconstruction and a squeeze-and-excitation residual network, Comput. Biol. Med. 140 (2022), 105108.
[68] Y.P. Sai, L.R. Kumari, Cognitive assistant DeepNet model for detection of cardiac arrhythmia, Biomed. Signal Process Control 71 (2022), 103221.
[69] A.M. Alqudah, A. Alqudah, Deep learning for single-lead ECG beat arrhythmia-type detection using novel iris spectrogram representation, Soft Comput. 26 (3) (2022) 1123–1139.
[70] J. Rubin, S. Parvaneh, A. Rahman, B. Conroy, S. Babaeizadeh, Densely connected convolutional networks for detection of atrial fibrillation from short single-lead ECG recordings, J. Electrocardiol. 51 (2018) S18–S21.
[71] P.A. Warrick, M.N. Homsi, Ensembling convolutional and long short-term memory networks for electrocardiogram arrhythmia detection, Physiol. Meas. 39 (2018), 114002.
[72] S. Liaqat, K. Dashtipour, A. Zahid, K. Assaleh, K. Arshad, N. Ramzan, Detection of atrial fibrillation using a machine learning approach, Information. 11 (12) (2020) 549, https://doi.org/10.3390/info11120549.
[73] Y. Sun, Y.Y. Yang, B.J. Wu, et al., Contactless facial video recording with deep learning models for the detection of atrial fibrillation, Sci. Rep. 12 (2022) 281, https://doi.org/10.1038/s41598-021-03453-y.
[74] L. Meng, W. Tan, J. Ma, R. Wang, X. Yin, Y. Zhang, Enhancing dynamic ECG heartbeat classification with lightweight transformer model, Artif. Intell. Med.
(2022), 102236.
