ARTICLE INFO
Keywords: Abnormal heart sound; Heartbeat signals; RNN; LSTM; Neural network

ABSTRACT
Nowadays, heart disease is the leading cause of death. The high mortality rate and escalating occurrence of heart diseases worldwide warrant a fast and efficient diagnosis of such ailments. The purpose of this study is to design an automated system for the classification of abnormal heartbeat audio signals to assist cardiologists. To the best of our knowledge, this is the first study that uses a single neural network model for the classification of eight different types of heartbeat audio signals. The proposed recurrent neural network (RNN) model using long short-term memory (LSTM) is developed on two publicly available databases: the PASCAL challenge and the 2017 PhysioNet challenge. Mel-frequency cepstral coefficients (MFCC) are applied to extract the dominant features, and a bandpass filter is used to remove noise from both datasets. Afterward, a downsampling technique is used to fix the sampling rate of each sound signal to 20 kHz and 300 Hz for the Pascal and PhysioNet databases, respectively. The proposed model is compared with a multi-layer perceptron (MLP) in terms of different performance evaluation metrics. Furthermore, the outcomes of five machine learning (ML) models are also analyzed. The proposed model achieved the highest classification accuracy of 0.9971 on the Pascal database and 0.9870 on the PhysioNet challenge dataset, consistently outperforming the competing approaches. The proposed model provides significant assistance to cardiac consultants in detecting heart valve diseases.
* Corresponding author.
E-mail address: f2019288004@umt.edu.pk (H. Malik).
https://doi.org/10.1016/j.bea.2022.100048
Received 27 April 2022; Received in revised form 14 July 2022; Accepted 26 July 2022
Available online 27 July 2022
2667-0992/© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
H. Malik et al. Biomedical Engineering Advances 4 (2022) 100048

1. Introduction

In the past, diagnosticians diagnosed cardiovascular diseases (CVDs) by listening to a patient's heartbeat directly from the chest; this method, however, was found to be unscientific, inefficient, and unethical. The stethoscope was therefore invented by Laennec in 1816, and "mediate auscultation" is now used by diagnosticians to diagnose CVDs [1]. The instrument was rapidly adopted in medicine for the diagnosis of heart disease. Despite its accuracy, the stethoscope requires many years of experience to use expertly [2]. The heart sound of a normal patient contains two sounds, S1 (lub) and S2 (dub). The S1 sound is related to the closing of the heart valves at the beginning of systole, while the S2 sound is related to the closing of the valves at the beginning of diastole [3]. By listening to heartbeat sounds with a stethoscope, health experts can detect heart diseases. Heart rate, also called pulse, is the number of times a person's heart beats per minute. The normal heart rate varies from person to person; according to the Mayo Clinic [3], the standard range for adults is 60–100 beats per minute.

The sound pattern "lub dub, dub lub" produced by the heart is the clear and healthy sequence of a normal heartbeat, in which the interval from dub to lub is longer than the interval from lub to dub. In a murmur, a noisy sound appears between lub and dub or between dub and lub, which is a sign of a range of heart ailments. Furthermore, extrasystole heart sounds follow the patterns "lub-lub dub" and "lub dub-dub", and they are generally present in children [4].

CVDs are one of the major threats to human life. The term covers a cluster of heart diseases and is commonly used for the range of disorders that affect the heart. Narrowing or thinning of the major blood vessels is the main cause of heart problems [5]. According to the World Health Organization (WHO), an estimated 17.9 million deaths are due to CVDs [6]. This growing burden motivates the automatic detection of heart disease from heartbeat sounds, which supports effective and efficient treatment.

Visual monitoring of heartbeat signals is time-consuming and prone to inaccurate detection. Artificial intelligence (AI) approaches can be applied to overcome these limitations. Machine learning (ML) is a sub-branch of AI that is mainly used for feature selection, statistical analysis, and classification [7]. ML models are commonly applied to different heartbeat audio signal databases with several feature extraction approaches [8–12]. However, these ML methods are subjective and time-consuming because of the handcrafted feature selection process. To overcome these limitations, neural network models are used to detect abnormal heartbeat audio signals, since they can extract features automatically and perform the classification task. The databases used in the present study were collected from the Pascal challenge [13] and the PhysioNet 2017 challenge [14], with the aim of achieving the best classification accuracy on heartbeat signals with the lowest error rate. To the best of our knowledge, this is the first study that uses a single neural network model to classify eight different types of heartbeat audio signals obtained from both databases: normal, murmur, extrasystole, artifacts, noisy, atrial fibrillation (AFib), fusion beat, and other beats. A novel RNN using LSTM is constructed to categorize abnormal heartbeat audio into the respective groups. We also compared the outcomes of the proposed model with an MLP and five ML models: Random Forest (RF), K-Nearest Neighbours (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT). For the proposed model, frame rate (FR) features are extracted from the heartbeat signal, and noise is removed using bandpass filters.

The contributions of this study are as follows:
We designed a novel RNN architecture using LSTM to automatically detect eight different heartbeat audio patterns. Furthermore, we performed a detailed comparison with baseline ML and DL models.
Our proposed model obtained high classification accuracy on two benchmark datasets: 99.71% on the Pascal challenge and 98.7% on the PhysioNet 2017 dataset.
In Section 2, we review recent state-of-the-art artificial neural network methods and the traditional methods used to classify abnormal heartbeat audio.
The proposed model can take advantage of parallel computing architectures such as a graphics processing unit (GPU). We therefore compare the computation times of a GPU and a central processing unit (CPU), demonstrating that in heartbeat audio categorization the GPU implementation outperforms its CPU counterpart, even in the training phase of the classifier; that is, the proposed system can be trained rapidly with a GPU.
Additionally, we developed a novel framework that detects abnormal heartbeat sounds using neural network methods. The framework can also be used for other cardiac disease classification applications.

The rest of this paper is organized as follows: Section 2 reviews the state of the art. Section 3 describes the methodology and the proposed framework. Section 4 presents the experimental results, and Section 5 concludes the study.

2. State-of-the-art

A lot of work has been conducted on the classification of cardiac diseases using deep learning on different medical databases, and most of these studies achieved remarkable accuracy.

Bruser et al. [15] devised a novel method that uses bed-mounted sensors to detect aberrant pulse patterns from cardiac vibration signals (CVS) and to monitor patients remotely. They tested the detection performance of their model using a variety of machine learning classifiers. Xiong et al. [16] used the k-means technique to detect atrial fibrillation (AFib) in the PubMed database; according to their findings, the maximum entropy method outperforms the other ML algorithms. In [17], the authors created a framework that detects AF from PPG signals without any feature engineering. However, the system's fundamental flaw is the lack of comparison with contemporary, cutting-edge methods.

Hurnanen et al. [18] used a linear classification approach to build a system for SCG signals. The system was tested on 13 patients and performed well even on low-quality SCG recordings. Using the Shannon energy algorithm (SEA), Gomes et al. [13] created a system that splits the heartbeat audio signal into distinct segments. After segmentation, they used two well-known machine learning methods, J48 and MLP. However, they completed only Challenge 1 (segmentation) and not Challenge 2 (classification), on which these two classifiers did not show good results.

Using a one-dimensional convolutional neural network (1D-CNN) and a feed-forward neural network (F-NN), Krishnan et al. [19] presented a methodology for classifying unsegmented phonocardiogram (PCG) data. Without any feature engineering or segmentation of the heartbeat sound data, the F-NN attained the highest diagnostic accuracy of 0.8575. Gomes and Pereira [4] described a method for classifying heartbeat audio from the PASCAL challenges. They used algorithms to evaluate cardiac sound patterns such as S1 (lub) and S2 (dub). They used MATLAB to implement the decimate approach, followed by a band-pass filter to remove noise. They then used SEA to determine the upper and lower values of the heartbeat sound and trained J48 and MLP models, which predicted the sounds with acceptable accuracy.

Zheng et al. [20] demonstrated a novel feature approach to identify abnormal patterns in heartbeat sounds. The features included the heart murmur energy fraction (HMEF), the maximum energy fraction of the heart sound frequency sub-band (HSEFmax), and the S1-S2EF entropy. After that, a wavelet packet was used to normalize and decompose the signals, and a support vector machine (SVM) obtained outstanding diagnostic accuracy on a limited dataset of 247 sounds, including 80 normal and 167 pathological murmur sounds. Deng and Han [21] proposed a framework that uses an auto-correlation procedure instead of segmentation techniques to classify heart sounds. The features were obtained from the sub-band coefficients of the signal using discrete wavelet decomposition (DWT). After feature extraction, they used SVM and obtained precisions of 77%, 76%, and 50% for normal, murmur, and extrasystole, respectively, which is an unsatisfactory result.

To determine the discriminative features of heartbeat sounds, Zhang et al. [22] employed tensor decomposition and a scaled spectrogram. They used an SVM model and obtained a result of 0.74. In [23], heartbeat sound categorization was performed using DWT and MFCC feature extraction techniques. The authors recorded digitized heartbeat sounds with a phonocardiogram (PCG) electronic stethoscope and used a deep convolutional neural network (DCNN), SVM, and K-Nearest Neighbor in their work. They also combined the DWT and MFCC approaches to obtain significant outcomes. The three key procedures employed by researchers for heartbeat sound classification are segmentation, feature extraction, and the classification model. The initial segmentation step has been performed using amplitude thresholds [24–26] and probabilistic techniques [27,28]. The second step is feature engineering based on time-frequency representations [29,30], and the third step is the classification model. MLP [13] and SVM [20] are the most well-known and widely used classification models.

Raza et al. [31] applied a bandpass filter to the sound signal to extract features and used a downsampling technique for data framing. A simple spectrogram was used to visualize the amplitude sequence. In their proposed system, multiple models such as random forest (RF), MLP [13], DT, and RNN were used to classify heartbeat sounds. Hershey et al. [32] evaluated CNNs on large-scale audio datasets and obtained good results. In the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 challenge, a gated CNN presented by Thiyagaraja et al. and Xu et al. [33,34] won first place. Li et al. [35] also applied a BLSTM model to DCASE and achieved improved results. Xu et al. [36] evaluated an ensemble approach based on CNN that provides enhanced
Fig. 1. Different visual representations of heartbeat sound signals; (a) Normal, (b) AFib, and (c) Noisy.
Table 1
Description of heartbeat sound databases.
Dataset | Category | Recordings | Time length (s)
(s) having a sampling frequency of 300 Hz. The average length of the heartbeat audio signals was 30 s. Each audio signal was labeled with one of four classes: normal (N), AFib, noisy, and other heartbeats. The overall statistics of both datasets are presented in Table 1.

3.2. Data pre-processing and visualization

The audio data of both databases were pre-processed before applying the classification approaches. Several pre-processing techniques, such as perceptual MFCC features, frequency-domain features (the amplitudes of individual frequencies), and the discrete cosine transform (DCT) [41], are mainly used to select dominant features from audio data. In the present study, the MFCC technique is used to extract useful information from the sound signals [40]. MFCC computation is based on the fast Fourier transform (FFT) of the heartbeat audio signals. Furthermore, the DCT is applied to express the finite sequence of heartbeat audio signal points in terms of cosine functions oscillating at 22 Hz. The power spectrogram is also considered one of the most significant and efficient features for the visual illustration of audio signals [40]. The power spectrogram of the first 5 s of the heartbeat signals is shown in Fig. 2.

3.3. Data normalization

The sound signals of the dataset have a sampling frame rate (SFR) of 4000 frames per second. The SFR gives the number of frames of a heartbeat audio signal within 1 s, and the total number of frames (TF) is obtained by multiplying the sampling rate (SR) by the duration (T) of each audio file, as in Eq. (1):

$$\text{TF} = \text{SR} \times T \qquad (1)$$

For example, if a heartbeat audio file sampled at 5000 Hz has a duration of 5 s, Eq. (1) gives TF = 5000 × 5 = 25,000. Because the heartbeat audio files have different numbers of frames, deep learning classification approaches cannot be applied directly, since they require a fixed feature length [42]. Therefore, the sampling rate of each audio file was fixed by employing a downsampling technique, which reduces the sampling frequency of each heartbeat audio file to 20,000 Hz and 300 Hz for the Pascal and PhysioNet challenge databases, respectively. In addition, the heartbeat signals were denoised using a bandpass filter, zero-padding was applied, and the signals were normalized using Eq. (2):

$$H = \frac{H - \mu}{\sigma} \qquad (2)$$

where H is the signal matrix of shape (8528, 1800), and μ and σ are its mean and standard deviation (SD), respectively.

3.4. Data sampling

The heartbeat audio database has a frame rate of 50,000, which is difficult to handle on our GPU. To solve this problem, we reduce the number of features of each heartbeat audio file using downsampling algorithms, so that the sound files are processed more quickly. In digital signal processing (DSP), downsampling refers to resampling in a multi-rate signal system; the term covers the entire process of bandwidth reduction (filtering) and sample-rate reduction. When applied to a sequence of heartbeat audio samples, the technique produces a close approximation of the sequence that would have been obtained by sampling the data at a lower rate. As a result, a decimation downsampling approach is employed to lower the frame rate of the cardiac audio stream while maintaining its efficiency [5]. Using an 8 × 8 low-pass filter, the downsampling technique
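Section 3.2 describes MFCC extraction built on the FFT of the heartbeat signal and a DCT. As a rough illustration only, the NumPy sketch below follows the common textbook MFCC recipe: framing, Hamming window, FFT power spectrum, triangular mel filterbank, logarithm, and an orthonormal DCT-II. The 25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, and 13 coefficients are assumed defaults rather than the study's settings, and all function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Hz to mel (HTK-style formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Mel to Hz (inverse of hz_to_mel)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising edge (empty if center == left)
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (empty if right == center)
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sr, frame_s=0.025, hop_s=0.010, n_fft=512, n_filters=26, n_mfcc=13):
    """MFCC matrix of shape (n_frames, n_mfcc) for a 1-D signal sampled at sr Hz."""
    frame = int(round(frame_s * sr))
    hop = int(round(hop_s * sr))
    n_fft = max(n_fft, frame)  # the FFT must cover a whole frame
    if len(signal) < frame:
        signal = np.pad(signal, (0, frame - len(signal)))
    n_frames = 1 + (len(signal) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame)                   # framing + window
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # FFT power spectrum
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), n_mfcc)          # log + DCT


sr = 4000  # the 4000 Hz rate quoted in Section 3.3
t = np.arange(2 * sr) / sr
features = mfcc(np.sin(2 * np.pi * 100 * t), sr)  # 2 s synthetic 100 Hz tone
print(features.shape)  # (198, 13)
```

With these assumed settings, a 2-s signal at 4000 Hz yields 198 frames of 13 coefficients each; libraries such as librosa offer tuned implementations of the same recipe.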
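The framing arithmetic of Eq. (1), the decimation step of Section 3.4, the zero-padding, and the standardization of Eq. (2) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the anti-aliasing stage is a Hamming-windowed sinc FIR low-pass filter chosen for simplicity (not the 8 × 8 low-pass filter mentioned in the text), the bandpass denoising step is omitted, μ and σ in Eq. (2) are taken over the whole signal matrix, and all function names are hypothetical (`decimate` here is not `scipy.signal.decimate`).

```python
import numpy as np


def total_frames(sr, duration_s):
    """Eq. (1): TF = SR x T."""
    return int(round(sr * duration_s))


def lowpass_fir(cutoff, num_taps=63):
    """Hamming-windowed sinc low-pass FIR; cutoff in cycles/sample (0 < cutoff <= 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # normalize to unit gain at DC


def decimate(x, q, num_taps=63):
    """Anti-alias filter, then keep every q-th sample; the rate drops from sr to sr / q.
    Assumes len(x) >= num_taps so that mode="same" preserves the length."""
    h = lowpass_fir(0.5 / q, num_taps)
    return np.convolve(x, h, mode="same")[::q]


def pad_or_trim(x, length):
    """Zero-pad (or truncate) a 1-D signal to a fixed length."""
    if len(x) >= length:
        return x[:length]
    return np.pad(x, (0, length - len(x)))


def standardize(H):
    """Eq. (2): (H - mu) / sigma, with mu and sigma over the whole matrix (an assumption)."""
    return (H - H.mean()) / H.std()


print(total_frames(5000, 5))  # 25000, the worked example of Eq. (1)
sr = 4000
t = np.arange(total_frames(sr, 5.0)) / sr
x = np.sin(2 * np.pi * 30 * t)  # synthetic 30 Hz tone standing in for a recording
y = decimate(x, q=4)            # 4000 Hz -> 1000 Hz
H = standardize(np.stack([pad_or_trim(y, 6000), pad_or_trim(y[:3000], 6000)]))
print(len(y), H.shape)  # 5000 (2, 6000)
```

The cutoff of 0.5/q cycles per sample puts the filter edge at the Nyquist frequency of the reduced rate, the usual choice for a windowed FIR decimator; a slightly lower cutoff trades bandwidth for less aliasing.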
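Section 3.6 builds the proposed model from an LSTM memory cell (Eqs. (8)-(12)), a dense layer (Eq. (14)), and a softmax output (Eq. (15)). The NumPy sketch below runs one LSTM layer over a sequence using those equations and passes the last hidden state through a dense layer and softmax. It is only a forward pass with random, untrained weights, a single LSTM layer instead of the two stacked layers of Table 3, and hypothetical helper names; it is not the authors' Keras model.

```python
import numpy as np


def sigmoid(z):
    """sigma_s in Eqs. (9)-(11)."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(y):
    """Eq. (15); subtracting max(y) avoids overflow and leaves the result unchanged."""
    e = np.exp(y - np.max(y))
    return e / e.sum()


def dense(Y, w, b, f=None):
    """Eq. (14): X = f(Y w + b); f defaults to the identity."""
    z = Y @ w + b
    return z if f is None else f(z)


def init_lstm(n_in, n_hidden, rng):
    """Hypothetical initializer: (w, u, b) for the forget (f), input (i), output (o) gates and candidate (c)."""
    return {g: (rng.normal(0.0, 0.1, (n_hidden, n_in)),
                rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
                np.zeros(n_hidden))
            for g in "fioc"}


def lstm_forward(xs, p):
    """Apply Eqs. (8)-(12) over xs of shape (T, n_in) and return the last hidden state h_T."""
    n_hidden = p["f"][2].shape[0]
    h = np.zeros(n_hidden)  # h_0 = 0
    c = np.zeros(n_hidden)  # c_0 = 0
    for x in xs:
        z = {g: w @ x + u @ h + b for g, (w, u, b) in p.items()}
        f = sigmoid(z["f"])              # Eq. (10), forget gate
        i = sigmoid(z["i"])              # Eq. (11), input gate
        o = sigmoid(z["o"])              # Eq. (9), output gate
        c = f * c + i * np.tanh(z["c"])  # Eq. (12), sigma_h = tanh
        h = o * np.tanh(c)               # Eq. (8)
    return h


rng = np.random.default_rng(0)
params = init_lstm(n_in=1, n_hidden=64, rng=rng)
sequence = rng.normal(size=(800, 1))  # 800 time steps of 1 feature, echoing Table 3's input shape
h_last = lstm_forward(sequence, params)
w_out, b_out = rng.normal(0.0, 0.1, (64, 3)), np.zeros(3)  # 3 outputs, as in Table 3's dense layer
probs = softmax(dense(h_last, w_out, b_out))
print(probs.shape, round(float(probs.sum()), 6))  # (3,) 1.0
```

Because o_t lies in (0, 1) and tanh(c_t) lies in (-1, 1), every component of h_t stays strictly inside (-1, 1), which is one reason the gated cell keeps long sequences numerically stable.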
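Eqs. (16)-(18) in Section 3.6.5 derive accuracy, sensitivity, and specificity from per-class TP, FP, FN, and TN counts. A small sketch, assuming a confusion matrix with true classes in rows and predicted classes in columns (a convention the text does not state) and one-vs-rest counts for each class; the function names are hypothetical:

```python
import numpy as np


def per_class_counts(cm):
    """One-vs-rest TP, FP, FN, TN for every class; rows = true class, columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class i but belonging to another class
    fn = cm.sum(axis=1) - tp  # belonging to class i but predicted as another class
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn


def evaluation_metrics(cm):
    """Accuracy, sensitivity and specificity following Eqs. (16)-(18)."""
    tp, fp, fn, tn = per_class_counts(cm)
    k = len(tp)
    accuracy = np.sum((tp + tn) / (tp + tn + fp + fn)) / k  # Eq. (16)
    sensitivity = tp.sum() / (tp + fn).sum()                # Eq. (17)
    specificity = tn.sum() / (tn + fp).sum()                # Eq. (18)
    return accuracy, sensitivity, specificity


cm = [[50, 2, 3],
      [4, 40, 1],
      [0, 5, 45]]  # illustrative counts, not results from the study
acc, sens, spec = evaluation_metrics(cm)
print(f"{acc:.4f} {sens:.4f} {spec:.4f}")  # 0.9333 0.9000 0.9500
```

Because the sum of (TP_i + FN_i) over all classes equals the total number of test recordings, the pooled sensitivity of Eq. (17) coincides with the overall fraction of correctly classified recordings (135 of 150 in this example).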
Table 3
Dimension and operations of the proposed model.
RNN layer | Layer (type) | Output shape | Parameters
1 | Input | (None, 800, 1, 1) | 10,024,384
2 | LSTM | (None, 800, 1, 64) | 0
3 | Dropout (dropout rate = 0.45) | (None, 800, 1, 64) | 1,638,523
4 | LSTM | (None, 800, 1, 32) | 0
5 | Dropout (dropout rate = 0.45) | (None, 800, 1, 32) | 400,728
6 | Dense | (None, 1, 1, 3) | 0
7 | Softmax | (None, 1, 1, 3) | 512
Total parameters: 12,064,147
Trainable parameters: 12,063,892
Non-trainable parameters: 255

predicted and actual values. The closer the loss of the model is to zero, the better the model performs. The total number of parameters is 12,064,147, divided into two groups: 12,063,892 trainable parameters and 255 non-trainable parameters. The difference is that trainable parameters are updated during training, while non-trainable parameters are neither updated nor optimized during training, so they do not change as the classifier learns.

the output gate performs the write function, and the last gate of the memory block is the forget gate, which performs the reset operation. More precisely, the output of the cell is multiplied by the output gate, and the forget gate is multiplied by the output of the previous memory cell. When the heartbeat input signal is provided to the model, each LSTM layer holds a memory c_t at time step t. The activation h_t is calculated using Eq. (8):

$$h_t = o_t \times \sigma_h(c_t) \qquad (8)$$

The output gate o_t is computed with Eq. (9):

$$o_t = \sigma_s\left(w_o x_t + u_o h_{t-1} + b_o\right) \qquad (9)$$

Eq. (10) controls the memory of the system through the forget gate f_t, which combines the data of the existing memory c_t with the new information.
$$f_t = \sigma_s\left(w_f x_t + u_f h_{t-1} + b_f\right) \qquad (10)$$

Eq. (11) handles the incoming new information of the heartbeat signal, which is managed by the input gate:

$$i_t = \sigma_s\left(w_i x_t + u_i h_{t-1} + b_i\right) \qquad (11)$$

The existing memory of the proposed LSTM block holds the information of the heartbeat audio signal in the network and is updated by Eq. (12):

$$c_t = f_t \times c_{t-1} + i_t \times \sigma_h\left(w_c x_t + u_c h_{t-1} + b_c\right) \qquad (12)$$

where h_t and c_t are the hidden vector and cell vector, respectively, σ_s denotes the sigmoid function, σ_h denotes the hyperbolic tangent function, w and u are weight matrices, and b is the bias vector, with initial values c_0 = 0 and h_0 = 0. The LSTM memory cell is illustrated in Fig. 4.

3.6.2. Dropout layer

Over-fitting is the main problem observed when training deep learning classifiers: a DL classifier shows satisfactory results in the training phase, but the trained classifier produces poor results on the testing dataset. This can occur when two or more neurons repeatedly detect similar features [48], so the neurons that affect the output need to be removed, as shown in Eq. (13). As mentioned in the proposed method section, to reduce the chance of overfitting, we fine-tuned the model by changing the dropout rate (r), e.g., 0.001, 0.05, and 0.45. In this study, r = 0.45 produced the most appropriate results when training the proposed classifier.

where Z^L represents the input layer, L is a hidden layer, and W^L, B^L, and Y^L are the weight, bias, and output vectors, respectively.

3.6.3. Dense layer

The dense layer is applied to change the dimensionality of its input. It contains conventional neurons that take the input data, apply weights and a linear function, and pass the resulting output to the next layer; every neuron is connected to all neurons of the previous layer. We applied a batch normalization function (BNF) after the dense layer to normalize the learned distributions and improve training efficiency [49]. Furthermore, to manage the effect of initialization, we repeated each combination of hyperparameters five times and used the averaged value for validation. The dense layer computes Eq. (14):

$$X = f(Y \times w + b) \qquad (14)$$

where b is the bias vector, f is the activation function, w contains the weights, and Y and X are the input and output layers, respectively [29].

3.6.4. Softmax layer

The softmax activation function is important for multi-class classification problems [50]. It converts the output vector of the final layer into probabilities with values in [0, 1] that sum to one, and the neuron with the largest value receives the highest probability. It can be written mathematically as Eq. (15):

$$S(y_i) = \frac{e^{y_i}}{\sum_{k=1}^{n} e^{y_k}} \qquad (15)$$

where S and y are the output and input, respectively. The outputs of the proposed RNN (LSTM) model are directed toward the number of classes, and the softmax function classifies the output sound into one of the heartbeat audio disease classes: normal, murmur, extrasystole, artifacts, noisy, AFib, fusion beat, and other beats.

The detailed procedure of the proposed model is outlined in Algorithm 1.

Algorithm 1: Classification of abnormal heartbeat audio using the proposed model
Parameters: recurrent neural network (R), training set (tr), testing set (te), performance evaluation (Pe), and confusion matrix (Cm).
1. n ← heartaudio_samples
2. Pp ← Pre-processing
3. For h ← 0 to n do:
4.   For c ← 1 to Pp do:
5.     D ← data (Pp + c)
6.     For i ← 1 to D do:
7.       tr = partition (rand (D = 0.70))
8.       te = partition (rand (D = 0.30))
9.       M ← R (tr)
10.    End i
11.  End c
12.  For i ← 1 : size (te) do:
13.    For h ← 1 : output_class do:
14.      For j ← 1 : count (M (h)) do:
15.        D (h, j) = M (j, h) - te (i)
16.      End j
17.      V (h) ← min (D (h))
18.    End h
19.    O ← predict (V)
20.    For B ← 0 to O do:
21.      O ← Pe
22.      Pe ← Cm
23.    End B
24.  End i
25. End h

3.6.5. Evaluation criteria

The performance of the ML and DL classifiers is evaluated on the testing sets of both heartbeat sound datasets in terms of accuracy, sensitivity, specificity, the ROC curve, and the precision-recall curve [51]. Eqs. (16), (17), and (18) define accuracy, sensitivity, and specificity, respectively, where k is the number of classes:

$$\text{Accuracy} = \frac{1}{k}\sum_{i=1}^{k} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \qquad (16)$$

$$\text{Sensitivity} = \frac{\sum_{i=1}^{k} TP_i}{\sum_{i=1}^{k} \left(TP_i + FN_i\right)} \qquad (17)$$

$$\text{Specificity} = \frac{\sum_{i=1}^{k} TN_i}{\sum_{i=1}^{k} \left(TN_i + FP_i\right)} \qquad (18)$$

The confusion matrix is used to calculate the performance of a classifier on the test data, for which the true values are known. For class i, true positives (TP_i) and false positives (FP_i) are the numbers of recordings correctly and incorrectly predicted as class i, and true negatives (TN_i) and false negatives (FN_i) are the numbers of recordings correctly and incorrectly predicted as not belonging to class i. The area under the ROC curve (AUC) was also calculated.

4. Results and discussions

We feed a heartbeat audio signal into the proposed model, which classifies the audio into one of four categories. The batch size was set to
Fig. 5. Training and validation accuracy and loss; (a) represents the model loss and (b) represents the model accuracy.
Table 4
Results of ML and DL classifiers on the Pascal challenge dataset.
Classifiers | Accuracy | Specificity | Sensitivity | F1-score | AUC
RF | 0.92 | 0.90 | 0.89 | 0.91 | 0.90
KNN | 0.74 | 0.72 | 0.71 | 0.73 | 0.75
SVM | 0.72 | 0.70 | 0.69 | 0.70 | 0.74
NB | 0.77 | 0.76 | 0.73 | 0.71 | 0.76
DT | 0.70 | 0.68 | 0.65 | 0.66 | 0.70
MLP | 0.85 | 0.84 | 0.80 | 0.82 | 0.85
Proposed Model | 0.9971 | 0.993 | 0.986 | 0.989 | 0.983

Table 5
Results of ML and DL classifiers on the PhysioNet Challenge dataset.
Classifiers | Accuracy | Specificity | Sensitivity | F1-score | AUC
RF | 0.702 | 0.77 | 0.76 | 0.78 | 0.76
KNN | 0.646 | 0.68 | 0.67 | 0.68 | 0.69
SVM | 0.812 | 0.82 | 0.81 | 0.826 | 0.80
NB | 0.74 | 0.73 | 0.72 | 0.70 | 0.73
DT | 0.697 | 0.69 | 0.69 | 0.69 | 0.68
MLP | 0.774 | 0.77 | 0.76 | 0.75 | 0.75
Proposed Model | 0.987 | 0.99 | 0.985 | 0.989 | 0.988
64, and the proposed model was trained for 200 epochs. The grid search technique was used to tune the hyperparameters of the proposed model and the ML classifiers. For each class label, the accuracy, precision, recall, AUC, and F1-score of the proposed model, MLP, and ML classifiers were evaluated.

4.1. Experimental setup

This experiment evaluates the proposed model as well as various ML classifiers. The Keras framework is used to implement the proposed model, and the approaches that are not directly linked to neural networks are programmed in Python. The experiment was executed on a Windows-based operating system with an 11 GB NVIDIA GeForce GTX GPU and 32 GB of RAM.

4.2. Results analysis

Fig. 5 depicts the training and validation accuracy over the epochs. The maximum training accuracy was 99.96%, and the maximum validation accuracy was 95.33%. These values indicate that the proposed model trained well and could correctly classify abnormal cases. The training loss was 0.021, and the validation loss was 0.0668. The purpose of this study is to design an automated system that accurately classifies abnormal heartbeat audio signals using deep learning methods. Preliminary diagnoses of heart problems can help medical experts decide on the further course of treatment. In this study, two well-known publicly available databases, the Pascal Challenge and PhysioNet databases, have been used, which contain eight different types of heartbeat sounds. Initially, we transformed the datasets into two different durations: 10.0 s (10,000 frames) for the Pascal dataset and 25.0 s (20,000 frames) for the PhysioNet dataset. Then, the proposed model and the MLP model were applied to the datasets to classify abnormal heartbeat audio. In addition, ML classifiers were applied, and their outcomes were observed. Table 4 summarizes the output of the ML and DL classifiers in diagnosing abnormal heartbeat signal patterns using the Pascal Challenge database.

The results in Table 4 show that the proposed deep learning algorithm attained a better classification accuracy of 99.71%, specificity of 99.3%, sensitivity of 98.6%, and F1-score of 98.9% compared with the MLP and the traditional ML classifiers. The proposed model's strong performance is due to its ability to extract features by itself. MLP is one of the earlier classification algorithms and has been applied to a wide range of classification problems. RF is an ensemble learning approach that is mainly used for categorical and text classification tasks [52]. Therefore, RF performed well on the categorical heartbeat dataset and achieved 92.0% classification accuracy, which is superior to the MLP (85.0% accuracy).

The performance of the proposed model was also evaluated on the PhysioNet 2017 challenge dataset, and the detailed results are presented in Table 5. On the PhysioNet dataset, the proposed model achieved an overall accuracy of 98.7%, specificity of 99%, sensitivity of 98.5%, and F1-score of 98.8%. An MLP with 100 hidden layers was also applied to the PhysioNet dataset and achieved an accuracy of 77.4%, specificity of 77.0%, sensitivity of 76.0%, and F1-score of 75.0%. Increasing the number of hidden layers can increase the accuracy, but beyond some point increasing the size of the layers no longer affects the accuracy [41].

Table 6
Technical specifications of the CPU and GPU for the present study.
Specifications | CPU | GPU
Processor | Intel® Core™ i7-10700 | NVIDIA Titan X (Pascal), 3584 CUDA cores
Memory | 32 GB | 12 GB
Base frequency | 2.90 GHz | 350 MHz
Max memory bandwidth | 45.8 GB/s | 480 GB/s

Besides providing a detailed characterization of the heartbeat audio
Fig. 6. (a) Computational times for training the proposed model on the heartbeat audio signals datasets, and (b) Computational times for testing the proposed model.
signal based on the proposed model, our work also aims at achieving
Table 7
computational times that allow for the real-time processing of heartbeat
Hyper-parameter tuning of Proposed model.
audio data [53–55]. In particular, we have implemented the proposed
model defined here independently in an unparalleled Python version for Datasets Dropout Rate Precision Loss
the CPU and a Python/CUDA version for the GPU. Python is an 0.05 0.958 0.39
object-oriented programming language, and CUDA is a parallel computing platform built by Nvidia for programming its GPUs. The technical specifications of the CPU and GPU employed in this study are listed in Table 6.
Ensembles are intrinsically self-contained, which makes them good candidates for parallel multi-processor implementations. Because the reservoir paradigm involves large matrix products and non-linear mapping functions, even serial implementations are well suited to exploring computationally fast approaches, such as GPU implementations, which can reduce latency while increasing throughput. To measure the computational time, a series of training and classification runs was performed on both heartbeat audio databases. The Python implementations rely on the Scikit-learn library [19], while the Python/CUDA implementations use TensorFlow, Python cuBLAS, and a CUDA kernel implemented for the non-linear mapping [56]. Fig. 6a and b illustrate the computational times of a training and a testing realization, respectively, for both heartbeat audio databases versus the number of neurons. The comparison shows the benefit of the GPU implementation, whose training times are significantly lower than those of the CPU implementation. Fig. 6a presents the computational times including the random non-linear mapping (NM) of the input onto the reservoir and the computation of the output weights over the entire training dataset. Fig. 6b depicts the computational time of the final classification step, which calculates the output on the test dataset. As expected, the processing time increases with the number of neurons, especially during training. Moreover, the piece-wise linear trend of the GPU classification time reflects the effect of small matrix products on cuBLAS scaling, which is inherent to the library.
From Tables 4 and 5, it can be observed that the proposed RNN, which uses an LSTM together with softmax, dense and dropout layers, achieves high accuracy in classifying different types of abnormal heartbeat sounds. The LSTM structure is designed for time-series data [57], and its memory cells remember previous inputs. At each time step, the gates of the memory cell regulate the flow of data, so that patterns can be stored over long periods more precisely [58]. This ability to retain earlier information over long spans while processing the current input is the main reason for these results. We applied different optimizers, namely Adam, Adagrad, and SGD, and found that Adam produced better results than the competing optimizers. A dropout layer was applied to reduce over-fitting by randomly dropping units, together with their connections, between layers during training [59–61]. Additionally, we fine-tuned the hyperparameters using different dropout rates {0.05, 0.35, 0.45}; the outputs produced by each dropout rate are shown in Table 7.
Pascal Database      0.35   0.979   0.31
                     0.45   0.989   0.27
PhysioNet Database   0.05   0.970   0.41
                     0.35   0.980   0.34
                     0.45   0.987   0.29
Tables 4 and 5 show that the RNN achieved an accuracy of 0.9971 on the 10.0-s sound files of the Pascal dataset and 0.987 on the 25.0-s audio files of the PhysioNet challenge dataset. On the Pascal dataset, random forest (RF) achieved 92.0% accuracy and the MLP 85.0%. RF thus achieved notably higher classification accuracy than the MLP and the other ML classifiers in classifying abnormal heartbeat audio signals. Moreover, the overall
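The LSTM classifier described in this section uses a gated memory cell that carries information across time steps, combined with dropout, dense and softmax layers. This can be illustrated with a minimal forward-pass sketch. It is written in plain Python so it runs without TensorFlow. Several details are assumptions for illustration, not the authors' configuration: the class name `TinyLSTMClassifier`, 13 input features per frame, 16 hidden units, 5 output classes, random untrained weights, and the use of inverted dropout. Training is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical LSTM -> dropout -> dense -> softmax model with random, untrained weights."""

    def __init__(self, n_features, n_hidden, n_classes, dropout_rate=0.35, seed=0):
        self.rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.dropout_rate = dropout_rate
        # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
        self.gates = {
            name: (rand_matrix(n_hidden, n_features, self.rng),
                   rand_matrix(n_hidden, n_hidden, self.rng),
                   [0.0] * n_hidden)
            for name in ("i", "f", "o", "g")
        }
        self.W_out = rand_matrix(n_classes, n_hidden, self.rng)
        self.b_out = [0.0] * n_classes

    def _gate(self, name, x, h, act):
        W, U, b = self.gates[name]
        return [act(wx + uh + bias)
                for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]

    def forward(self, frames, training=False):
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # memory cell state
        for x in frames:           # one feature frame (e.g. MFCCs) per time step
            i = self._gate("i", x, h, sigmoid)    # how much new content to write
            f = self._gate("f", x, h, sigmoid)    # how much old memory to keep
            o = self._gate("o", x, h, sigmoid)    # how much memory to expose
            g = self._gate("g", x, h, math.tanh)  # candidate content
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        if training and self.dropout_rate > 0:
            keep = 1.0 - self.dropout_rate
            # Inverted dropout: zero each unit with probability dropout_rate and
            # rescale the survivors so nothing changes at inference time.
            h = [v / keep if self.rng.random() < keep else 0.0 for v in h]
        logits = [z + b for z, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)                   # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]  # softmax class probabilities


# Usage on synthetic input: 20 frames of 13 features (stand-ins for MFCCs).
data_rng = random.Random(42)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
model = TinyLSTMClassifier(n_features=13, n_hidden=16, n_classes=5)
probs = model.forward(frames)
print(len(probs), round(sum(probs), 6))  # 5 1.0
```

In this sketch, dropout is active only when `training=True`. The tuned dropout rates listed in the section (0.05, 0.35, 0.45) would be passed as `dropout_rate`.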
H. Malik et al. Biomedical Engineering Advances 4 (2022) 100048
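The section reports that Adam outperformed Adagrad and SGD. The three update rules can be compared side by side on a toy one-dimensional quadratic loss, as in the sketch below. The loss f(x) = (x - 3)^2, the starting point, the learning rates and the step count are assumptions chosen only to show how each rule scales its steps. The toy result says nothing about which optimizer suits heartbeat audio.

```python
import math


def grad(x):
    # Gradient of the toy loss f(x) = (x - 3)^2, whose minimum is at x = 3.
    return 2.0 * (x - 3.0)


def sgd(x, steps, lr=0.1):
    for _ in range(steps):
        x -= lr * grad(x)  # plain gradient step with a fixed learning rate
    return x


def adagrad(x, steps, lr=0.5, eps=1e-8):
    g2_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        g2_sum += g * g  # accumulated squared gradients shrink later steps
        x -= lr * g / (math.sqrt(g2_sum) + eps)
    return x


def adam(x, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias corrections for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


for name, opt in (("SGD", sgd), ("Adagrad", adagrad), ("Adam", adam)):
    print(name, round(opt(0.0, 500), 4))
```

Adagrad divides by the root of all past squared gradients, so its effective step only shrinks. Adam instead uses exponential moving averages of the first and second moments, which lets it adapt the step size while keeping momentum.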
Fig. 7. 5-fold cross-validation on the 10.0-s heartbeat audio signals of the Pascal database.
Fig. 8. 5-fold cross-validation on the 10.0-s heartbeat audio signals of the PhysioNet Challenge database.
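Figs. 7 and 8 report 5-fold cross-validation. A minimal sketch of the fold logic follows: each sample is held out exactly once, and the per-fold accuracies are averaged. The paper does not state whether the folds were shuffled or stratified. The following are therefore assumptions: shuffling with a fixed seed, the fold-size rule (the first `n % k` folds get one extra sample, the convention scikit-learn's `KFold` uses), and the hypothetical `majority_baseline` stand-in for a real classifier.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, shuffle=True, seed=0):
    """Yield (train_idx, test_idx) pairs; assumes n_samples >= k."""
    idx = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(idx)
    # The first n_samples % k folds receive one extra sample.
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def majority_baseline(train_labels, n_test):
    # Hypothetical stand-in classifier: predict the most frequent training label.
    label = Counter(train_labels).most_common(1)[0][0]
    return [label] * n_test


def cross_val_accuracy(labels, k=5, seed=0):
    scores = []
    for train, test in k_fold_indices(len(labels), k=k, seed=seed):
        preds = majority_baseline([labels[i] for i in train], len(test))
        correct = sum(p == labels[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores, sum(scores) / len(scores)


folds = list(k_fold_indices(23, k=5))
print([len(test) for _, test in folds])  # [5, 5, 5, 4, 4]
```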
Table 8
Comparison of the proposed model with state-of-the-art classifiers on the Pascal Database.
Refs.   Models   Accuracy (%)   Specificity (%)   Sensitivity (%)   F1-Score (%)

Table 9
Comparison of the proposed model with state-of-the-art classifiers on the PhysioNet Challenge Dataset.
Refs.   Models   Accuracy (%)   Specificity (%)   Sensitivity (%)   F1-Score (%)
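Tables 8 and 9 compare models by accuracy, specificity, sensitivity and F1-score. The sketch below computes these four metrics from true and predicted labels. This section does not state how the metrics are aggregated over multiple classes. The one-vs-rest counting with macro-averaging used here is one common convention, assumed for illustration, and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive):
    """One-vs-rest confusion counts and metrics for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def macro_metrics(y_true, y_pred):
    """Macro-average the one-vs-rest metrics; accuracy is the overall fraction correct."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = [binary_metrics(y_true, y_pred, c) for c in classes]
    result = {k: sum(m[k] for m in per_class) / len(per_class)
              for k in ("sensitivity", "specificity", "f1")}
    result["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return result


y_true = ["murmur", "normal", "murmur", "normal", "normal", "murmur"]
y_pred = ["murmur", "normal", "normal", "normal", "murmur", "murmur"]
# Scale by 100 to match the (%) columns of the tables.
print({k: round(100 * v, 2) for k, v in macro_metrics(y_true, y_pred).items()})
# {'sensitivity': 66.67, 'specificity': 66.67, 'f1': 66.67, 'accuracy': 66.67}
```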
[20] Y. Zheng, X. Guo, X. Ding, A novel hybrid energy fraction and entropy-based approach for systolic heart murmurs identification, Expert Syst. Appl. 42 (2015) 2710–2721.
[21] S.W. Deng, J.Q. Han, Towards heart sound classification without segmentation via autocorrelation feature and diffusion maps, Future Gener. Comput. Syst. 60 (2016) 13–21.
[22] W. Zhang, J. Han, S. Deng, Heart sound classification based on scaled spectrogram and tensor decomposition, Expert Syst. Appl. 84 (2017) 220–231.
[23] G.Y.S. Yaseen, S. Kwon, Classification of heart sound signal using multiple features, Appl. Sci. 8 (2018) 2344.
[24] T. Chen, K. Kuan, L. Celi, G.D. Clifford, Intelligent heartsound diagnostics on a cellphone using a hands-free kit, AAAI Spring Symp. Ser. 2010 (2010) 26–31.
[25] Y. Liu, C.C.Y. Poon, Y.T. Zhang, A hydrostatic calibration method for the design of wearable PAT-based blood pressure monitoring devices, in: Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08 - "Personalized Healthcare through Technology", Vancouver, BC, Canada, 22–25 August 24, 2008, pp. 1308–1310.
[26] A. Moukadem, A. Dieterlen, N. Hueber, C. Brandt, A robust heart sounds segmentation module based on S-transform, Biomed. Signal Process. Control 8 (2013) 273–281.
[27] S.E. Schmidt, C. Holst-Hansen, C. Graff, E. Toft, J.J. Struijk, Segmentation of heart sound recordings by a duration-dependent hidden Markov model, Physiol. Meas. 31 (2010) 513–529.
[28] D.B. Springer, L. Tarassenko, G.D. Clifford, Logistic regression-HSMM-based heart sound segmentation, IEEE Trans. Biomed. Eng. 63 (2016) 822–832.
[29] S. Ari, K. Hembram, G. Saha, Detection of cardiac abnormality from PCG signal using LMS based least square SVM classifier, Expert Syst. Appl. 37 (2010) 8019–8026.
[30] F. Safara, S. Doraisamy, A. Azman, A. Jantan, A.R. Abdullah Ramaiah, Multi-level basis selection of wavelet packet decomposition tree for heart sound classification, Comput. Biol. Med. 43 (2013) 1407–1414.
[31] A. Raza, A. Mehmood, S. Ullah, M. Ahmad, G. Choi, B. On, Heartbeat sound signal classification using deep learning, Sensors 19 (21) (2019) 4819, https://doi.org/10.3390/s19214819.
[32] S. Hershey, S. Chaudhuri, D.P.W. Ellis, J.F. Gemmeke, A. Jansen, R.C. Moore, M. Plakal, D. Platt, R.A. Saurous, B. Seybold, et al., CNN architectures for large-scale audio classification, in: Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, 5–9 March, 2017, pp. 131–135.
[33] S.R. Thiyagaraja, R. Dantu, P.L. Shrestha, A. Chitnis, M.A. Thompson, P.T. Anumandla, T. Sarma, S. Dantu, A novel heart-mobile interface for detection and classification of heart sounds, Biomed. Signal Process. Control 45 (2018) 313–324.
[34] Y. Xu, Q. Kong, W. Wang, M.D. Plumbley, Large-scale weakly supervised audio classification using gated convolutional neural network, in: Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April, 2018, pp. 121–125.
[35] Y. Li, X. Li, Y. Zhang, W. Wang, M. Liu, X. Feng, Acoustic scene classification using deep audio feature and BLSTM network, in: Proceedings of the ICALIP 2018, 6th International Conference on Audio, Language and Image Processing, Shanghai, China, 16–17 July, 2018, pp. 371–374.
[36] K. Xu, B. Zhu, Q. Kong, H. Mi, B. Ding, D. Wang, H. Wang, General audio tagging with ensembling convolutional neural networks and statistical features, J. Acoust. Soc. Am. 145 (2019) EL521–EL527.
[37] G. Keren, B. Schuller, Convolutional RNN: an enhanced model for extracting features from sequential data, in: Proceedings of the International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July, 2016, pp. 3412–3419.
[38] Z. Masetic, A. Subasi, Congestive heart failure detection using random forest classifier, Comput. Methods Prog. Biomed. 130 (2016) 54–64.
[39] H. Malik, M.S. Farooq, A. Khelifi, A. Abid, J.N. Qureshi, M. Hussain, A comparison of transfer learning performance versus health experts in disease diagnosis from medical imaging, IEEE Access (2022), https://doi.org/10.1109/ACCESS.2020.3004766.
[40] M.S. Naeem, Farooq, A. Khelifi, A. Abid, Malignant melanoma classification using deep learning: datasets, performance measurements, challenges and opportunities, IEEE Access 8 (2020) 110575–110597, https://doi.org/10.1109/ACCESS.2020.3001507.
[41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (1) (2014) 1929–1958.
[42] A. Raza, A. Mehmood, S. Ullah, M. Ahmad, G.S. Choi, B.W. On, Heartbeat sound signal classification using deep learning, Sensors 19 (2019) 4819.
[43] J. Park, K. Lee, K. Kang, Arrhythmia detection from heartbeat using k-nearest neighbor classifier, in: Proceedings of the 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE, 2013, pp. 15–22.
[44] G. Sannino, G. De Pietro, A deep learning approach for ECG-based heartbeat classification for arrhythmia detection, Future Gener. Comput. Syst. 86 (2018) 446–455.
[45] R. Harper, J. Southern, A Bayesian deep learning framework for end-to-end prediction of emotion from heartbeat, IEEE Trans. Affect. Comput. (2020).
[46] F.I. Alarsan, M. Younes, Analysis and classification of heart diseases using heartbeat features and machine learning algorithms, J. Big Data 6 (1) (2019) 1–15.
[47] P. Gastaldo, J. Redi, Machine learning solutions for objective visual quality assessment, in: Proceedings of the 6th International Workshop on Video Processing and Quality Metrics for Consumer Electronics, VPQM, Scottsdale, AZ, USA, 19–20 January 2012, pp. 2451–2471.
[48] A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Netw. 18 (2005) 602–610.
[49] S. Ioffe, C. Szegedy, arXiv preprint, 2015.
[50] A. Ullah, et al., Classification of arrhythmia by using deep learning with 2-D ECG spectral image representation, Remote Sens. (Basel) 12 (10) (2020) 1685.
[51] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15 (2014) 1929–1958.
[52] M. Yu, Q. Huang, H. Qin, C. Scheele, C. Yang, Deep learning for real-time social media text classification for situation awareness–using Hurricanes Sandy, Harvey, and Irma as case studies, Int. J. Dig. Earth (2019) 1–18.
[53] M. Sokolova, N. Japkowicz, S. Szpakowicz, AI 2006: advances in artificial intelligence, in: Proceedings of the 19th Australian Joint Conference on Artificial Intelligence, Hobart, Australia, 4–8 December, 2006.
[54] K. Kowsari, K.J. Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, D. Brown, Text classification algorithms: a survey, Information 10 (2019) 1–68.
[55] Z. Ebrahimi, et al., A review on deep learning methods for ECG arrhythmia classification, Expert Syst. Appl. X (2020) 100033.
[56] M. Alfaras, M.C. Soriano, S. Ortín, A fast machine learning model for ECG-based heartbeat classification and arrhythmia detection, Front. Phys. 7 (2019) 103.
[57] S. Minaee, I. Bouazizi, P. Kolan, H. Najafzadeh, Ad-Net: audio-visual convolutional neural network for advertisement detection in videos, arXiv preprint arXiv:1806.08612, 2018.
[58] V. Ravi, D. Pradeepkumar, K. Deb, Financial time series prediction using hybrids of chaos theory, multi-layer perceptron and multi-objective evolutionary algorithms, Swarm Evol. Comput. 36 (2017) 136–149.
[59] Z. Zhao, W. Chen, X. Wu, P.C.Y. Chen, J. Liu, LSTM network: a deep learning approach for short-term traffic forecast, IET Intel. Transp. Syst. 11 (2017) 68–75.
[60] R. Zhao, R. Yan, J. Wang, K. Mao, Learning to monitor machine health with convolutional Bi-directional LSTM networks, Sensors 17 (2017) 273.
[61] Z. Dokur, T. Ölmez, Heartbeat classification by using a convolutional neural network trained with Walsh functions, Neural Comput. Appl. (2020) 1–20.
[62] G. Petmezas, K. Haris, L. Stefanopoulos, V. Kilintzis, A. Tzavelis, J.A. Rogers, N. Maglaveras, Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets, Biomed. Signal Process. Control 63 (2021) 102194.
[63] M. Al-dabag, H.S. ALRikabi, R. Al-Nima, Anticipating Atrial Fibrillation Signal Using Efficient Algorithm, 2021.
[64] S. Raghunath, J.M. Pfeifer, A.E. Ulloa-Cerna, A. Nemani, T. Carbonati, L. Jing, C.M. Haggerty, Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation–related stroke, Circulation 143 (13) (2021) 1287–1298.
[65] B.M. Mathunjwa, Y.T. Lin, C.H. Lin, M.F. Abbod, J.S. Shieh, ECG arrhythmia classification by using a recurrence plot and convolutional neural network, Biomed. Signal Process. Control 64 (2021) 102262.
[66] Ö. Arslan, M. Karhan, Effect of Hilbert-Huang transform on classification of PCG signals using machine learning, J. King Saud Univ. Comput. Inf. Sci. (2022).
[67] X. Li, F. Zhang, Z. Sun, D. Li, X. Kong, Y. Zhang, Automatic heartbeat classification using S-shaped reconstruction and a squeeze-and-excitation residual network, Comput. Biol. Med. 140 (2022) 105108.
[68] Y.P. Sai, L.R. Kumari, Cognitive assistant DeepNet model for detection of cardiac arrhythmia, Biomed. Signal Process. Control 71 (2022) 103221.
[69] A.M. Alqudah, A. Alqudah, Deep learning for single-lead ECG beat arrhythmia-type detection using novel iris spectrogram representation, Soft Comput. 26 (3) (2022) 1123–1139.
[70] J. Rubin, S. Parvaneh, A. Rahman, B. Conroy, S. Babaeizadeh, Densely connected convolutional networks for detection of atrial fibrillation from short single-lead ECG recordings, J. Electrocardiol. 51 (2018) S18–S21.
[71] P.A. Warrick, M.N. Homsi, Ensembling convolutional and long short-term memory networks for electrocardiogram arrhythmia detection, Physiol. Meas. 39 (2018) 114002.
[72] S. Liaqat, K. Dashtipour, A. Zahid, K. Assaleh, K. Arshad, N. Ramzan, Detection of atrial fibrillation using a machine learning approach, Information 11 (12) (2020) 549, https://doi.org/10.3390/info11120549.
[73] Y. Sun, Y.Y. Yang, B.J. Wu, et al., Contactless facial video recording with deep learning models for the detection of atrial fibrillation, Sci. Rep. 12 (2022) 281, https://doi.org/10.1038/s41598-021-03453-y.
[74] L. Meng, W. Tan, J. Ma, R. Wang, X. Yin, Y. Zhang, Enhancing dynamic ECG heartbeat classification with lightweight transformer model, Artif. Intell. Med. (2022) 102236.