
Journal Pre-proof

Classification of heart sound signals using a novel deep WaveNet model

Shu Lih Oh, V. Jahmunah, Chui Ping Ooi, Ru-San Tan, Edward J Ciaccio, Toshitaka Yamakawa, Masayuki Tanabe, Makiko Kobayashi, U. Rajendra Acharya

PII: S0169-2607(20)31437-1
DOI: https://doi.org/10.1016/j.cmpb.2020.105604
Reference: COMM 105604

To appear in: Computer Methods and Programs in Biomedicine

Received date: 1 May 2020


Accepted date: 7 June 2020

Please cite this article as: Shu Lih Oh, V. Jahmunah, Chui Ping Ooi, Ru-San Tan, Edward J Ciaccio, Toshitaka Yamakawa, Masayuki Tanabe, Makiko Kobayashi, U. Rajendra Acharya, Classification of heart sound signals using a novel deep WaveNet model, Computer Methods and Programs in Biomedicine (2020), doi: https://doi.org/10.1016/j.cmpb.2020.105604

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2020 Elsevier B.V. All rights reserved.



Highlights

 Automated detection of five heart sounds: normal, aortic stenosis, mitral valve prolapse,
mitral stenosis, mitral regurgitation
 Novel deep WaveNet model is proposed
 Obtained average training classification accuracy of 97%
 System can aid cardiologists in the accurate detection of heart valve diseases in patients


Full Length Article

Classification of heart sound signals using a novel deep WaveNet model
Shu Lih Oh1, V. Jahmunah1, Chui Ping Ooi2, Ru-San Tan3, Edward J Ciaccio4,
Toshitaka Yamakawa5, Masayuki Tanabe5,6, Makiko Kobayashi5, U Rajendra
Acharya1,6,7*

1School of Engineering, Ngee Ann Polytechnic, Singapore

2School of Science and Technology, Singapore University of Social Sciences, 463 Clementi Road, 599494,
Singapore
3National Heart Centre, Singapore

4Department of Medicine - Cardiology, Columbia University, USA

5Department of Computer Science and Electrical Engineering, Kumamoto University,

Japan
6International Research Organization for Advanced Science and Technology (IROAST),

Kumamoto University, Kumamoto, Japan


7Department of Bioinformatics and Medical Engineering, Asia University, Taiwan.

*Corresponding author: U Rajendra Acharya (aru@np.edu.sg)

ABSTRACT

Background and objectives: The high mortality rate and increasing prevalence of heart valve
diseases globally warrant the need for rapid and accurate diagnosis of such diseases.
Phonocardiogram (PCG) signals are used in this study due to the low cost of obtaining the
signals. This study classifies five types of heart sounds, namely normal, aortic stenosis, mitral
valve prolapse, mitral stenosis, and mitral regurgitation.
Methods: We have proposed a novel in-house developed deep WaveNet model for automated
classification of five types of heart sounds. The model is developed using a total of 1000 PCG
recordings belonging to five classes with 200 recordings in each class.
Results: We have achieved a training accuracy of 97% for the classification of heart sounds into
five classes. The highest classification accuracy of 98.20% was achieved for the normal class. The
developed model was validated with 10-fold cross-validation, thus affirming its robustness.


Conclusion: The study results clearly indicate that the developed model is able to classify five
types of heart sounds accurately. The developed system can be used by cardiologists to aid in
the detection of heart valve diseases in patients.

Keywords – Phonocardiograms, WaveNet model, 10-fold cross-validation, aortic stenosis, mitral
valve prolapse, mitral stenosis, mitral regurgitation.

1. Introduction

Cardiovascular disease (CVD) is the leading cause of death across the globe, claiming more than
17 million lives yearly [1]. CVD involves pathological conditions of the heart, blood vessels, or
heart valves, among others. Heart valve diseases (HVD) are of particular interest in this study
due to a surge in prevalence [2] and high mortality rate, compared with other CVDs [3]. The
gold standard for HVD diagnosis is transthoracic echocardiography (TTE). However, TTE is
limited by its moderate cost and long examination time (about 45 minutes) [4]. Furthermore,
image quality may be degraded by acoustic window limitation [2].

Phonocardiography is a cost-effective method for recording heart sounds and murmurs present
during a cardiac cycle. The phonocardiogram (PCG) signals obtained from the recording can
also be read out as graphical representations, which can serve as a guiding tool for further
diagnostic assessment. Heart murmurs occur with increased antegrade and/or retrograde blood
flow across any of the valves. Innocent murmurs are heart murmurs that occur due to
physiological high-flow conditions without underlying HVD, whereas abnormal murmurs
occur as a consequence of valve stenosis or regurgitation [5]. Different types of HVDs such as
aortic stenosis (AS), mitral stenosis (MS), mitral regurgitation (MR) and mitral valve prolapse
(MVP) can be diagnosed using PCG signals.

Visual screening of the PCG signal is time-consuming and prone to error. Artificial intelligence
(AI) techniques can be used to overcome these limitations. Machine learning is a sub-section of
the AI technique that involves extraction of salient features, statistical analysis, feature selection,
and classification [6]. Machine learning algorithms coupled with PCG signals are commonly
used to detect heart sounds in HVD with various feature extraction methods and classifiers [7],


[8], [9], [10], [11], [12], [13], [14], [61], [62]. However, these machine learning techniques are
subjective and time-consuming (as feature and classifier selections are handcrafted, often by
iterative trial and error). To overcome this limitation, deep learning algorithms have recently
been applied to the task of classifying heart sounds from PCG [15], [16], [17], [18], [19]. Deep
learning methods are inherently automatic and do not require separate feature extraction, feature
selection, or classification steps. The deep learning model itself extracts the hidden
signatures present in the PCG and performs the classification. Our study is the first to classify
heart sounds from PCG signals into 5 classes using the WaveNet model. Figure 1 shows the
PCG signals of the 5 classes: normal (N), AS, MR, MS and MVP. In this paper, we have
proposed a novel WaveNet deep learning model using PCG signals for the categorization of
heart sounds in HVD.

Figure 1: PCG signals of N, MVP, MS, MR, AS classes.


2. Literature review

Heart valve disease (HVD) occurs when there is damage or a flaw in one or more of the four
valves present in the heart. The four valves in the human heart are: the pulmonary, aortic,
mitral, and tricuspid valves [20]. When the heart valves open and close properly, this ensures
good mechanical activity and performance of the heart as well as prevention of the backflow of
blood [21]. Any damage to the heart valves gives rise to HVD. Common types of HVDs include
aortic stenosis (AS), mitral stenosis (MS), mitral regurgitation (MR) and mitral valve prolapse
(MVP). Figures 2a-2b represent the normal and pathological images of aortic valve and Figures
3a-3d represent the normal and pathological images of mitral valve disease (mitral stenosis,
mitral regurgitation and mitral prolapse).

Figure 2: Images of the aortic valve: (a) Normal and (b) Aortic stenosis.


Figure 3: Images of the mitral valve: (a) Normal, (b) Mitral stenosis, (c) Mitral regurgitation and (d) Mitral prolapse.

The traditional method of using a stethoscope to listen to heart sounds (cardiac auscultation) for
detection of abnormalities is still practiced today. However, this method is subjective, as the
diagnostic accuracy is believed to vary depending on the experience, interpretation, and
perceptive capabilities of the clinical examiner [22]. The gold standard for HVD diagnosis is
transthoracic echocardiography (TTE). However, TTE is limited by its moderate cost, long
examination time [4] and occasional poor image quality [2]. Other diagnostic techniques have
also been explored by researchers. Savino et al. [23] investigated handheld ultrasound and
focused cardiovascular echocardiography for the early diagnosis of cardiac diseases by
assessing valvular structure or flow based on parameters such as atrial and ventricular
dimensions or heart wall thickness. However, for a handheld ultrasonic device, proper training
is required, and accurate selection of depth and gain parameters is imperative to produce the
suitable high-resolution images required for HVD diagnosis [24].


Chaothawee et al. [25] investigated valvular regurgitation using magnetic resonance imaging
for HVD diagnosis. Valvular regurgitation occurs due to incomplete valve closure, which
results in a high-speed regurgitant jet flow of blood back into the upstream chamber. Crucial
parameters involving the valvular structure and flow were studied. However, MRI is even more
expensive and time-consuming than TTE. Myerson et al. [26] discussed the use of
cardiovascular magnetic resonance (CMR) for diagnosing different types of HVD. One
limitation of CMR is its inability to assess the pressure within a cardiac chamber, which requires
invasive cardiac catheterization. The electrocardiogram (ECG) may be able to display changes
associated with HVD-induced heart chamber remodeling, e.g. left ventricular hypertrophy
with or without related repolarization anomalies [27], but is unable to provide much
information on the murmur and specific HVD [28]. Phonocardiography is a diagnostic
technique used to measure and record sounds and murmurs produced by the heart during a
cardiac cycle. The initial heart sound s1 occurs at the beginning of the systolic interval, due to
closure of the atrioventricular (mitral or tricuspid) valve. The subsequent heart sound s2 occurs
at the start of the diastolic interval, due to the closing of the aortic or pulmonic valve. Heart
murmurs occur with increased antegrade and/or retrograde blood flow across any of the valves,
which can be either physiological (innocent murmur) or pathological.

PCG signals are used for diagnosing HVD [29] and deep learning systems have recently been
utilized for the categorization of heart sounds from these signals. Table 1 presents a summary of
studies that employ deep learning models for automated categorization of heart sounds in
HVD.

Table 1a: Summarised studies on the automated classification of heart sounds (normal versus abnormal: 2 classes) using deep learning models combined with PCG signals.

Potes et al. [18], 2016
Methods: Convolutional neural network; ensemble classifier; decomposition of signals into 4 frequency bands
Data: Normal: 2575 PCG waveforms; abnormal: 665 PCG waveforms
Results (ensemble classifier): ACY: 85%, SPEC: 82%, SENS: 88%

Thomae et al. [30], 2016
Methods: Deep neural network (deep gated RNN + convolutional front end)
Data: Healthy + abnormal: 3153 recordings
Results: Score: 89%, SPEC: 83%, SENS: 96%

Yang et al. [31], 2016
Methods: Recurrent neural network; cross-validation technique
Data: 420,000 samples
Results (validation score): ACY: 84%

Ryu et al. [32], 2016
Methods: Convolutional neural network; Torch deep learning framework
Data: Training data: 3126 heart sound recordings; validation data: 300 heart sound recordings
Results: Overall score: 79.5%, SPEC: 88.2%, SENS: 70.8%

Tschannen et al. [19], 2017
Methods: Deep tree-like convolutional neural network; deep + summary features, deep features, and deep features + state statistics
Data: Healthy + abnormal: 3153 recordings
Results (deep features combined with summary features): ACY: 85%, SPEC: 82%, SENS: 88%

Maknickas et al. [33], 2017
Methods: Convolutional neural network, 256 hidden layers
Data: Healthy + abnormal: 4430 recordings from 1072 participants
Results: Training ACY: 99.7%, validation ACY: 95.2%

Faturrahman et al. [34], 2017
Methods: Deep belief networks; Shannon energy envelope for segmentation; 5-fold cross-validation
Data: Massachusetts Institute of Technology Heart Sound database: normal: 157 subjects, abnormal: 332 patients; Aalborg University Heart Sound database: normal: 153 subjects, abnormal: 435 patients
Results (Aalborg University Heart Sound database): ACY: 86.15%

Rubin et al. [35], 2017
Methods: Segmentation of signals; segmented signals transformed into heat maps; convolutional neural network trained using heat maps
Data: Training data: 2852 PCG waveforms; testing data: 301 PCG waveforms; total: 3153 waveforms
Results: ACY: 83.9%, SPEC: 95.2%, SENS: 72.7%

Nassralla et al. [36], 2017
Methods: Segmentation using logistic regression hidden semi-Markov model; time and frequency features; deep neural network (15 layers); random forest ensemble classifier
Data: Healthy + abnormal: 3126 heart sound recordings
Results (random forest ensemble classifier): ACY: 92%, SPEC: 98%, SENS: 78%

Latif et al. [37], 2018
Methods: Pre-processing for first and second heart sounds; segmentation of signals; feature selection using Mel-frequency cepstral coefficients; classification using long short-term memory and bidirectional long short-term memory models
Data: Healthy + abnormal: 3240 raw heart sounds
Results (bidirectional long short-term memory model): ACY: 97.1%, SPEC: 96.7%, SENS: 99.9%

Bozkurt et al. [38], 2018
Methods: Convolutional neural network; sub-band envelope features
Data: Innocent murmur: 336 recordings (327 children); abnormal murmur: 130 recordings (117 children)
Results: ACY: 84.5%, SPEC: 78.5%, SENS: 81.5%

Dominguez-Morales et al. [5], 2018
Methods: Heart sounds from 9 databases; pre-processed and segmented using a neuromorphic auditory sensor; modified AlexNet model performed best amongst other convolutional neural networks
Data: Healthy + abnormal: 3126 heart sound recordings
Results (modified AlexNet model): ACY: 97%, SPEC: 95.12%, SENS: 93.20%

Han et al. [39], 2018
Methods: Convolutional neural network; 10-fold cross-validation
Data: Normal: 2575 recordings; abnormal: 665 recordings
Results: ACY: 91.5%, SPEC: 84.67%, SENS: 98.33%

Chen et al. [40], 2018
Methods: Stacked autoencoder + softmax layer; feature extraction using sample entropy, wavelet decomposition and Hilbert-Huang transform
Data: Normal: 200 heart sound samples; extrasystole: 49 heart sound samples
Results: ACY: 93.0%, SPEC: 88.5%, SENS: 98.0%

Sotaquirá et al. [41], 2018
Methods: Deep neural network; weighted probability for classification
Data: Normal: 9152 PCG cycles; abnormal: 9027 PCG cycles
Results: ACY: 92.6%, SPEC: 93.8%, SENS: 91.3%

Wu et al. [42], 2019
Methods: Convolutional neural network; ensemble learning; hold-out validation; 10-fold cross-validation
Data: Normal: 2575 heart sound recordings; abnormal: 665 heart sound recordings
Results (hold-out testing): ACY: 86.0%, SPEC: 85.63%, SENS: 86.46%

Li et al. [43], 2019
Methods: 1-dimensional convolutional neural network; denoising autoencoder for extraction of deep features
Data: Dataset 1: normal: 43 subjects, abnormal: 2 patients; Dataset 2: normal: 2532 heart sound recordings, abnormal: 664 heart sound recordings
Results: ACY: 99.01%

Table 1b: Summarised studies on the automated classification of heart sounds (3 classes) using deep learning models combined with PCG signals.

Zhang et al. [44], 2017
Methods: Convolutional neural network + SVM classifier
Data: Dataset A (normal, murmur, extra heart sound, artifact): training data: 124 recordings, testing data: 52 recordings; Dataset B (normal, murmur, extrasystole): training data: 312 recordings, testing data: 195 recordings
Results: Dataset A: normalised precision: 77%; Dataset B: normalised precision: 71%

Sujadevi et al. [45], 2018
Methods: Long short-term memory model; gated recurrent unit; 5-fold and 10-fold cross-validation
Data: Dataset A: normal: 31 signals, murmur: 34 signals, extra heart sound: 19 signals, artifact: 40 signals; Dataset B: normal: 320 signals, murmur: 95 signals, extrasystole: 46 signals
Results (Dataset A, LSTM): ACY: 76.9%

Raza et al. [46], 2019
Methods: Band filter for pre-processing; down-sampling for discriminatory parameters; long short-term memory model; 5-fold cross-validation
Data: Normal: 320 samples; murmur: 95 samples; extrasystole: 48 samples
Results: ACY: 80.8%

Table 1c: Summarised studies on the automated classification of heart sounds (5 classes) using deep learning models combined with PCG signals.

Yaseen et al. [7], 2018
Methods: Deep neural network + machine learning techniques; multiple features; 5-fold cross-validation
Data: Normal: 200 samples; aortic stenosis: 200 samples; mitral stenosis: 200 samples; mitral regurgitation: 200 samples; mitral valve prolapse: 200 samples
Results: ACY: 97.0%, SENS: 94.5%, SPEC: 98.2%

Present work
Methods: WaveNet architecture; 10-fold cross-validation
Data: Normal: 200 recordings; aortic stenosis: 200 recordings; mitral stenosis: 200 recordings; mitral regurgitation: 200 recordings; mitral valve prolapse: 200 recordings
Results: Training ACY: 97.0%, SENS: 92.5%, SPEC: 98.1%

* ACY: Accuracy, SENS: Sensitivity, SPEC: Specificity, PPRV: Positive predictive value

3. Methodology

3.1 Database and pre-processing of data

The PCG signals used in this study were obtained from a public database [7]. A total of 1000
PCG recordings were obtained from five different classes with 200 recordings each. The
different classes of signals were N, AS, MS, MR and MVP. Each recording was sampled at a
frequency of 8000 Hz. Each audio sound wave was normalized between -1 and 1 to ensure that
the data shared a common scale for easier analysis. As the samples had varying lengths, they
were zero-padded to a length of 31,943 discrete sample points for consistency.
* https://github.com/yaseen21khan/Classification-of-Heart-Sound-Signal-Using-Multiple-Features-
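For concreteness, the normalization and zero-padding steps described above can be sketched as follows. This is a minimal illustration using NumPy; the exact normalization scheme is not specified in the text, so peak normalization is assumed here, and the function name is ours:

```python
import numpy as np

TARGET_LEN = 31943  # fixed length used in this study


def preprocess_pcg(signal):
    """Scale a raw PCG recording to [-1, 1] and zero-pad it to TARGET_LEN."""
    signal = np.asarray(signal, dtype=np.float32)
    peak = np.max(np.abs(signal))
    if peak > 0:                      # assumed peak normalization to a common scale
        signal = signal / peak
    padded = np.zeros(TARGET_LEN, dtype=np.float32)
    n = min(len(signal), TARGET_LEN)  # recordings longer than TARGET_LEN are truncated
    padded[:n] = signal[:n]
    return padded
```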

3.2 Deep learning models

Deep neural networks are neural networks composed of many stacked layers [47] that in turn
contain neurons that mimic neuronal activity in the human brain. At present, some deep
learning models such as the convolutional neural network (CNN) [48], long short-term memory
model (LSTM) [49], and autoencoders [50], are commonly employed for the classification of
heart sounds, amongst other disease classifications [51], [52], [53]. A CNN model comprises
successive convolutional, pooling, and fully connected layers that facilitate the classification


task. However, when the input PCG signal has a high sample rate, the CNN model may require
a large number of parameters, which increases its computational complexity. Hence, the
architecture of a CNN model has to be selected experimentally by the researcher based on
validation error [54].

LSTM, a recurrent neural network, includes an input gate, forget gate, and output gate within a
cell. The gates help to control the flow of information in and out of the cell, which facilitate the
prediction or classification tasks. For audio signals, time-frequency LSTMs are frequently used,
as they are more adaptable to an array of input features compared with CNNs. Although time-
frequency LSTMs are known to outperform CNNs on some tasks [55], they are less parallelizable
and therefore slower [54].

The traditional autoencoder encompasses an encoder that contains input and hidden layers and
a decoder that creates the reconstruction of data. A large number of neurons in the hidden layer
causes overfitting of the model, hence affecting its learning. Stacked denoising autoencoders
work by introducing noise to the input data in certain proportions; the corrupted data
is compressed and thereafter reconstructed to approximate the original input. Hence,
stacked denoising autoencoders are able to capture valuable information in a dataset that
traditional autoencoders fail to capture [56]. Therefore, stacked denoising
autoencoders are commonly employed for classification tasks. However, one needs to have
good domain knowledge to be able to select the type and level of corrupting noise to boost
application-specific performances [57].

Considering the limitations of other models, the WaveNet model was explored in this study.
The WaveNet model is a potent prediction method that has gained popularity due to its ability
to train efficiently on raw audio comprising tens of thousands of samples per second. It is able to
generate raw audio signals with high temporal resolution [58] and has been shown to outperform
recently devised techniques by a large margin [54]. WaveNet is a generative model that consists
of a residual block with gated activation. Apart from the lack of pooling layers, it is similar to
the CNN model and is governed by the following equation:

z = tanh(W_{f,k} ∗ x) ⊙ σ(W_{g,k} ∗ x)    (1)


wherein z represents the output of the gated activation unit, ∗ a convolution operator, ⊙
element-wise multiplication, σ the sigmoid function, k the layer index, f the filter, g the gate,
W a learnable convolution filter, and x the input waveform; in the conditional variant of the
model, auxiliary features h can additionally be supplied [59]. In the model, each audio
sample fed to the input is conditioned on the previously generated audio samples. Each forecasted
sample is then fed back to the network, to help in predicting the subsequent sample. Dilated
convolutional layers are employed in the network to allow for large skips in input data, and aid
in increasing the receptive field for a better global view. The receptive field is increased with
just a few layers, while retaining the input resolution. This enables a more consistent output
over larger time scales [60]. The conditional distribution over each audio sample is modeled
using a softmax distribution. The residual and parametrized skip connections used within the
network assist in accelerating convergence and training deeper models [60]. Figure 4 depicts
our proposed WaveNet model.
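As an illustration of Eq. (1) and the dilated convolutions described above, a minimal sketch of one residual block, written with the Keras functional API, is shown below. The filter count and kernel size are illustrative assumptions, not the authors' exact settings:

```python
from tensorflow.keras import layers


def residual_block(x, filters=64, kernel_size=2, dilation_rate=1):
    """WaveNet-style residual block implementing the gated activation of Eq. (1)."""
    # Dilated causal convolutions widen the receptive field without pooling layers
    f = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate)(x)  # filter branch, W_{f,k} * x
    g = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate)(x)  # gate branch, W_{g,k} * x
    z = layers.Multiply()([layers.Activation("tanh")(f),
                           layers.Activation("sigmoid")(g)])  # Eq. (1)
    skip = layers.Conv1D(filters, 1)(z)   # parametrized skip connection
    out = layers.Add()([x, skip])         # residual connection (x must carry `filters` channels)
    return out, skip
```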

3.3 Proposed WaveNet architecture

The proposed WaveNet model consists of 6 residual blocks. It was built and trained using 3
epochs, with a batch size of 3. It was also trained with the Adam optimization algorithm, with a
learning rate of 0.0005. The model has 320,841 trainable parameters and no non-trainable
parameters (320,841 in total). Ten-fold cross-validation was used to evaluate the
model performance, where 10% of the data was used for testing, 85.5% for training, and 4.5%
for validation.
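A sketch of how such a configuration might be assembled and evaluated is given below. The optimizer, learning rate, batch size, epoch count, residual block count, and 10-fold protocol follow the description above; the classification head (1x1 projection, global pooling, and softmax layer), the dilation schedule, and the placeholder data are our assumptions, and the `residual_block` helper from the previous sketch is reused:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras import layers, Model
from tensorflow.keras.optimizers import Adam


def build_wavenet(n_blocks=6, filters=64, input_len=31943, n_classes=5):
    """Stack 6 residual blocks with an assumed softmax classification head."""
    inp = layers.Input(shape=(input_len, 1))
    x = layers.Conv1D(filters, 1)(inp)            # project to `filters` channels
    skips = []
    for k in range(n_blocks):
        x, s = residual_block(x, filters, dilation_rate=2 ** k)  # assumed growing dilation
        skips.append(s)
    h = layers.Activation("relu")(layers.Add()(skips))
    h = layers.GlobalAveragePooling1D()(h)        # assumed pooling head for classification
    out = layers.Dense(n_classes, activation="softmax")(h)
    return Model(inp, out)


# Placeholder data; in the study, X holds the 1000 preprocessed PCG recordings
X = np.random.randn(1000, 31943, 1).astype("float32")
y = np.repeat(np.arange(5), 200)                  # 200 recordings per class

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):       # 10% of the data held out per fold
    model = build_wavenet()
    model.compile(optimizer=Adam(learning_rate=0.0005),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # validation_split=0.05 of the 90% training fold yields the 85.5%/4.5% split
    model.fit(X[train_idx], y[train_idx], batch_size=3, epochs=3,
              validation_split=0.05)
    model.evaluate(X[test_idx], y[test_idx])
```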


Figure 4: Proposed WaveNet model.


4. Results

Table 2 shows the classification results using our recommended model. The five classes were
classified with accuracies above 95%. The normal (N) class attained the highest accuracy (ACY)
of 98.20%, with sensitivity (SEN), specificity (SPE), and PPRV values of 94.00%, 99.25%, and
96.90%, respectively. A high training accuracy of 97% was obtained by the
model.

Table 2: Classification results of the WaveNet model.

Class   ACY (%)   SEN (%)   SPE (%)   PPRV (%)
N       98.20     94.00     99.25     96.90
MVP     95.20     88.50     96.87     87.62
MS      97.80     96.50     98.12     92.78
MR      96.10     89.00     97.87     91.28
AS      97.70     94.50     98.50     94.02
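For reference, the per-class values in Table 2 can be derived from a confusion matrix in a one-vs-rest fashion. A small illustrative helper (our own naming, using the standard metric definitions) is sketched below:

```python
import numpy as np


def per_class_metrics(cm):
    """Print one-vs-rest ACY, SEN, SPE and PPRV for each class of a confusion matrix."""
    cm = np.asarray(cm)
    total = cm.sum()
    for i in range(cm.shape[0]):
        tp = cm[i, i]                 # class i correctly predicted as class i
        fn = cm[i, :].sum() - tp      # class i predicted as some other class
        fp = cm[:, i].sum() - tp      # other classes predicted as class i
        tn = total - tp - fn - fp
        print(f"class {i}: ACY={100 * (tp + tn) / total:.2f}%, "
              f"SEN={100 * tp / (tp + fn):.2f}%, "
              f"SPE={100 * tn / (tn + fp):.2f}%, "
              f"PPRV={100 * tp / (tp + fp):.2f}%")
```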

Following an extensive review of the literature on deep models employed for automated detection of
heart sounds, as summarized in Table 1, our study is, to the best of our knowledge, the first
to report 5-class heart sound classification using a deep WaveNet model.
Figure 5 shows the accuracy plot of the WaveNet model, highlighting the training accuracy of
97%. It is apparent that the validation accuracy increased steeply from epochs 1 to 5, with
minimal changes and gradual improvement from epoch 5 onwards, as the model
asymptotically arrived at its optimal solution. Figure 6 presents the confusion matrix of the
model. From the matrix, it is noteworthy that the misclassification rates of N, MVP, MS, MR and
AS are 6%, 11%, 4%, 11%, and 6%, respectively. The low misclassification rates attest to the
robustness of our model, which has the advantage that discriminatory features are extracted
automatically.


Figure 5: Accuracy versus epoch plot of WaveNet model.

Figure 6: Confusion matrix of the WaveNet model.

5. Discussion

High accuracy, sensitivity and specificity values of 97%, 92.5%, and 98.1% were achieved with
our WaveNet model. From Tables 1a to 1c, it can be noted that CNN models primarily, followed
by LSTM and autoencoder models, were employed for the classification of heart sounds. Latif et al.
[37] and Li et al. [43] reported higher classification accuracies of above 97%. However, they only
performed a two-class study. Dominguez-Morales et al. [5] and Maknickas et al. [33] achieved
an accuracy of 97%, which is comparable to ours. However, their results were also reported for a
two-class study. The other studies in Table 1a reported lower accuracies for two-class studies.


Table 1b summarizes the classification results obtained in three-class studies, where all the
papers reported accuracies lower than ours. In Table 1c, Yaseen et al. [7] used a similar
data size and obtained the same accuracy as ours, although our model achieved slightly lower
sensitivity and specificity values. However, in Yaseen et al.’s study, traditional machine
learning techniques such as the discrete wavelet transform were used to extract features from the
PCG signals before feeding them into the deep neural network. This was more tedious, as their
features had to be extracted and selected manually, as opposed to our study, where there was no
need for handcrafted features, since discriminatory features were automatically selected by our
model. Additionally, our model took less than a millisecond to classify each sample. Hence, our
method is less time-consuming and more appropriate for rapid diagnosis. Furthermore, we
used 10-fold cross-validation for our large data size, whereas Yaseen et al. [7] had used only 5-
fold cross-validation. With a larger k value of 10, the size difference between the training set and
the resampling subsets becomes smaller, hence reducing the bias of our proposed technique. This
affirms that our proposed method is more robust in comparison. It can be noted from the
confusion matrix that the misclassification rate is at most 11% for each class. These low
misclassification rates confirm the high accuracy of the developed model.

There are advantages and limitations to our recommended model.

Advantages
1. The model is robust, as it has been trained on sizeable data and validated with ten-fold cross-validation.
2. The model can be trained rapidly due to the presence of dilated convolutional layers.
3. The model can be trained on a large amount of data (tens of thousands of samples).

Limitations
1. Time is required for the model to be built and trained prior to classification.
2. WaveNet is a large model that requires more memory space; hence it can be
computationally expensive.

6. Future work


We hope to obtain an even larger database consisting of thousands of PCG signals collected
over a few months’ period, and train the model to predict valvular heart diseases from heart
sounds. This can only be done through improved learning with more data.

Figure 7: Deployment of developed model in the cloud computing setting.

It is also our intent to deploy our developed model on a cloud server, as shown in Figure 7. The
heart sound signals captured by the stethoscope will be sent to our developed WaveNet model,
hosted on the cloud server, for accurate and faster diagnosis. The outcome of the diagnosis will
then be sent directly to the hospital. This will aid clinicians in validating and improving their
diagnoses.

7. Conclusion

The prevalence of heart valve disease is increasing, with a high mortality rate in comparison
with other cardiovascular diseases. PCG signals from patients with heart valve disease contain
important information that can be used for diagnosis. In our study, the signals were used to
train a deep WaveNet model for the classification of heart valve disease types, namely, normal,
mitral valve prolapse, mitral stenosis, mitral regurgitation, and aortic stenosis. The highest
accuracy of 98.20% was achieved with the classification of normal, where a high training
accuracy of 97% was obtained. The model was validated using 10-fold cross-validation,
attesting to its robustness. This is one of the first studies to have performed a classification of
heart sounds based on 5 classes and utilizing a WaveNet model. The proposed model can
potentially be used as a diagnostic tool by cardiologists for the detection of heart valve disease.

Conflict of interest: The authors declare that they have no conflicts of interest.


Acknowledgement

This research was supported by the Singapore Ministry of Education under the Baseline
Funding MOE-BLINE-19MBL004.

8. References

[1] “Cardiovascular diseases, CVD Fact Sheet,” World Health Organisation.


[2] T. Vos et al., “Global, regional, and national incidence, prevalence, and years lived with
disability for 301 acute and chronic diseases and injuries in 188 countries, 1990-2013: A
systematic analysis for the Global Burden of Disease Study 2013,” Lancet, 2015.
[3] D. Mozaffarian et al., “Executive summary: Heart disease and stroke statistics-2016
update: A Report from the American Heart Association,” Circulation, vol. 133, no. 4, pp.
447–454, 2016.
[4] J. Draper, S. Subbiah, R. Bailey, and J. B. Chambers, “Murmur clinic : validation of a new
model for detecting heart valve disease,” pp. 56–59, 2019.
[5] J. P. Dominguez-Morales, A. F. Jimenez-Fernandez, M. J. Dominguez-Morales, and G.
Jimenez-Moreno, “Deep Neural Networks for the Recognition and Classification of Heart
Murmurs Using Neuromorphic Auditory Sensors,” IEEE Trans. Biomed. Circuits Syst., vol.
12, no. 1, pp. 24–34, 2018.
[6] O. Faust, R. Acharya U, S. M. Krishnan, and L. C. Min, “Analysis of cardiac signals using
spatial filling index and time-frequency domain,” Biomed. Eng. Online, vol. 3, pp. 1–11,
2004.
[7] Yaseen, G. Y. Son, and S. Kwon, “Classification of heart sound signal using multiple
features,” Appl. Sci., vol. 8, no. 12, 2018.
[8] S. Patidar and R. B. Pachori, “Classification of cardiac sound signals using constrained
tunable-Q wavelet transform,” Expert Syst. Appl., vol. 41, no. 16, pp. 7161–7170, 2014.
[9] S. Ari, K. Hembram, and G. Saha, “Detection of cardiac abnormality from PCG signal
using LMS based least square SVM classifier,” Expert Syst. Appl., vol. 37, no. 12, pp. 8019–
8026, 2010.
[10] J. Li, L. Ke, and Q. Du, “Classification of Heart Sounds Based on the Wavelet,” Entropy,
vol. 21, no. 5, p. 472, 2019.
[11] F. Safara, S. Doraisamy, A. Azman, A. Jantan, and A. R. Abdullah Ramaiah, “Multi-level
basis selection of wavelet packet decomposition tree for heart sound classification,”
Comput. Biol. Med., vol. 43, no. 10, pp. 1407–1414, 2013.
[12] Y. Zheng, X. Guo, and X. Ding, “A novel hybrid energy fraction and entropy-based
approach for systolic heart murmurs identification,” Expert Syst. Appl., vol. 42, no. 5, pp.
2710–2721, 2015.
[13] S. K. Randhawa and M. Singh, “Classification of Heart Sound Signals Using Multi-modal
Features,” Procedia Comput. Sci., vol. 58, pp. 165–171, 2015.
[14] S. K. Ghosh, R. N. Ponnalagu, R. K. Tripathy, and U. R. Acharya, “Automated detection
of heart valve diseases using chirplet transform and multiclass composite classifier with
PCG signals,” Comput. Biol. Med., vol. 118, no. February, 2020.


[15] C. Xu, Q. Long, and J. Zhou, “S1 and S2 heart sound recognition using optimized BP
neural network,” ACM Int. Conf. Proceeding Ser., vol. 64, no. 2, pp. 105–110, 2019.
[16] V. Maknickas and A. Maknickas, “Recognition of normal-abnormal phonocardiographic
signals using deep convolutional neural networks and mel-frequency spectral
coefficients,” Physiol. Meas., vol. 38, no. 8, pp. 1671–1684, 2017.
[17] E. Kay and A. Agarwal, “DropConnected neural networks trained on time-frequency and
inter-beat features for classifying heart sounds,” Physiol. Meas., vol. 38, no. 8, pp. 1645–
1657, 2017.
[18] C. Potes, S. Parvaneh, A. Rahman, and B. Conroy, “Ensemble of feature-based and deep
learning-based classifiers for detection of abnormal heart sounds,” Comput. Cardiol.
(2010)., vol. 43, no. December, pp. 621–624, 2016.
[19] M. Tschannen, T. Kramer, G. Marti, M. Heinzmann, and T. Wiatowski, “Heart sound
classification using deep structured features,” in 2016 Computing in Cardiology Conference
(CinC), 2016, pp. 565–568.
[20] M. J. Legato, “Gender and the heart: sex-specific differences in normal anatomy and
physiology.,” The journal of gender-specific medicine : JGSM : the official journal of the
Partnership for Women’s Health at Columbia. 2000.
[21] E. Nevo, “Method and apparatus for the assessment and display of variability in
mechanical activity of the heart, and enhancement of ultrasound contrast imaging by
variability analysis, US patent,” vol. 1, no. 12, 2001.
[22] A. Arslan and O. Yildiz, “Automated auscultative diagnosis system for evaluation of
phonocardiogram signals associated with heart murmur diseases,” Gazi Univ. J. Sci., vol.
31, no. 1, pp. 112–124, 2018.
[23] K. Savino and G. Ambrosio, “Handheld ultrasound and focused cardiovascular
echography: Use and information,” Med., vol. 55, no. 8, 2019.
[24] S. K. Ghosh, R. N. Ponnalagu, R. K. Tripathy, and U. R. Acharya, “Automated detection
of heart valve diseases using chirplet transform and multiclass composite classifier with
PCG signals,” Comput. Biol. Med., vol. 118, no. January, 2020.
[25] L. Chaothawee, “Diagnostic approach to assessment of valvular heart disease using MRI
- Part I: A practical approach for valvular regurgitation,” Heart Asia, vol. 4, no. 1, pp. 38–
43, 2012.
[26] S. G. Myerson, “Heart valve disease: Investigation by cardiovascular magnetic
resonance,” J. Cardiovasc. Magn. Reson., vol. 14, no. 1, pp. 1–23, 2012.
[27] K. Maganti, V. H. Rigolin, M. E. Sarano, and R. O. Bonow, “Valvular heart disease:
Diagnosis and management,” Mayo Clin. Proc., vol. 85, no. 5, pp. 483–500, 2010.
[28] W. Phanphaisarn, A. Roeksabutr, P. Wardkein, J. Koseeyaporn, and P. Yupapin, “Heart
detection and diagnosis based on ECG and EPCG relationships,” Med. Devices Evid. Res.,
vol. 4, no. 1, pp. 133–144, 2011.
[29] V. N. Varghees and K. I. Ramachandran, “A novel heart sound activity detection
framework for automated heart sound analysis,” Biomed. Signal Process. Control, vol. 13,
no. 1, pp. 174–188, 2014.
[30] C. Thomae and A. Dominik, “Using deep gated RNN with a convolutional front end for
end-to-end classification of heart sound,” Comput. Cardiol. (2010)., vol. 43, pp. 625–628,


2016.
[31] T. C. I. Yang and H. Hsieh, “Classification of acoustic physiological signals based on deep
learning neural networks with augmented features,” Comput. Cardiol. (2010)., vol. 43, pp.
569–572, 2016.
[32] H. Ryu, J. Park, and H. Shin, “Classification of heart sound recordings using convolution
neural network,” Comput. Cardiol. (2010)., vol. 43, pp. 1153–1156, 2016.
[33] V. Maknickas and A. Maknickas, “Recognition of normal–abnormal phonocardiographic
signals using deep convolutional neural networks and mel-frequency spectral
coefficients,” Physiol. Meas., vol. 38, no. 8, pp. 1671–1684, 2017.
[34] M. Faturrahman, I. Wasito, F. D. Ghaisani, and R. Mufidah, “A classification method
using deep belief network for phonocardiogram signal classification,” 2017 Int. Conf. Adv.
Comput. Sci. Inf. Syst. ICACSIS 2017, vol. 2018-Janua, pp. 283–289, 2018.
[35] J. Rubin, R. Abreu, A. Ganguli, S. Nelaturi, I. Matei, and K. Sricharan, “Recognizing
abnormal heart sounds using deep learning,” CEUR Workshop Proc., vol. 1891, pp. 13–19,
2017.
[36] M. Nassralla, Z. E. Zein, and H. Hajj, “Classification of normal and abnormal heart
sounds,” in 2017 Fourth International Conference on Advances in Biomedical Engineering
(ICABME), 2017, pp. 1–4.
[37] S. Latif, M. Usman, R. Rana, and J. Qadir, “Phonocardiographic Sensing Using Deep
Learning for Abnormal Heartbeat Detection,” IEEE Sens. J., vol. 18, no. 22, pp. 9393–9400,
2018.
[38] B. Bozkurt, I. Germanakis, and Y. Stylianou, “A study of time-frequency features for
CNN-based automatic heart sound classification for pathology detection,” Comput. Biol.
Med., vol. 100, no. August 2017, pp. 132–143, 2018.
[39] W. Han, Z. Yang, J. Lu, and S. Xie, “Supervised threshold-based heart sound
classification algorithm,” Physiol. Meas., vol. 39, no. 11, pp. 0–11, 2018.
[40] L. Chen, J. Ren, Y. Hao, and X. Hu, “The Diagnosis for the Extrasystole Heart Sound
Signals Based on the Deep Learning,” J. Med. Imaging Heal. Informatics, vol. 8, no. 5, pp.
959–968, 2018.
[41] M. Sotaquirá, D. Alvear, and M. Mondragón, “Phonocardiogram classification using
deep neural networks and weighted probability comparisons,” J. Med. Eng. Technol., vol.
42, no. 7, pp. 510–517, 2018.
[42] J. M. T. Wu et al., “Applying an ensemble convolutional neural network with Savitzky–
Golay filter to construct a phonocardiogram prediction model,” Appl. Soft Comput. J., vol.
78, pp. 29–40, 2019.
[43] F. Li et al., “Feature extraction and classification of heart sound using 1D convolutional
neural networks,” EURASIP J. Adv. Signal Process., vol. 2019, no. 1, 2019.
[44] W. Zhang and J. Han, “Towards heart sound classification without segmentation using
convolutional neural network,” Comput. Cardiol. (2010)., vol. 44, pp. 1–4, 2017.
[45] V. G. Sujadevi, K. P. Soman, R. Vinayakumar, and A. U. P. Sankar, “Deep models for
phonocardiography (PCG) classification,” ICCT 2017 - Int. Conf. Intell. Commun. Comput.
Tech., vol. 2018-Janua, no. March 2020, pp. 211–216, 2018.
[46] A. Raza, A. Mehmood, S. Ullah, M. Ahmad, G. S. Choi, and B. W. On, “Heartbeat sound


signal classification using deep learning,” Sensors (Switzerland), vol. 19, no. 21, pp. 1–15,
2019.
[47] I. Goodfellow, Y. Bengio, and A. Courville, “Deep Learning,” MIT Press, 2016.
[48] S. Indolia, A. K. Goswami, S. P. Mishra, and P. Asopa, “Conceptual Understanding of
Convolutional Neural Network- A Deep Learning Approach,” Procedia Comput. Sci., vol.
132, pp. 679–688, 2018.
[49] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput.,
vol. 9, no. 8, pp. 1735–1780, 1997.
[50] X. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising
autoencoder,” in Proceedings of the Annual Conference of the International Speech
Communication Association, INTERSPEECH, 2013.
[51] U. R. Acharya et al., “Automated identification of shockable and non-shockable life-
threatening ventricular arrhythmias using convolutional neural network,” Futur. Gener.
Comput. Syst., vol. 79, pp. 952–959, 2018.
[52] O. Yildirim, U. B. Baloglu, R.-S. Tan, E. J. Ciaccio, and U. R. Acharya, “A new approach
for arrhythmia classification using deep coded features and LSTM networks,” Comput.
Methods Programs Biomed., vol. 176, pp. 121–133, 2019.
[53] N. Michielli, U. R. Acharya, and F. Molinari, “Cascaded LSTM recurrent neural network
for automated sleep stage classification using single-channel EEG signals,” Comput. Biol.
Med., 2019.
[54] H. Purwins, B. Li, T. Virtanen, J. Schlüter, S. Y. Chang, and T. Sainath, “Deep Learning for
Audio Signal Processing,” IEEE J. Sel. Top. Signal Process., vol. 13, no. 2, pp. 206–219, 2019.
[55] T. N. Sainath and B. Li, “Modeling time-frequency patterns with LSTM vs. convolutional
architectures for LVCSR tasks,” Proc. Annu. Conf. Int. Speech Commun. Assoc.
INTERSPEECH, vol. 08-12-Sept, pp. 813–817, 2016.
[56] P. Liu, P. Zheng, and Z. Chen, “Deep learning with stacked denoising auto-encoder for
short-term electric load forecasting,” Energies, vol. 12, no. 12, 2019.
[57] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, “Stacked denoising
autoencoders: Learning Useful Representations in a Deep Network with a Local
Denoising Criterion,” J. Mach. Learn. Res., vol. 11, pp. 3371–3408, 2010.
[58] A. van den Oord et al., “WaveNet: A Generative Model for Raw Audio,” arXiv preprint arXiv:1609.03499, pp. 1–15, 2016.
[59] A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, and K. Kavukcuoglu,
“Conditional Image Generation with PixelCNN Decoders,” in Advances in Neural Information
Processing Systems (NIPS), 2016.
[60] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves,
N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “WaveNet: A Generative Model for Raw
Audio,” arXiv preprint arXiv:1609.03499v2, pp. 1–15, 2016.
[61] N. Kannathal, C.M. Lim, U.R. Acharya, P.K. Sadasivan, "Cardiac state diagnosis using
adaptive neuro-fuzzy technique", Medical Engineering & Physics, vol. 28, no.8, 809-815,
2006.
[62] O.S. Lih, V. Jahmunah, T.R. San, E.J. Ciaccio, T. Yamakawa, M. Tanabe, M. Kobayashi, O.
Faust, U. R. Acharya, "Comprehensive electrocardiographic diagnosis based on deep
learning", Artificial Intelligence in Medicine, vol.103, 101789, 2020.
