
Received 3 August 2023; accepted 8 September 2023. Date of publication 29 September 2023; date of current version 23 October 2023.

The review of this paper was arranged by Associate Editor Amitava Chatterjee.
Digital Object Identifier 10.1109/OJIM.2023.3320765

Conv-Random Forest-Based IoT: A Deep Learning Model Based on CNN and Random Forest for Classification and Analysis of Valvular Heart Diseases
TANMAY SINHA ROY 1, JOYANTA KUMAR ROY 2 (Senior Member, IEEE), AND NIRUPAMA MANDAL 3 (Senior Member, IEEE)
1 Electrical Engineering Department, Haldia Institute of Technology, Haldia 721657, India

2 Electronics and Communication Engineering Department, Narula Institute of Technology, Kolkata 700109, India

3 Electronics Engineering Department, Indian Institute of Technology (Indian School of Mines) Dhanbad, Dhanbad 826004, India

CORRESPONDING AUTHOR: T. S. ROY (e-mail: tanmoysinha.roy@gmail.com)

ABSTRACT Cardiovascular diseases are growing rapidly worldwide; around 70% of the world's population suffers from them. This research work covers both the classification and the analysis of heart sounds. We define a new squeeze network-based deep learning model, convolutional random forest (Conv-RF), for real-time valvular heart sound classification and analysis using an industrial Raspberry Pi 4B. The proposed electronic stethoscope is Internet enabled using an ESP32 and a Raspberry Pi. The said Internet of Things (IoT)-based model is also low cost and portable and can reach distant remote places where doctors are not available. For the classification part, multiclass classification is performed for seven types of valvular heart sounds. The RF classifier scored the best accuracy among the ensemble methods evaluated on small training sets. The CNN-based squeeze net model achieved a decent accuracy of 98.65% after its hyperparameters were optimized for heart sound analysis. The proposed IoT-based model overcomes the drawbacks faced individually by both the squeeze network and the RF. Combined, the CNN-based squeeze net model and the RF classifier improve classification accuracy: the squeeze net model performs feature extraction on the heart sound, and the RF classifier predicts class labels in the class prediction layer. Experimental results on several datasets, such as the Kaggle dataset, the PhysioNet challenge, and the Pascal challenge, showed that the Conv-RF model works best. The proposed IoT-based Conv-RF model is also applied to selected subjects of different age groups and genders with a history of heart disease. The Conv-RF method scored an accuracy of 99.37 ± 0.05% on the different test datasets, with a sensitivity of 99.5 ± 0.12% and a specificity of 98.9 ± 0.03%. The proposed model is also compared with current state-of-the-art models in terms of accuracy.

INDEX TERMS Cardiovascular disorder, convolutional neural network, electronic stethoscope, ensemble
learning, PCG signal, random forest (RF), Raspberry Pi, squeeze network.

I. INTRODUCTION
DEEP learning finds many applications in natural language processing (NLP), speech recognition, pattern recognition, image analysis, and medical image diagnosis. In medical image diagnosis, heart sound diagnosis and early screening prove to be very effective and challenging as well. Sparse networks find better applications in heart sound analysis, as they are better suited to learning local features in a heart sound than deep neural networks. Deep networks generally have many added layers, which, in turn, makes their time complexity higher.
© 2023 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/




FIGURE 1. Bagging ensemble method.

FIGURE 2. Stacking ensemble method.

We defined our work based on the following two hypotheses.
1) A squeeze network-based CNN model is used as the feature-learning part, which scored very high accuracy in heart sound analysis problems.
2) The random forest (RF) algorithm is used for the classification part, which attained better accuracy in classifying heart sound disorders than single decision tree (DT) and gradient boosting (GB) methods.
In classification problems, ensemble methods are primarily used and are more suitable than other supervised machine learning algorithms.
The objective of this research work is to develop an Internet-enabled, low-cost, portable, and accurate Raspberry Pi-based electronic stethoscope system. The proposed stethoscope system can be applied to subjects even in remote places for the measurement and analysis of valvular heart sounds. The said stethoscope is also ear contactless, since auscultation can be done through a Bluetooth-connected microspeaker.
Mainly, the ensemble methods can be of three types.
1) Bagging.
2) Stacking.
3) Boosting.
In the bagging algorithm, the input training set is subdivided into different random samples, and each sample is fed to a DT. The results of the DTs are combined through voting, and finally, the output is generated, as presented in Fig. 1. In the stacking ensemble method, the input training set is fed to different models, and their predictions are combined to produce the final predicted output from the ensemble model, as shown in Fig. 2. In the boosting algorithm, the input training set is fed to different models sequentially; the effectiveness of each model is improved through the previous model's output prediction. Finally, all the predicted outputs from the models are combined to produce the final predicted output, as given in Fig. 3.

FIGURE 3. Boosting ensemble method.
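For orientation, the three ensemble styles might be instantiated as in the following sketch, assuming scikit-learn (version ≥ 1.2 for the `estimator` keyword); the synthetic data stands in for heart sound features and is not from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data; 85/15 split mirrors the split used later in the paper.
X, y = make_classification(n_samples=500, n_features=20, n_informative=8, n_classes=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

# Bagging: random resamples of the training set, one DT per resample,
# with outputs combined by voting (Fig. 1).
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100)

# Stacking: several base models whose predictions a meta-model combines (Fig. 2).
stacking = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression())

# Boosting: models trained sequentially, each improving on its predecessor (Fig. 3).
boosting = GradientBoostingClassifier(n_estimators=100)

for model in (bagging, stacking, boosting):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```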
Ensemble machine learning methods like RF, DT, GB, and extreme GB (XGB) have found a special place in classification problems, as they achieve a reasonable amount of accuracy on large training datasets together with low training time. Supervised machine learning methods like Naïve Bayes (NB), support vector machine (SVM), and multilayer perceptron (MLP) are also used in classification problems, but they are not suitable for large training datasets.

II. PAPER ORGANIZATION
Section III provides a literature study of different deep learning methods used in heart sound analysis. Section IV briefs about the methods and materials used in this research paper. Section V highlights the result analysis of the research work. Eventually, Section VI summarizes the conclusions and future scope of the research.

III. LITERATURE REVIEW
In this section, a detailed discussion on heart sound analysis is carried out for classification. As per the literature



survey, many research works have been carried out in this area. Dwivedi et al. [1] did a study on methods for automation in heart sound analysis and classification. Many researchers have worked on various techniques for the classification of heart sounds. Mishra et al. [2] did research on identifying different segments of heart sounds for the identification of basic heart sound segments by applying the CNN method; this study has limited scope for the analysis of S3 heart sounds. Mishra et al. [3] derived a novel method for the separation of heart and lung sounds of the phonocardiogram signal for the identification of the S3 heart sound; however, this method has a few limitations in real-time applications. Muduli et al. [4] derived a novel algorithm for the extraction of biomedical signals from noisy measurements by sparse recovery analysis, and Barma et al. [5] did research on the study of S2 heart sounds, which involves calculating the time period and the energy of normalized cardiac sounds; however, they could not distinguish heart sounds. Dewangan et al. [6] and Mishra et al. [7] did research work on heart sound analysis using the wavelet transform algorithm, which has specific restrictions in online PCG signal analysis. Othman and Khaleel [8] worked on PCG signal analysis using the Shannon energy envelope and DWT features, but the method could not distinguish heart sounds accurately. Lubaib and Muneer [9] worked on heart defect analysis and classification using pattern recognition techniques, but the adopted technique is entirely based on echo imaging. Singh and Cheema [10] worked on PCG signal analysis using classification by feature extraction, but the method has limited scope in real-time applications. Ahmad et al. [11] worked on heart sound analysis using a soft computing-based fuzzy classifier model based on Mamdani-type fuzzy computation; the authors worked on a standard cardiac sound repository using an offline method that was not validated with human subjects. Roy et al. [12] reviewed the papers on discrimination of cardiac signals. Gupta et al. [13] researched the various stages in cardiac sound signal processing for heart sound analysis. There is no significant prior work on designing a real-time screening device for valvular diseases. Apart from these works, a few other current research works related to CNN-based PCG signal analysis and classification are highlighted in Table 1.

IV. METHODS AND MATERIALS
A. CARDIAC SOUND BANK DESCRIPTION
Heart sound [50] samples used for the valvular heart sound analysis have been taken from four heart sound repositories, as in [50] and [51] by T. S. Roy et al. in 2023.
1) https://github.com/yaseen21khan/Classification-of-Heart-Sound-Signal-Using-Multiple-Features [15], [17], [50]: A brief description of this cardiac sound bank is shown as Heart Sound Dataset 1 in Table 2. Five types of cardiac sound samples are considered, namely, NS, MR, MS, MVP, and AS. Every cardiac sound lasts for a time duration of 5–10 s, with a sampling frequency of 44 000 Hz and a bandwidth of 65–500 Hz.
2) The Kaggle dataset [49], [50], [51] is also used for the valvular heart sound analysis. Kaggle's heart sound repository contains a collection of NS and heart sound data containing murmurs.
Cardiac Sound Dataset 2, as mentioned in Table 3, is obtained from the Pascal Heart Sound Repository, Dataset B [15], [36], [50]. The cardiac sound samples last for a time duration of 3–9 s, have a sampling frequency of 45 000 Hz, and have a bandwidth of 68–512 Hz.
Table 4 highlights a detailed description of the PhysioNet Challenge Training Set [16], [37], [50], which contains six training databases (A through F) comprising a collection of 3128 cardiac sample data. The cardiac sound samples last for a time duration of 4–8 s, with a sampling frequency of 47 000 Hz and a bandwidth of 72–492 Hz.

B. METHODOLOGY ADOPTED IN THE PCG SIGNAL ANALYSIS
The methodology used in the cardiac sound analysis is described as follows.
Fig. 4 describes the schematic diagram of heart sound signal classification [22], [30], [50]. The preprocessing block comprises normalization and filtering operations. The normalized heart sound goes to the filtering block, where a bandpass filter of bandwidth 70–450 Hz is used to filter out unwanted background noise. A time frame of 3 s is considered for every heart sound. Various in-depth features are captured from the preprocessed signal, and eventually, categorization of the cardiac sound signal is done for validation of the adopted model:

s′(t) = f(s(t))     (1)

where s′(t) is the preprocessed cardiac sound data.
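The preprocessing stage of (1) — amplitude normalization followed by 70–450-Hz bandpass filtering over 3-s frames — might be sketched as follows. This is a minimal sketch assuming SciPy and NumPy, not the authors' exact code; the filter order and framing scheme are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(x, fs, lo=70.0, hi=450.0, frame_s=3.0):
    # Normalize amplitude to [-1, 1].
    x = x / (np.max(np.abs(x)) + 1e-12)
    # Zero-phase bandpass, 70-450 Hz, 4th-order Butterworth (order assumed).
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    x = filtfilt(b, a, x)
    # Cut the filtered signal into non-overlapping 3-s analysis frames.
    n = int(frame_s * fs)
    frames = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    return np.array(frames)
```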
The heart sound samples [50] are split into train data (85%) and test data (15%). Second, the training samples are again split into validation data (15%) and the remainder for training the software model.
Fig. 5(a) highlights the block diagram of the display unit, and Fig. 5(b) is the experimental setup of the hardware system, which uses a conventional stethoscope chest piece, preamplifiers, filters, a 7-inch touch-screen LED display, and an industrial Raspberry Pi 4B.
Fig. 6 provides the schematic representation of the proposed hardware model. The PCG signal is captured through the PCG signal acquisition module, which contains a stethoscope chest piece, preamplifiers, filters, and buffer amplifier circuits, followed by the processing and display unit, the industrial Raspberry Pi 4B.
Fig. 7 is the block diagram of the proposed electronic stethoscope, which contains the input sensor unit and the computational unit. The real-time PCG signal is captured with a chest piece and microphone. The captured PCG signal goes through the

TABLE 1. Literature study of some recently published works.

TABLE 2. Cardiac sound dataset1 [50], [51].

TABLE 4. Cardiac sound dataset3 [50], [51].

TABLE 3. Cardiac sound dataset2 [50], [51].

preamplifier unit of gain 20, followed by a 50-Hz notch filter to reject electrical interference. The processed signal is fed to an analog tunable band-pass filter with a 32–472-Hz spectrum. The PCG signal [51], [52] generally lies within 32–472 Hz for normal and abnormal sounds.
The unity-gain buffer amplifier is selected for impedance matching. The signal-conditioned output goes to the NodeMCU (ESP32), which is WiFi enabled and contains a 12-bit ADC with a sampling frequency of 44.1 kHz. Finally, the converted digital PCG signal [19], [33] goes to the Raspberry Pi over WiFi for further signal processing and analysis. The classified heart sound is displayed on a 7-inch LCD screen attached to the Raspberry Pi and is heard on a Bluetooth-enabled microspeaker.
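As an illustration of this ESP32-to-Raspberry Pi link (the MQTT transport is described in the next subsection), the following is a minimal sketch of the Pi-side subscriber, assuming the paho-mqtt 1.x Python package and a local Mosquitto broker; the topic name and payload framing are hypothetical, not the paper's exact format.

```python
import numpy as np
import paho.mqtt.client as mqtt  # assumes paho-mqtt 1.x

TOPIC = "pcg/frames"  # hypothetical topic published by the ESP32 client

def on_message(client, userdata, msg):
    # Assumed framing: little-endian 16-bit words holding 12-bit ADC samples.
    samples = np.frombuffer(msg.payload, dtype="<u2")
    print(f"received {samples.size} PCG samples on {msg.topic}")

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)  # broker runs on the Raspberry Pi itself
client.subscribe(TOPIC)
client.loop_forever()
```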



FIGURE 4. Methodology used in heart sound classification.

FIGURE 5. (a) Raspberry Pi attached with 7-inch display unit. (b) Experimental setup of the hardware system.

FIGURE 6. Schematic representation of the proposed electronic stethoscope system.

Preprocessing Unit: Preamplifier, notch filter, bandpass filter, and unity-gain buffer.
Feature Extraction Unit:
Acoustic Features: MFCCs, Mel, chroma, contrast, and Tonnetz.
Time-Domain Features: RMS, signal energy, signal power, ZCR, THD, skewness, and kurtosis.
Frequency-Domain Features: DWT.
Classification Unit: The proposed CNN model, built in Python 3.9.2, stored and implemented on the Raspberry Pi 4B.
The valvular cardiac samples used for the study of PCG signal analysis are broadly categorized into the following.
1) NS.
2) MS.
3) AS.
4) MR.
5) AR.
6) MVP.
7) EXT.
Acoustic stethoscopes based on sensors such as diaphragms and piezoelectric crystals work on the principle of converting sound pressure into electrical energy. They suffer from distortion in the output electrical signal, so electronic stethoscopes are developed that incorporate capacitive electret microphone sensors for better efficiency and stability, as given in Fig. 7.
Message queuing telemetry transport (MQTT) is a standard protocol used for sending the processed heart sound data to the Raspberry Pi (MQTT broker) from the ESP32 (MQTT client) over a WiFi connection. The MQTT broker and subscriber are on the same device (the Raspberry Pi). The input sensor unit and computational unit are connected through WiFi and are Internet enabled, as shown in Fig. 8; thus, they form an Internet of Things (IoT) system.
Features [13] of the cardiac sound used for the overall analysis are as follows.
1) Acoustic Features: MFCCs, Mel, chroma, contrast, and Tonnetz.
2) Time-Domain Features: RMS, signal energy, signal power, ZCR, THD, skewness, and kurtosis.
The deep learning method proposed to categorize the heart sounds is as follows.
1) Proposed Conv-RF Method: The machine learning methods proposed to classify the heart sounds are as follows.
a) RF.
b) GB.
c) XGB.
All deep learning-based algorithms are written in Python 3.9.2 using the Thonny Python editor (Linux). A brief description of the proposed algorithm is given under the software development of the proposed deep learning model.
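A minimal sketch of the feature extraction unit listed above, assuming the librosa, SciPy, and PyWavelets packages (THD is omitted for brevity; the function choices and the db4 wavelet are reasonable stand-ins, not the authors' exact code):

```python
import numpy as np
import librosa
import pywt
from scipy.stats import kurtosis, skew

def extract_features(path):
    y, sr = librosa.load(path, sr=None)  # keep the native sampling rate
    acoustic = {
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1),
        "mel": librosa.feature.melspectrogram(y=y, sr=sr).mean(axis=1),
        "chroma": librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1),
        "contrast": librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1),
        "tonnetz": librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr).mean(axis=1),
    }
    time_domain = {
        "rms": float(librosa.feature.rms(y=y).mean()),
        "energy": float(np.sum(y**2)),
        "power": float(np.mean(y**2)),
        "zcr": float(librosa.feature.zero_crossing_rate(y).mean()),
        "skewness": float(skew(y)),
        "kurtosis": float(kurtosis(y)),
    }
    # Frequency domain: DWT sub-band energies (db4, 4 levels assumed).
    dwt = [float(np.sum(c**2)) for c in pywt.wavedec(y, "db4", level=4)]
    return acoustic, time_domain, dwt
```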

C. PROPOSED CONV-RANDOM FOREST LEARNING ALGORITHM
Let X = {(xj, yj); 1 ≤ j ≤ T}, where T denotes the length of the training data set, xj = [x1, x2, . . . , xN] is the set of N feature values in ℝ^N or ℝ^(√N×√N), and yj denotes the class name of vector xj. The proposed method for the Conv-RF is explained as follows.


FIGURE 7. Schematic of the input sensor unit and computational unit used in the proposed electronic stethoscope.

FIGURE 8. Data transmission via WiFi between the ESP32 and the Raspberry Pi using the MQTT protocol.

1) Declare the training data set, X = {(xj, yj); 1 ≤ j ≤ T}.
2) If required, pad the N items of every training data element, xj, with zeros so that each sample can be framed into a matrix of shape √N × √N.
3) Transform xj into the required shape, (√N, √N, Z(l)).
4) Fix the metrics of the convolution as:
a) the total number of conv layers, L;
b) the output depth, Z;
c) in every layer, the filter sizes, K(l); and
d) the filter strides, sk(l).
5) In every conv layer, denoted by l, a convolution and an additive bias are applied to the input for every feature map f ∈ {1, . . . , f(l)}. Thus, the output γi(l) of the lth layer for the ith feature map is computed from the output of the earlier layer, γi(l − 1). For every layer l in 1 . . . L, compute the convolutions to produce γi(l):

γi(l) = φ( Bi(l) + Σ_{j=1}^{f(l−1)} kij(l) ∗ γj(l − 1) )

where φ denotes the ReLU activation function, Bi(l) is a bias matrix, and kij(l) is a filter of size (2wk + 1) × (2hk + 1).
6) Convert γi(l) to a vector γ̃i(l) of size (√N × √N × Z(l)).
7) Form a new training sample set for the class prediction layer: Xnew = {(γ̃i(l), yj); 1 ≤ j ≤ T}.
8) In the reshape layer, random subsets are drawn from the given dataset.
9) Next, the proposed method produces a DT for each drawn random subset and obtains the predicted output from each formed DT.
10) Voting is carried out over all predicted outcomes.
11) Eventually, the maximum-voted prediction is taken as the final predicted outcome.
12) Computing the time complexity of the classifier:
Train-time complexity of RF = O(d · log(d) · c · f), where
c = number of DTs;
d = number of sound samples in the training set;
f = number of features in the sound samples.
Inference-time complexity of RF = O(depth of tree · c), with depth of tree = 5.
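Steps 1)–12) can be read as a two-stage pipeline: a CNN that learns feature vectors and an RF that votes over them. The following is a minimal sketch, assuming TensorFlow/Keras and scikit-learn, with a deliberately small illustrative CNN standing in for the full squeeze network of Section IV-D and random placeholder arrays in place of real heart sound features; only n_estimators=800 and max_depth=5 echo values quoted in the text.

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 16x16 "feature matrices" (sqrt(N) x sqrt(N), step 2), 7 classes.
X_train = np.random.rand(200, 16, 16, 1).astype("float32")
y_train = np.random.randint(0, 7, 200)
X_test = np.random.rand(50, 16, 16, 1).astype("float32")

# Feature-learning part (steps 3-6): an illustrative small CNN.
inputs = tf.keras.Input(shape=X_train.shape[1:])
x = tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.GlobalAveragePooling2D(name="features")(x)
outputs = tf.keras.layers.Dense(7, activation="softmax")(x)
cnn = tf.keras.Model(inputs, outputs)
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
cnn.fit(X_train, y_train, epochs=5, validation_split=0.15, verbose=0)

# Class prediction part (steps 7-11): RF voting over the learned feature vectors.
extractor = tf.keras.Model(cnn.input, cnn.get_layer("features").output)
rf = RandomForestClassifier(n_estimators=800, max_depth=5)
rf.fit(extractor.predict(X_train, verbose=0), y_train)
y_pred = rf.predict(extractor.predict(X_test, verbose=0))
```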



FIGURE 9. (a) Squeeze net block. (b) Squeeze filters and expand filters.

FIGURE 10. Proposed CNN-based squeeze net system description.

The time complexity of the RF classifier plays an important part in determining the total floating-point operations per second (FLOPs) and the number of trainable parameters in the proposed model.

D. PROPOSED CNN-BASED SQUEEZE NETWORK
A CNN-based Squeeze Net [22], [31] contains many intermediate layers and fire modules between the input and output nodes. The said CNN model can work with any real-world problem having a large amount of data. Neural networks [8], [17], [47] are very helpful in providing answers to the challenges witnessed in real life. A deep network reacts to the inputs provided, performs difficult computations on them, and eventually generates output. Besides this, backpropagation is the working algorithm for training these deep learning models [23], [30]. The skeleton structure of the convolutional neural network-based deep learning network is provided in Fig. 10. The model summary is presented after training and validation of the dataset through this proposed model.
Fig. 9(a) is the block diagram of a fire block used in a squeeze network, which combines a squeeze filter and an expand filter; it can be seen that the outputs from both filters get concatenated. Fig. 9(b) provides the skeleton of the squeeze filter and expand filter used. The squeeze filter contains three 1×1 convolutions, whereas the expand filter comprises four 1×1 convolutions and four 3×3 convolutions. Basically, it is a sparsely connected network having a maxpool and multiple convolutions of kernel sizes 1, 3, and 5 at the same layer, followed by a concatenation of all filter outputs.
Table 5 provides the proposed Squeeze Net architecture using five fire blocks, two convolutional layers with a ReLU activation function, two maxpool layers, an input layer, and an output layer with a softmax activation function.
Fig. 10 provides the entire architecture of the proposed Squeeze Net model used for the valvular heart sound analysis. In this model, two convolutional layers are considered, three maxpool layers are used, and five fire blocks have been implemented, followed by one global average pool layer and an output layer with a softmax activation function.
The third and fourth hidden layers contain fire block modules, followed by the fifth layer, a maxpool layer. The sixth and seventh hidden layers contain fire block modules, again followed by a maxpool layer and a fire block module. The second conv layer is followed by the global average pool layer. The output layer comprises five nodes using the softmax activation function to categorize five classes of cardiac sounds [24], [29].
Characteristic plots are taken for the proposed system with normal and abnormal cardiac sound samples, and these are provided in Figs. 11 and 12.
Fig. 11 presents the cross-entropy loss of the adopted squeeze network during the training and validation stages applied on dataset 1. The curve of cross-entropy loss versus the number of epochs (i.e., 100) shows that the loss reduces as the number of epochs grows during the training and validation stages for the proposed squeeze network.


TABLE 5. Proposed Squeeze Net layers.
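As a concrete rendering of the fire-block pattern of Fig. 9 and the stacking summarized in Table 5, a minimal Keras sketch might look as follows; the filter counts, input size, and five-class head are illustrative assumptions rather than the paper's exact values.

```python
import tensorflow as tf

def fire_block(x, squeeze_filters, expand_filters):
    # Squeeze stage: 1x1 convolutions reduce channel depth.
    s = tf.keras.layers.Conv2D(squeeze_filters, 1, activation="relu", padding="same")(x)
    # Expand stage: parallel 1x1 and 3x3 convolutions whose outputs are concatenated.
    e1 = tf.keras.layers.Conv2D(expand_filters, 1, activation="relu", padding="same")(s)
    e3 = tf.keras.layers.Conv2D(expand_filters, 3, activation="relu", padding="same")(s)
    return tf.keras.layers.Concatenate()([e1, e3])

inputs = tf.keras.Input(shape=(64, 64, 1))  # assumed input size
x = tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same")(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
for sq, ex in [(16, 64), (16, 64), (32, 128), (32, 128), (48, 192)]:  # five fire blocks
    x = fire_block(x, sq, ex)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)  # five cardiac classes
model = tf.keras.Model(inputs, outputs)
```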

FIGURE 12. Characteristic curve of average accuracy versus epoch.

FIGURE 13. Flow diagram of RF.

Fig. 12 is the accuracy of the improved squeeze network during the training and validation stages applied on dataset 1. The curve of accuracy versus the number of epochs (i.e., 100) shows that accuracy rises close to 1 as the number of epochs grows during the training and validation stages for the proposed CNN system.
Table 6 represents the effectiveness of the proposed CNN-based squeeze network, which achieved an accuracy of 98.65%.

FIGURE 11. Characteristic curve of loss versus epoch during training and validation in the CNN-based squeeze net.

TABLE 6. Description of squeeze network model effectiveness.

E. RANDOM FOREST ALGORITHM
RF is an established and popular supervised ensemble machine learning method used mainly in real-time classification fields [22], [25], [50]. This method is also used to test its effectiveness in classifying normal and abnormal cardiac samples.
In Fig. 13, a cardiac sound repository is split into training and test data. The training data is further decomposed into 800 data samples, called estimators, for producing predictions. The eventual predicted result for valvular cardiac sound [50] is generated via the mean of all the predictions obtained through the estimators.
Fig. 14 provides [50] the accuracy of the modified RF system during the training and validation stage. The curve of accuracy versus training sample size depicts that accuracy during the training and validation stages converges toward one as the training sample data grows for the improved RF model.
The effectiveness of the RF system is evaluated, and various parameters are selected for the computation of its efficiency, as provided in Table 7. In Section V, by studying



the results of Figs. 20 and 21, the model shall categorize the cardiac samples more effectively, with better accuracy than the other classifiers.

FIGURE 14. Characteristic curve of accuracy versus epoch in RF.

TABLE 7. Description of RF model effectiveness.

F. ARCHITECTURE OF THE ADOPTED CONV-RANDOM FOREST
The proposed model of Conv-RF is given in Fig. 15. The proposed model contains six layers: 1) input layer; 2) data preprocessing layer; 3) conv layers; 4) reshape layer; 5) class prediction layer; and 6) output layer.
All layers are decomposed into two sections: one for feature extraction and another for the prediction of the class labels. Each layer has a different function.
Feature Extraction Section: This section extracts the important information from the data samples undergoing training. It contains three layers: 1) the input; 2) data preprocessing; and 3) conv layers. The prediction accuracy of the proposed system relies on efficient feature learning. Details of each layer are explained as follows.
1) Input Layer: The input layer takes input from the standard heart sound samples in the proposed model. It is considered that a training data set, X, contains a set of tuples (xj, yj), where j denotes the index of the data set, xj is a √N × √N feature matrix, and yj stands for the class label considered for vector xj. If the training set is in the mentioned shape, it is passed directly to the conv layers of the feature extraction stage; else, it is transformed in the data preprocessing layer.
2) Data Preprocessing Layer: In the data preprocessing layer, a square matrix shape is considered for the tensors in the convolution block. If the input data is not in matrix shape, with size √N × √N, it will be converted by adding zeros as required. Various data types are also transformed in this layer.
3) Convolutional Layer: The convolutional layers play a significant role in this proposed model. They are responsible for learning the features of the input by applying the convolution method of the Squeeze network. The data is basically a tensor with shape (√N, √N, z(l)), where z(l) is the total number of filters in the lth layer:

γi(l) = φ( Bi(l) + Σ_{j=1}^{f(l−1)} kij(l) ∗ γj(l − 1) ).

The pooling layer changes the convolution layer outcome. It downsamples and restricts the overfitting of the proposed model. It modifies the outcome with the highest or mean value within a rectangular window in square matrix form. For instance, if (γli)m,n is an output of the earlier layer with φ an activation function, then P(·) is a pooling function that acts on γli by passing it through a pooling method with stride Sp and a wp × hp pooling window. Typically, pooling works by placing windows at nonoverlapping places in every feature map and taking one item per window, so that the feature maps are subsampled. Two types of pooling are mainly used: 1) average pooling and 2) max pooling. In max pooling, the highest value of every window is considered. Hence, the output of a max-pooling function is

P(γli)m,n = max (γli)m,n

where the max is taken over the max-pooling window of the mentioned shape.
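As a concrete illustration of non-overlapping max pooling, the following NumPy sketch subsamples a feature map exactly as described; the 2×2 window and the sample values are illustrative.

```python
import numpy as np

def max_pool2d(feature_map, wp=2, hp=2):
    # Non-overlapping wp x hp windows; one maximum kept per window,
    # so an (H, W) map is subsampled to (H // hp, W // wp).
    H, W = feature_map.shape
    trimmed = feature_map[:H - H % hp, :W - W % wp]
    windows = trimmed.reshape(H // hp, hp, W // wp, wp)
    return windows.max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [0, 1, 3, 5]])
print(max_pool2d(x))  # [[6 4]
                      #  [7 9]]
```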
Predicting the Class Labels: RF is an ensemble algorithm that is better than a single DT because it decreases overfitting by averaging the outcomes. It also works much better with a large number of data elements than a single DT does.
Reshape Layer: This layer converts the convolutional layer's tensor output into the required vector form.
Class Prediction Layer: The major task of this layer is to predict the class using RF.
Output Layer: The output layer receives the predicted class result, and depending on that, the accuracy of the model can be computed.
Fig. 15 is the architectural block diagram of the convolutional-RF model. The proposed Conv-RF model comprises a feature learning part and a class label prediction part. Table 8 provides a summary of the volunteers of different age groups and genders with past medical history.
Table 9 highlights the PCG signal analysis made on selected volunteers in different postures, such as sitting, standing, and supine, at different locations like the upper right sternal border (URSB), upper left sternal border (ULSB), and lower left sternal border (LLSB).
Table 10 is the analysis of the PCG recordings made with the developed stethoscope on the selected volunteers with


FIGURE 15. Architectural block diagram of the proposed Conv-RF model.

TABLE 8. Description of the volunteers [50], [51].

FIGURE 16. Operational process flow diagram in the proposed Conv-RF model.

past medical history. The comparison of the past medical history of the volunteers with the results obtained from the said stethoscope is carried out. For evaluation purposes, a score value is obtained in the range of 1 to 5.
Fig. 16 shows the operational process flow diagram of the proposed Conv-RF model, which highlights the different tensor transformations in the input convolutional layer and reshape layer. Table 11 is the architecture of the proposed



TABLE 9. Proposed conv-RF model applied on test volunteers [50], [51].

Conv-RF model adopted in the valvular cardiac sound analysis.
Fig. 17 presents the accuracy of the said Conv-RF system during the training and validation stages applied on dataset 2. The curve of accuracy versus the number of epochs shows that accuracy rises as the number of epochs grows for the adopted model.
Fig. 18 presents the cross-entropy loss of the adopted Conv-RF model during the training and validation stages applied on dataset 2. The curve of cross-entropy loss versus the number of epochs shows that the loss reduces as the number of epochs grows for the used CNN model.

V. RESULT ANALYSIS
Experimental Methods and Results: In this experiment, various standard datasets have been considered, and fivefold cross-validation is used for the training and test data samples. The results from each testing set are recorded, and the average value is computed, as shown in the figures below.
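The fivefold protocol described above might look like the following sketch, assuming scikit-learn; the RF hyperparameters echo the 800 estimators and depth 5 mentioned earlier, and the synthetic X, y stand in for the prepared features and labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder features and labels for the seven valvular classes.
X, y = make_classification(n_samples=1000, n_features=26, n_informative=10, n_classes=7)

rf = RandomForestClassifier(n_estimators=800, max_depth=5, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=cv)  # one accuracy per fold
print("fold accuracies:", np.round(scores, 4), "mean:", scores.mean())
```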
The ensemble RF classifier result is compared with other ensemble methods, such as the single DT, GB, and XGB methods. The result obtained with the proposed convolutional-RF algorithm is compared with different CNN-based models like LeNet-5, AlexNet, VGG16, VGG19, DenseNet121, Inception Net, Residual Net, Xception Net, and ConvXGB. It is found that the Conv-RF method gives the best result


TABLE 10. Analysis of PCG recordings produced using the proposed stethoscope with past medical history [50], [51].

TABLE 11. Proposed conv-RF architecture.

in terms of valvular heart sound analysis. Figs. 19 and 20 highlight the comparison of the RF algorithm with other ensemble algorithms and other models for the different datasets used in this article. Fig. 21 is the comparison of the developed Conv-RF model with other CNN-based models for different datasets.
The electronic stethoscopes available in the market cost around €300–€399. The total expenditure incurred for the



development of the proposed system for predicting heart diseases is only around €220. Since auscultation is done through a Bluetooth-enabled speaker, it is safe to use for patients as well as for health professionals. The said stethoscope is also very easy to use, as it is AI enabled. Table 12 is the comparison study of the developed stethoscope with other stethoscopes based on various factors like use, price, safety and protection, digital storage, etc.

FIGURE 17. Characteristic curve of accuracy versus epoch in Conv-RF.

FIGURE 18. Characteristic curve of loss versus epoch during training and validation.

FIGURE 19. RF model versus other ensemble classifier models for different datasets.

FIGURE 20. RF model versus other models for different datasets.

TABLE 12. Comparative study of the developed stethoscope with other stethoscopes.

Table 13 provides the comparison study of the runtime (in seconds) of the Conv-RF model with the RF method and the Squeeze Net-based CNN model for the different datasets used in the heart sound analysis. The runtime environment of the proposed model is the Raspbian operating system with the Thonny IDE on a Raspberry Pi 4B using Python version 3.9.2.
Table 14 provides the comparison study of the runtime (in seconds) of the Conv-RF model with other ensemble learning methods like the DT, GB, and XGB methods. The proposed model is also compared with SVM and MLPs for the different datasets used in the valvular heart sound analysis.

VI. CONCLUSION AND FUTURE SCOPE
Experimental data showed that the Conv-RF model provides decent outputs in terms of accuracy, sensitivity, recall, and F1-score. It is also observed that the runtime (in seconds) of the Conv-RF method is the lowest among all the compared methods for the different datasets used. The lowest runtime is very helpful for fast early screening of any kind of valvular cardiac disorder. The proposed modified squeeze network is highly compatible with the ensemble RF method in terms of their respective architectural breakdowns. A few limitations of the proposed model: the ambient noise cannot be removed to a full extent, and movement of the microphone chest piece cable generates noise during auscultation.


FIGURE 21. Conv-RF model versus other models for different datasets.

TABLE 13. Runtime of the proposed Conv-RF model compared with other models.

TABLE 14. Runtime of the proposed Conv-RF model compared with other ensemble methods.

Some of the future scopes of this research: the auscultation time, which is currently close to 2 min, requires further minimization, and more volunteers with clinical assessment are needed for statistical validation of the developed model.

VII. DISCUSSION
In the hardware development part, this work deals with the design of an Internet-enabled electronic stethoscope. The proposed electronic stethoscope is based on the combination of a Raspberry Pi and an ESP32.
The signal-conditioned output goes to the NodeMCU (ESP32), which is WiFi enabled and contains a 12-bit ADC with a sampling frequency of 44.1 kHz. Finally, the converted digital PCG signal goes to the Raspberry Pi over WiFi for further signal processing and analysis. The classified heart sound is displayed on a 7-inch LCD screen attached to the Raspberry Pi and is heard on a Bluetooth-enabled microspeaker.



In the software development part, a novel CNN-based convolutional-RF algorithm is developed. The squeeze network is used as the CNN model for feature extraction, and the RF method is used as the classifier in the output part. The combination of the two proved to be a decent valvular heart sound classification algorithm in this field.

DECLARATIONS
The authors have worked with the same type of heart sounds that were used in [50] and [51] by T. S. Roy et al. in 2023. The same set of online available heart sound repositories is used in this paper for training the proposed Conv-RF, the CNN-based deep learning model used in [50] and [51].
Analysis of the captured heart sound recordings is done on the same set of volunteers, using the same positions and postures as in [50] and [51] by T. S. Roy et al. in 2023. The postures considered are sitting, standing, and supine. The positions considered are the upper left sternal border (ULSB), upper right sternal border (URSB), and lower left sternal border (LLSB).

AUTHOR CONTRIBUTIONS
All authors contributed to the algorithm conception and design. Experimental evaluation was also performed by all authors. Eventually, all authors have gone through and confirmed the final manuscript.

DECLARATION OF COMPETING INTEREST
The authors confirm no conflict of interest.

FUNDING
Not applicable, as the entire funding for this research work was provided by the authors themselves.

DATA AVAILABILITY STATEMENT
All heart sound data samples used for training, validation, and testing of the proposed model are available in standard heart sound repositories:
1) Yaseen et al. [15]: https://github.com/yaseen21khan/Classification-of-Heart-Sound-Signal-Using-Multiple-Features-
2) 2016 PhysioNet heart sound recording challenge [32]: https://physionet.org/content/challenge-2016/1.0.0/
3) 2012 Pascal heart sound recording challenge [21]: http://www.peterjbentley.com/heartchallenge/
4) Kaggle heartbeat sounds [49]: https://www.kaggle.com/datasets/kinguistics/heartbeat-sounds

REFERENCES
[1] A. K. Dwivedi, S. A. Imtiaz, and E. Rodriguez-Villegas, "Algorithms for automatic analysis and classification of heart sounds—A systematic review," IEEE Access, vol. 7, pp. 8316–8345, 2019.
[2] M. Mishra, H. Menon, and A. Mukherjee, "Characterization of S1 and S2 heart sounds using stacked autoencoder and convolutional neural network," IEEE Trans. Instrum. Meas., vol. 68, no. 9, pp. 3211–3220, Sep. 2019.
[3] M. Mishra, S. Banerjee, D. C. Thomas, S. Dutta, and A. Mukherjee, "Detection of third heart sound using variational mode decomposition," IEEE Trans. Instrum. Meas., vol. 67, no. 7, pp. 1713–1721, Jul. 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8306138
[4] P. R. Muduli, A. K. Mandal, and A. Mukherjee, "An anti-noise-folding algorithm for the recovery of biomedical signals from noisy measurements," IEEE Trans. Instrum. Meas., vol. 66, no. 11, pp. 2909–2916, Nov. 2017, doi: 10.1109/TIM.2017.2734018.
[5] S. Barma, B.-W. Chen, W. Ji, F. Jiang, and J.-F. Wang, "Measurement of duration, energy of instantaneous-frequencies, and splits of subcomponents of the second heart sound," IEEE Trans. Instrum. Meas., vol. 64, no. 7, pp. 1958–1967, Jul. 2015.
[6] N. K. Dewangan, S. P. Shukla, and K. Dewangan, "PCG signal analysis using discrete wavelet transform," Int. J. Adv. Manage. Technol. Eng. Sci., vol. 8, no. 3, pp. 412–417, 2018.
[7] G. Mishra, K. Biswal, and A. K. Mishra, "Denoising of heart sound signal using wavelet transform," Int. J. Res. Eng. Technol., vol. 2, no. 4, pp. 719–723, 2013.
[8] M. Z. Othman and A. N. Khaleel, "Phonocardiogram signal analysis for murmur diagnosing using Shannon energy envelop and sequenced DWT decomposition," J. Eng. Sci. Technol., vol. 12, no. 9, pp. 2393–2402, 2017.
[9] P. Lubaib and K. V. A. Muneer, "The heart defect analysis based on PCG signals using pattern recognition techniques," in Proc. Int. Conf. Emerg. Trends Eng., Sci. Technol., vol. 24, 2016, pp. 1024–1031.
[10] M. Singh and A. Cheema, "Heart sounds classification using feature extraction of phonocardiography signal," Int. J. Comput. Appl., vol. 77, no. 4, pp. 13–17, 2013.
[11] T. J. Ahmad, H. Ali, and S. A. Khan, "Classification of phonocardiogram using an adaptive fuzzy inference system," in Proc. Int. Conf. Image Process., Comput. Vis., Pattern Recognit., vol. 2, 2009, pp. 1–6.
[12] A. K. Roy, A. Misal, and G. R. Sinha, "Classification of PCG signals: A survey," Int. J. Comput. Appl., Recent Adv. Inf. Technol., vol. 83, no. 2, pp. 22–26, Jan. 2014.
[13] C. N. Gupta, R. Palaniappan, S. Rajan, S. Swaminathan, and S. M. Krishnan, "Segmentation and classification of heart sounds," in Proc. Can. Conf. Electr. Comput. Eng., Jun. 2005, pp. 1674–1677, doi: 10.1109/CCECE.2005.1557305.
[14] A. Mdhaffar, I. B. Rodriguez, K. Charfi, L. Abid, and B. Freisleben, "CEP4HFP: Complex event processing for heart failure prediction," IEEE Trans. NanoBiosci., vol. 16, no. 8, pp. 708–717, Dec. 2017, doi: 10.1109/TNB.2017.2769671.
[15] Yaseen, G.-Y. Son, and S. Kwon, "Classification of heart sound signal using multiple features," Appl. Sci., vol. 8, no. 12, p. 2344, 2018, doi: 10.3390/app8122344.
[16] "Your first deep learning project in Python with Keras step-by-step." 2022. [Online]. Available: https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
[17] B. Xiao, Y. Xu, X. Bi, J. Zhang, and X. Ma, "Heart sounds classification using a novel 1-D convolutional neural network with extremely low parameter consumption," Neurocomputing, vol. 392, pp. 153–159, Jun. 2020, doi: 10.1016/j.neucom.2018.09.101.
[18] G. V. H. Prasad and P. R. Kumar, "Analysis of various DWT methods for feature extracted PCG signals," Int. J. Eng. Res. Technol., vol. 4, no. 4, pp. 1279–1290, 2015.
[19] Anju and S. Kumar, "Detection of cardiac murmur," Int. J. Comput. Sci. Mobile Comput., vol. 3, no. 7, pp. 76–80, 2014.
[20] J. K. Roy and T. S. Roy, "A simple technique for heart sound detection and real-time analysis," in Proc. Int. Conf. Sens. Technol. (ICST), 2017, pp. 1–7, doi: 10.1109/ICSensT.2017.8304502.
[21] E. F. Gomes, P. J. Bentley, M. Coimbra, E. Pereira, and Y. Deng, "Classifying heart sounds: Approaches to the PASCAL challenge," in Proc. Int. Conf. Health Informat., Barcelona, Spain, Feb. 2013, pp. 337–340.
[22] B. Bozkurt, I. Germanakis, and Y. Stylianou, "A study of time–frequency features for CNN based automatic heart sound classification for pathology detection," Comput. Biol. Med., vol. 100, pp. 132–143, Sep. 2018, doi: 10.1016/j.compbiomed.2018.06.026.
[23] J. Muruganantham, R. Amarnath, K. V. Jawahar, and C. Kalyanasundaram, "Methods for classification of phonocardiogram," in Proc. Conf. Convergent Technol. Asia–Pacific Region, vol. 4, Bengaluru, India, 2003, pp. 1514–1515.


[24] T. H. Chowdhury, K. N. Poudel, and Y. Hu, "Time-frequency analysis, denoising, compression, segmentation, and classification of PCG signals," IEEE Access, vol. 8, pp. 160882–160890, 2020.
[25] J. K. Roy, T. S. Roy, and S. C. Mukhopadhyay, "Heart sound: Detection and analytical approach towards diseases," in Modern Sensing Technologies, S. Mukhopadhyay, K. Jayasundera, and O. Postolache, Eds. Cham, Switzerland: Springer, 2019, pp. 103–145. [Online]. Available: https://doi.org/10.1007/978-3-319-99540-3_7
[26] F. Li, H. Tang, S. Shang, K. Mathiak, and F. Cong, "Classification of heart sounds using convolutional neural network," Appl. Sci., vol. 10, no. 11, p. 3956, 2020, doi: 10.3390/app10113956.
[27] J. K. Roy, T. S. Roy, N. Mandal, and O. A. Postolache, "A simple technique for heart sound detection and identification using Kalman filter in real time analysis," in Proc. Int. Symp. Sens. Instrum. IoT Era (ISSI), 2018, pp. 1–8.
[28] D. B. Springer, L. Tarassenko, and G. D. Clifford, "Support vector machine hidden semi-Markov model-based heart sound segmentation," in Proc. Comput. Cardiol., 2014, pp. 625–628.
[29] A. Cheema and M. Singh, "Steps involved in heart sound analysis—A review of existing trends," Int. J. Eng. Trends Technol., vol. 4, no. 7, pp. 2921–2925, 2013.
[30] J. B. Wu, S. Zhou, Z. Wu, and X. M. Wu, "Research on the method of characteristic extraction and classification of phonocardiogram," in Proc. Int. Conf. Syst. Informat. (ICSAI), 2012, pp. 1732–1735.
[31] C. D. Papadaniil and L. J. Hadjileontiadis, "Efficient heart sound segmentation and extraction using ensemble empirical mode decomposition and kurtosis features," IEEE J. Biomed. Health Inform., vol. 18, no. 4, pp. 1138–1152, Jul. 2014.
[32] C. Liu et al., "An open-access database for the evaluation of heart sound algorithms," Physiol. Meas., vol. 37, no. 12, pp. 2181–2213, 2016. [Online]. Available: https://iopscience.iop.org/article/10.1088/0967-3334/37/12/2181
[33] M. Deng, T. Meng, J. Cao, S. Wang, J. Zhang, and H. Fan, "Heart sound classification based on improved MFCC features and convolutional recurrent neural networks," Neural Netw., vol. 130, pp. 22–32, Oct. 2020.
[34] Z. Abduh, E. A. Nehary, M. A. Wahed, and Y. M. Kadah, "Classification of heart sounds using fractional Fourier transform based mel-frequency spectral coefficients and traditional classifiers," Biomed. Signal Process. Control, vol. 57, Mar. 2020, Art. no. 101788.
[35] T. Alafif, M. Boulares, A. Barnawi, T. Alafif, H. Althobaiti, and A. Alferaidi, "Normal and abnormal heart rates recognition using transfer learning," in Proc. 12th Int. Conf. Knowl. Syst. Eng. (KSE), 2020, pp. 275–280.
[36] F. Demir, A. Sengür, V. Bajaj, and K. Polat, "Towards the classification of heart sounds based on convolutional deep neural network," Health Inf. Sci. Syst., vol. 7, p. 16, Aug. 2019.
[37] B. Xiao et al., "Follow the sound of children's heart: A deep-learning-based computer-aided pediatric CHDs diagnosis system," IEEE Internet Things J., vol. 7, no. 3, pp. 1994–2004, Mar. 2020.
[38] F. A. Khan, A. Abid, and M. S. Khan, "Automatic heart sound classification from segmented/unsegmented phonocardiogram signals using time and frequency features," Physiol. Meas., vol. 41, no. 5, 2020, Art. no. 55006.
[39] A. Raza, A. Mehmood, S. Ullah, M. Ahmad, G. S. Choi, and B.-W. On, "Heartbeat sound signal classification using deep learning," Sensors, vol. 19, no. 21, p. 4819, 2019.
[40] H. Ryu, J. Park, and H. Shin, "Classification of heart sound recordings using convolution neural network," in Proc. Comput. Cardiol. Conf. (CinC), Vancouver, BC, Canada, Sep. 2016, pp. 1153–1156.
[41] J. Rubin, R. Abreu, A. Ganguli, S. Nelaturi, I. Matei, and K. Sricharan, "Classifying heart sound recordings using deep convolutional neural networks and mel-frequency cepstral coefficients," in Proc. Comput. Cardiol. Conf. (CinC), Vancouver, BC, Canada, 2016, pp. 813–816.
[42] V. Maknickas and A. Maknickas, "Recognition of normal-abnormal phonocardiographic signals using deep convolutional neural networks and mel-frequency spectral coefficients," Physiol. Meas., vol. 38, no. 8, pp. 1671–1684, 2017.
[43] F. Li et al., "Feature extraction and classification of heart sound using 1D convolutional neural networks," EURASIP J. Adv. Signal Process., vol. 10, p. 59, Dec. 2019.
[44] J. M.-T. Wu et al., "Applying an ensemble convolutional neural network with Savitzky–Golay filter to construct a phonocardiogram prediction model," Appl. Soft Comput., vol. 78, pp. 29–40, May 2019.
[45] T. C. Yang and H. Hsieh, "Classification of acoustic physiological signals based on deep learning neural networks with augmented features," in Proc. Comput. Cardiol. Conf. (CinC), Vancouver, BC, Canada, Sep. 2016, pp. 569–572.
[46] Q. Suo et al., "Deep patient similarity learning for personalized healthcare," IEEE Trans. NanoBiosci., vol. 17, no. 3, pp. 219–227, Jul. 2018, doi: 10.1109/TNB.2018.2837622.
[47] A. Mario et al., "Cardiac conduction model for generating 12 lead ECG signals with realistic heart rate dynamics," IEEE Trans. NanoBiosci., vol. 17, no. 4, pp. 525–532, Oct. 2018.
[48] D. Li, M. Huang, X. Li, Y. Ruan, and L. Yao, "MfeCNN: Mixture feature embedding convolutional neural network for data mapping," IEEE Trans. NanoBiosci., vol. 17, no. 3, pp. 165–171, Jul. 2018, doi: 10.1109/TNB.2018.2841053.
[49] "Kaggle heartbeat sounds." kaggle.com. Accessed: Feb. 28, 2021. [Online]. Available: https://www.kaggle.com/datasets/kinguistics/heartbeat-sounds
[50] T. S. Roy, J. K. Roy, and N. Mandal, "Classifier identification using deep learning and machine learning algorithms for the detection of valvular heart diseases," Biomed. Eng. Adv., vol. 3, Jun. 2022, Art. no. 100035.
[51] T. S. Roy, J. K. Roy, and N. Mandal, "Design of ear-contactless stethoscope and improvement in the performance of deep learning based on CNN to classify the heart sound," Med. Biol. Eng. Comput., vol. 61, pp. 2417–2439, Apr. 2023. [Online]. Available: https://doi.org/10.1007/s11517-023-02827-w
[52] T. S. Roy, J. K. Roy, and N. Mandal, "Early screening of valvular heart disease prediction using CNN-based mobile network," in Proc. Int. Conf. Comput., Electr. Commun. Eng. (ICCECE), Kolkata, India, 2023, pp. 1–8, doi: 10.1109/ICCECE51049.2023.10085513.

TANMAY SINHA ROY was born in India, in 1988. He received the B.Tech. degree in instrumentation and control engineering and the M.Tech. degree in applied electronics and instrumentation engineering from the West Bengal University of Technology, Kolkata, India, in 2009 and 2011, respectively. He is currently pursuing the Ph.D. degree with the Indian Institute of Technology (Indian School of Mines) Dhanbad, Dhanbad, India.
He is currently an Assistant Professor with the Electrical Engineering Department, Haldia Institute of Technology, West Bengal University of Technology. His research interests include PCG signal analysis, developing systems for heart sound acquisition, instrumentation and control, and designing low-cost acoustic stethoscopes for diseased patients.



JOYANTA KUMAR ROY (Senior Member, IEEE) received the Ph.D. degree from Calcutta University, Kolkata, India, in 2005.
He has been an Electronics and Automation Engineer for the last 40 years as a Company Director, Consulting Engineer, Developer, Researcher, and Educationist. He is a Visiting Professor with the Narula Institute of Technology, Kolkata, a Company Director with System Advance Technologies, Kolkata, and a Freelance Consultant with a number of industries, giving design support toward intelligent technology in the water sector. He has published a significant number of scientific and technical publications in the form of books, book chapters, design documents, and research papers. His research interests include the development of smart measurement and control systems, multifunction sensors, IoT-based health and technology-assisted living, and smart homes and cities.
Dr. Roy is an Associate Editor of the International Journal on Smart Sensing and Intelligent Systems. He is a Senior Member of IET, a Fellow Member of IETE and IWWA, and a regular reviewer for IEEE and Springer journals.

NIRUPAMA MANDAL (Senior Member, IEEE) received the Ph.D. degree from Calcutta University, Kolkata, India, in 2012.
She is currently an Associate Professor with the Department of Electronics Engineering, Indian Institute of Technology (Indian School of Mines) Dhanbad, Dhanbad, India. She was the Head of the Department of Electronics and Instrumentation Engineering, Asansol Engineering College, Asansol, India, in 2013. Her research interests include transducer development, controller design, process plant instrumentation, process modeling, smart sensing systems, and smart instrumentation.
Dr. Mandal received the National Scholarship Award from the Government of India in 2001.
