A Comparative Analysis of Deep Neural Network Models Using Transfer Learning For Electrocardiogram Signal Classification

2021 6th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), August 27th & 28th 2021

2021 International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT) | 978-1-6654-3559-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/RTEICT52294.2021.9573692

Nitin Rahuja
Center for Control of Dynamical Systems and Computation
Dept. of Electrical Engineering
Delhi Technological University
Delhi-110042, India
nitinrahuja@gmail.com

Sudarshan K. Valluru
Center for Control of Dynamical Systems and Computation
Dept. of Electrical Engineering
Delhi Technological University
Delhi-110042, India
sudarshan_valluru@dce.ac.in

Abstract— With the growth of computational resources over the past decade, the adoption of deep learning methodologies has risen steeply. In particular, Convolutional Neural Network (CNN) based architectures are widely used for computer vision and image classification problems in healthcare research. In this scope, this paper presents a performance evaluation and comparison of three different CNN architectures for multi-class classification of Electrocardiogram (ECG) signals. Three distinct classes of ECG signal, i.e. Normal Sinus Rhythm (NSR), Arrhythmia (ARR), and Congestive Heart Failure (CHF), each representative of an individual's heart condition, are used in this work. The classification methodology uses the Continuous Wavelet Transform (CWT), which provides a 2-dimensional time-frequency representation of the available ECG signal samples. The ECG dataset was collected from different PhysioBank databases, which provide openly accessible medical research data. Images with the time-frequency representation of ECG signals were applied as input to three CNN architectures, namely AlexNet, GoogLeNet, and SqueezeNet. Using a transfer learning approach and modifying certain output layers of the three architectures, ECG classification was performed and the performance of the three CNN architectures was studied. The results revealed classification accuracies of 97.80%, 97.78% and 97.22% for AlexNet, GoogLeNet and SqueezeNet, respectively. AlexNet outperformed the GoogLeNet and SqueezeNet models in terms of accuracy as well as training time.

Keywords—Deep Learning, Electrocardiogram, Transfer Learning

I. INTRODUCTION

Cardiovascular diseases are among the top five causes accounting for the majority of deaths across the globe. With the increase in sedentary lifestyles among a large part of the population, the associated health problems, particularly heart-related problems, have also increased. Medical practitioners predominantly rely on the Electrocardiogram (ECG) as a tool for analyzing and interpreting an individual's heart condition. A healthy individual's heart activity can be characterized by distinct components such as the P-wave, PR interval, QRS complex, QT interval, and T-wave, as shown in Fig. 1. Coronary illnesses can be diagnosed and identified by studying the patterns and abnormalities in these waves. Owing to the advancement of deep learning methodologies in the past decade, computer-based diagnosis of various cardiac conditions is now widely observed.

Arrhythmia (ARR) refers to a heart condition that affects the heartbeat rate. One of its main causes is the improper generation of the electrical impulses that regulate heartbeats. Due to such improper electrical impulses, the heart beats either very slowly, extremely fast, or totally unpredictably. If appropriate medical treatment is not received, it can cause stroke, heart failure, or abrupt cardiac arrest.

Fig. 1. An ECG signal waveform

Normal Sinus Rhythm (NSR) represents the typical electrical activity of a healthy individual's heart. It indicates that the sinus node is generating and transmitting the electrical pulse appropriately. Congestive Heart Failure (CHF) is a chronic condition in which the blood pumping power of an individual's heart is significantly affected. The reduced pumping capacity of the heart is caused mainly by narrowed arteries in the heart and high blood pressure, which gradually make the heart extremely weak. Fig. 2 shows ECG signals corresponding to the three heart conditions.

Thus, it is important to detect the abnormalities in the ECG signal and classify them [1]. Precise detection of ECG irregularities helps physicians and health professionals take appropriate medical action [2].

Deep learning methods use neural networks having multiple layers of neurons. Deep learning accomplishes feature extraction by using many layers acting as processing units. Each layer, which extracts a particular feature, takes the output of the previous layer as its input [3]. Deep learning methodologies have outperformed previously existing methods in various pattern recognition competitions and have motivated the research community to use these state-of-the-art techniques in the field of biomedical image processing.

A deep bidirectional LSTM network using a new wavelet-based layer was proposed for classifying electrocardiogram (ECG) signals. The ECG signals were decomposed into frequency sub-bands at different scales using the wavelet-based layer. These sub-bands were used as input sequences for the LSTM networks. Five different types of heartbeats obtained from the MIT-BIH arrhythmia
database, comprising a total of 7376 ECG signals, were studied for classification. A high recognition performance of 99.39% was achieved [4].

A 34-layer convolutional neural network was proposed for detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor. A large dataset of 64,121 ECG records from 29,163 patients was used in that work. The CNN model maps a sequence of ECG samples to a sequence of rhythm classes. The performance of the CNN model was compared with that of 6 individual cardiologists, and the proposed model exceeded the average cardiologist performance in both sensitivity and precision [5].

Fig. 2. ECG signals corresponding to ARR, CHF and NSR conditions

The main advantage of Deep Neural Networks (DNNs) is the automatic extraction and identification of complex and intricate image features, which eliminates the manual feature extraction required in conventional Machine Learning (ML) methodologies. This advantage makes it possible to create an end-to-end pipeline, with the ECG signal at the input and the classification result at the output. The ECG classification task can be binary or multi-class. However, higher accuracy and human-level output can be achieved only when a large amount of data is available, so that intricate ECG features can be extracted and learned from a variety of inputs.

One factor that predominantly hampers the wide use of deep learning methods is the non-availability of the large volumes of data required to learn and perform ECG classification. Compared with conventional ML based classification approaches, DNNs and CNNs require a very large amount of data for training. This creates a gap between the data needed to learn intricate ECG features and the size of the available datasets, because the openly accessible datasets in the ECG analysis domain are small [5].

To overcome this issue, this work proposes a multi-class ECG classification method that uses transfer learning from a domain whose classes are unrelated to ECG. Specifically, instead of training the CNNs from scratch on ECG data, architectures pre-trained on data associated with image classification and object recognition (1000 classes such as chair, mouse, table lamp, etc.) are used. The availability of huge datasets in these domains allows efficient training and extraction of feature maps that represent complex features and patterns in images. Such learned feature maps can be transferred to ECG classification only if we can transform 1-dimensional ECG signal samples into a 2-dimensional image of the ECG signal, i.e. a scalogram, using the Continuous Wavelet Transform (CWT). Experiments reveal that AlexNet, GoogLeNet, and SqueezeNet pre-trained on the ImageNet database can act as effective feature extractors using a very small dataset of 900 ECG signal scalograms. This paper presents the results obtained from experiments with these pre-trained CNN architectures.

The rest of the paper is arranged as follows: Section II presents the dataset used and the pre-processing steps; Section III describes the detailed methodology and the CNN architectures adopted in this work; Section IV analyses the experimental findings; and Section V concludes the paper with final remarks.

II. RAW DATA AND PRE-PROCESSING

For the comparative performance analysis of the three CNN models, publicly available physiological data from PhysioNet was used. Samples of various ECG recordings were extracted from the MIT-BIH Arrhythmia Database [6], [7], the MIT-BIH Normal Sinus Rhythm Database [6] and the BIDMC Congestive Heart Failure Database [6]. In total, 162 ECG recordings were used for this study, of which 30 belonged to people having CHF, 96 to people having ARR and 36 to people having NSR. Each recording contained 65,536 samples. Fig. 3 shows plots of three randomly selected records of each type with 3000 samples per type. For efficient training of the CNN models, the size of the training dataset was increased by splitting each ECG recording into segments of 500 samples. An equal number of recordings of each type (CHF, ARR, and NSR) was used to obtain the corresponding scalograms, ensuring an equal distribution of the different labels.

Fig. 3. Plots of three random ECG signals of each type

III. METHODOLOGY

A. CONTINUOUS WAVELET TRANSFORM & WAVELET COEFFICIENTS

Due to the non-stationary behavior of the ECG signal (i.e. its frequency components vary with time), the Fourier transform is not suitable for obtaining the ECG signal's time-dependent
amplitude and frequency information. The Wavelet Transform, in which the signal is expressed using constituent wavelets of different "scales" and "positions" (unlike the constituent sinusoids of different frequencies in the Fourier transform), is a promising tool for investigating the non-stationary characteristics of the ECG signal [8]. A wavelet can be thought of as a waveform or signal having finite time duration and an average value of zero. The mathematical expressions for the Continuous Wavelet Transform (CWT) of a signal f(t) and for the wavelet are given in (1) and (2), respectively.

$$ CWT(a, b) = \langle f, \Psi_{a,b} \rangle = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \Psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{1} $$

where,

$$ \Psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \Psi\!\left(\frac{t-b}{a}\right) \tag{2} $$

Both the transformed signal $CWT(a, b)$ and the wavelet $\Psi_{a,b}(t)$ are functions of "a" and "b", i.e. the dilation (scale) parameter and the translation (position) parameter, respectively. The superscript "*" denotes the complex conjugate of the wavelet function. The Morlet wavelet, shown in Fig. 4, is used in this work.

Using the CWT, the correlation between the ECG signal and the wavelet function (here, Morlet) is analyzed over different time intervals, resulting in a Two-Dimensional (2D) representation known as a scalogram. Specifically, this work used the analytic Morlet (Gabor) wavelet with 12 wavelet bandpass filters per octave. Fig. 5 shows an ECG signal and its corresponding scalogram obtained using the CWT.

Fig. 4. Morlet Wavelet

A total of 900 scalograms, 300 of each type (ARR, CHF and NSR), were obtained using a custom function in MATLAB. The images were then resized appropriately, as the three models have different input image size requirements in pixels. Table I shows the input image size accepted by each CNN model under consideration.

For training and validation of the three CNN models, a holdout approach with an 80-20 split of the dataset (i.e. 80% for training and 20% for testing) was used.

TABLE I. INPUT IMAGE SIZE REQUIREMENTS OF THREE CNN MODELS

MODEL        INPUT IMAGE SIZE (PIXELS)
AlexNet      227 x 227
GoogLeNet    224 x 224
SqueezeNet   227 x 227

B. TRANSFER LEARNING

Generally, CNNs are not trained from scratch because the input data, computational resources and time required for training are very large. The transfer learning approach reuses the knowledge a network gained while training on one classification task to classify dissimilar but related classes of objects. Transfer learning refers to the process of making a machine learn a different but relevant task using a pre-trained model [9].

This paper uses CNN models pre-trained on the ImageNet [10] database, which has more than a million images. The CNN models were re-trained with the ECG scalogram dataset having three classes. In this work, we have replaced and tuned the output layers of the CNN models to make them suitable for ECG classification.

C. ALEXNET

AlexNet is a Convolutional Neural Network (CNN) model presented by Alex Krizhevsky et al. [11] that won the ImageNet challenge in 2012 (ILSVRC-2012). AlexNet was trained using two Graphics Processing Units (GPUs) on more than 1 million images, and tested on 150,000 test images. As a result, it can classify images into 1000 distinct classes. AlexNet is an 8-layer deep architecture comprising 5 convolutional layers combined with max-pooling layers and 3 fully connected layers. All layers except the output layer use the Rectified Linear Unit (ReLU) activation function. The output layer uses the softmax activation function. The ReLU activation function is described mathematically in equation (3), and Fig. 6 shows the architecture of AlexNet. A pre-trained AlexNet was modified and used in this work. Specifically, the last 3 layers of AlexNet, i.e. the fully connected layer, softmax layer and classification layer, were replaced and adapted for classifying the three distinct ECG classes under consideration.

$$ g(x) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases} \tag{3} $$
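As a minimal illustration of the ReLU in (3) and of a three-way softmax output such as the one the replaced output layers produce, the following Python sketch may help. The class ordering and the input values are arbitrary assumptions chosen for illustration; they are not taken from the trained networks.

```python
import math


def relu(x):
    """Rectified Linear Unit as in Eq. (3): 0 for x < 0, x otherwise."""
    return x if x >= 0 else 0.0


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class order and activations, for illustration only.
CLASSES = ("ARR", "CHF", "NSR")
activations = [relu(v) for v in (-1.5, 0.0, 2.0)]
probabilities = softmax(activations)
predicted = CLASSES[probabilities.index(max(probabilities))]
```

The predicted class is simply the one with the largest softmax probability.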

Fig. 5. Scalogram of ECG signal corresponding to Arrhythmia (ARR)
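A scalogram of the kind shown in Fig. 5 can be sketched by evaluating (1) numerically. The Python sketch below is not the custom MATLAB function used in this work: it assumes a unit sampling interval, a complex Morlet wavelet with centre frequency w0 = 6, truncation of the wavelet at four scale widths, and a synthetic tone in place of ECG data. Only the choice of 12 scales per octave follows the text; the function names are hypothetical.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet: a Gaussian-windowed complex exponential."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scales_per_octave(a_min, octaves, voices=12):
    """Geometric scale grid with `voices` scales per octave (12 in the text)."""
    return [a_min * 2.0 ** (j / voices) for j in range(octaves * voices + 1)]


def cwt_morlet(signal, scales, w0=6.0):
    """Discretised Eq. (1): one row of |CWT(a, b)| per scale a, one column per shift b."""
    n = len(signal)
    scalogram = []
    for a in scales:
        half = int(math.ceil(4.0 * a))  # the Gaussian envelope is negligible beyond ~4a
        # conj(Psi((t - b) / a)) / sqrt(a) for the offsets t - b = -half .. half
        kernel = [morlet(k / a, w0).conjugate() / math.sqrt(a)
                  for k in range(-half, half + 1)]
        row = []
        for b in range(n):
            acc = 0j
            for idx in range(2 * half + 1):
                t = b + idx - half
                if 0 <= t < n:  # samples outside the record are treated as zero
                    acc += signal[t] * kernel[idx]
            row.append(abs(acc))
        scalogram.append(row)
    return scalogram


# Synthetic test tone (not ECG data): 0.05 cycles per sample, 160 samples.
tone = [math.cos(2.0 * math.pi * 0.05 * t) for t in range(160)]
scales = scales_per_octave(a_min=4.0, octaves=3)
scalogram = cwt_morlet(tone, scales)
```

For a pure tone of angular frequency w, the magnitude peaks near the scale a = w0 / w, which is how rows of the scalogram relate to frequency. Rendering the magnitudes with a colour map and resizing the image to the sizes in Table I would be separate steps.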

Fig. 6. AlexNet Architecture [11]

D. GOOGLENET

GoogLeNet [12] was designed by a team of researchers at Google, and has performance capability very close to that of humans. GoogLeNet, also known as Inception, won the ILSVRC-2014 competition. GoogLeNet was inspired by LeNet-5 [13]. GoogLeNet uses the Inception module, shown in Fig. 7, as its main structural component, which provides the benefit of convolution filters of different sizes. The Inception module comprises convolution and max-pooling layers arranged in parallel, whose respective feature maps are combined.

Fig. 7. Inception Module of GoogLeNet [12]

By stacking 9 such Inception modules linearly along with other layers and functions, the 22-layer deep GoogLeNet architecture was obtained. This architecture is effective in reducing the number of parameters, i.e. from 60 million parameters in AlexNet to just 4 million parameters. As with AlexNet, the output layers were customized to deal with three ECG classes. Specifically, a new dropout layer with a probability of 0.6 replaced the final dropout layer, a new fully connected layer with the number of outputs set to 3 replaced the original fully connected layer, and a new classification layer without class labels was used instead of the original classification layer.

E. SQUEEZENET

Researchers at the University of California, Berkeley, Stanford and DeepScale introduced SqueezeNet [14] in 2016. SqueezeNet is an 18-layer deep, small CNN architecture that is capable of providing the same accuracy as AlexNet on the ImageNet database. It performs 3 times faster than AlexNet owing to 50 times fewer parameters. Further, using the deep compression technique, the model can be compressed from 4.8 MB to just 0.47 MB. Such small DNN models can be easily implemented on hardware with limited resources, including Field Programmable Gate Arrays (FPGAs). Fig. 8 shows the architecture of SqueezeNet with its Fire modules, which are the fundamental component of SqueezeNet. A Fire module contains a convolution layer (squeeze) having 1x1 filters, which provides input to an expand layer having 1x1 and 3x3 filters. From the beginning to the end of the network, the number of filters per Fire module gradually increases. All Fire modules use the ReLU activation function. At the output end of the architecture, a new dropout layer with a probability of 0.6, a new convolutional layer with the number of filters set to 3, and a new classification layer replaced the original respective layers for classifying the three distinct ECG signal types.

Fig. 8. Architecture of SqueezeNet [14]

IV. RESULTS

The hardware and software used for the experiment were a single CPU (Core i3) at 2.00 GHz, 8 GB of RAM, and MATLAB (2020a) as the simulation environment. To obtain the classification performance of the modified AlexNet, GoogLeNet, and SqueezeNet, the dataset was split 80% - 20% into training and test sets, respectively, and each model was then trained and tested on the multi-class ECG classification task.

The SGDM solver was used for training the models. The initial learning rate was set to 0.0001, the mini-batch size to 15 samples, and the maximum number of epochs to 10. Each modified model was run for a total of 480 iterations (720 training images / 15 images per mini-batch = 48 iterations per epoch, over 10 epochs). Table II presents the performance of the three CNN models. As per the results, validation accuracies of 97.80%, 97.78%, and 97.22% for AlexNet, GoogLeNet, and
SqueezeNet, respectively, were obtained. The accuracy and loss graphs for the three CNN models are presented in Fig. 9. As can be seen from Fig. 9 (a), the loss curve of AlexNet shows the least wiggle compared with the loss curves of GoogLeNet and SqueezeNet, suggesting that its loss improved most steadily with every epoch, giving the lowest loss value at the end of 10 epochs. The same inference can be drawn from the accuracy curves, where AlexNet showed the fewest dips at the initial epochs compared to GoogLeNet and SqueezeNet, reaching an accuracy of 97.8% at the last epoch.

AlexNet outperformed the other two models and achieved the highest validation accuracy. With respect to training time, GoogLeNet took the longest to train, while AlexNet took the least training time.
Precision, Recall and F1 Score were computed to evaluate the performance of the three models. These metrics are obtained from the confusion matrix. Fig. 10 shows the three-class confusion matrices obtained by the three models. A confusion matrix fundamentally contains four quantities: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). TP and TN correspond to the number of correctly predicted ECG signal classes and are positioned along the diagonal of the confusion matrix. Similarly, FP and FN represent the number of incorrectly classified ECG signal classes. Precision is the number of ECG signals correctly classified as positive out of the total ECG signals identified as positive, i.e. TP / (TP + FP). Recall (also known as Sensitivity) is the number of ECG signals correctly identified as positive out of the total actual positives, i.e. TP / (TP + FN). The F1 Score, the harmonic mean of Precision and Recall, represents the effectiveness of classification when equal importance is given to both. To calculate the overall Precision, Recall, and F1 Score, the results of the three classes were macro-averaged.

The Precision and Recall values for the three target ECG classes (ARR, CHF, and NSR) obtained by the three models are shown in Fig. 10. The resulting overall performance metrics for the multi-class classification task are shown in Table III.

Fig. 9 (a). Accuracy and Loss graphs of AlexNet

Fig. 9 (b). Accuracy and Loss graphs of GoogLeNet

Fig. 9 (c). Accuracy and Loss graphs of SqueezeNet

Fig. 10 (a). Confusion Matrix of AlexNet
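The macro-averaged Precision, Recall and F1 Score described above can be sketched in Python as follows. The confusion matrix used here is a made-up example with 60 validation images per class, not one of the matrices in Fig. 10. The row/column convention (rows are true classes, columns are predicted classes) and the choice to average the per-class F1 scores are assumptions of this sketch.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) per class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1))
    return metrics


def macro_average(metrics):
    """Unweighted mean of each metric over the classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Hypothetical confusion matrix (class order ARR, CHF, NSR), for illustration only.
example_cm = [
    [58, 1, 1],
    [2, 57, 1],
    [0, 0, 60],
]
class_metrics = per_class_metrics(example_cm)
macro_precision, macro_recall, macro_f1 = macro_average(class_metrics)
```

With equally sized classes, as in this example, the macro-averaged Recall equals the overall accuracy (here 175 correct out of 180).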

Fig. 10 (b). Confusion Matrix of GoogLeNet

Fig. 10 (c). Confusion Matrix of SqueezeNet

TABLE II. ACCURACY AND TRAINING TIME OF THREE MODELS

MODEL        VALIDATION ACCURACY (%)   TRAINING TIME
AlexNet      97.80                     22 min. 57 sec.
GoogLeNet    97.78                     69 min. 03 sec.
SqueezeNet   97.22                     25 min. 27 sec.

TABLE III. PERFORMANCE METRICS

MODEL        PRECISION   RECALL   F1 SCORE
AlexNet      0.977       0.978    0.977
GoogLeNet    0.978       0.977    0.977
SqueezeNet   0.973       0.972    0.973

V. CONCLUSION

In this work, three structurally different CNN models were modified and studied on the multi-class ECG signal classification task. Using the continuous wavelet transform and wavelet coefficients, ECG signal samples were successfully converted to two-dimensional images, which were then fed as input to the three models for training and classification. Further, the importance of transfer learning and how it can be used for unrelated but similar tasks was studied. An advantage of this methodology is the minimal training time due to transfer learning, as the network is not trained from scratch for the extraction and identification of intricate ECG features.

The accuracy obtained by the three models was close to human-level accuracy, which shows that the AlexNet, GoogLeNet and SqueezeNet architectures perform well on the multi-class ECG signal classification task, even with a small dataset of 900 scalogram images. The results obtained are also strongly influenced by the duration of the ECG signal, the number of samples taken per signal, the chosen wavelet and the continuous wavelet transform parameters (e.g., colour map).

REFERENCES

[1] A. Diker, E. Avci, Z. Cömert, D. Avci, E. Kaçar and İ. Serhatlioğlu, "Classification of ECG signal by using machine learning methods", 2018 26th Signal Processing and Communications Applications Conference (SIU), pp. 1-4, 2018.
[2] W. Zhang, L. H. Wang et al., "A Low-Power High-Data-Transmission multi-lead ECG Acquisition Sensor System", IEEE Sensors J., vol. 19, no. 22, pp. 1-3, Nov. 2019.
[3] L. Deng and D. Yu, "Deep Learning: Methods and Applications", Found. Trends Signal Process., vol. 7, no. 3-4, pp. 197-387, 2014.
P. Douangnoulack and V. Boonjing, "Building Minimal Classification Rules for Breast Cancer Diagnosis", 2018 10th International Conference on Knowledge and Smart Technology (KST), IEEE, 2018.
[4] Ö. Yildirim, "A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification", Computers in Biology and Medicine, vol. 96, pp. 189-202, 2018, ISSN 0010-4825.
[5] P. Rajpurkar, A. Hannun, M. Haghpanahi, C. Bourn and A. Y. Ng, "Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks", 2017.
[6] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng and H. E. Stanley, "PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals", Circulation, vol. 101, no. 23, pp. e215-e220, June 13, 2000 [Circulation Electronic Pages; http://circ.ahajournals.org/content/101/23/e215.full].
[7] G. B. Moody and R. G. Mark, "The impact of the MIT-BIH Arrhythmia Database", IEEE Eng. in Med. and Biol., vol. 20, no. 3, pp. 45-50, May-June 2001.
[8] S. Mallat, A wavelet tour of signal processing, Elsevier, 1999.
[9] Z. Guo, Q. Chen, G. Wu, Y. Xu, R. Shibasaki and X. Shao, "Village Building Identification Based on Ensemble Convolutional Neural Networks", Sensors, vol. 17, no. 11, 2487, 2017.
[10] ImageNet. http://www.image-net.org
[11] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Neural Information Processing Systems, vol. 25, no. 2, 2012.
[12] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going deeper with convolutions", IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, 2015.
[13] G. Huang et al., "Densely connected convolutional networks", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[14] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size", arXiv:1602.07360, 2016.
