A Comparative Analysis of Deep Neural Network Models Using Transfer Learning For Electrocardiogram Signal Classification
28th 2021
Authorized licensed use limited to: REVA UNIVERSITY. Downloaded on July 04,2022 at 06:34:31 UTC from IEEE Xplore. Restrictions apply.
database, having a total of 7376 ECG signals, were studied for classification. A high recognition performance of 99.39% was achieved [4].

A 34-layer convolutional neural network for detecting a wide range of heart arrhythmias from electrocardiograms recorded with a single-lead wearable monitor was proposed. A large dataset of 64,121 ECG records from 29,163 patients was used in this work. The CNN model maps a sequence of ECG samples to a sequence of rhythm classes. The performance of the CNN model was compared with that of 6 individual cardiologists, and the proposed model exceeded the average cardiologist performance in both sensitivity and precision [5].

Fig. 2. ECG signals corresponding to ARR, CHF and NSR conditions

Automatic extraction and identification of complex and intricate features of images is the main advantage provided by Deep Neural Networks (DNNs), which eliminates the manual feature extraction required in conventional Machine Learning (ML) methodologies. This advantage provides an opportunity for creating an end-to-end pipeline, having the ECG signal at the input end and the result of classification at the output. The ECG classification task can be binary or multi-class. However, higher accuracy and human-level output can be achieved only when a large amount of data is available for extracting intricate ECG features and learning from a variety of inputs.

One reason predominantly hampering the wide utilization of deep learning methods is the non-availability of the large volumes of data required to learn and perform ECG classification. Compared with conventional ML-based classification approaches, DNNs and CNNs require a very large amount of data for training. This leads to a gap between the intricate ECG features that must be learned and the size of the available datasets, since the openly accessible datasets in the ECG analysis domain are limited in volume [5].

In order to overcome the above-mentioned issue, this work proposes a multi-class ECG classification method utilizing transfer learning from classes outside the ECG domain. Particularly, instead of training the CNNs from scratch using ECG data, architectures pre-trained on data associated with image classification and object recognition (1000 classes like chair, mouse, table lamp etc.) are utilized. Due to the availability of huge datasets in these domains, efficient training and extraction of feature maps representative of complex features and patterns in the images is possible. Such learnt feature maps can be transferred for ECG classification purposes only if we can transform 1-dimensional ECG signal samples into a 2-dimensional image of the ECG signal, i.e. a scalogram, using the Continuous Wavelet Transform (CWT). Experiments reveal that AlexNet, GoogLeNet, and SqueezeNet pre-trained on the ImageNet database can effectively extract features using a very small dataset of 900 ECG signal scalograms. This paper presents the results obtained after performing the experiments with the pre-trained CNN architectures.

The sections illustrating the work are arranged as follows: Section II presents the dataset utilized and the pre-processing steps; Section III illustrates the detailed methodology using the CNN architectures adopted in this work; Section IV analyses the collected experimental findings; and Section V concludes the paper by presenting the final remarks.

II. RAW DATA AND PRE-PROCESSING

For the purpose of comparative performance analysis of three CNN models, publicly available physiological data from PhysioNet was utilized. Samples of various ECG recordings were extracted from the MIT-BIH Arrhythmia Database [6], [7], the MIT-BIH Normal Sinus Rhythm Database [6] and the BIDMC Congestive Heart Failure Database [6]. In total, 162 ECG recordings were used for this study, of which 30 belonged to people having congestive heart failure (CHF), 96 to people having arrhythmia (ARR) and 36 to people having normal sinus rhythm (NSR). A total of 65,536 samples were present in each recording. Fig. 3 shows plots of three randomly selected records of each type with 3000 samples per type. For efficient training of the CNN models, the size of the training dataset was increased by splitting each ECG recording into segments of 500 samples. An equal number of recordings of each type (CHF, ARR, and NSR) were used to obtain the corresponding scalograms to ensure an equal distribution of the different labels.

Fig. 3. Plots of three random ECG signals of each type

III. METHODOLOGY

A. CONTINUOUS WAVELET TRANSFORM & WAVELET COEFFICIENTS

Due to the non-stationary behavior of the ECG signal (i.e. its frequency components vary with time), the Fourier transform is not suitable for obtaining the ECG signal’s time-dependent
amplitude and frequency information. The Wavelet Transform, in which the signal is expressed using constituent wavelets of different “scales” and “positions” (unlike the constituent sinusoids of different frequencies in the Fourier transform), is a promising tool for investigating the non-stationary characteristics of the ECG signal [8]. A waveform or signal having finite time duration and an average value of zero can be thought of as a wavelet. The mathematical expressions for the Continuous Wavelet Transform (CWT) of a signal $f(t)$ and for the wavelet are given in (1) and (2), respectively.

$$CWT(a, b) = \langle f, \Psi_{a,b} \rangle = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \Psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{1}$$

A total of 900 scalograms, 300 of each type (ARR, CHF and NSR), were obtained using a custom function in MATLAB. Further, the images were re-sized appropriately, as the three models have different input image size requirements in pixels. Table I shows the input image size accepted by the CNN models under consideration.

For training and validation of the three CNN models, a holdout approach with an 80-20 split of the dataset (i.e. 80% for training and 20% for testing) was used.

TABLE I. INPUT IMAGE SIZE REQUIREMENTS OF THREE CNN MODELS

MODELS      INPUT IMAGE SIZE (PIXELS)
AlexNet     227 x 227
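
The pre-processing described in Sections II and III-A (splitting a recording into 500-sample segments and turning a segment into a scalogram through the CWT of (1)) can be sketched as follows. This is only a minimal illustration, not the custom MATLAB function used in this work: the complex Morlet mother wavelet, its centre frequency `w0 = 6`, the scale values and the synthetic sine input are all assumptions, and the helper names `segment`, `morlet`, `cwt` and `scalogram` are hypothetical.

```python
import cmath
import math


def segment(recording, length=500):
    """Split a recording into consecutive, non-overlapping segments of `length` samples."""
    n_full = len(recording) // length
    return [recording[i * length:(i + 1) * length] for i in range(n_full)]


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (an assumed choice of wavelet)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt(signal, scales):
    """Discretised form of (1): CWT(a, b) = 1/sqrt(a) * sum_t f(t) * conj(psi((t - b) / a))."""
    n = len(signal)
    coeffs = []
    for a in scales:
        # The Gaussian envelope is negligible beyond about 4 scale widths.
        half = int(math.ceil(4 * a))
        row = []
        for b in range(n):
            acc = 0j
            for t in range(max(0, b - half), min(n, b + half + 1)):
                acc += signal[t] * morlet((t - b) / a).conjugate()
            row.append(acc / math.sqrt(a))
        coeffs.append(row)
    return coeffs


def scalogram(signal, scales):
    """Magnitude of the wavelet coefficients: one row per scale, one column per position."""
    return [[abs(c) for c in row] for row in cwt(signal, scales)]


# Synthetic stand-in for an ECG recording: a 0.05 cycles/sample sine (not real ECG data).
recording = [math.sin(2 * math.pi * 0.05 * k) for k in range(1200)]
segments = segment(recording)  # two full 500-sample segments; the 200-sample tail is dropped
scales = [5.0, 6.0 / (2 * math.pi * 0.05), 40.0]  # the middle scale matches the sine frequency
image = scalogram(segments[0], scales)  # 3 rows (scales) x 500 columns (positions)
```

In this sketch the row of `image` whose scale matches the sine frequency carries the largest magnitudes. Mapping the magnitudes to a colour image and re-sizing it to the network input size (for example 227 x 227 for AlexNet, per Table I) are omitted.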
$$g(x) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases} \tag{3}$$
parameters. Further, using the deep compression technique, the model can be compressed from 4.8 MB to just 0.47 MB. Such small DNN models can be easily implemented on hardware with limited resources, including Field Programmable Gate Arrays (FPGAs). Fig. 8 shows the architecture of SqueezeNet, whose fundamental component is the Fire module. A Fire module contains a squeeze convolution layer having 1×1 filters, which feeds an expand layer having 1×1 and 3×3 filters. As we move from the beginning to the end of the network, the number of filters per Fire module gradually increases. All Fire modules use the ReLU activation function. At the output end of the architecture, a new dropout layer with a probability of 0.6, a new convolutional layer with the number of filters set to 3, and a new classification layer replaced the respective original layers in order to classify the three distinct ECG signal types.
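
The effect of the squeeze layer on model size can be illustrated with a small parameter count. The sketch below is an assumption-laden illustration rather than a description of the network trained in this work: the channel and filter counts (96 input channels, 16 squeeze filters, 64 + 64 expand filters, 512 channels entering a 1×1 three-filter output convolution) are illustrative values, and the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters of a Fire module: a 1x1 squeeze layer feeding 1x1 and 3x3 expand layers."""
    return (conv_params(in_ch, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))


def fire_out_channels(expand1x1, expand3x3):
    """The two expand outputs are concatenated along the channel axis."""
    return expand1x1 + expand3x3


# Illustrative sizes (assumed, not taken from this paper).
fire = fire_params(96, 16, 64, 64)
# A plain 3x3 convolution producing the same number of output channels.
plain = conv_params(96, fire_out_channels(64, 64), 3)
# New output convolution with 3 filters (one per ECG class); 1x1 kernel and 512 inputs assumed.
head = conv_params(512, 3, 1)
```

With these assumed sizes the Fire module needs 11,920 parameters against 110,720 for the plain 3×3 layer, because the 3×3 filters only see the 16 squeezed channels instead of all 96 input channels.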
Fig. 6. AlexNet Architecture [11]
D. GOOGLENET
GoogLeNet [12] was designed by a team of researchers at Google, and its performance is reported to be very close to that of humans. GoogLeNet, also known as Inception, won the ILSVRC-2014 competition. GoogLeNet was inspired by LeNet-5 [13]. Particularly, GoogLeNet uses the Inception module, shown in Fig. 7, as its main structural component, providing the benefit of convolution filters of different sizes. The Inception module comprises convolution and max-pooling layers, arranged in parallel, which work together to combine their respective feature maps.
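
Combining the feature maps of parallel branches requires every branch to produce the same spatial size, after which the maps are concatenated along the channel axis. The shape arithmetic can be sketched as below, assuming stride-1 branches with padding chosen to preserve size; the branch widths (64, 128, 32 and 32 channels on a 28 x 28 input) are illustrative values, and the helper names are hypothetical.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def inception_shape(n, c1x1, c3x3, c5x5, pool_proj):
    """Output (size, channels) of an Inception-style module applied to an n x n input."""
    sizes = {
        "1x1 conv": out_size(n, 1),
        "3x3 conv": out_size(n, 3, padding=1),
        "5x5 conv": out_size(n, 5, padding=2),
        "3x3 max-pool": out_size(n, 3, padding=1),
    }
    # Branches can be concatenated only if they share one spatial size.
    assert len(set(sizes.values())) == 1, sizes
    return sizes["1x1 conv"], c1x1 + c3x3 + c5x5 + pool_proj


# Illustrative branch widths (assumed, not taken from this paper).
size, channels = inception_shape(28, c1x1=64, c3x3=128, c5x5=32, pool_proj=32)
```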
SqueezeNet, respectively, were obtained. The accuracy and loss graphs for the three CNN models are presented in Fig. 9. As can be seen from Fig. 9 (a), the loss curve of AlexNet fluctuates less than those of GoogLeNet and SqueezeNet, suggesting that the loss decreased steadily with every epoch and reached the lowest value at the end of 10 epochs. The same inference can be drawn from the accuracy curve: AlexNet showed the fewest dips in accuracy during the initial epochs compared with GoogLeNet and SqueezeNet, reaching an accuracy of 97.8% at the last epoch.

AlexNet outperformed the other two models and achieved the highest validation accuracy. With respect to training time, GoogLeNet took the longest to train, while AlexNet took the least.
Fig. 9 (b). Accuracy and Loss graphs of GoogLeNet

Fig. 9 (c). Accuracy and Loss graphs of SqueezeNet

Precision, Recall and F1 Score were computed for evaluating the performance of the three models. All three are obtained from the confusion matrix. Fig. 10 shows the three-class confusion matrices obtained by the three models. A confusion matrix fundamentally yields four quantities: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). Correctly predicted ECG signal classes are positioned along the diagonal of the confusion matrix, whereas FP and FN are counted from the off-diagonal entries, which represent the incorrectly classified ECG signals. Precision is the number of ECG signals correctly classified as positive out of the total ECG signals identified as positive. Recall (also known as Sensitivity) is the number of ECG signals correctly identified as positive out of the total actual positives. The F1 Score represents the effectiveness of classification when equal importance is given to both recall and precision. To calculate the overall Precision, Recall, and F1 Score, the results of the three classes were macro-averaged.
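
The macro-averaging step can be sketched as follows. The confusion matrix below is made up for illustration and is not one of the matrices in Fig. 10; the sketch also assumes that rows hold the true class and columns the predicted class, and that the macro F1 Score is the mean of the per-class F1 Scores, neither of which is stated in the text. The function names are hypothetical.

```python
def per_class_metrics(cm):
    """Per-class (precision, recall, F1); cm[i][j] counts true class i predicted as class j."""
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        fn = sum(cm[c]) - tp                       # class c predicted as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results


def macro_average(cm):
    """Unweighted mean of the per-class precision, recall and F1."""
    metrics = per_class_metrics(cm)
    return tuple(sum(m[i] for m in metrics) / len(metrics) for i in range(3))


# Made-up 3-class confusion matrix (ARR, CHF, NSR); NOT the paper's results.
cm = [[58, 1, 1],
      [0, 60, 0],
      [2, 0, 58]]
precision, recall, f1 = macro_average(cm)
```

Because every class contributes equally to a macro-average, this choice suits the balanced dataset used here, where each class has the same number of scalograms.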
The Precision and Recall values for the three target ECG classes (ARR, CHF, and NSR) obtained by the three models are shown in Fig. 10. The overall performance metrics for the multi-class classification task are shown in Table III.
Fig. 10 (b). Confusion Matrix of GoogLeNet

Fig. 10 (c). Confusion Matrix of SqueezeNet

TABLE II. ACCURACY AND TRAINING TIME OF THREE MODELS

MODEL        VALIDATION ACCURACY (%)   TRAINING TIME
AlexNet      97.80                     22 min. 57 sec.
GoogLeNet    97.78                     69 min. 03 sec.
SqueezeNet   97.22                     25 min. 27 sec.

TABLE III. PERFORMANCE METRICS

MODEL        PRECISION   RECALL   F1 SCORE
AlexNet      0.977       0.978    0.977
GoogLeNet    0.978       0.977    0.977
SqueezeNet   0.973       0.972    0.973

CONCLUSION

In this work, three structurally different CNN models were modified and studied to evaluate the multi-class ECG signal classification task. Using the continuous wavelet transform and wavelet coefficients, ECG signal samples were successfully converted to two-dimensional images, which were then fed as input to the three models for training and classification purposes. Further, the importance of transfer learning and how it can be utilized for unrelated but similar tasks was studied. Advantages of such a methodology include minimal training time due to transfer learning, as the network is not trained from scratch for extraction and identification of intricate ECG features.

The obtained accuracy of the three models was close to human-level accuracy, which shows that the AlexNet, GoogLeNet and SqueezeNet architectures perform well for the multi-class ECG signal classification task, even with a small dataset of 900 scalogram images. Also, the results obtained are strongly influenced by the duration of the ECG signal, the number of samples taken per signal, the chosen wavelet and the continuous wavelet transform parameters (e.g., colour map).

REFERENCES

[1] A. Diker, E. Avci, Z. Cömert, D. Avci, E. Kaçar and İ. Serhatlioğlu, "Classification of ECG signal by using machine learning methods", 2018 26th Signal Processing and Communications Applications Conference (SIU), pp. 1-4, 2018.
[2] W. Zhang, L H. Wang et al., "A Low-Power High-Data-Transmission multi-lead ECG Acquisition Sensor System", IEEE Sensors J., vol. 19, no. 22, pp. 1-3, Nov. 2019.
[3] L. Deng and D. Yu, “Deep Learning: Methods and Applications,” Found. Trends® Signal Process., vol. 7, no. 3–4, pp. 197–387, 2014.
Douangnoulack, Phonethep, and Veera Boonjing. "Building Minimal Classification Rules for Breast Cancer Diagnosis." 2018 10th International Conference on Knowledge and Smart Technology (KST). IEEE, 2018.
[4] Özal Yildirim, A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification, Computers in Biology and Medicine, Volume 96, 2018, Pages 189-202, ISSN 0010-4825.
[5] Rajpurkar, Pranav & Hannun, Awni & Haghpanahi, Masoumeh & Bourn, Codie & Y. Ng, Andrew. (2017). Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks.
[6] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages; http://circ.ahajournals.org/content/101/23/e215.full]; 2000 (June 13).
[7] Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng in Med and Biol 20(3):45-50 (May-June 2001).
[8] S. Mallat, A wavelet tour of signal processing, Elsevier, 1999.
[9] Guo Z, Chen Q, Wu G, Xu Y, Shibasaki R, Shao X. Village Building Identification Based on Ensemble Convolutional Neural Networks. Sensors 17(11):2487, 2017.
[10] ImageNet. http://www.image-net.org
[11] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", Neural Information Processing Systems, vol. 25, no. 2, 2012.
[12] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[13] Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[14] Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv 2016; arXiv: 1602.07360.