
Journal of Biomedical Informatics 113 (2021) 103648


Original research

An hybrid ECG-based deep network for the early identification of high-risk to major cardiovascular events for hypertension patients
Giovanni Paragliola ∗, Antonio Coronato
National Research Council (CNR) - Institute for High-Performance Computing and Networking (ICAR), Naples, Italy

ARTICLE INFO

Keywords: eHealth; Deep learning; Time series classification; Early hypertension identification; Signal processing

ABSTRACT

Background and Objective: As the population becomes older and more overweight, the number of potential high-risk subjects with hypertension continues to increase. ICT technologies can provide valuable support for the early assessment of such cases, since the practice of conducting medical examinations for the early recognition of high-risk subjects affected by hypertension is difficult, time-consuming, and expensive.
Methods: This paper presents a novel time series-based approach for the early identification of increases in hypertension, discriminating between cardiovascular high-risk and low-risk hypertensive patients through the analysis of electrocardiographic Holter signals.
Results: The experimental results show that the proposed model achieves excellent classification accuracy compared with the state-of-the-art: it reaches an average accuracy of 98%, and sensitivity and specificity both reach an average value of 97%.
Conclusion: The analysis of the whole time series shows promising results in highlighting the tiny differences between subjects affected by hypertension.

1. Introduction

It is estimated that 33% of adults have hypertension and that one out of two of these people is unaware of their condition. It is reasonable, therefore, to suppose that regular cardiac disease screening examinations can facilitate an early diagnosis and reduce the risk of additional complications associated with cardiac disease, such as hypertension [1].

Although screening examinations are a conditio sine qua non for the early diagnosis of hypertension, we should bear in mind that the diagnostic process is more complicated than it seems [1]. Screening programs for hypertension certainly have benefits, but they may also have potential disadvantages (e.g., false positives, long waiting times for medical appointments, anxiety, psychological impacts, and economic costs) [2].

The diagnosis and medical treatment of hypertension play a significant role in decreasing the risk of cardiovascular disease [3]. However, the early recognition of symptoms in high-risk subjects affected by hypertension is quite difficult, because the anomalies that may be assessed as indicators of the onset of possible problems are of various kinds and are not always easily quantifiable.

For example, Fig. 2 shows an extract of the data used in this paper, concerning the heartbeat recordings of two subjects affected by hypertension, one labeled as a high-risk subject with an anomaly and the other as a low-risk subject. The red circle marks a minor change in the signals that may be associated with an anomaly.

However, such anomalies only become apparent when vital signals are evaluated quantitatively using an analysis system. Therefore, interest in the development of ICT tools for the early diagnosis and prognosis of cardiac disease has increased. The aim of these tools is to support cardiologists in diagnostic tasks, reducing both the number of missed diagnoses and the time taken to reach such decisions.

Examples of such tools are digital questionnaires [4], risk factor-based studies [5] and biological parameter-based techniques [6], the majority of which adopt machine learning approaches to extract useful information from the data. Those focused on biological parameters identify possible anomalies by looking for specific patterns in consecutive portions (named windows) of the patient's physiological signals; such techniques are therefore defined as a window-by-window analysis of the signals.

Unfortunately, it may be difficult to design a machine learning model able to identify anomalies in physiological signals, due to both the unclear definition of such anomaly patterns and the engineering trade-offs involved in the design of the windows.

∗ Corresponding author.
E-mail addresses: giovanni.paragliola@icar.cnr.it (G. Paragliola), antonio.coronato@icar.cnr.it (A. Coronato).

https://doi.org/10.1016/j.jbi.2020.103648
Received 9 July 2020; Received in revised form 27 October 2020; Accepted 27 November 2020
Available online 1 December 2020
1532-0464/© 2020 Elsevier Inc. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).

We here introduce a novel approach for the analysis and classification of the entire physiological signal, in order to address the design issues of traditional window-based approaches.

In detail, in this paper we describe a novel time-series (TS)-based approach for the early identification of an increase in hypertension, discriminating between cardiovascular high-risk and low-risk hypertensive patients, where high-risk subjects are those patients who experienced critical events (such as myocardial infarctions, strokes, syncopal events, etc.). Moreover, we have designed a hybrid deep network (HDN) as a combination of three different networks: a Long Short-Term Memory (LSTM) network, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN).

The evaluation of prediction models requires large datasets. For this reason, the SHARE database [7] was selected for our purposes. The database was downloaded from the PhysioNet repository [8]. It contains more than 130 ECG recordings with the clinical data of hypertensive subjects monitored for at least 12 months. Subjects who suffered a vascular event (e.g., a myocardial infarction) were considered high-risk subjects, the others low-risk subjects.

The experimental results show that the proposed model achieves excellent results in terms of classification accuracy. In addition, high sensitivity and specificity rates have been achieved in the automatic identification of subjects who experienced vascular events within one year of an ECG recording.

2. Related work

The aim of this section is to highlight the contribution of our solution. First, we present an overview of the state-of-the-art approaches for the assessment of hypertension, so as to show that our approach is the first based on a TS classification of physiological signals. Secondly, we present an overview of other deep-learning models for the classification of a TS, with the aim of comparing them with our model. The experimental results will show the better performance of our model compared to the other approaches: our solution achieves better results in the early identification of an increase in hypertension used to discriminate between cardiovascular high-risk and low-risk hypertensive patients.

2.1. Classic hypertension assessment approaches

High blood pressure is mostly asymptomatic, particularly in the early stages, which has led to its labeling as a silent killer [9]. The asymptomatic nature of hypertension requires routine blood pressure screening. Ambulatory blood pressure monitoring (ABPM) is considered the gold-standard approach to hypertension diagnosis once an individual has been examined and found to have high blood pressure. The goal of ABPM is to discover the disease at its earliest and most treatable stage. ABPM typically requires the use of portable, automated cuffs worn continuously, which measure the blood pressure every 15/30 min during the day and every 15/60 min during the night [10]. Despite their suitability for diagnosis, ambulatory monitors may have potential drawbacks, such as high treatment costs and a long waiting time for a medical examination [10]. In addition, wearing cuffs and similar devices may be uncomfortable during daily life and sleep.

The estimation of the key risk factors influencing hypertension is another method adopted to support early diagnosis. Machine learning-based approaches have been used for the analysis of questionnaires and/or biological parameters in order to evaluate risk factors; many studies demonstrate these techniques, such as [4,11,12]. Generally, the identification of risk factors depends on a consideration of characteristics including gender, age, sports practice, body mass index (BMI), food habits, smoking or alcohol usage (substance abuse) and a family history of the illness. The aim of these methods is to define a patient's profile, so supporting an earlier identification of higher-risk subjects. Assessing risk factors is, however, considerably more complicated, since risk factors may differ significantly from one region to another [4].

Other approaches rely on the application of machine learning techniques for the analysis of the dynamic behavior of a vital-sign TS. In [13] the authors propose a mortality prediction model and estimate mortality risks over time based on TS of blood pressure and heart rate. In [7] the authors evaluate linear and non-linear heart rate variability (HRV) analysis methods and pattern recognition schemes on Holter recordings. A prediction of blood pressure variability from TS of blood pressure measurements taken at home and data acquired via a medical examination at a hospital has been proposed in [14]. In [15] the authors propose a convolutional-neural-network-based prediction scheme that uses blood pressure data taken over a period of time to predict hypertension.

In these approaches, a group of temporal windows is obtained from the TS signals. The choice of the window size and of the overlap between two consecutive windows is a standard engineering trade-off problem, which makes the design process complicated and time-demanding.

Fig. 1 reports an overview of the strengths and weaknesses of all the approaches. In this paper, our aim is to highlight and demonstrate that our solution is the only one that adopts a TS classification-based approach for the assessment of an early increase in hypertension.

2.2. State-of-the-art in relation to TS classification models

Time Series Classification (TSC) is a commonly encountered challenge, and nowadays only a few solutions address this issue. In the previous section, we highlighted that the application of a time-series classification approach to the assessment of hypertension is an innovative technique. In this section, we present a few well-known models for TSC that we have implemented in our experiments in order to compare their prediction performance with our solution.

Fully Convolutional Neural Networks (FCN) [16] are essentially convolutional networks (CNN) that do not include any local pooling layers (PL), so the length of the TS is kept unaltered across the convolutions. Furthermore, an important feature of the FCN is the replacement of the final Fully Connected (FC) layer with a Global Average Pooling (GAP) layer [17]. A minimal sketch of this baseline is given below.
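As an illustration only, the following Keras sketch reproduces the FCN structure just described: three convolutional blocks without local pooling, followed by a GAP layer and a softmax classifier. The filter counts and kernel sizes are illustrative assumptions, not values taken from [16].

```python
# Illustrative sketch of an FCN for time series classification (Keras).
# Filter counts and kernel sizes are assumptions, not the exact values of [16].
import tensorflow as tf
from tensorflow.keras import layers

def build_fcn(n_timesteps, n_features, n_classes=2):
    inputs = layers.Input(shape=(n_timesteps, n_features))
    x = inputs
    # Three convolutional blocks; no local pooling, so the TS length is preserved.
    for filters, kernel in [(128, 8), (256, 5), (128, 3)]:
        x = layers.Conv1D(filters, kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    # GAP replaces the final fully connected layer.
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```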
Residual Network (ResNet) is a network of 11 layers, the first 9 of which are convolutional layers followed by a GAP layer that averages their output. The layers are grouped into logical units named residual blocks (RB). Each block consists of three convolutional layers, whose output is summed with the input of the block and then supplied to the following layer. The network is therefore defined as three residual blocks whose output feeds a GAP layer, and the last layer is a softmax classifier [16].

Encoder is a hybrid deep CNN similar to the FCN but with a significant difference: an attention layer replaces the GAP layer [18]. As in the FCN, this network consists of three convolutional layers with slight changes. The third convolutional layer feeds an attention mechanism [19], which makes it possible for the network to discover which segments of the TS are significant for the classification.

Time Le-Net (t-LeNet) can be regarded as a classic CNN with two convolutional layers followed by an FC layer that feeds a softmax. Two differences with respect to the FCN can be distinguished: the first concerns the use of an FC layer instead of the GAP, and the second relates to the local max-pooling operations applied to the output of each CNN layer. Unlike ResNet and FCN, this approach increases the number of parameters to be trained during the learning process, which also depends on the size of the input TS. The use of the t-LeNet network is therefore restricted as the number and length of the chosen filters increase [20].


Fig. 1. Summary of the state-of-the-art for the hypertension assessment and evaluation.

Multi Channel Deep Convolutional Neural Network (MCDCNN) was proposed for multivariate TS (MTS) datasets [21]. The architecture consists of a CNN adapted for MTS in which the convolution operations act independently on each dimension of the time series. Each dimension of the MTS feeds two convolutional layers followed by a max-pooling layer; the output is then flattened and fed to an FC layer.

The Time-CNN network was designed by Zhao et al. [22]. Three changes can be highlighted in comparison with the other architectures: (i) Time-CNN adopts the mean squared error (MSE) instead of the categorical cross-entropy loss function; (ii) the last layer is an FC layer with a sigmoid activation; (iii) the local max-pooling layers are replaced with local average-pooling operations.

All the presented architectures aim at classifying a TS. Each model has its strengths, and the literature shows that each achieves good results. However, our model includes a novel feature that makes it different from the other architectures: it is the only one that adopts a recurrent layer (an LSTM network) as the first layer of the whole network. The output of this layer is fed to a chain of CNN layers whose output is, in turn, fed to an FC layer.

The justification of this choice is related to the nature of the input TS. An electrocardiographic Holter recording is a class of signals in which a sequence of measurements is collected by sampling the heart's activity over the time domain. These sequences may present a relevant sequential correlation between consecutive samples, which should be exploited to enhance the accuracy of our classifier [23].

All the other architectures adopt as their first layers a CNN, which applies the convolution operator to a group of samples as wide as the size of the CNN's filters. The core idea of our work, instead, is to evaluate a different way of analyzing the sequence of samples. In our solution, we adopt the LSTM network, which is naturally designed to analyze sequences. In detail, it processes a sequence of values one sample at a time, preserving in its internal state a data representation that holds information about all the past items of the sequence. As a result, each sample (timestamp) of an input TS is projected into a higher-dimensional space, as illustrated by the sketch below.
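The following toy sketch illustrates this behavior: an LSTM layer with return_sequences=True consumes a multivariate TS one timestamp at a time and emits, for every timestamp, a state vector of the chosen dimensionality. The shapes (300 time-points, 3 features, 10 units) match the configuration discussed later, but are used here purely for illustration.

```python
# Illustrative only: an LSTM with return_sequences=True maps each timestamp of a
# (T, d) series into a higher-dimensional state vector, yielding a (T, units) output.
import numpy as np
import tensorflow as tf

T, d, units = 300, 3, 10                        # sequence length, features, LSTM units (illustrative)
x = np.random.rand(1, T, d).astype("float32")   # one toy multivariate time series

lstm = tf.keras.layers.LSTM(units, return_sequences=True)
h = lstm(x)          # the internal state summarizes all past samples at each step
print(h.shape)       # (1, 300, 10): every timestamp projected into a 10-dimensional space
```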
Fig. 2. Comparison between the holter recordings of a high-risk subject and a low-risk subject [7].

2.3. Contribution of our work

Following the presentation of the state-of-the-art approaches, it is worth summarizing the contribution of our solution.

– The literature shows only a few methods focused on the assessment of hypertension, especially with regard to the identification of the disease in its early stages. Moreover, a study of the literature highlights the lack of approaches based on the TSC of the vital signs of patients with hypertension; for this reason, in this paper we propose a TSC-based solution for the early identification of subjects at a high risk of hypertension.
– We present a novel hybrid deep network model that addresses TSC by incorporating within our network a component specifically designed for the analysis of the sequence of samples. We have compared our model with the state-of-the-art TS-based approaches, and the results show the quality of our model on account of its better performance.

3. Methods

3.1. Data description

We have adopted a database containing 24 h electrocardiographic (ECG) Holter recordings of 139 hypertensive subjects (49 women and 90 men, average age 72 ± 7 years). The subjects were followed for 12 months after the recordings in order to take note of major cardiovascular events (e.g., fatal or non-fatal strokes). In the 12-month follow-up, 17 patients suffered a recorded event (11 myocardial infarctions, 3 strokes and 3 syncopal events); for this reason, those subjects were considered high-risk patients (HRP), while the remaining ones were considered low-risk patients (LRP) [7]. Each recording is labeled according to the patient it belongs to: HRP for the high-risk subjects and LRP for the low-risk subjects.

Fig. 3 shows an example of the data recorded from the subjects. Fig. 2 is an image provided by Melillo et al. [7] with the purpose of highlighting in as much detail as possible the tiny differences between the ECG signals of the two classes of subjects. The red circle marks a slight change in the signal which may be interpreted as an anomaly in the recording.


Fig. 3. Example of one of the ECG waveforms stored in the dataset.

Fig. 3 also shows the structure of the data used for the training. The signal is a multivariate TS (MTS) in which each second of the raw signal models one time-point (or training example). Each time-point is a vector of three spatio-temporal features, where each feature is a different TS of an electrocardiographic Holter recording acquired using various techniques [7]. The snippet below shows a possible in-memory representation of such a sample.
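As a concrete illustration, one possible in-memory representation of a single training example is shown here. The shape (300, 3) is an assumption inferred from Sections 3.1 and 4.1 (a 5-minute fragment with one point retained per second, and three Holter-derived features per time-point).

```python
# A possible in-memory representation of one training example, under the assumption
# implied by Sections 3.1 and 4.1: a 5-minute fragment sampled once per second
# (300 time-points), each time-point being a vector of 3 Holter-derived features.
import numpy as np

N_TIMEPOINTS = 300   # 5 min x 60 s, one point retained per second (assumption)
N_FEATURES = 3       # three ECG-derived features per time-point

sample = np.zeros((N_TIMEPOINTS, N_FEATURES), dtype=np.float32)  # one multivariate TS
label = 1            # 1 = high-risk (HRP), 0 = low-risk (LRP); illustrative encoding

# A dataset of such samples is then a tensor of shape (n_samples, 300, 3).
```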
3.2. Hybrid deep model

Fig. 4 gives an overview of the proposed hybrid network. At the top level (i.e. the input layer), the model is fed with the input TS. The model is composed of three components.

– The Recurrent layer aims to address the temporal correlations between the time-points of the TS. This logical layer is defined as a Long Short-Term Memory (LSTM) network. Its capability of modeling the correlations of a temporal sequence is the main reason why we adopted LSTM networks in this work [24].
The LSTM is defined as a sequence of unit cells for the analysis of a sequence of data x_1, ..., x_T, with t in [1, T] and T the length of the sequence. The generic value x_i at time-point i is processed by a unit cell. Each time-point of both the input and output TS is still shaped as a vector of features, whose dimensionality changes according to the unit cell's filter size. The LSTM's output is characterized by considering the features of the input data to be ordered in time. The state of the generic unit at time t can be expressed as:

y_t = \phi(W \cdot x_t + U \cdot h_{t-1} + b)

where \phi is the sigmoid function, W, U and b are the parameters that need to be fitted during the training, and h_{t-1} is the state of the previous unit.
The LSTM consists of one layer, with the dimensionality of the output space of each cell equal to 10 units. Each output of a cell is fed as input to the convolutional layer.
– The Convolutional layer is in charge of generating a reduced representation of the input by extracting the most informative features from the input data. The layer is defined as two stacked blocks, each composed of a convolutional network (C) and a max-pooling layer (P). The first block has a kernel size of 10 units for both C and P, while the second has 8 units.
A convolutional layer adopts filters that process local partitions of the input, within which the convolutional operator is applied. These filters are replicated along the whole input. A sub-sampling step (pooling) produces a smaller-resolution version of the output of the convolutional layer by keeping the maximum value of different local regions. In conclusion, the convolutional layer extracts the most relevant features of each second of the input TS. Each block is formally expressed as:

y_{conv} = W \otimes x + b
y = \text{MaxPooling}(y_{conv})

where \otimes is the convolutional operator and MaxPooling is a sub-sample-based discretization operator.
The extracted features have the shape of a vector with dimensionality equal to the kernel filter of the second CNN block. The convolutional layer produces a sequence of feature vectors, which must be flattened in order to be submitted to the deep layer. This operation is in charge of the Flatten layer, which transforms a set of one-dimensional feature vectors into a single vector that can be fed into a fully connected network. The output of the flatten layer is fed into the deep layer, which transforms the flattened vector into a space that makes the output easier to classify.
– The Deep layer is composed of four stacked fully connected layers of a Deep Neural Network (DNN). Each fully connected layer has half the number of neurons of the previous one and is followed by batch normalization (BN) and dropout layers. A rectified linear unit (ReLU) is adopted as the activation function to prevent saturation of the gradient. The normalization layer is applied to speed up convergence and help improve generalization, and the dropout layers enhance the generalization capability. A basic fully connected layer is formalized as:

y_{fc} = W \cdot x + b
y_{bn} = \text{BN}(y_{fc})
y = \text{ReLU}(y_{bn})

Since the output of the network can be stated as a binary classification, we adopt a Sigmoid layer as the activation function [25,26] of the last layer of the Deep layer, for the classification of the input as a high-risk or low-risk subject.

On the right side of Fig. 4, we report a table summarizing a few of the hyper-parameters used for the training of the model. A comprehensive analysis of the internal structure of all the layers is outside the scope of this paper; for more details, the reader can refer to Lipton [24], Orbach [27] and Goodfellow et al. [28]. A minimal sketch of the whole architecture is given below.
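The following Keras sketch assembles the three components as described above. Only the LSTM size (10 units), the kernel/pooling sizes (10 and 8) and the neuron-halving scheme of the deep layer come from the text; the convolutional filter counts, the width of the first dense layer and the dropout rate are illustrative assumptions.

```python
# Minimal sketch of the hybrid network of Section 3.2 (Keras):
# LSTM -> two Conv1D/MaxPooling blocks -> Flatten -> four halving Dense layers -> sigmoid.
# Filter counts, dense widths and dropout rate are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_hybrid_model(n_timesteps=300, n_features=3, first_dense=1024, dropout=0.3):
    inputs = layers.Input(shape=(n_timesteps, n_features))

    # Recurrent layer: one LSTM with 10 units, returning the whole projected sequence.
    x = layers.LSTM(10, return_sequences=True)(inputs)

    # Convolutional layer: two stacked Conv + MaxPooling blocks (sizes 10 and 8).
    for size in (10, 8):
        x = layers.Conv1D(filters=64, kernel_size=size, padding="same", activation="relu")(x)
        x = layers.MaxPooling1D(pool_size=size)(x)

    x = layers.Flatten()(x)

    # Deep layer: four fully connected layers, each with half the neurons of the
    # previous one, followed by batch normalization, ReLU and dropout.
    units = first_dense
    for _ in range(4):
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(dropout)(x)
        units //= 2

    # Binary output: high-risk vs low-risk subject.
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```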
4. Results

4.1. Training

In this section we introduce both the training dataset and the learning process of our model. First, the training data are extracted from the dataset downloaded from the PhysioNet repository. The individual recordings are each about 24 h in duration and contain three ECG signals, each sampled at 128 samples per second. A fragment of 5 min recorded during the daytime was randomly selected, without replication, for each subject; from each second, one point was randomly selected.


Fig. 4. Overview of the hybrid model and summary of the hyper-parameters.

Each fragment is a single sample of the training dataset. Moreover, each sample was labeled with the tag High or Low based on the subject from which it was selected (see Fig. 5). The number of samples selected from each recording differed between high-risk and low-risk subjects, in order to guarantee a fair balance between the two classes: from each low-risk recording we selected 40 samples, whereas from each high-risk recording we selected 160 samples, so that the training dataset would be well balanced between the two classes. In conclusion, the training set consisted of more than 7000 samples (i.e., time series). We repeated this process both for the training set and for the test set.

We adopted a 10-fold cross-validation loop for the model optimization (i.e., the tuning of the hyper-parameters), while a hold-out test set was used to obtain unbiased estimates of the true classification performance. For the test set, we selected one sample from each low-risk recording and three samples from each high-risk recording; therefore, the test set consists of 177 samples. In conclusion, the ratio between the test set and the training set is about 2.5%. A sketch of the fragment-extraction procedure is given below.
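A minimal sketch of the fragment-extraction step is shown here, under the stated assumptions (5-minute fragments, one point per second, 40 fragments per low-risk recording and 160 per high-risk recording). The helper names are hypothetical placeholders, not part of the SHARE/PhysioNet tooling, and the random-start selection is a simplification of the daytime selection described above.

```python
# Sketch of the sample-extraction procedure of Section 4.1 (assumptions as stated above).
import numpy as np

FRAGMENT_SECONDS = 5 * 60                    # 5-minute fragments
SAMPLES_PER_CLASS = {"LRP": 40, "HRP": 160}  # per-recording counts used to balance classes

def extract_samples(recording, label, rng):
    """recording: array of shape (n_seconds, 128, 3) -> list of (300, 3) samples."""
    n_seconds = recording.shape[0]
    samples = []
    for _ in range(SAMPLES_PER_CLASS[label]):
        start = rng.integers(0, n_seconds - FRAGMENT_SECONDS)       # random start (simplified)
        fragment = recording[start:start + FRAGMENT_SECONDS]         # (300, 128, 3)
        idx = rng.integers(0, 128, size=FRAGMENT_SECONDS)            # one random point per second
        samples.append(fragment[np.arange(FRAGMENT_SECONDS), idx])   # (300, 3)
    return samples

rng = np.random.default_rng(0)
# X, y = [], []
# for recording, label in iter_recordings():   # hypothetical iterator over the SHARE recordings
#     X.extend(extract_samples(recording, label, rng))
#     y.extend([1 if label == "HRP" else 0] * SAMPLES_PER_CLASS[label])
```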
4.2. Performance evaluation

It is worth recalling that the results demonstrate the quality of our model in classifying a TS as belonging to a high-risk or a low-risk subject; our model therefore performs a binary classification. For the training stage, we randomly split the whole training dataset by assigning 90% of the original set to the training set and 10% to the validation set.

For the evaluation of our model, four metrics were used: precision, recall, accuracy and F1-score [29]. These measurements are defined as follows:

\text{Recall} = \frac{TP}{TP + FN}  (1)

\text{Precision} = \frac{TP}{TP + FP}  (2)

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}  (3)

F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}  (4)

where:

– TP: true positives (i.e., the number of high-risk subjects properly identified)
– FN: false negatives (i.e., high-risk subjects wrongly identified as low-risk subjects)
– TN: true negatives (i.e., low-risk subjects properly identified)
– FP: false positives (i.e., low-risk subjects wrongly identified as high-risk subjects)

The AUC (Area Under the ROC Curve) is the two-dimensional area underneath the entire ROC curve. The sketch below shows how these metrics can be computed in practice.
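The metrics above can be computed, for example, with scikit-learn as sketched here; the 0.5 decision threshold applied to the sigmoid output is an assumption, since the paper does not state one.

```python
# Sketch of how the reported metrics could be computed from the model's outputs.
# y_true: NumPy array of 0/1 labels; y_score: NumPy array of sigmoid probabilities.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_score, threshold=0.5):
    y_pred = (y_score >= threshold).astype(int)   # 1 = high-risk, 0 = low-risk
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),      # sensitivity
        "f1":        f1_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_score),    # area under the ROC curve
    }
```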
For the comparative experiments we implemented all the TSC architectures presented in the related work, based on the indications reported in the respective papers. The aim of this section is to compare the performance of our model against the other models; the training and test sets are the same for all architectures.

In order to achieve the best performance, we searched for the best configuration of our network by tuning its hyper-parameters. The tuning of our model was evaluated with a grid-search approach over the following parameter values:

– Neurons per layer: {512; 1024; 2048; 4096}
– Learning rate: {0.01; 0.05; 0.001; 0.005; 0.0001}
– Epochs: {1000; 1500; 2000; 2500}
– Convolutional kernel size: {5; 10; 15}
– Pooling filter size: {5; 10; 15}

A grid-search approach consists of an exhaustive search across a given subset of the hyper-parameter space, as sketched below.
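A possible implementation of this exhaustive search is sketched here. It reuses the build_hybrid_model sketch from Section 3.2, uses placeholder arrays in place of the real training/validation split of Section 4.1, and, for brevity, lists the kernel and pooling sizes without wiring them into the model builder. Note that iterating the full grid is computationally expensive.

```python
# Sketch of an exhaustive grid search over the hyper-parameter sets listed above.
import itertools
import numpy as np
import tensorflow as tf

# Placeholder data standing in for the real training/validation split (Section 4.1).
x_train = np.random.rand(64, 300, 3).astype("float32")
y_train = np.random.randint(0, 2, size=64)
x_val = np.random.rand(16, 300, 3).astype("float32")
y_val = np.random.randint(0, 2, size=16)

grid = {
    "neurons":     [512, 1024, 2048, 4096],
    "lr":          [0.01, 0.05, 0.001, 0.005, 0.0001],
    "epochs":      [1000, 1500, 2000, 2500],
    "kernel_size": [5, 10, 15],
    "pool_size":   [5, 10, 15],
}

best = None
for neurons, lr, epochs, kernel, pool in itertools.product(*grid.values()):
    model = build_hybrid_model(first_dense=neurons)   # kernel/pool wiring omitted for brevity
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=epochs,
                        validation_data=(x_val, y_val), verbose=0)
    score = max(history.history["val_accuracy"])      # best validation accuracy of the run
    if best is None or score > best[0]:
        best = (score, {"neurons": neurons, "lr": lr, "epochs": epochs,
                        "kernel": kernel, "pool": pool})
```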


Fig. 5. Extraction of the training sample from the dataset.

Fig. 6. Overview of the performance results.

Fig. 6 shows the classification results. For each architecture we report the best, the worst and the average results. The best results refer to the test cases in which the training stage was set to 2500 epochs, while the worst ones refer to 1000 epochs. The AVG column describes the average of all the test cases performed.

The first row reports the results of Melillo et al. [7]. These authors adopt a non-TSC approach, but we included them anyway because they created the dataset and performed the first experiments, thus defining a baseline for our work. There is no value for the F1-score because the authors did not evaluate it.

Considering the best case, the results show that our model surpasses all the other solutions by reaching an accuracy of 98%. The average improvement in accuracy is about 13%, ranging from a minimum of 6.12% [22] to a maximum of 18.37% [20]. The quality of the model is also confirmed by the values of recall and precision, both higher than 95%. The F1-score reaches 96%, confirming a good behavior of the model with respect to false positives and false negatives. Considering the worst test case, the results are also better, with an average improvement in accuracy of 16%.

The following figures refer to the results achieved in the best test cases. Fig. 7 shows the confusion matrix of each architecture. Sub-figure A of Fig. 7 reports the average classification results on the test set: the model is able to discern between the two classes with high precision. The test set consists of 156 samples. Our model identifies the low-risk subjects (the negative cases) with an accuracy of 98% (120 out of 122), and the high-risk subjects (the positive cases) with an accuracy of 96% (53 samples out of 55).

Fig. 8 presents the ROC curves showing the performance of our classification model. The curves of all the solutions have a very similar trend; however, our model shows a higher AUC compared to the other architectures. The sketch below shows how such curves can be produced.
5. Discussions

We have compared our approach to other well-known architectures in the identification of high-risk individuals among a population of hypertensive patients. The results show the quality of our solution; however, an honest discussion considering its strengths and weaknesses is needed.

Fig. 8 shows the ROC curves. Our model achieves the highest AUC value, equal to 98%, higher than the one obtained by Zhao et al. [22] (95%). The AUC of our solution is thus better by 3%, with an average improvement of 9.7% over all the architectures.

It is important to note that there is a crossing point at a false-positive rate (FPR) of 25%. After this point, our model performs better as the FPR increases and, consequently, the response in terms of recall is better. This demonstrates that our model is able to avoid false positives (i.e., predicting a low-risk subject as high-risk). Before this point, our solution shows results very similar to Zhao et al. [22] and Zheng et al. [21], and the improvements are negligible.

Fig. 7 shows the confusion matrix of each architecture. It is worth noting that our model achieves a good performance as regards false positives: the matrix reports only 2 false positives, which is an improvement of more than 200% compared with the best result of the other architectures ([21], with 6 false positives). The average number of false positives is around 9; consequently, our model shows better results. Concerning the true negatives, our model achieves results close to the other architectures, so there is no clear improvement.


Fig. 7. Overview of the confusion matrix of each architecture.

Fig. 8. Overview of the ROC curves.

Figs. 9b and 9a show the trends of the accuracy and the loss function of our model during the training. It is possible to observe that in both figures the curves of the training set and the validation set are very close to each other; this result indicates that the model is not strongly affected by over-fitting issues. At the beginning of the training process, the learning curves show a fluctuating trend that becomes smoother after epoch 250. The sketch below shows how such curves can be obtained.
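As a sketch, curves like those in Figs. 9a and 9b can be obtained from the History object returned by Keras when the hybrid model (the build_hybrid_model sketch of Section 3.2) is trained with the 90/10 train/validation split described in Section 4.2; the data arrays here are placeholders.

```python
# Sketch of plotting training vs validation loss and accuracy from the Keras History.
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data standing in for the real training set.
x_train = np.random.rand(64, 300, 3).astype("float32")
y_train = np.random.randint(0, 2, size=64)

model = build_hybrid_model()
history = model.fit(x_train, y_train, validation_split=0.1,
                    epochs=2500, verbose=0)   # reduce epochs for a quick test

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
ax_loss.plot(history.history["loss"], label="training")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_xlabel("epoch"); ax_loss.set_ylabel("loss"); ax_loss.legend()

ax_acc.plot(history.history["accuracy"], label="training")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_xlabel("epoch"); ax_acc.set_ylabel("accuracy"); ax_acc.legend()
plt.show()
```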
Figs. 10b and 10a show the trends of the accuracy and the loss function of our solution compared with the other architectures. The figures show that our model converges more slowly than the others: it reaches a local minimum more slowly since it requires more epochs. After 100 epochs, the loss value of our model is greater than that of the majority of the other models: +99% (MCDCNN), +76% (Encoder), +48% (FCN), +45% (LeNet), +43% (ResNet) and -82% (TCNN). Every model reaches a stable point after 250 epochs, but even at this stage our model does not show the best loss. At the last epoch, the loss gap is +90% (MCDCNN), +73% (TCNN), +36% (Encoder), +5% (FCN), -0.08% (LeNet) and -4% (ResNet).

The slow convergence of our model is rewarded with a better accuracy. Fig. 10b shows that at the end of the training the accuracy achieved by our model is the best. As regards the improvement in classification accuracy, our solution is better by +5% (TCNN), +1.7% (Encoder), +2.9% (FCN), +7% (LeNet), +4% (ResNet) and +1.6% (MCDCNN).

Fig. 11 shows an overview of the training time of each network. The column Trainable Parameters reports the number of internal parameters of each network fitted during the training. Our model is the biggest network, with more than 1.2 million parameters, and consequently it takes the longest training time.

These results show that our model is able to achieve the best performance in discerning between high-risk and low-risk subjects. However, it shows a slow trend during the training: this aspect increases the total amount of time needed by our model to reach a local minimum point, while the other solutions reach a sub-optimal point faster.


Fig. 9. Trend of the loss and accuracy of the model compared with the validation set.

Fig. 10. Comparison of the trends of the loss and accuracy of all architectures.


Fig. 11. Overview of the training time duration of each network.

From the experiments, we have learnt some points which we will examine in depth in future activities, including:

– Reducing the number of parameters. The high number of parameters makes the training process very long. This is the main issue of our approach, which we are planning to investigate in our future work. Possible solutions could include changing the network's structure by modifying the layers.
– Increasing the dataset size. The dataset used for the training is a good starting point. However, we are planning to test our model on other, bigger datasets that describe the same class of data. The enlargement of the dataset and the validation on a bigger dataset will improve the quality of our model and prevent overfitting issues.
– Generalizing the model. One of the future investigations will focus on the application of the proposed model to different use cases in order to make it as generalizable as possible.

6. Conclusions

In this work we have presented a TS-based approach for the early identification of high-risk subjects affected by hypertension. Among the approaches for the assessment of hypertension, ours is the first one that adopts TSC to differentiate between high-risk and low-risk subjects. We have compared our model with other deep network architectures for the classification of TS, and the experimental results show that our model achieves better results in terms of classification accuracy. However, the high number of parameters of our model means that it takes a long time to complete the training. The analysis of the whole TS shows promising results in terms of highlighting the tiny differences between subjects affected by hypertension.

CRediT authorship contribution statement

Giovanni Paragliola: Methodology, Supervision, Investigation, Writing - review & editing, Visualization, Software, Formal analysis, Writing - original draft. Antonio Coronato: Project administration, Funding acquisition, Resources, Writing - review & editing, Data curation, Visualization, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the AMICO project, which has received funding from the National Programs (PON) of the Italian Ministry of Education, Universities and Research (MIUR): code ARS0100900 (Decree n. 1989, 26 July 2018). We would like to thank Dott. Giovanni Donnici, cardiologist at the Dipartimento di alta specialità del cuore, AOR San Carlo di Potenza, Italy, for his contribution to the revision of the paper.

References

[1] S. Gulec, Early diagnosis saves lives: focus on patients with hypertension, Kidney Int. Suppl. 3 (2013) 332–334, http://dx.doi.org/10.1038/kisup.2013.69.
[2] S.S. Daskalopoulou, N.A. Khan, R.R. Quinn, M. Ruzicka, D.W. McKay, D.G. Hackam, S.W. Rabkin, D.M. Rabi, R.E. Gilbert, R.S. Padwal, et al., The 2012 Canadian hypertension education program recommendations for the management of hypertension: Blood pressure measurement, diagnosis, assessment of risk, and therapy, Canad. J. Cardiol. 28 (2012) 270–287, http://dx.doi.org/10.1016/j.cjca.2012.02.018.
[3] C.L. Schwartz, R.J. McManus, What is the evidence base for diagnosing hypertension and for subsequent blood pressure treatment targets in the prevention of cardiovascular disease? BMC Med. 13 (2015), http://dx.doi.org/10.1186/s12916-015-0502-5.
[4] K. Shobha, S. Nickolas, Analysis of importance of pre-processing in prediction of hypertension, CSI Trans. ICT 6 (2018) 209–214, http://dx.doi.org/10.1007/s40012-018-0197-9.
[5] H. Zhao, Z. Ma, Y. Sun, A hypertension risk prediction model based on BP neural network, in: 2019 International Conference on Networking and Network Applications (NaNA), IEEE, 2019, http://dx.doi.org/10.1109/nana.2019.00085.
[6] W. Chang, Y. Liu, Y. Xiao, X. Yuan, X. Xu, S. Zhang, S. Zhou, A machine-learning-based prediction method for hypertension outcomes based on medical data, Diagnostics 9 (2019) 178, http://dx.doi.org/10.3390/diagnostics9040178.
[7] P. Melillo, R. Izzo, A. Orrico, P. Scala, M. Attanasio, M. Mirra, N. De Luca, L. Pecchia, Automatic prediction of cardiovascular and cerebrovascular events using heart rate variability analysis, PLOS ONE 10 (2015) e0118504, http://dx.doi.org/10.1371/journal.pone.0118504.
[8] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals, Circulation 101 (2000) e215–e220, http://dx.doi.org/10.1161/01.CIR.101.23.e215.
[9] J. Kitt, R. Fox, K.L. Tucker, R.J. McManus, New approaches in hypertension management: a review of current and developing technologies and their potential impact on hypertension care, Curr. Hypertens. Rep. 21 (2019), http://dx.doi.org/10.1007/s11906-019-0949-4.
[10] A. Hinderliter, R.A. Voora, A.J. Viera, Implementing ABPM into clinical practice, Curr. Hypertens. Rep. 20 (2018), http://dx.doi.org/10.1007/s11906-018-0805-y.
[11] A. Wang, N. An, G. Chen, L. Li, G. Alterovitz, Predicting hypertension without measurement: A non-invasive, questionnaire-based approach, Expert Syst. Appl. 42 (2015) 7601–7609, http://dx.doi.org/10.1016/j.eswa.2015.06.012.
[12] S. Mohan, C. Thirumalai, G. Srivastava, Effective heart disease prediction using hybrid machine learning techniques, IEEE Access 7 (2019) 81542–81554, http://dx.doi.org/10.1109/access.2019.2923707.
[13] L.-w.H. Lehman, R.P. Adams, L. Mayaud, G.B. Moody, A. Malhotra, R.G. Mark, S. Nemati, A physiological time series dynamics-based approach to patient monitoring and outcome prediction, IEEE J. Biomed. Health Inf. 19 (2015) 1068–1076, http://dx.doi.org/10.1109/jbhi.2014.2330827.
[14] H. Koshimizu, R. Kojima, K. Kario, Y. Okuno, Prediction of blood pressure variability using deep neural networks, Int. J. Med. Inform. 136 (2020) 104067, http://dx.doi.org/10.1016/j.ijmedinf.2019.104067.
[15] Y. Luo, Y. Li, Y. Lu, S. Lin, X. Liu, The prediction of hypertension based on convolution neural network, in: 2018 IEEE 4th International Conference on Computer and Communications (ICCC), 2018, pp. 2122–2127.
[16] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural networks: A strong baseline, 2016, arXiv:1611.06455.
[17] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, 2015, arXiv:1512.04150.
[18] J. Serrà, S. Pascual, A. Karatzoglou, Towards a universal neural network encoder for time series, 2018, arXiv:1805.03908.
[19] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014, arXiv:1409.0473.
[20] A.L. Guennec, S. Malinowski, R. Tavenard, Data augmentation for time series classification using convolutional neural networks, 2016.
[21] Y. Zheng, Q. Liu, E. Chen, Y. Ge, J.L. Zhao, Time series classification using multi-channels deep convolutional neural networks, in: F. Li, G. Li, S.-w. Hwang, B. Yao, Z. Zhang (Eds.), Web-Age Information Management, Springer International Publishing, Cham, 2014, pp. 298–310.
[22] B. Zhao, H. Lu, S. Chen, J. Liu, D. Wu, Convolutional neural networks for time series classification, J. Syst. Eng. Electron. 28 (2017) 162–169, http://dx.doi.org/10.21629/jsee.2017.01.18.
[23] T.G. Dietterich, Machine learning for sequential data: A review, in: T. Caelli, A. Amin, R.P.W. Duin, D. de Ridder, M. Kamel (Eds.), Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshops SSPR 2002 and SPR 2002, Windsor, Ontario, Canada, August 6–9, 2002, Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2002, pp. 15–30, http://dx.doi.org/10.1007/3-540-70659-3_2.


[24] Z.C. Lipton, A critical review of recurrent neural networks for sequence learning, 2015, CoRR, arXiv:1506.00019.
[25] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.
[26] R. Kumari, S. Kr., Machine learning: A review on binary classification, Int. J. Comput. Appl. 160 (2017) 11–15, http://dx.doi.org/10.5120/ijca2017913083.
[27] J. Orbach, Principles of neurodynamics. Perceptrons and the theory of brain mechanisms, Arch. Gen. Psychiatry 7 (1962) 218, http://dx.doi.org/10.1001/archpsyc.1962.01720030064010.
[28] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org.
[29] Evaluation: From precision, recall and F-factor to ROC, informedness, markedness and correlation, http://dx.doi.org/10.9735/2229-3981.

