1 s2.0 S1532046420302768 Main
Original research
Keywords: eHealth; Deep learning; Time series classification; Early hypertension identification; Signal processing

Background and Objective: As the population becomes older and more overweight, the number of potential high-risk subjects with hypertension continues to increase. ICT technologies can provide valuable support for the early assessment of such cases, since conducting medical examinations for the early recognition of high-risk subjects affected by hypertension is difficult, time-consuming, and expensive.
Methods: This paper presents a novel time series-based approach for the early identification of increases in hypertension to discriminate between cardiovascular high-risk and low-risk hypertensive patients through the analysis of electrocardiographic Holter signals.
Results: The experimental results show that the proposed model achieves excellent classification accuracy compared with the state of the art. Our model reaches an average accuracy of 98%, while sensitivity and specificity both reach an average value of 97%.
Conclusion: The analysis of the whole time series shows promising results in terms of highlighting the tiny differences between subjects affected by hypertension.
1. Introduction

It is estimated that 33% of adults have hypertension and that one out of two of these persons is unaware of the condition. It is reasonable, therefore, to suppose that regular cardiac disease screening examinations can facilitate an early diagnosis and reduce the risk of additional complications associated with cardiac disease, such as hypertension [1].
Although screening examinations are a conditio sine qua non for the early diagnosis of hypertension, we should bear in mind that the diagnosis process is more complicated than it seems [1]. Screening programs for hypertension certainly have benefits, but they may also have disadvantages (e.g., false positives, long waiting times for medical appointments, anxiety, psychological impacts, and economic costs) [2].
The diagnosis and medical treatment of hypertension play a significant role in decreasing the risk of cardiovascular disease [3]. However, the early recognition of symptoms in high-risk subjects affected by hypertension is difficult because the anomalies that may indicate the onset of possible problems are of various kinds and are not always easily quantifiable.
For example, Fig. 2 shows an extract of the data used in this paper: the heartbeat recordings of two subjects affected by hypertension, one labeled as a high-risk subject with an anomaly and the other as a low-risk subject. The red circle shows a minor change in the signals that may be associated with an anomaly.
However, such anomalies only become apparent when vital signals are evaluated quantitatively using an analysis system. Therefore, attention to the development of ICT tools for the early diagnosis and prognosis of cardiac disease has increased.
The aim of these tools is to support cardiologists in diagnostic tasks, reducing both the number of missed diagnoses and the time taken to reach such decisions.
Examples of such tools are digital questionnaires [4], risk factor-based studies [5] and biological parameter-based techniques [6], the majority of which adopt machine learning approaches to extract useful information from the data.
Those focused on biological parameters identify possible anomalies by looking for specific patterns in consecutive portions (named windows) of the patient's physiological signals. Such techniques are therefore defined as a window-by-window analysis of signals.
Unfortunately, it may be difficult to design a machine learning model able to identify anomalies in physiological signals, due to both the unclear definition of such anomaly patterns and the engineering trade-off involved in the design of the windows.
∗ Corresponding author.
E-mail addresses: giovanni.paragliola@icar.cnr.it (G. Paragliola), antonio.coronato@icar.cnr.it (A. Coronato).
https://doi.org/10.1016/j.jbi.2020.103648
Received 9 July 2020; Received in revised form 27 October 2020; Accepted 27 November 2020
Available online 1 December 2020
1532-0464/© 2020 Elsevier Inc. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).
G. Paragliola and A. Coronato Journal of Biomedical Informatics 113 (2021) 103648
We here introduce a novel approach for the analysis and classification of the entire physiological signal to address the design issues of traditional window-based approaches.
In detail, in this paper we describe a novel time-series (TS)-based approach for the early identification of an increase in hypertension to discriminate between cardiovascular high-risk and low-risk hypertensive patients, where high-risk subjects are those patients who experienced critical events (such as myocardial infarctions, strokes, syncopal events, etc.). Moreover, we have designed a hybrid deep learning network (HDN) as a combination of three different networks: a Long Short-Term Memory (LSTM) network, a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN).
The evaluation of prediction models requires large datasets. For this reason, the database SHARE [7] was selected for our purposes. The database was downloaded from the Physionet Repository [8]. It contains more than 130 ECG recordings with the clinical data of hypertensive subjects monitored for at least 12 months.
Subjects who endured a vascular event (e.g., myocardial infarction) were treated as high-risk subjects, the others as low-risk subjects.
The experimental results show that the proposed model achieves excellent results in terms of classification accuracy. In addition, high sensitivity and specificity rates have been achieved in the automatic identification of subjects who experienced vascular events within one year of an ECG recording.

2. Related work

[…] a patient's profile, so supporting an earlier identification of higher-risk subjects.
Assessing risk factors is considerably more complicated since risk factors may differ significantly from one region to another [4].
Other approaches rely on the application of machine learning techniques for the analysis of the dynamic behaviors of a vital sign TS. In [13] the authors propose a mortality prediction model and estimate mortality risks over time based on TS of blood pressure and heart rate. In [7] the authors evaluate linear and non-linear heart rate variability (HRV) analysis methods and pattern recognition schemes from Holter recordings. A prediction of blood pressure variability from TS data of blood pressure measurements taken at home and data acquired via a medical examination at a hospital has been proposed in [14].
In [15] the authors propose a Convolutional-Neural-Network-based hypertension prediction scheme using blood pressure data taken over a period of time.
In these approaches a group of temporal windows is obtained from TS signals. The choice of the window size and of the overlap between two consecutive windows is a standard engineering trade-off, which makes the design process complicated and time-consuming.
Fig. 1 reports an overview of the strengths and weaknesses of all the approaches.
In this paper, our aim is to highlight and demonstrate that our solution is the only one that adopts a TS classification-based approach for the assessment of an early increase in hypertension.
Fig. 1. Summary of the state-of-the-art for the hypertension assessment and evaluation.
the size of the CNN’s filters. However, the core idea of our work is to
evaluate a different way of analyzing the sequence of samples.
In our solution, we adopt the LSTM network that is naturally
designed to analyze sequences.
In detail, it processes a sequence of values one sample at a time, preserving in its internal state a data representation that holds information about all the past items of the sequence. As a result, each sample (timestamp) in an input TS is projected into a higher-dimensional space.
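To make the idea of a sample-by-sample state update concrete, the following is a minimal sketch of a plain recurrent update of the form h_t = sigmoid(W x_t + U h_{t-1} + b). It is a simplification for illustration only: it is not a full gated LSTM cell, and the weights, the toy sequence and the function names are assumptions made for this example, not values or code from this work.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def recurrent_step(x_t, h_prev, W, U, b):
    """One state update: h_t = sigmoid(W @ x_t + U @ h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(U[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(sigmoid(z))
    return h_t


def run_sequence(xs, W, U, b):
    """Process the sequence one sample at a time, carrying the state forward."""
    h = [0.0] * len(b)  # initial state
    states = []
    for x_t in xs:
        h = recurrent_step(x_t, h, W, U, b)
        states.append(h)
    return states


# Toy input (hypothetical): 4 time points with 1 feature each,
# projected into a 3-dimensional state. All weights are arbitrary.
xs = [[0.1], [0.4], [-0.2], [0.3]]
W = [[0.5], [-0.3], [0.8]]
U = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]
b = [0.0, 0.1, -0.1]
states = run_sequence(xs, W, U, b)
```

Each entry of `states` is the 3-dimensional representation of one time point, and it depends on all earlier samples through the carried state, which is the property the paragraph above refers to.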
Fig. 3 shows the structure of the data used for the training. The signal is a multivariate TS (MTS) where each second of the raw signal models one time-point (or training example). Each time-point is a vector of three spatio-temporal features, where each feature is a different TS of an electrocardiographic Holter recording acquired using various techniques [7].

3.2. Hybrid deep model

Fig. 4 gives an overview of the proposed hybrid network.
At the top level (i.e. the input layer), the model is fed with the input TS.
The model is composed of three components.

– The recurrent layer aims to address the temporal correlations between the time points of the TS. This logical layer is defined as a Long Short-Term Memory network (LSTM). Its capability to model the correlations of a temporal sequence is the main reason why we adopted LSTM networks in this work [24].
  The LSTM is defined as a sequence of unit cells for the analysis of a sequence of data $x_1, \ldots, x_t$ with $t \in [1, T]$, where $T$ is the length of the sequence. The generic value $x_i$ at a time point $i$ is processed by a unit cell.
  Each time point of both the input and output TS is still shaped as a vector of features, whose dimensionality changes according to the unit cell's filter size.
  The LSTM's output is characterized by treating the features of the input data as ordered in time.
  The state of the generic unit at time $t$ can be expressed as:

  $$y_t = \phi(W \cdot x_t + U \cdot h_{t-1} + b)$$

  where $\phi$ is the sigmoid function, $W$, $U$, and $b$ are the parameters which need to be fitted during the training, and $h_{t-1}$ is the state of the previous unit.
  The LSTM consists of one layer in which the output space of each cell has a dimensionality of 10 units. The output of each cell is fed as the input to the convolutional layer.
– The Convolutional layer is in charge of generating a reduced representation of the input by extracting the most informative features from the input data. The layer is defined as two stacked blocks, each composed of a convolutional network (C) and a max pooling layer (P). The first block has a kernel size of 10 units for both C and P, while the second has 8 units.
  A convolutional layer adopts filters that process local partitions of the input, within which the convolutional operator is applied. These filters are replicated along the whole input. A sub-sampling step (pooling) produces a lower-resolution version of the output of the convolutional layer by taking the maximum value from different local regions.
  In conclusion, the convolutional layer extracts the most relevant features from each second of the input TS.
  Each block is formally expressed as:

  $$y_{conv} = W \otimes x + b$$
  $$y = \mathrm{MaxPooling}(y_{conv})$$

  where $\otimes$ is the convolutional operator, and MaxPooling is a sub-sample-based discretization operator.
  The extracted features have the shape of a vector with dimensionality equal to the kernel filter of the second CNN block.
  The CNN layer produces a sequence of feature vectors which must be flattened before being submitted to the deep layer. This operation is performed by the Flatten layer, which transforms a set of one-dimensional feature vectors into a single vector that can be fed into a fully connected network.
  The output of the Flatten layer is fed into the deep layer, which transforms the flattened vector into a space that makes the output easier to classify.
– The Deep Layer is composed of a four-layer stacked Deep Neural Network (DNN). Each layer is defined as a fully-connected layer. Each fully-connected layer has half as many neurons as the previous one and is followed by batch normalization (BN) and dropout layers. A rectified linear unit (ReLU) layer is adopted as the activation function to prevent saturation of the gradient.
  The normalization layer is applied to speed up convergence and help improve generalization, and the dropout layers to enhance the generalization capability.
  A basic fully-connected layer is formalized as

  $$y_{fc} = W \cdot x + b$$
  $$y_{bn} = \mathrm{BN}(y_{fc})$$
  $$y = \mathrm{ReLU}(y_{bn})$$

  Since the output of the network is a binary classification, we adopted a Sigmoid layer as the activation function [25,26] for the last layer of the Deep Layer, in order to classify the input as a high-risk or a low-risk subject.

On the right side of Fig. 4, we report a table summarizing a few of the hyper-parameters used for the training of the model.
A comprehensive analysis of the internal structure of all the layers is outside the scope of this paper. For more details, the reader can refer to Lipton [24], Orbach [27] and Goodfellow et al. [28].

4. Results

4.1. Training

In this section we introduce both the training dataset and the learning process of our model.
First, the training data is extracted from the dataset downloaded from the Physionet repository.
The individual recordings are each about 24 h in duration and contain three ECG signals, each sampled at 128 samples per second.
A fragment of 5 min recorded during the daytime was randomly selected without replication for each subject. From each second, one point was randomly selected.
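As an illustration of the fragment extraction described above, the following sketch draws a random 5-minute fragment from a three-signal recording sampled at 128 samples per second and keeps one randomly chosen point per second, giving 300 time points of 3 features each. It is a sketch under stated assumptions: the same random offset is applied to all three signals, the daytime restriction and the "without replication" constraint are not modeled, and the function name and the synthetic recording are hypothetical.

```python
import random

SAMPLING_RATE_HZ = 128      # samples per second, as stated in the text
FRAGMENT_SECONDS = 5 * 60   # a 5-minute fragment
N_SIGNALS = 3               # three ECG signals per recording


def sample_fragment(recording, rng):
    """Return FRAGMENT_SECONDS time points, each a list of N_SIGNALS values.

    recording: list of N_SIGNALS equally long lists of raw samples.
    Assumption: one random offset per second, shared by all signals.
    """
    total_seconds = len(recording[0]) // SAMPLING_RATE_HZ
    start = rng.randrange(total_seconds - FRAGMENT_SECONDS + 1)
    fragment = []
    for second in range(start, start + FRAGMENT_SECONDS):
        offset = rng.randrange(SAMPLING_RATE_HZ)
        index = second * SAMPLING_RATE_HZ + offset
        fragment.append([signal[index] for signal in recording])
    return fragment


# Hypothetical synthetic 10-minute recording, used only to exercise the function.
rng = random.Random(0)
n_samples = 10 * 60 * SAMPLING_RATE_HZ
recording = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
             for _ in range(N_SIGNALS)]
fragment = sample_fragment(recording, rng)
```

With these settings `fragment` is a list of 300 time points, each a vector of three values, which matches the multivariate TS shape described for the training data.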
Each fragment is a single sample of the training dataset. Moreover, each sample was labeled with a tag High or Low based on the subject from which it was selected.
The number of samples selected from each recording was different between high-risk and low-risk subjects to guarantee a fair balance between the two classes.
From each low-risk recording we selected 40 samples, whereas from each high-risk recording we selected 160 samples, so that the training dataset would be well balanced between the two classes. In conclusion, the training set consisted of more than 7000 samples (i.e., time series).
We repeated this process both for the training set and the test set.
We adopted a 10-fold cross-validation loop for the model optimization (i.e., hyper-parameter tuning), while a hold-out test set was used to obtain unbiased estimates of the true classification performance.
For the test set, we selected one sample from each low-risk recording and three samples from each high-risk recording. Therefore, the test set consists of 177 samples.
In conclusion, the test set amounts to about 2.5% of the size of the training set.

4.2. Performance evaluation

It is worth recalling that the results demonstrate the goodness of our model in classifying a TS as belonging to a high-risk subject or a low-risk subject.
Our model acts as a binary classifier.
For the training stage, we randomly split the whole training dataset, defining the training set as 90% of the original set and the validation set as the remaining 10%.
For the evaluation of our model, four metrics were used: precision, recall, accuracy and F1-score [29].
These measurements are defined as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (1)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (2)$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (4)$$

where:

– TP: true positives (i.e. the number of high-risk subjects properly identified)
– FN: false negatives (i.e. high-risk subjects wrongly identified as low-risk subjects)
– TN: true negatives (i.e. low-risk subjects properly identified)
– FP: false positives (i.e. low-risk subjects wrongly identified as high-risk subjects)

The AUC (Area Under the ROC Curve) is the entire two-dimensional area underneath the ROC curve.
For the comparative experiments we implemented all the TSC architectures presented in the related work, based on the indications reported in the respective papers.
The aim of this section is to compare the performance of our model against the other models. The training and test sets are the same for all architectures.
In order to achieve the best performance, we searched for the best configuration of our network by tuning its hyper-parameters.
The tuning of our model was carried out with a grid-search approach over the following parameter values:

– Neurons per layer: {512; 1024; 2048; 4096}
– Learning rate: {0.01; 0.05; 0.001; 0.005; 0.0001}
– Epochs: {1000; 1500; 2000; 2500}
– Convolutional kernel size: {5; 10; 15}
– Pooling filter size: {5; 10; 15}

A grid search consists of an exhaustive search across a given subset of the hyper-parameter space.
Fig. 6 shows the classification results. For each architecture we report the best, the worst and the average results. The best results refer to test cases in which the training stage was set to 2500 epochs, while the worst ones refer to 1000 epochs. The AVG column reports the average of all the test cases performed.
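The grid search over the hyper-parameter values listed above can be sketched as follows. This is an illustration only: the dictionary keys, the `evaluate` callback and the placeholder scoring function are assumptions introduced for the example; in an actual run `evaluate` would train the network with the given configuration (for instance inside the 10-fold cross-validation loop) and return a validation score.

```python
from itertools import product

# Value sets taken from the list above; the key names are hypothetical.
grid = {
    "neurons_per_layer": [512, 1024, 2048, 4096],
    "learning_rate": [0.01, 0.05, 0.001, 0.005, 0.0001],
    "epochs": [1000, 1500, 2000, 2500],
    "conv_kernel_size": [5, 10, 15],
    "pool_size": [5, 10, 15],
}


def iter_configurations(grid):
    """Yield every combination of the grid as a dictionary."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, evaluate):
    """Return the configuration with the highest score and that score."""
    best_config, best_score = None, None
    for config in iter_configurations(grid):
        score = evaluate(config)
        if best_score is None or score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_evaluate(config):
    """Placeholder score, NOT a real training run (assumption for the demo)."""
    return (-abs(config["learning_rate"] - 0.001)
            - abs(config["conv_kernel_size"] - 10) / 100)


configs = list(iter_configurations(grid))
best_config, best_score = grid_search(grid, dummy_evaluate)
print(len(configs))  # 4 * 5 * 4 * 3 * 3 = 720 configurations
```

The count of 720 combinations makes the cost of an exhaustive search explicit: every additional value in any of the five sets multiplies the number of training runs.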
The first row reports the results of Melillo et al. [7]. The authors adopt a non-TSC approach, but we included them anyway because they created the dataset and performed the first experiments, so defining a baseline for our work. There is no value of the F1-score because the authors did not evaluate it.
Considering the best case, the results show that our model surpasses all the other solutions by reaching an accuracy of 98%. The average improvement in accuracy is about 13%, ranging from a minimum of 6.12% [22] to a maximum of 18.37% [20].
The goodness of the model is also confirmed by the values of recall and precision, both higher than 95%. The F1-score reaches 96%, confirming a good behavior of the model with respect to false positives and false negatives.
Considering the worst test case, the results are still better, with an average improvement in accuracy of 16%.
The following figures refer to the results achieved in the best test cases.
Fig. 7 shows the confusion matrix of each architecture.
Sub-figure A-7 reports the average classification results on the test set. The model is able to discern between the two classes with high precision.
The test set consists of 177 samples. Our model identifies the low-risk subjects (the false cases) with an accuracy of 98% (120 out of 122), and the high-risk subjects (the true cases) with an accuracy of 96% (53 samples out of 55).
Fig. 8 presents the ROC curve showing the performance of our classification model.
The curves of all the solutions have a very similar trend. However, our model shows a higher AUC compared to the other architectures.

5. Discussions

We have compared our approach to other well-known architectures in the identification of high-risk individuals among a population of hypertensive patients.
The results show the goodness of our solution. However, an honest discussion considering its strengths and weaknesses is needed.
Fig. 8 shows the ROC curve. Our model achieves the highest AUC value, equal to 98%, higher than the one obtained by Zhao et al. [22] (95%). The AUC of our solution is thus better by 3% with respect to the best competitor, and by 9.7% on average.
It is important to note that there is a cross point at a false-positive rate (FPR) equal to 25%. After this point, our model performs better as the FPR increases and, consequently, the response in terms of Recall is better. This demonstrates that our model is able to avoid false negatives (i.e. a high-risk subject predicted as low-risk).
Before this point our solution shows results very similar to those of Zhao et al. [22] and Zheng et al. [21], and the improvements are negligible.
Fig. 7 shows the confusion matrix of each architecture.
It is worth noting that our model achieves a good performance as regards false positives. The matrix reports only 2 false positives, a threefold reduction compared with the best result among the other architectures ([21], with 6 false positives). The average number of false positives is around 9, and consequently our model shows better results.
Concerning the true negatives, our model achieves results close to those of the other architectures, so there is no clear improvement.
Figs. 9b and 9a show the trends of the accuracy and loss function of our model during the training.
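As a cross-check of the reported figures, the metrics of Eqs. (1)–(4) can be recomputed from the confusion-matrix counts quoted for our model: 53 of 55 high-risk samples and 120 of 122 low-risk samples correctly classified, i.e. 2 false negatives and 2 false positives. The sketch below is only an illustration; the function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Recall, precision, accuracy and F1-score from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Counts quoted in the text: high-risk is the positive class.
metrics = classification_metrics(tp=53, fn=2, tn=120, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# recall: 0.964
# precision: 0.964
# accuracy: 0.977
# f1: 0.964
```

These values agree with the rounded figures reported in the text: an accuracy of about 98% and recall, precision and F1-score of about 96%.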
It is possible to observe that in both figures the curves of the training set and validation set are very close to each other. This result demonstrates that the model is only slightly affected by over-fitting.
At the beginning of the training process, the learning process shows a fluctuating trend that becomes smoother after epoch 250.
Figs. 10b and 10a show the trends of the accuracy and loss function of our solution compared with the other architectures.
The figures demonstrate that our model converges more slowly than the others. This finding shows that our model reaches a local minimum more slowly, since it requires more epochs.
After 100 epochs the loss of our model is greater than that of the majority of the other models: +99% (MCDCNN), +76% (Encoder), +48% (FCN), +45% (LeNet), +43% (ResNet) and −82% (TCNN).
Every model reaches a stable point after 250 epochs, but even at this step our model does not show the best loss. At the last epoch the loss gap is +90% (MCDCNN), +73% (TCNN), 36% (Encoder), 5% (FCN), −0.08% (LeNet), −4% (ResNet).
The slow convergence of our model is rewarded with a better accuracy. Fig. 10b shows that at the end of the training the accuracy achieved by our model is the best. As regards the improvement in the classification accuracy, our solution is better by: +5% (TCNN), +1.7% (Encoder), +2.9% (FCN), +7% (LeNet), +4% (ResNet) and +1.6% (MCDCNN).
Fig. 11 shows an overview of the training time of each network. The column Trainable Parameters reports the number of internal parameters of each network which have been fitted during the training. Our model is the biggest network, with more than 1.2 million parameters, and consequently it takes the longest training time.
These results show that our model is able to achieve the best performance in terms of discerning between high-risk subjects and low-risk subjects. However, it converges slowly during the training. This aspect increases the total amount of time needed by our model to reach a local minimum, while the other solutions reach a sub-optimal point faster.
From the experiments, we have learnt some points which we will examine in depth in future activities, including:

– Reducing the number of parameters: The high number of parameters makes the training process very long. This is the main issue with our approach, which we are planning to investigate in our future work. Possible solutions could include changing the network's structure by modifying the layers.
Fig. 9. Trend of the loss and accuracy of the model compared with the validation set.
Fig. 10. Comparison of the trends of the loss and accuracy of all architectures.
Fig. 11. Overview of the training time duration of each network.

– Increasing the dataset size: The dataset used for the training is a good starting point. However, we are planning to test our model with other, bigger datasets which describe the same class of data. Increasing the dataset and validating on a bigger dataset will improve the goodness of our model and help prevent overfitting issues.
– Generalizing the model: One of the future investigations will focus on the application of the proposed model to different use cases in order to make it as generalizable as possible.

6. Conclusions

In this work we have presented a TS-based approach for the early identification of high-risk subjects affected by hypertension.
Among other approaches for the assessment of hypertension, our approach is the first one that adopts a TSC to differentiate between high-risk and low-risk subjects.
We have compared our model with other deep network architectures for the classification of TS, and the experimental results show that our model achieves better results in terms of classification accuracy.
However, the high number of parameters of our model means that it takes a long time to complete the training.
The analysis of the whole TS shows promising results in terms of highlighting the tiny differences between subjects affected by hypertension.

CRediT authorship contribution statement

Giovanni Paragliola: Methodology, Supervision, Investigation, Writing - review & editing, Visualization, Software, Formal analysis, Writing - original draft. Antonio Coronato: Project administration, Funding acquisition, Resources, Writing - review & editing, Data curation, Visualization, Formal analysis, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the AMICO project which has received funding from the National Programs (PON) of the Italian Ministry of Education, Universities and Research (MIUR): code ARS0100900 (Decree n.1989, 26 July 2018).
We would like to thank Dott. Giovanni Donnici, a cardiologist at the Dipartimento di alta specialità del cuore, AOR San Carlo di Potenza, Italy, for his contribution to the revision of the paper.

References

[1] S. Gulec, Early diagnosis saves lives: focus on patients with hypertension, Kidney Int. Suppl. 3 (2013) 332–334, http://dx.doi.org/10.1038/kisup.2013.69.
[2] S.S. Daskalopoulou, N.A. Khan, R.R. Quinn, M. Ruzicka, D.W. McKay, D.G. Hackam, S.W. Rabkin, D.M. Rabi, R.E. Gilbert, R.S. Padwal, et al., The 2012 canadian hypertension education program recommendations for the management of hypertension: Blood pressure measurement, diagnosis, assessment of risk, and therapy, Canad. J. Cardiol. 28 (2012) 270–287, http://dx.doi.org/10.1016/j.cjca.2012.02.018.
[3] C.L. Schwartz, R.J. McManus, What is the evidence base for diagnosing hypertension and for subsequent blood pressure treatment targets in the prevention of cardiovascular disease? BMC Med. 13 (2015) http://dx.doi.org/10.1186/s12916-015-0502-5.
[4] K. Shobha, S. Nickolas, Analysis of importance of pre-processing in prediction of hypertension, CSI Trans. ICT 6 (2018) 209–214, http://dx.doi.org/10.1007/s40012-018-0197-9.
[5] H. Zhao, Z. Ma, Y. Sun, A hypertension risk prediction model based on bp neural network, in: 2019 International Conference on Networking and Network Applications (NaNA), IEEE, 2019, http://dx.doi.org/10.1109/nana.2019.00085.
[6] W. Chang, Y. Liu, Y. Xiao, X. Yuan, X. Xu, S. Zhang, S. Zhou, A machine-learning-based prediction method for hypertension outcomes based on medical data, Diagnostics 9 (2019) 178, http://dx.doi.org/10.3390/diagnostics9040178.
[7] P. Melillo, R. Izzo, A. Orrico, P. Scala, M. Attanasio, M. Mirra, N. De Luca, L. Pecchia, Automatic prediction of cardiovascular and cerebrovascular events using heart rate variability analysis, PLOS ONE 10 (2015) e0118504, http://dx.doi.org/10.1371/journal.pone.0118504.
[8] A.L. Goldberger, L.A.N. Amaral, L. Glass, J.M. Hausdorff, P.C. Ivanov, R.G. Mark, J.E. Mietus, G.B. Moody, C.-K. Peng, H.E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals, Circulation 101 (2000) e215–e220, http://dx.doi.org/10.1161/01.CIR.101.23.e215, Circulation Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full PMID:1085218.
[9] J. Kitt, R. Fox, K.L. Tucker, R.J. McManus, New approaches in hypertension management: a review of current and developing technologies and their potential impact on hypertension care, Curr. Hypertens. Rep. 21 (2019) http://dx.doi.org/10.1007/s11906-019-0949-4.
[10] A. Hinderliter, R.A. Voora, A.J. Viera, Implementing abpm into clinical practice, Curr. Hypertens. Rep. 20 (2018) http://dx.doi.org/10.1007/s11906-018-0805-y.
[11] A. Wang, N. An, G. Chen, L. Li, G. Alterovitz, Predicting hypertension without measurement: A non-invasive, questionnaire-based approach, Expert Syst. Appl. 42 (2015) 7601–7609, http://dx.doi.org/10.1016/j.eswa.2015.06.012.
[12] S. Mohan, C. Thirumalai, G. Srivastava, Effective heart disease prediction using hybrid machine learning techniques, IEEE Access 7 (2019) 81542–81554, http://dx.doi.org/10.1109/access.2019.2923707.
[13] L.-w.H. Lehman, R.P. Adams, L. Mayaud, G.B. Moody, A. Malhotra, R.G. Mark, S. Nemati, A physiological time series dynamics-based approach to patient monitoring and outcome prediction, IEEE J. Biomed. Health Inf. 19 (2015) 1068–1076, http://dx.doi.org/10.1109/jbhi.2014.2330827.
[14] H. Koshimizu, R. Kojima, K. Kario, Y. Okuno, Prediction of blood pressure variability using deep neural networks, Int. J. Med. Inform. 136 (2020) 104067, http://dx.doi.org/10.1016/j.ijmedinf.2019.104067.
[15] Y. Luo, Y. Li, Y. Lu, S. Lin, X. Liu, The prediction of hypertension based on convolution neural network, in: 2018 IEEE 4th International Conference on Computer and Communications (ICCC), 2018, pp. 2122–2127.
[16] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with deep neural networks: A strong baseline, 2016, arXiv:1611.06455.
[17] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, 2015, arXiv:1512.04150.
[18] J. Serrà, S. Pascual, A. Karatzoglou, Towards a universal neural network encoder for time series, 2018, arXiv:1805.03908.
[19] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, 2014, arXiv:1409.0473.
[20] A.L. Guennec, S. Malinowski, R. Tavenard, Data augmentation for time series classification using convolutional neural networks, 2016.
[21] Y. Zheng, Q. Liu, E. Chen, Y. Ge, J.L. Zhao, Time series classification using multi-channels deep convolutional neural networks, in: F. Li, G. Li, S.-w. Hwang, B. Yao, Z. Zhang (Eds.), Web-Age Information Management, Springer International Publishing, Cham, 2014, pp. 298–310.
[22] B. Zhao, H. Lu, S. Chen, J. Liu, D. Wu, Convolutional neural networks for time series classification, J. Syst. Eng. Electron. 28 (2017) 162–169, http://dx.doi.org/10.21629/jsee.2017.01.18.
[23] T.G. Dietterich, Machine learning for sequential data: A review, in: T. Caelli, A. Amin, R.P.W. Duin, D. de Ridder, M. Kamel (Eds.), Structural, Syntactic, and Statistical Pattern Recognition: Joint IAPR International Workshops SSPR 2002 and SPR 2002 Windsor, Ontario, Canada, August 6–9, 2002 Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2002, pp. 15–30, http://dx.doi.org/10.1007/3-540-70659-3_2.
[24] Z.C. Lipton, A critical review of recurrent neural networks for sequence learning, 2015, CoRR, arXiv:abs/1506.00019. URL: http://arxiv.org/abs/1506.00019. arXiv:1506.00019.
[25] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.
[26] R. Kumari, S. Kr., Machine learning: A review on binary classification, Int. J. Comput. Appl. 160 (2017) 11–15, http://dx.doi.org/10.5120/ijca2017913083.
[27] J. Orbach, Principles of neurodynamics. perceptrons and the theory of brain mechanisms, Arch. Gen. Psychiatry 7 (1962) 218, http://dx.doi.org/10.1001/archpsyc.1962.01720030064010.
[28] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org.
[29] Evaluation: From precision, recall and f-factor to roc, informedness, markedness & correlation. URL: http://dx.doi.org/10.9735/2229-3981. doi:http://dx.doi.org/10.9735/2229-3981.