Journal Pre-Proof: Pattern Recognition Letters

Validating the robustness of an internet of things based atrial fibrillation detection system
Journal Pre-proof
Validating the robustness of an internet of things based atrial

fibrillation detection system
Oliver Faust, Murtadha Kareem, Alex Shenfield, Ali Ali,

U Rajendra Acharya
PII: S0167-8655(20)30043-X
DOI: https://doi.org/10.1016/j.patrec.2020.02.005
Reference: PATREC 7782
To appear in: Pattern Recognition Letters
Received date: 20 September 2019

Revised date: 23 January 2020
Accepted date: 4 February 2020
Please cite this article as: Oliver Faust, Murtadha Kareem, Alex Shenfield, Ali Ali, U Rajendra Acharya,
Validating the robustness of an internet of things based atrial fibrillation detection system, Pattern
Recognition Letters (2020), doi: https://doi.org/10.1016/j.patrec.2020.02.005
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
© 2020 Published by Elsevier B.V.

Highlights
• We validated a mature LSTM deep learning model

with more and more varied data to mimic the medical
diagnosis process.
• The validation data was distinct from the training
data: it was measured from different patients.
• The model was created with the data from only 20
patients and we have validated it with data from 82
patients.
• Under these difficult circumstances, our LSTM
based deep learning model achieved an accuracy of
94%.
• The classification performance indicates that the ma-

ture LSTM based deep learning model is fit for prac-
tical service.
1
1
Pattern Recognition Letters

journal homepage: www.elsevier.com
Validating the robustness of an internet of things based atrial fibrillation detection system
Oliver Fausta , Murtadha Kareema , Alex Shenfielda , Ali Alib , U Rajendra Acharyac,d
a Department of Engineering and Mathematics, Sheffield Hallam University, United Kingdom
b Sheffield
Teaching Hospitals NHS Foundation Trust, University of Sheffield, United Kingdom
c Ngee Ann Polytechnic, Singapore
d Department of Bioinformatics and Medical Engineering, Asia University, Taiwan
Article history:
ABSTRACT
Intelligent internet of things, Deep learn-

This paper describes the validation of a deep learning model for Internet of Things (IoT)
ing, Atrial fibrillation, Heart rate, Blind- based health care applications. As such, the deep learning model was created to detect
fold validation episodes of Atrial Fibrillation (AF) using Heart Rate (HR) signals. The initial Long
Short-Term Memory (LSTM) model was developed using 20 data sets, from distinct
subjects, obtained from the AFDB database on PhysioNet. This model achieved an AF
detection accuracy of 98.51% with ten fold cross validation. In this study, we validated
the initial results by testing the developed deep learning model with unknown data. To
be specific, we fed the data from 82 subjects to the deep learning system and compared
the classification results with the diagnosis results indicated by human practitioners. The
validation results show 94% accuracy with an area under the Receiver Operating Char-
acteristic (ROC) curve of 96.58. These results indicate that the LSTM model is able to
extract the feature maps from the unknown data and hence detect the AF periods accu-
rately. With this blindfold validation testing we violated a well known design rule for
learning systems which states that more data should be used for training than for test-
ing. By doing so, we have established that our deep learning system is fit for practical
deployment, because in a practical situation the diagnosis support system must apply the
knowledge, extracted from a limited training data set, to a HR trace from a patient.
c 2020 Elsevier Ltd. All rights reserved.
1. Introduction et al., 2013). The sinus node consists of a cluster of special

cells, which act as the heart’s natural pacemaker. In AF, dis-
AF is the most common serious irregular heart rhythm asso-
ordered electrical activity progresses in the walls of the atria,
ciated with rapid heart rate in adults (Benjamin et al., 1998).
exceeding the sinus node. As a result, the atria begins to fib-
Atrial means to the atria (plural of atrium), which describes the
rillate and the rhythm will change from normal to abnormal.
locations at the top two chambers of the heart (Kenny, 2018;
That leads to a rapid rhythm as their muscular walls fail to con-
Swapna et al., 2018). Fibrillation refers to irregular, rapid and
tract with coordination and regularity. To be accurate, 0.4% of
unsynchronised contraction of muscle fibers. Sinus rhythm rep-
adults are affected by this disease, and that prevalence increases
resents the normal beat of the heart, which is managed by a so-
with age (Ali et al., 2012). Less than 1% of people, aged of 60
phisticated electrical control system. This system controls the
or younger, are affected by AF and in surplus of 6% for those
timing of the heart pump. When the electrical system is func-
aged 80 and older (Adamson et al., 2004). It is anticipated that
tioning properly, it maintains a normal heart rate and rhythm
the occurrence of AF increases in accordance with the aging
(Martis et al., 2013b). Problems with this electrical system
population. In addition, AF incidence is related to a significant
can cause two types of arrhythmia; tachycardia and bradycar-
increase in stroke, heart failure, poor mental health, diminished
dia when the heart beats too fast or too slow respectively (Faust
life quality, and as such it is a leading cause of death (Stew-
art et al., 2002). This disease is associated with various types
∗∗ Corresponding author of symptoms, such as chest pain, shortness of breath, fainting,
2
fatigue, palpitation and light-headedness (Thrall et al., 2007). matrix and ROC curve. The subsequent discussion takes cen-
The fact that AF is an independent risk factor for stroke has tre stage in this paper, because in this section we outline why a
significant clinical and economic implications (Ali et al., 2015). robust decision-making process is a necessary prerequisite for
Overall, treatment of AF often includes cardioversion (restoring IoT based AF monitoring. The Discussion section also provides
sinus rhythm) which may involve using medication or electrical limitations and future work, before we wrap up the paper with
cardioversion. the conclusions.
Before we can treat this serious disease, it is necessary to
diagnose it first. One way of diagnosing AF is to record HR 2. Background on intelligent Internet of medical things
signals and subsequently look for signs of AF in these signals.
IIoTs deliver measurement data to a central storage location
The most promising approaches, documented in the scientific
for centralized decision-making (Gubbi et al., 2013). In the
literature, use computer support for that detection task (Hagi-
medical domain, such measurement data are usually physio-
wara et al., 2018; Song et al., 2012). The Computer-Aided Di-
logical signals, such as HR signals (Dimitrov, 2016). Figure
agnosis (CAD) systems are trained and tested on labeled data,
1 depicts a use case scenario for a medical IIoTs. Patients wear
i.e. data that was analyzed by human cardiologists. Looking at
a HR sensor, which is depicted in the figure as a black oval
the scientific literature in more detail revealed that all studies
with a red dot. That sensor communicates the HR signal to a
stopped at a one-time evaluation of the learning model (Chen
mobile device which relays the data to the cloud server via an
et al., 2017; Gotlibovych et al., 2018; Xia et al., 2018; Yıldırım
IoT protocol. The LSTM based deep learning model, which
et al., 2018; Yuan et al., 2016). That means these studies did not
was validated in this study, analyzes the incoming HR signals
test the efficacy of their model with more and more varied HR
in real-time. If AF is detected, a cardiologist is informed. The
data. Unfortunately, using the model with more and more var-
cardiologist can review the suspicious HR trace and reach a di-
ied data is the use case scenario which arises in clinical practice.
agnosis. That diagnosis can be communicated to the patients
To be specific, in a practical setting the patient data represents
in the form of a simple traffic light scheme, as indicated in the
a sample from the continuum of all possible data. That con-
figure.
tinuum is better represented with more and more varied data.
Hence, testing a model with more and more varied data instills 2.1. Long short term memory based deep learning
trust that the model will work in a practical scenario. The ab-
The use case scenario features LSTM based deep learning.
sence of such blindfold validation is a research gap, because
As such, LSTM is an extension of Recurrent Neural Network
the lack of follow up verification results in a diminished trust in
(RNN) (LeCun et al., 2015). That algorithm was chosen be-
the CAD systems. This lack of trust is one of the reasons why
cause LSTM can process data sequences (Sak et al., 2014). This
we do not see the wide spread application of CAD systems in
functionality is distinct from other deep learning approaches
clinical practice.
which consider an input vector as data point in a multidimen-
The current study tried to overcome the above-mentioned sional space (Faust et al., 2018a). LSTM includes the idea of
problem by evaluating the performance of a mature deep learn- time by incorporating memory cells and a sophisticated updat-
ing model with unseen data. We have asked the model to detect ing algorithm (Oh et al., 2018). The updating algorithm uses
AF in signals from 82 long-term HR recordings of subjects with three regulators or gates which control the flow of information
paroxysmal or sustained AF. The system achieved a classifica- within the LSTM. The input gate determines how much new
tion accuracy of 94%, a precision of 95% and it had 95% recall. data flows into the cell. The forget gate controls how much in-
This result is significant as it is obtained using totally unknown formation is retained and the output gate controls the outflow of
data. As such, the model was established with a training data set information. That flexible use of memory enables LSTM to ap-
that came from 20 patients and now we have validated it with proach the utility of a ”general purpose computer” (Siegelmann
data from 82 patients. Doing so goes against the common con- and Sontag, 1995).
cept of using a smaller portion of the data for testing. Hence, it Figure 2 shows a functional diagram of the LSTM algorithm.
is more difficult for the deep learning model to establish the cor- The upper part of the diagram indicates the RNN loop un-
rect result. Furthermore, the training data was measured with rolling, which results in individual LSTM cells. The hidden
a different setup from the validation data, i.e. sampling fre- state vector ~ht ∈ Rh and the cell state vector ~ct ∈ Rh are passed
quency, voltage range, and resolution were different. The fact from one cell to the next. The cells consume the input vector ~xt
that the results were achieved in that different environment indi- at different time instances t. Each cell A has LSTM functional-
cates that the deep learning system knows what an AF affected ity, as indicated in the lower part of the figure.
HR signal looks like. Hence, these results establish that the Each cell incorporates the three gates to establish the LSTM
deep learning model is reliable to detect AF in clinical settings. functionality (Gers et al., 2002). The forget gate regulates the
Having that impetus is a prerequisite for Intelligent Internet of information content stored within the cell and thereby it plays
Things (IIoT) based diagnosis support tools. a vital role in modelling the way humans remember and indeed
To communicate our findings, we have structured the remain- forget (Gers et al., 2000). It is implemented as the first multi-
der of the paper as follows. The next section provides back- plier from the left, highlighted in orange. The vector f~t ∈ Rh
ground on IIoT and the deep learning model. The subsequent controls the forget gate. Mathematically, that vector is estab-
section introduces the methods used for validating that model. lished with:
Section 4 states the validation results in the form of confusion f~t = σ W f ~xt + U f ~ht−1 + ~b f (1)
3
Fig. 1: Use case diagram of an IoT based decision support system. The color of the mobile device indicates the diagnosis result. Red indicates immediate action is
required, orange indicates caution, and green no reason for concern.
where ◦ indicates element-wise multiplication and Tanh(...) is

the hyperbolic tangent function. Finally, the current hidden
state vector is established with:
~ht = ~ot ◦ Tanh(~ct ) (5)
The weights and biases are established during the training

phase and they constitute the LSTM model. During the testing
phase, the model is used to classify an input sequence ~xt . In
our case, the model establishes if there are signs of AF in a HR
sequence. The methods used for testing the LSTM model are
Fig. 2: Overview of the deep learning algorithm. Depicted as RNN loop un- introduced in the next section.
rolling and LSTM cell.
3. Methods
where σ(...) is the sigmoid activation function. W f ∈ Rh×d and
U f ∈ Rh×d are weight matrices and ~b f ∈ Rh is a forget spe- Recently, we developed an LSTM based deep learning model
cific bias vector. The parameters h and d indicate the number of to detect the AF episodes using HR signals (Faust et al., 2018b).
hidden units and the dimensionality of the input vector respec- That model can provide the intelligence needed for state of the
tively. art IoT based diagnosis support systems. The current study
The input gate is implemented as the second multiplier from setup tests the same model with more and more varied data ob-
the left, highlighted in blue. The vector ~it ∈ Rh controls the tained from a different source.
output gate. The mathematical definition of ~it is similar to f~t : Figure 3 provides an overview of the study setup. The upper
half of the figure shows the initial study setup. The LSTM based

~it = σ Wi ~xt + Ui ~ht−1 + ~bi (2) deep learning system was trained and tested with labeled HR
signal data from 20 subjects sourced from PhysioNet’s Atrial
the weight matrices Wi ∈ Rh×d and Ui ∈ Rh×d as well as the Fibrillation Database (AFDB). Training the deep learning net-
work means establishing the weight values of the individual
bias vector ~bi ∈ Rh are different.
neurons. The weight vectors from the initial study were used
The output gate is implemented as the third multiplier from
in the validation setup. However, more and more varied data
the left, highlighted in green. The vector ~it ∈ Rh controls the
were used for the blindfold validation testing. The following
input gate. The mathematical definition of ~it is similar to the
sections detail the validation setup by introducing the data sets
outer two gates:
and the processing steps in more detail.

~ot = σ Wo ~xt + Uo ~ht−1 + ~bo (3)
3.1. Data used
h×d h×d
the weight matrices Wo ∈ R and Uo ∈ R as well as the The initial study was based on data collected from the MIT-
bias vector ~bo ∈ Rh are different. BIH AFDB which is available on PhysioNet (Moody, 1983;
Having established the control vectors for the gates, we can Goldberger et al., 2000). This database includes twenty three
progress to formulate the equations which define the cell output. 10 hour Electrocardiogram (ECG) recordings. These record-
The cell state vector for the current time is established with the ings were labelled with so called rhythm annotation files, the
following equation: content of which was prepared manually. The specific types of
rhythm annotations were (AFIB (atrial fibrillation), (AFL (atrial
~ct = f~t ◦ ~ct−1 + ~it ◦ Tanh(Wc ~xt + Uc ~ht−1 + ~bc ) (4) flutter), (J (AV junctional rhythm), and (N (used to indicate all
4
Fig. 3: Overview of our validation study.
other rhythms). Furthermore, the database contains beat anno- of True Positive (TP), True Negative (TN), False Positive (FP),
tation files that were used to extract the HR signals. and False Negative (FN). Most medical test results refer to a
For the blindfold validation, we used the data from 82 sub- positive case (classifying the subject having the disease) and a
jects, which were sourced from the Long-Term AF Database negative case (classifying the patient not having the disease).
(LTAFDB) (Petrutiu et al., 2007). All patients in that database The ROC curve is a method used to evaluate the diagnostic
were diagnosed with paroxysmal or sustained AF. Each data accuracy of tests in modern medicine (Mas, 2018). It is widely
set is composed from two ECG signals each of which has a du- used to illustrate how well a diagnostic model can distinguish
ration of 24 to 25 hours and was sampled at 128 Hz. The data between the presence and absence of disease and works equally
sets incorporate also rhythm and beat annotations, which were well with data sets that exhibit class imbalance. The ROC
established manually by experienced cardiologists. Moreover, represents the TP rate (sensitivity) plotted against the FP rate
the R peaks are labelled, and the RR interval sequence was ex- (1-specificity) for various cut-off points (Powers, 2011). Each
tracted based on these labels. Rhythm annotations were used to point on the ROC graph indicates the sensitivity/specificity cor-
label the RR interval sequence as either AF or non-AF. Table 1 responding to a specific decision threshold. The Area Under
provides the measurement parameters that were used to acquire Curve (AUC) is a summary metric indicating the discrimina-
the data for the two individual databases. The fact that there is tory power of a classifier.
a significant difference in the sampling frequency indicates the
measurement setups were quite different. 3.3. Initial study setup and performance
The pre-processing of the data in the validation setup follows
closely the initial study. A sliding window was used to parti- For the initial study, we designed a deep RNN (Pascanu et al.,
tion the HR signals into overlapping sequences of 100 beats. 2013; Pearlmutter, 1989) with LSTM (Hochreiter and Schmid-
Each sequence of 100 beats was labeled as AF if one of the huber, 1997; Bengio et al., 1994) model to detect AF based on
beats, within the sequence was labeled AF, all other sequences HR traces. The pre-processed data blocks were fed directly into
were labeled non-AF, indicating normal or other cardiac dis- the model without performing any feature engineering (Faust,
ease. Data from 82 subjects were used with the blindfold val- 2018). Table 2 provides the ten-fold cross validation results ob-
idation strategy to evaluate the performance of the robust deep tained for the initial model based on the data from 20 AFDB
learning system. This means the proposed methods can be used subjects.
to generalize not only unknown data, but also the unknown pa-
tients as well.
4. Validation results
3.2. Performance measures This section documents the blindfold validation results of the
The confusion or error matrix summarizes the prediction re- LSTM based deep learning model obtained in (Faust et al.,
sults of a classification problem (Hay, 1988). The layout al- 2018b). The results are based on HR signals from the 82
lows visualization of the performance of a decision-making LTAFDB subjects. The radar plot, shown in Figure 4, provides
algorithm. To be specific, the matrix has two rows and two a graphical representation of the model performance in terms
columns that represent the classes of actual and predictive anal- of accuracy, precision, recall and F1 score. The labels, around
ysis. Therefore, these performance measures report the number the radar plot, correspond to the patient ID as mentioned in the
5
Table 1: Comparison between AFDB and LTAFDB.
AFDB LTAFDB
Duration 10 h Between 24 h and 25 h
ADC resolution 12-bit 12-bit
Voltage range ±10 mV 20 mV
Sampling frequency 250 Hz 128 Hz
Measurement location Boston’s Beth Israel Hospital Not reported
Table 2: Ten fold cross validation result established during the initial training Blindfold validation should become a standard method to eval-
and testing of the LSTM deep learning model.
uate deep learning and other decision support algorithms.
TP TN FP FN Accuracy AUC The fact that the deep learning system was trained with HR
430,615 523,241 7,040 7,407 98.51% 0.9986 signals is of particular importance for home health care. HR
signals capture the beat-to-beat interval of the human heart
(Acharya et al., 2013). The beat impulse is the most prominent
LTAFDB. The IDs were ordered in terms of accuracy. The sig- signal feature when the electrical activity of the human heart
nal with label 10, shown at 12 o’clock, has the highest accuracy. is recorded. That prominence is reflected in the high signal-to-
The accuracy performance of the model decreases clockwise, noise ratio. Hence, measuring the beat-to-beat interval is robust
i.e. the accuracy was lower for signal 12 than for signal 10 etc. to noise. As a consequence, the measurement setup is simple,
The model achieved the lowest accuracy of just under 70% for when compared to other physiological signals, such as ECG
the HR signal from patient 22. (Martis et al., 2013a). Another advantage of HR is that the time
We have evaluated the model performance with a confusion from one beat to the next can be encoded with a 2-byte integer.
matrix in order to validate AF and normal HR sequences ob- Hence, a digital HR signal consists of approximately 2 bytes
tained from 82 subjects. Figure 5 shows the results obtained a second. In contrast, to sample the complete electrical activ-
using the new test data: (a) confusion matrix by blindfold val- ity of the human heart requires around 256 samples a second,
idation, (b) ROC of the model. It can be noticed from the re- each of which is encoded with 2 bytes. Therefore, ECG sig-
sults in Table 3, that the LSTM deep learning model achieved nals have a 256 times higher data rate. The lower data rate has
a classification accuracy of 94% on the LTAFDB blindfold val- practical benefits, in terms of data communication, storage and
idation data set - classifying 95% of normal and 95% of HR processing. For example, HR signals can be communicated via
sequences showing signs of AF correctly. The deep learning Bluetooth Low Energy (BLE) whereas ECG signals require a
classifier achieves an AUC of 0.9658 which is close to a perfect broader wireless channel, such as WiFi. Using BLE instead of
score of 1, hence the classifier discriminates well between dis- WiFi has beneficial implications for battery powered devices.
ease and no disease. Most classifier values varied between 0.0 The low data rate and patient led signal acquisition makes
(definitely negative) and 1.0 (definitely positive). The closer the HR signals a good choice for IoT based health care applica-
ROC plot to the TP rate, the higher overall accuracy of the test tions. For such applications the HR data travels from the point
performed. Figure 5b shows the ROC curve which indicates the of measurement to a central cloud service over communication
AF diagnostic performance. infrastructure. Having the data at a central location has a num-
Table 3 provides the blindfold validation results for 3 AFDB ber of advantages. Deep learning can be used to detect AF in
subjects, as documented during the initial study, and 82 real time. That is a significant advantage over ECG measure-
LTAFDB subjects. The 82 LTAFDB data sets represent 95 ments with Holter monitors, because Holter data can only be
times more data than the initial 3 data sets from the AFDB. analyzed after the measurement period is complete. Validating
the deep learning model for AF detection has paved the way
for an IoT based AF diagnosis support tool. The knowledge
extracted from a limited amount of labeled data can be used to
5. Discussion provide real-time decision support. In order to reach the diag-
nosis, the deep learning result should be validated by a cardi-
This study shows that deep learning model is able to mimic ologist. The basis for this validation can be the HR data which
human decision-making, when it comes to extracting relevant has been flagged as showing the subtle waveform alterations
knowledge from a labeled data set. Training the deep learning caused by AF.
algorithm models the knowledge generation which transpires A limitation of our study is that we did not address the impor-
during the education of a cardiologist. The blindfold valida- tant concept of training on the job. To be specific, a cardiologist
tion, conducted in this study, models the clinical practice where learns while doing the job. The proposed deep learning model
the decision-Fmaking system is presented with signals from a is static, i.e. it did not learn during the validation. At one point
wide range of patients. Validating the deep learning with un- the knowledge, extracted from 20 patients, will be insufficient
known data that was obtained with a different measurement to cope with practical scenarios. In the future, we have to find
setup builds trust in the AF detection model. Furthermore, the a way to model that continuous training in order to improve the
results might be transferable to other areas of CAD and beyond. diagnostic quality of the proposed AF detection system. One
6
Fig. 4: Illustration of performance for various test subjects obtained by blindfold validation.
way of providing this continuous learning is to retrain the deep In this case, our deep learning model understands HR signals,
learning model with measurement data. A prerequisite for such such that it can differentiate AF from non-AF affected signals.
a methodology is to have the HR data stored in a central loca- This is different from the traditional machine learning approach
tion. Hence, streaming the HR data to a central cloud server which is based on feature engineering. To be specific, the tra-
might prove to be an advantage when the continuous learning ditional approach differentiates AF from non-AF signals based
problem is tackled. on parameters. It is likely that these parameters change when
more and more varied data are analyzed. Hence, it is common
practice for studies based on traditional methods to state this as
6. Conclusion a limitation, i.e. more and more varied data is needed as well as
retraining the machine learning model to improve the diagnos-
The current study indicates that deep learning can acquire
tic quality.
the knowledge to understand specific aspects of HR signals.
7
(a) (b)
Fig. 5: Results obtained using the new test data: (a) confusion matrix by blindfold validation, (b) ROC of the model.
Table 3: Blindfold validation obtained from the LSTM deep learning model for both 3 subjects from the AFDB and 82 subjects from the LTAFDB
Database TP TN FP FN No beats Accuracy AUC

AFDB 91,888 65.699 255 116 92,325 99.77% 1
LTAFDB 5,009,401 3,238,048 241,416 316,318 8,805,183 94% 96.58
In the current study, we have used more and more varied data Benjamin, E.J., Wolf, P.A., DAgostino, R.B., Silbershatz, H., Kannel, W.B.,
without retraining the deep learning model. Data from 82 pa- Levy, D., 1998. Impact of atrial fibrillation on the risk of death: the fram-
ingham heart study. Circulation 98, 946–952.
tients were used for this blindfold validation. With this data, Chen, W., Liu, G., Su, S., Jiang, Q., Nguyen, H., 2017. A chf detection method
our model achieved a classification accuracy of 94%, a preci- based on deep learning with rr intervals, in: Engineering in Medicine and
sion of 95% and it had 95% recall. These results indicate that Biology Society (EMBC), 2017 39th Annual International Conference of
our deep learning model has fewer limitations when compared the IEEE, IEEE. pp. 3369–3372.
Dimitrov, D.V., 2016. Medical internet of things and big data in healthcare.
to traditional machine learning approaches. To be specific, we Healthcare informatics research 22, 156–163.
showed that knowledge extracted from a small training data set Faust, O., 2018. Documenting and predicting topic changes in computers in
can be applied to a larger and more varied validation data set. biology and medicine: A bibliometric keyword analysis from 1990 to 2017.
This concept is significant for practical diagnostic support, be- Informatics in Medicine Unlocked 11, 15–27.
Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Acharya, U.R., 2018a. Deep
cause applying knowledge acquired during the limited training learning for healthcare applications based on physiological signals: a review.
period is exactly what a cardiologist does. Hence, a practical Computer Methods and Programs in Biomedicine 161, 1–13.
IoT based medical decision support system must apply knowl- Faust, O., Martis, R.J., Min, L., Zhong, G.L.Z., Yu, W., 2013. Cardiac arrhyth-
edge, extracted during training, to samples, i.e. patient data, mia classification using electrocardiogram. J. Med. Imaging Health Inf 3,
448.
from a very large data set. Faust, O., Shenfield, A., Kareem, M., San, T.R., Fujita, H., Acharya, U.R.,
2018b. Automated detection of atrial fibrillation using long short-term mem-
ory network with rr interval signals. Computers in biology and medicine
References 102, 327–335.
Gers, F.A., Schmidhuber, J.A., Cummins, F.A., 2000. Learning to forget: Con-
tinual prediction with lstm. Neural Computation 12, 2451–2471.
Acharya, U.R., Faust, O., Kadri, N.A., Suri, J.S., Yu, W., 2013. Automated
Gers, F.A., Schraudolph, N.N., Schmidhuber, J., 2002. Learning precise timing
identification of normal and diabetes heart rate signals using nonlinear mea-
with lstm recurrent networks. Journal of machine learning research 3, 115–
sures. Computers in biology and medicine 43, 1523–1529.
143.
Adamson, J., Beswick, A., Ebrahim, S., 2004. Is stroke the most common cause
Goldberger, A.L., Amaral, L.A., Glass, L., Hausdorff, J.M., Ivanov, P.C., Mark,
of disability? Journal of Stroke and Cerebrovascular Diseases 13, 171–177.
R.G., Mietus, J.E., Moody, G.B., Peng, C.K., Stanley, H.E., 2000. Phys-
Ali, A., Bailey, C., Abdelhafiz, A.H., 2012. Stroke prevention with oral an-
iobank, physiotoolkit, and physionet: components of a new research re-
ticoagulation in older people with atrial fibrillation-a pragmatic approach.
source for complex physiologic signals. Circulation 101, e215–e220.
Aging and disease 3, 339.
Gotlibovych, I., Crawford, S., Goyal, D., Liu, J., Kerem, Y., Benaron, D., Yil-
Ali, A.N., Howe, J., Abdel-Hafiz, A., 2015. Cost of acute stroke care for pa-
maz, D., Marcus, G., Li, Y., 2018. End-to-end deep learning from raw
tients with atrial fibrillation compared with those in sinus rhythm. Pharma-
sensor data: Atrial fibrillation detection using wearables. arXiv preprint
coeconomics 33, 511–520.
arXiv:1807.10707 .
Bengio, Y., Simard, P., Frasconi, P., 1994. Learning long-term dependencies
Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M., 2013. Internet of things
with gradient descent is difficult. IEEE transactions on neural networks 5,
(iot): A vision, architectural elements, and future directions. Future genera-
157–166.
8
tion computer systems 29, 1645–1660.

Hagiwara, Y., Fujita, H., Oh, S.L., Tan, J.H., San Tan, R., Ciaccio, E.J.,
Acharya, U.R., 2018. Computer-aided diagnosis of atrial fibrillation based
on ecg signals: a review. Information Sciences 467, 99–114.
Hay, A., 1988. The derivation of global estimates from a confusion matrix.
International Journal of Remote Sensing 9, 1395–1398.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural com-
putation 9, 1735–1780.
Kenny, T., 2018. The nuts and bolts of cardiac pacing. John Wiley & Sons.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. nature 521, 436.
Martis, R.J., Acharya, U.R., Lim, C.M., Mandana, K., Ray, A.K., Chakraborty,
C., 2013a. Application of higher order cumulant features for cardiac health
diagnosis using ecg signals. International journal of neural systems 23,
1350014.
Martis, R.J., Acharya, U.R., Lim, C.M., Suri, J.S., 2013b. Characterization
of ecg beats from cardiac arrhythmia using discrete cosine transform in pca
framework. Knowledge-Based Systems 45, 76–82.
Mas, J., 2018. Receiver operating characteristic (roc) analysis, in: Geomatic
Approaches for Modeling Land Change Scenarios. Springer, pp. 465–467.
Moody, G., 1983. A new method for detecting atrial fibrillation using rr inter-
vals. Computers in Cardiology , 227–230.
Oh, S.L., Ng, E.Y., San Tan, R., Acharya, U.R., 2018. Automated diagnosis
of arrhythmia using combination of cnn and lstm techniques with variable
length heart beats. Computers in biology and medicine 102, 278–287.
Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y., 2013. How to construct deep
recurrent neural networks. arXiv preprint arXiv:1312.6026 .
Pearlmutter, B.A., 1989. Learning state space trajectories in recurrent neural
networks. Neural Computation 1, 263–269.
Petrutiu, S., Sahakian, A.V., Swiryn, S., 2007. Abrupt changes in fibrillatory
wave characteristics at the termination of paroxysmal atrial fibrillation in
humans. Europace 9, 466–470.
Powers, D.M., 2011. Evaluation: from precision, recall and f-measure to roc,
informedness, markedness and correlation. Journal of Machine Learning
Technologies 2, 37–63.
Sak, H., Senior, A., Beaufays, F., 2014. Long short-term memory recurrent
neural network architectures for large scale acoustic modeling, in: Fifteenth
annual conference of the international speech communication association,
pp. 338–342.
Siegelmann, H.T., Sontag, E.D., 1995. On the computational power of neural
nets. Journal of computer and system sciences 50, 132–150.
Song, Z., Ji, Z., Ma, J.G., Sputh, B., Acharya, U.R., Faust, O., 2012. A system-
atic approach to embedded biomedical decision making. Computer methods
and programs in biomedicine 108, 656–664.
Stewart, S., Hart, C.L., Hole, D.J., McMurray, J.J., 2002. A population-
based study of the long-term risks associated with atrial fibrillation: 20-year
follow-up of the renfrew/paisley study. The American journal of medicine
113, 359–364.
Swapna, G., Soman, K., Vinayakumar, R., 2018. Automated detection of car-
diac arrhythmia using deep learning techniques. Procedia Computer Science
132, 1192–1201.
Thrall, G., Lip, G.Y., Carroll, D., Lane, D., 2007. Depression, anxiety, and
quality of life in patients with atrial fibrillation. Chest 132, 1259–1264.
Xia, Y., Wulan, N., Wang, K., Zhang, H., 2018. Detecting atrial fibrillation
by deep convolutional neural networks. Computers in biology and medicine
93, 84–92.
Yıldırım, Ö., Pławiak, P., Tan, R.S., Acharya, U.R., 2018. Arrhythmia detec-
tion using deep convolutional neural network with long duration ecg signals.
Computers in biology and medicine 102, 411–420.
Yuan, C., Yan, Y., Zhou, L., Bai, J., Wang, L., 2016. Automated atrial fibrilla-
tion detection based on deep learning network, in: Information and Automa-
tion (ICIA), 2016 IEEE International Conference on, IEEE. pp. 1159–1164.
Conflict of interest
I would like to declare that there is no conflict of interest.
Oliver Faust
Corresponding author

Journal Pre-Proof: Pattern Recognition Letters

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Journal Pre-Proof: Pattern Recognition Letters

Uploaded by

Copyright:

Available Formats

Validating the robustness of an internet of things based atrial fibrillation detection system

Validating the robustness of an internet of things based atrial

Oliver Faust, Murtadha Kareem, Alex Shenfield, Ali Ali,

To appear in: Pattern Recognition Letters

Received date: 20 September 2019

© 2020 Published by Elsevier B.V.

• We validated a mature LSTM deep learning model

• The classification performance indicates that the ma-

Pattern Recognition Letters

Intelligent internet of things, Deep learn-

1. Introduction et al., 2013). The sinus node consists of a cluster of special

where ◦ indicates element-wise multiplication and Tanh(...) is

~ht = ~ot ◦ Tanh(~ct ) (5)

The weights and biases are established during the training

Fig. 3: Overview of our validation study.

Database TP TN FP FN No beats Accuracy AUC

tion computer systems 29, 1645–1660.

You might also like