Journal of Electrocardiology

Journal of Electrocardiology 59 (2020) 151–157

journal homepage: www.jecgonline.com

Artificial intelligence for detecting mitral regurgitation using electrocardiography
Joon-myoung Kwon, MD a,b,1, Kyung-Hee Kim, MD, PhD b,c,⁎,1, Zeynettin Akkus, PhD d, Ki-Hyun Jeon, MD, MS b,c,
Jinsik Park, MD, PhD c, Byung-Hee Oh, MD, PhD c
a Department of Emergency Medicine, Mediplex Sejong Hospital, Incheon, Republic of Korea
b Artificial Intelligence and Big Data Center, Sejong Medical Research Center, Bucheon, Republic of Korea
c Division of Cardiology, Cardiovascular Center, Mediplex Sejong Hospital, Incheon, Republic of Korea
d Department of Cardiovascular Science, Mayo Clinic, Rochester, MN, USA

Article info

Available online xxxx

Keywords:
Mitral valve insufficiency
Electrocardiography
Echocardiography
Artificial intelligence
Deep learning

Abstract

Background: Screening and early diagnosis of mitral regurgitation (MR) are crucial for preventing irreversible progression of MR. In this study, we developed and validated an artificial intelligence (AI) algorithm for detecting MR using electrocardiography (ECG).
Methods: This retrospective cohort study included data from two hospitals. An AI algorithm was trained using 56,670 ECGs from 24,202 patients. Internal validation of the algorithm was performed with 3174 ECGs of 3174 patients from one hospital, while external validation was performed with 10,865 ECGs of 10,865 patients from another hospital. The endpoint was the diagnosis of significant MR (moderate to severe) confirmed by echocardiography. We used 500 Hz raw ECG data as predictive variables. Additionally, we used a sensitivity map to show the regions of the ECG that had the most significant impact on the decision-making of the AI algorithm.
Results: During the internal and external validation, the area under the receiver operating characteristic curve of the AI algorithm using a 12-lead ECG for detecting MR was 0.816 and 0.877, respectively, while that using a single-lead ECG was 0.758 and 0.850, respectively. Among the 3157 non-MR individuals, patients whom the AI defined as high risk had a significantly higher chance of developing MR than the low-risk group (13.9% vs. 2.6%, p < 0.001) during the follow-up period. The sensitivity map showed that the AI algorithm focused on the P-wave and T-wave for MR patients and on the QRS complex for non-MR patients.
Conclusions: The proposed AI algorithm demonstrated promising results for detecting MR using 12-lead and single-lead ECGs.
© 2020 Elsevier Inc. All rights reserved.

Abbreviations: AI, artificial intelligence; AUPRC, area under the precision-recall curve; AUROC, area under the receiver operating characteristics curve; CNN, convolutional neural network; ECG, electrocardiography; MR, mitral regurgitation.
⁎ Corresponding author at: Division of Cardiology, Department of Internal Medicine, Cardiovascular Center, Mediplex Sejong Hospital, 20, Gyeyangmunhwa-ro, Gyeyang-gu, Incheon, Republic of Korea.
E-mail address: learnbyliving9@gmail.com (K.-H. Kim).
1 These two authors contributed equally to this study.

https://doi.org/10.1016/j.jelectrocard.2020.02.008
0022-0736/© 2020 Elsevier Inc. All rights reserved.

Introduction

Mitral regurgitation (MR), the reversal of blood flow from the left ventricle into the left atrium, is the most common heart valve disorder in the United States and developed countries [1,2]. Although the prevalence of significant MR in the general population is only 1.7%, approximately 10% of patients over the age of 70 are affected by significant MR [1,2]. As MR is a progressive disease that can lead to heart failure and death, screening and early diagnosis are important to predict irreversible disease progression and prevent death [3,4].

There are also no suitable screening tools for asymptomatic or mildly symptomatic patients. The most common symptoms are vague, such as fatigue and exertional dyspnea, and most patients remain asymptomatic until there is left ventricular cavity enlargement with systolic dysfunction, pulmonary hypertension, or the onset of atrial fibrillation [3–5]. Among patients with MR, a murmur is inconsistently detected clinically, and the absence of a murmur does not exclude MR [6]. Electrocardiography (ECG) is non-specific for MR and has a limited role, namely looking for signs of ischemia, left atrial enlargement, and left ventricular hypertrophy [7]. The chest radiograph is also non-specific for MR and shows left atrial and ventricular enlargement [7]. Echocardiography diagnoses MR but may underestimate the severity of regurgitation [8]. Moreover, echocardiography is an expensive, time-consuming, and less accessible method for screening and early detection than ECG or simple chest radiography.

If MR could be detected using a conventional 12-lead ECG or a single-lead wearable device, patients could be referred for echocardiography and early diagnosis. To develop a reliable screening method based on ECG, we used artificial intelligence (AI). A deep learning-based AI
algorithm has achieved state-of-the-art performance in several medical domains, including diagnosis and prediction for cardiovascular disease using ECG [9,10]. We developed and validated an interpretable AI algorithm for detecting MR using 12-lead ECGs. Furthermore, we evaluated the performance of the algorithm in detecting MR using single-lead ECGs and interpreted the developed algorithm's decision-making via visualization and analysis of the ECG characteristics.

Methods

Study design and population

This multicenter retrospective cohort study involved data from two hospitals to develop and validate a deep learning-based AI algorithm for detecting MR. Hospital A is a cardiovascular teaching hospital, while hospital B is a community general hospital. The study subjects were adults (aged ≥18 years) who underwent both ECG and echocardiography within a 4-week period. The Institutional Review Boards of Sejong General Hospital (2019–0356) and Mediplex Sejong Hospital (2019–064) approved this study protocol and waived the need for informed consent due to impracticality and minimal harm. We excluded subjects with missing demographic, electrocardiographic, or echocardiographic information.

Patients treated at hospital A (October 2016–March 2019) were split into algorithm derivation and internal validation datasets (Fig. 1). Patients who had follow-up echocardiography after initial evaluation were assigned to the internal validation dataset. Patients who had no follow-up echocardiography were assigned to the derivation dataset, which was used to develop the AI algorithm. We then evaluated the accuracy of the algorithm using the internal validation dataset. Furthermore, we used the hospital B data as an external validation dataset (March 2017–March 2019) to verify that the algorithm was applicable across centers. Because the purpose of the validation data was to assess the accuracy of the algorithm, we used only one ECG from each patient for the internal and external validation datasets: the most recent ECG prior to their first echocardiography in the study period.

Endpoint and predictive variables

The primary endpoint was the presence of significant MR (moderate to severe), defined as effective regurgitant orifice area ≥ 0.2 cm², regurgitation volume ≥ 30 ml, regurgitation fraction ≥ 30%, and MR grade II–IV, confirmed by echocardiography [8]. As shown in Fig. 2, we used the raw ECG data as the predictor variable. In the raw data of each 12-lead ECG, there were 5000 numbers for each lead, recorded over 10 s (500 Hz), for 60,000 numbers in total. We used 8 s of ECG data by excluding the first and last 1-s periods because more artifacts were contained within these ranges. Consequently, we created 2-dimensional data of 12 × 4000 from each ECG to develop and validate the algorithm.

Development of AI algorithm for detecting MR

As shown in Fig. 2, the AI algorithm was developed using a convolutional neural network (CNN) with 2-dimensional convolution, max pooling, flatten, batch normalization, and dropout layers [11–14]. The architecture of the AI network was composed of six residual blocks with two convolutional layers, two batch normalization layers, one dropout layer, and one max pooling layer per block. The first layer of the convolutional neural network is composed of eight 2-dimensional filtered data, which are the output of the first convolution operation across the raw input ECG data with the shape of 4000 × 12. After the six max pooling layers, which reduced the size of the data, the last layer of the convolutional neural network is composed of 256 2-dimensional filtered data (7 × 6). The values of each filtered data were connected to a fully connected layer after the flatten layer and were used for distinguishing the patterns of ECG with MR from patterns without MR. Following the last convolutional layer, the features were fed into a fully connected layer using flatten and batch normalization layers. We confirmed the performance of the algorithm while increasing the number of fully connected layers. Among 0 to 10 fully connected layers, the performance of the algorithm was maximized when the number of layers was 3. The number of nodes in each fully connected layer was selected by grid search. We chose the smallest number of nodes unless there was a statistically significant difference (P < 0.001). The final fully connected layer outputted one node and was activated using a sigmoid function. Before the raw ECG data were used as features, data preprocessing, normalization, and noise filtering were needed. TensorFlow (Google LLC, Mountain View, CA, USA), an open-source software library, was used as the backend [15]. We provided the developed AI algorithm itself as Supplemental material with this paper.

We developed an additional single-lead ECG-based AI algorithm with 4000 numbers from each single lead in the derivation dataset as input information. The single-lead algorithm was also developed with a CNN. To evaluate the performance of the algorithm when using one ECG lead, we developed the algorithm using one ECG lead and validated it on the same lead. For example, we developed an algorithm using raw data from Lead I and validated the algorithm using raw data from Lead I. We then developed an algorithm using raw data of Lead II and validated the algorithm using raw data of Lead II. In the same manner,
we developed and validated the algorithm using each ECG lead (Lead I, Lead II, Lead III, aVF, aVR, aVL, V1, V2, V3, V4, V5, and V6).

Fig. 1. Study flowchart.

Fig. 2. Description of artificial intelligence algorithm for detecting mitral regurgitation: ECG denotes electrocardiography and MR mitral regurgitation.

Visualizing and interpreting developed AI algorithms

To understand the model and make a comparison with existing medical knowledge, it was important to identify which region had a significant effect on the decision of the AI algorithm. We employed a sensitivity map using a saliency method [16]. The map was computed utilizing the first-order gradients of the classifier probabilities with respect to the input signals. If the probability of a classifier was sensitive to a specific region of the signal, the region would be considered as significant in the model. As the developed algorithm was based on a convolutional neural network, we used the class activation map for visualization. As the convolutional neural network used a rectified linear unit as the activation function and a batch normalization layer was included in the network, a simple class activation map could not be used for visualization. Because of this, we used the gradient-weighted class activation map (Grad-CAM) for visualization. Grad-CAM uses the gradient information of the algorithm and can be used with any activation function and architecture of convolutional neural network [16].

We conducted an additional experiment in the internal and external validation datasets to assess the generalizability of the heat map results. In patients with significant MR, we identified the time point of the first T-wave peak in each ECG. We found the maximal value of the heat map in the ECG and identified the time point at which the maximal value first occurred. We then calculated the difference between the two time points (first T-wave peak and first heat map maximum) and computed the mean and standard deviation of the difference. In patients without significant MR, we calculated the difference in time points between the R-wave peak and the first maximal value of the heat map, and likewise computed the mean and standard deviation of the difference.

Statistical analysis

Continuous variables were presented as mean and standard deviation and were compared using the unpaired Student's t-test or Mann-Whitney U test. Categorical variables were expressed as frequencies and percentages and were compared using the χ² test. At each input of validation data, each AI algorithm calculated the probability of significant MR in the range from 0 to 1. To confirm the performance of the developed AI algorithms, we compared the probability calculated by the AI algorithm with the presence of MR in the validation dataset. For this, we used the area under the receiver operating characteristics curve (AUROC) and the area under the precision-recall curve (AUPRC) to measure the performance of the model [17,18]. We also used sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F-measure as comparative metrics. As the AI algorithm would be used as a screening tool, we confirmed the results at a cut-off point that we set at a sensitivity of 90%. We evaluated the 95% confidence interval using bootstrapping (10,000 resamplings with replacement) [19]. All statistical analyses were performed using R (R Development Core Team, Vienna, Austria).

Confirming AI performance to predict developing MR as subgroup analysis

We hypothesized that the ECGs would show subtle abnormal patterns in the pre-MR phase and that the developed AI algorithm would classify some of these cases as abnormal, giving an initial false-positive result (a study subject classified by the algorithm as having MR but considered non-MR on echocardiography). To examine this, we conducted a subgroup analysis of patients who had follow-up echocardiography in the internal and external validation datasets. Among these patients, we assessed the development of MR in those who were considered non-MR on the initial echocardiography. The AI algorithm divided patients into high- and low-risk groups based on the risk score, using cut-off values determined with Youden's J statistic in the derivation dataset [20]. Because we used a sigmoid function as the activation function of the output node (last layer), the value of the output node is between 0 and 1. We used this value as the risk score. For the analysis of MR development during the 24-month follow-up by AI-defined risk group, we used the Kaplan-Meier method.

Results

Of the 38,393 patients who were eligible for this study, 152 patients were excluded due to missing values. As shown in Fig. 1, the study included 38,241 patients, of whom 2973 had significant MR. Table 1 shows the baseline characteristics. An AI algorithm was developed using a derivation dataset of 56,670 12-lead ECGs from 24,202 patients. The performance of the algorithm was then confirmed using 3174 ECGs from the 3174 patients, of whom 824 had significant MR, in the internal validation dataset from hospital A, and 10,865 ECGs from the 10,865 patients, of whom 424 had significant MR, in the external validation dataset from hospital B.

As shown in Fig. 3, during the internal validation, the AUROC and AUPRC of the AI algorithm were 0.816 (95% confidence interval [CI], 0.811–0.820) and 0.600 (95% CI, 0.594–0.604), respectively. During the external validation, the AUROC and AUPRC of the algorithm were 0.877 (95% CI, 0.870–0.883) and 0.328 (95% CI, 0.322–0.333), respectively. The AUROC of the single-lead AI algorithm during internal and external validation using the aVR lead was 0.758 and 0.850, respectively; these results are shown in Fig. 3 and Table 2.
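The evaluation steps described in the Statistical analysis (AUROC, a cut-off fixed at 90% sensitivity, and a bootstrap 95% confidence interval) can be illustrated with a minimal sketch. This is not the authors' code: they report using R, whereas the sketch below is plain Python, the function names `auroc`, `cutoff_for_sensitivity`, and `bootstrap_ci` are hypothetical, the percentile form of the bootstrap is an assumption, and the toy labels and scores are invented for illustration only.

```python
import math
import random

def auroc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cutoff_for_sensitivity(labels, scores, target=0.90):
    """Highest cut-off at which at least `target` of the positives score >= cut-off."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    needed = math.ceil(target * len(pos) - 1e-9)  # positives that must be flagged
    return pos[needed - 1]

def bootstrap_ci(labels, scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Invented toy data: 5 patients with MR (label 1) and 10 without (label 0).
labels = [1] * 5 + [0] * 10
scores = [0.9, 0.8, 0.7, 0.6, 0.3,
          0.65, 0.5, 0.4, 0.35, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05]

print("AUROC:", auroc(labels, scores))
print("cut-off for >=90% sensitivity:", cutoff_for_sensitivity(labels, scores))
print("95% CI:", bootstrap_ci(labels, scores, n_boot=500))
```

The pairwise AUROC is quadratic in the number of patients, which is fine for a toy example; on datasets of the size reported here a rank-based implementation would be the more practical choice.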
Table 1
Baseline characteristics.a

Characteristics         Hospital A                                Hospital B
                        (Derivation and internal validation data) (External validation data)
                        non-MR           MR               p†      non-MR           MR               p†      p‡

Study subjects, N (%)   24,827 (90.69)   2549 (9.31)              10,441 (96.10)   424 (3.90)               <0.001

Baseline characteristics, mean (SD)
Age                     59.41 (15.35)    67.74 (13.85)    <0.001  57.86 (15.22)    68.36 (14.77)    <0.001  <0.001
Male, N (%)             12,457 (50.2)    1030 (40.4)      <0.001  5256 (50.3)      179 (42.2)       <0.001  0.185
Weight                  64.94 (12.34)    61.20 (11.77)    <0.001  66.09 (13.30)    61.20 (13.24)    <0.001  <0.001
Height                  162.37 (9.33)    159.37 (9.51)    <0.001  163.16 (9.51)    159.73 (9.18)    <0.001  <0.001
BMI                     24.53 (3.57)     24.00 (3.53)     <0.001  24.71 (3.78)     23.89 (4.13)     <0.001  <0.001

Echocardiographic findings
LVSD                    29.66 (8.62)     37.38 (11.05)    <0.001  30.70 (27.75)    40.77 (12.22)    <0.001  <0.001
LVDD                    47.38 (9.13)     53.42 (8.80)     <0.001  48.52 (10.70)    55.91 (9.09)     <0.001  <0.001
Septum                  9.99 (2.08)      10.36 (2.13)     <0.001  9.52 (2.92)      9.72 (1.82)      0.172   <0.001
PWT                     9.59 (2.32)      9.96 (1.60)      <0.001  9.56 (16.83)     9.43 (1.39)      0.873   0.482
Aorta                   32.06 (21.33)    31.96 (4.54)     0.819   30.64 (8.57)     30.87 (4.45)     0.588   <0.001
LAD                     39.01 (7.58)     49.56 (10.26)    <0.001  37.31 (13.76)    48.13 (9.09)     <0.001  <0.001
E                       62.92 (71.76)    86.37 (29.27)    <0.001  66.02 (20.14)    94.63 (26.97)    <0.001  0.003
A                       72.13 (90.84)    78.64 (45.68)    0.011   72.99 (30.34)    78.05 (35.64)    0.012   0.509
DT                      202.26 (63.89)   190.13 (90.84)   <0.001  221.99 (269.17)  212.08 (187.84)  0.495   <0.001
E'                      6.76 (2.74)      5.86 (2.33)      <0.001  6.86 (3.32)      5.48 (2.27)      <0.001  <0.001
A'                      8.79 (2.40)      6.92 (2.39)      <0.001  8.62 (3.05)      6.27 (2.57)      <0.001  <0.001
E over E'               10.51 (28.54)    16.84 (9.04)     <0.001  10.72 (4.82)     20.15 (10.22)    <0.001  0.984
PA pressure             24.32 (7.96)     36.49 (13.87)    <0.001  23.82 (7.65)     38.33 (14.60)    <0.001  <0.001
FAC                     25.81 (7.87)     23.53 (7.76)     0.006   30.07 (9.73)     26.36 (10.21)    0.003   <0.001
S                       8.89 (2.34)      8.64 (2.00)      0.061   9.65 (2.60)      8.77 (2.41)      0.015   0.002
EF                      51.93 (8.22)     45.57 (11.95)    <0.001  48.10 (8.33)     38.95 (13.54)    <0.001  <0.001

Electrocardiographic findings
Heart rate              74.32 (17.78)    83.66 (25.61)    <0.001  72.92 (16.65)    83.87 (24.95)    <0.001  <0.001
AF, N (%)               2294 (9.2)       1061 (41.6)      <0.001  640 (6.1)        140 (33.0)       <0.001  <0.001
PR interval             152.05 (56.43)   107.53 (84.83)   <0.001  155.78 (48.46)   116.18 (82.65)   <0.001  <0.001
QT interval             398.77 (42.76)   403.37 (57.30)   <0.001  399.20 (40.47)   406.01 (53.27)   0.001   0.587
QTc                     436.89 (33.72)   462.99 (41.38)   <0.001  433.78 (33.69)   468.06 (38.78)   <0.001  <0.001
QRSd                    96.44 (17.42)    102.95 (23.17)   <0.001  96.01 (15.79)    106.93 (26.36)   <0.001  0.003
P axis                  73.58 (80.92)    151.17 (120.33)  <0.001  60.86 (66.07)    125.46 (116.71)  <0.001  <0.001
R axis                  38.63 (44.07)    40.11 (52.48)    0.113   38.26 (39.93)    36.41 (52.80)    0.357   0.244
T axis                  45.97 (49.73)    61.14 (75.75)    <0.001  42.63 (42.53)    62.32 (73.78)    <0.001  <0.001

a A denotes late diastolic mitral inflow velocity, A' late diastolic mitral annular tissue velocity, AF atrial fibrillation or atrial flutter, AVA aortic valve area, BMI body mass index, DT deceleration time, E early diastolic mitral inflow velocity, E' early diastolic mitral annular tissue velocity, EF ejection fraction, FAC fractional area change, LAD left atrial dimension, LVDD left ventricular diastolic dimension, LVMI left ventricular mass index, LVSD left ventricular systolic dimension, MR mitral regurgitation, PA pulmonary artery, PWT posterior wall thickness, QRSd QRS duration, S lateral annular tissue Doppler, and SD standard deviation.
† The alternative hypothesis for this p-value was that there is a difference between the mitral regurgitation and non-mitral regurgitation data groups for each variable.
‡ The alternative hypothesis for this p-value was that there is a difference between hospital A (derivation and internal validation data group) and hospital B (external validation group) for each variable.
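As a worked example of the χ² comparison used for the categorical rows of Table 1, the sketch below recomputes the MR versus non-MR comparison for the "Male, N (%)" row of hospital A from the counts in the table. It is an illustration under assumptions: the authors performed their analyses in R, the paper does not say whether a continuity correction was applied (none is used here), and the helper name `chi2_2x2` is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value); the p-value uses the exact survival
    function of the chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts taken from Table 1, hospital A.
male_nonmr, total_nonmr = 12457, 24827
male_mr, total_mr = 1030, 2549

stat, p_value = chi2_2x2(
    male_nonmr, total_nonmr - male_nonmr,  # non-MR: male, female
    male_mr, total_mr - male_mr,           # MR: male, female
)
print(f"chi-square = {stat:.1f}, p = {p_value:.2e}")
```

The resulting p-value falls well below 0.001, in line with the "<0.001" reported for this row.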

We used a sensitivity map to visualize the ECG region used by the AI algorithm to detect MR (Fig. 4). The map shows that the AI algorithm focused on the P wave and T wave in patients with MR (Fig. 4A) and on the QRS complex in patients without MR (Fig. 4B). The mean and standard deviation of the time difference between the first T-wave peak and the first heat map maximum were 13.4 milliseconds (ms) and 79.1 ms, respectively, in patients who had significant MR. The mean and standard deviation of the time difference between the first R-wave peak and the first heat map maximum were 7.3 ms and 32.5 ms, respectively, in patients without significant MR.

There were 3978 patients (3174 patients in the internal validation dataset and 804 patients in the external validation dataset) with follow-up echocardiographic results. Of these, 3157 patients were non-MR individuals at initial echocardiography. We conducted a subgroup analysis of MR development after initial echocardiography in these 3157 patients, of whom 218 developed MR within 27 months. Fig. 5 shows the cumulative hazard for development of MR in the 3157 patients who had no MR at initial echocardiography. As shown in Fig. 5, patients initially classified by the AI as high risk developed MR more often than those in the low-risk group (13.9% vs. 2.6%, P < 0.001). The second part of Fig. 5 is a number-at-risk table, which shows the number of patients who had not yet developed MR at each time point.

Discussion

We developed and validated a deep learning-based AI algorithm for MR detection using a 12-lead ECG. This study demonstrated the promising performance of the AI algorithm for significant MR detection. In addition, we developed an AI algorithm using a single-lead ECG that demonstrated a reasonable performance. We visualized and interpreted our AI algorithm relative to the region and characteristics of the ECGs for MR detection. Lastly, we conducted subgroup analysis for non-MR patients at initial echocardiography and showed that the AI algorithm could predict the development of MR. To the best of our knowledge, this is the first study to develop an AI algorithm for detecting MR and to show interpretable patterns of decision-making using AI.

Attia et al.'s studies showed the possibility of detecting left ventricular dysfunction and developing atrial fibrillation based on deep learning [10,21]. However, deep learning is often criticized for the unreliability of its outcomes because of a lack of clarity and transparency regarding the decision process, often referred to as a black box. Because of this, we used a sensitivity map to visualize the region of the ECG that was used for decision-making by the AI.

In this study, a sensitivity map showed that our AI algorithm focused on the P wave and T wave in patients with MR. Weinsaft et al. showed that ECG-quantified P wave area provided an index of left atrium
remodeling and increased stepwise in relation to MR severity [22]. Atrial fibrillation, in which the P wave disappears, develops commonly in patients with MR, with a reported rate as high as 5% per year [23]. Previous studies confirmed that a rightward T wave axis was associated with heart failure [24]. The AI algorithm might conclude presence of MR from left atrial and ventricular morphologies using the P wave and T wave in patients with wide QRS complex. In contrast, the algorithm focused on the QRS complex in patients without MR. Functional MR and dilated left ventricles were previously reported to be more prevalent among patients with wide QRS complex >130 ms [25]. Early descriptions of MR were based largely on reversibility of the LV dilatation and eccentric hypertrophy [26]. The R wave also occurs just before mitral valve closure and might correlate with mitral valvular function and motion.

Fig. 3. Performance of algorithm for detecting mitral regurgitation: AUPRC denotes area under the precision recall curve, AUROC area under the receiver operating characteristics curve, CI confidence interval, ECG electrocardiography, NPV negative predictive value, and PPV positive predictive value.

Table 2
Performance of artificial intelligence algorithm using single lead electrocardiography.

Lead    Internal validation     External validation
        AUROC (95% CI)          AUROC (95% CI)
V1      0.756 (0.751–0.762)     0.832 (0.827–0.838)
V2      0.760 (0.752–0.765)     0.849 (0.843–0.855)
V3      0.770 (0.764–0.775)     0.846 (0.839–0.853)
V4      0.755 (0.747–0.761)     0.844 (0.837–0.851)
V5      0.757 (0.752–0.762)     0.839 (0.833–0.845)
V6      0.760 (0.753–0.765)     0.840 (0.833–0.847)
aVL     0.746 (0.740–0.752)     0.804 (0.798–0.811)
I       0.764 (0.759–0.769)     0.847 (0.841–0.853)
aVR     0.758 (0.753–0.762)     0.850 (0.842–0.857)
II      0.753 (0.746–0.758)     0.832 (0.825–0.839)
aVF     0.750 (0.745–0.756)     0.822 (0.815–0.828)
III     0.747 (0.741–0.752)     0.818 (0.812–0.825)

Class imbalance is one of the most significant problems in developing an AI algorithm and validating its performance. When the data are very imbalanced, the trained model tends to perform poorly on the minority class. To mitigate this, we adjusted the ratio of MR/non-MR data in the training process by employing over-sampling and under-sampling methods. Most patients were non-MR, and for such imbalanced datasets, AUPRC is a more important metric than AUROC [18]. With imbalanced data, in which the number of negatives outweighs the number of positives, AUROC has a limitation in evaluating performance because the false positive rate (false positives/total actual negatives) changes little even when the number of false positives grows, as long as the total number of negatives is large. Because of this, AUROC might be overestimated in imbalanced data. In our study, the MR proportion of the internal validation dataset was higher than that of the external validation dataset because internal validation included the patients who had follow-up echocardiography. The numbers of patients with significant MR in the derivation, internal validation, and external validation datasets were 1725 (7.13%), 824 (25.96%), and 424 (3.90%), respectively. The proportions in internal and external validation were very different. Hospital A is a cardiovascular teaching hospital and hospital B is a community general hospital. Because of this, hospital A has more patients with significant MR than hospital B. For the subgroup analysis, which examined the development of
significant MR in the follow-up period, we assigned patients who had follow-up echocardiography data in hospital A to the internal validation data. As patients with disease are more likely to undergo follow-up studies, the proportion of significant MR in the internal validation data was further increased. The AUROC in internal validation was smaller (0.816 < 0.877) and the AUPRC in internal validation was larger (0.600 > 0.328) than in external validation. As most medical data are imbalanced, researchers should keep this in mind when developing algorithms and using AUROC for performance testing.

Fig. 4. Sensitivity map of artificial intelligence algorithm for detecting mitral regurgitation.

We showed the reasonable performance of an AI algorithm based on a single-lead ECG, especially aVR, as shown in Table 2. Although lead aVR is often overlooked, the lead represents electrical forces oriented toward the cavity of the heart and acute right ventricular overload [27]. The morphology of the P wave in lead aVR can also be used to differentiate atrial tachycardia [28]. The reliable performance of a single-lead ECG-based AI algorithm indicates that MR could be screened with a single-lead wearable device or a simple monitoring device. Using this algorithm, patients in developing countries with limited medical resources could be screened with simple single-lead devices. In developed countries, patients could be monitored using wearable devices, such as a watch or patch, and could be alerted regarding disease progression in daily life. Our additional finding that the AI algorithm could predict the development of MR indicates that the algorithm could be used for early prediction through screening.

There are several limitations to the present study. First, as this study was conducted in only two hospitals in Korea, it is necessary to validate the model with patients in other countries. To help address this limitation, we have provided the developed AI algorithm as supplemental material to this report. Using this file, researchers can find the detailed algorithm architecture and validate it with their own patients. Second, we need to further explore the decision process of the AI algorithm. For example, additional experiments are required to further understand the deep learning process, and thereby which characteristics of the P wave, QRS complex, and T wave influence the algorithm's decision. Explainable AI has been studied and reported on recently, so the "black box" limitation could be solved in the near future [29,30]. This subject will be our next area of study, and it might become a new methodology for discovering new medical knowledge about disease and ECG. Third, although the purpose of this study was to build a screening tool using ECG and to show the potential of deep learning to interpret bio-signal data in medicine, we should enhance the performance of our algorithm before it can be used as a diagnostic tool. In our next research, we could enhance the performance of our algorithm using methods that are currently being developed, such as randomly wired neural networks.
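The class-imbalance point made earlier in this Discussion, that precision-based measures depend on the proportion of MR while sensitivity and specificity do not, can be made concrete with a small worked calculation. The sketch below applies the standard identity for positive predictive value to the two validation prevalences reported in the paper. The sensitivity of 0.90 matches the screening cut-off described in the Methods, but the specificity of 0.60 is a hypothetical value chosen only for illustration and is not a result from this study.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

SENSITIVITY = 0.90  # screening cut-off target stated in the Methods
SPECIFICITY = 0.60  # hypothetical value, for illustration only

# Prevalence of significant MR in each validation dataset (from the Results).
prevalence = {
    "internal validation": 824 / 3174,
    "external validation": 424 / 10865,
}

ppv_by_dataset = {
    name: ppv(SENSITIVITY, SPECIFICITY, prev) for name, prev in prevalence.items()
}
for name, value in ppv_by_dataset.items():
    print(f"{name}: prevalence {prevalence[name]:.3f}, PPV {value:.3f}")
```

With the operating point held fixed, the lower prevalence in the external dataset alone is enough to pull precision down sharply, which is consistent with the direction of the AUPRC difference between the two datasets.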

Conclusions

We developed an interpretable AI algorithm for detecting MR using 12-lead and single-lead ECGs and demonstrated that the algorithm could detect MR and predict the development of MR. The results indicate that MR could be screened and predicted not only with a conventional 12-lead ECG, but also with a single-lead ECG using a wearable device that employs the AI algorithm.

Funding

No funding was secured for this study.

Fig. 5. Cumulative hazard of developing MR in patients initially without MR: AI denotes artificial intelligence and MR mitral regurgitation.
Disclosures

No authors have financial relationships relevant to this article to disclose.

CRediT authorship contribution statement

Joon-myoung Kwon: Conceptualization, Methodology, Software, Formal analysis, Investigation, Writing - original draft, Writing - review & editing, Visualization. Kyung-Hee Kim: Conceptualization, Resources, Writing - original draft, Writing - review & editing, Project administration. Zeynettin Akkus: Methodology, Formal analysis. Ki-Hyun Jeon: Conceptualization, Writing - review & editing. Jinsik Park: Data curation, Resources, Writing - review & editing, Supervision. Byung-Hee Oh: Data curation, Resources, Writing - review & editing.

Declaration of competing interest

No authors have conflict of interest to disclose.

Acknowledgements

None.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.

[9] Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram. Nat Med 2019;25(1):70–4.
[10] Attia ZI, Noseworthy PA, Lopez-Jimenez F, Asirvatham SJ, Deshmukh AJ, Gersh BJ, et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction. Lancet 2019;394(10201):861–7.
[11] Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med 2019;25(1):65–9.
[12] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE computer society conference on computer vision and pattern recognition; 2016. p. 770–8.
[13] LeCun Y, Boser B, Denker JS, Howard RE, Hubbard W, Jackel LD. Handwritten digit recognition with a back-propagation network. Adv Neural Inf Process Syst 1990;1:396–404.
[14] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE 1998;86(11):2278–324.
[15] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. 12th USENIX Symp Oper Syst Des Implement (OSDI'16); 2016. p. 265–84.
[16] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE international conference on computer vision; 2017. p. 618–26.
[17] Fawcett T. An introduction to ROC analysis. Pattern Recognit Lett 2006;27(8):861–74.
[18] Ozenne B, Subtil F, Maucort-Boulch D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol 2015;68(8):855–9.
[19] Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 2000;19(9):1141–64.
[20] Schisterman EF, Perkins NJ, Liu A, Bondell H. Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 2005;16(1):73–81.
[21] Attia ZI, Kapa S, Lopez-Jimenez F, McKie PM, Ladewig DJ, Satam G, et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardio-
org/10.1016/j.jelectrocard.2020.02.008. gram. Nat Med 2019;25(1):70–4.
[22] Weinsaft JW, Kochav JD, Kim J, Gurevich S, Volo SC, Afroz A, et al. P wave area for
References quantitative electrocardiographic assessment of left atrial remodeling. Schillaci G,
editor. PLoS one. 2014 Jun 5; vol. 9(6): e99178.
[1] Nkomo VT, Gardin JM, Skelton TN, Gottdiener JS, Scott CG, Enriquez-Sarano M. Bur- [23] Grigioni F, Avierinos J-F, Ling LH, Scott CG, Bailey KR, Tajik AJ, et al. Atrial fibrillation
den of valvular heart diseases: a population-based study. Lancet 2006 Sep;368 complicating the course of degenerative mitral regurgitation. J Am Coll Cardiol 2002;
(9540):1005–11. 40(1):84–92 Jul.
[2] Benjamin EJ, Muntner P, Alonso A, Bittencourt MS, Callaway CW, Carson AP, et al. [24] Kwon J, Kim K-H, Jeon K-H, Kim HM, Kim MJ, Lim S-M, et al. Development and val-
Heart disease and stroke statistics—2019 update: a report from the American idation of deep-learning algorithm for electrocardiography-based heart failure iden-
Heart Association. Circulation 2019;139(10):5 Mar. tification. Korean Circ J 2019;49(7):629.
[3] Grigioni F, Tribouilloy C, Avierinos JF, Barbieri A, Ferlito M, Trojette F, et al. Outcomes [25] Erlebacher JA, Barbarash S. Intraventricular conduction delay and functional mitral
in mitral regurgitation due to flail leaflets. JACC Cardiovasc Imaging 2008;1(2): regurgitation. Am J Cardiol 2001;88(1):83–6 Jul.
133–41 Mar. [26] Gaasch WH, Meyer TE. Left ventricular response to mitral regurgitation. Circulation
[4] Enriquez-Sarano M, Avierinos J-F, Messika-Zeitoun D, Detaint D, Capps M, Nkomo V, 2008;118(22):2298–303 Nov 25.
et al. Quantitative determinants of the outcome of asymptomatic mitral regurgita- [27] George A, Arumugham PS, Figueredo VM. aVR - the forgotten lead. Exp Clin Cardiol
tion. N Engl J Med 2005;352(9):875–83 Mar 3. 2010;15(2):e36–44.
[5] Cioffi G, Tarantini L, De Feo S, Pulignano G, Del Sindaco D, Stefenelli C, et al. Func- [28] Gorgels APM, Engelen DJM, Wellens HJJ. Lead aVR, a mostly ignored but very valu-
tional mitral regurgitation predicts 1-year mortality in elderly patients with systolic able lead in clinical electrocardiography**Editorials published in the Journal of the
chronic heart failure. Eur J Heart Fail 2005;7(7):1112–7 Dec. American College of Cardiologyreflect the views of the authors and do not necessar-
[6] Bursi F, Enriquez-Sarano M, Nkomo VT, Jacobsen SJ, Weston SA, Meverden RA, et al. ily represent the views of JACCor the American. J Am Coll Cardiol 2001 Nov;38(5):
Heart failure and death after myocardial infarction in the community. Circulation 1355–6.
2005;111(3):295–301 Jan 25. [29] Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P. InfoGAN: interpret-
[7] Maganti K, Rigolin VH, Sarano ME, Bonow RO. Valvular heart disease: diagnosis and able representation learning by information maximizing generative adversarial nets.
management. Mayo Clin Proc 2010;85(5):483–500 May. Neural Inf Process Syst 2016;1:2172–80.
[8] Zoghbi WA, Adams D, Bonow RO, Enriquez-Sarano M, Foster E, Grayburn PA, et al. [30] Fong RC, Vedaldi A. Interpretable explanations of black boxes by meaningful pertur-
Recommendations for noninvasive evaluation of native valvular regurgitation. J bation. Proc IEEE Int Conf Comput Vis 2017:3449–57.
Am Soc Echocardiogr 2017;30(4):303–71 Apr.
