Enhanced Medical Time-Series Forecasting Using LSTM, MDN, and Attention Mechanism
[Figure: LSTM-MDN-ATTN architecture. LSTM hidden states h1, h2, . . . , ht attend over the values V1, V2, . . . , Vt; the attention weights ᾱ parameterize the mixture outputs (μ, σ) at each step.]

The target series tg = {tg_1, tg_2, . . . , tg_t} ∈ R^t is called the target value. Patients who visited the hospital more than 10 times were used, with a fixed window size of 10.
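To make the pipeline in the figure concrete, the core mechanism (dot-product attention over LSTM hidden states feeding a mixture density head) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: using the last hidden state as the attention query, the √d scaling, and the projection matrices `W_pi`, `W_mu`, `W_sigma` are our assumptions; only the window size (10), hidden dimension (32), and the 6 mixture components come from the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_mdn_head(H, W_pi, W_mu, W_sigma):
    """Dot-product attention over LSTM hidden states H of shape (t, d),
    followed by an MDN head with K Gaussian components (illustrative)."""
    query = H[-1]                              # last hidden state as query (assumption)
    scores = H @ query / np.sqrt(H.shape[1])   # scaled dot products
    alpha = softmax(scores)                    # attention weights (ᾱ in the figure)
    context = alpha @ H                        # attended summary vector, shape (d,)
    pi = softmax(context @ W_pi)               # mixture weights over K components
    mu = context @ W_mu                        # component means (μ)
    sigma = np.exp(context @ W_sigma)          # positive standard deviations (σ)
    return alpha, pi, mu, sigma

def mdn_nll(y, pi, mu, sigma):
    """Negative log-likelihood of a scalar y under the Gaussian mixture."""
    comp = pi * np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return -np.log(comp.sum() + 1e-12)

# Toy dimensions from the text: window 10, hidden dim 32, 6 mixture components.
rng = np.random.default_rng(0)
t, d, K = 10, 32, 6
H = rng.normal(size=(t, d))
alpha, pi, mu, sigma = attention_mdn_head(
    H, rng.normal(size=(d, K)), rng.normal(size=(d, K)), 0.1 * rng.normal(size=(d, K)))
print(alpha.shape, pi.shape, mdn_nll(0.5, pi, mu, sigma))
```

The mixture negative log-likelihood shown here is the standard MDN training loss of Bishop [5]; the paper's exact loss and TATTN wiring may differ.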
27 features with low missing rates were selected from the laboratory test results in the dataset, such as white blood cells, red blood cells, hemoglobin, and so on. Missing values were replaced with the actual values recorded at previous visits. Because inpatients and outpatients have different sparsity, the dataset was aggregated on a monthly basis; if there were multiple visits in a month, only the last record was used. The dataset was normalized using min-max scaling.

B. Training Details

The experiments were implemented in the TensorFlow framework. Hyper-parameters were set as follows: mini-batch size = 64, LSTM hidden dimension = 32, and mixture components = 6. We trained the network with the Adam optimizer, widely used in deep learning, at a learning rate of 0.0001. Three metrics are used to evaluate the LSTM-MDN-ATTN model: RMSE, MAE, and R². For RMSE and MAE, lower values indicate better performance.

RMSE = \sqrt{\frac{1}{N} \sum_{n=1}^{N} (y_n - \hat{y}_n)^2}    (6)

MAE = \frac{1}{N} \sum_{n=1}^{N} |y_n - \hat{y}_n|

R² is used to determine whether the model has been properly trained on the dataset. As R² approaches zero, the regression model does not fit the data well.

R^2 = 1 - \frac{\sum_{n=1}^{N} (y_n - \hat{y}_n)^2}{\sum_{n=1}^{N} (y_n - \bar{y})^2}    (7)

C. Results

Table 1 shows a summary of the experimental results. The performance of three models, LSTM-RMSE, LSTM-MDN [7], and LSTM-MDN-ATTN, was compared through the experiments. L2012, L2015, and L2081 were used as prediction targets.

First, L2012 means red blood cells. The normal range of L2012 at Asan Medical Center in Seoul is 4.2 to 6.3, with a unit of x10^6/mm^3. In the L2012 data of the experimental dataset, the maximum value is 7.33 and the minimum value is 1.12. The LSTM-MDN-ATTN model has the best RMSE and MAE in L2012, and the R-square values show that all three models learn well, at more than 0.75.

The next target variable, L2015, means the number of platelets. L2015 has a normal range of 150 to 350, and the unit is x10^3/mm^3. The maximum value of L2015 in the experimental dataset is 533.0, and the minimum value is 1.0. In the L2015 variable, LSTM-MDN-ATTN also shows the lowest RMSE and MAE values.

The final target is L2081, MCV. MCV indicates the mean corpuscular volume of red blood cells. The MCV of a typical person is between 81 and 96, and the unit is fL. The MCV of the experimental dataset has a maximum value of 114.6 and a minimum value of 70.6. In L2081, the LSTM-MDN-ATTN model shows the best performance in RMSE and MAE, and in all three models the R-square is 0.7 or higher.

This experiment showed that the LSTM-MDN-ATTN model outperformed the LSTM-RMSE model, which is based on the LSTM network that has become one of the most popular architectures for modeling time-series. In addition, the LSTM-MDN-ATTN model shows better performance than the LSTM-MDN model, showing that the proposed TATTN layer is effective on the EMR dataset used.

IV. CONCLUSION

In this study, we propose the LSTM-MDN-ATTN model for predicting the future state of a patient by modeling a distribution suitable for the target data in multivariate medical data. The LSTM-MDN-ATTN model uses the attention mechanism to focus on the distributions that are highly correlated with the target data, improving prediction accuracy.
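The evaluation metrics of Eqs. (6) and (7) are straightforward to compute. The following is a minimal NumPy sketch; the function names and toy data are our own illustration, not the paper's code.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error, Eq. (6)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def r_squared(y, y_hat):
    """Coefficient of determination, Eq. (7): 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.0])
print(rmse(y, y_hat), mae(y, y_hat), r_squared(y, y_hat))
# rmse ≈ 0.3536, mae = 0.25, R² = 0.975
```

Note that R² approaching 1 means the predictions explain most of the variance of the target, consistent with the reported values of 0.7 or higher.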
REFERENCES

[1] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[2] R. Zhao, J. Wang, R. Yan, and K. Mao, "Machine health monitoring with LSTM networks," in Proc. 10th International Conference on Sensing Technology (ICST), 2016, pp. 1–6.
[3] R. Zhao, R. Yan, J. Wang, and K. Mao, "Learning to monitor machine health with convolutional bi-directional LSTM networks," Sensors, vol. 17, no. 2, pp. 273–290, 2017.
[4] I. M. Baytas, C. Xiao, X. Zhang, F. Wang, A. K. Jain, and J. Zhou, "Patient subtyping via time-aware LSTM networks," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, Canada, 2017, pp. 65–74.
[5] C. M. Bishop, "Mixture density networks," Tech. Rep., 1994.
[6] D. Ha and J. Schmidhuber, "Recurrent world models facilitate policy evolution," arXiv preprint arXiv:1809.01999, 2018.
[7] R. Rahmatizadeh, P. Abolghasemi, A. Behal, and L. Bölöni, "From virtual demonstration to real-world manipulation using LSTM and MDN," in Proc. AAAI, New Orleans, LA, USA, 2018.
[8] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," in International Conference on Machine Learning, 2015.
[9] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Proc. Adv. Neural Inform. Process. Syst. (NIPS), 2017, pp. 6000–6010.