WCSP 2018 8555945
Abstract—With the development of mobile communication, human activity recognition (HAR) with smartphones has attracted a lot of attention in recent years. On the other hand, the appearance of deep learning technologies makes it possible to extract features automatically instead of hand-crafting them as in traditional machine learning methods. Among deep models, CNN-based HAR methods dominate the studies compared to RNN-based methods. In this paper, we propose an RNN-based multi-layer parallel LSTM network to recognize human activities. The experimental results on the public UCI HAR dataset indicate that the proposed approach performs better than the traditional machine-learning methods and achieves performance similar to that of CNN, but with lower computational complexity than CNN-based methods.

Index Terms—human activity recognition, deep learning, parallel LSTM network, smartphone sensor

I. INTRODUCTION

Human activity recognition (HAR), as a significant part of human-robot interaction, is widely applied in the healthcare domain, such as elder care support, rehabilitation assistance and cognitive disorder recognition systems [1]. Generally, the data for human activity recognition are collected from two types of devices: cameras and sensors [2]. With the development of mobile communication, sensor-based approaches to HAR using smartphones with low-cost inertial sensors, such as accelerometers, gyroscopes and magnetometers, have received extensive attention [3] [4].

Traditional machine learning (ML) methods applied in HAR often extract features from sensor data before classification, and the features are mostly statistical features in the time domain and the frequency domain [5] [6]. Nevertheless, choosing features for a specific application requires professional knowledge and involves a huge workload. Moreover, information such as the time dependency between actions is often lost after feature extraction [7]. In recent years, deep learning methods, especially CNN-based methods, have been widely used for HAR. They have deep, complex architectures and consider various characteristics of human actions [7] [8] [9].

In this paper, we focus on HAR with smartphone sensors using deep learning methods. We propose an approach based on the long short-term memory (LSTM) network to recognize human activities from the time series data collected by the inertial sensors attached to smartphones. The contributions of this paper are as follows:

• We propose an RNN-based approach to classify human activities. It extracts features automatically, which can preserve time dependency.
• We design a parallel LSTM architecture to reduce the computational cost of this RNN model.
• Extensive experiments have been performed to verify the proposed method. The results indicate that the model performs better than traditional machine-learning methods and achieves performance similar to that of CNN, but with lower computational complexity than CNN-based methods.

The rest of the paper is organized as follows. Section II introduces the related work on deep learning technologies applied in human activity recognition. Section III gives an overview of the recurrent neural network (RNN) and the long short-term memory (LSTM) network, and presents the detailed architecture of the parallel LSTM network. In Section IV, we discuss the model in terms of recognition accuracy and computational complexity on the public HAR dataset. Finally, conclusions are summarized in Section V.

II. RELATED WORK

The task of human activity recognition from sensor data using deep learning methods has been well studied over the last several years. In [7], Wang et al. surveyed and highlighted the recent advances in deep learning approaches for sensor-based activity recognition. The survey showed that CNN-based methods dominate the studies and that they are better at inferring long-term repetitive activities, while RNNs are better at recognizing short activities. In [8], Nakano and Chakraborty used a CNN for HAR with dynamic features, which captured the dynamic characteristics of the time series of sensor data. It was found that dynamic features with a CNN perform better than static features with an SVM. In [9], both dynamic features from the original sensor data and statistical features were used for HAR with a CNN architecture. The results obtained on a public HAR dataset demonstrated that the CNN-based model significantly outperformed the baseline approaches.

As mentioned above, although CNN-based methods are well studied for HAR tasks, there are few works based on RNNs, especially the LSTM. In [10], W. Zhu et al. proposed an end-to-end fully connected deep LSTM network for skeleton-based action recognition. The proposed model facilitated the
automatic learning of feature co-occurrences from the skeleton joints and achieved state-of-the-art performance on several datasets. In [11], Friday et al. proposed deep learning fusion strategies to increase the performance of HAR, using a hybrid of a convolutional neural network and a variant of the recurrent neural network. The convolutional neural network captured the local regional features from multiple raw sensor data, which were aggregated by gated recurrent units. However, this framework was still under implementation.

Overall, in this paper, we try to deal with the HAR problem with an LSTM network and expect to get outstanding results.

III. A MULTI-LAYER PARALLEL LSTM NETWORK

In this section, we give an overview of RNN and LSTM. Then we introduce the multi-layer parallel LSTM network and the training parameters such as dropout and loss function.

A. RNN AND LSTM

An RNN is a kind of artificial neural network that contains cyclic connections, which can model contextual information. An RNN shares its parameters across every element of a sequence and generates outputs that depend on the current and previous inputs. It uses hidden states to hold information on previous inputs.

Fig. 1 illustrates the RNN architecture and its unfolded form. Here $x = (x_0, x_1, x_2, \cdots, x_t)$ and $y = (y_0, y_1, y_2, \cdots, y_t)$ represent the input and output series. As the computational flow of the RNN unit in Fig. 1 shows, a hidden state $h_t$ receives information from the previous hidden state $h_{t-1}$ and the current input $x_t$, acting like the memory of the network that keeps information about what was previously computed. The parameters involved in an RNN are described as follows.

$$h_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h) \tag{1}$$

$$y_t = W_{hy} \cdot h_t + b_y \tag{2}$$

Here $W_{xh}$ is the weight matrix between the input $x_t$ and the hidden state $h_t$, $W_{hy}$ is the weight matrix between the hidden state $h_t$ and the output $y_t$, and $W_{hh}$ is the weight matrix between the previous hidden state $h_{t-1}$ and the current one $h_t$. $b_h$ and $b_y$ are the bias vectors.

Fig. 2. LSTM Unit

Nevertheless, the range of historical information preserved by the hidden states is limited, which is known as the vanishing gradient problem. This problem results from the fact that the influence of a given input decays or blows up exponentially as it circulates around the hidden states. One of the most effective solutions to this problem is the LSTM architecture, which can learn long-term dependencies from a time series. The architecture of the LSTM unit is shown in Fig. 2. The LSTM unit consists of a self-connected memory cell ($c_t$) and three gates: an input gate ($i_t$) to control the storage of the input data, a forget gate ($f_t$) to control the discarding of the previous state, and an output gate ($o_t$) to generate the output results. At a given time step $t$, with the input and output represented by $x_t$ and $h_t$, the LSTM activations are calculated as follows.

$$i_t = \sigma(W_{xi} \cdot x_t + W_{hi} \cdot h_{t-1} + W_{ci} \cdot c_{t-1} + b_i) \tag{3}$$
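
As an illustration, Eqs. (1)-(3) can be sketched in plain Python. This is a minimal sketch rather than the authors' implementation: the helper names (`matvec`, `rnn_step`, `lstm_input_gate`) and the toy dimensions are assumptions made here, and $W_{ci}$ is treated as a full matrix, although peephole weights are often taken to be diagonal.

```python
import math


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    """Logistic function used for the gate activation in Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-z))


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    """One RNN step: Eq. (1) for the hidden state, Eq. (2) for the output."""
    pre = [p + q + b
           for p, q, b in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(v) for v in pre]
    y_t = [p + b for p, b in zip(matvec(W_hy, h_t), b_y)]
    return h_t, y_t


def lstm_input_gate(x_t, h_prev, c_prev, W_xi, W_hi, W_ci, b_i):
    """Input gate activation of Eq. (3); W_ci is assumed to be a full matrix."""
    pre = [p + q + r + b
           for p, q, r, b in zip(matvec(W_xi, x_t), matvec(W_hi, h_prev),
                                 matvec(W_ci, c_prev), b_i)]
    return [sigmoid(v) for v in pre]


if __name__ == "__main__":
    # Toy sequence: 2 input features, 1 hidden unit, 1 output (arbitrary weights).
    h = [0.0]
    for x_t in ([1.0, 0.5], [0.2, -0.3], [0.0, 1.0]):
        h, y = rnn_step(x_t, h, [[0.1, -0.2]], [[0.5]], [0.0], [[1.0]], [0.0])
        print(h, y)
```

The same weights are reused at every step of the loop, which is the parameter sharing described above; the hidden state `h` is the only quantity carried from one time step to the next.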
choose the number of the parallel and the merging LSTM unit to be 24 and 64, respectively, on the premise of guaranteeing the model performance.

Fig. 5. Recognition accuracy of the model with different numbers of neurons in the merging LSTM unit (x-axis: number of merging LSTM unit neurons, 12 to 144; y-axis: recognition accuracy)

Moreover, we also study the influence of the dropout rate on the overall performance. Experiments show that the model overfits early in training with a high dropout rate and fails to converge with a low dropout rate. Finally, we found

Lastly, we compare our multi-layer LSTM network with other state-of-the-art methods in HAR. Table IV lists the human activity recognition accuracy achieved by different methods on the UCI dataset. From it, we note that our multi-layer LSTM network achieves a recognition accuracy of 94.34%, which is superior to traditional machine learning methods such as SVM and HMM, and it is close to the results with

TABLE V
COMPUTATION COMPLEXITY

Method                                 Recognition Time (ms)
Convolutional Neural Network [21]      21.32
Convolutional Neural Network [9]       25.61
Multi-layer Serial LSTM Network        65.02
Multi-layer Parallel LSTM Network      5.76
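
The recognition times in Table V can be turned into speed-up ratios with simple arithmetic. The snippet below is only an illustrative calculation on the four reported values; the dictionary keys and the `speedups` helper are names chosen here, and the ratios say nothing about hardware or implementation differences between the methods.

```python
# Recognition times (ms) as reported in Table V.
recognition_time_ms = {
    "CNN [21]": 21.32,
    "CNN [9]": 25.61,
    "Multi-layer serial LSTM": 65.02,
    "Multi-layer parallel LSTM": 5.76,
}


def speedups(times, baseline):
    """Return how many times slower each method is than the baseline method."""
    base = times[baseline]
    return {name: t / base for name, t in times.items() if name != baseline}


ratios = speedups(recognition_time_ms, "Multi-layer parallel LSTM")

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x the parallel LSTM recognition time")
```

On these figures, the parallel network is roughly 3.7 times faster than the CNN of [21] and roughly 11.3 times faster than the serial LSTM network.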
V. CONCLUSIONS

This paper proposes a smartphone-based HAR method using a multi-layer parallel LSTM network. The proposed method can automatically extract time-dependent features from the original sensor data and classifies the activities with a softmax. A public HAR dataset, the UCI dataset, which contains six activities, is used for the simulation experiment. The hyper-parameters of the network are tuned, which leads to an accuracy of 94.34%. The results show that our proposed method performs better than traditional machine-learning methods and spends less time during recognition than CNN-based methods, which makes it suitable for building a low-cost real-time HAR system on the smartphone platform.

ACKNOWLEDGMENT

This work was supported by the funding of the Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications, Ministry of Education, JZNY201704) and Nanjing University of Posts and Telecommunications (NY217021, NY218014).

REFERENCES

[1] C. Chen, R. Jafari, and N. Kehtarnavaz, “A survey of depth and inertial sensor fusion for human action recognition,” Multimedia Tools & Applications, vol. 76, no. 3, pp. 4405–4425, 2017.
[2] Y. Chen and C. Shen, “Performance analysis of smartphone-sensor behavior for human activity recognition,” IEEE Access, vol. 5, no. 99, pp. 3095–3110, 2017.
[3] A. Jahangiri and H. A. Rakha, “Applying machine learning techniques to transportation mode recognition using mobile phone sensor data,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2406–2417, 2015.
[4] C. V. S. Buenaventura and N. M. C. Tiglao, “Basic human activity recognition based on sensor fusion in smartphones,” in Integrated Network and Service Management, pp. 1182–1185, 2017.
[5] Z. Chen, Q. Zhu, S. Y. Chai, and L. Zhang, “Robust human activity recognition using smartphone sensors via CT-PCA and online SVM,” IEEE Transactions on Industrial Informatics, vol. 13, no. 6, pp. 3070–3080, 2017.
[6] H. He, Y. Tan, and J. Huang, “Unsupervised classification of smartphone activities signals using wavelet packet transform and half-cosine fuzzy clustering,” in IEEE International Conference on Fuzzy Systems, pp. 1–6, 2017.
[7] J. Wang, Y. Chen, S. Hao, X. Peng, and L. Hu, “Deep learning for sensor-based activity recognition: A survey,” Pattern Recognition Letters, 2018.
[8] K. Nakano and B. Chakraborty, “Effect of dynamic feature for human activity recognition using smartphone sensors,” in International Conference on Awareness Science and Technology, pp. 539–543, 2017.
[9] I. Andrey, “Real-time human activity recognition from accelerometer data using convolutional neural networks,” Applied Soft Computing, pp. 1–8, 2017.
[10] W. Zhu, C. Lan, J. Xing, W. Zeng, Y. Li, L. Shen, and X. Xie, “Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks,” National Laboratory of Pattern Recognition, pp. 3697–3703, 2016.
[11] N. H. Friday, M. A. Al-Garadi, G. Mujtaba, U. R. Alo, and A. Waqas, “Deep learning fusion conceptual frameworks for complex human activity recognition using mobile and wearable sensors,” in International Conference on Computing, Mathematics and Engineering Technologies, pp. 1–7, 2018.
[12] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, “A public domain dataset for human activity recognition using smartphones,” in European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), pp. 437–442, 2013.
[13] “Tensorflow api.” https://www.tensorflow.org.
[14] K. Li, X. Zhao, J. Bian, and M. Tan, “Sequential learning for multimodal 3D human activity recognition with long-short term memory,” in IEEE International Conference on Mechatronics and Automation, pp. 1556–1561, 2017.
[15] S. Seto, W. Zhang, and Y. Zhou, “Multivariate time series classification using dynamic time warping template selection for human activity recognition,” in Computational Intelligence, 2015 IEEE Symposium, pp. 1399–1406, 2016.
[16] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, “Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine,” in International Conference on Ambient Assisted Living and Home Care, pp. 216–223, 2012.
[17] C. A. Ronao and S. B. Cho, “Evaluation of deep convolutional neural network architectures for human activity recognition with smartphone sensors,” Korea Information Science Society, pp. 858–861, 2015.
[18] C. A. Ronao and S. B. Cho, “Human activity recognition using smartphone sensors with two-stage continuous hidden Markov models,” in International Conference on Natural Computation, pp. 681–686, 2014.
[19] Y. Li, D. Shi, B. Ding, and D. Liu, “Unsupervised feature learning for human activity recognition using smartphone sensors,” in Mining Intelligence and Knowledge Exploration: Second International Conference, pp. 99–107, 2014.
[20] C. A. Ronao and S. B. Cho, “Recognizing human activities from smartphone sensors using hierarchical continuous hidden Markov models,” International Journal of Distributed Sensor Networks, vol. 13, no. 1, pp. 1–16, 2017.
[21] C. A. Ronao and S. B. Cho, “Human activity recognition with smartphone sensors using deep learning neural networks,” in Pergamon Press, pp. 235–244, 2016.