Professional Documents
Culture Documents
LV 2017
LV 2017
LV 2017
Abstract—Human–machine systems required a deep under- tivities has emerged as a key research area in human–computer
standing of human behaviors. Most existing research on action interaction [1], [4].
recognition has focused on discriminating between different ac- While state-of-the-art systems achieve reasonable perfor-
tions, however, the quality of executing an action has received little
attention thus far. In this paper, we study the quality assessment of mance for many action recognition tasks, research thus far
driving behaviors and present WiQ, a system to assess the quality mainly focused on recognizing “which” action is being per-
of actions based on radio signals. This system includes three key formed. It can be more relevant for a specific application to
components, a deep neural network based learning engine to ex- recognize whether this task is being performed correctly or not.
tract the quality information from the changes of signal strength,
There are very limited studies on how to extract additional ac-
a gradient-based method to detect the signal boundary for an in-
dividual action, and an activity-based fusion policy to improve the tion characteristics, such as the quality or correctness of the
recognition performance in a noisy environment. By using the qual- execution of an action [5].
ity information, WiQ can differentiate a triple body status with an In this paper, we study the quality assessment of driving be-
accuracy of 97%, whereas for identification among 15 drivers, the haviors. A driving system is a typical human–machine system.
average accuracy is 88%. Our results show that, via dedicated anal- With the rapid development of automatic driving technology, the
ysis of radio signals, a fine-grained action characterization can be
achieved, which can facilitate a large variety of applications, such driving process requires closer interactions between humans and
as smart driving assistants. automobiles (machine) and a careful investigation of the behav-
iors of the driver [6]. There are several potential applications
Index Terms—Human–computer interaction, machine learning,
neural networks.
for quality assessments of driving behaviors. The first applica-
tion is driving assistance. According to the quality information,
one can classify the driver as a novice or as experienced, and
I. INTRODUCTION then, for the former, the assistance system can provide advices
in complex traffic situation. The second potential application is
T IS very important to understand fine-grained human behav-
I iors for a human–machine system. The knowledge regarding
human behaviors is fundamental for better planning of a cyber-
risk control. It provides an important hint of fatigued driving if
a driver repeatedly drives at a low quality level. Additionally,
long-term driving quality information is meaningful for the car
physical system [1]–[3]. For example, action monitoring has the
insurance industry.
potential to support a broad array of applications, such as elder
We explore a technique for qualitative action recognition
or child safety, augmented reality, and person identification. In
based on narrowband radio signals. Currently, fatigue detection
addition, by observing the behaviors of a person, one can obtain
systems generally rely on computer vision, on-body sensors or
important clues to his intentions. Automatic recognition of ac-
on-vehicle sensors to monitor the behaviors of drivers and detect
the driver drowsiness [7]. In comparison, a radio-based recog-
nition system is nonintrusive, easy to deploy, and can work well
Manuscript received December 31, 2015; revised June 14, 2016, November
2, 2016, and January 20, 2017; accepted February 5, 2017. This work was sup- in non-line-of-sight scenarios. Additionally, for old used cars or
ported in part by the National Natural Science Foundation of China under Grant low-configuration cars, it is much easier to install an radio-based
61572512, Grant U1435219, and Grant 61472434 and in part by JSPS KAK- system than a sensor-based system.
ENHI under Grant JP16K00117. This paper was recommended by Associate
Editor G. Fortino. (Corresponding author: Mianxiong Dong.) It is obvious that quality assessment is much more challenging
S. Lv, Y. Lu, X. Wang, and Y. Dou are with the National Laboratory of than action recognition. Qualitative action characterization has
Parallel and Distributed Processing, National University of Defense Technology, thus far only been demonstrated in constrained settings, such as
Changsha 410073, China (e-mail: shaohelv@nudt.edu.cn; ylu8@nudt.edu.cn;
xdwang@nudt.edu.cn; yongdou@nudt.edu.cn). in sports or physical exercises [5], [8]. Even with high-resolution
M. Dong is with the Department of Information and Electronic Engi- cameras and other dedicated sensors, for general activities, a
neering, Muroran Institute of Technology, Muroran 050-8585, Japan (e-mail: deep understanding of the quality of action execution has not
mx.dong@csse.muroran-it.ac.jp).
W. Zhuang is with the Department of Electrical and Computer Engi- been reached.
neering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: There are several technical challenges for quality recognition
wzhuang@uwaterloo.ca). by radio signals such as modeling the action quality, the method
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. of signal fragments extraction, and how to mitigate the effect of
Digital Object Identifier 10.1109/THMS.2017.2693242 noise and interference. We present WiQ, a radio-based system
2168-2291 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
to assess the action quality by leveraging the changes of radio flight (TOF), as a temporal feature, is used in three-dimensional
signal strength. There are three key components in this system. (3-D) localization [18]. Finally, angle of arrival (AOA) is used
1) Deep neural network-based learning: The quality of ac- in direction inference and object imaging [26].
tion is characterized by the relative variation (e.g., gradi- There are two major classification methods: first, fingerprint-
ent) of the received signal strength (RSS). A framework based mapping, which takes advantage of machine learning
based on the deep neural network is proposed to learn the techniques to recognize actions [15], [19], [24], [25] and, sec-
quality from the gradient sequence of the signal strength. ond, geometric mapping, which extracts the distance, direc-
2) Gradient-based boundary detection: As the signal tion or other parameters to infer the locations or actions of
strength can vary sharply at the start and end points of interest [16]–[18].
an action, a sudden gradient change is a strong indica- While several works have explored how to recognize actions,
tor of the action boundary. A gradient-based method is only a few have addressed the problem of analyzing the action
proposed to extract the signal fragment for an individual quality. In [8], a programming by demonstration method is used
action. to study the action quality in weight lifting exercises through
3) Activity-based fusion: Typically, a driving task is com- Kinect and Razor inertial measurement units. The sensors in
pleted by a series of actions, referred to as activity. To a smart phone are utilized to monitor the quality of exercises
mitigate the effect of surrounding noise, we take an ac- on a balance board [27]. Similarly, Wii Fit uses a special bal-
tivity as a whole, fusing the information from all of ance board to analyze yoga, strength, and balance exercises. In
the actions to derive a sound observation of the action addition, driving behavior analysis systems generally rely on
quality. computer vision to detect eyelid or head movement, on-body
We build a proof-of-concept prototype of WiQ with the Uni- sensors to monitor brain waves or heart rate, or preinstalled in-
versal Software Radio Peripheral (USRP) platform [9] and eval- struments to detect steering wheel movements. Biobehavioral
uate the performance with a driving emulator. To the best of our characteristics are used to infer a driver’s habits or vigilance
knowledge, this is the first study in which action quality as- level [6], [7], [28]. While promising, these techniques suffer
sessment is performed by using wireless signals and a deep from limitations, such as physical contact with drivers (e.g., at-
learning method. Our results show that, via dedicated analysis taching electrodes), high instrumentation overhead, sensitivity
of radio signal features, a fine-grained action characterization to lighting, or the requirement of line-of-sight communication
can be achieved, which leads to a wide range of potential appli- (i.e., a driver wearing eye glasses can pose a serious problem for
cations, such as smart driving assistants, smart physical exercise eye characteristic detection). Different from the existing studies,
training, and healthcare monitoring. we focus on assessing the action quality based on radio signals.
The rest of this paper is organized as follows. Section II
provides an overview of related works, and Section III describes III. BASIC IDEA
the challenges and basic ideas of our work. Section IV discusses
the design of WiQ. Section V presents the experimental results. In this section, we first state the quality recognition problem
Finally, we conclude this research in Section VI. and then the design challenges. Afterward, we discuss the basic
idea for characterizing the quality of actions.
II. RELATED WORK
Action recognition systems generally adopt various tech- A. Problem Statement
niques, such as computer vision [10], inertial sensors [11], ultra- It is critical to understand driving habits for many applications
sonic [12], and infrared electromagnetic radiation. Recently, we such as driver assistance. We consider two representative tasks:
have witnessed the emergence of technologies that can localize First, driver identification to determine which driver from a set
a user and track his activities based purely on radio reflections of candidates performs a driving action; and second, body status
off the person’s body [13]–[15]. The research has pushed the recognition to infer the driver’s vigilance level. When a person
limits of radiometric detection to a new level, including motion is inattentive or fatigued, driving actions will be performed in
detection [16], gesture recognition [17], and localization [18]. a different manner. That is, the quality of driving actions is
By exploiting radio signals, one can detect, e.g., motions behind changed. It is therefore feasible to monitor the body status by
walls and the breathing of a person [19]–[22], or even recognize measuring the quality of driving actions.
multiple actions simultaneously [13]. A careful analysis of the action is required to capture the
In general, there are two major stages in a radio recognition unique feature of driving. As the driving action is generic, it is
system: feature extraction and classification. Thus far, various insufficient to distinguish the driver or his status by identifying
signal features have been proposed: energy [23], frequency [12], actions alone. In fact, different drivers have different driving
[17], temporal characteristics [13], [18], channel state informa- styles, e.g., an experienced driver can stop a car smoothly, while
tion [15], [19], [24], [25], angle characteristics [16], etc. The a novice may be forced to employ sudden braking.
RSS, as an energy feature, is easy to obtain and has been used In this study, the motions of a driver’s foot on the pedal are
widely in action recognition [23]. In comparison, channel state tracked. Fig. 1(a) shows six types of actions for manual car
information is a finer-grained feature that can capture human driving. Here, an action refers to a short motion that cannot
motions effectively [15], [25]. Doppler shift, as a frequency be partitioned further, i.e., a press or release of the pedal (e.g.,
feature, has been used in gesture recognition [17]. Time of clutch, brake, and throttle). Moreover, an activity is defined as
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
C. Quality Recognition
a series of actions to complete a driving task. As shown in
We characterize the quality of action with respect to motion
Fig. 1(b), several typical activities are included: ground-start,
and we consider the duration of an execution and the speed and
parking, hill-start, acceleration, and deceleration.
distance of the pedal motion. We first discuss the case of the
To capture the driving behaviors by radio signals, the trans-
throttle and then extend our discussion to the clutch and brake.
mitter and receiver nodes are located on the two sides of the
The duration of an execution can be estimated after the signal
pedals. The receiver reports the RSS per sampling point. More
boundary of the action is detected. Let TS be the number of
details about the experimental setup are described in Section V.
sampling points in the fragment, an estimate of the duration is
(TS − 1) × tu , where tu is the length of the sampling interval.
B. Challenges The movement speed can be captured by the change rate
There are several technical challenges posed by quality recog- (e.g., gradient) of signal strength. Fig. 3(a) shows the received
nition based on narrowband radio signals. signal in the experiment: The throttle is pressed and released
1) Quality Modeling: Currently, there is no common under- quickly five times and then slowly another five times. When
standing regarding what defines the quality of an action. It is the motion is faster, the change of signal strength is sharper
argued that, if one can specify how to perform an action, quality (e.g., the typical gradients are −18 and −9.47 for the two cases,
can be defined as the adherence of the execution of an action respectively). The gradient sequence of signal strength is plotted
to its specification [5], [8]. To measure quality, it is therefore in Fig. 3(b). The gradient magnitude is, on average, much larger
necessary to characterize the execution of an action through a for a quicker motion. Thus, the gradient of signal strength is an
finer-grained motion analysis. For example, consider the brak- effective metric to characterize the movement speed.
ing (BP) action in three driving behaviors, e.g., sudden braking, The correlation between the pedal position and the signal
parking, and slight deceleration. As shown in Table I, though strength is exploited to estimate the motion distance. We press
the action is the same, the quality is quite different: First, the the throttle (TP) to a small extent and hold for several seconds;
movement of the brake in the first case is much faster than then press it to a large extent and hold for several seconds and
the others; and second, the movement distance of the brake in finally press it to the maximum degree. The same pattern is
the last case is smaller than the others. repeated for the throttle-releasing (TR) in the opposite order.
There is currently no effective way to characterize the ex- The RSS is shown in Fig. 4. The signal strength is distinct when
ecution of an action. Although many radio signal features are the pedal position is different. To infer the motion distance
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 2. RSS for the acceleration activity with (a) high SNR and (b) low SNR.
Fig. 5. RSS when the (a) clutch, (b) brake, and (c) throttle are pressed and then released.
TABLE II
FEATURES OF GRADIENT FOR QUALITY RECOGNITION
TABLE III
FEATURES OF SIGNAL STRENGTH FOR ACTION RECOGNITION
1
n
n n ( x i −x ) 4
1 i= 1
Average xi Kurtosis −3
n i= 1 ( n1 ( x i −x ) 2 ) 2
Range x
m a x − xm in IQR Q3 − Q1
i= 1
n |x i −x | n
MAD n Sum i = 1 xi
n
n x2
− x) 2 i= 1 i
Variance i = 1 (x i RMS n N
1 ( x i −x ) 3
1 n n
Third C-moment i= 1 x 3i Skewness i= 1
n ( n1 ( x i −x ) 2 ) 3 / 2
Fig. 11. Action recognition with high SNR and 100 training instances.
Fig. 12. Action recognition with low SNR and 100 training instances.
Fig. 10. Action recognition with high SNR and ten training instances.
A. Action Recognition
For each dataset in the high-SNR category, we choose 100
samples randomly for training and the remaining for test.
Figs. 10 and 11 show the results of recognition accuracy. For
Fig. 13. Action recognition with multiple drivers, high SNR, and 1000 training
example, the value (i.e., 13%) at position (4, 5) is the (error) instances.
probability that BR is recognized as TP. The recognition accu-
racy is shown by the diagonal of the matrix. When the training
number of the CNN network is 10, the accuracy is at least 86% More experiments are conducted with three drivers. Together
and on average 95%. With more training (e.g., 100 times), the with the six actions, we have a total of 18 classes. For each class,
performance becomes much better, i.e., the average accuracy ap- from the high-SNR samples, we randomly select 100 of them
proaches 98%. Nevertheless, the impact of noise or interference for training and the remaining for test. Fig. 13 shows the results
is severe on the performance of action recognition. As shown when the training number is 1000. As the number of class is
in Fig. 12, for the low-SNR category, the accuracy is as low as much larger, the accuracy decreases drastically, which can be as
39% and on average 65%. low as 26% and approximately 60% on average.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 15. Body status recognition based on the quality of action with high SNR.
Fig. 14. Quality distribution for CP with (a) different drivers or (b) different
receiver positions.
TABLE IV
AVERAGE ERROR RATE OF ACTION RECOGNITION OF CNN, SVM, AND kNN
WITH HIGH SNR AND DIFFERENT NUMBERS OF ITERATIONS
Note: k NN does not need iteration, and the result is shown when k =
3.
TABLE V
AVERAGE ERROR RATE OF ACTION RECOGNITION OF CNN, SVM, AND kNN
WITH LOW SNR AND DIFFERENT NUMBERS OF ITERATIONS
Fig. 17. Accuracy of driver identification with activity-based fusion in the
high-SNR category.
Number of Iterations 5 10 15 20 50 100
TABLE VI
AVERAGE ERROR RATE OF BODY STATUS RECOGNITION WITH SVM AND
DIFFERENT SETS OF GRADIENT FEATURES
[20] Z. Yang, Z. Zhou, and Y. Liu, “From RSSI to CSI: Indoor localiza- Mianxiong Dong (M’13) received the B.S., M.S.,
tion via channel response,” ACM Comput. Surv., vol. 46, no. 2, 2013, and Ph.D. degrees in computer science and engineer-
Art. no. 25. ing from The University of Aizu, Aizuwakamatsu,
[21] W. Xi et al., “Electronic frog eye: Counting crowd using WiFi,” in Proc. Japan, in 2006, 2008, and 2013, respectively.
IEEE INFOCOM, 2014, pp. 361–369. He is currently an Associate Professor with the
[22] F. Adib, H. Mao, Z. Kabelac, D. Katabi, and R. C. Miller, “Smart homes Department of Information and Electronic Engi-
that monitor breathing and heart rate,” in Proc. Annu. ACM 33rd Annu. neering, Muroran Institute of Technology, Muro-
ACM Conf. Human Factors Comput. Syst., 2015, pp. 837–846. ran, Japan. His research interests include wireless
[23] S. Sigg, M. Scholz, S. Shi, Y. Ji, and M. Beigl, “RF-sensing of activities networks, cloud computing, and cyber-physical
from non-cooperative subjects in device-free recognition systems using systems.
ambient and local signals,” IEEE Trans. Mobile Comput., vol. 13, no. 4,
pp. 907–920, Apr. 2014.
[24] G. Wang, Y. Zou, Z. Zhou, K. Wu, and L. M. Ni, “We can hear you
with Wi-Fi!” in Proc. ACM Annu. Int. Conf. Mobile Comput. Netw., 2014,
pp. 593–604.
[25] C. Han, K. Wu, Y. Wang, and L. M. Ni, “Wifall: Device-free fall detection
by wireless networks,” in Proc. IEEE INFOCOM, 2014, pp. 271–279.
[26] D. Huang, R. Nandakumar, and S. Gollakota, “Feasibility and limits of
Wi-Fi imaging,” in Proc. ACM Conf. Embedded Netw. Sensor Syst., 2014, Xiaodong Wang received the B.S., M.S., and Ph.D.
pp. 266–279. degrees in computer science from the National Uni-
[27] A. Moeller et al., “Gymskill: A personal trainer for physical exercises,” in versity of Defense Technology, Changsha, China, in
Proc. IEEE Int. Conf. Pervasive Comput. Commun., 2012, pp. 588–595. 1996, 1998, and 2002, respectively.
[28] J. M. Wang, H. Chou, S. Chen, and C. Fuh, “Image compensation for im- He is with the National Laboratory for Parallel
proving extraction of driver’s facial features,” in Proc. Int. Conf. Comput. and Distributed Processing, National University of
Vis. Theory Appl., 2014, pp. 329–338. Defense Technology, where he has been a Professor
[29] S. Sigg, S. Shi, and Y. Ji, “Rf-based device-free recognition of simulta- since 2011. His current research interests focus on
neously conducted activities,” in Proc. ACM Conf. Pervasive Ubiquitous wireless communications and social networks.
Comput. Adjunct Publ., 2013, pp. 531–540.
[30] S. Sigg, S. Shi, F. Büsching, Y. Ji, and L. C. Wolf, “Leveraging RF-channel
fluctuation for activity recognition: Active and passive systems, contin-
uous and RSSI-based signal features,” in Proc. Int. Conf. Adv. Mobile
Comput. Multimedia, 2013, p. 43.
[31] D. Halperin, T. E. Anderson, and D. Wetherall, “Taking the sting out of
carrier sense: interference cancellation for wireless LANs,” in Proc. 14th
ACM Int. Conf. Mobile Comput. Netw., 2008, pp. 339–350.
[32] K. El-Kafrawy, M. Youssef, and A. El-Keyi, “Impact of the human mo-
tion on the variance of the received signal strength of wireless links,” in Yong Dou (M’08) received the B.Sc., M.Sc., and
Proc. IEEE 22nd Int. Symp. Pers., Indoor Mobile Radio Commun., 2011,
Ph.D. degrees in computer science from the National
pp. 1208–1212.
University of Defense Technology, Changsha, China,
[33] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits
in 1990, 1993, and 1997, respectively.
Syst. Mag., vol. 6, no. 3, pp. 21–45, Third Quarter 2006. He is currently a Professor with the National Lab-
[34] C. Chang and C. Lin, “LIBSVM: A library for support vector machines,”
oratory for Parallel and Distributed Processing, Na-
ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, 2011, Art. no. 27.
tional University of Defense Technology. His current
research interests focus on intelligent computing, ma-
chine learning, and computer architecture.
Shaohe Lv (S’06–M’11) received the B.S, M.S., and
Ph.D. degrees in computer science from the National
University of Defense Technology, Changsha, China,
in 2011, 2005, and 2003, respectively.
He is with the National Laboratory of Parallel
and Distributed Processing, National University of
Defense Technology, where he has been an Assis-
tant Professor since July 2011. His current research
interests focus on wireless communication, machine
learning, and intelligent computing.
Weihua Zhuang (M’3–SM’01–F’08) received the
B.Sc. and M.Sc. degrees from Dalian Maritime Uni-
versity, Dalian, China, in 1982 and 1985, respectively,
and the Ph.D. degree from the University of New
Yong Lu is currently working toward the Ph.D. de- Brunswick, Fredericton, NB, Canada, in 1993, all in
gree in wireless networking, pervasive computing electrical engineering.
from the National Laboratory for Parallel and Dis- She has been with the Department of Electrical
tributed Processing, National University of Defense and Computer Engineering, University of Waterloo,
Technology, Changsha, China. Waterloo, ON, Canada, since 1993, where she is cur-
His current research interests focus on wireless rently a Professor and a Tier I Canada Research Chair.
communications and networks. Her current research interests focus on wireless net-
works and smart grids.
Prof. Zhuang is an Elected Member of the Board of Governors and VP Pub-
lications of the IEEE Vehicular Technology Society.