HandGest Hierarchical Sensing For Robust In-The-Air Handwriting Recognition With Commodity WiFi Devices

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 1

HandGest: Hierarchical Sensing for Robust


in-the-air Handwriting Recognition with
Commodity WiFi Devices
Jie Zhang, Member, IEEE, Yang Li, Haoyi Xiong, Senior Member, IEEE, Dejing Dou, Senior Member, IEEE,
Chunyan Miao, Senior Member, IEEE, and Daqing Zhang, Fellow, IEEE

Abstract—Recent advances in wireless sensing techniques have teers, evaluation results demonstrate that HandGest outperforms
made it possible to recognize hand gestures using Channel state-of-the-art methods on a large number of handwritings with
State Information (CSI) in commodity WiFi devices. Existing different transceivers’ location, different initial hand locations
WiFi based gesture recognition systems mainly use learning- and orientations, as well as different drawing sizes. Given its
based pattern recognition methods to recognize different gestures, superior performance, we believe that HandGest paves a new
however, these methods fail to work well when the locations of way to enhance the real-world practicality of WiFi-based gesture
transceivers, the relative location and orientation of the hand recognition.
with respect to transceivers, and/or the hand gesturing size
change, leading to inconsistent signal patterns caused by those Index Terms—Handwriting recognition, dynamic phase vector,
factors. Although some recent efforts have been made to address motion rotation variable, hierarchical sensing.
the so-called “domain-dependent” gesture recognition problem,
they either require prior knowledge on initial locations of the
hand and WiFi devices or need to train several classifiers for
I. I NTRODUCTION
the specific domains. Different from state-of-the-art methods, N the past decade, the feasibility of gesture-based human
we construct two distinct features from a hand-oriented view
(rather than from a transceiver’s view), namely dynamic phase
I computer interaction (HCI) has been widely explored by
researchers, aiming at developing advanced interactive tech-
vector (DPV) and motion rotation variable (MRV), which are
quite consistent in characterizing a big set of handwriting nologies with intelligent devices [1]–[3]. Different from tradi-
gestures, despite significant change in locations of transceivers, tional touch-based interfaces, contactless hand gesture-based
the relative location and orientation of the hand with respect to HCI supports the natural interaction through hand drawing in
transceivers, and the drawing sizes. We further incorporate a the air without dedicated on-body hardware. Existing state-of-
hierarchical sensing framework and develop HandGest – a real-
time handwriting gesture recognition system using commodity the-art methods can be grouped into the following categories:
WiFi devices, to precisely recognize a great number of “in-the- camera-based, acoustic-based, and WiFi-based methods, etc.
air” handwritings based on the aforementioned two domain- [4]. Blessed by the ubiquitousness of WiFi signals at low costs,
independent features and a pipeline of specific features. Extensive WiFi-based solutions, capturing the variations of WiFi signals
experiments have been done in practical settings with 20 volun- caused by hand movements, have attracted growing attention
from both academia and industries [5]–[7].
This work is supported in part by the National Natural Science Foundation To capture gestures in the space covered by WiFi signals,
of China A3 Foresight Program under Grant 62061146001, and in part by the
PKU-Baidu Fund under Grant 2019BD005. (Corresponding Author: Daqing Channel State Information (CSI) has been used by existing
Zhang) studies [8]–[10]. Specifically, CSI characterizes the frequency
Jie Zhang is with the Key Laboratory of High Confidence Software response of the wireless channel between the transceivers at
Technologies, Ministry of Education, Beijing 100871, China, also with the
School of Computer Science, Peking University, Beijing 100871, China, orthogonal frequency-division multiplexing (OFDM) subcarri-
and also with the School of Automation & Electrical Engineering, Uni- ers, and it would vary greatly along with moving objects in
versity of Science and Technology Beijing, Beijing 100083, China (e-mail: the space [11]–[13]. Given a commodity WiFi device pair, one
zhangjie.saee@pku.edu.cn).
Yang Li is with the Key Laboratory of High Confidence Software Tech- could retrieve CSI from the network interface card (NIC), such
nologies, Ministry of Education, Beijing 100871, China, and also with the as Intel 5300 NIC, with CSITOOL [14]. Based on raw CSI
School of Computer Science, Peking University, Beijing 100871, China (e- measurements, most state-of-the-art methods adopt learning-
mail: liyang@stu.pku.edu.cn).
Haoyi Xiong and Dejing Dou are with the Big Data Laboratory, Baidu Inc., based methods [15]–[17] to first extract features from CSI, and
Beijing 100193, China, and also with the National Engineering Laboratory of then recognize different gestures though classification, where
Deep Learning Technology and Applications, Beijing 100193, China (e-mails: the primitive features, such as CSI amplitude and statistical
haoyi.xiong.fr@ieee.org; doudejing@baidu.com).
Chunyan Miao is with the School of Computer Science and Engineering, feature, have been frequently used to classify gesture with
Nanyang Technological University, Singapore 639798, and also with the respect to signal variations.
Joint NTU-UBC Research Centre of Excellence in Active Living for the However, these methods fail to recognize hand gestures
Elderly (LILY), Nanyang Technological University, Singapore 639798 (e-mail:
ascymiao@ntu.edu.sg). robustly in practical settings, as they usually cannot tolerate
Daqing Zhang is with the Key Laboratory of High Confidence Software the change of the relative location and orientation of the hand
Technologies, Ministry of Education, Beijing 100871, China, also with the with respect to transceivers, and the drawing size. Specific
School of Computer Science, Peking University, Beijing 100871, China, and
also with the Telecom SudParis, Institute Polytechnique de Paris, 91011 Evry constraints or prior knowledge on those factors are required
Cedex, France (e-mail: dqzhang@sei.pku.edu.cn). by these methods for hand gesture recognition, as signal

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 2

patterns of hand gestures are usually inconsistent and they consistent and distinguishable representations of handwriting
are indistinguishable with primitive features. To show the gestures in a domain-independent way. Specifically, the use of
inconsistency of signal patterns with varying factors, in Fig. 1, DPV and MRV accurately categorizes various handwritings
we visualize CSI amplitudes of four hand moving straight into a big number of groups, thus it is quite straight-forward
lines with different initial locations, directions and lengths. As to distinguish gestures from different groups accurately. In
shown in Fig. 1a, line 1 and line 2 share the common moving order to differentiate gestures in the same group, a hierarchical
direction, but with different initial locations and lengths. As sensing framework is proposed to apply with fine-grained
a result, CSI amplitudes of line 1 and line 2 are significantly feature to refine the classification results.
different. Line 2 and line 3 share the same initial location, but In this way, we construct HandGest that incorporates the
with different moving directions and lengths. Such that, CSI two distinct features and a hierarchical sensing framework to
amplitude of line 2 has more peaks and valleys than line 3 (see recognize the hand gestures robustly. Note that, HandGest is
Fig. 1b). Moreover, when the hand moves another straight line just an initial attempt to address the fundamental problem
with different lengths, i.e., line 4, it appears to share similar in CSI-based gesture/activity recognition – the location and
CSI amplitude with line 1. Apparently, all primitive features orientation dependent problem – in the wireless sensing field,
extracted from CSI measurements are affected by the above- where we use digits and letters recognition as an example.
mentioned three factors. The inconsistency of signal patterns With the aforementioned problem solved, HandGest provides
would fail most of the existing WiFi-based gesture/activity researchers in the community critical insights in system design
recognition methods, which highly rely on the assumption that while leading a way to move on. The main contributions can
there are pair-to-pair feature mappings between signal varia- be summarized as follows:
tions and gestures/activities. This is one fundamental problem
1) Two hand-centric features, i.e., DPV and MRV, are
hindering the existing WiFi-based gesture/activity recognition
constructed to convert every handwriting gesture to a
systems from achieving consistent and high accuracy [18].
unique profile from a hand-oriented view, which is not
To recognize gestures under the change of aforementioned
affected by the initial locations and orientations of the
factors (or namely “configurations” in [19] and “domains”
hand, and drawing sizes.
in [20]), some “domain adaptation” techniques have been
2) A hierarchical sensing framework is proposed for ro-
proposed. Specifically, WiAG [19] first leveraged a translation
bust handwriting recognition in real-world application
function to generate virtual samples with realistic CSI mea-
scenarios, where DPV, MRV and fine-grained feature
surements of hand gestures for other possible configurations.
are fused in a hierarchical decision making pathway to
After that, several classifiers were trained using the generated
ensure the distinguishability of numerous handwriting
virtual samples for pattern recognition under different configu-
gestures.
rations. Note that, an additional smart phone had been request-
3) We design and implement HandGest on commodity
ed to collect data from users, so as to model the translation
WiFi devices, and conduct comprehensive experiments
function. Widar3.0 [20] proposed a domain-independent body-
in real-world settings. Experimental results of handwrit-
coordinate velocity profile (BVP) to unify the characterization
ing gestures performed by 20 volunteers with different
of gestures in the same type with at least three pairs of WiFi
initial locations, drawing orientations/sizes, and WiFi
devices, where the locations of the users and WiFi devices are
transceiver settings in 3 typical indoor environments
requested for BVP construction. In summary, state-of-the-art
show that the average recognition accuracy of HandGest
methods usually acquire additional information to tackle the
outperforms state-of-the-art methods, indicating its ro-
“domain adaptation” issue, which limits their practicability and
bustness and great generalization performance.
applicability in real-world application scenarios.
Our Work and Contributions. In this work, we study The remaining of this paper is organized as follows: Section
a specific hand gesture recognition task, namely “in-the- II reviews recent advances of WiFi-based gesture recognition.
air” handwriting recognition, aiming at recognizing the digits The preliminaries and backgrounds of gesture sensing with
and letters written by a hand in the air. While the state- CSI ratio are given in Section III. Section IV presents the
of-the-art learning-based solutions either are intolerant with methodologies of our work, including the design of DPV and
changing factors/domains/configurations, or rely on additional MRV features, and the hierarchical sensing framework. System
prior knowledge, data annotation, devices and transceiver de- implementation is shown in Section V. Performance evaluation
ployments for hand gesture recognition, we propose to address and further analysis are reported in Section VI, followed
the domain-dependent issue from a different perspective. by some discussions in Section VII. Finally, conclusions are
While all existing gesture recognition methods apply the presented in Section VIII.
transceiver-oriented view to receive CSI and extract primitive
features, it is difficult to recognize the hand gestures robustly
as the pattern of the same gesture may vary from the view II. R ELATED W ORK
of WiFi devices due to initial locations and orientations of
the hand, and drawing sizes. Differently, we characterize the WiFi-based gesture recognition can be grouped into two
relative motion of handwriting using a hand-oriented view categories depending on whether the initial locations and
and construct two distinct features – dynamic phase vector orientations of handwritings have been specified or included
(DPV) and motion rotation variable (MRV) – that provide as the input for recognition.

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 3

Line 1 Line 2

Amplitude

Amplitude
Line 2

Line 1 Line 3 Line 4

Line 3 Line 4

Tx Rx

Amplitude

Amplitude
(a) (b)
Fig. 1. Effects of initial locations, orientations and drawing sizes on gesture patterns: (a) Drawing lines with different initial locations, directions and lengths,
(b) Patterns of different lines.

A. WiFi-based Gesture Recognition that Depends on the Ini- advanced location/orientation independent feature or/and re-
tial Hand Locations and Orientations sort to machine/deep learning methods. WiAG [19] proposed
a modified translation function to generate virtual samples of
Most of the existing WiFi-based gesture recognition works
all the gestures in the possible configurations only using the
assume the initial hand locations, orientations and drawing
samples of all the gestures in one configuration. Widar3.0 [20]
sizes of handwritings are known or specified. They barely per-
constructed a domain-independent BVP to guarantee each type
form well with the changes of initial location and orientation
of gestures has its unique velocity profile in the established
of the hand, and drawing sizes. Specifically, WiFinger [15]
body coordinate system, which could achieve satisfactory
adopted multi-dimensional DTW (MD-DTW) to calculate the
recognition accuracy on relatively complex gestures. However,
similarity between the testing CSI measurements and each of
it requires to know the locations of user and WiFi devices in
the known gesture profiles, and the one with highest similarity
advance, and a large number of data for training deep neural
score was identified as recognized gesture. QGesture [17]
network, which limit its practicability and scalability in real-
adopted phase correction algorithm and a novel estimation
world application scenarios. WiHand [25] extracted location
algorithm to reduce the phase noise of CSI measurements
independent gesture signals from the raw WiFi signals through
and impacts of environmental dynamics for high accuracy, but
the low rank and sparse decomposition algorithm, and then
user must perform two predefined actions to separate gesture
utilized SVM as classifier to identify the gestures. Regani et
from daily activities. WiMU [21] was designed for multi-user
al. [26] proposed a unique hand gesture model and derived the
gesture recognition, which first generated virtual samples for
correspondence between the time reversal resonating strength
various plausible combinations of simultaneously performed
decay and relative distance moved by the hand in the through-
gestures, and then classified those gestures utilizing the Jac-
the-wall scenario. They achieved 87% recognition accuracy
card similarity coefficient based method. Yang et al. [22]
on 6 uppercase English alphabets, but the hand movement
designed a modified OpenWrt-based platform, in which deep
must be straight line during performing gestures in the air.
neural networks were adopted to extract the spatiotemporal
Additionally, Ma et al. [27] developed a novel deep neural
patterns of gestures. WiDraw [23] extracted angle-of-arrival
network for gesture recognition, which could simultaneously
(AoA) values from CSI and average received signal strength
learn high-level features and a transferrable similarity eval-
(RSS) measurements for hand motion tracking, but it required
uation ability, making the learned knowledge transferrable
dense transmitters to guarantee its high accuracy. WiGest [24]
between the training dataset and new testing scenarios. While
defined three unique signal states, including rising edge, falling
state-of-the-art methods intend to solve the location/orientation
edge and pause, and parsed the combinations of those signal
dependent problem, they actually are with poor robustness
states to identify various gestures. While the aforementioned
and generalization performance especially when the initial
works have made some progresses in recognition, they fail to
location, orientation and drawing size, etc., are not constrained.
solve the location/orientation dependent problem – a funda-
mental problem hindering the wireless sensing techniques to
be applied in real settings. III. G ESTURE S ENSING WITH CSI R ATIO
Given K subcarriers, denoted as H(f1 ), H(f2 ), . . . H(fK )
B. Location/Orientation Independent WiFi-based Gesture over the frequencies f1 , f2 , . . . fK , CSI aggregates the real-
Recognition time characterization of the K subcarriers of radio propaga-
tion, which can be mathematically represented as
In order to tackle the location/orientation dependent prob-
lem of WiFi-based gesture recognition, some works propose H = [H (f1 ) , H (f2 ) , . . . , H (fK )] . (1)

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 4

Specifically, a single subcarrier H(fk ) of CSI can be two antennas can be represented as
expressed as ( d (t)
)
−jθoffset −j2π 1λ
H1 (f, t) e Hs,1 + A1 e
= ( )
̸ H(fk ) H2 (f, t) d2 (t)
H (fk ) = |H (fk )| ej (2) e−jθoffset H + A e−j2π λ s,2 2
d (t)
−j2π 1λ
A1 e +Hs,1 (4)
where |H (fk )| denotes the amplitude, and ̸ H (fk ) denotes = d1 (t)+∆d

the corresponding phase, respectively. A2 e−j2π λ +Hs,2


d1 (t)
In an environment covered by WiFi signals, the radio would A1 e−j2π λ +Hs,1
=
propagate over multiple paths, including line-of-sight (LoS) A2 e −j2π ∆d
λ e−j2π
d1 (t)
λ +Hs,2
path of WiFi transceivers, and reflection paths caused by the
surrounding environment, from the transmitter to the receiver. where H1 (f, t) and H2 (f, t) denote the CSI measurements
To be specific, when the hand moves in the air, CSI would of the first and second antennas, Hs,1 and Hs,2 are the
change dynamically due to the change of the reflection path corresponding static phasor components, ∆d = d2 (t) − d1 (t),
length, where we can further decompose the CSI into the respectively.
static component and the dynamic component [28]. The static According to Eq. (4), we can find that CSI ratio can cancel
component remains the same as the subcarriers propagating most of the random phase offsets, and significantly improve
along with the LoS path and surrounding environment do not SNR. Thus, CSI ratio should be very sensitive to hand motions
change, while the dynamic component varies along with the and is capable of fully utilizing the orthogonal information of
hand movement. In this way, decomposition of CSI into static the CSI amplitude and phase. We conduct an empirical study
and dynamic components could be modeled as follows: to verify whether CSI ratio can capture hand movements. We
 draw a digit “7” in the air (see Fig. 3a), CSI ratio illustrated
d(t) in Fig. 3b and Fig. 3c keeps rotating in the complex plane
H (f, t) = Hs (f, t)+Hd (f, t) = Hs (f, t)+A (f, t) e−j2π λ
with the hand movement, where the length of hand movements
© (3) corresponds to the overall phase change of CSI ratio (in Fig. 3b
where Hs (f, t) is the static phasor component, Hd (f, t) is and Fig. 3c, CSI ratio starts from cool color and ends at
d(t)
the dynamic phasor component, A (f, t), e−j2π λ and d (t) warm color). Moreover, when the hand moves in another
are the attenuation, phase shift and path length, respectively. direction, the rotation direction of CSI ratio also changes. For
As depicted in Fig. 2, the dynamic phasor component rotates example, in Fig. 3c, CSI ratio first rotates counter-clockwise
with the change of the reflection path length, caused by the when the reflection path becomes shorter, while it rotates
hand movement. In terms of the angle of rotation, when the clockwise when having the reflection path lengthened. This
length of the reflection path changes within a wavelength, the case study indicates that CSI ratio can be utilized for sensing
dynamic phasor component rotates on an angle smaller than 2π hand movements at the scale of signal wavelength.
(i.e., a circle). In terms of the direction of rotation, the rotation
direction of the dynamic phasor component also couples with IV. M ETHODOLOGY
the hand moving direction [29].
In this section, we first conduct comprehensive empirical
studies to understand the invariance of CSI ratio for hand-
Fresnel Zones writing recognition tasks, and then introduce the proposed
d=d2-d1
two hand-centric features, i.e., dynamic phase vector (DPV)
Tx and motion rotation variable (MRV), fine-grained feature,
d2 and hierarchical sensing framework for robust handwriting
Hd In-phase
© recognition, respectively.
LoS
A
Hd
B H
d1 Out-of-phase
A. Understanding CSI Ratio Invariance for Handwriting
Hs Recognition
Rx
CSI-based handwriting pattern is usually inconsistent, where
the existing solutions based on primitive features fail to rec-
ognize the same handwriting when the initial hand locations,
Fig. 2. Moving hand in Fresnel zones. the orientations and the sizes of handwriting change. In this
section, we aim at uncovering the invariant patterns of CSI
As we all know, the signal-to-noise ratio (SNR) of the ratio for consistent handwriting recognition under the varying
CSI measurements is usually low due to the hardware and conditions.
environmental noise, and random offset noise in the CSI To understand the inconsistency in handwriting recognition,
phase. It seems difficult to fully capture some small-scale hand we draw digit “3” twice (see Fig. 4a and Fig. 4c). Apparently,
movements with the raw CSI signals. Thus, we adopt CSI ratio their corresponding CSI amplitudes illustrated in Fig. 4b and
to remove the uncertain impulse noise in CSI amplitude and Fig. 4d are significantly different. Actually, this is a common
the random offset noise in CSI phase [30], [31]. CSI ratio of phenomenon, as the hand movements to draw digit “3” are

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 5

CSI Ratio of Tx-Rx1 CSI Ratio of Tx-Rx2


-0.1
-0.4
-0.2
-0.5
Tx -0.3
-0.6

Q
Q
-0.4


-0.7
Rx2 Rx1 -0.5
-0.8
-0.6
-0.9
-0.8 -0.6 -0.4 -0.2 0.4 0.5 0.6 0.7 0.8 0.9 1
I I

(a) (b) (c)


Fig. 3. Empirical study of CSI ratio for sensing hand movement: (a) Drawing digit “7” in the air, (b) CSI ratio of Tx-Rx1, (c) CSI ratio of Tx-Rx2.

slightly different every time. CSI amplitude could be consid-


ered as a kind of primitive feature, which characterizes hand
movements from a transceiver-oriented view, highly depending
on the initial locations and orientations of the hand, and
drawing sizes, easily leading to inconsistency in recognition.
Oppositely, when we characterize hand movements from a
hand-oriented view, recording every instantaneous status of
hand movement at each time step, we can find that the hand
first rotates clockwise, then turns around, and latter rotates
clockwise. This pattern exists consistently when the hand Fig. 5. Hand moving directions of drawing digit “3” in different manners.
writes the digit “3” (see Fig. 5), no matter how initial locations
of the hand and the orientations/sizes of handwriting change.
Remark 2. We consider the moving hand as a mass point
in the space. When we handwrite digits in front of WiFi
Tx-Rx1 transceivers, CSI signals are reflected by several parts of hu-
0.34 20

18
man body, such as hand, arm and even shoulder. Furthermore,
Amplitude

0.32
16 the CSI signals of more than two times reflection are usually
0.3 14
weak, it is reasonable to ignore them. In this way, we only
12
y (cm)

0.28
0 200 400 600 800 1000 1200 1400 1600 1800
Tx-Rx2
consider the CSI signals that directly propagate to the moving
8
0.26
hand and then are reflected by the hand to the WiFi receivers.
Amplitude

0.24
4

0.22 2
0.27 0.28 0.29 0.3 0.31 0.32 0.33 0.34 0 200 400 600 800 1000 1200 1400 1600 1800
x (cm) Samples

(a) (b)

Tx-Rx1
0.34 16
Qtk
15
Amplitude

0.33
14
0.32 13

12
0.31
y (cm)

0 200 400 600 800 1000 1200 1400 1600 1800


Tx-Rx2
0.3 7
Amplitude

0.29 6

0.28 5

0.27
0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39
4
0 200 400 600 800 1000 1200 1400 1600 1800
Fig. 6. Illustration of DPV feature.
x (cm) Samples

(c) (d)
Fig. 4. Empirical study of drawing digit “3”: (a) Drawing digit “3” for the B. Hand-Centric Feature – Dynamic Phase Vector
first time, (b) CSI amplitude of the first digit “3”, (c) Drawing digit “3” for
the second time, (d) CSI amplitude of the second digit “3”. Practically, when performing the same handwriting, it is
impossible to request users to keep the same initial location-
Remark 1. Without loss of generality, we take digits “0- s/orientations of the hand, and sizes of handwriting every
9” as an example to illustrate the proposed hand-centric fea- time. From a transceiver-oriented view, the change of such
tures, fine-grained feature, and hierarchical sensing framework factors makes the representation of hand movement different.
in the following sections. We use these ten digits as examples Therefore, it is worthwhile to construct a hand-centric feature
for handwriting recognition and uncover every digit with a that can characterize the hand movement in a hand-oriented
unique and consistent pattern from a hand-oriented view. view.

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 6

DPV Pattern of Digit "0" DPV Pattern of Digit "1" DPV Pattern of Digit "2" DPV Pattern of Digit "3" DPV Pattern of Digit "4"
2 0 3
2 2
2
1 -1
Phase Changes

Phase Changes

Phase Changes

Phase Changes

Phase Changes
1 1 1
0 -2
0 0 0
-1 -3
-1
-1 -1
-2 -4 -2
0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Samples Samples Samples Samples Samples

(a) Digit “0” (b) Digit “1” (c) Digit “2” (d) Digit “3” (e) Digit “4”

DPV Pattern of Digit "5" DPV Pattern of Digit "6" DPV Pattern of Digit "7" DPV Pattern of Digit "8" DPV Pattern of Digit "9"
2 1.5 2 2

1 1

1 1 1
Phase Changes

Phase Changes

Phase Changes

Phase Changes

Phase Changes
0.5
0
0
0 0 0
-0.5 -1

-1
-1 -1 -1 -2
-1.5

-2 -2 -2 -2 -3
0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Samples Samples Samples Samples Samples

(f) Digit “5” (g) Digit “6” (h) Digit “7” (i) Digit “8” (j) Digit “9”
Fig. 7. Pattern templates of DPV of the ten digits.

As shown in Fig. 6, we establish a coordinate system “2” and “4”, digits “5”, “6” and “9” should be two independent
with initial location of the hand as the origin. We define a subgroups.
modified hand-centric feature, named dynamic phase vector We still take digit “3” as an example to demonstrate the
(DPV), as the phase variation between the initial location of representation capability of DPV for the hand movement. As
the hand and the horizontal axis of each point on the trajectory. illustrated in Fig. 8a, when we draw digit “3” without stroke
Mathematically, dynamic phase sequence can be represented reversal, its DPV first rotates clockwise and then counter
as clockwise (see Fig. 8c and Fig. 8d). When we draw digit
Θ= {Θt1 , Θt2 , ..., Θtn } (5) “3” with stroke reversal, the corresponding DPV also first
rotates clockwise and then counter clockwise (see Fig. 8b,
where Θtk stands for the dynamic phase at time step tk , k ∈
Fig. 8c and Fig. 8d). According to Fig. 6, digit “3” consists
[1, n].
three components in DPV feature space, sequentially denoted
Note that, with the hand movement in ideal settings, CSI
by blue, green and red, corresponding to the three-segment
of every subcarrier should vary consistently with the same
curves in Fig. 7d. DPV can be regraded as a kind of coarse-
pace while the reflection path caused by the moving hand
grained hand-centric feature, which has excellent representa-
could be measured by any of them. However, due to multipath
tion capability for the hand movement and is not sensitive
propagation and random noise, etc., such consistency may
to the local characteristics. Thus, it can efficiently reduce the
not always exist during the whole hand moving process in
effects of different handwriting styles on the same digit. Fig. 9
practical scenarios. Based on the observation, we find that
illustrates a practical case of drawing digit “3” without and
the subcarrier, whose phase change difference has the lowest
with stroke reversal, we can find that the two corresponding
variance, can characterize the hand movement better than
DPV patterns are consistent, verifying the above analysis about
others in a sliding window. Therefore, to refine DPV, we select
the representation capability of DPV for the hand movement.
the subcarrier with the lowest variance in each sliding window:
Υtk ,Q = argmin var (Υtk ,q ) (6)
C. Hand-Centric Feature - Motion Rotation Variable
where Υtk ,q denotes the dynamic phase of the specific qth In this section, we introduce another hand-centric feature,
subcarrier at time step tk , q ∈ [1, K], and Υtk ,i is the dynamic named motion rotation variable (MRV). As was discussed,
phase of the ith subcarrier, respectively. DPV represents digits with consistent patterns, however, it
Fig. 7 illustrates the pattern templates of DPV of the ten also fails to provide distinguishable characterizations to the
digits in three independent experiments of different settings, digits “2” and “4”, and digits “5”, “6” and “9”. For example,
where every pattern template refers to the phase of DPV in Fig. 10, the DPVs of both digits “2” and “4” first rotate
changing over time when handwriting the digit in a common clockwise and then counter clockwise. It means that the
way. Apparently, DPV is capable of preserving characteristics essential difference between digits “2” and “4” is the different
of all digits through capturing the hand movement in the hand- segment sequence, but DPV cannot characterize these detailed
oriented view. Though these digits are not fully distinguishable segments (including straight line and arc, etc.) well.
using the DPV features, DPV is still able to categorize them To tackle the problem, we propose MRV, as the angle
into several subgroups with consistent patterns. For example, variation of adjacent sampling points in a certain period of
digits “0”, “1”, “3”, “7” and “8” have their own unique time (see Fig. 11):
patterns, but digits “2” and “4” have similar DPV patterns,
digits “5”, “6” and “9” have similar DPV patterns, thus digits ∆θ= {∆θt1 , ∆θt2 , ..., ∆θtn } (7)

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 7

Qt Qt

Stroke Reversal

(a) (b)
Fig. 10. Similarity between digits “2” and “4” in DPV feature space.

Tx

qt k

(c) (d) Rx2 Rx1


qt
Fig. 8. Theoretical analysis of representation capability of DPV for the hand k -1

movement.

0.48 0.5 Fig. 11. Illustration of MRV feature.


0.46 0.48

0.44 0.46
Phase Changes

Phase Changes

0.42 0.44

0.4 0.42
distinguished patterns. Similarly, digit “6” has different MRV
0.38 0.4 pattern compared with digits “5” and “9”. Accordingly, the ten
0.36 0.38 digits can be furthered classified with the help of MRV. Un-
0.34 0.36
0.01 0.02 0.03 0.04 0.05
Samples
0.06 0.07 0.05 0.1
Samples
0.15 fortunately, patterns of some digits are still similar, motivating
(a) (b) us to construct targeted fine-grained feature to try to separate
them.
0
-0.4
-0.5
-0.6 D. Fine-grained Feature
Phase Changes
Phase Changes

-1
-0.8
-1.5 -1 In this section, we introduce fine-grained feature designed
-2
-1.2
-1.4
for the indistinguishable digits by DPV and MRV. As men-
-2.5
-1.6 tioned above, two digits are still indistinguishable, but they
-3
0 500 1000 1500
-1.8
0 200 400 600 800 1000 1200 1400 could be divided into one subgroup. Thus, we can construct
Samples Samples

(c) (d)
case-by-case fine-grained feature depending on the characteris-
tics of the digits in the subgroup to finally separate all of them.
Fig. 9. Empirical study of drawing digit “3” without and with stroke reversal:
(a) Digit “3” without stroke reversal, (b) Digit “3” with stroke reversal, (c) The constructed fine-grained feature is actually the qualitative
DPV pattern of digit “3” without stroke reversal, (d) DPV pattern of digit “3” representation of primitive features, which is enough for the
with stroke reversal. rest of unrecognized digits in the subgroup.
According to Fig. 7 and Fig. 12, digits “5” and “9” have
where ∆θtk is the angle variation at time step tk (θtk and similar DPV and MPV patterns. Experimentally, the velocity
θtk−1 represent the instantaneous angles at time step tk and directions of the last segment of digits “5” and “9” are
tk−1 ): significantly different relative to link Tx-Rx1. The velocity
direction of the last segment of dight “5” is always positive,
∆θtk =θtk − θtk−1 (8) and the corresponding one of digit “9” is negative. Thus, the
velocity direction of the last segment can be a potential fine-
According to Eq. (8), ∆θtk can be either positive or negative grained feature for separating digits “5” and “9”. Practically,
depending on the hand moving direction. Specifically, if hand we randomly select some digits “5” and “9” to verify this
rotates clockwise, ∆θtk > 0, and if hand rotates counter observation, which are demonstrated in Fig. 13. According
clockwise, ∆θtk < 0. Thus, MRV also can be utilized to show to Fig. 13, we can find that the last segment of velocity
the hand moving direction. directions of the selected digit “5” are always positive, and
Fig. 12 illustrates the pattern templates of MRV of the ten the corresponding ones of the selected digit “9” are always
digits in three independent experiments of different settings, negative, indicating that velocity direction is enough for only
where every pattern template refers to the angle of MRV recognizing digits “5” and “9”. The velocity direction of the
changing over time when handwriting the digit. In Section last segment of handwriting is only a kind of case-by-case
IV-B, we have demonstrated that DPV cannot separate digits fine-grained feature proposed in this paper. If more gestures
“2” and “4”, and they can be identified using MRV with are supposed to be recognized using the proposed method in

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 8

MRV Pattern of Digit "0" MRV Pattern of Digit "1" MRV Pattern of Digit "2" MRV Pattern of Digit "3" MRV Pattern of Digit "4"
2 0 2 2 1.5

1.5 1
1 -1 1
Angle Changes

Angle Changes

Angle Changes

Angle Changes

Angle Changes
1
0.5
0.5
0 -2 0 0
0
-0.5
-0.5
-1 -3 -1
-1 -1

-2 -4 -1.5 -2 -1.5
0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Samples Samples Samples Samples Samples

(a) Digit “0” (b) Digit “1” (c) Digit “2” (d) Digit “3” (e) Digit “4”

MRV Pattern of Digit "5" MRV Pattern of Digit "6" MRV Pattern of Digit "7" MRV Pattern of Digit "8" MRV Pattern of Digit "9"
2 2 1.5 2
2
1
1 1 1
Angle Changes

Angle Changes

Angle Changes

Angle Changes
0.5 1
Angles

0
0 0 0
0
-1 -0.5
-1 -1 -1
-1
-2
-2 -1.5 -2 -2
0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000 0 200 400 600 800 1000
Samples Samples Samples Samples Samples

(f) Digit “5” (g) Digit “6” (h) Digit “7” (i) Digit “8” (j) Digit “9”
Fig. 12. Pattern templates of MRV of the ten digits.

a location/orientation independent manner, more features need Each or several digits have their own common char-

to be constructed following the idea. acteristics. A certain feature, including both traditional
primitive feature and hand-centric feature, may have
good representation capability only for specific digits, but
Tx-Rx1 Tx-Rx1
0.04
relatively poor representation capability for other digits.
0.02
5 0.02 5
0
0
-0.02
Thus, it is impossible to completely separate all the digits
-0.02
-0.04
-0.06
only utilizing any of the feature. For example, DPV
0 200 400 600 800 1000 1200 1400 0 200 400 600 800 1000 1200
Tx-Rx2 Tx-Rx2 cannot identify digits “2” and “4”, and MRV does not
0

-0.05
0
function well for digits “0” and “6”.
-0.05
-0.1 • Every feature has different representation capability for
-0.1 -0.15
0 200 400 600 800 1000 1200 1400 0 200 400 600 800 1000 1200 different digits, if all the features are utilized equally, the
(a) (b) features with poor representation capability for specific
digits may provide redundant information, leading to
Tx-Rx1 Tx-Rx1
0.05 0.02 performance loss.
0
0 In this way, we design the hierarchical sensing framework
9 -0.02
9 as illustrated in Fig. 14. Accordingly, all the digits are divided
-0.05 -0.04
0 200 400 600 800 1000 0 200 400 600 800
Tx-Rx2
0.02
Tx-Rx2 into several subgroups by DPV in the first layer, named first
0.02

0 0
coarse-grained classification. In the next layer, MRV-based
-0.02 -0.02 second coarse-grained classification is implemented for the
0 200 400 600 800 1000 0 200 400 600 800 grouped digits. After that, unrecognized digits with similar
(c) (d) MRV patterns have been categorized into smaller subgroup.
Fig. 13. Empirical study of identifying digits “5” and “9” using targeted In the final layer, targeted finer-grained feature is designed
fine-grained feature. for the subgroup to separate the digits completely, i.e., fine-
grained classification. Fig. 15 illustrates the hierarchical sens-
ing framework for English letters, which also employs a three-
E. Hierarchical Sensing Framework layer structure with DPV-based first coarse-grained classifi-
With the above three types of features, in this section, we cation, MRV-based second coarse-grained classification, and
present a hierarchical sensing framework for the handwriting fine-grained classification.
recognition. The objective of the hierarchical sensing frame-
work is to enhance the discriminant capability of the features V. S YSTEM I MPLEMENTATION
for pattern recognition, while the design of the framework is In this section, we detail the design and implementation of
motivated by the following observations: HandGest, which consists of data collection, data preprocess-
• Compared with traditional primitive features, the pro- ing, feature construction, and hierarchical sensing components.
posed hand-centric features, i.e., DPV and MRV, play Fig. 16 depicts its overview architecture, in which data collec-
important roles in maintaining pattern consistency of dig- tion component collects raw CSI measurements from the WiFi
its, which make the digit patterns not change significantly receivers and divides them to produce CSI ratio streams, data
due to the difference of initial location and orientation of preprocessing component is mainly for denoising CSI ratio
the hand, and drawing size, etc. data through Savitzky-Golay (SG) filter, feature construction

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 9

0 Classification Pool
Fine-grained Velocity
1 feature direction 5
Rx1
3 relative to link 9
7 Tx-Rx1 3
DPV 2 5 Red digits have similar 5 9 7
Tx 4 9 DPV/MRV patterns
5 MRV 2 1
6 4 Green digits have
Rx2
8 6 Distinguishable
8 DPV/MRV patterns
9 Labeling
Fig. 14. Hierarchical sensing framework for digits.

component constructs DPV, MRV, and fine-grained feature


Rx1
based on the denoised CSI ratio, and hierarchical sensing Rx1 Rx2
component identifies handwriting gestures.
TCP/IP Data Collection
Tx
Rx1 lm TCP/IP
T CSI CSI CSI
ags pz
DPV
Rx2
kqw ce
Tx
Differential Data Denosing CSI Ratio
MRV Fine-grained
Rx2 feature Data Pre-processing
kq ag
ac
ps
eg Feature Construction
wz ce
DPV Construction

l No
Recognized
Yes

q m Subgroup1
p Classification Pool Hierarchical
MRV Contruction
Sensing
Fig. 15. Hierarchical sensing framework for letters. Yes
Recognized
No
Subgroup2
A. Data Collection
Fine-grained
In this component, we collect CSI measurements from the Feature Labeling
two WiFi receivers equipped with Intel 5300 NICs with three Construction

antennas for packet receiving. CSI measurements contain 30


subcarriers, each of them is a 3×30 complex matrix. We send Fig. 16. Overview architecture of HandGest.
all the CSI data to a Dell XPS 7590 laptop via TCP/IP socket.
Finally, we divide the two complex CSI measurements from
the two antennas at the same WiFi receiver to obtain CSI. C. Feature Construction
The sampling rate is set as 500 Hz for a tradeoff between
recognition resolution and time consumption of HandGest. In this component, we first construct DPV by calculating the
phase variation between the starting point of the hand move-
B. Data Preprocessing ment and the horizontal axis of each point on the trajectory
at each time step, establishing a dynamic phase sequence. As
Compared with raw CSI, the SNR of CSI ratio is higher,
mentioned above, HandGest does not have specific constraints
but it still needs to be smoothed for revealing the circular
on initial locations of the hand, and it cannot directly calculate
shape of CSI ratio clearly for phase change extraction in the
the trajectory of hand movement, thus we construct DPV by
complex plane. Considering that SG filter can fit successive
subset of data points with low degree polynomial and little
( )
distortion through linear least-square manner, we adopt it to Φ1tk
denoise and smooth the CSI ratio data [32], [33]. Θtk = arctan (9)
Φ2tk

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 10

where Φ1tk and Φ2tk denote the phase of CSI ratio of links state-of-the-art methods. Finally, we show experimental results
Tx-Rx1 and Tx-Rx2. of HandGest with different initial locations of the hand, draw-
DPV has good representation capability for characterizing ing sizes, drawing speeds, drawing manners, LoS distances of
the hand movement, which can reduce the effects of drawing WiFi transceivers, angles between WiFi transceivers, sampling
inconsistency of the same handwriting gestures, but it may not rates, static environments, and dynamic environments, etc.
identify all the digits only using DPV. Thus, we then calculate
the angle variation of adjacent sampling points in a certain A. Experimental Settings
period of time to construct MRV: The experimental environments include an office, a meeting
room, and an empty hall of Peking University, layouts of
∆Φ1tk = Φ1tk − Φ1tk−1 (10) the office and meeting room are illustrated in Fig. 17. The
size of the office is 5m×5m, the size of the meeting room
is 6m×8m, and the size of the empty hall is 8m×12m,
∆Φ2tk = Φ2tk − Φ2tk−1 (11) respectively. Accordingly, both of the office and meeting room
are multipath-rich due to their complex layouts. The tested
( )
∆Φ1tk handwriting gestures include selected digits and letters.
∆θtk = arctan (12)
∆Φ2tk
Differently, MRV is sensitive to the local characteristics of
hand movement, which is discriminative to the gestures with
similar DPV patterns. Finally, targeted fine-grained feature is
proposed to identify the unrecognized gestures by DPV and
MRV, such as velocity relative to link Tx-Rx1, direction of
displacement relative to link Tx-Rx1, etc.

D. Hierarchical Sensing
In this component, all the constructed features are embedded (a) (b)
into the modified hierarchical sensing framework to identify Fig. 17. Layouts of the office and meeting room: (a) Office, (b) Meeting
handwriting gestures, enhancing discriminant capability of room.
those features in different layers. Specifically, first coarse- In each of the three indoor environments, we build a 2-
grained classification separates some of the gestures using dimensional Fresnel zones with two pairs of WiFi transceivers,
DPV, in which all the handwriting gestures have their own utilizing GigaByte mini PCs with an Intel 5300 NIC and
consistent patterns. In addition, second coarse-grained classi- external omni-directional antennas for data transmitting and
fication is performed for the handwriting gestures with similar receiving, as shown in Fig. 18 (experimental locations of
DPV patterns using MRV. Finally, fine-grained classification HandGest are denoted by the red points in Fig. 17). HandGest
is implemented for the unrecognized handwriting gestures by runs in 5.24 GHz frequency with 40 MHz, and the packet
DPV and MRV. In the offline preparation phase, to minimize transmission rate is set as 500 packets per second, installing
the efforts required for training and calibration, our method CSITOOL on the mini PCs for data collection.
only needs to collect the WiFi signals for each handwrit-
ing once as the training sample. In the online recognition
phase, when an unknown handwriting is drawn and processed,
HandGest extracts the two features from the WiFi signals
and searches the nearest example as the recognition label
using DTW by calculating the Euclidean distance of DPV
and MRV features between the unknown handwriting and the
template handwritings. Note that, no matter the handwritings
are performed by whom and in which location/orientation
settings, the proposed method is capable of producing a
robust model that can accurately recognize the handwritings
performed by different persons in varying location/orientation
settings.
Fig. 18. Experimental setting in the office.

VI. P ERFORMANCE E VALUATION We recruit 20 volunteers to participate our experiments,


In this section, we conduct comprehensive experiments to including 10 males and 10 females. There are 3 experienced
evaluate the performance of HandGest in various scenarios. participants (i.e., #1-#3), 12 of them have no prior knowledge
Specifically, we first introduce experimental settings, including about the proposed methods for CSI-based gesture recognition
experimental environments, deployment of WiFi devices, and (i.e., #4-#15), and the rest of them know nothing about WiFi
participants, etc. Furthermore, we compare HandGest with sensing (i.e., #16-#20).

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 11

B. Overall Comparisons 100


In this section, we compare HandGest with selected state-
of-the-art CSI-based gesture recognition methods, including 80
WiFinger [15], Widar3.0 [20], FingerDraw [29] and AirDraw
[34], and machine/deep learning methods commonly utilized 60

Accuracy
in WiFi sensing, including support vector machine (SVM),
convolutional neural network (CNN), long short-term memory 40

(LSTM), and CNN+LSTM. We train SVM, CNN, LSTM,


and CNN+LSTM using the data collected from the randomly 20 HandGest SVM CNN+LSTM
WiFinger CNN FingerDraw
selective 15 volunteers, and test them using the data collected Widar3.0 LSTM AirDraw
from the rest 5 volunteers. 0
Office Meeting Room Empty Hall
According to Fig. 19, HandGest outperforms all the com-
parison methods. In addition, the recognition accuracy of
Fig. 19. Comparison results of HandGest and state-of-the-art methods.
Widar3.0 is about 73%, which is higher than WiFinger, SVM,
CNN, LSTM, CNN+LSTM. Because Widar3.0 utilizes a mod-
ified BVP to describe the power distribution over different 0 0.92 0.00 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00
velocities of body parts involved in gesture performing, and 1 0.00 0.96 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.02
train the deep learning model. However, Widar3.0 may not 2 0.00 0.00 0.95 0.05 0.00 0.00 0.00 0.00 0.00 0.00
generate enough resolution ratio to well identify centimeter- 3 0.00 0.00 0.03 0.96 0.00 0.01 0.00 0.00 0.00 0.00

True Label
level gestures. Furthermore, it needs at least three wireless 4 0.00 0.00 0.00 0.00 0.96 0.03 0.00 0.00 0.01 0.00
links, and the locations of user and WiFi devices are required, 5 0.00 0.00 0.00 0.00 0.01 0.94 0.00 0.00 0.00 0.05
which may be practically inconvenient. HandGest achieves 6 0.10 0.00 0.00 0.00 0.00 0.00 0.90 0.00 0.00 0.00
better recognition accuracy than FingerDraw and AirDraw. 7 0.01 0.02 0.00 0.00 0.00 0.00 0.00 0.98 0.00 0.00
The main reasons include: 1) Both FingerDraw and AirDraw 8 0.01 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.94 0.02

calculate the displacement information of the hand/finger to 9 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.01 0.00 0.97

reconstruct the drawing trajectory, but the accumulated error 0 1 2 3 4 5 6 7 8 9


Predicted Label
may lead to poor drawing trajectory. 2) When we draw the
digits in 180 degree, Microsoft Azure OCR used in Finger- Fig. 20. Confusion matrix of digits.
Draw and AirDraw for classification cannot recognize such
kind of digits accurately. Differently, HandGest characterizes
the relative motions of hand gestures, thus the accumulated overall accuracies, which are not affected by the different
error has no effect on recognition accuracy, and it can achieve initial locations of the hand. We can find that #1-#3 achieve
better recognition accuracy under different orientations and the highest accuracies with more than 95%, and the accura-
initial locations of the hand, and drawing sizes. Machine/deep cies of others concentrate at 90%, indicating the robustness
learning-based methods do not achieve good performance, and great generalization performance of HandGest with both
because they assume that all the gestures have their unique experienced and unexperienced participants.
patterns. As mentioned above, pattern consistency highly
depends on the initial location of the hand in Fresnel zones
and the number of Fresnel zones it moves across. In order 96
to present the recognition accuracy of each individual digit,
we select samples collected from different environments and
94
volunteers to construct a confusion matrix of digits (see
Fig. 20). Accordingly, the overall average recognition accuracy
Accuracy

92
is around 95%.

90
C. Evaluation of Robustness and Generalization Performance
In this section, we evaluate the robustness and generalization 88
performance of HandGest under various factors.
1) Impact of Different Initial Locations of the Hand: We
print templates for all the selected digits and letters on a 1 2 3 4 5 6 7 8 9 1011121314151617181920
cardboard, and hang them in the sensing area between the Volunteer Number
two pairs of WiFi transceivers, making participants draw at
Fig. 21. Comparison results of different participants without restrictions on
different initial locations by adjusting the locations of those initial location of the hand.
templates. In this manner, there are no special constraints
on the initial locations of the hand, all the participants can 2) Impact of Different Drawing Sizes: We print three
perform handwriting gesture at any locations naturally be- templates on a cardboard in three different sizes (5cm×7cm
tween the WiFi transceivers. Fig. 21 depicts the corresponding for small size handwriting gestures, 7cm×10cm for middle

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 12

TABLE I TABLE III


C OMPARISON R ESULTS OF D IFFERENT D RAWING S IZES C OMPARISON R ESULTS OF D IFFERENT L O S D ISTANCES

Small size Middle size Large size 50 cm 75 cm 100 cm 125 cm 150 cm

Accuracy 95.35% 95.79% 92.97% Accuracy 95.41% 95.62% 94.63% 93.50% 91.30%

TABLE II TABLE IV
C OMPARISON R ESULTS OF D IFFERENT D RAWING S PEEDS C OMPARISON R ESULTS OF D IFFERENT A NGLES BETWEEN W I F I
T RANSCEIVERS
Low speed Normal speed High speed
60 degree 90 degree 120 degree
Accuracy 94.17% 95.10% 90.79%
Accuracy 94.71% 96.03% 95.09%

size handwriting gestures, and 10cm×12cm for large size


handwriting gestures) and hang them in front of the volunteers 5) Impact of Different LoS Distances: We set five different
as the ground truth of drawing sizes. We select one volunteer LoS distances of WiFi transceivers, including 60cm, 75cm,
to draw digits and letters in the meeting room, and the cor- 100cm, 125cm and 150cm, and ask a volunteer to perform
responding results are listed in Table I. According to Table I, handwriting gestures in each setting. The comparison results
we can find that the overall accuracies of different drawing are list in Table III, we can find that the accuracies under short
sizes are comparable only with little fluctuations. LoS distances are little better than the one under longer LoS
3) Impact of Different Drawing Speeds: We ask a volunteer distance. The main reason is that the reflected WiFi signals of
to perform handwriting gestures in three different speeds, the hand under short LoS distance scenarios are stronger than
including low speed about 3 cm/s, normal speed about 5 cm/s, the ones under longer setting, which can reduce the effects of
and high speed about 10 cm/s, the detailed results are listed in other moving objects in the surroundings.
Table II. Accordingly, both of the low speed and normal speed 6) Impact of Different Angles between WiFi Transceivers:
achieve better accuracies than high speed. We conjecture that We set three different angles between WiFi transceivers,
the hand cannot perform the handwriting gestures in a standard including 60 degree, 90 degree and 120 degree. According
manner with high speed, leading to performance degradation. to the results listed in Table IV, HandGest can achieve similar
4) Impact of Different Drawing Manners: Similar to the accuracies with different angles between WiFi transceivers,
evaluation of different initial locations of the hand, there are indicating that HandGest does not need additional efforts on
still no specific constraints on drawing manners, but with fixed deployment of angles between devices.
initial locations. As we all know, it is difficult to perform one 7) Impact of Different Sampling Rates: Sampling rate is
specific handwriting gesture in the same manner every time. actually important to guarantee the satisfactory accuracy, thus
The overall accuracies are depicted in Fig. 22, in which all we set five different sampling rates to evaluate its effects,
the volunteers can achieve satisfactory results as long as they including 50 Hz, 100 Hz, 150 Hz, 300 Hz and 500 Hz.
follow the standard styles of digits and letters, indicating that Fig. 23 depicts the corresponding comparison results, when
the proposed hand-centric features and hierarchical sensing the sampling rate is less than 100 Hz, the accuracies are worse
mechanism can reduce or even eliminate the effects of drawing than the ones with higher sampling rates, the corresponding
inconsistency on recognition accuracy. accuracies become stable with small fluctuations, indicating
that the lower sampling rate is not enough for capturing hand
movements. In practical scenarios, we should do a tradeoff
96 between recognition accuracy and system running time, be-
cause high sampling rate may lead to HandGest relatively
95 time-consuming.
8) Impact of Different Static Environments: As mentioned
94
above, we implement HandGest in three typical indoor envi-
Accuracy

93 ronments, including an office, a meeting room, and an empty


hall. The experimental results are detailed in Table V, we can
92 find that there is no obvious difference among them. Because
HandGest recognizes handwriting gestures depending on the
91 reflected WiFi signals in the limited sensing area, the changes
of multipaths superimposed in different indoor environments
90
have negligible effects on the recognition accuracy.
1 2 3 4 5 6 7 8 9 1011121314151617181920
9) Impact of Dynamic Environments: We conduct experi-
Volunteer Number
ments in dynamic environments to evaluate the performance
Fig. 22. Comparison results of different volunteers without restrictions on of HandGest in dynamic scenarios. Specifically, we open and
drawing manners. close door and window while the user performs handwriting in

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 13

VII. D ISCUSSIONS
100
In this section, we discuss several open issues of this
work, including the flexibility of the proposed hierarchical
80
sensing framework, interpretability of feature extraction for
WiFi sensing, additional benefits of the proposed hierarchical
Accuracy

60 sensing mechanism, and limitations of HandGest.

40 A. Flexibility of the Hierarchical Sensing Framework


We demonstrate the proposed hierarchical sensing frame-
20 works for recognizing digits and letters in Fig. 14 and Fig. 15,
respectively. In the three-layer framework, DPV-based first
0 coarse-grained classification and MRV-based second coarse-
50 100 150 300 500 grained classification are performed to ensure the digits and
Sampling Rates letters have the unique profiles, and targeted fine-grained
features are constructed in the third layer for specific digits and
Fig. 23. Comparison results of different sampling rates.
letters, making the distance among handwriting gestures with
similar DPV or MRV patterns smaller and the distance among
TABLE V handwriting gestures with different DPV or MRV patterns
C OMPARISON R ESULTS OF S TATIC E NVIRONMENTS larger. The above experiments show that such kind of hierar-
chical sensing framework can recognize digits and letters with
Office Meeting room Empty hall robustness and great generalization performance. It should be
Accuracy 95.73% 95.21% 96.02% noted that all the modules in this framework can be withdrawn
and adjusted, and new modules and more modified features can
be embedded depending on recognition granularity and type
TABLE VI of handwriting, indicating its good flexibility.
C OMPARISON R ESULTS OF DYNAMIC E NVIRONMENTS
B. Interpretability of Feature Extraction for WiFi Sensing
Office Meeting room
Data-based methods attract much more attention in WiFi
Door open 93.42% 95.30% sensing field due to the easier implementation, which aim at
finding fixed feature mapping between signal variation and
Door closed 95.99% 95.77%
activity using machine/deep learning techniques [35]–[39]. In
Window open 94.93% 94.06% simple scenarios, data-based methods may achieve satisfactory
performance when the relationship between signal variation
Window closed 95.11% 94.63% and activity is consistent. It may be difficult to find pair-to-pair
feature mappings only using machine learning due to the rich
multipath and/or other random interference. In this situation,
TABLE VII
C OMPARISON R ESULTS OF P EOPLE M OVING AT D IFFERENT D ISTANCES deep learning is usually adopted to extract more discriminative
features through the complicated multilayer structure. How-
Office Meeting room Empty hall ever, when two different digits have similar patterns, deep
learning may fail to extract discriminative features, such as
2m 94.31% 94.90% 95.55% digits “5”, “6” and “9” have similar DPV patterns [40]. In
1.5m 95.20% 95.36% 95.02%
addition, deep learning suffers from time-consuming training
process due to fine-tuning of a large number of parameters, and
1m 93.32% 92.01% 94.60% this complication makes it difficult to theoretically understand
0.5m 90.20% 90.62% 91.93%
and interpret the feature extraction process [41], [42]. Thus,
the “black-box” deep learning makes people lose understand-
ing the essence of WiFi sensing. Differently, the proposed
hierarchical sensing mechanism enables feature extraction of
front of WiFi transceivers. The experimental results are listed WiFi sensing interpretable and visualized, which is very useful
in Table VI, which indicate the performance of HandGest for WiFi sensing, as shown in Fig. 24. In the first layer,
has not been affected seriously. In addition, we ask people to DPV represents the general movement of the hand, it may
move around the WiFi transceivers. According to Table VII, be some instantaneous points at each time step, describing the
HandGest still could achieve good performance when people outline of specific gesture. In the second layer, MRV has good
move far from 0.5m away. When people are too close to representation capability for the local characteristics of specific
the WiFi devices, we cannot achieve the satisfactory DPV handwriting gesture, making its texture clearer. Finally, fine-
and MRV patterns, because the movement of the surrounding grained feature can describe qualitative details well, we can
people significantly affects the received signals. recognize the profile of this handwriting gesture.

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 14

DPV MRV
Fine-grained
Labeling hands by increasing additional WiFi devices to construct 3-
Features
dimensional Fresnel zones in a multiview manner. Moreover,
we treat the moving hand as a mass point due to the weak
Unknown
reflection of other parts of human body. However, it should
Gestures
be multiple points reflection when our body shakes vigorously
during performing handwritings, which is a challenging issue.
General
Local Details Qualitative Details While the proposed hierarchical sensing framework is extensi-
Moving Trend
ble, but more fine-grained features are needed to be identified
Fig. 24. Interpretable feature extraction of handwriting gesture recognition. if additional types of handwritings are required for recognition.
VIII. C ONCLUSION
C. Additional Benefits of the Hierarchical Sensing Mechanism In this paper, we design and implement HandGest, the
In this paper, we propose the hierarchical sensing mechanis- real-time handwriting recognition system using commodity
m for handwriting recognition, but we envision that it should WiFi devices. Specifically, we first construct two hand-centric
be a general methodology to guarantee the robustness and gen- features, i.e., DPV and MRV, to ensure the consistency and
eralization performance for many WiFi sensing applications. discriminability of handwriting, in the feature spaces, without
In addition, we also can identify handwriting gestures with strict constraints of initial locations and orientations of the
similar trajectories using the hand-centric features embedded hand, and drawing sizes, etc. Furthermore, we propose a
in the hierarchical structure. For example, we can draw digit hierarchical sensing framework that only needs two pairs
“0” tall and slender, and letter “o” short and fat. Some of commodity WiFi devices and achieves good robustness
existing methods directly utilize quantitative features, such as and great generalization performance in practical scenarios
velocity and accumulated displacement, making them achieve through a layer-by-layer recognition procedure. Comprehen-
satisfactory accuracy only when the hand moves in specific sive experiments on a large number of handwritings performed
areas of the two pairs of WiFi transceivers. Because the by 20 volunteers in different real-world settings demonstrate
intersected Fresnel zones of the two pairs of WiFi transceivers that HandGest outperforms state-of-the-art solutions without
are orthogonal at specific areas between them (illustrated as knowing or constraining the initial locations and orientations
optimal sensing area in Fig. 25), and it is sparse and dense of the hand, and drawing sizes, etc. The main objective of
alternately in other areas. In the optimal sensing area, quan- this work is to address the fundamental problem in CSI-based
titative relationship of the features can be held. However, the gesture/activity recognition – the location and orientation
performance of the proposed hierarchical sensing mechanism dependent problem. The proposed HandGest is just an initial
is not limited by the so-called “optimal sensing area” due to attempt to solve this problem, where we use the ten digits
the embedded hand-centric features recording instantaneous recognition as an example. We envision that the proposed
status of the moving hand at each time step from a hand- hierarchical sensing mechanism is a general methodology
oriented view. for both gesture recognition and other CSI-based sensing
applications, which may fill in the gap between laboratory
prototype systems and real-world deployments.
R EFERENCES
[1] D. Zhang, H. Wang, and D. Wu, “Toward centimeter-scale human activity
sensing with Wi-Fi signals,” Computer, vol. 50, vol. 1, 2017, pp. 48-57.
[2] Y. Ma, G. Zhou, and S. Wang, “WiFi sensing with channel state
information: A survey,” ACM Comput. Surv. Tut., vol. 52, no. 3, 2019,
pp. 1-36.
[3] J. Liu, H. Liu, Y. Chen, Y. Wang, and C. Wang, “Wireless sensing for
human activity: A survey,” IEEE Commun. Surv. Tut., vol. 22, no. 3, 2020,
pp. 1629-1645.
[4] Y. Ma, G. Zhou, S. Wang, H. Zhao, and W. Jung, “SignFi: Sign language
recognition using WiFi,” Proc. ACM Interact. Mob. Wearable Ubiquitous
Technol., vol. 2, no. 1, 2018, pp. 1-21.
[5] X. Wang, X. Wang, and S. Mao, “RF sensing with the Internet of Things:
A general deep learning framework,” IEEE Commun. Mag., vol. 56, no.
9, 2018, pp. 62-67.
[6] J. Zhang, Y. Li, and W. Xiao, “Integrated multiple kernel learning for
Fig. 25. Intersection of Fresnel zones of the WiFi transceivers. device-free localization in cluttered environmnets using spatiotemporal
information,” IEEE Internet Things J., vol. 8, no. 6, 2021, pp. 4749-4761.
[7] Y. Wang, J. Liu, Y. Chen, M. Gruteser, J. Yang, and H. Liu, “E-
eyes: Device-free location-oriented activity indentification using fine-
grained WiFi gestures,” in: Proceedings of the 20th Annual International
D. Limitations Conference on Mobile Computing and Networking (MobiCom), Maui,
Hawaii, USA, 2014, pp. 617-628.
HandGest focuses on handwriting recognition of one hand, [8] W. Wang, A.X. Liu, M. Shahzad, K. Li, and S. Lu, “Understanding
it is difficult to separate the WiFi signals reflected by multiple and modeling of WiFi signal based human activity recognition,” in:
Proceedings of the 21st Annual International Conference on Mobile
moving hands simultaneously using two pairs of commodi- Computing and Networking (MobiCom), New York, NY, USA, 2015, pp.
ty WiFi devices. It is possible to capture multiple moving 65-76.

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 15

[9] K. Ali, A.X. Liu, W. Wang, and M. Shahzad, “Keystroke recognition ratio of two antennas,” Proc. ACM Interact. Mob. Wearable Ubiquitous
using WiFi signals,” in: Proceedings of the 21st Annual International Technol., vol. 3, no. 3, 2019, pp. 1-26.
Conference on Mobile Computing and Networking (MobiCom), London, [31] T. Needham, “Visual complex analysis,” Oxford University Press, 1998.
UK, 2015, pp. 1-13. [32] R.W. Schafer, “What is a Savitzky-Golay filter?” IEEE Signal Proc.
[10] Y. Zou, W. Liu, K. Wu, and L.M. Ni, “Wi-Fi radar: Recognizing human Mag., vol. 28, no. 4, 2011, pp. 111-117.
behavior with commodity Wi-Fi,” IEEE Commun. Mag., vol. 55, no. 10, [33] D. Wu, D. Zhang, C. Xu, Y. Wang, and H. Wang, “WiDir: Walking
2017, pp. 105-111. direction estimation using wireless signals,” in: Proceedings of the
[11] J. Zhang, W. Xiao, and Y. Li, “Data and knowledge twin driven 2016 ACM International Joint Conference on Pervasive and Ubiquitous
integration for large-scale device-free localization,” IEEE Internet Things Computing (UbiComp), Heidelberg, Germany, 2016, pp. 351-362.
J., vol. 8, no. 1, 2021, pp. 320-331. [34] Z. Han, Z. Lu, X. Wen, J. Zhao, L. Guo, and Y. Liu, “In-air handwriting
[12] Z. Wang, B. Guo, Z. Yu, and X. Zhou, “Wi-Fi CSI-based behavior by passive gesture tracking using commodity WiFi,” IEEE Commun. Lett.,
recogniton: From signals and actions to activies,” IEEE Commun. Mag., vol. 24, no. 11, 2020, pp. 2652-2656.
vol. 56, no. 5, 2018, pp. 109-115. [35] S. Yousefi, H. Narui, S. Dayal, S. Ermon, and S. Valaee, “A survey
[13] F. Wang, F. Zhang, C. Wu, B. Wang, and K.J.R. Liu, “Respiration on behavior recognition using WiFi channel state information,” IEEE
tracking for people counting and recognition,” IEEE Internet Things J., Commun. Mag., vol. 55, no. 10, 2017, pp. 98-104.
vol. 7, no. 6, 2020, pp. 5233-5245. [36] Y. Yang, J. Cao, X. Liu, and X. Liu, “Door-monitor: Counting in-and-
[14] D. Halperin, W. Hu, A. Sheth, and D. Wetherall, “Tool release: Gath- out visitors with COTS WiFi devices,” IEEE Internet Things J., vol. 7,
ering 802.11 n traces with channle state information,” ACM SIGCOMM no. 3, 2020, pp. 1704-1717.
Computer Commun. Rev., vol. 41, no. 1, 2011, pp. 53. [37] W. Jiang, C. Miao, F. Ma, S. Yao, Y. Wang, Y. Yuan, H. Xue, C. Song,
[15] S. Tang and J. Yang, “WiFinger: Leveraging commodity WiFi for fine- X. Ma, D. Koutsonikolas, W. Xu, and L. Su, “Towards environmnet
grained finger gesture recognition,” in: Proceedings of the 17th ACM independent device free human activity recognition,” in: Proceedings
International Symposium on Mobile Ad Hoc Networking and Computing of the 24th Annual International Conference on Mobile Computing and
(MobiHoc), Paderborn, Germany, 2016, pp. 201-210. Networking (MobiCom), New Delhi, India, 2018, pp. 1-16.
[16] O. Zhang, and K. Srinivasan, “Mudra: User-friendly fine-grained gesture [38] F. Wang, W. Gong, and J. Liu, “On spatial diversity in WiFi-based human
recognition using WiFi signals,” in: Proceedings of the 12th Internation- activity recognition: A deep learning-based approach,” IEEE Internet
al Conference on emerging Networking EXperiments and Technologies Things J., vol. 6, no. 2, 2019, pp. 2035-2047.
(CoNEXT), Irvine, CA, USA, 2016, pp. 83-96. [39] J. Zhang, Y. Li, W. Xiao, and Z. Zhang, “Robust extreme learning
[17] N. Yu, W. Wang, A.X. Liu, and L. Kong, “QGesture: Quantifying gesture machine for modeling with unknown noise,” J. Franklin Inst., vol. 357,
distance and direction with WiFi signals,” Proc. ACM Interact. Mob. no. 14, 2020, pp. 9885-9908.
Wearable Ubiquitous Technol., vol. 1, no. 4, 2018, pp. 1-22. [40] D. Wu, D. Zhang, C. Xu, H. Wang, and X. Li, “Device-free WiFi human
[18] Y. Zeng, D. Wu, J. Xiong, and D. Zhang, “Boosting WiFi sensing sensing: From pattern-based to model-based approaches,” IEEE Commun.
performance via CSI ratio,” IEEE Pervas. Comput., vol. 20, no. 1, 2021, Mag., vol. 55, no. 10, 2017, pp. 91-97.
pp. 62-70. [41] J. Zhang, Y. Li, W. Xiao, and Z. Zhang, “Non-iterative and fast deep
learning: Multilayer extreme learning machines,” J. Franklin Inst., vol.
[19] A. Virmani, and M. Shahzad, “Position and orientation agnostic gesture
357, no. 13, 2020, pp. 8925-8955.
recognition using WiFi,” in: Proceedings of the 15th ACM International
[42] T. Zheng, Z. Chen, S. Ding, and J. Luo, “Enhancing RF sensing with
Conference on Mobile System, Applications and Services (MobiSys),
deep learning: A layered approach,” IEEE Commun. Mag., vol. 59, no.
Niagara, NY, USA, 2017, pp. 252-264.
2, 2021, pp. 70-76.
[20] Y. Zheng, Y. Zhang, K. Qian, G. Zhang, Y. Liu, C. Wu, and Z. Yang,
“Zero-effort cross-domain gesture recognition with Wi-Fi,” in: Proceed-
ings of the 17th ACM International Conference on Mobile System,
Applications and Services (MobiSys), Seoul, South Korea, 2019, pp. 313-
325.
[21] R.H. Venkatnarayan, G. Page, and M. Shahzad, “Multi-user gesture
recognition using WiFi,” in: Proceedings of the 16th ACM International Jie Zhang (Member, IEEE) received the Ph.D.
Conference on Mobile System, Applications and Services (MobiSys), degree from University of Science and Technology
Munich, Germany, 2018, pp. 401-413. Beijing, China, in 2019.
[22] J. Yang, H. Zou, Y. Zhou, and L. Xie, “Learning gestures from WiFi: He was a Visiting Research Fellow with Uni-
A siamese recurrent convolutional architecture,” IEEE Internet Things J., versity of Leeds, U.K., in 2018. From 2019 to
vol. 6, no. 6, 2019, pp. 10763-10772. 2021, he was a Postdoctoral Research Fellow with
[23] L. Sun, S. Sen, D. Koutsonikolas, and K.H. Kim, “WiDraw: Enabling Peking University, China. He is currently an Asso-
hands-free drawing in the air on commodity WiFi devices,” in: Proceed- ciate Professor with the School of Automation &
ings of the 21st Annual International Conference on Mobile Computing Electrical Engineering, University of Science and
and Networking (MobiCom), Paris, France, 2015, pp. 77-89. Technology Beijing, China, and a Marie Sklodowska
[24] H. Abdelnasser, M. Youssef, and K.A. Harras, “WiGest: A ubiquitous Curie Research Fellow with the School of Electronic
WiFi-based gesture recognition system,” in: Proceedings of the 2015 and Electrical Engineering, University of Leeds, U.K. His current research
IEEE Conference on Computer Communications (INFOCOM), Hong interests include pervasive computing, body sensor network, wireless sensor
Kong, China, 2015, pp. 1472-1480. networks, and machine learning.
[25] Y. Lu, S. Lv, and X. Wang, “Towards location independent gesture Dr. Zhang was a recipient of the Marie Sklodowska-Curie Actions Individ-
recognition with commodity WiFi devices,” Electronics, vol. 8, no. 9, ual Fellowships (MSCA-IF), European Commission.
2019, pp. 1069-1088.
[26] S.D. Regani, B. Wang, and K.J.R. Liu, “WiFi-based device-free gesture
recognition through-the-wall,” in: Proceedings of the 2011 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP),
Toronto, ON, Canada, 2021, pp. 8017-8021.
[27] X. Ma, Z. Zhao, L. Zhang, Q. Gao, M. Pan, and J. Wang, “Practical Yang Li received the B.E. degree from Northwest-
device-free gesture recognition using WiFi signals based on metalearn- ern Polytechnical University, China, in 2020. He
ing,” IEEE Trans. Ind. Inform., vol. 16, no. 1, 2020, pp. 228-237. is currently a Ph.D. student with the School of
[28] C. Wu, Z. Yang, Z. Zhou, X. Liu, Y. Liu, and J. Cao, “Non-invasive Computer Science, Peking University, China. His
detection of moving and stationary human with WiFi,” IEEE J. Sel. Areas research interest includes wireless sensing.
Commun., vol. 33, no. 11, 2015, pp. 2329-2342.
[29] D. Wu, R. Gao, Y. Zeng, J. Liu, L. Wang, T. Gu, and D. Zhang,
“FingerDraw: Sub-wavelength level finger motion tracking with WiFi
signals,” Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol.
4, no. 1, 2020, pp. 1-27.
[30] Y. Zeng, D. Wu, J. Xiong, E. Yi, R. Gao, and D. Zhang, “FarSense:
Pushing the range limit of WiFi-based respiration sensing with CSI

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2022.3170157, IEEE Internet of
Things Journal
IEEE INTERNET OF THINGS JOURNAL 16

Haoyi Xiong (Senior Member, IEEE) received Ph.D Daqing Zhang (Fellow, IEEE) received the Ph.D.
in computer science from Télécom SudParis and degree from the University of Rome “La Sapienza”,
Université Pierre et Marie Curie, France in 2015. Italy, in 1996.
From 2016 to 2018, he was a Tenure-Track Assistant He is currently a Chair Professor with the School
Professor with the Department of Computer Science, of Computer Science, Peking University, China,
Missouri University of Science and Technology, and Telecom SudParis, France. His current research
Rolla, MO, and a Postdoc at University of Virginia, interests include context-aware computing, urban
Charlottesville, VA, USA, from 2015 to 2016. He computing, mobile computing, big data analytics,
is currently a Principal Architect at Big Data Lab, and pervasive elderly care. He has published over
Baidu Inc., Beijing, China, and also a Graduate 300 technical papers in leading conferences and
Faculty Scholar affiliated to University of Central journals.
Florida, Orlando FL, USA. Prof. Zhang was a recipient of the Ten-Years CoMoRea Impact Paper Award
His current research interests include AutoDL and ubiquitous computing. at IEEE PerCom 2013, the Honorable Mention Award at ACM UbiComp 2015
He has published more than 70 papers in top computer science conferences and 2016, the Ten-Years Most Influential Paper Award at IEEE UIC 2019,
and journals, including UbiComp, ICML, ICLR, RTSS, KDD, AAAI/IJCAI, and the Distinguished Paper Award at ACM UbiComp 2021. He served as
TMC, TC, TKDE, TNNLS, IoTJ, and TKDD. He was a co-recipient of the the General or Program Chair for over 17 international conferences, giving
2020 IEEE TCSC Award for Excellence in Scalable Computing (Early Career keynote talks at more than 20 international conferences. He is an Associate
Researcher) and the prestigious Science & Technology Advancement Award Editor for IEEE Pervasive Computing, ACM Transactions on Intelligent
(First Prize) from Chinese Institute of Electronics in 2019. Systems and Technology, and Proceedings of the ACM on Interactive, Mobile,
Wearable and Ubiquitous Technologies.

Dejing Dou (Senior Member, IEEE) received the


B.E. degree from Tsinghua University, China, in
1996 and the Ph.D. degree from Yale University,
USA, in 2004.
He is the Head of Big Data Lab (BDL) and
Business Intelligence Lab (BIL) at Baidu Research.
He is also a Full Professor (on leave) from the
Computer and Information Science Department at
the University of Oregon and has led the Advanced
Integration and Mining (AIM) Lab since 2005. He
has been the Director of the NSF IUCRC Center
for Big Learning (CBL) since 2018. He was a visiting associate Professor at
Stanford Center for Biomedical Informatics Research during 2012-2013. His
research areas include artificial intelligence, data mining, data integration,
NLP, and health informatics.
Dejing Dou has published more than 160 research papers, some of which
appear in prestigious conferences and journals like AAAI, IJCAI, ICML,
NeurIPS, ICLR, KDD, ICDM, ACL, EMNLP, CIKM, ISWC, JIIS and JoDS,
with more than 5000 Google Scholar citations. His DEXA’15 paper received
the best paper award. His KDD’07 paper was nominated for the best research
paper award. He is on the Editorial Boards of Journal on Data Semantics,
Journal of Intelligent Information Systems, and PLoS One. He has been
serving as program committee members for major international conferences
and as program co-chairs for five of them. He has received over 5 million
PI research grants from the NSF and the NIH. He is a senior member of the
AAAI, ACM and IEEE.

Chunyan Miao (Senior Member, IEEE) received


the B.S. degree from Shandong University, China, in
1988, and the M.S. and Ph.D. degrees from Nanyang
Technological University, Singapore, in 1998 and
2003, respectively.
She is currently a Professor with the School of
Computer Science and Engineering, Nanyang Tech-
nological University (NTU), and the Director of the
Joint NTU-UBC Research Centre of Excellence in
Active Living for the Elderly (LILY). Her research
focus on infusing intelligent agents into interactive
new media (virtual, mixed, mobile, and pervasive media) to create novel
experiences and dimensions in game design, interactive narrative, and other
real-world agent systems.

2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: BEIJING INSTITUTE OF TECHNOLOGY. Downloaded on May 04,2022 at 12:52:51 UTC from IEEE Xplore. Restrictions apply.

You might also like