
The Liar’s Walk: Detecting Deception with Gait and Gesture

Tanmay Randhavane, University of North Carolina, Chapel Hill, USA (tanmay@cs.unc.edu)
Uttaran Bhattacharya, University of Maryland, College Park, USA (uttaranb@cs.umd.edu)
Kyra Kapsaskis, University of North Carolina, Chapel Hill, USA (kyrakaps@gmail.com)
Kurt Gray, University of North Carolina, Chapel Hill, USA (kurtjgray@gmail.com)
Aniket Bera, University of Maryland, College Park, USA (ab@cs.umd.edu)
Dinesh Manocha, University of Maryland, College Park, USA (dm@cs.umd.edu)

arXiv:1912.06874v1 [cs.CV] 14 Dec 2019

http://gamma.cs.unc.edu/GAIT/

Abstract

We present a data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures. We conducted an elaborate user study, where we recorded many participants performing tasks involving deceptive walking. We extract the participants’ walking gaits as series of 3D poses. We annotate various gestures performed by participants during their tasks. Based on the gait and gesture data, we train an LSTM-based deep neural network to obtain deep features. Finally, we use a combination of psychology-based gait, gesture, and deep features to detect deceptive walking with an accuracy of 93.4%. This is an improvement of 16.1% over handcrafted gait and gesture features and an improvement of 5.9% and 10.1% over classifiers based on the state-of-the-art emotion and action classification algorithms, respectively. Additionally, we present a novel dataset, DeceptiveWalk, that contains gaits and gestures with their associated deception labels. To the best of our knowledge, ours is the first algorithm to detect deceptive behavior using non-verbal cues of gait and gesture.

Figure 1. Detecting Deception: We present a new data-driven algorithm for detecting deceptive walking using nonverbal cues of gaits and gestures. We take an individual’s walking video as an input (top), compute psychology-based gait features, gesture features, and deep features learned from a neural network, and combine them to detect deceptive walking. In the examples shown above, smaller hand movements (a) and the velocity of hands and feet joints (d) provide deception cues.

1. Introduction

In recent years, AI and vision communities have focused a lot on learning human behaviors, and human-centric video analysis has rapidly developed [65, 67, 70, 42]. While conventional video content analysis pays attention to the analysis of the video content, human-centric video analysis focuses on the humans in the videos and attempts to obtain information about their behaviors, describe their dispositions, and predict their intentions. This includes recognition of emotions [36, 38, 40], personalities [67, 75], and actions [68, 73], as well as anomaly detection [49, 46, 59], etc. While these problems are being widely studied, a related problem, detecting deception, has not been the focus of much research.

Masip et al. [41] define deception as “the deliberate attempt, whether successful or not, to conceal, fabricate, and/or manipulate in any other way, factual and/or emotional information, by verbal and/or nonverbal means, in order to create or maintain in another or others a belief that the communicator himself or herself considers false”. Motives for deceptive behaviors can vary from inconsequential to those constituting a serious security threat. Many applications related to computer vision, human-computer interfaces, security, and computational social sciences need to be able to automatically detect such behaviors in public areas (airports, train stations, shopping malls), simulated environments, and social media [60]. In this paper, we address the problem of automatically detecting deceptive behavior from gaits and gestures extracted from walking videos.

Deception detection is a challenging task because deception is a subtle human behavioral trait, and deceivers attempt to conceal their actual cues and expressions. However, there is considerable research on verbal (explicit) and nonverbal (implicit) cues of deception [22]. Implicit cues such as facial and body expressions [22], eye contact [79], and hand movements [30] can provide indicators of deception. Facial expressions have been widely studied as cues for automatic recognition of deception [20, 44, 43, 53, 52, 29, 69]. However, deceivers try to alter or control what they think is getting the most attention from others [21]. Compared to facial expressions, body movements such as gaits are more implicit and are less likely to be controlled. This makes gait an excellent avenue for observing deceptive behaviors. The psychological literature on deception also shows that multiple factors need to be considered when deciphering nonverbal information [12].

Main Results: We present a data-driven approach for detecting deceptive walking of individuals based on their gaits and gestures as extracted from their walking videos (Figure 1). Our approach is based on the assumption that humans are less likely to alter or control their gaits and gestures than facial expressions [21], which arguably makes such cues better indicators of deception.

Given a video of an individual walking, we extract his/her gait as a series of 3D poses using state-of-the-art human pose estimation [16]. We also annotate various gestures performed during the video. Using this data, we compute psychology-based gait features, gesture features, and deep features learned using an LSTM-based neural network. These gait, gesture, and deep features are collectively referred to as the deceptive features. Then, we feed the deceptive features into fully connected layers of the neural network to classify normal and deceptive behaviors. We train this neural network classifier (Deception Classifier) to learn the deep features as well as classify the data into behavior labels on a novel dataset (DeceptiveWalk). Our Deception Classifier can achieve an accuracy of 93.4% when classifying deceptive walking, which is an improvement of 5.9% and 10.1% over classifiers based on the state-of-the-art emotion [10] and action [57] classification algorithms, respectively.

Additionally, we present our deception dataset, DeceptiveWalk, which contains 1144 annotated gaits and gestures collected from 88 individuals performing deceptive and natural walking. The videos in this dataset provide interesting observations about participants’ behavior in deceptive and natural conditions. Deceivers put their hands in their pockets and look around more than the participants in the natural condition. Our observations also corroborate previous findings that deceivers display the opposite of the expected behavior or display a controlled movement [22, 19, 48].

Some of the novel components of our work include:

1. A novel deception feature formulation of gaits and gestures obtained from walking videos based on psychological characterization and deep features learned from an LSTM-based neural network.

2. A novel deception detection algorithm that detects deceptive walking with an accuracy of 93.4%.

3. A new public domain dataset, DeceptiveWalk, containing annotated gaits and gestures with deceptive and natural walks.

The rest of the paper is organized as follows. In Section 2, we give a brief background on deception and discuss previous methods of detecting deceptive walking automatically. In Section 3, we give an overview of our approach and present the details of our user study. We also present our novel DeceptiveWalk dataset in Section 3. We present the novel deceptive features in Section 4 and provide details of our method for detecting deception from walking videos. We highlight the results generated using our method in Section 5.

2. Related Work

In this section, we give a brief background of research on deception and discuss previous methods for detecting deceptive behaviors automatically.

2.1. Deception

Research on deception shows that people behave differently when they are being deceitful, and different clues can be used to detect deception [22]. Darwin first suggested that certain movements are “expressive” and escape the controls of the will [17]. Additionally, when people know that they are being watched, they alter their behavior, a phenomenon known as “The Hawthorne Effect” [39]. To avoid detection, deceivers try to alter or control what they think others are paying the most attention to [21]. Different areas of the body have different communicating abilities that provide information based on differences in movement, visibility, and speed of transmission [22]. Therefore, channels to which people pay less attention and that are harder to control are good indicators of deception.

Body expressions present an alternative channel for perception and communication, as shown in emotion research [6]. Body movements are more implicit and may be less likely to be controlled compared to facial expressions; prior work suggests that clues such as less eye contact [79], downward gazing (through the affective experiences of guilt/sadness) [9, 21], and general hand movements [30] are good indicators of deception. Though there is a large amount of research on non-verbal cues of deception, there is no single conclusive universal cue [63]. To address this issue, we present a novel data-driven approach that computes features based on gaits and gestures, along with deep learning.

2.2. Automatic Detection of Deception


Many approaches have been developed to detect decep-
tion automatically using verbal or text-based cues. Text-
based approaches have been developed to detect deception Figure 2. Overview: At runtime, we compute gait and gesture
in online communication [78], news [13], court cases [23], features from an input walking video using 3D poses. Using our
novel DeceptiveWalk dataset, we train a classifier consisting of
etc. A large number of approaches that detect deception us-
an LSTM module that is used to learn temporal patterns in the
ing non-verbal cues have focused on facial expressions and walking styles, followed by a fully connected module that is used
head and hand gestures [53, 52, 20, 29]. Some approaches to classify the features into class labels (natural or deceptive).
use only facial expressions to detect deception [7, 27], while
other approaches use only hand and head gestures [44, 43]. deceptive walks.
In situations where the subject is sitting and talking, Van
der Zee et al. [62] used full-body features captured using 3. Approach
motion capture suits to detect deception. However, such
an approach is impractical for most applications. Other In this section, we give an overview of our approach. We
cues that have been used for detecting deception include present details of our user study that was used to derive a
eye movements [80], fMRI [15], thermal input [11, 1], and data-driven metric.
weight distribution [5]. Recent research has also introduced
3.1. Overview
multimodal approaches that use multiple modalities such as
videos, speech, text, and physiological signals to detect de- We provide an overview of our approach in Figure 2. Our
ception [25, 55, 69, 35, 31, 3, 2]. However, most of these data-driven algorithm consists of an offline training phase
approaches have focused on humans that have been sitting during which we conducted a user study and obtained a
or standing; deception detection from walking is relatively video dataset of participants performing either deceptive or
unexplored. This is an important problem for many appli- natural walks. For each video, we use a state-of-the-art 3D
cations such as security in public areas (e.g., airports, con- human pose extraction algorithm to extract gaits as a se-
certs, etc.) and surveillance. Therefore, in this paper, we ries of 3D poses. We also annotate various hand and head
present an approach that detects deception from walking us- gestures performed by participants during the video. We
ing gait and gesture features. compute psychology-based gait features, gesture features,
and deep features learned using an LSTM-based neural net-
2.3. Non-verbal Cues of Gaits and Gestures work. Using these features with their deception labels, we
Research has shown that body expressions, including train a Deception Classifier that detects deceptive walks. At
gaits and gestures, are critical for the expression and percep- runtime, our algorithm takes a video as input and extracts
tion of others’ emotions, moods, and intentions [34]. Gaits gaits and gestures. Using the trained Deception Classifier,
have been shown to be useful in conveying and recogniz- we can then detect whether the individual in the video is
ing identity [64], gender [74], emotions [34], moods [45], performing a deceptive walk or not. We now describe each
and personalities [4, 56]. Gestures have also been widely component of our algorithm in detail.
observed to convey emotions [37, 50], intentions [47], 3.2. Notation
moods [18], personality [8], and deception [44, 43, 20, 29].
For gait recognition, many deep learning approaches have We represent the deception dataset by D. We obtain
been proposed [77, 76, 66], and LSTM-based approaches this DeceptiveWalk dataset from a data collection study.
have been used to model gait features [77]. Body expres- We denote a data point in this dataset by Mi , where i ∈
sions have also been used for anomaly detection in videos {1, 2, ..., N } and N is the number of gaits in the dataset.
with multiple humans [46]. Many deep learning approaches A data point Mi contains the gait Wi , the gestures Gi ex-
have been proposed to recognize and generate actions from tracted from the ith video, and its associated deception label
2D [32, 72] and 3D skeleton data [57, 51, 26]. Graph con- di ∈ {0, 1}. A value of di = 0 represents a natural walking
volution networks such as STEP [10] and ST-GCN [71] video and di = 1 represents a deceptive walking video.
for emotion and action recognition from skeleton data re- We use a joint representation of humans for our gait
spectively have also been proposed. Inspired by these ap- formulation. Similar to the previous literature on model-
proaches that showcase the communicative capability of ing human poses [16], we use a set of 16 joints to repre-
gaits and gestures, we use these non-verbal cues to detect sent the poses. A set of 3D positions of each joint ji , i ∈
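To make the notation concrete, here is a minimal sketch of how a DeceptiveWalk data point Mi = (Wi, Gi, di) could be stored in code. The container name, field names, and joint-axis ordering are our own assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
import numpy as np

NUM_JOINTS = 16            # joints per pose (Section 3.2)
POSE_DIM = NUM_JOINTS * 3  # each pose P_i^k is a 48-dimensional vector

@dataclass
class WalkSample:
    """One data point M_i = (W_i, G_i, d_i) of the DeceptiveWalk dataset."""
    gait: np.ndarray      # W_i: shape (tau, 48), one flattened 3D pose per frame
    gestures: np.ndarray  # G_i: 7-dimensional gesture annotation vector (Section 4.2)
    label: int            # d_i: 0 = natural walk, 1 = deceptive walk

def make_sample(poses_3d: np.ndarray, gestures, label: int) -> WalkSample:
    """poses_3d has shape (tau, 16, 3); each pose is flattened to R^48."""
    tau = poses_3d.shape[0]
    gait = poses_3d.reshape(tau, POSE_DIM)
    return WalkSample(gait=gait, gestures=np.asarray(gestures, dtype=float), label=int(label))

if __name__ == "__main__":
    # Random data standing in for an extracted pose sequence of 120 frames.
    rng = np.random.default_rng(0)
    sample = make_sample(rng.normal(size=(120, 16, 3)), gestures=np.zeros(7), label=1)
    print(sample.gait.shape, sample.label)   # (120, 48) 1
```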
3.3. Data Collection Study

We conducted a user study to collect gait and gesture data of individuals performing either deceptive or natural walks.

3.3.1 Participants

We recruited 88 participants (51 female, 30 male, 7 preferred not to say, age = 20.35) from a university campus. Appropriate institutional review board (IRB) approval was obtained for the study and each participant provided informed written consent to participate in the study and record videos.

3.3.2 Procedure

We adopted a between-subject method for data collection. We compared two conditions: natural and deceptive. Each participant was randomly assigned to walk either naturally or in a deceptive manner, as described below. In both conditions, the experiment area remains the same. We show the experiment setup in Figure 3 and Figure 4. The experiment consists of the following steps:

1. The experiment area consists of two floors. The participant is briefed about the procedure by the experimenter on Floor 1. The participant is informed that he/she is supposed to walk to a chair, pick up an object from the chair, exchange it with a person wearing a hoodie, replace the package on the chair, and come back.

2. The participant then takes the stairs from Floor 1 to Floor 2 and emerges at the Start location.

3. The participant then walks towards the Chair. We refer to this walk as Walk 1, and it is captured by the camera placed behind the chair (Cam 1).

4. After reaching the Chair, the participant picks up an object. In the natural condition, the object is a collection of classic computer science papers and is placed on top of the chair. However, in the deceptive condition, the object is a big roll of money and is placed under the chair.

5. The participant then walks towards the Exchange location. We refer to this walk as Walk 2, and it is captured by the camera placed behind the Exchange/Start location (Cam 2).

6. At the Exchange location, the participant exchanges the object with a package from the confederate sitting there. The package is a red package that remains the same in both conditions. In the briefing, the participant is informed that it is a piece of computer hardware.

7. After the exchange, the participant walks back towards the Chair. We refer to this walk as Walk 3, and it is captured by the camera placed behind the chair (Cam 1).

8. The participant places the exchanged package on the chair.

9. The participant walks towards the stairs located near the Start location. We refer to this walk as Walk 4, and it is captured by the camera placed behind the chair (Cam 2).

10. The participant walks towards the experimenter on Floor 1 and is debriefed about the experiment.

In this procedure, the deceptive walk is induced by the experimenter during the briefing, the type of object, and the appearance of the confederate. The experiment was conducted in a public setting (lobby of an academic building in a university). Throughout the experiment, there were 4-10 students sitting around the lobby area who were not part of the experiment. In the deceptive condition, the participant is supposed to conceal their activities from these “onlookers.” The experimenter stresses that the package is a secret, and because of the amount of money involved, it is really important that the participant keeps anyone from noticing as he/she does these things. In the natural condition, the confederate is wearing a hoodie with the hood pulled down. However, in the deceptive condition, the confederate is wearing the same hoodie with the hood pulled up while wearing sunglasses. Because of this setting, the participant makes a deliberate attempt to conceal his/her activities, resulting in deceptive behavior [41].

Figure 3. Experiment Procedure: We describe the experiment procedure. The experiment area consists of two floors. We obtain four videos of walking captured by two cameras, Cam 1 and Cam 2, for each participant.

Figure 4. Experiment Area: The experiment area consists of two floors. The experimenter briefs the participant on Floor 1, and the participant performs the task on Floor 2. We show screenshots of a participant performing the task from Cam 1 and Cam 2.

3.3.3 Data Labeling

As described in the procedure, we obtain four walking videos for each participant. For each video, we extract the 3D pose of the participant in each frame using the human pose extraction algorithm described below. Specifically, we obtain the 3D position of each joint relative to the root joint in each frame. Depending on the condition, natural or deceptive, we assign a label to each walking video.

3.4. Human Pose Extraction

To extract a person’s poses from their walking video, we first need to distill out extraneous information such as attire, items carried (e.g., bags or cases), background clutter, etc. We adapt the approach of [16], where the authors have trained a weakly supervised network to perform this task. The first part of the network, called the Structure-Aware PoseNet (SAP-Net), is trained on spatial information, which learns to extract 3D joint locations from each frame of an input video. The second part of the network, called the Temporal PoseNet (TP-Net), is trained on temporal information, which takes in the extracted joints and outputs a temporally harmonized sequence of poses.

Moreover, walking videos are collected from various viewpoints, and the scale of the person varies depending on the relative camera position. Therefore, we perform a least-squares similarity transform [61] on each pose in the dataset. This step ensures that each individual pose lies within a box of unit volume, and the first pose of each video is centered at the global origin.
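The normalization step above can be approximated with a simple scale-and-translate, sketched below. The paper uses a least-squares similarity transform [61], so this is only an illustrative stand-in; the assumption that the root joint sits at index 0 is ours.

```python
import numpy as np

def normalize_pose_sequence(poses: np.ndarray) -> np.ndarray:
    """Simplified normalization of a (tau, 16, 3) pose sequence.

    Scales the sequence uniformly so every pose fits inside a unit cube and
    translates it so that the root joint of the first pose sits at the origin.
    This approximates the effect described in Section 3.4; the paper instead
    applies a least-squares similarity transform [61] per pose.
    """
    poses = poses.astype(float).copy()

    # Largest per-pose bounding-box edge over the whole sequence -> one uniform scale.
    extents = poses.max(axis=1) - poses.min(axis=1)   # (tau, 3)
    scale = extents.max()
    if scale > 0:
        poses /= scale

    # Center the first pose at the global origin (root joint assumed at index 0).
    poses -= poses[0, 0]
    return poses
```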
3.5. Gestures

In addition to extracting gaits, we also annotate the gestures performed by the participants during the four walks. Prior literature on deception suggests that people showing deceptive behavior often feel distressed, and levels of discomfort can be used to detect a person’s truthfulness. These levels of discomfort may appear in fidgeting (adjusting their shirt/moving their hands) or while glancing at objects such as a clock or a watch [48]. Touching the face around the forehead, neck, or back of the head is also an indicator of discomfort related to deception [24]. We use these findings and consider the following set of gestures: {Hands In Pockets, Looking Around, Touching Face, Touching Shirt/Jackets, Touching Hair, Hands Folded, Looking at Phone}. We chose this set because it includes all the hand gestures observed in the walking videos of participants, and these gestures have been reported to be related to deception [24, 48]. For each walking video, we annotate whether each gesture from this set is present or absent. For the hands in pockets gesture, we also annotate how many hands are in the pocket.

3.6. DeceptiveWalk Dataset

We invited 88 participants for the data collection study. For each participant, we obtained four walking videos. Some participants followed the instructions incorrectly, and we could not obtain all four walking videos for these participants. Overall, we obtained 314 walking videos. For each video, we extracted gaits using the human pose extraction algorithm. Due to occlusions, the human pose extraction algorithm had significant errors in 28 of these videos. To expand the size and diversity of the dataset, we performed data augmentation by reflecting all the 3D pose sequences about the vertical axis and performing phase shifts in the temporal domain. Reflection about the vertical axis and the phase shift do not alter the overall gaits; hence the corresponding labels can remain the same. As a result, we were able to obtain a total of 1144 gaits of participants from the data collection study.

Depending on the condition assigned to a participant, we assigned each video a label of natural or deceptive. We also annotated the various gestures from the gesture set performed by the participants in each video. We refer to this dataset of 1144 gaits, gestures, and their associated deception labels as the DeceptiveWalk dataset. The dataset contains 508 natural and 636 deceptive videos. The dataset contains 280 videos of Walk 1, 300 videos of Walk 2, 192 videos of Walk 3, and 231 videos of Walk 4.
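A sketch of the two augmentation operations described above is given below, assuming a (τ, 16, 3) pose array. Which coordinate axis is lateral and the size of the phase shift are assumptions, and a full mirror would also swap left/right joint indices, which is omitted here for brevity.

```python
import numpy as np

def augment_gait(poses: np.ndarray, phase_shift: int, lateral_axis: int = 0):
    """Return augmented copies of a (tau, 16, 3) pose sequence.

    Mirrors the walk about the vertical plane by negating one horizontal
    coordinate (the lateral axis depends on the pose convention) and applies
    a cyclic phase shift in time. Neither operation changes whether the walk
    is natural or deceptive, so the label is reused unchanged.
    """
    reflected = poses.copy()
    reflected[..., lateral_axis] *= -1.0                   # reflection about the vertical axis
    # NOTE: a faithful mirror would also swap left/right joint indices.

    shifted = np.roll(poses, shift=phase_shift, axis=0)    # temporal phase shift
    reflected_shifted = np.roll(reflected, shift=phase_shift, axis=0)
    return [reflected, shifted, reflected_shifted]
```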
4. Automatic Deception Detection

From the user study, we obtain a dataset of 3D pose data for each video. Using this pose data, we extract gait and gesture features. We also compute deep features using a deep LSTM-based neural network. We use these novel deception features as input to a classification algorithm. In this section, we first describe these deception features and their computation. We then describe the classification algorithm.

4.1. Gait Features

In previous work [54], researchers have used a combination of posture and movement features to represent a gait. This is based on the finding that both posture and movement features are used to predict an individual’s affective state [34]. These features relate to the joint angles, distances between the joints, joint velocities, and space occupied by various parts of the body and have been used to classify gaits according to emotions and affective states [14, 34, 54]. Since deceptive behavior impacts the emotional and cognitive state of an individual [58, 22], features that are indicators of affective state may also be related to deception. Therefore, we use a similar set of gait features for the classification of deceptive walks, as described below. Combining the posture and the movement features, we obtain 29 dimensional gait features.

4.1.1 Posture Features

In each frame, we compute the features relating to the distances between joints, angles between joints, and the space occupied by various parts of the body. These features correspond to the body posture in that frame. We use the 3D joint positions computed using the human pose extraction algorithm (Section 3.4) to compute these posture features as described below:

• Volume: We use the volume of the bounding box around a human as the feature that represents the compactness of the human’s posture.

• Area: In addition to volume, we also use the areas of triangles between the hands and the neck and between the feet and the root joint to model body compactness.

• Distance: We use the distances between the feet, the hands, and the root joint as features. These features model body expansion and also capture the magnitude of the hand and foot movement.

• Angle: We use the angles extended by different joints at the neck to capture the head tilt and rotation. These features also capture whether the posture is slouched or erect using the angle extended by the shoulders at the neck.

• Stride Length: Stride length has been used to represent gait features in the literature; therefore, we also include stride length as a posture feature. We compute the stride length by computing the maximum distance between the feet across the gait.

We summarize these posture features in Table 1. There are 13 posture features in total.

Table 1. Posture Features: In each frame, we compute the following features that represent the human posture.

Volume: Bounding box
Angle At: Neck by shoulders; Right shoulder by neck and left shoulder; Left shoulder by neck and right shoulder; Neck by vertical and back; Neck by head and back
Distance Between: Right hand and the root joint; Left hand and the root joint; Right foot and the root joint; Left foot and the root joint; Consecutive foot strikes (stride length)
Area of Triangle: Between hands and neck; Between feet and the root joint
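The following sketch illustrates how a subset of these per-frame posture features (bounding-box volume, shoulder/neck angles, joint distances, and triangle areas) might be computed from a 16-joint pose. The joint index assignments are our assumptions about the skeleton layout, and the remaining Table 1 entries (the neck-by-vertical angle and stride length) would follow the same pattern.

```python
import numpy as np

# Assumed joint indices for the 16-joint skeleton (illustrative only).
ROOT, NECK, HEAD = 0, 1, 2
R_SHOULDER, L_SHOULDER = 3, 4
R_HAND, L_HAND = 5, 6
R_FOOT, L_FOOT = 7, 8

def angle(a, b, c):
    """Angle at vertex b formed by points a-b-c, in radians."""
    u, v = a - b, c - b
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

def triangle_area(a, b, c):
    """Area of the triangle spanned by three 3D points."""
    return 0.5 * float(np.linalg.norm(np.cross(b - a, c - a)))

def posture_features(pose: np.ndarray) -> np.ndarray:
    """pose: (16, 3) array of joint positions for one frame."""
    extents = pose.max(axis=0) - pose.min(axis=0)
    feats = [
        float(np.prod(extents)),                                  # bounding-box volume
        angle(pose[R_SHOULDER], pose[NECK], pose[L_SHOULDER]),    # neck by shoulders
        angle(pose[NECK], pose[R_SHOULDER], pose[L_SHOULDER]),    # right shoulder by neck and left shoulder
        angle(pose[NECK], pose[L_SHOULDER], pose[R_SHOULDER]),    # left shoulder by neck and right shoulder
        angle(pose[HEAD], pose[NECK], pose[ROOT]),                # neck by head and back
        np.linalg.norm(pose[R_HAND] - pose[ROOT]),                # hand/foot distances to the root joint
        np.linalg.norm(pose[L_HAND] - pose[ROOT]),
        np.linalg.norm(pose[R_FOOT] - pose[ROOT]),
        np.linalg.norm(pose[L_FOOT] - pose[ROOT]),
        triangle_area(pose[R_HAND], pose[L_HAND], pose[NECK]),    # hands-neck triangle
        triangle_area(pose[R_FOOT], pose[L_FOOT], pose[ROOT]),    # feet-root triangle
    ]
    return np.array(feats, dtype=float)
```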
4.1.2 Movement Features

In addition to the human posture in each frame, movement of the joints across time is also an important feature of gaits [34]. We model this by computing the magnitude of the speed, acceleration, and movement jerk of the hands, head, and foot joints. For each joint, we compute these features using the first, second, and third finite derivatives of its 3D position computed using the human pose extraction algorithm. We also include the gait cycle time as a feature. We compute this as the time between two consecutive foot strikes of the same foot. There are 16 movement features in total. We aggregate both posture and movement features over the gait to form the 29 dimensional gait feature vector.
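A sketch of the finite-difference movement features follows. The frame rate, the tracked joint indices, and the aggregation by averaging magnitudes over the sequence are our assumptions; the gait-cycle-time feature is omitted because it needs a foot-strike detector.

```python
import numpy as np

def movement_features(poses: np.ndarray, fps: float = 30.0,
                      joints=(2, 5, 6, 7, 8)) -> np.ndarray:
    """Speed, acceleration, and jerk magnitudes for selected joints.

    poses: (tau, 16, 3) pose sequence with tau >= 4. The default joint tuple
    stands for head, right/left hand, and right/left foot under the same
    assumed indexing as the posture-feature sketch. For each joint we take
    the first, second, and third finite time derivatives of its trajectory
    and average their magnitudes over the sequence (15 values for 5 joints).
    """
    dt = 1.0 / fps
    feats = []
    for j in joints:
        traj = poses[:, j, :]                         # (tau, 3) trajectory of one joint
        vel = np.diff(traj, n=1, axis=0) / dt
        acc = np.diff(traj, n=2, axis=0) / dt ** 2
        jerk = np.diff(traj, n=3, axis=0) / dt ** 3
        for d in (vel, acc, jerk):
            feats.append(float(np.linalg.norm(d, axis=1).mean()))
    return np.array(feats)
```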
4.2. Gesture Features

We use the set of gestures described in Section 3.5 and formulate the gesture features as a 7 dimensional vector corresponding to the set {Hands In Pockets, Looking Around, Touching Face, Touching Shirt/Jackets, Touching Hair, Hands Folded, Looking at Phone} ∈ R7. For each gesture j in the set, we set the value of Gji = 1 if the gesture is present in the walking video and Gji = 0 if it is absent. For hands in pockets, we use Gji = 1 if one hand is in the pocket, Gji = 2 if two hands are in the pocket, and Gji = 0 if no hands are in the pockets.
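The 7-dimensional gesture vector can be assembled directly from the per-video annotations, as in the sketch below; the dictionary keys are illustrative names for the gesture set above, not an annotation format defined by the authors.

```python
import numpy as np

# Fixed ordering of the annotated gesture set (Sections 3.5 and 4.2).
GESTURES = ["hands_in_pockets", "looking_around", "touching_face",
            "touching_shirt_jacket", "touching_hair", "hands_folded",
            "looking_at_phone"]

def gesture_features(annotations: dict) -> np.ndarray:
    """Build the 7-dimensional gesture vector G_i from per-video annotations.

    `annotations` maps gesture names to 0/1 presence flags, except
    "hands_in_pockets", which may be 0, 1, or 2 (number of hands in pockets).
    Missing keys default to 0 (gesture absent).
    """
    return np.array([float(annotations.get(g, 0)) for g in GESTURES])

# Example: a walk where the subject kept both hands in their pockets and
# looked around, with no other annotated gestures.
example = gesture_features({"hands_in_pockets": 2, "looking_around": 1})
print(example)   # [2. 1. 0. 0. 0. 0. 0.]
```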
Table 2. Classification: We compared our method with state-of-the-art methods for gait-based action recognition as well as perceived emotion recognition. Both these classes of methods use similar gait datasets as inputs but learn to predict different labels.

ST-GCN [71]: 80.4%   DGNN [57]: 83.3%   STEP [10]: 87.5%   Ours: 93.4%

Table 3. Ablation Study: We evaluated the usefulness of the different features and different classification algorithms for classifying a walk as natural or deceptive. We observed that all three components of the deceptive features (gait features, gesture features, and deep features) contribute towards the accurate prediction of deceptive behavior.

Features:        Gestures   Gait    Gestures and Gait   Deep    All Combined
Random Forest:   60.6%      70.0%   74.0%               79.3%   84.6%
LSTM+FC:         62.5%      71.2%   77.3%               82.7%   93.4%

4.3. Classification Algorithm

Given a sequence of 3D pose data for a fixed number of time frames as input, the task of the classification algorithm is to assign the input one of two class labels, natural or deceptive. To achieve this, we develop a deep neural network consisting of long short-term memory (LSTM) [28] layers. LSTM units consist of feedback connections and gate functions that help them retain useful patterns from input data sequences. Since the inputs in our case are walking motions, which are periodic in time, we reason that LSTM units can learn to efficiently encode the different walking patterns in our data, which, in turn, helps the network segregate the data into the respective classes. We call the feature vectors learned from the LSTM layers deep features, fd ∈ R32.

4.3.1 Network Architecture

Our overall neural network is shown in Figure 5. We first normalize the input sequences of 3D poses so that each individual value lies within the range of 0 and 1. We feed the normalized sequences of 3D poses into an LSTM module, which consists of 3 LSTM units of sizes 128, 64, and 32, respectively, each of depth 2. We concatenate the 32 dimensional feature vector output from the LSTM module with the 29 dimensional gait and the 7 dimensional gesture features and feed the 68 dimensional combined deceptive feature vector into a convolution and pooling module. This module consists of 2 convolution layers. The first convolution layer has a depth of 48 and a kernel size of 3. It is followed by a maxpool layer with a window size of 3. The second convolution layer has depth 16 and a kernel size of 3. The output of the second convolution layer is flattened and passed into the fully connected module, which consists of 2 fully connected (FC) layers of sizes 32 and 8, respectively. All the FC layers are equipped with the ELU activation function. The output feature vectors from the fully connected module are passed through a 2 dimensional fully connected layer with the softmax activation function to generate the output class probabilities. We assign the predicted class label as the one with the higher probability.

Figure 5. Our Classification Network: Each of the 3D pose sequences corresponding to various walking styles is fed as input to the LSTM module consisting of 3 LSTM units, each of depth 2. The output of the LSTM module (the deep feature pertaining to the 3D pose sequence) is concatenated with the corresponding gait feature and the gesture feature, denoted by the ⊕ symbol. The concatenated features are then fed into the fully connected module, which consists of two fully connected layers. The output of the fully connected module is passed through another fully connected layer equipped with the softmax activation function to generate the predicted class labels.
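The sketch below mirrors this description in PyTorch. The layer sizes follow the text, but details such as padding, taking the last LSTM time step as the 32-dimensional deep feature, and treating the 68-dimensional combined vector as a one-channel signal for the 1D convolutions are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class DeceptionClassifier(nn.Module):
    """A sketch of the network described in Section 4.3.1."""

    def __init__(self, pose_dim=48, gait_dim=29, gesture_dim=7, num_classes=2):
        super().__init__()
        # LSTM module: 3 LSTM units of sizes 128, 64, and 32, each of depth 2.
        self.lstm1 = nn.LSTM(pose_dim, 128, num_layers=2, batch_first=True)
        self.lstm2 = nn.LSTM(128, 64, num_layers=2, batch_first=True)
        self.lstm3 = nn.LSTM(64, 32, num_layers=2, batch_first=True)

        # Convolution and pooling module: depths 48 and 16, kernel size 3, maxpool window 3.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 48, kernel_size=3),
            nn.MaxPool1d(3),
            nn.Conv1d(48, 16, kernel_size=3),
        )
        feat_len = gait_dim + gesture_dim + 32           # 68-dimensional combined feature
        conv_out = 16 * ((feat_len - 2) // 3 - 2)        # flattened length after conv/pool stack
        # Fully connected module (sizes 32 and 8, ELU) plus the final 2-way layer.
        self.fc = nn.Sequential(
            nn.Linear(conv_out, 32), nn.ELU(),
            nn.Linear(32, 8), nn.ELU(),
            nn.Linear(8, num_classes),                   # softmax applied in the loss / at inference
        )

    def forward(self, poses, gait_feats, gesture_feats):
        # poses: (batch, tau, 48); gait_feats: (batch, 29); gesture_feats: (batch, 7)
        x, _ = self.lstm1(poses)
        x, _ = self.lstm2(x)
        x, _ = self.lstm3(x)
        deep_feature = x[:, -1, :]                       # 32-dimensional deep feature f_d
        combined = torch.cat([deep_feature, gait_feats, gesture_feats], dim=1)
        z = self.conv(combined.unsqueeze(1))             # treat the 68-dim vector as a 1-channel signal
        return self.fc(z.flatten(1))                     # class logits (natural vs. deceptive)
```

At inference, taking the argmax over the two output logits matches assigning the class with the higher softmax probability, as described above.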
5. Results

We first describe the implementation details of our classification network, followed by a detailed summary of the experimental results.

5.1. Implementation Details

We randomly selected 80% of the dataset for training the network, 10% for cross-validation, and kept the remaining 10% for testing. Inputs were fed to the network with a batch size of 8. We used the standard cross-entropy loss to train the network. The loss was optimized by running the Adam optimizer [33] for n = 500 epochs with a momentum of 0.9 and a weight decay of 10^-4. The initial learning rate was 0.001, which was reduced to half its present value after 250 (n/2), 375 (3n/4), and 437 (7n/8) epochs.
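A sketch of this training setup is shown below. We map the stated momentum of 0.9 to Adam's first-moment coefficient, and the model and data loader (e.g., the DeceptionClassifier sketch above with batches of size 8) are assumed to exist; this is an illustration of the stated hyperparameters, not the authors' training script.

```python
import torch
import torch.nn as nn

def train(model, train_loader, num_epochs=500, device="cpu"):
    """Train with cross-entropy loss, Adam, weight decay 1e-4, and a halving schedule."""
    criterion = nn.CrossEntropyLoss()
    # "Momentum of 0.9" is interpreted here as Adam's beta1 = 0.9 (an assumption).
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                                 betas=(0.9, 0.999), weight_decay=1e-4)
    # Halve the learning rate after n/2, 3n/4, and 7n/8 epochs (250, 375, 437 for n = 500).
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer,
        milestones=[num_epochs // 2, 3 * num_epochs // 4, 7 * num_epochs // 8],
        gamma=0.5)

    model.to(device)
    for epoch in range(num_epochs):
        model.train()
        # Each batch is assumed to hold (poses, gait features, gesture features, labels).
        for poses, gait_feats, gesture_feats, labels in train_loader:
            optimizer.zero_grad()
            logits = model(poses.to(device), gait_feats.to(device), gesture_feats.to(device))
            loss = criterion(logits, labels.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()
```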
5.2. Experiments

We evaluate the performance of our LSTM-based network as well as the usefulness of the deep features through exhaustive experiments. All the experimental results are summarized in Table 3.

Since ours is the first algorithm that detects deception from walking gaits, we compare the performance of our algorithm with a random forest classification method. First, we feed all the features and feature combinations to an off-the-shelf random forest classifier with 100 decision trees, each of max depth 2. Each tree performs binary classification on sub-samples of the dataset, and the final estimate is calculated as a weighted average of the predictions of the individual trees. Random forest classifiers do not naturally capture the temporal patterns of the walking styles in the 3D pose sequences; hence they lead to a lower accuracy compared to using an LSTM-based network. Moreover, since each tree in the Random Forest is a weak classifier and the final estimate is obtained by averaging, the performance is observed to be poorer than the LSTM-based network even when only the gait and gesture features are used for classification. Overall, we find a consistent 3% improvement in accuracy across the board when using the LSTM-based network over the Random Forest classifier.

Second, we compare the performance of using the LSTM-based network with the various input features individually. Gesture features by themselves provide the lowest classification accuracies for both classifiers because they only coarsely summarize the subject’s activities and not their walking patterns. Gait features, on the other hand, capture the walking patterns and are seen to be more helpful in distinguishing between the class labels. Gestures and gait features collectively perform better than their individual performances. However, these features still lose some of the useful temporal patterns in the original 3D pose sequences.
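The random-forest baseline described above could be reproduced along the following lines with scikit-learn; the feature matrices and the train/test split here are synthetic placeholders, not the DeceptiveWalk data or the authors' experimental code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def random_forest_baseline(X_train, y_train, X_test, y_test) -> float:
    """100 trees of max depth 2 over handcrafted (gait + gesture) feature vectors."""
    clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))

# Example with synthetic 36-dimensional (29 gait + 7 gesture) features.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 36)), rng.integers(0, 2, size=200)
print(random_forest_baseline(X[:160], y[:160], X[160:], y[160:]))
```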
Using LSTMs to learn the temporal patterns directly from the 3D pose sequences, we observe an improvement of 5% over using the combined gait and gesture features. Finally, combining the deep features learned from the LSTM module with the gait and gesture features leads to an overall classification accuracy of 93.4%.

We also compared our method with prior methods for emotion and action recognition from gaits since these methods also solve the problem of gait classification (albeit with a different set of labels). We compare with approaches that use spatial-temporal graph convolution networks for emotion (STEP [10]) and action recognition (ST-GCN [71]). We also compare our method with an approach that uses a novel directed graph neural network (DGNN [57]) for action recognition. We train these methods on our DeceptiveWalk dataset and obtain their performance on the testing set in the same way as for our method. Our method outperforms these state-of-the-art models used for emotion and action recognition by a minimum of 5.9% (Table 2).

Furthermore, we show the scatter of the features from both the natural and the deceptive classes in Figure 6. The original features are high dimensional (fgait ∈ R29, fgesture ∈ R7, and fd ∈ R32); hence we perform PCA to project and visualize them in a 3 dimensional space. The gait and gesture features from the two classes are not separable in this space. However, the combined gait+gesture features, as well as the deep features from the two classes, are well-separated even in this lower-dimensional space, implying that the deep features fd and the gait+gesture features fg can efficiently separate the two classes.

Figure 6. Separation of Features: We present the scatter plot of the features for each of the two class labels. We project the features to a 3 dimensional space using PCA for visualization. The gait and gesture features are not separable in this space. However, the combined gait+gesture features and deep features are well separated even in this low dimensional space. The boundary demarcated for the deceptive class features is only for visualization and does not represent the true class boundary.
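The visualization behind Figure 6 amounts to a 3-component PCA projection of the feature vectors. A minimal sketch is given below, with a synthetic feature matrix standing in for the real gait, gesture, and deep features.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_to_3d(features: np.ndarray) -> np.ndarray:
    """Project high-dimensional feature vectors to 3 dimensions with PCA."""
    return PCA(n_components=3).fit_transform(features)

rng = np.random.default_rng(0)
features = rng.normal(size=(1144, 68))      # placeholder: one row per gait in DeceptiveWalk
points_3d = project_to_3d(features)
print(points_3d.shape)                      # (1144, 3)
```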
5.3. Analysis of Gesture Cues

Figure 7. Gesture Features: We present the percentage of videos in which the gestures were observed for both deceptive and natural walks. We observe that deceivers put their hands in their pockets and look around more than participants in the natural condition. However, the trend is unclear in other gestures. This could be explained by previous literature that suggests that deceivers try to alter or control what they think others are paying the most attention to [22, 19, 48].

We tabulate the distribution of various gestures in Figure 7. We can make interesting observations from this data.

1. Deceivers are more likely to put their hands in their pockets than the participants in the natural condition.

2. Deceivers look around and check if anyone is noticing them more than the participants in the natural condition.

3. Deceivers touch their hair more than the participants in the natural condition.

4. Participants in the natural condition touched their shirt/jacket more than the deceivers.

5. Deceivers touched their faces and hair, folded their hands, and looked at their phones at about the same rate as those in the natural condition.

Previous literature suggests that deceivers are sometimes very aware of the cues they are putting out and may, in turn, display the opposite of the expectation or display a controlled movement [22, 19, 48]. This would explain observations 4 and 5. Factors like how often a person lies or how aware they are of others’ perceptions of them could contribute to whether they show fidgeting and nervous behavior or controlled and stable movements.

6. Conclusion, Limitations, and Future Work

We presented a novel method for distinguishing the deceptive walks of individuals from natural walks based on their walking style or pattern in videos. Based on our novel DeceptiveWalk dataset, we train an LSTM-based classifier that classifies the deceptive features and predicts whether an individual is performing a deceptive walk. We observe an accuracy of 93.4% obtained using 10-fold cross-validation on 1144 data points.

There are some limitations to our approach. Our approach is based on our dataset collected in a controlled setting. In the future, we would like to test our algorithm on real-world videos involving deceptive walks. Since the accuracy of our algorithm depends on the accurate extraction of 3D poses, the algorithm may perform poorly in cases where pose estimation is inaccurate (e.g., occlusions, bad lighting).
For this work, we manually annotate the gesture data. In the future, we would like to automatically annotate different gestures performed by the individual and automate the entire method. Additionally, our approach is limited to walking. In the future, we would like to extend our algorithm to more general cases that include a variety of activities. The analysis of our data collection study reveals that the mapping between gestures and deception is unclear and may depend on other factors. In the future, we would like to explore the factors that govern the relationship between gestures and deception. Finally, we would like to combine our method with other verbal and non-verbal cues of deception (e.g., gazing, facial expressions, etc.) and compare the usefulness of body expressions to other cues in detecting deception.

References

[1] Mohamed Abouelenien, Veronica Pérez-Rosas, Rada Mihalcea, and Mihai Burzo. Deception detection using a multimodal approach. In Proceedings of the 16th International Conference on Multimodal Interaction, pages 58–65. ACM, 2014.
[2] Mohamed Abouelenien, Verónica Pérez-Rosas, Rada Mihalcea, and Mihai Burzo. Detecting deceptive behavior via integration of discriminative features from multiple modalities. IEEE Transactions on Information Forensics and Security, 12(5):1042–1055, 2016.
[3] Mohamed Abouelenien, Verónica Pérez-Rosas, Bohan Zhao, Rada Mihalcea, and Mihai Burzo. Gender-based multimodal deception detection. In Proceedings of the Symposium on Applied Computing, pages 137–144. ACM, 2017.
[4] Anthony P Atkinson, Mary L Tunstall, and Winand H Dittrich. Evidence for distinct contributions of form and motion information to the recognition of emotions from body gestures. Cognition, 104(1):59–72, 2007.
[5] Dan Atlas and Gideon L Miller. Detection of signs of attempted deception and other emotional stresses by detecting changes in weight distribution of a standing or sitting person, Feb. 8 2005. US Patent 6,852,086.
[6] H. Aviezer, Y. Trope, and A. Todorov. Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science, 228(6111):1225–1229, 2012.
[7] Danilo Avola, Luigi Cinque, Gian Luca Foresti, and Daniele Pannone. Automatic deception detection in RGB videos using facial action units. In Proceedings of the 13th International Conference on Distributed Smart Cameras, page 5. ACM, 2019.
[8] Gene Ball and Jack Breese. Relating personality and behavior: Posture and gestures. In International Workshop on Affective Interactions, pages 196–203. Springer, 1999.
[9] Roy F. Baumeister, Arlene M. Stillwell, and Todd F. Heatherton. Guilt: An interpersonal approach. Psychological Bulletin, 115(2):243–267, 1994.
[10] Uttaran Bhattacharya, Trisha Mittal, Rohan Chandra, Tanmay Randhavane, Aniket Bera, and Dinesh Manocha. STEP: Spatial temporal graph convolutional networks for emotion perception from gaits. arXiv preprint arXiv:1910.12906, 2019.
[11] Pradeep Buddharaju, Jonathan Dowdall, Panagiotis Tsiamyrtzis, Dvijesh Shastri, I Pavlidis, and MG Frank. Automatic thermal monitoring system (ATHEMOS) for deception detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 2, pages 1179–vol. IEEE, 2005.
[12] David Buller and Judee Burgoon. Interpersonal deception theory. Communication Theory, 6(3):203–242, 1996.
[13] Niall J Conroy, Victoria L Rubin, and Yimin Chen. Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology, 52(1):1–4, 2015.
[14] A Crenn, A Khan, et al. Body expression recognition from animated 3D skeleton. In IC3D, 2016.
[15] Qian Cui, Eric J Vanman, Dongtao Wei, Wenjing Yang, Lei Jia, and Qinglin Zhang. Detection of deception based on fMRI activation patterns underlying the production of a deceptive response and receiving feedback about the success of the deception after a mock murder crime. Social Cognitive and Affective Neuroscience, 9(10):1472–1480, 2013.
[16] R Dabral, A Mundhada, et al. Learning 3D human pose from structure and motion. In ECCV, 2018.
[17] C. Darwin and P. Prodger. The Expression of Emotions in Man and Animals. Philosophical Library, 1972/1955.
[18] P Ravindra De Silva, Minetada Osano, Ashu Marasinghe, and Ajith P Madurapperuma. Towards recognizing emotion with affective dimensions through body gestures. In 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pages 269–274. IEEE, 2006.
[19] Bella DePaulo, James Lindsay, Brian Malone, Laura Muhlenbruck, Kelly Charlton, and Harris Cooper. Cues to deception. Psychological Bulletin, 129(1):74–118, 2003.
[20] Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, and Ji-Rong Wen. Face-focused cross-stream network for deception detection in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7802–7811, 2019.
[21] P. Ekman. Telling Lies: Clues to Deceit in the Marketplace, Marriage, and Politics. Norton, 1994.
[22] Paul Ekman and Wallace Friesen. Nonverbal leakage and clues to deception. Psychiatry, 32(1):88–106, 1969.
[23] Tommaso Fornaciari and Massimo Poesio. Automatic deception detection in Italian court cases. Artificial Intelligence and Law, 21(3):303–340, 2013.
[24] David Givens. The Nonverbal Dictionary of Gestures, Signs, and Body Language Cues. Center for Nonverbal Studies Press, 2002.
[25] Viresh Gupta, Mohit Agarwal, Manik Arora, Tanmoy Chakraborty, Richa Singh, and Mayank Vatsa. Bag-of-lies: A multimodal dataset for deception detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[26] Ikhsanul Habibie, Daniel Holden, Jonathan Schwarz, Joe Yearsley, and Taku Komura. A recurrent variational autoencoder for human motion synthesis. In BMVC, 2017.
[27] Md Kamrul Hasan, Wasifur Rahman, Luke Gerstner, Taylan Sen, Sangwu Lee, Kurtis Glenn Haut, and Mohammed Ehsan Hoque. Facial expression based imagination index and a transfer learning approach to detect deception. 2019.
[28] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[29] Guosheng Hu, Li Liu, Yang Yuan, Zehao Yu, Yang Hua, Zhihong Zhang, Fumin Shen, Ling Shao, Timothy Hospedales, Neil Robertson, et al. Deep multi-task learning to recognise subtle facial expressions of mental states. In Proceedings of the European Conference on Computer Vision (ECCV), pages 103–119, 2018.
[30] M. L. Jensen, T. O. Meservy, J. Kruse, J. K. Burgoon, and J. F. Nunamaker. Identification of deceptive behavioral cues extracted from video. IEEE Intelligent Transportation Systems, pages 1135–1140, 2005.
[31] Hamid Karimi, Jiliang Tang, and Yanen Li. Toward end-to-end deception detection in videos. In 2018 IEEE International Conference on Big Data (Big Data), pages 1278–1283. IEEE, 2018.
[32] Mehran Khodabandeh, Hamid Reza Vaezi Joze, Ilya Zharkov, and Vivek Pradeep. DIY human action dataset generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1448–1458, 2018.
[33] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[34] A Kleinsmith, N Bianchi-Berthouze, et al. Affective body expression perception and recognition: A survey. IEEE TAC, 2013.
[35] Gangeshwar Krishnamurthy, Navonil Majumder, Soujanya Poria, and Erik Cambria. A deep learning approach for multimodal deception detection. arXiv preprint arXiv:1803.00344, 2018.
[36] Chieh-Ming Kuo, Shang-Hong Lai, and Michel Sarkis. A compact deep learning model for robust facial expression recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 2121–2129, 2018.
[37] Weston LaBarre. The cultural basis of emotions and gestures. Journal of Personality, 16(1):49–68, 1947.
[38] Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, and Kwanghoon Sohn. Context-aware emotion recognition networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 10143–10152, 2019.
[39] S. D. Levitt and J. A. List. Was there really a Hawthorne effect at the Hawthorne plant? An analysis of the original illumination experiments. American Economic Journal: Applied Economics, 3(1):224–238, 2011.
[40] Elisabeta Marinoiu, Mihai Zanfir, Vlad Olaru, and Cristian Sminchisescu. 3D human sensing, action and emotion recognition in robot assisted therapy of children with autism. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2158–2167, 2018.
[41] J. Masip, E. Garrido, and C. Herrero. Defining deception. Anales de Psicología, 20(1):147–171, 2004.
[42] Daniel McDuff and Mohammad Soleymani. Large-scale affective content analysis: Combining media content features and facial reactions. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 339–345. IEEE, 2017.
[43] Thomas O Meservy, Matthew L Jensen, John Kruse, Judee K Burgoon, Jay F Nunamaker, Douglas P Twitchell, Gabriel Tsechpenakis, and Dimitris N Metaxas. Deception detection through automatic, unobtrusive analysis of nonverbal behavior. IEEE Intelligent Systems, 20(5):36–43, 2005.
[44] Nicholas Michael, Mark Dilsizian, Dimitris Metaxas, and Judee K Burgoon. Motion profiles for deception detection using visual cues. In European Conference on Computer Vision, pages 462–475. Springer, 2010.
[45] Johannes Michalak, Nikolaus F Troje, Julia Fischer, Patrick Vollmar, Thomas Heidenreich, and Dietmar Schulte. Embodiment of sadness and depression: gait patterns associated with dysphoric mood. Psychosomatic Medicine, 71(5):580–587, 2009.
[46] Romero Morais, Vuong Le, Truyen Tran, Budhaditya Saha, Moussa Mansour, and Svetha Venkatesh. Learning regularity in skeleton trajectories for anomaly detection in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11996–12004, 2019.
[47] Allyson Mount. Intentions, gestures, and salience in ordinary and deferred demonstrative reference. Mind & Language, 23(2):145–164, 2008.
[48] J. Navarro. Four-domain model for detecting deception: An alternative paradigm for interviewing. FBI Law Enforcement Bulletin, 72(6):19–24, 2003.
[49] Trong-Nguyen Nguyen and Jean Meunier. Anomaly detection in video sequence with appearance-motion correspondence. In Proceedings of the IEEE International Conference on Computer Vision, pages 1273–1283, 2019.
[50] Fatemeh Noroozi, Dorota Kaminska, Ciprian Corneanu, Tomasz Sapinski, Sergio Escalera, and Gholamreza Anbarjafari. Survey on emotional body gesture recognition. IEEE Transactions on Affective Computing, 2018.
[51] Dario Pavllo, David Grangier, and Michael Auli. QuaterNet: A quaternion-based recurrent model for human motion. arXiv preprint arXiv:1805.06485, 2018.
[52] Verónica Pérez-Rosas, Mohamed Abouelenien, Rada Mihalcea, and Mihai Burzo. Deception detection using real-life trial data. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 59–66. ACM, 2015.
[53] Verónica Pérez-Rosas, Mohamed Abouelenien, Rada Mihalcea, Yao Xiao, CJ Linton, and Mihai Burzo. Verbal and nonverbal clues for real-life deception detection. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2336–2346, 2015.
[54] Tanmay Randhavane, Aniket Bera, Kyra Kapsaskis, Uttaran Bhattacharya, Kurt Gray, and Dinesh Manocha. Identifying emotions from walking using affective and deep features. arXiv preprint arXiv:1906.11884, 2019.
[55] Rodrigo Rill-Garcia, Hugo Jair Escalante, Luis Villasenor-Pineda, and Veronica Reyes-Meza. High-level features for multimodal deception detection in videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[56] C Roether, L Omlor, et al. Critical features for the perception of emotion from gait. Vision, 2009.
[57] Lei Shi, Yifan Zhang, Jian Cheng, and Hanqing Lu. Skeleton-based action recognition with directed graph neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7912–7921, 2019.
[58] M. L. Slepian, E. J. Masicampo, N. R. Toosi, and N. Ambady. The physical burdens of secrecy. Journal of Experimental Psychology: General, 141(4):619–624, 2012.
[59] Waqas Sultani, Chen Chen, and Mubarak Shah. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6479–6488, 2018.
[60] Michail Tsikerdekis and Sherali Zeadally. Online deception in social media. Communications of the ACM, 57(9):72, 2014.
[61] Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. TPAMI, (4):376–380, 1991.
[62] Sophie Van der Zee, Ronald Poppe, Paul J Taylor, and Ross Anderson. To freeze or not to freeze: A culture-sensitive motion capture approach to detecting deceit. PLoS ONE, 14(4):e0215000, 2019.
[63] Aldert Vrij, Maria Hartwig, and Pär Anders Granhag. Reading lies: Nonverbal communication and deception. Annual Review of Psychology, 70:295–317, 2019.
[64] Changsheng Wan, LI Wang, and Vir V Phoha. A survey on gait recognition. ACM Computing Surveys (CSUR), 51(5):89, 2019.
[65] Shangfei Wang and Qiang Ji. Video affective content analysis: A survey of state-of-the-art methods. IEEE Transactions on Affective Computing, 6(4):410–430, 2015.
[66] Yanxiang Wang, Bowen Du, Yiran Shen, Kai Wu, Guangrong Zhao, Jianguo Sun, and Hongkai Wen. EV-Gait: Event-based robust gait recognition using dynamic vision sensors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6358–6367, 2019.
[67] Xiu-Shen Wei, Chen-Lin Zhang, Hao Zhang, and Jianxin Wu. Deep bimodal regression of apparent personality traits from short video sequences. IEEE Transactions on Affective Computing, 9(3):303–315, 2017.
[68] Di Wu, Nabin Sharma, and Michael Blumenstein. Recent advances in video-based human action recognition using deep learning: A review. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 2865–2872. IEEE, 2017.
[69] Zhe Wu, Bharat Singh, Larry S Davis, and VS Subrahmanian. Deception detection in videos. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[70] Baohan Xu, Yanwei Fu, Yu-Gang Jiang, Boyang Li, and Leonid Sigal. Heterogeneous knowledge transfer in video emotion recognition, attribution and summarization. IEEE Transactions on Affective Computing, 9(2):255–270, 2016.
[71] Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[72] Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, and Dahua Lin. Pose guided human video generation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 201–216, 2018.
[73] Guangle Yao, Tao Lei, and Jiandan Zhong. A review of convolutional-neural-network-based action recognition. Pattern Recognition Letters, 118:14–22, 2019.
[74] Shiqi Yu, Tieniu Tan, Kaiqi Huang, Kui Jia, and Xinyu Wu. A study on gait-based gender classification. IEEE Transactions on Image Processing, 18(8):1905–1910, 2009.
[75] Chen-Lin Zhang, Hao Zhang, Xiu-Shen Wei, and Jianxin Wu. Deep bimodal regression for apparent personality analysis. In European Conference on Computer Vision, pages 311–324. Springer, 2016.
[76] Kaihao Zhang, Wenhan Luo, Lin Ma, Wei Liu, and Hongdong Li. Learning joint gait representation via quintuplet loss minimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4700–4709, 2019.
[77] Ziyuan Zhang, Luan Tran, Xi Yin, Yousef Atoum, Xiaoming Liu, Jian Wan, and Nanxin Wang. Gait recognition via disentangled representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4710–4719, 2019.
[78] Lina Zhou and Dongsong Zhang. Following linguistic footprints: Automatic deception detection in online communication. Communications of the ACM, 51(9):119–122, 2008.
[79] Miron Zuckerman, Bella DePaulo, and Robert Rosenthal. Verbal and nonverbal communication of deception. Advances in Experimental Social Psychology, 14:1–59, 1981.
[80] Jiaxu Zuo, Tom Gedeon, and Zhenyue Qin. Your eyes say you're lying: An eye movement pattern analysis for face familiarity and deceptive cognition. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2019.
