Exploring Trace Transform For Robust Human Action Recognition - G. Goudelis 2013
Pattern Recognition
journal homepage: www.elsevier.com/locate/pr
Article info
Abstract
Article history:
Received 27 March 2012
Received in revised form
26 February 2013
Accepted 1 June 2013
Machine-based human action recognition has become very popular in the last decade. Automatic unattended surveillance systems, interactive video games, machine learning and robotics are only a few of the areas that involve human action recognition. This paper examines the capability of a known transform, the so-called Trace transform, for human action recognition and proposes two new feature extraction methods based on this transform. The first method extracts Trace transforms from binarized silhouettes representing different stages of a single action period. A final history template composed from the above transforms represents the whole sequence, retaining much of the valuable spatiotemporal information contained in a human action. The second method employs the Trace transform to construct a set of invariant features that represent the action sequence and can cope with variations that commonly appear in video capturing. This method takes advantage of the natural properties of the Trace transform to produce noise-robust features that are invariant to translation, rotation and scaling, and are effective, simple and fast to create. Classification experiments performed on two well-known and challenging action datasets (KTH and Weizmann) using a Radial Basis Function (RBF) kernel SVM provided very competitive results, indicating the potential of the proposed techniques.
© 2013 Elsevier Ltd. All rights reserved.
Keywords:
Human action recognition
Motion analysis
Action classification
Trace transform
1. Introduction
Observation and analysis of human behavior has been an open research topic for the last decade. Recognizing human actions in everyday life is a very challenging task with applications in various areas such as automated crowd surveillance, shopping behavior analysis, automated sport analysis, human-computer interaction and others. One could state the problem as the ability of a system to automatically classify the action performed by a human, given a video containing the action.
Although the problem is easy to grasp, solving it is a daunting task that requires different approaches to several sub-problems. The challenge arises from various factors that influence the recognition rate. Individuality is a major issue, as the same action can be performed differently by every person. Complicated backgrounds, occlusions, illumination variations, camera stabilization and view angle are only a few of the problems that increase the complexity and create a large number of prerequisites.
If we had to divide human action recognition into different subcategories, one could do so by considering the underlying features used to represent the various activities. As the authors note in [1], there are two main classes based on the
0031-3203/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.patcog.2013.06.006
Please cite this article as: G. Goudelis, et al., Exploring trace transform for robust human action recognition, Pattern Recognition (2013), http://dx.doi.org/10.1016/j.patcog.2013.06.006
discriminant features of the sets extracted for each frame. Classification experiments using the above datasets presented even better results of 93.14% and 95.42%, respectively, indicating the potential of the method.
To the best of the authors' knowledge, this is the first time that the Trace transform, in any of its forms or derivatives, is used for the extraction of features for human action recognition.
The rest of the paper is organized as follows. The Trace transform and the theory behind it are presented in Section 2. An overview of the proposed methods is given in Section 3. In Section 4, the History Trace Template and History Triple Feature extraction techniques are described. The experimental procedure is provided in Section 5, followed by conclusions in Section 6.
2. Trace transform
The Trace transform is a generalization of the Radon transform [36], while the Radon transform constitutes a sub-case of it. The Radon transform of an image is a 2D representation of the image in coordinates φ and p, with the value of the integral of the image computed along the corresponding line placed at cell (φ, p); the Trace transform instead calculates a functional T over parameter t along the line, which is not necessarily the integral. The Trace transform is created by tracing an image with straight lines, along which certain functionals of the image function are calculated. Different transforms with different properties can be produced from the same image. The transform produced is in fact a 2-dimensional function of the parameters of each tracing line. The definition of these parameters for an image tracing line is given in Fig. 1. Examples of Radon and Trace transforms for different action snapshots are given in Fig. 2. In the following, we give a description of the feature extraction procedure for the Trace transform based on the theory provided in [37].
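As a concrete illustration of the tracing procedure described above, the sketch below samples lines (φ, p) over a binary silhouette and applies an arbitrary functional T along each one. This is a minimal sketch under our own assumptions, not the authors' implementation; the function name, sampling resolutions and interpolation scheme are all illustrative.

```python
import numpy as np

def trace_transform(img, functional, n_angles=60, n_offsets=32):
    """Trace an image with straight lines parameterized by (phi, p) and
    apply `functional` to the pixel values sampled along each line.
    With functional=np.sum this reduces (up to sampling) to a discrete
    Radon transform, the sub-case mentioned in the text."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0          # trace about the image center
    diag = np.hypot(h, w) / 2.0
    out = np.zeros((n_angles, n_offsets))
    ts = np.linspace(-diag, diag, 128)             # parameter t along the line
    for i, phi in enumerate(np.linspace(0.0, np.pi, n_angles, endpoint=False)):
        n = np.array([np.cos(phi), np.sin(phi)])   # line normal direction
        d = np.array([-np.sin(phi), np.cos(phi)])  # direction along the line
        for j, p in enumerate(np.linspace(-diag, diag, n_offsets)):
            xs = cx + p * n[0] + ts * d[0]
            ys = cy + p * n[1] + ts * d[1]
            xi = np.round(xs).astype(int)
            yi = np.round(ys).astype(int)
            ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
            samples = img[yi[ok], xi[ok]] if ok.any() else np.zeros(1)
            out[i, j] = functional(samples)
    return out

# Tiny example: a centred square "silhouette".
sil = np.zeros((32, 32))
sil[10:22, 10:22] = 1.0
radon_like = trace_transform(sil, np.sum)   # T = integral -> Radon sub-case
max_like = trace_transform(sil, np.max)     # a different Trace functional
```

Choosing a different `functional` yields a different transform of the same image, which is exactly the flexibility the text attributes to the Trace transform.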
To better understand this transform, let us consider a linearly distorted object (rotation, translation and scaling). We could say that the object is simply perceived in another, linearly distorted coordinate system. Let us call the initial coordinate system of the image C1 and the new, distorted one C2, and suppose that the distorted system is obtained by rotating C1 by angle θ, scaling the axes by parameter ν and translating by vector (s0 cos ψ0, s0 sin ψ0). Suppose that there is a 2D object F which is viewed from C1 as F1(x, y) and from C2 as F2(x̃, ỹ). F2(x̃, ỹ) can be considered as an image constructed from F1(x, y) by rotation by θ, scaling by ν−1 and shifting by (s0 cos ψ0, s0 sin ψ0).
A linearly transformed image is actually transferred along lines of another coordinate system, as straight lines in the new coordinate system also appear as straight lines. The parameters of a line in C2, parameterized by (φ, p, t), therefore map to correspondingly rotated, scaled and shifted parameters in the old system C1 [37].
Fig. 1. Definition of the parameters of an image tracing line.
Fig. 2. Examples of Radon and Trace transforms created from the silhouettes of different action snapshots taken from the Weizmann database.
At this point, the three functionals must be chosen. In the following, functionals that are invariant to displacement and functionals that are sensitive to displacement are used; for simplicity, these will be called invariant and sensitive, respectively. A functional Ξ of a function ξ(x) is invariant if

Ξ(ξ(x + b)) = Ξ(ξ(x)) for all b ∈ R,

while for a sensitive functional the displacement b appears in the output in a predictable way. The invariant functionals I1, I2 and the sensitive functionals S1, S2 adopted here are defined through integrals and medians of ξ(x) over its support.
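The invariance property can be checked numerically on a 1-D scan. In the sketch below, a sum functional is unaffected by a displacement of the signal, while a centroid-style functional reports the shift; both are illustrative stand-ins of our own choosing, not the specific I and S functionals of the paper.

```python
import numpy as np

xs = np.zeros(100)
xs[40:50] = np.linspace(1.0, 2.0, 10)   # a 1-D "scan" xi(t) along one tracing line
shifted = np.roll(xs, 7)                # the same scan displaced by b = 7 samples

invariant = lambda f: f.sum()                                  # unchanged by a shift
sensitive = lambda f: (np.arange(f.size) * f).sum() / f.sum()  # centroid moves with b

print(np.isclose(invariant(xs), invariant(shifted)))   # True
print(round(sensitive(shifted) - sensitive(xs), 6))    # 7.0
```

The sensitive functional exposes the displacement b exactly, which is what allows such functionals to be combined so that distortion terms cancel.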
or equivalently,

Π(F, C2) = ν Π(F, C1),  (17)

so that, by choosing the remaining free parameter such that the distortion-dependent term vanishes, the resulting features become independent of rotation, translation and scaling.

3.1. History Trace Templates (HTTs)
Fig. 3. Examples of History Trace Templates produced for the jack, side, skip, run and pjump actions taken from the Weizmann database. The second row shows extracted silhouettes for the above instances, while the third and fourth rows show two different types of HTT produced for each of the action videos.
Fig. 4. Weighted History Trace Templates for all the different action classes of the KTH database, using two different functionals (rows (a) and (b)). The important points clearly differ from class to class.
The result is finally a new template that contains the highlighted scanning lines whose HTT values differ from each other by at most a certain level ε. The results of the above calculations on the different classes of the KTH database are illustrated in Fig. 4. WHTTs have been calculated taking as training set, each time, the set of HTT samples that constitute the corresponding action class. To demonstrate that different functionals may capture different characteristics of an action, two different functionals have been used. The difference of the flagged points in the final templates among action classes is clearly visible.
D1(p, φ) = |T1(p, φ) − T2(p, φ)|,
D2(p, φ) = |T1(p, φ) − T3(p, φ)|,
D3(p, φ) = |T2(p, φ) − T3(p, φ)|,  (22)

and a cell (p, φ) is flagged with the value 1 if D1(p, φ) ≤ ε and D2(p, φ) ≤ ε and D3(p, φ) ≤ ε, and with 0 otherwise.  (23)
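Equations (22) and (23) translate directly into a thresholding of the cellwise differences between three training HTTs. The sketch below assumes the templates are given as equally sized arrays; the function name `whtt_mask` is our own.

```python
import numpy as np

def whtt_mask(T1, T2, T3, eps):
    """Flag Trace-transform cells (p, phi) whose values agree across the
    three training templates to within eps, following Eqs. (22)-(23)."""
    D1 = np.abs(T1 - T2)
    D2 = np.abs(T1 - T3)
    D3 = np.abs(T2 - T3)
    return ((D1 <= eps) & (D2 <= eps) & (D3 <= eps)).astype(np.uint8)

# Toy templates that agree everywhere to within 0.02 < eps.
rng = np.random.default_rng(0)
base = rng.random((8, 8))
mask = whtt_mask(base, base + 0.01, base - 0.01, eps=0.05)
```

Cells flagged 1 are the stable scanning lines that survive into the weighted template for an action class.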
where R(p, φ) is the line integral of the image along a line from −∞ to ∞, and p and φ are the parameters that define the position of the line. So, Rf(p, φ) is the result of the integration of f over the line p = x cos φ + y sin φ. The reference point is defined as the center of the silhouette.
As human actions are in fact spatio-temporal volumes, the aim is to represent as much of the dynamic and structural information of the action as possible. At this point, the Trace transform shows great suitability for the task. It transforms a 2-dimensional image traced with lines into a domain of possible line parameters, where each line in the image gives a peak positioned at the corresponding line parameters. When the Trace transform is calculated with respect to the center of the silhouette, specific coefficients capture much of the energy of the silhouette. These coefficients vary over time and provide large differences from one action to another for the same time-frame.
The final History Trace Template is then given by

T_N(p, φ) = Σ_{n=1}^{N} g_n(p, φ),  (25)

where g_n(p, φ) is the transform computed for the n-th of the N silhouette frames.
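Eq. (25) is a plain accumulation of per-frame transforms over one action period. In the toy sketch below, a single Radon-style projection (column sums) stands in for the full Trace transform of each frame, purely to keep the example small; all names are our own.

```python
import numpy as np

def history_trace_template(frames, transform):
    """Sum the per-frame transforms g_n(p, phi) over one action period,
    as in Eq. (25): T_N = sum over n of g_n."""
    return sum(transform(f) for f in frames)

# Toy sequence: a bright spot drifting across 5 binary frames.
frames = []
for n in range(5):
    f = np.zeros((16, 16))
    f[8, 4 + 2 * n] = 1.0
    frames.append(f)

# Stand-in per-frame "transform": column sums (one Radon projection).
htt = history_trace_template(frames, lambda f: f.sum(axis=0))
```

The accumulated template keeps a trace of every frame's contribution, which is how the history template retains spatiotemporal information.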
Two examples of Trace functionals T used here are

T(f(x)) = ∫_0^∞ r f(r) dr,  (4)

T(f(x)) = median_{r ≥ 0} { f(r), |f(r)|^{1/2} },  (5)

where r = |x − c| and c = median_x { x, f(x) }.
Fig. 5. Extracted silhouettes and Trace transforms for one walking period. The History Trace Template is shown at the bottom.
features is computed.
In simple terms, a triple feature is constructed as follows:
(a) The Trace transform is produced by applying a Trace functional T along lines tracing the image.
(b) The circus function of the image is produced by applying a diametric functional P along the columns of the Trace transform.
(c) The triple feature is finally produced by applying a circus functional Φ along the string of numbers produced in step (b).
The procedure is illustrated in Fig. 6.
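Steps (b) and (c) above are two successive 1-D reductions of a precomputed Trace transform, and can be sketched as follows; `np.max` and `np.mean` stand in for particular diametric and circus functionals, and are our own illustrative choices.

```python
import numpy as np

def triple_feature(trace_img, P, Phi):
    """Collapse a Trace transform (rows = angles phi, columns = offsets p)
    into one number: a diametric functional P applied along p gives the
    circus function of phi; a circus functional Phi applied to that
    string of numbers gives the triple feature."""
    circus = np.apply_along_axis(P, 1, trace_img)  # step (b): one value per angle
    return Phi(circus)                             # step (c): a single number

rng = np.random.default_rng(1)
T = rng.random((60, 32))                 # stand-in for a computed Trace transform
f1 = triple_feature(T, np.max, np.mean)  # one (P, Phi) combination
f2 = triple_feature(T, np.sum, np.std)   # another combination -> another feature
```

Each distinct (T, P, Φ) combination yields a distinct scalar feature, which is what makes the construction so prolific.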
Dividing all normalized features Π(F, C) by each other, a set of independent features is produced. So, the whole action sequence is finally represented by a vector v comprising the set of all triple-feature
Fig. 6. Triple-feature extraction for one period of a wave action taken from the Weizmann dataset.
v = (rat_1, rat_2, …, rat_g),  (26)

where rat is the ratio of two normalized triple features and g is the number of calculated ratios.
This method allows the easy construction of many features. Supposing that one uses 10 functionals at each stage of the construction (e.g. 10 T functionals, 10 P functionals and 10 Φ functionals) in a 10-frame action video, one may construct 10 × 10 × 10 × 10 = 10 000 features for one sequence. As mentioned above, these features may not have any physical meaning in terms of human perception, but they may have the required mathematical properties for classification purposes.
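The combinatorial count and the ratio construction can both be verified directly. The numbers below mirror the 10-functional example from the text, while `feats` is a toy vector of our own.

```python
import numpy as np
from itertools import product

# Feature count: one triple feature per (frame, T, P, Phi) combination.
n_frames, n_T, n_P, n_Phi = 10, 10, 10, 10
n_features = len(list(product(range(n_frames), range(n_T),
                              range(n_P), range(n_Phi))))
print(n_features)   # 10 * 10 * 10 * 10 = 10000

# Ratio features: dividing normalized triple features by each other
# removes any common scale factor, in the spirit of Eq. (26).
feats = np.array([2.0, 4.0, 8.0])   # toy normalized triple features
rats = feats[1:] / feats[:-1]       # each "rat" is a ratio of two features
print(rats)
```

Scaling `feats` by any constant leaves `rats` unchanged, which is precisely why the ratios are independent of the distortion-induced scale.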
Since the discriminatory power of the constructed features will definitely vary, a dimensionality reduction technique can select the most discriminant features while making the classification problem more tractable. In our scheme, the HTF vectors produced as described above are subjected to LDA in order to determine a subspace suitable for classification. In practice, we keep only a subset of the initial HTF vector that contains the most discriminant of the calculated features, capable of efficiently describing the entire action sequence.
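As a sketch of the LDA step, the two-class Fisher discriminant below is a standard textbook formulation, not the authors' exact multi-class pipeline; the small regularization term added to the scatter matrix is our own addition for numerical stability.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA: w ∝ Sw^{-1}(mu1 - mu0), the projection
    direction that best separates the class means relative to the
    within-class scatter Sw."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), mu1 - mu0)
    return w / np.linalg.norm(w)

# Toy HTF-like vectors: two classes separated along the first dimension.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 4)),
               rng.normal(0, 1, (50, 4)) + np.array([3.0, 0, 0, 0])])
y = np.array([0] * 50 + [1] * 50)
w = fisher_lda_direction(X, y)
# Projected features X @ w concentrate class separation on a single axis.
```

The learned direction points almost entirely along the discriminative dimension, illustrating how LDA isolates the features that matter for classification.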
5. Experimental results
In this section, we present experimental results that demonstrate the efficiency of the proposed schemes for human action recognition and, at the same time, provide an experimental evaluation of the different invariant Trace functionals calculated for the construction of HTTs.
Different published methods have used different evaluation scenarios. As stated in [41], most researchers working in the field of human action recognition have evaluated their methods on the KTH [33] and Weizmann [38] datasets. However, no unified standard is usually followed for evaluation. The authors of the above paper also report differences of up to 10.67% in results when different validation approaches are applied to the same data.
In our experiments, the leave-one-person-out cross-validation approach was used to evaluate performance. This protocol was chosen due to its popularity among researchers.
Fig. 7. Action samples from Weizmann database for wave1, wave2, walk, pjump, side, run, skip, jack, jump and bend.
Fig. 8. Action samples from KTH database for walking, jogging, running, boxing, hand waving and hand clapping respectively.
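Leave-one-person-out splitting can be expressed in a few lines: each fold holds out every sequence of one performer, so the classifier is never tested on a subject it has seen. The performer IDs below are illustrative, as is the function name.

```python
import numpy as np

def leave_one_person_out(person_ids):
    """Yield (train_idx, test_idx) index splits, one fold per performer."""
    ids = np.asarray(person_ids)
    for person in np.unique(ids):
        test = np.where(ids == person)[0]
        train = np.where(ids != person)[0]
        yield train, test

ids = [0, 0, 1, 1, 1, 2]          # toy setup: 3 performers, 6 sequences
folds = list(leave_one_person_out(ids))
print(len(folds))                  # 3 folds, one per person
```

Averaging accuracy over all folds gives a per-person generalization estimate, which is what makes this protocol stricter than a random split of sequences.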
Table 2
Results (%) produced by calculating HTTs with different functionals on the two datasets.

Functional    KTH     Weizmann
Radon         87.70   91.11
Trace 1       89.82   92.20
Trace 2       88.41   92.00
Trace 3       86.66   90.52
Trace 4       88.00   93.11
Trace 5       89.82   92.20
Trace 6       89.82   92.20
Trace 7       90.22   93.41
Table 3
Classification percentages (%) achieved by different published methods on the KTH database.

Average accuracy (%)   Classifier
86.50                  SVM
94.00                  SVM
94.16                  VWCcorrel
81.20                  NNC
50.33                  NNC
88.30                  NNC
74.79                  NNC
80.90                  SVM
71.70                  SVM
81.50                  pLSA
84.40                  LPBOOST
91.80                  SVM
90.22                  SVM
93.14                  SVM
Table 4
Classification percentages (%) achieved by different published methods on the Weizmann database.

Average accuracy (%)   Classifier
97.80                  SVM
84.30                  SVM
96.30                  SVM
86.66                  MOH
94.40                  1-NN
72.80                  pLSA
93.40                  SVM
95.42                  SVM
6. Conclusion
In this paper, the Trace transform was examined for its capability to produce efficient features for human action recognition. More specifically, by calculating Weighted Trace transforms (WTTs) we initially presented an intuitive illustration of the capability of Trace to distinguish action classes. Subsequently, two new feature extraction
Acknowledgement
This work was co-financed by the ICT Project SIREN, under contract FP7-ICT-258453, by the European Union (European Social Fund - ESF) and by Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES, Investing in knowledge society through the European Social Fund.
References
[1] C. Thurau, V. Hlavac, Pose primitive based human action recognition in videos or still images, in: CVPR, IEEE Computer Society, Madison, USA, Omnipress, 2008, p. 8.
[2] L. Yeffet, L. Wolf, Local trinary patterns for human action recognition, in: ICCV, IEEE, 2009.
[3] V. Kellokumpu, G. Zhao, M. Pietikainen, Human activity recognition using a dynamic texture based method, in: Proceedings of the British Machine Vision Conference (BMVC), Leeds, UK, p. 10.
[4] C. Yang, Y. Guo, H. Sawhney, R. Kumar, Learning actions using robust string kernels, in: HUMO07, Springer, 2007, pp. 313-327.
[5] H. Jhuang, T. Serre, L. Wolf, T. Poggio, A biologically inspired system for action recognition, in: ICCV, IEEE, 2007, pp. 1-8.
[6] J.C. Niebles, H. Wang, L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, International Journal of Computer Vision 79 (2008) 299-318.
[7] K. Schindler, L. Van Gool, Action snippets: how many frames does human action recognition require? in: CVPR08, 2008, pp. 1-8.
[8] I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld, Learning realistic human actions from movies, in: Conference on Computer Vision & Pattern Recognition (CVPR), IEEE, 2008.
[9] K. Rapantzikos, Y. Avrithis, S. Kollias, Dense saliency-based spatiotemporal feature points for action recognition, in: CVPR, 2009.
[10] R. Goldenberg, R. Kimmel, E. Rivlin, M. Rudzsky, Behavior classification by eigendecomposition of periodic motions, Pattern Recognition 38 (2005) 1033-1043.
Georgios Goudelis received his B.Eng. (2003) in Electronic Engineering with Medical Electronics and his M.Sc. (2004) in Electronic Engineering, both from the University of Kent at Canterbury. From 2004 to 2007 he worked as a researcher with the Artificial Intelligence and Information Analysis lab of the Department of Informatics at the Aristotle University of Thessaloniki, participating in several projects financed by national and European funds. Currently, he is pursuing the Ph.D. degree in artificial intelligence and information analysis at the Department of Electrical and Electronic Engineering, National Technical University of Athens, Greece. He has authored and co-authored several journal and international conference papers and contributed one book chapter in his area of expertise. His current research interests include statistical pattern recognition, especially for human action localization and recognition, computational intelligence and computer vision.
Konstantinos Karpouzis graduated from the School of Electrical and Computer Engineering of the National Technical University of Athens in 1998 and received his Ph.D. degree in 2001 from the same University. His current research interests lie in the areas of natural human-computer interaction, serious games, emotion understanding and affective and assistive computing. He has published more than 110 papers in international journals and proceedings of international conferences and has contributed to the Humaine Handbook on Emotion research and the Blueprint for an affectively competent agent. Since 1995 he has participated in more than twelve research projects at Greek and European level, most notably the Humaine Network of Excellence, in the field of mapping signals to signs of emotion, and the FP7 STREP Siren on serious games for conflict resolution, where he is the Technical Manager.
Stefanos Kollias received the Diploma degree in Electrical Engineering from the National Technical University of Athens (NTUA) in 1979, the M.Sc. degree in Communication Engineering from the University of Manchester (UMIST), U.K., in 1980, and the Ph.D. degree in Signal Processing from the Computer Science Division of NTUA in 1984. In 1982 he received a ComSoc Scholarship from the IEEE Communications Society. From 1987 to 1988, he was a Visiting Research Scientist in the Department of Electrical Engineering and the Center for Telecommunications Research, Columbia University, New York, U.S.A. Since 1997 he has been Professor at NTUA and Director of IVML. He is a member of the Executive Committee of the European Neural Network Society.