A Comprehensive Leap Motion Database For Hand Gesture Recognition
Abstract—Touchless interaction has received considerable attention in recent years, with the benefit of removing the burden of physical contact. The recent introduction of novel acquisition devices, such as the Leap Motion controller, makes it possible to obtain a very informative description of hand pose and motion that can be exploited for accurate gesture recognition. In this work, we present an interactive application with gestural hand control using the Leap Motion for medical visualization, focusing on user satisfaction as an important component in the composition of a new specific database.

In this paper, we propose a 3D dynamic gesture recognition approach explicitly targeted to Leap Motion data. Spatial feature descriptors based on the positions of the fingertips and the palm center are extracted and fed into a support vector machine classifier in order to recognize the performed gestures. The experimental results show the effectiveness of the suggested approach, which recognizes the modeled gestures with an accuracy of about 81%.

Keywords—Hand gesture recognition; Touchless interaction; Leap motion; Support vector machine

I. INTRODUCTION

Users have long wished to interact with machines by more natural and intuitive means than conventional devices. For this reason, gesture, one of the richest means of communication available, can be put to use for human-computer interaction [2].

In the general context of Human-Machine Interaction (HMI), new interfaces have emerged that provide a more natural interaction and make the computer tool more transparent. Gestural interfaces [12], for example, propose to improve HMI by recognizing user gestures as input commands. In this context, the gestural controller "Leap Motion" (LM) has attracted considerable attention. It offers a virtual interface that removes any physical contact between the user and the machine, specifically the computer.

This technology can prove interesting for applications that require interacting with a computer in particular environments, such as an operating room, where sterility is a major issue. Our system is designed to allow touchless HMI, so that surgeons can control medical images during surgery.

This paper presents a study of hand gesture recognition through the extraction, processing and interpretation of data acquired by the LM. This leads us to design recognition and classification approaches in order to develop a gesture library suited to the required system control.

We introduce several novel contributions. We collected a new dataset formed by specific dynamic gestures related to the recommended commands, and we created our own data format. We propose a three-dimensional structure combining spatial features, namely the arithmetic mean, the standard deviation, the root mean square and the covariance, in order to effectively classify dynamic gestures.

The paper is organized as follows. Section II describes the LM device. Section III reviews the related work. Section IV describes our database. Section V introduces the general pipeline of the proposed approach: its first subsection presents the main feature descriptors extracted from the dataset, and its second subsection describes the classification algorithm. The experimental results are shown and discussed in Section VI. The last section draws the conclusions.

II. LEAP MOTION DEVICE

The LM is a compact sensor released in July 2013 by the Leap Motion Company [1]. The device has small dimensions of 80 x 30 x 12.7 mm. It has a brushed aluminum body with black glass on its top surface, which hides three infrared LEDs, used for scene illumination, and two CMOS cameras
spaced 4 centimeters apart, which capture images at a frame rate of 50 up to 200 fps, depending on whether USB 2.0 or 3.0 is used. The LMC provides information about objects located in the device's field of view: a volume shaped like an inverted pyramid, whose lower side measures 25 mm and upper side 600 mm, with a 150° field of view.

The LMC Software Development Kit (SDK), available for C++, Java, Objective-C, C#, Python, JavaScript, etc., can be used to develop applications that exploit the capabilities of this device, and is compatible with the Windows and OS X operating systems. It is designed to provide real-time tracking of hands and fingers in three-dimensional space with 0.01 mm accuracy, according to the manufacturer. The positions of the hands and fingertips are reported in coordinates relative to the center of the controller, using a right-handed coordinate system with millimeters as the unit. Several gestures can be natively identified by the LM, such as the swipe, circle, key tap and screen tap gestures.

Another positive feature is its affordability: it costs approximately 80 $ (~160 TND), which contributes to its popularity. The LM thus offers good performance, high precision and a minimal footprint, making it an adequate solution to opt for.

Fig. 1. LMC with micro-USB plug

III. EXISTING DATABASES

Researchers started to analyze the performance of the LM Controller (LMC) after its first release in 2013. A first study of the accuracy and robustness of the LMC was presented in [5]. An industrial robot with a reference pen providing suitable position accuracy was used for the experiment. The results showed a deviation between the desired 3D position and the average measured positions below 0.2 mm for static setups and of 1.2 mm for dynamic ones.

To improve human-computer interaction in different fields through the LMC, we opt for recognizing user gestures to enter commands.

A study of TV control is described in [9], in which 18 participants contributed free-hand gesture commands for 21 television control tasks. The authors released their dataset online, consisting of 378 LM gestures described by fingertip position, direction and velocity coordinates. In [6], a database formed by 12 gestures was collected from over 100 participants and then used to train a 3D recognition model based on convolutional neural networks, which could recognize 2D projections of a 3D space.

Sign language is another field of study. An investigation of the LMC in [4] showed its potential recognition accuracy for gestures and handwriting on the fly. The acquired input data were treated as a time series of 3D positions and processed using the dynamic time warping algorithm. In [7], an Indian sign language recognition system using both hands was developed with the LM sensor. The positional information of the five fingertips, along with the center of the palm, of both hands was used to recognize the sign posture based on the Euclidean distance and the cosine similarity. Likewise, in [8] the authors put forward a novel pattern recognition method to recognize symbols of the Arabic sign language. The scheme extracted meaningful characteristics from the data, such as the angles between fingers, and fed them to a classifier that decides which gesture is being performed, achieving high accuracy. In addition, a study presented in [10] described a database containing the three-dimensional motion trajectories of the numbers and the alphabet (36 gestures in total), captured by an LM for rapid recognition of dynamic hand gestures; the SVM and hidden Markov model algorithms were used for classification. Moreover, Marin et al. [3] used the LMC and Kinect devices to recognize the American manual alphabet. A static gesture database based on fingertip positions and orientations is available online at http://lttm.dei.unipd.it/downloads/gesture. These features were fed into a multi-class support vector machine (SVM) classifier to recognize the gestures. Depth features from the Kinect were also combined with features from the LMC to improve recognition performance. However, they focused only on static gestures rather than dynamic ones.

The authors in [11] built two dynamic hand gesture datasets with frames acquired with an LMC: the LeapMotion-Gesture3D dataset and the Handicraft-Gesture dataset. A feature vector with depth information is computed and fed into a Hidden Conditional Neural Field classifier to recognize dynamic hand gestures.

In the medical field, the LMC offers a touchless interaction system that allows the medical staff to interact directly with devices that visualize digital images in a sterile environment. In this context, a usability evaluation of a natural interaction system in the operating room, conducted on the RISO system (the acronym stands for "Image Recognition in Operating Room"), was described in [10]. The interaction with the medical images occurs through the LM by means of the nine gestures forming their database.

We propose a more developed hand gesture recognition system for the medical field, which provides not only static-gesture recognition but also dynamic-gesture recognition. Only the LMC is required to collect a new database formed by specific hand gestures, most of which are useful for a sterile control interface; we present it in the next section.

IV. DESCRIPTION OF OUR DATASET

During technical visits to hospitals, we discussed with some surgeons the usability of touchless interaction systems in the sterile field, in order to finally set the indispensable commands for handling medical images. These commands were identified mainly on the basis of intuitiveness and
2016 7th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT)
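The subsection of Section V that defines the feature descriptors falls outside this excerpt, but the four spatial statistics named in the contributions (arithmetic mean, standard deviation, root mean square and covariance) can be sketched over a recorded gesture as follows. This is only an illustrative sketch: the frame layout, the per-axis treatment and the function name are our own assumptions, not the paper's actual data format.

```python
import math

def descriptors(track):
    # `track` is a hypothetical list of (x, y, z) palm-center positions,
    # one per Leap Motion frame, in millimeters relative to the controller.
    n = len(track)
    axes = list(zip(*track))                       # per-axis value tuples
    mean = [sum(a) / n for a in axes]              # arithmetic mean
    std = [math.sqrt(sum((v - m) ** 2 for v in a) / n)
           for a, m in zip(axes, mean)]            # standard deviation
    rms = [math.sqrt(sum(v * v for v in a) / n)
           for a in axes]                          # root mean square
    cov_xy = sum((x - mean[0]) * (y - mean[1])
                 for x, y in zip(axes[0], axes[1])) / n  # x-y covariance
    return mean, std, rms, cov_xy

# Example: a short horizontal swipe along x at a constant height of 200 mm.
track = [(0.0, 200.0, 0.0), (10.0, 200.0, 0.0),
         (20.0, 200.0, 0.0), (30.0, 200.0, 0.0)]
mean, std, rms, cov_xy = descriptors(track)
```

Concatenating such statistics over the palm center and the five fingertips would yield one fixed-length vector per gesture, suitable as input to the classifier described below.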
B. SVM Classifier

A typical issue of many machine-learning techniques is that a large dataset is required to properly train the classifier. The acquisition of the training data is a critical task that can require a huge amount of manual work. To compute the results, we split the dataset into a training set and a test set. In all the experiments, we used the first six subjects for learning and the last four for testing.

For classification, we test the SVM classifier, which is one of the most common machine learning classifiers in use today. It was derived from learning theory and is widely used in detection and recognition.

In order to recognize the performed gestures, the constructed feature vectors and their concatenation must be classified into G classes corresponding to the various gestures of the considered database. A multiclass SVM classifier based on the one-against-one approach has been used. A set of G(G-1)/2 binary SVM classifiers is used to test each class against each other, and each output is counted as a vote for a certain gesture. For each sample in the test set, the gesture with the maximum number of votes is the result of the recognition process. In particular, we opt for the SVM implementation in the LIBSVM package (Chang and Lin, 2011). We set a non-linear Gaussian radial basis function as the kernel; the classifier parameters are selected by means of a grid search approach and cross-validation on the training set [3].

Fig. 5. Accuracy rate based on the variation of the temporal window

TABLE II. PERFORMANCE OF LM FEATURES

Features                  Accuracy (Wt = 20)
Mean (µ)                  40%
Standard deviation (S)    53.1818%
Covariance (C)            40.9091%
Root mean square (RMS)    30.9091%
µ + S + C + RMS           80.9091%

Finally, Table III provides the confusion matrix for the SVM when all the features are combined together. The diagonal of the matrix shows the correctly classified examples.
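The one-against-one voting scheme described above can be sketched in a few lines. This is not the authors' LIBSVM pipeline: `binary_clf` below is a hypothetical stand-in for a trained pairwise SVM, used only to make the G(G-1)/2 vote counting concrete.

```python
from itertools import combinations

def one_vs_one_predict(sample, classes, binary_clf):
    # One vote per pair of classes: G(G-1)/2 binary decisions in total.
    votes = {g: 0 for g in classes}
    for a, b in combinations(classes, 2):
        votes[binary_clf(a, b, sample)] += 1
    # The gesture with the maximum number of votes wins.
    return max(classes, key=lambda g: votes[g])

# Toy stand-in for the trained binary SVMs: each pairwise "classifier"
# prefers the label whose (1-D) class centroid is closer to the sample.
centroids = {"G1": 0.0, "G2": 5.0, "G3": 10.0}
def clf(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

label = one_vs_one_predict(4.2, sorted(centroids), clf)
```

In the actual system, each pairwise decision would come from a binary SVM with a Gaussian RBF kernel whose parameters were selected by grid search and cross-validation, as stated above.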
The dark gray cells represent the most correctly classified
examples for that class, while the light gray cells indicate the false positives with a failure rate greater than 10%.

Looking in more detail at the results in Table III, it can be noticed that the accuracy is very close to or above 90% for most gestures. G4, G5, G6, G7, G8 and G9 are the critical gestures for the LM, and reveal a very high accuracy when recognized by the device. In contrast, gestures G1, G2 and G3 frequently fail recognition. G2 and G3, two reciprocal gestures, each have a single raised finger (the index) and are sometimes confused with each other. Moreover, no spatio-temporal characteristics are used to differentiate between them in our approach. G10 is sometimes confused with G4, since an open hand performs both of them. This is due to the limited accuracy of the hand direction estimation in the LM software. It can also be noticed that G1 is another challenging gesture, owing to its touching fingers.

TABLE III. CONFUSION MATRIX FOR PERFORMANCE EVALUATION

      G1   G2   G3   G4   G5   G6   G7   G8   G9   G10  G11
G1    60   0    15   0    0    20   0    5    0    0    0
G2    5    50   25   5    0    0    0    0    15   0    0
G3    0    20   60   10   0    0    0    0    5    5    0
G4    0    0    0    100  0    0    0    0    0    0    0
G5    0    0    0    0    90   5    0    0    0    0    5
G6    0    0    0    0    0    100  0    0    0    0    0
G7    0    0    0    0    0    0    100  0    0    0    0
G8    0    0    0    0    0    0    0    100  0    0    0
G9    0    0    5    0    0    0    0    0    95   0    0
G10   0    0    0    20   0    0    0    0    5    75   0
G11   0    0    0    5    0    5    0    5    0    0    85

VII. CONCLUSION

In this paper, we have studied the influence of different parameters on the overall recognition accuracy of a gesture recognition system for the visualization and manipulation of medical images during surgical procedures. To evaluate the performance of our technique, we collected a small but challenging dataset of 11 dynamic gestures with the LM sensor. The feature vectors of all gestures were then extracted. Besides, the experimental database was extended to achieve early recognition. Subsequently, we used the training set to build the SVM model, while the test set was used to measure the performance. Finally, the experimental results demonstrate the effectiveness of the proposed method.

In this work, we used features based only on positional information. Further research will be devoted to the introduction of novel feature descriptors and to the extension of the suggested approach to the recognition of dynamic gestures that also exploits temporal information. In addition, we will look to incorporate an alternative model, such as the hidden Markov model, as a segmentation method to determine probable start and stop points for each gesture, and then input the identified frames of data into a convolutional neural network model for gesture classification.

ACKNOWLEDGMENT

We would like to thank all the people who contributed to the collection and post-processing of the database.

REFERENCES

[1] Leap Motion Controller. Available online at: https://www.leapmotion.com (accessed on 10 November 2014).
[2] M. Ben Abdallah, M. Kallel, M. S. Bouhlel, "An overview of gesture recognition," 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), IEEE, pp. 20-24, 2012.
[3] G. Marin, F. Dominio, P. Zanuttigh, "Hand gesture recognition with Leap Motion and Kinect devices," IEEE International Conference on Image Processing (ICIP), 2014.
[4] S. Vikram, L. Li, S. Russell, "Handwriting and gestures in the air, recognizing on the fly," CHI 2013 Extended Abstracts, Paris, France, 2013.
[5] F. Weichert, D. Bachmann, B. Rudak, D. Fisseler, "Analysis of the accuracy and robustness of the leap motion controller," Sensors, vol. 13, pp. 6380-6393, 2013.
[6] R. McCartney, J. Yuan, H.-P. Bischof, "Gesture recognition with the Leap Motion Controller," Int'l Conf. on Image Processing, Computer Vision, and Pattern Recognition (IPCV'15), 2015.
[7] R. B. Mapari, G. Kharat, "Real time human pose recognition using Leap Motion sensor," ICRCICN, IEEE, 2015.
[8] B. Khelil, H. Amiri, "Hand gesture recognition using Leap Motion controller for recognition of Arabic sign language," 3rd International Conference ACECS'16, 2016.
[9] I.-A. Zaiti, S.-G. Pentiuc, R.-D. Vatavu, "On free-hand TV control: experimental results on user-elicited gestures with Leap Motion," Personal and Ubiquitous Computing, vol. 19, pp. 821-838, Springer, 2015.
[10] A. Opromolla, V. Volpi, A. Ingrosso, S. Fabri, C. Rapuano, D. Passalacqua, C. M. Medaglia, "A usability study of a gesture recognition system applied during the surgical procedures," DUXU 2015, Part III, LNCS 9188, pp. 682-692, Springer, 2015.
[11] W. Lu, Z. Tong, J. Chu, "Dynamic hand gesture recognition with Leap Motion controller," IEEE Signal Processing Letters, vol. 23, pp. 1188-1192, 2016.
[12] N. Triki, M. Kallel, M. S. Bouhlel, "Imaging and HMI: foundations and complementarities," 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), IEEE, pp. 25-29, 2012.