Professional Documents
Culture Documents
Hand-Gesture Recognition Using Computer-Vision Techniques: David J. Rios-Soria Satu E. Schaeffer Sara E. Garza-Villarreal
Hand-Gesture Recognition Using Computer-Vision Techniques: David J. Rios-Soria Satu E. Schaeffer Sara E. Garza-Villarreal
21st International Conference on Computer Graphics, Visualization and Computer Vision 2013
ABSTRACT
We use our hands constantly to interact with things: pick them up, move them, transform their shape, or activate
them in some way. In the same unconscious way, we gesticulate in communicating fundamental ideas: ‘stop’,
‘come closer’, ‘over there’, ‘no’, ‘agreed’, and so on. Gestures are thus a natural and intuitive form of both
interaction and communication. Gestures and gesture recognition are terms increasingly encountered in discussions
of human-computer interaction. We present a tool created for human-computer interaction based on hand gestures.
The underlying algorithm utilizes only computer-vision techniques. The tool is able to recognize in real time six
different hand gestures, captured using a webcam. Experiments conducted to evaluate the system performance are
reported.
vide effective human-computer interaction for applica- movements as commands. The system is a combination
tions involving complex tasks. In these applications, of a pointer position and non-chorded keystroke input
users should be supplied with sophisticated interfaces device to track finger position [AM06a].
allowing them to navigate within the system, select ob- An interactive screen developed by The Alternative
jects, and manipulate them. Agency1 in UK is located in a department store window
The use of computer vision for human-computer in- (Figure 1). The Orange screen allows interaction just
teraction is a natural, non-intrusive, non-contact solu- by moving the hands in front of the window without the
tion. Computer vision can be used for gesture detection need to touch it.
and classification, and various approaches have been
proposed to support simple applications. To recognize
hand gestures using computer vision, it is first needed
to detect the hand on an image or video stream. Hand
detection and pose estimation involve extracting the po-
sition and orientation on the hand, fingertip locations,
and finger orientation from the images. Skin-color fil-
tering is a common method for locating the hand be- Figure 1: The world’s first touchless interactive shop
cause of its fast implementation. Skin-color filters rely window
on the assumption that the hand is the only skin-colored
object. Gesture classification is a research field involv- Lenman et al. [LBT02] use gesture detection to inter-
ing many machine-learning techniques such as neural act with electronic-home devices such as televisons and
networks and hidden Markov models [SP09]. DVD players.
However, hand-pose estimation is still a challenge in MacLean et al. [MHP+ 01] use hand-gesture recog-
computer vision. Several open problems remain to be nition for real-time teleconferencing applications. The
solved in order to obtain robustness, accuracy, and high gestures are used for controling horizontal and verti-
processing speed. The need of an inexpensive but high- cal movement as well as zooming functions. Schlömer
speed system is rather evident.Development of these et al. [SPHB08] use hand-gesture recognition for inter-
systems involves a challenge in the research of effective action with navigation applications such viewing pho-
input/output techniques, interaction styles, and evalua- tographs on a television, whereas Roomi et al. [RPJ10]
tion methods [EBN+ 07]. propose a hand-gesture detection system for interac-
tion with slideshow presentations in PowerPoint. The
gesture-detection system presented in Argyros et al.
3 Related work [AL06] allows to control remotely the computer mouse.
Sixthsense [MMC09] is a system that converts any
There are several areas where the detection of hand ges-
surface into an interactive surface. In order to interact
tures can be used, such as device interaction, virtual-
with the system, hand gesture recognition is used. In the
object interaction, sign-language recognition, and robot
Sixthsense system, color markers are used in the fingers
control. Wachs et al. [WKSE11] present some exam-
to detect the gestures.
ples of applications such as medical assistance systems,
crisis management, and human-robot interaction. In
the following subsection we present some examples of 3.2 Virtual object interaction
gesture-based interaction systems.
Gesture detection can be used for interaction with vir-
tual objects; there are several works that show applica-
3.1 Device interaction tions for this scenario.
Hirobe et al. [HNW+ 09] have created an interface
There are works related to electronic device interac-
for mobile devices using image tracking. The system
tion; for example, Stergiopoulou et al. [SP09] use self-
tracks the finger image and allows to type on an in-air
growing and self-organized neural networks for hand
keyboard and draw 3D pictures.
gesture recognition. Another example is Finger count-
HandVu [Kol10] is a hand-gesture vision-based recog-
ing [CB03] a simple human-computer interface. Using
nition system that allows interaction with virtual objects
a webcam, it interprets specific hand gestures as input
(Figure 2) HandVu detects the hand in a standard pos-
to a computer system in real time.
ture, then tracks it and recognizes key postures, all in
The UbiHand [AM06b] is an input device that uses a
real-time and without the need for camera or user cali-
miniature wrist-worn camera to track finger position,
bration. Although easy to understand, the used gestures
providing a natural and compact interface. A hand
are not natural.
model is used to generate a 3D representation of the
hand, and a gesture recognition system interprets finger
1 http://www.thealternative.co.uk/
sets of edge points. To create a connected set we select We identify the peaks of the fingers in the image by
one contour pixel as a seed and recursively add to the computing the convex hull of the hand edge. The con-
set pixels that are also contour pixels and are adjacent vex hull is a descriptor of shape, is the smallest con-
to at least one pixel in the set, until there are no more vex set that contains the edge; intuitively explained —
adjacent contour pixels. If there are contour pixels left, in two dimensions— as the form taken by a rubber band
then we select another contour pixel as a seed to create when placed around the object in question; an example
a new connected set; we repeat iteratively until all the is given in Figure 5). It is used in computer vision to
contour pixels are in a connected component. simplify complex shapes, particularly to provide a rapid
In our case, we assume the hand to be in the image fore- indication of the extent of an object.
ground, making it likely that the largest connected con- We now copy the binary image O to a new image
tour component will correspond to the hand, whereas C. We will then iteratively seek and eliminate con-
any smaller components of the contour set, if present, cave regions. Intuitively, this can be done by examining
correspond to some objects on the background. the values of the pixels in an arbitrary straight segment
We denote the set of contour pixels of the largest con- with both endpoints residing in white pixels. If any of
nected component by E. We construct a new binary im- the pixels along the segment are black, they are col-
age O by copying B and then setting to zero (black) all ored white, together with any black pixels beneath the
those pixels that correspond to the smaller connected segment. This repeated “filling” will continue until no
components of the contour and their insides, leaving more segments with white end points and intermediate
only E and the pixels inside it at one (white). This can black pixels exist. An algorithm for achieving this is
be done by a standard bucket-fill algorithm. given in the text book of Davies [Dav04].
The resulting white zone in C is now convex and the
4.4 Convex hull and convexity defects edge of that zone —all those white pixels that have at
least one black neighbor— form the convex hull of the
At this point, we have identified the edge of the hand in hand-shape in E. We denote this edge by H.
the image. We now proceed to determining which hand
gesture is being made in the image. The way in which
this is done depends on the the type of hand gestures
supported by the system —no single design is adequate
for all possible hand positions–. The gestures that we
wish to detect are shown in Figure 4.
(a) Edge and hull. (b) Vertices and defects.
2 http://opencv.willowgarage.com/
3 http://www.python.org/ Figure 9: Correctly detected gestures.
In total, 93.1% of the gestures were correctly de- present) divided by one fifth of the base width, rounded
tected, improving the results for a previous work [RSS12]; down to the preceding integer value.
the gestures for numbers three, four, and five have Another aspect that needs to be addressed in future
the highest accuracy and present low variation between work is the sensibility of the system to lighting condi-
hands. The gestures for number one, however, has the tions, as this affects the skin-color filtering, particularly
lowest detection percentage. Also, gestures for zero, with reflections and shadows. We expect these addi-
one, and two show variability according to the hand tions to improve the accuracy of the detection system,
used. The gesture-detection algorithm works correctly as well as ease the cognitive burden of the end user as it
a majority of the time, under the conditions used in our will no longer be necessary to keep the fingers separate
experiments. User observation helped us notice that the —something that one easily forgets—.
primary cause for incorrect gesture detection was the
particular form in which each user performs the gesture:
sometimes, for example, the fingers were very close to 8 References
each other. Some examples are shown in Figure 10. We
discuss a possible work-around to this problem as part REFERENCES
of future work in the next section.
[AL06] Antonis Argyros and Manolis Lourakis. Vision-
based interpretation of hand gestures for re-
mote control of a computer mouse. In Thomas
Huang, Nicu Sebe, Michael Lew, Vladimir
Pavlovic, Mathias Kölsch, Aphrodite Galata,
and Branislav Kisacanin, editors, Computer Vi-
sion in Human-Computer Interaction, volume
3979 of Lecture Notes in Computer Science,
pages 40–51. Springer, Berlin / Heidelberg, Ger-
many, 2006.
Vision-based hand pose estimation: A review. [RPJ10] S.M.M. Roomi, R.J. Priya, and H. Jayalak-
Computer Vision and Image Understanding, shmi. Hand gesture recognition for human-
108:52–73, 2007. computer interaction. Journal of Computer Sci-
ence, 6(9):1002–1007, 2010.
[HNW+ 09] Yuki Hirobe, Takehiro Niikura, Yoshihiro [RSS12] David J. Rios Soria and Satu E. Schaeffer. A tool
Watanabe, Takashi Komuro, and Masatoshi for hand-sign recognition. In 4th Mexican Con-
Ishikawa. Vision-based input interface for mo- ference on Pattern Recognition, volume 7329 of
bile devices with high-speed fingertip tracking. Lecture Notes in Computer Science, pages 137–
In 22nd ACM Symposium on User Interface 146. Springer, Berlin / Heidelberg, 2012.
Software and Technology, pages 7–8, New York,
NY, USA, 2009. ACM. [SP09] E. Stergiopoulou and N. Papamarkos. Hand ges-
ture recognition using a neural network shape
[KMB07] P. Kakumanu, S. Makrogiannis, and N. Bour- fitting technique. Engineering Applications of
bakis. A survey of skin-color modeling Artificial Intelligence, 22(8):1141–1158, 2009.
and detection methods. Pattern Recognition, [SPHB08] Thomas Schlömer, Benjamin Poppinga, Niels
40(3):1106–1122, 2007. Henze, and Susanne Boll. Gesture recognition
with a Wii controller. In Proceedings of the
[Kol10] Kolsch. Handvu. www.movesinstitute.org/ 2nd international conference on Tangible and
\textasciitildekolsch/HandVu/HandVu. embedded interaction, pages 11–14, New York,
html, 2010. NY, USA, 2008. ACM.
[KPS03] J. Kovac, P. Peer, and F. Solina. Human skin [VSA03] Vladimir Vezhnevets, Vassili Sazonov, and Alla
colour clustering for face detection. In Interna- Andreeva. A survey on pixel-based skin color
cional conference on Computer as a Tool, vol- detection techniques. In Proceedings of inter-
ume 2, pages 144–147, NJ, USA, 2003. IEEE. national conference on computer graphics and
vision, pages 85–92, Moscow, Russia, 2003.
[LBT02] S. Lenman, L. Bretzner, and B. Thuresson. Moscow State University.
Computer vision based hand gesture interfaces
for human-computer interaction. Technical re- [WKSE11] Juan Pablo Wachs, Mathias Kölsch, Helman
port, CID, Centre for User Oriented IT Design. Stern, and Yael Edan. Vision-based hand-
Royal Institute of Technology Sweden, Stock- gesture applications. Communications ACM,
hom, Sweden, June 2002. 54:60–71, feb 2011.
[Mah08] Tarek M. Mahmoud. A new fast skin color [WMW09] Jacob O. Wobbrock, Meredith Ringel Morris,
detection technique. World Academy of Sci- and Andrew D. Wilson. User-defined gestures
ence, Engineering and Technology, 43:501–505, for surface computing. In Proceedings of the
2008. 27th international conference on Human factors
in computing systems, pages 1083–1092, New
[MHP+ 01] J. MacLean, R. Herpers, C. Pantofaru, L. Wood, York, NY, USA, 2009. ACM.
K. Derpanis, D. Topalovic, and J. Tsotsos. Fast [WP09] Robert Y. Wang and Jovan Popović. Real-time
hand gesture recognition for real-time telecon- hand-tracking with a color glove. ACM Trans-
ferencing applications. In Proceedings of the actions on Graphics, 28:63:1–63:8, jul 2009.
IEEE ICCV Workshop on Recognition, Analy-
sis, and Tracking of Faces and Gestures in Real- [WSE+ 06] Juan Wachs, Helman Stern, Yael Edan, Michael
Time Systems, pages 133–140, Washington, DC, Gillam, Craig Feied, Mark Smith, and Jon Han-
USA, 2001. IEEE Computer Society. dler. A real-time hand gesture interface for
medical visualization applications. In Ashutosh
[MMC09] Pranav Mistry, Pattie Maes, and Liyan Chang. Tiwari, Rajkumar Roy, Joshua Knowles, Erel
WUW - wear ur world: a wearable gestural in- Avineri, and Keshav Dahal, editors, Applica-
terface. In Proceedings of the 27th international tions of Soft Computing, volume 36 of Advances
conference extended abstracts on Human factors in Soft Computing, pages 153–162. Springer,
in computing systems, pages 4111–4116, New Berlin / Heidelberg, 2006.
York, NY, USA, 2009. ACM.
[ZM11] Morteza Zahedi and Ali Reza Manashty. Ro-
[Par96] J. R. Parker. Algorithms for Image Processing bust sign language recognition system using ToF
and Computer Vision. John Wiley & Sons, Inc., depth cameras. Information Technology Journal,
New York, NY, USA, 1 edition, 1996. 1(3):50–56, 2011.