Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/4045874

Visual control of a robotic hand

Conference Paper · November 2003


DOI: 10.1109/IROS.2003.1248819 · Source: IEEE Xplore

CITATIONS READS
24 279

4 authors, including:

Ignazio Infantino Antonio Chella


Italian National Research Council Università degli Studi di Palermo
98 PUBLICATIONS 639 CITATIONS 291 PUBLICATIONS 2,315 CITATIONS

SEE PROFILE SEE PROFILE

Irene Macaluso
Trinity College Dublin
100 PUBLICATIONS 936 CITATIONS

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

UnipaBCI Augmented Framework View project

Self-Consciousness and Theory of Mind for a Robot Developing Trust Relationships View project

All content following this page was uploaded by Antonio Chella on 21 May 2014.

The user has requested enhancement of the downloaded file.


Visual Control of a Robotic Hand
Ignazio Infantino1 Antonio Chella1,2 Haris Džindo1 Irene Macaluso1
1 2
ICAR-CNR sez. di Palermo DINFO Università di Palermo
Viale delle Scienze, edif. 11, Viale delle Scienze,
90128, Palermo, Italy 90128, Palermo, Italy
(infantino, dzindo, macaluso)@pa.icar.cnr.it chella@unipa.it

Abstract -- The paper deals with the design and control the robotic hand. This paper is structured as follows:
implementation of a visual control of a robotic system composed section II describes the whole system and the
of a dexterous hand and stereo cameras. The aim of the proposed experimentation setup; section III deals with the fingertip
system is to reproduce the movements of a human hand in order localization and the tracking on image sequence; section IV
to learn complex manipulation tasks. A novelty algorithm for a concerns the control of robotic hand by inverse kinematics
robust and fast fingertips localization and tracking is presented.
Moreover, a simulator is integrated in the system to give useful
computation; section V reports the details about various
feedbacks to the users during operations, and provide robust experimentations done; finally, some conclusions and future
testing framework for real experiments (see video). work are drawn in section VI.

I. INTRODUCTION II. THE IMPLEMENTED SYSTEM DESCRIPTION

The control of robotic systems has reached a high level Our proposal deals with the design and implementation
of precision and accuracy, but often the high complexity and of a visual control of a robotic system composed by a
task specificity are limiting factors for large scale uses. dexterous anthropomorphic hand (the DIST-Hand [1]) and
Today, robots are requested to be both “intelligent” and calibrated stereo cameras. The aim of the present work is to
“easy to use”, allowing a natural and useful interaction reproduce real-time movements of a human hand in order to
with human operators and users. In this paper, we deal with learn complex manipulation tasks.
the (remote) control of an anthropomorphic robotic hand: The human hand is placed in front of the stereo couple. A
our approach is to provide a Human-Computer-Interface black panel is used as background and illumination is
[10][17][15] based on natural hand gesture imitation of the controlled. Fingertips and wrist are extracted at each frame
operator showing his hand [8]. Different methods have been and their 3D coordinates are computed by a standard
proposed to capture human hand motion. Rehg and Kanade triangulation algorithm. Given a hand model, inverse
[11][12] introduced the use of a highly articulated 3D hand kinematics is used to compute the joint angles which are
model for the tracking of a human hand [13][14]. For translated to suitable commands sent to the DIST-Hand
tracking, the axes of the truncated cylinders that are used to controller.
model phalanges, are projected onto the image, and local The system (see figure 1) is composed by some common
edges are found. Fingertip positions are measured through a modules that work in parallel on the two flows of images
similar procedure. A nonlinear least squares method is used sourcing from the stereo rig:
to minimize the error between the measured joint and tip
locations and the locations predicted by the model. The 1) image pre-processing operation (noise filtering,
system runs in real-time, however, dealing with occlusions tresholding, edge detecting);
and handling background clutter remains a problem. Heap 2) the fingertip tracking following the first estimation;
and Hogg [6] used a deformable 3D hand shape model. The 3) fingertip localization on image based on previous
hand is modeled as a surface mesh which is constructed via position.
PCA from training examples. Real-time tracking is achieved
by finding the closest possibly deformed model matching The coordinates of the fingertips are the input of the other
the image. In [16], Cipolla and Mendoca presented a stereo software modules responsible of:
hand tracking system using a 2D model deformable by
affine transformations. Wu and Huang [19] proposed a two- 1) 3D coordinate computation;
step algorithm to estimate the hand pose, first estimating the 2) inverse kinematics;
global pose and subsequently finding the configuration of 3) actuator commands computation.
the joints. Their algorithm relies on the assumption that all
fingertips are visible. III. FINGERTIP LOCALIZATION AND TRACKING
Our method uses fingertips as features, considering each
finger except the thumb as planar manipulator, and starting This section deals with the localization of the fingertips
from this hypothesis we compute inverse kinematics to in long image sequences. In order to achieve real-time
performances state-space estimation techniques are used to where (r(k), c(k)) represent the fingertip pixel position at
reduce the search space from the full camera view to a kth frame, and (vr(k) , vc(k)) be its velocity in r and c
smaller search window centered on the prediction of the directions.
future position of the fingertips. In the following, a novel
algorithm is proposed to detect the fingertip position inside
the search window.

I*
Left Image Image Fingertip xreal 3D Coordinate
Preprocessing xpred Extraction Computation
Right Image Module Module Module

a b
Kalman Filter Inverse
Module Kinematics
Module

Qhuman
c
Initial estimates
Human-To-
Robot Mapping Fig.2 (a) Initial hand configuration used to calculate the first estimation of
Module the fingertip positions; (b) three points (yellow squares) with high
correlation response individuate a group and provide a fingertip candidate
Qrobot computed as centroid of such a group (red arrow); (c) circular templates
used to search fingertips.

To the communication
units Under the assumption that a fingertip moves with
constant translation over the long sequence and a
sufficiently small sampling interval (T), the plant equation
for the recursive tracking algorithm can be written as
Fig.1 The system architecture: two parallel set of procedures perform
separately the estimation of the fingertips on right and left images; the
coordinates extracted are the input of the procedures responsible for the X(k+1)=Ax(k)+w(k)
control of the robotic hand.
Where
The visual servoing procedure starts with a preliminary
setup phase, based on various image processing operations. 1 0 T 0
The procedure individuates the hand/arm on image, 0 1 0 T
compares it with robotic one and calculates exact size ratios. A =
0 0 1 0
Relevant features are tracked on image sequences, and 0 0 0 1
reconstructed in a three-dimensional space, using standard
computer vision algorithms [3]. These features are used as a and the plant noise w(k) is assumed to be zero mean with
guide to perform 3D reconstruction of the simulated covariance matrix Q(k).
arm/hand, and real system positioning. The reconstructed For each fingertip, the measurement used for the
movements are reproduced both by the simulator and the corresponding recursive tracking filter at time tk is the
robotic system in real time. image plane coordinates of the corresponding point in the
kth image. Thus the measurement is related to the state
A. Fingertip Tracking vector as:

In this section we are involved with prediction of Z(k) = Hx(k) + v(k)


feature points in long image sequences. If the motion of the where
observed scene is continuous, as it is, we should be able to
make predictions on the motion of the image points, at any 0 1 0 0
time, on the basis on their previous trajectories. Kalman H =
0 0 1 0
filter [1] [16] is the ideal tool in such tracking problems.
The motion of a fingertip at each time instance (frame) and the measurement noise v(k) is assumed to be zero mean
can be characterized by its position and velocity. For a with covariance matrix R(k).
feature point, define the state vector at time tk as After the plant and measurement equations have been
formulated, the Kalman filter can be applied to recursively
x(k)= [ r(k) c(k) vr(k) vc(k)]T estimate the motion between two consecutive frames and
track the feature point.
Assuming that the trajectory of a feature point has been noise or the use of a lower image scale doesn’t degrade the
established up to the kth frame of a sequence, we describe performance of the algorithm (see figure 3). No constraints
procedures for estimating the in-frame motion between the of the possible hand postures are necessary and gray-level
kth and (k+1)th frames. First, for a fingertip, the predicted video images are sufficient to obtain good results.
location of its corresponding point z(k+1|k) is computed by
the KF. Subsequently, a window centering at z(k+1|k) is
extracted from the (k+1)th image and the fingertip extraction
algorithm is applied to the window to identify salient feature
points as described in the next section. The same process is
repeated for each fingertip.
The initial fingertips and wrist coordinates are computed by
assuming the hand is placed as in figure 2.a. Under such
hypothesis it is straightforward to compute the fingertip
coordinates.

B. Fingertip Extraction

This subsection describes a novelty algorithm to


perform robust and fast localization of the fingertips.
Once a search window is determined for each fingertip, the
feature point is searched for within that window. The overall
shape of a human fingertip can be approximated by a Fig.3 Results of fingertips extraction in different hand postures; the
semicircle. Based on this observation, fingertips are algorithm robustness has been tested also with artificial noise added.
searched for by template matching with a set of circular
templates as shown in figure 2.c. Ideally, the size of the C. Three-dimensional coordinate computation
templates should differ for different fingers and different
users. However, our experiments showed that fixed window Given the camera matrices it is possible to compute the
size works well for various users. This result has been fundamental (or essential) matrix of the stereo rig [3].
proven by other authors [9]. We choose a square of 20x20 Epipolar geometry is used to find correspondences between
pixels with a circle whose radius is 10 pixels as a template two views.
for normalized correlation in our implementation. In order to reduce computational efforts, matching
In order to find meaningful data we first perform an features are searched on the epipolar line in the search
adaptive threshold on the response images to obtain distinct window of the second image provided by KF by means of
regions. For each correlation we select the points with the normalized correlation. 3D fingertip coordinates are
highest matching scores (usually 2-3) which lies in computed by a standard triangulation algorithm.
unconnected regions of the thresholded image.
Subsequently, near responses of different correlations IV. THE ROBOTIC HAND CONTROL
are grouped. We select the groups characterized by the
highest number of points associated and the centroid of such The coordinates of the four fingertips and the wrist are
groups are used to compute the fingertip candidates. In used to solve the inverse kinematics problem for joint
order to calculate those candidates, we must combine the angles, provided a kinematics model of a human hand. The
information provided by each point in the group. However, model adopted [7] is designed to remain simple enough for
only the points corresponding to semi-circle correlation inverse kinematics to be done in real-time, while still
responses are used in this stage. respecting human hand capabilities.
Each point is associated with a vector whose direction The whole hand is modelled by a 27 DOF skeleton
depends on the orientation of the corresponding semi-circle whose location is given by the wrist’s middle point and
(i.e. right - 0°, up - 90°, and so on). Its module is given by whose orientation is that of the palm. Each finger consists of
the correlation matching score. The direction in which the three joints. Except for the thumb, there are 2 degrees of
fingertip candidate should be searched is computed as the freedom (DOF) for metacarpophalangeal (MCP) joints, and
angle formed by the vector sum of such vectors (see figure 1 DOF for proximal interphalangeal (PIP) joints and distal
2.b). Therefore, the fingertip candidate coordinate for each interphalangeal (DIP) joints. The thumb is modelled by a 5
group lies on the line through the previously computed DOF kinematic chain, with 2 DOF for the trapezio-
centroid and whose direction is given by the above metacarpal (TM) and MCP joint and 1 for interphalangeal
procedure. We use techniques of data association based on (IP) joint.
the previous measurements and predicted state to choose the In order to obtain meaningful configurations and to
correct fingertip among the selected candidates. reduce the dimensionality of the joint space we include hand
The robustness of this algorithm has been tested using constraints [21] in the model. The constraints are of two
various fingers configuration: also in the more complicated types: static (limit the range of finger motion) and dynamic
case of two very close fingers on the background palm the (describe relationships between some finger joints, i.e. qII-
IV II-IV
system follow the correct feature. The addition of artificial DIP=(2/3)q PIP). Besides, the thumb can be modelled as a
4 DOF manipulator. Since the robotic hand has four fingers vector qi(k) may be computed by minimizing the following
and it is not placed on an arm, we are not interested neither function:
in the posture of the pinky, nor in the position/orientation of
the hand. Then, given the hand constraints, it is possible to F (qi(k)) = D ( xi(k), k(qi(k)) ), i=I...IV
reduce the number of DOF to 13.
subject to the human hand constraints. The initial guess is
A. Inverse Kinematics set to the previously computed joint vector qi(k-1) since
under the slow motions assumption we can state that the
In order to solve the inverse kinematics problem, joint angles vary little from time k-1 to time k.
calibration of the human hand has to be performed to adapt The computed positions of the hand are sent to the
to various users. This process is must be done off-line using human-to-robot mapping unit which computes the
an interactive procedure. We use 6 frames to represent corresponding robot joint angles. These angles are
finger kinematics: Oc (the camera frame), Oh (the hand communicated by local network to the real hand, or by the
frame whose origin is the wrist’s middle point and whose DDE mechanism to the simulator (see fig. 4).
axes are that of the camera frame) and Ofr(I-IV) (the finger-
root frame whose origin is at first flexion/extension joint of
each finger and whose axes are defined by the palm
orientation with z axes perpendicular to the palm plane and
x axis pointing towards the respective fingertip in the initial
configuration).

Fig.4 A simulator is part of the system architecture, allowing to visualize


the reconstructed movements on the virtual robotic hand. The 3D model
used is very similar to real robotic hand and receive the same commands.
The simulator also includes the model of a PUMA-arm that in future will
complete our system architecture.

Under the assumption that the hand orientation doesn’t


change in time, we can compute the fixed transformation
matrixes Thfr(I-IV) during the initialization phase. In such a
way it is possible to express the coordinates of each
fingertip in the finger-root frame and solve for the joint state
vector. For each finger except for the thumb we compute the
joint variable qMCP-AAII-IV as the angle between the projection
of the fingertip on the palm plane and the x axes. By
rotating the finger-root frame around z axis by angle qMCP-
II-IV
AA , each finger II-IV can be represented as a 3-link
planar manipulator. The joint angles qMCP-FE, qPIP and qDIP
for the fingers II-IV are then found by solving the inverse
kinematics problem for such a manipulator. The thumb joint
angles are found directly by solving the inverse kinematics
problem for the kinematics model associated with it. Fig.5 . Hand tracking and control of robotic hand movements using web-
cameras.
Given the fingertip coordinates xi(k) and the forward
kinematics equation xi(k)=k[qi(k)], i=I...IV, and denoting
with D(x,y) the distance between two points, each state
Sample images Hand movement Success rate (%) The software (the described system and the simulator) runs
on a personal computer (Pentium III, 450 MHz), equipped
Opened hand 100% with two video grabber cards and an ethernet card. The
(initial position) movements command are send to robotic DIST-hand using
a TCP-IP link to it.
The various procedures implemented to perform the
visual control allow a frame rate of 10 images per second,
permitting a qualitative correct reproduction of the
Thumb 89%
movements of the real hand presented (see video).
(T)
If the system find inconsistencies of fingertip positions
or fails to follow them (mainly due a too rapid movements),
the user can establish a new visual control showing the hand
in the position of initialization. A more reactive behavior of
Index 92%
the system is obtained if it is used only the images of a
(I)
single camera: avoiding the 3-d reconstruction of the
fingertips, it is possible to perform the control of the robotic
hand using the variations of the length ratios from the initial
position.
Middle 91% Naturally, the precision is lower but it is enough to
(M) execute manipulation or grasping tasks (see example shown
in figure 5). We have tested the system performing 17
standard movements starting from the initial position:
opened hand (no movement), movements involving only
Ring 90% one finger (4 cases), movements involving two fingers (6
(R) cases), three fingers (4 cases), four fingers, closed hands
(see figure 6). For each final posture, it has been calculated
the percentage of success in reaching correct final position
by qualitative observation of the robotic hand posture. The
Two fingers 85% results shown in figure 6 are related to 20 acquired
(TI, TM,TR, sequences for each class of movements.
IM, IR, MR)
VI. CONCLUSIONS

Three fingers 80% The research demonstrates how to control in “natural


(TIM, TIR, way” a robotic hand using only visual information. The
IMR, TMR) proposed system is able to reproduce the movements of a
human hand in order to perform telepresence tasks or to
learn complex manipulation tasks.
Four fingers 74% Moreover, a novelty algorithm for a robust and fast
(TIMR) fingertips localization and tracking is been presented and
results shown the goodness of the proposed solution. Future
work will extend the visual control to a complete robotic
arm-hand system.
Closed hand 73%
(TIMR+P) VII. ACKNOWLEDGMENTS

This research is partially supported by MIUR (Italian


Ministry of Education, University and Research) under
project RoboCare (A Multi-Agent System with Intelligent
Fig.6 .We have tested the system performing 17 standard movements Fixed and Mobile Robotic Components).
starting from the initial position. For each final posture, it ha been
calculated the percentage of success in reaching correct final position by
qualitative observation of the robotic hand. The results shown are related VIII. REFERENCES
to 20 acquired sequences for each class of movements. Note: letters
T,I,M,R,P indicate the fingers. [1] S. Bernieri, A. Caffaz, G. Casalino, “The DIST-Hand”
Robot IROS '97, Conference Video Proceedings,
V. EXPERIMENTAL RESULTS Grenoble (France), September 1997.
[2] C. K. Chui, G. Chen, Kalman Filtering: With Real-
We have tested our proposed method by using two Time Applications, Springer Series in Information
different stereo rigs: the first one was composed by two Sciences, 1999.
Sony cameras, and the second one by two usb web-cams.
[3] O. Faugeras, Three-Dimensional Computer Vision, [13] R. Rosales, V. Athitsos, L. Sigal, S. Sclaroff, “3D
MIT Press, Cambridge, MA, 1993. Hand Pose Reconstruction Using Specialized
[4] B.P.K. Horn, Robot Vision, MIT Press, Cambridge, Mappings”, in Proc. IEEE Intl. Conf. on Computer
MA, 1986. Vision (ICCV’01), Canada, Jul. 2001.
[5] F. Lathuilière, J.Y. Hervé, “Visual Hand Posture [14] Y. Sato, M. Saito, H. Koike, “Real-Time Input of 3D
Tracking in a Gripper Guiding Application”, in Proc. Pose and Gestures of a User’s Hand and Its
Of IEEE Int’l Conference on Robotics and Applications for HCI”, in Proc. of Virtual Reality
Automation, San Francisco, April 2002. 2001 Conference, pp. 13-17, March 2001, Yokohama,
[6] A. J. Heap, D. C. Hogg, “Towards 3-D hand tracking Japan.
using a deformable model”, in 2nd International Face [15] T. Starner, J. Weaver, A. Pentland, “Real-time
and Gesture Recognition Conference, pp. 140-145, American sign language recognition using desk- and
Killington, Vermont, USA, October 1996. wearable computer-based video”, IEEE PAMI, 20(12):
[7] J. Lee, T.L. Kunii, “Model-based analysis of Hand 1371–1375, 1998.
Posture”, in IEEE Computer Graphics and [16] B.D.R. Stenger, P.R.S. Mendonca, R. Cipolla,
Applications, pp. 77-86, 1995. “Model-Based Hand Tracking Using an Unscented
[8] C. Nolker, H. Ritter, “Visual Recognition of Kalman Filter”, in Proc. British Machine Vision
Continuous Hand Postures”, IEEE Transaction on Conference, Manchester, UK, September 2001.
Neural Network, Vol. 13, no. 4, July 2002. [17] B.D.R. Stenger, P.R.S. Mendonca, R. Cipolla, “Model
[9] K. Oka, Y. Sato, H. Koike, ”Real-time tracking of based 3D tracking of an articulated hand”, in Proc.
multiple fingertips and gesture recognition for CVPR‘01, pages 310–315, 2001.
augmented desk interface systems”, in Proc. 2002 [18] S. Waldherr S., Thrun S., Romero R, “A gesture-based
IEEE Int'l Conf. Automatic Face and Gesture interface for human-robot interaction”, Autonomous
Recognition 2002 , pp. 429-434, May 2002. Robots, 9, (2), 2000, 151-173.
[10] V. Pavlovic, R. Sharma, T.S. Huang., “Visual [19] Y.Wu, T. S. Huang, “View-independent Recognition
interpretation of hand gestures for human-computer of Hand Postures”, in Proc. of IEEE CVPR'2000, Vol.
interaction: a review”, IEEE PAMI, 19(7), 1997. II, pp.88-94, Hilton Head Island, SC, 2000
[11] J.M. Rehg, T. Kanade, “DigitEyes: Vision-Based [20] Y. Wu, J.Y. Lin, T. S. Huang, “Capturing Natural
Hand Tracking for Human-Computer Interaction”, in Hand Articulation”, in Proc. of IEEE Int'l Conf. on
Proc. of the IEEE Workshop on Motion of Non-Rigid Computer Vision, Vancouver, Canada, 2001.
and Articulated Objects, Austin, Texas, Nov. 1994, [21] Y. Wu, J. Y. Lin, T. S. Huang, "Modeling Human
pp.16-22. Hand Constraints", in Proc. Of Workshop on Human
[12] J.M. Regh, T. Kanade, “Visual tracking of high dof Motion (Humo2000), Austin, TX, Dec., 2000.
articulated structures: an application to human hand
tracking”, in Proc. of Third European Conf. on
Computer Vision, Stockholm, Sweden, 1994, pp. 35-
46.

View publication stats

You might also like