
Image-Based Visual Servoing Techniques for Robot Control

1st Mohamed Kmich, 2nd Hicham Karmouni, 3rd Inssaf Harrade, 4th Achraf Daoui, 5th Mhamed Sayyouri
Engineering, Systems, and Applications Laboratory
National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco
mohamed.kmich@usmba.ac.ma, hicham.karmouni@usmba.ac.ma, inssaf.harrade@usmba.ac.ma, achraf.daoui@usmba.ac.ma, mhamed.sayyouri@usmba.ac.ma

2022 International Conference on Intelligent Systems and Computer Vision (ISCV) | DOI: 10.1109/ISCV54655.2022.9806078

Abstract— Visual servoing involves extracting information from the image provided by cameras and integrating it into the robot control loop to achieve the desired action. Visual servoing techniques based on geometric primitives are effective for a large class of applications. In this paper, we present a comparison between point, line, and circle primitives in terms of feature error, computation time, and convergence of the spatial velocity of the vision sensor. The simulation results show the ability of visual servoing techniques based on the three geometric primitives to control robots, with a superiority of line primitives in terms of computation time and speed of convergence.

Keywords— Visual Servoing, 2D Visual Servoing, Visual Features.

I. INTRODUCTION

Robotics is a technology that has developed enormously in recent years; it is a field that includes the study, design, and manufacture of robots [1]. Robots can be used in several fields of application such as industry [2], health [3], and aerospace [4]. For a robot to recognize its environment, it must be equipped with sensors, and when a camera is used to control the movement of a robot, we speak of Visual Servoing [5], [6].

Visual Servoing consists of extracting information from the image provided by the camera and integrating it into the robot control loop to perform the desired task. There are two well-known configurations for the camera location: in the first, the camera is fixed on the robot end-effector and observes the target object; this configuration is called eye-in-hand [7]. In the second, the vision sensor is mounted in the world and simultaneously observes the target object and the end-effector of the robot; this configuration is called eye-to-hand [8]. In the following, we only discuss the first, eye-in-hand, configuration.

The use of Visual Servoing has improved the performance, accuracy, and robustness of robotic systems [9]. The choice of the control law and the choice of the visual information used as inputs to this law have a significant influence on the behavior of any Visual Servoing technique. Visual features can be selected in the camera image plane, such as point coordinates [5], line or ellipse parameters [10], and moments [11], or in Cartesian space, such as the pose of the target object or 3D point coordinates [12], or as combinations of the two previous kinds of visual features [13].

The 2D Visual Servoing method controls the motion of a robotic system using visual features expressed in the image space of the camera. There are several types of visual features that can be selected for use in this method [14]. Only a few articles in the literature have been devoted to the comparative study of visual servoing techniques. For instance, the authors of [9] focused on comparing the two famous visual servoing techniques, IBVS and PBVS. In this paper, we focus on the comparison between the basic features (points, lines, and circles) used in image-based visual servoing (IBVS). It is shown that these features are almost identical in terms of the convergence of the camera's spatial velocity. However, in terms of execution time and feature error, we found that lines converge faster than points, which converge faster than circles. Furthermore, according to the ETIR test [15], lines are 28.93% more efficient than circles and points are 23.72% more efficient than circles.

The remainder of this paper is organized as follows. The kinematic control law employed in image-based visual servoing is discussed in Section II. The analytical approach to obtaining the interaction matrix of these visual features is presented in Section III. In Section IV, the simulation results are presented. Finally, we end the paper with a conclusion and perspectives.

II. 2-D VISUAL SERVOING

A. Camera Modeling

Generally, we need eleven intrinsic and extrinsic parameters to model a camera.

• Intrinsic Parameters

To model the internal geometry and optical characteristics of a camera, we need five parameters that are essential to relate

the metric coordinates to the pixel coordinates. These parameters are:

- The focal length f of the camera, in meters;
- The pixel coordinates (u_0, v_0) of the optical center O_c;
- The parameters k_u and k_v, the horizontal and vertical scale factors expressed in pixels/meter, which define the size of a pixel.

In addition, we can gather these intrinsic parameters in a 3×3 matrix K given by [16]:

K = \begin{bmatrix} f/k_u & 0 & u_0 \\ 0 & f/k_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}    (1)

• Extrinsic Parameters

The extrinsic parameters are needed to link the camera frame to the scene frame in which the object is located. Therefore, we need six parameters:

- Three parameters to determine the three rotations around the three axes of the reference frame, defined by a matrix ({}^{c}R_{m})_{3\times 3};
- Three parameters to determine the three translations along the three axes of the frame, defined by a matrix ({}^{c}t_{m})_{3\times 1}.

Finally, the extrinsic parameters, in homogeneous representation in the form of a matrix ({}^{c}M_{m})_{4\times 4}, are defined by [14]:

{}^{c}M_{m} = \begin{bmatrix} {}^{c}R_{m} & {}^{c}t_{m} \\ 0_{1\times 3} & 1 \end{bmatrix}    (2)
B. 2-D Visual Servoing

The visual information taken from the image is divided into:

• Simple geometric features, which come from the projection of the target object in the image plane, such as point coordinates, line parameters, or circles [10];

• Complex geometric features, which represent the parameters of a contour [17] or 2D moments [18];

• Hybrid geometric features, which represent the possibility of defining the control input as a combination of different types of 2D features extracted from the image [13].

The Image-Based Visual Servoing loop can be diagrammed as shown in Fig. 1.

Fig. 1. 2D Visual Servoing scheme

All control laws employed in robotic servo systems aim to minimize the error e(t) between the current and desired features. This error is usually defined by [5]:

e(t) = s(m(t), a) - s^*    (3)

The parameters of equation (3) are described as follows: m(t) is a vector that represents measurements in the image space (such as the coordinates of points of interest or the coordinates of the center of gravity of the target object). This vector is used to generate a vector s(m(t), a) of k visual features, in which a is a set of parameters representing additional information about the system (e.g., the intrinsic and extrinsic parameters of the vision sensor). The values of the desired visual features are contained in the vector denoted s*.

Usually, we fix the desired visual features of the object (s* = constant), and therefore the temporal variations of s depend only on the camera motion. Furthermore, in this case, we consider that we control the displacement of a six-degree-of-freedom (6-DOF) camera, such as a camera fixed on the end-effector of a 6-DOF robot.

Once we have chosen the initial visual features, we can easily design the control scheme. The design of a velocity controller is probably the simplest method. For this, we need the relation between the temporal variations of s and the spatial velocity of the camera. Let Vc = (vc, ωc) be the spatial velocity of the camera, where vc = (vx, vy, vz) is the translational velocity of the camera frame origin and ωc = (ωx, ωy, ωz) is the rotational velocity of the camera frame. The following equation expresses the relation between the temporal variations of s and the spatial velocity of the camera Vc:

\dot{s} = L_s V_c    (4)

where L_s ∈ ℝ^{n×6} is the interaction matrix of s.

From equations (3) and (4), we can deduce the relationship between the temporal variation of the error \dot{e} and the spatial velocity of the camera Vc:

\dot{e} = L_e V_c    (5)

where L_e = L_s. If we take the camera's spatial velocity Vc as the input to the system controller, and if we try to ensure an exponential decoupled decrease of the error, i.e.:

\dot{e} = -\lambda e    (6)

then from equations (5) and (6) we obtain:

V_c = -\lambda L_e^{+} e    (7)

where L_e^{+} ∈ ℝ^{6×n} is the Moore-Penrose pseudo-inverse of L_e.
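The resulting control loop reduces, at each iteration, to computing the feature error of equation (3) and the velocity command of equation (7). The minimal sketch below (Python/NumPy, for illustration only; it is not the authors' MATLAB implementation) does exactly that, using the gain λ = 0.04 adopted later in Section IV and a constant 2×6 interaction matrix of a single image point whose analytical form is derived in Section III; the feature values and the depth are assumed.

```python
import numpy as np

def ibvs_velocity(s, s_star, L, lam=0.04):
    """IBVS control law of eq. (7): Vc = -lambda * L^+ * (s - s*)."""
    e = s - s_star                        # feature error, eq. (3)
    return -lam * np.linalg.pinv(L) @ e   # camera velocity (vx, vy, vz, wx, wy, wz)

# Usage with a constant 2x6 interaction matrix of one image point, evaluated
# at the principal point with an assumed depth X3 = 2 m (see Section III).
X3 = 2.0
L = np.array([[-1.0 / X3, 0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0 / X3, 0.0, 1.0, 0.0, 0.0]])
s      = np.array([0.05, -0.02])   # current point coordinates (assumed)
s_star = np.array([0.0, 0.0])      # desired coordinates: the principal point
print(ibvs_velocity(s, s_star, L))
```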

In the following, we look at the most well-known visual features and determine the interaction matrix corresponding to each of them.

III. 2-D VISUAL FEATURES

In this section, we recall the analytical calculation of the interaction matrix corresponding to each fundamental visual feature used in IBVS: the point, the line, and the circle.

A. Point Features

Let X = (X_1, X_2, X_3) be a point expressed in the camera frame, and let x = (x_1, x_2) be the projection of X in the image plane. The relation between the coordinates of X and x is:

x_1 = X_1 / X_3, \qquad x_2 = X_2 / X_3    (8)

If we take s = x = (x_1, x_2), the coordinates of the point in the image plane, then differentiating the projection equations (8) with respect to time gives:

\dot{x}_1 = (\dot{X}_1 - x_1 \dot{X}_3)/X_3, \qquad \dot{x}_2 = (\dot{X}_2 - x_2 \dot{X}_3)/X_3    (9)

Let a camera observe a point X = (X_1, X_2, X_3) in the world and move with a velocity Vc = (vc, ωc) in the world frame. The relations between the components of the velocity of the point X and the components of the spatial velocity of the camera Vc = (vc, ωc) are given by:

\dot{X}_1 = -v_x - \omega_y X_3 + \omega_z X_2, \quad \dot{X}_2 = -v_y - \omega_z X_1 + \omega_x X_3, \quad \dot{X}_3 = -v_z - \omega_x X_2 + \omega_y X_1    (10)

From (9) and (10) we get:

L_s = \begin{bmatrix} -1/X_3 & 0 & x_1/X_3 & x_1 x_2 & -(1+x_1^2) & x_2 \\ 0 & -1/X_3 & x_2/X_3 & 1+x_2^2 & -x_1 x_2 & -x_1 \end{bmatrix}    (11)

The depth of the point relative to the camera frame is represented by X_3. Therefore, to use this interaction matrix in a control scheme, we must first estimate the value of X_3.

When x_1 = x_2 = 0 (principal point):

L_s = \begin{bmatrix} -1/X_3 & 0 & 0 & 0 & -1 & 0 \\ 0 & -1/X_3 & 0 & 1 & 0 & 0 \end{bmatrix}    (12)

A single point is sufficient to control v_x or ω_y and v_y or ω_x. Using several points (at least three) allows all 6 DOFs to be controlled. For n points, the feature vector and the interaction matrix are obtained by stacking:

s = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}, \qquad L_s = \begin{bmatrix} L_{X_1} \\ \vdots \\ L_{X_n} \end{bmatrix}    (13)

When we have n ≥ 4, we must pay attention to singularities in the interaction matrix.
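The interaction matrices of equations (11)-(13) are straightforward to assemble numerically. The sketch below (Python/NumPy, illustrative only; the point coordinates and depths are assumed) builds the 2×6 matrix of one point and stacks several points as in equation (13).

```python
import numpy as np

def L_point(x1, x2, X3):
    """2x6 interaction matrix of an image point, eq. (11).
    (x1, x2): coordinates in the image plane, X3: depth of the point [m]."""
    return np.array([
        [-1.0 / X3, 0.0, x1 / X3, x1 * x2, -(1.0 + x1**2), x2],
        [0.0, -1.0 / X3, x2 / X3, 1.0 + x2**2, -x1 * x2, -x1],
    ])

def L_points(points):
    """Stack the interaction matrices of several points, eq. (13).
    points: iterable of (x1, x2, X3) tuples."""
    return np.vstack([L_point(x1, x2, X3) for (x1, x2, X3) in points])

# Three points (assumed coordinates and depths) give a 6x6 matrix,
# which is enough to control all 6 DOFs of the camera.
L = L_points([(0.1, 0.1, 2.0), (-0.1, 0.1, 2.1), (0.0, -0.1, 1.9)])
print(L.shape)   # (6, 6)
```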

B. Line Features

For a straight line, we use the parameters (ρ, θ) to express the interaction matrix [14]. We saw in the previous case that the depth X_3 of the point is needed in the interaction matrix of the point. In this case, too, we need the equation of a plane containing the line in order to express the interaction matrix of the line features. Several planes contain the line, but we must choose one with d ≠ 0 (the plane equation being aX_1 + bX_2 + cX_3 + d = 0).

From the point interaction matrix (11), with x = ρ cos θ and y = ρ sin θ, a simple calculation shows that the interaction matrix of the line parameters is defined as follows:

L_{\rho\theta} = \begin{bmatrix} \lambda_\theta \cos\theta & \lambda_\theta \sin\theta & -\lambda_\theta \rho & -\rho\cos\theta & -\rho\sin\theta & -1 \\ \lambda_\rho \cos\theta & \lambda_\rho \sin\theta & -\lambda_\rho \rho & (1+\rho^2)\sin\theta & -(1+\rho^2)\cos\theta & 0 \end{bmatrix}    (14)

where λ_ρ = -(aρ sin θ + bρ cos θ + c)/d, λ_θ = (a cos θ - b sin θ)/d, and aX_1 + bX_2 + cX_3 + d = 0 is the equation of the plane containing the straight line.

As in the case of the point, each line provides two rows of the interaction matrix. Therefore, a minimum of three lines is required to obtain a full-rank interaction matrix.
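Equation (14) can be transcribed directly into code. The sketch below (Python/NumPy, illustrative) returns the 2×6 interaction matrix of a line; the line parameters and the coefficients (a, b, c, d) of the supporting plane are assumed values.

```python
import numpy as np

def L_line(rho, theta, a, b, c, d):
    """2x6 interaction matrix of a line (rho, theta), eq. (14).
    (a, b, c, d): coefficients of a plane containing the line, with d != 0."""
    lam_rho = -(a * rho * np.sin(theta) + b * rho * np.cos(theta) + c) / d
    lam_theta = (a * np.cos(theta) - b * np.sin(theta)) / d
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [lam_theta * ct, lam_theta * st, -lam_theta * rho, -rho * ct, -rho * st, -1.0],
        [lam_rho * ct, lam_rho * st, -lam_rho * rho, (1 + rho**2) * st, -(1 + rho**2) * ct, 0.0],
    ])

# Example: one line lying in the plane X3 = 3 m (a = b = 0, c = 1, d = -3);
# the matrices of three such lines would be stacked, as in eq. (13),
# to obtain a full-rank interaction matrix.
print(L_line(rho=0.2, theta=np.deg2rad(30.0), a=0.0, b=0.0, c=1.0, d=-3.0))
```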
C. Circle Features

In general, an ellipse in image space represents the projection of a circle in Cartesian space, and it is described by the equation below [14]:

u^2 + A_1 v^2 - 2A_2 uv + 2A_3 u + 2A_4 v + A_5 = 0    (15)

where A_i (i = 1, ..., 5) are the parameters of the ellipse in image space. The relation between the temporal variations of these parameters and the spatial velocity of the camera is given by the equation below:

\dot{A}_i = L_{ell}(A_i, \rho)\, v_c    (16)

where L_{ell}(A_i, ρ) is the interaction matrix given by [10]:

L_{ell}(A_i, \rho) = \begin{bmatrix}
2bA_2 - 2aA_1 & 2A_1(b - aA_2) & 2bA_4 - 2aA_1A_3 & 2A_4 & 2A_1A_3 & -2A_2(A_1 + 1) \\
b - aA_2 & bA_2 - a(2A_2^2 - A_1) & a(A_4 - 2A_2A_3) + bA_3 & A_3 & 2A_2A_3 - A_4 & A_1 - 2A_2^2 - 1 \\
c - aA_3 & a(A_4 - 2A_2A_3) + cA_2 & cA_3 - a(2A_3^2 - A_5) & -A_2 & 1 + 2A_3^2 - A_5 & A_4 - 2A_2A_3 \\
bA_3 + cA_2 - 2aA_4 & bA_4 + cA_1 - 2aA_2A_4 & bA_5 + cA_4 - 2aA_3A_4 & A_5 - A_1 & 2A_3A_4 + A_2 & -2A_2A_4 - A_3 \\
2cA_3 - 2aA_5 & 2cA_4 - 2aA_2A_5 & 2cA_5 - 2aA_3A_5 & -2A_4 & 2A_3A_5 + 2A_3 & -2A_2A_5
\end{bmatrix}    (17)

where ρ = (a, b, c) defines the plane αX_1 + βX_2 + γX_3 + λ = 0, expressed in world coordinates, in which the circle is found, with a = -α/λ, b = -β/λ, and c = -γ/λ.

The interaction matrix of the circle projection in image space is thus of dimension 5×6, with a maximum rank equal to 5. Therefore, we cannot solve uniquely for the spatial velocity of the camera. To overcome this problem, we combine the interaction matrix of a circle with the interaction matrix of a point. The resulting interaction matrix is of dimension 7×6, and we can then solve uniquely for the spatial velocity of the camera.
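The rank-completion step described above can be sketched as follows (Python/NumPy, illustrative): a 5×6 ellipse interaction matrix, computed elsewhere from equation (17), is stacked with the 2×6 interaction matrix of one point from equation (11); the 7×6 result has full column rank, so the pseudo-inverse of equation (7) yields a unique camera velocity. Here the ellipse matrix is passed in as a precomputed array (a random placeholder in the usage line), not re-derived.

```python
import numpy as np

def L_circle_plus_point(L_ell, x1, x2, X3):
    """Stack the 5x6 ellipse interaction matrix (eq. 17) with the 2x6
    interaction matrix of one point (eq. 11), giving a 7x6 matrix of rank 6."""
    L_pt = np.array([
        [-1.0 / X3, 0.0, x1 / X3, x1 * x2, -(1.0 + x1**2), x2],
        [0.0, -1.0 / X3, x2 / X3, 1.0 + x2**2, -x1 * x2, -x1],
    ])
    return np.vstack([L_ell, L_pt])

# Usage sketch with a placeholder standing in for L_ell(A_i, rho) of eq. (17):
L_ell = np.random.rand(5, 6)
L = L_circle_plus_point(L_ell, 0.1, -0.05, 2.0)
print(L.shape, np.linalg.matrix_rank(L))   # (7, 6) and, generically, rank 6
```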

In the following part, the simulation results obtained with MATLAB are discussed.

IV. SIMULATION AND RESULTS

In the following, the results of the comparison between the previous visual features used in IBVS are verified by simulation. First, the controlled motion of the camera is chosen so that the IBVS method can operate, with the camera mounted on the end-effector of the robot. The camera parameters are the focal length f = 8 mm and the pixel dimensions Px = Py = 0.01 mm/pixel.

In the simulations of the three visual features, the gain of the controller is set to λ = 0.04, and the pseudo-inverse of the interaction matrix computed at the initial image is used for each visual feature in the control law.

We consider that the camera is at the initial pose TC0 = [1, 1, -3 (m); 0, 0, 0.6 (rad)] relative to its desired pose, where tco = (1; 1; -3) (m) is the translation vector of the origin of the camera frame and Rco = (0; 0; 0.6) (rad) is the rotation vector of the camera frame relative to the world frame.

There are two subsections in this section. The first compares the computational time of each feature, while the second compares the error of the three features.

A. Computational Time

The computational time of each visual feature is compared in the first test. For this purpose, we have chosen the Execution Time Improvement Ratio (ETIR) as the criterion for this comparison [15]. ETIR is defined as follows:

\mathrm{ETIR} = \left(1 - \frac{t_1}{t_2}\right) \times 100    (18)

where t_1 and t_2 are the execution times of the first and second features. If t_1 = t_2, then ETIR = 0.

Table I shows the average computational time of each feature. Note that ETIR1 is calculated between points and circles, and ETIR2 is calculated between lines and circles.

TABLE I. ETIR AND AVERAGE COMPUTATIONAL TIME OF THE 3 VISUAL FEATURES

Visual Features                 Points           Lines            Circles
Mean computational time (s)     30.6671156       28.5724042       40.2049084
ETIR (%)                        ETIR1 = 23.72    ETIR2 = 28.93

The simulation results (Table I) show that the execution time of lines is the shortest among the three features. Furthermore, according to the ETIR criterion, we find that lines are 28.93% more efficient than circles and points are 23.72% more efficient than circles.
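As a quick check of Table I, equation (18) can be applied directly to the reported mean computational times (a small verification snippet, not part of the authors' experiments):

```python
def etir(t1, t2):
    """Execution Time Improvement Ratio, eq. (18)."""
    return (1.0 - t1 / t2) * 100.0

t_points, t_lines, t_circles = 30.6671156, 28.5724042, 40.2049084
print(round(etir(t_points, t_circles), 2))   # 23.72  (ETIR1: points vs. circles)
print(round(etir(t_lines, t_circles), 2))    # 28.93  (ETIR2: lines vs. circles)
```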
B. Feature Error

In the second test, we compare the error among the three visual features. Figures 2, 3, and 4 show, for each case, the initial and desired images, the convergence of the spatial velocity of the camera, and the feature error.

Fig. 2. Simulation results for point features: (a) initial and desired image, (b) camera speed (in m/s and rad/s), (c) error.

Fig. 3. Simulation results for line features: (a) initial and desired image, (b) camera speed (in m/s and rad/s), (c) error.

Fig. 4. Simulation results for circle features: (a) initial and desired image, (b) camera speed (in m/s and rad/s), (c) error.
From Figures 2, 3, and 4, we find that the error of the lines vanishes faster (114 s) than that of the points (150 s), which in turn vanishes faster than that of the circles (200 s). Thus, the results obtained in the first test, on computational time, are consistent with the results obtained from the feature error.

V. CONCLUSION

In this paper, we compared three visual features (points, lines, and circles) in image-based visual servoing using two tests: the first on execution time and the second on feature error. Concerning the first test, we find ETIR1 = 23.72% and ETIR2 = 28.93%; for the second, we find that the error vanishes after 114 s for lines, 150 s for points, and 200 s for circles. From these results, we deduce that the use of lines is more efficient than the use of points, which is more efficient than the use of circles. In future work, we will examine other visual features such as ellipses and moments.

ACKNOWLEDGMENT

The authors would like to express their gratitude to the anonymous referees whose thorough reviews and detailed comments aided in improving the readability of this paper.

REFERENCES

[1] S. Y. Nof, Ed., Handbook of Industrial Robotics, 2nd ed. New York: John Wiley, 1999.
[2] B. Bayram and G. İnce, "Advances in Robotics in the Era of Industry 4.0," in Industry 4.0: Managing The Digital Transformation. Cham: Springer International Publishing, 2018, pp. 187-200, doi: 10.1007/978-3-319-57870-5_11.
[3] E. D. Oña, J. M. Garcia-Haro, A. Jardón, and C. Balaguer, "Robotics in Health Care: Perspectives of Robot-Aided Interventions in Clinical Practice for Rehabilitation of Upper Limbs," Applied Sciences, vol. 9, no. 13, p. 2586, Jun. 2019, doi: 10.3390/app9132586.
[4] R. Bogue, "The growing use of robots by the aerospace industry," Industrial Robot, vol. 45, no. 6, pp. 705-709, Dec. 2018, doi: 10.1108/IR-08-2018-0160.
[5] F. Chaumette and S. Hutchinson, "Visual servo control, Part I: Basic approaches," IEEE Robotics and Automation Magazine, vol. 13, no. 4, pp. 82-90, Dec. 2006.
[6] S. Hutchinson, G. D. Hager, and P. I. Corke, "A tutorial on visual servo control," IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 651-670, Oct. 1996, doi: 10.1109/70.538972.
[7] A. Mohebbi, M. Keshmiri, and W. Xie, "A Comparative Study of Eye-In-Hand Image-Based Visual Servoing: Stereo vs. Mono," JID, vol. 19, no. 3, pp. 25-54, Jan. 2016, doi: 10.3233/jid-2015-0006.
[8] G. Flandin, F. Chaumette, and E. Marchand, "Eye-in-hand/eye-to-hand cooperation for visual servoing," in Proc. IEEE International Conference on Robotics and Automation (ICRA 2000), San Francisco, CA, USA, 2000, vol. 3, pp. 2741-2746, doi: 10.1109/ROBOT.2000.846442.
[9] F. Janabi-Sharifi, L. Deng, and W. J. Wilson, "Comparison of Basic Visual Servoing Methods," IEEE/ASME Transactions on Mechatronics, vol. 16, no. 5, pp. 967-983, Oct. 2011, doi: 10.1109/TMECH.2010.2063710.
[10] B. Espiau, "A new approach to visual servoing in robotics," p. 31.
[11] F. Chaumette, "Image Moments: A General and Useful Set of Features for Visual Servoing," IEEE Transactions on Robotics, vol. 20, no. 4, pp. 713-723, Aug. 2004, doi: 10.1109/TRO.2004.829463.
[12] W. J. Wilson and W. Hulls, "Cartesian Position Based Visual Servoing," IEEE Transactions on Robotics and Automation, vol. 12, no. 5, 1996.
[13] E. Malis, F. Chaumette, and S. Boudet, "2 1/2 D visual servoing," IEEE Transactions on Robotics and Automation, vol. 15, no. 2, pp. 238-250, Apr. 1999, doi: 10.1109/70.760345.
[14] P. Corke, Robotics, Vision and Control, vol. 118. Cham: Springer International Publishing, 2017, doi: 10.1007/978-3-319-54413-7.
[15] H. Karmouni, T. Jahid, M. Sayyouri, R. El Alami, and H. Qjidaa, "Fast 3D image reconstruction by cuboids and 3D Charlier's moments," Journal of Real-Time Image Processing, vol. 17, no. 4, pp. 949-965, Aug. 2020, doi: 10.1007/s11554-018-0846-0.
[16] K. Muller, C. K. Hemelrijk, J. Westerweel, and D. S. W. Tam, "Calibration of multiple cameras for large-scale experiments using a freely moving calibration target," Experiments in Fluids, vol. 61, no. 1, p. 7, Jan. 2020, doi: 10.1007/s00348-019-2833-z.
[17] C.-L. Li, M.-Y. Cheng, and W.-C. Chang, "Dynamic performance improvement of direct image-based visual servoing in contour following," International Journal of Advanced Robotic Systems, vol. 15, no. 1, Jan. 2018, doi: 10.1177/1729881417753859.
[18] O. Tahri and F. Chaumette, "Point-based and region-based image moments for visual servoing of planar objects," IEEE Transactions on Robotics, vol. 21, no. 6, pp. 1116-1127, Dec. 2005, doi: 10.1109/TRO.2005.853500.
