Robust Active Stereo Calibration
Figure 1: (a) The coordinate frames that are present on the dynamic stereo head. (b) The active stereo head.
calibration results, indicating whether further data is needed to ensure an accurate estimate of the kinematic model.

In the remainder of the paper we present our formulation of the active stereo calibration problem for a system with two pan-tilt units mounted on a mobile robot base. Some background is presented and the calibration procedure is described. We present results showing the accuracy of the method in the presence of noise, and the ability of the procedure to quantify possible inaccuracies (when the noise is too high). The kinematic model that results from this procedure enables us to estimate depth using the measured/controlled position of the cameras, along with image data, as demonstrated in section 5.

2 Head Geometry

Our physical system is similar to numerous others. Two pan-tilt units (PTUs) are mounted on a mobile robot base. The PTUs have two degrees of freedom, both rotational. To describe the physical set-up, it is convenient to assign frames of reference. Here we present the mathematical notation and definitions for the transformations used.

In the following discussion we outline the coordinate systems used throughout the rest of the paper (see figure 1):

(b)ase  the robot-centric coordinate system located midway between the two pan-tilt units. We denote the base coordinate system with a "b" (xb, yb, zb).

(p)tu  fixed on the pan/tilt unit when in the "home" position (both cameras parallel to each other, facing "forward"). The pan/tilt angles are zero at this position (i.e. Tgp(φ = 0, τ = 0)).

(g)aze  the coordinate system attached to the pan/tilt unit for the current gaze position (i.e. it changes with the pan/tilt angles). When the pan/tilt angles are both zero, the axes of the gaze and ptu frames coincide (i.e. Tgp(φ, τ)).

(c)amera  the camera coordinate frame located at the optical center of the camera, with the positive z axis along the optical axis.

(w)orld  a frame of reference (often attached to a calibration grid or some physical landmark); the robot moves relative to this frame.

The coordinate systems ptu, gaze, and camera are defined for both the right and the left camera.

We express a point ^b q (the leading superscript denotes the frame) with respect to the camera by modeling the transformation

    ^c q = [Tcb] ^b q                                        (1)

where [Tcb] is decomposed into three transformations

    [Tcg Tgp Tpb] .                                          (2)

The transformation from base to camera has thus been decomposed into three transformations. Tpb is the transformation from the robot base coordinates to the home position of the pan-tilt unit. Tcg is the transformation from the pan-tilt unit to the optical center of the camera. Assuming a fixed focus and rigidly attached pan-tilt units, Tcg and Tpb are assumed to be fixed (constant) transformations. Tcg and Tpb can be found using the Robust Active Stereo Calibration (RASC) method described in section 3. Tgp represents the pan/tilt action and is a function of those two angles. The transformation Tgp(φ, τ) describes the motion of the unit from its home position (0, 0) to the current "gaze" position (φ, τ). Our Directed Perception pan-tilt unit [4] rotates about a fixed origin, thus
we can model the PTU to gaze transformation as

    Tgp = | cos τ   -sin τ cos φ   -sin τ sin φ   0 |
          | sin τ    cos τ cos φ    cos τ sin φ   0 |        (3)
          |   0        -sin φ          cos φ      0 |
          |   0           0              0        1 |

where τ is the PTU tilt angle and φ is the pan angle. The value τ is a negative rotation (with respect to the chosen right-hand coordinate system) but a positive value as commanded on the physical unit (see figure 2). With this notation, our goal is re-stated as:

    Given the ability to control Tgp, find Tcg and Tpb for each of the "eyes".

1. Gaze-camera calibration.

   (a) Move each eye to a set of discrete positions. Record the PTU position and the image. For each position:
       - Perform camera calibration to determine intrinsic and extrinsic parameters at each position.
       - [Optional] Find the best intrinsic parameters (use an average of those found, with robust statistics to reject outliers), then re-do the extrinsic parameter estimation using the new (fixed) intrinsic parameters.

   (b) Calculate the transformation from gaze to camera, Tgc, using equations (6) and (7).

2. Base-PTU calibration.

   (a) Use physical measurements between the PTUs to determine the distance and angle between the PTUs.

   (b) Construct PTU to base transformations from the transformation between the PTUs.

3.1 Camera Calibration

The RASC method requires both intrinsic and extrinsic parameters. This requires some type of 3D-2D correspondence. Using a known 3D model or calibration grid: determine correspondences between known model points and image projections of those points, solve for the projection matrix, and decompose the projection matrix. Procedures for this are laid out in detail in [5] and [13].

where (Rcg, tcg) ∈ SE(3) is the transformation from the gaze to the camera (which we wish to determine). Equating equations (4) and (5) separates the rotation and translation components to produce:

    [Rcg Rgp^i (Rgp^j)^T (Rcg)^T] = [Rcw^i (Rcw^j)^T]                          (6)

    [I - Rcg Rgp^i (Rgp^j)^T (Rcg)^T] tcg = tcw^i - Rcw^i (Rcw^j)^T tcw^j      (7)

The first is easily rearranged to be of the form AX = XB, for X = Rcg. Referring to the solution outlined by Park and Martin [11], if A = XBX^T then there exist
vectors α and β and corresponding skew-symmetric matrices¹ [α] and [β] such that A = e^[α] and B = e^[β], and A = XBX^T can be re-written as

    e^[α] = X e^[β] X^T = e^(X[β]X^T) = e^[Xβ] .

Thus AX = XB implies the linear relationship

    α = Xβ                                                   (8)

¹ Formally, α is the element in the Lie algebra that maps, via the exponential map, to the element A of the Lie group; details are given in [6].

Using canonical coordinates of SE(3) to parameterize the coordinate transformations, the equation AX = XB is transformed into a linear equation. Park and Martin [11] first made this observation for solving robot sensor calibration problems. The linear form of AX = XB allows the use of earlier results in least squares fitting on matrix subgroups [2] to derive a closed form solution to α = Xβ for X ∈ SO(3), α, β ∈ R^3 (and hence to AX = XB, for A, B, X ∈ SO(3)).

The calibration procedure yields a set of estimates (Ak, Bk), k = 1, . . . , n (where Ak and Bk are elements of SO(3) of the form given in equation (6)). Define (αk, βk) where αk = log Ak and βk = log Bk, with

    log Θ = φ/(2 sin φ) (Θ - Θ^T)                            (9)

for Θ ∈ SO(3) such that Tr(Θ) ≠ -1. Then the solution X that minimizes

    η = Σk d(Ak X, X Bk)

where d(·, ·) is a suitable metric on SO(3), is given by

    X = Rcg = (M^T M)^(-1/2) M^T

where

    M = Σ_{i=1}^{n} βi αi^T

(see [2] for details), and we take the symmetric, positive definite square root of (M^T M)^(-1) (see [7]). The translation tcg that satisfies equation (7) is

    tcg = (C^T C)^(-1) C^T d

where

    C = | I - A1 |          d = | bA1 - X bB1 |
        | I - A2 |              | bA2 - X bB2 |
        |  ...   |              |     ...     |
        | I - An |              | bAn - X bBn |

The solution for tcg is unique for a given X. The choice of X is unique if M^T M is nonsingular and has no repeated eigenvalues. This clearly shows the advantage of using canonical coordinates: our nonlinear problem is transformed to a linear one to which a closed form least squares solution [2] can be applied. We use this to solve equation (8) and obtain the rotation component of the gaze to camera transformation. The rotation component is computed explicitly, without the use of optimization/minimization techniques. The quality of the solution can be verified by calculating the noise in the rotation solution, which is a good indication of the overall solution quality. The amount of noise in the rotation solution can be obtained by averaging the Frobenius norm of

    [Rcg Rgp^i (Rgp^j)^T (Rcg)^T] - [Rcw^i (Rcw^j)^T]        (10)

for all i and j where i ≠ j. Our experience indicates that if the Frobenius norm exceeds unity, the solution is unusable.

3.3 Robot base to PTU calibration

The relationship between the PTUs can be found explicitly using loop closure. First we write the equation of the loop,

    [Tpg^L Tgc^L Tcw^L]^(-1) TRL [Tpg^R Tgc^R Tcw^R] = I     (11)

Then solve for TRL to obtain

    TRL = [Tpg^L Tgc^L Tcw^L] [Tpg^R Tgc^R Tcw^R]^(-1)       (12)

The values of Tpg^L, Tpg^R, Tgc^L, and Tgc^R are known through parameterization or calibration, and Tcw can be found using the methods described in [5, 14]. These values are substituted into equation (12) and TRL is calculated. This procedure provided reasonable results (rotation error of 2-3 degrees and baseline distance found within about a centimeter).

A more practical and accurate way to determine the transformation between the PTUs is to physically measure the angles and translation (because the PTUs were placed using precision machined mounts, we have a good estimate of the transformation, and measurement confirms the estimate). This method eliminates the noise associated with Tgc and Tcw that can lead to large errors in TRL. When the transformation was constructed using physical measurements, the angular error was less than one degree and the error in the baseline was on the order of 2-3 mm.

There is an ambiguity in the translation placement of the base frame "symmetrically" with respect to each PTU: an entire plane of solutions exists. The cyclopean frame assumes translations equal and opposite in X (for our particular choice of axes). Choosing this form above allows for a unique solution for b.
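The closed-form rotation step of equations (8) and (9) can be sketched in a few lines of numpy; this is our illustration, not the authors' code, and the function names are ours. Given noise-free pairs (Ak, Bk) with Ak = X Bk X^T, the formula X = (M^T M)^(-1/2) M^T recovers X exactly; with noise it is the least squares fit:

```python
import numpy as np

def so3_log(R):
    """Canonical coordinates of R in SO(3), equation (9):
    log R = phi/(2 sin phi) (R - R^T), returned as the 3-vector alpha.
    Valid away from the singular cases Tr(R) = 3 and Tr(R) = -1."""
    phi = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    W = phi / (2.0 * np.sin(phi)) * (R - R.T)        # skew-symmetric [alpha]
    return np.array([W[2, 1], W[0, 2], W[1, 0]])     # vee operator

def solve_rotation(As, Bs):
    """Closed-form least-squares X for A_k ~ X B_k X^T (Park and Martin [11]).
    Builds M = sum_k beta_k alpha_k^T and returns (M^T M)^(-1/2) M^T."""
    M = sum(np.outer(so3_log(B), so3_log(A)) for A, B in zip(As, Bs))
    MtM = M.T @ M
    w, V = np.linalg.eigh(MtM)                       # MtM is symmetric PD
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T          # (M^T M)^(-1/2)
    return inv_sqrt @ M.T
```

No iterative optimization appears anywhere, which is the point made in the text: the nonlinear problem has been reduced to a linear least squares fit with an explicit solution.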
Although there is flexibility in its exact placement,
note that the base frame of reference is on the robot.
Hence employing this kinematic model, say using tri-
angulation to produce an estimate of scene depth,
will produce a measurement relative to the current
pose of the robot. This proves very useful for naviga-
tion/localization. In contrast to others who report the
need for localization [9], we can use image information
to solve the localization problem.
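The kinematic chain of section 2 (equations (1)-(3)) can be sketched as follows, assuming numpy; the function names are ours, and in practice Tcg and Tpb would come from the RASC calibration rather than being supplied by hand:

```python
import numpy as np

def T_gp(phi, tau):
    """Gaze transformation of equation (3): pan angle phi, tilt angle tau.
    At the home position (0, 0) this is the identity, so the gaze and
    ptu frames coincide, as stated in section 2."""
    c, s = np.cos, np.sin
    return np.array([
        [c(tau), -s(tau) * c(phi), -s(tau) * s(phi), 0.0],
        [s(tau),  c(tau) * c(phi),  c(tau) * s(phi), 0.0],
        [0.0,             -s(phi),           c(phi), 0.0],
        [0.0,                 0.0,              0.0, 1.0],
    ])

def base_to_camera(q_b, T_cg, phi, tau, T_pb):
    """Equations (1) and (2): map a homogeneous point from the base frame
    to the camera frame, c_q = T_cg T_gp(phi, tau) T_pb b_q."""
    return T_cg @ T_gp(phi, tau) @ T_pb @ q_b
```

Composing this chain for the left and right eyes and intersecting the two viewing rays is the triangulation step referred to above; the resulting depth estimate is expressed in the robot-centric base frame.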
4 Evaluation

The procedure outlined in section 3.2 was implemented along with the method outlined by Li [9]. These methods were evaluated using simulated data to determine the effect of various levels of noise. The simulated data was created using a set of known transformations, constructed to model the active vision system described in section 2. Both methods require as input the pan/tilt angles and Tcw. Because the pan/tilt angles are known to within a few seconds of arc, error was found to arise predominantly from the calibration procedure (image noise and errors in corner detection), and hence in our simulation noise was added to the Tcw transformation. The following formulas were used to add noise to the true Tcw:

    Rcw^error = Rcw^true |  1    -ω3    ω2 |
                         |  ω3    1    -ω1 |                 (13)
                         | -ω2    ω1    1  |

    tcw^error = tcw^true + [t1 t2 t3]^T                      (14)

where the noise in the rotation is described by the parameters ω1, ω2, and ω3, and t1, t2, and t3 characterize the error in the translation component. The noise parameters were Gaussian with mean zero. Equations (13) and (14) were used to create data sets with eight different levels of noise. The data sets were defined by the radius of the Gaussian noise in the angle and translation parameters. In order to get a good statistical representation of the performance of both methods, 30 samples were created for each of the noise levels: (0.005 rad, 1 mm), (0.010 rad, 2 mm), (0.015 rad, 3 mm), (0.020 rad, 4 mm), (0.025 rad, 5 mm), (0.030 rad, 6 mm), (0.035 rad, 7 mm), and (0.040 rad, 8 mm).

These eight data sets were used with each method and the resulting error in the transformation is plotted in figures 3 and 4. The error in the rotation matrix, or rotation error, is defined as ||Rcalc - Rtrue||_F, where ||·||_F is the Frobenius norm of the matrix. The translation error is simply the L2 norm of the difference between tcalc and ttrue. The standard deviation was defined in a similar fashion, with the standard deviation of the rotation matrix defined as the Frobenius norm of the deviations of each term, and the standard deviation in the translation defined as the L2 norm of the deviations in each of the terms of the translation.

Referring to the results (figures 3 and 4) of each method, some interesting observations can be made. On average the RASC method produced results with lower error and standard deviation in both the rotation and translation components, but the difference in the rotation error was not significant. The translation component of the Tgc found with the RASC method contains significantly less error than that obtained with Li's method.

Another major advantage of the method described in this paper is its robustness to noise in the data obtained from any one of the camera positions. The method outlined by Li [9] is dependent on the PTU home position, thus if data from this particular position contains a large amount of error the results will be significantly affected. Using only a moderate amount of noise, (0.030 rad, 6 mm), in the image at this home position, the method described by Li resulted in an order of magnitude more error than those

Figure 3: Error in the translation vector found with varying amounts of input data error. Noise in the input increases with set number (see the text for the noise levels).
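The noise model of equations (13) and (14) and the error metrics defined above can be sketched as a short numpy routine; the function names are ours, not the paper's:

```python
import numpy as np

def perturb(R_true, t_true, sigma_rot, sigma_t, rng):
    """Equations (13) and (14): add zero-mean Gaussian noise to a (Rcw, tcw)
    pair. sigma_rot is in radians, sigma_t in the units of t_true. Note the
    small-angle matrix is only a first-order approximation of a rotation,
    as in the paper's noise model."""
    w1, w2, w3 = rng.normal(0.0, sigma_rot, 3)
    dR = np.array([[1.0, -w3,  w2],
                   [ w3, 1.0, -w1],
                   [-w2,  w1, 1.0]])
    return R_true @ dR, t_true + rng.normal(0.0, sigma_t, 3)

def rotation_error(R_calc, R_true):
    """Frobenius-norm rotation error used in section 4."""
    return np.linalg.norm(R_calc - R_true, ord='fro')

def translation_error(t_calc, t_true):
    """L2 translation error used in section 4."""
    return np.linalg.norm(t_calc - t_true)
```

Drawing 30 perturbed samples per noise level and averaging these two metrics reproduces the kind of error-versus-noise-level curves plotted in figures 3 and 4.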
Table 1: Average error and standard deviation of the reconstructed coordinates (mm).

             x       y       z
    error   1.02    0.57    18.3
    std     8.8    12.8     53.7
The model performed well on the data gathered, especially in the x and y directions (see table 1). The average error in the x and y directions was on the order of 1 mm, but with a standard deviation of about 1 cm. As expected, the results contained significantly higher average error and standard deviation in the z direction. Depth estimates from the closer two planes produced about 6 mm of error on average, with half the standard deviation of the data at the larger (1.8615 m) depth. The increase of error as depth increases was expected due to the nature of triangulation. The accuracy of the measurement, combined with the simplicity of the data acquisition and depth computation, will enable solutions (based on multiple measurements) to be developed to compensate for the level of precision obtained. This is consistent with many active vision approaches.

6 Discussion

A method for kinematic calibration of a dynamic stereo head has been presented. Unlike previous approaches, the solution is simple and explicit (i.e. truly in closed form, without requiring analysis of "cases" or any optimization or non-linear analysis). This simple approach is made possible through the use of canonical coordinates for the Lie groups describing the motion of the device. The method uses information "equally" (no bias from errors in any "special" zero or home position), and the derivation automatically provides a qualification of the "quality" of the fit, so highly uncertain results can be rejected automatically.

We have presented simulation results to verify RASC's benefits with respect to a previous method. The simulation also demonstrated the robustness of the RASC method to noise. We verified the accuracy of the model by calibrating an active stereo vision system, then using the model to map the pan/tilt angles and image coordinates to 3D coordinates in the robot-centric frame. The model produced good estimates of the measured coordinates. The demonstrated application of the calibrated kinematic stereo head model for depth computation shows that this method is simple and accurate, and thus can be readily incorporated into active vision solutions.

References

[1] Roger Brockett. Some mathematical aspects of robotics. In Robotics, volume 41 of Proceedings of Symposia in Applied Mathematics, pages 1-19. American Mathematical Society, 1990.

[2] R.W. Brockett. Least squares matching problems. Linear Algebra and its Applications, 124:761-777, 1989.

[3] M.J. Brooks, L. de Agapito, D.Q. Hyunh, and L. Baumela. Towards robust metric reconstruction via a dynamic uncalibrated stereo head. Journal of Image and Vision Computing, 16:989-1002, 1998.

[4] Directed Perception. Directed Perception PTU model 46-17.5. URL://www.dperception.com.

[5] Olivier Faugeras. Three-Dimensional Computer Vision - A Geometric Viewpoint. MIT Press, 1993.

[6] N.J. Ferrier and J. Neubert. A closed form solution for active stereo kinematic calibration, 2002. (In preparation; available at http://robios.me.wisc.edu/papers.)

[7] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins Press, 1983.

[8] J. Knight and I. Reid. Active visual alignment of a mobile stereo camera platform. In IEEE International Conference on Robotics and Automation, pages 3203-3208, San Francisco, CA, April 2000.

[9] Mengxiang Li. Kinematic calibration of an active head-eye system. IEEE Transactions on Robotics and Automation, 14(1):153-157, February 1998.

[10] J. Neubert, A. Hammond, N. Guse, Y. Do, Y. Hu, and N. Ferrier. Automatic training of a neural net for active stereo 3D reconstruction. In IEEE Int'l Conf. on Robotics and Automation, pages 2140-2146, 2001.

[11] Frank C. Park and Bryan J. Martin. Robot sensor calibration: Solving AX = XB on the Euclidean group. IEEE Transactions on Robotics and Automation, 10(5):717-721, October 1994.

[12] Yiu Chiung Shiu and Shaheen Ahmad. Calibration of wrist-mounted robotic sensors by solving homogeneous transform equations of the form AX = XB. IEEE Transactions on Robotics and Automation, 5(1):16-29, February 1989.

[13] Roger Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, RA-3(4):323-344, August 1987.

[14] Roger Y. Tsai and Reimar K. Lenz. A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Transactions on Robotics and Automation, 5(3):345-358, June 1989.