Model Globally, Match Locally: Efficient and Robust 3D Object Recognition
Figure 2. (a) Point pair feature F of two oriented points. The component F1 is set to the distance between the points, F2 and F3 to the angles between each normal and the vector defined by the two points, and F4 to the angle between the two normals. (b) The global model description. Left: point pairs on the model surface with similar feature vector F. Right: those point pairs are stored in the same slot of the hash table.
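Written out in the notation of the caption, with d = m2 − m1, the feature is the four-vector

$F(m_1, m_2) = \big(\lVert d \rVert_2,\ \angle(n_1, d),\ \angle(n_2, d),\ \angle(n_1, n_2)\big).$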
and explained in Fig. 3. Note that local coordinates have three degrees of freedom (one for the rotation angle α and two for a point on the model surface), while a general rigid movement in 3D has six.
Voting Scheme Given a fixed reference point sr, we want to find the “optimal” local coordinates such that the number of points in the scene that lie on the model is maximized. This is done using a voting scheme, which is similar to the Generalized Hough Transform and very efficient since the local coordinates only have three degrees of freedom. Once the optimal local coordinates are found, the global pose of the object can be recovered.
For the voting scheme, a two-dimensional accumulator array is created. The number of rows, Nm, is equal to the number of model sample points |M|. The number of columns, Nangle, corresponds to the number of sample steps nangle of the rotation angle α. This accumulator array represents the discrete space of local coordinates for a fixed reference point.
For the actual voting, reference point sr is paired with every other point si ∈ S from the scene, and the model surface is searched for point pairs (mr, mi) that have a similar distance and normal orientation as (sr, si). This search answers the question of where on the model the pair of scene points (sr, si) could be, and is performed using the precomputed model description: the feature Fs(sr, si) is calculated and used as key to the hash table of the global model description, which returns the set of similar features on the model.
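For each matched model pair (mr, mi), the rotation angle α follows from the relation that also appears in the caption of Fig. 4,

$s_i = T_{s\to g}^{-1}\, R_x(\alpha)\, T_{m\to g}\, m_i,$

where, in the paper's notation, Ts→g and Tm→g take the respective reference point and its normal into an intermediate coordinate system, and Rx(α) is a rotation about the x-axis.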
To keep this computation fast, α is split into a difference α = αm − αs of two angles that depend only on the model point pair and on the scene point pair, respectively; each pair is rotated so that its transformed point t lands on a canonical half-plane, i.e. t lies on the half-plane defined by the x-axis and the non-negative part of the y-axis. For each point pair in the model or in the scene, t is unique. αm can thus be precalculated for every model point pair in the off-line phase and is stored in the model descriptor. αs needs to be calculated only once for every scene point pair (sr, si), and the final angle α is a simple difference of the two values.
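To make the machinery above concrete, here is a minimal single-threaded Python sketch of the voting loop (the paper's own implementation is in Java). The helpers point_pair_feature, quantize_feature, align_to_x and half_plane_angle are our illustrative stand-ins for the feature, its discretization and the intermediate transformation, model_hash is assumed to map a quantized feature to precomputed (model point index, αm) entries, and the sign conventions for the angles are choices of this sketch, not details prescribed by the paper.

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """F(p1, p2) = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2))."""
    d = p2 - p1
    dist = np.linalg.norm(d)
    du = d / dist
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return np.array([dist, ang(n1, du), ang(n2, du), ang(n1, n2)])

def quantize_feature(F, d_dist, n_angle):
    """Hash key: distance in steps of d_dist, angles in steps of pi/n_angle."""
    steps = [d_dist, np.pi / n_angle, np.pi / n_angle, np.pi / n_angle]
    return tuple(int(f / s) for f, s in zip(F, steps))

def align_to_x(n):
    """Rotation matrix taking the unit normal n onto the x-axis (Rodrigues)."""
    v = np.cross(n, [1.0, 0.0, 0.0])
    s, c = np.linalg.norm(v), n[0]
    if s < 1e-12:                       # already (anti-)parallel to x
        return np.eye(3) if c > 0 else np.diag([-1.0, -1.0, 1.0])
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0.0]])
    return np.eye(3) + vx + vx @ vx * ((1 - c) / s**2)

def half_plane_angle(p_r, n_r, p_i):
    """Angle parameterizing the pair in the intermediate frame; defined so
    that alpha = alpha_m - alpha_s holds under this sketch's convention."""
    t = align_to_x(n_r) @ (p_i - p_r)
    return np.arctan2(-t[2], t[1])

def vote(s_r, n_r, scene, normals, model_hash, n_model, d_dist, n_angle=30):
    """Fill the |M| x n_angle accumulator for one reference point s_r."""
    acc = np.zeros((n_model, n_angle), dtype=np.int32)
    for s_i, n_i in zip(scene, normals):
        if np.allclose(s_i, s_r):
            continue                    # skip the reference point itself
        F = point_pair_feature(s_r, n_r, s_i, n_i)
        a_s = half_plane_angle(s_r, n_r, s_i)
        key = quantize_feature(F, d_dist, n_angle)
        for m_index, a_m in model_hash.get(key, ()):
            alpha = a_m - a_s           # the "simple difference"
            col = int((alpha % (2 * np.pi)) / (2 * np.pi) * n_angle) % n_angle
            acc[m_index, col] += 1
    return acc  # the peak gives the optimal local coordinates (m_r, alpha)
```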
3.4. Pose Clustering

The above voting scheme identifies the object pose if the reference point lies on the surface of the object. Therefore, multiple reference points are necessary to ensure that one of them lies on the searched object. As shown in the previous section, each reference point returns a set of possible object poses that correspond to peaks in its accumulator array. The retrieved poses will only approximate the ground truth due to the different sampling rates of the scene and the model and the sampling of the rotation in the local coordinates. We now introduce an additional step that both filters out incorrect poses and increases the accuracy of the final result.

To this end, the retrieved poses are clustered such that all poses in one cluster do not differ in translation and rotation by more than a predefined threshold. The score of a cluster is the sum of the scores of the contained poses, and the score of a pose is the number of votes it received in the voting scheme. After finding the cluster with the maximum score, the resulting pose is calculated by averaging the poses contained in the cluster. Since multiple instances of the object can be in the scene, several clusters can be returned by the method.
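A greedy variant of this clustering can be sketched as follows; the comparison against the cluster seed, the threshold handling, and the averaging of rotation matrices with an SVD re-orthonormalization (adequate only for tight clusters) are our choices for illustration, not details fixed by the paper.

```python
import numpy as np

def rot_angle(R1, R2):
    """Angle of the relative rotation between R1 and R2."""
    c = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))

def cluster_poses(poses, t_thresh, r_thresh):
    """poses: list of (R, t, votes). Greedily groups poses whose translation
    and rotation differ by less than the thresholds; returns clusters as
    [summed_score, members], best first."""
    clusters = []
    for R, t, votes in sorted(poses, key=lambda p: -p[2]):
        for cl in clusters:
            R0, t0, _ = cl[1][0]        # compare against the cluster seed
            if (np.linalg.norm(t - t0) < t_thresh
                    and rot_angle(R, R0) < r_thresh):
                cl[0] += votes          # cluster score = sum of pose scores
                cl[1].append((R, t, votes))
                break
        else:
            clusters.append([votes, [(R, t, votes)]])
    clusters.sort(key=lambda c: -c[0])
    return clusters

def average_pose(cluster):
    """Average the member poses; the mean rotation is re-orthonormalized
    with an SVD, which is fine for tightly clustered poses."""
    score, members = cluster
    R_mean = sum(R for R, _, _ in members) / len(members)
    U, _, Vt = np.linalg.svd(R_mean)
    R = U @ Vt
    if np.linalg.det(R) < 0:            # keep a proper rotation
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    t = sum(t for _, t, _ in members) / len(members)
    return R, t, score
```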
Figure 4. Visualisation of the different steps in the voting scheme: (1) Reference point sr is paired with every other point si, and their point pair feature F is calculated. (2) Feature F is matched to the global model description, which returns a set of point pairs on the model that have similar distance and orientation. (3) For each point pair on the model matched to the point pair in the scene, the local coordinate α is calculated by solving $s_i = T_{s\to g}^{-1}\, R_x(\alpha)\, T_{m\to g}\, m_i$. (4) After α is calculated, a vote is cast for the local coordinates (mi, α).
4. Results
We evaluated our method against a large number of synthetic and real datasets and tested the performance, efficiency and parameter dependence of the algorithm.
For all experiments the feature space was sampled by setting ddist relative to the model diameter, ddist = τd diam(M). By default, the sampling rate τd was set to 0.05; this makes the parameter independent of the model size. The normal orientation was sampled with nangle = 30, which allows the normal orientation to vary by up to 12° with respect to the correct orientation. Both the model and the scene cloud were subsampled such that all points have a minimum distance of ddist, and 1/5th of the points in the subsampled scene were used as reference points. After resampling the point cloud, the normals were recalculated by fitting a plane to the neighbourhood of each point. This step ensures that the normals correspond to the sampling level and avoids problems with fine details, such as wrinkles on the surface. The scene and the model were sampled in the same way, and the same parameters were used for all experiments except where noted otherwise.
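In code, this sampling setup reads roughly as follows; model_diameter (a bounding-box approximation, an upper bound on the true diameter) and the quadratic-time min_distance_subsample are simple illustrative stand-ins for whatever diameter computation and subsampling routine the authors used.

```python
import numpy as np

def model_diameter(points):
    """Diameter approximated by the bounding-box diagonal."""
    return np.linalg.norm(points.max(axis=0) - points.min(axis=0))

def min_distance_subsample(points, d_min):
    """Keep a subset of points that are pairwise at least d_min apart."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q) >= d_min for q in kept):
            kept.append(p)
    return np.array(kept)

# Default parameters used in the experiments.
tau_d, n_angle = 0.05, 30
model = np.random.rand(1000, 3)          # placeholder point cloud
d_dist = tau_d * model_diameter(model)   # d_dist = tau_d * diam(M)
model_sub = min_distance_subsample(model, d_dist)
# 1/5 of the subsampled *scene* points would then serve as reference points.
```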
Figure 5. Models used in the experiments. Top row: the five objects from the scenes of Mian et al. [11, 10]. Note that the rhino was excluded from the comparisons, as in the original paper. Bottom row, from left to right: the Clamp and the Cross Shaft from us, and the Stanford Bunny from [22].

We will show that our algorithm has a superior performance and allows an easy trade-off between speed and recognition rate. The algorithm was implemented in Java, and all benchmarks were run on a 2.00 GHz Intel Core Duo T7300 with 1 GB RAM. Even though it is straightforward to parallelize our method on various levels, we ran a single-threaded version for all timings. We assume that an implementation in C++ and parallelization would significantly speed up the matching times. All given timings measure the whole matching process, including the scene subsampling, normal calculation, voting and pose clustering. Note that the pose was not refined by way of, for example, ICP [26], which would further increase the detection rate and accuracy. For the construction of the global model description, all point pairs in the subsampled model cloud were used. The construction took several minutes for each model.
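The off-line construction of this global model description can be sketched as below, reusing the point_pair_feature, quantize_feature and half_plane_angle helpers from the voting sketch in Section 3.3; the dictionary of lists stands in for the paper's hash table.

```python
from collections import defaultdict

def build_model_description(model_pts, model_normals, d_dist, n_angle=30):
    """Hash table mapping a quantized feature to all model point pairs
    with that feature, stored as (index of m_r, alpha_m). All point pairs
    of the subsampled model cloud are used."""
    table = defaultdict(list)
    for i, (m_r, n_r) in enumerate(zip(model_pts, model_normals)):
        for j, (m_i, n_i) in enumerate(zip(model_pts, model_normals)):
            if i == j:
                continue
            F = point_pair_feature(m_r, n_r, m_i, n_i)
            key = quantize_feature(F, d_dist, n_angle)
            table[key].append((i, half_plane_angle(m_r, n_r, m_i)))
    return table  # O(|M|^2) pairs, hence the "several minutes" off-line
```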
4.1. Synthetic Data

We first evaluated the algorithm against a large set of synthetically generated three-dimensional scenes. We chose four of the models in Fig. 5, the T-Rex, the Chef, the Bunny and the Clamp, to generate the synthetic scenes. The chosen models exhibit different geometries.
Figure 6. Results for the artificial scenes with a single object. (a) A point cloud with additive Gaussian noise added to every point (σ = 5% of the model diameter). The pose of the Bunny was recovered in 0.96 seconds. (b) Detection rate against Gaussian noise over all 200 test scenes.
[…] 87.8% for spin images. The values relative to the 84% occlusion boundary were taken from [10] and are given for comparability. The main advantage of our method is the possible trade-off between speed and recognition rate […] The recognition rate still exceeds that of spin images.
Qualitative results To show the performance of our algorithm on real data, we took a number of scenes using a self-built laser scanning setup. Two of the scenes and detections are shown in Fig. 1 and 9. We deliberately did not do any post-processing of the acquired point clouds, such as outlier removal and smoothing. The objects were matched despite a lot of clutter, occlusion and noise in the scenes. The resulting poses seem accurate enough for object manipulation, such as pick and place applications.
Figure 8. (a) Example scene from the dataset of Mian et al. [10]. (b) Recognition results with our method. All objects in the scene were found. The results were not refined. (c) Recognition rate against occlusion of our method compared to the results described in [10] for the 50 scenes. The sample rate is τd = 0.025 with |S|/5 reference points for the green curve, and τd = 0.04 with |S|/10 reference points for the light blue curve. (d) Recognition rate against clutter for our method, with τ = 0.025 (85 sec/obj) and τ = 0.04 (1.97 sec/obj).

5. Conclusion

We introduced an efficient, stable and accurate method to find free-form 3D objects in point clouds. We built a global representation of the object that leads to independence from local surface information and showed that a locally reduced search space results in very fast matching. We tested our algorithm on a large set of synthetic and real data sets. The comparison with traditional approaches shows improvements in terms of recognition rate. We demonstrated that with slight or even no sacrifice of the recognition performance we can achieve a dramatic speed improvement.
formance we can achieve dramatic speed improvement. [2] R. J. Campbell and P. J. Flynn. A survey of Free-Form object
[13] I. K. Park, M. Germann, M. D. Breitenstein, and H. Pfister.
Fast and automatic object pose estimation for range images
on the GPU. Machine Vision and Applications, pages 1–18,
2009.
[14] T. Rabbani and F. V. D. Heuvel. Efficient hough transform
for automatic detection of cylinders in point clouds. In Pro-
ceedings of the 11th Annual Conference of the Advanced
School for Computing and Imaging (ASCI05), volume 3,
pages 60–65, 2005.
[15] N. S. Raja and A. K. Jain. Recognizing geons from su-
perquadrics fitted to range data. Image and Vision Comput-
ing, 10(3):179–190, 1992.
[16] S. Ruiz-Correa, L. G. Shapiro, and M. Meila. A new
paradigm for recognizing 3-d object shapes from range data.
In Proc. Int. Conf. Computer Vision, pages 1126–1133. Cite-
seer, 2003.
Figure 9. A cluttered, noisy scene taken with a laser scanner. The
[17] R. B. Rusu, N. Blodow, and M. Beetz. Fast point feature
detected clamps are shown as colored wireframe.
histograms (FPFH) for 3D registration. In Proceedings of the
IEEE International Conference on Robotics and Automation
(ICRA), Kobe, Japan, 2009.
representation and recognition techniques. Computer Vision
[18] R. Schnabel, R. Wahl, and R. Klein. Efficient RANSAC for
and Image Understanding, 81(2):166–210, 2001.
point-cloud shape detection. In Computer Graphics Forum,
[3] H. Chen and B. Bhanu. 3D free-form object recognition in
volume 26, pages 214–226. Citeseer, 2007.
range images using local surface patches. Pattern Recogni-
[19] Y. Shan, B. Matei, H. Sawhney, R. Kumar, D. Huber, and
tion Letters, 28(10):1252–1262, 2007.
M. Hebert. Linear model hashing and batch ransac for rapid
[4] C. S. Chua and R. Jarvis. Point signatures: A new repre- and accurate object recognition. In IEEE Computer Soci-
sentation for 3d object recognition. International Journal of ety Conference on Computer Vision and Pattern Recognition,
Computer Vision, 25(1):63–85, 1997. volume 2. IEEE Computer Society; 1999, 2004.
[5] C. Dorai and A. K. Jain. COSMOS - a representation [20] F. Stein and G. Medioni. Structural indexing: efficient 3-
scheme for 3D Free-Form objects. IEEE Transactions on D object recognition. Pattern Analysis and Machine Intelli-
Pattern Analysis and Machine Intelligence, 19(10):1115– gence, IEEE Transactions on, 14(2):125–145, 1992.
1130, 1997. [21] Y. Sun, J. Paik, A. Koschan, D. L. Page, and M. A. Abidi.
[6] N. Gelfand, N. J. Mitra, L. J. Guibas, and H. Pottmann. Ro- Point fingerprint: a new 3-D object representation scheme.
bust global registration. In Proceedings of the third Euro- IEEE Transactions on Systems, Man, and Cybernetics, Part
graphics symposium on Geometry processing, page 197. Eu- B, 33(4):712–717, 2003.
rographics Association, 2005. [22] G. Turk and M. Levoy. Zippered polygon meshes from range
[7] A. E. Johnson and M. Hebert. Using spin images for efficient images. In Proceedings of the 21st annual conference on
object recognition in cluttered 3 d scenes. IEEE Transactions Computer graphics and interactive techniques, page 318.
on Pattern Analysis and Machine Intelligence, 21(5):433– ACM, 1994.
449, 1999. [23] E. Wahl, G. Hillenbrand, and G. Hirzinger. Surflet-pair-
[8] D. Katsoulas. Robust extraction of vertices in range images relation histograms: A statistical 3d-shape representation for
by constraining the hough transform. Lecture Notes in Com- rapid classification. In 3-D Digital Imaging and Modeling,
puter Science, pages 360–369, 2003. 2003. 3DIM 2003. Proceedings. Fourth International Con-
[9] G. Mamic and M. Bennamoun. Representation and recog- ference on, pages 474–481, 2003.
nition of 3D free-form objects. Digital Signal Processing, [24] J. Y. Wang and F. S. Cohen. Part II: 3-D object recog-
12(1):47–76, 2002. nition and shape estimation from image contours using b-
[10] A. S. Mian, M. Bennamoun, and R. Owens. Three- splines, shape invariant matching, and neural network. IEEE
dimensional model-based object recognition and segmenta- Transactions on Pattern Analysis and Machine Intelligence,
tion in cluttered scenes. IEEE transactions on pattern anal- 16(1):13–23, 1994.
ysis and machine intelligence, 28(10):1584–1601, 2006. [25] T. Zaharia and F. Prêteux. Hough transform-based 3D mesh
[11] A. S. Mian, M. Bennamoun, and R. Owens. On the repeata- retrieval. In Proc. of the SPIE Conf. 4476 on Vision Geometry
bility and quality of keypoints for local feature-based 3D ob- X, pages 175–185, 2001.
ject retrieval from cluttered scenes. International Journal of [26] Z. Zhang. Iterative point matching for registration of free-
Computer Vision, pages 1–14, 2009. form curves. Int. J. Comp. Vis, 7(3):119–152, 1994.
[12] A. S. Mian, M. Bennamoun, and R. A. Owens. Automatic
correspondence for 3D modeling: An extensive review. In-
ternational Journal of Shape Modeling, 11(2):253, 2005.