Colourising Point Clouds Using Independent Cameras
Abstract—We investigate the problem of colourising a point cloud using an arbitrary mobile mapping device and an independent camera. The loose coupling of sensors creates a number of challenges related to timing, point visibility, and colour determination. We address each of these problems and demonstrate our resulting algorithm on data captured using sensor payloads mounted to hand-held, aerial, and ground systems, illustrating the "plug-and-play" portability of the method.

Index Terms—Mapping, sensor fusion, computer vision for robotic applications.

Manuscript received February 22, 2018; accepted June 12, 2018. Date of publication July 9, 2018; date of current version August 2, 2018. This letter was recommended for publication by Associate Editor J. L. Blanco-Claraco and Editor C. Stachniss upon evaluation of the reviewers' comments. (Corresponding author: Paulo Borges.) The authors are with the Robotics and Autonomous System Group, Data61, CSIRO, Canberra, ACT 2601, Australia (e-mail: pvechersky@gmail.com; mark.cox@csiro.au; vini@ieee.org; Thomas.Lowe@csiro.au). Digital Object Identifier 10.1109/LRA.2018.2854290

I. INTRODUCTION

In the last decade we have seen a dramatic increase in the development of mobile mapping systems, where a 3D geometric representation of the environment is generated with high accuracy.

In this space, lidar-based systems are very popular, finding application in robotics (vehicle navigation), gaming (virtual reality), and surveying, among other industries.

Key advantages of lidar sensing over its main competitor (camera) are the invariance to lighting conditions and high-precision range information, making lidars an excellent and proven alternative for 3D mapping.

On the other hand, a fundamental drawback of lidars compared to cameras is that they do not provide rich visual appearance information. Depending on the application, this type of information is of great benefit for human (and often machine) interpretation. An obvious solution is to fuse lidar data with camera data, hence combining range with colour information. There are a number of strategies to perform this fusion, and some are tightly dependent on the devices and sensor setup. Our goal in this letter is to create a practical solution that is generic, and that can be applied to existing camera-less platforms by simply adding a camera to any existing lidar 3D mapping device.

Consider, for example, the 3D mapping devices shown in Fig. 1, to which we have added a camera. In this figure, the mapping strategies range from having the mapping sensor hand-held to having it mounted on aerial and ground platforms. The ability to easily add colour to point clouds generated by such different platforms has a number of advantages. They include: (i) economic attractiveness, as existing camera-less devices can be fitted with a camera, (ii) there is no restriction on the camera type or modality provided that it is of sufficient quality to generate reliable optical flow, (iii) many modern mapping devices are designed to be mobile, permitting increased colour accuracy from multiple candidate colours per point, and (iv) portability and platform independence. These benefits serve as motivation for the method we propose in this letter.

In general, the fundamental process to achieve colourised point clouds is to project the 2D camera images over the 3D points, such that colours (or other information such as hyperspectral data) are assigned to each 3D point. In this case, there are key fundamental challenges associated with colourising point clouds, which are (i) clock synchronisation between the lidar and the imaging device, (ii) determining the visibility of points, and (iii) intelligently determining the colour assignments for each 3D point. To solve these challenges, our basic assumption is that the camera-less 3D mapping device outputs a point cloud (i.e., the 3D map) and the associated device's trajectory t, as is the case in modern simultaneous localisation and mapping (SLAM) algorithms [1]–[4]. In this context, we propose the following solutions:

A. Challenge 1: Clock Synchronisation

The camera, device and other modalities need to share a common clock before any processing can be performed. To achieve synchronisation, we assume that the computed trajectory t can be transformed to a camera trajectory using a fixed rigid transformation. In Section III-A we outline an algorithm for temporally synchronising captured camera video with the computed device trajectory using the yaw-rate of the device and an estimate of yaw-rate computed from the camera video. The yaw-rate of the device can be computed from the device trajectory itself or using the output from an inertial measurement unit, if available. We then cross-correlate the yaw-rates from the camera and mapping device in order to obtain the temporal offset and scaling which relate the two modalities.

B. Challenge 2: Visibility of Points

Determining which points are visible from a specific viewpoint is crucial considering that the camera and the lidar can be mounted far from each other. The example shown in Fig. 1(e)
Fig. 1. Examples of mobile mapping devices with various lidar configurations, ranging from handheld (a-c), to aerial (d) and ground (e) platforms. The original
lidar device is highlighted in red, while the added camera (which is not connected or synchronised to the lidar) is indicated by the green box. References for the
lidar mapping devices: (a) Zebedee [2], (b) CSIRO Revo, (c) ZebCam [5], (d) Hovermap [6], (e) Gator [7].
Fig. 3. Examples of (a) the cross-correlation output, (b) poor time synchronisation, (c) successful time synchronisation. The red and blue signals represent the
yaw-rates of the camera and lidar, respectively.
between the camera and the laser scanner can be rigid [19] or non-rigid [18], [20]. A dense surface reconstruction algorithm is used by Moussa et al. [18] to recover fine 3D details not captured by tripod-mounted lidar scanners. Point clouds can also be converted to a mesh structure, which allows texture information to fill in the space between 3D points in the point cloud [18], [19]. Our algorithm only produces a coloured point cloud where each point is assigned a single colour.

It should be noted that many systems [18]–[20] capture high-resolution still images as opposed to the video we capture in our work. These systems typically use measured scene information to position each image within the lidar scans. Our method assumes a rigid transformation between the lidar and camera which can be calculated a priori. We also estimate the temporal offset and scaling which relates timestamps in the device trajectory to timestamps in the captured video.

Identifying points in a point cloud which are visible from a given viewpoint is a very challenging problem, as a point cloud contains no surface information with which to identify occlusion. There exist a number of works which attempt to fit a surface to the point cloud assuming that the synthesised surface has certain properties [21]. For example, Xiong et al. [22] recover a triangulated mesh from the point cloud by optimising an ℓ2,q objective containing a bi-linear term representing the vertices of a mesh and the edges between vertices. The ℓ2,q term was chosen due to its robustness. An alternative approach is to compute point visibility without having to perform a surface reconstruction. The seminal work of Katz et al. [8], [9] defines a hidden point operator which permits the calculation of visible points via a convex hull calculation. The computational complexity of the convex hull calculation is O(N log N), making it attractive due to the large number of viewpoints in the device trajectory. This algorithm is used in our approach.

III. TEMPORAL REGISTRATION

It is important that the camera and mapping device are synchronised before attempting to colourise the point cloud. This is relatively straightforward when both devices share a common clock and triggering. However, this automation is not always practical or available, particularly if the camera has been added to an existing system, as is the case in our work. To achieve synchronisation in these circumstances, we assume that the device trajectory t is computed by the mapping device (which is customary in modern lidar-based SLAM algorithms). It is also assumed that t can be transformed to a camera trajectory using a fixed rigid transformation. The rest of this section details our synchronisation framework.

A. Camera and Lidar Synchronisation

To synchronise the camera and the lidar data, we correlate the yaw-rate obtained from the camera with the yaw-rate associated with the camera-less device. The latter can be extracted directly from the device trajectory t or from an inertial measurement unit (IMU), provided that the raw IMU data is filtered using a complementary filter and the bias is accounted for. Empirical testing has shown that significant yaw motion is present when mapping realistic areas, such that the signal-to-noise ratio (where yaw is the signal) is very high. Roll or pitch (or a linear combination of roll-pitch-yaw) could also be used in scenarios/environments where sufficient yaw motion is not present. For the camera, optical flow information is extracted to compute the yaw-rate. In our implementation, we used Kanade-Lucas-Tomasi (KLT) feature tracks [23], [24] for the sparse optical flow, although different algorithms can be employed.

The initial estimate of the image timestamps is computed by sampling the device yaw-rate using linear interpolation. We then perform an optimisation to solve for the time shift and rate adjustment that maximises the cross-correlation between the camera yaw-rate and the linearly interpolated device yaw-rate signals. Given that the yaw-rate of a device can be characterised as a high-frequency signal with zero mean and low correlation between consecutive samples (see the red and blue signals in Figs. 3(b) and (c) for an example), cross-correlating the yaw-rates of the camera and t yields a very distinct peak, as shown in Fig. 3(a). This distinctiveness brings high robustness to the temporal synchronisation.

Figs. 3(b) and (c) illustrate an unsuccessful and a successful example of synchronisation between the camera and device yaw-rate. The absolute magnitudes of the two yaw-rate signals do not always match, but the relative overlapping of the peaks
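To make the camera-side estimate concrete, the sketch below extracts KLT feature tracks with OpenCV and converts them into a per-frame yaw-rate. It is a minimal illustration under stated assumptions rather than our exact implementation: it assumes a roughly forward-looking pinhole camera with known horizontal focal length fx (in pixels), a constant frame rate, and that rotation about the vertical axis dominates the horizontal component of the flow; the function name camera_yaw_rate and the median-flow approximation are illustrative choices.

```python
import cv2
import numpy as np

def camera_yaw_rate(video_path, fx, fps=29.97):
    """Approximate camera yaw-rate (rad/s) per frame from sparse KLT tracks.

    Assumes a pinhole camera with horizontal focal length ``fx`` (pixels) and
    that horizontal image motion is dominated by rotation about the vertical
    axis (small-angle approximation).
    """
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rates = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Shi-Tomasi corners tracked with pyramidal Lucas-Kanade (KLT).
        p0 = cv2.goodFeaturesToTrack(prev, maxCorners=500,
                                     qualityLevel=0.01, minDistance=10)
        if p0 is None:
            rates.append(0.0)
            prev = gray
            continue
        p1, status, _err = cv2.calcOpticalFlowPyrLK(prev, gray, p0, None)
        good = status.ravel() == 1
        dx = p1[good, 0, 0] - p0[good, 0, 0]   # horizontal displacement (px)
        # Yaw per frame ~ median(dx) / fx; multiply by fps to get rad/s.
        rates.append(float(np.median(dx)) / fx * fps if good.any() else 0.0)
        prev = gray
    cap.release()
    return np.asarray(rates)
```

Smoothing the per-frame estimates and discarding frames with too few surviving tracks improves robustness; the sign convention depends on the camera's mounting orientation.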
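The time shift and rate adjustment can then be found by a direct search that maximises the normalised cross-correlation between the camera yaw-rate and the device yaw-rate resampled at the scaled camera timestamps, which is what produces the distinct peak of Fig. 3(a). The sketch below is one hedged way to set this up with NumPy; the mapping t_device ≈ rate · t_camera + offset, the candidate-rate grid, and the function name synchronise_yaw_rates are our illustrative choices, not a verbatim description of the optimisation used in the experiments.

```python
import numpy as np

def synchronise_yaw_rates(cam_yaw, cam_t, dev_yaw, dev_t,
                          rates=np.linspace(0.999, 1.001, 41)):
    """Estimate (offset, rate) such that t_device ~ rate * t_camera + offset.

    ``cam_yaw``/``cam_t`` are the camera yaw-rate samples and timestamps,
    ``dev_yaw``/``dev_t`` the device ones. The device signal is linearly
    interpolated onto the scaled camera time grid and the best shift is read
    from the peak of the normalised cross-correlation.
    """
    cam_t = np.asarray(cam_t, dtype=float)
    dt = float(np.median(np.diff(cam_t)))        # camera sample spacing (s)
    cam = np.asarray(cam_yaw, dtype=float)
    cam = cam - cam.mean()
    cam /= np.linalg.norm(cam) + 1e-12
    best = (-np.inf, 0.0, 1.0)                   # (score, offset, rate)
    for rate in rates:
        dev = np.interp(rate * cam_t, dev_t, dev_yaw)
        dev = dev - dev.mean()
        dev /= np.linalg.norm(dev) + 1e-12
        corr = np.correlate(dev, cam, mode='full')
        k = int(np.argmax(corr))
        # Lag of the correlation peak, converted to seconds. The sign
        # convention (which signal leads) should be verified on real data.
        offset = (k - (len(cam) - 1)) * dt
        if corr[k] > best[0]:
            best = (corr[k], offset, rate)
    _score, offset, rate = best
    return offset, rate
```

With offset and rate in hand, each image timestamp can be mapped into the device clock before the spatial alignment of Section III-B.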
B. Spatial Alignment

Now that the timestamps of the images are synchronised with the device trajectory, it is possible to obtain the camera trajectory from the device trajectory. First, the device trajectory is linearly interpolated at the image timestamps to give us the pose of the device at the moment that a given image was captured. The calibrated transform between the device reference frame and the camera (see Section VI-A1 for details) is then applied to these interpolated poses to yield the pose of the camera when each image was captured.

IV. SELECTING VISIBLE POINTS

As we have already noted, the camera can only observe a subset of the point cloud from a particular position. In this section we outline our approach to identifying the visible points. We start by describing the algorithm proposed by Katz et al. [8], [9].

The camera is assumed to be positioned at the origin of the coordinate system. Each 3D point qi in the point cloud Q = [q1, . . . , qN] is then mapped to a new 3D point pi = F(qi) using the generalised hidden point removal operator

F(q) = { f(‖q‖) q / ‖q‖,   q ≠ 0
       { 0,                 q = 0          (1)

where the function f : R++ → R++ is a monotonically decreasing kernel function.

When applying the hidden point removal operator to a point cloud we see that the kernel function f performs a spherical reflection, i.e., if ‖qi‖ < ‖qj‖ then ‖pi‖ > ‖pj‖. The key insight in the seminal work of [8], [9] is that the visible points in point cloud Q can be found by finding the points that map to points which lie on the convex hull of the transformed point cloud (including the camera as a point in the point cloud as well).

A. Choice of Kernel Function

The original letter [8] proposed a linear kernel function f_linear(d; γ) = γ − d. With the introduction of the generalised hidden point removal operator [9], the choice of kernel function expanded to include the exponential inversion kernel f_exponential(d; γ) = d^γ with γ < 0. A key contribution of our letter is to show that these kernels differ with respect to how point visibility changes as a function of an object's scale, i.e., how does the choice of kernel impact visibility when a point cloud Q is scaled by d > 0? This question is answered in the following two theorems.

Theorem 1: The point visibility algorithm by Katz et al. [8], [9] is scale invariant when using the exponential kernel f_exponential(d; γ).

Proof: Let Q = [q1, . . . , qN] be the point cloud in which to compute visibility. Let V(C) be the points that lie on the convex hull of the point cloud C. Let P(Q; γ) = [F(q1; γ), . . . , F(qN; γ)] be the transformation of the point cloud Q using the generalised hidden point removal operator with exponential kernel f_exponential(d; γ). The function V(C) is scale invariant, i.e., V(dC) = dV(C), by virtue of the convex hull operator. Since visibility is determined using V(P(dQ; γ)), we need to show that V(P(dQ; γ)) = g(d)V(P(Q; γ)), or more specifically P(dQ; γ) = g(d)P(Q; γ), or F(dq; γ) = g(d)F(q; γ). An expansion of F(dq; γ) reveals F(dq; γ) = d^γ F(q; γ).

Theorem 2: The output of the point visibility algorithm by Katz et al. [8], [9] varies according to the scale of the input point cloud when using the linear kernel f_linear(d; γ).

Proof: The proof of this theorem is quite involved and is postponed to Appendix B.

The fact that the computation of visibility using the linear kernel is affected by an object's scale is clearly an undesirable property. We now illustrate where this property manifests itself in practice. In Fig. 4 we see a point cloud of a concave structure dQ = d[q1, q2, q3], where the point dq1 belongs to a foreground object and the points dq2 and dq3 represent a background object. We assume that for d = 1, the points Q are classified as visible. The proof for Theorem 2 shows that the angles β1(d) and γ2(d) formed in the transformed point cloud P(dQ; γ) = [p1(d), p2(d), p3(d)] are monotonically increasing functions of d when ‖q1‖ < ‖q2‖ < ‖q3‖. Therefore, there exists a d such that β1(d) + γ2(d) > π, causing the point p3(d) to no longer lie on the convex hull and thus the point dq3 is classified as invisible.

A synthetic example illustrating this property is shown in Fig. 5. An extremely dense point cloud is created by first placing a red 3D box in front of a white planar wall. This structure is then duplicated (blue coloured box), scaled by d > 1 and then translated such that when the combined structures (Fig. 5(a)) are imaged with perfect visibility analysis, a symmetric image is produced. Fig. 5(b) shows the resulting projections of the combined structure using the linear (top) and exponential inversion (bottom) kernels for determining visibility. The black pixels in the projections correspond to empty space. As predicted, the increase in scale has adversely impacted the visibility analysis for the linear kernel.
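For readers who want the final step of the proof of Theorem 1 spelled out, the expansion follows directly from (1) and the exponential kernel (our rendering of the algebra, using the definitions above):

```latex
F(dq;\gamma)
  = \frac{f_{\mathrm{exponential}}(\lVert dq\rVert;\gamma)}{\lVert dq\rVert}\, dq
  = \frac{(d\lVert q\rVert)^{\gamma}}{d\lVert q\rVert}\, dq
  = d^{\gamma}\,\frac{\lVert q\rVert^{\gamma}}{\lVert q\rVert}\, q
  = d^{\gamma} F(q;\gamma)
```

so g(d) = d^γ: the transformed cloud is only rescaled, its convex hull membership is unchanged, and the visibility labels are therefore identical at every scale d > 0.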
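To make the visibility computation of (1) concrete, below is a minimal NumPy/SciPy sketch of hidden point removal in the spirit of Katz et al. [8], [9], with both kernels selectable, followed by a toy check of the scale behaviour discussed above. The function name, parameter values, and the synthetic box-and-wall scene are our own illustrative choices, not the implementation used for Fig. 5.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hpr_visible(points, camera, kernel='exponential', gamma=-0.001):
    """Hidden point removal: boolean visibility mask for an (N, 3) point cloud.

    Points are expressed relative to ``camera``, transformed with the chosen
    kernel as in (1), and a point is visible iff its transform lies on the
    convex hull of the transformed cloud augmented with the origin (camera).
    """
    q = np.asarray(points, dtype=float) - np.asarray(camera, dtype=float)
    norms = np.linalg.norm(q, axis=1)
    nz = norms > 0
    if kernel == 'exponential':           # f(d) = d**gamma with gamma < 0
        f = norms[nz] ** gamma
    else:                                 # linear kernel f(d) = gamma - d
        f = gamma - norms[nz]
    p = np.zeros_like(q)
    p[nz] = q[nz] * (f / norms[nz])[:, None]   # F(q) = f(||q||) q / ||q||
    hull = ConvexHull(np.vstack([p, np.zeros((1, 3))]))
    visible = np.zeros(len(q), dtype=bool)
    visible[hull.vertices[hull.vertices < len(q)]] = True   # drop the origin vertex
    return visible

# Toy check of Theorems 1 and 2: scale the scene while keeping each kernel's
# gamma fixed. The exponential kernel should give identical labels at every
# scale; the linear kernel generally does not.
rng = np.random.default_rng(0)
wall = np.column_stack([rng.uniform(-2, 2, 2000),
                        rng.uniform(-2, 2, 2000),
                        np.full(2000, 5.0)])                      # background wall
box = rng.uniform([-0.5, -0.5, 2.0], [0.5, 0.5, 3.0], (2000, 3))  # foreground box
scene = np.vstack([wall, box])
for d in (1.0, 20.0):
    v_exp = hpr_visible(d * scene, np.zeros(3), 'exponential', gamma=-0.001)
    v_lin = hpr_visible(d * scene, np.zeros(3), 'linear', gamma=200.0)
    print(f"scale {d}: exponential {v_exp.sum()} visible, linear {v_lin.sum()} visible")
```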
TABLE I
SYSTEM PARAMETERS
The ‘Values’ column shows typical values, depending on the point density and speed required.
VI. EXPERIMENTS

In all experiments we used existing camera-less lidar scanning devices and added our cameras to them. This section provides implementation details and results.

A. Practical Considerations

The system runs on a Mid 2014 MacBook Pro with an Intel Core i7 CPU @ 2.20 GHz with four cores and 16 GB of memory. We used two types of consumer cameras in our tests: a GoPro 4 Session and a Ricoh Theta S 360° camera. Both cameras record video at 29.97 frames per second. Three types of platforms were used for testing: a hand-held device (Fig. 1(b)) built in-house by our team, an unmanned aerial vehicle DJI Matrice 600 (Fig. 1(d)), and an electric all-terrain John Deere TE Gator autonomous ground vehicle (Fig. 1(e)). One of the goals of using multiple platforms is to illustrate the easy applicability and portability of the system, as shown in the very different setups in the pictures. To generate t and the 3D point cloud to be colourised, we use the SLAM implementation described in [1] and [2].

Fig. 6. Resultant colourised point clouds for (a), (b) a hand-held system, (c), (d) a ground vehicle, and (e), (f) an aerial vehicle. The left column shows a view of the coloured point clouds and the right column shows captured camera frames.

1) Extrinsic Calibration: The objective of the calibration step is to determine the extrinsic transformation between the lidar's base reference frame and the camera. To this end, we have implemented a visual tool that allows the user to manually adjust the view of the point cloud over the camera image. To calibrate the extrinsics, the user tunes every component of the transformation (translation, rotation and scale) until the required degree of accuracy is obtained. The quality of the calibration is evaluated by performing a perspective projection of 3D points visible to the camera onto the image plane and observing the quality of the alignment between features that are distinctive enough in both modalities.
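As an illustration of this projection-based check, the sketch below overlays the visible 3D points onto an image for a candidate extrinsic transform using a standard pinhole model via OpenCV. The function name overlay_points, the 4x4 transform T_cam_from_device, the intrinsic matrix K, and the distortion vector are assumed inputs for the sketch; the interactive tuning tool itself is not reproduced here.

```python
import cv2
import numpy as np

def overlay_points(image, points_device, T_cam_from_device, K, dist=None):
    """Project device-frame 3D points into the image for a candidate extrinsic.

    ``T_cam_from_device`` is a 4x4 rigid transform taking device-frame points
    into the camera frame; ``K`` is the 3x3 intrinsic matrix. Points behind
    the camera are discarded and the rest are drawn as green dots so the user
    can judge the alignment against distinctive image features.
    """
    R = T_cam_from_device[:3, :3]
    t = T_cam_from_device[:3, 3]
    pts_cam = points_device @ R.T + t
    in_front = pts_cam[:, 2] > 0.1                 # keep points ahead of the camera
    rvec, _ = cv2.Rodrigues(R)
    dist = np.zeros(5) if dist is None else dist
    px, _ = cv2.projectPoints(points_device[in_front].astype(np.float64),
                              rvec, t.astype(np.float64), K, dist)
    out = image.copy()
    h, w = out.shape[:2]
    for u, v in px.reshape(-1, 2):
        if 0 <= u < w and 0 <= v < h:
            cv2.circle(out, (int(u), int(v)), 1, (0, 255, 0), -1)
    return out
```

Sweeping one component of the transform at a time while watching the overlay mirrors the manual tuning workflow described above; once the projected points hug the corresponding image features, the extrinsics are accepted.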
2) Key Parameters: As discussed throughout the letter,
several parameters affect the quality and processing time of the
system. In our experiments, we present results with different
parameter configurations, summarised in Table I.
TABLE II
CAMERA YAW-RATE AND COLOUR ESTIMATION PROCESSING TIMES FOR THE DATASETS SHOWN IN FIG. 6
For all cases, the ‘Kernel Type’ was the exponential inversion kernel with γ = −0.001.