15arspc Submission 97
to Change Detection
Tariq S. Abuhashim, Mitch Bryson, Salah Sukkarieh
Australian Centre for Field Robotics
The University of Sydney
NSW 2006, Australia
t.abuhashim@acfr.usyd.edu.au
1 Abstract
Remotely sensed multi-temporal images collected via satellites, manned aircraft and
Unmanned Aerial Vehicles (UAVs) are important in understanding ecological phenomena
and provide an important record for better understanding of ecosystems over
space and time. Since ecology deals with ever-changing ecosystems, both time and
space must be taken into account when modelling the terrain surface and land cover.
Moreover, sparsity of observations, data association, and change in scale all make this
task more difficult. In this paper, we tackle the terrain modelling problem by building
a continuous surface representation that allows formulating a sampling criterion using
Locally Weighted Partial Least Squares (LWPLS). Motivated by a number of applications
including combining and comparing multi-temporal and multi-resolution datasets,
dynamic update, and change detection, the model is able to adapt to changes and
provides an adequate representation of prediction uncertainty. We demonstrate its
simplicity of implementation and its applicability to terrain modelling on images collected
using a monocular camera mounted on a UAV.
2 Introduction
Map reconstruction plays a major role in many applications of mobile robots, and
recently three-dimensional maps have gained substantial interest in different fields of
the robotics community, including terrain [5] and sea floor [12] visualisation, mapping and
classification of vegetation [6], and path planning, control and navigation [13]. Raw point
clouds of features extracted from images have been used as the geometric primitive of
choice for several modelling tasks such as rendering [1], editing [30], and compression
[20].
Autonomous navigation in vegetated terrain remains a challenging
problem in robotics due to difficulties in modelling the high variability of outdoor
environments. In this effort, Unmanned Aerial Vehicles (UAVs) have several advantages
over piloted aircraft: they are less costly, safer, can be deployed easily and repeatedly,
and are easy to fly at low altitude. However, their payload is limited, so they cannot
carry heavy laser scanners. Vision sensors such as stereo rigs and monocular cameras
provide a more feasible solution due to their light weight and wider field of
view. However, in applications where UAVs are to fly at higher altitudes, a wider baseline
is required to estimate depth accurately, and therefore only monocular cameras can be
used to acquire images of the terrain and then to reconstruct its surface.
In this paper, we present a scheme for approximating the terrain surface from irregularly
sampled points that allows formulating a sampling criterion. Motivated by a number of
applications including combining and comparing multi-temporal and multi-resolution
datasets, dynamic update, and change detection, we build a continuous representation
of the terrain surface using Locally Weighted Projection Regression (LWPR). The model
is able to adapt to changes and provides an adequate representation of prediction
uncertainty. Unlike batch machine learning techniques, the system accomplishes online
learning without the need to store the training data or to perform large-scale matrix
inversion. It also assumes that the input and output distributions of the data are
unknown and that these distributions may change over time. The system utilises a
nonparametric regression approach to build a receptive field-based learning system for
incremental surface approximation. A local model is fitted incrementally within each
receptive field such that local surface approximation is accomplished in the spirit of a
Taylor series expansion. Incremental learning is accomplished by incrementally
minimising a weighted local cross-validation error. For the application of terrain
reconstruction, our approach offers a number of advantages over other machine
learning algorithms:
1. Most terrain representations change over time, and the terrain model requires
adaptation. This application suits LWPR very well because of the embedded
time scale representing the rate at which changes occur.
2. For applications requiring terrain models to be constructed incrementally and
online, our use of LWPR offers online learning of the terrain model.
3. Applications that require navigating or modelling large terrain involve
collecting a large amount of training data, and for some regression methods such as
SVM or GP, computational complexity scales with the size of the training set.
4. Terrain surface properties, such as smoothness and noise level, vary over
the area of interest and hence have to be treated locally. Therefore, multiple local
models offer better non-linear surface approximation over the local scale.
The rest of the paper is organised as follows. In section 4 we briefly review
non-parametric learning using LWPR. Section 5 reviews the surface reconstruction
algorithm. The experimental setup is presented in section 6, with results of the
algorithms demonstrated on images collected via a monocular camera mounted on an
Unmanned Aerial Vehicle (UAV).
3 Related Work
The computer vision literature is rich with approaches to 3D surface reconstruction.
With the advent of computers with more memory and computational power, continuous
approximations, such as B-splines [10] and Bezier curves [9], gave way to discrete
approximations, such as polygon meshes [14] and voxel grids [8]. In [12], a complete
reconstruction and visualisation pipeline is proposed that generates 3D meshes and
voxels with different levels of detail using sparse features extracted from the seafloor.
In [5], a framework was presented for integrating IMU, GPS and monocular camera
information using a batch smoothing approach, which allows for the construction of
dense terrain maps. The method integrates all the sensor information using a
statistically optimal non-linear least squares smoothing algorithm to estimate vehicle
poses simultaneously with a dense point feature map of the terrain, a method that has
been known for many years in the computer vision community as Bundle Adjustment
(BA) [27]. However, in both approaches the reconstructed surfaces are irregular and
sparse, and do not allow for dynamic updates or a sampling criterion.
scales cubically with the number of training samples, requires a large memory to store
large training data, and the assumption that surface properties such as smoothness
and noise can be treated globally.
Inspired by local learning, a number of approaches were introduced for real-time model
learning for robotics using the receptive field concept, such as Locally Weighted
Projection Regression (LWPR) [29] and Local Gaussian Process Regression (LGPR)
[19]. In local regression, the state space (in the case of terrain surface learning, the
state space is defined by the 3D positions of point features in a reference coordinate
frame) is partitioned into local regions within which the local model is approximated.
The allocation of these partitions is essential, and therefore an appropriate online
clustering of the state space becomes a central problem in local approaches. In [19], an
LGPR model was proposed to combine the high accuracy of GPR with the speed of
LWPR. The applicability of the algorithm was demonstrated for real-time online
learning of the inverse dynamics for model-based robot control. The training data is
partitioned into local regions, and an individual GP model is trained for each. The
prediction for a query point is performed by weighted estimation using nearby local
models. A distance-based measure is used for partitioning the data and weighting the
predictions. However, storing the local training data for inference is still required, and
therefore the memory required scales with the size and dimensionality of the training data.
Another drawback of LGPR is that the covariance function needs to be updated every
time a new data point arrives, and that the deletion and insertion of data points in
order to escape the curse of dimensionality of standard GPs is not trivial. The LWPR
algorithm removes this requirement, as the true function is approximated within each
local region by a linear function and only a minimum number of parameters are stored.
The global true function is then approximated with local linear functions covering the
whole state space, and learning becomes computationally feasible due to the low
computational demands of the Local Projection Regression (LPR), which can be
performed online.
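As a rough illustration of the local linear approximation described above, the following Python sketch blends the predictions of two local linear models using Gaussian receptive-field weights, following the usual LWPR formulation in [29]. The names (`LocalModel`, `activation`, `predict`) and all numeric values are hypothetical and are not taken from the implementation used in this paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LocalModel:
    centre: Sequence[float]          # receptive-field centre c_k (east, north)
    D: Sequence[Sequence[float]]     # distance metric D_k (positive definite)
    beta0: float                     # local offset (height at the centre)
    beta: Sequence[float]            # local slope along each input dimension


def activation(model: LocalModel, x: Sequence[float]) -> float:
    """Gaussian receptive-field weight w_k = exp(-0.5 * d^T D_k d), d = x - c_k."""
    d = [xi - ci for xi, ci in zip(x, model.centre)]
    quad = sum(d[i] * model.D[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return math.exp(-0.5 * quad)


def predict(models: List[LocalModel], x: Sequence[float]) -> float:
    """Weighted average of the local linear predictions at query point x."""
    num = 0.0
    den = 0.0
    for m in models:
        w = activation(m, x)
        local = m.beta0 + sum(b * (xi - ci)
                              for b, xi, ci in zip(m.beta, x, m.centre))
        num += w * local
        den += w
    # With no activated model the sketch falls back to a zero-mean prediction.
    return num / den if den > 0.0 else 0.0


# Two hypothetical receptive fields describing a gently sloping patch.
models = [
    LocalModel(centre=(0.0, 0.0), D=((0.5, 0.0), (0.0, 0.5)), beta0=1.0, beta=(0.1, 0.0)),
    LocalModel(centre=(4.0, 0.0), D=((0.5, 0.0), (0.0, 0.5)), beta0=2.0, beta=(0.3, 0.0)),
]
height = predict(models, (2.0, 0.0))
```

Only the centre, the distance metric and the regression parameters of each model are kept, which is what makes the memory footprint independent of the number of training points.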
parameters β̂k can be computed incrementally and online using Partial Least Squares
(PLS) [29, 11], and the distance metric Dk, which is equivalent to the inverse of the
surface length scale determining the smoothness of the predicted terrain surface, can
be updated incrementally using leave-one-out cross validation [29], as discussed in
more detail in the following section. Pseudocode of the model learning is
outlined in algorithm 1, where K is the total number of clusters, η is a threshold that
determines whether a point belongs to a cluster and hence when to create a new
cluster, and Dint is the initial distance metric. The significance of the threshold η is
discussed in detail in the results in section 5.
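The role of η and Dint in the clustering step can be sketched as follows. This is a simplified, hypothetical Python illustration and not the paper's algorithm 1: it only shows when a new receptive field is created, assumes an isotropic scalar distance metric, and omits the PLS and distance-metric updates. The function and variable names are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def activation(x: Point, centre: Point, d: float) -> float:
    """Gaussian weight with an isotropic distance metric d (assumed scalar here)."""
    sq = (x[0] - centre[0]) ** 2 + (x[1] - centre[1]) ** 2
    return math.exp(-0.5 * d * sq)


def allocate(points: List[Point], eta: float, d_init: float) -> List[Point]:
    """Return the centres of the receptive fields created for a stream of points.

    A new receptive field is created, centred on the incoming point, whenever
    no existing field is activated by more than the threshold eta.
    """
    centres: List[Point] = []
    for x in points:
        best = max((activation(x, c, d_init) for c in centres), default=0.0)
        if best < eta:
            centres.append(x)   # start a new local model with metric d_init
        # Otherwise the point would update the fields it activates (omitted).
    return centres


# Hypothetical samples spaced 1 m apart along a line.
samples = [(float(i), 0.0) for i in range(10)]
few = allocate(samples, eta=0.001, d_init=1.0)
many = allocate(samples, eta=0.5, d_init=1.0)
```

In this toy setting a larger η creates more receptive fields over the same samples, which is consistent with the increase in overlap between local models reported for larger thresholds.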
where N denotes the number of data points in the training set, M denotes the number
of dimensions, and γ is a tradeoff parameter that can be determined empirically or from
an assessment of the maximal local curvature of the function to be approximated. The
first term of the cost function is the mean leave-one-out cross validation error of the
local model, and the second term is a penalty term which ensures that the receptive
field will not shrink indefinitely in the case of large amounts of training data. The matrix
P corresponds to the weighted covariance matrix of the input data. The inversion of
such a matrix can sometimes be expensive, especially with an increasing number of
points or dimensions.
To further improve the computational efficiency of the algorithm, the optimisation
process is carried out in the projection space, where the projected inputs are orthogonal.
Equation 5 can be exactly formulated in terms of the projected inputs
z_i = [z_{i,1}, ..., z_{i,R}]^T as
\[
J = \frac{1}{\sum_{i=1}^{N} w_i} \sum_{i=1}^{N} \frac{w_i (y_i - \hat{y}_i)^2}{(1 - w_i z_i^T P_z^{-1} z_i)^2} + \frac{\gamma}{M} \sum_{i,j=1}^{M} D_{ij}^2 \qquad (6)
\]
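The cost in equation 6 can be evaluated for a single receptive field as in the Python sketch below. It is an illustration only: it assumes that P_z is diagonal (which holds when the projected inputs are orthogonal, so the inverse is taken element by element) and that P_z is the unnormalised weighted sum of squared projections. The function name `local_cost`, its arguments and the sample values are hypothetical.

```python
from typing import Sequence


def local_cost(w: Sequence[float],
               y: Sequence[float],
               y_hat: Sequence[float],
               z: Sequence[Sequence[float]],
               p_diag: Sequence[float],
               D: Sequence[Sequence[float]],
               gamma: float) -> float:
    """Penalised leave-one-out cost of one local model (equation 6).

    w      : activation weight of each of the N training points
    y      : observed heights; y_hat: heights predicted by the local model
    z      : projected inputs, one length-R vector per point
    p_diag : diagonal of P_z (assumed diagonal, so its inverse is elementwise)
    D      : M x M distance metric of the receptive field
    gamma  : trade-off parameter of the penalty term
    """
    total_w = sum(w)
    loo = 0.0
    for wi, yi, yhi, zi in zip(w, y, y_hat, z):
        # w_i * z_i^T P_z^{-1} z_i for a diagonal P_z
        leverage = wi * sum(zr * zr / pr for zr, pr in zip(zi, p_diag))
        loo += wi * (yi - yhi) ** 2 / (1.0 - leverage) ** 2
    m = len(D)
    penalty = gamma / m * sum(dij * dij for row in D for dij in row)
    return loo / total_w + penalty


# Hypothetical data: two points, one projection direction (R = 1), M = 2.
w = [1.0, 0.5]
z = [[1.0], [2.0]]
p_diag = [sum(wi * zi[0] ** 2 for wi, zi in zip(w, z))]  # weighted sum of z^2
J = local_cost(w, [2.0, 3.0], [1.5, 3.5], z, p_diag,
               [[0.5, 0.0], [0.0, 0.5]], gamma=0.1)
```

Because the leverage term is a sum of scalar ratios, no matrix inversion is needed in the projection space.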
5 Results
5.1 Experimental Setup
In this section we describe the experimental setup of the platform used to collect the
terrain training data. Data was collected over a farmland location in Queensland, Australia
using the J3 Cub platform shown in figure 1a. The vehicle is capable of carrying a payload of
15 kg with an endurance of one hour. The vehicle flew at a fixed height of 100 m above
the ground. The payload, shown in figure 1b, included a low-cost IMU running at 100
Hz, a GPS receiver providing updates at 5 Hz, a colour camera acquiring frames at
3.75 Hz at a resolution of 1024 × 768 pixels, and a PC104 computer used to log the
sensor data.
5.2 Methodology
Inertial measurements were integrated with GPS updates using a navigation Extended
Kalman Filter (EKF) [2] to estimate the vehicle position and attitude. Images were
processed as pairs, as shown in figure 2, to form a stereo rig. The displacement
of the vehicle between the two positions at which each image pair was taken was
used to estimate the baseline between the two frames, and multiple-view geometry
principles were applied to triangulate features extracted from both images using Good
Features To Track from the OpenCV library [4]. These features were tracked between
multiple frames using a pyramidal implementation of Lucas-Kanade optical flow [3].
(a) (b)
Figure 1: The J3 Cub UAV used to collect data and the payload system onboard.
Tracked features were then used to estimate the fundamental matrix. In this work we
use the fundamental matrix estimation with RANSAC and known rotations proposed by
[16]. The estimated fundamental matrix is then used to remove outliers violating the
epipolar constraint that it imposes. Matched feature pairs were then
triangulated to estimate feature locations in world coordinates. Triangulated features
are then fed into the LWPR algorithm, which learns the smoothness of the surface by
clustering the data into local areas. The learning step includes learning the distance
metric and the local regression parameters within every cluster. The learned models are
then used to predict heights given a set of random query points.
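The triangulation step can be illustrated with a midpoint method, sketched below in Python. This is a generic textbook construction under stated assumptions, not the implementation used in this work: the two camera centres and viewing rays are assumed to be already expressed in world coordinates, and the geometry (100 m altitude, 8 m baseline, one ground point) is a hypothetical example loosely based on the setup described above.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Midpoint of the shortest segment joining two viewing rays.

    c1, c2 are the camera centres and d1, d2 the ray directions of the same
    feature seen from each position.
    """
    w0 = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is unobservable")
    s = (b * e - c * d) / denom      # position along the first ray
    t = (a * e - b * d) / denom      # position along the second ray
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple(0.5 * (u + v) for u, v in zip(p1, p2))


# Hypothetical geometry: two camera positions 8 m apart at 100 m altitude,
# both observing the ground point (3, 2, 0).
c1, c2 = (0.0, 0.0, 100.0), (8.0, 0.0, 100.0)
target = (3.0, 2.0, 0.0)
d1 = tuple(t - c for t, c in zip(target, c1))
d2 = tuple(t - c for t, c in zip(target, c2))
point = triangulate_midpoint(c1, d1, c2, d2)
```

With noisy rays the two closest points no longer coincide, and the length of the segment between them gives a simple indication of triangulation quality.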
6 Results
In this section we present results from both the terrain reconstruction algorithm and the
terrain learning algorithm applied to the data collected by the fixed-wing UAV. Figure 3
shows the point cloud of triangulated features from a pair of images. The extracted
features are noisy and sparse, especially in areas around trees, where most of the
extracted feature candidates were rejected because the apparent configuration of the
branches changes with the camera perspective.
Figure 2: Pair of images acquired using the down-looking camera on the UAV. The frequency
at which images were captured allowed the UAV to travel an average distance of 8 meters
between frames, creating the baseline required to estimate the depth of features.
Figure 3: Sparse features extracted and triangulated in 3D world coordinates (axes: East
(meters), North (meters)).
In order to build a continuous surface from the sparse observations shown in figure 3,
the data has to be clustered into a number of local regions where local linear
approximations apply. This is a critical step in the LWPR algorithm, since the local
approximations within every region depend heavily on the observations included, and a
poor result of the clustering stage will degrade the overall performance of the
algorithm. The main factor controlling the clustering process is the threshold η that
determines whether a point belongs to a cluster and hence when to create a new
cluster. Figure 4 shows the results of varying the cluster threshold while learning the
distance metric from the data. In the figure, as the value of the threshold changes from
small to larger values, the space changes from being under-clustered to over-clustered,
and hence the degree of overlap between individual models increases. Figure 5 shows
the effect of varying the degree of overlap. The smoothness of
the predicted surface is tightly related to the threshold value. As the degree of
overlap increases, the surface preserves more detail. In addition, as the degree
of overlap increases, more local approximations become involved in predicting the height
at a given query point, and hence we expect higher accuracy in predictions. This can
be easily seen in the variation of the predicted variance. One more observation is
that as the threshold value increases, the algorithm can extend predictions to regions
with no observations, while at low threshold values the predicted mean goes to zero
as a result of the zero mean assumption. Finally, it is worth mentioning that as the value
of the threshold approaches 1, the LWPR model approaches the accuracy of GPR.
For the purposes of texturing, rendering and 3D modelling, the continuous
surface representation can be resampled at regular steps so that regular meshes can be
generated efficiently. Figure 6 shows the irregular Delaunay triangulation and the
regular mesh generated from the constructed terrain model.
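The resampling step can be sketched in Python as follows. The sketch assumes that the learned terrain model is available as a function returning a height for an (east, north) query; here a tilted plane stands in for it. The function name `regular_mesh`, the grid extent and the 5 m step are hypothetical choices for illustration.

```python
from typing import Callable, List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def regular_mesh(predict: Callable[[float, float], float],
                 east: Tuple[float, float],
                 north: Tuple[float, float],
                 step: float) -> Tuple[List[Vertex], List[Triangle]]:
    """Sample predict(east, north) on a regular grid and triangulate each cell."""
    n_e = int(round((east[1] - east[0]) / step)) + 1
    n_n = int(round((north[1] - north[0]) / step)) + 1
    vertices = [(east[0] + i * step, north[0] + j * step,
                 predict(east[0] + i * step, north[0] + j * step))
                for j in range(n_n) for i in range(n_e)]
    triangles: List[Triangle] = []
    for j in range(n_n - 1):
        for i in range(n_e - 1):
            v = j * n_e + i                                   # lower-left corner
            triangles.append((v, v + 1, v + n_e))             # lower triangle
            triangles.append((v + 1, v + n_e + 1, v + n_e))   # upper triangle
    return vertices, triangles


# Hypothetical stand-in for the learned terrain model: a tilted plane.
def plane(e: float, n: float) -> float:
    return 0.1 * e + 0.05 * n


verts, tris = regular_mesh(plane, east=(-400.0, -375.0),
                           north=(60.0, 100.0), step=5.0)
```

Each grid cell is split into two triangles, so the mesh connectivity is known in advance and no Delaunay triangulation is needed for the resampled surface.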
Figure 4: State space is divided into local regions (receptive fields), where the shape and size
of the local areas is learned individually from the data using gradient descent, for the cases of
(a) η = 0.001, (b) η = 0.1, and (c) η = 0.5. Panel title: "Input space view of RFs"; axes:
North (meters), East (meters).
Figure 5: Predicted mean of terrain surface (left column) and predicted standard deviation
(right column) for different values of η in the same order as in figure 4. Panels (a) to (f);
panel title: "The fitted surface"; axes: East (meters), North (meters), Altitude (meters).
(a)
(b)
Figure 6: Comparison between sparse Delaunay triangulated mesh and regular mesh of terrain
surface. In figure (a) the Delaunay triangulated mesh of observed features is sparse and
irregular, especially in areas around trees, due to the limited field of view of the down-looking
camera. In figure (b) the reconstructed surface was resampled at regular intervals, so the
Delaunay triangulation results in a regular and smooth mesh. As shown in the figure,
the surface is too smooth due to the sparsity of observations. It is expected that by increasing
the density of observed features, the algorithm will preserve more details.
References
[1] A. Adamson and M. Alexa. Anisotropic point set surfaces. Computer Graphics
Forum, 25(4):717–724, 2006.
[2] Y. Bar-Shalom, X.R. Li, and T. Kirubarajan. Estimation with applications
to tracking and navigation. Wiley-Interscience, 2001.
[3] J.Y. Bouguet et al. Pyramidal implementation of the Lucas Kanade feature tracker:
description of the algorithm. Intel Corporation, Microprocessor Research Labs,
OpenCV Documents, 1999.
[4] G.R. Bradski and A. Kaehler. Learning OpenCV. O’Reilly, 2008.
[5] M. Bryson, M. Johnson-Roberson, and S. Sukkarieh. Airborne smoothing and
mapping using vision and inertial sensors. In Proceedings of the 2009 IEEE
International Conference on Robotics and Automation, pages 3143–3148. IEEE, 2009.
[6] M. Bryson, A. Reid, F. Ramos, and S. Sukkarieh. Airborne vision-based mapping
and classification of large farmland environments. Journal of Field Robotics, 2010.
[7] L. Csató and M. Opper. Sparse on-line Gaussian processes. Neural Computation,
14(3):641–668, 2002.
[8] Brian Curless and Marc Levoy. A volumetric method for building complex models
from range images. In SIGGRAPH ’96: Proceedings of the 23rd annual conference
on Computer graphics and interactive techniques, pages 303–312, New York, NY,
USA, 1996. ACM.
[9] J.A. Eisenman. Graphical editing of composite Bezier curves. Master’s thesis,
Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer
Science, 1988.
[10] B.F. Gregorski, B. Hamann, and K.I. Joy. Reconstruction of B-spline surfaces
from scattered data points. In Proceedings of Computer Graphics International,
pages 163–170. Citeseer, 2000.
[11] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin. The elements of statistical
learning: data mining, inference and prediction. The Mathematical Intelligencer,
27(2):83–85, 2005.
[12] M. Johnson-Roberson, O. Pizarro, S.B. Williams, and I. Mahon. Generation
and visualization of large-scale three-dimensional reconstructions from underwater
robotic surveys. Journal of Field Robotics, 27(1):21–51, 2010.
[13] S. Karumanchi, T. Allen, T. Bailey, and S. Scheding. Non-parametric learning to
aid path planning over slopes. The International Journal of Robotics Research,
2010.
[14] Venkat Krishnamurthy and Marc Levoy. Fitting smooth surfaces to dense polygon
meshes. In SIGGRAPH ’96: Proceedings of the 23rd annual conference on Com-
puter graphics and interactive techniques, pages 313–324, New York, NY, USA,
1996. ACM.
[15] Q.V. Le, A.J. Smola, and S. Canu. Heteroscedastic Gaussian process regression.
In Proceedings of the 22nd international conference on Machine learning, page
496. ACM, 2005.
[16] Todd Lupton. Inertial SLAM with Delayed Initialisation. 1999.
[17] C. Ma. Families of spatio-temporal stationary covariance models. Journal of
Statistical Planning and Inference, 116(2):489–501, 2003.
[18] D. Nguyen-Tuong and J. Peters. Incremental Sparsification for Real-time Online
Model Learning.
[19] D. Nguyen-Tuong, M. Seeger, and J. Peters. Local Gaussian process regression
for real time online model learning and control. Advances in Neural Information
Processing Systems, 22, 2008.
[20] M. Pauly, M. Gross, and L.P. Kobbelt. Efficient simplification of point-sampled
surfaces. In Proceedings of the conference on Visualization, volume 2, pages
163–170. Citeseer, 2002.
[21] C.E. Rasmussen. A Unifying View of Sparse Approximate Gaussian Process Re-
gression. Journal of Machine Learning Research, 6:1939–1959, 2005.
[22] C.E. Rasmussen and Z. Ghahramani. Infinite mixtures of Gaussian process experts.
In Advances in neural information processing systems 14: proceedings of the 2001
conference, page 881. MIT Press, 2002.
[23] CE Rasmussen and CKI Williams. Gaussian Processes for Machine Learning. 2006.
The MIT Press, Cambridge, MA, USA.
[24] S. Schaal and C.G. Atkeson. Constructive incremental learning from only local
information. Neural Computation, 10(8):2047–2084, 1998.
[25] Y. Shen, A. Ng, and M. Seeger. Fast Gaussian process regression using kd-trees.
Advances in Neural Information Processing Systems, 18:1225, 2006.
[26] V. Tresp. Mixtures of Gaussian processes. Advances in Neural Information Pro-
cessing Systems, pages 654–660, 2001.
[27] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment - a
modern synthesis. Vision algorithms: theory and practice, pages 153–177, 2000.
[28] S. Vasudevan, F. Ramos, E. Nettleton, and H. Durrant-Whyte. Gaussian process
modeling of large-scale terrain. Journal of Field Robotics, 26(10):812–840, 2009.
[29] S. Vijayakumar, A. D’souza, and S. Schaal. Incremental online learning in high
dimensions. Neural Computation, 17(12):2602–2634, 2005.
[30] Matthias Zwicker, Mark Pauly, Oliver Knoll, and Markus Gross. Pointshop 3d: an
interactive system for point-based surface editing. In SIGGRAPH ’02: Proceedings
of the 29th annual conference on Computer graphics and interactive techniques,
pages 322–329, New York, NY, USA, 2002. ACM.