
Semi-direct Visual Odometry and Modular Multi-Sensor Fusion Approach to the SLAM Problem


• Working Principle
At the core of this approach is the idea of splitting the visual odometry and
sensor fusion tasks needed for SLAM into two separate modules. The first task
is handled by the Semi-direct Visual Odometry algorithm (SVO) and the second
by the Multi-Sensor Fusion framework (MSF). The main idea behind this split is
to use the SVO module to compute a pose estimate and then feed that estimate
to the MSF algorithm, where it is fused with pose estimates coming from the
other sensors (IMU, sonar, etc.).
o SVO: SVO is a lightweight, highly extensible, open-source visual
odometry algorithm tailored for on-board pose computation. The algorithm
splits the mapping and motion-estimation workloads between two threads.
Motion estimation is performed by minimizing the photometric residual,
defined as the intensity difference between pixels in subsequent images
that observe the same 3D point. Methods that use every pixel in the image
are called dense; SVO instead uses only corners and features lying on
intensity edges, which the authors call a sparse method. This speeds up
computation considerably and allows the motion-estimation thread to
process frames in real time. The mapping part of the algorithm is tasked
with estimating the depth of the 3D points extracted from features. This is
done with recursive Bayesian depth filters, which are initialized only when
the number of actively tracked 3D points falls below a threshold.
Decoupling mapping from motion estimation lets the motion-estimation
thread handle incoming frames in real time while the mapping thread works
only on keyframes, so motion estimation is not slowed down by the
mapping process.
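To make the photometric-residual idea concrete, here is a minimal sketch in Python. It is not SVO's actual implementation (which operates on image pyramids and optimizes the pose with Gauss-Newton); it only shows the quantity being minimized, assuming grayscale images as NumPy arrays, a pinhole camera matrix K, and a known relative pose (R, t) between the reference and current frames. All function names here are illustrative, not part of the SVO API.

```python
import numpy as np

def project(K, p_cam):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    u = K[0, 0] * p_cam[0] / p_cam[2] + K[0, 2]
    v = K[1, 1] * p_cam[1] / p_cam[2] + K[1, 2]
    return np.array([u, v])

def backproject(K, uv, depth):
    """Back-project a pixel with a known depth into the camera frame."""
    x = (uv[0] - K[0, 2]) / K[0, 0]
    y = (uv[1] - K[1, 2]) / K[1, 1]
    return depth * np.array([x, y, 1.0])

def photometric_residuals(I_ref, I_cur, feats, depths, R, t, K):
    """Sparse photometric residuals: intensity differences between a
    reference frame and the current frame at the locations where the
    same 3D points reproject. Only sparse feature pixels are used,
    mirroring SVO's sparse (rather than dense) formulation."""
    res = []
    for uv, d in zip(feats, depths):
        p_ref = backproject(K, uv, d)   # 3D point in the reference frame
        p_cur = R @ p_ref + t           # same point in the current frame
        u, v = np.round(project(K, p_cur)).astype(int)
        if 0 <= v < I_cur.shape[0] and 0 <= u < I_cur.shape[1]:
            ur, vr = int(uv[0]), int(uv[1])
            res.append(float(I_cur[v, u]) - float(I_ref[vr, ur]))
    return np.array(res)
```

Motion estimation then amounts to searching for the (R, t) that drives these residuals toward zero; with the identity transform and the same image on both sides, every residual is exactly zero.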
o MSF: MSF uses an Extended Kalman Filter (EKF) to process data coming
from diverse sensors and fuse them into a final pose estimate. This EKF
implementation can handle multiple delayed sensor readings (e.g. an IMU
running at several hundred Hz alongside SVO poses at 20–30 Hz) as well
as sensor failures.
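The prediction/update pattern behind this fusion can be illustrated with a deliberately simplified 1D EKF sketch: high-rate IMU accelerations drive the prediction step, while low-rate SVO position fixes drive the update step. The real MSF state is far richer (orientation, IMU biases, inter-sensor calibration, delay compensation), so this is only an assumption-laden toy model, not MSF's actual state or API.

```python
import numpy as np

class SimpleFusionEKF:
    """Toy 1D Kalman filter illustrating MSF-style fusion:
    IMU acceleration -> predict at high rate,
    visual-odometry position -> update at low rate."""

    def __init__(self):
        self.x = np.zeros(2)   # state: [position, velocity]
        self.P = np.eye(2)     # state covariance

    def predict(self, accel, dt, q=0.1):
        """Propagate the state with a constant-acceleration model."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        self.x = F @ self.x + np.array([0.5 * dt**2, dt]) * accel
        self.P = F @ self.P @ F.T + q * np.eye(2)

    def update_pose(self, z_pos, r=0.05):
        """Correct the state with a position measurement (e.g. from SVO)."""
        H = np.array([[1.0, 0.0]])
        y = z_pos - H @ self.x        # innovation
        S = H @ self.P @ H.T + r      # innovation covariance
        Kg = self.P @ H.T / S         # Kalman gain (2x1)
        self.x = self.x + (Kg * y).ravel()
        self.P = (np.eye(2) - Kg @ H) @ self.P
```

In use, `predict` would be called at the IMU rate (hundreds of Hz) and `update_pose` only whenever a new SVO pose arrives (20–30 Hz); between updates, the covariance grows, so a fresh pose measurement pulls the estimate strongly toward it.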
• Pros and Cons
The main advantages of this implementation are the speed at which it runs on
board and the extent to which it can be customized. SVO ships with several
presets, ranging from lightweight to most accurate. MSF, by its very nature,
can use data from any kind of sensor to produce a final estimate of the MAV's
pose. This should let us tinker with sensors and cameras down the line without
committing to any particular sensor setup from the very beginning. The main
drawbacks of the MSF+SVO implementation are:
o MSF fuses the SVO output with pose estimates coming from other
sensors to better estimate the pose of the MAV, but it does not use the
sensor data to refine the positions of the 3D points in the map of the
environment.
o The 3D points are used in motion estimation but are not used to
reconstruct a 3D map of the environment.
o MSF has a page on the ROS wiki with instructions on how to set it up;
SVO has only its GitHub page. [I have not yet tried to run them on ROS.]
o On its fastest preset, SVO+MSF has poor accuracy.
• Hardware
The SVO+MSF setup can run on basically any ARM processor and uses little
RAM. It is highly sensitive to clock speed and performs best on a multicore
processor. Both requirements are met by the Raspberry Pi 4. The pipeline is
not CUDA-accelerated, so it may perform worse on the Jetson Nano than on
the Raspberry Pi. As far as sensors and cameras go, it works with both mono
and stereo camera setups and can account for the distortion introduced by
large-FOV cameras. Reading the hardware specifications of the projects on
which it has been tested, I noticed that all the cameras were global-shutter
cameras; at the same time, I have not seen this mentioned as a strict
requirement.
• Final considerations
The SVO+MSF solution seems quite interesting. Its main drawback is that it
does not reconstruct a map of the environment from the 3D points it locates. If
this is the direction we intend to move in, we need a way around this limitation,
for example a ROS package capable of using the 3D points to reconstruct a 3D
map. If we decide to use the Intel cameras as the cameras for the drone, the
SVO part would be redundant, since those cameras already output a pose for
the drone. The MSF part could still be used to fuse the output of the Intel
cameras with the other sensors on board.
• Online resources
https://github.com/uzh-rpg/rpg_svo

https://github.com/uzh-rpg/rpg_svo_example

http://wiki.ros.org/ethzasl_sensor_fusion/Tutorials/Introductory%20Tutorial%20for%20Multi-
Sensor%20Fusion%20Framework

https://github.com/ethz-asl/ethzasl_msf
