
VLOCNET

Nguyen Anh Minh – IVSR - 2021


What is VLocNet?

- Pose Regression and Visual Odometry Estimation

- End-to-end trainable CNN architecture


- Supervised Learning:
- Input: 2 consecutive monocular images
- Output: 6-DoF global pose and odometry (x, y, z, φ, θ, ψ)

- Auxiliary learning: improve global localization by learning visual
odometry as a secondary task
Idea

- PoseNet: use a CNN for end-to-end global localization (global pose),
minimizing translational and rotational L2 losses.

- DeepVO: use 2 consecutive images as input to learn temporal
features for relative pose estimation (odometry).

- Auxiliary learning: improve global pose estimation by learning VO as a
secondary task.

- Multitask learning: learning unified models for tasks across different
domains.
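The PoseNet-style objective the slide refers to can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: `beta` is PoseNet's hand-tuned balance factor, while `weighted_loss` shows the learnable-weighting variant (log-variance terms `s_t`, `s_q`) that later slides call "learnable weighting hyperparameters"; in the actual network those would be trained parameters.

```python
import numpy as np

def pose_loss(t_pred, t_true, q_pred, q_true, beta=1.0):
    """PoseNet-style loss: translational L2 term plus a beta-weighted
    rotational L2 term on unit quaternions."""
    q_pred = q_pred / np.linalg.norm(q_pred)  # normalize predicted quaternion
    t_err = np.linalg.norm(t_pred - t_true)
    q_err = np.linalg.norm(q_pred - q_true)
    return t_err + beta * q_err

def weighted_loss(L_t, L_q, s_t, s_q):
    """Learnable-weighting variant: s_t and s_q are learned
    log-variances that replace the hand-tuned beta, trading off the
    translational and rotational terms automatically."""
    return L_t * np.exp(-s_t) + s_t + L_q * np.exp(-s_q) + s_q
```

With `s_t = s_q = 0` the weighted loss reduces to a plain sum of the two terms; during training the network can lower the weight of whichever term is noisier.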
Architecture

Backbone: three ResNet-50 streams, ELU non-linear activations

VO sub-net: uses a Siamese architecture to learn the temporal
correlation between two consecutive frames.

Output: relative pose as a translation and a quaternion

GlobalPose sub-net: CNN whose output is fed back as input
for geometric consistency

Output: global pose as a translation and a quaternion


GlobalPose subnet

 Input: Current image and previous predicted pose

 Output: predicted global pose (3x1 and 4x1)

=> Feeding back the prediction from the previous frame ensures
consistency of the current prediction
GlobalPose subnet

Geometric Consistency loss: learnable weighting hyperparameters

- Relative motion between 2 consecutive predictions
- Ensures that the difference between 2 consecutive outputs is as close
to the ground-truth odometry as possible
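A minimal sketch of this idea, assuming Hamilton-convention unit quaternions: the relative motion between two consecutive predicted global poses is computed and penalized against the ground-truth odometry. The function names and the simple unweighted L2 penalty are illustrative, not the paper's exact formulation.

```python
import numpy as np

def q_conj(q):
    """Conjugate (inverse for unit quaternions), q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def q_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def q_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q."""
    return q_mul(q_mul(q, np.array([0.0, *v])), q_conj(q))[1:]

def consistency_loss(t1, q1, t2, q2, t_odo, q_odo):
    """Geometric consistency sketch: the relative motion between two
    consecutive predicted global poses (t1, q1) -> (t2, q2) should
    match the ground-truth odometry (t_odo, q_odo)."""
    t_rel = q_rotate(q_conj(q1), t2 - t1)   # translation in frame 1
    q_rel = q_mul(q_conj(q1), q2)           # relative rotation
    return np.linalg.norm(t_rel - t_odo) + np.linalg.norm(q_rel - q_odo)
```

When the predicted relative motion matches the odometry exactly, the loss is zero; otherwise it pulls consecutive global predictions toward a motion-consistent trajectory.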

VO subnet

 Input: Current image and previous image

 Output: predicted relative pose between 2 images (3x1 and 4x1)


 Shared weights: the current-frame stream shares weights with the GlobalPose subnet and is updated the same way in backpropagation
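The weight sharing can be illustrated with a toy sketch: both Siamese streams call the same object, so one gradient update moves the parameters seen by both (and, in VLocNet, by the GlobalPose subnet as well). `SharedStream` is a hypothetical stand-in for the shared ResNet stream, not VLocNet code.

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedStream:
    """Toy hard-parameter-sharing stand-in: a single weight matrix
    plays the role of a shared convolutional stream."""
    def __init__(self, d_in, d_out):
        self.W = rng.standard_normal((d_out, d_in)) * 0.1
    def __call__(self, x):
        return np.maximum(self.W @ x, 0.0)  # linear map + ReLU-like activation

shared = SharedStream(8, 4)
prev_feat = shared(rng.standard_normal(8))      # previous-image stream
curr_feat = shared(rng.standard_normal(8))      # current-image stream, same weights
fused = np.concatenate([prev_feat, curr_feat])  # fused features for the relative-pose head
```

Because both streams reference the same `W`, any update applied to it changes the features of both inputs simultaneously, which is exactly what "updated the same way in backprop" means here.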
Benchmarking
#Todo

1. Continue the survey of learning-based localization and mapping


2. Training plan for Team AI
