VLocNet is an end-to-end trainable CNN architecture that performs both visual odometry estimation and 6-DoF pose regression from monocular images. It uses a multi-task learning approach, with two sub-networks: one for visual odometry that predicts relative pose between image pairs, and one for global pose regression that incorporates geometric consistency feedback. The architecture is based on ResNet-50 and uses auxiliary learning to improve global pose estimation by also learning visual odometry as a secondary task.
- Supervised learning:
  - Input: 2 consecutive monocular images
  - Output: 6-DoF global pose and odometry (x, y, z, φ, θ, ψ)
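The 6-DoF output above is given as a translation plus three Euler angles, while the sub-nets later emit a 3x1 translation and a 4x1 quaternion. A minimal sketch of the conversion between the two rotation representations (assuming the common roll-pitch-yaw convention, which the notes do not specify):

```python
import numpy as np

def euler_to_quaternion(phi, theta, psi):
    """Convert roll (phi), pitch (theta), yaw (psi) to a unit quaternion (w, x, y, z)."""
    cr, sr = np.cos(phi / 2), np.sin(phi / 2)
    cp, sp = np.cos(theta / 2), np.sin(theta / 2)
    cy, sy = np.cos(psi / 2), np.sin(psi / 2)
    return np.array([
        cr * cp * cy + sr * sp * sy,   # w
        sr * cp * cy - cr * sp * sy,   # x
        cr * sp * cy + sr * cp * sy,   # y
        cr * cp * sy - sr * sp * cy,   # z
    ])

q = euler_to_quaternion(0.0, 0.0, np.pi / 2)  # pure 90-degree yaw
```

The result is always a unit quaternion, which is why the networks can regress a 4x1 vector and normalize it.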
- Auxiliary learning: improve global localization by learning visual odometry as a secondary task

Idea
- PoseNet: use a CNN for end-to-end global localization (global pose); minimize a translational and rotational L2 loss
- DeepVO: use 2 consecutive images as input to learn temporal features for relative pose estimation (odometry)
- Auxiliary learning: improve global pose estimation by learning VO as a secondary task
- Multitask learning: learn unified models for tasks across different domains

Architecture
- Backbone: 3 ResNet-50 streams, ELU non-linear activation
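The translational-plus-rotational L2 loss mentioned for PoseNet can be sketched as follows (a simplified version with a fixed weighting factor beta; the value 250 here is only a placeholder, as the scale is dataset-dependent):

```python
import numpy as np

def posenet_loss(t_pred, t_true, q_pred, q_true, beta=250.0):
    """Translational L2 error plus beta-weighted rotational L2 error (PoseNet-style)."""
    t_err = np.linalg.norm(t_pred - t_true)
    # Compare against the normalized ground-truth quaternion so the
    # rotational term is independent of the label's quaternion scale.
    q_err = np.linalg.norm(q_pred - q_true / np.linalg.norm(q_true))
    return t_err + beta * q_err
```

Balancing the two terms by hand is brittle, which is one motivation for later work (including VLocNet's training setup) to weight the loss components adaptively.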
VO sub-net: Siamese architecture to learn the temporal correlation between two consecutive frames
Output: relative pose (3x1 translation and 4x1 quaternion)
GlobalPose sub-net: CNN whose output is fed back for geometric consistency
Output: global pose (3x1 translation and 4x1 quaternion)
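The Siamese idea above can be illustrated with a toy sketch: both streams apply the *same* weights to the two consecutive inputs, and the fused features are regressed to a 7-D relative pose. (This is only a stand-in: the real VO sub-net uses two ResNet-50 streams fused mid-network, not a single linear layer with late concatenation.)

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 8, 16
W_enc = rng.standard_normal((D_FEAT, D_IN))    # ONE encoder matrix, reused by both streams
W_head = rng.standard_normal((7, 2 * D_FEAT))  # regresses 3-D translation + 4-D quaternion

def encode(x):
    # Both Siamese streams share W_enc, so they learn one common representation
    # (toy ReLU layer standing in for a ResNet-50 stream).
    return np.maximum(W_enc @ x, 0.0)

def vo_forward(img_prev, img_curr):
    fused = np.concatenate([encode(img_prev), encode(img_curr)])
    out = W_head @ fused
    return out[:3], out[3:]  # relative translation (3x1), relative rotation quaternion (4x1)
```

Weight sharing halves the encoder parameters and forces the temporal correlation to be captured by comparing the two identical embeddings rather than by two independent feature extractors.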
GlobalPose subnet
Input: current image and the previously predicted pose
Output: predicted global pose (3x1 translation and 4x1 quaternion)
=> The output fed back from the previous frame keeps the current prediction consistent: consecutive outputs should be as close to the ground-truth odometry as possible
Geometric Consistency loss: penalizes the difference between the relative motion implied by consecutive global-pose predictions and the ground-truth odometry
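A sketch of that consistency term, assuming poses are given as a 3x1 translation plus a unit quaternion (w, x, y, z): the relative motion between two consecutive global predictions, expressed in the previous camera frame, is compared to the ground-truth odometry. The simple L2 penalty here is an illustrative choice, not necessarily the exact formulation in the paper.

```python
import numpy as np

def q_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def q_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(q, v):
    """Rotate vector v by unit quaternion q."""
    qv = np.concatenate([[0.0], v])
    return q_mul(q_mul(q, qv), q_conj(q))[1:]

def consistency_loss(t_prev, q_prev, t_curr, q_curr, t_odom, q_odom):
    # Relative motion implied by the two consecutive global predictions,
    # expressed in the previous camera frame.
    t_rel = rotate(q_conj(q_prev), t_curr - t_prev)
    q_rel = q_mul(q_conj(q_prev), q_curr)
    # Penalize deviation from the ground-truth odometry (t_odom, q_odom).
    return np.linalg.norm(t_rel - t_odom) + np.linalg.norm(q_rel - q_odom)
```

When the predicted global poses move exactly by the ground-truth odometry, the loss is zero, which is the constraint stated above.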
VO subnet
Input: current image and previous image
Output: predicted relative pose between the 2 images (3x1 translation and 4x1 quaternion)
Shared weights: the current-frame stream of the VO subnet shares weights with the GlobalPose subnet and is updated the same way during backpropagation

Benchmarking

#Todo
1. Continue survey for learning-based localization and mapping