
VIS Visual Intelligence and Systems

Demo Video from Huawei



Demo Video from Amazon



Demo Video from Mobileye



Computer vision is solved



Install PyTorch

Find code on GitHub

Download trained model weights





Reviewer 2

Renowned CV Professor

- Object Detection: find objects in each frame with your best object detection algorithms.
- Motion Estimation: propagate the objects from Frame T to Frame T+1. It may not depend on Frame T+1.
- Initial Association: associate objects with estimated motion and appearance features.
- Association Optimization: optimize the association with matching constraints using Hungarian matching or GNN.
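The Association Optimization step can be illustrated with a location-only matcher in the spirit of SORT: score every track-detection pair by box IoU, take the one-to-one assignment with the highest total IoU, and reject pairs below a gate. This is a minimal sketch assuming (x1, y1, x2, y2) boxes and a hypothetical `min_iou` of 0.3. The exhaustive search returns the same optimum as Hungarian matching but only scales to a handful of objects; in practice one would call a Hungarian solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Sorted (track_idx, det_idx) pairs from the max-total-IoU one-to-one assignment.

    Exhaustive search: same optimum as Hungarian matching, but only feasible for a
    handful of objects. min_iou is a hypothetical gate for rejecting weak matches.
    """
    if not tracks or not detections:
        return []
    sim = [[iou(t, d) for d in detections] for t in tracks]
    n_t, n_d = len(tracks), len(detections)
    best_pairs, best_score = [], -1.0
    if n_t <= n_d:
        # Every track gets a distinct detection; extra detections stay unmatched.
        for cols in permutations(range(n_d), n_t):
            pairs = list(enumerate(cols))
            score = sum(sim[r][c] for r, c in pairs)
            if score > best_score:
                best_pairs, best_score = pairs, score
    else:
        # Every detection gets a distinct track; extra tracks stay unmatched.
        for rows in permutations(range(n_t), n_d):
            pairs = [(r, c) for c, r in enumerate(rows)]
            score = sum(sim[r][c] for r, c in pairs)
            if score > best_score:
                best_pairs, best_score = pairs, score
    # Gate: an optimal assignment can still pair objects that barely overlap.
    return sorted((r, c) for r, c in best_pairs if sim[r][c] >= min_iou)


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

Detection 2 is left unmatched, so a tracker built on this would start a new track for it.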

Simple Online and Realtime Tracking, ICIP 2016
Simple Online and Realtime Tracking with a Deep Association Metric, ICIP 2017
Tracking Without Bells and Whistles, ICCV 2019

Does it need to be this complicated?


[Figure: Object Detection, Motion Estimation, Association (split into Location Association and Appearance Association), Optimization]
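Scoring each track-detection pair with both a location cue and an appearance cue shows why the two branches complement each other when bounding boxes are similar. This sketch is hypothetical: it blends box IoU and embedding cosine similarity with an assumed weight `alpha`, loosely in the spirit of DeepSORT's combined metric. It is not the scoring used by any method on these slides.

```python
import math


def box_iou(a, b):
    """Location cue: IoU of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, w) * max(0.0, h)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Appearance cue: cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def fused_similarity(track, det, alpha=0.5):
    """Weighted blend of location and appearance; alpha is a hypothetical weight.

    track and det are (box, embedding) pairs. alpha=1.0 uses location only,
    alpha=0.0 uses appearance only.
    """
    (t_box, t_emb), (d_box, d_emb) = track, det
    return alpha * box_iou(t_box, d_box) + (1.0 - alpha) * cosine(t_emb, d_emb)


# Two people walking side by side: the wrong person overlaps the track more.
track = ((0, 0, 10, 20), [1.0, 0.0])
other_person = ((2, 0, 12, 20), [0.0, 1.0])
same_person = ((3, 0, 13, 20), [1.0, 0.0])
print(box_iou(track[0], other_person[0]) > box_iou(track[0], same_person[0]))  # True
print(fused_similarity(track, same_person) > fused_similarity(track, other_person))  # True
```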

Why doesn’t appearance provide enough information in current models?

- Similar bounding boxes


- Misleading regions in the background

[Figure: Object Detection. Frames 1 and 2 each pass through a Backbone, RPN, RoI Align and BBox Head (cls, reg) with weights shared between frames; sparse GTs vs. quasi-dense samples]

[Figure: Instance Similarity Learning. Frames 1 and 2 each pass through a Backbone, RPN, RoI Align and Embedding Head with weights shared between frames; sparse GTs vs. quasi-dense samples; contrastive learning across the two frames]
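Instance similarity learning on quasi-dense samples can be sketched with a multi-positive contrastive loss: each sampled region in one frame is pulled toward every region on the same object in the other frame and pushed away from the rest. This is a minimal pure-Python sketch assuming dot-product similarity between embeddings. How samples are labeled positive or negative, any auxiliary losses, and the loss weights used in QDTrack are not shown on the slides and are not reproduced here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multi_positive_contrastive_loss(query, positives, negatives):
    """log(1 + sum over positives k+ and negatives k- of exp(q.k- - q.k+)).

    query: embedding of one sampled RoI in frame 1.
    positives: embeddings of RoIs in frame 2 on the same object (there can be several).
    negatives: embeddings of RoIs in frame 2 on other objects or background.
    The loss shrinks as every positive scores higher than every negative.
    """
    total = 0.0
    for k_pos in positives:
        s_pos = dot(query, k_pos)
        for k_neg in negatives:
            total += math.exp(dot(query, k_neg) - s_pos)
    return math.log1p(total)


q = [1.0, 0.0]
print(multi_positive_contrastive_loss(q, [[1.0, 0.0]], [[0.0, 1.0]]))  # log(1 + e**-1), about 0.313
print(multi_positive_contrastive_loss(q, [[3.0, 0.0]], [[0.0, 1.0]]))  # log(1 + e**-3), about 0.049
```

A production version would compute the sum in log space (log-sum-exp) over batched tensors to avoid overflow for large similarity gaps.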



[Figure: Object Association. A shared embedding extractor embeds tracklets, vanished tracklets and backdrops from previous frames and detections in the current frame; bi-directional softmax yields high similarity for consistent pairs and low similarity for inconsistent ones (vanished objects, new objects)]
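The bi-directional softmax can be sketched as follows: a detection-tracklet pair scores close to 1 only when the detection prefers that tracklet among all tracklets and the tracklet prefers that detection among all detections, so a new object ends up with low similarity to every tracklet. This is a minimal sketch assuming dot-product similarity between embeddings and a hypothetical greedy matcher with a 0.5 threshold; the vanished tracklets and backdrops from the figure are left out for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_softmax(det_embs, track_embs):
    """score[i][j]: mean of softmax over tracks (for detection i) and over detections (for track j)."""
    sim = [[sum(a * b for a, b in zip(d, t)) for t in track_embs] for d in det_embs]
    det_to_track = [softmax(row) for row in sim]
    track_to_det = [softmax([sim[i][j] for i in range(len(det_embs))])
                    for j in range(len(track_embs))]
    return [[0.5 * (det_to_track[i][j] + track_to_det[j][i]) for j in range(len(track_embs))]
            for i in range(len(det_embs))]


def match(det_embs, track_embs, threshold=0.5):
    """Greedy one-to-one matching; unmatched detections map to None (new objects).

    threshold is a hypothetical value, not taken from the slides.
    """
    result = {i: None for i in range(len(det_embs))}
    if not det_embs or not track_embs:
        return result
    scores = bidirectional_softmax(det_embs, track_embs)
    pairs = sorted(((scores[i][j], i, j) for i in range(len(det_embs))
                    for j in range(len(track_embs))), reverse=True)
    used_tracks = set()
    for score, i, j in pairs:
        if score < threshold:
            break
        if result[i] is None and j not in used_tracks:
            result[i] = j
            used_tracks.add(j)
    return result


tracks = [[2.0, 0.0], [0.0, 2.0]]
dets = [[0.0, 2.0], [2.0, 0.0], [-2.0, -2.0]]
print(match(dets, tracks))  # {0: 1, 1: 0, 2: None}
```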

[Figure: Bi-directional softmax matching object IDs of tracklets, vanished tracklets and backdrops from previous frames to detections in the current frame]

MOT17 (MOTA)
Tracktor++v2       56.3
Lif_T*             60.5
TubeTK*            63
CTrackerV1         66.6
CenterTrack*       67.8
QDTrack (Ours)     68.7
FairMOT*           73.7
QDTrack* (Ours)    74.5

* Indicates that additional external training data is used.



2x speedup
[Figure: example annotations for Panoptic Segmentation; Drivable Area, Lane & Tagging (tags: Sunny, City Street, Daytime); Bounding Box Tracking; Instance Segmentation Tracking]
Daytime

https://github.com/SysCV/bdd100k-models

[Figure: number of labeled frames and instances in KITTI, MOT17 and BDD100K, and in KITTI MOTS and BDD100K]

This person is about to disappear

This person is tracked

The vehicles are constantly occluded

Waymo Open Dataset (MOTA)
IoU (2020)            38.25
Tracktor++ (2020)     42.62
RetinaTrack (2020)    44.92
SoDA (arXiv)          49.6
QDTrack (Ours)        55.6

Feature Representation

3D Object Tracking Segmentation Tracking



[R|t]

[R|t]

[Figure: Input Image, Region Proposals, Monocular 3D Estimation (dimension, angle, depth, center) at frame T-2]
Hu, Cai, Wang, Lin, Sun, Krähenbühl, Darrell, Yu, Joint Monocular 3D Vehicle Detection and Tracking, ICCV 2019
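Once the monocular 3D estimation step has predicted an object's projected center and depth, the object can be lifted into 3D and, with the camera pose [R|t], placed in a common world frame across frames. This sketch assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, depth measured along the optical axis, and [R|t] given as a camera-to-world transform; the slides do not specify these conventions.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth along the optical axis into camera coordinates (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(p_cam, R, t):
    """Apply a camera-to-world pose [R|t]: p_world = R p_cam + t."""
    return tuple(sum(R[i][k] * p_cam[k] for k in range(3)) + t[i] for i in range(3))


# Hypothetical intrinsics and pose, for illustration only.
fx = fy = 700.0
cx, cy = 600.0, 180.0
center_cam = backproject(1300.0, 180.0, 5.0, fx, fy, cx, cy)
print(center_cam)  # (5.0, 0.0, 5.0)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(camera_to_world(center_cam, identity, (1.0, 2.0, 3.0)))  # (6.0, 2.0, 8.0)
```

Expressing every frame's estimates in world coordinates is what lets the later association and motion-prediction steps compare objects across frames while the camera itself moves.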

[Figure: the pipeline extended with Deep Association; at frames T-2 and T-1, trackers predict, associate with the new proposals and update]

[Figure: Input Image, Region Proposals, Monocular 3D Estimation, Deep Association and Multi-frame Refinement over frames T-2, T-1 and T; at each frame, trackers predict, associate with the new proposals and update; panels (a) to (d)]
1. Object Detection
2. Object Distance, Orientation, Size
3. Association
4. Motion Prediction
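The per-frame loop in the figure (trackers predict, associate, update) can be sketched with a deliberately simple motion model. This is a hypothetical constant-velocity tracker with exponential smoothing, swapped in for illustration; the motion model of the cited ICCV 2019 work is not shown on these slides and may differ, and the association step between predict and update is omitted.

```python
from dataclasses import dataclass


@dataclass
class Track3D:
    """Hypothetical minimal 3D track with a constant-velocity motion model."""
    track_id: int
    position: tuple                    # (x, y, z) in world coordinates
    velocity: tuple = (0.0, 0.0, 0.0)  # displacement per frame
    smoothing: float = 0.5             # hypothetical weight on the newest velocity observation

    def predict(self):
        """Predict the position in the next frame by adding one step of velocity."""
        return tuple(p + v for p, v in zip(self.position, self.velocity))

    def update(self, observed):
        """Fold the observed displacement into the velocity estimate, then move to the observation."""
        step = tuple(o - p for o, p in zip(observed, self.position))
        self.velocity = tuple(self.smoothing * s + (1.0 - self.smoothing) * v
                              for s, v in zip(step, self.velocity))
        self.position = tuple(observed)


track = Track3D(track_id=1, position=(0.0, 0.0, 10.0))
for observed in [(1.0, 0.0, 10.0), (2.0, 0.0, 10.0)]:
    track.update(observed)  # in a full tracker, association picks `observed`
print(track.predict())      # (2.75, 0.0, 10.0)
```

Smoothing keeps one noisy depth estimate from throwing the predicted position far off; setting `smoothing=1.0` reduces it to plain last-step velocity.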

[Figure: Occlusion-aware association. Track states Tracked, Occluded and Lost; at frames T-2 and T-1, objects are ordered by depth and marked Visible, Occluded or Truncated]

[Figure: tracks at frame T-1 and proposals at frame T]

[Figure: Depth-ordering matching between tracks (frame T-1) and proposals (frame T) for an occluded detection of interest; example scores 0.06, 0.00, 0.13 and 0.82 on a low-to-high IoU scale]
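Occlusion-aware association relies on knowing which objects are in front. With per-object depth estimates, boxes can be sorted near to far, and each box's overlap with nearer boxes gives an occlusion estimate that decides whether an unmatched track is kept as Occluded or marked Lost. This is a hypothetical sketch of that idea: the overlap measure, the max-instead-of-union approximation and the 0.7 threshold are assumptions, not values from the slides.

```python
def overlap_area(a, b):
    """Area of the intersection of two (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def occlusion_ratios(boxes, depths):
    """Approximate occlusion: fraction of each box covered by its most-overlapping nearer box.

    Taking the max over nearer boxes (instead of their union) keeps the sketch short
    and can underestimate occlusion when several objects overlap one box.
    """
    order = sorted(range(len(boxes)), key=lambda i: depths[i])  # nearest first
    ratios = [0.0] * len(boxes)
    for rank, i in enumerate(order):
        x1, y1, x2, y2 = boxes[i]
        area = (x2 - x1) * (y2 - y1)
        nearer = order[:rank]
        if area > 0 and nearer:
            ratios[i] = max(overlap_area(boxes[i], boxes[j]) for j in nearer) / area
    return ratios


def track_state(matched, occlusion, occluded_thresh=0.7):
    """Hypothetical rule: unmatched but heavily occluded tracks stay Occluded instead of Lost."""
    if matched:
        return "Tracked"
    return "Occluded" if occlusion >= occluded_thresh else "Lost"


boxes = [(0, 0, 10, 10), (2, 0, 12, 10), (30, 0, 40, 10)]
depths = [5.0, 12.0, 20.0]
ratios = occlusion_ratios(boxes, depths)
print(ratios)                                   # [0.0, 0.8, 0.0]
print([track_state(False, r) for r in ratios])  # ['Lost', 'Occluded', 'Lost']
```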

Results on Waymo Dataset



[Figure: nuScenes 3D Tracking Testing Set, mMOTA of CenterTrack, PermaTrack, DEFT and QD-3DT]

Ke, Li, Danelljan, Tai, Tang, Yu, Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021
PCAN for Segmentation Tracking


[Figure: STMask (CVPR 21) vs. Ours (PCAN)]


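The idea named in PCAN's title, cross-attention from the current frame to a small set of prototypes condensed from past frames, can be sketched in a few lines. This is a minimal sketch under stated assumptions: prototypes are built with plain k-means (a swapped-in clustering choice), a single query vector stands in for a current-frame feature, and prototypes serve as both keys and values without learned projections. PCAN's actual clustering procedure, feature shapes and projections are not shown on these slides.

```python
import math


def kmeans_prototypes(features, k, iters=10):
    """Condense memory features into k prototypes with plain k-means (hypothetical swap-in)."""
    protos = [list(f) for f in features[:k]]  # deterministic init: first k features
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(f, protos[c])))
            clusters[nearest].append(f)
        for c, members in enumerate(clusters):
            if members:  # keep the old prototype if a cluster empties out
                protos[c] = [sum(col) / len(members) for col in zip(*members)]
    return protos


def cross_attend(query, prototypes):
    """Scaled dot-product attention of one query over prototypes (used as keys and values)."""
    scale = math.sqrt(len(query))
    logits = [sum(q * p for q, p in zip(query, proto)) / scale for proto in prototypes]
    m = max(logits)
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [sum(w / total * proto[i] for w, proto in zip(weights, prototypes))
            for i in range(len(query))]


memory = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]  # features from past frames
prototypes = kmeans_prototypes(memory, k=2)
print(prototypes)                              # [[0.0, 0.5], [10.0, 10.5]]
print(cross_attend([10.0, 10.0], prototypes))  # close to [10.0, 10.5]
```

Attending over a few prototypes instead of every stored pixel keeps the cost of reading a long temporal memory small.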

More research works: http://vis.xyz



All projects @ http://vis.xyz


Github: SysCV
Twitter: @DrFisherYu
