
Object Tracking and Action Recognition


NGUYEN Duc Dung, Ph.D.
Institute of Information Technology, VAST
Course Plan
• Introduction
• Image Recognition
• Labwork
• Object Detection
• Labwork
• Object Tracking and Action Recognition
• Labwork
• 3D Reconstruction
• Labwork
• Project Presentation

Computer Vision

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos.

Source: Computer vision - Wikipedia

Video Analysis Examples

Video vs. (Still) Image?

Object Tracking

Source: https://pyimagesearch.com/
Object Tracking: Single vs. Multiple

Source: https://learnopencv.com/

Detection-based vs. Detection-free

Tracking-by-detection

Source: [1411.7935] Multiple object tracking with context awareness (arxiv.org)


Step 1: Detect object

Source: https://pyimagesearch.com/
Step 2: Calculate distances between new and existing objects

Source: https://pyimagesearch.com/
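
The distance step can be sketched as a small matrix computation. This is a minimal illustration, assuming each object is represented by the (x, y) centroid of its bounding box and that Euclidean distance is the measure; the function name and the sample coordinates are hypothetical and are not taken from the slide.

```python
import math

def pairwise_distances(existing, new):
    """Return a matrix D where D[i][j] is the Euclidean distance between
    existing centroid i and new centroid j (assumed representation)."""
    return [[math.dist(e, n) for n in new] for e in existing]

# Hypothetical centroids: two tracked objects, three fresh detections.
existing = [(10, 10), (50, 40)]
new = [(52, 41), (11, 12), (90, 90)]

D = pairwise_distances(existing, new)
for row in D:
    print([round(d, 2) for d in row])
```

Each row belongs to an existing object, so the smallest entry in a row points to the detection most likely to be the same object in the new frame.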
Step 3: Update object’s location

Source: https://pyimagesearch.com/
Step 4: Register new objects

Source: https://pyimagesearch.com/
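
Steps 1 to 4 can be put together in a short centroid-based tracker. This is a sketch under stated assumptions, not the implementation from the cited source: detections arrive as (x1, y1, x2, y2) boxes from any detector, matching is greedy by smallest distance, and `max_distance` is a hypothetical gating threshold. Removing objects that disappear is left out.

```python
import math

class CentroidTracker:
    """Minimal tracking-by-detection sketch (names are illustrative)."""

    def __init__(self, max_distance=50.0):
        self.next_id = 0
        self.objects = {}                 # object ID -> (x, y) centroid
        self.max_distance = max_distance  # assumed gating threshold, in pixels

    def update(self, boxes):
        # Step 1: turn detected boxes (x1, y1, x2, y2) into centroids.
        centroids = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]

        # Step 2: distances between every existing object and every detection.
        pairs = sorted(
            (math.dist(c, centroids[j]), oid, j)
            for oid, c in self.objects.items()
            for j in range(len(centroids))
        )

        # Step 3: greedily match the closest pairs and update locations.
        used_ids, used_new = set(), set()
        for d, oid, j in pairs:
            if d > self.max_distance:
                break  # remaining pairs are even farther apart
            if oid in used_ids or j in used_new:
                continue
            self.objects[oid] = centroids[j]
            used_ids.add(oid)
            used_new.add(j)

        # Step 4: register unmatched detections as new objects.
        for j, c in enumerate(centroids):
            if j not in used_new:
                self.objects[self.next_id] = c
                self.next_id += 1
        return dict(self.objects)

tracker = CentroidTracker()
frame1 = tracker.update([(0, 0, 10, 10), (100, 100, 120, 120)])
frame2 = tracker.update([(102, 100, 122, 120), (2, 0, 12, 10), (300, 300, 310, 310)])
print(frame1)
print(frame2)
```

Greedy matching keeps the sketch short; an optimal assignment (for example the Hungarian algorithm) is a common alternative when objects are close together.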
DeepSORT

Object Tracking Challenges

Action Recognition Using Kinect Sensors
Color & depth frames

Skeleton frame

KINECT: Configuration

Sensors            Hardware Capability          Software Capability
RGB camera         320x240, 640x480             RGB frames
Depth camera       320x240, 640x480, 0.8m-4m    Depth frames
IR camera          IR light                     Skeletal tracking
Microphone array   4 microphones                Voice recognition: English, French, Spanish, Italian, Japanese

The 1st PR4MCA Workshop


MSRC-12 Gesture Data

• Microsoft Research Cambridge
• Data recording
– 12 gestures
– 30 people
– Video, illustration, and text instructions
• Data information
– 20 3D points (skeleton joints) per frame
– 719,359 frames in 6 h 40 min
– 594 files
– 6,244 actions



MSRC-12 Gestures



Feature Extraction

Data dimensions (35 frames)
• Joint velocity: 20x3x34
• Angle: 35x35
• Angle velocity: 35x34
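
The dimensions above are consistent with frame-to-frame differencing: 35 frames yield 34 differences. The following is a minimal sketch, assuming joint velocity is the difference between consecutive joint positions and that each angle is measured at a joint between two adjacent segments. Which 35 angles are used is not stated on the slide, so `joint_angle` is only a generic helper, and the synthetic data is hypothetical.

```python
import math

def joint_velocity(frames):
    """frames: list of T frames, each a list of J (x, y, z) joints.
    Returns T-1 frames of per-joint coordinate differences."""
    return [
        [tuple(b - a for a, b in zip(p, q)) for p, q in zip(prev, cur)]
        for prev, cur in zip(frames, frames[1:])
    ]

def joint_angle(a, b, c):
    """Angle in radians at joint b, between segments b->a and b->c."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        return 0.0  # degenerate case: two joints coincide
    # Clamp to [-1, 1] to guard against rounding errors.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

# Synthetic data: 35 frames, 20 joints, all moving 1 unit along x per frame.
frames = [[(float(t), 0.0, 0.0)] * 20 for t in range(35)]
velocity = joint_velocity(frames)
print(len(velocity), len(velocity[0]), len(velocity[0][0]))
print(joint_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
```

Angle velocity follows the same pattern: differencing a 35-frame sequence of angles gives 34 values per angle.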



Gesture Classification

Support Vector Machines
• 1-vs-1 majority vote
• Efficiency in training and testing

Relevance Vector Machines
• 1-vs-the rest
• Applicable to gesture detection/spotting
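
The two multi-class schemes differ only in how binary decisions are combined, which can be sketched without any classifier library. This is an illustration under assumptions: the inputs stand in for the outputs of already-trained binary models, and the `threshold` used to reject a segment is a hypothetical parameter. It reflects one plausible reading of why probabilistic 1-vs-the-rest outputs suit gesture spotting.

```python
from collections import Counter

def one_vs_one_vote(pairwise_winners):
    """pairwise_winners: the winning label from each pairwise classifier.
    Returns the label with the most votes (ties go to the first seen)."""
    return Counter(pairwise_winners).most_common(1)[0][0]

def one_vs_rest_decide(scores, threshold=0.5):
    """scores: dict mapping each class to its "class vs. rest" score.
    Returns the best class, or None when no score reaches the threshold,
    i.e. the segment is treated as containing no gesture."""
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return label if best >= threshold else None

# Hypothetical outputs for three gesture classes.
print(one_vs_one_vote(["G1", "G2", "G1"]))                      # votes from 3 pairs
print(one_vs_rest_decide({"G1": 0.2, "G2": 0.9, "G3": 0.1}))    # confident gesture
print(one_vs_rest_decide({"G1": 0.2, "G2": 0.3, "G3": 0.1}))    # rejected
```

With K classes, 1-vs-1 needs K(K-1)/2 binary models while 1-vs-the-rest needs K.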



Parameter Selection

• RVM is more sensitive to the RBF width


• Similar “optimal” value for both SVM and RVM
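
The role of the RBF width can be seen directly from the kernel. The slide does not give the formula, so this sketch assumes the common Gaussian form k(x, y) = exp(-||x - y||^2 / (2 * width^2)); other parameterizations (for example with gamma = 1 / (2 * width^2)) are equivalent.

```python
import math

def rbf_kernel(x, y, width):
    """Gaussian RBF kernel with an assumed parameterization by width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))

# Similarity of two points one unit apart, for several widths.
x, y = (0.0, 0.0), (1.0, 0.0)
for width in (0.1, 1.0, 10.0):
    print(width, rbf_kernel(x, y, width))
```

A small width makes distant points look unrelated, while a large width makes all points look alike, so the width is typically chosen by validation.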
SVM vs. RVM: Performance and Size

• Similar predictive performance


• RVM is much more compact, faster in testing phase
Leave-subject-out Error

• SVM vs. RVM: similar predictive behavior


• Actions of some subjects are hard to predict
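
Leave-subject-out evaluation holds out all samples of one subject at a time, so the model is always tested on a person it has not seen. A minimal sketch of the split, assuming each sample carries a subject identifier; the function name and sample data are hypothetical.

```python
def leave_subject_out_splits(subject_ids):
    """Yield (held_out, train_indices, test_indices) for each subject.
    subject_ids[i] is the subject who performed sample i."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Four samples recorded from three subjects.
subjects = ["a", "b", "a", "c"]
splits = list(leave_subject_out_splits(subjects))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Reporting the error per held-out subject, rather than only the average, is what reveals the subjects whose actions are hard to predict.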
SVM vs. RVM: Accuracy

G9 Had enough
G10 Change weapon
G11 Beat both

Labwork
• Face tracking

• Vehicle/people counting and speed estimation
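
As a starting point for the speed estimation task, speed can be derived from how far a tracked centroid moves between frames. This is only a sketch under strong assumptions: a fixed camera, a constant scale on the ground plane, and known frame rate. The function name and the parameters `fps` and `meters_per_pixel` are hypothetical and would need calibration in practice.

```python
import math

def estimate_speed(p_prev, p_cur, frames_elapsed, fps, meters_per_pixel):
    """Approximate speed in m/s from two centroid positions (in pixels)."""
    pixels = math.dist(p_prev, p_cur)    # displacement in the image
    seconds = frames_elapsed / fps       # time between the two observations
    return pixels * meters_per_pixel / seconds

# A centroid moves 50 pixels in 10 frames at 25 fps, with 0.1 m per pixel.
speed = estimate_speed((0, 0), (30, 40), frames_elapsed=10, fps=25.0,
                       meters_per_pixel=0.1)
print(speed, "m/s")
```

Averaging over several frames usually gives a steadier estimate than a single frame pair, because detected centroids jitter.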
