
Introduction to Computer Vision
Gaoang Wang
Education
2009-2013, Fudan University, B.S.
2013-2015, University of Wisconsin-Madison, M.S.
2015-2019, University of Washington, Seattle, Ph.D.

Working Experience
2019.06-2019.11, Research Scientist, Megvii
2019.11-2020.07, Research Scientist II, Wyze Labs
2020.09-present, Assistant Professor, Zhejiang University
2021.03-present, Adjunct Assistant Professor, UIUC

Email: gaoangwang@intl.zju.edu.cn
Homepage: https://person.zju.edu.cn/gaoangwang

Teaching
Machine Learning, Data Mining, Advanced Image Processing, Intro to Computer Vision

Research Areas
Computer Vision, Machine Learning, Artificial Intelligence
Every image tells a story
• Goal of computer vision: perceive the “story” behind the picture
• But what does “story” mean?
• Depends on what we want to do with it
The goal(s) of computer vision
• What is the image about?
• What objects are in the image?
• Where are they?
• How are they oriented?
• What is the layout of the scene in 3D?
• What is the shape of each object?
Recent progress
• Depth cameras

https://realsense.intel.com/stereo/
Microsoft Kinect
Recent progress
• Shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC


Source: S. Seitz
Recent progress
• Optical character recognition (OCR)

Digit recognition, AT&T labs (http://www.research.att.com/~yann/)
License plate readers (http://en.wikipedia.org/wiki/Automatic_number_plate_recognition)
Automatic check processing

Source: S. Seitz


Recent progress
• Face detection

Source: S. Seitz
Recent progress
• Established technology: 3D models of the world

Building Rome in a Day. Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski. ICCV, 2009, Kyoto, Japan.
Recent progress
• Recognizing objects

Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick. ICCV 2017
Recent progress
• Species recognition

[iNaturalist]
Recent progress
• Recognizing rare concepts
Recent progress
• Recovering 3D structure from limited views
Recent progress
• Integrating Vision and Action

[Figure: a map and a plan ("Turn left")]

Cognitive Mapping and Planning for Visual Navigation. Saurabh Gupta, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik. CVPR 2017
CV Related Tasks
• Image Classification
• Detection and Tracking
• Pose Estimation
• Segmentation
• 3D and Localization
• Image Reconstruction
•…
Image Classification
• Fine-grained image classification
• Face recognition
• Face verification
Fine-Grained Image Classification
• Classify sub-categories
Face Recognition
• Identify different faces
Face Verification
• Verify whether two faces are from the same person

[Figure: two face images pass through embedding networks to produce features. Are they from the same person?]
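A common way to make the verification decision concrete is to compare the two embedding vectors and threshold their similarity. The sketch below assumes the embedding network has already produced fixed-length feature vectors; the toy vectors, the choice of cosine similarity, and the `threshold` value are illustrative assumptions, not the method used in this lecture.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(emb1, emb2, threshold=0.5):
    """Declare a match when the embeddings are similar enough.

    `threshold` is a hypothetical value; in practice it is tuned on
    validation data to trade off false accepts against false rejects.
    """
    return cosine_similarity(emb1, emb2) >= threshold


# Toy 4-D embeddings standing in for the output of a face embedding network.
anchor = [0.9, 0.1, 0.3, 0.2]
same = [0.8, 0.2, 0.35, 0.1]
other = [-0.2, 0.9, -0.1, 0.4]
print(same_person(anchor, same), same_person(anchor, other))
```

Cosine similarity ignores the vectors' lengths, so only the direction of the embedding matters; distance-based thresholds (e.g. Euclidean) are an equally common alternative.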


Detection and Tracking
• Video object detection
• 3D object detection
• Visual tracking
• Multi-object tracking
Video Object Detection
• Detect objects in sequential frames
3D Object Detection
• Localize the 3D shape or position of the targets
Visual Tracking
• Given the annotation of the target in the first frame, locate the same object in the following frames
Multi-Object Tracking
• Associate detected objects across the frames of the input video, so that each object keeps a consistent identity
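Association is often done frame to frame by matching boxes that overlap. The sketch below scores pairs with intersection-over-union (IoU) and matches them greedily; the box format `(x1, y1, x2, y2)`, the `min_iou` value, and the helper names are assumptions for illustration. Many trackers instead solve the assignment optimally (e.g. with the Hungarian algorithm) and add motion or appearance cues.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedily match existing track boxes to new detections by IoU.

    Returns {track_index: detection_index}. Unmatched detections would
    start new tracks; unmatched tracks may be ended later.
    """
    pairs = [
        (iou(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    ]
    pairs.sort(reverse=True)  # highest overlap first
    matches, used_t, used_d = {}, set(), set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little
        if ti in used_t or di in used_d:
            continue  # each track and detection is used at most once
        matches[ti] = di
        used_t.add(ti)
        used_d.add(di)
    return matches


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]                        # frame t
detections = [(21, 19, 31, 29), (1, 1, 11, 11), (50, 50, 60, 60)]  # frame t+1
print(associate(tracks, detections))
```

Here track 0 is matched to detection 1 and track 1 to detection 0; detection 2 overlaps nothing and would become a new track.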
Pose Estimation
• Hand pose estimation
• Human pose estimation
• Car keypoint detection
Hand Pose Estimation
• Estimate 2D/3D hand pose from RGB image or depth image
Human Pose Estimation
• Estimate 2D/3D human pose
Car Keypoint Estimation
• Estimate 2D/3D car keypoints
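Many 2D keypoint estimators (for hands, bodies, or cars) output one heatmap per keypoint and read off the location of its peak. A minimal sketch of that decoding step, assuming the heatmaps are already given as nested lists; the toy values and the channel names in the comment are hypothetical.

```python
def decode_keypoints(heatmaps):
    """Return one (x, y, score) per keypoint from per-keypoint heatmaps.

    heatmaps: list of K heatmaps, each an H x W list of rows of scores.
    The predicted location is the arg-max of each heatmap.
    """
    keypoints = []
    for hm in heatmaps:
        best = (float("-inf"), 0, 0)
        for y, row in enumerate(hm):
            for x, score in enumerate(row):
                if score > best[0]:
                    best = (score, x, y)
        score, x, y = best
        keypoints.append((x, y, score))
    return keypoints


# Two toy 3 x 4 heatmaps (e.g. "wrist" and "elbow" channels).
heatmaps = [
    [[0.0, 0.1, 0.0, 0.0],
     [0.1, 0.9, 0.2, 0.0],
     [0.0, 0.1, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.1],
     [0.0, 0.0, 0.2, 0.8],
     [0.0, 0.0, 0.1, 0.3]],
]
print(decode_keypoints(heatmaps))  # [(1, 1, 0.9), (3, 1, 0.8)]
```

The peak score doubles as a confidence, which is often used to drop keypoints that are occluded or outside the image.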
Segmentation
• Semantic segmentation
• Instance segmentation
• Video object segmentation
Semantic Segmentation
• Assign a class label to every pixel (all objects of the same class share that label)
Instance Segmentation
• Segment each object with its own instance label, separating objects of the same class
Video Object Segmentation
• Segment objects in the video sequence
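The difference between semantic and instance labels can be made concrete: an instance map gives each object its own id, while a semantic map only records the class, so the semantic map can be derived from the instance map but not the other way around. A small sketch, with hypothetical class ids and a toy label map:

```python
def instance_to_semantic(instance_map, instance_class):
    """Collapse an instance label map into a semantic label map.

    instance_map: H x W grid of instance ids (0 = background).
    instance_class: dict mapping each instance id to its class id.
    """
    return [[instance_class.get(i, 0) for i in row] for row in instance_map]


# Two people (instances 1 and 2) and one car (instance 3).
instance_map = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [0, 3, 3, 3, 0],
]
PERSON, CAR = 1, 2  # hypothetical class ids
instance_class = {1: PERSON, 2: PERSON, 3: CAR}
semantic_map = instance_to_semantic(instance_map, instance_class)
for row in semantic_map:
    print(row)
```

In the semantic map both people become label 1, so the information that they are two separate objects is lost.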
3D and Localization
• Depth map estimation
• Optical / scene flow estimation
• Camera pose estimation
Depth Map Estimation
• Estimate depth map from RGB images
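The slide leaves the method open. One classical route, assuming a calibrated and rectified stereo pair, converts the per-pixel disparity d (the horizontal shift of a point between the two views) into depth with Z = f·B/d, where f is the focal length in pixels and B the distance between the cameras. Learned monocular methods predict depth directly from a single image instead. The calibration numbers below are hypothetical.

```python
def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) into a depth map (meters).

    Assumes a rectified stereo pair: Z = f * B / d. Pixels with zero or
    negative disparity have no valid match and get depth None.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity
    ]


# Hypothetical calibration: 700 px focal length, 0.1 m baseline.
disparity = [[70.0, 35.0], [7.0, 0.0]]
depth = disparity_to_depth(disparity, focal_px=700.0, baseline_m=0.1)
print(depth)
```

Depth is inversely proportional to disparity, so distant points (small d) are measured much less precisely than nearby ones.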
Optical / Scene Flow Estimation
• Estimate the per-pixel 2D motion (optical flow) or 3D motion (scene flow) between two images
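Optical flow assigns each pixel a 2D displacement (u, v). One way to see what that means, and a common consistency check, is to warp the second image back with the flow and compare the result with the first image. The sketch assumes integer flow and nearest-pixel lookup for simplicity; practical methods use sub-pixel flow and bilinear interpolation.

```python
def warp_with_flow(image2, flow):
    """Backward-warp image2 into the frame of image1 using a flow field.

    flow[y][x] = (u, v) means pixel (x, y) in image1 moved to
    (x + u, y + v) in image2. Pixels that map outside image2 become None.
    """
    h, w = len(image2), len(image2[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            u, v = flow[y][x]
            x2, y2 = x + u, y + v
            inside = 0 <= x2 < w and 0 <= y2 < h
            row.append(image2[y2][x2] if inside else None)
        out.append(row)
    return out


image1 = [[1, 2, 3, 0], [4, 5, 6, 0]]
image2 = [[0, 1, 2, 3], [0, 4, 5, 6]]    # same content shifted right by 1
flow = [[(1, 0)] * 4 for _ in range(2)]  # every pixel moves 1 px right
warped = warp_with_flow(image2, flow)
print(warped)
```

Where the flow is correct, the warped image reproduces image1; the last column has no counterpart in image2 and is left as None.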
Camera Pose Estimation
• Estimate camera location and orientation based on sequential frames
Image Reconstruction
• Image denoising
• Super-resolution
• Image inpainting
Image Denoising
• Reconstruct clean images from noisy observations
Super-Resolution
• Reconstruct high resolution images from low resolution images
Image Inpainting
• Recover missing regions in the image
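These three tasks can be viewed as undoing a degradation of a clean image. The sketch below writes down simple, assumed degradation models (additive Gaussian noise, subsampling without anti-aliasing, a rectangular hole); real degradations vary, and the function names are illustrative. Denoising, super-resolution, and inpainting try to recover `clean` from `noisy`, `low_res`, and `holes` respectively.

```python
import random


def add_noise(img, sigma, seed=0):
    """Denoising input: clean image plus i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in img]


def downsample(img, factor):
    """Super-resolution input: keep every `factor`-th pixel (no anti-aliasing)."""
    return [row[::factor] for row in img[::factor]]


def mask_region(img, top, left, height, width):
    """Inpainting input: pixels inside the rectangle are marked missing (None)."""
    return [
        [None if top <= y < top + height and left <= x < left + width else p
         for x, p in enumerate(row)]
        for y, row in enumerate(img)
    ]


clean = [[float(4 * y + x) for x in range(4)] for y in range(4)]
noisy = add_noise(clean, sigma=0.1)
low_res = downsample(clean, 2)           # 2 x 2 image
holes = mask_region(clean, 1, 1, 2, 2)   # 2 x 2 hole in the middle
print(low_res)
```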
Images are represented by pixels
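Concretely, an image is a grid of numbers: a color image stores one (R, G, B) triple per pixel, here with 8-bit values in 0..255. A toy sketch using nested lists, with a grayscale conversion based on the ITU-R BT.601 luma weights (one common choice among several):

```python
# A tiny 2 x 3 RGB image: rows of pixels, each pixel an (R, G, B) triple.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (128, 128, 128), (0, 0, 0)],
]
height, width = len(image), len(image[0])


def to_gray(img):
    """Convert RGB to grayscale with the BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in img
    ]


print(height, width)
print(to_gray(image))
```

Green contributes most to perceived brightness, so the pure green pixel maps to a much lighter gray than pure blue.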
The pinhole camera
Consequences
• Nearby pixels are similar
• Nearby pixels that are not similar tend to lie on different objects
• Idea: To find where one object ends and another begins, look for abrupt changes in color
• Places of color change might correspond to object boundaries
• Object boundaries are a clue to object shape
• Idea: Use rough boundaries to recognize object(s)
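The idea of looking for abrupt changes can be sketched as thresholding the image gradient magnitude of a grayscale image. This is a bare-bones illustration, assuming central differences and a hypothetical threshold; practical detectors such as Canny add smoothing, non-maximum suppression, and hysteresis thresholding.

```python
def gradient_magnitude(img):
    """Approximate the gradient magnitude with central differences.

    img is a grayscale image (list of rows). Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def edge_map(img, threshold):
    """Mark pixels whose gradient magnitude exceeds `threshold` (hypothetical value)."""
    return [[1 if m > threshold else 0 for m in row] for row in gradient_magnitude(img)]


# Dark object (10) on the left, bright object (200) on the right.
img = [[10, 10, 10, 200, 200, 200] for _ in range(5)]
edges = edge_map(img, threshold=20)
for row in edges:
    print(row)
```

The interior rows come out as `[0, 0, 1, 1, 0, 0]`: the only strong response is at the boundary between the two regions.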
Consequences
• Nearby pixels are similar
• Counterexample: camouflage
Consequences
• Farther away objects appear smaller
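Under an ideal pinhole model with focal length f, a camera-frame point (X, Y, Z) projects to x = fX/Z, y = fY/Z, so an object's image size is inversely proportional to its depth. A minimal sketch, ignoring lens distortion and the image inversion of a physical pinhole (a virtual image plane in front of the camera is assumed); the focal length and object height are hypothetical.

```python
def project(point, f):
    """Pinhole projection of a camera-frame point (X, Y, Z), with Z > 0.

    Returns image-plane coordinates (x, y) = (f * X / Z, f * Y / Z).
    """
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def image_height(object_height, depth, f):
    """Projected height of an upright object standing at the given depth."""
    top = project((0.0, object_height, depth), f)
    bottom = project((0.0, 0.0, depth), f)
    return top[1] - bottom[1]


f = 500.0  # hypothetical focal length in pixels
for depth in (2.0, 4.0, 8.0):
    print(depth, image_height(1.7, depth, f))
```

Doubling the depth halves the projected height (425, 212.5, and 106.25 pixels here).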
Consequences
• Image formation is lossy: projecting the 3D world onto a 2D image discards depth
• Idea: use multiple images
• Need to find which pixel in image 2 matches which pixel in image 1: the correspondence problem
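A brute-force way to attack the correspondence problem is to compare a small patch around a pixel in image 1 with candidate patches in image 2 and keep the most similar one. The sketch assumes a rectified stereo pair, so the match lies on the same row, and uses a sum-of-squared-differences (SSD) cost; the window size and helper names are illustrative.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length patches."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def match_along_row(row1, row2, x, half=1):
    """Find the x' in row2 whose patch best matches the patch around x in row1.

    Uses a 1-D window of 2*half+1 pixels and brute-force search.
    """
    patch = row1[x - half: x + half + 1]
    best_x, best_cost = None, float("inf")
    for xp in range(half, len(row2) - half):
        cost = ssd(patch, row2[xp - half: xp + half + 1])
        if cost < best_cost:
            best_x, best_cost = xp, cost
    return best_x


row1 = [0, 0, 5, 9, 5, 0, 0, 0]
row2 = [0, 0, 0, 0, 5, 9, 5, 0]  # same pattern shifted right by 2
print(match_along_row(row1, row2, x=3))  # 5
```

The offset between x and its match (here 2 pixels) is the disparity; in textureless regions many candidates have the same cost, which is exactly where this simple search becomes ambiguous.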
Consequences
• Pixel color is complicated: it depends on illumination, surface material, and viewpoint, not only on the object
• Idea: rely less on absolute color. Look at changes in color instead (these may be object boundaries or changes in paint)
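One way to see why changes can be more robust than absolute values: if the lighting scales all intensities by a common factor k (an assumption; real illumination is rarely this simple), raw differences scale by k too, but differences of log intensities do not, since log(kI) = log k + log I. A small sketch:

```python
import math


def differences(row):
    """Differences between horizontally neighbouring intensities."""
    return [b - a for a, b in zip(row, row[1:])]


def log_differences(row):
    """Differences of log intensities (requires strictly positive values)."""
    return [math.log(b) - math.log(a) for a, b in zip(row, row[1:])]


row = [20.0, 20.0, 80.0, 80.0]     # a boundary between two surfaces
brighter = [3.0 * p for p in row]  # same scene, assumed 3x global light gain

print(differences(row), differences(brighter))          # raw changes scale with the light
print(log_differences(row), log_differences(brighter))  # log changes agree up to rounding
```

The boundary shows up in both representations, but only the log differences report the same strength (log 4) regardless of the lighting level.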
Challenges
• Images are ambiguous
Challenges
• Objects blend together
Challenges
• The many faces of intra-class variance

[Figure panels: shape variation, occlusion, viewpoint variation, scale, background clutter, illumination]
Hard examples
• Concepts are subtle

[Figure: Tennessee Warbler vs. Orange-crowned Warbler]
https://www.allaboutbirds.org
Challenges
• Local ambiguity

Slide credit: Fei-Fei, Fergus & Torralba
