Local Features and Bag of Words Models
Previous Class
Overview and history of recognition
Svetlana Lazebnik
[Figure: example image labels: cloudy, brick]
Svetlana Lazebnik
Object detection
find pedestrians
Svetlana Lazebnik
Image parsing
[Figure: parsed scene with region labels: sky, mountain, building, tree, banner, street lamp, market, people]
Svetlana Lazebnik
Image Categorization
Training: Training Images -> Image Features (with Training Labels) -> Classifier Training -> Trained Classifier
Testing: Test Image -> Image Features -> Trained Classifier -> Prediction (e.g., Outdoor)
Derek Hoiem
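The training and testing stages of image categorization can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the feature (mean and standard deviation of intensity), the nearest-centroid classifier, the toy images, and the "indoor" label are hypothetical choices, not the ones used in the lecture.

```python
import math

def extract_features(image):
    """Toy feature vector: mean and standard deviation of pixel intensities (assumed)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var)]

def train_classifier(features, labels):
    """Nearest-centroid training: average the feature vectors of each label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for d, v in enumerate(f):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(classifier, feature):
    """Return the label whose centroid is closest to the feature vector."""
    return min(classifier, key=lambda y: math.dist(classifier[y], feature))

# Training: training images -> image features (+ training labels) -> trained classifier
train_images = [
    [[200, 210], [220, 230]],  # bright toy image
    [[190, 205], [215, 225]],
    [[20, 30], [40, 35]],      # dark toy image
    [[25, 15], [30, 45]],
]
train_labels = ["outdoor", "outdoor", "indoor", "indoor"]
classifier = train_classifier(
    [extract_features(im) for im in train_images], train_labels
)

# Testing: test image -> image features -> trained classifier -> prediction
test_image = [[205, 215], [210, 225]]
prediction = predict(classifier, extract_features(test_image))
print(prediction)
```

The same feature extractor is applied at training and test time; only the classifier is learned.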
Image representations
Templates
Intensity, gradients, etc.
Histograms
Color, texture, SIFT descriptors, etc.
Global histogram
Represent distribution of features
Color, texture, depth, etc.
Images from Dave Kauchak
Joint histogram
Requires lots of data
Loss of resolution to avoid empty bins
Images from Dave Kauchak
Marginal histogram
Requires independent features
More data/bin than joint histogram
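The trade-off between joint and marginal histograms can be made concrete with a small sketch. The helper names, the bin count, and the toy RGB pixels below are assumptions chosen for illustration.

```python
from collections import Counter

def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi) to one of `bins` equal-width bins."""
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)

def joint_histogram(samples, lo, hi, bins):
    """Count each feature vector in one multi-dimensional cell (bins**d cells)."""
    return Counter(tuple(bin_index(v, lo, hi, bins) for v in s) for s in samples)

def marginal_histograms(samples, lo, hi, bins):
    """One 1-D histogram per feature dimension (d * bins cells in total)."""
    dims = len(samples[0])
    hists = [[0] * bins for _ in range(dims)]
    for s in samples:
        for d, v in enumerate(s):
            hists[d][bin_index(v, lo, hi, bins)] += 1
    return hists

# Toy RGB pixels with channel values in [0, 255]
pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 250), (245, 15, 20)]
bins = 4
joint = joint_histogram(pixels, 0, 256, bins)
marginal = marginal_histograms(pixels, 0, 256, bins)

print(len(joint), "of", bins ** 3, "joint cells are non-empty")
print(marginal)
```

With 4 bins per channel, the joint histogram spreads the data over 64 cells, while the marginals use only 12 cells, so each marginal bin sees more data. The marginals, however, no longer record which red, green, and blue values occurred together.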
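Histogram intersection (assuming normalized histograms):
$$\mathrm{histint}(h_i, h_j) = 1 - \sum_{m=1}^{K} \min\big(h_i(m),\, h_j(m)\big)$$
Chi-squared histogram matching distance:
$$\chi^2(h_i, h_j) = \frac{1}{2} \sum_{m=1}^{K} \frac{\big[h_i(m) - h_j(m)\big]^2}{h_i(m) + h_j(m)}$$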
Few bins: need less data; coarser representation
Many bins: need more data; finer representation
Matching
Histogram intersection or Euclidean distance may be faster
Chi-squared often works better
Earth mover's distance is good when nearby bins represent similar values
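The matching measures listed above can be compared on a pair of toy histograms. The sketch below assumes normalized histograms with K bins. The earth mover's distance is shown only for the 1-D case with unit distance between adjacent bins, where it reduces to a sum of absolute cumulative differences. Function names and data are illustrative.

```python
def normalize(hist):
    """Scale counts so that the bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]

def histogram_intersection_distance(h1, h2):
    """1 - sum of bin-wise minima; 0 for identical normalized histograms."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2))

def chi_squared_distance(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)), skipping bins that are empty in both."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def euclidean_distance(h1, h2):
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def emd_1d(h1, h2):
    """1-D earth mover's distance for equal-mass histograms (assumed special case)."""
    total, carried = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a - b        # mass that must still move to later bins
        total += abs(carried)
    return total

h_a = normalize([4, 2, 2, 0])
h_b = normalize([2, 2, 2, 2])

d_int = histogram_intersection_distance(h_a, h_b)
d_chi = chi_squared_distance(h_a, h_b)
d_euc = euclidean_distance(h_a, h_b)
d_emd = emd_1d(h_a, h_b)
print(d_int, d_chi, d_euc, d_emd)
```

Unlike the bin-wise measures, the 1-D earth mover's distance grows with how far mass has to travel across bins, which is why it suits features where neighboring bins mean similar values.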
Ensure smoothness
Gaussian weight
Trilinear interpolation: a given gradient contributes to 8 bins (4 in space times 2 in orientation)
K. Grauman, B. Leibe
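A minimal sketch of the trilinear interpolation step follows, assuming a SIFT-like layout of 4 x 4 spatial cells and 8 orientation bins. The coordinate convention (bin centers at integer cell coordinates) and the function name are assumptions, and any Gaussian weight is taken to be already folded into `magnitude`.

```python
import math

def soft_bin(x, y, theta, magnitude, hist, n_cells=4, n_orient=8):
    """Spread one gradient sample over its neighboring bins.

    x, y: continuous cell coordinates (bin centers at integers 0..n_cells-1).
    theta: gradient orientation in radians.
    hist[row][col][o] accumulates the weighted magnitude.
    """
    # Continuous orientation bin in [0, n_orient)
    o = (theta % (2 * math.pi)) / (2 * math.pi) * n_orient
    x0, y0, o0 = math.floor(x), math.floor(y), math.floor(o)
    fx, fy, fo = x - x0, y - y0, o - o0
    for iy, wy in ((y0, 1 - fy), (y0 + 1, fy)):
        if not 0 <= iy < n_cells:
            continue  # spatial bins outside the descriptor are dropped
        for ix, wx in ((x0, 1 - fx), (x0 + 1, fx)):
            if not 0 <= ix < n_cells:
                continue
            for io, wo in ((o0, 1 - fo), (o0 + 1, fo)):
                # Orientation bins wrap around
                hist[iy][ix][io % n_orient] += magnitude * wy * wx * wo

hist = [[[0.0] * 8 for _ in range(4)] for _ in range(4)]
soft_bin(1.25, 2.5, math.pi / 8, 1.0, hist)

touched = sum(1 for row in hist for cell in row for v in cell if v > 0)
total = sum(v for row in hist for cell in row for v in cell)
print(touched, total)
```

Because the weights in each dimension sum to 1, the sample's magnitude is preserved while small shifts in position or orientation change the histogram smoothly rather than abruptly.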
Example descriptor
K. Grauman, B. Leibe
Self-similarity Descriptor
Matching Local Self-Similarities across Images and Videos, Shechtman and Irani, 2007
Motion
Optical flow, tracked points
Distance
Stereo, position, occlusion, scene shape
If known object: size, other objects
Bag-of-features models
Svetlana Lazebnik
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Bag-of-features steps
1. Extract features
2. Learn visual vocabulary
3. Quantize features using visual vocabulary
4. Represent images by frequencies of visual words
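Steps 3 and 4 can be sketched as follows, assuming the descriptors have already been extracted and a vocabulary of cluster centers has already been learned. The 2-D descriptors and the 3-word vocabulary are hypothetical toy values.

```python
import math
from collections import Counter

def quantize(descriptor, vocabulary):
    """Index of the visual word (cluster center) nearest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda k: math.dist(descriptor, vocabulary[k]),
    )

def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of visual-word frequencies for one image."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total for k in range(len(vocabulary))]

vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # hypothetical 3-word vocabulary
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]

histogram = bag_of_words(descriptors, vocabulary)
print(histogram)
```

The resulting vector has one entry per visual word and discards where in the image each feature occurred.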
1. Feature extraction
Regular grid or interest regions
Detect patches -> Normalize patch -> Compute descriptor
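The three stages of feature extraction (detect patches, normalize each patch, compute a descriptor) can be sketched for the regular-grid case. The normalization (zero mean, unit L2 norm) and the raw-intensity descriptor are assumptions made for illustration; a real system would typically use an interest-point detector and a descriptor such as SIFT.

```python
import math

def extract_grid_patches(image, patch_size, stride):
    """Detect patches on a regular grid; returns a list of (row, col, patch)."""
    patches = []
    for r in range(0, len(image) - patch_size + 1, stride):
        for c in range(0, len(image[0]) - patch_size + 1, stride):
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append((r, c, patch))
    return patches

def normalize_patch(patch):
    """Subtract the mean intensity and scale to unit L2 norm."""
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else centered

def compute_descriptor(normalized_values):
    """Placeholder descriptor: the normalized intensities themselves."""
    return list(normalized_values)

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
descriptors = [
    compute_descriptor(normalize_patch(patch))
    for _, _, patch in extract_grid_patches(image, patch_size=2, stride=2)
]
print(len(descriptors))
```

In this toy image the four patches differ only by a brightness offset, so after normalization their descriptors coincide.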
[Figure: clustering the extracted features yields the visual vocabulary]
K-means clustering
Want to minimize the sum of squared Euclidean distances between points $x_i$ and their nearest cluster centers $m_k$:
$$D(X, M) = \sum_{\text{cluster } k} \;\; \sum_{\text{point } i \text{ in cluster } k} (x_i - m_k)^2$$
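A minimal sketch of k-means follows, using the standard alternation between assigning points to their nearest center and recomputing each center as the mean of its points (Lloyd's algorithm). The `objective` function evaluates D(X, M), the sum of squared distances from each point to its nearest center. The initialization from random data points, the iteration cap, and the toy data are assumptions.

```python
import math
import random

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))

def objective(points, centers):
    """D(X, M): sum of squared distances from each point to its nearest center."""
    return sum(math.dist(p, centers[nearest(p, centers)]) ** 2 for p in points)

def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data (assumed)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Update step: each center moves to the mean of its points
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = kmeans(points, k=2)
cost = objective(points, centers)
print(sorted(centers), cost)
```

Each step can only lower or preserve the objective, so the procedure converges, though in general only to a local minimum that depends on the initialization. The returned centers serve as the visual words.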
Example codebook
Appearance codebook
Source: B. Leibe
Another codebook
Appearance codebook
Source: B. Leibe
Computational efficiency
Vocabulary trees (Nister & Stewenius, 2006)
Spatial pyramid
level 0
level 1
level 2
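A spatial pyramid over levels 0 to 2 can be sketched as a concatenation of visual-word histograms computed on 1 x 1, 2 x 2, and 4 x 4 grids of image cells. The input format (a list of `(x, y, word)` tuples) is an assumption, and the per-level weights used in the published spatial pyramid kernel are omitted here for brevity.

```python
def spatial_pyramid(features, width, height, vocab_size, levels=3):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids.

    features: list of (x, y, word) with 0 <= x < width and 0 <= y < height.
    Returns a flat list of length vocab_size * (1 + 4 + 16 + ...).
    """
    pyramid = []
    for level in range(levels):
        cells = 2 ** level  # cells per side at this level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in features:
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][word] += 1
        for h in hists:
            pyramid.extend(h)
    return pyramid

# Three quantized features in a 100 x 100 image, vocabulary of 2 words
features = [(10, 10, 0), (90, 10, 1), (90, 90, 0)]
pyramid = spatial_pyramid(features, width=100, height=100, vocab_size=2)
print(len(pyramid), pyramid[:10])
```

Level 0 is the ordinary bag-of-features histogram; the finer levels add coarse information about where in the image each visual word occurs.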
Caltech101 dataset
http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.