
Part 1: Bag-of-words models

by Li Fei-Fei (UIUC)

Related works
• Early “bag of words” models: mostly texture
recognition
– Cula et al. 2001; Leung et al. 2001; Schmid 2001;
Varma et al. 2002, 2003; Lazebnik et al. 2003
• Hierarchical Bayesian models for documents
(pLSA, LDA, etc.)
– Hofmann 1999; Blei et al., 2004; Teh et al. 2004
• Object categorization
– Dorko et al. 2004; Csurka et al. 2003; Sivic et al.
2005; Sudderth et al. 2005;
• Natural scene categorization
– Fei-Fei et al. 2005
Object Bag of ‘words’
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

Bag of 'words': sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

Bag of 'words': China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
[Flow diagram: learning and recognition]
feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers → category decision
Representation

[Flow diagram]
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
1. Feature detection and representation

• Regular grid
– Vogel et al. 2003
– Fei-Fei et al. 2005
• Interest point detector
– Csurka et al. 2004
– Fei-Fei et al. 2005
– Sivic et al. 2005
• Other methods
– Random sampling (Ullman et al. 2002)
– Segmentation based patches (Barnard et al. 2003)
1. Feature detection and representation

[Pipeline figure] Detect patches [Mikolajczyk and Schmid ’02; Matas et al. ’02; Sivic et al. ’03] → Normalize patch → Compute SIFT descriptor [Lowe ’99]

Slide credit: Josef Sivic
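A minimal sketch of this detection-and-description step, assuming OpenCV's SIFT implementation; the image path, grid step, and patch size are illustrative values, not choices from the slides:

```python
import cv2

# Minimal sketch of step 1 (assumed OpenCV implementation, not the course code):
# detect patches and compute a SIFT descriptor [Lowe '99] for each.
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

# (a) Interest point detector + descriptor
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)   # descriptors: (num_patches, 128)

# (b) Regular-grid sampling: keypoints every `step` pixels at a fixed scale
step, size = 16, 16.0
grid_kps = [cv2.KeyPoint(float(x), float(y), size)
            for y in range(0, img.shape[0], step)
            for x in range(0, img.shape[1], step)]
_, grid_descriptors = sift.compute(img, grid_kps)
```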


1. Feature detection and representation


2. Codewords dictionary formation

Vector quantization

Slide credit: Josef Sivic


2. Codewords dictionary formation

Fei-Fei et al. 2005


Image patch examples of codewords

Sivic et al. 2005
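The dictionary is formed by clustering patch descriptors; a minimal sketch, assuming k-means as the vector quantizer and an arbitrary dictionary size K (the cited papers differ in both choices, and `train_descriptors.npy` is a hypothetical precomputed file):

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal sketch of step 2: vector-quantize patch descriptors into K codewords.
# `all_descriptors` is assumed to stack the SIFT descriptors of many training
# images (shape: total_patches x 128); K = 300 is an illustrative dictionary size.
K = 300
all_descriptors = np.load("train_descriptors.npy")      # hypothetical precomputed file
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(all_descriptors)
codebook = kmeans.cluster_centers_                       # (K, 128) codewords dictionary

# A new descriptor is represented by the index of its nearest codeword:
word_ids = kmeans.predict(all_descriptors[:5])
```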


3. Image representation

[Figure: histogram of codeword frequencies — x-axis: codewords, y-axis: frequency]
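A minimal sketch of this representation, reusing the hypothetical `kmeans` quantizer from the previous sketch to turn one image's descriptors into a normalized codeword histogram:

```python
import numpy as np

# Minimal sketch of step 3: one image becomes a K-dimensional histogram of
# codeword frequencies.  `descriptors` is that image's (num_patches, 128) SIFT
# matrix and `kmeans` is the fitted quantizer from the previous sketch.
def bag_of_words(descriptors, kmeans, K):
    word_ids = kmeans.predict(descriptors)               # nearest codeword per patch
    hist = np.bincount(word_ids, minlength=K).astype(float)
    return hist / max(hist.sum(), 1.0)                   # normalized frequency histogram
```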
Representation

[Flow diagram]
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
Learning and Recognition

[Flow diagram]
codewords dictionary → category models (and/or) classifiers → category decision
2 case studies

1. Naïve Bayes classifier
– Csurka et al. 2004

2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001; Blei et al. 2004
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
First, some notations

• wn: each patch in an image
– wn = [0,0,…,1,…,0,0]T, a binary indicator vector over the codewords dictionary
• w: a collection of all N patches in an image
– w = [w1, w2, …, wN]
• dj: the jth image in an image collection
• c: category of the image
• z: theme or topic of the patch
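In code terms, each patch is a one-hot vector over the dictionary; a minimal sketch, assuming a dictionary of K codewords and illustrative patch-to-codeword indices:

```python
import numpy as np

# Minimal sketch of the notation above: w_n is a one-hot vector over K codewords,
# and w stacks the N patch vectors of one image (K and the indices are illustrative).
K = 300
word_ids = [17, 42, 17, 255]                 # codeword index of each of N = 4 patches
w = np.zeros((len(word_ids), K))             # w = [w_1, w_2, ..., w_N]
w[np.arange(len(word_ids)), word_ids] = 1.0  # each row w_n = [0,...,1,...,0]^T
```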
Case #1: the Naïve Bayes model

[Graphical model] c → w, plate N (the N patches of an image)

$c^{*} = \arg\max_{c} p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_{n} \mid c)$

Object class decision $c^{*}$; prior prob. of the object classes $p(c)$; image likelihood given the class $p(\mathbf{w} \mid c)$.

Csurka et al. 2004
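A minimal sketch of this decision rule (an illustration, not Csurka et al.'s implementation), assuming codeword count histograms as input and Laplace smoothing for $p(w_{n} \mid c)$:

```python
import numpy as np

# Minimal sketch of the decision rule above (illustration, not Csurka et al.'s
# code).  train_hists: (num_images, K) codeword count histograms; train_labels:
# (num_images,) class indices; every class is assumed to appear in training.
def train_naive_bayes(train_hists, train_labels, num_classes, alpha=1.0):
    K = train_hists.shape[1]
    priors = np.zeros(num_classes)
    word_probs = np.zeros((num_classes, K))
    for c in range(num_classes):
        counts = train_hists[train_labels == c].sum(axis=0)
        word_probs[c] = (counts + alpha) / (counts.sum() + alpha * K)  # Laplace-smoothed p(w|c)
        priors[c] = np.mean(train_labels == c)                         # p(c)
    return priors, word_probs

def classify(test_hist, priors, word_probs):
    # log p(c) + sum_n log p(w_n | c): repeated codewords enter through their counts
    log_post = np.log(priors) + test_hist @ np.log(word_probs).T
    return int(np.argmax(log_post))
```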


[Result figures: Csurka et al. 2004]
Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA)
[Graphical model] d → z → w, plates N and D (Hofmann, 2001)

Latent Dirichlet Allocation (LDA)
[Graphical model] c → π → z → w, plates N and D (Blei et al., 2001)
Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA)
[Graphical model] d → z → w, plates N and D
“face” example: Sivic et al. ICCV 2005

Case #2: Hierarchical Bayesian text models

Latent Dirichlet Allocation (LDA)
[Graphical model] c → π → z → w, plates N and D
“beach” example: Fei-Fei et al. ICCV 2005
Case #2: the pLSA model

[Graphical model] d → z → w, plates N and D

$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

Observed codeword distributions $p(w_i \mid d_j)$; codeword distributions per theme (topic) $p(w_i \mid z_k)$; theme distributions per image $p(z_k \mid d_j)$.

Slide credit: Josef Sivic
Case #2: Recognition using pLSA

$z^{*} = \arg\max_{z} p(z \mid d)$

Slide credit: Josef Sivic
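A minimal sketch of this recognition step, assuming the per-theme codeword distributions are already learned and held fixed while the theme mixture of a new image is estimated by "folding in"; all array shapes and values below are placeholders:

```python
import numpy as np

# Minimal sketch of recognition by "folding in" a new image (all values are
# placeholders): p_w_z stands in for learned p(w|z), counts_new for the new
# image's codeword counts; only p(z|d_new) is re-estimated, p(w|z) stays fixed.
rng = np.random.default_rng(0)
K, M = 5, 300
p_w_z = rng.dirichlet(np.ones(M), size=K)          # (K, M) stand-in for learned p(w|z)
counts_new = rng.integers(0, 5, size=M).astype(float)

p_z_dnew = np.full(K, 1.0 / K)                     # theme mixture of the new image
for _ in range(100):
    post = p_z_dnew[:, None] * p_w_z               # unnormalized p(z | d_new, w), (K, M)
    post /= post.sum(axis=0, keepdims=True) + 1e-12
    p_z_dnew = (counts_new[None, :] * post).sum(axis=1)
    p_z_dnew /= p_z_dnew.sum()

best_theme = int(np.argmax(p_z_dnew))              # z* = argmax_z p(z | d_new)
```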


Case #2: Learning the pLSA parameters

Maximize likelihood of data using EM:

$\mathcal{L} = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{\,n(w_i, d_j)}$

where $n(w_i, d_j)$ is the observed count of word $i$ in document $j$, M is the number of codewords, and N is the number of images.

Slide credit: Josef Sivic
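A minimal sketch of this EM procedure under assumed array shapes (a generic pLSA fit, not the course's do_plsa code): `n` holds the D×M codeword counts, `p_w_z` the K×M per-theme codeword distributions, and `p_z_d` the D×K per-image theme distributions.

```python
import numpy as np

# Minimal EM sketch for pLSA under assumed shapes (not the course code):
# n:     (D, M) observed codeword counts per image
# p_w_z: (K, M) codeword distributions per theme   p(w | z)
# p_z_d: (D, K) theme distributions per image      p(z | d)
def fit_plsa(n, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    D, M = n.shape
    p_w_z = rng.dirichlet(np.ones(M), size=K)          # (K, M)
    p_z_d = rng.dirichlet(np.ones(K), size=D)          # (D, K)
    for _ in range(iters):
        # E-step: p(z | d, w) proportional to p(z | d) p(w | z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # (D, K, M)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight posteriors by the observed counts n(w, d)
        weighted = n[:, None, :] * post                # (D, K, M)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_w_z, p_z_d
```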
Demo

• Course website
task: face detection – no labeling
Demo: feature detection
• Output of crude feature detector
– Find edges
– Draw points randomly from edge set
– Draw from uniform distribution to get scale
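A minimal sketch of such a crude detector (the Canny thresholds and scale range are illustrative assumptions, not the demo's values):

```python
import cv2
import numpy as np

# Minimal sketch of the crude detector described above: find edges, sample
# points from the edge set, and draw each point's scale from a uniform
# distribution.  `gray` is a uint8 grayscale image.
def crude_detect(gray, num_points=200, scale_range=(10, 30), seed=0):
    rng = np.random.default_rng(seed)
    edges = cv2.Canny(gray, 100, 200)                  # binary edge map
    ys, xs = np.nonzero(edges)
    idx = rng.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    scales = rng.uniform(*scale_range, size=len(idx))  # one scale per sampled point
    return [cv2.KeyPoint(float(xs[i]), float(ys[i]), float(s))
            for i, s in zip(idx, scales)]
```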
Demo: learnt parameters
• Learning the model: do_plsa(‘config_file_1’)
• Evaluate and visualize the model: do_plsa_evaluation(‘config_file_1’)

Codeword distributions per theme (topic): $p(w \mid z)$
Theme distributions per image: $p(z \mid d)$
Demo: recognition examples
Demo: categorization results

• Performance of each theme


Demo: naïve Bayes
• Learning the model: do_naive_bayes(‘config_file_2’)
• Evaluate and visualize the model: do_naive_bayes_evaluation(‘config_file_2’)
Learning and Recognition

[Flow diagram]
codewords dictionary → category models (and/or) classifiers → category decision
Invariance issues
• Scale and rotation
– Implicit
– Detectors and descriptors

Kadir and Brady, 2003


Invariance issues
• Scale and rotation
• Occlusion
– Implicit in the models
– Codeword distribution: small variations
– (In theory) Theme (z) distribution: different occlusion patterns
Invariance issues
• Scale and rotation
• Occlusion
• Translation
– Encode (relative) location information

Sudderth et al. 2005


Invariance issues
• Scale and rotation
• Occlusion
• Translation
• View point (in theory)
– Codewords: detector and descriptor
– Theme distributions: different view points

Fergus et al. 2005


Model properties

• Intuitive
– Analogy to documents

[Background figure: the visual-perception document and its bag of words, as on the earlier “Analogy to documents” slide]
Model properties

• Intuitive
• (Could use) generative models
– Convenient for weakly- or un-supervised training
– Prior information
– Hierarchical Bayesian framework (Sivic et al., 2005; Sudderth et al., 2005)
Model properties

• Intuitive
• (Could use) generative models
• Learning and recognition relatively fast
– Compared to other methods
Weakness of the model

• No rigorous geometric information about the object components
• It’s intuitive to most of us that objects are made of parts, but the model carries no such information
• Not extensively tested yet for
– View point invariance
– Scale invariance
• Segmentation and localization unclear
