
Part 1: Bag-of-words models

by Li Fei-Fei (UIUC)

Related works
• Early “bag of words” models: mostly texture
recognition
– Cula et al. 2001; Leung et al. 2001; Schmid 2001;
Varma et al. 2002, 2003; Lazebnik et al. 2003
• Hierarchical Bayesian models for documents
(pLSA, LDA, etc.)
– Hofmann 1999; Blei et al., 2004; Teh et al. 2004
• Object categorization
– Dorko et al. 2004; Csurka et al. 2003; Sivic et al.
2005; Sudderth et al. 2005;
• Natural scene categorization
– Fei-Fei et al. 2005
Object Bag of ‘words’
Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

Bag of 'words': sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.

Bag of 'words': China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
[Flow diagram: learning and recognition]
feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers → category decision
Representation

[Flow diagram]
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
1. Feature detection and representation

• Regular grid
– Vogel et al. 2003
– Fei-Fei et al. 2005
• Interest point detector
– Csurka et al. 2004
– Fei-Fei et al. 2005
– Sivic et al. 2005
• Other methods
– Random sampling (Ullman et al. 2002)
– Segmentation based patches (Barnard et al. 2003)
1. Feature detection and representation

[Pipeline figure] Detect patches [Mikolajczyk and Schmid ’02; Matas et al. ’02; Sivic et al. ’03] → Normalize patch → Compute SIFT descriptor [Lowe ’99]

Slide credit: Josef Sivic
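A minimal sketch of this detection-and-description step, assuming OpenCV's SIFT implementation; the image path, grid step, and patch size are illustrative values, not choices from the slides:

```python
import cv2

# Minimal sketch of step 1 (assumed OpenCV implementation, not the course code):
# detect patches and compute a SIFT descriptor [Lowe '99] for each.
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

# (a) Interest point detector + descriptor
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)   # descriptors: (num_patches, 128)

# (b) Regular-grid sampling: keypoints every `step` pixels at a fixed scale
step, size = 16, 16.0
grid_kps = [cv2.KeyPoint(float(x), float(y), size)
            for y in range(0, img.shape[0], step)
            for x in range(0, img.shape[1], step)]
_, grid_descriptors = sift.compute(img, grid_kps)
```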


1. Feature detection and representation


2. Codewords dictionary formation

Vector quantization

Slide credit: Josef Sivic


2. Codewords dictionary formation

Fei-Fei et al. 2005


Image patch examples of codewords

Sivic et al. 2005
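The dictionary is formed by clustering patch descriptors; a minimal sketch, assuming k-means as the vector quantizer and an arbitrary dictionary size K (the cited papers differ in both choices, and `train_descriptors.npy` is a hypothetical precomputed file):

```python
import numpy as np
from sklearn.cluster import KMeans

# Minimal sketch of step 2: vector-quantize patch descriptors into K codewords.
# `all_descriptors` is assumed to stack the SIFT descriptors of many training
# images (shape: total_patches x 128); K = 300 is an illustrative dictionary size.
K = 300
all_descriptors = np.load("train_descriptors.npy")      # hypothetical precomputed file
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(all_descriptors)
codebook = kmeans.cluster_centers_                       # (K, 128) codewords dictionary

# A new descriptor is represented by the index of its nearest codeword:
word_ids = kmeans.predict(all_descriptors[:5])
```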


3. Image representation

[Figure: histogram of codeword frequencies — x-axis: codewords, y-axis: frequency]
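A minimal sketch of this representation, reusing the hypothetical `kmeans` quantizer from the previous sketch to turn one image's descriptors into a normalized codeword histogram:

```python
import numpy as np

# Minimal sketch of step 3: one image becomes a K-dimensional histogram of
# codeword frequencies.  `descriptors` is that image's (num_patches, 128) SIFT
# matrix and `kmeans` is the fitted quantizer from the previous sketch.
def bag_of_words(descriptors, kmeans, K):
    word_ids = kmeans.predict(descriptors)               # nearest codeword per patch
    hist = np.bincount(word_ids, minlength=K).astype(float)
    return hist / max(hist.sum(), 1.0)                   # normalized frequency histogram
```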
Representation

[Flow diagram]
1. Feature detection & representation
2. Codewords dictionary
3. Image representation
Learning and Recognition

[Flow diagram]
codewords dictionary → category models (and/or) classifiers → category decision
2 case studies

1. Naïve Bayes classifier
– Csurka et al. 2004

2. Hierarchical Bayesian text models (pLSA and LDA)
– Background: Hofmann 2001; Blei et al. 2004
– Object categorization: Sivic et al. 2005; Sudderth et al. 2005
– Natural scene categorization: Fei-Fei et al. 2005
First, some notations

• wn: each patch in an image
– wn = [0,0,…,1,…,0,0]T, a binary indicator vector over the codewords dictionary
• w: a collection of all N patches in an image
– w = [w1, w2, …, wN]
• dj: the jth image in an image collection
• c: category of the image
• z: theme or topic of the patch
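In code terms, each patch is a one-hot vector over the dictionary; a minimal sketch, assuming a dictionary of K codewords and illustrative patch-to-codeword indices:

```python
import numpy as np

# Minimal sketch of the notation above: w_n is a one-hot vector over K codewords,
# and w stacks the N patch vectors of one image (K and the indices are illustrative).
K = 300
word_ids = [17, 42, 17, 255]                 # codeword index of each of N = 4 patches
w = np.zeros((len(word_ids), K))             # w = [w_1, w_2, ..., w_N]
w[np.arange(len(word_ids)), word_ids] = 1.0  # each row w_n = [0,...,1,...,0]^T
```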
Case #1: the Naïve Bayes model

[Graphical model] c → w, plate N (the N patches of an image)

$c^{*} = \arg\max_{c} p(c \mid \mathbf{w}) \propto p(c)\, p(\mathbf{w} \mid c) = p(c) \prod_{n=1}^{N} p(w_{n} \mid c)$

Object class decision $c^{*}$; prior prob. of the object classes $p(c)$; image likelihood given the class $p(\mathbf{w} \mid c)$.

Csurka et al. 2004
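A minimal sketch of this decision rule (an illustration, not Csurka et al.'s implementation), assuming codeword count histograms as input and Laplace smoothing for $p(w_{n} \mid c)$:

```python
import numpy as np

# Minimal sketch of the decision rule above (illustration, not Csurka et al.'s
# code).  train_hists: (num_images, K) codeword count histograms; train_labels:
# (num_images,) class indices; every class is assumed to appear in training.
def train_naive_bayes(train_hists, train_labels, num_classes, alpha=1.0):
    K = train_hists.shape[1]
    priors = np.zeros(num_classes)
    word_probs = np.zeros((num_classes, K))
    for c in range(num_classes):
        counts = train_hists[train_labels == c].sum(axis=0)
        word_probs[c] = (counts + alpha) / (counts.sum() + alpha * K)  # Laplace-smoothed p(w|c)
        priors[c] = np.mean(train_labels == c)                         # p(c)
    return priors, word_probs

def classify(test_hist, priors, word_probs):
    # log p(c) + sum_n log p(w_n | c): repeated codewords enter through their counts
    log_post = np.log(priors) + test_hist @ np.log(word_probs).T
    return int(np.argmax(log_post))
```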


[Result figures: Csurka et al. 2004]
Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA)
[Graphical model] d → z → w, plates N and D (Hofmann, 2001)

Latent Dirichlet Allocation (LDA)
[Graphical model] c → π → z → w, plates N and D (Blei et al., 2001)
Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA)
[Graphical model] d → z → w, plates N and D
“face” example: Sivic et al. ICCV 2005

Case #2: Hierarchical Bayesian text models

Latent Dirichlet Allocation (LDA)
[Graphical model] c → π → z → w, plates N and D
“beach” example: Fei-Fei et al. ICCV 2005
Case #2: the pLSA model

[Graphical model] d → z → w, plates N and D

$p(w_i \mid d_j) = \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j)$

Observed codeword distributions $p(w_i \mid d_j)$; codeword distributions per theme (topic) $p(w_i \mid z_k)$; theme distributions per image $p(z_k \mid d_j)$.

Slide credit: Josef Sivic
Case #2: Recognition using pLSA

$z^{*} = \arg\max_{z} p(z \mid d)$

Slide credit: Josef Sivic
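A minimal sketch of this recognition step, assuming the per-theme codeword distributions are already learned and held fixed while the theme mixture of a new image is estimated by "folding in"; all array shapes and values below are placeholders:

```python
import numpy as np

# Minimal sketch of recognition by "folding in" a new image (all values are
# placeholders): p_w_z stands in for learned p(w|z), counts_new for the new
# image's codeword counts; only p(z|d_new) is re-estimated, p(w|z) stays fixed.
rng = np.random.default_rng(0)
K, M = 5, 300
p_w_z = rng.dirichlet(np.ones(M), size=K)          # (K, M) stand-in for learned p(w|z)
counts_new = rng.integers(0, 5, size=M).astype(float)

p_z_dnew = np.full(K, 1.0 / K)                     # theme mixture of the new image
for _ in range(100):
    post = p_z_dnew[:, None] * p_w_z               # unnormalized p(z | d_new, w), (K, M)
    post /= post.sum(axis=0, keepdims=True) + 1e-12
    p_z_dnew = (counts_new[None, :] * post).sum(axis=1)
    p_z_dnew /= p_z_dnew.sum()

best_theme = int(np.argmax(p_z_dnew))              # z* = argmax_z p(z | d_new)
```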


Case #2: Learning the pLSA parameters

Maximize likelihood of data using EM:

$\mathcal{L} = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{\,n(w_i, d_j)}$

where $n(w_i, d_j)$ is the observed count of word $i$ in document $j$, M is the number of codewords, and N is the number of images.

Slide credit: Josef Sivic
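A minimal sketch of this EM procedure under assumed array shapes (a generic pLSA fit, not the course's do_plsa code): `n` holds the D×M codeword counts, `p_w_z` the K×M per-theme codeword distributions, and `p_z_d` the D×K per-image theme distributions.

```python
import numpy as np

# Minimal EM sketch for pLSA under assumed shapes (not the course code):
# n:     (D, M) observed codeword counts per image
# p_w_z: (K, M) codeword distributions per theme   p(w | z)
# p_z_d: (D, K) theme distributions per image      p(z | d)
def fit_plsa(n, K, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    D, M = n.shape
    p_w_z = rng.dirichlet(np.ones(M), size=K)          # (K, M)
    p_z_d = rng.dirichlet(np.ones(K), size=D)          # (D, K)
    for _ in range(iters):
        # E-step: p(z | d, w) proportional to p(z | d) p(w | z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # (D, K, M)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight posteriors by the observed counts n(w, d)
        weighted = n[:, None, :] * post                # (D, K, M)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_w_z, p_z_d
```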
Demo

• Course website
task: face detection – no labeling
Demo: feature detection
• Output of crude feature detector
– Find edges
– Draw points randomly from edge set
– Draw from uniform distribution to get scale
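A minimal sketch of such a crude detector (the Canny thresholds and scale range are illustrative assumptions, not the demo's values):

```python
import cv2
import numpy as np

# Minimal sketch of the crude detector described above: find edges, sample
# points from the edge set, and draw each point's scale from a uniform
# distribution.  `gray` is a uint8 grayscale image.
def crude_detect(gray, num_points=200, scale_range=(10, 30), seed=0):
    rng = np.random.default_rng(seed)
    edges = cv2.Canny(gray, 100, 200)                  # binary edge map
    ys, xs = np.nonzero(edges)
    idx = rng.choice(len(xs), size=min(num_points, len(xs)), replace=False)
    scales = rng.uniform(*scale_range, size=len(idx))  # one scale per sampled point
    return [cv2.KeyPoint(float(xs[i]), float(ys[i]), float(s))
            for i, s in zip(idx, scales)]
```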
Demo: learnt parameters
• Learning the model: do_plsa(‘config_file_1’)
• Evaluate and visualize the model: do_plsa_evaluation(‘config_file_1’)

Codeword distributions per theme (topic): $p(w \mid z)$
Theme distributions per image: $p(z \mid d)$
Demo: recognition examples
Demo: categorization results

• Performance of each theme


Demo: naïve Bayes
• Learning the model: do_naive_bayes(‘config_file_2’)
• Evaluate and visualize the model: do_naive_bayes_evaluation(‘config_file_2’)
Learning and Recognition

[Flow diagram]
codewords dictionary → category models (and/or) classifiers → category decision
Invariance issues
• Scale and rotation
– Implicit
– Detectors and descriptors

Kadir and Brady, 2003


Invariance issues
• Scale and rotation
• Occlusion
– Implicit in the models
– Codeword distribution: small variations
– (In theory) Theme (z) distribution: different occlusion patterns
Invariance issues
• Scale and rotation
• Occlusion
• Translation
– Encode (relative) location information

Sudderth et al. 2005


Invariance issues
• Scale and rotation
• Occlusion
• Translation
• View point (in theory)
– Codewords: detector and descriptor
– Theme distributions: different view points

Fergus et al. 2005


Model properties

• Intuitive
– Analogy to documents

[Background figure: the visual-perception document and its bag of words, as on the earlier “Analogy to documents” slide]
Model properties

• Intuitive
• (Could use) generative models
– Convenient for weakly- or un-supervised training
– Prior information
– Hierarchical Bayesian framework (Sivic et al., 2005; Sudderth et al., 2005)
Model properties

• Intuitive
• (Could use) generative models
• Learning and recognition relatively fast
– Compared to other methods
Weakness of the model

• No rigorous geometric information about the object components
• It’s intuitive to most of us that objects are made of parts, but the model carries no such information
• Not extensively tested yet for
– View point invariance
– Scale invariance
• Segmentation and localization unclear
