Part 1: Bag-of-Words Models
Related works
• Early “bag of words” models: mostly texture
recognition
– Cula et al. 2001; Leung et al. 2001; Schmid 2001;
Varma et al. 2002, 2003; Lazebnik et al. 2003
• Hierarchical Bayesian models for documents
(pLSA, LDA, etc.)
– Hofmann 1999; Blei et al. 2004; Teh et al. 2004
• Object categorization
– Dorko et al. 2004; Csurka et al. 2003; Sivic et al.
2005; Sudderth et al. 2005;
• Natural scene categorization
– Fei-Fei et al. 2005
Object Bag of ‘words’
Analogy to documents
Each document reduces to a bag of its salient words:

Document 1:
"Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image."
Bag of words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical, nerve, image, Hubel, Wiesel

Document 2:
"China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value."
Bag of words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, value
[Overview figure: 1. feature detection & representation → 2. codewords dictionary formation → 3. image representation → learning and recognition]
1. Feature detection and representation
• Regular grid
– Vogel et al. 2003
– Fei-Fei et al. 2005
• Interest point detector
– Csurka et al. 2004
– Fei-Fei et al. 2005
– Sivic et al. 2005
• Other methods
– Random sampling (Ullman et al. 2002)
– Segmentation based patches (Barnard et al. 2003)
1. Feature detection and representation
• Detect patches
– [Mikolajczyk and Schmid ’02], [Matas et al. ’02], [Sivic et al. ’03]
• Normalize patch
• Compute SIFT descriptor
– [Lowe ’99]
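The detect → normalize → describe pipeline can be sketched with a much-simplified, SIFT-like descriptor. This is not Lowe's full algorithm (no difference-of-Gaussian detection, no dominant-orientation alignment, no trilinear interpolation); it only illustrates the gradient-orientation-histogram idea, in plain NumPy, and all function names here are our own:

```python
import numpy as np

def normalize_patch(patch, size=16):
    """Resample a square patch to a fixed size and standardize its intensities."""
    h, w = patch.shape
    rows = (np.arange(size) * h / size).astype(int)   # nearest-neighbour sampling
    cols = (np.arange(size) * w / size).astype(int)
    p = patch[np.ix_(rows, cols)].astype(float)
    return (p - p.mean()) / (p.std() + 1e-8)

def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """SIFT-style descriptor: per-cell orientation histograms, weighted by gradient magnitude."""
    p = normalize_patch(patch, size=16)
    gy, gx = np.gradient(p)
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), 2 * np.pi)        # orientation in [0, 2*pi)
    bins = (ori / (2 * np.pi) * n_bins).astype(int) % n_bins
    cell = 16 // n_cells
    desc = np.zeros(n_cells * n_cells * n_bins)
    for i in range(n_cells):
        for j in range(n_cells):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(), minlength=n_bins)
            desc[(i * n_cells + j) * n_bins:(i * n_cells + j + 1) * n_bins] = hist
    return desc / (np.linalg.norm(desc) + 1e-8)        # unit-normalize, as in SIFT

patch = np.random.default_rng(0).random((24, 24))      # stand-in for a detected patch
d = sift_like_descriptor(patch)
print(d.shape)  # (128,) -- same dimensionality as a standard SIFT descriptor
```

With 4×4 cells and 8 orientation bins the descriptor has 128 dimensions, matching standard SIFT.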
2. Codewords dictionary formation
• Vector quantization of the patch descriptors → codewords
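The slides say only "vector quantization"; k-means is the usual choice for building the dictionary, so here is a minimal Lloyd's-iteration sketch (the toy data and function names are ours, not from the course code):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means: returns codewords (cluster centres) and assignments."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # init from the data
    for _ in range(n_iter):
        # assign each descriptor to its nearest centre
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # recompute each centre as the mean of its assigned descriptors
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return centers, labels

# toy "descriptors": two well-separated blobs -> a 2-word dictionary
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
codewords, labels = kmeans(X, k=2)
print(codewords.shape)  # (2, 2): two codewords in 2-D descriptor space
```

In practice the descriptors would be 128-D SIFT vectors and k would be in the hundreds or thousands.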
3. Image representation
• Each image becomes a histogram of codeword occurrences over the codewords dictionary
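Given a dictionary, the representation step reduces each image to a histogram over codewords: every descriptor is mapped to its nearest codeword and the counts are normalized. A minimal sketch (helper name and toy values are ours):

```python
import numpy as np

def bow_histogram(descriptors, codewords):
    """Represent an image as a normalized histogram of codeword counts."""
    d2 = ((descriptors[:, None, :] - codewords[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)                       # nearest codeword for each descriptor
    hist = np.bincount(words, minlength=len(codewords)).astype(float)
    return hist / hist.sum()

# toy 2-D "descriptors" and a 3-word dictionary
codewords = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
descs = np.array([[0.1, -0.1], [4.9, 5.2], [5.1, 4.8], [9.8, 0.3]])
h = bow_histogram(descs, codewords)
print(h)  # [0.25 0.5 0.25]
```

This histogram is the "bag of words" that the learning stage operates on; all spatial layout of the patches is discarded.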
Learning and Recognition

Case #1: Naïve Bayes classifier
[graphical model: c → w, plate N]
c* = argmax_c p(c|w) ∝ p(c) p(w|c) = p(c) ∏_{n=1}^{N} p(w_n|c)

Case #2: Hierarchical Bayesian text models
[graphical model: d → z → w, plates N, D] — pLSA (Hofmann, 2001)
[graphical model: c → z → w, plates N, D] — LDA (Blei et al., 2001)
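Case #1's decision rule, c* = argmax_c p(c) ∏_n p(w_n|c), can be sketched directly on codeword counts. The Laplace smoothing term `alpha` is our addition (to avoid zero probabilities), not part of the slide's formula:

```python
import numpy as np

def train_nb(counts_per_class, alpha=1.0):
    """Per-class codeword distributions p(w|c), with Laplace smoothing (our addition)."""
    C = np.asarray(counts_per_class, dtype=float) + alpha
    return C / C.sum(axis=1, keepdims=True)

def classify_nb(word_counts, p_w_given_c, p_c):
    """c* = argmax_c [ log p(c) + sum_w count(w) * log p(w|c) ]."""
    log_post = np.log(p_c) + word_counts @ np.log(p_w_given_c).T
    return int(np.argmax(log_post))

# toy dictionary of 3 codewords; class 0 favours word 0, class 1 favours word 2
train_counts = [[90, 5, 5], [5, 5, 90]]
p_w_given_c = train_nb(train_counts)
p_c = np.array([0.5, 0.5])
print(classify_nb(np.array([8, 1, 1]), p_w_given_c, p_c))  # 0
```

Working in log space turns the product over words into a sum and avoids numerical underflow for long codeword sequences.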
Case #2: Hierarchical Bayesian text models

Probabilistic Latent Semantic Analysis (pLSA):
[graphical model: d → z → w, plates N, D]

Theme model for scene categories such as “face” and “beach”:
[graphical model: c → z → w, plates N, D] — Fei-Fei et al. ICCV 2005
Case #2: the pLSA model
[graphical model: d → z → w, plates N, D]

p(w_i | d_j) = Σ_{k=1}^{K} p(w_i | z_k) p(z_k | d_j)

Recognition: z* = argmax_z p(z | d)

M … number of codewords
N … number of images

Slide credit: Josef Sivic
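The pLSA factorization p(w|d) ≈ Σ_k p(w|z_k) p(z_k|d) is fit by EM. A minimal NumPy sketch (the toy count matrix and variable names are ours; a practical implementation would also monitor the log-likelihood for convergence):

```python
import numpy as np

def plsa(counts, K, n_iter=100, seed=0):
    """EM for pLSA on a (D documents x M codewords) count matrix.

    Returns p_w_z of shape (M, K) and p_z_d of shape (K, D).
    """
    D, M = counts.shape
    rng = np.random.default_rng(seed)
    p_w_z = rng.random((M, K)); p_w_z /= p_w_z.sum(0)        # p(w|z)
    p_z_d = rng.random((K, D)); p_z_d /= p_z_d.sum(0)        # p(z|d)
    for _ in range(n_iter):
        # E-step: posterior p(z|d,w) for every (d, w) pair, shape (D, M, K)
        joint = p_w_z[None, :, :] * p_z_d.T[:, None, :]
        p_z_dw = joint / (joint.sum(-1, keepdims=True) + 1e-12)
        # M-step: re-estimate both factors from expected topic counts
        nz = counts[:, :, None] * p_z_dw
        p_w_z = nz.sum(0); p_w_z /= p_w_z.sum(0) + 1e-12
        p_z_d = nz.sum(1).T; p_z_d /= p_z_d.sum(0) + 1e-12
    return p_w_z, p_z_d

# two "themes": docs 0-1 use codewords 0-1, docs 2-3 use codewords 2-3
counts = np.array([[20, 10, 0, 0], [15, 15, 0, 0],
                   [0, 0, 10, 20], [0, 0, 15, 15]], dtype=float)
p_w_z, p_z_d = plsa(counts, K=2)
print(np.round(p_z_d, 2))  # rows: topics z_k, columns: documents
```

Recognition in the demo then amounts to folding a new image into the model and taking z* = argmax_z p(z|d).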
Demo
• Course website
task: face detection – no labeling
Demo: feature detection
• Output of crude feature detector
– Find edges
– Draw points randomly from edge set
– Draw from uniform distribution to get scale
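The three steps of the crude detector can be sketched as follows; the gradient-magnitude threshold here stands in for whatever edge detector the demo code actually uses, and the parameter values are our assumptions:

```python
import numpy as np

def crude_feature_detector(image, n_points=100, scale_range=(4, 32), seed=0):
    """Crude detector from the demo: edge map -> random edge points -> random scales."""
    rng = np.random.default_rng(seed)
    # 1. find edges: threshold the gradient magnitude (simple stand-in edge detector)
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    edge_rows, edge_cols = np.nonzero(mag > mag.mean() + mag.std())
    # 2. draw interest points uniformly at random from the edge set
    idx = rng.choice(len(edge_rows), size=min(n_points, len(edge_rows)), replace=False)
    # 3. draw a scale for each point from a uniform distribution
    scales = rng.uniform(*scale_range, size=len(idx))
    return np.column_stack([edge_rows[idx], edge_cols[idx], scales])

# toy image: a bright square on a dark background -> edges along its border
img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0
keypoints = crude_feature_detector(img, n_points=20)
print(keypoints.shape)  # (20, 3): one (row, col, scale) triple per keypoint
```

Each (row, col, scale) triple defines a patch that then goes through the descriptor and codeword-assignment steps above.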
Demo: learnt parameters
• Learning the model: do_plsa('config_file_1')
• Evaluate and visualize the model: do_plsa_evaluation('config_file_1')
• Learnt parameters: p(w|z) and p(z|d)
Demo: recognition examples
Demo: categorization results
Model properties
• Intuitive
• (Could use) generative models
– Convenient for weakly supervised or unsupervised training
– Prior information
– Hierarchical Bayesian framework (Sivic et al. 2005; Sudderth et al. 2005)
• Learning and recognition relatively fast
– Compared to other methods
Weakness of the model