09expand Flat
http://informationretrieval.org
Hinrich Schütze
2014-05-14
Overview
1 Recap
2 Motivation
5 Query expansion
Relevance
Relevance: query vs. information need
Precision and recall
A combined measure: F
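As a reminder of how the recap measures fit together, here is a minimal sketch that computes precision, recall, and F for a single query. The function name `precision_recall_f`, the document ids, and the default `beta=1.0` (balanced F) are assumptions made for illustration; they are not taken from the slides.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F_beta) for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant for the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the collection.
p, r, f1 = precision_recall_f({1, 2, 3, 4}, {1, 2, 3, 7, 8, 9})
```

With these made-up ids, precision is 3/4, recall is 3/6, and the balanced F lies between the two.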
Averaged 11-point precision/recall graph
[Figure: precision (y-axis, 0 to 1) plotted against recall (x-axis, 0 to 1)]
Google dynamic summaries for [vegetarian diet running]
Take-away today
How can we improve recall in search?
Recall
Options for improving recall
Google used to expose query expansion in UI
~flights -flight
~dogs -dog
no longer available:
http://searchenginewatch.com/article/2277383/Google-Kil
Relevance feedback: Basic idea
Relevance feedback: Examples
Relevance Feedback: Example 1
Results for initial query
User feedback: Select what is relevant
Results after relevance feedback
Vector space example: query “canine” (1)
source: Fernando Díaz
Similarity of docs to query “canine”
source: Fernando Díaz
User feedback: Select relevant documents
source: Fernando Díaz
Results after relevance feedback
source: Fernando Díaz
Example 3: A real (non-image) example
Initial query: [new space satellite applications]
Results for initial query (r = rank; + marks a document the user judged relevant):
     r   score   title
  +  1   0.539   NASA Hasn’t Scrapped Imaging Spectrometer
  +  2   0.533   NASA Scratches Environment Gear From Satellite Plan
     3   0.528   Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
     4   0.526   A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
     5   0.525   Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
     6   0.524   Report Provides Support for the Critics Of Using Big Satellites to Study Climate
     7   0.516   Arianespace Receives Satellite Launch Pact From Telesat Canada
  +  8   0.509   Telecommunications Tale of Two Companies
Results for expanded query (r = rank; old ranks in parentheses; * marks a document the user had judged relevant):
     r  (old)  score   title
  *  1  (2)    0.513   NASA Scratches Environment Gear From Satellite Plan
  *  2  (1)    0.500   NASA Hasn’t Scrapped Imaging Spectrometer
     3         0.493   When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
     4         0.493   NASA Uses ‘Warm’ Superconductors For Fast Circuit
  *  5  (8)    0.492   Telecommunications Tale of Two Companies
     6         0.491   Soviets May Adapt Parts of SS-20 Missile For Commercial Use
     7         0.490   Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
     8         0.490   Rescue of Satellite By Space Agency To Cost $90 Million
Key concept for relevance feedback: Centroid
Centroid: Examples
[Figure: two sets of points, one marked ⋄ and one marked x]
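For reference, the centroid of a set of documents is the component-wise mean of their vectors. The definition below follows the standard one in Chapter 9 of IIR; the notation $\vec{v}(d)$ for the vector of document $d$ and the three 2D points in the worked line are assumptions chosen for illustration, not the points on the slide.

```latex
\vec{\mu}(D) = \frac{1}{|D|} \sum_{d \in D} \vec{v}(d)

% Hypothetical worked example with three 2D document vectors:
\vec{\mu}\bigl(\{(1,2),\ (3,4),\ (5,0)\}\bigr)
  = \tfrac{1}{3}\,(1+3+5,\ 2+4+0)
  = (3,\ 2)
```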
Rocchio algorithm
Exercise: Compute Rocchio vector
[Figure: circles are relevant documents, Xs are nonrelevant documents]
Compute: $\vec{q}_{opt} = \mu(D_r) + [\mu(D_r) - \mu(D_{nr})]$
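The exercise formula can be checked numerically with a small sketch. The coordinates below are hypothetical (the points on the slide did not survive extraction), and the helper names `centroid` and `rocchio_optimal` are illustrative assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def rocchio_optimal(relevant, nonrelevant):
    """q_opt = mu(D_r) + [mu(D_r) - mu(D_nr)]."""
    mu_r = centroid(relevant)
    mu_nr = centroid(nonrelevant)
    return [r + (r - nr) for r, nr in zip(mu_r, mu_nr)]


# Hypothetical coordinates, not the points drawn on the slide.
relevant = [[4.0, 4.0], [6.0, 4.0], [5.0, 7.0]]      # centroid (5, 5)
nonrelevant = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0]]   # centroid (2, 2)
q_opt = rocchio_optimal(relevant, nonrelevant)
print(q_opt)  # [8.0, 8.0]
```

The result starts at the relevant centroid and moves further away from the nonrelevant centroid by the difference between the two.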
Rocchio illustrated
[Figure: nonrelevant documents (x) with the vectors $\vec{\mu}_R$, $\vec{\mu}_{NR}$, $\vec{\mu}_R - \vec{\mu}_{NR}$, and $\vec{q}_{opt}$]
Rocchio 1971 algorithm (SMART)
Used in practice:
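$$
\begin{aligned}
\vec{q}_m &= \alpha \vec{q}_0 + \beta \mu(D_r) - \gamma \mu(D_{nr}) \\
          &= \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_{nr}|} \sum_{\vec{d}_j \in D_{nr}} \vec{d}_j
\end{aligned}
$$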
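A minimal sketch of this update on plain Python lists follows. It is an illustration under stated assumptions, not the SMART implementation: the function names, the default weights (α = 1, β = 0.75, γ = 0.15 are values often quoted as reasonable), the zero-vector fallback for an empty document set, and the optional clipping of negative term weights to 0 are all choices made here.

```python
def centroid(vectors, dim):
    """Component-wise mean; the zero vector if no documents were judged."""
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def rocchio_update(q0, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15, clip_negative=True):
    """q_m = alpha * q_0 + beta * mu(D_r) - gamma * mu(D_nr)."""
    mu_r = centroid(relevant, len(q0))
    mu_nr = centroid(nonrelevant, len(q0))
    qm = [alpha * q + beta * r - gamma * nr
          for q, r, nr in zip(q0, mu_r, mu_nr)]
    if clip_negative:
        # Assumption: negative term weights are set to 0.
        qm = [max(0.0, w) for w in qm]
    return qm


# Hypothetical 3-term vocabulary and judged documents.
q0 = [1.0, 0.0, 0.0]
relevant = [[0.0, 2.0, 0.0], [0.0, 4.0, 0.0]]   # centroid (0, 3, 0)
nonrelevant = [[0.0, 0.0, 4.0]]                  # centroid (0, 0, 4)
qm = rocchio_update(q0, relevant, nonrelevant, alpha=1.0, beta=0.5, gamma=0.25)
print(qm)  # [1.0, 1.5, 0.0]
```

In this example the second term enters the query because it occurs in the relevant documents, while the third term would receive weight -1.0 and is clipped to 0.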
Relevance feedback: Assumptions
Violation of A1
Violation of A2
Relevance feedback: Assumptions
Relevance feedback: Evaluation
Evaluation: Caveat
Exercise
Relevance feedback: Problems
Pseudo-relevance feedback
Pseudo-relevance feedback at TREC4
Query expansion: Example
Types of user feedback
Query expansion
“Global” resources used for query expansion
Thesaurus-based query expansion
For each term t in the query, expand the query with words the thesaurus lists as semantically related to t.
Example from earlier: hospital → medical
Generally increases recall
May significantly decrease precision, particularly with ambiguous terms
Example: interest rate → interest rate fascinate
Widely used in specialized search engines for science and engineering
It is very expensive to create a manual thesaurus and to maintain it over time.
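The expansion step described on this slide can be sketched as a dictionary lookup per query term. The two-entry `THESAURUS` below is a hypothetical toy built only from the slide's own examples, and `expand_query` is an illustrative name; a real manual thesaurus is far larger and curated.

```python
# Hypothetical toy thesaurus using only the examples from the slide.
THESAURUS = {
    "hospital": ["medical"],
    "interest": ["fascinate"],
}


def expand_query(terms, thesaurus):
    """Append the related words of every query term, skipping duplicates."""
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        for related in thesaurus.get(term, []):
            if related not in seen:
                expanded.append(related)
                seen.add(related)
    return expanded


print(expand_query(["interest", "rate"], THESAURUS))
# ['interest', 'rate', 'fascinate']
```

The output reproduces the precision problem named above: the lookup has no notion of word sense, so the financial query picks up "fascinate".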
Example for manual thesaurus: PubMed
Automatic thesaurus generation
Co-occurrence-based thesaurus: Examples
Query expansion at search engines
Take-away today
Resources
Chapter 9 of IIR
Resources at http://cislmu.org
Salton and Buckley 1990 (original relevance feedback paper)
Spink, Jansen, Ozmultu 2000: Relevance feedback at Excite
Justin Bieber: related searches fail
Word Space
Schütze 1998: Automatic word sense discrimination (describes
a simple method for automatic thesaurus generation)