Professional Documents
Culture Documents
8.relavance Feedback - II
8.relavance Feedback - II
Query Expansion
Introduction to Information Retrieval
2
Introduction to Information Retrieval
Recall
3
Introduction to Information Retrieval
4
Introduction to Information Retrieval
5
Introduction to Information Retrieval
Relevance feedback
6
Introduction to Information Retrieval
7
Introduction to Information Retrieval
8
Introduction to Information Retrieval
9
Introduction to Information Retrieval
10
Introduction to Information Retrieval
11
Introduction to Information Retrieval
Centroid: Example
12
Introduction to Information Retrieval
Rocchio’ algorithm
The Rocchio’ algorithm implements relevance feedback in
the vector space model.
Rocchio’ chooses the query that maximizes
13
Introduction to Information Retrieval
Rocchio’ algorithm
The optimal query vector is:
14
Introduction to Information Retrieval
Rocchio’ illustrated
Rocchio’ illustrated
Rocchio’ illustrated
Rocchio’ illustrated
19
Introduction to Information Retrieval
Rocchio’ illustrated
- difference vector
20
Introduction to Information Retrieval
Rocchio’ illustrated
Rocchio’ illustrated
… to get
22
Introduction to Information Retrieval
Rocchio’ illustrated
Rocchio’ illustrated
26
Introduction to Information Retrieval
Feedback in Language
Models
Introduction to Information Retrieval
Feedback in Language
Models
Introduction to Information Retrieval
34
Introduction to Information Retrieval
Violation of A1
35
Introduction to Information Retrieval
Violation of A2
36
Introduction to Information Retrieval
Misspelling
Cross-language IR
When relevant documents are a multimodal class
Subset of doc using different vocabulary: Berma/Myanmar
Query for which the answer set is inherently disjunctive
E.g., Pop starts who once worked at Burger King
Instances of a general concept appear as a disjunction of
more specific concepts: “Felines” (a cat or other member of
the cat family).
Introduction to Information Retrieval
38
Introduction to Information Retrieval
39
Introduction to Information Retrieval
Evaluation: Caveat
40
Introduction to Information Retrieval
41
Introduction to Information Retrieval
Pseudo-relevance feedback
Pseudo-relevance feedback automates the “manual” part
of true relevance feedback.
Pseudo-relevance algorithm:
Retrieve a ranked list of hits for the user’s query
Assume that the top k documents are relevant.
Do relevance feedback (e.g., Rocchio)
Works very well on average
But can go horribly wrong for some queries.
Several iterations can cause query drift
“Copper mines” => “Mines in Chile”.
42
Introduction to Information Retrieval
Query expansion
Query expansion is another method for increasing recall.
We use “global query expansion” to refer to “global
methods for query reformulation”.
In global query expansion, the query is modified based on
some global resource, i.e. a resource that is not query-
dependent.
Main information we use: (near-)synonymy
A publication or database that collects (near-)synonyms is
called a thesaurus.
We will look at two types of thesauri: manually created and
automatically created.
44
Introduction to Information Retrieval
45
Introduction to Information Retrieval
46
Introduction to Information Retrieval
47
Introduction to Information Retrieval
50
Introduction to Information Retrieval
51
Introduction to Information Retrieval
https://dl.acm.org/citation.cfm?id=2071390