
Performance of retrieval model

1
• Missing semantic information (e.g. word sense).

• Missing syntactic information (e.g. phrase structure, word order).

• Assumption of term independence (e.g. ignores synonymy).

2
• Relevant items are those documents that help the user answer his question.

• Non-relevant items are those that do not provide actually useful information.

• For each item there are two possibilities: it is either retrieved or not retrieved by the user's query.

3
• Information retrieval performance can be measured with two metrics: precision and recall.

• When a user searches for information on a topic, the documents in the database and the retrieved results can be divided into 4 categories:

• Relevant and Retrieved
• Relevant and Not Retrieved
• Non-Relevant and Retrieved
• Non-Relevant and Not Retrieved

4
                 Retrieved                      Not Retrieved
Relevant         Relevant and Retrieved         Relevant and Not Retrieved
Non-Relevant     Non-Relevant and Retrieved     Non-Relevant and Not Retrieved

5
Consider the following example of classifying sports web sites:

Document ID   Ground Truth   Classifier Output
D1            Sports         Sports
D2            Sports         Sports
D3            Not Sports     Sports
D4            Sports         Not Sports
D5            Not Sports     Not Sports
D6            Sports         Not Sports
D7            Not Sports     Sports
D8            Not Sports     Not Sports
D9            Not Sports     Not Sports
D10           Sports         Sports
D11           Sports         Sports
D12           Sports         Not Sports

6
Now, let us find TP, TN, FP and FN values.
TP = The document was classified as “Sports” and was actually “Sports”. D1, D2, D10, and D11 correspond to TP.

TN = The document was classified as “Not sports” and was actually “Not sports”. D5, D8, and D9 correspond to TN.

FP = The document was classified as “Sports” but was actually “Not sports”. D3 and D7 correspond to FP.

FN = The document was classified as “Not sports” but was actually “Sports”. D4, D6, and D12 correspond to FN.

So, TP = 4, TN = 3, FP = 2 and FN = 3.

Finally, precision = TP/(TP+FP) = 4/6 = 2/3

recall = TP/(TP+FN) = 4/7.

This means when the precision is 2/3, the recall is 4/7.
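As a quick check, here is a minimal Python sketch (not part of the original slides) that reproduces these numbers from the counts above:

from fractions import Fraction

# Counts derived from documents D1-D12 in the table above
tp, tn, fp, fn = 4, 3, 2, 3

precision = Fraction(tp, tp + fp)  # 4/6 = 2/3
recall = Fraction(tp, tp + fn)     # 4/7

print(f"precision = {precision}, recall = {recall}")
# precision = 2/3, recall = 4/7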

7
[Diagram: the entire document collection split into relevant vs. irrelevant documents and retrieved vs. not retrieved documents, giving four regions: relevant retrieved, relevant not retrieved, irrelevant retrieved, irrelevant not retrieved.]

precision = Number of relevant documents retrieved / Total number of documents retrieved

recall = Number of relevant documents retrieved / Total number of relevant documents
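The same definitions can be applied directly to sets of document IDs. A minimal sketch reusing the sports example, where "retrieved" is everything the classifier labelled Sports and "relevant" is everything whose ground truth is Sports:

# Documents the classifier labelled "Sports" (i.e. retrieved)
retrieved = {"D1", "D2", "D3", "D7", "D10", "D11"}
# Documents whose ground truth is "Sports" (i.e. relevant)
relevant = {"D1", "D2", "D4", "D6", "D10", "D11", "D12"}

relevant_retrieved = retrieved & relevant  # {"D1", "D2", "D10", "D11"}

precision = len(relevant_retrieved) / len(retrieved)  # 4/6 = 0.667
recall = len(relevant_retrieved) / len(relevant)      # 4/7 = 0.571
print(precision, recall)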
8
• Precision
– The ability to retrieve top-ranked documents that are mostly
relevant.
• Recall
– The ability of the search to find all of the relevant items in the
corpus.

9
Determining Recall is Difficult
• Total number of relevant items is sometimes not available:
– Sample across the database and perform relevance judgment on
these items.
– Apply different retrieval algorithms to the same database for the
same query. The aggregate of relevant items is taken as the total
relevant set.

10
Trade-off between Recall and Precision
[Plot: Precision on the y-axis (0 to 1) vs. Recall on the x-axis (0 to 1). The ideal system sits at the upper right corner. High precision with low recall returns relevant documents but misses many useful ones; high recall with low precision returns most relevant documents but includes lots of junk.]

11
• For a given query, produce the ranked list of retrievals.

• Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures.

• Mark each document in the ranked list that is relevant according to the gold (truth) standard.

• Compute a recall/precision pair for each position in the ranked list that contains a relevant document (a short code sketch follows the first worked example below).
12
Let total # of relevant docs = 6. Check each new recall point (R: Recall, P: Precision):

 n   doc #   relevant
 1   588     x           R = 1/6 = 0.167;  P = 1/1 = 1
 2   589     x           R = 2/6 = 0.333;  P = 2/2 = 1
 3   576
 4   590     x           R = 3/6 = 0.5;    P = 3/4 = 0.75
 5   986
 6   592     x           R = 4/6 = 0.667;  P = 4/6 = 0.667
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x           R = 5/6 = 0.833;  P = 5/13 = 0.38
14   990
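A minimal Python sketch of this procedure (an illustration, not from the slides), using the ranked list above with relevant documents at ranks 1, 2, 4, 6 and 13 and 6 relevant documents in total:

# Relevance flag for each rank position of the ranked list above
relevant_flags = [True, True, False, True, False, True,
                  False, False, False, False, False, False, True, False]
total_relevant = 6

hits = 0
for rank, is_relevant in enumerate(relevant_flags, start=1):
    if is_relevant:
        hits += 1
        recall = hits / total_relevant
        precision = hits / rank
        print(f"rank {rank:2d}: R = {recall:.3f}, P = {precision:.3f}")
# rank  1: R = 0.167, P = 1.000
# rank  2: R = 0.333, P = 1.000
# rank  4: R = 0.500, P = 0.750
# rank  6: R = 0.667, P = 0.667
# rank 13: R = 0.833, P = 0.385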
13
Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant
 1   588     x           R = 1/6 = 0.167;  P = 1/1 = 1
 2   576
 3   589     x           R = 2/6 = 0.333;  P = 2/3 = 0.667
 4   342
 5   590     x           R = 3/6 = 0.5;    P = 3/5 = 0.6
 6   717
 7   984
 8   772     x           R = 4/6 = 0.667;  P = 4/8 = 0.5
 9   321     x           R = 5/6 = 0.833;  P = 5/9 = 0.556
10   498
11   113
12   628
13   772
14   592     x           R = 6/6 = 1.0;    P = 6/14 = 0.429
14
Interpolating a Recall/Precision Curve
• Interpolate a precision value for each standard recall level:
– rj {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
– r0 = 0.0, r1 = 0.1, …, r10=1.0
• The interpolated precision at the j-th standard recall level is the
maximum known precision at any recall level between the j-th and
(j + 1)-th level:

P(rj )  max P(r )


r j  r  r j 1
15
16
17
• Typically average performance over a large set of queries.

• Compute average precision at each standard recall level across all queries (a short sketch follows).

• Plot average precision/recall curves to evaluate overall system performance on a document/query corpus.
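A minimal sketch of this averaging step, using made-up interpolated precision values for two queries:

# Interpolated precision at r = 0.0, 0.1, ..., 1.0 for each query (made-up values)
per_query_precision = [
    [1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # query 1
    [1.0, 0.9, 0.8, 0.8, 0.6, 0.5, 0.5, 0.3, 0.2, 0.1, 0.0],  # query 2
]

num_queries = len(per_query_precision)
average_precision = [
    sum(query[j] for query in per_query_precision) / num_queries
    for j in range(len(per_query_precision[0]))
]
print(average_precision)  # one averaged value per standard recall level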

18
• The curve closest to the upper right-hand corner of the graph indicates the best performance.

[Plot: average precision/recall curves for stemmed (Stem) and unstemmed (NoStem) indexing. Precision on the y-axis (0 to 1), Recall on the x-axis (0.1 to 1.0).]
19
• Stemming groups together words that share the same stem and are therefore close in meaning (a short sketch follows).

• Replay → stem: Play

• Running → stem: Run
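For illustration, a small Python example using NLTK's Porter stemmer (the slides do not name a particular stemming algorithm, so this choice is an assumption):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # run
print(stemmer.stem("plays"))    # play
# Note: suffix stemmers such as Porter do not strip prefixes, so "replay"
# is not reduced to "play" by this particular stemmer.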

20
• Precision at the R-th position in the ranking of results for a
query that has R relevant documents.
R = # of relevant docs = 6

 n   doc #   relevant
 1   588     x
 2   589     x
 3   576
 4   590     x
 5   986
 6   592     x           R-Precision = 4/6 = 0.67
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x
14   990
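A minimal sketch computing R-Precision for the ranking above:

# Relevance flag for each rank position of the ranked list above
relevant_flags = [True, True, False, True, False, True,
                  False, False, False, False, False, False, True, False]
R = 6  # total number of relevant documents for this query

r_precision = sum(relevant_flags[:R]) / R  # relevant docs within the top R positions
print(r_precision)  # 4/6 = 0.67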
21
• One measure of performance that takes into account both
recall and precision.
• Harmonic mean of recall and precision:

F = 2PR / (P + R) = 2 / (1/R + 1/P)

• Compared to the arithmetic mean, both need to be high for the harmonic mean to be high.
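A minimal sketch comparing the harmonic and arithmetic means, using the precision and recall from the earlier sports example plus a deliberately lopsided pair:

def f_measure(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

p, r = 2 / 3, 4 / 7           # values from the sports example
print(f_measure(p, r))        # ~0.615
print((p + r) / 2)            # ~0.619, arithmetic mean for comparison
print(f_measure(1.0, 0.1))    # ~0.182, dragged down by the low recall
print((1.0 + 0.1) / 2)        # 0.55, arithmetic mean hides the low recall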

22
• A variant of F measure that allows weighting emphasis on
precision over recall:
(1   ) PR (1   )
2 2
E  2 1
 PR
2
R

P

• Value of β controls the trade-off:

– β = 1: Equally weight precision and recall (E = F).
– β > 1: Weight recall more.
– β < 1: Weight precision more.
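A minimal sketch of the weighted measure as defined on this slide, with hypothetical precision and recall values:

def e_measure(p, r, beta):
    # E = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 reduces to F
    return (1 + beta**2) * p * r / (beta**2 * p + r)

p, r = 0.5, 0.8  # hypothetical precision and recall
print(e_measure(p, r, beta=1))    # ~0.615, identical to F
print(e_measure(p, r, beta=2))    # ~0.714, recall weighted more
print(e_measure(p, r, beta=0.5))  # ~0.540, precision weighted more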

23
