
Performance of retrieval model

1
• Missing semantic information (e.g. word sense).

• Missing syntactic information (e.g. phrase structure, word order).

• Assumption of term independence (e.g. ignores synonymy).

2
• Relevant items are those documents that help the user answer his question.

• Non-relevant items are those that do not provide actually useful information.

• For each item there are two possibilities: it is either retrieved or not retrieved by the user's query.

3
• Information retrieval performance can be measured with two metrics: precision and recall.

• When a user searches for information on a topic, the documents in the database and the retrieved results can be divided into 4 categories:

• Relevant and Retrieved
• Relevant and Not Retrieved
• Non-Relevant and Retrieved
• Non-Relevant and Not Retrieved

4
                 Retrieved                      Not Retrieved
Relevant         Relevant and Retrieved         Relevant and Not Retrieved
Non-Relevant     Non-Relevant and Retrieved     Non-Relevant and Not Retrieved

5
Consider the following example of classifying sports web sites:

Document ID   Ground Truth   Classifier Output
D1            Sports         Sports
D2            Sports         Sports
D3            Not Sports     Sports
D4            Sports         Not Sports
D5            Not Sports     Not Sports
D6            Sports         Not Sports
D7            Not Sports     Sports
D8            Not Sports     Not Sports
D9            Not Sports     Not Sports
D10           Sports         Sports
D11           Sports         Sports
D12           Sports         Not Sports

6
Now, let us find TP, TN, FP and FN values.
TP = The document was classified as “Sports” and was actually “Sports”. D1, D2, D10, and D11 correspond to TP.

TN = The document was classified as “Not sports” and was actually “Not sports”. D5, D8, and D9 correspond to TN.

FP = The document was classified as “Sports” but was actually “Not sports”. D3 and D7 correspond to FP.

FN = The document was classified as “Not sports” but was actually “Sports”. D4, D6, and D12 correspond to FN.

So, TP = 4, TN = 3, FP = 2 and FN = 3.

Finally, precision = TP/(TP+FP) = 4/6 = 2/3

recall = TP/(TP+FN) = 4/7.

This means when the precision is 2/3, the recall is 4/7.
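As a quick check, here is a minimal Python sketch (not part of the original slides) that reproduces these numbers from the counts above:

from fractions import Fraction

# Counts derived from documents D1-D12 in the table above
tp, tn, fp, fn = 4, 3, 2, 3

precision = Fraction(tp, tp + fp)  # 4/6 = 2/3
recall = Fraction(tp, tp + fn)     # 4/7

print(f"precision = {precision}, recall = {recall}")
# precision = 2/3, recall = 4/7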

7
[Diagram: the entire document collection split into relevant vs. irrelevant documents and retrieved vs. not retrieved documents, giving four regions: relevant retrieved, relevant not retrieved, irrelevant retrieved, irrelevant not retrieved.]

precision = Number of relevant documents retrieved / Total number of documents retrieved

recall = Number of relevant documents retrieved / Total number of relevant documents
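The same definitions can be applied directly to sets of document IDs. A minimal sketch reusing the sports example, where "retrieved" is everything the classifier labelled Sports and "relevant" is everything whose ground truth is Sports:

# Documents the classifier labelled "Sports" (i.e. retrieved)
retrieved = {"D1", "D2", "D3", "D7", "D10", "D11"}
# Documents whose ground truth is "Sports" (i.e. relevant)
relevant = {"D1", "D2", "D4", "D6", "D10", "D11", "D12"}

relevant_retrieved = retrieved & relevant  # {"D1", "D2", "D10", "D11"}

precision = len(relevant_retrieved) / len(retrieved)  # 4/6 = 0.667
recall = len(relevant_retrieved) / len(relevant)      # 4/7 = 0.571
print(precision, recall)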
8
• Precision
– The ability to retrieve top-ranked documents that are mostly
relevant.
• Recall
– The ability of the search to find all of the relevant items in the
corpus.

9
Determining Recall is Difficult
• Total number of relevant items is sometimes not available:
– Sample across the database and perform relevance judgment on
these items.
– Apply different retrieval algorithms to the same database for the
same query. The aggregate of relevant items is taken as the total
relevant set.

10
Trade-off between Recall and Precision
[Plot: Precision on the y-axis (0 to 1) vs. Recall on the x-axis (0 to 1). The ideal system sits at the upper right corner. High precision with low recall returns relevant documents but misses many useful ones; high recall with low precision returns most relevant documents but includes lots of junk.]

11
• For a given query, produce the ranked list of retrievals.

• Adjusting a threshold on this ranked list produces different sets of retrieved documents, and therefore different recall/precision measures.

• Mark each document in the ranked list that is relevant according to the gold (truth) standard.

• Compute a recall/precision pair for each position in the ranked list that contains a relevant document (a short code sketch follows the first worked example below).
12
Let total # of relevant docs = 6. Check each new recall point (R: Recall, P: Precision):

 n   doc #   relevant
 1   588     x           R = 1/6 = 0.167;  P = 1/1 = 1
 2   589     x           R = 2/6 = 0.333;  P = 2/2 = 1
 3   576
 4   590     x           R = 3/6 = 0.5;    P = 3/4 = 0.75
 5   986
 6   592     x           R = 4/6 = 0.667;  P = 4/6 = 0.667
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x           R = 5/6 = 0.833;  P = 5/13 = 0.38
14   990
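A minimal Python sketch of this procedure (an illustration, not from the slides), using the ranked list above with relevant documents at ranks 1, 2, 4, 6 and 13 and 6 relevant documents in total:

# Relevance flag for each rank position of the ranked list above
relevant_flags = [True, True, False, True, False, True,
                  False, False, False, False, False, False, True, False]
total_relevant = 6

hits = 0
for rank, is_relevant in enumerate(relevant_flags, start=1):
    if is_relevant:
        hits += 1
        recall = hits / total_relevant
        precision = hits / rank
        print(f"rank {rank:2d}: R = {recall:.3f}, P = {precision:.3f}")
# rank  1: R = 0.167, P = 1.000
# rank  2: R = 0.333, P = 1.000
# rank  4: R = 0.500, P = 0.750
# rank  6: R = 0.667, P = 0.667
# rank 13: R = 0.833, P = 0.385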
13
Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant
 1   588     x           R = 1/6 = 0.167;  P = 1/1 = 1
 2   576
 3   589     x           R = 2/6 = 0.333;  P = 2/3 = 0.667
 4   342
 5   590     x           R = 3/6 = 0.5;    P = 3/5 = 0.6
 6   717
 7   984
 8   772     x           R = 4/6 = 0.667;  P = 4/8 = 0.5
 9   321     x           R = 5/6 = 0.833;  P = 5/9 = 0.556
10   498
11   113
12   628
13   772
14   592     x           R = 6/6 = 1.0;    P = 6/14 = 0.429
14
Interpolating a Recall/Precision Curve
• Interpolate a precision value for each standard recall level:
– rj {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
– r0 = 0.0, r1 = 0.1, …, r10=1.0
• The interpolated precision at the j-th standard recall level is the
maximum known precision at any recall level between the j-th and
(j + 1)-th level:

P(rj )  max P(r )


r j  r  r j 1
15
16
17
• Typically average performance over a large set of queries.

• Compute average precision at each standard recall level across all queries (a short sketch follows).

• Plot average precision/recall curves to evaluate overall system performance on a document/query corpus.
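A minimal sketch of this averaging step, using made-up interpolated precision values for two queries:

# Interpolated precision at r = 0.0, 0.1, ..., 1.0 for each query (made-up values)
per_query_precision = [
    [1.0, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # query 1
    [1.0, 0.9, 0.8, 0.8, 0.6, 0.5, 0.5, 0.3, 0.2, 0.1, 0.0],  # query 2
]

num_queries = len(per_query_precision)
average_precision = [
    sum(query[j] for query in per_query_precision) / num_queries
    for j in range(len(per_query_precision[0]))
]
print(average_precision)  # one averaged value per standard recall level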

18
• The curve closest to the upper right-hand corner of the graph indicates the best performance.

[Plot: average precision/recall curves for stemmed (Stem) and unstemmed (NoStem) indexing. Precision on the y-axis (0 to 1), Recall on the x-axis (0.1 to 1.0).]
19
• Stemming groups together words that share the same stem and are therefore close in meaning (a short sketch follows).

• Replay → stem: Play

• Running → stem: Run
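For illustration, a small Python example using NLTK's Porter stemmer (the slides do not name a particular stemming algorithm, so this choice is an assumption):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # run
print(stemmer.stem("plays"))    # play
# Note: suffix stemmers such as Porter do not strip prefixes, so "replay"
# is not reduced to "play" by this particular stemmer.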

20
• Precision at the R-th position in the ranking of results for a
query that has R relevant documents.
R = # of relevant docs = 6

 n   doc #   relevant
 1   588     x
 2   589     x
 3   576
 4   590     x
 5   986
 6   592     x           R-Precision = 4/6 = 0.67
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x
14   990
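A minimal sketch computing R-Precision for the ranking above:

# Relevance flag for each rank position of the ranked list above
relevant_flags = [True, True, False, True, False, True,
                  False, False, False, False, False, False, True, False]
R = 6  # total number of relevant documents for this query

r_precision = sum(relevant_flags[:R]) / R  # relevant docs within the top R positions
print(r_precision)  # 4/6 = 0.67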
21
• One measure of performance that takes into account both
recall and precision.
• Harmonic mean of recall and precision:

F = 2PR / (P + R) = 2 / (1/R + 1/P)

• Compared to the arithmetic mean, both need to be high for the harmonic mean to be high.
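A minimal sketch comparing the harmonic and arithmetic means, using the precision and recall from the earlier sports example plus a deliberately lopsided pair:

def f_measure(p, r):
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

p, r = 2 / 3, 4 / 7           # values from the sports example
print(f_measure(p, r))        # ~0.615
print((p + r) / 2)            # ~0.619, arithmetic mean for comparison
print(f_measure(1.0, 0.1))    # ~0.182, dragged down by the low recall
print((1.0 + 0.1) / 2)        # 0.55, arithmetic mean hides the low recall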

22
• A variant of F measure that allows weighting emphasis on
precision over recall:
(1   ) PR (1   )
2 2
E  2 1
 PR
2
R

P

• Value of β controls the trade-off:

– β = 1: Equally weight precision and recall (E = F).
– β > 1: Weight recall more.
– β < 1: Weight precision more.
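A minimal sketch of the weighted measure as defined on this slide, with hypothetical precision and recall values:

def e_measure(p, r, beta):
    # E = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 reduces to F
    return (1 + beta**2) * p * r / (beta**2 * p + r)

p, r = 0.5, 0.8  # hypothetical precision and recall
print(e_measure(p, r, beta=1))    # ~0.615, identical to F
print(e_measure(p, r, beta=2))    # ~0.714, recall weighted more
print(e_measure(p, r, beta=0.5))  # ~0.540, precision weighted more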

23
