L05-IR Models MMN
• Missing semantic information (e.g. word sense).
• Relevant items are those documents that help the user answer their
question.
• Information retrieval performance can be measured with two metrics: precision and
recall.
• When a user decides to search for information on a topic, the total database and the
results to be obtained can be divided into 4 categories:
– Relevant and retrieved
– Relevant and not retrieved
– Irrelevant and retrieved
– Irrelevant and not retrieved
Consider the following example of classifying sports web sites:
Document ID   Ground Truth   Classifier Output
D1            Sports         Sports
D2            Sports         Sports
D3            Not Sports     Sports
D4            Sports         Not Sports
D5            Not Sports     Not Sports
D6            Sports         Not Sports
D7            Not Sports     Sports
D8            Not Sports     Not Sports
D9            Not Sports     Not Sports
D10           Sports         Sports
D11           Sports         Sports
D12           Sports         Not Sports
Now, let us find TP, TN, FP and FN values.
TP = The document was classified as “Sports” and was actually “Sports”. D1, D2, D10, and D11 correspond to TP.
TN = The document was classified as “Not sports” and was actually “Not sports”. D5, D8, and D9 correspond to TN.
FP = The document was classified as “Sports” but was actually “Not sports”. D3 and D7 correspond to FP.
FN = The document was classified as “Not sports” but was actually “Sports”. D4, D6, and D12 correspond to FN.
So, TP = 4, TN = 3, FP = 2 and FN = 3.
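These counts, and the precision and recall they imply, can be checked with a short script. This is a sketch: the two lists encode the table positionally for D1..D12, with "S"/"N" standing for Sports / Not Sports.

```python
# Confusion-matrix counts for the sports-classifier example (D1..D12).
truth  = ["S", "S", "N", "S", "N", "S", "N", "N", "N", "S", "S", "S"]  # ground truth
output = ["S", "S", "S", "N", "N", "N", "S", "N", "N", "S", "S", "N"]  # classifier output

tp = sum(t == "S" and o == "S" for t, o in zip(truth, output))  # Sports, classified Sports
tn = sum(t == "N" and o == "N" for t, o in zip(truth, output))  # Not sports, classified Not sports
fp = sum(t == "N" and o == "S" for t, o in zip(truth, output))  # Not sports, classified Sports
fn = sum(t == "S" and o == "N" for t, o in zip(truth, output))  # Sports, classified Not sports

precision = tp / (tp + fp)  # fraction of "Sports" verdicts that were correct: 4/6
recall    = tp / (tp + fn)  # fraction of actual "Sports" docs that were found: 4/7

print(tp, tn, fp, fn)  # 4 3 2 3
```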
The entire document collection divides as follows:

                 Relevant                    Irrelevant
Retrieved        relevant & retrieved        irrelevant & retrieved
Not retrieved    relevant & not retrieved    irrelevant & not retrieved
Determining Recall is Difficult
• Total number of relevant items is sometimes not available:
– Sample across the database and perform relevance judgment on
these items.
– Apply different retrieval algorithms to the same database for the
same query. The aggregate of relevant items is taken as the total
relevant set.
Trade-off between Recall and Precision
[Figure: precision (y-axis, 0 to 1) plotted against recall (x-axis, 0 to 1). At one extreme the system returns relevant documents but misses many useful ones; at the other it returns most relevant documents but includes lots of junk. The ideal is the upper right corner: high precision at high recall.]
• For a given query, produce the ranked list of retrievals.
• Mark each document in the ranked list that is relevant according to the
gold (truth) standard.
• Compute a recall/precision pair for each position in the ranked list that
contains a relevant document.
Let total # of relevant docs = 6. Check each new recall point (R: recall, P: precision):

 n   doc #   relevant
 1   588     x          R = 1/6 = 0.167;  P = 1/1 = 1.0
 2   589     x          R = 2/6 = 0.333;  P = 2/2 = 1.0
 3   576
 4   590     x          R = 3/6 = 0.5;    P = 3/4 = 0.75
 5   986
 6   592     x          R = 4/6 = 0.667;  P = 4/6 = 0.667
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x          R = 5/6 = 0.833;  P = 5/13 = 0.385
14   990
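The recall/precision points in this example can be reproduced with a few lines of Python. This is a sketch; the ranks of the relevant documents are taken from the table above.

```python
# Recall/precision pair at each relevant document in the ranked list.
relevant_ranks = {1, 2, 4, 6, 13}  # ranks of the relevant docs 588, 589, 590, 592, 772
total_relevant = 6

points, hits = [], 0
for rank in range(1, 15):          # walk down the 14-document ranking
    if rank in relevant_ranks:
        hits += 1
        points.append((hits / total_relevant, hits / rank))  # (recall, precision)

for r, p in points:
    print(f"R={r:.3f}  P={p:.3f}")
```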
Let total # of relevant docs = 6. Check each new recall point:

 n   doc #   relevant
 1   588     x          R = 1/6 = 0.167;  P = 1/1 = 1.0
 2   576
 3   589     x          R = 2/6 = 0.333;  P = 2/3 = 0.667
 4   342
 5   590     x          R = 3/6 = 0.5;    P = 3/5 = 0.6
 6   717
 7   984
 8   772     x          R = 4/6 = 0.667;  P = 4/8 = 0.5
 9   321     x          R = 5/6 = 0.833;  P = 5/9 = 0.556
10   498
11   113
12   628
13   772
14   592     x          R = 6/6 = 1.0;    P = 6/14 = 0.429
Interpolating a Recall/Precision Curve
• Interpolate a precision value for each standard recall level:
– rj ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
– r0 = 0.0, r1 = 0.1, …, r10=1.0
• The interpolated precision at the j-th standard recall level is the
maximum known precision at any recall level between the j-th and
(j + 1)-th level:

  P(rj) = max { P(r) : rj ≤ r ≤ rj+1 }
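A sketch of this interpolation in Python, using the common TREC-style variant that takes the maximum precision at any recall ≥ rj; the recall/precision points are the ones from the first ranked-list example.

```python
# Interpolate precision at the 11 standard recall levels 0.0, 0.1, ..., 1.0.
points = [(1/6, 1.0), (2/6, 1.0), (3/6, 0.75), (4/6, 4/6), (5/6, 5/13)]  # (recall, precision)

def interpolated_precision(points, level):
    # Maximum known precision at any recall level >= the standard level.
    return max((p for r, p in points if r >= level), default=0.0)

curve = [interpolated_precision(points, j / 10) for j in range(11)]
```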
• The curve closest to the upper right-hand corner of the graph
indicates the best performance
[Figure: recall/precision curves for the NoStem and Stem runs; precision on the y-axis (0 to 1), recall on the x-axis (0.1 to 1.0).]
• The Stem run groups words that share the same stem (and are thus
close in meaning).
• Precision at the R-th position in the ranking of results for a
query that has R relevant documents.
R = total # of relevant docs = 6

 n   doc #   relevant
 1   588     x
 2   589     x
 3   576
 4   590     x
 5   986
 6   592     x          R-Precision = 4/6 = 0.67 (4 relevant in the top R = 6)
 7   984
 8   988
 9   578
10   985
11   103
12   591
13   772     x
14   990
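A minimal sketch of R-precision, using the ranked list from the example (a 1 marks a relevant document at that rank):

```python
# R-precision: precision after the first R results, where R = total # of relevant docs.
def r_precision(relevant_flags, num_relevant):
    retrieved = relevant_flags[:num_relevant]   # look only at the top R ranks
    return sum(retrieved) / num_relevant

flags = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]  # relevance of ranks 1..14
print(round(r_precision(flags, 6), 2))  # 0.67
```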
• One measure of performance that takes into account both
recall and precision.
• Harmonic mean of recall and precision:
  F = 2PR / (P + R) = 2 / (1/R + 1/P)
• A variant of the F measure that allows weighting the emphasis on
precision versus recall:
  E = (1 + β²)PR / (β²P + R) = (1 + β²) / (β²/R + 1/P)
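The weighted measure can be sketched the same way; setting β = 1 recovers the plain harmonic mean. (This follows the common F_β convention, where β > 1 shifts the emphasis toward recall and β < 1 toward precision.)

```python
def f_beta(precision, recall, beta):
    # Weighted F: (1 + beta^2) * P * R / (beta^2 * P + R).
    # beta > 1 emphasizes recall; beta < 1 emphasizes precision.
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

For example, `f_beta(4/6, 4/7, 1.0)` equals the unweighted F value for the classifier exercise.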