Measure User Happiness: How? It Depends on Many Factors
Information Retrieval: Evaluation of IR systems

Introduction
• System quality
§ How fast does the system index?
• How many documents/hour for a certain distribution of document sizes?
Introduction
• Example query: white AND red AND wine AND heart AND attack AND effective
  § A document may contain every query term and still not satisfy the user's information need
Unranked retrieval: TP, FP, FN, TN
                 Relevant                Non-relevant
Retrieved        true positive (TP)      false positive (FP)
Not retrieved    false negative (FN)     true negative (TN)
Unranked retrieval: Precision and Recall
P = |relevant ∩ retrieved| / |retrieved| = TP / (TP + FP)

R = |relevant ∩ retrieved| / |relevant| = TP / (TP + FN)
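A minimal sketch of these two definitions (hypothetical helper, not from the slides), computing P and R directly from the retrieved and relevant sets:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for an unranked result set,
    via the TP/FP/FN counts defined above."""
    tp = len(retrieved & relevant)   # relevant AND retrieved
    fp = len(retrieved - relevant)   # retrieved but not relevant
    fn = len(relevant - retrieved)   # relevant but not retrieved
    precision = tp / (tp + fp) if retrieved else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    return precision, recall

# TP = 2, FP = 2, FN = 1  →  P = 0.5, R = 2/3
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d5"})
```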
Unranked retrieval: Precision and Recall
• An IR system can get high recall (but low precision) by retrieving all documents for all queries
  § Recall is a non-decreasing function of the number of retrieved documents
  § Precision in good IR systems is a decreasing function of the number of retrieved documents
• Precision-oriented users
§ Web surfers
• Recall-oriented users
§ Professional searchers, paralegals, intelligence analysts
Unranked retrieval: F-measure
F = 1 / (α · (1/P) + (1 − α) · (1/R))

• α ∈ [0, 1]
• α = 1 → F = P
• α = 0 → F = R
• Usually α = 0.5 → F = 2PR / (P + R) (balanced F-measure)
• Trade-off between the “degree of soundness” and the “degree of completeness” of a system
• F is a weighted harmonic mean: H = (Σᵢ₌₁ⁿ wᵢ) / (Σᵢ₌₁ⁿ wᵢ/xᵢ)
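The general F-measure can be sketched as a small function (hypothetical names, not from the slides; α = 0.5 recovers the balanced form 2PR/(P+R)):

```python
def f_measure(p, r, alpha=0.5):
    """Weighted harmonic mean of precision and recall:
    F = 1 / (alpha/P + (1 - alpha)/R), with alpha in [0, 1]."""
    if p == 0.0 or r == 0.0:
        return 0.0   # convention: F is 0 when either P or R is 0
    return 1.0 / (alpha / p + (1.0 - alpha) / r)

p, r = 0.5, 2 / 3
balanced = f_measure(p, r)   # equals 2PR / (P + R)
assert abs(balanced - 2 * p * r / (p + r)) < 1e-12
```

Setting `alpha=1.0` returns P and `alpha=0.0` returns R, matching the limiting cases above.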
Unranked retrieval: Accuracy
A = (TP + TN) / (TP + TN + FP + FN)
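Accuracy is rarely informative in IR, since relevant documents are a tiny fraction of the collection. A toy sketch with hypothetical counts (not from the slides):

```python
# Hypothetical skewed collection: 10 relevant documents out of 100,000.
# A degenerate system that retrieves nothing has TP = FP = 0.
tp, fp, fn, tn = 0, 0, 10, 99_990
accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
# accuracy = 0.9999 even though recall = 0.0
```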
Unranked retrieval: Precision and Recall drawbacks
Ranked retrieval: Precision and Recall
• Precision and Recall as functions of the rank k
  § If the (k+1)-th retrieved document is relevant, then R(k+1) > R(k) and P(k+1) ≥ P(k)
  § If the (k+1)-th retrieved document is non-relevant, then R(k+1) = R(k), but P(k+1) ≤ P(k)
• To remove the jiggles of the resulting curve, use interpolated precision:

  P_interp(R) = max_{R′ ≥ R} P(R′)
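Interpolated precision is short to state in code; a sketch over (recall, precision) operating points (hypothetical helper, not from the slides):

```python
def interpolated_precision(points, r):
    """P_interp(r): the maximum precision over all observed
    (recall, precision) points whose recall is at least r."""
    return max((p for rec, p in points if rec >= r), default=0.0)

# Operating points of a hypothetical ranking
pts = [(1/3, 1.0), (2/3, 1.0), (2/3, 2/3), (1.0, 0.75), (1.0, 0.6), (1.0, 0.5)]
assert interpolated_precision(pts, 0.5) == 1.0    # best P at recall >= 0.5
assert interpolated_precision(pts, 0.9) == 0.75   # best P at recall >= 0.9
```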
Ranked retrieval: Average Precision
MAP(Q) = (1/|Q|) Σ_{j=1}^{|Q|} (1/m_j) Σ_{k=1}^{m_j} P(R_{jk})

where m_j is the number of relevant documents for query q_j and R_{jk} is the set of ranked results from the top down to the k-th relevant document.
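The double sum can be sketched as two small functions (hypothetical helpers; the inner terms are the precision values at the ranks where relevant documents appear):

```python
def average_precision(ranking, relevant):
    """AP for one query: average of the precision values at the
    ranks of the relevant documents; relevant documents that are
    never retrieved contribute precision 0."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)   # P at this relevant hit
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a query set Q; runs is a list of
    (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```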
Ranked retrieval: Precision at k, R-precision
• Precision at k
  § Set a fixed number k of retrieved results
  § Compute precision among the top-k items
  § pro: does not require any estimate of the size of the set of relevant documents (useful in Web search)
  § con: the total number of relevant documents has a strong influence on Precision at k
    – e.g. with 8 relevant docs, precision at 20 can be at most 0.4
• R-precision
  § Given a relevant set of size |Rel|
  § Count the number r of relevant documents in the top-|Rel| results
  § pro: a perfect system achieves R-precision = 1.0
  § pro: intuitive meaning: r/|Rel| = precision at |Rel| = recall at |Rel|
  § con: considers only one point on the Precision/Recall curve
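Both measures are one-liners; a sketch with hypothetical helper names:

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k

def r_precision(ranking, relevant):
    """Precision among the top |Rel| results; by construction
    this value also equals recall at rank |Rel|."""
    return precision_at_k(ranking, relevant, len(relevant))
```

Note how the cap shows up directly: with 8 relevant documents, `precision_at_k(..., k=20)` can never exceed 8/20 = 0.4.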
Ranked retrieval: Receiver-Operating-Characteristic (ROC)
Ranked retrieval: example
§ An IR system gives the following rankings in response to two queries q1 and q2
§ The highlighted documents (marked * below) are the ones relevant to the user for the specific query
§ Suppose that the whole document collection is shown for each query
  • The total number of relevant and non-relevant documents is known

  q1    q2
  A*    C
  B*    E*
  F     A
  D*    D
  C     B*
  E     F
Ranked retrieval: example
§ Draw the Precision-Recall curve for each query
• Query q1 (ranking: A, B, F, D, C, E; relevant: A, B, D)
  § P(1) = 1/1   R(1) = 1/3
  § P(2) = 2/2   R(2) = 2/3
  § P(3) = 2/3   R(3) = 2/3
  § P(4) = 3/4   R(4) = 3/3
  § P(5) = 3/5   R(5) = 3/3
  § P(6) = 3/6   R(6) = 3/3
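These per-rank values can be reproduced mechanically; a sketch assuming, as in the worked example, the ranking A, B, F, D, C, E with A, B, D relevant to q1:

```python
ranking = ["A", "B", "F", "D", "C", "E"]   # system output for q1
relevant = {"A", "B", "D"}                 # relevant documents for q1

prec, rec = [], []
hits = 0
for k, doc in enumerate(ranking, start=1):
    hits += doc in relevant
    prec.append(hits / k)              # P(k)
    rec.append(hits / len(relevant))   # R(k)
# prec = [1/1, 2/2, 2/3, 3/4, 3/5, 3/6], rec = [1/3, 2/3, 2/3, 1, 1, 1]
```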
Ranked retrieval: example
§ Draw the Precision-Recall curve for each query (continued)
• Query q1
[Figure: Precision-Recall curve for q1; Precision on the y-axis (0.2 to 1), Recall on the x-axis]
Ranked retrieval: example
§ Draw the Precision-Recall curve for each query (continued)
• Query q2 (ranking: C, E, A, D, B, F; relevant: E, B)
  § P(1) = 0/1   R(1) = 0/2
  § P(2) = 1/2   R(2) = 1/2
  § P(3) = 1/3   R(3) = 1/2
  § P(4) = 1/4   R(4) = 1/2
  § P(5) = 2/5   R(5) = 2/2
  § P(6) = 2/6   R(6) = 2/2
Ranked retrieval: example
§ Draw the Precision-Recall curve for each query (continued)
• Query q2
[Figure: Precision-Recall curve for q2; Precision on the y-axis (0 to 1), Recall on the x-axis (0 to 1)]
Ranked retrieval: example
• Determine the R-precision for each query
• Query q1
  § |Rel| = 3 → R-precision = P(3) = 2/3
• Query q2
  § |Rel| = 2 → R-precision = P(2) = 1/2
Ranked retrieval: example
• Draw the Receiver-Operating-Characteristic curve for each query
• Query q1 (ranking: A, B, F, D, C, E; relevant: A, B, D)
  § TP-rate(1) = R(1) = 1/3   FP-rate(1) = 0/3
  § TP-rate(2) = R(2) = 2/3   FP-rate(2) = 0/3
  § TP-rate(3) = R(3) = 2/3   FP-rate(3) = 1/3
  § TP-rate(4) = R(4) = 3/3   FP-rate(4) = 1/3
  § TP-rate(5) = R(5) = 3/3   FP-rate(5) = 2/3
  § TP-rate(6) = R(6) = 3/3   FP-rate(6) = 3/3
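The TP-rate/FP-rate pairs come from cumulative counts down the ranking; a sketch (assuming, as in the example, that the whole collection is ranked, and using the q1 data):

```python
def roc_points(ranking, relevant):
    """(FP-rate, TP-rate) after each rank k.
    TP-rate is recall R(k); FP-rate is the fraction of
    non-relevant documents retrieved so far."""
    n_rel = len(relevant)
    n_nonrel = len(ranking) - n_rel   # whole collection is ranked
    tp = fp = 0
    points = []
    for doc in ranking:
        if doc in relevant:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_nonrel, tp / n_rel))
    return points

pts = roc_points(["A", "B", "F", "D", "C", "E"], {"A", "B", "D"})
# pts[0] = (0/3, 1/3), ..., pts[5] = (3/3, 3/3)
```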
Ranked retrieval: example
• Draw the Receiver-Operating-Characteristic curve for each query
• Query q1
[Figure: ROC curve for q1; True-Positive rate on the y-axis (0 to 1), False-Positive rate on the x-axis (0 to 1)]
Ranked retrieval: example
• Draw the Receiver-Operating-Characteristic curve for each query
• Query q2 (ranking: C, E, A, D, B, F; relevant: E, B)
  § TP-rate(1) = R(1) = 0/2   FP-rate(1) = 1/4
  § TP-rate(2) = R(2) = 1/2   FP-rate(2) = 1/4
  § TP-rate(3) = R(3) = 1/2   FP-rate(3) = 2/4
  § TP-rate(4) = R(4) = 1/2   FP-rate(4) = 3/4
  § TP-rate(5) = R(5) = 2/2   FP-rate(5) = 3/4
  § TP-rate(6) = R(6) = 2/2   FP-rate(6) = 4/4
Ranked retrieval: example
• Draw the Receiver-Operating-Characteristic curve for each query
• Query q2
[Figure: ROC curve for q2; True-Positive rate on the y-axis (0 to 1), False-Positive rate on the x-axis (0 to 1)]