
National University of Sciences and Technology (NUST)

School of Electrical Engineering and Computer Science (SEECS)


MSCS-2K18/MSIT-2K18, Information Retrieval
Second Assignment, March/April 2019

Student Name:……………………………. Roll No:…………………... Marks 50


Student Name:……………………………. Roll No:…………………...
Student Name:……………………………. Roll No:…………………...
Instructions:
This assignment is due on 2nd April 2019. The solution should be handed to the
instructor in class. Handwritten or printed assignments are accepted.
Plagiarism will result in zero marks. Please attach this sheet to the solution
as a header and make sure that your name and the page number appear on each page of
the solution. To get good marks you must explain your solution.

Question 1: Answer to the point [Marks: 30].


A user has issued the query “sweet chocolate icing sugar”. Assume the search engine
returns the preprocessed documents listed in Table 1:
Table 1: Documents
Document   Terms
1          hot sweet chocolate cocoa beans
2          sweet beans harvest icing butter
3          icing sugar beet sweet beens butter
4          sugar cane beet black chocolate hot

The user judged the relevance of each document to the query above and identified d1
and d3 as relevant and d2 and d4 as irrelevant. Compute the modified query vector
using the Rocchio algorithm. The weight for the original query is α=1, for the relevant
documents β=0.75, and for the irrelevant ones γ=0.15. Set any negative weights in the
modified query vector to 0. Show each step of the calculation.
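
As a way to check hand calculations, the following is a minimal sketch of the Rocchio update under stated assumptions, not a model solution. It assumes raw term-frequency weights (the question does not name a weighting scheme), the centroid form of the formula (each document set is averaged before being scaled by β or γ), and terms taken verbatim from Table 1, so "beens" in document 3 is treated as a separate term from "beans". The function name rocchio and its parameters are hypothetical.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query as {term: weight}, with negatives set to 0.

    Assumes raw term frequencies and the centroid (averaged) form:
    q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)
    """
    q = Counter(query.split())
    rel = [Counter(d.split()) for d in relevant]
    non = [Counter(d.split()) for d in non_relevant]
    vocab = sorted(set(q) | {t for d in rel + non for t in d})

    modified = {}
    for term in vocab:
        rel_mean = sum(d[term] for d in rel) / len(rel) if rel else 0.0
        non_mean = sum(d[term] for d in non) / len(non) if non else 0.0
        weight = alpha * q[term] + beta * rel_mean - gamma * non_mean
        modified[term] = max(0.0, weight)  # negative weights are set to 0
    return modified


# Documents from Table 1, copied verbatim (including "beens" in document 3).
d1 = "hot sweet chocolate cocoa beans"
d2 = "sweet beans harvest icing butter"
d3 = "icing sugar beet sweet beens butter"
d4 = "sugar cane beet black chocolate hot"

modified = rocchio("sweet chocolate icing sugar", [d1, d3], [d2, d4])
for term, weight in modified.items():
    print(f"{term}: {weight:.3f}")
```

If a different convention is intended (for example, summing the document vectors without averaging, or using tf-idf weights), the numbers will differ, so the convention used should be stated alongside the written steps.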

Question 2: Answer to the point [Marks: 20].


Omar has implemented a relevance feedback web search system, where he is going to do
relevance feedback based only on words in the title text returned for a page (for
efficiency). The user is going to rank 3 results. The first user, Jinxing, queries for:
banana slug
and the top three titles returned are:
banana slug Ariolimax columbianus
Santa Cruz mountains banana slug
Santa Cruz Campus Mascot
Jinxing judges the first two documents relevant and the third nonrelevant. Assume that
Omar’s search engine uses term frequency but neither length normalization nor IDF. Assume
that he is using the Rocchio relevance feedback mechanism, with α = β = γ = 1. Show the
final revised query that would be run. (Please list the vector elements in alphabetical
order.)
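
The setup above can be sketched as explicit term-frequency vectors over an alphabetically ordered vocabulary. This is an illustrative sketch only: it assumes titles are lowercased before counting, that the relevant and nonrelevant sets are averaged (centroid form) before being weighted, and that negative components are set to 0, which this question does not state. The helper names tf_vector and centroid are hypothetical.

```python
from collections import Counter

query = "banana slug"
relevant = [
    "banana slug Ariolimax columbianus",
    "Santa Cruz mountains banana slug",
]
non_relevant = ["Santa Cruz Campus Mascot"]

# Alphabetical vocabulary over the query and all titles (lowercased: an assumption).
vocab = sorted(
    {t for text in [query] + relevant + non_relevant for t in text.lower().split()}
)


def tf_vector(text):
    """Raw term-frequency vector over vocab: no length normalization, no IDF."""
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocab]


def centroid(texts):
    """Component-wise mean of the tf vectors of the given texts."""
    vectors = [tf_vector(t) for t in texts]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


alpha = beta = gamma = 1.0
q = tf_vector(query)
rel_c = centroid(relevant)
non_c = centroid(non_relevant)

# Rocchio update, before and after setting negative components to 0.
revised = [alpha * qi + beta * r - gamma * n for qi, r, n in zip(q, rel_c, non_c)]
clipped = [max(0.0, w) for w in revised]

for term, raw, final in zip(vocab, revised, clipped):
    print(f"{term:12s} raw={raw:5.2f} final={final:4.2f}")
```

Some treatments sum the judged documents instead of averaging them; under that convention the components from the two relevant titles would be doubled, so the written answer should say which form is used.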
