Assignment3 Ans
Exercise : Suppose your search engine has just retrieved the top 50 documents from your collection
based on scores from a ranking function R(Q,D). Your user interface can show only 10 results, but you
can pick any of the top 50 documents to show. Why might you choose to show the user something other
than the top 10 documents from the retrieved document set?
Answer:
The reasons to show something other than the top 10 documents depend on several factors. For
example, some of the documents may be sponsored and carry more weight than the rest, so a search
engine can give priority to documents that are paid to appear in the top 10 results. The choice
can also be based on the user's history.
Exercise : Differentiate between simple inverted index, inverted index with counts, and inverted index
with positions. State the advantages and disadvantages for each one.
Answer:
Simple inverted index: An inverted index is a mapping from words to their locations in a set of
documents. In its simplest form it is a hash table that maps each word in the documents to some sort
of document identifier.
Inverted index with counts: Document postings can store any information needed for efficient ranking.
Storing word occurrence counts supports better ranking algorithms, because the counts help us rank
the most relevant documents first.
Inverted index with positions: The postings also record where each word occurs. Each posting contains
two numbers: a document number first, followed by a word position.
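The three variants can be compared side by side on a toy collection. The following is only a minimal sketch: the two documents, the function name build_indexes, and the whitespace tokenization are assumptions made for illustration, not part of the assignment.

```python
from collections import defaultdict

# Toy collection; document ids and text are made up for illustration.
docs = {
    1: "tropical fish include fish found in tropical environments",
    2: "fish live in water",
}

def build_indexes(docs):
    """Build the three index variants from a {doc_id: text} mapping."""
    simple = defaultdict(set)      # word -> set of document ids
    counts = defaultdict(dict)     # word -> {document id: occurrence count}
    positions = defaultdict(list)  # word -> [(document id, word position), ...]
    for doc_id, text in sorted(docs.items()):
        # Word positions are 1-based, matching "second word in document 1".
        for pos, word in enumerate(text.split(), start=1):
            simple[word].add(doc_id)
            counts[word][doc_id] = counts[word].get(doc_id, 0) + 1
            positions[word].append((doc_id, pos))
    return simple, counts, positions

simple, counts, positions = build_indexes(docs)
print(sorted(simple["fish"]))  # which documents contain the word
print(counts["fish"])          # how often it occurs in each document
print(positions["fish"])       # where it occurs in each document
```

Each variant stores strictly more information than the one before it, so the index grows while the set of queries and ranking functions it can support grows too.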
Exercise : State 3 different options to handle searches that prefer and benefit from looking at document
fields.
Exercise : Given the information below, which documents contain the word “fish” in their title?
Document 1. Its title includes the word “fish”, because the inverted list for “fish” tells us that
“fish” is the second word in document 1.
(17,3), (34,6), (45,9), (52,12)
Assignments 2-3
• Document-at-a-time
– Calculates complete scores for documents by processing all term lists, one document at
a time.
• Term-at-a-time
Both approaches have optimization techniques that significantly reduce the time required to generate
scores.
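The difference between the two evaluation orders can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the index contents are hypothetical, the score is simply the sum of term counts standing in for R(Q,D), and term-at-a-time is taken to mean accumulating partial scores one term list at a time.

```python
from collections import defaultdict

# Hypothetical index with counts: term -> {document id: term count}.
index = {
    "tropical": {1: 2, 3: 1},
    "fish": {1: 2, 2: 1, 3: 3},
}

def term_at_a_time(query_terms, index):
    """Process one term list at a time, adding partial scores to accumulators."""
    accumulators = defaultdict(int)
    for term in query_terms:
        for doc_id, count in index.get(term, {}).items():
            accumulators[doc_id] += count  # score is partial until all lists are read
    return dict(accumulators)

def document_at_a_time(query_terms, index):
    """Compute the complete score of each document before moving to the next."""
    lists = [index.get(term, {}) for term in query_terms]
    all_docs = sorted(set().union(*lists))  # every document in any term list
    scores = {}
    for doc_id in all_docs:
        # All term lists are consulted for this one document.
        scores[doc_id] = sum(postings.get(doc_id, 0) for postings in lists)
    return scores

query = ["tropical", "fish"]
print(term_at_a_time(query, index))
print(document_at_a_time(query, index))
```

Both functions produce the same scores; they differ only in the order in which postings are visited, which is what the optimization techniques for each approach exploit.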
• Make the decision about stemming at query time rather than during indexing, which improves flexibility.
Pseudo-relevance feedback is one of the methods for improving search engine results. Information is
extracted automatically from the results of an initial search, a new query is formed as an
expansion of the original query, and the search is run again with the expanded query.
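A minimal sketch of the expansion step is shown here, assuming the top-ranked documents are available as plain text and that the most frequent unseen words make reasonable expansion terms. The function name expand_query and its parameters are hypothetical, and real systems typically weight candidate terms more carefully than by raw frequency.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_docs=2, new_terms=2):
    """Add the most frequent unseen words from the top-ranked documents."""
    counts = Counter()
    # Treat the top-ranked documents as if the user had marked them relevant.
    for text in ranked_docs[:top_docs]:
        counts.update(w for w in text.lower().split() if w not in query_terms)
    expansion = [word for word, _ in counts.most_common(new_terms)]
    return list(query_terms) + expansion

# Hypothetical results of a first search for "fish", best match first.
first_results = [
    "tropical fish aquarium tank",
    "fish aquarium care",
    "stock market news",
]
print(expand_query(["fish"], first_results, top_docs=2, new_terms=1))
```

The expanded query would then be submitted to the search engine again, with no input from the user.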
Relevance feedback involves the user in the retrieval process in order to improve the final result set.
10. Snippet generation involves more features than just the significance factor. State 4 of these
features.
pooling is used. In this technique, the top k results (for TREC, k varied between 50 and 200) from the
rankings obtained by different search engines (or retrieval algorithms) are merged into a pool,
duplicates are removed, and the documents are presented in some random order to the people doing
the relevance judgments. Pooling produces a large number of relevance judgments for each query.
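The pooling steps described above can be sketched as follows. The function name build_pool, the document ids, and the optional seed parameter are assumptions introduced for illustration.

```python
import random

def build_pool(rankings, k, seed=None):
    """Merge the top k results of each ranking, drop duplicates, and shuffle."""
    pool = set()
    for ranking in rankings:
        pool.update(ranking[:k])  # the set removes duplicates across systems
    pooled = sorted(pool)         # fixed starting order so a seed is reproducible
    random.Random(seed).shuffle(pooled)  # judges see the documents in random order
    return pooled

# Hypothetical rankings from two retrieval algorithms, best match first.
rankings = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d5", "d1", "d6"],
]
print(build_pool(rankings, k=2, seed=0))
```

Shuffling hides both the rank and the source system of each document, so the judgments are not biased toward any one search engine.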
• Typical contents
– List of URLs of results, their ranks on the result list, and whether they were clicked on
– Timestamp(s) - records the time of user events such as query submission, clicks
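One way to picture such a log entry is as a small record type. This is only a sketch: the class and field names are hypothetical, and the query field is an assumption that does not appear in the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ResultEntry:
    url: str               # URL of the result
    rank: int              # its rank on the result list
    clicked: bool = False  # whether the user clicked on it

@dataclass
class QueryLogRecord:
    query: str             # assumed field: the query text itself
    submitted_at: datetime # timestamp of the query submission
    results: list = field(default_factory=list)      # list of ResultEntry
    click_times: list = field(default_factory=list)  # timestamps of clicks

    def clicked_urls(self):
        """Return the URLs the user clicked, in rank order."""
        ordered = sorted(self.results, key=lambda r: r.rank)
        return [r.url for r in ordered if r.clicked]

record = QueryLogRecord(
    query="tropical fish",
    submitted_at=datetime(2024, 1, 1, 12, 0, 0),
    results=[
        ResultEntry("http://example.com/a", rank=1),
        ResultEntry("http://example.com/b", rank=2, clicked=True),
    ],
    click_times=[datetime(2024, 1, 1, 12, 0, 7)],
)
print(record.clicked_urls())
```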