
Information Retrieval COMPSCI 121 / IN4MATX 141


Quiz #3 – Permutation A - 02/27/2018 WITH ANSWERS

Topics: Boolean, Ranked Retrieval, Vector Space Model and Efficient Scoring

Name __________________________________________________________________________________

Student ID______________________________________________________________________________

This exam is individual, closed-book and closed-notes.
▪ If you’re taking the online version: during the quiz, you are not allowed to use
other programs or visit sites other than the quiz page on your Canvas session.
▪ If you’re taking the paper version: you are only allowed to use this sheet (both
sides) and return it with your answers. No scratch paper is allowed.

Multiple Choice Questions: Please choose only one answer per question

Q1 – Imagine you have a collection of a million documents (N) with an average of
3,000 words per document and a total of M=400,000 terms (unique words). Which
statement is false regarding its Term-Document Incidence Matrix (TDIM)?
☐The TDIM would be extremely sparse (most entries would be 0).
☐The TDIM dimension is 4×10¹¹.
☒Given the TDIM, we can calculate the term frequency (tf).
☐Given the TDIM, we can calculate the document frequency (df).
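
The figures in Q1 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant names and the three-document toy matrix are assumptions made for this example, not part of the quiz. It computes the number of cells in the matrix, an upper bound on how many of them can be non-zero, and the document frequency from a small 0/1 matrix.

```python
# Collection figures quoted in Q1.
N = 1_000_000              # documents
M = 400_000                # unique terms
AVG_WORDS_PER_DOC = 3_000  # average document length in words

# One cell per (term, document) pair.
cells = M * N

# Upper bound on non-zero cells: assume every word in a document is a distinct term.
max_nonzero = N * AVG_WORDS_PER_DOC
max_density = max_nonzero / cells

print(f"cells in the TDIM: {cells:.0e}")
print(f"share of cells that can be 1 (upper bound): {max_density:.2%}")

# Toy incidence matrix (hypothetical data): rows are terms, columns are documents.
toy_tdim = {
    "web":       [1, 0, 1],
    "ranked":    [0, 1, 1],
    "retrieval": [1, 1, 1],
}

# df is the number of documents containing the term, i.e. the row sum.
df = {term: sum(row) for term, row in toy_tdim.items()}
print(df)

# A 0/1 entry only records presence, so the number of occurrences of a term
# inside a document cannot be read back from this matrix.
```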

Q2 – Which of the following statements is false with regard to the Boolean Retrieval
model?
☐It answers queries based on Boolean expressions (AND, OR and NOT).
☐It does not capture information about term position in the documents.
☒It considers document structure (zones in documents, such as headers).
☐It can combine two operators, such as “AND NOT” or “OR NOT”.

Q3 – Select the most efficient processing order for the Boolean query Q.
Q: “web AND ranked AND retrieval”.
Term        Doc. Freq
web         522,196
ranked      105,384
retrieval   483,259

☐(web AND ranked) first, then merge with retrieval.
☐(web AND retrieval) first, then merge with ranked.
☒(retrieval AND ranked) first, then merge with web.
☐Any combination would result in the same number of operations.
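
A common heuristic for conjunctive queries is to intersect the shortest posting lists first, so that intermediate results stay small. The sketch below assumes an in-memory index that maps each term to a sorted list of document IDs; the function names and the toy index are hypothetical and chosen only for illustration. The document frequencies are the ones given in Q3.

```python
def intersect(p1, p2):
    """Merge two sorted posting lists and return the doc IDs present in both."""
    i, j, out = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def process_and_query(terms, index):
    """Process 'A AND B AND ...' starting from the rarest terms."""
    ordered = sorted(terms, key=lambda t: len(index[t]))
    result = index[ordered[0]]
    for term in ordered[1:]:
        result = intersect(result, index[term])
    return ordered, result


# Document frequencies from Q3: sorting by df gives the processing order.
doc_freq = {"web": 522_196, "ranked": 105_384, "retrieval": 483_259}
order = sorted(doc_freq, key=doc_freq.get)
print(order)

# Toy index (hypothetical data) with the same relative list lengths.
toy_index = {
    "web":       [1, 2, 3, 4, 5, 6],
    "ranked":    [2, 5],
    "retrieval": [1, 2, 4, 5],
}
toy_order, hits = process_and_query(["web", "ranked", "retrieval"], toy_index)
print(toy_order, hits)
```

Sorting by document frequency puts "ranked" and "retrieval" first and "web" last, which matches the option marked in the answer key.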

Q4 – Find the Jaccard coefficient (Jc) for the query q and docs d1 and d2 below.
q: machine learning
d1: deep learning and artificial intelligence
d2: learning how to work with a virtual machine

☒Jc(q,d1)=1/6, Jc(q,d2)=1/4
☐Jc(q,d1)=0, Jc(q,d2)=1/4
☐Jc(q,d1)=1/3, Jc(q,d2)=1/5
☐Jc(q,d1)=1/4, Jc(q,d2)=1/6
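
The Jaccard coefficient of two texts is the size of the intersection of their word sets divided by the size of their union. The sketch below assumes whitespace tokenization and lowercasing, with no stemming or stop-word removal; the function name is a placeholder for this example. `Fraction` is used so the results print as exact ratios.

```python
from fractions import Fraction


def jaccard(text_a, text_b):
    """Jaccard coefficient over the sets of whitespace-separated words."""
    set_a = set(text_a.lower().split())
    set_b = set(text_b.lower().split())
    if not set_a and not set_b:
        return Fraction(0)  # convention chosen here for two empty texts
    return Fraction(len(set_a & set_b), len(set_a | set_b))


q = "machine learning"
d1 = "deep learning and artificial intelligence"
d2 = "learning how to work with a virtual machine"

jc_d1 = jaccard(q, d1)  # 1 shared word, 6 words in the union
jc_d2 = jaccard(q, d2)  # 2 shared words, 8 words in the union
print(jc_d1, jc_d2)
```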


Q5 – Which of the following statements is false with regard to the Term-Document
Count Matrix (TDCM) of a set of M terms in a collection of N documents?
☐Given the TDCM, a document can be represented as a vector of natural numbers.
☐The TDCM considers term frequency (tf).
☐Given the TDCM, we can calculate the document frequency (df).
☒The TDCM considers positional information of the terms within the document.
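
A small count matrix makes the properties in Q5 concrete. The sketch below is built on a hypothetical three-document collection with whitespace tokenization, both of which are assumptions for this example. It shows a document as a vector of counts, the document frequency derived from the counts, and two word orders that produce the same vector.

```python
from collections import Counter

# Hypothetical toy collection.
docs = {
    "d1": "web search web ranking",
    "d2": "ranking search",
    "d3": "web retrieval",
}

vocab = sorted({word for text in docs.values() for word in text.split()})
counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

# Term-document count matrix: one row per term, one column per document.
tdcm = {term: [counts[doc_id][term] for doc_id in docs] for term in vocab}

# A document is the column of counts across the vocabulary (natural numbers).
d1_vector = [tdcm[term][0] for term in vocab]

# df is the number of documents with a non-zero count for the term.
df = {term: sum(1 for c in row if c > 0) for term, row in tdcm.items()}

# Word order is lost: a reordered document yields exactly the same counts.
same_counts = Counter("ranking web web search".split()) == counts["d1"]

print(vocab)
print(d1_vector)
print(df)
print(same_counts)
```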

Q6 – Mark the false statement with regard to the document frequency (df).
☐Frequent terms are less informative than rare terms.
☐The ‘df’ of a term ‘t’ can be found as the length of the posting list of t.
☒All the other statements are false.
☐The ‘df’ is an inverse measure of the informativeness of ‘t’.

Q7 – Which of the following statements is false with regard to the Vector Space
Model?
☐Terms represent dimensions, which results in a high-dimensional space.
☐Documents and queries are represented as weighted tf-idf vectors in the space.
☐The distance between query and document is not a good basis for ranking by similarity.
☒Documents should be ranked in decreasing order of cosine(query, document).

Q8 – Efficiency plays a key role in ranked retrieval. (a) What is the primary
computational bottleneck when ranking documents, and (b) how can it be mitigated?
☒(a) Computing scores. (b) Reducing ranking precision in favor of efficiency.
☐(a) Keeping the dictionary in main memory. (b) Using compression.
☐(a) Sorting scores. (b) Using a heap data structure instead of an array.
☐(a) Sorting scores. (b) Breaking posting lists down (high and low tiers).

Q9 – Imagine you are constructing Tiered Indexes to improve the efficiency of your
search engine. Which statement is false?
☐You will break the index up into tiers of decreasing importance.
☐You can break the index by Authority or term frequency, among other scores.
☒Using Authority to break the index, the same document may appear in different
tiers.
☐Using term frequency to break the index, the same document may appear in
different tiers.

Q10 – In lecture, we saw that Authority is a Static Quality Score of a document. Which
of the following is not an Authority signal?
☒Any website with a valid registered domain (.com).
☐Research papers with many citations.
☐Wikipedia among websites.
☐Articles in certain newspapers.
