Quiz IRW18 3A Answers PDF
This study source was downloaded by 100000826471882 from CourseHero.com on 06-08-2022 00:44:19 GMT -05:00
https://www.coursehero.com/file/30800343/Quiz-IRW18-3A-Answerspdf/
Information Retrieval COMPSCI 121 / IN4MATX 141
Quiz #3 – Permutation A - 02/27/2018 WITH ANSWERS
Q5 – Which of the following statements is false with regard to the Term-Document
Count Matrix (TDCM) of a set of M terms in a collection of N documents?
☐ Given the TDCM, a document can be represented as a vector of natural numbers.
☐ The TDCM considers term frequency (tf).
☐ Given the TDCM, we can calculate the document frequency (df).
☒ The TDCM considers positional information of the terms within the document.
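The properties listed in Q5 can be checked on a toy example. The following is a minimal sketch, assuming a hypothetical three-document collection and whitespace tokenization; the names `docs`, `tdcm`, `doc_vector`, and `document_frequency` are illustrative and not taken from the course material.

```python
from collections import Counter

# Hypothetical toy collection; the texts and IDs are illustrative only.
docs = {
    "d1": "new home sales top forecasts",
    "d2": "home sales rise in july",
    "d3": "home home prices rise",
}

# Term frequencies per document (bag of words: word order is discarded).
counts = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}
terms = sorted({t for c in counts.values() for t in c})
doc_ids = sorted(docs)

# M x N count matrix: one row per term, one column per document.
tdcm = [[counts[d][t] for d in doc_ids] for t in terms]

def doc_vector(doc_id):
    """Column of the matrix: the document as a vector of natural numbers."""
    j = doc_ids.index(doc_id)
    return [row[j] for row in tdcm]

def document_frequency(term):
    """Number of documents whose count for the term is nonzero."""
    return sum(1 for c in tdcm[terms.index(term)] if c > 0)

print(doc_vector("d3"))
print(document_frequency("home"))
```

Because each cell stores only a count, two documents with the same words in a different order produce identical columns, so positions cannot be recovered from the matrix.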
Q6 – Mark the false statement with regard to the document frequency (df).
☐ Frequent terms are less informative than rare terms.
☐ The ‘df’ of a term ‘t’ can be found as the length of the posting list of t.
☒ All the other statements are false.
☐ The ‘df’ is an inverse measure of the informativeness of ‘t’.
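The relationship between df, posting lists, and informativeness in Q6 can be sketched as follows. This is a minimal illustration, assuming a hypothetical inverted index of eight documents and the common definition idf = log10(N / df); the index contents and the function names are assumptions, not part of the quiz.

```python
import math

# Hypothetical inverted index: term -> sorted list of document IDs.
postings = {
    "the": [1, 2, 3, 4, 5, 6, 7, 8],
    "retrieval": [2, 5],
}
N = 8  # assumed collection size

def df(term):
    """Document frequency: the length of the term's posting list."""
    return len(postings[term])

def idf(term):
    """Inverse document frequency, log10(N / df); defined for indexed terms only."""
    return math.log10(N / df(term))

for term in postings:
    print(term, df(term), idf(term))
```

The frequent term gets the lower idf, which is the sense in which df is an inverse measure of informativeness.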
Q7 – Which of the following statements is false with regard to the Vector Space
Model?
☐ Terms represent dimensions, which results in a high-dimensional space.
☐ Documents and queries are represented as weighted tf-idf vectors in the space.
☐ The query-document distance is not a good measure for ranking by similarity.
☒ Documents should be ranked in decreasing order of cosine(query, document).
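The contrast between distance and cosine in Q7 can be seen with a document appended to itself: the content is unchanged, but the vector doubles in length. The sketch below is illustrative only; the vector `d` holds made-up weights, and `cosine` and `euclidean` are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d = [3.0, 1.0, 0.0]             # made-up term weights for a document
d_doubled = [2 * x for x in d]  # the same document appended to itself

print(euclidean(d, d_doubled))  # large, although the content is identical
print(cosine(d, d_doubled))     # approximately 1: same direction
```

Euclidean distance penalizes the longer document even though its term distribution is the same, whereas the cosine depends only on the angle between the vectors.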
Q8 – Efficiency plays a key role in ranked retrieval. (a) What is the primary
computational bottleneck when ranking documents, and (b) how can it be mitigated?
☒ (a) Computing scores. (b) Reducing ranking precision in favor of efficiency.
☐ (a) Keeping the dictionary in main memory. (b) Using compression.
☐ (a) Sorting scores. (b) Breaking posting lists down (high and low tiers).
☐ (a) Sorting scores. (b) Using a heap data structure instead of an array.
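One way to trade ranking precision for fewer score computations is a champion list: for each term, keep only the r documents with the highest tf and score only those. The quiz does not name a specific technique, so this is an assumed example; the postings, the value r=2, and the tf-sum scoring (a stand-in for a real tf-idf score) are all hypothetical.

```python
import heapq

# Hypothetical postings: term -> {doc_id: term frequency}.
postings = {
    "cheap": {1: 5, 2: 1, 3: 2, 4: 1},
    "flights": {2: 4, 3: 3, 5: 1},
}

def champion_lists(postings, r):
    """Keep, per term, only the r documents with the highest tf."""
    return {
        term: dict(heapq.nlargest(r, plist.items(), key=lambda item: item[1]))
        for term, plist in postings.items()
    }

def score(query_terms, index):
    """Sum of tf over the query terms; a stand-in for a real tf-idf score."""
    scores = {}
    for term in query_terms:
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return scores

champions = champion_lists(postings, r=2)
exact = score(["cheap", "flights"], postings)
approx = score(["cheap", "flights"], champions)

print(len(exact), len(approx))  # fewer documents are scored with champion lists
print(exact[2], approx[2])      # document 2 loses part of its score
```

The approximate run touches fewer postings, at the cost of scores (and therefore rankings) that may differ from the exact ones.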
Q9 – Imagine you are constructing Tiered Indexes to improve the efficiency of your
search engine. Which statement is false?
☐ You will break the index up into tiers of decreasing importance.
☐ You can break the index by Authority or term frequency, among other scores.
☒ Using Authority to break the index, the same document may appear in different
tiers.
☐ Using term frequency to break the index, the same document may appear in
different tiers.
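The difference between the two splitting criteria in Q9 can be sketched with a two-tier index. This is a minimal illustration under assumed data: the postings, the authority scores, and both thresholds are hypothetical, as are the function names.

```python
# Hypothetical postings (term -> {doc_id: tf}) and static authority scores.
postings = {
    "car": {1: 30, 2: 4},
    "insurance": {1: 3, 2: 25},
}
authority = {1: 0.9, 2: 0.2}

def tiers_by_tf(postings, threshold):
    """Tier 0 holds postings with tf >= threshold, tier 1 holds the rest."""
    tiers = [{}, {}]
    for term, plist in postings.items():
        for doc_id, tf in plist.items():
            tier = 0 if tf >= threshold else 1
            tiers[tier].setdefault(term, []).append(doc_id)
    return tiers

def tiers_by_authority(postings, authority, threshold):
    """Tier 0 holds documents with authority >= threshold, tier 1 the rest."""
    tiers = [{}, {}]
    for term, plist in postings.items():
        for doc_id in plist:
            tier = 0 if authority[doc_id] >= threshold else 1
            tiers[tier].setdefault(term, []).append(doc_id)
    return tiers

def docs_in(tier):
    """Set of document IDs that appear anywhere in one tier."""
    return {d for plist in tier.values() for d in plist}

tf_tiers = tiers_by_tf(postings, threshold=20)
auth_tiers = tiers_by_authority(postings, authority, threshold=0.5)

print(docs_in(tf_tiers[0]) & docs_in(tf_tiers[1]))      # documents in both tiers
print(docs_in(auth_tiers[0]) & docs_in(auth_tiers[1]))  # empty set
```

Term frequency depends on the (term, document) pair, so a document can land in the high tier for one term and the low tier for another. Authority is a single score per document, so each document falls into exactly one tier.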
Q10 – In lecture, we saw that Authority is a Static Quality Score of a document. Which
of the following is not an Authority signal?
☒ Any website with a valid registered domain (.com).
☐ Research papers with many citations.
☐ Wikipedia among websites.
☐ Articles in certain newspapers.