JIMMA UNIVERSITY

Jimma Institute of Technology (CS Department)


Introduction to Information Retrieval Final Exam

Time Allowed: 3 Hrs.


Full Name: ____________________________ Roll No. ______________Department ________

PART-I: Multiple Choices

Answer each of the following questions with the best choice by writing
the letter of your choice in the space provided. (1 point each).

____1. Which of the following is not the main focus of an IR model?


A. Document representation
B. Query representation.
C. Performance evaluation.
D. Determining similarity score.
E. Determining the notion of relevance.
F. None of the above.

____2. From a user’s perspective, IR efficiency should be measured in terms of the time a user
spends in the following activities except:
A. Query formulation and generation.
B. Query expression and execution.
C. Scanning search results to select relevant items.
D. Scanning search results to reject non-relevant items.
E. Reading or using the relevant items identified.
F. None of the above.

____3. Which of the following is/are true about the limitations of keyword searching?
A. It cannot retrieve documents that contain synonymous terms.
B. It cannot retrieve documents that contain word form variants.
C. It may retrieve irrelevant documents that contain ambiguous terms.
D. All of the above.
E. B and C.
F. A and B.

____4. Which of the following is not among the major challenges that face Search Engine
developers?
A. Huge and heterogeneous collection.
B. Highly static and centralized collection or repositories.
C. Hyperlinked and heterogeneous information sources.
D. Unsophisticated and diversified users.
E. Spams and unethical Search Engine Optimization.
F. None of the above.

____5. The success of searching and finding relevant information depends on:
A. The nature and type of the information the user is seeking.
B. The nature and structure of the document repository.
C. The efficiency of IR tools and facilities available to the users.
D. The skill and ability of the user to use the IR system.
E. All of the above.
F. None of the above.

____6. Which of the following statement(s) is/are not true about the difference between a DBMS
and an IR system?
A. While a DBMS focuses on well-structured data an IR is concerned with unstructured data.
B. While a DBMS is designed to process well-defined (formal) query an IR is often concerned
with free-text and natural language query.
C. While a search in a DBMS is often probabilistic a search in an IR system is often
deterministic.
D. While the main focus of IR evaluation is retrieval accuracy, the main focus of DBMS
evaluation is retrieval time.
E. C and D.
F. None of the above.

____7. Which of the following statement(s) is/are not true about the “Bag-of-Words” model?
A. The sequence and order of words are ignored.
B. A document is represented by a list of the terms it contains.
C. It is a simple approach that has been found to be very effective in IR.
D. The syntactic structure and semantic features of words are considered.
E. All of the above.
F. None of the above.

____8. From a user standpoint, the relevance of a search result is:
A. Subjective
B. Situational
C. Cognitive
D. Dynamic
E. All of the above.
F. None of the above.

PART- II: True/False.
Answer each of the following questions by writing True or False in the space
provided. Justify each of your answers with a brief description.
(1 point each).

______9. The expectations and information-seeking behavior of users need to be anticipated for
document indexing and representation to be effective.

______10. In IR, the notion of relevance is not only continuous but also dynamic.

______11. Since most words in a document have similar descriptive power, index terms need not
be weighted differently.

______12. Most IR models, including VSM, are based on statistical properties of a text
(document) rather than the linguistic features of the text.

______13. Query refinement techniques such as query expansion and relevance feedback cannot
modify and improve the ranking of search results.

______14. In IR, stemming is often used to improve precision.

______15. An IR system based on the Boolean model often returns either too few or too many documents.

PART- III: Essay Questions

Answer all of the following questions (Qs16 – Qs25) by writing your answers on the
answer sheets provided. Clarify your answers by giving relevant examples wherever
appropriate. (4 points each).

16. Suppose that you have been contracted by the manager of a local school library to
manage a project on the development of an automated IR system. Assume that the library has about
2000 users (students), and about 10,0000 published (or printed) documents such as textbooks and
reference materials that have been manually catalogued.

(a.) Describe at least 3 major initial tasks or activities that you need to consider in order to
develop the proposed IR system.

(b.) Describe at least 4 important metadata elements that can be identified and used to index and
access the library collection.

(c.) Draw the general architecture and basic components of the IR system that you would suggest
for the automation of the school library.

17. Explain why and how the following pre-processing steps should be handled in organizing
and representing the content of documents. Describe the difficulties associated with each step.

(a.) Tokenization.
(b.) Identification and elimination of stop-words.
(c.) Stemming.

18. Explain why the following characteristics of online content are considered challenges in
designing and building Search Engines.

(a.) Huge and heterogeneous collection


(b.) Highly dynamic and distributed information resources
(c.) Multilingual and multimedia information resources

20. (a.) What do you understand by the “bag-of-words” model?
(b.) Why are keyword queries often poor descriptions of actual information needs?
(c.) Describe the problems of compound words and phrasal terms in IR.

21. Explain why the following issues should be taken into account in designing and evaluating
Search Engines (SE).

(a.) Efficiency of SE.


(b.) Coverage of SE.
(c.) Freshness of the index database.
(d.) Scalability and adaptability of SE.

22. (a.) What do you understand by IR models?


(b.) Explain the significance of IR models in indexing and representing documents.
(c.) Compare and contrast the Vector Space Model with the Boolean Model.
(d.) Describe at least three major limitations of the Boolean Model that are addressed by the
Vector Space Model.

23. Compare and contrast the following concepts.


(a.) Retrieval vs. Browsing
(b.) Question Answering system vs. IR
(c.) Tokens vs. Terms
(d.) Exact match vs. Best match

24. Given the term frequencies (raw counts) and document frequencies shown in the following
table, compute TF (in Log10) and IDF (in Log10) as well as Term Weight (TF.IDF). Assume the
collection contains 1,000,000 documents. Show the details of your computation.

Term          Raw term count   Document    TF    IDF    Term Weight
              (Frequency)      Frequency                (TF.IDF)
database      200              10
retrieval     600              100
index         110              1000
information   3800             2,0000
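
One way to check the computation is sketched below in Python. The question does not state which log-scaled TF formula is intended, so this sketch assumes the common textbook form TF = 1 + log10(raw count), together with IDF = log10(N / document frequency). The function names are illustrative only. The row for "information" is left out because its document frequency is printed ambiguously in the table.

```python
import math

N_DOCS = 1_000_000  # collection size given in the question


def log_tf(raw_count):
    """Log-scaled term frequency (ASSUMED form: 1 + log10(count))."""
    return 1 + math.log10(raw_count) if raw_count > 0 else 0.0


def idf(doc_freq, n_docs=N_DOCS):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(n_docs / doc_freq)


def tf_idf(raw_count, doc_freq, n_docs=N_DOCS):
    """Term weight as the product of the two factors above."""
    return log_tf(raw_count) * idf(doc_freq, n_docs)


# (raw term count, document frequency) copied from the table
table = {
    "database": (200, 10),
    "retrieval": (600, 100),
    "index": (110, 1000),
}

for term, (count, df) in table.items():
    print(f"{term:10s} TF={log_tf(count):.4f} IDF={idf(df):.4f} "
          f"weight={tf_idf(count, df):.4f}")
```

If the intended TF is plain log10(raw count) without the added 1, only `log_tf` needs to change; the IDF column is unaffected.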

25. (a.) Describe at least 3 problems of judging and determining relevance in IR.

(b.) Compute the Cosine Similarity (showing the steps) for each of the three documents (Doc1,
Doc2, Doc3) and the Query (Q) using the term weights (Wij) given in the following table.

(c.) Rank the documents according to their similarity score (or relevance) to the query.

Documents    Term1 weight   Term2 weight   Term3 weight
Doc1         2              3              5
Doc2         3              7              1
Doc3         5              3              2
Query (Q)    0              1              2
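
The cosine computation for the table above can be sketched in Python as follows. This is a minimal illustration that assumes the standard definition, the dot product of the two weight vectors divided by the product of their Euclidean lengths. The function and variable names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_a * norm_b)


# Term weights (Term1, Term2, Term3) copied from the table
docs = {
    "Doc1": (2, 3, 5),
    "Doc2": (3, 7, 1),
    "Doc3": (5, 3, 2),
}
query = (0, 1, 2)

scores = {name: cosine_similarity(vec, query) for name, vec in docs.items()}

# Highest similarity first
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Because the query weight for Term1 is 0, only the Term2 and Term3 weights contribute to each dot product, although all three weights still count toward each document's length.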
