TYCS SEM-6th Information Retrieval (MCQ) Question Bank

1) IR stands for ______________.

a) Information Retrieval
b) Information Retired
c) Inform Retrieval
d) Information Ready

2) Each item in the list is called a ______________.

a) Items
b) Posting
c) Query
d) Information
3) The term "etr" is called a _________ k-gram in a wildcard query.
4) Documents are searched by _______________ in IR.
5) SEO stands for _____________.
a) Search English Optimization
b) Search Engine Optimization
c) Search Engine Operator
d) Search Engine Operation
6) A dictionary is formed by a _________________ pair.
a) Key and Value
b) Value and Number
c) Id and Number
d) Name and code
7) An advantage of a positional index is that it reduces the asymptotic complexity of a
postings intersection operation.
A) True
B) False

8) _________ can best be described as a programming model used to develop Hadoop-based
applications that can process massive amounts of data.
A) MapReduce
B) Mahout
C) Oozie
D) All of the mentioned

9) The purpose of the inverse document frequency is to increase the weight of terms with
high collection frequency.
A) True
B) False

10) URL stands for ______________________.

a) Uniform Ravar Location
b) Uniform Resource Locator
c) Uni Resource Locate
d) Uniform Reverse Locator

11) A data structure that maps terms back to the parts of a document in which they occur is
called an
A) Postings list
B) Incidence Matrix
C) Dictionary
D) Inverted Index

12) The first large information retrieval research group was formed by ____________ at
Cornell in the 1960s.
a) Gerard Salton
b) Ratan Tata
c) Ramesh Bush
d) Think Roy
13) Input, Purpose and Output are the factors of _________.
a) Summarization
b) Question Answering
c) Page Rank
d) Personalized Search

14) A deadlock can be broken by

a) Committing one or more transactions
b) Aborting one or more transactions
c) Rolling back one or more transactions
d) Terminating one or more transactions.

15) NLTK stands for ______________ .

a) Natural Language Toolkit
b) Natural Lang Tool
c) Natural Long Tooltip
d) Nature Language Toolkit

16) Online transaction processing is used because

a) disk is used for storing files
b) it is efficient
c) it can handle random queries.
d) Transactions occur in batches

17) The primary storage medium for storing archival data is

a) floppy disk
b) magnetic disk
c) magnetic tape

18) Organizations have hierarchical structures because

a) it is convenient to do so
b) it is done by every organization
c) specific responsibilities can be assigned for each level
d) it provides opportunities for promotions

19) Spelling correction depends only on the ___________ factor.

a) Query
b) term
c) indexpowerd

20) Which of the following is a Boolean query operator?

a) +
b) -
d) <<<
21) A computer based information system is needed because
(i) The size of organizations has become large and data is massive
(ii) Timely decisions are to be taken based on available data
(iii) Computers are available
(iv) Difficult to get clerks to process data

a) (ii) and (iii)
b) (i) and (ii)
c) (i) and (iv)
d) (iii) and (iv)

22) Operational information is needed for

a) Day to day operations
b) Meet government requirements
c) Long range planning
d) Short range planning

23) Data by itself is not useful unless

a) It is massive
b) It is processed to obtain information
c) It is collected from diverse sources
d) It is properly stated

24) For taking decisions data must be

a) Very accurate
b) Massive
c) Processed correctly
d) Collected from diverse sources

25) CLEF stands for________

a) Cross Language Evaluation Forum
b) Cross lingual evaluating field
c) Cross Language Evaluating Field
d) Cross Language Evaluating Forum

26) Variable-size postings lists are used when

A) More seek time is desired and the corpus is dynamic
B) Less seek time is desired and the corpus is dynamic
C) Less seek time is desired and the corpus is static
D) More seek time is desired and the corpus is static

27) The best implementation approach for dynamic indexing is

A) Periodic re-indexing
B) Using an invalidation bit vector for deleted docs
C) None
D) Using logarithmic merge
28) Structured data allows for
A) Does not depend on data complexity
B) Less complex queries
C) No relationship
D) More complex queries

29) Data is represented in _________________ format in an IR system.

a) Text
b) Image
c) Audio text media
d) Options a, b and c

30) The term-document incidence matrix is

A) Sparse
B) Depends upon the data
C) Dense
D) Cannot predict

31) What is the contiguity hypothesis in vector space classification?

A) Documents from different classes don't overlap
B) Documents in the same class form a contiguous region of space
C) All of the above.
D) Intra cluster similarity is higher than inter-cluster similarity

32) Tactical information is needed for

A) Day to day operations
B) Meet government requirements
C) Long range planning
D) Short range planning

33) Strategic information is required by

a) Middle managers
b) Line managers
c) Top managers
d) All workers

34) A postings list is like an array structure in IR.

a) True
b) False
35) An index that includes sequences of words or terms of variable length that have been
extracted from a source document is called a
a) Phrase Index
b) Biword index
c) Positional index
d) Inverted Index
36) A process to efficiently intersect lists to be able to quickly find documents that contain
both terms is referred to as merging postings lists.
a) True
b) False
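
Study note for question 36: the operation described there can be sketched as a linear walk over two sorted postings lists. This is only an illustrative sketch, assuming each postings list is a Python list of document IDs in ascending order; the function name `intersect` and the sample lists are hypothetical and not part of the question bank.

```python
def intersect(p1, p2):
    """Return the document IDs present in both sorted postings lists."""
    answer = []
    i, j = 0, 0
    # Advance two pointers; each step moves past the smaller document ID.
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


# Hypothetical postings lists for two terms.
postings_a = [1, 2, 4, 11, 31, 45, 173, 174]
postings_b = [2, 31, 54, 101]
print(intersect(postings_a, postings_b))  # [2, 31]
```

Because both lists are sorted, the walk takes time proportional to the sum of their lengths rather than their product.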

37) The formula used to estimate the vocabulary size of a collection is known as:

a) Zipf's law
b) Power law
c) Heaps' law
d) Compression ratio

38) Weighted zone scoring is sometimes referred to as ranked Boolean retrieval.
a) True
b) False

39) In the bag of words model, the exact ordering of terms within the document is both
significant and relevant to processing.
a) True
b) False

40) The number of times that a word or term occurs in a document is called the:

a) Proximity Operator
b) Vocabulary Lexicon
c) Term Frequency
d) Indexing Granularity
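
Study note for question 40: the quantity described there can be computed by counting tokens in a document. This is a minimal sketch, assuming lowercasing and whitespace splitting as the tokenizer, which is a simplification; the name `count_terms` and the sample document are hypothetical.

```python
from collections import Counter


def count_terms(text):
    """Count how many times each whitespace-separated token occurs in text."""
    tokens = text.lower().split()
    return Counter(tokens)


# Hypothetical one-line document.
doc = "to be or not to be"
counts = count_terms(doc)
print(counts["to"])   # 2
print(counts["not"])  # 1
```

A real system would usually apply proper tokenization and normalization before counting.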

