
Consider the following documents:
Doc1: Data analysis deals with data sets
Doc2: Data mining in various fields
Doc3: Data analysis techniques and algorithms
Doc4: Data sets with temporal attributes
When the term-document incidence matrix is constructed and the query data AND (sets OR
analysis) is executed on it, which of the following documents will be retrieved?
Select one:
Doc1, Doc4
Doc1, Doc2, Doc3, Doc4
Doc1, Doc3, Doc4
Doc1
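A query like this can be checked by building the incidence matrix and combining the 0/1 rows bitwise. The sketch below assumes lowercased, whitespace-split terms (the question does not specify a tokenizer), and the names `incidence` and `boolean_query` are illustrative only.

```python
# Toy collection from the question above.
docs = {
    "Doc1": "Data analysis deals with data sets",
    "Doc2": "Data mining in various fields",
    "Doc3": "Data analysis techniques and algorithms",
    "Doc4": "Data sets with temporal attributes",
}

names = list(docs)
# Assumption: terms are lowercased, whitespace-separated words.
tokenized = {name: set(text.lower().split()) for name, text in docs.items()}
vocab = sorted(set().union(*tokenized.values()))

# incidence[term] holds one 0/1 flag per document, in the order of `names`.
incidence = {term: [int(term in tokenized[n]) for n in names] for term in vocab}


def boolean_query(matrix, doc_names):
    """Evaluate data AND (sets OR analysis) row-wise on the incidence matrix."""
    rows = zip(matrix["data"], matrix["sets"], matrix["analysis"])
    return [n for n, (d, s, a) in zip(doc_names, rows) if d & (s | a)]


result = boolean_query(incidence, names)
print(result)
```

Under these assumptions, Doc2 drops out because it contains neither "sets" nor "analysis".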
Given the Boolean query (Brutus OR Caesar) AND Mary, which of the following is an optimised form of
the query?
Select one:
Details not enough to predict the answer
Brutus OR (Caesar AND Mary)
(Brutus AND Mary) OR (Caesar AND Mary)
(Brutus OR Caesar) AND Mary
Which search engine is well supported with multiple languages?

Select one:
Google
Yahoo
Bing
Ask
Jaccard coefficient of (A, B) is zero if:
Select one:
A ∩ B = ∅
A = ∅
B = ∅
A ∪ B = ∅
Using Porter’s algorithm, which of the following pairs cannot be mapped to the same stem?
Select one:
volumes-volume
university-universe
cars-car
abandoned-abandon
Which of the following types of web pages can satisfy a user's informational query?

Select one:
FAQs
All of the choices
How-To Page
Category Page

Proximity operators are less efficient computationally because the index needs to store positional
information.
Select one:
False
Not enough information provided
True

Consider the following documents:
Doc1: Data analysis deals with data sets
Doc2: Data mining in various fields
Doc3: Data analysis techniques and algorithms
Doc4: Data sets with temporal attributes
When the term-document incidence matrix is constructed for the above documents, what will be the
total number of entries in the matrix?
Select one:
52
50
56
64
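The matrix has one cell per (term, document) pair, so the count is the vocabulary size times the number of documents. A minimal sketch, assuming case-insensitive, whitespace-split terms with no stop-word removal or stemming (the question does not state otherwise):

```python
docs = [
    "Data analysis deals with data sets",
    "Data mining in various fields",
    "Data analysis techniques and algorithms",
    "Data sets with temporal attributes",
]

# Assumption: distinct terms are lowercased whitespace-separated words.
vocab = {term for text in docs for term in text.lower().split()}

# One cell per (term, document) pair.
entries = len(vocab) * len(docs)
print(len(vocab), "terms x", len(docs), "docs =", entries, "entries")
```

A different normalisation (for example, dropping stop words) would change the vocabulary size and therefore the total.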
Given the wildcard query sa*t, which of the following keys would be looked up in a permuterm
index?
Select one:
$sat*
t$as*
sat$*
t$sa*
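In a permuterm index, a query of the form X*Y is rotated so that the wildcard ends up last, giving the lookup key Y$X*. The sketch below illustrates that rotation for single-wildcard queries; `permuterm_rotations` and `permuterm_key` are hypothetical helper names, not part of any library.

```python
def permuterm_rotations(term):
    """Return every rotation of term + '$', as a permuterm index would store."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def permuterm_key(query):
    """Rotate a query with exactly one '*' so that the '*' comes last."""
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one wildcard")
    prefix, suffix = query.split("*")
    return suffix + "$" + prefix + "*"


key = permuterm_key("sa*t")
print(key)

# A term such as "salt" matches because one of its rotations starts with the key prefix.
matches = [r for r in permuterm_rotations("salt") if r.startswith(key[:-1])]
print(matches)
```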

Given a document collection which has 35 relevant documents, if an IR system retrieves 10 relevant
and 13 irrelevant documents, what is the recall value of the system?

Select one:
0.29
0.66
0.33
0.43
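Recall is the fraction of all relevant documents that were retrieved, while precision is the fraction of retrieved documents that are relevant. A small sketch of both, using the counts from the question (the function name is illustrative):

```python
def precision_recall(relevant_retrieved, irrelevant_retrieved, total_relevant):
    """Return (precision, recall) from raw retrieval counts."""
    retrieved = relevant_retrieved + irrelevant_retrieved
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / total_relevant
    return precision, recall


precision, recall = precision_recall(
    relevant_retrieved=10, irrelevant_retrieved=13, total_relevant=35
)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```

Note that one of the distractors corresponds to the precision rather than the recall.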

A commonly used measure of overlap of two sets A and B (A is document and B is query):
Select one:
K-gram index
Permuterm index
Jaccard coefficient
Kappa coefficient

Given the Boolean query (cat OR bat) AND NOT (dog OR mat), which of the following is the
equivalent Disjunctive Normal Form of the query?
Select one:
(cat AND (NOT dog) AND (NOT mat)) OR (cat AND bat AND (NOT dog))
(cat AND bat AND (NOT dog)) OR (cat AND bat AND (NOT mat))
(cat AND (NOT dog) AND (NOT mat)) OR (bat AND (NOT mat) AND (NOT dog))
None of the choices
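Equivalence of two Boolean queries can be checked by brute force over all truth assignments. The sketch below applies De Morgan's law and distribution by hand to get one candidate DNF, then verifies it against the original query on all 16 assignments; the function names are illustrative.

```python
from itertools import product


def original(cat, bat, dog, mat):
    """(cat OR bat) AND NOT (dog OR mat)."""
    return (cat or bat) and not (dog or mat)


def candidate_dnf(cat, bat, dog, mat):
    """(cat AND NOT dog AND NOT mat) OR (bat AND NOT mat AND NOT dog)."""
    return (cat and not dog and not mat) or (bat and not mat and not dog)


def equivalent(f, g, arity=4):
    """True if f and g agree on every combination of truth values."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))


print(equivalent(original, candidate_dnf))
```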
If I search for term X, and term X has many synonyms, then precision is more likely to be a problem
than recall.
Select one:
False
True
Not enough information provided

Consider an index for 1 million documents each having a length of 1,000 words. Say there are 100K
distinct terms in total. What is the space requirement for an uncompressed term-document incidence
matrix considering 1 cell takes 1 bit?
Select one:
10^12 bits
10^13 bits
10^11 bits
10^10 bits
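An uncompressed incidence matrix needs one cell per (term, document) pair, so its size depends only on the number of distinct terms and the number of documents; the document length does not enter the calculation. A quick sketch of the arithmetic:

```python
num_docs = 10**6       # 1 million documents
num_terms = 10**5      # 100K distinct terms
bits_per_cell = 1

total_bits = num_docs * num_terms * bits_per_cell
total_gigabytes = total_bits / 8 / 10**9  # decimal gigabytes

print(f"{total_bits:.0e} bits (about {total_gigabytes} GB)")
```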

What is the query-document match score that the Jaccard coefficient computes for the document
below?
Query: month of march
Document: John died in march

Select one:
1/6
2/7
1/7
2/6
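The Jaccard coefficient is the size of the intersection divided by the size of the union of the two term sets. A minimal sketch, assuming lowercased whitespace-split terms and no stop-word removal:

```python
from fractions import Fraction


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two term collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return Fraction(0)  # convention chosen here for two empty sets
    return Fraction(len(a & b), len(union))


query = "month of march".lower().split()
document = "John died in march".lower().split()

score = jaccard(query, document)
print(score)
```

Only "march" is shared, while the union holds six distinct terms.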

Given a document containing the sentence ‘If there is a question, there is a solution’ , what are the
number of tokens in the sentence?
Select one:
9
4
10
6
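Tokens are counted with repetition, whereas types (distinct terms) are counted once each. The sketch below assumes that punctuation is discarded and words are the only tokens, which is one common convention rather than the only possible one:

```python
import re

sentence = "If there is a question, there is a solution"

# Assumption: a token is a maximal run of word characters; the comma is dropped.
tokens = re.findall(r"\w+", sentence)
types = {t.lower() for t in tokens}

print(len(tokens), "tokens;", len(types), "distinct types")
```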
When lemmatization is applied to the term "Destruction", to which of the following forms is it
reduced?

Select one:
Destruct
Destruc
Destroy

If X denotes the length of string s1 and Y denotes the length of the string s2, then the edit distance
between s1 and s2 is never more than which of the following?
Select one:
Max(X,Y)
None of the choices
Min(X,Y)
X+Y
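The bound can be explored with a standard dynamic-programming Levenshtein distance (insert, delete, and substitute, each at cost 1). This is a sketch under that cost model; the intuition is that substituting min(X, Y) characters and then inserting or deleting the remaining ones never takes more than max(X, Y) operations.

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # delete c1
                            curr[j - 1] + 1,      # insert c2
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


pairs = [("kitten", "sitting"), ("abc", "xyz"), ("", "abcd"), ("data", "date")]
for a, b in pairs:
    d = edit_distance(a, b)
    print(a, b, d, d <= max(len(a), len(b)))
```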

The type of index suitable for spelling correction:
Select one:
Biword Index
k-gram index
Extended Biword Index
Positional Index
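For illustration, a k-gram index maps each character k-gram to the vocabulary terms containing it, so a misspelled word can retrieve candidates that share many k-grams with it. The sketch below uses bigrams with `$` boundary markers and ranks candidates by Jaccard overlap; the vocabulary, threshold, and function names are all assumptions made for this example.

```python
from collections import defaultdict


def kgrams(term, k=2):
    """Set of character k-grams of term, padded with '$' boundary markers."""
    padded = f"${term}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_index(vocabulary, k=2):
    """Map each k-gram to the set of vocabulary terms that contain it."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return index


def candidates(misspelled, index, k=2, threshold=0.4):
    """Terms sharing k-grams with the input, ranked by Jaccard overlap."""
    grams = kgrams(misspelled, k)
    seen = set().union(*(index.get(g, set()) for g in grams))
    scored = []
    for term in seen:
        term_grams = kgrams(term, k)
        score = len(grams & term_grams) / len(grams | term_grams)
        if score >= threshold:
            scored.append((score, term))
    return [term for _, term in sorted(scored, reverse=True)]


index = build_index(["board", "border", "lord", "aboard"])
suggestions = candidates("bord", index)
print(suggestions)
```

A practical system would typically follow this filtering step with an edit-distance check on the surviving candidates.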

The OR operator assigns a higher score to documents that contain both terms.
Select one:
True
Not enough information provided
False
