Mid Solution

Limitations of Boolean search and of the Vector Space Model, with examples

Among the negative aspects of the Boolean approach that Salton identified were: the difficulty
of formulating a good Boolean query, the inability to limit the size of the output set, the fact
that output cannot be ranked in any logical or useful order, and the inability to weight the terms
in either the query or the document. Perhaps the most serious criticism of the Boolean
approach, however, is that it sometimes produces results that are counter to those one would
expect using intuitive reasoning.

For example, in an OR search (A OR B OR C . . .), a document containing only one of the terms is
accorded the same weight as one that contains many or all of the search terms. On the other
hand, in an AND search (A AND B AND C . . .) a document that contains all but one of the
required terms is treated in the same way (that is, not retrieved) as one that contains none of the
required terms.
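
The behaviour described above can be reproduced with plain set operations. This is a minimal sketch, not a real search engine: the four documents, the query terms and the helper names `boolean_or` and `boolean_and` are all made up for illustration.

```python
# Toy collection: each document is reduced to its set of terms (hypothetical data).
docs = {
    "d1": {"a", "b", "c"},  # contains all three query terms
    "d2": {"a", "b"},       # missing only "c"
    "d3": {"a"},            # contains a single query term
    "d4": {"x", "y"},       # contains none of the query terms
}
query = {"a", "b", "c"}


def boolean_or(docs, terms):
    """Return ids of documents that contain at least one query term."""
    return sorted(name for name, words in docs.items() if words & terms)


def boolean_and(docs, terms):
    """Return ids of documents that contain every query term."""
    return sorted(name for name, words in docs.items() if terms <= words)


or_hits = boolean_or(docs, query)    # d1, d2 and d3 come back as an unranked set
and_hits = boolean_and(docs, query)  # only d1; d2 is rejected exactly like d4
print(or_hits)
print(and_hits)
```

In the OR result nothing distinguishes d1 (three matching terms) from d3 (one matching term), and in the AND result d2 is dropped although it misses only one term.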

Limitations of the Vector Space Model, with examples


1. Long documents are poorly represented because they have poor similarity values (a
small scalar product and a large dimensionality).
2. Search keywords must precisely match document terms; word substrings might result in
a "false positive match".
3. Semantic sensitivity; documents with similar context but different term vocabulary
won't be associated, resulting in a "false negative match".
4. The order in which the terms appear in the document is lost in the vector space
representation.
5. The model theoretically assumes that terms are statistically independent.
6. Weighting is intuitive but not very formal.
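
Points 2, 3 and 4 can be illustrated with a few lines of Python. This is a rough sketch that uses raw term counts as the vector representation; the example sentences and the helper names `bow` and `dot` are assumptions chosen for the illustration.

```python
from collections import Counter


def bow(text):
    """Bag-of-words term counts; word order is discarded."""
    return Counter(text.lower().split())


def dot(u, v):
    """Scalar product of two sparse term-count vectors."""
    return sum(u[t] * v[t] for t in u.keys() & v.keys())


# Point 4: two sentences with different meanings map to the same vector.
same_vector = bow("dog bites man") == bow("man bites dog")

# Point 3: related content with different vocabulary shares no terms,
# so the scalar product is zero (a "false negative match").
synonym_overlap = dot(bow("car repair"), bow("automobile maintenance"))

# Point 2: substring matching finds "art" inside "party" (a "false positive
# match"), whereas matching on whole terms does not.
substring_hit = "art" in "birthday party"
token_hit = "art" in bow("birthday party")

print(same_vector, synonym_overlap, substring_hit, token_hit)
```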

How does TF-IDF differ from cosine similarity? Why does the cosine measure have an advantage
over TF-IDF?
TF-IDF is a term-weighting scheme: it gives you a weight for a given term in a given document.
Cosine similarity is a similarity measure: it gives you a score for two vectors that share the
same representation, for example TF-IDF weights over the same terms. TF-IDF can still be used
for ranking on its own, since "one of the simplest ranking functions is computed by summing the
tf–idf for each query term".
Cosine similarity compares the closeness of a query vector and a document vector that are
expressed over the same term axes. Because it divides by the lengths of both vectors, the score
depends on the angle between them rather than on their magnitude, so a long document is not
favored merely because it repeats the query terms more often.
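
The difference between the two scores can be seen on a toy collection. The sketch below is written under stated assumptions: the three documents are invented, the TF-IDF variant is raw term count times log(N / df) (one of several common definitions), and `tfidf` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter

# Hypothetical toy collection; d2 repeats the query terms inside a longer text.
docs = {
    "d1": "information retrieval",
    "d2": "information retrieval information retrieval and cooking recipes and travel notes",
    "d3": "cooking recipes",
}
query = "information retrieval"

tokenized = {name: text.split() for name, text in docs.items()}
n_docs = len(tokenized)
# Document frequency: in how many documents each term occurs.
df = Counter(term for tokens in tokenized.values() for term in set(tokens))


def tfidf(tokens):
    """Sparse TF-IDF vector: raw term count times log(N / df) (assumed variant)."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if t in df}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_vecs = {name: tfidf(tokens) for name, tokens in tokenized.items()}
query_vec = tfidf(query.split())

# Ranking 1: sum the document's TF-IDF weights of the query terms.
sum_scores = {name: sum(vec.get(t, 0.0) for t in query_vec)
              for name, vec in doc_vecs.items()}
# Ranking 2: cosine similarity between the query vector and each document vector.
cos_scores = {name: cosine(query_vec, vec) for name, vec in doc_vecs.items()}

print(sorted(sum_scores, key=sum_scores.get, reverse=True))
print(sorted(cos_scores, key=cos_scores.get, reverse=True))
```

With this data the summed TF-IDF score puts the longer d2 first because it repeats both query terms, while the cosine score puts d1 first because its vector points in the same direction as the query.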

What are the evaluation measures that are used in information retrieval systems? Elaborate
the equations.
The two standard measures are precision and recall:
Precision = |relevant ∩ retrieved| / |retrieved|
Recall = |relevant ∩ retrieved| / |relevant|
Precision can be seen as a measure of quality, and recall as a measure of quantity. Higher
precision means that a larger share of the returned results is relevant, and high
recall means that an algorithm returns most of the relevant results (whether or not irrelevant
ones are also returned).
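
Both measures reduce to set operations on document ids. The following is a minimal sketch: the id lists are invented, `precision_recall` is a hypothetical helper, and returning 0.0 for an empty set is a convention assumed here, not something the definitions prescribe.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document ids.

    precision = |relevant & retrieved| / |retrieved|
    recall    = |relevant & retrieved| / |relevant|
    An empty denominator yields 0.0 (assumed convention).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: 4 documents returned, 5 documents actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four returned documents are relevant, and two of the five relevant documents were found.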
