To Information Retrieval & Question Answering: E. Jembere, based on lecture slides from Kathy McKeown
to Information Retrieval & Question Answering
E. Jembere
QA: Example 1
• Who won the Nobel Peace Prize in 1991?
But many foreign investors remain sceptical, and western governments
are withholding aid because of the Slorc's dismal human rights record
and the continued detention of Ms Aung San Suu Kyi, the opposition
leader who won the Nobel Peace Prize in 1991.
The regime, which is also engaged in a battle with insurgents near its
eastern border with Thailand, ignored a 1990 election victory by an
opposition party and is detaining its leader, Ms Aung San Suu Kyi, who
was awarded the 1991 Nobel Peace Prize. According to the British Red
Cross, 5,000 or more refugees, mainly the elderly and women and
children, are crossing into Bangladesh each day.
04/15/2023 5
Information Retrieval
• Basic assumption: the semantics of a
document can be captured by analyzing
(counting) the words that occur in it.
• Under this assumption, "I see what I eat"
means the same thing as "I eat what I see".
The ordering and the constituency of words are
immaterial.
• Representing a document by this unordered
collection of words (I, see, what, eat)
is known as the bag-of-words approach.
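The bag-of-words assumption can be illustrated with a minimal Python sketch. The helper name `bag_of_words` and the lowercase, whitespace-based tokenization are assumptions made for this illustration; the slides do not prescribe a tokenizer.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for text, ignoring word order.

    Tokenization here is a simplifying assumption:
    lowercase the text and split on whitespace.
    """
    return Counter(text.lower().split())


a = bag_of_words("I see what I eat")
b = bag_of_words("I eat what I see")

# Both sentences contain the same words with the same counts,
# so their bags are equal even though the word order differs.
print(a == b)
print(sorted(a.items()))
```

Because a `Counter` stores only words and their counts, the two sentences become indistinguishable, which is exactly the information the bag-of-words approach gives up.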
Some key terminology
• A document refers generically to a unit
of text indexed in the system and
available for retrieval
• A collection refers to a set of documents
being used to satisfy a user query
• A term refers to a lexical entity that occurs
in a document
• A query represents a user’s information
need expressed as a set of terms
Ad Hoc Retrieval
1. Represent documents as a set of weights in a
vector space.
2. Convert the query to a vector using the same
vector space and weighting scheme that were
used to represent the documents.
3. Compute the similarity between the query
vector and all the candidate documents
4. Return the documents ordered according to
how similar they are to the query
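
The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method from the lecture: the slides do not name a weighting scheme or a similarity measure, so the sketch assumes tf-idf weights and cosine similarity, and the function names (`tokenize`, `build_idf`, `to_vector`, `cosine`, `rank`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_idf(docs):
    """Inverse document frequency for every term in the collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    # The +1 keeps terms that occur in every document from getting weight 0.
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def to_vector(text, idf):
    """Steps 1 and 2: map a document or a query to sparse tf-idf weights."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Step 3: cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Step 4: return (score, document index) pairs, most similar first."""
    idf = build_idf(docs)
    doc_vectors = [to_vector(d, idf) for d in docs]
    query_vector = to_vector(query, idf)
    scored = [(cosine(query_vector, dv), i) for i, dv in enumerate(doc_vectors)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = [
    "aung san suu kyi won the nobel peace prize",
    "refugees are crossing into bangladesh each day",
    "foreign investors remain sceptical",
]
for score, index in rank("nobel peace prize", docs):
    print(f"{score:.3f}  {docs[index]}")
```

The query is weighted with the same `idf` table as the documents, which is the point of step 2: both sides must live in the same vector space for the similarity score to be meaningful. Documents that share no terms with the query score 0 and fall to the bottom of the ranking.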