Introduction To IR Chapter 01
Information Hierarchy
[Figure: the information hierarchy, levels from top to bottom: Wisdom, Knowledge, Information, Data]
Data
The raw material of information
Information
Data organized and presented by someone
Knowledge
Information read, heard or seen and understood
Wisdom
Distilled and integrated knowledge and understanding
What kinds of information are there?
Text
books, periodicals, WWW, memos, ads
published/refereed
Film
Photos, other Images
Broadcast TV, Radio
Telephone Conversations
Databases
What is Information Retrieval?
Finding relevant information in large collections of data
In such a collection you may want to find:
“Retrieve that amount of knowledge which a
user needs in a specific situation for solving
his/her current problem” (Kuhlen 1991)
Motivation
IR: Deals with representation, storage,
organization of, and access to information
items
Focus is on the user information need
Unfortunately, characterizing the user's information need is not
simple. For example:
Find all docs containing information on college tennis
teams which: (1) are maintained by a USA university
and (2) participate in the NCAA tournament.
This information need cannot be used directly to
request information from Web search engines.
The user must first translate the information need into a
query.
The translation yields a set of keywords (index
terms) that summarize the user's information
need.
The Goal of IR
Goal = find documents relevant to an information need from
a large document set
[Figure: an information need is expressed as a query to the IR retrieval system, which searches the document collection and returns an answer list]
In fact the primary goal of an IR system is
to retrieve all the documents which are
relevant to a user query while retrieving as
few non-relevant documents as possible.
Main problems in IR
Document and query indexing
How to best represent their contents?
Query evaluation (or retrieval process)
To what extent does a document correspond to a
query?
System evaluation
How good is a system?
Are the retrieved documents relevant? (precision)
Are all the relevant documents retrieved? (recall)
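The two evaluation questions above correspond to the standard precision and recall ratios. A minimal sketch, assuming the retrieved and relevant documents are given as lists of identifiers (the function name and the document ids are hypothetical, chosen only for illustration):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets to avoid division by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 documents actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Here two of the four retrieved documents are relevant, so precision is 2/4, and two of the three relevant documents were found, so recall is 2/3.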
Information Retrieval versus Data Retrieval
Data retrieval
which docs contain a set of keywords?
Well defined structure (semantics)
Information retrieval
information about a subject or topic
semantics is frequently loose
small errors are tolerated
IR system:
interpret contents of information items
generate a ranking which reflects relevance
the notion of relevance is central
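The contrast between a yes-or-no keyword test and a relevance ranking can be sketched as follows. This is only an illustration under simple assumptions: the three documents are invented, and the score is a plain count of query-term occurrences, not a model described in these slides.

```python
from collections import Counter

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": "college tennis team wins ncaa tournament",
    "d2": "university tennis courts open to students",
    "d3": "ncaa basketball tournament schedule",
}


def exact_match(query_terms, docs):
    """Data-retrieval style: keep a document only if it contains every keyword."""
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in query_terms)
    )


def rank(query_terms, docs):
    """IR style: score each document by query-term occurrences, best first."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score, breaking ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


query = ["tennis", "ncaa", "tournament"]
print(exact_match(query, DOCS))
print(rank(query, DOCS))
```

The exact match keeps only `d1`, while the ranking also returns the partially matching documents, ordered by how well they match.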
Data vs. Information Retrieval
Data Retrieval
Precise description
Well-structured data
Precise results
Yes-or-no results
SQL defined
Information Retrieval
Vague information need
Natural language, images, unstructured data, ...
Semantic interpretation
Approximate results
Relevance ranking
Keywords and features
Basic Concepts
The User Task
[Figure: the user task, showing Retrieval and Browsing against a Database]
Retrieval
Whether the target is information or data, we say that
the user searches for useful information or
data by executing a retrieval task.
Purposeful
Browsing
Logical view of docs
Text operations: reduce complexity to index terms
Range of representations, from slow but good to fast but bad:
Added linguistic info (not clear if it helps)
Full text
Keywords, stopwords
Categories
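The text operations mentioned above, which reduce full text to index terms, can be illustrated with a small sketch. It assumes lowercasing, splitting on non-alphanumeric characters, and removal of a tiny hypothetical stopword list; real systems use larger lists and further steps.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "and", "are", "by", "in", "of", "on", "the", "to", "which"}


def index_terms(text):
    """Reduce a text to its distinct non-stopword terms, in order of first use."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if token not in STOPWORDS and token not in terms:
            terms.append(token)
    return terms


terms = index_terms(
    "The college tennis team of the university participates in the NCAA tournament"
)
print(terms)
```

The twelve words of the sentence are reduced to seven index terms: `college`, `tennis`, `team`, `university`, `participates`, `ncaa`, `tournament`.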
Logical view of the documents: documents are represented
by a set of index terms or keywords.
Full text logical view: documents are represented by their
full set of words (feasible on modern computers).
structure
Ad-hoc retrieval
One time queries (e.g. Web search)
Filtering/Routing
Constant search profile (e.g. Spam filtering)
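The difference between the two modes can be sketched as follows: in ad-hoc retrieval the collection is fixed and each query is new, while in filtering the profile is fixed and documents arrive over time. The matching rule (any shared word) and all names and texts are assumptions made for this illustration.

```python
def matches(terms, document):
    """True if the document shares at least one word with the given terms."""
    return bool(set(document.lower().split()) & set(terms))


def ad_hoc(query_terms, collection):
    """One-time query run against a fixed collection."""
    return [doc for doc in collection if matches(query_terms, doc)]


class Filter:
    """Constant search profile applied to each incoming document."""

    def __init__(self, profile_terms):
        self.profile = set(profile_terms)

    def accept(self, document):
        return matches(self.profile, document)


# Ad-hoc: the collection is fixed, the query is issued once.
collection = ["tennis results today", "stock market news"]
ad_hoc_result = ad_hoc({"tennis"}, collection)

# Filtering: the profile is fixed, documents arrive as a stream.
spam_filter = Filter({"lottery", "prize"})
stream = ["you won a lottery prize", "meeting at noon"]
flagged = [doc for doc in stream if spam_filter.accept(doc)]

print(ad_hoc_result)
print(flagged)
```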
The Information Retrieval Cycle
[Figure: the information retrieval cycle. Stages and their outputs: Source Selection (Resource), Query Formulation (Query), Selection (Documents), Examination (Documents), Delivery. Feedback loops: query reformulation, vocabulary learning, relevance feedback; source reselection]
Supporting the Search Process
[Figure: supporting the search process. Stages and their outputs: Source Selection (Resource), Query Formulation (Query), Examination (Documents), Delivery; Acquisition (Collection)]
History of IR
1950: Calvin N. Mooers coins the term "Information Retrieval"
1959: Luhn describes statistical retrieval
1960: Maron and Kuhns define a probabilistic model of IR
1966: Cranfield project defines evaluation measures
1968: Gerard Salton's first book about the SMART retrieval system
1972: Lockheed introduces DIALOG as commercial online service
Late 1980s: First PC systems incorporate retrieval
Early 1990s: Cheap disks lead to the information storage revolution
1992: Westlaw is the first large-scale information service using probabilistic retrieval
Mid 1990s: Multi-media databases
1994: The internet and web explosion
1995: IR techniques are incorporated in all kinds of information management applications