IR01 - Introduction To Information Retrieval
IR01
Introduction to Information
Retrieval
Fadi Yamout
Information Anxiety
Libraries were considered
storehouses of books, periodicals, and
other materials.
Information Anxiety
75,000 magazines or newspapers are published
each year in the United States and Canada alone.
Information Anxiety
The amount of available information doubles
every few years
The number of books in top libraries doubles
every 5 years
Information Retrieval
IR came along in 1952, and from 1961 onwards
gained popularity in the research community
in support of cataloguing and indexing the
information electronically
Information Retrieval
It is a system capable of:
Storing information such as text, images,
audio, video, or other multimedia objects
Assisting the user in locating it
Information Retrieval
Information Retrieval deals with:
Representation
Storage
Organization of information items
Access to information items
Successful IR system
A successful IR system
Input
Often the collection of documents contains several
billion documents
Traditional Floppy Disk
Input
This disk can store 1.44 megabytes
(400 pages)
Traditional Hard Disk
Input
This disk can store 100-400 Gigabytes
(?????? pages)
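One way to estimate the page count is to scale the floppy disk figure (1.44 megabytes for about 400 pages) up to the hard disk capacity. The sketch below assumes decimal units (1 gigabyte = 1000 megabytes) and that the pages-per-megabyte ratio stays the same; the function name `estimated_pages` is hypothetical.

```python
# Assumptions: page density is taken from the floppy disk slide
# (1.44 MB holds about 400 pages) and 1 GB is treated as 1000 MB.
FLOPPY_MB = 1.44
FLOPPY_PAGES = 400


def estimated_pages(capacity_gb: float) -> int:
    """Scale the floppy's pages-per-megabyte ratio up to a disk of capacity_gb."""
    pages_per_mb = FLOPPY_PAGES / FLOPPY_MB
    return round(capacity_gb * 1000 * pages_per_mb)


for gb in (100, 400):
    print(f"{gb} GB is roughly {estimated_pages(gb):,} pages")
```

Under these assumptions, 100 gigabytes comes to roughly 28 million pages and 400 gigabytes to roughly 111 million pages.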
Hard Disk Pack
Input
More Secondary Storages
RAID
Information versus Data Retrieval
An Information Retrieval System does not provide
an exact answer
The output of an Information Retrieval System
in response to a search request consists of sets
of references
Information versus Data Retrieval
Data retrieval
which docs contain a set of keywords?
Well defined semantics
a single erroneous object implies failure!
Information retrieval
Information about a subject or topic
Semantics is frequently loose
Small errors are tolerated
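The contrast can be illustrated with a minimal sketch. The toy documents and the function names `data_retrieval` and `information_retrieval` are assumptions made for illustration: the first returns only documents that contain every keyword, while the second accepts partial matches and orders them by keyword overlap.

```python
# Toy collection (illustrative only, not from the slides).
docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "data retrieval returns exact matches",
    "d3": "libraries store books and periodicals",
}


def data_retrieval(keywords, docs):
    """Exact semantics: keep only documents containing every keyword."""
    wanted = set(keywords)
    return [doc_id for doc_id, text in docs.items()
            if wanted <= set(text.split())]


def information_retrieval(keywords, docs):
    """Loose semantics: keep any document sharing a keyword, best overlap first."""
    wanted = set(keywords)
    scored = [(len(wanted & set(text.split())), doc_id)
              for doc_id, text in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


query = ["retrieval", "documents"]
print(data_retrieval(query, docs))         # only the document with both keywords
print(information_retrieval(query, docs))  # partial matches are kept and ordered
```

For this query, the exact matcher returns a single document, while the loose matcher also returns the document that contains only one of the two keywords.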
Information Retrieval System
Objective
A good IR system should retrieve as
many relevant documents as
possible, but only the relevant
documents
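This objective is commonly quantified with two measures: recall (the share of relevant documents that were retrieved) and precision (the share of retrieved documents that are relevant). The slides do not name these measures, so the sketch below, including the function name `precision_recall` and the example document identifiers, is an illustration rather than the author's own formulation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids the system returned
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 are truly relevant, 2 overlap.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Retrieving "as many relevant documents as possible" pushes recall up, while "only the relevant documents" pushes precision up; the two goals usually pull against each other.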
Ranking
Good IR systems typically rank the matched
documents so that those most likely to be
relevant (those with the highest similarity
to the query) are presented to the user
first
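As a minimal sketch of ranking by similarity, the example below scores each document against the query with cosine similarity over raw term counts. The slides do not specify a similarity measure, so cosine similarity, the whitespace tokenization, and the names `cosine_similarity` and `rank` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: dict) -> list:
    """Return ids of matching documents, most similar to the query first."""
    q = Counter(query.lower().split())
    scored = [(cosine_similarity(q, Counter(text.lower().split())), doc_id)
              for doc_id, text in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


# Toy collection (illustrative only).
collection = {
    "a": "information retrieval",
    "b": "information about storage and retrieval of books",
    "c": "floppy disk",
}
print(rank("information retrieval", collection))
```

Document "a" matches the query exactly and is ranked first; "b" shares both terms but contains others, so it scores lower; "c" shares no terms and is not returned.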
Past, Present, and Future
Early development
Indexing
Model
Querying
Ranking
Feedback