
Information Retrieval

IR01
Introduction to Information Retrieval
Fadi Yamout
Information Anxiety
Libraries were once regarded as
storehouses of books, periodicals, and
other materials.
Information Anxiety
75,000 magazines and newspapers are published
each year in the United States and Canada alone.
Information Anxiety
The amount of available information doubles
every few years
The number of books in top libraries doubles
every 5 years
Information Retrieval
IR emerged in 1952 and, from 1961 onwards,
gained popularity in the research community
as a way to catalogue and index
information electronically
Information Retrieval
An IR system is capable of
Storing information such as text, images,
audio, video or other multimedia objects
Assisting the user in locating it
Information Retrieval
Information Retrieval deals with:
Representation
Storage
Organization of information items
Access to information items
Successful IR system

A successful IR system

Locates information in less time

Locates relevant information


Query
A user accesses the Information Retrieval
System by submitting a query
The query is likely to include natural-language text,
such as wording from the documents or from their titles and abstracts
The query is then processed by a search engine
Query
A query is not the only way to access
information
Following hyperlinks is an alternative

Input
The document collection often contains several
billion documents
Traditional Floppy Disk

Input
This disk can store 1.44 megabytes
(400 pages)

Traditional Hard Disk

Input
This disk can store 100-400 Gigabytes
(?????? pages)
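
The page count for the hard disk can be estimated from the floppy disk figure. The sketch below assumes that the ratio of 400 pages per 1.44 megabytes holds at any scale and that 1 gigabyte equals 1000 megabytes; the function name `pages_for_gigabytes` is hypothetical and only serves the illustration.

```python
# Assumption: 400 pages per 1.44 MB (the floppy disk figure) scales linearly,
# and 1 GB = 1000 MB (decimal units).
PAGES_PER_MB = 400 / 1.44


def pages_for_gigabytes(gigabytes):
    """Estimate how many pages fit in the given capacity."""
    return round(gigabytes * 1000 * PAGES_PER_MB)


low = pages_for_gigabytes(100)
high = pages_for_gigabytes(400)
print(f"{low:,} to {high:,} pages")
```

Under these assumptions the range comes out at roughly 28 million to 111 million pages.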

Hard Disk Pack

Input
Other secondary storage
RAID

Information versus Data Retrieval
An Information Retrieval System does not provide
an exact answer
The output of an Information Retrieval System
in response to a search request consists of sets
of references
Information versus Data Retrieval
Data retrieval
Which documents contain a given set of keywords?
Well-defined semantics
A single erroneous object implies failure!
Information retrieval
Information about a subject or topic
Semantics is frequently loose
Small errors are tolerated
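
The contrast above can be illustrated with a toy example. In this sketch, data retrieval returns only the documents that contain every keyword, while information retrieval tolerates partial matches and orders documents by how many query words they share. The sample documents, the function names, and the word-overlap score are all assumptions made for illustration, not a method given in the slides.

```python
# Hypothetical three-document collection.
docs = {
    "d1": "information retrieval systems rank documents",
    "d2": "database systems return exact records",
    "d3": "retrieval of information about a topic",
}


def data_retrieval(keywords, docs):
    """Exact match: keep only documents containing every keyword."""
    wanted = set(keywords)
    return sorted(
        doc_id for doc_id, text in docs.items() if wanted <= set(text.split())
    )


def information_retrieval(keywords, docs):
    """Loose match: order documents by the number of shared query words."""
    wanted = set(keywords)
    scores = {
        doc_id: len(wanted & set(text.split())) for doc_id, text in docs.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, score in ranked if score > 0]


query = ["information", "retrieval", "systems"]
exact = data_retrieval(query, docs)
loose = information_retrieval(query, docs)
print(exact)
print(loose)
```

Only d1 satisfies the exact match, whereas the loose match also returns d3 and d2 as partially relevant candidates.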
Information Retrieval System
Objective
A good IR system should retrieve as
many relevant documents as
possible, but only the relevant
documents
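
The two halves of this objective are commonly measured as recall (how many of the relevant documents were retrieved) and precision (how many of the retrieved documents are relevant). The slides do not name these measures, so the sketch below is an assumption about how the objective could be quantified; the function name and document ids are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: four documents retrieved, three documents truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.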
Ranking
A good IR system typically ranks the matched
documents so that those most likely to be
relevant (those with the highest similarity
to the query) are presented to the user
first
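
Ranking by similarity can be sketched as follows. The slides do not say which similarity measure is used; this example assumes cosine similarity over raw term counts, which is one common choice. The sample documents and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs for matched documents, best first."""
    query_vec = Counter(query.lower().split())
    scored = [
        (doc_id, cosine_similarity(query_vec, Counter(text.lower().split())))
        for doc_id, text in docs.items()
    ]
    matched = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "library books and periodicals",
    "d2": "ranking documents by similarity to the query",
    "d3": "query processing in a search engine",
}
results = rank("query ranking", docs)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Documents that share no term with the query are dropped, and d2 is listed before d3 because it matches both query terms.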
Past, Present, and Future

Early development

Table of contents of a book

Index at the end of the book


The Retrieval Process
Collecting documents

Indexing

Model

Querying

Ranking

Feedback
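
A minimal sketch of part of this process is shown below, assuming an inverted index (a map from each term to the documents containing it) as the indexing structure. It covers collecting, indexing, querying, and a crude ranking by the number of matched terms; the model and feedback steps are left out. The collection and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Querying and ranking: order documents by number of query terms matched."""
    counts = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    return sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))


# Collecting documents: a hypothetical three-document collection.
collection = {
    "d1": "table of contents of a book",
    "d2": "index at the end of the book",
    "d3": "search engine query",
}
index = build_index(collection)
hits = search(index, "book index")
print(hits)
```

The index is built once and reused for every query, so a lookup only touches the documents that contain at least one query term.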
