
Information Storage And Retrieval

Dr. Ata ur Rahman

Department of Library and Information Science


Sarhad University of Science and Information Technology, Peshawar
Information Storage and
Retrieval
UNIT No Topic
1 Introduction to Information storage and Retrieval.
2 Modeling.
3 Retrieval Evaluation.
4 Query Language.
5 Query Operations.
6 Text and Multimedia Language and Properties.
7 Text Operation.
8 Indexing and Searching.
MID-TERM EXAM

9 Parallel and Distributed IR.

10 User Interfaces and Visualization.

11 Multimedia IR: Models and Languages.

12 Multimedia IR: Indexing and Searching.
13 Searching on the Web.
14 Vocabulary Control.
15 Libraries and Bibliographical System.

FINAL TERM EXAMINATION

Objectives
• The main objective of information retrieval is to supply the right information
to the right user at the right time.
• Various materials and methods are used for retrieving the desired information.
• The term "information retrieval" was first introduced by Calvin Mooers.

Recommended Books:

Alpiar, Ronald. Computer data and storage. London: ULCC SRC, 1976.
Sparck Jones, Karen. Information retrieval experiment. London: Butterworths, 1981.
Muhmmad Raiz. Modern techniques of documents and Information work. Lahore:
Qadiria Books, 1998.
Sharp, John R. Information: notes for students .. London: Andre Deutsch, 1970.
Van Rijsbergen, C. J. Information retrieval. 2nd ed. London: Butterworths, 1979.
Weekly Teaching Plan
Week No Topics Assignments / Quizzes
1 Introduction to Information storage and Retrieval.

2 Modeling.

3 Retrieval Evaluation.

4 Query Language.
5 Query Operations

6 Text and Multimedia Language and Properties


7 Text Operation
8 Indexing and Searching
MID-TERM EXAM

Weekly Teaching Plan
Week No Topics Assignments / Quizzes

9 Parallel and Distributed IR.

10 User Interfaces and Visualization


11 Multimedia IR: Models and Languages.

12 Multimedia IR: Indexing and Searching.

13 Searching on the Web.

14 Vocabulary Control.

15 Libraries and Bibliographical System.

FINAL
CHAPTER NO 11

Multimedia IR: Models and Languages.

WEEK 11
Multimedia Information Retrieval
• Unlike alphanumeric data, multimedia data do not have any semantic
structure
• Achieving symmetry between annotation and query is difficult
• Retrieval is based on similarity between query and stored information
instead of exact match
• Stored information is represented using indexing
IR Model
• Information items are preprocessed to extract features and semantic contents
• Items are indexed based on these features and semantics
• The user's query is processed and its main features are extracted
• The query's features are then compared with the features or index of each
information item in the database
• Information items whose features are similar to those of the query are
retrieved and presented to the user
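The compare-and-retrieve steps of this model can be sketched in a few lines of Python. This is only an illustrative sketch: the slides do not name a similarity measure, so cosine similarity is an assumption here, and the item ids, the three-element feature vectors, and the function names `cosine_similarity` and `rank_items` are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def rank_items(query_features, index, top_k=3):
    """Return the top_k (item_id, score) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query_features, feats))
              for item_id, feats in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical index: item id -> feature vector extracted during preprocessing.
feature_index = {
    "img1": [0.9, 0.1, 0.0],
    "img2": [0.1, 0.8, 0.1],
    "img3": [0.7, 0.2, 0.1],
}

# Features extracted from the user's query (also made up for illustration).
ranking = rank_items([1.0, 0.0, 0.0], feature_index, top_k=2)
for item_id, score in ranking:
    print(item_id, round(score, 3))
```

Note that the result is a ranked list rather than a yes/no answer, which reflects retrieval by similarity instead of exact match.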
Design Issues
• Indexing
• a mechanism that reduces the search space of an operator without losing any
relevant information
• Similarity Computation
• the similarity measure should be easy to compute and should conform to human judgement
Performance Measures
• Retrieval speed, recall, precision
• Recall measures the ability of retrieving relevant information items
from the database
• defined as the ratio between the number of retrieved relevant items and the total number of
relevant items in the database
• Precision measures retrieval accuracy
• defined as the ratio between the number of retrieved relevant items and the number of total
retrieved items
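• Recall and precision are usually considered together, because one can be
traded for the other
• retrieving many items tends to give high recall and low precision
• retrieving few items tends to give high precision and low recall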
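As a small worked example of the two ratios defined above, the following Python sketch computes precision and recall for one query. The document ids and the function name `precision_recall` are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids returned by the system
    relevant:  ids of all relevant items in the database
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved AND relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Four items retrieved, five relevant items exist, two items are in both sets.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d3", "d5", "d6", "d7"])
print(p, r)  # 0.5 0.4
```

Here precision is 2/4 and recall is 2/5: half of what was retrieved is relevant, but only two of the five relevant items were found.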
Text Retrieval
• Text may be used to annotate other media such as audio, images and
video, so that conventional IR techniques can be used to retrieve
multimedia information
• Boolean IR systems or text-pattern search systems
• Substantial effort is spent in analyzing the contents of the documents
and in generating keywords and indices
• Boolean queries are keywords connected with logical operators (AND,
OR, NOT)
File Structures
• Flat files
• Inverted files
• for each term a separate index is constructed that stores the document identifiers for all
documents containing the term
• each term and the document IDs containing the term are organized into one row
• searching and retrieval is fast because only rows containing the query terms need to be
retrieved and there is no need to search the whole database
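A minimal sketch of an inverted file and of Boolean queries over it, assuming whitespace tokenization and three made-up documents. The name `build_inverted_index` is hypothetical; each "row" is represented as a Python set of document ids, so AND, OR and NOT become set intersection, union and difference.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Made-up documents for illustration.
docs = {
    "D1": "multimedia information retrieval",
    "D2": "information storage",
    "D3": "audio retrieval and video retrieval",
}
index = build_inverted_index(docs)
all_ids = set(docs)

# Only the rows for the query terms are read; the documents are not scanned.
both = index["information"] & index["retrieval"]    # information AND retrieval
either = index["information"] | index["retrieval"]  # information OR retrieval
not_info = all_ids - index["information"]           # NOT information

print(sorted(both), sorted(either), sorted(not_info))
```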
Extensions
• Nearness parameters used in query specification help define the topic
more precisely and therefore increase probable relevance of the
retrieved item
• Within Sentence and Adjacency specification in queries
• Term location information is included in the inverted file
• Term i : document id, paragraph no., sentence no., word no.
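• For example, if an inverted file has the following entries:
information: R99, 10, 8, 3; R155, 15, 3, 6; R166, 2, 3, 1
retrieval: R77, 9, 7, 2; R99, 10, 8, 4; R166, 10, 2, 5
then "information" and "retrieval" are adjacent only in R99 (paragraph 10,
sentence 8, words 3 and 4); in R166 both terms occur, but in different paragraphs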
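The positional entries in the example above are enough to answer within-sentence and adjacency queries. The sketch below stores each entry as a (document id, paragraph no., sentence no., word no.) tuple; the function names `within_sentence` and `adjacent` are hypothetical, not part of any particular system.

```python
# Postings copied from the example entries:
# term -> list of (document id, paragraph no., sentence no., word no.)
postings = {
    "information": [("R99", 10, 8, 3), ("R155", 15, 3, 6), ("R166", 2, 3, 1)],
    "retrieval": [("R77", 9, 7, 2), ("R99", 10, 8, 4), ("R166", 10, 2, 5)],
}

def within_sentence(term_a, term_b):
    """Documents in which both terms occur in the same sentence."""
    b_sentences = {(doc, para, sent) for doc, para, sent, _ in postings[term_b]}
    return sorted({doc for doc, para, sent, _ in postings[term_a]
                   if (doc, para, sent) in b_sentences})

def adjacent(first, second):
    """Documents in which `second` immediately follows `first`."""
    second_positions = set(postings[second])
    return sorted({doc for doc, para, sent, word in postings[first]
                   if (doc, para, sent, word + 1) in second_positions})

print(within_sentence("information", "retrieval"))  # ['R99']
print(adjacent("information", "retrieval"))         # ['R99']
```

R166 contains both terms but is not returned, because the two occurrences lie in different paragraphs.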
Indexing
• Stop words -- grammatical function words, such as “of,” “the,” and
“a,” which are excluded from the index
• Stemming -- reducing words to a common root form
• Thesaurus -- list of synonyms
• Weighting -- term significance derived from occurrence frequency
within a document and among different documents
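The sketch below chains three of these steps: stop-word removal, stemming, and weighting. Several things in it are assumptions rather than content from the slides: the stop-word list is a tiny illustrative one, `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and the weight is the common tf-idf form (term frequency times the log of inverse document frequency), which is one of several possible weighting schemes.

```python
import math
from collections import Counter

STOP_WORDS = {"of", "the", "a", "and", "in", "is"}  # tiny illustrative list

def naive_stem(word):
    """Crude suffix stripping; a real system would use a proper stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Lowercase, drop stop words, and stem the remaining words."""
    return [naive_stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(documents):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    term_lists = {doc_id: index_terms(text) for doc_id, text in documents.items()}
    n_docs = len(term_lists)
    doc_freq = Counter()
    for terms in term_lists.values():
        doc_freq.update(set(terms))  # count each term once per document
    weights = {}
    for doc_id, terms in term_lists.items():
        counts = Counter(terms)
        weights[doc_id] = {t: tf * math.log(n_docs / doc_freq[t])
                           for t, tf in counts.items()}
    return weights

# Made-up documents for illustration.
docs = {
    "D1": "the indexing of images",
    "D2": "indexing the audio",
    "D3": "searching a video index",
}
weights = tf_idf(docs)
print(weights["D1"])
```

After stemming, "index" occurs in every document, so its weight is zero; a term such as "imag" that occurs in only one document gets a positive weight and is a better discriminator.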
Relevance Feedback
• Query modification
• terms occurring in documents previously identified as relevant are added to the original
query or the weight of such terms is increased
• terms occurring in documents previously identified as irrelevant are deleted from the query
or the weight of such terms is reduced
• Document modification
• terms in the query, but not in the user-judged relevant documents, are added to the
document index list with an initial weight
• weights of index terms in the query and also in relevant documents are increased by a
certain amount
• weights of index terms not in the query but in the relevant documents are decreased by a
certain amount
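Query modification can be sketched as a simple additive update of term weights. The slides do not give a formula, so the scheme below is an assumption: the `boost` and `penalty` constants, the function name `modify_query`, and the example terms are all hypothetical, and documents are reduced to plain sets of terms.

```python
def modify_query(query, relevant_docs, irrelevant_docs, boost=0.5, penalty=0.25):
    """Return a new query (term -> weight) adjusted by user feedback.

    query:           dict mapping term -> weight
    relevant_docs:   list of term sets from documents judged relevant
    irrelevant_docs: list of term sets from documents judged irrelevant
    """
    new_query = dict(query)
    # Terms from relevant documents are added or have their weight increased.
    for doc_terms in relevant_docs:
        for term in doc_terms:
            new_query[term] = new_query.get(term, 0.0) + boost
    # Query terms found in irrelevant documents have their weight reduced.
    for doc_terms in irrelevant_docs:
        for term in doc_terms:
            if term in new_query:
                new_query[term] -= penalty
    # Terms whose weight drops to zero or below are deleted from the query.
    return {t: w for t, w in new_query.items() if w > 0}

query = {"jaguar": 1.0, "speed": 0.25}
relevant = [{"jaguar", "cat", "habitat"}]
irrelevant = [{"jaguar", "car", "speed"}]

updated = modify_query(query, relevant, irrelevant)
print(sorted(updated.items()))
```

In this run "cat" and "habitat" enter the query, "speed" is removed, and "jaguar" keeps a high weight because it appears in both the relevant and the irrelevant document.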
Audio Search and Retrieval
• Keyword annotations can be highly subjective, since annotators may have a
different perspective or even use a different taxonomy
• Audio is hard to browse directly since it must be heard in real time (unlike
video, which can be keyframed)
• Two categories: speech and non-speech
• with speech, indexing and retrieval are based on obtaining the spoken words, either manually or by
a speech recognition technique
• with non-speech, indexing and retrieval may be based on text annotation (but will that help with a
query like “find the first occurrence of the note G-sharp”?)
Any Questions?