
INFORMATION RETRIEVAL SYSTEMS
Subject Code: A70533
Regulations: R15 – JNTUH
Class: IV Year B.Tech CSE I Semester

Department of Computer Science and Engineering

BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY

Ibrahimpatnam - 501 510, Hyderabad


INFORMATION RETRIEVAL SYSTEMS (A70533)
(Elective – 2)
COURSE PLANNER
I. COURSE OVERVIEW:
The main objective of this course is to present the scientific foundations of
information search and retrieval. The course explores the fundamental
relationship between information retrieval, hypermedia architectures, and
semantic models, and deploys and tests several important retrieval models
such as vector space, Boolean, and query expansion. It discusses
implementation and evaluation issues of algorithms such as clustering, pattern
searching, and stemming with advanced data/file structures, which indirectly
provides a platform for implementing a comprehensive catalogue of information
search tools when designing an e-commerce web site.
II. PRE-REQUISITES:
1. Students must have a basic understanding of Database Management Systems.
2. They must be familiar with the different types of algorithms used for searching data.
3. They must have basic knowledge of natural language concepts such as thesauri and
synonyms in order to understand the retrieval of textual information, because
text is the main data type used in Information Retrieval Systems.
III. COURSE OBJECTIVES:
1. Demonstrate genesis and diversity of information retrieval situations for text and hypermedia.
2. Describe hands-on experience in storing and retrieving information from the WWW using semantic approaches.
3. Demonstrate the usage of different data/file structures in building computational search engines.
4. Analyze the performance of information retrieval using advanced techniques such as classification, clustering, and filtering over multimedia.
5. Analyze ranked retrieval of a very large number of documents with hyperlinks between them.
6. Demonstrate information visualization technologies like cognition and perception in the Internet or Web search engine.
IV. COURSE OUTCOMES:
S.No Description Bloom's taxonomy level
1 Describe the objectives of information retrieval systems. Understanding
2 Describe models like vector-space, probabilistic and language models to identify the similarity of query and document. Understanding
3 Implement clustering algorithms like hierarchical agglomerative clustering and k-means algorithm. Create
4 Understand relevance feedback in vector space model and probabilistic model. Understanding
5 Illustrate how N-grams are used for detection and correction of spelling errors. Understanding, Knowledge
6 Understand the method of regression analysis to estimate the probability of relevance. Understanding
7 Understand the method to construct thesauri automatically and manually. Understanding
8 Understand natural language systems to build semantic networks for text. Understanding, Knowledge
9 Illustrate algorithms used for natural language processing. Understanding
10 Understand the measures to evaluate the performance of cross language information retrieval. Understanding
11 Understand query, document and phrase translation. Understanding, Knowledge
12 Design the method to build inverted index. Create


V. HOW PROGRAM OUTCOMES ARE ASSESSED:
Program Outcomes, each followed by its Level and Proficiency assessed by:
PO1 Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
Level: 3. Proficiency assessed by: Assignments, Tutorials.
PO2 Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
Level: 3. Proficiency assessed by: Assignments.
PO3 Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
Level: 2. Proficiency assessed by: Mini Projects.
PO4 Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
Level: 2. Proficiency assessed by: Projects.
PO5 Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
Level: 2. Proficiency assessed by: Mini Projects.
PO6 The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
Level: 2. Proficiency assessed by: Assignments.
PO7 Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
Level: --. Proficiency assessed by: --.
PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
Level: --. Proficiency assessed by: --.
PO9 Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
Level: --. Proficiency assessed by: --.
PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
Level: 2. Proficiency assessed by: Assignments.
PO11 Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in
Level: --. Proficiency assessed by: --.
VI. HOW PROGRAM SPECIFIC OUTCOMES ARE ASSESSED:

Program Specific Outcomes, each followed by its Level and Proficiency assessed by:
PSO1 Professional Skills: The ability to research, understand and implement computer programs in the areas related to algorithms, system software, multimedia, web design, big data analytics, and networking for efficient analysis and design of computer-based systems of varying complexity.
Level: 3. Proficiency assessed by: Lectures, Assignments.
PSO2 Problem-Solving Skills: The ability to apply standard practices and strategies in software project development using open-ended programming environments to deliver a quality product for business success.
Level: 3. Proficiency assessed by: Mini Projects.
PSO3 Successful Career and Entrepreneurship: The ability to employ modern computer languages, environments, and platforms in creating innovative career paths, to be an entrepreneur, and a zest for higher studies.
Level: 2. Proficiency assessed by: Guest Lectures.
N – None S - Supportive H - Highly Related
VII. SYLLABUS:
UNIT – I:
Introduction: Retrieval strategies: vector space model, Probabilistic retrieval strategies:
Simple term weights, Non binary independence model, Language models.
UNIT – II:
Retrieval Utilities: Relevance feedback, clustering, N-grams, Regression analysis, Thesauri.
UNIT – III:
Retrieval Utilities: Semantic networks, parsing.
Cross-Language Information Retrieval: Introduction, Crossing the language barrier.
UNIT – IV:
Efficiency: Inverted Index, Query processing, Signature files, Duplicate document detection.
UNIT – V:
Integrating structured data and text: A historical progression, Information retrieval as a
relational application, Semi-structured search using a relational schema.
Distributed Information Retrieval: A theoretical Model of Distributed retrieval, web search
SUGGESTED BOOKS
Text books:
1. David A. Grossman, Ophir Frieder, Information Retrieval: Algorithms and Heuristics,
Springer, 2nd Edition (Distributed by Universal Press), 2004.
Reference books:
1. Gerald J. Kowalski, Mark T. Maybury, Information Storage and Retrieval Systems: Theory
and Implementation, Springer, 2004.
2. Soumen Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data,
Morgan Kaufmann Publishers, 2002.
3. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, An Introduction to
Information Retrieval, Cambridge University Press, England, 2009.
VIII. COURSE PLAN:
Lectures are listed by number under each week. The Course Learning Outcomes and References are given per unit.
UNIT-I
Course Learning Outcomes: Understanding information retrieval strategies
References: T1, R3
Week 1
1. Introduction
2. Retrieval strategies: Introduction
3. Vector space model with examples
4. Vector space model
Week 2
5. Probabilistic retrieval strategies: Introduction
6. Simple term weights
7. Non binary independence model
8. Non binary independence model, Language models
Mock Test #1
UNIT-II
Course Learning Outcomes: Knowledge gathering about Retrieval Utilities and relevance feedback
References: T1, R3
Week 3
9. Retrieval Utilities overview
10. Introduction
11. Retrieval Utilities overview
12. Relevance feedback
Tutorial / Bridge Class #1
Week 4
13. Relevance feedback
14. Relevance feedback
15. Clustering
16. Clustering cont’d
Tutorial / Bridge Class #2
Week 5
17. N-grams
18. Regression analysis
19. Regression analysis
20. Thesauri
Tutorial / Bridge Class #3
UNIT-III
Course Learning Outcomes: Applying and examining case studies
References: T1, R3
Week 6
21. Retrieval utilities
22. Retrieval utilities cont’d
23. Case study #1
24. Case study #2
Tutorial / Bridge Class #4
Week 7
25. Semantic networks
26. Semantic networks cont’d
27. Case study #1
29. Case study #2
Tutorial / Bridge Class #5
Week 8
30. Parsing
31. Parsing cont’d
32. Case study #1
33. Case study #2
Tutorial / Bridge Class #6
MID-TERM #1 EXAMINATIONS (WEEK-9)
X. MAPPING COURSE OBJECTIVES LEADING TO THE ACHIEVEMENT OF PROGRAM OUTCOMES AND PROGRAM SPECIFIC OUTCOMES:
Rows are course outcomes (CO); columns are Program Outcomes (PO1 to PO12) and Program Specific Outcomes (PSO1 to PSO3).
CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3
1 3 -- -- 3 -- -- -- -- -- 2 -- -- 2 -- --
2 -- 3 -- -- -- -- -- -- -- -- -- -- -- 3 --
3 -- 3 3 -- -- -- -- -- -- -- -- -- -- 2 --
4 2 -- -- -- -- -- -- -- -- 3 -- -- 3 -- 2
5 -- -- -- 3 -- -- -- -- -- -- -- -- -- 3 --
6 3 -- -- -- -- 2 -- -- -- -- -- 3 -- 2 --
7 -- -- 3 -- -- -- -- -- -- -- -- -- -- 3 --
8 -- -- -- 3 -- -- -- -- -- 3 -- -- -- 2 --
9 3 -- -- -- -- -- -- -- -- 2 -- 3 2 -- --
10 -- 3 2 -- -- -- -- -- -- -- -- -- -- 3 --
11 3 -- -- -- -- -- -- -- -- 3 -- -- 3 -- --
12 -- -- 3 -- -- -- -- -- -- -- -- -- -- 3 --
AVG 1.2 0.8 0.9 0.8 0 0.2 0 0 0 1.08 0 0.5 0.833 1.75 0.167
X QUESTION BANK
S No Question Blooms Taxonomy Level Course Outcome
UNIT – I
Part - A (Short Answer Questions)
1 Define an information retrieval system. Knowledge 1
2 Differentiate a DBMS from an information retrieval system. Understand 1
3 Differentiate browsing vs. searching. Knowledge 1
4 Can an information retrieval system be related to a database management system? Explain your answer with a relevant example. Knowledge 1
5 Briefly define the terms: 1. Precision 2. Recall. Knowledge 1
6 List 5 challenges of searching for information on the web. Knowledge 1
7 List 3 differences between data retrieval and information retrieval. Knowledge 1
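As a study aid for the precision and recall terms named in question 5 above, the following is a minimal sketch using the standard set-based definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The function name `precision_recall` and the document IDs are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query, using set overlap."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    # Guard against empty sets so the sketch never divides by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(f"precision={p:.3f} recall={r:.3f}")
```

In this made-up case two of the four retrieved documents are relevant, and two of the three relevant documents were found.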
Part - B (Long Answer Questions)
1 Explain the differences between Information Retrieval Systems and DBMS. Apply 1
2 Explain the similarity coefficient and determine the ranking of the following documents: Knowledge 2
Q: gold silver truck
D1: shipment of gold damaged in a fire
D2: delivery of silver arrived in a silver truck
D3: shipment of gold arrived in a truck
3 Explain the concept of simple term weights for the above query and documents. Understand 2
4 Explain inverse document frequency. Evaluate 1
5 Explain the objectives of IRS. Understand 1
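For the query and documents in question 2 above, one way to check a hand-worked ranking is the sketch below. It assumes one common vector space formulation: raw term frequency, idf = log10(N / df), and an unnormalized inner product as the similarity coefficient, with plain whitespace tokenization. These choices are assumptions; the prescribed textbook or lecture may weight or normalize terms differently, in which case the scores will differ.

```python
import math
from collections import Counter

docs = {
    "D1": "shipment of gold damaged in a fire",
    "D2": "delivery of silver arrived in a silver truck",
    "D3": "shipment of gold arrived in a truck",
}
query = "gold silver truck"


def tokenize(text):
    # Assumption: lowercase + whitespace split, no stemming or stop-word removal.
    return text.lower().split()


n_docs = len(docs)
doc_tf = {name: Counter(tokenize(text)) for name, text in docs.items()}

# Document frequency: number of documents that contain each term.
df = Counter(term for tf in doc_tf.values() for term in tf)

# Assumed idf formula: log10(N / df). Terms in every document get weight 0.
idf = {term: math.log10(n_docs / count) for term, count in df.items()}


def weights(tf):
    # tf-idf weight per term; terms unseen in the collection get weight 0.
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


query_w = weights(Counter(tokenize(query)))

# Similarity coefficient: inner product of query and document weight vectors.
scores = {}
for name, tf in doc_tf.items():
    doc_w = weights(tf)
    scores[name] = sum(w * doc_w.get(term, 0.0) for term, w in query_w.items())

ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 3))
```

Under these assumptions the ranking comes out as D2, D3, D1. D2 scores highest because "silver" is rare in the collection and occurs twice in that document.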
UNIT – II
Part - A (Short Answer Questions)
1 Explain the purpose of retrieval utilities. Knowledge 3
2 Explain the concept of clustering as a retrieval utility. Understand 3
3 Explain how relevance feedback is used to improve the results of a retrieval strategy. Knowledge 4
4 Explain the N-gram data structure. Knowledge 5
5 Describe regression analysis. Knowledge 6
Part - B (Long Answer Questions)
1 Explain relevance feedback in the vector space model. Understand 3
2 Explain relevance feedback in the probabilistic model. Understand 3
3 Discuss the use of a manually generated thesaurus. Knowledge 5
4 Explain the concept of thesauri by constructing a term-term similarity matrix. Knowledge 3
5 Explain the approach of regression analysis to estimate the probability of relevance. Knowledge 3
UNIT – III
Part - A (Short Answer Questions)
1 Discuss R-distance for calculating the distance between query and document. Understand 8
2 Describe how ranking is based on constrained spreading activation. Knowledge 8
3 Explain how NLP is used to reduce ambiguity in language. Knowledge 9
4 Define cross language information retrieval. Apply 10
5 Define query translation. Understand 11
Part - B (Long Answer Questions)
1 Explain the concept of semantic networks for automatic relevance ranking. Create 6
2 Explain why parsing is an essential feature of an information retrieval system. Understand 8
3 Explain three different types of translations. Apply 9
4 Discuss the unbalanced and structured query approaches for choosing translations. Understand 10
5 Explain syntactic parsing. Understand 8
UNIT - IV
Part - A (Short Answer Questions)
1 Explain index pruning. Knowledge 12
2 Explain a posting list. Understand 12
3 Define document file. Understand 12
4 Describe an index. Understand 13
5 Explain I-Match. Understand 13
Part - B (Long Answer Questions)
1 Explain methods to reorder documents prior to indexing. Understand 13
2 Discuss methods to compress an inverted index. Knowledge 13
3 Define efficiency. Explain the inverted index. Knowledge 13
4 Explain throughput-optimized compression. Create 12
5 Explain various top-down and bottom-up algorithms. Create 12
9 Describe the method for finding similar duplicates. Understand 12
10 Explain how signature files are used to detect duplicates. Understand 12
UNIT - V
Part - A (Short Answer Questions)
1 Define data integrity. Knowledge 14
2 Define performance. Understand 14
3 Define portability. Understand 14
4 Explain the extensions to SQL. Understand 14
5 List different types of user-defined operators. Understand 14
Part - B (Long Answer Questions)
1 Explain the historical progression. Create 14
2 Discuss briefly user-defined operators. Understand 14
3 Explain non-first normal form approaches. Understand 14
4 Discuss information retrieval as a relational application. Understand 14
5 Explain Boolean queries. Apply 14
XII OBJECTIVE QUESTIONS
UNIT-I
1.Which function is primarily used to compensate for errors in spelling of words? [ ]
A) Fuzzy B) Indexing C) Ranking D) Zoning
2. The _________ system that acts as a user frontend to the RetrievalWare search system
allows the user to browse an item in the order of the paragraphs [ ]
A) OCR B) INQUERY C) DCARS D) NISO
3. The transformation from the received item to the searchable data structure is called [ ]
A) Ranking B) Indexing C) Term Masking D) None
4. The process of creating term linkages at index creation time is called_______ [ ]
A) Post-coordination B) Indexing C) Pre-coordination D) None
5. Concept indexing determines a ________set of concepts based upon a test set of terms and
uses them as a basis for indexing all items [ ]
A) Canonical B) Searching C) Associated D) Relationship
UNIT-II
1. Precision is directly affected by retrieval of non-relevant items and drops to a number
close to ____
2. The rank-frequency law of Zipf is___________
3. The format for proximity is: TERM1 within “m” “units” of TERM2 m___________
4. The _________process is a pattern recognition process that segments the scanned in
image into sub-regions
5. Under Boolean systems, the status display is a count of the number of items found by
the query ____________
UNIT-III
1. An___________________ is a system that is capable of storage, retrieval, and
maintenance of information.
2. The success of IRS can be measured by __________
3. The measures associated with IRS are __________ and ______________(precision
and recall)
4. AFB stands for _______________________(Automatic File Build)
5. The masking is done for single character in __________________
UNIT-IV
1. Words that share the same written form but have different meanings are known as
______________
2. Process of creating term linkages at index creation time is called ________________
3. The weighted systems are mostly known as ________________________
4. SMIL stands for ______________(Synchronized Multimedia Integrated Language)
5. The process of converting the received item into a searchable data structure is known as
____________________
UNIT-V
1. The structure that deals with layout of document context is __________________
2. QBIC is abbreviated as __________________
3. OPAC is abbreviated as _____________________
4. Set of digital objects is treated as __________________
5. The system used by DIALOG is ________________________
XII. RELEVANT SYLLABUS FOR GATE: Not applicable
RELEVANT SYLLABUS FOR IES: Not applicable
XIII WEBSITES
1. Information Storage and Retrieval Systems: Theory and Implementation By Kowalski
(UNIT I to UNIT VI)
2. Modern Information Retrieval by Ricardo Baeza-Yates (UNIT VII and UNIT VIII)
XIV EXPERT DETAILS
1. Dr. S. Viswanadha Raju, Professor of CSE, JNTUHCE, JNT University Hyderabad
2. Dr. A. Govardhan, Professor, Computer Science & Engineering, School of
Information Technology, Jawaharlal Nehru Technological University Hyderabad
(JNTUH), India
3. Dr. B. Padmaja Rani, Professor & Head, Computer Science & Engineering, JNTUH
College of Engineering Hyderabad (Autonomous)
XV JOURNALS
1. Information Storage and Retrieval Systems: Theory and Implementation By Kowalski
(UNIT I to UNIT VI)
2. Modern Information Retrieval by Ricardo Baeza-Yates (UNIT VII and UNIT VIII)
3. International Journal of Multimedia Information Retrieval (IJMIR)
4. International Journal of Information Retrieval Research (IJIRR)
XVI. LIST OF TOPICS FOR STUDENT SEMINARS
1. Hypertext data structures and linkages
2. Stemming algorithms
3. Manual clustering
4. Information visualization
5. Measures of Information evaluation
6. Multimedia retrieval systems
XVII. CASE STUDIES/SMALL PROJECTS
1. Presentation on image query processing, i.e. about QBIC
2. Presentation on one of the case studies of Information Retrieval System
