
lOMoARcPSD|21906275

Information Retrieval

Information Technology (University of Delhi)

Studocu is not sponsored or endorsed by any college or university


Downloaded by Karthik (shanmugamkarthik848@gmail.com)

INFORMATION RETRIEVAL

IR: retrieving documents that are relevant to the user's needs.

An information retrieval system (IRS) should be user-friendly, memory-efficient, and fast at retrieval.

Objectives of IR:
1. Representation
2. Storage and organization (if the IRS is memory-efficient, we can retrieve more information at a faster rate)
3. Access

IRS are designed keeping the 3 Vs in mind:

Volume (the amount of data)
Velocity (the rate at which data is increasing in the network)
Variety (multiple, heterogeneous types of data)
Veracity (how trustworthy the data is) is often listed as a fourth V.


Difference between information retrieval and data retrieval


Retrieval process:
1. The user generates a query.
2. The query is passed to the next stage, where the system query is generated.
3. Crawling
4. Indexing the documents
5. Ranking
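
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: crawling is replaced by a small in-memory dictionary of made-up documents, the tokenizer just lowercases and splits on whitespace, and the ranking score is simply the number of distinct query terms a document contains. The names `tokenize`, `build_index` and `search` are hypothetical helpers, not part of the notes.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_index(docs):
    """Indexing step: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Ranking step: score documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Stand-in for the crawling step: a tiny, made-up document collection.
docs = {
    "d1": "information retrieval finds relevant documents",
    "d2": "data retrieval returns exact matches",
    "d3": "ranking orders relevant documents",
}
index = build_index(docs)
results = search(index, "relevant documents retrieval")
print(results)
```

A real system would use a better tokenizer and a weighted score, but the order of the stages (query, index lookup, ranking) stays the same.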


MODELLING


Why ranking? To measure the relevance between the user query and the document.
Problem of uncertainty (point 3): a single word can have multiple meanings. E.g., if the user enters "jaguar", we don't know whether they are looking for the car or the animal.
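
The jaguar example can be made concrete with a small ranking sketch. It assumes cosine similarity over raw term counts as the relevance score, which is one common choice and not necessarily the one used in the lecture; the two documents are invented for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    q = Counter(query.lower().split())
    scored = [(name, cosine(q, Counter(text.lower().split())))
              for name, text in docs.items()]
    return sorted(scored, key=lambda pair: -pair[1])

# Two made-up documents that share only the ambiguous word.
docs = {
    "car": "jaguar car engine speed",
    "animal": "jaguar animal jungle cat",
}
ambiguous = rank("jaguar", docs)     # both documents get the same score
specific = rank("jaguar car", docs)  # the extra term separates them
print(ambiguous)
print(specific)
```

With the single word "jaguar" the two documents tie, so the system cannot tell which sense the user means; adding one more query term lets the ranking prefer the car document.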


APPLICATIONS


Conditional probability classifier


Example (case study)


Semantic information is lost: a phrase can have little or no meaning left after stop words are eliminated.
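
A short sketch of this effect, assuming a tiny hand-picked stop word list (`STOP_WORDS` below is illustrative only, not a standard list):

```python
# Illustrative stop word list; real systems use much longer lists.
STOP_WORDS = {"to", "be", "or", "not", "the", "is", "a", "of", "in"}

def remove_stop_words(text):
    """Drop every token that appears in the stop word list."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

phrase = remove_stop_words("to be or not to be")
sentence = remove_stop_words("the movie is not good")
print(phrase)    # nothing is left of the phrase
print(sentence)  # the negation has disappeared
```

The first phrase consists only of stop words, so nothing survives. In the second, dropping "not" leaves the tokens "movie" and "good", which reads as the opposite of the original sentence.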


Term document matrix:
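
As a rough illustration, a term-document matrix has one row per term and one column per document, and each cell holds how often the term occurs in that document. The sketch below assumes whitespace tokenization and raw counts; the three documents and the helper name `term_document_matrix` are made up for the example.

```python
def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    tokenized = [text.lower().split() for text in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    matrix = [[tokens.count(term) for tokens in tokenized] for term in vocab]
    return vocab, matrix

docs = ["new home sales", "home sales rise", "new home new prices"]
vocab, matrix = term_document_matrix(docs)
for term, row in zip(vocab, matrix):
    print(f"{term:<8}{row}")
```

Each row can then be read as the distribution of a term over the collection, and each column as the vector representation of a document.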


Clustering and text categorization


Probabilistic model


The fuzzy set model captures vagueness through a membership function, which assigns each element a degree of membership.


NEURAL NETWORK MODEL


NEW CHAPTER


precision = relevant retrieved / retrieved
It tells us what fraction of the cases the model predicts as positive are actually positive.

recall = relevant retrieved / relevant
It tells us what fraction of all actual positive cases the model correctly identifies.
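
The two ratios can be computed directly from sets of document ids. The sketch below is a minimal example with invented ids; `precision_recall` is a hypothetical helper, and returning 0.0 for an empty set is an assumption made to avoid division by zero.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from retrieved and relevant document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents retrieved, 5 documents relevant, 2 in common (d1 and d3).
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5", "d6", "d7"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here 2 of the 4 retrieved documents are relevant, so precision is 2/4 = 0.5, and 2 of the 5 relevant documents were found, so recall is 2/5 = 0.4.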


Reference collection: a standard database used to evaluate and compare IR systems.


Summary table statistics: accuracy



INEX: exclusively for XML retrieval



20 Newsgroups: around 20,000 documents
