Theory Assignment Information Retrieval (CSE 4053) : Institute of Technical Education and Research

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

INSTITUTE OF TECHNICAL EDUCATION

AND RESEARCH
(SOA Deemed to be University)

Theory Assignment
Information Retrieval (CSE 4053)

Submitted By
Name: SANDEEP BEHERA
Registration No.: 1841012480
Branch: CSE
Semester: 7th Section: D
s22 psignmei2
IRlhtovy
ondiep behov
L941012ugO
CsE D
Aw

Vide
|tomo-
4
qU0Ay

looo3
6002
S0000 25
d
2310.
dotwm

o.S
0.S
d
no-wedsize

Soe .S63.
I.S6 12

(Aev) D h
Aay (4)
/3 boo (/)/o
D
tAo))
booK
ove I(1yDyD
y My (/D
Vdn) Aove

(/)01 (s

0)ho
0.0&0
(
) Jo

Ler
Ser mea
() Voantemtam Vaua
w

+0
Mem
s
Maue 5.gs3.5033|146.2s2292 t.25

ferma 5.US|472se|32. s3
claus
clauses
S0 hou equipobobe
0
So P(at)P(4ematd)-
he wa no oun9 hs
idktiedlaonou
AnUm'tion So oe been a a bad Sec
Oy
dedomine Pc) baned On uqvm
H tmanina S
Tmall) P mo)
(mo P(uiqus/
fosHeion(
PC fm) PC4ostsize
/nmaa)

fvidnc

Poptto-ennaeP(tomati) P(at
P(htemat) /tomu

(4fone
evidw
ce

evidtPnau)P(ha/mt)f(tma P(4/ma)

Th eviniemOy' be tqmoudSinceeit a
COa tod
i
butiom au akwa +v
UAtve (Noumol disini
:
e(male)
O
Psma)
S S
ep
2
-
ku anamelts
oC-o
3. 2

dtbolon
ew
tl
mae)
made
S.9e8e
3
1.312e
-Oc
(s
Pestowo mta4o (moe).
Hhun Prodotf 6.1444e-0

PLkmae) 0S
klma)-2.23e
Plisl
femate) 26ciE-I G1
-1
foteionnum nal(4cmoeHph
doc
5,37a ke-0q
7
ie e -7
Sine ptorvo ruata
(on P rudcf
00 0

-1
Soe.,
MultimowalModula)

Taien/e) /s2
0+/c7.
(1oppool).
-199P.
*/s*7
P(Tiasale)- %6
Psoppao/T)2t/c+)4
P-ObloiC kd lam am qn hitet
do CUnus

PLcla)Pl)
PlTao1
|e-r(aaorl)) |
sppan
1-P(Toina|)1-eC )
(1-Plshagho

1-pm1)(1-lsc

RO0OG C

C1-Mo i)
0-{ Shag (-tje
/2))
-t(
4
e -) )
lapam/e))U-P(0saka1)
(1-
(1-V) (rV%)

Aowd S notin aus

Sondup

(sE D
THEORY ASSIGNMENTS, FEB-2022
Information Retrieval (CSE 4053)

Programme: B. Tech (CSE) Semester:7th


Full Marks: 10 Submission Dt.-10.02.2022

Subject/Course Learning Outcome *Taxonomy Ques. Marks


Level Nos.
Outline the concepts and apply the basics of
indexing and querying of an information
retrieval system

Understand the data corpus used in


information retrieval systems.

Illustrate various components and experiment


with different compression techniques to
compress the index of dictionary and its L1,L2,L3 1,2 4
postings lists.

Apply retrieval models to construct information


L1,L2,L3 3 2
retrieval systems

Apply text clustering and classification L2,L3,L4 4,5 4


techniques for information retrieval.

*Bloom’s taxonomy levels: Knowledge (L1), Comprehension (L2), Application (L3), Analysis (L4), Evaluation
(L5), Creation (L6)

Answer all questions. Each question carries equal mark.

 Assignment scores/markings depend on neatness, clarity and date of submission.


 Write your answers with enough detail about your approach and concepts used, so
that the grader will be able to understand it easily.
 You are allowed to use only those concepts which are covered in the lecture class till
date.

1. Compute the vector space similarity between the query digital cameras and the document
digital cameras and video cameras by filling out the empty columns in the following
table. Assume N = 10,000,000, logarithmic term weighting (wf columns) for query and
document, idf weighting for the query only and cosine normalization for the document
only. Treat and as a stop word. Enter term counts in the tf columns. What is the final
similarity score?

2. Consider a collection made of the 4 following documents (one document per line) :
1. John gives a book to Mary
2. John who reads a book loves Mary
3. who does John think Mary love ?
4. John thinks a book is a good gift
For the 3 terms belonging to the dictionary, namely book, love and Mary. Compute the
tf−idf based vector representation for the 4 documents in the collection
3. Rank the documents in collection {d1 , d2 } for query q using the language model approach
to IR introduced in class with Jelinek-Mercer smoothing. Use the mixture coefficient λ =
0.4.
d1 : Scottish sheep getting smaller due to climate change study says
d2 : The analysis has shown a dramatic shift in the natural ranges for US
Bird species in response to climate change
Query q: climate change

4. Classify whether a given person is a male or a female based on the measured features.
The features include height, weight, and foot size. Data set is shown below
sex height (feet) weight (lbs) foot size
male 6 180 12
(inches)
male 5.92 (5'11") 190 11
male 5.58 (5'7") 170 12
male 5.92 (5'11") 165 10
female 5 100 6
female 5.5 (5'6") 150 8
female 5.42 (5'5") 130 7
female 5.75 (5'9") 150 9

Below is a sample to be classified using Neive Bayes algorithm as a male or female.

sex height weight foot size


sample 6(feet) 130
(lbs) 8(inches)

5. Based on the data below, estimate a multinomial Naive Bayes classifier and apply the
classifier to the test document. Calculate the probability that the classifier assigns the test
document to c = China or.

docID words in document In c= China?


training set 1 Taipei Taiwan yes
2 Macao Taiwan Shanghai yes
3 Japan Sapporo no
4 Sapporo Osaka Taiwan no
5 London no
test set 6 Taiwan Taiwan Sapporo ?

You might also like