Professional Documents
Culture Documents
Theory Assignment Information Retrieval (CSE 4053) : Institute of Technical Education and Research
Theory Assignment Information Retrieval (CSE 4053) : Institute of Technical Education and Research
Theory Assignment Information Retrieval (CSE 4053) : Institute of Technical Education and Research
AND RESEARCH
(SOA Deemed to be University)
Theory Assignment
Information Retrieval (CSE 4053)
Submitted By
Name: SANDEEP BEHERA
Registration No.: 1841012480
Branch: CSE
Semester: 7th Section: D
s22 psignmei2
IRlhtovy
ondiep behov
L941012ugO
CsE D
Aw
Vide
|tomo-
4
qU0Ay
looo3
6002
S0000 25
d
2310.
dotwm
o.S
0.S
d
no-wedsize
Soe .S63.
I.S6 12
(Aev) D h
Aay (4)
/3 boo (/)/o
D
tAo))
booK
ove I(1yDyD
y My (/D
Vdn) Aove
(/)01 (s
0)ho
0.0&0
(
) Jo
Ler
Ser mea
() Voantemtam Vaua
w
+0
Mem
s
Maue 5.gs3.5033|146.2s2292 t.25
ferma 5.US|472se|32. s3
claus
clauses
S0 hou equipobobe
0
So P(at)P(4ematd)-
he wa no oun9 hs
idktiedlaonou
AnUm'tion So oe been a a bad Sec
Oy
dedomine Pc) baned On uqvm
H tmanina S
Tmall) P mo)
(mo P(uiqus/
fosHeion(
PC fm) PC4ostsize
/nmaa)
fvidnc
Poptto-ennaeP(tomati) P(at
P(htemat) /tomu
(4fone
evidw
ce
evidtPnau)P(ha/mt)f(tma P(4/ma)
Th eviniemOy' be tqmoudSinceeit a
COa tod
i
butiom au akwa +v
UAtve (Noumol disini
:
e(male)
O
Psma)
S S
ep
2
-
ku anamelts
oC-o
3. 2
dtbolon
ew
tl
mae)
made
S.9e8e
3
1.312e
-Oc
(s
Pestowo mta4o (moe).
Hhun Prodotf 6.1444e-0
PLkmae) 0S
klma)-2.23e
Plisl
femate) 26ciE-I G1
-1
foteionnum nal(4cmoeHph
doc
5,37a ke-0q
7
ie e -7
Sine ptorvo ruata
(on P rudcf
00 0
-1
Soe.,
MultimowalModula)
Taien/e) /s2
0+/c7.
(1oppool).
-199P.
*/s*7
P(Tiasale)- %6
Psoppao/T)2t/c+)4
P-ObloiC kd lam am qn hitet
do CUnus
PLcla)Pl)
PlTao1
|e-r(aaorl)) |
sppan
1-P(Toina|)1-eC )
(1-Plshagho
1-pm1)(1-lsc
RO0OG C
C1-Mo i)
0-{ Shag (-tje
/2))
-t(
4
e -) )
lapam/e))U-P(0saka1)
(1-
(1-V) (rV%)
Sondup
(sE D
THEORY ASSIGNMENTS, FEB-2022
Information Retrieval (CSE 4053)
*Bloom’s taxonomy levels: Knowledge (L1), Comprehension (L2), Application (L3), Analysis (L4), Evaluation
(L5), Creation (L6)
1. Compute the vector space similarity between the query digital cameras and the document
digital cameras and video cameras by filling out the empty columns in the following
table. Assume N = 10,000,000, logarithmic term weighting (wf columns) for query and
document, idf weighting for the query only and cosine normalization for the document
only. Treat and as a stop word. Enter term counts in the tf columns. What is the final
similarity score?
2. Consider a collection made of the 4 following documents (one document per line) :
1. John gives a book to Mary
2. John who reads a book loves Mary
3. who does John think Mary love ?
4. John thinks a book is a good gift
For the 3 terms belonging to the dictionary, namely book, love and Mary. Compute the
tf−idf based vector representation for the 4 documents in the collection
3. Rank the documents in collection {d1 , d2 } for query q using the language model approach
to IR introduced in class with Jelinek-Mercer smoothing. Use the mixture coefficient λ =
0.4.
d1 : Scottish sheep getting smaller due to climate change study says
d2 : The analysis has shown a dramatic shift in the natural ranges for US
Bird species in response to climate change
Query q: climate change
4. Classify whether a given person is a male or a female based on the measured features.
The features include height, weight, and foot size. Data set is shown below
sex height (feet) weight (lbs) foot size
male 6 180 12
(inches)
male 5.92 (5'11") 190 11
male 5.58 (5'7") 170 12
male 5.92 (5'11") 165 10
female 5 100 6
female 5.5 (5'6") 150 8
female 5.42 (5'5") 130 7
female 5.75 (5'9") 150 9
5. Based on the data below, estimate a multinomial Naive Bayes classifier and apply the
classifier to the test document. Calculate the probability that the classifier assigns the test
document to c = China or.