
581257 Information Retrieval Methods Autumn 2010 Exercise 1, solutions

1. What kind of information retrieval tasks have you completed? What kind of retrieval methods have you used to complete them?

Solution: The answer is individual. Methods mentioned may include searching for information on a certain topic using a search engine like Google, searching for scientific articles with a library search system or in digital libraries, searching for available flights and accommodation with the search engine of a travel agency, browsing a web site, etc.

2. (Course book's exercise 1.2) Consider these documents:

Doc 1: breakthrough drug for schizophrenia
Doc 2: new schizophrenia drug
Doc 3: new approach for treatment of schizophrenia
Doc 4: new hopes for schizophrenia patients

a) Draw the term-document incidence matrix for this document collection.

Solution: Term-document matrix:

               D1  D2  D3  D4
approach        0   0   1   0
breakthrough    1   0   0   0
drug            1   1   0   0
for             1   0   1   1
hopes           0   0   0   1
new             0   1   1   1
of              0   0   1   0
patients        0   0   0   1
schizophrenia   1   1   1   1
treatment       0   0   1   0
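The matrix in a) can be derived mechanically from the four documents. The following is a minimal sketch, assuming simple whitespace tokenization; the names `docs`, `terms` and `matrix` are illustrative and not part of the exercise.

```python
# Toy collection from exercise 2 (doc IDs 1..4).
docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}

# Vocabulary in alphabetical order (assumption: whitespace tokenization).
terms = sorted({t for text in docs.values() for t in text.split()})

# matrix[term] holds the 0/1 entries for D1..D4, in that order.
matrix = {
    term: [1 if term in docs[d].split() else 0 for d in sorted(docs)]
    for term in terms
}

for term in terms:
    print(f"{term:<14}" + "  ".join(str(bit) for bit in matrix[term]))
```

Each printed row corresponds to one term and each column to one document, in the order D1 to D4.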

b) Draw the inverted index representation for this collection, as in Figure 1.3 (page 6).

Solution: Inverted index:

approach → 3
breakthrough → 1
drug → 1 → 2
for → 1 → 3 → 4
hopes → 4
new → 2 → 3 → 4
of → 3
patients → 4
schizophrenia → 1 → 2 → 3 → 4
treatment → 3
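The inverted index in b) can be built in a similar way. This is a minimal sketch, assuming whitespace tokenization and postings kept as sorted lists of document IDs; the function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict

# Toy collection from exercise 2 (doc IDs 1..4).
docs = {
    1: "breakthrough drug for schizophrenia",
    2: "new schizophrenia drug",
    3: "new approach for treatment of schizophrenia",
    4: "new hopes for schizophrenia patients",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs that contain it."""
    index = defaultdict(list)
    # Visiting documents in increasing ID order keeps every postings list sorted.
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            index[term].append(doc_id)
    return dict(index)


index = build_inverted_index(docs)
for term in sorted(index):
    print(term, "→", " → ".join(str(d) for d in index[term]))
```

Keeping the postings sorted by document ID is what makes the linear-time merges of the later exercises possible.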

3. (Course book's exercise 1.3) For the document collection shown in Exercise 2 (Course book's exercise 1.2), what are the returned results for these queries?

a) schizophrenia AND drug
Solution: doc1, doc2

b) for AND NOT (drug OR approach)
Solution: doc4

4. (Course book's exercise 1.4) For the queries below, can we still run through the intersection in time O(x+y), where x and y are the lengths of the postings lists for Brutus and Caesar? If not, what can we achieve?

a) Brutus AND NOT Caesar
Solution: Time is O(x+y). Instead of collecting documents that occur in both postings lists, collect those that occur in the first one and not in the second.

b) Brutus OR NOT Caesar
Solution: Time is O(N) (where N is the total number of documents in the collection), assuming we need to return a complete list of all documents satisfying the query. This is because the length of the results list is only bounded by N, not by the length of the postings lists.

5. (Course book's exercise 1.7) Recommend a query processing order for

(tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes)

given the following postings list sizes:

Term          Postings size
eyes          213312
kaleidoscope  87009
marmalade     107913
skies         271658
tangerine     46653
trees         316812
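Looking back at exercises 3 b) and 4 a), the AND NOT merge can be sketched as a single pass over two sorted postings lists. This is an illustrative sketch, not the course's reference code; the function names `and_not` and `union` are hypothetical, and the postings are those of the toy collection from exercise 2.

```python
def and_not(p1, p2):
    """Return the doc IDs of sorted list p1 that do not occur in sorted list p2.

    Both pointers only move forward, so the cost is O(x + y).
    """
    result = []
    i = j = 0
    while i < len(p1):
        if j == len(p2) or p1[i] < p2[j]:
            result.append(p1[i])  # p1[i] cannot occur later in p2
            i += 1
        elif p1[i] == p2[j]:
            i += 1  # present in both lists: skip it
            j += 1
        else:
            j += 1  # p2 is behind: advance it
    return result


def union(p1, p2):
    """Merge two sorted postings lists without duplicates in O(x + y)."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            result.append(p1[i])
            i += 1
        else:
            result.append(p2[j])
            j += 1
    # At most one of the two tails is non-empty.
    return result + p1[i:] + p2[j:]


# Postings from the toy collection in exercise 2.
postings = {"for": [1, 3, 4], "drug": [1, 2], "approach": [3]}

# Query 3 b): for AND NOT (drug OR approach)
print(and_not(postings["for"], union(postings["drug"], postings["approach"])))
```

Brutus OR NOT Caesar cannot be handled this way, because its result may contain documents that appear in neither postings list.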

Solution: Using the conservative estimate of the length of the union of postings lists (the sum of the two list sizes), the recommended order is:

(kaleidoscope OR eyes) (300,321) AND (tangerine OR trees) (363,465) AND (marmalade OR skies) (379,571)

However, depending on the actual distribution of postings, (tangerine OR trees) may well be longer than (marmalade OR skies), because the two components of the former are more asymmetric. For example, the union of postings lists of lengths 11 and 9990 is expected to be longer than the union of two lists of length 5000 each, even though the conservative estimates (10,001 and 10,000) are almost identical.
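The ordering heuristic can be sketched in a few lines. The postings sizes are those given in the exercise; the independence model in `expected_union_size` and the collection size of 100,000 documents are assumptions made only for illustration, and the function names are hypothetical.

```python
# Postings list sizes from exercise 5.
sizes = {
    "eyes": 213312,
    "kaleidoscope": 87009,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 46653,
    "trees": 316812,
}

# Each tuple is one OR group of the conjunctive query.
query = [("tangerine", "trees"), ("marmalade", "skies"), ("kaleidoscope", "eyes")]


def conservative_union_size(group, sizes):
    """Upper bound on the size of an OR group: the sum of its postings sizes."""
    return sum(sizes[term] for term in group)


# Process the OR groups in order of increasing estimated size.
order = sorted(query, key=lambda group: conservative_union_size(group, sizes))
for group in order:
    print(" OR ".join(group), conservative_union_size(group, sizes))


def expected_union_size(a, b, n_docs):
    """Expected union size if the two terms occurred independently (assumption)."""
    return a + b - a * b / n_docs


# Hypothetical collection size, used only for the asymmetry example.
n_docs = 100_000
print(expected_union_size(11, 9990, n_docs))
print(expected_union_size(5000, 5000, n_docs))
```

Under the independence assumption the overlap of two lists grows with the product of their lengths, which is why a very asymmetric pair loses fewer documents to overlap than a balanced pair with a similar sum.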

6. Try the search feature at http://www.rhymezone.com/shakespeare/. Write down five search features you think it could do better.

Solution: Such features may include excluding search terms (NOT), other Boolean operators, ranking based on relevance, searching for a combination of single words and phrases, wild-card queries, proximity queries, ...

7. (Course book's exercise 1.13) Try using the Boolean search features on a couple of major web search engines. For instance, choose a word, such as burglar, and submit the queries (i) burglar, (ii) burglar AND burglar, and (iii) burglar OR burglar. Look at the estimated number of results and top hits. Do they make sense in terms of Boolean logic? Often they haven't for major search engines. Can you make sense of what is going on? What about if you try different words? For example, query for (i) knight, (ii) conquer, and then (iii) knight OR conquer. What bound should the number of results from the first two queries place on the third query? Is this bound observed?

Example solution: Observations on queries with some popular search engines:

Google burglar Yahoo Altavista Bing

8,220,000 2,330,000 0 2,990,000 23,800,000 2,510,000 389,000 0 880,000 2,930,000 23,800,000 5,990,000 23,900,000 2,530,000

burglar AND burglar 5,250,000
burglar OR burglar 5,980,000

The number of hits may change slightly if the same query is run several times. The numeric results depend on which search engine version you are using (e.g., google.com or google.fi).

Inferences: When the operator AND is used in a query, Google prompts not to use it because it includes all terms of a query by default. Yahoo and Bing seem to treat the term AND as a normal term and not as a Boolean operator, as shown by the top results and the total number of hits. Altavista seems to follow Boolean logic most closely. In the case of Google, the top documents returned with AND contained the term burglar more than once, which suggests that query processing does not remove redundant terms. Surprisingly, Bing returns twice the number of documents with the term AND; it would be interesting to hear the reason for that. The term OR is treated as a Boolean operator by Altavista, Google and Yahoo, but the number of hits is still a bit confusing in the case of Google.

Observations:

Google knight conquer knight and conquer 82,700,00 19,800,000 1,820,000 Yahoo 3,920,000 8,200,000 2,950,000 2,170,000 2,700,000 Altavista Bing

595,000,000 5,800,000 155,000,000 5,320,000 27,800,000 27,800,000 20,100,000 2,950,000 2,020,000 2,920,000

knight AND conquer 1,820,000
knight or conquer
knight OR conquer 1,820,000

212,000,000 12,300,000 317,000,000 8,830,000

Inferences: The number of hits for knight AND conquer should be no more than for knight or conquer separately, which is observed in all 4 searches. The number of hits for knight OR conquer should be at least as large as for knight or conquer separately, which is observed in all 4 searches; by the same logic it can be at most the sum of the two.
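The bounds used in these inferences can be written down as simple checks. This is a minimal sketch; the function names are hypothetical and the hit counts in the example are made-up values, not figures taken from the observations above.

```python
def and_bound_ok(hits_a, hits_b, hits_and):
    """A AND B can match at most as many documents as the rarer term."""
    return hits_and <= min(hits_a, hits_b)


def or_bound_ok(hits_a, hits_b, hits_or):
    """A OR B matches at least max(a, b) and at most a + b documents."""
    return max(hits_a, hits_b) <= hits_or <= hits_a + hits_b


# Hypothetical hit counts for two terms and their combinations.
hits_a, hits_b = 1_000_000, 250_000
print(and_bound_ok(hits_a, hits_b, 40_000))     # consistent with Boolean logic
print(or_bound_ok(hits_a, hits_b, 1_210_000))   # consistent with Boolean logic
print(or_bound_ok(hits_a, hits_b, 900_000))     # fewer hits than the first term alone
```

Since search engines report estimated counts, a failed check hints that the operator is not being treated as strict Boolean logic, or that the estimate is loose.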
