Professional Documents
Culture Documents
Algorithms For Information Retrieval: Index Construction
Algorithms For Information Retrieval: Index Construction
RETRIEVAL
Index Construction
Bhaskarjyoti Das
Department of Computer Science Engineering
ALGORITHMS FOR INFORMATION
RETRIEVAL
Bhaskarjyoti Das
Department of Computer Science Engineering
ALGORITHMS FOR INFORMATION RETRIEVAL
Opportunities for Improvement with BSBI Sort Based Algorithm
▪Till we run out of main memory, work with the current block
▪docid sequentially accessed
▪Let‘s say a document ( docid n) is getting parsed
▪If the term is new, add it to the dictionary for the current
block
▪else identify the record for the particular term that is
already existing
▪If initially allocated small posting list is full, double it
▪ Add the docid n to the posting list for this term
▪Sort on terms for lexicographic ordering
▪Write the index block (dictionary and posting list) into the disk
▪Start the next block with a new dictionary
ALGORITHMS FOR INFORMATION RETRIEVAL
SPIMI-INVERT
Parse document and generate token stream of (term-docid) pair
…….
Each call writes a Block to disk
Bhaskarjyoti Das
Department of Computer Science Engineering