FS Mod 3 - Multilevel Indexing and B-Trees
Introduction: Invention of B-Trees
• The goal was the discovery of a general method
for storing and retrieving data in large file
systems that would provide rapid access to the
data with minimal overhead cost.
• Douglas Comer wrote the article “The Ubiquitous
B-Tree” in 1979.
• R. Bayer and E. McCreight published “Organization
and Maintenance of Large Ordered Indexes” in 1972,
which announced B-trees to the world.
Statement of the Problem
• The fundamental problem with keeping an index
on secondary storage is that accessing secondary
storage is slow. This can be broken down into two
specific problems:
– Searching the index must be faster than binary
searching.
– Insertion and deletion must be as fast as search.
Indexing with Binary Search Trees
• Keeping a list in sorted order lets us perform binary searches,
but we must look at the cost of maintaining that order.
(Figure: binary search tree after adding the keys NP, MB, TM, LA, UF, ND, TS, NK.)
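The effect of adding keys to a binary search tree can be sketched in a few lines. This is a minimal illustration, not the tree from the original figure: it assumes the eight keys are inserted into an empty tree in the order listed, and the names `Node`, `insert`, `inorder`, `height` and `build` are hypothetical helpers chosen for this example.

```python
class Node:
    """One node of a binary search tree: a key plus left and right links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key below root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def inorder(root):
    """Return the keys in ascending order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


def height(root):
    """Number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    """Insert the keys one by one, starting from an empty tree (an assumption)."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


added = ["NP", "MB", "TM", "LA", "UF", "ND", "TS", "NK"]
root = build(added)
print(inorder(root))  # keys come back in sorted order
print(height(root))   # levels, i.e. worst-case nodes visited in a search
```

No keys are moved during insertion, which is why a tree is cheaper to maintain than a sorted list. The shape of the tree, however, depends entirely on the order in which the keys arrive.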
AVL Trees
• AVL trees are named in honor of the Russian mathematicians
G. M. Adel’son-Vel’skii and E. M. Landis, who first defined them.
• An AVL tree is a height-balanced tree: there is a limit on
the difference allowed between the heights of
any two subtrees sharing a common root.
• In an AVL tree the maximum allowable difference is one.
• An AVL tree is therefore called a height-balanced 1-tree, or HB(1)
tree.
• It is a member of a more general class of height-balanced
trees known as HB(k), which are permitted to be k levels out
of balance.
• The following tree has the AVL, or HB(1), property.
(Figure: tree containing the keys B, C, G, E, F, D, A.)
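The HB(k) definition above can be checked mechanically. The sketch below is one common way to keep a tree within HB(1), using rotations on insertion; it is an illustration under assumptions, not necessarily the procedure behind the original figure. The names `Node`, `insert`, `rotate_left`, `rotate_right` and `is_hb` are hypothetical, and the keys B, C, G, E, F, D, A are assumed to be inserted in that order into an empty tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # levels in the subtree rooted at this node


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    """Lift y's left child above y and return the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    """Lift x's right child above x and return the new subtree root."""
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node, key):
    """Insert key, rebalancing so that every node stays within HB(1)."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # duplicate keys are ignored
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                      # left subtree too tall
        if key > node.left.key:          # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:                     # right subtree too tall
        if key < node.right.key:         # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def is_hb(node, k=1):
    """True if the two subtrees of every node differ in height by at most k."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > k:
        return False
    return is_hb(node.left, k) and is_hb(node.right, k)


root = None
for key in "BCGEFDA":
    root = insert(root, key)
print(is_hb(root, 1), height(root))
```

Each insertion touches only the nodes on one root-to-leaf path, so rebalancing stays local to that path.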
Paged Binary Trees
• A binary search tree uses the disk extremely inefficiently:
when we read a node we get only three useful pieces of
information, the key value and the addresses of the left and
right subtrees.
• Most of the data read from the disk is therefore wasted. Since disk
reads are the critical factor in the cost of searching, we cannot afford this waste.
• A paged binary tree attempts to address the problem by locating
multiple binary nodes on the same disk page.
• This way we do not incur the cost of a disk seek just to get a few
bytes.
• Once we have taken the time to seek to an area of the disk, we read
an entire page from the file.
• Paging is a potential solution to the inefficient
disk utilization of binary search trees.
• By dividing a binary tree into pages and then
storing each page in a block of contiguous
locations on disk, we should be able to reduce
the number of seeks associated with any
search.
• Paging has the potential to result in faster
searching on secondary storage.
• In this tree we can locate any of the 63 nodes with no more
than two disk accesses.
• Every page holds 7 nodes and can branch to eight new
pages.
• If we extend the tree by one more level we add 64 new pages, and
we can then find any one of 511 nodes in only three seeks.
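The figures 63 and 511 follow from simple counting, sketched below. The function names are hypothetical. The sketch assumes every page is completely full and holds a complete binary subtree, so a page of n nodes has n + 1 links leading to child pages.

```python
def paged_tree_capacity(nodes_per_page, seeks):
    """Nodes reachable within `seeks` page reads in a completely full paged tree."""
    fan_out = nodes_per_page + 1  # a page of n binary nodes has n + 1 outgoing links
    pages = sum(fan_out ** level for level in range(seeks))
    return pages * nodes_per_page


def binary_tree_levels(node_count):
    """Levels in a complete binary tree with node_count nodes.

    With one node per disk read, this is the worst-case number of seeks.
    """
    return node_count.bit_length()


for seeks in (1, 2, 3):
    nodes = paged_tree_capacity(7, seeks)
    print(seeks, nodes, binary_tree_levels(nodes))
```

With 7 nodes per page, three seeks reach 511 nodes, whereas an unpaged binary tree of the same size needs up to 9 seeks.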
Problems with paged trees
• Inefficient disk usage: in the previous tree there
are seven nodes per page. Of the 14 reference
fields in a single page, 6 refer to nodes
within the same page, i.e. we are using 14
reference fields to distinguish between 8
subtrees. Space is still wasted.
• How do we build a paged tree? We need a sorted
list of keys to build one.
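Why a sorted list helps can be shown with a small sketch. It uses median splitting, which is one common way to build a balanced tree from sorted keys; this is an assumption for illustration, not a method stated in these notes. The names `build_balanced`, `levels` and `root_page` are hypothetical, and each node is represented as a `(key, left, right)` tuple.

```python
def build_balanced(sorted_keys):
    """Build a balanced binary search tree from keys that are already sorted.

    Each node is a (key, left, right) tuple; an empty subtree is None.
    """
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2  # the middle key becomes the subtree root
    return (sorted_keys[mid],
            build_balanced(sorted_keys[:mid]),
            build_balanced(sorted_keys[mid + 1:]))


def levels(tree):
    """Number of levels in the tree (0 for an empty tree)."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(levels(left), levels(right))


def root_page(tree, page_levels=3):
    """Keys in the top `page_levels` levels, i.e. what the root page would hold."""
    page, frontier = [], [tree]
    for _ in range(page_levels):
        next_frontier = []
        for node in frontier:
            if node is None:
                continue
            key, left, right = node
            page.append(key)
            next_frontier += [left, right]
        frontier = next_frontier
    return page


tree = build_balanced(list(range(63)))
print(levels(tree))     # 63 keys fill exactly 6 levels
print(root_page(tree))  # the 7 keys that would share the root page
```

Choosing the root this way requires the whole key set in advance. When keys arrive one at a time in arbitrary order, the right root key for each page is not known, which is the difficulty raised above.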
B-Trees:
• Create a B-Tree for the following elements
An object-oriented representation of B-Trees
Class BTree: Supporting Files of B-Tree Nodes