FS Mod 3 - Multilevel Indexing and B-Trees

Multilevel Indexing and B-Trees

Introduction-Invention of B-trees
• The goal was the discovery of a general method
for storing and retrieving data in large file
systems that would provide rapid access to the
data with minimal overhead cost.
• Douglas Comer wrote the survey article “The
Ubiquitous B-Tree” in 1979.
• R. Bayer and E. McCreight published
“Organization and Maintenance of Large Ordered
Indexes” in 1972, which announced B-trees to the world.
Statement of the Problem
• The fundamental problem with keeping an index
on secondary storage is that accessing secondary
storage is slow. This can be broken down into two
specific problems:
– Searching the index must be faster than binary
searching.
– Insertion and deletion must be as fast as search.
Indexing with Binary Search Trees
• We begin by looking at the cost of keeping a list in sorted order
so that we can perform binary searches.
[Figure: the tree after adding the keys NP MB TM LA UF ND TS NK]
AVL Trees
• AVL trees are named in honor of the Russian mathematicians G. M. Adel’son-
Vel’skii and E. M. Landis, who first defined them.
• An AVL tree is a height-balanced tree: there is a limit placed on
the amount of difference allowed between the heights of
any two subtrees sharing a common root.
• In an AVL tree the maximum allowable difference is one.
• An AVL tree is hence called a height-balanced 1-tree or HB(1)
tree.
• It is a member of a more general class of height-balanced
trees known as HB(k), which are permitted to be k levels out
of balance.
• The following tree has the AVL, or HB(1), property.
• BCGEFDA
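The HB(k) property described above can be checked with one recursive pass over the tree. The following is a minimal sketch, not code from the slides: the `Node` struct and the names `checkedHeight` and `isHB` are hypothetical, and an empty subtree is assumed to have height 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical in-memory binary tree node (illustration only).
struct Node {
    char key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Returns the height of the subtree rooted at n (an empty tree has height 0),
// or -1 if some node's two subtrees differ in height by more than k.
int checkedHeight(const Node* n, int k) {
    if (n == nullptr) return 0;
    int lh = checkedHeight(n->left, k);
    if (lh < 0) return -1;
    int rh = checkedHeight(n->right, k);
    if (rh < 0) return -1;
    if (std::abs(lh - rh) > k) return -1;
    return 1 + std::max(lh, rh);
}

// True when the tree satisfies the HB(k) property; k = 1 is the AVL test.
bool isHB(const Node* root, int k) {
    assert(k >= 0);
    return checkedHeight(root, k) >= 0;
}
```

Calling `isHB(root, 1)` tests the AVL property; a larger `k` tests the looser HB(k) classes.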
Paged Binary Trees
• Disk utilization of a binary search tree is extremely inefficient:
when we read a node, there are only three useful pieces of
information, the key value and the addresses of the left and right
subtrees.
• This wastes most of the data read from the disk. Since disk reads are
the critical factor in the cost of searching, we cannot afford this waste.
• A paged binary tree attempts to address the problem by locating
multiple binary nodes on the same disk page.
• Here we do not incur the cost of a disk seek just to get a few
bytes.
• Once we have taken the time to seek to an area of the disk, we read
an entire page from the file.
• Paging is a potential solution to the inefficient
disk utilization of binary search trees.
• By dividing a binary tree into pages and then
storing each page in a block of contiguous
locations on disk, we should be able to reduce
the number of seeks associated with any
search.
• Paging has the potential to result in faster
searching on secondary storage.
• In this tree we are able to locate any of the 63 nodes
with no more than two disk accesses.
• Every page holds 7 nodes and can branch to eight new
pages.
• If we extend the tree by one more level, adding 64 new pages, we can
find any one of 511 nodes in only three seeks.
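The node counts quoted above (7, 63, 511) follow from each 7-node page branching to 8 child pages. The sketch below reproduces that arithmetic. The function name `nodesReachable` is hypothetical, and it assumes every page holds a complete binary subtree, so a page of n nodes has n + 1 child pages.

```cpp
#include <cassert>
#include <cstdint>

// Number of nodes reachable with at most `seeks` page reads when each page
// holds `nodesPerPage` binary-tree nodes arranged as a complete subtree.
// Such a page branches to nodesPerPage + 1 child pages.
// Overflow is not handled; this is meant for small illustrative values.
std::uint64_t nodesReachable(std::uint64_t nodesPerPage, unsigned seeks) {
    assert(nodesPerPage >= 1);
    std::uint64_t pagesAtLevel = 1;  // the root page
    std::uint64_t total = 0;
    for (unsigned level = 0; level < seeks; ++level) {
        total += pagesAtLevel * nodesPerPage;
        pagesAtLevel *= nodesPerPage + 1;
    }
    return total;
}
```

With 7 nodes per page, one, two and three seeks reach 7, 63 and 511 nodes respectively.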
Problems with paged trees
• Inefficient disk usage: In the previous tree there
are seven nodes per page. Of the 14 reference
fields in a single page, 6 reference
nodes within the page, i.e. we are using 14
reference fields to distinguish between 8
subtrees. This still wastes space.
• How to build a paged tree?: We need a sorted
list of the keys to build a paged tree.
B-Trees:
• Create a B-Tree for the following elements
An object oriented representation of B-Trees
Class BTree: Supporting Files of B-Tree Nodes

• Class BTree uses in-memory BTreeNode
objects, adds the file access portion and
enforces the consistent size of the nodes.
• The following code defines class BTree.
Searching in B-Tree
• Characteristics of most B-tree algorithms:
1. They are iterative.
2. They work in two stages, operating alternately on
entire pages (class BTree) and then within pages (class
BTreeNode).

• The searching procedure is iterative: it loads a page into
memory and then searches through the page,
looking for the key at successively lower levels of the
tree until it reaches the leaf level.
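The two-stage search can be sketched as follows. This is an illustration under stated assumptions, not the BTree class from the slides: `Page`, `PageFile`, `findInPage` and `search` are hypothetical names, a `std::vector` stands in for the index file, and each key in an interior page is assumed to be the largest key in the corresponding child subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical page layout (assumption): every key in an interior page is the
// largest key found in the corresponding child subtree.
struct Page {
    bool isLeaf;
    std::vector<std::string> keys;  // sorted ascending
    std::vector<int> refs;          // child page numbers, or record addresses in a leaf
};

// Stand-in for the index file: "reading a page" is just indexing the vector.
using PageFile = std::vector<Page>;

// Within-page stage: position of the first key >= target, or keys.size() if none.
std::size_t findInPage(const Page& page, const std::string& target) {
    std::size_t i = 0;
    while (i < page.keys.size() && page.keys[i] < target) ++i;
    return i;
}

// Page-level stage: iterate from the root down to a leaf, one page per level.
// Returns the record address, or -1 when the key is absent.
int search(const PageFile& file, int rootPage, const std::string& target) {
    int current = rootPage;
    while (true) {
        const Page& page = file.at(static_cast<std::size_t>(current));
        std::size_t i = findInPage(page, target);
        if (i == page.keys.size()) return -1;  // target exceeds every key here
        if (page.isLeaf) {
            return page.keys[i] == target ? page.refs[i] : -1;
        }
        current = page.refs[i];  // descend one level
    }
}
```

The loop reads one page per level, so the number of page reads equals the depth of the tree.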
Insertion
• There are two important observations we can make
about the insertion, splitting and promotion process:
• The first operation in method Insert is to search from the root down to the
leaf level for key using FindLeaf:
thisNode = FindLeaf(key);
• The next step is to insert key into the leaf node:
result = thisNode->Insert(key, recAddr);
• When overflow is detected, the node must be split into two
nodes using the following code:
newNode = NewNode();
thisNode->Split(newNode);
Store(thisNode);
Store(newNode);
• The next step is to update the parent node. Since the largest key
in thisNode has changed, method UpdateKey is used to record
the change:
parentNode->UpdateKey(largestKey, thisNode->LargestKey());
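The insert-then-split step at the leaf level can be illustrated with a simplified, in-memory leaf. This is a sketch only: the `Leaf` struct, its `maxKeys` field and its method names are hypothetical and do not reproduce the BTreeNode class used in the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // (key, record address)

// Hypothetical simplified leaf: sorted entries with a fixed capacity.
struct Leaf {
    std::size_t maxKeys;
    std::vector<Entry> entries;

    // Inserts in sorted position; returns false when the leaf now overflows.
    bool insert(const std::string& key, int recAddr) {
        auto pos = std::lower_bound(
            entries.begin(), entries.end(), key,
            [](const Entry& e, const std::string& k) { return e.first < k; });
        entries.emplace(pos, key, recAddr);
        return entries.size() <= maxKeys;
    }

    // Moves the upper half of the entries into newLeaf, which must be empty.
    void split(Leaf& newLeaf) {
        assert(newLeaf.entries.empty());
        std::size_t mid = (entries.size() + 1) / 2;  // this leaf keeps the extra entry
        auto first = entries.begin() + static_cast<std::ptrdiff_t>(mid);
        newLeaf.entries.assign(first, entries.end());
        entries.erase(first, entries.end());
    }

    // Largest key in the leaf; this is the value a parent would record.
    const std::string& largestKey() const {
        assert(!entries.empty());
        return entries.back().first;
    }
};
```

After a split, the largest key of the original leaf has changed, which is why the parent entry must be updated, and the new leaf's largest key must be added to the parent as well.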
Testing the B-Tree
Worst Case Search Depth
• It is important to understand the relationship between the
page size of a B-tree, the number of keys to be stored in the
tree, and the number of levels to which the tree can extend.
• Example: Suppose we want to store 1,000,000 keys and
that, given the nature of the storage hardware and the size of
the keys, it is reasonable to consider using a B-tree of order
512.
• In the worst case, what is the maximum number of disk
accesses required to locate a key in the tree? In other words,
how deep will the tree be?
• We can answer this by noting that every key appears
in the leaf level. Hence, we need to calculate the
maximum height of a tree with 1,000,000 keys in the
leaves.
• We can use the formal definition of B-tree
properties to calculate the minimum number of
descendants that can extend from any level of a
B-tree of some given order.
• The worst case occurs when every page of the
tree has only the minimum number of descendants.
• In such a case the keys are spread over a maximal
height for the tree and a minimal breadth.
• For a B-tree of order m, the minimum number of
descendants from the root page is 2, so the second
level of the tree contains only 2 pages.
• Each of these pages, in turn, has at least ⌈m/2⌉
descendants.
• The third level then contains 2 × ⌈m/2⌉ pages.
• The general pattern of the relation between depth and
the minimum number of descendants takes the following
form:
Level 1 (root): 2
Level 2: 2 × ⌈m/2⌉
Level 3: 2 × ⌈m/2⌉^2
Level d: 2 × ⌈m/2⌉^(d-1)
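Since a tree of depth d has at least 2 × ⌈m/2⌉^(d-1) descendants from its leaf level, a tree holding N keys in its leaves must satisfy N ≥ 2 × ⌈m/2⌉^(d-1). The sketch below finds the largest d that satisfies this bound; the function name `worstCaseDepth` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Largest depth d such that 2 * ceil(order/2)^(d-1) <= numKeys, i.e. the
// worst-case number of levels for a B-tree of the given order that holds
// numKeys keys in its leaf level.
unsigned worstCaseDepth(std::uint64_t order, std::uint64_t numKeys) {
    assert(order >= 3 && numKeys >= 2);
    const std::uint64_t minFanout = (order + 1) / 2;  // ceil(order / 2)
    unsigned depth = 1;
    std::uint64_t minDescendants = 2;  // minimum descendants from the root level
    // Go one level deeper while that level's minimum still fits in numKeys.
    // Dividing instead of multiplying avoids overflow in the comparison.
    while (minDescendants <= numKeys / minFanout) {
        minDescendants *= minFanout;
        ++depth;
    }
    return depth;
}
```

For the example above, `worstCaseDepth(512, 1000000)` evaluates to 3: a depth of 4 would need at least 2 × 256^3 = 33,554,432 keys, so a B-tree of order 512 holding 1,000,000 keys is at most three levels deep.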
Deletion, Merging and Redistribution
1. Deletion of C from the tree above does not affect the rest of the tree.
2. Deletion of P changes P to O in the second level and
the root.
3. Deletion of H causes an underflow, and two leaf nodes
are merged.
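The choice between leaving a node alone, redistributing and merging can be sketched as a small decision function. This illustrates the usual B-tree rule and is not code from the slides: the names `DeleteAction` and `actionAfterDelete` are hypothetical, and the minimum occupancy is assumed to be ⌈order/2⌉ keys per node.

```cpp
#include <cassert>
#include <cstddef>

enum class DeleteAction { None, Redistribute, Merge };

// Decides how to repair a node after one key has been removed from it.
// keysLeft:    keys remaining in the node after the deletion
// siblingKeys: keys held by an adjacent sibling
// order:       maximum number of keys per node
// Assumption: a node underflows when it holds fewer than ceil(order / 2) keys.
DeleteAction actionAfterDelete(std::size_t keysLeft, std::size_t siblingKeys,
                               std::size_t order) {
    assert(order >= 2);
    const std::size_t minKeys = (order + 1) / 2;
    if (keysLeft >= minKeys) return DeleteAction::None;            // no underflow
    if (siblingKeys > minKeys) return DeleteAction::Redistribute;  // sibling can spare a key
    return DeleteAction::Merge;  // both nodes fit together in one node
}
```

Case 1 above corresponds to `None`, and case 3 to `Merge`. Case 2 also needs no structural repair, but because the deleted key was the largest in its node, the copies of that key in the higher levels must be updated.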
