
DSCI 32013

Advanced Database Systems with Applications

Dr.(Mrs) M. Carmel Wijegunasekara


email – carmel@kln.ac.lk

Faculty of Computing and Technology,


University of Kelaniya

Note 5, Dept. of Software Engineering, Faculty of Computing and Technology
Dynamic Multilevel Indexing
(B-Trees and B+-Trees)



LEARNING OUTCOMES
• By the end of this lesson you should be able to:
- Describe the tree-structured indexes B-tree and B+-tree
- Construct a B-tree and a B+-tree
- Describe the difference between the B-tree and the B+-tree
- Describe the performance of the B-tree and the B+-tree
- Calculate the necessary parameters for a given data file using a B-tree and a B+-tree



Introduction
• A tree is formed of nodes.
• Each node in the tree, except for a special node called the root, has exactly one parent node and zero or more child nodes.
• The root node has no parent.
• A node that does not have any child is called a leaf node.
• A nonleaf node is called an internal node.
• The level of a node is always one more than the level of its parent, with the level of the root being zero.
• A subtree of a node consists of that node and all its descendant nodes.
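The terms above can be illustrated with a small Python sketch. `Node` and `describe` are hypothetical helpers, not part of any library. The sketch walks a tree from the root, assigns level 0 to the root and one more than the parent's level to each child, and labels each node as a leaf (no children) or internal (nonleaf).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A general tree node: a label and zero or more child nodes."""
    label: str
    children: list["Node"] = field(default_factory=list)


def describe(node: Node, level: int = 0, out=None) -> list:
    """Return (label, level, kind) for every node in the subtree rooted at `node`."""
    if out is None:
        out = []
    kind = "leaf" if not node.children else "internal"   # nonleaf nodes are internal
    out.append((node.label, level, kind))
    for child in node.children:                          # each child is one level deeper
        describe(child, level + 1, out)
    return out


# Hypothetical tree: A is the root, B and C are its children, D is a child of B.
tree = Node("A", [Node("B", [Node("D")]), Node("C")])
for label, level, kind in describe(tree):
    print(label, level, kind)
```

In this example the subtree of B consists of B and its descendant D.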
B-Tree
• B-trees are multiway search trees commonly used in database systems and other applications where data is stored externally on disk and keeping the tree shallow is important.
• A B-tree is always balanced: all of its leaf nodes are at the same level.
• A B-tree of order p, when used as an access structure on a key field to search for records in a data file, can be defined as follows:



1. Each internal node in the B-tree is of the form
<P1, <K1,Pr1>, P2, <K2,Pr2>, …, Pq-1, <Kq-1,Prq-1>, Pq> where q <= p
Pi – tree pointer (a pointer to another node in the B-tree)
Pri – data pointer (a pointer to the record whose search key field value is equal to Ki, or to the data file block containing that record)
2. Within each node, K1 < K2 < … < Kq-1
3. For all search key field values X in the subtree pointed at by Pi (the ith subtree), we have
Ki-1 < X < Ki for 1 < i < q
X < Ki for i = 1
Ki-1 < X for i = q



(a) A node in a B-tree with q-1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.



4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, q <= p, has q-1 search key field values (and hence q-1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes, except that all of their tree pointers Pi are NULL.

Note: if we use a B-tree on a nonkey field, we must change the definition of the file pointers Pri so that they point to a block (or cluster of blocks) that contains the pointers to the file records.
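A lookup that follows properties 1 to 3 can be sketched in Python. This is a minimal sketch under our own assumptions. `BTreeNode` and `search` are hypothetical names, and data pointers Pri are modelled as strings. The example tree (root 5 8 over leaves 1 3 | 6 7 | 9 12) is what the caption's insertion order produces when each full node is split around its middle key. It stands in for the figure, which is not reproduced here.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BTreeNode:
    keys: list[int]                                            # K1 < K2 < ... < Kq-1
    data_ptrs: list[str]                                       # Pr1 ... Prq-1, one per key
    children: list["BTreeNode"] = field(default_factory=list)  # P1 ... Pq; empty for a leaf


def search(node: Optional[BTreeNode], x: int) -> Optional[str]:
    """Return the data pointer stored with key x, or None if x is not in the tree."""
    while node is not None:
        i = bisect_left(node.keys, x)            # number of keys smaller than x
        if i < len(node.keys) and node.keys[i] == x:
            return node.data_ptrs[i]             # found in this node: follow its Pri
        if not node.children:                    # leaf: all tree pointers are NULL
            return None
        node = node.children[i]                  # P(i+1) in 1-based notation: K(i) < x < K(i+1)
    return None


def make_leaf(keys):
    return BTreeNode(keys, [f"rec{k}" for k in keys])


root = BTreeNode([5, 8], ["rec5", "rec8"], [make_leaf([1, 3]), make_leaf([6, 7]), make_leaf([9, 12])])
print(search(root, 7))   # rec7
print(search(root, 8))   # rec8 (found in the root, no need to reach a leaf)
print(search(root, 4))   # None
```

Because every node carries data pointers, a B-tree search can stop at an internal node, as the search for 8 does here.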
• A B-tree starts with a single root node at level 0.
• Once the root node is full with p-1 search key values and we attempt to insert another entry into the tree:
- the root node splits into two nodes at level 1
- only the middle value is kept in the root node
- the remaining values are split evenly between the two new nodes
• When a nonroot node is full and a new entry is inserted into it:
- that node is split into two nodes at the same level
- the middle entry is moved to the parent node, along with the two pointers to the new split nodes
- if the parent node is full, it is also split
• Splitting can propagate all the way to the root node, creating a new level if the root is split.
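The insertion and splitting steps above can be sketched in Python. This is a simplified illustration, not a production B-tree. It stores keys only (no data pointers Pri), assumes distinct keys (a key field), and uses hypothetical names (`BTree`, `levels`). A node of order p overflows when it reaches p keys. It then keeps the keys left of the middle, promotes the middle key to its parent, and moves the rest into a new right sibling.

```python
from bisect import bisect_left


class BTree:
    """Minimal B-tree of order p: at most p tree pointers and p - 1 keys per node.

    Keys only (data pointers omitted); keys are assumed to be distinct.
    """

    def __init__(self, p: int):
        self.p = p
        self.root = {"keys": [], "children": []}   # an empty leaf at level 0

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:                      # root split: the tree grows a level
            middle, right = split
            self.root = {"keys": [middle], "children": [self.root, right]}

    def _insert(self, node, key):
        i = bisect_left(node["keys"], key)
        if node["children"]:                       # internal node: insert into subtree i
            split = self._insert(node["children"][i], key)
            if split is None:
                return None
            middle, right = split                  # absorb the promoted middle key
            node["keys"].insert(i, middle)
            node["children"].insert(i + 1, right)
        else:                                      # leaf: insert in sorted position
            node["keys"].insert(i, key)
        if len(node["keys"]) < self.p:             # at most p - 1 keys: no overflow
            return None
        mid = len(node["keys"]) // 2               # overflow: split around the middle key
        right = {"keys": node["keys"][mid + 1:], "children": node["children"][mid + 1:]}
        middle = node["keys"][mid]
        node["keys"], node["children"] = node["keys"][:mid], node["children"][:mid + 1]
        return middle, right


def levels(tree):
    """List the keys of every node, level by level from the root."""
    out, frontier = [], [tree.root]
    while frontier:
        out.append([n["keys"] for n in frontier])
        frontier = [c for n in frontier for c in n["children"]]
    return out


t = BTree(p=3)
for k in [8, 5, 1, 7, 3, 12, 9, 6]:
    t.insert(k)
print(levels(t))   # [[[5, 8]], [[1, 3], [6, 7], [9, 12]]]
```

With p = 5 and the keys of the construction example, the same code yields the final three-level tree shown at the end of that example.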



Constructing a B-tree
• Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45
• We want to construct a B-tree of order 5, so each node holds at most 4 keys.
• The first four items go into the root:
1 2 8 12

• The root is now full, so the fifth item cannot be added to it.


• Therefore, when 25 arrives, the root is split and its middle key, 8, becomes the new root.
Constructing a B-tree (contd.)

Root: 8
Leaves: [1 2] [12 25]

6, 14 and 28 get added to the leaf nodes:

Root: 8
Leaves: [1 2 6] [12 14 25 28]


Constructing a B-tree (contd.)
Remaining keys: 17 7 52 16 48 68 3 26 29 53 55 45

Adding 17 to the right leaf node would overfill it, so we take the middle key (17), promote it to the root and split the leaf:

Root: 8 17
Leaves: [1 2 6] [12 14] [25 28]

7, 52, 16 and 48 get added to the leaf nodes:

Root: 8 17
Leaves: [1 2 6 7] [12 14 16] [25 28 48 52]


Constructing a B-tree (contd.)
Remaining keys: 68 3 26 29 53 55 45

Adding 68 causes us to split the rightmost leaf, promoting 48 to the root. Adding 3 causes us to split the leftmost leaf, promoting 3 to the root. 26, 29, 53 and 55 then go into the leaves:

Root: 3 8 17 48
Leaves: [1 2] [6 7] [12 14 16] [25 26 28 29] [52 53 55 68]

Adding 45 overfills the leaf [25 26 28 29], so it splits and 28 is promoted to the root. The root then holds five keys, so it splits too, and its middle key, 17, becomes the new root.


Constructing a B-tree (contd.)

Level 0 (root): [17]
Level 1: [3 8] [28 48]
Level 2 (leaves): [1 2] [6 7] [12 14 16] [25 26] [29 45] [52 53 55 68]
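The tree above can be checked mechanically against properties 2 to 7 of the definition. The sketch below is hypothetical: `is_valid_btree` is our own name, and each node is a `(keys, children)` pair with an empty tuple as a leaf's children. Following the definition, it exempts leaf nodes from the ⌈p/2⌉ minimum.

```python
from math import ceil


def is_valid_btree(root, p):
    """Check B-tree properties 2-7 for a tree of (keys, children) pairs."""
    leaf_depths = set()

    def check(node, lo, hi, depth, is_root):
        keys, children = node
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False                               # property 2: K1 < K2 < ... < Kq-1
        if any((lo is not None and k <= lo) or (hi is not None and k >= hi) for k in keys):
            return False                               # property 3: keys within parent bounds
        if not children:                               # leaf: all tree pointers are NULL
            leaf_depths.add(depth)
            return len(keys) <= p - 1                  # property 4
        q = len(children)
        if q > p or q != len(keys) + 1:                # properties 4 and 6
            return False
        if q < (2 if is_root else ceil(p / 2)):        # property 5
            return False
        bounds = [lo] + list(keys) + [hi]
        return all(check(child, bounds[i], bounds[i + 1], depth + 1, False)
                   for i, child in enumerate(children))

    return check(root, None, None, 0, True) and len(leaf_depths) == 1   # property 7


# The final order-5 tree from the construction example, written as (keys, children).
LEAF = ()
final_tree = ([17], [
    ([3, 8], [([1, 2], LEAF), ([6, 7], LEAF), ([12, 14, 16], LEAF)]),
    ([28, 48], [([25, 26], LEAF), ([29, 45], LEAF), ([52, 53, 55, 68], LEAF)]),
])
print(is_valid_btree(final_tree, p=5))   # True

# Hypothetical broken tree: its leaves sit at different levels, violating property 7.
uneven = ([10], [([5], LEAF), ([20], [([15], LEAF), ([25], LEAF)])])
print(is_valid_btree(uneven, p=3))       # False
```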



• If deleting a value causes a node to be less than half full, the node is combined with a neighboring node.
• This merging can propagate all the way to the root.
• When it reaches the root, it can reduce the number of levels in the tree.
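A minimal sketch of the merge step follows. It assumes that "less than half full" means fewer than ⌈p/2⌉ - 1 keys, one key fewer than the ⌈p/2⌉ tree-pointer minimum. It covers only merging two sibling leaves. Borrowing from a sibling, deleting from an internal node and propagating the underflow upward are not shown. `min_keys` and `merge_leaves` are hypothetical names. The example deletes 2 from the left subtree of the order-5 tree built earlier.

```python
from math import ceil


def min_keys(p: int) -> int:
    """Fewest keys in a nonroot node: ceil(p/2) tree pointers hold ceil(p/2) - 1 keys."""
    return ceil(p / 2) - 1


def merge_leaves(parent_keys, children, i, p):
    """Merge leaf children[i] with its right sibling children[i + 1].

    The separator parent_keys[i] moves down into the merged node, so the parent
    loses one key and one tree pointer. Returns False, changing nothing, if the
    merged node would exceed p - 1 keys (redistribution would be needed instead).
    """
    left, right = children[i], children[i + 1]
    if len(left) + 1 + len(right) > p - 1:
        return False
    left.extend([parent_keys.pop(i)] + right)
    del children[i + 1]
    return True


# Left subtree of the order-5 example tree: parent [3 8] over leaves [1 2] [6 7] [12 14 16].
p = 5
parent_keys = [3, 8]
children = [[1, 2], [6, 7], [12, 14, 16]]
children[0].remove(2)                        # delete 2: the leaf [1] now underflows
if len(children[0]) < min_keys(p):
    merge_leaves(parent_keys, children, 0, p)
print(parent_keys, children)                 # [8] [[1, 3, 6, 7], [12, 14, 16]]
```

The parent is left with one key and two tree pointers, fewer than ⌈5/2⌉ = 3. In the full tree it would therefore underflow in turn, which is how merging propagates toward the root.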



Exercise
Search field size V = 9 bytes; block size B = 512 bytes; record (data) pointer size Pr = 7 bytes; block (tree) pointer size P = 6 bytes. Each B-tree node can have at most p tree pointers, p-1 data pointers and p-1 search key field values. If each B-tree node is to fit in a single disk block, find the value of p.



(p*P) + ((p-1)*(Pr + V)) <= B

(p*6) + ((p-1)*(7+9)) <= 512


6p + 16p - 16 <= 512
22p <= 528
p <= 24

The largest value that fits is p = 24, but we choose p = 23 because each node may also need to store additional information, such as the number of entries q in the node, a pointer to the parent node, …
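The same inequality can be solved directly. p*P + (p-1)*(Pr + V) <= B rearranges to p <= (B + Pr + V) / (P + Pr + V). The Python check below uses `btree_order`, a hypothetical helper name.

```python
def btree_order(B: int, P: int, Pr: int, V: int) -> int:
    """Largest p with p*P + (p - 1)*(Pr + V) <= B, i.e. p <= (B + Pr + V) / (P + Pr + V)."""
    return (B + Pr + V) // (P + Pr + V)


p_max = btree_order(B=512, P=6, Pr=7, V=9)
print(p_max)                            # 24
print(p_max * 6 + (p_max - 1) * 16)     # 512: order 24 fills the block exactly
print(23 * 6 + 22 * 16)                 # 490: order 23 leaves 22 bytes for node bookkeeping
```

Because order 24 uses every byte of the block, no room would remain for the extra per-node information mentioned above.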



Exercise
Suppose that the search field of the above example is a nonordering key field, and we construct a B-tree on this field. Assume that each node of the B-tree is 69% full.
(a) On average, how many pointers and search key field values will each node have?
(b) Starting from the root, how many values and pointers can exist, on average, at each subsequent level?
(c) Find the average number of entries in a two-level and a three-level B-tree.



(a) We calculated p as 23.
Since each node is 69% full on average,

the average number of pointers = p*0.69 = 23*0.69 = 15.87 ≈ 16


the average number of search key field values = 16 - 1 = 15
(b)
Root: 1 node, 15 entries, 16 pointers
Level 1: 16 nodes, 240 (16*15) entries, 256 (16*16) pointers
Level 2: 256 nodes, 3840 (256*15) entries, 4096 (256*16) pointers
Level 3: 4096 nodes, 61440 (4096*15) entries, 65536 (4096*16) pointers



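(c) Counting the entries at every level, with the root at level 0:
two-level B-tree (root, level 1 and level 2) = 15 + 240 + 3840 = 4095 entries
three-level B-tree (adding level 3) = 4095 + 61440 = 65535 entries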
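The level-by-level figures in (b) and the totals in (c) can be reproduced with a short Python sketch. `btree_levels` is a hypothetical helper. It rounds p*0.69 to the nearest integer, as the slides do, and multiplies the node count by the average fan-out at each level.

```python
def btree_levels(p: int, fill: float, levels: int):
    """Average (level, nodes, entries, pointers) per level when nodes are `fill` full."""
    fanout = round(p * fill)          # average tree pointers per node
    keys = fanout - 1                 # average search key values per node
    rows, nodes = [], 1
    for level in range(levels + 1):   # the root is level 0
        rows.append((level, nodes, nodes * keys, nodes * fanout))
        nodes *= fanout
    return rows


rows = btree_levels(p=23, fill=0.69, levels=3)
for level, nodes, entries, pointers in rows:
    print(f"Level {level}: {nodes} nodes, {entries} entries, {pointers} pointers")
print(sum(r[2] for r in rows[:3]))    # 4095 entries in levels 0-2
print(sum(r[2] for r in rows))        # 65535 entries in levels 0-3
```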



B+-Tree
• A variation of the B-tree
• Data pointers are stored only in the leaf nodes, so the structure of the leaf nodes differs from the structure of the internal nodes.
• The leaf nodes have an entry for every value of the search field, along with a data pointer to the record if the search field is a key field.
• For a nonkey search field, the pointer points to a block containing pointers to the data file records, creating an extra level of indirection.
• Some search field values from the leaf nodes are repeated in the internal nodes of the B+-tree to guide the search.



The structure of the internal nodes of a B+-tree of order p
1. Each internal node is of the form
<P1, K1, P2, K2, …, Pq-1, Kq-1, Pq> where q <= p and each Pi is a tree pointer
2. Within each internal node, K1 < K2 < … < Kq-1
3. For all search key field values X in the subtree pointed at by Pi (the ith subtree), we have
Ki-1 < X <= Ki for 1 < i < q
X <= Ki for i = 1
Ki-1 < X for i = q



4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q tree pointers, q <= p, has q-1 search key field values.



(a) Internal node of a B+-tree with q-1 search values. (b) Leaf node of a B+-tree with q-1 search values and q-1 data pointers.



The structure of the leaf nodes of a B+-tree of order p
1. Each leaf node is of the form
< <K1,Pr1>, <K2,Pr2>, …, <Kq-1,Prq-1>, Pnext > where q <= p,
each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.
2. Within each leaf node, K1 <= K2 <= … <= Kq-1, q <= p.
3. Each Pri is a data pointer that points to the record whose search field value is Ki, or to a file block containing the record (or, if the search field is not a key, to a block of record pointers that point to records whose search field value is Ki).
4. Each leaf node has at least ⌈p/2⌉ values.
5. All leaf nodes are at the same level.


Insertion of records in a B+-tree of order p = 3 and pleaf = 2
Insertion sequence: 8, 5, 1, 7, 3, 12, 9, 6
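The figures for this example are not reproduced here, so the following Python sketch rebuilds the tree under stated assumptions. It uses a common split convention, which is our assumption about the one the figures follow. A full leaf (pleaf + 1 entries) keeps its first ⌈(pleaf + 1)/2⌉ entries and copies the last kept key up to the parent. A full internal node (p + 1 pointers) keeps its first ⌊(p + 1)/2⌋ pointers and moves the next key up. Keys only are stored, and `BPlusTree` and `levels` are hypothetical names.

```python
from bisect import bisect_left
from math import ceil


class BPlusTree:
    """Minimal B+-tree with internal order p and leaf order pleaf (keys only)."""

    def __init__(self, p: int, pleaf: int):
        self.p, self.pleaf = p, pleaf
        self.root = {"leaf": True, "keys": [], "next": None}

    def insert(self, key: int) -> None:
        split = self._insert(self.root, key)
        if split is not None:                       # the root split: add a new root
            sep, right = split
            self.root = {"leaf": False, "keys": [sep], "children": [self.root, right]}

    def _insert(self, node, key):
        i = bisect_left(node["keys"], key)          # X <= K_i goes down pointer i
        if node["leaf"]:
            node["keys"].insert(i, key)
            if len(node["keys"]) <= self.pleaf:
                return None
            j = ceil((self.pleaf + 1) / 2)          # entries kept in the old leaf
            right = {"leaf": True, "keys": node["keys"][j:], "next": node["next"]}
            node["keys"], node["next"] = node["keys"][:j], right
            return node["keys"][-1], right          # last kept key is COPIED up
        split = self._insert(node["children"][i], key)
        if split is None:
            return None
        sep, child = split
        node["keys"].insert(i, sep)
        node["children"].insert(i + 1, child)
        if len(node["children"]) <= self.p:
            return None
        j = (self.p + 1) // 2                       # tree pointers kept in the old node
        right = {"leaf": False, "keys": node["keys"][j:], "children": node["children"][j:]}
        sep = node["keys"][j - 1]                   # this key is MOVED up
        node["keys"], node["children"] = node["keys"][:j - 1], node["children"][:j]
        return sep, right


def levels(tree):
    """List the keys of every node, level by level from the root."""
    out, frontier = [], [tree.root]
    while frontier:
        out.append([n["keys"] for n in frontier])
        frontier = [c for n in frontier if not n["leaf"] for c in n["children"]]
    return out


t = BPlusTree(p=3, pleaf=2)
for k in [8, 5, 1, 7, 3, 12, 9, 6]:
    t.insert(k)
print(levels(t))   # [[[5]], [[3], [7, 8]], [[1, 3], [5], [6, 7], [8], [9, 12]]]
```

Unlike the B-tree sketch, every key still appears in a leaf. Internal keys such as 5 and 8 are repeated copies that only guide the search.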



Performance issues
• Because entries in the internal nodes contain only search values and tree pointers, without any data pointers, more entries fit into an internal node of a B+-tree than into a node of a similar B-tree.
• Thus, for the same block (node) size, the order p can be larger for the B+-tree than for the B-tree. This can lead to fewer B+-tree levels and therefore faster searches.
• Because the structures of internal nodes and leaf nodes are different, their orders can be different.
• Generally, the order p refers to the internal nodes, and the leaf nodes have a separate order, pleaf, defined as the maximum number of data pointers in a leaf node.



Exercise
• Search field size V = 9 bytes; block size B = 512 bytes; record (data) pointer size Pr = 7 bytes; block (tree) pointer size P = 6 bytes. Each B+-tree internal node can have at most p tree pointers and p-1 search field values.
(a) Find the order p of the internal nodes of the B+-tree.
(b) Find the order pleaf of the leaf nodes of the B+-tree.



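(p*P) + ((p-1)*V) <= B

(p*6) + ((p-1)*9) <= 512


6p + 9p - 9 <= 512
15p <= 521
p <= 34.7, so p = 34

This is larger than the value of 23 for the B-tree, giving a larger fan-out and more entries in each internal node.
As with the B-tree, a node might also need to hold additional information.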
(pleaf*(Pr + V)) + Pnext <= B, where Pnext is a block pointer, so its size is P = 6 bytes

(pleaf*(7+9)) + 6 <= 512


16pleaf + 6 <= 512
16pleaf <= 506
pleaf <= 31.6, so pleaf = 31
Each leaf node can hold up to 31 key value/data pointer combinations.
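Both orders can be computed by rearranging the two inequalities above. `bplus_internal_order` and `bplus_leaf_order` are hypothetical helper names used only for this sketch.

```python
def bplus_internal_order(B: int, P: int, V: int) -> int:
    """Largest p with p*P + (p - 1)*V <= B, i.e. p <= (B + V) / (P + V)."""
    return (B + V) // (P + V)


def bplus_leaf_order(B: int, P: int, Pr: int, V: int) -> int:
    """Largest pleaf with pleaf*(Pr + V) + P <= B; P is the size of the Pnext pointer."""
    return (B - P) // (Pr + V)


print(bplus_internal_order(B=512, P=6, V=9))    # 34
print(bplus_leaf_order(B=512, P=6, Pr=7, V=9))  # 31
```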



Exercise
Suppose that we construct a B+-tree on the field given in the previous example. Assume that each node of the B+-tree is 69% full.
(a) On average, how many pointers and search key field values will each node (internal and leaf) have?
(b) Starting from the root, how many values and pointers can exist, on average, at each subsequent level?



(a) For an internal node:
We calculated p as 34.
Since each node is 69% full on average,

the average number of pointers = p*0.69 = 34*0.69 = 23.46 ≈ 23


the average number of search key field values = 23 - 1 = 22
For a leaf node:
We calculated pleaf as 31.
Since each node is 69% full on average,

the average number of data pointers = pleaf*0.69 = 31*0.69 = 21.39 ≈ 21


the average number of search key field values = 21
(b)
Root: 1 node, 22 entries, 23 pointers
Level 1: 23 nodes, 506 (23*22) entries, 529 (23*23) pointers
Level 2: 529 nodes, 11638 (529*22) entries, 12167 (529*23) pointers
Leaf level: 12167 nodes, 255507 (12167*21) data record pointers
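A Python sketch that reproduces these figures is shown below. `bplus_levels` is a hypothetical helper. It rounds p*0.69 and pleaf*0.69 to the nearest integer, as the slides do, and treats the level below the last internal level as the leaf level, which holds data pointers rather than tree pointers.

```python
def bplus_levels(p: int, pleaf: int, fill: float, internal_levels: int):
    """Average (level, nodes, entries, pointers) per level of a B+-tree whose nodes are `fill` full."""
    fanout = round(p * fill)              # tree pointers per internal node
    keys = fanout - 1                     # search values per internal node
    leaf_ptrs = round(pleaf * fill)       # data pointers per leaf node
    rows, nodes = [], 1
    for level in range(internal_levels):  # the root is level 0
        rows.append((level, nodes, nodes * keys, nodes * fanout))
        nodes *= fanout
    rows.append(("leaf", nodes, nodes * leaf_ptrs, None))  # leaves hold data pointers only
    return rows


for row in bplus_levels(p=34, pleaf=31, fill=0.69, internal_levels=3):
    print(row)
# (0, 1, 22, 23)
# (1, 23, 506, 529)
# (2, 529, 11638, 12167)
# ('leaf', 12167, 255507, None)
```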
