Data Structure Java - Collections Ok-466-482
16.2.2 Search
B-tree search is quite straightforward. Figure 16.17 illustrates searches in a 5-ary B-tree:
(a) To search the B-tree for NO, we first look among the elements of the root node (DK,
IS, NO). We find NO there, so the search is successful.
(b) To search the B-tree for IT, we first look among the elements of the root node (DK, IS,
NO). We do not find IT there, but we see that IT is greater than IS and less than NO, so
we proceed to the right child of IS / left child of NO. Now we look among the
elements of that node (IT, LU, NL). We do find IT there, so the search is successful.
(c) To search the B-tree for DE, we first look among the elements of the root node (DK,
IS, NO). We do not find DE there, but we see that DE is less than DK, so we proceed
to the left child of DK. Now we look among the elements of that node (BE, CA, DE).
We do find DE there, so the search is successful.
(d) To search the B-tree for SE, we first look among the elements of the root node (DK,
IS, NO). We do not find SE there, but we see that SE is greater than NO, so we
proceed to the right child of NO. Now we look among the elements of that node (PT,
TR, UK, US). We do not find SE there, so we proceed to the left child of TR. But there
is no such child, so the search is unsuccessful.
These examples illustrate an advantage of a B-tree over a BST: the average search path
is shorter, since the B-tree is shallower. This advantage is, however, offset by the need to
search the elements within each B-tree node on the search path.
public class Btree {

    // Each Btree object is a B-tree header.

    // This B-tree is represented as follows: arity is the maximum number
    // of children per node, and root is a link to its root node.
    // Each B-tree node is represented as follows: size contains its size; a
    // subarray elems[0...size-1] contains its elements; and a subarray
    // childs[0...size] contains links to its child nodes. For each element
    // elems[i], childs[i] is a link to its left child, and childs[i+1] is a
    // link to its right child. In a leaf node, all child links are null.
    // Moreover, for every element x in the left subtree of element y:
    //     x.compareTo(y) < 0
    // and for every element z in the right subtree of element y:
    //     z.compareTo(y) > 0.

    private int arity;
    private Btree.Node root;

    public Btree (int k) {
    // Construct an empty B-tree of arity k.
        root = null;
        arity = k;
    }

    ...
            this.childs[this.size] = childs[r];
Figure 16.16 Detailed structures of Btree and Btree.Node objects representing a k-ary B-tree.
Figure 16.17 Illustrations of search in a 5-ary B-tree.
Program 16.19 Implementation of the B-tree search algorithm as a Java method (in class Btree),
with an auxiliary method (in class Btree.Node).
Algorithm 16.18 captures the idea of B-tree search. It can be seen as a generalization of
BST search (Algorithm 10.10).
Program 16.19 shows an implementation of the B-tree search algorithm. The search
method belongs to the Btree class (Program 16.15). The auxiliary searchInNode
method belongs to the Btree.Node inner class. It searches a node for the target
element, using the array binary search algorithm. (Binary search might seem excessive
for a 5-ary B-tree, whose nodes contain at most 4 elements, but it is fully justified in
practical applications where we use high-arity B-trees.)
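The following is a minimal sketch of how a search method and a searchInNode method might fit together, assuming the node layout described above (size, elems, childs). It is not the book's Program 16.19: the class name BtreeSearchSketch, the use of String elements, the return convention of searchInNode, and the contents of the second leaf (ES, FR) are all assumptions made for illustration.

```java
// Hypothetical sketch; names and conventions are assumptions, not Program 16.19.
class BtreeSearchSketch {

    static class Node {
        int size;         // number of elements in this node
        String[] elems;   // elems[0...size-1], in ascending order
        Node[] childs;    // childs[0...size]; all null in a leaf node

        Node(String[] elems, Node[] childs) {
            this.size = elems.length;
            this.elems = elems;
            this.childs = (childs != null) ? childs : new Node[elems.length + 1];
        }

        // Binary search within this node. Returns the position of target if it
        // is present; otherwise returns -(p + 1), where childs[p] is the child
        // link to follow next (assumed convention).
        int searchInNode(String target) {
            int lo = 0, hi = size - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int comp = target.compareTo(elems[mid]);
                if (comp == 0) return mid;
                else if (comp < 0) hi = mid - 1;
                else lo = mid + 1;
            }
            return -(lo + 1);
        }
    }

    // Returns true if and only if target occurs in the B-tree rooted at root.
    static boolean search(Node root, String target) {
        Node curr = root;
        while (curr != null) {
            int pos = curr.searchInNode(target);
            if (pos >= 0) return true;
            curr = curr.childs[-(pos + 1)];   // null if curr is a leaf
        }
        return false;
    }

    static Node leaf(String... elems) {
        return new Node(elems, null);
    }

    // The 5-ary B-tree of the search examples, as far as the text describes it.
    static Node exampleTree() {
        Node[] leaves = {
            leaf("BE", "CA", "DE"),
            leaf("ES", "FR"),              // assumed; not stated in the text
            leaf("IT", "LU", "NL"),
            leaf("PT", "TR", "UK", "US")
        };
        return new Node(new String[] {"DK", "IS", "NO"}, leaves);
    }
}
```

With this tree, searching for NO, IT, and DE succeeds, while searching for SE ends at the null link to the left of TR and fails, as in examples (a) to (d).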
Figure 16.20 Illustration of insertions in an (initially empty) 5-ary B-tree (continued on next page).
16.2.3 Insertion
Now let us consider the problem of inserting a new element into a B-tree. We start with a
complete illustration.
Figure 16.20 illustrates the effects of successfully inserting country-codes into a 5-ary
B-tree. Initially, the B-tree is empty, i.e., its header contains a null link. Let us study this
illustration in detail.
(a) To insert US into the B-tree, we create a root node containing only US. (This root
node is also a leaf node.)
(b) To insert CA, UK, and FR into the B-tree, we insert these elements in the root node,
keeping it sorted.
(c) To insert IS into the B-tree, we first attempt to insert it in the root node. But this makes
the node overflow, since it now contains too many elements (CA, FR, IS, UK, US).
(See below left, where the overflowed node is highlighted.) So we identify the median
of these five elements (IS), and split the leaf node into a pair of siblings: one containing
the elements less than the median (CA, FR) and one containing the greater
elements (UK, US). We move the median itself up to the parent node. Since in
this case there is no parent node (we are splitting the root node), we must create a
new parent node and move the median up to it. (See below right.)
(d) To insert DK into the B-tree, we first look among the elements of the root node. Since
DK is less than IS, we proceed to the first child node. Since the latter is a leaf node, we
insert DK in it. Similarly, to insert NO, NL, and BE into the B-tree, we insert them in
the second, second, and first child nodes, respectively.
(e) To insert LU into the B-tree, we first look among the elements of the root node. Since
LU is greater than IS, we proceed to the second child node. Since the latter is a leaf
node, we attempt to insert LU in it. But this makes the leaf node overflow. (See below
left.) So we identify the median element (NO), and split the leaf node into a pair of
siblings: one containing the lesser elements (LU, NL) and one containing the greater
elements (UK, US). We move the median itself up to the parent node, which now
contains two elements (IS, NO). (See below right.)
(f) To insert PT into the B-tree, we straightforwardly insert it in the third child node.
(g) To insert GR into the B-tree, we attempt to insert it in the first child node. But this
makes the node overflow. So we split it into a pair of siblings, one containing BE and
CA, the other containing FR and GR, and we move the median DK up to the parent
node.
(h) To insert TR, IT, DE, and ES into the B-tree, we straightforwardly insert them in the
fourth, third, first, and second child nodes, respectively.
(i) To insert HU and CZ into the B-tree, we straightforwardly insert them in the second
and first child nodes, respectively.
(j) To insert PL into the B-tree, we attempt to insert it in the fourth child node. But this
makes the node overflow. So we split it into a pair of siblings, one containing PL and
PT, the other containing UK and US, and we move the median TR up to the parent
node.
(k) To insert IE into the B-tree, we attempt to insert it in the second child node. But this
makes the node overflow:
So we split this node into a pair of siblings, one containing ES and FR, the other containing
HU and IE, and we move the median GR up to the parent node. But this makes the
parent node itself overflow:
So we split the parent node likewise into a pair of siblings, one containing DK and GR,
the other containing NO and TR, and we move the median IS up to a newly-created
grandparent node, which we make the root node:
The ideas illustrated in Example 16.1 are captured by Algorithms 16.21 and 16.22. The
main algorithm searches the tree in much the same way as the B-tree search algorithm.
When the algorithm reaches a leaf node, it inserts the new element in that leaf node (step
4). If that leaf node has overflowed (i.e., it now has k elements), the auxiliary algorithm is
called to split that node (step 5).
The auxiliary algorithm splits a given node. It first determines the median of that
node's elements (step 1). Then it splits node curr into two siblings: left-sib takes the
elements less than the median, with their children (step 2); and right-sib takes the
elements greater than the median, with their children (step 3). Finally, the algorithm
moves the median itself, and its children left-sib and right-sib, up to the parent node. If
there is no parent node, the algorithm creates one (step 4.1). If a parent node already
exists, the algorithm inserts the median into the parent node (step 5.1), and then, if
necessary it splits the parent node by a recursive call to itself (step 5.2).
Program 16.23 shows a Java implementation of B-tree insertion. The insert and
splitNode methods belong to the Btree class. They closely follow Algorithms 16.21
and 16.22, respectively, with one important extra detail. Whenever insert moves
down the tree from parent to child, it stacks the parent node. Thereafter, if splitNode
has to move an element up from child to parent, it finds the parent node at the top of the
stack. Moreover, since insert has already searched the parent node in order to decide
which child link to follow, it also stacks the child link's position, for that is exactly where
splitNode must insert any element moved up from child to parent.
The second auxiliary method, insertInNode, belongs to the Btree.Node inner
class. It inserts a given element, with its children, at a given position in the node.
Note that splitNode creates two new nodes, leftSib and rightSib, and
discards the node that is being split. This makes the code elegant and readable. However,
it would be more efficient to create only one new node, and reuse the node that is being
split. (See Exercise 16.10.)
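The stacking idea can be sketched as follows. This is only an illustration under stated assumptions, not the book's Program 16.23: the class name BtreeInsertSketch, the use of lists instead of arrays, the linear scan within a node, and the loop that replaces the recursive splitNode call are all choices made here for brevity. As in the text, the sketch creates two new siblings on each split and discards the node being split.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of B-tree insertion with a stack of ancestors.
class BtreeInsertSketch {

    static class Node {
        List<String> elems = new ArrayList<>();
        List<Node> childs = new ArrayList<>();   // empty in a leaf node
        boolean isLeaf() { return childs.isEmpty(); }
    }

    final int arity;
    Node root;

    BtreeInsertSketch(int k) { arity = k; }

    void insert(String elem) {
        if (root == null) {
            root = new Node();
            root.elems.add(elem);
            return;
        }
        // Ancestors of curr, each paired with the child position followed.
        Deque<Node> ancestors = new ArrayDeque<>();
        Deque<Integer> positions = new ArrayDeque<>();
        Node curr = root;
        while (true) {
            int pos = 0;
            while (pos < curr.elems.size()
                    && elem.compareTo(curr.elems.get(pos)) > 0) pos++;
            if (pos < curr.elems.size() && elem.equals(curr.elems.get(pos)))
                return;                          // already present
            if (curr.isLeaf()) {
                curr.elems.add(pos, elem);
                break;
            }
            ancestors.push(curr);
            positions.push(pos);
            curr = curr.childs.get(pos);
        }
        // Split upwards for as long as the current node has overflowed.
        while (curr.elems.size() == arity) {
            int med = arity / 2;
            String median = curr.elems.get(med);
            Node leftSib = new Node(), rightSib = new Node();
            leftSib.elems.addAll(curr.elems.subList(0, med));
            rightSib.elems.addAll(curr.elems.subList(med + 1, arity));
            if (!curr.isLeaf()) {
                leftSib.childs.addAll(curr.childs.subList(0, med + 1));
                rightSib.childs.addAll(curr.childs.subList(med + 1, arity + 1));
            }
            if (ancestors.isEmpty()) {           // curr was the root node
                root = new Node();
                root.elems.add(median);
                root.childs.add(leftSib);
                root.childs.add(rightSib);
                return;
            }
            Node parent = ancestors.pop();
            int pos = positions.pop();           // where the median belongs
            parent.elems.add(pos, median);
            parent.childs.set(pos, leftSib);
            parent.childs.add(pos + 1, rightSib);
            curr = parent;
        }
    }
}
```

Inserting the twenty country-codes in the order used above (US, CA, UK, FR, IS, DK, NO, NL, BE, LU, PT, GR, TR, IT, DE, ES, HU, CZ, PL, IE) into a 5-ary tree built this way gives a root containing IS whose two children contain (DK, GR) and (NO, TR), matching step (k).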
Program 16.23 Implementation of the B-tree insertion algorithm as a Java method, with auxiliary
methods (in class Btree) (continued on next page).
16.2.4 Deletion
Now let us consider the problem of deleting an element from a B-tree. We start with a
complete illustration.
than 2 elements. (See below left, where the underflowed node is highlighted.) So we
must try to restock the underflowed node by moving an element from one of its
nearest siblings. The right sibling cannot lose any elements without itself underflowing.
Fortunately, the left sibling contains more than enough elements. So we move the
left sibling's rightmost element DE up to the parent node, replacing DK, and move DK
itself down to the underflowed node:
(d) To delete DK from the B-tree, we first search for it and find it in a leaf node. We
delete DK from the leaf node. But this causes the leaf node to underflow. Moreover,
we cannot restock the leaf node by moving an element from either of its left or right
siblings, without making that sibling underflow. However, we can coalesce the
underflowed node with either of its siblings. Suppose that we choose to coalesce it
with its right sibling. We move all that sibling's elements (LU, NL) into the underflowed
node, together with the intermediate element from the parent node (IT):
(If k is odd, the underflowed node and its sibling together contain exactly k - 2
elements. If k is even, they actually contain k - 3 elements, since in (k - 1)/2 we discard
the remainder.)
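The two restocking moves described in (c) and (d) can be sketched for leaf nodes as follows. This is a rough illustration only: the class and method names are hypothetical, lists stand in for the arrays of Program 16.15, child links of the siblings are ignored (so it applies to leaf nodes only), and the single remaining element of the underflowed node (ES) is an assumption, since the text does not state it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical leaf-level sketch of restocking an underflowed B-tree node.
class BtreeRestockSketch {

    static class Node {
        List<String> elems = new ArrayList<>();
        List<Node> childs = new ArrayList<>();   // empty in a leaf node
        Node(String... es) { elems.addAll(Arrays.asList(es)); }
    }

    // Minimum size of a non-root node in a k-ary B-tree (integer division).
    static int minSize(int k) {
        return (k - 1) / 2;
    }

    // Restock child p of parent from its left sibling: the parent's
    // intermediate element moves down, and the left sibling's rightmost
    // element moves up to replace it.
    static void restockFromLeft(Node parent, int p) {
        Node leftSib = parent.childs.get(p - 1);
        Node under = parent.childs.get(p);
        under.elems.add(0, parent.elems.get(p - 1));
        String rightmost = leftSib.elems.remove(leftSib.elems.size() - 1);
        parent.elems.set(p - 1, rightmost);
    }

    // Coalesce child p of parent with its right sibling: the intermediate
    // element of the parent and all the sibling's elements move into child p.
    static void coalesceWithRight(Node parent, int p) {
        Node under = parent.childs.get(p);
        Node rightSib = parent.childs.remove(p + 1);
        under.elems.add(parent.elems.remove(p));
        under.elems.addAll(rightSib.elems);
    }
}
```

For k = 5 the minimum size is 2, so an underflowed node (1 element), a minimal sibling (2 elements), and the intermediate parent element together make 4 = k - 1 elements, which is exactly the maximum size of a node. The coalesced node therefore never overflows.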
All these ideas are captured by Algorithms 16.25 and 16.26. The main algorithm
should be easy to understand. The auxiliary algorithm, which restocks an underflowed
node, breaks down into three cases. Step 1 deals with the case of an underflowed root
node, which contains no elements at all, and so can simply be discarded. Step 2 deals with
the case of an underflowed node that has a nearest sibling with a spare element. Step 3
deals with the case of an underflowed node that must be coalesced with a nearest sibling.
In the last case the parent node contracts and might itself underflow; if that happens, the
algorithm calls itself recursively to restock the parent node.
We leave the implementation of B-tree deletion to Exercise 16.11.
16.2.5 Analysis
Search
Let us analyze the B-tree search algorithm.
First consider searching a full k-ary B-tree, i.e., one in which every node has size k - 1.
From equation (16.6), this B-tree has size $n = k^{d+1} - 1$. Conversely, if we know n, we
can determine the B-tree's depth:
$d = \log_k(n + 1) - 1$
Algorithm 16.18 starts by visiting the root node, and each subsequent iteration visits a
child of the current node. We can immediately deduce that:
Maximum no. of nodes visited = d + 1
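As a quick numerical check of these formulas, the following sketch computes the size of a full k-ary B-tree from its depth and recovers the depth from the size. The class and method names are hypothetical, and the depth method is meant only for sizes of the form k^(d+1) - 1.

```java
// Hypothetical helper for checking the size and depth formulas of a full B-tree.
class BtreeDepthSketch {

    // Size of a full k-ary B-tree of depth d: n = k^(d+1) - 1.
    static long fullSize(int k, int d) {
        long n = 1;
        for (int i = 0; i <= d; i++) n *= k;
        return n - 1;
    }

    // Depth of a full k-ary B-tree of size n: d = log_k(n + 1) - 1,
    // computed by repeated integer division to avoid rounding errors.
    static int depth(int k, long n) {
        int d = -1;
        for (long m = n + 1; m > 1; m /= k) d++;
        return d;
    }

    // Maximum number of nodes visited by a search: d + 1.
    static int maxNodesVisited(int k, long n) {
        return depth(k, n) + 1;
    }
}
```

For example, a full 5-ary B-tree of depth 1 has 5^2 - 1 = 24 elements, and a search in it visits at most 2 nodes. With depth 2 it has 124 elements, and a search visits at most 3 nodes.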
Insertion
The B-tree insertion algorithm, by the same reasoning, needs about $\log_2 n$ comparisons to
find the node where the new element will be inserted. There is the added complication of