
2-3-4 Trees

See section 3.3.2 in the GT text. A 2-3-4 tree (also called a (2,4) tree) is similar to a BST in that it stores values in the internal nodes and has a property relating the values stored in a subtree to the values in the parent node. It differs from a BST because:

Each internal node has a size property, i.e., an internal node can have 2, 3 or 4 children.

The tree has a depth property, i.e., all external nodes must be at the same depth.

Example.
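[Figure: example 2-3-4 tree. Root (17); its children are (4,9) and (20,30,41). The children of (4,9) are (1,2,3), (7,8) and (12); the children of (20,30,41) are (18), (24), (33) and (56,80).]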

Consider the node (4,9): it contains two values and has three children (the nodes (1,2,3), (7,8), and (12)).

Q: What are the possible number of values an internal node of a 2-3-4 tree may contain?

1, 2, or 3. A node has one more child than it has values, and the number of children is 2, 3, or 4.

Notice the property relating the values in a subtree to the values in the parent node of its root. For example, consider the subtree rooted at node (20,30,41). All values in this subtree must be greater than 17. Now consider the subtree rooted at node (7,8). All values in this subtree must be greater than 4 and less than 9.

Q: What range of values would be allowed in any subtree rooted at node (33)? Any value greater than 30 and less than 41, i.e., between 31 and 40 inclusive for integer keys.

Q: If we were to insert the value 15 into the tree above, where would it need to go to preserve the order property? In the same node as (12), giving the node (12,15).

NOTE: This is formally defined in section 3.3.1 (Multi-way Search Trees) in the GT text.


Let's formalize our intuition by introducing some notation. A node with $d$ children is called a $d$-node. The values stored at the node are labelled $k_1, k_2, \ldots, k_{d-1}$ and the children are labelled $v_1, v_2, \ldots, v_d$. The ordering property states that $k_1 < k_2 < \cdots < k_{d-1}$ and that every value $k$ stored in the subtree rooted at $v_i$ satisfies $k_{i-1} < k < k_i$. We use the convention that $k_0 = -\infty$ and $k_d = +\infty$.

Example. Notice that in the example tree we have omitted the empty leaves. We could have drawn leaves (external nodes) below each of the nodes (1,2,3), (7,8), (12), (18), (24), (33) and (56,80).
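The size, ordering and depth properties can be checked mechanically. The following is a minimal Python sketch, not code from the text. It assumes a hypothetical `Node` class that holds a sorted list of keys and a list of children, where an empty `children` list stands for the omitted empty leaves. The function name `check` is likewise illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A 2-3-4 tree node (hypothetical representation): 1-3 sorted keys.
    An empty children list means all of the node's children are empty leaves."""
    keys: list[int]
    children: list["Node"] = field(default_factory=list)


def _require(ok: bool, message: str) -> None:
    if not ok:
        raise ValueError(message)


def check(n: Node, lo: float = float("-inf"), hi: float = float("inf")) -> int:
    """Check the size, ordering and depth properties of the subtree rooted at n,
    whose keys must lie strictly between lo and hi. Returns the number of levels
    from n down to its (implicit) external nodes."""
    d = len(n.keys) + 1                            # n is a d-node
    _require(2 <= d <= 4, "size property violated")
    bounds = [lo, *n.keys, hi]                     # k_0, k_1, ..., k_{d-1}, k_d
    _require(all(a < b for a, b in zip(bounds, bounds[1:])),
             "ordering property violated")
    if not n.children:                             # all d children are empty leaves
        return 1
    _require(len(n.children) == d, "a d-node must have d children")
    # Every key in the subtree rooted at v_i must satisfy k_{i-1} < k < k_i.
    depths = {check(v, bounds[i], bounds[i + 1]) for i, v in enumerate(n.children)}
    _require(len(depths) == 1, "depth property violated")
    return depths.pop() + 1


# The example tree from these notes.
example = Node([17], [
    Node([4, 9], [Node([1, 2, 3]), Node([7, 8]), Node([12])]),
    Node([20, 30, 41], [Node([18]), Node([24]), Node([33]), Node([56, 80])]),
])
print(check(example))  # 3: every external node is 3 levels below the root
```

Raising `ValueError` instead of using `assert` keeps the checks active even when Python runs with `-O`.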

Q: Why might we need these empty leaf nodes?

We need to consider these empty leaf nodes for the size property (when e

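Q: Given the two properties, size and depth, what can we say about the height of a 2-3-4 tree which stores n items?

Max: the same as a complete binary tree, since every internal node must have at least 2 children and all external nodes are at the same depth. Min: every node has at most 4 children. More precisely, a tree storing n items has n + 1 external nodes, all at the same depth h, so $2^h \le n + 1 \le 4^h$, i.e., $\log_4(n+1) \le h \le \log_2(n+1)$. Either way, $h = \Theta(\log n)$.

Q: How are the operations INSERT, DELETE, and SEARCH different for a 2-3-4 tree than for a BST?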

Q: In what order would the nodes be examined if you were searching for the key 7? (17), (4,9), (7,8).

Q: Which keys are evaluated at each node and what questions are asked? =17? no; <17? yes. =4? no; <4? no; =9? no; <9? yes. =7? yes.

Q: What about searching for key = 11?


More formally: a d-node has values $k_1, k_2, \ldots, k_{d-1}$ and children $v_1, v_2, \ldots, v_d$. The ordering property states that $k_1 < k_2 < \cdots < k_{d-1}$ and every value $k$ stored in the subtree rooted at $v_i$ satisfies $k_{i-1} < k < k_i$.

Starting with node n, which is a d-node, we are searching for key k.

SEARCH(n,k):
1   if n is a leaf,               \\(k is not in the tree.)
2       return NIL
3   for i = 1 to d-1
4       if k = k_i
5           return n              \\(k is stored in n as k_i.)
6       if k < k_i
7           return SEARCH(v_i,k)
8   end for
9   return SEARCH(v_d, k)
10  end SEARCH

Trace through this algorithm on the same keys we searched for earlier.
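Here is one way to run that trace in Python. It is a sketch under assumptions. It uses a hypothetical `Node` representation (sorted `keys`, with `children == []` standing for empty leaves) and follows the pseudocode line by line. It returns the node holding k, or `None` for NIL, and records each visited node in a `trace` list, so the order of examination can be compared with the answers above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]                                        # k_1 < ... < k_{d-1}
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves


def child(n: Node, i: int) -> Optional[Node]:
    """Child v_{i+1} of n (0-based i), or None for an empty leaf."""
    return n.children[i] if n.children else None


def search(n: Optional[Node], k: int, trace: list) -> Optional[Node]:
    """SEARCH(n, k): the node storing k, or None (NIL) if k is not in the tree."""
    if n is None:                                   # lines 1-2: n is a leaf
        return None
    trace.append(tuple(n.keys))
    for i, k_i in enumerate(n.keys):                # line 3: i = 1 to d-1
        if k == k_i:                                # lines 4-5
            return n
        if k < k_i:                                 # lines 6-7: go to v_i
            return search(child(n, i), k, trace)
    return search(child(n, len(n.keys)), k, trace)  # line 9: go to v_d


example = Node([17], [
    Node([4, 9], [Node([1, 2, 3]), Node([7, 8]), Node([12])]),
    Node([20, 30, 41], [Node([18]), Node([24]), Node([33]), Node([56, 80])]),
])

for key in (7, 11):
    path: list = []
    found = search(example, key, path)
    print(key, path, found.keys if found else None)
# 7 [(17,), (4, 9), (7, 8)] [7, 8]
# 11 [(17,), (4, 9), (12,)] None
```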


2-3-4 Trees Insertion


Consider inserting 6 into the above tree. First we search to determine whether 6 is already in the tree.

Q: If 6 is not in the tree, what does our SEARCH routine return? NIL.

Q: Given that 6 is not in the tree and we may want to insert it into the tree, what would be a more useful return value? The parent of where 6 should go, i.e., (7,8).

Q: What changes do we need to make to the search method? Replace the return NIL of lines 1-2 with return parent(n). Now SEARCH returns the destination node into which we must insert the new element.

Q: How do we insert a key into a 2-node or a 3-node?

Put the new element into the correct position so that the order property still holds.
[Figure: the tree after inserting 6. It is the example tree with the node (7,8) replaced by (6,7,8).]


Q: Do we need to worry about the sub-trees if we are shifting the keys?

Q: What happens if the destination node is a 4-node? We will violate the size property.

Example. Suppose we insert 6 and then try to insert 5. We get the node (5,6,7,8) and a size violation. We resolve the problem by doing a SPLIT: select the median of the four keys (we'll round up), so key 7. Now we create two child nodes, the node (5,6) and the node (8), and give them the parent (7). We insert this subtree into the node (4,9), which was the original parent of (6,7,8).

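Q: What does the final tree look like after the insertion?

[Figure: the resulting tree. Root (17); its children are (4,7,9) and (20,30,41). The children of (4,7,9) are (1,2,3), (5,6), (8) and (12); the children of (20,30,41) are (18), (24), (33) and (56,80).]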

Formally: suppose an insertion results in an overflowing node n (a 5-node) with keys $k_1, k_2, k_3, k_4$ and subtrees $v_0, v_1, v_2, v_3, v_4$. Let p represent the parent node of n, and assume for now that n is not the root. Let i be the position in p where n is the subtree ($v_i = n$). A split occurs which creates a 3-node $s_1$ with keys $k_1, k_2$ and subtrees $v_0, v_1, v_2$, and a 2-node $s_2$ with key $k_4$ and subtrees $v_3, v_4$.

Reorganize the original tree by:
1. Inserting $k_3$ into p as its key $k_i$, shifting the higher keys in p to the right.
2. Assigning $s_1$ and $s_2$ as p's subtrees $v_i$ and $v_{i+1}$.
3. Leaving the original subtrees $v_j$ of p, for $j = 1, \ldots, i-1$, unchanged.
4. Shifting each original subtree $v_j$ of p, for $j > i$, one position to the right (it becomes $v_{j+1}$).

Q: What problem may occur?

This may result in p itself becoming a 5-node: if p was already a 4-node, then p must be split in the same way.

Q: How far can this go? This could propagate all the way to the root.

Q: How do we take care of this case?

In this case there is no node p into which we could insert $k_3$, so we create a new root containing only $k_3$, with $s_1$ and $s_2$ as its two children.
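Putting the insertion steps together, the following Python sketch inserts a key and splits overflowing nodes bottom-up, following the formal description above: $s_1 = (k_1, k_2)$, $k_3$ is promoted, and $s_2 = (k_4)$. It is an illustration under assumptions, not the text's code. The `Node` representation (sorted `keys`, `children == []` for empty leaves) and the names `split`, `_insert` and `insert` are hypothetical, and duplicate keys are simply ignored.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves


def split(n: Node) -> tuple[Node, int, Node]:
    """Split an overflowing node (keys k1..k4, subtrees v0..v4 or none)."""
    k1, k2, k3, k4 = n.keys
    s1 = Node([k1, k2], n.children[:3])        # subtrees v0, v1, v2
    s2 = Node([k4], n.children[3:])            # subtrees v3, v4
    return s1, k3, s2


def _insert(n: Node, k: int) -> Optional[tuple[Node, int, Node]]:
    """Insert k into the subtree rooted at n; return split(n) if n overflowed."""
    i = bisect_left(n.keys, k)                 # k belongs in child i (0-based)
    if i < len(n.keys) and n.keys[i] == k:
        return None                            # already present: nothing to do
    if not n.children:                         # n is the destination node
        n.keys.insert(i, k)
    else:
        result = _insert(n.children[i], k)
        if result is None:
            return None
        s1, k3, s2 = result
        n.keys.insert(i, k3)                   # k3 goes in, higher keys shift right
        n.children[i:i + 1] = [s1, s2]         # s1, s2 replace the split child
    return split(n) if len(n.keys) == 4 else None


def insert(root: Node, k: int) -> Node:
    """Insert k and return the root, which is new if the old root was split."""
    result = _insert(root, k)
    if result is None:
        return root
    s1, k3, s2 = result
    return Node([k3], [s1, s2])


example = Node([17], [
    Node([4, 9], [Node([1, 2, 3]), Node([7, 8]), Node([12])]),
    Node([20, 30, 41], [Node([18]), Node([24]), Node([33]), Node([56, 80])]),
])
tree = insert(insert(example, 6), 5)
print(tree.children[0].keys, [c.keys for c in tree.children[0].children])
# [4, 7, 9] [[1, 2, 3], [5, 6], [8], [12]]
```

Returning the split pieces to the caller lets each level decide whether it overflows in turn. That is how a split can propagate up to the root, where `insert` creates the new root.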


2-3-4 Trees Deletion


Consider the following 2-3-4 tree:
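[Figure: the tree from the end of the insertion example. Root (17); its children are (4,7,9) and (20,30,41). The children of (4,7,9) are (1,2,3), (5,6), (8) and (12); the children of (20,30,41) are (18), (24), (33) and (56,80).]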

Q: How would the tree change if we deleted the element with key 2? Simply replace the 4-node (1,2,3) by the 3-node (1,3).

Q: What would be the problem if we deleted the element 4 in the same way? The node (4,7,9) would be left with too many children for the number of keys.

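Using the same approach as for BSTs, we find the predecessor of 4 and use it to replace 4. So, in this case:

We take 4's predecessor, 3, and delete it, replacing 4 with 3. The predecessor of a key stored in an internal node is always stored in a node whose children are all empty leaves. Therefore, we only really consider the case where we are deleting an element from such a bottom-level node.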


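Q: Which tree property is violated if we delete the element with key 8 from the above tree? The size property: the node would be left with no keys and a single (empty) child, i.e., too few children instead of too many. This problem is called UNDERFLOW.

Q: How might we solve this problem when we delete 8? We solve this problem by BORROWING a key from a sibling. In this case the sibling (5,6) has 2 keys, so it can spare one. Suppose we shifted 6 over to replace the node 8. What property would we violate? We would violate the ordering property, since 6 would then lie to the right of the parent key 7.

Solution? ROTATE the key through the parent: 6 moves up into the parent, replacing 7, and 7 moves down into the empty node. The parent becomes (4,6,9) with children (1,2,3), (5), (7) and (12).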

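Notice that this could happen from either sibling. If we were to delete 33, we could rotate 41, 56 and 33: 56 moves up from the right sibling (56,80) into the parent, replacing 41, and 41 moves down to replace 33.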
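The rotation can be written as a short Python sketch. It assumes a hypothetical `Node` representation (sorted `keys`, with `children == []` for empty leaves) and assumes that the underflowed child `p.children[i]` has already lost its key. `borrow` is an illustrative name. The sketch tries the left sibling first, as in the example with 8, and then the right sibling, as in the example with 33. It reports `False` when neither adjacent sibling can spare a key.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves


def borrow(p: Node, i: int) -> bool:
    """Fix the empty child p.children[i] by rotating a key through p from an
    adjacent sibling with at least 2 keys. Returns False if neither can."""
    u = p.children[i]
    if i > 0 and len(p.children[i - 1].keys) >= 2:           # left sibling
        s = p.children[i - 1]
        u.keys.insert(0, p.keys[i - 1])                      # parent key moves down
        p.keys[i - 1] = s.keys.pop()                         # sibling's largest moves up
        if s.children:
            u.children.insert(0, s.children.pop())           # its rightmost subtree follows
        return True
    if i + 1 < len(p.children) and len(p.children[i + 1].keys) >= 2:  # right sibling
        s = p.children[i + 1]
        u.keys.append(p.keys[i])                             # parent key moves down
        p.keys[i] = s.keys.pop(0)                            # sibling's smallest moves up
        if s.children:
            u.children.append(s.children.pop(0))             # its leftmost subtree follows
        return True
    return False


# Deleting 8 from the node (8) under (4,7,9) leaves that child empty; (5,6) lends a key.
p = Node([4, 7, 9], [Node([1, 2, 3]), Node([5, 6]), Node([]), Node([12])])
print(borrow(p, 2), p.keys, [c.keys for c in p.children])
# True [4, 6, 9] [[1, 2, 3], [5], [7], [12]]
```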

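Q: Will this work if we want to delete 24? Q: Who will rotate, or why won't it work? Both siblings of (24), namely (18) and (33), are 2-nodes, so neither can spare a key.

Solution: In this case we MERGE. We combine the node (24) with one of its 2-node siblings and the key in the parent that separates them.

In this case, one choice is to combine the node (18), the key 20 and the node (24). We delete 24 from this combined node and attach the result, (18,20), as a subtree of the parent, which becomes (30,41). Here is the resulting tree: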

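[Figure: the resulting tree. Root (17); its children are (4,7,9) and (30,41). The children of (4,7,9) are (1,2,3), (5,6), (8) and (12); the children of (30,41) are (18,20), (33) and (56,80).]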

Q: What problem could this cause? This could cause the parent p to underflow. Solution? When this occurs, we resolve the underflow by borrowing from, or merging with, a sibling of p in the same way. This can propagate up the tree.

New Example.
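[Figure: root (17); its children are (7) and (33). The children of (7) are (1,2,3) and (12); the children of (33) are (18) and (80).]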

Delete 18 from this tree.


18 can't borrow from 80, because (80) is already a 2-node. 18 can't borrow from (12) or (1,2,3), because they are not siblings. So 18 is deleted, (80) merges with the parent key 33 to form (33,80), and the parent, which held only 33, now underflows.
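[Figure, before and after: on the left, the tree above; on the right, the root (17) with children (7) and ??, where (7) has children (1,2,3) and (12), and the empty node ?? has the single child (33,80).]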

Now ?? could borrow from node (7) if it were a 3-node or 4-node, but because it is a 2-node, (7) is merged with the parent key 17, resulting in:
[Figure: the empty root ?? with the single child (7,17), whose children are (1,2,3), (12) and (33,80).]

Since 17 came from the root, which was a 2-node, the root now underflows (it is the empty node ?? above). Because it is the root, it is simply removed from the tree, and (7,17) becomes the new root.
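MERGE, including the cascade in the example just traced, can be sketched in Python as follows. This version is illustrative only. `merge` is a hypothetical name, the `Node` representation (sorted `keys`, `children == []` for empty leaves) is an assumption, and the caller is assumed to have already found that neither adjacent sibling can lend a key. The usage replays the deletion of 18: two merges cascade up to the root, which is then left with no keys and is discarded.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves


def merge(p: Node, i: int) -> None:
    """Merge the empty child p.children[i] with an adjacent 2-node sibling,
    pulling down the key of p that separates them. p itself may now underflow."""
    u = p.children[i]
    if i > 0:                                    # merge with the left sibling
        s = p.children[i - 1]
        s.keys.append(p.keys.pop(i - 1))
        s.children.extend(u.children)            # u's only subtree, if any
    else:                                        # merge with the right sibling
        s = p.children[1]
        s.keys.insert(0, p.keys.pop(0))
        s.children[:0] = u.children
    del p.children[i]                            # the empty node disappears


# Deleting 18: (18) becomes empty, and its sibling (80) is a 2-node.
root = Node([17], [
    Node([7], [Node([1, 2, 3]), Node([12])]),
    Node([33], [Node([]), Node([80])]),
])
merge(root.children[1], 0)     # gives (33,80); the node that held 33 is now empty
merge(root, 1)                 # (7) cannot lend a key, so merge it with 17
if not root.keys:              # the root underflowed: drop it
    root = root.children[0]
print(root.keys, [c.keys for c in root.children])
# [7, 17] [[1, 2, 3], [12], [33, 80]]
```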

Relating 2-3-4 Trees to Red-Black Trees


Red-black trees are another type of balanced search tree. They are binary trees where every node has an additional property: each node is coloured either red or black. The following properties must hold:
1. The root of the tree is black.
2. Every external node is black.
3. The children of a red node are black.
4. All external nodes have the same black depth (the same number of black ancestors).
[See the text for detailed discussions about red-black trees.]

We can understand red-black trees by relating them to 2-3-4 trees. Consider the definition of red-black trees. A black node can have red children or black children. All red nodes have only black children.

Q: How can we relate black and red nodes to 2-, 3- or 4-nodes in a 2-3-4 tree?

Black nodes with no red children become 2-nodes. Black nodes with 1 red child become 3-nodes, and black nodes with 2 red children become 4-nodes: each red node is merged into its black parent.
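The correspondence can be made concrete by converting a red-black tree into a 2-3-4 tree. The following Python sketch is an illustration under assumptions. The `RB` and `Node` classes and the function `to_234` are hypothetical, `None` stands for a black external node, and the input is assumed to already satisfy the red-black properties. Each black node absorbs its red children and their keys, which gives exactly the 2-, 3- and 4-node mapping above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RB:
    key: int
    red: bool = False
    left: Optional["RB"] = None          # None = black external node
    right: Optional["RB"] = None


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves


def to_234(b: Optional[RB]) -> Optional[Node]:
    """Convert the red-black subtree rooted at black node b into a 2-3-4 tree."""
    if b is None:
        return None
    keys: list[int] = []
    subtrees: list[Optional[RB]] = []    # black roots of the new node's subtrees

    def absorb(x: Optional[RB]) -> None:
        if x is not None and x.red:      # a red child merges into b's node
            subtrees.append(x.left)      # its children are black (property 3)
            keys.append(x.key)
            subtrees.append(x.right)
        else:                            # a black child starts a new 2-3-4 node
            subtrees.append(x)

    absorb(b.left)                       # keys are collected in sorted order
    keys.append(b.key)
    absorb(b.right)
    children = [to_234(s) for s in subtrees]
    return Node(keys, [] if all(c is None for c in children) else children)


# A red-black tree equivalent to the example 2-3-4 tree in these notes.
rb = RB(17, False,
        RB(9, False,
           RB(4, True, RB(2, False, RB(1, True), RB(3, True)),
                       RB(7, False, None, RB(8, True))),
           RB(12)),
        RB(30, False,
           RB(20, True, RB(18), RB(24)),
           RB(41, True, RB(33), RB(56, False, None, RB(80, True)))))
t = to_234(rb)
print(t.keys, [c.keys for c in t.children])   # [17] [[4, 9], [20, 30, 41]]
```

Property 4 (equal black depth) is what guarantees the resulting 2-3-4 tree satisfies the depth property.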

B-trees
B-trees are a generalization of 2-3-4 trees. B-trees are multiway trees with all leaves at the same level and a varying number of children per node. A B-tree node can hold at most m keys and pointers to m + 1 children. More formally, the following properties must hold in a B-tree of order m:
1. The root must hold at least 1 key and at most m keys.
2. Each non-root node must hold between $\lfloor m/2 \rfloor$ keys and m keys.
3. All leaves must be at the same level.

Q: How does a 2-3-4 tree relate to a B-tree? A: A 2-3-4 tree is a B-tree of order 3.

Q: Consider keeping a search tree on disk. How many disk accesses would it take to search a tree of height h? A: O(h).

Q: Why is this so important? This is important because each disk access is very, very slow relative to a memory access.

When the OS reads from disk, it reads a minimum of 1 block of data from the disk. Why might this be inefficient for a 2-3-4 tree?

If all you want is one 2-node from a 2-3-4 tree (possibly 12 bytes), the rest of the block that was read is wasted. The disk also has to spin to the correct sector, move the head to the correct track, etc.

Q: How might one design the tree better to take advantage of the fact that disk access is expensive, but once we do the access we read at least one full block of data? A: Make the nodes of the B-tree as large as we can without exceeding 1 block.

Therefore, set m so that the total number of bytes in a node is as close to a block size as possible without exceeding it. Similar to 2-3-4 trees, B-trees have insertion and deletion algorithms that SPLIT and MERGE as necessary to maintain the properties.
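As a back-of-the-envelope sketch of choosing m: if a node stores only m keys and m + 1 child pointers, then m is the largest value with m·(key size) + (m + 1)·(pointer size) ≤ block size. The Python function `btree_order` and the figures of a 4096-byte block with 8-byte keys and pointers are hypothetical, chosen for illustration. Real nodes also carry headers and often the records (or record pointers) themselves.

```python
def btree_order(block_bytes: int, key_bytes: int, ptr_bytes: int) -> int:
    """Largest m such that m keys and m + 1 child pointers fit in one block."""
    # m * key_bytes + (m + 1) * ptr_bytes <= block_bytes
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


m = btree_order(4096, 8, 8)
print(m)   # 255: 255 keys and 256 pointers use 4088 of the 4096 bytes
```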


Augmenting Data Structures


[Not in the G&T text. In CLRS chapter 14.] A 2-3-4 tree by itself is not very useful. To support more useful queries we need more structure.

General Definition: An augmented data structure is simply an existing data structure modified to store additional information and/or perform additional operations.

Example: We want a data structure that will allow us to answer two types of rank queries on sets of values, in addition to the standard operations for maintaining the set (INSERT, DELETE, SEARCH):

RANK(k): Given a key k, what is its rank, i.e., its position among the elements in the set?
SELECT(r): Given a rank r, which key has this rank?

For example, if our set of values is {3, 15, 27, 30, 56}, then RANK(15) = 2 and SELECT(4) = 30.

Let's look at 3 different ways we could do this.

1. Use 2-3-4 trees without modification.
Q: What will be the time for a query? Q: Will the other operations (SEARCH/INSERT/DELETE) take any longer? No.
Queries: Simply do an inorder traversal of the tree, keeping track of the number of keys visited so far. This takes O(n) time in the worst case, because we may have to visit every node.
Q: What is the problem? Could we do better? Queries are very inefficient.

2. Augment 2-3-4 trees so that each node has an additional field rank, storing the rank of each of its keys.
Q: What will be the time for a query? For RANK(k), the same as SEARCH, or O(log n). For SELECT(r), we can search the tree using the stored ranks in place of the keys: O(log n).
Q: Will the other operations (SEARCH/INSERT/DELETE) take any longer?
INSERT will now require the update of the rank field. How quickly can this be done? Inserting a key increases the rank of every larger key, so in the worst case (inserting a new minimum) all n rank fields change, which takes O(n) time. DELETE has the same problem.
Q: What is the problem? Could we do better? Still inefficient for INSERT and DELETE.

3. Augment the tree in a more sophisticated way.

Before proceeding with this example, let's define what we mean by augmenting a data structure.

Generally, a data structure is augmented by doing the following:
1. Determine additional information that needs to be maintained.
2. Check that the additional information can be maintained during each of the original operations.
3. Implement the new operations.

Back to our example: 3. Augment the tree in a more sophisticated way. Q: How can we augment the nodes of 2-3-4 trees so that we can perform our queries faster? Q: What would help us with questions about rank?

Augment each node x so that it has an additional field size[x] that stores the number of keys in the subtree rooted at x (including the keys in x itself).

Q: How is this related to rank? Suppose we have a 2-node with a single element x. What is rank(x) in terms of the keys that come before x in the tree? RANK(x) = 1 + # keys that come before x in the tree.

Q: Now, with respect to the subtree rooted at x's node, what is the relative RANK(x)? RANK(x) = SIZE(v_1) + 1, where v_1 is the left child.

Q: Now suppose we have a 4-node X with elements $x_1, x_2, x_3$. What is the relative rank of $x_i$ with respect to the subtree rooted at X?

$$\text{RELATIVE RANK}(x_i) = 1 + \#\{\text{elements that precede } x_i \text{ in the subtree}\} = 1 + \sum_{j=1}^{i} \mathrm{SIZE}(v_j) + (i-1) = \sum_{j=1}^{i} \bigl(\mathrm{SIZE}(v_j) + 1\bigr).$$

So the rank of a key is related to the sizes of the subtrees rooted at neighbouring nodes.

Let's look at rank queries more closely.

Computing RANK(k): Given key k, do a SEARCH(k), keeping track of the rank r of the current node. Each time you go down a level, you must add the sizes of the subtrees to the left of the subtree you descend into, plus 1 for each key to its left. Think of this as the relative rank of the key immediately to the left of the subtree you descend into.

Recall the SEARCH algorithm for 2-3-4 trees:

SEARCH(n,k):
1   if n is a leaf, return NIL.   \\(k is not in the tree.)
2   for i = 1 to d-1
3       if k = k_i
4           return n
5       if k < k_i
6           return SEARCH(v_i,k)
7   end for
8   return SEARCH(v_d, k)
9   end SEARCH

Q: When we recursively call SEARCH(v_i,k), what do we add to r? SIZE(v_j) + 1 for each j < i.


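Q: When we find x, how do we determine its true rank? If x = k_i in its node, take the current rank so far, r, add SIZE(v_j) + 1 for each key k_j (j < i) to the left of x in that node, and then add SIZE(v_i) + 1, i.e., the size of x's left child plus 1 for x itself. For a 2-node this is just r + SIZE(v_1) + 1.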
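A Python sketch of RANK on a size-augmented 2-3-4 tree follows. It is illustrative, not the text's code. The `Node` class with a `size` field is hypothetical, as are the helpers `annotate` (which fills in every size[x] once, bottom-up) and `sub_size`. A missing key is reported as rank 0, which is just one simple way to handle that degenerate case. Here r is accumulated key by key while scanning a node, which adds up to the same total as adding everything at the moment of descent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves
    size: int = 0                                          # keys in this subtree


def annotate(n: Node) -> int:
    """Set size[x] for every node x in the subtree rooted at n; return size[n]."""
    n.size = len(n.keys) + sum(annotate(c) for c in n.children)
    return n.size


def sub_size(n: Node, j: int) -> int:
    """SIZE(v_{j+1}) for 0-based j; an empty leaf has size 0."""
    return n.children[j].size if n.children else 0


def rank(n: Node, k: int) -> int:
    """RANK(k): search for k while accumulating r. Returns 0 if k is absent."""
    r = 0
    while True:
        for i, k_i in enumerate(n.keys):
            if k == k_i:                     # r already covers v_1..v_i-1 and their keys
                return r + sub_size(n, i) + 1
            if k < k_i:
                break                        # descend into this child
            r += sub_size(n, i) + 1          # SIZE(v_j) + 1 for each j < i
        else:
            i = len(n.keys)                  # k exceeds every key: descend into v_d
        if not n.children:
            return 0                         # reached an empty leaf
        n = n.children[i]


example = Node([17], [
    Node([4, 9], [Node([1, 2, 3]), Node([7, 8]), Node([12])]),
    Node([20, 30, 41], [Node([18]), Node([24]), Node([33]), Node([56, 80])]),
])
annotate(example)
print(rank(example, 7), rank(example, 33), rank(example, 11))   # 5 14 0
```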

Note that we did not deal with degenerate cases (such as when k does not appear in the tree).

Finding the key with rank r. SELECT(r): Given rank r, start at x = root[T] and work down.

Let S be the left-most child of x that has not yet been explored. Compare r to size[S] + 1.

If they are equal, return the key immediately to the right of the pointer to S.
If r < size[S] + 1, we know that the element we are looking for is in S, so call the routine recursively on S.
If r > size[S] + 1, then we know the element we are looking for is in one of S's right siblings, and that its relative rank among the remaining elements (ignoring S and the key to its right) is r - (size[S] + 1). So we change r accordingly and move on to the next sibling of S to the right.
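SELECT can be sketched in Python in the same style. Again this is a hypothetical illustration. `Node` (with its `size` field), `annotate` and `sub_size` are assumed helpers, defined here so the block runs on its own. An out-of-range r raises `ValueError` instead of being left undefined.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves
    size: int = 0                                          # keys in this subtree


def annotate(n: Node) -> int:
    """Set size[x] for every node x in the subtree rooted at n; return size[n]."""
    n.size = len(n.keys) + sum(annotate(c) for c in n.children)
    return n.size


def sub_size(n: Node, j: int) -> int:
    """size of the (j+1)-th child (0-based j); an empty leaf has size 0."""
    return n.children[j].size if n.children else 0


def select(n: Node, r: int) -> int:
    """SELECT(r): the key of rank r (1-based) in the tree rooted at n."""
    if not 1 <= r <= n.size:
        raise ValueError("rank out of range")
    while True:
        for i, k_i in enumerate(n.keys):
            s = sub_size(n, i)               # size[S], S = leftmost unexplored child
            if r == s + 1:
                return k_i                   # the key just right of S's pointer
            if r < s + 1:
                break                        # the answer is inside S
            r -= s + 1                       # skip S and the key to its right
        else:
            i = len(n.keys)                  # the answer is in the last child
        n = n.children[i]


example = Node([17], [
    Node([4, 9], [Node([1, 2, 3]), Node([7, 8]), Node([12])]),
    Node([20, 30, 41], [Node([18]), Node([24]), Node([33]), Node([56, 80])]),
])
annotate(example)
print(select(example, 4), select(example, 9), select(example, 17))   # 4 17 80
```

While 1 ≤ r ≤ size of the current subtree, the loop never descends into an empty leaf. This is why checking the range once at the root is enough.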

Once again, we did not deal with degenerate cases (such as when r is out of range).

Q: What will be the complexity for a rank query? The same as SEARCH, i.e., O(log n).

Q: What about the updates, INSERT and DELETE? These operations consist of two phases for 2-3-4 trees: the operation itself, followed by the fix-up process. We'll look at the operation phase first, and check the fix-up process afterwards.

Operations:
INSERT(x): Simply increment the size of the subtree rooted at every node on the path from the root down to the node where x is inserted.
DELETE(x): Consider the element y that is actually removed by the operation (so y = x or y = predecessor(x)). What do we know about the sizes of the subtrees rooted at every node on the path from the root down to y? Each decreases by 1, so we simply traverse that path and decrement the size of every node on it.

Fix-up Processes:

SPLITS: Consider splitting a node and promoting a key. To find the sizes of the new nodes, we will need to check the sizes of each of their children.

[Figure: the overflowing node S = (K, L, M, N) with children X, Y, Z, V, W is split. M is promoted into the parent of S, T = (K, L) keeps the children X, Y, Z, and U = (N) gets the children V, W.]

where size(U) = size(V) + size(W) + 1 and size(T) = size(S) - size(U) - 1 = size(X) + size(Y) + size(Z) + 2, and the size of the parent node of S is unaffected. Remember that size means the total number of elements stored at that node and below. However, if S was the root and did not have a parent, then a new root will be created with key M, and it will have size = size(S).
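The size bookkeeping for a split can be sketched in Python using the names from the figure (S, T, U, M). This is a hypothetical illustration. The `Node` class with a `size` field and the `annotate` helper are assumptions, and `annotate` is used only to set up and cross-check sizes. Inside `split`, only the two O(1) formulas above are used.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves
    size: int = 0                                          # keys in this subtree


def annotate(n: Node) -> int:
    """Recompute size[x] for a whole subtree (setup and cross-checking only)."""
    n.size = len(n.keys) + sum(annotate(c) for c in n.children)
    return n.size


def split(S: Node) -> tuple[Node, int, Node]:
    """Split S = (K, L, M, N) with children X, Y, Z, V, W (or none) into
    T = (K, L), the promoted key M, and U = (N), in O(1) time."""
    K, L, M, N = S.keys
    U = Node([N], S.children[3:])
    U.size = sum(c.size for c in U.children) + 1    # size(V) + size(W) + 1
    T = Node([K, L], S.children[:3])
    T.size = S.size - U.size - 1                    # = size(X)+size(Y)+size(Z)+2
    return T, M, U


S = Node([10, 20, 30, 40],
         [Node([5]), Node([15, 16]), Node([25]), Node([35]), Node([45, 46, 47])])
annotate(S)                                         # size(S) = 12
T, M, U = split(S)
new_root = Node([M], [T, U], S.size)                # only needed if S was the root
print(T.size, M, U.size)                            # 6 30 5
```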


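BORROW & ROTATE: Recall that a deletion can leave a 1-node with no key and only one child, which requires us to borrow from a sibling. In the picture below, ?? is the key which has been deleted (demoted).

[Figure: before the rotation, the parent key M separates the left sibling (K, L), with children s1, s2, s3, from the empty node ??, with the single child s4. After the rotation, the parent key is L, the left sibling is (K) with children s1, s2, and the former empty node is (M) with children s3, s4.]

The only size fields that have changed are those of the nodes containing K and ??. The size of the node containing K has decreased by size(s3) + 1, and the size of the node containing ?? has increased by size(s3) + 1.

Note that the node (K, L) must be at least a 3-node, but it could also be a 4-node.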
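Here is a Python sketch of the rotation with its O(1) size update, using the names K, L, M and s3 from the picture. The `Node` class with a `size` field, `annotate` (used only for setup and checking) and `rotate_from_left` are hypothetical names. The example values are made up and place the empty node at an internal level, so that a subtree s3 actually moves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves
    size: int = 0                                          # keys in this subtree


def annotate(n: Node) -> int:
    """Recompute size[x] for a whole subtree (setup and cross-checking only)."""
    n.size = len(n.keys) + sum(annotate(c) for c in n.children)
    return n.size


def rotate_from_left(p: Node, i: int) -> None:
    """p.children[i] (the '??' node) is empty. Its left sibling (K, L, ...) lends
    its largest key L to p, and p's separating key M moves down."""
    left, empty = p.children[i - 1], p.children[i]
    s3: Optional[Node] = left.children.pop() if left.children else None
    moved = (s3.size if s3 is not None else 0) + 1     # size(s3) + 1
    empty.keys.insert(0, p.keys[i - 1])                # M moves down
    p.keys[i - 1] = left.keys.pop()                    # L moves up
    if s3 is not None:
        empty.children.insert(0, s3)                   # s3 changes parents
    left.size -= moved                                 # node containing K
    empty.size += moved                                # node containing ??; p unchanged


# M = 50 separates (K, L) = (20, 30) from an empty node with one child.
p = Node([50], [Node([20, 30], [Node([10]), Node([25]), Node([35, 36])]),
                Node([], [Node([60, 70])])])
annotate(p)
rotate_from_left(p, 1)
print(p.keys, [(c.keys, c.size) for c in p.children])
# [30] [([20], 3), ([50], 5)]
```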


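MERGES: What do merges affect?

Merges only affect the size of the resulting merged node.

We illustrate a merge with a left sibling (a merge with a right sibling is handled in a similar way).

[Figure: before the merge, the parent key L separates T = (K), with children X, Y, from the empty node U, with the single child Z. After the merge, the parent has lost L, and the merged node S = (K, L) has children X, Y, Z.]

where size(S) = size(T) + size(U) + 1, and the size of the parent of S is unaffected. This takes O(1) time.

Q: What is the update time?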

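We have only added a constant amount of extra work per node during the first phase, and per SPLIT, ROTATE or MERGE during the fix-up phase, and each phase touches O(log n) nodes. We have finally achieved what we wanted: each operation (old or new) takes O(log n) time in the worst case.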
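To round off the size bookkeeping, here is a Python sketch of a merge with a left sibling. It applies size(S) = size(T) + size(U) + 1 and leaves the parent's size alone. `merge_left`, the `Node` class with its `size` field, and `annotate` (used only for setup and checking) are hypothetical names chosen for this illustration, and the example values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list[int]
    children: list["Node"] = field(default_factory=list)   # [] = empty leaves
    size: int = 0                                          # keys in this subtree


def annotate(n: Node) -> int:
    """Recompute size[x] for a whole subtree (setup and cross-checking only)."""
    n.size = len(n.keys) + sum(annotate(c) for c in n.children)
    return n.size


def merge_left(p: Node, i: int) -> Node:
    """Merge the empty node U = p.children[i] with its left sibling T, pulling
    down p's separating key L. Only the merged node S gets a new size."""
    T, U = p.children[i - 1], p.children[i]
    S = Node(T.keys + [p.keys.pop(i - 1)], T.children + U.children)
    S.size = T.size + U.size + 1                 # size(S) = size(T) + size(U) + 1
    p.children[i - 1:i + 1] = [S]                # p.size is unaffected
    return S


# L = 50 separates T = (20) from an empty node U whose only child is (60).
p = Node([50, 80], [Node([20], [Node([10]), Node([30])]),
                    Node([], [Node([60])]),
                    Node([90])])
annotate(p)
S = merge_left(p, 1)
print(S.keys, S.size, p.keys, p.size)            # [20, 50] 5 [80] 7
```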
