
ICS 214A: Database Management Systems Winter 2004

Lecture 06: B-Trees


Professor Chen Li

Overview
B-trees:
Give up on the sequentiality of the index.
Try to keep the structure balanced.
"B" means balanced.

Can have as many levels of index as appropriate.
Make sure each block is between half used and completely full.

One particular variant: B+ tree

We first discuss B+ trees. Then we will talk about B-trees.
ICS214A Notes 06 2

B+ tree Example

Fanout: n=3

[Figure: B+ tree with n=3.
Root: (100)
Non-leaf nodes: (30), (120 150 180)
Leaves, left to right: (3 5 11), (30 35), (100 101 110), (120 130), (150 156 179), (180 200)]

Sample non-leaf (interior) node

Keys: 57 81 95

Pointer 1: to keys k < 57
Pointer 2: to keys 57 <= k < 81
Pointer 3: to keys 81 <= k < 95
Pointer 4: to keys k >= 95
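A minimal sketch of how a search could pick the child pointer in the sample interior node with keys 57, 81, 95. The function name child_index is hypothetical; the only assumption is that the keys are kept sorted, so a binary search (Python's bisect_right) gives the pointer position directly.

```python
from bisect import bisect_right


def child_index(keys, k):
    """Return the 0-based position of the pointer to follow for search key k.

    Pointer 0 covers k < keys[0], pointer i covers keys[i-1] <= k < keys[i],
    and the last pointer covers k >= keys[-1].
    """
    return bisect_right(keys, k)


node_keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 200):
    print(k, "-> pointer", child_index(node_keys, k) + 1)
```

bisect_right is used rather than bisect_left so that a search key equal to a stored key (for example 57) goes to the right of that key, matching the "57 <= k < 81" range.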

Sample leaf node

Keys: 57 81 95

Pointer in: from a non-leaf node
Pointer 1: to record with key 57
Pointer 2: to record with key 81
Pointer 3: to record with key 95
Sequence pointer: to next leaf in the sequence

In the textbook's notation

n=3

[Figure: the same nodes drawn in the textbook's notation. Leaf: (30 35). Non-leaf: (30).]

Control shape of B+tree

All leaves at the same lowest level (balanced tree).
Pointers in leaves point to records, except for the sequence pointer.
Size of nodes (fixed):
n+1 pointers
n keys

Don't want a node to be too empty. Thus use at least:
Non-leaf: ⌈(n+1)/2⌉ pointers
Leaf: ⌊(n+1)/2⌋ pointers to tuples

Example

n=3

            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35

Number of pointers/keys

                       Max ptrs   Max keys   Min ptrs     Min keys
Root                   n+1        n          2            1
Non-leaf (non-root)    n+1        n          ⌈(n+1)/2⌉    ⌈(n+1)/2⌉-1
Leaf (non-root)        n (*)      n          ⌊(n+1)/2⌋    ⌊(n+1)/2⌋

(*) Not including the sequence pointer
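The bounds can be checked numerically with a short sketch. It assumes the textbook's rounding (ceiling for non-leaf nodes, floor for leaves) and counts leaf pointers without the sequence pointer; the function name occupancy and the dictionary layout are hypothetical.

```python
def occupancy(n):
    """Pointer/key bounds per node type for a B+ tree with at most n keys per node."""
    half_up = (n + 2) // 2    # ceil((n+1)/2) in integer arithmetic
    half_down = (n + 1) // 2  # floor((n+1)/2)
    return {
        "root": {"max_ptrs": n + 1, "max_keys": n, "min_ptrs": 2, "min_keys": 1},
        "non-leaf": {"max_ptrs": n + 1, "max_keys": n,
                     "min_ptrs": half_up, "min_keys": half_up - 1},
        # Leaf pointer counts exclude the sequence pointer.
        "leaf": {"max_ptrs": n, "max_keys": n,
                 "min_ptrs": half_down, "min_keys": half_down},
    }


for n in (3, 4):
    for kind, bounds in occupancy(n).items():
        print(n, kind, bounds)
```

For n=3 this gives a minimum of 1 key in a non-leaf node and 2 keys in a leaf, which agrees with the minimum nodes (30) and (30 35) in the example.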

Using B-trees to do a search

Lookup queries:
empID = 235

Range queries:
empID > 400
empID <= 600
100 <= empID < 400
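A rough sketch of both query types on a B+ tree, using the n=3 example tree from the earlier slide as data. The Node class and the function names are hypothetical, and leaves hold bare keys instead of record pointers to keep the example short.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf
        self.next = None          # sequence pointer (leaves only)


def build_example():
    """Build the n=3 example tree by hand."""
    leaves = [Node(k) for k in ([3, 5, 11], [30, 35], [100, 101, 110],
                                [120, 130], [150, 156, 179], [180, 200])]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return Node([100], [Node([30], leaves[:2]),
                        Node([120, 150, 180], leaves[2:])])


def find_leaf(node, key):
    """Follow one pointer per level down to the leaf that could hold key."""
    while node.children is not None:
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    return key in find_leaf(root, key).keys


def range_query(root, lo, hi):
    """Return all keys k with lo <= k < hi, walking the sequence pointers."""
    out, leaf = [], find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k >= hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


root = build_example()
print(lookup(root, 156), lookup(root, 235))
print(range_query(root, 100, 400))
```

A lookup reads one node per level. A range query descends once to the leaf for the lower bound and then follows sequence pointers, so it never goes back up through the non-leaf nodes.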

Insertions
Cases:
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
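The four cases can be sketched in one recursive insert routine. This is an illustration under stated assumptions, not the course's reference code: the Node class, the function names, and the choice to give the left half the extra key on a split are assumptions, and leaves hold bare keys rather than record pointers.

```python
from bisect import bisect_right, insort

N = 3  # maximum keys per node, matching n=3 on the slides


class Node:
    def __init__(self, leaf, keys, children=None):
        self.leaf = leaf
        self.keys = keys
        self.children = children or []  # used by non-leaf nodes only
        self.next = None                # sequence pointer, leaves only


def insert(root, key):
    """Insert key and return the root, which is a new node in case (d)."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node(False, [sep], [root, right])  # case (d): new root


def _insert(node, key):
    """Return None, or (separator, new_right_sibling) if node was split."""
    if node.leaf:
        insort(node.keys, key)
        if len(node.keys) <= N:
            return None                       # case (a): space available
        mid = (N + 2) // 2                    # left leaf keeps ceil((N+1)/2) keys
        right = Node(True, node.keys[mid:])
        node.keys = node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right           # case (b): leaf overflow
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.keys) <= N:
        return None
    mid = len(node.keys) // 2                 # case (c): non-leaf overflow
    up = node.keys[mid]                       # middle key moves up, it is not copied
    new = Node(False, node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new


def example_tree():
    """The n=3 example tree, built by hand."""
    leaves = [Node(True, k) for k in ([3, 5, 11], [30, 35], [100, 101, 110],
                                      [120, 130], [150, 156, 179], [180, 200])]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return Node(False, [100], [Node(False, [30], leaves[:2]),
                               Node(False, [120, 150, 180], leaves[2:])])


root = insert(insert(example_tree(), 7), 160)
print(root.keys, [c.keys for c in root.children])
```

Note the asymmetry between the two split cases: a leaf split copies the first key of the new right leaf into the parent, while a non-leaf split moves the middle key up and removes it from both halves.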

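Case (a): Insert Key = 32

n=3

[Figure: root (100), non-leaf node (30), leaves (3 5 11) and (30 31). Key 32 goes into the free slot of the second leaf, which becomes (30 31 32).]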

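Case (b): Insert Key = 7

n=3

[Figure: root (100), non-leaf node (30), leaves (3 5 11) and (30 31). Inserting 7 overflows the leaf (3 5 11), which splits into (3 5) and (7 11); key 7 is added to the parent non-leaf node.]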

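Case (c): Insert Key = 160

n=3

[Figure: root (100), non-leaf node (120 150 180), leaves (150 156 179) and (180 200). Inserting 160 splits the leaf (150 156 179) into (150 156) and (160 179). The non-leaf node then overflows and splits into (120 150) and (180), and key 160 moves up into the root, which becomes (100 160).]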

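Case (d): New root (insert 45)

n=3

[Figure: root (10 20 30) with leaves (1 2 3), (10 12), (20 25), (30 32 40). Inserting 45 splits the leaf (30 32 40) into (30 32) and (40 45). The old root overflows and splits into (10 20) and (40), and key 30 moves up into a new root (30).]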

Deletion
Cases:
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Redistribute keys
(d) Cases (b) or (c) at non-leaf
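Cases (a) to (c) at the leaf level can be sketched as a decision between two adjacent leaves. This is a simplified illustration: the function names are hypothetical, only two sibling leaves are modeled (no parent node), and the minimum leaf occupancy is assumed to be floor((n+1)/2) keys.

```python
def min_leaf_keys(n):
    """Assumed minimum number of keys in a non-root leaf: floor((n+1)/2)."""
    return (n + 1) // 2


def delete_key(left, right, key, n):
    """Delete key from whichever of two adjacent sibling leaves holds it.

    Returns (case, leaves, separator): the leaves that remain and a key the
    parent can store between them (None when the two leaves were merged).
    Raises ValueError if the key is in neither leaf.
    """
    left, right = list(left), list(right)
    (left if key in left else right).remove(key)
    low = min_leaf_keys(n)
    if len(left) >= low and len(right) >= low:
        return "simple", [left, right], right[0]          # case (a)
    if len(left) + len(right) <= n:
        return "coalesce", [left + right], None           # case (b)
    if len(left) < low:                                   # case (c)
        left.append(right.pop(0))     # borrow the smallest key from the right
    else:
        right.insert(0, left.pop())   # borrow the largest key from the left
    return "redistribute", [left, right], right[0]


print(delete_key([10, 20, 30], [40, 50], 50, n=4))
print(delete_key([10, 20, 30, 35], [40, 50], 50, n=4))
```

The two calls use the leaves from the n=4 "Delete 50" examples: in the first the remaining keys fit in one leaf, so the leaves are merged; in the second the sibling is full, so one key is moved and the separator changes.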

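Case (b): Coalesce with a sibling

Delete 50

n=4

[Figure: parent (10 40 100) with leaves (10 20 30) and (40 50). Deleting 50 leaves (40) below the minimum, so it is merged with its sibling into the leaf (10 20 30 40), and key 40 is removed from the parent.]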

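Case (c): Redistribute keys

Delete 50

n=4

[Figure: parent (10 40 100) with leaves (10 20 30 35) and (40 50). Deleting 50 leaves (40) below the minimum, and the sibling is too full to merge with, so key 35 moves over. The leaves become (10 20 30) and (35 40), and the parent key 40 is replaced by 35.]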

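Case (d): Non-leaf coalesce

Delete 37

n=4

[Figure: root (25) with non-leaf children (10 20) and (30 40); leaves (1 3), (10 14), (20 22), (25 26), (30 37), (40 45). Deleting 37 merges the leaves (25 26) and (30) into (25 26 30), which leaves the non-leaf node (40) with too few pointers. It is merged with its sibling and the old root key into (10 20 25 40), which becomes the new root.]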

B+tree deletions in practice


Often, coalescing is not implemented
Since it's too hard and not worth it!

Buffering for B+tree

Is LRU a good policy for B+tree buffers?


No!
Should try to keep root in memory at all times; and perhaps some nodes from second level
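One way to picture the difference is a small buffer simulation. This is a sketch under loud assumptions: the PinnedLRU class, the page names, and the access trace (a range scan over four leaves followed by another lookup) are all made up for illustration, and real buffer managers are more involved.

```python
from collections import OrderedDict


class PinnedLRU:
    """LRU buffer that never evicts pages listed in `pinned`.

    Assumes capacity is larger than the number of pinned pages.
    """

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)
        self.pages = OrderedDict()  # least recently used page first
        self.misses = 0

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return
        self.misses += 1                     # page must be read from disk
        if len(self.pages) >= self.capacity:
            victim = next(p for p in self.pages if p not in self.pinned)
            del self.pages[victim]
        self.pages[page_id] = True


# Hypothetical trace: a lookup, a scan along the leaves, then a new lookup.
trace = ["root", "L1", "L2", "L3", "L4", "root", "L5"]

plain = PinnedLRU(capacity=3)
pinned = PinnedLRU(capacity=3, pinned={"root"})
for page in trace:
    plain.access(page)
    pinned.access(page)

print("plain LRU misses:", plain.misses)
print("root pinned misses:", pinned.misses)
```

In this trace the leaf scan pushes the root out of the plain LRU buffer, so the next lookup has to read it again; with the root pinned that read is avoided.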


How to choose fanout n?


n affects:
The depth of the tree: about log_n(N), where N = # of records.
Why does the depth matter? The time of binary search within a node is very small compared to the disk I/O time, so the number of nodes read dominates the cost.

We want a large n to reduce the tree height.
If a node corresponds to a block, we choose the largest n whose keys and pointers still fit in the block.
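A back-of-the-envelope sketch of this choice. The sizes used (4096-byte block, 4-byte key, 8-byte pointer) are assumptions for illustration, block headers are ignored, and both function names are hypothetical.

```python
def max_fanout(block_bytes, key_bytes, ptr_bytes):
    """Largest n such that n keys and n+1 pointers fit in one block."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


def levels_needed(n, num_records):
    """Smallest h with n**h >= num_records, roughly log_n(N). Requires n >= 2."""
    h, reach = 0, 1
    while reach < num_records:
        reach *= n
        h += 1
    return h


n = max_fanout(block_bytes=4096, key_bytes=4, ptr_bytes=8)
print("fanout n =", n)
print("levels for 1,000,000 records:", levels_needed(n, 1_000_000))
print("levels with n=3:", levels_needed(3, 1_000_000))
```

Under these assumed sizes n comes out at 340, and a million records need about 3 levels instead of 13 with n=3, which is why filling the block pays off when each level costs a disk read.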

B+tree Optimizations
To improve performance, we want to reduce the height. Two strategies:
(a) Decrease the number of leaf pages: shorten the data stored in leaf pages using compression techniques.
(b) Increase index node fanout by compressing the key representation.

These compression techniques reduce I/O costs, but typically complicate the insertion, deletion, and search algorithms.

Deciding the policy for maintaining the B+tree is part of physical database design.

Example: prefix compression


Use prefix compression at the lower nodes.
Consider a page containing keys: Manino, Manna, Mannari, Mannarino, Mannella, Mannelli
"Man" is a common prefix.
Store keys as: (3, ino) (3, na) (5, ri) (7, no) (4, ella) (7, i)
Construct the strings:
(3, ino): Man + ino -> Manino
(3, na): Man + na -> Manna
(5, ri): Manna + ri -> Mannari (i.e., take the first 5 characters of the previous string)
(7, no): Mannari + no -> Mannarino
(4, ella): Mann + ella -> Mannella
(7, i): Mannell + i -> Mannelli
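The encoding above can be reproduced with a short sketch. It assumes the scheme is "number of leading characters shared with the previous key, plus the remaining suffix", with the first key measured against the page's common prefix; the function names are hypothetical.

```python
from os.path import commonprefix


def compress(keys):
    """Return (page_prefix, entries) for a sorted list of keys.

    Each entry is (k, suffix): reuse the first k characters of the previous
    key (the page prefix for the first key) and append suffix.
    """
    page_prefix = commonprefix(keys)
    entries, prev = [], page_prefix
    for key in keys:
        k = len(commonprefix([prev, key]))
        entries.append((k, key[k:]))
        prev = key
    return page_prefix, entries


def decompress(page_prefix, entries):
    """Rebuild the full keys, one after another, from the compressed entries."""
    keys, prev = [], page_prefix
    for k, suffix in entries:
        prev = prev[:k] + suffix
        keys.append(prev)
    return keys


keys = ["Manino", "Manna", "Mannari", "Mannarino", "Mannella", "Mannelli"]
prefix, entries = compress(keys)
print(prefix, entries)
print(decompress(prefix, entries))
```

Because each key is rebuilt from the one before it, a search inside the page has to decode keys sequentially, which is one way compression complicates the search algorithm.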

Suffix Compression
[Figure: two versions of the same index.
a) Full key values stored in all nodes of the B*-tree: the high-level node holds Bertolucci, Copelletti, ..., Gambogi.
b) Suffix compression for key values in high-level nodes: the high-level node holds Bert, Cop, ..., Gam.
Lower-level keys shown: Beret, Bertolucci, ..., Cooperativa, Copelletti, ...]

Use suffix compression in high nodes.
Keys are truncated on the right as long as they can still distinguish between the highest key value the left pointer leads to and the lowest key value the right pointer leads to.

E.g.: If "Bert" is enough at the root node to do the routing, we don't need to store "Bertolucci".
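A minimal sketch of the truncation rule: pick the shortest prefix of the right-hand key that still sorts after the left-hand key. The function name shortest_separator is hypothetical, and plain Python string comparison (by code point) is assumed as the key order.

```python
def shortest_separator(left_high, right_low):
    """Shortest prefix s of right_low such that left_high < s <= right_low.

    left_high is the highest key reachable through the left pointer and
    right_low the lowest key reachable through the right pointer.
    """
    for i in range(1, len(right_low) + 1):
        s = right_low[:i]
        if left_high < s:
            return s
    raise ValueError("keys must satisfy left_high < right_low")


print(shortest_separator("Beret", "Bertolucci"))
print(shortest_separator("Cooperativa", "Copelletti"))
```

With the keys named on the slide, "Beret" and "Bertolucci" are separated by "Bert", and "Cooperativa" and "Copelletti" by "Cop".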

B-tree versus B+tree


[Figure: a B-tree non-leaf node with contents K1 P1 K2 P2 K3 P3.
Record pointers: to record with K1, to record with K2, to record with K3.
Child pointers: to keys x < K1, to keys K1 < x < K2, to keys K2 < x < K3, to keys x > K3.]

A B-tree has record pointers in non-leaf nodes.
In practice, B+trees are preferred and widely used.

B-tree example

n=2

[Figure: B-tree with n=2.
Root: (65 125)
Non-leaf nodes: (25 45), (85 105), (145 165)
Leaves, left to right: (10 20), (30 40), (50 60), (70 80), (90 100), (110 120), (130 140), (150 160), (170 180)]
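A small sketch of a B-tree search on this n=2 example, to contrast it with a B+ tree: because every key appears exactly once and non-leaf nodes carry record pointers, a search can stop above the leaf level. The BNode class and the function name are hypothetical, and record pointers are omitted.

```python
from bisect import bisect_left


class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None for a leaf


def search_level(node, key):
    """Return the level (root = 1) at which key is found, or None."""
    level = 1
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return level          # found here, possibly in a non-leaf node
        if node.children is None:
            return None
        node = node.children[i]
        level += 1


def leaves(*pairs):
    return [BNode(list(p)) for p in pairs]


root = BNode([65, 125], [
    BNode([25, 45], leaves((10, 20), (30, 40), (50, 60))),
    BNode([85, 105], leaves((70, 80), (90, 100), (110, 120))),
    BNode([145, 165], leaves((130, 140), (150, 160), (170, 180))),
])

for key in (65, 85, 90, 95):
    print(key, "found at level", search_level(root, key))
```

Keys 65 and 85 are found without reaching a leaf, whereas a B+ tree search always continues to the leaf level.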
