Professional Documents
Culture Documents
Physical DBs B Tree PDF
Physical DBs B Tree PDF
Physical DBs B Tree PDF
B+-Trees
Adnan YAZICI
Computer Engineering Department
METU
(Fall 2005)
DBMS , A. Yaz c 1
B+-Tree Index Files
DBMS , A. Yaz c 2
B+-Tree Index Files (Cont.)
DBMS , A. Yaz c 3
B+-Trees as Dynamic Multi-level Indexes
DBMS , A. Yaz c 4
B+-Tree Node Structure
l Typical node
DBMS , A. Yaz c 5
Leaf Nodes in B+-Trees
DBMS , A. Yaz c 6
Non-Leaf Nodes in B+-Trees
DBMS , A. Yaz c 7
Example of a B+-tree
DBMS , A. Yaz c 8
Example of B+-tree
!"
DBMS , A. Yaz c 9
Observations about B+-trees
l Since the inter-node connections are done by
pointers, “logically” close blocks need not be
“physically” close.
l The non-leaf levels of the B+-tree form a hierarchy
of sparse indices.
l The B+-tree contains a relatively small number of
levels (logarithmic in the size of the main file),
thus searches can be conducted efficiently.
l Insertions and deletions to the main file can be
handled efficiently, as the index can be
restructured in logarithmic time (as we shall see).
DBMS , A. Yaz c 10
Queries on B+-Trees
l Find all records with a search-key value of k.
1. Start with the root node
1. Examine the node for the smallest search-key value > k.
2. If such a value exists, assume it is Kj. Then follow Pi to
the child node
3. Otherwise k Km–1, where there are m pointers in the
node. Then follow Pm to the child node.
2. If the node reached by following the pointer above
is not a leaf node, repeat step 1 on the node
3. Else we have reached a leaf node.
1. If for some i, key Ki = k follow pointer Pi to the desired
record or bucket.
2. Else no record with search-key value k exists.
DBMS , A. Yaz c 11
Queries on B+-Trees (Cont.)
l In processing a query, a path is traversed in the tree from
the root to some leaf node.
l If there are K search-key values in the file, the path is no
longer than log n/2 (K) .
l A node is generally the same size as a disk block,
typically 4 kilobytes, and n is typically around 100 (40
bytes per index entry).
l With 1 million search key values and n = 100, at most
log50(1,000,000) = 4 nodes are accessed in a lookup.
l Contrast this with a balanced binary free with 1 million
search key values — around 20 nodes are accessed in a
lookup
– above difference is significant since every node
access may need a disk I/O, costing around 20
milliseconds!
DBMS , A. Yaz c 12
B+ Tree
DBMS , A. Yaz c 13
B+ Tree
l B+ Tree Properties
l B+ Tree Searching
l B+ Tree Insertion
l B+ Tree Deletion
DBMS , A. Yaz c 14
B+ Tree
– Balanced Tree
l Same height for paths from root to leaf
l Given a search-key K, nearly same access time for
different K values
– B+ Tree is constructed by parameter n
l Each Node (except root) has n/2 to n pointers
l Each Node (except root) has n/2 -1 to n-1 search-
key values
K1 K2 K1 K2 Kn-1
P1 P2 P3 P1 P2 Pn-1 Pn
DBMS , A. Yaz c 15
B+ Tree
l Search keys are sorted in order
– K1 < K2 < … <Kn-1
•Non-leaf Node
–Each key-search values in subtree Si K1 K2
pointed by Pi < Ki, >=Ki-1 P1 P2 P3
Key values in S1 < K1
S1 S2 S3
K1 <= Key values in S2 < K2
P3
•Leaf Node P1
K1 K2 …
P2
–Pi points record or bucket with Record of K1 Record of K2
search key value Ki
Record of K2
–Pn points to the neighbor leaf
node …
DBMS , A. Yaz c 16
B+ Tree: Search
l Given a search-value k
– Start from the root, look for the largest search-key
value (Kl) in the node <= k
– Follow pointer Pl+1 to next level, until reach a leaf
node K <=k<K
l l+1
K1 K2 … Kl Kl+1 Kn-1
P1 P2 P3 Pl+1 Pn-1 Pn
…
DBMS , A. Yaz c 17
B+ Tree:Insertion
l Overflow
– When the number of search-key values exceed n-1
7 9 13 15 Insert 8
–Leaf Node
•Split into two nodes:
–1st node contains (n-1)/2 values
–2nd node contains remaining values
–Copy the smallest search-key value of the 2nd node
to parent node
7 8 9 13 15
DBMS , A. Yaz c 18
B+ Tree: Insertion
l Overflow
– When number of search-key values exceed n-1
7 9 13 15 Insert 8
–Non-Leaf Node
•Split into two nodes:
–1st node contains n/2 -1 values
–Move the smallest of the remaining values, together
with pointer, to the parent
–2nd node contains the remaining values
9
7 8 13 15
DBMS , A. Yaz c 19
B+ Tree: Insertion
l Example 1: Construct a B+ tree for (1, 4, 7,
10, 17, 21, 31, 25, 19, 20, 28, 42) with n=4.
7
1 4 7
1 4 7 10
7 17
1 4 7 10 17 21
DBMS , A. Yaz c 20
B+ Tree: Insertion
B+ Tree Insertion
l 1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42
7 17 25
1 4 7 10 17 21 25 31
17
7 20 25
1 4 7 10 17 19 20 21 25 31
DBMS , A. Yaz c 21
B+ Tree: Insertion
l 1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42
17
7 20 25 31
1 4 17 19 31 42
7 10 20 21 25 28
DBMS , A. Yaz c 22
B+ Tree: Insertion
l Example 2: n=3, insert 4 into the following
B+Tree 9 10
7 8 Subtree Subtree
C D
2 5 Leaf Leaf
A B
7 10
4 8 C D
2 4 5 A B
DBMS , A. Yaz c 23
B+ Tree: Deletion
l Underflow
– When number of search-key values < n/2 -1
9 10 Delete 10
–Leaf Node
13 18
•Redistribute to sibling
•Right node not less than left node 9 10 13 14 16
9 10 13 14 9 13 14
DBMS , A. Yaz c 24
B+ Tree: Deletion
–Non-Leaf Node
9 10 Delete 10
•Redistribute to sibling
•Through parent 13 18
•Right node not less than left node
•Merge (contain too few entries) 9 10 14 15 16
in parent
13 18 22 18 22
9 10 14 16 9 13 14 16
DBMS , A. Yaz c 25
B+ Tree: Deletion
l Example 3: n=3, delete 3
5 20
3 8 Subtree
A
1 3
20
5 8 Subtree
A
DBMS , A. Yaz c 26
B+ Tree: Deletion
l Example 4: Delete 28, 31, 21, 25, 19
20
7 17 25 31 50
1 4 7 10 17 19 20 21 25 28 31 42
20
7 17 25 50
1 4 7 10 17 19 20 21 25 31 42
DBMS , A. Yaz c 27
B+ Tree: Deletion
l Example 4: Delete 28, 31, 21, 25, 19
7 17 20 50
1 4 7 10 17 19 20 25 42
7 17 50
1 4 7 10 17 20 42
DBMS , A. Yaz c 28
B+-Tree File Organization
l Index file degradation problem is solved by using B+-
Tree indices. Data file degradation problem is solved
by using B+-Tree File Organization.
l The leaf nodes in a B+-tree file organization store
records, instead of pointers.
l Since records are larger than pointers, the maximum
number of records that can be stored in a leaf node is
less than the number of pointers in a nonleaf node.
l Leaf nodes are still required to be half full.
l Insertion and deletion are handled in the same way
as insertion and deletion of entries in a B+-tree index.
DBMS , A. Yaz c 29
B+-Tree File Organization (Cont.)
# $ % & '
l Good space utilization important since records use
more space than pointers.
l To improve space utilization, involve more sibling
nodes in redistribution during splits and merges
– Involving 2 siblings in redistribution (to avoid split /
merge where possible) results in each node having
at least entries
DBMS , A. Yaz c 30
Indexing Strings
DBMS , A. Yaz c 31
B-Tree Index Files
n($ )* +
, $
+
n( +
,
+ $ *
n- '
DBMS , A. Yaz c 32
B-Tree Index File Example
DBMS , A. Yaz c 33
B-Tree Index Files (Cont.)
l Advantages of B-Tree indices:
– May use less tree nodes than a corresponding B+-Tree.
– Sometimes possible to find search-key value before
reaching leaf node.
l Disadvantages of B-Tree indices:
– Only small fraction of all search-key values are found
early
– Non-leaf nodes are larger, so fan-out is reduced. Thus,
B-Trees typically have greater depth than corresponding
B+-Tree
– Insertion and deletion more complicated than in B+-Trees
– Implementation is harder than B+-Trees.
l Typically, advantages of B-Trees do not outweigh
disadvantages.
DBMS , A. Yaz c 34
Index Definition in SQL
l Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch_name)
l Use create unique index to indirectly specify and
enforce the condition that the search key is a candidate
key.
– Not really required if SQL unique integrity constraint is
supported
l To drop an index
drop index <index-name>
DBMS , A. Yaz c 35