
PRESENTATION TOPIC ADVANCED TREES

ADVANCE DATA STRUCTURES

BRACT’S, Vishwakarma Institute of Information Technology, Pune-48

(An Autonomous Institute affiliated to Savitribai Phule Pune University)


(NBA and NAAC accredited, ISO 9001:2015 certified)
OBJECTIVE/S OF THIS SESSION

1. Understand indexing
2. Understand B and B+ Trees

Learning Outcome/Course Outcome


1. Understand data organization
2. Create B Tree
3. Differentiate between B Tree and B+ Tree

INDEXING
DENSE INDEX
MULTILEVEL INDEX
M-WAY SEARCH TREE
MOTIVATION FOR B-TREES

The B-tree was introduced in 1972 by Bayer and McCreight under the name Height Balanced m-way Search Tree; it was later named the B-tree.

Index structures for large datasets are often too big to be held in main memory

Storing them on disk requires a different approach to efficiency


MOTIVATION (CONT.)

Assume that we use an AVL tree to store about 20 million records

We end up with a very deep binary tree (at least 25 levels), and following a path through it takes many separate disk accesses

We know we can’t improve on the log n lower bound on search for a binary tree

But the solution is to use more branches and thus reduce the height of the tree!
⚫ As branching increases, depth decreases
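
To put rough numbers on this, the sketch below counts how many levels a completely full tree needs in order to hold 20 million keys. The function name `min_levels` and the fan-outs 5 and 101 are illustrative assumptions; real AVL trees and B-trees are rarely completely full, so these figures are lower bounds on the depth.

```python
def min_levels(n_keys, max_children):
    """Smallest number of levels whose completely full tree holds n_keys keys.

    A full tree with `levels` levels, `max_children` children per node and
    `max_children - 1` keys per node holds max_children**levels - 1 keys.
    """
    levels = 0
    while max_children ** levels - 1 < n_keys:
        levels += 1
    return levels


if __name__ == "__main__":
    for fanout in (2, 5, 101):
        # fanout 2 is a binary tree; 5 and 101 are example B-tree orders
        print(f"fan-out {fanout:>3}: at least {min_levels(20_000_000, fanout)} levels")
```

With a fan-out of 2 this gives 25 levels, with 5 it gives 11, and with 101 only 4, which is why a wide node per disk read pays off.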
DEFINITION OF A B-TREE
A B-tree of order m is an m-way tree (i.e., a tree where each node
may have up to m children) in which:
1. the number of keys in each non-leaf node is one less than the
number of its children and these keys partition the keys in the
children in the fashion of a search tree
2. all leaves are on the same level
3. all non-leaf nodes except the root have at least ⌈m / 2⌉ children
4. the root is either a leaf node, or it has from two to m children
5. a leaf node contains no more than m – 1 keys
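
The five conditions can be turned into a small checker. This is only a sketch under stated assumptions: a node is represented as a `(keys, children)` tuple with an empty `children` list for a leaf, and the names `check` and `is_btree` are made up for this example. It tests exactly the conditions listed above (it does not enforce a minimum number of keys in a leaf, because the definition above does not state one).

```python
from math import ceil


def _require(condition, rule):
    if not condition:
        raise ValueError(rule)


def check(node, order, lo=None, hi=None, is_root=True):
    """Return the height below `node`; raise ValueError if a condition fails."""
    keys, children = node
    _require(keys == sorted(keys), "keys must be sorted")
    _require(all((lo is None or lo < k) and (hi is None or k < hi) for k in keys),
             "keys must lie between the parent's separators")
    _require(len(keys) <= order - 1, "at most m - 1 keys per node")        # cond. 5
    if not children:
        return 0
    _require(len(children) == len(keys) + 1, "keys = children - 1")        # cond. 1
    min_children = 2 if is_root else ceil(order / 2)                       # cond. 3, 4
    _require(min_children <= len(children) <= order, "child count out of range")
    bounds = [lo] + keys + [hi]
    depths = {check(child, order, bounds[i], bounds[i + 1], False)
              for i, child in enumerate(children)}
    _require(len(depths) == 1, "all leaves must be on the same level")     # cond. 2
    return depths.pop() + 1


def is_btree(node, order):
    try:
        check(node, order)
        return True
    except ValueError:
        return False


# The order-5 example tree from this deck (26 keys, leaves on one level).
example = ([26], [
    ([6, 12], [([1, 2, 4], []), ([7, 8], []), ([13, 15, 18, 25], [])]),
    ([42, 51, 62], [([27, 29], []), ([45, 46, 48], []),
                    ([53, 55, 60], []), ([64, 70, 90], [])]),
])

if __name__ == "__main__":
    print(is_btree(example, 5))
```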
INSERTING INTO A B-TREE

Attempt to insert the new key into a leaf

If this would result in that leaf becoming too big, split the leaf into
two, promoting the middle key to the leaf’s parent

If this would result in the parent becoming too big, split the parent
into two, promoting the middle key

This strategy might have to be repeated all the way to the top

If necessary, the root is split in two and the middle key is promoted
to a new root, making the tree one level higher
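
The strategy above can be sketched as a recursive insert that splits on the way back up. This is a minimal illustration, not a disk-based implementation: the names `Node`, `insert` and `levels` are assumptions, nodes live in memory, and a node is split as soon as it holds `order` keys, with the key at index `len(keys) // 2` taken as the middle key.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # empty for a leaf


def _insert(node, key, order):
    """Insert below `node`; return (middle_key, new_right_node) if `node` split."""
    i = bisect_left(node.keys, key)
    if not node.children:                      # leaf: put the key here
        node.keys.insert(i, key)
    else:                                      # non-leaf: descend, absorb a promoted key
        promoted = _insert(node.children[i], key, order)
        if promoted is not None:
            middle, right = promoted
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) < order:                 # still at most order - 1 keys
        return None
    mid = len(node.keys) // 2                  # node too big: split around the middle key
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right


def insert(root, key, order):
    """Insert `key` and return the (possibly new) root."""
    promoted = _insert(root, key, order)
    if promoted is None:
        return root
    middle, right = promoted
    return Node([middle], [root, right])       # the tree grows one level


def levels(root):
    """Keys of every node, level by level, for inspection."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


if __name__ == "__main__":
    root = Node()
    for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
        root = insert(root, k, order=5)
    for level in levels(root):
        print(level)
```

The key sequence in the usage part is the one used in the order-5 construction example in this deck, so the printed levels can be compared with the final tree shown there.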
AN EXAMPLE B-TREE
A B-tree of order 5 containing 26 items

Root: [26]
Level 1: [6 12] [42 51 62]
Leaves under [6 12]: [1 2 4] [7 8] [13 15 18 25]
Leaves under [42 51 62]: [27 29] [45 46 48] [53 55 60] [64 70 90]

Note that all the leaves are at the same level


CONSTRUCTING A B-TREE
Suppose we start with an empty B-tree and keys arrive in the
following order:
1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45

We want to construct a B-tree of order 5

The first four items go into the root:

1 2 8 12

To put the fifth item in the root would violate condition 5

Therefore, when 25 arrives, pick the middle key to make a new root
CONSTRUCTING A B-TREE (CONTD.)

Root: [8]
Leaves: [1 2] [12 25]

6, 14, 28 get added to the leaf nodes:

Root: [8]
Leaves: [1 2 6] [12 14 25 28]
CONSTRUCTING A B-TREE (CONTD.)

Adding 17 to the right leaf node would over-fill it, so we take the middle key, promote it (to the root) and split the leaf

Root: [8 17]
Leaves: [1 2 6] [12 14] [25 28]

7, 52, 16, 48 get added to the leaf nodes

Root: [8 17]
Leaves: [1 2 6 7] [12 14 16] [25 28 48 52]
CONSTRUCTING A B-TREE (CONTD.)
Adding 68 causes us to split the rightmost leaf, promoting 48 to the root, and adding 3 causes us to split the leftmost leaf, promoting 3 to the root; 26, 29, 53, 55 then go into the leaves

Root: [3 8 17 48]
Leaves: [1 2] [6 7] [12 14 16] [25 26 28 29] [52 53 55 68]

Adding 45 causes a split of the leaf [25 26 28 29], and promoting 28 to the root then causes the root to split


CONSTRUCTING A B-TREE (CONTD.)

Root: [17]
Level 1: [3 8] [28 48]
Leaves: [1 2] [6 7] [12 14 16] [25 26] [29 45] [52 53 55 68]
OPERATIONS

B-Tree of order 4
⚫ Each node has at most 4 (M) pointers and 3 (M-1) keys, and each non-root node has at least 2 (⌈M/2⌉) pointers and 1 (⌈M/2⌉-1) key.

Insert: 5, 3, 21, 9, 1, 13, 2, 7, 10, 12, 4, 8

Delete: 2, 21, 10, 3, 4


INSERT 5, 3, 21

*5* a

*3*5* a

* 3 * 5 * 21 * a
INSERT 9

*9* a

b c
*3*5* * 21 *

Node a splits creating 2 children: b and c


INSERT 1, 13

*9* a

b c
*1*3*5* * 13 * 21 *

Nodes b and c have room to insert more elements


INSERT 2

*3*9* a

b d c
*1*2* *5* * 13 * 21 *

Node b has no more room, so it splits creating node d.


INSERT 7, 10

*3*9* a

b d c

*1*2* *5*7* * 10 * 13 * 21 *

Nodes d and c have room to add more elements


INSERT 12

* 3 * 9 * 13 * a

b d c e
*1*2* *5*7* * 10 * 12 * * 21 *

Node c must split into nodes c and e


INSERT 4
a
* 3 * 9 * 13 *

b d c e
*1*2* *4*5*7* * 10 * 12 * * 21 *

Node d has room for another element


INSERT 8
a
*9*

f g

*3*7* * 13 *

b d h c e
*1*2* *4*5* *8* * 10 * 12 * * 21 *

Node d must split into 2 nodes. This causes node a to split into 2
nodes and the tree grows a level.
REMOVAL FROM A B-TREE

During insertion, the key always goes into a leaf. For deletion we also wish to remove from a leaf. There are several cases to consider:
1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.
2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf -- in this case we can delete the key and promote the predecessor or successor key to the deleted key’s position in the non-leaf node.
PROPERTIES OF B-TREES
Minimum children (non-root nodes) = ⌈M/2⌉
Ex. For an order 5 tree it’s ⌈5/2⌉ = 3

Maximum children = M
Ex. For an order 5 tree it’s 5

Minimum keys (non-root nodes) = ⌈M/2⌉ - 1
Ex. For an order 5 tree it’s ⌈5/2⌉ - 1 = 2

Maximum keys = M - 1
Ex. For an order 5 tree it’s 4
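
These four limits follow directly from the order M. A tiny helper makes that explicit; the function name `btree_limits` is an assumption made for this sketch, and the minimums apply to every node except the root.

```python
def btree_limits(order):
    """Child and key limits for a B-tree of the given order (non-root nodes)."""
    min_children = -(-order // 2)          # ceiling of order / 2 using integers only
    return {
        "min_children": min_children,
        "max_children": order,
        "min_keys": min_children - 1,
        "max_keys": order - 1,
    }


if __name__ == "__main__":
    print(btree_limits(5))   # order 5: children 3..5, keys 2..4
    print(btree_limits(4))   # order 4: children 2..4, keys 1..3
```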
LEAF NODE DELETION CASES:

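Key to be deleted is in a leaf node
⚫ Node has more than the min. number of keys: delete the key directly
⚫ Node has only the min. number of keys (deleting would leave it with fewer than the min.):
   1. Consider left sibling
   2. Consider right sibling
   3. Merge with either left/right sibling and the parent key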
REMOVAL FROM A B-TREE (2)

If (1) or (2) lead to a leaf node containing fewer than the minimum number of keys then we have to look at the siblings immediately adjacent to the leaf:
⚫ 3: if one of them has more than the min. number of keys then we can promote one of its keys to the parent and take the parent key into our lacking leaf
⚫ 4: if neither of them has more than the min. number of keys then the lacking leaf and one of its neighbours can be combined with their shared parent key (the opposite of promoting a key) and the new leaf will have the correct number of keys; if this step leaves the parent with too few keys then we repeat the process up to the root itself, if required
B TREE
a
*9*

f g

*3*7* * 13 *

b d h c e
*1*2* *4*5* *8* * 10 * 12 * * 21 *

B-tree of order 4
Minimum keys: 1
Maximum keys: 3
Minimum children: 2
Maximum children: 4
DELETE 2
CASE: NODE HAS MORE THAN MIN. NUMBER
OF KEYS

*9* a

f g
*3*7* * 13 *

b d h c e

*1* *4*5* *8* * 10 * 12 * * 21 *

2 deleted directly. Node b can lose an element without underflow.
DELETE 21
CASE: CONSIDER LEFT SIBLING (DELETE
21)

*9* a

f g
*3*7* * 13 *

b d h c e

*1* *4*5* *8* * 10 * 12 * * 21 *

Node e holds only the minimum number of keys, but its left sibling c has a spare key, so we consider the left sibling and rearrange the nodes.
DELETE 21
CASE: CONSIDER LEFT SIBLING

*9* a

f g
*3*7* * 12 *

b d h c e

*1* *4*5* *8* * 10 * * 13 *

Deleting 21 causes node e to underflow, so elements are redistributed between nodes c, g, and e: 12 moves up from c into g, and 13 moves down from g into e
DELETE 1
CASE: CONSIDER RIGHT SIBLING (DELETE 1)

*9* a

f g
*3*7* * 12 *

b d h c e

*1* *4*5* *8* * 10 * * 13 *

Deleting 1 causes node b to underflow, so elements are redistributed between nodes b, f, and d
DELETE 1
CASE: CONSIDER RIGHT SIBLING

*9* a

f g
*4*7* * 12 *

b d h c e

* 3* *5* *8* * 10 * * 13 *

Deleting 1 causes node b to underflow, so elements are redistributed between nodes b, f, and d: 4 moves up from d into f, and 3 moves down from f into b
DELETE 10

*4*7*9* a

b d h e
*3* * 5* *8* * 12 * 13 *

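Deleting 10 causes node c to underflow. Neither sibling can spare a key, so c is combined with e and the parent key 12, which leaves the parent, node g, without keys. Node g is then recombined with nodes f and a, and the tree shrinks by one level.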
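
The deletion cases can be sketched in code as follows. This is one possible reading of rules 1 to 4, not a definitive implementation: the names `Node`, `delete`, `_rebalance` and `levels` are assumptions, a key found in a non-leaf is replaced by its predecessor, and an underflowing node tries its left sibling first, then its right sibling, and merges only if neither can spare a key.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)          # empty for a leaf


def _rebalance(parent, i, min_keys):
    """Repair an underflow in parent.children[i] by borrowing or merging."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if left is not None and len(left.keys) > min_keys:        # rule 3, left sibling
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > min_keys:    # rule 3, right sibling
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    else:                                                     # rule 4, merge
        if left is None:                                      # no left sibling: merge with right
            left, child, i = child, right, i + 1
        left.keys += [parent.keys.pop(i - 1)] + child.keys
        left.children += child.children
        parent.children.pop(i)


def _delete(node, key, min_keys):
    i = bisect_left(node.keys, key)
    found = i < len(node.keys) and node.keys[i] == key
    if not node.children:                                     # rule 1: key is in a leaf
        if found:
            node.keys.pop(i)
        return
    if found:                                                 # rule 2: use the predecessor
        pred = node.children[i]
        while pred.children:
            pred = pred.children[-1]
        key = node.keys[i] = pred.keys[-1]                    # then remove it from its leaf
    _delete(node.children[i], key, min_keys)
    if len(node.children[i].keys) < min_keys:
        _rebalance(node, i, min_keys)


def delete(root, key, order):
    """Delete `key` and return the (possibly new) root."""
    _delete(root, key, (order + 1) // 2 - 1)                  # min keys = ceil(order/2) - 1
    if not root.keys and root.children:                       # empty root: tree shrinks
        return root.children[0]
    return root


def levels(root):
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


if __name__ == "__main__":
    # The order-4 tree from this deck, before any deletion.
    root = Node([9], [
        Node([3, 7], [Node([1, 2]), Node([4, 5]), Node([8])]),
        Node([13], [Node([10, 12]), Node([21])]),
    ])
    for k in (2, 21, 1):
        root = delete(root, k, order=4)
    print(levels(root))
```

After deleting 2, 21 and 1 this sketch reaches the same tree as the slides. For the following deletion of 10 it behaves differently: because node f still has a spare key, it borrows through the root instead of merging g with f and a, and ends with a three-level tree (root 7) that is a different but equally valid B-tree of order 4.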
REASONS FOR USING B-TREES

When searching tables held on disc, the cost of each disc transfer is
high but doesn't depend much on the amount of data transferred,
especially if consecutive items are transferred
⚫ If we use a B-tree of order 101, say, we can transfer each node in one disc
read operation
⚫ A B-tree of order 101 and height 3 can hold 101^4 – 1 items (approximately 100 million) and any item can be accessed with 3 disc reads (assuming we hold the root in memory)
If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have
two or three children (i.e., one or two keys)
⚫ B-Trees are always balanced (since the leaves are all at the same level), so
2-3 trees make a good type of balanced tree
B+ TREES:
A B+ tree is an extension of the B-tree.
The difference between a B+ tree and a B-tree is that in a B-tree the keys and records can be stored in internal as well as leaf nodes, whereas in a B+ tree the records are stored only in leaf nodes and the internal nodes hold only keys that guide the search.
The leaf nodes are linked to each other in linked-list fashion.
This arrangement makes searches in B+ trees efficient, especially range scans over consecutive keys.
Internal nodes of the B+ tree are called index nodes.
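
A minimal sketch of how the linked leaves are used in a range query. All names here (`Leaf`, `Index`, `find_leaf`, `range_scan`) are assumptions made for illustration, the tree is built by hand rather than by insertion, and the convention that a key equal to a separator lives in the right-hand subtree is one common choice, not the only one.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys          # sorted keys
        self.records = records    # records[i] belongs to keys[i]
        self.next = None          # link to the next leaf in key order


class Index:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no records
        self.children = children  # len(children) == len(keys) + 1


def find_leaf(node, key):
    """Descend through the index nodes to the leaf that may contain `key`."""
    while isinstance(node, Index):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, lo, hi):
    """Return all (key, record) pairs with lo <= key <= hi."""
    leaf, result = find_leaf(root, lo), []
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key > hi:
                return result
            if key >= lo:
                result.append((key, record))
        leaf = leaf.next          # follow the linked list instead of re-descending
    return result


if __name__ == "__main__":
    l1 = Leaf([1, 4], ["r1", "r4"])
    l2 = Leaf([7, 10], ["r7", "r10"])
    l3 = Leaf([17, 21], ["r17", "r21"])
    l1.next, l2.next = l2, l3
    root = Index([7, 17], [l1, l2, l3])   # 7 and 17 also appear in the leaves
    print(range_scan(root, 4, 17))
```

Note that the separator keys 7 and 17 are repeated in the leaves, which is the redundancy of search keys that distinguishes a B+ tree from a B-tree.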
DIFFERENCE OF B TREES AND B+
TREES:

B-Tree | B+ Tree
Data is stored in leaf nodes as well as internal nodes. | Data is stored only in leaf nodes.
Searching is a bit slower as data is stored in internal as well as leaf nodes. | Searching is faster as the data is stored only in the leaf nodes.
No redundant search keys are present. | Redundant search keys may be present.
Deletion operation is complex. | Deletion operation is easy as data can be directly deleted from the leaf nodes.
Leaf nodes are not linked together. | Leaf nodes are linked together to form a linked list.
B+ Tree Node Structure
IMPORTANCE
B+ trees are used by
⚫ NTFS, ReiserFS, NSS, XFS, JFS, ReFS, and BFS file
systems for metadata indexing
⚫ BFS for storing directories.
⚫ IBM DB2, Informix, Microsoft SQL Server, Oracle 8, Sybase
ASE, and SQLite for table indexes
DATA STRUCTURES FOR
STRINGS
TRIES Data Structure
INTRODUCTION

Numbers as key values are data items of constant size, and can be compared in constant time.

In real applications, text processing is more important than the processing of numbers

We need different structures for strings than for numeric keys.
MOTIVATING EXAMPLE
Example: 112 < 467, numerical comparison in O(1).

Comparing strings lexicographically does not reflect the similarity of strings.
⚫ Western > Eastern, string comparison in O(min(|s1|,|s2|)),
where |s| denotes the length of the string s

Text fragments have a length; they are not elementary objects that the computer can process in a single step.
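
The cost bound can be seen in a hand-written comparison loop. The function `compare` below is a hypothetical helper written for this illustration (Python's built-in `<` on strings does the same work internally); it returns the comparison result together with the number of character comparisons it needed.

```python
def compare(s1, s2):
    """Lexicographic comparison.

    Returns (sign, steps): sign is -1, 0 or 1, and steps is the number of
    character pairs inspected, which is at most min(len(s1), len(s2)).
    """
    steps = 0
    for a, b in zip(s1, s2):          # zip stops at the shorter string
        steps += 1
        if a != b:
            return (-1 if a < b else 1), steps
    # One string is a prefix of the other: the shorter one is smaller.
    return (len(s1) > len(s2)) - (len(s1) < len(s2)), steps


if __name__ == "__main__":
    print(compare("Western", "Eastern"))   # decided at the first character
    print(compare("example", "exam"))      # all 4 shared characters are inspected
```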
APPLICATIONS

Bioinformatics
(DNA/RNA or protein sequence data).

Search Engines.

Spell checker.
TRIES
The basic tool for string data structures, similar in role to the balanced binary search tree, is called the “trie”

The name derives from “retrieval” (pronounced either try or tree)

In this tree, the nodes are not binary. They contain potentially one outgoing edge for each possible character, so the degree is at most the alphabet size |A|.
TRIES CONT.

Prefix Vs. Suffix.


Ex. “computer”.
Prefix:(c, co, com).
Suffix: (r, er, ter)
Each node in this tree structure corresponds to a prefix of
some strings of the set.
If the same prefix occurs several times, there is only one
node to represent it.
The root of the tree structure is the node corresponding to
the empty prefix.
TRIES EXAMPLE
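A = {a, c, e}
Strings: aaa, aaccee, ac, cc, cea, cece, eee

Trie nodes by depth (each node is labelled with the prefix it represents):
Depth 0: root (empty prefix)
Depth 1: a, c, e
Depth 2: aa, ac, cc, ce, ee
Depth 3: aaa, aac, cea, cec, eee
Depth 4: aacc, cece
Depth 5: aacce
Depth 6: aaccee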
TRIES EXAMPLE

Example: standard trie for the set of strings
S = { bear, bell, bid, bull, buy, sell, stock, stop }
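
The effect of sharing prefixes can be checked on this set. The sketch assumes a very simple representation, a trie node as a Python dict that maps a character to its child node, and the helper names `build_trie` and `count_nodes` are made up for the example. No termination character is used here, which works for S because no string in it is a prefix of another.

```python
def build_trie(words):
    """Build a trie as nested dicts: node[ch] is the child reached by ch."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})   # reuse the node if the prefix exists
    return root


def count_nodes(node):
    """Number of nodes in the trie, including the root (the empty prefix)."""
    return 1 + sum(count_nodes(child) for child in node.values())


S = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]

if __name__ == "__main__":
    trie = build_trie(S)
    print(count_nodes(trie), sum(len(w) for w in S))
```

The eight strings have 31 characters in total, but the trie needs only 21 nodes below the root (22 with the root), because each shared prefix such as "be", "bu" or "sto" is stored once.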
STRING TERMINATION
Strings are sequences of characters from some alphabet. But for use in the computer, we need one further important piece of information: how to recognize where the string ends.

There are two solutions for this:

1. We can have an explicit termination character, which is added at the end of each string but may not occur within the string, such as “\0” (ASCII code 0), or

2. We can store together with each string its length.


TERMINATION EXAMPLE
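Strings: exam, example, fail, false, tree, trie, true

Trie with termination characters (paths from the root, branches in braces):
t → r → { u → e → \0,  i → e → \0,  e → e → \0 }
f → a → { i → l → \0,  l → s → e → \0 }
e → x → a → m → { \0,  p → l → e → \0 }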
STRING TERMINATION
The use of the special termination character ’\0’ has a number of advantages in simplifying code.

It has the disadvantage of having one reserved character in the alphabet that may not occur in strings.

There are many nonprintable ASCII codes that should never occur in a text and ’\0’ is just one of them.
FIND, INSERT AND DELETE
To perform a find operation in this structure:
1. Start in the node corresponding to the empty prefix.
2. Read the query string, following for each read character
the outgoing pointer corresponding to that character to
the next node.
3. After we have read the query string, we arrive at the node
corresponding to that string as a prefix.
4. If the query string is contained in the set of strings stored
in the trie, then this node belongs to that unique string.
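
The find steps can be sketched as below, assuming the dict-based node representation (character to child node) and the termination character '\0' appended to every stored string. The names `END`, `make_trie` and `find` are assumptions for this example; `make_trie` exists only to set up test data.

```python
END = "\0"   # termination character, must not occur inside a string


def make_trie(words):
    """Helper for the example: build a trie of nested dicts with terminators."""
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})
    return root


def find(root, query):
    """Return True if `query` is one of the stored strings."""
    node = root                          # step 1: start at the empty prefix
    for ch in query + END:               # step 2: follow one pointer per character
        node = node.get(ch)
        if node is None:                 # missing pointer: the string is not stored
            return False
    return True                          # steps 3 and 4: we reached the string's node


if __name__ == "__main__":
    trie = make_trie(["exam", "example", "fail", "false", "tree", "trie", "true"])
    for q in ("exam", "exa", "example", "trial"):
        print(q, find(trie, q))
```

Including the terminator in the walk is what separates a stored string from a mere prefix: "exa" leads to a node in the trie, but that node has no '\0' edge, so the query fails, while "exam" succeeds even though it is a prefix of "example".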
FIND, INSERT AND DELETE
To perform an insert operation in this structure:
1. Perform find
2. Any time we encounter a null pointer we create a new node.

Example:
⚫ Insert “extra”
e → x → { t → r → a → \0,  a → m → \0 }
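
A sketch of insertion under the same assumptions as before (dict-based nodes, '\0' as termination character). The function name `insert` and its return value, the number of newly created nodes, are choices made for this illustration so that the effect of shared prefixes is visible.

```python
END = "\0"   # termination character appended to every stored string


def insert(root, word):
    """Insert `word` into the trie; return how many new nodes were created."""
    node, created = root, 0
    for ch in word + END:
        if ch not in node:        # "null pointer": no child for this character yet
            node[ch] = {}
            created += 1
        node = node[ch]           # otherwise just follow the existing pointer
    return created


if __name__ == "__main__":
    root = {}
    print(insert(root, "exam"))    # e, x, a, m and the terminator are all new
    print(insert(root, "extra"))   # e and x are reused; t, r, a, terminator are new
    print(insert(root, "extra"))   # already present: nothing is created
```

Inserting "exam" into an empty trie creates 5 nodes, inserting "extra" afterwards creates only 4, and inserting a string that is already present creates none.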
FIND, INSERT AND DELETE
To perform a delete operation in this structure:
1. Perform find
2. Delete all nodes on the path from ‘\0’ to the root of the tree unless we reach a node with more than 1 child

Example:
⚫ Delete “extra”
e → x → { t → r → a → \0,  a → m → \0 }
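
A sketch of deletion, again assuming dict-based nodes and the '\0' termination character; the names `insert`, `find` and `delete` are illustrative. The walk down records the path, and the walk back up removes nodes until it meets a node that still has another child.

```python
END = "\0"


def insert(root, word):
    node = root
    for ch in word + END:
        node = node.setdefault(ch, {})


def find(root, word):
    node = root
    for ch in word + END:
        if ch not in node:
            return False
        node = node[ch]
    return True


def delete(root, word):
    """Remove `word`; return False if it was not stored."""
    path, node = [], root
    for ch in word + END:                 # step 1: perform find, remembering the path
        if ch not in node:
            return False
        path.append((node, ch))
        node = node[ch]
    for parent, ch in reversed(path):     # step 2: walk back from the terminator
        del parent[ch]
        if parent:                        # parent still has another child: stop here
            break
    return True


if __name__ == "__main__":
    root = {}
    insert(root, "exam")
    insert(root, "extra")
    print(delete(root, "extra"), find(root, "exam"), find(root, "extra"))
```

In the example, deleting "extra" removes the terminator and the nodes for a, r and t, then stops at the node for "ex", which still has the child leading to "exam".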
THANK YOU
