
TREE STRUCTURED INDEXING

Dr. Hari Om Gupta


Professor, Department of Electrical Engineering
IIT Roorkee
Indexed Sequential Access Method (ISAM)

P0 K1 P1 K2 P2 … Km Pm

Format for an Index Page or Block
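The page format above can be read as a routing rule: pointer Pi is followed for search keys that fall between Ki and Ki+1. A minimal sketch of that lookup, assuming the convention Ki <= key < Ki+1 (the function name `child_for`, the sample keys and the string pointers are hypothetical, chosen only for illustration):

```python
from bisect import bisect_right

def child_for(keys, pointers, search_key):
    """Return the pointer to follow for search_key.

    keys     = [K1, ..., Km] in ascending order
    pointers = [P0, P1, ..., Pm]
    Assumed convention: Pi covers keys with Ki <= key < Ki+1,
    and P0 covers keys smaller than K1.
    """
    assert len(pointers) == len(keys) + 1
    # bisect_right counts how many Ki are <= search_key,
    # which is exactly the index of the pointer to follow.
    return pointers[bisect_right(keys, search_key)]

keys = [18, 33]                 # illustrative K1, K2
pointers = ["P0", "P1", "P2"]   # illustrative page ids

print(child_for(keys, pointers, 27))
```

Because the keys inside a page are sorted, the lookup is a binary search within the page and costs no additional disk I/O.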


One level index structure

Index file: K1 K2 … Kn

Data file: Page 1 Page 2 Page 3 … Page n+1
ISAM INDEX STRUCTURE

NON-LEAF PAGES

LEAF PAGES: PRIMARY PAGES and OVERFLOW PAGES
Non-leaf pages contain index entries of the form
(search key value, page id).
Data entries can use alternative (1), (2) or (3), but alternative (2) or (3),
e.g. (key, rid), is commonly used.

Data pages

Index pages

Overflow pages

Page allocation in ISAM


Root: 42

Non-leaf pages: 18 33 | 51 64

Leaf pages: 10* 15* | 20* 27* | 33* 37* | 42* 46* | 51* 55* | 64* 97*

Sample ISAM Tree


Root: 42

Non-leaf pages: 18 33 | 51 64

Leaf pages (primary): 10* 15* | 20* 27* | 33* 37* | 42* 46* | 51* 55* | 64* 97*

Overflow pages: 23* (below 20* 27*); 52* 63*, then 53* (below 51* 55*)

ISAM Tree after inserts
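The inserts shown above can be reproduced with a small model of the sample tree. This is only a sketch under stated assumptions: two data entries per page (as the figure suggests), a fixed two-level index, and hypothetical names (`find_leaf`, `insert`, `search`, `leaves`); it does not model disk pages.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 2  # assumed: two data entries per page, as in the sample tree

# Static index: it is never modified after the file is built.
ROOT_KEYS = [42]
INDEX_KEYS = [[18, 33], [51, 64]]

# One chain per primary leaf page: chain[0] is the primary page,
# chain[1:] are its overflow pages.
leaves = [[[10, 15]], [[20, 27]], [[33, 37]],
          [[42, 46]], [[51, 55]], [[64, 97]]]

def find_leaf(key):
    """Walk the static index and return the page chain covering key."""
    i = bisect_right(ROOT_KEYS, key)        # which non-leaf page
    j = bisect_right(INDEX_KEYS[i], key)    # which child of that page
    return leaves[i * 3 + j]                # each non-leaf page has 3 children

def insert(key):
    chain = find_leaf(key)
    for page in chain:
        if len(page) < PAGE_CAPACITY:
            insort(page, key)               # room on an existing page
            return
    chain.append([key])                     # all full: add an overflow page

def search(key):
    return any(key in page for page in find_leaf(key))

for k in (23, 52, 63, 53):
    insert(k)

print(leaves[1])  # chain of the 20* 27* page
print(leaves[4])  # chain of the 51* 55* page
```

The index keys never change, so 63* ends up in the chain of the 51* 55* page even though a neighbouring page could hold it in a dynamic structure.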


To search for a record, the number of disk I/Os equals the number
of levels of the tree, which is
log_F(N), rounded up
N = number of primary leaf pages; F = fan-out

Example: a file of 1,000,000 records, 10 records/page and F = 100

N = 1,000,000/10 = 100,000
Levels = log_100(100,000) = 2.5, rounded up to 3
Thus 3 I/Os are needed to reach the required page.

If we use binary search on the sorted file instead, the number of steps
is
log_2(100,000) ≈ 16.6, i.e. 17 steps
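The arithmetic above can be checked with a few lines of code. The sketch below counts levels with integer multiplication instead of a floating-point logarithm, so the rounding up is explicit; the function name `index_levels` is hypothetical.

```python
from math import ceil, log2

def index_levels(leaf_pages, fan_out):
    """Smallest number of levels L such that fan_out**L >= leaf_pages."""
    levels, reach = 0, 1
    while reach < leaf_pages:
        reach *= fan_out
        levels += 1
    return levels

records, per_page, F = 1_000_000, 10, 100
N = records // per_page          # number of primary leaf pages

print(N)                         # 100000
print(index_levels(N, F))        # 3
print(ceil(log2(N)))             # 17
```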

Deletion in ISAM: EMPTY OVERFLOW PAGES ARE
RELEASED WHEREAS EMPTY PRIMARY PAGES ARE RETAINED
Root: 42

Non-leaf pages: 18 33 | 51 64

Leaf pages (primary): 10* 15* | 20* 27* | (empty) | 42* 46* | 51* 55* | 64* 97*

Overflow pages: 23* (below 20* 27*); 52* 63* (below 51* 55*)

ISAM Tree after deletes (53, 33, 37)
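The deletes in the figure can be sketched on a single page chain. The sketch assumes that an emptied overflow page is released while an emptied primary page stays allocated as a placeholder; the function name `delete` and the list-of-pages representation are hypothetical.

```python
def delete(chain, key):
    """Delete key from a page chain.

    chain[0] is the primary leaf page, chain[1:] are its overflow pages.
    Returns True if the key was found.
    """
    for page in chain:
        if key in page:
            page.remove(key)
            break
    else:
        return False
    # Release overflow pages that became empty; the primary page
    # (chain[0]) is kept even if it is empty.
    chain[1:] = [page for page in chain[1:] if page]
    return True

chain_33 = [[33, 37]]                    # primary page, no overflow
chain_51 = [[51, 55], [52, 63], [53]]    # primary page plus two overflow pages

for chain, key in ((chain_51, 53), (chain_33, 33), (chain_33, 37)):
    delete(chain, key)

print(chain_51)  # the 53* overflow page is gone
print(chain_33)  # the primary page remains, now empty
```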
Disadvantages
Long overflow chains increase retrieval time.
To overcome this:
(a) Initially, 20% of each page is kept free.
(b) Overflow chains are eliminated by a complete reorganization
of the file.
There may be too many blank pages if the data shrinks.

Advantages
Index-level pages need not be locked, so queuing and waiting
time to access a page is lower than in a B+ tree.
The structure is static, so response time is low as long as overflow pages and
blank pages are few.
B+ TREE
(A DYNAMIC INDEX STRUCTURE)

• Operations (insert, delete) on the tree keep it balanced.
• A minimum occupancy of 50% is guaranteed for each node except
the root node.
• Searching for a record requires just a traversal from the root node
to the appropriate leaf.
• Leaf pages are sorted and have pointers linking each page to the
next.
• Height is normally 3 or 4.
• Index-level pages must be locked because the tree changes dynamically.
B+ tree structure

Non-leaf nodes: index entries (index file)

Leaf pages: data entries, Page 1 Page 2 Page 3 … Page n (a sequence set, i.e. a sorted file)


B+ Tree

A node other than the root node may contain m entries such that
d <= m <= 2d
2d is the capacity of a page (max. number of entries a page may store).
d is a parameter of the B+ tree called the order of the tree.
The leaf pages are chained together in a doubly linked list.
As in ISAM, alternative (2) or (3) is commonly used for data entries.
As a general rule, however, B+ trees are likely to perform better than ISAM.

Search
Insert
Root: 14 17 24 30

Leaf pages: 1 3 5 7 | 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41

B+ tree, order d=2
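Searching this tree means picking one child of the root and then looking inside a single leaf; a range query continues along the leaf chain. A minimal sketch over the sample tree above, assuming keys equal to a separator go to the right child; the list-based layout and the names `find_leaf`, `search` and `range_scan` are hypothetical.

```python
from bisect import bisect_right

root_keys = [14, 17, 24, 30]
# Leaves in key order; moving to index i + 1 stands in for following
# the "next" pointer of the doubly linked leaf chain.
leaves = [[1, 3, 5, 7], [15, 16], [19, 20, 23], [24, 26, 29], [33, 34, 39, 41]]

def find_leaf(key):
    """Index of the leaf that would hold key."""
    return bisect_right(root_keys, key)

def search(key):
    return key in leaves[find_leaf(key)]

def range_scan(lo, hi):
    """All keys k with lo <= k <= hi, read by walking the leaf chain."""
    result = []
    i = find_leaf(lo)
    while i < len(leaves):
        for k in leaves[i]:
            if k > hi:
                return result
            if k >= lo:
                result.append(k)
        i += 1
    return result

print(search(20))          # True
print(search(8))           # False
print(range_scan(16, 26))  # [16, 19, 20, 23, 24, 26]
```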


Insert record with key value 8

Split leaf pages during insert of entry 8

Entry to be inserted in the parent node: 5
Left side (d entries): 1 3
Right side (d+1 entries): 5 7 8
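The leaf split for entry 8 can be written out as a short function. This is a sketch assuming the leaf is exactly full (2d entries) before the insert and that the first key of the right page is copied up to the parent; the name `split_leaf` is hypothetical.

```python
from bisect import insort

def split_leaf(entries, new_key, d):
    """Split a full leaf (2d entries) while inserting new_key.

    Returns (left, right, copy_up): the left page keeps d entries,
    the right page gets d + 1, and copy_up is the key handed to the parent.
    """
    assert len(entries) == 2 * d, "only a full leaf needs to be split"
    all_entries = list(entries)
    insort(all_entries, new_key)           # 2d + 1 entries in sorted order
    left, right = all_entries[:d], all_entries[d:]
    return left, right, right[0]           # the key stays in the leaf as well

left, right, copy_up = split_leaf([1, 3, 5, 7], 8, d=2)
print(left, right, copy_up)  # [1, 3] [5, 7, 8] 5
```

The key 5 is copied rather than moved, because every data entry has to remain in a leaf page.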
Root: 17

Index pages: 5 13 | 24 30

Leaf pages: 1 3 | 5 7 8 | 15 16 | 19 20 23 | 24 26 29 | 33 34

After inserting record with key value 8


Insert
Root: 8 17 24 30

Leaf pages: 1 3 5 7 | 8 15 16 | 19 20 23 | 24 26 29 | 33 34 39 41

B+ tree after inserting Entry 8 using redistribution
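Redistribution avoids the split by pushing the overflowing entry into a sibling that still has room. A minimal sketch, assuming only the right-hand neighbour is checked and the left leaf is full; the name `insert_with_redistribution` is hypothetical.

```python
from bisect import insort

def insert_with_redistribution(left, right, new_key, capacity):
    """Insert new_key into the full leaf `left` by shifting its largest
    entry into the right sibling.

    Returns the new separator key for the parent, or None if the sibling
    is full too (in which case the caller has to split instead).
    """
    if len(right) >= capacity:
        return None
    insort(left, new_key)
    right.insert(0, left.pop())   # move the largest entry to the sibling
    return right[0]               # parent separator must be updated to this

left, right = [1, 3, 5, 7], [15, 16]
separator = insert_with_redistribution(left, right, 8, capacity=4)
print(left, right, separator)  # [1, 3, 5, 7] [8, 15, 16] 8
```

The parent entry between the two leaves has to be replaced by the returned separator, which is why the first root key in the figure becomes 8.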

For Redistribution
The sibling with an empty cell has to be retrieved.
Checking whether redistribution is possible increases the I/O for index node splits.
Thus redistribution may not be advantageous at the index level.
For growing files
(a) Do not redistribute for non-leaf vacancies
(b) Limited redistribution (only with neighbours) for leaf pages
Delete
Search for the entry and delete it, subject to the following restriction:
If a node is at minimum occupancy before the deletion, the
deletion takes it below the occupancy threshold. When
this happens, we must either redistribute entries from an
adjacent sibling, or merge the node with a sibling, in order to
maintain minimum occupancy.
Root: 17

Index pages: 5 13 | 24 30

Leaf pages: 1 3 | 5 7 8 | 15 16 | 19 20 | 24 26 29 | 33 34

After deleting record with key value 23


Before merge: index page 24 30; leaf pages 19 | 24 26 29 | 33 34 39 41

After merge: index page 30; leaf pages 19 24 26 29 | 33 34 39 41

Partial B+ Tree during deletion of Entry 20


Root: 5 13 17 30

Leaf pages: 1 3 | 5 7 8 | 15 16 | 19 24 26 29 | 33 34 39 41

B+ Tree during deletion of Entry 20
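The choice between merging and redistributing after a deletion can be sketched as follows. The policy here is an assumption chosen to match the figures above: merge whenever the two pages fit into one, otherwise share the entries evenly. Trying redistribution first is an equally valid policy. The name `fix_underflow` and the return shape are hypothetical.

```python
def fix_underflow(node, sibling, d):
    """Repair a leaf `node` that dropped below d entries.

    `sibling` is its right neighbour. Returns (action, pages, separator),
    where separator is the key the parent needs between the two pages
    (None after a merge, because the parent entry is removed).
    """
    assert len(node) < d, "node is not below minimum occupancy"
    combined = node + sibling
    if len(combined) <= 2 * d:
        return "merge", [combined], None
    half = len(combined) // 2
    left, right = combined[:half], combined[half:]
    return "redistribute", [left, right], right[0]

# Deleting 20 leaves the page [19]; its sibling is [24, 26, 29], d = 2.
print(fix_underflow([19], [24, 26, 29], d=2))
```

After the merge the parent index page loses one entry, so the same check has to be repeated one level up, which is how the root in the figure ends up holding 5 13 17 30.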


Duplicates

ISAM
Overflow pages
B+ Tree
Use (key, rid) as the search key, i.e. the record id may be used
along with the key value.
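Treating (key, rid) as a composite search key makes every entry unique, so duplicates need no overflow pages. A small sketch using tuples as composite keys; the sample entries, the integer rids and the name `rids_for` are hypothetical.

```python
from bisect import bisect_left

# Data entries as (key, rid) pairs, kept sorted on the composite key.
entries = sorted([(20, 7), (20, 3), (15, 1), (27, 4), (20, 9)])

def rids_for(key):
    """Return the rids of all entries with the given key value."""
    # (key,) sorts before every (key, rid), so this finds the first duplicate.
    i = bisect_left(entries, (key,))
    rids = []
    while i < len(entries) and entries[i][0] == key:
        rids.append(entries[i][1])
        i += 1
    return rids

print(rids_for(20))  # [3, 7, 9]
```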
B+ Tree in Practice

Key Compression and Code


Fan-out of the tree = F
Height of the tree = log_F(number of data entries)
Thus the higher the fan-out, the lower the tree height and
the shorter the response time.
To increase the fan-out, use a small search key.
A code is normally used to reduce the size of the
search key.
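The effect of key size on height can be illustrated numerically. All sizes below are assumptions chosen for the example (4096-byte pages, 8-byte page pointers, 40-byte versus 8-byte keys); the function names are hypothetical.

```python
def fan_out(page_size, key_size, pointer_size):
    """Approximate number of (key, pointer) pairs that fit in one index page."""
    return page_size // (key_size + pointer_size)

def height(entries, f):
    """Smallest h such that f**h >= entries, i.e. log_F rounded up."""
    h, reach = 0, 1
    while reach < entries:
        reach *= f
        h += 1
    return h

PAGE, POINTER = 4096, 8      # assumed sizes in bytes
ENTRIES = 1_000_000          # assumed number of data entries

for key_size in (40, 8):
    f = fan_out(PAGE, key_size, POINTER)
    print(key_size, f, height(ENTRIES, f))
```

Under these assumptions, shrinking the key from 40 to 8 bytes raises the fan-out from 85 to 256 and saves one level.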
Index entries before key compression: Dani Lal Man | Data Ram Verma | Devand Sagar

Data entries: Dani Lal Man, Dara Singh, ...

After key compression

Index entries: Dan | Dat | Deva
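One way to compress an index key is to keep only the shortest prefix that still separates the two neighbouring subtrees. The sketch below applies that idea to two of the names from the slide; it is one possible rule, not necessarily the one behind the figure, and the name `shortest_separator` is hypothetical.

```python
def shortest_separator(left_max, right_min):
    """Shortest prefix p of right_min such that left_max < p <= right_min.

    left_max  : largest key in the subtree to the left of the index entry
    right_min : smallest key in the subtree to its right
    """
    assert left_max < right_min
    for n in range(1, len(right_min) + 1):
        prefix = right_min[:n]
        if prefix > left_max:
            return prefix
    return right_min

print(shortest_separator("Dara Singh", "Data Ram Verma"))  # Dat
```

The prefix must be compared with the largest key to its left, not only with the previous index entry, otherwise searches could be routed to the wrong page.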
B+ Tree Bulk Loading
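Root: (no entries yet)
Sorted pages of data entries not yet in B+ tree:
1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41

Root: 6 9
Leaf pages: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
(the later pages are sorted data entries not yet in the B+ tree)

Root: 9
Index pages: 6 | 12
Leaf pages: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
(the later pages are sorted data entries not yet in the B+ tree)

Root: 9 15
Index pages: 6 | 12 | 20 31
Leaf pages: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41
(the later pages are data entries not yet in the B+ tree)

Root: 15
Index pages (level 2): 9 | 31
Index pages (level 3): 6 | 12 | 20 | 39
Leaf pages: 1 5 | 6 8 | 9 10 | 12 14 | 15 16 | 20 22 | 31 35 | 39 41

Complete B+ Tree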
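The stages above follow from always adding the next leaf page to the rightmost index page and splitting that page when it overflows. A minimal sketch, assuming index pages hold at most two keys (as the figures suggest) and that the first key of a leaf page is used as its separator; `Node`, `bulk_load` and `levels` are hypothetical names.

```python
class Node:
    """Index page: len(children) == len(keys) + 1."""
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

def bulk_load(pages, max_keys=2):
    """Build the index over already-sorted leaf pages, left to right."""
    root = Node(children=[pages[0]])
    path = [root]                       # rightmost index page per level, root first
    for page in pages[1:]:
        key, child = page[0], page      # separator = first key on the page
        level = len(path) - 1           # start just above the leaves
        while True:
            node = path[level]
            node.keys.append(key)
            node.children.append(child)
            if len(node.keys) <= max_keys:
                break
            # Overflow: split and push the middle key up one level.
            mid = len(node.keys) // 2
            key_up = node.keys[mid]
            right = Node(node.keys[mid + 1:], node.children[mid + 1:])
            node.keys = node.keys[:mid]
            node.children = node.children[:mid + 1]
            path[level] = right
            key, child = key_up, right
            if level == 0:              # the root itself was split
                root = Node([key_up], [node, right])
                path.insert(0, root)
                break
            level -= 1
    return root

def levels(root):
    """Keys of every index page, level by level from the root down."""
    out, frontier = [], [root]
    while frontier and isinstance(frontier[0], Node):
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out

pages = [[1, 5], [6, 8], [9, 10], [12, 14], [15, 16], [20, 22], [31, 35], [39, 41]]
tree = bulk_load(pages)
print(levels(tree))  # [[[15]], [[9], [31]], [[6], [12], [20], [39]]]
```

Since inserts always go to the rightmost path, each split touches only pages that are already in memory, which is what makes bulk loading cheaper than inserting the entries one at a time.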
Multidimensional Indexes
Summary
Other variations of the B tree: not popular
