Indexing Structures For Files: Database Design Database Design

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

Chapter Outline

Database Design

Types of Single-level Ordered Indexes


Chapter 14 

 Primary Indexes
 Clustering Indexes
Indexing Structures for Files  Secondary Indexes
 Multilevel Indexes
 Dynamic Multilevel Indexes Using B-Trees and B+-
Trees
CS 6360.501 (Fall 2009)
Instructor: Sunan Han
The University of Texas at Dallas

Slide 14- 2

Indexes as Access Paths Indexes as Access Paths


 Assume a file of records already exists with some primary  The index file usually occupies considerably less
organization described in Ch13 (ordered, unordered or disk blocks than the data file because its entries are
hashed) much smaller (can be easily stored in memory)
 Indexes are additional auxiliary access structures that are
 A binary search on the index yields a pointer to the
used to speed up the retrieval of records
file record
 In a database system, index structures provide an efficient
secondary access path without affecting the physical  The index is usually specified on one field of the file
placement of records on disk (although it could be specified on several fields)
 Indexes can also be characterized as dense or sparse  One form of an index is a file of entries <field value,
 A dense index has an index entry for every search key value pointer to record>, which is ordered by field value
(and hence every record) in the data file.
 A sparse (or nondense) index, on the other hand, has index
 The index is called an access path on the field
entries for only some of the search values
Slide 14- 3 Slide 14- 4
Single-Level Indexes: Primary Index
Primary index
 Defined on an ordered data file on the ordered
 The data file is ordered on a key field key field
 Includes one index entry for each block in the data file;
the index entry has the key field value for the first record
in the block, which is called the block anchor
 A similar scheme can use the last record in a block.
 A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file and
the keys of its anchor record rather than for every search
value.

Slide 14- 5 Slide 14- 6

Example-1 Single-Level Indexes: Clustering Index


 Suppose that records are key-field ordered in the file and that:
 fixed record size R = 100 bytes, block size B = 1024 bytes, number of
records r = 30,000  Defined on an ordered data file
 The records are unspanned. Then, we get:  The data file is ordered on a non-key field unlike primary
 blocking factor Bfr = B/R = 1024/100 = 10 records/block
index, which requires that the ordering field of the data
 number of file blocks b =  r/Bfr = 30000/10 = 3000 blocks
 For an index on the key field, assume the field size V = 9 bytes,
file have a distinct value for each record.
assume the record pointer size P = 6 bytes. Then:  Includes one index entry for each distinct value of the
 index entry size Ri = (V + P) = (9 + 6) = 15 bytes field; the index entry points to the first data block that
 index blocking factor Bfri = B/Ri  = 1024/15  = 68 entries/block contains records with that field value.
 number of index blocks bi = b/ Bfri = 3000/68 = 45 blocks
(One index entry corresponds to one file block for sparse indexing)  It is another example of nondense index where Insertion
 binary search needs log2bi = log245 = 6 block accesses. It requires an and Deletion is relatively straightforward with a
additional block access to the data file => total block access is 7 clustering index.
 Without index, the binary search cost on the file itself would be:
 log2b = log23000 = 12 block accesses
Slide 14- 7 Slide 14- 8
A Clustering
Index Another Clustering
Example Index Example:
Records of same
clustering field value
are in separate blocks

Slide 14- 9 Slide 14- 10

Single-Level Indexes: Secondary Index


 A secondary index provides a secondary means of accessing
a file for which some primary access already exists.
 The secondary index may be on a field which is a candidate Example of a
key and has a unique value in every record, or a non-key with Dense
duplicate values. They are non-ordering fields
 The index is an ordered file with two fields.
Secondary Index
 The first field is of the same data type as some non-ordering for a Key Field
field of the data file that is an indexing field.
 The second field is either a block pointer or a record pointer.
 There can be many secondary indexes (and hence, indexing
fields) for the same file, for different records fields
 Includes one entry for each record in the data file; hence, it is
a dense index
Slide 14- 11 Slide 14- 12
Example-2
 Suppose that:
 fixed record size R = 100 bytes, block size B = 1024 bytes, number of
records r = 30,000
 The records are unspanned. Then, we get:
 blocking factor Bfr = B/R = 1024/100 = 10 records/block Example of a
 number of file blocks b =  r/Bfr = 30000/10 = 3000 blocks
 We construct a secondary index on a nonordering candidate key
Secondary Index
field, assume the field size V = 9 bytes, assume the record pointer for a Nonkey
size P = 6 bytes. Then:
 index entry size Ri = (V + P) = (9 + 6) = 15 bytes
Field
 index blocking factor Bfri = B/Ri  = 1024/15 = 68 entries/block
 number of index blocks bi = r/ Bfri = 30000/68 = 442 blocks
(Each index entry corresponds to one file record for dense indexing)
 binary search needs log2bi = log2442 = 9 block accesses (10 is the
total for the final data file block access)
 This is compared to an average linear search cost (w/o index) of:
 (b/2) = 3000/2 = 1500 block accesses
Slide 14- 13 Slide 14- 14

Summary of Single-Level Indexing Multi-Level Indexes

 Primary indexing reduces the search cost of the original file


search on an ordering field
 Because a single-level index is an ordered file, we can create
a primary index to the index itself to further reduce the
search cost
 In this case, the original index file is called the first-level index
and the index to the index is called the second-level index.
 We can repeat the process, creating a third, fourth, ..., top
level until all entries of the top level fit in one disk block
 A multi-level index can be created for any type of first-level
index (primary, secondary, clustering) as long as the first-
level index consists of more than one disk block
Slide 14- 15 Slide 14- 16
Example-3
 In example 1 (sparse primary index)
Two-level Primary  fixed record size R = 100 bytes, block size B = 1024 bytes, number of
records r = 30,000
Index  blocking factor Bfr = B/R = 1024/100 = 10 records/block
 number of file blocks b = r/Bfr = 30000/10 = 3000 blocks
 index entry size Ri = (V + P) = (9 + 6) = 15 bytes
 index blocking factor Bfri = B/Ri  = 1024/15  = 68 entries/block
(This is called the fan-out factor of the multi-level index)
 number of index blocks b1 = b/ Bfri = 3000/68 = 45 blocks
 binary search needs log2b1 = log245 = 6 block accesses (7 is the total
for an additional access to the data file block)
 For the second-level index to the 45 first-level index file blocks:
 number of index blocks b2 = b1 / Bfri = 45/68 = 1 block
 Total file block access is 1 (2nd-level) + 1 (1st-level) + 1 (data file) =
3
Slide 14- 17 Slide 14- 18

Multi-Level Indexes Search Trees

 Such a multi-level index is a form of search tree  A search tree of order p is a tree such that each
node contains at most p-1 search values and p
 However, insertion and deletion of new index entries
pointers in the order <P1,K1,P2,K2, …, Pq-1,Kq-1,Pq>,
is a severe problem because every level of the index
where q ≤ p, each Pi is a pointer to a child node, or
is an ordered file
null and each Ki is a unique search value from some
 This leads to dynamic multi-level indexes ordered set of values, and the following must hold:
 Dynamic multi-level indexing leaves some additional 1. Within each node, K1 < K2 < …, < Kq-1
space in each block for inserting new entries 2. For all values X in the subtree pointed at by Pi,
Ki-1 < X < Ki, for 1<i<q and X < Ki if i=1, and Ki-1 < X
if i=q

Slide 14- 19 Slide 14- 20


A Node in a Search Tree with Pointers to FIGURE 14.9
Subtrees below It A search tree of order p = 3.

Slide 14- 21 Slide 14- 22

Dynamic Multilevel Indexes Using B-Trees and Dynamic Multilevel Indexes Using B-Trees and
B+-Trees B+-Trees
 In B-Tree and B+-Tree data structures, each node  An insertion into a node that is not full is quite
corresponds to a disk block efficient
 Most multi-level indexes use B-tree or B+-tree data  If a node is full the insertion causes a split into two
structures because of the insertion and deletion problem nodes
 Space has to be reserved in each tree node to allow for  Splitting may propagate to other tree levels
new index entries  A deletion is quite efficient if a node does not
 Each node is kept between half-full and completely full become less than half full
 If a deletion causes a node to become less than half
full, it must be merged with neighboring nodes

Slide 14- 23 Slide 14- 24


Difference between B-tree and B+-tree B-Trees
 When used as an access structure on a key field in a data
file, a B-Tree of order p can be defined as follows
 In a B-tree, pointers to data records exist at all levels 1. Each internal node in the B-tree is of the form
of the tree <P1,<K1,Pr1>,P2,<K2,Pr2>, …, Pq-1,<Kq-1,Prq>,Pq>, where q ≤ p,
each Pi is a tree pointer and Pri is a data pointer to the record
 In a B+-tree, all pointers to data records exists at the whose search key field value is Ki (or the block containing the
leaf-level nodes record)
 A B+-tree can have less levels (or higher capacity of 2. Within each node, K1 < K2 < …, < Kq-1
search values) than the corresponding B-tree 3. For all values X in the subtree pointed at by Pi,
Ki-1 < X < Ki, for 1<i<q and X < Ki if i=1, and Ki-1 < X if i=q
4. Each internal node has at least p/2 tree pointers
5. A node with q tree pointers (q ≤ p) has q-1 search key field
values (and q-1 data pointers)
6. All leaf nodes are at the same level and have their tree
pointers to be null
Slide 14- 25 Slide 14- 26

B-tree Structures B-Trees Insertion


 It starts with a single root node at level 0
 When it’s full with p-1 key values and an insertion occurs, two
nodes at level 1 are created and all values except the middle
one are evenly distributed in the two new nodes
 The root keeps the middle value and adds two tree pointers
to the new split nodes
 When any node in the B-tree is full, it undergoes the same
process to split into two node at the next level
 When a node used up all its tree pointers (can not be split
any more) the split will propagate upwards
 If it happens at the root, the root is split and a new root and
therefore a new tree level is added
Slide 14- 27 Slide 14- 28
B-Trees Deletion B+-Trees
 The internal nodes are similar a search tree defined
 When deletion of a record causes two neighboring earlier (<P1,K1,P2,K2, …, Pq-1,Kq-1,Pq>) except that
nodes to be less than half full, a merge will happen  Ki-1 < X ≤ Ki, for 1<i<q and X ≤ Ki if i=1, and Ki-1 < X if i=q
 Each internal node has at least p/2 tree pointers
 This merge may cause a reduction of a tree level
 The leaf nodes are define as follows
 <<K1,Pr1>,<K2,Pr2>, …, ,<Kq-1,Prq>,Pnext>, where q ≤ p, each
Pri is a data pointer to the record whose search key field
value is Ki, or to a file block containing the record. Pnext
points to the next leaf node
 K1 < K2 < …, < Kq-1
 Each leaf node has at least p/2 values, or a
redistribution/deletion is needed
 All leaf nodes are at the same level
Slide 14- 29 Slide 14- 30

The Nodes of a B+-tree

An Example
of an Insertion
in a B+-tree

Internal nodes form paths


to the leaf nodes that point
to the actual data

2 levels => up to 9 leaves

Slide 14- 31 Slide 14- 32


An Example of a
Deletion in a B+- Summary
tree
 Types of Single-level Ordered Indexes
 Primary Indexes
Causes a redistribution
at the same level  Clustering Indexes
 Secondary Indexes
 Multilevel Indexes
 Dynamic Multilevel Indexes Using B-Trees and B+-
Causes a redistribution Trees
at higher levels

Slide 14- 33 Slide 14- 34

Assignment #12

 Page 545: 14.14 a, b, c, d, e


 Due date 11/23/09

Slide 14- 35

You might also like