
Disk Storage

Basic File Structures


Indexing

CS 421 / 720
Spring 2015
1
Disk Storage Device
• Data stored as magnetized areas on magnetic disk surfaces.
• A disk pack contains several magnetic disks connected to a rotating spindle.
• Disks are divided into concentric circular tracks on each disk surface.
• The division of a track into sectors is hard-coded on the disk surface and
cannot be changed.
– In one type of sector organization, a sector is the portion of a track that subtends a fixed angle at the center.
– In another, different tracks have different numbers of sectors so as to maintain a uniform recording density.

2
Disk Storage Devices
 The block size B is fixed during disk formatting.
 Whole blocks are transferred between disk and main memory for processing.
• A (movable) read-write head moves to the track that contains the block to be transferred.
– Disk rotation moves the block under the read-write head for reading or writing.
• A physical disk block (hardware) address consists of:
– a cylinder number (imaginary collection of tracks of same radius from all
recorded surfaces)
– the track number or surface number (within the cylinder)
– and block number (within track).
• Reading or writing a disk block is time consuming because of
the seek time (positioning on right track) and rotational delay
(latency)

3
Records
• Records contain fields which have values of a particular
type
– E.g., amount, date, time, age
• Records could be fixed or variable length
• Fields themselves may be fixed length or variable length
– Fixed: integer, string (char), boolean, date…
– Variable: String (varchar), user defined types
– BLOB (Binary Large Objects), images, video and audio streams,
stored separately, pointer in a record
• Variable length fields can be mixed into one record:
– Separator characters or length fields are needed so that the
record can be “parsed.”

4
Three record storage formats
A fixed-length record with six fields and size of 71 bytes.

A record with two variable-length fields and three fixed-length fields.

A variable-field record with three types of separator characters.

5
Blocking
• Blocking: storing a number of records in one block on the disk.
• Blocking factor (bfr): the number of records per block,
rounded down (average number for variable record size)
– There may be empty space in a block if an integral number
of records do not fit in one block.
• Spanned Records:
– Records that are allowed to cross block boundaries (for example, because a record is larger than a single block), so that one record spans two or more blocks.
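The blocking-factor arithmetic can be sketched in Python (a minimal illustration; the function name and parameters are hypothetical):

```python
import math

def blocking(block_size_b, record_size_r, num_records):
    """Blocking factor, file size in blocks, and unused bytes per block
    for unspanned fixed-length records."""
    bfr = block_size_b // record_size_r           # records per block, rounded down
    blocks = math.ceil(num_records / bfr)         # blocks needed to hold the file
    unused = block_size_b - bfr * record_size_r   # leftover space in each block
    return bfr, blocks, unused

# 1024-byte blocks, 100-byte records, 30,000 records:
print(blocking(1024, 100, 30000))  # (10, 3000, 24)
```

With spanned records the leftover bytes would instead hold the first part of the next record.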

6
Files of Records
• A file is a sequence of records, where each record is a
collection of data values (or data items).
• A file descriptor (or file header) includes information that
describes the file, such as the field names and their data types,
and the addresses of the file blocks on disk.
• A file can have fixed-length or variable-length records.
• File records can be unspanned or spanned
• Primary file organizations: heap, ordered or hashed
• The physical disk blocks that are allocated to hold the records
of a file can be contiguous, linked, or indexed.

7
Heap: Unordered Files

• A heap file is also called a pile file.
• New records are inserted at the end of the file.
• A linear search through the file records is
necessary to search for a record.
– This requires reading and searching half the file blocks
on the average, and is hence quite expensive.
• Record insertion is quite efficient.
• Reading the records in order of a particular field
requires sorting the file records.
8
Ordered (Sequential) File
• File records are kept sorted by the values
of an ordering field.
• Insertion is expensive: records must be
inserted in the correct order.
– It is common to keep a separate
unordered overflow file for new
records to improve insertion
efficiency; this is periodically merged
with the main ordered file.
• A binary search can be used to search for
a record on its ordering field value.
– This requires reading and searching
log2 of the file blocks on the average,
a big improvement over linear search.
• When used with an index, this organization becomes an indexed-sequential file.
9
Average Access Times

• The average access time to access a specific record for a given type of file organization.

10
Hashed Files
• Hashing for disk files is called External Hashing
– Internal hashing: the hash function returns a memory address (a particular array location).
– External hashing: the hash function returns the block address of the bucket holding all records that hash to it.
• The file blocks are divided into M equal-sized buckets, numbered b0, b1, ..., bM-1.
• Typically, a bucket corresponds to one (or a fixed number of) disk block.

• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i=h(K), and h
is the hashing function.
– One common hash function is h(K)= K mod M
• Search is very efficient on the hash key.

11
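A one-line sketch of the bucket mapping, with a hypothetical bucket count M:

```python
M = 8  # number of buckets (illustrative choice)

def h(K, M=M):
    """h(K) = K mod M maps a hash-key value to one of buckets b0..b(M-1)."""
    return K % M

for K in (12, 20, 31):
    print(K, "-> bucket", h(K))  # 12 and 20 collide in bucket 4; 31 goes to 7
```

Note that 12 and 20 hash to the same bucket: exactly the collision case treated next.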
Hashed Files Collision Resolutions
Collisions occur when a new record hashes to a bucket that is
already full.

• Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.

• Chaining: For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. In addition, a
pointer field is added to each record location. A collision is resolved by
placing the new record in an unused overflow location and setting the
pointer of the occupied hash address location to the address of that overflow
location.

• Multiple hashing: The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing if
necessary.
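Chaining can be sketched as a toy in-memory model (the class and the sizes chosen are illustrative, not from the text):

```python
class ChainedHashFile:
    """M main positions plus an overflow area; each occupied slot holds
    [key, pointer-to-next-slot-in-chain]."""
    def __init__(self, M, overflow):
        self.M = M
        self.slots = [None] * (M + overflow)  # main positions, then overflow positions
        self.free = M                         # next unused overflow slot

    def insert(self, K):
        i = K % self.M
        if self.slots[i] is None:             # home position is empty
            self.slots[i] = [K, None]
            return
        while self.slots[i][1] is not None:   # walk to the end of the chain
            i = self.slots[i][1]
        j, self.free = self.free, self.free + 1
        self.slots[j] = [K, None]             # place record in an overflow slot...
        self.slots[i][1] = j                  # ...and link it from the chain end

    def search(self, K):
        i = K % self.M
        while i is not None and self.slots[i] is not None:
            if self.slots[i][0] == K:
                return True
            i = self.slots[i][1]
        return False

f = ChainedHashFile(M=4, overflow=4)
for k in (12, 20, 28):      # all hash to bucket 0 (k mod 4 == 0)
    f.insert(k)
print(f.search(20), f.search(99))  # True False
```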
12
Hashed Files – Chaining overflow handling

13
Hashed Files (contd.)
• To reduce overflow records, a hash file is typically kept 70-80%
full.
• The hash function h should distribute the records uniformly
among the buckets
– Otherwise, search time will be increased because many
overflow records will exist.
• Main disadvantages of static external hashing:
– Fixed number of buckets M is a problem if the number of
records in the file grows or shrinks.
– Ordered access on the hash key is quite inefficient.

14
Dynamic File Expansion
• Static Hashing can lead to long overflow chains.
• Extendible Hashing avoids overflow pages by splitting a
full bucket when a new data entry is to be added to it.
– Directory to keep track of buckets, doubles periodically.
– Can get large with skewed data.
• Linear Hashing avoids directory by splitting buckets
round-robin, and using overflow pages.
– Overflow pages not likely to be long.
– Space utilization could be lower than Extendible Hashing,
since splits not concentrated on “dense” data areas.

15
Extendible hashing
Extendible hashing does not require an overflow area; it uses a directory whose entries point to the disk blocks that store the records.
• The directory is an array of size 2^d, where d is called the global depth.
• The directory is accessed using the binary representation of the hash value h(K).
• An insertion into a disk block that is full causes the block to split into two blocks, and the records are redistributed among the two blocks.
– The directory is updated appropriately.

16
Extendible
Hashing

17
Example
• Directory is an array of size 4.
• To find the bucket for r, take the last `global depth' # bits of h(r); we denote r by h(r).
– If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
• Insert: If the bucket is full, split it (allocate a new page, re-distribute).
• If necessary, double the directory. (Splitting a bucket does not always require doubling.)

[Figure: a directory (global depth 2) with entries 00, 01, 10, 11 pointing to data pages, each with local depth 2: Bucket A = 4* 12* 32* 16*, Bucket B = 1* 5* 21* 13*, Bucket C = 10*, Bucket D = 15* 7* 19*.]

Global depth of directory: Max # of bits needed to tell which bucket an entry belongs to.
Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
18
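The directory lookup in the example can be sketched as follows (assuming, as in the figure, that the last `global depth' bits of h(r) index the directory; some presentations use the leading bits instead):

```python
def directory_index(hash_value, global_depth):
    """Keep only the last `global_depth` bits of h(r) as the directory index."""
    return hash_value & ((1 << global_depth) - 1)

# Global depth 2 means a directory of 2**2 = 4 entries: 00, 01, 10, 11.
print(format(directory_index(5, 2), '02b'))   # 5 = binary 101 -> entry 01
print(format(directory_index(20, 2), '02b'))  # 20 = binary 10100 -> entry 00
```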
Insert h(r) = 6 (The Easy Case)
6 = binary 00110

[Figure: the last two bits of 6 are 10, so 6* goes to Bucket C (directory entry 10), which has room; no split and no directory change. Before: Bucket C = 10*. After: Bucket C = 10* 6*. Buckets A (4* 12* 32* 16*), B (1* 5* 21* 13*), and D (15* 7* 19*) are unchanged.]
Insert h(r) = 20 (Causes Doubling)
20 = binary 10100

[Figure: 20 hashes to directory entry 00 (Bucket A = 4* 12* 32* 16*), which is full, so Bucket A splits on the third bit into Bucket A = 32* 16* and Bucket A2 = 4* 12* 20* (the `split image' of Bucket A), both with local depth 3. The local depth now exceeds the global depth, so the directory doubles to 8 entries 000–111 (global depth 3); Buckets B, C, and D keep local depth 2.]
Extendible Hashing
• Benefits of extendible hashing:
– Hash performance does not degrade with growth of the file
– Minimal space overhead
• Disadvantages of extendible hashing:
– Extra level of indirection to find the desired record
– The bucket address table (directory) may itself become very big (larger than memory)
• Need a tree structure to locate the desired record in the structure!
– Changing the size of the bucket address table is an expensive operation
• Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows
Linear hashing
• Linear hashing does require an overflow area but does not use a directory.
– Blocks are split in linear order as the file expands (round robin).
• Suppose we start with M buckets (0, 1, …, M-1) and the initial hash function h_i(K) = K mod M.
• When any file bucket overflows, the first bucket (bucket 0) is split into two buckets, 0 and M (at the end of the file), using the hash function h_{i+1}(K) = K mod 2M.
• Under K mod 2M, any record in bucket 0 will hash to either bucket 0 or bucket M.
• As further collisions lead to overflow records, additional buckets are split in linear order (1, 2, 3, …).
• When all M original buckets have been split (there are 2M buckets), the process continues with h_{i+2}(K) = K mod 4M, again starting from bucket 0.
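The bucket-address computation implied by these steps can be sketched as follows (the function name is illustrative):

```python
def linear_hash_address(K, M, level, next_split):
    """Linear hashing address: apply h_level(K) = K mod (M * 2**level); if that
    bucket has already been split this round (bucket number < next_split),
    re-hash with h_{level+1}(K) = K mod (M * 2**(level + 1))."""
    b = K % (M * 2 ** level)
    if b < next_split:
        b = K % (M * 2 ** (level + 1))
    return b

# M = 4 original buckets, level 0, bucket 0 already split (next_split = 1):
print(linear_hash_address(32, 4, 0, 1))  # 32 mod 4 = 0, re-hash: 32 mod 8 = 0
print(linear_hash_address(44, 4, 0, 1))  # 44 mod 4 = 0, re-hash: 44 mod 8 = 4
```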

22
Linear Hashing (Contd.)
• Insert: If bucket to insert into is full:
• Add overflow page and insert data entry.
• (Maybe) Split Next bucket and increment Next.
• Can choose any criterion to `trigger’ split.
• Overflow occurs
• Buckets reached threshold of capacity (e.g. 75%)
• Since buckets are split round-robin, long overflow chains
don’t develop!
• Doubling of directory in Extendible Hashing is similar;
switching of hash functions is implicit in how the # of
bits examined is increased.
M = 4, Level = 0: Insert 43
43 = binary 101011

[Figure: with h0(K) = K mod 4 and Next = 0, the primary pages hold: bucket 000 = 32* 44* 36*, bucket 001 = 9* 25* 5*, bucket 010 = 14* 18* 10* 30*, bucket 011 = 31* 35* 7* 11*. Inserting 43 (h0(43) = 3) overflows bucket 011, so an overflow page holding 43* is chained to it, and the Next bucket (000) is split using h1(K) = K mod 8: 32* stays in bucket 000, while 44* and 36* move to the new bucket 100; Next advances to 1.]

Insert 22, 37, 29, 66, 34

24
End of round

[Figure: before inserting 50 (Level = 0, Next = 3): bucket 000 = 32*, 001 = 9* 25*, 010 = 66* 18* 10* 34*, 011 = 31* 35* 7* 11* with overflow 43*, 100 = 44* 36*, 101 = 5* 37* 29*, 110 = 14* 30* 22*. Inserting 50 (h0(50) = 2) overflows bucket 010: 50* goes to an overflow page, and bucket 011 (Next) is split using h1 into 011 = 43* 35* 11* and 111 = 31* 7*. All original buckets have now been split, so the round ends: Level becomes 1 and Next resets to 0.]

Insert 50

25
Indexes
• Types of Single-level Ordered Indexes
– Primary Indexes
– Clustering Indexes
– Secondary Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B-Trees and B+-Trees

26
Indexes as Access Paths
• A single-level index is an auxiliary file that
makes it more efficient to search for a record
in the data file.
• The index is usually specified on one field of
the file (although it could be specified on
several fields)
• One form of an index is a file of entries <field
value, pointer to record>, which is ordered by
field value
• The index is called an access path on the field.
27
Indexes as Access Paths (contd.)

• The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller.
• A binary search on the index yields a pointer to the file record
• Indexes can be characterized as dense or sparse
– A dense index has an index entry for every search key value (and
hence every record) in the data file.
– A sparse (or nondense) index, on the other hand, has index
entries for only some of the search values

28
Single-Level Indexes: Primary Index
• Primary Index
– Defined on an ordered data file
– The data file is ordered on a key field
– Includes one index entry for each block in the data file; the
index entry has the key field value for the first record in the
block, which is called the block anchor
– A similar scheme can use the last record in a block.
– A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file and the
keys of its anchor record rather than for every search value.
– Problem: insertion and deletion of records

29
Primary index on
the ordering key
field

30
Indexes as Access Paths
• Example: Given the following data file EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
• Suppose that:
– record size R = 100 bytes, block size B = 1024 bytes, r = 30,000 records
• Then, we get:
– blocking factor Bfr = floor(B/R) = floor(1024/100) = 10 records/block
– number of file blocks b = ceil(r/Bfr) = 30000/10 = 3000 blocks
• Average linear search cost:
– b/2 = 3000/2 = 1500 block accesses
• Binary search on the (ordered) data file:
– ceil(log2 b) = ceil(log2 3000) = 12 block accesses
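The figures above can be reproduced with a few lines of Python:

```python
import math

B, R, r = 1024, 100, 30000        # block size, record size, number of records
bfr = B // R                      # blocking factor: 10 records/block
b = math.ceil(r / bfr)            # 3000 file blocks
linear = b // 2                   # average linear-search cost in block accesses
binary = math.ceil(math.log2(b))  # binary search on the ordered file
print(linear, binary)             # 1500 12
```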

31
Example: Block Accesses with Primary Index

• For an index on the SSN field, assume the field size V_SSN = 9 bytes and the record pointer size P_R = 6 bytes. Then:
– index entry size R_I = (V_SSN + P_R) = (9 + 6) = 15 bytes
– index blocking factor Bfr_I = floor(B/R_I) = floor(1024/15) = 68 entries/block
– number of index entries = number of data blocks = 3000
– number of index blocks b_I = ceil(3000/68) = 45 blocks
– binary search on the index file needs ceil(log2 b_I) = ceil(log2 45) = 6 block accesses
– plus one additional block access for the data file = 7 block accesses
– This compares to the binary search cost on the data file:
• ceil(log2 b) = ceil(log2 3000) = 12 block accesses
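The index-cost arithmetic, in the same style:

```python
import math

B = 1024                              # block size in bytes
R_I = 9 + 6                           # index entry: 9-byte SSN value + 6-byte pointer
bfr_I = B // R_I                      # 68 index entries per block
b_I = math.ceil(3000 / bfr_I)         # 45 index blocks for 3000 entries
cost = math.ceil(math.log2(b_I)) + 1  # binary search on index + 1 data-block access
print(bfr_I, b_I, cost)               # 68 45 7
```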

32
Practice
• Example: Given the following data file EMPLOYEE(NAME, SSN,
ADDRESS, JOB, SAL, ...
• Suppose that:
– record size R=150 bytes block size B=512 bytes r=30000
records

– (note the solution in the comments of this slide)

33
Single-Level Indexes: Clustering Index

• Clustering Index
– Defined on an ordered data file
– The data file is ordered on a non-key field (unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record).
– Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
– It is another example of a sparse (nondense) index.
– Insertion and deletion are relatively straightforward with a clustering index.

34
A Clustering
Index
Example

• FIGURE 14.2
A clustering index on the
DEPTNUMBER ordering
non-key field of an
EMPLOYEE file.

35
Another Clustering Index
Example:

Separate block cluster for each group of records that share the same value

36
Single-Level Indexes: Secondary Index

• Secondary Index
– A secondary index provides a secondary means of accessing a
file for which some primary access already exists.
– The secondary index may be on a field which is a candidate key
and has a unique value in every record, or a non-key with
duplicate values.
– The index is an ordered file with two fields.
• The first field is of the same data type as some non-ordering field
of the data file that is an indexing field.
• The second field is either a block pointer or a record pointer.
– There can be many secondary indexes (and hence, indexing
fields) for the same file.
– Includes one entry for each record in the data file; hence, it is a
dense index

37
Example of a Dense
Secondary Index
(block pointers, nonordering
key field)

Index: 442 blocks

Binary search on the index: ceil(log2 442) = 9, plus 1 data-block access = 9 + 1 block accesses

38
An Example of a
Secondary Index on
a nonkey field,
record pointers

39
Multi-Level Indexes
• Because a single-level index is an ordered file, we can create a
primary index to the index itself;
– In this case, the original index file is called the first-level index and
the index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level
until all entries of the top level fit in one disk block
• A multi-level index can be created for any type of first-level index
(primary, secondary, clustering) as long as the first-level index
consists of more than one disk block
• A multi-level index is a form of search tree
• Insertion and deletion of new index entries is a severe problem
because every level of the index is an ordered file.

40
A Two-level Primary
Index

41
Example: Multilevel index
fo (fan-out) is the blocking factor of the index

fo = 68 index entries per block

b1 = 442 first-level blocks
b2 = ceil(442/68) = 7 second-level blocks
b3 = ceil(7/68) = 1 third-level block

t = 3 levels

t + 1 = 4 block accesses
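The level count follows from repeatedly dividing by the fan-out until one block remains; a sketch:

```python
import math

def index_levels(first_level_blocks, fan_out):
    """Build an index on the index until the top level fits in one block;
    return the number of index levels."""
    levels, blocks = 1, first_level_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / fan_out)
        levels += 1
    return levels

t = index_levels(442, 68)  # b1 = 442, b2 = ceil(442/68) = 7, b3 = 1
print(t, t + 1)            # 3 levels, 4 block accesses (including the data block)
```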

42
Search trees
Nodes have the form <P1, K1, P2, K2, …, Kq-1, Pq>
Order p: at most p-1 search values and p pointers (q <= p)
Balanced (though not binary): all of its leaves are at the same level

FIGURE 14.8: A node in a search tree, with pointers to the subtrees below it

FIGURE 14.9: A search tree of order p = 3

43
Dynamic Multilevel Indexes: B-Trees and B+-Trees

• Most multi-level indexes use B-tree or B+-tree data structures.
• B-tree and B+-tree data structures are variations of search trees that allow efficient insertion and deletion of new search values.
• B-tree and B+-tree data structures leave space in each tree node (disk block) to allow for new index entries.
– Each node is kept between half-full and completely full.
• An insertion into a node that is not full is quite efficient.
– If a node is full, the insertion causes a split into two nodes.
– Splitting may propagate to other tree levels.
• A deletion is quite efficient if a node does not become less than half full.
– If a deletion causes a node to become less than half full, it must be merged with neighboring nodes.
44
Difference between B-tree and B+-tree

• In a B-tree, pointers to data records exist at all levels of the tree.
• In a B+-tree, all pointers to data records exist at the leaf-level nodes.
• A B+-tree can therefore have fewer levels (or a higher capacity of search values) than the corresponding B-tree.

45
B-tree Structures

46
The Nodes of a B+-tree
• The nodes of a B+-tree
– (a) Internal node of a B+-tree with q –1 search values.
– (b) Leaf node of a B+-tree with q – 1 search values and q – 1 data pointers.

47
Using B+ Tree Indexes
• For a B+ tree of order p:
– internal nodes have at least ceil(p/2) pointers and at most p pointers
• for p = 3, internal nodes are either (1 value, 2 pointers) or (2 values, 3 pointers)
– a leaf node has at least ceil((p-1)/2) keys, at most (p-1) keys, and a horizontal pointer to the next leaf
• (for p = 3, leaf nodes hold either 1 or 2 values)
• Retrieving a record: search the root, follow the pointer to the correct child, search that, follow the pointer to the correct child… until you reach a leaf; then find the key and follow its data pointer.
• Inserting a record: insert the key and address into the correct leaf of the index. If the leaf is too full, split it. This may split the parent, and so on, and could add a level.
• Deleting a record: delete the key and address from the leaf. If the leaf becomes too empty, redistribute with the left sibling; if not possible, try the right sibling; if not possible, merge, which could lose a level.
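The occupancy bounds for order p can be checked with a small helper (the function name is illustrative):

```python
import math

def bplus_bounds(p):
    """B+-tree of order p: internal nodes hold ceil(p/2)..p tree pointers;
    leaf nodes hold ceil((p-1)/2)..(p-1) keys."""
    internal = (math.ceil(p / 2), p)
    leaf = (math.ceil((p - 1) / 2), p - 1)
    return internal, leaf

print(bplus_bounds(3))  # ((2, 3), (1, 2)): matches the p = 3 cases above
```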
48
Inserting a new element into a B+ tree

Leaf page full: No / Index page full: No
1. Place the record in sorted position in the appropriate leaf page.

Leaf page full: Yes / Index page full: No
1. Split the leaf page.
2. Place the middle key in the index page in sorted order; the left leaf page contains records with keys equal to or below the middle key.
3. The right leaf page contains records with keys greater than the middle key.

Leaf page full: Yes / Index page full: Yes
1. Split the leaf page.
2. Records with keys <= middle key go to the left leaf page.
3. Records with keys > middle key go to the right leaf page.
4. Split the index page.
5. Keys < middle key go to the left index page.
6. Keys > middle key go to the right index page.
7. The middle key goes to the next (higher-level) index page.
If the next-level index page is full, continue splitting the index pages.
49
An Example of
an insertion in a
B+-tree

50
Deleting from a B+ tree

Leaf page below fill factor: No / Index page below fill factor: No
Delete the record from the leaf page. Arrange the keys in ascending order to fill the void. If the key of the deleted record appears in the index page, use the next key to replace it.

Leaf page below fill factor: Yes / Index page below fill factor: No
Combine the leaf page and its sibling. Change the index page to reflect the change.

Leaf page below fill factor: Yes / Index page below fill factor: Yes
1. Combine the leaf page and its sibling.
2. Adjust the index page to reflect the change.
3. Combine the index page with its sibling.
Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.

51
An Example
of a deletion
in a B+-tree

52
