Disk Storage Basic File Structures Indexing: CS 421 / 720 Spring 2015
Disk Storage Device
• Data stored as magnetized areas on magnetic disk surfaces.
• A disk pack contains several magnetic disks connected to a rotating spindle.
• Disks are divided into concentric circular tracks on each disk surface.
• The division of a track into sectors is hard-coded on the disk surface and cannot be changed.
– In one sector organization, a sector is the portion of a track that subtends a fixed angle at the center.
– In another, different tracks hold different numbers of sectors so as to maintain a uniform recording density.
Disk Storage Devices
The block size B is fixed during disk formatting
Whole blocks are transferred between disk and main memory for processing.
Records
• Records contain fields which have values of a particular
type
– E.g., amount, date, time, age
• Records could be fixed or variable length
• Fields themselves may be fixed length or variable length
– Fixed: integer, string (char), boolean, date…
– Variable: String (varchar), user defined types
– BLOB (Binary Large Objects), images, video and audio streams,
stored separately, pointer in a record
• Variable length fields can be mixed into one record:
– Separator characters or length fields are needed so that the
record can be “parsed.”
Three record storage formats
A fixed-length record with six fields and size of 71 bytes.
Blocking
• Blocking: storing a number of records in one block on the disk.
• Blocking factor (bfr): the number of records per block,
rounded down (average number for variable record size)
– A block may contain unused space when an integral number of records does not exactly fill it.
• Spanned Records:
– Refers to records that do not fit within block boundaries and hence span more than one block.
Files of Records
• A file is a sequence of records, where each record is a
collection of data values (or data items).
• A file descriptor (or file header) includes information that
describes the file, such as the field names and their data types,
and the addresses of the file blocks on disk.
• A file can have fixed-length or variable-length records.
• File records can be unspanned or spanned
• Primary file organizations: heap, ordered or hashed
• The physical disk blocks that are allocated to hold the records
of a file can be contiguous, linked, or indexed.
Heap: Unordered Files
Hashed Files
• Hashing for disk files is called External Hashing
– Internal hashing: the hash function returns a memory address, i.e., a particular array location.
– External hashing: the hash function returns the address of the disk block (bucket) holding all records that map to it.
• The file blocks are divided into M equal-sized buckets, numbered b0, b1, ..., bM-1.
• Typically, a bucket corresponds to one disk block (or a fixed number of contiguous blocks).
• One of the file fields is designated to be the hash key of the file.
• The record with hash key value K is stored in bucket i, where i=h(K), and h
is the hashing function.
– One common hash function is h(K)= K mod M
• Search is very efficient on the hash key.
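As a concrete illustration, here is a minimal in-memory sketch of this bucket scheme; Python lists stand in for disk blocks, and the key values are illustrative, not from the slides:

```python
# Minimal sketch of external hashing: M buckets, each standing in for one
# disk block, with the common hash function h(K) = K mod M.
M = 4  # number of buckets b0 .. bM-1

def h(key):
    return key % M  # h(K) = K mod M

buckets = [[] for _ in range(M)]

# Store each record (represented by its hash key value) in bucket h(K).
for k in [4, 12, 32, 16, 1, 5, 10, 15]:
    buckets[h(k)].append(k)

def search(key):
    # Searching on the hash key inspects exactly one bucket.
    return key in buckets[h(key)]
```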
Hashed Files Collision Resolutions
Collisions occur when a new record hashes to a bucket that is
already full.
• Chaining: For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. In addition, a
pointer field is added to each record location. A collision is resolved by
placing the new record in an unused overflow location and setting the
pointer of the occupied hash address location to the address of that overflow
location.
• Multiple hashing: The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing if
necessary.
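The chaining method above can be sketched in memory as follows: a main array extended by overflow positions, with a pointer (here an index) field per location. Sizes, keys, and names such as `free_overflow` are illustrative assumptions:

```python
# Sketch of collision resolution by chaining: M main slots plus a few
# overflow positions, and a pointer field (link) per location.
M = 4
NIL = -1
table = [None] * M + [None] * 4        # slots 0..3 main, 4..7 overflow
link = [NIL] * len(table)              # pointer field per location
free_overflow = list(range(M, len(table)))

def insert(key):
    i = key % M
    if table[i] is None:
        table[i] = key
        return
    # Follow the chain to its end, then link in an unused overflow location.
    while link[i] != NIL:
        i = link[i]
    j = free_overflow.pop(0)
    table[j] = key
    link[i] = j

def search(key):
    i = key % M
    while i != NIL:
        if table[i] == key:
            return True
        i = link[i]
    return False
```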
Hashed Files – Chaining overflow handling
Hashed Files (contd.)
• To reduce overflow records, a hash file is typically kept 70-80%
full.
• The hash function h should distribute the records uniformly
among the buckets
– Otherwise, search time will be increased because many
overflow records will exist.
• Main disadvantages of static external hashing:
– Fixed number of buckets M is a problem if the number of
records in the file grows or shrinks.
– Ordered access on the hash key is quite inefficient.
Dynamic File Expansion
• Static Hashing can lead to long overflow chains.
• Extendible Hashing avoids overflow pages by splitting a
full bucket when a new data entry is to be added to it.
– Directory to keep track of buckets, doubles periodically.
– Can get large with skewed data.
• Linear Hashing avoids directory by splitting buckets
round-robin, and using overflow pages.
– Overflow pages not likely to be long.
– Space utilization could be lower than Extendible Hashing,
since splits not concentrated on “dense” data areas.
Extendible hashing
Extendible hashing does not require an overflow area; it uses a directory whose entries point to the disk blocks that store the records.
• The directory is an array of size 2^d, where d is called the global depth.
• The directory is accessed using the binary representation of the hash value h(K).
Extendible Hashing
(Figure: the directory and bucket structure of extendible hashing.)
Example
(Figure: a directory of size 4 with global depth 2; entries 00, 01, 10, 11 point to data pages: Bucket A holds 4*, 12*, 32*, 16*; Bucket B holds 1*, 5*, 21*, 13*; Bucket C holds 10*; Bucket D holds 15*, 7*, 19*; each bucket has local depth 2.)
• The directory is an array of size 4.
• To find the bucket for r, take the last `global depth' # of bits of h(r); we denote r by h(r).
– If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
• Insert: if the bucket is full, split it (allocate a new page and redistribute).
– If necessary, double the directory. (Splitting a bucket does not always require doubling.)
• Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
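The lookup rule in this example can be sketched as a toy in-memory model; the bucket contents mirror the example's figure, and names such as `lookup_bucket` are illustrative:

```python
# Toy model of an extendible-hashing directory with global depth 2.
GLOBAL_DEPTH = 2

bucket_A = [4, 12, 32, 16]
bucket_B = [1, 5, 21, 13]
bucket_C = [10]
bucket_D = [15, 7, 19]

# Directory entries 00, 01, 10, 11 point to the buckets.
directory = [bucket_A, bucket_B, bucket_C, bucket_D]

def lookup_bucket(hash_value):
    # Use the last `global depth` bits of h(r) as the directory index.
    return directory[hash_value & ((1 << GLOBAL_DEPTH) - 1)]

# h(r) = 5 = binary 101 -> last two bits 01 -> Bucket B
```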
Insert h(r) = 6 (The Easy Case)
• 6 = binary 00110; its last two bits (10) select Bucket C, which has room, so 6* is simply added: Bucket C goes from {10*} to {10*, 6*}. Nothing else changes.
(Figure: the directory and buckets before and after the insert; only Bucket C changes.)

Insert h(r) = 20 (Causing Directory Doubling)
• 20 = binary 10100; its last two bits (00) select Bucket A (4*, 12*, 32*, 16*), which is full.
• Bucket A is split: 32* and 16* stay in Bucket A, while 4*, 12*, and 20* go to its `split image' Bucket A2; both now have local depth 3.
• Because the local depth now exceeds the global depth, the directory doubles to size 8 (global depth 3); entry 000 points to Bucket A and entry 100 to Bucket A2.
(Figure: the directory of size 8 after doubling, with Bucket A2 as the split image of Bucket A.)
Extendable Hashing
• Benefits of extendable hashing:
– Hash performance does not degrade with growth of file
– Minimal space overhead
• Disadvantages of extendable hashing
– Extra level of indirection to find desired record
– Bucket address table may itself become very big (larger than
memory)
• Need a tree structure to locate desired record in the structure!
– Changing size of bucket address table is an expensive
operation
• Linear hashing is an alternative mechanism which
avoids these disadvantages at the possible cost of more
bucket overflows
Linear hashing
• Linear hashing does require an overflow area but does
not use a directory.
– Blocks are split in linear order as the file expands (round robin).
Linear Hashing (Contd.)
• Insert: If bucket to insert into is full:
• Add overflow page and insert data entry.
• (Maybe) Split Next bucket and increment Next.
• Any criterion can be used to `trigger' a split, for example:
• an overflow occurs, or
• the buckets reach a capacity threshold (e.g., 75% full).
• Since buckets are split round-robin, long overflow chains
don’t develop!
• Doubling of directory in Extendible Hashing is similar;
switching of hash functions is implicit in how the # of
bits examined is increased.
(Figure: a linear hashing example with M = 4 and Level = 0, showing the inserts of 43 and 50 that trigger bucket splits.)
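The round-robin addressing can be sketched as follows. This is a sketch only, assuming M initial buckets, a split pointer `next_bucket`, and the usual family of hash functions h_level(K) = K mod (M · 2^level); the names are illustrative:

```python
# Sketch of the linear-hashing address calculation.
M = 4            # initial number of buckets
level = 0        # current round of splitting
next_bucket = 1  # buckets 0 .. next_bucket-1 were already split this round

def address(key):
    i = key % (M * 2 ** level)            # h_level(K)
    if i < next_bucket:                   # this bucket was already split,
        i = key % (M * 2 ** (level + 1))  # so use h_{level+1}(K) instead
    return i
```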
Indexes
• Types of Single-level Ordered Indexes
– Primary Indexes
– Clustering Indexes
– Secondary Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B and B+ Trees
Indexes as Access Paths
• A single-level index is an auxiliary file that
makes it more efficient to search for a record
in the data file.
• The index is usually specified on one field of
the file (although it could be specified on
several fields)
• One form of an index is a file of entries <field
value, pointer to record>, which is ordered by
field value
• The index is called an access path on the field.
Indexes as Access Paths (contd.)
Single-Level Indexes: Primary Index
• Primary Index
– Defined on an ordered data file
– The data file is ordered on a key field
– Includes one index entry for each block in the data file; the
index entry has the key field value for the first record in the
block, which is called the block anchor
– A similar scheme can use the last record in a block.
– A primary index is a nondense (sparse) index, since it includes one entry per disk block of the data file (holding the key of that block's anchor record), rather than one entry per search value.
– Problem: insertion and deletion of records
(Figure: a primary index on the ordering key field of the data file.)
Indexes as Access Paths
• Example: Given the following data file EMPLOYEE(NAME,
SSN, ADDRESS, JOB, SAL, ... )
• Suppose that:
– record size R=100 bytes block size B=1024 bytes r=30,000
records
• Then, we get:
– blocking factor Bfr = floor(B/R) = floor(1024/100) = 10 records/block
– number of file blocks b = ceiling(r/Bfr) = ceiling(30000/10) = 3000 blocks
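The arithmetic above can be checked in a few lines (a sketch reproducing the example's numbers):

```python
import math

R, B, r = 100, 1024, 30_000   # record size, block size, number of records

bfr = B // R              # blocking factor: floor(B / R) = 10 records/block
b = math.ceil(r / bfr)    # number of file blocks = 3000
```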
Example: Block Accesses with Primary Index
• For an index on the SSN field, assume the field size VSSN=9 bytes,
assume the record pointer size PR=6 bytes. Then:
– index entry size RI = (VSSN + PR) = (9 + 6) = 15 bytes
– index blocking factor BfrI = floor(B/RI) = floor(1024/15) = 68 entries/block
– number of index entries = number of data blocks = 3000
– number of index blocks bI = ceiling(3000/68) = 45 blocks
– a binary search on the index file needs ceiling(log2 bI) = ceiling(log2 45) = 6 block accesses
– plus one block access for the data file = 7 block accesses in total
– (a binary search on the data file itself would need ceiling(log2 b) = ceiling(log2 3000) = 12 block accesses)
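The index cost arithmetic can likewise be reproduced directly (a sketch using the example's parameter values):

```python
import math

B = 1024
R_I = 9 + 6                    # index entry size: V_SSN + P_R = 15 bytes
bfr_I = B // R_I               # index blocking factor = 68 entries/block
b_I = math.ceil(3000 / bfr_I)  # index blocks for 3000 entries = 45
cost = math.ceil(math.log2(b_I)) + 1  # binary search on index + 1 data block
```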
Practice
• Example: Given the following data file EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
• Suppose that:
– record size R=150 bytes block size B=512 bytes r=30000
records
Single-Level Indexes: Clustering Index
• Clustering Index
– Defined on an ordered data file
– The data file is ordered on a non-key field (unlike a primary index, which requires the ordering field of the data file to have a distinct value for each record).
– Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
– It is another example of a sparse (nondense) index.
– Insertion and deletion are relatively straightforward with a clustering index.
A Clustering Index Example
• FIGURE 14.2: A clustering index on the DEPTNUMBER ordering non-key field of an EMPLOYEE file.
Another Clustering Index
Example:
Single-Level Indexes: Secondary Index
• Secondary Index
– A secondary index provides a secondary means of accessing a
file for which some primary access already exists.
– The secondary index may be on a field which is a candidate key
and has a unique value in every record, or a non-key with
duplicate values.
– The index is an ordered file with two fields.
• The first field is of the same data type as some non-ordering field
of the data file that is an indexing field.
• The second field is either a block pointer or a record pointer.
– There can be many secondary indexes (and hence, indexing
fields) for the same file.
– Includes one entry for each record in the data file; hence, it is a
dense index
Example of a Dense Secondary Index
(Figure: a dense secondary index on a nonordering key field, using block pointers.)
An Example of a Secondary Index
(Figure: a secondary index on a nonkey field, using record pointers.)
Multi-Level Indexes
• Because a single-level index is an ordered file, we can create a
primary index to the index itself;
– In this case, the original index file is called the first-level index and
the index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level
until all entries of the top level fit in one disk block
• A multi-level index can be created for any type of first-level index
(primary, secondary, clustering) as long as the first-level index
consists of more than one disk block
• A multi-level index is a form of search tree
• Insertion and deletion of new index entries is a severe problem
because every level of the index is an ordered file.
A Two-level Primary
Index
Example: Multilevel index
• fo (fan-out) is the blocking factor of the index, i.e., the number of index entries per block.
• Index levels are built until the top level fits in one block; in this example, t = 3 levels.
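The number of levels can be computed by repeatedly indexing the level below until a single block remains. The sketch below (the helper name is illustrative) yields t = 3 for a dense index with 30,000 entries and fo = 68, an assumption consistent with the t = 3 quoted here:

```python
import math

def num_levels(first_level_entries, fo):
    # Build index levels until the top level fits in a single block.
    blocks = math.ceil(first_level_entries / fo)  # first-level index blocks
    t = 1
    while blocks > 1:
        blocks = math.ceil(blocks / fo)  # next level indexes the one below
        t += 1
    return t
```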
Search trees
• Each node has the form <P1, K1, P2, K2, …, K(q-1), Pq>.
• Order p: at most p-1 search values and p pointers per node (q <= p).
• Balanced (though not binary): all leaves are at the same level.
Dynamic Multilevel Indexes: B-Trees and B+-Trees
B-tree Structures
The Nodes of a B+-tree
• The nodes of a B+-tree
– (a) Internal node of a B+-tree with q –1 search values.
– (b) Leaf node of a B+-tree with q – 1 search values and q – 1 data pointers.
Using B+ Tree Indexes
• For a B+-tree of order p:
– internal nodes have at least ceiling(p/2) pointers and at most p pointers
• for p = 3, internal nodes hold either (1 value, 2 pointers) or (2 values, 3 pointers)
– a leaf node has at least ceiling((p-1)/2) keys, at most p-1 keys, and a horizontal pointer to the next leaf
• for p = 3, leaf nodes hold either 1 or 2 values
• Retrieving a record - search root, follow pointer to correct
child, search that, follow pointer to correct child…until you
get to leaf, then find key and follow data pointer
• Inserting a record - insert key and address in correct leaf of
index. If too full, split. May split parent, etc. - could add
level
• Deleting a record - delete key and address from leaf. If too
empty, redistribute with left sibling – if not possible, try
right sibling – if not possible unite - could lose level
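The occupancy rules above can be written down directly (illustrative helpers, not a full B+-tree implementation):

```python
import math

def internal_pointer_bounds(p):
    # Internal node: at least ceiling(p/2) pointers, at most p pointers.
    return math.ceil(p / 2), p

def leaf_key_bounds(p):
    # Leaf node: at least ceiling((p-1)/2) keys, at most p-1 keys.
    return math.ceil((p - 1) / 2), p - 1
```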
Inserting a new element into a B+ tree
• The action taken depends on whether the leaf page is full and whether the index page is full.
• If the next-level index page is also full, continue splitting the index pages upward.
An Example of
an insertion in a
B+-tree
Deleting from a B+ tree
The action depends on whether the leaf page and the index page fall below their fill factors:

Leaf page below fill factor | Index page below fill factor | Action
Yes | No | Combine the leaf page and its sibling. Change the index page to reflect the change.
Yes | Yes | 1. Combine the leaf page and its sibling. 2. Adjust the index page to reflect the change. 3. Combine the index page with its sibling. Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.
An Example
of a deletion
in a B+-tree