Physical Design Notes

Slides compiled from Dr. Chen’s material and Fundamentals of Database Systems, Elmasri, Navathe, 7th edition
PHYSICAL DESIGN
Introduction
 Conceptual / Logical / Physical Design
ANSI-SPARC Three-level Architecture
 External level: part of the database that is relevant to a particular user
 Conceptual level: meta-data, schema, system catalogue
 Internal level: details of data storage and access paths
Introduction
 Storage hierarchy:
 Primary storage: storage media that the CPU can operate on directly (main memory, i.e., RAM)
 Secondary storage – hard-disk drives
 Tertiary storage - optical disks and tape
 Databases are stored on secondary storage because:
 Databases are too large for primary storage
 Permanent data loss happens less frequently on secondary storage (it is nonvolatile)
 Cost of storage per unit of data is much lower
 Database backups are often stored on magnetic tapes or DVDs
Records
 A file is a sequence of records
 Records
 Contain fields which have values of a particular type (i.e., rows of a table)
 Stored in disk blocks
 Blocking factor (bfr): the number of records per block
 Example:
 Assume # of records: r = 30,000 records in the EMPLOYEE file
 Record size: s = 150 bytes; block size: B = 512 bytes
 bfr = ? bfr_f = floor(B/s) = floor(512/150) = 3 records/block
 # of blocks needed to store the file? fb = ceiling(r/bfr_f) = ceiling(30000/3) = 10,000 blocks
 Average linear search: how many block accesses? fb/2 = 10000/2 = 5000
 Binary search (file ordered on the search field): how many block accesses? ceiling(log2(fb)) = ceiling(log2(10000)) = 14
 How to find records more efficiently?
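The arithmetic in this example can be checked with a short Python sketch. It assumes an unspanned file (no record crosses a block boundary), which is what floor(B/s) models; the function name `file_stats` is hypothetical.

```python
import math

def file_stats(r: int, s: int, B: int) -> dict:
    """Blocking factor, file size in blocks, and block-access estimates (unspanned records)."""
    bfr = B // s                        # floor(B/s): whole records per block
    fb = (r + bfr - 1) // bfr           # ceiling(r/bfr) using integer arithmetic
    return {
        "bfr": bfr,
        "fb": fb,
        "linear_avg": fb / 2,                  # average linear search (unordered field)
        "binary": math.ceil(math.log2(fb)),    # binary search (file ordered on the field)
    }

print(file_stats(r=30_000, s=150, B=512))
# {'bfr': 3, 'fb': 10000, 'linear_avg': 5000.0, 'binary': 14}
```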

Storing Row Data
 The storage structure is proprietary (not standards based) and varies per DBMS
 Different approaches map each row's column values into binary formats that
 Optimize storage efficiency (least amount of data)
 Optimize access efficiency (fast search access)
 Simple row data (all non-BLOB columns) must fit within a single database page, minus overheads
 A BLOB (whether text or binary) is commonly stored in the row data only as a pointer (e.g., 8 bytes); the BLOB data itself occupies separate pages
Optimizing Storage Efficiency
 Tables commonly have NULLable columns
 No need to store data that is not defined for a given row (sparse tables)
 Character data commonly varies in length
 Only store meaningful data bytes; don't pad
 Need to compare specific column values quickly to filter rows for a search query
 Fixed storage structures are quicker to access than parsing the stored values of each row
 Some strategies are…

Row Data Page Layout
 Rows must fit within one page
 Rows are packed together so that the empty space is at the end of the page
 Empty space exists because
 No more rows of the same table fit on the page
 It allows for (or results from) changes to varying-length column values
(Figure: a page consisting of a page header, then Rows 1 through 5 packed one after another, then free space at the end of the page.)
Row Header – Row Data with Offsets
 Identification information
 Internal row ID
 Schema information
 Row data length
 Column count (position of the last column that holds a valid, non-NULL value)
(Figure: row data laid out as a row header, then an array of value offsets, then the column values themselves.)
Row Header - Column Offset Array
 1-to-2-byte offset of each value relative to the beginning of the row data
 Allows the length of varying-length columns to be calculated (the distance to the next value's offset)
 Carries a “magic” value (such as -1) for NULL values
 Only needs to extend as far as the last column with a valid (non-NULL) data value
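A minimal sketch of an offset-array row layout follows. The exact layout (a 2-byte column count, one signed 2-byte offset per column, an end-of-data offset, then the values) is a hypothetical illustration, not any particular DBMS's format; -1 is used as the “magic” NULL offset, and trailing NULL columns are simply not covered by the array.

```python
import struct
from typing import Optional

NULL_OFFSET = -1  # "magic" offset marking a NULL column (hypothetical choice)

def encode_row(values: list[Optional[bytes]]) -> bytes:
    """Hypothetical layout: [count:2][offset:2 per column][end offset:2][column data]."""
    count = len(values)
    while count > 0 and values[count - 1] is None:
        count -= 1                     # trailing NULLs: not covered by the offset array
    pos = 2 + 2 * count + 2            # data starts right after the header
    offsets, data = [], b""
    for v in values[:count]:
        if v is None:
            offsets.append(NULL_OFFSET)
        else:
            offsets.append(pos)        # offset relative to the beginning of the row data
            data += v
            pos += len(v)
    # Signed 2-byte offsets ('h') limit this sketch to rows under 32 KB.
    return struct.pack(f"<H{count}hh", count, *offsets, pos) + data

def decode_row(row: bytes, num_columns: int) -> list[Optional[bytes]]:
    count = struct.unpack_from("<H", row, 0)[0]
    offsets = list(struct.unpack_from(f"<{count}h", row, 2))
    end = struct.unpack_from("<h", row, 2 + 2 * count)[0]
    values: list[Optional[bytes]] = []
    for i, off in enumerate(offsets):
        if off == NULL_OFFSET:
            values.append(None)
            continue
        # Length = next non-NULL offset (or end of data) minus this offset.
        nxt = next((o for o in offsets[i + 1:] if o != NULL_OFFSET), end)
        values.append(row[off:nxt])
    return values + [None] * (num_columns - count)   # restore truncated trailing NULLs
```

For example, `encode_row([b"Ann", None, b"Smith", None, None])` stores only three offsets (10, -1, 13) plus the end offset 18; the two trailing NULLs cost nothing.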
(Figure: row data with per-column markers: a row header, then the column values, with a NULL marker standing in for each NULL column and a length marker in front of each varying-length value.)

Common Strategies: Column Markers
 Per-column markers
 1-byte NULL flag if the column allows NULLs
 1-to-2-byte length for varying-length data types
 Markers are present only if the declared column requires that information
 Trailing columns with NULLs are truncated (no data included)
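The column-marker strategy can be sketched as below. The `Column` descriptor and the 1-byte length marker are hypothetical simplifications (the slide allows 1 to 2 bytes; a 1-byte length caps values at 255 bytes, and `bytearray.append` raises `ValueError` beyond that).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Column:
    nullable: bool       # needs a 1-byte NULL flag
    varying: bool        # needs a length marker (1 byte in this sketch)
    fixed_size: int = 0  # byte size when the column is fixed length

def encode_with_markers(schema: list[Column], values: list[Optional[bytes]]) -> bytes:
    n = len(values)
    while n > 0 and values[n - 1] is None:
        n -= 1                                  # trailing NULL columns: nothing stored
    out = bytearray()
    for col, v in zip(schema[:n], values[:n]):
        if col.nullable:
            out.append(1 if v is None else 0)   # NULL flag only for NULLable columns
            if v is None:
                continue                        # no data bytes for a NULL value
        elif v is None:
            raise ValueError("NULL in a NOT NULL column")
        if col.varying:
            out.append(len(v))                  # length marker only for varying columns
        elif len(v) != col.fixed_size:
            raise ValueError("fixed-size value has the wrong length")
        out += v
    return bytes(out)
```

A fixed-length NOT NULL column carries no markers at all, which is why such columns are the cheapest to store and to locate.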
Data Types and Storage
Data Type        Byte Storage (with NULLs / Lengths)
Strings          ANSI (1 byte * m) or UNICODE (2 bytes * m)
Binary           1 byte * m
Integers         size (1, 2, 4, or 8 bytes) or (m + 1) / 2 + 1 bytes
Floats           size (4 or 8 bytes)
Fixed Decimal    (m + 1) / 2 + 1 bytes
Logical (Bit)    1 byte
Temporal         commonly 4 to 10 bytes, based on the precision needed
BLOBs            8-byte pointer in the row data + enough pages to hold all data
(m = declared length in characters or bytes, or number of digits for numeric types)
Efficiency: How Can You Help?
 Know your data!
 Assume that the DBMS is “dumb”: it does not know the nature of your data better than you do!
 Optimize your schema by ordering columns:
 Fixed-length, NOT NULL columns first
 Varying-length, NOT NULL columns next
 NULLable columns last, from least NULLable to most NULLable
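The ordering heuristic above can be written as a sort key. This is a sketch under stated assumptions: `ColumnDef` and its `null_fraction` statistic are hypothetical, and whether a given DBMS actually stores columns in declared order is system-specific.

```python
from dataclasses import dataclass

@dataclass
class ColumnDef:
    name: str
    nullable: bool
    varying: bool
    null_fraction: float = 0.0  # hypothetical statistic: share of rows where the value is NULL

def suggested_order(columns: list[ColumnDef]) -> list[str]:
    """Fixed NOT NULL, then varying NOT NULL, then NULLable from least to most NULLable."""
    def rank(c: ColumnDef):
        if not c.nullable:
            return (0, 1 if c.varying else 0, 0.0)
        return (1, 0, c.null_fraction)
    return [c.name for c in sorted(columns, key=rank)]

cols = [
    ColumnDef("middle_name", True, True, 0.7),
    ColumnDef("last_name", False, True),
    ColumnDef("ssn", False, False),
    ColumnDef("phone", True, True, 0.2),
]
print(suggested_order(cols))  # ['ssn', 'last_name', 'phone', 'middle_name']
```

Putting the most NULLable columns last makes it more likely that a row ends in NULLs, which the trailing-NULL truncation then stores for free.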

INDEXING
Index
 Used to speed up the retrieval of records in response to certain search conditions
 Efficient access to records based on indexing fields
 Search index -> addresses -> records
 Any field can be used to create an index
 Multiple indexes on different fields can be constructed on the same file
 Common indexes
 Single-level indexes: Primary Index, Clustering Index, Secondary Index
 Multilevel indexes: B-tree, B+-tree, etc.
Single-Level Indexes
 Usually defined on a single field of a table (the indexing field)
 The index file is much smaller than the data file
 Binary search on the index is reasonably efficient
 Types
 Primary Index: on the ordering key field of an ordered file (field values ordered, distinct)
 One index entry for each data block (block anchor: the key of the block's first record)
 Clustering Index: on an ordering non-key field (field values ordered, with duplicates)
 One index entry for each distinct value of the field, pointing to the first block that contains records with that field value
 Secondary Index: on a non-ordering field (field values not ordered; distinct or duplicate)
 Used when some other primary access path already exists
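A primary index lookup can be sketched with Python's `bisect` module: binary-search the block anchors for the last anchor that is less than or equal to the search key, then read that single data block. The data file below is hypothetical, with each record reduced to its key.

```python
from bisect import bisect_right

# Hypothetical file ordered on its key; each inner list is one data block.
blocks = [[2, 5, 8], [12, 15, 19], [23, 30, 31], [40, 41, 47]]

# Primary index: one <block anchor, block number> entry per block.
index = [(blk[0], i) for i, blk in enumerate(blocks)]
anchors = [a for a, _ in index]

def lookup(key):
    """Return the number of the block holding key, or None if the key is absent."""
    pos = bisect_right(anchors, key) - 1   # last anchor <= key
    if pos < 0:
        return None                        # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None   # one data-block access

print(lookup(15), lookup(16))  # 1 None
```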
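(Figure: primary index on the key field; field values ordered and distinct.)
(Figure: a clustering index example on a non-key field; field values ordered, with duplicates.)
(Figure: secondary index on a field that is not ordered and has distinct values; the index file holds <ordered value, pointer> entries, each pointer leading to a record.)
(Figure: secondary index on a field that is not ordered and has duplicate values; the index file holds <ordered value, pointer> entries, each pointer leading to a block that contains record pointers.)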
Secondary Index - review
 Secondary Index: used when some other primary access already exists
 Indexing field
 Not ordered, with distinct values
 Not ordered, with duplicate values
 Index file containing (key, pointer) entries
 Key: each value of the indexing field
 Important: index values are ordered
 Pointer
 For distinct values: to the record (dense index)
 For duplicate values: to a block that contains record pointers (sparse index)
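For the duplicate-value case, a minimal sketch of a secondary index with one level of indirection follows: one ordered entry per distinct value, each pointing to a “block” (here a Python list) of record pointers. The heap file and the (block number, slot) record pointers are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical unordered file: record pointer (block, slot) -> DEPT value.
records = {(0, 0): "Sales", (0, 1): "HR", (1, 0): "Sales", (1, 1): "IT", (2, 0): "HR"}

def build_secondary_index(recs):
    """One entry per distinct value, ordered by value; each entry holds record pointers."""
    buckets = defaultdict(list)
    for rid, value in recs.items():
        buckets[value].append(rid)
    return sorted(buckets.items())          # [(value, [record pointers]), ...]

def find(index, value):
    keys = [k for k, _ in index]
    i = bisect_left(keys, value)            # binary search works because entries are ordered
    return index[i][1] if i < len(keys) and keys[i] == value else []

idx = build_secondary_index(records)
print(find(idx, "HR"))  # [(0, 1), (2, 0)]
```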
Efficiency: # of block accesses
 Assume a file with # of records r = 30,000, record size s = 150 bytes, and block size B = 512 bytes
 Without index - review
 File blocking factor: bfr_f = floor(B/s) = floor(512/150) = 3 records/block
 # of file blocks: fb = ceiling(r/bfr_f) = ceiling(30000/3) = 10,000 blocks
 If the search field is not ordered: linear search: fb/2 = 10000/2 = 5000 block accesses on average
 If ordered: binary search: ceiling(log2(fb)) = ceiling(log2(10000)) = 14 block accesses
 With primary index
 Assume size of indexing field V = 9 bytes, pointer P = 7 bytes, and size of index entry si = V + P = 16 bytes
 Index blocking factor: bfr_i = floor(B/si) = floor(512/16) = 32 entries/block
 # of index entries: ri = fb = 10,000 (one entry per data block)
 # of index blocks: fb_i = ceiling(ri/bfr_i) = ceiling(10000/32) = 313
 Block accesses: index block accesses (binary search) + 1 file block access = ceiling(log2(fb_i)) + 1 = ceiling(log2(313)) + 1 = 9 + 1 = 10
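The primary-index calculation above, written as a small Python function (the name `primary_index_accesses` is hypothetical):

```python
import math

def primary_index_accesses(fb: int, B: int, V: int, P: int) -> int:
    si = V + P                   # index entry size
    bfr_i = B // si              # index blocking factor: floor(B/si)
    ri = fb                      # one index entry per data block (block anchors)
    fb_i = -(-ri // bfr_i)       # ceiling(ri/bfr_i)
    return math.ceil(math.log2(fb_i)) + 1   # binary search on the index + 1 data block

print(primary_index_accesses(fb=10_000, B=512, V=9, P=7))  # 10
```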
How to further improve efficiency?
 The single-level index itself (the index file) is ordered, with distinct keys
 So we can create a primary index on the index itself
 Multi-level indexes!
 We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block
 Such a multi-level index is a form of search tree
 However, inserting and deleting index entries is a problem because every level of the index is an ordered file.
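Repeating the process level by level can be sketched as follows, using the primary index example (ri = 10,000 entries, fan-out fo = bfr_i = 32). Each new level is a primary index on the blocks of the level below; the search then reads one block per level plus one data block. The function name is hypothetical.

```python
def multilevel_levels(ri: int, fo: int) -> list[int]:
    """Number of blocks at each level of a static multi-level index, first level first."""
    if ri < 1 or fo < 2:
        raise ValueError("need at least one entry and a fan-out of at least 2")
    levels = []
    entries = ri
    while True:
        blocks = -(-entries // fo)   # ceiling(entries/fo)
        levels.append(blocks)
        if blocks == 1:              # top level fits in one disk block
            return levels
        entries = blocks             # next level has one entry per block of this level

levels = multilevel_levels(10_000, 32)
print(levels, len(levels) + 1)  # [313, 10, 1] 4
```

So three index levels plus the data block give 4 block accesses, compared with 10 for the single-level primary index.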
(Figure: a two-level primary index.)
(Figure: dynamic multi-level index using B-trees.)
Dynamic multi-level index: B-tree & B+-tree
 Most widely used multi-level indexes
 Balanced tree
 Each node corresponds to a disk block
 Each node is kept between half-full and completely full
 Insertion into a node that is
 Not full: quite efficient
 Full: causes a split into two nodes, which may propagate to other tree levels
 Deletion that
 Leaves the node at least half full: quite efficient
 Makes the node less than half full: the node must be merged with neighboring nodes, which may propagate to other tree levels
2-3 tree & B tree
• 2-3 tree: B tree of order 3
– Root: either a leaf or has between 2 and 3 children
– Other non-leaf nodes: between 2 and 3 children
– All nodes: between 1 and 2 key values
• B tree
– Is a generalization of 2-3 trees
– Balanced tree of order M: all leaves are at the same depth
– Root: either a leaf or has between 2 and M children (1 to M-1 keys)
– Every other node is kept between half-full and completely full
• # of tree pointers: between ceiling(M/2) and M
• # of key values: between ceiling(M/2) - 1 and M - 1 (always 1 fewer than the # of tree pointers in the same node)
– Each node corresponds to a disk block
• Big O: search using a B tree vs. an AVL tree?
B+ tree properties
• Root: if the tree has <= L items, the root is a leaf; otherwise it has between 2 and M children
• Internal node order M or Pint
– # of tree pointers: ceiling(Pint/2) to Pint
– # of key values: ceiling(Pint/2) - 1 to Pint - 1 (only “virtual” data)
– Data pointers? None! (no real data entries)
– In a node: smaller values to the left, bigger to the right
– Tree branches: every value X in the subtree left of key_i satisfies X <= key_i; every value X in the subtree between key_i and key_(i+1) satisfies key_i < X <= key_(i+1)
• Leaf node order L or Pleaf (all leaves have the same depth)
– # of data pointers: ceiling(Pleaf/2) to Pleaf
– # of key values: ceiling(Pleaf/2) to Pleaf (“real” data)
– Tree pointers? Only 1, pointing to the next leaf
Example
• So how do we pick M and L (Pint and Pleaf)?
o Depends on the application; in our example, it is based on the disk-block size
• Suppose M=4 (max # ptrs in internal node) and L=5 (max # data items at leaf)
o All internal nodes have at least ceiling(4/2) = 2 children
o All leaves have at least ceiling(5/2) = 3 data items
Back to disk application example
• What makes B trees appropriate for such an application?
• Many keys are stored in one internal node
– All are brought into memory in one disk access
• If we pick M wisely
– The binary search over the M-1 keys in memory is well worth it (its cost is insignificant compared to disk access time)
• Internal nodes contain only keys
– Any find wants only one data item; it would be wasteful to load unnecessary data items with the internal nodes
– So only one leaf of data items is brought into memory
– Data-item size doesn't affect what M is
Example of B+-tree insertion
• Insertion sequence: 8, 5, 1, 7, 3, 12, 9, 6
– For B+-tree, assume Pint = 3 and Pleaf = 2
– Always insert into a leaf node; all real data stay in the leaves
– Split rule: “promote” the middle “virtual” value
• Left: leaf node <= upper-level key; internal node < upper-level key
• Right: > upper-level key
• Practice:
– Insertion sequence 12, 9, 6, 7, 1, 5, 8, 3?
– How about 1, 5, 7, 3, 8, 12, 9, 6? (the resulting tree is the same as the B+ tree for 8, 5, 1, 7, 3, 12, 9, 6)
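The insertion procedure can be sketched in Python. This is a minimal sketch assuming distinct keys and no deletion; it applies the split rule above: a full leaf keeps ceiling((Pleaf+1)/2) keys and copies its largest key up (left leaf <= separator), while a full internal node keeps floor((Pint+1)/2) pointers and moves its middle key up (left internal node < separator). The class and function names are hypothetical.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys=None):
        self.keys = keys or []
        self.next = None                    # tree pointer to the next leaf

class Internal:
    def __init__(self, keys, children):
        self.keys = keys                    # "virtual" keys, used only for navigation
        self.children = children

class BPlusTree:
    def __init__(self, p_int: int, p_leaf: int):
        self.p_int, self.p_leaf = p_int, p_leaf   # max pointers / max keys per node
        self.root = Leaf()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split:                           # root split: tree grows one level
            sep, right = split
            self.root = Internal([sep], [self.root, right])

    def _insert(self, node, key):
        if isinstance(node, Leaf):
            node.keys.insert(bisect_left(node.keys, key), key)
            if len(node.keys) <= self.p_leaf:
                return None
            j = -(-len(node.keys) // 2)     # ceiling((p_leaf + 1) / 2) keys stay left
            right = Leaf(node.keys[j:])
            node.keys = node.keys[:j]
            right.next, node.next = node.next, right
            return node.keys[-1], right     # copy up: left leaf <= separator
        i = bisect_left(node.keys, key)     # left subtree <= key_i < right subtree
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.children) <= self.p_int:
            return None
        j = len(node.children) // 2         # floor((p_int + 1) / 2) pointers stay left
        up = node.keys[j - 1]               # move up: this key leaves the node
        right = Internal(node.keys[j:], node.children[j:])
        node.keys, node.children = node.keys[:j - 1], node.children[:j]
        return up, right

def dump(node):
    """Leaves as key lists, internal nodes as (keys, [children])."""
    if isinstance(node, Leaf):
        return node.keys
    return (node.keys, [dump(c) for c in node.children])

def leaf_chain(tree):
    node = tree.root
    while isinstance(node, Internal):
        node = node.children[0]
    out = []
    while node is not None:                 # follow next-leaf pointers
        out.append(node.keys)
        node = node.next
    return out

def build(seq, p_int=3, p_leaf=2):
    tree = BPlusTree(p_int, p_leaf)
    for k in seq:
        tree.insert(k)
    return tree

print(dump(build([8, 5, 1, 7, 3, 12, 9, 6]).root))
# ([5], [([3], [[1, 3], [5]]), ([7, 8], [[6, 7], [8], [9, 12]])])
```

Running `build([1, 5, 7, 3, 8, 12, 9, 6])` produces the same tree, matching the practice note; the practice sequence 12, 9, 6, 7, 1, 5, 8, 3 can be checked the same way.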
(Figure: example of an insertion in a B+-tree.)
(Figure: B+ tree for 12, 9, 6, 7, 1, 5, 8, 3.)
B+ tree efficiency example
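 Assume disk block size B = 512 bytes; key size K = 9 bytes; data pointer size Pd = 7 bytes; tree pointer size Pt = 6 bytes
 B+ tree
 Calculate Pint & Pleaf
 Internal node: (Pint * Pt) + (Pint - 1) * K <= B, i.e., 6 Pint + 9 (Pint - 1) <= 512, so 15 Pint <= 521 and Pint = floor(521/15) = 34
 Leaf node: Pleaf * (Pd + K) + Pt <= B, i.e., 16 Pleaf + 6 <= 512, so 16 Pleaf <= 506 and Pleaf = floor(506/16) = 31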
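Solving both inequalities for the largest whole node order gives a two-line function (the name `bplus_orders` is hypothetical): Pint <= (B + K) / (Pt + K) and Pleaf <= (B - Pt) / (Pd + K).

```python
def bplus_orders(B: int, K: int, Pd: int, Pt: int) -> tuple[int, int]:
    p_int = (B + K) // (Pt + K)     # from Pint * Pt + (Pint - 1) * K <= B
    p_leaf = (B - Pt) // (Pd + K)   # from Pleaf * (Pd + K) + Pt <= B
    return p_int, p_leaf

print(bplus_orders(B=512, K=9, Pd=7, Pt=6))  # (34, 31)
```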
Application Example (disk)
Consider the application of B-trees for directories. Suppose that you are given a B-tree with
parameters (Pint) M=11 and (Pleaf) L=8. Determine the approximate size of a disk block on
the machine where this implementation would be used, assuming you also have the following
information:
– Key Size =8 Bytes
– Pointer Size = 4 Bytes
– Data Size = 16 Bytes per record (including the key)
Give a numeric answer showing all the steps taken to arrive at your answer (Compute the
possible sizes of both internal and leaf nodes). In addition, give a short explanation using the
parameters given and the equations.
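• Determine the internal node size: M tree pointers + (M - 1) keys = 11 * 4 + 10 * 8 = 44 + 80 = 124 bytes
• Determine leaf size: L records * 16 bytes = 8 * 16 = 128 bytes
The larger size is 128 bytes (which turns out to be a power of 2), hence the disk block is 128 bytes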