
Slides compiled from Dr. Chen’s material and Fundamentals of Database Systems, Elmasri, Navathe, 7th edition

PHYSICAL DESIGN
Introduction

 Conceptual / Logical / Physical Design

ANSI-SPARC Three-level Architecture

 External level: part of the database that is relevant to a particular user
 Conceptual level: meta-data, schema, system catalogue
 Internal level: details of data storage and access paths
Introduction

 Storage hierarchy:
 Primary storage: storage media that the CPU can operate on directly, such as main memory (RAM)
 Secondary storage: hard-disk drives
 Tertiary storage: optical disks and tape

 Databases are stored on secondary storage because:
 Databases are too large for primary storage
 Permanent data loss happens less frequently on secondary storage
 Cost of storage per unit of data is much less

 Database backups are often stored on magnetic tapes or DVDs
Records

 A file is a sequence of records

 Records
 Contain fields which have values of a particular type (i.e., rows of a table)
 Stored in disk blocks

 Blocking factor (bfr): the number of records per block

 Example:
 Number of records in the EMPLOYEE file: r = 30,000
 Record size: s = 150 bytes; block size: B = 512 bytes
 Blocking factor: bfr = floor(B/s) = floor(512/150) = 3 records/block
 Number of file blocks: fb = ceiling(r/bfr) = ceiling(30000/3) = 10,000 blocks
 Average linear search: fb/2 = 10000/2 = 5,000 block accesses
 Binary search: ceiling(log2(fb)) = ceiling(log2(10000)) = 14 block accesses
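The arithmetic in this example can be checked with a short script. This is only a sketch: the function name `file_search_costs` is made up for illustration, and it assumes unspanned records (a record never crosses a block boundary).

```python
import math

def file_search_costs(r, s, B):
    """Return (bfr, file_blocks, avg_linear, binary) for a file of r records."""
    bfr = B // s                                  # floor(B/s): whole records per block
    file_blocks = math.ceil(r / bfr)              # ceiling(r/bfr)
    avg_linear = file_blocks / 2                  # on average half the blocks are read
    binary = math.ceil(math.log2(file_blocks))    # ceiling(log2(fb))
    return bfr, file_blocks, avg_linear, binary

print(file_search_costs(r=30_000, s=150, B=512))  # (3, 10000, 5000.0, 14)
```

Changing `s` or `B` shows how strongly the block count, and so the search cost, depends on the blocking factor.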

 How to find records more efficiently?


Storing Row Data

 The storage structure is proprietary (not standards based) and varies per DBMS
 Different approaches map the column values of each row into binary formats that
 Optimize storage efficiency (least amount of data)
 Optimize access efficiency (fast search access)

 Simple row data (all non-BLOB columns) must fit within a database page minus overheads
 A BLOB (whether text or binary) is commonly stored only as a pointer (8 bytes) within the row data
Optimizing Storage Efficiency

 Tables commonly have NULLable columns
 Don’t need to save data not defined for a given row – sparse tables
 Character data commonly varies in length
 Only save meaningful data bytes – don’t pad

 Need to compare specific column values quickly to filter rows for a search query
 Fixed storage structures are quicker to access than parsing storage values per row

 Some strategies are…


Row Data Page Layout

 Rows must fit within one page
 Rows are packed together so that the empty space is at the end of the page
 Empty space exists when the page
 Cannot fit more rows of the same table
 Allows for, or results from, changes to varying-length column values

[Figure: page layout. A page header followed by Row 1 through Row 5 packed contiguously, with free space at the end of the page.]
Row Header – Row Data with Offsets

 Identification information
 Internal row ID
 Schema information
 Row data length
 Column count (maximum relative column with valid not NULL data values)

[Figure: row data layout. Row Header, Value Offsets, then the values of Column 1, Column 2, Column 3, Column 4, Column 5, Column 6 and Column 8.]
Row Header - Column Offset Array

 1-to-2-byte offset of a value relative to the beginning of the row data
 Allows calculation of the length of varying-length columns
 Carries a “magic” value (such as -1) for NULL values
 Only needs to be as long as the columns with valid data values
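A minimal sketch of a column offset array, not the format of any real DBMS. The layout (a 2-byte column count, then 2-byte offsets, then the packed values), the names `encode_row` and `decode_row`, and the use of 0xFFFF as the “magic” NULL value are all assumptions made for illustration.

```python
import struct

NULL_OFFSET = 0xFFFF  # assumed "magic" offset standing in for NULL

def encode_row(values):
    """Pack bytes-or-None column values as [count][offsets...][data]."""
    values = list(values)
    while values and values[-1] is None:   # array only covers columns with data
        values.pop()
    offsets, data = [], b""
    for v in values:
        if v is None:
            offsets.append(NULL_OFFSET)
        else:
            offsets.append(len(data))      # offset relative to start of the data area
            data += v
    header = struct.pack("<H", len(offsets))
    header += struct.pack(f"<{len(offsets)}H", *offsets)
    return header + data

def decode_row(row, n_columns):
    """Rebuild the column values; lengths are derived from neighbouring offsets."""
    (count,) = struct.unpack_from("<H", row, 0)
    offsets = struct.unpack_from(f"<{count}H", row, 2)
    data = row[2 + 2 * count:]
    values = []
    for i, off in enumerate(offsets):
        if off == NULL_OFFSET:
            values.append(None)
            continue
        # a value ends where the next non-NULL value starts, or at end of data
        end = next((o for o in offsets[i + 1:] if o != NULL_OFFSET), len(data))
        values.append(data[off:end])
    return values + [None] * (n_columns - count)

row = encode_row([b"Ann", None, b"Sales", None, None])
print(len(row), decode_row(row, 5))
```

No per-value length is stored: each length falls out of the difference between consecutive offsets, and the two trailing NULL columns cost no bytes at all.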
[Figure: row data layout with markers. Row Header, Column 1, Column 2, NULL, Column 4, Length + Column 5, Length + Column 6, NULL, Length + Column 8, NULL, NULL, Length + Column 11.]


Common Strategies: Column Markers

 Per column markers
 1-byte NULL flag if column allows NULLs
 1-to-2-byte length for varying-length data types

 Markers are only present if the declared column requires them

 Trailing columns with NULLs are truncated (no data included)
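The marker strategy can be sketched as follows. This is an illustrative toy, not a real DBMS row format: the `Column` type, the `encode`/`decode` names, and the choice of a 1-byte length (so values of at most 255 bytes) are assumptions.

```python
from typing import NamedTuple, Optional

class Column(NamedTuple):
    nullable: bool
    fixed_size: Optional[int]  # None means a varying-length column

def encode(schema, values):
    n = len(values)
    while n and values[n - 1] is None:     # truncate trailing NULL columns
        n -= 1
    out = bytearray()
    for col, v in zip(schema[:n], values[:n]):
        if col.nullable:
            out.append(1 if v is None else 0)   # 1-byte NULL flag
        if v is None:
            continue
        if col.fixed_size is None:
            out.append(len(v))                  # 1-byte length marker
        out += v
    return bytes(out)

def decode(schema, row):
    values, pos = [], 0
    for col in schema:
        if pos >= len(row):                # past the end: truncated trailing NULL
            values.append(None)
            continue
        if col.nullable:
            is_null = row[pos]
            pos += 1
            if is_null:
                values.append(None)
                continue
        if col.fixed_size is None:
            size = row[pos]
            pos += 1
        else:
            size = col.fixed_size
        values.append(row[pos:pos + size])
        pos += size
    return values

schema = [Column(False, 4), Column(True, None), Column(True, None), Column(True, 2)]
row = encode(schema, [b"\x00\x00\x00\x01", b"Ann", None, None])
print(len(row), decode(schema, row))
```

The fixed, not NULL column carries no markers at all, which is why placing such columns first and the most NULLable columns last keeps rows short.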
Data Types and Storage

Data Type        Byte Storage (with NULLs / Lengths)
Strings          ANSI (1 byte * m) or UNICODE (2 bytes * m)
Binary           1 byte * m
Integers         size (1, 2, 4, or 8 bytes) or (m + 1) / 2 + 1 bytes
Floats           size (4 or 8 bytes)
Fixed Decimal    (m + 1) / 2 + 1 bytes
Logical (Bit)    1 byte
Temporal         commonly 4 to 10 bytes, based on precision needed
BLOBs            8-byte pointer in row data + enough pages to hold all data
Efficiency: How Can You Help?

 Know your data!


 Assume that DBMS is “dumb”…
 it does not know the nature of your data better than you!

 Optimize your schema:
 Fixed, not NULL columns?
 Varying, not NULL columns?
 Least NULLable to most NULLable columns


INDEXING
Index

 Used to speed up the retrieval of records in response to certain search conditions
 efficient access to records based on indexing fields
 search indexes -> addresses -> records
 any field can be used to create an index
 multiple indexes on different fields can be constructed on the same file

 Common indexes
 single-level indexes: Primary Index, Clustering Index, Secondary Index
 multileveled indexes: B-tree, B+-tree, etc.
Single-Level Indexes

 Usually defined on a single field of a table (indexing field)
 index file is much smaller than the data file
 binary search is reasonably efficient

 Types
 Primary Index: on a key field (field values ordered, distinct)
 one index entry for each record block (block anchor – first record’s key)
 Clustering Index: on an ordered non-key field (field values ordered, duplicate)
 one index entry for each distinct value of the field, pointing to the first block that contains records with that field value
 Secondary Index: on a non-ordered field (field values non-ordered, distinct / duplicate)
 used when some other primary access path already exists
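A primary index lookup can be sketched in a few lines. This is a toy in-memory model: blocks are Python lists of (key, record) pairs, and the names `build_primary_index` and `lookup` are hypothetical.

```python
import bisect

def build_primary_index(blocks):
    """One index entry per block: (block anchor key, block number)."""
    return [(block[0][0], n) for n, block in enumerate(blocks)]

def lookup(index, blocks, key):
    anchors = [anchor for anchor, _ in index]
    i = bisect.bisect_right(anchors, key) - 1   # last block whose anchor <= key
    if i < 0:
        return None                             # key is smaller than every anchor
    _, block_no = index[i]
    for k, record in blocks[block_no]:          # scan inside the single block
        if k == key:
            return record
    return None

# The data file is ordered on the key field, three records per block.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(7, "d"), (9, "e"), (11, "f")],
    [(13, "g")],
]
index = build_primary_index(blocks)
print(index)                     # [(1, 0), (7, 1), (13, 2)]
print(lookup(index, blocks, 9))  # e
```

The index holds three entries for seven records, which is why a primary index stays much smaller than the data file.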
[Figure: primary index on the key field (ordered, distinct).]
[Figure: a clustering index example on a non-key field (ordered, duplicate).]
[Figure: secondary index on an index field that is not ordered and has distinct values. Index file: <ordered value, pointer>; the pointer refers to the record.]
[Figure: secondary index on an index field that is not ordered and has duplicated values. Index file: <ordered value, pointer>; the pointer refers to the block that contains record pointers.]
Secondary Index - review

 Secondary Index: used when some other primary access path already exists
 Indexing field
 Not ordered, with distinct values
 Not ordered, with duplicated values

 Index file containing (key, pointer)
 Key: each value of the indexing field
 Important: index values are ordered

 Pointer
 For distinct: to the record (dense index)
 For duplicated: to the block that contains record pointers (sparse index)
Efficiency: # of block accesses
 Assume a file with # of records r = 30,000, record size s = 150 bytes and block size B = 512 bytes

 Without index - review
 File blocking factor: bfr_f = floor(B/s) = floor(512/150) = 3 records/block
 # of file blocks: fb = ceiling(r/bfr_f) = ceiling(30000/3) = 10,000 blocks
 If the search field is not ordered: linear search: fb/2 = 10000/2 = 5,000
 If ordered: binary search: ceiling(log2(fb)) = ceiling(log2(10000)) = 14

 With primary index
 Assume size of indexing field V = 9 bytes, pointer P = 7 bytes and size of index entry s_i = V + P = 16 bytes
 Index blocking factor: bfr_i = floor(B/s_i) = floor(512/16) = 32
 # of index entries: r_i = fb = 10,000 (one entry per file block)
 # of index blocks: fb_i = ceiling(r_i/bfr_i) = ceiling(10000/32) = 313
 Block accesses: # of index block accesses (binary search) + 1 file block access = ceiling(log2(fb_i)) + 1 = ceiling(log2(313)) + 1 = 9 + 1 = 10
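The primary-index figures can be reproduced with a small helper. The function name `primary_index_cost` is hypothetical; the inputs are the values assumed on this slide.

```python
import math

def primary_index_cost(file_blocks, B, V, P):
    """Return (bfr_i, index_blocks, block_accesses) for a single-level primary index."""
    entry_size = V + P                                   # s_i = V + P
    bfr_i = B // entry_size                              # index entries per block
    index_blocks = math.ceil(file_blocks / bfr_i)        # one entry per file block
    accesses = math.ceil(math.log2(index_blocks)) + 1    # binary search + 1 data block
    return bfr_i, index_blocks, accesses

print(primary_index_cost(file_blocks=10_000, B=512, V=9, P=7))  # (32, 313, 10)
```

Compared with the 14 accesses of a binary search on the data file itself, the index saves four block accesses per lookup in this configuration.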
How to further improve efficiency?

 single-level index itself (index file): ordered, distinct
 create a primary index to the index itself

 Multi-level indexes!
 We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block
 Such a multi-level index is a form of search tree
 However, insertion and deletion of index entries is a problem because every level of the index is an ordered file.
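To see how quickly the levels shrink, the repeated indexing can be simulated. This sketch reuses the 10,000 first-level entries and the index blocking factor of 32 from the primary-index example; `multilevel_index` is a made-up name.

```python
import math

def multilevel_index(first_level_entries, fan_out):
    """Return the number of blocks at each index level, bottom level first."""
    blocks_per_level = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks <= 1:          # the top level fits in one disk block
            return blocks_per_level
        entries = blocks         # next level: one entry per block of this level

levels = multilevel_index(10_000, 32)
print(levels)           # [313, 10, 1]
print(len(levels) + 1)  # 4 block accesses: one per index level plus the data block
```

Three index levels bring a lookup down to 4 block accesses, against 10 with the single-level primary index.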
[Figure: a two-level primary index.]
Dynamic multi-level index using B trees
Dynamic multi-level index: B-tree & B+-tree

 Most widely used multi-level indexes
 Balanced tree
 Each node corresponds to a disk block
 Each node is kept between half-full and completely full
 An insertion into a node
 Not full: quite efficient;
 Full: causes a split into two nodes. May propagate to other tree levels
 A deletion
 A node does not become less than half full: quite efficient;
 Causes a node to become less than half full: it must be merged with neighboring nodes. May propagate to other tree levels
2-3 tree & B tree
• 2-3 tree: B tree of order 3
– Root: either a leaf or has between 2 and 3 children
– Other non-leaf node: between 2 and 3 children
– All nodes: between 1 and 2 key values

• B tree
– Is a generalization of 2-3 trees
– Balanced tree of order M: all leaves are at the same depth
– Root: either a leaf or has between 2 and M children (1 and M-1 keys)
– Every other node is kept between half-full and completely full
• # of tree pointers: between ceiling(M/2) and M
• # of key values: between ceiling(M/2) - 1 and M - 1 (always 1 fewer than the # of tree pointers in the same node)
– Each node corresponds to a disk block
• Big O: search using B tree vs. AVL?
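One way to approach the Big O question: both trees search in O(log n), but the base of the logarithm differs. The rough estimate below counts how many levels a tree needs to reach n keys at a given fan-out; the order M = 34 and the helper `levels_needed` are assumptions for illustration, and the counts ignore constant factors such as the AVL height bound.

```python
def levels_needed(n, fan_out):
    """Smallest number of levels such that fan_out ** levels >= n."""
    levels, reach = 0, 1
    while reach < n:
        reach *= fan_out
        levels += 1
    return levels

n = 10_000
M = 34                                # assumed B-tree order
print(levels_needed(n, 2))            # 14: binary tree, about log2(n) node visits
print(levels_needed(n, (M + 1) // 2)) # 4: B-tree with every node only half full
print(levels_needed(n, M))            # 3: B-tree with every node completely full
```

Since each B-tree node is one disk block, the difference between 14 and 3 or 4 node visits is a difference in disk accesses, which dominates the in-memory search inside a node.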
B+ tree properties

• Root: if the tree has <= L items, the root is a leaf; otherwise it has between 2 and M children

• Internal node of order M or Pint
– # of tree pointers: ceiling(Pint/2) ~ Pint
– # of key values: ceiling(Pint/2) - 1 ~ Pint - 1 key values (only “virtual” data)
– data pointers? None! (not real data entries)
– In a node: smaller data to the left, bigger to the right
– Tree branches: left_i <= key_i; key_i < right_i <= key_(i+1)

• Leaf node of order L or Pleaf (all leaves have the same depth)
– # of data pointers: ceiling(Pleaf/2) ~ Pleaf
– # of key values: ceiling(Pleaf/2) ~ Pleaf (“real” data)
– Tree pointers? 1 tree pointer to next leaf
Example
• So how do we pick M and L (Pint and Pleaf)?
o Depends on the application; in our example, it would be based on the disk-block size

• Suppose M=4 (max # ptrs in internal node) and L=5 (max # data items at leaf)
o All internal nodes have at least 2 children
o All leaves have at least 3 data items

Back to disk application example
• What makes B trees appropriate for such an application?

• Many keys stored in one internal node
– All brought into memory in one disk access
• IF we pick M wisely
– Makes the binary search over M-1 keys totally worth it (insignificant compared to disk access times)

• Internal nodes contain only keys
– Any find wants only one data item; wasteful to load unnecessary items with internal nodes
– So only bring one leaf of data items into memory
– Data-item size doesn’t affect what M is

Example of B+-tree insertion
• Insertion sequence: 8, 5, 1, 7, 3, 12, 9, 6
– For B+-tree, assume Pint = 3 and Pleaf = 2
– Always insert to leaf node, all real data stay at leaf
– Split rule: “promote” the middle “virtual” value
• Left: leaf node <= upper level key; internal node < upper level key
• Right: >

• Practice:
– Insertion sequence 12, 9, 6, 7, 1, 5, 8, 3?
– How about 1, 5, 7, 3, 8, 12, 9, 6? (resulting tree same as the B+ tree for 8, 5, 1, 7, 3, 12, 9, 6)
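The insertion sequences can be replayed with a small simulator. This is a sketch of insertion only, under one reading of the split rule above: a full leaf keeps the larger half on the left and copies its largest key up, while a full internal node moves its middle key up. The class and function names are made up, and deletion is not handled.

```python
import bisect

P_INT = 3    # max tree pointers in an internal node
P_LEAF = 2   # max keys in a leaf

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []   # used by internal nodes only
        self.next = None     # used by leaves only: pointer to the next leaf

def insert(root, key):
    split = _insert(root, key)
    if split is None:
        return root
    new_root = Node(leaf=False)          # the root itself split: tree grows a level
    new_root.keys = [split[0]]
    new_root.children = [root, split[1]]
    return new_root

def _insert(node, key):
    """Insert key below node; return (promoted key, new right node) on a split."""
    if node.leaf:
        bisect.insort(node.keys, key)
        if len(node.keys) <= P_LEAF:
            return None
        mid = (len(node.keys) + 1) // 2
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return node.keys[-1], right      # copy up: the key also stays in the leaf
    i = bisect.bisect_left(node.keys, key)   # left subtree holds keys <= separator
    split = _insert(node.children[i], key)
    if split is None:
        return None
    node.keys.insert(i, split[0])
    node.children.insert(i + 1, split[1])
    if len(node.children) <= P_INT:
        return None
    mid = len(node.keys) // 2
    up = node.keys[mid]                  # move up: the key leaves this level
    right = Node(leaf=False)
    right.keys, right.children = node.keys[mid + 1:], node.children[mid + 1:]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, right

def build(sequence):
    root = Node(leaf=True)
    for key in sequence:
        root = insert(root, key)
    return root

def leaves(root):
    node = root
    while not node.leaf:
        node = node.children[0]
    out = []
    while node is not None:
        out.append(node.keys)
        node = node.next
    return out

tree = build([8, 5, 1, 7, 3, 12, 9, 6])
print(tree.keys, leaves(tree))  # [5] [[1, 3], [5], [6, 7], [8], [9, 12]]
```

Calling `build` on the practice sequences and comparing `leaves` and the root keys is a quick way to check hand-drawn trees.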
[Figure: example of an insertion in a B+ tree.]
[Figure: B+ tree for 12, 9, 6, 7, 1, 5, 8, 3.]
B+ tree efficiency example

 Assume disk block size B = 512; key size K = 9; data pointer size Pd = 7; tree pointer size Pt = 6

 B+ tree
 Calculate Pint & Pleaf
 Internal node: (Pint * Pt) + (Pint - 1) * K <= B
6 Pint + 9 (Pint - 1) <= 512
15 Pint <= 521
Pint = 34
 Leaf node: Pleaf * (Pd + K) + Pt <= B
16 Pleaf + 6 <= 512
16 Pleaf <= 506
Pleaf = 31
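Solving the two inequalities for the largest integer orders gives closed forms that are easy to script. The function name `bplus_orders` is hypothetical; the parameters are the ones assumed on this slide.

```python
def bplus_orders(B, K, Pd, Pt):
    """Largest Pint and Pleaf that still fit in a block of B bytes."""
    # Pint * Pt + (Pint - 1) * K <= B   =>   Pint <= (B + K) / (Pt + K)
    p_int = (B + K) // (Pt + K)
    # Pleaf * (Pd + K) + Pt <= B        =>   Pleaf <= (B - Pt) / (Pd + K)
    p_leaf = (B - Pt) // (Pd + K)
    return p_int, p_leaf

print(bplus_orders(B=512, K=9, Pd=7, Pt=6))  # (34, 31)
```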
Application Example (disk)
Consider the application of B-trees for directories. Suppose that you are given a B-tree with parameters (Pint) M = 11 and (Pleaf) L = 8. Determine the approximate size of a disk block on the machine where this implementation would be used, assuming you also have the following information:
– Key size = 8 bytes
– Pointer size = 4 bytes
– Data size = 16 bytes per record (including the key)
Give a numeric answer showing all the steps taken to arrive at your answer (compute the possible sizes of both internal and leaf nodes). In addition, give a short explanation using the parameters given and the equations.

• Determine the internal node size

• Determine leaf size

The larger size is 128 (which turns out to be a power of 2), hence the disk block is 128 bytes
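One way to reach the 128-byte figure is sketched below. It assumes an internal node stores M pointers and M - 1 keys, and that a leaf stores L data records with no next-leaf pointer; the second assumption is chosen because it is the reading that reproduces 128. The function name `node_sizes` is made up.

```python
def node_sizes(M, L, key_size, ptr_size, data_size):
    """Return (internal node bytes, leaf node bytes) under the stated assumptions."""
    internal = M * ptr_size + (M - 1) * key_size   # M pointers + (M - 1) keys
    leaf = L * data_size                           # L records, no sibling pointer (assumed)
    return internal, leaf

internal, leaf = node_sizes(M=11, L=8, key_size=8, ptr_size=4, data_size=16)
print(internal, leaf, max(internal, leaf))  # 124 128 128
```

Under these assumptions the internal node needs 11 * 4 + 10 * 8 = 124 bytes and the leaf needs 8 * 16 = 128 bytes, so a 128-byte block holds either kind of node.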
