
V9.0

IBM Training Front cover


Instructor Guide

DB2 10.5 for LUW Advanced Database Administration with DB2 BLU Acceleration
Course code CL463 ERC 6.0

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX®, Balanced Warehouse®, DB™, DB2 Universal Database™, DB2®, InfoSphere®, Notes®, Optim™, PartnerWorld®, pureScale®, Tivoli®, WebSphere®, 400®
Adobe is either a registered trademark or a trademark of Adobe Systems Incorporated in the United
States, and/or other countries.
Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in
the United States and other countries.
Lenovo and ThinkPad are trademarks or registered trademarks of Lenovo in the United States,
other countries, or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.

April 2015 edition


The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

© Copyright International Business Machines Corporation 2005, 2015.


This document may not be reproduced in whole or in part without the prior written permission of IBM.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

Course description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv

Unit 1. Advanced Monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Monitoring infrastructure introduced in DB2 9.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
Snapshot infrastructure characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Characteristics of In-Memory metrics used to support the monitor table functions . . . . . . . 1-9
Focus areas for In-Memory metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12
In-Memory Metrics: System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
In-Memory Metrics: Data objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-16
In-Memory Metrics: Activity perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
Controls for collecting monitor data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20
Monitoring system information using table functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-24
Monitoring time spent waiting for resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-27
Additional time-related metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-29
Additional wait times reported with DB2 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-31
Example of query using the XML document returned by MON_GET_CONNECTION_DETAILS . . . . . . . . . . 1-33
Monitoring administrative views simplify access to important metrics . . . . . . . . . . . . . . . . 1-36
Monitoring performance with SQL: Buffer pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-39
Monitoring performance with SQL: Sorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41
Monitoring performance with SQL: Top dynamic SQL statements . . . . . . . . . . . . . . . . . . 1-43
Monitoring performance with SQL: Long-running SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-45
Monitoring performance with SQL application wait times . . . . . . . . . . . . . . . . . . . . . . . . . 1-48
Monitoring performance with SQL: Lock chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-50
Monitoring performance with SQL: Lock memory usage . . . . . . . . . . . . . . . . . . . . . . . . . . 1-53
Monitoring performance with SQL: Lock escalations, deadlocks and timeouts . . . . . . . . . 1-55
Monitoring performance with SQL queries that have a high prep time . . . . . . . . . . . . . . . 1-57
Monitoring performance with SQL: Costly table scans . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-60
Monitoring performance with SQL: Checking page cleaners . . . . . . . . . . . . . . . . . . . . . . 1-63
Monitoring prefetch efficiency of applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-66
Monitoring performance: Database memory usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-68
Using MONITOR functions with database partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-70
Monitor performance with SQL: Index usage with multiple database partitions . . . . . . . . 1-72
Monitoring performance with SQL DPF queries that waited on FCM Send/Receive . . . . . 1-74
Monitor Log space usage with the table function MON_GET_TRANSACTION_LOG . . . . 1-76
Monitor health: Oldest transaction holding log space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-78
Monitor health: Table space size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-80
Database recovery: Split mirror copy planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-82
Tracking monitor history . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-84
DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs . . . 1-86
DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop . . . . 1-88
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-90
Student Exercise 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-92


Unit 2. Advanced Table Space Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3
Storage Management alternatives (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-5
Storage Management alternatives (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-8
DMS and Automatic Storage table space characteristics . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
DMS table spaces: Internal structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-14
Table space management: High Water Marks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-17
High Water Mark: Dropped table (Example 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-20
High Water Mark: Dropped table (Example 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-22
High Water Mark: Offline table REORG without using temporary tablespace (Example 1) 2-24
High Water Mark: Offline table REORG without using temporary tablespace (Example 2) 2-27
Using MON_GET_CONTAINER and MON_GET_TABLESPACE to check space allocations . . . . . . . . . . 2-30
DB2 functions to reclaim unused storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-32
Reclaiming space using ALTER TABLESPACE with Automatic Storage . . . . . . . . . . . . .2-34
Reclaimable Automatic Storage: Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-37
Reclaimable Automatic Storage: Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-39
Checking table spaces for reclaimable storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-41
Reclaimable storage: Monitoring the processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-44
Table space extent allocation: Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-46
Table space maps: ALTER TABLESPACE ADD example . . . . . . . . . . . . . . . . . . . . . . . .2-49
Monitoring the Rebalancer: LIST UTILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-52
Rebalancer: db2diag.log Status messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-54
Extent Allocations using ALTER NEW STRIPE SET . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-56
Moving a DMS managed tablespace to new containers Online using ALTER TABLESPACE . . . . . . . . . . 2-59
Example of Auto-growth stopping for DMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-61
Using Storage Groups for Automatic storage table spaces . . . . . . . . . . . . . . . . . . . . . . . .2-65
Review - Creating a storage group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-67
Review - Assigning a table space to a storage group . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-69
Query storage groups with SQL using the table function ADMIN_GET_STORAGE_PATHS . . . . . . . . . . 2-71
Listing storage groups with the db2pd command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-74
Changing the storage group for an Automatic Storage table space . . . . . . . . . . . . . . . . . .2-76
Tablespace rebalance can be suspended using ALTER TABLESPACE . . . . . . . . . . . . . .2-79
Monitoring extent movement for when the storage group is altered for a table space . . . .2-81
Table space growth with Automatic Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-84
Automatic Storage Rebalance to use newly added storage paths . . . . . . . . . . . . . . . . . . .2-86
Dropping storage paths: Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-89
Automatic Storage for Temporary table spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-92
Converting a DMS table space to use Automatic Storage . . . . . . . . . . . . . . . . . . . . . . . . .2-94
Example converting a DMS managed tablespace to use Automatic storage . . . . . . . . . . .2-97
Using WLM to prioritize activities based on the data accessed . . . . . . . . . . . . . . . . . . . . .2-99
Data centered Workload Management - Predictive . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-102
Example: Determining the priority of activities based on what data is estimated to be accessed . . . . . . . . . . 2-104
Data centered Workload Management - Reactive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-106
Example: Changing the priority of activities based on what data is accessed during execution - Part 1 . . . . . . . . . . 2-109
Example: Changing the priority of activities based on what data is accessed during execution - Part 2 . . . . . . . . . . 2-111


Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-113


Student Exercise 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-115

Unit 3. DB2 10.5 BLU Acceleration Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
What is DB2 with BLU Acceleration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Traditional methods to improve performance for analytic queries . . . . . . . . . . . . . . . . . . . . 3-8
The Seven Big Ideas of DB2 with BLU Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
Application view of table data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Columnar storage in DB2 (conceptual) – Big Idea #1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Column-organized tables - basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18
Big Idea #2 - Simple to Implement and Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21
What does setting DB2_WORKLOAD=ANALYTICS impact? . . . . . . . . . . . . . . . . . . . . . . . . 3-24
To implement DB2 BLU Acceleration without setting DB2_WORKLOAD . . . . . . . . . . . . . 3-27
What makes DB2 with BLU Acceleration easy to use? . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-30
What happens when you create a new column-organized table? . . . . . . . . . . . . . . . . . . . 3-33
Notes regarding the CREATE TABLE statement for a column-organized table . . . . . . . . . 3-35
Why is there a page map index for a column-organized table? . . . . . . . . . . . . . . . . . . . . 3-38
Big Idea #3: BLU uses Multiple Compression Techniques to achieve extreme compression . . . . . . . . . . 3-40
Column-Level Dictionaries are Static . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-42
Column Dictionaries are built by the LOAD utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-45
Load Processing for column-organized Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-48
Monitor column-organized table LOAD using LIST UTILITIES command . . . . . . . . . . . . . 3-51
Utility Heap Memory Considerations for LOAD utility with column-organized tables . . . . . 3-53
LOAD utility options for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-56
Using the SYSCAT.TABLES data for column-organized tables to check size, compression results . . . . . . . . . . 3-58
Catalog information for Column Oriented tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-61
Use PCTENCODED in SYSCAT.COLUMNS to check for columns with a low percentage of encoded values . . . . . . . . . . 3-64
Compression result examples with column-organized tables using DB2 10.5 . . . . . . . . . . 3-66
Big Idea #4 Data Skipping - Synopsis Table used to improve scan efficiency for column-organized tables . . . . . . . . . . 3-68
Additional information about synopsis tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-71
Sample Describe output for the Synopsis table associated with a column-organized Table . . . . . . . . . . 3-74
Creative approaches to reducing processing costs for queries with column-organized tables . . . . . . . . . . 3-77
Big Idea #5 - Deep Hardware Instruction Exploitation SIMD . . . . . . . . . . . . . . . . . . . . . . . 3-80
Vector processing engine for processing vectors of column data instead of individual values . . . . . . . . . . 3-82
Big Idea #6 - Core friendly parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-84
Intraquery parallelism and intrapartition parallelism required for Column-organized tables 3-86
Big Idea #7 - Scan friendly memory caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-89
Dynamic List prefetching for column-organized table access . . . . . . . . . . . . . . . . . . . . . . 3-91
BLU Acceleration illustration 10TB query in seconds - Register encoded vector processing . . . . . . . . . . 3-94
Storage Objects for Column-Organized Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-97
Monitoring Component Object Allocations using ADMIN_GET_TAB_INFO . . . . . . . . . . . 3-100
Using INSPECT CHECK TABLE for column-organized tables . . . . . . . . . . . . . . . . . . . . . 3-102


LOAD utility considerations for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . .3-104


INSERT processing for column-organized tables - INSERT updates many pages compared to row-organized tables . . . . . . . . . . 3-106
Processing for DELETE and UPDATE SQL statements with column-organized tables . .3-109
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-112

Unit 4. DB2 10.5 BLU Acceleration Implementation and Use . . . . . . . . . . . . . . . . . . . . . . . . .4-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
Considerations for implementing column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . .4-5
System requirements for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-7
General system configuration recommendations for column-organized table usage . . . . .4-10
Sort memory configuration for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . .4-12
Current restrictions for column-organized tables in DB2 10.5 . . . . . . . . . . . . . . . . . . . . . .4-14
IBM InfoSphere Optim Query Workload Tuner for DB2 for LUW estimates benefits for column-organized tables . . . . . . . . . . 4-17
db2convert - command line tool to ease converting row-organized tables to column-organized tables . . . . . . . . . . 4-19
Additional notes for db2convert usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-23
Sample db2convert output shows progress and compression results . . . . . . . . . . . . . . . .4-26
Using ADMIN_MOVE_TABLE to convert row-organized tables to a column-organized table . . . . . . . . . . 4-28
DB2 Utility support for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-32
Referential Integrity and Unique constraints for column-organized tables . . . . . . . . . . . . .4-34
Using explain tools to evaluate access plans for column-organized tables . . . . . . . . . . . .4-36
Explain report for summary query using a column-organized table . . . . . . . . . . . . . . . . . .4-38
Execution Plans for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-41
Example access plan for a column-organized table using an index scan . . . . . . . . . . . . . .4-44
Explain report detail for CTQ operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-47
Explain report object data for column-organized table . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-49
Explain report shows estimated costs vary depending on the number of columns accessed . . . . . . . . . . 4-51
What to look for in access plans for column-organized tables . . . . . . . . . . . . . . . . . . . . . .4-53
Explain report for joining two column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . . .4-56
Explain report for joining three column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . .4-59
Explain report using db2expln for joining three column-organized tables . . . . . . . . . . . . . .4-63
Explain report for joining two column-organized tables and one row-organized table . . . .4-65
DB2 Workload Management of databases with column-organized tables . . . . . . . . . . . . .4-69
Default query concurrency management for ANALYTICS workload databases . . . . . . . . . 4-71
Automatic Workload Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-74
Default workload management objects for concurrency control . . . . . . . . . . . . . . . . . . . . .4-76
Default Workload flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-79
Default Workload Management Explained . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-82
Default Workload Management Explained - continued . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-84
Querying the Default WLM Work Class Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-86
Querying the Default WLM Threshold Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-88
How Many Queries are Above/Below the Cost line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-90
Adjusting the TIMERON Cost Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-92
Easily Tune default WLM controls for over-utilized or under-utilized state . . . . . . . . . . . . .4-94
Using SQL and db2pd to monitor processing column-organized tables . . . . . . . . . . . . . . .4-97
Monitoring metrics for column-organized tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-99
Monitoring column-organized tables and synopsis tables using MON_GET_TABLE . . . .4-102


Monitoring the number of columns referenced per query for each table using MON_GET_TABLE . . . . . . . . . . 4-104
Monitoring Page Map Index statistics for column-organized tables using MON_GET_INDEX . . . . . . . . . . 4-106
Column-organized table join sortheap memory usage can be monitored using HASH join statistics . . . . . . . . . . 4-108
Additional monitoring elements for column-organized table processing . . . . . . . . . . . . . 4-111
Monitor elements to monitor prefetch requests for data in column-organized tables . . . . 4-114
Monitoring Database statistics with column-organized tables using MON_GET_DATABASE . . . . . . . . . . 4-117
Monitor column-organized table LOAD using db2pd command -utilities option . . . . . . . . 4-119
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-121
Student exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-123

Unit 5. DB2 10.5 BLU Acceleration Implementing Shadow Tables and User Maintained MQTs . . . . . . . . . . 5-1
Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
Materialized Query Table – Concept Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
MQT Refresh Options - review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
When can the DB2 Optimizer substitute an MQT table in the access plan for a query? . . . 5-11
Utilization of MQT tables when using Column-organized tables . . . . . . . . . . . . . . . . . . . . 5-13
Creating a User Maintained MQT as a Column-organized table . . . . . . . . . . . . . . . . . . . . 5-16
Loading the data into the User Maintained MQT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18
Checking usage of the User Maintained MQT in the access plan for a query . . . . . . . . . . 5-20
Shadow Tables can be used to accelerate Analytics query processing in an OLTP Database . . . . . . . . . . 5-22
Shadow table characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24
How to create a Shadow table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
Summary of Shadow Tables Benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-30
Example DDL to create a shadow table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
How to enable use of Shadow Tables for an application . . . . . . . . . . . . . . . . . . . . . . . . . . 5-34
More about Latency-based routing to Shadow Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-37
Configuration of SORTHEAP for a database with mixture of BLU Acceleration and OLTP processing . . . . . . . . . . 5-40
Using a Connect Procedure to Enable Shadow Tables for SQL compilation . . . . . . . . . . . 5-42
A Sample Connect Procedure to enable shadow table usage for selected applications . . 5-44
Shadow Tables with InfoSphere Data Replication CDC . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-46
Asynchronous Maintenance of Shadow Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-49
Important Infosphere CDC terms and concepts for Shadow Table Implementation – part 1 . . . . . . . . . . 5-52
Important Infosphere CDC terms and concepts for Shadow Table Implementation – part 2 . . . . . . . . . . 5-55
Task list to implement Infosphere CDC for DB2 LUW to support Shadow Tables . . . . . . . 5-57
Using dmconfigurets to create a CDC Instance associated with the DB2 source database . . . . . . . . . . 5-59
Using the CDC Management Console to define a Datastore linked to the CDC Instance . 5-62
Using the CDC management Console to assign a user connection for the datastore . . . . 5-64
Create a CDC Subscription to manage Table Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . 5-66
Define a table mapping from the Row-organized source to the Column-organized target – Step 1 . . . . . . . . . . 5-68


Define a table mapping from the Row-organized source to the Column-organized target – Step 2 . . . . . . . . . . 5-70
Define a table mapping from the Row-organized source to the Column-organized target – Step 3 . . . . . . . . . . 5-72
Define a table mapping from the Row-organized source to the Column-organized target – Step 4 . . . . . . . . . . 5-74
Start Mirroring for the Shadow Tables using the CDC Subscription . . . . . . . . . . . . . . . . . .5-76
End Replication using the CDC Subscription in order to define additional table mappings 5-79
Use the Management Console to monitor activity for the CDC subscription . . . . . . . . . . .5-81
Using Data Studio to create Shadow Tables or User Maintained MQT tables . . . . . . . . . .5-83
Infosphere Query Workload Tuner support for Shadow tables and Column-organized MQT tables . . . . . . . . . . 5-85
Checking usage of a Shadow Table in the access plan for a query . . . . . . . . . . . . . . . . . .5-87
Using Shadow tables to process queries that join multiple Row-organized tables . . . . . . .5-89
Access plan example for joining two row-organized tables with access routed to two shadow tables . . . . . . . . . . 5-91
Comparison of Column-organized User Maintained MQT tables to Shadow Tables . . . . .5-94
Refresh a Shadow Table using LOAD outside of CDC without concurrent IUDs . . . . . . . .5-96
Using Shadow Tables in a database with HADR Primary and Standby databases in use .5-98
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-101
Student exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-103

Unit 6. Using Optimizer Profiles to control Access Plans . . . . . . . . . . . . . . . . . . . . . . . . . . .6-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-3
Optimizer Profiles Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-5
Optimizer Profiles: Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-7
Optimization profiles: Anatomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-11
Optimization profile schema contents example List of Access Requests . . . . . . . . . . . . . .6-14
Sample optimization profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-16
Putting an optimization profile into effect (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-20
Putting an optimization profile into effect (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-23
Optimization guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-26
Optimizer guidelines example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-29
Forming table references . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-32
Table references with Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-36
Verify statement matching and profile usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-39
Optimization profile supports registry variables and inexact matching . . . . . . . . . . . . . . . .6-42
Inexact matching in Optimization profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-45
Modifying an optimization profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-48
Sample Optimization Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-51
Sample Tables and Indexes used for examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-53
Sample query 1: Default Access plan - Class 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-56
Step 1: One Global optimization guideline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-62
Sample query 1: Access plan - Optimization Level 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-64
Step 2: Multiple Global optimization guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-66
Sample query 1: Access plan - Profile 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-69
Step 3: Use a Statement guideline to select an index . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-74
Sample query 1: Access plan - Profile 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-77
Use a Statement guideline to set index access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-82
Sample query 1: Access plan - Profile 4 - List Prefetch . . . . . . . . . . . . . . . . . . . . . . . . . . .6-85
Define a Statement guideline with multiple indexes: Index ANDING . . . . . . . . . . . . . . . . .6-90


Sample query 1: Access plan - Profile 6 - Index Anding . . . . . . . . . . . . . . . . . . . . . . . . . . 6-93


Two table join: Default Access plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-99
Define a Statement guideline to control table access for joining tables . . . . . . . . . . . . . 6-105
Two table join: Profile 7 sets access methods for two table join . . . . . . . . . . . . . . . . . . . 6-108
Define a Statement guideline to request a Merge Join and also control table access . . 6-113
Two table join: Profile 8 sets Merge Join and access methods for two table join . . . . . . 6-116
Define a Statement guideline to request a Nested Loop Join and also control table access . . . . . . . . . . 6-121
Two table join: Profile 9 sets Nested Loop Join and access methods for two table join . 6-124
Three table join: Default Access Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-129
Define a Statement guideline to request a Hash Join of two tables in a three table join . 6-134
Three table join: Using Optimizer Profile 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-137
Define a Statement guideline to control both joins for a three table join . . . . . . . . . . . . . 6-142
Two table join: Profile 13 sets Merge Join and Hash Joins for three table join . . . . . . . . 6-146
A Zigzag Join can be included in an optimization profile starting with DB2 10.1 . . . . . . . 6-152
Implement a View to simplify the Application Join SQL . . . . . . . . . . . . . . . . . . . . . . . . . . 6-154
An optimization guideline can reference tables based on the View definition . . . . . . . . . 6-157
The Optimized SQL Statement can be used to resolve table references for optimization guidelines . . . . . . . . . . 6-159
Define a Statement guideline that uses the TABID from the Optimized SQL text . . . . . . 6-162
Two table join: Profile 15 uses TABIDs to specify table access methods for Join SQL with a View . . . . . . . . . . 6-165
Define a Statement guideline that specifies the INLIST to Join Rewrite guideline . . . . . 6-170
Access plan using default optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-174
Access plan using a profile with the INLIST to Join rewrite guideline . . . . . . . . . . . . . . . 6-176
Evaluate several MQTs to reduce query costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-179
Create several alternative MQTs for testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-182
Default Access plan uses MQT with summary data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-185
Define a Statement guideline to specify the use of a specific MQT regardless of cost for one SQL statement . . . . . . . . . . 6-191
Access plan based on non-matching SQL text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-194
Access plan based on matching SQL text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-200
Suggestions for success with Optimizer profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-206
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-209
Student Exercise 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-211

Unit 7. Table Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
7.1. Table Partitioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
Large table design alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
Database partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
Multi Dimensional Clustering table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
Using UNION ALL views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
Basics of Materialized Query Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
Table partitioning: What is it and Why use it? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21
Table partitioning: More benefits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-24
Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-26
Creating a range-partitioned table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-29
Considerations for creating a partitioned table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-32
Defining ranges (Long syntax) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-35
Defining ranges (Short syntax) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37


Partitioning on multiple columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-39


Create Table: Open-ended ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-41
Create Table: Inclusive and Exclusive bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-43
Adding new ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-45
Create Table: Naming partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-48
Storage Mapping: Mapping ranges to table spaces (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . .7-51
Storage Mapping: Mapping ranges to table spaces (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . .7-54
Global (Non-partitioned) indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-56
Partitioned indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-58
Example of creating partitioned indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-61
Describe Data Partitions shows partitioned index object table space . . . . . . . . . . . . . . . . .7-63
Using Storage Groups to assign data partitions to different storage devices . . . . . . . . . . .7-65
Explain reports show the performance characteristics of table spaces used for range partitioned tables . . . . . . . . . . 7-67
Using the MON_GET_INDEX function for performance statistics for partitioned indexes .7-69
Storage Mapping: Large objects are Local . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-71
Partition elimination: Table scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-74
Partition elimination shown in DB2 Explain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-77
Example 1: Partitioned and Non-partitioned indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-80
Example 2: Partitioned and Non-partitioned indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-83
Operations for Roll-out and Roll-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-86
Roll-in overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-89
Use SET INTEGRITY to complete the roll-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-92
Using IMMEDIATE UNCHECKED option for SET INTEGRITY following ALTER TABLE ATTACH . . . . . . . . . . 7-95
Exception tables for SET INTEGRITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-98
ALTER TABLE ATTACH locking considerations pre-DB2 10.1 and DB2 10.1 . . . . . . . . .7-100
Alternatives for roll-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-102
Using Refresh Immediate MQTs with table partitioning . . . . . . . . . . . . . . . . . . . . . . . . . .7-105
Tips for smoother roll-in (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-107
Tips for smoother roll-in (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-109
Generated columns, identity columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-112
Roll-out overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-115
Attach or Detach using non-partitioned indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-118
Asynchronous index cleanup after DETACH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-120
Attach or Detach using partitioned indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-123
Table availability during ALTER TABLE Detach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-126
MQTs and DETACH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-129
Utility support for partitioned tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-131
Table partitioning with Database Partitions and MDC Defined . . . . . . . . . . . . . . . . . . . . .7-134
Simultaneous partition elimination and block elimination . . . . . . . . . . . . . . . . . . . . . . . . .7-137
Table partitioning + MDC (roll-in) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-140
Using table partitioning or Database partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-143
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-146
Student exercise 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-148

Unit 8. Advanced Table Reorganization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
8.1. Advanced Table Reorganization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-5
DB2 reorganization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-6
Overflow records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-8


Goals of the REORG utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-11


REORG does not always shrink a table: Reclustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-14
REORG does not always shrink a table: PERCENT FREE . . . . . . . . . . . . . . . . . . . . . . . . 8-16
When to REORG? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-19
Recommending REORG: REORGCHK (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-22
REORGCHK: Table statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-25
Recommending REORG: REORGCHK (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-28
REORGCHK: Index statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-31
Using REORGCHK results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-34
Access modes of REORG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-37
Invoking Table or Index reorganization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-40
Table reorganization: CLP syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-43
Classic (Offline) table reorg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-46
Reclustering REORGs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-49
Offline table REORG: Reclustering – Table scan sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-51
Offline table REORG: Table space storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-54
Offline table REORG: Scan sort storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-57
Page size considerations: Offline table REORG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-59
DB2 Compression feature summary – A brief history . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-61
DB2 10.1 implements page level adaptive compression . . . . . . . . . . . . . . . . . . . . . . . . . . 8-64
How Does Adaptive Compression Work? Step 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-67
How Does Adaptive Compression Work? Step 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-69
Manual Dictionary building: Offline table REORG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-71
Offline REORG with RESETDICTIONARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-74
Offline REORG with KEEPDICTIONARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-77
Compression Dictionary Build using REORG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-80
Automatic Dictionary Creation concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-83
Automatic Compression Dictionary Creation on data population . . . . . . . . . . . . . . . . . . . . 8-86
Using ADMIN_GET_TAB_COMPRESS_INFO table function result for classic compressed tables . . . . . . . . . . 8-89
Using db2pd to monitor table reorg status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-91
Using db2pd to check reorg statistics when a reorg fails to complete . . . . . . . . . . . . . . . . 8-94
Query access to REORG information in Database History . . . . . . . . . . . . . . . . . . . . . . . . 8-96
Online table reorganization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-99
DB2 10.5 Enhancements for Online Table reorganization . . . . . . . . . . . . . . . . . . . . . . . . 8-101
Online table reorganization: Algorithm choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-104
Online table reorganization: How does it work? (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . 8-107
Online table reorganization: How does it work? (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . 8-110
Online table reorganization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-113
Online table reorganization: Usage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-116
Online table reorganization: Monitoring with db2pd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-118
REORG and MDC tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-120
REORG and range-partitioned tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-123
Using REORGCHK reports for range-partitioned tables . . . . . . . . . . . . . . . . . . . . . . . . . 8-126
Offline or Online Table REORG? (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-128
Offline or Online table REORG? (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-131
Online index reorganization and creation (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-134
Online index reorganization and creation (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-137
Online index reorganization and creation syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-140
Online index reorganization and creation: Considerations . . . . . . . . . . . . . . . . . . . . . . . . 8-142
Characteristics for standard Online Index reorganization . . . . . . . . . . . . . . . . . . . . . . . . . 8-144


Starting with DB2 10.1 the RECLAIM EXTENTS option can be used for reclaiming unused index object space . . . . . . . . . . 8-146
Checking indexes for reclaimable space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-148
Using REORG INDEXES to reclaim index space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-150
Monitoring online index REORG status with db2pd commands . . . . . . . . . . . . . . . . . . . .8-152
Online Table and Index REORG: Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-154
Offline table REORG: Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-157
REORG utility: Crash recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-159
REORG utility: Roll forward recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-162
Incompatibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-164
Locking for the REORG utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-166
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-170
Student Exercise 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-172

Unit 9. Multiple Dimension Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-2
Single dimensional clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-4
9.1. Multidimensional Clustering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-7
MDC: Rows clustered by Dimension values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-8
MDC Create Table example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-11
Terminology: Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-14
Terminology: Slice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-16
Terminology: Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-18
Dimension block indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-20
Block Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-22
Row Indexes versus Block Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-24
Block Index characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-26
Block Index: Dynamic bitmap ANDing - Query example 1 . . . . . . . . . . . . . . . . . . . . .9-28
Query processing: Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-30
Query processing: Example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-32
Query processing: Example 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-34
Differences between MDCs and clustering indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-36
The Block Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-39
Insert processing details: Existing block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-41
Insert processing details: New block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-43
Delete processing details: Not empty block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-45
Delete processing details: Empty block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-47
Update processing for MDC tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-49
Reduced index maintenance and logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-51
9.2. Multidimensional Table Processing using Load. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-53
Load: Fast and efficient data roll-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-54
Load processing for MDC tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-56
Performance and tuning: MDC load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-58
MDC locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-61
Examples of MDC locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-63
Using block locking for data roll-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-66
MDC Rollout Delete performance options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-69
Enabling Rollout options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-72
MDC Immediate Index Cleanup Rollout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-76
MDC Deferred Index Cleanup Rollout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-78
MDC rollout log space usage examples . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-81
Showing the background Index Cleanup process . . . . . . . . . . . . . . . . . . . . . . . 9-83
Which rollout should be used? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-85
MDC Rollout restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-88
Sparse MDC tables: Effects of large data rollout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-91
Using REORG with RECLAIM EXTENTS ONLY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-93
REORG with RECLAIM EXTENTS ONLY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-95
Checking for MDC tables with reclaimable space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-97
MDC design considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-99
Considerations for Dimension selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-101
MDC Dimension on a generated column . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-104
MDC and generated columns: Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-107
The Importance of Monotonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-110
Step 1: Identify candidate dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-113
Step 2: Estimate number of cells per table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-116
Step 3: Cell density statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-119
Database partitioning and MDC (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-122
Database partitioning and MDC (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-124
MDC tuning summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-126
MDC Design Advisor (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-128
MDC Design Advisor (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-131
MDC performance: Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-135
Example: Object size comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-137
Example: Point query on Block Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-139
Example: Range query on a Block Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-141
Example: Range query on two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-143
Example: Full table scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-145
Example: Query on a Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-147
Example: Index ORing of Block and RID Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-149
Example: Point query on promotion RID Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-151
Example: Nested loop join with RID Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-153
Example: Nested loop join with Block Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-155
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-157
Student Exercise 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-159

Unit 10. Advanced Data Movement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Review - Load utility characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Load utility phases: Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
Load process model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
Multipartition Load utility process model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
Online Load: ALLOW READ ACCESS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
Load options affecting performance (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-19
Load options affecting performance (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
Other Load performance factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-25
Index maintenance in Load (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-28
Index maintenance in Load (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
LOAD performance versus CREATE INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
Tuning the Index Build phase for LOAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-37
Load using a named pipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-40
Load from Cursor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-42
Additional LOAD Input options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-44
LOAD performance experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-50
Checking Load status: Load query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-52
Load monitoring: LIST UTILITIES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-57
Load utility recovery options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-60
Load utility effects during rollforward recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-63
INGEST utility - Why a new utility? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-66
Deciding where to run the INGEST utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-69
Most basic INGEST command syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-72
Input types and formats for INGEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-74
Ingest - Input types and formats: examples . . . . . . . . . . . . . . . . . . . . . . . . . . .10-77
INGEST - Using Field definition lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-79
Using Field definition lists and SQL expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-82
INGEST Field options -- example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-84
Ingest SQL statements - UPDATE example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-86
INGEST SQL statements - Merge and Delete examples . . . . . . . . . . . . . . . . . . . . . . . . .10-88
Fault toleration options for INGEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-90
INGEST Fault toleration example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-93
INGEST - Error handling options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-95
Error handling options -- Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-99
INGEST - Restart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-101
INGEST Restart (continued) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-104
INGEST - Restart -- Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-107
Monitoring - Example of INGEST LIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-109
Monitoring - Example of INGEST GET STATS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-111
INGEST utility - Configuration parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-114
INGEST processing architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-117
Comparison to IMPORT and LOAD - supported Table types . . . . . . . . . . . . . . . . . . . . .10-120
Comparison to IMPORT and LOAD - Column types . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-122
Comparison to IMPORT and LOAD Input types and formats . . . . . . . . . . . . . . . . . . . . .10-124
When to use INGEST rather than LOAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-126
When to use LOAD rather than INGEST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-128
db2move utility options: Export/Import/Load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-130
db2move considerations for Export/Import/Load options . . . . . . . . . . . . . . . . . . . . . . . .10-132
db2move COPY option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-135
db2move COPY schema examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-139
ADMIN_COPY_SCHEMA procedure: Copy a specific schema and its objects in same database . . . .10-141
Considerations for making changes to DB2 tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-146
Online Table Move stored procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-149
ADMIN_MOVE_TABLE: Processing phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-151
ADMIN_MOVE_TABLE procedure methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-153
ADMIN_MOVE_TABLE call parameters (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-155
ADMIN_MOVE_TABLE call parameters (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-159
ADMIN_MOVE_TABLE: INIT phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-162
ADMIN_MOVE_TABLE: Copy phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-165
ADMIN_MOVE_TABLE: Replay phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-168
ADMIN_MOVE_TABLE: Swap phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-171
Using Step-mode calls for ADMIN_TABLE_MOVE . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-174
ADMIN_MOVE_TABLE processing options (1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-177
ADMIN_MOVE_TABLE processing options (2) . . . . . . . . . . . . . . . . . . . . . . . . . .10-180
Setting options in the SYSTOOLS.ADMIN_MOVE_TABLE control table using ADMIN_MOVE_TABLE_UTIL . . . . . . 10-183
ADMIN_MOVE_TABLE_UTIL settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-186
Online Table Move makes use of Source Table Indexes . . . . . . . . . . . . . . . . . . . . . . . . 10-189
Impact of source index selected for REPLAY processing . . . . . . . . . . . . . . . . . . . . . . . 10-192
Additional index considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-195
Using ADMIN_MOVE_TABLE with data compression . . . . . . . . . . . . . . . . . . . . . . . . . . 10-198
Example 1: Move a table to new table spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-201
Example 2: Move a table to a manually created target table . . . . . . . . . . . . . . . . . . . . . 10-203
Example 3: Move a table with multiple steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-205
Objects and privileges that are preserved during the online table movement . . . . . . . . 10-208
Restrictions for using ADMIN_MOVE_TABLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-211
General suggestions for using ADMIN_MOVE_TABLE . . . . . . . . . . . . . . . . . . . . . . . . . 10-214
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-216
Student exercise 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-218

Unit 11. DB2 Database Auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
DB2 Audit facilities prior to DB2 9.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-5
Standard DB2 Audit data categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
Limitations for audit support prior to DB2 9.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-10
DB2 Database Audit features: Part 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
DB2 Database Audit features: Part 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-16
db2audit command used to manage instance-level auditing . . . . . . . . . . . . . . . . . . . . . . 11-19
db2audit command examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-24
DB2 instance and database auditing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-27
Audit path configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-30
Example of configured audit data and archive paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-35
Creating Audit policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-38
Audit policies are assigned using AUDIT statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-43
Audit statement additional information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-48
Audit granularity: The database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-50
Audit granularity: Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-52
Audit granularity: Authorities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-54
Granularity: Users, Groups and Roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-57
Granularity: Trusted contexts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-59
EXECUTE category (1 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-61
EXECUTE category (2 of 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-63
Audit-related Stored Procedures and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-66
Listing archived audit logs using a Table function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-70
Access to audit data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-73
Example of query using EXECUTE data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-76
Extracting audit data to a file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-78
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-85
Student exercise 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-87

Trademarks
The reader should recognize that the following terms, which appear in the content of this training
document, are official trademarks of IBM or other companies:
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in many
jurisdictions worldwide:
AIX® Balanced Warehouse® DB™
DB2 Universal Database™ DB2® InfoSphere®
Notes® Optim™ PartnerWorld®
pureScale® Tivoli® WebSphere®
400®
Adobe is either a registered trademark or a trademark of Adobe Systems Incorporated in the United
States, and/or other countries.
Intel and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in
the United States and other countries.
Lenovo and ThinkPad are trademarks or registered trademarks of Lenovo in the United States,
other countries, or both.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
VMware and the VMware "boxes" logo and design, Virtual SMP and VMotion are registered
trademarks or trademarks (the "Marks") of VMware, Inc. in the United States and/or other
jurisdictions.
Other product and service names might be trademarks of IBM or other companies.

Course description


DB2 10.5 for LUW Advanced Database Administration with DB2 BLU
Acceleration

Duration: 5 days

Purpose
This course is designed to teach you how to:
• Fully use the advanced technical functions and features of DB2
LUW 10.1 and 10.5.
• Implement DB2 BLU Acceleration, column-organized table
support, for a new or existing DB2 database.
• Describe how the column dictionaries used for DB2 BLU
Acceleration are built and utilized to provide extreme compression
for column-organized tables.
• Explain the default workload management used for DB2 BLU
Acceleration processing and how you can tailor the WLM objects to
efficiently use system resources.
• Monitor a DB2 database or application that uses column-organized
tables using SQL monitor functions.
• Implement Shadow tables for selected row-organized tables to
improve analytics query performance
• Configure a DB2 database that supports a mixture of application
processing, including OLTP and Analytics query processing with
Shadow tables
• Create the Infosphere CDC Datastore, Subscription and Table
mappings required to support Shadow tables
• Implement a User Maintained MQT for a column-organized table
• Create optimization profiles that allow applications to control
specific operations included in the access plans selected by the
DB2 Optimizer, like which index is used to access a table or which
join method to utilize for joining tables.
• Perform advanced monitoring using the DB2 administrative views
and routines in SQL queries.
• Configure and manage the implementation of DB2 instance or
database level auditing, including using the db2audit command
and creation of audit policies which can be assigned to specific
tables, users or database roles to perform selective collection of
audit records.
• Explore DB2's management of disk space usage in Database
Managed Storage (DMS) table spaces, including the activities of
the rebalancer. Use SQL queries and utilities to check the high
water mark on table spaces and to monitor the rebalance
operation.
• Move data from one table to another or from one database to
another using utilities like db2move.
• Utilize the ADMIN_MOVE_TABLE procedure to implement table
changes with a minimal impact to data availability for applications.
• Implement automatic storage management for table spaces and
storage groups or enable automatic resize options for DMS
managed table spaces to reduce administration requirements and
complexity.
• Exploit and monitor the REORG utility processing for offline and
online table and index reorganization. This includes planning for
the disk space and database log space necessary for
reorganization.
• Utilize the REORG Utility to implement row compression for large
tables, to reduce disk utilization and improve I/O performance for a
DB2 database and understand the automatic creation of
compression dictionaries.
• For Multidimensional Clustering (MDC) tables, determine how to
select the dimension columns and table space extent size for
efficient implementation of MDC tables. Compare the block
indexes used with MDC tables with rows based indexes. Select the
MDC rollout option that best matches application needs and
achieves the best performance results.
• Plan and implement range based table partitioning for large DB2
tables. Utilize the ALTER TABLE ATTACH and DETACH options to
support roll-in and roll-out operations for range-partitioned tables.
Compare the advantages of selecting or combining range
partitioning with the hash-based partitioning used in DB2
partitioned databases or the multiple dimensions provided by MDC
tables.
• You get practical experience in the planning and utilization of a
wide variety of DB2 LUW utilities and functions by performing a series
of lab exercises using DB2 Advanced Enterprise Edition 10.5
installed on a Linux platform. One lab exercise uses Infosphere
Change Data Capture Management Console installed on a
Windows system. The exercises build skills that can be applied to
DB2 database servers on any Linux, UNIX or Windows
environment.

Audience
This is an advanced course for DB2 LUW experienced database
administrators who support DB2 for UNIX, Windows, and Linux
databases.

Prerequisites
You should complete:
• DB2 10 for LUW: Basic Administration for Linux and Windows
(CL2X3) or
• DB2 10 for LUW: Basic Administration for AIX (CL213) or
• DB2 10 for Linux, UNIX, and Windows Quickstart for Experienced
Relational DBAs (CL485)
• Or have equivalent experience

Objectives
After completing this course, you should be able to:
• Monitor a DB2 LUW database using command line processor
queries
• Implement DB2 BLU Acceleration, column-organized table
support, for a new or existing DB2 database.
• Configure a DB2 database that uses DB2 BLU Acceleration,
column-organized tables, including sort memory and utility heap
memory considerations
• Describe the default workload management used for DB2 BLU
Acceleration processing and how you can tailor the WLM objects to
efficiently use system resources
• Implement Shadow tables for selected row-organized tables to
improve analytics query performance
• Configure a DB2 database that supports a mixture of application
processing, including OLTP and Analytics query processing with
Shadow tables
• Create the Infosphere CDC Datastore, Subscription and Table
mappings required to support Shadow tables

• Implement DB2 Instance audit data collection using the db2audit
command or database level auditing by creating audit policy
objects and assigning the policies to objects using the AUDIT
command.
• Analyze REORGCHK reports to determine if the table or the index
reorganization would improve database efficiency. Invoke and
monitor the processing for the REORG utility running offline or
online
• Manage the disk space allocated in DMS table spaces using
ALTER TABLESPACE to extend or to reduce the containers, and
monitor the progress of the DB2 rebalancer process
• Implement automatic resize for DMS table spaces or Automatic
Storage management for table spaces to reduce the complexity of
managing DB2 LUW databases
• Describe the conditions that would impact selection of the INGEST
utility rather than using LOAD
• Set the options for the INGEST utility and monitor ingest
processing
• Plan and execute the DB2MOVE utility to copy selected table data
for an entire schema for objects from one DB2 database to another
• Implement an optimization profile to control a portion of the access
plan selected by the DB2 Optimizer to achieve specific application
performance results
• Select options and processing modes for the online table move
procedure, ADMIN_MOVE_TABLE, to implement changes to
tables with minimal loss of data access by applications
• Plan and implement MDC tables to improve application
performance, including selecting the appropriate table space
extent size
• Utilize range-based partitioned tables to support large DB2 tables
that require very efficient roll-in and roll-out capabilities

Contents

Advanced Monitoring
Advanced Table Space Management
DB2 10.5 BLU Acceleration Concepts
DB2 10.5 BLU Acceleration Implementation and Use
DB2 10.5 BLU Acceleration Implementing Shadow Tables and User
Maintained MQTs
Using Optimizer Profiles to control Access Plans
Table Partitioning
Advanced Data Movement
Advanced Table Reorganization
Multiple Dimension Clustering
DB2 Database Auditing

Agenda
Day 1
(00:20) Welcome
(02:00) Unit 1: Advanced Monitoring
(01:00) Exercise 1: DB2 Advanced Monitoring with SQL
(02:00) Unit 2: Advanced Table Space Management
(01:00) Exercise 2: DB2 Advanced DMS Table Space Management

Day 2
(01:30) Unit 3: DB2 10.5 BLU Acceleration Concepts
(01:30) Unit 4: DB2 10.5 BLU Acceleration Implementation and Use
(01:00) Exercise 3:Using DB2 BLU Acceleration to improve
performance for analytics query processing.
(01:15) Unit 5: DB2 10.5 BLU Acceleration Implementing Shadow
Tables and User Maintained MQTs
(01:00) Exercise 4:Implement Shadow Tables and User Maintained
Materialized Query Tables

Day 3
(02:30) Unit 6: Using Optimizer Profiles to control Access Plans
(01:00) Exercise 5: Using Optimizer Profiles to control Access Plans
(02:30) Unit 7: Table Partitioning
(01:30) Exercise 6: Range-partitioned Tables

Day 4
(02:30) Unit 8: Advanced Table Reorganization
(01:15) Exercise 7: Advanced Table Reorganization
(02:00) Unit 9: Multiple Dimension Clustering
(01:00) Exercise 8: DB2 Multidimensional Clustering

Day 5
(02:30) Unit 10: Advanced Data Movement
(01:15) Exercise 9: DB2 Advanced Data Movement
(01:00) Unit 11: DB2 Database Auditing
(01:00) Exercise 10: DB2 Database Audit implementation

Unit 1. Advanced Monitoring

Estimated time
02:00

What this unit is about


This unit describes a methodology for monitoring DB2 databases
using SQL queries from a command line processor session or using a
tool like IBM Data Studio. The SQL queries combine DB2's
administrative table functions and views with standard SQL functions
to produce specific results that are easy to run and analyze.

What you should be able to do


After completing this unit, you should be able to:
• Compare the infrastructure used to support SNAPSHOT
monitoring with the current monitoring infrastructure
• Configure a database to collect the activity, request and object
metrics returned by the Monitoring Table functions
• Investigate current application activity that might indicate
performance problems using SQL statements
• Use the DB2-provided views and functions in SQL to evaluate
efficient use of database memory for locks, sorting and database
buffer pools
• Check database health indicators, like log space available and
table space utilization, using CLP queries with Monitor
functions and views

Unit objectives
After completing this unit, you should be able to:
• Compare the infrastructure used to support SNAPSHOT
monitoring with the current monitoring infrastructure
• Configure a database to collect the activity, request and object
metrics returned by the Monitoring Table functions
• Investigate current application activity that might indicate
performance problems using SQL statements
• Use the DB2-provided views and functions in SQL to evaluate
efficient use of database memory for locks, sorting and
database buffer pools
• Check database health indicators, like log space available and
table space utilization, using CLP queries with Monitor
functions and views


Figure 1-1. Unit objectives CL4636.0

Notes:
Here are the objectives for this lecture unit.

Instructor notes:


Purpose — Identify the objectives of this unit.
Details — State the objectives.
Additional information —
Transition statement — First, we will introduce the monitoring infrastructure introduced
with DB2 9.7.

Monitoring infrastructure introduced in DB2 9.7


• DB2 LUW is moving away from system monitor and snapshot
technology for database monitoring used prior to DB2 9.7
– Moving towards constant in-memory aggregation and accumulation of metrics
within DB2 at different levels

• DB2 9.7 introduced a low-impact, efficient alternative to the traditional
system monitor infrastructure
– Independent from system monitor (that is, not connected to existing infrastructure
or monitor switches)

• Why implement a new infrastructure?


– Snapshot produces large volumes of information
• Can be hard to consume, analyze, and store for future reference
– Taking snapshots sometimes has an unintended impact on system performance:
• Contention with ongoing work while snapshot attempts to get a cohesive picture of
whole system
• Volumes of data can cause resource consumption issues
– Monitoring needs to be available all the time, not just when problems occur
• Needs to be lightweight in all facets including access


Figure 1-2. Monitoring infrastructure introduced in DB2 9.7 CL4636.0

Notes:
DB2 9.7 introduced a new monitoring infrastructure that you can access through new table
functions and new event monitors. This infrastructure is a superior alternative to the
existing system monitor, event monitors, snapshot commands, and snapshot SQL
interfaces.
The new monitoring infrastructure is independent from the snapshot based statistics used
in prior releases and does not use the DB2 instance level switches to control detailed data
collection.
The GET SNAPSHOT command reports return large amounts of information, so
locating the specific data needed can take extra effort. For example, the table snapshot
lists every table accessed and the table space snapshot lists every defined table space.
Using the snapshot based SQL functions and views allowed the snapshot statistics to be
selectively accessed, but the internal processing would still incur the overhead associated
with producing a complete set of statistics.

The system overhead associated with some of the snapshot switches that control collection
of detailed statistics led some users to turn on the detailed data only when needed for a
specific problem or tuning effort.
The new monitoring infrastructure was designed to provide detailed system and application
statistics with a reduced overhead for collecting the monitor data. The plan is to provide a
set of views and functions based on the new monitoring facility that will eventually
completely replace snapshot based monitoring.

Instructor notes:
Purpose — To introduce the new monitoring infrastructure implemented in DB2 9.7 and
explain some of the reasons for this new direction in database monitoring facilities.
Details —
Additional information — Even though this level of the course presents the product
functions at the Version 10 level (10.1 and 10.5), some students may support databases using
older releases where snapshot monitoring was the standard method.
Transition statement — Let’s review some of the characteristics of the snapshot
monitoring infrastructure used in previous releases of DB2 LUW.


Snapshot infrastructure characteristics


• Overhead with all switches ON for OLTP, about 6%
• Snapshot interfaces are C APIs:
– CLP command invokes the APIs
– SQL wrappers (for example, views and functions) invoke the APIs

• SQL snapshot wrappers impose heavy overhead:


– Native binary snapshot output mapped to relational output
– Full data stream is returned (size depends on snapshot type):
• No filtering at source; filtered during SQL statement processing
• System temporary table space used to sort and filter

• Routines are fenced


• Need SYSMON authority as well as EXECUTE or SELECT on
SQL wrapper


Figure 1-3. Snapshot infrastructure characteristics CL4636.0

Notes:
The overhead associated with collecting the detailed snapshot statistics varies depending
on the application workload. A test performed using an OLTP transaction system showed
that about 6 percent overhead was added if all of the snapshot switches were turned on.
The snapshot monitoring facilities are based on a set of C programming language API
calls. The SQL wrappers that provide the results for the DB2 snapshot based
administrative functions and views utilize the same C based API calls which are executed
as fenced routines. These routines use the Snapshot API calls to retrieve a complete set of
statistics and then filter the data to match the specific SQL statement. So a SQL statement
might request a few data elements for one table space, but all the detailed information
about every table space would be collected and brought into memory. In some cases,
system temporary table space would be used to produce the results.
In order to access the snapshot-based statistics using the SQL functions or views, a user
needs to be a member of one of the DB2 instance level group authorizations: SYSADM,
SYSCTRL, SYSMAINT or SYSMON.
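As an illustration (a sketch added here, not from the original materials; the table
space name is a placeholder), even a narrowly filtered query against one of the
snapshot administrative views still materializes the complete snapshot internally
before the predicate is applied:

   -- Snapshot-based access: the full table space snapshot is
   -- collected, then filtered during SQL statement processing
   SELECT tbsp_name, tbsp_type
     FROM sysibmadm.snaptbsp
     WHERE tbsp_name = 'USERSPACE1'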

Instructor notes:
Purpose — To discuss some of the system overhead issues associated with collecting and
retrieving snapshot based monitor data.
Details —
Additional information —
Transition statement — Next we will discuss the characteristics associated with collecting
the in-memory metrics used for the new monitor table functions and views.

Characteristics of In-Memory metrics used to
support the monitor table functions
• Overhead with all metrics active, for OLTP, about 3%

• Collection of metrics:
– DB2 Agent collects metrics locally during processing
– Agent rolls up metrics at logical break points during processing:
• Unit of work boundary
• Approximately every 10 seconds for long running transactions

• Access to monitor metrics:


– Data is queried directly from aggregates kept at the target accumulation point
• No need to drill down to agent or application levels for accumulation
– SQL access is through trusted table functions with direct memory access
– Input arguments and predicates allow filtering of data at source to reduce
volumes and overhead of queries
• Functions allow a request for a single table, table space or database connection to
be specified


Figure 1-4. Characteristics of In-Memory metrics used to support the monitor table functions CL4636.0

Notes:
The in-memory metrics used to support the current monitor functions were designed to
reduce the system overhead associated with collecting those metrics. A test performed
using an OLTP transaction system showed that about 3 percent overhead was added if all of
the new metrics were collected.
One method used to reduce the overhead is to have each DB2 agent collect the metrics
locally as the processing is performed. The information collected by each agent is rolled up
to the higher levels, like connection, workload and service subclass at logical breakpoints
like the end of a unit of work. For long-running transactions, the metrics get rolled up on an
interval of about 10 seconds. This keeps the statistics at the higher levels close to the
actual numbers but reduces the cost of tracking work at the higher levels.
When monitor data is requested, it is reported directly from the higher levels of aggregation
rather than drilling down to each individual agent. SQL access to the in-memory metrics is
performed using a set of trusted functions with less processing overhead.

Many of the monitor table functions provide call parameters that limit the scope of
processing for collecting metrics to a single occurrence. For example, the table function
MON_GET_TABLE can be called for information about one table.
The snapshot-based functions and views were created to match the various sections of the
matching snapshot report. For example, a query might need to join the SNAPAPPL,
SNAPAPPL_INFO and SNAPSTMT views to retrieve necessary statistics for a single
database connection. The single function MON_GET_CONNECTION can be used to
access the current in-memory metrics for a database connection.
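As a hedged sketch of that difference (the schema and table names are placeholders;
the final argument is the database member, with -2 meaning all members):

   -- Collection is scoped at the source to a single table
   SELECT tabname, rows_read, rows_inserted, rows_updated
     FROM TABLE(MON_GET_TABLE('APPSCHEMA', 'HISTORY', -2)) AS t

   -- One function call returns the in-memory metrics for every connection
   SELECT application_handle, rows_read, total_rqst_time
     FROM TABLE(MON_GET_CONNECTION(NULL, -2)) AS c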

Instructor notes:


Purpose — To discuss the characteristics associated with the in-memory metrics used to
support the monitor table functions.
Details —
Additional information —
Transition statement — Next, we will discuss the three focus areas for the monitor table
functions.

Focus areas for In-Memory metrics


• System:
– Provide total perspective of application work being done by
database system
– Aggregated through the WLM infrastructure

• Data objects:
– Provide perspective of impact of application work on data objects
– Aggregated through data storage infrastructure

• Activity:
– Provide perspective of work being done by specific SQL statements
– Aggregated through the package cache infrastructure


Figure 1-5. Focus areas for In-Memory metrics CL4636.0

Notes:
The DB2 relational monitoring interfaces are based on three basic views of the database
server.
With a 'system' view, all of the work performed is collected and then aggregated based on
the workload management infrastructure. The statistics for the current units of work and
connections can be retrieved. The accumulation of work performed for each WLM workload
or service subclass can also be accessed.
The data object view provides metrics for each table, index, buffer pool, and table space in
the database. Metrics at the table space container level are also available. These container
level statistics could be used to track the disk activity by container. In some cases a
performance problem might be shown using these container-level metrics that would not be
as clear when viewed at the table space level.
The activity view is also based on the workload management concepts. The metrics for
current activities, like SQL statements or LOAD utilities are available. The package cache
statistics provide a longer term view, providing the summary statistics for multiple
executions of both static and dynamic statements.

Instructor notes:


Purpose — To discuss the three focus areas or views behind the in-memory metrics.
Details —
Additional information —
Transition statement — Next we will look at the system-based metrics.

In-Memory Metrics: System

(The visual shows request metrics, ΣR, collected by the DB2 agent for each
database request and accumulated at the unit of work (workload occurrence),
connection, workload definition, and service class levels.)

Table functions reporting the system perspective:
• MON_GET_UNIT_OF_WORK
• MON_GET_UNIT_OF_WORK_DETAILS
• MON_GET_CONNECTION
• MON_GET_CONNECTION_DETAILS
• MON_GET_SERVICE_SUBCLASS
• MON_GET_SERVICE_SUBCLASS_DETAILS
• MON_GET_WORKLOAD
• MON_GET_WORKLOAD_DETAILS
• MON_GET_DATABASE
• MON_GET_DATABASE_DETAILS

Legend: ΣR = Accumulation of request metrics collected by agent

Figure 1-6. In-Memory Metrics: System CL4636.0

Notes:
As application requests are processed by the database manager, the various statistics are
added to each level used for reporting.
The detailed processing metrics are added to the specific unit of work that generated the
database request. Each unit of work is added to the statistics for the connection associated
with the unit of work. The statistics are also added to the workload and service subclass
that the request was processed under.
With DB2 10.5 and later, database level statistics are also available, using
MON_GET_DATABASE and MON_GET_DATABASE_DETAILS.
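For example, a minimal sketch (not from the original materials) that reads these
aggregates directly at the service subclass level; the NULL arguments mean all
service superclasses and subclasses, and -2 means all database members:

   SELECT service_superclass_name, service_subclass_name,
          total_cpu_time, total_rqst_time
     FROM TABLE(MON_GET_SERVICE_SUBCLASS(NULL, NULL, -2)) AS s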

Instructor notes:


Purpose — To explain how the processing for each database request adds to the
information collected for the related unit of work, service class and workload.
Details —
Additional information —
Transition statement — Next we will look at the collection of statistics for various
database objects.

In-Memory Metrics: Data objects

(The visual shows the DB2 agent collecting metrics for each database request,
with the metrics accumulated for the table accessed (row, LOB, index, and XML
data), the table spaces and their containers, including temporary table spaces,
and the buffer pools used.)

Table functions reporting the data object perspective:
• MON_GET_TABLE
• MON_GET_INDEX
• MON_GET_BUFFERPOOL
• MON_GET_TABLESPACE
• MON_GET_CONTAINER

Figure 1-7. In-Memory Metrics: Data objects CL4636.0

Notes:
As each database request is processed, the statistics are accumulated for each related
database object, including:
• Each table accessed
• Each index utilized for a request
• The table spaces that were accessed for data, index, large object or XML data.
• Some requests could generate activity for temporary table spaces that would also need
to be accumulated.
• Statistics are tracked for each container within a table space. This would show if one or
more containers might be related to an I/O performance bottleneck.
• Each buffer pool accessed will also reflect the activity for the request.
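A short sketch of how these object-level aggregates might be queried, for example
to rank table spaces by physical data reads (an illustrative query, not from the
original materials; -2 means all database members):

   SELECT tbsp_name, pool_data_l_reads, pool_data_p_reads
     FROM TABLE(MON_GET_TABLESPACE(NULL, -2)) AS t
     ORDER BY pool_data_p_reads DESC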

Instructor notes:


Purpose — To explain how the statistics for each database object could be tracked as an
application request is handled.
Details —
Additional information —
Transition statement — Next we will look at the activity perspective of the in-memory
metrics.


In-Memory Metrics: Activity perspective

(The visual shows activity metrics, ΣA, collected by the DB2 agent during
activity execution, aggregated in the WLM activity control block and, when the
activity completes, rolled up into the package cache entry for the statement.)

Table functions reporting the activity perspective:
• MON_GET_ACTIVITY_DETAILS
• MON_GET_PKG_CACHE_STMT
• MON_GET_PKG_CACHE_STMT_DETAILS

Legend: ΣA = Accumulation of metrics from activity execution portion of request

Figure 1-8. In-Memory Metrics: Activity perspective CL4636.0

Notes:
A request comes into an agent and is processed. If the request is related to an activity, then
the agent gathers the metrics from the start of activity execution and at regular intervals
aggregates them in the activity control block. When the activity completes, those activity
execution metrics are propagated to the package cache and aggregated under the specific
cached section that was executed (static and dynamic SQL).
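A hedged sketch that uses this package cache rollup to list the most CPU-expensive
dynamic statements ('D' restricts the result to dynamic SQL; the NULL arguments and
-2 mean no further filtering):

   SELECT num_executions, total_cpu_time,
          SUBSTR(stmt_text, 1, 60) AS stmt_text
     FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS p
     ORDER BY total_cpu_time DESC
     FETCH FIRST 5 ROWS ONLY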

Instructor notes:


Purpose — To discuss the activity-based perspective for in-memory statistics.
Details —
Additional information —
Transition statement — Next we will look at options to control collection of metrics for the
new monitoring functions.

Controls for collecting monitor data


Monitoring table functions and the controls for collecting their metrics:

• Always collected:
  MON_GET_TABLE, MON_GET_INDEX
• Controlled by the mon_obj_metrics database configuration parameter:
  MON_GET_BUFFERPOOL, MON_GET_TABLESPACE, MON_GET_CONTAINER
• Controlled by the mon_act_metrics database configuration parameter or the
  COLLECT ACTIVITY METRICS clause on workloads:
  MON_GET_ACTIVITY_DETAILS, MON_GET_PKG_CACHE_STMT,
  MON_GET_PKG_CACHE_STMT_DETAILS
• Controlled by the mon_req_metrics database configuration parameter or the
  COLLECT REQUEST METRICS clause on a service superclass:
  MON_GET_UNIT_OF_WORK, MON_GET_UNIT_OF_WORK_DETAILS,
  MON_GET_CONNECTION, MON_GET_CONNECTION_DETAILS,
  MON_GET_SERVICE_SUBCLASS, MON_GET_SERVICE_SUBCLASS_DETAILS,
  MON_GET_WORKLOAD, MON_GET_WORKLOAD_DETAILS,
  MON_GET_DATABASE, MON_GET_DATABASE_DETAILS

Figure 1-9. Controls for collecting monitor data CL4636.0

Notes:
With the new monitor elements and infrastructure, you can use SQL statements to
efficiently collect monitor data to determine whether specific aspects of the system are
working correctly and to help you diagnose performance problems, while incurring a
reasonable performance overhead. With the new access methods, you can get all the data
you need without using the snapshot interfaces. The increased monitoring granularity gives
you more control over the data collection process; collect the data you want from the
source you want.
Monitoring information is collected about the work performed by your applications and
reported through table function interfaces at the following three levels:
System level
These monitoring elements provide details about all work being performed on the
system. Monitor-element access points include service subclass, workload definition,
unit of work, and connection as well as the database level.


Activity level


These monitor elements provide details about activities being performed on the system
(a specific subset of the work being performed on the system). You can use these
elements to understand the behavior and performance of activities. Monitor-element
access points include individual activities, and entries in the database package cache.
Data object level
These monitoring elements provide details about the work being processed by the
database system within specific database objects such as indexes, tables, buffer pools,
table spaces, and containers, thereby enabling you to quickly identify issues with
particular data objects that might be causing system problems. Monitor-element access
points include buffer pool, container, index, table, and table space.
mon_obj_metrics - This parameter controls the collection of data object metrics on the
entire database.
Configuration type Database
Parameter type Configurable online
Default [range] BASE [NONE,BASE,EXTENDED]
If you set this configuration parameter to BASE or EXTENDED, all metrics reported
through the following interfaces will be collected:
• MON_GET_BUFFERPOOL
• MON_GET_TABLESPACE
• MON_GET_CONTAINER
If you set this configuration parameter to NONE, the metrics reported through the
above-mentioned interfaces will not be updated.
mon_act_metrics - This parameter controls the collection of activity metrics on the entire
database and affects activities submitted by connections associated with any DB2®
workload definitions.
Configuration type Database
Parameter type Configurable online
Default [range] BASE [NONE,BASE,EXTENDED]
If you set this configuration parameter to BASE or EXTENDED, all metrics reported
through the following interfaces will be collected for all activities executed on the data
server, regardless of the DB2 workload the connection that submitted the activity is
associated with:
• MON_GET_ACTIVITY_DETAILS
• MON_GET_PKG_CACHE_STMT

© Copyright IBM Corp. 2005, 2015 Unit 1. Advanced Monitoring 1-21


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

• Activity event monitor (DETAILS_XML monitor element in the event_activity
logical data groups)
If you set this configuration parameter to NONE, the metrics reported through the above
interfaces are collected only for the subset of activities submitted by a connection that is
associated with a DB2 workload whose COLLECT ACTIVITY METRICS clause has
been set to BASE or EXTENDED.
mon_req_metrics - This parameter controls the collection of request metrics on the entire
database and affects requests executing in any DB2 service classes.
Configuration type Database
Parameter type Configurable online
Default [range] BASE [NONE,BASE,EXTENDED]
If you set this configuration parameter to BASE or EXTENDED, all metrics reported
through the following interfaces are collected for all requests executed on the data
server, irrespective of the DB2 service class the request runs in:
• MON_GET_DATABASE
• MON_GET_DATABASE_DETAILS
• MON_GET_UNIT_OF_WORK
• MON_GET_UNIT_OF_WORK_DETAILS
• MON_GET_CONNECTION
• MON_GET_CONNECTION_DETAILS
• MON_GET_SERVICE_SUBCLASS
• MON_GET_SERVICE_SUBCLASS_DETAILS
• MON_GET_WORKLOAD
• MON_GET_WORKLOAD_DETAILS
• Statistics event monitor (DETAILS_XML monitor element in the event_wlstats
and event_scstats logical data groups)
• Unit of work event monitor
If you set this configuration parameter to NONE, the metrics reported through the above
interfaces are collected only for the subset of requests running in a DB2 service class
whose service superclass has the COLLECT REQUEST METRICS clause set to BASE
or EXTENDED.
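As a hedged sketch of how these controls might be changed in practice (SAMPLE, PAYROLL,
and PROD_SUPER are placeholder database, workload, and service superclass names):

-- Database-wide: stop collecting object metrics, extend request metrics
UPDATE DB CFG FOR SAMPLE USING MON_OBJ_METRICS NONE;
UPDATE DB CFG FOR SAMPLE USING MON_REQ_METRICS EXTENDED;
-- Per-workload activity metrics (relevant when mon_act_metrics is NONE)
ALTER WORKLOAD PAYROLL COLLECT ACTIVITY METRICS BASE;
-- Per-superclass request metrics (relevant when mon_req_metrics is NONE)
ALTER SERVICE CLASS PROD_SUPER COLLECT REQUEST METRICS BASE;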


Instructor notes:


Purpose — To show the new monitoring functions and explain how the collection of
information for different functions can be controlled either at the database level or based on
workload management options.
Details —
Additional information —
Transition statement — Next we will discuss the two types of monitor functions provided
by DB2.


Monitoring system information using table functions
• The system monitoring perspective encompasses all the work and
effort expended by the data server to process application requests.
• You can determine what the data server is doing as a whole or for
particular subsets of application requests.
• Table functions are provided in pairs:
– One for relational access, each monitor element is one column of
data
– One (DETAILS) for XML access to monitor elements
• Use the following table functions for accessing current system
monitoring information:
– MON_GET_SERVICE_SUBCLASS and MON_GET_SERVICE_SUBCLASS_DETAILS
– MON_GET_WORKLOAD and MON_GET_WORKLOAD_DETAILS
– MON_GET_CONNECTION and MON_GET_CONNECTION_DETAILS
– MON_GET_UNIT_OF_WORK and MON_GET_UNIT_OF_WORK_DETAILS
– MON_GET_DATABASE and MON_GET_DATABASE_DETAILS
– MON_GET_ACTIVITY and MON_GET_ACTIVITY_DETAILS


Figure 1-10. Monitoring system information using table functions CL4636.0

Notes:
The system monitoring perspective encompasses the complete volume of work and effort
expended by the data server to process application requests. From this perspective, you
can determine what the data server is doing as a whole as well as for particular subsets of
application requests.
Monitor elements for this perspective, referred to as request monitor elements, cover the
entire range of data server operations associated with processing requests.
Request monitor elements are continually accumulated and aggregated in memory so they
are immediately available for querying. Request monitor elements are aggregated across
requests at various levels of the workload management (WLM) object hierarchy: by unit of
work, by workload, by service class. They are also aggregated by connection.
Use the following table functions for accessing current system monitoring information:
• MON_GET_DATABASE and MON_GET_DATABASE_DETAILS
• MON_GET_SERVICE_SUBCLASS and MON_GET_SERVICE_SUBCLASS_DETAILS
• MON_GET_WORKLOAD and MON_GET_WORKLOAD_DETAILS


• MON_GET_CONNECTION and MON_GET_CONNECTION_DETAILS


• MON_GET_UNIT_OF_WORK and MON_GET_UNIT_OF_WORK_DETAILS
This set of table functions enables you to drill down or focus on request monitor elements
at a particular level of aggregation. Table functions are provided in pairs: one for relational
access to monitor data and the other for XML access to the monitor elements.
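For example, a minimal sketch of the relational variant at the workload level (the columns
shown are an illustrative selection of the request metrics):

-- Request metrics aggregated by workload, one row per workload and member
SELECT varchar(workload_name,30) as workload_name,
       total_cpu_time,
       total_wait_time,
       act_completed_total
FROM TABLE(MON_GET_WORKLOAD(NULL,-1)) as t
ORDER BY total_cpu_time DESC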

Instructor Guide

Instructor notes:
Purpose — To discuss the monitor table functions, designed to return detailed statistics for
DB2 database servers. The table functions listed come in pairs: one produces a simple
set of relational columns, while the DETAILS set of functions provides the data elements
in the form of an XML document.
Details —
Additional information —
Transition statement — Next we will look at some of the monitor elements available
through the monitor table functions that show where time is spent processing database
requests.


Monitoring time spent waiting for resources


• Standard Wait time statistics:
– total_wait_time - Total wait time
– agent_wait_time - Agent wait time
– pool_read_time - Total buffer pool physical read time
– pool_write_time - Total buffer pool physical write time
– client_idle_wait_time - Client idle wait time
– direct_read_time - Direct Read Time
– direct_write_time - Direct write time
– fcm_recv_wait_time - FCM receive wait time
– fcm_send_wait_time - FCM send wait time
– ipc_recv_wait_time - Interprocess communication received wait time
– ipc_send_wait_time - Interprocess communication send wait time
– tcpip_recv_wait_time - TCP/IP receive wait time
– tcpip_send_wait_time - TCP/IP send wait time
– lock_wait_time - Time waited on locks
– log_disk_wait_time - Log disk wait time
– log_buffer_wait_time - Log buffer wait time

• Detailed Wait time statistics:


– audit_subsystem_wait_time - Audit subsystem wait time
– audit_file_write_wait_time - Audit file write wait time
– diaglog_write_wait_time - Diag log write time
– fcm_message_recv_wait_time - FCM message receive wait time
– fcm_message_send_wait_time - FCM message send wait time
– fcm_tq_recv_wait_time - FCM tablequeue receive wait time
– fcm_tq_send_wait_time - FCM tablequeue send wait time


Figure 1-11. Monitoring time spent waiting for resources CL4636.0

Notes:
The visual lists the monitor elements that show how much time was spent waiting for
different reasons. These can be used to determine if a system might have a performance
bottleneck related to logging, disk I/Os, locking or network performance problems.
The list of detailed wait times includes any wait times associated with writing diagnostic
messages or recording database audit records.
For DB2 partitioned databases, there are a series of statistics that show how much time
was spent waiting for data to be sent or received within the FCM component.
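As one way to put these elements to work, a minimal sketch (the wait-time columns chosen
are illustrative) ranks in-flight units of work by their accumulated wait time:

-- Rank current units of work by total wait time, worst first
SELECT application_handle, uow_id,
       total_wait_time, lock_wait_time,
       log_disk_wait_time, pool_read_time
FROM TABLE(MON_GET_UNIT_OF_WORK(NULL,-1)) as uow
ORDER BY total_wait_time DESC
FETCH FIRST 5 ROWS ONLY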


Instructor notes:
Purpose — To discuss the list of monitor elements that can be used to see if applications are
being delayed for various reasons, like locking, logging or network communications.
Details —
Additional information —
Transition statement — Next we will discuss some additional time-related statistics.


Additional time-related metrics


• Time metrics reported for:
– Connections
– Units of work
– Each Statement in the package cache
– WLM Workload
– WLM Service subclass
– WLM Activity
• For Sort Processing:
– total_section_sort_time – Total amount of time spent performing sorts
– total_section_sort_proc_time – Total amount of processing (non-wait) time
spent performing sorts while executing a section
• total_rqst_time – The total amount of time spent working on requests
• total_cpu_time – The total amount of CPU time used while within DB2.
Represents total of both user and system CPU time.
• total_act_time – The total amount of time spent executing activities.
• total_act_wait_time – Total time spent waiting within the DB2 database
server, while processing an activity


Figure 1-12. Additional time-related metrics CL4636.0

Notes:
The time-related monitor statistics are accumulated for each unit of work and connection,
as well as being added to the related workload management workload, service class, and
activity data. The statistics will also be added into the information associated with each
SQL statement stored in the database package cache.
To better understand the time spent performing sort operations there are two elements:
• total_section_sort_time indicates the total amount of time spent performing sorts.
• total_section_sort_proc_time show the total amount of processing (non-wait) time
spent performing sorts while executing a section.
There are statistics that show the total amount of time working on requests
(total_rqst_time) and the total amount of CPU time consumed (total_cpu_time).
The MON_GET_ACTIVITY_DETAILS table function and MON_GET_PKG_CACHE_STMT
table function include the total amount of time spent executing activities and the total time
spent waiting during processing an activity.
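Combining two of these elements gives a useful ratio; a minimal sketch (the arithmetic and
filter are illustrative) computes the share of each cached statement's activity time that
was spent waiting:

-- What fraction of each statement's execution time was wait time?
SELECT num_executions,
       total_act_time, total_act_wait_time,
       (100 * total_act_wait_time) / total_act_time as pct_wait,
       substr(stmt_text,1,40) as stmt_text
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL,NULL,NULL,-1)) as t
WHERE total_act_time > 0
ORDER BY pct_wait DESC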


Instructor notes:
Purpose — To discuss other time-related statistics that can help to effectively work on
database performance issues.
Details —
Additional information —
Transition statement — Next we will look at some additional wait related statistics that
were added with DB2 10.


Additional wait times reported with DB2 10


PREFETCH_WAIT_TIME – The time an application spent waiting for an I/O server
(prefetcher) to finish loading pages into the buffer pool.
PREFETCH_WAITS – The number of times an application waited for an I/O server
(prefetcher) to finish loading pages into the buffer pool.
TOTAL_EXTENDED_LATCH_WAIT_TIME – The amount of time, in milliseconds, spent in
extended latch waits.
TOTAL_EXTENDED_LATCH_WAITS – The number of extended latch waits.
COMM_EXIT_WAIT_TIME – Time spent waiting for the return from a communication
buffer exit library API function. The value is given in milliseconds.
COMM_EXIT_WAITS – The number of times a communication buffer exit library API
function is called.
EVMON_WAIT_TIME – The amount of time that an agent waited for an event monitor
record to become available.
EVMON_WAITS_TOTAL – The number of times that an agent waited for an event monitor
record to become available.


Figure 1-13. Additional wait times reported with DB2 10 CL4636.0

Notes:
DB2 10.1 and later provide additional wait time related monitor elements.
The visual shows some new monitor elements that can be used to understand application
wait times, including prefetching data, waiting for DB2 latches and waits associated with
event monitoring.
For example, the monitor element comm_exit_wait_time shows the time spent waiting for
the return from a communication buffer exit library API function. The value is given in
milliseconds.
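A minimal sketch of examining these newer elements per connection (the column selection is
illustrative; these elements are reported through the request-metric interfaces):

-- Prefetch and extended latch waits by connection
SELECT application_handle,
       prefetch_wait_time, prefetch_waits,
       total_extended_latch_wait_time, total_extended_latch_waits
FROM TABLE(MON_GET_CONNECTION(NULL,-1)) as c
ORDER BY prefetch_wait_time DESC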


Instructor notes:
Purpose — To discuss some wait-related monitor elements that can be used to analyze
application delays starting with DB2 10.1.
Details —
Additional information —
Transition statement — Next we will look at an example query that uses one of the XML
based table functions to return a result.

Example of query using the XML document returned by MON_GET_CONNECTION_DETAILS
• Display connections returning the highest volume of data to clients, ordered by rows
returned.
SELECT detmetrics.application_handle,
       detmetrics.rows_returned,
       detmetrics.tcpip_send_volume
FROM TABLE(MON_GET_CONNECTION_DETAILS(CAST(NULL AS BIGINT), -2)) AS CONNMETRICS,
     XMLTABLE (XMLNAMESPACES( DEFAULT 'http://www.ibm.com/xmlns/prod/db2/mon'),
        '$detmetric/db2_connection' PASSING XMLPARSE(DOCUMENT CONNMETRICS.DETAILS)
           AS "detmetric"
        COLUMNS "APPLICATION_HANDLE" INTEGER PATH 'application_handle',
                "ROWS_RETURNED" BIGINT PATH 'system_metrics/rows_returned',
                "TCPIP_SEND_VOLUME" BIGINT PATH 'system_metrics/tcpip_send_volume'
     ) AS DETMETRICS
ORDER BY rows_returned DESC

The following is an example of output from this query.

APPLICATION_HANDLE ROWS_RETURNED TCPIP_SEND_VOLUME


------------------ -------------------- --------------------
21 4 0


Figure 1-14. Example of query using the XML document returned by MON_GET_CONNECTION_DETAILS CL4636.0

Notes:
Using XML to report monitor information provides improved extensibility and flexibility. New
monitor elements can be added to the product without having to add new columns to an
output table. Also, XML documents can be processed in a number of ways, depending on
your needs. For example:
• You can use XQuery to run queries against the XML document.
• You can use the XSLTRANSFORM scalar function to transform the document into other
formats.
• You can view their contents as formatted text using built-in MON_FORMAT_XML_*
formatting functions, or the XMLTABLE table function.
These XML documents are produced by several monitoring interfaces. The sections that
follow describe how results are returned as XML documents.
The Monitor table functions with names that end with _DETAILS produce XML documents
containing monitor elements. Examples of these table functions include:
• MON_GET_PKG_CACHE_STMT_DETAILS


• MON_GET_WORKLOAD_DETAILS
• MON_GET_CONNECTION_DETAILS
• MON_GET_SERVICE_SUBCLASS_DETAILS
• MON_GET_ACTIVITY_DETAILS
• MON_GET_UNIT_OF_WORK_DETAILS
• MON_GET_DATABASE_DETAILS
The visual shows an example of a query using the MON_GET_CONNECTION_DETAILS
monitor table function. The XMLTABLE function is used to selectively present a set of
monitor elements from the XML document in the column named DETAILS in the form of a
standard tabular result.
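As an alternative to hand-written XMLTABLE mappings, a minimal sketch using one of the
MON_FORMAT_XML_* helpers flattens the wait-time metrics in each connection's DETAILS
document into rows (METRIC_NAME and TOTAL_TIME_VALUE are columns of the formatting
function's result table):

-- Flatten each connection's XML wait-time metrics into one row per metric
SELECT c.application_handle,
       varchar(w.metric_name,30) as metric_name,
       w.total_time_value
FROM TABLE(MON_GET_CONNECTION_DETAILS(CAST(NULL AS BIGINT), -1)) as c,
     TABLE(MON_FORMAT_XML_WAIT_TIMES_BY_ROW(c.details)) as w
WHERE w.total_time_value > 0
ORDER BY c.application_handle, w.total_time_value DESC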


Instructor notes:


Purpose — To show an example of a SQL query that uses one of the ‘DETAILS’ monitor
table functions, in this case MON_GET_CONNECTION_DETAILS.
Details —
Additional information —
Transition statement — Next we will look at some administrative views that can simplify
access to monitor data.


Monitoring administrative views simplify access to important metrics
MON_BP_UTILIZATION – Includes hit ratios and average read and write times for all
buffer pools.
MON_TBSP_UTILIZATION – Includes hit ratios and utilization percentage for all table
spaces.
MON_LOCKWAITS – Information about agents that are waiting to obtain locks in the
currently connected database.
MON_PKG_CACHE_SUMMARY – Metrics returned are aggregated over all executions of the
statement.
MON_CURRENT_SQL – Metrics for all activities that were submitted and have not yet
been completed.
MON_CURRENT_UOW – Identifies long-running units of work.
MON_SERVICE_SUBCLASS_SUMMARY – Key metrics for all service subclasses.
MON_WORKLOAD_SUMMARY – Key metrics for all workloads.
MON_CONNECTION_SUMMARY – Key metrics for all connections.
MON_DB_SUMMARY – Key metrics aggregated over all service classes.


Figure 1-15. Monitoring administrative views simplify access to important metrics CL4636.0

Notes:
A set of administrative views encapsulates key queries that use the monitoring table
functions. The schema for these views is SYSIBMADM.
These provide many detailed metrics describing the database objects and environment.
To see the most important metrics in an easily readable format, you can use the new
monitoring administrative views. You can simply issue a SELECT * command to see the
main metrics from each table function, as well as some common calculated values, like
buffer pool hit ratios.
The following administrative views are available:
• MON_BP_UTILIZATION
• MON_TBSP_UTILIZATION
• MON_LOCKWAITS
• MON_PKG_CACHE_SUMMARY
• MON_CURRENT_SQL


• MON_CURRENT_UOW
• MON_SERVICE_SUBCLASS_SUMMARY
• MON_WORKLOAD_SUMMARY
• MON_CONNECTION_SUMMARY
• MON_DB_SUMMARY
For example, the MON_DB_SUMMARY administrative view returns key metrics
aggregated over all service classes in the currently connected database. It is designed to
help monitor the system in a high-level manner by providing a concise summary of the
database. The view includes information like IO_WAIT_TIME_PERCENT, which shows the
percentage of the time spent waiting within the DB2 database server that was due to I/O
operations. This includes time spent performing direct reads or direct writes, and time
spent reading data and index pages from the table space to the bufferpool or writing them
back to disk.
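For example, a minimal sketch of querying this view (IO_WAIT_TIME_PERCENT is described
above; TOTAL_APP_COMMITS is a further column the view is expected to expose):

-- One-row, database-wide health summary
SELECT io_wait_time_percent, total_app_commits
FROM SYSIBMADM.MON_DB_SUMMARY;

-- or simply browse every summary metric:
SELECT * FROM SYSIBMADM.MON_DB_SUMMARY;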


Instructor notes:
Purpose — To briefly discuss some of the administrative views. These became available
with DB2 9.7 Fix Pack 1 to simplify access to key DB2 system metrics.
Details —
Additional information —
Transition statement — Now we will look at examples of SQL queries that monitor
different aspects of database server processing.


Monitoring performance with SQL: Buffer pools


SELECT substr(bp_name,1,30) as bp_name ,
pool_data_l_reads, pool_data_p_reads,
(100 * (pool_data_l_reads - pool_data_p_reads )) /( pool_data_l_reads )
as data_hit_pct,
pool_index_l_reads, pool_index_p_reads ,
(100 * (pool_index_l_reads - pool_index_p_reads )) /( pool_index_l_reads )
as index_hit_pct
FROM TABLE (MON_GET_BUFFERPOOL(NULL,-2) ) as tbuff
where bp_name not like 'IBMSYSTEM%';
(The WHERE clause excludes the system 'hidden' buffer pools.)
BP_NAME POOL_DATA_L_READS POOL_DATA_P_READS DATA_HIT_PCT
------------------ -------------------- -------------------- -------------
IBMDEFAULTBP 813 229 71
CLPBUFFL 35172 2760 92
CLPBUFFS 41361 34649 16

POOL_INDEX_L_READS POOL_INDEX_P_READS INDEX_HIT_PCT


-------------------- ------------------- --------------
1116 408 63
64 21 67
103 76 26

Figure 1-16. Monitoring performance with SQL: Buffer pools CL4636.0

Notes:
The example query uses the function MON_GET_BUFFERPOOL to return logical and
physical data and index page read counts and calculates hit ratios for each page type.
The system hidden buffer pools are excluded from the results.
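Much the same numbers can be read with less typing from the MON_BP_UTILIZATION
administrative view described earlier; a minimal sketch, assuming the hit-ratio column
names below match your release:

-- Precomputed hit ratios from the administrative view
SELECT varchar(bp_name,30) as bp_name,
       data_hit_ratio_percent,
       index_hit_ratio_percent
FROM SYSIBMADM.MON_BP_UTILIZATION
WHERE bp_name not like 'IBMSYSTEM%'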


Instructor notes:
Purpose — To show a simple query that can be used to check buffer pool activity and hit
ratios for each database buffer pool.
Details —
Additional information —
Transition statement — Next we will look at a SQL query that monitors database sort
processing.


Monitoring performance with SQL: Sorts

-- Get the shared sort heap threshold from the database config
with dbcfg1 as (
   select int(value) as sheapthres_shr
   from sysibmadm.dbcfg where name = 'sheapthres_shr' )

select sheapthres_shr as "Shared_sort_heap" ,
       sort_shrheap_allocated as "Shared_sort_allocated" ,
       dec((100 * sort_shrheap_allocated)/sheapthres_shr,5,2)
          as " % Sheap_alloc" ,
       dec((100* sort_shrheap_top)/sheapthres_shr,5,2)
          as " % Max Sheap_alloc" ,
       sort_overflows as "Sort_Overflows",
       total_sorts as "Total_Sorts"
from dbcfg1, table (MON_GET_DATABASE(-1)) AS MONDB

Shared_sort_heap Shared_sort_allocated % Sheap_alloc % Max Sheap_alloc


---------------- --------------------- -------------- ------------------
5024 4 0.00 40.00

Sort_Overflows Total_Sorts
-------------------- --------------------
14 49

Figure 1-17. Monitoring performance with SQL: Sorts CL4636.0

Notes:
The common table expression accesses a view, SYSIBMADM.DBCFG, that lists all the
database configuration parameters. Each configuration parameter appears as one row in
the table, so the predicate name = 'sheapthres_shr' is used to request the defined size of
the database shared sort heap.
The SELECT joins the value from the Database configuration to the statistics from the table
function MON_GET_DATABASE that relates to sort performance. Prior to DB2 10.5 the
view SYSIBMADM.SNAPDB could be used for database sort statistics.
These statistics show the statistics for sorts using database shared memory. If the
Database Manager configuration option SHEAPTHRES is not set to 0, then many sort
operations will be performed using private memory. The query shows the current and
maximum usage of the database shared memory heap for sorts. This area is also used for
hash joins and dynamic index anding operations.
The number of sort overflows can be compared to the total number of sorts to see if the
database configuration option sortheap may need to be increased.


Instructor notes:
Purpose — This example uses a query to report on sort performance statistics. A join is
used because the database configuration parameter sheapthres_shr is needed to calculate
the percentage of the defined shared sort heap threshold that is currently allocated. A
result of greater than 100% could be returned. Starting with DB2 9, shared sort memory
utilization is allowed to overflow the defined size for sheapthres_shr.
Details —
Additional information —
Transition statement — Next we will look at a query that looks for SQL statements in
package cache that may require some additional analysis.

Monitoring performance with SQL: Top dynamic SQL statements
-- Get the dynamic SQL stats from the package cache
select num_executions as "Num Execs",
       total_act_time as total_time,
       (total_act_time / num_executions ) as "Avg Time (msec)",
       total_sorts as "Num Sorts",
       (total_sorts / num_executions ) as "Sorts Per Stmt",
       total_section_sort_time,
       substr(stmt_text,1,35) as "SQL Stmt"
from table ( MON_GET_PKG_CACHE_STMT('d',NULL,NULL,-1)) as dyn_cache
where num_executions > 0 and total_routine_time = 0
order by 2 desc fetch first 5 rows only

Num Execs TOTAL_TIME Avg Time (msec) Num Sorts Sorts Per Stmt
---------- -------------------- -------------------- ------------- --------------
5 3124 624 5 1
5 2070 414 5 1
5 2018 403 5 1
5 1240 248 5 1
5 281 56 5 1
TOTAL_SECTION_SORT_TIME SQL Stmt
----------------------- --------------------------------------------------
6 SELECT * from clpm.hist2 where acct_id between 10
600 SELECT * from clpm.hist2 order by balance desc
154 SELECT hist2.ACCT_ID, hist2.TELLER_ID, hist2.BRANC
546 SELECT * from clpm.hist1 order by balance desc
108 SELECT hist1.ACCT_ID, hist1.TELLER_ID, hist1.BRANC


Figure 1-18. Monitoring performance with SQL: Top dynamic SQL statements CL4636.0

Notes:
This SQL query example shows how the WHERE clause and ORDER BY clause on a
SELECT statement can be effectively used to find the dynamic SQL statement statistics
that will be of the highest interest for performance reviews.
The monitor table function MON_GET_PKG_CACHE_STMT is used to retrieve the
statistics from SQL statements in the database package cache. The function can be used
to monitor the static and dynamic SQL statements stored in the package cache. The
example SQL includes the ‘d’ function call parameter to limit result to the dynamic SQL
statements.
The example query shows the top dynamic SQL statements based on highest total activity
time. These are the queries that should get focus to ensure they are well tuned. The
example query includes the number of sorts in the result, so that you can see if a query is
executed frequently and performs a lot of sorts, as these might be a good candidate for
adding a new index.
You might decide to use the DB2 design advisor to gather suggestions for reducing
processing costs for these SQL statements.
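A hedged sketch of that step from the command line, where SAMPLE is a placeholder
database name and worst_stmts.sql is a file holding the captured statements:

db2advis -d SAMPLE -i worst_stmts.sql -t 5

The -t 5 option limits the advisor to five minutes of analysis; it then prints index
recommendations for the statements in the file.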


Instructor notes:
Purpose — This example uses a query to report on the performance statistics for dynamic
SQL statements in the package cache. This type of query will be most useful in databases
that remain active for longer periods of time.
Details —
Additional information —
Transition statement — Next we will look at an example of using a query to check for long
running SQL statements.

Monitoring performance with SQL: Long-running SQL
select varchar(application_name,15) as Appl_name ,
elapsed_time_sec as "Elapsed Seconds" ,
varchar(activity_state,20) as "Status ",
varchar(session_auth_id,10) as auth_id ,
total_cpu_time, rows_returned,
substr(stmt_text,1,30) as "SQL Statement"
from sysibmadm.mon_current_sql
order by 2 desc

APPL_NAME Elapsed Seconds Status AUTH_ID TOTAL_CPU_TIME


--------------- --------------- -------------------- ---------- --------------------
db2bp 764 EXECUTING INST461 8000
db2bp 73 IDLE INST461 28000
db2bp 0 EXECUTING INST461 0

ROWS_RETURNED SQL Statement


-------------------- ------------------------------
0 update clpm.hist1 set balance=
777 select * from clpm.hist2 where
0 select varchar(application_nam


Figure 1-19. Monitoring performance with SQL: Long-running SQL CL4636.0

Notes:
This query uses the administrative view, SYSIBMADM.MON_CURRENT_SQL, which
returns key metrics for all activities that were submitted on all members of the database
and have not yet been completed, including a point-in-time view of currently executing SQL
statements (both static and dynamic) in the currently connected database.
This can be used to find information about the current active SQL statements including the
time elapsed since this activity began, in seconds. You can also see the status of the
activity.
The activity_state can be used to understand what the activity is currently doing (for
example, is the activity stuck in a queue or waiting for input from the client).
Possible values include:
• CANCEL_PENDING
• EXECUTING
• IDLE


• INITIALIZING
• QP_CANCEL_PENDING
• QP_QUEUED
• QUEUED
• TERMINATING
• UNKNOWN
The query shows the count of rows returned and total_cpu_time which are good indicators
for SQL statements that are using large amounts of system resources.
The view MON_CURRENT_SQL became available with DB2 9.7, Fix Pack 1.
In order to use this view, one of the following authorizations is required:
• SELECT privilege on the MON_CURRENT_SQL administrative view
• CONTROL privilege on the MON_CURRENT_SQL administrative view
• DATAACCESS authority


Instructor notes:


Purpose — This example uses a query to report on the status for the currently active SQL
statements. This type of query can be useful to help locate the SQL statements that are
being processed and are taking the longest to complete.
Details —
Additional information —
Transition statement — Next we will look at an example of using a query to check for
delays in the currently active applications.


Monitoring performance with SQL application wait times
select application_handle as appl_id ,
       total_wait_time ,                  -- check for total wait time
       pool_read_time, pool_write_time,   -- and most common reasons for delays
       log_disk_wait_time,
       lock_wait_time
from table(mon_get_connection(NULL,-1) )
order by total_wait_time
APPL_ID TOTAL_WAIT_TIME POOL_READ_TIME POOL_WRITE_TIME
-------------------- -------------------- -------------------- --------------------
247 486 292 1
248 166662 82517 5205
250 166702 83088 4695
249 166897 81330 5341
251 166953 82397 4840

LOG_DISK_WAIT_TIME LOCK_WAIT_TIME
-------------------- --------------------
0 0
73175 897
72975 718
74581 1059
74166 1158


Figure 1-20. Monitoring performance with SQL application wait times CL4636.0

Notes:
The example query uses the MON_GET_CONNECTION table function to look at some of
the most common reasons for delays by the current application connections. This could be
used to see how much time the current application connections have spent waiting for
buffer pool read and write operations, log disk writes and also shows time spent waiting for
locks.
The sample report shows that most of the application wait time has been for either reading
pages into the buffer pool or writing log records to the DB2 system logs.
Even though the time spent waiting on buffer pool writes is not high, it still might be an
indication that the system could be running more efficiently, since this indicates that page
writes are being performed synchronously rather than asynchronously.

Instructor notes:


Purpose — This example uses a query to report on wait times for currently connected
applications.
Details —
Additional information —
Transition statement — Let's look at an example of a query that shows the lock chain of
applications waiting on locks.


Monitoring performance with SQL: Lock chains

select substr(lw.hld_application_name,1,10) as "Hold App",  -- who is holding the lock?
       substr(lw.hld_userid,1,10) as "Holder",
       substr(lw.req_application_name,1,10) as "Wait App",  -- who is waiting on the lock?
       substr(lw.req_userid,1,10) as "Waiter",
       lw.lock_mode ,
       lw.lock_object_type ,
       substr(lw.tabname,1,10) as "TabName",
       substr(lw.tabschema,1,10) as "Schema",
       lw.lock_wait_elapsed_time as "waiting (s)"           -- how long is the wait?
FROM SYSIBMADM.MON_LOCKWAITS lw ;

Hold App Holder Wait App Waiter LOCK_MODE LOCK_OBJECT_TYPE TabName Schema waiting (s)
---------- ---------- ---------- ---------- --------- ------------------ -------- ------- -----------
db2bp INST461 db2bp INST461 X TABLE HIST1 CLPM 61


Figure 1-21. Monitoring performance with SQL: Lock chains CL4636.0

Notes:
This query shows any lock chains that currently exist. It shows the lock holder, the
application/user waiting on the lock, as well as the object locked and the length of time the
waiter has been waiting. It is not abnormal to see lock wait chains. What is abnormal is to
see lengthy waiting times. If you see long waits, you should look at what the holding
application is doing (what SQL statement and what the application status is) to determine if
the application is well tuned.
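One way to dig further is to join the holder's application handle to the current SQL view;
a minimal sketch, assuming the HLD_APPLICATION_HANDLE column of MON_LOCKWAITS, with an
outer join because an idle holder has no currently executing statement:

-- What is the lock holder running right now (if anything)?
SELECT lw.hld_application_handle,
       substr(sql.stmt_text,1,50) as holder_stmt
FROM SYSIBMADM.MON_LOCKWAITS lw
LEFT OUTER JOIN SYSIBMADM.MON_CURRENT_SQL sql
  ON lw.hld_application_handle = sql.application_handle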
The MON_LOCKWAITS administrative view returns information about agents working on
behalf of applications that are waiting to obtain locks in the currently connected database. It
is a useful query for identifying locking problems. This administrative view replaces the
SNAPLOCKWAIT administrative view which is deprecated and might be discontinued in a
future release.
One of the following authorizations is required:
• SELECT privilege on the MON_LOCKWAITS administrative view
• CONTROL privilege on the MON_LOCKWAITS administrative view


• DATAACCESS authority


Instructor notes:
Purpose — This shows a query that can be used to list the lock chains for applications that
are currently in a lock wait. Applications that are not in a lock wait will not be listed.
Details —
Additional information —
Transition statement — Let's look at an example of a query that shows the usage of
locking memory.

Monitoring performance with SQL: Lock memory usage
• Locklist is configured in number of 4K pages.
• lock_list_in_use is displayed in bytes.
WITH
dbcfg1 as ( select float(int(value) * 4096) as locklist
from sysibmadm.dbcfg where name = 'locklist' ) ,
dbcfg2 as ( select float(int(value)) as maxlocks
from sysibmadm.dbcfg where name = 'maxlocks' )

select
dec((MDB.lock_list_in_use/locklist)*100,4,1) as "% Lock List",
dec((MDB.lock_list_in_use/(locklist*(maxlocks/100))*100),4,1)
as "% to Maxlock",
MDB.appls_cur_cons as "Number of Cons",
MDB.lock_list_in_use/MDB.appls_cur_cons
as "Avg Lock Mem Per Con (bytes)"
FROM DBCFG1, DBCFG2 , TABLE (MON_GET_DATABASE(-1)) AS MDB

% Lock List % to Maxlock Number of Cons Avg Lock Mem Per Con (bytes)
----------- ------------ -------------------- ----------------------------
29.9 59.9 3 819072
1 record(s) selected.


Figure 1-22. Monitoring performance with SQL: Lock memory usage CL4636.0

Notes:
In the example, the administrative view, SYSIBMADM.DBCFG, is accessed twice, once to
get the size of the locklist database shared memory heap, and once to get the configuration
parameter, maxlocks.
Now the query can compare current lock memory usage, using the table function
MON_GET_DATABASE to the size of the locklist as configured in the db config
parameters. Prior to DB2 10.5 the view SYSIBMADM.SNAPDB could be used to retrieve
lock memory usage.
Maxlocks represents the percentage of the locklist that any one application can use before
a lock escalation from row to table locking would occur. If the amount of lock list in use is
below the maxlocks percentage of the total lock list, then no one application can be causing
an escalation. If the lock list in use is greater than the maxlocks percent of the total locklist,
then it is possible for one or more applications to be approaching a lock escalation. By
looking at the average lock memory per connection, you can get an idea of approximately
how much memory each application is using (but that assumes each application is using
the locklist uniformly which might not be true).


Instructor notes:
Purpose — This query shows the usage of locking memory for the database. Lock list
memory is only reported at the database level, not the application level, so it is difficult to
see if any one application is close to its maxlocks limit. The query can only show if the total
locklist memory is above or below the maxlocks limit and average lock memory for the
current connected applications.
Details —
Additional information —
Transition statement — Let's look at an example of a query that shows the statistics for
lock escalations, lock timeouts and deadlocks for applications.

Monitoring performance with SQL: Lock escalations, deadlocks and timeouts
Select substr(conn.application_name,1,10) as Application,
substr(conn.system_auth_id,1,10) as AuthID,
conn.num_locks_held as "# Locks",
conn.lock_escals as "Escalations",
conn.lock_timeouts as "Lock Timeouts",
conn.deadlocks as "Deadlocks",
(conn.lock_wait_time / 1000) as "Lock Wait Time"
from table(MON_GET_CONNECTION(NULL,-1)) as conn ;

APPLICATION AUTHID # Locks Escalations Lock Timeouts


----------- ---------- -------------------- -------------------- ------------------
db2bp INST461 2 0 0
db2bp INST461 2 1 0
db2bp INST461 3 0 0

Deadlocks Lock Wait Time


-------------------- --------------------
0 0
0 0
0 209
3 record(s) selected.


Figure 1-23. Monitoring performance with SQL: Lock escalations, deadlocks and timeouts CL4636.0

Notes:
In general, deadlocks, lock timeouts and lock escalations cause application problems and
your application should be designed to avoid them. You should also have sufficient lock
memory to ensure escalations do not occur. You can use this query to see if any of the
current application connections have been involved in any of these lock issues.
The SQL query uses the MON_GET_CONNECTION table function, which can be used to
look at the statistics for each connection including the current application and the userid
who is running the application.
The sample query result shows that one of the connections encountered a lock escalation
and another application connection has a significant amount of lock wait time.


Instructor notes:
Purpose — This query shows the number of locks held, lock escalations, lock timeouts, and
deadlocks for applications currently connected to the database.
Details —
Additional information —
Transition statement — Let's look at an example of a query that analyzes the
precompilation time for dynamic SQL statements in the package cache.

Monitoring performance with SQL queries that have a high prep time
select num_executions, stmt_exec_time as total_exec_time,
(stmt_exec_time / num_executions ) as Avg_exec_time, prep_time ,
(( 100 * prep_time ) /(stmt_exec_time /num_executions)) as pct_prep,
substr(stmt_text,1,40) as "SQL_Text"
from TABLE (MON_GET_PKG_CACHE_STMT('d',NULL,NULL,-1)) as dyn_cache
where stmt_exec_time > 1000
order by prep_time desc

NUM_EXECUTIONS TOTAL_EXEC_TIME AVG_EXEC_TIME PREP_TIME


-------------------- -------------------- -------------------- --------------------
5 10888 2177 77
15 35577 2371 47
15 10601 706 1
15 7980 532 1

PCT_PREP SQL_Text
-------------------- ----------------------------------------
3 SELECT * from clpm.hist1 order by balanc
1 SELECT * from clpm.hist2 order by balanc
0 SELECT hist2.ACCT_ID, hist2.TELLER_ID, h
0 SELECT * from clpm.hist2 where acct_id


Figure 1-24. Monitoring performance with SQL queries that have a high prep time CL4636.0

Notes:
You can examine the package cache to see how frequently a query is run as well as the
average execution time for each of these queries.
The MON_GET_PKG_CACHE_STMT table function returns a point-in-time view of both
static and dynamic SQL statements in the database package cache. In the example the ‘d’
call parameter limits the results to dynamic statements.
The query sorts the results based on the time required to prepare the statement. It
compares the statement compilation time, prep_time, to a calculated average execution
time.
If the time it takes to compile and optimize a query is almost as long as it takes for the
query to execute, you might want to look at the optimization class that you are using.
Lowering the optimization class might make the query complete optimization more rapidly
and, therefore, return a result sooner.


In some cases, a query might take a significant amount of time to prepare, but it is
executed thousands of times, without being prepared again. For these statements, the
optimization class might not be an issue.
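Where prep time does dominate, one hedged illustration of acting on the finding is to lower
the optimization class for the session before the statements are prepared (the class value
3 here is just an example):

-- Subsequent PREPAREs in this session spend less time in the optimizer
SET CURRENT QUERY OPTIMIZATION = 3;

Statements prepared afterward in the session are compiled at the lower class; static
packages can be rebound with a comparable QUERYOPT bind option.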

Instructor notes:


Purpose — This query shows the number of executions, the average execution time, and
compares the precompile time to the average execution time for dynamic SQL statements
in the package cache for the database.
Details —
Additional information —
Transition statement — Let's look at an example of a query that looks for applications that
might be performing costly table scans.


Monitoring performance with SQL: Costly table scans
• High ratio of rows read to rows returned can indicate table scans
select varchar(session_auth_id,10) as Auth_id,
varchar(application_name,20) as Appl_name,
io_wait_time_percent as Percent_IO_Wait,
rows_read_per_rows_returned as Rows_read_vs_Returned
from SYSIBMADM.MON_CONNECTION_SUMMARY

AUTH_ID APPL_NAME PERCENT_IO_WAIT ROWS_READ_VS_RETURNED


---------- -------------------- --------------- ---------------------
INST461 db2bp 10.00 17202
INST461 db2batch 15.22 2


Figure 1-25. Monitoring performance with SQL: Costly table scans CL4636.0

Notes:
The MON_CONNECTION_SUMMARY administrative view returns key metrics for all
connections in the currently connected database. It is designed to help monitor the system
in a high-level manner, showing incoming work per connection.
The metrics returned represent the accumulation of all metrics for requests that were
submitted by the identified connection across all members of the database.
The schema is SYSIBMADM.
One of the following authorizations is required:
• SELECT privilege on the MON_CONNECTION_SUMMARY administrative view
• CONTROL privilege on the MON_CONNECTION_SUMMARY administrative view
• DATAACCESS authority
This query returns information that might indicate applications that are performing
large table scans. The column ROWS_READ_PER_ROWS_RETURNED shows the
average number of rows read from the table per rows returned to the application. If the


selectivity is low, then the application might be performing a table scan (perhaps
unnecessarily if an index were available).
The result also shows the percentage of the time spent waiting within the DB2 database
server that was due to I/O operations. This includes time spent performing direct reads or
direct writes, and time spent reading data and index pages from the table space to the
bufferpool or writing them back to disk.


Instructor notes:
Purpose — This query compares the number of rows returned to the number of rows read
to produce the query results. If a small percentage of the rows being read are included in
the query result, then a table scan might be in use and an index might significantly improve
performance.
Details —
Additional information —
Transition statement —

Monitoring performance with SQL: Checking page cleaners

SELECT sum(pool_data_writes) as pool_data_writes,
       sum(pool_data_writes - pool_async_data_writes) as data_sync_writes,
       sum(pool_write_time) as pool_write_time,
       sum(pool_lsn_gap_clns) as pool_lsn_gap_clns,
       sum(pool_drty_pg_steal_clns) as pool_drty_pg_steal_clns,
       sum(pool_drty_pg_thrsh_clns) as pool_drty_pg_thrsh_clns
FROM TABLE(MON_GET_BUFFERPOOL(NULL,-1)) as bp1
where bp_name not like 'IBMSYSTEM%'

POOL_DATA_WRITES DATA_SYNC_WRITES POOL_WRITE_TIME


-------------------- -------------------- --------------------
5264 0 31444

POOL_LSN_GAP_CLNS POOL_DRTY_PG_STEAL_CLNS POOL_DRTY_PG_THRSH_CLNS


-------------------- ----------------------- -----------------------
23 0 0


Figure 1-26. Monitoring performance with SQL: Checking page cleaners CL4636.0

Notes:
The query uses the table function MON_GET_BUFFERPOOL to retrieve database
statistics for buffer pool page cleaning activity.
Page cleaners will be triggered in three ways:
1. When a dirty page is chosen as the victim buffer page, thus requiring the dirty page to
be written before the desired page can be read.
2. When the changed page threshold is reached which represents a percentage of dirty
pages in the buffer pool.
3. When dirty pages in the buffer pool exceed the database configuration setting that
limits redo processing for a crash recovery.


Note

Prior to DB2 10.5, the option softmax was used to define the age limit of changed buffer
pool pages as a percentage of a log file that would need to be reread during a crash
recovery.
With DB2 10.5, the page cleaning criterion is determined by the setting for the
page_age_trgt_mcr configuration parameter. Page cleaners are triggered when the
oldest page in the buffer pool exceeds the configured time for the page_age_trgt_mcr
configuration parameter.

This query shows the percentage for each type of page cleaning activity.
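A minimal sketch of adjusting that criterion (SAMPLE is a placeholder database name and
120 seconds is just an example value):

-- Tighten the page-age target so dirty pages are cleaned sooner
UPDATE DB CFG FOR SAMPLE USING PAGE_AGE_TRGT_MCR 120;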

Instructor notes:


Purpose — This query looks at the database statistics for page cleaning activity to
calculate the percentages for each of the three types of page cleaners. Of the three, a high
percentage of steals might be an indication that writing database pages to disk is not being
processed efficiently.
If the DB2 registry variable DB2_USE_ALTERNATE_PAGE_CLEANING is ON, then these
statistics will all be 0 and this query will not return a result.
Details —
Additional information —
Transition statement — Let's look at an example of a query that looks at the database
statistics for prefetching activity.


Monitoring prefetch efficiency of applications

SELECT pool_queued_async_data_reqs as Prefetch_Reqs,
       pool_queued_async_data_pages as Prefetch_pages ,
       ( pool_queued_async_data_pages / pool_queued_async_data_reqs )
          as Pages_per_prefetch,
       pool_failed_async_data_reqs as Failed_prefetch,
       prefetch_wait_time , prefetch_waits
FROM TABLE(MON_GET_CONNECTION(NULL,-1)) AS c1
where application_name = 'db2bp' ;

PREFETCH_REQS PREFETCH_PAGES PAGES_PER_PREFETCH


-------------------- -------------------- --------------------
3573 28587 8

FAILED_PREFETCH PREFETCH_WAIT_TIME PREFETCH_WAITS


-------------------- -------------------- --------------------
0 711 143


Figure 1-27. Monitoring prefetch efficiency of applications CL4636.0

Notes:
The example query uses the table function MON_GET_CONNECTION to retrieve some
monitor elements for database connections running a particular application.
The query returns the number of prefetch requests and calculates the average number of
pages read per prefetch. It also returns the number and total duration of waits for prefetch
operations.
The element pool_failed_async_data_reqs shows the number of times an attempt to queue
a data prefetch request was made but failed. One possible reason is the prefetch queue is
full.

Instructor notes:


Purpose — To discuss a SQL query that shows prefetch activity and wait times for a
particular application that is currently active.
Details —
Additional information —
Transition statement — Next we will look at a query that shows current database memory
allocations.


Monitoring Database memory usage using the table function MON_GET_MEMORY_POOL

SELECT VARCHAR(MEMORY_POOL_TYPE,20) AS POOL_TYPE,
       MEMORY_POOL_USED, MEMORY_POOL_USED_HWM
FROM TABLE (MON_GET_MEMORY_POOL ('DATABASE',NULL,NULL) ) AS TMEM

POOL_TYPE MEMORY_POOL_USED MEMORY_POOL_USED_HWM


-------------------- -------------------- --------------------
UTILITY 65536 65536
PACKAGE_CACHE 524288 917504
XMLCACHE 131072 131072
CAT_CACHE 393216 393216
BP 16908288 16908288
BP 52166656 52166656
BP 851968 851968
BP 589824 589824
BP 458752 458752
BP 393216 393216
SHARED_SORT 196608 262144
LOCK_MGR 2228224 2228224
DATABASE 60489728 60489728


Figure 1-28. Monitoring performance: Database memory usage CL4636.0

Notes:
The MON_GET_MEMORY_POOL table function retrieves metrics from the memory pools
contained within a memory set.
The visual shows a sample report generated using MON_GET_MEMORY_POOL. The
function can be used to check current memory allocations and also to see the peak usage
of memory for each pool while the database was active.
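The first argument selects which memory set to examine; for instance, a minimal sketch of
looking at the instance-level (DBMS) memory set across all members:

-- Instance-level memory pools on all members
SELECT varchar(memory_pool_type,20) as pool_type,
       memory_pool_used, memory_pool_used_hwm
FROM TABLE(MON_GET_MEMORY_POOL('DBMS',NULL,-2)) as t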

Instructor notes:


Purpose — To discuss using MON_GET_MEMORY_POOL to analyze database memory
pool allocations.
Details —
Additional information —
Transition statement — Next we will look at monitoring partitioned database processing
with SQL functions.


Using MONITOR functions with database partitioning
• Using the MONITOR table functions in a partitioned database environment, you can:
– Receive data for a single partition or for all partitions.
– If you choose to receive data for all partitions, the table functions return one row
for each partition.
• You can add the values across partitions to obtain the value of a monitor element
across partitions.

The following query lists the activity on all tables accessed since the database was
activated, aggregated across all database members and ordered by the highest number of
rows read.

SELECT varchar(tabschema,20) as tabschema,
       varchar(tabname,20) as tabname,
       sum(rows_read) as total_rows_read,
       sum(rows_inserted) as total_rows_inserted,
       sum(rows_updated) as total_rows_updated,
       sum(rows_deleted) as total_rows_deleted
FROM TABLE(MON_GET_TABLE('','',-2)) AS t
GROUP BY tabschema, tabname
ORDER BY total_rows_read DESC
Using -2 for member retrieves stats for all database partitions

Figure 1-29. Using MONITOR functions with database partitioning CL4636.0

Notes:
Monitor table functions and views are routines with names that begin with "MON", such as
MON_GET_SERVICE_SUBCLASS or MON_GET_TABLE. When using the table functions
in a partitioned database environment, you can choose to receive data for a single partition
or for all partitions. If you choose to receive data for all partitions, the table functions return
one row for each partition. You can add the values across partitions to obtain the value of a
monitor element across partitions.
The example query listed would return cumulative statistics for DB2 table activity from all
database partitions. The ‘-2’ option for the member function call parameter would include
stats for all partitions but the GROUP BY clause would combine these statistics into a
single result row for each table.
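To look for skew rather than totals, the GROUP BY can simply be omitted so that one row is returned per member. A minimal sketch for a single table; the schema and table names here are placeholders:

SELECT member, rows_read, rows_inserted
FROM TABLE(MON_GET_TABLE('INST461','ACCT',-2)) AS t
ORDER BY member;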


Instructor notes:


Purpose — To discuss using the Monitor table functions with database partitioning. The
monitoring functions provide a 'member' call parameter that refers to the database
partitions in a DPF system.
Details —
Additional information —
Transition statement — Let's look at another example query using MON_GET_INDEX to
review index access in a partitioned database.


Monitor performance with SQL: Index usage with multiple database partitions
Using -2 for member retrieves stats for all database partitions

select varchar(mon.tabname,12) as table,
       member,
       varchar(cat.indname,14) as IX_Name,
       mon.IID as Index_id,
       mon.index_scans, mon.index_only_scans
from table(MON_GET_INDEX(NULL,NULL,-2)) as mon,
     SYSCAT.INDEXES as cat
where mon.tabname = cat.tabname
  and mon.tabschema = cat.tabschema
  and mon.iid = cat.iid
  and mon.tabname in ('ACCT','HISTORY','BRANCH','TELLER')
order by mon.tabname

TABLE MEMBER IX_NAME INDEX_ID INDEX_SCANS INDEX_ONLY_SCANS
------------ ------ -------------- -------- -------------------- --------------
ACCT 1 ACCTINDX 1 1 1
HISTORY 1 HISTIX3 3 1 0
HISTORY 1 HISTIX2 2 0 0
HISTORY 1 HISTIX1 1 0 0
TELLER 0 TELLINDX 1 1 1

5 record(s) selected.


Figure 1-30. Monitor performance with SQL: Index usage with multiple database partitions CL4636.0

Notes:
The example SQL query uses the MON_GET_INDEX Monitor table function to list the
cumulative statistics for all indexes defined on a specific set of application tables.
Using a ‘-2’ for the third function call parameter causes the table function to include all
database partitions.
The sample result shows that two of the tables have been accessed on database partition
1, while the TELLER table has been accessed on database partition 0. The indexes on
these tables that exist but have not been utilized yet show an index scan count of zero.
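A variation of the same join, sketched below, lists the indexes on these tables that show no scans on any partition. Because the counters only accumulate while the database is active, a zero count means the index has not been used since activation, not that it is never needed:

select varchar(cat.indname,14) as IX_Name,
       varchar(cat.tabname,12) as Tab_Name,
       sum(mon.index_scans) as Total_Scans
from table(MON_GET_INDEX(NULL,NULL,-2)) as mon,
     SYSCAT.INDEXES as cat
where mon.tabname = cat.tabname
  and mon.tabschema = cat.tabschema
  and mon.iid = cat.iid
  and mon.tabname in ('ACCT','HISTORY','BRANCH','TELLER')
group by cat.indname, cat.tabname
having sum(mon.index_scans) = 0;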


Instructor notes:


Purpose — To show an example query that uses the MON_GET_INDEX monitor table
function to list index usage for each database partition.
Details —
Additional information —
Transition statement — Next we will see an example of a query showing possible
performance issues in the FCM processing for a partitioned database system.


Monitoring performance with SQL: DPF queries that waited on FCM Send/Receive
select member, total_act_time,
fcm_recv_volume, fcm_send_volume,
fcm_recv_wait_time, fcm_send_wait_time,
total_cpu_time,
substr(stmt_text,1,30) as sql_text
from table(mon_get_pkg_cache_stmt (NULL, NULL, NULL ,-2)) as t1
where fcm_recv_wait_time > 10000 order by member

MEMBER TOTAL_ACT_TIME FCM_RECV_VOLUME FCM_SEND_VOLUME
------ -------------------- -------------------- --------------------
0 54577 49277091 12294047
1 45586 14197054 32985721
2 47815 14205970 32971991

FCM_RECV_WAIT_TIME FCM_SEND_WAIT_TIME TOTAL_CPU_TIME SQL_TEXT
-------------------- -------------------- -------------------- -------------
32804 7116 6288616 SELECT ACCT.ACCT_ID, ACCT.NAME
15208 20264 2647135 SELECT ACCT.ACCT_ID, ACCT.NAME
12572 19598 3231430 SELECT ACCT.ACCT_ID, ACCT.NAME

3 record(s) selected.


Figure 1-31. Monitoring performance with SQL DPF queries that waited on FCM Send/Receive CL4636.0

Notes:
The visual shows a sample query that uses the MON_GET_PKG_CACHE_STMT table
function to return some of the FCM related statistics from the dynamic and static SQL
statements in package cache memory. Since data collects in the package cache memory
over time for an active database, this data could be used to locate SQL statements that
required larger amounts of data to be sent between partitions or that spent the most time
waiting for FCM data to be sent or received.
The sample shows the volume of FCM data sent and received across three database
partitions for one SQL statement. The query includes a WHERE clause that limits the
results to statements with more than ten seconds of time spent waiting for FCM data to be
received.
The amount of time spent waiting to send and receive FCM data is also shown by database
partition.
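To rank statements by the share of activity time spent in FCM waits rather than by a fixed threshold, a sketch such as the following could be used (the ten-row cutoff is an arbitrary choice):

select member,
       fcm_recv_wait_time + fcm_send_wait_time as fcm_wait_time,
       total_act_time,
       (100 * (fcm_recv_wait_time + fcm_send_wait_time)) /
         NULLIF(total_act_time,0) as pct_fcm_wait
from table(mon_get_pkg_cache_stmt (NULL, NULL, NULL ,-2)) as t1
order by pct_fcm_wait desc
fetch first 10 rows only;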


Instructor notes:


Purpose — To discuss an example query that looks for SQL statements in the database
package cache that spent a significant amount of time waiting for FCM data to be received
in a partitioned database.
Details —
Additional information —
Transition statement — Next we will look at using the MON_GET_TRANSACTION_LOG
monitor function to check database log usage.


Monitor Log space usage with the table function MON_GET_TRANSACTION_LOG

Amount of log used and free space currently
select
int(total_log_used/1024/1024) as "Log Used (Meg)",
int(total_log_available/1024/1024)
as "Log Space Free (Meg)",
int((float(total_log_used) /
float(total_log_used+total_log_available))*100)
as "Pct Used",
int(tot_log_used_top/1024/1024) as "Max Log Used (Meg)",
int(sec_log_used_top/1024/1024) as "Max Sec. Used (Meg)",
int(sec_logs_allocated) as "Secondaries"
from table (MON_GET_TRANSACTION_LOG(-2)) as tlogs ;
High Water Marks

Log Used (Meg) Log Space Free (Meg) Pct Used Max Log Used (Meg) Max Sec. Used (Meg) Secondaries
-------------- -------------------- -------- ------------------ ------------------- -----------
12 3 76 12 10 14


Figure 1-32. Monitor Log space usage with the table function MON_GET_TRANSACTION_LOG CL4636.0

Notes:
A database log full condition affects all of the update transactions using the database. This
query checks the vital statistics for database log space, including how much is currently
being used and how much is still available. The High Water Mark for log space usage is
shown, so that you can properly set your log space parameters: logprimary, logsecond, and
logfilsiz.
The MON_GET_TRANSACTION_LOG table function returns information about the
transaction logging subsystem for the currently connected database. This function can be
used to replace the use of the SNAPDB view and SNAP_GET_DB function for access to
important database logging status.
The query returns the current log space usage as well as a high water mark of database
log space usage.
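The same function can also drive a simple exception report that returns rows only when log consumption is high. A minimal sketch; the 80 percent threshold is an arbitrary choice, and the calculation assumes logsecond is not set to -1 (infinite logging), in which case the available-space figure is not meaningful:

select member,
       int(total_log_used/1024/1024) as used_meg,
       int(total_log_available/1024/1024) as avail_meg
from table (MON_GET_TRANSACTION_LOG(-2)) as tlogs
where (100 * total_log_used) /
      (total_log_used + total_log_available) > 80;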


Instructor notes:


Purpose — To discuss using MON_GET_TRANSACTION_LOG to query database
logging related statistics.
Details —
Additional information —
Transition statement — Next we will look at a query that shows if a long running
transaction is holding database log space.


Monitor health: Oldest transaction holding log space

What is the application doing?

select substr(uow.workload_occurrence_state,1,20) as "Status",
       substr(uow.session_auth_id,1,10) as "Authid",
       uow.application_handle as "Appl Handle",
       int(uow.UOW_LOG_SPACE_USED/1024/1024) as "Log Used (M)",
       uow.total_act_time as "Total Activity time(msec)",
       uow.total_act_wait_time as "Total Activity Wait time",
       uow.uow_start_time as "UOW Start Time"
from table (MON_GET_TRANSACTION_LOG(-1)) as db,
     table (MON_GET_UNIT_OF_WORK(NULL,-1)) as uow
where uow.application_handle = db.APPLID_HOLDING_OLDEST_XACT

Status Authid Appl Handle Log Used (M)
-------------------- ---------- -------------------- ------------
UOWWAIT INST461 66 10

Total Activity time(msec) Total Activity Wait time UOW Start Time
------------------------- ------------------------ --------------------------
770 66 2013-09-17-11.47.32.768302


Figure 1-33. Monitor health: Oldest transaction holding log space CL4636.0

Notes:
If database monitoring indicates that a large amount of active database log space is
allocated, this example SQL query can be used to check which application has the oldest
transaction holding log space. That is, if this application were to commit, there would be
some log space freed up for other transactions to use. If this transaction has been idle for
some time, then perhaps the person running the application has updated some rows, but
has stepped out for a coffee without committing their transaction. It might be necessary to
force this application's transaction to roll back in order to avoid a database log full condition.
The application id of the active transaction holding the oldest log record is retrieved from
the function MON_GET_TRANSACTION_LOG.
The application name and status come from the MON_GET_UNIT_OF_WORK table
function which includes the amount of log space used for the current unit of work.
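If the decision is made to roll back the idle unit of work, the application handle returned by the query can be passed to the FORCE APPLICATION command. A minimal sketch, using the handle 66 from the sample output above:

db2 force application ( 66 )

The force is asynchronous, and the rollback itself can take some time if the unit of work has made many changes.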


Instructor notes:


Purpose — This query shows the statistics for the application that is holding the oldest log
record in the active logs. This is the LOW_TRAN_LSN for a crash recovery. This is much
more important in systems with long running transactions that might cause a log full
condition.
Details —
Additional information —
Transition statement — Let's look at an example of a query that shows information about
the disk space utilization in table spaces.


Monitor health: Table space size

select substr(tbsp_name,1,30) as "Tablespace Name",
       tbsp_type as "Type", substr(tbsp_state,1,20) as "Status",
       (tbsp_total_size_kb / 1024) as "Size (Meg)",
       (100 - tbsp_utilization_percent) as "% Free Space",
       (((100 - tbsp_utilization_percent) * tbsp_usable_size_kb)
          / 102400) as "Meg Free Space"
FROM SYSIBMADM.MON_TBSP_UTILIZATION

Shows you how much free space and percentage free

Tablespace Name Type Status Size (Meg) % Free Space Meg Free Space
----------------- ---------- -------- ----------- ------------- ---------------
SYSCATSPACE DMS NORMAL 96 23.80 22.84
TEMPSPACE1 SMS NORMAL 0 0.00 0.00
USERSPACE1 DMS NORMAL 32 98.83 31.49
TSP01 DMS NORMAL 0 80.00 0.31
TSP07 SMS NORMAL 0 0.00 0.00
SYSTOOLSPACE DMS NORMAL 32 92.13 29.25
USERTEMP SMS NORMAL 0 0.00 0.00
CLPMTSP1 DMS NORMAL 62 78.18 48.81
CLPMTSP2 DMS NORMAL 62 77.88 48.62


Figure 1-34. Monitor health: Table space size CL4636.0

Notes:
This query can be used to check the space utilization for each table space, including the
current size, percentage free space, and size of free space.
The MON_TBSP_UTILIZATION administrative view returns key monitoring metrics,
including hit ratios and utilization percentage, for all table spaces and all database
partitions in the currently connected database. It provides critical information for monitoring
performance as well as space utilization. This administrative view is a replacement for the
TBSP_UTILIZATION administrative view.
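The same view supports a simple exception report that lists only the table spaces that are running short of space; the 90 percent threshold in this sketch is an arbitrary choice:

select substr(tbsp_name,1,30) as "Tablespace Name",
       tbsp_utilization_percent as "% Used"
FROM SYSIBMADM.MON_TBSP_UTILIZATION
WHERE tbsp_utilization_percent > 90
ORDER BY tbsp_utilization_percent DESC;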


Instructor notes:


Purpose — This query shows the space utilization statistics for all the table spaces in the
database.
Details —
Additional information —
Transition statement — Next we will look at a query that shows disk paths associated
with a database.


Database recovery: Split mirror copy planning


select substr(type,1,20) as type ,
substr(path,1,50) as path
FROM SYSIBMADM.DBPATHS
order by type
TYPE PATH
-------------------- --------------------------------------------------
DBPATH /database/inst461/NODE0000/SQL00001/
DBPATH /database/inst461/NODE0000/SQL00001/MEMBER0000/
DB_STORAGE_PATH /dbauto/path2/
DB_STORAGE_PATH /dbauto/path1/
LOCAL_DB_DIRECTORY /database/inst461/NODE0000/sqldbdir/
LOGPATH /database/inst461/NODE0000/SQL00001/LOGSTREAM0000/
TBSP_CONTAINER /database/inst461/NODE0000/SQL00001/DMSclpm4
TBSP_CONTAINER /database/inst461/NODE0000/SQL00001/DMSclpm3
TBSP_CONTAINER /database/inst461/NODE0000/SQL00001/DMSCLPM2
TBSP_CONTAINER /database/inst461/NODE0000/SQL00001/DMSclpm1
TBSP_CONTAINER /database/inst461/NODE0000/SQL00001/tsp03
TBSP_CONTAINER /database/inst461/NODE0000/SQL00001/tsp02
TBSP_DIRECTORY /database/inst461/NODE0000/SQL00001/sms/sms01/


Figure 1-35. Database recovery: Split mirror copy planning CL4636.0

Notes:
The SYSIBMADM.DBPATHS administrative view returns the values for database paths
required for tasks such as split mirror backups. The information needs to be collected from
several sources, including table space statistics and the database configuration.
The TYPE column values are:
• TBSP_DEVICE — Raw device for a database managed space (DMS) table space.
• TBSP_CONTAINER — File container for a DMS table space.
• TBSP_DIRECTORY — Directory for a system managed space (SMS) table space.
• LOGPATH - Primary log path.
• LOGPATH_DEVICE — Raw device for primary log path.
• MIRRORLOGPATH — Database configuration mirror log path.
• DB_STORAGE_PATH — Automatic storage path.
• DBPATH — Database directory path.
• LOCAL_DB_DIRECTORY — Path to the local database directory.
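The paths returned by this view feed directly into the split mirror sequence. A minimal command sketch, assuming a database named SAMPLE and leaving the actual copy mechanism (storage flash copy, file copy, and so on) to the storage layer:

db2 connect to sample
db2 set write suspend for database
(copy or flash every path returned by SYSIBMADM.DBPATHS)
db2 set write resume for database

While writes are suspended the database remains available for read activity, but transactions that need to write will wait.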


Instructor notes:


Purpose — This shows an example of using a DB2 provided administrative view to list the
disk paths that would be needed for a split mirror copy of a database.
Details —
Additional information —
Transition statement — Next we will look at a method to collect the results from SQL
queries that use the administrative views and functions for later analysis.


Tracking monitor history

Create your own monitor history tables

create table con_history as
  ( select current timestamp AS CURRENT_TIME,
           appls_in_db2, appls_cur_cons, agents_top
    from TABLE (MON_GET_DATABASE(-1)) AS MONDB )
  definition only

Populate them on a regular basis to use for problem tracking or perf tuning

insert into con_history
  select current timestamp,
         appls_in_db2, appls_cur_cons, agents_top
  from TABLE (MON_GET_DATABASE(-1)) AS MONDB ;


Figure 1-36. Tracking monitor history CL4636.0

Notes:
The monitoring statistics returned by the monitor table functions and views are in memory
and are lost when the database or instance stops.
A normal DB2 table can be created to store the monitor statistics so that you can track what
is happening over time with your database.
The visual shows how a table could be created using the DEFINITION ONLY clause that
matches the monitoring query that retrieves selected statistics using the
MON_GET_DATABASE table function.
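Once the history table is populated on a schedule, trend reports can be run against it like any other table. A minimal sketch that reports daily peaks; the table alias is used so that the CURRENT_TIME column of the history table, rather than a special register, is referenced:

select date(ch.current_time) as sample_date,
       max(ch.appls_cur_cons) as peak_connections,
       max(ch.agents_top) as peak_agents
from con_history ch
group by date(ch.current_time)
order by sample_date;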


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs
• DB2 LUW Performance Tuning and Monitoring for Single and Multiple
Partition DBs (CL44)
• Audience:
– Database designers, database administrators, application developers
• Objectives:
– Define the impact of database design (tables, indexes, and data
placement) on database performance
– Describe database application programming considerations and how
they affect performance
– Identify and describe the parameters (database and non-database) that
affect performance
– Tune parameters to achieve optimum performance for Online Transaction
processing (OLTP) or Data Warehouse environments
– Identify and use the tools that assist in monitoring and tuning of single
partition and multiple partition databases
– Analyze explain reports to identify the access strategies selected by the
DB2 Optimizer for execution of SQL statements including the selection of
indexes, join techniques, sorts and table queues


Figure 1-37. DB2 LUW Performance Tuning and Monitoring for Single and Multiple Partition DBs CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop
• DB2 for Linux, UNIX, and Windows Performance Tuning and
Monitoring Workshop (CL41)
• Audience: Database designers, database administrators,
application developers
• Objectives:
– Define the impact of database design (tables, indexes, and data
placement) on database performance
– Describe database application programming considerations and how
they affect performance
– Identify and describe the parameters (database and non-database)
that affect performance
– Tune parameters to achieve optimum performance
– Identify and use the tools that assist in monitoring and tuning of a
database


Figure 1-38. DB2 for Linux, UNIX, and Windows Performance Tuning and Monitoring Workshop CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


Unit summary
Having completed this unit, you should be able to:
• Compare the infrastructure used to support SNAPSHOT
monitoring with the current monitoring infrastructure
• Configure a database to collect the activity, request and object
metrics returned by the Monitoring Table functions
• Investigate current application activity that might indicate
performance problems using SQL statements
• Use the DB2-provided views and functions in SQL to evaluate
efficient use of database memory for locks, sorting and
database buffer pools
• Check database health indicators, like log space available and
table space utilization using CLP queries using Monitor
functions and views


Figure 1-39. Unit summary CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


Student exercise 1


Figure 1-40. Student Exercise 1 CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —



Unit 2. Advanced Table Space Management

Estimated time
02:00

What this unit is about


This unit describes various aspects of table space management. For
Database Managed (DMS) and Automatic Storage Managed Table
spaces, the concepts of the high water mark, the mapping of extents to
the containers and the processing of the Rebalancer utility are
covered. Students will also learn how to implement and administer
table spaces using Automatic Storage management. We will discuss
the differences between DMS and Automatic Storage management
and how to convert a DMS managed tablespace to utilize automatic
storage. The use of storage groups to mange table spaces will be
covered.

What you should be able to do


After completing this unit, you should be able to:
• Describe the benefits and limitations of using DMS and Automatic
Storage management for table spaces
• Monitor tablespace space allocations using SQL with
MON_GET_TABLESPACE and MON_GET_CONTAINER functions
• ALTER a table space to handle High Water Mark related issues
and reclaim unused space from DMS and Automatic Storage table
spaces
• Use the REBALANCE option to control container allocations for
Automatic Storage table spaces when changing storage paths
• Monitor the processing done by the Rebalancer using LIST
UTILITIES and db2pd commands
• Plan and implement changes to disk space allocations using
ALTER TABLESPACE options: ADD, EXTEND, RESIZE, DROP,
and BEGIN NEW STRIPE SET
• Convert a DMS-managed table space to use Automatic Storage
• Move a Tablespace from one storage group to another


• Utilize Data tags to alter WLM service subclass for processing activities based on data
accessed



Unit objectives
After completing this unit, you should be able to:
• Describe the benefits and limitations of using DMS and Automatic Storage management
for table spaces
• Monitor tablespace space allocations using SQL with MON_GET_TABLESPACE and
MON_GET_CONTAINER functions
• ALTER a table space to handle High Water Mark related issues and reclaim unused
space from DMS and Automatic Storage table spaces
• Use the REBALANCE option to control container allocations for Automatic Storage table
spaces when changing storage paths
• Monitor the processing done by the Rebalancer using LIST UTILITIES and db2pd
commands
• Plan and implement changes to disk space allocations using ALTER TABLESPACE
options: ADD, EXTEND, RESIZE, DROP, and BEGIN NEW STRIPE SET
• Convert a DMS-managed table space to use Automatic Storage
• Move a Tablespace from one storage group to another
• Utilize Data tags to alter WLM service subclass for processing activities based on data
accessed


Figure 2-1. Unit objectives CL4636.0

Notes:
Here are the objectives for this unit.


Instructor notes:
Purpose — Identify the objectives of this unit.
Details — State the objectives.
Additional information —
Transition statement — First we will describe the features of automatic storage managed
table spaces.



Storage Management alternatives (1 of 2)


• Automatic Storage Managed:
– Administration is very easy, no need to define the number or names of the containers
– Disk space assigned from disk paths for a storage group
– Monitoring of available space at the storage group level instead of each table space
– Multiple containers will be created using all available paths for the storage group
– Automatic Storage can be enabled when the database is created or added to an existing database
• Default is ON for CREATE DATABASE with DB2 9
– Storage paths can be added or removed using ALTER STOGROUP
– Uses standard DMS and SMS under the covers:
• DMS used for REGULAR and LARGE table spaces
• SMS used for SYSTEM and USER TEMPORARY table spaces
– Table space allocation controlled by CREATE/ALTER options:
• INITIALSIZE: Defaults to 32 MB
• AUTORESIZE: Can be set to YES or NO
• INCREASESIZE: Can be set to amount or percent increase
• MAXSIZE: Can define growth limits


Figure 2-2. Storage Management alternatives (1 of 2) CL4636.0

Notes:
The administration of table spaces using Automatic Storage is very easy. The CREATE
TABLESPACE syntax does not require the names of containers or the number of
containers to be defined. For Automatic Storage table spaces, the disk space is assigned
from a storage group. If no storage group is specified, a default storage group will be used.
This allows the DBA to monitor available space at the storage group level instead of each
individual table space. As long as there is space available in one of the defined Automatic
Storage paths, a table space can be automatically extended by DB2. Smaller databases
may only want a single storage group for the database.
When a table space is created, DB2 can create multiple containers, using all of the
available storage paths, which helps performance for table and index scans.
Additional storage path(s) can be added using an ALTER STOGROUP statement, to
support growth over time.
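A minimal sketch, assuming the default storage group IBMSTOGROUP and a hypothetical new path; after a path is added, an existing automatic storage table space can be rebalanced onto it:

db2 alter stogroup ibmstogroup add '/dbauto/path3'
db2 alter tablespace userspace1 rebalance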


Automatic Storage management actually uses DMS and SMS table spaces under the
covers. A DMS table space is used for REGULAR and LARGE table spaces, while SMS is
used for SYSTEM and USER TEMPORARY table spaces.
The allocation of space for Automatic Storage table spaces can be controlled using the
options of CREATE and ALTER TABLESPACE, including:
• INITIALSIZE – Which defaults to 32 MB, if not specified.
• AUTORESIZE – Can be set to YES or NO, with YES being the default for Regular and
Large table spaces.
• INCREASESIZE – Can be set to a specific amount or percentage increase.
• MAXSIZE – Can be used to define a limit on growth for the table space.
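Putting these options together, a minimal CREATE TABLESPACE sketch; the table space name and all of the sizes here are illustrative, not recommendations:

db2 create tablespace appdata initialsize 100 M autoresize yes increasesize 20 percent maxsize 2 G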


Instructor notes:


Purpose — To discuss automatic storage management for table spaces in more detail.
Details —
Additional information —
Transition statement — Next we will discuss the features of DMS and SMS managed
table spaces.


Storage Management alternatives (2 of 2)


• DMS – Database Managed:
– Manually assigned space can be adjusted for the table space after it has
been created by adding or enlarging containers as well as decreasing or
deleting containers
– AUTORESIZE can be defined to handle increased demand for space
• DB2 can only automatically add space to the last stripe set
– A table's objects can be split across multiple table spaces
– Can be converted to Automatic Storage management
– Maximum size (8 TB 4K page, 64 TB 32K page) per table space (per
Partition) using a LARGE table space
• SMS – System Managed:
– Efficient for handling small temporary tables
– Temporary table spaces should be defined using automatic storage for
more flexible management
– Use of SMS for non-temporary tables was deprecated with 10.1

Figure 2-3. Storage Management alternatives (2 of 2) CL4636.0

Notes:
For DMS management, the characteristics are:
• The disk space can be adjusted for the table space after it has been created by adding
or enlarging containers as well as decreasing or deleting containers, using the ALTER
TABLESPACE command.
• The AUTORESIZE option can be enabled for DMS table spaces to dynamically
increase the allocated disk space to handle greater then expected application needs.
• A table's objects can be split across multiple table spaces. The index and large object
components of a table can be directed to other table spaces when the table is created.
• DB2 has more direct control over storage allocation. Space can be reserved for growth
of critical application tables and indexes.
• Maximum size (8 TB 4K page, 64 TB 32K page) per Table space (per Partition) using a
Large DMS table space, which uses 6-byte Row IDs. Regular DMS table spaces use
4-byte Row IDs, which can address up to 16 million pages. The 6-byte Row IDs used for
a Large table space provide a 4-byte page address, and two bytes are used for the row


offset within a data page. Indexes based on tables in Large tablespaces will require 6
bytes per index entry to address the data rows.
System-managed storage
In an SMS (System-Managed Storage) table space, the operating system's file system
manager allocates and manages the space where the table is stored. The storage model
typically consists of many files, representing table objects, stored in the file system space.
The user decides on the location of the files, DB2 Database for Linux, UNIX, and Windows
controls their names, and the file system is responsible for managing them. By controlling
the amount of data written to each file, the database manager distributes the data evenly
across the table space containers. Each table has at least one SMS physical file
associated with it. When a table is dropped, the associated files are deleted, so the disk
space is freed. SMS table spaces do not have a defined size limit.

Important

The SMS table space type has been deprecated in Version 10.1 for user-defined
permanent table spaces and might be removed in a future release. The SMS table space
type is not deprecated for catalog and temporary table spaces.

Since system and user temporary tablespaces can be managed with automatic storage
management, temporary tablespaces should be defined as automatic storage managed
rather than standard SMS managed.


Instructor notes:
Purpose — To summarize the characteristics of DMS and SMS managed table spaces.
Details —
Additional information —
Transition statement — To discuss how DB2 manages the disk space for DMS and
permanent automatic storage managed table spaces.



DMS and Automatic Storage table space characteristics


• DB2 directly manages disk space allocation to objects using the table
space extent size
• Containers are either operating system files or raw devices (DMS):
– With DMS, table space capacity can be increased manually online by adding
new containers or extending existing ones using DDL
– Unused space can be manually reduced or eliminated online using DDL
– AUTORESIZE can be used to help avoid SQL -289 failures
• Data is striped across containers by extent
• Disk space is allocated at table space creation/extension time
– SMPs (Space Map Pages) keep track of what extents are used and which are
free
• Data objects are located by:
– OBJECT TABLE - Locates first extent in the object
– EMPs (Extent Map Pages) for the object - Locate other extents in the object


Figure 2-4. DMS and Automatic Storage table space characteristics CL4636.0

Notes:
For DMS and Automatic Storage table spaces, DB2 directly manages the disk space
allocation as objects are created, extended, or dropped.
• The containers for DMS table spaces can be either operating system files or raw
devices. Raw devices can only be used for DMS tablespaces.
• With DMS you can increase table space capacity online by adding new containers or
extending existing ones using ALTER TABLESPACE DDL (a short sketch follows this
list).
• You can decrease capacity online by shrinking or dropping containers using
ALTER TABLESPACE DDL.
• The AUTORESIZE option can be used to help prevent SQL code -289 failures when
the table space is full, by automatically extending the table space file containers.
Autoresize is enabled by default for Automatic Storage tablespaces and not enabled
for DMS managed tablespaces.
• The data is striped across containers by extent to improve the I/O parallelism when
scanning data and index pages.


• The disk space is allocated at table space creation (CREATE TABLESPACE) or
extension (ALTER TABLESPACE) time. This can be used to reserve space for critical
applications or restrict growth for certain table spaces.
• DB2 maintains SMPs (Space Map Pages) to keep track of what extents are used and
which are free.
• The data objects are located using:
- The OBJECT TABLE - Locates first extent in the object.
- The EMPs (Extent Map Pages) for the object - Locate other extents in the object.
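As referenced in the list above, a minimal sketch of the online container operations; the table space and container names are illustrative:

db2 alter tablespace dmsts1 add (file '/db2/cont2' 10000)
db2 alter tablespace dmsts1 resize (file '/db2/cont1' 20000)
db2 alter tablespace dmsts1 drop (file '/db2/cont2')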


Instructor notes:


Purpose — To review some of the important features associated with DMS table space
management.
Details —
Additional information —
Transition statement — Let's look at the internal structure for DMS table spaces.


DMS table spaces: Internal structure

Background – DMS table spaces


What happens on disk after the following:

db2 create tablespace tb2 managed by database using
    (file '/myfile' 1024, device '/dev/rhd7' 2048)
    extentsize 4 prefetchsize 8
db2 create table t1 in tb2
db2 create table t2 in tb2

Table space (Logical) Address Map:
  0 - Table space Header
  1 - First SMP Extent
  2 - Object Table
  3 - Extent Map for T1
  4 - First Extent of Data Pages for T1
  ...
  6 - Extent Map for T2
  7 - First Extent of Data Pages for T2
  8 - Another Extent of Data Pages for T1
  ...
  31968 - Second SMP Extent

Container (Physical) Address Map: the extents above are striped across the two
containers, /myfile and /dev/rhd7, each of which begins with a container tag.

Figure 2-5. DMS table spaces: Internal structure CL4636.0

Notes:
Here is an example of a CREATE TABLESPACE DDL statement for a DMS managed table
space.
db2 create tablespace tb2 managed by database
using (file '/myfile' 1024, device '/dev/rhd7' 2048)
extentsize 4 prefetchsize 8
The table space tb2 will have two containers, a file '/myfile' with 1024 pages and a raw
device '/dev/rhd7' with 2048 pages. The extent size is 4 pages, so prefetch size was set to
8 pages to support parallel I/O of the two containers. The extents will be striped across the
two containers.
The first extent in the logical table space address map is a header for the table space
containing internal control information. The second extent is the first extent of space map
pages (SMP) for the table space. SMP extents are spread at regular intervals throughout
the table space. Each SMP extent is a bit map of the extents from the current SMP extent
to the next SMP extent. The bit map is used to track which of the intermediate extents are
in use.


The next extent following the SMP is the object table for the table space. The object table is
an internal table that tracks which user objects exist in the table space and where their first
extent map page (EMP) extent is located. Each object has its own EMPs which provide a
map to each page of the object that is stored in the logical table space address map.
For each table object created, two extents are allocated: one to hold the extent map for
that object and one for storing data. Additional extents will be assigned to the tables as
needed.
Extents are never shared among the objects in a DMS table space, so small tables can
contain a number of empty pages, depending on the extent size defined for the table
space. The indexes for each table are treated as a group object, so all of the indexes for
one table will share a common extent map extent, rather than require one for each index.


Instructor notes:
Purpose — This is used to show the extents that are used by DB2 to manage the space
within a DMS table space.
Details —
Additional information —
Transition statement — Let's look at the concept of the high water mark for a DMS table
space.



Table space management: High Water Marks


• TS High Water Mark is the highest numbered page that is currently allocated to some
  object or used for internal space management.
• High Water Mark is NOT the highest numbered page ever used in the table space.
• HWM can be much higher than the number of pages currently in use due to dropped
  tables or offline table REORG
• This affects both DMS and non-temporary Automatic Storage table spaces:
  – Limits ability to release currently unused space
  – Increases amount of data, extents saved by BACKUP utility
• DB2 provides methods to reduce the HWM:
  – ALTER TABLESPACE LOWER HIGH WATER MARK
  – ALTER TABLESPACE REDUCE MAX (Auto Storage)


Figure 2-6. Table space management: High Water Marks CL4636.0

Notes:
One of the characteristics of DMS managed table spaces is the High Water Mark for the
table space. The High Water Mark for DMS table spaces can be monitored using the table
function MON_GET_TABLESPACE.
The table space High Water Mark is the highest numbered page that is currently allocated
to some object or used for internal space management. The High Water Mark is not the
highest numbered page ever used in the table space. It can increase or decrease as DB2
manages the extents assigned to objects in the table space.
In some cases, the High Water Mark is equal to the number of pages currently allocated in
the table space, but it can also be much higher. When an Offline REORG utility is used to
reorganize a table and a temporary table space is not assigned, the High Water Mark might
increase or decrease significantly. Dropping a table might also cause the High Water Mark
to be greater than the number of pages currently in use.
In most cases, the High Water Mark has little or no impact on the use of the table space to
support applications. There are a few cases where the current High Water Mark might
become an issue for a DBA.


Beginning with DB2 9.7, the ALTER TABLESPACE options LOWER HIGH WATER MARK
and REDUCE can be used to rearrange the extents in a table space to reduce the high
water mark and allow the unused disk space to be released.
The DB2 Backup utility copies the extents in DMS table spaces up to the High Water Mark
to the backup image. Any Redirected RESTORE of the table space must allocate
containers with space equal to or greater than the High Water Mark at the time of the
BACKUP, even though there might be many free extents included.


Instructor notes:


Purpose — To define the term High Water Mark associated with DMS table spaces. This
also applies to permanent automatic storage tablespaces.
Details —
Additional information —
Transition statement — Let's look at some examples showing the effects for activity in a
DMS table space on the High Water Mark value.


High Water Mark: Dropped table (Example 1)

Monitor using MON_GET_TABLESPACE (extent size 8 pages)

Before DROP TABLE 2:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 120
  Number of free pages = 72
  High water mark (pages) = 120

After DROP TABLE 2:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 88
  Number of free pages = 104
  High water mark (pages) = 120

High Water Mark did NOT change
(Diagram: the four extents of Table 2 below the High Water Mark become free;
Table 1 extents 6 and 7 still occupy the highest allocated extents, so the
High Water Mark stays at page 120.)

Figure 2-7. High Water Mark: Dropped table (Example 1) CL4636.0

Notes:
This shows some of the monitor elements returned by MON_GET_TABLESPACE before
and after a DROP TABLE statement for one table was executed.
By dropping Table 2, the 4 extents (32 pages) assigned to that table are now free, so the
number of used pages decreased from 120 to 88. The number of free pages increased from
72 to 104. Those pages are now available to create a new table or to extend the other
table.
Notice that before the drop table the High Water Mark was 120, which was equal to the
number of used pages. The drop table did not change the table space High Water Mark,
because the page at the High Water Mark is still in use by the remaining table.


Instructor notes:


Purpose — This shows how a DROP TABLE can cause the number of used pages to
become less than the table space high water mark.
Details —
Additional information —
Transition statement — Let's look at another example of DROP TABLE processing.


High Water Mark: Dropped table (Example 2)

Monitor using MON_GET_TABLESPACE (extent size 8 pages)

Before DROP TABLE 1:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 120
  Number of free pages = 72
  High water mark (pages) = 120

After DROP TABLE 1:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 56
  Number of free pages = 136
  High water mark (pages) = 104

High Water Mark DOES change
(Diagram: dropping Table 1 frees the extents that held the old High Water Mark,
so the High Water Mark drops to the last extent of Table 2 at page 104.)



Figure 2-8. High Water Mark: Dropped table (Example 2) CL4636.0

Notes:
This shows some of the monitor elements returned by MON_GET_TABLESPACE before
and after a DROP TABLE statement for the other table in the table space was executed.
By dropping Table 1, the 8 extents (64 pages) assigned to that table are now free, so the
number of used pages decreased from 120 to 56. The number of free pages increased
from 72 to 136. Those pages are now available to create a new table or to extend the other
table.
Before dropping the table, the High Water Mark was 120, which was equal to the
number of used pages. The dropped table did reduce the table space High Water Mark
from 120 to 104, because the page at the original High Water Mark was freed. The current
High Water Mark is higher than the number of used pages because there are now 6 free
extents (48 pages) below the point of the High Water Mark.


Instructor notes:


Purpose — This shows how a DROP TABLE can cause the High Water Mark to change
but still be greater than the number of used pages.
Details —
Additional information —
Transition statement — Let’s look at an example of REORG TABLE processing.


High Water Mark: Offline table REORG without using temporary tablespace (Example 1)

Monitor using MON_GET_TABLESPACE (extent size 8 pages)

Before REORG TABLE1:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 72
  Number of free pages = 120
  High water mark (pages) = 72

After REORG TABLE1:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 64
  Number of free pages = 128
  High water mark (pages) = 112

High Water Mark Increases
(Diagram: the new copy of the table is built in higher numbered extents, so the
extents freed from the original copy leave the High Water Mark well above the
number of used pages.)



Figure 2-9. High Water Mark: Offline table REORG without using temporary tablespace (Example 1) CL4636.0

Notes:
This shows some of the monitor elements returned by MON_GET_TABLESPACE before
and after a REORG utility was used to reorganize the only table in a table space. This is an
offline REORG, with no temporary table space assigned.
For example:
db2 REORG TABLE TABLE1
If no temporary table space is assigned for the REORG, a new copy of the table is built
using additional extents from the table's table space. The original copy of the table remains
intact until the new copy is completed.
In this example, the reorganization was able to reduce the number of extents required for
TABLE1, so the number of used pages decreased from 72 to 64. The number of free pages
increased from 120 to 128. Those pages are now available to create a new table or to
extend TABLE1.
In this case, the original High Water Mark was 72, which was equal to the number of used
pages. The REORG increased the table space High Water Mark from 72 to 112, because


table data was copied to higher numbered extents. The current High Water Mark is higher
than the number of used pages because there are now 6 free extents (48 pages) below the
point of the High Water Mark.


Instructor notes:
Purpose — This shows how a REORG TABLE can cause the High Water Mark to be
greater than the number of used pages.
Details —
Additional information —
Transition statement — Let’s look at another example of REORG TABLE processing.


High Water Mark: Offline table REORG without using temporary tablespace (Example 2)

Monitor using MON_GET_TABLESPACE (extent size 8 pages)

Before REORG TABLE 1:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 56
  Number of free pages = 136
  High water mark (pages) = 104

After REORG TABLE 1:
  Total number of pages = 200
  Number of usable pages = 192
  Number of used pages = 56
  Number of free pages = 136
  High water mark (pages) = 56

High Water Mark Decreases
(Diagram: enough free extents exist below the old High Water Mark for the new
copy of the table, so the High Water Mark drops to equal the number of used pages.)



Figure 2-10. High Water Mark: Offline table REORG without using temporary tablespace (Example 2) CL4636.0

Notes:
This shows some of the monitor elements returned by MON_GET_TABLESPACE before
and after a REORG utility was used to reorganize the only table in a table space. This is an
offline REORG, with no temporary table space assigned.
For example:
db2 REORG TABLE TABLE1
Like the previous example, no temporary table space is assigned for the REORG, so the
new copy of the table is built using additional extents from the table's table space. In this
case, there were enough free extents below the High Water Mark to make the copy of the
table.
In this example, the reorganization did not reduce the number of extents required for
TABLE1, so the number of used pages remained 56 and the number of free pages
remained 136.
The original High Water Mark was 104 and there were a number of free extents below that
point in the table space. The extent holding the original High Water Mark was freed by the


reorganization, so the High Water Mark decreased from 104 to 56, because table data was
copied to lower numbered extents. The current High Water Mark is now equal to the
number of used pages.


Instructor notes:


Purpose — This shows how a REORG TABLE might reduce the High Water Mark if
enough free extents are available and the table being reorganized was holding the High
Water Mark page.
Details —
Additional information —
Transition statement — Next we will discuss using monitor functions to analyze table
space utilization and status.


Using MON_GET_CONTAINER and MON_GET_TABLESPACE to check space allocations
SELECT VARCHAR(TBSP_NAME,10) AS TBSP_NAME, CONTAINER_ID, STRIPE_SET,
       CONTAINER_TYPE, TOTAL_PAGES
FROM TABLE(MON_GET_CONTAINER(NULL,-1)) AS CONT
WHERE TBSP_NAME IN ('DMSMTSPD','ASMTSPD')
ORDER BY TBSP_NAME, CONTAINER_ID

TBSP_NAME CONTAINER_ID STRIPE_SET CONTAINER_TYPE TOTAL_PAGES
---------- -------------------- -------------------- ---------------- --------------------
ASMTSPD 0 0 FILE_EXTENT_TAG 4512
ASMTSPD 1 0 FILE_EXTENT_TAG 4512
ASMTSPD 2 0 FILE_EXTENT_TAG 4512
DMSMTSPD 0 0 FILE_EXTENT_TAG 4000
DMSMTSPD 1 0 FILE_EXTENT_TAG 4000
DMSMTSPD 2 0 FILE_EXTENT_TAG 4000
DMSMTSPD 3 1 FILE_EXTENT_TAG 2000

SELECT VARCHAR(TBSP_NAME,20) AS TBSP_NAME, TBSP_TOTAL_PAGES AS TOTAL_PAGES,


TBSP_USED_PAGES AS USED_PAGES , (100 * TBSP_USED_PAGES / TBSP_TOTAL_PAGES ) AS
PCT_USED,
TBSP_PAGE_TOP AS HIGH_WATER_MARK, (100 * TBSP_PAGE_TOP / TBSP_TOTAL_PAGES ) AS
PCT_HWM
FROM TABLE (MON_GET_TABLESPACE(NULL,-1)) AS TBSP WHERE TBSP_NAME IN
('DMSMTSPD','ASMTSPD') ORDER BY TBSP_NAME

TBSP_NAME TOTAL_PAGES USED_PAGES PCT_USED HIGH_WATER_MARK PCT_HWM


----------- ----------- ----------- --------- ----------------- ---------
ASMTSPD 13536 3176 23 10904 80
DMSMTSPD 14000 3176 22 10904 77


Figure 2-11. Using MON_GET_CONTAINER and MON_GET_TABLESPACE to check space allocations CL4636.0

Notes:
The visual shows SQL query examples that can be used to gather information on the
current space usage for table spaces, including the high water mark.
The first SQL query uses the table function MON_GET_CONTAINER to retrieve the
number and size of the containers for two table spaces.
The second query example uses the table function MON_GET_TABLESPACE to find the
allocation size, the current space usage, and the current high water mark for the same two
table spaces. Notice that in both examples the high water mark is significantly higher than
the current space used.


Instructor notes:


Purpose — To show several example SQL queries that use the table function
MON_GET_CONTAINER and MON_GET_TABLESPACE to review current table space
statistics.
Details —
Additional information —
Transition statement — Next we will discuss how to reclaim unused space from
tablespaces.


DB2 functions to reclaim unused storage

• DMS and non-temporary Automatic Storage-managed table spaces created after installation of DB2 9.7 have a different internal structure that allows extents to be remapped to lower the high water mark
• For Automatic Storage table spaces, the REDUCE MAX option can be used to release all unused space:
    db2 alter tablespace ASTS1 reduce max
• For DMS, it requires two steps to release the unused extents:
  1. The High Water Mark can be reduced to match the number of used extents with the LOWER HIGH WATER MARK option:
       db2 alter tablespace DMSTS1 lower high water mark
  2. Next, the ALTER TABLESPACE options REDUCE, RESIZE, or DROP can be used to adjust the space assigned:
       db2 alter tablespace DMSTS1 reduce (all containers 100 M)

Figure 2-12. DB2 functions to reclaim unused storage CL4636.0

Notes:
For a DMS or Automatic Storage table space created using DB2 9.7 or later, you can use
reclaimable storage to return unused storage to the system for reuse. Reclaiming storage
is an online operation; it does not impact the availability of data to users.
You can reclaim the unused storage at any time by using the ALTER TABLESPACE
statement with the REDUCE option:
• For Automatic Storage table spaces, the REDUCE option has sub options to specify
whether to reduce storage by the maximum possible amount or by a percentage of the
current table space size.
• For DMS table spaces, first use the ALTER TABLESPACE statement with the LOWER
HIGH WATER MARK option, and then execute the ALTER TABLESPACE statement
with the REDUCE option and associated container operation clauses.
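
As a worked sketch of the two-step DMS sequence described above (the table space name DMSTS1 and the 100 M amount follow the visual's examples):

    db2 alter tablespace DMSTS1 lower high water mark
    db2 alter tablespace DMSTS1 reduce (all containers 100 M)

The first statement consolidates in-use extents lower in the table space; the second then shrinks each container by 100 megabytes to release the freed space to the file system.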


Instructor notes:


Purpose — To introduce the ALTER TABLESPACE options that were introduced with DB2
9.7 that can make adjusting a High Water Mark and freeing unused disk space easier.
Details —
Additional information —
Transition statement — Next we will discuss an example of reducing the high water mark
for an automatic storage table space.


Reclaiming space using ALTER TABLESPACE with Automatic Storage

• When the REDUCE option of ALTER TABLESPACE is used for an Automatic Storage table space:
  – Empty containers can be dropped
  – DB2 might move used extents to free space nearer the beginning of the table space
  – Containers can be re-sized
  – You can specify the amount of space reduction:
    • The maximum amount possible
    • An amount that you specify in kilobytes, megabytes, gigabytes, or pages
    • A percentage of the current size of the table space
    • If you do not specify an amount, the table space size is reduced as much as possible without moving extents
  – For table spaces with reclaimable storage, the High Water Mark can be reduced
• The LOWER HIGH WATER MARK option of ALTER TABLESPACE will move the maximum number of extents in order to lower the high water mark; however, no container re-sizing operations are performed.


Figure 2-13. Reclaiming space using ALTER TABLESPACE with Automatic Storage CL4636.0

Notes:
When you reduce the size of an Automatic Storage table space, the database manager
attempts to lower the high water mark for the table space and reduce the size of the table
space containers. In attempting to lower the high water mark, the database manager might
drop empty containers and might move used extents to free space nearer the beginning of
the table space. Next, containers are re-sized such that total amount of space in the table
space is equal to or slightly greater than the high water mark.
To lower the high water mark, you must have an Automatic Storage table space that was
created with DB2 9.7 or later. Reclaimable storage is not available in table spaces created
with earlier versions of the DB2 product. You can see which table spaces in a database
support reclaimable storage using the MON_GET_TABLESPACE table function.
You can reduce the size of an Automatic Storage space for which reclaimable storage is
enabled in a number of ways. You can specify that the database manager reduce the table
space by:
• The maximum amount possible

2-34 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.1
Instructor Guide

Uempty • An amount that you specify in kilobytes, megabytes or gigabytes, or pages


• A percentage of the current size of the table space.
In each case, the database manager attempts to reduce the size by moving extents to the
beginning of the table space, which, if sufficient free space is available, will reduce the high
water mark of the table space. Once the movement of extents has completed, the table
space size is reduced to the new high water mark.
If you do not specify an amount by which to reduce the table space, the table space size is
reduced as much as possible without moving extents. The database manager attempts to
reduce the size of the containers by first freeing extents for which deletes are pending. (It is
possible that some pending delete extents cannot be freed for recoverability reasons, so
some of these extents might remain.) If the high water mark was among those extents
freed, then the high water mark is lowered, otherwise no change to the high water mark
takes place. Next, the containers are re-sized such that total amount of space in the table
space is equal to or slightly greater than the high water mark. This operation is performed
using the ALTER TABLESPACE with the REDUCE clause by itself.
If you only want to lower the High Water Mark, consolidating in-use extents lower in the
table space without performing any container operations, you can use the ALTER
TABLESPACE statement with the LOWER HIGH WATER MARK clause. This option moves
the maximum number of extents in order to lower the high water mark, however, no
container re-sizing operations are performed. This operation is performed using the ALTER
TABLESPACE with the LOWER HIGH WATER MARK clause by itself.
Once a REDUCE or LOWER HIGH WATER MARK operation is under way, you can stop it
by using the REDUCE STOP or LOWER HIGH WATER MARK STOP clause of the ALTER
TABLESPACE statement. Any extents that have already been moved are committed, the high
water mark is reduced to its new value, and containers are re-sized to the new high
water mark.
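
The following statements illustrate the variations just described; the table space name TS1 matches the examples on the next pages, and the specific amounts are illustrative:

    db2 alter tablespace TS1 reduce max
    db2 alter tablespace TS1 reduce 200 M
    db2 alter tablespace TS1 reduce 25 percent
    db2 alter tablespace TS1 lower high water mark
    db2 alter tablespace TS1 reduce stop

The first three reduce the table space by the maximum amount, a fixed amount, and a percentage of the current size; the fourth consolidates extents without re-sizing containers; the last stops a REDUCE operation that is under way.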


Instructor notes:
Purpose — To discuss the enhanced options of DB2 9.7 to reduce Automatic Storage
table spaces, which might also reduce the High Water Mark.
Details —
Additional information —
Transition statement — Next we will see an example of using the MAX option to make
an automatic storage table space as small as possible.



Reclaimable Automatic Storage: Example 1

[Slide diagram: a table space holding Table 1, Table 2, and Table 3. After DROP TABLE 2 and DROP TABLE 3, an ALTER TABLESPACE … REDUCE MAX consolidates the remaining extents at the beginning of the table space and releases the rest. Legend: internal tablespace metadata extents; Table 1; Table 2; Table 3; extents that are allocated to the tablespace, but not to a table.]


Figure 2-14. Reclaimable Automatic Storage: Example 1 CL4636.0

Notes:
This is an example of reducing an Automatic Storage table space by the maximum amount
possible.
The statement would simply be: ALTER TABLESPACE TS1 REDUCE MAX
In this case, the keyword MAX is specified as part of the REDUCE clause, indicating that
the database manager should attempt to move the maximum number of extents to the
beginning of the table space. This might be used when a table is loaded for read-only
access and no additional growth is expected, until the data is refreshed.


Instructor notes:
Purpose — To show an example of using the ALTER TABLESPACE REDUCE MAX option
to make an Automatic Storage table space as small as possible.
Details —
Additional information —
Transition statement — Next we will see an example that reduces a table space by a
specified percentage.



Reclaimable Automatic Storage: Example 2

[Slide diagram: the same table space holding Table 1, Table 2, and Table 3. After DROP TABLE 2 and DROP TABLE 3, an ALTER TABLESPACE … REDUCE 25 PERCENT consolidates used extents and releases a quarter of the table space. Legend: internal tablespace metadata extents; Table 1; Table 2; Table 3; extents that are allocated to the tablespace, but not to a table.]


Figure 2-15. Reclaimable Automatic Storage: Example 2 CL4636.0

Notes:
This example illustrates reducing an Automatic Storage table space by a percentage of the
current table space size.
The command syntax is this case could be:
ALTER TABLESPACE TS1 REDUCE 25 PERCENT
This attempts to reduce the size of the table space TS1 to 75% of its original size, if
possible.
This form might be used to reduce the current disk requirements for a table space, but still
retain some available free space for efficient growth.
During the extent consolidation process, extents that contain data are moved to unused
extents below the high water mark. After extents are moved, if free extents still exist below
the high water mark, they are released as free storage. Next, the high water mark is moved
to the page in the table space just after the last in-use extent.


Instructor notes:
Purpose — To show a simple method to release a portion of the unused space in an
automatic storage tablespace.
Details —
Additional information —
Transition statement — Next we will look at an example of the MON_GET_TABLESPACE
function to check for reclaimable table spaces.



Checking table spaces for reclaimable storage

select substr(TBSP_NAME,1,14) as TS_NAME ,
       TBSP_TOTAL_PAGES as Total_pages,
       TBSP_USED_PAGES as Used_pages ,
       TBSP_PAGE_TOP as High_water_MARK ,
       reclaimable_space_enabled
from table ( MON_GET_TABLESPACE ( NULL , -1 ) ) as ts
where TBSP_NAME NOT LIKE ('SYS%') and TBSP_TYPE = 'DMS'

TS_NAME        TOTAL_PAGES  USED_PAGES  HIGH_WATER_MARK  RECLAIMABLE_SPACE_ENABLED
-------------- ------------ ----------- ---------------- -------------------------
USERSPACE1     8192         3264        3264             0
TP1DMSH        4608         3120        3120             0
TP1DMSAD       12000        6400        6400             0
TP1DMSAI       20000        7040        13984            1
MDCTSP1        8192         5760        5760             1
MDCTSP2        8192         2048        2048             1

6 record(s) selected.

• Use the MON_GET_TABLESPACE monitoring table function
• Look for table spaces with a HWM that is much higher than the number of used pages
• Look for the reclaimable space attribute of ‘1’

Figure 2-16. Checking table spaces for reclaimable storage CL4636.0

Notes:
The ability to use the new ALTER TABLESPACE options to lower the high water mark
depends on whether the table space supports reclaimable storage. The
MON_GET_TABLESPACE table function can be used to query various table space
statistics. The column RECLAIMABLE_SPACE_ENABLED can be checked for a value of '1',
indicating the table space supports reclaiming space.
The sample query listed is:
select substr(TBSP_NAME,1,14) as TS_NAME ,
TBSP_TOTAL_PAGES as Total_pages,
TBSP_USED_PAGES as Used_pages ,
TBSP_PAGE_TOP as High_water_MARK ,
reclaimable_space_enabled
from table ( MON_GET_TABLESPACE ( NULL , -1 ) ) as ts
where TBSP_NAME NOT LIKE ('SYS%') and TBSP_TYPE = 'DMS'
This query lists all non-SMS table spaces and excludes the system table spaces. The
report shows the total number of pages allocated, the number of pages used, the current

high water mark and whether storage could be reclaimed. The report could be used to
locate any table spaces that show a high water mark that is much higher than the number
of pages currently in use.
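
For example, based on the report above, TP1DMSAI stands out: its high water mark (13984) is well above its used pages (7040) and RECLAIMABLE_SPACE_ENABLED is 1. Since it is a DMS table space, a follow-up sketch might be (the reduction amount is illustrative):

    db2 alter tablespace TP1DMSAI lower high water mark
    db2 alter tablespace TP1DMSAI reduce (all containers 20 M)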


Instructor notes:


Purpose — To show an example of a report generated using the
MON_GET_TABLESPACE function that was added with DB2 9.7. The snapshot-based
commands, functions, and views do not indicate whether a table space is enabled for
reclaiming storage.
Details —
Additional information —
Transition statement — Next we will show a SQL query that can be used to monitor
progress of extent rebalancing.


Reclaimable storage: Monitoring the processing

• Reducing the table space size is an online operation:
  – Application DML and DDL can run concurrently
  – Online Backup operations cannot be processing the extents being moved
    • Backup can wait while a group of extents are moved
• Extent movement progress can be monitored using the table function MON_GET_EXTENT_MOVEMENT_STATUS()

TBSP_NAME  TBSP_ID .... NUM_EXTENTS_MOVED NUM_EXTENTS_LEFT TOTAL_MOVE_TIME
---------- ------- .... ----------------- ---------------- ---------------
USERSPACE1 2            -1                -1               -1
TS1        3            4000              2000             60000


Figure 2-17. Reclaimable storage: Monitoring the processing CL4636.0

Notes:
Reducing the size of table spaces through extent movement is an online operation. In other
words, data manipulation language (DML) and data definition language (DDL) can continue
to be run while the reduce operation is taking place. Some operations, such as a backup or
restore cannot run concurrently with extent movement operations; in these cases, the
process requiring access to the extents being moved (for example, backup) waits until a
(non-user-configurable) number of extents have been moved, at which point the backup
process obtains a lock on the extents in question, and continues from there.
You can monitor the progress of an extent movement operation with an SQL statement
using the MON_GET_EXTENT_MOVEMENT_STATUS table function.
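
A monitoring query along these lines returns the columns shown in the visual (a sketch; the NULL and -1 arguments request all table spaces at the current member):

    SELECT VARCHAR(TBSP_NAME,18) AS TBSP_NAME, TBSP_ID,
           NUM_EXTENTS_MOVED, NUM_EXTENTS_LEFT, TOTAL_MOVE_TIME
    FROM TABLE(MON_GET_EXTENT_MOVEMENT_STATUS(NULL,-1)) AS T

A value of -1 in the counters, as shown for USERSPACE1 in the visual, indicates that no extent movement is currently active for that table space.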


Instructor notes:


Purpose — To discuss the impact on access to a table space while extents are being moved
and to show an example of the MON_GET_EXTENT_MOVEMENT_STATUS table function
that could be used to monitor progress of the operation.
Details —
Additional information —
Transition statement — Next we will discuss the allocation of extents in table spaces with
multiple containers.


Table space extent allocation: Example 1

CREATE TABLESPACE DMSTS1
  MANAGED BY DATABASE USING (FILE 'DMSCONT1' 1000,
                             FILE 'DMSCONT2' 1000)
  EXTENTSIZE 8 ;

[Slide diagram: table space DMSTS1 with containers DMSCONT1 (container 0, 1000 pages) and DMSCONT2 (container 1, 1000 pages). Extents are striped across the containers: extent 0 and extent 1 in the first stripe, extent 2 and extent 3 in the next, and so on up to extents 246 and 247.]

• As new extents are allocated, DB2 assigns the lowest numbered free extent
• Data for objects is balanced across the containers


Figure 2-18. Table space extent allocation: Example 1 CL4636.0

Notes:
In a DB2 database, pages in a DMS table space are logically numbered from 0 to (N-1),
where N is the number of usable pages in the table space.
The pages in a DMS table space are grouped into extents, based on the extent size, and
from a table space management perspective, all object allocation is done on an extent
basis. That is, a table might use only half of the pages in an extent, but the whole extent is
considered to be in use and owned by that object. One extent is used to hold the container
tag, and the pages in this extent cannot be used to hold data.
Because space in containers is allocated by extent, pages that do not make up a full extent
will not be used. For example, if you have a 205-page container with an extent size of 10,
one extent will be used for the tag, 19 extents will be available for data, and the five
remaining pages are wasted.
If a DMS table space contains a single container, the conversion from logical page number
to physical location on disk is a straightforward process where pages 0, 1, 2, will be located
in that same order on disk.


It is also a fairly straightforward process in the case where there is more than one container
and each of the containers is the same size. The first extent in the table space (containing
pages 0 to (extent size - 1)) will be located in the first container, the second extent will be
located in the second container, and so on. After the last container, the process repeats in
a round-robin fashion, starting back at the first container.
For table spaces containing containers of different sizes, DB2 cannot use a simple
round-robin approach for the entire range of space because some containers can hold
more than others.
The example shows a table space created with the following DDL:
CREATE REGULAR TABLESPACE DMSTS1
MANAGED BY DATABASE USING (FILE 'DMSCONT1' 1000, FILE 'DMSCONT2' 1000)
EXTENTSIZE 8 ;

The two containers are the same size (1000 pages), so there is one range (range 0) with all
extents striped across both containers (numbered 0 and 1). Each stripe, therefore, has two
extents.
There are 2000 total pages, but two extents of 8 pages each were used by DB2 for the
container tags, so there are 1984 pages left (numbered 0 to 1983).
The 1984 pages are in 248 extents (numbered 0 to 247), so the Max Extent shown is 247.


Instructor notes:
Purpose — This shows a sample example of the table space extent mapping for a DMS
table space with two containers of the same size.
Details —
Additional information — In previous versions of this course we covered the more
detailed extent map included in the GET SNAPSHOT FOR TABLESPACES report. Since
the SNAPSHOT monitoring is deprecated, we will stay at a higher level discussing extent
mapping to containers, but explain stripe sets.
Transition statement — Next we will explain how extents are mapped when a new
container is added.


Table space maps: ALTER TABLESPACE ADD example

ALTER TABLESPACE DMSTS1 ADD (FILE 'DMSCONT3' 1000);

[Slide diagram: before the ALTER, containers DMSCONT1 (container 0) and DMSCONT2 (container 1), each 1000 pages, hold extents 0 through 247 striped across the two containers. After the ALTER, DMSCONT3 (container 2, 1000 pages) joins the stripe set and extents 0 through 371 are striped across all three containers.]

• When a new container is added to a DMS managed tablespace using ADD, DB2 will rebalance extents automatically
• Rebalance stops at the HWM for the tablespace
• The REBALANCE is performed asynchronously to the ALTER processing

Figure 2-19. Table space maps: ALTER TABLESPACE ADD example CL4636.0

Notes:
In this example, we start with the first example table space that had two containers of 1000
pages each. As the objects in that table space begin to grow, it might be necessary to add
additional space.

One option is to add a third container as follows:


ALTER TABLESPACE DMSTS1 ADD (FILE 'DMSCONT3' 1000);

When the new container is added, it becomes part of the current stripe set, so DB2 will
rebalance the extents so that data is evenly spread among the containers of the stripe set.
The visual shows that extent number 2 needed to be moved from container 0 to the new
container number 2. The Maximum Extent increased from 247 to 371, because the new
container added 124 new extents (992 pages after using one extent for the new container's
tag).

© Copyright IBM Corp. 2005, 2015 Unit 2. Advanced Table Space Management 2-49
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

What's not apparent is the work that DB2 had to do to get from the first table space map to
the one shown here after the ALTER.
In order to get from the original table space map shown to the new table space map shown
that results from the ALTER TABLESPACE processing, DB2 uses a utility called the
Rebalancer which moves extents one by one to reach the target table space extent
mapping.
The addition of the new container using the ADD option for this current example means
that DB2 needs to rebalance the existing data stored in the first two containers so that the
data is evenly spread across all three containers.
This is done starting at the lowest numbered extents up to the point of the current High
Water Mark for the table space. In the visual, it shows that Extent 2 is moved from
Container 0 to Container 2. Next, Extent 3 is moved from Container 1 to Container 0. This
continues, extent by extent, until the rebalancer reaches the table space High Water Mark.
Empty extents above the High Water Mark do not need to be processed, but empty extents
below the High Water Mark are moved.
Applications can continue to access the data in this table space while the rebalancer is
running.


Instructor notes:


Purpose — To show what the rebalancer needs to do behind the scenes to get from the
original table space extent mapping to the new mapping with three containers.
Details —
Additional information — The examples show a simple case where containers are of an
equal size. You could cover many complex examples involving containers of different sizes,
but most administrators will implement table spaces with one container or a few equal
sized containers.
Transition statement — Let's look at using the LIST UTILITIES command to check on the
status of the Rebalancer utility processing.


Monitoring the Rebalancer: LIST UTILITIES

db2 list utilities show detail


ID = 21
Type = REBALANCE
Database Name = MUSICDB
Member Number = 0
Description = Tablespace ID: 12
Start Time = 09/23/2013 11:01:54.027168
State = Executing
Invocation Type = User
Throttling:
Priority = Unthrottled
Progress Monitoring:
Estimated Percentage Complete = 0
Total Work = 397 extents
Completed Work = 0 extents
Start Time = 09/23/2013 11:01:54.027408

db2 SET UTIL_IMPACT_PRIORITY FOR 21 TO 20


Figure 2-20. Monitoring the Rebalancer: LIST UTILITIES CL4636.0

Notes:
Even though the ALTER TABLESPACE command seems to complete quickly, the
rebalancer utility processing it triggers might take a long time to complete and perform a
large number of I/Os.
The DB2 command, LIST UTILITIES, can be used to check on the progress of a
Rebalancer utility that is currently active in the database. You can also use the -utilities
option of the db2pd command.
The example in the visual shows that the Rebalancer utility has just started processing.
It will move 397 extents that require rebalancing in the table space with an ID of 12.
Notice that the LIST UTILITIES output shows that the rebalancer is running unthrottled.
If the DBM configuration option, UTIL_IMPACT_LIM, is set to a value less than 100, the
SET UTIL_IMPACT_PRIORITY command could be used to throttle the rebalancer
processing to limit the impact on other applications.
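
The db2pd alternative might look like the following (a sketch; MUSICDB matches the database in the visual):

    db2pd -db MUSICDB -utilities

Because db2pd reads directly from instance memory, it does not require a database connection, which can be convenient when checking rebalancer progress on a busy system.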


Instructor notes:


Purpose — This shows an example of using LIST UTILITIES to monitor the progress of
the Rebalancer utility.
Details —
Additional information —
Transition statement — Next we will look at the db2diag.log file messages generated by
the rebalancer processing.


Rebalancer: db2diag.log Status messages


2013-09-24-11.23.55.275889-240 E93263E451 LEVEL: Warning
PID : 31445 TID : 140165699856128 PROC : db2sysc 0
INSTANCE: inst461 NODE : 000 DB : MUSICDB
HOSTNAME: ibmclass
EDUID : 71 EDUNAME: db2rebal (MUSICDB) 0
FUNCTION: DB2 UDB, buffer pool services, sqlb_fwd_rebalance, probe:5514
MESSAGE : ADM6058I Rebalancer for table space "DMSMTSPD" (ID "11") was
started.

2013-09-24-11.23.57.476180-240 I93715E465 LEVEL: Info


PID : 31445 TID : 140165699856128 PROC : db2sysc 0
INSTANCE: inst461 NODE : 000 DB : MUSICDB
HOSTNAME: ibmclass
EDUID : 71 EDUNAME: db2rebal (MUSICDB) 0
FUNCTION: DB2 UDB, buffer pool services, sqlb_fwd_rebalance, probe:6735
DATA #1 : <preformatted>
Table Space ID 11: Last extent moved was #997 and number of extents moved was #996.

2013-09-24-11.23.57.479433-240 E94181E453 LEVEL: Warning


PID : 31445 TID : 140165699856128 PROC : db2sysc 0
INSTANCE: inst461 NODE : 000 DB : MUSICDB
HOSTNAME: ibmclass
EDUID : 71 EDUNAME: db2rebal (MUSICDB) 0
FUNCTION: DB2 UDB, buffer pool services, sqlb_rebalance, probe:8623
MESSAGE : ADM6062I Rebalance for table space "DMSMTSPD" (ID "11") has been
completed.


Figure 2-21. Rebalancer: db2diag.log Status messages CL4636.0

Notes:
The LIST UTILITIES and db2pd -utilities commands can be used to get information about
an active Rebalancer utility.
Once the processing has completed, there are messages in the db2diag.log file that show
when each Rebalancer utility starts and completes and the number of extents processed.
This could be helpful information if there was a question about a system performance
problem that happened previously. You could check to see if there was a table space being
rebalanced at that time.
The start and completion messages are logged at the Warning level.


Instructor notes:


Purpose — To discuss the db2diag.log messages generated when a tablespace rebalance
operation is processed.
Details —
Additional information —
Transition statement — Next we will look at adding a container to a table space using a
new stripe set.


Extent Allocations using ALTER NEW STRIPE SET

ALTER TABLESPACE DMSTS1 BEGIN NEW STRIPE SET (FILE 'DMSCONT3' 1000)

[Slide diagram: stripe set 0 holds DMSCONT1 (container 0) and DMSCONT2 (container 1), each 1000 pages, with extents 0 through 247 striped across them. Stripe set 1 holds only DMSCONT3 (container 2, 1000 pages) with extents 248 through 371.]

• Extent rebalancing is not necessary
• No data will be stored in the new container until the first two containers are completely filled


Figure 2-22. Extent Allocations using ALTER NEW STRIPE SET CL4636.0

Notes:
When a new container is added with the ADD option, additional extents are made available to
the DMS table space, but the rebalancer processing can require a large number of I/Os.
The EXTEND and RESIZE options can be used to change the current file containers to add
extents without rebalancing, but there might not be enough free space available in the
current file systems to increase the current containers.
The BEGIN NEW STRIPE SET option for ALTER TABLESPACE can be used to add new
containers to a DMS table space and bypass the rebalancer processing. This makes the
new space available as quickly as possible.
In the example, there are two file containers each with 1000 pages.
If the following SQL is used:
ALTER TABLESPACE DMSTS1 BEGIN NEW STRIPE SET (FILE 'DMSCONT3' 1000)
This will add a new container with 1000 pages to the table space.


Notice that the original containers are considered to be Stripe Set 0, while the new
container (container 2) is the only container assigned to the new Stripe Set 1.
This new stripe set begins with extent 248 and runs up to the maximum extent, 371. No data will be stored
in the new container until the first two containers are completely filled.


Instructor notes:
Purpose — This shows an example of using the BEGIN NEW STRIPE SET option to
increase the available space in a DMS table space and avoid the overhead of rebalancing
the extents.
Details —
Additional information —
Transition statement — Next we will discuss how a DMS table space can be moved using
ALTER TABLESPACE commands.


Moving a DMS managed tablespace to new containers online using ALTER TABLESPACE

• A DMS managed tablespace can be moved to new containers online using ALTER TABLESPACE commands
• First use ALTER TABLESPACE to add containers with sufficient space for the HWM of the table space
  – Use BEGIN NEW STRIPE SET to defer rebalancing work
      ALTER TABLESPACE DMSTS1 BEGIN NEW STRIPE SET
        (FILE 'DMSCONT3' 1000 , FILE 'DMSCONT4' 1000 )
• Next, DROP the original containers to complete the move
      ALTER TABLESPACE DMSTS1 DROP (FILE 'DMSCONT1' , FILE 'DMSCONT2' )


Figure 2-23. Moving a DMS managed tablespace to new containers Online using ALTER TABLESPACE CL4636.0

Notes:
You could move a DMS tablespace to a new set of containers using BACKUP and
RESTORE with the REDIRECT option, but that would require the tablespace to be offline
during the RESTORE processing.
You can use ALTER TABLESPACE commands to move a DMS tablespace while it remains
online.
The first example command uses ALTER TABLESPACE with the BEGIN NEW STRIPE
SET option to allocate the new containers, but a rebalance operation is not required at this
point.
To complete the data movement, the ALTER TABLESPACE with the DROP option can be
used to remove the original containers. The REBALANCE utility will be invoked to move the
extents to available space in the new containers.
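
Once the rebalance triggered by the DROP has finished, the new container layout can be verified with MON_GET_CONTAINER; this sketch reuses the query pattern shown earlier in this unit:

    SELECT VARCHAR(TBSP_NAME,10) AS TBSP_NAME, CONTAINER_ID, STRIPE_SET,
           VARCHAR(CONTAINER_NAME,40) AS CONTAINER_NAME, TOTAL_PAGES
    FROM TABLE(MON_GET_CONTAINER('DMSTS1',-1)) AS CONT
    ORDER BY CONTAINER_ID

Only DMSCONT3 and DMSCONT4 should remain in the result.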


Instructor notes:
Purpose — To discuss using ALTER TABLESPACE commands to move a DMS managed
tablespace to a new set of containers while the table space remains online. Later we will
discuss how storage groups can be used to move automatic storage managed tablespaces
in a similar manner.
Details —
Additional information —
Transition statement — Next we will discuss how autogrowth works with DMS managed
tablespaces.



Example of Auto-growth stopping for DMS

[Slide diagram: an auto-resize table space created with two containers, C0 and C1, on separate file systems. As the table space grows, both containers grow at the same rate until the file system holding C1 becomes full and auto-growth stops.]

How do you kick start auto-growth again?
1. Make more room available on the file system holding C1
2. Extend C0 by some amount
   – Extending C0 results in a new range being created that holds only that one container; hence, auto-resize will only extend that one container
3. Add a new stripe set (recommended if #1 is not possible)
   – Adding the new stripe set (containers C2 and C3) results in a new range being created in the table space map; hence, auto-resize will only extend these new containers from here on


Figure 2-24. Example of Auto-growth stopping for DMS CL4636.0

Notes:
DMS table spaces are made up of file containers or raw device containers, and their sizes
are set when the containers are assigned to the table space. The table space is considered
full when all of the space within the containers has been used. You can add or extend
containers using the ALTER TABLESPACE statement, allowing more storage space to be
given to the table space.
DMS table spaces also have a feature called autoresize. As space is consumed in a DMS
table space that can be automatically resized, DB2 can extend one or more file containers.
To enable the AUTORESIZE feature specify the AUTORESIZE YES clause as part of the
CREATE TABLESPACE statement:
CREATE TABLESPACE DMS1 MANAGED BY DATABASE
USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES


You can also enable or disable the AUTORESIZE feature after a DMS table space has
been created by using the AUTORESIZE clause on the ALTER TABLESPACE statement:
ALTER TABLESPACE DMS1 AUTORESIZE YES
ALTER TABLESPACE DMS1 AUTORESIZE NO
Two other attributes, MAXSIZE and INCREASESIZE, are associated with autoresize table
spaces.
Maximum size (MAXSIZE)
The MAXSIZE clause on the CREATE TABLESPACE statement defines the maximum size
for the table space. For example, the following statement creates a table space that can
grow to 100 megabytes (per partition if the database has multiple partitions):
CREATE TABLESPACE DMS1 MANAGED BY DATABASE
USING (FILE '/db2files/DMS1' 10 M)
AUTORESIZE YES MAXSIZE 100 M
The MAXSIZE NONE clause specifies that there is no maximum limit for the table space.
The table space can grow until a file system limit or DB2's table space limit has been
reached (see the SQL Limits section in the SQL Reference). No maximum limit is the
default if the MAXSIZE clause is not specified when the AUTORESIZE feature is enabled.
The ALTER TABLESPACE statement changes the value of MAXSIZE for a table space that
has AUTORESIZE already enabled. For example:
ALTER TABLESPACE DMS1 MAXSIZE 1 G

ALTER TABLESPACE DMS1 MAXSIZE NONE


If a maximum size is specified, the actual value that DB2 enforces might be slightly smaller
than the value provided because DB2 attempts to keep container growth consistent. It
might not be possible to extend the containers by equal amounts and reach the maximum
size exactly.
Increase size (INCREASESIZE)
The INCREASESIZE clause on the CREATE TABLESPACE statement defines the amount
of space used to increase the table space when there are no free extents within the table
space, and a request for one or more extents has been made. The value can be specified
as an explicit size or as a percentage. For example:
CREATE TABLESPACE DMS1 MANAGED BY DATABASE
USING (FILE '/db2files/DMS1' 10 M)
AUTORESIZE YES INCREASESIZE 5 M

CREATE TABLESPACE DMS1 MANAGED BY DATABASE


USING (FILE '/db2files/DMS1' 10 M)
AUTORESIZE YES INCREASESIZE 50 PERCENT


A percentage value means that the increase size is calculated every time that the table
space needs to grow, and growth is based on a percentage of the table space size at that
time. For example, if the table space is 20 megabytes in size and the increase size is 50
percent, the table space grows by 10 megabytes the first time (to a size of 30 megabytes)
and by 15 megabytes the next time.
If the INCREASESIZE clause is not specified when the AUTORESIZE feature is enabled,
the database manager determines an appropriate value to use, which might change over
the life of the table space. Like AUTORESIZE and MAXSIZE, you can change the value of
INCREASESIZE using the ALTER TABLESPACE statement.
If a size increase is specified, the actual value used by DB2 might be slightly different than
the value provided. This adjustment in the value used is done to keep growth consistent
across the containers in the table space.
In this slide, the dashed boxes represent file systems and the amount of space they have
available to them. As growth occurs (and this growth occurs at the same rate for C0 and
C1, keeping them at the same size), the file system holding C1 becomes full. At this point,
the table space will no longer grow automatically.
Case #1: In this case, free space is being made on the file system holding C1. This can
occur by deleting non-DB2 files that might exist, or by extending the file system in size
(platform dependent). Since C0 and C1 are the containers in the last range of the table
space map and there is now free space on each of the file systems holding them, when an
AUTORESIZE operation is attempted, it will be successful.
Case #2: In this case, container C0 is being extended by some amount of space (minimum
would need to be one extent). In doing this, a new range is added to the table space map
and this range will only include container C0. Therefore, when an AUTORESIZE operation
is attempted next by DB2, it will only attempt to grow container C0. Because there is free
space available on that file system, the container extension should succeed.
Case #3: In this case, the user is deciding that they want to maintain the same amount of
striping so they add two new containers to the table space using the BEGIN NEW STRIPE
SET option of ALTER TABLESPACE. The result of this is that new space is added to the
table space and the data that currently exists in the table space will *not* be rebalanced
onto the new containers. And, of course, a new stripe set (which in this case is a single new
range) is added to the table space map. So, when an AUTORESIZE operation is next
attempted, it will try to resize those containers that are in the last range of the map, which
happens to be C2 and C3 in this case.


Instructor notes:
Purpose — To introduce the autoresize options for DMS table spaces.
Details —
Additional information —
Transition statement — Next we will discuss using storage groups for automatic storage
tablespaces.


Using Storage Groups for Automatic storage table spaces

[Slide diagram: a partitioned table Sales with partitions 2012Q1, 2011Q4, 2011Q3, 2011Q2, 2011Q1, 2010Q4, … 2006Q3 stored in table spaces 14 through 1. With DB2 10.1, the table spaces are assigned to storage groups: SG_HOT (storage path /hot/fs1) on an SSD RAID array, SG_WARM (storage paths /warm/fs1 and /warm/fs2) on an FC/SAS RAID array, and SG_COLD (storage paths /cold/fs1, /cold/fs2, /cold/fs3) on a SATA RAID array.]


Figure 2-25. Using Storage Groups for Automatic storage table spaces CL4636.0

Notes:
With the introduction of storage groups, there is now an extra layer of abstraction between
table spaces and disks. You can group table spaces sharing similar characteristics into the
same group of storage paths.
After you create storage groups that map to the different classes of storage in your
database management system, you can assign automatic storage table spaces to those
storage groups, based on which table spaces have hot or cold data.
You can dynamically reassign a table space to a different storage group as the data
changes or your business direction changes.
- SSD = Solid State Drive
- FC = Fiber Channel
- SAS = Serial Attached SCSI
- SATA = Serial ATA


Instructor notes:
Purpose — To show an example of a range partitioned table that uses multiple storage
groups to manage the table spaces with different temperature of data.
Details —
Additional information —
Transition statement — Next we will look at how to define a storage group.



Review - Creating a storage group

• Use the CREATE STOGROUP statement to create a new storage group

>>-CREATE--STOGROUP--storagegroup-name-------------------------->

      .-,--------------.
      V                |
>--ON---'storage-path'-+--●------------------------------------->

>--+----------------------------------+--●---------------------->
   '-OVERHEAD--number-of-milliseconds-'

>--+-----------------------------------------------+--●--------->
   '-DEVICE READ RATE--number-megabytes-per-second-'

>--+--------------------------------+--●------------------------>
   '-DATA TAG--+-integer-constant-+-'
               '-NONE-------------'

>--+----------------+--●----------------------------------------><
   '-SET AS DEFAULT-'

CREATE STOGROUP HIGHEND
  ON '/dbe/filesystem1', '/db2/filesystem2'
  OVERHEAD 0.75 DEVICE READ RATE 500

CREATE STOGROUP MIDRANGE ON 'D:\', 'E:\' SET AS DEFAULT



Figure 2-26. Review - Creating a storage group CL4636.0

Notes:
Use the CREATE STOGROUP statement to create storage groups. Creating a storage
group within a database assigns storage paths to the storage group.
To create a storage group by using the command line, enter the following statement:
CREATE STOGROUP operational_sg ON '/filesystem1', '/filesystem2',
'/filesystem3'...
where operational_sg is the name of the storage group and /filesystem1,
/filesystem2, /filesystem3 , ... are the storage paths to be added.
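A storage group can also be changed after creation with the ALTER STOGROUP statement. The following sketch (the path and attribute values are illustrative) adds a storage path and updates the device characteristics used by the optimizer:

    ALTER STOGROUP operational_sg ADD '/filesystem4'
    ALTER STOGROUP operational_sg OVERHEAD 0.75 DEVICE READ RATE 500

A newly added path is used as the automatic storage table spaces in the group grow; an explicit ALTER TABLESPACE ... REBALANCE can be used to stripe existing data onto it.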


Instructor notes:
Purpose — To discuss using the CREATE STOGROUP statement to define a storage
group.
Details —
Additional information —
Transition statement — Next we will discuss how to assign an automatic storage table
space to storage group.



Review - Assigning a table space to a storage group

• A specific storage group can be selected with the USING STOGROUP option of CREATE TABLESPACE:

    create tablespace tsp04
      managed by automatic storage using stogroup app_data
      initialsize 100 K maxsize none
      extentsize 2;

• The database storage paths defined when the database is created are assigned to a default storage group named IBMSTOGROUP
  – This will be the default storage group for automatic storage table spaces when USING STOGROUP is not specified
  – The ALTER STOGROUP statement option SET AS DEFAULT can be used to change which storage group is the default for a database


Figure 2-27. Review - Assigning a table space to a storage group CL4636.0

Notes:
Using the CREATE TABLESPACE statement or ALTER TABLESPACE statement, you can
specify or change the storage group a table space uses. If a storage group is not specified
when creating a table space, then the default storage group is used.
Any table spaces that use the same storage group can have different PAGESIZE and
EXTENTSIZE values. These attributes are related to the table space definition and not to
the storage group.


Instructor notes:
Purpose — To discuss how an automatic storage table space can be assigned to a
specific or a default storage group.
Details —
Additional information —
Transition statement — Next we will discuss using SQL queries to check the storage
groups for a database.


Query storage groups with SQL using the table function ADMIN_GET_STORAGE_PATHS

select varchar(storage_group_name,20) as storage_group,
       storage_group_id,
       varchar(db_storage_path,20) as storage_path,
       db_storage_path_state,
       (fs_total_size / 1000000) as total_path_MB,
       (sto_path_free_size / 1000000) as path_free_MB
from table(admin_get_storage_paths('',-1)) as T1

STORAGE_GROUP        STORAGE_GROUP_ID STORAGE_PATH         DB_STORAGE_PATH_STATE
-------------------- ---------------- -------------------- ---------------------
IBMSTOGROUP          0                /dbauto/path1        IN_USE
APP_DATA             1                /dbauto/path2        IN_USE

TOTAL_PATH_MB        PATH_FREE_MB
-------------------- --------------------
20940                5649
20940                5649

2 record(s) selected.


Figure 2-28. Query storage groups with SQL using the table function ADMIN_GET_STORAGE_PATHS CL4636.0

Notes:
The ADMIN_GET_STORAGE_PATHS table function returns a list of automatic storage
paths for each database storage group, including file system information for each storage
path.
Syntax

>>-ADMIN_GET_STORAGE_PATHS--(--storage_group_name--,--member--)-><

The schema is SYSPROC.

Table function parameters


storage_group_name - An input argument of type VARCHAR(128) that specifies a
valid storage group name in the currently connected database when this function is
called. If the argument is NULL or an empty string, information is returned for all storage
groups in the database. If the argument is specified, information is only returned for the
identified storage group.
member - An input argument of type INTEGER that specifies a valid member in the
same instance as the currently connected database when calling this function. Specify
-1 for the current database member, or -2 for all database members. If the NULL value
is specified, -1 is set implicitly.
Authorization
One of the following authorities is required to execute the routine:
• EXECUTE privilege on the routine
• DATAACCESS authority
• DBADM authority
• SQLADM authority
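
For example, to list only the paths of one storage group across all database members, the table function arguments described above change as follows (IBMSTOGROUP is the default storage group created with the database):

    select varchar(storage_group_name,20) as storage_group,
           varchar(db_storage_path,20) as storage_path
    from table(admin_get_storage_paths('IBMSTOGROUP',-2)) as T1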


Instructor notes:


Purpose — To show an example of a query that uses the table function
ADMIN_GET_STORAGE_PATHS to get information about the storage paths for a
database.
Details —
Additional information —
Transition statement — Next we will look at using the db2pd command to list storage path
information for a database.


Listing storage groups with the db2pd command

• db2pd -db testdb -storagegroups

Database Member 0 -- Database MUSICDB -- Active -- Up 0 days 00:09:09 -- Date 03/23/2012 09:13:49

Storage Group Configuration:
Address    SGID  Default  DataTag  Name
0x8F241740 0     Yes      0        IBMSTOGROUP
0x8F240490 1     No       0        SG_HIGH
0x90C39640 2     No       0        SG_LOW

Storage Group Statistics:
Address    SGID  State       Numpaths  NumDropPen
0x8F241740 0     0x00000000  2         0
0x8F240490 1     0x00000000  2         0
0x90C39640 2     0x00000000  2         0

Storage Group Paths:
Address    SGID  PathID  PathState  PathName
0x8F241850 0     0       InUse      /dbauto/path1
0x8F241BF0 0     1       InUse      /dbauto/path2
0x94F6F210 1     1024    InUse      /dbauto/path1/sg_high
0x94F6F510 1     1025    InUse      /dbauto/path2/sg_high
0x90C39750 2     2048    InUse      /dbauto/path1/sg_low
0x90C39AF0 2     2049    InUse      /dbauto/path2/sg_low


Figure 2-29. Listing storage groups with the db2pd command CL4636.0

Notes:
The visual shows a sample report generated using the -storagegroups option of the db2pd
command.
The report shows the current default storage group and the paths assigned to each storage
group.


Instructor notes:


Purpose — To look at a sample db2pd command report showing the paths for the storage
groups for a database.
Details —
Additional information —
Transition statement — Next we will discuss changing the storage group for a
tablespace.


Changing the storage group for an Automatic Storage table space

• An ALTER TABLESPACE statement can be used to move a table space's data to a new storage group:
    ALTER TABLESPACE TbSpc USING STOGROUP sg_target
• When the ALTER TABLESPACE is committed, an implicit REBALANCE operation is used to move the extents to new containers and free the original containers:
  1. New containers are allocated on the target storage group's storage paths.
  2. All original containers are marked drop pending and new allocation requests are satisfied from the new containers.
  3. A reverse rebalance is performed, moving data off of the containers on the paths being dropped.
  4. The containers are physically dropped.


Figure 2-30. Changing the storage group for an Automatic Storage table space CL4636.0

Notes:
When the table space is moved to the new storage group, the containers in the old storage
group are marked as drop pending. After the ALTER TABLESPACE statement is
committed, containers are allocated on the new storage group’s storage paths, the existing
containers residing in the old storage groups are marked as drop pending, and an implicit
REBALANCE operation is initiated. This operation allocates containers on the new storage
path and rebalances the data from the existing containers into the new containers. The
number and size of the containers to create depend on both the number of storage paths in
the target storage group and on the amount of free space on the new storage paths. The
old containers are dropped, after all the data is moved.
Moving a table space to a new storage group includes the following steps:
1. New containers are allocated on the target storage group’s storage paths.
2. All original containers are marked drop pending and new allocation requests are satisfied
from the new containers.
3. A reverse rebalance is performed, moving data off of the containers on the paths being
dropped.
4. The containers are physically dropped.
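
A minimal sketch of the statement, using names that appear later in this unit's monitoring example:

    ALTER TABLESPACE CLPMTSP1 USING STOGROUP SG_HIGH

The statement itself returns quickly; the implicit rebalance runs in the background and can be tracked with the MON_GET_REBALANCE_STATUS table function shown at the end of this unit.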


Instructor notes:
Purpose — To discuss using ALTER TABLESPACE to change the storage group for a
table space.
Details —
Additional information —
Transition statement — Next we will discuss the ability to suspend a rebalance operation
and then resume the processing at a later time.


Tablespace rebalance can be suspended using ALTER TABLESPACE
• With DB2 10.1, you can explicitly suspend a table space
rebalance operation that is in progress during performance-
sensitive periods and resume at a later time
– To suspend the rebalance operation, issue the ALTER TABLESPACE
statement with the REBALANCE SUSPEND clause
db2 alter tablespace ts02 rebalance suspend
– The rebalance operation is placed in a suspended state
– The suspended state is persistent and the rebalance operation is
restarted upon database activation.
– To resume the operation, issue the ALTER TABLESPACE statement
with the REBALANCE RESUME clause
– You can monitor rebalance operations in progress using the
MON_GET_REBALANCE_STATUS table function.
• The implicit rebalance started when a table space is altered to
a new storage group can also be suspended

Figure 2-31. Tablespace rebalance can be suspended using ALTER TABLESPACE CL4636.0

Notes:
Starting with DB2 10.1, the ALTER TABLESPACE statement has a clause that allows you to
explicitly suspend a rebalance operation that is in progress during performance-sensitive
periods and resume at a later time.
To suspend the rebalance operation, issue the ALTER TABLESPACE statement with the
REBALANCE SUSPEND clause. This places the operation into suspended state.
To resume the operation, issue the ALTER TABLESPACE statement with the REBALANCE
RESUME clause.
The suspended state is persistent and the rebalance operation is restarted upon database
activation.
You can monitor rebalance operations in progress using the
MON_GET_REBALANCE_STATUS table function.
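A brief sketch, using the table space name from the slide:

-- pause the rebalance during the performance-sensitive period
ALTER TABLESPACE ts02 REBALANCE SUSPEND;
-- later, during an off-peak window, let it continue
ALTER TABLESPACE ts02 REBALANCE RESUME;
-- REBALANCER_STATUS shows whether the operation is active or suspended
SELECT rebalancer_status
FROM TABLE(MON_GET_REBALANCE_STATUS('TS02',-2)) AS t;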


Instructor notes:
Purpose — To discuss using the ALTER TABLESPACE command to suspend and resume
rebalance processing for a table space.
Details —
Additional information —
Transition statement — Next we will look at an example that uses SQL to monitor a table
space rebalance.


Monitoring extent movement when the storage group is altered for a table space
• The MON_GET_REBALANCE_STATUS function can be used to monitor
rebalance processing when a table space is moved to a new storage group

select varchar(tbsp_name,20) as tbsp_name,
       rebalancer_mode, rebalancer_status,
       rebalancer_extents_remaining,
       rebalancer_extents_processed,
       varchar(rebalancer_target_storage_group_name,20) as target_group
from table(mon_get_rebalance_status(NULL,-2)) as T1

TBSP_NAME            REBALANCER_MODE                REBALANCER_STATUS
-------------------- ------------------------------ -----------------
CLPMTSP1             FWD_REBAL_OF_2PASS             ACTIVE

REBALANCER_EXTENTS_REMAINING REBALANCER_EXTENTS_PROCESSED TARGET_GROUP
---------------------------- ---------------------------- --------------------
                         741                           41 SG_HIGH


Figure 2-32. Monitoring extent movement when the storage group is altered for a table space CL4636.0

Notes:
You can use the MON_GET_REBALANCE_STATUS table function to monitor the progress
of rebalance operations on a database.
This function returns data for a table space only if a rebalance operation is in progress.
Otherwise, no data is returned.
To monitor a table space rebalance operation:


Issue the MON_GET_REBALANCE_STATUS table function with the tbsp_name and
dbpartitionnum parameters:
select
  varchar(tbsp_name, 30) as tbsp_name,
  dbpartitionnum,
  member,
  rebalancer_mode,
  rebalancer_status,
  rebalancer_extents_remaining,
  rebalancer_extents_processed,
  rebalancer_start_time
from table(mon_get_rebalance_status(NULL,-2)) as tResults
This visual shows an example of the output for monitoring the progress of a table space
rebalance operation with an SQL query.


Instructor notes:


Purpose — To show an example of a query that uses the MON_GET_REBALANCE_STATUS
table function to monitor progress for a rebalance operation.
Details —
Additional information —
Transition statement — Next we will discuss how AUTORESIZE works for automatic
storage tablespaces.


Table space growth with Automatic Storage

The diagram shows six stages for a single table space (Note: for simplicity, we're just
showing one table space within the database):
1. Two storage paths, and the table space has a container on each (C0, C1).
2. A third storage path is added; the third storage path is not used by the table space yet.
3. The table space grows until C0 can't grow; to continue growing, the table space must
   add a new stripe set.
4. A new stripe set (C2, C3) is added automatically; only now is the recently added
   storage path utilized.
5. The table space grows until C2 can't grow; to continue growing, the table space must
   again add a new stripe set.
6. A new stripe set (C4) is added automatically; C4 will grow as the table space grows
   from here on.

Figure 2-33. Table space growth with Automatic Storage CL4636.0

Notes:
This slide is intended to show how AUTORESIZE occurs in Automatic Storage-managed
table spaces. Like the AUTORESIZE behavior described earlier in this presentation,
containers in the last range of the table space map are extended when an AUTORESIZE
operation is needed. However, once one of those containers cannot be grown any further
(usually due to a file system full error, but possibly also the result of a file system limitation,
such as a maximum container size, or a low ulimit setting), the Storage Manager
component within DB2 is queried for a new list of containers to create. At this
point, all of the storage paths are taken into consideration, including recently added ones.
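To observe the container and stripe set layout as a table space grows, a query along these lines can be used (the table space name is hypothetical):

SELECT VARCHAR(container_name,40) AS container,
       stripe_set, container_id
FROM TABLE(MON_GET_CONTAINERS('TSPACE1',-2)) AS t
ORDER BY stripe_set, container_id;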


Instructor notes:


Purpose — This shows how table space growth occurs with Automatic Storage table
spaces. The diagram shows the third file system staggered compared to the other two. This
is just to show the layout of containers compared to each other. You can look at data
placement as a waterfall or trickle from one set of containers to the next.
Details —
Additional information —
Transition statement — Next we will see how the ALTER TABLESPACE REBALANCE
function can be used to better utilize new storage paths.


Automatic Storage Rebalance to use newly added storage paths

ALTER TABLESPACE myts REBALANCE

The diagram shows four stages, with the high water mark marked in each:
1. Two storage paths; one table space has a container on each path (C0, C1).
2. ALTER STOGROUP … ADD 'p3','p4': the new storage paths (p3, p4) are not used by
   the table space.
3. ALTER TABLESPACE … REBALANCE: the REBALANCE causes DB2 to create
   equal-sized containers (C2, C3) in the new paths immediately and redistribute extents
   to them.
4. ALTER TABLESPACE … REDUCE (optional): if the table space is not growing rapidly,
   consider REDUCing it to make space available for other table spaces.

• The REBALANCE option can be used to cause DB2 to utilize new storage paths for a
  specific table space to increase I/O parallelism

Figure 2-34. Automatic Storage Rebalance to use newly added storage paths CL4636.0

Notes:
The visual shows how a database administrator could add new Automatic Storage paths
and direct DB2 to use those paths, as in the following sequence of events.
1. An Automatic Storage table space, TS1, is created in a STOGROUP with two balanced
storage paths (p1 and p2). The table space is assigned two equal-sized containers (C0
and C1), one on each path.
2. Planning for growth, a database administrator might add two more Automatic Storage
paths to the STOGROUP (p3 and p4). The containers for the TS1 table space would
not be automatically affected by the new storage paths.
3. If the DBA decides that the table space TS1 would benefit from using the new storage
paths, the ALTER TABLESPACE TS1 REBALANCE statement would cause DB2 to
allocate two new containers (C2 and C3) on the new paths. The size of the new
containers would be equal to the size of the original containers. DB2 would
automatically rebalance the extents so that the data would be evenly spread over all
four storage paths. Having all four containers of an equal size would allow the table
space to continue to grow in a balanced manner.


4. If the table space TS1 is not expected to need the newly added space soon, the ALTER
TABLESPACE REDUCE function can be used to release either a portion of the unused
space or all of it.
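Pulling the sequence together, a sketch of the statements (the storage group, path, and table space names are hypothetical):

ALTER STOGROUP sg_app ADD '/dbauto/path3', '/dbauto/path4';
ALTER TABLESPACE ts1 REBALANCE;
-- optionally return unused space to the storage paths
ALTER TABLESPACE ts1 REDUCE MAX;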


Instructor notes:
Purpose — To show how the ALTER TABLESPACE REBALANCE function can be used to
give a DBA greater control of the use of Automatic Storage paths for selected table spaces.
Details —
Additional information —
Transition statement — Next we will look at dropping automatic storage paths.



Dropping storage paths: Example

The diagram shows the statement sequence and resulting states:
1. Three storage paths (p1, p2, p3); table spaces TS1 and TS2 each have a container
   (C1, C2, C3) on every path.
2. ALTER STOGROUP … DROP 'p2': path p2 is placed in the Drop Pending state.
3. ALTER TABLESPACE TS1 REBALANCE: TS1's data is moved off p2, leaving
   containers C1 and C3 on paths p1 and p3.
4. ALTER TABLESPACE TS2 REBALANCE: TS2's data is moved off p2 as well; once no
   table space is using p2, the path is dropped.


Figure 2-35. Dropping storage paths: Example CL4636.0

Notes:
The DROP option for the ALTER STOGROUP command specifies that one or more
storage paths are to be removed from the collection of storage paths defined for a storage
group. If table spaces are actively using a storage path being dropped, then the state of the
storage path is changed from "In Use" to "Drop Pending" and future use of the storage path
will be prevented.
Before the operation of dropping a storage path can be completed, any table space
containers on that path must be removed. If an entire table space is no longer needed, you
can drop it before dropping the storage path from the database. In this situation, no
rebalance is required. If, however, you want to keep the table space, a REBALANCE
operation is required. In this case, when there are storage paths in the "drop pending"
state, the database manager performs a reverse rebalance, where movement of extents
starts from the high water mark extent (the last possible extent containing data in the table
space), and ends with extent 0.
When the REBALANCE operation is run, the following takes place:


- A reverse rebalance is performed. Data in any containers in the "drop pending" state
is moved into the remaining containers.
- The containers in the drop pending state are dropped.
- If this is the last table space using the storage path, then the storage path is dropped
as well.
- If the containers on the remaining storage paths are not large enough to hold all of
the data being moved, the database manager might have to first create or extend
containers on the remaining storage paths before performing the rebalance.
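A sketch of the statement sequence from the example (the storage group and path names are hypothetical):

ALTER STOGROUP sg_app DROP '/dbauto/path2';
-- each table space still using the path must be rebalanced
ALTER TABLESPACE ts1 REBALANCE;
ALTER TABLESPACE ts2 REBALANCE;
-- the storage path is removed once the last table space stops using it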


Instructor notes:


Purpose — To discuss the statements that could be used to remove Automatic Storage
paths from a storage group.
Details —
Additional information —
Transition statement — Next we will see how these new storage path and tablespace
states created by dropping an automatic storage path can be listed.


Automatic Storage for Temporary table spaces


• Table spaces are created using SMS as the underlying table
space type
– The auto-resize options have no meaning and cannot be specified
• Remember that SMS is already an auto-extend type of infrastructure
(where objects grow by a page or extent at a time)

• Differences between automatic storage and non-automatic storage temporary table
  spaces:

  Non-automatic storage:
  – Containers must be explicitly provided when the table space is created.
  – Containers cannot be added after the table space has been created.
  – A redirected restore operation can be used to redefine the containers associated
    with the table space.

  Automatic Storage:
  – Containers cannot be provided when the table space is created; they will be
    assigned and allocated automatically by DB2.
  – DB2 will redefine the containers across the storage paths at database startup.
  – A redirected restore operation cannot be used to redefine the containers associated
    with the table space, because DB2 is in control of space management.


Figure 2-36. Automatic Storage for Temporary table spaces CL4636.0

Notes:
For System and User Temporary table spaces that are managed by Automatic Storage, the
underlying structures are SMS table spaces.
The table shows the differences between Automatic Storage and traditional SMS
management for these temporary table spaces.
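For illustration (the table space name is hypothetical), no container list is given when creating a temporary table space under Automatic Storage:

CREATE SYSTEM TEMPORARY TABLESPACE tempspace2
    MANAGED BY AUTOMATIC STORAGE;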


Instructor notes:


Purpose — This shows the characteristics of temporary table spaces that are managed by
Automatic Storage.
Details —
Additional information —
Transition statement — Next we will discuss converting a DMS table space to use
Automatic Storage management.


Converting a DMS table space to use Automatic Storage
1. ALTER TABLESPACE can be used to convert a DMS table
space to an Automatic Storage table space without an
outage:
– ALTER TABLESPACE … MANAGED BY AUTOMATIC STORAGE
– All new growth comes from the Automatic Storage paths
– Old (file or raw) containers can be removed using ALTER
TABLESPACE REBALANCE

2. A REDIRECTED RESTORE can be used to convert DMS


table spaces to AUTOMATIC STORAGE table spaces
– SET TABLESPACE CONTAINERS … USING AUTOMATIC
STORAGE


Figure 2-37. Converting a DMS table space to use Automatic Storage CL4636.0

Notes:
Beginning with DB2 9.7, you can convert some or all of your database-managed space
(DMS) table spaces in a database to use Automatic Storage. Using Automatic Storage
simplifies your storage management tasks.
To convert a DMS table space to use Automatic Storage, use one of the following methods:
• Alter a single table space. This method keeps the table space online but involves a
rebalance operation that takes time to move data from the non-Automatic Storage
containers to the new Automatic Storage containers.
- Issue the ALTER TABLESPACE statement, specifying the MANAGED BY
AUTOMATIC STORAGE clause for the table space that you want to convert. New
containers are added, but no extent movement is performed by this operation.
- Issue the ALTER TABLESPACE statement again, this time specifying the
REBALANCE option. This option removes the original user-defined containers so
that all table space containers are managed by Automatic Storage. If you do not
specify the REBALANCE option now and issue the ALTER TABLESPACE statement

2-94 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.1
Instructor Guide

Uempty later with the REDUCE option, your Automatic Storage containers will be removed.
To recover from this problem, issue the ALTER TABLESPACE statement, specifying
the REBALANCE option.
• Use a redirected restore operation. If you are converting a single table space with this
method, you cannot access the table space while the operation is in progress. If you are
converting multiple table spaces, you cannot access the entire database while the
operation is in progress.
- Run the RESTORE DATABASE command, specifying the REDIRECT parameter. If
you want to convert a single table space, also specify the TABLESPACE parameter: 
RESTORE DATABASE database_name TABLESPACE table_space_name
REDIRECT
- Run the SET TABLESPACE CONTAINERS command, specifying the USING
AUTOMATIC STORAGE parameter, for each table space that you want to convert: 
SET TABLESPACE CONTAINERS FOR tablespace_id USING AUTOMATIC
STORAGE
- Run the RESTORE DATABASE command again, this time specifying the
CONTINUE parameter: 
RESTORE DATABASE database_name CONTINUE
- Run the ROLLFORWARD DATABASE command, specifying the TO END OF LOGS
AND STOP parameters:
ROLLFORWARD DATABASE database_name TO END OF LOGS AND STOP
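As a consolidated sketch mirroring the steps above (the database name, table space name, and table space ID are hypothetical):

RESTORE DATABASE sample TABLESPACE (dms_ts1) REDIRECT
SET TABLESPACE CONTAINERS FOR 4 USING AUTOMATIC STORAGE
RESTORE DATABASE sample CONTINUE
ROLLFORWARD DATABASE sample TO END OF LOGS AND STOP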


Instructor notes:
Purpose — To discuss two methods to convert DMS-managed table spaces to use
Automatic Storage. A database level RESTORE could be used to convert all DMS table
spaces to use Automatic Storage. The SET TABLESPACE CONTAINERS would be
needed for each table space.
Details —
Additional information —
Transition statement — Next we will show some sample DB2 commands that can be
used to convert a DMS tablespace to use automatic storage.


Example converting a DMS managed tablespace to use Automatic Storage
• Use ALTER tablespace to convert management type
– A storage group could be specified
– DB2 will assign containers with sufficient space but will defer data movement
– Original containers are still allocated and in use

alter tablespace dmsmtspd using stogroup sg1
    managed by automatic storage

• Use ALTER tablespace rebalance to complete movement


– REBALANCE moves the extents to the AS based containers
– Original containers removed when rebalance completes

alter tablespace dmsmtspd rebalance

• Use ALTER tablespace REDUCE as needed to release unused space


Figure 2-38. Example converting a DMS managed tablespace to use Automatic storage CL4636.0

Notes:
The ALTER TABLESPACE statement with the MANAGED BY AUTOMATIC STORAGE
option will convert the DMS table space into an Automatic Storage-managed table space,
by assigning new containers from the Automatic Storage paths.
The example ALTER command includes the USING STOGROUP option to select a specific
storage group.
When the ALTER completes, the originally assigned containers for the DMS table space
are not affected and no rebalancing of extents is needed.
The ALTER TABLESPACE command with the REBALANCE option triggers DB2 to move
the extents from the original containers in order to free the containers to be removed.
You may also decide to use the ALTER TABLESPACE REDUCE command to release
unused space from the tablespace and make it available for other tablespaces using the
same storage paths.


Instructor notes:
Purpose — To discuss using ALTER TABLESPACE commands to convert a DMS
tablespace to an automatic storage tablespace.
Details —
Additional information — You cannot ALTER an automatic storage managed tablespace
back to a DMS managed tablespace.
Transition statement — Next we will begin discussing the definition of data tags for a
table space.


Using WLM to prioritize activities based on the data
accessed
• The priority for an activity can be based on the data accessed
– Uses a combination of a data tag, which is a numeric identifier applied to a table
space or storage group, and WLM controls
• A data tag can be assigned directly to a table space, for any type of table space
management (DMS,AS)
• The data tag for a storage group for the table space can be inherited by objects in the
table space (AS only)

• Predictive prioritization, prior to execution


– The DB2 optimizer produces a list of data tags for an activity for all table spaces
that may be accessed during execution
– Work class sets can identify activities that have a particular data tag
– A work action can map activities matching a work class set to a specific service
class before they begin to execute

• Reactive prioritization, during execution


– Use the new DATATAGINSC threshold to map an activity to a different service
class at run time if data accessed is assigned a particular data tag.
– Useful when a data object may or may not be accessed, like access to a range
partitioned table based on parameter markers


Figure 2-39. Using WLM to prioritize activities based on the data accessed CL4636.0

Notes:
DB2 WLM can prioritize activities based on the accessed data
Using DB2 WLM, you can now prioritize an activity based on the data that the activity
accesses, either before the activity executes (predictively) or while the activity is
executing (reactively).
To prioritize an activity, you use a combination of a data tag, which is a numeric
identifier applied to a table space or storage group, and WLM controls. For example, if
you have a table space IMPORTANT_TS containing critical data that has a data tag
assigned to it, you could map any query that reads data from a table in this table space
to a service class that is allocated a higher percentage of overall CPU cycles on the
system.
You can assign a data tag directly to a table space or assign the data tag to the storage
group for the table space and have the table space inherit the data tag from the storage
group. Storage groups are groups of storage paths with similar characteristics. Using a
multi-temperature data storage approach, you can create storage groups that map to
different classes of storage in your system. You can assign automatic storage table
spaces to these storage groups, based on which table spaces have hot, warm, or cold
data. Frequently accessed (hot) data is stored on fast storage, infrequently accessed
(warm) data is stored on slower storage, and rarely accessed (cold) data is stored on
slow, less-expensive storage. As hot data cools down and is accessed less frequently,
you can move it to slower storage. You can dynamically reassign a table space to a
different storage group by using the ALTER TABLESPACE statement, specifying the
USING STOGROUP option.
To support data tags, the following DB2 commands and SQL statements have
been added or modified:
- The output of the -tablespace parameter for the db2pd command now includes
information about data tags.
- The output of the -workclasses parameter for the db2pd command now lists the
work class attributes below the basic work class information.
- The ALTER TABLESPACE statement has the new DATA TAG clause.
- The ALTER THRESHOLD statement has the new DATATAGINSC clause.
- The ALTER WORK CLASS SET statement has the new DATA TAG LIST
CONTAINS clause.
- The CREATE TABLESPACE statement has the new DATA TAG clause.
- The CREATE THRESHOLD statement has the new DATATAGINSC clause.
- The CREATE WORK CLASS SET statement has the new DATA TAG LIST
CONTAINS clause.
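For instance (the object names are hypothetical), a data tag can be set directly on a table space, or set on a storage group and inherited:

ALTER TABLESPACE important_ts DATA TAG 1;
ALTER STOGROUP sg_hot DATA TAG 1;
-- an automatic storage table space can inherit the tag of its storage group
CREATE TABLESPACE sales_hot USING STOGROUP sg_hot DATA TAG INHERIT;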


Instructor notes:


Purpose — To discuss the definition of a data tag for a table space.
Details —
Additional information —
Transition statement — Next we will discuss using a data tag to predictively set priority.


Data centered Workload Management - Predictive

• Work Action set maps statement to service class


• SQL compiler predicts what data (and table spaces) will be
touched by the SQL statement and builds the list of data tags
• List of data tags will define the initial service class placement
The diagram shows a workload whose activities pass through a work action set into the
service superclass MAINSC: based on the estimated data tags, activities touching a table
space with data tag 1 map to subclass SCHIGH, data tag 4 to subclass SCMED, and data
tag 9 to subclass SCLOW; other activities go to the default subclass.


Figure 2-40. Data centered Workload Management - Predictive CL4636.0

Notes:
Predictive prioritization using work class and work action sets uses an estimated data tag
list that is obtained for an activity at compile time, similar to cost and cardinality estimates.
The estimated data tag list contains the data tags for all table spaces that the compiler
believes will be accessed during execution of the activity.
You can define work class sets to identify activities that have a particular data tag in their
estimated data tag lists. You can then define a work action to map any activities matching a
work class set to a specific service class before they begin to execute.


Instructor notes:


Purpose — To discuss the predictive setting of processing priority based on the data
tags for a table space.
Details —
Additional information —
Transition statement — Next we will review some sample workload management
definitions used to set processing priority using data tags.


Example: Determining the priority of activities based on what data is estimated to be accessed
• Set table space or storage group data tags
– For example
• High priority data with a data tag value of 1
• Medium priority data with a data tag value 4
• Low priority data with a data tag value of 9
ALTER TABLESPACE SALESCURRENT DATA TAG 1

• Create a work class set containing work classes that isolate the DML activities
based on data tags
CREATE WORK CLASS SET WCS (
WORK CLASS WC_HIGH WORK TYPE DML DATA TAG LIST CONTAINS 1,
WORK CLASS WC_MED WORK TYPE DML DATA TAG LIST CONTAINS 4,
WORK CLASS WC_LOW WORK TYPE DML DATA TAG LIST CONTAINS 9)

• Create a work action set containing work actions that will map the activities assigned
to different service sub classes
CREATE WORK ACTION SET MAINWAS FOR SERVICE CLASS MAINSC
USING WORK CLASS SET WCS
( WORK ACTION MAP_HIGH ON WORK CLASS WC_HIGH MAP ACTIVITY TO SCHIGH,
WORK ACTION MAP_MED ON WORK CLASS WC_MED MAP ACTIVITY TO SCMED,
WORK ACTION MAP_LOW ON WORK CLASS WC_LOW MAP ACTIVITY TO SCLOW)


Figure 2-41. Example: Determining the priority of activities based on what data is estimated to be accessed CL4636.0

Notes:
The visual shows examples of workload management objects used to set processing
priority using data tags for table spaces.
The work class set named WCS defines three work classes, each based on a single data
tag value.
The work action set named MAINWAS uses the work classes defined by the work class set
WCS to map activities to different workload management service subclasses.
For example, if a SQL statement accesses a table in a table space with a data tag value of
1, the activity would match the work class WC_HIGH, which would be mapped to the
service subclass SCHIGH.


Instructor notes:


Purpose — To show an example of a WORK CLASS SET and WORK ACTION SET that
could use data tags to direct an activity to select a service class prior to execution.
Details —
Additional information —
Transition statement — Next we will look at using data tags to reactively set processing
priority.


Data centered Workload Management - Reactive

• Allows changing priority of workload at runtime based on data accessed
– New data tag threshold DATATAGINSC with remap action
– At runtime, based on tag of table space accessed, place activity in a different service class

The diagram shows a workload whose activities pass through a work action set into the
service superclass WLMDATATIERS (subclasses WLM_HIGH, WLM_MED, and
WLM_LOW); at run time, based on the data tag (1, 5, or 9) of the table space actually
accessed, an activity can be remapped to a different subclass.


Figure 2-42. Data centered Workload Management - Reactive CL4636.0

Notes:
Reactive prioritization using the new DATATAGINSC threshold maps an activity to a
different service class at run time when the activity accesses data that is assigned a
particular data tag. For example, you can specify that an activity will be mapped to a
different service class when it reads data from a table space with data tag value of 9.
Reactive prioritization is useful if the compiler cannot accurately estimate the list of data
tags for the activity. An example of such a case is a query against a range-partitioned table
that uses parameter markers. The compiler cannot necessarily determine what table
ranges are accessed in advance.
Data tag thresholds are evaluated when a scan is first opened on a table and when an
insert is performed into a table. Any new data tag thresholds picked up by the activity after
a scan has been opened as a result of a remap operation are not applied to that scan.
By using the DATATAGINSC threshold, you can map an activity to a different service class
at run time when the activity accesses data that is assigned a particular data tag. For
example, create a DATATAGINSC threshold on the high priority service subclass
WLM_HIGH, and if any activity touches any data with a data tag of anything other than 1,
map it to the medium priority service subclass.
Each work class within a work class set has an evaluation order. The evaluation order is
used to determine the order in which work classes are checked against an activity; the first
work class with attributes that match those of the activity is the work class that will be used.
Using the example above, if an activity might touch data with the tag values of 1, 5 and 9,
the activity would get assigned to WC_HIGH because WC_HIGH is first in the evaluation
order. If the user wanted that activity to instead get mapped to WC_LOW, they would have
to put WC_LOW ahead of WC_MED and WC_HIGH in the evaluation order.


Instructor notes:
Purpose — To discuss using a data tag to reactively change processing service class for
an activity.
Details —
Additional information —
Transition statement — Next we will look at some sample WLM objects that use data tags
to reactively set service classes.


Example: Changing the priority of activities based on
what data is accessed during execution – Part 1
• Set table space or storage group data tags
• Define service superclass and sub classes for high, medium and low priority activities
CREATE SERVICE CLASS WLMDATATIERS
CREATE SERVICE CLASS WLM_HIGH UNDER WLMDATATIERS SOFT CPU SHARES 5000
CREATE SERVICE CLASS WLM_MEDIUM UNDER WLMDATATIERS HARD CPU SHARES 3000
CREATE SERVICE CLASS WLM_LOW UNDER WLMDATATIERS HARD CPU SHARES 2000

• Create a work class set containing work classes that isolate out activities based on the estimated data tag
CREATE WORK CLASS SET WLM_DATATAGS_WCS
(WORK CLASS WLM_DML_HIGH1_WC WORK TYPE DML DATA TAG LIST CONTAINS 1,
WORK CLASS WLM_DML_MEDIUM_WC WORK TYPE DML DATA TAG LIST CONTAINS 5,
WORK CLASS WLM_DML_LOW_WC WORK TYPE DML DATA TAG LIST CONTAINS 9)

• Create a work action set containing work actions that map activities to the appropriate service subclasses based
on the list of estimated data tags
CREATE WORK ACTION SET WLM_DATATAGS_WAS FOR SERVICE CLASS WLMDATATIERS
USING WORK CLASS SET WLM_DATATAGS_WCS
(WORK ACTION WLM_MAP_HIGH_WA ON WORK CLASS WLM_DML_HIGH1_WC
MAP ACTIVITY TO WLM_HIGH,
WORK ACTION WLM_MAP_MEDIUM_WA ON WORK CLASS WLM_DML_MEDIUM_WC
MAP ACTIVITY TO WLM_MEDIUM,
WORK ACTION WLM_MAP_LOW_WA ON WORK CLASS WLM_DML_LOW_WC
MAP ACTIVITY TO WLM_LOW
)


Figure 2-43. Example: Changing the priority of activities based on what data is accessed during execution - Part 1 CL4636.0

Notes:
The visual shows several workload management object definitions.
The service class WLMDATATIERS contains three service sub classes, WLM_HIGH,
WLM_MEDIUM and WLM_LOW.
The work class set WLM_DATATAGS_WCS defines three work classes, each with a single
data tag.
The work action set defines three work actions that direct activities to the three service
subclasses predictively.
Next we will need to define thresholds to handle the reactive setting of processing priority.


Instructor notes:
Purpose — To show the sample WLM objects that provide the basis for doing reactive
setting of processing priority.
Details —
Additional information —
Transition statement — Next we will see the sample WLM thresholds that change service
class usage during execution of activities.


Example: Changing the priority of activities based on
what data is accessed during execution – Part 2
• Create a threshold to remap activities from WLM_HIGH to
WLM_MEDIUM if they touch data considered medium priority
CREATE THRESHOLD REMAP_HIGH_TO_MEDIUM
FOR SERVICE CLASS WLM_HIGH UNDER WLMDATATIERS ACTIVITIES
ENFORCEMENT DATABASE PARTITION
WHEN DATATAGINSC NOT IN (1)
REMAP ACTIVITY TO WLM_MEDIUM

• Create a threshold to remap activities from WLM_MEDIUM to WLM_LOW
  if they touch data considered low priority

CREATE THRESHOLD REMAP_MEDIUM_TO_LOW
FOR SERVICE CLASS WLM_MEDIUM UNDER WLMDATATIERS ACTIVITIES
ENFORCEMENT DATABASE PARTITION
WHEN DATATAGINSC IN (9)
REMAP ACTIVITY TO WLM_LOW


Figure 2-44. Example: Changing the priority of activities based on what data is accessed during execution - Part 2 CL4636.0

Notes:
The sample threshold named REMAP_HIGH_TO_MEDIUM causes the workload
management function to move an activity from the WLM_HIGH service subclass to
WLM_MEDIUM if a data object is accessed with a data tag value other than 1.
The threshold REMAP_MEDIUM_TO_LOW causes the workload manager to remap an
activity in the WLM_MEDIUM service class to WLM_LOW if a data object in a table space
with a data tag value of 9 is accessed.


Instructor notes:
Purpose — To review two sample WLM threshold objects that could be used to reactively
adjust service class assignment using table space data tags.
Details —
Additional information —
Transition statement — Next we will summarize the topics for this lecture unit.



Unit summary
Having completed this unit, you should be able to:
• Describe the benefits and limitations of using DMS and Automatic Storage management for
table spaces
• Monitor tablespace space allocations using SQL with MON_GET_TABLESPACE and
MON_GET_CONTAINERS functions
• ALTER a table space to handle High Water Mark related issues and reclaim unused space
from DMS and Automatic Storage table spaces
• Use the REBALANCE option to control container allocations for Automatic Storage table
spaces when changing storage paths
• Monitor the processing done by the Rebalancer using LIST UTILITIES and db2pd
commands
• Plan and implement changes to disk space allocations using ALTER TABLESPACE
options: ADD, EXTEND, RESIZE, DROP, and BEGIN NEW STRIPE SET
• Convert a DMS-managed table space to use Automatic Storage
• Move a Tablespace from one storage group to another
• Utilize Data tags to alter WLM service subclass for processing activities based on data
accessed


Figure 2-45. Unit summary CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —



Student exercise


Figure 2-46. Student Exercise 2 CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —


Unit 3. DB2 10.5 BLU Acceleration Concepts

Estimated time
01:30

What this unit is about


This unit describes the support for DB2 BLU Acceleration,
column-organized tables, in DB2 10.5. We will explain the ‘seven big
ideas’ that work together inside DB2 BLU Acceleration to provide
dramatic performance improvements for DB2 analytics query
workloads. We will discuss how to implement database support for
DB2 BLU Acceleration. We will explain how the column-dictionaries
are built by the LOAD utility to achieve extreme compression for
column-organized tables.

What you should be able to do


After completing this unit, you should be able to:
• List the seven ‘big ideas’ that work together to provide DB2 BLU
Acceleration
• Describe how the column dictionaries used to provide extreme
compression of column-organized tables are built and utilized
• Explain the impact of setting the DB2 registry variable
DB2_WORKLOAD to ANALYTICS
• Describe the different storage objects used for column-organized
tables compared to row-organized tables, including the special page
map index.
• Explain how DB2 uses a synopsis table to support data skipping
with DB2 BLU Acceleration, column-organized tables

References
The Information Center for DB2 10.5:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/index.jsp
Upgrading to DB2 Version 10.5 SC27-5513-00
What's New for DB2 Version 10.5 SC27-5519-00




Unit objectives
After completing this unit, you should be able to:
• Describe how the column dictionaries used to provide extreme
compression of column-organized tables are built and utilized
• Explain the impact of setting the DB2 registry variable
DB2_WORKLOAD to ANALYTICS
• List the seven ‘big ideas’ that work together to provide DB2
BLU acceleration
• Describe the different storage used for column-organized tables
compared to row-organized tables
• Explain how DB2 uses a synopsis table to support data
skipping with column-organized tables


Figure 3-1. Unit objectives CL4636.0

Notes:
These are the objectives for this lecture unit.


Instructor notes:
Purpose — To introduce the content of this unit.
Details —
Additional information —
Transition statement — Let’s first see an overview of the functionality in DB2 10.5 which
we call BLU Acceleration.



What is DB2 with BLU Acceleration?


• Large order of magnitude benefits
– Performance
– Storage savings
– Time to value

• New technology in DB2 for analytic queries


– CPU-optimized unique runtime handling
– Unique encoding for speed and compression
– Unique memory management
– Columnar storage, vector processing
– Built directly into the DB2 kernel

• Revolution or evolution
– BLU tables coexist with traditional row tables
- in same schema, storage, and memory
– Query any combination of row or BLU tables
– Easy conversion of tables to BLU tables
• Change everything, or change incrementally


Figure 3-2. What is DB2 with BLU Acceleration? CL4636.0

Notes:
This slide describes at a high level what DB2 with BLU Acceleration is. What is the key
business value of implementing BLU Acceleration?
This is a new technology that has been developed by IBM and integrated directly into the
DB2 engine. BLU Acceleration is a new storage engine along with integrated runtime
(directly into the core DB2 engine) to support the storage and analysis of column-organized
tables. The BLU Acceleration processing is parallel to the regular, row-based table
processing found in the DB2 engine. This is not a bolt-on technology nor is it a separate
analytic engine that sits outside of DB2. Much like when IBM added XML data as a first
class object within the database along with all the storage and processing enhancements
that came with XML, now IBM has added column-organized tables directly into the storage
and processing engine of DB2.
Simply put, this is a column-organized table store in DB2. Along with this store are many
benefits including significantly improved performance, massive storage savings and ease
of implementation and ease of management.


This feature allows us to deliver on these performance and storage innovations while also
optimizing the use of main-memory, improving I/O efficiency and exploiting CPU
instructions and characteristics to enhance the value derived from your database
investments.


Instructor notes:


Purpose — To introduce DB2 with BLU Acceleration, as a feature that can provide
dramatic benefits in data storage and query performance while sharing a database with
standard row-organized tables.
Details —
Additional information —
Transition statement — Next we will review the traditional methods used to achieve peak
performance results for database applications.


Traditional methods to improve performance for analytic queries
• Decide on partition strategies
– Database Partitioning
– Range Partitioning
– Clustering (Multiple Dimensional Clustering)
• Select Compression Strategy
• Create Table
• Load data
• Create Auxiliary Performance Structures
– Materialized views (MQT)
– Create indexes
– Tune memory
• Tune I/O
• Add Optimizer hints
• Statistics collection


Figure 3-3. Traditional methods to improve performance for analytic queries CL4636.0

Notes:
The visual lists some of the steps that would traditionally be used to optimize performance
for relational database applications.
• Decide on partition strategies - Before a table is created in a database, the data
partitioning strategies that would best fit the application would be determined, including:
- Database partitioning - You might decide to use a partitioned database cluster to
provide the benefits of inter-partition parallel processing.
- Range partitioning - You might decide to implement a range partitioned table which
can use partition elimination to improve performance.
- Clustering - You may also consider clustering data or using multi-dimensional
clustering, depending on the type of access used for tables.
• Select the compression strategy - With DB2 10.1, you could choose the static classic
compression or add adaptive compression for tables to reduce system I/O and memory
resources for access to large tables.
• Create the table - Next the table would be defined.

• Load the data - Once the table is defined, the table data can be loaded using the DB2
LOAD utility, another DB2 utility like INGEST or using application processing.
• Create Auxiliary Performance Structures - In order to achieve good performance for the
application additional supporting database objects might be created, like indexes or
materialized query tables (MQT). You may also adjust buffer pool memory allocations to
improve access performance for a table. These may require additional tuning over time
as table size changes and new application queries are developed.
• Tuning I/O - In some cases you may decide to tune the I/O access resources for a table.
For example, you might move a table to a new high performance device, like solid state
storage or implement multi-temperature storage for a range partitioned table.
• Add Optimizer hints - For a query that does not perform well using the standard access
plan generated by the SQL compiler, you might provide optimizer hints, in DB2 LUW
this is done using optimization profiles.
• Statistics Collection - You might find that the default table statistics are not sufficient to
generate efficient access plans for some queries, so special detailed table and index
statistics might be collected. With DB2 LUW you can define statistical views and collect
statistics on those to supplement the table statistics.
We will see that with column-organized tables, we can reduce the task list to two steps:
create the table, and then load the data.


Instructor notes:
Purpose — To briefly review the many steps that may be used to achieve good
performance for applications accessing database tables. One of the key objectives of the
support for column-organized tables in DB2 is to eliminate the need for many of these time
consuming steps. For example, column-organized table do not use any user defined
indexes, so you would not spend any time creating or maintaining efficient indexes.
We will see that the LOAD utility will automatically collect table statistics, so with
column-organized tables, the concept is to just create the table, load the data. The other
steps are not required.
Details —
Additional information —
Transition statement — Next we will introduce the seven big ideas that work together to
provide the dramatic performance benefits of DB2 with BLU acceleration.



The Seven Big Ideas of DB2 with BLU Acceleration

The diagram arranges the seven big ideas in a circle around the central theme
"Data Mart Analytics: Super Fast, Super Easy":
• Column Store
• Extreme Compression
• Deep HW Instruction Exploitation (SIMD)
• Core-Friendly Parallelism
• Optimal Memory Caching
• Data Skipping
• Simple to Implement and Use

Figure 3-4. The Seven Big Ideas of DB2 with BLU Acceleration CL4636.0

Notes:
BLU Acceleration is based on seven innovations that have been added to DB2 which we
call “Big Ideas”. Each of these big ideas is a technology capability that provides business
value and en-mass make up what we refer to as BLU Acceleration.
Lower Operation Costs
• Column Store – by storing table data in column-organized format we not only save
significantly on storage costs but we also improve I/O and memory efficiency. This
lowers operating costs.
• Simple to Implement and Use – with BLU Acceleration you just create the
column-organized table and then load and go. The additional steps like memory tuning,
selection and creation of indexes, etc. are eliminated. This lowers administration and
development costs significantly.
• Extreme Compression – by using compression and sophisticated encoding algorithms,
DB2 can save significantly on storage costs including power, cooling, and management
of that storage.


Hardware Optimized
• Extreme Compression – in addition to the lower costs, the compression algorithms used
exploit processor characteristics to improve performance. The compression we use
works with a register friendly encoding technique to improve processor efficiency
• Deep Hardware Instruction Exploitation – we will discuss this in more detail later in this
lecture, but with SIMD processing we are multiplying the performance of the processor
by having instructions work on multiple data elements simultaneously.
• Core Friendly Parallelism – Access plans on column-organized tables will leverage all
of the cores on the server simultaneously to deliver better analytic query performance
• Optimal Memory Caching – With row-organized tables, a full table scan ends up putting
data into the bufferpool that is often not required. For column-organized tables, if there
are columns that are involved in joins or other predicates in many queries then we can
pack the bufferpool full of those columns while keeping other columns out of memory if
they are not regularly used. This improves performance and optimizes the memory
available
Extreme Performance
• Optimal Memory Caching – as stated above, this not only helps to optimize hardware
but also improves overall workload performance
• Data Skipping – by keeping track of which pages of data contain which column values,
we can reduce the I/O costs for query processing by simply skipping data we already
know would not qualify for the query.
• Column Store – in addition to lowering costs, by selecting only columns that are part of
a query we can increase performance of queries by an order of magnitude in some
cases.


Instructor notes:


Purpose — Briefly discuss the seven big ideas that we will explain in more detail in this
lecture.
Details —
Additional information —
Transition statement — Next we will begin to describe the big ideas that went into DB2
BLU acceleration for DB2 10.5.


Application view of table data

• Input data is in row format:


John Piconne 47 18 Main Street Springfield MA 01111

Susan Nakagawa 32 455 N. 1st St. San Jose CA 95113

Sam Gerstner 55 911 Elm St. Toledo OH 43601

Chou Zhang 22 300 Grand Ave Los Angeles CA 90047

Mike Hernandez 43 404 Escuela St. Los Angeles CA 90033

Pamela Funk 29 166 Elk Road #47 Beaverton OR 97075

Rick Washington 78 5661 Bloom St. Raleigh NC 27605

Ernesto Fry 35 8883 Longhorn Dr. Tucson AZ 85701

Whitney Samuels 80 14 California Blvd. Pasadena CA 91117

Carol Whitehead 61 1114 Apple Lane Cupertino CA 95014

• In a column-organized table, each row of data gets compressed and
converted to columnar format upon LOAD or INSERT


Figure 3-5. Application view of table data CL4636.0

Notes:
The standard application view of the data, that of a relational table with rows and columns,
does not change when using column-organized tables.
Column-organized tables can be loaded using the DB2 LOAD utility from the same
row-oriented input, or applications can use SQL INSERT statements to populate the tables.
As new rows of data are stored, DB2 compresses the data and converts it to the columnar
format.


Instructor notes:


Purpose — To explain that the use of column-organized tables does not require changes
to the application view of the data, and that standard SQL or DB2 utilities can be used to
load the data. The storage of data in the new columnar format is DB2-internal processing.
Details —
Additional information —
Transition statement — Next we will see how the storage for column-organized tables is
different.


Columnar storage in DB2 (conceptual) – Big Idea #1

• DB2 uses a separate set of extents and pages for each column

TSN (Tuple Sequence Number):
  0  John     Piconne     47  18 Main Street       Springfield   MA  01111
  1  Susan    Nakagawa    32  455 N. 1st St.       San Jose      CA  95113
  2  Sam      Gerstner    55  911 Elm St.          Toledo        OH  43601
  3  Chou     Zhang       22  300 Grand Ave        Los Angeles   CA  90047
  4  Mike     Hernandez   43  404 Escuela St.      Los Angeles   CA  90033
  5  Pamela   Funk        29  166 Elk Road #47     Beaverton     OR  97075
  6  Rick     Washington  78  5661 Bloom St.       Raleigh       NC  27605
  7  Ernesto  Fry         35  8883 Longhorn Dr.    Tucson        AZ  85701
  8  Whitney  Samuels     80  14 California Blvd.  Pasadena      CA  91117
  9  Carol    Whitehead   61  1114 Apple Lane      Cupertino     CA  95014

• Each table column is assigned to a set of pages
• Each page is filled with data from a single column; the number of rows with data in a
page will vary


Figure 3-6. Columnar storage in DB2 (conceptual) – Big Idea #1 CL4636.0

Notes:
DB2 allocates extents of pages for each column in a column-organized table. The page
size and extent size are fixed for each table, based on the table space assigned when the
CREATE TABLE statement is executed.
Each page will only contain data from a single column in the table. The number of rows that
share a page will vary.
The visual shows the concept of storing data columns on different pages. We will see that
compression techniques are used for every column of data. The visual shows
uncompressed data for ease of understanding.


Instructor notes:


Purpose — To explain that while the standard data columns for a row-organized table
would be stored together on a data page, each column of data is stored on a different page
for column-organized tables.
Details —
Additional information —
Transition statement — Next we will discuss some basic information about the storage of
column-organized tables.


Column-organized tables - basics


• The term TSN, Tuple Sequence Number (like a logical Row ID)
– Rows are assigned a TSN in ascending order as the data rows are stored
– A TSN uniquely identifies one row of data within a table
– DB2 uses the TSN to locate and retrieve column data for a specific row
• Typically, column-organized tables use less space than row-organized tables
– In general, compression at the page level based on data from a single column in the
table is more efficient than row-based compression
• Column-organized tables with many columns and few rows can be larger than
row-organized tables
– DB2 allocates extents for each column in a table
– Extent allocation is used to optimize prefetching a column of a table during a scan
– For a small table with just a few rows, the unused pages in these extents could
make the space allocation large compared to the actual data stored
• For example, a table might have 20 rows and 50 columns in a table space with an
extent size of 4 pages. Each column would be allocated at least one extent, so the
table would require at least 200 pages for 20 data rows.

Figure 3-7. Column-organized tables - basics CL4636.0

Notes:
DB2 uses a tuple sequence number (TSN) to manage rows of data stored in
column-organized tables. The TSN is similar to a unique row identifier and is used
internally by DB2; applications do not need to reference this value.
The TSN for a row of data is unique within a table. It is used by DB2 to pull together all of
the column data for a row that is being selected by an application.
In general, the compression results for a column-organized table exceed the compression
results for a similar row-organized table. One exception is tables that have a small number
of data rows. In order to efficiently scan the data for one column of a column-organized
table, DB2 allocates separate extents for each data column.
The visual describes the example of a small table with the following characteristics:
• The table has just 20 rows of data
• The table was created with 50 columns
• The table space used to create the table was defined with an extent size of four pages


Since each of the 50 columns would require at least one extent of four pages, the table
would require at least 200 pages of storage, even though there are only 20 rows of data.


Instructor notes:
Purpose — To discuss the allocation of extents of pages to each column of a
column-organized table. For large tables, the improved compression results for
column-organized tables will offset the possibility of unused page space.
Details —
Additional information —
Transition statement — Next we will see how easy it is to implement and use
column-organized tables.



Big Idea #2 – Simple to Implement and Use

• A single DB2 registry variable can be used to implement DB2 BLU acceleration
  db2set DB2_WORKLOAD=ANALYTICS
• If possible, set DB2_WORKLOAD=ANALYTICS before a database is created
– Allow AUTOCONFIGURE to set the database configuration and memory allocations
– Sets the database page size to 32K
– Enables workload management (WLM) configuration for analytics query processing
• Setting DB2_WORKLOAD=ANALYTICS for an existing database
– Run the AUTOCONFIGURE command manually
– Verify that the sort heap, utility heap, and buffer pools are large

Figure 3-8. Big Idea #2 - Simple to Implement and Use CL4636.0

Notes:
DB2 column-organized tables add columnar capabilities to DB2 databases, which include
data stored with column organization and vector processing of column data. Using this
table format with star schema data marts provides significant improvements to storage,
query performance, and ease of use through simplified design and tuning.
If the majority of tables in your database are going to be column-organized tables, set the
DB2_WORKLOAD registry variable to ANALYTICS prior to creating the database. Doing so
helps to configure memory, table organization, page size, and extent size, and enables
workload management.
The recommended approach is to put as many tables into column-organized format as
possible, if the workload is entirely an analytics/OLAP workload.
These workloads are characterized by non-selective data access (that is, queries access
more than approximately 5% of the data), and extensive scanning, grouping, and
aggregation.


Workloads that are transactional in nature should not use column-organized tables.
Traditional row-organized tables with index access are generally better suited for these
environments.
In the case of mixed workloads, which include a combination of analytic query processing
and very selective access (involving less than 2% of the data), a mix of row-organized and
column-organized tables might be suitable.
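A minimal sketch of the create-time path, assuming an instance owner shell session; the
database name DMART and the table definition are illustrative:

  db2set DB2_WORKLOAD=ANALYTICS    # registry change; takes effect at instance restart
  db2stop
  db2start
  db2 "CREATE DATABASE DMART"      # AUTOCONFIGURE runs by default: 32K pages, COLUMN default
  db2 "CONNECT TO DMART"
  db2 "CREATE TABLE SALES (S_DATE DATE, QTY INTEGER)"   # column-organized by default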


Instructor notes:


Purpose — To discuss the use of a single DB2 registry variable DB2_WORKLOAD, to
enable the configuration and processing differences for column-organized tables.
Details —
Additional information — Some students might ask if they already set DB2_WORKLOAD
to match an application like SAP, if they can still set DB2_WORKLOAD to ANALYTICS. You
can advise those students to check with the application vendor. I understand that SAP is
working on a set of options that will work best when column-organized tables are used with
the SAP applications.
Transition statement —


What does setting DB2_WORKLOAD=ANALYTICS impact?
• DATABASE CFG option dft_table_org = COLUMN
• default page size set by CREATE DATABASE is 32KB
• DATABASE CFG option dft_extent_sz = 4
• DATABASE CFG option dft_degree = ANY
• Intra query parallelism is enabled for any workload (including
SYSDEFAULTUSERWORKLOAD) that specifies MAXIMUM DEGREE DEFAULT,
even if DBM CFG intra_parallel is disabled.
• DATABASE CFG option catalogcache_sz - higher value than default
• DATABASE CFG option sortheap and sheapthres_shr - higher value than
default.
• DATABASE CFG option util_heap_sz – higher value than default
• WLM controls concurrency on SYSDEFAULTMANAGEDSUBCLASS.
• Automatic table maintenance with auto_reorg = ON performs space reclamation for
column-organized tables by default.


Figure 3-9. What does setting DB2_WORKLOAD=ANALYTICS impact ? CL4636.0

Notes:
Setting the DB2 registry variable DB2_WORKLOAD to ANALYTICS (prior to creating the
database) will establish an optimal default configuration when using the database for
analytic workloads.
The ANALYTICS option ensures that the following configuration settings are performed
automatically (unless AUTOCONFIGURE is disabled):
• The dft_table_org (default table organization for user tables) database configuration
parameter is set to COLUMN.
• The dft_degree (default degree) database configuration parameter is set to ANY.
• The dft_extent_sz (default extent size) database configuration parameter is set to 4.
• The catalogcache_sz (catalog cache) database configuration parameter is set to a
higher value than that for a non-analytics workload.
• The sortheap (sort heap) and sheapthres_shr (sort heap threshold for shared sorts)
database configuration parameters are calculated specifically for an analytics workload,


and take into account the additional memory requirements for processing
column-organized data.
• The util_heap_sz (utility heap size) database configuration parameter is set to a value
that takes into account the additional memory that is required to load the data into
column-organized tables.
• The auto_reorg (automatic reorganization) database configuration parameter is set to
ON.

Note

Running the AUTOCONFIGURE command against an existing database when
DB2_WORKLOAD is set to ANALYTICS has the same result.

The following additional choices are made automatically:


• The default database page size for a newly created database is set to 32K.
• A larger database shared sort heap is allocated.
• Intra-query parallelism is enabled for any workload (including
SYSDEFAULTUSERWORKLOAD) that specifies MAXIMUM DEGREE DEFAULT, even
if intra_parallel is disabled.
• Concurrency control is enabled on SYSDEFAULTMANAGEDSUBCLASS.
• Automatic table maintenance performs space reclamation for column-organized tables
by default.
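One way to verify the resulting settings on an existing database is to inspect the registry
variable and the database configuration; a sketch, assuming a database named DMART
(the grep pattern is illustrative):

  db2set DB2_WORKLOAD
  db2 get db cfg for DMART | grep -iE "dft_table_org|dft_degree|dft_extent_sz|catalogcache_sz|sortheap|sheapthres_shr|util_heap_sz|auto_reorg"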


Instructor notes:
Purpose — To discuss the impact on a DB2 database when the DB2 registry variable
DB2_WORKLOAD is set to ANALYTICS.
Details —
Additional information —
Transition statement — Next we will describe how easy it is to work with DB2 with BLU
acceleration for column-organized tables.


To implement DB2 BLU Acceleration for an existing
database without setting DB2_WORKLOAD
• If you cannot set DB2_WORKLOAD to ANALYTICS
– Create the database with:
• 32K page size
• UNICODE code set (this is the default), and an IDENTITY or IDENTITY_16BIT collation
– Update the database configuration as follows:
• Set the dft_table_org to COLUMN so that new tables are created as column-organized tables
by default
• Or use ORGANIZE BY COLUMN clause on each CREATE TABLE statement.
– Set the dft_degree to ANY.
– Set the dft_extent_sz to 4
– Increase the value of the catalogcache_sz (catalog cache) by 20%
– Ensure that the sortheap (sort heap) and sheapthres_shr ARE NOT set to AUTOMATIC.
• Consider increasing these values significantly for analytics workloads.
– Set the util_heap_sz to 1,000,000 pages and AUTOMATIC to address the resource needs of the
LOAD command
– Set the auto_reorg to ON.
– Ensure that the sheapthres DBM parameter is set to 0
– Ensure that intraquery parallelism is enabled.
• Intraquery parallelism can be enabled at the instance level, database level, or application level
– Enable concurrency control on the SYSDEFAULTMANAGEDSUBCLASS service subclass by
issuing the following statement:
• ALTER THRESHOLD SYSDEFAULTCONCURRENT ENABLE


Figure 3-10. To implement DB2 BLU Acceleration without setting DB2_WORKLOAD CL4636.0

Notes:
If you cannot create your database and have it auto-configured while
DB2_WORKLOAD=ANALYTICS, take the following steps to create and optimally configure
your database for analytic workloads.
1. Create the database with a 32K page size, a UNICODE code set (this is the default),
and an IDENTITY or IDENTITY_16BIT collation.
For example:
CREATE DATABASE DMART COLLATE USING IDENTITY PAGESIZE 32 K
2. Update the database configuration as follows:
a. Set the dft_table_org (default table organization for user tables) database
configuration parameter to COLUMN so that new tables are created as
column-organized tables by default; otherwise, the ORGANIZE BY COLUMN clause
must be specified on each CREATE TABLE statement.
b. Set the dft_degree (default degree) database configuration parameter to ANY.


c. Set the dft_extent_sz (default extent size) database configuration parameter to 4.


d. Increase the value of the catalogcache_sz (catalog cache) database configuration
parameter by 20% (it is set automatically during database creation).
e. Ensure that the sortheap (sort heap) and sheapthres_shr (sort heap threshold for
shared sorts) database configuration parameters are not set to AUTOMATIC.
Consider increasing these values significantly for analytics workloads. A reasonable
starting point is setting sheapthres_shr to the size of the buffer pool (across all buffer
pools). Set sortheap to some fraction (for example, 1/20) of sheapthres_shr to
enable concurrent sort operations.
f. Set the util_heap_sz (utility heap size) database configuration parameter to
1,000,000 pages and AUTOMATIC to address the resource needs of the LOAD
command. If the database server has at least 128 GB of memory, set util_heap_sz
to 4,000,000 pages. If concurrent load operations are running, increase the value of
util_heap_sz to accommodate higher memory requirements.
g. Set the auto_reorg (automatic reorganization) database configuration parameter to
ON.

Important

These changes will increase the overall database memory that is required for your
database. Consider increasing the database_memory configuration parameter for
the database if this parameter was not already set to AUTOMATIC.

3. Ensure that the sheapthres database manager configuration parameter is set to 0 (this
is the default value). Note that this setting applies to all databases in the instance.
4. Ensure that intraquery parallelism, which is required to access column-organized
tables, is enabled. Intraquery parallelism can be enabled at the instance level, database
level, or application level; for details, see Intraquery parallelism and intrapartition
parallelism.
5. Enable concurrency control on the SYSDEFAULTMANAGEDSUBCLASS service
subclass by issuing the following statement:
ALTER THRESHOLD SYSDEFAULTCONCURRENT ENABLE
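The steps above can be collected into a single CLP script; a sketch, assuming a database
named DMART and treating the sort heap sizes as placeholders to be tuned for the actual
server:

  -- run with: db2 -tvf enable_blu.clp
  UPDATE DBM CFG USING SHEAPTHRES 0;
  CONNECT TO DMART;
  UPDATE DB CFG USING DFT_TABLE_ORG COLUMN;
  UPDATE DB CFG USING DFT_DEGREE ANY;
  UPDATE DB CFG USING DFT_EXTENT_SZ 4;
  UPDATE DB CFG USING SORTHEAP 200000 SHEAPTHRES_SHR 4000000;  -- placeholder sizes
  UPDATE DB CFG USING UTIL_HEAP_SZ 1000000 AUTOMATIC;
  UPDATE DB CFG USING AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_REORG ON;
  ALTER THRESHOLD SYSDEFAULTCONCURRENT ENABLE;
  CONNECT RESET;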


Instructor notes:


Purpose — To discuss enabling the use of DB2 BLU acceleration in databases where the
registry variable DB2_WORKLOAD cannot be set to ANALYTICS.
Details —
Additional information —
Transition statement — Next we will discuss the ease of working with DB2 BLU
Acceleration.


What makes DB2 with BLU Acceleration easy to use?
• LOAD and then… run queries
– No indexes
– No REORG (it’s automated)
– No RUNSTATS (it’s automated)
– No MDC or MQTs or Materialized Views
– No partitioning
– No statistical views
– No optimizer hints

• It is just DB2!
– Same SQL, language interfaces, administration
– Reuse DB2 process model, storage, utilities


Figure 3-11. What makes DB2 with BLU Acceleration easy to use ? CL4636.0

Notes:
The simplicity of DB2 with BLU Acceleration is one of its key value propositions, and it is all
part of the DB2 kernel.
Space reclamation and statistics gathering are built in, and additional structures such as
indexes, MDC tables, and MQT tables are not needed, which adds to the value.
From a customer perspective it is LOAD and GO! You can start running queries
immediately after the data load and get the performance gains of DB2 with BLU
Acceleration right away.
Setting DB2_WORKLOAD to ANALYTICS does all the work:
• Automatically configures DB2 for optimal analytics performance
• Makes column-organized tables the default table type
• Enables automatic workload management
• Enables automatic space reclaim


• Page and extent size configured for analytics
• Memory for caching, sorting, hashing, and utilities is automatically initialized based on
the server size and available RAM
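A minimal "load and go" sequence, assuming DB2_WORKLOAD=ANALYTICS is in effect
so the table defaults to column organization; the table and file names are illustrative:

  CONNECT TO DMART;
  CREATE TABLE SALES (S_DATE DATE, REGION VARCHAR(20), QTY INTEGER);
  LOAD FROM sales.del OF DEL REPLACE INTO SALES;
  -- no indexes, RUNSTATS, or REORG needed before querying
  SELECT REGION, SUM(QTY) FROM SALES GROUP BY REGION;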


Instructor notes:
Purpose — To review how simple it is to start using column-organized tables. With DB2
10.5 Fix Pack 4 and later you can create a user maintained MQT for a column-organized
table. The primary objective for enabling MQT usage with column-organized tables is to
simplify conversion from row to column organization, where MQT usage is already in place.
Details —
Additional information —
Transition statement — Next we will see how a table is created to be a column-organized
table.


What happens when you create a new column-organized table?
• Use a standard CREATE TABLE statement
For example:
CREATE TABLE JTNISBET.STAFF (
ID SMALLINT NOT NULL,
NAME VARCHAR(9),
. . . .
COMM DECIMAL(7,2) )
ORGANIZE BY COLUMN
IN TSPACED INDEX IN TSPACEI ;
• A system-generated page map index is associated with the column-organized table
– The index contains one entry for each page in the table
– The index has a generated name like SQL130617115333860 and uses the schema
SYSIBM
• A system-generated 'synopsis table' is associated with the column-organized table
– The synopsis table can be used as a 'rough' index to skip pages based on SQL
predicates
– A synopsis table has a generated name like SYN130617110037170122_HISTORY and
uses the schema SYSIBM

Figure 3-12. What happens when you create a new column-organized table ? CL4636.0

Notes:
To create a column-organized table, specify the ORGANIZE BY COLUMN clause on the
CREATE TABLE statement.
When a column-organized table is created, DB2 creates a system generated page map
index. The index contains one entry for each page in the column-organized table. The
index is assigned a system generated name and uses a schema of SYSIBM.
Column-organized tables also have a system-generated 'synopsis table'. The data in this
table is used as a rough index, allowing DB2 to skip reading pages when it can determine
that none of the column data in the page will match an SQL predicate.
The synopsis table has a system-generated name that is prefixed by 'SYN', and uses the
schema SYSIBM.
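The generated objects can be seen in the catalog; a sketch, assuming the table
JTNISBET.STAFF from the example above (matching the synopsis table by its SYN name
prefix is an approximation):

  -- page map index (INDEXTYPE = 'CPMA')
  SELECT INDSCHEMA, INDNAME, INDEXTYPE
  FROM SYSCAT.INDEXES
  WHERE TABSCHEMA = 'JTNISBET' AND TABNAME = 'STAFF';

  -- synopsis table, created in the SYSIBM schema
  SELECT TABSCHEMA, TABNAME
  FROM SYSCAT.TABLES
  WHERE TABSCHEMA = 'SYSIBM' AND TABNAME LIKE 'SYN%STAFF';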


Instructor notes:
Purpose — To show how to create a column-organized table and to explain that DB2 will
create an associated page map index and synopsis table automatically. The lecture covers
the use of the index and synopsis table in more detail later, so do not plan on covering
those details here.
Details —
Additional information —
Transition statement — Next we will cover some additional notes about creating
column-organized tables.


Notes regarding the CREATE TABLE statement for
a column-organized table
• ORGANIZE BY COLUMN clause
– If the database configuration option DFT_TABLE_ORG is set to COLUMN then it
is not necessary to include the ORGANIZE BY COLUMN clause to create a
column-organized table
• IN tablespace clause
– Tablespace specified must be an automatic storage managed tablespace that
supports reclaimable storage
– Column-organized table data will be stored in the tablespace
– Also storage for the compression dictionary and other table metadata
– The synopsis table will be created and stored here
• INDEX IN tablespace clause (optional)
– The system generated page map index will use this tablespace
– Any enforced primary key or unique key related indexes will also use this
• The COMPRESS clause is not used for column-organized tables,
compression is assumed
• Maximum row length including overhead is 32K regardless of the page
size used


Figure 3-13. Notes regarding the CREATE TABLE statement for a column-organized table CL4636.0

Notes:
To create a column-organized table, specify the ORGANIZE BY COLUMN clause on the
CREATE TABLE statement.
If you want to create tables with a specific table organization without having to specify the
ORGANIZE BY COLUMN or the ORGANIZE BY ROW clause, you can change the default
table organization by setting the DFT_TABLE_ORG database configuration parameter.
Alternatively, you can change the default table organization to COLUMN automatically by
setting the DB2_WORKLOAD registry variable to ANALYTICS. This setting establishes a
configuration that is optimal for analytic workloads.
A column-organized table can have a maximum of 1012 columns, regardless of page size,
where the byte counts of the columns must not be greater than 32,677. Extended row size
support does not apply to column-organized tables.
Create column-organized tables in automatic storage table spaces only. The INDEX IN
tablespace clause can be used to designate an alternate table space for storage of the
page map index as well as any primary key or unique indexes defined on the table.


The COMPRESS clause is not used for column-organized tables. All column-organized
tables are compressed, and an error is generated if the COMPRESS option is specified
when a column-organized table is created.


Instructor notes:


Purpose — To discuss a few additional considerations for creating column-organized
tables.
Details —
Additional information —
Transition statement — Next we will discuss the use of the page map index for
column-organized tables.


Why is there a page map index for a column-organized table?
• The page map index for a column-organized table is a system
generated index
– The INDEXTYPE column in SYSCAT.INDEXES will contain CPMA
• Allows DB2 to access all of the pages that contain data for one
column
– Scans can avoid access to pages containing data for columns
that are not needed
• Once DB2 determines that the column data for a specific row
is needed, the page map index allows data for other columns
necessary to produce the result to be located


Figure 3-14. Why is there a page map index for a column-organized table ? CL4636.0

Notes:
Every column organized table has a system generated page map index. This index is used
internally by DB2 to process column-organized tables. The index is based on
system-generated columns, not user defined column data.
Since pages in a column-organized table only contain data for a single table column, the
page map index allows DB2 to scan a table and read only the pages containing the data
columns referenced in a query.
The page map index allows DB2 to locate the page containing the column data for a
particular row.
The index points to a page assigned to a column-organized table, not to any particular
row’s column value. The index size will depend on the number of pages used to store the
column-organized table.


Instructor notes:


Purpose — To discuss the contents and use of the page map index. This could be
compared to the block indexes used for MDC tables, but there are significant differences.
The index does not contain keys based on the column values that are user defined.
Details —
Additional information —
Transition statement — Next we will discuss how DB2 compresses the data in
column-organized tables.


Big Idea #3: BLU uses Multiple Compression Techniques to achieve extreme compression

• Approximate Huffman encoding ("frequency-based compression"), prefix compression,
and offset compression
• Frequency-based compression: the most common values use the fewest bits

Example showing 3 different code lengths (code lengths vary depending on the data values):
  0 = California          -- 2 high-frequency states (1 bit covers 2 entries)
  1 = New York
  000 = Arizona           -- 8 medium-frequency states (3 bits cover 8 entries)
  001 = Colorado
  010 = Kentucky
  011 = Illinois
  ...
  111 = Washington
  000000 = Alaska         -- 40 low-frequency states (6 bits cover 64 entries)
  000001 = Rhode Island
  ...

• Exploiting skew in the data distribution improves the compression ratio
• Very effective since all values in a column have the same data type


Figure 3-15. Big Idea #3 : BLU uses Multiple Compression Techniques to achieve extreme compression CL4636.0

Notes:
Compression for row-organized tables maps repeated byte patterns across multiple
columns to shorter dictionary codes. In contrast, compression for column-organized tables
maps entire values to dictionary codes. Since all the values in a column have the same
data type, frequency-based compression can be used to highly compress the data.
The example shows that multiple encoding types can be associated with a single column of
a table. DB2 selects the combination of encoding methods for each column based on the
column data.


Instructor notes:


Purpose — To explain the different approach used to encode column data for
column-organized tables compared to the row based compression used in previous
releases.
Details —
Additional information —
Transition statement — Next we will discuss the two types of compression used for
column-organized tables.


Column-Level Dictionaries are Static

Update Column-Level Dictionaries
• Once created, column-level dictionaries for a table are never updated
• REORG cannot be used to rebuild the column-level dictionaries
• Table data can be unloaded and reloaded to rebuild the dictionaries

Page Compression
• Page compression reduces the need to rebuild column dictionaries!
• New values not covered by the static column-level dictionaries can still be encoded and
compressed by page-level dictionaries
• Reduces deteriorating compression ratio over time


Figure 3-16. Column-Level Dictionaries are Static CL4636.0

Notes:
With column-organized tables, the column-level dictionaries are created by the LOAD utility
as data is loaded. These column-level dictionaries are based on a scan of the LOAD utility
input and are never updated once they are built.
With DB2 10.5, the REORG utility does not support reorganizing column-organized tables,
so it cannot be used to rebuild the compression dictionaries.
If a column-organized table is extended with new data and the compression ratio drops, the
table data can be unloaded and a new LOAD into an empty table can be used to build new
column dictionaries.
DB2 does utilize page-level dictionaries for column-organized tables. As column data for a
set of rows is added to a page, the static column dictionary is used to compress the data.
Once the page is nearly full, DB2 checks whether a page-level dictionary, which can handle
data values not covered by the column dictionary or make page-level adjustments to the
encoding, can be used to improve compression results for the page. In some cases, DB2
may determine that the space needed for the page compression dictionary exceeds the
space saved, and page compression would not be used.


The page-level dictionary option is intended to reduce the requirement to rebuild the static
column dictionaries.
Once a page is filled, its page-level dictionary is not changed by subsequent data change
activity.
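Since REORG cannot rebuild the column-level dictionaries, one rebuild path is an unload
and reload; a sketch with illustrative table and file names:

  EXPORT TO history.ixf OF IXF SELECT * FROM TEST.HISTORY;
  -- LOAD REPLACE empties the table and rebuilds the dictionaries
  -- (RESETDICTIONARY is the default for column-organized tables)
  LOAD FROM history.ixf OF IXF REPLACE RESETDICTIONARY INTO TEST.HISTORY;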


Instructor notes:
Purpose — To discuss the use of the column dictionaries which are static once built and
the page level compression dictionary which can be built as new pages are added for a
column’s data.
Details —
Additional information —
Transition statement — Next we will discuss the function of the LOAD utility that creates
the column dictionaries for a column-organized table.



Column Dictionaries are built by the LOAD utility


• ANALYZE phase added to LOAD utility for column-organized tables
• The LOAD utility input is scanned to analyze the data content for each table column prior
to the normal scan that begins the load phase
• LOAD can acquire large amounts of utility heap memory to track frequent values in
each data column
– Histograms are created to collect frequent values
– Memory utilization can trigger reduction in histogram data for columns with many
distinct values
• Once data is scanned, histograms are used to create a custom column dictionary for
each column
– Column dictionary can use multiple types of encoding for a single column
– Each column dictionary allows for unencoded data
• Column dictionaries can be much larger than the table level dictionaries for row
based compression
– Column based dictionary size in megabytes
– Row based static dictionary size in kilobytes


Figure 3-17. Column Dictionaries are built by the LOAD utility CL4636.0

Notes:
When data is being loaded into a column-organized table, the first phase is the analyze
phase, which is unique to column-organized tables.
The analyze phase occurs only if a column compression dictionary needs to be built, which
happens during a LOAD REPLACE operation, a LOAD REPLACE RESETDICTIONARY
operation, a LOAD REPLACE RESETDICTIONARYONLY operation, or a LOAD INSERT
operation (if the column-organized table is empty).
For column-organized tables, this phase is followed by the load, build, and delete phases.
Loading data into column-organized tables is very similar to loading data into
row-organized tables, with the following exceptions:
The input data source is processed twice if a column compression dictionary must be built.
If the input source can be reopened, it is read twice. If the input source cannot be
reopened, its contents are temporarily cached in the load temporary file directory. The
default path for load temporary files is located under the instance directory, or in a location
that is specified by the TEMPFILES PATH option on the LOAD command.


During the scan for the ANALYZE phase, the LOAD utility can use a large amount of
memory in the database utility heap to track the column data values for each column. The
goal is to keep column values that are used in many rows so that they can be encoded to
save space. DB2 creates histograms for each column to store the values and count
occurrences. As utility heap memory becomes full, DB2 will decide how to refine column
information in the histograms to reduce memory consumption.
Once the input scan is completed, DB2 reviews the information collected in the histograms
for each column to determine the most efficient encoding techniques to use for each
column. One factor is the need to limit the size of the column dictionaries.
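If the input source cannot be reopened for the second pass, its contents are cached under
the load temporary file path, so that path can be pointed at fast storage with sufficient
space; a sketch with illustrative paths and names:

  LOAD FROM /data/history.del OF DEL
    TEMPFILES PATH /fastdisk/loadtmp
    REPLACE RESETDICTIONARY INTO TEST.HISTORY;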


Instructor notes:


Purpose — To discuss briefly how the column dictionaries are created by a new phase of
LOAD utility processing. This is somewhat similar to the scan used by REORG to build the
row compression dictionary for a table, but the compression used here is much more
complex.
Details —
Additional information — Students may ask about automatic dictionary creation, that
DB2 implemented for row-organized compression. At the time of the general availability for
DB2 10.5, ADC is not enabled for column-organized tables.
Transition statement — Next we will discuss LOAD processing for column-organized
tables.


Load Processing for column-organized Tables

Pass 1 – ANALYZE PHASE (only if dictionaries need to be built)
  Input Source -> convert raw data from row-organized format to column-organized format
               -> build histograms to track value frequency
               -> build column compression dictionaries

Pass 2 – LOAD PHASE
  Input Source -> convert raw data from row-organized format to column-organized format
               -> compress values; build data pages; update synopsis table
                  (writes the User Table and the Synopsis Table)
               -> build keys for the page map index and any unique indexes (index keys)


Figure 3-18. Load Processing for column-organized Tables CL4636.0

Notes:
Load has three phases that are specific to Column-Organized Tables:
- ANALYZE Phase 1
• Only applies to column-organized tables if dictionaries are needed
- Load Replace, or Load Insert into an empty table, where KEEPDICTIONARY is not
specified.
• Histograms are built to track frequency of data values of all columns
• Column compression dictionaries are built based on histograms
- LOAD Phase 2 - Modified for column-organized tables
• Column and page compression dictionaries used to compress data
• Compressed values written to data pages
• Synopsis table maintained
• Keys built for page map index and any unique indexes


- BUILD Phase 3
• Page map index and any unique indexes are built


Instructor notes:
Purpose — To review the LOAD utility processing phases for column-organized tables.
Details —
Additional information —
Transition statement — Next we will see you can monitor progress for loading a
column-organized table with LIST UTILITIES.


Monitor column-organized table LOAD using LIST
UTILITIES command
db2 list utilities show detail
ID = 1
Type = LOAD
Database Name = TESTBLU
Member Number = 0
Description = [LOADID: 50.2013-05-20-08.29.44.906635.0 (4;4)]
[*LOCAL.inst20.130520122733] OFFLINE LOAD DEL AUTOMATIC INDEXING REPLACE COPY NO INST20
.HIST2
Start Time = 05/20/2013 08:29:44.937717
State = Executing
Invocation Type = User
Progress Monitoring:
Phase Number = 1
Description = SETUP
Total Work = 0 bytes
Completed Work = 0 bytes
Start Time = 05/20/2013 08:29:44.937722

Phase Number [Current] = 2


Description = ANALYZE
Total Work = 472595 rows
Completed Work = 280415 rows
Start Time = 05/20/2013 08:29:45.071666

Phase Number = 3
Description = LOAD


Figure 3-19. Monitor column-organized table LOAD using LIST UTILITIES command CL4636.0

Notes:
The visual shows the LIST UTILITIES command output for the LOAD utility with a
column-organized table.
In the sample output, the current phase of load processing is the ANALYZE phase, where
the column-dictionaries are being built.
Compared to a LOAD for a row-organized table, the LOAD processing for a
column-organized table should save considerable time through a reduction in the time
spent building and updating indexes. The page map index is page-based, not row-based,
so it will be relatively small and easy to build.
If any enforced primary key or unique key constraints are defined on the table, LOAD will
need to build those indexes.


Instructor notes:
Purpose — To show an example of a LIST UTILITIES report showing the extra ANALYZE
phase for the loading of a column-organized table.
Details —
Additional information —
Transition statement — Next we will discuss the importance of database utility heap
memory for loading column-organized tables.


Utility Heap Memory Considerations for LOAD utility
with column-organized tables

• Faster load performance
• Better compressed tables
• Faster query performance

• LOAD allocates memory from the utility heap
• util_heap_sz recommendations:
  – Use a minimum of UTIL_HEAP_SZ = AUTOMATIC (1000000)
  – Use UTIL_HEAP_SZ = AUTOMATIC (4000000) if the database server has >= 128 GB
    of memory
  – The new AUTOMATIC option allows utilities to use the database memory overflow
  – Configured/reserved UTIL_HEAP_SZ + available overflow < 25% of database memory


Figure 3-20. Utility Heap Memory Considerations for LOAD utility with column-organized tables CL4636.0

Notes:
Insufficient memory during load utility processing for row-organized tables can reduce load
performance but does not affect the efficiency of compression on the tables. After the load
completes, there is no long-term negative effect on the tables.
In contrast, insufficient memory during the load ANALYZE phase for column-organized
tables could yield less than optimally compressed tables. It is critical to have sufficient
utility heap memory, but the memory allocated for utilities should be in proportion to the
total amount of database server memory so that the performance of non-utility operations
is not impacted. The performance of the LOAD phase can also be improved by additional
memory.
Sufficient memory results in faster load performance, better compressed tables, and faster
query performance.
• The util_heap_sz (utility heap size) database configuration parameter should be set to
at least 1,000,000 pages to address the resource needs of the LOAD command.
• If the database server has at least 128 GB of memory, util_heap_sz should be set to
4,000,000 pages.


• If concurrent load utilities are running, the util_heap_sz value should be increased to
accommodate higher memory requirements.
• If memory is scarce, the util_heap_sz value can be increased dynamically only when a
load operation is running
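The corresponding configuration changes might look like the following; the database name
is illustrative, and the larger value is for servers with 128 GB of memory or more:

  UPDATE DB CFG FOR DMART USING UTIL_HEAP_SZ 1000000 AUTOMATIC;
  -- on a server with >= 128 GB of memory:
  UPDATE DB CFG FOR DMART USING UTIL_HEAP_SZ 4000000 AUTOMATIC;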


Instructor notes:


Purpose — To discuss the importance of setting database utility heap memory size large
for loading column-organized tables.
Details —
Additional information —
Transition statement — Next we will review the options for building the column
dictionaries for the LOAD utility.


Dictionary creation options for column-organized tables

• Load Utility
  – RESETDICTIONARY
    • Default for column-organized tables, to ensure LOAD REPLACE always rebuilds
      the dictionary with new data
  – REPLACE RESETDICTIONARYONLY
    • Creates the dictionary based on the input data without loading any rows
    • Can create the dictionary prior to ingesting any data from SQL-based utilities
      like INGEST or IMPORT
  – KEEPDICTIONARY
    • Use the existing dictionary to compress data
    • LOAD INSERT with a non-empty table must use the KEEPDICTIONARY option
• Automatic Dictionary Creation (ADC)
  – ADC for column-organized tables enabled with DB2 10.5 Fix Pack 1
  – Uses a small sample to create the column dictionaries, which may not produce the
    best compression results
  – Page-level dictionaries can be built with DB2 10.5 Fix Pack 4

Figure 3-21. LOAD utility options for column-organized tables CL4636.0

Notes:
Since the column-dictionaries used for column-organized tables are very dependent on the
table data, the default option for loading column-organized tables with a LOAD REPLACE
is RESETDICTIONARY, to build new column dictionaries.
The new option RESETDICTIONARYONLY creates a column compression dictionary that
is based on the input file without loading any rows. You can use this option to create the
compression dictionary before you ingest any data by using SQL-based utilities. This
option is applicable to column-organized tables only.
The KEEPDICTIONARY option must be used for a LOAD INSERT into a table that already
contains data. It can also be used with LOAD REPLACE to keep the column dictionaries
previously created and apply them to the new input without using the ANALYZE phase of
processing.
Starting with DB2 10.5 Fix Pack 1, automatic dictionary creation (ADC), can build the
column dictionaries for a column-organized table during SQL INSERT processing, which
would include IMPORT or INGEST usage. With DB2 10.5 Fix Pack 4 and above, INSERT
processing is able to create page-level column dictionary data to improve compression
results.
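As an example of priming the dictionaries before SQL-based ingest, a sketch with
illustrative file and table names:

  -- build dictionaries from a representative file without loading any rows
  LOAD FROM sample.del OF DEL REPLACE RESETDICTIONARYONLY INTO TEST.HISTORY;
  -- rows can then be added with INGEST, IMPORT, or INSERT and still be well compressed
  INGEST FROM FILE history.del FORMAT DELIMITED INSERT INTO TEST.HISTORY;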


Instructor notes:


Purpose — To discuss the options of the LOAD utility for building or keeping the column
dictionaries used for compressing column-organized tables. The building of column
dictionaries with INSERT processing was enabled and enhanced after the initial release of
DB2 10.5.
Details —
Additional information —
Transition statement — Next we will discuss using the information in SYSCAT.TABLES to
check the disk space usage for column-organized tables.


Using the SYSCAT.TABLES data for column-organized tables to check size and
compression results
• New column : TABLEORG
– Value ‘C’ indicates column-organized table
– Value ‘R’ indicates traditional row-organized tables
• Checking for efficient compression
– Use PCTPAGESSAVED
• An estimate is based on the number of data pages needed to store the
table in uncompressed row organization. (from RUNSTATS)
• Understanding the current space used
– NPAGES – number of pages with table data
• For column-organized these are in the Column Data Object
– FPAGES – is the total number of pages
• This includes the Column Data Object and the Data Object
– MPAGES – is the number of pages for table metadata
• This includes the columns dictionary data in the Data Object

Figure 3-22. Using the SYSCAT.TABLES data for column-organized tables to check size, compression results CL4636.0

Notes:
The system catalog view SYSCAT.TABLES can be used to check for efficient compression
results with column-organized tables. The value of the column PCTPAGESSAVED
provides an estimate of the savings in data page storage for column-organized tables
compared to the uncompressed row data. This information is collected by the RUNSTATS
utility.
The column TABLEORG in SYSCAT.TABLES will have a value of ‘C’ for column-organized
tables.
For column-organized tables, the user table data is stored in a special column organized
storage object. The column NPAGES in SYSCAT.TABLES is a count of these pages with
table column data.
Since the column dictionaries for column-organized tables can be much larger than the
dictionary data stored for row organized tables, the column MPAGES in SYSCAT.TABLES
shows the total number of pages used for table metadata, which includes these column
dictionaries.


The column FPAGES in SYSCAT.TABLES shows the total page count for a
column-organized table, which includes both the data object and the column-organized
data object.


Instructor notes:
Purpose — To discuss the information in the SYSCAT.TABLES view that can be used to
track compression results and current space usage for column-organized tables.
Details —
Additional information —
Transition statement — Next we will look at an example query that uses
SYSCAT.TABLES to check space usage for column-organized tables.



Catalog information for Column Oriented tables


SELECT VARCHAR(TABNAME,12) AS TABNAME, CARD,
NPAGES, FPAGES,
MPAGES, TABLEORG, PCTPAGESSAVED
FROM SYSCAT.TABLES WHERE TABSCHEMA = 'TEST'
ORDER BY CARD

TABNAME CARD NPAGES FPAGES MPAGES TABLEORG PCTPAGESSAVED


------------ --------- ------ ------ -------- --------- -------------
BRANCH 100 8 9 1 C 0
TELLER 1000 9 10 1 C 0
HISTORY 513576 171 181 10 C 84
ACCT 1000000 137 138 1 C 96

4 record(s) selected.


Figure 3-23. Catalog information for Column Oriented tables CL4636.0

Notes:
The sample SQL query uses the SYSCAT.TABLES view to check the compression results
and disk space allocations for several column-organized tables.
Notice that the small tables do not show good compression results. This is caused by the
per-column allocation of pages in column-organized tables; with just a few rows of data the
pages are not filled. Since these tables are small, the lack of space savings for storage is
not important.

Note

Some of the catalog statistics that have been used to evaluate compression results for
row-organized tables do not apply to column-organized tables. Some examples are
AVGCOMPRESSEDROWSIZE and PCTROWSCOMPRESSED.


We will discuss later why it is still recommended to create small tables that will be joined
with larger tables as column-organized, to get the benefits of efficient join processing.


Instructor notes:


Purpose — To show an example of a DB2 catalog query showing compression results and
space usage for column-organized tables.
Details —
Additional information —
Transition statement — Next we will check the statistics in SYSCAT.COLUMNS that
shows if individual columns of a column-organized table are being efficiently compressed.


Use PCTENCODED in SYSCAT.COLUMNS to check for columns with a low percentage of
encoded values
SELECT COLNO, VARCHAR(COLNAME,20) AS COLNAME,
VARCHAR(TABNAME,20) AS TABNAME, COLCARD, PCTENCODED FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'TEST' AND TABNAME IN ('ACCT','HISTORY')
ORDER BY TABNAME,COLNO

COLNO COLNAME TABNAME COLCARD PCTENCODED


------ -------------------- -------------------- -------------------- ----------
0 ACCT_ID ACCT 1000000 100
1 NAME ACCT 1 100
2 ACCT_GRP ACCT 992 100
3 BALANCE ACCT 14 100
4 ADDRESS ACCT 1 100
5 TEMP ACCT 1 100
0 ACCT_ID HISTORY 176128 100
1 TELLER_ID HISTORY 984 100
2 BRANCH_ID HISTORY 100 100
3 BALANCE HISTORY 2976 100
4 DELTA HISTORY 1 100
5 PID HISTORY 4 100
6 TSTMP HISTORY 465140 100
7 ACCTNAME HISTORY 1 100
8 TEMP HISTORY 2 100


Figure 3-24. Use PCTENCODED in SYSCAT.COLUMNS to check for columns with a low percentage of encoded values CL4636.0

Notes:
The sample SQL query shows how the SYSCAT.COLUMNS view can be used to check if
each column dictionary of a column-organized table is efficiently compressing the data
values for that column of the table.
The column PCTENCODED in SYSCAT.COLUMNS shows the percentage of values that
are encoded as a result of compression for a column in a column-organized table. If the
encoding techniques stored in the column dictionary do not match a data value in the
column, the value can be stored unencoded. If a large percentage of the values in a
column are unencoded, it may be that the data used to build the column dictionary does
not match the current data for the column. Page-level dictionaries can handle some new
data values by providing encoding that is not present in the static column dictionary.
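A variation of the catalog query above that flags only poorly encoded columns; the
80 percent threshold is an arbitrary illustration:

  SELECT VARCHAR(TABNAME,20) AS TABNAME,
         VARCHAR(COLNAME,20) AS COLNAME,
         PCTENCODED
  FROM SYSCAT.COLUMNS
  WHERE TABSCHEMA = 'TEST' AND PCTENCODED < 80
  ORDER BY PCTENCODED;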


Instructor notes:


Purpose — To show an example that uses the SYSCAT.COLUMNS information to check
compression results on a column basis.
Details —
Additional information —
Transition statement — Next we will look at some examples of the compression results
with column-organized tables based on early product tests.


Compression result examples with column-organized tables using DB2 10.5

• Approximately 2x-3x storage reduction compared to DB2 10.1 adaptive compression
(comparing all objects: tables, indexes, etc.)
– New advanced compression techniques
– Fewer storage objects required


Figure 3-25. Compression result examples with column-organized tables using DB2 10.5 CL4636.0

Notes:
The visual shows the reductions in storage requirements for several different types of customer data using DB2 10.5 column-organized tables.
The three sets of bars show:
• The space requirements for uncompressed tables using DB2 10.1
• The space required for compressed tables using DB2 10.1 adaptive compression
• The space required for compressed column-organized tables using DB2 10.5
A portion of the space saved using column-organized tables is the space that was used for
additional objects like indexes for the row-organized tables.


Instructor notes:


Purpose — This shows some sample data storage savings for early testing using the
column-organized tables with DB2 10.5.
Details —
Additional information —
Transition statement — Next we will see how DB2 uses the synopsis table for a
column-organized table to perform data skipping.


Big Idea #4 Data Skipping - Synopsis Table used to improve scan efficiency for column-organized tables

User table SALES_COL (TSN, S_DATE, QTY, ...):
TSN 0     2012-03-01  176  ...
          2012-03-02   85  ...
          2012-03-02  267
          2012-03-04  231
          ...
TSN 1023  2012-04-04
TSN 1024  ...
TSN 2047  ...

Synopsis table SYN130330165216275152_SALES_COL (meta-data that describes which ranges of values exist in which parts of the user table):
TSNMIN  TSNMAX  S_DATEMIN   S_DATEMAX   ...
0       1023    2012-03-01  2012-04-04  ...
1024    2047    2012-04-06  2012-05-01  ...

TSN = Tuple Sequence Number

• Enables DB2 to skip portions of the table when scanning data during a query
• Benefits from data clustering, loading pre-sorted data
  – Predicate WHERE S_DATE = '2012-05-01' would skip the first range
  – Predicate WHERE S_DATE > '2012-04-03' would scan both ranges


Figure 3-26. Big Idea #4 Data Skipping - Synopsis Table used to improve scan efficiency for column-organized tables CL4636.0

Notes:
It is common to create indexes on row-organized tables to provide more efficient access to
various subsets of a table and to avoid table scans which consume system I/O resources
and memory. To support different types of table access, a series of indexes may be created
which require storage space and increase the overhead of loading and changing table
data.
With DB2 10.5, column-organized tables do not support user-created indexes. DB2 does automatically maintain a synopsis table for column-organized tables. Unlike standard indexes, the synopsis table describes a range of rows rather than a single row.
The visual shows an example of a table SALES_COL, which contains a date column
named S_DATE. As data rows are added to the table SALES_COL, DB2 records the
minimum and maximum values for the column S_DATE for a range of rows.
The first entry in the synopsis table covers the first 1024 rows or TSNs. For the column
S_DATE the synopsis table has two columns that indicate a range of data values from
'2012-03-01' to '2012-04-04'.

3-68 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V5.4
Instructor Guide

If an application requests data from the table SALES_COL and includes the predicate WHERE S_DATE = '2012-05-01', DB2 could use the information in the synopsis table to skip reading the first 1024 TSNs (0 through 1023), since the predicate is outside the range contained in those rows. The second range, TSNs 1024 to 2047, would be scanned.
If an application selects data from the table using the predicate S_DATE > '2012-04-03', the data for the S_DATE column in the synopsis table would indicate that both of the TSN ranges shown would need to be scanned.

Important

The information in the synopsis table is most valuable when the table data is clustered
using the column referenced by the predicate. In the previous example, if the data is loaded
in S_DATE column sequence, then queries that include predicates for certain ranges of
S_DATE values can use the synopsis table to skip large portions of the table data.
If a table has rows in a random sequence for a particular column, the synopsis table will be
less useful in allowing ranges of rows to be skipped during scans.
This is similar to having a very unclustered index on a table. In order to retrieve a large set
of result rows, many pages need to be accessed and DB2 may decide to perform a full
scan and bypass using the index.
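As a sketch of a query that could benefit from data skipping, assume the SALES_COL example table from the visual, loaded in S_DATE sequence:

SELECT SUM(QTY)
FROM SALES_COL
WHERE S_DATE BETWEEN '2012-04-06' AND '2012-05-01'

With the synopsis entries shown in the visual, DB2 could skip the extents for TSNs 0 through 1023 entirely, because their S_DATEMIN/S_DATEMAX range does not overlap the predicate.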


Instructor notes:
Purpose — To introduce the concept of using the synopsis table to perform data skipping.
Details —
Additional information —
Transition statement — Next we will discuss more about synopsis tables.



Additional information about synopsis tables


• Since column-organized tables do not have traditional indexes, the synopsis table
provides information that DB2 can use to reduce the number of pages accessed
when the SQL statement includes predicates
• Each row of data in the synopsis table contains a summary of column values for a
range of 1024 table rows
• Synopsis tables have a schema name of SYSIBM
• A synopsis table is a column-organized table that is automatically created and
maintained by the system to store metadata for an associated user-defined column-
organized table.
• The synopsis table contains all the user table's non-character columns (that is,
datetime, Boolean, or numeric columns) and those character columns that are part
of a primary key or foreign key definition.
– With DB2 10.5 Fix Pack 4, the synopsis table for a new column-organized table
also includes CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC columns.
• The synopsis table stores the minimum and maximum values for each column
across a range of rows and uses those values to skip over data that is of no interest
to a query during evaluation of certain types of predicates (=, >, >=, <, <=,
BETWEEN, NOT BETWEEN, IS NOT NULL, IN, and NOT IN).
• The only supported operation against a synopsis table is a select operation.


Figure 3-27. Additional information about synopsis tables CL4636.0

Notes:
The synopsis table includes information about multiple columns for a range of 1024 data
rows, so it will be much smaller than the table that it describes.
Synopsis tables are automatically created and maintained by DB2. You do not select which
columns are summarized within the synopsis table. As data rows are loaded or inserted,
DB2 will create the matching entries in the synopsis table.
The synopsis table is stored internally as a column-organized table. DB2 is able to access
the information about selected columns very efficiently.
DB2 generates a name for each synopsis table that begins with the prefix 'SYN' and uses the base table name as a suffix. DB2 uses the schema name SYSIBM rather than the schema of the base table. A synopsis table is a column-organized table that is automatically created and maintained by the system to store metadata for an associated user-defined column-organized table.


The synopsis table contains all the user table's non-character columns (that is, datetime,
Boolean, or numeric columns) and those character columns that are part of a primary key
or foreign key definition.

Note

As of Version 10.5 Fix Pack 4, the synopsis table for a new column-organized table also
includes CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC columns.

The synopsis table stores the minimum and maximum values for each column across a
range of rows and uses those values to skip over data that is of no interest to a query
during evaluation of certain types of predicates (=, >, >=, <, <=, BETWEEN, NOT
BETWEEN, IS NOT NULL, IN, and NOT IN).
The only supported operation against a synopsis table is a select operation.
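As a sketch, the generated synopsis table name can be looked up using the naming convention described above, and then queried with a SELECT (the only supported operation). The table name in the second statement is the example name shown in the next visual, not a name you can expect on another system:

SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'SYSIBM' AND TABNAME LIKE 'SYN%HISTORY'

SELECT TSNMIN, TSNMAX, ACCT_IDMIN, ACCT_IDMAX
FROM SYSIBM.SYN130617110037170122_HISTORY
ORDER BY TSNMIN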


Instructor notes:


Purpose — To provide information about the characteristics of synopsis tables.
Details —
Additional information —
Transition statement — Next we will look at the DESCRIBE TABLE output for a synopsis
table.


Sample Describe output for the Synopsis table associated with a column-organized Table
• Describe table SYSIBM.SYN130617110037170122_HISTORY

                                Data type
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
ACCT_IDMIN SYSIBM INTEGER 4 0 No
ACCT_IDMAX SYSIBM INTEGER 4 0 No
TELLER_IDMIN SYSIBM SMALLINT 2 0 No
TELLER_IDMAX SYSIBM SMALLINT 2 0 No
BRANCH_IDMIN SYSIBM SMALLINT 2 0 No
BRANCH_IDMAX SYSIBM SMALLINT 2 0 No
BALANCEMIN SYSIBM DECIMAL 15 2 No
BALANCEMAX SYSIBM DECIMAL 15 2 No
DELTAMIN SYSIBM DECIMAL 9 2 No
DELTAMAX SYSIBM DECIMAL 9 2 No
PIDMIN SYSIBM INTEGER 4 0 No
PIDMAX SYSIBM INTEGER 4 0 No
TSTMPMIN SYSIBM TIMESTAMP 10 6 No
TSTMPMAX SYSIBM TIMESTAMP 10 6 No
ACCTNAMEMIN SYSIBM CHARACTER 20 0 No
ACCTNAMEMAX SYSIBM CHARACTER 20 0 No
TEMPMIN SYSIBM CHARACTER 6 0 No
TEMPMAX SYSIBM CHARACTER 6 0 No
TSNMIN SYSIBM BIGINT 8 0 No
TSNMAX SYSIBM BIGINT 8 0 No

(TSNMIN/TSNMAX: low/high tuple sequence numbers for a range of rows)

Figure 3-28. Sample Describe output for the Synopsis table associated with a column-organized Table CL4636.0

Notes:
Assume a column-organized table is created with the following CREATE TABLE statement:
CREATE TABLE HISTORY
(ACCT_ID INTEGER NOT NULL,
TELLER_ID SMALLINT NOT NULL,
BRANCH_ID SMALLINT NOT NULL,
BALANCE DECIMAL(15,2) NOT NULL,
DELTA DECIMAL(9,2) NOT NULL,
PID INTEGER NOT NULL,
TSTMP TIMESTAMP NOT NULL WITH DEFAULT,
ACCTNAME CHAR(20) NOT NULL,
TEMP CHAR(6) NOT NULL)
ORGANIZE BY COLUMN
IN TSROWD INDEX IN TSROWI;


The DESCRIBE TABLE sample output shows the column names used for a synopsis table
that would be automatically created for that table.
Notice that the column names are prefixed with the names of the base table column and
suffixed by ‘MIN’ or ‘MAX’ and have the same data type as the base table column.
There is also a pair of columns named ‘TSNMIN’ and ‘TSNMAX’ to store the starting and
ending TSN ids for the set of data rows.


Instructor notes:
Purpose — To show the generated column names for a synopsis table.
Details —
Additional information —
Transition statement — Next we will describe some of the other techniques used by DB2
to reduce processing costs for access to column-organized tables.


Creative approaches to reducing processing costs
for queries with column-organized tables
• With row-organized tables processing costs tend to be closely
linked to the number of rows accessed
– Row compression requires the entire row to be uncompressed to
access any column data for joins or predicate evaluation
– Row organization may require more pages to be accessed in buffer
pools and read from disk
• With column-organized tables
– Late decompression, the ability to operate directly on compressed
data for certain operations, thereby reducing memory usage
– A vector processing engine for processing vectors of column data
instead of individual values.
– Improved system scaling across cores
• Many operations on column-organized tables designed for intra-parallel
processing
– Multiplied CPU power that uses single instruction, multiple data
(SIMD) processing for many operations.


Figure 3-29. Creative approaches to reducing processing costs for queries with column-organized tables CL4636.0

Notes:
With row-organized tables, we tend to associate costs with each row accessed by a query. With row-based compression, the entire data row needs to be uncompressed before any predicate can be applied to a column value or a join can be performed with another table. In some cases, an index could be used to bypass the need to access the full data row.
To access one million rows of data, the data pages containing the full rows will need to be
accessed in a buffer pool and possibly read from disk.
The design of column-organized tables uses a technique referred to as late
decompression, where some predicates and operations like joins can operate directly on
the compressed form of the column value.
Column-organized processing also utilizes a technique of vector processing for column
data rather than performing operations once per data row.
The new routines that were developed to support access to column-organized tables were
designed to drive parallel processing by multiple CPU cores.


In some cases, DB2 leverages hardware machine instructions, termed SIMD for Single Instruction Multiple Data, that improve the efficiency of handling sets of data instead of operating on each single column value.


Instructor notes:


Purpose — To briefly introduce some of the new technologies that work together in
column-organized processing.
Details —
Additional information —
Transition statement — Next we will discuss the deep hardware exploitation used for
column-organized tables.


Big Idea #5 – Deep Hardware Instruction Exploitation SIMD
(Figure: a single SIMD instruction applied to a register of encoded data values, producing multiple results at once)
• Actionable compression
  – Order-preserving encoding allows predicates to be evaluated on compressed data
• SIMD (Single Instruction Multiple Data) parallelism used for fast predicate evaluation on multiple compressed values
• Avoiding decompression during predicate evaluation provides significant query performance gains


Figure 3-30. Big Idea #5 - Deep Hardware Instruction Exploitation SIMD CL4636.0

Notes:
The fifth big idea used for DB2 BLU Acceleration is the use of special hardware instructions to work on multiple data elements with a single instruction. This is known as Single Instruction Multiple Data, or SIMD. Special instructions are available on various hardware platforms to accomplish this, and DB2 10.5 includes special BLU Acceleration code to exploit them. Currently, DB2 can put as much as 128 bits of data into a SIMD register.
The implementation for column-based compression allows 'actionable compression', meaning a number of operations can use the compressed form of a column value directly. The compression technique is 'order preserving', meaning DB2 can perform operations like 'GREATER THAN' and 'LESS THAN' checks, not just 'EQUAL TO' comparisons, on the compressed column values.
DB2 can use a single machine instruction to multiply the power of the CPU, getting results for all of the data values packed into the register. This means DB2 can use the power of the CPUs not only for scans but also for joins and aggregations.


Instructor notes:


Purpose — To discuss in more detail the use of special machine instructions that can
reduce processing costs for column-organized tables.
Details —
Additional information —
Transition statement — Next we will discuss a simple example of how vector processing
can be used by DB2 for column-organized tables.


Vector processing engine for processing vectors of column data instead of individual values
When column data is encoded and stored in a page, the column
values that share the same encoding technique are stored
together as a processing vector
For example:
– DB2 scanned the input and found in a particular column, CITY_NAME
contained 15 values that occurred in many rows
– Those 15 values are each assigned a unique 4-bit code in the column
dictionary for the CITY_NAME column for the table
– DB2 can store the column values for 16 rows into a 64-bit unit of
storage
– When processing that column, DB2 can perform some operations on
the 64-bit unit of storage rather than processing each of the 16 column
values one at a time


Figure 3-31. Vector processing engine for processing vectors of column data instead of individual values CL4636.0

Notes:
One technique used by DB2 to reduce processing costs for queries using column-organized tables is vector processing. This utilizes the encoded data that DB2 stores based on the column-based compression dictionaries.
For example, if a table contained a column named CITY_NAME, DB2 might find that fifteen values occurred in many data rows, like 'NEW YORK', 'PARIS' or 'TOKYO'. DB2 can assign unique 4-bit codes to each of the fifteen values.
As rows of data are loaded into the table, DB2 creates pages with values from the CITY_NAME column. The values for any rows that match one of the fifteen encoded cities can be stored together in 64-bit units of storage, each holding sixteen encoded values.
Some operations can load the 64-bits of data into a register to be acted on at the same time
instead of repeating a single column operation sixteen times.
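Restating the arithmetic behind this example (the numbers come from the visual; the general rule for deriving the code width is an assumption):

15 frequent values fit in 16 possible codes -> 4 bits per encoded value
64-bit storage unit / 4 bits per value      -> 16 column values per unit
1 register operation per unit instead of 16 single-value operations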


Instructor notes:


Purpose — To discuss the use of vectors of column values compared to operations on
single column values. This should stay at a very high level of discussion. The internal
storage methods are complex and subject to change over time. The concept is that DB2
utilizes the work done to compress the data to reduce processing costs during data access.
Details —
Additional information —
Transition statement — Next we will discuss core friendly processing.


Big Idea #6 – Core friendly parallelism
• Column-organized processing optimized to the physical attributes of the server
  – Queries on BLU Acceleration tables automatically parallelized
• Maximizes CPU cache and cacheline efficiency


Figure 3-32. Big Idea #6 - Core friendly parallelism CL4636.0

Notes:
DB2 10.5 with BLU Acceleration pays close attention to multi-core parallelism. DB2
with BLU Acceleration is designed from the ground up to take advantage of the cores you
have and to always drive multi-core parallelism for the queries you have. This is all done in
shared memory – this is not the inter-partition parallelism used for database partitioning.
Part of the code design included processing that would optimize efficiency of hardware
memory and cache operations to improve performance.


Instructor notes:


Purpose — To discuss the inclusion of processing that drives multiple CPU cores
efficiently.
Details —
Additional information — In the past, the DB2 product tended to approach complex
analytical queries with parallel processing based on inter-partition parallelism which used
multiple database partitions to drive multiple system processors for parallel processing. The processing for column-organized tables is designed for efficient intra-parallel processing,
which does not require multiple database partitions to drive many CPU cores.
Transition statement — Next we will discuss the link between column-organized
processing and DB2 intra-parallel processing.


Intraquery parallelism and intrapartition parallelism required for column-organized tables
• The processing of a query against column-organized tables
requires that intraquery parallelism be enabled for the
application
• The following statement types require intraquery parallelism:
– All DML operations that reference column-organized tables
– The ALTER TABLE ADD UNIQUE / PRIMARY KEY CONSTRAINT
statement against a column-organized table if rows have been
inserted into the table
– The RUNSTATS command against a column-organized table
– The LOAD command against a column-organized table
– If an application attempts to execute one of these statements or
commands without intraquery parallelism enabled, an error is returned
• Setting DB2_WORKLOAD=ANALYTICS implicitly enables
intrapartition parallelism for workload objects created with
MAXIMUM DEGREE set to DEFAULT.


Figure 3-33. Intraquery parallelism and intrapartition parallelism required for Column-organized tables CL4636.0

Notes:
The processing of a query against column-organized tables requires that intraquery
parallelism be enabled for the application that is compiling and executing the query. The
following statement types require intraquery parallelism:
• All DML operations that reference column-organized tables
• The ALTER TABLE ADD UNIQUE / PRIMARY KEY CONSTRAINT statement against a
column-organized table if rows have been inserted into the table
• The RUNSTATS command against a column-organized table
• The LOAD command against a column-organized table
If an application attempts to execute one of these statements or commands without
intraquery parallelism enabled, an error is returned.
Access to column-organized tables also requires intrapartition parallelism.
Setting DB2_WORKLOAD=ANALYTICS implicitly enables intrapartition parallelism for
workload objects created with MAXIMUM DEGREE set to DEFAULT, and is recommended


when using column-organized tables. If this registry variable is not set, or is set to another
value, intrapartition parallelism must be explicitly enabled before column-organized tables
can be accessed.
Intrapartition parallelism can be enabled or disabled explicitly by using one of the following
methods (sketched in the example after this list):
• Instance level. Intrapartition parallelism can be enabled for the instance by setting the
intra_parallel database manager configuration parameter to YES. This setting is not
dynamic and requires the instance to be restarted.
• Database level. Intrapartition parallelism can be enabled to apply to all applications that
map to a particular workload. To enable intrapartition parallelism for a workload, set its
MAXIMUM DEGREE attribute to a value greater than 1. This setting is dynamic and will
take effect at the beginning of the next unit of work. Users without a specific workload
management configuration can set the MAXIMUM DEGREE attribute on
SYSDEFAULTUSERWORKLOAD to enable intrapartition parallelism at the database
level. Users with a customized workload management configuration can use this
attribute to enable intrapartition parallelism for a specific workload, such as a subset of
applications that access column-organized tables.
• Application level. Intrapartition parallelism can be enabled for an individual application
by calling the ADMIN_SET_INTRA_PARALLEL procedure. The setting takes effect at
the start of the application's next unit of work. This is a good option if you want
intrapartition parallelism disabled at the instance level, but enabled for specific
applications that access column-organized tables.
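A minimal sketch of the three methods, in the order described above; the degree value of 4 is an arbitrary example:

-- Instance level (not dynamic; restart the instance afterward):
db2 UPDATE DBM CFG USING INTRA_PARALLEL YES

-- Database/workload level (dynamic, takes effect at the next unit of work):
ALTER WORKLOAD SYSDEFAULTUSERWORKLOAD MAXIMUM DEGREE 4

-- Application level (takes effect at the start of the application's next unit of work):
CALL ADMIN_SET_INTRA_PARALLEL('YES')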


Instructor notes:
Purpose — To discuss the requirement for DB2 intra-parallel processing to support many
functions associated with column-organized tables. In a simple database system, the
setting of DB2_WORKLOAD to ANALYTICS is all that is needed to enable this feature. In a
mixed application environment, some additional configuration might be required.
Details —
Additional information —
Transition statement — Next we will discuss scan-friendly memory caching.



Big Idea #7 – Scan friendly memory caching
(Figure: a buffer pool in RAM providing near optimal caching, with the remaining pages read from disk)
• Memory optimized (not "In-Memory")
  – No need to ensure all data fits in memory
• BLU includes new scan-friendly victim selection to keep a near optimal % of pages buffered in memory
  – Traditional RDBMSes use 'most recently used' victim selection for large scans
    • "There's no hope of caching everything, so just victimize the last page read"
  – A key BLU design point is to run well when all data fits in memory, and when it doesn't!
• Even with large scans, BLU prefers selected pages in the bufferpool, using an algorithm that adaptively computes a target hit ratio for the current scan, based on the size of the bufferpool, the frequency of pages being re-accessed in the same scan, and other factors
  – Benefit: less I/O!


Figure 3-34. Big Idea #7 - Scan friendly memory caching CL4636.0

Notes:
DB2 10.5 column-organized table support includes an enhanced caching strategy for buffer
pools to substantially reduce I/O.
With row-organized tables, the need to uncompress the entire row and read the full row of
columns into the buffer pool influenced the query processing to perform all of the
operations on the columns for that row at the same time.
DB2 uses unique processing logic for column-organized tables that marks buffer pool pages containing column data that will likely need to be accessed again, so that they are retained longer in the buffer pool, avoiding extra I/Os.
The processing for column-organized tables is designed to fully utilize buffer pool memory, but is also optimized to handle larger tables that exceed buffer pool capacity.


Instructor notes:
Purpose — To explain that DB2 developed new processing logic for column-organized tables to keep high-value pages of column data in buffer pool memory and avoid extra I/O operations to reread pages.
Details —
Additional information —
Transition statement — Next we will discuss the use of dynamic prefetch for column
organized tables.


Dynamic list prefetching for column-organized table access
• Dynamic list prefetching, a new prefetching type that is used in
query execution plans that access column-organized tables
– Dynamic list prefetching is used to prefetch only those pages that are
accessed while a specific portion of a table is scanned
• Page map index allows pages for selected columns to be read
• Synopsis table can be used to bypass pages that do not contain matches
for predicates
– Maximizes the number of pages that are retrieved by asynchronous
prefetching
– Minimizes synchronous reads by queuing work until the required
pages are loaded into the buffer pool
• DB2 uses special processing logic that drives work to be
performed when the input data is available rather than have
subagents waiting for data to arrive


Figure 3-35. Dynamic List prefetching for column-organized table access CL4636.0

Notes:
Dynamic list prefetching is used to prefetch exactly those pages that will be accessed while
scanning a specific portion of the table.
This prefetch method maximizes the number of pages that are retrieved by asynchronous
prefetching (while minimizing synchronous reads) by queuing work until the required pages
have been loaded into the buffer pool.
The number of pages that each subagent can prefetch simultaneously is limited by the
prefetch size of the table space being accessed (PREFETCHSIZE). The prefetch size can
have significant performance implications, particularly for large table scans.
The PREFETCHSIZE clause on either the CREATE TABLESPACE or the ALTER
TABLESPACE statement traditionally lets you specify the number of prefetched pages that
will be read from the table space when prefetching is performed. The value that you specify
(or AUTOMATIC) is stored in the PREFETCHSIZE column of the SYSCAT.TABLESPACES
catalog view. Although dynamic list prefetching typically prefetches up to PREFETCHSIZE
pages at a time, this might not always be possible and in some cases, the
PREFETCHSIZE value might be automatically adjusted for performance reasons. If you


observe I/O waits while a query is prefetching data using dynamic list prefetching, try
increasing the value of PREFETCHSIZE.
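For example, a sketch only (TSCOLD is an assumed table space name, and 256 is an arbitrary value to be tuned to the storage configuration):

ALTER TABLESPACE TSCOLD PREFETCHSIZE 256

or, to let DB2 manage the value:

ALTER TABLESPACE TSCOLD PREFETCHSIZE AUTOMATIC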
The processing for column-organized tables included logic to drive work to DB2 agents
when data is available in the buffer pool to avoid agents performing synchronous waits for
data to operate on.


Instructor notes:


Purpose — To discuss some of the new data prefetch logic that was implemented for
column-organized tables. For row-organized tables, DB2 tended to use sequential
prefetching where all pages in a table were read, or prefetching based on an index on the
table. The page map index provides a simple method to locate all of the extents holding
data for one column of a table, which can be efficiently prefetched.
Details —
Additional information —
Transition statement — Next we will discuss an example where all of these features of
column-organized tables can work together to provide a fast result for a query.


BLU Acceleration illustration: 10TB query in seconds - Register encoded vector processing
• The System: 32 cores, 1TB memory, 10TB table with 100 columns and 10 years of data
• The Query: How many "sales" did we have in 2010?
  – SELECT COUNT(*) from MYTABLE where YEAR = '2010'
• The Result: In seconds or less, as each CPU core examines the equivalent of just 8MB of data
(Figure: 10TB of data reduced in stages - Actionable Compression reduces to 1TB; Column Processing reduces to 10GB; Data Skipping reduces to 1GB in-memory; Parallel Processing yields a 32MB linear scan on each core; Vector Processing through SIMD scans as fast as 8MB; result in seconds or less)

Figure 3-36. BLU Acceleration illustration 10TB query in seconds - Register encoded vector processing CL4636.0

Notes:
Prior to the implementation of DB2 10.5 with BLU Acceleration, the idea of being able to get
a query result from a large table with 10 terabytes of data in a second or less without using
indexes would seem impossible.
Here’s how the design components of DB2 10.5 with BLU acceleration could possibly
achieve this.
Assume the system has 32 processor cores and 1TB of memory. The table has 10TB of
raw application data, 100 columns that hold ten years of information.
A simple query might want to count sales for one year, 2010.
• First, the extreme compression on column-organized tables might reduce the raw 10TB
of data into 1TB of storage.
• If the query only accesses one column, then the storage for 99 columns can be
bypassed, so the remaining one percent might be 10GB of data.


• Using the synopsis table to perform data skipping, we may be able to bypass reading the other 9 years, which is 90 percent, so the query now only needs 1GB of the 10GB for the one column.
• Since the system has 32 CPU cores, the scan could be divided and processed in parallel, with 32MB processed by each core.
• If the column data is processed using vectors that handle four column values per operation, the processing per core is now reduced to the equivalent of 8MB of data.
All the processing techniques could work together to reduce the query processing to an
amount of processing that could complete quickly.
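Restating the arithmetic from the assumptions above:

10 TB  x 1/10  (compression)               =  1 TB
1 TB   x 1/100 (1 column of 100)           = 10 GB
10 GB  x 1/10  (1 year of 10, skipping)    =  1 GB
1 GB   / 32    (CPU cores)                 = 32 MB per core
32 MB  / 4     (vector width)              =  8 MB equivalent per core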


Instructor notes:
Purpose — To discuss how all the processing features of DB2 BLU Acceleration could work together to let what would seem like a very large query be processed efficiently.
There are many simple assumptions in this that may not apply to every query, the main
point is the features working together to process a large amount of data and return a result
quickly.
Details —
Additional information —
Transition statement — Next we will look at the different storage structure for
column-organized tables.



Storage Objects for Column-Organized Tables
(Figure: the Column-Organized Storage Object (COL) holds the user table data and empty pages; the Data Object (DAT) holds meta data, including the dictionaries; the Index Object (INX) holds the page map index and any unique/primary key index)
• Row-organized table has 1 internal data object
• Column-organized table has 2 internal storage objects for data
  – Column-Organized Storage Object: user data + empty pages
  – Data Object: meta data, including column-level dictionaries


Figure 3-37. Storage Objects for Column-Organized Tables CL4636.0

Notes:
Similar to the implementation of XML data, DB2 implemented a new column-organized storage object for column-organized tables.
For a row-organized table, the standard type data columns, excluding the LONG and XML
data that are not stored inline, are stored in the data object for the table.
For a column-organized table, the user column data is stored in a set of pages termed the
column-organized storage object. The column dictionaries and some other table metadata
are stored in the data storage object for the table.
The page map index which DB2 creates for each column-organized table will be stored in
the index storage object for the table. DB2 will also use the index storage object for any
primary key or unique indexes defined on the table.


Note

One major impact of using a new storage object for column-organized tables is the reporting of access to the new type of page in DB2 monitoring statistics.


Instructor notes:


Purpose — To introduce the use of a new storage object for column-organized tables.
Details —
Additional information —
Transition statement — Next we will see how we can use the function
ADMIN_GET_TAB_INFO to analyze space usage for column-organized tables.


Monitoring Component Object Allocations using ADMIN_GET_TAB_INFO
select SUBSTR(TABSCHEMA,1,10) AS SCHEMA ,
SUBSTR(TABNAME,1,12) AS TABLE ,
DATA_OBJECT_P_SIZE, INDEX_OBJECT_P_SIZE , COL_OBJECT_P_SIZE,
COL_OBJECT_L_SIZE
FROM TABLE ( ADMIN_GET_TAB_INFO ('TEST',NULL ) ) AS TABINFO
order by TABNAME

SCHEMA  TABLE    DATA_OBJECT_P_SIZE   INDEX_OBJECT_P_SIZE  COL_OBJECT_P_SIZE COL_OBJECT_L_SIZE
------- -------- -------------------- -------------------- ----------------- -----------------
TEST ACCT 256 512 5248 5248
TEST BRANCH 256 256 1152 1152
TEST HISTORY 768 512 7424 7424
TEST TELLER 256 256 1280 1280

4 record(s) selected.


Figure 3-38. Monitoring Component Object Allocations for using ADMIN_GET_TAB_INFO CL4636.0

Notes:
The table function ADMIN_GET_TAB_INFO now includes new columns of information, COL_OBJECT_P_SIZE and COL_OBJECT_L_SIZE, that show the storage used for the column-organized storage object.
col_object_l_size - Amount of disk space logically allocated for the column-organized data in the table, reported in kilobytes.
col_object_p_size - Amount of disk space physically allocated for the column-organized data in the table, reported in kilobytes. The size returned takes into account full extents allocated for the table and includes the EMP extents for objects created in DMS table spaces. This size represents the physical size of the base columnar data only.
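A variation of the query (a sketch only) totals the physical space across all three object types for each table:

SELECT SUBSTR(TABNAME,1,12) AS TABLE,
       DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + COL_OBJECT_P_SIZE
       AS TOTAL_P_SIZE_KB
FROM TABLE ( ADMIN_GET_TAB_INFO ('TEST',NULL ) ) AS TABINFO
ORDER BY 2 DESC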


Instructor notes:


Purpose — To show how ADMIN_GET_TAB_INFO can be used to query current storage
utilization for column-organized tables.
Details —
Additional information —
Transition statement — Next we will see an example of the INSPECT report for a
column-organized table.


Using INSPECT CHECK TABLE for column-organized tables
Action: CHECK TABLE
Schema name: TEST
Table name: HISTORY
Tablespace ID: 6 Object ID: 6
Result file name: insphist.dat

Table phase start (ID Signed: 6, Unsigned: 6; Tablespace ID: 6) : TEST.HISTORY

Data phase start. Object: 6 Tablespace: 6                     <== the Data Object (metadata)
The index type is 2 for this table.
Traversing DAT extent map, anchor 56.
Extent map traversal complete.
DAT Object Summary: Total Pages 10 - Used Pages 2 - Free Space 12 %
Data phase end.

Column-organized object phase start. Object: 6 Tablespace: 6  <== the column-organized Object
Traversing COL extent map, anchor 72.
Extent map traversal complete.
COL Object Summary: Total Pages 224 - Used Pages 224
Column-organized object phase end.

Index phase start. Object: 6 Tablespace: 6                    <== the Index Object
Traversing INX extent map, anchor 120.
Extent map traversal complete.
INX Object Summary: Total Pages 8 - Used Pages 4
Index phase end.
Table phase end.
Processing has completed. 2013-06-17-15.35.24.015648


Figure 3-39. Using INSPECT CHECK TABLE for column-organized tables CL4636.0

Notes:
The visual shows an example of the INSPECT CHECK TABLE report generated for a column-organized table. The report includes the column-organized object as a new section, in addition to the data and index objects; this should be the largest portion of the storage for a column-organized table.


Instructor notes:


Purpose — To show an example of the INSPECT CHECK TABLE report for a
column-organized table.
Details —
Additional information —
Transition statement — Next we will discuss some considerations for using the LOAD
utility with column-organized tables.


LOAD utility considerations for column-organized tables
• For row-organized tables, LOAD builds new extents
• For column-organized tables, LOAD will begin to fill the last partially filled page for each data column
• Since each column has its own set of extents, continuing to fill the last page for each column avoids leaving many unused pages
• STATISTICS: Default for column-organized tables is YES
• Some load options not currently supported for column-organized tables:
  – LOAD RESTART
  – SAVECOUNT
  – ALLOW READ ACCESS


Figure 3-40. LOAD utility considerations for column-organized tables CL4636.0

Notes:
For row-organized tables the LOAD utility builds new extents of data pages with the data
rows provided as input. If the operation is a LOAD INSERT into a table with existing data, it does not store new data into any pages where data is already stored.
For column-organized tables, when the LOAD utility is used for a LOAD INSERT into a
non-empty table, DB2 will find the last partially filled page for each column of the table and
continue to fill that page and extent.
For column-organized tables, the default LOAD option is STATISTICS USE PROFILE. A
LOAD INSERT operation into a column-organized table maintains table statistics by default
if the table was empty at the start of the load operation.
Some LOAD options, including RESTART, SAVECOUNT and ALLOW READ ACCESS are
not currently supported for column-organized tables.
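A minimal sketch of a load into the HISTORY table used earlier (the input file name is an assumption):

LOAD FROM history.del OF DEL
  REPLACE INTO TEST.HISTORY

Because REPLACE empties the table first, the table is empty at the start of the load, so table statistics are maintained by default (STATISTICS USE PROFILE).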


Instructor notes:


Purpose — To discuss some considerations for using the LOAD utility for
column-organized tables.
Details —
Additional information —
Transition statement — Next we will discuss the processing for INSERT statements for
column-organized tables.


INSERT processing for column-organized tables - INSERT updates many pages compared to row-organized tables

CUST_NAME       AGE  ADDR_STREET     ADDR_CITY     STATE  ZIPCODE
John Piconne    47   18 Main Street  Springfield   MA     01111
Susan Nakagawa  32   455 N. 1st St.  San Jose      CA     95113
Sam Gerstner    55   911 Elm St.     Toledo        OH     43601
Chou Zhang      22   300 Grand Ave   Los Angeles   CA     90047

• A table with 50 columns must change 50 pages for each row inserted
• DB2 uses buffered insert processing to handle applications performing many INSERTs
• Newly Inserted rows are always stored in the last partially filled pages assigned to each
column
• DB2 internally manages multiple concurrent applications performing INSERTs to avoid
page contention
• Once a page is filled, no additional column data will be added to the page

Figure 3-41. INSERT processing for column-organized tables - INSERT updates many pages compared to row-organized tables CL4636.0

Notes:
Applications that use SQL INSERT, including DB2 IMPORT and INGEST, can run against column-organized tables.
The storage for column-organized tables is optimized for SELECT processing.
Since every column of the column-organized table is stored on a different data page, the
processing for inserting a new row of data will need to access and change many pages. For
example, inserting one new row in a table with fifty columns will need change the data in
fifty pages, which will require logging.
When a row is inserted, the new columns of data will extend the table, using partially filled
pages for each column. Once a page for a particular column is filled, no additional column
data will be directed to that page, even if other column data is later marked deleted.
DB2 does some special internal processing for inserting rows into column-organized tables
that buffers the new data, to reduce the processing overhead when applications are
inserting many new rows. That processing also manages the handling for multiple


concurrent applications performing inserts into the same table to avoid contention for the
pages where new column data will be stored.


Instructor notes:
Purpose — To discuss the different processing required for inserting new rows into a
column-organized table compared to a row-organized table.
Details —
Additional information — The RESETDICTIONARYONLY mode could be used if all data
will be added using INSERT processing.
Transition statement — Next we will discuss processing for UPDATE and DELETE
statements for column-organized tables.


Processing for DELETE and UPDATE SQL
statements with column-organized tables
• When an application DELETES a data row
– The data for each column is flagged as deleted in the page for each column
– The space is not available for reuse for inserting new rows
– If many rows are deleted that were stored in sequence, then pages containing the
column data may contain all delete flagged entries
– Extents that contain pages where all of the column data has been flagged as
deleted can be released using a REORG with RECLAIM EXTENTS
• When an application UPDATES a data row
– Updates are processed using a DELETE of the old data row and an INSERT for
the changed row
– This impacts every page containing columns for the data row
– Like DELETE the original column storage is flagged as deleted but no space is
made available
– An application performing massive updates to a column-organized table will
require a large amount of additional pages to contain the column data for the
changed rows.


Figure 3-42. Processing for DELETE and UPDATE SQL statements with column-organized tables CL4636.0

Notes:
When a row in a column-organized table is deleted, the row is logically (but not physically)
deleted. As a result, the space that is used by the deleted row is not available to
subsequent transactions, and remains unusable until space reclamation occurs.
For example, consider the case where a table is created and 1 million rows are inserted
in batch operation A. The size of the table on disk after batch operation A is 5 MB. After
some time, batch operation B inserts another 1 million rows. Now the table consumes
10 MB on disk. At this point, all of the rows that were inserted in batch operation A are
deleted, and the table size on disk remains 10 MB. If a third batch operation C inserts
another 1 million rows into the table, 5 MB of additional space is required.
With row-organized tables, the rows that are inserted in batch operation C would use
the space that was vacated by the deleted rows from batch operation A.
A REORG TABLE command is required to reclaim the space that was used by the rows
inserted in batch operation A.
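For example, a sketch using the HISTORY table from earlier examples:

REORG TABLE TEST.HISTORY RECLAIM EXTENTS

This releases extents in which all of the column data has been flagged as deleted, as described earlier in this unit.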


When a row in a column-organized table is updated, the row is first deleted and a new copy
of the row is inserted at the end of the table.
This means that an updated row consumes space in proportion to the number of times the
row has been updated until space reclamation occurs.
All rows in the extent where the update took place must be deleted before any space
reclamation will occur.
The number of pages impacted by UPDATE and DELETE processing for
column-organized tables would be expected to generate additional logging compared to
similar processing for row-organized tables.


Instructor notes:


Purpose — To discuss the differences in processing for UPDATE and DELETE statements
for column-organized tables.
Details — Column-organized tables are optimized for SELECT processing with a small
number of columns being referenced. The type of compression used for column-organized
tables, encoding a column of data into several bits for storage in an array, makes updates
and deletes more complex and space reuse difficult. For example, if you changed a column
value for one column, it is possible that the new value may not fit in the same space. The
concept of a tuple sequence number (TSN) means that you can not move the data of one
column to another page without moving all of the column data as if it was a newly inserted
row.
Additional information —
Transition statement — Next we will summarize the topics covered in this lecture on
column-organized tables.


Unit summary
Having completed this unit, you should be able to:
• Describe how the column dictionaries used to provide extreme
compression of column-organized tables are built and utilized
• Explain the impact of setting the DB2 registry variable
DB2_WORKLOAD to ANALYTICS
• List the seven ‘big ideas’ that work together to provide DB2
BLU acceleration
• Describe the different storage used for column-organized table
compared to row-organized tables
• Explain how DB2 uses a synopsis table to support data
skipping with column-organized tables


Figure 3-43. Unit summary CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —



Unit 4. DB2 10.5 BLU Acceleration Implementation and Use

Estimated time
01:30

What this unit is about


This unit describes the implementation of DB2 BLU Acceleration,
Column-organized tables, in DB2 10.5. We will discuss how to
implement database support for column-organized tables. The use of
the db2convert command and the ADMIN_MOVE_TABLE procedure
to convert row-organized tables to column-organized tables will be
covered. We will see examples of SQL statements that reference a
mixture of column-organized and row-organized tables. DB2 explain
reports will be used to help understand the differences in processing
for queries that access column-organized tables, compared to access
for row-organized tables. We will discuss how to monitor the
processing of applications using DB2 BLU Acceleration.

What you should be able to do


After completing this unit, you should be able to:
• Implement DB2 BLU Acceleration support for a new or existing
DB2 database
• Configure a DB2 database that uses DB2 BLU Acceleration,
column-organized tables, including sort memory and utility heap
memory considerations
• Describe the default workload management used for DB2 BLU
Acceleration processing and how you can tailor the WLM objects to
efficiently use system resources
• Monitor a DB2 database or application that uses column-organized
tables using SQL monitor functions
• Locate the column-organized processing portion of the access
plans for column-organized tables in DB2 explain reports
• Use db2convert or ADMIN_MOVE_TABLE to convert
row-organized tables to column-organized tables


References
The IBM Knowledge Center for DB2 10.5:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/index.jsp
Upgrading to DB2 Version 10.5 SC27-5513-00
What's New for DB2 Version 10.5 SC27-5519-00



Unit objectives
After completing this unit, you should be able to:
• Implement column-organized table support for a new or
existing DB2 database
• Configure a DB2 database that uses DB2 column-organized
tables, including sort memory and utility heap memory
• Describe the default workload management used for DB2 BLU
Acceleration processing and how you can tailor the WLM
objects to efficiently use system resources
• Monitor a DB2 database or application that uses column-
organized tables using SQL monitor functions
• Locate the column-organized processing portion of the access
plans for column-organized tables in DB2 explain reports
• Use db2convert or ADMIN_MOVE_TABLE to convert row-
organized tables to column-organized tables


Figure 4-1. Unit objectives CL4636.0

Notes:
These are the objectives for this lecture unit.


Instructor notes:
Purpose — To introduce the content of this unit.
Details —
Additional information —
Transition statement — Let’s first see an overview of the functionality in DB2 10.5 which
we call BLU Acceleration.


Planning for using column-organized tables

Considerations for implementing column-organized tables

Figure 4-2. Considerations for implementing column-organized tables CL4636.0

Notes:
In this section we will review some of the considerations for implementing
column-organized tables for a DB2 database.


Instructor notes:
Purpose — To introduce the planning for implementing column-organized tables.
Details —
Additional information —
Transition statement — Next we will review the system requirements for
column-organized tables.


System requirements for column-organized tables


• Linux (x86-x64, Intel and AMD processors) and AIX (POWER
processors)
• With DB2 10.5 Fix Pack 5 BLU Acceleration is now supported on these
additional operating systems:
– Linux on zSeries.
– Windows operating systems that are supported by DB2 Advanced Enterprise
Server Edition and DB2 Advanced Workgroup Server Edition.
• DB2 10.5 Fix Pack 4 (Cancun)
– Introduced HADR database support of column-organized tables
– SQL Compatibility features provided by the DB2_COMPATIBILITY_VECTOR
registry variable are available for column-organized tables.
• Database features not currently supported:
– pureScale cluster
– Database partitioning (DPF)
– Databases whose code set and collation are not UNICODE and IDENTITY or
IDENTITY_16BIT


Figure 4-3. System requirements for column-organized tables CL4636.0

Notes:
System and database configurations for column-organized tables:
In the initial DB2 10.5 release, column-organized tables are supported only on Linux
(x86-x64, Intel and AMD processors) and AIX (POWER processors).

Information

DB2 Version 10.5 Fix Pack 5 contains the functionality of the previous fix packs
and also includes the following enhancements:
• Support for column-organized tables was added for these operating systems:
- Linux on zSeries.


- Windows operating systems that are supported by DB2 Advanced Enterprise Server
Edition and DB2 Advanced Workgroup Server Edition.

The following system and database configurations do not support column-organized tables
in the DB2 Version 10.5 release.
• Databases in a pureScale environment
• Partitioned databases
• Databases that are created without automatic storage enabled
• Table spaces that are not enabled for reclaimable storage
• Databases whose code set and collation are not UNICODE and IDENTITY or
IDENTITY_16BIT
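
When you create a new database that will primarily use column-organized tables, the
usual first step is to set the DB2_WORKLOAD registry variable to ANALYTICS before the
CREATE DATABASE command, so that BLU-friendly configuration defaults are applied. A
minimal sketch, assuming a Linux or AIX instance owner session and a hypothetical
database name BLUDB:

db2set DB2_WORKLOAD=ANALYTICS
db2 CREATE DATABASE BLUDB
db2 CONNECT TO BLUDB
db2 GET DB CFG FOR BLUDB SHOW DETAIL

With DB2_WORKLOAD=ANALYTICS in effect at creation time, the dft_table_org (default
table organization) configuration parameter is set to COLUMN, so CREATE TABLE
statements produce column-organized tables unless ORGANIZE BY ROW is specified.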

Instructor notes:


Purpose — To review some of the system requirements and some DB2 features that are
not currently supported with column-organized tables.
Details —
Additional information —
Transition statement — Next we will look at some general system configuration
recommendations.


General system configuration recommendations for column-organized table usage

                          Small        Medium       Large
Raw data (CSV)            ~1TB         ~5TB         ~10TB
Minimum:
  #cores                  8            16           32
  Memory                  64GB         256GB        512GB
High-end performance:
  #cores                  16           32           64
  Memory                  128 – 256GB  384 – 512GB  1 – 2TB

Assumption: all data is active and equally “hot”.

Figure 4-4. General system configuration recommendations for column-organized table usage CL4636.0

Notes:
The visual shows a table with minimum and ‘high-end’ performance configurations based on
the amount of raw application data stored in the column-organized tables. One assumption
is that all of the table data will be accessed frequently.
In general the system processor and memory configurations are based on tests with
column-organized tables. The DB2 BLU Acceleration routines were designed for systems
with large system memory and multiple processors.

Instructor notes:


Purpose — To review some recommended system memory and processor configurations
for using column-organized tables.
Details —
Additional information —
Transition statement — Next we will discuss allocation of DB2 database shared memory
for sort processing.


Sort memory configuration for column-organized tables
• Ensure that the sortheap (sort heap) and sheapthres_shr (sort
heap threshold for shared sorts) database configuration
parameters ARE NOT set to AUTOMATIC
– STMM management of sort memory is not currently supported for
column-organized table processing
• Consider increasing these values significantly for analytics
workloads.
• Suggested sort memory configuration
– Set sheapthres_shr to the size of the buffer pool (across all buffer
pools)
– Set sortheap to some fraction (for example, 1/20) of sheapthres_shr to
enable concurrent sort operations.


Figure 4-5. Sort memory configuration for column-organized tables CL4636.0

Notes:
The processing for column-organized tables makes heavy use of database shared sort
memory. When column-organized tables are joined, the standard join logic is based on a
hash join, which uses sort memory to hold the data from the inner table of the join.
DB2 does not currently support self-tuning memory management for the sort
memory options, sortheap (sort heap) and sheapthres_shr (sort heap threshold for shared
sorts), with column-organized tables. These database configuration parameters cannot be
set to AUTOMATIC.
Consider increasing these values significantly for analytics workloads. A reasonable
starting point is setting sheapthres_shr to the size of the buffer pool (across all buffer
pools). Set sortheap to some fraction (for example, 1/20) of sheapthres_shr to enable
concurrent sort operations.
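
A minimal sketch of these settings, assuming a hypothetical database named BLUDB whose
buffer pools total roughly 400,000 4 KB pages; the values are illustrative of the 1/20
guideline above, not a sizing recommendation:

db2 UPDATE DB CFG FOR BLUDB USING SHEAPTHRES_SHR 400000
db2 UPDATE DB CFG FOR BLUDB USING SORTHEAP 20000

Here sheapthres_shr matches the aggregate buffer pool size and sortheap is set to 1/20 of
sheapthres_shr, so that roughly 20 concurrent sort or hash join operations can each obtain
a full sort heap.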

Instructor notes:


Purpose — To discuss the configuration of database shared memory for sorts when
column-organized tables will be used.
Details —
Additional information —
Transition statement — Next we will discuss some of the current restrictions for use with
column-organized tables.


Current restrictions for column-organized tables in DB2 10.5
• Schemas that include column-organized tables cannot be transported
• Data federation with column-organized tables is not supported
• For data replication with column-organized tables as source
– A CREATE TABLE statement with the DATA CAPTURE CHANGES
clause is not supported.
• Queries using the RR or RS isolation level are not supported with
column-organized tables
• Columns of type BLOB, DBCLOB, CLOB, or XML cannot be defined for
column-organized tables


Figure 4-6. Current restrictions for column-organized tables in DB2 10.5 CL4636.0

Notes:
The following additional restrictions apply to column-organized tables in the DB2 Version
10.5 release:
• Schemas that include column-organized tables cannot be transported.
• Data federation with column-organized tables is not supported.
• For data replication with column-organized tables as either source or target:
• A CREATE TABLE statement with the DATA CAPTURE CHANGES clause is not
supported.
• For label-based access control (LBAC), the following syntax is not supported: ALTER
TABLE...SECURITY POLICY policyname
• For row and column access control (RCAC), the following syntax is not supported:
ALTER TABLE...ACTIVATE ROW | COLUMN ACCESS CONTROL
CREATE PERMISSION / CREATE MASK


The following SQL statements do not support column-organized tables as target or source
tables:
CREATE EVENT MONITOR
CREATE GLOBAL TEMPORARY TABLE
CREATE INDEX and ALTER INDEX
CREATE MASK
CREATE NICKNAME
CREATE PERMISSION
CREATE TRIGGER
DECLARE GLOBAL TEMPORARY TABLE
SET INTEGRITY

Important

Queries using the RR or RS isolation level are not supported with column-organized tables.

XA transactions are not supported.


Instructor notes:
Purpose — To briefly cover some of the current restrictions for column-organized tables.
These items are table-level restrictions, not the database-level restrictions covered earlier,
like database partitioning and pureScale. The items covered here, like federation and
replication, apply to the column-organized tables in a database. The features could still be
used for the row-organized tables in the database.
Details —
Additional information —
Transition statement — Next we will discuss using the Optim Query Workload tuner to
analyze SQL workloads to decide which tables should be converted to use
column-organized tables.

IBM InfoSphere Optim Query Workload Tuner for DB2 for LUW estimates benefits for column-organized tables

• Advisor identifies candidate tables for conversion to columnar format.
• Analyzes SQL workload and estimates execution cost on row- and column-organized tables.

Figure 4-7. IBM InfoSphere Optim Query Workload Tuner for DB2 for LUW estimates benefits for column-organized tables CL4636.0

Notes:
IBM InfoSphere Optim Query Workload Tuner Version 4 includes the Workload Table
Organization Advisor, which examines all of the tables that are referenced by the
statements in a query workload.
Its recommendations lead to the best estimated performance improvement for the query
workload as a whole. The advisor presents its analysis and rationales so that you can see
the tables that are recommended for conversion from row to column organization.
This tool can generate the ADMIN_MOVE_TABLE procedure call statements to convert
row-organized tables to column-organized tables.
The Query Workload Tuner was enhanced to support the use of Column-organized MQT
tables and Shadow tables.


Instructor notes:
Purpose — To discuss using the Optim Query Workload Tuner, Version 4, which has been
updated to make recommendations for converting tables from row-organization to
column-organization.
Details — As a note, the db2advis command line Design Advisor does not currently make
any recommendations for column-organized tables.
Additional information —
Transition statement — Next we will discuss using a new tool db2convert to convert
row-organized table to column-organized tables.

db2convert – command line tool to ease converting row-organized tables to column-organized tables

• Converts one or all row-organized user tables in a specified database into
column-organized tables.
• The row-organized tables remain online during command processing.
• Calls the ADMIN_MOVE_TABLE procedure.
• For monitoring purposes, the command displays statistics about the conversion.

db2convert
  -d <database-name> (this is the only mandatory parameter)
  -stopBeforeSwap
  -continue (resumes a previously stopped conversion)
  -z <schema-name>
  -t <table-name>
  -sts <source tablespace>
  -ts <target tablespace for new table>
  -opt <ADMIN_MOVE_TABLE options> (e.g. COPY_USE_LOAD)

Figure 4-8. db2convert - command line tool to ease converting row-organized tables to column-organized tables CL4636.0

Notes:
With DB2 10.5 you can use the db2convert command to convert one or all row-organized
user tables into column-organized tables in a specified database.
The row-organized tables remain online during command processing. The command
displays statistics about the conversion for monitoring purposes.
You must have SQLADM or DBADM authority to invoke the ADMIN_MOVE_TABLE stored
procedure, on which the db2convert command depends. You must also have the
appropriate object creation authorities, including the authority to issue the SELECT
statement on the source table.
The syntax for the db2convert command is as follows:

db2convert -d database_name [-stopBeforeSwap | -continue]
    [-u creator] [-z schema [-t table_name]]
    [-ts target_tablespace_name |
      -dts data_tablespace_name -its index_tablespace_name]
    [-sts source_tablespace_name]
    [-opt AMT_options]   (default: COPY_USE_LOAD)
    [-trace]
    [-usr userid -pw password] [-force]
    [-o output_file_name] [-check]

Important

Important: Range partitioned tables, MDC tables, and ITC tables are not converted by
default. To convert these table types, use the -force option.
Tables in partitioned database environments, tables in non-automatic storage table spaces,
tables that have generated columns, and tables with columns of type BLOB, DBCLOB,
CLOB, or XML cannot be converted into column-organized tables.

Because this utility calls the ADMIN_MOVE_TABLE stored procedure, the utility inherits all
restrictions that apply to the ADMIN_MOVE_TABLE stored procedure.
Example:
To convert all of the tables in MYDATABASE into column-organized tables, issue the
following command (after performing a full database backup operation):
db2convert -d mydatabase

After an initialization period, the command output shows information about the table
size and compression ratios, as shown in the following example:
Table NumRows RowsComm InitSize(MB) FinalSize(MB) CompRate(%) Progress(%)
--------------- -------- -------- ------------- -------------- ------------ ---------------
USER.TABLE1 1500000 0 105.47 0.26 99.76 0
USER.TABLE2 1500000 0 105.47 0.26 99.76 0
USER.TABLE3 1500000 0 105.47 0.26 99.76 0

Total Pre-Conversion Size (MB): 316.42
Total Post-Conversion Size (MB): 0.77
Total Compression Rate (Percent): 99.76

The following table types cannot be converted with db2convert:
• Range clustered tables
• Typed tables
• Materialized query tables
• Declared global temporary tables
• Created global temporary tables


Instructor notes:
Purpose — To introduce the command line tool, db2convert, that can be used to convert
row-organized tables to column-organized tables.
Details —
Additional information —
Transition statement — There are some additional notes about using db2convert.


Additional notes for db2convert usage


• The following table attributes are not used during conversion to column-organized
tables:
– Triggers
– Secondary indexes
• If they are not required, drop any dependent objects that cannot be transferred to
column-organized tables before invoking the db2convert command.
• The following table attributes are used as NOT ENFORCED during conversion to
column-organized tables:
– Foreign keys
– Check constraints
• The table conversion process temporarily requires space for both the source and
the target tables.
• There is no online process to convert column-organized tables back to row-organized
tables, so the best practice is to perform a backup before you convert the tables to
column organization.
• If the database is recoverable and the default COPY_USE_LOAD option is used,
performing the conversion in three separate steps is strongly recommended:
1. Invoke the db2convert command, specifying the -stopBeforeSwap option.
2. Perform a manual online backup of the target table space or table spaces.
3. Invoke the db2convert command, specifying the -continue option.


Figure 4-9. Additional notes for db2convert usage CL4636.0

Notes:
Here are some additional notes regarding using db2convert to convert row-organized
tables to column-organized tables.
The following table attributes are not used when converting to column-organized tables:
• Triggers
• Secondary indexes
If they are not required, drop any dependent objects that cannot be transferred to
column-organized tables before invoking the db2convert command.
The following table attributes are used as NOT ENFORCED when converting to
column-organized tables:
• Foreign keys
• Check constraints
The table conversion process temporarily requires space for both the source and the target
table. Because there is no online process to convert column-organized tables back to

row-organized tables, the best practice is to perform a backup before converting the tables
to column organization.

If the database is recoverable and the default COPY_USE_LOAD option is used,
performing the conversion in three separate steps is strongly recommended:
• Invoke the db2convert command, specifying the -stopBeforeSwap option.
• Perform a manual online backup of the target table space or table spaces.
• Invoke the db2convert command, specifying the -continue option.
If the table being converted has foreign key (referential integrity) constraints, a long offline
phase for the table is to be expected during conversion.
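
A minimal command sequence for the three-step approach above, assuming a hypothetical
database BLUDB, a table APP.SALES to be converted, a target table space TS_COL, and a
backup path of /backups:

db2convert -d BLUDB -z APP -t SALES -ts TS_COL -stopBeforeSwap
db2 BACKUP DATABASE BLUDB TABLESPACE (TS_COL) ONLINE TO /backups
db2convert -d BLUDB -z APP -t SALES -continue

The online table space backup between the two db2convert invocations ensures that the
LOAD-copied, column-organized data is recoverable before the swap makes the new table
visible to applications.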

Instructor notes:


Purpose — To cover some additional notes on using the db2convert command.
Details —
Additional information —
Transition statement — Next we will look at an example of the output from running a
db2convert command.


Sample db2convert output shows progress and compression results

Table                 RowsNum         RowsComm        Status          Progress (%)
--------------------- --------------- --------------- --------------- ---------------
"TESTROW"."ACCT"      1000000         908100          COPY            90.81

Table                 RowsNum         RowsComm        Status          Progress (%)
--------------------- --------------- --------------- --------------- ---------------
"TESTROW"."ACCT"      1000000         1000000         COPY            100.00

Table                 RowsNum         RowsComm        Status          Progress (%)
--------------------- --------------- --------------- --------------- ---------------
"TESTROW"."ACCT"      0               0               REPLAY          100.00

Table                 RowsNum         RowsComm        Status          Progress (%)
--------------------- --------------- --------------- --------------- ---------------
"TESTROW"."ACCT"      0               0               SWAP            0.00

Table                 RowsNum         RowsComm        Status          Progress (%)
--------------------- --------------- --------------- --------------- ---------------
"TESTROW"."ACCT"      0               0               SWAP            100.00

Table                 RowsNum         InitSize (MB)   FinalSize (MB)  CompRate (%)  State
--------------------- --------------- --------------- --------------- ------------- ---------
"TESTROW"."ACCT"      1000000         126.25          26.12           79.31         Completed

Pre-Conversion Size (MB): 126.25
Post-Conversion Size (MB): 26.12
Compression Rate (Percent): 79.31

Figure 4-10. Sample db2convert output shows progress and compression results CL4636.0

Notes:
The visual shows a portion of the output generated from a db2convert command that was
run to convert a single table from row-organization to column-organization.
The db2convert command calls the ADMIN_MOVE_TABLE procedure to perform the table
conversion. The output shows progress and completion of each phase of processing.
There is also a summary that estimates the saving in storage space following the table
conversion.

Instructor notes:


Purpose — To show some sample output, running the db2convert command.
Details —
Additional information —
Transition statement — Next we will discuss using the ADMIN_MOVE_TABLE procedure
to convert a row-organized table to a column-organized table.


Using ADMIN_MOVE_TABLE to convert row-organized tables to a column-organized table
• The ADMIN_MOVE_TABLE procedure can be used to convert row-
organized tables to a column-organized table
• To indicate conversion to a column-organized table you can specify
ORGANIZE BY COLUMN as an option of ADMIN_MOVE_TABLE
For example:
call admin_move_table('TEST','ACCT2','AS2','AS2','AS2',
'ORGANIZE BY COLUMN', '','','','COPY_USE_LOAD','MOVE')

• Specify COPY_USE_LOAD option to move data using a LOAD utility to


generate the Column Dictionaries
• Target table could be pre-defined as a column-organized table
• Any Unique or Primary Key indexes on the source table will be added to
the column-organized target table as enforced Unique or Primary Key
constraints
• IBM Data Studio can be used to generate the ADMIN_MOVE_TABLE
call statement to convert a row-organized table


Figure 4-11. Using ADMIN_MOVE_TABLE to convert row-organized tables to a column-organized table CL4636.0

Notes:
Using ADMIN_MOVE_TABLE to convert row-organized tables into column-organized
tables
Conversion can be achieved in either of the following two ways:
• By specifying a column-organized target table
• By specifying the ORGANIZE BY COLUMN clause as the organize_by_clause
parameter.
The ADMIN_MOVE_TABLE stored procedure remains online. When a row-organized table
is being moved into a column-organized table, applicable column-organized table
restrictions on queries (that is, limited isolation levels) start at the end of processing, after
the new table becomes visible to queries.
ADMIN_MOVE_TABLE requires triggers on the source table to capture changes. Because
triggers are currently not supported on column-organized tables, the source table cannot
be a column-organized table (SQL2103N).

Indexes on column-organized tables are not supported. ADMIN_MOVE_TABLE silently
converts primary key and unique indexes into primary key or unique constraints and
ignores all non-unique indexes.
You cannot use the ADMIN_MOVE_TABLE procedure to convert a row-organized table
into a column-organized table if the table contains unique indexes that are defined on
nullable columns. Create any unique constraints on the target table before you call the
ADMIN_MOVE_TABLE stored procedure.
ADMIN_MOVE_TABLE provides two call parameter options. In one mode, the procedure
call parameters define the changes to be made and the procedure creates the target table.
For example, you can use the ADMIN_MOVE_TABLE stored procedure to convert the
row-organized STAFF table into a column-organized table without specifying a target table.
The ORGANIZE BY COLUMN clause must be specified as a parameter so that the target
table is created as a column-organized table.
CALL SYSPROC.ADMIN_MOVE_TABLE(
'OTM01COL',
'STAFF',
'',
'',
'',
'ORGANIZE BY COLUMN',
'',
'',
'',
'COPY_USE_LOAD',
'MOVE'
)
In the following example a table named TEST.ACCT2 is moved to a new tablespace and
converted to a column-organized table. The LOAD utility is used for the COPY phase.

call admin_move_table('TEST','ACCT2','AS2','AS2','AS2'
,'ORGANIZE BY COLUMN', '','','','COPY_USE_LOAD','MOVE')
Result set 1
--------------
KEY VALUE
--------------------------------
--------------------------------------------------------------------------
------------------------------------------------------
AUTHID INST20
CLEANUP_END 2013-06-26-12.13.36.463983


CLEANUP_START 2013-06-26-12.13.36.275748
COPY_END 2013-06-26-12.13.35.640270
COPY_OPTS OVER_INDEX,LOAD,WITH_INDEXES,NON_CLUSTER
COPY_START 2013-06-26-12.13.21.951896
COPY_TOTAL_ROWS 1000000
INDEXNAME ACCT2ACCT
INDEXSCHEMA TEST
INDEX_CREATION_TOTAL_TIME 0
INIT_END 2013-06-26-12.13.21.537935
INIT_START 2013-06-26-12.13.19.632628
ORIGINAL_TBLSIZE 129280
REPLAY_END 2013-06-26-12.13.36.163455
REPLAY_START 2013-06-26-12.13.35.640947
REPLAY_TOTAL_ROWS 0
REPLAY_TOTAL_TIME 0
STATUS COMPLETE
SWAP_END 2013-06-26-12.13.36.229232
SWAP_RETRIES 0
SWAP_START 2013-06-26-12.13.36.164119
UTILITY_INVOCATION_ID
0100000006000000080000000000000000002013062612132153943900000000
VERSION 10.05.0000
23 record(s) selected.
Return Status = 0
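
The slide also notes that the target table can be pre-defined as a column-organized table.
A minimal sketch of that variant, using the five-parameter form of ADMIN_MOVE_TABLE and
assuming a hypothetical target table TEST.ACCT2_COL that has already been created with
ORGANIZE BY COLUMN and a matching column layout:

CALL SYSPROC.ADMIN_MOVE_TABLE(
  'TEST',          -- source table schema
  'ACCT2',         -- source (row-organized) table
  'ACCT2_COL',     -- pre-created column-organized target table
  'COPY_USE_LOAD', -- use LOAD so the column dictionaries are built
  'MOVE')          -- run all phases in a single invocation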

Instructor notes:


Purpose — To discuss using the ADMIN_MOVE_TABLE procedure to convert
row-organized tables to column-organized tables.
Details —
Additional information —
Transition statement — Next we will cover use of other DB2 utilities with
column-organized tables.


DB2 Utility support for column-organized tables


• REORG
– Only supports RECLAIM EXTENTS option, no standard offline or
online table reorganization
– Automated by default, but the REORG can be run manually to free full
extents emptied by deletes or updates
• REORGCHK report is not useful for analysis of column-
organized tables
• RUNSTATS
– LOAD utility collects statistics by default, but standard RUNSTATS
can be run manually
• db2advis – does not make recommendations for column-
organized tables
– Use Infosphere Optim Query Workload Tuner


Figure 4-12. DB2 Utility support for column-organized tables CL4636.0

Notes:
With DB2 10.5 the only mode of processing for the REORG utility with column-organized
tables is the RECLAIM EXTENTS option. When DB2_WORKLOAD is set to ANALYTICS,
DB2 will automatically run the RECLAIM EXTENTS type of reorganization for
column-organized tables.
The REORGCHK command report does not provide useful analysis for column-organized
tables.
The LOAD utility will by default collect table statistics when an empty column-organized
table is loaded. The RUNSTATS command can also be run manually to collect new table
statistics.
When providing recommendations about indexes, MQTs, or MDC tables, the Design
Advisor, db2advis, ignores column-organized tables.
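
A minimal sketch of the manual maintenance commands described above, assuming a
connection to the database and the TEST.HISTORY table used elsewhere in this unit:

db2 REORG TABLE TEST.HISTORY RECLAIM EXTENTS
db2 RUNSTATS ON TABLE TEST.HISTORY

REORG with RECLAIM EXTENTS returns fully emptied extents to the table space for reuse;
RUNSTATS refreshes the catalog statistics that the optimizer uses to cost access plans.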

Instructor notes:


Purpose — To discuss using various DB2 utilities and commands with column-organized
tables.
Details —
Additional information —
Transition statement — Next we will discuss the use of constraints with column-organized
tables.


Referential Integrity and Unique constraints for column-organized tables

• The CREATE INDEX statement cannot be used to define
traditional indexes on a column-organized table
• Constraints
– ENFORCED check and foreign key (referential integrity) constraints
are not supported on column-organized tables.
• These constraints are supported as informational (NOT ENFORCED)
constraints.
– You cannot specify the WITH CHECK OPTION when creating a view
that is based on column-organized tables
• Primary Key and Unique Constraints
– You can define ENFORCED or NOT ENFORCED Primary key and
unique constraints for column-organized tables
• DB2 will create system maintained indexes for enforced constraints
• The performance of a select, update, or delete operation that affects only
one row in a column-organized table can be improved if the table has
unique indexes


Figure 4-13. Referential Integrity and Unique constraints for column-organized tables CL4636.0

Notes:
The CREATE INDEX statement is not supported to define an index for a column-organized
table.
ENFORCED check and foreign key (referential integrity) constraints are not supported on
column-organized tables. These constraints are supported as informational (NOT
ENFORCED) constraints.
You cannot specify the WITH CHECK OPTION when creating a view that is based on
column-organized tables.
You can define ENFORCED and NOT ENFORCED Primary Key and Unique constraints
for column organized tables. If you define an enforced Primary Key or Unique constraint for
a column organized table, DB2 will create the necessary supporting index that will be used
to perform the necessary checking when rows are added or updated.
The performance of a select, update, or delete operation that affects only one row in a
column-organized table can be improved if the table has unique indexes because the query
optimizer can use an index scan instead of a full table scan.
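
A minimal sketch of both constraint styles, assuming the hypothetical TEST.ACCT and
TEST.HISTORY tables with an ACCT_ID column:

-- Enforced primary key: DB2 creates a system-maintained unique index
ALTER TABLE TEST.ACCT ADD CONSTRAINT ACCT_PK PRIMARY KEY (ACCT_ID)

-- Informational foreign key: available to the optimizer, but not checked
ALTER TABLE TEST.HISTORY ADD CONSTRAINT HIST_ACCT_FK
  FOREIGN KEY (ACCT_ID) REFERENCES TEST.ACCT (ACCT_ID) NOT ENFORCED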

Instructor notes:


Purpose — To discuss the options to define ENFORCED or NOT ENFORCED
constraints on column-organized tables. The definition of a unique constraint or primary
key should not be considered a way to define indexed access to column-organized tables.
The use of unique indexes to support access to single rows became available with Fix
Pack 1 of DB2 10.5.
Details —
Additional information —
Transition statement — In the next section we will look at examples of DB2 explain tool
reports that include access to column-organized tables.


Access plans for column-organized tables

Using explain tools to evaluate access plans for column-organized tables

Figure 4-14. Using explain tools to evaluate access plans for column-organized tables CL4636.0

Notes:
In this section we will look at a number of examples of DB2 Explain tool reports for queries
using column-organized tables to better understand the processing for queries using those
tables.
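
A minimal sketch of one way to capture and format such a report, assuming a hypothetical
database BLUDB in which the explain tables already exist (they can be created with the
EXPLAIN.DDL script in sqllib/misc):

db2 CONNECT TO BLUDB
db2 SET CURRENT EXPLAIN MODE EXPLAIN
db2 "SELECT ACCT_GRP, SUM(BALANCE) FROM TEST.ACCT GROUP BY ACCT_GRP"
db2 SET CURRENT EXPLAIN MODE NO
db2exfmt -d BLUDB -1 -o query.exfmt

With the special register set to EXPLAIN, the statement is compiled and its plan captured
but the query is not executed; db2exfmt -1 then formats the most recently explained
statement into query.exfmt.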

Instructor notes:


Purpose — To introduce the next section in the lecture, access plans and explain tool
reports for column-organized tables.
Details —
Additional information —
Transition statement — Next we will look at a simple access plan for a query using a
column-organized table.


Explain report for summary query using a column-organized table

SELECT ACCT_GRP, SUM(BALANCE)
FROM test.ACCT
WHERE ACCT_GRP BETWEEN 100 AND 150
GROUP BY ACCT_GRP

Access plan (the GROUP BY is performed in the column-organized portion of the plan):

            Rows
           RETURN
           (   1)
            Cost
             I/O
              |
           57.5334
            CTQ
           (   2)
           378.646
           41.7544
              |
           57.5334
            GRPBY
           (   3)
           378.641
           41.7544
              |
           57997.4
           TBSCAN
           (   4)
           377.516
           41.7544
              |
            1e+06
      CO-TABLE: TEST
            ACCT
             Q1

Figure 4-15. Explain report for summary query using a column-organized table CL4636.0

Notes:
The visual shows the access plan for a simple query using a column-organized table,
created by the db2exfmt explain tool. What is new in the access plan is the CTQ operator
between the Group By and Return.
DB2 Version 10.5 provides a new CTQ plan operator that represents the transition between
column-organized data processing and row-organized data processing.

Important

The CTQ operator represents a boundary within the DB2 query engine, in which operators
that appear below the boundary process data as compressed column-organized vectors
and tuples, whereas operators that are higher than the boundary operate on tuples that are
not encoded.


In the sample access plan shown, the table scan and GROUP BY operators are below the
CTQ operator. This indicates that those operations were performed using
column-organized processing.


Instructor notes:
Purpose — To review a simple access plan generated by db2exfmt to show the new CTQ
operator.
Details —
Additional information —
Transition statement — Next we will discuss the new CTQ operator in more detail.


Execution Plans for column-organized tables


• Runtime operators are optimized for row- and column-organized tables
• CTQ operator transfers data from column- to row-organized processing
• Operators that are optimized for column-org tables below CTQ include:
– Table scan
– Hash-based join, optionally employing a semi-join
– Hash-based group by. Potentially faster without the sort
– Hash-based unique.
• Aim is to “push down” most operators below CTQ
• Some operations cannot be pushed down, such as:
– SORT
– SQL OLAP function (e.g., rank())
– Inequality join (join predicate that is NOT equality)


Figure 4-16. Execution Plans for column-organized tables CL4636.0

Notes:
DB2 Runtime operators are optimized for row and column-organized tables.
Returning a query result to an application is performed with data in a row-organized form.
The CTQ plan operator represents the transition between column-organized data
processing and row-organized data processing.
The column-organized table support routines are optimized to process some operators
using column-organized processing. These include:
• Table scan
• Hash-based joins
• Hash-based group by
• Hash based unique
Some operators must be performed using the standard row-organized processing
including:

© Copyright IBM Corp. 2005, 2015 Unit 4. DB2 10.5 BLU Acceleration Implementation and Use 4-41
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

• Sort
• SQL OLAP functions
• Inequality join

Instructor notes:


Purpose — To discuss operators in an access plan that are able to take advantage of
column-organized processing and those that must be performed using row-organized
processing.
Details —
Additional information —
Transition statement — Next we will look at an example of an access plan for a
column-organized table that uses an index scan.


Example access plan for a column-organized table using an index scan

SELECT * FROM T1 WHERE T1.PK = ?

• The db2exfmt output shows an example of a column-organized table that is accessed
by using an index with isolation level CS.
• This query runs by using row-organized processing because all the predicates can be
applied at the index scan and the table is not being joined to any other table.

               1
              CTQ
             (   2)
             43.9788
                6
                |
                1
             NLJOIN
             (   3)
             43.7775
                6
          /----+-----\
         1             1
        CTQ          TBSCAN
       (   4)        (   6)
       9.10425       34.6733
          1             5
          |             |
          1            1000
       IXSCAN    CO-TABLE: BLUUSER
       (   5)          T1
       9.10425         Q1
          1
          |
         1000
    INDEX: BLUUSER
          PK

Figure 4-17. Example access plan for a column-organized table using an index scan CL4636.0

Notes:
In DB2 Cancun Release 10.5.0.4 and later fix packs, index access for SELECT statements
that are run with isolation level CS (cursor stability) is supported. An extra runtime
optimization is available to improve the performance of column-organized UR and CS
index scans when the data does not require column-organized processing. In some
situations, it is more efficient for the access to be done by using row-organized processing
because this approach avoids the overhead of switching between column-organized and
row-organized formats. This additional optimization, in which index access is performed by
using row-organized data processing, is not possible if all the following conditions apply:

• The index access occurs directly on the inner of a nested-loop join (NLJOIN).
• There are join predicates to be applied by the index scan.
• The join occurs between tables in the same subselect.
The db2exfmt output shows an example of a column-organized table that is accessed by
using an index with isolation level CS. This query runs by using row-organized processing

because all the predicates can be applied at the index scan and the table is not being
joined to any other table.


Instructor notes:
Purpose — The visual shows a sample access plan where an index scan operation is
performed for a column-organized table. The index created to support a primary key or
unique constraint on a column-organized table can be used to improve access efficiency
for some SQL SELECT statements.
Details —
Additional information —
Transition statement — Next we will look at an example of the detail section of an access
plan for a CTQ operator.


Explain report detail for CTQ operator


2) TQ : (Table Queue)
        Cumulative Total Cost:          378.646
        Cumulative CPU Cost:            8.04973e+08
        Cumulative I/O Cost:            41.7544
        Cumulative Re-Total Cost:       129.893
        Cumulative Re-CPU Cost:         8.04867e+08
        Cumulative Re-I/O Cost:         0
        Cumulative First Row Cost:      8.17176
        Estimated Bufferpool Buffers:   0

        Arguments:
        ---------
        LISTENER: (Listener Table Queue type)
                FALSE
        TQDEGREE: (Degree of Intra-Partition parallelism)
                1
        TQMERGE : (Merging Table Queue flag)
                FALSE
        TQORIGIN: (Table Queue Origin type)
                COLUMN-ORGANIZED DATA
        TQREAD  : (Table Queue Read type)
                READ AHEAD
        UNIQUE  : (Uniqueness required flag)
                FALSE

        Input Streams:
        -------------
                3) From Operator #3
                        Estimated number of rows:       57.5334
                        Number of columns:              2
                        Subquery predicate ID:          Not Applicable

                        Column Names:
                        ------------
                        +Q4.$C1+Q4.ACCT_GRP

        Output Streams:
        --------------
                4) To Operator #1
                        Estimated number of rows:       57.5334
                        Number of columns:              2
                        Subquery predicate ID:          Not Applicable

                        Column Names:
                        ------------
                        +Q4.$C1+Q4.ACCT_GRP

Figure 4-18. Explain report detail for CTQ operator CL4636.0

Notes:
A table queue is used to pass table data from one database agent to another. The CTQ
operator is a special type of table queue that is used for processing queries with
column-organized tables.
The sample detail section from a db2exfmt report contains this CTQ operator. The
TQORIGIN value in the Arguments section indicates COLUMN-ORGANIZED DATA.


Instructor notes:
Purpose — To briefly review an example of the detailed information for a CTQ operator in
a db2exfmt report.
Details —
Additional information —
Transition statement — Next we will look at the information for a column-organized table
shown in the object usage section of an explain report.

Explain report object data for column-organized table
Schema: COLORG
Name: HISTORY
Type: Column-organized Table
Time of creation: 2014-11-14-11.04.07.405297
Last statistics update: 2014-11-14-11.33.52.457762
Number of columns: 9
Number of rows: 490864
Width of rows: 40
Number of buffer pool pages: 184
Number of data partitions: 1
Distinct row values: No
Tablespace name: TSCOLD
Tablespace overhead: 6.725000
Tablespace transfer rate: 0.320000
Source for statistics: Single Node
Prefetch page count: 4
Container extent page count: 4
Table overflow record count: 0
Table Active Blocks: -1
Average Row Compression Ratio: -1
Percentage Rows Compressed: -1
Average Compressed Row Size: -1


Figure 4-19. Explain report object data for column-organized table CL4636.0

Notes:
Since a query may access some combination of row-organized and column-organized
tables, in the objects used section of the db2exfmt report each column-organized table will
show a type of ‘Column-organized Table’.
The sample column-organized table object information shows some of the table statistics
used by the DB2 optimizer to build and cost the access plan.
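
Outside of an explain report, the table organization can also be checked directly in the
catalog. A minimal sketch, assuming the hypothetical TEST schema:

SELECT TABNAME, TABLEORG
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'TEST'

In DB2 10.5 the TABLEORG column of SYSCAT.TABLES contains C for column-organized
tables and R for row-organized tables.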


Instructor notes:
Purpose — To discuss an example of the object information for a column-organized table
in a DB2 db2exfmt explain report.
Details —
Additional information —
Transition statement — Next we will compare two similar access plans using a
column-organized table.

Explain report shows estimated costs vary depending on the number of columns accessed

Query returning two columns:

SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME
FROM HISTORY AS HISTORY
WHERE HISTORY.BRANCH_ID = 25
ORDER BY HISTORY.ACCTNAME ASC

Access Plan:
-----------
Total Cost: 197.054
Query Degree: 1

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
        ...
       2336.77
        CTQ
       (   4)
       196.229
       38.5352
         |
       2336.77
       TBSCAN
       (   5)
       196.216
       38.5352
         |
       513576
  CO-TABLE: TEST
      HISTORY
         Q1

Query returning four columns:

SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY
WHERE HISTORY.BRANCH_ID = 25
ORDER BY HISTORY.ACCT_ID ASC

Access Plan:
-----------
Total Cost: 204.802
Query Degree: 1

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
        ...
       2336.77
        CTQ
       (   4)
       203.945
       62.6197
         |
       2336.77
       TBSCAN     <-- increased cost and I/O estimates
       (   5)
       203.927
       62.6197
         |
       513576
  CO-TABLE: TEST
      HISTORY
         Q1

Figure 4-20. Explain report shows estimated costs vary depending on the number of columns accessed CL4636.0

Notes:
The visual shows two access plans from db2exfmt reports.
The SQL query text is similar for the two queries, except for the number of columns of data
returned by each. The query on the left returns two columns of data, while the query on the
right returns four columns of data.
Since the table used is a column-organized table, each column of data is stored in a distinct
set of table pages. The example shows that the DB2 optimizer estimates additional I/O
costs for the query that returns more columns in the result. The total estimated cost for the
query that references more columns is slightly higher.
With column-organized tables it is especially important to only include the required columns
in a query result to minimize costs and improve performance.


Instructor notes:
Purpose — To show the increased estimated I/O costs based on the number of columns
referenced by a query. This is a new concept compared to row-organized tables, where the
number of columns referenced does not impact the number of pages accessed.
Details —
Additional information —
Transition statement — Next we will discuss some general guidelines of what to look for
in access plans for column-organized tables.

What to look for in access plans for column-organized tables

A good access plan would have:
• One or few CTQ operators
• Few operators above CTQ
• Operators above CTQ work on few rows
• Few rows flow through the CTQ

A suboptimal access plan could have:
• Many CTQs
• Many operators above CTQ
• Operators above CTQ work on many rows
• Many rows flow through the CTQ

Disclaimer: While these are the most common indicators of good vs suboptimal
plans with column-organized tables, there can be exceptions.

Figure 4-21. What to look for in access plans for column-organized tables CL4636.0

Notes:
In many cases the performance characteristics of DB2 with BLU Acceleration will avoid the
need to analyze SQL access plans. You are not concerned with index usage or join
methods that you may need to analyze with row-organized tables.
Some general characteristics of a good access plan for column-organized tables are:
• One CTQ operator or possibly a few CTQ operators
• Not many operators working using row-organized processing above the CTQ operator
• The operators above the CTQ are processing a small number of rows
• A small number of rows are flowing through the CTQ operator from column-organized
processing into row-organized processing
Some characteristics of a suboptimal access plan for column-organized tables are:
• Many CTQ operators
• Many operators working using row-organized processing above the CTQ operator


• The operators above the CTQ are processing a large number of rows
• A large number of rows are flowing through the CTQ operator from column-organized
processing into row-organized processing

Instructor notes:


Purpose — To discuss some things to look for in access plans for column-organized tables
that could indicate potential performance issues.
Details —
Additional information —
Transition statement — Next we will look at an example of an access plan for a query that
joins two column-organized tables.


Explain report for joining two column-organized tables

SELECT HISTORY.BRANCH_ID, TELLER.TELLER_NAME,
       HISTORY.ACCTNAME, HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY, TELLER AS TELLER
WHERE HISTORY.TELLER_ID = TELLER.TELLER_ID AND
      HISTORY.BRANCH_ID = 25
ORDER BY HISTORY.BRANCH_ID ASC, HISTORY.ACCT_ID ASC

Total Cost: 226.575
Query Degree: 1

The TBSCAN and SORT operators above the CTQ run as row-oriented processing (the
SORT is performed in the row-organized portion of the access plan); the HSJOIN of the
HISTORY and TELLER tables and their table scans below the CTQ run as column-oriented
processing. The full plan is reproduced in the notes below.

Figure 4-22. Explain report for joining two column-organized tables CL4636.0

Notes:
The visual shows a portion of the access plan from an explain tool report for a query that
joins two column-organized tables.
Notice that the two tables are joined using a HASH join, the HSJOIN operator in the
column-organized processing section of the access plan. In most cases DB2 will utilize the
smaller result set as the inner, right side, of the HASH join to reduce memory requirements.
Since HASH joins use DB2 sort memory, processing and performance costs can be
impacted by insufficient sort memory for the database.
This is a copy of the access plan for the query.
Access Plan:
-----------
Total Cost: 226.575
Query Degree: 1

            Rows
           RETURN
           (   1)
            Cost
             I/O
              |
           2374.76
           TBSCAN
           (   2)
           226.575
           77.1883
              |
           2374.76
            SORT
           (   3)
           226.479
           77.1883
              |
           2374.76
            CTQ
           (   4)
           225.686
           77.1883
              |
           2374.76
           HSJOIN
           (   5)
           225.661
           77.1883
        /-------+-------\
    2336.77             1000
    TBSCAN             TBSCAN
    (   6)             (   7)
    207.783            17.8558
    74.662             2.52632
       |                  |
    513576               1000
CO-TABLE: TEST      CO-TABLE: TEST
    HISTORY             TELLER
      Q2                  Q1
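
Because hash joins for column-organized tables draw on shared sort memory, it can be
useful to check whether sort-related operations are spilling. A hedged sketch using the
MON_GET_WORKLOAD monitoring function; the formatting choices are illustrative:

SELECT VARCHAR(WORKLOAD_NAME, 26) AS WORKLOAD_NAME,
       SUM(TOTAL_SORTS) AS TOTAL_SORTS,
       SUM(SORT_OVERFLOWS) AS SORT_OVERFLOWS
FROM TABLE(MON_GET_WORKLOAD(NULL, -2)) AS T
GROUP BY WORKLOAD_NAME

A steadily climbing SORT_OVERFLOWS count is one sign that sortheap or sheapthres_shr
is undersized for the workload, which can degrade hash join performance.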


Instructor notes:
Purpose — To discuss a simple two table join query using column-organized tables. The
standard join method for column-organized table is a form of HASH join.
Details —
Additional information —
Transition statement — Next we will look at the access plan for joining three
column-organized tables.

Explain report for joining three column-organized tables

SELECT ACCT.ACCT_ID, .... HISTORY.TEMP
FROM ACCT, TELLER, HISTORY
WHERE ACCT.ACCT_ID = HISTORY.ACCT_ID AND
      ACCT.ACCT_GRP BETWEEN 100 AND 700 AND
      HISTORY.TELLER_ID = TELLER.TELLER_ID
ORDER BY HISTORY.PID ASC

Total Cost: 838.428
Query Degree: 1

The TBSCAN and SORT above the CTQ run as row-oriented processing; below the CTQ,
the three tables are combined with two HSJOIN operators (HISTORY joined to ACCT, and
that result joined to TELLER), entirely in column-oriented processing. The full plan is
reproduced in the notes below.

Figure 4-23. Explain report for joining three column-organized tables CL4636.0

Notes:
The visual shows a portion of the access plan from an explain tool report for a query that
joins three column-organized tables.
Notice that the three tables are joined using two HASH join operations. Both join operations
are performed using column-organized processing.
This is a copy of the access plan for the query.
Access Plan:
-----------
Total Cost: 838.428
Query Degree:1
Rows
RETURN
( 1)
Cost
I/O
|
310668
TBSCAN
( 2)
838.428
139.857
|
310668
SORT
( 3)
825.893
139.857
|
310668
CTQ
( 4)
626.331
139.857
|
310668
HSJOIN
( 5)
620.916
139.857
/---------+----------\
305697 1000
HSJOIN TBSCAN
( 6) ( 9)
600.063 18.9682
137.173 2.68421
/-------+-------\ |
513576 595232 1000
TBSCAN TBSCAN CO-TABLE: TEST
( 7) ( 8) TELLER
203.491 388.68 Q2
84.2958 52.8772
| |
513576 1e+06
CO-TABLE: TEST CO-TABLE: TEST
HISTORY ACCT
Q1 Q3
Operator Symbols :
------------------

Symbol Description
--------- ------------------------------------------
ATQ : Asynchrony
BTQ : Broadcast
CTQ : Column-organized data
DTQ : Directed
LTQ : Intra-partition parallelism
MTQ : Merging (sorted)
STQ : Scatter
XTQ : XML aggregation
TQ* : Listener
Instructor notes:
Purpose — To show another example of an access plan, this one for a three table join, all
tables being column-organized. You may want to ask students about the TBSCAN
operations for column-organized tables and note that you will not see references to the
page map indexes or synopsis tables in the explain reports.
Details —
Additional information —
Transition statement — Next we will look at the db2expln explain report for the same
three table join query.
Explain report using db2expln for joining three column-organized tables

Estimated Cost = 849.247131
Estimated Cardinality = 323848.687500

Operator (ID)
( 4) CDE Subquery
     |  Tables Referenced:
     |  |  1: TEST.TELLER ID = 4,6
     |  |  2: TEST.ACCT ID = 7,4
     |  |  3: TEST.HISTORY ID = 6,6
     |  #Output Columns = 10
     |
( 3) Insert Into Sorted Temp Table ID = t1
     |  #Columns = 10
     |  #Sort Key Columns = 1
     |  |  Key 1: (Ascending)
     |  Sortheap Allocation Parameters:
     |  |  #Rows = 323849.000000
     |  |  Row Width = 104
     |  Piped
( 2) Access Temp Table ID = t1
     |  #Columns = 10
     |  Relation Scan
     |  |  Prefetch: Eligible
     |  Sargable Predicate(s)
     |  |  Return Data to Application
     |  |  |  #Columns = 10
( 1) Return Data Completion

[Slide graphic: the Optimizer Plan tree - RETURN ( 1), TBSCAN ( 2), SORT ( 3), and CTQ ( 4) above the three referenced tables TEST.HISTORY, TEST.ACCT, and TEST.TELLER.]
Figure 4-24. Explain report using db2expln for joining three column-organized tables CL4636.0
Notes:
The visual shows a portion of the db2expln explain report for the three table join query
listed on the previous slide.
The db2expln report does not show the three-table join as HASH join operators; instead, it lists
the operation as a CDE Subquery that references the three tables. CDE, which stands for
columnar data engine, is a term used by DB2 to refer to column-organized processing.
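For reference, a db2expln report like this one can be generated directly from the command line. A minimal sketch, with the database name and statement text as placeholders; the -graph option adds the optimizer plan tree to the report and -terminal writes it to the screen:

db2expln -database TEST -statement "SELECT ... FROM ACCT, TELLER, HISTORY ..." -terminal -graph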
Instructor notes:
Purpose — To discuss an example db2expln explain report for a query joining three
column-organized tables.
Details —
Additional information —
Transition statement — Next we will look at an example of an explain report joining
column-organized and row-organized tables.
Explain report for joining two column-organized tables and one row-organized table

SELECT ACCT.ACCT_ID, .... HISTORY.TEMP
FROM ACCT, TELLER, HISTORY
WHERE ACCT.ACCT_ID = HISTORY.ACCT_ID AND
      ACCT.ACCT_GRP BETWEEN 100 AND 700 AND
      HISTORY.TELLER_ID = TELLER.TELLER_ID
ORDER BY HISTORY.PID ASC

[Slide graphic: a portion of the access plan tree. Column-oriented processing (HSJOIN ( 6) over TBSCANs of CO-TABLE: TEST.HISTORY and CO-TABLE: TEST.ACCT) appears below the CTQ ( 5) operator; above it, row-oriented processing joins that result to the row-organized TELLER table with ^HSJOIN ( 4), using a FETCH ( 9) driven by an IXSCAN ( 10) of index SYSIBM.SQL130617120709960. The full plan is reproduced in the notes.]
Figure 4-25. Explain report for joining two column-organized tables and one row-organized table CL4636.0
Notes:
For some applications, you may decide to use a mixture of column-organized and
row-organized tables.
The visual shows a portion of the db2exfmt explain tool report for the sample SQL query
used in previous examples, with one table defined as a row-organized table.
Notice that DB2 joins the two column-organized tables using column-organized processing,
below the CTQ operator in the access plan.
The join with the row-organized table must be performed using row-organized
processing, above the CTQ operator in the access plan.
In this example, the estimated cost using three column-organized tables was slightly lower
than the cost estimate when one table was row-organized.
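For reference, the organization of each table is fixed when the table is created. The following is a minimal sketch, not taken from the course labs; the column definitions are illustrative. When DB2_WORKLOAD is set to ANALYTICS, ORGANIZE BY COLUMN is the default, so a row-organized table must be requested explicitly:

CREATE TABLE TEST.TELLER (
    TELLER_ID   INTEGER NOT NULL,
    BRANCH_ID   INTEGER,
    TELLER_NAME VARCHAR(40) )
  ORGANIZE BY ROW;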
This is a copy of the access plan for the query.
Access Plan:
-----------
Total Cost: 646.227
Query Degree:1

Rows
RETURN
( 1)
Cost
I/O
|
25684.8
TBSCAN
( 2)
646.227
142.331
|
25684.8
SORT
( 3)
645.19
142.331
|
25684.8
^HSJOIN
( 4)
633.734
142.331
/------------+-------------\
25684.8 1000
CTQ FETCH
( 5) ( 9)
604.281 28.6196
138.331 4
| /---+----\
25684.8 1000 1000
HSJOIN IXSCAN TABLE: TEST
( 6) ( 10) TELLER
603.923 0.284191 Q2
138.331 0
/-------+-------\ |
620487 41394.5 1000
TBSCAN TBSCAN INDEX: SYSIBM
( 7) ( 8) SQL130617120709960
389.527 210.867 Q2
54.0351 84.2958
| |
1e+06 513576
CO-TABLE: TEST CO-TABLE: TEST
ACCT HISTORY
Q3 Q1
Instructor notes:
Purpose — To show an example of an access plan joining a mixture of column-organized
and row-organized tables.
Details —
Additional information —
Transition statement — Next we will cover the workload management capabilities DB2
uses for column-organized tables.
Workload Management with column-organized Tables

DB2 Workload Management of databases with column-organized tables
Figure 4-26. DB2 Workload Management of databases with column-organized tables CL4636.0
Notes:
In this next section we will cover some special workload management functionality DB2
uses for databases running queries with column-organized tables.
Instructor notes:
Purpose — To introduce this topic on workload management for database with
column-organized tables.
Details —
Additional information —
Transition statement — Next we will discuss how DB2 uses workload management
functionality to optimize systems where analytic queries use column-organized tables.
Default query concurrency management
for ANALYTICS workload databases
• Query processing using column-organized tables is designed to run fast
by leveraging the highly parallelized in-memory processing of data
• To ensure that heavier workloads on column-organized data do not
overload the system when many queries are running simultaneously,
DB2 can limit the number of heavyweight queries executing at the same
time
• The limit is implemented using a WLM concurrency threshold
– Automatically enabled on new databases when DB2_WORKLOAD is set to
ANALYTICS
– Can be manually enabled on existing databases
• Field experience with DB2 WLM has also demonstrated that in analytic
environments controlling the admission of heavyweight queries into the
system benefits stability and overall performance
• When the limit on the number of heavyweight queries is reached, the
remaining queries are queued
– These must wait until other queries in this class complete before beginning their
execution
Figure 4-27. Default query concurrency management for ANALYTICS workload databases CL4636.0
Notes:
To ensure that heavier workloads on column-organized data do not overload the system
when many queries are submitted simultaneously, there is a limit on the number of
heavyweight queries that can execute on the database at the same time.
This limit can be implemented by using the default workload management concurrency
threshold that is automatically enabled on new databases when the value of the
DB2_WORKLOAD registry variable is set to ANALYTICS, or that can be manually enabled
on existing databases.
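As a point of reference, the registry variable is set with the db2set command, and the instance must be recycled for the change to take effect before the database is created; the database name below is illustrative:

db2set DB2_WORKLOAD=ANALYTICS
db2stop
db2start
db2 "CREATE DATABASE TESTBLU"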
The processing of queries against column-organized tables is designed to run fast by
leveraging the highly parallelized in-memory processing of data. The trade-off for this high
performance is that queries referencing column-organized tables also have a relatively
large footprint in terms of memory and CPU when compared to similar queries processing
row-organized table data. As such, the execution of these types of queries is optimal when
relatively few of them are admitted to the system at a time. This enables them to
individually leverage more processing power and memory on the system, and minimizes
contention on the processor caches.
Field experience with DB2 workload management has also demonstrated that in analytic
environments that support mixed workloads (where queries might vary widely in their
degree of complexity and resource needs), controlling the admission of "heavyweight"
queries into the system yields improvements in both system stability and overall
performance, because resource overload on the system is avoided.
When the limit on the number of heavyweight queries is reached, the remaining queries are
queued and must wait until other queries leave the system before beginning their
execution. This can help to ensure system stability when a large number of complex ad hoc
queries are running on systems that have not implemented a specific workload
management strategy. Users who want to further optimize the execution of mixed
workloads on their systems are encouraged to look at the full range of workload
management capabilities offered in the DB2 database product.
Instructor notes:
Purpose — To explain the implementation of a set of default workload management
capabilities that DB2 uses when the DB2_WORKLOAD registry variable is set to
ANALYTICS. This is done to make sure the resources needed to support the parallel
processing and memory usage for column-organized tables are available for query
processing.
Details —
Additional information —
Transition statement — Next we will discuss the concurrency management used for
heavyweight queries.
Automatic Workload Management

• Built-in and automated query resource consumption control
• Enabled automatically when DB2_WORKLOAD=ANALYTICS
• Many queries can be submitted, but limited number get executed
concurrently

[Slide graphic: applications and users submit up to tens of thousands of SQL queries at once; within the DB2 DBMS kernel, only a moderate number of queries consume resources at any point in time.]
Figure 4-28. Automatic Workload Management CL4636.0
Notes:
DB2 10.5 has built-in and automated query resource consumption controls.
Every additional query that runs concurrently naturally consumes more memory, locks,
CPU, and memory bandwidth.
The DB2 BLU Acceleration feature automatically allows a high level of concurrent queries
to be submitted for processing, but limits the number that consume resources at any point
in time.
That means more memory and CPU for each query that is actively running, which benefits
the entire analytics workload.
Instructor notes:
Purpose — To discuss a simple graphic showing a large number of concurrent database
users able to make requests that are managed using workload management functionality
to provide efficient processing for resource demanding queries.
Details —
Additional information —
Transition statement — Next we will see the workload management objects that DB2
BLU Acceleration uses to manage concurrency for complex queries.
Default workload management objects for concurrency control
• Default query concurrency management is implemented using existing
DB2 WLM infrastructure.
• Several new default WLM objects will be created on both upgraded and
newly created databases
– A service subclass, SYSDEFAULTMANAGEDSUBCLASS
• Under the existing SYSDEFAULTUSERCLASS superclass
• Where heavyweight queries will run and can be controlled and monitored as a group
– A CONCURRENTDBCOORDACTIVITIES threshold,
SYSDEFAULTCONCURRENT
• Which is applied to the SYSDEFAULTMANAGEDSUBCLASS subclass
• Controls the number of concurrently executing queries
– A work class set, SYSDEFAULTUSERWCS, and a new work class,
SYSMANAGEDQUERIES, which identifies the class of heavyweight
queries that are to be controlled
– Work class encompasses queries that are classified as READ DML falling above
a timeron threshold that reflects heavier queries
– A work action set, SYSDEFAULTUSERWAS, and work action,
SYSMAPMANAGEDQUERIES
– Maps all queries that fall into the SYSMANAGEDQUERIES work class to the
SYSDEFAULTMANAGEDSUBCLASS service subclass
Figure 4-29. Default workload management objects for concurrency control CL4636.0
Notes:
Default query concurrency management is implemented by leveraging the existing DB2
workload management infrastructure.
Several new default workload management objects will be created on both upgraded and
newly created databases:
• A service subclass, SYSDEFAULTMANAGEDSUBCLASS, under the existing
SYSDEFAULTUSERCLASS superclass, where heavyweight queries will run and can
be controlled and monitored as a group
• A CONCURRENTDBCOORDACTIVITIES threshold, SYSDEFAULTCONCURRENT,
which is applied to the SYSDEFAULTMANAGEDSUBCLASS subclass to control the
number of concurrently executing queries that are running in that subclass
• A work class set, SYSDEFAULTUSERWCS, and a new work class,
SYSMANAGEDQUERIES, which identify the class of heavyweight queries that are to
be controlled. The SYSMANAGEDQUERIES work class encompasses queries that are
classified as READ DML (an existing work type for work classes) falling above a
timeron threshold that reflects heavier queries.
• A work action set, SYSDEFAULTUSERWAS, and work action,
SYSMAPMANAGEDQUERIES, which map all queries that fall into the
SYSMANAGEDQUERIES work class to the SYSDEFAULTMANAGEDSUBCLASS
service subclass
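To confirm that these default objects exist on a particular database, they can be looked up in the catalog. The following is a sketch; verify the view and column names against your catalog level:

SELECT VARCHAR(SERVICECLASSNAME,30) AS SUBCLASS,
       VARCHAR(PARENTSERVICECLASSNAME,30) AS SUPERCLASS
FROM SYSCAT.SERVICECLASSES
WHERE SERVICECLASSNAME = 'SYSDEFAULTMANAGEDSUBCLASS';

SELECT VARCHAR(THRESHOLDNAME,30) AS THRESHOLD, ENABLED, MAXVALUE
FROM SYSCAT.THRESHOLDS
WHERE THRESHOLDNAME = 'SYSDEFAULTCONCURRENT';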
Instructor notes:
Purpose — DB2 uses some new default workload management objects to implement
concurrency controls for queries that are expected to consume larger amounts of system
resources.
Details —
Additional information —
Transition statement — Next we will see how the default workload management objects
can be easily tuned.
Default Workload flow

[Slide graphic: flow of submitted statements through the default workload management objects, splitting them into managed and unmanaged work.]

Figure 4-30. Default Workload flow CL4636.0
Notes:
This slide shows the default workload management for BLU in more detail. Here are the
specifics:
• We split statements submitted to the system into two categories: unmanaged and
managed.
• Read-only queries with an estimated cost of > 150000 timerons are mapped to the
managed class.
The result is that heavy queries are queued and only N execute concurrently, allowing them
to maximize their memory consumption and CPU parallelism and complete more quickly,
while also preventing system overload.
We maintain the response time of lightweight point queries by allowing them to bypass the
control and avoid queuing behind large queries; these queries have a much smaller
resource impact on the system, so we let them pass through as quickly as possible.
Other activities (DDL, Utilities, ETL) continue to be unmanaged. If managing these is
desirable, the WLM environment can be customized further.
Note

We apply a concurrency limit to the managed class which is computed at database
creation time, based on the machine hardware and CPU parallelism, to ensure orderly
execution of heavier weight analytic queries.
Note that you can recompute this value if your system configuration changes by
rerunning AUTOCONFIGURE.
Instructor notes:
Purpose — To show how the estimated cost for query processing divides the workload into
two sets, managed and unmanaged.
Details —
Additional information —
Transition statement — Next we will explain the key concepts used for workload
management of column-organized query processing.

Default Workload Management Explained

• Submitted statements are divided into two categories
– Managed
– Unmanaged
• Read-only DML with a query cost estimate greater than 150000 timerons
are considered "complex" analytic queries and are mapped to the default
managed subclass
– A query concurrency limit computed based on the underlying
system hardware is applied against all managed queries to
ensure orderly execution of complex queries, optimize overall
throughput, and prevent resource overload
Figure 4-31. Default Workload Management Explained CL4636.0
Notes:
The workload management objects used by DB2 use a fixed query cost estimate to
determine if a SQL query should be run using the managed service subclass where a
concurrency limit is applied.
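To see which side of the cost line a particular statement falls on, its timeron estimate can be captured with the explain facility. A minimal sketch, assuming the explain tables exist and using a placeholder statement:

db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
db2 "SELECT ... FROM TEST.HISTORY ..."
db2 "SET CURRENT EXPLAIN MODE NO"
db2 "SELECT TOTAL_COST FROM EXPLAIN_STATEMENT
     WHERE EXPLAIN_LEVEL = 'P'
     ORDER BY EXPLAIN_TIME DESC FETCH FIRST 1 ROW ONLY"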
Instructor notes:
Purpose — To discuss how a fixed query cost estimate is used to determine if a SQL query
is complex and needs to be run in the managed service subclass.
Details —
Additional information —
Transition statement —
Default Workload Management Explained (cont.)

• Read-only DML with a query cost estimate < 150000 timerons are
mapped to the unmanaged class to ensure that small point queries do
not end up queued behind larger complex queries
– This is key as it avoids negatively impacting the response times of short
queries, which is the typical drawback of queuing schemes
– Short queries generally have much smaller resource impact on the system, so
allowing them to run unmanaged is feasible
• Non-DML activities continue to run unmanaged by default
– Current Default Workload Management is focused specifically on the impacts
of large columnar analytic queries
• End result is simple and effective (if not completely optimal) workload
management out of the box for BLU Acceleration
Figure 4-32. Default Workload Management Explained - continued CL4636.0
Notes:
The end result is simple and effective workload management out of the box, resulting in
more efficient use of resources, better stability, and better performance.
At the same time, while this configuration is much better than an unmanaged system, it is a
rather “blunt” instrument, and it would not be accurate to qualify it as optimal. In the next
section we will explore how, with a little bit of tuning, you can further optimize these controls
and squeeze the best performance out of your system.
Instructor notes:
Purpose — To discuss the unmanaged class of processing for a DB2 database with an
analytics workload.
Details —
Additional information —
Transition statement —
Querying the Default WLM Work Class Settings
hotellnx86:/home/hotellnx86/davek> db2pd -workclasssets -alldbs
Database Member 0 -- Database XDB -- Active -- Up 0 days 10:36:52 -- Date
2013-09-05-22.33.08.942063
(…)
Work Classes:
Address = 0x00002AAC34C11840
ClassSetId = 2147483647
ClassId = 2147483647
ClassName = SYSMANAGEDQUERIES
Work Class Attributes:
Work Type = 2     <-- Query cost level
Timeron Cost:
From Value = 150000
To Value = 0
(…)
Figure 4-33. Querying the Default WLM Work Class Settings CL4636.0
Notes:
The db2pd command can be used to show the current setting for workload management
objects in an active DB2 database.
The visual shows a portion of the report generated using the -workclasssets option of
db2pd. The work class name is SYSMANAGEDQUERIES. The work class attributes show
the lower threshold limit of 150000, with no upper limit.
Instructor notes:
Purpose — To look at an example of a db2pd report showing the current lower limit for
estimated processing cost in the work class used for managed queries.
Details —
Additional information —
Transition statement —
Querying the Default WLM Threshold Settings
hotellnx86:/home/hotellnx86/davek> db2pd -thresholds -alldbs

(…)
Service Class Thresholds:

Threshold Name              = SYSDEFAULTCONCURRENT
Threshold ID                = 2147483647
Domain                      = 40
Domain ID                   = 4
Predicate ID                = 90
Maximum Value               = 12     <-- Query concurrency limit
Enforcement                 = D
Queueing                    = Y
Queue Size                  = -1
Collect Flags               = N
Partition Flags             = C
Execute Flags               = S
Enabled                     = Y      <-- Threshold is enabled
Check Interval (seconds)    = 0
Remap Target Serv. Subclass = 0
Log Violation Evmon Record  = Y
(…)
Figure 4-34. Querying the Default WLM Threshold Settings CL4636.0
Notes:
The visual shows a portion of the report generated using the -thresholds option of db2pd.
The threshold name is SYSDEFAULTCONCURRENT. The report shows that the maximum
concurrency is set to 12 on this system, and the threshold is enabled.
Instructor notes:
Purpose — To show a db2pd report that could be used to check the current concurrency
limit used for the managed subclass.
Details —
Additional information —
Transition statement —
"How Many Queries are Above/Below the Cost Line?"

-- smallcost: count of queries below the TIMERON threshold
-- smalltime: count of queries that execute for less than 30 seconds ("short" queries)
-- total:     total number of query executions on the system
with smallcost as
(
  select sum(num_coord_exec) as smallcost
  from table(mon_get_pkg_cache_stmt(null,null,null,-2))
  where query_cost_estimate < 150000
),
smalltime as
(
  select sum(num_coord_exec) as smalltime
  from table(mon_get_pkg_cache_stmt(null,null,null,-2))
  where (coord_stmt_exec_time / nullif(num_coord_exec,0)) < 30
),
total as
(
  select sum(num_coord_exec) as total
  from table(mon_get_pkg_cache_stmt(null,null,null,-2))
)
select (smallcost * 100) / total as pctsmallcost,
       (smalltime * 100) / total as pctsmalltime
from smallcost, smalltime, total;

PCTSMALLCOST PCTSMALLTIME
------------ ------------
          30           50

About 30% of our queries are running "unmanaged" but 50% of our queries are "short running".
Figure 4-35. How Many Queries are Above/Below the Cost line CL4636.0
Notes:
The visual shows how the MON_GET_PKG_CACHE_STMT table function could be used
to calculate the percentage of SQL statements in the database package cache whose
estimated cost would cause them to be processed as unmanaged. In the example, 30
percent of the statements in the package cache would be unmanaged.
The query also calculates the percentage of SQL statements that have an average
execution time of less than 30 seconds. The sample result shows 50 percent of the
statements had an average execution time of less than 30 seconds.
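A natural follow-on check, sketched here rather than taken from the course materials, lists the most expensive statements currently in the package cache so you can see which ones would run in the managed class:

SELECT QUERY_COST_ESTIMATE, NUM_COORD_EXEC,
       CAST(SUBSTR(STMT_TEXT,1,60) AS VARCHAR(60)) AS STATEMENT
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL,NULL,NULL,-2)) AS T
WHERE QUERY_COST_ESTIMATE >= 150000
ORDER BY QUERY_COST_ESTIMATE DESC
FETCH FIRST 10 ROWS ONLY;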
Instructor notes:
Purpose — To show an example query that could be used to check the database package
cache to see how many queries that have executed would be considered complex and
would have used the managed workload subclass for processing.
Details —
Additional information —
Transition statement — Next we will see how we might adjust the estimated cost limit for
managed query processing.

Adjusting the TIMERON Cost Threshold

[Slide graphic: the spectrum of queries from shorter/smaller to longer/larger. The cost line, initially at query cost = 150000, is adjusted upward so that queries that run for less than 30 seconds are not controlled.]

ALTER WORK CLASS SET SYSDEFAULTUSERWCS
  ALTER WORK CLASS SYSMANAGEDQUERIES
  FOR TIMERONCOST FROM 200000 TO UNBOUNDED     <-- increased from 150000

Figure 4-36. Adjusting the TIMERON Cost Threshold CL4636.0
Notes:
This visual shows an illustration of the spectrum of queries that have executed on our system,
organized by execution time from shortest to longest. From our previous query we know
that while approximately 30% of our queries qualify as unmanaged, roughly 50% fall
into what we want to categorize as “short running”.
By iteratively increasing the query cost and rechecking our percentages we can tune our
mapping criteria to ensure that all our “short running” queries are mapped to the
unmanaged default subclass, helping to ensure we minimize their response times while still
controlling the heavyweight queries submitted against the system.
Of course the inverse also applies, and you may need to decrease the query cost threshold
if larger queries are getting mapped to the unmanaged class.
Instructor notes:
Purpose — To discuss the possible adjustment to the estimated cost for queries that would
cause a query to be processed as managed.
Details —
Additional information —
Transition statement — Next we will discuss adjustments to the default workload
management objects in general.

Adjusting the default WLM configuration

• The mapping of heavyweight READ DML queries to SYSDEFAULTMANAGEDSUBCLASS
can be enabled or disabled:
– To enable the default work action mapping:
ALTER WORK ACTION SET SYSDEFAULTUSERWAS ENABLE
– To disable the default work action mapping so that all queries get mapped to
SYSDEFAULTSUBCLASS:
ALTER WORK ACTION SET SYSDEFAULTUSERWAS DISABLE
• The timeron range for the mapping of heavyweight READ DML queries can be
adjusted:
ALTER WORK CLASS SET SYSDEFAULTUSERWCS
ALTER WORK CLASS SYSMANAGEDQUERIES FOR TIMERONCOST
FROM 100000 TO UNBOUNDED
• The concurrency threshold on SYSDEFAULTMANAGEDSUBCLASS can be
enabled or disabled:
– To enable the concurrency threshold:
ALTER THRESHOLD SYSDEFAULTCONCURRENT ENABLE
– To disable the concurrency threshold:
ALTER THRESHOLD SYSDEFAULTCONCURRENT DISABLE
• The concurrency limit can be adjusted:
ALTER THRESHOLD SYSDEFAULTCONCURRENT
WHEN CONCURRENTDBCOORDACTIVITIES > 100 STOP EXECUTION
Figure 4-37. Easily Tune default WLM controls for over-utilized or under-utilized state CL4636.0
Notes:
If the system appears to be running in an under-utilized state, take the following steps:
• Examine the WLM_QUEUE_TIME_TOTAL metric, which is reported by various
system-level table functions (or the statistics event monitor), to determine whether
queries in the system are accumulating time waiting on concurrency thresholds.
– If no such queue time is being accumulated, the system is simply running under
peak capacity, and no tuning is necessary.
– If queue time is being accumulated, work is being queued on the system. Monitor
the amount of work running in SYSDEFAULTSUBCLASS and
SYSDEFAULTMANAGEDSUBCLASS, and consider incrementally increasing the
TIMERONCOST minimum on SYSMANAGEDQUERIES if it appears that too large
a proportion of the workload is running within the managed class.
• Assuming that the distribution of managed and unmanaged work appears reasonable,
consider incrementally increasing the concurrency limit that is specified by
SYSDEFAULTCONCURRENT until system resource usage reaches the target level.
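As a sketch of the first step above (not from the course materials), the queue metrics can be retrieved per service subclass with the MON_GET_SERVICE_SUBCLASS table function:

SELECT VARCHAR(SERVICE_SUPERCLASS_NAME,27) AS SUPERCLASS,
       VARCHAR(SERVICE_SUBCLASS_NAME,27) AS SUBCLASS,
       WLM_QUEUE_ASSIGNMENTS_TOTAL,
       WLM_QUEUE_TIME_TOTAL
FROM TABLE(MON_GET_SERVICE_SUBCLASS(NULL,NULL,-2)) AS T
ORDER BY WLM_QUEUE_TIME_TOTAL DESC;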
If the system appears to be running in an over-utilized state, take the following steps:
• Monitor the amount of work running in SYSDEFAULTSUBCLASS and
SYSDEFAULTMANAGEDSUBCLASS, and consider incrementally decreasing the
TIMERONCOST minimum on SYSMANAGEDQUERIES if it appears that too small a
proportion of the workload is running within the managed class.
• Assuming that the distribution of managed and unmanaged work appears reasonable,
consider incrementally decreasing the concurrency limit that is specified by
SYSDEFAULTCONCURRENT until system resource usage is back within the target
range.
Instructor notes:
Purpose — To discuss adjusting the configuration of the default workload management
objects to handle an over-utilized or under-utilized system.
Details —
Additional information —
Transition statement — Next we will discuss monitoring for DB2 databases where
column-organized tables will be accessed.


Monitoring databases and application using column-organized


Tables

Using SQL and db2pd to monitor


processing column-organized tables

© Copyright IBM Corporation 2014


Course materials may not be reproduced in whole or in part without the prior written permission of IBM. 8.0

Figure 4-38. Using SQL and db2pd to monitor processing column-organized tables CL4636.0

Notes:
In this next section we will be covering the monitoring facilities for column-organized table
processing.
Instructor notes:
Purpose — To introduce the next section, which will cover some new monitoring
capabilities associated with column-organized tables.
Details —
Additional information —
Transition statement — Next we will look at the new monitoring elements that can be
used to understand processing for column-organized tables.
Monitoring metrics for column-organized tables
• column-organized tables utilize a new column-organized object from a
storage perspective
• Access to column-organized object pages are counted separate from
other storage objects like data, index and xml
– Counters for total logical and physical column-organized data page reads
• POOL_COL_L_READS
• POOL_COL_P_READS
• POOL_COL_LBP_PAGES_FOUND
– Counter for column-organized data page writes - POOL_COL_WRITES
– Counters for asynchronous column-organized data page reads and writes and
pages found:
• POOL_ASYNC_COL_READS
• POOL_ASYNC_COL_READ_REQS
• POOL_ASYNC_COL_WRITES
• POOL_ASYNC_COL_LBP_PAGES_FOUND
– Counters for column-organized data page reads per table:
• OBJECT_COL_L_READS
• OBJECT_COL_P_READS
• OBJECT_COL_LBP_PAGES_FOUND
Figure 4-39. Monitoring metrics for column-organized tables CL4636.0
Notes:
A new set of monitor elements enables the monitoring of data page I/O for
column-organized tables separately from that of row-organized tables. You can use these
monitor elements to understand what portion of the I/O is being driven by access to
column-organized tables when a workload impacts both row-organized and
column-organized tables. These elements can also help you to tune the system, for
example, by helping you to decide whether to place column-organized tables in separate
table spaces, or whether to use a separate buffer pool. A column-organized data page
contains column data for a column-organized table.
Counters for total logical and physical column-organized data page reads and pages found:
• POOL_COL_L_READS
• POOL_COL_P_READS
• POOL_COL_LBP_PAGES_FOUND
Counter for column-organized data page writes: POOL_COL_WRITES
Counters for asynchronous column-organized data page reads and writes and pages
found:
• POOL_ASYNC_COL_READS
• POOL_ASYNC_COL_READ_REQS
• POOL_ASYNC_COL_WRITES
• POOL_ASYNC_COL_LBP_PAGES_FOUND
Counters for column-organized data page reads per table (and per statement per table,
reported through monitor usage lists):
• OBJECT_COL_L_READS
• OBJECT_COL_P_READS
• OBJECT_COL_LBP_PAGES_FOUND
Instructor notes:
Purpose — To discuss the new monitoring elements used to track activity for the
column-organized object pages.
Details —
Additional information —
Transition statement — Next we will look at an example of an SQL query that can be
used to monitor activity for column-organized tables.

Monitoring column-organized tables and synopsis tables page reads using MON_GET_TABLE

SELECT VARCHAR(TABNAME,30) AS TABLE, VARCHAR(TABSCHEMA,12) AS SCHEMA,
       ROWS_READ, OBJECT_DATA_L_READS AS DATA_L_READS,
       OBJECT_COL_L_READS AS COLUMN_L_READS, OBJECT_COL_P_READS,
       OBJECT_COL_LBP_PAGES_FOUND
FROM TABLE(MON_GET_TABLE(NULL,NULL,-2)) AS T1
WHERE TABSCHEMA = 'TEST' OR (TABSCHEMA = 'SYSIBM' AND TABNAME LIKE 'SYN%')
ORDER BY TABSCHEMA,TABNAME

TABLE                          SCHEMA       ROWS_READ            DATA_L_READS
------------------------------ ------------ -------------------- --------------------
SYN130617110037170122_HISTORY  SYSIBM       502                  4
SYN130617115333621920_ACCT     SYSIBM       977                  4
SYN130618131822321797_TELLER   SYSIBM       0                    4
ACCT                           TEST         606208               21
HISTORY                        TEST         513576               35
TELLER                         TEST         1000                 23

COLUMN_L_READS       OBJECT_COL_P_READS   OBJECT_COL_LBP_PAGES_FOUND
-------------------- -------------------- --------------------------
7                    5                    2
7                    5                    2
2                    1                    1
237                  17                   220
332                  37                   295
7                    5                    2

Figure 4-40. Monitoring column-organized tables and synopsis tables using MON_GET_TABLE CL4636.0
Notes:
The example query uses the MON_GET_TABLE monitor function to return selected table
activity statistics.
The query specifies a set of column-organized tables in the schema name ‘TEST’. There
are standard statistics like ROWS_READ and several new monitor elements like
OBJECT_COL_P_READS and OBJECT_COL_LBP_PAGES_FOUND showing processing
for the pages in the column-organized object. Notice the counts for pages read from the
table data object. These are references to the column dictionaries and other table
metadata.
The query also includes the synopsis tables that DB2 creates and uses internally, to better
understand the column-organized table processing. Since synopsis tables are internally
managed column-organized tables, access to these will also be tracked with the same
monitor elements.
Instructor notes:
Purpose — To discuss an example of a query that uses MON_GET_TABLE to show buffer
pool activity and counts for rows read for a set of column-organized tables.
Details —
Additional information —
Transition statement — Next we will look at monitoring column references by table.
Monitoring the number of columns referenced per query for each table using MON_GET_TABLE

SELECT varchar(tabname,20) as table,
       varchar(tabschema,12) as schema,
       rows_read,
       table_scans, num_columns_referenced,
       section_exec_with_col_references,
       ( num_columns_referenced / section_exec_with_col_references ) as avg_columns_persql
from table(mon_get_table('ROWORG',NULL,-2)) as t1
order by tabname ;

TABLE     SCHEMA       ROWS_READ     TABLE_SCANS  NUM_COLUMNS_REFERENCED
--------- ------------ ------------- ------------ ----------------------
ACCT      ROWORG       1000000       1            2
BRANCH    ROWORG       11            0            2
HISTORY   ROWORG       627152        0            4

SECTION_EXEC_WITH_COL_REFERENCES AVG_COLUMNS_PERSQL
-------------------------------- --------------------
1                                2
1                                2
2                                2

Figure 4-41. Monitoring the number of columns referenced per query for each table using MON_GET_TABLE CL4636.0
Notes:
The elements section_exec_with_col_references and num_columns_referenced returned
by MON_GET_TABLE can be used to determine the average number of columns being
accessed from a table during execution of the runtime section for an SQL statement. This
average column access count can help identify row-organized tables that might be
candidates for conversion to column-organized tables (for example, wide tables where only
a few columns are typically accessed).
Instructor notes:
Purpose — To show a SQL query that can check the average number of columns
referenced for each table. This would help to determine if converting a row-organized table
to a column-organized table would likely produce improved performance.
Details —
Additional information —
Transition statement — Next we will look at monitoring the access to page map indexes
for column-organized tables.
Monitoring Page Map Index statistics for column-organized tables using MON_GET_INDEX

SELECT VARCHAR(MON.TABNAME,12) AS TABLE, MEMBER,
       VARCHAR(CAT.INDNAME,30) AS IX_NAME, MON.IID AS INDEX_ID, MON.NLEAF,
       MON.OBJECT_INDEX_L_READS, MON.OBJECT_INDEX_P_READS,
       MON.OBJECT_INDEX_LBP_PAGES_FOUND
FROM TABLE(MON_GET_INDEX('TEST',NULL,-2)) AS MON, SYSCAT.INDEXES AS CAT
WHERE MON.TABNAME = CAT.TABNAME AND MON.TABSCHEMA = CAT.TABSCHEMA
      AND MON.IID = CAT.IID
      AND MON.TABNAME IN ('ACCT','HISTORY','BRANCH','TELLER')
ORDER BY MON.TABNAME

TABLE        MEMBER IX_NAME                        INDEX_ID NLEAF
------------ ------ ------------------------------ -------- --------------------
ACCT         0      SQL130617115333860             1        1
HISTORY      0      SQL130617110037380             1        1
TELLER       0      SQL130618131822550             1        1

OBJECT_INDEX_L_READS OBJECT_INDEX_P_READS OBJECT_INDEX_LBP_PAGES_FOUND
-------------------- -------------------- ----------------------------
6                    1                    5
9                    1                    8
5                    1                    4

3 record(s) selected.

Figure 4-42. Monitoring Page Map Index statistics for column-organized tables using MON_GET_INDEX CL4636.0
Notes:
The page map indexes that DB2 creates and accesses internally for column-organized
tables can be monitored like any other DB2 index.
The use of page map indexes will not be included in the explain reports, but the processing
of these indexes can still be monitored.
The sample query uses the statistics available through the monitor table function
MON_GET_INDEX that shows buffer pool activity for index pages.
Instructor notes:
Purpose — To show that DB2 does include the page-level activity for the page map
indexes on column-organized tables in the data returned by MON_GET_INDEX.
Details —
Additional information —
Transition statement — Next we will look at the monitoring for HASH join processing used
for column-organized tables.

Column-organized table join sortheap memory usage can be monitored using HASH join statistics
• HASH join use of sortheap memory can be monitored using:
– TOTAL_HASH_JOINS
– TOTAL_HASH_LOOPS
– HASH_JOIN_OVERFLOWS
– HASH_JOIN_SMALL_OVERFLOWS
– POST_SHRTHRESHOLD_HASH_JOINS
• These can be monitored at various levels
– Activity or package cache entry
– Connection, Unit of Work, Workload, Service Subclass, etc.
– Database level using MON_GET_DATABASE or
MON_GET_DATABASE_DETAILS
Figure 4-43. Column-organized table join sortheap memory usage can be monitored using HASH join statistics CL4636.0
Notes:
DB2 uses a form of HASH join for joining Column-organized tables. This makes use of
database sort memory during processing. With DB2 10.5 a set of monitor elements that
were previously available to snapshot based monitoring can now be retrieved using the
newer monitor functions like MON_GET_ACTIVITY, MON_GET_CONNECTION and
MON_GET_DATABASE.
The monitor elements related to HASH join processing are:
• TOTAL_HASH_JOINS - The total number of hash joins executed.
• TOTAL_HASH_LOOPS - The total number of times that a single partition of a hash join
was larger than the available sort heap space.
• HASH_JOIN_OVERFLOWS - The number of times that hash join data exceeded the
available sort heap space.
• HASH_JOIN_SMALL_OVERFLOWS - The number of times that hash join data
exceeded the available sort heap space by less than 10%.
• POST_SHRTHRESHOLD_HASH_JOINS - The total number of times that a hash join
heap request was limited due to concurrent use of shared sort heap space.
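As a sketch of pulling these indicators at the connection level, one of the levels listed on the slide (verify element availability against your DB2 level):

SELECT APPLICATION_HANDLE,
       TOTAL_HASH_JOINS, TOTAL_HASH_LOOPS,
       HASH_JOIN_OVERFLOWS, HASH_JOIN_SMALL_OVERFLOWS,
       POST_SHRTHRESHOLD_HASH_JOINS
FROM TABLE(MON_GET_CONNECTION(NULL,-2)) AS T
ORDER BY TOTAL_HASH_JOINS DESC;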
Instructor notes:
Purpose — To discuss the monitoring of HASH join processing for column-organized
tables to see if the available sort memory allowed the HASH join to process efficiently.
Details —
Additional information —
Transition statement — Next we will look at monitoring for the HASH based Group By
processing for column-organized tables as well as new time-spent monitoring elements.
Additional monitoring elements for column-organized table processing
• The GROUP BY operator on column-organized tables uses hashing as
the grouping method.
– Hashed GROUP BY operators are consumers of sort memory
– The following new monitor elements support the monitoring of sort memory
consumption during hashed GROUP BY operations
• TOTAL_HASH_GRPBYS
• ACTIVE_HASH_GRPBYS
• HASH_GRPBY_OVERFLOWS
• POST_THRESHOLD_HASH_GRPBYS
• ACTIVE_HASH_GRPBYS_TOP

• New time-spent monitor elements
– TOTAL_COL_TIME - represents total elapsed time over all column-organized
processing subagents
– TOTAL_COL_PROC_TIME - represents the subset of TOTAL_COL_TIME in
which the column-organized processing subagents were not idle on a measured
wait time (for example: lock wait, IO)
– TOTAL_COL_EXECUTIONS - the total number of times that data in column-
organized tables was accessed during statement execution.
Figure 4-44. Additional monitoring elements for column-organized table processing CL4636.0
Notes:
The GROUP BY operator on column-organized tables uses hashing as the grouping
method. Hashed GROUP BY operators are consumers of sort memory.
The following new monitor elements support the monitoring of sort memory consumption
during hashed GROUP BY operations. These elements are similar to existing monitor
elements for other sort memory consumers.
• TOTAL_HASH_GRPBYS
• ACTIVE_HASH_GRPBYS
• HASH_GRPBY_OVERFLOWS
• POST_THRESHOLD_HASH_GRPBYS
Time-spent monitor elements provide information about how the DB2 database manager is
spending time processing column-organized tables. The time-spent elements are broadly
categorized into wait times and processing times.
The following monitor elements are added to the time-spent hierarchy:
The three TOTAL_* metrics count the total time that is spent in column-organized
data processing across all column-organized processing subagents.
TOTAL_COL_TIME represents total elapsed time over all column-organized
processing subagents.
TOTAL_COL_PROC_TIME represents the subset of TOTAL_COL_TIME in which
the column-organized processing subagents were not idle on a measured wait time
(for example: lock wait, IO).
TOTAL_COL_EXECUTIONS represents the total number of times that data in
column-organized tables was accessed during statement execution.
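A sketch of using these elements, assuming they are surfaced through MON_GET_PKG_CACHE_STMT at your fix pack level, to find the statements spending the most time in column-organized processing:

SELECT TOTAL_COL_EXECUTIONS, TOTAL_COL_TIME, STMT_EXEC_TIME,
       CAST(SUBSTR(STMT_TEXT,1,60) AS VARCHAR(60)) AS STATEMENT
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL,NULL,NULL,-2)) AS T
WHERE TOTAL_COL_EXECUTIONS > 0
ORDER BY TOTAL_COL_TIME DESC
FETCH FIRST 5 ROWS ONLY;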
Instructor notes:
Purpose — To discuss some additional monitoring elements that can be used to analyze
queries using column-organized processing.
Details —
Additional information —
Transition statement — Next we will look at the monitoring of prefetch activity for
column-organized tables.

Monitor elements to monitor prefetch requests for data in column-organized tables
• DB2 tracks buffer pool hit rates for each column so prefetch can be
enabled or disabled on access to column data
• New monitor elements to measure prefetcher efficiency for data in
column-organized tables that are being submitted to prefetchers and the
number of pages that prefetchers skipped reading because the pages
were already in memory
• Efficient prefetching of data in column-organized tables is important for
mitigating the I/O costs of data scans
• The following monitor elements enable the monitoring of prefetch
requests for data in column-organized tables:
– POOL_QUEUED_ASYNC_COL_REQS
– POOL_QUEUED_ASYNC_COL_PAGES
– POOL_FAILED_ASYNC_COL_REQS
– SKIPPED_PREFETCH_COL_P_READS
– SKIPPED_PREFETCH_UOW_COL_P_READS
Figure 4-45. Monitor elements to monitor prefetch requests for data in column-organized tables CL4636.0
Notes:
The prefetch logic for queries that access column-organized tables is used to
asynchronously fetch only those pages that each thread will read for each column that is
accessed during query execution. If the pages for a particular column are consistently
available in the buffer pool, prefetching for that column is disabled until the pages are being
read synchronously, at which time prefetching for that column is enabled again.
Although the number of pages that a thread can prefetch simultaneously is limited by the
prefetch size of the table space that is being accessed, several threads can also prefetch
pages simultaneously.
New monitor elements to measure prefetcher efficiency can help you to track the volume of
requests for data in column-organized tables that are being submitted to prefetchers, and
the number of pages that prefetchers skipped reading because the pages were already in
memory. Efficient prefetching of data in column-organized tables is important for mitigating
the I/O costs of data scans.
The following monitor elements enable the monitoring of prefetch requests for data in
column-organized tables:
• POOL_QUEUED_ASYNC_COL_REQS
• POOL_QUEUED_ASYNC_COL_PAGES
• POOL_FAILED_ASYNC_COL_REQS
• SKIPPED_PREFETCH_COL_P_READS
• SKIPPED_PREFETCH_UOW_COL_P_READS
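A sketch of examining these elements per buffer pool, assuming that, like the other pool prefetch metrics, they are reported by the MON_GET_BUFFERPOOL table function:

SELECT VARCHAR(BP_NAME,20) AS BUFFERPOOL,
       POOL_QUEUED_ASYNC_COL_REQS,
       POOL_QUEUED_ASYNC_COL_PAGES,
       POOL_FAILED_ASYNC_COL_REQS,
       SKIPPED_PREFETCH_COL_P_READS
FROM TABLE(MON_GET_BUFFERPOOL(NULL,-2)) AS T;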
Instructor notes:
Purpose — To discuss the monitoring for prefetch activity for column-organized tables.
There are a number of differences in prefetch processing for column-organized tables. One
feature in column-organized processing is the enabling and disabling of prefetch requests
based on buffer pool hits on a column basis, so DB2 can use prefetch for column data that
is not likely to be memory resident and bypass prefetch overhead if a column has become
buffer pool resident.
Details —
Additional information —
Transition statement — Next we will look at a query that summarizes key performance
statistics for column-organized processing at a database level.
Uempty
Monitoring Database statistics with column-
organized tables using MON_GET_DATABASE
SELECT ROWS_READ, ROWS_RETURNED,
TOTAL_SORTS, SORT_OVERFLOWS,
TOTAL_HASH_JOINS, HASH_JOIN_OVERFLOWS,
POOL_COL_L_READS, TOTAL_COL_TIME,
TOTAL_HASH_GRPBYS, HASH_GRPBY_OVERFLOWS, SORT_SHRHEAP_TOP
FROM TABLE(MON_GET_DATABASE(-1))

ROWS_READ            ROWS_RETURNED        TOTAL_SORTS          SORT_OVERFLOWS
-------------------- -------------------- -------------------- --------------------
             3532131                69427                   47                    0

TOTAL_HASH_JOINS     HASH_JOIN_OVERFLOWS  POOL_COL_L_READS     TOTAL_COL_TIME
-------------------- -------------------- -------------------- --------------------
                  14                    0                 2190                 2798

TOTAL_HASH_GRPBYS    HASH_GRPBY_OVERFLOWS SORT_SHRHEAP_TOP
-------------------- -------------------- --------------------
                   3                    0                12000

1 record(s) selected.


Figure 4-46. Monitoring Database statistics with column-organized tables using MON_GET_DATABASE CL4636.0

Notes:
The sample query uses the MON_GET_DATABASE table function to retrieve some of the
monitoring elements that indicate key performance measures for column-organized
processing, such as hash joins and hash-based GROUP BY processing.
These could be used to monitor efficient configuration of database shared sort memory,
which cannot be managed by the self-tuning memory manager when column-organized
table support is enabled.
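As a rough check of shared sort memory headroom, the peak value can be compared with the
configured limit. A minimal sketch, assuming a DB2 10.5 database and using an illustrative
90% threshold (both values are in 4 KB pages):

SELECT D.SORT_SHRHEAP_TOP,
       BIGINT(C.VALUE) AS SHEAPTHRES_SHR,
       CASE WHEN D.SORT_SHRHEAP_TOP > 0.9 * BIGINT(C.VALUE)
            THEN 'Peak is near the limit; consider a larger SHEAPTHRES_SHR'
            ELSE 'OK' END AS ASSESSMENT
FROM TABLE(MON_GET_DATABASE(-1)) AS D,
     SYSIBMADM.DBCFG AS C
WHERE C.NAME = 'sheapthres_shr'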


Instructor notes:
Purpose — To show a query that uses the MON_GET_DATABASE table function to
look at some key performance measures that could indicate a need to change the
configuration for database shared sort memory.
Details — The MON_GET_DATABASE function was added with DB2 10.5 to make it
easier to access database level statistics without using snapshot based monitoring.
Additional information —
Transition statement — Next we will look at an example of a db2pd command that shows
a LOAD utility processing a column-organized table.


Monitor column-organized table LOAD using db2pd command -utilities option
Database Member 0 -- Active -- Up 0 days 00:25:21 -- Date 2013-05-20-08.40.33.104992

Utilities:
Address            ID Type State Invoker Priority StartTime           DBName  NumPhases CurPhase
0x000000020557F540 3  LOAD 0     0       0        Mon May 20 08:40:20 TESTBLU 4         3
Description: [LOADID: 50.2013-05-20-08.40.20.042757.0 (4;6)]
[*LOCAL.inst20.130520122733] OFFLINE LOAD DEL AUTOMATIC INDEXING REPLACE COPY NO INST20 .ACCT

Progress:
Address            ID PhaseNum CompletedWork TotalWork    StartTime           Description
0x000000020557F868 3  1        0 bytes       0 bytes      Mon May 20 08:40:20 SETUP
0x000000020557FA20 3  2        1000000 rows  1000000 rows Mon May 20 08:40:20 ANALYZE
0x000000020557FBA8 3  3        831694 rows   1000000 rows Mon May 20 08:40:26 LOAD
0x000000020557FD30 3  4        0 indexes     2 indexes    NotStarted          BUILD

Analyze Phase shown for column-organized Table


Figure 4-47. Monitor column-organized table LOAD using db2pd command -utilities option CL4636.0

Notes:
The processing performed by the LOAD utility for column-organized tables is a key
component of column-organized table support. The column dictionaries are created by a
LOAD utility using a new phase of processing, ANALYZE.
The sample db2pd command report for the -utilities option can be used to monitor LOAD
utility processing for column-organized tables. The example report output shows that the
LOAD utility has completed the ANALYZE phase and is currently performing the LOAD
phase of processing. Compared to load processing for row-organized tables, the BUILD
phase should be less time-consuming, since the page map indexes used for
column-organized tables are built on a page basis, not a row basis.
The memory requirements for loading column-organized tables may require some changes
to the number of load utilities that are run concurrently.
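To watch the phase progress while a load runs, the report can be re-issued automatically.
A minimal sketch, assuming the TESTBLU database from the example and db2pd's -repeat
option (an interval in seconds followed by a repetition count; both values here are
illustrative):

db2pd -db TESTBLU -utilities -repeat 5 12

This prints the Utilities and Progress sections every 5 seconds, 12 times, so the
transition from the ANALYZE phase to the LOAD and BUILD phases can be observed.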


Instructor notes:
Purpose — To look at an example of a db2pd command -utilities report that could be used
to monitor loading column-organized tables.
Details —
Additional information —
Transition statement — Next we will summarize the topics covered in this lecture on
column-organized tables.



Unit summary
Having completed this unit, you should be able to:
• Implement column-organized table support for a new or
existing DB2 database
• Configure a DB2 database that uses DB2 column-organized
tables, including sort memory and utility heap memory
• Describe the default workload management used for DB2 BLU
Acceleration processing and how you can tailor the WLM
objects to efficiently use system resources
• Monitor a DB2 database or application that uses column-
organized tables using SQL monitor functions
• Locate the column-organized processing portion of the access
plans for column-organized tables in DB2 explain reports
• Use db2convert or ADMIN_MOVE_TABLE to convert row-
organized tables to column-organized tables


Figure 4-48. Unit summary CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —



Student exercise


Figure 4-49. Student exercise CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —


Unit 5. DB2 10.5 BLU Acceleration Implementing
Shadow Tables and User Maintained MQTs

Estimated time
01:15

What this unit is about


This unit describes the support for Materialized Query Tables with
Column-organized tables. We will explain how a User Maintained MQT can
be created as a column-organized table. The concepts and management of
Shadow Tables, a special type of MQT, will be covered. The support for
Shadow tables requires configuration and use of InfoSphere Change Data
Capture for DB2 LUW, so we will describe the CDC objects that need to be
created to enable use of shadow tables. We will also cover the various DB2
special registers that must be set for applications to enable use of shadow
tables, including the use of a connection procedure.

What you should be able to do


After completing this unit, you should be able to:
• Implement Shadow tables for selected row-organized tables to improve
analytics query performance
• Configure a DB2 database that supports a mixture of application
processing, including OLTP and Analytics query processing with Shadow
tables
• Create the InfoSphere CDC Datastore, Subscription and Table mappings
required to support Shadow tables
• Implement a User Maintained MQT for a column-organized table
• Describe the various Special Register settings, like REFRESH AGE and
CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION for using
Shadow tables
• Utilize explain reports to verify use of Shadow Tables in access plans

How you will check your progress


• Accountability:
- Machine exercises


References
The IBM Knowledge Center for DB2 LUW can be used to get additional
details on the functions and commands described in this
lecture unit.



Unit objectives
After completing this unit, you should be able to:
• Implement Shadow tables for selected row-organized tables to
improve analytics query performance
• Configure a DB2 database that supports a mixture of application
processing, including OLTP and Analytics query processing with
Shadow tables
• Create the InfoSphere CDC Datastore, Subscription and Table
mappings required to support Shadow tables
• Implement a User Maintained MQT for a column-organized table
• Describe the various Special Register settings, like REFRESH
AGE and CURRENT MAINTAINED TABLE TYPES FOR
OPTIMIZATION for using Shadow tables
• Utilize explain reports to verify use of Shadow Tables in access
plans


Figure 5-1. Unit objectives CL4636.0

Notes:
This is the list of objectives for this lecture unit.


Instructor notes:
Purpose — To introduce the content of this unit.
Details —
Additional information —
Transition statement — First we will review some of the important terms and concepts for
materialized query tables.



Materialized Query Table – Concept Review


• A Materialized Query Table (MQT) is a physical table containing the
precomputed results from the tables that you specify in the materialized
query table definition
– The CREATE TABLE statement used to create a MQT contains the clause AS
SELECT to define the query that produces the table contents
For example
CREATE TABLE SALES.SUMMARY_2014 as
SELECT ….. FROM SALES.SALESDETAIL …WHERE SALES_YEAR = 2014…

• The DB2 Optimizer may utilize the MQT table data to produce a query
result if the SQL statement being processed matches the MQT table
definition and access to the MQT reduces processing costs
SELECT … FROM SALES.SALESDETAIL
WHERE SALES_YEAR = 2014 AND …

   --> DB2 Optimizer rewrites the query to use the MQT -->

SELECT … FROM SALES.SUMMARY_2014
WHERE SALES_YEAR = 2014 AND …


Figure 5-2. Materialized Query Table – Concept Review CL4636.0

Notes:
Materialized query tables (MQTs) are a powerful way to improve response time for complex
analytical queries because their data consists of precomputed results from the tables that you
specify in the materialized query table definitions.
MQTs can help improve response time particularly for queries that use one or more of the following
types of data:
• Aggregate data over one or more dimensions
• Joins and aggregate data over a group of tables
• Data from a commonly accessed subset of data
• Repartitioned data from a table, or part of a table, in a partitioned database environment
The larger the base tables, the more significant are the potential improvements in response time
when you use MQTs.
During the query rewrite phase, the optimizer determines whether to use an available MQT in place
of accessing the referenced base tables directly. If an MQT is used, you need access privileges on


the base tables, not the MQT, and the explain facility can provide information about which MQT was
selected.


Instructor notes:


Purpose — To introduce the concept of a MQT as a query result that the DB2 optimizer can utilize
to reduce query processing costs.
Details —
Additional information —
Transition statement — Next we will discuss the options for refreshing the contents of MQT
tables.


MQT Refresh Options - review


• REFRESH IMMEDIATE
– The MQT contents will be updated automatically by DB2 when the tables
referenced by the MQT AS SELECT statement change
– Implies MAINTAINED BY SYSTEM
– REFRESH TABLE and SET INTEGRITY can be used to load data into the MQT
• REFRESH DEFERRED
– The MQT contents are not automatically synchronized with the referenced tables.
The contents can reflect a snapshot from a previous point in time
– MAINTAINED BY SYSTEM
• REFRESH TABLE and SET INTEGRITY IMMEDIATE CHECKED can be used to load
data into the MQT
– MAINTAINED BY USER
• REFRESH TABLE and SET INTEGRITY IMMEDIATE CHECKED cannot be used
to load data
• The MQT can be populated using LOAD or DML (INSERT, UPDATE..)
• SET INTEGRITY .. IMMEDIATE UNCHECKED is used to enable MQT usage


Figure 5-3. MQT Refresh Options - review CL4636.0

Notes:
REFRESH IMMEDIATE
If a MQT table is defined as REFRESH IMMEDIATE, the changes made to the underlying
tables as part of a DELETE, INSERT, or UPDATE are cascaded to the materialized query table.
In this case, the content of the table, at any point in time, is the same as if the specified
subselect is processed. Materialized query tables defined with this attribute do not allow
INSERT, UPDATE, or DELETE statements.
The REFRESH TABLE and SET INTEGRITY with IMMEDIATE CHECKED statements can be
used to refresh the MQT contents.
REFRESH DEFERRED
If a MQT table is defined as REFRESH DEFERRED, the changes made to the underlying
tables are not cascaded to the materialized query table.
For system maintained MQT tables, the data in the table can be refreshed at any time using the
REFRESH TABLE statement. The data in the table only reflects the result of the query as a
snapshot at the time the REFRESH TABLE statement is processed. System-maintained


materialized query tables defined with this attribute do not allow INSERT, UPDATE, or DELETE
statements.
User-maintained materialized query tables defined with this attribute do allow INSERT,
UPDATE, or DELETE statements and can also be populated using a LOAD utility. The SET
INTEGRITY statement, with the IMMEDIATE UNCHECKED option is used to let DB2 know that
the MQT contents are valid.
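A minimal sketch of both maintenance styles, reusing the SALES.SUMMARY_2014 name from the
earlier example (each statement applies to an MQT of the corresponding type):

-- System-maintained REFRESH DEFERRED MQT: recompute the contents on demand
REFRESH TABLE SALES.SUMMARY_2014

-- User-maintained MQT: after populating it with LOAD or DML, tell DB2
-- the contents are valid without checking them
SET INTEGRITY FOR SALES.SUMMARY_2014 ALL IMMEDIATE UNCHECKED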


Instructor notes:
Purpose — To explain the REFRESH options for MQT tables. The column-organized MQT tables
that we will be discussing in this lecture are all REFRESH DEFERRED MQTs. We will discuss the
MAINTAINED BY REPLICATION option later when we talk about shadow tables.
Details —
Additional information —
Transition statement — Next we will discuss the conditions necessary for the DB2 optimizer to
utilize a MQT table for an access plan.


When can the DB2 Optimizer substitute a MQT table in the access plan for a query?
• The MQT is defined using the default, ENABLE QUERY
OPTIMIZATION
• The optimization class must be set to allow MQT usage in
access plans, classes 2, 5, 7 and 9
• Use of Refresh Immediate MQT tables does not depend on the
setting of CURRENT REFRESH AGE
• For Refresh Deferred MQT tables
– CURRENT REFRESH AGE is set to ANY
– CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION is set
such that it includes the materialized query table type.


Figure 5-4. When can the DB2 Optimizer substitute a MQT table in the access plan for a query? CL4636.0

Notes:
A REFRESH IMMEDIATE materialized query table defined with ENABLE QUERY OPTIMIZATION
is always considered for optimization if CURRENT QUERY OPTIMIZATION is set to 2 or a value
greater than or equal to 5.
A REFRESH DEFERRED materialized query table defined with ENABLE QUERY OPTIMIZATION
can be used to optimize the processing of queries if each of the following conditions is true (a
sketch of the corresponding SET statements follows the list):
• CURRENT REFRESH AGE is set to ANY.
• CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION is set such that it includes the
materialized query table type.
• CURRENT QUERY OPTIMIZATION is set to 2 or a value greater than or equal to 5.
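A minimal sketch of session settings that satisfy these conditions for a user-maintained
REFRESH DEFERRED MQT:

SET CURRENT QUERY OPTIMIZATION 5
SET CURRENT REFRESH AGE ANY
SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION USER

With these registers in effect, the optimizer is allowed to consider user-maintained
deferred-refresh MQTs during query matching.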


Instructor notes:
Purpose — To introduce the requirement to set the CURRENT REFRESH AGE special register to
access a deferred refresh MQT. This is true for column-organized MQT tables and shadow tables.
Details —
Additional information —
Transition statement — Next we will discuss the usage of column-organized tables and MQT
tables.


Utilization of MQT tables when using Column-organized tables
• With initial implementation of Column-organized tables in DB2 10.5
– A MQT table could not be defined using the ORGANIZE BY COLUMN clause
– A Column-organized table could not be referenced by a MQT
– General concept was that the high performance of column-organized tables for
analytics queries reduces the need to create and utilize MQTs to produce query
results efficiently
• With the Cancun release of DB2 10.5 (Fix Pack 4)
– A User Maintained MQT table can be defined using the ORGANIZE BY COLUMN
clause
– A new type of MQT, referred to as Shadow Tables is available
• The MAINTAINED BY REPLICATION clause is used to define a MQT as a Shadow
Table
• Shadow table MQT tables must be defined as REFRESH DEFERRED
• Infosphere Change Data Capture software is used to automate synchronization of
base tables with the Shadow tables


Figure 5-5. Utilization of MQT tables when using Column-organized tables CL4636.0

Notes:
With the introduction of Column-organized tables in DB2 10.5, there was a restriction that a MQT
table could not be created using the ORGANIZE BY COLUMN clause. There was another
restriction that a MQT table could not reference a base table that was column-organized. Since the
use of MQT tables is to improve query performance and the use of BLU Acceleration with
column-organized tables was designed to dramatically improve performance, the need for MQT
tables was considered less critical.
Starting with Fix Pack 4 of DB2 10.5:
• You can now create column-organized user-maintained materialized query tables (MQTs). This
enhancement is particularly useful if you are upgrading your DB2 server to Version 10.5 and
have existing MQTs. Help reduce upgrade costs by converting your MQTs into
column-organized user-maintained MQTs that are eligible to match queries that contain a mix of
column-organized and row-organized tables.
• You can create a shadow table, which is a column-organized copy of a row-organized table that
includes all columns or a subset of columns. Shadow tables are implemented as materialized
query tables (MQTs) that are maintained by replication. Using shadow tables, you can get the
performance benefits of BLU Acceleration for analytic queries in an online transaction


processing (OLTP) environment. Analytical queries against row-organized tables are
automatically routed to shadow tables if the replication latency falls within a user-defined limit.
Shadow tables are maintained by IBM® InfoSphere Change Data Capture for DB2 (InfoSphere
CDC), a component of the InfoSphere Data Replication product. InfoSphere CDC
asynchronously replicates DML statements that are applied on the source table to the shadow
table. By default, all applications access the source tables. Queries are automatically routed to
the source table (row-organized) or the shadow table (column-organized copy of the source
table) by using a latency-based algorithm that prevents applications from accessing the shadow
table when the latency is beyond the user-defined limit.


Instructor notes:


Purpose — To introduce the new types of column-organized MQT tables that can be created,
starting the Fix Pack 4 of DB2 10.5.
Details —
Additional information —
Transition statement — First we will explain how to implement user maintained MQT tables that
are column-organized.


Creating a User Maintained MQT as a Column-organized table
• Requirements for defining a User Maintained MQT as a Column-
organized table
– REFRESH DEFERRED must be specified
– MAINTAINED BY USER is specified
– ORGANIZED BY COLUMN clause must be specified
– Only Column-organized tables can be referenced in AS SELECT clause
– SELECT statement can contain multiple table joins and GROUP BY clause

CREATE TABLE COLORG.HIST_UMQT


( BRANCH_ID, TELLER_ID, SBALANCE , SCOUNT )
AS ( SELECT BRANCH_ID, TELLER_ID , SUM(BALANCE) AS SBALANCE,
COUNT(*) AS SCOUNT
FROM COLORG.HISTORY
GROUP BY BRANCH_ID, TELLER_ID )
DATA INITIALLY DEFERRED REFRESH DEFERRED
MAINTAINED BY USER
ORGANIZE BY COLUMN IN TSCOLD ;
SET INTEGRITY FOR COLORG.HIST_UMQT ALL IMMEDIATE UNCHECKED


Figure 5-6. Creating a User Maintained MQT as a Column-organized table CL4636.0

Notes:
Beginning with DB2 10.5 Fix Pack 4, you can create user maintained MQT tables that are
column-organized.
The CREATE TABLE statement used to define a column-organized MQT must have the following
options included:
• You must specify the ORGANIZE BY COLUMN clause when creating a column-organized MQT
• MAINTAINED BY USER must be specified, as system maintained column-organized MQT
tables are not supported.
• REFRESH DEFERRED must be specified, since REFRESH IMMEDIATE is not supported
• The referenced source tables must be column-organized.
User maintained column-organized MQT tables can include joins and GROUP BY clauses.
The example shows a column-organized MQT table definition that is based on a SQL statement
that contains summarized results from a single source table.


Instructor notes:


Purpose — To discuss the options that need to be included when a column-organized user
maintained MQT is created.
Details —
Additional information —
Transition statement — Next we will look at the statements that could be used to load data into
a column-organized MQT.


Loading the data into the User Maintained MQT


• A LOAD utility or SQL can be used to populate the Column-organized MQT

declare colhist1 cursor for
  SELECT BRANCH_ID, TELLER_ID, SUM(BALANCE) AS SBALANCE, COUNT(*) AS SCOUNT
  FROM COLORG.HISTORY
  GROUP BY BRANCH_ID, TELLER_ID
  ORDER BY BRANCH_ID, TELLER_ID

load from colhist1 of cursor replace into COLORG.HIST_UMQT NONRECOVERABLE

SET INTEGRITY FOR COLORG.HIST_UMQT ALL IMMEDIATE UNCHECKED

• Note: the column compression dictionaries would be built during LOAD processing
and statistics would be generated


Figure 5-7. Loading the data into the User Maintained MQT CL4636.0

Notes:
The sample statements show how a column-organized user maintained MQT table could be loaded
with the current data using a declared cursor with a LOAD command.
The DECLARE CURSOR statement would be similar to the SELECT statement used to create the
MQT, but it could include an ORDER BY clause. You could also load data into the MQT using SQL
statements.
The SET INTEGRITY statement with the IMMEDIATE UNCHECKED option is used to bring the
user-maintained materialized query table out of set integrity pending state.
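A minimal sketch of the SQL alternative mentioned above (for large volumes the cursor-based
LOAD is typically preferable, since LOAD also builds the column compression dictionaries and
collects statistics in one pass):

INSERT INTO COLORG.HIST_UMQT
  SELECT BRANCH_ID, TELLER_ID, SUM(BALANCE), COUNT(*)
  FROM COLORG.HISTORY
  GROUP BY BRANCH_ID, TELLER_ID

SET INTEGRITY FOR COLORG.HIST_UMQT ALL IMMEDIATE UNCHECKED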


Instructor notes:


Purpose — To show how a user maintained MQT could be loaded with the result data of the query
that was used to define the MQT.
Details —
Additional information —
Transition statement — Next we will look at the explain report that can be used to check if a user
maintained MQT will be used to produce the result for a SQL query.


Checking usage of the User Maintained MQT in the access plan for a query
set current degree 'ANY';
set current maintained table types for optimization USER;
set current refresh age ANY;

Original Statement:
------------------
SELECT
  HISTORY.BRANCH_ID,
  sum(HISTORY.balance) as br_balance,
  count(*) as br_trans
FROM
  COLORG.HISTORY AS HISTORY
WHERE
  HISTORY.BRANCH_ID between 10 and 35
GROUP BY HISTORY.BRANCH_ID
ORDER BY HISTORY.BRANCH_ID ASC

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details: EXP0148W The following MQT or statistical view was
  considered in query matching: "COLORG"."HIST_UMQT".
Diagnostic Identifier: 2
Diagnostic Details: EXP0149W The following MQT was used (from those
  considered) in query matching: "COLORG"."HIST_UMQT".

Access Plan (excerpt):

          26
         SORT
         ( 3)
        171.646
          23
           |
          26
          CTQ
         ( 4)
        171.636
          23
           |
          26
        GRPBY
         ( 5)
        171.63
          23
           |
        8897.98
        TBSCAN
         ( 6)
        171.274
          23
           |
         34223
  CO-TABLE: COLORG.HIST_UMQT
          Q1

Figure 5-8. Checking usage of the User Maintained MQT in the access plan for a query CL4636.0

Notes:
The visual shows a group of SET CURRENT statements that would be used to establish the
conditions necessary to allow the user maintained MQT table to be used for a SQL query, including:
• SET CURRENT DEGREE ‘ANY’
• SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION USER
• SET CURRENT REFRESH AGE ANY
The slide includes the original SQL statement text, which references the column-organized table
that was used for the MQT definition.
The extended diagnostic section of the DB2 db2exfmt explain report includes several messages
stating that the optimizer found the MQT which matched the SQL statement and decided to utilize
the MQT in the access plan.
The slide also includes a portion of the access plan diagram, showing the MQT table being scanned
rather than the table referenced in the SQL text.
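A minimal sketch of producing such a report from the CLP, assuming the explain tables already
exist, the special registers above are set in the same session, and using an illustrative database
name placeholder and output file name:

db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
db2 "SELECT BRANCH_ID, SUM(BALANCE), COUNT(*) FROM COLORG.HISTORY WHERE BRANCH_ID BETWEEN 10 AND 35 GROUP BY BRANCH_ID"
db2 "SET CURRENT EXPLAIN MODE NO"
db2exfmt -d <dbname> -1 -o mqt_plan.txt

With EXPLAIN MODE EXPLAIN the statement is compiled and captured but not executed;
db2exfmt then formats the most recently captured explain data.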


Instructor notes:


Purpose — To show how a user maintained column-organized MQT table could be used in the
access plan generated by a DB2 explain tool.
Details —
Additional information —
Transition statement — Next we will discuss a different type of column-organized MQT, shadow
tables.


Shadow Tables can be used to accelerate Analytics query processing in an OLTP Database

Row-organized tables can process all SQL that changes data and SELECT statements
expected to perform well.
Column-organized tables can process some SQL queries to improve performance.


Figure 5-9. Shadow Tables can be used to accelerate Analytics query processing in an OLTP Database CL4636.0

Notes:
This visual shows how shadow tables can be used to accelerate analytics that are directly run
against transactional data as it happens.
Within the same DB2 database, we have both the main transactional row-organized tables and their
corresponding shadow tables which are copies of the source tables, but in columnar format.
With this dual format architecture, transactional applications continue to optimally target the main
row-organized tables, while complex analytical queries are re-routed to the corresponding shadow
tables. Since the shadow tables are in columnar format, the analytical queries are accelerated by
an order of magnitude faster via BLU technology.
To maintain the shadow tables, the solution leverages Change Data Capture, an IBM InfoSphere
Data Replication product. Performance testing has shown that the latency between the main
transactional tables and the shadow tables can be as low as single digit seconds, allowing analytics
to act on as close to real time data as possible.
Since shadow tables are used to accelerate the analytics processing, any extra indexes that were
created just to speed up the analytic queries can be dropped. This may offset any impact that the
data replication to shadow tables may have on the transactional workload.


Instructor notes:


Purpose — The key point here is that shadow tables are implemented directly in the same DB2
database where transactions are processed using row-organized tables rather than creating a
second DB2 database with a copy of the transactional data. Applications reference the
row-organized tables, but the DB2 optimizer is able to route complex analytics type SQL processing
to the column-organized tables to improve performance.
Details —
Additional information —
Transition statement — Next we will discuss the characteristics for shadow tables.


Shadow table characteristics


• A shadow table is a column-organized copy of a row-organized table
that includes all columns or a subset of columns.
• Shadow tables are implemented as materialized query tables (MQTs)
that are maintained by replication.
• Using shadow tables, you can get the performance benefits of BLU
Acceleration for analytic queries in an online transaction processing
(OLTP) environment.
• Shadow tables are maintained by IBM InfoSphere Change Data Capture
for DB2 LUW, a component of the InfoSphere Data Replication product.
– InfoSphere CDC asynchronously replicates DML statements that are applied on
the source table to the shadow table.
• By default, all applications access the source tables.
– Queries are automatically routed to the source table (row-organized) or the
shadow table (column-organized copy of the source table) based on estimated
costs
– A latency-based algorithm is available to prevent applications from accessing the
shadow table when the latency is beyond the user-defined limit.


Figure 5-10. Shadow table characteristics CL4636.0

Notes:
A shadow table is a column-organized copy of a row-organized table that includes all columns or a
subset of columns. Shadow tables are implemented as materialized query tables (MQTs) that are
maintained by replication.
Using shadow tables, you can get the performance benefits of BLU Acceleration for analytic queries
in an online transaction processing (OLTP) environment. Analytical queries against row-organized
tables are automatically routed to shadow tables if the replication latency falls within a user-defined
limit.
BLU Acceleration enhances the performance of complex queries through a column-organized
architecture. By combining this enhanced performance for complex queries with the efficiency of
row-organized tables for OLTP queries, you can use shadow tables to capitalize on the best of both
worlds.
Shadow tables are maintained by IBM InfoSphere Change Data Capture for DB2, a component of
the InfoSphere Data Replication product. InfoSphere CDC asynchronously replicates DML
statements that are applied on the source table to the shadow table.


By default, all applications access the source tables. Queries are automatically routed to the source
table (row-organized) or the shadow table (column-organized copy of the source table) by using a
latency-based algorithm that prevents applications from accessing the shadow table when the
latency is beyond the user-defined limit.
Shadow tables improve analytic query performance in OLTP environments without having to add
indexes for this purpose.


Instructor notes:
Purpose — To explain some of the unique characteristics of shadow tables.
Details —
Additional information —
Transition statement — Next we will discuss how to create a shadow table.



How to create a Shadow table


• DB2 10.5 Fix pack 4 delivered a new type of Materialized
Query Table, referred to as a shadow table
– Shadow tables are Column-organized tables
– Shadow tables are created using CREATE TABLE with an AS
SELECT statement, with the following requirements:
• MAINTAINED BY REPLICATION clause is required
• REFRESH DEFERRED is required
• The SELECT statement refers to a single Row-organized table
• The SELECT can include a subset of the Columns in the source Row-
organized table, but no GROUP BY clause is allowed, so each row in the
Shadow table is related to a single row on the Row-organized table
• ORGANIZED BY COLUMN clause is required
– Shadow tables have an enforced Primary Key that matches a Primary
Key or Unique Constraint from the source table
• The primary key allows Infosphere CDC to apply each row change in
source table to a single row of the Shadow table


Figure 5-11. How to create a Shadow table CL4636.0

Notes:
Shadow tables became available with Fix Pack 4 of DB2 10.5. Shadow tables are a special type of
Materialized Query Table (MQT).
Create the shadow table by issuing the CREATE TABLE statement with the MAINTAINED BY
REPLICATION clause. This clause identifies this table as a shadow table. The primary key of the
source table must be included in the select list of the CREATE TABLE statement for the shadow
table.
The CREATE TABLE statement for Shadow tables must include these options:
• REFRESH DEFERRED
• ORGANIZE BY COLUMN must be specified even if the default table organization has been
set to COLUMN
• The following restrictions apply to the fullselect in a shadow table definition:
- The fullselect can reference only one base table; joins are not supported.
- The base table must be a row-organized table.


- The subselect can contain only a select-clause and a from-clause; no GROUP BY clause can be
included.
- The projection list of the shadow table can reference only base table columns that are valid
in a column-organized table. Expressions are not supported in the projection list. You cannot
rename the columns that are referenced in the projection list by using the column list or the
AS clause.
- The projection list of the shadow table must include at least one set of enforced unique
constraint or primary key columns from the base table.
- The fullselect cannot include references to a nickname, a typed table, or a view or contain
the SELECT DISTINCT clause.


Instructor notes:


Purpose — To discuss the options that are required in the CREATE TABLE statement used to
define a shadow table.
Details —
Additional information —
Transition statement — Next we will review some of the benefits associated with shadow tables.


Summary of Shadow Tables Benefits

• Single database – analytics SQL directly on transactional data
• Analytics using column-organized shadow tables with BLU
Acceleration – order of magnitude faster!
• Optimal transactional workload
– Continue to optimally access row-organized tables
– No need for secondary indexes on row-organized tables for
analytics purpose
• No change to queries – DB2 optimizer does the routing
• Minimal latency – analytics on near real time data
– Leverage IBM InfoSphere Data Replication (CDC)
– Available with DB2 AESE and DB2 AWSE (for shadow table
usage)


Figure 5-12. Summary of Shadow Tables Benefits CL4636.0

Notes:
This slide summarizes the key benefits of the shadow table solution:
• A single DB2 database where analytics act directly on transactional data
• The analytics query processing is accelerated using column-organized shadow tables,
which is an order of magnitude faster
• The transactional workload continues to optimally access the row-organized tables.
• Using shadow tables, there is a reduced requirement for additional secondary indexes on the
row-organized tables that may otherwise be needed for analytics query processing.
• No changes are required to SQL queries, since routing to shadow tables is done by the DB2
optimizer when statements are compiled. Note that only dynamic SQL can be routed to use
shadow tables.
• Minimal latency is achieved by leveraging IBM InfoSphere Data Replication (CDC) which is
available with the editions of DB2 LUW that support Column-organized tables:
- DB2 AESE - DB2 Advanced Enterprise Server Edition
- DB2 AWSE - DB2 Advanced Workgroup Server Edition


Instructor notes:


Purpose — To review some key benefits for using shadow tables.
Details —
Additional information —
Transition statement — Next we will look at an example of the CREATE TABLE statement used to
create a shadow table.


Example DDL to create a shadow table


• Create a Column-organized shadow table for a specific set of
columns in a Row-organized table, ROWORG.ACCT

CREATE TABLE COLORG.ACCT_SHAD
  ( ACCT_ID, ACCT_GRP, BALANCE )
  AS ( SELECT ACCT_ID, ACCT_GRP, BALANCE FROM ROWORG.ACCT )
  DATA INITIALLY DEFERRED REFRESH DEFERRED
  MAINTAINED BY REPLICATION
  ORGANIZE BY COLUMN IN TSSHADD INDEX IN TSSHADI

SET INTEGRITY FOR COLORG.ACCT_SHAD ALL IMMEDIATE UNCHECKED

ALTER TABLE COLORG.ACCT_SHAD ADD CONSTRAINT ACCT_SHAD_PK
  PRIMARY KEY ( ACCT_ID )

The ACCT_ID column in the table ROWORG.ACCT is the Primary Key


Figure 5-13. Example DDL to create a shadow table CL4636.0

Notes:
The visual shows an example of the statements used to create a new shadow table.
The CREATE TABLE statement refers to the row-organized table ROWORG.ACCT. The shadow
table will only include the subset of columns that are expected to be used by analytics queries. The
clauses REFRESH DEFERRED, MAINTAINED BY REPLICATION, and ORGANIZE BY COLUMN
are included.
The SET INTEGRITY statement with the IMMEDIATE UNCHECKED option resolves the set
integrity pending status for the new shadow table.
The ALTER TABLE statement defines an enforced primary key for the shadow table based on the
primary key column for the row-organized table.


Instructor notes:


Purpose — To discuss an example definition for a new shadow table.
Details —
Additional information —
Transition statement — Next we will discuss some database and application requirements for
using shadow tables.


How to enable use of Shadow Tables for an application
• The DB2 instance may not be configured for column-organized
processing
– DB2_WORKLOAD is probably not set to ANALYTICS for an OLTP database
– Configure the database options to support column-organized tables
• Ensure that the SORTHEAP and SHEAPTHRES_SHR are not set to AUTOMATIC
• Configure UTIL_HEAP_SZ to support LOAD processing
• Configure LOGARCHMETH1 for archive logging
– Required to support CDC based Replication
– Intra-parallel processing must be enabled
• The following statement could be used in an application
CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES');
SET CURRENT DEGREE 'ANY';
– For latency-based routing to a Shadow table
• CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION special register is set
to contain only REPLICATION
– The DB CFG option DFT_MTTB_TYPES can be set to REPLICATION
• CURRENT REFRESH AGE special register is set to a duration other than zero or ANY
– Lock Isolation can only be CS or UR for a shadow table to be considered

Figure 5-14. How to enable use of Shadow Tables for an application CL4636.0

Notes:
It is quite possible that Shadow tables would be defined in an existing DB2 database where the
OLTP processing is currently performed.
That DB2 instance and database would probably not have been configured using the
DB2_WORKLOAD registry variable setting of ANALYTICS.
Since shadow tables are column-organized tables, the DB2 database will need to meet the basic
requirements for BLU Acceleration processing, including the following (a command-line sketch
follows the list):
• The database configuration options SORTHEAP and SHEAPTHRES_SHR can not be set to
AUTOMATIC, but need to be large enough to process column-organized tables.
• The utility heap configuration option, UTIL_HEAP_SZ, needs to be large enough to support
efficient LOAD processing for column-organized tables.
• The database needs to be configured for archive logging, using LOGARCHMETH1, since the
Infosphere Change Data Capture software needs to be able to access archived logs.
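A hedged sketch of these steps from the command line; the database name OLTPDB, the sizes,
and the archive path are illustrative placeholders, not sizing recommendations:

db2 UPDATE DB CFG FOR OLTPDB USING SORTHEAP 50000 SHEAPTHRES_SHR 400000
db2 UPDATE DB CFG FOR OLTPDB USING UTIL_HEAP_SZ 1000000
db2 UPDATE DB CFG FOR OLTPDB USING LOGARCHMETH1 DISK:/db2arch/

Note that switching LOGARCHMETH1 from circular to archive logging places the database in
backup pending state until a full backup is taken.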
Column-organized processing in DB2 requires intra-parallel processing being enabled. One method
to enable intra-parallel processing is for the application to call the procedure
ADMIN_SET_INTRA_PARALLEL and to set the special register CURRENT DEGREE to ANY.


Latency-based routing is a performance improvement technique that directs a query to a shadow
table when the replication latency is within a user-defined limit.
If you create a shadow table with ENABLE QUERY OPTIMIZATION clause, each of the following
conditions must be true to optimize query processing that is based on a latency period:
• The CURRENT QUERY OPTIMIZATION special register is set to 2 or a value greater than or
equal to 5.
• The CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION special register is set to
contain only REPLICATION.
• The CURRENT REFRESH AGE special register is set to a duration other than zero or ANY.


Instructor notes:
Purpose — To discuss some configuration requirements that are required for using shadow tables
because they are column-organized tables. Some additional special register settings are necessary
because shadow tables are deferred refresh MQT tables.
Details —
Additional information —
Transition statement — Next we will discuss some more details about latency based routing for
shadow tables.


More about Latency-based routing to Shadow Tables
• Replication latency is the amount of time that it takes for a transaction
against a source table to be applied to a shadow table.
• Latency-based routing is a performance improvement technique that
directs a query to a shadow table when the replication latency is within a
user-defined limit
• The limit is based on the Special Register CURRENT REFRESH AGE,
based on a timestamp
– SET CURRENT REFRESH AGE 500, sets the limit to 5 minutes
• Replication latency information is communicated to the DB2 database
through the SYSTOOLS.REPL_MQT_LATENCY table
– Updated by InfoSphere CDC to take advantage of latency-based routing
– Can be created using SYSINSTALLOBJECTS procedure
• CALL SYSINSTALLOBJECTS('REPL_MQT','C','TSWORK',NULL)
• Non-latency based routing can also be used with shadow tables
– SET CURRENT REFRESH AGE ANY


Figure 5-15. More about Latency-based routing to Shadow Tables CL4636.0

Notes:
Replication latency is the amount of time that it takes for a transaction against a source table to be
applied to a shadow table. Latency-based routing is a performance improvement technique that
directs a query to a shadow table when the replication latency is within a user-defined limit.
If you create a shadow table with ENABLE QUERY OPTIMIZATION clause, each of the following
conditions must be true to optimize query processing that is based on a latency period:
• The CURRENT QUERY OPTIMIZATION special register is set to 2 or a value greater than or
equal to 5.
• The CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION special register is set to
contain only REPLICATION.
• The CURRENT REFRESH AGE special register is set to a duration other than zero or ANY.
This special register specifies the refresh age as a timestamp with a format of
yyyymmddhhmmss where the parts of the timestamp are defined as follows:
- yyyy indicates the year (0-9999)
- mm indicates the month (0-11)


- dd indicates the day (0-30)
- hh indicates the hour (0-23)
- mm indicates the minute (0-59)
- ss indicates the second (0-59)
The timestamp value can be truncated on the left. This left truncation means that you do
not have to specify year, month, day, and so on, if you want a refresh age of only one
second. However, individual elements that are preceded by another element must
include any leading zeros. For example, a refresh age of 10705 represents 1 hour, 7
minutes, and 5 seconds.
Replication latency information is communicated to the DB2 instance through the
SYSTOOLS.REPL_MQT_LATENCY table that is updated by InfoSphere CDC to take advantage of
latency-based routing. This table can be created using the DB2 stored procedure
SYSINSTALLOBJECTS.
Shadow tables could also be used without latency checking by setting the CURRENT REFRESH
AGE special register to ANY.
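A minimal sketch that ties these pieces together, using the values shown above:

-- One-time setup: create SYSTOOLS.REPL_MQT_LATENCY in the TSWORK table space
CALL SYSINSTALLOBJECTS('REPL_MQT','C','TSWORK',NULL)

-- In each analytic session, enable latency-based routing with a 5-minute limit
SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION REPLICATION
SET CURRENT REFRESH AGE 500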


Instructor notes:


Purpose — To discuss the concepts of latency based selection of shadow tables, with
communication between the replication software, CDC and DB2 so applications can prevent the
DB2 optimizer from using a shadow table if the data in the shadow table is not current enough.
Details —
Additional information —
Transition statement — Next we will talk about setting the value of SORTHEAP in a DB2
database where column-organized shadow tables and row-organized tables will be accessed.


Configuration of SORTHEAP for a database with a mixture of BLU Acceleration and OLTP processing
• The requirements for sort heap memory, SORTHEAP, for efficient
processing of Column-organized tables are higher than you would
normally set for databases in OLTP environments
• A higher setting for SORTHEAP could impact access plans for the OLTP
processing using the row-organized tables
• You can use the OPT_SORTHEAP_EXCEPT_COL value option for
DB2_EXTENDED_OPTIMIZATION to override the value of the sortheap
database configuration parameter.
– The override value affects query optimization only and does not
determine the amount of actual memory that is available at run time.
– If the query accesses a column-organized table, this override value is
ignored
db2set DB2_EXTENDED_OPTIMIZATION="OPT_SORTHEAP_EXCEPT_COL 5000"


Figure 5-16. Configuration of SORTHEAP for a database with mixture of BLU Acceleration and OLTP processing CL4636.0

Notes:
You can use the OPT_SORTHEAP_EXCEPT_COL value option of the DB2 registry variable
DB2_EXTENDED_OPTIMIZATION to override the value of the sortheap database configuration
parameter.
The override value affects query optimization only and does not determine the amount of actual
memory that is available at run time. If the query accesses a column-organized table, this override
value is ignored to allow the query compiler to use the current value of the sortheap database
configuration parameter.
One usage of the OPT_SORTHEAP_EXCEPT_COL is for shadow tables. Shadow tables facilitate
BLU Acceleration for analytical queries in OLTP environment. Shadow tables are column-organized
tables. The requirements for sort heap memory are higher than you would normally have for
databases in OLTP environments. To increase the sort heap memory without affecting existing
access plans for OLTP queries, add OPT_SORTHEAP_EXCEPT_COL to
DB2_EXTENDED_OPTIMIZATION to override the value of the sortheap database configuration
parameter.
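A hedged sketch of setting and verifying the override; registry variable changes generally take
effect only after the instance is restarted:

db2set DB2_EXTENDED_OPTIMIZATION="OPT_SORTHEAP_EXCEPT_COL 5000"
db2set DB2_EXTENDED_OPTIMIZATION
db2stop
db2start

With this in place, the optimizer costs row-organized plans as if SORTHEAP were 5000 pages,
while queries that touch column-organized tables continue to use the actual SORTHEAP value.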


Instructor notes:


Purpose — To discuss a DB2 registry variable that could be used to prevent SQL access plans for
the row-organized tables from being impacted if the SORTHEAP memory is set high to support the
column-organized tables.
Details —
Additional information —
Transition statement — Next we will discuss using a connection procedure to set the application
environment special registers outside of the applications.


Using a Connect Procedure to Enable Shadow Tables for SQL compilation
• Leverage existing DB2 connect procedure functionality to enable
shadow table usage automatically on database connection
• Allow custom logic to inform DB2 which connections can use
shadow tables
– Separate transactional connections from analytical connections
– Specify different latency limits for different OLAP connections

(Diagram: the connect procedure is executed for each connection. The Transactional connection
continues to access the row-organized tables, while the Analytics1 and Analytics2 connections
are routed to the shadow tables.)


Figure 5-17. Using a Connect Procedure to Enable Shadow Tables for SQL compilation CL4636.0

Notes:
A connect procedure can be used to automatically enable usage of shadow table for selected
applications without the need of modifying the applications.
Basically, a connect procedure is a SQL procedure that is executed for each connection to the
database.
The body of the procedure can be implemented to test attributes of each connection and
conditionally execute the SQL statements shown on the previous slide only for those connections
that correspond to applications identified as able to make use of shadow tables.
The diagram on this visual shows three connections:
• One transactional connection that should not make use of shadow tables,
• Two analytical connections that have been identified to make use of shadow tables.
The connect procedure is executed for all three connections but only those analytical connections
are allowed to make use of shadow tables while the transactional one continues to only access the
row-organized tables.


Instructor notes:


Purpose — To introduce the concept of using a connection procedure to set the special registers
needed to enable shadow table access without changing applications.
Details —
Additional information —
Transition statement — Next we will look at an example connection procedure for shadow table
support.


A Sample Connect Procedure to enable shadow table usage for selected applications

CREATE OR REPLACE PROCEDURE ADMIN_SCHEMA.SHADOW_SETUP()
BEGIN
  DECLARE APPLNAME VARCHAR(128);
  SET APPLNAME = (SELECT APPLICATION_NAME FROM TABLE
    (SYSPROC.MON_GET_CONNECTION(MON_GET_APPLICATION_HANDLE(),-1)));

  -- Replace this test with custom conditions; any other connection
  -- attributes, such as user ID, can be used
  IF (APPLNAME LIKE 'report%' OR
      APPLNAME = 'end_of_day_summary') THEN
    CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES');
    SET CURRENT DEGREE 'ANY';
    SET CURRENT MAINTAINED TYPES REPLICATION;
    SET CURRENT REFRESH AGE 500;
  END IF;
END@

GRANT EXECUTE ON PROCEDURE ADMIN_SCHEMA.SHADOW_SETUP TO PUBLIC@

-- Every connection will now execute the procedure and enable shadow
-- table usage as appropriate
UPDATE DB CFG USING CONNECT_PROC "ADMIN_SCHEMA.SHADOW_SETUP"@

-- Tip: reset CONNECT_PROC to NULL before updating the procedure body
UPDATE DB CFG USING CONNECT_PROC NULL@

Figure 5-18. A Sample Connect Procedure to enable shadow table usage for selected applications CL4636.0

Notes:
Here we have a sample connect procedure.
The body is quite simple. It tests the application name of the connection to look for certain analytic
applications that have been identified as able to make use of shadow tables. In this sample, DB2
will consider routing to shadow tables only for these analytic applications; all other applications are
not impacted by the presence of shadow tables.
Since a connect procedure is just a standard SQL procedure, you can customize it to fit your needs
so that you can properly and effectively identify which connections should make use of shadow
tables. For example, instead of hard-coding connection attributes in the procedure, you can set up
an “opt-in” DB2 table containing the applications that can make use of shadow tables, and have the
connect procedure search this “opt-in” table to determine the connections for which to enable
usage of shadow tables, as sketched below.
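The following is a minimal sketch of that opt-in approach. The table ADMIN_SCHEMA.SHADOW_OPTIN, its
APPLNAME column, and the V_APPLNAME variable are hypothetical names introduced here for illustration
only:

-- Hypothetical opt-in table listing applications that can use shadow tables
CREATE TABLE ADMIN_SCHEMA.SHADOW_OPTIN (APPLNAME VARCHAR(128) NOT NULL)@

CREATE OR REPLACE PROCEDURE ADMIN_SCHEMA.SHADOW_SETUP()
BEGIN
  DECLARE V_APPLNAME VARCHAR(128);
  SET V_APPLNAME = (SELECT APPLICATION_NAME FROM TABLE
      (SYSPROC.MON_GET_CONNECTION(MON_GET_APPLICATION_HANDLE(),-1)));

  -- Enable shadow table routing only for applications in the opt-in table
  IF EXISTS (SELECT 1 FROM ADMIN_SCHEMA.SHADOW_OPTIN O
              WHERE O.APPLNAME = V_APPLNAME) THEN
    CALL SYSPROC.ADMIN_SET_INTRA_PARALLEL('YES');
    SET CURRENT DEGREE 'ANY';
    SET CURRENT MAINTAINED TYPES REPLICATION;
    SET CURRENT REFRESH AGE 500;
  END IF;
END@

With this variation, enabling shadow table usage for another application only requires inserting a
row into the opt-in table rather than editing the procedure body.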


Instructor notes:


Purpose — To discuss a sample connection procedure that could set the special registers
associated with shadow table access without changing applications.
Details —
Additional information —
Transition statement — Next we will discuss how to set up InfoSphere Change Data Capture to
work with shadow tables.


Shadow Tables with InfoSphere Data Replication CDC

• Available with DB2 AESE and DB2 AWSE (for shadow table usage)
– InfoSphere Data Replication CDC for DB2 LUW 10.2.1 Interim Fix 12 or later
– InfoSphere CDC Access Server Version 10.2.1 Interim Fix 5 or later
– (Optional) InfoSphere CDC Management Console Version 10.2.1 Interim Fix 5 or later

Figure 5-19. Shadow Tables with InfoSphere Data Replication CDC CL4636.0

Notes:
This is a pictorial view illustrating how DB2 and CDC work together to maintain the shadow tables.
InfoSphere CDC is composed of several components:
• The CDC Replication Engine is the main component; it reads from the DB2 logs and applies the
corresponding updates to the shadow tables.
• The CDC Access Server is the component that manages various aspects of data replication, such
as setting up datastores, replication subscriptions, and so on.
• Both the Replication Engine and the Access Server are required for data replication and these
components can be installed on the same server as DB2.
• The CDC Management Console is the graphical interface to Access Server. This is an optional
component that can be used to manage the data replication from a desktop/laptop.
In the visual you can see that the row-organized tables and column-organized shadow tables are
defined in the same DB2 database. The changes logged for the row-organized tables are read by
the Capture agent of CDC and then used by the Apply agent of CDC to update the
column-organized shadow tables. The CDC software is running on the same server as the DB2
database.


Important

To support shadow tables, you must install a supported version of the following
InfoSphere CDC software components:
• InfoSphere CDC for DB2 for LUW Version 10.2.1 Interim Fix 12 or later releases.
• InfoSphere CDC Access Server Version 10.2.1 Interim Fix 5 or later releases.
• InfoSphere CDC Management Console Version 10.2.1 Interim Fix 5 or later releases.


Instructor notes:
Purpose — To discuss the use of the Infosphere CDC software to replicate the contents of the
row-organized tables used by OLTP applications to the column-organized shadow tables in the
same DB2 database. The processing of CDC is asynchronous from the OLTP transaction
processing, but the latency between the changes in the two sets of tables can be a matter of
seconds.
Details —
Additional information —
Transition statement — Next we will discuss more about the asynchronous processing performed
by CDC for shadow tables.


Asynchronous Maintenance of Shadow Tables

• Minimal impact to OLTP transactions via asynchronous maintenance
– Capture engine scrapes DB2 logs for deltas
– Apply engine consolidates updates to shadow tables
– Leverages DB2 Cancun Release index scan driven updates
• Shadow tables benefit from larger transaction size
– CDC system parameter (default to 5 seconds – good for most workloads)
acceptable_latency_in_seconds_for_column_organized_tables
– Allow buffering of apply operations to shadow tables
– More effective synopsis tables
– Take note to keep this CDC parameter below DB2 REFRESH AGE special
register
• Latency communication to DB2
– CDC system parameter maintain_replication_mqt_latency_table=true
– Populates DB2 SYSTOOLS.REPL_MQT_LATENCY table


Figure 5-20. Asynchronous Maintenance of Shadow Tables CL4636.0

Notes:
CDC maintenance of the shadow tables is an asynchronous process and hence has minimal impact
on OLTP transactions. As mentioned previously, with shadow tables you may decide to drop any
existing extra indexes that were created to speed up analytic queries, which may offset any
overhead that the CDC replication has on the OLTP transactions.
CDC replication involves a capture engine that scrapes the DB2 logs for update deltas and feeds
these deltas to the apply engine to maintain the shadow tables. The apply to the shadow tables is
further optimized to leverage the enhanced index scan driven updates for column-organized tables
with DB2 10.5 fix pack 4.
Normally, CDC will apply deltas to the target tables as quickly as it can. However, for shadow tables
and column-organized tables in general, it is advantageous to apply deltas to the target in chunks to
increase the size of each apply transaction. CDC offers a system parameter,
acceptable_latency_in_seconds_for_column_organized_tables, to accomplish this by delaying
the apply to the target tables by up to a certain number of seconds. The default is 5 seconds, which
should be good enough for most workloads.


Note that this delay will affect the replication latency and hence should be set to a value that is
under the latency limit that your analytic applications can tolerate, as specified by the REFRESH
AGE special register; otherwise DB2 may not route the analytic queries to the shadow tables.
As mentioned previously, CDC communicates the latency information to DB2 via the
SYSTOOLS.REPL_MQT_LATENCY table. A CDC system parameter,
maintain_replication_mqt_latency_table, needs to be set to TRUE for CDC to populate this table
with the latency information. A sketch of these settings follows.
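As a minimal sketch of these settings — the CDC instance name CDCDB is hypothetical, and the use of
the dmset command to change CDC system parameters is an assumption to verify against your CDC
release:

-- Allow CDC to buffer apply operations for up to 5 seconds (the default)
dmset -I CDCDB acceptable_latency_in_seconds_for_column_organized_tables=5

-- Have CDC report its latency to DB2 through SYSTOOLS.REPL_MQT_LATENCY
dmset -I CDCDB maintain_replication_mqt_latency_table=true

-- The latency information CDC records can then be examined in DB2
db2 "SELECT * FROM SYSTOOLS.REPL_MQT_LATENCY"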


Instructor notes:


Purpose — To discuss how CDC applies changes in the row-organized tables to the shadow tables.
There are several CDC options that were implemented for shadow table support, including one that
specifies that the DB2 latency table needs to be used to communicate latency, so the DB2 optimizer
can decide if the shadow table data meets the application limit for latency based on REFRESH AGE.
Details —
Additional information —
Transition statement — Next we will discuss some important Infosphere CDC terms and concepts
that you need to know to implement shadow tables.


Important Infosphere CDC terms and concepts for Shadow Table Implementation – part 1
• CDC Access Server is used to perform management tasks
  – Define user accounts for CDC management
  – Access Server must be started; it uses a tcpip port
    • The dmaccessserver command is used to start Access Server
  – A CDC Datastore object provides user access to a CDC Instance
    • For shadow tables, one datastore serves as both the source and target datastore
• A CDC instance is created to manage CDC processing for a particular DB2 LUW
  database
  – The CDC instance is defined with a tcpip port number for communication
  – A DB2 user name and password is defined for the CDC instance
    • Must be able to access source and shadow tables and run LOAD
    • A user with DBADM authority could be used
  – The DB2 instance must be started in order to start the CDC instance because it connects to the
    DB2 database
  – The dmconfigurets command is used to create the CDC Instance and also to start and stop CDC
    instance processing
• CDC Management Console is a Windows based application that simplifies creation
  and management of CDC objects and monitoring of CDC processing and status
  information


Figure 5-21. Important Infosphere CDC terms and concepts for Shadow Table Implementation – part 1 CL4636.0

Notes:
Shadow tables require the following InfoSphere CDC software components:
• InfoSphere CDC for DB2 for LUW
This software is the replication engine for DB2 for Linux, UNIX, and Windows.
The replication engine reads the transaction logs and captures all the DML
operations for the source row-organized table. Then, the apply agent applies
these changes to the shadow table.
A CDC instance will be created for each DB2 database that will use CDC to replicate the
changes to shadow tables. The command dmconfigurets can be used to create, edit, start
and stop a CDC instance.
A DB2 user name is defined to establish a connection to the DB2 database associated with
the CDC instance. In order to start the CDC instance the DB2 instance will need to be
started.
• InfoSphere CDC Access Server


This software is a server application that directs communications between the
replication engine processes and the InfoSphere CDC Management Console or
the command line processor (CHCCLP).
You will use access server to define CDC user profiles.
Access server is also used to create a CDC object called a datastore that is used by a CDC
user to work with a CDC Instance.
• InfoSphere CDC Management Console
This software is an administration application that you can use to configure and
monitor replication for shadow tables. This GUI interface runs on only Windows
operating systems. It includes an event log and a monitoring tool.

Information

A datastore is an abstraction that represents an InfoSphere CDC instance. It
holds information about the database and data files that are required for
replication. InfoSphere CDC Management Console and the CHCCLP
command-line interface interact with the database by connecting to only a
datastore. While general InfoSphere CDC environments contain source and
target datastores, shadow tables require only one datastore because the
source and target are the same database.
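As a sketch, on UNIX-like systems the Access Server can be started from the command line; running
it in the background with nohup is an assumption about how you would keep it running, not a
documented requirement:

-- Start the CDC Access Server (it listens on its own tcpip port)
nohup dmaccessserver &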


Instructor notes:
Purpose — To provide a basic description of the CDC components used to implement and manage
shadow tables for a DB2 database.
Details —
Additional information —
Transition statement — Next we will describe CDC subscriptions and table mappings used for
DB2 shadow tables.


Important Infosphere CDC terms and concepts for Shadow Table Implementation – part 2
• Subscription: Replication connection between source and target
datastores
– Logical unit for start/stop mirroring, monitoring, latency communication
– For Shadow tables, for each DB2 database, create a single subscription object
containing all shadow tables

• Table Mapping: Defines how to replicate source table to target table
– Row-organized table as source table
– Shadow table as target table
– Use standard replication
– Use mirror replication method
– Shadow table primary key index as target key


Figure 5-22. Important Infosphere CDC terms and concepts for Shadow Table Implementation – part 2 CL4636.0

Notes:
Subscriptions
A subscription is a container for table mappings. It logically links source and target datastores
and contains various parameters that determine the replication behavior.
For shadow tables, you must create a single subscription that replicates all shadow tables in
a database. Also, mark the subscription as persistent, which allows for better fault tolerance in
situations where replication is disrupted.
Table mappings
Table mappings contain information on how individual tables (or columns of tables) are
replicated from the source to the target datastores.
For shadow tables, choose standard replication with a one-to-one table mapping between a
row-organized (source) table and the shadow (target) table. For the target table key, specify the
unique index corresponding to the primary key of the shadow table to provide a one-to-one
table mapping and performance improvements.
Before you add, modify, or delete table mappings that belong to a subscription, you must end
replication.
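For completeness, subscriptions and table mappings can also be scripted through the CHCCLP
command-line interface instead of Management Console. The following sketch is illustrative only:
the host, port, user, datastore name SHADOW_DS, and subscription name SHADOW_SUB are hypothetical,
and the exact CHCCLP keywords should be verified against the CDC documentation for your release:

-- Illustrative CHCCLP session (names and keyword spellings are assumptions)
connect server hostname localhost port 10101 username cdcadmin password <password>;
connect datastore name SHADOW_DS context source;
connect datastore name SHADOW_DS context target;
add subscription name SHADOW_SUB;
add table mapping sourceSchema ROWORG sourceTable HISTORY
    targetSchema COLORG targetTable HIST_SHAD type standard;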


Instructor notes:
Purpose — To describe the CDC terms subscription and table mapping that are used to manage
shadow tables.
Details —
Additional information —
Transition statement — Next we will describe a task list that you will use to implement CDC
replication for shadow tables in a DB2 LUW database.


Task list to implement Infosphere CDC for DB2 LUW to support Shadow Tables
• Install CDC Access Server and CDC for DB2 LUW
• Create a CDC User to manage CDC using CDC Access Server
• Create a CDC Instance associated with the DB2 Database where the
Row-organized source tables and Shadow Tables will be located
• Create a Datastore associated with the CDC Instance
• Add a Datastore connection for the CDC User
• Create a CDC Subscription that will be used to manage the replication
for all of the shadow tables in the DB2 database
• Create table mappings for each row-organized table that is referenced
in the definition of a shadow table
– Note there can only be one Shadow table defined for a source Row-organized
table
• Start Mirroring for the CDC Subscription


Figure 5-23. Task list to implement Infosphere CDC for DB2 LUW to support Shadow Tables CL4636.0

Notes:
The visual includes a list of tasks that you will perform to implement InfoSphere CDC with a DB2
database to maintain a set of shadow tables.
The CDC software uses a user profile that contains a user name, the password, and the role for the
user, which sets limits on what the CDC user can perform.
When you create the CDC instance, you define the DB2 database user and password that DB2 will
use to authorize the work performed by CDC in that database, including access to tables and
running the LOAD utility; a sketch of granting this authority follows.
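For example, a dedicated DB2 user for CDC could simply be granted DBADM authority, which covers
both table access and running LOAD. The user name cdcuser and database name SHADOWDB below are
hypothetical:

db2 "CONNECT TO SHADOWDB"
db2 "GRANT DBADM ON DATABASE TO USER cdcuser"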
We will be showing some examples of creating the CDC objects we need to implement shadow
tables.


Instructor notes:
Purpose — To discuss some of the CDC related tasks required to implement shadow tables in a
DB2 database. The sequence of the tasks shows what needs to be completed before the next task
can be started.
Details —
Additional information —
Transition statement — Next we will see how a CDC instance can be created.


Using dmconfigurets to create a CDC Instance associated with the DB2 source database

(Screenshot: dmconfigurets instance configuration prompts. Callout: this tcpip port number is used to communicate with CDC for this one DB2 database; it is different from the DB2 instance port used for database connections.)


Figure 5-24. Using dmconfigurets to create a CDC Instance associated with the DB2 source database CL4636.0

Notes:
The dmconfigurets command can be used to create a CDC instance, edit its settings, and also to
start and stop the CDC instance.
The definition of the CDC instance includes:
• The maximum amount of disk space for the InfoSphere CDC staging store on your DB2 server.
The default value is 100 GB.
• The amount of physically available RAM that you want to allocate for this InfoSphere CDC
instance. By default, the configuration tool allocates 512 MB of RAM for each 32-bit instance and
1024 MB of RAM for each 64-bit instance. A value of 3600 MB works well in most environments.
• A tcpip port that will be used to communicate with the CDC instance.
• A local DB2 instance and one DB2 database are selected.
• A DB2 user and password to associate with the CDC Instance connection to the database. This
user needs to be authorized to access the tables and perform LOAD processing. A user with
DBADM authority could be used.


• A Refresh loader path is defined. This local disk path will be used to store files that CDC
generates for the LOAD processing when shadow tables are refreshed. The DB2 user for the
CDC instance needs to have permission to access this disk path.
• A DB2 database schema is specified that will be used for a set of tables that CDC manages in
the DB2 database.


Instructor notes:


Purpose — To show an example of the options selected to create a CDC Instance. The same tool
is used to start, stop, and delete an instance.
Details —
Additional information —
Transition statement — Next we will look at an example of a definition for a CDC datastore.


Using the CDC Management Console to define a Datastore linked to the CDC Instance

(Screenshot: defining the new datastore. Note: this is the same tcpip port number defined for the CDC Instance.)


Figure 5-25. Using the CDC Management Console to define a Datastore linked to the CDC Instance CL4636.0

Notes:
The visual shows an example of a new CDC datastore being created.
The datastore is assigned a name. The input includes a tcpip host name and port number that
associate the datastore with a specific CDC instance. This links the datastore to one DB2
database.


Instructor notes:


Purpose — To show how the CDC Management Console application can be used to create a new
datastore linked to the DB2 database where the shadow tables are defined.
Details —
Additional information —
Transition statement — Next we will see how a CDC defined user profile will be associated with a
datastore.


Using the CDC management Console to assign a user connection for the datastore

(Screenshot: assigning the datastore connection. Note: CDC will use this user/password for its connection to the DB2 database.)


Figure 5-26. Using the CDC management Console to assign a user connection for the datastore CL4636.0

Notes:
A datastore connection needs to be defined to allow the CDC user to manage the CDC subscription
and table mappings for a DB2 database.
The visual shows how the CDC Management Console can be used to define and manage datastore
connections. These can also be defined using CDC Access Server commands.


Instructor notes:


Purpose — To show how a CDC user profile gets linked to a datastore connection using the CDC
Management Console application.
Details —
Additional information —
Transition statement — Next we will see how to create the CDC Subscription for a DB2 database
with Shadow tables.


Create a CDC Subscription to manage Table Mappings

(Screenshot: creating the subscription. Note: for shadow tables the row-organized source and the column-organized target are in the same DB2 database, so a single datastore is used.)

Figure 5-27. Create a CDC Subscription to manage Table Mappings CL4636.0

Notes:
Before you can start replicating data, you need to add a subscription. A CDC subscription is a group
of table mappings. A subscription provides a single point of control for common operations such as
start mirroring, stop mirroring, or start refresh for a set of tables that must be maintained at the
same time.
Mirroring is the process of continuous replication of changed data from the source system to the
target system, whereas refresh is the process that synchronizes the target table with the current
contents of the source table.
We will only define a single CDC subscription to contain all of the table mappings from the
row-organized tables to the shadow tables. Since all of the tables are in one DB2 database, a single
CDC datastore will be specified as the source and target.


Instructor notes:


Purpose — To show how a CDC Subscription can be created using the CDC management
console.
Details —
Additional information —
Transition statement — Next we will step through the options to define one table mapping.


Define a table mapping from the Row-organized source to the Column-organized target – Step 1


Figure 5-28. Define a table mapping from the Row-organized source to the Column-organized target – Step 1 CL4636.0

Notes:
Next we will need to add table mappings for each of the shadow tables in the DB2 database.
The first prompt is to select a mapping type. We can select Custom table mapping for mapping
mode with a standard mapping type.


Instructor notes:


Purpose — To begin to define a table mapping for the CDC subscription.
Details —
Additional information —
Transition statement — Next we will select the source and target tables for one table mapping.


Define a table mapping from the Row-organized source to the Column-organized target – Step 2

Select the source table and target table for replication


Figure 5-29. Define a table mapping from the Row-organized source to the Column-organized target – Step 2 CL4636.0

Notes:
Using the CDC Management Console, the source row-organized table and the target
column-organized shadow table can be selected from lists of table objects in the DB2 database
associated with the CDC datastore.


Instructor notes:


Purpose — To show the next step in creating a table mapping, the selection of the source and
target tables.
Details —
Additional information —
Transition statement — Next we will see how a key is defined to allow CDC to replicate changes
from the source to the target shadow table.


Define a table mapping from the Row-organized source to the Column-organized target – Step 3

• The primary key on the shadow table is required to provide a one-to-one
  mapping for each row in the source table to the corresponding row in the
  shadow table.

• The primary key on the shadow table must match the enforced primary key
  constraint or a unique constraint on the source table in both column list
  and key sequence.

• This primary key also facilitates maintenance of the shadow table.


Figure 5-30. Define a table mapping from the Row-organized source to the Column-organized target – Step 3 CL4636.0

Notes:
The next step in defining a table mapping is to define the key columns that CDC will use to replicate
changes from the source table to the target table.
Shadow tables are required to be defined with a primary key, so we can tell CDC to use the primary
key index to apply any changes from the source row-organized table to the shadow table.
DB2 enhanced the use of unique key and primary key indexes to perform single row update and
delete processing for column-organized tables, so CDC will benefit from that functionality; an
example of matching key definitions follows.
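For example, the matching key definitions on the source table and shadow table might look as
follows; the TRANS_ID column and the constraint names are hypothetical:

-- Enforced primary key on the row-organized source table
ALTER TABLE ROWORG.HISTORY
   ADD CONSTRAINT HIST_PK PRIMARY KEY (TRANS_ID);

-- Matching primary key on the shadow table: same column list and key sequence
ALTER TABLE COLORG.HIST_SHAD
   ADD CONSTRAINT HIST_SHAD_PK PRIMARY KEY (TRANS_ID);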


Instructor notes:


Purpose — To discuss the definition of the key columns for the target shadow table to allow CDC to
apply changes.
Details —
Additional information —
Transition statement — Next we will see how the replication method option is selected for the
table mapping.


Define a table mapping from the Row-organized source to the Column-organized target – Step 4

Choose the Mirror option for Replication

Review options selected for Replication


Figure 5-31. Define a table mapping from the Row-organized source to the Column-organized target – Step 4 CL4636.0

Notes:
For shadow tables, the CDC table mapping replication method will be set to Mirror. We want the
changes in the row-organized tables to be continuously applied to the shadow table copies.
The Management Console will show the options selected for the table mapping and ask if additional
table mappings need to be defined, or if the subscription definition is complete.
You will define a table mapping for each shadow table that you created with the CREATE TABLE
statement defined as MAINTAINED BY REPLICATION.


Instructor notes:


Purpose — To show an example of the selection of a replication method for shadow tables, which
will use the mirroring method for CDC rather than the refresh method.
Details —
Additional information —
Transition statement — Next we will see how to start CDC mirroring for the newly defined
subscription.


Start Mirroring for the Shadow Tables using the CDC Subscription

(Screenshot note: the Subscription uses the Datastore for source and target.)

(Screenshot note: CDC understands that one shadow table has not been loaded with data and needs to be refreshed before standard replication can be started.)


Figure 5-32. Start Mirroring for the Shadow Tables using the CDC Subscription CL4636.0

Notes:
Once a CDC subscription is created to contain the table mappings from the row-organized tables to
the shadow tables that we defined using CREATE TABLE statements with the MAINTAINED BY
REPLICATION option, we need to instruct CDC to start the mirroring operation.
We can use the CDC Management Console to select the subscription object, right-click to show
the command options, and then select Start Mirroring. We will choose Continuous as the mirroring
option.
The Management Console application will show how many of the table mappings require a refresh
operation before normal mirroring of changes can be started. If we are defining a new set of table
mappings, CDC can use LOAD processing to copy the source table data to the shadow tables.
CDC tracks its position of processing in the DB2 logs to make sure all changes to the shadow
tables are completed.
If we stop CDC mirroring for the subscription for a period of time, CDC will remember its log position
and read the logs produced since mirroring was stopped to catch up to the current table changes.


If we need to add new shadow tables, the CDC subscription will need to be updated with additional
table mappings. Those changes will require stopping mirroring and restarting mirroring once the
changes to the subscription are completed.


Instructor notes:
Purpose — To show an example of starting mirroring for a CDC subscription that contains the
shadow tables using the CDC Management Console application.
Details —
Additional information —
Transition statement — Next we will see how mirroring is stopped using Management Console.


End Replication using the CDC Subscription in order to define additional table mappings

(Screenshot note: the Subscription uses the Datastore for source and target.)


Figure 5-33. End Replication using the CDC Subscription in order to define additional table mappings CL4636.0

Notes:
The visual shows an example of using the CDC Management Console to End Replication for a
CDC Subscription.
You might need to stop replication to make changes to the CDC subscription, like adding or deleting
a table mapping.


Instructor notes:
Purpose — To show an example of using the CDC Management Console to stop replication for a
subscription.
Details —
Additional information —
Transition statement — Next we will look at using the CDC Management Console to monitor CDC
processing and status for shadow tables.


Use the Management Console to monitor activity for the CDC subscription

(Screenshot note: the Subscription uses the Datastore for source and target.)


Figure 5-34. Use the Management Console to monitor activity for the CDC subscription CL4636.0

Notes:
You can use the IBM InfoSphere CDC Management Console to monitor your InfoSphere CDC
subscription. This tool supports the monitoring of replication operations and latencies, the collection
of various statistics about source and target datastores, and event detection and notification.


Instructor notes:
Purpose — To show how the CDC Management Console can be used to monitor the status of the
CDC Subscription for the DB2 shadow tables.
Details —
Additional information —
Transition statement — Next we will look at using tools like Data Studio to create shadow tables.


Using Data Studio to create Shadow Tables or User Maintained MQT tables

• Data Studio can be used to generate the DB2 DDL statements to create
Shadow Tables and Column-organized MQT tables
• Data Studio does not create the necessary Infosphere CDC table
mappings or subscriptions

Figure 5-35. Using Data Studio to create Shadow Tables or User Maintained MQT tables CL4636.0

Notes:
The IBM Data Studio tool was updated to support the creation of Shadow Tables and
Column-organized User Maintained MQT Tables. The tool understands the options that are
supported and generates the DDL statements to create the MQT tables.
The generated DDL for a shadow table includes the CREATE TABLE statement for the shadow
table. It also generates the SET INTEGRITY statement and an ALTER TABLE statement to define
the required primary key constraint; a sketch of this DDL appears after these notes.
The Data Studio tool does not generate or maintain the CDC objects like subscriptions and table
mappings. You can use the CDC Management Console to create and manage the CDC objects
related to shadow tables.
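As a sketch of the generated DDL — the COLORG.HIST_SHAD and ROWORG.HISTORY names match the
explain examples later in this unit, while the column list is hypothetical:

-- Shadow table: a column-organized, replication-maintained MQT
CREATE TABLE COLORG.HIST_SHAD AS
   (SELECT TRANS_ID, BRANCH_ID, BALANCE FROM ROWORG.HISTORY)
   DATA INITIALLY DEFERRED REFRESH DEFERRED
   ENABLE QUERY OPTIMIZATION
   MAINTAINED BY REPLICATION
   ORGANIZE BY COLUMN;

-- Take the shadow table out of set integrity pending state
SET INTEGRITY FOR COLORG.HIST_SHAD ALL IMMEDIATE UNCHECKED;

-- Required primary key, matching an enforced key on the source table
ALTER TABLE COLORG.HIST_SHAD
   ADD CONSTRAINT HIST_SHAD_PK PRIMARY KEY (TRANS_ID);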


Instructor notes:
Purpose — To show how IBM Data Studio could be used to define new shadow tables or User
Maintained MQT tables.
Details —
Additional information —
Transition statement — Next we will see how the DB2 explain tool can be used to verify the use of
a shadow table for a particular SQL statement.


Infosphere Query Workload Tuner support for Shadow tables and Column-organized MQT tables
• InfoSphere Optim Query Workload Tuner, Version 4.1.1
contains the following enhancements and new features
– Allows column-organized MQTs to be defined on column-organized
tables
• When connected to a DB2 10.5.4 database, the Workload Table
Organization Advisor can convert row-organized, user-maintained,
deferred-refresh MQTs to column organization when the tables on which
an MQT is defined are column organized
– Allows column-organized copies (or replicas) of row-organized tables
in the form of shadow tables (where subsets of columns can be
replicated)
• When you are connected to a DB2 10.5.4 database, the Workload Table
Organization Advisor can recommend shadow tables for row-organized
tables.


Figure 5-36. Infosphere Query Workload Tuner support for Shadow tables and Column-organized MQT tables CL4636.0

Notes:
The tool Infosphere Query Workload Tuner has been enhanced to recommend column-organized
user maintained MQT tables and shadow tables to improve a query workload when the DB2
database associated with the workload is running DB2 10.5 Fix Pack 4 or higher.
This would be useful where a workload being analyzed uses row-organized tables with MQT
tables. The Query Workload Tuner could suggest converting the base tables to column organization
and converting the row-organized MQT tables to user maintained column-organized MQT tables.
Note that the DB2 Design Advisor tool, db2advis, ignores column-organized tables when a workload
is analyzed.


Instructor notes:
Purpose — To discuss using the Infosphere Query Workload Tuner to analyze SQL workloads to
determine if shadow tables or user maintained MQT tables would improve workload performance.
Details —
Additional information —
Transition statement — Next we will see how the DB2 explain tool can be used to see if a shadow
table will be used in an access plan.


Checking usage of a Shadow Table in the access plan for a query
set current degree 'ANY';
set current maintained table types for optimization REPLICATION;
set current refresh age 500;

Original Statement:
------------------
SELECT
  HISTORY.BRANCH_ID,
  sum(HISTORY.balance) as br_balance,
  count(*) as br_trans
FROM ROWORG.HISTORY AS HISTORY
WHERE HISTORY.BRANCH_ID between 10 and 20
GROUP BY HISTORY.BRANCH_ID
ORDER BY HISTORY.BRANCH_ID ASC

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details: EXP0148W The
following MQT or statistical view was
considered in query matching:
"COLORG "."HIST_SHAD".
Diagnostic Identifier: 2
Diagnostic Details: EXP0149W The
following MQT was used (from those
considered) in query matching:
"COLORG "."HIST_SHAD".

Access Plan:
         0
        SORT
        ( 3)
       7.05284
         1
         |
         0
        CTQ
        ( 4)
       7.05195
         1
         |
         0
       GRPBY
        ( 5)
       7.05015
         1
         |
         0
       TBSCAN
        ( 6)
       7.04846
         1
         |
         0
  CO-TABLE: COLORG
     HIST_SHAD
        Q1

Figure 5-37. Checking usage of a Shadow Table in the access plan for a query CL4636.0

Notes:
The slide contains a series of SET CURRENT statements that an application could use to allow the
DB2 optimizer to utilize a shadow table for an access plan.
The db2exfmt explain report shows that the original SQL statement references the row-organized
table ROWORG.HISTORY, while the access plan graph shows access to the shadow table,
COLORG.HIST_SHAD.
The Diagnostic section of the explain report contains messages indicating that the shadow table was
considered and selected for use in the access plan; a sketch of capturing such a report follows.
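A minimal sketch of capturing such a report with the DB2 explain facility — assuming the explain
tables already exist, and using a hypothetical database name SHADOWDB. With EXPLAIN MODE set to
EXPLAIN, the statement is compiled and captured but not executed:

db2 "CONNECT TO SHADOWDB"
db2 "SET CURRENT DEGREE 'ANY'"
db2 "SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION REPLICATION"
db2 "SET CURRENT REFRESH AGE 500"
db2 "SET CURRENT EXPLAIN MODE EXPLAIN"
db2 "SELECT BRANCH_ID, SUM(BALANCE) AS BR_BALANCE, COUNT(*) AS BR_TRANS FROM ROWORG.HISTORY WHERE BRANCH_ID BETWEEN 10 AND 20 GROUP BY BRANCH_ID"
db2 "SET CURRENT EXPLAIN MODE NO"
db2exfmt -d SHADOWDB -1 -o shadow_plan.txt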


Instructor notes:
Purpose — To look at a DB2 explain report showing a shadow table being used to perform a
SELECT statement that references the row-organized table. The SET CURRENT statements were
necessary to meet the conditions for the DB2 optimizer to consider use of the shadow table.
Details —
Additional information —
Transition statement — Next we will look at an explain report for a two table join.


Using Shadow tables to process queries that join multiple Row-organized tables
• What if a query references a row-organized table that has a shadow table defined
  and joins it with a row-organized table that does not have a shadow table?

SELECT BR.branch_name, sum(HISTORY.balance) as br_balance,
       count(*) as br_trans
FROM ROWORG.HISTORY AS HISTORY, ROWORG.BRANCH as BR
WHERE HISTORY.BRANCH_ID between 10 and 20 and
      HISTORY.BRANCH_ID = BR.BRANCH_ID
GROUP BY br.BRANCH_NAME
ORDER BY br.BRANCH_NAME ASC

Restriction for SQL Optimization: Shadow tables can only be used
for joins if all of the tables have shadow tables defined.

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details: EXP0076W No materialized query table matching was performed on
the statement during query rewrite because there is a replication-maintained
materialized query table defined on at least one, but not every, table referenced in the
query.


Figure 5-38. Using Shadow tables to process queries that join multiple Row-organized tables CL4636.0

Notes:
With standard row-organized MQT tables, the MQT might be selected to perform a portion of the
query processing, and tables referenced in the SQL statement that are not included in the MQT
definition can be joined with the MQT in the access plan.
The sample SQL query shown references two tables: ROWORG.HISTORY, which has an associated
column-organized shadow table defined, and ROWORG.BRANCH, a small row-organized table that
does not have a matching shadow table.
The explain report for this SQL statement contains a warning message in the diagnostic section
describing the restriction: a replication-maintained MQT (a shadow table) was detected during
access plan generation, but because the SQL statement referenced a table that did not have a
shadow table defined, MQT matching was bypassed and the referenced row-organized tables
were used.


Instructor notes:
Purpose — To show how the explain report notes the rule for use of shadow tables when joining
tables: either all of the joined tables have shadow tables defined, or none of the shadow tables can
be used.
Details —
Additional information —
Transition statement — Next we will see that the DB2 optimizer will use the shadow tables for a
join if all joined tables have shadow tables defined.


Access plan example for joining two row-organized tables with access routed to two shadow tables
Access Plan:
-----------
Total Cost: 405.401
Query Degree: 1

                 Rows
                RETURN
                (   1)
                 Cost
                  I/O
                   |
                   1
                TBSCAN
                (   2)
                405.401
                  89
                   |
                   1
                 SORT
                (   3)
                 405.4
                  89
                   |
                   1
                  CTQ
                (   4)
                405.399
                  89
                   |
                   1
                 GRPBY
                (   5)
                405.397
                  89
                   |
                53995.1
                ^HSJOIN
                (   6)
                403.253
                  89
             /-------+-------\
         300408              11
         TBSCAN            TBSCAN
         (   7)            (   8)
         287.267           112.765
           73                16
            |                 |
         490864              100
   CO-TABLE: COLORG    CO-TABLE: COLORG
       HIST_SHAD          BRANCH_SHAD
          Q2                  Q1


Figure 5-39. Access plan example for joining two row-organized tables with access routed to two shadow tables CL4636.0

Notes:
The visual shows that the access plan for the same SQL statement referenced on the previous slide
can utilize two shadow tables if they are defined and meet the latency criteria set by the application.


The diagnostic section of the explain report would look similar to the following:
Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "COLORG "."BRANCH_SHAD".
Diagnostic Identifier: 2
Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "COLORG "."HIST_SHAD".
Diagnostic Identifier: 3
Diagnostic Details: EXP0149W The following MQT was used (from those
considered) in query matching: "COLORG "."BRANCH_SHAD".
Diagnostic Identifier: 4
Diagnostic Details: EXP0149W The following MQT was used (from those
considered) in query matching: "COLORG "."HIST_SHAD".


Instructor notes:


Purpose — To show that a join query can utilize shadow tables if all of the tables being joined also
have shadow tables.
Details —
Additional information —
Transition statement — Next we will compare shadow tables and column-organized user
maintained MQT tables.


Comparison of Column-organized User Maintained MQT tables to Shadow Tables
                             Shadow Tables                     Column-organized User
                                                               Maintained MQT

MQT Table organization       Column-organized                  Column-organized

Source Table organization    Row-organized                     Column-organized

Table References in          Only one table reference          One or more tables referenced
MQT definition               allowed                           in AS SELECT clause

MQT definition               MAINTAINED BY REPLICATION         MAINTAINED BY USER

GROUP BY allowed in          No                                Yes
MQT definition

REFRESH AGE setting          ANY or value indicating a limit   ANY
options                      for latency of CDC Replication

Method of loading and        Infosphere CDC                    Manual LOAD or SQL
maintaining data in MQT      Manual LOAD or SQL

Figure 5-40. Comparison of Column-organized User Maintained MQT tables to Shadow Tables CL4636.0

Notes:
The table compares some of the characteristics for the two types of column-organized MQT tables.
One key difference is that a shadow table is always based on a single row-organized table. A user
maintained column-organized MQT table can reference one or more column-organized tables.
A shadow table MQT cannot contain a GROUP BY clause in the AS SELECT section of the MQT
definition.
Latency-based routing of a query can only be performed using replication-maintained shadow
tables. User maintained column-organized MQT tables must be accessed with CURRENT
REFRESH AGE set to ANY; an illustrative definition follows these notes.
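For contrast, here is an illustrative user-maintained column-organized MQT; the COLORG.HISTORY base
table (assumed to be column-organized) and the column names are hypothetical. Note the
MAINTAINED BY USER clause, the GROUP BY that a shadow table cannot have, and the manual
population:

CREATE TABLE COLORG.HIST_SUM AS
   (SELECT BRANCH_ID, SUM(BALANCE) AS BR_BALANCE, COUNT(*) AS BR_TRANS
      FROM COLORG.HISTORY
     GROUP BY BRANCH_ID)
   DATA INITIALLY DEFERRED REFRESH DEFERRED
   ENABLE QUERY OPTIMIZATION
   MAINTAINED BY USER
   ORGANIZE BY COLUMN;

SET INTEGRITY FOR COLORG.HIST_SUM ALL IMMEDIATE UNCHECKED;

-- User maintained: populate and refresh manually with SQL or LOAD
INSERT INTO COLORG.HIST_SUM
   SELECT BRANCH_ID, SUM(BALANCE), COUNT(*)
     FROM COLORG.HISTORY
    GROUP BY BRANCH_ID;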


Instructor notes:


Purpose — To review and clarify the differences between shadow tables and the column-organized
MQT tables. Since shadow tables cannot contain a GROUP BY clause, a DBA might try to
implement a user maintained column-organized MQT to handle a summary query based on a
row-organized table. The problem is that you cannot reference a row-organized table in a user
maintained column-organized MQT definition.
Details —
Additional information —
Transition statement — Next we will discuss using a LOAD utility to populate a shadow table
rather than using CDC to refresh the table contents.


Refresh a Shadow Table using LOAD outside of CDC without concurrent IUDs
• CDC uses LOAD (with fixed options) to refresh shadow tables
• How to control the LOAD options for refresh?
– Can refresh a shadow table by directly running LOAD into it
– After LOAD, need to “mark table capture point” on the row-organized table
to indicate to CDC that the shadow table is in sync with the row-organized table
at the current log position. Available as a command and through the Management Console
– User needs to guarantee that there are no IUDs on the row-organized table
during this process, as CDC will not apply any IUDs, prior to the capture
point, to the shadow table
– CDC will start replicating IUDs, after the capture point, to the shadow table

-- Use LOAD to refresh the shadow table outside of CDC
db2 "LOAD FROM trade.del OF DEL
     REPLACE INTO TRADE_SHADOW NONRECOVERABLE";
db2 "SET INTEGRITY FOR TRADE_SHADOW
     ALL IMMEDIATE UNCHECKED";

-- CDC command to mark the capture point: inform CDC that
-- TRADE_SHADOW is in sync with TRADE
dmmarktablecapturepoint -I <cdc_instance>
     -s <subscription> -t <schema>.TRADE;


Figure 5-41. Refresh a Shadow Table using LOAD outside of CDC without concurrent IUDs CL4636.0

Notes:
When CDC refreshes a shadow table, it uses the DB2 LOAD command with a fixed set of options.
It is possible to directly LOAD into a shadow table outside of CDC, allowing the possibility to tailor
the LOAD options. When directly running LOAD on the shadow table, it is important to use either
NONRECOVERABLE or COPY YES to avoid putting the tablespace in backup pending.
After the LOAD into the shadow table, you need to perform a “mark table capture point” on the
row-organized table to inform CDC that the shadow table is in sync with the row-organized table at
the current log position.
Note that CDC will not apply any inserts, updates, or deletes that occur prior to this capture point to
the shadow table. This implies that it is up to the user to guarantee that there are no IUDs on the
row-organized table during this processing.
Once the mark capture point is performed, CDC will start to replicate any IUDs after the capture
point to the shadow table when the subscription is started.


Instructor notes:


Purpose — To discuss using a manual LOAD for shadow tables. It is necessary to inform CDC that
the shadow table has been loaded, so it can begin mirroring changes from the DB2 logs.
Details —
Additional information —
Transition statement — Next we will discuss how shadow tables could be used in a database
using HADR.


Using Shadow Tables in a database with HADR Primary and Standby databases in use


Figure 5-42. Using Shadow Tables in a database with HADR Primary and Standby databases in use CL4636.0

Notes:
The use of shadow tables is supported for high availability disaster recovery (HADR) environments.
However, there are a number of considerations for ensuring that latency-based routing is available
in the event of a failover. For an HADR setup, you install and configure InfoSphere CDC on both a
primary server and on the standby server. The InfoSphere CDC instance is active on the
primary server and passive on the standby. HADR replicates data from the primary to the standby,
including both source and shadow tables.
With InfoSphere CDC, there are two types of metadata that are used to synchronize the source and
target tables. The operational metadata, which is information such as the instance signature and
bookmarks, is stored in the database and is replicated by HADR to the standby servers.
The configuration metadata, which is information such as subscriptions and mapped tables, is
stored in the cdc-installation-dir, so it is not replicated by HADR. Therefore, after you implement
shadow tables in an HADR environment, any configuration changes that are made to the shadow
tables on the primary are not reflected on the standby server.
You propagate configuration metadata changes by employing a scheduled InfoSphere CDC
metadata backup, taken with the dmbackupmd command, that is copied over to the standby servers.


You might also want to apply those changes to the standby servers immediately after they take
place; for example, after applying any DDL statements that change the source row-organized
table, the target shadow table, or both.
After a failover or role switch occurs and the HADR primary role has moved to a host where
InfoSphere CDC was previously passive, you must manually start CDC on the new primary node. If
the configuration metadata is not synchronized, you have to reapply any changes that occurred on
the old primary server.
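A minimal sketch of the metadata backup step follows; the CDC instance name CDCDB is hypothetical,
and the backup location and copy mechanism are site-specific:

-- On the current primary: back up the CDC configuration metadata
dmbackupmd -I CDCDB

-- Then copy the backup files (written under the CDC installation
-- directory for the instance) to the standby server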


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —



Unit summary
Having completed this unit, you should be able to:
• Implement Shadow tables for selected row-organized tables to
improve analytics query performance
• Configure a DB2 database that supports a mixture of application
processing, including OLTP and Analytics query processing with
Shadow tables
• Create the Infosphere CDC Datastore, Subscription and Table
mappings required to support Shadow tables
• Implement a User Maintained MQT for a column-organized table
• Describe the various Special Register settings, like REFRESH
AGE and CURRENT MAINTAINED TABLE TYPES FOR
OPTIMIZATION for using Shadow tables
• Utilize explain reports to verify use of Shadow Tables in access
plans


Figure 5-43. Unit summary CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —



Student exercise


Figure 5-44. Student exercise CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —


Unit 6. Using Optimizer Profiles to control Access Plans

Estimated time
2:30

What this unit is about


This unit describes the use of optimization profiles to directly control
the access plans selected by the DB2 Optimizer. We will begin with a
general discussion of the structure and options available for creating
optimization profiles. This will help developers or administrators that
need to implement optimization profiles to support specific application
performance goals. The lecture includes a series of sample
optimization profiles that will be used to demonstrate some of the
guidelines that can be included in an optimization profile. Sample
Explain tool reports will be used to better understand the effect these
optimization profiles have on the access plans generated to execute
the application SQL statements. We will see how optimization profiles
can impact how tables are accessed, the join methods used and the
selection of Materialized Query Tables in the access plans for SQL
statements.

What you should be able to do


After completing this unit, you should be able to:
• Create an optimizer profile to control access plans generated for
SQL statements
• Define Global guidelines in an optimizer profile that impact all SQL
statements for an application
• Select a specific index to access a table for one SQL query in an
optimizer profile
• Specify the join methods used to join tables using an optimizer
profile
• Describe the techniques to refer to tables within a profile when the
application utilizes views rather than direct table names
• Use the DB2 Explain tool to verify the access plan created based
on an optimizer profile and to resolve any profile format problems


• Create an optimizer profile to control use of Materialized Query Tables in the generated access plan



Unit objectives
After completing this unit, you should be able to:
• Create an optimizer profile to control access plans generated for SQL
statements
• Define Global guidelines in an optimizer profile that impact all SQL
statements for an application
• Select a specific index to access a table for one SQL query in an
optimizer profile
• Specify the join methods used to join tables using an optimizer profile
• Describe the techniques to refer to tables within a profile when the
application utilizes views rather than direct table names
• Use the DB2 Explain tool to verify the access plan created based on an
optimizer profile and to resolve any profile format problems
• Create an optimizer profile to control use of Materialized Query Tables in
the generated access plan


Figure 6-1. Unit objectives CL4636.0

Notes:
These are the objectives for this unit.


Instructor notes:
Purpose — Describe the objectives for this lecture unit.
Details —
Additional information —
Transition statement —



Optimizer Profiles Concepts


Figure 6-2. Optimizer Profiles Concepts CL4636.0

Notes:
We will start by discussing the concepts involved in creating and using optimization
profiles.


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement — First, we will compare using an optimizer profile with other
methods used to influence the access plan for processing SQL statements.



Optimizer Profiles: Overview


• There are many ways to influence the access plan generated by the DB2 Optimizer:
– New or more detailed RUNSTATS
– Setting the query optimization class – DFT_QUERYOPT
– Create a Statistical View
– Manually updating the table or index statistics
– Setting Tablespace Overhead and Transferrate values
– Adjust the memory amounts for buffer pools and sort heap
– Set Query Compiler Registry Variables like DB2_REDUCED_OPTIMIZATION

• Optimizer profiles offer a method to control specific parts of the access plan:
– How a table is accessed; table scan versus index usage
– Which Index is used to access a table
– The join method used for table joins; Hash Join versus Nested Loop
– Which MQT tables are considered for rerouting table access
– Some controls can apply to all SQL statements (Global)
– Some controls can be limited to a single SQL statement text

• Optimizer profiles are defined and can be adjusted outside of an application


Figure 6-3. Optimizer Profiles: Overview CL4636.0

Notes:
The DB2 Optimizer is designed to select an access strategy that provides good
performance and minimizes the system resources required to process database requests.
There are times when the access plan selected by the optimizer might not perform well and
the application developer or system administrator might decide to alter the access plan.
There are many ways that can be used to influence the access plan selected for database
requests.
• If the table or index statistics are missing, out of date, or lacking some important details,
the RUNSTATS utility can be used to collect new statistics that can provide the
optimizer with better information and will often correct an inefficient access plan (see
the example commands after this list).
• The DB2 Optimizer supports setting a level of optimization that controls the amount of
time spent checking different access options. Changing the optimization class might
result in a different access plan being selected that might result in better performance
for the application. A default optimization class can be set for the database using the
DFT_QUERYOPT configuration option.


• The collection of catalog statistics based on a statistical view can in some cases provide
the DB2 Optimizer with accurate information about the result from a query that might
not be apparent from the standard table and index statistics. These view-based
statistics might help generate a more efficient access plan.
• DB2 permits the manual updating of table or index statistics, which could influence the
optimizer to select a different access strategy for processing a database request. This
method usually involves a trial and error approach that could take a significant amount
of time to complete. Manually altering table or index statistics might produce an efficient
access plan for one type of database request but cause other application requests that
access these tables to perform less efficiently.
• The table space values for the OVERHEAD and TRANSFERRATE are used to estimate
I/O performance. In some cases, setting different values for these characteristics could
influence the optimizer to alter the access plan. When the optimizer is comparing the
expected performance of performing a table scan compared to access through an
index, these performance characteristics play a key role. In general, setting these to
match the performance of the current disk system should provide the optimizer with an
accurate view of the database server system.
• The optimizer considers the size of database buffer pools and the amount of memory
available for sort operations. The memory available for a sort operation is based on the
configuration option sortheap. Changing these memory allocations might generate a
better access plan for an application. The self tuning memory management options
might automatically adjust buffer pool and sort memory sizes, which could affect access
plans for requests running on the database server.
• There is a group of DB2 query compiler-related registry variables that can affect the
processing performed by the DB2 Optimizer, like DB2_REDUCED_OPTIMIZATION.
These can affect the amount of time spent generating an access plan and might alter
the access plan selected for some SQL statements.
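As a concrete sketch of the first few methods in this list (the table, view, and database names
are illustrative, not from the course labs), the commands might look like this:

-- Collect fresh, detailed statistics for a table and its indexes
RUNSTATS ON TABLE TPCD.PARTS WITH DISTRIBUTION AND DETAILED INDEXES ALL
-- Raise the default query optimization class for the database
UPDATE DB CFG FOR TPCDDB USING DFT_QUERYOPT 7
-- Enable an existing view as a statistical view and collect statistics on it
ALTER VIEW TPCD.SV_PARTSALES ENABLE QUERY OPTIMIZATION
RUNSTATS ON TABLE TPCD.SV_PARTSALES WITH DISTRIBUTION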
The support for optimization profiles was formalized with DB2 9, to provide more direct
control over the access plan selected during SQL compilation. An optimization profile is an
XML document that contains optimization guidelines for one or more data manipulation
language (DML) statements. The optimizer profile provides a method to impact the
compilation of access plans and can be used to request adjustments to the default access
strategy generated by the SQL compiler.
An optimizer profile can be defined to specify the following:
• The method used to access a table. For example, you might want to request an index
scan instead of a non-indexed table scan.
• A particular index can be selected to access a table.
• The method used to join tables can be specified. For example, you might decide a Hash
Join technique would perform better than the Nested Loop join selected by the
optimizer.


• A list of Materialized Query Tables, or MQTs, can be specified to control which MQTs
the optimizer should consider to access in place of the tables referenced by a query.
Some of the guidelines listed in an optimizer profile can apply to any SQL statement
compiled when the profile is in effect. Some guidelines, like the selection of a particular
index to use in an access plan, apply to a specific SQL statement defined in the profile.


Instructor notes:
Purpose — To review the methods that can be used to influence the access plan
generated for SQL statements. It is important to note that these methods would normally be
used to correct a performance problem before an optimizer profile would be considered
necessary. You want to emphasize that an optimizer profile can be used to directly control a
portion of the access plan, once you determine that a change to the default access plan is
necessary to achieve the necessary performance results.
Details —
Additional information — Although optimization profiles were implemented into the DB2
LUW product prior to DB2 9, the support was formalized and documented for general use
beginning with DB2 9.1 and there is a trend to add additional types of guidelines with new
DB2 releases.
Transition statement — Next we will look at the anatomy of an optimizer profile.



Optimization profiles: Anatomy


• XML document:
– Elements and attributes understood as explicit optimization guidelines
– Composed and validated with Current Optimization Profile Schema (COPS):
• Options for the current release can be viewed in the Information Center
• Also contained in a file SQLLIB/MISC/DB2OptProfile.xsd

– Profile Header (exactly one):
• Metadata and processing directives
• For example, COPS version

– Global optimization guidelines (at most one):
• Apply to all statements for which profile is in effect
• For example, eligible MQTs guideline defining MQTs to be considered for routing

– Statement-level optimization guidelines (zero or more):
• Apply to a specific statement when the profile is in effect
• Specifies aspects of desired execution plan


Figure 6-4. Optimization profiles: Anatomy CL4636.0

Notes:
An optimization profile is an XML document that can contain optimization guidelines for one
or more SQL statements. The correspondence between each SQL statement and its
associated optimization guidelines is established using the SQL text and other information
that is needed to unambiguously identify an SQL statement.
The valid optimization profile contents for a given DB2 release is described by an XML
schema that is known as the current optimization profile schema (COPS). An optimization
profile applies only to DB2 Database for Linux, UNIX, and Windows servers. The COPS
can be displayed using the Information Center and a copy can also be found in
DB2OptProfile.xsd, which is located in the misc subdirectory of the sqllib directory.
An optimization profile can contain global guidelines, which apply to all data manipulation
language (DML) statements that are executed while the profile is in effect, and it can also
contain specific guidelines that apply to individual DML statements in a package.
For example:


You could write a global optimization guideline requesting that the optimizer refer to the
materialized query tables (MQTs) Test.SumSales and Test.AvgSales whenever a
statement is processed while the current optimization profile is active.
You could write a statement-level optimization guideline requesting that the I_SUPPKEY
index be used to access the SUPPLIERS table whenever the optimizer encounters the
specified statement.
An optimization profile contains two major sections where you can specify these two types
of guidelines: a global optimization guidelines section can contain one OPTGUIDELINES
element, and a statement profile section can contain any number of STMTPROFILE
elements. An optimization profile must also contain an OPTPROFILE element, which
includes metadata and processing directives.
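Putting these pieces together, the overall shape of a profile document is simply the following
structural sketch (the ellipses are placeholders; a complete working example appears below):

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
<!-- Global section: at most one OPTGUIDELINES element -->
<OPTGUIDELINES> ... </OPTGUIDELINES>
<!-- Statement section: zero or more STMTPROFILE elements -->
<STMTPROFILE ID="...">
<STMTKEY> ... </STMTKEY>
<OPTGUIDELINES> ... </OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>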


Instructor notes:


Purpose — To discuss the contents of the XML formatted document that defines an
optimizer profile. Some students that have not worked with XML formatted documents
might be concerned that it will be difficult to create a working optimizer profile with a limited
knowledge of XML. This lecture is intended to show that simple working examples will
make it relatively easy to create optimizer profiles when they are needed to quickly address
a specific performance requirement.
Details —
Additional information — The approach taken with this lecture is to provide many
working examples of optimization profiles to help students prepare to implement an
optimization profile for their applications when necessary. The COPS is a dictionary that
defines what the valid optimization guidelines are for the current release of DB2. It can be
used to determine which elements and attributes to put into the XML document that defines
an optimization profile. We are not going to try and teach students how to use the COPS
document in this lecture. This lecture will present a series of examples that can be used to
implement the most common types of guidelines and will reduce but not eliminate the need
to reference the detailed format information contained in the COPS.
Transition statement — Next we will look at some sample information in the optimization
profile schema.


Optimization profile schema contents example


List of Access Requests
*************************************************************************************
<!-- Choices for access request elements. -->
<!-- TBSCAN - table scan access request element -->
<!-- IXSCAN - index scan access request element -->
<!-- LPREFETCH - list prefetch access request element -->
<!-- IXAND - index ANDing access request element -->
<!-- IXOR - index ORing access request element -->
<!-- XISCAN - xml index access request element -->
<!-- XANDOR - XANDOR access request element -->
<!-- ACCESS - indicates the optimizer should choose the access method for the table -->
<!--************************************************************************************* -->
<xs:group name="accessRequest">
<xs:choice>
<xs:element name="TBSCAN" type="tableScanType"/>
<xs:element name="IXSCAN" type="indexScanType"/>
<xs:element name="LPREFETCH" type="listPrefetchType"/>
<xs:element name="IXAND" type="indexAndingType"/>
<xs:element name="IXOR" type="indexOringType"/>
<xs:element name="XISCAN" type="indexScanType"/>
<xs:element name="XANDOR" type="XANDORType"/>
<xs:element name="ACCESS" type="anyAccessType"/>
</xs:choice>
</xs:group>


Figure 6-5. Optimization profile schema contents example List of Access Requests CL4636.0

Notes:
The visual shows a small section from the file DB2OptProfile.xsd, showing a list of access
request elements that can be defined in an optimization profile.
For example, the element name “IXSCAN” can be used to define indexed access to a table,
while a different element “XISCAN” is used to define access using an XML index.


Instructor notes:


Purpose — To see a short example of the text in the file DB2OptProfile.xsd, showing the
valid access request types that the optimizer recognizes in an optimization profile.
Details —
Additional information —
Transition statement — Next we will look at an example of an XML document that defines
an optimizer profile.


Sample optimization profile


<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0">
<!--
Global optimization guidelines section.
Optional but at most one.
-->
<OPTGUIDELINES>
<MQT NAME="Test.AvgSales"/>
<MQT NAME="Test.SumSales"/>
</OPTGUIDELINES>
<!--
Statement profile section.
Zero or more.
-->
<STMTPROFILE ID="Guidelines for TPCD Q9">
<STMTKEY SCHEMA="TPCD">
<![CDATA[SELECT S.S_NAME, S.S_ADDRESS, S.S_PHONE,
S.S_COMMENT FROM PARTS P, SUPPLIERS S, PARTSUPP PS
WHERE P_PARTKEY = PS.PS_PARTKEY AND S.S_SUPPKEY = PS.PS_SUPPKEY AND P.P_SIZE = 39
AND P.P_TYPE = 'BRASS' AND S.S_NATION = 'MOROCCO' AND S.S_NATION IN ('MOROCCO', 'SPAIN')
AND PS.PS_SUPPLYCOST = (SELECT MIN(PS1.PS_SUPPLYCOST) FROM PARTSUPP PS1, SUPPLIERS S1
WHERE P.P_PARTKEY = PS1.PS_PARTKEY AND S1.S_SUPPKEY = PS1.PS_SUPPKEY AND
S1.S_NATION = S.S_NATION)]]>
</STMTKEY>
<OPTGUIDELINES>
<IXSCAN TABLE="S1" INDEX="I_SUPPKEY"/>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Figure 6-6. Sample optimization profile CL4636.0

Notes:
The visual shows an example of a valid optimization profile for DB2 Version 10.5,
containing a global optimization guidelines section and a statement profile section with one
STMTPROFILE element.
<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">

<!--
Global optimization guidelines section.
Optional but at most one.
-->
<OPTGUIDELINES>
<MQT NAME="Test.AvgSales"/>
<MQT NAME="Test.SumSales"/>
</OPTGUIDELINES>

<!--
Statement profile section.
Zero or more.
-->
<STMTPROFILE ID="Guidelines for TPCD Q9">
<STMTKEY SCHEMA="TPCD">
<![CDATA[SELECT S.S_NAME, S.S_ADDRESS, S.S_PHONE,
S.S_COMMENT FROM PARTS P, SUPPLIERS S, PARTSUPP PS
WHERE P_PARTKEY = PS.PS_PARTKEY AND S.S_SUPPKEY = PS.PS_SUPPKEY
AND P.P_SIZE = 39 AND P.P_TYPE = 'BRASS'
AND S.S_NATION = 'MOROCCO' AND S.S_NATION IN ('MOROCCO', 'SPAIN')
AND PS.PS_SUPPLYCOST = (SELECT MIN(PS1.PS_SUPPLYCOST)
FROM PARTSUPP PS1, SUPPLIERS S1
WHERE P.P_PARTKEY = PS1.PS_PARTKEY AND S1.S_SUPPKEY = PS1.PS_SUPPKEY
AND S1.S_NATION = S.S_NATION)]]>
</STMTKEY>
<OPTGUIDELINES>
<IXSCAN TABLE="S1" INDEX="I_SUPPKEY"/>
</OPTGUIDELINES>
</STMTPROFILE>

</OPTPROFILE>
• The OPTPROFILE element
An optimization profile begins with the OPTPROFILE element. In the preceding
example, this element consists of a VERSION attribute specifying that the optimization
profile version is 10.5.
• The global optimization guidelines section
Global optimization guidelines apply to all statements for which the optimization profile
is in effect. The global optimization guidelines section is represented by the global
OPTGUIDELINES element. In the preceding example, this section contains a single
global optimization guideline specifying that the MQTs Test.AvgSales and
Test.SumSales should be considered when processing any statements for which the
optimization profile is in effect.
• The statement profile section
A statement profile defines optimization guidelines that apply to a specific statement.
There can be zero or more statement profiles in an optimization profile. The statement
profile section is represented by the STMTPROFILE element. In the preceding
example, this section contains guidelines for a specific statement for which the
optimization profile is in effect.
Each statement profile contains a statement key and statement-level optimization
guidelines, represented by the STMTKEY and OPTGUIDELINES elements,
respectively.
The statement key identifies the statement to which the statement-level optimization
guidelines apply. In this example, the STMTKEY element contains the original
statement text and other information that is needed to unambiguously identify the
statement. Using the statement key, the optimizer matches a statement profile with the
appropriate statement. This relationship enables you to provide optimization guidelines
for a statement without having to modify the application.
The statement-level optimization guidelines section of the statement profile is
represented by the OPTGUIDELINES element. This section is made up of one or more
access or join requests, which specify methods for accessing or joining tables in the
statement. After a successful match with the statement key in a statement profile, the
optimizer refers to the associated statement-level optimization guidelines when
optimizing the statement. The example contains one access request, which specifies
that the SUPPLIERS table referenced in the nested subselect use an index named
I_SUPPKEY.


Instructor notes:


Purpose — To show an example of an optimizer profile. Do not try to go into any detailed
examination of this profile. We will look at a series of examples later in the lecture. Point out
the sections within the XML document. The global section guidelines would apply to all
SQL statements. The statement profile section would only apply to the one SQL statement
defined by the statement key.
Details —
Additional information —
Transition statement — Next we will discuss how an optimizer profile can be put into
effect once the XML document is created.


Putting an optimization profile into effect (1 of 2)


• Create the OPT_PROFILE table in the SYSTOOLS schema
– Call the SYSINSTALLOBJECTS procedure:
db2 "call sysinstallobjects('opt_profiles', 'c', '', '')"

– Or issue the CREATE TABLE statement:
create table systools.opt_profile
( schema varchar(128) not null,
name varchar(128) not null,
profile blob (2m) not null,
primary key (schema, name) )

• Compose document, validate, insert into OPT_PROFILE with a qualified name
– Create the XML document in a file: inventory_db.xml
– Assign the profile a unique name: schema.name, "DBA"."INVENTDB"
– Create a file for use by the import utility
File profiledata.del:
"DBA","INVENTDB","inventory_db.xml"
– Run the Import
IMPORT FROM profiledata.del OF DEL MODIFIED BY LOBSINFILE
INSERT_UPDATE INTO SYSTOOLS.OPT_PROFILE

Figure 6-7. Putting an optimization profile into effect (1 of 2) CL4636.0

Notes:
For an optimization profile to be used within a database, the profile must be stored in the
table SYSTOOLS.OPT_PROFILE.
There are two methods to create this table:
1. Call the SYSINSTALLOBJECTS procedure:
db2 "call sysinstallobjects('opt_profiles', 'c', '', '')"
2. Issue the CREATE TABLE statement:
create table systools.opt_profile (
schema varchar(128) not null,
name varchar(128) not null,
profile blob (2m) not null,
primary key (schema, name)
)
The columns in the SYSTOOLS.OPT_PROFILE table are defined as follows:


• SCHEMA: Specifies the schema name for an optimization profile. The name can
include up to 30 alphanumeric or underscore characters, but define it as
VARCHAR(128), as shown.
• NAME: Specifies the base name for an optimization profile. The name can include up to
128 alphanumeric or underscore characters.
• PROFILE: Specifies an XML document that defines the optimization profile.
After an optimization profile has been created and its contents validated against the current
optimization profile schema (COPS), the contents must be associated with a unique
schema-qualified name and stored in the SYSTOOLS.OPT_PROFILE table. You can use
the LOAD, IMPORT, and EXPORT commands to manage the files in that table.
For example, the IMPORT command can be used from any DB2 client to insert or update
data in the SYSTOOLS.OPT_PROFILE table. The EXPORT command can be used to
copy a profile from the SYSTOOLS.OPT_PROFILE table into a file.
The following example shows how to insert three new profiles into the
SYSTOOLS.OPT_PROFILE table. Assume that the files are in the current directory.
Create an input file (for example, profiledata) with the schema, name, and file name for
each profile on a separate line:
"ROBERT","PROF1","ROBERT.PROF1.xml"
"ROBERT","PROF2","ROBERT.PROF2.xml"
"DAVID", "PROF1","DAVID.PROF1.xml"
Execute the IMPORT command:
import from profiledata of del
modified by lobsinfile
insert into systools.opt_profile
To update existing rows, use the INSERT_UPDATE option on the IMPORT command.
Since the schema and name are assigned to the optimization profile as the row is being
added to the table, you should carefully select the schema and name column values so that
it will be easy to uniquely locate the XML document. This will be very important for
performing any problem determination later when the profile is being used.
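For example, a sketch of an EXPORT command that retrieves one profile back out to a file (the
profile name reuses the earlier DBA.INVENTDB example; the file name is illustrative):

-- export one profile row; the XML document is written to a separate LOB file
export to profileout.del of del modified by lobsinfile
select schema, name, profile from systools.opt_profile
where schema = 'DBA' and name = 'INVENTDB'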


Instructor notes:
Purpose — To explain that optimizer profiles must be stored in the table named
SYSTOOLS.OPT_PROFILE to be used by an application. Here we describe how to create
the table and how an optimizer profile can be added or replaced in the table using the
IMPORT utility.
Details —
Additional information —
Transition statement — Next we will discuss how an application can be associated with a
specific optimizer profile.



Putting an optimization profile into effect (2 of 2)


• At the package level, using the optprofile bind option
– For example, to bind the optimization profile "DBA"."INVENTDB" to the package
"inventapp"
db2 prep inventapp.sqc bindfile optprofile DBA.INVENTDB
db2 bind inventapp.bnd

• At the dynamic statement level: using the CURRENT OPTIMIZATION PROFILE special register

EXEC SQL SET CURRENT OPTIMIZATION PROFILE = 'DBA.INVENTDB';

/* The following statements are both optimized with 'DBA.INVENTDB' */
EXEC SQL PREPARE stmt FROM SELECT ... ;
EXEC SQL EXECUTE stmt;
EXEC SQL EXECUTE IMMEDIATE SELECT ... ;

db2 SET CURRENT SCHEMA = 'JON';
db2 SET CURRENT OPTIMIZATION PROFILE = 'SALES';
/* This statement is optimized with 'JON.SALES' */
db2 "SELECT …………… "


Figure 6-8. Putting an optimization profile into effect (2 of 2) CL4636.0

Notes:
There are several methods to specify which optimization profile the optimizer is to use for
an application:
• For applications with embedded SQL, you can use the OPTPROFILE bind option to
specify that an optimization profile is to be used at the package level.
• An application can use the CURRENT OPTIMIZATION PROFILE special register to
specify that an optimization profile is to be used at the statement level. This special
register contains the qualified name of the optimization profile used by statements that
are dynamically prepared for optimization.
• For CLI applications, you can use the CURRENTOPTIMIZATIONPROFILE client
configuration option to set this special register for each connection (see the db2cli.ini
sketch below).
The OPTPROFILE bind option setting also specifies the default optimization profile for the
CURRENT OPTIMIZATION PROFILE special register. The order of precedence for
defaults is as follows:


• The OPTPROFILE bind option applies to all static statements, regardless of any other
settings.
• For dynamic statements, the value of the CURRENT OPTIMIZATION PROFILE special
register is determined by the following, in order of lowest to highest precedence:
- The OPTPROFILE bind option
- The CURRENTOPTIMIZATIONPROFILE client configuration option
- The most recent SET CURRENT OPTIMIZATION PROFILE statement in the
application
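For example, a CLI client could apply a profile to every connection to a data source through a
db2cli.ini entry such as this sketch (the data source name SAMPLEDB is an assumption):

; db2cli.ini: sets the CURRENT OPTIMIZATION PROFILE register for each connection
[SAMPLEDB]
CurrentOptimizationProfile=DBA.INVENTDB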


Instructor notes:


Purpose — To explain the ways an application can invoke an optimizer profile for DB2 to
use in generating the access plans for static or dynamic SQL statements.
Details —
Additional information —
Transition statement — Next we will discuss some of the optimizer guidelines that could
be included in an optimizer profile.


Optimization guidelines
• Access path guidelines:
– Base access request
• Method to access a table, for example, TBSCAN, IXSCAN
– Join request
• Method and sequence for performing a join, for example, HSJOIN, NLJOIN
– XML Access methods:
• XISCAN – Single XML Index scan
• XANDOR – Multiple XML Index access
• Query rewrite guidelines:
– INLIST2JOIN - IN-list to join
– SUBQ2JOIN - Subquery to join
– NOTEX2AJ - NOT EXISTS subquery to anti-join
– NOTIN2AJ - NOT IN subquery to anti-join
• General optimization guidelines:
– REOPT (ONCE/ALWAYS/NONE)
– MQT choices
– QRYOPT – set the query optimization class
– REGISTRY – set SQL compiler Registry Variables
– RTS – adjust Real Time Statistics time limit for new stats

Figure 6-9. Optimization guidelines CL4636.0

Notes:
An optimizer profile can contain guidelines that impact how tables are accessed.
• Access requests can be used to specify how a table should be accessed for a single
defined SQL statement. You can select a table scan or an index based scan. You can
request a particular index, or let the optimizer choose the best available index.
• For SQL statements that join tables, you can select the join method and also join
orders. For example, you might want to use a nested loop join and pick which tables will
be the inner and outer table for join processing.
• For applications using tables with XML indexes, you can create an access request that
would specify the use of one or more XML indexes.
An optimizer profile can contain guidelines that impact the query rewrite phase of the
optimization process.
• An INLIST2JOIN query rewrite request element can be used to enable or disable the
IN-LIST predicate-to-join rewrite transformation.


• The NOTEX2AJ query rewrite request element can be used to enable or disable the
NOT-EXISTS predicate-to-anti-join rewrite transformation.
• The NOTIN2AJ query rewrite request element can be used to enable or disable the
NOT-IN predicate-to-anti-join rewrite transformation.
• The SUBQ2JOIN query rewrite request element can be used to enable or disable the
subquery-to-join rewrite transformation.
General optimization guidelines can be used to impact the SQL compiler for all SQL
statements:
• The REOPT element can be used to override the setting of the REOPT bind option.
• The DEGREE element can override the intra-parallel processing option set during BIND
or set using the CURRENT DEGREE special register.
• The RTS general request element can be used to enable, disable or provide a time
budget for real-time statistics collection.
• The QRYOPT element can be used to select a query optimization class, either at the
global or statement level.
• The REGISTRY element can be used to set options for selected DB2 Registry variables
that impact access plan selection.
• The MQT element specifies that an MQT should be considered for optimizing the
statements. Only specified MQTs will be considered. Multiple MQT elements can be
specified.
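A short, hedged sketch of a global OPTGUIDELINES section that combines several of these
elements (the values and the MQT name are illustrative):

<OPTGUIDELINES>
<REOPT VALUE='ONCE'/> <!-- reoptimize once, using the first set of input values -->
<QRYOPT VALUE='5'/> <!-- query optimization class 5 -->
<DEGREE VALUE='ANY'/> <!-- let DB2 choose the intra-partition parallelism degree -->
<MQT NAME='Test.SumSales'/> <!-- only this MQT is considered for routing -->
</OPTGUIDELINES>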


Instructor notes:
Purpose — To discuss some of the guidelines that can be included in an optimizer profile.
You might want to explain to students that we are only covering the most common types of
guidelines in this lecture, not every possible option.
Details —
Additional information —
Transition statement — Next we will look at an example of an optimizer profile that
contains both global and statement level guidelines.



Optimizer guidelines example


• Example:
SELECT S.S_NAME, S.S_ADDRESS, S.S_PHONE, S.S_COMMENT
FROM "Tpcd".PARTS, "Tpcd".SUPPLIERS S, "Tpcd".PARTSUPP PS
WHERE P_PARTKEY = PS.PS_PARTKEY AND S.S_SUPPKEY = PS.PS_SUPPKEY AND
P_SIZE = 39 AND P_TYPE = 'BRASS' AND
S.S_NATION IN ('MOROCCO', 'SPAIN') AND
PS.PS_SUPPLYCOST =
(SELECT MIN(PS1.PS_SUPPLYCOST)
FROM "Tpcd".PARTSUPP PS1, "Tpcd".SUPPLIERS S1
WHERE "Tpcd".PARTS.P_PARTKEY = PS1.PS_PARTKEY AND
S1.S_SUPPKEY = PS1.PS_SUPPKEY AND
S1.S_NATION = S.S_NATION)
ORDER BY S.S_NAME;

<OPTGUIDELINES>
<NLJOIN>
<IXSCAN TABLE='"Tpcd".Parts'/>
<IXSCAN TABLE="PS"/>
</NLJOIN>
</OPTGUIDELINES>

• Join requests contain 2 elements – inner and outer
• Elements can be base accesses or other join requests


Figure 6-10. Optimizer guidelines example CL4636.0

Notes:
For example, suppose that even after you had updated the database statistics and
performed all other tuning steps, the access plan generated by the optimizer still did not
perform well and you think that using a Nested Loop join between the PARTS and
PARTSUPP tables would perform better than the default plan.

© Copyright IBM Corp. 2005, 2015 Unit 6. Using Optimizer Profiles to control Access Plans 6-29
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

This is the application’s SQL statement text:


SELECT S.S_NAME, S.S_ADDRESS, S.S_PHONE, S.S_COMMENT
FROM "Tpcd".PARTS, "Tpcd".SUPPLIERS S, "Tpcd".PARTSUPP PS
WHERE P_PARTKEY = PS.PS_PARTKEY AND S.S_SUPPKEY = PS.PS_SUPPKEY AND
P_SIZE = 39 AND P_TYPE = 'BRASS' AND
S.S_NATION IN ('MOROCCO', 'SPAIN') AND
PS.PS_SUPPLYCOST =
(SELECT MIN(PS1.PS_SUPPLYCOST)
FROM "Tpcd".PARTSUPP PS1, "Tpcd".SUPPLIERS S1
WHERE "Tpcd".PARTS.P_PARTKEY = PS1.PS_PARTKEY AND
S1.S_SUPPKEY = PS1.PS_SUPPKEY AND
S1.S_NATION = S.S_NATION)
ORDER BY S.S_NAME;
In this case, an explicit optimization guideline can be used to influence the optimizer. For
example:
<OPTGUIDELINES>
<NLJOIN>
<IXSCAN TABLE='"Tpcd".Parts'/>
<IXSCAN TABLE="PS"/>
</NLJOIN>
</OPTGUIDELINES>
Optimization guidelines are specified using a simple XML specification. Each element
within the OPTGUIDELINES element is interpreted as an optimization guideline by the DB2
Optimizer. The NLJOIN element is used to request the Nested loop join method. Two
IXSCAN elements specify that index scans should be used to access the two tables. The
IXSCAN access request elements use TABLE attributes to identify the PARTS and
PARTSUPP tables based on the corresponding exposed names in the original statement.
Since the PARTS table is listed first, it will be used as the outer table for the join, while the
PARTSUPP table will be the inner table. Since no specific index names are specified, the
optimizer might select any available index, based on relative costs.
Each STMTPROFILE element provides a set of optimization guidelines for one application
statement. The targeted statement is identified by the STMTKEY sub-element. The
optimization profile is then given a schema-qualified name and inserted into the database.


Instructor notes:


Purpose — To show an optimizer profile example that requests both the type of access for
two of the tables and specifies the join method to be used. The DB2 Optimizer is able to
select any index based on cost and can perform the other processing for the SQL
statement using the most efficient methods. Explain that Optimization guidelines that
identify exposed or extended names that are not unique within the context of the entire
statement are considered ambiguous and are not applied.
Details —
Additional information —
Transition statement — Next we will discuss in more detail how to define the table
references in an optimizer profile.


Forming table references


• Two methods to reference a table:
– Reference ‘exposed’ name in the original SQL statement:
• Use ‘TABLE’ attribute
• Rules for specifying SQL identifiers apply to ‘TABLE’ attribute
– Reference correlation name in the optimized SQL statement:
• Use ‘TABID’ attribute
• ‘Optimized’ SQL is the semantically equivalent version of the statement after it has
been optimized by Query Rewrite
• Use the Explain facility to get the optimized SQL statement
• Note: There is no guarantee that correlation names in the optimized SQL statement
are stable across new releases
• Table references must refer to a single table or they are ignored
– That is, no ambiguous references
• Unqualified table references are implicitly qualified by the current schema
• If both ‘TABLE’ and ‘TABID’ are specified, they must refer to the same table or they
are ignored


Figure 6-11. Forming table references CL4636.0

Notes:
The term table reference is used to mean any table, view, table expression, or the table
which an alias references in an SQL statement or view definition. An optimization guideline
can identify a table reference using either its exposed name in the original statement or the
unique correlation name that is associated with the table reference in the optimized
statement.
Using exposed names in the original statement to identify table references
A table reference is identified by using the exposed name of the table. The exposed name
is specified in the same way that a table would be qualified in an SQL statement.
The rules for specifying SQL identifiers also apply to the TABLE attribute value of an
optimization guideline. The TABLE attribute value is compared to each exposed name in
the statement. If the TABLE attribute value is schema-qualified, it matches any equivalent
exposed qualified table name. If the TABLE attribute value is unqualified, it matches any
equivalent correlation name or exposed table name. The TABLE attribute value is therefore
considered to be implicitly qualified by the default schema that is in effect for the statement.


These concepts are illustrated by the following example. Assume that the statement is
optimized using the default schema Tpcd.
select s_name, s_address, s_phone, s_comment
from parts, suppliers, partsupp ps
where p_partkey = ps.ps_partkey and
s.s_suppkey = ps.ps_suppkey and
p_size = 39 and
p_type = 'BRASS'
The following are valid TABLE attribute values that identify a table reference in the
statement:
• '"Tpcd".PARTS'
• 'PARTS'
• 'ps'
• 'suppliers'
• 'Parts' (because the identifier is not delimited, it is converted to uppercase characters).
The following TABLE attribute values would fail to identify a table reference in the
statement:
• '"Tpcd2".SUPPLIERS'
• 'PARTSUPP' (not an exposed name)
• 'Tpcd.PARTS' (the identifier Tpcd must be delimited; otherwise, it is converted to
uppercase characters).
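For instance, a statement-level guideline that forces index access to the PARTSUPP table would
use its exposed correlation name; since no index is named, the optimizer picks one by cost:

<OPTGUIDELINES>
<IXSCAN TABLE='PS'/>
</OPTGUIDELINES>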
Identifying table references using correlation names in the optimized statement
An optimization guideline can also identify a table reference using the unique correlation
names that are associated with the table reference in the optimized statement. The
optimized statement is a semantically equivalent version of the original statement, as
determined during the query rewrite phase of optimization. The optimized statement can be
retrieved from the Explain tables. The TABID attribute of an optimization guideline is used
to identify table references in the optimized statement. For example:
The Original statement text is:
SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME, V1.ACCT_ID, V1.BALANCE,
B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID


The following is the Optimized statement:


SELECT Q1.BRANCH_ID AS "BRANCH_ID", Q2.TELLER_NAME AS "TELLER_NAME",
Q1.ACCTNAME AS "ACCTNAME", Q1.ACCT_ID AS "ACCT_ID", Q1.BALANCE AS
"BALANCE", Q3.BRANCH_NAME AS "BRANCH_NAME", Q3.AREA_CODE AS
"AREA_CODE", Q3.BRANCH_ID
FROM INST411.HISTORY AS Q1, INST411.TELLER AS Q2, INST411.BRANCH AS Q3
WHERE (Q1.TELLER_ID = Q2.TELLER_ID) AND (80 <= Q1.BRANCH_ID) AND
(Q1.BRANCH_ID <= 95) AND (Q1.BRANCH_ID = Q3.BRANCH_ID)
ORDER BY Q1.ACCT_ID, Q3.BRANCH_ID
The following guideline could be used:
<OPTGUIDELINES>
<TBSCAN TABID="Q3" />
<LPREFETCH TABID="Q1" INDEX='HISTIX3' />
</OPTGUIDELINES>
This optimization guideline includes two access request elements; the TBSCAN element
specifies that the BRANCH table should be accessed using a table scan. The
LPREFETCH element specifies that the HISTIX3 index be used to access the HISTORY
table using a list prefetch operation. The TABID attributes are using the unique correlation
names that are associated with the table references in the optimized statement.
If a single optimization guideline specifies both the TABLE and TABID attributes, they must
identify the same table reference, or the optimization guideline is ignored.

Note

There is currently no guarantee that correlation names in the optimized statement will be
stable when upgrading to a new release of the DB2 product.


Instructor notes:


Purpose — To discuss how an optimization guideline can refer to a table either using the
exposed name in the original SQL statement or using a correlation name from the
optimized SQL text. The reference to a table needs to be unambiguous.
Details —
Additional information —
Transition statement — Next we will discuss how to create table references when the
SQL text includes views rather than table names.


Table references with Views


• View Example 1
– CREATE VIEW “DBGuy".V1 as (SELECT * FROM EMPLOYEE E
WHERE SALARY > 50,000) ;
– CREATE VIEW DB2USER.V2 AS (SELECT * FROM “DBGuy".V1
WHERE DEPTNO IN (’52’, ’53’,’54’) ;

– SELECT * FROM DB2USER.V2 A WHERE V2.HIRE_DATE > ’01/01/2004’ ;

<OPTGUIDELINES> <IXSCAN TABLE=‘E’/> </OPTGUIDELINES>

• View Example 2
– CREATE VIEW “DBGuy".V1 as (SELECT * FROM EMPLOYEE A
WHERE SALARY > 50,000) ;
– CREATE VIEW DB2USER.V2 AS (SELECT * FROM “DBGuy".V1
WHERE DEPTNO IN (’52’, ’53’,’54’) ;

– SELECT * FROM DB2USER.V2 A WHERE V2.HIRE_DATE > ’01/01/2004’ ;

<OPTGUIDELINES>
<IXSCAN TABLE=’A/“DBGuy".V1/A’ />
</OPTGUIDELINES>


Figure 6-12. Table references with Views CL4636.0

Notes:
Use of views in an SQL statement can complicate the task of creating an optimizer profile
that needs to control access to the tables defined in a view. In the first example shown, the
use of unique exposed names allows the table reference to utilize the exposed name from
a view definition.
The second example shows how an Optimization guideline can use extended syntax to
identify table references that are embedded in views. Here is an example of defining a
guideline with the extended syntax.
The view DBGuy.v1 is created with a reference to the EMPLOYEE table.
create view "DBGuy".v1 as
(select * from employee A where salary > 50000)
Next a view DB2USER.v2 is created with a reference to the view DBGuy.v1.
create view DB2USER.v2 as
(select * from "DBGuy".v1
where deptno in ('52', '53', '54'))


Assume the following SQL statement requires an optimization profile:


select * from DB2USER.v2 A
where A.hire_date > '01/01/2004'
The following guideline could be used in an optimization profile:
<OPTGUIDELINES>
<IXSCAN TABLE='A/"DBGuy".V1/A'/>
</OPTGUIDELINES>
The IXSCAN access request element specifies that an index scan is to be used for the
EMPLOYEE table reference that is embedded in the views DB2USER.V2 and
"DBGuy".V1. The extended syntax for identifying table references in views is a series of
exposed names separated by a slash character. The value of the TABLE attribute
A/"DBGuy".V1/A illustrates the extended syntax. The last exposed name in the sequence
(A) identifies the table reference that is a target of the optimization guideline. The first
exposed name in the sequence (A) identifies the view that is directly referenced in the
original statement. The exposed name or names in the middle ("DBGuy".V1) pertain to the
view references along the path from the direct view reference to the target table reference.
The rules for referring to exposed names from optimization guidelines, described in the
previous section, apply to each step of the extended syntax.
The extended syntax, shown in example 2, is necessary because the exposed name "A"
occurred both in the original SQL text and also in the view definition. Had the exposed
name of the EMPLOYEE table reference in the view been unique with respect to all tables
that are referenced either directly or indirectly by the statement, the extended name syntax
would not be necessary.
Extended syntax can be used to target any table reference in the original statement, SQL
function, or trigger.


Instructor notes:
Purpose — To show how a table reference can be defined in an optimizer profile when the
original SQL statement references a view instead of a table name. The extended syntax,
shown in example 2, is necessary because the exposed name "A" occurred both in the
original SQL text and also in the view definition.
Details —
Additional information —
Transition statement — Next we will discuss how you can verify that the DB2 Optimizer
utilized an optimization profile to generate the access plan.



Verify statement matching and profile usage


db2 set current optimization profile = PROFILE3
db2 set current explain mode explain
db2 "SELECT H1.BRANCH_ID, H1.ACCTNAME, ………"
db2 set current explain mode no
db2exfmt -1 -d DBX -o expout.txt

The optimization profile (INST411.PROFILE3):
<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="9.7.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
…………..
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 3 History Query 2 - Index Scan ">
<STMTKEY SCHEMA="INST411">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
………
]]>
</STMTKEY>
<OPTGUIDELINES>
<IXSCAN TABLE="H1" INDEX="HISTIX3" />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>

The report in expout.txt:
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST411.PROFILE3
STMTPROF: (Statement Profile Name)
Profile 3 History Query 2 - Index Scan

Extended Diagnostic Information:
--------------------------------
(Check this section for any messages indicating profile errors)

Figure 6-13. Verify statement matching and profile usage CL4636.0

Notes:
The db2exfmt Explain tool provides a simple method to verify that an optimizer profile is
being utilized by the DB2 Optimizer and to review the generated access plan.
For example, a DBA using a user name of INST411 creates a new optimizer profile and
stores it into the SYSTOOL.OPT_PROFILE table using the name PROFILE3 and a
schema of INST411. The profile contains a statement level guideline. The following
sequence of commands shows how to test the profile usage:
db2 set current optimization profile = PROFILE3
db2 set current explain mode explain
db2 "SELECT H1.BRANCH_ID, H1.ACCTNAME, ………"
db2 set current explain mode no
db2exfmt -1 -d DBX -o expout.txt
The SQL SELECT statement would need to match the SQL text listed in the optimizer
profile.


The report generated by the db2exfmt command would include information similar to the
following:
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST411.PROFILE3
STMTPROF: (Statement Profile Name)
Profile 3 History Query 2 - Index Scan
This shows that the profile INST411.PROFILE3 was utilized by the DB2 Optimizer. It also
shows that the SQL statement being compiled matched a statement profile in the optimizer
profile with the ID attribute of ‘Profile 3 History Query 2 - Index Scan’. If there were any
problems encountered using the optimizer profile, there could be messages included in the
Extended Diagnostic Information section of the report.
The DBA could review the access plan and other cost information in the Explain report to
determine if the modified access plan would likely perform well. The db2batch facility could
be used to execute the SQL statement with the optimizer profile and collect actual
performance statistics.
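For instance, a sketch of such a run (the SQL file name is hypothetical, and the file would
need to include the SET CURRENT OPTIMIZATION PROFILE statement ahead of the query):

db2batch -d DBX -f history_query2.sql -i complete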


Instructor notes:


Purpose — To discuss using the db2exfmt report to verify the format and effect of using an
optimizer profile for an SQL statement. Students might ask about using the db2expln
Explain tool. This tool could be used to look at the access plan generated when an
optimizer profile has been used but the report does not contain any specific additional
information about the profile.
Details —
Additional information —
Transition statement — Next we will discuss how an optimizer profile could be modified
and put into effect while a database server is active.


Optimization profile supports setting registry variables


• You can set many registry variables
at the global level or, for specific
statements, at the statement level
– DB2_ANTIJOIN
– DB2_EXTENDED_OPTIMIZATION
– DB2_INLIST_TO_NLJN
– DB2_MINIMIZE_LISTPREFETCH
– DB2_NEW_CORR_SQ_FF
– DB2_OPT_MAX_TEMP_SIZE
– DB2_REDUCED_OPTIMIZATION
– DB2_RESOLVE_CALL_CONFLICT
– DB2_SELECTIVITY
– DB2_SELUDI_COMM_BUFFER
– DB2_SORT_AFTER_TQ


Figure 6-14. Optimization profile supports registry variables and inexact matching CL4636.0

Notes:
Optimization profiles can have different registry variable values applied to a specific query
statement or to many query statements used in an application.
Setting registry variables in an optimization profile can increase the flexibility you have in
using different query statements for different applications. When you use the db2set
command to set registry variables, the registry variable values are applied to the entire
instance. In optimization profiles, the registry variable values apply only to the statements
specified in the optimization profile. By setting registry variables in an optimization profile,
you can tailor specific statements for applications without worrying about the registry
variable settings of other query statements.
Only a subset of registry variables can be set in an optimization profile. The supported
registry variables are:
- DB2_ANTIJOIN
- DB2_EXTENDED_OPTIMIZATION (Only the ON, OFF, and IXOR values are
supported)


- DB2_INLIST_TO_NLJN
- DB2_MINIMIZE_LISTPREFETCH
- DB2_NEW_CORR_SQ_FF
- DB2_OPT_MAX_TEMP_SIZE
- DB2_REDUCED_OPTIMIZATION
- DB2_RESOLVE_CALL_CONFLICT
- DB2_SELECTIVITY
- DB2_SELUDI_COMM_BUFFER
- DB2_SORT_AFTER_TQ
Registry variables can be set at both the global level and the statement level. If a registry
variable is set at the global level, its setting applies to all the statements in the
optimization profile. If a registry variable is set at the statement level, the setting applies
only to that specific statement. If the same registry variable is set at both the global and
statement levels, the registry variable value at the statement level takes precedence.
Syntax for setting registry variables
Each registry variable is defined and set in an OPTION XML element, with a NAME and a
VALUE attribute, all of which are nested in a REGISTRY element.
For example:
<REGISTRY>
<OPTION NAME='DB2_SELECTIVITY' VALUE='YES'/>
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='NO'/>
</REGISTRY>
To have OPTION elements apply to all statements in the application that uses this profile,
include the REGISTRY and OPTION elements in the global OPTGUIDELINES element.
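For example, here is a minimal sketch of a profile that applies one registry variable globally
and overrides it for a single statement; the statement text is illustrative only:

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<REGISTRY>
<!-- Global: applies to every statement compiled with this profile -->
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='YES'/>
</REGISTRY>
</OPTGUIDELINES>
<STMTPROFILE ID='Registry override example'>
<STMTKEY><![CDATA[select c1 from t1 where c1 > 0]]></STMTKEY>
<OPTGUIDELINES>
<REGISTRY>
<!-- Statement level: takes precedence for this statement only -->
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='NO'/>
</REGISTRY>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>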


Instructor notes:
Purpose — To discuss the ability to set some DB2 registry variables through an
optimization profile.
Details —
Additional information —
Transition statement — Next we will discuss using inexact matching.


Inexact matching in Optimization profiles

• Inexact matching is used for flexible matching between the compiling statements
and the statements within the optimization profile
• You can compile many different statements with different literal values in the
predicate and the statements still match.
For example, the following statements match inexactly but they do not match exactly
select c1 into :hv1 from t1 where c1 > 10
select c1 into :hv2 from t1 where c1 > 20

<STMTPROFILE ID='S1'>
<STMTMATCH EXACT='FALSE'/>
<STMTKEY>
<![CDATA[select t1.c1, count(*) from t1,t2
where t1.c1 = t2.c1 and t1.c1 > 0]]>
</STMTKEY>
<OPTGUIDELINES>
<NLJOIN>
<TBSCAN TABLE='T1'/>
<TBSCAN TABLE='T2'/>
</NLJOIN>
</OPTGUIDELINES>
</STMTPROFILE>


Figure 6-15. Inexact matching in Optimization profiles CL4636.0

Notes:
During compilation, if there is an active optimization profile, the compiling statements are
matched either exactly or inexactly with the statements in the optimization profile.
Inexact matching is used for flexible matching between the compiling statements and the
statements within the optimization profile. Inexact matching ignores literals, host variables,
and parameter markers when matching the compiling statement to the optimization profile
statements. Therefore, you can compile many different statements with different literal
values in the predicate and the statements still match.
For example, the following statements match inexactly but they do not match exactly:
select c1 into :hv1 from t1 where c1 > 10
select c1 into :hv2 from t1 where c1 > 20
Inexact matching is applied to both SQL and XQuery statements. However, string literals
that are passed as function parameters representing SQL or XQuery statements or
statement fragments, including individual column names, are not inexactly matched.

© Copyright IBM Corp. 2005, 2015 Unit 6. Using Optimizer Profiles to control Access Plans 6-45
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

XML functions such as XMLQUERY, XMLTABLE, and XMLEXISTS that are used in an
SQL statement are exactly matched. String literals could contain the following items:
- A whole statement with SQL embedded inside XQuery, or XQuery embedded inside an
SQL statement
- An identifier, such as a column name
- An XML expression that contains a search path
Syntax for specifying matching
Within the optimization profile, you can set either exact or inexact matching at either the
global level or at the statement level. The XML element STMTMATCH can be used to
set the matching method.
The STMTMATCH element has an EXACT attribute which can be set to either TRUE or
FALSE. If you specify the value TRUE, exact matching is enabled. If you specify the
value FALSE, inexact matching is enabled. If you do not specify this element or if you
specify only the STMTMATCH element without the EXACT attribute, exact matching is
enabled by default.
To have a matching method apply to all statements within the optimization profile, place
the STMTMATCH element at the global level just after the top OPTPROFILE element.
To have a matching method apply to a specific statement within the optimization profile,
place the STMTMATCH element just after the STMTPROFILE element.
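As a sketch, the two placements look like this; the statement text is illustrative only:

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
<!-- Global: inexact matching for every statement in this profile -->
<STMTMATCH EXACT='FALSE'/>
<STMTPROFILE ID='S1'>
<!-- Statement level: this statement reverts to exact matching -->
<STMTMATCH EXACT='TRUE'/>
<STMTKEY><![CDATA[select c1 into :hv1 from t1 where c1 > 10]]></STMTKEY>
<OPTGUIDELINES>
<TBSCAN TABLE='T1'/>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>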


Instructor notes:


Purpose — To discuss using inexact matching of statements to optimization profiles.
Details —
Additional information —
Transition statement — Next we will discuss how an optimizer profile could be modified
and put into effect while a database server is active.


Modifying an optimization profile


• Note: Optimization profiles are cached in database memory once they
are referenced by an application
• Edit the optimization profile, applying the necessary changes, and validate the XML.
• Update the SYSTOOLS.OPT_PROFILE table with the new profile. The
INSERT_UPDATE option of IMPORT can be used.
• The previous optimization profile can be flushed from memory using the FLUSH
OPTIMIZATION PROFILE CACHE statement
When you flush the optimization profile cache, any dynamic statements that were
prepared with the old optimization profile are also invalidated.

Example 1: Invalidate one cached optimization profile


SET CURRENT SCHEMA = 'DBA'
FLUSH OPTIMIZATION PROFILE CACHE "INVENTDB"

Example 2: Invalidate all cached optimization profiles


FLUSH OPTIMIZATION PROFILE CACHE ALL
• DELETE and UPDATE triggers could be used to automatically flush the profile
cache. A sample procedure can be found in the DB2 Information Center.


Figure 6-16. Modifying an optimization profile CL4636.0

Notes:
When a new optimizer profile is being developed, you might need to make a series of
changes to the profile to achieve the desired access plan and performance result. It might
also be necessary to adjust the profile to match a change in an SQL statement used by an
application, and you might need to make the change without stopping the DB2 database
server.
You can modify an optimization profile by editing the document, validating it against the
current optimization profile schema (COPS), and replacing the original document in the
SYSTOOLS.OPT_PROFILE table with the new version.
When an optimization profile is referenced, it is compiled and cached in memory in an area
called the optimization profile cache. In order to use the updated version of the optimizer
profile, the cached reference must also be removed from memory. You can use the FLUSH
OPTIMIZATION PROFILE CACHE statement to remove the old profile from the
optimization profile cache and to invalidate any statement in the dynamic plan cache that
was prepared using the old profile (logical invalidation).
To modify an optimization profile:


• Edit the optimization profile, applying the necessary changes, and validate the XML.
• Update the SYSTOOLS.OPT_PROFILE table with the new profile.
• Issue the FLUSH OPTIMIZATION PROFILE CACHE statement to remove any versions
of the optimization profile that might be contained in the optimization profile cache.
When you flush the optimization profile cache, any dynamic statements that were
prepared with the old optimization profile are also invalidated in the dynamic plan
cache. The FLUSH OPTIMIZATION PROFILE CACHE statement can be used to flush
the references to a single optimizer profile or all profiles from the optimization profile
cache.
• Any subsequent reference to the optimization profile causes the optimizer to read the
new profile and to reload it into the optimization profile cache. Also, because of the
logical invalidation of statements that were prepared under the old optimization profile,
any calls made to those statements will be prepared under the new optimization profile
and re-cached in the dynamic plan cache.
The Information Center contains examples of a stored procedure and UPDATE and
DELETE triggers that could be implemented to automatically flush the optimization profile
cache when a profile is changed or deleted.
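As a sketch of the complete update cycle, the commands might look like the following;
profile3.xml and the delimited file profiledata are placeholders. The delimited file lists the
schema, profile name, and XML file name, which is the layout expected when the
SYSTOOLS.OPT_PROFILE table is loaded with the LOBSINFILE modifier:

-- profiledata (hypothetical delimited input file):
-- "INST411","PROFILE3","profile3.xml"

db2 "IMPORT FROM profiledata OF DEL MODIFIED BY LOBSINFILE
     INSERT_UPDATE INTO SYSTOOLS.OPT_PROFILE"
db2 "FLUSH OPTIMIZATION PROFILE CACHE INST411.PROFILE3"

The INSERT_UPDATE option replaces the existing SYSTOOLS.OPT_PROFILE row for
INST411.PROFILE3 if one exists, and the FLUSH statement removes the cached copy so
that the next statement compiled under the profile picks up the new version.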


Instructor notes:
Purpose — To discuss how to implement changes to optimizer profiles and how to use the
FLUSH OPTIMIZATION PROFILE CACHE statement to remove cached versions of a
profile from database memory.
Details —
Additional information —
Transition statement — Next we will step through a series of optimizer profile examples to
see how to define different guidelines to impact the access plans generated by the DB2
Optimizer.



Sample Optimization Profiles




Figure 6-17. Sample Optimization Profiles CL4636.0

Notes:
In this section, we will learn how to implement optimizer profiles using a series of example
profiles. Each time we will examine the Explain tool report to review the revised access
plan.


Instructor notes:
Purpose — Explain to students that we will be looking at a series of optimizer profile
examples to see how the defined profile changes the access plan produced by the DB2
Optimizer. This series of examples can help students implement their own optimizer
profiles based on working samples and reduce the learning curve required to get the
performance results that can be critical to key applications.
Details —
Additional information —
Transition statement — First we will take a look at the test tables that were used to create
the Explain reports with the optimizer profiles.



Sample Tables and Indexes used for examples

HISTORY table: includes columns BRANCH_ID and TELLER_ID
  513K rows, 9176 pages
  Index HISTIX1 (BRANCH_ID)             77% cluster ratio
  Index HISTIX2 (TELLER_ID)             11% cluster ratio
  Index HISTIX3 (BRANCH_ID,TELLER_ID)   10% cluster ratio

TELLER table: includes a unique TELLER_ID column
  1000 rows, 29 pages
  Index TELLINDX (TELLER_ID)            100% cluster ratio

BRANCH table: includes a unique BRANCH_ID column
  100 rows, 4 pages
  Index BRANINDX (BRANCH_ID)            100% cluster ratio


Figure 6-18. Sample Tables and Indexes used for examples CL4636.0

Notes:
In order to create effective optimizer profiles, we need some information about the tables
and indexes that we will be working with.
A set of test tables were used to generate the Explain reports that will be presented. The
fact table for our sample queries is the HISTORY table that contains a transaction history.
Each row in the HISTORY table contains teller and branch keys. The test HISTORY table
has over 500,000 rows. There are three indexes on the HISTORY table.
• The index HISTIX1 is a single column index based on branch numbers, the
BRANCH_ID column.
• The index HISTIX2 is a single column index based on teller numbers, the TELLER_ID
column.
• The index HISTIX3 is a two-column index combining the BRANCH_ID and TELLER_ID
columns into a single index.
The BRANCH table contains information about bank branches and contains 100 rows.
There is a single unique index based on the BRANCH_ID column.


The TELLER table contains information about bank tellers and contains 1000 rows. There
is a single unique index based on the TELLER_ID column.
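To build a similar test environment, a minimal sketch of the index definitions follows. The
table DDL itself is assumed to already exist; the column data types and the other HISTORY
columns (such as ACCT_ID, ACCTNAME, and BALANCE) are defined by the course
database and are not shown here:

CREATE INDEX HISTIX1 ON OPT.HISTORY (BRANCH_ID);
CREATE INDEX HISTIX2 ON OPT.HISTORY (TELLER_ID);
CREATE INDEX HISTIX3 ON OPT.HISTORY (BRANCH_ID, TELLER_ID);
CREATE UNIQUE INDEX TELLINDX ON OPT.TELLER (TELLER_ID);
CREATE UNIQUE INDEX BRANINDX ON OPT.BRANCH (BRANCH_ID);
RUNSTATS ON TABLE OPT.HISTORY AND DETAILED INDEXES ALL;

Current statistics are essential for these examples; the cluster ratios shown above are
what drive the optimizer's choice between direct index access and List Prefetch.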


Instructor notes:


Purpose — To provide some information about the tables that will be used to generate the
sample Explain reports. In order to create optimizer profiles that are effective in improving
performance, students will need to understand the characteristics of the tables and indexes
that are being accessed.
Details —
Additional information —
Transition statement — Next we will look at the first sample query and review the default
access plan selected by the optimizer.


Sample query 1: Default Access plan – Class 5

SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY
WHERE HISTORY.TELLER_ID BETWEEN 10 AND 50
  AND HISTORY.BRANCH_ID BETWEEN 3 AND 20
ORDER BY HISTORY.ACCT_ID ASC

Access plan uses Index HISTIX1 with List Prefetch.

Access Plan:
-----------
Total Cost: 3261
Cumulative I/O Cost: 3548

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

(The full access plan graph, with RETURN, TBSCAN, SORT, and FETCH operators over a
RIDSCN/SORT/IXSCAN of index HISTIX1, appears in the db2exfmt report excerpt that
follows.)

Figure 6-19. Sample query 1: Default Access plan - Class 5 CL4636.0

Notes:
The first sample query accesses the HISTORY table for a defined range of TELLER_ID
and BRANCH_ID column values. The range for TELLER_ID is between 10 and 50. The
range for BRANCH_ID is between 3 and 20. The query text is as follows:
SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME, HISTORY.ACCT_ID,
HISTORY.BALANCE
FROM HISTORY AS HISTORY
WHERE
HISTORY.TELLER_ID BETWEEN 10 AND 50
AND HISTORY.BRANCH_ID BETWEEN 3 AND 20
ORDER BY HISTORY.ACCT_ID ASC
The default optimizer class of 5 was used. With basic table statistics, the optimizer
estimates that 18% of the HISTORY rows would contain BRANCH_ID values between 3
and 20.

6-56 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.1
Instructor Guide

The optimizer selected the HISTIX1 index, based on the BRANCH_ID column, to access
the table. Since the HISTIX1 index is not highly clustered (77% cluster ratio), the optimizer
decided to use List Prefetch to retrieve the estimated 92,443 rows and reduce I/O costs.
The predicate on TELLER_ID is expected to reduce the result to 3,790 rows.
The estimated total cost for the query is 3261 timerons, with an estimated 3548 I/Os required.
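As a cross-check on these cardinality estimates (assuming the uniform data distributions
implied by basic statistics): the BRANCH_ID predicate covers 18 of the 100 branch key
values, and 0.18 × 513,576 ≈ 92,443.7, the row estimate shown for the index scan legs of
the plan; the TELLER_ID predicate covers 41 of the 1,000 teller key values, and
0.041 × 92,443.7 ≈ 3,790.2, matching the 3790.19 row estimate above the FETCH operator.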


Here is a portion of the db2exfmt report generated with the sample query:
DB2 Universal Database Version 10.5, 5622-044 (c) Copyright IBM Corp. 1991, 2012
Licensed Material - Program Property of IBM
IBM DATABASE 2 Explain Table Format Tool

******************** EXPLAIN INSTANCE ********************

DB2_VERSION: 10.05.0
FORMATTED ON DB: MUSICDB
SOURCE_NAME: SQLC2K26
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2013-10-15-16.28.51.899924
EXPLAIN_REQUESTER: INST461

Database Context:
----------------
Parallelism: None
CPU Speed: 1.220223e-07
Comm Speed: 100
Buffer Pool size: 5005
Sort Heap size: 1001
Database Heap size: 4440
Lock List size: 2000
Maximum Lock List: 22
Average Applications: 1
Locks Available: 14080

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM


HISTORY AS H1
WHERE
H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q1.BRANCH_ID AS "BRANCH_ID",
Q1.ACCTNAME AS "ACCTNAME",
Q1.ACCT_ID AS "ACCT_ID",
Q1.BALANCE AS "BALANCE"
FROM
OPT.HISTORY AS Q1
WHERE
(Q1.BRANCH_ID <= 20) AND
(3 <= Q1.BRANCH_ID) AND
(Q1.TELLER_ID <= 50) AND
(10 <= Q1.TELLER_ID)
ORDER BY
Q1.ACCT_ID

Access Plan:
-----------
Total Cost: 3261.81
Query Degree:1
Cumulative Total Cost: 3261.81
Cumulative CPU Cost: 6.82287e+08
Cumulative I/O Cost: 3548.83
Cumulative Re-Total Cost: 59.7699
Cumulative Re-CPU Cost: 4.89828e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 3261.7
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
3790.19
TBSCAN
( 2)
3261.81
3548.83
|
3790.19
SORT
( 3)
3261.7
3548.83
|


3790.19
FETCH
( 4)
3260.72
3548.83
/---+----\
92443.7 513576
RIDSCN TABLE: OPT
( 5) HISTORY
240.682 Q1
166.361
|
92443.7
SORT
( 6)
240.682
166.361
|
92443.7
IXSCAN
( 7)
218.138
166.361
|
513576
INDEX: OPT
HISTIX1
Q1


Instructor notes:


Purpose — The first step in creating an optimizer profile is to review the default access
plan generated by the DB2 Optimizer. We will start with a simple query that accesses one
table. The default access plan in the example should perform well, but we want to see how
we would alter this access plan using an optimizer profile if a performance problem did
occur.
Details —
Additional information —
Transition statement — We will start with a very simple optimizer profile.


Step 1: One Global optimization guideline


Profile 1 – set a global optimization guideline for all statements
to use query optimization class 1

set current optimization profile PROFILE1

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="1" />
</OPTGUIDELINES>
</OPTPROFILE>


Figure 6-20. Step 1: One Global optimization guideline CL4636.0

Notes:
To start as simple as possible, we will start with an optimizer profile that contains a single
global guideline. In this case, our optimization profile sets the query optimization class to a
value of 1. Using a query optimization class of 1 could reduce the time required to generate
access plans since fewer alternative access options are considered.
The global optimizer guideline would apply to all SQL statements compiled with this profile
in effect, regardless of the SQL text. The optimizer profile uses the QRYOPT element to set
the optimization class to a value of 1 as follows:
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="1" />
</OPTGUIDELINES>
</OPTPROFILE>
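To verify that the profile is picked up, the same test sequence shown in Figure 6-13 can be
used; DBX is a placeholder database name:

db2 set current optimization profile = PROFILE1
db2 set current explain mode explain
db2 "SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME, ……… "
db2 set current explain mode no
db2exfmt -1 -d DBX -o profile1_plan.txt

The Profile Information section of the formatted report should name the profile
(INST461.PROFILE1 in the next figure), and the Package Context section should report
Optimization Level: 1.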


Instructor notes:


Purpose — To start with a very simple optimizer profile that contains a single global
optimization guideline and no statement level guidelines.
Details —
Additional information —
Transition statement — Next we will look at a portion of the Explain report generated with
this optimization profile in effect.


Sample query 1: Access plan – Optimization Level 1

SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY
WHERE HISTORY.TELLER_ID BETWEEN 10 AND 50
  AND HISTORY.BRANCH_ID BETWEEN 3 AND 20
ORDER BY HISTORY.ACCT_ID ASC

Optimizer selected a table scan; List Prefetch is not considered.

Access Plan:
-----------
Total Cost: 8272
Cumulative I/O Cost: 9177

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 1
Blocking: Block All Cursors
Isolation Level: Cursor Stability

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
        INST461.PROFILE1

        Rows
       RETURN
       (   1)
        Cost
         I/O
         |
       3790.19
       TBSCAN
       (   2)
       8272.72
        9177
         |
       3790.19
        SORT
       (   3)
       8272.72
        9177
         |
       3790.19
       TBSCAN
       (   4)
       8272.04
        9177
         |
       513576
   TABLE: OPT
     HISTORY
        Q1

Figure 6-21. Sample query 1: Access plan - Optimization Level 1 CL4636.0

Notes:
The same HISTORY table query was compiled using the PROFILE1 optimizer profile that
contains the global guideline setting the optimization class to 1. Looking at the Explain
report, generated using db2exfmt, we can see that our optimizer profile was used and that
the optimization level was 1 instead of the default level of 5.
The access plan changed from the default plan, using a table scan instead of an index to
access the HISTORY table. Note that the estimated cost is more than twice the cost of the
default plan. The estimated I/O cost is also higher: 9177 compared to 3548 for the default plan.
With an optimization class of 1, the DB2 Optimizer will not consider using a List prefetch
operation based on an index. With unclustered indexes, the table scan became the best
alternative method to access the table.
The new access plan might or might not perform better when executed. We can see that
the optimizer estimates that a table scan, which needs to read every page in the table,
would have a higher resource cost than using the index as in the default access plan.


Instructor notes:


Purpose — To look at portions of the Explain report to verify that the optimizer profile was
used during SQL compilation and to see if a different access plan was selected.
Details —
Additional information —
Transition statement — Next we will look at a sample optimizer profile with more than one
global guideline.


Step 2: Multiple Global optimization guidelines


Profile 2 – Set several global optimization guidelines for all
statements
– Use query optimization class 3
– Set Degree for Intra-parallel processing to 4

set current optimization profile PROFILE2

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="3" />
<DEGREE VALUE="4" />
</OPTGUIDELINES>
</OPTPROFILE>

Figure 6-22. Step 2: Multiple Global optimization guidelines CL4636.0

Notes:
The next sample optimization profile shows that several global guidelines can be combined
into a single profile. In this case, we might want to set the query optimization class to 3 and
also set the degree of parallelism to 4 for one application. These guidelines would apply to
all SQL statements.
The optimization profile text is as follows:
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="3" />
<DEGREE VALUE="4" />
</OPTGUIDELINES>
</OPTPROFILE>


Putting these options into an optimization profile would allow various settings to be tested
without making any changes to the application or implementing database configuration
changes that would impact other applications.


Instructor notes:
Purpose — To look at a sample optimization profile that contains more than one global
guideline, but no statement level guidelines.
Details —
Additional information —
Transition statement — Next we will look at the access plan generated using this
optimizer profile.



Sample query 1: Access plan – Profile 2

SELECT HISTORY.BRANCH_ID, HISTORY.ACCTNAME,
       HISTORY.ACCT_ID, HISTORY.BALANCE
FROM HISTORY AS HISTORY
WHERE HISTORY.TELLER_ID BETWEEN 10 AND 50
  AND HISTORY.BRANCH_ID BETWEEN 3 AND 20
ORDER BY HISTORY.ACCT_ID ASC

Access plan uses Index HISTIX1 with List Prefetch and a parallel sort.

Access Plan:
-----------
Total Cost: 3262
Cumulative I/O Cost: 3548
Query Degree: 4

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 3
Blocking: Block All Cursors
Isolation Level: Cursor Stability

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
        INST461.PROFILE2

(The full access plan graph, with the LMTQ parallel sort operator above the TBSCAN,
SORT, and FETCH over a RIDSCN/SORT/IXSCAN of index HISTIX1, appears in the
db2exfmt report excerpt that follows.)

Figure 6-23. Sample query 1: Access plan - Profile 2 CL4636.0

Notes:
We are using the same SQL query that was used in previous examples. This portion of the
Explain report shows that the optimizer profile INST461.PROFILE2 was used during SQL
compilation. The report shows that the optimization level used was 3 and the Query Degree
was 4, as requested in the profile.
The access plan selected was very similar to the default access plan, but with a degree of
parallelism of 4 the optimizer is going to use a parallel sort operation, indicated by the Local
Merge Table Queue (LMTQ) operation.
The estimated costs are very close to those for the default access plan.


Here is a portion of the Explain report generated using this optimizer profile.
******************** EXPLAIN INSTANCE ********************

DB2_VERSION: 10.05.0
FORMATTED ON DB: MUSICDB
SOURCE_NAME: SQLC2K26
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2013-10-15-16.40.35.712255
EXPLAIN_REQUESTER: INST461

Database Context:
----------------
Parallelism: Intra-Partition Parallelism
CPU Speed: 1.220223e-07
Comm Speed: 100
Buffer Pool size: 5020
Sort Heap size: 1004
Database Heap size: 4440
Lock List size: 2000
Maximum Lock List: 22
Average Applications: 1
Locks Available: 14080

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 3
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 4

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE2

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1


WHERE
H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q1.BRANCH_ID AS "BRANCH_ID",
Q1.ACCTNAME AS "ACCTNAME",
Q1.ACCT_ID AS "ACCT_ID",
Q1.BALANCE AS "BALANCE"
FROM
OPT.HISTORY AS Q1
WHERE
(Q1.BRANCH_ID <= 20) AND
(3 <= Q1.BRANCH_ID) AND
(Q1.TELLER_ID <= 50) AND
(10 <= Q1.TELLER_ID)
ORDER BY
Q1.ACCT_ID

Access Plan:
-----------
Total Cost: 3262.3
Query Degree:4
Cumulative Total Cost: 3262.3
Cumulative CPU Cost: 6.86257e+08
Cumulative I/O Cost: 3548.83
Cumulative Re-Total Cost: 59.7699
Cumulative Re-CPU Cost: 4.89828e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 3261.72
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
3790.19
LMTQ
( 2)
3262.3
3548.83
|
3790.19
TBSCAN
( 3)
3261.81
3548.83
|
3790.19


SORT
( 4)
3261.7
3548.83
|
3790.19
FETCH
( 5)
3260.72
3548.83
/---+----\
92443.7 513576
RIDSCN TABLE: OPT
( 6) HISTORY
240.682 Q1
166.361
|
92443.7
SORT
( 7)
240.682
166.361
|
92443.7
IXSCAN
( 8)
218.138
166.361
|
513576
INDEX: OPT
HISTIX1
Q1


Instructor notes:


Purpose — To review a portion of the Explain tool report generated using the optimizer
profile with several global guidelines being applied during query compilation.
Details —
Additional information —
Transition statement — Next we will look at a simple optimizer profile with a statement
level guideline.


Step 3:
Use a Statement guideline to select an index
Profile 3 – Set global and statement level optimization guidelines

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 3 History Query 1 - Index Scan ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC
]]>
</STMTKEY>
<OPTGUIDELINES>
<IXSCAN TABLE="H1" INDEX="HISTIX3" />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Figure 6-24. Step 3: Use a Statement guideline to select an index CL4636.0

Notes:
You might decide that performance for an application would be improved if a particular
index was used to access a table, but the optimizer might be selecting another index or a
table scan instead. In order to specify the use of a particular index, you will need to specify
a statement-level profile. The STMTKEY element contains the SQL text to be matched by
the optimizer.
The IXSCAN optimization guideline element can be used to specify that a table should be
accessed using an index scan. The INDEX attribute can be used to specify that a particular
index should be utilized. We will use the following guideline to request that the index
HISTIX3 is used to access the table.
<OPTGUIDELINES>
<IXSCAN TABLE="H1" INDEX="HISTIX3" />
</OPTGUIDELINES>


The following optimization profile includes one global-level optimization guideline and a
statement-level optimization guideline that requests an index scan be used to access the
HISTORY table.
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 3 History Query 1 - Index Scan ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC
]]>
</STMTKEY>
<OPTGUIDELINES>
<IXSCAN TABLE="H1" INDEX="HISTIX3" />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>
The TABLE attribute of the IXSCAN element uses the exposed name "H1" from the SQL
statement to reference the HISTORY table. The CDATA section is used within the
STMTKEY element to correctly handle SQL text that contains ‘<‘ or ‘>’ characters that
would otherwise be processed as special XML format characters. This SQL statement
does not contain those characters but including the CDATA section would allow the
statement to be safely modified later if necessary.
The STMTPROFILE includes the ID attribute "Profile 3 History Query 1 - Index Scan". This
is very helpful when testing the optimizer profile, because this ID will appear in the Explain
tool report if the optimizer matches the SQL text with this statement profile. An optimizer
profile could contain many statement profiles; our example has only one included in the
XML document.


Instructor notes:
Purpose — To show an optimization profile example that contains one statement level
optimization guideline that requests that a particular named index be used to access a
table.
Details —
Additional information —
Transition statement — Next we will look at the access plan generated by the optimizer
using this optimizer profile.



Sample query 1: Access plan – Profile 3

SELECT H1.BRANCH_ID, H1.ACCTNAME,
       H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
  AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC

Access plan uses Index HISTIX3 without List Prefetch.

Access Plan:
-----------
Total Cost: 20868
Cumulative I/O Cost: 3214

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
        INST461.PROFILE3
STMTPROF: (Statement Profile Name)
        Profile 3 History Query 1 - Index Scan

(The full access plan graph, with a FETCH over a direct IXSCAN of index HISTIX3,
appears in the db2exfmt report excerpt that follows.)

Figure 6-25. Sample query 1: Access plan - Profile 3 CL4636.0

Notes:
The Profile Information section of the db2exfmt Explain report shows that the optimizer
profile named INST461.PROFILE3 was used to compile the access plan. It also shows that
the SQL statement matched the statement profile with an ID of ‘Profile 3 History Query 1 -
Index Scan ‘. The profile also included a global guideline to set the optimization class to 7.
The access plan generated now uses an index scan of the HISTIX3 index to access the
table. Using this index allows both predicates to be handled during the index scan. Notice
that the estimated cost for this statement is now 20868, which is over six times the original
cost for the default access plan. The estimated I/O cost is 3214, compared to the default I/O
cost of 3548, but using HISTIX3 without the benefit of a List prefetch operation is
expected to be much less efficient.
What is important to notice from this example is that using the IXSCAN access request
element in the optimization profile caused the optimizer to select this direct index scan
method to access the HISTORY table, even though this is considered by the optimizer to
be a much less efficient access plan.


In the next optimization profile example we will use the LPREFETCH access request
element to control the index selection, but to preserve the use of the List prefetch
operation.


Here is a portion of the Explain report generated using the PROFILE3 optimizer profile.
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE3
STMTPROF: (Statement Profile Name)
Profile 3 History Query 1 - Index Scan

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1
WHERE
H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q1.BRANCH_ID AS "BRANCH_ID",
Q1.ACCTNAME AS "ACCTNAME",
Q1.ACCT_ID AS "ACCT_ID",
Q1.BALANCE AS "BALANCE"
FROM
OPT.HISTORY AS Q1
WHERE
(Q1.BRANCH_ID <= 20) AND
(3 <= Q1.BRANCH_ID) AND
(Q1.TELLER_ID <= 50) AND
(10 <= Q1.TELLER_ID)
ORDER BY


Q1.ACCT_ID

Access Plan:
-----------
Total Cost: 20868.6
Query Degree:1
Cumulative Total Cost: 20868.6
Cumulative CPU Cost: 2.84499e+08
Cumulative I/O Cost: 3214.57
Cumulative Re-Total Cost: 29.17
Cumulative Re-CPU Cost: 2.39055e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 20868.6
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
3790.19
TBSCAN
( 2)
20868.6
3214.57
|
3790.19
SORT
( 3)
20868.6
3214.57
|
3790.19
FETCH
( 4)
20867.9
3214.57
/---+----\
3790.19 513576
IXSCAN TABLE: OPT
( 5) HISTORY
218.004 Q1
162.893
|
513576
INDEX: OPT
HISTIX3
Q1


Instructor notes:


Purpose — To see the effect on the access plan selected when the IXSCAN access
request element is added to the optimization profile. A key learning objective for students
here is to understand that they need to choose either the IXSCAN or LPREFETCH access
query elements to control the use of an index and to determine if a List prefetch operation
will be included in the access plan.
Details —
Additional information —
Transition statement — Next we will look at a version of the optimization profile that uses
the LPREFETCH access request to control index usage.


Use a Statement guideline to set index access


Profile 4 – Set global and a statement level optimization guideline to specify LIST PREFETCH for a
specific index

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 4 History Query 1 - Index List Prefetch ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC
]]>
</STMTKEY>
<OPTGUIDELINES>
<LPREFETCH TABLE="H1" INDEX="HISTIX3" />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Figure 6-26. Use a Statement guideline to set index access CL4636.0

Notes:
The next sample optimization profile, PROFILE4, uses the LPREFETCH access request
element to request the use of the HISTIX3 index to access the HISTORY table, but to
perform a List prefetch rather than accessing each data row directly, to improve access
efficiency. The following guideline is used:
<OPTGUIDELINES>
<LPREFETCH TABLE="H1" INDEX="HISTIX3" />

</OPTGUIDELINES>


Here is the complete optimizer profile document.


<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 4 History Query 1 - Index List Prefetch ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<LPREFETCH TABLE="H1" INDEX="HISTIX3" />

</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>
In this example, we include the INDEX attribute to specify which index to use in the
generated access plan. If no INDEX attribute were included, the optimizer could select any
available index to use in the List prefetch operation based on the lowest estimated cost.
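As a sketch, that more permissive form of the guideline simply omits the INDEX attribute:

<OPTGUIDELINES>
<LPREFETCH TABLE='H1'/>
</OPTGUIDELINES>

This still forces List prefetch access to the HISTORY table, but leaves the choice of index
to the optimizer's cost model.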


Instructor notes:
Purpose — To discuss using the LPREFETCH access request element in an optimizer
profile to request a list prefetch type of access to a table. This can be used with or without
the INDEX attribute being used to request use of a particular index.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated using this profile
with the LPREFETCH access request.


Sample query 1: Access plan – Profile 4 – List Prefetch

SELECT H1.BRANCH_ID, H1.ACCTNAME,
       H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
  AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC

Access plan uses Index HISTIX3 with List Prefetch.

Access Plan:
-----------
Total Cost: 7601
Cumulative I/O Cost: 7653

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
        INST461.PROFILE4
STMTPROF: (Statement Profile Name)
        Profile 4 History Query 1 - Index List Prefetch

(The full access plan graph, with the FETCH over a RIDSCN/SORT/IXSCAN of index
HISTIX3, appears in the db2exfmt report excerpt that follows.)

Figure 6-27. Sample query 1: Access plan - Profile 4 - List Prefetch CL4636.0

Notes:
The Profile Information section of the db2exfmt Explain report shows that the optimizer
profile named INST461.PROFILE4 was used to compile the access plan. It also shows that
the SQL statement matched the statement profile with an ID of 'Profile 4 History Query 1 -
Index List Prefetch'. The profile also included a global guideline to set the optimization
class to 7.
The access plan generated now performs a List prefetch operation using the HISTIX3
index to access the table. Using this index allows both predicates to be handled during the
index scan. Notice that the estimated cost for this statement is now 7601, which is much
lower than the estimated cost of 20868 for the direct index scan using this index. The
estimated I/O cost of 7653 is higher, but the I/Os are estimated to be more efficient
because of the prefetch technique.


Here is a portion of the Explain report generated using the PROFILE4 optimizer profile.
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE4
STMTPROF: (Statement Profile Name)
Profile 4 History Query 1 - Index List Prefetch

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1
WHERE
H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q1.BRANCH_ID AS "BRANCH_ID",
Q1.ACCTNAME AS "ACCTNAME",
Q1.ACCT_ID AS "ACCT_ID",
Q1.BALANCE AS "BALANCE"
FROM
OPT.HISTORY AS Q1
WHERE
(Q1.BRANCH_ID <= 20) AND
(3 <= Q1.BRANCH_ID) AND
(Q1.TELLER_ID <= 50) AND
(10 <= Q1.TELLER_ID)
ORDER BY


Q1.ACCT_ID

Access Plan:
-----------
Total Cost: 7601.56
Query Degree:1
Cumulative Total Cost: 7601.56
Cumulative CPU Cost: 3.11242e+08
Cumulative I/O Cost: 7653.97
Cumulative Re-Total Cost: 7600.1
Cumulative Re-CPU Cost: 2.99274e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 7601.44
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
3790.19
TBSCAN
( 2)
7601.56
7653.97
|
3790.19
SORT
( 3)
7601.44
7653.97
|
3790.19
FETCH
( 4)
7600.46
7653.97
/---+----\
3790.19 513576
RIDSCN TABLE: OPT
( 5) HISTORY
218.63 Q1
162.893
|
3790.19
SORT
( 6)
218.63
162.893
|
3790.19
IXSCAN
( 7)
218.004
162.893


|
513576
INDEX: OPT
HISTIX3
Q1


Instructor notes:


Purpose — To show that the high estimated cost that resulted from the previous
optimization profile use of HISTIX3 was mostly because of the loss of efficiency from the
List prefetch and not the selection of the alternate index. There might be times when
performing tests using optimization profiles can help to clarify changes in estimated costs
that would otherwise be difficult to obtain, because we are normally limited to seeing the
cost for the most efficient plan selected, not the alternatives that were bypassed.
Details —
Additional information —
Transition statement — Next we will look at an optimization profile that specifies the use
of multiple indexes for processing an SQL statement.


Define a Statement guideline with multiple indexes: Index ANDING

Profile 6 – Set global and a statement level optimization guideline to specify Index ANDING
with two indexes

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 6 History Query 1 - Index Anding ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC
]]>
</STMTKEY>
<OPTGUIDELINES>
<IXAND TABLE='H1' >
<INDEX IXNAME='HISTIX2'/>
<INDEX IXNAME='HISTIX1'/>
</IXAND>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>

(The INDEX element sequence impacts the order of the Index ANDING operation.)

Figure 6-28. Define a Statement guideline with multiple indexes: Index ANDING CL4636.0

Notes:
The SQL statement we are using has two range predicates on the BRANCH_ID and
TELLER_ID columns. The HISTORY table has two single column indexes based on these
columns. An Index Anding operation could be used to combine the results of multiple index
scans to reduce the number of pages that would need to be read from the HISTORY table
to produce the result. The next sample optimization profile, PROFILE6, uses the IXAND
access request element to cause the optimizer to combine HISTIX1 and HISTIX2 index
scans in an Index anding operation for the defined SQL statement. The sequence of the
INDEX elements controls the order in which the indexes are processed using the build and
probe methods for dynamic bitmap index anding. The following guideline is used:
<OPTGUIDELINES>
<IXAND TABLE='H1' >
<INDEX IXNAME='HISTIX2'/>
<INDEX IXNAME='HISTIX1'/>
</IXAND>
</OPTGUIDELINES>
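Because the INDEX element sequence determines the build and probe order, reversing
the two elements (a sketch) would make HISTIX1 the leading index of the dynamic bitmap
operation instead:

<OPTGUIDELINES>
<IXAND TABLE='H1' >
<INDEX IXNAME='HISTIX1'/>
<INDEX IXNAME='HISTIX2'/>
</IXAND>
</OPTGUIDELINES>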


Here is the complete optimizer profile document.


<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 6 History Query 2 - Index Anding ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<IXAND TABLE='H1' >
<INDEX IXNAME='HISTIX2'/>
<INDEX IXNAME='HISTIX1'/>
</IXAND>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Instructor notes:
Purpose — To show an example of an optimization profile that specifies the access of two
indexes in an Index ANDing operation to process an SQL statement.
Details —
Additional information —
Transition statement — Next we will look at the Explain tool report generated using the
optimization profile with multiple index access.


Sample query 1: Access plan – Profile 6 – Index Anding

SELECT H1.BRANCH_ID, H1.ACCTNAME,
       H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1
WHERE H1.TELLER_ID BETWEEN 10 AND 50
  AND H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY H1.ACCT_ID ASC

Access plan uses Indexes HISTIX2 and HISTIX1 with Index Anding.

Access Plan:
-----------
Total Cost: 2983
Cumulative I/O Cost: 3214

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
        INST461.PROFILE6
STMTPROF: (Statement Profile Name)
        Profile 6 History Query 1 - Index Anding

                 3790.19
                  FETCH
                  (   4)
                 2982.25
                 3214.98
               /---+----\
           3790.19      513576
           RIDSCN   TABLE: OPT
           (   5)     HISTORY
           309.065       Q1
           204.907
              |
           3790.19
            SORT
           (   6)
           309.065
           204.907
              |
           3790.19
            IXAND
           (   7)
           308.144
           204.907
         /-----+------\
    21056.6          92443.7
    IXSCAN           IXSCAN
    (   8)           (   9)
    87.5331          218.138
    38.5462          166.361
       |                |
    513576           513576
  INDEX: OPT      INDEX: OPT
    HISTIX2         HISTIX1
       Q1              Q1

Figure 6-29. Sample query 1: Access plan - Profile 6 - Index Anding CL4636.0

Notes:
The Profile Information section of the db2exfmt Explain report shows that the optimizer
profile named INST461.PROFILE6 was used to compile the access plan. It also shows that
the SQL statement matched the statement profile with an ID of 'Profile 6 History Query 1 -
Index Anding'. The profile also included a global guideline to set the optimization class to
7.
The access plan generated now performs an Index Anding operation, combining pointers
from the HISTIX1 and HISTIX2 indexes to access the table. Using the Index Anding
operation produced an access plan with an estimated cost of 2983, which is 9 percent
lower than the cost estimated for the default access plan.
The estimated I/O cost of 3214 is also lower than the I/O cost for the default plan.
Unlike the previous examples, the access plan generated using the optimization profile is
less costly than the default access plan selected by the optimizer.


Looking at the row estimates for the two index scan operations, we see that the HISTIX2
scan is expected to return 21056 matches while the HISTIX1 index scan is expected to
produce 92443 matches.
The optimizer may not have chosen the Index ANDing method on its own because the
current table and index statistics made it appear less likely to produce an efficient access
plan. A different range of BRANCH_ID and TELLER_ID values in the SQL statement might
have led the optimizer to select an Index ANDing access method by default.
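For reference, the Explain reports shown in this unit can be reproduced from the CLP with the Explain facility. A minimal sketch, assuming the statement is issued from the same session that sets the special registers:

SET CURRENT OPTIMIZATION PROFILE = 'INST461.PROFILE6';
SET CURRENT EXPLAIN MODE EXPLAIN;
-- issue the SELECT statement here; it is explained but not executed
SET CURRENT EXPLAIN MODE NO;

The captured plan can then be formatted with the db2exfmt tool, for example:

db2exfmt -d MUSICDB -1 -o profile6_plan.txt

where -1 takes the default values for the unspecified options and formats the most recently explained statement.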


Here is a portion of the Explain report generated using the PROFILE6 optimizer profile.
******************** EXPLAIN INSTANCE ********************

DB2_VERSION: 10.05.0
FORMATTED ON DB: MUSICDB
SOURCE_NAME: SQLC2K26
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2013-10-15-16.49.30.440246
EXPLAIN_REQUESTER: INST461

Database Context:
----------------
Parallelism: None
CPU Speed: 1.220223e-07
Comm Speed: 100
Buffer Pool size: 5020
Sort Heap size: 1004
Database Heap size: 4440
Lock List size: 2000
Maximum Lock List: 22
Average Applications: 1
Locks Available: 14080

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE6
STMTPROF: (Statement Profile Name)
Profile 6 History Query 1 - Index Anding

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE


FROM
HISTORY AS H1
WHERE
H1.TELLER_ID BETWEEN 10 AND 50 AND
H1.BRANCH_ID BETWEEN 3 AND 20
ORDER BY
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q1.BRANCH_ID AS "BRANCH_ID",
Q1.ACCTNAME AS "ACCTNAME",
Q1.ACCT_ID AS "ACCT_ID",
Q1.BALANCE AS "BALANCE"
FROM
OPT.HISTORY AS Q1
WHERE
(Q1.BRANCH_ID <= 20) AND
(3 <= Q1.BRANCH_ID) AND
(Q1.TELLER_ID <= 50) AND
(10 <= Q1.TELLER_ID)
ORDER BY
Q1.ACCT_ID

Access Plan:
-----------
Total Cost: 2983.34
Query Degree:1
Cumulative Total Cost: 2983.34
Cumulative CPU Cost: 2.66432e+08
Cumulative I/O Cost: 3214.98
Cumulative Re-Total Cost: 28.2718
Cumulative Re-CPU Cost: 2.31693e+08
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 2983.23
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
3790.19
TBSCAN
( 2)
2983.34
3214.98
|
3790.19
SORT
( 3)
2983.23
3214.98


|
3790.19
FETCH
( 4)
2982.25
3214.98
/---+----\
3790.19 513576
RIDSCN TABLE: OPT
( 5) HISTORY
309.065 Q1
204.907
|
3790.19
SORT
( 6)
309.065
204.907
|
3790.19
IXAND
( 7)
308.144
204.907
/-----+------\
21056.6 92443.7
IXSCAN IXSCAN
( 8) ( 9)
87.5331 218.138
38.5462 166.361
| |
513576 513576
INDEX: OPT INDEX: OPT
HISTIX2 HISTIX1
Q1 Q1


Instructor notes:
Purpose — To review the Explain report generated when the optimization profile with the
Index Anding access request element is used. Students might want to know why the
optimizer did not select this access plan automatically, since it does have a lower cost. In
previous testing, I have noticed that the SQL predicates needed to produce roughly similar
row estimates for the two indexes before the optimizer would choose Index ANDing over a
single index scan.
Details —
Additional information —
Transition statement — Next we will add a little more complexity to our sample SQL
statement, by adding access to a second table.



Two table join: Default Access plan

(Slide content: the db2exfmt access plan tree for the two-table join; the complete plan is reproduced in the Explain report below.)

• Access Plan: Total Cost: 3234; Cumulative I/O Cost: 3422
• Default access plan uses a Hash Join
• Tables accessed using indexes:
  - HISTORY: List Prefetch over index HISTIX1
  - TELLER: IXSCAN over index TELLINDX

Figure 6-30. Two table join: Default Access plan CL4636.0

Notes:
The next sample SQL statement we will analyze adds access to a second table, which
increases the number of possible access plans that could be selected.
The sample SQL statement used is as follows:
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME, H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.BRANCH_ID ASC,
H1.ACCT_ID ASC
The information from the TELLER table is accessed based on matching the TELLER_ID
column values in the two tables. A single predicate based on the BRANCH_ID column in
the HISTORY table limits the result to a range of bank branches.
The default access plan selected by the DB2 Optimizer for this SQL statement uses:
- A Hash Join method to join the tables


- The TELLER table is accessed using an index scan as the inner table for the Hash
join.
- The HISTORY table is accessed using the index HISTIX1 to retrieve the range of
BRANCH_ID rows.
- A List prefetch is used to improve access efficiency with this unclustered index.


The total estimated cost for the default access plan is 3234, with an estimated I/O cost of
3422. Here is a portion of the Explain report based on the default access plan.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY
H1.BRANCH_ID ASC,
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q2.BRANCH_ID AS "BRANCH_ID",
Q1.TELLER_NAME AS "TELLER_NAME",
Q2.ACCTNAME AS "ACCTNAME",
Q2.ACCT_ID AS "ACCT_ID",
Q2.BALANCE AS "BALANCE"
FROM
OPT.TELLER AS Q1,
OPT.HISTORY AS Q2
WHERE
(Q2.BRANCH_ID <= 95) AND
(80 <= Q2.BRANCH_ID) AND
(Q2.TELLER_ID = Q1.TELLER_ID)
ORDER BY
Q2.BRANCH_ID,


Q2.ACCT_ID

Access Plan:
-----------
Total Cost: 3234.81
Query Degree:1
Cumulative Total Cost: 3234.81
Cumulative CPU Cost: 8.91586e+08
Cumulative I/O Cost: 3422.12
Cumulative Re-Total Cost: 2.5069
Cumulative Re-CPU Cost: 2.05446e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 3232.3
Estimated Bufferpool Buffers: 1326

Rows
RETURN
( 1)
Cost
I/O
|
82172.1
TBSCAN
( 2)
3234.81
3422.12
|
82172.1
SORT
( 3)
3232.3
3422.12
|
82172.1
^HSJOIN
( 4)
3189.57
3422.12
/----------+----------\
82172.1 1000
FETCH FETCH
( 5) ( 9)
3098.53 89.1049
3389.12 33
/---+----\ /---+----\
82172.1 513576 1000 1000
RIDSCN TABLE: OPT IXSCAN TABLE: OPT
( 6) HISTORY ( 10) TELLER
214.452 Q2 27.2812 Q1
147.964 4
| |
82172.1 1000
SORT INDEX: INST411
( 7) TELLINDX
214.452 Q1
147.964


|
82172.1
IXSCAN
( 8)
194.652
147.964
|
513576
INDEX: OPT
HISTIX1
Q2


Instructor notes:
Purpose — To look at a slightly more complex SQL statement example that adds access
to a second sample table, the TELLER table.
Details —
Additional information —
Transition statement — Next we will look at an optimization profile that will set the method
for accessing the two tables but allow the optimizer to select the join method.


Define a Statement guideline to control table access for joining tables

Profile 7 – Set global and a statement level optimization guideline to specify table access type but not the join method

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
  <OPTGUIDELINES>
    <QRYOPT VALUE="7" />
  </OPTGUIDELINES>
  <STMTPROFILE ID="Profile 7 History Query 3 Join - control table access ">
    <STMTKEY SCHEMA="OPT">
      <![CDATA[
      SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
             H1.ACCT_ID, H1.BALANCE
      FROM HISTORY AS H1 , TELLER AS T1
      WHERE H1.TELLER_ID = T1.TELLER_ID AND
            H1.BRANCH_ID BETWEEN 80 AND 95
      ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
      ]]>
    </STMTKEY>
    <OPTGUIDELINES>
      <TBSCAN TABLE="T1" />
      <LPREFETCH TABLE="H1" INDEX='HISTIX3' />
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

• Use a table scan for TELLER
• Access the HISTORY table using index HISTIX3 with List Prefetch
• Optimizer selects the JOIN method

Figure 6-31. Define a Statement guideline to control table access for joining tables CL4636.0

Notes:
In some cases, we might decide to use an optimization profile to alter a portion of the
processing for an SQL statement but allow the DB2 Optimizer to determine the remaining
options in the access plan to minimize total costs for a query.
In the next sample optimization profile, we will specify how the two tables should be
accessed, but allow the DB2 Optimizer to determine the most efficient method to join the
two tables. The exposed names from the SQL statement, H1 for HISTORY and T1 for
TELLER, are used in the TABLE attributes for the access request elements.
In the default access plan, the TELLER table was accessed using the index TELLINDX, but
all 1000 rows were retrieved. In our optimization profile, we will specify that a table scan
should be used to access the TELLER table.
The default access plan used the single column index HISTIX1 and List prefetch to access
the HISTORY table. In the optimization profile we will override this index selection using a
LPREFETCH access request element with the INDEX attribute of HISTIX3, to use the
multiple column index instead.
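For reference, the three HISTORY table indexes used throughout these examples could have been defined along the following lines. This is only an illustrative sketch based on how the indexes are described in this unit; the second column of HISTIX3 is an assumption, since the exact DDL is not shown here:

CREATE INDEX HISTIX1 ON OPT.HISTORY (BRANCH_ID);
CREATE INDEX HISTIX2 ON OPT.HISTORY (TELLER_ID);
CREATE INDEX HISTIX3 ON OPT.HISTORY (TELLER_ID, BRANCH_ID);  -- multiple column index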


The following guideline is used:


<OPTGUIDELINES>
<TBSCAN TABLE="T1" />
<LPREFETCH TABLE="H1" INDEX='HISTIX3' />
</OPTGUIDELINES>
Here is the complete optimizer profile document.
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 7 History Query 3 Join - control table access ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<TBSCAN TABLE="T1" />
<LPREFETCH TABLE="H1" INDEX='HISTIX3' />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Instructor notes:


Purpose — To look at using an optimization profile to alter the access methods for two
tables that are being joined but to allow the optimizer to select the join method based on
lowest cost.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated when this
optimization profile is used.


Two table join: Profile 7 sets access methods for two table join

(Slide content: the db2exfmt access plan tree; the complete plan is reproduced in the Explain report below.)

• Access Plan: Total Cost: 14680; Cumulative I/O Cost: 10355
• OPT_PROF (Optimization Profile Name): INST461.PROFILE7
• STMTPROF (Statement Profile Name): Profile 7 History Query 3 Join - control table access
• Optimizer selects the Hash Join and the inner/outer tables for the join

Figure 6-32. Two table join: Profile 7 sets access methods for two table join CL4636.0

Notes:
The Profile Information section of the db2exfmt Explain report shows that the optimizer
profile named PROFILE7 was used to compile the access plan. It also shows that the SQL
statement matched the statement profile with an ID of 'Profile 7 History Query 3 Join -
control table access'. The profile also included a global guideline to set the optimization
class to 7.
The access plan generated now uses the same Hash Join method for joining the two
tables. Using the access types specified by the optimization profile produced an access
plan with an estimated cost of 14680, which is much higher than the cost estimated for the
default access plan. The estimated I/O cost of 10355 is also higher than the I/O cost for
the default plan. The revised access plan might still perform well, but the optimizer
estimates a higher resource cost for it.
Sometimes an optimization profile can be used to show an aspect of the logic used inside
the optimizer. In this case we see that using a table scan to retrieve every row of the
TELLER table is considered to be slightly more costly than reading those rows using the
clustered index.
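One way to see how well clustered an index is, and therefore why the optimizer might prefer it over a table scan, is to look at the clustering statistics in the catalog. A simple sketch:

SELECT INDNAME, CLUSTERRATIO, CLUSTERFACTOR
  FROM SYSCAT.INDEXES
  WHERE TABSCHEMA = 'OPT' AND TABNAME = 'TELLER';

A CLUSTERRATIO near 100 (or a CLUSTERFACTOR near 1) indicates that reading the table through the index causes little random I/O.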


Here is a portion of the Explain report generated using the PROFILE7 optimizer profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE7
STMTPROF: (Statement Profile Name)
Profile 7 History Query 3 Join - control table access

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY
H1.BRANCH_ID ASC,
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q2.BRANCH_ID AS "BRANCH_ID",
Q1.TELLER_NAME AS "TELLER_NAME",
Q2.ACCTNAME AS "ACCTNAME",
Q2.ACCT_ID AS "ACCT_ID",
Q2.BALANCE AS "BALANCE"
FROM
OPT.TELLER AS Q1,
OPT.HISTORY AS Q2


WHERE
(Q2.BRANCH_ID <= 95) AND
(80 <= Q2.BRANCH_ID) AND
(Q2.TELLER_ID = Q1.TELLER_ID)
ORDER BY
Q2.BRANCH_ID,
Q2.ACCT_ID

Access Plan:
-----------
Total Cost: 14679.8
Query Degree:1
Cumulative Total Cost: 14679.8
Cumulative CPU Cost: 1.10833e+09
Cumulative I/O Cost: 10355.9
Cumulative Re-Total Cost: 805.912
Cumulative Re-CPU Cost: 1.97314e+08
Cumulative Re-I/O Cost: 884
Cumulative First Row Cost: 13904.2
Estimated Bufferpool Buffers: 1326

Rows
RETURN
( 1)
Cost
I/O
|
82172.1
TBSCAN
( 2)
14679.8
10355.9
|
82172.1
SORT
( 3)
13873.9
9471.93
|
82172.1
^HSJOIN
( 4)
7849.8
8587.93
/---------+---------\
82172.1 1000
FETCH TBSCAN
( 5) ( 9)
7651.42 196.446
8558.93 29
/---+----\ |
82172.1 513576 1000
RIDSCN TABLE: OPT TABLE: OPT
( 6) HISTORY TELLER
219.23 Q2 Q1
152.993


|
82172.1
SORT
( 7)
219.23
152.993
|
82172.1
IXSCAN
( 8)
199.43
152.993
|
513576
INDEX: OPT
HISTIX3
Q2


Instructor notes:
Purpose — To review the access plan from the Explain tool report based on the
optimization profile that sets the methods to access two tables but allows the optimizer to
choose the join method.
Details —
Additional information —
Transition statement — Next we will look at using an optimization profile to use a merge
join method for our SQL query.


Define a Statement guideline to request a Merge Join and also control table access

Profile 8 – Set global and a statement level optimization guideline to control access to tables and request a Merge Join

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
  <OPTGUIDELINES>
    <QRYOPT VALUE="7" />
  </OPTGUIDELINES>
  <STMTPROFILE ID="Profile 8 History Query 3 MSJOIN Join - control table access ">
    <STMTKEY SCHEMA="OPT">
      <![CDATA[
      SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
             H1.ACCT_ID, H1.BALANCE
      FROM HISTORY AS H1 , TELLER AS T1
      WHERE H1.TELLER_ID = T1.TELLER_ID AND
            H1.BRANCH_ID BETWEEN 80 AND 95
      ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
      ]]>
    </STMTKEY>
    <OPTGUIDELINES>
      <MSJOIN>
        <TBSCAN TABLE="T1" />
        <LPREFETCH TABLE="H1" />
      </MSJOIN>
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

• Profile selects a Merge Join
• Sequence of tables used to determine the inner/outer tables
• Use a table scan for TELLER (outer)
• Access the HISTORY table using any index with List Prefetch (inner)

Figure 6-33. Define a Statement guideline to request a Merge Join and also control table access CL4636.0

Notes:
An optimization profile can be used to control as much of the access plan as you decide is
necessary to produce the performance result needed. In the next sample optimization
profile, we will specify that a Merge Join method should be used to join the two tables. We
will also specify how the two tables should be accessed.
The following guideline is used:
<OPTGUIDELINES>
<MSJOIN>
<TBSCAN TABLE="T1" />
<LPREFETCH TABLE="H1" />
</MSJOIN>
</OPTGUIDELINES>
The guideline uses the MSJOIN join request element to specify that a merge join should be
used. The TBSCAN and LPREFETCH access request elements specify how to access the
TELLER and HISTORY tables. The sequence of the access request elements tells the
optimizer to use the TELLER table as the outer table for the join and the HISTORY table as


the inner table for the join processing. Since no INDEX attribute is included with the
LPREFETCH access request element, the optimizer can select any of the three HISTORY
table indexes based on cost. The exposed table names from the SQL statement, H1 for
HISTORY and T1 for TELLER, are used in the TABLE attributes for the access request
elements.
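If the index choice also needed to be fixed, the INDEX attribute could be added to the access request inside the join request. This variant is only a sketch and is not the profile used in this example:

<OPTGUIDELINES>
  <MSJOIN>
    <TBSCAN TABLE="T1" />
    <LPREFETCH TABLE="H1" INDEX='HISTIX1' />
  </MSJOIN>
</OPTGUIDELINES>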
Here is the complete optimizer profile document.
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 8 History Query 3 MSJOIN Join - control table
access ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<MSJOIN>
<TBSCAN TABLE="T1" />
<LPREFETCH TABLE="H1" />
</MSJOIN>

</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Instructor notes:


Purpose — To discuss an optimization profile that specifies the join method to join the two
tables and also to control the access type for each table.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated using this
optimization profile.


Two table join: Profile 8 sets Merge Join and access methods for two table join

(Slide content: the db2exfmt access plan tree; the complete plan is reproduced in the Explain report below.)

• Access Plan: Total Cost: 3380; Cumulative I/O Cost: 3418
• Optimizer selects index HISTIX1 for the List Prefetch access of the HISTORY table

Figure 6-34. Two table join: Profile 8 sets Merge Join and access methods for two table join CL4636.0

Notes:
The visual shows a portion of the access plan generated using the optimization profile
listed on the previous slide. The access plan includes the Merge Join method with the
TELLER table as the outer table and the HISTORY table as the inner table for the join,
which matches our optimization profile.
A table scan was used to access the TELLER table as requested. The optimizer selected
the index HISTIX1 to be used for the List prefetch access requested in the profile.
Using this optimization profile produced an access plan with an estimated cost of 3380,
which is slightly higher than the cost estimated for the default access plan. The estimated
I/O cost of 3418 is slightly lower than the I/O cost for the default plan. Performance testing
would be necessary to demonstrate that this access plan could consistently produce the
performance result needed by the application.
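One simple way to run such a test is the db2batch benchmarking tool. As a sketch (the name of the file containing the query is illustrative):

db2batch -d MUSICDB -f query3.sql -i complete

This runs the statements in query3.sql against the MUSICDB database and reports prepare, execute, and fetch times for each statement.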


Here is a portion of the Explain report generated using the PROFILE8 optimizer profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE8
STMTPROF: (Statement Profile Name)
Profile 8 History Query 3 MSJOIN Join - control table access

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY
H1.BRANCH_ID ASC,
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q2.BRANCH_ID AS "BRANCH_ID",
Q1.TELLER_NAME AS "TELLER_NAME",
Q2.ACCTNAME AS "ACCTNAME",
Q2.ACCT_ID AS "ACCT_ID",
Q2.BALANCE AS "BALANCE"
FROM
OPT.TELLER AS Q1,
OPT.HISTORY AS Q2


WHERE
(Q2.BRANCH_ID <= 95) AND
(80 <= Q2.BRANCH_ID) AND
(Q2.TELLER_ID = Q1.TELLER_ID)
ORDER BY
Q2.BRANCH_ID,
Q2.ACCT_ID

Access Plan:
-----------
Total Cost: 3380.28
Query Degree:1
Cumulative Total Cost: 3380.28
Cumulative CPU Cost: 1.2033e+09
Cumulative I/O Cost: 3418.12
Cumulative Re-Total Cost: 2.5069
Cumulative Re-CPU Cost: 2.05446e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 3377.78
Estimated Bufferpool Buffers: 1326

Rows
RETURN
( 1)
Cost
I/O
|
82172.1
TBSCAN
( 2)
3380.28
3418.12
|
82172.1
SORT
( 3)
3377.78
3418.12
|
82172.1
MSJOIN
( 4)
3335.05
3418.12
/---+----\
1000 82.1721
TBSCAN FILTER
( 5) ( 8)
196.584 3133.95
29 3389.12
| |
1000 82172.1
SORT TBSCAN
( 6) ( 9)
196.584 3133.95
29 3389.12


| |
1000 82172.1
TBSCAN SORT
( 7) ( 10)
196.446 3131.44
29 3389.12
| |
1000 82172.1
TABLE: OPT FETCH
TELLER ( 11)
Q1 3098.53
3389.12
/---+----\
82172.1 513576
RIDSCN TABLE: OPT
( 12) HISTORY
214.452 Q2
147.964
|
82172.1
SORT
( 13)
214.452
147.964
|
82172.1
IXSCAN
( 14)
194.652
147.964
|
513576
INDEX: OPT
HISTIX1
Q2


Instructor notes:
Purpose — To show the access plan from the Explain report generated using the
optimization profile with an MSJOIN join request element. One key aspect of this profile is
that we requested the type of access for the HISTORY table, a list prefetch, but allowed
the optimizer to select the most efficient index.
Details —
Additional information —
Transition statement — Next we will look at an optimization profile that selects a Nested
Loop join method to process this same query.


Define a Statement guideline to request a Nested Loop Join and also control table access

Profile 9 – Set global and a statement level optimization guideline to control access to tables and request a Nested Loop Join

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
  <OPTGUIDELINES>
    <QRYOPT VALUE="7" />
  </OPTGUIDELINES>
  <STMTPROFILE ID="Profile 9 History Query 3 NLJOIN Join - control table access ">
    <STMTKEY SCHEMA="OPT">
      <![CDATA[
      SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
             H1.ACCT_ID, H1.BALANCE
      FROM HISTORY AS H1 , TELLER AS T1
      WHERE H1.TELLER_ID = T1.TELLER_ID AND
            H1.BRANCH_ID BETWEEN 80 AND 95
      ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
      ]]>
    </STMTKEY>
    <OPTGUIDELINES>
      <NLJOIN>
        <TBSCAN TABLE="T1" />
        <LPREFETCH TABLE="H1" />
      </NLJOIN>
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

• Profile selects a Nested Loop Join
• Sequence of tables used to determine the inner/outer tables
• Use a table scan for TELLER (outer)
• Access the HISTORY table using any index with List Prefetch (inner)

Figure 6-35. Define a Statement guideline to request a Nested Loop Join and also control table access CL4636.0

Notes:
In the next sample optimization profile, we will specify that a Nested Loop Join method
should be used to join the two tables. We will also specify how the two tables should be
accessed.
The following guideline is used:
<OPTGUIDELINES>
<NLJOIN>
<TBSCAN TABLE="T1" />
<LPREFETCH TABLE="H1" />
</NLJOIN>
</OPTGUIDELINES>
The guideline uses the NLJOIN join request element to specify that a nested loop join
should be used. Like the previous example, the TBSCAN and LPREFETCH access
request elements specify how to access the TELLER and HISTORY tables. The sequence
of the access request elements tells the optimizer to use the TELLER table as the outer
table for the join and the HISTORY table as the inner table for the join processing. Since no


INDEX attribute is included with the LPREFETCH access request element, the optimizer
can select any of the three HISTORY table indexes based on cost. The exposed table
names from the SQL statement, H1 for HISTORY and T1 for TELLER, are used in the
TABLE attributes for the access request elements.
Setting the query optimization class to a value of 0, either in the application or using an
optimization profile, could also have forced the optimizer to use a nested loop join
method, but that would have precluded the use of List Prefetch for any indexed access.
The use of a table scan for the TELLER table would also have been less likely at
optimization class 0. This optimization profile is a direct, stable way to achieve this very
specific access plan, if that is the objective.
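For comparison, the session-level alternative mentioned above would be set from the application or the CLP as follows; unlike the profile, it affects every dynamic statement compiled in the session:

SET CURRENT QUERY OPTIMIZATION = 0;
-- class 0: minimal optimization; the optimizer considers only
-- nested loop joins and index scan access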
Here is the complete optimizer profile document.
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 9 History Query 3 NLJOIN Join - control table
access ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<NLJOIN>
<TBSCAN TABLE="T1" />
<LPREFETCH TABLE="H1" />
</NLJOIN>

</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Instructor notes:


Purpose — To discuss this example of an optimization profile that specifies a nested loop
join method and also controls the type of access to each table.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated when we use
this optimization profile to get a nested loop join between the two tables.


Two table join: Profile 9 sets Nested Loop Join and access methods for two table join

(Slide content: the db2exfmt access plan tree; the complete plan is reproduced in the Explain report below.)

• Access Plan: Total Cost: 156787; Cumulative I/O Cost: 42511
• OPT_PROF (Optimization Profile Name): INST461.PROFILE9
• STMTPROF (Statement Profile Name): Profile 9 History Query 3 NLJOIN Join - control table access
• Optimizer selects index HISTIX3, based on the TELLER_ID column

Figure 6-36. Two table join: Profile 9 sets Nested Loop Join and access methods for two table join CL4636.0

Notes:
The visual shows a portion of the access plan generated using the optimization profile
listed on the previous slide. The access plan includes the Nested Loop method with the
TELLER table as the outer table and the HISTORY table as the inner table for the join,
which matches our optimization profile.
A table scan was used to access the TELLER table as requested. The optimizer selected
the index HISTIX3, based on the TELLER_ID column, to locate the HISTORY rows for each
row in the TELLER table. The index will be used to perform the List Prefetch access
requested in the profile for each TELLER table row.
Using this optimization profile produced an access plan with an estimated cost of 156,787,
which is much higher than the estimate for the default access plan. The estimated
I/O cost of 42511 is also much higher than the I/O cost for the default plan.
The cost estimates would indicate that this access plan is unlikely to perform better than
the default access plan, but it demonstrates the ability to use optimization profiles to
override the standard access plan selection logic when needed.
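When an experiment like this one is finished, the profile can be deactivated for the session and removed from the profile table. A minimal sketch from the CLP:

SET CURRENT OPTIMIZATION PROFILE = '';
DELETE FROM SYSTOOLS.OPT_PROFILE
   WHERE SCHEMA = 'INST461' AND NAME = 'PROFILE9';
FLUSH OPTIMIZATION PROFILE CACHE ALL;

Setting the special register to an empty string stops the session from using any optimization profile, and the FLUSH statement removes any cached copy of the deleted profile.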


Here is a portion of the Explain report generated using the PROFILE9 optimizer profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE9
STMTPROF: (Statement Profile Name)
Profile 9 History Query 3 NLJOIN Join - control table access

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY
H1.BRANCH_ID ASC,
H1.ACCT_ID ASC

Optimized Statement:
-------------------
SELECT
Q2.BRANCH_ID AS "BRANCH_ID",
Q1.TELLER_NAME AS "TELLER_NAME",
Q2.ACCTNAME AS "ACCTNAME",
Q2.ACCT_ID AS "ACCT_ID",
Q2.BALANCE AS "BALANCE"
FROM
OPT.TELLER AS Q1,
OPT.HISTORY AS Q2


WHERE
(Q2.BRANCH_ID <= 95) AND
(80 <= Q2.BRANCH_ID) AND
(Q2.TELLER_ID = Q1.TELLER_ID)
ORDER BY
Q2.BRANCH_ID,
Q2.ACCT_ID

Access Plan:
-----------
Total Cost: 156787
Query Degree:1
Cumulative Total Cost: 156787
Cumulative CPU Cost: 1.80684e+11
Cumulative I/O Cost: 42511
Cumulative Re-Total Cost: 805.912
Cumulative Re-CPU Cost: 1.97314e+08
Cumulative Re-I/O Cost: 884
Cumulative First Row Cost: 156011
Estimated Bufferpool Buffers: 1326

Rows
RETURN
( 1)
Cost
I/O
|
82172.2
TBSCAN
( 2)
156787
42511
|
82172.2
SORT
( 3)
155981
41627
|
82172.2
NLJOIN
( 4)
149957
40743
/------+-------\
1000 82.1722
TBSCAN FETCH
( 5) ( 6)
196.446 758.779
29 234.751
| /---+----\
1000 82.1721 513576
TABLE: OPT RIDSCN TABLE: OPT
TELLER ( 7) HISTORY
Q1 203.897 Q2
152.974


|
82.1721
SORT
( 8)
203.897
152.974
|
82.1721
IXSCAN
( 9)
203.891
152.974
|
513576
INDEX: OPT
HISTIX3
Q2


Instructor notes:
Purpose — To review the Explain report generated using the optimization profile with the
NLJOIN join request element.
Details —
Additional information —
Transition statement — Next we will look at creating optimization profiles for an SQL
statement joining three tables.



Three table join: Default Access Plan


Access Plan: Total Cost: 3246 Cumulative I/O Cost: 3423

82172.1
^HSJOIN
( 4)
3198.25
3423.12
/-------------------+-------------------\
82172.1 16
^HSJOIN FETCH
( 5) ( 12)
3189.57 6.77909
3422.12 1
/----------+----------\ /----+----\
82172.1 1000 16 100
FETCH FETCH IXSCAN TABLE: OPT
( 6) ( 10) ( 13) BRANCH
3098.53 89.1049 0.0106189 Q1
3389.12 33 0
/---+----\ /---+----\ |
82172.1 513576 1000 1000 100
RIDSCN TABLE: OPT IXSCAN TABLE: OPT INDEX: INST411
( 7) HISTORY ( 11) TELLER BRANINDX
214.452 Q3 27.2812 Q2 Q1
147.964 4
| |
82172.1 1000
SORT INDEX: INST411
( 8) TELLINDX
214.452 Q2
147.964 (HISTIX1)


Figure 6-37. Three table join: Default Access Plan CL4636.0

Notes:
Next we will add a little more complexity to the SQL statement by adding a third table
reference. The following SQL statement will be used.
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME, H1.ACCT_ID,
H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
WHERE H1.TELLER_ID = T1.TELLER_ID AND H1.BRANCH_ID = B1.BRANCH_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID
The visual shows a portion of the default access plan for this three-table join. The optimizer
selected a Hash join method to join the HISTORY and TELLER tables, with index scans used
to access both tables. The composite result of the first join is then joined to the BRANCH
table using a Hash join, with the small BRANCH table data as the inner table for the join
processing. As in previous examples, all of the TELLER table rows are read using an index
scan.


Here is a portion of the Explain tool report for the default access plan.
******************** EXPLAIN INSTANCE ********************

DB2_VERSION: 10.05.0
FORMATTED ON DB: MUSICDB
SOURCE_NAME: SQLC2K26
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2013-10-15-17.15.33.872950
EXPLAIN_REQUESTER: INST461

Database Context:
----------------
Parallelism: None
CPU Speed: 1.220223e-07
Comm Speed: 100
Buffer Pool size: 5020
Sort Heap size: 1004
Database Heap size: 4440
Lock List size: 2000
Maximum Lock List: 22
Average Applications: 1
Locks Available: 14080

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE,
B1.BRANCH_NAME,
B1.AREA_CODE
FROM
HISTORY AS H1,
TELLER AS T1,
BRANCH AS B1


WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID = B1.BRANCH_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY
H1.ACCT_ID ASC,
B1.BRANCH_ID

Optimized Statement:
-------------------
SELECT
Q3.BRANCH_ID AS "BRANCH_ID",
Q2.TELLER_NAME AS "TELLER_NAME",
Q3.ACCTNAME AS "ACCTNAME",
Q3.ACCT_ID AS "ACCT_ID",
Q3.BALANCE AS "BALANCE",
Q1.BRANCH_NAME AS "BRANCH_NAME",
Q1.AREA_CODE AS "AREA_CODE",
Q1.BRANCH_ID
FROM
OPT.BRANCH AS Q1,
OPT.TELLER AS Q2,
OPT.HISTORY AS Q3
WHERE
(Q3.BRANCH_ID <= 95) AND
(80 <= Q3.BRANCH_ID) AND
(Q3.BRANCH_ID = Q1.BRANCH_ID) AND
(Q3.TELLER_ID = Q2.TELLER_ID)
ORDER BY
Q3.ACCT_ID,
Q1.BRANCH_ID

Access Plan:
-----------
Total Cost: 3246.08
Query Degree:1
Cumulative Total Cost: 3246.08
Cumulative CPU Cost: 9.28511e+08
Cumulative I/O Cost: 3423.12
Cumulative Re-Total Cost: 2.5069
Cumulative Re-CPU Cost: 2.05446e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 3243.57
Estimated Bufferpool Buffers: 1868

Rows
RETURN
( 1)
Cost
I/O
|
82172.1
TBSCAN
( 2)
3246.08


3423.12
|
82172.1
SORT
( 3)
3243.57
3423.12
|
82172.1
^HSJOIN
( 4)
3198.25
3423.12
/-------------------+-------------------\
82172.1 16
^HSJOIN FETCH
( 5) ( 12)
3189.57 6.77909
3422.12 1
/----------+----------\ /----+----\
82172.1 1000 16 100
FETCH FETCH IXSCAN TABLE: OPT
( 6) ( 10) ( 13) BRANCH
3098.53 89.1049 0.0106189 Q1
3389.12 33 0
/---+----\ /---+----\ |
82172.1 513576 1000 1000 100
RIDSCN TABLE: OPT IXSCAN TABLE: OPT INDEX: INST411
( 7) HISTORY ( 11) TELLER BRANINDX
214.452 Q3 27.2812 Q2 Q1
147.964 4
| |
82172.1 1000
SORT INDEX: INST411
( 8) TELLINDX
214.452 Q2
147.964
|
82172.1
IXSCAN
( 9)
194.652
147.964
|
513576
INDEX: OPT
HISTIX1
Q3


Instructor notes:


Purpose — To look at the default access plan generated by the DB2 Optimizer for a three
table join SQL statement.
Details —
Additional information —
Transition statement — Next we will define an optimization profile to control a portion of
the access plan for this three table join.


Define a Statement guideline to request a Hash Join of two tables in a three table join

Profile 12 – Set global and a statement level optimization guideline to set Inner/Outer tables and request a Hash Join

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
  <OPTGUIDELINES>
    <QRYOPT VALUE="7" />
  </OPTGUIDELINES>
  <STMTPROFILE ID="Profile 12 History Query 4 Control one join HSJOIN only ">
    <STMTKEY SCHEMA="OPT">
      <![CDATA[
      SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
             H1.ACCT_ID, H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
      FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
      WHERE H1.TELLER_ID = T1.TELLER_ID
            AND H1.BRANCH_ID = B1.BRANCH_ID AND
            H1.BRANCH_ID BETWEEN 80 AND 95
      ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID
      ]]>
    </STMTKEY>
    <OPTGUIDELINES>
      <HSJOIN>
        <ACCESS TABLE="H1" />
        <TBSCAN TABLE="B1" />
      </HSJOIN>
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

• Profile selects a Hash Join for HISTORY and BRANCH
• Profile sets a table scan for BRANCH
• Sequence of tables used to determine the inner/outer tables
• Optimizer selects the access method for HISTORY and the join method for the third table

Figure 6-38. Define a Statement guideline to request a Hash Join of two tables in a three table join CL4636.0

Notes:
In our sample three-table join, the most costly join is the join between the HISTORY and
BRANCH tables. In the next sample optimization profile, we will specify that a Hash Join
method should be used to join these two tables. We will allow the optimizer to select any
method to access the HISTORY table based on lowest estimated cost, but force a table
scan to be used to access the BRANCH table.
The following guideline is used:
<OPTGUIDELINES>
<HSJOIN>
<ACCESS TABLE="H1" />
<TBSCAN TABLE="B1" />
</HSJOIN>
</OPTGUIDELINES>
The guideline uses the HSJOIN join request element to specify that a Hash join should be
used. Unlike the previous examples, the generic ACCESS request element allows the
optimizer to select any method to access the HISTORY table.


The sequence of the access request elements tells the optimizer to use the small BRANCH
table as the inner table for the join and the HISTORY table as the outer table for the join
processing. The optimizer will select the join method to join the TELLER to the composite
result of this two table join.
Here is the complete optimizer profile document.
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 12 History Query 4 Control one join HSJOIN only
">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID = B1.BRANCH_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID
]]>

</STMTKEY>
<OPTGUIDELINES>
<HSJOIN>
<ACCESS TABLE="H1" />
<TBSCAN TABLE="B1" />
</HSJOIN>

</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Instructor notes:
Purpose — To discuss implementing an optimization profile to control a portion of the join
processing when multiple tables are being joined. In most cases, you would target the
portion of the join processing that would require most of the processing time and resources.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated using this
optimization profile.



Three table join: Using Optimizer Profile 12

(Slide content: the db2exfmt access plan tree; the complete plan is reproduced in the Explain report below.)

• Access Plan: Total Cost: 3266; Cumulative I/O Cost: 3426
• Optimizer selects List Prefetch (HISTIX1) for the HISTORY table
• TELLER table access and join are not changed from the default plan

Figure 6-39. Three table join: Using Optimizer Profile 12 CL4636.0

Notes:
The visual shows a portion of the access plan generated using the optimization profile
listed on the previous slide. The access plan includes the Hash Join method with the
BRANCH table as the inner table and the HISTORY table as the outer table for the join,
which matches our optimization profile.
The optimizer used the table scan to access the BRANCH table to retrieve the sixteen
branch rows that would be used for matching the HISTORY rows based on the
BRANCH_ID range included in the predicate for the HISTORY table. The index HISTIX1,
based on the BRANCH_ID column, would be used to locate the HISTORY rows matching
the predicate in the SQL statement. The optimizer chose to use list prefetch to access the
HISTORY rows.
The optimizer did not change the portion of the access plan used to join this composite with
the TELLER table data. The generated plan uses a Hash join, with the TELLER table as
the inner table and uses an index scan to access the table.
The estimated cost for the revised access plan is 3266, which is slightly higher than the
default access plan. The estimated I/O is also slightly higher than the default plan.


Here is a portion of the Explain report generated using the PROFILE12 optimizer profile.
Database Context:
----------------
Parallelism: None
CPU Speed: 1.220223e-07
Comm Speed: 100
Buffer Pool size: 5020
Sort Heap size: 1004
Database Heap size: 4440
Lock List size: 2000
Maximum Lock List: 22
Average Applications: 1
Locks Available: 14080

Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE12
STMTPROF: (Statement Profile Name)
Profile 12 History Query 4 Control one join HSJOIN only

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE,
B1.BRANCH_NAME,
B1.AREA_CODE
FROM
HISTORY AS H1,
TELLER AS T1,
BRANCH AS B1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID = B1.BRANCH_ID AND


H1.BRANCH_ID BETWEEN 80 AND 95


ORDER BY
H1.ACCT_ID ASC,
B1.BRANCH_ID

Optimized Statement:
-------------------
SELECT
Q3.BRANCH_ID AS "BRANCH_ID",
Q2.TELLER_NAME AS "TELLER_NAME",
Q3.ACCTNAME AS "ACCTNAME",
Q3.ACCT_ID AS "ACCT_ID",
Q3.BALANCE AS "BALANCE",
Q1.BRANCH_NAME AS "BRANCH_NAME",
Q1.AREA_CODE AS "AREA_CODE",
Q1.BRANCH_ID
FROM
OPT.BRANCH AS Q1,
OPT.TELLER AS Q2,
OPT.HISTORY AS Q3
WHERE
(Q3.BRANCH_ID <= 95) AND
(80 <= Q3.BRANCH_ID) AND
(Q3.BRANCH_ID = Q1.BRANCH_ID) AND
(Q3.TELLER_ID = Q2.TELLER_ID)
ORDER BY
Q3.ACCT_ID,
Q1.BRANCH_ID

Access Plan:
-----------
Total Cost: 3266.4
Query Degree:1
Cumulative Total Cost: 3266.4
Cumulative CPU Cost: 9.28765e+08
Cumulative I/O Cost: 3426.12
Cumulative Re-Total Cost: 2.5069
Cumulative Re-CPU Cost: 2.05446e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 3263.9
Estimated Bufferpool Buffers: 1868

Rows
RETURN
( 1)
Cost
I/O
|
82172.1
TBSCAN
( 2)
3266.4
3426.12
|
82172.1


SORT
( 3)
3263.9
3426.12
|
82172.1
^HSJOIN
( 4)
3218.58
3426.12
/--------------+---------------\
82172.1 1000
^HSJOIN FETCH
( 5) ( 11)
3127.54 89.1049
3393.12 33
/---------+---------\ /---+----\
82172.1 16 1000 1000
FETCH TBSCAN IXSCAN TABLE: OPT
( 6) ( 10) ( 12) TELLER
3098.53 27.103 27.2812 Q2
3389.12 4 4
/---+----\ | |
82172.1 513576 100 1000
RIDSCN TABLE: OPT TABLE: OPT INDEX: INST411
( 7) HISTORY BRANCH TELLINDX
214.452 Q3 Q1 Q2
147.964
|
82172.1
SORT
( 8)
214.452
147.964
|
82172.1
IXSCAN
( 9)
194.652
147.964
|
513576
INDEX: OPT
HISTIX1
Q3


Instructor notes:


Purpose — To review the access plan in the Explain report generated using the
optimization profile that controls one of the two joins performed for our sample query.
Details —
Additional information —
Transition statement — Next we will look at using the optimization profile to control both
joins performed for our sample SQL statement.


Define a Statement guideline to control both joins for a three table join

Profile 13 – Set global and a statement level optimization guideline to control both join methods and the table access methods

<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="10.5.0.0">
  <OPTGUIDELINES>
    <QRYOPT VALUE="7" />
  </OPTGUIDELINES>
  <STMTPROFILE ID="Profile 13 History Query 4 Control Joins and Access methods ">
    <STMTKEY SCHEMA="OPT">
      <![CDATA[
      SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
             H1.ACCT_ID, H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
      FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
      WHERE H1.TELLER_ID = T1.TELLER_ID AND H1.BRANCH_ID = B1.BRANCH_ID AND
            H1.BRANCH_ID BETWEEN 80 AND 95
      ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID
      ]]>
    </STMTKEY>
    <OPTGUIDELINES>
      <MSJOIN>
        <HSJOIN>
          <IXSCAN TABLE="H1" />
          <TBSCAN TABLE="B1" />
        </HSJOIN>
        <IXSCAN TABLE='T1' />
      </MSJOIN>
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

• Profile selects a Hash Join for HISTORY and BRANCH (this is the outer of the Merge Join)
• Profile selects TELLER as the inner table for the Merge Join
• Sequence of tables used to determine the inner/outer tables
• Optimizer selects specific indexes for the index scans

Figure 6-40. Define a Statement guideline to control both joins for a three table join CL4636.0

Notes:
When necessary we can create an optimization profile that controls multiple join operations
for a particular SQL statement. As in the previous example, we will specify that a Hash Join
method should be used to join the HISTORY and BRANCH tables. This time we will use
the MSJOIN join request element to specify a Merge Scan join method for joining the
composite HISTORY and BRANCH data with the TELLER table being the inner table for
the join. We will also specify IXSCAN access request elements for the HISTORY and
TELLER tables but use a TBSCAN access request element for the BRANCH table. The
IXSCAN access request elements will not include INDEX attributes, so the optimizer will
select any available index based on lowest estimated cost.

The following guideline is used:


<OPTGUIDELINES>
<MSJOIN>
<HSJOIN>
<IXSCAN TABLE="H1" />
<TBSCAN TABLE="B1" />
</HSJOIN>
<IXSCAN TABLE='T1' />
</MSJOIN>
</OPTGUIDELINES>
The sequence of these request elements is important. The MSJOIN join request element
includes the HSJOIN join request element first, which means the Hash join of the
HISTORY and BRANCH data will form the outer portion of the merge join. The IXSCAN
access request then tells the optimizer that the index scan of the TELLER table will form
the inner table for the merge join. The HSJOIN join request element uses an IXSCAN
access request for indexed access of the HISTORY table and a TBSCAN access request
element to access the BRANCH table.
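Before any of these guidelines can take effect, the profile document has to be stored in the SYSTOOLS.OPT_PROFILES table and activated. The following is a minimal sketch of that workflow from the CLP; the file names profile13.del and profile13.xml are assumptions, with the profile cataloged as INST461.PROFILE13 to match the Explain reports shown in this unit.

-- Create the profile table once per database (if it does not already exist):
CALL SYSPROC.SYSINSTALLOBJECTS('OPT_PROFILES', 'C',
     CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128)));

-- Insert or replace the profile row; profile13.del is a DEL import file that
-- references the XML document profile13.xml through the LOBSINFILE modifier:
IMPORT FROM profile13.del OF DEL MODIFIED BY LOBSINFILE
    INSERT_UPDATE INTO SYSTOOLS.OPT_PROFILES;

-- Activate the profile for the current session and discard any cached copy:
SET CURRENT OPTIMIZATION PROFILE = 'INST461.PROFILE13';
FLUSH OPTIMIZATION PROFILE CACHE ALL;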


Here is the complete optimizer profile document.


<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 13 History Query 4 Control Joins and Access
methods ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID = B1.BRANCH_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID
]]>

</STMTKEY>
<OPTGUIDELINES>
<MSJOIN>
<HSJOIN>
<IXSCAN TABLE="H1" />
<TBSCAN TABLE="B1" />
</HSJOIN>
<IXSCAN TABLE='T1' />
</MSJOIN >

</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>

Instructor notes:


Purpose — To discuss this optimization profile which adds more controls for the generated
access plan. This shows how the sequence of several join methods can be specified in a
profile. This example contains more specific controls for the methods used to access the
tables. These examples are intended to show how to specify different types of control for
access plans; we are not suggesting that these types of changes would perform better than
the default access plan. It is still up to the administrator to determine what changes to make
to an access plan. We are just trying to provide good examples that will help to define those
changes in an optimization profile.
Details —
Additional information —
Transition statement — Next we will look at the access plan in an Explain tool report
generated using this more detailed optimization profile.


Two table join: Profile 13 sets Merge Join and Hash Joins for three table join

Access Plan: Total Cost: 30735
Cumulative I/O Cost: 4666
82172.1
^MSJOIN
( 4)
30687.3
4666.06
/------------+------------\
82172.1 1
TBSCAN FILTER
( 5) ( 11)
22201 89.1049
3392.93 33
| |
82172.1 1000
SORT FETCH
( 6) ( 12)
22198.5 89.1049
3392.93 33
| /---+----\
82172.1 1000 1000
^HSJOIN IXSCAN TABLE: OPT
( 7) ( 13) TELLER
22163 27.2812 Q2
3392.93 4
/---------+---------\ |
82172.1 16 1000
FETCH TBSCAN INDEX: INST411
( 8) ( 10) TELLINDX
22134 27.103 Q2
3388.93 4
/---+----\ |
82172.1 513576 100
IXSCAN TABLE: OPT TABLE: OPT
( 9) HISTORY BRANCH
194.652 Q3 Q1

Figure 6-41. Two table join: Profile 13 sets Merge Join and Hash Joins for three table join CL4636.0

Notes:
The visual shows a portion of the access plan generated using the optimization profile
listed on the previous slide that controls both join methods and also the access methods for
each of the three tables. The access plan includes the Hash Join method with the
BRANCH table as the inner table and the HISTORY table as the outer table for the join,
which matches our optimization profile. This composite result becomes the outer portion of
the Merge join operation with the TELLER table as the inner table.
The optimizer selected an index scan to access the HISTORY table. The index HISTIX1,
based on the BRANCH_ID column is used to locate the HISTORY rows matching the
predicate in the SQL statement. The optimizer chose not to use list prefetch to access the
HISTORY rows because the optimization profile used the IXSCAN access request
element.
The estimated total cost for the revised access plan is 30735, which is over nine times
higher than the default access plan. The estimated I/O is 4666 which is somewhat higher
than the default plan estimate. Much of the added cost comes from the indexed access to
the HISTORY table without the benefit of List prefetch. We could revise our optimization

profile to include the LPREFETCH access request element in place of the IXSCAN if
testing proved that the List prefetch would benefit execution performance.
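As a sketch, only the access request for the HISTORY table would change in such a revised guideline; the rest of the profile document would stay the same. The INDEX attribute is omitted here, as in the original, so the optimizer would still choose the index:

<OPTGUIDELINES>
<MSJOIN>
<HSJOIN>
<LPREFETCH TABLE="H1" />
<TBSCAN TABLE="B1" />
</HSJOIN>
<IXSCAN TABLE='T1' />
</MSJOIN>
</OPTGUIDELINES>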


Here is a portion of the Explain report generated using the PROFILE13 optimizer profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE13
STMTPROF: (Statement Profile Name)
Profile 13 History Query 4 Control Joins and Access methods

Original Statement:
------------------
SELECT
H1.BRANCH_ID,
T1.TELLER_NAME,
H1.ACCTNAME,
H1.ACCT_ID,
H1.BALANCE,
B1.BRANCH_NAME,
B1.AREA_CODE
FROM
HISTORY AS H1,
TELLER AS T1,
BRANCH AS B1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID = B1.BRANCH_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY
H1.ACCT_ID ASC,
B1.BRANCH_ID

Optimized Statement:
-------------------
SELECT
Q3.BRANCH_ID AS "BRANCH_ID",
Q2.TELLER_NAME AS "TELLER_NAME",
Q3.ACCTNAME AS "ACCTNAME",
Q3.ACCT_ID AS "ACCT_ID",

Q3.BALANCE AS "BALANCE",
Q1.BRANCH_NAME AS "BRANCH_NAME",
Q1.AREA_CODE AS "AREA_CODE",
Q1.BRANCH_ID
FROM
OPT.BRANCH AS Q1,
OPT.TELLER AS Q2,
OPT.HISTORY AS Q3
WHERE
(Q3.BRANCH_ID <= 95) AND
(80 <= Q3.BRANCH_ID) AND
(Q3.BRANCH_ID = Q1.BRANCH_ID) AND
(Q3.TELLER_ID = Q2.TELLER_ID)
ORDER BY
Q3.ACCT_ID,
Q1.BRANCH_ID

Access Plan:
-----------
Total Cost: 30735.2
Query Degree:1
Cumulative Total Cost: 30735.2
Cumulative CPU Cost: 1.04438e+09
Cumulative I/O Cost: 4666.06
Cumulative Re-Total Cost: 2.5069
Cumulative Re-CPU Cost: 2.05446e+07
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 30732.7
Estimated Bufferpool Buffers: 1868

Rows
RETURN
( 1)
Cost
I/O
|
82172.1
TBSCAN
( 2)
30735.2
4666.06
|
82172.1
SORT
( 3)
30732.7
4666.06
|
82172.1
^MSJOIN
( 4)
30687.3
4666.06
/------------+------------\
82172.1 1
TBSCAN FILTER


( 5) ( 11)
22201 89.1049
3392.93 33
| |
82172.1 1000
SORT FETCH
( 6) ( 12)
22198.5 89.1049
3392.93 33
| /---+----\
82172.1 1000 1000
^HSJOIN IXSCAN TABLE: OPT
( 7) ( 13) TELLER
22163 27.2812 Q2
3392.93 4
/---------+---------\ |
82172.1 16 1000
FETCH TBSCAN INDEX: INST411
( 8) ( 10) TELLINDX
22134 27.103 Q2
3388.93 4
/---+----\ |
82172.1 513576 100
IXSCAN TABLE: OPT TABLE: OPT
( 9) HISTORY BRANCH
194.652 Q3 Q1
147.964
|
513576
INDEX: OPT
HISTIX1
Q3

Instructor notes:


Purpose — To review the access plan from the Explain report generated using the
optimization profile that controlled the join methods and access method used for the three
table join.
Details —
Additional information —
Transition statement — Next we will look at an example of a zigzag join method that
could be included in an optimization profile.


A Zigzag Join can be included in an optimization profile starting with DB2 10.1
• Sample section from an optimization profile

<OPTGUIDELINES>
<ZZJOIN>
<IXSCAN TABLE='PERIOD'/>
<ACCESS TABLE='STORE'/>
<IXSCAN TABLE='SALES'/>
</ZZJOIN>
</OPTGUIDELINES>

• The guideline above requests a zigzag join.


– It specifies that PERIOD and STORE dimension tables are to be the first and
second legs, respectively, to the zigzag join operator
– SALES is the fact table to be accessed using an index scan on any multi-column
index on SALES
– The guideline does not specify how STORE is to be accessed, but it explicitly
requests that PERIOD be accessed using an index scan; exactly which index to
use is left up to the optimizer.

Figure 6-42. A Zigzag Join can be included in an optimization profile starting with DB2 10.1 CL4636.0

Notes:
The visual shows how a zigzag join could be defined in an optimization profile.
The sequence of the tables within the ZZJOIN element impacts the sequence for accessing
tables, with the last table being the fact table, ‘SALES’ in the example.
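A zigzag join applies to star-schema queries in which the fact table joins each dimension table with an equality predicate and a suitable multi-column index exists on the fact table. A hypothetical query shape that the guideline above could target (the column names are assumptions, since only the table names appear in the example):

SELECT S.STORE_ID, P.PERIOD_ID, SUM(F.AMOUNT) AS TOTAL_AMOUNT
FROM SALES AS F, STORE AS S, PERIOD AS P
WHERE F.STORE_ID = S.STORE_ID
AND F.PERIOD_ID = P.PERIOD_ID
GROUP BY S.STORE_ID, P.PERIOD_ID

For the IXSCAN request on SALES to be satisfied, a multi-column index covering the dimension join keys, such as a hypothetical index on SALES (STORE_ID, PERIOD_ID), would need to exist.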

Instructor notes:


Purpose — To show an example of the definition of a zigzag join method within an
optimization profile.
Details —
Additional information —
Transition statement — Next we will look at using optimization profiles for SQL
statements that include views rather than direct references to tables.


Implement a View to simplify the Application Join SQL
Original SQL Text:

SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID = B1.BRANCH_ID
AND H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID

Create a View:

create view optview1 as (
SELECT H1.BRANCH_ID as BRANCH_ID,
T1.TELLER_NAME AS TELLER_NAME ,
H1.ACCTNAME AS ACCTNAME , H1.ACCT_ID AS ACCT_ID ,
H1.BALANCE AS BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID BETWEEN 80 AND 95 )

Modified SQL:

SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME,
V1.ACCT_ID, V1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID

Figure 6-43. Implement a View to simplify the Application Join SQL CL4636.0

Notes:
The use of views rather than direct table references can somewhat complicate the
definition of optimization profiles, but there are several methods available to create the
table references in an optimization profile for SQL statements that use views.
The SQL statement with the three table join that we previously used has the following SQL
text:
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM HISTORY AS H1 , TELLER AS T1, BRANCH AS B1
WHERE H1.TELLER_ID = T1.TELLER_ID AND H1.BRANCH_ID = B1.BRANCH_ID AND
H1.BRANCH_ID BETWEEN 80 AND 95
ORDER BY H1.ACCT_ID ASC , B1.BRANCH_ID
We could create a view that would provide all the information needed from the HISTORY
and TELLER tables including the join predicate and the predicate that limits the results to a
specific range of bank branches.

The following DDL statement could be used to create the view called optview1.
create view optview1 as (
SELECT H1.BRANCH_ID as BRANCH_ID, T1.TELLER_NAME AS TELLER_NAME ,
H1.ACCTNAME AS ACCTNAME , H1.ACCT_ID AS ACCT_ID ,
H1.BALANCE AS BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID BETWEEN 80 AND 95 )
An application could then use the following SQL statement, simplified by the optview1 view,
to retrieve the same result as our original SQL statement.
SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME,
V1.ACCT_ID, V1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID


Instructor notes:
Purpose — To discuss how an administrator or application developer might need to create
optimization profiles based on SQL statements that use views rather than direct references
to tables. Here we show how a view might replace a portion of the sample SQL statement
text that we previously analyzed.
Details —
Additional information —
Transition statement — Next we will look at creating an optimization guideline for this
revised SQL statement.

An optimization guideline can reference tables based on the View definition
Application SQL:

SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME,
V1.ACCT_ID, V1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID ;

Optimization guideline:

<OPTGUIDELINES>
<TBSCAN TABLE="B1" />
<LPREFETCH TABLE="H1" INDEX='HISTIX3' />
</OPTGUIDELINES>

View Definition:

create view optview1 as (
SELECT H1.BRANCH_ID as BRANCH_ID,
T1.TELLER_NAME AS TELLER_NAME ,
H1.ACCTNAME AS ACCTNAME , H1.ACCT_ID AS ACCT_ID ,
H1.BALANCE AS BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID BETWEEN 80 AND 95 )

Figure 6-44. An optimization guideline can reference tables based on the View definition CL4636.0

Notes:
Now suppose that with this new SQL statement using the view optview1 we decide we
need to control the access plan generated with an optimization profile. We want to use a
table scan to access the BRANCH table and use the index HISTIX3 in a List Prefetch
access to the HISTORY table. The following optimization guideline could be defined.
<OPTGUIDELINES>
<TBSCAN TABLE="B1" />
<LPREFETCH TABLE="H1" INDEX='HISTIX3' />
</OPTGUIDELINES>
The TABLE attribute used for the TBSCAN access request element is "B1" which is the
exposed name for the BRANCH table in the application SQL statement. The TABLE
attribute for the LPREFETCH access request element is "H1", which matches the exposed
name of the HISTORY table in the definition of the optview1 view. This exposed name is
unambiguous within table references contained in the SQL statement and view definition.
The optimizer can interpret this guideline and use the HISTIX3 index in a List prefetch
operation to access the HISTORY table.


Instructor notes:
Purpose — To show how the exposed names for tables in a view definition can be used to
reference tables when creating an optimization guideline. This method works well when
you have control of the application SQL statement text and also control the view definitions.
Details —
Additional information —
Transition statement — Next we will see how we can use the optimized SQL text
generated by the optimizer to create table references for this same SQL statement.

The Optimized SQL Statement can be used to resolve table references for optimization guidelines
Original Statement:
------------------
SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME, V1.ACCT_ID, V1.BALANCE,
B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID

Optimized Statement:
-------------------
SELECT Q1.BRANCH_ID AS "BRANCH_ID", Q2.TELLER_NAME AS "TELLER_NAME",
Q1.ACCTNAME AS "ACCTNAME", Q1.ACCT_ID AS "ACCT_ID", Q1.BALANCE AS
"BALANCE", Q3.BRANCH_NAME AS "BRANCH_NAME", Q3.AREA_CODE AS
"AREA_CODE", Q3.BRANCH_ID
FROM INST411.HISTORY AS Q1, INST411.TELLER AS Q2, INST411.BRANCH AS Q3
WHERE (Q1.TELLER_ID = Q2.TELLER_ID) AND (80 <= Q1.BRANCH_ID) AND
(Q1.BRANCH_ID <= 95) AND (Q1.BRANCH_ID = Q3.BRANCH_ID)
ORDER BY Q1.ACCT_ID, Q3.BRANCH_ID

<OPTGUIDELINES>
<TBSCAN TABID="Q3" />
<LPREFETCH TABID="Q1" INDEX='HISTIX3' />
</OPTGUIDELINES>


Figure 6-45. The Optimized SQL Statement can be used to resolve table references for optimization guidelines CL4636.0

Notes:
In some cases, especially when working with more complex SQL statements using views, it
might be more difficult to make unambiguous references using the exposed table names.
In these cases, the optimized SQL statement, which is available in the db2exfmt Explain
reports, can be used with the TABID attribute for table references.
The visual shows the optimized SQL statement for our sample three table join as follows:
SELECT Q1.BRANCH_ID AS "BRANCH_ID", Q2.TELLER_NAME AS "TELLER_NAME",
Q1.ACCTNAME AS "ACCTNAME", Q1.ACCT_ID AS "ACCT_ID", Q1.BALANCE AS
"BALANCE", Q3.BRANCH_NAME AS "BRANCH_NAME", Q3.AREA_CODE AS
"AREA_CODE", Q3.BRANCH_ID
FROM INST411.HISTORY AS Q1, INST411.TELLER AS Q2, INST411.BRANCH AS Q3
WHERE (Q1.TELLER_ID = Q2.TELLER_ID) AND (80 <= Q1.BRANCH_ID) AND
(Q1.BRANCH_ID <= 95) AND (Q1.BRANCH_ID = Q3.BRANCH_ID)
ORDER BY Q1.ACCT_ID, Q3.BRANCH_ID
The optimized SQL statement assigned the correlation names of "Q1" to the HISTORY
table and "Q3" to the BRANCH table.


We can use these to create an optimization guideline as follows:


<OPTGUIDELINES>
<TBSCAN TABID="Q3" />
<LPREFETCH TABID="Q1" INDEX='HISTIX3' />
</OPTGUIDELINES>
If this optimization guideline was going to be used over a period of time for the application,
it would be important to check the optimized SQL statement text when new product
releases or fix levels were added to make sure the TABID attributes had not changed.
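As a reminder of where that optimized text comes from, here is a minimal sketch of capturing and formatting the Explain data from the CLP; the database name and output file name are placeholders:

SET CURRENT EXPLAIN MODE EXPLAIN;
SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME,
V1.ACCT_ID, V1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID ;
SET CURRENT EXPLAIN MODE NO;

Then, from the operating system command line:

db2exfmt -d <dbname> -1 -o plan.out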

Instructor notes:


Purpose — To discuss using the correlation names in the optimized statement with the
TABID attributes to reference a table in an optimization guideline.
Details —
Additional information — There is currently no guarantee that correlation names in the
optimized statement will be stable when upgrading to a new release of the DB2 product.
Transition statement — Next we will define an optimization profile that uses TABID
attributes to refer to tables when the SQL statement uses a view.


Define a Statement guideline that uses the TABID from the Optimized SQL text

Profile 15 – Set global and statement-level optimization guidelines to access tables based on the TABIDs in the optimized statement

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 15 History Query with View - Control Access methods - tabIDs ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME,
V1.ACCT_ID, V1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE
V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID
]]>
</STMTKEY>
<OPTGUIDELINES>
<TBSCAN TABID="Q3" />
<LPREFETCH TABID="Q1" INDEX='HISTIX3' />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>

• Profile uses TABIDs from the Optimized SQL text (db2exfmt)
• Use a table scan for BRANCH
• Access the HISTORY table using index HISTIX3 with List Prefetch
• Optimizer selects the other options and join methods

Figure 6-46. Define a Statement guideline that uses the TABID from the Optimized SQL text CL4636.0

Notes:
We can create an optimization profile using the view based SQL statement previously
discussed and the optimized SQL text generated by the DB2 Optimizer, so that we can
control the access methods used for two of the three tables. The optimizer will be selecting
the join methods and the access method for the third table based on estimated costs.

Here is the complete optimizer profile document.


<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 15 History Query with View - Control Access
methods - tabIDs ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT V1.BRANCH_ID, V1.TELLER_NAME , V1.ACCTNAME,
V1.ACCT_ID, V1.BALANCE, B1.BRANCH_NAME, B1.AREA_CODE
FROM OPTVIEW1 AS V1 , BRANCH AS B1
WHERE
V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY V1.ACCT_ID ASC , B1.BRANCH_ID
]]>

</STMTKEY>
<OPTGUIDELINES>
<TBSCAN TABID="Q3" />
<LPREFETCH TABID="Q1" INDEX='HISTIX3' />

</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>


Instructor notes:
Purpose — To review a sample optimization profile that uses the correlation names taken
from the optimized SQL statement to define access requests in an optimization profile.
Details —
Additional information —
Transition statement — Next we will look at the Explain tool report generated using the
optimization profile containing TABID attributes.

Two table join: Profile 15 uses TABIDs to specify table access methods for Join SQL with a View
Access Plan: Total Cost: 18495
Cumulative I/O Cost: 11304

Profile Information:
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE15
STMTPROF: (Statement Profile Name)
Profile 15 History Query with View - Control Access methods - tabIDs

107851 1000
^HSJOIN INDEX: INST411
( 5) TELLINDX
7789.23 Q2
8656.41
/---------+---------\
107851 21
FETCH TBSCAN
( 6) ( 10)
7759.62 27.1073
8652.41 4
/---+----\ |
107851 513576 100
RIDSCN TABLE: OPT TABLE: OPT
( 7) HISTORY BRANCH
278.449 Q1 Q3
200.543
|
107851
SORT
( 8)
278.449
200.543
|
107851
IXSCAN
( 9)
247.447
200.543
|
513576
INDEX: OPT
HISTIX3
Q1

Figure 6-47. Two table join: Profile 15 uses TABIDs to specify table access methods for Join SQL with a View CL4636.0

Notes:
The visual shows a portion of the access plan generated using the optimization profile
listed on the previous slide that controls the access methods for the BRANCH and
HISTORY tables for the sample SQL statement that uses the optview1 view.
The optimizer selected a Hash join for joining the tables HISTORY and BRANCH. A nested
loop join was selected to join the composite to the TELLER table. The HISTORY table will
be used as the outer table for the first hash join and an index scan of the TELLER table will
be used as the inner table for the nested loop join. The BRANCH table will be accessed
using a table scan as directed by the optimization guideline and used as the inner table for
the Hash join.


Here is a portion of the Explain report generated using the PROFILE15 optimizer profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROFILE15
STMTPROF: (Statement Profile Name)
Profile 15 History Query with View - Control Access methods - tabIDs

Original Statement:
------------------
SELECT
V1.BRANCH_ID,
V1.TELLER_NAME,
V1.ACCTNAME,
V1.ACCT_ID,
V1.BALANCE,
B1.BRANCH_NAME,
B1.AREA_CODE
FROM
OPTVIEW1 AS V1,
BRANCH AS B1
WHERE
V1.BRANCH_ID = B1.BRANCH_ID
ORDER BY
V1.ACCT_ID ASC,
B1.BRANCH_ID

Optimized Statement:
-------------------
SELECT
Q1.BRANCH_ID AS "BRANCH_ID",
Q2.TELLER_NAME AS "TELLER_NAME",
Q1.ACCTNAME AS "ACCTNAME",
Q1.ACCT_ID AS "ACCT_ID",
Q1.BALANCE AS "BALANCE",
Q3.BRANCH_NAME AS "BRANCH_NAME",
Q3.AREA_CODE AS "AREA_CODE",

Q3.BRANCH_ID
FROM
OPT.HISTORY AS Q1,
OPT.TELLER AS Q2,
OPT.BRANCH AS Q3
WHERE
(Q1.TELLER_ID = Q2.TELLER_ID) AND
(80 <= Q1.BRANCH_ID) AND
(Q1.BRANCH_ID <= 95) AND
(Q1.BRANCH_ID = Q3.BRANCH_ID)
ORDER BY
Q1.ACCT_ID,
Q3.BRANCH_ID

Access Plan:
-----------
Total Cost: 18495
Query Degree:1
Cumulative Total Cost: 18495
Cumulative CPU Cost: 4.67034e+09
Cumulative I/O Cost: 11304.4
Cumulative Re-Total Cost: 1571.81
Cumulative Re-CPU Cost: 3.41436e+09
Cumulative Re-I/O Cost: 1307
Cumulative First Row Cost: 16744.4
Estimated Bufferpool Buffers: 1995

Rows
RETURN
( 1)
Cost
I/O
|
107851
^NLJOIN
( 2)
18495
11304.4
/------+-------\
107851 1
TBSCAN FETCH
( 3) ( 11)
17880.1 13.5387
11270.4 2
| /---+----\
107851 1 1000
SORT IXSCAN TABLE: OPT
( 4) ( 12) TELLER
16693.2 6.77192 Q2
9963.41 1
| |
107851 1000
^HSJOIN INDEX: INST411
( 5) TELLINDX
7789.23 Q2
8656.41


/---------+---------\
107851 21
FETCH TBSCAN
( 6) ( 10)
7759.62 27.1073
8652.41 4
/---+----\ |
107851 513576 100
RIDSCN TABLE: OPT TABLE: OPT
( 7) HISTORY BRANCH
278.449 Q1 Q3
200.543
|
107851
SORT
( 8)
278.449
200.543
|
107851
IXSCAN
( 9)
247.447
200.543
|
513576
INDEX: OPT
HISTIX3
Q1

Notice that the correlation names that we used in the optimization profile are also shown in
the access plan section of the Explain report.

Instructor notes:


Purpose — To review the access plan generated for the View based SQL statement using
the optimization profile that used the correlation names from the optimized SQL statement.
Details —
Additional information —
Transition statement — Next we will look at an optimization profile that contains a query
rewrite guideline.


Define a Statement guideline that specifies the INLIST to Join Rewrite guideline

Profile 16 – Set a DB2 registry variable using a global guideline, and a statement-level optimization guideline that specifies a query rewrite rule

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<REGISTRY>
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='YES'/>
</REGISTRY>
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 16 History Query 6 - In List ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID IN (3,5,17,51,99)
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
]]>
</STMTKEY>
<OPTGUIDELINES>
<INLIST2JOIN TABLE='H1' />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>

• The global section of the profile sets a compiler registry variable that only affects applications using the profile
• The statement profile specifies that the Query Rewrite guideline that converts an IN LIST to a join should be used

Figure 6-48. Define a Statement guideline that specifies the INLIST to Join Rewrite guideline CL4636.0

Notes:
Query rewrite guidelines can be used to affect the transformations that are considered
during the query rewrite optimization phase, which transforms the original statement into a
semantically equivalent optimized statement.
The optimal execution plan for the optimized statement is determined during the plan
optimization phase. Consequently, query rewrite optimization guidelines can affect the
applicability of plan optimization guidelines.
Each query rewrite optimization guideline corresponds to one of the optimizer's query
transformation rules.
The following query transformation rules can be affected by query rewrite optimization
guidelines:
• IN-LIST-to-join
• Subquery-to-join
• NOT-EXISTS-subquery-to-antijoin

• NOT-IN-subquery-to-antijoin
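Each of these transformations has a corresponding guideline element: INLIST2JOIN, SUBQ2JOIN, NOTEX2AJ, and NOTIN2AJ. As a sketch, each element can also be enabled or disabled explicitly through its OPTION attribute, for example:

<OPTGUIDELINES>
<SUBQ2JOIN OPTION='ENABLE' />
</OPTGUIDELINES>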
The sample optimization profile shown in the visual contains an INLIST2JOIN query
rewrite guideline, defined within the statement profile.
The optimization profile example also contains a global guideline that sets the DB2 registry
variable DB2_REDUCED_OPTIMIZATION to a value of 'YES'.
The following SQL statement is defined in the profile. It contains the IN clause listing
several bank branches that should be included in the result.
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID IN (3,5,17,51,99)
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
The following optimization guideline uses the INLIST2JOIN query rewrite guideline at the
statement level and a registry setting at the global level.


In this case the query rewrite rule is limited to a single defined SQL statement and only
associated with access to the HISTORY table.
<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<REGISTRY>
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='YES'/>
</REGISTRY>

</OPTGUIDELINES>
<STMTPROFILE ID="Profile 16 History Query 6 - In List ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID IN (3,5,17,51,99)
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<INLIST2JOIN TABLE='H1' />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>
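To visualize what the INLIST2JOIN transformation requests, the rewritten statement behaves roughly like the following join, where the list of literals acts as a small generated table (it appears as SYSIBM.GENROW in the access plans that follow). This is an illustrative sketch of the semantics, not the exact internal form produced by the optimizer:

SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME,
H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1,
(VALUES (3),(5),(17),(51),(99)) AS V(BRANCH_ID)
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.BRANCH_ID = V.BRANCH_ID
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC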

Instructor notes:


Purpose — To show an example of an optimization guideline that specifies a query rewrite
guideline at the statement level.
Details —
Additional information —
Transition statement — First, we will look at the default access plan generated for the
sample query without any optimization profile in effect.


Access plan using default optimization


Access Plan: Total Cost: 5958
Cumulative I/O Cost: 6330

Predicate: H1.BRANCH_ID IN (3,5,17,51,99)

/--------------------+
25678.8
FETCH
( 5)
5857.1
6296.5
/---+----\
25678.8 513576
RIDSCN TABLE: OPT
( 6) HISTORY
339.387 Q4
48.7
+----------------+----------------+----------------+----------------+
5135.76 5135.76 5135.76 5135.76 5135.76
SORT SORT SORT SORT SORT
( 7) ( 9) ( 11) ( 13) ( 15)
67.8776 67.8776 67.8776 67.8776 67.8776
9.74 9.74 9.74 9.74 9.74
| | | | |
5135.76 5135.76 5135.76 5135.76 5135.76
IXSCAN IXSCAN IXSCAN IXSCAN IXSCAN
( 8) ( 10) ( 12) ( 14) ( 16)
66.9907 66.9907 66.9907 66.9907 66.9907
9.74 9.74 9.74 9.74 9.74
| | | | |
513576 513576 513576 513576 513576
INDEX: OPT INDEX: OPT INDEX: OPT INDEX: OPT INDEX: OPT
HISTIX1 HISTIX1 HISTIX1 HISTIX1 HISTIX1
Q4 Q4 Q4 Q4 Q4


Figure 6-49. Access plan using default optimization CL4636.0

Notes:
The visual shows a portion of the access plan generated by the optimizer without using any
optimization profile.
The SQL statement text was the following:
SELECT H1.BRANCH_ID, T1.TELLER_NAME , H1.ACCTNAME, H1.ACCT_ID, H1.BALANCE
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID
AND H1.BRANCH_ID IN (3,5,17,51,99)
ORDER BY H1.BRANCH_ID ASC, H1.ACCT_ID ASC
The access to the HISTORY table shows a series of five index scans, using the HISTIX1
index. Each index scan would retrieve the matching HISTORY rows for one of the values
listed in the IN LIST predicate. A list prefetch operation would be used to improve access
efficiency for each scan.

Instructor notes:


Purpose — To review the access plan generated to process an IN LIST predicate by the
optimizer without any optimization profile in effect. You might notice that each of the five
index scans is estimated to return one percent of the table rows. This is based on default
table statistics, because there are 100 bank branches, each single bank would be
estimated to have the same number of HISTORY transactions.
Details —
Additional information —
Transition statement — Next we will look at access plans for this same query using the
optimization profile with the query rewrite guideline.


Access plan using a profile with the INLIST to Join rewrite guideline

Plan generated with DB2 LUW 9.7:

10000
NLJOIN
( 5)
1073.7
985.818
/------+-------\
5 2000
TBSCAN FETCH
( 6) ( 7)
5.66813e-05 693.092
0 639.798
| /---+----\
5 2000 200000
TABFNC: SYSIBM IXSCAN TABLE: OPT
GENROW ( 8) HISTORY
Q1 8.39007 Q4
1
|
200000
INDEX: OPT
HISTIX1
Q4

Plan generated with DB2 LUW 10.5:

25678.8
HSJOIN
( 7)
8242.52
9177
/------+------\
513576 5
TBSCAN TBSCAN
( 8) ( 9)
8202.21 2.92853e-05
9177 0
| |
513576 5
TABLE: OPT TABFNC: SYSIBM
HISTORY GENROW
Q4 Q1


Figure 6-50. Access plan using a profile with the INLIST to Join rewrite guideline CL4636.0

Notes:
The visual shows two different portions of access plans generated using the sample profile
shown. They were produced using two different HISTORY tables and explained using two
different software levels of DB2 LUW.
The first join shown was produced using DB2 LUW 9.7. The INLIST set of values is
used as a small table that is the outer table for a nested loop join, where the HISTORY
table is accessed using an index scan. In each case the optimizer estimates that one
percent of the HISTORY rows will match each single BRANCH_ID value.
The second join shown was produced based on a larger HISTORY table using DB2
LUW 10.5. Here the INLIST is shown as the inner table of a Hash join, with the HISTORY
table being accessed using a table scan as the outer table of the join.
In both cases the INLIST appears as if it was a small table, with a name
SYSIBM.GENROW, that is joined to a base table.
The optimization profile also contained a global guideline that set a DB2 registry variable
named DB2_REDUCED_OPTIMIZATION.

There is a message in the diagnostic information section of the explain report that is
triggered by the DB2_REDUCED_OPTIMIZATION variable setting.
The following is the message text from the db2exfmt report.
Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details: EXP0039I Query complexity measure. Highest number
of joins in any query block: "2".
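For comparison, setting the same variable at the instance level with db2set would affect every application, not only those compiled with this profile; a sketch (most registry variables take effect only after an instance restart):

db2set DB2_REDUCED_OPTIMIZATION=YES
db2stop
db2start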


Instructor notes:
Purpose — To review two access plan sections generated using the optimization profile
containing a INLIST2JOIN query rewrite guideline.
Details —
Additional information —
Transition statement — Next we will look at using optimization profiles to affect the use of
Materialized Query Tables in generated access plans.


Evaluate several MQTs to reduce query costs


Application SQL Text:

SELECT H1.TELLER_ID, T1.TELLER_NAME ,
SUM(H1.BALANCE) AS TOTAL_BALANCE, COUNT(*) AS TRANS_COUNT
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY H1.TELLER_ID , T1.TELLER_NAME
ORDER BY H1.TELLER_ID ASC ;

The DB2 Design Advisor suggests the following MQT with a GROUP BY clause:

CREATE SUMMARY TABLE OPTMQT1
AS
(SELECT Q4.C0 AS "C0", Q4.C1 AS "C1", Q4.C2 AS "C2", Q4.C3 AS "C3"
FROM TABLE(SELECT Q3.C0 AS "C0", Q3.C1 AS "C1",
SUM(Q3.C2) AS "C2", COUNT(* ) AS "C3"
FROM TABLE(SELECT Q2.TELLER_ID AS "C0", Q1.TELLER_NAME
AS "C1", Q2.BALANCE AS "C2" FROM INST411.TELLER AS
Q1, OPT.HISTORY AS Q2 WHERE (Q2.TELLER_ID = Q1.TELLER_ID))
AS Q3 GROUP BY Q3.C1, Q3.C0) AS Q4)
DATA INITIALLY DEFERRED REFRESH DEFERRED IN USERSPACE1

Figure 6-51. Evaluate several MQTs to reduce query costs CL4636.0

Notes:
Next we will be looking at using an optimization profile to affect the evaluation and selection
of Materialized Query Tables (MQT) to reduce the processing costs for database requests.
We will use a SQL statement that produces a summary result based on data in the
HISTORY and TELLER tables. This report provides a summary of transactions for a
specific range of bank tellers. The SQL statement is as follows:
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
SUM(H1.BALANCE) AS TOTAL_BALANCE, COUNT(*) AS TRANS_COUNT
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY H1.TELLER_ID , T1.TELLER_NAME
ORDER BY H1.TELLER_ID ASC ;


The DB2 Design Advisor can be used to suggest implementation of Materialized Query
Tables that could reduce processing costs for a defined SQL workload.
The Design Advisor was used to generate the following MQT definition, which was
estimated to significantly reduce estimated costs for the query.
CREATE SUMMARY TABLE OPTMQT1
AS
(SELECT Q4.C0 AS "C0", Q4.C1 AS "C1", Q4.C2 AS "C2", Q4.C3 AS "C3"
FROM TABLE(SELECT Q3.C0 AS "C0", Q3.C1 AS "C1", SUM(Q3.C2) AS "C2",
COUNT(* ) AS "C3"
FROM TABLE(SELECT Q2.TELLER_ID AS "C0", Q1.TELLER_NAME
AS "C1", Q2.BALANCE AS "C2" FROM INST411.TELLER AS
Q1, INST411.HISTORY AS Q2 WHERE (Q2.TELLER_ID = Q1.TELLER_ID))
AS Q3 GROUP BY Q3.C1, Q3.C0) AS Q4)
DATA INITIALLY DEFERRED REFRESH DEFERRED IN USERSPACE1
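A hedged sketch of the kind of db2advis invocation that produces such a recommendation; the database name and workload file name are hypothetical:

db2advis -d SAMPLE -i workload.sql -m M -o mqt_recommendations.sql

Here -m M restricts the recommendations to MQTs and -o saves the generated DDL script to a file.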

Instructor notes:


Purpose — To introduce this section of the lecture where we will discuss using
optimization profiles to influence the selection of MQTs in access plans.
Details —
Additional information —
Transition statement — Next we will look at several other alternative Materialized Query
Tables that could also be used to improve performance for our sample query.


Create several alternative MQTs for testing


Application SQL Text:

SELECT H1.TELLER_ID, T1.TELLER_NAME ,
SUM(H1.BALANCE) AS TOTAL_BALANCE, COUNT(*) AS TRANS_COUNT
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY H1.TELLER_ID , T1.TELLER_NAME
ORDER BY H1.TELLER_ID ASC ;

MQT based on joined tables and a specific TELLER_ID data subset:

CREATE TABLE OPTMQT2 AS (
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
H1.BALANCE FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 1 AND 200 )
DATA INITIALLY DEFERRED REFRESH DEFERRED IN USERSPACE1 ;

MQT based on joined tables for all TELLER_ID values:

CREATE TABLE OPTMQT3 AS (
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
H1.BALANCE FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID )
DATA INITIALLY DEFERRED REFRESH DEFERRED IN USERSPACE1 ;

Figure 6-52. Create several alternative MQTs for testing CL4636.0

Notes:
The MQT definition suggested by the Design Advisor would likely be selected by the DB2
Optimizer and would be able to handle our sample query very efficiently. However, there might be a
number of other application queries that need tuning, are more detailed, and do not
contain the same GROUP BY clause, so this MQT could not be used for those requests.
We will implement two different MQT tables that would eliminate the join processing
overhead of using the TELLER and HISTORY tables, but would retain the information for
each transaction, not summarized by a GROUP BY clause.
One MQT will be defined to hold data for all one thousand bank tellers as follows:
CREATE TABLE OPTMQT3 AS (
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
H1.BALANCE FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID )
DATA INITIALLY DEFERRED REFRESH DEFERRED IN USERSPACE1

Another MQT will be defined to hold data for the specific subset of bank tellers, including
the range used by our sample summary query, those with a TELLER_ID between 1 and
200, which would be about 20% of the transactions.
CREATE TABLE OPTMQT2 AS (
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
H1.BALANCE FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 1 AND 200 )
DATA INITIALLY DEFERRED REFRESH DEFERRED IN USERSPACE1
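Because both MQTs are defined with DATA INITIALLY DEFERRED REFRESH DEFERRED, they are empty after creation. A minimal sketch of populating them and collecting statistics before any comparison testing:

REFRESH TABLE OPT.OPTMQT2;
REFRESH TABLE OPT.OPTMQT3;
RUNSTATS ON TABLE OPT.OPTMQT2 WITH DISTRIBUTION AND INDEXES ALL;
RUNSTATS ON TABLE OPT.OPTMQT3 WITH DISTRIBUTION AND INDEXES ALL;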


Instructor notes:
Purpose — To explain that we want to do some performance testing with several
alternative MQTs. The two MQTs shown here would reduce processing costs for our
sample query by joining the two tables, but would not be summarized using a GROUP BY
clause.
Details —
Additional information —
Transition statement — First, we will look at the default access plan generated by the
optimizer for our sample query, having these three MQT tables to choose from.


Default Access plan uses MQT with summary data


Access Plan: Total Cost: 95
Cumulative I/O Cost: 14

Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details: EXP0079W The following MQT was not used in the
final access plan, because the plan cost with this
MQT was more expensive or a better candidate was
available: "OPT "."OPTMQT2".
Diagnostic Identifier: 2
Diagnostic Details: EXP0079W The following MQT was not used in the
final access plan, because the plan cost with this
MQT was more expensive or a better candidate was
available: "OPT "."OPTMQT3".

Rows
RETURN
( 1)
Cost
I/O
|
101.126
TBSCAN
( 2)
95.1028
14
|
101.126
SORT
( 3)
95.1027
14
|
101.126
TBSCAN
( 4)
95.0931
14
|
1000
TABLE: OPT
OPTMQT1
Q1


Figure 6-53. Default Access plan uses MQT with summary data CL4636.0

Notes:
The visual shows the default access plan generated by the optimizer for our sample query.
The original SQL statement text was the following:
Original Statement:
------------------
SELECT H1.TELLER_ID, T1.TELLER_NAME , SUM(H1.BALANCE) AS TOTAL_BALANCE,
COUNT(*) AS TRANS_COUNT
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY H1.TELLER_ID , T1.TELLER_NAME
ORDER BY H1.TELLER_ID ASC


The optimized SQL text contained in the Explain report shows that the MQT suggested by
the Design Advisor was going to replace the use of our TELLER and HISTORY tables as
follows:
Optimized Statement:
-------------------
SELECT
Q1.C0 AS "TELLER_ID",
Q1.C1 AS "TELLER_NAME",
Q1.C2 AS "TOTAL_BALANCE",
Q1.C3 AS "TRANS_COUNT"
FROM
OPT.OPTMQT1 AS Q1
WHERE
(100 <= Q1.C0) AND
(Q1.C0 <= 200)
ORDER BY
Q1.C0
The extended diagnostic section of the Explain report shows that all three MQT tables were
considered to process this query, but OPTMQT1 was selected because its use resulted in a
lower estimated cost.

Here is a portion of the Explain report generated without any optimization profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 2
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Original Statement:
------------------
SELECT
H1.TELLER_ID,
T1.TELLER_NAME,
SUM(H1.BALANCE) AS TOTAL_BALANCE,
COUNT(*) AS TRANS_COUNT
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY
H1.TELLER_ID,
T1.TELLER_NAME
ORDER BY
H1.TELLER_ID ASC

Optimized Statement:
-------------------
SELECT
Q1.C0 AS "TELLER_ID",
Q1.C1 AS "TELLER_NAME",
Q1.C2 AS "TOTAL_BALANCE",
Q1.C3 AS "TRANS_COUNT"
FROM
OPT.OPTMQT1 AS Q1
WHERE
(100 <= Q1.C0) AND
(Q1.C0 <= 200)
ORDER BY
Q1.C0

Access Plan:
-----------


Total Cost: 95.1028


Query Degree:1
Cumulative Total Cost: 95.1028
Cumulative CPU Cost: 3.21927e+06
Cumulative I/O Cost: 14
Cumulative Re-Total Cost: 0.359533
Cumulative Re-CPU Cost: 2.94645e+06
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 95.1028
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
101.126
TBSCAN
( 2)
95.1028
14
|
101.126
SORT
( 3)
95.1027
14
|
101.126
TBSCAN
( 4)
95.0931
14
|
1000
TABLE: OPT
OPTMQT1
Q1

Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier: 1
Diagnostic Details: EXP0079W The following MQT was not used in the
final access plan, because the plan cost with this
MQT was more expensive or a better candidate was
available: "OPT "."OPTMQT2".
Diagnostic Identifier: 2
Diagnostic Details: EXP0079W The following MQT was not used in the
final access plan, because the plan cost with this
MQT was more expensive or a better candidate was
available: "OPT "."OPTMQT3".
Diagnostic Identifier: 3

Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "OPT "."OPTMQT1".
Diagnostic Identifier: 4
Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "OPT "."OPTMQT2".
Diagnostic Identifier: 5
Diagnostic Details: EXP0148W The following MQT or statistical view was
considered in query matching: "OPT "."OPTMQT3".
Diagnostic Identifier: 6
Diagnostic Details: EXP0149W The following MQT was used (from those
considered) in query matching: "OPT "."OPTMQT1".


Instructor notes:
Purpose — To show that, having all three MQTs available, the DB2 Optimizer would select
the one with the lowest estimated cost to perform our sample summary query.
Details —
Additional information —
Transition statement — Next we will look at using an optimization profile to affect the
selection of MQTs during SQL compilation.

Define a Statement guideline to specify the use of a specific MQT regardless of cost for one SQL statement

Profile 18 – Set a global MQT list for all statements, and a statement-level optimization guideline to force the use of one specific MQT

<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
<MQT NAME="OPT.OPTMQT2" />
<MQT NAME="OPT.OPTMQT3" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 8 History Query 4 - MQTs ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
SUM(H1.BALANCE) AS TOTAL_BALANCE, COUNT(*) AS TRANS_COUNT
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY H1.TELLER_ID , T1.TELLER_NAME
ORDER BY H1.TELLER_ID ASC
]]>
</STMTKEY>
<OPTGUIDELINES>
<MQTENFORCE NAME="OPT.OPTMQT3" />
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>

• Profile limits MQT evaluation for all statements to the two MQTs listed
• Profile forces OPTMQT3 usage for one SQL text regardless of plan cost

Figure 6-54. Define a Statement guideline to specify the use of a specific MQT regardless of cost for one SQL statement CL4636.0

Notes:
The MQTENFORCE element can be used as either a global or statement level optimization
guideline to override the optimizer's cost-based decision by forcing it to choose specific
MQTs.
We can create an optimization profile using the summary SQL statement previously
discussed to force the optimizer to select the MQT named OPTMQT3 for the access plan
for the SQL statement even though other access plans would have lower total cost.
We will include two MQT global optimization choice elements in the optimization profile and
add a statement profile matching our SQL text with the MQTENFORCE element.
This would limit the selection of MQTs for any non-matching SQL text used by the
application to the two MQT tables listed.


Here is the complete optimizer profile document.


<?xml version="1.0" encoding="UTF-8"?>

<OPTPROFILE VERSION="10.5.0.0">
<OPTGUIDELINES>
<QRYOPT VALUE="7" />
<MQT NAME="OPT.OPTMQT2" />
<MQT NAME="OPT.OPTMQT3" />
</OPTGUIDELINES>
<STMTPROFILE ID="Profile 8 History Query 4 - MQTs ">
<STMTKEY SCHEMA="OPT">
<![CDATA[
SELECT H1.TELLER_ID, T1.TELLER_NAME ,
SUM(H1.BALANCE) AS TOTAL_BALANCE, COUNT(*) AS TRANS_COUNT
FROM HISTORY AS H1 , TELLER AS T1
WHERE H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY H1.TELLER_ID , T1.TELLER_NAME
ORDER BY H1.TELLER_ID ASC
]]>

</STMTKEY>
<OPTGUIDELINES>
<MQTENFORCE NAME="OPT.OPTMQT3" />
</OPTGUIDELINES>
</STMTPROFILE>

</OPTPROFILE>

You might have an application that uses an optimization profile containing several
statement level guidelines. In that case, it could be important to limit the selection of one
specific MQT using the MQTENFORCE element to the single SQL statement where that
change in access plan is considered important.
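
To put a profile such as this into effect, the XML document must be stored in the
SYSTOOLS.OPT_PROFILE table and the profile must be named by the application, for
example through the CURRENT OPTIMIZATION PROFILE special register. Here is a
minimal sketch of one documented approach; the file names profile8.del and
profile8.xml are illustrative:

   -- profile8.del contains one row: "INST461","PROFILE8","profile8.xml"
   IMPORT FROM profile8.del OF DEL MODIFIED BY LOBSINFILE
      INSERT INTO SYSTOOLS.OPT_PROFILE;

   -- Name the profile for dynamic SQL in the current session
   SET CURRENT OPTIMIZATION PROFILE = 'INST461.PROFILE8';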

Note

Using MQT and MQTENFORCE elements in an optimization profile does not override the
standard rules that the DB2 Optimizer uses when selecting MQTs for an access plan.
For example, using a deferred refresh MQT still requires setting the REFRESH AGE
special register to ANY, and the optimizer will not perform any MQT matching when the
query optimization class is set to 0, 1 or 3.
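
For example, here is a hedged sketch of the session settings that allow a deferred
refresh MQT to be considered; the optimization class value shown is illustrative:

   -- Allow the optimizer to consider deferred refresh MQTs in this session
   SET CURRENT REFRESH AGE ANY;
   -- Use an optimization class at which MQT matching is performed
   SET CURRENT QUERY OPTIMIZATION 5;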

Instructor notes:


Purpose — To discuss using the MQTENFORCE element in an optimization profile to
force the optimizer to select a specific MQT for the access plan, even when other
alternative plans have lower estimated costs.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated using this
optimization profile with the MQTENFORCE optimization guideline when the SQL text does
not match the statement profile.


Access plan based on non-matching SQL text

Access Plan:
-----------
Total Cost:          1043
Cumulative I/O Cost: 954

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
          INST461.PROFILE8

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details:    EXP0052W The following MQT or statistical view was
                       not considered for rewrite matching because it did
                       not match with any MQTs specified in any
                       optimization profiles: "OPT "."OPTMQT1".
Diagnostic Identifier: 2
Diagnostic Details:    EXP0079W The following MQT was not used in the
                       final access plan, because the plan cost with this
                       MQT was more expensive or a better candidate was
                       available: "OPT "."OPTMQT3".

          51
        GRPBY
        (   2)
        1043.89
          954
          |
          51
        TBSCAN
        (   3)
        1043.89
          954
          |
          51
         SORT
        (   4)
        1043.89
          954
          |
         51.01
        pGRPBY
        (   5)
        1043.88
          954
          |
        21010.7
        TBSCAN
        (   6)
        1039.82
          954
          |
         95297
   TABLE: OPT
       OPTMQT2
          Q1

Figure 6-55. Access plan based on non-matching SQL text CL4636.0

Notes:
The visual shows a portion of the access plan generated by the optimizer using the
optimization profile that contains the statement level MQTENFORCE guideline and global
MQT guidelines when the SQL text does not match the statement guideline.
The generated access plan shows that the MQT named OPTMQT2 was chosen to produce
the query result. Because the SQL text did not match the statement profile, the
MQTENFORCE guideline did not apply; the optimizer chose on cost between the two MQTs
listed in the global guidelines. The estimated total cost for this access plan is 1043, which
is much higher than the estimated cost of the default plan. Most of this cost is the
additional I/O needed to scan the larger MQT required for the detailed data.

The Extended Diagnostic Information section of the Explain report shows the following
message indicating that the MQT named OPTMQT1 was excluded because it was not
listed in the optimization profile:
Diagnostic Identifier: 1
Diagnostic Details: EXP0052W The following MQT or statistical view was
not considered for rewrite matching because it did
not match with any MQTs specified in any
optimization profiles: "OPT "."OPTMQT1".

There is a diagnostic message explaining why the MQT named OPTMQT3 was not
selected for use.
Diagnostic Identifier: 2
Diagnostic Details: EXP0079W The following MQT was not used in the
final access plan, because the plan cost with this
MQT was more expensive or a better candidate was
available: "OPT "."OPTMQT3".

The Profile Information section of the Explain report shows the profile name but does
not include a statement profile name, since the SQL text did not match the statement profile.
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROF8
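
A report like the one shown here can be produced with the Explain facility and the
db2exfmt tool. A minimal sketch from the command line, assuming a database named
SAMPLE in which the explain tables have already been created:

   db2 connect to sample
   db2 "SET CURRENT OPTIMIZATION PROFILE = 'INST461.PROFILE8'"
   db2 set current explain mode explain
   db2 "SELECT H1.TELLER_ID, T1.TELLER_NAME, SUM(H1.BALANCE) AS TOTAL_BALANCE,
        COUNT(*) AS TRANS_COUNT FROM HISTORY AS H1, TELLER AS T1
        WHERE H1.TELLER_ID = T1.TELLER_ID AND H1.TELLER_ID BETWEEN 100 AND 150
        GROUP BY H1.TELLER_ID, T1.TELLER_NAME ORDER BY H1.TELLER_ID ASC"
   db2 set current explain mode no
   db2exfmt -d sample -1 -o profile8_plan.txt

With CURRENT EXPLAIN MODE set to EXPLAIN, the statement is compiled and captured in
the explain tables without being executed; db2exfmt -1 then formats the most recent
explain entry using default options.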


Here is a portion of the Explain report generated using this optimization profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROF8

Original Statement:
------------------
SELECT
H1.TELLER_ID,
T1.TELLER_NAME,
SUM(H1.BALANCE) AS TOTAL_BALANCE,
COUNT(*) AS TRANS_COUNT
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 150
GROUP BY
H1.TELLER_ID,
T1.TELLER_NAME
ORDER BY
H1.TELLER_ID ASC

Optimized Statement:
-------------------
SELECT
Q4.TELLER_ID AS "TELLER_ID",
Q4.TELLER_NAME AS "TELLER_NAME",
Q4.$C2 AS "TOTAL_BALANCE",
Q4.$C3 AS "TRANS_COUNT"
FROM
(SELECT
Q3.TELLER_ID,
Q3.TELLER_NAME,
SUM(Q3.BALANCE),

COUNT(*)
FROM
(SELECT
Q2.TELLER_ID,
Q1.TELLER_NAME,
Q2.BALANCE
FROM
OPT.TELLER AS Q1,
OPT.HISTORY AS Q2
WHERE
(Q2.TELLER_ID <= 150) AND
(100 <= Q2.TELLER_ID) AND
(Q2.TELLER_ID = Q1.TELLER_ID)
) AS Q3
GROUP BY
Q3.TELLER_NAME,
Q3.TELLER_ID
) AS Q4
ORDER BY
Q4.TELLER_ID

Access Plan:
-----------
Total Cost: 1043.89
Query Degree:1
Cumulative Total Cost: 1043.89
Cumulative CPU Cost: 1.62867e+09
Cumulative I/O Cost: 954
Cumulative Re-Total Cost: 198
Cumulative Re-CPU Cost: 1.62265e+09
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 1043.89
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
51
GRPBY
( 2)
1043.89
954
|
51
TBSCAN
( 3)
1043.89
954
|
51
SORT
( 4)
1043.89

954
|
51.01
pGRPBY
( 5)
1043.88
954
|
21010.7
TBSCAN
( 6)
1039.82
954
|
95297
TABLE: OPT
OPTMQT2
Q1

Instructor notes:


Purpose — To review the portions of the Explain report showing the access plan and
diagnostic messages generated when using the optimization profile with the
MQTENFORCE guideline.
Details —
Additional information —
Transition statement — Next we will look at the Explain report generated using the same
optimization profile with the MQTENFORCE optimization guideline when the SQL text does
match the statement profile.


Access plan based on matching SQL text

Access Plan:
-----------
Total Cost:          4385
Cumulative I/O Cost: 4733

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
          INST461.PROFILE8
STMTPROF: (Statement Profile Name)
          Profile 8 History Query 4 - MQTs

Extended Diagnostic Information:
--------------------------------
Diagnostic Identifier: 1
Diagnostic Details:    EXP0052W The following MQT or statistical view was
                       not considered for rewrite matching because it did
                       not match with any MQTs specified in any
                       optimization profiles: "OPT "."OPTMQT1".
Diagnostic Identifier: 2
Diagnostic Details:    EXP0079W The following MQT was not used in the
                       final access plan, because the plan cost with this
                       MQT was more expensive or a better candidate was
                       available: "OPT "."OPTMQT2".

         101
        GRPBY
        (   2)
        4385.82
         4733
          |
         101
        TBSCAN
        (   3)
        4385.81
         4733
          |
         101
         SORT
        (   4)
        4385.81
         4733
          |
        102.257
        pGRPBY
        (   5)
        4385.79
         4733
          |
        68140
        TBSCAN
        (   6)
        4372.66
         4733
          |
        472956
   TABLE: OPT
       OPTMQT3
          Q1

Figure 6-56. Access plan based on matching SQL text CL4636.0

Notes:
The visual shows a portion of the access plan generated by the optimizer using the
optimization profile that contains the statement level MQTENFORCE guideline and global
MQT guidelines when the SQL text does match the statement guideline.
The generated access plan shows that the MQT named OPTMQT3 was chosen to produce
the query result, based on the MQTENFORCE guideline in our optimization profile. The
estimated total cost for this access plan is 4385, which is higher than the estimated costs
using the other MQT tables. Most of this cost is the additional I/O needed to scan the
larger MQT, which contains data for all TELLER_ID values.

The Extended Diagnostic Information section of the Explain report shows the following
message indicating that the MQT named OPTMQT1 was excluded because it was not
listed in the optimization profile:
Diagnostic Identifier: 1
Diagnostic Details: EXP0052W The following MQT or statistical view was
not considered for rewrite matching because it did
not match with any MQTs specified in any
optimization profiles: "OPT "."OPTMQT1".

There is a diagnostic message explaining why the MQT named OPTMQT2 was not
selected for use. The message implies the MQT was not selected based on cost, but in this
example, the MQTENFORCE guideline was the reason it was bypassed.
Diagnostic Identifier: 2
Diagnostic Details: EXP0079W The following MQT was not used in the
final access plan, because the plan cost with this
MQT was more expensive or a better candidate was
available: "OPT "."OPTMQT2".

The Profile Information section of the Explain report shows the profile name and also the
ID of the statement profile that matched the SQL text.
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROF8
STMTPROF: (Statement Profile Name)
Profile 8 History Query 4 - MQTs


Here is a portion of the Explain report generated using this optimization profile.
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 7
Blocking: Block All Cursors
Isolation Level: Cursor Stability

---------------- STATEMENT 1 SECTION 201 ----------------


QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1

Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
INST461.PROF8
STMTPROF: (Statement Profile Name)
Profile 8 History Query 4 - MQTs

Original Statement:
------------------
SELECT
H1.TELLER_ID,
T1.TELLER_NAME,
SUM(H1.BALANCE) AS TOTAL_BALANCE,
COUNT(*) AS TRANS_COUNT
FROM
HISTORY AS H1,
TELLER AS T1
WHERE
H1.TELLER_ID = T1.TELLER_ID AND
H1.TELLER_ID BETWEEN 100 AND 200
GROUP BY
H1.TELLER_ID,
T1.TELLER_NAME
ORDER BY
H1.TELLER_ID ASC

Optimized Statement:
-------------------
SELECT
Q4.TELLER_ID AS "TELLER_ID",
Q4.TELLER_NAME AS "TELLER_NAME",
Q4.$C2 AS "TOTAL_BALANCE",
Q4.$C3 AS "TRANS_COUNT"
FROM
(SELECT
Q3.TELLER_ID,

Q3.TELLER_NAME,
SUM(Q3.BALANCE),
COUNT(*)
FROM
(SELECT
Q2.TELLER_ID,
Q1.TELLER_NAME,
Q2.BALANCE
FROM
OPT.TELLER AS Q1,
OPT.HISTORY AS Q2
WHERE
(Q2.TELLER_ID <= 200) AND
(100 <= Q2.TELLER_ID) AND
(Q2.TELLER_ID = Q1.TELLER_ID)
) AS Q3
GROUP BY
Q3.TELLER_NAME,
Q3.TELLER_ID
) AS Q4
ORDER BY
Q4.TELLER_ID

Access Plan:
-----------
Total Cost: 4385.82
Query Degree:1
Cumulative Total Cost: 4385.82
Cumulative CPU Cost: 1.76447e+09
Cumulative I/O Cost: 4733
Cumulative Re-Total Cost: 211.712
Cumulative Re-CPU Cost: 1.73503e+09
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 4385.81
Estimated Bufferpool Buffers: 0

Rows
RETURN
( 1)
Cost
I/O
|
101
GRPBY
( 2)
4385.82
4733
|
101
TBSCAN
( 3)
4385.81
4733
|
101
SORT

( 4)
4385.81
4733
|
102.257
pGRPBY
( 5)
4385.79
4733
|
68140
TBSCAN
( 6)
4372.66
4733
|
472956
TABLE: OPT
OPTMQT3
Q1

Instructor notes:


Purpose — To review the portions of the Explain report showing the access plan and
diagnostic messages generated when using the optimization profile with the
MQTENFORCE guideline.
Details —
Additional information —
Transition statement — Next we will review some suggestions for successful
implementation of optimization profiles.


Suggestions for success with Optimizer profiles


• Do NOT wait until you have a performance crisis to implement your first
optimizer profile
• Start with simple examples to develop working samples
• Use the ID attribute of the STMTPROFILE to verify the use of a profile in
the db2exfmt explain tool report
• Pick a profile name that can easily be linked back to the profile XML
document for analysis
• You can use an optimizer profile to help understand the optimizer’s view
of resource costs for the access plan that you think would be efficient
• An optimizer profile may be the quickest way to test a possible solution
to a performance problem when standard analysis may take too long
• Remember to flush the profile cache when making changes to existing
profiles
• You might find an XML editor helpful in locating any format errors in the
XML document used for the optimization profile

Figure 6-57. Suggestions for success with Optimizer profiles CL4636.0

Notes:
Here are some suggestions for successful implementation of optimization profiles to control
the access plans generated by the DB2 Optimizer.
There is a learning curve involved in implementing the first optimization profiles for an
application system. If you wait until a performance problem is encountered to use an
optimization profile to quickly fix a problem access plan, the pressure of that situation will
make the simple mistakes that you are likely to make in setting up your profile much more
difficult to find and fix.
It will help to gain some confidence working with some simple SQL statements that have
less complex access plans. Once you have a set of working examples of optimization
profiles, it will be easier to create more complex profiles based on these examples.
It can be helpful to change the ID attribute in the statement profile each time that you make
a change to the optimization profile. This ID will appear in the Explain report in the Profile
Information section and it will make it clear that the new version of the optimization profile
was actually used. If you are testing a new optimization profile, making a series of changes
to get a particular access plan, you might forget that the previous version of the profile

might be cached in memory and the updated version of the profile was not really used.
Changing the ID can help avoid confusion and save time.
You assign the profile a name and schema when it is inserted into the
SYSTOOLS.OPT_PROFILE table. Select the profile name and schema so that you can
easily match it with the XML document that defines the profile. The Explain report only
shows the profile name and schema.
One useful way to make use of an optimization profile is to force the optimizer to select a
particular index or join method that you expected to be used and then review the Explain
report to see why that technique was considered to have a higher total cost. That could
help resolve the performance problem in a way that ultimately makes an optimization
profile unnecessary.
Using an optimization profile might be a quicker method of resolving a performance
problem when limited time is available. Other methods of altering an access plan can take
a considerable amount of time and involve some trial and error. In many cases, a
performance problem might be rooted in outdated or insufficient table or index statistics,
but it could take hours of analysis trying to collect the necessary statistics and review the
Explain reports to see if a better access plan would be selected. The optimization profile
has the benefit of providing more direct controls over the access plan selected and can be
implemented in a very targeted manner so that only specific application SQL statements
are impacted.
If you are testing the impact of different optimization profiles you can assign a new unique
schema and name to a profile to avoid matching any previous profile that might be cached
in database memory. If you need to change a profile that is currently being used by
applications, you might need to keep the same profile name. You can use the FLUSH
OPTIMIZATION PROFILE CACHE statement to clear the previous version of a profile from
memory of an active database and allow the new version to take effect.
The diagnostic messages in the Explain report can point to the area of the XML document
used for the optimization profile where a problem is detected. You might find that an XML
editor is helpful for locating format problems in the optimization profile document when a
message indicates that your XML document is not well formed.
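
As a sketch of the cache flush mentioned above, using the profile name from this
unit's examples:

   -- Clear one cached profile so the next compilation reads the updated XML
   FLUSH OPTIMIZATION PROFILE CACHE INST461.PROFILE8;
   -- Or clear every optimization profile cached in the active database
   FLUSH OPTIMIZATION PROFILE CACHE ALL;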


Instructor notes:
Purpose — To discuss a group of suggestions that might help developers or administrators
implementing optimization profiles to avoid problems and achieve the required application
performance results that are needed.
Details —
Additional information —
Transition statement —


Unit summary
Having completed this unit, you should be able to:
• Create an optimizer profile to control access plans generated for SQL
statements
• Define Global guidelines in an optimizer profile that impact all SQL
statements for an application
• Select a specific index to access a table for one SQL query in an
optimizer profile
• Specify the join methods used to join tables using an optimizer profile
• Describe the techniques to refer to tables within a profile when the
application utilizes views rather than direct table names
• Use the DB2 Explain tool to verify the access plan created based on an
optimizer profile and to resolve any profile format problems
• Create an optimizer profile to control use of Materialized Query Tables in
the generated access plan


Figure 6-58. Unit summary CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —


Student exercise 5


Figure 6-59. Student Exercise 5 CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —

Unit 7. Table Partitioning

Estimated time
02:30

What this unit is about


This unit describes range-based table partitioning. Some common
alternatives for managing large amounts of data in DB2 will be briefly
reviewed, including database partitioning, multidimensional clustering
and use of UNION ALL views. The basic concepts and definition of
range-based table partitioning will be presented. We will describe use
of multiple table spaces for the data, indexes and large object portions
of range-partitioned tables. The use of the new ALTER TABLE options
ATTACH, DETACH and ADD to support data roll-in and roll-out for
partitioned tables will be covered. The impact of partitioned and
non-partitioned indexes will be explained. The functions performed by
SET INTEGRITY for range-partitioned tables will be presented. One
section of the presentation will provide suggestions for selection of
table partitioning, MDC and database partitioning and when it might be
beneficial to combine several of these features for an application.

What you should be able to do


After completing this unit, you should be able to:
• Describe the alternative options for handling data roll-in and roll-out
including database partitioning, Multi-Dimensional Clustering
(MDC) and UNION ALL views
• Describe the basic concepts for range-based table partitioning,
including partitioned and non-partitioned indexing and multiple
table spaces
• Define the data partition ranges for a table using the short and long
form syntax
• List the steps used for data roll-in and roll-out for table partitioning,
including ATTACH, DETACH and ADD for data partitions
• Plan the use of online SET INTEGRITY as part of the roll-in and
roll-out processing for range-partitioned tables
• Describe the maintenance for refresh immediate materialized
query tables (MQT) when used with table partitioning


• Select between table partitioning, MDC, and database partitioning depending
  on the application and data characteristics


Unit objectives
• Describe the alternative options for handling data roll-in and roll-out
including database partitioning, Multi-Dimensional Clustering (MDC)
and UNION ALL views
• Describe the basic concepts for range-based table partitioning,
including partitioned and non-partitioned indexing and multiple table
spaces
• Define the data partition ranges for a table using the short and long form
syntax
• List the steps used for data roll-in and roll-out for table partitioning,
including ATTACH, DETACH and ADD for data partitions
• Plan the use of SET INTEGRITY as part of the roll-in and roll-out
processing for range-partitioned tables
• Describe the maintenance for refresh immediate materialized query
tables (MQT) when used with table partitioning
• Select between table partitioning, MDC clustering, and database
partitioning depending on the application and data characteristics

Figure 7-1. Unit objectives CL4636.0

Notes:
Here are the objectives for this unit.


Instructor notes:
Purpose — Identify the objectives of this unit.
Details — State the objectives.
Additional information —
Transition statement —

7.1. Table Partitioning

Instructor topic introduction


What students will do —
How students will do it —
What students will learn —
How this will help students on their job —


Large table design alternatives

• Table Design issues:
  – Very Large Tables
  – High Availability
  – Scalability
  – Efficient access plans for complex queries
  – Efficient Load (Roll-in and Roll-out)

(Diagram: the design alternatives: Multiple Database Partitions,
Multi-Dimensional Clustering (MDC), UNION ALL Views, and Range Partitioned
Tables.)


Figure 7-2. Large table design alternatives CL4636.0

Notes:
Database Administrators and Applications Developers are confronted with increasingly
demanding systems with:
• Very Large Tables – As the price of disk storage continues to decrease, the demand to
store more information has increased.
• High Availability – Many systems provide 24 x 7 access to data and are required to
keep planned outages and scheduled maintenance as short as possible.
• Scalability – Databases need to handle growth in a way that allows system resources
to be added to maintain performance in a cost-effective manner.
• Efficient access plans for complex queries – Application users expect to be able to
perform analysis on large portions of very large tables with high performance
characteristics.
• Efficient Load (Roll-in and Roll-out) – Many applications need to add and remove
large amounts of data on a periodic basis, possibly weekly or monthly, adding to

existing tables in a manner that minimizes the time required to add the data and also
minimizes any loss of application access to the data.
With DB2 LUW 10, Database Administrators have a number of tools and techniques
available to address these requirements including:
• Database Partitioning
• Multi-Dimensional Clustering (MDC) tables
• UNION ALL views
Beginning with DB2 9 for Linux, UNIX and Windows, range-based table partitioning
adds a significant new tool to this list.


Instructor notes:
Purpose — To introduce the application characteristics that might utilize the table
partitioning functions in DB2 10 and list the options that were available in prior releases of
DB2 LUW to deal with these issues. Do not try to explain the use of these areas in any
detail; we will be covering these options at an overview level to make sure the terminology
is understood by students.
Details —
Additional information —
Transition statement — Let's take a quick look at some of these features.


Database partitioning

• Database partitioning distributes rows based on a system
  hashing algorithm

CREATE TABLE ORDERS
  (ordernum int not null,
   orderdate date not null…… )
  DISTRIBUTE BY HASH (ordernum) ….

(Diagram: the DB2 hash routine spreads the Orders table rows across four
database partitions, Partitions 1 and 2 on Host 1 and Partitions 3 and 4 on
Host 2; each partition holds its own portion of the Orders table data and
indexes in table space TS1.)

db2nodes.cfg:
  1 HOST1 0
  2 HOST1 1
  3 HOST2 0
  4 HOST2 1

Up to 1000 Logical Database Partitions



Figure 7-3. Database partitioning CL4636.0

Notes:
The use of Database Partitions to handle very large tables has been available for the DB2
for Linux, UNIX and Windows environment for a number of years. This database option is
currently provided by the InfoSphere Balanced Warehouse offering. This feature allows a
large table to be automatically divided into manageable subsets. DPF utilizes a shared
nothing architecture that allows the data to be spread across multiple database servers
with up to 1000 logical database partitions.
The decision to use the DB2 Database Partitioning is not generally made based purely on
data volume, but more on the basis of the workload. As a general guideline, most
partitioned database deployments are in the area of data warehousing and business
intelligence. DB2 Database partitioning is highly recommended for large complex query
environments, because its shared-nothing architecture allows for outstanding scalability.
In a DB2 Partitioned Database, a system hashing algorithm distributes the rows across the
logical database partitions in an effort to balance the number of rows stored on each as
evenly as possible. This provides a high degree of parallel processing when accessing the
large tables, each database partition performing a portion of the query in parallel to reduce


query execution time. Each table would have one or more columns defined as the
partitioning key that would be the input to the hashing routine and would determine each
row's location.
In the following example, the DISTRIBUTE BY HASH clause designates the column named
ordernum as the partitioning key for the orders table:
CREATE TABLE ORDERS
(ordernum int not null,
orderdate date not null…… )
DISTRIBUTE BY HASH (ordernum) ….
In a DB2 Partitioned database, the indexes on each table are divided across the partitions
so that all the index entries for each row are stored on the same database partition as the
row itself.
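
A distribution key should spread rows evenly; a skewed key leaves some partitions much
larger than others. Here is a hedged sketch of checking the distribution for the ORDERS
table above with the built-in DBPARTITIONNUM function:

   -- Count the rows stored on each database partition to check for skew
   SELECT DBPARTITIONNUM(ordernum) AS db_partition, COUNT(*) AS row_count
   FROM orders
   GROUP BY DBPARTITIONNUM(ordernum)
   ORDER BY db_partition;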

Instructor notes:


Purpose — This should be used to provide a very basic overview of Database partitioning
available for DB2 LUW databases. One of the key drivers for using Database Partitioning
had been the limited addressing available for DB2 tables in a single partition database, the
four byte address for rows allowed 64 GB of data to be stored with 4K pages and 512 GB of
data with 32K pages. Starting with DB2 9 a six-byte addressing capability is associated
with LARGE tablespaces. The db2nodes.cfg file defines the configuration of a DB2
Partitioned Database at the DB2 instance level and determines how many database
partitions are in use and which system each database partition runs on.
Details —
Additional information —
Transition statement — Let's look at the concepts involving with Multi-Dimensional
Clustering tables.


Multi Dimensional Clustering table

MDC stores rows in blocks based on defined dimension columns

CREATE TABLE ORDERS
  (ordernum int not null, orderdate date not null, partnum int not null ..…… )
  ORGANIZE BY DIMENSIONS (orderdate, partnum) ….

(Diagram: the Orders table in table space TS1 stores one block per combination
of dimension values, for example 07/20/06 Part 100, 07/20/06 Part 101, and
07/20/06 Part 102, with two blocks for 07/21/06 Part 100 and one for
07/22/06 Part 100. Block indexes are defined on the dimensions orderdate and
partnum; row indexes, such as one on ordernum, point to individual rows.)


Figure 7-4. Multi Dimensional Clustering table CL4636.0

Notes:
Multi-Dimensional Clustering (MDC) tables were introduced in DB2 LUW Version 8 and
provided some powerful features to address application requirements.
MDC tables store the table's rows in blocks based on the values in the columns defined as
the dimensions in the ORGANIZE BY DIMENSIONS clause when the table is created.
In the example below a table is defined with two dimensions, the columns orderdate and
partnum, is created.
CREATE TABLE ORDERS
(ordernum int not null, orderdate date not null, partnum int not null ..…… )
ORGANIZE BY DIMENSIONS (orderdate, partnum) ….
For MDC tables, DB2 uses the tablespace extent size as the block size for the table. Only
rows with the same values for the dimension columns will be stored in each block.
In the example, rows with the same orderdate and partnum values would be stored in the
same block. So one block might contain all the orders for partnum 101 on orderdate
7/20/2006, while another block could hold the orders for partnum 102 with the same orderdate.

The example shows two blocks needed to hold the rows for partnum 100 on orderdate
7/21/2006. Some portion of each block might be used if there are not many rows with that
combination of dimension values. There can be many blocks for a given combination of
dimension values.
This method of storing MDC rows eliminates the need to reorganize the table to recluster
the rows in a clustered sequence, which improves the availability of MDC tables.
Selection of appropriate block size and dimension columns are critical for MDC to perform
well and be stored efficiently. In some cases, generated column values might be used to
handle tables where the column values are too unique to efficiently fill the data blocks.
One of the unique advantages of using MDC tables is the block level indexing for the
dimension columns. These block level indexes support very efficient access to large
subsets of large tables; each index entry causes a block of matching rows to be prefetched
from disk.
Another feature of MDC tables supports efficient roll-in and roll-out of data using the
LOAD utility. The blocks freed when a SQL DELETE is used to remove a large group of
rows can be reused for the roll-in of new data by the LOAD utility.
The free blocks can also be released very efficiently using the RECLAIM EXTENTS ONLY
option of the REORG utility.


Instructor notes:
Purpose — This provides a simple example of using a MDC table. Do not try to go into too
much detail. Point out the combination of Block and Row level indexes. The graphic
assumes varied numbers of rows with common values for the partnum and orderdate
columns. The block indexes only require one index entry for each block, while row indexes
point to each row and can be hundreds of times larger. In order to support large tables, an
MDC table could be combined with Database partitioning.
portion would be stored in a single table space and therefore would be limited in size in a
single partition database.
Details —
Additional information —
Transition statement — Let's take a quick look at using UNION ALL views to support
large tables.


Using UNION ALL views

UNION ALL view combines multiple physical tables into one
logical table

CREATE VIEW ORDERS
AS (
  (Select * from orders_Q12010 )
  UNION ALL
  (Select * from orders_Q22010 )
  UNION ALL
  (Select * from orders_Q32010)
  UNION ALL
  (Select * from orders_Q42010 ) )

(Diagram: each table, ORDERS_Q12010 through ORDERS_Q42010, has its own
indexes and resides in its own table space, TS1 through TS4.)


Figure 7-5. Using UNION ALL views CL4636.0

Notes:
Another approach to providing applications access to very large tables is the use of UNION
ALL views. A group of normal tables would be created to store distinct subsets of the
application's data. Each table would have a Check Constraint defined to limit the table's
contents, like a range of dates or any other important grouping used by the application.
A VIEW would be created that includes a 'SELECT * FROM ' for each table with a UNION
ALL clause between each SELECT. For example:
CREATE VIEW ORDERS
AS (
(Select * from orders_Q12010 )
UNION ALL
(Select * from orders_Q22010 )
UNION ALL
(Select * from orders_Q32010)
UNION ALL
(Select * from orders_Q42010 ) )


Each table could be created in its own table space. This would allow the application to
access a logical table that was not limited in storage by the DB2 limits for each table and
table space.
The DB2 Utilities like LOAD, RUNSTATS, and REORG would operate at the individual
table level, but SQL INSERT, UPDATE, DELETE and SELECT can use the view to give the
appearance of a single large table.
With UNION ALL views, the DB2 Optimizer can use the defined Check Constraints to
generate access plans that would only access the underlying tables that were needed
based on the SQL statement's predicates. In the example, each table would have a check
constraint defined, like ORDERDATE BETWEEN '1/1/2010' AND '3/31/2010', for the table
ORDERS_Q12010. A query with a predicate of ORDERDATE BETWEEN '5/1/2010' AND
'5/31/2010' that accessed the UNION ALL view would only need to read data from the table
ORDERS_Q22010.
In order to support roll-in and roll-out, the new set of data could be loaded into a new table
with its own indexes and the UNION ALL view would be dropped and recreated to add the
new table or remove a table that is no longer needed.
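
Here is a minimal sketch of that roll-in, assuming the table names used above and a
hypothetical table for the next quarter:

   -- Create and constrain a member table for the new quarter
   CREATE TABLE orders_q12011 LIKE orders_q42010 IN ts5;
   ALTER TABLE orders_q12011
     ADD CONSTRAINT orders_q12011_range
     CHECK (orderdate BETWEEN '2011-01-01' AND '2011-03-31');
   -- After loading orders_q12011, redefine the view over the current members
   DROP VIEW orders;
   CREATE VIEW orders AS (
     (SELECT * FROM orders_q22010)
     UNION ALL
     (SELECT * FROM orders_q32010)
     UNION ALL
     (SELECT * FROM orders_q42010)
     UNION ALL
     (SELECT * FROM orders_q12011) );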

Instructor notes:


Purpose — This shows a simple example of using a UNION ALL view to support access to
larger tables. This option is the one most directly comparable to the table partitioning in
DB2 LUW. A key difference is that the DB2 utilities like LOAD, RUNSTATS and REORG will
be operating at the Partitioned Table level rather than at the individual table level for the
tables that make up a UNION ALL view.
Details —
Additional information —
Transition statement — Let's look at using materialized query tables or MQTs to improve
the performance of queries that access large tables.


Basics of Materialized Query Tables


• A physical table containing precomputed results of a query
• Maintained by the database or user maintained
• Example:
CREATE TABLE sales_by_region AS
(SELECT region, COUNT(*) AS count, SUM(charge)…
FROM sales GROUP BY region)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE;
• Generally defined to contain summaries of fact table
• Optimizer will automatically route to an MQT, when access
costs are reduced
• Example:
SELECT SUM(charge) FROM sales
WHERE REGION IN (‘NW’, ‘NE’);

(Diagram: queries on the SALES table are automatically routed to SALES_BY_REGION.)


Figure 7-6. Basics of Materialized Query Tables CL4636.0

Notes:
Materialized Query Tables, or MQTs, are often used to improve the performance of queries
that need to access large tables and might involve joining several tables or grouping results
for summarization.
The MQT is created as a physical table that contains the result of a query defined by a SQL
SELECT statement.
For example:
CREATE TABLE sales_by_region AS
(SELECT region, COUNT(*) AS count, SUM(charge)…
FROM sales GROUP BY region)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE;
In this example, an MQT named sales_by_region contains a query result from the sales
table. The REFRESH IMMEDIATE clause means that DB2 automatically updates the MQT
whenever the sales table changes, so it always provides an accurate result. The DB2 Optimizer can

safely substitute the sales_by_region table for a query that accesses the sales table and
matches the contents of the query.
An MQT can be used to dramatically reduce the system resources required to generate a
query result without needing to change an application to directly refer to the MQT.
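
Because the MQT above is created with DATA INITIALLY DEFERRED, it is empty until it is
explicitly populated. A minimal sketch of the remaining setup step:

   -- Populate the MQT; required once because of DATA INITIALLY DEFERRED
   REFRESH TABLE sales_by_region;

Collecting statistics on the MQT afterward with RUNSTATS helps the optimizer cost the
substituted plan accurately.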


Instructor notes:
Purpose — To provide a simple example of using an MQT to improve query performance.
An MQT can also be defined as REFRESH DEFERRED or MAINTAINED BY USER but the
REFRESH IMMEDIATE MQT is the most important to discuss in relation to table
partitioning because of the impact of using the new ATTACH and DETACH commands on
any refresh immediate MQTs. Since it will be common to define MQTs with a table
partitioned table as a source table, it is necessary to have a basic understanding of MQTs.
Details —
Additional information —
Transition statement — Now let's look at the features of table partitioning.


Table partitioning: What is it and Why use it?

• Allows a single logical table to be broken up into multiple separate
  physical storage objects:
  – Each corresponds to a partition of the table
  – Partition boundaries correspond to specified value ranges in a specified
    partition key
• Main Benefits:
  – Allows for partition elimination during SQL processing
  – Allows for optimized roll-in / roll-out processing (for example, minimized
    logging)
  – Allows for divide and conquer management of huge tables

(Diagram: without partitioning, SALESDATA is a single storage object; with
partitioning, applications see a single table while the data is stored in
separate partitions such as SALESDATA JanPart, FebPart, and MarPart.)

Figure 7-7. Table partitioning: What is it and Why use it? CL4636.0

Notes:
Table Partitioning was introduced with DB2 9.1 for Linux, UNIX and Windows and it offers
some unique options to address application requirements.
With Table Partitioning, the table is defined to have distinct parts, called partitions for ease
of management, but is seen as a single large table by the application.
Main Benefits:
• Allows for partition elimination during SQL processing – Each partition contains a
range or subset of defined column values. The DB2 Optimizer can eliminate those
partitions that do not match the query predicates.
• Allows for optimized roll-in/roll-out processing (for example, minimized logging)
– Special ATTACH and DETACH commands support roll-in and roll-out, to minimize the
impact to table availability.
• Allows for divide and conquer management of huge tables – Each partition can be
defined in its own table space, so the standard addressing limits for table and table
space sizes apply to each partition, not to the total table size. There can be thousands of


partitions defined to provide access to huge amounts of data. The table spaces can be
independently accessed for backup and recovery.

Instructor notes:


Purpose — This introduces the features of Table Partitioning. Do not discuss features like
Roll-in or Roll-out in any detail at this point in the presentation.
Details —
Additional information —
Transition statement — Next we will discuss some additional benefits to table partitioning.


Table partitioning: More benefits

• More benefits:
  – Can partition a single table across multiple table spaces
    (see diagram on the right)
  – SET INTEGRITY processing can be performed online
  – Flexible placement of large indexes
    • May be defined using multiple table spaces

(Diagram: without partitioning, Table_1 and Table_3 reside in table space 1
and Table_2 in table space 2; with partitioning, the partitions of Table_1,
Table_2 and Table_3 (Table_1.p1, Table_1.p2, Table_1.p3, and so on) are
spread across table spaces A, B and C.)


Figure 7-8. Table partitioning: More benefits CL4636.0

Notes:
More benefits:
• Partition a single table across multiple table spaces – Each partition can be
defined in its own table space, which not only allows for much larger tables to be
defined but allows for data to be managed for recovery purposes at the partition level.
• SET INTEGRITY processing is now online – When a new partition is added to a table
partitioned table using ATTACH, the SET INTEGRITY command that validates the new
data and updates the indexes can be run while applications continue to have both read
and write access to the existing data.
• Indexes of a partitioned table can be defined using multiple table spaces and
buffer pools – Since large partitioned tables might require several very large
indexes, there are options to utilize multiple table spaces for the indexes on a
range-partitioned table. The mapping of indexes to table spaces depends on whether the
index is partitioned or non-partitioned. This will be explained later in the lecture.
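
As a preview of the roll-in and roll-out flow that these benefits enable, here is a hedged
sketch; the table, partition, and staging table names are illustrative, and ATTACH and
DETACH are covered in detail later in this unit:

   -- Roll in: attach a previously loaded staging table as a new partition
   ALTER TABLE salesdata
     ATTACH PARTITION aprpart
     STARTING '2013-04-01' ENDING '2013-04-30'
     FROM sales_apr_stage;
   -- Validate the new rows online; existing data stays fully accessible
   SET INTEGRITY FOR salesdata ALLOW WRITE ACCESS IMMEDIATE CHECKED;
   -- Roll out: detach the oldest partition into its own table for archiving
   ALTER TABLE salesdata DETACH PARTITION janpart INTO sales_jan_archive;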

Instructor notes:


Purpose — In DB2 LUW, all of the indexes for each table have been treated as a single
object and stored together. With the introduction of partitioned indexes in DB2 9.7, there
are two different methods for managing the indexes for table partitioned tables. This will be
covered in detail later in the lecture. The use of online SET INTEGRITY will be also
discussed in more detail later in the presentation. Storing partitions of a table-partitioned
table in different table spaces will have a similar effect on recovery to using DPF database
partitioning. For example, if the table space for one partition needs to be recovered, the
other table spaces used by that table can remain online and accessible.
Details —
Additional information —
Transition statement — Let's take a minute to review the terminology associated with
some of these options.


Terminology
• DATABASE PARTITIONING
– Distributing data by key hashing across logical database partitions
• DATABASE PARTITION
– An individual logical database partition of a DB2 partitioned database

• TABLE PARTITIONING
– Splitting data by key range over multiple physical objects
• RANGE or DATA PARTITION:
– An individual range of a table using table partitioning
– Represented by an object on disk
• MULTI DIMENSIONAL CLUSTERING
– Organizing data in table (or range of a table) by multiple key values
(a.k.a. MDC)
• CELL
– Group of EXTENTS containing the rows matching a particular
combination of MDC dimension values
• Database partitioning, table partitioning and multidimensional clustering
can be used simultaneously on the same table


Figure 7-9. Terminology CL4636.0

Notes:
The introduction of table partitioning in DB2 9 required some adjustments to terminology
associated with the DB2 features. In the past any reference to a partitioned table generally
referred to using database partitioning. These terms will help to clarify the use of the
different database features in a DB2 10 database.
The terms DATABASE PARTITIONING and DATABASE PARTITION will be used to refer to
using the hash based partitioning in DB2 databases.
• DATABASE PARTITIONING – Distributing data by key hashing across logical database
partitions.
• DATABASE PARTITION – An individual logical database partition of a DB2 Partitioned
Database.
The terms TABLE PARTITIONING, RANGE PARTITION or DATA PARTITION will be used
to refer to using the range based table partitioning.
• TABLE PARTITIONING – Splitting data by key range over multiple physical objects.

• RANGE or DATA PARTITION – An individual range of a table partitioned table,
represented by an object on disk.
The terms MULTI DIMENSIONAL CLUSTERING, CELL and BLOCK INDEXING will be
used to refer to MDC tables defined with the ORGANIZE BY DIMENSIONS clause.
• MULTI DIMENSIONAL CLUSTERING (MDC) – Organizing data in table (or range of a
table) by multiple key values.
• CELL – Group of EXTENTS containing the rows matching a particular combination of
MDC dimension values.
It is important to understand that these features (Database Partitioning, Table Partitioning,
and Multi-Dimensional Clustering) can be used in any combination to support application
requirements.


Instructor notes:
Purpose — This is meant to clarify the terms used to refer to the various DB2 features.
The most important distinctions involve limiting the use of the term table partitioning to
references to the new range-based table partitioning feature tables and using database
partitioning to refer to tables using hash based partitioning in a DB2 partitioned database.
The terms used for MDCs remain distinct. We will cover using combinations of features
later in the presentation.
Details —
Additional information —
Transition statement — Next we will look at the syntax for creating a table with table
partitioning.


Creating a range-partitioned table

• Short and Long Forms

• Partitioning column(s):
  – Must be base types (for example, no LOBs, LONG VARCHARs)
  – Can specify multiple columns
  – Can specify generated columns

• Special values:
  – MINVALUE, MAXVALUE can be used to specify open ended ranges,
    for example:
    CREATE TABLE t1 …
    (STARTING(MINVALUE)
     ENDING(MAXVALUE) …
  – SQL0327N: The row cannot be inserted because it is outside the bounds

(Diagram: partitions t1.p1 (1 <= c1 < 34) in tbsp1, t1.p2 (34 <= c1 < 67) in
tbsp2, and t1.p3 (67 <= c1 <= 99) in tbsp3.)

Short Form
CREATE TABLE t1(c1 INT, ………)
  IN tbsp1, tbsp2, tbsp3
  PARTITION BY RANGE(c1)
  (STARTING FROM (1) ENDING (99) EVERY (33))

Long Form
CREATE TABLE t1(c1 INT, ………)
  PARTITION BY RANGE(c1)
  (STARTING FROM (1) ENDING(33) IN tbsp1,
   ENDING(66) IN tbsp2,
   ENDING(99) IN tbsp3)

Figure 7-10. Creating a range-partitioned table CL4636.0

Notes:
The syntax for defining a range-partitioned table has a long and short form for defining the
partitioning ranges.
An example of the Short Form is:
CREATE TABLE t1(c1 INT, .........) IN tbsp1, tbsp2, tbsp3
PARTITION BY RANGE(c1)
(STARTING FROM (1) ENDING (99) EVERY (33))
The equivalent definition in the Long Form would be:
CREATE TABLE t1(c1 INT, .........)
PARTITION BY RANGE(c1)
(STARTING FROM (1) ENDING(33) IN tbsp1,
ENDING(66) IN tbsp2,
ENDING(99) IN tbsp3)
Both forms define a table with three range partitions based on the column value of c1. The
first range containing rows with c1 between 1 and 33, the next range between 34 and 66,


and the last range containing c1 between 67 and 99. The three range partitions would be
stored in three separate table spaces, tbsp1, tbsp2 and tbsp3, respectively.
The column or columns selected for range partitioning must be base data types, which
means that large object and long varchar data types cannot be used. A generated
column can be specified for range partitioning.
The special values MINVALUE and MAXVALUE can be used to specify that one range
could include all lower values or all higher values.
A new SQL error code will be used for range-partitioned tables:
• SQL0327N: The row cannot be inserted because it is outside the bounds.
This code is returned when an INSERT attempts to add a new row with a value in the
range column(s) that does not match any of the active data partitions.
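
Completing the open-ended example from the visual, here is a minimal sketch using the
same table spaces; with MINVALUE and MAXVALUE as the outer bounds, an INSERT
cannot fail with SQL0327N for any non-null value of c1:

   -- Open-ended first and last ranges catch all low and high values
   CREATE TABLE t1 (c1 INT)
     PARTITION BY RANGE (c1)
     (STARTING (MINVALUE) ENDING (33) IN tbsp1,
      ENDING (66) IN tbsp2,
      ENDING (MAXVALUE) IN tbsp3);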

Instructor notes:


Purpose — To show a simple example of defining a range-partitioned table using the short
and long form.
Details —
Additional information —
Transition statement — Let's look at some of the key considerations for creating a
range-partitioned table.


Considerations for creating a partitioned table


• What tables benefit from being partitioned:
– Large tables
– Roll-in/Roll-out
– Business intelligence style queries

• Which column(s) to partition on:


– Dates (roll-in)
– Partition elimination based on query predicates, like date range

• Granularity of ranges should match roll-in/roll-out

• Consider placing different ranges in different table spaces

• Placement of indexes and LOBs using multiple table spaces


Figure 7-11. Considerations for creating a partitioned table CL4636.0

Notes:
The first thing to consider for creating new range-partitioned tables is which tables would
most benefit from being partitioned. If you are regularly rolling in or out batches of data,
then you definitely should consider table partitioning. It is also useful for very large tables
that might exceed the size limits for non-partitioned tables. Finally, consider table partitions
if you have queries that could benefit from 'partition elimination', where the DB2 Optimizer
can include or exclude the rows at a data partition level based on the predicates of the
query. We will see some examples of this later in the presentation.
Next you need to decide what columns to partition on. The experience of customers using
other products that have some form of partitioning is that they commonly partition on a date
column, or an equivalent. This is very natural if the application will be rolling in new batches
of data for current activity. A less common use would be to partition data based on the
application's use of subsets of the table. For example, an international company could store
the sales for each country in data partitions based on a country code column. This would allow
queries that only need data for selected countries to eliminate access to a large portion of the
table data.
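To make the country code idea concrete, here is a minimal sketch; the table name, columns,
and code ranges are all assumptions for illustration:
  CREATE TABLE intl_sales(country_code INT, sale_date DATE, amount DECIMAL(15,2))
  PARTITION BY RANGE(country_code)
  (STARTING 1 ENDING 99,       -- Americas
   STARTING 100 ENDING 199,    -- Europe
   STARTING 200 ENDING 299);   -- Asia Pacific
A query with a predicate such as country_code BETWEEN 100 AND 150 could then be
limited to the second data partition.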

The next consideration is the size of the ranges. Normally, this just corresponds to what
you plan to roll-in or out. If data is rolled in on a monthly basis, then it would be natural to
define each range to contain one month of data.
Part of the planning will be what table spaces to use for the data partitions, the indexes and
large object data. If the table data size exceeds the limits of a single table space, then
multiple table spaces will need to be used.
Beginning with DB2 LUW 9.7, the indexes for partitioned tables can be defined as
partitioned or non-partitioned. It is possible to define the indexes in multiple table spaces.
For very large tables, indexes might need to be defined in LARGE rather than REGULAR
table spaces.
The placement for large object data types will also need to be planned.


Instructor notes:
Purpose — This presents the most important planning considerations for defining
range-partitioned tables.
Details —
Additional information —
Transition statement — Now let's look at another example of defining table partitioning
using the long form syntax.


Defining ranges (Long syntax)


• Use STARTING … ENDING … to specify ranges
CREATE TABLE sales(sale_date DATE, customer INT, …)
PARTITION BY RANGE(sale_date)
(STARTING '1/1/2013',
STARTING '4/1/2013',
STARTING '7/1/2013',
STARTING '10/1/2013' ENDING '12/31/2013');

• Creates four ranges


Figure 7-12. Defining ranges (Long syntax) CL4636.0

Notes:
Here is a simple example of creating a partitioned table using the long form. It is
partitioned on the column sale_date. There are four ranges.
CREATE TABLE sales(sale_date DATE, customer INT, …)
PARTITION BY RANGE(sale_date)
(STARTING '1/1/2013',
STARTING '4/1/2013',
STARTING '7/1/2013',
STARTING '10/1/2013' ENDING '12/31/2013');
You need to specify at least enough starting and ending boundaries for the ranges to make
it unambiguous. In this example, each starting bound is specified, plus the last ending
bound. The other ending bounds can be determined from the context. Given the above
definition, the first range would start with a date of '1/1/2013' and end with, but not include,
'4/1/2013'.
Holes are allowed in the range definitions. For example, a table could be created with
ranges for January, March and May leaving gaps for February and April.
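A minimal sketch of such a definition (table name and dates are assumptions):
  CREATE TABLE sparse_sales(sale_date DATE, customer INT)
  PARTITION BY RANGE(sale_date)
  (STARTING '1/1/2013' ENDING '1/31/2013',   -- January
   STARTING '3/1/2013' ENDING '3/31/2013',   -- March
   STARTING '5/1/2013' ENDING '5/31/2013');  -- May
An INSERT with a February or April date would fail with SQL0327N because no range
covers it.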


Instructor notes:
Purpose — This shows another simple example of defining a range-partitioned table using
the long syntax.
Details —
Additional information —
Transition statement — Let's look at an example of using the short syntax for defining a
range-partitioned table.


Defining ranges (Short syntax)


• Use STARTING … ENDING … EVERY to quickly define
ranges
CREATE TABLE sales(sale_date DATE, customer INT, …)
PARTITION BY RANGE(sale_date)
(STARTING '1/1/2009' ENDING '12/31/2013' EVERY 3
MONTHS);

• Creates 20 data partitions, one for each quarter

• Simple way of creating many partitions quickly and easily

• Appropriate for equal-sized ranges based on dates or numbers


Figure 7-13. Defining ranges (Short syntax) CL4636.0

Notes:
It is common to create dozens or hundreds of ranges in a partitioned table. The short
syntax makes this easy to do. Rather than specifying each range explicitly, you specify the
start of the first range, the end of the last one and the size of each range. The database will
figure out all the boundaries and create the ranges for you.
For example:
CREATE TABLE sales(sale_date DATE, customer INT, …)
PARTITION BY RANGE(sale_date)
(STARTING '1/1/2009' ENDING '12/31/2013' EVERY 3 MONTHS);
This example creates 20 ranges, one for each quarter over a period of 5 years. This short
syntax works for dates or integers where the ranges are of equal size. The syntax can be
used for a duration from years and months to microseconds.
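A couple of hedged variations on the EVERY clause (table and column names are
assumptions for illustration):
  CREATE TABLE clicks(click_time TIMESTAMP, user_id INT)
  PARTITION BY RANGE(click_time)
  (STARTING '2013-01-01-00.00.00' ENDING '2013-12-31-23.59.59' EVERY 1 MONTH);

  CREATE TABLE readings(sensor_id INT, reading DOUBLE)
  PARTITION BY RANGE(sensor_id)
  (STARTING 0 ENDING 9999 EVERY 1000);  -- equal-sized numeric ranges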


Instructor notes:
Purpose — This shows a simple example of defining ranges for table partitioning using the
short form of the syntax.
Details —
Additional information —
Transition statement — Let's next look at defining range partitions based on multiple
columns.


Partitioning on multiple columns


• Multiple columns can be specified in the PARTITION BY
clause
CREATE TABLE sales(year INT, month INT, …)
PARTITION BY RANGE(year, month)
(STARTING (2013, 1),
STARTING (2013, 4),
STARTING (2013, 7),
STARTING (2013, 10) ENDING (2013, 12));

• Similar to defining multiple columns in an index


Figure 7-14. Partitioning on multiple columns CL4636.0

Notes:
All the examples so far use a single column to partition the table. Multiple columns can be
used for table partitioning in a way similar to defining an index on multiple columns.
Rows are partitioned into the proper range based on the first column; then, within all the
rows that have the same value in the first column, they are further subpartitioned based on
the second column, and so on for however many columns were used to define the
partitioning.
For example:
CREATE TABLE sales(year INT, month INT, …)
PARTITION BY RANGE(year, month)
(STARTING (2013, 1),
STARTING (2013, 4),
STARTING (2013, 7),
STARTING (2013, 10) ENDING (2013, 12));
The example shows a table, sales, where the ranges of data partitions will be based on the
two columns, year and month.
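For illustration, assuming the definition above, rows map to partitions lexicographically,
much like multi-column index keys (the values are hypothetical):
  -- (2013, 2)  -> first partition  (>= (2013,1) and before (2013,4))
  -- (2013, 5)  -> second partition (>= (2013,4) and before (2013,7))
  -- (2013, 11) -> fourth partition (>= (2013,10) and <= (2013,12))
  INSERT INTO sales(year, month) VALUES (2013, 5);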


Instructor notes:
Purpose — To show a simple example of creating a range-partitioned table using two
columns to define the range for each partition.
Details —
Additional information —
Transition statement — Let's look at defining open ended ranges for table partitioning.


Create Table: Open-ended ranges


• Use MINVALUE and MAXVALUE to specify open ended ranges
– In this example, the first range holds everything before the year
2013
– Think of these as positive and negative infinity

CREATE TABLE sales(sale_date DATE, customer INT, …)


PARTITION BY RANGE(sale_date)
(
STARTING MINVALUE ENDING '12/31/2012',
STARTING '1/1/2013' ENDING '3/31/2013',
STARTING '4/1/2013' ENDING '6/30/2013',
STARTING '7/1/2013' ENDING '9/30/2013',
STARTING '10/1/2013' ENDING '12/31/2013'
);


Figure 7-15. Create Table: Open-ended ranges CL4636.0

Notes:
The special values MINVALUE and MAXVALUE can be used to specify open-ended
ranges. Think of these as positive and negative infinity.
In this example, the first range holds everything before the year 2013.
CREATE TABLE sales(sale_date DATE, customer INT, …)
PARTITION BY RANGE(sale_date)
( STARTING MINVALUE ENDING '12/31/2012',
STARTING '1/1/2013' ENDING '3/31/2013',
STARTING '4/1/2013' ENDING '6/30/2013',
STARTING '7/1/2013' ENDING '9/30/2013',
STARTING '10/1/2013' ENDING '12/31/2013' );


Instructor notes:
Purpose — To show an example using MINVALUE to create an open-ended range.
Details —
Additional information —
Transition statement — Let's look at using the terms EXCLUSIVE and INCLUSIVE to
define ranges for table partitioning.


Create Table: Inclusive and Exclusive bounds


• Use the EXCLUSIVE keyword to indicate range boundary is
exclusive:
– By default, bounds are inclusive
– This example avoids holes by making each ending bound the same
as the next starting bound, and using EXCLUSIVE for the ending
bound

CREATE TABLE sales(sale_date DATE, customer INT, …)


PARTITION BY RANGE(sale_date)
(
STARTING MINVALUE ENDING '1/1/2013' EXCLUSIVE,
STARTING '1/1/2013' ENDING '4/1/2013' EXCLUSIVE,
STARTING '4/1/2013' ENDING '7/1/2013' EXCLUSIVE,
STARTING '7/1/2013' ENDING '10/1/2013' EXCLUSIVE,
STARTING '10/1/2013' ENDING '12/31/2013' INCLUSIVE
);


Figure 7-16. Create Table: Inclusive and Exclusive bounds CL4636.0

Notes:
By default, bounds in range-partitioned tables are inclusive.
The terms INCLUSIVE and EXCLUSIVE can be used to define how to handle the starting
or ending values for a range.
• INCLUSIVE indicates that all values equal to the specified value are to be included in
the data partition containing this boundary.
• EXCLUSIVE indicates that all values equal to the specified value are NOT to be
included in the data partition containing this boundary.
The following example uses the EXCLUSIVE keyword to indicate range boundaries that
are exclusive. So the first range would be all dates prior to 1/1/2013, but not include that
date. The date 1/1/2013 would begin the second range. The last range ends with, and
includes the date 12/31/2013.
The example in the visual avoids holes by making each ending bound the same as the next
starting bound, and using EXCLUSIVE for the ending bound.
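For illustration, with the definition above (hypothetical inserts; other columns omitted):
  INSERT INTO sales(sale_date) VALUES ('3/31/2013');  -- lands in the first-quarter range
  INSERT INTO sales(sale_date) VALUES ('4/1/2013');   -- lands in the second-quarter range,
                                                      -- because the prior ENDING '4/1/2013'
                                                      -- is EXCLUSIVE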


Instructor notes:
Purpose — This describes how to use the keywords INCLUSIVE and EXCLUSIVE to
determine if the starting or ending value of a range should be included in that range.
Details —
Additional information —
Transition statement — Next let's look at using ALTER TABLE to ADD a new range to an
existing range-partitioned table.


Adding new ranges


• Use ALTER ADD to add new ranges to an existing partitioned
table

ALTER TABLE sales ADD PARTITION


STARTING '1/1/2014' ENDING '3/31/2014' IN
TBSPACE1;

• Creates a new empty range in TBSPACE1

• The new range cannot overlap any existing range


Figure 7-17. Adding new ranges CL4636.0

Notes:
You can add a new range to an existing table using the SQL statement ALTER TABLE ...
ADD PARTITION. This creates a new empty range.
In this example, the new range has been explicitly placed in the table space TBSPACE1.
ALTER TABLE sales ADD PARTITION
STARTING '1/1/2014' ENDING '3/31/2014' IN TBSPACE1;
The ADD PARTITION clause allows a new data partition to be added to an existing table in
much the same way as the long form of the CREATE TABLE syntax. The range of values
for the new data partition is determined by the STARTING and ENDING clauses. This
range must not overlap that of an existing data partition.
One or both of the STARTING and ENDING clauses must be supplied.
If the STARTING clause is omitted, then the new data partition is assumed to be at the end
of the table; similarly, if the ENDING clause is omitted, the new data partition is assumed to
be at the start of the table.


The STARTING clause and ENDING clause syntax is the same as in CREATE TABLE. Just
as in CREATE TABLE, data partitions can be given a name or a number.
If no IN clause is specified for ADD PARTITION, the table space in which to place the data
partition will be chosen using the same method as is used by CREATE TABLE. Application
packages are invalidated during the ADD operation.
The newly added data partition will be immediately available once the ALTER is committed.
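A minimal sketch of extending the table at either end, building on the sales table from
earlier (the dates are assumptions):
  ALTER TABLE sales ADD PARTITION ENDING '6/30/2014';   -- STARTING omitted: added at the end
  ALTER TABLE sales ADD PARTITION STARTING '1/1/2000';  -- ENDING omitted: added at the start
  COMMIT;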

Instructor notes:


Purpose — This provides information about the adding of a new data partition to a
range-partitioned table.
Details —
Additional information —
Transition statement — Let's look at assigning names to the partitions of a
range-partitioned table.


Create Table: Naming partitions


• You can name partitions via the PART or PARTITION keyword:
– Provided name overrides system generated name
– Used to specify partition on partition level operations like DETACH
(more on this later)

CREATE TABLE sales(sale_date DATE, customer INT, …)


PARTITION BY RANGE(sale_date)
(
PART rest STARTING MINVALUE,
PARTITION q1 STARTING '1/1/2013',
PARTITION q2 STARTING '4/1/2013',
PARTITION q3 STARTING '7/1/2013',
PARTITION q4 STARTING '10/1/2013' ENDING '12/31/2013'
);


Figure 7-18. Create Table: Naming partitions CL4636.0

Notes:
If a data partition name is not supplied, or the short form of the syntax is being used, then a
system-generated name of the form PARTn will be used. System-generated names are
chosen to not conflict with user-supplied data partition names. When data partitions are
added, attached or detached after the creation of a partitioned table range, the resulting
system-generated data partition names might not be in sequence.
You can name partitions via the PART or PARTITION keyword on the CREATE TABLE or
an ALTER TABLE with the ADD option. Having names relative to the table contents, like
sales2013Q1, can make operations like the ATTACH and DETACH options of ALTER
TABLE less prone to error.

For example:


CREATE TABLE sales(sale_date DATE, customer INT, ...)
PARTITION BY RANGE(sale_date)
( PART rest STARTING MINVALUE,
PARTITION q1 STARTING '1/1/2013',
PARTITION q2 STARTING '4/1/2013',
PARTITION q3 STARTING '7/1/2013',
PARTITION q4 STARTING '10/1/2013' ENDING '12/31/2013' );


Instructor notes:
Purpose — To explain the ability to assign application meaningful names to individual
partitions of a range-partitioned table.
Details —
Additional information —
Transition statement — Let's look at assigning table spaces to the range-partitioned table
using the short syntax.

Storage Mapping:
Mapping ranges to table spaces (1 of 2)
• Short syntax:
  – The IN clause on CREATE TABLE now accepts a list
  – In this example, the ranges will cycle through the provided table spaces in round-robin fashion
  – Data in the 1Q/2009 will be placed in tbsp1, 2Q/2009 in tbsp2, 3Q/2009 in tbsp3, 4Q/2009 in tbsp1, etc.
  – Table spaces must have the same Page size and Extent size

  [Diagram: tbsp1 holds sales.1Q/09 and sales.4Q/09; tbsp2 holds sales.2Q/09 and sales.1Q/10; tbsp3 holds sales.3Q/09 and sales.2Q/10]

CREATE TABLE sales(sale_date DATE, customer INT, …)


IN TBSP1, TBSP2, TBSP3
PARTITION BY RANGE(sale_date)
(
STARTING '1/1/2009' ENDING '12/31/2013'
EVERY 3 MONTHS
);


Figure 7-19. Storage Mapping: Mapping ranges to table spaces (1 of 2) CL4636.0

Notes:
The IN clause on CREATE TABLE for a range-partitioned table accepts a list of table
spaces rather than being limited to a single table space name.
For example using the short form:
CREATE TABLE sales(sale_date DATE, customer INT, ...)
IN TBSP1, TBSP2, TBSP3
PARTITION BY RANGE(sale_date)
( STARTING '1/1/2009' ENDING '12/31/2013' EVERY 3 MONTHS );
This SQL statement would create a table with twenty partitions. The ranges will cycle
through the provided three table spaces in round-robin fashion. Data in the 1Q/2009
partition will be placed in tbsp1, 2Q/2009 in tbsp2, 3Q/2009 in tbsp3, then the partition for
4Q/2009 would again use the table space tbsp1, and so on.
Partitioned tables can have their partitions spread across multiple table spaces. When
multiple table spaces are specified, all of the table spaces must exist, and they must all be
either SMS or regular DMS or large DMS table spaces. All data table spaces used by a


table must be of the same type, and all large table spaces used by a table must be of the
same type. As with previous releases, a LONG IN clause is only allowed if the data is stored
in a LARGE DMS table space.
All table spaces used by a table, including data table spaces, table spaces used by
indexes, and long table spaces, must be in the same database partition group. All data
table spaces used by a table must have the same page size and extent size.
A warning is returned if they don't have the same prefetch size. Note that prefetch size can
be automatic.

Instructor notes:


Purpose — This shows an example of having DB2 assign the data partitions to a list of
table spaces using a round-robin approach.
Details —
Additional information —
Transition statement — Let's look at assigning the table spaces to partitions explicitly
using the long-form syntax.


Storage Mapping:
Mapping ranges to table spaces (2 of 2)
• Long syntax:
  – You explicitly specify a table space for each partition
  – In this example, data in the 1Q/2013 will be placed in tbsp1, 2Q/2013 in tbsp2, 3Q/2013 in tbsp3

  [Diagram: tbsp1 holds sales.1Q 2013; tbsp2 holds sales.2Q 2013; tbsp3 holds sales.3Q 2013]

CREATE TABLE sales(sale_date DATE, customer INT, …)


PARTITION BY RANGE(sale_date)
(
STARTING MINVALUE IN TBSP1,
STARTING '3/1/2013' IN TBSP2,
STARTING '6/1/2013' ENDING '9/30/2013' IN TBSP3
);


Figure 7-20. Storage Mapping: Mapping ranges to table spaces (2 of 2) CL4636.0

Notes:
Using the Long-Form syntax, you explicitly specify a table space for each partition.
For example:
CREATE TABLE sales(sale_date DATE, customer INT, ...)
PARTITION BY RANGE(sale_date)
( STARTING MINVALUE IN TBSP1,
STARTING '3/1/2013' IN TBSP2,
STARTING '6/1/2013' ENDING '9/30/2013' IN TBSP3 );
In this example, data in the 1Q/2013 will be placed in tbsp1, 2Q/2013 in tbsp2, and
3Q/2013 in tbsp3.

Instructor notes:


Purpose — This shows an example of using the long-form to explicitly assign table spaces
to each data partition. If batches of data are rolled-in and rolled-out, it might be useful to
have each partition in its own table space, so that the table space and its assigned disk
space can be dropped when the data is no longer needed.
Details —
Additional information —
Transition statement — Next we will discuss the addition of partitioned indexes for
range-partitioned tables.


Global (Non-partitioned) indexes


• Non-partitioned indexes:
– Each index contains entries of every row in the range-partitioned table
– MDC Block indexes
• Non-partitioned before DB2 9.7 Fix Pack 1
• Partitioned starting with DB2 9.7 Fix Pack 1
– Each index is managed as a separate storage object
– CREATE INDEX IN tsname can be used to override location for new indexes

[Diagram: Index 1 (Sales Date) and Index 2 (Product ID) are each one storage object,
placed in index table spaces 1 and 2, spanning all four data ranges (2013 Q1 through
2013 Q4) stored in table spaces 1 through 4]


Figure 7-21. Global (Non-partitioned) indexes CL4636.0

Notes:
The indexes for range-partitioned tables implemented with DB2 9.1 were non-partitioned or
global indexes. Unlike standard DB2 indexes, these global indexes are managed as
independent storage objects, which could be assigned to different table spaces. The
INDEX IN clause was added to the CREATE INDEX statement to allow new global indexes
to be placed in specific table spaces.
Each non-partitioned index contains all of the keys for every row in the table. An extra two
bytes is added to the index key to store the data partition ID for each row. DB2 can use this
data partition ID during an index scan to exclude rows based on predicates for the
range-partitioning columns. For example, twenty-four months of sales data
could be stored in a table where each data range holds one month's data. A query that needs
three months of sales for one product might scan the product index, which contains the row
pointers for all twenty-four months, but exclude the twenty-one months of data from access
based on the partition ID numbers.
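A minimal sketch of creating such an index (the index and table space names are
assumptions; on DB2 9.7 and later the NOT PARTITIONED keyword requests a
non-partitioned index explicitly, since partitioned indexes became the default for most
indexes):
  CREATE INDEX sales_prod_ix ON sales(product_id) NOT PARTITIONED IN tbsp_ix;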

Instructor notes:


Purpose — To review the characteristic for the global or non-partitioned indexes that were
available for range-partitioned tables in DB2 9.1 and 9.5.
Details —
Additional information —
Transition statement — Next we will introduce the partitioned indexes, introduced with
DB2 9.7.


Partitioned indexes
• Partitioned indexes:
– A unique partitioned index must contain the columns used in the PARTITION BY
RANGE clause
– Each index contains one index partition for each data range in the
range-partitioned table
– All partitioned indexes are managed as a single storage object per data range
– CREATE INDEX IN tsname can NOT be used to override the index location
– The INDEX IN clause of CREATE TABLE can be specified for a single data range
– INDEX IN clause can be specified for ALTER TABLE ADD PARTITION

[Diagram: Index 1 (Sales Date) and Index 2 (Product ID) each have one index partition per
data range; each data range (2013 Q1 through 2013 Q4) and its index partitions share one
storage object per range in table spaces 1 through 4]



Figure 7-22. Partitioned indexes CL4636.0

Notes:
Starting with DB2 9.7, you can create partitioned indexes for range-partitioned tables
where the index itself is partitioned such that each data partition has an associated index
partition. You can create both non-partitioned and partitioned indexes for a single
partitioned table.
An index on an individual data partition is an index partition; the set of index partitions that
make up the entire index for the table is a partitioned index. All of the partitioned indexes
are managed as a single storage object per data range.
The table space to be used for storing the partitioned indexes is based on the definition set
during the CREATE TABLE processing. You cannot specify the INDEX IN clause when a
CREATE INDEX statement defines a new partitioned index. Since these indexes are
divided into index partitions, the extra two-byte partition ID is not needed for each index key
entry.
The example shown in the visual shows a range partitioned table with four data ranges
defined in four table spaces. The two partitioned indexes are managed using four storage
objects, one for each table space. This shows that the storage for partitioned indexes is

closely related to each data range, unlike the non-partitioned indexes which were
independent of the data ranges.


Instructor notes:
Purpose — To discuss the different storage technique used for partitioned indexes.
Details —
Additional information —
Transition statement — Next we will see an example of CREATE TABLE and CREATE
INDEX statements used to define partitioned indexes for a range-partitioned table.


Example of creating partitioned indexes


• Placement for partitioned indexes can be specified using the long form
when the table is created

CREATE TABLE PARTTAB.HISTORYPART ( ACCT_ID INTEGER NOT NULL ,


TELLER_ID SMALLINT NOT NULL ,
BRANCH_ID SMALLINT NOT NULL ,
BALANCE DECIMAL(15,2) NOT NULL ,
……….
TEMP CHAR(6) NOT NULL )
PARTITION BY RANGE (BRANCH_ID)
(STARTING FROM (1) ENDING (20) IN TSHISTP1 INDEX IN TSHISTI1 ,
STARTING FROM (21) ENDING (40) IN TSHISTP2 INDEX IN TSHISTI2 ,
STARTING FROM (41) ENDING (60) IN TSHISTP3 INDEX IN TSHISTI3 ,
STARTING FROM (61) ENDING (80) IN TSHISTP4 INDEX IN TSHISTI4 ) ;

CREATE INDEX PARTTAB.HISTPIX1 ON PARTTAB.HISTORYPART (TELLER_ID)


PARTITIONED ;

CREATE INDEX PARTTAB.HISTPIX2 ON PARTTAB.HISTORYPART (BRANCH_ID)


PARTITIONED ;


Figure 7-23. Example of creating partitioned indexes CL4636.0

Notes:
The visual shows a sample CREATE TABLE statement for a range partitioned table named
PARTTAB.HISTORYPART.
This table has four defined data ranges, based on the BRANCH_ID column. Each data
range defines a specific data and index table space to use for that range.
There are two CREATE INDEX statements shown, defining two partitioned indexes. Each
of these two indexes will be placed into a common storage object located in the table space
defined by the INDEX IN clause named in the range definition section of the CREATE
TABLE statement.


Instructor notes:
Purpose — To show how partitioned indexes can be directed to use certain table spaces,
associated with each defined data range.
Details —
Additional information —
Transition statement — Next we will see how the DESCRIBE DATA PARTITIONS
statement can be used to list the table spaces used for partitioned indexes.

Describe Data Partitions shows partitioned index
object table space
db2 describe data partitions for table parttab.historypart show detail

PartitionId Inclusive (y/n)  Inclusive (y/n)
            Low Value        High Value
----------- - -------------- - -------------------
0 Y 1 Y 20
1 Y 21 Y 40
2 Y 41 Y 60
3 Y 61 Y 80

4 record(s) selected.

PartitionId PartitionName TableSpId PartObjId IndexTblSpId LongTblSpId AccessMode Status
----------- --------------- ----------- ----------- ------------ ----------- - ------
0 PART0 11 4 16 11 F
1 PART1 12 4 18 12 F
2 PART2 13 4 19 13 F
3 PART3 14 4 20 14 F

4 record(s) selected.


Figure 7-24. Describe Data Partitions shows partitioned index object table space CL4636.0

Notes:
The DESCRIBE DATA PARTITIONS statement with the SHOW DETAIL option lists the
table space IDs associated with the storage objects (data, index and long data) for each
data partition of a range partitioned table. This is a simple way to check the table spaces
that would be used for managing the partitioned indexes.


Instructor notes:
Purpose — To show an example report generated by the DESCRIBE DATA PARTITIONS
statement that shows the index table spaces defined for a range-partitioned table that
would determine how any partitioned indexes would be stored.
Details —
Additional information —

Using Storage Groups to assign data partitions to
different storage devices
[Diagram: Partitioned table Sales with one quarterly partition per automatic storage table
space, from 2012Q1 (Table Space 14) back to 2006Q3 (Table Space 1). The most recent
table space uses storage group SG_HOT (storage path /hot/fs1, on an SSD RAID array),
the middle table spaces use SG_WARM (paths /warm/fs1 and /warm/fs2, on an FC/SAS
RAID array), and the oldest use SG_COLD (paths /cold/fs1, /cold/fs2 and /cold/fs3, on a
SATA RAID array)]


Figure 7-25. Using Storage Groups to assign data partitions to different storage devices CL4636.0

Notes:
DB2 10.1 introduced the definition of storage groups for DB2 automatic storage
tablespaces. The concept is that a storage group would be used to direct the table space
containers to storage devices with potentially different performance characteristics.
It might be useful to store the most current, highly active data, like the current month or
quarter of data, on high performance devices like solid state disks. You might want to use
devices with medium performance for data, like the current year. Data from previous years
may only be accessed occasionally and could be stored on less expensive storage
devices.
The ranges of a range partitioned table can be defined using table spaces in different
storage groups. DB2 provides a method, using ALTER TABLESPACE, by which a table
space can easily be moved from one storage group to another when the access
requirements for the data change.
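A minimal sketch of the workflow (the storage group names, paths, and table space name
are assumptions for illustration):
  CREATE STOGROUP sg_hot ON '/hot/fs1';
  CREATE STOGROUP sg_cold ON '/cold/fs1', '/cold/fs2';

  CREATE TABLESPACE tbsp_2012q1 USING STOGROUP sg_hot;

  -- later, when the 2012 Q1 data cools off, move its table space to cheaper storage
  ALTER TABLESPACE tbsp_2012q1 USING STOGROUP sg_cold;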


Instructor notes:
Purpose — To discuss using the storage groups introduced with DB2 10.1 to provide
different performance for various ranges of a range partitioned table.
Details —
Additional information —
Transition statement — Next we will see that the DB2 optimizer recognizes that ranges of
a range partitioned table are stored in tablespaces with different storage access costs.

Explain reports show the performance characteristics of
table spaces used for range partitioned tables
Differing OVERHEAD/TRANSFERRATE/PREFETCHSIZE VALUES FOR VARIOUS TABLESPACES USED BY OBJECTS
PARTITIONS:
----------------------------------------------------------------------------------:
Data Partition ID: 0
------------------
Tablespace name: TSHISTP1
Table space ID: 11
Tablespace overhead: 10.000000
Tablespace transfer rate: 0.040000

Data Partition ID: 1


------------------
Tablespace name: TSHISTP2
Table space ID: 12
Tablespace overhead: 10.000000
Tablespace transfer rate: 0.040000

Data Partition ID: 2


------------------
Tablespace name: TSHISTP3
Table space ID: 13
Tablespace overhead: 3.000000
Tablespace transfer rate: 0.020000

Data Partition ID: 3


------------------
Tablespace name: TSHISTP4
Table space ID: 14
Tablespace overhead: 3.000000
Tablespace transfer rate: 0.020000

Figure 7-26. Explain reports show the performance characteristics of table spaces used for range partitioned tables CL4636.0

Notes:
This example of a section of the db2exfmt explain report shows that the data partitions of a
range partitioned table are stored in tablespaces with different I/O costing attributes.
The DB2 optimizer can use these costs to adjust the I/O costs for accessing the table.
These I/O costs can be defined at the storage group level and inherited by the table spaces
assigned to the storage group. When a table space is altered to use a different storage
group, the new I/O cost attributes automatically apply to access requests; the
administrator does not need to alter the table space to reflect the change in its I/O
performance estimates.


Instructor notes:
Purpose — To show an example of a DB2 explain report, indicating that the DB2 optimizer
noticed that the ranges of the table were using table spaces with different performance costs.
Details —
Additional information —
Transition statement — Next we will look at an example of the performance statistics that
can be retrieved for partitioned indexes using the MON_GET_INDEX table function.

Using the MON_GET_INDEX function for
performance statistics for partitioned indexes
select substr(tabschema,1,10) as schema,
substr(tabname,1,15) as name, iid as index_id,
data_partition_id, index_scans, index_only_scans
from table(MON_GET_INDEX('PARTTAB','HISTORYPART',-1)) AS IX
order by 3,4

SCHEMA NAME INDEX_ID DATA_PARTITION_ID INDEX_SCANS INDEX_ONLY_SCANS


---------- --------------- -------- ----------------- -------------------- --------------------
PARTTAB HISTORYPART 1 0 3 3
PARTTAB HISTORYPART 1 1 3 3
PARTTAB HISTORYPART 1 2 3 3
PARTTAB HISTORYPART 1 3 4 3
PARTTAB HISTORYPART 2 0 1 1
PARTTAB HISTORYPART 2 1 1 1
PARTTAB HISTORYPART 2 2 1 1
PARTTAB HISTORYPART 2 3 3 3

8 record(s) selected.


Figure 7-27. Using the MON_GET_INDEX function for performance statistics for partitioned indexes CL4636.0

Notes:
The MON_GET_INDEX table function can be used to retrieve index statistics and usage
counters from an active DB2 database. The usage statistics reflect the work performed
since the database was activated. For partitioned indexes of range-partitioned tables, the
usage statistics are reported for each index partition. These statistics would show which
indexes were being used the most and also if certain index partitions were more heavily
utilized.
The function can be used to retrieve statistics for a single table, all tables in a schema, or
all tables.
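Hedged sketches of the scope variants (passing NULL, or an empty string, widens the
scope; -1 requests the current database member):
  SELECT * FROM TABLE(MON_GET_INDEX('PARTTAB', 'HISTORYPART', -1)) AS ix;  -- one table
  SELECT * FROM TABLE(MON_GET_INDEX('PARTTAB', NULL, -1)) AS ix;           -- all tables in a schema
  SELECT * FROM TABLE(MON_GET_INDEX(NULL, NULL, -1)) AS ix;                -- all tables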


Instructor notes:
Purpose — To look at an example of a query that uses the MON_GET_INDEX table
function to list statistics for each index partition associated with a range-partitioned table.
Details —
Additional information —
Transition statement — Let's look at the assignment of table spaces for the Large Object
columns in a range-partitioned table.


Storage Mapping: Large objects are Local


• Large objects (LOBs, and so on) are local:
  – Separate storage object for each partition
  – By default in same table space as corresponding data partition
  – Can be specified per partition via LONG IN clause on CREATE TABLE

CREATE TABLE t1(c1 INT, c2 INT, c3 BLOB)
IN tbsp1, tbsp2, tbsp3
INDEX IN tbsp4
LONG IN tbsp6, tbsp7, tbsp8
PARTITION BY RANGE(c1)
(STARTING FROM (1)
 ENDING (99)
 EVERY (33));
CREATE INDEX i1 ON t1(c1);
CREATE INDEX i2 ON t1(c2) IN tbsp5;

[Diagram: indexes i1 and i2 in tbsp4 and tbsp5; data partitions t1.p1, t1.p2, t1.p3 in tbsp1,
tbsp2, tbsp3; LOB storage objects t1.LONG1, t1.LONG2, t1.LONG3 in tbsp6, tbsp7, tbsp8]


Figure 7-28. Storage Mapping: Large objects are Local CL4636.0

Notes:
The Large object columns in range-partitioned tables, CLOBs, BLOBs and Long Varchar
data type, are stored locally. This means there will be a separate storage object per
partition to hold this type of data. The default location for the LONG data would be in the
same table space as the corresponding data partition. If the range-partitioned table uses
DMS table spaces for the data partitions, then the LONG IN clause of the CREATE TABLE
statement can be used to specify the table spaces for LONG data.
All large table spaces used by a table must have the same page size but do not need to
have the same extent size. The page size and extent size used by large table spaces do
not need to match those of the data table spaces.


In the following example:


CREATE TABLE t1(c1 INT, c2 INT, c3 BLOB)
IN tbsp1, tbsp2, tbsp3
INDEX IN tbsp4
LONG IN tbsp6, tbsp7, tbsp8
PARTITION BY RANGE(c1)
(STARTING FROM (1)
ENDING (99)
EVERY (33));
The example shows a range-partitioned table with three data partitions using the three
table spaces tbsp1, tbsp2 and tpsp3. The LONG IN clause specifies that the data for
column c3, a binary large object, would be stored in the table space tbsp6 for data rows in
tbsp1, and in tbsp7 for data rows in tbsp2, and so forth.

Instructor notes:


Purpose — This shows the option to define LONG data table spaces for each data
partition of a range-partitioned table.
Details —
Additional information —
Transition statement — Let's look at using partition elimination for a query that requires a
table scan.


Partition elimination: Table scans


SELECT * FROM t1
WHERE year = 2008 AND
      month > 7

– Will only access data in table spaces tbsp3 and tbsp4

[Diagram: t1 partitions p1 through p4 (1Q/2008 through 4Q/2008) in tbsp1 through tbsp4;
the scan touches only t1.p3 and t1.p4]

SELECT * FROM t2
WHERE A > 50 AND
      A < 150

– Will only access data in tbsp1 and tbsp2

[Diagram: t2 partitions p1 through p3 (0<=A<100, 100<=A<200, 200<=A<300) in tbsp1
through tbsp3; the scan touches only t2.p1 and t2.p2]


Figure 7-29. Partition elimination: Table scans CL4636.0

Notes:
Some queries might require large amounts of data to be accessed where no index matches
the query predicates or the low clustering ratio of the index makes indexed access too
costly. Without range partitioning, a table scan would need to scan the entire table.
In the first example, a table is created to hold the data from the year 2008 and PARTITION
BY RANGE is used to divide the rows into four data partitions, each containing three
months of data.
With the following query:
SELECT * FROM t1
WHERE year = 2008 AND month > 7
The optimizer could select a table scan that would eliminate the two data partitions that
have data from the first six months of the year and only scan the two data partitions that
could hold data rows with a month value greater than 7. This could save a significant
amount of time and system resources.

In the second example, a table is created with the PARTITION BY RANGE clause dividing
data into three data partitions based on ranges of values for the column A. The first range
would be from 0 to 99, the second range from 100 to 199, and the third range
from 200 to 299.
With the following query:
SELECT * FROM t2
WHERE A > 50 AND A < 150
The optimizer could select a table scan that would eliminate the third data partition that has
rows with A column values between 200 and 299 because there can be no possible
matches in that data partition. The other two data partitions would need to be scanned.


Instructor notes:
Purpose — This shows two examples of using a range-partitioned table for queries where
the predicates match the column used for defining the data partitions. In each case, the
DB2 Optimizer could apply 'partition elimination' and completely avoid reading some of the
data partitions.
Details —
Additional information —
Transition statement — Let's take a look at how partition elimination is indicated in the
Explain output for a query.


Partition elimination shown in DB2 Explain


SQL Statement:

select * from historypart


where branch_id between 11 and 60
and teller_id between 800 and 810
Table ranges defined on Branch_id column
Predicate included: branch_id between 11 and 60

Section Code Page = 850
( 5) Access Table Name = INST411.HISTORYPART ID = -6,-32768
| Index Scan: Name = INST411.HISTPIX1 ID = 1
| | Regular Index (Not Clustered)
| | Index Columns:
| | | 1: TELLER_ID (Ascending)
| #Columns = 0
| Data-Partitioned Table
| Skip Inserted Rows
| Avoid Locking Committed Data
| Currently Committed for Cursor Stability
| Data Partition Elimination Info:
| | Range 1:
| | | #Key Columns = 1
| | | | Start Key: Inclusive Value
| | | | | 1: 11
| | | | Stop Key: Inclusive Value
| | | | | 1: 60
| Active Data Partitions: 0-2
| #Key Columns = 1
| | Start Key: Inclusive Value
| | | | 1: 800
| | Stop Key: Inclusive Value
| | | | 1: 810
| Index-Only Access
| Index Prefetch: None


Figure 7-30. Partition elimination shown in DB2 Explain CL4636.0

Notes:
The Explain report indicates when the predicates associated with a SQL statement can be
used to eliminate some of the data partitions being accessed for an operation. The
example shows a portion of an Explain report based on a SQL statement that selects data
from a range partitioned table that was defined with ranges based on a column named
branch_id. The predicate 'branch_id between 11 and 60' is used to eliminate the ranges of
the table that would not include any matching rows.
The following text is included in the Explain report:
| Data Partition Elimination Info:
| | Range 1:
| | | #Key Columns = 1
| | | | Start Key: Inclusive Value
| | | | | 1: 11
| | | | Stop Key: Inclusive Value
| | | | | 1: 60
| Active Data Partitions: 0-2


This shows that the first three data partitions defined will be accessed for the index scan.
The index scan operation is based on the other predicate which defines a range of values
for the teller_id column.

Instructor notes:


Purpose — To show an example Explain report where a predicate is used to eliminate
some of the data ranges for the range-partitioned table.
Details —
Additional information —
Transition statement — Next we will compare the Explain reports for several queries
using either partitioned or non-partitioned indexes.


Example 1:
Partitioned and Non-partitioned indexes
[Explain access plan excerpts, shown side by side:

With partitioned indexes (DP-TABLE PARTTAB.HISTORYPART): RIDSCN(5), cost 131.396,
I/O 26.0842, over SORT(6) and IXAND(7) with cost 130.762; the IXAND combines
IXSCAN(8) on INDEX PARTTAB.HISTPIX1 (cost 67.5317, I/O 8.42004) and IXSCAN(9) on
INDEX PARTTAB.HISTPIX2 (cost 62.0989, I/O 17.6642). Lower I/O costs for the scans.

With non-partitioned indexes (DP-TABLE PARTTAB.HISTORYPART2): RIDSCN(5), cost
334.407, I/O 195.621, over SORT(6) and IXAND(7) with cost 333.784; the IXAND combines
IXSCAN(8) on INDEX PARTTAB.HISTP2IX1 (cost 210.067, I/O 139.29) and IXSCAN(9) on
INDEX PARTTAB.HISTP2IX2 (cost 122.6, I/O 56.3317). Higher I/O cost for the index scans.]

Figure 7-31. Example 1: Partitioned and Non-partitioned indexes CL4636.0

Notes:
These two Explain reports were generated using the same two range-partitioned tables,
loaded with the same data as the previous example. One table was defined with partitioned
indexes, the other with non-partitioned indexes, like those supported by previous DB2
releases.
The SQL statement used for both was:
SELECT HISTORYPART.TELLER_ID, HISTORYPART.BRANCH_ID,
HISTORYPART.BALANCE,
HISTORYPART.ACCTNAME
FROM PARTTAB.HISTORYPART AS HISTORYPART
WHERE HISTORYPART.BRANCH_ID > 70 AND HISTORYPART.TELLER_ID
BETWEEN 100 AND 200
order by 2,1
The visual shows the portion of the access plan graph where the index scan operations
were performed. For this example, similar access plans were used for partitioned and
non-partitioned indexes.

The example on the left shows much lower estimated I/O and processing costs using the
partitioned indexes compared to the version using two non-partitioned indexes.


Instructor notes:
Purpose — To discuss the reduced I/O and estimated timeron costs shown in Explain
reports for partitioned indexes compared to the costs for non-partitioned indexes.
Details —
Additional information —
Transition statement — Next we will look at one more Explain report example.

Example 2:
Partitioned and Non-partitioned indexes
[Explain access plan excerpts, shown side by side:

With partitioned indexes (DP-TABLE PARTTAB.HISTORYPART): FETCH(4), cost 283.611,
I/O 143.98, over RIDSCN(5) with cost 143.274 and I/O 16.8401, SORT(6), and a single
IXSCAN(7) on INDEX PARTTAB.HISTPIX1 (21410.3 rows, cost 135.064, I/O 16.8401).
Lower I/O cost using one index.

With non-partitioned indexes (DP-TABLE PARTTAB.HISTORYPART2): RIDSCN(5), cost
445.129, I/O 290.269, over SORT(6) and IXAND(7) with cost 443.261; the IXAND combines
IXSCAN(8) on INDEX PARTTAB.HISTP2IX1 (22976.8 rows, cost 210.907, I/O 139.29) and
IXSCAN(9) on INDEX PARTTAB.HISTP2IX2 (49869.7 rows, cost 229.713, I/O 150.979).
Higher I/O cost using two indexes.]


Figure 7-32. Example 2: Partitioned and Non-partitioned indexes CL4636.0

Notes:
These two Explain reports were generated using the same two range-partitioned tables,
loaded with the same data as the previous example. One table was defined with partitioned
indexes, the other with non-partitioned indexes, like those supported by previous DB2
releases.
The SQL statement used for both was:
SELECT HISTORYPART.TELLER_ID, HISTORYPART.BRANCH_ID,
HISTORYPART.BALANCE,
HISTORYPART.ACCTNAME
FROM PARTTAB.HISTORYPART AS HISTORYPART
WHERE HISTORYPART.BRANCH_ID > 55
AND HISTORYPART.TELLER_ID BETWEEN 100 AND 200
order by 2,1
The visual shows the portion of the access plan graph where the index scan operations
were performed. For this example, different access plans were used for partitioned and


non-partitioned indexes. The partitioned index access plan uses a single index scan, while
the non-partitioned index plan uses two indexes that are anded together.
The example on the left shows much lower estimated I/O and processing costs for the
single IXSCAN operation using the partitioned indexes compared to the version using two
non-partitioned indexes. The SQL statement includes a predicate,
‘HISTORYPART.BRANCH_ID > 55' and the BRANCH_ID column was used to define the
table data ranges. The access plan for the partitioned indexed table uses an index to
handle the predicate on the column that is not the range-partitioning column (TELLER_ID)
and selects two index partitions of that index based on the predicate for the BRANCH_ID
column. The access plan using the non-partitioned indexes processes both indexes, one
on TELLER_ID and one on BRANCH_ID. The index scan using the one partitioned index
is estimated to require about 17 I/Os, while the two index scans using the non-partitioned
indexes are estimated to require 139 and 150 I/Os.

Instructor notes:


Purpose — To show another example of the reduced estimated costs shown in Explain
reports for a table using partitioned indexes compared to non-partitioned indexes. The
access plan is different in this case; one using a single index, while the other decides to
perform index anding between two indexes.
Details —
Additional information —
Transition statement — Now we will see how roll-in and roll-out of data can be
accomplished with range-partitioned tables.


Operations for Roll-out and Roll-in


• ALTER TABLE … DETACH:
– An existing range is split off as a stand alone table
– Data instantly becomes invisible
– Minimal interruption to other queries accessing table

• ALTER TABLE … ATTACH:


– Incorporates an existing table as a new range
– Follow with SET INTEGRITY to validate data and maintain
non-partitioned indexes
– Data becomes visible all at once after COMMIT for SET INTEGRITY
• With DB2 10.1 SET INTEGRITY IMMEDIATE UNCHECKED is supported
– Minimal interruption to other queries accessing table

• Key points:
– No data movement
– Nearly instantaneous

Figure 7-33. Operations for Roll-out and Roll-in CL4636.0

Notes:
Options of the ALTER TABLE SQL statement allow for data partitions to be added using the
ATTACH option or removed using the DETACH option.
When an ALTER TABLE with the DETACH option is used for a range-partitioned table, one
data partition will be split off as a separate table, named in the statement. As soon as the
statement is committed, all of the data in the detached partition is instantly invisible to any
applications accessing the range-partitioned table. This allows minimal interruption to other
queries accessing the table. The ALTER TABLE will require a super-exclusive table lock to
complete.
The ALTER TABLE with the ATTACH option is used to incorporate an existing table as a
new range in a range-partitioned table. A SET INTEGRITY command will be required to
perform the index maintenance for non-partitioned indexes and any other integrity
checking. The data becomes visible all at once after COMMIT of the SET INTEGRITY. This
sequence minimizes the interruption to other queries accessing the table.
The ALTER TABLE ATTACH and DETACH options do not perform any data movement, so
the processing can be nearly instantaneous. The SET INTEGRITY command has options that

allow the checking to be performed while applications are allowed to read and write the
table.


Instructor notes:
Purpose — This introduces the ATTACH and DETACH options for ALTER TABLE on a
range-partitioned table. We will going into more detail on these and how they are used for
data roll-in and roll-out.
Details —
Additional information —
Transition statement — Next we will look at the steps for doing roll-in using the ATTACH
function of the ALTER TABLE statement.



Roll-in overview

• LOAD / Insert into NewMonthSales
• (Perform ETL on NewMonthSales)
• ALTER TABLE Big_Table …
  ATTACH PARTITION …
  STARTING '03/01/2013'
  ENDING '03/31/2013'
  FROM TABLE NewMonthSales
  – Very fast operation
  – No data movement required
  – Index maintenance deferred
• COMMIT
  – New data still not visible
• SET INTEGRITY FOR Big_Table ……
  – Potentially long running operation:
    • Validates data
    • Maintains non-partitioned indexes, MQTs
  – Existing data available while it runs
• COMMIT
  – New data visible

(Diagram: NewMonthSales is loaded in Tablespace C next to Big_Table.p1 in Tablespace A
and Big_Table.p2 in Tablespace B; after the ATTACH it becomes partition Big_Table.p3.)

Figure 7-34. Roll-in overview CL4636.0

Notes:
The recommended approach for data roll-in for a range-partitioned table would be the
following steps:
1. Use LOAD or INSERT to populate NewMonthSales, a new empty table. A previously detached
data partition could be reused.
2. Perform ETL on NewMonthSales: Prepare data for use in the range-partitioned table.
3. ALTER TABLE Big_Table … ATTACH PARTITION … STARTING '03/01/2013' ENDING
'03/31/2013' FROM TABLE NewMonthSales
The ATTACH should be a very fast operation because no data movement is required.
Index maintenance will be performed later.
4. COMMIT: To release the lock on the table.
5. SET INTEGRITY FOR Big_Table ……
This is a potentially long-running operation, but applications can still read and write to
the existing data partitions. The data in the new partition is validated, any


non-partitioned indexes will be updated, and refresh immediate MQTs can be maintained.
6. COMMIT: To make the new data partition visible to applications.
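Assembled into a single DB2 CLP sequence, the steps above might look like this sketch; the
input file name and the exception table Big_Table_ex are assumptions added for
illustration:

  -- Steps 1-2: populate and prepare the staging table
  LOAD FROM new_month.del OF DEL REPLACE INTO NewMonthSales;
  -- (perform any ETL on NewMonthSales here)

  -- Steps 3-4: attach the staged table as a new range; COMMIT releases the table lock
  ALTER TABLE Big_Table
    ATTACH PARTITION STARTING '03/01/2013' ENDING '03/31/2013'
    FROM TABLE NewMonthSales;
  COMMIT;

  -- Steps 5-6: validate the new range; COMMIT makes the new partition visible
  SET INTEGRITY FOR Big_Table ALLOW WRITE ACCESS IMMEDIATE CHECKED
    FOR EXCEPTION IN Big_Table USE Big_Table_ex;
  COMMIT;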


Instructor notes:


Purpose — This summarizes the steps used to roll-in data into a range-partitioned table.
Details —
Additional information —
Transition statement — Next we will discuss using the SET INTEGRITY to complete the
roll-in by updating the indexes and performing any other necessary checking.


Use SET INTEGRITY to complete the roll-in


• SET INTEGRITY does:
– Any Index maintenance needed
– Checking of range and other constraints
– MQT maintenance
– Generated column maintenance

• Table is online throughout processing

• New data becomes visible at end of SET INTEGRITY


Figure 7-35. Use SET INTEGRITY to complete the roll-in CL4636.0

Notes:
When a new data partition is ATTACHed to a range-partitioned table, the data in that new
partition is not yet visible to applications. A SET INTEGRITY statement is required to
complete the roll-in of the new data partition.
SET INTEGRITY does the following for range-partitioned tables:
• Index maintenance – For any non-partitioned (global) indexes on a range-partitioned
table, the key values from the newly attached partition will need to be added to the
indexes.
• Checking of range and other constraints:
- The values in the columns defined in the PARTITION BY RANGE clause will be
checked to make sure that each row is in the correct partition
- Check and Referential Integrity Constraints will be checked
• MQT maintenance – Dependent refresh immediate MQTs can be incrementally
refreshed.

7-92 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.2
Instructor Guide

• Generated column maintenance – Generated and identity columns can be set if the
FORCE GENERATED or GENERATE IDENTITY options are specified.
The range-partitioned table is online throughout the process except for the locking
associated with the ALTER ATTACH.
New data in the attached data partition becomes visible when the SET INTEGRITY is
committed.
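One way to observe this state change (a sketch; the schema and table names are
hypothetical) is to query the catalog. A newly attached partition is reported with a
non-normal STATUS in SYSCAT.DATAPARTITIONS ('A' for newly attached) until the SET
INTEGRITY commits:

  SELECT DATAPARTITIONNAME, STATUS
    FROM SYSCAT.DATAPARTITIONS
    WHERE TABSCHEMA = 'DBA1' AND TABNAME = 'BIG_TABLE';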

© Copyright IBM Corp. 2005, 2015 Unit 7. Table Partitioning 7-93


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

Instructor notes:
Purpose — This describes the use of the SET INTEGRITY statement to complete the
process of rolling in data into a range-partitioned table with an ALTER TABLE ATTACH.
Details —
Additional information —


Using IMMEDIATE UNCHECKED option for SET
INTEGRITY following ALTER TABLE ATTACH
• With DB2 9.7 after an ALTER TABLE ATTACH added a new
data partition to a range partitioned table, the new data could
not be accessed until the SET INTEGRITY with IMMEDIATE
CHECKED completed and was committed
• The new data could have been validated prior to ATTACH
• Starting with DB2 10.1 the IMMEDIATE UNCHECKED option
can be used
– IMMEDIATE UNCHECKED bypasses the range and integrity checks
– If any non-partitioned indexes are defined on the table the processing
will be performed as if IMMEDIATE CHECKED was specified
– If any partitioned indexes are invalid, they will be rebuilt

• Applications can begin to use the new data partition faster!


Figure 7-36. Using IMMEDIATE UNCHECKED option for SET INTEGRITY following ALTER TABLE ATTACH CL4636.0

Notes:
Starting with DB2 10.1, the IMMEDIATE UNCHECKED option of SET INTEGRITY can be used to
bypass range and integrity checks, so that the new data added by the ALTER TABLE ATTACH
statement becomes available faster.
If data integrity checking, including range validation and other constraints checking, can be
done through application logic that is independent of the data server before an attach
operation, newly attached data can be made available for use much sooner. You can
optimize the data roll-in process by using the SET INTEGRITY…ALL IMMEDIATE
UNCHECKED statement to skip range and constraints violation checking. In this case, the
table is brought out of SET INTEGRITY pending state, and the new data is available for
applications to use immediately, as long as there are no nonpartitioned user indexes on the
target table.
If there are nonpartitioned indexes (except XML column path indexes) on the table to
maintain after an attach operation, the SET INTEGRITY…ALL IMMEDIATE UNCHECKED
statement behaves as though it were a SET INTEGRITY…IMMEDIATE CHECKED
statement. All integrity processing, nonpartitioned index maintenance, and table state


transitions are performed as though a SET INTEGRITY…IMMEDIATE CHECKED statement was
issued.
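For example (a sketch; the table names are hypothetical, and the assumption is that the
new data was already validated by application logic before the attach):

  ALTER TABLE Big_Table
    ATTACH PARTITION STARTING '03/01/2013' ENDING '03/31/2013'
    FROM TABLE NewMonthSales;
  COMMIT;
  -- With no non-partitioned user indexes on the table, checking is truly bypassed
  SET INTEGRITY FOR Big_Table ALL IMMEDIATE UNCHECKED;
  COMMIT;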


Instructor notes:


Purpose — To discuss using the IMMEDIATE UNCHECKED option to bypass checking for
data added to a range partitioned table using the ALTER TABLE ATTACH statement.
Details —
Additional information —
Transition statement — Next we will discuss the use of exception tables for SET
INTEGRITY of a range-partitioned table.


Exception tables for SET INTEGRITY

• Without an exception table, any violation will fail the entire operation
• Recommendation: Provide an exception table for SET INTEGRITY, for example:

  SET INTEGRITY FOR
    sales ALLOW WRITE ACCESS,
    sales_by_region ALLOW WRITE ACCESS
  IMMEDIATE CHECKED INCREMENTAL
  FOR EXCEPTION IN sales USE sales_ex;

Figure 7-37. Exception tables for SET INTEGRITY CL4636.0

Notes:
If any exceptions are encountered during SET INTEGRITY processing and no exception
table is provided, processing will stop and another SET INTEGRITY will be needed to
make the new data partition available. Exceptions could be caused by duplicate keys in
unique indexes or rows with values outside the defined range of a data partition.
Since SET INTEGRITY processing could be quite long for a large newly attached data
partition, it is recommended to provide exception tables.
For example:
SET INTEGRITY FOR sales ALLOW WRITE ACCESS,
sales_by_region ALLOW WRITE ACCESS
IMMEDIATE CHECKED INCREMENTAL
FOR EXCEPTION IN sales USE sales_ex ;
In this example, the range-partitioned table sales is being checked following an ALTER
TABLE ATTACH for a new data range. A dependent refresh immediate MQT named
sales_by_region will also be incrementally refreshed.


Instructor notes:


Purpose — This explains the importance of having exception tables for the SET
INTEGRITY used to process a recently attached data partition of a range-partitioned table.
Details —
Additional information —
Transition statement — Let's look at the impact on table availability when using ATTACH
for data roll-in processing.


ALTER TABLE ATTACH locking considerations:
pre-DB2 10.1 and DB2 10.1

DB2 9.1 to DB2 9.7:
• A table level Z lock is used to block all access to the table at the time of the
  ALTER ATTACH
• The ALTER will wait for all current queries to complete and release the locks on the
  table
• No access to the table is permitted until the ALTER ATTACH commits

With DB2 10.1 and later:
• A table level IX lock is used for ALTER TABLE ATTACH
• ALTER ATTACH will not wait for dynamic non-repeatable read queries (UR, CS, RS) to
  complete
• ALTER ATTACH will still wait for static SQL and repeatable read queries to complete
• New SQL statements cannot be compiled until ALTER ATTACH commits

Figure 7-38. ALTER TABLE ATTACH locking considerations pre-DB2 10.1 and DB2 10.1 CL4636.0

Notes:
Starting with DB2 10.1 the process of adding or attaching a data partition to a partitioned
table by using the ALTER TABLE statement with the ADD PARTITION or ATTACH
PARTITION clause has been enhanced. The partitioned table now remains accessible to
dynamic queries running under the RS, CS, or UR isolation level.
With DB2 10.1 the IX level table lock is used for the ALTER TABLE ATTACH statement
rather than the super-exclusive Z lock used with DB2 9.7. If applications are using either
static SQL or Dynamic SQL in repeatable read isolation, the ALTER TABLE ATTACH will
need to wait for the application statement to complete.


Instructor notes:


Purpose — To discuss the locking differences between DB2 9.7 and DB2 10.1 for the
ALTER TABLE ATTACH statement.
Details —
Additional information —
Transition statement — Let's look at alternative methods for data roll-in.


Alternatives for roll-in


• Predefine empty ranges for future use via LOAD, INGEST,
SQL INSERT:
– Avoids potential locking delays for ATTACH
– Tradeoff is lower performance, more contention
– Reasonable choice if data trickles in continuously

• ADD + LOAD:
– Recommend against this
– Access to table disrupted twice: once for ADD, once for LOAD
– ADD will lock table, drain packages
– Entire table is read-only during most of LOAD processing


Figure 7-39. Alternatives for roll-in CL4636.0

Notes:
There are other practical methods for data roll-in with range-partitioned tables.
It is possible to predefine empty ranges in a range-partitioned table for later use by a LOAD
or INGEST utility or by applications using SQL INSERT. Having the necessary ranges in
place would avoid the locking concerns associated with the ALTER TABLE ATTACH. The
unused data partitions would only require a small amount of disk storage.
The ALTER TABLE ADD option could be used to create a new range, but like the ATTACH
the table availability would be impacted.
The LOAD utility can be invoked with a range-partitioned table as its target. There are no
LOAD options to limit its scope in a range-partitioned table, so even if all of the input data
was loaded into one data partition, the entire table would become read only during some of
the processing and be offline for load completion. When SET INTEGRITY is used following
an online LOAD, an exclusive table lock is required at the end of its processing.
Using the IMPORT utility or an application with SQL INSERT could provide continuous
table availability, but these methods do not perform as well as LOAD for large amounts of


data. If data arrives continuously rather than being added in large batches, then an
IMPORT or SQL INSERT would provide a good solution.
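A sketch of the predefined-range approach; the partition name, boundaries, and column
names are illustrative only:

  -- Define next quarter's range ahead of time
  ALTER TABLE sales ADD PARTITION sales_2013q3
    STARTING '07/01/2013' ENDING '09/30/2013';
  COMMIT;

  -- Data can then trickle in with ordinary SQL (or INGEST) as it arrives
  INSERT INTO sales (sale_date, region, product)
    VALUES ('07/02/2013', 'SW', 'chair');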


Instructor notes:
Purpose — This explains some of the alternative methods of data roll-in for
range-partitioned tables.
Details —
Additional information —
Transition statement — Let's next look at how refresh immediate MQTs would be handled
if the source table is a range-partitioned table.


Using Refresh Immediate MQTs with table
partitioning
• MQT contents automatically maintained by database when
insert/update/delete from base table

• Maintained by SET INTEGRITY after bulk operations:


– LOAD APPEND or ATTACH:
• New data initially invisible in base table
• MQTs continue to reflect visible data only
• Use SET INTEGRITY on base and MQTs to refresh
– DETACH:
• Data removed from table is instantly invisible
• MQTs go offline
• Use SET INTEGRITY to refresh MQTs


Figure 7-40. Using Refresh Immediate MQTs with table partitioning CL4636.0

Notes:
If a Materialized Query Table is defined as REFRESH IMMEDIATE then the contents of the
MQT are automatically maintained by the database when SQL INSERT, UPDATE and
DELETE statements change the base table or tables.
For range-partitioned tables, SET INTEGRITY is used to incrementally update refresh
immediate MQTs following a LOAD INSERT or ALTER TABLE ATTACH. In these two
cases, the new data is initially invisible in the base tables and the MQT would continue to
reflect the visible table contents. The SET INTEGRITY would list both the base table and
the MQTs for processing.
When an ALTER TABLE DETACH is used to remove a data partition, that partition's data is
instantly invisible to applications. Any REFRESH IMMEDIATE MQTs would be forced
offline by the DETACH. The MQTs would remain offline until a SET INTEGRITY is used to
refresh the MQTs. In this case, the SET INTEGRITY would list the MQTs but would not list
the base table.
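For example (a sketch; the table and MQT names are hypothetical), the two cases would be
handled differently:

  -- After LOAD INSERT or ALTER TABLE ATTACH: list the base table and its MQTs
  SET INTEGRITY FOR sales ALLOW WRITE ACCESS,
    sales_mqt ALLOW WRITE ACCESS
    IMMEDIATE CHECKED INCREMENTAL;
  COMMIT;

  -- After ALTER TABLE DETACH: list only the MQTs, not the base table
  SET INTEGRITY FOR sales_mqt IMMEDIATE CHECKED INCREMENTAL;
  COMMIT;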


Instructor notes:
Purpose — This explains the use of SET INTEGRITY to maintain the contents of
REFRESH IMMEDIATE MQTs using a range-partitioned table as a base. The use of
REFRESH DEFERRED MQTs is not discussed because those are updated using
REFRESH TABLE and generally have less associated complexity.
Details —
Additional information —
Transition statement — Let's discuss some tips for smoother data roll-in.



Tips for smoother roll-in (1 of 2)


• Issue COMMIT WORK after ATTACH, SET INTEGRITY:
– New data is not visible after SET INTEGRITY until committed

• SET LOCK TIMEOUT WAIT


– Prevent SET INTEGRITY from failing on lock conflict at the end

• Plan for query draining by ATTACH:


– ATTACH will not complete until it drains existing queries for the table
– Meanwhile, no new queries can start


Figure 7-41. Tips for smoother roll-in (1 of 2) CL4636.0

Notes:
Here are a few tips for making roll-in processing go smoother:
1. Issue a COMMIT WORK as soon as possible after the ALTER TABLE ATTACH and
SET INTEGRITY. The ATTACH holds a super-exclusive Z lock on the table that will be
released by commit. The data in the attached data partition will not be available to
applications until the SET INTEGRITY is committed.
2. Use SET LOCK TIMEOUT WAIT before the SET INTEGRITY. The processing for SET
INTEGRITY is a single unit of work. If the SET INTEGRITY were to encounter a lock
timeout, the processing would need to start over with a new SET INTEGRITY, so you
should use a SET LOCK TIMEOUT WAIT to avoid a lock timeout.
3. Plan for query draining required by ATTACH. The ALTER TABLE ATTACH processing is
short but the need for the super exclusive table lock, would wait for currently running
queries with locks on the table to commit and release those locks. During this time,
applications that need to acquire the table lock will be waiting. It would be best to plan a
time for the ALTER TABLE to be run when no long running queries are active.
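A minimal sketch of tips 1 and 2 together, using the special register form SET CURRENT
LOCK TIMEOUT (the table name is hypothetical):

  SET CURRENT LOCK TIMEOUT WAIT;  -- SET INTEGRITY waits rather than failing on a lock conflict
  SET INTEGRITY FOR Big_Table ALLOW WRITE ACCESS IMMEDIATE CHECKED;
  COMMIT;                         -- commit promptly so the new data becomes visible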


Instructor notes:
Purpose — This describes a few practices that will help the roll-in processing to go
smoothly.
Details —
Additional information —
Transition statement — Let's discuss some additional tips for smooth roll-in processing.



Tips for smoother roll-in (2 of 2)


• Use a single SET INTEGRITY statement:
– Include all refresh immediate MQTs and the base table in the same SET
INTEGRITY statement
– MQTs that are not refreshed in the first pass go off line
– Multiple SET INTEGRITY statements = multiple passes of the data

• Specify ALLOW WRITE ACCESS with SET INTEGRITY:


– The default is the old, offline behavior
– Also available: ALLOW READ ACCESS
– Tradeoff: higher availability options may run slower

• Make use of exception tables

• Consider doing roll-in and roll-out together (a.k.a. rotate)


– ATTACH and DETACH in the same transaction minimizes the time that the
table is unavailable


Figure 7-42. Tips for smoother roll-in (2 of 2) CL4636.0

Notes:
Here are some more tips for making roll-in processing go smoother:
1. Use a single SET INTEGRITY statement to process the base table and also include all
refresh immediate MQTs. This allows all the necessary processing to be performed with
a single pass of the attached or loaded data partition. Any REFRESH IMMEDIATE
MQTs not included in the SET INTEGRITY list will be forced offline.
2. Remember to specify ALLOW WRITE ACCESS with SET INTEGRITY, since the default
is ALLOW NO ACCESS. There is also an option to limit applications to read access
using ALLOW READ ACCESS. These options offer higher availability but the additional
locking might cause the SET INTEGRITY processing to take longer.
3. Include an exception table with the SET INTEGRITY for the base table, otherwise
errors like duplicate unique index keys would cause the SET INTEGRITY to fail and
require restarting the processing from the beginning.
4. Consider doing roll-in and roll-out together as part of the same processing cycle as
follows:

© Copyright IBM Corp. 2005, 2015 Unit 7. Table Partitioning 7-109


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

- Create a new empty table and use LOAD to load the new data.
- In a single transaction, do the ALTER TABLE DETACH and ALTER TABLE ATTACH
without a COMMIT between them. This would get both catalog updates processed
while the X-lock is held on the table.
- Run the SET INTEGRITY to check the new data partition and apply the changes to
refresh immediate MQTs for removing the old range and adding the new one.
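A sketch of the rotate pattern from tip 4; all object names here are hypothetical:

  -- One transaction: detach the oldest range and attach the new one
  ALTER TABLE sales DETACH PARTITION sales_2012q1 INTO TABLE sales_archive;
  ALTER TABLE sales
    ATTACH PARTITION STARTING '01/01/2013' ENDING '03/31/2013'
    FROM TABLE sales_2013q1_new;
  COMMIT;

  -- One pass of SET INTEGRITY covers the base table and its MQT
  SET INTEGRITY FOR sales ALLOW WRITE ACCESS,
    sales_mqt ALLOW WRITE ACCESS
    IMMEDIATE CHECKED INCREMENTAL
    FOR EXCEPTION IN sales USE sales_ex;
  COMMIT;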


Instructor notes:


Purpose — This describes some additional practices that will help the roll-in processing to
go smoothly.
Details —
Additional information —
Transition statement — Let's take a look at how generated and identity columns can be
handled for range-partitioned tables using LOAD and ATTACH for data roll-in.


Generated columns, identity columns


• For ATTACH, column type and nullability must match
• Columns that are generated/identity in target need not be so in
source
• After ATTACH, column will be generated/identity in target, with
the appropriate default
• Default behavior is to check values in generated/identity
columns
• Use SET INTEGRITY … FORCE GENERATED if you want the
generated column values filled in by the database
• Use SET INTEGRITY ….GENERATE IDENTITY to have the
identity column values filled in by the database
• Rows cannot move to a different range during SET
INTEGRITY

Figure 7-43. Generated columns, identity columns CL4636.0

Notes:
A range-partitioned table might contain generated and identity columns. The ALTER
TABLE ATTACH option takes a table and attaches that table as a new data partition. The
source table for the attach must have columns that match the columns of the target
range-partitioned table for the attach to be successful. Columns that are defined as
generated or identity in the target table do not have to be defined as generated or identity
in the source table, but the column data type and nullability options would need to match
the columns with the same name in the target table.
After the ATTACH, the column definitions will be based on those of the target
range-partitioned table. The SET INTEGRITY processing default option is to check the
contents of the columns defined as generated or identity for validity. The SET INTEGRITY
can be run with the options FORCE GENERATED to have the generated column values
filled in by the database during the set integrity processing. The option GENERATE
IDENTITY will cause any identity column values to be filled in. If a generated or identity
column is one of the columns being used for table partitioning and the value generated
during SET INTEGRITY would cause the row to belong in a different data partition, then the


row will not be moved; it will be removed and placed in the exception table, or an error
message will be written and the processing will stop.
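For example, a sketch with hypothetical names: a table orders containing a generated
column and an identity column, and an exception table orders_ex:

  SET INTEGRITY FOR orders ALLOW WRITE ACCESS IMMEDIATE CHECKED
    FORCE GENERATED GENERATE IDENTITY
    FOR EXCEPTION IN orders USE orders_ex;
  COMMIT;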


Instructor notes:
Purpose — This describes the processing options for generated and identity columns of a
range-partitioned table for the SET INTEGRITY following an ATTACH. LOAD and SQL
INSERT will handle the generated and identity columns in a normal way.
Details —
Additional information —
Transition statement — Let's take a look at how data roll-out can be accomplished for a
range-partitioned table.



Roll-out overview

• ALTER TABLE Big_Table
  DETACH PARTITION p3
  INTO TABLE OldMonthSales
  – Very fast operation
  – No data movement required
  – Index maintenance for non-partitioned indexes performed asynchronously in background
• DETACH is not allowed on a table that is the parent of an enforced referential
  integrity (RI) relationship.
• COMMIT:
  – Detached data now invisible
  – Detached partition ignored in non-partitioned index scans
  – Rest of Big_Table available
• SET INTEGRITY FOR Mqt1, Mqt2
  – (Optional) maintains MQTs on Big_Table
• EXPORT OldMonthSales; DROP OldMonthSales
  – (Optional) this becomes a standalone table that you can do whatever you want with

(Diagram: partition Big_Table.p3 in Table space C is detached from Big_Table and becomes
the stand-alone table OldMonthSales, leaving Big_Table.p1 and Big_Table.p2 in Table
spaces A and B.)

Figure 7-44. Roll-out overview CL4636.0

Notes:
The steps involved in a typical roll-out with table partitioning would be:
• Use the ALTER TABLE DETACH statement to remove the selected data partition:
ALTER TABLE Big_Table DETACH PARTITION p3
INTO TABLE OldMonthSales
This operation is very fast because no data movement is required. The index
maintenance for any non-partitioned indexes is deferred. Any partitioned indexes will be
defined as indexes for the detached table.
• COMMIT: The committing of the DETACH makes the detached data invisible to
applications using the base table.
• Applications doing a table scan would completely skip the detached data partition. The
index pointers for detached data partitions would be ignored by index scans. The other
data partitions would be available for application use.
• SET INTEGRITY FOR Mqt1, Mqt2: This optional step would be necessary to make any
dependent refresh immediate MQTs usable following the DETACH.

© Copyright IBM Corp. 2005, 2015 Unit 7. Table Partitioning 7-115


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

• At this point, the detached partition is available as a separate table. The table could
be dropped, exported and saved, or used independently as a new table. The table will not
have any non-partitioned indexes at this point; any partitioned indexes are carried over
as indexes on the detached table.
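The final optional step might look like this in the CLP; the export file name is
illustrative only:

  EXPORT TO old_month_sales.del OF DEL SELECT * FROM OldMonthSales;
  DROP TABLE OldMonthSales;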


Instructor notes:


Purpose — This should be used to explain the roll-out processing using ALTER TABLE
DETACH for a range-partitioned table.
Details —
Additional information —
Transition statement — Next we will review the impact of using non-partitioned indexes
for attaching or detaching data ranges.


Attach or Detach using non-partitioned indexes

• When a new data range is attached:
  – Index entries for new rows are added during SET INTEGRITY processing
• When a data range is detached:
  – Index entries for the detached range must be removed by ASYNC Index Cleanup
  – The detached table does not have any indexes (except MDC block indexes)

(Diagram: two non-partitioned indexes, Index 1 on Sales Date and Index 2 on Product ID in
index Table spaces 1 and 2, span all four quarterly data ranges, 2013 Q1 through Q4 in
Table spaces 1 through 4; ranges are attached to or detached from the table beneath them.)

Figure 7-45. Attach or Detach using non-partitioned indexes CL4636.0

Notes:
The non-partitioned indexes on a range-partitioned table contain pointers for rows in every
data range. When the ALTER TABLE ATTACH function is used to add a data range to an
existing range partitioned table, the index maintenance is deferred until the SET
INTEGRITY command processing is performed. The newly attached range is not
accessible until the SET INTEGRITY is committed. This index maintenance extends the
work performed by SET INTEGRITY and delays availability of the new data.
When the ALTER TABLE DETACH function is used to remove a data range from a
range-partitioned table that has non-partitioned indexes, an asynchronous task, called
Asynchronous Index Cleanup runs to physically remove all of the row pointers from the
non-partitioned indexes for the detached data range. While this processing does not delay
application access to the remaining data ranges, the index cleanup processing does take
system resources and might require index reorganization to reclaim unused index space.


Instructor notes:


Purpose — To review the impact of using non-partitioned indexes for range-partitioned
tables when data ranges are attached or detached.
Details —
Additional information —
Transition statement — Next we will see how the Asynch Index cleanup utility would
appear in a LIST UTILITIES report.


Asynchronous index cleanup after DETACH

• Asynchronous Index Cleanup performed for non-partitioned indexes:
  – Low priority, throttled, background process
  – Reclaims space in non-partitioned indexes (keys corresponding to data rolled-out)
  – Automatically started when DETACH is committed (or after refresh of dependent MQTs)
  – Pauses to avoid lock conflicts with user activity
  – Will NOT keep database active
  – Hardens progress periodically; picks up where it left off after shutdowns
• One cleaner per index

  $ DB2 LIST UTILITIES SHOW DETAIL
  ID                   = 3
  Type                 = ASYNCH INDEX CLEANUP
  Database Name        = WSDB
  Partition Number     = 0
  Description          = Table: T1, Index: I1
  Start Time           = 12/15/2010 11:15:01.978513
  State                = Executing
  Invocation Type      = Automatic
  Throttling:
     Priority          = 50
  Progress Monitoring:
     Total Work        = 5 pages
     Completed Work    = 0 pages
     Start Time        = 12/15/2010 11:15:01.980518

  $ DB2 SET UTIL_IMPACT_PRIORITY FOR 3 TO 90

Figure 7-46. Asynchronous index cleanup after DETACH CL4636.0

Notes:
Asynchronous index cleanup
Asynchronous index cleanup (AIC) is the deferred cleanup of indexes following operations
that invalidate index entries. Depending on the type of index, the entries might be row
identifiers (RIDs) or block identifiers (BIDs). Either way, these entries are removed by the
index cleaners which operate asynchronously in the background.
AIC accelerates the detach of a data partition from a partitioned table. If the partitioned
table contains one or more non-partitioned indexes then AIC is initiated. In this case, AIC
removes all non-partitioned index entries that refer to the detached data partition and any
pseudo-deleted entries. Once all the indexes have been cleaned, the identifier associated
with the detached data partition is removed from the system catalog.


Note

If the partitioned table has dependent materialized query tables (MQTs) defined,
AIC is not initiated until after a SET INTEGRITY operation is performed.

While AIC is in progress, normal table access is maintained. Queries accessing the
indexes simply ignore any invalid entries that have not yet been cleaned.
In most cases, one cleaner is started for each non-partitioned index associated with the
partitioned table. An internal task distribution daemon is responsible for distributing the AIC
tasks to the appropriate database partitions and assigning database agents.
Both the distribution daemon and cleaner agents are internal, system applications. They
appear in the LIST APPLICATIONS output with the application names db2taskd and db2aic,
respectively. To prevent accidental disruption, system applications cannot be forced. The
distribution daemon remains online as long as the database is active. The cleaners remain
active until the cleaning is complete. If the database deactivates while cleaning is in
progress, AIC resumes when the database reactivates.
AIC incurs minimal performance impact.
An instantaneous row lock test is required to determine whether a pseudo-deleted entry is
committed. However, since the lock is never acquired, concurrency is not affected.
Each cleaner acquires a minimal table space lock (IX) and table lock (IS); the locks are
released when the cleaner determines other applications are waiting for the locks. When
this occurs, the cleaner temporarily suspends processing for five minutes.
Cleaners are also integrated with the utility throttling facility. By default, each cleaner has a
utility impact priority of 50. This priority can be changed using the SET
UTIL_IMPACT_PRIORITY command or the db2UtilityControl API.
Monitoring
While AIC is in progress, it can be monitored with the LIST UTILITIES command. Each
index cleaner appears in the monitor as a separate utility.


Instructor notes:
Purpose — This describes the processing, called Asynchronous Index Cleanup, that is
used to remove the index pointers for detached data partitions. This processing was
implemented to minimize the time it would take for a DETACH to complete, and increase
table availability.
Details —
Additional information —
Transition statement — Next we will look at use of partitioned indexes when the ALTER
TABLE ATTACH or DETACH is used for a range-partitioned table with partitioned indexes.



Attach or Detach using partitioned indexes

• When a new data range is attached:
  – Reduced SET INTEGRITY processing; if matching indexes exist on the attached table,
    no indexes are built for the new range during SET INTEGRITY processing.
  – ERROR ON MISSING INDEXES option causes ATTACH to fail if the source table does not
    have matching indexes. By default, any missing indexes will be created.
• When a data range is detached:
  – Index entries for the detached range are assigned to the detached table; ASYNC index
    processing is not needed.
  – Partitioned indexes are retained and assigned default names during detach.

(Diagram: each of the four quarterly data ranges, 2013 Q1 through Q4 in Table spaces 1
through 4, carries its own partitioned Index 1 on Sales Date and Index 2 on Product ID;
the index partitions travel with the range on attach or detach.)

Figure 7-47. Attach or Detach using partitioned indexes CL4636.0

Notes:
You can use partitioned indexes to improve performance when you roll data into a table.
Before you alter a partitioned table that uses partitioned indexes to attach a new partition or
a new source table, you should create indexes on the table that you are attaching to match
the partitioned indexes of the partitioned table. After attaching the source table, you still
must issue a SET INTEGRITY statement to perform tasks, such as range validation and
constraint checking. However, if the source tables indexes match all of the partitioned
indexes on the target table, SET INTEGRITY processing does not incur the performance
and logging overhead associated with index maintenance. The newly rolled-in data is
accessible quicker than it would otherwise be.
All index key columns of the partitioned index on the target table must match with the index
key columns of the index on the source table. If all other properties of the index are the
same, then the index on the source table is considered a match to the partitioned index on
the target table. That is, the index on the source table can be used as an index on the
target table. The table here can be used to determine if the indexes are considered a match
or not.


The attach operation implicitly builds missing indexes on the source table corresponding to
the partitioned indexes on the target table. The implicit creation of the missing indexes
does take time to complete. You have an option to create an error condition if the attach
operation encounters any missing indexes. The option is called ERROR ON MISSING
INDEXES and is one of the attach operation options. The error returned when this happens
is SQL20307N, SQLSTATE 428GE, reason code 18. Information on the non-matching
indexes is placed in the administration log.
The attach operation drops indexes on the source table that do not match the partitioned
indexes on the target table. The identification and dropping of these non-matching indexes
takes time to complete. You should drop these indexes before attempting the attach
operation.
When the ALTER TABLE DETACH is used for a range-partitioned table with partitioned
indexes, each of the index partitions defined on the source table for the data partition being
detached becomes an index on the target table. The index object is not physically moved
during the detach operation. However, the metadata for the index partitions of the table
partition being detached are removed from the catalog table SYSINDEXPARTITIONS and
new index entries are added in SYSINDEXES for the new table as a result of the detach
operation. The original index identifier (IID) is kept and stays unique just as it was on the
source table.
The index names for the surviving indexes on the target table are system-generated (using
the form SQLyymmddhhmmssxxx). The schema for these indexes is the same as the
schema of the target table except for any path indexes, regions indexes, and MDC Block
indexes, which are in the SYSIBM schema. Other system-generated indexes like those to
enforce unique and primary key constraints will have a schema of the target table because
the indexes are carried over to the detached table but the constraints are not. You can use
the RENAME command to rename the indexes that are not in the SYSIBM schema.
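A sketch of preparing a source table for a fast attach; the index, column, and table
names are hypothetical, and the assumption is that the index definitions match the
partitioned indexes already defined on the target:

  -- Build matching indexes on the staged table ahead of the attach
  CREATE INDEX new_q4_ix1 ON NewQuarter (sales_date);
  CREATE INDEX new_q4_ix2 ON NewQuarter (product_id);

  -- With matching indexes, SET INTEGRITY avoids partitioned-index maintenance
  ALTER TABLE sales
    ATTACH PARTITION STARTING '10/01/2013' ENDING '12/31/2013'
    FROM TABLE NewQuarter;
  COMMIT;
  SET INTEGRITY FOR sales ALLOW WRITE ACCESS IMMEDIATE CHECKED;
  COMMIT;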


Instructor notes:


Purpose — To provide additional information about the advantages of using partitioned
indexes when the ALTER TABLE ATTACH or DETACH are executed.
Details —
Additional information —
Transition statement — Let's take a look at the availability of the partitioned table during
the roll-out of data.


Table availability during ALTER TABLE Detach


• Prior to DB2 9.7 Fix Pack 1 a Super-exclusive Z Lock was needed for
the DETACH to execute

• ALTER TABLE...DETACH PARTITION statement with DB2 9.7 Fix Pack 1 or later
– Queries can continue to access the unaffected data partitions of the table during a
roll-out operation
• The DETACH operation does not wait for dynamic uncommitted read (UR) isolation
level queries
• DETACH does not interrupt any currently running dynamic UR queries
• If Dynamic non-UR queries (read or write queries) have not locked the partition to be
detached, the DETACH operation can run concurrently
• If dynamic non-UR queries have locked the partition to be detached, the DETACH
operation waits for the lock to be released
• Hard invalidation must occur on all static packages that are dependent on the table
before the DETACH operation can proceed.


Figure 7-48. Table availability during ALTER TABLE Detach CL4636.0

Notes:
Beginning with DB2 Version 9.7 Fix Pack 1 and later fix packs, when detaching a data
partition of a partitioned table, queries can continue to access the unaffected data partitions
of the table during a roll-out operation initiated by the ALTER TABLE...DETACH
PARTITION statement.
When detaching a data partition from a partitioned table using the ALTER TABLE
statement with the DETACH PARTITION clause, the source partitioned table remains
online, and queries running against the table continue to run. The data partition being
detached is converted into a stand-alone table in the following two-phase process:
1. The ALTER TABLE...DETACH PARTITION operation logically detaches the data partition
from the partitioned table.
2. An asynchronous partition detach task converts the logically detached partition into a
stand-alone table.
If there are any dependent tables that need to be incrementally maintained with respect to
the detached data partition (these dependent tables are referred to as detached dependent


tables), the asynchronous partition detach task starts only after the SET INTEGRITY
statement is run on all detached dependent tables.
In absence of detached dependents, the asynchronous partition detach task starts after the
transaction issuing the ALTER TABLE...DETACH PARTITION statement commits.
The ALTER TABLE...DETACH PARTITION operation performs in the following manner:
• The DETACH operation does not wait for dynamic uncommitted read (UR) isolation
level queries before it proceeds, nor does it interrupt any currently running dynamic UR
queries. This behavior occurs even when the UR query is accessing the partition being
detached.
• If dynamic non-UR queries (read or write queries) have not locked the partition to be
detached, the DETACH operation can complete while dynamic non-UR queries are
running against the table.
• If dynamic non-UR queries have locked the partition to be detached, the DETACH
operation waits for the lock to be released.
• Hard invalidation must occur on all static packages that are dependent on the table
before the DETACH operation can proceed.
• The following restrictions that apply to data definition language (DDL) statements also
apply to a DETACH operation because DETACH requires catalogs to be updated:
- New queries cannot be compiled against the table.
- A bind or rebind cannot be performed on queries that run against the table.
- To minimize the impact of these restrictions, issue a COMMIT immediately after a
DETACH operation.


Instructor notes:
Purpose — This describes impact to availability of the table during the roll-out processing
for a range-partitioned table.
Details —
Additional information —
Transition statement — Let's take a look at how the ALTER TABLE DETACH affects
REFRESH IMMEDIATE MQTs.



MQTs after DETACH

• Refresh Immediate MQTs go offline after DETACH
• Use SET INTEGRITY to refresh them
• Target table of DETACH is untouchable until MQTs are dealt with:
  – SYSCAT.DATAPARTITIONS shows 'D' in STATUS field for these
  – SYSCAT.TABLES shows 'L' for table type for these
• In absence of MQTs, target is immediately available
• Target can be made available via new SET INTEGRITY option
  SET INTEGRITY … FULL ACCESS
  Note: This forces MQTs to be fully processed.

Figure 7-49. MQTs and DETACH CL4636.0

Notes:
Refresh Immediate MQTs that are dependent on a range-partitioned table go offline when
the ALTER TABLE DETACH is used to remove partitions. The detached table is not accessible
until the dependent MQTs are incrementally refreshed by SET INTEGRITY. The DB2 catalog
table SYSCAT.DATAPARTITIONS shows 'D' in the STATUS field for these partitions, and
SYSCAT.TABLES shows 'L' for the table type. In the absence of MQTs, the detached target
table is immediately available. There is also a SET INTEGRITY option, FULL ACCESS, that
can be used to force all of the dependent MQTs to be fully processed.
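For example, a sketch with hypothetical names: either refresh the dependent MQT, or force
full access on the base table at the cost of full MQT processing later:

  -- Incrementally refresh the dependent MQT; the detached table then becomes usable
  SET INTEGRITY FOR sales_mqt IMMEDIATE CHECKED INCREMENTAL;
  COMMIT;

  -- Or force availability right away; dependent MQTs must then be fully processed
  SET INTEGRITY FOR sales FULL ACCESS;
  COMMIT;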


Instructor notes:
Purpose — This describes the handling of the detached data partition table when the base
table has dependent Refresh Immediate Materialized Query Tables.
Details —
Additional information —
Transition statement — Let's take a look at how DB2 utilities support range-partitioned
tables.



Utility support for partitioned tables


• REORG TABLE:
– By default All Partitions reorganized serially
– No INPLACE reorgs, OFFLINE reorg with ALLOW READ ACCESS allowed
– The ON DATA PARTITION clause can be used to reorg one data partition
• If only Partitioned indexes are defined other partitions may have full read/write
• Multiple concurrent partition level reorgs are allowed

• REORG INDEX:
– Individual Non-partitioned Indexes can be reorganized
– No ALLOW WRITE ACCESS allowed
– The ON DATA PARTITION clause can be used to reorg the partitioned indexes for one data
partition
• RUNSTATS:
– Statistics are collected for ALL partitions
– Can use TABLESAMPLE to reduce I/O and CPU costs
• ROLLFORWARD to Point In Time
– Requires that all table spaces sharing pieces of a table be rolled forward together
• LOAD:
– Supports Partitioned table as target
– Can not limit Load to selected partition(s), limited access during load
– Supports all INDEXING modes: REBUILD, INCREMENTAL, DEFERRED


Figure 7-50. Utility support for partitioned tables CL4636.0

Notes:
There are some special considerations for using the DB2 utilities with range-partitioned
tables.
• The REORG utility can be used to reorganize a range-partitioned table, but the INPLACE
mode of reorganization is not supported. By default, all of the data partitions will be
reorganized serially, which would increase the elapsed time for a large table.
Beginning with DB2 9.7 Fix pack 1, the ON DATA PARTITION clause specifying a
partition of the table supports the following features:
- REORG TABLE performs a classic table reorganization on the specified data
partition while allowing the other data partitions of the table to be fully accessible for
read and write operations when there are no nonpartitioned indexes (other than
system-generated XML path indexes) on the table.
- The supported access modes on the partition being reorganized are ALLOW NO
ACCESS and ALLOW READ ACCESS. When there are nonpartitioned indexes on
the table (other than system-generated XML path indexes), the ALLOW NO
ACCESS mode is the default and the only supported access mode for the entire


table.
- REORG INDEXES ALL performs an index reorganization on a specified data partition while
allowing full read and write access to the remaining data partitions of the table. All
access modes are supported.
• The REORG utility can be used to reorganize a single non-partitioned index of a
range-partitioned table rather than needing to reorganize all of the indexes. ALLOW
WRITE ACCESS for index reorganization is not supported for indexes on range-partitioned
tables; read-only access is allowed when ALLOW READ ACCESS is specified.
• The RUNSTATS utility is used to update the catalog statistics for range-partitioned
tables. New statistics will be collected for all data partitions. There are no options to
collect statistics for selected data partitions. The option TABLESAMPLE could be used
to reduce the system resources required to collect new statistics for large
range-partitioned tables.
• When performing a table space point-in-time recovery using the ROLLFORWARD
command if any of the table spaces being rolled forward contains any portion, of a
range-partitioned table, then all of the table spaces sharing pieces of a table must be
rolled forward together. This would apply to the data, indexes and large objects. This
means that the range-partitioned table would need to be recovered to bring all of the
data partitions to the same point in time. This is consistent with the way the data and
index components of non-partitioned tables are handled.
• The LOAD utility can be used with a range-partitioned table as its target, but there is no
way to limit the impact of performing the load to selected data partitions. The input data
might only contain data that would be added to one partition, but the locking will occur at
the table level, which will impact application access to the table during load processing.
A LOAD run on a table before the ALTER TABLE ATTACH command is issued would
not cause the same availability problems. All of the standard indexing modes are
supported, including REBUILD, INCREMENTAL, DEFERRED.
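As quick illustrations of the partition-level utility options described above (the
schema, table, and partition names are hypothetical):

  -- Classic reorg of a single data partition
  REORG TABLE dba1.sales ON DATA PARTITION sales_2013q1;

  -- Index reorg for one partition, with the rest of the table available
  REORG INDEXES ALL FOR TABLE dba1.sales ON DATA PARTITION sales_2013q1;

  -- Sampled statistics to reduce I/O and CPU on a large table
  RUNSTATS ON TABLE dba1.sales AND INDEXES ALL TABLESAMPLE SYSTEM (10);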


Instructor notes:


Purpose — This describes some of the limitations and considerations for using the DB2
utilities with range-partitioned tables.
Details —
Additional information —
Transition statement — Let's take a look at an example of creating a table using all three
features.


Table partitioning with Database Partitions and MDC dimensions defined

CREATE TABLE CUST_ORDERS
  (CUSTNUM CHAR(12), SALES_DATE DATE, PRODUCT CHAR(10), STORE_NUM INT, …)
  IN Tablespace A, Tablespace B, Tablespace C …
  INDEX IN Tablespace B
  DISTRIBUTE BY HASH (CUSTNUM)
  PARTITION BY RANGE (SALES_DATE) (STARTING FROM '1/1/2004'
    ENDING '12/31/2008' EVERY 3 MONTHS)
  ORGANIZE BY DIMENSIONS (PRODUCT, STORE_NUM)

(Diagram: each data row is distributed via hash to a database partition; within each
database partition the table is partitioned by range across Table spaces A, B, and C, and
each range is organized by the PRODUCT and STORE_NUM MDC dimensions.)

Figure 7-51. Table partitioning with Database Partitions and MDC Defined CL4636.0

Notes:
Look at the following example of a CREATE TABLE statement that defines table
partitioning, DPF database partitioning and multidimensional clustering.
CREATE TABLE CUST_ORDERS
(CUSTNUM CHAR(12), SALES_DATE DATE, PRODUCT CHAR(10),
STORE_NUM INT, …)
IN Tablespace A, Tablespace B, Tablespace C …
INDEX IN Tablespace B
DISTRIBUTE BY HASH (CUSTNUM)
PARTITION BY RANGE (SALES_DATE)
(STARTING FROM '1/1/2004' ENDING '12/31/2008'
EVERY 3 MONTHS )
ORGANIZE BY DIMENSIONS (PRODUCT,STORE_NUM)
This would create a new table CUST_ORDERS with the following characteristics:
• The column CUSTNUM, defined in the DISTRIBUTE BY HASH clause, would be used
for DPF database partitioning. The number of database partitions would be a


characteristic of the database partition group for the table spaces. All of the table
spaces would need to be in the same database partition group. The CUSTNUM column
might have been selected because there are a large number of distinct customer
numbers. All of the information for a single customer would be stored on a single
partition.
• The column SALES_DATE, defined in the PARTITION BY RANGE clause, would be
used for table partitioning. The short syntax would cause the database to define a
series of data partitions, each holding three months of data for the defined date range.
The SALES_DATE column might have been selected because the data is roll-in on a
quarterly basis.
• The columns PRODUCT and STORE_NUM, defined in the ORGANIZE BY
DIMENSIONS clause, would be used to define the multidimensional clustering
dimensions, which would create the block indexes for this table. The columns
PRODUCT and STORE_NUM might have been selected as MDC dimensions to
support the reporting requirements by specific store and product numbers.


Instructor notes:
Purpose — This shows a specific example of a CREATE TABLE that uses all three
features, table partitioning, database partitioning and MDC. In this case the columns used
for each feature are different. This is not a requirement, but it helps to keep the example
simple and more clear.
Details —
Additional information —
Transition statement — Let's take a look at a query using table partitioning with MDC
dimensions and how the access plan for a query could combine partition elimination and
effective use of MDC block indexing.


Simultaneous partition elimination and block elimination

CREATE TABLE sales(…)
  PARTITION BY (sale_date)
  (STARTING '01/01/2004'
   ENDING '12/31/2008'
   EVERY 3 MONTHS)
  ORGANIZE BY (region, product)

SELECT * FROM sales
  WHERE sale_date > '04/01/2004'
    AND sale_date < '06/01/2004'
    AND region = 'SW'

(Diagram: a block index scan touches only the Q2 2004 data partition, and within it only
the cells for region SW; the other quarterly partitions, Q1 2004 through Q4 2008, and the
other region/product cells are eliminated.)

Figure 7-52. Simultaneous partition elimination and block elimination CL4636.0

Notes:
In this example, a table was created that combined table partitions with multidimensional
clustering as follows:
CREATE TABLE sales(...)
PARTITION BY (sale_date)
(STARTING '01/01/2004' ENDING '12/31/2008' EVERY 3 MONTHS)
ORGANIZE BY (region, product)
A query runs with the following predicates:
SELECT * FROM sales
WHERE sale_date > '04/01/2004' AND sale_date < '06/01/2004'
AND region = 'SW'
The query includes a predicate with a range of dates for the sale_date column. The table
was defined with table partitions, each holding three months of data based on the column,
so the DB2 Optimizer would be able to determine that all of the rows that could possibly

© Copyright IBM Corp. 2005, 2015 Unit 7. Table Partitioning 7-137


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

match this predicate would come from a single data partition. All of the other data partitions
would be eliminated.
The query also contains a predicate on the region column, which is one of the two columns
defined as MDC dimensions. The block index for the region column could be efficiently
used to locate the slice of data within the one data partition that would match this second
predicate. The block index entries would each contain the partition IDs as part of the index
pointer.
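
To verify that a given query benefits from both techniques, the access plan can be captured and formatted with db2exfmt. A minimal sketch, assuming a database named SAMPLE in which the explain tables have already been created (for example, from the EXPLAIN.DDL script shipped in sqllib/misc):

db2 connect to sample
db2 "explain plan for select * from sales
     where sale_date > '04/01/2004' and sale_date < '06/01/2004'
     and region = 'SW'"
db2exfmt -d sample -1 -o sales_plan.txt

In the formatted plan, the details for the scan operators typically include a DP Elim Predicates section showing the ranges used for data partition elimination, and the block index on the region dimension would appear as the access method within the surviving partition.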


Instructor notes:


Purpose — This shows an example of using table partitioning combined with MDC
dimensions to efficiently produce a query result from a large table.
Details —
Additional information —
Transition statement — Let's compare using MDC and table partitioning for data roll-in.


Using Table partitioning or MDC dimensions for efficient data roll-in

Need to Address (roll-in)                                      Recommendation
Roll-in or roll-out a month or more of data
   ...during a traditional offline window                      Partitioning
   ...during a small processing window (less than 1 minute)    Partitioning
   ...while keeping the table continuously available           MDC
Roll-in a week or more of data to tables with
   dependent IMMEDIATE REFRESH MQTs                            Partitioning
Load data daily (either online or offline)                     Both together
Load data continually (online)                                 Both together


Figure 7-53. Table partitioning + MDC (roll-in) CL4636.0

Notes:
Deciding whether to use table partitioning or MDC or both depends on what problems you
most need to address for the application.
This slide compares and contrasts table partitioning with MDC, focusing on issues with
roll-in.
If the application requires rolling in large batches of data and there is some window of time
when the table can be offline, then table partitioning is probably the better choice.
Table partitioning allows the roll-in of data with a very brief period where the table is
unavailable, which could be as short as a few seconds. If this functionality is appropriate for
the application, then table partitioning is clearly the better choice.
If the application cannot tolerate having the table offline for even one minute when rolling
in a batch of new data, then table partitioning offers less advantage in the area of roll-in.
However, there are other reasons to use table partitioning. Table partitioning is particularly
well suited for roll-in scenarios involving IMMEDIATE REFRESH MQTs.


If the new data is added to the table continuously, then the ATTACH and DETACH are not
applicable.
If the application does roll-in of a new batch every day, then table partitioning might or
might not be a good choice. Although table partitioning allows 32,000 partitions to be
defined, most applications of table partitioning will not use more than a few hundred
partitions.
Table partitioning offers a capability for very large tables across multiple table spaces that
MDC alone does not address. The combination of table partitioning and MDC might be
appropriate for some tables. The table partitioning ranges might be on a monthly basis,
while an MDC dimension could divide data for each single day.
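
A combined definition along those lines might look like the following sketch; the table and column names are hypothetical, and SALES_DAY is assumed to be populated with the day of the month by the application:

CREATE TABLE SALES_HIST
   (SALES_DATE DATE NOT NULL,
    SALES_DAY INT NOT NULL,
    STORE_NUM INT,
    AMOUNT DECIMAL(11,2))
   PARTITION BY RANGE (SALES_DATE)
     (STARTING '1/1/2014' ENDING '12/31/2014' EVERY 1 MONTH)
   ORGANIZE BY DIMENSIONS (SALES_DAY)

Each monthly data partition would contain up to 31 MDC cells, one per day, so DETACH can roll out a month at a time while the block index on SALES_DAY keeps single-day access efficient.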


Instructor notes:
Purpose — This compares the selection of table partitioning and MDC based on the
application requirements, with the primary criterion being how data is added to the table.
Details —
Additional information —
Transition statement — Finally, let's consider the selection of table partitioning compared
to database partitioning.


Using table partitioning or Database partitioning

Need to Address              Best                Explanation
Table capacity               Table Partitioning  Table Partitioning is simpler to set up
                                                 and maintain.
Parallel query execution     Database            Database partitioning provides high
(query performance)          partitions          query parallelism.
Partition elimination        Table Partitioning  Table partitioning provides partition
(query performance)                              elimination.
Maximum query performance    Both                Query parallelism and partition
                                                 elimination are complementary; use
                                                 both for maximal query performance.


Figure 7-54. Using table partitioning or Database partitioning CL4636.0

Notes:
Both table partitioning and database partitioning using InfoSphere Balanced Warehouse
offer solutions for very large tables. Prior to DB2 9.1, database partitioning was the primary
solution for applications that required access to very large tables. If the only issue was
table size, then table partitioning would be a preferred solution because it is simpler to
set up and use, especially if the application only has a few very large tables.
Database partitioning continues to offer unique strengths in environments where query
parallelism is needed to produce better query performance. Database partitioning is
the only option that allows the CPU, memory, and disk I/O resources of multiple systems to
be combined to work on a single query.
Table partitioning can benefit query performance when the query includes range or equality
predicates on the columns used to divide the table data into data partitions. This enables
the DB2 Optimizer to utilize partition elimination for table and index scans.
The query parallelism and partition elimination benefits can be combined when table
partitioning is implemented for a table in a partitioned database.


The shared nothing architecture used for DB2 partitioned databases has some important
implications for administration of very large tables. The recovery utilities, Backup and
Restore, work at the database partition level, so a damaged partition can be recovered
leaving the other database partitions available for application access. The REORG utility
can be used to reorganize a subset of a table on selected database partitions. These allow
the large table to be handled for scheduled backups and reorganization in more
manageable amounts.
The use of table partitioning to divide a table across multiple table spaces does offer DBAs
the option to backup or restore parts of a range-partitioned table, using table space
backups.
Beginning with DB2 9.7 Fix Pack 1, a single data partition of a range partitioned table can
be reorganized using the REORG utility.
The RUNSTATS utility only allows statistics collection for one database partition. Table data
on the other database partitions are assumed to have similar characteristics. This might
reduce the system resources to collect statistics, but the statistics might not accurately
reflect the entire table. Using RUNSTATS for a range-partitioned table collects statistics on
all of the data partitions.


Instructor notes:


Purpose — This describes some advantages for selecting either table partitioning, DB2
database partitioning, or combining the two.
Details —
Additional information —
Transition statement — Let's summarize what we have learned about table partitioning.


Unit summary
• Describe the alternative options for handling data roll-in and roll-out
including database partitioning, Multi-Dimensional Clustering (MDC)
and UNION ALL views
• Describe the basic concepts for range-based table partitioning,
including partitioned and non-partitioned indexing and multiple table
spaces
• Define the data partition ranges for a table using the short and long form
syntax
• List the steps used for data roll-in and roll-out for table partitioning,
including ATTACH, DETACH and ADD for data partitions
• Plan the use of SET INTEGRITY as part of the roll-in and roll-out
processing for range-partitioned tables
• Describe the maintenance for refresh immediate materialized query
tables (MQT) when used with table partitioning
• Select between table partitioning, MDC clustering, and database
partitioning depending on the application and data characteristics

Figure 7-55. Unit summary CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement — End of unit.


Student exercise 6


Figure 7-56. Student exercise 6 CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


Unit 8. Advanced Table Reorganization

Estimated time
02:30

What this unit is about


This unit describes the concepts and processing for the DB2 REORG
utility. The differences between online and offline table reorganizations
are examined in detail. The processing for Index Create and Index
reorganization is also described. Students learn the DB2 commands
and SQL queries that can be used to track the progress of REORG
utilities. This unit also covers the methods for determining which tables
and indexes would benefit from reorganization, including using the
REORGCHK report. One section will cover the role of the REORG
utility for implementing data and index compression.

What you should be able to do


After completing this unit, you should be able to:
• Describe the reasons for reorganizing tables and indexes
• Examine a REORGCHK report to determine which tables and
indexes to reorganize
• Use the db2pd command to monitor REORG utility progress
• Utilize the REORG utility to implement compression for tables and
indexes
• Compare using REORG to build a compression dictionary to
automatic dictionary creation
• Plan the use of offline and online table and index reorganizations to
minimize the impact to applications and optimize performance
• Utilize the RECLAIM EXTENTS option of REORG to free unused
space in data and indexes with minimal processing
• Describe the locking and logging required for online and offline
REORGs


Unit objectives
After completing this unit, you should be able to:
• Describe the reasons for reorganizing tables and indexes
• Examine a REORGCHK report to determine which tables and indexes
to reorganize
• Use the db2pd command to monitor REORG utility progress
• Utilize the REORG utility to implement compression for tables and
indexes
• Compare using REORG to build a compression dictionary to automatic
dictionary creation
• Plan the use of offline and online table and index reorganizations to
minimize the impact to applications and optimize performance
• Utilize the RECLAIM EXTENTS option of REORG to free unused space
in data and indexes with minimal processing
• Describe the locking and logging required for online and offline
REORGs

Figure 8-1. Unit objectives CL4636.0

Notes:
Here are the objectives for this unit.


Instructor notes:


Purpose — Identify the objectives of this unit.
Details — State the objectives.
Additional information —
Transition statement —


8.1. Advanced Table Reorganization

Instructor topic introduction


What students will do —
How students will do it —
What students will learn —
How this will help students on their job —


DB2 reorganization
• In DB2 for Linux, UNIX and Windows, the objects that can be reorganized by a
user are tables and indexes
• Many changes to table data (INSERTs/UPDATEs/DELETEs) can affect the
physical organization of table and index data to the point where performance is
adversely affected.
• The goal of reorganization is to improve SQL query performance
– Reclaim fragmented space into physically contiguous pages within an object
– Improve the physical clustering of object data.
• The result is that a reorganized object can be accessed with minimal I/O and buffer pool
misses and with maximum prefetcher effectiveness; that is, query performance is
maintained or improved.


Figure 8-2. DB2 reorganization CL4636.0

Notes:
In DB2 for Linux, UNIX, and Windows environments, the database administrator can use
the REORG Utility to reorganize Tables and Indexes.
The goal of reorganization is to improve SQL performance. The changes to the contents of
a table resulting from SQL Inserts, Updates and Deletes can cause a table to reach a point
where the table contains many more pages than are necessary to hold the current rows. If,
for example, all of the data for one month or one year has just been deleted from a table
that holds a rolling history, the table might contain a number of pages that are only partially
filled and some pages might not contain any rows. That space could be used, over time, to
insert new data, but a reorganization of the table could reduce the size of the table, which
could reduce the number of I/Os that would be needed to scan the table and improve
application performance.
Inserting and deleting rows can also affect how efficiently the indexes on a table are
structured. Some applications might experience reduced performance when the physical
sequence of the data does not match the clustering index. The REORG utility can be used
to rearrange the data rows to match the sequence of a clustering index.


Instructor notes:


Purpose — To introduce the term reorganization in DB2 as it applies to tables, indexes and
pages.
Details —
Additional information —
Transition statement — Let's define pointer overflow records and why they can occur in a
DB2 database.


Overflow records
• Overflow records are created by an update that enlarges an existing
row such that it no longer fits on its page:
– The record is inserted on another page (where it fits) as an overflow record
– The original record is converted to a pointer record which only contains the
overflow record's RID
– Indexes keep the original RID; so an extra page read is required to access the
data
– If possible, avoid updates that enlarge records; otherwise pointer/overflow
records may be created and performance reduced

• Offline Table REORG eliminates these overflow records


– CLEANUP OVERFLOWS option provides overflow elimination in an online
reorg

• Query using OVERFLOW statistic returned by SYSCAT.TABLES or run


REORGCHK
• Monitor new overflow creation using MON_GET_TABLE


Figure 8-3. Overflow records CL4636.0

Notes:
Row data can overflow when VARCHAR columns are updated with values that are longer
than the initial values and there is insufficient space in the data page to hold the longer row
data. In such cases, a pointer is kept at the original location in the row and the actual value
is stored in another location that is indicated by the pointer. This can impact performance
because the database manager must follow the pointer to find the contents of the row. This
two-step process increases the processing time and might also increase the number of
I/Os required.
Reorganizing the table data will eliminate the row overflows; therefore, as the number of
overflow rows increases, the potential benefit of reorganizing your table data increases.
The RUNSTATS utility counts the number of overflow rows in a table and saves the count
in the OVERFLOW column in SYSCAT.TABLES.
The table function MON_GET_TABLE can be used to monitor the creation of new overflow
records for each table of an active DB2 database.
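
For example, both indicators can be checked with queries along these lines; this is a sketch, and the AUTO schema filter is an assumption:

SELECT TABNAME, CARD, OVERFLOW
  FROM SYSCAT.TABLES
  WHERE TABSCHEMA = 'AUTO' AND OVERFLOW > 0

SELECT VARCHAR(TABNAME, 20) AS TABNAME,
       OVERFLOW_CREATES, OVERFLOW_ACCESSES
  FROM TABLE(MON_GET_TABLE('AUTO', NULL, -2)) AS T

The catalog OVERFLOW column reflects the count as of the last RUNSTATS, while the MON_GET_TABLE elements report overflow activity since the database was activated.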


Note

The CLEANUP OVERFLOWS option was added with DB2 10.5 as a suboption of
INPLACE reorg to remove pointer overflow records but bypass other row movement
processing. This option will be covered later in the discussion on INPLACE reorganization.
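
As a preview, an invocation along the following lines (a sketch; the table name is illustrative) removes the overflow pointers without the row movement of a full reorganization:

db2 reorg table prod.payroll inplace allow write access cleanup overflows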


Instructor notes:
Purpose — To define the term overflow record/row to understand why the reorg might be
necessary to remove the overflow pointers and improve table access efficiency.
Details —
Additional information —
Transition statement — Next we will discuss the basic goals of the REORG utility.


Goals of the REORG utility


• Table REORG goals:
– Defragment or compact data onto fewer data pages
– Physically recluster data into the same logical sequence as an index
– Eliminate pointer-overflow records
– Create or rebuild the dictionary used for row compression and compress the
table data

• Index REORG goals:


– Remove fragmentation (when index pages, due to frequent updates, become
mostly empty)
– Improve physical clustering (the degree to which the physical ordering of the
index leaf pages matches the order of the key values contained within those
pages - this is important for sequential prefetching to work efficiently; otherwise,
index scan performance degrades)
– Reduce number of levels in an index
– Remove pseudo-deleted rows and pseudo-deleted pages
– Rebuild indexes after changing the index compression option
– Reclaim disk space used by dropped indexes


Figure 8-4. Goals of the REORG utility CL4636.0

Notes:
The goals of Table reorganization are:
1. Defragment or compact data onto fewer data pages – If many rows have been
deleted, the REORG Utility might be able to compact the remaining rows and use fewer
pages.
2. Physically recluster data into the same logical sequence as an index – One option
of the REORG Utility is to arrange the data rows to match the sequence of an index.
3. Eliminate pointer-overflow records – The pointer-overflow records that were created
when rows were moved to a different page, because of an increase in row size, will be
removed during a reorganization.
4. Create or Rebuild the Dictionary used for row compression and compress the
table data – An offline REORG utility can create the dictionary data required for row
compression and produce a compressed table.
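
For example, row compression could be implemented for an existing table with a sequence like this sketch; the table name is illustrative, and COMPRESS YES STATIC assumes classic row compression is wanted:

db2 alter table prod.payroll compress yes static
db2 reorg table prod.payroll resetdictionary

The RESETDICTIONARY option makes the offline REORG build a new compression dictionary from the current data and rewrite the rows in compressed format.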


The goals for Index reorganization are:


1. Remove fragmentation – Some table change activity, especially inserts and deletes,
can result in index pages becoming mostly empty. Reorganizing the indexes can reduce
this fragmentation.
2. Improve physical clustering – The degree to which the physical ordering of the index
leaf pages matches the order of the key values contained within those pages - this is
important for sequential prefetching to work efficiently; otherwise, index scan
performance degrades.
3. Reduce number of levels in an index – The indexes are tree structures that can grow
in levels as the size of the index increases. Reorganizing an index might reduce the
overall size of the index such that the number of levels in the index decreases. This
reduces the number of index pages that need to be read in order to locate data rows.
4. Remove pseudo deleted rows and pseudo deleted pages – With Type 2 indexes,
under some conditions, when rows are deleted, the index pointers to those rows are
marked as pseudo deleted rather than being physically removed. In some cases, all of
the keys in an index page might have been pseudo deleted. Index reorganization can
remove these pointers and reduce the number of leaf pages in an index.
5. Rebuild indexes after changing the index compression option – The index
reorganization can rebuild indexes to implement or remove compression for indexes.
6. Reclaim space from the index object when an index is dropped.


Instructor notes:


Purpose — This states the basic goals of the REORG utility for table and index
reorganization.
Details —
Additional information —
Transition statement — Let's look at a scenario where running a REORG utility could
increase the size of a table rather than make it smaller.


REORG does not always shrink a table: Reclustering

Before REORG (3 pages in total):
   page 0: row E (1000 bytes), row D (2000 bytes), row A (1000 bytes) - 0 bytes free
   page 1: row B (3500 bytes), row F (500 bytes) - 0 bytes free
   page 2: row C (3500 bytes) - 500 bytes free

After REORG by index (4 pages in total):
   page 0: row A (1000 bytes) - 3000 bytes free
   page 1: row B (3500 bytes) - 500 bytes free
   page 2: row C (3500 bytes) - 500 bytes free
   page 3: row D (2000 bytes), row E (1000 bytes), row F (500 bytes) - 500 bytes free

Figure 8-5. REORG does not always shrink a table: Reclustering CL4636.0

Notes:
In some cases, running the REORG utility can increase the number of pages required to
hold the data.
When one of the table's indexes is used to recluster the rows in the data pages, the
REORG utility will rearrange the rows to match the sequence of the selected index. If the
rows vary in size, the new sequence could cause unused space that requires more pages
than the table needed before the reorganization.
In the example shown, the table has rows that vary in size, so that when the REORG
re-sequences the rows by the indexed column, there are some pages that cannot be filled
and the table gets larger. That free space could be used to hold new rows or to expand
existing rows during an update.


Instructor notes:


Purpose — This shows an example where the reorg would need more pages to store the
reorganized table data because putting the rows in the clustered sequence left too much
unused space.
Details —
Additional information —
Transition statement — Next we will look at the effect that defining PCTFREE for a table
or index can have on the reorganization.


REORG does not always shrink a table: PERCENT FREE

[Figure: the table before reorg; then the commands

   db2 alter table prod.payroll pctfree 25
   db2 reorg table prod.payroll index prod.empindex

produce a table in which each page is filled to at most 75%, leaving 25% free space.]

Figure 8-6. REORG does not always shrink a table: PERCENT FREE CL4636.0

Notes:
In the example, a table is altered to define a percentage of free space. The REORG utility
would then build new pages with at least that percentage of space empty. This would allow
new rows to be inserted or existing rows to be updated and increase in size.
Free space can be defined for a table using the PCTFREE option on an ALTER TABLE
statement. This will not have any immediate impact on the table's size. During the next
reorganization of the table, the REORG utility would only add rows to the page that would
not overlap the size defined by PCTFREE. This freespace would also be used for building
pages during LOAD Utility processing.
ALTER TABLE <tabname> PCTFREE integer
Free space can be defined for an index using PCTFREE and LEVEL2 PCTFREE.
CREATE INDEX ... [ PCTFREE integer ] [ LEVEL2 PCTFREE integer ]
(PCTFREE defaults to 10)


PCTFREE – Specifies what percentage of each index page to leave as free space when
building the index. When additional entries are placed in an index page, at least integer
percent of free space is left on each page. The value of integer can range from 0 to 99.
However, if a value greater than 10 is specified, only 10 percent free space will be left in
non-leaf pages. The default is 10.
LEVEL2 PCTFREE – Specifies what percentage of each index level 2 page to leave as
free space when building the index. The value of integer can range from 0 to 99. If LEVEL2
PCTFREE is not set, a minimum of 10 or PCTFREE percent of free space is left on all
non-leaf pages. If LEVEL2 PCTFREE is set, integer percent of free space is left on level 2
intermediate pages, and a minimum of 10 or integer percent of free space is left on level 3
and higher intermediate pages.
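
A minimal sketch showing both clauses together; the index, table, and column names are hypothetical:

CREATE INDEX PROD.EMPINDEX2 ON PROD.PAYROLL (EMPNO)
   PCTFREE 20 LEVEL2 PCTFREE 15

With these values, leaf pages would be built 80 percent full and level 2 intermediate pages 85 percent full, leaving room for future key inserts before page splits occur.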


Instructor notes:
Purpose — This shows an example of a table that would be larger after the reorganization
due to the amount defined for PCTFREE.
Details —
Additional information —
Transition statement — Next we will discuss when to run the REORG utility.


When to REORG?
• Consider the following factors, which might indicate that you should
reorganize a table:
– A high volume of insert, update, and delete activities on tables accessed by
queries
– Significant changes in the performance of queries that use an index with a high
cluster ratio
– Executing RUNSTATS to refresh statistical information does not improve
performance
– The REORGCHK command indicates a need to reorganize your table
– Data Row compression is being implemented for a table
– To implement index compression
– The tradeoff between the cost of increasing degradation of query performance
and the cost of reorganizing your table, which includes the CPU time, the
elapsed time, and the reduced concurrency resulting from the REORG utility
locking the table until the reorganization is complete


Figure 8-7. When to REORG? CL4636.0

Notes:
Consider the following factors, which might indicate that you should reorganize a table:
1. A high volume of insert, update, and delete activity on tables accessed by queries. If
many rows are inserted, there might not be enough free space to keep them in the
clustered sequence. If many rows are deleted, the table will still have the space
allocated and a REORG can free the unnecessary space.
2. Significant changes in the performance of queries that use an index with a high cluster
ratio.
Some applications access groups of rows using the clustered index and might not
perform well if the table becomes unclustered.
3. Executing RUNSTATS to refresh statistical information does not improve performance.
In some cases, a RUNSTATS utility can collect the current statistics and resolve
performance problems. If a table has become unclustered or if it contains a large
amount of free space, the REORG utility might be needed to improve access efficiency.
4. The REORGCHK command indicates a need to reorganize your table.

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-19
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

5. If Data Row Compression is being implemented for a table, the REORG utility can be
used to build the compression dictionary and compress the data rows.
6. An Index reorganization can be used to implement index compression if the
COMPRESS option for an existing index is altered to YES.
There is a trade-off between the cost of increasing degradation of query performance and
the cost of system resources required to reorganize a table, including the CPU time, the
elapsed time, and the reduced concurrency resulting from the REORG utility locking the
table until the reorganization is complete.


Instructor notes:


Purpose — To explain the primary reasons for running a REORG utility on a table.
Details —
Additional information —
Transition statement — Next we will look at an example of a REORGCHK report.


Recommending REORG: REORGCHK (1 of 2)

db2 reorgchk on schema auto


Table statistics:

F1: 100 * OVERFLOW / CARD < 5


F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80

SCHEMA NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG


------------------------------------------------------------------------------------
Table: AUTO.HIST1
AUTO HIST1 137619 0 4174 5427 - 9770949 0 44 76 -**
Table: AUTO.HIST2
AUTO HIST2 39612 0 1601 1602 - 2812452 0 43 99 -*-
Table: AUTO.HIST3
AUTO HIST3 136880 0 4918 4919 - 9718480 0 49 99 -*-
------------------------------------------------------------------------------------


Figure 8-8. Recommending REORG: REORGCHK (1 of 2) CL4636.0

Notes:
The visual shows an example of a REORGCHK report for a group of tables.
In this example:
- All three tables are flagged as below the threshold for the F2 calculation, meaning
the tables have greater than 30% free space.
- The table AUTO.HIST1 is also flagged for the F3 calculation because more than
20% pages contain no rows.
The REORGCHK table report contains the following information that is either copied from
system catalog statistics or calculated from the table statistics:
CARD (CARDINALITY) – The number of rows in base table.
OV (OVERFLOW) – The number of pointer-overflow rows.
NP (NPAGES) – The number of pages that contain data.
FP (FPAGES) – The total number of pages in the data file, including pages that have no
data or have not yet been used to store data.


ACTBLK – The total number of active blocks for a multidimensional clustering (MDC)
table. This field is only applicable to tables defined using the ORGANIZE BY clause. It
indicates the number of blocks of the table that contain data.
TSIZE – The table size in bytes. Calculated as the product of the number of rows in the
table (CARD) and the average row length. The average row length is computed as the sum
of the average column lengths (AVGCOLLEN in SYSCOLUMNS) plus 10 bytes of row
overhead. For long fields and LOBs, only the approximate length of the descriptor is used.
The actual long field or LOB data is not counted in TSIZE.
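
As a worked example, the F2 value reported earlier for AUTO.HIST1 can be reproduced from these statistics, assuming the table resides in a table space with a 4 KB page size:

F2 = 100 * TSIZE / ((FPAGES - 1) * (TABLEPAGESIZE - 6))
   = 100 * 9770949 / ((5427 - 1) * (4096 - 6))
   = 977094900 / 22192340
   ≈ 44

Because 44 is below the threshold of 70, the F2 indicator is flagged for this table in the report.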


Instructor notes:
Purpose — This shows an example of a REORGCHK report for tables that have some
indicators that a reorganization should be run for all three tables.
Details —
Additional information —
Transition statement — Let's look at the calculations that REORGCHK uses to evaluate
each table.


REORGCHK: Table statistics


F1: 100 * OVERFLOW / CARD < 5

• The total number of Overflow records in the table should be less


than 5%

F2: 100 * TSIZE / ((FPAGES-1) * (TABLEPAGESIZE-6)) > 70

• There should be less than 30% free space in the table

F3: 100 * (Required Pages / Total Pages) > 80

• The number of pages that contains no rows at all should be less than
20% of the total number of pages in the table


Figure 8-9. REORGCHK: Table statistics CL4636.0

Notes:
• Formula F1:
100*OVERFLOW/CARD < 5
The total number of overflow rows in the table should be less than 5 percent of the total
number of rows. Overflow rows can be created when rows are updated and the new
rows contain more bytes than the old ones (VARCHAR fields), or when columns are
added to existing tables.
• Formula F2:
For regular tables:
100*TSIZE / ((FPAGES-1) * (TABLEPAGESIZE- 6)) > 70
The table size in bytes (TSIZE) should be more than 70 percent of the total space
allocated for the table. (There should be less than 30% free space.) The total space
allocated for the table depends upon the page size of the table space in which the table
resides (minus an overhead of 6 bytes). Because the last page allocated is not usually
filled, 1 is subtracted from FPAGES.


For MDC tables:


100*TSIZE / ((ACTBLK-FULLKEYCARD) * EXTENTSIZE * (TABLEPAGESIZE- 6)) > 70
FULLKEYCARD represents the cardinality of the composite dimension index for the
MDC table. Extent size is the number of pages per block. The formula checks if the
table size in bytes is more than 70 percent of the remaining blocks for a table after
subtracting the minimum required number of blocks.
• Formula F3:
100*NPAGES/FPAGES > 80
The number of pages that contain no rows at all should be less than 20 percent of the
total number of pages. (Pages can become empty after rows are deleted.)
For MDC tables, the formula is:
100 * activeblocks / ( ( fpages / ExtentSize ) - 1 )


Instructor notes:


Purpose — This shows the calculations based on catalog statistics that indicate when a
table should be reorganized to improve efficiency.
Details —
Additional information —
Transition statement — Now let’s look at the index section of the REORGCHK report.


Recommending REORG: REORGCHK (2 of 2)

Index statistics:

F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80


F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space
required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20

SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE


--------------------------------------------------------------------------------------------------
Table: RG.HIST1
Index: RG.HIST1IX1
137619 346 0 3 0 60728 4 4
Index: RG.HIST1IX2
137619 315 0 3 0 44671 4 4
Table: RG.HIST2
Index: RG.HIST2IX1
39612 337 0 3 0 25363 4 4
Table: RG.HIST3
Index: RG.HIST3IX1
136880 320 2 3 0 50 2 2

LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD F4 F5 F6 F7 F8 REORG


----------------------------------------------------------------
822 822 3 93 58 0 0 *----
822 822 3 93 64 0 0 *----
822 822 4 31 175 0 0 ***--
984 984 27 69 95 0 0 *----


Figure 8-10. Recommending REORG: REORGCHK (2 of 2) CL4636.0

Notes:
The visual shows an example of a REORGCHK report for the indexes on a group of tables.
In this example:
- All four indexes are flagged as below the threshold for the F4 calculation, meaning
the indexes are not clustered.
- The index on RG.HIST2 is flagged for the F5 calculation because more than 50% of
the allocated index space is free.
- The index on RG.HIST2 is also flagged for the F6 calculation, indicating that the
index could be reduced by one level if reorganized.
- None of the indexes are flagged for the F7 calculation because less than 20% of the
keys are pseudo-deleted RIDs.
- None of the indexes are flagged for the F8 calculation because less than 20% of the
pages are pseudo-empty leaf pages.


The REORGCHK index report contains the following information that is either copied from
system catalog statistics or calculated from the index statistics:
CARD – The number of rows, cardinality, in base table.
LEAF – The total number of index leaf pages (NLEAF).
ELEAF – The number of pseudo empty index leaf pages (NUM_EMPTY_LEAFS). A
pseudo empty index leaf page is a page on which all the RIDs are marked as deleted, but
have not been physically removed.
NDEL – The number of pseudo deleted RIDs (NUMRIDS_DELETED). A pseudo deleted
RID is a RID that is marked deleted. This statistic reports pseudo deleted RIDs on leaf
pages that are not pseudo empty. It does not include RIDs marked as deleted on leaf
pages where all the RIDs are marked deleted.
LVLS – The number of index levels (NLEVELS).
ISIZE – The Index size, calculated from the average column length of all columns
participating in the index.
KEYS – The number of unique index entries that are not marked deleted
(FULLKEYCARD).
The Index report also lists information about the index record lengths.
LEAF_RECSIZE – Record size of the index entry on a leaf page. This is the average size
of the index entry excluding any overhead and is calculated from the average column
length of all columns participating in the index.
NLEAF_RECSIZE – Record size of the index entry on a non-leaf page. This is the average
size of the index entry excluding any overhead and is calculated from the average column
length of all columns participating in the index except any INCLUDE columns.
LEAF_PAGE_OVERHEAD – Reserved space on the index leaf page for internal use.
NLEAF_PAGE_OVERHEAD – Reserved space on the index non-leaf page for internal
use.


Instructor notes:
Purpose — This shows the index section of a REORGCHK report where all of the indexes
have some indicators for requiring reorganization for one or more of the calculations.
Details —
Additional information —
Transition statement — Let's look at calculations in REORGCHK for indexes.


REORGCHK: Index statistics


F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80

• The clustering ratio of an index should be greater than 80%


(Low cluster ratio means index sequence not the same as table sequence)

F5: 100 * (Space used on leaf pages / Space available on


non-empty leaf pages) > MIN(50, (100 - PCTFREE))

• Less than 50% of the space reserved for index entries should be empty

F6: (100-PCTFREE) * ((INDEXPAGESIZE-96)/(ISIZE+12)) ** (NLEVELS-2)
    * (INDEXPAGESIZE-96) / (KEYS*(ISIZE+9) + (CARD-KEYS)*5) < 100

• Determine if recreating the index would result in a tree having fewer levels.

F7:100 * (NUMRIDS_DELETED / (NUMRIDS_DELETED + CARD)) < 20

• The number of pseudo-deleted RIDs on non-pseudo-empty pages should be less than 20 percent

F8:100 * (NUM_EMPTY_LEAFS/NLEAF) < 20

• The number of pseudo-empty leaf pages should be less than 20 percent of the total number of leaf
pages.


Figure 8-11. REORGCHK: Index statistics CL4636.0

Notes:
REORGCHK uses the following formulas to analyze the indexes and their relationship to
the table data:
• Formula F4:
CLUSTERRATIO or normalized CLUSTERFACTOR > 80
The clustering ratio of an index should be greater than 80 percent. When multiple
indexes are defined on one table, some of these indexes have a low cluster ratio. (The
index sequence is not the same as the table sequence.) This cannot be avoided. Be
sure to specify the most important index when reorganizing the table. The cluster ratio
is usually not optimal for indexes that contain many duplicate keys and many entries.
• Formula F5:
100*(KEYS*(ISIZE+9)+(CARD-KEYS)*5) /
((NLEAF-NUM_EMPTY_LEAFS)*INDEXPAGESIZE) > MIN(50, (100 - PCTFREE))


Less than 50 percent of the space reserved for index entries should be empty (only
checked when NLEAF>1). If the percentage of index free space is greater than 50, the
defined limit is adjusted.
• Formula F6:
(100-PCTFREE)*((INDEXPAGESIZE-96)/(ISIZE+12))**
(NLEVELS-2)*(INDEXPAGESIZE-96) / (KEYS*(ISIZE+9)+(CARD-KEYS)*5)
< 100
Formula 6 is used to determine if recreating the index would result in a tree having
fewer levels. This formula checks the ratio between the amount of space in an index
tree that has one less level than the current tree, and the amount of space needed. If a
tree with one less level could be created and still leave PCTFREE available, then a
reorganization is recommended. The actual number of index entries should be more
than 90% (or 100-PCTFREE) of the number of entries an NLEVELS-1 index tree can
handle (only checked if NLEVELS>1).
• Formula F7:
100 * (NUMRIDS_DELETED / (NUMRIDS_DELETED + CARD)) < 20
The number of pseudo-deleted RIDs on non-pseudo-empty pages should be less than
20 percent.
• Formula F8:
100 * (NUM_EMPTY_LEAFS/NLEAF) < 20
The number of pseudo-empty leaf pages should be less than 20 percent of the total
number of leaf pages.


Instructor notes:


Purpose — This explains the calculations performed by reorgchk for indexes.
Details —
Additional information —
Transition statement — Let's next look at how to use the REORGCHK results to decide what
type of reorganization to perform.


Using REORGCHK results


• If the results of the calculations for F1, F2 or F3 exceed the bounds set
by the formula then the table and indexes should be reorganized.
– If Only F1 is flagged, a REORG with CLEANUP OVERFLOWS could be used
• If the results of the calculation for F4 exceed the bounds set by the
formula for a Cluster Index then the table and indexes should be
reorganized.
• If the results of the calculations for F1, F2, F3 and F4 do not exceed the
bounds set by the formula and the results of the calculations for F5 or
F6 do exceed the bounds set, then index reorganization is
recommended.
• If the result of the calculations F7 exceeds the bounds set, but the
results of F1, F2, F3, F4, F5 and F6 are within the set bounds, then it is
recommended that a cleanup of the indexes be done using the
CLEANUP ONLY option of reorg indexes.
• If the only calculation result to exceed the set bounds is that of F8, it is
recommended that a cleanup of the pseudo empty pages of the indexes
be done using the CLEANUP ONLY PAGES option of reorg indexes.


Figure 8-12. Using REORGCHK results CL4636.0

Notes:
The results of the REORGCHK calculations show what type of table or index
reorganization might be needed. The goal is to resolve any out of bounds indicators using
the least system resources.
If any of the table formulas (F1, F2, or F3) are out of bounds, then a full table
and index reorganization would address the condition. This process consumes the most
resources but will resolve all possible out-of-bounds conditions. The reorganized table
should be stored in a more efficient manner. If the F1 calculation is the only issue, the
CLEANUP OVERFLOWS option could be used to remove overflow pointers.
If the results of the calculation for F4 exceed the bounds set by the formula for a Cluster
Index then the table and indexes should be reorganized.

Note

It is not uncommon for several non-clustered indexes to be flagged for the F4 calculation.
These can be ignored.

If the results of the calculations for F1, F2, F3 and F4 do not exceed the bounds set by the
formula and the results of the calculations for F5 or F6 do exceed the bounds set, then
index reorganization is recommended. The reorganized indexes should be stored in a more
efficient manner.
If only the results of the calculation for F7 exceeds the bounds set, but the results of F1, F2,
F3, F4, F5 and F6 are within the set bounds, then it is recommended that a cleanup of the
indexes be done using the CLEANUP ONLY option of reorg indexes.
If the only calculation result to exceed the set bounds is that of F8, it is recommended that
a cleanup of the pseudo empty pages of the indexes be done using the CLEANUP ONLY
PAGES option of reorg indexes.
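
Translating these recommendations into commands for the sample reports shown earlier might look like the following sketch:

db2 reorg table auto.hist1
   (table formulas flagged: full offline table and index reorganization)
db2 reorg indexes all for table rg.hist2
   (F5 and F6 flagged: rebuild the indexes)
db2 reorg indexes all for table rg.hist3 allow write access cleanup only
   (if only F7 had been flagged: remove pseudo-deleted keys)
db2 reorg indexes all for table rg.hist3 allow write access cleanup only pages
   (if only F8 had been flagged: remove pseudo-empty leaf pages)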


Instructor notes:
Purpose — To give recommendations for performing table and index reorganizations
based on the reorgchk calculations.
Details —
Additional information —
Transition statement — Next we will discuss the access options for table reorganizations.


Access modes of REORG


• Online or Offline are associated with the following types of
access:
– ALLOW READ ACCESS
– ALLOW NO ACCESS
– ALLOW WRITE ACCESS

• REORG TABLE:
– 'Offline' ==> "Classic Reorg" (as pertains to Tables) permits:
• ALLOW READ ACCESS (the default)
• ALLOW NO ACCESS (truly offline)
– 'Online' ==> "Inplace Reorg" (not to be confused with
'in-tablespace reorg' as pertains to classic table reorg)
• ALLOW WRITE ACCESS (the default)
• ALLOW READ ACCESS

• REORG INDEXES:
• ALLOW READ ACCESS (the default)
• ALLOW NO ACCESS
• ALLOW WRITE ACCESS

Figure 8-13. Access modes of REORG CL4636.0

Notes:
One of the key decisions to make when reorganizing a table is whether to allow access to
the table during reorganization. Applications might require read or write access to some
tables 24 x 7.
The REORG utility supports the following options that define the type of access to a table
permitted during the table or index reorganization. The options are ‘ALLOW NO ACCESS’,
‘ALLOW READ ACCESS’, and ‘ALLOW WRITE ACCESS’.Some types of reorganization
do not support all types of access.
The default access mode for the offline table reorganization is ‘ALLOW READ’, but
‘ALLOW NO ACCESS’ is also supported. For range partitioned tables ‘ALLOW NO
ACCESS’ is the only supported option, unless the ON DATA PARTITION clause is included
and there are not any non-partitioned indexes on the table.
The default access mode for the online table reorganization is ‘ALLOW WRITE’, but
‘ALLOW READ ACCESS’ is also supported.


For reorganizing the indexes of a non-partitioned table, the default access mode is
‘ALLOW READ ACCESS’, but ‘ALLOW WRITE ACCESS’ and ‘ALLOW NO ACCESS’
could also be used. When reorganizing a single non-partitioned index of a range partitioned
table, the only supported access mode is ‘ALLOW READ ACCESS’. When all of the
indexes on a range partitioned table are being reorganized, ‘ALLOW NO ACCESS’ is the
only access mode unless the ON DATA PARTITION clause is used to specify a single data
partition.
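
For example, the access clauses appear on the command as follows; this is a sketch, and SALES is a hypothetical non-partitioned table:

db2 reorg table sales allow read access
   (offline/classic reorganization; ALLOW READ ACCESS is the default)
db2 reorg table sales inplace allow write access
   (online reorganization; ALLOW WRITE ACCESS is the default for INPLACE)
db2 reorg indexes all for table sales allow write access
   (index reorganization while permitting updates)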


Instructor notes:


Purpose — This shows that different modes of table and index reorganization allow some
but not all types of access to be specified.
Details —
Additional information —
Transition statement — Let's look at methods that can be used to invoke the REORG
utility.


Invoking Table or Index reorganization


• CLP - REORG command:
– db2 reorg table staff inplace on dbpartitionnum(10)
– db2 reorg indexes all for table employee allow write access cleanup only
• For example, ../sqllib/samples/clp/tbonlineinx.db2

• API - db2Reorg
– db2Reorg(versionNumber, &paramStruct, &sqlca):
• For example, for tables ../sqllib/samples/c/tbreorg.sqc
• For example, for indexes ../sqllib/samples/c/tbonlineinx.sqc

• SQL via Stored Procedure Call - ADMIN_CMD( )


– CALL ADMIN_CMD('reorg table employee index empid')

• Using the IBM Data Studio administration view

• Automatic Reorganization based on the AUTO_REORG database


configuration

Figure 8-14. Invoking Table or Index reorganization CL4636.0

Notes:
There are a number of methods for invoking the REORG utility.
• The CLP command - REORG command
db2 reorg table staff inplace on dbpartitionnum(10)

db2 reorg indexes all for table employee allow write access cleanup only
The file, tbonlineinx.db2, in the ../sqllib/samples/clp/ directory contains examples of
the DB2 commands to reorganize tables and indexes.
• API - db2Reorg — This interface can be used to invoke a REORG utility from within an
application program.
db2Reorg(versionNumber, &paramStruct, &sqlca)
The sample files, tbreorg.sqc and tbonlineinx.sqc, in the ../sqllib/samples/c directory
are examples of C applications that perform table and index reorganizations.

8-40 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.2
Instructor Guide

Uempty • SQL via Stored Procedure Call - ADMIN_CMD( ) – A stored procedure can be used to
invoke DB2 commands like REORG from an application program.
For example:
CALL ADMIN_CMD('reorg table employee index empid')
• IBM Data Studio – provides a GUI that supports many DB2 utilities like table
reorganization.
• Automatic Reorganization – A DB2 database can be configured using the
AUTO_REORG configuration parameter to automatically reorganize tables and indexes
based on a policy that could limit the tables selected for reorganization.
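
Automatic reorganization also requires the parent automatic maintenance switches. A minimal sketch, assuming a database named SAMPLE:

db2 update db cfg for sample using AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_REORG ON

AUTO_REORG only takes effect when its parent parameters, AUTO_MAINT and AUTO_TBL_MAINT, are also ON; the automatic maintenance policy can then restrict which tables are eligible.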

Instructor notes:
Purpose — This shows the methods provided by DB2 to invoke the REORG utility.
Details —
Additional information —
Transition statement — Let's look at the primary method for invoking the REORG utility,
the CLP command.

Table reorganization: CLP syntax

REORG TABLE table-name                         (classic, offline mode; the default)
   [ ALLOW NO ACCESS | ALLOW READ ACCESS ]     (default: ALLOW READ ACCESS)
   [ USE tbspace-name ]
   [ INDEX index-name ] [ INDEXSCAN ]
   [ LONGLOBDATA [ USE longtbspace-name ] ]
   [ KEEPDICTIONARY | RESETDICTIONARY ]        (default: KEEPDICTIONARY)

REORG TABLE table-name INPLACE                 (online mode)
   [ ALLOW WRITE ACCESS | ALLOW READ ACCESS ]  (default: ALLOW WRITE ACCESS)
   [ [ FULL ] [ INDEX index-name ] [ TRUNCATE TABLE | NOTRUNCATE TABLE ]
     | CLEANUP OVERFLOWS ]
   [ START | RESUME ]                          (default: START)

REORG TABLE table-name INPLACE { STOP | PAUSE }

REORG TABLE table-name RECLAIM EXTENTS
   [ ALLOW WRITE ACCESS | ALLOW READ ACCESS | ALLOW NO ACCESS ]  (default: ALLOW WRITE ACCESS)

Figure 8-15. Table reorganization: CLP syntax CL4636.0

Notes:
REORG INDEXES/TABLE command
Reorganizes an index or a table.

You can reorganize all indexes defined on a table by rebuilding the index data into
unfragmented, physically contiguous pages. On a data partitioned table, you can
reorganize a specific nonpartitioned index on a partitioned table, or you can reorganize
all the partitioned indexes on a specific data partition.

If you specify the CLEANUP option of the index clause, cleanup is performed without
rebuilding the indexes. This command cannot be used against indexes on declared
temporary tables or created temporary tables (SQLSTATE 42995).

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-43
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

The table option reorganizes a table by reconstructing the rows to eliminate fragmented
data, and by compacting information. On a partitioned table, you can reorganize a
single partition.

Scope: This command affects all database partitions in the database partition group.

Authorization
One of the following authorities:
- SYSADM
- SYSCTRL
- SYSMAINT
- DBADM
- SQLADM
- CONTROL privilege on the table.
Required connection: Database

Instructor notes:


Purpose — This shows some of the options provided by the REORG command. Point out
that online (INPLACE) and offline (CLASSIC) modes provide different options.
Details —
Additional information —
Transition statement — Let's look at the details for offline table reorganization.

Classic (Offline) table reorg


• Shadow copy approach:
– Table space used to hold shadow copy is specified by user
– Shadow built in TEMP table space or within table space containing the table to
be reorganized
– Total storage required depends on mode of reorganization

• Phases: 1 - Sort, 2 - Build, 3 - Replace (or Copy), 4 - Index Rebuild

• Processing modes:
– Reclustering via table scan sort (default) or indexscan
– Space reclamation (compaction) via table scan

• LONG/LOB data is not REORGed by default

• Access modes:
– ALLOW READ ACCESS (default)
– ALLOW NO ACCESS

Figure 8-16. Classic (Offline) table reorg CL4636.0

Notes:
The offline or classic REORG uses a Shadow copy approach, which means that the table
is left intact while a reorganized copy of the table is created. This allows the table to be
read by applications while the new table is being built, but requires additional disk space to
hold the second copy of the table data. The second copy of the table will be allocated in the
same table space as the table unless the USE option is included to direct DB2 to build the
shadow copy in a temporary table space. The total storage required for the offline reorg
depends on the mode of reorganization.
There are, at most, four phases to an offline reorganization:
1. Scan-Sort – Where the table data is scanned and sorted to arrange the data by the
columns in a selected index.
2. Build – Where the new copy of the table is built allowing the freespace defined in
PCTFREE for the table.
3. Replace (or Copy) – Where the new data replaces the original copy of the table.
4. Index Rebuild – Where all indexes on the table are rebuilt with pointers to the new row
locations.

There are two basic processing modes for an offline table reorganization:
1. Reclustering via table scan sort (default) or indexscan – This will be done if the
INDEX option is specified or if there is an index on the table with the CLUSTER option.
2. Space reclamation (compaction) via table scan – This will be done when no INDEX
option is specified and no index on the table was created with the CLUSTER option.
The REORG Utility will not reorganize any LONG/LOB data for the table unless the option
LONGLOBDATA is included.
The allowed Access modes are:
• ALLOW READ ACCESS (default)
• ALLOW NO ACCESS
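
For example, the following sketch reorganizes the STAFF table offline with no concurrent access, building the shadow copy in a system temporary table space (TEMPSPACE1 is assumed to exist with a matching page size):
    db2 reorg table staff allow no access use TEMPSPACE1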

Instructor notes:
Purpose — This explains some of the characteristics and processing associated with
offline table reorganization. Most of the options and phases will be described in more detail
in the following graphics.
Details —
Additional information —
Transition statement — Let's look at the characteristics for a reclustering REORG utility.

Reclustering REORGs
• When a clustering index exists on a table:
– It is not necessary to specify an index on the REORG command. In
this case, the clustering index will be used under-the-covers to
perform a reclustering reorg. This happens for either offline or online
table REORG.
– Offline table REORG permits the specification and use of an index
other than the clustering index to recluster the data
– Online table REORG does not permit the specification of an index
other than the clustering index when attempting to perform a
reclustering REORG

• For a reclustering REORG, the RIDs are logged and it is only necessary to log these if the database is recoverable

Figure 8-17. Reclustering REORGs CL4636.0

Notes:
When a clustering index exists on a table it is not necessary to specify an index on the
REORG command. In this case, the clustering index will be used under-the-covers to
perform a reclustering REORG. This happens for either offline or online table REORG.
An Offline table REORG permits the specification and use of an index other than the
clustering index to recluster the data, but in most cases the clustering index should be
used.
The Online table REORG does not permit the specification of an index other than the
clustering index when attempting to perform a reclustering REORG.
For a reclustering REORG, the RIDs are logged and it is only necessary to log these if the
database is recoverable.
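For example, either of the following performs a reclustering reorganization (EMPLOYEE and the index XEMPL are illustrative names; the first form assumes EMPLOYEE has an index defined with the CLUSTER option, which is then used implicitly):
    db2 reorg table employee
    db2 reorg table employee index xempl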

Instructor notes:
Purpose — This explains some of the details for performing a reclustering or index-based REORG. It is important that students note that if a table has an index defined with the CLUSTER option, the operation becomes a reclustering REORG even if no INDEX is specified.
Details —
Additional information —
Transition statement — Next we will discuss the scan-sort phase used for some offline
reorganizations.

Offline table REORG: Reclustering – Table scan sort
• Sort of table records to create new reorganized version of the table rather than using index scan
• A reorg may be required because the clustering index is not well clustered, so a table scan sort will give better I/O characteristics (might be slower for sparse tables where the index itself is somewhat small)
• Table scan sort is disabled under-the-covers if:
  – LONG/LOB data is being reorganized
  – Length of sort record is too large (RID is included in sort record)
• Index recreate optimization:
  – If the reclustering index is a unique DMS type, recreation of this index will not require a sort; the index is rebuilt by simply scanning the newly reorganized table.
  – Any other indexes that require recreation will involve a sort.

Figure 8-18. Offline table REORG: Reclustering – Table scan sort CL4636.0

Notes:
For a reclustering offline table reorganization, DB2 will sort table records to create a new
reorganized version of the table rather than using an index scan. The INDEXSCAN option
can be specified to bypass the table scan and sort. It is possible that the reorganization is
being performed because the clustering index is not well clustered, so a table scan and sort
might be more efficient.
If the table being reorganized contains a large number of empty pages, the INDEXSCAN
could perform better than the table scan and sort.
The table scan sort is disabled under-the-covers if the LONGLOBDATA option is specified.
The index scan might also be forced if the length of the sort record is too large for the page
size of the temporary table space. The RID is included in the sort record.
During the phase where indexes are rebuilt, a table scan might be used to avoid sorting the
clustering index data if it is a unique index in a DMS table space. If the clustering index is
the only index on the table, then this phase might not require any temporary table space
access.

The index build for any other indexes on the table will involve a sort operation.
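
For example (HISTORY and HIST_IX are illustrative names):
    db2 reorg table history index hist_ix
        (default: rows are fetched with a table scan and sorted)
    db2 reorg table history index hist_ix indexscan
        (the sort is bypassed; rows are fetched in clustering order through the index)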

Instructor notes:


Purpose — This describes some of the different processing steps that might be performed for a reclustering offline table reorganization and how REORG options like INDEXSCAN and LONGLOBDATA will affect the REORG processing.
Details —
Additional information —
Transition statement — Let's review the space requirements for offline table
reorganization.

Offline table REORG: Table space storage


• Offline table REORG is a shadow copy approach - enough
additional storage to accommodate another copy of the table
must be available:
– CASE A: A system temporary table space is specified, for example:
db2 reorg table staff use TEMPSPACE1
– The shadow copy is built within the specified temp space
– CASE B:
db2 reorg table staff
– The shadow copy is built within the user table space that the original
table resides in

• Additional table space storage might be required for the table scan sort mode of reorg processing in order to accommodate sort processing

Figure 8-19. Offline table REORG: Table space storage CL4636.0

Notes:
A major consideration involved in planning for offline table reorganization is the additional
disk space that might be required to complete the reorg processing.
Since Offline table REORG uses a shadow copy approach, enough additional storage to
accommodate another copy of the table must be available.
There are two possible places to create the shadow copy of the table:
• CASE A: The USE option names a system temporary table space.
db2 reorg table staff use TEMPSPACE1
The shadow copy is built within the specified temp space.
• CASE B: No USE option is included:
db2 reorg table staff
The shadow copy is built within the user table space that the original table resides in. If
there is not sufficient space in the table space, the REORG will fail in the phase where

the shadow copy is built and the utility will terminate. Since the original table is still
intact, there will not be any impact to applications that need to access the table.
Additional temporary table space storage might be required for the table scan sort
mode of REORG processing in order to accommodate sort processing.

Instructor notes:
Purpose — This explains the impact of the USE option for REORG and where the space
will be needed for the shadow copy of the table.
Details —
Additional information —
Transition statement — Let's take another look at the extra space needed for offline
REORGs.

Offline table REORG: Scan sort storage

[Figure: side-by-side storage layouts for "db2 reorg table T1 index I1" and "db2 reorg table T1 index I1 use TEMPSPACE1". Without the USE clause, the sort spill and merge areas (TDASPILL, TDAMERGE) can occupy up to 2x the table size in TEMPSPACE1 while the SHADOW copy is built next to T1 in USERSPACE1. With USE TEMPSPACE1, the sort areas and the SHADOW copy are all placed in TEMPSPACE1, which can require up to 3x the table size there.]

Figure 8-20. Offline table REORG: Scan sort storage CL4636.0

Notes:
A reclustering offline table reorganization that includes the scan-sort phase might need to
allocate space that could be three times the size of the table's data. The scan-sort
processing might need space for two copies of the table's data in a temporary table space.
The Build phase will create another copy of the table data in a temporary table space or the
table's table space depending on the USE option.

Instructor notes:
Purpose — This explains that depending on the options, a reclustering offline table
reorganization might need space equal to three times the table's data.
Details —
Additional information —
Transition statement — Let's take a look at some considerations for page size during an
offline REORG.

Page size considerations: Offline table REORG


• USE temp space option – System temporary table space
page size must equal page size for table (Data and Large
Object table spaces)

• Scan-Sort phase – With no INDEXSCAN option, the scan sort phase may require a temporary table space with a page size equal to that of the original table
• Index Build phase – A temporary table space may be required for sorting index keys
  – Different than the table space in USE if more than one is available with the correct page size

Figure 8-21. Page size considerations: Offline table REORG CL4636.0

Notes:
The page size of the system temporary table space used by offline table REORG must
match the page size of the table spaces in which the table data resides (including the
LONG and/or LOB column data). If not, an SQL2217 error will terminate processing.
Even if a temp space is not explicitly specified (via USE clause), if the REORG is
performed via scan sort, the sort might spill to disk (a temp space) and hence a temp space
with a page size equal to that of the original table must exist. The INDEXSCAN option
could be used to bypass the need for this temporary table space.
The index recreation phase usually involves sorts and most likely will require temp space
utilization depending on sort heap size. The temp space specified with the USE clause will
not be the same temp space used by index recreation if more than one temp space of the
same page size exists. Temp space assignment is round-robin based.
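
For example, if the table resides in an 8 KB page size table space, a matching system temporary table space could be created first; this sketch assumes no suitable 8 KB buffer pool or temporary table space exists yet (BP8K and TEMPSP8K are illustrative names):
    db2 create bufferpool bp8k size 1000 pagesize 8k
    db2 create system temporary tablespace tempsp8k pagesize 8k bufferpool bp8k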

Instructor notes:
Purpose — This explains that the REORG Utility might require a temporary table space
with a page size that matches the table space of the table being reorganized to avoid a
processing error.
Details —
Additional information —
Transition statement — Let's review the options for table compression with DB2.

DB2 Compression feature summary – A brief history


• Data row compression was introduced in DB2 9.1
• COMPRESS option for CREATE and ALTER TABLE is used to specify
compression
• Classic Row Compression uses a static dictionary:
– Dictionary stored in data object, about 100K in size
– Compression Dictionary needs to be built before a row can be compressed
– Dictionary can be built or rebuilt using REORG TABLE offline, which also
compresses existing data
– A dictionary will be built automatically when a compressed table reaches a
threshold size (about 2 MB). This applies to SQL INSERT as well as IMPORT
and LOAD (DB2 9.5 feature Automatic Dictionary Creation).
• Compression is intended to:
– Reduce disk storage requirements
– Reduce I/Os for scanning tables
– Reduce buffer pool memory for storing data
• Compression for Indexes, Temporary data and XML data was added in
DB2 9.7
– Index compression applies several space saving techniques, no dictionary
– XML compression uses a second static dictionary based on XML data


Figure 8-22. DB2 Compression feature summary – A brief history CL4636.0

Notes:
The first row compression capability for DB2 LUW was introduced with DB2 9.1. That
function is now called ‘classic row compression’, which uses a static dictionary for each
table to compress the data object rows. The dictionary, which is necessary to perform
compression, is commonly built using an offline table reorganization.
DB2 9.5 added the automatic dictionary creation function, which builds a static dictionary from a relatively small data sample; the compression achieved may be somewhat lower, but the table reorganization is avoided.
The primary goals for using compression are to:
• Reduce disk space storage requirements
• Reduce the number of disk read operations needed to scan large tables
• Increase the number of rows available in a given amount of buffer pool memory
Several additional types of compression were added in DB2 9.7:
• Compression for indexes, which is not dictionary based

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-61
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

• Compression for system and user temporary tables


• Compression for XML data, which uses a second compression dictionary for each table
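
For example, classic row compression could be enabled on an existing table as follows (SALES is an illustrative name; in DB2 10.1 and later the STATIC keyword explicitly requests classic rather than adaptive row compression):
    db2 alter table sales compress yes static
An offline REORG with RESETDICTIONARY, discussed later in this unit, would then build the dictionary and compress the existing rows.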

Instructor notes:


Purpose — To review the compression features available before DB2 10.1 and to clarify
when each feature was implemented. This is helpful since many customers will migrate
from various DB2 release levels.
Details —
Additional information —
Transition statement — Next we will introduce the adaptive row compression function in
DB2 10.1.

DB2 10.1 implements page level adaptive compression
• Adaptive compression concepts
– Improves upon the compression rates achieved using classic row compression alone
• Classic row compression: Typically saves ~40%-75%
• Adaptive row compression: Typically saves ~75%-85%
• Adaptive typically saves 30% over classic row compression

– Incorporates classic row compression, but performs additional compression on a page-by-page basis
– Reduces need to reorganize tables to rebuild the static dictionary as new data
patterns arrive in rows

• How adaptive compression works


– Is enabled using CREATE/ALTER TABLE COMPRESS option
– Applies compression to rows using the static, table level dictionary first
– Once a page is filled, uses a page-level dictionary-based compression algorithm
to compress data based on data repetition within each page of data
– If page data changes over time and rows expand reducing the compression ratio,
the page level dictionary will be automatically rebuilt with current data patterns

Figure 8-23. DB2 10.1 implements page level adaptive compression CL4636.0

Notes:
Adaptive compression
Adaptive compression improves upon the compression rates that can be achieved
using classic row compression by itself. Adaptive compression incorporates classic row
compression; however, it also works on a page-by-page basis to further compress data.
Of the various data compression techniques in the DB2 product, adaptive compression
offers the most dramatic possibilities for storage savings.
How adaptive compression works
Adaptive compression actually uses two compression approaches. The first employs
the same table-level compression dictionary used in classic row compression to
compress data based on repetition within a sampling of data from the table as a whole.
The second approach uses a page-level dictionary-based compression algorithm to
compress data based on data repetition within each page of data. The dictionaries map
repeated byte patterns to much smaller symbols; these symbols then replace the longer
byte patterns in the table. The table-level compression dictionary is stored within the
table object for which it is created, and is used to compress data throughout the table.

The page-level compression dictionary is stored with the data in the data page, and is used to compress only the data within that page.
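
As a minimal sketch (SALES is an illustrative table name), adaptive compression is enabled with the ADAPTIVE keyword, which is also the default meaning of COMPRESS YES in DB2 10.1 and later:
    db2 alter table sales compress yes adaptive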

Instructor notes:
Purpose — To introduce the adaptive row compression feature.
Details —
Additional information —
Transition statement — Next we will discuss some additional information about adaptive
compression.

How Does Adaptive Compression Work? Step 1

• Step 1: Compression with static table-level dictionary

[Figure: sample employee address rows before and after table-level compression. The table-level dictionary maps globally recurring patterns to short symbols, for example [1] California, [2] San, [3] Jose, [4] Francisco, [5] Avenue, [6] Street, [7] Road. A row such as "Christine Haas (408) 463-1234 555 Bailey Avenue San Jose California 95141" is stored as "Christine Haas (408) 463-1234 555 Bailey [5] [2][3] [1] 95141".]

• Table-level compression symbol dictionary containing globally recurring patterns
• Table-level dictionary can only be rebuilt during classic table REORG
  – Involves re-compressing all data

Figure 8-24. How Does Adaptive Compression Work? Step 1 CL4636.0

Notes:
Adaptive compression must always start with a Classic compression dictionary. This
compression dictionary is similar to prior versions of DB2. The STATIC dictionary contains
patterns of frequently used data that is found ACROSS the entire table. Either a classic
reorg must be used for existing tables to generate this STATIC dictionary, or the dictionary
gets built when a table hits a threshold of data (typically 1-2MB of data) when using
AUTOMATIC COMPRESSION.
A customer needs to be aware that altering a table to use ADAPTIVE compression will
cause the following:
• Automatic dictionary creation will be done once about 2M of data is populated in the
table
• All of the data in the table PRIOR to the STATIC dictionary being created will not be “TABLE” compressed; those rows are still eligible for ADAPTIVE compression, however
• A full OFFLINE REORG will be required if you want to compress all of the data in the
table
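
To check which compression mode a table is currently using, the catalog could be queried; this sketch assumes the schema INST481 used elsewhere in this unit (ROWCOMPMODE is 'S' for static, 'A' for adaptive, and blank when row compression is not enabled):
    db2 "select tabname, compression, rowcompmode from syscat.tables where tabschema = 'INST481'"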

Instructor notes:
Purpose — To explain that the first step in adaptive compression is to apply the static form
of compression to data rows first as they are placed into pages.
Details —
Additional information —
Transition statement — Next we will discuss how the page level compression information
is added.

How Does Adaptive Compression Work? Step 2

• Step 2: Compression with page-level dictionaries

[Figure: two data pages whose rows are already compressed with the table-level dictionary. Each page gets its own page-level dictionary of locally recurring patterns (for example, repeated area code prefixes such as "(408) 463-" and "(415) 545-", plus repeated street and name fragments), and the rows on that page are further compressed using those local symbols.]

• Page-level compression dictionaries contain locally frequent patterns
• Page-level compression dictionary building and rebuilding is fully automatic
• Algorithm optimized for compressing data already compressed by table-level dictionary
• Page-level compression dictionaries are stored as special records in each page

Figure 8-25. How Does Adaptive Compression Work? Step 2 CL4636.0

Notes:
Once a STATIC dictionary is built, the adaptive compression feature will create local
page-level dictionaries. In the case of individual pages, there may be recurring patterns
that may not have been picked up by the STATIC dictionary. This will also be the case as
more data is added to the table since new pages may contain patterns of data that did not
exist when the original STATIC dictionary was created.
This ADAPTIVE compression places a small dictionary on the page itself. The algorithm
will decide whether or not the savings of compression outweigh the costs of storing the
dictionary – similar to the way STATIC compression may not compress rows on a page.
The actual process of creating the page dictionary is dependent on whether or not a
“threshold” is met. Rebuilding a page dictionary for every INSERT, UPDATE, or DELETE
will result in a very high amount of overhead. Instead, the algorithm checks to see how
“stale” the dictionary is and updates it when it believes that higher savings can be
achieved.

Instructor notes:
Purpose — To see how the page level dictionary data is created once a page is almost
filled with rows compressed using the static dictionary functions.
Details —
Additional information —
Transition statement — Next we see the options for the REORG utility that control
creation of compression dictionaries.

Static Dictionary building: Offline table REORG


• When the dictionary is being built, a temporary in-memory
buffer of size 10 MB is required
– This memory will be allocated from the utilities heap (util_heap_sz)

• All the data rows that exist in a table will participate in the
building of the compression dictionary during Classic (Offline)
table reorg when row compression is active
REORG TABLE table-name [ INDEX index-name ]
   [ ALLOW READ ACCESS | ALLOW NO ACCESS ]   (default: ALLOW READ ACCESS)
   [ USE tbspace-name ] [ INDEXSCAN ] [ LONGLOBDATA ]
   [ KEEPDICTIONARY | RESETDICTIONARY ]      (default: KEEPDICTIONARY)

Figure 8-26. Static Dictionary building: Offline table REORG CL4636.0

Notes:
The REORG utility provides support for implementing data row compression for a table,
including creating a compression dictionary and compressing the data rows.
Although the table attribute COMPRESS is activated using DDL statements, CREATE
TABLE or ALTER TABLE, a dictionary needs to be created for the table before data rows
can be stored in a compressed form. The REORG utility can be used to perform the
dictionary creation as well as rebuilding the table with compressed data rows. The REORG
command options KEEPDICTIONARY and RESETDICTIONARY are used to control the
compression dictionary building functions during an offline table reorganization.
RESETDICTIONARY: If the COMPRESS attribute for the table is YES then a new row
compression dictionary is built. All the rows processed during reorganization are subject to
compression using this new dictionary. This dictionary replaces any previous dictionary. If
the COMPRESS attribute for the table is NO and the table does have an existing
compression dictionary then reorg processing will remove the dictionary and all rows in the
newly reorganized table will be in non-compressed format. It is not possible to compress
long, LOB, index, or XML objects.

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-71
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

KEEPDICTIONARY: If the COMPRESS attribute for the table is YES and the table has a
compression dictionary then no new dictionary is built. All the rows processed during
reorganization are subject to compression using the existing dictionary. If the COMPRESS
attribute for the table is NO and the table has a compression dictionary then reorg
processing will remove the dictionary and all the rows in the newly reorganized table will be
in non-compressed format. It is not possible to compress long, LOB, index, or XML objects.
These functions are only for the offline or classic mode of table reorganization, and cannot
be specified in combination with the INPLACE option.
All the data rows that exist in a table will participate in the building of the compression
dictionary during Classic (Offline) table reorg when row compression is active for the table.
When the dictionary is being built, a temporary in-memory buffer of size 10 MB is required.
This memory will be allocated from the utilities heap (util_heap_sz).
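
For example (EMPLOYEE is an illustrative table with the COMPRESS attribute set to YES):
    db2 reorg table employee resetdictionary
        (builds a new dictionary and compresses all rows using it)
    db2 reorg table employee keepdictionary
        (reorganizes the table, compressing rows with the existing dictionary)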

Instructor notes:


Purpose — This describes the options of the REORG utility that can be used to build the
compression dictionary.
Details —
Additional information —
Transition statement — Now we will look at the processing for the REORG utility when
RESETDICTIONARY is used to build a new compression dictionary.

Offline REORG with RESETDICTIONARY


• Clustering REORG using default SCANSORT
– Build Compression Dictionary during SCANSORT
• Clustering REORG using INDEXSCAN
– Extra table scan for building Compression Dictionary
• Non-clustering REORG - reclaim space
– Extra table scan for building Compression Dictionary
• Table data is compressed during BUILD phase using new Dictionary
• Table data is uncompressed for INDEX Build phase

[Figure: phase flow for an offline REORG with RESETDICTIONARY – an extra table scan for building the dictionary (or the SCAN SORT itself, or an Index SCAN) feeds the Build Phase, where data is compressed with the new dictionary; the Replace Phase follows, and data is uncompressed again for the Index Rebuild.]

Figure 8-27. Offline REORG with RESETDICTIONARY CL4636.0

Notes:
The additional processing by the REORG utility to build the compression dictionary for a
table and compress the table data when the RESETDICTIONARY option is included
depends on the mode for the offline reorganization. This could add additional I/O and CPU
resources to the processing for the reorganization.
There are three modes for an offline table reorganization:
• Reclustering using a SCANSORT – In this mode, the table data will be reclustered
based on the index selected by the INDEX option or the index on the table with a
CLUSTER attribute. The table is scanned and sorted to produce the correct sequence
rather than accessing the index. The dictionary is built using the table scan association
with the SCAN-SORT which is the first phase of processing, so no extra table scan is
needed.
• Reclustering using an INDEX SCAN – In this mode, the table data will be reclustered
based on the index selected by the INDEX option or the index on the table with a
CLUSTER attribute when the INDEXSCAN option is included for the REORG. The
index is used to retrieve the table data so no sorting is required. In this mode, an extra

table scan will be required to build the compression dictionary. This will be done before
the index is scanned to retrieve the data to build the copy of the table.
• Reclaiming or Nonclustering reorg – In this mode, data rows are not sequenced
based on an index. The goal of the reorganization is usually to compact a table after
many rows have been deleted. In this mode, an extra table scan will be required to build
the compression dictionary. This will be done before the table is scanned to retrieve the
data to build the copy of the table.
In all cases, the table data is compressed during the BUILD phase using the new
Dictionary. The table data is uncompressed during the table scan used to retrieve the
indexed column data for the Index Build phase. One table scan is required for each index
built. Index pages will be built in the compressed format if the COMPRESS option is YES
for an index.
If the table was already in a compressed format prior to reorganization and the
RESETDICTIONARY was specified to refresh the dictionary contents, the table data would
be uncompressed first and then re-compressed during the BUILD phase.

Instructor notes:
Purpose — This describes the additional CPU and I/O processing associated with table
reorganization with the RESETDICTIONARY option. One tradeoff is that the compressed
table should require fewer pages to be written to the table copy in the Build phase and then
read during the Index Build phase.
Details —
Additional information —
Transition statement — Now we will look at the processing for the REORG utility when
KEEPDICTIONARY is used to reorganize a table using the current table dictionary.

Offline REORG with KEEPDICTIONARY


• Clustering REORG using default SCANSORT
– Uncompress data for sort, Re-Compress for Build
• Clustering REORG using INDEXSCAN
– No need to Uncompress/Recompress data, use Index for build sequence
• Nonclustering REORG reclaim space
– No need to Uncompress/Recompress data
• Uncompressed data is compressed if needed during BUILD phase
• Table data is uncompressed for INDEX Build phase

[Figure: phase flow for an offline REORG with KEEPDICTIONARY – data is uncompressed for the SCAN SORT (or read compressed via an Index SCAN), compressed if needed during the Build Phase with the existing dictionary, replaced in the Replace Phase, and uncompressed again for the Index Rebuild.]

Figure 8-28. Offline REORG with KEEPDICTIONARY CL4636.0

Notes:
The additional processing by the REORG utility to reorganize a compressed table when the
KEEPDICTIONARY option is included depends on the mode for the offline reorganization.
This could add additional I/O and CPU resources to the processing for the reorganization.
There are three modes for an offline table reorganization:
• Reclustering using a SCANSORT – The table data will be uncompressed as part of
the table scan and left uncompressed during the sort. This means that the sorting might
be done on a copy of the data that is much larger than the table data. The table data will
be re-compressed after the sort, during the Build phase.
• Reclustering using an INDEX SCAN – Since the dictionary is not changing, the index
scan is able to access the data rows in the sorted sequence without uncompressing the
rows, so the already compressed rows go into the build phase without needing to be
compressed. This can save CPU time, compared to the SCANSORT option, but could
require extra disk I/Os if the index has a low cluster ratio.

• Reclaiming or Nonclustering reorg – Since the dictionary is not changing, the table
scan is used to access the data rows without uncompressing the rows, so the already
compressed rows go into the build phase without needing to be compressed.
In all cases, any uncompressed table data is compressed during the BUILD phase using
the current Dictionary. The table data is uncompressed during the table scan used to
retrieve the indexed column data for the INDEX Build phase. Index pages will be built in the
compressed format if the COMPRESS option is YES for an index.

Instructor notes:


Purpose — This describes the additional CPU and I/O processing associated with table
reorganization with the KEEPDICTIONARY option. Students might want to run some tests
to see how reorg performs using the SCANSORT and INDEXSCAN options. One
significant difference is that a highly compressed table would require enough temporary
space for two copies of the uncompressed data for the scansort processing. The copy of
the table made during the BUILD phase would be compressed, so if the USE option is
specified to make the copy in a temporary table space, that copy would be the compressed
size, not the uncompressed size.
Details —
Additional information —
Transition statement — Next we will discuss the impact on data object space usage when
a table is compressed using a REORG utility.

Compression Dictionary Build using REORG

[Figure: three table states. A table created with COMPRESS NO (the default) starts empty; INSERT or LOAD populates it with uncompressed row data; after ALTER TABLE ... COMPRESS YES and an offline REORG TABLE, the table contains a dictionary and compressed row data, while the INDEX object remains uncompressed.]

Figure 8-29. Compression Dictionary Build using REORG CL4636.0

Notes:
This visual is a pictorial representation of compression dictionary building and table
compression using an offline table reorganization.
1. The figure on the far left represents an empty table which has been created with the
COMPRESS option set to NO, which is the default.
2. It is subsequently populated with data which is uncompressed – the green shading in
the middle figure rectangle depicts uncompressed data residing in the table. The data
could be added using the LOAD utility, the IMPORT utility or an application using SQL
INSERT.
3. The ALTER TABLE statement is used to change the COMPRESS option to YES. The
table definition is changed in the DB2 catalogs, but the table does not have a
compression dictionary and the data is still uncompressed.
4. The REORG utility is run offline to create a compression dictionary and compress all the
records that exist within the table.

5. All new data which might be moved into the table is now also subject to being
compressed.
Row compression is performed on the data component for a table. The index and any large
object components for the table remain uncompressed.
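
The sequence shown in the visual could be scripted as follows (the table, column, and input file names are illustrative):
    db2 "create table hist1 (acct_id integer, txn_data varchar(200))"
    db2 load from hist1.del of del insert into hist1
    db2 alter table hist1 compress yes static
    db2 reorg table hist1 resetdictionary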

Instructor notes:
Purpose — To show the steps required to get a table into a compressed format using the
REORG utility.
Details —
Additional information —
Transition statement — Next we will discuss DB2 support for automatic dictionary
creation.

Automatic Dictionary Creation concepts


• Compression automatically kicks in as the table grows if the COMPRESS attribute is set
• Reduces or eliminates need for table REORG to build dictionary and
compress rows
• The threshold at which ADC triggers is dependent on the size of the
table and how much data exists within the table:
– Need enough data to achieve significant compression.
– Do not want to leave too much data in the table uncompressed.
– Need to limit performance impact to the triggering transaction. This could be an
application processing a single INSERT.
• Applicable to growth operations: INSERT, IMPORT, LOAD,
REDISTRIBUTE
• Impact of ADC:
– Compression ratio can be less than optimal.
– Slight performance impact when threshold crossed and dictionary gets built.


Figure 8-30. Automatic Dictionary Creation concepts CL4636.0

Notes:
Automatic (compression) Dictionary Creation (ADC)
Compression dictionaries are automatically created during data population operations on tables for which the COMPRESS attribute is set to YES, provided a compression dictionary does not already exist within the physical table or partition. After the table reaches approximately 1 MB in size as the result of data being added (through insert or load processing, for example), the dictionary is created and inserted into the table. Provided that the table COMPRESS attribute remains enabled, all data moved into the table after creation of the compression dictionary is subject to compression.
This allows data to be stored in a table in compressed form without needing to run a
REORG utility to build a dictionary and compress the rows.
Since the compression dictionary created by the automatic dictionary creation routines
uses a small amount of table data, the resulting compression might not be as effective as a
dictionary created from a larger sample.

The automatic dictionary creation is synchronous to the processing that triggers the
creation. This means that an application performing a single SQL INSERT might be slightly
impacted while the dictionary data is populated.

Instructor notes:


Purpose — To describe the basic features of automatic dictionary creation. This feature
became available with DB2 9.5.
Details —
Additional information —
Transition statement — Let’s step through an example that shows automatic dictionary
creation.

Automatic Compression Dictionary Creation on data population

[Figure: a table with COMPRESS YES starts empty and grows through INSERT, IMPORT, LOAD, or REDISTRIBUTE operations, storing uncompressed row data. When the volume of data crosses the ADC threshold, a synchronous dictionary build is triggered; the compression dictionary is inserted into the table, and all rows added afterward are compressed.]

Figure 8-31. Automatic Compression Dictionary Creation on data population CL4636.0

Notes:
This visual illustrates how Automatic Dictionary Creation (ADC) takes place.
• In this example, we begin with an empty table which has the table COMPRESS attribute
set to YES.
• Data then begins to be moved into the table. Table growth can proceed by SQL INSERT
processing or utility processing such as IMPORT, LOAD, or a REDISTRIBUTE.
• As the data begins to populate the table, it resides there uncompressed. Note
the dotted line in the figure – this represents the threshold for triggering ADC. It is an
internally set threshold. Once the table reaches a certain size (on the order of 1 to 2 MB
of pages) and contains a sufficient amount of data, dictionary creation is automatically
triggered. The growth could occur over an extended period of time or very rapidly, if
something like a large IMPORT is being used.
• As the threshold is breached, the dictionary creation operation runs synchronously using
the application connection that caused the breach.

• Once the dictionary build has completed and the dictionary has been inserted into the
table, any subsequent data to enter the table respects the dictionary and is
compressed.

Instructor notes:
Purpose — To show how a compression dictionary will be built synchronously as an
application processes new data into a table defined with row compression enabled.
Details —
Additional information —
Transition statement — Next we will look at an example of a query using the
ADMIN_GET_TAB_COMPRESS_INFO table function.

8-88 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.2
Instructor Guide

Uempty
Using ADMIN_GET_TAB_COMPRESS_INFO table function result for classic compressed tables
select *
from table(ADMIN_GET_TAB_COMPRESS_INFO('INST481',NULL) ) AS T1

TABSCHEMA TABNAME DBPARTITIONNUM DATAPARTITIONID OBJECT_TYPE ROWCOMPMODE
--------- ------- -------------- --------------- ----------- -----------
INST481   ACCT    0              0               DATA        S
INST481   HIST2   0              0               DATA        S

PCTPAGESSAVED_CURRENT AVGROWSIZE_CURRENT PCTPAGESSAVED_STATIC
--------------------- ------------------ --------------------
78                    24                 78
60                    28                 59

AVGROWSIZE_STATIC PCTPAGESSAVED_ADAPTIVE AVGROWSIZE_ADAPTIVE
----------------- ---------------------- -------------------
24                83                     18
28                59                     28

Figure 8-32. Using ADMIN_GET_TAB_COMPRESS_INFO table function result for classic compressed tables CL4636.0

Notes:
This example result from the ADMIN_GET_TAB_COMPRESS_INFO table function shows
the estimated compression results for two tables that are currently using classic row
compression.
The ROWCOMPMODE column result is ‘S’, indicating the tables have static dictionaries.
The results show expected compression results with and without adaptive compression. In
some cases the table could benefit both from a rebuilt static dictionary and also from
adaptive compression.
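
A follow-up query could rank tables by the additional savings expected from adaptive compression; this is a sketch against the same table function, again assuming schema INST481:
    db2 "select tabname, pctpagessaved_current, pctpagessaved_adaptive from table(admin_get_tab_compress_info('INST481', NULL)) as t order by pctpagessaved_adaptive - pctpagessaved_current desc"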

Instructor notes:
Purpose — To show an example query that uses the table function
ADMIN_GET_TAB_COMPRESS_INFO to estimate compression, both static and adaptive
for tables with classic compression. This would be useful for tables migrated to DB2 10.1,
that were using classic compression with previous releases.
Details —
Additional information —
Transition statement — Let's first take a look at an example of the reorg option output
from a db2pd command.

Using db2pd to monitor table reorg status


db2pd -db MUSICDB -reorg

Database Member 0 -- Database MUSICDB -- Active -- Up 0 days 00:02:12 -- Date 2013-10-24-09.24.50.705218

Table Reorg Information:
Address            TbspaceID TableID PartID MasterTbs MasterTab TableName Type    IndexID TempSpaceID
0x00007F6C60937C78 11        4       n/a    n/a       n/a       HIST1     Offline 1       1
0x00007F6C60939678 11        6       n/a    n/a       n/a       HIST3     Offline 1       1

Table Reorg Stats:
Address            TableName Start               End                 PhaseStart
0x00007F6C60937C78 HIST1     10/24/2013 09:23:48 10/24/2013 09:23:50 10/24/2013 09:23:49
0x00007F6C60939678 HIST3     10/24/2013 09:24:39 10/24/2013 09:24:42 10/24/2013 09:24:42

MaxPhase Phase      CurCount MaxCount Status Completion
4        IdxRecreat 0        0        Done   0
3        IdxRecreat 0        0        Done   0

• Shows active or last completed REORG for each table
• Statistics are in database memory, lost when database deactivates

Figure 8-33. Using db2pd to monitor table reorg status CL4636.0

Notes:
The -reorg option of the db2pd command returns information about table reorganizations
that have completed or are currently processing.
Table Reorg Stats:
Address: A hexadecimal value.
TableName: The name of the table.
Start: The time that the table reorganization started.
End: The time that the table reorganization ended.
PhaseStart: The start time for a phase of table reorganization.
MaxPhase: The maximum number of reorganization phases that will occur during
the reorganization. This value only applies to offline table reorganization.
Phase: The phase of the table reorganization. This value only applies to offline table
reorganization. The possible values are:

• Sort
• Build
• Replace
• IdxRecreat
CurCount: A unit of progress that indicates the amount of table reorganization that
has been completed. The amount of progress represented by this value is relative to
the value of MaxCount, which indicates the total amount of work required to
reorganize the table.
MaxCount: A value that indicates the total amount of work required to reorganize the
table. This value can be used in conjunction with CurCount to determine the
progress of the table reorganization.
Status: The status of an online table reorganization. This value does not apply to
offline table reorganizations. The possible values are:
• Started
• Paused
• Stopped
• Done
• Truncat
Completion: The success indicator for the table reorganization. The possible values
are:
• 0: The table reorganization completed successfully.
• -1: The table reorganization failed.
PartID: The data partition identifier. One row is returned for each data partition,
showing the reorganization information.
MasterTbs: For partitioned tables, this is the logical table space identifier to which
the partitioned table belongs. For non-partitioned tables, this value corresponds to
the TbspaceID.
MasterTab: For partitioned tables, this is the logical table identifier of the partitioned
table. For non-partitioned tables, this value corresponds to the TableID.
Type: The type of reorganization. The possible values are:
• Online
• Offline
IndexID: The identifier of the index that is being used to reorganize the table.
TempSpaceID: The table space in which the table is being reorganized.
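The same reorganization status can also be retrieved with SQL through the SNAPTAB_REORG administrative view, which is convenient when db2pd is not available to the user. A minimal sketch (column names per the SYSIBMADM.SNAPTAB_REORG documentation; verify them against your release):

db2 "SELECT SUBSTR(TABNAME,1,10) AS TABNAME, REORG_PHASE, REORG_STATUS,
            REORG_CURRENT_COUNTER, REORG_MAX_COUNTER, REORG_COMPLETION
     FROM SYSIBMADM.SNAPTAB_REORG"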

Instructor notes:


Purpose — To show an example of a db2pd report showing REORG processing status. In
previous releases we used snapshot based commands and table functions, like GET
SNAPSHOT FOR TABLES.
Details —
Additional information —
Transition statement — Next we will look at an example of a db2pd report showing the status
of a table reorganization that did not complete successfully.


Using db2pd to check reorg statistics when a reorg fails to complete
db2pd -db MUSICDB -reorg

Database Member 0 -- Database MUSICDB -- Active -- Up 0 days 00:00:46 -- Date 2013-10-24-09.23.24.781628

Table Reorg Information:


Address TbspaceID TableID PartID MasterTbs MasterTab TableName Type IndexID TempSpaceID
0x00007F6C60937C78 11 4 n/a n/a n/a HIST1 Offline 1 11

Table Reorg Stats:


Address TableName Start End PhaseStart
0x00007F6C60937C78 HIST1 10/24/2013 09:22:40 10/24/2013 09:22:42 10/24/2013 09:22:41

MaxPhase Phase CurCount MaxCount Status Completion
4 Build 1967 5426 Stopped

• Report shows REORG did not complete Build phase processing


Figure 8-34. Using db2pd to check reorg statistics when a reorg fails to complete CL4636.0

Notes:
The visual shows an example of a table reorganization that failed to complete normally.
The report shows:
• The table HIST1 was being reorganized with an offline reorg
• Reorg processing was in the BUILD phase
• The status is ‘stopped’, indicating the processing began, but is not currently active
• The report shows that the tablespace and temporary tablespace ids are the same,
meaning no temporary tablespace was specified. In this example the tablespace disk
space was insufficient to complete processing.

Instructor notes:


Purpose — To look at a db2pd report for a table reorganization that failed to complete.
Details —
Additional information —
Transition statement — Let's first take a look at an example of the output from a query
using the view SYSIBMADM.DB_HISTORY.


Query access to REORG information in Database History
Example:
db2 "select substr(tabname,1,7) as tabname,
operation,operationtype,objecttype,
start_time, end_time
from SYSIBMADM.DB_HISTORY
where operation = 'G'"

TABNAME OPERATION OPERATIONTYPE OBJECTTYPE START_TIME END_TIME


------- --------- ------------- ---------- -------------- --------------
STAFF G F T 20050313120130 20050313122143
EMP_RES G F T 20050313122250 20050313122501
EMP_RES G F T 20050313123603 20050313124258
STAFF G N I 20050313124415 20050313124415
STAFF G N I 20050313124916 20050313125003
STAFF G F T 20050313133403 20050313135808
STAFF G F T 20050313234249 20050313234249
STAFF G N I 20050313234530 20050313234530


Figure 8-35. Query access to REORG information in Database History CL4636.0

Notes:
The DB_HISTORY administrative view returns information from the history files from all
database partitions.
The schema is SYSIBMADM.

Authorization
One of the following authorizations is required:
- SELECT privilege on the DB_HISTORY administrative view
- CONTROL privilege on the DB_HISTORY administrative view
- DATAACCESS authority
- DBADM authority
- SQLADM authority
Default PUBLIC privilege

In a non-restrictive database, SELECT privilege is granted to PUBLIC when the view is
automatically created.
Usage note
When a data partitioned table is reorganized, one record for each reorganized data partition is returned. If only a specific data partition of a data partitioned table is reorganized, only a record for that partition is returned.

Example
Select the database partition number, entry ID, operation, start time, and status
information from the database history files for all the database partitions of the database
to which the client is currently connected.
SELECT DBPARTITIONNUM, EID, OPERATION, START_TIME, ENTRY_STATUS
FROM SYSIBMADM.DB_HISTORY
The following is an example of output for this query.
DBPARTITIONNUM EID OPERATION START_TIME ENTRY_STATUS
-------------- -------------------- --------- -------------- ------------
0 1 A 20051109185510 A

1 record(s) selected.
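
A similar query could focus on REORG entries only, listing the most recent ones together with their status columns (a sketch; ENTRY_STATUS and SQLCODE are documented columns of the view):

db2 "SELECT SUBSTR(TABNAME,1,10) AS TABNAME, START_TIME, END_TIME,
            ENTRY_STATUS, SQLCODE
     FROM SYSIBMADM.DB_HISTORY
     WHERE OPERATION = 'G'
     ORDER BY START_TIME DESC"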


Instructor notes:
Purpose — This shows an example of a query result using the view
SYSIBMADM.DB_HISTORY to retrieve information about previous REORG utilities.
Details —
Additional information —
Transition statement — Let's next look at the details for online table reorganizations.


Online (INPLACE) table reorganization


• Inplace Table Reorganization:
– Rows moved within existing table object to re-establish clustering, reclaim free space,
and eliminate overflows
– Executes as asynchronous background application (process name - db2reorg)
– Table must be at least three pages in size
– Cannot inplace reorg LONG/LOB data (use offline reorg)
• Attributes:
– Minimal extra storage requirement
– Incremental: benefit of effects seen immediately
– No iterative log processing phase
– Table quiesce for object switch over at end can be avoided
– Think of it as a Trickle Reorg

Table available for full S/I/U/D access during reorg


Figure 8-36. Online table reorganization CL4636.0

Notes:
The online or INPLACE table reorganization is primarily used when applications accessing the table cannot be suspended for the time required for an offline reorganization.
Inplace Table Reorganization
• Rows moved within existing table object to re-establish clustering, reclaim free space,
and eliminate overflows
• Executes as asynchronous background application (process name - db2reorg)
• Table must be at least 3 pages in size
• Cannot inplace reorg LONG/LOB data (use offline reorg)
Attributes:
• Minimal extra storage requirement
• Incremental: Benefit of effects seen immediately
• No iterative log processing phase
• Table quiesce for object switch over at end can be avoided
• Think of it as a Trickle Reorg
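
A minimal command sequence for such a trickle reorg might look as follows (a sketch; the table name db2admin.hist1 is a placeholder, and the START, PAUSE, and RESUME keywords follow the INPLACE syntax shown later in this unit):

db2 "REORG TABLE db2admin.hist1 INPLACE ALLOW WRITE ACCESS START"
db2 "REORG TABLE db2admin.hist1 INPLACE PAUSE"
db2 "REORG TABLE db2admin.hist1 INPLACE RESUME"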


Instructor notes:
Purpose — This introduces the main features for an online or inplace table reorganization.
Details —
Additional information —
Transition statement — Let's look at the two algorithms for inplace reorganization.

DB2 10.5 Enhancements for Online Table reorganization
• DB2 Version 10.5 enhances reorganization capabilities in the
following ways:
– INPLACE (online) table reorganization is supported for tables that use
adaptive compression.
– Reclaiming extents on insert time clustering (ITC) tables
• Consolidates sparsely allocated blocks into a smaller number of blocks
• Consolidation is done before empty extents are freed.
– CLEANUP OVERFLOWS suboption of INPLACE reorg to remove
pointer overflow records but bypass other row movement processing
• The table size will not be reduced
• Overflow records may have been introduced in compressed tables when
rows are updated

db2 REORG TABLE T1 INPLACE CLEANUP OVERFLOWS


Figure 8-37. DB2 10.5 Enhancements for Online Table reorganization CL4636.0

Notes:
DB2 10.5 includes online reorganization enhancements that save you time and reclaims
more space when you maintain your tables.
DB2 Version 10.5 enhances reorganization capabilities in the following ways:
• Inplace (online) table reorganization is now supported for tables that use adaptive
compression.
• Reclaiming extents on insert time clustering (ITC) tables now consolidates sparsely
allocated blocks into a smaller number of blocks. This consolidation is done before
empty extents are freed.
• Inplace table reorganization has a new option CLEANUP OVERFLOWS.
- An INPLACE CLEANUP OVERFLOWS reorganization traverses the table and
searches for pointer or overflow records.
- Any record found is converted to a normal record by the operation.

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-101
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

- This operation improves performance for tables that have a significant number of
pointer or overflow records.
- The operation does not result in a reduction of the size of the table.
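
For the ITC case, the extent reclaim is requested with the RECLAIM EXTENTS option of the REORG TABLE command, for example (a sketch; the table name is a placeholder):

db2 "REORG TABLE db2admin.itc_sales RECLAIM EXTENTS"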

Instructor notes:


Purpose — To discuss some enhancements for online table reorganizations that became
available with DB2 10.5.
Details —
Additional information —
Transition statement — Let's look at the two algorithms for inplace reorganization.


Online table reorganization: Algorithm choices

(Slide diagram: a timeline of repeated VACATE PAGE RANGE and FILL PAGE RANGE steps, each a MOVE & CLEAN pass that moves rows from the end of the table to fill holes at the start, creating free space.)

Reclustering:
• Uses clustering index during FILL phases
Space Reclamation:
• Backward scan starts at end, fills holes earlier in table identified by simultaneous forward scan

Figure 8-38. Online table reorganization: Algorithm choices CL4636.0

Notes:
There are two algorithms for the inplace reorganization. A reclustering reorg rearranges the
rows based on a clustering index. The space reclamation reorg is used to reduce the size
of a table that has many deleted rows, but does not put the remaining data into any key
sequence. Both algorithms are designed to minimize the impact of this reorganization for
applications that will be accessing the table during the reorg.
Reclustering: The reclustering reorg processes the table in a series of repeated cycles:
1. Vacate Page Range – The first step is to move all the rows from a group of pages into
available gaps in higher numbered pages. The goal is to use available free space as the
temporary location of these rows and not require additional pages to be allocated. This
leaves that group of pages empty.
2. Fill Page Range – The previously emptied pages are filled in with rows sequenced by
the clustering index. If the table has a defined PCTFREE, that percentage of each page
will remain empty to allow new rows to be added or existing rows to be extended over
time.

3. Commit Page Range – When the selected group of pages has been refilled with data
rows, the changes will be committed to release any locks held.
The reorg will then start another vacate page range step for the next group of pages until
the entire table is processed. The reorg could be paused if needed.
Space Reclamation: The space reclamation reorg is not index-based. Its goal is to reduce
the number of allocated pages utilizing free space within the table. Two scanners are used,
one starting at the first page and a reverse scan starting at the end of the table. The rows at
the end of the table will be moved into the gaps found in other pages leaving as many
empty pages at the end of the table as possible. This processing will be broken into page
groups that will be committed to release locks before moving to the next group of pages. If
the table has a defined PCTFREE, the forward processing will limit the number of rows left
in each page based on the defined freespace. This will continue until the entire table has
been processed.


Instructor notes:
Purpose — This describes the two algorithms utilized by the Online Reorganization utility.
All of the changes by the online REORG will be logged for recoverability, so even if the
system or the reorg process fails, the last unit of work will be rolled back and the table will
remain available to other applications.
Details —
Additional information —
Transition statement — Next we will discuss how the online REORG is able to perform all
of this moving of data rows while allowing applications to access the table.

Online table reorganization: How does it work? (1 of 2)
• Ensuring scanners do not miss a row or see a row twice:
– Each row that is moved leaves behind a small RP (reorg pointer) record
that contains the RID of the row's new location
– The row is inserted as a RO (reorg overflow) record that contains the data
– Once reorg finishes moving a set of rows, it waits for all existing accesses to end
(these existing accesses are called old scanners)
– During this wait time, new accesses can start - they are new scanners
– Old scanners use OLD RIDs
• Follow RP records; Ignore RO records
– New scanners use NEW RIDs
• Ignore RP records; Honor RO records
– Once all old scanners have completed, reorg cleans up moved rows:
• RPs deleted
• ROs converted to normal records

• Design favors online accesses whenever possible:


– Reorg speed was not #1 design consideration
– In general, reorg's rate of progress will be highest
with shorter online transactions


Figure 8-39. Online table reorganization: How does it work? (1 of 2) CL4636.0

Notes:
The online REORG is moving rows around in the table while applications are allowed to
process SQL. If an application is performing a table scan, the online REORG needs to be
performed in such a way that the application does not miss any rows or access any row
twice.
Ensuring scanners do not miss a row or see a row twice:
• Each row that is moved leaves behind a small RP (reorg pointer) record that contains
the RID of the row's new location
• The row is inserted as a RO (reorg overflow) record that contains the data
• Once reorg finishes moving a set of rows, it waits for all existing accesses to end (these
existing accesses are called old scanners)
• During this wait time, new accesses can start - they are new scanners
• Old scanners use OLD RIDs
- Follow RP records; Ignore RO records


• New scanners use NEW RIDs


- Ignore RP records; Honor RO records
• Once all old scanners have completed, reorg cleans up moved rows:
- RPs deleted
- ROs converted to normal records
The online REORG processing was designed to minimize any delays to applications
accessing the table during the REORG, not to optimize the duration of the REORG utility. If
the applications accessing the table during the online REORG are relatively short
transactions, then the time the REORG processing spends waiting for the old scanners to
complete will allow the online REORG to complete in less time.

Instructor notes:


Purpose — This explains the concepts of old scanners and new scanners that the online
REORG uses to prevent applications from missing any rows or seeing any row twice due to
the data movement performed by the REORG.
Details —
Additional information —
Transition statement — Let's look at an example of the online REORG handling for an
application scanning a table during the REORG.


Online table reorganization: How does it work? (2 of 2)

One table scanner when reorg begins moving a batch of rows.


This scanner will consider itself an old scanner.
(Could be more scanners of course, this is just an example)

Reorg moves the batch of records to their proper place in the


table. The new records are created as reorg overflow records,
and the old records are converted to reorg pointer records
(they contain the RID of the reorg overflows).
When the old scanner fetches the reorg overflows, it will
ignore them. When it fetches the reorg pointers, it will follow
the RID to obtain the record data from the corresponding reorg
overflow. Meanwhile new scanners can begin: scanners that
begin after reorg finishes moving a batch of rows, are known as
new scanners.

New scanners ignore reorg pointers and honor reorg


overflows (that is, they treat them as normal data records).

Once all the old scanner(s) have completed, the reorg


pointers can be removed (cleaned up), the reorg overflows
can be converted to normal records, and reorg can commit this
batch and move on to the next batch.


Figure 8-40. Online table reorganization: How does it work? (2 of 2) CL4636.0

Notes:
In the example, there is one table scanner active when reorg begins moving a batch of
rows. This scanner will consider itself an old scanner. There could be more scanners of
course, this is just an example.
REORG moves the batch of records to their proper place in the table. The new records are
created as REORG Overflow records (RO), and the old records are converted to REORG
Pointer records (RP), they contain the RID of the reorg overflows.
When the old scanner fetches the REORG Overflows (RO), it will ignore them. When it
fetches the REORG Pointers (RP), it will follow the RID to obtain the record data from the
corresponding REORG overflow. Meanwhile, new scanners can begin. Scanners that
begin after REORG finishes moving a batch of rows are known as new scanners.
New scanners ignore REORG Pointers (RP) and honor REORG Overflows (RO), treating
them the same as normal data records.

Once all the old scanner(s) have completed, the REORG Pointers can be removed
(cleaned up), the REORG Overflows can be converted to normal records, and REORG can
commit this batch and move on to the next batch.


Instructor notes:
Purpose — This shows an example where an old scanner and new scanner are
processing the table during the online REORG.
Details —
Additional information —
Transition statement — Let's next look at the processing at the start and end for online
REORGs.


Online table reorganization


(Slide diagram: a timeline of an online table REORG; read access is retained throughout, and write access except during the optional truncate at the end.)
• Online Reorg issued: drain existing accesses while allowing new accesses to start
• Move rows in batches: move a set of rows, cleanup, commit; repeat until done; new accesses are always allowed
• If truncate requested: S table lock; drain existing accesses while allowing new read accesses to start; truncate table; commit
• Graphical view of table: incremental reclustering, benefits seen immediately; the clustered region grows until the table is truncated

Figure 8-41. Online table reorganization CL4636.0

Notes:
The example shows several things about the processing flow for online table
reorganization.
First, when the online table REORG starts, it will need to wait for any applications that are
in the process of scanning the table to complete their scans. This is shown as draining the
scanners that were already active when the online REORG began. New scanners are not
prevented from starting while the online REORG waits.
Once the old scanners have completed, the online REORG can begin moving groups of
rows. When the moving has completed and the pointers have been cleaned up, the group
of changes can be committed and another group started.
The activity of applications might cause the online REORG to wait, but DB2 will not prevent
new applications from starting scans.
As the online REORG makes its way through the table, the rows will become gradually more clustered, if this is a reclustering REORG. This provides some immediate benefit to applications that begin processing while the REORG is still in progress.


When the online REORG completes that last group of rows, there might be some empty
pages left at the end of the table. If the NOTRUNCATE TABLE option is not specified, the
online REORG will need to acquire the read/only S lock for the table in order to safely
release those empty pages. This could cause an update transaction to enter a lock wait
and possibly time out. In order to avoid this potential impact to applications, the
NOTRUNCATE TABLE option can be used to skip the table lock and leave the empty
pages at the end of the table.

Instructor notes:


Purpose — This shows some of the interaction between the online table REORG and
applications that are accessing the table during its processing.
Details —
Additional information —
Transition statement — Let's look at some usage considerations for online table
reorganization.


Online table reorganization: Usage considerations
• If space-reclamation is primary consideration, do not specify a clustering index
– Less row moves, faster reorg
• Use NOTRUNCATE option to avoid read-only access period at end (required for
object truncation)
• If a table has a clustering index, it will be used by default

REORG {TABLE table-name Table-Clause | INDEXES ALL FOR TABLE table-name Index-Clause} [On-DbPartitionNum-Clause]

Table-Clause:
[INDEX index-name] [[ALLOW {READ | NO} ACCESS]
[USE tablespace-name] [INDEXSCAN] [LONGLOBDATA]] |
[INPLACE [ [ALLOW {WRITE | READ} ACCESS] [NOTRUNCATE TABLE]
[START | RESUME] | {STOP | PAUSE} ]]

Examples:
// Recluster data, allowing write access always
REORG TABLE t1 INDEX i1 INPLACE ALLOW WRITE ACCESS NOTRUNCATE TABLE

// Reclaim all embedded unused space in the table
REORG TABLE t1 INPLACE ALLOW WRITE ACCESS

// Reclaim space on partition 5
REORG TABLE t1 INPLACE ALLOW WRITE ACCESS ON DBPARTITIONNUM 5

Figure 8-42. Online table reorganization: Usage considerations CL4636.0

Notes:
If the primary purpose for the online reorganization is space-reclamation, then do not specify a clustering index. This will reduce the number of times that rows are moved and improve online REORG performance. If a table has a clustering index, it will be used by default, so all online REORGs will be reclustering REORGs.
The NOTRUNCATE TABLE option can be specified to avoid the read-only access period at the end of the online REORG that is required for object truncation.
Examples:
To Recluster data, allowing write access:
REORG TABLE t1 INDEX i1 INPLACE ALLOW WRITE ACCESS NOTRUNCATE TABLE
To reclaim all embedded unused space in the table:
REORG TABLE t1 INPLACE ALLOW WRITE ACCESS
To reclaim space selectively on one database partition, partition 5:
REORG TABLE t1 INPLACE ALLOW WRITE ACCESS ON DBPARTITIONNUM 5

Instructor notes:


Purpose — This suggests that not specifying an index for the inplace REORG is a way to
reduce the elapsed times for online REORGs. It also suggests using the NOTRUNCATE
option to eliminate the read only table locking at the end of an online table REORG.
Details —
Additional information —
Transition statement — Let's look at monitoring the online table REORG processing.


Online table reorganization: Monitoring with db2pd

• Use db2pd -reorg to view progress

Table Reorg Information:


Address TbspaceID TableID PartID MasterTbs MasterTab TableName Type IndexID TempSpaceID
0x00007F6C5FC37BF8 11 5 n/a n/a n/a HIST2 Online 1 11

Table Reorg Stats:


Address TableName Start End
0x00007F6C5FC37BF8 HIST2 10/24/2013 09:27:13 10/24/2013 09:27:13

PhaseStart MaxPhase Phase CurCount MaxCount Status Completion
n/a n/a n/a 0 1601 Paused

• Progress reported in units of pages


Figure 8-43. Online table reorganization: Monitoring with db2pd CL4636.0

Notes:
The db2pd -reorg report can be used to monitor processing for online table
reorganizations. The report may also show online table reorganizations that have
completed, if the database remains active.
The report shows an example of a table with a status of ‘Paused’, indicating that the processing is not active. This status could occur if there was a problem in the processing or if the REORG command with the PAUSE option was issued. A REORG command with the RESUME option can be used to restart processing.
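
To watch the progress of an online reorganization over time, the db2pd report can be repeated at a fixed interval (a sketch; the -repeat option takes the interval in seconds and a repeat count, but verify the option on your release):

db2pd -db MUSICDB -reorg -repeat 30 10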

Instructor notes:


Purpose — To show an example of a db2pd report that shows an online table
reorganization that has been paused.
Details —
Additional information —
Transition statement — Let's look at the considerations for reorganization of MDC tables.


REORG and MDC tables


• Online (INPLACE) table REORG is not supported for MDC tables
• REORG of a MDC table just does compaction (space reclaim) since
clustering is already maintained via MDC Block Indexes
• No index can be specified with the REORG command - block index is
used under-the-covers
• REORG processing is Block Index scan-based - No table scan sort
• As clustering is maintained within a MDC table, the use of such tables
reduces the need to reorganize
• RECLAIM EXTENTS ONLY mode releases free extents without a full
table reorganization:
– Very fast, does not copy current data
– Unused extents returned to table space
– Useful after large rollout based on dimension column
– Table can remain online using ALLOW WRITE ACCESS

Figure 8-44. REORG and MDC tables CL4636.0

Notes:
For Multidimensional Clustered (MDC) tables, only offline table REORG is supported. Since the rows are maintained in blocks by the defined dimension columns, it is unnecessary to recluster an MDC table. The system-required block indexes are used to ensure clustering. Therefore, a REORG utility cannot specify the INDEX option for an MDC table REORG.
The REORG of an MDC table only does compacting or space reclamation. This could be used if many rows have been deleted, so that the MDC table can be reorganized into a smaller number of blocks.
Since the REORG processing is block index scan-based, no table scan sort phase would be used.
In general, since clustering is always maintained within an MDC table, the use of such tables reduces the need to reorganize them to correct a reduced cluster ratio.
MDC tables can be reorganized to reclaim extents that are not being used. Starting in DB2 9.7, a complete offline table reorganization is no longer needed to reclaim the MDC extents. Both the REORG TABLE command and the db2Reorg API have a reclaim extents
option. As part of this method to reorganize MDC tables, you can also control the access to
the MDC table while the reclaim operation is taking place. Your choices include: no access,
read access, and write access (which is the default). Reclaimed space from the MDC table
can be used by other objects within the table space.
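
For example, a reclaim operation on an MDC table that keeps the table fully available might be run as follows (a sketch; the table name is a placeholder, and write access is the documented default):

db2 "REORG TABLE db2admin.mdcsales RECLAIM EXTENTS ALLOW WRITE ACCESS"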


Instructor notes:
Purpose — This explains that some of the reorg options do not apply for MDC tables, but
in some cases, the use of a MDC table will reduce or eliminate the need to perform a
REORG. We will discuss the RECLAIM EXTENTS ONLY option in more detail in the MDC
lecture topic.
Details —
Additional information —
Transition statement — Next we will briefly discuss the reorganization options for
range-partitioned tables.


REORG and range-partitioned tables


• Online (INPLACE) table REORG is not supported for
range-partitioned tables
• For Full table reorganization, all partitions are processed
serially
• A single data partition can be reorganized using the ON DATA
PARTITION clause (DB2 9.7 Fix Pack 1 or later):
– If all indexes are partitioned, full read and write access is allowed to
other partitions
– You can concurrently reorganize different data partitions or
partitioned indexes on a partition
– REORG INDEXES ALL performs an index reorganization on a
specified data partition while allowing full read and write access to
the remaining data partitions of the table


Figure 8-45. REORG and range-partitioned tables CL4636.0

Notes:
The INPLACE option of REORG TABLE is not supported for range-partitioned tables.
Beginning with DB2 9.7 Fix Pack 1 and later fix packs, you can use the REORG command
on a partitioned table to perform a reorganization of the data of a specific partition or the
partitioned indexes of a specific partition. Only access to the specified data partition is
restricted, the remaining data partitions of the table retain full read and write access.
On a partitioned table, using the REORG TABLE or REORG INDEXES ALL command with
the ON DATA PARTITION clause specifying a partition of the table supports the following
features:
• REORG TABLE performs a classic table reorganization on the specified data partition
while allowing the other data partitions of the table to be fully accessible for read and
write operations when there are no nonpartitioned indexes (other than
system-generated XML path indexes) on the table. The supported access modes on the
partition being reorganized are ALLOW NO ACCESS and ALLOW READ ACCESS.
When there are nonpartitioned indexes on the table (other than system-generated XML

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-123
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

path indexes), the ALLOW NO ACCESS mode is the default and the only supported
access mode for the entire table.
• REORG INDEXES ALL performs an index reorganization on a specified data partition
while allowing full read and write access to the remaining data partitions of the table. All
access modes are supported.
You can issue REORG TABLE commands and REORG INDEXES ALL commands on a
data partitioned table to concurrently reorganize different data partitions or partitioned
indexes on a partition. When concurrently reorganizing data partitions or the partitioned
indexes on a partition, users can access the unaffected partitions but cannot access the
affected partitions. All the following criteria must be met to issue REORG commands that
operate concurrently on the same table:
• Each REORG command must specify a different partition with the ON DATA
PARTITION clause.
• Each REORG command must use the ALLOW NO ACCESS mode to restrict access to
the data partitions.
• The partitioned table must have only partitioned indexes if issuing REORG TABLE
commands. No nonpartitioned indexes (except system-generated XML path indexes)
can be defined on the table.
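
Commands along these lines illustrate partition-level reorganization (a sketch; the table and partition names are taken from the REORGCHK example that follows, and the clause order should be verified against the Command Reference):

db2 "REORG TABLE parttab.historypart ALLOW READ ACCESS ON DATA PARTITION part0"
db2 "REORG INDEXES ALL FOR TABLE parttab.historypart ALLOW WRITE ACCESS ON DATA PARTITION part1"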

Instructor notes:


Purpose — To briefly discuss the reorganization options for range-partitioned tables. The
detailed discussion of range-partitioned tables will occur in a later lecture unit.
Details —
Additional information —
Transition statement — Next we will see how the a range-partitioned table with
partitioned indexes appears in a REORGCHK report.


Using REORGCHK reports for range-partitioned tables
TABLE STATS

SCHEMA.NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG


----------------------------------------------------------------------------------------
Table: PARTTAB.HISTORYPART
344859 0 8286 8290 - 24140130 0 72 100 ---
Table: PARTTAB.HISTORYPART
Data Partition: PART0
225855 0 6197 6198 - 15809850 0 63 100 -*-
Table: PARTTAB.HISTORYPART
Data Partition: PART1
40966 0 719 720 - 2867620 0 100 100 ---

INDEX STATS

SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS ... F4 F5 F6 F7 F8 REORG


-----------------------------------------------------------------------------------------------
Table: PARTTAB.HISTORYPART
Index: PARTTAB.HISTPIX1
Data Partition: PART0
225855 815 0 3 0 1000 ... 9 59 48 0 0 *----
Index: PARTTAB.HISTPIX1
Data Partition: PART1
40966 96 0 2 0 999 ... 12 93 - 0 0 *----
Index: PARTTAB.HISTPIX1
Data Partition: PART2
37418 88 0 2 0 996 ... 12 93 - 0 0 *----

Figure 8-46. Using REORGCHK reports for range-partitioned tables CL4636.0

Notes:
The visual shows a portion of a REORGCHK report from a range-partitioned table with
several partitioned indexes. In the table report there is a summary row that provides
statistics at the table level. This is followed by a set of statistics for each data partition. The
sample report shows that the data partition PART0 is flagged for reorganization while the
next data partition PART1 is not flagged. Beginning with DB2 9.7 Fix Pack 1 the ON DATA
PARTITION clause could be used to selectively reorganize a single data partition.
The index statistics portion of the sample report provides the reorganization related stats
for each partition of any partitioned index. Non-partitioned indexes span the entire table
and would have a single set of statistics per index in the report.
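
A report like this one can be produced with the REORGCHK command, for example using the statistics already in the catalog rather than gathering new ones:

db2 "REORGCHK CURRENT STATISTICS ON TABLE parttab.historypart"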

Instructor notes:


Purpose — To look at an example of some data from a REORGCHK report for a range-partitioned table that has several partitioned indexes. With the addition of the ON DATA PARTITION option for the REORG utility, this report would be used to determine whether a partition-level table reorganization might be beneficial.
Details —
Additional information —
Transition statement — Let's look at advantages and disadvantages for the offline table
REORG utility.


Offline or Online table REORG? (1 of 2)


Offline Table REORG:

• PROS:
– Provides the fastest table reorganization
– Allows LOB/LONG columns to be reorganized (LONGLOBDATA is not allowed for
INPLACE reorg)
– Indexes are rebuilt once the table has been reorganized
– Original version of table can be read only up until the last phase of reorg (replace
phase)

• CONS:
– Large space requirement: Shadow copy approach so need approximately twice as
much space as the original table
– Limited access: Read-only until Replace/Copy phase
– All-or-nothing process
– Can only be stopped by the application or user who understands how to stop the
process

• Recommendation: Choose this method if you can reorganize tables during


a maintenance window

Figure 8-47. Offline or Online Table REORG? (1 of 2) CL4636.0

Notes:
Classic table reorganization
This method provides the fastest table reorganization, especially if you do not need to
reorganize LOB or LONG data. In addition, indexes are rebuilt in perfect order after the
table is reorganized. Read-only applications can access the original copy of the table
except during the last phases of the reorganization, in which the permanent table replaces
the shadow copy of the table and the indexes are rebuilt.
On the other hand, consider the following possible disadvantages:
• Large space requirement
Because classic table reorganization creates the shadow copy of the table, it can
require twice as much space as the original table. If the reorganized table is larger than
the original, reorganization can require more than twice as much space as the original.
The shadow copy can be built in a temporary table space if the table tablespace is not
large enough, but the replace phase performs best in the same DMS table space.
Tables in SMS table spaces must always store the shadow copy in temporary space.

• Limited table access


Even read-only access is limited to the first phases of the reorganization process.
• All or nothing process
If the reorganization fails at any point, it must be restarted from the beginning on the
nodes where it failed.
• Performed under the control of the application that invokes it
The reorganization can be stopped only by that application or by a user who
understands how to stop the process and has authority to execute the FORCE
command for the application. Once the offline reorganization enters the replace/copy
phase, the REORG must complete or the table will not be available to be accessed,
which in some cases means that recovery would be required.


Instructor notes:
Purpose — This explains the major advantages and disadvantages for offline table
reorganization.
Details —
Additional information —
Transition statement — Let's look at the major advantages and disadvantages for online
table reorganization.


Offline or Online table REORG? (2 of 2)


Online Table REORG:
• PROS:
– Allows applications to access the table while executing
– Can be paused and resumed
– Runs asynchronously
– Requires less working storage since table is incrementally processed
• CONS:
– Slower than Classic method (~10-20x)
– Only allowed for tables with Type-2 indexes
– Cannot reorganize LONG/LOBs
– Indexes are maintained, not rebuilt, so index reorganization may subsequently be required
– Requires more log space
• Recommendation: Choose this method for 24x7 operations with minimal maintenance
windows
• Consider ADMIN_MOVE_TABLE procedure alternative:
– Target table can be reorganized prior to SWAP
– Minimal outage required for the SWAP phase


Figure 8-48. Offline or Online table REORG? (2 of 2) CL4636.0

Notes:
In-place table reorganization
The in-place method is slower and does not ensure perfectly ordered data, but it can allow
applications to access the table during the reorganization. In addition, in-place table
reorganization can be paused and resumed later by anyone with the appropriate authority
by using the schema and table name. Consider the following trade-offs:
• Imperfect index reorganization
You might need to reorganize indexes later to reduce index fragmentation and reclaim
index object space.
• Longer time to complete
When required, in-place reorganization defers to concurrent applications. This means
that long-running statements or RR and RS readers in long-running applications can
slow the reorganization progress. In-place reorganization might be faster in an OLTP
environment in which many small transactions occur.
• Requires more log space


Because in-place table reorganization logs its activities so that recovery is possible after
an unexpected failure, it requires more log space than classic reorganization. It is
possible that in-place reorganization will require log space equal to several times the
size of the reorganized table. The amount of required space depends on the number of
rows that are moved and the number and size of the indexes on the table.
In general, the recommendation is to choose in-place table reorganization for 24x7
operations with minimal maintenance windows.

Information

The ADMIN_MOVE_TABLE procedure might, in some cases, be considered as an alternative to using an INPLACE table reorganization. It can copy and reorganize a table allowing applications to continue with full read and write access, but does require a brief time when the table would be offline to swap to the new table. Running the procedure in step mode would allow the SWAP operation to be carefully timed to minimize impact to applications.

Instructor notes:


Purpose — This explains the major advantages and disadvantages for online table
reorganization. The details on ADMIN_MOVE_TABLE are provided in the Table Movement
unit, but it is mentioned here that there might be cases where it could be used instead of
the INPLACE REORG.
Details —
Additional information —
Transition statement — Let's look at the processing for online index reorganization.


Online index reorganization and creation (1 of 2)


• Rebuild Phase: Scan table, sort keys, and (re)build index:
– The new index is created in:
• Shadow Index Object: New, separate storage object (for example, SQL00003.IN1); used in the case of REORG
• Ghost index: New/free pages in existing index storage object for the table; used in the case of CREATE
– Concurrent R/W accesses allowed
• All update transactions that touch the indexed columns while a create/reorg is in progress log special informational log records for the new index (they don't update the shadow/ghost index)
• These informational log records are logged as well as buffered in a separate memory area

• Log Catch-up Phase: When done building shadow/ghost, scan memory buffer and apply
informational log records to shadow/ghost index
– Repeat as necessary to account for subsequent updates

• Final Log Catch-up Iteration and Shadow/Ghost switch-over:


– REORG: Quiesce all accesses, switch over to new object, drop old
– CREATE: Quiesce write accesses, make new ghost index visible

Table available for full R/W access during create/reorg (until final switch-over)


Figure 8-49. Online index reorganization and creation (1 of 2) CL4636.0

Notes:
Online index reorganization
DB2 allows applications to read and update a table and its existing indexes during an index
reorganization using the REORG INDEXES command.
During online index reorganization, the entire index object (that is, all indexes on the table)
is rebuilt. A shadow copy of the index object is made, leaving the original indexes and the
table available for read and write access. Any concurrent transactions that update the table
are logged. Once the logged table changes have been forward-fitted and the new index
(the shadow copy) is ready, the new index is made available. While the new index is being
made available, all access to the table is prohibited.
The default behavior of the REORG INDEXES command is ALLOW NO ACCESS, which
places an exclusive lock on the table during the reorganization process, but you can also
specify ALLOW READ ACCESS or ALLOW WRITE ACCESS to permit other transactions
to read from or update the table.

For tables using DMS table spaces, the indexes can be created in large table spaces
(formerly long table spaces). In situations where the existing indexes consume more than
32 GB, this will allow you to allocate sufficient space to accommodate the two sets of
indexes that will exist during the online index reorganization process.


Instructor notes:
Purpose — This shows some of the internal processing for online index reorganization and
also for creating new indexes.
Details —
Additional information —
Transition statement — Let's look at processing for online index reorganization with
concurrent application access to the table.


Online index reorganization and creation (2 of 2)

(Slide diagram: a timeline of online index REORG/CREATE with concurrent readers and updaters; old accesses drain away while new accesses start.)
• Online Index Reorg (Create) issued: drain existing accesses while allowing new accesses to start; for Create, this is a new UofW
• Rebuild Phase
• Log Catch-up Phase: the last iteration acquires an S table lock before the final log processing iteration
• Reorg: acquire Z table lock, switch to new object, remove old, commit
• Create: make new index visible; S table lock held until the user commits

Figure 8-50. Online index reorganization and creation (2 of 2) CL4636.0

Notes:
This shows the processing for online index reorganization using the REORG INDEX
command and also includes the processing for creating new indexes, using a CREATE
INDEX SQL statement because the steps are similar.
Drain existing accesses – When the index reorganization starts, a new unit of work is
started. The index REORG will need to wait for applications accessing the table to stop
before the rebuild phase starts. New applications can begin to access the table while the
index REORG waits.
Rebuild Phase – The indexes are built as a shadow object for index REORG, or a ghost
index for create index.
Log Catchup Phase – Index changes that occurred since the beginning of the Rebuild
phase are applied to the shadow/ghost index. A Read(S) lock will be acquired to wait for
update applications to complete.

© Copyright IBM Corp. 2005, 2015 Unit 8. Advanced Table Reorganization 8-137
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

Final Processing – For Index REORG, the Superexclusive(Z) lock will be acquired in
order to switch over to the new set of indexes. For Create Index, the Read(S) lock will not
be released until the application issues a commit.

Instructor notes:


Purpose — This shows some of the internal processing and locking performed for online
index reorganization and also for creating new indexes.
Details —
Additional information —
Transition statement — Let's look at command syntax for index REORGs and index
creation.


Online index reorganization syntax

REORG

>--+-+-INDEXES ALL FOR TABLE--table-name------------+--| Index clause |-+-->


| '-INDEX--index-name--+-----------------------+-' |
| '-FOR TABLE--table-name-'

Index-Clause:

.-REBUILD---------------.
|--+--------------------+--+-----------------------+------------|
+-ALLOW NO ACCESS----+ '-space-reclaim-options-'
+-ALLOW WRITE ACCESS-+
'-ALLOW READ ACCESS--'

space-reclaim-options:

|--+--------------------+--+-----------------+------------------|
| .-ALL---. | '-RECLAIM EXTENTS-'
'-CLEANUP--+-------+-'
'-PAGES-'


Figure 8-51. Online index reorganization and creation syntax CL4636.0

Notes:
The INDEX option specifies an individual index to be reorganized on a data partitioned
table. Reorganization of individual indexes are only supported for nonpartitioned indexes
on a partitioned table.
The REBUILD option is the default, which means the index contents will be rebuilt.
When CLEANUP is requested, a cleanup rather than a REBUILD is done. The indexes are
not rebuilt and any pages freed up are available for reuse by indexes defined on this table
only.
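
For example, a full rebuild of all indexes with concurrent updates allowed, followed by a lighter-weight cleanup pass (a sketch; t1 is a placeholder table name):

db2 "REORG INDEXES ALL FOR TABLE t1 ALLOW WRITE ACCESS"
db2 "REORG INDEXES ALL FOR TABLE t1 ALLOW READ ACCESS CLEANUP PAGES"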

Instructor notes:


Purpose — This shows examples of the syntax used for index reorganization.
Details —
Additional information —
Transition statement — Let's look at some additional considerations for online index
reorganization and index creation.


Online index reorganization and creation: Considerations
• Ensure the Utilities Heap is large enough to buffer informational log records:
– If utility heap is too small, log catchup phase may need to retrieve logs from archive
– Watch for warning messages in Administration Notification Log
– UTIL_HEAP_SZ can be dynamically changed

• For REORG, the shadow index object is created in the same table space as
the existing index object for ALLOW READ/WRITE ACCESS:
– If you do not have enough space for both in this table space:
• For SMS or DMS: consider adding more disk to the table space, or,
• For DMS: If you are approaching the regular DMS table space maximum size,
consider placing indexes in a LARGE table space

CREATE LARGE TABLESPACE myindexts MANAGED BY DATABASE ...
CREATE TABLE t1 (c1 INT,...) ... INDEXES IN myindexts
CREATE INDEX i1 ON t1 (c1)
COMMIT


Figure 8-52. Online index reorganization and creation: Considerations CL4636.0

Notes:
Here are some additional considerations for online index reorganization and index creation.
It might be necessary to increase the size of the database utility heap when reorganizing
indexes if the ALLOW WRITE ACCESS option is included. The informational log records
will be saved in the database utility heap and used during the Log Catchup phase. If the
utility heap is too small, log catchup phase might need to retrieve logs from archive. Watch
for warning messages in Administration Notification Log. The UTIL_HEAP_SZ can be
dynamically changed, if necessary.
For REORG, the shadow index object is created in the same table space as the existing
index object for ALLOW READ/WRITE ACCESS.
If you do not have enough space for both in this table space:
• Consider adding more disk to the table space
• If you are approaching the regular DMS table space maximum size, consider placing
indexes in a LARGE table space
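As a worked illustration of the second point, the following sketch creates a LARGE DMS
table space and directs a table's indexes to it. The table space name, file path, size, and
table definition are assumptions, not from the course material:

   db2 "CREATE LARGE TABLESPACE myindexts MANAGED BY DATABASE
        USING (FILE '/dbdata/myindexts.dat' 10000)"
   db2 "CREATE TABLE t1 (c1 INT) INDEX IN myindexts"
   db2 "CREATE INDEX i1 ON t1 (c1)"
   db2 commit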


Instructor notes:


Purpose — This explains that the processing for index reorganization might require
increased memory in the utility heap and increased table space disk space.
Details —
Additional information —
Transition statement — Let's look at the CLEANUP ONLY option for index reorganization.


Characteristics for standard Online Index reorganization
• The REORG utility with the INDEXES ALL FOR TABLE can be used to
rebuild all of the indexes on a table
• The ALLOW WRITE ACCESS option does not block application
changes to the table during index reorganization
– The default mode is ALLOW READ ACCESS
– An exclusive table lock is necessary to swap the new index object for
the old one at the end of processing
• Index reorganization is a single long running transaction, which could
impact active log space utilization
• The new index object needs to be allocated in the same table space as
the original index object
– Requires twice the index disk space
– Can generate I/O workload to build the index object copy


Figure 8-53. Characteristics for standard Online Index reorganization CL4636.0

Notes:
The visual shows some of the characteristics for the index reorganization processing with
DB2 9.7.
The ALLOW WRITE ACCESS option allows the indexes for a table to be reorganized with
minimal loss of access to the table, but an exclusive lock is required to swap to the new set
of indexes at the end of processing.
The REORG utility uses an index copy during index reorganization, which requires the
index table space to be able to hold both copies at the same time.
A lengthy index reorganization could impact database log space requirements, since it runs
as a long running single transaction.


Instructor notes:


Purpose — To review some of the characteristics for index reorganization at the DB2 9.7
level of function.
Details —
Additional information —
Transition statement — Next we will discuss the processing performed for the RECLAIM
EXTENTS option with index reorganization.


Starting with DB2 10.1 the RECLAIM EXTENTS option can be used for reclaiming
unused index object space
• Phases of processing for RECLAIM EXTENTS for index reorganization
– Phase 1 - Collocation
• Move pages around within the index object to form empty extents
• During this phase the reorg will do periodic commits
– Phase 2 - Drain
• Waits for concurrent access to release table locks
• Does not prevent new application requests from starting
• Does not create new log records in this phase
– Phase 3 – Extent reclaim
• Free extents in the index object are released to the table space
• Some utilities and commands are compatible
– IMPORT (insert), EXPORT, BACKUP, RUNSTATS
– ALTER TABLESPACE
• Some utilities are not compatible
– IMPORT (replace), LOAD


Figure 8-54. Starting with DB2 10.1 the RECLAIM EXTENTS option can be used for reclaiming unused index object space CL4636.0

Notes:
Beginning with DB2 10.1, the REORG utility provides the RECLAIM EXTENTS option, which
moves index pages around within the index object to create empty extents, frees these
empty extents from exclusive use by the index object, and makes the space available for
use by other database objects within the table space. Extents are reclaimed from the
index object back to the table space.
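A minimal invocation might look like the following sketch. The table name CLPM.HIST1 is
an assumption, and the ALLOW WRITE ACCESS clause is shown only to emphasize that
the reclaim processing runs online:

   db2 "REORG INDEXES ALL FOR TABLE clpm.hist1 ALLOW WRITE ACCESS RECLAIM EXTENTS"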


Instructor notes:


Purpose — To discuss using the RECLAIM EXTENTS option of the REORG utility to
release unused index space.
Details —
Additional information —
Transition statement — Next we will see how the ADMIN_GET_INDEX_INFO table
function can be used to check for reclaimable index space.


Checking indexes for reclaimable space

For indexes in DMS table spaces, the RECLAIMABLE_SPACE value returned by the
ADMIN_GET_INDEX_INFO table function is an estimate of the disk space that can be
reclaimed from the entire index object by running the REORG INDEXES command with
the RECLAIM EXTENTS option.

select varchar(indname,20) as index_name,
       iid, index_object_l_size, index_object_p_size,
       reclaimable_space
from table(admin_get_index_info('','CLPM',NULL)) as T1

INDEX_NAME           IID    INDEX_OBJECT_L_SIZE  INDEX_OBJECT_P_SIZE  RECLAIMABLE_SPACE
-------------------- ------ -------------------- -------------------- --------------------
HIST1IX1                  1                 2784                 2784                  480
HIST1IX2                  2                 2784                 2784                  480
HIST2IX1                  1                 2784                 2784                  448
HIST2IX2                  2                 2784                 2784                  448

4 record(s) selected.


Figure 8-55. Checking indexes for reclaimable space CL4636.0

Notes:
The ADMIN_GET_INDEX_INFO table function can be used to query index status
information. The RECLAIMABLE_SPACE column provides an estimate of disk space, in
kilobytes, that can be reclaimed from the entire index object by running the REORG
INDEXES or REORG INDEX command with the RECLAIM EXTENTS option.
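To scan a whole schema for reclaim candidates, the same table function can be filtered
on that column. This sketch assumes the CLPM schema and an arbitrary threshold of
1024 KB; both are illustrative:

   SELECT VARCHAR(indname, 20) AS index_name, reclaimable_space
     FROM TABLE(admin_get_index_info('', 'CLPM', NULL)) AS t
    WHERE reclaimable_space > 1024
    ORDER BY reclaimable_space DESC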


Instructor notes:


Purpose — To show an example of a SQL query that uses the ADMIN_GET_INDEX_INFO
table function to check for reclaimable space in the index object for DB2 tables.
Details —
Additional information —
Transition statement — Next we will see examples of using the RECLAIM EXTENTS
option of index reorganization.


Using REORG INDEXES to reclaim index space


REORG indexes all for table clpm.hist1 allow write access
  cleanup all reclaim extents ;

REORG indexes all for table clpm.hist2 allow write access
  cleanup all reclaim extents ;

INDEX_NAME           IID    INDEX_OBJECT_L_SIZE  INDEX_OBJECT_P_SIZE  RECLAIMABLE_SPACE
-------------------- ------ -------------------- -------------------- --------------------
HIST1IX1                  1                 2240                 2240                    0
HIST1IX2                  2                 2240                 2240                    0
HIST2IX1                  1                 2272                 2272                    0
HIST2IX2                  2                 2272                 2272                    0

4 record(s) selected.


Figure 8-56. Using REORG INDEXES to reclaim index space CL4636.0

Notes:
The sample REORG INDEXES ALL commands could be used to release unused index
object disk space without the processing overhead of a full index reorganization.


Instructor notes:


Purpose — To show the syntax used for running the REORG utility to reclaim index pages.
Details —
Additional information —
Transition statement — Next we will look at monitoring for an index reorganization using
the db2pd command.


Monitoring online index REORG status with db2pd commands

db2pd -reorgs index -db testdb

Index Reorg Stats:


Retrieval Time: 02/08/2010 23:04:21
TbspaceID: -6 TableID: -32768
Schema: TEST1 TableName: BIGRPT
Access: Allow none
Status: Completed
Start Time: 02/08/2010 23:03:55 End Time: 02/08/2010 23:04:04
Total Duration: 00:00:08
Prev Index Duration: -
Cur Index Start: -
Cur Index: 0 Max Index: 2 Index ID: 0
Cur Phase: 0 ( - ) Max Phase: 0
Cur Count: 0 Max Count: 0
Total Row Count: 750000


Figure 8-57. Monitoring online index REORG status with db2pd commands CL4636.0

Notes:
You can use the db2pd command to monitor the progress of index reorganization
operations on a database. Issue the db2pd command with the -reorgs index parameter.
The visual shows an example of a db2pd report indicating that an index reorganization has
completed.
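To follow a long-running reorganization, the report can be refreshed automatically with
the db2pd repeat option. This sketch (the database name TESTDB and the interval and
count are assumptions) reissues the report every 10 seconds, 30 times:

   db2pd -reorgs index -db testdb -repeat 10 30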


Instructor notes:


Purpose — To show an example of the db2pd report that could be used to monitor index
reorganization.
Details —
Additional information —
Transition statement — Let's look at logging for online table reorganization.


Online Table and Index REORG: Logging


• Online Table Reorg can generate many log records but uses
internal commits to limit the active log space needed:
– Reclustering Reorg may move each row twice which is logged
– Each Index on the table is updated several times per row movement
and must be logged
– The reclustering index may experience splits and merges which are
logged
– Log space depends on:
• Number of rows moved
• Number of indexes on the table
• Index key size

• Online Index Reorg processing is a single unit of work. If it
  runs out of log space, it will be completely rolled back.


Figure 8-58. Online Table and Index REORG: Logging CL4636.0

Notes:
As Online Table REORG (OLR) consumes log space, it frequently issues internal commits
and so does not hold significant amounts of log space active. Online Index REORG does
not commit internally as it executes and therefore holds back the active log. If there is
insufficient active log space, Online Index REORG will fail and all work done will be rolled
back.
It is likely that every row has to be moved twice during OLR. Given one index, each row
move has to update the index key to add the new location, and then when all users of the
old location are gone, the key is updated again to remove the old reference. The row is
moved back and the steps are followed all over again. So, for that 1 row, there are quite a
few logical reads and writes, probes of the index, etc. All this is also logged so that the OLR
is fully recoverable, so there is a minimum of 2 data log records (including the row data
each time) and 4 index log records (including the key data each time). The reclustering
index in particular will also be prone to filling up the index pages causing index splits and
merges which also must be logged.
In a nutshell, log space required for OLR is extremely variable and is entirely dependent on
the number of rows being REORGed, the number of indexes, the size of the index keys,

8-154 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.2
Instructor Guide

Uempty and how unorganized the table is to start with. Best practice is to establish a typical
benchmark for log space utilization.
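One way to build such a benchmark is to sample transaction log usage while the REORG
runs. This is a sketch using the MON_GET_TRANSACTION_LOG table function (available
starting with DB2 10.1; the argument -1 requests the current member):

   SELECT total_log_used, total_log_available
     FROM TABLE(MON_GET_TRANSACTION_LOG(-1)) AS t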


Instructor notes:
Purpose — This discusses the logging produced by an online table or index
reorganization. The total logging for an online table REORG can be several times the size
of the table, although the units of work will not be very large.
Details —
Additional information —
Transition statement — Let's look at logging for offline table reorganization.


Offline table REORG: Logging


• Offline Table Reorg logging is much less than online reorg:
– Each offline table reorg will generate a few control log records which
do not consume much log space
– For DMS-based tables that are reorged without using a temp space,
there are additional log records to manage the reorg object
– For a reclustering reorg, the RIDs of every record are logged
• RIDs are not logged if the database is not recoverable (circular logging, LOGRETAIN off)


Figure 8-59. Offline table REORG: Logging CL4636.0

Notes:
In general, the offline table REORG logs much less than an online table reorganization.
Each offline table REORG will generate a few control log records which do not consume
much log space.
If the REORG is a reclustering REORG, the RIDs of every record are logged. Each log
record for the RIDs can hold a maximum of 8000 RIDs, with each RID consuming 4 bytes.
This logging can be a contributing factor to running out of log space during an offline
REORG. If the database is using circular logs (LOGRETAIN=OFF), the index RIDs are not
logged because they are only needed to support redoing a REORG during roll forward
processing.
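As a rough worked example (the row count is illustrative, not from the course material): a
reclustering offline REORG of a 40-million-row table would log about
40,000,000 / 8,000 = 5,000 RID log records, each holding up to 8,000 x 4 bytes = 32 KB
of RIDs, or roughly 160 MB of log data for the RIDs alone.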


Instructor notes:
Purpose — This explains the factors that affect logging for offline table reorganization.
Details —
Additional information —
Transition statement — Let's look at considerations for crash recovery processing
associated with the REORG utility.


REORG utility: Crash recovery


• Offline Table Reorg:
– Crash Recovery during Scan-sort or Build phase will simply revert to existing
table
– Crash Recovery during Copy/Replace will need to redo copy from shadow copy
of table
– Redoing the Copy/Replace will not take long for a DMS table if no
USE option was specified
– Crash during Index Build phase will mark indexes as invalid to defer processing
– Loss of TEMPORARY table space could force table space recovery
• Online Table Reorg:
– Crash Recovery will only undo current unit of work which should not be large
– Reorg will be paused but can be resumed
• Online Index Reorg:
– All logged changes will be rolled back
– Create Index will need to be reissued


Figure 8-60. REORG utility: Crash recovery CL4636.0

Notes:
Offline table REORG is all-or-nothing up until the Copy phase has begun. That is, the table
REORG is rolled back if the Copy phase has not been reached.
Online table REORG processes a table incrementally, is fully logged, and commits
regularly. If a crash occurs during an online table reorganization, the current unit of work
will be rolled back and the utility processing will be paused. The REORG TABLE command
with the RESUME option can be used to restart and complete processing.
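A sketch of that restart, assuming the reorganization of a table named CLPM.HIST1 was
paused by the crash (the table name is a placeholder):

   db2 "REORG TABLE clpm.hist1 INPLACE RESUME"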
For crash recovery that involves an offline table REORG, the temp file for the REORG
object is required, but not the temp space required by doing the scansort. Suppose a crash
occurs when the REORG object is completely populated and the process of copying it back
over the old data has started. Recovery will restart the Copy/Replace phase from the
beginning and all the data in the REORG object is required for recovery. There is a big
difference between SMS and DMS table spaces in this case. The SMS object will have to
be copied from one object to the other, while for DMS, the new REORG object is simply
pointed to and the original table dropped if the reorg was done in the same table space.


If a crash recovery is performed for an offline table REORG and the temporary table space
data needed to do the copy/replace is lost, it might be necessary to restore the table space
to recover access to the tables in the table space of the REORGed table.
If a crash occurs during online index reorganization, the processing will be rolled back and
the CREATE INDEX will need to be run again.


Instructor notes:


Purpose — This explains some of the considerations for crash recovery processing if a
REORG was running at the time of the crash.
Details —
Additional information —
Transition statement — Let's look at the processing associated with REORG utilities during
roll forward recovery of a database or table space.


REORG utility: Roll forward recovery


• During database or table space roll forward, the REORG processing will
be redone

• For an offline reclustering REORG, the logged RIDs will be used to
  recluster the table so no scan-sort is needed

• For offline REORGs, space will be allocated in the table's table space if the
  original REORG did not include the USE option for a temporary table space
  – Parallel recovery could redo several REORGs at the same time and
    require additional table space pages

• Online REORGs will be redone using logged changes, which may take
  a long time

• Best to back up the database or table spaces after REORGs to avoid
  redoing the REORG processing during a roll forward


Figure 8-61. REORG utility: Roll forward recovery CL4636.0

Notes:
Rolling forward through an offline table REORG will use the logged RIDs to recreate the
order of operations that created the reorged object. This means that there is no index scan
or scansort performed, and the only temp space required would be for the REORG object
itself, which would be within the table's table space if no TEMP space was originally used.
During an offline table REORG, unless the REORG copy is specified to be in a different
TEMP table space, the REORG copy is built in the same table space. During roll forward,
multiple REORGs can be replayed concurrently due to parallel recovery. In this case, the
disk space usage will be different (more is consumed) than at run time, and a roll forward
error could be caused by running out of available space.
The roll forward processing for the online REORG could take a long time to redo the large
number of changes logged during online table reorganization.
To avoid the additional roll forward processing for online or offline REORGs, taking a
database or table space backup when the REORGs are completed would allow the roll
forward processing to begin at a point in the logs after the REORGs are committed.
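For example, after the reorganizations complete, an online table space backup could be
taken. In this sketch the database name, table space name, and target path are all
assumptions:

   db2 "BACKUP DATABASE testdb TABLESPACE (tspace1) ONLINE TO /dbbackup"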


Instructor notes:


Purpose — This discussed some of the considerations for roll forward processing of online
and offline reorganization.
Details —
Additional information —
Transition statement — Let's look at the incompatibilities between online and offline
REORGs with other DB2 commands or functions.


Incompatibilities

                      Offline TABLE   Online TABLE   Online INDEX
                      REORG           REORG          REORG
CREATE INDEX                X               X              X
RUNSTATS                    X
Offline TABLE REORG         X               X              X
Offline LOAD                X               X              X
Online BACKUP               X
ROLLFORWARD                 X               X              X
INSPECT                     X
IMPORT                      X               X
EXPORT                      X
Online INDEX REORG          X               X              X
RESTORE                     X               X              X
Online TABLE REORG          X               X              X
Online LOAD                 X               X              X
DDL Z-lock                  X               X              X


Figure 8-62. Incompatibilities CL4636.0

Notes:
This table shows which DB2 utilities or functions are not allowed to operate concurrently
(shown with an X) with table and index reorganizations running online or offline. Note that
none of the utility functions are compatible with an offline table reorganization.
Online (inplace) table reorganization and online index reorganization can be running
concurrently with an online Backup, RUNSTATS, INSPECT, and EXPORT. In addition, the
online index reorganization can be running concurrently with an IMPORT utility.


Instructor notes:


Purpose — This shows the incompatibilities between REORG utility operations and other
DB2 commands or functions.
Details —
Additional information —
Transition statement — Let's look at the locking considerations for the REORG utility.


Locking for the REORG utility


• REORG TABLE:
– IS table and NS row lock on row in SYSCAT.TABLES
– Classic, non-inplace - IX table space, U table, upgrade to Z for
replacement
– In-place - IX table space, IS table, X alter table, S on rows
moved/cleanup, upgrade to S on table to prepare for truncate,
special Z drain/wait to truncate

• REORG INDEXES:
– IS table and NS row lock on row in SYSCAT.TABLES
– IX table space, IN table, X alter table
– S drain lock for each index (all writers must be aware)
– S lock at end to perform final log catch-up, then Z lock to perform
index switch

Figure 8-63. Locking for the REORG utility CL4636.0

Notes:
Here are some of the lock considerations for table and index reorganization.
REORG TABLE
• IS table and NS row lock on row in SYSCAT.TABLES
• Classic, non-inplace – IX table space, U table, upgrade to Z for replacement
• In-place – IX table space, IS table, X alter table, S on rows moved/cleanup, upgrade to
S on table to prepare for truncate, special Z drain/wait to truncate
REORG INDEXES
• IS table and NS row lock on row in SYSCAT.TABLES
• IX table space, IN table, X alter table
• S drain lock for each index (all writers must be aware)
• S lock at end to perform final log catch-up, then Z lock to perform index switch
Additional notes:
Does ONLINE Table REORG issue commit? If so, what happens when a commit cannot be
issued due to lock-wait status?


Yes, the inplace REORG issues commits after each phase of processing. If the inplace
REORG is in Lock-wait, it will behave like any other transaction that is in Lock-wait that has
not issued a commit.
Can a user application (who does not have REORG privileges) acquire an Internal Reorg
Lock on a table even while it is NOT doing a REORG? A lock snapshot of a user
application is showing an Internal Reorg Lock.
An application that is executing against the same table that is being REORGed will show a
shared Internal Reorg Lock. This does not mean that particular application is performing a
REORG, just that it is executing at the same time as the REORG, and on the same table.
This type of lock is used as a synchronization mechanism for inplace REORG processing.
Can table reorg lock timeout a user application?
Yes, just like any other transaction.
When Online Table REORG (OLR) yields to an application, is it by design for it to leave an
open transaction?
NO. OLR needs to complete the transaction to remove temporary records created by OLR
before it can commit.
Can OLR be paused when in Lock-wait status? If it can, will it still hold the active log when
paused?
OLR can be paused when in Lock-wait status. However, after being paused, OLR will be
put in a Cleanup phase. And during Cleanup phase, it is possible that OLR might again end
up in Lock-wait status since it might have to wait for applications to release locks before
cleaning up the rows. Thus pausing OLR does not guarantee that it stays out of Lock-wait
status, and hence it might still hold the active log after being paused. When the pause is
able to complete, there will no longer be an OLR process/transaction to hold back the
active log.
Does OLR always acquire locks without timeout or does it depend on the phase OLR is in?
Yes, OLR always acquires locks without timeout.
General recommendation is to commit frequently on application side. If OLR does not
make progress for a long time, get a snapshot, find out which application holds the lock,
and force that application so that OLR can progress. There might be a long-running open
cursor from a command line, monitoring tool, or other script which might be holding back
the progress of OLR indefinitely, so try to identify that application during this process.
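A sketch of that diagnostic sequence, in which the database name TESTDB and the
application handle 1234 are placeholders:

   db2pd -db testdb -locks showlocks
   db2 list applications
   db2 "force application ( 1234 )"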
If OLR is just like any other transaction, does it time out based on the LOCKTIMEOUT
parameter? What about online index REORG (OLIR)?
For online table REORG, the LOCKTIMEOUT parameter is not respected, that is, OLR
does not timeout based on LOCKTIMEOUT. Since OLR was designed to run in the
background with minimal impact to concurrent applications, it will wait on locks indefinitely.
For online index REORG at the end of the catch-up phase, it first quiesces the writers by
requesting a S lock and then before switching the shadow object to the real index object, it


requests a Z lock. Both locks are not to be timed out and will be less likely to be chosen as
the deadlock victim if deadlock occurs.
Is Online Index REORG any different than Online Table REORG with respect to
concurrency?
Inplace table REORG was designed to have minimal impact on concurrent applications.
Inplace Table REORG holds S row locks for very short periods of time while it moves rows
around in the table. Online Index REORG is not inplace; it uses a shadow object which is
invisible to the users until a switch occurs between the original and the shadow object. This
means that for an index REORG that specifies ALLOW WRITE ACCESS, minimal locking
will be done during the Index Build phase but all activity on the indexes will need to be
quiesced in order to switch the original and rebuilt (shadow) objects. Inplace Table REORG
has no comparable need for a quiesce so there is less of a concurrency issue with that
operation.


Instructor notes:


Purpose — This describes the locking considerations for the REORG utility running in
different modes.
Details —
Additional information —
Transition statement — Let's summarize this unit.


Unit summary
Having completed this unit, you should be able to:
• Describe the reasons for reorganizing tables and indexes
• Examine a REORGCHK report to determine which tables and indexes
to reorganize
• Use the db2pd command to monitor REORG utility progress
• Utilize the REORG utility to implement compression for tables and
indexes
• Compare using REORG to build a compression dictionary to automatic
dictionary creation
• Plan the use of offline and online table and index reorganizations to
minimize the impact to applications and optimize performance
• Utilize the RECLAIM EXTENTS option of REORG to free unused space
in data and indexes with minimal processing
• Describe the locking and logging required for online and offline
REORGs

Figure 8-64. Unit summary CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement — End of unit.


Student exercise 7


Figure 8-65. Student Exercise 7 CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —



Unit 9. Multiple Dimension Clustering

Estimated time
02:00

What this unit is about


This unit describes the concepts and implementation steps required to
implement multidimensional clustering (MDC) tables. Students learn
which types of SQL queries might benefit from the block-level indexing
and prefetching used for MDC tables. We will cover the options
available to efficiently roll-in or roll-out large amounts of data, including
using a table reorganization to reclaim unused disk space.

What you should be able to do


After completing this unit, you should be able to:
• Compare the features and performance advantages of
multidimensional clustering (MDC) to single-dimensional clustering
• Define the concepts of MDC tables, including cell, slice, and
dimension
• Describe the characteristics of the block indexes used for MDC
tables including the index maintenance performed for SQL
INSERT, DELETE, and UPDATEs
• Explain how the block and row indexes can be combined to
efficiently process SQL statements
• Utilize the LOAD utility to roll-in new data into a MDC table
• Select options for efficient data roll-out and roll-in
• Analyze the effects on table space size of selecting alternative
dimensions and extent sizes


Unit objectives
After completing this unit, you should be able to:
• Compare the features and performance advantages of multidimensional clustering
(MDC) to single-dimensional clustering
• Define the concepts of MDC tables, including cell, slice, and dimension
• Describe the characteristics of the block indexes used for MDC tables including the
index maintenance performed for SQL INSERT, DELETE, and UPDATEs
• Explain how the block and row indexes can be combined to efficiently process SQL
statements
• Utilize the LOAD Utility to roll-in new data into a MDC table
• Select options for efficient data roll-out and roll-in
• Analyze the effects on table space size of
selecting alternative dimensions and extent sizes


Figure 9-1. Unit objectives CL4636.0

Notes:
Here are the objectives for this unit.


Instructor notes:


Purpose — Identify the objectives of this unit.
Details — State the objectives.
Additional information —
Transition statement —


Single dimensional clustering


• Benefits:
  – One can physically cluster data on insert according to the order
    of a single clustering index
  – It improves the performance of range queries and prefetching
• How it works:
  – On insert, the clustering index is used to find the location of the
    same or next key and we attempt to insert on the same page
  – If there is insufficient space on the target page, we search in
    a spiral fashion through the target free space map page
    (covers 500 data pages)
  – If no page is found in the target free space map page, we
    search other free space map pages using the normal insert
    algorithm but using a worst-fit search
• Drawbacks:
  – Clustering is in a single dimension only
  – All other indexes are unclustered and may require more I/Os
    for scanning ranges of rows
  – Clustering degrades over time, requiring REORG
  – These are row-based indexes, which are often very large

(Diagram: a table with a clustering index on region and an unclustered index on year)


Figure 9-2. Single dimensional clustering CL4636.0

Notes:
Benefits:
• The CLUSTER option for an index on a table can be used to physically cluster data on
insert according to the order of a single 'clustering' index. The primary purpose of the
clustered index is to improve the performance of range queries and prefetching. For
example, if there was a clustered index on the REGION column, queries that need to
access one region or a range of regions will be able to efficiently retrieve the data from
a relatively small number of pages.
• When new rows are inserted into a table with a clustered index, the clustering index is
used to find the location of the same or next key and we attempt to insert using the
same page. If there is insufficient space on the target page, we search in a spiral
fashion through the target free space map page, which covers a 500 data page range. If
no page is found in the target free space map page, we search other free space map
pages using the normal insert algorithm but using a worst-fit search to place the new
row in an area that could efficiently handle additional similar inserts.
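A minimal sketch of defining such a clustering index and reclustering the table when the
clustering degrades; the table and column names here are illustrative, not from the course
material:

   db2 "CREATE INDEX region_ix ON sales (region) CLUSTER"
   db2 "REORG TABLE sales INDEX region_ix"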


Drawbacks:
• A table can only have one clustered index. All other indexes are unclustered and might
require more I/Os for scanning ranges of rows. In the example shown, an index on the
YEAR column might be very unclustered, so that queries that need to access the rows
for one or several years might need to access a large number of pages. In many cases,
the DB2 Optimizer might choose to perform a full table scan rather than use an
unclustered index, if using the unclustered index would require too many I/Os and take
longer than the table scan.
• If there is not enough free space within the table to insert the new rows, the clustering
might degrade over time. The REORG utility can be used to recluster a table, so that
the sequence of the rows in the data pages matches the sequence of keys in the
clustered index.
• Standard indexes are row-based indexes, containing a pointer to every row in the table,
so large tables with several indexes might require a large amount of disk storage for the
indexes.


Instructor notes:
Purpose — To review the characteristics of tables with a single dimensional clustered
index.
Details —
Additional information —
Transition statement — Let's introduce the concept of multidimensional clustering for
DB2 tables.


9.1. Multidimensional Clustering

Instructor topic introduction


What students will do —
How students will do it —
What students will learn —
How this will help students on their job —


MDC: Rows clustered by Dimension values


• In multidimensional clustered (MDC) tables, data is organized
along block (extent) boundaries according to dimension
(clustering) values

(Diagram: extents making up an MDC table with dimensions region and year. Blocks hold
rows for the cells (East, 2012), (North, 2012), (North, 2013), and (South, 2012); a
dimension block index on Region and a dimension block index on Year point to the blocks
where the data is stored.)


Figure 9-3. MDC: Rows clustered by Dimension values CL4636.0

Notes:
In multidimensional clustered (MDC) tables, data is organized along block (extent)
boundaries according to dimension (clustering) values.
In this example, we have an MDC table with two dimensions: region and year.
• When data is inserted into the table, records having different dimension values are put
into separate extents.
• In this way, each extent contains data that has a particular combination of dimension
values; and a particular set of dimension values will ONLY be found in a subset of
extents of the table.
• There might be more records having a particular set of dimension values than fit in a
single extent. Multiple extents can be assigned to a particular set of dimension values. If
there are just a few records with a particular set of dimension values, the extent will be
partially filled and that space cannot be used for records with a different set of
dimension values. For example, there are only a few records for the Region 'North' and
Year '2012', but if the block containing rows for region 'North' and year '2013' is filled a


new empty block will need to be allocated, rather than using the empty pages in the
block for region 'North' and year '2012'.
• Dimension values are indexed with BLOCK indexes, that is, indexes which point to
extents instead of individual records.
• Selection of Block size (Extent size) for MDC tables will have a significant impact on the
disk space required to store the data.


Instructor notes:
Purpose — Identify how MDC works.
Details — MDC is a way to cluster the data so that it never gets out of clustered sequence,
and so that it can be clustered on several different dimensions at the same time.
Additional information — You should indicate that block and extent are interchangeable
terms in MDC tables.
The important thing to note is that, within any given extent, all of the rows will have the
same values for each of the dimensions. In the example, dimensions are region and year.
There are only a few rows for North/2012 so several pages are empty, while there are
enough rows for North/2013 to require several blocks. Rows with different dimension
values will never share a block of pages.
Transition statement — Let's look at the syntax used to define an MDC table.


MDC Create Table example


CREATE TABLE MDCTABLE (
YEAR INT,
STATE CHAR (2),
SALES INT,
...)
ORGANIZE BY (YEAR, STATE)

(Diagram: block map for MDCTABLE, with a block index on YEAR and a block index on STATE)

  Extent (logical pages)   Cell       Physical pages
  0-3                      2012, NY   132-135
  4-7                      2012, NY   700-703
  8-11                     2012, NJ   444-447
  12-15                    2013, NM   704-707


Figure 9-4. MDC Create Table example CL4636.0

Notes:
The following is an example of the Data Definition Language (DDL) used to define a MDC
table:
CREATE TABLE MDCTABLE (
YEAR INT,
STATE CHAR (2),
SALES INT,
….)
ORGANIZE BY (YEAR, STATE)
This SQL statement would create a MDC table with two dimensions, YEAR and STATE. All
of the records for the YEAR '2012' in the STATE 'NY' will be stored in one or more blocks,
while records for the YEAR '2012' in the STATE 'NJ' will be stored in other blocks. One
index will be maintained to point to each block by the YEAR value and another index will
point to each block by the STATE value. This allows applications to efficiently access the
table by either YEAR or STATE independently, as well as find the rows for any combination
of YEAR and STATE.


Notice that it is not necessary to define any specific values or ranges for the YEAR and
STATE dimensions.
The mapping of the MDC block to the table space extent improves the I/O performance for
accessing a block of rows. The pages within an extent can be stored consecutively on disk
even though all of the table's pages might not reside sequentially on disk.
The dimensions defined on a MDC table are not limited to a single column. Here is an
example of a CREATE TABLE with two dimensions, one using the two columns Year and
Nation and a second dimension on the single column Color.
CREATE TABLE MDCTABLE
( Year INT,
Nation CHAR(25),
Color VARCHAR(10), .......... )
ORGANIZE BY( (Year, Nation), Color )
This MDC table will have 2 dimension block indexes, one on (Year, Nation), another on
Color and a composite block index on (Year, Nation, Color).


Instructor notes:


Purpose — An example is provided to show how simple it is to define a MDC table.
Details —
Additional information —
Transition statement — Let's look at the term dimension used for defining MDC tables.


Terminology: Dimension
• A dimension is an axis along which data is physically
organized in an MDC table

(Diagram: a cube with axes for the Nation, Color, and Year dimensions, made up of cells
such as (2013, Canada, blue), (2012, Canada, yellow), (2013, Mexico, yellow), and so on)

Figure 9-5. Terminology: Dimension CL4636.0

Notes:
In this example, we have three dimensions: nation, color, and year.
• The table can be thought of as being organized into a three-dimensional cube.
• Data having particular dimension values can be found via that dimension's axis in the
grid.
• This cube is simply a way of conceptualizing how the data is organized in a MDC table
having three dimensions.


Instructor notes:


Purpose — Defining terms: dimension.
Details — In this particular example, we have three dimensions on the table: nation, color,
and year. A cube is a good way to illustrate these dimensions and distinct values of the
dimensions.
Additional information —
Transition statement — Let' look at the next term: slice.


Terminology: Slice
• A slice is the portion of the table containing data having a
certain key value of one of the dimensions

(Diagram: the same cube, with the Canada slice of the Nation dimension highlighted: all
cells having nation = Canada)

Figure 9-6. Terminology: Slice CL4636.0

Notes:
Continuing with the three-dimensional table example:
• Any value of a particular dimension defines for us a slice of the table.
• That slice contains all data in the table having that value for that dimension, and only
that data.
• In this case, we show the slice for nation = Canada.
• The data with year = 2013, would be a slice based on the Year dimension.
• A request for all rows with color = yellow, would be a slice based on the Color
dimension.


Instructor notes:


Purpose — Define slice.
Details — Use visual to define slice. A slice is illustrated.
Additional information —
Transition statement — The next term is cell.


Terminology: Cell
• A cell is the portion of the table containing data having a
unique set of dimension values;
– The intersection formed by taking a slice from each dimension

(Diagram: the cube with the cell for (2013, Canada, yellow) highlighted, the intersection
of the 2013, Canada, and yellow slices. Each cell contains one or more blocks.)

Figure 9-7. Terminology: Cell CL4636.0

Notes:
A cell is the overlap or intersection of slices from each of the dimensions.
• There is a logical cell in the table for each unique combination of existing dimension
values.
• Each cell is physically made up of one or more extents or blocks, which themselves
contain data having the cell's dimension values. All the rows with a nation of 'Canada', a
color of 'Yellow', and year value of '2013' would make up one cell in the MDC table.


Instructor notes:


Purpose — Define cell.
Details — A cell is illustrated.
Additional information — Physically, each cell can be made up of one or more blocks.
Transition statement — Next we look at the block index for one dimension.


Dimension block indexes


Each key value corresponds to a different slice of the table.

(Diagram: the cube with Nation, Color, and Year dimensions; the block index on the Nation
dimension has one key per value, such as Canada and Mexico, each pointing to the blocks
of that slice)

Key for Canada (each key has a list of BIDs, one for each block belonging to that slice
of the table):

  Canada | 4,0  12,0  48,0  52,0  76,0  100,0  216,0  292,0  304,0  444,0

  Keypart | BID (Block ID) = <first pool relative page of block, 0>


Figure 9-8. Dimension block indexes CL4636.0

Notes:
• This diagram attempts to illustrate that each slice and cell in the table actually contains
a number of extents or blocks containing data associated with those slices or cells.
• A slice corresponds to a key and its list of IDs in a block index.
• Note that the index key has a list of block IDs; a block ID is made up of the first pool
relative page of the block and a dummy (0) slot. In the example, the first block ID with a
key value of 'Canada' in the Nation Dimension index is (4,0). If the MDC has a block
size of 4 pages, DB2 would access pages 4,5,6 and 7.
• Compare this to a RID in a RID index, which is made up of the page number and slot
number of a record in the table.
• Since the BID and RID structure is so similar, index manager treats both the same until
the block or record is actually accessed.


Instructor notes:


Purpose — Show what a block index looks like.
Details — A dimension block index, for example, for the dimension "" would have an entry
for "" and then pointers to all the extents that contain "" in the "" dimension. Instead of
having real slot values, a dummy value of 0 is held for the slot.
Additional information — A block index is made up of key part and set of block IDs (just
as a RID index is made up of key part and RID); the slot is a dummy entry of "0".
Transition statement — Let's look at the how these block indexes are created and
maintained.


Block Indexes
• Automatically created and maintained by DB2 when the
CREATE TABLE statement is issued

• As system objects, block indexes cannot be manually created,


renamed or dropped

• Types of Block Indexes:


– Dimension Block Index: One per dimension, which contains
pointers to each occupied block for that dimension. Each key
value points to one slice.
– Composite Block Index: One per table, which contains all
columns involved in all dimensions specified for the table. The
composite block index is used to maintain clustering during
insert and update activities. It can also be used for query
processing. Each key value points to one cell.

Figure 9-9. Block Indexes CL4636.0

Notes:
The Block indexes for MDC tables are automatically created and maintained by DB2 when
the CREATE TABLE statement is issued. As system objects, block indexes cannot be
manually created, renamed or dropped.
When you create a table, you can specify one or more keys as dimensions along which to
cluster the data. Each of these MDC dimensions can consist of one or more columns
similar to regular index keys.
A dimension block index will be automatically created for each of the dimensions specified,
and it will be used by the optimizer to quickly and efficiently access data along each
dimension. Each key value points to one slice.
A composite block index will also automatically be created, containing all columns across
all dimensions, and will be used to maintain the clustering of data over insert and update
activity. A composite block index will only be created if a single dimension does not already
contain all the dimension key columns. The composite block index might also be selected
by the optimizer to efficiently access data that satisfies values from a subset, or from all of
the column dimensions. Each key value points to one cell.


Instructor notes:


Purpose — This explains that ALL of the block indexes are created as part of the CREATE
TABLE processing. The composite block index is introduced here as an additional required
index that is used for insert, update and delete operations in addition to query processing.
Details —
Additional information —
Transition statement — Now let’s compare block indexes to the standard row indexes on
DB2 tables.


Row Indexes versus Block Indexes

Row Indexes: 1 index entry per row
Block Indexes: 1 index entry per block

(Diagram: a large row index tree pointing to individual rows, next to a much smaller
block index tree pointing to extents (blocks))


Figure 9-10. Row Indexes versus Block Indexes CL4636.0

Notes:
The standard indexes on DB2 tables are Row ID (RID) indexes, with one RID per data
record.
Block Indexes (BIDs) only have one index entry pointing to each block. Consequently, the
size of the BID is dramatically smaller than the RID index. The smaller size for the block
index increases the probability that a BID index page needed is in the memory cache and
reduces the number of buffer pool pages required to maintain a high index hit ratio.
Both RID and Block indexes are maintained as tree structures. The example shows that the
block index, with fewer levels can reduce the number of index pages required to access the
table's data.
Multiple block indexes can be created for access to the same fact table. Since these
indexes are small, the disk space required to support these indexes is also reduced.


Instructor notes:


Purpose — This is intended to clarify the advantage of using block indexes in reducing the
size and number of levels in the index.
Details —
Additional information —
Transition statement — Let's look at some of the characteristics of the block indexes.


Block Index characteristics


• Rows are sorted and grouped by dimension values
• Each Dimension has its own index:
– Indexes are small because they point to blocks of rows (not RIDs),
smaller by a factor of block size * avg # records in page (where
block size = # pages in an extent (2-256) )
– Can be kept completely in memory
– Insert into index only for first row in a block
– Delete index entry when last row in block deleted

• Dynamic bit map ANDing to select rows:


– Fetch or prefetch relevant blocks only
– Index ANDing can combine multiple block and/or RID indexes


Figure 9-11. Block Index characteristics CL4636.0

Notes:
In an MDC table, the rows are sorted and grouped by dimension values.
Each Dimension has its own index. The Block Indexes are small because they point to
blocks of rows (not RIDs), smaller by a factor of block size * avg # records in page (where
block size = # pages in an extent (2-256 pages) ).
These smaller block indexes can be kept completely in memory.
The block index is only updated when the first row is Inserted into a new block and does not
need to be updated as additional rows are added to the block. In the same way, the block
index entry is only deleted if the last row in a block is deleted.
DB2 can utilize various techniques when accessing MDC tables including:
• Dynamic bit map ANDing to select rows
• Fetch or prefetch relevant blocks only
• Index ANDing can combine multiple block and/or RID indexes
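
The block indexes an MDC table carries can be listed from the catalog. A hedged sketch
(schema and table names are hypothetical; on DB2 LUW, INDEXTYPE 'DIM' should identify
a dimension block index and 'BLOK' the composite block index, but verify the values on
your level):

   SELECT indname, indextype, colnames
     FROM syscat.indexes
    WHERE tabschema = 'SALES'
      AND tabname   = 'MDC1'
      AND indextype IN ('DIM', 'BLOK');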

Instructor notes:


Purpose — This presents some of the unique features of the block indexes on MDC
tables. DB2 can utilize the composite block index or any subset of the dimension block
indexes effectively.
Details —
Additional information —
Transition statement — Let's next look at an example of using multiple dimension block
indexes using dynamic index ANDing.


Block Index:
Dynamic bitmap ANDing – Query example 1

 Select Sum (handling_charge) from mdc1
 where date between 2013050 and 2013053
 and category = "toys"
 and store = 212

The slide shows three block indexes (on Date, Category, and Store) with the list of block
IDs for each qualifying key value. Dynamic bitmaps are built from the three BID lists and
ANDed together, leaving only blocks 15, 21, and 44 to be scanned.

Figure 9-12. Block Index: Dynamic bitmap ANDing – Query example 1 CL4636.0

Notes:
If an SQL query includes several predicates with AND conditions, the DB2 Optimizer might
select the access strategy of 'Dynamic bitmap Index ANDing', which involves reading
several indexes and creating dynamic bitmaps from the RIDs to reduce the list of pages
that need to be accessed. This access technique can also be used with the BIDs in MDC
Block Indexes.
The example query is:
Select Sum (handling_charge) from mdc1
where date between 2013050 and 2013053
and category = "toys"
and store = 212
The table has three dimension indexes on the date, category, and store columns. Any one
of these indexes could be used to retrieve a group of blocks that would contain all of the
necessary rows. By reading all three indexes and processing the resulting bitmaps in
memory, only those blocks that were pointed to by all three indexes need to be accessed.
This can reduce the I/Os required to retrieve the result set and improve performance.
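Whether the optimizer actually chose index ANDing can be confirmed from the access
plan. A minimal sketch using the standard explain tooling (the database name is
hypothetical, and the explain tables must already exist in the database):

   db2 set current explain mode explain
   db2 "select sum(handling_charge) from mdc1
        where date between 2013050 and 2013053
        and category = 'toys' and store = 212"
   db2 set current explain mode no
   db2exfmt -d sales -1 -o plan.txt

Look for an IXAND operator over the dimension block indexes in the formatted plan.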

Instructor notes:


Purpose — To give an example of using multiple block indexes with dynamic bitmap index
ANDing to improve the performance of a SQL query with several AND predicates.
Details —
Additional information —
Transition statement — Let's look at another example of using multiple block indexes to
process a query.


Query processing: Example 2


SELECT * FROM MDCTABLE WHERE COLOR='BLUE' AND NATION='USA'

• First, a dimension block index lookup is done

• Then, block ANDing

• Finally, a mini-relation scan of resulting blocks in the table is done

 Key 'Blue' from dimension block index on Color:  4,0 12,0 48,0 52,0 76,0 100,0 216,0
 Key 'USA' from dimension block index on Nation:  12,0 76,0 92,0 100,0 112,0 216,0 276,0
 Resulting BID list of blocks to scan:            12,0 76,0 100,0 216,0

Figure 9-13. Query processing: Example 2 CL4636.0

Notes:
This is another example of using multiple dimension block indexes to process a query.
The query text is:
SELECT * FROM MDCTABLE WHERE COLOR='BLUE' AND NATION='USA'
The COLOR dimension index would be accessed to get a list of the blocks for the key of
'BLUE'. Next the NATION dimension index would be read to get a list of the blocks with a
key of 'USA'. The resulting bitmaps can be ANDed to find the blocks that were common to
both keys. Finally, a mini-relation scan of resulting blocks in the table will be done.

Instructor notes:


Purpose — This provides another example of using multiple block indexes to process an
SQL query.
Details —
Additional information —
Transition statement — Let's look at an example of combining a block index and a
standard RID index to process a query.


Query processing: Example 3


• Say we have dimensions color, year, nation and a RID index on part #
SELECT * FROM MDCTABLE WHERE COLOR='BLUE' AND PARTNO < 1000

• First, a dimension block index lookup and a RID index lookup is done

• Then, block and RID ANDing

• The result is only those RIDs belonging to qualifying blocks

 Key 'Blue' from dimension block index on Color:  4,0 12,0 48,0 52,0 76,0 100,0 216,0
 RIDs from RID index on Part #:                   6,4 8,12 50,1 77,3 107,0 115,0 219,5 276,9
 Resulting RIDs to fetch:                         6,4 50,1 77,3 219,5

Figure 9-14. Query processing: Example 3 CL4636.0

Notes:
This is an example of combining a dimension block index on the COLOR column and a RID
index on the PARTNO column.
The query text is:
SELECT * FROM MDCTABLE WHERE COLOR='BLUE' AND PARTNO < 1000
The COLOR dimension index would be accessed to get a list of the blocks for the key of
'BLUE'. Next the PARTNO index would be read to get a list of the RIDs with a key less than
1000. In combining a mixture of block and row indexes, DB2 needs to interpret the block
IDs as containing all of the rows in the pages in that block. For example, the first BID for the
key of 'BLUE' is (4,0), which includes all of the rows in pages 4, 5, 6, and 7 (based on an
extent size of 4 pages). This includes the row found in the PARTNO index (6,4). The result
is a list of RIDs that are also contained in the blocks found in the COLOR block index.

Instructor notes:


Purpose — To provide an example of combining an MDC block index and a standard RID
index on the same table to produce a query result.
Details —
Additional information —
Transition statement — Next we will look at an example that shows a block index being
combined with a row index for a query containing an OR condition.


Query processing: Example 4


• Color, year, and nation dimensions plus a RID index on part #
SELECT * FROM MDCTABLE WHERE COLOR='BLUE' OR PARTNO < 1000

• First a dimension block index lookup and RID index lookup will be
done
• Then, block and RID ORing
• The result is all records in qualifying blocks, plus additional RIDs
outside of those
 Key 'Blue' from dimension block index on Color:  4,0 12,0 48,0 52,0 76,0 100,0 216,0
 RIDs from RID index on Part #:                   6,4 8,12 50,1 77,3 107,0 115,0 219,5 276,9
 Resulting blocks and RIDs to fetch:              blocks 4,0 12,0 48,0 52,0 76,0 100,0 216,0
                                                  plus RIDs 8,12 107,0 115,0 276,9

Figure 9-15. Query processing: Example 4 CL4636.0

Notes:
This is another example of combining a dimension block index on the COLOR column and
a RID index on the PARTNO column. This query uses an OR condition rather than the AND
condition.
The query text is:
SELECT * FROM MDCTABLE WHERE COLOR='BLUE' OR PARTNO < 1000
The COLOR dimension index would be accessed to get a list of the blocks for the key of
'BLUE'. Next the PARTNO index would be read to get a list of the RIDs with a key less than
1000. The index ORing function would be used to remove the rows that are duplicated in
the two lists. For example, the PARTNO RID index contained a pointer (6,4), but page 6,
row 4 is already covered by one of the BIDs from the COLOR index: block (4,0), which
contains all the rows in pages 4, 5, 6, and 7.

Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


Differences between MDCs and clustering indexes

MDC:
 • Data in table is always organized by MDC columns in data blocks (enforced clustering).
 • Data may be clustered by many dimensions.
 • Block indexes (one per dimension).
 • Reorg releases unused blocks or combines partially filled blocks.
 • To avoid excessive unnecessary storage allocation, dimension granularity should be
   more coarse with lower cardinality.
 • May increase the size of the table on disk considerably if an inappropriate dimension
   design is chosen.

Clustering Index:
 • Reduces but does not eliminate the need for reorg (non-enforced clustering). Clustering
   can be better maintained if the right amount of PCTFREE is defined.
 • Data clustering is unidimensional.
 • Row index (one clustering index per table).
 • Reorg orders the rows stored in a table based on the cluster index columns.
 • Cluster column(s) can be more granular than MDCs.
 • Does not affect the size of the table on disk unless you need to increase PCTFREE to
   maintain clustering.

Figure 9-16. Differences between MDCs and clustering indexes CL4636.0

Notes:
To compare the characteristics of MDC tables to a table with clustered index:
• The data in an MDC table is always organized by the defined dimension columns in
data blocks. A clustered index might reduce the need to reorganize a table, but if
sufficient free space is not available, newly inserted rows can be stored out of the
clustered sequence. The PCTFREE option, which has a default of 0%, can be used to
specify the free space for a table. The LOAD and REORG utilities build the data pages
based on the PCTFREE option.
• With a clustered index, the data is clustered by one index, which might reduce the
performance of the non-clustered indexes. A MDC table can have multiple dimensions
defined, so that any of the dimension indexes can perform well.
• The DB2 REORG utility can be used to recluster a table with a clustered index so that
the data rows match the sequence of the keys in the clustered index. A REORG
cannot specify an index for an MDC table. The REORG for an MDC table is only used to
release unused blocks or combine partially filled blocks to reduce the table size, and the
REORG must be run offline.

• In order to avoid excessive unnecessary storage allocation, the granularity of the
dimensions defined for an MDC table should be more coarse, with lower cardinality. If a
set of columns with unique values were selected as the dimensions for an MDC table,
there would only be one row per block. The granularity of a clustered index is much less
important; a table with a unique clustered index would not require additional storage.
• The careful selection of dimension columns and extent size is critical to controlling the
size of an MDC table. The size of a table with a clustered index might be affected by use
of the PCTFREE option for the table.
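
For contrast, a minimal sketch of the clustering-index alternative discussed above (table,
index, and column names are hypothetical); PCTFREE reserves free space so newly
inserted rows can stay close to their cluster position:

   -- One clustering index per table; clustering is not enforced.
   CREATE INDEX sales.hist_date_ix ON sales.history (sale_date) CLUSTER;
   -- Reserve 10% free space on data pages during LOAD/REORG.
   ALTER TABLE sales.history PCTFREE 10;
   -- Recluster the rows along the index after heavy insert activity.
   REORG TABLE sales.history INDEX sales.hist_date_ix;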


Instructor notes:
Purpose — To review the differences between an MDC table and a table with a clustered
index.
Details —
Additional information —
Transition statement — Let's look at the block map that is used to track the status of the
blocks in a MDC table.


The Block Map


• The block map is a MDC structure which stores the status of each block of
the table
• Free blocks can be easily found for use in a new or currently full cell
• MDC load can reuse free blocks rather than extending to new blocks:
– Load Status for blocks allow reads during Load
– Set Integrity Pending for blocks that need SET INTEGRITY after Load
– Refresh Status for blocks to REFRESH MQT after Load
• Table scans also use the block map to quickly access only extents currently
  containing data

The slide diagram shows one block map entry per extent in the table, for example:

 Extent: 0 1 2 3 4 5 6 7 ...
 Status: X F U U U F U F ...

 X = Reserved
 F = Free (no bits set)
 U = In use (data assigned to a cell, such as "East, 2012" or "North, 2013")

Figure 9-17. The Block Map CL4636.0

Notes:
The block map is an MDC structure which stores the status of each block of the table. When
all of the rows in a block are deleted, the block is changed to a free status. Free blocks can
easily be found for use by either a new cell or a currently full cell.
When the LOAD utility is used on an MDC table, it can reuse free blocks rather than
extending to new blocks. This is a big advantage for tables where data is rolled out and
then newer data is loaded. If an online LOAD is run, a Load Status for blocks allows the
existing rows to be read while bypassing the newly loaded rows until the load processing
completes. During the LOAD, a 'Set Integrity Pending' status will be set for the newly loaded
blocks so that the SET INTEGRITY can limit the checking to the new data. If the MDC table
is a source for a Materialized Query Table, the LOAD utility will set a Refresh Status for the
new blocks, so the REFRESH TABLE processing will be limited to new data.
If a query requires a table scan of an MDC table, the block map can be used to limit the scan
to the extents currently containing data, skipping the free blocks.
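
On DB2 9.7 and later, blocks that the block map shows as free can also be returned to the
table space without a classic offline reorganization. A hedged sketch (the table name is
hypothetical):

   -- Release free blocks back to the table space; RECLAIM EXTENTS
   -- runs online, unlike a classic MDC REORG.
   db2 "REORG TABLE sales.history RECLAIM EXTENTS"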


Instructor notes:
Purpose — This is used to explain the use of the block map for processing an MDC table,
including the various status indicators set by the LOAD utility.
Details —
Additional information —
Transition statement — Let's look at the processing for inserting new rows into an MDC
table.


Insert processing details: Existing block

• Step 1: Probe composite block index for cell values
  (for example, key Canada,2013,blue -> BIDs 52,0 80,0 172,0 332,0)
• Step 2: If cell exists, scan list of BIDs
• Step 3: For each BID, scan block for space, using block's FSCR
• Step 4: If space found on a page in the block, log and insert record
• Step 5: If RID indexes exist, log and insert key/RID in each

Figure 9-18. Insert processing details: Existing block CL4636.0

Notes:
Since rows in an MDC table must be stored in blocks based on the values of the dimension
columns specified during the CREATE TABLE, the processing of INSERT, UPDATE, and
DELETE statements differs from that of non-MDC tables.
For inserting a row into an existing block in an MDC table, the following steps are taken:
Step 1: Probe the composite block index for cell values of the new row.
Step 2: If a cell exists with the same values, get the list of BIDs.
Step 3: For each BID, scan the block for available space, using block's FSCR.
Step 4: If sufficient space is found on a page in the block, log and insert record.
Step 5: If RID index(es) exist, log and insert key/RID for each index.


Instructor notes:
Purpose — This shows the steps required to insert a row into an existing block.
Details —
Additional information —
Transition statement — Let's look at the steps necessary when a new block needs to be
created to process an insert.


Insert processing details: New block

• Step 1: Probe composite block index for cell values
• Step 2: If cell does not exist, or no space found in existing blocks, scan block map for
  free block
• Step 3: If free block found, log and change block status to in-use
  (block map example: extent statuses X F U U U F U F ... become X U U U U F U F ...)
• Step 4: Log and add new key/BID to each block index (dims and composite)
• Step 5: Log and insert record into block
• Step 6: If RID indexes exist, log and insert key/RID in each

Figure 9-19. Insert processing details: New block CL4636.0

Notes:
If a row is inserted into an MDC table, a new block might need to be created to store that
data. For inserting a row into a new block in an MDC table, the following steps are taken.
Step 1: Probe the composite block index for cell values of the new row.
Step 2: If cell does not exist, or if no space is located during the search of the existing
blocks for the cell, then scan block map for free block.
Step 3: If a free block found, log and change the block status to in-use.
Step 4: Log and add new key/BID to each block index (composite and all dimension
indexes).
Step 5: Log and insert record into the new block.
Step 6: If RID index(es) exist, log and insert key/RID in each RID index.


Instructor notes:
Purpose — This shows the steps taken when the INSERT processing for a MDC table
requires a new block to be added to the table.
Details —
Additional information —
Transition statement — Let's take a look at the processing for deleting rows.


Delete processing details: Not empty block

• Step 1: Log and delete record from table (Note: block not emptied)
• Step 2: If RID indexes exist, log and delete key/RID from each

Figure 9-20. Delete processing details: Not empty block CL4636.0

Notes:
If a row is deleted from an MDC table and it is not the only row in that block, the following
steps are taken:
Step 1: Log and delete record from table.
Step 2: If RID index(es) exist, log and delete key/RID from each index.


Instructor notes:
Purpose — This shows the simplified processing for deleting a row from a block in a MDC
table when other rows are in the same block.
Details —
Additional information —
Transition statement — Let's look at the processing for a delete of the last row in a block.


Delete processing details: Empty block

• Step 1: Log and delete record from table (Note: block now empty)
• Step 2: Find corresponding block entry in the block map
• Step 3: Log and change block status to free
  (block map example: the emptied block's status changes from U to F)
• Step 4: Log and remove BID from each block index (dims and composite)
• Step 5: If RID indexes exist, log and remove RID from each

Figure 9-21. Delete processing details: Empty block CL4636.0

Notes:
If a row is deleted from an MDC table and, as a result, the block is now empty, the following
steps are taken:
Step 1: Log and delete record from table. The block is now empty.
Step 2: Find corresponding block entry in the block map.
Step 3: Log and change block status to free.
Step 4: Log and remove BID from each block index (composite and all dimension
indexes).
Step 5: If RID index(es) exist, log and remove RID from each index.


Instructor notes:
Purpose — This shows additional processing steps required to delete the last row from a
block in a MDC table.
Details —
Additional information —
Transition statement — Let's look at the processing for UPDATE SQL statements in a
MDC table.


Update processing for MDC tables


• Update of non-dimension values:
– In this case, update row in place as in regular tables
– If updating a variable length column and the record no longer fits on the page,
search for another page with enough space
– First search within the same block
– If space is not found in that block, use the insert algorithm to find another block
– There is no need to update the block indexes unless a new block is created

• Update of dimension values:


– Move the record to a different cell
– The update is converted into a delete then an insert of the changed record
– Block indexes need to be updated only if a block is emptied or a new block is
created as in regular insert


Figure 9-22. Update processing for MDC tables CL4636.0

Notes:
The processing for SQL UPDATE statements on an MDC table depends on whether the
update changes one of the columns defined as a dimension for the table.
If the UPDATE changes the value of non-dimension columns:
In this case, the row is updated in place as in regular tables. If updating a variable
length column and the record no longer fits on the page, DB2 must search for another
page with enough space. First DB2 searches within the same block. If space is not
found in that block, DB2 uses the insert algorithm to find another. There is no need to
update the block indexes unless a new block is created.
If the UPDATE changes the value of any of the dimension columns:
In this case, the record MUST move to a different cell. The UPDATE is thus converted
into a delete from the current cell and an insert of the changed record into a different
cell. The Block indexes need to be updated if a block is emptied or a new block is
needed to insert the changed row.
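
A short illustration of the two cases (table and column names hypothetical); because YEAR
is a dimension in this sketch, the second statement forces the row into a different cell, so
DB2 internally processes it as a delete plus an insert:

   -- Non-dimension column: the row is updated in place.
   UPDATE mdc.tab1 SET handling_charge = 9.95 WHERE order_id = 12345;
   -- Dimension column: the row moves to the cell for the new YEAR value.
   UPDATE mdc.tab1 SET year = 2014 WHERE order_id = 12345;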


Instructor notes:
Purpose — This shows the processing for SQL UPDATE statements is dependent on
whether the update changes one or more of the defined dimension columns.
Details —
Additional information —
Transition statement — Let's review how using a MDC table can reduce some of the
overhead associated with maintaining indexes on a table.


Reduced index maintenance and logging


• Block indexes need only be updated when the first record is
inserted in a block or the last record is deleted from a block

• Index maintenance and logging overhead is therefore reduced

• For every block index that would have otherwise been a RID
index, this overhead and logging is reduced enormously!

• The reduction is by a factor of cell cardinality


Figure 9-23. Reduced index maintenance and logging CL4636.0

Notes:
The maintenance of the block indexes for an MDC table is significantly reduced when
compared to RID indexes on the same columns.
The Block indexes need only be updated when the first record is inserted in a block or the
last record is deleted from a block rather than for every record. This reduces the processing
and I/Os necessary for the index maintenance and the logging of the index changes.
For every block index that would have otherwise been a RID index, this overhead and
logging is reduced enormously! The reduction is by a factor of cell cardinality. If 100 rows fit
in each block, then there could be a 99% reduction in this overhead.


Instructor notes:
Purpose — This is a summary of the advantages of using block indexes for the MDC
tables when compared to using standard row-level indexes.
Details —
Additional information —
Transition statement — Let's take a look at processing using LOAD for a MDC table.

9.2. Multidimensional Table Processing using Load

Instructor topic introduction


What students will do —
How students will do it —
What students will learn —
How this will help students on their job —


Load: Fast and efficient data roll-in

• Use DB2 LOAD Utility for fast data roll-in
  – Efficient algorithm organizes data along dimension lines
  – Less logging and fewer updates for block indexes versus RID indexes
• MDC load exhibits better space management:
  – Roll-in of a new slice can reuse freed blocks from previously emptied blocks
  – Load uses the block map to determine which blocks are free
• Offline or Online (ALLOW READ ACCESS)

The slide diagram shows the load flow: add generated columns to the user data, organize
the data while detecting dimension boundaries (the example groups rows into cells such as
(1997, Mexico) and (1998, Canada)), store the rows in free blocks where possible
(appending additional blocks as necessary), and add the indexes, including the composite
block index.

Figure 9-24. Load: Fast and efficient data roll-in CL4636.0

Notes:
The DB2 LOAD utility can be used for fast data roll-in for MDC tables. For MDC tables, the
LOAD utility can use the block map to reuse blocks that are marked as free after all of the
rows in those blocks have been deleted. In a non-MDC table, the LOAD utility cannot use
existing empty pages, so the table must be extended even if a large amount of space is
available in the table.
For example, a large table holds the last 12 months of sales data and each month the
oldest month's data is deleted and a new month's data is loaded. If the delete processing is
done first and the date column is defined as one of the MDC dimensions, then a large
number of blocks would become free. The LOAD can now reuse those blocks and reduce
the additional space required for the next month's data. In a non-MDC table, a REORG
could be used to reduce table size after deleting large numbers of rows. The REORG might
be unnecessary for an MDC table.
The block indexes for an MDC table will require less processing for index maintenance by
the LOAD utility. The LOAD for an MDC table can be run offline, or it can allow reads of
existing data to continue during a LOAD in INSERT mode.
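
A minimal sketch of this roll-in pattern (file, table, and column names are hypothetical):

   -- Roll out the oldest month; with a date-based dimension this
   -- frees whole blocks.
   db2 "DELETE FROM sales.history WHERE sale_month = 201401"
   -- Roll in the new month; LOAD can reuse the freed blocks, and
   -- ALLOW READ ACCESS lets readers see the existing data meanwhile.
   db2 "LOAD FROM sales_201501.del OF DEL
        INSERT INTO sales.history ALLOW READ ACCESS"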

Instructor notes:


Purpose — This describes some differences in LOAD processing for MDC tables compared
to regular tables.
Details —
Additional information —
Transition statement — Let's take a closer look at some of the LOAD processing for a
MDC table.


Load processing for MDC tables


• Load examines the incoming data stream using a moving window
• It tries to cluster a window's portion of data at a time in a clustering buffer in
memory
• Data in that section is separated into blocks according to cell value
• All cell values found are stored in the cell table, which is stored in temp space
• If a block is filled, it is written directly to disk
• Any partially filled blocks are stored in an in-memory dynamic cache, to be filled
if/when further data for the same cell is found in the data stream
• If the cache fills up, it spills directly to the DB container (where it will eventually
reside anyway)

The slide diagram shows the data flow: the incoming data stream passes through the
in-memory clustering buffer; full blocks (with their index entries) are written directly to the
database containers, partially filled blocks are held in the dynamic partial block/page
cache, and the cell table is kept in temporary space.

Figure 9-25. Load processing for MDC tables CL4636.0

Notes:
When loading data into a non-MDC table, the LOAD utility just creates new extents with the
input data and appends those new extents to the table.
For MDC tables, the LOAD needs to store the input rows into extents based on the defined
Dimension column values. The Load utility examines the incoming data stream using a
moving window. The LOAD tries to fill a block with data before writing it to disk. It tries to
cluster a window's portion of data at a time in a clustering buffer in memory. Data in that
section is separated into blocks according to cell value. All cell values found are stored in
the cell table, which is stored in temp space. If a block is filled, it is written directly to disk.
Any partially filled blocks are stored in an in-memory dynamic cache, to be filled when and
if further data for the same cell is found in the data stream. If the cache fills up, it spills
directly to the table space container disk, where it will eventually reside anyway.

Instructor notes:


Purpose — This explains the additional processing that occurs in memory for the LOAD
utility when loading data into an MDC table. This shows that for an unsorted input file,
additional memory will be required for efficient load processing. This will be addressed in
the next graphic.
Details —
Additional information —
Transition statement — Let's look at some of the considerations for tuning the
performance of the LOAD utility with an MDC table.


Performance and tuning: MDC load


• Input data can be sorted or unsorted
• If unsorted:
– Increase UTIL_HEAP size: This will affect all Loads in the system.
– Increase DATA BUFFER: This will only affect one Load job. When DATA
BUFFER is specified, one must make sure the UTIL_HEAP size is set
large enough to accommodate multiple concurrent Load jobs.
• Avoid cell table spills
 – Increase the temp space buffer pool to accommodate the cell table
• Size of dimension key and number of distinct cells
• Be aware that block map updates are logged by load (performance
impact is minor)
• Tips!
– Load from cursor is usually faster than insert with subselect
• DPF insert from subselect can perform better than load from
cursor due to collocation
– Load always starts at a block boundary, so best used for data belonging
to new cells (or initial table population)

Figure 9-26. Performance and tuning: MDC load CL4636.0

Notes:
If the input used for loading an MDC table is sorted by the dimension column values, the
LOAD can process that data into the MDC efficiently. If the input data is not sorted,
additional memory can be used to reduce the overhead of sorting the data into blocks.
Since most of the memory used by the LOAD processing comes from the utility heap, it
might be necessary to increase the value for UTIL_HEAP_SZ in the database configuration
file to allocate a large utility heap. The LOAD utility will use approximately 25% of the
available utility heap memory for each LOAD. The Load utility DATA BUFFER option can
be used to specify a specific amount of memory to be used for one LOAD utility.
If multiple concurrent Load jobs are run, it might be necessary to increase the size of the
utility heap for the database.
In general, using the Load utility to load from a declared cursor is faster than using a SQL
INSERT with a subselect. In a DB2 partitioned database, there are some cases where the
SQL INSERT from a subselect can perform better than the cursor LOAD if the insert can
take advantage of collocation.

Since the Load allocates new data extents (blocks) rather than filling existing blocks that
might have some space available, the LOAD utility should be used for data belonging to
new cells, or for performing a LOAD in REPLACE mode.
MDC load operations will always have a build phase since all MDC tables have block
indexes.
The block map updates are logged by the Load utility but the performance impact should
be small. During the Load phase, extra logging for the maintenance of the block map will be
performed. There are approximately two extra log records per extent allocated. To ensure
good performance, the LOGBUFSZ database configuration parameter should be set to a
value that takes this into account.
A system temporary table with an index is used to load data into MDC tables. The size of
the table is proportional to the number of distinct cells loaded. The size of each row in the
table is proportional to the size of the MDC dimension key. To minimize disk I/O caused by
the manipulation of this table during a load operation, ensure that the buffer pool for the
temporary table space is large enough.
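
A hedged sketch of the tuning knobs described above (database name, file name, and
sizes are illustrative only; both sizes are in 4 KB pages):

   -- Enlarge the utility heap so concurrent loads have enough memory.
   db2 "UPDATE DB CFG FOR sales USING UTIL_HEAP_SZ 131072"
   -- Or give one load its own explicitly sized buffer.
   db2 "LOAD FROM sales_201501.del OF DEL
        INSERT INTO sales.history
        DATA BUFFER 32768"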


Instructor notes:
Purpose — This explains some of the tuning options that can be used to improve LOAD
processing for MDC tables.
Details —
Additional information —
Transition statement — Let's take a look at the locking performed for MDC tables.


MDC locking

• In MDC tables, table level locking, block level locking, or row level locking can be done

  (The slide diagram contrasts the three granularities on a table dimensioned by Region
  (East, North, South, West) and Year (97-00): one lock covering the whole table, one lock
  per block, or one lock per row.)

• This allows more locking efficiency, fewer lock escalations, and provides the same
  concurrency as regular tables while supporting all isolation semantics

Figure 9-27. MDC locking CL4636.0

Notes:
In MDC tables, table-level locking, block-level locking, or row-level locking can be done.
This allows more locking efficiency, fewer lock escalations, and provides the same
concurrency as regular tables while supporting all isolation semantics.
For example, if there is an MDC table defined with the YEAR column as one of the
dimensions, and an UPDATE statement is processed with a predicate of YEAR = '2012', it
would be possible to acquire exclusive locks at the block level rather than needing one lock
for each row updated. This could reduce the demand for memory in the locklist and
possibly avoid a lock escalation.
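
One way to observe these block locks at run time, assuming a DB2 10.1 or later monitoring
interface (a hedged sketch; verify the MON_GET_LOCKS table function and its column
names on your release):

   -- Summarize current locks by object type and mode; MDC block locks
   -- should appear as a distinct lock object type alongside table and
   -- row locks (verify the exact type value on your release).
   SELECT lock_object_type, lock_mode, COUNT(*) AS locks
     FROM TABLE(MON_GET_LOCKS(NULL, -2))
    GROUP BY lock_object_type, lock_mode;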


Instructor notes:
Purpose — This is used to introduce the use of block-level locks for MDC tables in addition
to the normal table and row-level locking.
Details —
Additional information —
Transition statement — Now let's take a look some examples of block-level locking for
different types of SQL processing.


Examples of MDC locking


• Insert:
– When adding a new block to a cell an X(Exclusive) lock on the block is acquired
– Once the block is added to the cell, the X lock is downgraded to IX to allow concurrent
inserts to the block until the UOW is committed
– Inserting into an existing block only requires an IX block lock

• Delete:
– When deleting ALL records from a block, only block locks (X and/or U locking) are
used rather than locking the rows
– If other data predicates indicate SOME records will be deleted, then an IX lock will be
used on the block and standard row locking will be used
– If the last record is deleted from a block, the IX block lock will be converted to an X lock
to remove the block from the cell

• Scans:
– For RR scans, all rows touched need to be locked
– In MDC tables, an S lock on the block is used to reduce the number of locks needed


Figure 9-28. Examples of MDC locking CL4636.0

Notes:
Here are some examples of the locking done for different types of SQL statements.
Insert: When adding a new block to a cell an X (Exclusive) lock on the block is acquired.
Once the block is added to the cell, the X lock is downgraded to IX (Intent Exclusive) to
allow concurrent inserts from other applications into the block until a commit releases the
block lock. If a row is inserted into an existing block, an IX block lock is acquired to allow
other rows to be inserted or updated.
Delete: If the predicates for a SQL DELETE indicate that ALL records will be deleted from
a block, then the exclusive block locks (X and/or U locking) are used rather than locking the
rows. This reduces the number of locks that are needed and saves locklist memory. If the
SQL DELETE includes other data predicates indicating that only SOME records will be
deleted, then an IX lock will be used on the block and standard row locking will be used. This
might allow other applications to read or change other rows in the block. If the last record is
deleted from a block, the IX block lock will be converted to an X lock to remove the block
from the cell.


When performing scans of an MDC table, if the isolation level is repeatable read (RR), then
all rows touched need to be locked. With an MDC table, an S lock for the block is used to
reduce the number of locks needed. This reduces demand for locklist memory and might
prevent a lock escalation to a table-level lock, which could reduce concurrency.

Instructor notes:


Purpose — This shows which block-level locks are acquired for accessing MDC tables.
Students using MDC tables will begin seeing the block locks in snapshot monitoring in
addition to the normal row and table locks.
Details —
Additional information —
Transition statement —


Using block locking for data roll-in

db2 alter table sales.history locksize blockinsert

• Might improve the performance of INSERT operations:


– Locks at the block level (X) and avoids row locks for insertions.
– Reduces the number of locks needed for large roll-in of data,
may avoid lock escalations.
– Useful for large insertions into cells by individual transactions.
– If there are multiple concurrent insertions to the same cell by
different transactions, each transaction will insert into separate
blocks to avoid lock waits. This could result in partially filled
blocks.

• Other operations (select, update, delete) perform normal


locking.

Figure 9-29. Using block locking for data roll-in CL4636.0

Notes:
For MDC tables, the selection of the BLOCKINSERT clause might improve the
performance of INSERT operations by locking at the block-level and avoiding row locks for
insertions. Row-level locking is still performed for all other operations and is performed on
key insertions to protect Repeatable Read (RR) scanners. The BLOCKINSERT option is
useful for large insertions into cells by individual transactions. None of the LOCKSIZE
choices prevent normal lock escalation.
You can specify BLOCKINSERT for the LOCKSIZE clause in order to use block-level
locking during INSERT operations only. When this is specified, row-level locking is
performed for all other operations, but only minimally for INSERT operations. That is,
block-level locking is used during the insertion of rows, but row-level locking is used for
next-key locking if RR scans are encountered in the indexes as they are being updated.
BLOCKINSERT locking might be beneficial in the following cases:
• There are multiple transactions doing mass insertions into separate cells.

• Concurrent insertions to the same cell by multiple transactions are not occurring, or they
are occurring with enough data inserted per cell by each of the transactions that the user is
not concerned that each transaction will insert into separate blocks.


Instructor notes:
Purpose — The use of BLOCKINSERT locking for data roll-in can reduce the lock storage
requirements and the overhead associated with row-level locking when large numbers of
rows are being inserted into an MDC table.
Details —
Additional information —
Transition statement — Next we will look at using the DB2_MDC_ROLLOUT registry
variable or setting CURRENT MDC ROLLOUT MODE to improve the performance of data
roll out for MDC tables.


MDC Rollout Delete performance options

Faster DELETE along cell or slice boundaries:
 – Immediate Index Cleanup Rollout
 – Deferred Index Cleanup Rollout

 DELETE FROM MDC.TAB1 WHERE NATION = 'Canada' AND YEAR = 2013 AND COLOR = 'yellow'

The slide diagram shows a cube with Nation, Color, and Year dimensions; the DELETE
qualifies the entire cell for (2013, Canada, yellow). Each cell contains one or more blocks.

Figure 9-30. MDC Rollout Delete performance options CL4636.0

Notes:
Immediate Index Cleanup
During a rollout deletion, the deleted records are not logged. Instead, the pages that
contain the records are made to look empty by reformatting parts of the pages. The
changes to the reformatted parts are logged, but the records themselves are not
logged.
The default behavior, immediate cleanup rollout, is to clean up RID indexes at delete
time. This mode can also be specified by setting the DB2_MDC_ROLLOUT registry
variable to IMMEDIATE, or by specifying IMMEDIATE on the SET CURRENT MDC
ROLLOUT MODE statement.
There is no change in the logging of index updates, compared to a standard delete
operation, so the performance improvement depends on how many RID indexes there
are. The fewer RID indexes, the better the improvement, as a percentage of the total
time and log space.
Deferred Index Cleanup

© Copyright IBM Corp. 2005, 2015 Unit 9. Multiple Dimension Clustering 9-69
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

Alternatively, you can have the RID indexes updated after the transaction commits,
using deferred cleanup rollout. This mode can also be specified by setting the
DB2_MDC_ROLLOUT registry variable to DEFER, or by specifying DEFERRED on the
SET CURRENT MDC ROLLOUT MODE statement. In a deferred rollout, RID indexes
are cleaned up asynchronously in the background after the delete commits. This
method of rollout can result in significantly faster deletion times for very large deletes, or
when a number of RID indexes exist on the table. The speed of the overall cleanup
operation is increased, because during a deferred index cleanup, the indexes are
cleaned up in parallel, whereas in an immediate index cleanup, each row in the index is
cleaned up one by one. Moreover, the transactional log space requirement for the
DELETE statement is significantly reduced, because the asynchronous index cleanup
logs the index updates by index page instead of by index key.

Instructor notes:


Purpose — This introduces the two options to improve the performance of large delete
processing in MDC tables.
Details —
Additional information —
Transition statement — Next we will discuss the methods to set MDC rollout options.


Enabling Rollout options


• Options for processing of SQL DELETE based on dimension
columns defined for MDC table, like deleting all rows for a date
range:
– Standard DELETE processing logs each row deleted, updates block and row
indexes and frees the space immediately
– IMMEDIATE CLEANUP processing logs delete for each page, updates all block
and row indexes and frees space when COMMIT is issued.
– DEFERRED CLEANUP processing logs delete for each page, updates the
block indexes only and starts an asynchronous process to update the row
indexes and free the space AFTER the commit is complete.
• DB2_MDC_ROLLOUT registry variable:
– 1, TRUE, ON, YES, IMMEDIATE (default)
– 0, FALSE, OFF, NO
– DEFER
• DELETE statement special register:
– SET CURRENT MDC ROLLOUT MODE IMMEDIATE
– SET CURRENT MDC ROLLOUT MODE NONE
– SET CURRENT MDC ROLLOUT MODE DEFERRED


Figure 9-31. Enabling Rollout options CL4636.0

Notes:
There are two ways to select the type of processing for MDC rollout deletes.
• You can set the DB2_MDC_ROLLOUT registry variable, which is dynamic, to DEFER,
IMMEDIATE or OFF.
• Alternatively, you can set the CURRENT MDC ROLLOUT MODE special register to
DEFERRED, IMMEDIATE or NONE.
A database monitor element, BLOCKS_PENDING_CLEANUP, allows you to determine
the number of MDC table blocks that are pending cleanup.
The default behavior for deletes that qualify for rollout is to perform an immediate index
cleanup. You can decide when a deferred index cleanup is needed. Because
DB2_MDC_ROLLOUT is dynamic, any new compilations of your DELETE statement use
the new setting. However, you will probably find the CURRENT MDC ROLLOUT MODE
special register a more precise way to control rollout behavior, since it applies at the
application connection level.
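
A minimal sketch of both settings (the table name comes from the earlier rollout slide):

   -- Instance-wide default via the registry variable:
   db2set DB2_MDC_ROLLOUT=DEFER
   -- Or per connection via the special register:
   db2 "SET CURRENT MDC ROLLOUT MODE DEFERRED"
   db2 "DELETE FROM mdc.tab1
        WHERE nation = 'Canada' AND year = 2013 AND color = 'yellow'"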

SET CURRENT MDC ROLLOUT MODE statement


The SET CURRENT MDC ROLLOUT MODE statement assigns a value to the CURRENT
MDC ROLLOUT MODE special register. The value specifies the type of rollout cleanup that
is to be performed on qualifying DELETE statements for multidimensional clustering (MDC)
tables.
Invocation
This statement can be embedded in an application program or issued through the use
of dynamic SQL statements. It is an executable statement that can be dynamically
prepared.
Authorization
None required.
Syntax
>-SET--CURRENT--MDC ROLLOUT MODE--+-NONE----------+-----------><
+-IMMEDIATE-----+
+-DEFERRED------+
'-host-variable-'
Description
NONE – Specifies that MDC rollout optimization during delete operations is not to be
used. The DELETE statement is processed in the same way as a DELETE statement
that does not qualify for rollout.
IMMEDIATE – Specifies that MDC rollout optimization is to be used if the DELETE
statement qualifies. If the table has RID indexes, the indexes are updated immediately
during delete processing. The deleted blocks are available for reuse after the
transaction commits.
DEFERRED – Specifies that MDC rollout optimization is to be used if the DELETE
statement qualifies. If the table has RID indexes, index updates are deferred until after
the transactions commits. With this option, delete processing is faster and uses less log
space, but the deleted blocks are not available for reuse until after the index updates
are complete.
host-variable – A variable of type VARCHAR. The length of host-variable must be less
than or equal to 17 bytes (SQLSTATE 42815). The value of the host variable must be a
left-justified string that is one of 'NONE', 'IMMEDIATE', or 'DEFERRED' (case
insensitive). If host-variable has an associated indicator variable, the value of that
indicator variable must not indicate a null value (SQLSTATE 42815).
Subsequent DELETE statements that are eligible for rollout processing respect the setting
of the CURRENT MDC ROLLOUT MODE special register. Currently executing sections are
not affected by a change to this special register.


The effects of executing the SET CURRENT MDC ROLLOUT MODE statement are not
rolled back if the unit of work in which the statement is executed is rolled back.

Example
Specify deferred cleanup behavior for the next DELETE statement that qualifies for rollout
processing.
 SET CURRENT MDC ROLLOUT MODE DEFERRED

DB2_MDC_ROLLOUT
Operating system: All
Default: IMMEDIATE, Values: IMMEDIATE, OFF, or DEFER
This variable enables a performance enhancement known as rollout for deletions from
MDC tables. Rollout is a faster way of deleting rows in an MDC table, when entire cells
(intersections of dimension values) are deleted in a search DELETE statement. The
benefits are reduced logging and more efficient processing.
There are three possible outcomes of the variable setting:
• No rollout – If OFF is specified.
• Immediate rollout – If IMMEDIATE is specified.
• Rollout with deferred index cleanup – If DEFER is specified.
If the value is changed after startup, any new compilations of a statement will respect the
new registry value setting.
For statements that are in the package cache, no change in delete processing will be made
until the statement is recompiled.
The SET CURRENT MDC ROLLOUT MODE statement overrides the value of
DB2_MDC_ROLLOUT at the application connection level.

Note

In DB2 Version 9.7 and later releases, deferred cleanup rollout is not supported on a
data partitioned MDC table with partitioned RID indexes. Only the NONE and
IMMEDIATE modes are supported. The cleanup rollout type will be IMMEDIATE if the
DB2_MDC_ROLLOUT registry variable is set to DEFER, or if the CURRENT MDC
ROLLOUT MODE special register is set to DEFERRED to override the
DB2_MDC_ROLLOUT setting.

Instructor notes:


Purpose — To provide information about the two methods for setting the mode for
processing rollout type deletes for MDC tables.
Details —
Additional information —
Transition statement — Next we will discuss the processing for immediate index cleanup
of MDC rollout deletes.


MDC Immediate Index Cleanup Rollout


• Performance improvements by avoiding per-row logging and
physical data deletion:
– Blocks are marked ROLLOUT
– Removes BID from block index
– Clears the slot directory on each page in the block
– Writes one log record per page (versus per row)

• Secondary indexes still updated synchronously:


– Must scan the rows (as usual) to update each index to remove keys
– Index logging is unchanged


Figure 9-32. MDC Immediate Index Cleanup Rollout CL4636.0

Notes:
For a rollout deletion, the deleted records are not logged. Instead, the pages that contain
the records are made to look empty by reformatting parts of the pages. The changes to the
reformatted parts are logged, but the records themselves are not logged.
The default behavior, immediate cleanup rollout, is to clean up RID indexes at delete time.
This mode can also be specified by setting the DB2_MDC_ROLLOUT registry variable to
IMMEDIATE or by specifying IMMEDIATE with the SET CURRENT MDC ROLLOUT
MODE statement. There is no change in the logging of index updates, as compared to a
standard delete, so the performance improvement depends on how many RID indexes
there are. The fewer RID indexes, the better the improvement is, as a percentage of the
total time and log space.


Instructor notes:


Purpose — This describes the IMMEDIATE option for MDC Rollout processing, which is
the default beginning with DB2 9.5.
Performance is improved by changing the way the logging is performed for the data rows.
The changes to any row indexes are performed and logged as part of the processing for
the DELETE.
Details —
Additional information —
Transition statement — Next we will discuss deferred index cleanup for MDC rollout
deletes.


MDC Deferred Index Cleanup Rollout


• Each qualifying DELETE statement:
– Marks each block INUSE and ROLLOUT
– Removes BID from block index
– Clears the slot directories of each page
– Writes one log record per page (versus per row)
– Initiates an asynchronous background process

• For each committed rolled out block, the background process:


– Marks each block INUSE, ROLLOUT, CLEANUP
– Updates RID indexes
– Frees the block and makes it available for reuse
– Reduces index maintenance time by cleaning the RID indexes in parallel
(one cleanup agent per index)
– Reduces logging by cleaning entire index pages: one log record per index page
versus one per RID removed


Figure 9-33. MDC Deferred Index Cleanup Rollout CL4636.0

Notes:
Using deferred cleanup rollout, the RID indexes can be updated after the transaction
performing the MDC rollout commits. This mode can be specified by setting the
DB2_MDC_ROLLOUT registry variable to DEFER or by specifying DEFERRED with the
SET CURRENT MDC ROLLOUT MODE statement. In a deferred rollout, RID indexes are
cleaned up asynchronously in the background after the commit of the delete.
This method of rollout can result in significantly faster deletion times for very large deletes
or when a number of RID indexes exist on a table. The speed of the overall cleanup
operation is increased because during a deferred index cleanup, the indexes are cleaned
up in parallel, whereas in an immediate index cleanup, each row in the index is cleaned up
one by one. As well, the transactional log space requirement for the DELETE statement is
significantly reduced because the asynchronous index cleanup logs the index updates by
index page instead of by index key.
Deferred cleanup rollout requires additional memory resources, which are taken from the
database heap. If DB2 is unable to allocate the memory structures it requires, the deferred
cleanup rollout fails and a message is written to the administrator log.


The DEFERRED mode for processing MDC rollouts provides the most benefits for
applications that need the DELETE processing to be as fast as possible. Asynchronous
index cleanup (AIC) is the deferred cleanup of indexes following operations that invalidate
index entries. Rollout AIC is invoked when a rollout delete is committed or, if the database
was shut down, when the table is first accessed following a restart of the database. While
AIC is in progress, any queries against the indexes work, including those accessing the
index being cleaned up.
There is one coordinating cleaner per MDC table. The index cleanup for multiple rollouts is
consolidated in the cleaner. The cleaner spawns a cleanup agent for each RID index, and
the cleanup agents update the RID indexes in parallel. Cleaners are also integrated with
the Utility Throttling Facility. By default, each cleaner has a utility impact priority of 50
(acceptable values are between 1 and 100, with 0 indicating no throttling). You can change
the priority by using the SET UTIL_IMPACT_PRIORITY command or the db2UtilityControl
API.
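For example, if LIST UTILITIES reports the cleaner as utility ID 1 (the ID value is
illustrative), its priority could be lowered from the default of 50:

LIST UTILITIES SHOW DETAIL
SET UTIL_IMPACT_PRIORITY FOR 1 TO 25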


Instructor notes:
Purpose — To describe the benefits of deferred processing of RID indexes for MDC
rollout. We will discuss the processing performed by asynchronous index cleanup.
Details —
Additional information —
Transition statement — Next we will look at the results of some tests that show the impact
on database logging based on different MDC ROLLOUT options.


MDC rollout log space usage examples


11 million rows (134260 pages), 16K page, 16 extent size, 4 nodes, 8
RID indexes
Log space usage (in percentage) for different rollout options

[Bar chart: compares the log space used by a standard delete (no rollout), immediate
rollout, and deferred rollout for deletions of 0.3%, 1.5%, 3.0%, 30.0%, and 97.0% of the
table. The underlying values appear in the notes below.]

Figure 9-34. MDC rollout log space usage examples CL4636.0

Notes:
The visual shows the effects on log space used for the three different modes of processing
MDC rollouts depending on the percentage of the table’s data that was deleted.
Log space usage as a percentage of a standard delete:

% Deleted   Standard   Immediate   Deferred
0.3%        100.00%    46.43%      81.67%
1.5%        100.00%    48.84%      16.89%
3.0%        100.00%    53.67%      12.27%
30.0%       100.00%    65.21%      2.78%
97.0%       100.00%    52.21%      2.06%

Log space required in 4K pages:

% Deleted   Standard   Immediate   Deferred
0.3%        420        195         343
1.5%        2504       1223        423
3.0%        4565       2450        560
30.0%       32352      21097       899
97.0%       129191     67448       2663


Instructor notes:
Purpose — This shows the effects of using the different MDC rollout options on log space
required.
The DEFERRED option is most useful for larger deletes.
Details —
Additional information —
Transition statement —


Showing the background Index Cleanup process


• Query BLOCKS_PENDING_CLEANUP monitor element available in
ADMIN_GET_TAB_INFO table function or ADMINTABINFO view
• LIST UTILITIES SHOW DETAIL:
ID = 1
Type = MDC ROLLOUT INDEX CLEANUP
Database Name = SALESDB
Partition Number = 0
Description = TABLE: PROD.MDCSALES
State = Executing
Invocation Type = Automatic
Throttling:
Priority = 50
Progress Monitoring:            (each phase represents one RID index)
Estimated Percentage Complete = 33
Phase Number = 1
Description = PROD.IXSALES_STORE
Total Work = 3000 pages
Completed Work = 1520 pages
Start Time = 07/01/2013 16:08:47.434239


Figure 9-35. Showing the background Index Cleanup process CL4636.0

Notes:
Monitoring
Because the rolled-out blocks on an MDC table are not reusable until after the cleanup is
complete, it is useful to monitor the progress of a deferred index cleanup rollout. Use the
LIST UTILITIES monitor command to display a utility monitor entry for each index being
cleaned up.
You can also query the number of blocks in the table currently being cleaned up through
deferred index cleanup rollout (BLOCKS_PENDING_CLEANUP) by using the
SYSPROC.ADMIN_GET_TAB_INFO table function.
In the visual, the sample output for the LIST UTILITIES command shows MDC Rollout Index Cleanup
progress, as indicated by the number of pages in each index that have been cleaned up.
Each phase listed in the output represents one of the RID indexes being cleaned for the
table.
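A sketch of such a query, assuming the PROD.MDCSALES table shown in the visual:

SELECT TABNAME, BLOCKS_PENDING_CLEANUP
FROM TABLE(SYSPROC.ADMIN_GET_TAB_INFO('PROD','MDCSALES')) AS T

A nonzero BLOCKS_PENDING_CLEANUP value indicates rolled-out blocks that cannot be reused
until the asynchronous cleanup completes.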


Instructor notes:
Purpose — This shows the methods available to check the status of deferred index
cleanup for MDC tables.
Details —
Additional information —
Transition statement —


Which rollout should be used?


• Deferred cleanup rollout:
– Speed of delete
– MDC table with many RID indexes
– Limited transaction log space
– Large deletion on dimensional columns

• Immediate cleanup rollout:


– Need to reuse existing block after delete is committed
– Less impact for Index scan performance after delete
– Memory is constrained
– Small deletion on dimensional columns
– No RID indexes


Figure 9-36. Which rollout should be used? CL4636.0

Notes:

When to use deferred cleanup rollout


If delete performance is the most important factor to you, and there are RID indexes
defined on the table, use deferred cleanup rollout. Note that prior to index cleanup,
index-based scans of the rolled out blocks suffer a small performance penalty, depending
on the amount of rolled out data. Here are other issues to consider when deciding between
immediate index cleanup and deferred index cleanup:
• Size of delete: Choose deferred cleanup rollout for very large deletes. In cases where
dimensional delete statements are frequently issued on many small MDC tables, the
overhead to asynchronously clean index objects might outweigh the benefit of the time
saved during the delete.
• Number and type of indexes: If the table contains a number of RID indexes, which
require row-level processing, use deferred cleanup rollout.


• Block availability: If you want the block space freed by the delete statement to be
available immediately after the delete statement commits, use immediate cleanup
rollout.
• Log space: If log space is limited, use deferred cleanup rollout for large deletions.
• Memory constraints: Deferred cleanup rollout consumes additional database heap on
all tables which have deferred cleanup pending.


Instructor notes:


Purpose — To discuss some of the considerations for selecting a mode for processing
MDC rollouts.
Details —
Additional information —
Transition statement — Next we will look at some restrictions regarding deferred index
cleanup for MDC tables.


MDC Rollout restrictions


• ROLLOUT will not be used in the following circumstances:
– Non-dimension columns specified in the WHERE clause
– Data capture (replication) is enabled
– Per row delete triggers
– FETCH FIRST n ROWS
– Decomposed updates (delete and insert)
– Top level of positioned delete (‘WHERE CURRENT OF’) *
– Additional per row processing is required *
• Table is the parent in a referential integrity constraint
• Dependent MQTs with REFRESH IMMEDIATE
• SELECT FROM DELETE

* Lower levels of cascaded deletes are eligible for ROLLOUT:


– Example: A table with a FOREIGN KEY and ON DELETE cascade could be
rolled out if the foreign key was also a dimension column


Figure 9-37. MDC Rollout restrictions CL4636.0

Notes:
The conditions that need to be met to allow delete using the rollout processing options are:
• The DELETE statement is searched, not positioned (that is, does not use the "WHERE
CURRENT OF" clause).
• No WHERE clause (all rows are to be deleted) or the only conditions in the WHERE
clause are on dimensions.
• The table is not defined with the DATA CAPTURE CHANGES clause.
• The table is not the parent in a referential integrity relationship.
• The table does not have on delete triggers defined.
• The table is not used in any MQTs that are refreshed immediately.
• A cascaded delete operation might qualify for rollout, if its Foreign Key is a subset of its
table's dimension columns.
• The DELETE statement cannot appear in a SELECT statement executing against the
temporary table that identifies the set of affected rows prior to a triggering SQL


operation (specified by the OLD TABLE AS clause on the CREATE TRIGGER
statement).
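To make the first conditions concrete, assume a hypothetical table SALES organized by
(REGION, YEAR):

DELETE FROM SALES WHERE REGION = 'SW' AND YEAR = 2011 -- qualifies: dimension columns only
DELETE FROM SALES WHERE REGION = 'SW' AND CUST_ID = 7 -- does not qualify: CUST_ID is not a dimension

The second statement is processed as a standard delete with per-row logging.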


Instructor notes:
Purpose — To look at the conditions that determine whether a DELETE statement for a
MDC table can qualify for the special processing of a rollout delete.
Details —
Additional information —
Transition statement — Next we will look at the changes in methods to free unused space
in MDC tables.


Sparse MDC tables: Effects of large data rollout


DELETE FROM Sales WHERE Region = 'SW' AND Year = 2011

[Diagram: before the delete, the table object holds the cells (NW,2011), (SW,2011), and
(SW,2012); after the delete, only (NW,2011) and (SW,2012) remain, and the emptied blocks
stay allocated to the table.]

• These pages and storage are still assigned to the MDC table
• How can this storage be reused elsewhere in the table space?
– Prior to DB2 9.7, an offline table reorganization could be used to
free unused MDC blocks back to the table space, but access is
limited during reorganization


Figure 9-38. Sparse MDC tables: Effects of large data rollout CL4636.0

Notes:
When rows are deleted from a DB2 table, the space occupied by the deleted rows remains
allocated to the same table. For MDC tables, it is common to delete large numbers of rows
during a data rollout, which might delete all of the rows for some MDC cells, causing a set of
blocks or extents to become empty. These empty MDC blocks are available to be reused
for data added to the same table, but they cannot be used by any other object in the
same table space. Prior to DB2 9.7, an offline reclaiming reorganization could be used to
free those blocks back to the table space.


Instructor notes:
Purpose — To review the concept of rolling out a group of rows using DELETE statements,
where a predicate includes one or more dimensions of a MDC table. Prior to DB2 9.7, the
offline reorganization was needed to release empty blocks from the MDC table. The
REORG utility option for inplace reorganization is not supported for MDC tables.
Details —
Additional information —
Transition statement — Next we will look at the option in DB2 9.7 to release empty
extents from an MDC online.


Using REORG with RECLAIM EXTENTS ONLY

REORG TABLE Sales RECLAIM EXTENTS ONLY

[Diagram: the empty block from the rollout delete is removed from the table object, and its
extent is freed back to the table space, where it can be used by other tables. A second
panel, "What is going on at the Table Space Level?", shows the same REORG TABLE Sales
RECLAIM EXTENTS ONLY command freeing the extent at the table space level.]


Figure 9-39. Using REORG with RECLAIM EXTENTS ONLY CL4636.0

Notes:
MDC tables can be reorganized to reclaim extents that are not being used. Starting with
DB2 9.7, a complete offline table reorganization is no longer needed to reclaim the MDC
extents.
Both the REORG TABLE command and the db2Reorg API have a new reclaim extents
option. As part of this new method to reorganize MDC tables, you can also control the
access to the MDC table while the reclaim operation is taking place. Your choices include:
no access, read access, and write access (which is the default).
Reclaimed space from the MDC table can be used by other objects within the table space.
In previous releases, the free space could only be used by the MDC table.


Instructor notes:
Purpose — To show the option of the REORG utility to quickly release unused blocks or
extents from a MDC table.
Details —
Additional information —
Transition statement — Next we will discuss the RECLAIM EXTENTS ONLY option for
REORG in more detail.


REORG with RECLAIM EXTENTS ONLY

REORG TABLE <mdc table name> RECLAIM EXTENTS ONLY


[ ALLOW { WRITE | READ | NO } ACCESS ]

• Very fast!
– Less processing than a standard table reorg
• No copy of the table created, no copy phase, and so on
– Done in-place with no data movement, minimal logging
• Find the empty blocks in block map
• Mark them as unallocated in the MDC table’s block map
• Mark them as unallocated in the table space's space map pages (SMPs)


Figure 9-40. REORG with RECLAIM EXTENTS ONLY CL4636.0

Notes:
The REORG TABLE utility with the RECLAIM EXTENTS ONLY option releases the unused
extents in an MDC table very quickly, with full write access supported during the
processing. Unlike a standard table reorganization, it is not necessary
to copy the table data or move any rows. The processing locates the empty
blocks and marks them as unallocated. This makes those pages available for use by other
objects in the same table space. This would decrease the number of used pages at the
table space level and increase the number of free pages, but it would not change the
number of pages allocated to the table space. An ALTER TABLESPACE command could
be used to release unused pages from the table space.
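A minimal command sequence for the full cycle (the table and table space names are
illustrative, and the REDUCE MAX clause assumes a reclaimable-storage table space):

REORG TABLE MDC.HIST2 RECLAIM EXTENTS ONLY ALLOW WRITE ACCESS
ALTER TABLESPACE MDCTSP1 REDUCE MAX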


Instructor notes:
Purpose — To provide additional information about using RECLAIM EXTENT ONLY mode
of REORG for MDC tables.
Details —
Additional information —
Transition statement — Next we will see how a SQL query can be used to locate MDC
tables with reclaimable space.


Checking for MDC tables with reclaimable space


SELECT SUBSTR(tabname,1,10) AS tabname,
       SUBSTR(tabschema,1,10) AS tabschema,
       reclaimable_space AS reclaimable_space_KB
FROM SYSIBMADM.ADMINTABINFO
WHERE tabschema = 'MDC'

TABNAME TABSCHEMA RECLAIMABLE_SPACE_KB


---------- ---------- ---------------------
HIST1 MDC 0
HIST2 MDC 5248
HIST3 MDC 4784

3 record(s) selected.


Figure 9-41. Checking for MDC tables with reclaimable space CL4636.0

Notes:
The visual shows an example of a query using the SYSIBMADM.ADMINTABINFO view to
determine the amount of disk space that could be reclaimed from MDC tables.
The column RECLAIMABLE_SPACE_KB indicates the amount of disk space that can be
reclaimed by running the REORG command with the RECLAIM option. Disk space is
reported in kilobytes. For non-MDC tables, the value is zero.


Instructor notes:
Purpose — To show an example of a query that could be used by a database administrator
to determine which MDC table might need to be reorganized using the RECLAIM
EXTENTS ONLY option.
Details —
Additional information —
Transition statement — In the next section, we will discuss the table design
considerations creating MDC tables.


MDC design considerations


• MDC does not make every query faster and can be applied
incorrectly

• Key performance parameters for MDC:


– Extent Size (also known as Block Size)
– Page Size
– MDC Dimensions
• Incorrect dimension selections could lead to:
– Underutilization of space
– Degraded query performance

• Granularity of MDC Dimensions

• Number of MDC Dimensions


Figure 9-42. MDC design considerations CL4636.0

Notes:
It is important to understand that the use of multidimensional clustering does not improve
the performance for every query. Careful planning is necessary to make sure that the
design of the MDC matches the requirements for the application.
Some of the key performance parameters for setting up MDC tables are:
• Extent Size (also known as Block Size)
• Page Size
• MDC Dimensions — Incorrect dimension selections could lead to:
- Under utilization of space
- Degraded query performance
• Granularity of MDC Dimensions
• Number of MDC Dimensions


Instructor notes:
Purpose — This should be used to introduce the most important considerations for
implementation of a MDC table that performs well and makes efficient use of disk space.
Details —
Additional information —
Transition statement — Let's first take a look at the considerations for choosing the
dimension columns.


Considerations for Dimension selection


When choosing dimensions for a table, consider:
– First, which queries will benefit from block-level clustering:
• Columns in equality or range queries
• Columns with coarse granularity
• Foreign key columns in fact tables
– Second, the expected density of cells based on expected data
• # possible cells = Cartesian product of dimension cardinalities
• Possibility of sparsely populated blocks/cells
– There are three factors to manipulate:
• The extent size - Reduce it if there are many sparse cells
• Number of dimensions
• Rollup of a dimension to a larger granularity with generated columns


Figure 9-43. Considerations for Dimension selection CL4636.0

Notes:
When choosing dimensions for a table, consider:
First, which queries will benefit from block-level clustering.
Queries with columns in equality or range queries:
SELECT .... WHERE Nation = 'Mexico'

SELECT ..... WHERE DATE > '2005-10-15'


Queries on columns with coarse granularity:
SELECT ....... WHERE JOBTITLE = 'SALES'
Rather than:
SELECT ......... WHERE EMPNO = 100200
Queries that retrieve results using the foreign key columns in large fact tables.


Second, the expected density of cells based on expected data.


The maximum number of possible cells will be the Cartesian product of dimension
cardinalities.
There is a possibility of sparsely populated blocks/cells that will increase the table size.
There are three primary factors to manipulate:
1. The MDC can be defined in a table space with a smaller extent size if there are many
sparse cells.
2. The number of dimensions specified in the ORGANIZE BY clause for the MDC table.
3. If there are too many distinct values for one dimension, a generated column could be
defined to rollup the dimension to a larger granularity.


Instructor notes:


Purpose — This shows some of the criteria for selecting the dimension columns for a MDC
table. There is a reference to using generated columns as a MDC dimension. This will be
presented in more detail in the following graphics.
Details —
Additional information —
Transition statement — Let's take a look at using a generated column value as a
dimension for a MDC table.


MDC Dimension on a generated column


• A dimension can be created on a GENERATED column, which is a
column built from an expression on a different column in the table

• Example:
CREATE TABLE MDCTABLE
( Date DATE,
Nation CHAR(25),
Color VARCHAR(10),
Month generated always as (INTEGER(Date)/100),
... )
ORGANIZE BY( Month, Color )

• This MDC table will have two dimension block indexes:


– One on Month and another on Color, plus a composite block index on (Month,
Color)
– This provides a very powerful and flexible way to organize and cluster
data on expressions


Figure 9-44. MDC Dimension on a generated column CL4636.0

Notes:
A dimension can be created on a GENERATED column, which is a column built from an
expression on a different column in the table:
Example:
CREATE TABLE MDCTABLE
( Date DATE,
Nation CHAR(25),
Color VARCHAR(10),
Month generated always as (INTEGER(Date)/100),
... )
ORGANIZE BY( Month, Color )
The generated column month will convert all of the dates for each month to a common
value. For example, the Dates of '2012-10-04' and '2012-10-30' will both have a value of
201210 in the Month column.


If there are not enough records created for each individual date value to efficiently fill the
blocks for the MDC table, a generated column might be used to group all of the records for
each month into a block.
This MDC table will have two dimension block indexes: One on the Color column and one
on the generated Month column, and a composite block index on (Month, Color).
This provides a very powerful and flexible way to organize and cluster data on expressions.
Generated columns can be made using an expression, from simple arithmetic expressions,
to built-in functions, to CASE statements.
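As a further sketch (the table name and banding thresholds are hypothetical), a CASE
expression can band a fine-grained column into a coarse dimension:

CREATE TABLE SALES_MDC
( SALE_DATE DATE,
  AMOUNT DECIMAL(9,2),
  AMT_BAND INTEGER GENERATED ALWAYS AS
    ( CASE WHEN AMOUNT < 100 THEN 0
           WHEN AMOUNT < 1000 THEN 1
           ELSE 2 END ) )
ORGANIZE BY ( AMT_BAND )

Whether range predicates on AMOUNT can exploit the block index depends on whether the
compiler can determine that the expression is monotonic, as discussed on the following pages.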


Instructor notes:
Purpose — This shows an example of using a generated column value as one of the
dimensions for a MDC table.
Details —
Additional information —
Transition statement — Let's take a look at how a MDC table with a generated column for
a dimension can be used to process queries containing range predicates.


MDC and generated columns: Integration


• Given an MDC table with dimension on generated column month,
where month = INTEGER(date)/100
• For queries on the dimension (month), block index range scans
can be used
• For queries on the base column (date), block index range scans
can also be done to narrow down which blocks to scan and then
apply the predicates on date to the rows in those blocks only
• The compiler generates the additional dimension predicates to use
– Example: For the query
select * from MDCTABLE where
date > '2012-03-03' and date < '2013-01-15'
The compiler generates the additional predicates
month>=201203 and month<=201301
which can be used as range predicates for a block index scan
• This gives a list of blocks to be scanned, and the original
predicates are applied to the rows in those blocks

Figure 9-45. MDC and generated columns: Integration CL4636.0

Notes:
In some cases, an existing table for already developed applications might be converted to a
MDC table using a generated column for a dimension. The application SQL statements
would refer to the original column name, not the generated column name.
If a MDC table is defined like the previous example, with a dimension on a generated
column month, where month = INTEGER(date)/100:
For queries on the dimension (month), block index range scans can be used.
For queries on the base column (date), block index range scans can also be done to
narrow down which blocks to scan and then apply the predicates on date to the rows in
those blocks only.
The compiler generates the additional dimension predicates to use.
Example: For the query:
select * from MDCTABLE where
date > '2012-03-03' and date < '2013-01-15'


The compiler generates the additional predicates:


month>=201203 and month<=201301
which can be used as range predicates for a block index scan.
This gives a list of blocks to be scanned, and the original predicates are applied to the rows
in those blocks.
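One way to confirm that the rewritten predicates produce a block index scan is to inspect
the access plan, for example with the db2expln tool (the database name is illustrative):

db2expln -d SALESDB -t -g -q "select * from MDCTABLE where date > '2012-03-03' and date < '2013-01-15'"

If the rewrite applies, the plan should show a scan of the dimension block index on Month
rather than a full relation scan.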


Instructor notes:


Purpose — This shows how the DB2 Optimizer can add predicates during the query
rewrite processing to include a reference to the generated dimension column that would
enable the block dimension index to be used to improve query performance.
Details —
Additional information —
Transition statement — Let's discuss the importance of monotonicity for using generated
columns as a MDC dimension.


The Importance of Monotonicity


• Range scans can only be done on derived predicates, as in the previous
example, when the expression used in the generated column definition is
MONOTONIC
if A > B then expr(A) >= expr(B) and if (A < B) then expr(A) <= expr(B)
• Monotonic expression: B = A/100

  A     B
  1     0
  103   1
  199   1
  250   2
  378   3

  The compiler determines that as A increases, B never decreases.

• Non-monotonic expression: B = month(date)

  date          B
  2011/03/03    03
  2011/05/17    05
  2011/12/25    12
  2012/02/01    02
  2012/05/24    05

  The compiler determines that as date increases, B can both increase and decrease.

• If the compiler cannot determine the monotonicity of an expression, or if it
determines that an expression is not monotonic, only equality predicates can
be used on the generated column (or dimension, if it is a dimension)


Figure 9-46. The Importance of Monotonicity CL4636.0

Notes:
Range queries on a generated column dimension require monotonic column functions.
Expressions must be monotonic to derive range predicates for dimensions on generated
columns. If you create a dimension on a generated column, queries on the base column
will be able to take advantage of the block index on the generated column to improve
performance, with one exception. For range queries on the base column (date, for
example) to use a range scan on the dimension block index, the expression used to
generate the column in the CREATE TABLE statement must be monotonic. Although a
column expression can include any valid expression (including user-defined functions
(UDFs)), if the expression is non-monotonic, only equality or IN predicates are able to use
the block index to satisfy the query when these predicates are on the base column.
DB2 will determine the monotonicity of an expression, where possible, when creating the
generated column for the table, or when creating a dimension from an expression in the
dimensions clause. Certain functions can be recognized as monotonicity-preserving, such
as DATENUM( ), DAYS( ), YEAR( ). Also, various mathematical expressions such as
division, multiplication, or addition of a column and a constant are monotonicity-preserving.


Where DB2 determines that an expression is not monotonicity-preserving, or if it cannot
determine this, the dimension will only support the use of equality predicates on its base
column.


Instructor notes:
Purpose — This shows examples of monotonic and non-monotonic expressions that might
be used for a generated column value. If the application that uses a MDC table that was
created with a generated column for one dimension uses range predicates on the base
column, then the monotonicity of the expression used for the generated column is a
significant performance issue.
Details —
Additional information —
Transition statement — Let's begin to take the process of designing the MDC table one
step at a time.


Step 1: Identify candidate dimensions


• Determine which queries will benefit from block-level clustering
– Access to subsets of large tables

• Examine potential workload for columns involved in the following:


– Columns in range, equality, or IN-list predicates
– Columns with coarse granularity;
– Roll-in and Roll-out of data
– Columns referenced in GROUP BY and/or ORDER BY clauses
– Foreign key columns or join clauses in fact table of star schema
database
– Combinations of the above

• Usually a workload will have several candidates - Rank them


Figure 9-47. Step 1: Identify candidate dimensions CL4636.0

Notes:
The first step in implementing a MDC table is selecting possible dimension columns. It will
be necessary to review the SQL used for the application, looking for queries that might
benefit from block-level clustering. One characteristic would be those that access relatively
large subsets of large tables. If the SQL only needs to read a few rows or if the table is
small, a MDC will probably not be a useful performance option. If applications have SQL
that are known to have performance problems, start with those.
Examine the potential workload for columns involved in the following:
• Columns in range, equality, or IN-list predicates
• Columns with coarse granularity
• Roll-in and Roll-out of data
• Columns referenced in GROUP BY and/or ORDER BY clauses
• Foreign key columns or join clauses in fact table of star schema database
• Combinations of the above
Usually a workload will have several candidates - Rank them.


There are many queries that can take advantage of multidimensional clustering.
Examples of such queries follow.
In these examples, assume that there is a MDC table t1 with dimensions c1, c2, and c3.
Example 1:
SELECT .... FROM t1 WHERE c3 < 5000
This query involves a range predicate on a single dimension, so it can be internally
rewritten to access the table using the dimension block index on c3. The index is scanned
for block identifiers (BIDs) of keys having values less than 5000, and a mini-relational scan
is applied to the resulting set of blocks to retrieve the actual records.
Example 2:
SELECT .... FROM t1 WHERE c2 IN (1,2037)
This query involves an IN predicate on a single dimension, and can trigger block
index-based scans. This query can be internally rewritten to access the table using the
dimension block index on c2. The index is scanned for BIDs of keys having values of 1 and
2037, and a mini-relational scan is applied to the resulting set of blocks to retrieve the
actual records.
Example 3:
SELECT ... FROM t1
WHERE c2 > 100 AND c1 = '16/03/1999' AND c3 > 1000 AND c3 < 5000
This query involves range predicates on c2 and c3 and an equality predicate on c1, along
with a logical AND operation. This can be internally rewritten to access the table on each of
the dimension block indexes:
• A scan of the c2 block index is done to find BIDs of keys having values greater than 100
• A scan of the c3 block index is done to find BIDs of keys having values between 1000
and 5000
• A scan of the c1 block index is done to find BIDs of keys having the value '16/03/1999'.
A logical AND operation is then done on the resulting BIDs from each block scan, to find
their intersection, and a mini-relational scan is applied to the resulting set of blocks to find
the actual records.


Instructor notes:


Purpose — This explains that the first step in implementing a MDC table will be selection
of the dimension columns. Careful analysis of the application SQL workload will be
necessary to make good choices.
Details —
Additional information —
Transition statement — The next step is to estimate the number of cells that will be in the
MDC table.


Step 2: Estimate number of cells per table


• Identify how many potential cells are possible in a table
organized along a set of candidate dimensions

• Find the number of unique combinations of the dimension


values that occur in the data:
– Exact: (if table exists)
WITH Cell_Table as
(SELECT DISTINCT dim_col1, dim_col2, ..., dim_colN
FROM Table)
SELECT COUNT (*) as Cell_Count FROM Cell_Table

• If statistics are available – To approximate, multiply column


cardinalities for the dimension candidates


Figure 9-48. Step 2: Estimate number of cells per table CL4636.0

Notes:
Identify how many potential cells are possible in a table organized along a set of candidate
dimensions. Determine the number of unique combinations of the dimension values that
occur in the data.
If the table exists, an exact number can be determined for the current data by simply
selecting the number of distinct values in each of the columns that will be dimensions for
the table.
The following general SQL can be used:
WITH Cell_Table as
( SELECT DISTINCT dim_col1, dim_col2, ..., dim_colN
FROM Table)
SELECT COUNT (*) as Cell_Count FROM Cell_Table
A specific example would be:
WITH Cell_Table as ( Select distinct branch_id,teller_id from mdc.hist1 )
SELECT COUNT (*) as Cell_Count FROM Cell_Table


Alternatively, an approximation can be determined if you only have the statistics for a table,
by multiplying the column cardinalities for the dimension candidates.
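A sketch of the statistics-based approximation for the MDC.HIST1 example, using the COLCARD
column of the SYSCAT.COLUMNS catalog view:

SELECT B.COLCARD * T.COLCARD AS MAX_CELLS
FROM SYSCAT.COLUMNS B, SYSCAT.COLUMNS T
WHERE B.TABSCHEMA = 'MDC' AND B.TABNAME = 'HIST1' AND B.COLNAME = 'BRANCH_ID'
AND T.TABSCHEMA = 'MDC' AND T.TABNAME = 'HIST1' AND T.COLNAME = 'TELLER_ID'

Because the product counts every possible combination, it overstates the true cell count
when the dimension values are correlated.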


Instructor notes:
Purpose — This shows two methods for estimating the number of cells that would be in a
MDC table. The method selected depends on whether a table with representative data
contents is available.
Details —
Additional information —
Transition statement — Let's look at generating cell density statistics when planning the
dimension columns for a MDC table.


Step 3: Cell density statistics


• RpC – Number of Rows per Cell
• RpB – Number of Rows per Block (extent)
RpB= Extentsize*(Pagesize-Overhead)/RowLength
WITH cell_table ( dim_col1, dim_col2,..., dim_colN, RpC) AS
(SELECT dim_col1, dim_col2,..., dim_colN, COUNT(*) RpC
FROM table GROUP BY dim_col1, dim_col2, ... , dim_colN)
SELECT
MIN (RpC) Min_RpC, MAX (RpC) Max_RpC, AVG (RpC) Average_RpC,
INTEGER(STDDEV(RpC)) as STANDARD_DEV_RpC,
COUNT(*) as Cell_Count, SUM(1+(RpC/RpB)) AS Num_Extents
FROM cell_table

MIN_RPC MAX_RPC AVERAGE_RPC STANDARD_DEV_RPC CELL_COUNT NUM_EXTENTS

----------- ----------- ----------- ---------------- ----------- -----------

933 1055 1000 29 100 103


Figure 9-49. Step 3: Cell density statistics CL4636.0

Notes:
The following methodology can be used to collect cell density statistics.
Where:
RpC – Number of Rows per Cell
RpB – Number of Rows per Block (extent) must be calculated as:
RpB= Extentsize*(Pagesize-Overhead)/RowLength
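As an arithmetic illustration (the page overhead and row length are assumed values), with a
16-page extent, 16 KB pages, roughly 100 bytes of page overhead, and 248-byte rows:

RpB = 16 * (16384 - 100) / 248, approximately 1050 rows per block

which is in line with the RpB value of 1049 used in the specific example later in these notes.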
To analyze the cell density for a MDC, the following general SQL could be used:
WITH cell_table ( dim_col1, dim_col2,..., dim_colN, RpC) AS
(SELECT dim_col1, dim_col2,..., dim_colN, COUNT(*) RpC
FROM table GROUP BY dim_col1, dim_col2, ... , dim_colN)
SELECT
MIN (RpC) Min_RpC, MAX (RpC) Max_RpC, AVG (RpC) Average_RpC,
INTEGER(STDDEV(RpC)) as STANDARD_DEV_RpC,
COUNT(*) as Cell_Count, SUM(1+(RpC/RpB)) AS Num_Extents
FROM cell_table


A specific example follows:


WITH cell_table ( branch_id,teller_id, RpC) AS
(SELECT branch_id, teller_id, COUNT(*) AS RpC
FROM MDC.HIST1
GROUP BY branch_id,teller_id)
SELECT
MIN (RpC) Min_RpC, MAX (RpC) Max_RpC, AVG (RpC) Average_RpC,
INTEGER(STDDEV(RpC)) as STANDARD_DEV_RpC,
COUNT(*) as Cell_Count, SUM(1+(RpC/1049)) AS Num_Extents
FROM cell_table
The output would be similar to the following:
MIN_RPC MAX_RPC AVERAGE_RPC STANDARD_DEV_RPC CELL_COUNT NUM_EXTENTS
----------- ----------- ----------- ---------------- ----------- -----------
933 1055 1000 29 100 103

1 record(s) selected.


Instructor notes:


Purpose — This shows a method getting cell density statistics for a MDC table.
Details —
Additional information —
Transition statement — Let's look at the considerations for implementing a MDC table in
a partitioned database.


Database partitioning and MDC (1 of 2)


• If the distribution key is the same as a dimension in the table, then on
each database partition you find a different set of slices of the table

• For example, if our MDC table is distributed by the color dimension and
hashed across two database partitions, we might find the following:
SELECT .... WHERE COLOR='YELLOW' SELECT .... WHERE COLOR='GREEN'

[Diagram: two database partitions, each showing Nation, Color, and Year dimensions.
Because the table is distributed on the Color dimension, each color's slices reside on a
single partition: partition 1 holds the yellow, red, and blue cells; partition 2 holds the
green cells. The COLOR='YELLOW' query is processed entirely on partition 1, and the
COLOR='GREEN' query entirely on partition 2.]


Figure 9-50. Database partitioning and MDC (1 of 2) CL4636.0

Notes:
If a MDC table is implemented in a DB2 partitioned database, the selection of the
distribution key columns and dimension columns can affect the number of records stored
per cell as blocks are allocated for data on a partition basis.
If the distribution key is the same as a dimension in the table, then on each database
partition you'll find a different set of slices of the table. Storing all of the records for each
slice in one database partition might improve the space utilization of the MDC blocks, but it
will limit the performance benefits of parallel processing across database partitions.
For example, if our MDC table is distributed by the color dimension and hashed across 2
database partitions, we would find that all of the records for each color are stored in a
single database partition. If a SQL query has a predicate of COLOR = 'YELLOW', the query
will be routed to database partition 1 for processing, while a query with a predicate of
COLOR = 'GREEN' is completely processed on database partition 2.


Instructor notes:


Purpose — This shows an example where a MDC table is implemented in a DB2
partitioned database and the distribution key includes one or more of the dimension
columns.
Details —
Additional information —
Transition statement — Let's look at an example where the partitioned MDC table is
distributed using non-dimension columns.


Database partitioning and MDC (2 of 2)


• If the distribution key is NOT the same as a dimension in the table, then
each database partition contains a subset of data from each slice

• For example, if our MDC table example is distributed using a


non-dimension column and hashed across two database partitions, we
might find the following:
SELECT .... WHERE COLOR='YELLOW'

[Diagram: the same two partitions, but the table is distributed on a non-dimension column.
Each partition holds a subset of every slice, so yellow and blue cells for each Nation and
Year appear on both partitions, and the COLOR='YELLOW' query is processed on both
partitions.]

Figure 9-51. Database partitioning and MDC (2 of 2) CL4636.0

Notes:
If the distribution key column of a MDC table is NOT the same as a dimension in the table,
then each database partition contains a subset of data from each slice.
In this case, when estimating the space occupancy and density in a DB2 partitioned
database environment, you need to consider the number of records per cell on average on
each database partition, not across the entire table.
For example, if our example MDC table is distributed using a non-dimension column and
hashed across 2 database partitions, we would find that some of the records for each color
are stored on each database partition. If a SQL query has a predicate of COLOR =
'YELLOW', the query will be routed to database partitions 1 and 2 for processing.
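A sketch of this layout (the table and column names are illustrative):

CREATE TABLE SALES
( TRANS_ID BIGINT NOT NULL,
  NATION CHAR(25),
  COLOR VARCHAR(10),
  YEAR SMALLINT )
DISTRIBUTE BY HASH ( TRANS_ID )
ORGANIZE BY ( COLOR, NATION, YEAR )

Hashing on TRANS_ID spreads each (Color, Nation, Year) cell across both partitions, so the
per-partition rows per cell, roughly half the table-wide figure here, is what determines
block utilization.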


Instructor notes:


Purpose — This shows an example where a MDC table is implemented in a DB2
partitioned database and the distribution key does NOT include one of the dimension
columns.
Details —
Additional information —
Transition statement — Let's summarize the main MDC table options that would be used
to tune MDC efficiency.


MDC tuning summary


• Design factors for optimal cell density:
– Varying the number of dimensions
– Varying the granularity of one or more dimensions with
generated columns
– Varying the block (extent) size and page size of the table space

• Use Design Advisor to recommend possible MDC tables to


reduce query costs for an application workload:
– Graphical interface invoked from Control Center and Activity
Monitor
– db2advis command
db2advis -d musicdb -i workload.sql -m C -o advise1.out


Figure 9-52. MDC tuning summary CL4636.0

Notes:
In review, the primary design factors for optimal cell density:
- Varying the number of dimensions
- Varying the granularity of one or more dimensions with generated columns
- Varying the block (extent) size and page size of the table space
DB2 provides a Design Advisor that can be used to recommend possible MDC tables to
reduce query costs based on an application workload. The analysis for MDC tables can be
combined with recommendations for new indexes, partitioning key changes, and use of
Materialized Query Tables. The Design Advisor has two interfaces, a graphical interface
can be invoked from the Control Center or from within the Activity Monitor and a command
line interface using the db2advis command. The db2advis command has a mode (-m)
option of 'C', which requests the Design Advisor to consider implementation of MDC tables.
This option also looks for potential performance gains from adding a standard single
dimensional clustered index.


Instructor notes:


Purpose — This summarizes the design factors that can be used to tune MDC tables.
Details —
Additional information —
Transition statement — Let's look at the MDC Design Advisor.


MDC Design Advisor (1 of 2)


execution started at timestamp 2013-10-25-09.41.41.332156
found [4] SQL statements from the input file
Recommending Multi-Dimensional Clusterings...
total disk space needed for initial set [ 0.018] MB
total disk space constrained to [ 67.169] MB

Note: MDC selection in the DB2 Design Advisor requires the target database
to be populated with a data sample. This sample is used for estimating
the number and density of MDC cells in any MDC solution that the
Design Advisor will recommend. If your database is empty the
Design Advisor will not recommend MDC.

Prioritizing Multi-dimensional Clustering candidate tables...


Multi-dimensional Clustering candidate tables, in priority sequence:

Table 0: HIST1,
number of pages 10007,
block size 16
There are 1 candidate tables considered for Multi-dimensional Clustering conversion

Searching the multidimensional space for solutions for HIST1...

Percentage of search points visited...


0% 100

2 clustering dimensions in current solution


[6685.0000] timerons (without any recommendations)
[6187.2103] timerons (with current solution)
[7.45%] improvement


Figure 9-53. MDC Design Advisor (1 of 2) CL4636.0

Notes:
This shows an example of the db2advis command output when performing analysis of a
workload for implementation of MDC tables.
Some Design Advisor analysis, like index evaluation, is based on the table statistics and
Explain output. Since the proper selection of dimensions for a MDC table could vary
depending on the data in a table, the Design Advisor needs to examine the table's data.

Restrictions on MDC recommendations


• An existing table must be populated with sufficient data before the Design Advisor
considers MDC for the table. A minimum of twenty to thirty megabytes of data is
suggested. Tables that are smaller than 12 extents are excluded from consideration.
• MDC requirements for new MQTs will not be considered unless the sampling option, -r,
is used with the db2advis command.
• The Design Advisor does not make MDC suggestions for typed, temporary, or federated
tables.


• Sufficient storage space (approximately 1% of the table data for large tables) must be
available for the sampling data that is used during the execution of the db2advis
command.
• Tables that have not had statistics collected are excluded from consideration.
• The Design Advisor does not make suggestions for multicolumn dimensions.


Instructor notes:
Purpose — This shows an example of db2advis command output that includes analysis for
MDC tables.
Details —
Additional information —
Transition statement — Next we will look at the suggested MDC table definition.


MDC Design Advisor (2 of 2)


--
--
-- LIST OF MODIFIED CREATE-TABLE STATEMENTS WITH RECOMMENDED PARTITIONING KEYS
AND TABLESPACES AND/OR RECOMMENDED MULTI-DIMENSIONAL CLUSTERINGS
-- ===========================
-- table["MDC "."HIST1"], Added table size 3.816MB
-- DROP INDEX "MDC "."HIST1IX1";
-- CREATE INDEX "MDC "."HIST1IX1" ON "MDC "."HIST1" ("BRANCH_ID" ASC)
ALLOW REVERSE SCANS ;
-- CREATE TABLE "MDC "."HIST1" ( "ACCT_ID" INTEGER NOT NULL ,
-- "TELLER_ID" SMALLINT NOT NULL ,
-- "BRANCH_ID" SMALLINT NOT NULL ,
-- "BALANCE" DECIMAL(15,2) NOT NULL ,
-- "DELTA" DECIMAL(9,2) NOT NULL ,
-- "PID" INTEGER NOT NULL ,
-- "TSTMP" TIMESTAMP NOT NULL WITH DEFAULT ,
-- "ACCTNAME" CHAR(20 OCTETS) NOT NULL ,
-- "TEMP" CHAR(6 OCTETS) NOT NULL ,
-- MDC1310251341430 GENERATED ALWAYS AS ( (SMALLINT(TELLER_ID-(2))/(2))) )
-- IN "MDCTSP1"
-- ---- ORGANIZE BY ROW
-- ORGANIZE BY (
-- MDC1310251341430,
-- BRANCH_ID )
-- ;
-- COMMIT WORK ;


Figure 9-54. MDC Design Advisor (2 of 2) CL4636.0

Notes:
This shows the portion of the db2advis command output with the data definition language
(DDL) for a new MDC table. In the example, a generated column has been selected for one
of the two defined dimensions to store the table data efficiently.


The complete report output follows:


execution started at timestamp 2013-10-25-09.41.41.332156
found [4] SQL statements from the input file
Recommending Multi-Dimensional Clusterings...
total disk space needed for initial set [ 0.018] MB
total disk space constrained to [ 67.169] MB

Note: MDC selection in the DB2 Design Advisor requires the target database
to be populated with a data sample. This sample is used for estimating
the number and density of MDC cells in any MDC solution that the
Design Advisor will recommend. If your database is empty the
Design Advisor will not recommend MDC.

Prioritizing Multi-dimensional Clustering candidate tables...


Multi-dimensional Clustering candidate tables, in priority sequence:

Table 0: HIST1,
number of pages 10007,
block size 16
There are 1 candidate tables considered for Multi-dimensional Clustering conversion

Searching the multidimensional space for solutions for HIST1...

Percentage of search points visited...


0% ... 100%

2 clustering dimensions in current solution


[6685.0000] timerons (without any recommendations)
[6187.2103] timerons (with current solution)
[7.45%] improvement

--
--
-- LIST OF MODIFIED CREATE-TABLE STATEMENTS WITH RECOMMENDED PARTITIONING KEYS AND
TABLESPACES AND/OR RECOMMENDED MULTI-DIMENSIONAL CLUSTERINGS
-- ===========================
-- table["MDC "."HIST1"], Added table size 3.816MB
-- DROP INDEX "MDC "."HIST1IX1";
-- CREATE INDEX "MDC "."HIST1IX1" ON "MDC "."HIST1" ("BRANCH_ID" ASC) ALLOW
REVERSE SCANS ;
-- CREATE TABLE "MDC "."HIST1" ( "ACCT_ID" INTEGER NOT NULL ,
-- "TELLER_ID" SMALLINT NOT NULL ,
-- "BRANCH_ID" SMALLINT NOT NULL ,
-- "BALANCE" DECIMAL(15,2) NOT NULL ,
-- "DELTA" DECIMAL(9,2) NOT NULL ,
-- "PID" INTEGER NOT NULL ,
-- "TSTMP" TIMESTAMP NOT NULL WITH DEFAULT ,
-- "ACCTNAME" CHAR(20 OCTETS) NOT NULL ,
-- "TEMP" CHAR(6 OCTETS) NOT NULL ,
-- MDC1310251341430 GENERATED ALWAYS AS ( (SMALLINT(TELLER_ID-(2))/(2))) )
-- IN "MDCTSP1"
-- ---- ORGANIZE BY ROW

-- ORGANIZE BY (
-- MDC1310251341430,
-- BRANCH_ID )
-- ;
-- COMMIT WORK ;

-- No new partitioning keys or tablespaces are recommended for this workload.


Instructor notes:
Purpose — This shows a portion of the db2advis command output that suggested a new
MDC table.
Details —
Additional information —
Transition statement — Let's look at some examples that compare performance of a
MDC table to a table with standard indexes.


MDC performance: Example

[Diagram: star schema with the fact table Daily_Sales linked to the dimension tables
Customer (Custkey), Store (Storekey), Period (Perkey), Promotion (Promokey), and
Product (Prodkey) through the matching key columns in Daily_Sales.]

Figure 9-55. MDC performance: Example CL4636.0

Notes:
This visual shows the relationship between the fact table Daily_Sales and a group of
related tables: Customer, Store, Period, Promotion, and Product. The Daily_Sales table
contains a set of columns that are used as the keys for each of the other tables. For
example, the column Storekey in Daily_Sales is the key to the Store table.
In the following examples:
• A MDC table was created with the columns Storekey (Store Table key) and Perkey
(Period Table key) as the dimensions.
• RID indexes are created for the other columns:
- Custkey (Customer Table key)
- Promokey (Promotion Table key)
- Prodkey (Product Table key)
The performance of this table is compared to a non-MDC table with all RID indexes defined
on these five columns.
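For orientation, the MDC variant of the fact table might be defined as in the following
sketch; the column list is abbreviated, and the data types and index names are
assumptions:

create table daily_sales (
  custkey integer not null,
  storekey integer not null,
  perkey integer not null,
  promokey integer not null,
  prodkey integer not null,
  handling_charge decimal(15,2),
  quantity_sold integer )
  organize by dimensions (storekey, perkey);

create index ds_custkey on daily_sales (custkey);
create index ds_promokey on daily_sales (promokey);
create index ds_prodkey on daily_sales (prodkey);

The non-MDC comparison table omits the ORGANIZE BY DIMENSIONS clause and instead
defines RID indexes on all five key columns.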


Instructor notes:
Purpose — This introduces the set of tables that were used for the following performance
tests and explains which columns were selected as the dimensions for the MDC table.
Details —
Additional information —
Transition statement — Let's look at disk space required for the MDC and non-MDC
versions of these tables and indexes.


Example: Object size comparisons


• MDC storekey block index size 71 pages and 2 levels
• nonMDC storekey index size 222054 pages and 4 levels
• MDC perkey block index size 72 pages and 2 levels
• nonMDC perkey index size 222054 pages and 4 levels
• MDC prodkey index size 222086 pages and 4 levels
• nonMDC prodkey index size 222086 pages and 4 levels
• MDC daily sales table size 689264 pages
• nonMDC daily sales table size 681903 pages


Figure 9-56. Example: Object size comparisons CL4636.0

Notes:
The block indexes for storekey and perkey for the MDC table are much smaller than the RID
indexes required for the nonMDC table.
MDC storekey block index size 71 pages and 2 levels
nonMDC storekey index size 222054 pages and 4 levels

MDC perkey block index size 72 pages and 2 levels


nonMDC perkey index size 222054 pages and 4 levels
The RID indexes, like prodkey for the MDC table, are the same size as the RID indexes
required for the nonMDC table.
MDC prodkey index size 222086 pages and 4 levels
nonMDC prodkey index size 222086 pages and 4 levels
The size of the daily_sales table defined as MDC is slightly larger than the non-MDC table.
MDC daily sales table size 689264 pages
nonMDC daily sales table size 681903 pages
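The same kind of comparison can be made in any database by querying the catalog
statistics after a RUNSTATS; a minimal sketch, with the schema name assumed:

select indname, nleaf, nlevels
  from syscat.indexes
  where tabschema = 'MDC' and tabname = 'DAILY_SALES'

NLEAF reports the number of leaf pages and NLEVELS the number of index levels,
corresponding to the page and level counts listed above.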


Instructor notes:
Purpose — This shows that, in the example, the MDC table is slightly larger than the
nonMDC table, but the block indexes for the MDC table are much smaller.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that accesses one slice
of the MDC table.


Example: Point query on Block Index


• Select a multi-dimensional slice using a block index

select sum(handling_charges)
from daily_sales where storekey = 1

[Chart: elapsed time in seconds, MDC versus non-MDC; the MDC table runs about 50%
faster. Cube diagram highlights the Storekey = 1 slice across the Perkey dimension.]

Figure 9-57. Example: Point query on Block Index CL4636.0

Notes:
This query demonstrates the power of block prefetch coupled with using the block index to
select the records using a particular store number. This is a slice of the cube.
The query used was:
select sum(handling_charges)
from daily_sales where storekey = 1
For this query, the MDC table reduced the elapsed time by about 50%.
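To verify that the optimizer actually chose the block index for a query like this, the
access plan can be captured and formatted. A minimal sketch, assuming the explain
tables have already been created (for example, from the EXPLAIN.DDL file supplied in
sqllib/misc) and assuming the database name MDCDB:

explain plan for select sum(handling_charges)
  from daily_sales where storekey = 1

db2exfmt -d MDCDB -1 -o plan.txt

The formatted plan should show a scan of the dimension block index rather than a table
scan or a RID index scan.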


Instructor notes:
Purpose — This shows the performance advantage of the MDC table when one slice of
the table is returned for a SELECT statement.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that accesses a range of
keys from a block index of the MDC table.


Example: Range query on a Block Index


• Select all rows in a range of dates

select sum(handling_charge) from daily_sales
where perkey between 1996050 and 1996090

[Chart: elapsed time in seconds, MDC versus non-MDC; the MDC table runs about 28%
faster. Cube diagram highlights the Perkey range across the Storekey dimension.]

Figure 9-58. Example: Range query on a Block Index CL4636.0

Notes:
This shows the performance results for a query that used the block index on perkey to
access a range of dates.
The query used was:
select sum(handling_charge) from daily_sales
where perkey between 1996050 and 1996090
For this query, the MDC table reduced the elapsed time by about 28%.


Instructor notes:
Purpose — This shows the performance advantage of the MDC table when selecting a
range of keys from one of the MDC table dimensions.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that accesses a range of
keys using both block indexes of the MDC table.


Example: Range query on two dimensions


• Two dimensions qualify (store and date range)

select sum(handling_charge) from daily_sales
where perkey between 1996010 and 1996050
and storekey = 2

[Chart: elapsed time in seconds, MDC versus non-MDC; the MDC table runs about 45%
faster. Cube diagram highlights the intersection of the Storekey and Perkey ranges.]

Figure 9-59. Example: Range query on two dimensions CL4636.0

Notes:
This shows the performance results from a query that used both block indexes: perkey to
access a range of dates and storekey to access one store.
The query used was:
select sum(handling_charge) from daily_sales
where perkey between 1996010 and 1996050
and storekey = 2
For this query, the MDC table reduced the elapsed time by about 45%.


Instructor notes:
Purpose — This shows the performance advantage of the MDC table when selecting
records using both block indexes.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that performs a table
scan of the MDC table.


Example: Full Table Scan


• Scan entire table:
– Little to no advantage from MDC

select storekey, sum(handling_charge)
from daily_sales
group by storekey

[Chart: elapsed time in seconds, MDC versus non-MDC, essentially equal. Cube diagram
shows the entire Storekey by Perkey cube being scanned.]

Figure 9-60. Example: Full table scan CL4636.0

Notes:
This shows the performance results for a query that required a full table scan.
The query used was:
select storekey, sum(handling_charge)
from daily_sales
group by storekey
For this query, the MDC table elapsed time was about the same as the non-MDC table.


Instructor notes:
Purpose — This shows that the performance of the MDC table when performing a table
scan is similar to the non-MDC table.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that accesses one cell of
the MDC table using the composite index.


Example: Query on a Cell


• Scan a cell's worth of data

• Small benefit derived from smaller index tree


select sum(handling_charge) from daily_sales
where perkey = 1996030 and storekey = 1

[Chart: elapsed time in seconds, MDC versus non-MDC; the MDC table runs about 10%
faster. Cube diagram highlights a single cell at the intersection of the Storekey and
Perkey dimensions.]

Figure 9-61. Example: Query on a Cell CL4636.0

Notes:
This shows the performance results for a query that selects the records from one cell of
the MDC table. The smaller size of the block index provided some improvement.
The query used was:
select sum(handling_charge) from daily_sales
where perkey = 1996030 and storekey = 1
For this query, the MDC table reduced the elapsed time by about 10%.


Instructor notes:
Purpose — This shows the performance advantage of the MDC table when selecting one
cell from the MDC table.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that uses index ORing of
a block index and a RID index to produce a query result.


Example: Index ORing of Block and RID Indexes


• Result is all records in qualifying blocks,
plus additional RIDs outside of those blocks

• Index ORing can combine multiple block and/or RID indexes

select sum(quantity_sold) from daily_sales
where perkey between 1996052 and 1996054
or prodkey between 12000 and 12050

[Diagram: keys from the dimension block index and RIDs from the RID index are combined
into the resulting set of blocks and RIDs to fetch. Chart: the MDC table runs about
70% faster.]

Figure 9-62. Example: Index ORing of Block and RID Indexes CL4636.0

Notes:
This shows the performance results for a query that used the block index on perkey and a
RID index on prodkey with an OR predicate.
The query used was:
select sum(quantity_sold) from daily_sales
where perkey between 1996052 and 1996054
or prodkey between 12000 and 12050
For this query, the MDC table reduced the elapsed time by about 70%.


Instructor notes:
Purpose — This shows a performance advantage of the MDC table when combining a
block index with a RID index for an OR condition.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that uses one of the RID
indexes of the MDC table.


Example: Point query on Promotion RID Index

• Predicate on 1 key value

select sum(handling_charges) from daily_sales
where promokey = 2

[Diagram: a RID index on Promokey pointing into the Daily_Sales table, whose key
columns are Prodkey, Storekey, Promokey, Custkey, and Perkey. Chart: elapsed time in
seconds, MDC versus non-MDC, nearly identical.]

Figure 9-63. Example: Point query on promotion RID Index CL4636.0

Notes:
This shows the performance results for a query that used the RID index on promokey to
produce the query result.
The query used was:
select sum(handling_charges) from daily_sales
where promokey = 2
For this query, the MDC table increased the elapsed time by about 1%.


Instructor notes:
Purpose — This shows an example where a query of the MDC table using a RID index
performed very close to the same query on the non-MDC table.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that performs a nested loop
join using a RID index of the MDC table.


Example: Nested loop join with RID Index


select sum(quantity_sold),sum(shelf_cost),count(*)
from product, daily_sales
where product.prodkey=daily_sales.prodkey
and product.category=42
[Diagram: the Product table joined to Daily_Sales on Prodkey through a RID index.
Chart: elapsed time in seconds, MDC versus non-MDC, nearly identical.]

Figure 9-64. Example: Nested loop join with RID Index CL4636.0

Notes:
This shows the performance results for a query that performs a nested loop join using a
RID index on the MDC table.
The query used was:
select sum(quantity_sold),sum(shelf_cost),count(*)
from product, daily_sales
where product.prodkey=daily_sales.prodkey
and product.category=42
For this query, the MDC table increased the elapsed time by about 2%.


Instructor notes:
Purpose — This shows an example where a query of the MDC table using a RID index for a
nested loop join performed very close to the same query on the non-MDC table.
Details —
Additional information —
Transition statement — Let's look at the performance of a query that performs a nested loop
join using a Block index of the MDC table.


Example: Nested loop join with Block Index


select sum(quantity_sold),sum(shelf_cost),count(*)
from store, daily_sales
where store.storekey=daily_sales.storekey
and store_number='10'
[Diagram: the Store table joined to Daily_Sales on Storekey through the block index.
Chart: elapsed time in seconds; the MDC table runs about 70% faster.]

Figure 9-65. Example: Nested loop join with Block Index CL4636.0

Notes:
This shows the performance results for a query that performs a nested loop join using a
Block index on the MDC table.
The query used was:
select sum(quantity_sold),sum(shelf_cost),count(*)
from store, daily_sales
where store.storekey=daily_sales.storekey
and store_number='10'
For this query, the MDC table reduced the elapsed time by about 70%.


Instructor notes:
Purpose — This shows a performance advantage of the MDC table when using a block
index to perform a nested loop join.
Details —
Additional information —
Transition statement — Let's summarize this unit.


Unit summary
Having completed this unit, you should be able to:
• Compare the features and performance advantages of multidimensional clustering
(MDC) to single-dimensional clustering
• Define the concepts of MDC tables, including cell, slice, and dimension
• Describe the characteristics of the block indexes used for MDC tables including the
index maintenance performed for SQL INSERT, DELETE, and UPDATEs
• Explain how the block and row indexes can be combined to efficiently process SQL
statements
• Utilize the LOAD Utility to roll-in new data into a MDC table
• Select options for efficient data roll-out and roll-in
• Analyze the effects on table space size of
selecting alternative dimensions and extent sizes


Figure 9-66. Unit summary CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement — End of unit.


Student exercise 8


Figure 9-67. Student Exercise 8 CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —


Unit 10. Advanced Data Movement

Estimated time
02:30

What this unit is about


This unit describes the various types of data movement requirements
and the DB2 provided tools and techniques for moving data. Students
will learn ways to move data between tables. The topics include using
the DB2 utilities LOAD, INGEST and DB2MOVE. The performance
characteristics of the LOAD utility and the effects of LOAD options and
database configuration options on load performance will be discussed.
Examples of INGEST processing options will be presented as well as
the monitoring and restart capabilities of this utility. We will also
discuss using the ADMIN_MOVE_TABLE procedure to make table
changes online.

What you should be able to do


After completing this unit, you should be able to:
• Configure the LOAD utility options to optimize the performance of
loading data into DB2 tables
• Describe the conditions that would impact selection of the INGEST
utility rather than using LOAD
• Set the options for the INGEST utility and monitor ingest
processing
• Utilize the db2move utility to move a group of tables into the same
or a different database
• Copy the objects for a schema using the db2move utility or the
ADMIN_COPY_SCHEMA procedure
• Move and make changes to tables with a minimal loss of table
availability using ADMIN_MOVE_TABLE


Unit objectives
After completing this unit, you should be able to:
• Configure the LOAD utility options to optimize the performance
of loading data into DB2 tables
• Describe the conditions that would impact selection of the
INGEST utility rather than using LOAD
• Set the options for the INGEST utility and monitor ingest
processing
• Utilize the db2move utility to move a group of tables into the
same or a different database
• Copy the objects for a schema using the db2move utility or the
ADMIN_COPY_SCHEMA procedure
• Move and make changes to tables with a minimal loss of table
availability using ADMIN_MOVE_TABLE

Figure 10-1. Unit objectives CL4636.0

Notes:
Here are the objectives for this unit.


Instructor notes:


Purpose — Identify the objectives of this unit.
Details — State the objectives.
Additional information —
Transition statement —


Review - Load utility characteristics


• Inserts data into a table from an external file, tape, named pipe, or
cursor
• In addition to being faster than applications or utilities that use SQL
Inserts, it has the following advantages:
– Rows being loaded are not logged
– Pages are built from scratch
– Can take advantage of SMPs and intra-partition parallelism (that is, highly
scalable and fully parallel)
– Statistics can be gathered during a load

• LOAD utility has the following differences with SQL INSERT


processing:
– Triggers are not fired when rows are added, so business rules associated
with triggers are not enforced by the Load utility
– Additional options supporting loading into a multi-partitioned database

• Example: load from table1.ixf of ixf insert into table1


Figure 10-2. Review - Load utility characteristics CL4636.0

Notes:
The DB2 LOAD utility inserts data into a table from an external file, tape, named pipe, or
cursor. The LOAD utility is intended to perform the high speed load processing for larger
tables much faster than the Import utility.
The LOAD utility has the following advantages compared to using Import:
1. Rows being loaded are not logged – LOAD is less likely to encounter problems
associated with exceeding the defined database log space.
2. Pages are built from scratch – LOAD builds and writes new pages, in new extents
directly to the table space containers rather than have the row level overhead for the
SQL Inserts used by Import.
3. Can take advantage of SMPs and intra-partition parallelism – The LOAD utility options
can be used to utilize the CPU and I/O resources of the system to improve LOAD
performance.
4. Statistics can be gathered during a load – The LOAD utility provides the STATISTICS
option to collect DB2 catalog statistics during a LOAD with the REPLACE option, which


could save some of the time required to run the RUNSTATS utility after the load
processing completes.
Since LOAD does not use SQL Inserts, any Insert Triggers defined on the table are not
fired when rows are added, so business rules associated with triggers are not enforced by
the Load utility.
There are a group of additional options for loading data into multipartition tables in a DPF
partitioned database, including the modes PARTITION_ONLY, LOAD_ONLY,
PARTITION_AND_LOAD, and ANALYZE.
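As a concrete illustration, a basic load that replaces the contents of a table and collects
statistics in the same pass might look like the following sketch; the file and table names
are assumptions, and STATISTICS USE PROFILE presumes that a statistics profile was
registered earlier with RUNSTATS ... SET PROFILE:

load from /data/sales.del of del
  replace into mdc.daily_sales
  statistics use profile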


Instructor notes:
Purpose — This describes some of the features of the LOAD utility. Student should have
learned about basic LOAD processing in one of the prerequisite database administration
classes. Students that support partitioned databases should already be familiar with
performing partitioned table loads.
Details —
Additional information —
Transition statement — Next we will review the phases for LOAD utility processing.


Load utility phases: Review


• Analyze phase
– Applies only to column-organized tables
– Used to build column compression dictionaries
• Load phase:
– Input records are read, formatted and written to the target table containers and
to the Load copy (optional)
– Index keys are inserted into sorts

• Build phase – Indexes are built

• Delete phase
– Duplicates are removed from unique indexes

• Index Copy phase


– If copy is needed to replace the index object with its shadow copy (ALLOW
READ ACCESS)


Figure 10-3. Load utility phases: Review CL4636.0

Notes:
LOAD utility processing can include the following phases.
1. Analyze phase: Starting with DB2 10.5, a LOAD utility for a column-organized table
may include the ANALYZE phase, which scans the input to create the column
compression dictionaries. This is only used for column-organized tables.
2. Load phase: Input records are read, formatted and written to the target table
containers and to the Load copy (COPY YES option). During this phase, the Index keys
are inserted into sorts for all indexes defined on the table.
3. Build phase: Indexes are built - All of the indexes defined on the table will be updated
or completely rebuilt based on the INDEXING MODE.
4. Delete phase: If the table has any unique indexes and some duplicates were found
during the Build phase, the duplicate data rows are removed.
5. Index Copy phase: If the ALLOW READ ACCESS option was used to allow
applications to read the table during the earlier phases of LOAD processing, this final
phase is required to replace the original index object with its shadow copy.
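From another session, the LOAD QUERY command shows which phase a running load has
reached; a minimal sketch, with the table name assumed:

load query table mdc.daily_sales

The output reports the current phase and the counts of rows read, skipped, loaded,
rejected, deleted, and committed so far.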


Instructor notes:
Purpose — To review the phases of processing for the LOAD utility. Students should have
learned this in a prerequisite course. The detailed information about loading
column-organized tables is included in the two lectures of DB2 BLU Acceleration in this
course.
Details —
Additional information —
Transition statement — Next we will look into the process model used for loading data in
a single partition table.


Load process model


• Designed to self-optimize for excellent performance:
– Exploits CPU and disk parallelism

– Highly scalable in SMP environments


[Diagram: the input source feeds a single media reader (db2lmr), which passes buffers
through shared memory to multiple formatters (db2lfrm, their number controlled by the
CPU parallelism option), then to the ridder (db2lrid), and finally to the buffer
manipulators (db2lbm, their number controlled by the disk parallelism option), which
write to the table space containers.]

Simplified Load architecture on a single partition


Figure 10-4. Load process model CL4636.0

Notes:
Load Utility Process Model
This section describes the processing model used by a Load in a non-partitioned database
environment. Almost identical processing takes place on a single logical node during a
partitioned database Load.
In the above figure, Load processes (EDUs) are shown as circles, input and output media
as cylinders, and inter-EDU communication channels as rectangles.
The media reader reads raw data from the input source. Input source is either the input
file/pipe (non-partitioned Load) or a socket opened by the partitioning subagent (partitioned
Load). Media reader gets an empty buffer, fills it with data and parses it. If the end of a data
record is found, the portion of the buffer containing complete records (possibly more than
one) is sent to one of the formatters. If the end of record is not found, the buffer is marked
as not containing a complete record and the whole buffer is sent to one of the formatters.
The same formatter will receive all buffers until the end of the current record is reached. In
the present processing model, there is only one media reader per loading partition.


Formatters receive raw data buffers from the media reader, get empty buffers and form
record lists.
For MDC tables, records are clustered along the key columns. Primary formatter also
initializes the table. Formatted records do not contain LOB data because LOB data is sent
directly to buffer manipulators and a LOB descriptor is stored in the record list. Thus every
record will fit into a single record list buffer. However, a buffer of raw data containing
multiple records might consume more than a single record list buffer. (The buffers are the
same size, but internal representation can require more storage than the external
representation.) The record list, which is a linked list of buffers, is split when a raw data
buffer containing a complete record is processed. Alternatively, the list is split if it grows
beyond a dynamically determined maximum length, which is not a common occurrence.
The record list is sent to the ridder.
The ridder receives record lists from the formatters, gets empty buffers, assigns record ID
(RID) to each record, sorts index keys and forms extents. Full extents are sent to the buffer
manipulators. The ridder also collects the statistics and performs consistency points.
The ridder performs some MDC specific block clustering. Partially filled extents are
cached in the MDC extent cache, and some partially filled pages are cached in the MDC
page cache. If a buffer is added to the extent cache, it will not be sent to the buffer
manipulators. Hence, the ridder can consume multiple extent buffers. If the number of
buffers in the MDC cache reaches a dynamically determined limit, an extent buffer has to
be sent to the buffer manipulators before anything can be added to the cache.
Buffer manipulators receive formatted extents from the ridder and write them to disk. If
Load was issued with a COPY YES option, buffer manipulators send the extent buffers to
the media writers. In MDC, buffer manipulators might need to do a read to bring in a partial
extent before doing the write.
Media writers (not shown in the figure) receive formatted extents from the buffer
manipulators and write them to the Load copy.
The agent (not shown in the figure) allocates the memory, spawns and coordinates the
EDUs.
In the present model, there is only one media reader per Load (one on each partition, if
DPF is used).
The number of formatters is equal to the Load CPU_PARALLELISM option. This is either
the user-specified value or the number of available CPUs (for more than 5 CPUs, the
parallelism is the number of CPUs minus one). With DPF enabled, the number of available
CPUs is the total number of CPUs on the machine divided by the number of database
partitions defined on the machine. Under some circumstances (LOB data present, for
instance) CPU parallelism is forced to 1. Parallelism is reduced if there are memory
constraints. The maximum number of formatters used by a single Load is 30 per database
partition. If multiple formatters are used, a single ridder will be spawned. Otherwise, the
formatter will also perform the work of the ridder.

The number of buffer manipulators is equal to the Load DISK_PARALLELISM option,
which can be specified by the user. Otherwise, DB2 uses the maximum of one buffer
manipulator per four formatters, or one buffer manipulator per each container. The
maximum number of buffer manipulators is the larger of 50, or four times the number of
formatters.
One media writer is spawned for each copy target. There is a single agent per Load.
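Tying these controls together, a load that sets the formatter and buffer manipulator
counts explicitly might look like this sketch; the path, table name, and values are
illustrative only:

load from /data/sales.del of del
  insert into mdc.daily_sales
  data buffer 4800
  cpu_parallelism 6
  disk_parallelism 8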


Instructor notes:
Purpose — This provides students with some details about the internal process model
used by the LOAD utility. Students need to understand how the CPU_PARALLELISM and
DISK_PARALLELISM options might impact the processing. The details for MDC table
processing for LOADs will be covered in the unit on Multidimensional Clustering.
Details —
Additional information —
Transition statement — Next we will look at the process model for the Load utility when
loading multipartition tables in a DB2 partitioned database.


Multipartition Load utility process model


• Data partitioning and loading can be performed in a single
step:
– Additional infrastructure needed for partitioning
– Same interface (both API and CLP) as the single partition Load
– Consecutive layers are fully connected via TCP/IP sockets
[Diagram: the Load coordinator (db2agent) spawns and monitors the other agents. A
pre-partitioning agent (db2lpprt), one per input source and run on the coordinator
partition, feeds the partitioning agents (db2lpart), whose number and location are
configurable. Each record is hashed to its target partition and sent over TCP/IP to a
series of single-partition Loads, one on each target partition; consecutive layers are
fully connected via TCP/IP sockets.]

Simplified Load architecture in a partitioned environment


Figure 10-5. Multipartition Load utility process model CL4636.0

Notes:
This section describes the processing model used by a Load in a partitioned database
environment.
Additional infrastructure is needed to process the data from the input sources, perform the
partitioning, and send the records (via TCP/IP sockets) to the correct target database
partition. Once on the target partition, records are read by the media readers. (They read
from sockets in this case – partitioned data sets are only materialized on the target partition
if the PARTITION_ONLY mode is used.)
The pre-partitioning agent opens the input source (file/pipe/user-supplied file transfer
command) and sends buffers containing complete records to the partitioning agents via
TCP/IP sockets in a round-robin fashion. This agent always runs on the coordinator
database partition. If the ANYORDER modifier is specified, one pre-partitioning agent will be
spawned for each input source, and the processing will run in parallel. Otherwise, only a
single pre-partitioning agent is spawned and multiple input sources are processed in
series. A pre-partitioning agent is not created if the input source is of type CURSOR.


The partitioning agent extracts the partitioning columns from each data record it receives
from the pre-partitioning agents, determines the target database partition, and sends the
record to the media reader running on that partition via a TCP/IP socket. The number of
partitioning agents and the nodes on which they run are configurable using the
PARTITIONING_DBPARTNUMS Load option.
The mini buffer manipulator agents (not shown in the figure) write out the partitioned files
during a PARTITION_ONLY load.
The Load coordinator agent spawns all other agents and monitors their progress. If the
input source is of type CURSOR, the coordinator opens the cursor and fetches the records
from it. If the CLIENT option is used, the coordinator routes data to the pre-partitioning
agent.
Note that the default value of CPU_PARALLELISM for a partitioned Load is calculated
using the number of CPUs per database partition, not the total number of CPUs on the
machine.
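For reference, a partitioned database load that places the partitioning agents explicitly
might look like the following sketch; the file name, table name, and partition numbers
are illustrative:

load from /data/sales.del of del
  insert into mdc.daily_sales
  partitioned db config
    mode partition_and_load
    partitioning_dbpartnums (1,2)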

Instructor notes:


Purpose — To show the additional process model required to support the Load utility when
the target table has multiple partitions.
Details —
Additional information —
Transition statement — Next we will look at the effect of allowing applications to read a
table when a LOAD with the INSERT option includes the ALLOW READ ACCESS option.


Online Load: ALLOW READ ACCESS


• Increased data availability without sacrificing performance
• Pre-existing data visible to table scanners, but no concurrent application
updates or deletes (Uses a table level U lock)
• Old scanners are drained before Load can start
• Super-exclusive lock (Z) needed only during commit
• LOCK WITH FORCE option ensures Load does not wait for locks
• Index rebuild using a shadow object:
– While Load is running, scanners access the original index
– Original is replaced with shadow at commit time
– Shadow can be built in a user-specified table space
• Incremental indexing using a pseudo-insert algorithm:
– Newly inserted keys are marked invisible
– At commit time, inserted keys are made visible instantaneously
– Lazy approach to cleanup of pseudo-inserted markers - no additional scan needed
– Some logging overhead


Figure 10-6. Online Load: ALLOW READ ACCESS CL4636.0

Notes:
If a LOAD utility is run in INSERT mode, the ALLOW READ ACCESS option can be
included to allow applications to continue reading the existing data while new rows are
loaded. This can be used to reduce the loss of data availability associated with using the
LOAD utility. A table-level Update (U) lock is acquired by LOAD to permit reads of
pre-existing data, but no concurrent update or delete SQL processing will be allowed.
Applications that are holding a lock on the table when LOAD starts, called old
scanners, are drained before Load can begin processing. This includes those with read and
write locks on the table. Load uses the super-exclusive lock (Z) only during the final load
commit processing.
The LOCK WITH FORCE option ensures Load does not wait for locks. If ALLOW READ
ACCESS is included, those applications with read locks will not be forced during the initial
LOAD setup phase but they might be forced off during the final load commit process.
When the ALLOW READ ACCESS option is used for a LOAD, the existing indexes will be
left in place to support any concurrent read-only applications. This requires the Index
rebuild to use a shadow object to create a copy of the indexes with pointers to the newly

loaded data. The original index copy is replaced with the shadow copy at commit time. The
Shadow can be built in a user-specified temporary table space or, by default, the shadow
copy will use space in the table space where the original indexes are located. Incremental
indexing uses a pseudo-insert algorithm, where the newly inserted keys are marked
invisible until the LOAD completes. At commit time, the inserted keys are made visible
instantaneously. A lazy approach to cleanup of pseudo-inserted markers is used to
eliminate the need for an additional scan of the indexes, but there will be some logging
overhead as these markers are gradually removed.
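Put together, an online load of additional rows might look like this sketch; the file
name and the temporary table space tempspace1 are assumptions, and the USE clause
directs the shadow index copy to that table space when the indexes are rebuilt:

load from /data/new_sales.del of del
  insert into mdc.daily_sales
  allow read access use tempspace1
  lock with force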


Instructor notes:
Purpose — This explains the impact of using the ALLOW READ ACCESS option to allow
applications to continue processing a table that is being loaded with the INSERT option.
Details —
Additional information —
Transition statement — Now we will look at some of the LOAD utility options that have the
most impact on performance.


Load options affecting performance (1 of 2)


• CPU PARALLELISM:
– Default: N-1 or N on a N CPU system (usually OK, high for cursor Load)
– Bounded by available memory
– Ensure UTIL_HEAP_SZ and data buffers high enough

• DISK PARALLELISM
– Default: Number of containers (usually performs best)

• DATA BUFFER:
– Default: A percentage of available UTIL_HEAP_SZ (too low, especially for
MDC)
– Reasonable values are 400-800 (4K pages) per CPU
• SORT BUFFER:
– Specifies a value that overrides the SORTHEAP database configuration
parameter during a load operation
– Can NOT exceed value of SORTHEAP, can be used to reduce memory used
for creating indexes during LOAD processing


Figure 10-7. Load options affecting performance (1 of 2) CL4636.0

Notes:
Several of the LOAD utility options can have a significant effect on the performance of a
LOAD utility.
CPU_PARALLELISM
Default = number of CPUs (If number of CPUs <= 5)
number of CPUs - 1 (If number of CPUs > 5)
This parameter controls the number of formatters that the Load utility will spawn to
parallelize the record formatting, which is the most CPU-intensive portion of processing in
the Load phase. The default value is close to the total number of CPUs, and it is usually
optimal. Load will internally reduce the CPU parallelism if the amount of memory given to
the utility is not sufficient to sustain the requested number of EDUs. When this reduction
takes place, messages will be written in the Load message file indicating that parallelism
was reduced.
The value of CPU_PARALLELISM can be reduced to throttle the Load to accommodate
other workloads on the machine, including concurrent Loads.


DISK_PARALLELISM
This parameter controls the number of buffer manipulators spawned by the Load utility.
The default value is the larger of the number of database containers the Load target table
is striped across, and one quarter of the number of formatters. The number of buffer
manipulators is limited to either four times the number of formatters or 50, whichever is
larger. Default value of DISK_PARALLELISM is often optimal.
Increasing the value of DISK_PARALLELISM can be beneficial if the number of disks
associated with the logical volumes storing the target table containers is greater than the
number of containers.
DATA_BUFFER
This parameter specifies the amount of memory Load utility can use for internal
processing. The default value is a quarter of the available database utility heap, and it is
generally too low, especially if the target table is MDC. Database utility heap size is defined
by UTIL_HEAP_SZ in the database configuration. Both parameters are specified as the
number of 4 KB pages. DATA_BUFFER has significant effect on the utility performance
because an insufficient value will result in reduced CPU parallelism.
This memory is allocated directly from the utility heap, whose size can be modified through
the util_heap_sz database configuration parameter. Beginning in version 9.5, the value of
the DATA BUFFER option of the LOAD command can temporarily exceed util_heap_sz if
more memory is available in the system. In this situation, the utility heap is dynamically
increased as needed until the database_memory limit is reached. This memory will be
released once the load operation completes.
Setting DATA_BUFFER to a value in the range of 400 to 800 (4K pages) per CPU should
allow the LOAD to perform well. For example, on a 6 CPU system, the DATA_BUFFER
might be set to a value between 2400 and 4800.
SORT BUFFER
This option specifies a value that overrides the SORTHEAP database configuration
parameter during a load operation. It is relevant only when loading tables with indexes and
only when the INDEXING MODE parameter is not specified as DEFERRED. The value that
is specified cannot exceed the value of SORTHEAP. This parameter is useful for throttling
the sort memory that is used when loading tables with many indexes without changing the
value of SORTHEAP, which would also affect general query processing.

Instructor notes:


Purpose — This explains the default values and performance impact of these LOAD utility
options. In most cases, the defaults will perform well, so it is not necessary to specify these
options for a LOAD. If the values are specified and are too small, the LOAD might not
perform well.
Details —
Additional information —
Transition statement — Let's look at the performance impact of several other LOAD
options.


Load options affecting performance (2 of 2)


• ANYORDER:
– Improves parallelism but data is not loaded in the input order
– Exact sequence can be lost with Database partitioning anyway
– More important with large parallelism (CPU_PARALLELISM)
– When specified for Partitioned Database Load:
• Can use multiple pre-partitioning agents
• Default number of partitioning agents > 1
• Potentially large performance implications
– 36% improvement on non-partitioned Load without index maintenance

• FASTPARSE:
– Use with clean, text-based (for example, DEL) input files
– 16% improvement on non-partitioned Load without index maintenance
• FETCH_PARALLELISM:
– When performing a load from a cursor where the cursor is declared using the
DATABASE keyword, or when using the API sqlu_remotefetch_entry media
entry, load utility attempts to parallelize fetching from the remote data source


Figure 10-8. Load options affecting performance (2 of 2) CL4636.0

Notes:
ANYORDER
If the LOAD option ANYORDER is specified, the Load utility will not maintain the input
source record order when writing the formatted data into the database containers. This can
yield significant performance increases on SMP systems. Note that subsequent query
performance might be affected by changed data ordering, especially those applications
that make heavy use of a clustering index.
In one set of LOAD performance tests, there was a 36% improvement on non-DPF Load
without index maintenance.
FASTPARSE
Use of this parameter reduces the amount of syntax checking performed on user-supplied
input data, thereby improving performance. The target table is guaranteed to be
architecturally correct, and the utility will perform enough checking to prevent segmentation
violations. However when incorrect data is encountered, arbitrary values could be loaded.

The option should thus be used only with clean data. Since FASTPARSE affects converting
of ASCII data into internal formats, loading from an IXF file is not affected.
In one set of LOAD performance tests, there was a 16% improvement on non-DPF Load
without index maintenance.
FETCH_PARALLELISM
When performing a load from a cursor where the cursor is declared using the DATABASE
keyword, or when using the API sqlu_remotefetch_entry media entry, and this option is set
to YES, the load utility attempts to parallelize fetching from the remote data source if
possible. If set to NO, no parallel fetching is performed. The default value is YES.


Instructor notes:
Purpose — This presents the performance impact of the LOAD options ANYORDER and
FASTPARSE.
Details —
Additional information —
Transition statement — Next, we will look at other factors that affect LOAD performance.


Other Load performance factors


• Without Database partitioning, a single media reader can become CPU-bound
• Bandwidth from other system resources
– Disk I/O, network (for partitioned DB), OS message queues
• Parallel Loads:
– Table-level scope allows multiple concurrent Loads
– Useful if a single load cannot fully saturate the CPU
– Utility scales well with the number of concurrent Loads
• Topology used in a partitioned database Load:
– Increased parallelism (pre-partitioning and partitioning) can greatly improve the
performance of the partitioning step
– Default usually optimal
• COPY YES, generated columns, MDC tables
• Performance impact of non-clean data can be significant:
– Load will write a message for each record rejected due to data that can’t be formatted –
this can dominate total load time
– If data is known not to be clean, and performance is more important than recovery,
NOROWWARNINGS modifier can be specified

Figure 10-9. Other Load performance factors CL4636.0

Notes:
There are other factors that affect LOAD utility performance.
In a DB2 database without DPF (partitioning), a single media reader can become CPU
bound and limit throughput.
Other system resources, including disk I/O contention and network capacity for partitioned
table loads, can limit LOAD performance.
The LOAD utility locks at the table level so multiple parallel Loads can be used to fully
utilize CPU resources in systems where a single load cannot fully saturate the CPU.
Increased parallelism (pre-partitioning and partitioning) can greatly improve the
performance of the partitioning step for LOAD in a DB2 partitioned database.
Other factors that affect LOAD performance are:
• COPY YES option: The I/Os needed to create the load copy data can effect
performance.

© Copyright IBM Corp. 2005, 2015 Unit 10. Advanced Data Movement 10-25
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

• Generated columns: There can be additional CPU overhead if the table contains
generated columns.
• MDC tables: Loading an MDC table adds significant processing to cluster the input
data into blocks.
• Non-clean data: Load will write a message for each record rejected due to data that
can't be formatted, which can dominate total load time. If data is known not to be clean,
and performance is more important than recovery, the NOROWWARNINGS modifier
can be specified.

Instructor notes:


Purpose — To give additional examples of factors that impact LOAD performance.
Details —
Additional information —
Transition statement — Next we will look closer at the index processing for the LOAD
utility.


Index maintenance in Load (1 of 2)


• Load supports three indexing modes:
– Rebuild, incremental and deferred
– If not specified, a heuristic (index depth and size of loaded delta) is used to
select incremental or rebuild (AUTOSELECT)
– Deferred cannot be selected with unique indexes

• Keys are inserted into sort during the Load phase:


– Single scan of data, regardless of the number of indexes
– CREATE INDEX requires a scan for each index

• Indexes are built serially:


– Single agent builds them all, one at a time
– CREATE INDEX can have multiple agents build a single index, but they are still
built one at a time

• In partitioned databases, index building is naturally parallelized


Figure 10-10. Index maintenance in Load (1 of 2) CL4636.0

Notes:
The Load utility supports three indexing modes: rebuild, incremental, and deferred.
The indexing mode can be explicitly specified as a part of the Load command. The default is
AUTOSELECT, in which a heuristic based on the existing index depth and the relative sizes of
the loaded and pre-existing data is used to internally select either incremental or rebuild mode.
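For example, the indexing mode can be specified explicitly on the command, as in this
sketch (object names are illustrative):
db2 load from /tmp/delta.del of del insert into prod.sales
   indexing mode incremental
If the INDEXING MODE clause is omitted, AUTOSELECT chooses between INCREMENTAL
and REBUILD as described above.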
Rebuilding the index during an offline Load starts by invalidating the existing index object. If
Load is online, a new copy of the index, called the index shadow, is built to allow concurrent
readers access to the old index object, and the two objects are swapped when the utility
commits. Unless requested otherwise, the shadow shares the table space with the original
index object. Note that a physical copy of index data is necessary at commit time if the
shadow is built in a different table space. In either case, a scan of the pre-existing table
data is necessary to insert the corresponding keys into the new index.
If the utility is instructed not to maintain the index – indexing mode DEFERRED – the index
object will be invalidated. If the index is not explicitly rebuilt after Load completes, a full
index rebuild is executed by the index manager at the time determined by the INDEXREC


database configuration parameter. The option to defer index building is not allowed if there
are any unique indexes on the table.
Regardless of the number of indexes defined on the target table, Load inserts all the keys
into multiple sorts (one sort for each index) during the Load phase. Hence, no additional
scan of the data is necessary to maintain the index. However, sorts are executed
sequentially and building of the actual index objects is fully serial – a single agent builds all
the indexes, one at a time. Note that the index manager uses a different index rebuild
strategy. When a CREATE INDEX statement is executed, multiple agents might be spawned
(if SMP parallelism is enabled via the INTRA_PARALLEL database manager configuration
parameter), but the indexes are still built one at a time. Hence, a single scan of the table
data is needed for each index defined on the target table, but each individual scan and sort
can be parallelized among multiple SMP agents. Therefore CREATE INDEX can use more
memory for sorting, as sort memory allocated by each agent is bound by the value of
SORTHEAP.
When indexes are built in the partitioned database environment, all operations are naturally
parallelized because a different index object is associated with each database partition
spanned by the target table. This is true for both Load and CREATE INDEX.


Instructor notes:
Purpose — This describes the options and processing for index maintenance during a
LOAD utility and compares LOAD performance to the processing for CREATE INDEX,
which can in some cases make better use of processor and I/O parallelism.
Details —
Additional information —
Transition statement — Next we will look at some more factors that affect performance
for index maintenance during loads.



Index maintenance in Load (2 of 2)


• Maintaining unique indexes is expensive if there are a lot of
duplicates:
– Each record deletion is fully logged
– Use of the exception table affects performance as the records are
inserted using SQL

• Optimal strategy for some tables could require testing with actual data on
selected hardware (CPUs and disk):
– Load with indexing
– DROP INDEX + Load + CREATE INDEX


Figure 10-11. Index maintenance in Load (2 of 2) CL4636.0

Notes:
If the table being loaded has a unique index and the input contains many duplicates, there
will be a significant added overhead for load processing.
• As each duplicate record is deleted it will be fully logged, which could cause a database
log full condition.
• Use of the exception table affects performance as the duplicate records are inserted
using SQL Inserts which are logged.
Because of different indexing strategies, it is sometimes beneficial to drop the indexes prior
to loading, and to recreate them using the CREATE INDEX statement once the target table
is fully populated.
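As a sketch, the drop-and-recreate strategy might look like the following sequence (the
object names are illustrative):
db2 drop index prod.sales_ix1
db2 load from /tmp/sales.del of del insert into prod.sales
db2 create index prod.sales_ix1 on prod.sales (sale_date)
db2 runstats on table prod.sales and indexes all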


Although the complexity of issues involved in letting LOAD process the indexes or using
CREATE INDEX makes it difficult to predict which method will perform best, Load should
offer more efficient index maintenance in the following cases:
1. The size of the loaded data set (Load insert) is small relative to the pre-existing data,
since the Load utility can maintain the index incrementally, whereas CREATE INDEX will
result in a full rebuild.
2. The number of indexes is large enough that the time saved by reducing the number of
table scans outweighs the benefits of parallel scanning and sorting.


Instructor notes:


Purpose — To understand the impact on LOAD processing if many unique key violations
are processed during the LOAD. Several examples are given where using LOAD to build
the indexes should perform better than using CREATE INDEX.
Details —
Additional information —
Transition statement — Let's look at several performance tests that were run to compare
index processing by LOAD to using CREATE INDEX statements.


LOAD performance versus CREATE INDEX

[Bar chart: "Load, Create Index, Runstats" — elapsed time in seconds (0 to 2500) for two
test cases, T1 (1 index, 7 GB) and T2 (4 indexes, 2 GB), comparing a single Load that also
builds the indexes and collects statistics against a Load of the data alone followed by
separate CREATE INDEX and RUNSTATS steps.]


Figure 10-12. LOAD performance versus CREATE INDEX CL4636.0

Notes:
The visual shows the results of several performance tests comparing elapsed times for
performing index maintenance during the LOAD processing to using CREATE INDEX
statements after the LOAD processes the data.
In one test, a 7 GB table with one index was processed by a LOAD utility that built the index
and also collected the table statistics. The performance was compared to using the
RUNSTATS utility and a CREATE INDEX after the load processing completed. Collecting
the statistics with RUNSTATS after the load completed took longer because a separate
table scan was necessary for the RUNSTATS utility. Even so, with only one index it was
faster overall to use LOAD to just process the data and then use CREATE INDEX to create
the index, collecting statistics with RUNSTATS after the index was built.
In a second scenario, a smaller 2 GB table with four indexes was loaded. In this case, the
LOAD utility’s ability to generate all of the index keys during a single pass of the data made
using the LOAD utility somewhat faster than just using LOAD to process the data and
CREATE INDEX statements to build the four indexes.


This also shows that when a table has multiple indexes, the index maintenance or BUILD
phase can be significantly longer than the LOAD phase that processes the data rows.


Instructor notes:
Purpose — To show that depending on the number of indexes, using CREATE INDEX as a
separate step outside of LOAD might perform better than the LOAD utility index
maintenance in some cases.
Details —
Additional information —
Transition statement — Let's look at what can be done to tune the LOAD utility index
maintenance performance.



Tuning the Index Build phase for LOAD


• When loading a table with indexes, follow general sort tuning principles:
– Check SORTHEAP (DB CFG)
• Max number of memory pages used by a single sort - can be set to
AUTOMATIC, managed by Self Tuning Memory Manager
– Check Sort Memory Limits:
• SHEAPTHRES_SHR (DB CFG) - Can be set to AUTOMATIC, managed by
Self Tuning Memory Manager
• Soft limit on the number of memory pages that can be used by all sorts
• If exceeded, new sorts get less memory than requested
– Use TEMP table space with multiple containers and a large buffer pool
– Configure prefetching and page cleaning using NUM_IOSERVERS and
NUM_IOCLEANERS

• Online Load considerations:


– Avoid building the shadow in the TEMP table space since Index Copy
phase will involve physical copy of the index pages
– Watch log consumption during incremental index build


Figure 10-13. Tuning the Index Build phase for LOAD CL4636.0

Notes:
Tuning the index build
General sort tuning principles should be used when loading a table with indexes.
• The SORTHEAP database configuration parameter determines the maximum number
of memory pages that can be used by a single sort. SORTHEAP can be set to
AUTOMATIC to allow the Self Tuning Memory Manager (STMM) to automatically adjust
the size based on the application workload, if SHEAPTHRES is configured to a value of
0.
• The size of database shared memory for sorts is limited by the configuration option
SHEAPTHRES_SHR. SHEAPTHRES_SHR can be set to AUTOMATIC to allow the
Self Tuning Memory Manager (STMM) to automatically adjust the size based on the
application workload.
• Since sorts spill into a temporary table space, make sure that a TEMP table space with
multiple containers and a large buffer pool exists.


• Ensure efficient page cleaning and prefetching are enabled by specifying sufficiently
large values for the NUM_IOCLEANERS and NUM_IOSERVERS database configuration
parameters. The default value for these is AUTOMATIC for DB2 databases, which
causes DB2 to calculate a value at database startup based on the system configuration.
There are performance implications associated with online Load building the shadow index
in a table space other than the one containing the original, since in that case the index copy
phase will have to involve a physical copy of the whole index object.
Monitor log space consumption when online Load is incrementally maintaining indexes.
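As one illustrative way to apply these recommendations (the database name is
hypothetical), the relevant parameters could be set as follows:
db2 update dbm cfg using SHEAPTHRES 0
db2 update db cfg for musicdb using SORTHEAP AUTOMATIC SHEAPTHRES_SHR AUTOMATIC
db2 update db cfg for musicdb using NUM_IOCLEANERS AUTOMATIC NUM_IOSERVERS AUTOMATIC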


Instructor notes:


Purpose — This lists some of the performance considerations for index processing during
the LOAD utility, most are related to optimizing the sort operations including increasing
SORTHEAP and the buffer pool that supports the temporary table spaces.
Details — Sort Memory requirements:
The total amount of memory needed by the sort can be estimated using the rough
calculation outlined below. If sufficient physical memory is available, SORTHEAP value
should be set high enough to ensure all sorting is done in memory. Otherwise, performance
will deteriorate since sort will have to spill to the TEMP buffer pool and eventually to disk.
Memory requirement for each data record:
• 'insert buffer' = Total key size + 5 bytes for a RID + 4 bytes overhead + 4 bytes for each
variable length column
• 'partial key buffer' = Round up 'insert buffer' to 8 byte boundary (64-bit server) or 4-byte
boundary (32-bit server), not exceeding 16 bytes in either case
• 'pointer' = 8 bytes (64-bit server) or 4 bytes (32-bit server)
• 'collation' = Add 8 bytes (64-bit server) or 4 bytes (32-bit server) if non-standard
collation is used (IDENTITY_16BIT, UCA400_NO and UCA400_LTH)
• 'key offset and auxiliary array' = 16 bytes
• 'treesort overhead' = 40 bytes (64-bit server) or 20 bytes (32-bit server) - Only if
TreeSort is used!
So, to build INDEX1 on a dataset containing 50 million rows without having to spill, one
would need:
50000000 * (33 + 16 + 8 + 16) bytes = 3,650,000,000 bytes, or approximately 3.4 GB of storage.
Additional information —
Transition statement — Let's look at using a named pipe as the input for a LOAD utility.


Load using a named pipe


• Can avoid creating an intermediate file by exporting to a named
pipe and loading from it
• Can initiate either the export or load first:
– If load is started first, it will wait until data is placed onto the pipe
– If the export is done first, export will fill the pipe and will block until load
starts reading the data from it

• Pipe sizes are actually quite small (anywhere from 8 KB to 32 KB)


• If amount of data is larger than the pipe size, the export/load will
be throttled (by the pipe) and the writes/reads will alternate.
• Example:
Session #1: mkfifo /tmp/pipe
export to /tmp/pipe of ixf select * from sourceTable
Session #2: load from /tmp/pipe of ixf insert into targetTable


Figure 10-14. Load using a named pipe CL4636.0

Notes:
A named pipe could be used to feed the output of an Export utility directly into a LOAD
utility and avoid creating an intermediate file. Either utility could be started first.
Since pipe sizes are actually quite small, being anywhere from 8 KB to 32 KB, if the amount
of data is larger than the pipe size, the export/load will be throttled by the pipe and the
writes/reads will alternate.
The Export and LOAD Utilities would need to run in two different sessions, for example:
Session #1: mkfifo /tmp/pipe
export to /tmp/pipe of ixf select * from sourceTable
Session #2: load from /tmp/pipe of ixf insert into targetTable


Instructor notes:


Purpose — This describes using a named pipe to pass data directly from an Export into a
LOAD utility.
Details —
Additional information —
Transition statement — Next we will look at using a declared cursor to provide LOAD
input.


Load from Cursor


• The CURSOR file type allows the results of an SQL query to be
directly loaded into a target table without creating an intermediate
exported file
• Must first define a cursor using the DECLARE CURSOR statement
• A nickname can be used in the SQL query which means that data
can be loaded from another database in a single step
• Use INTRA_PARALLEL, MAX_QUERYDEGREE and
DFT_DEGREE to get parallel processing of query used for input
• Examples:
declare loadcurs cursor for select * from Oracle.Table1
load from loadcurs of cursor insert into DB2app.Table1
declare cursor1 cursor for
select year(sales_date), sales_person, count(*) from sales
group by year(sales_date), sales_person
load from cursor1 of cursor insert into sales_summary


Figure 10-15. Load from Cursor CL4636.0

Notes:
A CURSOR input source type can be used to load the data retrieved by an SQL query.
Arbitrary queries are supported, including simple single table selects, joins, and selects
from nicknames.
Since data returned by the query is already in internal database format, less processing
needs to be performed by the utility. However, throughput might be negatively affected by
the fact that SQL is being used to access the source data. Optimizing a Load from
CURSOR involves several additional parameters controlling the SQL query execution time.
The INTRA_PARALLEL database manager parameter has to be turned on to allow parallel
execution, and MAX_QUERYDEGREE should be set to ANY. The number of SMP agents
executing the query is controlled by the DFT_DEGREE database configuration parameter.
The number of page cleaners and the number of prefetchers are specified by
NUM_IOCLEANERS and NUM_IOSERVERS database configuration parameters.
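For example, the parallelism-related parameters could be set as follows before running a
Load from CURSOR (the database name and degree are illustrative; changing
INTRA_PARALLEL requires an instance restart to take effect):
db2 update dbm cfg using INTRA_PARALLEL YES MAX_QUERYDEGREE ANY
db2 update db cfg for musicdb using DFT_DEGREE 4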


Instructor notes:


Purpose — This shows some examples of using a declared cursor as the input for a LOAD
utility.
Details — The coordinator agent (named db2agent) executes the cursor fetch, and it will
generally be the bottleneck. During a series of LOAD performance tests, it was concluded
that the single coordinator cannot saturate more than two formatters, so Load CPU
parallelism was reduced to two. Note that formatters' work is simplified when loading from
cursor as the input data is already in internal database representation. Decreasing the
Load CPU parallelism to one was detrimental to performance because the work otherwise
done by the ridder is also performed by the single (primary) formatter. This primary
formatter was CPU-bound and it became the new bottleneck.
Additional information —
Transition statement — Let's look at several options for handling the input for loading.


Additional LOAD Input options


• LOAD from a remote database using a CURSOR with the
DATABASE option:
– DECLARE CURSOR statement includes DATABASE option. USER and
USING options can be used to define userid and password
– Federated objects, like a nickname for the remote table are not required
– Less overhead than using a nickname
– Parallel processing can be used if source and target have a common
distribution for database partitioning
DECLARE mycurs CURSOR DATABASE dbsource USER dsciaraf USING
mypasswd FOR SELECT TWO,ONE,THREE FROM abc.table1
LOAD FROM mycurs OF cursor INSERT INTO abc.table2

• LOAD using the SOURCEUSEREXIT option:


– SOURCEUSEREXIT option provides a facility through which a
customized script or executable populates one or more named pipes with
data that is simultaneously read from by the Load utility
– For partitioned databases, multiple instances of the user exit can be
invoked concurrently to achieve parallelism of the input data


Figure 10-16. Additional LOAD Input options CL4636.0

Notes:
Loading from a remote data source using the DATABASE option of DECLARE CURSOR
By specifying the CURSOR file type when using the LOAD command, you can load the
results of an SQL query directly into a target table without creating an intermediate
exported file. Additionally, you can load data from another database by referencing a
nickname within the SQL query, by using the DATABASE option within the DECLARE
CURSOR statement, or by using the sqlu_remotefetch_entry media entry when using the
API interface.
The DATABASE option can be included in the DECLARE CURSOR statement used to
define the LOAD utility input, for example:
DECLARE mycurs CURSOR DATABASE dbsource USER dsciaraf USING mypasswd
FOR SELECT TWO,ONE,THREE FROM abc.table1
LOAD FROM mycurs OF cursor INSERT INTO abc.table2


Using the DATABASE option of the DECLARE CURSOR statement (also known as the
remotefetch media type when using the Load API) has some benefits over the nickname
approach:
• Performance:
- Fetching of data using the remotefetch media type is tightly integrated within a load
operation. There are fewer layers of transition to fetch a record compared to the
nickname approach.
- When source and target tables are distributed identically in a multi-partition
database, the Load utility can parallelize the fetching of data, which can further
improve performance.
• Ease of use
- There is no need to enable federation, define a remote data source, or declare a
nickname. Specifying the DATABASE option (and the USER and USING options if
necessary) is all that is required.
- While this method can be used with cataloged databases, the use of nicknames
provides a robust facility for fetching from various data sources which cannot simply
be cataloged.
- To support this remotefetch functionality, the Load utility makes use of infrastructure
which supports the SOURCEUSEREXIT facility. The Load utility spawns a process
which executes as an application to manage the connection to the source database
and perform the fetch. This application is associated with its own transaction and is
not associated with the transaction under which the Load utility is running.

Restrictions
When loading from a cursor defined using the DATABASE option (or equivalently when
using the sqlu_remotefetch_entry media entry with the db2Load API), the following
restrictions apply:
• The SOURCEUSEREXIT option cannot be specified concurrently.
• The METHOD N option is not supported.
• The usedefaults file type modifier is not supported.

Moving data using a customized application (user exit)


The load SOURCEUSEREXIT option provides a facility through which the Load utility can
execute a customized script or executable, referred to herein as a user exit. The purpose of
the user exit is to populate one or more named pipes with data that is simultaneously read
from by the Load utility. In a multi-partition database, multiple instances of the user exit can
be invoked concurrently to achieve parallelism of the input data.


The Load utility creates one or more named pipes and spawns a process to execute your
customized executable. Your user exit feeds data into the named pipes while the Load
utility simultaneously reads.

Invoking your user exit


The user exit must reside in the bin subdirectory of the DB2 installation directory (often
known as sqllib).
The Load utility invokes the user exit executable with the following command line
arguments:
• <base pipename>
• <number of source media>
• <source media 1>
• <source media 2> ...
• <user exit ID>
• <number of user exits>
• <database partition number>
Where:
<base pipename> – Is the base name for named-pipes that the Load utility creates and
reads data from. The utility creates one pipe for every source file provided to the LOAD
command, and each of these pipes is appended with .xxx, where xxx is the index of the
source file provided. For example, if there are 2 source files provided to the LOAD
command, and the <base pipename> argument passed to the user exit is pipe123, then
the two named pipes that your user exit should feed with data are pipe123.000 and
pipe123.001. In a partitioned database environment, the Load utility appends the
database partition (DBPARTITION) number .yyy to the base pipe name, resulting in the
pipe name pipe123.xxx.yyy.
<number of source media> – Is the number of media arguments which follow.
<source media 1> <source media 2> ... – Is the list of one or more source files
specified in the LOAD command. Each source file is placed inside double quotation
marks.
<user exit ID> – Is a special value useful when the PARALLELIZE option is enabled.
This integer value (from 1 to N, where N is the total number of user exits being
spawned) identifies a particular instance of a running user exit. When the
PARALLELIZE option is not enabled, this value defaults to 1.
<number of user exits> – Is a special value useful when the PARALLELIZE option is
enabled. This value represents the total number of concurrently running user exits.
When the PARALLELIZE option is not enabled, this value defaults to 1.


<database partition number> – Is a special value useful when the PARALLELIZE
option is enabled. This is the database partition (DBPARTITION) number on which the
user exit is executing. When the PARALLELIZE option is not enabled, this value
defaults to 0.

Additional options and features


The SOURCEUSEREXIT facility provides these additional options:
REDIRECT – This option allows you to pass data into the STDIN handle or capture data
from the STDOUT and STDERR handles of the user exit process.
INPUT FROM BUFFER <buffer> – Allows you to pass information directly into the
STDIN input stream of your user exit. After spawning the process which executes the
user exit, the Load utility acquires the file-descriptor to the STDIN of this new process
and passes in the buffer provided. The user exit reads from STDIN to acquire the
information. The Load utility simply sends the contents of <buffer> to the user exit using
STDIN and does not interpret or modify its contents. For example, if your user exit is
designed to read two values from STDIN, an eight-byte userid and an eight-byte
password, your user exit executable written in C might contain the following lines: rc =
read (stdin, pUserID, 8); rc = read (stdin, pPasswd, 8); A user could pass this
information using the INPUT FROM BUFFER option as shown in the following LOAD
command:
LOAD FROM myfile1 OF DEL INSERT INTO table1 SOURCEUSEREXIT myuserexit1
REDIRECT INPUT FROM BUFFER myuseridmypasswd

Note

The Load utility limits the size of <buffer> to the maximum size of a LOB value.
However, from within the command line processor (CLP), the size of <buffer> is
restricted to the maximum size of a CLP statement. From within CLP, it is also
recommended that <buffer> contain only traditional ASCII characters. These issues can
be avoided if the Load utility is invoked using the db2Load API, or if the INPUT FROM
FILE option is used instead.

INPUT FROM FILE <filename> – Allows you to pass the contents of a client side file
directly into the STDIN input stream of your user exit. This option is almost identical to
the INPUT FROM BUFFER option, however this option avoids the potential CLP
limitation. The filename must be a fully qualified client side file and must not be larger
than the maximum size of a LOB value.
OUTPUT TO FILE <filename> – Allows you to capture the STDOUT and STDERR
streams from your user exit process into a server side file. After spawning the process
which executes the user exit executable, the Load utility redirects the STDOUT and
STDERR handles from this new process into the filename specified. This option is


useful for debugging and logging errors and activity within your user exit. The filename
must be a fully qualified server side file. When the PARALLELIZE option is enabled, one
file exists per user exit and each file appends a three-digit numeric identifier, such as
filename.000.
PARALLELIZE – This option can increase the throughput of data coming into the Load
utility by invoking multiple user exit processes simultaneously. This option is only
applicable to a multi-partition database. The number of user exit instances invoked is
equal to the number of distribution agents if data is to be distributed across multiple
database partitions during the load operation, otherwise it is equal to the number of
loading agents.


Instructor notes:


Purpose — To discuss the options to define a cursor with the DATABASE name of a
remote database and also the SOURCEUSEREXIT option.
Details —
Additional information —
Transition statement — Let's look at the results of some LOAD performance tests that
were run.


LOAD performance experiments


Figure 10-17. LOAD performance experiments CL4636.0

Notes:
A series of tests were run to analyze the performance characteristics of the LOAD utility
using different formats of input, including an IXF file, an ASCII text file (ASC), a delimited
file (DEL), and a declared cursor. Different LOAD utility options were tested to find the best
performing combination of options for this single partition database test environment.
These tests were focused on the loading of data so there was no index defined on the table
being loaded. In each case the LOAD processed 50 million records.
The table shows the results for each different input type. One measure was the number of
Gigabytes of data loaded per hour per CPU. The second measure was the number of rows
loaded per second. The last column shows the performance achieved by manually setting
LOAD utility options compared to the LOAD utility self optimization.
In these tests, the use of an IXF file for input significantly outperformed the other options.
The IXF LOAD was able to process 127,900 rows per second compared to 57,300 for the
ASC file, 54,800 for the DEL file, and 50,200 for the Cursor LOAD. The performance
differences were caused by the overhead involved in reading and formatting each input.


Instructor notes:


Purpose — These performance results show some different performance characteristics
for the different types of LOAD utility input. The IXF file has the advantage of having data in
a form that requires less CPU overhead per row to format and store. Notice that the
CURSOR based LOAD only used a CPU_PARALLELISM of 2 compared to either 5 (IXF) or
6 (ASC and DEL). The cursor LOAD reads data in a form that does not require much
formatting per row but also requires more overhead to read, so it did not benefit from
having a larger number of parallel formatters.
Details —
Additional information —
Transition statement — Next we will look at using the LOAD QUERY command to review
the status of a Load utility.


Checking Load status: Load query


db2 load query table prod.table1
SQL3501W The table space(s) in which the table resides will not be placed in
backup pending state since forward recovery is disabled for the database.

SQL3109N The utility is beginning to load data from file


"C:\cf45\reorg\savehist.del".

SQL3500W The utility is beginning the "LOAD" phase at time "03/29/2005


21:30:24.468073".

SQL3519W Begin Load Consistency Point. Input record count = "0".


SQL3520W Load Consistency Point was successful.
........................
SQL3519W Begin Load Consistency Point. Input record count = "48314".
SQL3520W Load Consistency Point was successful.

SQL0289N Unable to allocate new pages in table space "LOADTSPD".


SQLSTATE=57011

SQL3532I The Load utility is currently in the "LOAD" phase.

Number of rows read = 48314


Number of rows skipped = 0
Number of rows loaded = 48314
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 48314
Number of warnings = 0

Tablestate:
Load Pending


Figure 10-18. Checking Load status: Load query CL4636.0

Notes:
LOAD QUERY command
Checks the status of a load operation during processing and returns the table state. If a
load is not processing, then the table state alone is returned. A connection to the same
database and a separate CLP session are also required to successfully invoke this
command. It can be used either by local or remote users.
Authorization: None
Required connection: Database
Command syntax:
>>-LOAD QUERY--TABLE--table-name--+------------------------+---->
'-TO--local-message-file-'

>--+-------------+--+-----------+------------------------------><
+-NOSUMMARY---+ '-SHOWDELTA-'
'-SUMMARYONLY-'


Command parameters:


• NOSUMMARY – Specifies that no load summary information (rows read, rows skipped,
rows loaded, rows rejected, rows deleted, rows committed, and number of warnings) is
to be reported.
• SHOWDELTA – Specifies that only new information (pertaining to load events that have
occurred since the last invocation of the LOAD QUERY command) is to be reported.
• SUMMARYONLY – Specifies that only load summary information is to be reported.
• TABLE table-name – Specifies the name of the table into which data is currently being
loaded. If an unqualified table name is specified, the table will be qualified with the
CURRENT SCHEMA.
• TO local-message-file – Specifies the destination for warning and error messages that
occur during the load operation. This file cannot be the message-file specified for the
LOAD command. If the file already exists, all messages that the Load utility has
generated are appended to it.
Examples:
A user loading a large amount of data into the STAFF table wants to check the status of the
load operation. The user can specify:
db2 connect to <database>
db2 load query table staff to /u/mydir/staff.tempmsg


The output file /u/mydir/staff.tempmsg might look like the following:


SQL3501W The table space(s) in which the table resides will not be placed in
backup pending state since forward recovery is disabled for the database.

SQL3109N The utility is beginning to load data from file


"/u/mydir/data/staffbig.del"

SQL3500W The utility is beginning the "LOAD" phase at time "03-21-2002


11:31:16.597045".

SQL3519W Begin Load Consistency Point. Input record count = "0".

SQL3520W Load Consistency Point was successful.

SQL3519W Begin Load Consistency Point. Input record count = "104416".

SQL3520W Load Consistency Point was successful.

SQL3519W Begin Load Consistency Point. Input record count = "205757".

SQL3520W Load Consistency Point was successful.

SQL3519W Begin Load Consistency Point. Input record count = "307098".

SQL3520W Load Consistency Point was successful.

SQL3519W Begin Load Consistency Point. Input record count = "408439".

SQL3520W Load Consistency Point was successful.

SQL3532I The Load utility is currently in the "LOAD" phase.

Number of rows read = 453376


Number of rows skipped = 0
Number of rows loaded = 453376
Number of rows rejected = 0
Number of rows deleted = 0
Number of rows committed = 408439
Number of warnings = 0

Tablestate:
Load in Progress


The example in the graphic was the output of the LOAD QUERY after a LOAD utility failed
because the table space was full. Notice that the table is in a Load Pending status, but the
table space is available for access to other tables.


Instructor notes:
Purpose — This shows an example of the output from a LOAD QUERY.
Details —
Additional information —
Transition statement — Let's take a look at monitoring an active Load utility with the LIST
UTILITIES command.



Load monitoring: LIST UTILITIES


db2 LIST UTILITIES SHOW DETAIL
ID = 158
Type = LOAD
Database Name = MUSICDB
Member Number = 0
Description = [LOADID: 2550.2013-10-31-15.58.49.076455.0 (10;4)]
[*LOCAL.inst461.131031195024] ONLINE LOAD DEL AUTOMATIC INDEXING
INSERT NON-RECOVERABLE INST461 .LOADHIST1
Start Time = 10/31/2013 15:58:49.093144
State = Executing
Invocation Type = User
Progress Monitoring:
Phase Number = 1
Description = SETUP
Total Work = 0 bytes
Completed Work = 0 bytes
Start Time = 10/31/2013 15:58:49.093151

Phase Number = 2
Description = LOAD
Total Work = 10000 rows
Completed Work = 10000 rows
Start Time = 10/31/2013 15:58:55.184448

Phase Number [Current] = 3


Description = BUILD
Total Work = 2 indexes
Completed Work = 2 indexes
Start Time = 10/31/2013 15:58:55.404411


Figure 10-19. Load monitoring: LIST UTILITIES CL4636.0

Notes:
The LIST UTILITIES command can be used to display the status for active LOAD utilities in
a DB2 database. In addition to showing the time when the LOAD began processing, there
is a set of progress monitoring statistics that can be used to determine the current phase of
load processing and the amount of work completed in each phase.
LIST UTILITIES command
Displays to standard output the list of active utilities on the instance. The description of
each utility can include attributes such as start time, description, throttling priority (if
applicable), as well as progress monitoring information (if applicable).
Scope: This command only returns information for the database partition on which it is issued.
Authorization: One of the following:
• sysadm
• sysctrl
• sysmaint


Required connection: Instance


Command syntax:
>>-LIST UTILITIES--+-------------+-----------------------------><
'-SHOW DETAIL-'
Command parameters: SHOW DETAIL – Displays detailed progress information for
utilities that support progress monitoring.


Instructor notes:


Purpose — This shows an example of LIST UTILITIES output.
Details —
Additional information —
Transition statement — Let's take a look at the effect of the Load utility recovery options.


Load utility recovery options

[Diagram: database SALES with table spaces SYSCATSPACE, TEMPSPACE1, DMS01 (tab1),
DMS02 (tab2), DMS03 (tab3) and the load copy directory /db/loadcopy. A timeline of
logs 10 through 17 begins with a database backup at 8:00 AM, followed by:
1) DB2 load from... insert into tab1 COPY NO, leaving DMS01 in Backup Pending status
2) DB2 backup db sales tablespace(dms01) online to.. at 10:00 AM, returning the table
space to normal status
3) DB2 load from... insert into tab2 COPY YES to /db/loadcopy
4) DB2 load from... insert into tab3 NONRECOVERABLE]


Figure 10-20. Load utility recovery options CL4636.0

Notes:
The LOAD utility has three options that affect the recoverability of load processing. These
options have no impact if the DB2 database is configured for circular logging, in which case
the COPY YES option is not allowed.
The graphic shows three Load utilities, each running with a different option for load
recovery.
There is a database backup that was created before the loads started processing. This
could be an online database backup.
1. The first Load utility is run with the COPY NO option, which is the default. The load will
process the input file and append the data to the target table. The data is not logged. A
load with COPY NO in a recoverable database will force the table space containing that
table into a BACKUP PENDING status, which allows reads but no updates for ALL tables in
the table space.
2. In the example, an online table space backup is used to resolve the backup pending
state and return the table space to a normal state with full read/write access.


3. The second Load utility uses the COPY YES option. The data is not logged to the DB2
log files, but a load copy file is produced that holds the load output. This load copy file
will have a unique name generated with a timestamp similar to DB2 backup files. The
load copy file will be recorded in the recovery history file for the database and the name
will be logged in the database logs. The table space will not be put into backup pending
status.
4. A third Load utility uses the NONRECOVERABLE option of LOAD. The data is
appended to the target table, but no load copy file is saved and the table space is not
put into any backup pending status, so the table space remains fully available for
application to access other tables.
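The three variations shown in the figure correspond to commands like the following sketch
(file names are illustrative):
db2 load from /tmp/tab1.del of del insert into sales.tab1 copy no
db2 load from /tmp/tab2.del of del insert into sales.tab2 copy yes to /db/loadcopy
db2 load from /tmp/tab3.del of del insert into sales.tab3 nonrecoverable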
But what happens if the database is restored and the DB2 logs are used to roll
forward and recover the database to its latest available logged changes?


Instructor notes:
Purpose — This provides an example of using each of the LOAD utility recovery options in
a recoverable database. Students should understand that even though the LOAD utility
was changed in Version 8 to remove the load pending state that prevented access to all
tables in the table space during a LOAD, a LOAD with the COPY NO option will cause the
table space to enter the backup pending state which will only allow read applications to run
until a database or table space backup is completed.
Details —
Additional information —
Transition statement — Next we will look at results of performing a full database recovery
for the database after these three Load utilities have finished processing.



Load utility effects during rollforward recovery

[Diagram: database SALES is restored from the 8:00 AM backup (RESTORE DB SALES) and
then rolled forward (ROLLFORWARD DB SALES TO END OF LOGS AND STOP). The
rollforward encounters the logged loads:
1) DB2 load from... insert into tab1 COPY NO leaves DMS01 in Restore Pending status
2) DB2 load from... insert into tab2 COPY YES to /db/loadcopy is replayed from the load
copy file
3) DB2 load from... insert into tab3 NONRECOVERABLE marks tab3 inaccessible (X)]


Figure 10-21. Load utility effects during rollforward recovery CL4636.0

Notes:
In the previous example, three Load utilities completed processing, each one using a
different option for recovery.
If the database backup taken before the loads processed was restored and the rollforward
command was used with the TO END OF LOGS option to apply all logged changes to the
database, the logged entries for the load utilities would impact the recovery for each of the
loaded tables.
1. First, the load with the COPY NO option would be encountered in the DB2 log files. This
would force the table space for the table that was loaded into a restore pending status
and the remaining logged changes for that table space would be bypassed. No tables in
that table space could be accessed until the table space is restored and a table space
rollforward is run to apply the logs from the time of the backup forward to the end of the
log files.
2. Next, the load with the COPY YES option would be encountered in the DB2 log files by
the roll forward processing. The load copy file name would be found in the log files and
DB2 would add the pages containing the loaded data to the table, which would produce


the same result as rerunning the original LOAD. At the end of the rollforward command,
the table space and table would be available for applications.
3. The roll forward would also find a log record that indicated the third load had run with the
NONRECOVERABLE option. In this case, since DB2 cannot reproduce the load
processing for that table, the table will be marked inaccessible. The table space
containing the table would not be affected. All other changes encountered in the DB2
logs for that table would be ignored. When the roll forward completes, the table could
not be accessed and it would be necessary to drop and recreate the table. Another
Load could be run if the input is still available.
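As a sketch, the restore pending state left by the COPY NO load could be resolved with a
table space level restore followed by a table space rollforward:
db2 restore db sales tablespace (dms01) online
db2 rollforward db sales to end of logs and stop tablespace (dms01) online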


Instructor notes:


Purpose — This explains the effects of different load recovery options on the processing
for loads with the three recovery options.
Details —
Additional information —
Transition statement — Next we will discuss the reasons for using the INGEST utility.


INGEST utility - Why a new utility?

• Avoid use of a staging table when loading a table.


– Common scenario is to load into a staging table, followed by INSERT/SELECT from the
staging table to the target table

• Allow other applications to do inserts, updates, or deletes while the utility loads a table

• Load tables continuously as new data arrives

• More flexible SQL including UPDATE, MERGE, DELETE, function calls, and expressions

• When a recoverable error occurs (for example, connection failed but re-
established), recover automatically and continue

• Note: Performance should be comparable to loading into a staging table followed by
multiple concurrent INSERT/SELECTs from the staging table to the target table

Figure 10-22. INGEST utility - Why a new utility? CL4636.0

Notes:
The ingest utility (sometimes referred to as continuous data ingest, or CDI) is a high-speed
client-side DB2 utility that streams data from files and pipes into DB2 target tables.
Because the ingest utility can move large amounts of real-time data without locking the
target table, you do not need to choose between data currency and availability.
The ingest utility ingests pre-processed data directly or from files output by ETL tools or
other means. It can run continually and thus it can process a continuous data stream
through pipes. The data is ingested at speeds that are high enough to populate even large
databases in partitioned database environments.
An INGEST command updates the target table with low latency in a single step. The ingest
utility uses row locking, so it has minimal interference with other user activities on the same
table.
With this utility, you can perform DML operations on a table using a SQL-like interface
without locking the target table. These ingest operations support the following SQL
statements: INSERT, UPDATE, MERGE, REPLACE, and DELETE. The ingest utility also


supports the use of SQL expressions to build individual column values from more than one
data field.
Other important features of the ingest utility include:
• Commit by time or number of rows. You can use the commit_count ingest configuration
parameter to have commit frequency determined by the number of written rows or use
the default commit_period ingest configuration parameter to have commit frequency
determined by a specified time.
• Support for copying rejected records to a file or table, or discarding them. You can
specify what the INGEST command does with rows rejected by the ingest utility (using
the DUMPFILE parameter) or by DB2 (using the EXCEPTION TABLE parameter).
• Support for restart and recovery. By default, all INGEST commands are restartable from
the last commit point. In addition, the ingest utility attempts to recover from certain
errors if you have set the retry_count ingest configuration parameter.
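For example, a sketch that combines several of these features might look like the following
(the file, table, and parameter values are illustrative):
INGEST SET commit_count 10000;
INGEST SET retry_count 3;
INGEST FROM FILE my_file.del FORMAT DELIMITED
   DUMPFILE /tmp/rejected.del
   INSERT INTO my_table;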


Instructor notes:
Purpose — To introduce the INGEST utility, which became available with DB2 10.1.
Details —
Additional information —
Transition statement — Next we will discuss where the INGEST utility processing can be
executed.



Deciding where to run the INGEST utility


• The INGEST utility is included as a part of the DB2 client
installation
– You can run it from either the client or the server.
• On an existing server in the data warehouse environment
• On the DB2 coordinator partition of a partitioned database
• On an existing ETL (extract, transform, and load) server
• On a new server
• On a server that is only running the ingest utility
• On a server that is also hosting an additional DB2 coordinator partition
that is dedicated to the ingest utility
[Diagram: the ingest utility running on a local client or on a remote client, in both cases
streaming data into the target database.]


Figure 10-23. Deciding where to run the INGEST utility CL4636.0

Notes:
Deciding where to run the ingest utility
The ingest utility is included as part of the DB2 client installation; it is part of the DB2 Data
Server Runtime Client and the DB2 Data Server Client. You can run it from either the client
or the server.
There are two choices for where to run the ingest utility:
• On an existing server in the data warehouse environment
- On the DB2 coordinator partition (the database partition server to which applications
will connect and on which the coordinating agent is located)
- On an existing ETL (extract, transform, and load) server
• On a new server
- On a server that is only running the ingest utility

© Copyright IBM Corp. 2005, 2015 Unit 10. Advanced Data Movement 10-69
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

- On a server that is also hosting an additional DB2 coordinator partition that is


dedicated to the ingest utility.
There are a number of factors that can influence where you decide to install the ingest
utility:
• Performance: Having the ingest utility installed on its own server has a significant
performance benefit, so this would be suitable for environments with large data sets.
• Cost: Having the ingest utility installed on an existing server means that no additional
expenses are incurred as a result of using it.
• Ease of administration

10-70 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.1
Instructor Guide

Uempty Instructor notes:


Purpose — To discuss the system location for running the INGEST utility.
Details —
Additional information —
Transition statement — Next we will look at a simple INGEST command.

© Copyright IBM Corp. 2005, 2015 Unit 10. Advanced Data Movement 10-71
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

Most basic INGEST command syntax

INGEST FROM FILE my_file.del FORMAT DELIMITED
INSERT INTO my_table;

Slide callouts: my_file.del is the file containing the input records; FORMAT DELIMITED
indicates that fields are separated by commas and correspond to table columns; the
INSERT clause inserts the records into the target table.


Figure 10-24. Most basic INGEST command syntax CL4636.0

Notes:
You can issue the INGEST command specifying, at a minimum, a source, the format, and
the target table as in the following example:
INGEST FROM FILE my_file.txt
FORMAT DELIMITED
INSERT INTO my_table;


Instructor notes:


Purpose — To show the simplest example of an INGEST command.
Details —
Additional information —
Transition statement — Next we will discuss input for INGEST utilities.


Input types and formats for INGEST

• Input types: file or named pipe (can also specify multiple files or
multiple pipes)
• Input formats
– DELIMITED
• Equivalent to "OF DEL" on the IMPORT and LOAD commands
• Fields are separated by a one-byte character (default is comma)
• Records are always varying length (delimited by CRLF or LF)
• Field definition list is optional and defaults to specified or implied table
columns
– POSITIONAL
• Equivalent to "OF ASC" on the IMPORT and LOAD commands
• Fields are at fixed offsets in each record and have a fixed length
• Records can be varying length (delimited by CRLF or LF) or fixed length
• Field definition list is required
• Allows numbers to be specified in binary

• INPUT CODEPAGE parameter allows specifying the code page of the input data

Figure 10-25. Input types and formats for INGEST CL4636.0

Notes:
The ingest utility (sometimes referred to as continuous data ingest, or CDI) is a high-speed
client-side DB2 utility that streams data from files and pipes into DB2 target tables.
The INGEST command supports the following input data formats:
• Delimited text
• Positional text and binary
• Columns in various orders and formats
When the ingest utility processes input data, there are three code pages involved: the
application (client) code page, the input data code page, and the database code page.
If the input data code page differs from the application code page, the ingest utility
temporarily overrides the application code page with the input data code page so that DB2
converts the data directly from the input data code page to the database code page. Under
some conditions, the ingest utility cannot override the application code page. In this case,
the ingest utility converts character data that is not defined as FOR BIT DATA to the


application code page before passing it to DB2. In all cases, if the column is not defined as
FOR BIT DATA, DB2 converts the data to the database code page.
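As a sketch of the INPUT CODEPAGE parameter mentioned in the visual, assuming the
input data is encoded in code page 850 and that the parameter follows the format
clause (the file and table names are illustrative):
INGEST FROM FILE my_file.del
FORMAT DELIMITED INPUT CODEPAGE 850
INSERT INTO my_table;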


Instructor notes:
Purpose — To discuss briefly the input sources and input formats supported by INGEST.
Details —
Additional information —
Transition statement — Next we will look at some sample INGEST commands.



Ingest - Input types and formats: examples

-- Input records are sent over a named pipe.
INGEST FROM PIPE my_pipe FORMAT DELIMITED
INSERT INTO my_table;

-- Input records are delimited by CRLF and fields are delimited by a vertical bar.
INGEST FROM FILE my_file.del FORMAT DELIMITED '|'
INSERT INTO my_table;

-- Input records are delimited by CRLF with fields in fixed positions.
INGEST FROM FILE my_file.del FORMAT POSITIONAL
(
$field1 POSITION(1:12) INTEGER EXTERNAL,
$field2 POSITION(13:20) CHAR(8)
)
INSERT INTO my_table;

-- Input records are fixed length (no CRLF) with fields in fixed positions.
INGEST FROM FILE my_file.del FORMAT POSITIONAL RECORDLEN 20
(
$field1 POSITION(1:12) INTEGER EXTERNAL,
$field2 POSITION(13:20) CHAR(8)
)
INSERT INTO my_table;


Figure 10-26. Ingest - Input types and formats: examples CL4636.0

Notes:
The visual shows several examples of INGEST commands that demonstrate using a
named pipe or a file as input.
The examples show that the input records could be delimited data or defined with data
fields in fixed positions.


Instructor notes:
Purpose — To show INGEST command examples, showing different types of input
definitions.
Details —
Additional information —
Transition statement — Next we will discuss field definitions for INGEST processing.


INGEST – Using Field definition lists

• The optional field definition list specifies field names, position, type,
and options
– Field types are the same as SQL column types, but some types are not
supported.
– By default numeric types are in binary. For ASCII, you must specify
EXTERNAL (required for FORMAT DELIMITED)
• You can omit the field definition list when:
– Input format is DELIMITED
– The SQL statement is INSERT
– The input record format corresponds to the specified or implied
table columns
• You must specify a field definition if any one of the following is true:
– The format is POSITIONAL
– The SQL statement is not INSERT
– Not all fields correspond to table columns
– You want to specify an SQL expression using fields


Figure 10-27. INGEST - Using Field definition lists CL4636.0

Notes:
The visual describes the conditions that would allow the INGEST command to be run
without a defined field list. It also lists some conditions that would require the use of a field
definition list.
The field definition list defaults as follows:
• If a column list follows the table name on the INSERT statement, there is one field for
each column in the list.
• If the INSERT statement omits the column list and there are no implicitly hidden
columns, then there is one field for each column in the table.
• If the INSERT statement omits the column list and there are implicitly hidden columns,
then you must explicitly specify whether or not the implicitly hidden columns are
included. Use the DB2_DMU_DEFAULT registry variable, or the
IMPLICITLYHIDDENINCLUDE or IMPLICITLYHIDDENMISSING keywords to specify if
implicitly hidden columns are included.


• Each field has the same name as the corresponding table column, prefixed with a dollar
sign $.
• Each field has the same data type and length (or precision and scale) as the
corresponding table column. Numeric fields (integer, decimal, and so on) default to
EXTERNAL format. DB2SECURITYLABEL fields default to STRING format.


Instructor notes:


Purpose — To discuss the use of field lists for INGEST commands.
Details —
Additional information —
Transition statement — Next we will see how field lists of the INGEST command could
contain SQL expressions.


Using Field definition lists and SQL expressions


-- Compute column TOTAL_PRICE from two input fields.
INGEST FROM FILE my_file FORMAT DELIMITED
(
$prod_ID INTEGER EXTERNAL,
$prod_name CHAR(8),
$unused_field CHAR(1),
$base_price DECIMAL(5,2) EXTERNAL,
$shipping_cost DECIMAL(5,2) EXTERNAL
)
INSERT INTO my_table(prod_ID, prod_name, total_price)
VALUES($prod_ID, $prod_name, $base_price + $shipping_cost);

-- Input records have three 2-character fields for year, month, and day.
-- Ingest them into a DATE column.
INGEST FROM FILE my_file FORMAT DELIMITED
(
$month CHAR(2),
$day CHAR(2),
$year CHAR(2)
)
INSERT INTO my_table(date_column)
VALUES( DATE('20' || $year || '-' || $month || '-' || $day)
);

Figure 10-28. Using Field definition lists and SQL expressions CL4636.0

Notes:
The ingest utility supports the use of SQL expressions to build individual column values
from more than one data field.
The visual shows two examples of INGEST commands. In the first example, two data fields
in the input are added together to produce the column value used in the INSERT
statement.
In the second example, three input data fields are combined with constants to form a single
output column of data.


Instructor notes:


Purpose — To show several examples of INGEST commands that use SQL expressions
to define the data value inserted into a table, rather than using a single input data field.
Details —
Additional information —
Transition statement — Next we will see how a set of format options can be used in an
INGEST command.


INGEST Field options -- examples

-- example includes DEFAULTIF condition
INGEST FROM FILE my_file.txt
FORMAT POSITIONAL
(
$field1 POSITION(1:8) INTEGER EXTERNAL,
$field2 POSITION(10:19) DATE 'yyyy-mm-dd' DEFAULTIF(35) = ' ',
$field3 POSITION(25:34) CHAR(10)
)
INSERT INTO my_table
VALUES($field1, $field2, $field3);

-- example input contains filler field definitions
INGEST FROM FILE my_file.txt
FORMAT DELIMITED
(
$field1 INTEGER,
$field2 CHAR(8),
$filler1 CHAR,
$field3 CHAR(32),
$filler2 CHAR,
$field4 DATE
)
INSERT INTO my_table VALUES($field1, $field2, $field3, $field4);

Figure 10-29. INGEST Field options -- examples CL4636.0

Notes:
The first INGEST command example inserts data from a file with fields in the specified
positions. The fields in the file correspond to the table columns. The DEFAULTIF field
option causes the ingest utility to insert the default value for the second column based on a
default indicator which is located after the data columns.
The second example inserts data from a delimited text file with fields separated by a
comma (the default). The fields in the file correspond to the table columns except that there
are extra fields between the fields for columns 2 and 3 and columns 3 and 4.


Instructor notes:


Purpose — To discuss several INGEST command examples using field definitions.
Details —
Additional information —
Transition statement — Next we will look at an example of an INGEST command using
the UPDATE SQL statement.


Ingest SQL statements – UPDATE example

• The INGEST command can specify a subset of the INSERT,
UPDATE, DELETE, or MERGE statements.

-- Update records in the table.
INGEST FROM FILE my_file.del FORMAT DELIMITED
(
$key_fld1 INTEGER EXTERNAL,
$key_fld2 INTEGER EXTERNAL,
$data_fld1 CHAR(8),
$data_fld2 CHAR(8),
$data_fld3 CHAR(8)
)
UPDATE my_table SET (data_col1, data_col2, data_col3) =
($data_fld1, $data_fld2, $data_fld3)
WHERE (key_col1 = $key_fld1) AND (key_col2 = $key_fld2);


Figure 10-30. Ingest SQL statements - UPDATE example CL4636.0

Notes:
The visual shows an example of an INGEST command that updates the table rows whose
primary key matches the corresponding fields in the input file.


Instructor notes:


Purpose — To discuss an example of an INGEST command that uses an UPDATE SQL
statement rather than an INSERT statement. This operation is distinctly different from the
INSERT or INSERT_UPDATE processing performed by IMPORT.
Details —
Additional information —
Transition statement — Next we will discuss using an INGEST to perform a MERGE or
DELETE function.


INGEST SQL statements – Merge example

-- Merge input records into the table.
INGEST FROM FILE my_file.txt
FORMAT DELIMITED
(
$key1 INTEGER EXTERNAL,
$key2 INTEGER EXTERNAL,
$data1 CHAR(8),
$data2 CHAR(32),
$data3 DECIMAL(5,2) EXTERNAL
)
MERGE INTO my_table
ON (key1 = $key1) AND (key2 = $key2)
WHEN MATCHED THEN
UPDATE SET (data1, data2, data3) = ($data1, $data2, $data3)
WHEN NOT MATCHED THEN
INSERT VALUES($key1, $key2, $data1, $data2, $data3);


Figure 10-31. INGEST SQL statements - Merge and Delete examples CL4636.0

Notes:
This INGEST command example merges data from the input file into the target table.
For input rows whose primary key fields match a table row, it updates that table row with
the input row. For other input rows, it adds the row to the table.
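As noted earlier, the INGEST command can also specify a DELETE statement. A minimal
sketch of that form, using hypothetical field and column names, could look like this:
INGEST FROM FILE my_file.del FORMAT DELIMITED
(
$key_fld1 INTEGER EXTERNAL,
$key_fld2 INTEGER EXTERNAL
)
DELETE FROM my_table
WHERE (key_col1 = $key_fld1) AND (key_col2 = $key_fld2);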


Instructor notes:


Purpose — To discuss using the MERGE statement with the INGEST utility.
Details —
Additional information —
Transition statement — Next we will discuss the fault tolerance of the INGEST command.


Fault tolerance options for INGEST

• retry_count - Specifies the number of times to retry a failed, but
recoverable, transaction.
• retry_period - Specifies the number of seconds to wait before retrying
a failed, but recoverable, transaction.
• The ingest utility only retries transactions that fail for one of the
following reasons:
– A connection failed but has been reestablished.
– Deadlock or timeout with automatic rollback occurred.
– A system error has caused the unit of work to be rolled back.
– Virtual storage or database resource is not available.

• If the recovery succeeds, the utility will issue a warning, but the
operation otherwise continues as if nothing had happened
– The utility could recover by issuing a commit earlier than requested, or
issuing a rollback and retrying the operation. A reason code in the
warning message indicates which recovery action was taken.
• If the recovery fails, the utility will issue an error.



Figure 10-32. Fault tolerance options for INGEST CL4636.0

Notes:
The INGEST SET command can be used to set the automatic retry options for INGEST utility
processing. The setting affects only later INGEST commands in the same CLP session that
share the same CLP backend process. It does not affect INGEST commands in other CLP
sessions or later CLP sessions that use a different CLP backend process.
These two options define retries for INGEST commands.
• retry_count - Specifies the number of times to retry a failed, but recoverable,
transaction.
• retry_period - Specifies the number of seconds to wait before retrying a failed, but
recoverable, transaction.


The ingest utility only retries transactions that fail for one of the following reasons:
• A connection failed but has been reestablished.
• Deadlock or timeout with automatic rollback occurred.
• A system error has caused the unit of work to be rolled back.
• Virtual storage or database resource is not available.


Instructor notes:
Purpose — To discuss the setting of options that control retrying INGEST operations.
Details —
Additional information —
Transition statement — Next we will see an example of an automatic retry for INGEST
command processing.


INGEST Fault tolerance example

-- Example of recovering from full transaction log. Reason code 2
-- means the utility recovered by issuing an early commit.
INGEST SET retry_count 10
INGEST FROM FILE my_file.txt FORMAT DELIMITED INSERT INTO …
SQL2979I The ingest utility is starting at "01/18/2012
12:47:48.421148".
SQL2914I The ingest utility has started the following ingest job:
"DB21001:20120118.124748.421148:00002:00004".
SQL2959W The utility recovered from the following error. Reason
code "2".
Number of reconnects: "0". Number of retries: "1".
SQL0964C The transaction log for the database is full.
SQLSTATE=57011

Number of rows read = 10
Number of rows inserted = 10
Number of rows rejected = 0
SQL2902I The ingest utility completed at timestamp "01/18/2012
12:48:00.508088". Number of errors: "0". Number of warnings: "1".


Figure 10-33. INGEST Fault tolerance example CL4636.0

Notes:
The example shows the setting of the retry_count option using the INGEST SET command
prior to starting the INGEST operation.
The sample output shows that a database log full condition was encountered during
INGEST processing, but the ingest utility was able to retry the insert operation after
committing the previous data inserts.


Instructor notes:
Purpose — To discuss the example of an INGEST utility retrying a failed operation and
being able to complete normal processing.
Details —
Additional information —
Transition statement — Next we will look at some of the error handling options for
INGEST.


INGEST - Error handling options

• The DUMPFILE (or BADFILE) parameter specifies a file to receive input
records that the ingest formatter rejects, for example, records with fields
whose data is not valid for the field type
– After correcting the invalid fields in the dump file, you can re-run the utility using the dump
file as input

• The EXCEPTION TABLE parameter specifies a table to receive input
records that DB2 rejects due to certain SQL errors
– INGEST uses the exception table only when the operation is INSERT or REPLACE
– After correcting the errors in the exception table, you can insert its rows into the target table

• The WARNINGCOUNT parameter specifies that the utility is to stop after the
specified number of warnings or errors

• The MESSAGES parameter specifies a file to receive messages
– If not specified, the utility writes messages to standard output


Figure 10-34. INGEST - Error handling options CL4636.0

Notes:
INGEST provides a number of options to handle error conditions.
The MESSAGES parameter specifies the file to receive informational, warning, and error
messages. If the file already exists, the ingest utility appends to the end of the file. If this
parameter is not specified, the ingest utility writes messages to standard output.
Even when this parameter is specified, the ingest utility writes messages to standard output
in the following cases:
• syntax errors
• an input file is not found or not readable
• target or exception table is not found
• the dump file or messages file cannot be opened
• other errors detected at start-up


Also, the summary messages issued at the end (showing number of rows read, inserted,
and so on, and the total number of warnings and errors) are always written to standard
output.
The DUMPFILE or BADFILE parameter specifies that rows rejected by the formatters are
to be written to the specified file.
The formatters reject rows due to the following types of errors:
• numbers that are invalid or out of range (based on the field type)
• dates, times, and timestamps that do not fit the specified format
• Any other errors detected by the formatter
The EXCEPTION table specifies that rows inserted by the ingest utility and rejected by DB2
with certain SQLSTATEs are to be written to the specified table.
DB2 could reject rows due to the following types of errors. Note that each of these errors
indicates bad data in the input file:
• For character data, right truncation occurred; for example, an update or insert value is a
string that is too long for the column, or a datetime value cannot be assigned to a host
variable, because it is too small.
• A null value, or the absence of an indicator parameter was detected; for example, the
null value cannot be assigned to a host variable, because no indicator variable is
specified.
• A numeric value is out of range.
• An invalid datetime format was detected; that is, an invalid string representation or
value was specified.
• The character value for a CAST specification or cast scalar function is invalid.
• A character is not in the coded character set.
• The data partitioning key value is not valid.
• A resulting row did not satisfy row permissions.
• An insert or update value is null, but the column cannot contain null values.
• The insert or update value of the FOREIGN KEY is not equal to any value of the parent
key of the parent table.
• A violation of the constraint imposed by a unique index or a unique constraint occurred.
• The resulting row of the INSERT or UPDATE does not conform to the check constraint
definition.
• The value cannot be converted to a valid security label for the security policy protecting
the table.
• This authorization ID is not allowed to perform the operation on the protected table.
• The component element is not defined in security label component.


• The specified security label name cannot be found for the specified security policy.
• The data type, length, or value of the argument to routine is incorrect.
The WARNINGCOUNT parameter specifies that the INGEST command is to stop after n
warning and error messages have been issued.


Instructor notes:
Purpose — To discuss options of the INGEST command for handling error conditions.
Details —
Additional information —
Transition statement — Next we will discuss an example of the INGEST command that
includes these error related options.


Error handling options -- Examples

-- Write records that the utility (formatter)
-- rejects to file my_bad_file.del.
-- Insert records that DB2 rejects to table
-- MY_EXCP_TABLE. Stop the utility if 100 or more
-- warnings or errors occur. Write all utility
-- messages to my_msgs_file.txt.

INGEST FROM FILE my_file.del FORMAT DELIMITED
DUMPFILE my_bad_file.del
EXCEPTION TABLE my_excp_table
WARNINGCOUNT 100
MESSAGES my_msgs_file.txt
INSERT INTO my_table;

Figure 10-35. Error handling options -- Examples CL4636.0

Notes:
The visual shows a sample INGEST command with a MESSAGES file defined, a
WARNINGCOUNT limit of 100, and the DUMPFILE and EXCEPTION TABLE options
defined to handle input data errors.


Instructor notes:
Purpose — To show an INGEST command containing the various error handling options.
Details —
Additional information —
Transition statement — Next we will discuss the restart options for the INGEST
command.



INGEST - Restart
• By default ingest jobs are restartable

• If the utility terminates due to an error or a crash, you can
use the job ID to restart it from the last commit point

• The utility generates a job ID of the form
DB21001:yyyymmdd.hhmmss.uuuuuu:sssss:ttttt, where
– DB21001 refers to DB2 V10.1
– yyyymmdd.hhmmss.uuuuuu is the utility start time
– sssss is the tablespace ID (from the catalog) as a positive
integer
– ttttt is the table ID (from the catalog) as a positive integer
• You can also specify a job ID on the RESTART parameter


Figure 10-36. INGEST - Restart CL4636.0

Notes:
The INGEST utility considers a command to be complete when it reaches the end of the file
or pipe. Under any other conditions, the INGEST utility considers the command incomplete.
These can include:
• The INGEST command gets an I/O error while reading the input file or pipe.
• The INGEST command gets a critical system error from the DB2 database system.
• The INGEST command gets a DB2 database system error that is likely to prevent any
further SQL statements in the INGEST command from succeeding (for example, if the
table no longer exists).
• The INGEST command is killed or terminates abnormally.
By default, all INGEST commands are restartable from the last commit point.
The INGEST command option, RESTART NEW job-ID, specifies that if the INGEST
command fails before completing, it can be restarted from the point of the last commit by
specifying the RESTART CONTINUE option on a later INGEST command. The job-ID is a
string of up to 128 bytes that uniquely identifies the INGEST command. This job-ID must be


unique across all INGEST commands in the current database that specified the RESTART
option and are not yet complete. (These could be commands that are still running or that
failed before completing.) Once the INGEST command completes, you can reuse the
job-ID on the RESTART parameter of a later INGEST command. If the job-ID is omitted,
the ingest utility generates one.


Instructor notes:


Purpose — To introduce the restart options for the INGEST command.
Details —
Additional information —
Transition statement — Next we will provide additional details about restarting the
INGEST utility.


INGEST Restart (continued)

• Restart information is stored in a separate table
(SYSTOOLS.INGESTRESTART) that you create once (similar to
explain tables)
– To create the restart table
CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', NULL, NULL)

– The table does not contain copies of the data, only some counters to keep track of which
records have been ingested

• Restart is designed to have minimal overhead
• You can also specify
– RESTART CONTINUE to restart a previously failed job (and clean up the restart data)

– RESTART TERMINATE to clean up the restart data from a failed job you don't plan to
restart

– RESTART OFF to suppress saving of restart information (in which case the ingest job is
not restartable)


Figure 10-37. INGEST Restart (continued) CL4636.0

Notes:
By default, failed INGEST commands are restartable from the last commit point; however,
you first need to create a restart table, which stores the information needed to resume an
INGEST command.
You have to create the restart table only once, and that table will be used by all INGEST
commands in the database.
The ingest utility will use this table to store information needed to resume an incomplete
INGEST command from the last commit point.

Note

The restart table does not contain copies of the input rows, only some counters to indicate
which rows have been committed.


It is recommended that you place the restart table in the same tablespace as the target
tables that the ingest utility updates. If this is not possible, you must ensure that the
tablespace containing the restart table is at the same level as the tablespace containing the
target table. For example, if you restore or roll forward one of the table spaces, you must
restore or roll forward the other to the same level. If the table spaces are at different levels
and you run an INGEST command with the RESTART CONTINUE option, the ingest utility
could fail or ingest incorrect data.
If your disaster recovery strategy includes replicating the target tables of ingest operations,
you must also replicate the restart table so it is kept in sync with the target tables.
To create the restart table call the SYSPROC.SYSINSTALLOBJECTS stored procedure:
db2 "CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', tablespace-name,
NULL)"
Use the RESTART CONTINUE option of INGEST, RESTART CONTINUE job-ID, that
specifies that the ingest utility is to restart a previous INGEST command that specified the
RESTART NEW option and failed before completing. The job-ID specified on this option
must match the job-ID specified on the previous INGEST command. This restarted
command is also restartable.


Instructor notes:
Purpose — To discuss additional details about using the RESTART options of the INGEST
utility. Warn students that failure to create the special restart table will prevent INGEST
commands from being able to be restarted.
Details —
Additional information —
Transition statement — Next we will see an example of an INGEST command being
restarted after a system failure.


INGEST - Restart -- Example

INGEST FROM FILE my_file.del FORMAT DELIMITED
RESTART NEW 'My ingest job' -- or omit and use default job ID
INSERT INTO my_table;

-- ***** Power failure occurs *****

-- Restart the failed job from the last commit point.
-- (This also cleans up the restart data for this job in the
-- restart table.)
INGEST FROM FILE my_file.del FORMAT DELIMITED
RESTART CONTINUE 'My ingest job'
INSERT INTO my_table;
-- ***** OR *****
-- We don't want to restart -- clean up the restart data.
INGEST FROM FILE my_file.del FORMAT DELIMITED
RESTART TERMINATE 'My ingest job'
INSERT INTO my_table;


Figure 10-38. INGEST - Restart -- Example CL4636.0

Notes:
In the example shown, the INGEST command is executed using the RESTART NEW
option to set a specific job-id value of ‘My ingest job’.
To restart the INGEST processing, the original command is reissued with the RESTART
CONTINUE option included.
If an INGEST command fails before completing and you do not want to restart it, reissue
the INGEST command with the RESTART TERMINATE option. This command option
cleans up the restart data for the failed INGEST command.


Instructor notes:
Purpose — To show command examples of restarting or terminating a failed INGEST
utility.
Details —
Additional information —
Transition statement — Next we will see two ways to monitor INGEST utility processing.


Monitoring – Example of INGEST LIST

ingest list

Ingest job ID = ingest2
Ingest temp job ID = 1
Database Name = MUSICDB
Target table = INST461.INGHIST1
Input type = FILE
Start Time = 11/01/2013 11:11:23.840242
Running Time = 00:00:03
Number of records processed = 8000

DB20000I The INGEST LIST command completed successfully.


Figure 10-39. Monitoring - Example of INGEST LIST CL4636.0

Notes:
To get basic information about all currently running INGEST commands, use the INGEST
LIST command.
The visual shows an example of the output from the INGEST LIST command.
Usage notes for INGEST LIST:
• The ingest temporary job ID is an integer you can use on the INGEST GET STATS
command. It is provided to save you from typing the full job ID, but it is valid only while
the INGEST command is running.
• The dates and times are displayed in the format of the current locale
• The SET UTIL_IMPACT_PRIORITY command does not affect the INGEST command.
• The util_impact_lim database manager configuration parameter does not affect the
INGEST command.


Instructor notes:
Purpose — To show an example of the INGEST LIST command that provides basic
information about any currently active INGEST, similar to the LIST UTILITIES command for
DB2 utilities.
Details —
Additional information —
Transition statement — Next we will see the more detailed statistics returned by the
INGEST GET STATS command.


Monitoring – Example of INGEST GET STATS

-- the "1" is the temp job ID reported by the INGEST LIST command
db2 ingest get stats for 1 every 5 seconds

Ingest job ID = ingest2
Database Name = MUSICDB
Target table = INST461.INGHIST1

Overall          Overall          Current          Current
ingest rate      write rate       ingest rate      write rate
(records/second) (writes/second)  (records/second) (writes/second)  Total records
---------------- ---------------- ---------------- ---------------- -------------
45873            2666             137619           8000             8000
17202            9000             0                12800            72000
10586            9307             0                9800             121000
7645             7645             0                3323             137619

DB20000I The INGEST GET STATS command completed successfully.

Processing rates for Input and Output (writes) shown


Figure 10-40. Monitoring - Example of INGEST GET STATS CL4636.0

Notes:
To get more detailed information about a specific INGEST command or all currently running
INGEST commands, use the INGEST GET STATS command.
The example command included the EVERY 5 SECONDS option so that you will see the
processing rates for each five second interval.
Notice in the sample output that the input processing completes at a faster rate than the
rate for writing the data rows to the table.


The following is an example of an INGEST GET STATS command that shows the totals for
each phase:
=> INGEST GET STATS FOR 4 SHOW TOTALS

===================================================================
Ingest job ID = DB21000:20101116.123456.234567:34567:4567
Database = MYDB
Target table = MY_SCHEMA.MY_TABLE1
-------------------------------------------------------------------
Totals for all transporters
                              Since last query    Since command start
                              ----------------    -------------------
Number of bytes read          90,000              180,000
Number of records read        3000                6000
Bytes per second              10,000              10,000
Records per second            1000                1000
-------------------------------------------------------------------
Totals for all formatters
                              Since last query    Since command start
                              ----------------    -------------------
Number of records formatted   9000                18,000
Records per second            3000                3000
Number of records rejected    6                   12
-------------------------------------------------------------------
Totals for all flushers
                              Since last query    Since command start
                              ----------------    -------------------
Number of records flushed     12,000              24,000
Records per second            4000                4000
Number of records rejected    8                   16
Number of reconnects          4                   8
Number of retries             16                  32

Instructor notes:


Purpose — To show the more detailed monitor data for INGEST processing available
using the INGEST GET STATS command.
Details —
Additional information —
Transition statement — Next we will look at the configuration options for running INGEST
commands.


INGEST utility - Configuration parameters

COMMIT_COUNT
  Range: 0 to max 32-bit integer.  Default: 0.
  Number of rows each flusher writes in a single transaction before issuing a commit.
COMMIT_PERIOD
  Range: 0 to 2,678,400 seconds (31 days).  Default: 1 second.
  Number of seconds between committed transactions.
NUM_FLUSHERS_PER_PARTITION
  Range and default: see the notes below.
  Number of flushers to allocate for each database partition (0 means 1 flusher
  for all partitions).
NUM_FORMATTERS
  Range: 1 to max number of threads.  Default: max(1, (number of logical CPUs)/2).
  The number of formatter threads.
PIPE_TIMEOUT
  Range: 0 to 2,678,400 seconds (31 days).  Default: 600 seconds (10 minutes).
  The maximum number of seconds to wait for data when the input source is a pipe
  (0 means wait indefinitely).
RETRY_COUNT
  Range: 0 to 1000.  Default: 0.
  The number of times to retry a failed (but recoverable) transaction.
RETRY_PERIOD
  Range: 0 to 2,678,400 seconds (31 days).  Default: 0.
  The number of seconds to wait before retrying a failed (but recoverable) transaction.
SHM_MAX_SIZE
  Range: 1 to available memory.  Default: 1 GB.
  Max size of IPC shared memory in bytes.

Figure 10-41. INGEST utility - Configuration parameters CL4636.0

Notes:
Ingest utility configuration parameters
You can set these configuration parameters to control how the INGEST utility performs on
your DB2 client.
• commit_count - This parameter specifies the number of rows each flusher writes in a
single transaction before issuing a commit.
• commit_period - Specifies the number of seconds between committed transactions.
• num_flushers_per_partition - Specifies the number of flushers to allocate for each
database partition.
• num_formatters - Specifies the number of formatters to allocate.
• pipe_timeout - This parameter specifies the maximum number of seconds to wait for
data when the input source is a pipe.
• retry_count - Specifies the number of times to retry a failed, but recoverable,
transaction.

10-114 DB2 10.5 for LUW Adv Admin with DB2 BLU © Copyright IBM Corp. 2005, 2015
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V8.1
Instructor Guide

Uempty • retry_period - Specifies the number of seconds to wait before retrying a failed, but
recoverable, transaction.
• shm_max_size - Specifies the maximum size of Inter Process Communication (IPC)
shared memory in bytes. Because the ingest utility runs on the client, this memory is
allocated on the client machine.

Important

The setting of options using INGEST SET affects only later INGEST commands in the
same CLP session that share the same CLP backend process.
It does not affect INGEST commands in other CLP sessions or later CLP sessions that use
a different CLP backend process.

The following examples set INGEST options using the INGEST SET command.
db2 INGEST SET num_flushers_per_partition 5
db2 INGEST SET shm_max_size 2 GB


Instructor notes:
Purpose — To discuss the INGEST configuration options that can be set prior to executing
an INGEST command.
Details —
Additional information —
Transition statement — Next we will discuss the internal processing architecture for
INGEST.


INGEST processing architecture

• Main components/steps: transporter, formatter, flusher
• Multi-threaded
• Directs each row to the appropriate database partition

[Diagram: multiple input files or pipes feed one transporter each; the
transporters feed a pool of formatter threads; the formatters hash each row by
database partition and pass it to the flusher(s) for that partition, which
issue array-insert SQL against the partitioned DB2 database. All of this runs
in a single client process.]

Figure 10-42. INGEST processing architecture CL4636.0

Notes:
A single INGEST command goes through three major phases:
1. Transport
The transporters read from the data source and put records on the formatter queues.
For INSERT and MERGE operations, there is one transporter thread for each input
source (for example, one thread for each input file). For UPDATE and DELETE
operations, there is only one transporter thread.
2. Format
The formatters parse each record, convert the data into the format that DB2 database
systems require, and put each formatted record on one of the flusher queues for that
record's partition. The number of formatter threads is specified by the num_formatters
configuration parameter. The default is (number of logical CPUs)/2.
3. Flush
The flushers issue the SQL statements to perform the operations on the DB2 tables.
The number of flushers for each partition is specified by the num_flushers_per_partition

© Copyright IBM Corp. 2005, 2015 Unit 10. Advanced Data Movement 10-117
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

configuration parameter. The default is max( 1, ((number of logical CPUs)/2)/(number of partitions) ).
You can use the ingest utility to move data into a partitioned database environment.
INGEST commands running on a partitioned database use one or more flushers for
each partition, as specified by the num_flushers_per_partition configuration parameter.
The default is as follows:
- max(1, ((number of logical CPUs)/2)/(number of partitions))
- You can also set this parameter to 0, meaning one flusher for all partitions.
Each flusher connects directly to the partition to which it will send data. In order for the
connection to succeed, all the DB2 server partitions must use the same port number to
receive client connections.
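As a sketch, the thread counts described above could be adjusted with INGEST SET in
the same CLP session before the command runs (the values, file name, and table name
are illustrative):
INGEST SET num_formatters 4
INGEST SET num_flushers_per_partition 2
INGEST FROM FILE big_file.del FORMAT DELIMITED INSERT INTO my_table;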


Instructor notes:


Purpose — To provide some information about the internal processing used by the
INGEST command. The visual shows how the data rows are directed to the specific
database partitions by the flushers of the INGEST utility.
Details —
Additional information —
Transition statement — Next we will compare the table types supported by INGEST,
LOAD and IMPORT.


Comparison to IMPORT and LOAD – supported table types

Table type                                                 Import  Load            Ingest
Multidimensional clustering (MDC) table                    yes     yes             yes
Insert Time Clustered (ITC) table                          yes     yes             yes
Materialized query table (MQT) that is maintained by user  yes     yes             yes
Nickname                                                   yes     no (SQL02305N)  yes
Range-clustered table (RCT)                                yes     no              yes
Range-partitioned table                                    yes     yes             yes
Summary table                                              no      yes             yes
Typed table                                                no      no              yes
Untyped (regular) table                                    yes     yes             yes
Updatable view                                             yes     no (SQL02305N)  yes

Figure 10-43. Comparison to IMPORT and LOAD - supported Table types CL4636.0

Notes:
The visual shows the table types supported by the INGEST, IMPORT and LOAD utilities.
Check the DB2 Information Center for a complete list of supported targets for each utility.


Instructor notes:


Purpose — To compare table types supported by the INGEST, LOAD and IMPORT
utilities.
Details —
Additional information —
Transition statement — Next we will compare the column data types supported by
INGEST, LOAD and IMPORT.


Comparison to IMPORT and LOAD – Column types

Column data type                                                      Import  Load  Ingest
Numeric: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, DECFLOAT   yes     yes   yes
Character: CHAR, VARCHAR, NCHAR, NVARCHAR, plus corresponding
  FOR BIT DATA types                                                  yes     yes   yes
Graphic: GRAPHIC, VARGRAPHIC                                          yes     yes   yes
Long types: LONG VARCHAR, LONG VARGRAPHIC                             yes     yes   yes
Date/time: DATE, TIME, TIMESTAMP(p)                                   yes     yes   yes
DB2SECURITYLABEL                                                      yes     yes   yes
LOBs from files: BLOB, CLOB, DBCLOB, NCLOB                            yes     yes   no
Inline LOBs                                                           yes     yes   no
XML from files                                                        yes     yes   no
Inline XML                                                            no      no    no
Distinct type (note 1)                                                yes     yes   yes
Structured type                                                       no      no    no
Reference type                                                        yes     yes   yes

Notes:
1. Supported if based on a supported built-in type.

Figure 10-44. Comparison to IMPORT and LOAD - Column types CL4636.0

Notes:
The visual shows the column data types supported by each utility.
The ingest utility does not support the following column types:
• large object types (LOB, BLOB, CLOB, DBCLOB)
• XML
• structured types


Instructor notes:


Purpose — To compare data types supported by INGEST, LOAD and IMPORT.
Details —
Additional information —
Transition statement — Next we will compare input types supported by each utility.


Comparison to IMPORT and LOAD – Input types and formats

Input type                                 Import              Load  INGEST
cursor                                     no                  yes   no
device                                     no                  yes   no
file                                       yes                 yes   yes
pipe                                       no                  yes   yes
multiple input files, multiple pipes, etc  no                  yes   yes

Input format                               Import              Load  INGEST
ASC (including binary)                     yes, except binary  yes   yes
DEL                                        yes                 yes   yes
IXF                                        yes                 yes   no

Figure 10-45. Comparison to IMPORT and LOAD Input types and formats CL4636.0

Notes:
The visual compares the input types supported by INGEST compared to LOAD and
IMPORT.
Note that INGEST does not support the IXF file formatted input that is supported by LOAD,
IMPORT and EXPORT.


Instructor notes:


Purpose — To compare input types supported by INGEST, compared to LOAD and
IMPORT.
Details —
Additional information —
Transition statement — Next we will discuss when you may want to use INGEST rather
than a LOAD utility.


When to use INGEST rather than LOAD

• Use INGEST when any of the following is true
– You need other applications to update the table while it is
being loaded
– The input file contains fields you want to skip over
– You need to specify an SQL statement other than INSERT
– You need to specify an SQL expression (to construct a column
value from field values)
– You need to recover and continue on when the utility gets a
recoverable error


Figure 10-46. When to use INGEST rather than LOAD CL4636.0

Notes:
The visual suggests several conditions that may influence the use of INGEST rather than
using the LOAD utility.
One main reason for selecting INGEST over LOAD is the reduced lock contention for
INGEST, since it does not force the target table into a read-only or exclusive table lock for
processing.
You may also want to use the options of INGEST like UPDATE or MERGE that are not
supported by a LOAD.


Instructor notes:


Purpose — To discuss some of the reasons for selecting the INGEST utility rather than
using a LOAD utility.
Details —
Additional information —
Transition statement — Next we will discuss some reasons for using LOAD instead of
INGEST.


When to use LOAD rather than INGEST

• Use LOAD when any of the following is true
– You don't need other applications to update the table while it
is being loaded
– You need to load a table that contains XML or LOB columns
– You need to load from cursor or load from a device
– You need to load from a file in IXF format
– You need to load a GENERATED ALWAYS column or
SYSTEM_TIME column with the data specified in the input
file


Figure 10-47. When to use LOAD rather than INGEST CL4636.0

Notes:
The visual suggests some conditions when a LOAD utility may be preferred to using the
INGEST utility.
Some of the reasons relate to unsupported functions for the INGEST utility, like an input file
in IXF format, or unsupported data types like LOB or XML columns.


Instructor notes:


Purpose — To discuss some conditions where the LOAD utility may be selected rather
than using INGEST.
Details —
Additional information —
Transition statement — Next we will discuss using the db2move utility.


db2move utility options: Export/Import/Load

• EXPORT:
  – -tc table creator list
  – -tn table name list
  – -sn schema name list
  – -ts table space list
  – -tf file contains list of tables
  db2move sample export -tc admin -tn EXPL*

  Sample db2move.lst produced by the export:
  !"ADMIN"."EXPLAIN_INSTANCE"!tab1.ixf!tab1.msg!
  !"ADMIN"."EXPLAIN_STATEMENT"!tab2.ixf!tab2.msg!
  !"ADMIN"."EXPLAIN_ARGUMENT"!tab3.ixf!tab3.msg!
  !"ADMIN"."EXPLAIN_OBJECT"!tab4.ixf!tab4.msg!
  !"ADMIN"."EXPLAIN_OPERATOR"!tab5.ixf!tab5.msg!
  !"ADMIN"."EXPLAIN_PREDICATE"!tab6.ixf!tab6.msg!
  !"ADMIN"."EXPLAIN_STREAM"!tab7.ixf!tab7.msg!

• IMPORT:
  – -io defaults to REPLACE_CREATE
  – CREATE, INSERT, INSERT_UPDATE, REPLACE
  db2move sample import -io replace

• LOAD:
  – -lo defaults to INSERT
  – REPLACE
  db2move sample load -lo insert

Figure 10-48. db2move utility options: Export/Import/Load CL4636.0

Notes:
There db2move command provides a number of options that define the filtering that will be
used to select the catalog table entries that should be exporting. The list could be by table
name, table creator, table schema, table space or be read from an external file. These
options can be full name or generic and can be used together.
The example is an export for all tables with a creator name of admin and a table name that
starts with EXPL. This could be used to export a set of Explain tables for a user.
For using the db2move IMPORT mode, the -io option sets the mode used for all Import
utilities. The default is REPLACE_CREATE.
When using the db2move LOAD mode, the -lo option can be used to specify whether to run
the Load utilities in Insert or in Replace mode. The default mode is Insert.
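For example, a schema filter could be combined with the LOAD replace mode in a
two-step export and load (the database and schema names are illustrative):
db2move sample export -sn payroll
db2move sample load -lo replace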


Instructor notes:


Purpose — This explains the db2move command options provided to control the selection
of tables for Export and the modes for running Imports and LOAD. It is important to note
that there is no way to set other options that might be desired, like using COPY YES for
loads.
Details —
Additional information —
Transition statement — Let's look at some additional considerations for using db2move.


db2move considerations for Export/Import/Load options
• The target system can be a different platform
– Use binary mode when transferring the IXF files

• LOAD option uses the NONRECOVERABLE option:
– A database backup or table space backups should be created after db2move
processing completes to prevent loss of data if the database needs to be
recovered.
– The affected table spaces will not be in backup pending status

• For IMPORT, if the tables do not already exist, there is no control over
where the tables are created in the target database

• EXPORT will put all of the output IXF files and message files
in a single location, therefore moving a large database might require
several db2move operations with different filter options, each run from
a different file system or directory


Figure 10-49. db2move considerations for Export/Import/Load options CL4636.0

Notes:
The db2move utility can be used to help move the tables in one database to a target
database that is on a different platform, like moving from a Windows DB2 system to an AIX
system. The IXF files produced by the db2move EXPORT run should be transferred to the
target system in binary mode.
All LOAD utilities that are run internally by db2move use the NONRECOVERABLE option.
To maintain database recoverability, a database backup or set of table space backups
should be created after db2move processing completes to prevent loss of data if the
database needs to be recovered. The table spaces holding the tables that are loaded by
db2move processing will not be in Backup Pending status.
If a db2move is run with IMPORT and the target tables do not already exist, the tables will
be created, but there is no way to specify which table space to use. The IXF file does not
indicate which table space the original table was in. You can use db2look to get the DDL for
the tables and create them before the db2move command is run with the LOAD or
IMPORT option.
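A sketch of that db2look step, assuming the source database is SAMPLE and the tables
are in schema ADMIN (the output file name is illustrative):
db2look -d sample -e -z admin -o tables.ddl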


The db2move EXPORT function will put all of the output IXF files and the associated
message files in a single location, the current path where the db2move command is
started. Moving a large database might therefore require several db2move operations with
different filter options, each run using a different file system or directory.


Instructor notes:
Purpose — This explains that although it is easy to use db2move to run a series of Export,
Import, or Load functions, for some projects the loss of control over the detailed options
for those utilities might cause problems that need to be planned for.
Details —
Additional information —
Transition statement — Next we will look at the db2move COPY option.



db2move COPY option


Copy one or more schemas between DB2 databases
Uses a -co option to specify:
• Target Database:
– "TARGET_DB <db name> [USER <userid> USING <password>]"
• MODE:
– DDL_AND_LOAD – Creates all supported objects from the source schema,
and populates the tables with the source table data. Default
– DDL_ONLY – Creates all supported objects from the source schema, but does
not repopulate the tables.
– LOAD_ONLY – Loads all specified tables from the source database to the
target database. The tables must already exist on the target.
• SCHEMA_MAP – Allows user to rename schema when copying to target
• TABLESPACE_MAP – Table space name mappings to be used
• Load Utility option – COPY NO or Nonrecoverable
• Owner – Change the owner of each new object created in the target
schema

Figure 10-50. db2move COPY option CL4636.0

Notes:
The COPY option of db2move duplicates schemas into a target database. Use the -sn
option to specify one or more schemas. See the -co option for COPY specific options. Use
the -tn or -tf option to filter tables in LOAD_ONLY mode.
-co option: When the db2move action is COPY, the following -co follow-on options will be
available:
"TARGET_DB <db name> [USER <userid> USING <password>]"
Allows the user to specify the name of the target database and the user/password. (The
source database user/password can be specified using the existing -p and -u options). The
USER/USING clause is optional. If USER specifies a userid, then the password must either
be supplied following the USING clause, or if it's not specified, then db2move will prompt
for the password information. The reason for prompting is for security reasons discussed
below. TARGET_DB is a mandatory option for the COPY action. The TARGET_DB cannot
be the same as the source database. The ADMIN_COPY_SCHEMA procedure can be
used for copying schemas within the same database. The COPY action requires inputting
at least one schema (-sn) or one table (-tn or -tf).


Running multiple db2move commands to copy schemas from one database to another will
result in deadlocks. Only one db2move command should be issued at a time. Changes to
tables in the source schema during copy processing might mean that the data in the target
schema is not identical following a copy.

"MODE"
DDL_AND_LOAD – Creates all supported objects from the source schema, and populates
the tables with the source table data. This is the default option.
DDL_ONLY – Creates all supported objects from the source schema, but does not
repopulate the tables.
LOAD_ONLY – Loads all specified tables from the source database to the target database.
The tables must already exist on the target.
This is an optional option that is only used with the COPY action.

"SCHEMA_MAP"
Allows user to rename schema when copying to target. Provides a list of the source-target
schema mapping, separated by commas, surrounded by brackets. For example,
schema_map ((s1, t1), (s2, t2)). This would mean objects from schema s1 will be copied to
schema t1 on the target; objects from schema s2 will be copied to schema t2 on the target.
The default, and recommended, target schema name is the source schema name. The
reason for this is that db2move will not attempt to modify the schema for any qualified
objects within object bodies. Therefore, using a different target schema name might lead to
problems if there are qualified objects within the object body.
For example:
create view FOO.v1 as 'select c1 from FOO.t1'
In this case, copy of schema FOO to BAR, v1 will be regenerated as:
create view BAR.v1 as 'select c1 from FOO.t1'
This will either fail since schema FOO does not exist on the target database, or have an
unexpected result due to FOO being different than BAR. Maintaining the same schema
name as the source will avoid these issues. If there are cross dependencies between
schemas, all inter-dependant schemas must be copied or there might be errors copying the
objects with the cross dependencies.
For example:
create view FOO.v1 as 'select c1 from BAR.t1'
In this case, the copy of v1 will either fail if BAR is not copied as well, or have an
unexpected result if BAR on the target is different than BAR from the source. db2move will
not attempt to detect cross schema dependencies.
This is an optional option that is only used with the COPY action.


Uempty "NONRECOVERABLE"
This option allows the user to override the default behavior of the Load utility, which is
‘COPY NO’. In a recoverable database, ‘COPY NO’ puts the tablespaces in a ‘backup
pending’ status, so the user will be forced to take backups of each table space that was
loaded into. When specifying this NONRECOVERABLE keyword, the user will not be
forced to take backups of the table spaces immediately. It is, however, highly
recommended that the backups be taken as soon as possible to ensure the newly created
tables will be properly recoverable. This is an optional option available to the COPY action.

"OWNER"
Allows the user to change the owner of each new object created in the target schema after
a successful COPY. The default owner of the target objects will be the connected user; if
this option is specified, ownership will be transferred to the new owner. This is an optional
option available to the COPY action.

"TABLESPACE_MAP"
The user can specify table space name mappings to be used instead of the table spaces
from the source system during a copy. This will be an array of table space mappings
surrounded by parentheses. For example, tablespace_map ((TS1, TS2),(TS3, TS4)). This
would mean that all objects from table space TS1 will be copied into table space TS2 on
the target database and objects from table space TS3 will be copied into table space TS4
on the target. In the case of ((T1, T2),(T2, T3)), all objects found in T1 on the source
database will be recreated in T2 on the target database and any objects found in T2 on the
source database will be recreated in T3 on the target database. The default is to use the
same table space name as from the source, in which case, the input mapping for this table
space is not necessary. If the specified table space does not exist, the copy of the objects
using that table space will fail and be logged in the error file.
The user also has the option of using the SYS_ANY keyword to indicate that the target
table space should be chosen using the default table space selection algorithm. In this
case, db2move will be able to choose any available table space to be used as the target.
The SYS_ANY keyword can be used for all table spaces, example: tablespace_map
SYS_ANY. In addition, the user can specify specific mappings for some table spaces, and
the default table space selection algorithm for the remaining. For example,
tablespace_map ((TS1, TS2),(TS3, TS4), SYS_ANY). This indicates that table space TS1
is mapped to TS2, TS3 is mapped to TS4, but the remaining table spaces will be using a
default table space target. The SYS_ANY keyword is being used since it's not possible to
have a table space starting with SYS.
This is an optional option available to the COPY action.


Instructor notes:
Purpose — This describes the COPY option to copy one or more schemas of objects into
a database, with or without data. The Load utility is invoked in either NONRECOVERABLE
or COPY NO mode, so a backup can be used to make sure the new objects are
recoverable. This option became available with DB2 9.1.
Details —
Additional information —
Transition statement — Next, we will look at some examples of using the db2move COPY
option.



db2move COPY schema examples


• To duplicate schema schema1 from source database dbsrc
to target database dbtgt, issue:
db2move dbsrc COPY -sn schema1 -co TARGET_DB dbtgt
USER myuser1 USING mypass1
• To duplicate schema schema1 from source database dbsrc
to target database dbtgt, rename the schema to newschema1 on
the target, and map source table space ts1 to ts2 on the target, issue:
db2move dbsrc COPY -sn schema1 -co TARGET_DB dbtgt
USER myuser1 USING mypass1
SCHEMA_MAP ((schema1,newschema1))
TABLESPACE_MAP ((ts1,ts2), SYS_ANY)

Output files generated: COPYSCHEMA.msg, COPYSCHEMA.err,
LOADTABLE.msg, LOADTABLE.err. These files are timestamped.


Figure 10-51. db2move COPY schema examples CL4636.0

Notes:
• To duplicate schema schema1 from source database dbsrc to target database dbtgt,
issue:
db2move dbsrc COPY -sn schema1 -co TARGET_DB dbtgt USER myuser1 USING
mypass1
• To duplicate schema schema1 from source database dbsrc to target database dbtgt,
rename the schema to newschema1 on the target, and map source table space ts1 to
ts2 on the target, issue:
db2move dbsrc COPY -sn schema1 -co TARGET_DB dbtgt
USER myuser1 USING mypass1
SCHEMA_MAP ((schema1,newschema1))
TABLESPACE_MAP ((ts1,ts2), SYS_ANY)


Instructor notes:
Purpose — This shows two simple examples of using the COPY option of db2move.
Details —
Additional information —
Transition statement — Next we will discuss using the ADMIN_COPY_SCHEMA stored
procedure.


ADMIN_COPY_SCHEMA procedure: Copy a
specific schema and its objects in same database
>>-ADMIN_COPY_SCHEMA--(--sourceschema--,--targetschema--,------->

>--copymode--,--objectowner--,--sourcetbsp--,--targettbsp--,---->

>--errortabschema--,--errortab--)------------------------------><

Copymode options: Specifies the mode of copy operation.


• 'DDL': Create empty copies of all supported objects from the source schema.
• 'COPY': Create empty copies of all objects from the source schema, then load each
target schema table with data. Load is done in 'NONRECOVERABLE' mode.
– A backup should be taken after calling the ADMIN_COPY_SCHEMA
• 'COPYNO': Create empty copies of all objects from the source schema, then load
each target schema table with data. Load is done in 'COPYNO' mode.
CALL SYSPROC.ADMIN_COPY_SCHEMA('SOURCE_SCHEMA', 'TARGET_SCHEMA',
'COPY', NULL, 'SOURCETS1 , SOURCETS2', 'TARGETTS1, TARGETTS2,
SYS_ANY', 'ERRORSCHEMA', 'ERRORNAME')


Figure 10-52. ADMIN_COPY_SCHEMA procedure: Copy a specific schema and its objects in same database CL4636.0

Notes:
ADMIN_COPY_SCHEMA procedure: Copy a specific schema and its objects
The ADMIN_COPY_SCHEMA procedure is used to copy a specific schema and all objects
contained in it. The new target schema objects will be created using the same object
names as the objects in the source schema, but with the target schema qualifier. The
ADMIN_COPY_SCHEMA procedure can be used to copy tables with or without the data of
the original tables.
Syntax:
>>-ADMIN_COPY_SCHEMA--(--sourceschema--,--targetschema--,------->

>--copymode--,--objectowner--,--sourcetbsp--,--targettbsp--,---->

>--errortabschema--,--errortab--)------------------------------><
The schema is SYSPROC.


Procedure parameters:
sourceschema – An input argument of type VARCHAR(128) that specifies the name of the
schema whose objects are being copied. The name is case-sensitive.
targetschema – An input argument of type VARCHAR(128) that specifies a unique
schema name to create the copied objects into. The name is case-sensitive. If the schema
name already exists, the procedure call will fail and return a message indicating that the
schema must be removed prior to invoking the procedure.
copymode – An input argument of type VARCHAR(128) that specifies the mode of copy
operation. Valid options are:
• 'DDL': Create empty copies of all supported objects from the source schema.
• 'COPY': Create empty copies of all objects from the source schema, then load each
target schema table with data. Load is done in 'NONRECOVERABLE' mode. A backup
must be taken after calling the ADMIN_COPY_SCHEMA, otherwise the copied tables
will be inaccessible following recovery.
• 'COPYNO': Create empty copies of all objects from the source schema, then load each
target schema table with data. Load is done in 'COPYNO' mode.

Note

If copymode is 'COPY' or 'COPYNO', a fully qualified filename, for example 'COPYNO
/home/mckeough/loadoutput', can be specified along with the copymode parameter
value. When a path is passed in, load messages will be logged to the file indicated. The
file name must be writable by the user ID used for fenced routine invocations on the
instance. If no path is specified, then load message files will be discarded (default
behavior).

objectowner – An input argument of type VARCHAR(128) that specifies the authorization
ID to be used as the owner of the copied objects. If NULL, then the owner will be the
authorization ID of the user performing the copy operation.
sourcetbsp – An input argument of type CLOB(2 M) that specifies a list of source table
spaces for the copy, separated by commas. Delimited table space names are supported.
For each table being created, any table space found in this list and in the table's definition
will be converted to the nth entry in the targettbsp list. If NULL is specified for this
parameter, new objects will be created using the same table spaces as the source objects
use.
targettbsp – An input argument of type CLOB(2 M) that specifies a list of target table
spaces for the copy, separated by commas. Delimited table space names are supported.
One table space must be specified for each entry in the sourcetbsp list of table spaces. The
nth table space in the sourcetbsp list will be mapped to the nth table space in the targettbsp
list during DDL replay. It is possible to specify 'SYS_ANY' as the final table space (an


additional table space name, that does not correspond to any name in the source list).
When 'SYS_ANY' is encountered, the default table space selection algorithm will be used
when creating objects (refer to the IN tablespace-name1 option of the CREATE TABLE
statement documentation for further information on the selection algorithm). If NULL is
specified for this parameter, new objects will be created using the same table spaces as
the source objects use.
errortabschema – An input and output argument of type VARCHAR(128) that specifies the
schema name of a table containing error information for objects that could not be copied.
This table is created for the user by the ADMIN_COPY_SCHEMA procedure in the
SYSTOOLSPACE table space. If no errors occurred, then this parameter is NULL on
output.
errortab – An input and output argument of type VARCHAR(128) that specifies the name
of a table containing error information for objects that could not be copied. This table is
created for the user by the ADMIN_COPY_SCHEMA procedure in the SYSTOOLSPACE
table space. This table is owned by the user ID that invoked the procedure. If no errors
occurred, then this parameter is NULL on output. If the table cannot be created or already
exists, the procedure operation fails and an error message is returned. The table must be
cleaned up by the user following any call to the ADMIN_COPY_SCHEMA procedure; that
is, the table must be dropped in order to reclaim the space it is consuming in
SYSTOOLSPACE.
Table 322: ADMIN_COPY_SCHEMA errortab format
• OBJECT_SCHEMA (VARCHAR(128)) – Schema name of the object for which the copy
command failed.
• OBJECT_NAME (VARCHAR(128)) – Name of the object for which the copy command
failed.
• OBJECT_TYPE (VARCHAR(30)) – Type of object.
• SQLCODE (INTEGER) – The error SQLCODE.
• SQLSTATE (CHAR(5)) – The error SQLSTATE.
• ERROR_TIMESTAMP (TIMESTAMP) – Time of failure for the operation that failed.
• STATEMENT (CLOB(2 M)) – DDL for the failing object. If the failure occurred when data
was being loaded into a target table, this field contains text corresponding to the load
command that failed.
• DIAGTEXT (CLOB(2 K)) – Error message text for the failed operation.
Authorization – In order for the schema copy to be successful, the user ID calling this
procedure must have the appropriate object creation authorities including both the authority
to select from the source tables, and the authority to perform a load. If a table in the source
schema is protected by label based access control (LBAC), the user ID must have LBAC


credentials that allow creating that same protection on the target table. If copying with data,
the user ID must also have LBAC credentials that allow both reading the data from the
source table and writing that data to the target table.
EXECUTE privilege on the ADMIN_COPY_SCHEMA procedure is also needed.

Example

CALL SYSPROC.ADMIN_COPY_SCHEMA('SOURCE_SCHEMA', 'TARGET_SCHEMA',
'COPY', NULL, 'SOURCETS1, SOURCETS2', 'TARGETTS1, TARGETTS2,
SYS_ANY', 'ERRORSCHEMA', 'ERRORNAME')
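If any objects could not be copied, the error table named on the call can be queried and
must then be dropped by the user to reclaim its space in SYSTOOLSPACE. A minimal
sketch, using the ERRORSCHEMA.ERRORNAME names from the example above:

SELECT OBJECT_SCHEMA, OBJECT_NAME, SQLCODE, DIAGTEXT
  FROM ERRORSCHEMA.ERRORNAME

DROP TABLE ERRORSCHEMA.ERRORNAME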

Restrictions:
• Only DDL copymode is supported for HADR databases.
• XML with COPY or COPY NO is not supported.
• Using the ADMIN_COPY_SCHEMA procedure with the COPYNO option places the
table spaces in which the target database object resides in Backup Pending state. After
the load operation completes, target schema tables are in Set Integrity Pending state,
and the ADMIN_COPY_SCHEMA procedure issues a SET INTEGRITY statement to
get the tables out of this state. Because the table spaces are already in Backup
Pending state, the SET INTEGRITY statement fails.


Instructor notes:


Purpose — This describes the ADMIN_COPY_SCHEMA procedure to copy a schema of
objects into a new schema, with or without data. The Load utility is invoked in either
NONRECOVERABLE or COPY NO mode, so a backup can be used to make sure the new
objects are recoverable.
Details —
Additional information —
Transition statement — Next we will discuss using the ADMIN_MOVE_TABLE
procedure.


Considerations for making changes to DB2 tables


• The traditional methods used to make some changes to DB2 tables
involved moving the data to a new table or performing an offline
reorganization that significantly impacts application availability.
• For example:
– Moving a table to a new set of table spaces (data, index, long)
– Implementing data compression or generating a new compression dictionary
– Adding or removing columns or changing column definitions
– Add or change columns used for multidimensional clustering
– Add or change columns used for range partitioning
– Change columns used for distribution keys for Database partitioning

• Other considerations for moving data to a new table definition:


– Views that reference the source table
– Triggers defined on the source table
– Access privileges granted on the source table
– Catalog statistics need to be collected for the target table and indexes


Figure 10-53. Considerations for making changes to DB2 tables CL4636.0

Notes:
In order to make some changes to a table, the traditional approach would involve creating a
new table and moving the existing data to the new table. If the table is accessed by
applications which need to be available almost all of the time, the change to the table would
need to be scheduled to be performed offline. If the table is large, the amount of time the
table would not be available would be extended. Some of the common changes to a table
that would often involve moving data to a new table:
• Changing the table spaces used to store a table and its indexes
• Implementation of data compression could be performed without creating a new table,
but the reorganization to build the dictionary and compress the table needs to run offline
• Adding columns, removing columns or changing column definitions in a table
• Either converting a standard table to a MDC table or changing the columns specified in
the ORGANIZE BY clause
• Converting a standard table to include range partitions or changing the range definitions
for a range-partitioned table


• Changing the columns used to distribute the rows evenly in a DPF-partitioned database
system
Moving from one table to another could also impact the views and triggers that reference
the table. The access permissions for the new table would need to match the original
granted privileges. It would also be necessary to collect catalog statistics for the new table.


Instructor notes:
Purpose — To review the considerations for making major changes to a table. Many of
these changes require creating a new table and dropping the old table, which impacts any
application that need to access the table.
Details —
Additional information —
Transition statement — Next we will see how the ADMIN_MOVE_TABLE procedure can
help to make these changes to tables with high availability requirements.



Online Table Move stored procedure


• The ADMIN_MOVE_TABLE procedure introduced in DB2 9.7
is designed to move data from a source table to a target table
with minimal impact to application access
– Changes that can be made using ADMIN_MOVE_TABLE:
• New Data, Index or Long table spaces, which could have a different
page size, extent size or type of table space management (like
moving from SMS to Automatic Storage)
• Data compression could be implemented during the move
• MDC clustering can be added or changed
• Range partitions can be added or changed
• Distribution keys can be changed for database partitioned tables
• Columns can be added, removed or changed
– Multiple phased processing allows write access to the source table
except for a short outage required to swap access to the target table


Figure 10-54. Online Table Move stored procedure CL4636.0

Notes:
Beginning with DB2 9.7, the ADMIN_MOVE_TABLE stored procedure can be used to
move the data in a table to a new table object of the same name (but with possibly different
storage characteristics), while the data remains online and available for access. You can
also generate a new optimal compression dictionary when a table is moved.
This feature reduces your total cost of ownership (TCO) and complexity by automating the
process of moving table data to a new table object while allowing the data to remain online
for select, insert, update, and delete access.
The ADMIN_MOVE_TABLE procedure creates a shadow copy of the table. During the
Copy phase, insert, update, and delete operations against the original table are captured
using triggers and placed into a staging table. After the Copy phase has completed, the
data change operations that were captured in the staging table are replayed to the shadow
copy. The copy of the table includes all table options, indexes, and views. The procedure
then briefly takes the table offline to swap the object names.


Instructor notes:
Purpose — To introduce the ADMIN_MOVE_TABLE procedure that can be used to make
major changes to a table, like those previously discussed, while leaving the table
accessible during most of the processing time.
Details —
Additional information —
Transition statement — Next we will cover the basic concepts involved with the
ADMIN_MOVE_TABLE procedure.



ADMIN_MOVE_TABLE: Processing phases


(Slide diagram) The four phases of an online table move:
1. INIT phase – create the triggers, the target table, and the staging table; progress is
tracked in the SYSTOOLS.ADMIN_MOVE_TABLE table (columns tabschema, tabname,
key, value).
2. COPY phase – copy the rows of the source table (columns c1, c2, …, cn) to the target
table.
3. REPLAY phase – rows with keys present in the staging table are re-copied from the
source table; keys of rows changed by the online workload (INSERT, UPDATE,
DELETE) are captured via triggers into the staging table.
4. SWAP phase – rename the target table to replace the source table.

Figure 10-55. ADMIN_MOVE_TABLE: Processing phases CL4636.0

Notes:
The ADMIN_MOVE_TABLE stored procedure uses a multi-phase approach to moving a
table. The procedure can be requested to automatically process all phases based on a
single call or an administrator can use a set of calls to control when each phase begins.
The basic phases used are:
• The INIT phase performs setup work like creating a staging table to record any
changes to the source table that are made during the move. It can also create a target
table for the move. The progress of the procedure is reflected in the
SYSTOOLS.ADMIN_MOVE_TABLE control table. A set of triggers are created to
capture the changes in the source table and store information in the staging table.
• The COPY phase, copies all of the data from the source table to the target table.
• The REPLAY phase updates the target table with changed data from the source table,
based on the staging table contents.
• The SWAP phase finalizes the move by renaming the target table to replace the source
table name. This phase does require exclusive control of the source table.


Instructor notes:
Purpose — To explain the basic steps used by the ADMIN_MOVE_TABLE procedure. This
is just an overview. Each step will be covered in more detail, so do not spend too much time
here.
Details —
Additional information —
Transition statement — Next we will look at the syntax used for the ADMIN_MOVE_TABLE
procedure.



ADMIN_MOVE_TABLE procedure methods


• There are two methods of calling ADMIN_MOVE_TABLE
One method specifies how to define the target table.
>>-ADMIN_MOVE_TABLE--(--tabschema--,--tabname--,---------------->

>--data_tbsp--,--index_tbsp--,--lob_tbsp--,--organize_by_clause--,--->

.-,-------.
V |
>--partkey_cols--,--data_part--,--coldef--,----options-+--,----->

>--operation--)------------------------------------------------><

The second method allows a predefined table to be specified as the target for the move.
>>-ADMIN_MOVE_TABLE--(--tabschema--,--tabname--,---------------->

.-,-------.
V |
>--target_tabname--,----options-+--,--operation--)-------------><


Figure 10-56. ADMIN_MOVE_TABLE procedure methods CL4636.0

Notes:
There are two equally valid methods to invoke ADMIN_MOVE_TABLE:
• The first method allows you to modify only certain parts of the table definition for the
target table. For instance, if you had a table definition that is quite large (several KB),
and all you want to do is modify the table spaces for the table, you can do so without
having to determine the entire CREATE TABLE statement needed to recreate the
source table. All you need to do is to fill out the data_tbsp, index_tbsp, and lob_tbsp
parameters, leaving the other optional parameters blank.
• The second method provides you with more control and flexibility by allowing you to
create the target table beforehand, rather than having the stored procedure create the
target table. This enables you to create a target table that would not be possible using
the first method.
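As an illustration of the two call formats, here is a hedged sketch; the schema, table,
table space, and target table names are hypothetical:

-- Method 1: let the procedure create the target table in new table spaces
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1', 'T1',
     'DATA_TS2', 'INDEX_TS2', 'LOB_TS2', '', '', '', '', '', 'MOVE')

-- Method 2: move into a predefined, empty target table
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1', 'T1', 'T1_TARGET', '', 'MOVE')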


Instructor notes:
Purpose — To show the two methods that can be used to run the ADMIN_MOVE_TABLE
procedure.
Details —
Additional information —
Transition statement — Next we will discuss the parameters used to call the stored
procedure.



ADMIN_MOVE_TABLE call parameters (1 of 2)


• Tabschema – Name of the schema which contains the table to be moved.
• Tabname – The name of the table to be moved.
• Data_tbsp – Specifies the new data table space for the target table. If a value is
provided, the index_tbsp and lob_tbsp parameters are required.
• Index_tbsp – Specifies the new index table space for the target table.
• LOB_tbsp – Specifies the new LOB table space for the target table.
• organize_by_clause – This input parameter can be used to specify an ORGANIZE
BY clause for the table. If the value provided does not begin with 'ORGANIZE BY'
then it provides the multi-dimensional clustering (MDC) specification for the target
table. Can also be used to convert from row-organized to column-organized.
Example: 'C1, C4, (C3,C1), C2'
• Partkey_cols – Provides the distribution key columns specification for the target
table and has the same format as the DISTRIBUTE BY HASH clause of the
CREATE TABLE statement.
Example: 'C1, C3'
• Data_part – Provides the data partitioning specification for the target table and has
the same format as the PARTITION BY RANGE clause of the CREATE TABLE
statement.
Example: '(C1) (STARTING FROM (1) EXCLUSIVE ENDING AT (1000) EVERY
(100))'


Figure 10-57. ADMIN_MOVE_TABLE call parameters (1 of 2) CL4636.0

Notes:
Procedure parameters for ADMIN_MOVE_TABLE
tabschema – This input parameter specifies the name of the schema which contains the
table to be moved. This parameter is case sensitive and has a data type of
VARCHAR(128).
tabname – This input parameter specifies the name of the table to be moved. This
parameter is case sensitive and has a data type of VARCHAR(128).
data_tbsp – This input parameter specifies the new data table space for the target table. If
a value is provided, the index_tbsp and lob_tbsp parameters are required. If a value is not
provided, the data table space of the source table is used. This parameter is case sensitive
and has a data type of VARCHAR(128). This parameter can be NULL or the empty string.
index_tbsp – This input parameter specifies the new index table space for the target table.
If a value is provided, the data_tbsp and lob_tbsp parameters are required. If a value is not
provided, the index table space of the source table is used. This parameter is case


sensitive and has a data type of VARCHAR(128). This parameter can be NULL or the
empty string.
lob_tbsp – This input parameter specifies the new LOB table space for the target table. If a
value is provided, the data_tbsp and index_tbsp parameters are required. If a value is not
provided, the LOB table space of the source table is used. This parameter is case sensitive
and has a data type of VARCHAR(128). This parameter can be NULL or the empty string.
organize-by-clause – This input parameter can be used to specify an ORGANIZE BY
clause for the table. If the value provided does not begin with 'ORGANIZE BY' then it
provides the multi-dimensional clustering (MDC) specification for the target table. The
values are entered as a comma separated list of the columns used to cluster data in the
target table along multiple dimensions. If a value of NULL or "-" is given, the ORGANIZE
BY clause is not used. If an empty string or a single blank is given, the procedure checks
whether there is an MDC or ITC specification on the source table, and uses that
specification if located. If the argument begins with 'ORGANIZE BY' it can be used to
specify any option related to the ORGANIZE BY clause of a CREATE TABLE statement.
This parameter has a data type of VARCHAR(32672) and has the same format as the
ORGANIZE BY DIMENSIONS clause of the CREATE TABLE statement. This parameter
can be NULL, the empty string, or a single blank.
Example 1: 'C1, C4, (C3,C1), C2'

Example 2: ORGANIZE BY INSERT TIME

partkey_cols – This input parameter provides the partitioning key columns specification
for the target table. The values are entered as a comma separated list of the key columns
that specify how the data is distributed across multiple database partitions. If a value of
NULL or "-" is given, the PARTITIONING KEY clause is not used. If an empty string or a
single blank is given, the procedure checks whether there is a partitioning key columns
specification on the source table, and uses that specification if located. This parameter has
a data type of VARCHAR(32672) and has the same format as the DISTRIBUTE BY HASH
clause of the CREATE TABLE statement.
Example: 'C1, C3'
data_part – This input parameter provides the data partitioning specification for the target
table. This statement defines how to divide table data across multiple storage objects
(called data partitions), according to the values in one or more of the table columns. If a
value of NULL or "-" is given, the PARTITION BY RANGE clause is not used. If an empty
string or a single blank is given, the procedure checks whether there is a data partition
scheme on the source table, and uses that information (including partition name) if located.
This parameter has a data type of VARCHAR(32672) and has the same format as the
PARTITION BY RANGE clause of the CREATE TABLE statement.
Example:


'(C1) (STARTING FROM (1) EXCLUSIVE ENDING AT (1000) EVERY (100))'


Instructor notes:
Purpose — To explain the various call parameters that can be used for the
ADMIN_MOVE_TABLE procedure.
Details —
Additional information —
Transition statement — Next we cover the remaining call parameters.



ADMIN_MOVE_TABLE call parameters (2 of 2)


• Coldef – Specifies a new column definition for the target table:
– Allows you to change the column types as long as they are compatible;
however, the column names must remain the same.
– Also provides the ability to add new columns and drop existing columns.
– To add a column, the column must be nullable or have a default value specified.
– To drop a column, the column cannot exist in any existing indexes on the
source table.
Example: 'C1 INT, C2 INT DEFAULT 0'

• Target_tabname – Provides the name of an existing table to use as the target table
• Options – Defines any options used by the stored procedure.
• Operation – Specifies which operation or phase the stored procedure is to execute:
– The MOVE operation executes all the operations at one time
– Each phase can be called step by step. But they must be called in the following
order: INIT, COPY, REPLAY, VERIFY (optional), and SWAP.
– REDIRECT sends changes directly to target, REVERT directs source changes
to staging table


Figure 10-58. ADMIN_MOVE_TABLE call parameters (2 of 2) CL4636.0

Notes:
We continue here with additional call parameters:
coldef: This input parameter specifies a new column definition for the target table, allowing
you to change the column types as long as they are compatible; however, the column
names must remain the same. This also provides the ability to add new columns and drop
existing columns. When adding a column, it must be defined as either nullable or have a
default value set. Also, a column can only be dropped if there is a unique or primary index
on the table and the column to be dropped is not a part of that unique or primary index. This
parameter has a data type of VARCHAR(32672). This parameter can be NULL or the
empty string.
Example: 'C1 INT, C2 INT DEFAULT 0'
target_tabname: This input parameter provides the name of an existing table to use as the
target table during the move.
The following changes can be made to the target table being passed in:
• The data, index and LOB table spaces can be changed


• The multi-dimensional clustering (MDC) specification can be added or changed


• The partitioning key columns specification can be added or changed
• The data partitioning specification can be added or changed
• Data compression can be added or removed
• A new column definition can be specified; however, the same restrictions as when
specifying the coldef parameter apply here
The following restrictions apply to the named table:
• The table must exist in the same schema as the source table
• The table must be empty
• No typed tables, materialized query tables (MQT), staging tables, remote tables or
clustered tables are permitted
If this parameter is set to NULL or the empty string, the stored procedure uses the same
definition as the source table. This parameter is case sensitive and has a data type of
VARCHAR(128).
We will be covering the options and operations in more detail.


Instructor notes:


Purpose — To cover additional call parameters. The target table name is used for the
second call syntax. We will be covering the many options and the operation types in detail
later.
Details —
Additional information —
Transition statement — Next we will look at the INIT phase of processing in more detail.


ADMIN_MOVE_TABLE: INIT phase


• The INIT phase ensures that a table move can take place, and
initializes all of the objects:
– The utility table SYSTOOLS.ADMIN_MOVE_TABLE is used to track options
and status for moving the source table.
– If a target table is not specified, the target table will be created based on the
table changes specified in the call parameters
– A staging table is created to record information about any changes made to the
source table.
• The staging table is populated by a set of triggers that are defined on the source
table to catch rows updated, inserted and deleted.
– The triggers are created on the source table and are used to capture all
changes that are occurring on the data.
For example:
CREATE TRIGGER MDC."HIST1AATOhxd" AFTER DELETE ON MDC.HIST1
REFERENCING OLD AS i FOR EACH ROW BEGIN ATOMIC
IF NOT EXISTS(SELECT * FROM MDC."HIST1AATOhxs" AS s WHERE
s.TELLER_ID=i.TELLER_ID)
THEN
INSERT INTO MDC."HIST1AATOhxs"(TELLER_ID)VALUES(i.TELLER_ID);--
END IF;--
END;


Figure 10-59. ADMIN_MOVE_TABLE: INIT phase CL4636.0

Notes:
The INIT phase ensures that a table move can take place, and initializes all of the objects
used during processing, including the target table, the staging tables and a set of triggers.
The utility table SYSTOOLS.ADMIN_MOVE_TABLE is used to track options and status for
moving the source table. If a target table is not specified, the target table will be created
based on the table changes specified in the call parameters.
The staging table is created to record information about any changes made to the source
table. The staging table is populated by a set of triggers that are defined on the source
table to catch rows updated, inserted and deleted. The triggers are created on the source
table and are used to record the changes that are occurring on the data.
This is an example of a trigger created by ADMIN_MOVE_TABLE to catch any rows
deleted from a source table named MDC.HIST1 and adds information to the staging table
defined for this move.
CREATE TRIGGER MDC."HIST1AATOhxd" AFTER DELETE ON MDC.HIST1
REFERENCING OLD AS i FOR EACH ROW BEGIN ATOMIC


IF NOT EXISTS(SELECT * FROM MDC."HIST1AATOhxs" AS s WHERE
s.TELLER_ID=i.TELLER_ID)
THEN
INSERT INTO MDC."HIST1AATOhxs"(TELLER_ID)VALUES(i.TELLER_ID);--
END IF;--
END;
The trigger does not store a complete data row from the source table into the staging table,
it only adds a single column of data, TELLER_ID in this case, indicating that a specific
TELLER_ID has changed. The index on the TELLER_ID column was selected in this case
because that column had the highest cardinality of any of the table's indexes. If the source
table has a unique index the entry created in the staging table would point to a single row.


Instructor notes:
Purpose — To provide more information about the processing performed in the INIT stage
for the online table move procedure.
Details —
Additional information —
Transition statement — Next we will discuss the COPY phase of processing.



ADMIN_MOVE_TABLE: Copy phase


• The COPY phase creates a copy of the source table, as of the time of
the beginning of the COPY phase, and inserts that data into the target
table.
• Any updates/deletes/inserts occurring on this source table during this
time are captured and stored in the staging table.
• Indexes will be created on the target table to match the source table
indexes:
– By default, indexes are created at the end of the COPY phase:
• This option reduces overhead of processing each insert into the target table
• The target table would be scanned once per index as the indexes are created
– The COPY_WITH_INDEXES option creates indexes before copying the source
table data
• The COPY_USE_LOAD option causes the Copy phase to perform a
non-recoverable db2Load API to copy the data from the source table to
the target table:
– This reduces log space required and may improve copy performance
– When using the COPY_USE_LOAD option, it is necessary to perform a backup
of the target table spaces before the SWAP phase in order to ensure
recoverability


Figure 10-60. ADMIN_MOVE_TABLE: Copy phase CL4636.0

Notes:
The COPY phase creates a copy of the source table, as of the time of the beginning of the
COPY phase, and inserts that data into the target table. Any updates, deletes or inserts
occurring on this source table during this time are captured and stored in the staging table.
A set of indexes will be created on the target table to match all the source table indexes. By
default these indexes are created at the end of the COPY phase when there is a complete
set of data. One of the options available for the ADMIN_MOVE_TABLE procedure,
COPY_WITH_INDEXES, causes the indexes to be created before the source table is
copied. Having all of the indexes created before populating the target table would increase
the time required to perform the copy since each index would need to be maintained, but
creating a set of indexes after copying the source data would require one scan of the target
table for each index. This option lets the administrator control the processing based on the
characteristics and requirements of the system involved.
The standard mode of copying data rows from the source table to the target table uses
INSERT from SELECT SQL, which will generate recovery log records for every row inserted.
The COPY_USE_LOAD option causes the copy phase to perform a non-recoverable db2Load


API to copy the data from the source table to the target table. Using the LOAD function
reduces log space required and might improve copy performance. One important concern
is that using the COPY_USE_LOAD option in this mode makes the processing more
complex for recovery. It is necessary to perform a backup of the target table spaces before
the SWAP phase in order to ensure recoverability.
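A hedged sketch of such a backup; the database and table space names are illustrative:

db2 backup db sample tablespace (DATA_TS2, INDEX_TS2) online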


Instructor notes:


Purpose — To provide more detailed information about the COPY phase of processing
and explain several options available to adjust how the copy is performed.
Details —
Additional information —
Transition statement — Next we will describe the REPLAY phase of processing.


ADMIN_MOVE_TABLE: Replay phase


• The REPLAY phase copies the rows that have changed in the source
table since the COPY phase began into the target table.
• The row changes have been captured by triggers created on the source
table and are placed into the staging table:
– Changes are propagated to the target table by scanning the staging table one
row at a time.
– For each row in the staging table:
• Any rows in the target table matching that index are deleted
• Any rows matching that index in the source table are copied to the target
table.
• The row is then removed from the staging table.
• This scan of the staging table is repeated until the number of entries
found in the staging table during its latest scan is less than the
REPLAY_THRESHOLD value in SYSTOOLS.ADMIN_MOVE_TABLE.
• If the rate of updates/inserts/deletes on the source table is faster than
the replay processing, then the replay will just keep applying the
changes until it catches up during a period of lower activity on the
source table.


Figure 10-61. ADMIN_MOVE_TABLE: Replay phase CL4636.0

Notes:
The REPLAY phase copies the rows that have changed in the source table since the
COPY phase began into the target table. These row changes have been captured by
triggers created on the source table and are placed into the staging table.
Changes are propagated to the target table by scanning the staging table one row at a
time. The processing performed depends on how unique the index is that was selected for
the staging table. For each row in the staging table, the following is performed:
• Any rows in the target table matching that index are deleted
• Any rows matching that index in the source table are copied to the target table
• The row is then removed from the staging table
Since this processing is performed while additional changes are occurring in the source
table, the staging table might have new rows by the time the first scan of the staging table
completes. The scan of the staging table is repeated until the number of entries found in
the staging table during its latest scan is less than the REPLAY_THRESHOLD value in
SYSTOOLS.ADMIN_MOVE_TABLE. The default value for REPLAY_THRESHOLD is 100.


It is possible that the rate of updates, inserts, and deletes on the source table is so high
that the replay processing cannot keep pace. In that case, the replay will just keep applying
the changes with repeated scans of the staging table until it catches up during a period of
lower activity on the source table.
The REPLAY phase could be repeated using multiple calls to the ADMIN_MOVE_TABLE
procedure over an extended period of time.
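The progress and settings of a move, including the REPLAY_THRESHOLD value, can be
checked by querying the protocol table; a minimal sketch, assuming the control table
columns shown in the earlier phase diagram and the MDC.HIST1 table from the trigger
example:

SELECT KEY, VALUE
  FROM SYSTOOLS.ADMIN_MOVE_TABLE
 WHERE TABSCHEMA = 'MDC' AND TABNAME = 'HIST1'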


Instructor notes:
Purpose — To describe in more detail the processing performed by the REPLAY phase.
One important point to make clear is that the staging table only contains the key value from
a selected source table index. If that is a non-unique index, each row of the staging table
causes all matching rows in the target table to be deleted and all match rows in the course
table to be copied again.
Details —
Additional information —
Transition statement — Next we will discuss the SWAP phase of processing.



ADMIN_MOVE_TABLE: SWAP phase


• The SWAP phase switches the source and target table by renaming the source
table to a temporary name and renaming the target table to the name of the source
table.
• Processing starts by performing REPLAY again to minimize the number of entries in
the staging table.
• If the REORG option was set, an offline reorg will be performed on the target table.
• Target table statistics are collected based on options:
– By default, RUNSTATS will be used to collect statistics
– The COPY_STATS option can be used to copy statistics from the source table
– If the NO_STATS option has been set, the stored procedure does not perform RUNSTATS. You
can set AUTO_RUNSTATS or AUTO_STMT_STATS and DB2 will automatically create
statistics.
• The source table will then be taken offline by obtaining an exclusive lock on the
source table.
• REPLAY will be called for a last time in order to finish replaying all of the rows.
• The target table will be renamed to replace the source table and then the locks are
released.


Figure 10-62. ADMIN_MOVE_TABLE: Swap phase CL4636.0

Notes:
The SWAP phase switches the source and target table by renaming the source table to a
temporary name and renaming the target table to the name of the source table.
Since some time might have passed since the REPLAY phase was performed, processing
starts by performing REPLAY again to minimize the number of entries in the staging table.
If the REORG option was set on the procedure call, an offline reorg will be performed on
the target table at this time.
The collection of catalog statistics for the target table depends on the options specified on
the procedure call:
• By default, RUNSTATS will be used to collect statistics.
• The COPY_STATS option can be used to copy statistics from the source table.
• If the NO_STATS option has been set, the stored procedure does not perform
RUNSTATS. You can set AUTO_RUNSTATS or AUTO_STMT_STATS and DB2 will
automatically create statistics.


The source table will then be taken offline by obtaining an exclusive lock on the source
table.
Once the source table is offline and cannot be updated by other applications, the REPLAY
function will be called for one last time in order to finish replaying all remaining source table
changes.
The target table will be renamed to replace the source table and then the locks are
released.
SWAP can be used after the COPY phase has completed, but ideally after the REPLAY
phase has been called.


Instructor notes:


Purpose — To discuss the processing performed in the SWAP phase.
Details —
Additional information —
Transition statement — Next we will see the possible sequences of phases for the
ADMIN_MOVE_TABLE procedure that could be used.


Using Step-mode calls for ADMIN_TABLE_MOVE

• INIT – Must be first; sets up the target and staging tables; creates indexes on the target.
• COPY – Copies source to target; uses SQL INSERT unless the COPY_USE_LOAD option is specified. (Might go right from COPY to SWAP.)
• REPLAY – Copies source to target based on staging table data; could be run several times over a period of time; used to minimize SWAP processing time.
• SWAP – Calls REPLAY; option to REORG table; sets statistics based on options; gets EXCLUSIVE lock on source; final REPLAY; swaps names (target to source); handles views, triggers, granted permissions, and so on.
• VERIFY (optional) – Checks if the source and target tables are identical; might be used for testing; significant overhead.
• CANCEL – Cancels a multi-step or failed move; drops the staging and target tables.

Figure 10-63. Using Step-mode calls for ADMIN_MOVE_TABLE CL4636.0

Notes:
This shows the possible sequences that could be used when the ADMIN_MOVE_TABLE
procedure is executed step by step.
The INIT phase must be first. The COPY phase must be run next, but some time could
pass between the two phases. In most cases, the REPLAY phase would follow the COPY
phase, but it is not a required step. The SWAP phase could be run directly after the COPY
phase, since it begins by calling REPLAY. The REPLAY step could be run several times,
prior to running the SWAP. The VERIFY step could be run between the REPLAY and the
SWAP phase but should only be used for test purposes as it adds significant overhead to
the overall processing.
There are several additional modes that can be used to handle exceptions in normal
processing:
• CLEANUP: Drops the staging table, any non-unique indexes or triggers created on the
source table by the stored procedure, and the source table if the KEEP option has not
been set. CLEANUP can be called if the command failed during the SWAP phase.


• CANCEL: Cancels a multi-step table move between phases, or cancels a failed
table move operation. Executing this command requires that the operation status is not
in COMPLETED or CLEANUP state. CANCEL clears up all intermediate data (the
indexes, the staging table, the target table, and the triggers on the source table).
• REDIRECT: Forwards changes directly to the target table instead of capturing the
changes in the staging table. Note: The REDIRECT command does not work on
multi-partitioned systems on tables that do not have a unique index.
• REVERT: Reverts to the original behavior wherein the staging table captures the
changes.
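As a hedged example (names are illustrative, and the move is assumed to have been started with a manually created target table), a failed or abandoned move could be cleared with:
call SYSPROC.ADMIN_MOVE_TABLE( 'MYSCHEMA', 'MYTAB', 'MYTAB_TGT',
'', 'CANCEL' )
This removes the intermediate objects listed above so the move can be restarted from the INIT phase.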


Instructor notes:
Purpose — To discuss the possible variations in step sequence when the procedure is run
in multi-step mode.
Details —
Additional information —
Transition statement — Next we will discuss the options available on the
ADMIN_MOVE_TABLE procedure call.



ADMIN_MOVE_TABLE processing options (1)


Call Option | Description | Phase
KEEP | Source table is kept under a generated name. | swap
COPY_USE_LOAD | Use a non-recoverable LOAD utility instead of SQL inserts during the COPY phase. | copy
COPY_WITH_INDEXES | Create the target table indexes before copying the source table rather than after the copy completes. | copy
FORCE | The SWAP phase does not check to see if the source table has changed its table definition. | swap
COPY_STATS | This option copies the statistics from the source table to the target table. | swap
NO_STATS | This option does not start RUNSTATS or any statistic copying on the target table. | swap
REORG | This option sets up an extra offline REORG. | swap
NO_TARGET_LOCKSIZE_TABLE | Does not keep the LOCKSIZE table option on the target table during the COPY and SWAP phases. | copy/swap
NO_AUTO_REVAL | This option prevents automatic revalidation on the table, and instead, recreates all triggers and views. | init


Figure 10-64. ADMIN_MOVE_TABLE processing options (1) CL4636.0

Notes:
The ADMIN_MOVE_TABLE procedure call allows one or multiple processing options to be
included. The table summarizes the options and indicates which phase of processing is
affected by the option.
The procedure supports the following options:
• KEEP: This option keeps a copy of the original source table under a different name. If
the source table name is T1, then after the move that table will be automatically
renamed to something such as T1AAAAVxo. You can retrieve the exact name of the
source table in the returned protocol table, under the ORIGINAL key. You can set this
option at any point up to and including the SWAP phase.
• COPY_USE_LOAD: This option uses the non-recoverable db2Load API to copy the
data from the source table to the target table. You can set this option at any point up to
and including the COPY phase.
• COPY_WITH_INDEXES: This option creates indexes before copying the source table;
however, the default is to create the indexes after copying the source table. The


advantage of this option is that it avoids two costs of the default behavior: creating the
indexes after the copy requires a whole table scan per index, and the index creation is a
transaction that requires active log space. If the LOGINDEXREBUILD database
configuration parameter is on, significant log space is required for building the indexes in
a short time frame. One disadvantage of this option is that copy performance is reduced
because indexes need to be maintained on the target table. Also, the resulting indexes
may contain pseudo-deleted keys, and the indexes are not as well balanced as they would
be if created after the copy. You can set the COPY_WITH_INDEXES option at any point up
to and including the COPY phase.
• FORCE: If the force option is set, the SWAP phase does not check to see if the source
table has changed its table definition. Also, if LOAD is used and FORCE is not set, then
an error message is displayed, as FORCE is required if LOAD is used. You can set this
option at any point up to and including the SWAP phase.
• NO_STATS: This option does not start RUNSTATS or any statistic copying on the target
table. If you use the AUTO_RUNSTATS or AUTO_STMT_STATS database
configuration parameters, DB2 will automatically create new statistics afterwards. For
backwards compatibility, STATS_NO is also accepted. You can set the NO_STATS
option at any point up to and including the SWAP phase.
• COPY_STATS: This option copies the statistics from the source table to the target table
before performing the swap. This might cause inaccurate physical statistics, especially
if the page size is changed. However, setting this option saves computing time as
RUNSTATS is not called to compute new statistics. Also, the optimizer might choose
the same access plans, because the statistics are the same. For backwards
compatibility, STATS_COPY is also accepted. You can set the STATS_COPY option at
any point up to and including the SWAP phase.
• NO_AUTO_REVAL: This option prevents automatic revalidation on the table, and
instead, recreates all triggers and views. The NO_AUTO_REVAL option can be set only
in the INIT phase.
• NO_TARGET_LOCKSIZE_TABLE: This option does not keep the LOCKSIZE table
option on the target table during the COPY and SWAP phases. The default is to use the
LOCKSIZE table option on the target table to prevent locking overhead, when no
unique index is specified on the source table.
• REORG: This option sets up an extra offline REORG on the target table before
performing the swap. If you use this option to improve your compression dictionary, be
advised that using the default sampling approach is a better method to create an
optimal compression dictionary. However, if you require an optimal XML compression
dictionary, then REORG is the only method. You can set the REORG option at any point
up to and including the SWAP phase.
This list of options is not case sensitive and has a data type of VARCHAR(128). The list
value can be NULL or the empty string.
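As an illustration of combining options (all object and table space names here are hypothetical), a single-call move that keeps the source table and copies its statistics could look like:
call SYSPROC.ADMIN_MOVE_TABLE( 'MYSCHEMA', 'MYTAB', 'TSDATA', 'TSINDEX',
'TSLONG', NULL, NULL, NULL, NULL, 'KEEP,COPY_STATS', 'MOVE' )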


Instructor notes:


Purpose — To describe some of the options that can be used to control portions of the
processing for the ADMIN_MOVE_TABLE procedure.
Details —
Additional information —
Transition statement — Next we will discuss some additional options for
ADMIN_MOVE_TABLE.


ADMIN_MOVE_TABLE processing options (2)


Call Option | Description | Phase
CLUSTER | Reads the data from the source table with an ORDER BY clause. | copy
NON_CLUSTER | Reads the data from the source table without an ORDER BY clause. | copy
LOAD_MSGPATH <path> | This option can be used to define the load message file path. | copy
NOT_ENFORCED | Specify this option for the conversion of tables with enforced check constraints or foreign key (referential integrity) constraints that are not supported on the target table. | init


Figure 10-65. ADMIN_MOVE_TABLE processing options (2) CL4636.0

Notes:
The procedure supports these additional options:
• CLUSTER: This option reads the data from the source table with an ORDER BY clause
when a copy index has been specified using ADMIN_MOVE_TABLE_UTIL, a clustering
index exists on the source table or a unique index or primary key is defined in the
source table. Note: A copy index will override a clustering index; a clustering index will
be used in preference to a primary key; a primary key will be used in preference to a
unique index.
• NON_CLUSTER: This option reads the data from the source table without an ORDER
BY clause regardless of whether a copy index has been specified, a clustering index
exists on the source table, or a unique index or primary key has been defined in the
source table. Note: When neither the CLUSTER nor the NON_CLUSTER option is specified,
ADMIN_MOVE_TABLE will read the data from the source table with an ORDER BY
clause only when a clustering index exists on the source table.
• LOAD_MSGPATH: This option can be used to define the load message file path.


• NOT_ENFORCED: Specify this option for the conversion of tables with enforced check
constraints or foreign key (referential integrity) constraints that are not supported on the
target table; otherwise, an error is returned (SQL1667N).
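For example (names are hypothetical, and the target table is assumed to have been created beforehand), the CLUSTER option could be added to a single-call move to request an ordered copy:
call SYSPROC.ADMIN_MOVE_TABLE( 'MYSCHEMA', 'MYTAB', 'MYTAB_TGT',
'CLUSTER', 'MOVE' )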


Instructor notes:
Purpose — To discuss some additional processing options for ADMIN_MOVE_TABLE.
Details —
Additional information —
Transition statement — Next we will see how options can be set using the procedure
ADMIN_MOVE_TABLE_UTIL.


Setting options in the SYSTOOLS.ADMIN_MOVE_TABLE
control table using ADMIN_MOVE_TABLE_UTIL
• Some of the default processing options can be changed by updating
SYSTOOLS.ADMIN_MOVE_TABLE
• The ADMIN_MOVE_TABLE_UTIL procedure works with the
SYSPROC.ADMIN_MOVE_TABLE stored procedure when moving
active table data:
– The stored procedure can be used to alter the user definable values
in the ADMIN_MOVE_TABLE protocol table
– A move of the table referenced by the TABSCHEMA and TABNAME
parameters must already be in progress, and the authorization ID of
the caller must be the same as the user executing the table move.
Syntax
>>-ADMIN_MOVE_TABLE_UTIL--(--tabschema--,--tabname--,--action--,--key--,--value--)-><
Example, update the DEEPCOMPRESSION_SAMPLE value to 30720
CALL SYSPROC.ADMIN_MOVE_TABLE_UTIL('SVALENTI','T1','UPSERT',
'DEEPCOMPRESSION_SAMPLE','30720')


Figure 10-66. Setting options in the SYSTOOLS.ADMIN_MOVE_TABLE control table using ADMIN_MOVE_TABLE_UTIL CL4636.0

Notes:
The ADMIN_MOVE_TABLE_UTIL procedure works in conjunction with the
SYSPROC.ADMIN_MOVE_TABLE stored procedure when moving active table data. This
stored procedure provides a mechanism to alter the user definable values in the
ADMIN_MOVE_TABLE protocol table, which is created and used by the
ADMIN_MOVE_TABLE procedure.
This procedure will only modify a value in the ADMIN_MOVE_TABLE protocol table if a
table move for the table referenced by the TABSCHEMA and TABNAME parameters is
already in progress, and the authorization ID of the caller of the procedure is the same as
the user executing the table move.
Syntax:
>>-ADMIN_MOVE_TABLE_UTIL--(--tabschema--,--tabname--,--action--,--key--,--value--)-><
The schema for this stored procedure is SYSPROC.


Procedure parameters
tabschema – This input parameter specifies the name of the schema containing the
table being moved. This name is case sensitive and has a data type of
VARCHAR(128).
tabname – This input parameter specifies the name of the table being moved. This
parameter is case sensitive and has a data type of VARCHAR(128).
action – This input parameter specifies the action for the procedure to execute.
Valid values are:
- UPSERT: If the specified TABSCHEMA.TABNAME.KEY exists in the
ADMIN_MOVE_TABLE protocol table, this updates the corresponding VALUE with
the new value parameter. Otherwise, this inserts the KEY and VALUE pair into the
ADMIN_MOVE_TABLE protocol table.
- DELETE: If the specified TABSCHEMA.TABNAME.KEY exists in the
ADMIN_MOVE_TABLE protocol table, this deletes the specified KEY and VALUE
pair from the ADMIN_MOVE_TABLE protocol table.
This parameter has a data type of VARCHAR(128).
key – This input parameter specifies the key to upsert or delete in the
ADMIN_MOVE_TABLE protocol table.
value – This input parameter specifies the value to upsert into the
ADMIN_MOVE_TABLE protocol table. This parameter has a data type of CLOB(10M).
The parameter can be NULL or the empty string.
We will cover the key options and values next.
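For example (schema and table names are illustrative), a previously upserted key could be removed with the DELETE action; the value parameter can be NULL in this case:
call SYSPROC.ADMIN_MOVE_TABLE_UTIL( 'MYSCHEMA', 'MYTAB', 'DELETE',
'COMMIT_AFTER_N_ROWS', NULL )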


Instructor notes:


Purpose — Some adjustments can be made to settings in the ADMIN_MOVE_TABLE
protocol table using the ADMIN_MOVE_TABLE_UTIL stored procedure.
Details —
Additional information —
Transition statement — Next we will discuss the keys and values that can be set using
the ADMIN_MOVE_TABLE_UTIL stored procedure.


ADMIN_MOVE_TABLE_UTIL settings

– COMMIT_AFTER_N_ROWS: Sets the commit count for the COPY phase. A value
of 0 means no commits are executed during COPY.
– DEEPCOMPRESSION_SAMPLE: If the source table has compression enabled,
this specifies how much data (in KB) is sampled when creating a dictionary for
compression. A value of 0 means no sampling is done.
– COPY_ARRAY_SIZE: Specifies the ARRAY size for COPY_ARRAY_INSERT, a
value less than or equal to 0 means do not use COPY_ARRAY_INSERT.
– COPY_INDEXSCHEMA: The schema of the index used to cluster the data on the
target table during the COPY phase.
– COPY_INDEXNAME: The name of the index used to cluster the data on the
target table during the COPY phase.
– REPLAY_MAX_ERR_RETRIES: Specifies the maximum retry count for errors
(lock timeouts or deadlocks) that may occur during the REPLAY phase.
– REPLAY_THRESHOLD: For a single iteration of the REPLAY phase, if the
number of rows applied to the staging table is less than this value, then REPLAY
stops, even if new entries are made in the meantime.
– REORG_USE_TEMPSPACE: If the REORG option is specified you can select a
temporary table space for the USE clause of the REORG command. If a value is
not specified here, the REORG does not use a temp table space for the BUILD.
– SWAP_MAX_RETRIES: Specifies the maximum number of retries allowed during
the SWAP phase (if lock timeouts or deadlocks occur).

Figure 10-67. ADMIN_MOVE_TABLE_UTIL settings CL4636.0

Notes:
The keys and values that can be set using ADMIN_MOVE_TABLE_UTIL are:
• COMMIT_AFTER_N_ROWS: During the COPY phase, a commit is executed after this
many rows are copied. A value of 0 means no commits are executed during COPY.
• DEEPCOMPRESSION_SAMPLE: If the source table has compression enabled, this
field specifies how much data (in KB) is sampled when creating a dictionary for
compression. A value of 0 means no sampling is done.
• COPY_ARRAY_SIZE: Specifies the ARRAY size for COPY_ARRAY_INSERT, a value
less than or equal to 0 means do not use COPY_ARRAY_INSERT.
• COPY_INDEXSCHEMA: The schema of the index used to cluster the data on the
target table during the COPY phase.
• COPY_INDEXNAME: The name of the index used to cluster the data on the target
table during the COPY phase.
• REPLAY_MAX_ERR_RETRIES: Specifies the maximum retry count for errors (lock
timeouts or deadlocks) that might occur during the REPLAY phase.


• REPLAY_THRESHOLD: For a single iteration of the REPLAY phase, if the number of
rows applied to the staging table is less than this value, then REPLAY stops, even if
new entries are made in the meantime.
• REORG_USE_TEMPSPACE: If you call the REORG option in the table move, you can
also specify a temporary table space for the USE clause of the REORG command. If a
value is not specified here, the REORG command uses the same table space as the
table being reorganized.
• SWAP_MAX_RETRIES: Specifies the maximum number of retries allowed during the
SWAP phase (if lock timeouts or deadlocks occur).
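As a sketch (the table identification and the key values are illustrative), the COPY commit count and the REPLAY retry limit for an in-progress move could be adjusted with calls such as:
call SYSPROC.ADMIN_MOVE_TABLE_UTIL( 'MYSCHEMA', 'MYTAB', 'UPSERT',
'COMMIT_AFTER_N_ROWS', '10000' )
call SYSPROC.ADMIN_MOVE_TABLE_UTIL( 'MYSCHEMA', 'MYTAB', 'UPSERT',
'REPLAY_MAX_ERR_RETRIES', '100' )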


Instructor notes:
Purpose — To discuss the options that can be set using the ADMIN_MOVE_TABLE_UTIL
procedure.
Details —
Additional information —
Transition statement — Next we will discuss how source table indexes are used by
ADMIN_MOVE_TABLE.


Online Table Move makes use of Source Table
Indexes
• ADMIN_MOVE_TABLE can use any index on the table being moved in
order to improve performance and efficiency of the REPLAY operation.
– The columns of the selected index will be used as the columns of the staging
table.
– When an insert/update/delete operation on the source table occurs, only the
index column values for that row will be recorded in the staging table.
– A unique or primary index is required when moving a table with XML or LOB
columns.

• The order of preference for selecting the index:


1. Primary Index on the source table.
2. Unique Index on the source table, with lowest column count.
3. Any index on the source table based on highest FULLKEYCARD with highest
COLCOUNT.
4. As a last resort, an index may be generated for the move.


Figure 10-68. Online Table Move makes use of Source Table Indexes CL4636.0

Notes:
The indexes defined on the source table affect the processing performed by
ADMIN_MOVE_TABLE.
The ADMIN_MOVE_TABLE procedure can use any index on the table being moved in
order to improve performance and efficiency of the REPLAY operation. The columns of the
selected index will be used as the columns of the staging table. When an insert, update or
delete operation on the source table occurs, the triggers created by the procedure store the
index column values for that row in the staging table, if a matching entry does not already
exist in the table.
If the index selected is a non-unique index, a large number of changes might be indicated
by a single row in the staging table.
When moving a source table that includes XML or LOB columns, a unique or primary index
is required or the procedure will generate an error in the INIT phase.
The ADMIN_MOVE_TABLE procedure selects the source table index based on the
following order of preference:


1. Primary Index on the source table


2. A unique Index on the source table, with lowest column count
3. Any index on the source table based on highest FULLKEYCARD with highest
COLCOUNT
4. As a last resort, an index might be generated for the move
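To anticipate which index the procedure is likely to select, the candidate indexes can be reviewed in the catalog beforehand; a query along these lines (the schema and table names are illustrative) shows the uniqueness rule, column count, and full key cardinality that drive the preference order:
SELECT INDSCHEMA, INDNAME, UNIQUERULE, COLCOUNT, FULLKEYCARD
FROM SYSCAT.INDEXES
WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTAB'
ORDER BY FULLKEYCARD DESC, COLCOUNT DESC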


Instructor notes:


Purpose — To discuss the selection of a source table index to be used as the basis for the
staging table.
Details —
Additional information —
Transition statement — Next we will cover in more detail the effect on the online table
movement using unique or non-unique indexes.


Impact of source index selected for REPLAY processing
• Using a unique index for managing the staging table during the Replay
phase:
– Increases overhead for processing source table changes
– Additional logging for staging table data
– No danger of deadlocks, one row in staging table for each change
– Efficient REPLAY processing because the exact source table row can be copied

• Characteristics of Replay without a unique index:


– Blocks of rows are locked for UDI operations. These blocks are defined by the
non-unique index.
– This might increase lock waits
– Could lead to deadlocks
– Replay phase works on a block basis:
• This can cause higher lock wait for UDI operations on the source table.
• The block update can be very resource intensive


Figure 10-69. Impact of source index selected for REPLAY processing CL4636.0

Notes:
The type of index selected for managing the staging table can have a very significant
impact on the efficiency of the REPLAY phase.
Using a unique index for managing the staging table during the Replay phase means that
each row changed in the source table matches a single row in the staging table. If a single
row changed many times, there would still only be one entry in the staging table, because
the staging table does not contain the actual row data, just the index key value. The use of
a unique index for the replay processing has these characteristics:
• Increases overhead for processing source table changes – Each row changed needs to
have a matching row inserted into the staging table.
• Additional logging for staging table data – There will be a row inserted into the staging
table for each distinct source row that changes.
• No added danger of deadlocks, one row in staging table for each change – Since there
is exactly one row in the staging table for each source row changed, there should not be
a locking conflict between two concurrent changes of the source table.


• Efficient REPLAY processing because the exact source table row can be copied – The
scan of the staging table will result in copying only those rows that have changed since the
procedure processing started.
The characteristics for processing the online table move without a unique index are as
follows:
• Blocks of rows are locked for UDI operations. These blocks are defined by the
non-unique index. For example, in a table with 1 million rows and a non-unique index with
1000 distinct values, each row changed in the source table has, on average, 999 other
source rows that would match the same row in the staging table.
• Lock waits might increase. If there are several concurrent update applications that
change rows with a common non-unique index key value, the need to access the same
row in the staging table could cause lock waits.
• Could lead to deadlocks. The many to one relationship between the source and staging
table could also lead to application deadlocks.
• Replay phase works on a block basis. For each row of the staging table, there could be
many matching rows in the source and target table that would need to be processed,
many of which did not actually change. This could impact lock contention on the source
table. Depending on the index selected and the types of changes occurring in the
source table, this block update processing during replay can be very resource intensive.


Instructor notes:
Purpose — To explain in more detail the effects of using a unique or non-unique index for
processing the replay phase on the online table move procedure.
Details —
Additional information —
Transition statement — Next we will cover some additional index related considerations
for the online table move procedure.



Additional index considerations


• If a cluster index is present on the source table, the target table will be
clustered using that index by default.
– Use the CLUSTER or NON_CLUSTER options to override the default
clustering
• To cluster the target table by a user-selected index:
– Use the multi-step mode when running ADMIN_MOVE_TABLE.
– Set the key entries "COPY_INDEXSCHEMA" and "COPY_INDEXNAME" in the
protocol table with the desired index to cluster the target table before the COPY
phase.
• To modify the attributes of any existing source table indexes (that is,
index clustering, index compression, change
non-partitioned to partitioned indexes):
– Use a multi-step move operation.
– Perform the INIT and COPY phases before making index changes.
– The name of the target table can be found in the protocol table or listed using
DESCRIBE INDEXES FOR TABLE.
– After the modifications have finished, resume with the REPLAY and SWAP
phases and include the FORCE option.

Figure 10-70. Additional index considerations CL4636.0

Notes:
When a table is moved using the ADMIN_MOVE_TABLE procedure, it might be important
to set the sequence for storing rows in the target table. If a cluster index is present on the
source table, the target table will be clustered using that index by default. If you want to
select a particular source table index to re-cluster the target table the following steps can
be used:
• You will need to run the ADMIN_MOVE_TABLE procedure using the multi-step mode.
• Use the ADMIN_MOVE_TABLE_UTIL procedure before the COPY phase is started to
set the key entries "COPY_INDEXSCHEMA" and "COPY_INDEXNAME" in the protocol
table with the desired index to cluster the target table.
The ADMIN_MOVE_TABLE procedure will create a set of indexes on the target table to
match the indexes on the source table. In order to modify the attributes of any existing
source table indexes (that is, index clustering, index compression, changing non-partitioned to
partitioned indexes) use the following steps:
• You will need to run the ADMIN_MOVE_TABLE procedure using the multi-step mode.

© Copyright IBM Corp. 2005, 2015 Unit 10. Advanced Data Movement 10-195
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Instructor Guide

• Perform the INIT and COPY phases before making any index changes. The target table
indexes will be created during the COPY phase.
• You will be changing indexes on the target table. The name of the target table can be
found in the protocol table. The indexes for the target table could be listed using
DESCRIBE INDEXES FOR TABLE.
• After the index modifications are finished, resume the processing with the REPLAY and
SWAP phases and include the FORCE option.
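For example (names are hypothetical), after the INIT phase the copy index could be set in the protocol table before calling the COPY phase:
call SYSPROC.ADMIN_MOVE_TABLE_UTIL( 'MYSCHEMA', 'MYTAB', 'UPSERT',
'COPY_INDEXSCHEMA', 'MYSCHEMA' )
call SYSPROC.ADMIN_MOVE_TABLE_UTIL( 'MYSCHEMA', 'MYTAB', 'UPSERT',
'COPY_INDEXNAME', 'MYTAB_IX1' )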


Instructor notes:


Purpose — To discuss some additional index related considerations when using the
ADMIN_MOVE_TABLE procedure. In both of these cases, if you need to override the
default handling for clustering or target table indexes, the multi-step mode will need to be
used.
Details —
Additional information —
Transition statement — Next we will discuss how ADMIN_MOVE_TABLE could be used
when implementing compression for a table that needs to be available as much as possible.


Using ADMIN_MOVE_TABLE with data compression


• ADMIN_MOVE_TABLE can be used to implement or remove data
compression
• If the source table is compressed the target table will be compressed by
default
• Compression can be enabled on the target table:
– Manually define the target table with COMPRESS YES
– ALTER the generated target table before the COPY phase
• Methods of creating a compression dictionary:
– By default, a dictionary will be generated before performing the COPY operation
using a Bernoulli sampling from the source table.
• The amount of data sampled can be specified in the protocol table using
DEEPCOMPRESSION_SAMPLE.
• The default sample size is 20MB
– If DEEPCOMPRESSION_SAMPLE field is set to 0, standard Automatic
dictionary creation (ADC) will be used.
– If the REORG option is used, the RESETDICTIONARY option will cause a
dictionary to be built based on a complete scan during the SWAP phase.
• An optimal dictionary will be created but the time required for the reorganization
would add time to complete moving the table.


Figure 10-71. Using ADMIN_MOVE_TABLE with data compression CL4636.0

Notes:
One use of the ADMIN_MOVE_TABLE procedure might be to implement or remove data
compression for a table with high availability requirements. The application requirements
might not allow an offline table reorganization to be run because that would require some
portion of the processing to be performed when no access to the table would be allowed,
including the rebuilding of all of the indexes for a table. This procedure could also be used
to get a more efficient compression dictionary built for a table with a very short loss of
access to the table.
By default, the target table will have the same compression option as the source table, so a
source table with COMPRESS set to YES would cause the new target table to have
COMPRESS set to YES. If the source table has compression set off and you want to
implement data compression for the target table you can do one of the following:
• You could manually create the target table with COMPRESS YES and invoke the
procedure using the syntax that allows you to specify your target table by name.
• You could run the procedure in multi-step mode and use the ALTER TABLE statement
to alter the target table before the COPY phase is started.


There are several methods that could be used to generate the compression dictionary for
the target table:
• By default, the ADMIN_MOVE_TABLE procedure will generate the compression
dictionary using a Bernoulli sample from the source table.
• The default sample size is 20MB of data. The sample size can be adjusted by running
the ADMIN_MOVE_TABLE_UTIL procedure with a key of
DEEPCOMPRESSION_SAMPLE.
• If the ADMIN_MOVE_TABLE_UTIL procedure is used to set the
DEEPCOMPRESSION_SAMPLE field to 0, DB2 will use its standard automatic dictionary
creation (ADC) method to create the dictionary for the target table.
• If the REORG option is used, the RESETDICTIONARY option will cause a dictionary to
be built based on a complete scan during the SWAP phase. This would generate an
optimal dictionary for the target table but extend the time required to complete the
move. The REORG is performed before the exclusive lock is acquired on the source
table, so it would not impact access to the source table by applications.
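As a sketch of the first method described above (all names are illustrative), a compressed target could be created manually and then supplied to the procedure by name:
CREATE TABLE MYSCHEMA.MYTAB_NEW LIKE MYSCHEMA.MYTAB COMPRESS YES
call SYSPROC.ADMIN_MOVE_TABLE( 'MYSCHEMA', 'MYTAB', 'MYTAB_NEW',
'', 'MOVE' )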


Instructor notes:
Purpose — To discuss how ADMIN_MOVE_TABLE can be used to implement changes to
table compression in a manner that allows the table to remain online except for a brief
outage to complete the swap.
Details —
Additional information —
Transition statement — Next we will look at some examples for using
ADMIN_MOVE_TABLE.



Example 1: Move a table to new table spaces


call SYSPROC.ADMIN_MOVE_TABLE ( 'MDC', 'HIST1', 'MDCTSP2', 'MDCTSPI',
'MDCTSP2', NULL, NULL, NULL, NULL, 'KEEP,REORG', 'MOVE' )

KEY VALUE
-------------------------------- --------------------------------------------------
AUTHID INST461
CLEANUP_END 2013-06-01-10.57.55.282196
CLEANUP_START 2013-06-01-10.57.55.050882 Source is kept,
COPY_END 2013-06-01-10.57.36.243768
COPY_INDEXNAME HIST1IX1
target is reorganized
COPY_INDEXSCHEMA MDC
COPY_OPTS ARRAY_INSERT,CLUSTER_OVER_INDEX
COPY_START 2013-06-01-10.57.25.132774
COPY_TOTAL_ROWS 100000
INDEXNAME MDC.HIST1IX3
INDEXSCHEMA INST461
INDEX_CREATION_TOTAL_TIME 7
INIT_END 2013-06-01-10.57.24.880488
INIT_START 2013-06-01-10.57.21.220930
ORIGINAL HIST1AATOhxo
REPLAY_END 2013-06-01-10.57.53.870351
REPLAY_START 2013-06-01-10.57.36.244234
REPLAY_TOTAL_ROWS 0
REPLAY_TOTAL_TIME 0
STATUS COMPLETE
SWAP_END 2013-06-01-10.57.54.995848
SWAP_RETRIES 0
SWAP_START 2013-06-01-10.57.54.255941
VERSION 10.05.0000

Figure 10-72. Example 1: Move a table to new table spaces CL4636.0

Notes:
Here is an example that shows the ADMIN_MOVE_TABLE procedure call that could be
used to move a table named MDC.HIST1 to a set of new table spaces. The operation for
the call is ‘MOVE', so all of the phases of processing would be performed automatically
based on a single call.
The target table would be automatically created to match the source table, but the data
table space named MDCTSP2 would be used, and the indexes for the target table would be
stored in MDCTSPI. The table space MDCTSP2 is specified for long data, even though the table
used for this example did not contain any long data columns. If any of the three table space
options are specified, all three parameters are required.
The example call also includes the REORG and KEEP options. An offline reorganization of
the target table would be performed during the swap phase. The source table would be
kept. The output shows the generated name for the original source table will be
HIST1AATOhxo.
The output from the procedure call includes the start and end times for each phase, INIT, COPY, REPLAY,
SWAP and CLEANUP. In this example there were no changes processed during the REPLAY phase.


Instructor notes:
Purpose — To show a simple example of running the ADMIN_MOVE_TABLE procedure to
move a table to a set of new table spaces, keeping the original table under a new name.
Details —
Additional information —
Transition statement — Next we will look at an example that uses ADMIN_MOVE_TABLE
with a manually created target table.


Example 2: Move a table to a manually created target
table
call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HISTORYPART',
'COPY_USE_LOAD,FORCE', 'MOVE' )
KEY VALUE
-------------------------------- ----------------------------------------------
AUTHID INST461
CLEANUP_END 2013-06-01-13.06.59.876267 The target table is
CLEANUP_START 2013-06-01-13.06.59.148434 created before the stored
COPY_END 2013-06-01-13.06.43.716949
COPY_INDEXNAME HIST1IX1
procedure call
COPY_INDEXSCHEMA MDC
COPY_OPTS LOAD,WITH_INDEXES,CLUSTER_OVER_INDEX
COPY_START 2013-06-01-13.05.59.168849
COPY_TOTAL_ROWS 472956
INDEXNAME HIST1IX2
INDEXSCHEMA MDC
INDEX_CREATION_TOTAL_TIME 1
INIT_END 2013-06-01-13.05.57.378117
INIT_START 2013-06-01-13.05.56.894369
PAR_COLDEF using a supplied target table so COLDEF could be
different
REPLAY_END 2013-06-01-13.06.58.603953
REPLAY_START 2013-06-01-13.06.43.719086
REPLAY_TOTAL_ROWS 0
REPLAY_TOTAL_TIME 3
STATUS COMPLETE
SWAP_END 2013-06-01-13.06.59.134988
SWAP_RETRIES 0
SWAP_START 2013-06-01-13.06.58.605555
VERSION 10.05.0000

Figure 10-73. Example 2: Move a table to a manually created target table CL4636.0

Notes:
Here is an example that shows the ADMIN_MOVE_TABLE procedure call that could be
used to move a table named MDC.HIST1 to a manually created target table named
MDC.HISTORYPART, which was created as a range partitioned table. The operation for
the call is ‘MOVE', so all of the phases of processing would be performed automatically
based on a single call.
The target table is already defined, so it would not need to be created during the INIT
phase. The COPY_USE_LOAD option is included on the call, so a DB2 LOAD utility would
be used to build the target table during the COPY phase. It is important to run a BACKUP
for the target table spaces immediately after the procedure completes, because the
non-recoverable load processing makes it impossible to use backups prior to the procedure
call to recover the target table.
The source table would be dropped.


Instructor notes:
Purpose — To show a simple example of calling the ADMIN_MOVE_TABLE procedure
with a manually created target table.
Details —
Additional information —
Transition statement — Next we will look at an example of calling the
ADMIN_MOVE_TABLE procedure in multi-step mode.



Example 3: Move a table with multiple steps


1. Call ADMIN_MOVE_TABLE to initialize processing, using an existing
defined target table
call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HIST3',
'COPY_USE_LOAD,FORCE', 'init' )

2. Next run ‘copy’ phase using a LOAD utility


call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HIST3',
'COPY_USE_LOAD,FORCE', 'copy' )

3. Use the ‘replay’ phase to apply any changes to source


call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HIST3',
'REORG', 'replay' )

4. Set option for table reorganization to use a temporary space


call admin_move_table_util (
'MDC','HIST1','UPSERT','REORG_USE_TEMPSPACE','TEMPSPACE1')

5. Complete processing using ‘swap’ phase, the table will be reorganized


before the table names are swapped
call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HIST3',
'REORG,FORCE', 'swap' )


Figure 10-74. Example 3: Move a table with multiple steps CL4636.0

Notes:
Here is an example that shows a sequence of procedure calls that could be used to move a
table named MDC.HIST1 to a manually created target table named MDC.HIST3 in
multi-step mode.
The operation for the first call is ‘init', which would create the staging table and the triggers
needed to maintain the staging table. The target table has already been created.
call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HIST3',
'COPY_USE_LOAD,FORCE', 'init' )
The next call to ADMIN_MOVE_TABLE has the operation set to ‘copy'. The
COPY_USE_LOAD option causes the LOAD utility to be used to copy the source data to
the target. Remember this is a non-recoverable load.
call SYSPROC.ADMIN_MOVE_TABLE( 'MDC', 'HIST1', 'HIST3',
'COPY_USE_LOAD,FORCE', 'copy' )
The next call to ADMIN_MOVE_TABLE has the operation set to ‘replay'. DB2 would copy
any changed rows from the source to the target based on the staging table content. The


REORG option is included on the call, but the actual reorganization will be performed
during the SWAP phase.
call SYSPROC.ADMIN_MOVE_TABLE (
'MDC', 'HIST1', 'HIST3', 'REORG', 'replay' )
In order to minimize the space needed for the target table’s table spaces, it is decided that
the reorganization operation performed during the SWAP phase should use a temporary
table space. Since that is not the default mode for reorganizations run using
ADMIN_MOVE_TABLE, the stored procedure ADMIN_MOVE_TABLE_UTIL can be used
to update the protocol table setting the system temporary table space to use during the
table reorganization.
call admin_move_table_util (
'MDC','HIST1','UPSERT','REORG_USE_TEMPSPACE','TEMPSPACE1')
The final call to ADMIN_MOVE_TABLE has the operation set to ‘swap'. DB2 would perform
another replay operation and also run a complete offline reorganization based on the
REORG option. The target table could then get renamed to replace the original source
table. The FORCE option is required because the COPY_USE_LOAD option was
previously used in the COPY phase.


Instructor notes:


Purpose — To show a sequence of procedure calls that would run ADMIN_MOVE_TABLE
in multi-step mode using the ADMIN_MOVE_TABLE_UTIL to change one of the default
options. The temporary table space for the optional REORG is one of the settings that can
be changed as needed by an administrator, but only in multi-step mode.
Details —
Additional information —
Transition statement — Next we will look at the handling for the database objects related
to the table being moved.


Objects and privileges that are preserved during the online table movement
• Indexes:
– ADMIN_MOVE_TABLE will create a set of indexes on the target table to match the source table
– The indexes on the target table will be renamed to match the original names during the SWAP
phase
• Views and Triggers:
– If the NO_AUTO_REVAL procedure option is used or if the AUTO_REVAL option is disabled for
the database, then views and triggers will be dropped and recreated during the SWAP phase.
– Otherwise, the views and triggers can be automatically revalidated
• Access privileges
– During the SWAP phase, the entries in SYSCAT.TABAUTH will be used to reproduce the
granting of privileges on the table to users, groups and roles.
• Constraints:
– Constraints (other than referential constraints) are recreated on the target table using the same
constraint names.
– For unique and primary constraints the underlying index name may be different than the index
name on the source table.
• Table flags:
– The table flags of the source table are created on the target table when the target table is
created in the INIT phase.
– The flags include attributes like ‘append_mode’, ‘locksize’, ‘volatile’, ‘compression’,
‘datacapture’, ‘pctfree’, ‘logindexbuild’, and ‘owner’.

Figure 10-75. Objects and privileges that are preserved during the online table movement CL4636.0

Notes:
In general, the ADMIN_MOVE_TABLE procedure will automatically handle many of the
related database objects when a table is moved.
As we have previously discussed, the ADMIN_MOVE_TABLE procedure will create a set of
indexes on the target table to match the source table. During the SWAP phase, the indexes
on the target table will be renamed to match the original names.
The handling for views and triggers that reference the source table depends on several
options. If the NO_AUTO_REVAL procedure option is used or if the AUTO_REVAL option
is disabled for the database, then views and triggers on the source will be dropped and
matching views and triggers will be created during the SWAP phase. Otherwise, the views
and triggers can be automatically revalidated by DB2.
The access privileges for the source table will be duplicated for the target table. During the
SWAP phase, the entries in SYSCAT.TABAUTH will be used to reproduce the granting of
privileges on the table to users, groups and roles.


If the source table has referential constraints defined, the ADMIN_MOVE_TABLE
procedure cannot be used to move the table. The table constraints (other than referential
constraints) are recreated on the target table using the same constraint names. For unique
and primary constraints, the underlying index name might be different than the index name
on the source table.
The procedure will set the target table flags, including ‘append_mode', ‘locksize', ‘volatile',
‘compression', ‘datacapture', ‘pctfree', ‘logindexbuild', and ‘owner', to match those defined
for the source table when the target table is created in the INIT phase.
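To preview the privileges that the SWAP phase will reproduce, the catalog can be queried before the move; for example (the schema and table names are illustrative):
SELECT GRANTEE, GRANTEETYPE, SELECTAUTH, INSERTAUTH, UPDATEAUTH, DELETEAUTH
FROM SYSCAT.TABAUTH
WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTAB'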


Instructor notes:
Purpose — To discuss the handling of access privileges and related database objects like
views, triggers and indexes when a table is moved.
Details —
Additional information —
Transition statement — Next we will discuss some restrictions for using the online table
move procedure.



Restrictions for using ADMIN_MOVE_TABLE


• Only simple tables are supported as the source table. No MQTs, clustered tables,
system tables, views, nicknames, or aliases are permitted.
• A table cannot be moved if an event monitor is currently active on the table.
• A table cannot be moved if there is an index with an expression-based key defined
on the table.
• A unique index is required if the table contains LOB, XML, or LONG columns.
• A generated column cannot be part of the MDC specification.
• There is no support for text search indexes.
• The VERIFY operation for tables without a unique index does not work on tables
with LOBs.
• The SYSTOOLSPACE tablespace must be created and accessible to 'PUBLIC'.
• Lock timeouts are possible during the COPY phase because of long running
transactions on the source table.
• Deadlocks can occur during the SWAP phase.
• Deadlocks can occur on a source table with non-unique indexes
• A table cannot be moved if it is in the Set Integrity Pending state.


Figure 10-76. Restrictions for using ADMIN_MOVE_TABLE CL4636.0

Notes:
The following restrictions apply to the ADMIN_MOVE_TABLE stored procedure:
• Only simple tables are supported as the source table. No materialized query tables,
typed tables, range clustered tables, system tables, views, nicknames, or aliases are
permitted.
• A table cannot be moved if there is an index with an expression-based key defined on
the table.
• A table cannot be moved if an event monitor is currently active on the table.
• Tables without a unique index are subject to a complex and potentially expensive replay
phase.
• A unique index is required if the table contains LOB, XML, or LONG columns.
• A generated column cannot be part of the MDC specification.
• There is no support for text search indexes.


• Be aware of the large disk space requirements, as the procedure creates two copies of
the table and indexes, plus a staging table and log space.
• Copy performance may be an issue as most of the data is moved to the new table using
"insert from select" form.
• The VERIFY operation for tables without a unique index does not work on tables with
LOBs.
• In releases earlier than DB2 Version 9.7 Fix Pack 2, the DB2_SKIPDELETED registry
variable cannot be set to ON.
• The SYSTOOLSPACE table space must be created and accessible to 'PUBLIC'.
• Lock timeouts are possible during the COPY phase because of long running
transactions on the source table.
• Deadlocks can occur during the SWAP phase.
• Deadlocks can occur on a source table with non-unique indexes and several update
processes.
• With VARCHAR2 support enabled, the database treats the empty string and NULL as
equivalent values, but the single blank is a distinct value. With VARCHAR2 support
enabled, the mdc_cols, partkey_cols, and data_part parameters can use a single blank
as distinct from the empty string and NULL.
• A table cannot be moved if it is in the Set Integrity Pending state.
• A table cannot be moved if there are any XSR objects dependent on it.
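Because the procedure depends on SYSTOOLSPACE, its existence can be verified before starting a move with a simple catalog query, for example:
SELECT TBSPACE FROM SYSCAT.TABLESPACES WHERE TBSPACE = 'SYSTOOLSPACE'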


Instructor notes:


Purpose — To review some of the considerations and restrictions associated with running
the ADMIN_MOVE_TABLE procedure.
Details —
Additional information —
Transition statement — Next we will cover some suggestions for use of
ADMIN_MOVE_TABLE.


General suggestions for using ADMIN_MOVE_TABLE


• Avoid making multiple moves into the same table space at the same time.
– This prevents fragmentation on the target table space.
• Run this procedure when activity on the table is low.
– Avoid mass data loads or deletes so that parallel read access is not a problem.
• Use a multi-step move operation.
– The INIT and COPY phases can be called at any time.
– Execute the REPLAY phase multiple times in order to keep the staging table
size small
– Issue the SWAP during a time of low activity on the table.
• Check if offline methods are a better choice for your table move,
especially when considering tables without unique indexes and for
tables with no index.
• Be aware of the large disk space requirements, as the procedure
creates two copies of the table and indexes, plus a staging table and
log space.
• Copy performance may be an issue as most of the data is moved to the
new table using insert from select form.


Figure 10-77. General suggestions for using ADMIN_MOVE_TABLE CL4636.0

Notes:
Here are some suggestions for best results when using this procedure:
• Avoid making multiple moves into the same table space at the same time. This prevents
fragmentation on the target table space.
• Run this procedure when activity on the table is low. Avoid mass data loads or deletes
so that parallel read access is not a problem.
• Use a multi-step move operation. The INIT and COPY phases can be called at any
time. Execute the REPLAY phase multiple times in order to keep the staging table size
small, and then issue the SWAP during a time of low activity on the table.
• Check if offline methods are a better choice for your table move, especially when
considering tables without unique indexes and for tables with no index.
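Between the steps of a multi-step move, progress can also be checked by querying the protocol table directly; a sketch, assuming the move is identified by the TABSCHEMA and TABNAME columns of SYSTOOLS.ADMIN_MOVE_TABLE (names illustrative):
SELECT KEY, VALUE
FROM SYSTOOLS.ADMIN_MOVE_TABLE
WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTAB'
The STATUS key, shown as COMPLETE in the earlier examples, indicates how far the move has progressed.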


Instructor notes:


Purpose — To provide some suggestions for using the ADMIN_MOVE_TABLE procedure.
Details —
Additional information —
Transition statement — Let's summarize this unit.


Unit summary
Having completed this unit, you should be able to:
• Configure the LOAD utility options to optimize the performance
of loading data into DB2 tables
• Describe the conditions that would impact selection of the
INGEST utility rather than using LOAD
• Set the options for the INGEST utility and monitor ingest
processing
• Utilize the db2move utility to move a group of tables into the
same or a different database
• Copy the objects for a schema using the db2move utility or the
ADMIN_COPY_SCHEMA procedure
• Move and make changes to tables with a minimal loss of table
availability using ADMIN_MOVE_TABLE

Figure 10-78. Unit summary CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —


Student exercise 9


Figure 10-79. Student exercise 9 CL4636.0

Notes:


Instructor notes:


Purpose —
Details —
Additional information —
Transition statement —



Unit 11. DB2 Database Auditing

Estimated time
01:00

What this unit is about


This unit describes the configuration and management tasks required
for implementation of DB2 instance or database-level auditing. We will
explain the specific tasks that need to be performed by a systems
administrator, SYSADM, or security administrator, SECADM, to set up
and control auditing of DB2 environments, including use of the
db2audit command. We will describe creation of audit policies which
can be assigned to specific tables, users or database roles to perform
selective collection of audit records.

What you should be able to do


After completing this unit, you should be able to:
• Describe the tasks for DB2 database auditing performed by the
SYSADM user
• List the security administration tasks for DB2 databases that
require the SECADM database authority in performing
database-level audits
• Utilize the db2audit command to implement instance-level auditing
and to configure the audit data and archive locations
• Create audit policies to enable collection of specific categories of
audit data for a DB2 database
• Assign audit policies to selected tables, users or database roles
using the AUDIT statement

How you will check your progress


Accountability:
• Machine exercise


References
Database Security Guide



Unit objectives
After completing this unit, you should be able to:
• Describe the tasks for DB2 database auditing performed by
the SYSADM user
• List the security administration tasks for DB2 databases that
require the SECADM database authority in performing
database-level audits
• Utilize the db2audit command to implement instance-level
auditing and to configure the audit data and archive locations
• Create audit policies to enable collection of specific
categories of audit data for a DB2 database
• Assign audit policies to selected tables, users or database
roles using the AUDIT statement

Figure 11-1. Unit objectives CL4636.0

Notes:


Instructor notes:
Purpose —
Details —
Additional information —
Transition statement — Let’s review the audit support for DB2 databases prior to DB2
9.5.



DB2 Audit facilities prior to DB2 9.5


• Auditing facilities controlled by SYSADM user

• Audit operates at the DB2 instance level

• db2audit command used to manage audit processing:


– db2audit configure – To select audit options
– db2audit describe – To display current audit status
– db2audit start/stop – To control operations
– db2audit extract – To extract audit records from the active audit log
– db2audit prune – To remove audit records from the active audit log

$HOME/sqllib/security

db2audit.cfg
db2audit.log
audit.del checking.del objmaint.del secmaint.del
sysadmin.del validate.del context.del


Figure 11-2. DB2 Audit facilities prior to DB2 9.5 CL4636.0

Notes:
Prior to DB2 9.5, the facilities for auditing of DB2 databases were based on using the
db2audit command, which required a user with SYSADM authority.
The db2audit command operates at the instance level, so the types of audit data collected
and the option to enable and disable auditing impacted all of the databases running under
the same instance.
• The audit configuration file and the audit log file were stored at a fixed location, the
security subdirectory of the sqllib path for the instance.
• The extract option of the db2audit command was used to extract audit records from the
active audit log file for the instance.
• The prune option of the db2audit command was used to remove older audit data from
the active audit log file.


Instructor notes:
Purpose — To review the characteristics of database auditing prior to DB2 9.5.
Details —
Additional information —
Transition statement — First, we will review the categories of audit data that can be
generated by DB2 databases.



Standard DB2 Audit data categories


• AUDIT – Generates records when audit settings
are changed or when the audit log is accessed.
• CHECKING – Generates records during
authorization checking of attempts to access or
manipulate DB2 database objects or functions.
• OBJMAINT – Generates records when creating or
dropping data objects.
• SECMAINT – Generates records when granting or
revoking: object or database privileges, or DBADM
authority. Records are also generated when the
database manager security configuration
parameters SYSADM_GROUP,
SYSCTRL_GROUP, or SYSMAINT_GROUP are
modified.
• SYSADMIN – Generates records when operations
requiring SYSADM, SYSMAINT, or SYSCTRL
authority are performed.
• VALIDATE – Generates records when
authenticating users or retrieving system security
information.
• CONTEXT – Generates supporting detailed
records, like the SQL statement for dynamic SQL.


Figure 11-3. Standard DB2 Audit data categories CL4636.0

Notes:
DB2 provides options to select the types of audit records that can be generated by a DB2
database. There are also options to choose whether an audit record should be generated
by normal successful activity, like an authorized database connection request, compared to
a failed request, like a connection that is rejected because the password is incorrect or
expired. The visual shows the categories of audit records that were available prior to DB2
9.5; DB2 9.5 added one further category, EXECUTE, which is covered later in this unit.
The categories of events available for auditing are:
• Audit (AUDIT): Generates records when audit settings are changed or when the audit
log is accessed.
• Authorization Checking (CHECKING): Generates records during authorization
checking of attempts to access or manipulate DB2 database objects or functions.
• Object Maintenance (OBJMAINT): Generates records when creating or dropping data
objects.


• Security Maintenance (SECMAINT): Generates records when granting or revoking:
object or database privileges, or DBADM authority. Records are also generated when
the database manager security configuration parameters SYSADM_GROUP,
SYSCTRL_GROUP, or SYSMAINT_GROUP are modified.
• System Administration (SYSADMIN): Generates records when operations requiring
SYSADM, SYSMAINT, or SYSCTRL authority are performed.
• User Validation (VALIDATE): Generates records when authenticating users or
retrieving system security information.
• Operation Context (CONTEXT): Generates records to show the operation context
when a database operation is performed. This category allows for better interpretation
of the audit log file. When used with the log's event correlation field, a group of events
can be associated back to a single database operation. For example, a query statement
for dynamic queries, a package identifier for static queries, or an indicator of the type of
operation being performed, such as CONNECT, can provide needed context when
analyzing audit results. The SQL or XQuery statement providing the operation context
might be very long and is completely shown within the CONTEXT record. This can
make the CONTEXT record very large.


Instructor notes:


Purpose — To review the types of audit records produced by previous releases of DB2.
Many customers have not implemented auditing and need to understand the types of audit
records that can be generated. The EXECUTE category was added in DB2 9.5 and we will
cover that option in more detail later.
Details —
Additional information —
Transition statement — Next we will review some of the limitations of DB2 facilities for
collection of audit data prior to DB2 9.5.


Limitations for audit support prior to DB2 9.5


• Lack of flexibility:
– All databases in instance are controlled by
a single audit configuration.
– Audit log file resides in a fixed location. For
DPF databases, this may be an NFS shared
file system.
– Can select type of audit data but no way to
limit auditing to selected users or certain
tables.
– Dynamic SQL text only available using
CONTEXT option which generates a large
volume of data.

• Performance impact:
– Extracting audit data from the active audit
log can impact database performance.
– General performance issues when
producing a large volume of audit data.


Figure 11-4. Limitations for audit support prior to DB2 9.5 CL4636.0

Notes:
In order to address the major limitations of database auditing in previous DB2 releases,
DB2 9.5 improved and extended the facilities for database auditing.
For example, prior to DB2 9.5:
• The db2audit command set auditing option for the DB2 instance, so all of the databases
in the instance are controlled by that single audit configuration.
• The audit log file resides in a fixed location. For DPF databases, this might be an NFS
shared file system.
• The configuration allowed the administrator to select categories of audit data but there
was no method to limit auditing to selected users or certain tables.
• If the audit requirements included the SQL statement text, the generation of Dynamic
SQL text required using the CONTEXT option which generates a large volume of data.
• The EXTRACT option of the db2audit command retrieved data from the active audit file,
which could impact database performance.


Beginning with DB2 9.5, the audit facilities were enhanced, including these options:
• The location for the active audit log files can be configured. This makes it easier for DPF
partitioned databases to specify a local disk on each database server to hold the audit
data.
• Auditing can be enabled and configured at the database or instance level.
• The generation of audit records can be limited to activity for selected users or tables.


Instructor notes:
Purpose — To provide some background information regarding the limitations of auditing
DB2 databases prior to DB2 9.5 and to introduce some of the features for database
auditing beginning with DB2 9.5. These new options will be explained in more detail in this
lecture.
Details —
Additional information —
Transition statement — Next we will compare the traditional instance-level auditing of
DB2 with database-level auditing.



DB2 Database Audit features: Part 1


• Two distinct and independent methods for auditing database
activity:
– Instance level audit:
• The db2audit command can be utilized to configure and manage
instance level audit data collection
• Requires SYSADM authority

– Database level audit:


• Management requires SECADM authority for the DB2 database
• SECADM creates audit policy objects in a database using CREATE
AUDIT POLICY statements
• SECADM uses the AUDIT statement to assign an audit policy to the
database or to some specific type of activity, including a table, a
defined database role or an individual user
• DB2-provided stored procedures may be used by the SECADM user to
archive database audit data and extract the audit data for analysis


Figure 11-5. DB2 Database Audit features: Part 1 CL4636.0

Notes:
The enhancements to the DB2 audit facility beginning with DB2 9.5 include fine grained
configuration, a new audit category, separate instance and database logs, and new ways to
customize the audit configuration.
With these options, you have control over exactly which database objects are audited and
you do not need to audit events that occur for database objects that you are not interested
in. Consequently, the performance impact of auditing (and its performance impact on other
database operations) can be better managed.
Individual databases can have their own audit configurations, as can particular objects
within a database, such as tables, or even users, groups, and roles. In addition to providing
easier access to the information that you need, this enhancement also improves
performance, because less data needs to be written to disk.
The security administrator is solely in control of configuring an audit for a database; the
system administrator (holding SYSADM authority) no longer has this authority. The security
administrator also has sufficient access to manipulate the audit log and extract audit data
records to files that can be easily loaded into DB2 tables.


The db2audit command was changed in DB2 9.5 to work with these enhanced auditing
options.
• The audit facility provides the ability to audit at both the instance and the individual
database level, independently recording all instance- and database-level activities with
separate logs for each level.
- The system administrator (who holds SYSADM authority) can use the db2audit tool
to configure an audit at the instance level and to control when information for that
audit is collected.
- The system administrator can also use the db2audit tool to archive both instance
and database audit logs and to extract audit data from archived logs of either type.
The security administrator (who holds SECADM authority) can use audit policies
with the SQL statement AUDIT to configure and control the audit requirements for
an individual database. The security administrator can use the
SYSPROC.AUDIT_ARCHIVE and SYSPROC.AUDIT_DELIM_EXTRACT stored
procedures and the SYSPROC.AUDIT_LIST_LOGS table function to archive audit
logs, locate logs of interest, and extract data into delimited files for analysis.
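As a small illustration of this workflow, the security administrator could list the archived
logs available in a directory with the AUDIT_LIST_LOGS table function; the archive path
shown here is an assumption:

SELECT * FROM TABLE(SYSPROC.AUDIT_LIST_LOGS('/auditarchive')) AS T

The rows returned identify the archived log files that can then be passed to the
AUDIT_DELIM_EXTRACT procedure.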


Instructor notes:


Purpose — To begin to explain the enhancements and changes in audit support beginning
with DB2 9.5. A key point for students to understand is that they can continue to use the
db2audit command as a SYSADM user to perform instance-level auditing.
It is also important to understand that they do not need to use the db2audit command to
start, stop or configure the options for database-level audits and that a SECADM user will
perform most of the audit tasks for database-level auditing.
Details —
Additional information —
Transition statement — Next we will discuss the process of archiving audit data for
analysis and the EXECUTE category of audit data.


DB2 Database Audit features: Part 2


• The EXECUTE category for auditing SQL statements:
– Can only be used for database-level audits; it cannot be
configured using the db2audit command
– Reduces need to generate CONTEXT category audit
records

• Active audit log is archived before data is extracted:


– Reduces performance issue with extracting data from the
active audit log file
– SYSADM user can use the db2audit command to archive
instance or database level audit data
– SECADM uses the AUDIT_ARCHIVE stored procedure to
archive database level audit data

• The location for Active and Archived audit data can be configured:
– SYSADM user uses the db2audit command to configure the
location for database and instance level active and archived
audit logs
– Allows DPF databases to use local file system for audit log

Figure 11-6. DB2 Database Audit features: Part 2 CL4636.0

Notes:
The audit category, EXECUTE, which can only be configured for database-level auditing,
allows you to audit just the SQL statement that is being run. In DB2 releases prior to DB2
9.5, you needed to audit the CONTEXT event to capture this detail.
With the introduction of archiving for audit data, the db2audit command prune option has
been removed. You should archive the audit logs on a regular basis (such as once a day or
week); after you have extracted the data that you need from the archived files, you can
delete them or store them offline.
Archiving the audit log moves the active audit log to an archive directory while the server
begins writing to a new, active audit log. This allows the audit log to be stored offline
without having to extract data from it until necessary. After the security administrator or
system administrator has archived a log, they can extract data from the log into delimited
files. The data in the delimited files can be loaded into DB2 database tables for analysis.
The db2audit extract command parameter requires the specification of the names of the
archived audit log files to be used for input. The extract command parameter allows you to

specify which categories to extract and whether to extract success or failure events (or
both).
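For example, a SECADM user might archive and then extract database-level audit data
entirely through the stored procedures. This is only a sketch; the paths, the database name
in the file mask, and the category filter are illustrative assumptions:

CALL SYSPROC.AUDIT_ARCHIVE('/auditarchive', -2)
CALL SYSPROC.AUDIT_DELIM_EXTRACT(NULL, '/auditdelasc', '/auditarchive',
'db2audit.db.MUSICDB.log.0.%', 'category execute status both')

Here -2 requests the archive on all database partitions, and the extracted delimited files
are written to /auditdelasc, ready to be loaded into DB2 tables.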


Instructor notes:
Purpose — To introduce the EXECUTE category of audit data and to discuss the need to
archive the active audit log before extracting data for analysis.
Details —
Additional information —
Transition statement — Let’s take a look at the options for the db2audit command.


db2audit command used to manage instance-level auditing
>>-db2audit--+-configure--+-reset-------------------+-+--------><
| '-| Audit Configuration |-' |
+-describe-------------------------------+
+-extract--| Audit Extraction |----------+
+-flush----------------------------------+
+-archive--| Audit Log Archive |---------+
+-start----------------------------------+
+-stop-----------------------------------+
'-?--------------------------------------'

Audit Configuration

|--+--------------------------------------------------+--------->
| .-,-------------------------------------. |
| V | |
'-scope------+-all------+--status--+-both----+---+-'
+-audit----+ +-none----+
+-checking-+ +-failure-+
+-context--+ '-success-'
+-objmaint-+
+-secmaint-+
+-sysadmin-+
'-validate-'

>--+-----------------------+--+---------------------------+----->
'-errortype--+-audit--+-' '-datapath--audit-data-path-'
'-normal-'

>--+---------------------------------+--------------------------|
'-archivepath--audit-archive-path-'


Figure 11-7. db2audit command used to manage instance-level auditing CL4636.0

Notes:

db2audit – Audit facility administrator tool command


DB2 provides an audit facility to assist in the detection of unknown or unanticipated access
to data. The DB2 Audit facility generates and permits the maintenance of an audit trail for a
series of predefined database events. The records generated from this facility are kept in
audit log files. The analysis of these records can reveal usage patterns which would identify
system misuse. Once identified, actions can be taken to reduce or eliminate such system
misuse. The audit facility acts at both the instance and database levels, independently
recording all activities in separate logs based on either the instance or the database.
DB2 provides the ability to independently audit at both the instance and at the individual
database level. The db2audit tool is used to configure audit at the instance level as well as
control when such audit information is collected.
Authorized users of the audit facility can control the following actions within the audit
facility, using db2audit:


• Start recording auditable events within the DB2 instance. This does not include
database-level activities.
• Stop recording auditable events within the DB2 instance.
• Configure the behavior of the audit facility at the instance-level only.
• Select the categories of the auditable events to be recorded at the instance-level only.
• Request a description of the current audit configuration for the instance.
• Flush any pending audit records from the instance and write them to the audit log.
• Archive audit records from the current audit log for either the instance or a database
under the instance.
• Extract audit records from an archived audit log by formatting and copying them to a flat
file or ASCII delimited file. Extraction is done in preparation for analysis of log records.
Authorization: sysadm
Command syntax:
>>-db2audit--+-configure--+-reset-------------------+-+--------><
| '-| Audit Configuration |-' |
+-describe-------------------------------+
+-extract--| Audit Extraction |----------+
+-flush----------------------------------+
+-archive--| Audit Log Archive |---------+
+-start----------------------------------+
+-stop-----------------------------------+
'-?--------------------------------------'
Audit Configuration:
|--+--------------------------------------------------+--------->
| .-,-------------------------------------. |
| V | |
'-scope------+-all------+--status--+-both----+---+-'
+-audit----+ +-none----+
+-checking-+ +-failure-+
+-context--+ '-success-'
+-objmaint-+
+-secmaint-+
+-sysadmin-+
'-validate-'
>--+-----------------------+--+---------------------------+----->
'-errortype--+-audit--+-' '-datapath--audit-data-path-'
'-normal-'
>--+---------------------------------+--------------------------|
'-archivepath--audit-archive-path-'


Audit Extraction:


.-file--output-file------------------------------------------.
|--+------------------------------------------------------------+-->
'-delasc--+---------------------------+--+-----------------+-'
'-delimiter--load-delimiter-' '-to--delasc-path-'
>--+-+---------------------+---------------------------------+-->
| '-status--+-failure-+-' |
| '-success-' |
| .-,-----------------------------------------. |
| V | |
'-category------+-audit----+--+---------------------+---+-'
+-checking-+ '-status--+-both----+-'
+-context--+ +-failure-+
+-execute--+ '-success-'
+-objmaint-+
+-secmaint-+
+-sysadmin-+
'-validate-'
>--from--+--------------------+--files--input-log-files---------|
'-path--archive-path-'
Audit Log Archive:
|--+-------------------------+---------------------------------->
'-database--database-name-'
>--+-------------------------------+--+------------------+------|
'-node--+---------------------+-' '-to--archive-path-'
'-current-node-number-'

Examples:
This is a typical example of how to archive and extract a delimited ASCII file in a DPF
environment.
The UNIX remove (rm) command deletes the old delimited ASCII files.
• rm /auditdelasc/*.del
• db2audit flush
• db2audit archive database mydb to /auditarchive (files will be indicated for use in next
step)
• db2audit extract delasc to /auditdelasc from files /auditarchive
/db2audit.db.mydb.log.*.20070514102856
• load the del files into a DB2 table…
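A minimal sketch of that final step, assuming audit tables such as AUDIT.CHECKING have
been created beforehand (for example, from the db2audit.ddl script that ships in the
sqllib/misc directory):

db2 connect to mydb
db2 "LOAD FROM /auditdelasc/checking.del OF DEL INSERT INTO AUDIT.CHECKING"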


Usage notes
• The instance-level audit facility must be stopped and started explicitly. When starting,
the audit facility uses existing audit configuration information. Since the audit facility is
independent of the DB2 database server, it will remain active even if the instance is
stopped. In fact, when the instance is stopped, an audit record might be generated in
the audit log.
• Ensure that the audit facility has been turned on by issuing the db2audit start
command before using the audit utilities.
• There are different categories of audit records that might be generated. In the
description of the categories of events available for auditing (below), you should notice
that following the name of each category is a one-word keyword used to identify the
category type. The categories of events available for auditing are:
- Audit (audit). Generates records when audit settings are changed or when the audit
log is accessed.
- Authorization Checking (checking). Generates records during authorization
checking of attempts to access or manipulate DB2 database objects or functions.
- Object Maintenance (objmaint). Generates records when creating or dropping data
objects.
- Security Maintenance (secmaint). Generates records when granting or revoking:
object or database privileges, or DBADM authority. Records are also generated
when the database manager security configuration parameters SYSADM_GROUP,
SYSCTRL_GROUP, or SYSMAINT_GROUP are modified.
- System Administration (sysadmin). Generates records when operations requiring
SYSADM, SYSMAINT, or SYSCTRL authority are performed.
- User Validation (validate). Generates records when authenticating users or
retrieving system security information.
- Operation Context (context). Generates records to show the operation context
when an instance operation is performed. This category allows for better
interpretation of the audit log file. When used with the log's event correlator field, a
group of events can be associated back to a single database operation.
- You can audit failures, successes, both or none.
• Any operation on the instance might generate several records. The actual number of
records generated and moved to the audit log depends on the number of categories of
events to be recorded as specified by the audit facility configuration. It also depends on
whether successes, failures, or both, are audited. For this reason, it is important to be
selective of the events to audit.
• To clean up and/or view audit logs, run archive on a regular basis, then run extract on
the archived file to save what is useful. The audit logs can then be deleted with standard
file system delete commands.


Instructor notes:


Purpose — To discuss the use of the db2audit command to configure and manage
auditing at the DB2 instance level. This shows the options available starting with DB2 9.5,
which added the support to archive audit data and eliminated the option to prune audit
records.
Details —
Additional information —
Transition statement — Let’s take a look at some examples of using the db2audit
command.


db2audit command examples


• To set audit categories objmaint and secmaint to collect all activity:
– db2audit configure scope objmaint status both, secmaint status both

• To begin collecting audit data records for configured categories:


– db2audit start

• To archive the active instance-level audit records:


– db2audit archive

• To extract the archived audit data to delimited files for loading into DB2
tables:
– db2audit extract delasc to /auditdelasc
from files /auditarchive/db2audit.instance.log.*


Figure 11-8. db2audit command examples CL4636.0

Notes:
Here are some examples of instance-level audit management using the db2audit
command.
The first step is to configure the audit categories that need to be collected.
For example, to set the audit categories objmaint and secmaint to collect both successes
and failures, the following command could be used:
db2audit configure scope objmaint status both, secmaint status both
The instance-level audit facility must be stopped and started explicitly. When starting, the
audit facility uses existing audit configuration information. The following command could be
used to begin collecting audit data for the instance:
db2audit start
The audit data must be archived before it can be extracted or saved to another location.
The following command can be used to archive the current instance-level audit data:
db2audit archive


The following db2audit command could be used to extract a set of delimited files from the
archived instance-level audit files in a specified disk directory:
db2audit extract delasc to /auditdelasc
from files /auditarchive/db2audit.instance.log.*


Instructor notes:
Purpose — To show examples of using the db2audit command to manage instance-level
auditing.
Details —
Additional information —
Transition statement — Next we will compare instance and database-level audit
management.



DB2 instance and database auditing


• Instance-level auditing:
– Managed by user in SYSADM_GROUP defined by DBM CFG
– Requires db2audit start command to begin collection of audit data
– Logs all configured categories for all activity in a DB2 instance

• Database-level auditing:
– Managed by user with SECADM authority for a database
– Does NOT require db2audit start command to begin data collection
– Logs audit data based on assigned audit policies

• Both types of auditing:


– Utilize audit buffer defined by AUDIT_BUF_SZ in DBM CFG
• Synchronous writes if AUDIT_BUF_SZ = 0 can affect performance
• Asynchronous writes if AUDIT_BUF_SZ > 0


Figure 11-9. DB2 instance and database auditing CL4636.0

Notes:
The management of DB2 instance-level auditing must be performed by a user in the
SYSADM_GROUP defined in the DBM configuration. The db2audit start and stop options
are used to begin and suspend audit data collection for instance-level auditing. The
instance-level auditing lacks the flexibility of database-level auditing.
Database-level auditing must be managed by a user with SECADM authority for the DB2
database. No db2audit start command is necessary to begin collection of database-level
audit data. As soon as an audit policy is assigned to some part of the database using the
AUDIT statement, DB2 will begin to collect the necessary audit records. The audit policies
provide for very flexible collection of specific audit records.
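To illustrate how little is required, the following sketch is enough for a SECADM user to
start database-level collection of authentication events; the policy and database names are
placeholders, and each AUDIT-exclusive statement must be followed by a COMMIT, as
discussed later in this unit:

CONNECT TO MUSICDB
CREATE AUDIT POLICY VALIDATEPOL CATEGORIES VALIDATE STATUS BOTH ERROR TYPE NORMAL
COMMIT
AUDIT DATABASE USING POLICY VALIDATEPOL
COMMIT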

Controlling the timing of writing audit records to the active log


The writing of the audit records to the active log can take place synchronously or
asynchronously with the occurrence of the events causing the generation of those records.
The value of the audit_buf_sz database manager configuration parameter determines
when the writing of audit records is done.


If the value of audit_buf_sz is zero (0), the writing is done synchronously. The event
generating the audit record waits until the record is written to disk. The wait associated with
each record causes the performance of the DB2 database to decrease.
If the value of audit_buf_sz is greater than zero, the record writing is done asynchronously.
The value of the audit_buf_sz when it is greater than zero is the number of 4 KB pages
used to create an internal buffer. The internal buffer is used to keep a number of audit
records before writing a group of them out to disk. The statement generating the audit
record as a result of an audit event will not wait until the record is written to disk, and can
continue its operation.
In the asynchronous case, it could be possible for audit records to remain in an unfilled
buffer for some time. To prevent this from happening for an extended period, the database
manager forces the writing of the audit records regularly. An authorized user of the audit
facility might also flush the audit buffer with an explicit request. Also, the buffers are
automatically flushed during an archive operation.
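For example, to switch the instance to asynchronous audit writes with a 16-page (64 KB)
audit buffer, and to force any buffered records to disk on demand (the buffer size typically
requires an instance restart to take effect):

db2 update dbm cfg using AUDIT_BUF_SZ 16
db2stop
db2start
db2audit flush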


Instructor notes:


Purpose — To highlight some differences between the instance and database-level
auditing, like the requirement for the db2audit start command for instance auditing. The
setting of the AUDIT_BUF_SZ option affects the writing of audit records for both types of
auditing.
Details —
Additional information —
Transition statement — Next we will show how the location for storing the audit data can
be configured.


Audit path configuration


• Default audit data path is ~/sqllib/security/auditdata

• Data path configuration supports Database Partition Expressions:
– Previously used in Automatic Storage
– Necessary in a cluster environment for HA failover
"db2audit configure datapath /auditForNode $N"

• An Archive Path can also be configured:


– Sets the default path for writing archived audit log files
– db2audit and stored procedures are aware of the path, so you can
leave these optional parameters blank
"db2audit configure archivepath /auditarchives"

Figure 11-10. Audit path configuration CL4636.0

Notes:

Storage of audit logs


The system administrator can configure the path for the active audit log and the archived
audit log using the db2audit configure command. Archiving the audit log moves the active
audit log to an archive directory while the server begins writing to a new, active audit log.
This allows the audit log to be stored offline without having to extract data from it until
necessary. After the security administrator or system administrator has archived a log, they
can extract data from the log into delimited files. The data in the delimited files can be
loaded into DB2 database tables for analysis.
Configuring the location of the audit logs allows you to place the audit logs on a large,
high-speed disk, with the option of having separate disks for each node in a database
partitioning feature (DPF) installation. In a DPF environment, the path for the active audit
log can be a directory that is unique to each node. Having a unique directory for each node
helps to avoid file contention, because each node is writing to a different disk.


The default path for the audit logs on Windows operating systems is
instance\security\auditdata and on Linux and UNIX operating systems is
instance/security/auditdata. If you do not want to use the default location, you can choose
different directories (you can create new directories on your system to use as alternative
locations, if they do not already exist).
To set the path for the active audit log location and the archived audit log location, use the
db2audit configure command with the datapath and archivepath parameters, as shown
in this example:
db2audit configure datapath /auditlog archivepath /auditarchive
The audit log storage locations you set using db2audit apply to all databases in the
instance.

Note

If there are multiple instances on the server, then each instance should have separate
data and archive paths.

The path for active audit logs (datapath) in a DPF environment


In a DPF environment, the same active audit log location (set by the datapath parameter)
must be used on each partition.
There are two ways to accomplish this:
• Use database partition expressions when you specify the datapath parameter. Using
database partition expressions allows the partition number to be included in the path of
the audit log files and results in a different path on each database partition.
• Use a shared drive that is the same on all nodes.
You can use database partition expressions anywhere within the value you specify for the
datapath parameter.
For example, on a three-node system, where the database partition numbers are 10, 20,
and 30, the following command:
db2audit configure datapath '/pathForNode $N'
creates the following paths:
/pathForNode10
/pathForNode20
/pathForNode30


Note

You cannot use database partition expressions to specify the archive log file path
(archivepath parameter).

Archiving active audit logs


The system administrator can use the db2audit tool to archive both instance and database
audit logs as well as to extract audit data from archived logs of either type. To archive the
active audit log, the security administrator can use the SYSPROC.AUDIT_ARCHIVE
stored procedure. To extract data from the log and load it into delimited files, the security
administrator can use the SYSPROC.AUDIT_DELIM_EXTRACT stored procedure.
The archived log files do not need to be immediately loaded into tables for analysis; they
can be saved for future analysis. For example, they might only need to be looked at when a
corporate audit is taking place.
If a problem occurs during archive, such as running out of disk space in the archive path, or
the archive path does not exist, the archive process fails and an interim log file with the file
extension .bk is generated in the audit log data path, for example,
db2audit.instance.log.0.20070508172043640941.bk. After the problem is resolved (by
allocating sufficient disk space in the archive path or by creating the archive path), you
must move this interim log to the archive path. Then, you can treat it in the same way as a
successfully archived log.

Archiving active audit logs in a DPF environment


In a DPF environment, if the archive command is issued while the instance is running, the
archive process automatically runs on every node. The same timestamp is used in the
archived log file name on all nodes. For example, on a three-node system, where the
database partition numbers are 10, 20, and 30, the following command:
db2audit archive to /auditarchive
Creates the following files:
/auditarchive/db2audit.log.10.timestamp
/auditarchive/db2audit.log.20.timestamp
/auditarchive/db2audit.log.30.timestamp
If the archive command is issued while the instance is not running, you can control on
which node the archive is run by one of the following methods:
• Use the node option with the db2audit command to perform the archive for the current
node only.
• Use the db2_all command to run the archive on all nodes.


For example: db2_all db2audit archive node to /auditarchive


This sets the DB2NODE environment variable to indicate on which nodes the command
is invoked.
Alternatively, you can issue an individual archive command on each node separately.
For example: 
On node 10: db2audit archive node 10 to /auditarchive
On node 20: db2audit archive node 20 to /auditarchive
On node 30: db2audit archive node 30 to /auditarchive

Note

• When the instance is not running, the timestamps in the archived audit log file names
are not the same on each node.
• It is recommended that the archive path is shared across all nodes, but it is not
required.


Instructor notes:
Purpose — To discuss the options to configure the locations for the active audit logs and
archived audit logs.
Details —
Additional information —
Transition statement — Next we will show some examples of setting the audit log location
using the db2audit command.


Example of configured audit data and archive paths
db2audit describe
DB2 AUDIT SETTINGS:

Audit active: "FALSE "


Log audit events: "FAILURE"
Log checking events: "BOTH"
Log object maintenance events: "FAILURE"
Log security maintenance events: "BOTH"
Log system administrator events: "FAILURE"
Log validate events: "FAILURE"
Log context events: "BOTH"
Return SQLCA on audit error: "FALSE "
Audit Data Path: "/home/inst461/auditdata/"
Audit Archive Path: "/home/inst461/auditarch/"

AUD0000I Operation succeeded.

Audit Data Path: "/home/inst461/auditdata/" contains:

db2audit.db.MUSICDB.log.0
db2audit.instance.log.0

Audit Archive Path: "/home/inst461/auditarch/" contains:

db2audit.db.MUSICDB.log.0.20080130232517
db2audit.db.MUSICDB.log.0.20080131000052
db2audit.db.MUSICDB.log.0.20080201002834
db2audit.db.MUSICDB.log.0.20080201190448
db2audit.instance.log.0.20080201002744
db2audit.instance.log.0.20080201181532


Figure 11-11. Example of configured audit data and archive paths CL4636.0

Notes:
The visual shows an example of the db2audit describe command output that lists the
current status for instance-level audit options, including the audit data and archive paths.

Audit log file names


The audit log files have names that distinguish whether they are instance-level or
database-level logs and which partition they originate from in a database partitioning
feature (DPF) environment. Archived audit logs have the timestamp of when the archive
command was run appended to their file name.

Active audit log file names


In a DPF environment, the path for the active audit log can be a directory that is unique to
each partition so that each partition writes to an individual file. In order to accurately track
the origin of audit records, the partition number is included as part of the audit log file
name.


For example, on partition 20, the instance-level audit log file name is
db2audit.instance.log.20.
For a database called testdb in this instance, the audit log file is
db2audit.db.testdb.log.20.
In a non-DPF environment the partition number is considered to be 0 (zero). In this case,
the instance-level audit log file name is db2audit.instance.log.0.
For a database called testdb in this instance, the audit log file is db2audit.db.testdb.log.0.

Archived audit log file names


When the active audit log is archived, the current timestamp in the following format is
appended to the filename: YYYYMMDDHHMMSS (where YYYY is the year, MM is the
month, DD is the day, HH is the hour, MM is the minutes, and SS is the seconds).
The file name format for an archive audit log depends on the level of the audit log:
• instance-level archived audit log - The file name of the instance-level archived audit
log is: db2audit.instance.log.partition.YYYYMMDDHHMMSS.
• database-level archived audit log - The file name of the database-level archived
audit log is: db2audit.db.database.log.partition.YYYYMMDDHHMMSS.
In a non-DPF environment, the value for partition is 0 (zero).
The timestamp represents the time that the archive command was run; therefore, it
does not always precisely reflect the time of the last record in the log. The archived
audit log file might contain records with timestamps a few seconds later than the
timestamp in the log file name because:
• When the archive command is issued, the audit facility waits for the writing of any
in-process records to complete before creating the archived log file.
• In a multi-machine environment, the system time on a remote machine might not
be synchronized with the machine where the archive command is issued.
In a DPF environment, if the server is running when archive is run, the timestamp is
consistent across partitions and reflects the timestamp generated at the partition at
which the archive was performed.


Instructor notes:


Purpose — To show an example of the db2audit describe output which indicates the
current paths for audit data and archive files. It also shows the naming conventions used
for the new instance and database audit files.
Details —
Additional information —
Transition statement — Next we will discuss how to create audit policies for data-level
auditing.


Creating Audit policies


• Only SECADM user can create an audit policy for use in a database
• Syntax:
>>-CREATE AUDIT POLICY--policy-name--●--CATEGORIES-------------->
.-,--------------------------------------------------------.
V (1) |
>----------+-ALL-----------------------+--STATUS--+-BOTH----+-+-->
+-AUDIT---------------------+ +-FAILURE-+
+-CHECKING------------------+ +-NONE----+
+-CONTEXT-------------------+ '-SUCCESS-'
| .-WITHOUT DATA-. |
+-EXECUTE--+--------------+-+
| '-WITH DATA----' |
+-OBJMAINT------------------+
+-SECMAINT------------------+
+-SYSADMIN------------------+
'-VALIDATE------------------'

>--●--ERROR TYPE--+-NORMAL-+--●--------------------------------><
'-AUDIT--'

• Example:
CREATE AUDIT POLICY DBAUDPRF
CATEGORIES AUDIT STATUS BOTH,
SECMAINT STATUS FAILURE,
OBJMAINT STATUS BOTH,
CHECKING STATUS FAILURE,
VALIDATE STATUS FAILURE
ERROR TYPE NORMAL


Figure 11-12. Creating Audit policies CL4636.0

Notes:

CREATE AUDIT POLICY statement


The CREATE AUDIT POLICY statement defines an auditing policy at the current server.
The policy determines what categories are to be audited; it can then be applied to other
database objects to determine how the use of those objects is to be audited.
Invocation
This statement can be embedded in an application program or issued interactively. It is an
executable statement that can be dynamically prepared only if DYNAMICRULES run
behavior is in effect for the package (SQLSTATE 42509).
Authorization
The privileges held by the authorization ID of the statement must include SECADM
authority.


Syntax
>>-CREATE AUDIT POLICY--policy-name--●--CATEGORIES-------------->
.-,--------------------------------------------------------.
V (1) |
>----------+-ALL-----------------------+--STATUS--+-BOTH----+-+-->
+-AUDIT---------------------+ +-FAILURE-+
+-CHECKING------------------+ +-NONE----+
+-CONTEXT-------------------+ '-SUCCESS-'
| .-WITHOUT DATA-. |
+-EXECUTE--+--------------+-+
| '-WITH DATA----' |
+-OBJMAINT------------------+
+-SECMAINT------------------+
+-SYSADMIN------------------+
'-VALIDATE------------------'
>--●--ERROR TYPE--+-NORMAL-+--●--------------------------------><
'-AUDIT--'
Each category can be specified at most once (SQLSTATE 42614), and no other category
can be specified if ALL is specified (SQLSTATE 42601).
Description
policy-name: Names the audit policy. This is a one-part name. It is an SQL identifier
(either ordinary or delimited). The policy-name must not identify an audit policy already
described in the catalog (SQLSTATE 42710). The name must not begin with the characters
'SYS' (SQLSTATE 42939).
CATEGORIES
A list of one or more audit categories for which a status is specified. If ALL is not specified,
the STATUS of any category that is not explicitly specified is set to NONE.
ALL – Sets all categories to the same status. The EXECUTE category is WITHOUT
DATA.
AUDIT – Generates records when audit settings are changed or when the audit log is
accessed.
CHECKING – Generates records during authorization checking of attempts to access
or manipulate database objects or functions.
CONTEXT – Generates records to show the operation context when a database
operation is performed.
EXECUTE – Generates records to show the execution of SQL statements.
WITHOUT DATA or WITH DATA – Specifies whether or not input data values
provided for any host variables and parameter markers should be logged as part of
the EXECUTE category. 
WITHOUT DATA – Input data values provided for any host variables and parameter

markers are not logged as part of the EXECUTE category. WITHOUT DATA is the
default. 
WITH DATA – Input data values provided for any host variables and parameter
markers are logged as part of the EXECUTE category. Not all input values are
logged; specifically, LOB, LONG, XML, and structured type parameters appear as
the null value. Date, time, and timestamp fields are logged in ISO format. The input
data values are converted to the database code page before being logged. If code
page conversion fails, no errors are returned and the unconverted data is logged.
OBJMAINT – Generates records when data objects are created or dropped.
SECMAINT – Generates records when object privileges, database privileges, or
DBADM authority is granted or revoked. Records are also generated when the
database manager security configuration parameters sysadm_group, sysctrl_group,
or sysmaint_group are modified.
SYSADMIN – Generates records when operations requiring SYSADM, SYSMAINT, or
SYSCTRL authority are performed.
VALIDATE – Generates records when users are authenticated or when system security
information related to a user is retrieved.
STATUS: Specifies a status for the specified category.
BOTH – Successful and failing events will be audited.
FAILURE – Only failing events will be audited.
SUCCESS – Only successful events will be audited.
NONE – No events in this category will be audited.
ERROR TYPE: Specifies whether audit errors are to be returned or ignored.
NORMAL – Any errors generated by the audit are ignored and only the SQLCODEs for
errors associated with the operation being performed are returned to the application.
AUDIT – All errors, including errors occurring within the audit facility itself, are returned
to the application.
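As an additional sketch, the following policy captures executed SQL statements together
with their input data values; the policy name STMTAUDIT is illustrative, and the COMMIT is
required by the rules that follow:

CREATE AUDIT POLICY STMTAUDIT
CATEGORIES EXECUTE WITH DATA STATUS BOTH
ERROR TYPE NORMAL
COMMIT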
Rules:
• An AUDIT-exclusive SQL statement must be followed by a COMMIT or ROLLBACK
statement (SQLSTATE 5U021). AUDIT-exclusive SQL statements are:
- AUDIT
- CREATE AUDIT POLICY, ALTER AUDIT POLICY, or DROP (AUDIT POLICY)
- DROP (ROLE or TRUSTED CONTEXT if it is associated with an audit policy)
• An AUDIT-exclusive SQL statement cannot be issued within a global transaction
(SQLSTATE 51041), such as an XA transaction.


Additional Notes


Only one uncommitted AUDIT-exclusive SQL statement is allowed at a time across all
database partitions. If an uncommitted AUDIT-exclusive SQL statement is executing,
subsequent AUDIT-exclusive SQL statements wait until the current AUDIT-exclusive SQL
statement commits or rolls back.
Changes are written to the system catalog, but do not take effect until they are committed,
even for the connection that issues the statement.
Example
Create an audit policy to audit successes and failures for the AUDIT and OBJMAINT
categories; only failures for the SECMAINT, CHECKING, and VALIDATE categories, and
no events for the other categories.
CREATE AUDIT POLICY DBAUDPRF
CATEGORIES AUDIT STATUS BOTH,
SECMAINT STATUS FAILURE,
OBJMAINT STATUS BOTH,
CHECKING STATUS FAILURE,
VALIDATE STATUS FAILURE
ERROR TYPE NORMAL


Instructor notes:
Purpose — To show the syntax and an example of creating an audit policy. This can only
be performed by a SECADM user.
Details —
Additional information —
Transition statement — Next we will discuss using the AUDIT statement to assign, alter
or remove an audit policy.


Audit policies are assigned using AUDIT statement
• The AUDIT statement determines the audit policy that is to be used for a particular database or database object at the
current server. Whenever the object is in use, it is audited according to that policy.
>>-AUDIT----------+-DATABASE----------------------+-+----------->
+-TABLE--table-name-------------+
+-TRUSTED CONTEXT--context-name-+
+-+-USER--+--authorization-name-+
| +-GROUP-+ |
| '-ROLE--' |
'-+-ACCESSCTRL-+----------------'
+-DATAACCESS-+
+-DBADM------+
+-SECADM-----+
+-SQLADM-----+
+-SYSADM-----+
+-SYSCTRL----+
+-SYSMAINT---+
+-SYSMON-----+
'-WLMADM-----'

>--+-+-USING---+--POLICY--policy-name-+------------------------><
| '-REPLACE-' |
'-REMOVE POLICY--------------------'
• Examples:
AUDIT DATABASE USING POLICY DBAUDPRF
AUDIT ROLE TELLER REPLACE POLICY TELLERPRF
AUDIT TABLE EMPLOYEE REMOVE POLICY


Figure 11-13. Audit policies are assigned using AUDIT statement CL4636.0

Notes:
AUDIT statement
The AUDIT statement determines the audit policy that is to be used for a particular
database or database object at the current server. Whenever the object is in use, it is
audited according to that policy.
Invocation
This statement can be embedded in an application program or issued interactively. It is
an executable statement that can be dynamically prepared only if DYNAMICRULES run
behavior is in effect for the package (SQLSTATE 42509).
Authorization
The privileges held by the authorization ID of the statement must include SECADM
authority.

Syntax
.-,---------------------------------------.
V (1) |
>>-AUDIT----------+-DATABASE----------------------+-+----------->
+-TABLE--table-name-------------+
+-TRUSTED CONTEXT--context-name-+
+-+-USER--+--authorization-name-+
| +-GROUP-+ |
| '-ROLE--' |
'-+-ACCESSCTRL-+----------------'
+-DATAACCESS-+
+-DBADM------+
+-SECADM-----+
+-SQLADM-----+
+-SYSADM-----+
+-SYSCTRL----+
+-SYSMAINT---+
+-SYSMON-----+
'-WLMADM-----'

>--+-+-USING---+--POLICY--policy-name-+------------------------><
| '-REPLACE-' |
'-REMOVE POLICY--------------------'
Description
DATABASE – Specifies that an audit policy is to be associated with or removed from
the database at the current server. All auditable events that occur within the database
are audited according to the associated audit policy.
TABLE table-name – Specifies that an audit policy is to be associated with or removed
from table-name. The table-name must identify a table, materialized query table (MQT),
or nickname that exists at the current server (SQLSTATE 42704). It cannot be a view, a
catalog table, a declared temporary table (SQLSTATE 42995), or a typed table
(SQLSTATE 42997). Only EXECUTE category audit events, with or without data, will be
generated when the table is accessed, even if the policy indicates that other categories
should be audited.
TRUSTED CONTEXT context-name – Specifies that an audit policy is to be
associated with or removed from context-name. The context-name must identify a
trusted context that exists at the current server (SQLSTATE 42704). All auditable
events that happen within the trusted connection defined by the trusted context
context-name will be audited according to the associated audit policy.
USER authorization-name – Specifies that an audit policy is to be associated with or
removed from the user with authorization ID authorization-name. All auditable events
that are initiated by authorization-name will be audited according to the associated audit
policy.
GROUP authorization-name – Specifies that an audit policy is to be associated with or
removed from the group with authorization ID authorization-name. All auditable events
that are initiated by users who are members of authorization-name will be audited
according to the associated audit policy. If user membership in a group cannot be
determined, the policy will not apply to that user.
ROLE authorization-name – Specifies that an audit policy is to be associated with or
removed from the role with authorization ID authorization-name. The
authorization-name must identify a role that exists at the current server (SQLSTATE
42704). All auditable events that are initiated by users who are members of
authorization-name will be audited according to the associated audit policy. Indirect role
membership through other roles or groups is valid.
ACCESSCTRL, DATAACCESS, DBADM, SECADM, SQLADM, SYSADM, SYSCTRL,
SYSMAINT, SYSMON, or WLMADM – Specifies that an audit policy is to be associated
with or removed from the specified authority. All auditable events that are initiated by a
user who holds the specified authority, even if that authority is not required for the
event, will be audited according to the associated audit policy.
USING, REMOVE, or REPLACE
Specifies whether the audit policy should be used, removed, or replaced for the
specified object.
• USING – Specifies that the audit policy is to be used for the specified object. An
existing audit policy must not already be defined for the object (SQLSTATE
5U041). If an audit policy already exists, it must be removed or replaced.
• REMOVE – Specifies that the audit policy is to be removed from the specified
object. Use of the object will no longer be audited according to the audit policy.
The association is deleted from the catalog when the audit policy is removed
from the object.
• REPLACE – Specifies that the audit policy is to replace an existing audit policy
for the specified object. This combines both REMOVE and USING options into
one step to ensure that there is no period of time in which an audit policy does
not apply to the specified object. If a policy was not in use for the specified
object, REPLACE is equivalent to USING.
POLICY policy-name – Specifies the audit policy that is to be used to determine audit
settings. The policy-name must identify an existing audit policy at the current server
(SQLSTATE 42704).
Statement Rules

An AUDIT-exclusive SQL statement must be followed by a COMMIT or ROLLBACK
statement (SQLSTATE 5U021). AUDIT-exclusive SQL statements are:
• AUDIT
• CREATE AUDIT POLICY, ALTER AUDIT POLICY, or DROP (AUDIT POLICY)
• DROP (ROLE or TRUSTED CONTEXT if it is associated with an audit policy)
An AUDIT-exclusive SQL statement cannot be issued within a global transaction
(SQLSTATE 51041), such as an XA transaction.
An object can be associated with no more than one policy (SQLSTATE 5U042).
Changes are written to the catalog, but do not take effect until after a COMMIT statement
executes.
Changes do not take effect until the next unit of work that references the object to which the
audit policy applies. For example, if the audit policy is in use for the database, no current
units of work will begin auditing according to the policy until after a COMMIT or a
ROLLBACK statement completes.
If the object with which an audit policy is associated is dropped, the association to the audit
policy is removed from the catalog and no longer exists. If that object is recreated at some
later time, the object will not be audited according to the policy that was associated with it
when the object was dropped.
Examples
Example 1: Use the audit policy DBAUDPRF to determine the audit settings for the
database at the current server.
AUDIT DATABASE USING POLICY DBAUDPRF
Example 2: Remove the audit policy from the EMPLOYEE table.
AUDIT TABLE EMPLOYEE REMOVE POLICY
Example 3: Use the audit policy POWERUSERS to determine the audit settings for the
authorities SYSADM, DBADM, and SECADM, as well as the group DBAS.
AUDIT SYSADM, DBADM, SECADM, GROUP DBAS USING POLICY POWERUSERS
Example 4: Replace the audit policy for the role TELLER with the new policy
TELLERPRF.
AUDIT ROLE TELLER REPLACE POLICY TELLERPRF

Instructor notes:


Purpose — To show the options and some examples of the AUDIT statement that will be
used by a SECADM user to assign an audit policy to a database or a database
object, like a role or table. The AUDIT statement is also used to change the audit policy
assigned to a database object or to remove the current audit policy from an object.
Details —
Additional information —
Transition statement — Let’s look at some additional information about using the AUDIT
statement.

Audit statement additional information


• Controls the association of audit policies with database objects:
  – Many objects can be audited according to a single audit policy
  – Only one audit policy can be associated with any one object

AUDIT DBADM, ROLE MARKETING,
      ROLE ACCOUNTING, ROLE SUPPORT
  USING POLICY SENSITIVE_DATA_POL;

• Audit policies are combined when multiple policies are in effect, for example:
  – group1 audited according to policy1
  – group2 audited according to policy2
  – a user Pat is a member of both groups
• Combined in an inclusive manner


Figure 11-14. Audit statement additional information CL4636.0

Notes:
Although each object can only be associated with a single audit policy, many objects can
be assigned the same policy. This means that all of the types of audit data needed at the
database level need to be assigned to a single audit policy for the database.
A single audit policy could be assigned to many tables or many users.
If audit policies are associated with groups or roles, a user might inherit the combined audit
types for all of the groups or roles that they are a member of.
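
Because only one policy can be attached to an object, broadening what is audited usually
means altering the policy that is already assigned rather than attaching a second one. A
hedged sketch, assuming a policy named SENSITIVE_DATA_POL is already in use:

-- add EXECUTE auditing to an existing policy without reassigning it
ALTER AUDIT POLICY SENSITIVE_DATA_POL
  CATEGORIES EXECUTE STATUS BOTH;
-- ALTER AUDIT POLICY is also AUDIT-exclusive, so commit immediately
COMMIT;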

Instructor notes:


Purpose — To discuss the implementation of database auditing. Since each object
including the database can only be assigned one audit policy at a time, the ALTER AUDIT
POLICY statement could be used to adjust the settings for the policy assigned to a
database, table, or user.
Details —
Additional information —
Transition statement — Next we will discuss assigning an audit policy at the database
level.

Audit granularity: The database

Granularity levels: Database, Tables, Authorities, Users, Groups, Roles, Trusted Connections

• All auditable events generated within a connection will be audited according to the
policy associated with the database:
AUDIT DATABASE USING POLICY AUDITDB


Figure 11-15. Audit granularity: The database CL4636.0

Notes:
The AUDIT statement can be used to assign or remove an audit policy at the database
level. All auditable events that occur within the database are audited according to the
associated audit policy.
Unlike instance-level auditing, which begins only when the db2audit start command is
issued, assigning an audit policy to any object, such as the database, means that audit
data collection begins immediately; there is no need to start it manually.
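
Policy assignments are recorded in the catalog, so a SECADM user can confirm what is
currently audited with an ordinary query. A minimal sketch against the SYSCAT.AUDITUSE
catalog view (column widths are illustrative):

SELECT SUBSTR(OBJECTSCHEMA,1,10) AS SCHEMA,
       SUBSTR(OBJECTNAME,1,18) AS OBJECT,
       OBJECTTYPE,
       SUBSTR(AUDITPOLICYNAME,1,18) AS POLICY
FROM SYSCAT.AUDITUSE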

Instructor notes:


Purpose — To discuss the assignment of an audit policy to a database. This statement is
run using a SECADM user, connected to the database where the new audit policy is
assigned.
Details —
Additional information —
Transition statement — Next we will look at assigning an audit policy at the table level.

Audit granularity: Tables

Granularity levels: Database, Tables, Authorities, Users, Groups, Roles, Trusted Connections

• Events generated whenever DML is executed against the audited table:
AUDIT TABLE EMPLOYEE REMOVE POLICY
• Supported table types:
  – Untyped tables
  – Materialized Query Tables (MQT)
  – Nicknames
• Views are audited according to base tables
• Only support for EXECUTE category


Figure 11-16. Audit granularity: Tables CL4636.0

Notes:
The AUDIT statement can be used to associate or remove an audit policy for a table. The
object could be a table, a materialized query table (MQT), or a nickname that exists at the
current server.
The object can NOT be a view, a catalog table, a declared temporary table (SQLSTATE
42995), or a typed table (SQLSTATE 42997).
Only EXECUTE category audit events, with or without data, will be generated when the
table is accessed, even if the policy indicates that other categories should be audited.
The audit policy that applies to a table does not apply to a materialized query table (MQT)
based on that table. It is recommended that if you associate an audit policy with a table,
you also associate that policy with any MQT based on that table. The compiler might
automatically use an MQT, even though an SQL statement references the base table;
however, the audit policy in use for the base table will still be in effect.
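Following that recommendation, here is a short sketch that keeps a base table and an MQT
defined on it under the same policy (the table, MQT, and policy names are illustrative):

AUDIT TABLE HR.EMPLOYEE USING POLICY TABAUDPOL;
AUDIT TABLE HR.EMPLOYEE_MQT USING POLICY TABAUDPOL;
-- AUDIT is AUDIT-exclusive, so commit before continuing
COMMIT;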

Instructor notes:


Purpose — To discuss assigning an audit policy to individual DB2 tables. It is important to
note that only EXECUTE type audit records are generated when an audit policy is
associated with a table, and that you need to associate the audit policies with the base
tables to collect audit data when applications use views.
Details —
Additional information —
Transition statement — Next we will discuss assigning an audit policy based on special
user authorities.

Audit granularity: Authorities

Granularity levels: Database, Tables, Authorities, Users, Groups, Roles, Trusted Connections

• Events are logged whenever the user generating the event holds one of the audited
authorities, even if that authority is not required for the event. For example:
AUDIT SYSADM, DBADM, SECADM, GROUP DBAS USING POLICY POWERUSERS
• Supported authorities:
  – SYSADM, DBADM, SECADM, SYSCTRL, SYSMAINT, SYSMON
  – Also: SQLADM, WLMADM, ACCESSCTRL, DATAACCESS, CREATE_SECURE_OBJECT


Figure 11-17. Audit granularity: Authorities CL4636.0

Notes:
An audit policy can be assigned to one or more of the special authorities for a DB2 instance
or database including:
• ACCESSCTRL
• DATAACCESS
• DBADM
• SECADM
• SQLADM
• SYSADM
• SYSCTRL
• SYSMAINT
• SYSMON
• WLMADM

All auditable events that are initiated by a user who holds the specified authority, even if
that authority is not required for the event, will be audited according to the associated audit
policy.
The example assigns the audit policy POWERUSERS to determine the audit settings for
the authorities SYSADM, DBADM, and SECADM, as well as a group named DBAS.

Instructor notes:
Purpose — This shows the assignment of an audit policy to one of the instance-level
authorities whose membership is defined by groups in the DBM CFG file (SYSADM_GROUP,
SYSCTRL_GROUP, SYSMAINT_GROUP, SYSMON_GROUP), or to a user that has been
granted DBADM or SECADM authority for a database.
Details —
Additional information —
Transition statement — Next we will discuss the assignment of an audit policy to a user or
collection of users in a group or role.

Granularity: Users, Groups and Roles

Granularity levels: Database, Tables, Authorities, Users, Groups, Roles, Trusted Connections

• User:
  – Events are logged whenever the user generating the event is associated with a policy
• Groups and Roles:
  – Events are logged whenever the user generating the event is a member of the group
or role that is associated with a policy

AUDIT ROLE TELLER REPLACE POLICY TELLERPOL


Figure 11-18. Granularity: Users, Groups and Roles CL4636.0

Notes:
The AUDIT statement can be used to assign or remove an audit policy from authorization
IDs representing users, groups, or roles. All auditable events that are initiated by the
specified user are audited according to the audit policy. The auditable events that are
initiated by users that are a member of the group or role are audited according to the audit
policy. Indirect role membership, such as through other roles or groups, is also included.
The example shown replaces the audit policy for the role TELLER with the new policy
named TELLERPOL.
When a SET SESSION USER statement is executed, the audit policies associated with the
original user (and that user’s group and role memberships and authorities) are combined
with the policies that are associated with the user specified in the SET SESSION USER
statement. The audit policies associated with the original user are still in effect, as are the
policies for the user specified in the SET SESSION USER statement. If multiple SET
SESSION USER statements are issued within a session, only the audit policies associated
with the original user and the current user are considered.
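
A brief sketch showing the three scopes side by side (the authorization IDs and policy
names are illustrative):

AUDIT USER PAYUSER1 USING POLICY USERAUDPOL;   -- one specific user
AUDIT GROUP DBAS USING POLICY GROUPAUDPOL;     -- all members of a group
AUDIT ROLE TELLER USING POLICY TELLERPOL;      -- all members of a role
COMMIT;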

Instructor notes:
Purpose — To discuss the assignment of an audit policy to a single user, a defined group
of users or the members of a database role.
Details —
Additional information —
Transition statement — Next we will discuss assigning an audit policy to a defined trusted
context.

Granularity: Trusted contexts

Granularity levels: Database, Tables, Authorities, Users, Groups, Roles, Trusted Connections

• Events are logged when generated from within a trusted context that is associated
with a policy
• When a switch user operation is performed within a trusted context, all audit policies
are re-evaluated based on the new user, replacing any policies from the old user for
the current session

Figure 11-19. Granularity: Trusted contexts CL4636.0

Notes:
The AUDIT statement can be used to associate or remove an audit policy from a trusted
context object created by the SECADM user. The trusted context must exist at the current
server (SQLSTATE 42704).
All auditable events that happen within the trusted connection will be audited according to
the associated audit policy.
When a switch user operation is performed within a trusted context, all audit policies are
re-evaluated according to the new user, and no policies from the old user are used for the
current session. This applies specifically to audit policies associated directly with the user,
the user’s group or role memberships, and the user’s authorities.
For example, if the current session was audited because the previous user was a member
of an audited role, and the switched-to user is not a member of that role, that policy no
longer applies to the session.

Instructor notes:
Purpose — To discuss the use of an audit policy to generate audit data for a defined
trusted context. This would allow all the activity associated with a three-tiered application
server to be audited according to a single audit policy.
Details —
Additional information —
Transition statement — Next we will discuss the EXECUTE category in more detail.

EXECUTE category (1 of 2)

• EXECUTE is a database-level category that audits the execution of SQL statements
  – Replaces the CONTEXT category for SQL statement execution
• Can optionally include input data
  – Host variables and parameter markers
• Given a reproduction of the data at the time of an event, the EXECUTE category
provides sufficient information to replay statements to understand their effect
(for example, what rows did that SELECT statement return)


Figure 11-20. EXECUTE category (1 of 2) CL4636.0

Notes:

The EXECUTE category for auditing SQL statements


The EXECUTE category allows you to accurately track the SQL statements a user issues.
Prior to DB2 9.5, the CONTEXT category was used to find this information.
This EXECUTE category captures the SQL statement text as well as the compilation
environment and other values that are needed to replay the statement at a later date. For
example, replaying the statement can show you exactly which rows a SELECT statement
returned. In order to re-run a statement and produce the exact same result, the database
tables must first be restored to their state when the statement was issued.

Instructor notes:
Purpose — To discuss the use of the EXECUTE category of audit records.
Details —
Additional information —
Transition statement — Next we will discuss some of the detailed information that can be
captured using the EXECUTE category of auditing.

EXECUTE category (2 of 2)

• The EXECUTE category can provide:
  – Statement text
  – Data type, length, and value of input host variables and parameter markers
    • Does not include LOB, LONG, XML, and structured types
  – Compilation environment
  – Counts indicating number of rows returned and rows modified
• The EXECUTE record is produced when execution completes
• If a statement fails prior to execution, no EXECUTE data is generated:
  – This would include statements that fail for lack of proper security.
The CHECKING audit category can be used to track these events.


Figure 11-21. EXECUTE category (2 of 2) CL4636.0

Notes:
When you audit using the EXECUTE category, the statement text for both static and
dynamic SQL is recorded, as are input parameter markers and host variables. You can
configure the EXECUTE category to be audited with or without input values. Global
variables are not audited.
The auditing of EXECUTE events takes place at the completion of the event (for SELECT
statements this is on cursor close). The status that the event completed with is also stored.
Because EXECUTE events are audited at completion, long-running queries do not
immediately appear in the audit log.
The preparation of a statement is not considered part of the execution. Most authorization
checks are performed at prepare time (for example, SELECT privilege). This means that
statements that fail during prepare due to authorization errors do not generate EXECUTE
events.
Statement Value Index, Statement Value Type and Statement Value Data fields can be
repeated for a given execute record. For the report format generated by the extraction,
each record lists multiple values. For the delimited file format, multiple rows are used. The

first row has an event type of STATEMENT and no values. Following rows have an event
type of DATA, with one row for each data value associated with the SQL statement. You
can use the event correlator and application ID fields to link STATEMENT and DATA rows
together.
The columns Statement Text, Statement Isolation Level, and Compilation Environment
Description are not present in the DATA events.
The statement text and input data values that are audited are converted into the database
code page when they are stored on disk (all audited fields are stored in the database code
page). No error is returned if the code page of the input data is not compatible with the
database code page; the unconverted data will be logged instead. Because each database
has its own audit log, databases having different code pages do not cause a problem.
The ROLLBACK and COMMIT statements are audited when executed by the application,
and also when issued implicitly as part of another command, such as BIND.
After an EXECUTE event has been audited due to access to an audited table, all
statements that affect which other statements are executed within a unit of work, are
audited. These statements are COMMIT, ROLLBACK, ROLLBACK TO SAVEPOINT and
SAVEPOINT.
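
Whether input values are captured is decided in the policy definition: the EXECUTE
category accepts a WITH DATA (or WITHOUT DATA) option. A minimal sketch of a policy
that records statements together with their host variable and parameter marker values
(the policy name is illustrative):

CREATE AUDIT POLICY EXECDATAPOL
  CATEGORIES EXECUTE WITH DATA STATUS BOTH
  ERROR TYPE NORMAL;
COMMIT;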

Instructor notes:


Purpose — To discuss additional considerations for auditing using the EXECUTE
category.
Details —
Additional information —
Transition statement —

Audit-related Stored Procedures and Functions


• SYSPROC.AUDIT_ARCHIVE:
– Allows a SECADM to archive the active audit log for the database
– Call Parameters can override defaults for output location for archived
audit log and database partition number
CALL SYSPROC.AUDIT_ARCHIVE( '/auditarchive', -2 )

• SYSPROC.AUDIT_DELIM_EXTRACT:
– Allows a SECADM to perform an extract of archived audit logs for the
database into a set of delimited files that can then later be used to load
DB2 tables
– Call Parameters can override the defaults for the delimiter, the source
and target locations, a file name mask and selection criteria for event
types.
CALL SYSPROC.AUDIT_DELIM_EXTRACT(NULL, '$HOME/AUDIT_DELIM_EXTRACT',
     NULL, '%20070618%', 'CATEGORIES EXECUTE STATUS BOTH')


Figure 11-22. Audit-related Stored Procedures and Functions CL4636.0

Notes:
The security administrator can use the SYSPROC.AUDIT_ARCHIVE and
SYSPROC.AUDIT_DELIM_EXTRACT stored procedures to archive audit logs and extract
audit data to delimited files for the database to which the security administrator is currently
connected.
The security administrator must be connected to a database in order to use these stored
procedures.
If you copy archived audit files to another database system, and you want to use the
SYSPROC.AUDIT_DELIM_EXTRACT stored procedure to access them, ensure that the
database name is the same, or rename the files to include the same database name.
The SYSPROC.AUDIT_ARCHIVE stored procedure does not archive the instance-level
audit log. The system administrator must use the db2audit command to archive and extract
the instance-level audit log.

AUDIT_ARCHIVE procedure and table function – Archive audit log file
The AUDIT_ARCHIVE procedure and table function both archive the audit log file for the
connected database.
Syntax
>>-AUDIT_ARCHIVE--(--directory--,--dbpartitionnum--)-----------><
The schema is SYSPROC.
The syntax is the same for both the procedure and table function.

Procedure and table function parameters


directory
An input argument of type VARCHAR(1024) that specifies the directory where the
archived audit file(s) will be written. The directory must exist on the server and the
instance owner must be able to create files in that directory. If the argument is null or
an empty string, the default directory is used.
dbpartitionnum
An input argument of type INTEGER that specifies a valid database partition
number. Specify -1 for the current database partition, NULL or -2 for an aggregate of
all database partitions.
Authorization
Execute privilege on the AUDIT_ARCHIVE procedure or table function, and SECADM
authority on the database.

Examples
Example 1: Archive the audit log(s) for all database partitions to the default directory using
the procedure.
CALL SYSPROC.AUDIT_ARCHIVE(NULL, NULL)
Example 2: Archive the audit log(s) for all database partitions to the default directory using
the table function.
SELECT * FROM TABLE(SYSPROC.AUDIT_ARCHIVE('', -2)) AS T1

AUDIT_DELIM_EXTRACT – Performs extract to delimited file


The AUDIT_DELIM_EXTRACT stored procedure performs an extract to a delimited file on
archived audit files of the connected database. Specifically, to those archived audit files
that have filenames that match the specified mask pattern.

Syntax
>>-AUDIT_DELIM_EXTRACT--(--delimiter--,--target_directory--,--source_directory--,--> 

>--file_mask--,--event_options--)------------------------------><
The schema is SYSPROC.

Procedure parameters
delimiter – An optional input argument of type VARCHAR(1) that specifies the character
delimiter to be used in the delimited files. If the argument is null or an empty string, a
double quote will be used as the delimiter.
target_directory – An optional input argument of type VARCHAR(1024) that specifies the
directory where the delimited files will be stored. If the argument is null or an empty string,
the same directory as the source_directory is used.
source_directory – An optional input argument of type VARCHAR(1024) that specifies the
directory where the archived audit log files are stored. If the argument is null or an empty
string, the audit default will be used.
file_mask – An optional input argument of type VARCHAR(1024) that specifies a mask for
which files to extract. If the argument is null or an empty string, data is extracted from all
audit log files in the source directory.
event_options – An optional input argument of type VARCHAR(1024) that specifies a
string defining which events to extract. This matches the same string in the db2audit utility.
If the argument is null or an empty string, all events are extracted.
Authorization
Execute privileges on the AUDIT_DELIM_EXTRACT and AUDIT_LIST_LOGS functions,
and SECADM authority on the database.
Examples

Note

Audit log files contain a timestamp as part of their naming convention.

Example - Performs a delimited extract on all audit log files archived on June 18th, 2007 in
the default archive directory. This example is extracting just execute events, using a double
quote (") character delimiter, and creating or appending the resulting extract files
(<category>.del) in the $HOME/audit_delim_extract directory.
CALL SYSPROC.AUDIT_DELIM_EXTRACT(NULL, '$HOME/AUDIT_DELIM_EXTRACT',
NULL, '%20070618%', 'CATEGORIES EXECUTE STATUS BOTH')

Instructor notes:


Purpose — To discuss the stored procedures that the SECADM user can utilize to manage
the audit data for a database-level audit. A SYSADM user can also utilize the db2audit
command to archive and extract audit data for either database or instance-level audits.
Details —
Additional information —
Transition statement — Next we will discuss the steps that a security administrator might
use to archive and analyze the audit data produced by a DB2 database.

Listing archived audit logs using a Table function


• SYSPROC.AUDIT_LIST_LOGS
– Allows SECADM to list the archived audit logs for the database
prior to extraction.
– The source location of the archived audit logs can be specified as
a parameter.

SELECT FILE FROM
  TABLE(SYSPROC.AUDIT_LIST_LOGS('/auditarchive')) AS T
  WHERE FILE LIKE 'db2audit.dbname.log.0.200604%'

FILE
-------------------------------------- ...
db2audit.dbname.log.0.20060418235612
db2audit.dbname.log.0.20060419234937


Figure 11-23. Listing archived audit logs using a Table function CL4636.0

Notes:
The database security administrator can use the AUDIT_LIST_LOGS table function in a
SQL SELECT statement to list the names of archived audit log files. The names returned
might then be used with the AUDIT_DELIM_EXTRACT stored procedure to generate the
audit data in a set of files.

AUDIT_LIST_LOGS table function – Lists archived audit log files


The AUDIT_LIST_LOGS table function lists the archived audit log files for a database
which are present in the specified directory.
Syntax
>>-AUDIT_LIST_LOGS--(--directory--)----------------------------><
The schema is SYSPROC.

Procedure parameters


directory – An optional input argument of type VARCHAR(1024) that specifies the
directory to be searched for archived audit log files. The directory must exist on the
server. If the argument is null or an empty string, the default archive directory is
searched.
Authorization
EXECUTE privilege on AUDIT_LIST_LOGS table function and SECADM authority on the
database.
Examples
Example 1: Lists all archived audit logs in the default audit archive directory:
SELECT * FROM TABLE(SYSPROC.AUDIT_LIST_LOGS('')) AS T1

Note

This only lists the logs in the directory for database on which the query is run.

Archived files have the format: db2audit.db.<dbname>.log.<timestamp>
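
The file names returned can feed the extract step directly. A hedged sketch of the
archive-list-extract sequence for one day's logs (the directory and date mask are
illustrative):

-- archive the active logs for all database partitions
CALL SYSPROC.AUDIT_ARCHIVE('/auditarchive', -2);
-- confirm which archived files now exist
SELECT FILE FROM TABLE(SYSPROC.AUDIT_LIST_LOGS('/auditarchive')) AS T;
-- extract all events from the files matching the date mask
CALL SYSPROC.AUDIT_DELIM_EXTRACT(NULL, '/auditarchive/extract',
     '/auditarchive', '%20060418%', NULL);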

Instructor notes:
Purpose — To discuss the AUDIT_LIST_LOGS function that the SECADM user can utilize
to retrieve the file names for previously archived audit log files.
Details —
Additional information —
Transition statement — Next we will discuss the steps that a security administrator might
use to archive and analyze the audit data produced by a DB2 database.

Access to audit data

1. Use db2audit.ddl sample DDL statements in sqllib/misc to create a set of DB2 tables:
   • One table is defined for each category of audit data
   • A table space with a page size of at least 8K must be used
2. Use SYSPROC.AUDIT_ARCHIVE procedure to archive active audit log data
3. Use SYSPROC.AUDIT_DELIM_EXTRACT procedure to extract archived audit data to a
   set of delimited files:
   • One file for each category of audit data; for example, execute audit records are
     written to a file named 'execute.del'
   • Large object column data is written to a file named 'auditlobs'
4. Use LOAD or IMPORT utility to load data into the tables
   • SECADM will need to be granted LOAD authority by SYSADM
5. Use SQL to query audit data in the tables


Figure 11-24. Access to audit data CL4636.0

Notes:
The SECADM user can access the audit data generated for database-level audits using
the DB2 provided stored procedures and standard DB2 utilities.
Creating tables to hold the DB2 audit data
Before you can work with audit data in database tables, you need to create the tables to
hold the data. You should consider creating these tables in a separate schema to isolate
the data in the tables from unauthorized users.
The format of the tables you need to create to hold the audit data might change from
release to release. New columns might be added or the size of an existing column might
change. The DB2 provided script, db2audit.ddl, creates tables of the correct format to
contain the audit records.
If you do not want to use all of the data that is contained in the files, you can omit columns
from the table definitions, or bypass creating certain tables, as required. If you omit
columns from the table definitions, you must modify the commands that you use to load
data into these tables.

Before audit data can be extracted, it needs to be archived. The
SYSPROC.AUDIT_ARCHIVE procedure can be used to archive the current audit records
into an audit archive file.
The SYSPROC.AUDIT_DELIM_EXTRACT procedure can be used to extract all or selected
audit records to a set of delimited files. There will be one delimited file for each audit record
type.
The delimited files can be loaded into DB2 tables using the LOAD or IMPORT utility. In
order to run the LOAD utility, the SECADM user will need to be granted LOAD authority for
the database.
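
Put together, the whole flow can be scripted from the command line. A hedged sketch for
the EXECUTE category only, assuming the shipped db2audit.ddl script creates the tables
in the AUDIT schema and that the directories shown already exist:

# 1. create the audit tables once (run as a user who can create them)
db2 -tvf $HOME/sqllib/misc/db2audit.ddl
# 2. archive the active log, then extract EXECUTE events to delimited files
db2 "CALL SYSPROC.AUDIT_ARCHIVE('/auditarchive', -2)"
db2 "CALL SYSPROC.AUDIT_DELIM_EXTRACT(NULL, '/auditarchive/extract',
     '/auditarchive', NULL, 'CATEGORIES EXECUTE STATUS BOTH')"
# 3. load the delimited file; the modifiers handle embedded newlines and LOB data
db2 'LOAD FROM /auditarchive/extract/execute.del OF DEL
     MODIFIED BY DELPRIORITYCHAR LOBSINFILE INSERT INTO AUDIT."EXECUTE"'
# 4. query the loaded data with ordinary SQL (see the example that follows)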

Instructor notes:


Purpose — To discuss the steps that a SECADM user could follow to prepare to analyze
the audit data produced by database-level audits. The concept here is to have a SECADM
user that manages the security and auditing for a database and that user might not have
been granted DBADM authority.
Details —
Additional information —
Transition statement — Next we will look at an example of a query that could be used to
review the EXECUTE category generated by a DB2 database.

Example of query using EXECUTE data


SELECT substr(USERID,1,10) as USERID,
substr(EVENT,1,15) as EVENT,
substr(ACTIVITYTYPE,1,12) as ACTIVITY_TYPE,
ROWSMODIFIED, ROWSRETURNED,
substr(STMTTEXT,1,30) AS SQLTEXT
FROM DB2SEC."EXECUTE" AS "EXECUTE"
WHERE ACTIVITYTYPE IN ( 'READ_DML' , 'WRITE_DML' )
ORDER BY "EXECUTE".TIMESTAMP ASC

USERID EVENT ACTIVITY_TYPE ROWSMODIFIED ROWSRETURNED SQLTEXT
---------- --------------- ------------- ------------ ------------ ------------------------------
db2user STATEMENT READ_DML 0 147 select * from inst461.stock
db2user STATEMENT READ_DML 0 1 SELECT TABNAME, TABSCHEMA, TYP
db2user STATEMENT READ_DML 0 147 select * from mytest
db2user STATEMENT WRITE_DML 3 0 update mytest set qty = 0 wher
db2user STATEMENT WRITE_DML 3 0 update mytest set qty = 10 whe
db2user STATEMENT READ_DML 0 147 select * from mytest
db2user STATEMENT WRITE_DML 3 0 delete from mytest where ite
db2user STATEMENT WRITE_DML 3 0 delete from mytest where ite
db2user STATEMENT WRITE_DML 0 0 delete from mytest where ite
db2user STATEMENT READ_DML 0 2 select * from syscat.roles
inst461 STATEMENT WRITE_DML 3 0 update stock set qty = 10 wher
inst461 STATEMENT WRITE_DML 6 0 update stock set qty = 0 where

12 record(s) selected.


Figure 11-25. Example of query using EXECUTE data CL4636.0

Notes:
The visual shows an example of a SQL query that accesses the EXECUTE audit data. The
output includes an activity type, the SQL text and a count for the number of rows returned
or modified for DML statements.
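Once the data is loaded, any SQL analysis can be applied. For instance, here is a sketch
that summarizes activity per user against the same DB2SEC."EXECUTE" table (the schema
name matches the example above and is illustrative):

SELECT SUBSTR(USERID,1,10) AS USERID,
       SUBSTR(ACTIVITYTYPE,1,12) AS ACTIVITY_TYPE,
       COUNT(*) AS STATEMENTS,
       SUM(ROWSMODIFIED) AS TOTAL_ROWS_MODIFIED
FROM DB2SEC."EXECUTE"
WHERE ACTIVITYTYPE IN ('READ_DML', 'WRITE_DML')
GROUP BY USERID, ACTIVITYTYPE
ORDER BY USERID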

Instructor notes:


Purpose — To show an example of using a SQL query to access selected data from the
table containing the new EXECUTE audit data made available with DB2 9.5.
Details —
Additional information —
Transition statement — Let's see an example of how the db2audit command can be used
to extract audit data to a file for review.

Extracting audit data to a file


• Use the db2audit command file option:
db2audit extract file sampexec.txt category execute from files
'/home/inst461/audit_archive/db2audit.db.MUSICDB.log.0.20080706225139'

timestamp=2008-07-06-22.47.46.597946;
category=EXECUTE;
audit event=STATEMENT;
event correlator=21;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
...
statement text=select empno, lastname , workdept, salary from hr.employee order by workdept,empno;
statement isolation level=CS;
Compilation Environment Description
isolation: CS
query optimization: 5
min dec div 3: NO
degree: 1
SQL rules: DB2
refresh age: +00000000000000.000000
resolution timestamp: 2008-07-06-22.47.46.000000
federated asynchrony: 0
maintained table type: SYSTEM;
rows modified=0;
rows returned=42;


Figure 11-26. Extracting audit data to a file CL4636.0

Notes:
The visual shows an example db2audit command that a SYSADM user could use to create
a file containing the generated audit data. In this example the EXECUTE category audit
records will be formatted and written to a file.
The command used in this example was:
db2audit extract file sampexec.txt category execute from files
'/home/inst461/audit_archive/db2audit.db.MUSICDB.log.0.20080706225139'
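
The same command can also produce delimited files rather than a formatted report. A
hedged sketch using the delasc option (the target path is illustrative):

db2audit extract delasc to /tmp/audit_extract category execute from files
'/home/inst461/audit_archive/db2audit.db.MUSICDB.log.0.20080706225139'

This writes execute.del (and an auditlobs file, if large object data is present) that can
then be loaded into the audit tables as described earlier.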

The sample output could look like the following:


timestamp=2008-07-06-22.47.46.597946;
category=EXECUTE;
audit event=STATEMENT;
event correlator=21;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
package schema=INST461;
package name=SQLC2G13;
package section=201;
local transaction id=0x4ca0000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
uow id=12;
activity id=1;
statement invocation id=0;
statement nesting level=0;
activity type=READ_DML;
statement text=select empno, lastname , workdept, salary from hr.employee
order by workdept,empno;
statement isolation level=CS;
Compilation Environment Description
isolation: CS
query optimization: 5
min dec div 3: NO
degree: 1
SQL rules: DB2
refresh age: +00000000000000.000000
resolution timestamp: 2008-07-06-22.47.46.000000
federated asynchrony: 0
maintained table type: SYSTEM;
rows modified=0;
rows returned=42;

timestamp=2008-07-06-22.47.46.598278;
category=EXECUTE;
audit event=COMMIT;
event correlator=21;

event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
package schema=NULLID;
package name=SQLC2G13;
package section=0;
local transaction id=0x62a4000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
activity type=OTHER;

timestamp=2008-07-06-22.47.46.642627;
category=EXECUTE;
audit event=STATEMENT;
event correlator=25;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
package schema=INST461;
package name=SQLC2G13;
package section=201;
local transaction id=0x62a4000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
uow id=13;
activity id=1;
statement invocation id=0;
statement nesting level=0;
activity type=READ_DML;
statement text=select a.empno , a.lastname , b.deptname from hr.employee
a, hr.department b where a.workdept = b.deptno order by lastname;
statement isolation level=CS;
Compilation Environment Description
isolation: CS
query optimization: 5
min dec div 3: NO
degree: 1
SQL rules: DB2
refresh age: +00000000000000.000000
resolution timestamp: 2008-07-06-22.47.46.000000
federated asynchrony: 0
maintained table type: SYSTEM;
rows modified=0;
rows returned=42;

timestamp=2008-07-06-22.47.46.642986;
category=EXECUTE;
audit event=COMMIT;
event correlator=25;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
package schema=NULLID;
package name=SQLC2G13;
package section=0;
local transaction id=0x63a4000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
activity type=OTHER;

timestamp=2008-07-06-22.47.50.917730;
category=EXECUTE;
audit event=STATEMENT;
event correlator=29;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;

package schema=INST461;
package name=SQLC2G13;
package section=201;
local transaction id=0x63a4000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
uow id=14;
activity id=1;
statement invocation id=0;
statement nesting level=0;
activity type=READ_DML;
statement text=select lastname,salary,job from payuser1.myemp where salary
> 20000;
statement isolation level=CS;
Compilation Environment Description
isolation: CS
query optimization: 5
min dec div 3: NO
degree: 1
SQL rules: DB2
refresh age: +00000000000000.000000
resolution timestamp: 2008-07-06-22.47.50.000000
federated asynchrony: 0
maintained table type: SYSTEM;
rows modified=0;
rows returned=42;

timestamp=2008-07-06-22.47.50.917918;
category=EXECUTE;
audit event=COMMIT;
event correlator=29;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
package schema=NULLID;
package name=SQLC2G13;
package section=0;
local transaction id=0x64a4000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
activity type=OTHER;

timestamp=2008-07-06-22.47.51.173277;
category=EXECUTE;
audit event=STATEMENT;
event correlator=30;
event status=0;
database=MUSICDB;
userid=payuser1;
authid=PAYUSER1;
session authid=PAYUSER1;
origin node=0;
coordinator node=0;
application id=*LOCAL.inst461.080707002700;
application name=db2bp;
package schema=INST461;
package name=SQLC2G13;
package section=203;
local transaction id=0x64a4000000000000;
global transaction id=0x0000000000000000000000000000000000000000;
uow id=15;
activity id=1;
statement invocation id=0;
statement nesting level=0;
activity type=WRITE_DML;
statement text=update payuser1.myemp set salary = salary + 1000 where
salary > 25000;
statement isolation level=CS;
Compilation Environment Description
isolation: CS
query optimization: 5
min dec div 3: NO
degree: 1
SQL rules: DB2
refresh age: +00000000000000.000000
resolution timestamp: 2008-07-06-22.47.51.000000
federated asynchrony: 0
maintained table type: SYSTEM;
rows modified=42;
rows returned=0;

Instructor notes:
Purpose — To show an example of using the db2audit command to create a file containing
selected audit record data. Even though the EXECUTE category audit data can only be
generated using a database-level audit, the db2audit command can still be used by the
SYSADM user to extract the audit data to a file for review. The db2audit extract could also
have been used to create delimited files for loading tables.
Details —
Additional information —
Transition statement — Let’s summarize the objectives for this lecture unit.


Unit summary
Having completed this unit, you should be able to:
• Describe the tasks for DB2 database auditing performed by
the SYSADM user
• List the security administration tasks for DB2 databases that
require the SECADM database authority when performing
database-level audits
• Utilize the db2audit command to implement instance-level
auditing and to configure the audit data and archive locations
• Create audit policies to enable collection of specific
categories of audit data for a DB2 database
• Assign audit policies to selected tables, users or database
roles using the AUDIT statement

Figure 11-27. Unit summary CL4636.0

Notes:

Instructor notes:
Purpose —
Details —
Additional information —
Transition statement — End of unit.


Student exercise 10

© Copyright IBM Corporation 2013

Figure 11-28. Student exercise 10 CL4636.0

Notes:

Instructor notes:
Purpose —
Details —
Additional information —
Transition statement —
