DB2 V11 Tech Overview
IBM, the IBM logo, ibm.com and DB2 are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms
are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols
indicate U.S. registered or common law trademarks owned by IBM at the time this information was published.
Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
Master Key

Introduced in DB2 for LUW 10.5 FP5.

DB2 automatically encrypts all database data and logs, and any backup images created.

Protects against threats to data at rest:
– Users accessing data outside the scope of the DBMS
– Theft or loss of physical media

Meets compliance requirements, e.g.
– PCI DSS, HIPAA, …

Uses a standard 2-tier key model:
– Data/logs are encrypted with a Data Encryption Key (DEK)
– The DEK is encrypted with a Master Key
– The encrypted DEK is stored with the database and backups
– Master Keys are securely stored in a key manager
  • In 10.5, DB2 includes a local keystore file at the instance level to manage master keys for the databases in the instance (see the setup sketch below)

How to encrypt a database?

CREATE DATABASE mydb ENCRYPT
  - or -
RESTORE DATABASE mydb FROM /home/db2inst1/db2 ENCRYPT
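Before CREATE DATABASE … ENCRYPT can run, the instance needs a keystore holding a master key. A minimal sketch of the 10.5 FP5+ local-keystore setup, assuming GSKit's gsk8capicmd_64 is on the path (keystore path and password are placeholders):

# Create a local PKCS#12 keystore; -stash lets the instance open it without a password prompt
gsk8capicmd_64 -keydb -create -db /home/db2inst1/keystore.p12 -pw "Str0ngPassw0rd" -type pkcs12 -stash

# Register the keystore with the instance
db2 UPDATE DBM CFG USING KEYSTORE_TYPE PKCS12 KEYSTORE_LOCATION /home/db2inst1/keystore.p12

# CREATE DATABASE ... ENCRYPT now generates a DEK, wraps it with a master key
# from this keystore, and stores the wrapped DEK with the database
db2 "CREATE DATABASE mydb ENCRYPT"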
Encryption and Enterprise Key Management
V11.1 adds support for KMIP 1.1 centralized key managers
e.g. IBM Security Key Lifecycle Manager (ISKLM), SafeNet KeySecure, …
In 10.5 FP5, a local flat file is used to manage master keys for all databases in the instance; V11.1 can instead point the instance at a centralized key manager, as sketched below.
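Switching the instance to a centralized KMIP 1.1 key manager is then a keystore-type change; a hedged sketch (the configuration file describes the key manager connection: host, port, and SSL client credentials; its exact contents follow the DB2 and key manager documentation, and the path here is a placeholder):

db2 UPDATE DBM CFG USING KEYSTORE_TYPE KMIP KEYSTORE_LOCATION /home/db2inst1/isklm_kmip.cfg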
Very Large DB Scalability & Manageability: Examples

Internal Efficiency Improvement: New Latch/Concurrency Management

Existing protocol (summary):
• Hash bucket latch taken in 'exclusive' mode whenever a page is added to or removed from a hash chain
• Hash bucket latch also taken in 'exclusive' mode whenever any page in the hash chain is accessed

[Diagram: bufferpool hash table, with Latch 0 protecting a hash chain of Page47, Page141, Page235]
New Support
• Online inplace REORG can be run on an individual partition of a range-partitioned table
• Initial support requires that the table have no non-partitioned (global) indexes (see the concrete example below)

REORG … INPLACE … ALLOW WRITE ACCESS … ON DATA PARTITION p1 …
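One plausible concrete form of the command, assuming a range-partitioned table SALES with a partition named P1 and no non-partitioned indexes (the names are hypothetical; the clause order follows the template above):

db2 "REORG TABLE sales INPLACE ALLOW WRITE ACCESS ON DATA PARTITION p1"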
Improved Performance for Highly Concurrent Workloads
V11 revamps DB2’s internal bufferpool latching protocol
– Significantly reduces contention
– Benefits most pronounced on high concurrency transactional workloads
Upgrade directly from Version 9.7, 10.1 and 10.5 (3 releases back!)
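A minimal sketch of the direct-upgrade flow (instance and database names are placeholders):

# As root, from the V11.1 copy: upgrade the instance
db2iupgrade db2inst1

# As the instance owner: upgrade each database
db2 UPGRADE DATABASE mydb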
Convert command:
db2cluster -cfs -enablereplication -filesystem db2bkup
PATH ON LOCAL HOST  OTHER KNOWN PATHS  DISK NAME   RDNCY GRP ID  COMMENT          STATE  SIZE   FREE
------------------  -----------------  ----------  ------------  ---------------  -----  -----  -----
/dev/dm-8                              gpfs177nsd  1             DIRECT ATTACHED  UP     15.0G  10.0G
Initiate the replication of data from redundancy group 1 to group 2 via a new db2cluster option:
db2cluster -cfs -replicate -filesystem db2bkup
Horizontal Scaling with DB2 pureScale on POWER Linux
Scale-out Throughput – DB2 pureScale on LE POWER Linux
[Chart: throughput in thousands of SQL statements/s at 1, 2, 3, and 4 members, comparing sockets vs. RDMA interconnects]
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Improved Table TRUNCATE Performance in pureScale
pureScale Member Subsets Review

CALL SYSPROC.WLM_CREATE_MEMBER_SUBSET( 'BATCH',
  '<databaseAlias>BATCH</databaseAlias>', '( 0, 1 )' );
CALL SYSPROC.WLM_CREATE_MEMBER_SUBSET( 'OLTP',
  '<databaseAlias>OLTP</databaseAlias>', '( 4, 5 )' );
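Applications then reach a subset simply by connecting to its database alias; for example, a batch job connects to the BATCH alias (assuming the alias is cataloged at the client) and is routed to members 0 and 1:

db2 CONNECT TO BATCH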
More Flexibility with New FAILOVER_PRIORITY

CALL SYSPROC.WLM_ALTER_MEMBER_SUBSET( 'BATCH', NULL, '(ADD 2 FAILOVER_PRIORITY 1)' );
CALL SYSPROC.WLM_ALTER_MEMBER_SUBSET( 'OLTP', NULL, '(ADD 3 FAILOVER_PRIORITY 1)' );

"Failover" members are used only if a member in the subset fails.
The resulting subset configuration:

SUBSET  MEMBER  FAILOVER_PRIORITY
------  ------  -----------------
BATCH   0       0
BATCH   1       0
BATCH   2       1
OLTP    4       0
OLTP    5       0
OLTP    3       1
OLTP    2       2

Member 2 is now used for both BATCH and OLTP; use DB2's integrated Workload Manager (WLM) to manage the shared member.
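The resulting definitions can be checked in the catalog; a hedged sketch, assuming the member-subset catalog views SYSCAT.MEMBERSUBSETS and SYSCAT.MEMBERSUBSETMEMBERS (verify view and column names against the V11.1 documentation):

-- Inspect subset definitions and their member/failover-priority assignments
SELECT * FROM SYSCAT.MEMBERSUBSETS;
SELECT * FROM SYSCAT.MEMBERSUBSETMEMBERS;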
HADR Support for SYNC and NEARSYNC Mode

Combines pureScale and HADR to provide a near-continuously available system with robust RPO=0 disaster recovery.

Related capabilities & enhancements include:
– Combined pureScale and HADR rolling update is supported:
  1. On the STANDBY cluster: perform the pureScale rolling update and commit
  2. Issue TAKEOVER; the new primary cannot form an HADR connection with the (now downlevel) new standby (see the sketch below)
  3. On the NEW STANDBY (old primary) cluster: offline, parallel update and commit, then activate
– In V11.1, HADR log send and replay can occur during crash recovery
  • Allows logs written during crash recovery to be replayed while crash recovery is occurring
    - Previously, log send and replay were disabled during crash recovery
  • Allows more rapid attainment of PEER state
  • Especially important in pureScale during online member crash recovery
  • Support added for both pureScale and non-pureScale
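Step 2 of the rolling update above is an ordinary HADR takeover issued on the standby cluster; a sketch (the database name is a placeholder):

db2 TAKEOVER HADR ON DATABASE mydb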
[Diagram: Applications connect to the Primary pureScale cluster (members M1–M4, primary and secondary CFs), with HADR replication to a Standby DR cluster]
DB2 V11 adds improved high availability for Geographically Dispersed DB2
pureScale Clusters (GDPC) for both RoCE & TCP/IP
– Multiple adapter ports per member and CF to support higher bandwidth and improved
redundancy at the adapter level
– Dual switches can be configured at each site to eliminate the switch as a site-specific
single point of failure (i.e. 4-switch configuration)
[Diagram: 4-switch GDPC configuration: members, CFs, dual switches, and replicated storage at Site 1 and Site 2, with a tiebreaker host at Site 3]
pureScale Disaster Recovery Options
Geographically Dispersed pureScale Cluster (GDPC)
CREATE TABLE sales(…)
  ORGANIZE BY COLUMN
  DISTRIBUTE BY (C1,C2)

        DB2 10.5 BLU Capacity   DB2 V11.1 BLU Capacity
Data    10s of TB               1,000s of TB
Cores   100s of cores           1,000s of cores
BLU on DPF: Data Distribution

Just as with row-organized tables …
– Rows are distributed across DB partitions via a distribution key and a hash function
– A distribution key is 1 or more columns in the table
– Each table defines its own distribution key
– Joins will typically perform better if collocated
  • Joined tables have matching distribution keys and are joined on those columns (see the sketch below)
– When data must be shipped across partitions (slices) of the table, it flows through table queues (e.g. TQ1)

[Diagram: rows hashed on the distribution key to DB Partitions 0, 1, and 2; each partition reads and processes its slice and returns results, with TQ1 shipping data between partitions]
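A collocated-join sketch (hypothetical tables): both column-organized tables share the same distribution key and are joined on exactly those columns, so the join needs no data movement between partitions:

CREATE TABLE sales (
  cust_id INTEGER NOT NULL,
  amount  DECIMAL(12,2)
) ORGANIZE BY COLUMN
  DISTRIBUTE BY HASH (cust_id);

CREATE TABLE customer (
  cust_id INTEGER NOT NULL,
  name    VARCHAR(64)
) ORGANIZE BY COLUMN
  DISTRIBUTE BY HASH (cust_id);

-- Equijoin on the shared distribution key: eligible for collocated execution
SELECT c.name, SUM(s.amount) AS total
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id
GROUP BY c.name;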
Demonstrating BLU MPP Virtually Linear Scaling

DB2 Version 11.1 on an IBM Power Systems E850 cluster
– Each of 6 E850s with 24 P8 cores & 1TB RAM

Workload:
– BD Insights
– 4TB database
– 60 concurrent streams

[Chart: relative throughput, single node vs. 3-node MPP; callout: 16x speedup (!)]

Note: the BLU MPP system had ~3x more compute resource in total
– 2.25x more cores
– 3x more RAM
– Faster I/O sub-system
Core BLU Acceleration Advances
Significant advances in the core in-memory BLU engine
– Native columnar nested-loop join
– Native columnar sort using new fast parallel radix sort technique
– Native columnar OLAP functions
– Parallel DML for declared global temporary tables (DGTTs)
– Query rewrite improvements
– Improved SORTHEAP utilization
– Faster SQL MERGE processing
These apply on both single-node and MPP clusters, delivering a significant out-of-the-box performance leap.
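Of the advances above, parallel DML for DGTTs applies to statements like the following sketch (table and query are hypothetical; a user temporary table space must exist):

DECLARE GLOBAL TEMPORARY TABLE session.tmp_orders (
  order_id INTEGER,
  amount   DECIMAL(12,2)
) ON COMMIT PRESERVE ROWS NOT LOGGED;

-- This INSERT ... SELECT into the DGTT can now execute with parallel DML
INSERT INTO session.tmp_orders
  SELECT order_id, amount
  FROM orders
  WHERE order_date >= CURRENT DATE - 30 DAYS;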
BLU Acceleration: Native Nested Loop Join

Native BLU support for nested-loop joins:
• Allows non-equality joins to be executed natively in the columnar run-time engine via nested-loop join
• Previously such joins would execute in 'compensation' mode (in the row run-time engine)
• May allow other dependent plan operators to also execute natively in the columnar run-time engine

Net: significant performance improvement for queries where nested-loop join plays a significant role (e.g., those with non-equality joins).
Consider this example … (continued on next page)
SELECT
ITEM_DESC, SUM(PERCENT_DISCOUNT), SUM(EXTENDED_PRICE),
SUM(SHELF_COST_PCT_OF_SALE)
FROM
PERIOD, DAILY_SALES, PRODUCT, REPORT_PERIOD RP
WHERE
PERIOD.PERKEY=DAILY_SALES.PERKEY AND
PRODUCT.PRODKEY=DAILY_SALES.PRODKEY AND
PERIOD.CALENDAR_DATE BETWEEN RP.START_DATE AND RP.END_DATE AND
RP.RPT_NO in (33, 34)
GROUP BY
ITEM_DESC
BLU Acceleration: Native Nested Loop Join

Pre-V11, without native nested loop join (compensated execution): the result of joining the fact table (DAILY_SALES) and the other dimension (PRODUCT) is sent unfiltered to the row engine, so all the projected fact-table columns must be converted to row format. Only then is the time filtering applied.

[Access plan: the NLJOIN against REPORT_PERIOD sits above the CTQ, in the row engine, with the HSJOINs of DAILY_SALES, PERIOD, and PRODUCT below]

V11, with native nested loop join: the REPORT_PERIOD-to-PERIOD range join is done within the columnar engine, allowing the fact table to be filtered while the data remains in columnar format.

[Access plan: the NLJOIN of REPORT_PERIOD and PERIOD now executes below the CTQ, natively in the columnar engine]

Net: massive performance gains for this class of queries; 10x or more is common.
BLU Acceleration: Industry Leading Parallel Sort

New innovative radix sort implementation:
• Industry-leading performance and multi-core parallelism
• Research by IBM TJ Watson Research: http://www.vldb.org/pvldb/vol8/p1518-cho.pdf

Sort operations now execute directly on encoded data, natively in the columnar run-time engine:
• Previously such sorts would execute in 'compensation' mode (in the row run-time engine)
• e.g. ORDER BY, FETCH FIRST N ROWS, … (see the example below)

Pre-V11 (compensated execution): TPC-DS q51, 302 seconds.
[Access plan: the SORTs execute above the CTQ, in the row engine]

V11 (native execution): TPC-DS q51, 45 seconds.
[Access plan: the SORTs execute below the CTQ, in the columnar engine]

16x Faster!
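Query shapes that previously forced row-engine compensation now stay columnar; for example (hypothetical table, using the ORDER BY … FETCH FIRST pattern named above):

SELECT item_desc, SUM(extended_price) AS revenue
FROM product_sales
GROUP BY item_desc
ORDER BY revenue DESC
FETCH FIRST 10 ROWS ONLY;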
Example BLU Single Node Overall Workload Gain

DB2 Version 11.1 on Intel Haswell EP

Query throughput, BD Insights (800GB): 1.36x improvement

                   DB2 V10.5 FP5   DB2 V11.1
Queries per hour   703.85          955.82

Largest contributors to improvement in this workload:
– Native BLU execution
  • Native sort
  • Native OLAP (usually combined with sort)
  • Enables query plans to remain as much as possible within the columnar engine
– Improved SORTHEAP utilization
  • SORTHEAP is used for building hash tables for JOINs, GROUP BYs, and other runtime work
  • More efficient use allows more concurrent intra-query and inter-query operations to co-exist

Configuration details:
– 2-socket, 36-core Intel Xeon E5-2699 v3 @ 2.3GHz
– 192GB RAM
– Internal multiuser analytical workload, 800GB
Numerous Core SQL Advances
New Advanced SQL Functionality
• New methodology for building advanced aggregate UDFs
• NZPLSQL support
• New data type (VARBINARY)
• Wide variety of additional SQL functions, including more flexible date/time functions and regular
expression functions
Many New SQL Functions (supported on all tables, row and columnar)
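A hedged sketch of two of these additions together, the VARBINARY type and a regular-expression function (REGEXP_LIKE; the table and pattern are hypothetical):

CREATE TABLE events (
  id      INTEGER NOT NULL,
  payload VARBINARY(128),   -- new VARBINARY data type
  tag     VARCHAR(32)
);

SELECT id
FROM events
WHERE REGEXP_LIKE(tag, '^err(or)?_[0-9]+$');   -- new regular-expression function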