DB2 V11 Tech Overview
IBM, the IBM logo, ibm.com and DB2 are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms
are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols
indicate U.S. registered or common law trademarks owned by IBM at the time this information was published.
Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
Master Key

Introduced in DB2 for LUW 10.5 FP5.

DB2 automatically encrypts all database data and logs, and any backup images created.

Protects against threats to data at rest:
– Users accessing data outside the scope of the DBMS
– Theft or loss of physical media

Meets compliance requirements, e.g.
– PCI DSS, HIPAA, …

Uses a standard 2-tier key model:
– Data/logs are encrypted with a Data Encryption Key (DEK)
– The DEK is encrypted with a Master Key
– The encrypted DEK is stored with the database and backups
– Master Keys are securely stored in a key manager
  • In 10.5, DB2 includes a local keystore file at the instance level to manage master keys for the databases in the instance (see the setup sketch below)

How to encrypt a database?

CREATE DATABASE mydb ENCRYPT
  - or -
RESTORE DATABASE mydb FROM /home/db2inst1/db2 ENCRYPT
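Before CREATE DATABASE … ENCRYPT can run, the instance needs a keystore holding a master key. A minimal sketch of the 10.5 FP5+ local-keystore setup, assuming GSKit's gsk8capicmd_64 is on the path (keystore path and password are placeholders):

# Create a local PKCS#12 keystore; -stash lets the instance open it without a password prompt
gsk8capicmd_64 -keydb -create -db /home/db2inst1/keystore.p12 -pw "Str0ngPassw0rd" -type pkcs12 -stash

# Register the keystore with the instance
db2 UPDATE DBM CFG USING KEYSTORE_TYPE PKCS12 KEYSTORE_LOCATION /home/db2inst1/keystore.p12

# CREATE DATABASE ... ENCRYPT now generates a DEK, wraps it with a master key
# from this keystore, and stores the wrapped DEK with the database
db2 "CREATE DATABASE mydb ENCRYPT"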
Encryption and Enterprise Key Management
V11.1 adds support for KMIP 1.1 centralized key managers
e.g. IBM Security Key Lifecycle Manager (ISKLM), SafeNet KeySecure, …
In 10.5 FP5, a local flat file is used to manage master keys for all databases in the instance; V11.1 can instead point the instance at a centralized key manager, as sketched below.
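Switching the instance to a centralized KMIP 1.1 key manager is then a keystore-type change; a hedged sketch (the configuration file describes the key manager connection: host, port, and SSL client credentials; its exact contents follow the DB2 and key manager documentation, and the path here is a placeholder):

db2 UPDATE DBM CFG USING KEYSTORE_TYPE KMIP KEYSTORE_LOCATION /home/db2inst1/isklm_kmip.cfg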
Very Large DB Scalability & Manageability: Examples

Internal Efficiency Improvement: New Latch/Concurrency Management

Existing protocol (summary):
• Hash bucket latch taken in 'exclusive' mode whenever a page is added to or removed from a hash chain
• Hash bucket latch also taken in 'exclusive' mode whenever any page in the hash chain is accessed

[Diagram: bufferpool hash table, with Latch 0 protecting a hash chain of Page47, Page141, Page235]
New Support
• Online inplace REORG can be run on an individual partition of a range-partitioned table
• Initial support requires that the table have no non-partitioned (global) indexes (see the concrete example below)

REORG … INPLACE … ALLOW WRITE ACCESS … ON DATA PARTITION p1 …
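One plausible concrete form of the command, assuming a range-partitioned table SALES with a partition named P1 and no non-partitioned indexes (the names are hypothetical; the clause order follows the template above):

db2 "REORG TABLE sales INPLACE ALLOW WRITE ACCESS ON DATA PARTITION p1"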
Improved Performance for Highly Concurrent Workloads
V11 revamps DB2’s internal bufferpool latching protocol
– Significantly reduces contention
– Benefits most pronounced on high concurrency transactional workloads
Upgrade directly from Version 9.7, 10.1 and 10.5 (3 releases back!)
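A minimal sketch of the direct-upgrade flow (instance and database names are placeholders):

# As root, from the V11.1 copy: upgrade the instance
db2iupgrade db2inst1

# As the instance owner: upgrade each database
db2 UPGRADE DATABASE mydb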
Convert command:
db2cluster -cfs -enablereplication -filesystem db2bkup
PATH ON LOCAL HOST  OTHER KNOWN PATHS  DISK NAME   RDNCY GRP ID  COMMENT          STATE  SIZE   FREE
------------------  -----------------  ----------  ------------  ---------------  -----  -----  -----
/dev/dm-8                              gpfs177nsd  1             DIRECT ATTACHED  UP     15.0G  10.0G
Initiate the replication of data from redundancy group 1 to group 2 via a new db2cluster option:
db2cluster -cfs -replicate -filesystem db2bkup
Horizontal Scaling with DB2 pureScale on POWER Linux
Scale-out Throughput – DB2 pureScale on LE POWER Linux
[Chart: throughput in thousands of SQL statements/s at 1, 2, 3, and 4 members, comparing sockets vs. RDMA interconnects]
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will
experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Improved Table TRUNCATE Performance in pureScale
pureScale Member Subsets Review

CALL SYSPROC.WLM_CREATE_MEMBER_SUBSET( 'BATCH',
  '<databaseAlias>BATCH</databaseAlias>', '( 0, 1 )' );
CALL SYSPROC.WLM_CREATE_MEMBER_SUBSET( 'OLTP',
  '<databaseAlias>OLTP</databaseAlias>', '( 4, 5 )' );
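Applications then reach a subset simply by connecting to its database alias; for example, a batch job connects to the BATCH alias (assuming the alias is cataloged at the client) and is routed to members 0 and 1:

db2 CONNECT TO BATCH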
More Flexibility with New FAILOVER_PRIORITY

CALL SYSPROC.WLM_ALTER_MEMBER_SUBSET( 'BATCH', NULL, '(ADD 2 FAILOVER_PRIORITY 1)' );
CALL SYSPROC.WLM_ALTER_MEMBER_SUBSET( 'OLTP', NULL, '(ADD 3 FAILOVER_PRIORITY 1)' );

"Failover" members are used only if a member in the subset fails.
The resulting subset configuration:

SUBSET  MEMBER  FAILOVER_PRIORITY
------  ------  -----------------
BATCH   0       0
BATCH   1       0
BATCH   2       1
OLTP    4       0
OLTP    5       0
OLTP    3       1
OLTP    2       2

Member 2 is now used for both BATCH and OLTP; use DB2's integrated Workload Manager (WLM) to manage the shared member.
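The resulting definitions can be checked in the catalog; a hedged sketch, assuming the member-subset catalog views SYSCAT.MEMBERSUBSETS and SYSCAT.MEMBERSUBSETMEMBERS (verify view and column names against the V11.1 documentation):

-- Inspect subset definitions and their member/failover-priority assignments
SELECT * FROM SYSCAT.MEMBERSUBSETS;
SELECT * FROM SYSCAT.MEMBERSUBSETMEMBERS;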
HADR Support for SYNC and NEARSYNC Mode

Combines pureScale and HADR to provide a near-continuously available system with robust RPO=0 disaster recovery.

Related capabilities & enhancements include:
– Combined pureScale and HADR rolling update is supported:
  1. On the STANDBY cluster: perform the pureScale rolling update and commit
  2. Issue TAKEOVER; the new primary cannot form an HADR connection with the (now downlevel) new standby (see the sketch below)
  3. On the NEW STANDBY (old primary) cluster: offline, parallel update and commit, then activate
– In V11.1, HADR log send and replay can occur during crash recovery
  • Allows logs written during crash recovery to be replayed while crash recovery is occurring
    - Previously, log send and replay were disabled during crash recovery
  • Allows more rapid attainment of PEER state
  • Especially important in pureScale during online member crash recovery
  • Support added for both pureScale and non-pureScale
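Step 2 of the rolling update above is an ordinary HADR takeover issued on the standby cluster; a sketch (the database name is a placeholder):

db2 TAKEOVER HADR ON DATABASE mydb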
[Diagram: Applications connect to the Primary pureScale cluster (members M1–M4, primary and secondary CFs), with HADR replication to a Standby DR cluster]
DB2 V11 adds improved high availability for Geographically Dispersed DB2
pureScale Clusters (GDPC) for both RoCE & TCP/IP
– Multiple adapter ports per member and CF to support higher bandwidth and improved
redundancy at the adapter level
– Dual switches can be configured at each site to eliminate the switch as a site-specific
single point of failure (i.e. 4-switch configuration)
[Diagram: 4-switch GDPC configuration: members, CFs, dual switches, and replicated storage at Site 1 and Site 2, with a tiebreaker host at Site 3]
pureScale Disaster Recovery Options
Geographically Dispersed pureScale Cluster (GDPC)
CREATE TABLE sales(…)
  ORGANIZE BY COLUMN
  DISTRIBUTE BY (C1,C2)

        DB2 10.5 BLU Capacity   DB2 V11.1 BLU Capacity
Data    10s of TB               1,000s of TB
Cores   100s of cores           1,000s of cores
BLU on DPF: Data Distribution

Just as with row-organized tables …
– Rows are distributed across DB partitions via a distribution key and a hash function
– A distribution key is 1 or more columns in the table
– Each table defines its own distribution key
– Joins will typically perform better if collocated
  • Joined tables have matching distribution keys and are joined on those columns (see the sketch below)
– When data must be shipped across partitions (slices) of the table, it flows through table queues (e.g. TQ1)

[Diagram: rows hashed on the distribution key to DB Partitions 0, 1, and 2; each partition reads and processes its slice and returns results, with TQ1 shipping data between partitions]
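A collocated-join sketch (hypothetical tables): both column-organized tables share the same distribution key and are joined on exactly those columns, so the join needs no data movement between partitions:

CREATE TABLE sales (
  cust_id INTEGER NOT NULL,
  amount  DECIMAL(12,2)
) ORGANIZE BY COLUMN
  DISTRIBUTE BY HASH (cust_id);

CREATE TABLE customer (
  cust_id INTEGER NOT NULL,
  name    VARCHAR(64)
) ORGANIZE BY COLUMN
  DISTRIBUTE BY HASH (cust_id);

-- Equijoin on the shared distribution key: eligible for collocated execution
SELECT c.name, SUM(s.amount) AS total
FROM sales s
JOIN customer c ON s.cust_id = c.cust_id
GROUP BY c.name;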
Demonstrating BLU MPP Virtually Linear Scaling

DB2 Version 11.1 on an IBM Power Systems E850 cluster
– Each of 6 E850s with 24 P8 cores & 1TB RAM

Workload:
– BD Insights
– 4TB database
– 60 concurrent streams

[Chart: relative throughput, single node vs. 3-node MPP; callout: 16x speedup (!)]

Note: the BLU MPP system had ~3x more compute resource in total
– 2.25x more cores
– 3x more RAM
– Faster I/O sub-system
Core BLU Acceleration Advances
Significant advances in the core in-memory BLU engine
– Native columnar nested-loop join
– Native columnar sort using new fast parallel radix sort technique
– Native columnar OLAP functions
– Parallel DML for declared global temporary tables (DGTTs)
– Query rewrite improvements
– Improved SORTHEAP utilization
– Faster SQL MERGE processing
These apply on both single-node and MPP clusters, delivering a significant out-of-the-box performance leap.
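Of the advances above, parallel DML for DGTTs applies to statements like the following sketch (table and query are hypothetical; a user temporary table space must exist):

DECLARE GLOBAL TEMPORARY TABLE session.tmp_orders (
  order_id INTEGER,
  amount   DECIMAL(12,2)
) ON COMMIT PRESERVE ROWS NOT LOGGED;

-- This INSERT ... SELECT into the DGTT can now execute with parallel DML
INSERT INTO session.tmp_orders
  SELECT order_id, amount
  FROM orders
  WHERE order_date >= CURRENT DATE - 30 DAYS;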
BLU Acceleration: Native Nested Loop Join

Native BLU support for nested-loop joins:
• Allows non-equality joins to be executed natively in the columnar run-time engine via nested-loop join
• Previously such joins would execute in 'compensation' mode (in the row run-time engine)
• May allow other dependent plan operators to also execute natively in the columnar run-time engine

Net: significant performance improvement for queries where nested-loop join plays a significant role (e.g., those with non-equality joins).
Consider this example … (continued on next page)
SELECT
ITEM_DESC, SUM(PERCENT_DISCOUNT), SUM(EXTENDED_PRICE),
SUM(SHELF_COST_PCT_OF_SALE)
FROM
PERIOD, DAILY_SALES, PRODUCT, REPORT_PERIOD RP
WHERE
PERIOD.PERKEY=DAILY_SALES.PERKEY AND
PRODUCT.PRODKEY=DAILY_SALES.PRODKEY AND
PERIOD.CALENDAR_DATE BETWEEN RP.START_DATE AND RP.END_DATE AND
RP.RPT_NO in (33, 34)
GROUP BY
ITEM_DESC
BLU Acceleration: Native Nested Loop Join

Pre-V11, without native nested loop join (compensated execution): the result of joining the fact table (DAILY_SALES) and the other dimension (PRODUCT) is sent unfiltered to the row engine, so all the projected fact-table columns must be converted to row format. Only then is the time filtering applied.

[Access plan: the NLJOIN against REPORT_PERIOD sits above the CTQ, in the row engine, with the HSJOINs of DAILY_SALES, PERIOD, and PRODUCT below]

V11, with native nested loop join: the REPORT_PERIOD-to-PERIOD range join is done within the columnar engine, allowing the fact table to be filtered while the data remains in columnar format.

[Access plan: the NLJOIN of REPORT_PERIOD and PERIOD now executes below the CTQ, natively in the columnar engine]

Net: massive performance gains for this class of queries; 10x or more is common.
BLU Acceleration: Industry Leading Parallel Sort

New innovative radix sort implementation:
• Industry-leading performance and multi-core parallelism
• Research by IBM TJ Watson Research: http://www.vldb.org/pvldb/vol8/p1518-cho.pdf

Sort operations now execute directly on encoded data, natively in the columnar run-time engine:
• Previously such sorts would execute in 'compensation' mode (in the row run-time engine)
• e.g. ORDER BY, FETCH FIRST N ROWS, … (see the example below)

Pre-V11 (compensated execution): TPC-DS q51, 302 seconds.
[Access plan: the SORTs execute above the CTQ, in the row engine]

V11 (native execution): TPC-DS q51, 45 seconds.
[Access plan: the SORTs execute below the CTQ, in the columnar engine]

16x Faster!
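Query shapes that previously forced row-engine compensation now stay columnar; for example (hypothetical table, using the ORDER BY … FETCH FIRST pattern named above):

SELECT item_desc, SUM(extended_price) AS revenue
FROM product_sales
GROUP BY item_desc
ORDER BY revenue DESC
FETCH FIRST 10 ROWS ONLY;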
Example BLU Single Node Overall Workload Gain

DB2 Version 11.1 on Intel Haswell EP

Query throughput, BD Insights (800GB): 1.36x improvement

                   DB2 V10.5 FP5   DB2 V11.1
Queries per hour   703.85          955.82

Largest contributors to improvement in this workload:
– Native BLU execution
  • Native sort
  • Native OLAP (usually combined with sort)
  • Enables query plans to remain as much as possible within the columnar engine
– Improved SORTHEAP utilization
  • SORTHEAP is used for building hash tables for JOINs, GROUP BYs, and other runtime work
  • More efficient use allows more concurrent intra-query and inter-query operations to co-exist

Configuration details:
– 2-socket, 36-core Intel Xeon E5-2699 v3 @ 2.3GHz
– 192GB RAM
– Internal multiuser analytical workload, 800GB
Numerous Core SQL Advances
New Advanced SQL Functionality
• New methodology for building advanced aggregate UDFs
• NZPLSQL support
• New data type (VARBINARY)
• Wide variety of additional SQL functions, including more flexible date/time functions and regular
expression functions
Many New SQL Functions (supported on all tables, row and columnar)
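A hedged sketch of two of these additions together, the VARBINARY type and a regular-expression function (REGEXP_LIKE; the table and pattern are hypothetical):

CREATE TABLE events (
  id      INTEGER NOT NULL,
  payload VARBINARY(128),   -- new VARBINARY data type
  tag     VARCHAR(32)
);

SELECT id
FROM events
WHERE REGEXP_LIKE(tag, '^err(or)?_[0-9]+$');   -- new regular-expression function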