L05 IBM DB2 10.5 BLU PoT: Data Skipping

Lab 05 – Data Skipping IBM Software
Lab 05 Data Skipping

Introduction
__1. BLU Acceleration is not designed for operational queries that access a single row or a few rows (typically through an index).
__2. Operational queries against row-organized tables use indexes to jump directly to the data, so indexes provide data skipping naturally.
__3. When queries against row-organized tables cannot use indexes, DB2 performs a full table scan.
__4. For transactional workloads on row-organized tables, DB2 uses an LRU (Least Recently Used) algorithm to keep pages in the buffer pool.
__5. BLU Acceleration targets data-mart-style analytic workloads that involve grouping, aggregation, and range scans.
__6. To access column-organized data, DB2 uses a scan-friendly memory-caching algorithm instead of LRU.
__7. In the absence of indexes on column-organized tables, DB2 achieves data skipping by using the synopsis table's data for the column being accessed (the minimum and maximum values recorded for each range of rows).
__8. The synopsis table is maintained automatically during INSERT, UPDATE, and DELETE operations.
__9. Each row in the synopsis table describes a chunk of data records (usually 1,024), and an index map relates these chunks to the physical blocks on disk.
__10. The following example shows how data is skipped.
__a. Step-1 – A large quantity of data sitting in the file system.
__b. Step-2 – The data is loaded into column-organized tables, and compression reduces its size.
__c. Step-3 – The data for a single column is accessed.
__d. Step-4 – Pages whose value ranges do not qualify, according to the synopsis table, are skipped.
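Items 7 to 10 can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not DB2's implementation. The column is a plain Python list, and a row's TSN is simply its list position. Each synopsis entry keeps only TSNMIN/TSNMAX plus the column's minimum and maximum. "Reading a chunk" stands in for fetching its pages through the index map. The names `SynopsisRow`, `build_synopsis`, and `range_scan` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

CHUNK_SIZE = 1024  # rows per synopsis entry; the lab says "usually 1024"


class SynopsisRow(NamedTuple):
    tsn_min: int  # first tuple sequence number (TSN) in the chunk
    tsn_max: int  # last TSN in the chunk
    v_min: int    # smallest column value in the chunk
    v_max: int    # largest column value in the chunk


def build_synopsis(column: List[int], chunk_size: int = CHUNK_SIZE) -> List[SynopsisRow]:
    """Summarize one column as a (min, max) entry per chunk of rows."""
    synopsis = []
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]
        synopsis.append(SynopsisRow(start, start + len(chunk) - 1, min(chunk), max(chunk)))
    return synopsis


def range_scan(column: List[int], synopsis: List[SynopsisRow],
               lo: int, hi: int) -> Tuple[List[int], int]:
    """Return the TSNs whose value is in [lo, hi] and the number of chunks read."""
    matches: List[int] = []
    chunks_read = 0
    for entry in synopsis:
        if entry.v_max < lo or entry.v_min > hi:
            continue  # no value in this chunk can qualify: skip it without reading
        chunks_read += 1
        for tsn in range(entry.tsn_min, entry.tsn_max + 1):
            if lo <= column[tsn] <= hi:
                matches.append(tsn)
    return matches, chunks_read


column = list(range(4096))  # 4 chunks of 1,024 rows, values in ascending order
synopsis = build_synopsis(column)
tsns, read = range_scan(column, synopsis, 1500, 1600)
print(len(tsns), read)  # 101 1
```

The range predicate overlaps the [min, max] range of only one chunk. That chunk is the only one read, and the synopsis alone rules out the other three.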
IBM DB2 10.5 BLU Acceleration Page 45
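Item 6 above contrasts DB2's scan-friendly caching for column-organized data with LRU. The toy simulation below shows the classic weakness of plain LRU for repeated scans. If a table is even one page larger than the buffer pool, each page is evicted just before it is needed again. It models LRU only. It is not DB2's scan-friendly algorithm, which this lab does not describe. The function name `lru_hits` and the page counts are illustrative.

```python
from collections import OrderedDict
from typing import Iterable


def lru_hits(page_accesses: Iterable[int], capacity: int) -> int:
    """Count buffer hits for a cache that evicts the least recently used page."""
    cache = OrderedDict()
    hits = 0
    for page in page_accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # now the most recently used page
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = None
    return hits


# Scan the same table three times with a 4-page buffer pool.
fits = list(range(4)) * 3      # 4-page table: fits in the buffer pool
too_big = list(range(5)) * 3   # 5-page table: one page too many
print(lru_hits(fits, 4), lru_hits(too_big, 4))  # 8 0
```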

Synopsis Table
__11. In the GNOME command window, type cd5 to change to the Lab 05 directory.
$ cd5
__12. Type skip01 to check the contents of the synopsis table for the column-organized table
BLU.FACT_RX.
$ ./skip01
__13. The data from the synopsis table for BLU.FACT_RX is dumped to the skip01.log file. Run
gedit skip01.log to see the contents of the file.
$ gedit skip01.log
__14. Maximize the gedit window. [Click Edit → Preferences, uncheck Enable text wrapping, and click Close.]
__15. Notice that the data is well clustered on the low-cardinality column MONTH_ID.
__16. Use the horizontal scroll bar and notice that the PRODUCT_ID column is also well clustered.
Page 46 An IBM Proof of Technology

__17. Use the horizontal scroll bar to scroll all the way to the right and check that each TSNMIN/TSNMAX pair spans a range of 1,024 values. Internally, DB2 knows how to map this TSN (tuple sequence number) information to the actual data page pointers.
__18. While you are in the gedit window, press CTRL-Q to quit the window.
__19. Run skip02 to check the column cardinality of the FACT_RX table.
$ ./skip02
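Steps 15 and 16 point out that MONTH_ID and PRODUCT_ID are well clustered. Clustering matters because a synopsis entry can rule out a chunk only when the chunk's min/max range is narrow. The hypothetical sketch below uses 10 chunks of 1,024 rows and five stand-in month values (0 to 4, not real MONTH_ID values). It counts how many chunks an equality predicate must still read when the rows are grouped by month and when they are interleaved. The helper names are illustrative.

```python
from typing import List, Tuple

CHUNK_SIZE = 1024  # rows per synopsis entry, as in the lab


def synopsis_ranges(column: List[int], chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """(min, max) of each chunk of rows, i.e. one synopsis entry per chunk."""
    return [(min(column[i:i + chunk_size]), max(column[i:i + chunk_size]))
            for i in range(0, len(column), chunk_size)]


def chunks_to_read(column: List[int], value: int) -> int:
    """Chunks whose (min, max) range cannot rule out `value` for an equality predicate."""
    return sum(1 for lo, hi in synopsis_ranges(column) if lo <= value <= hi)


rows = 10 * CHUNK_SIZE                            # 10 chunks; 5 months of 2,048 rows each
clustered = [tsn // 2048 for tsn in range(rows)]  # rows arrive grouped by month
interleaved = [tsn % 5 for tsn in range(rows)]    # every chunk mixes all five months
print(chunks_to_read(clustered, 2), chunks_to_read(interleaved, 2))  # 2 10
```

With clustered data, only the 2 chunks that hold month 2 are read. With interleaved data, every chunk's range is [0, 4], so nothing can be skipped.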

Test
__20. Notice that MONTH_ID is a low-cardinality column. A query with a predicate on MONTH_ID skips the pages whose MONTH_ID range, as recorded in the synopsis table, cannot contain the requested value. DB2 finds the pages it still has to read through the index map, using the TSN values from the synopsis table.
__21. The same reasoning applies to every other column, which is why DB2 with BLU Acceleration does not need indexes on the columns of column-organized tables.
__22. Run skip03 to find out how many rows qualify for the predicate MONTH_ID = 200709 and for
the range of PRODUCT_ID from 11900000 to 12000000.
$ ./skip03
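For the combined predicate in step 22, a chunk can be skipped if either column's synopsis range rules it out. The sketch below assumes the PRODUCT_ID range is inclusive at both ends (as with BETWEEN). It uses made-up synopsis entries rather than the contents of skip01.log, and `may_qualify` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Hypothetical synopsis entry: per-column (min, max) for one chunk of rows.
SynopsisEntry = Dict[str, Tuple[int, int]]


def may_qualify(entry: SynopsisEntry, month_id: int, prod_lo: int, prod_hi: int) -> bool:
    """True if the chunk could hold rows with MONTH_ID = month_id AND
    prod_lo <= PRODUCT_ID <= prod_hi; False means the chunk can be skipped."""
    m_min, m_max = entry["MONTH_ID"]
    p_min, p_max = entry["PRODUCT_ID"]
    month_possible = m_min <= month_id <= m_max
    product_possible = p_max >= prod_lo and p_min <= prod_hi
    return month_possible and product_possible  # both ranges must overlap


chunks: List[SynopsisEntry] = [  # made-up entries for illustration
    {"MONTH_ID": (200708, 200708), "PRODUCT_ID": (11900000, 12500000)},  # wrong month
    {"MONTH_ID": (200709, 200709), "PRODUCT_ID": (11000000, 11500000)},  # products too low
    {"MONTH_ID": (200709, 200709), "PRODUCT_ID": (11800000, 11950000)},  # overlaps both
    {"MONTH_ID": (200709, 200710), "PRODUCT_ID": (12000000, 12300000)},  # touches upper bound
]
to_read = [i for i, c in enumerate(chunks) if may_qualify(c, 200709, 11900000, 12000000)]
print(to_read)  # [2, 3]
```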
__23. The skip04 script recycles the instance so that the query runs against the column-organized table in the COLDB database without any data cached in the buffer pool. Review the skip04 script, and then run it.
$ cat skip04
$ ./skip04

__24. Run skip05 against the row-organized table in the ROWDB database and compare the elapsed time for the SQL with that of skip04.
$ ./skip05
Note: The row-organized table has indexes on both MONTH_ID and PRODUCT_ID, but the column-organized table has none. For the column-organized table, DB2 reads only the columns that the query accesses, and only the pages that qualify for the scan according to its synopsis table.
__25. Now drop the indexes on MONTH_ID and PRODUCT_ID of the FACT_RX table in the ROWDB database, and run the skip05 query again. Run the following commands:
$ db2 connect to rowdb
$ db2 drop index blu.fact_rx_idx4
$ db2 drop index blu.fact_rx_idx6
$ ./skip05

__26. Notice that after the indexes are dropped, the elapsed time increases from 1.12 seconds to 19.6 seconds, whereas the same query ran in 0.3 seconds against the column-organized table.
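To put the step 26 timings in proportion, you can compute the ratios directly. The numbers are the ones quoted above, and timings on your own system will differ.

```python
# Elapsed times quoted in step 26, in seconds; your own run will differ.
timings = {
    "row-organized, with indexes": 1.12,
    "row-organized, indexes dropped": 19.6,
    "column-organized, no indexes": 0.3,
}

# Slowdown caused by dropping the indexes on the row-organized table.
slowdown = timings["row-organized, indexes dropped"] / timings["row-organized, with indexes"]
print(f"dropping the indexes: {slowdown:.1f}x slower")  # 17.5x

# Each run relative to the column-organized run.
baseline = timings["column-organized, no indexes"]
for label, seconds in timings.items():
    print(f"{label}: {seconds / baseline:.1f}x the column-organized time")
```

With these figures, dropping the indexes makes the row-organized query about 17.5 times slower. It then takes about 65 times as long as the column-organized query.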
__27. For analytic workloads on row-organized tables, DBAs must add indexes to speed up queries, and they repeat this tuning exercise whenever a query runs slowly. The combined size of the indexes on a fact table sometimes exceeds the size of the table itself.
__28. On a row-organized table, any query that cannot use an index must scan the full table to produce its result.
__29. Analytic workloads against column-organized tables do not require indexes on the columns to speed up queries.
__30. This does not mean that every type of query runs faster on column-organized tables than on row-organized tables. The advantage generally holds for analytic queries that use aggregation, range scans, and similar operations. OLTP-style queries are not suited to column-organized tables.
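A deliberately simple page-count model illustrates step 30's contrast between OLTP and analytic queries. All of its numbers are illustrative assumptions: 20 equal-width columns, 100 rows per row-organized page, no compression, index pages ignored, and the target row's position already known. The model does not describe how DB2 actually lays out column-organized pages. The function names are hypothetical.

```python
import math

# Toy assumptions (not DB2 internals): fixed-width columns, no compression,
# no index pages counted, and the target row's position already known.
NUM_COLS = 20
ROWS_PER_PAGE = 100                         # whole rows per row-organized page
VALUES_PER_PAGE = ROWS_PER_PAGE * NUM_COLS  # single-column values per columnar page


def row_store_pages(rows_read: int) -> int:
    """Row-organized: a page holds whole rows, so every column comes along."""
    return math.ceil(rows_read / ROWS_PER_PAGE)


def column_store_pages(rows_read: int, cols_read: int) -> int:
    """Column-organized: each referenced column is stored in its own pages."""
    return cols_read * math.ceil(rows_read / VALUES_PER_PAGE)


# OLTP-style point lookup: one row, all 20 columns.
print(row_store_pages(1), column_store_pages(1, NUM_COLS))            # 1 20
# Analytic aggregate: one column over 1,000,000 rows.
print(row_store_pages(1_000_000), column_store_pages(1_000_000, 1))  # 10000 500
```

In this model, the point lookup touches one page in the row layout but one page per column in the column layout. The single-column aggregate reads 20 times fewer pages in the column layout.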

Access Path
__31. Run skip07 to generate the access plan for the same query against the ROWDB database.
$ ./skip07
__32. Run gedit skip07.exfmt to see the contents of the file. Notice the full table scan (TBSCAN) for the SQL statement.
$ gedit skip07.exfmt
__33. Press CTRL-Q to quit gedit.

__34. Run skip08, which re-creates the two indexes that you dropped earlier and runs the explain plan again.
$ ./skip08
__35. Run gedit skip08.exfmt to see the contents of the file. Notice the index scan (IXSCAN) for the SQL statement now that the two indexes have been re-created.
$ gedit skip08.exfmt
__36. Press CTRL-Q to quit gedit.

__37. Run skip09, which runs the explain plan for the same query against the column-organized table.
$ ./skip09
__38. Run gedit skip09.exfmt to see the contents of the file. Notice the CTQ (Column Table Queue) operator for the SQL statement.
$ gedit skip09.exfmt


__39. Scroll down to the plan details in section 4 for the TBSCAN, and notice that the PREFETCH argument of the table scan is DYNAMIC LIST. This is specific to column-organized tables; you will not see it for the row-organized table.
__40. Press CTRL-Q to quit gedit.
__41. Type clear.

$ clear
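To compare the three plans from steps 31 to 38 side by side, a small helper can report which of the operators discussed above (TBSCAN, IXSCAN, CTQ) appear in each explain file. This is a rough text search, not a parser for the explain format. A name that appears anywhere in the file, for example in the statement text, is also counted. The helper name `plan_operators` is hypothetical. The file names are the ones the lab scripts produce.

```python
import re
from pathlib import Path
from typing import Set

# Operator names discussed in steps 32 to 39.
OPERATORS = ("TBSCAN", "IXSCAN", "CTQ")


def plan_operators(exfmt_text: str) -> Set[str]:
    """Return which of OPERATORS appear as whole words in explain output text."""
    return {op for op in OPERATORS if re.search(rf"\b{op}\b", exfmt_text)}


for name in ("skip07.exfmt", "skip08.exfmt", "skip09.exfmt"):
    path = Path(name)
    if path.exists():  # run from the Lab 05 directory after steps 31 to 38
        print(name, sorted(plan_operators(path.read_text())))
```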
** End of Lab 05: Data Skipping
