l05 Ibm Db2 10.5 Blu Pot Data Skipping
__2. Operational queries against row organized tables that use indexes jump directly to the data, so
data skipping comes naturally from the use of indexes.
__3. When queries against row organized tables cannot use indexes, a full table scan is performed.
__4. DB2 uses an LRU (Least Recently Used) algorithm to keep pages in the buffer pool for transactional
workloads using row organized tables.
__5. BLU acceleration is for data-mart-like analytic workloads that rely on operations such as grouping,
aggregation, and range scans.
__6. DB2 uses a scan-friendly memory-caching algorithm to access column-organized data, as opposed
to LRU.
__7. In the absence of indexes on column organized tables, DB2 achieves data skipping by using
data from the synopsis table for the column being accessed.
__8. The synopsis table is automatically maintained during INSERT, UPDATE, and DELETE.
__9. Each row in the synopsis table is tied to a chunk of data records (usually 1024), and an index
map relates these chunks to the physical blocks on disk.
__b. Step-2 – Data is loaded in column organized tables and compression reduces the data.
__d. Step-4 – Pages are skipped for ranges that do not qualify according to the synopsis table.
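The skipping idea described above can be modeled in a few lines of Python. This is only a conceptual sketch, not DB2 internals: it assumes each synopsis row records the minimum and maximum column value for a chunk of 1024 rows together with the chunk's TSN range, and the data, the names SynopsisRow, build_synopsis, chunks_to_scan and count_matches are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

CHUNK_ROWS = 1024  # rows summarized per synopsis entry, as described in the lab


@dataclass
class SynopsisRow:
    tsn_min: int  # first tuple sequence number covered by this entry
    tsn_max: int  # last tuple sequence number covered by this entry
    col_min: int  # smallest column value in the chunk
    col_max: int  # largest column value in the chunk


def build_synopsis(values: Sequence[int], chunk_rows: int = CHUNK_ROWS) -> List[SynopsisRow]:
    """Summarize each chunk of `chunk_rows` values by its min and max."""
    synopsis = []
    for start in range(0, len(values), chunk_rows):
        chunk = values[start:start + chunk_rows]
        synopsis.append(SynopsisRow(start, start + len(chunk) - 1, min(chunk), max(chunk)))
    return synopsis


def chunks_to_scan(synopsis: List[SynopsisRow], low: int, high: int) -> List[SynopsisRow]:
    """Keep only entries whose [col_min, col_max] overlaps the range [low, high]."""
    return [s for s in synopsis if s.col_max >= low and s.col_min <= high]


def count_matches(values: Sequence[int], synopsis: List[SynopsisRow], low: int, high: int) -> int:
    """Count qualifying rows while reading only the chunks that survive skipping."""
    total = 0
    for s in chunks_to_scan(synopsis, low, high):
        total += sum(1 for v in values[s.tsn_min:s.tsn_max + 1] if low <= v <= high)
    return total


# Hypothetical data: 12 month IDs, 2048 rows each, stored in month order (clustered).
months = [200701 + m for m in range(12)]
month_id = [m for m in months for _ in range(2048)]

synopsis = build_synopsis(month_id)
survivors = chunks_to_scan(synopsis, 200709, 200709)
print(len(synopsis), "synopsis rows;", len(survivors), "chunks read for MONTH_ID = 200709")
print(count_matches(month_id, synopsis, 200709, 200709), "rows qualify")
```

With this made-up, well-clustered data, only 2 of the 24 chunks overlap the predicate, so the other 22 are never read. If the same values were stored in random order, most chunks would span the whole value range and little could be skipped, which is why clustering matters for the synopsis approach.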
Synopsis Table
__11. In GNOME Command window, type cd5 to change the directory to the Lab 05.
$ cd5
__12. Type skip01 to check the contents of the synopsis table for column organized table
BLU.FACT_RX.
$ ./skip01
__13. The data from the synopsis table for BLU.FACT_RX is dumped in the skip01.log file. Run
gedit skip01.log to see the contents of the file.
$ gedit skip01.log
__14. Maximize the gedit screen. [Click Edit > Preferences, uncheck Enable text
wrapping, and click Close.]
__15. Please notice that the data is nicely clustered around the low cardinality column MONTH_ID.
__16. Use horizontal scroll bar and notice that the PRODUCT_ID column is also clustered well.
__17. Use the horizontal scroll bar to go all the way to the right and check that the TSNMIN and TSNMAX
values span ranges of 1024 values. Internally DB2 knows how to map this TSN information
to the actual data page pointers.
__18. While you are in the gedit window, press CTRL-Q to quit the window.
__19. Run skip02 to check the column cardinality of the FACT_RX table.
$ ./skip02
__20. Please notice that MONTH_ID is a low cardinality column. Any query with a predicate on the MONTH_ID
column will skip pages whose value range does not include the requested value. DB2 finds the index map of
those pages using values of the TSN columns from the synopsis table.
__21. The same reasoning holds for all other columns, and that is why indexes on columns are eliminated in
DB2 using BLU acceleration for column organized tables.
__22. Run skip03 to find out how many rows qualify for the predicate MONTH_ID = 200709 and for
the range of PRODUCT_ID from 11900000 to 12000000.
$ ./skip03
__23. In the script skip04, the instance is recycled so that we can run the query against the column
organized table in the COLDB database without using any data cached in the buffer pool. Check
script skip04 and then run it.
$ cat skip04
$ ./skip04
__24. Run skip05 against row organized tables in the ROWDB database and compare the time elapsed
for the SQL.
$ ./skip05
__25. We will now drop the indexes on MONTH_ID and PRODUCT_ID of the FACT_RX table in the ROWDB
database and run the skip05 query again. Run the following commands.
$ db2 connect to rowdb
$ db2 drop index blu.fact_rx_idx4
$ db2 drop index blu.fact_rx_idx6
$ ./skip05
__26. Please notice that the elapsed time after dropping the indexes increased from 1.12 seconds to 19.6
seconds, whereas the same query ran in 0.3 seconds against the column organized table.
__27. For analytics workloads on row organized tables, DBAs need to add indexes to speed up queries, and
this tuning exercise is repeated whenever a query runs slowly. It is well known that the combined size of
the indexes on a fact table is sometimes larger than the table itself.
__28. Any query that cannot use an index must scan the full table to produce the result.
__29. The analytics workload against column organized table does not require creating indexes on the
columns to speed up the queries.
__30. It should not be construed that all types of queries will run faster on column organized tables
than on row organized tables. Generally, this holds for analytic queries that use aggregation,
range scans, and so on. OLTP-type queries are not suited to column organized tables.
Access Path
__31. Run skip07 to generate the access plan for the same query for the ROWDB database.
$ ./skip07
__32. Run gedit skip07.exfmt to see the contents of the file. Check the full table scan for the
SQL.
$ gedit skip07.exfmt
__34. Run skip08 which will create 2 indexes that we dropped earlier and run the explain plan again.
$ ./skip08
__35. Run gedit skip08.exfmt to see the contents of the file. See the index scan for the SQL
after the 2 indexes are created.
$ gedit skip08.exfmt
__37. Run skip09 which will run the explain plan on the same query against column organized table.
$ ./skip09
__38. Run gedit skip09.exfmt to see the contents of the file. See the CTQ (Column Table Queue)
scan for the SQL.
$ gedit skip09.exfmt
__39. Scroll down and check the plan details in section 4 for TBSCAN and notice the DYNAMIC LIST
PREFETCH argument for the table scan. This is specific to column organized tables and you
will not see it for row organized tables.
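As an alternative to reading the three explain files by eye, a small script can report which plan operators each file mentions. This is a hedged sketch: it assumes the operator names appear in the files as the plain tokens TBSCAN, IXSCAN and CTQ (IXSCAN being the usual name for an index scan in formatted explain output), and the function count_operators is hypothetical, not part of any DB2 tooling.

```python
import re
from pathlib import Path
from typing import Dict, Iterable

# Operator names relevant to this section's access plans (assumed token spellings).
OPERATORS = ("TBSCAN", "IXSCAN", "CTQ")


def count_operators(plan_text: str, operators: Iterable[str] = OPERATORS) -> Dict[str, int]:
    """Count whole-word occurrences of each operator name in the plan text."""
    return {op: len(re.findall(rf"\b{re.escape(op)}\b", plan_text)) for op in operators}


if __name__ == "__main__":
    # File names come from steps 32, 35 and 38; files that are missing are skipped.
    for name in ("skip07.exfmt", "skip08.exfmt", "skip09.exfmt"):
        path = Path(name)
        if path.exists():
            print(name, count_operators(path.read_text(errors="replace")))
```

Run from the Lab 05 directory after the skip07, skip08 and skip09 scripts have produced their output. A nonzero CTQ count would be expected only for the plan against the column organized table.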