
Lab 05 – Data Skipping IBM Software

Lab 05 Data Skipping


Introduction
__1. BLU Acceleration is not intended for operational queries that access a single row or a few rows
(typically through an index).

__2. Operational queries against row-organized tables use indexes to jump directly to the data, so
indexes provide data skipping naturally.

__3. When a query against a row-organized table cannot use an index, DB2 performs a full table scan.

__4. For transactional workloads on row-organized tables, DB2 uses an LRU (Least Recently Used)
algorithm to decide which pages to keep in the buffer pool.

__5. BLU Acceleration targets data-mart-style analytic workloads that perform operations such as
grouping, aggregation, and range scans.

__6. To access column-organized data, DB2 uses a scan-friendly memory-caching algorithm instead of
LRU.
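A small simulation shows why plain LRU handles large scans poorly. The Python sketch below is only an illustration and is not DB2's actual caching algorithm, which this lab does not describe. It replays three sequential scans of five pages through a cache that can hold four pages. Under LRU, every access misses, because each page is evicted just before the scan returns to it. Evicting the most recently used page instead is a textbook scan-resistant policy, and it keeps part of the scan in memory. The helper name `count_hits` is hypothetical.

```python
from collections import OrderedDict

# Illustration only: this is not DB2's BLU caching algorithm.

def count_hits(accesses, capacity, evict_most_recent):
    """Replay page accesses through a fixed-size cache and count the hits.

    evict_most_recent=False gives classic LRU eviction.
    evict_most_recent=True evicts the most recently used page (MRU eviction).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark the page as most recently used
            continue
        if len(cache) >= capacity:
            # popitem(last=False) drops the oldest entry, last=True the newest
            cache.popitem(last=evict_most_recent)
        cache[page] = True
    return hits

# Three full scans over 5 pages, with room for only 4 pages in memory.
scan = list(range(5)) * 3
print("LRU hits:", count_hits(scan, capacity=4, evict_most_recent=False))
print("MRU-eviction hits:", count_hits(scan, capacity=4, evict_most_recent=True))
```

With these inputs, the LRU run reports 0 hits and the MRU-eviction run reports 8.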

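__7. In the absence of indexes on column-organized tables, DB2 achieves data skipping by using the
synopsis table. The synopsis table records the minimum and maximum values of each column for every
chunk of rows, and DB2 consults the entries for the column being accessed.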

__8. The synopsis table is maintained automatically during INSERT, UPDATE, and DELETE operations.

__9. Each row in the synopsis table describes a chunk of data records (usually 1024 rows, identified
by a range of tuple sequence numbers, or TSNs). An index map relates these records to the physical
blocks on disk.
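The following is a simplified sketch of how synopsis rows could be derived for a single column. It assumes that each value's position in the list plays the role of its TSN and that each synopsis row summarizes 1024 rows. The real synopsis table covers every column, and its layout is internal to DB2. `SynopsisRow`, `build_synopsis`, and the sample MONTH_ID values are hypothetical.

```python
from dataclasses import dataclass

CHUNK_SIZE = 1024  # rows summarized by one synopsis row (assumed)

@dataclass
class SynopsisRow:
    tsn_min: int   # first TSN in the chunk
    tsn_max: int   # last TSN in the chunk
    col_min: int   # smallest column value in the chunk
    col_max: int   # largest column value in the chunk

def build_synopsis(column_values, chunk_size=CHUNK_SIZE):
    """Summarize one column as (TSN range, min, max) per chunk of rows.

    A value's position in column_values stands in for its TSN.
    """
    synopsis = []
    for start in range(0, len(column_values), chunk_size):
        chunk = column_values[start:start + chunk_size]
        synopsis.append(SynopsisRow(
            tsn_min=start,
            tsn_max=start + len(chunk) - 1,
            col_min=min(chunk),
            col_max=max(chunk),
        ))
    return synopsis

# Hypothetical MONTH_ID values for 3000 rows loaded in month order.
month_ids = [200701] * 1200 + [200702] * 1000 + [200703] * 800
for row in build_synopsis(month_ids):
    print(row)
```

Because the rows arrive in month order, each synopsis row spans at most two adjacent months.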

__10. The following steps illustrate how data is skipped.

__a. Step-1 – A large quantity of data resides in the file system.

__b. Step-2 – The data is loaded into column-organized tables, and compression reduces its size.

__c. Step-3 – The data is accessed for one column.

__d. Step-4 – Pages for ranges that do not qualify according to the synopsis table are skipped.
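The four steps above can be sketched in Python as a range scan that checks each chunk's minimum and maximum values before reading it. This is a conceptual model under simplifying assumptions: list positions stand in for TSNs, each chunk holds 1024 rows, and the data is synthetic. Compression and page layout are left out, and the function names are hypothetical.

```python
CHUNK_SIZE = 1024  # rows per synopsis entry (simplifying assumption)

def build_synopsis(values, chunk_size=CHUNK_SIZE):
    """Return (tsn_min, tsn_max, col_min, col_max) for each chunk of rows."""
    result = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        result.append((start, start + len(chunk) - 1, min(chunk), max(chunk)))
    return result

def scan_with_skipping(values, synopsis, low, high):
    """Return the TSNs where low <= value <= high, plus the number of chunks read.

    A chunk is read only if its [col_min, col_max] range overlaps [low, high].
    Every other chunk is skipped without touching its data.
    """
    matches, chunks_read = [], 0
    for tsn_min, tsn_max, col_min, col_max in synopsis:
        if col_max < low or col_min > high:
            continue  # the synopsis proves that no row in this chunk qualifies
        chunks_read += 1
        for tsn in range(tsn_min, tsn_max + 1):
            if low <= values[tsn] <= high:
                matches.append(tsn)
    return matches, chunks_read

# Hypothetical data: 10,240 rows whose values increase with load order.
values = [i // 100 for i in range(10_240)]
syn = build_synopsis(values)
matches, read = scan_with_skipping(values, syn, low=30, high=35)
print(len(matches), "matching rows;", read, "of", len(syn), "chunks read")
```

On this data, the scan finds 600 matching rows and reads only 2 of the 10 chunks.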

IBM DB2 10.5 BLU Acceleration Page 45



Synopsis Table
__11. In the GNOME Command window, type cd5 to change to the Lab 05 directory.
$ cd5

__12. Run skip01 to display the contents of the synopsis table for the column-organized table
BLU.FACT_RX.
$ ./skip01

__13. The synopsis table data for BLU.FACT_RX is written to the skip01.log file. Run
gedit skip01.log to view the file.
$ gedit skip01.log

__14. Maximize the gedit window. [Click Edit → Preferences, uncheck Enable text wrapping, and
click Close.]

__15. Notice that the data is well clustered on the low-cardinality column MONTH_ID.

__16. Use the horizontal scroll bar and notice that the PRODUCT_ID column is also well clustered.
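A small experiment shows why clustering matters. It stores the same set of month numbers in two different orders. When the rows for one month sit together, only one chunk's min/max range can contain the requested month. When every chunk mixes all months, every range contains it and nothing is skipped. The month numbers 0 to 11 stand in for MONTH_ID values. The data and the `chunks_to_read` helper are hypothetical, and the sketch assumes that clustering comes from the order in which rows were loaded.

```python
CHUNK_SIZE = 1024  # rows summarized by one synopsis entry (assumed)

def chunks_to_read(values, target, chunk_size=CHUNK_SIZE):
    """Count the chunks whose [min, max] range could contain `target`."""
    count = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        if min(chunk) <= target <= max(chunk):
            count += 1
    return count

N_MONTHS = 12
N_ROWS = N_MONTHS * CHUNK_SIZE

# The same multiset of month numbers (1024 rows per month) in two physical orders.
clustered = [i // CHUNK_SIZE for i in range(N_ROWS)]  # loaded one month at a time
interleaved = [i % N_MONTHS for i in range(N_ROWS)]   # every chunk mixes all months

print("clustered:", chunks_to_read(clustered, 5), "of", N_MONTHS, "chunks read")
print("interleaved:", chunks_to_read(interleaved, 5), "of", N_MONTHS, "chunks read")
```

With this data, the clustered order reads 1 of the 12 chunks, and the interleaved order reads all 12.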

Page 46 An IBM Proof of Technology



__17. Scroll all the way to the right and check that each TSNMIN and TSNMAX pair spans a range of
1024 values. Internally, DB2 knows how to map this TSN information to the actual data page
pointers.
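One way to picture how a TSN range becomes page pointers is a sorted map from the first TSN stored on each page to that page's ID, searched with binary search. The lab does not show DB2's internal structure, so `PAGE_MAP`, the page IDs, and `pages_for_tsn_range` are all hypothetical. The uneven page boundaries reflect only the assumption that compression lets pages hold different numbers of values.

```python
from bisect import bisect_right

# Hypothetical page map for one column: (first TSN stored on the page, page id).
# Pages hold different numbers of values because compression varies.
PAGE_MAP = [(0, 101), (700, 102), (1500, 103), (2048, 104), (3000, 105)]
FIRST_TSNS = [first for first, _ in PAGE_MAP]

def pages_for_tsn_range(tsn_min, tsn_max):
    """Return the ids of the pages that hold TSNs tsn_min..tsn_max."""
    # The page holding a TSN is the last page whose first TSN is <= that TSN.
    start = bisect_right(FIRST_TSNS, tsn_min) - 1
    end = bisect_right(FIRST_TSNS, tsn_max) - 1
    return [page_id for _, page_id in PAGE_MAP[start:end + 1]]

# A synopsis row that covers TSNs 1024..2047 maps to two pages.
print(pages_for_tsn_range(1024, 2047))  # [102, 103]
```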

__18. Press CTRL-Q to close gedit.

__19. Run skip02 to check the column cardinality of the FACT_RX table.
$ ./skip02


Test
__20. Notice that MONTH_ID is a low-cardinality column. A query with a predicate on MONTH_ID skips
the pages whose MONTH_ID range, as recorded in the synopsis table, excludes the requested value. DB2
locates the pages to read through the index map, using the TSN values from the synopsis table.

__21. The same reasoning applies to every other column, which is why DB2 with BLU Acceleration needs
no indexes on column-organized tables.

__22. Run skip03 to find out how many rows satisfy the predicates MONTH_ID = 200709 and
PRODUCT_ID between 11900000 and 12000000.
$ ./skip03
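When two predicates are combined with AND, a chunk can be skipped as soon as either column's synopsis range rules it out. The sketch below applies the predicate values from step 22 to synthetic data: 8192 hypothetical rows, with the first half in month 200708, the second half in month 200709, and PRODUCT_ID increasing within each month. The counts it prints describe only this synthetic data, not the FACT_RX table or the output of skip03. The function names are hypothetical.

```python
CHUNK_SIZE = 1024  # rows per synopsis entry (assumed)

def column_synopsis(values, chunk_size=CHUNK_SIZE):
    """Return (min, max) of one column for each chunk of rows."""
    return [(min(values[s:s + chunk_size]), max(values[s:s + chunk_size]))
            for s in range(0, len(values), chunk_size)]

def qualifying_chunks(month_syn, product_syn, month, prod_low, prod_high):
    """Return the chunk numbers that survive both synopsis checks (AND of predicates)."""
    keep = []
    for chunk, ((m_lo, m_hi), (p_lo, p_hi)) in enumerate(zip(month_syn, product_syn)):
        month_may_match = m_lo <= month <= m_hi
        product_may_match = p_lo <= prod_high and p_hi >= prod_low
        if month_may_match and product_may_match:
            keep.append(chunk)
    return keep

# Synthetic columns for 8 chunks of rows.
rows = 8 * CHUNK_SIZE
month_id = [200708 if i < rows // 2 else 200709 for i in range(rows)]
product_id = [11_800_000 + (i % (rows // 2)) * 100 for i in range(rows)]

keep = qualifying_chunks(column_synopsis(month_id), column_synopsis(product_id),
                         200709, 11_900_000, 12_000_000)
# Only the surviving chunks are scanned row by row.
count = sum(
    1
    for c in keep
    for i in range(c * CHUNK_SIZE, min((c + 1) * CHUNK_SIZE, rows))
    if month_id[i] == 200709 and 11_900_000 <= product_id[i] <= 12_000_000
)
print("chunks read:", keep, "qualifying rows:", count)
```

Only chunks 4 and 5 survive both checks, and 1001 synthetic rows qualify.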

__23. The skip04 script recycles the instance so that the query runs against the column-organized
table in the COLDB database without any data cached in the buffer pool. Review skip04 and then run
it.
$ cat skip04
$ ./skip04


__24. Run skip05 against the row-organized table in the ROWDB database and compare the elapsed time
of the SQL statement.
$ ./skip05

Note: The row-organized table has indexes on MONTH_ID and on PRODUCT_ID, but the column-organized
table has none.

For the column-organized table, DB2 reads only the columns referenced by the query, and only the
pages that qualify for the scan according to its synopsis table.
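A rough count of the values each table type touches helps explain the note above. The sketch compares a full scan of a row-organized table (the no-index case) with a column-organized scan that reads only the referenced columns, in only the chunks that the synopsis table keeps. All sizes are hypothetical. The model ignores compression, page sizes, and prefetching, so it indicates direction only and does not predict timings. The function names are hypothetical.

```python
CHUNK_SIZE = 1024  # rows per synopsis chunk (assumed)

def values_read_row_scan(n_rows, n_columns):
    """A full scan of a row-organized table touches every column of every row."""
    return n_rows * n_columns

def values_read_column_scan(kept_chunks, referenced_columns):
    """A column-organized scan touches only the referenced columns in kept chunks."""
    return kept_chunks * CHUNK_SIZE * referenced_columns

# Hypothetical table: 10,000 chunks of 1024 rows and 20 columns.
# The query references 2 columns, and the synopsis table keeps 100 chunks.
row_values = values_read_row_scan(10_000 * CHUNK_SIZE, 20)
column_values = values_read_column_scan(100, 2)
print(row_values, column_values, row_values // column_values)  # 204800000 204800 1000
```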

__25. Now drop the indexes on MONTH_ID and PRODUCT_ID of the FACT_RX table in the ROWDB database and
run the skip05 query again. Run the following commands.
$ db2 connect to rowdb
$ db2 drop index blu.fact_rx_idx4
$ db2 drop index blu.fact_rx_idx6
$ ./skip05


__26. Notice that after the indexes are dropped, the elapsed time increases from 1.12 seconds to 19.6
seconds, whereas the same query ran in 0.3 seconds against the column-organized table.
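The timings in step 26 can be compared as ratios. They are single measurements from this lab environment and say nothing general about DB2 performance. The snippet only does the arithmetic.

```python
# Elapsed times reported in step 26, in seconds.
row_with_indexes = 1.12
row_without_indexes = 19.6
column_organized = 0.3

print(f"row table, indexes dropped vs. kept: {row_without_indexes / row_with_indexes:.1f}x slower")
print(f"column vs. row table without indexes: {row_without_indexes / column_organized:.1f}x faster")
print(f"column vs. row table with indexes: {row_with_indexes / column_organized:.1f}x faster")
```

The ratios come to about 17.5x, 65.3x, and 3.7x.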

__27. For analytic workloads on row-organized tables, DBAs must add indexes to speed up queries, and
this tuning exercise repeats whenever a query runs slowly. The combined size of the indexes on a
fact table is sometimes larger than the table itself.

__28. Any query against a row-organized table that cannot use an index must scan the full table to
produce its result.

__29. Analytic workloads against column-organized tables do not require indexes to speed up
queries.

__30. This does not mean that every type of query runs faster on column-organized tables than on
row-organized tables. It is generally true for analytic queries that use aggregation, range scans,
and similar operations. OLTP-type queries are not suited to column-organized tables.


Access Path
__31. Run skip07 to generate the access plan for the same query against the ROWDB database.
$ ./skip07

__32. Run gedit skip07.exfmt to view the access plan, and note the full table scan (TBSCAN) for the
SQL statement.
$ gedit skip07.exfmt

__33. Press CTRL-Q to quit gedit.


__34. Run skip08, which re-creates the two indexes dropped earlier and generates the explain plan
again.
$ ./skip08

__35. Run gedit skip08.exfmt to view the access plan. Note the index scan for the SQL statement now
that the two indexes have been re-created.
$ gedit skip08.exfmt


__36. Press CTRL-Q to quit gedit.

__37. Run skip09, which generates the explain plan for the same query against the column-organized
table.
$ ./skip09

__38. Run gedit skip09.exfmt to view the access plan. Note the CTQ (column table queue) operator,
which marks where data passes from column-organized processing to row-organized processing.
$ gedit skip09.exfmt


__39. Scroll down to the plan details for operator 4 (TBSCAN) and notice that the PREFETCH argument
of the table scan has the value DYNAMIC LIST. This is specific to column-organized tables; you will
not see it for the row-organized table.
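The lab does not explain DB2's prefetching logic, so the following is only a conceptual sketch and an assumption, not DB2's implementation. List-based prefetching can be pictured as building the list of pages to read ahead from the chunks that survive the synopsis check, instead of prefetching every page in sequence. `build_prefetch_list`, `batches`, and the sample synopsis entries with page IDs are hypothetical.

```python
def build_prefetch_list(synopsis, low, high):
    """Collect the page ids of chunks whose [min, max] overlaps [low, high].

    synopsis: list of (col_min, col_max, page_ids) tuples, one per chunk.
    Pages shared by neighbouring chunks are listed once, in page order.
    """
    pages = set()
    for col_min, col_max, page_ids in synopsis:
        if col_max >= low and col_min <= high:
            pages.update(page_ids)
    return sorted(pages)

def batches(page_list, batch_size):
    """Split the page list into fixed-size prefetch requests."""
    return [page_list[i:i + batch_size] for i in range(0, len(page_list), batch_size)]

# Hypothetical synopsis for one column: (min, max, pages holding the chunk).
SYNOPSIS = [
    (200701, 200703, [10, 11]),
    (200703, 200706, [11, 12]),
    (200706, 200709, [12, 13, 14]),
    (200709, 200712, [14, 15]),
]
pages = build_prefetch_list(SYNOPSIS, 200709, 200709)
print(pages)              # [12, 13, 14, 15]
print(batches(pages, 2))  # [[12, 13], [14, 15]]
```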

__40. Press CTRL-Q to quit gedit.

__41. Type clear.
$ clear

** End of Lab 05: Data Skipping
