Performance Comparison
A page is the basic building block of a DB2 database and also the basic unit of I/O. Traditionally,
DB2 stored data on a page row-wise, meaning that a given page contained multiple rows of data.
With version 10.5, DB2 introduced a new way of organizing data on a page: data can also be
organized by column, or 'column organized'. DB2 introduced this feature to improve the performance
of certain database queries many times over. This article explains how data is actually stored on
data pages for column-organized tables, the performance benefits that result, and how row-organized
and column-organized tables compare in performance.
The diagram in Figure 1 is a pictorial representation of a row-organized table, where each
DB2 page contains multiple rows. Please note that this is just an illustration; an actual DB2 page
contains more attributes, such as page header information, a slot directory, and so on. In a
row-organized table, a row cannot span multiple pages.
With a columnar format, a single page stores the values of just a single column. DB2 allocates
extents of pages for each column in a column-organized table. The page size and extent size are
fixed for each table, based on the table space assigned when the CREATE TABLE statement is
executed. It means that when the database engine performs I/O to retrieve data, it performs
I/O only for the columns needed to satisfy the query. This can save a lot of resources when
processing certain kinds of queries.
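The effect of the two layouts on a scan can be sketched with a small Python model. This is only an illustration under simplifying assumptions that are not from DB2 itself: every value has the same fixed width, page headers and compression are ignored, and the function name `pages_read` and its parameters are hypothetical.

```python
from math import ceil


def pages_read(num_rows, num_cols, cols_needed, values_per_page):
    """Return (row_organized_pages, column_organized_pages) for a full scan.

    Simplified model: fixed-width values, no page header, no compression.
    """
    # Row-organized: a page holds whole rows (a row cannot span pages),
    # so the scan reads every page regardless of the columns requested.
    rows_per_page = values_per_page // num_cols
    row_pages = ceil(num_rows / rows_per_page)

    # Column-organized: a page holds values of one column only,
    # so the scan reads just the pages of the referenced columns.
    col_pages = cols_needed * ceil(num_rows / values_per_page)
    return row_pages, col_pages


row_pages, col_pages = pages_read(
    num_rows=100_000, num_cols=12, cols_needed=1, values_per_page=1_000
)
print(row_pages, col_pages)  # 1205 100
```

In this model a query that needs one of twelve columns touches roughly one twelfth of the pages, while a query that needs every column touches about the same number of pages in both layouts.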
Figure 2 is a pictorial representation of a column-organized table, where we can see that
pages are allocated column-wise. Each page is filled with data from a single column; the number of
rows whose data fits in a page varies.
TSN stands for Tuple Sequence Number (it is like the Row ID of a row-organized table). Each row is
assigned a TSN, in ascending order, when the data row is stored. A TSN uniquely identifies
one row of data within a table. DB2 uses the TSN to locate and retrieve column data for a specific
row. When a column-organized table is created, DB2 creates a system-generated page map index.
The index contains one entry for each page in the column-organized table. The index is assigned a
system-generated name and uses a schema of SYSIBM. The page map index maps the TSNs to
pages.
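The idea of mapping TSNs to pages can be sketched in Python as follows. This is a toy model, not DB2's internal structure: it assumes one index entry per page keyed by the first TSN stored on that page, covers a single column, and the class name `PageMapIndex` and the page identifiers are hypothetical.

```python
from bisect import bisect_right


class PageMapIndex:
    """Toy page map: one entry per page, keyed by the first TSN on that page."""

    def __init__(self):
        self._start_tsns = []  # ascending first TSN of each page
        self._page_ids = []    # page id stored at the same position

    def add_page(self, start_tsn, page_id):
        # Pages are registered in ascending TSN order, as rows are appended.
        if self._start_tsns and start_tsn <= self._start_tsns[-1]:
            raise ValueError("pages must be added in ascending TSN order")
        self._start_tsns.append(start_tsn)
        self._page_ids.append(page_id)

    def page_for(self, tsn):
        # Find the last page whose first TSN is <= the requested TSN.
        pos = bisect_right(self._start_tsns, tsn) - 1
        if pos < 0:
            raise KeyError(tsn)
        return self._page_ids[pos]


index = PageMapIndex()
index.add_page(0, "P10")
index.add_page(500, "P11")
index.add_page(1200, "P12")
print(index.page_for(499), index.page_for(500), index.page_for(5000))  # P10 P11 P12
```

Because the entries are kept in ascending TSN order, a lookup is a binary search rather than a scan over all pages.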
• Column-Organized Storage Object: The user column data is stored in a set of pages termed the
column-organized storage object. It includes the user data and the available empty pages.
• Data Object: The column-level dictionaries and some other table metadata are stored in the
data storage object for the table.
CREATE TABLE STUDENT (ID SMALLINT NOT NULL, NAME VARCHAR(9), STREAM VARCHAR(10)) ORGANIZE BY COLUMN IN
USERSPACE1
For the full syntax of db2convert, please refer to the DB2 10.5 Infocenter link.
The following command converts all row-organized user-defined tables to column-organized
tables within the database SAMPLE:
db2convert -d SAMPLE
The following command converts the single row-organized table SCHEMA1.TAB1 to a
column-organized table in the database SAMPLE:
db2convert -d SAMPLE -z SCHEMA1 -t TAB1
For example:
Call admin_move_table ('TEST','ACCT2','AS2','AS2','AS2','ORGANIZE BY COLUMN',
'','','','COPY_USE_LOAD','MOVE')
The first thing to look for is the TABLEORG column in SYSCAT.TABLES. A value of 'C' indicates
a column-organized table and 'R' indicates a row-organized table.
A new column, MPAGES, has also been added to SYSCAT.TABLES. It indicates the total number of
pages used for table metadata, and it is non-zero only for a table that is organized by column.
For column-organized tables, the user table data is stored in a special column-organized storage
object. The column NPAGES in SYSCAT.TABLES is a count of these pages with table column data. Since
the column dictionaries for column-organized tables can be much larger than the dictionary data
stored for row-organized tables, the MPAGES count includes these column dictionaries. The column
FPAGES in SYSCAT.TABLES shows a total page count for column-organized tables, which includes both
the data object and the column-organized object.
Below is an example of output for newly introduced columns for SYSCAT.TABLES for both Row
and Column Organized tables.
db2 "select substr(tabschema,1,8) as schema,substr(tabname,1,20) as
table,colcount,card,tableorg,npages,fpages,mpages from syscat.tables where tabschema='SCHEMA1'"
Apart from these, many new monitor elements have been added to the monitoring routines for
column-organized tables. Examples include pool_col_l_reads, pool_col_p_reads, pool_async_col_reads,
pool_async_col_writes, object_col_l_reads and object_col_p_reads. These monitor elements
should be used to understand what portion of the I/O is being driven by access to
column-organized tables when a workload touches both row-organized and column-organized tables.
Using the same data, we created one row-organized table named TEST2_ROWORG and one
column-organized table named TEST2_COLORG. We used the clause ORGANIZE BY ROW to
create the row-organized table because the registry variable DB2_WORKLOAD had been set to
ANALYTICS. With that setting, all tables are created as column-organized tables by default,
because the DFT_TABLE_ORG database configuration parameter is set to COLUMN.
Both tables have the same structure and cardinality, as shown in the Table Attributes table below.
COLCOUNT 12
Upon running the query below, it can be seen that the column-organized table is smaller than
its row-organized counterpart.
For the column-organized version of the table, most of the disk space is allocated
in the column-organized object, reported in the column COL_OBJECT_P_SIZE.
The column dictionaries and other metadata are allocated in the data object, shown as
DATA_OBJECT_P_SIZE. The page map index for a column-organized table takes up a small
amount of space in the index object for each table, shown as INDEX_OBJECT_P_SIZE. The
amounts shown are in kilobytes.
Next, we ran the following equivalent queries against the row-organized and column-organized tables.
db2 "select * from schema2.test2_roworg"
db2 "select * from schema2.test2_colorg"
Below are the results for the roworg and colorg tables for the above two queries. The results
were collected by using the db2batch and db2exfmt (total query cost) tools, as shown in the
Result1 table.
POOL_READ_TIME 67 0
We can see that the column-organized table performs better in every aspect. This is because,
for larger tables, the total number of pages to be fetched from disk into the buffer pool is far
lower than for the row-organized table. The TOTAL_L_READS count for the column-organized table is
just over half of the count for the row-organized table, hence the overall benefit in performance.
Another query was then run on the same tables, this time selecting only one column. The results
were even better for the column-organized table, as shown in the Result2 table.
db2 "select col1 from schema2.test2_roworg"
db2 "select col1 from schema2.test2_colorg"
POOL_READ_TIME 26 0
Here, we can see that logical reads have dropped considerably for the column-organized table.
Overall I/O is reduced with columnar technology because only the columns the query needs are read.
This can often eliminate 95 percent of the I/O, because most analytic workloads access only
a subset of the columns. For example, if you're accessing only 20 columns of a 50-column table in
a traditional row store, you end up having to do I/O and consume server memory even for the data
in the 30 columns that are of no interest to the task of satisfying the query.
We carried out similar testing on smaller tables. For smaller tables, however, we saw that the
row-organized version performed better than its column-organized counterpart. This is because,
for smaller tables, more extents are allocated for column-organized tables than for row-organized
tables. For example, a table might have 30 rows with 50 columns in a table space with an extent
size of 4 pages. Each column would be allocated at least one extent, so the table would require at
least 200 pages for 30 data rows.
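The arithmetic behind this example can be written out as a short Python sketch. The function name is hypothetical, and the only rule modeled is the one stated above, that each column is allocated at least one extent.

```python
def min_pages_column_organized(num_columns, extent_size_pages):
    """Lower bound on pages when every column gets at least one extent."""
    return num_columns * extent_size_pages


# 50 columns, extent size of 4 pages, independent of the 30 rows stored.
print(min_pages_column_organized(num_columns=50, extent_size_pages=4))  # 200
```

The lower bound depends only on the column count and the extent size, not on the number of rows, which is why very small tables pay a proportionally large space and I/O cost in column-organized format.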
So, before implementing column-organized tables in your environment, test them
thoroughly. You should implement column-organized tables where you
see the workload to be more analytical. These workloads are characterized by non-selective data
access (that is, queries access more than approximately 5% of the data), and extensive scanning,
grouping, and aggregation. Workloads that are transactional in nature should not use
column-organized tables. Traditional row-organized tables with index access are generally better
suited for these environments. In the case of mixed workloads, which include a combination of
analytic query processing and very selective access (involving less than 2% of the data), a mix of
row-organized and column-organized tables might be suitable.
• INSERT: Whenever DB2 inserts a row into a column-organized table, it has to write data to
as many pages as the table has columns. For example, if the
table has 30 columns, a single insert will affect 30 pages. Newly inserted rows are always
stored in the last partially filled pages assigned to each column. DB2 does some special
internal processing for inserts into column-organized tables that buffers the new data,
to reduce the processing overhead when applications are inserting many new rows.
• DELETE: A delete operation on a column-organized table does not actually
release space, so the deleted space does not become available for new inserts. The data for
each column is flagged as deleted in the page for each column. Extents that contain pages
where all the column data has been flagged as deleted can be released using a REORG
with RECLAIM EXTENTS.
• UPDATE: Updates are processed using a DELETE of the old data row and an INSERT of
the changed row. This impacts every page containing columns for the data row. This means
that an updated row consumes space in proportion to the number of times the row has been
updated until space reclamation occurs.
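The behavior described in the three bullets above can be modeled with a small Python sketch. It is a toy in-memory model, not DB2's implementation: pages, extents, insert buffering and space reclamation are left out, and the class `ColumnTable` and its methods are hypothetical names.

```python
class ColumnTable:
    """Toy column store: one list per column, rows addressed by TSN."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.deleted = []  # deleted[tsn] is True once the row is flagged

    def insert(self, row):
        # One append per column: every column's storage is touched.
        for name in self.columns:
            self.columns[name].append(row[name])
        self.deleted.append(False)
        return len(self.deleted) - 1  # TSN of the new row

    def delete(self, tsn):
        # Flag only; the stored values keep occupying space.
        self.deleted[tsn] = True

    def update(self, tsn, changes):
        # An update is a delete of the old row plus an insert of the new one.
        old = {name: values[tsn] for name, values in self.columns.items()}
        self.delete(tsn)
        return self.insert({**old, **changes})

    def live_rows(self):
        return sum(1 for flag in self.deleted if not flag)

    def stored_rows(self):
        return len(self.deleted)


table = ColumnTable(["ID", "NAME", "STREAM"])
tsn = table.insert({"ID": 1, "NAME": "ASHA", "STREAM": "SCIENCE"})
tsn = table.update(tsn, {"STREAM": "ARTS"})
tsn = table.update(tsn, {"STREAM": "COMMERCE"})
print(table.live_rows(), table.stored_rows())  # 1 3
```

After two updates the model holds one live row but three stored row versions, which mirrors how an updated row keeps consuming space until reclamation occurs.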
Easy to Maintain
Column-organized tables remove many of the maintenance headaches that DBAs generally face with
normal tables. Little or no maintenance activity is needed for this kind of
table.
• REORG: There is no need for the classic offline/online reorg for this kind of table.
REORG is only needed to reclaim extents using the RECLAIM EXTENTS clause, and that is
also automated if we set DB2_WORKLOAD to ANALYTICS.
• INDEXES: These tables do not need any index to be created by the user. The system
automatically creates an index called the page map index to locate the data. Even if you run
db2advis for a query that is based on column-organized tables, it will not suggest any
indexes to you.
• MDCs or MQTs: There is no requirement for MDCs and MQTs to improve the performance of
the query.
Basically, a column-organized table is LOAD and GO: you simply create it and load it.
Little or no maintenance activity is needed to improve performance. These
tables are already optimized to give the best performance (for certain workloads).
Points to remember
1. Column-organized tables change the traditional way of storing data on pages to improve
performance for analytical queries relative to row-organized tables.
2. They are easy to implement. Just CREATE, LOAD and GO.
3. They require very little maintenance by DBAs, saving a great amount of man-hours
related to table maintenance.
4. DML operations require some extra work compared to traditional row-organized
tables.
5. They perform far better than normal row-organized tables for analytical workloads, where
queries access more than approximately 5% of the data.
6. Very large, high-cardinality tables usually perform better in column-organized
format, whereas small, low-cardinality tables respond better in row-organized format, even for
analytical workloads.
Acknowledgments
Special thanks to Manish Makwana for review and advice towards writing this article.
Resources
• Learn more from Database https://www.ibm.com/developerworks/library/dm-1406convert-table-db2105/
• Infocenter link https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0060592.html
Suvradeep Sensarma
Suvradeep Sensarma is a DB2 LUW DBA senior consultant with CapGemini India.
He has extensive experience working with customers in performance tuning and
support tips on DB2 on LUW. He is also certified in DB2 10.1 DBA for Linux, UNIX,
and Windows (Exam 611).
Abhinava Mukherjee
Abhinava Mukherjee is a DB2 LUW DBA with CapGemini India supporting multiple
projects in various domains. He has a Master's degree in Computer Application along
with hands-on IT experience in datacenter operation and system administration,
working in AIX, UNIX, AS-400 and Windows environments. He is also certified in DB2
10.1 Fundamentals (Exam 610).
© Copyright IBM Corporation 2016
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)