Index Aware

You might also like

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 73

Raising Index Awareness

Date Version Notes


20th Jan 2005 Listed full details all ERs and bugs that have been raised as a result of this
document’s creation
Removed some of the TODO (outstanding items) but left some there for later
checking and to make sure readers knows there is some ambiguity
19th Jan 2005 Corrected note on bottom of page 29 to clarify when GROUP BY clause is
added
25th Oct 2004 First version released
Permission to use this document is authorized, provided that A) the use of the document is for informational and non-
commercial purposes, and B) the document retain the copyright notice contained in this disclaimer. Business Objects
may modify this document at any time and without notice. THIS DOCUMENT IS DELIVERED “AS IS” AND WITHOUT
WARRANTY OF ANY KIND INCLUDING, BUT NOT LIMITED TO, ALL IMPLIED WARRANTIES AND CONDITIONS OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL BUSINESS OBJECTS, ITS SUPPLIERS OR OTHER THIRD PARTIES MENTIONED HEREIN BE LIABLE FOR
ANY DAMAGES ARISING FROM THE USE OF THIS DOCUMENT INCLUDING, BUT NOT LIMITED TO, SPECIAL,
INDIRECT, PUNATIVE OR CONSEQUENTIAL DAMAGES.

The information in this document is subject to change without notice. If you find any problems with this article, please
report them using the feedback link on this site or at techweb@businessobjects.com. Business Objects does not warrant
that this document is error free.

Copyright © Business Objects 2000. All rights reserved.


Portions © Copyright 1996, Microsoft Corporation. All rights reserved.

Trademarks:
The Business Objects logo, BusinessMiner, BusinessQuery, and WebIntelligence are registered trademarks of Business
Objects S.A.
The Business Objects tagline, Broadcast Agent, BusinessObjects, Personal Trainer, Rapid Deployment Templates, and
Set Analyzer are trademarks of Business Objects S.A.
Microsoft, Windows, Windows NT, Access, Microsoft VBA, the Visual Basic Logo and other names of Microsoft products
referenced herein are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or
other countries.
Oracle is a registered trademark of Oracle Corporation. All other names of Oracle products referenced herein are
trademarks or registered trademarks of Oracle Corporation.
All other product and company names mentioned herein are the trademarks of their respective owners.

This software and documentation is commercial computer software under Federal Acquisition regulations, and is
provided only under the Restricted Rights of the Federal Acquisition Regulations applicable to commercial computer
software provided at private expense. The use, duplication, or disclosure by the U.S. Government is subject to
restrictions set forth in subdivision (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at 252.227-
7013.

U.S. Patent No. 5,555,403

Steven White Raising Index Awareness Page 2 of 73


Table of Contents
Introduction...................................................... ............5

Database performance concepts....................................6

Examples in this document............................................8

Example 1 – Use of a table’s primary key........................9

Example 2 – Avoiding joins or tables.............................15

Example 3 – Multiple Foreign Key entries......................20

Example 4 – Unique and non-unique values..................22

Example 5 – Using a Primary Key object in RESULTS pane31

Example 6 – Using an Index Awareness WHERE clause. . .35

Example 7 – Multi-column primary keys ........................38

Example 8 –Affect of Row Based Restrictions.................40

KEYS dialog explained.................................................42

SQL editor dialog.................................................. .......48

Limitations and rules ........................ ..........................49

Appendix A – Bugs and Enhancement Requests.............51

Appendix B – Accessing Index Aware information..........52

............................................... ..................................53

Appendix C – Outstanding Items...................................54

The Secrets of Oracle Bitmap Indexes ..........................54

Overview ...................................................................55

Steven White Raising Index Awareness Page 3 of 73


Bitmap Indexes and Deadlocks ....................................55

Steven White Raising Index Awareness Page 4 of 73


Introduction
This document explores the new feature available in Designer 6.5.1 called Index Awareness. This feature can
help make SQL queries faster by redirecting WHERE clause restrictions to columns that are indexed and/or to
generate a query that references less tables.

Who should read this?


Business Objects universe and report designers who wish to improve the performance on their reports.
Database schema designers and DBAs who support Business Objects developers would also benefit from
understanding the concepts in this document.

Environments used for this article


• Windows 2000 SP3
• Business Objects 6.5.1 GA (no Hotfixes or CSPs were installed)
• Microsoft Access 2002 SP2 (for most db interactions)
• SQL*Server 2000 (for some database interactions)

Pre-requisites
The reader should be very familiar with Business Objects universe and report building concepts. A
familiarization with RDBMS and database performance tuning would also be advantageous.

Disclaimer
The functionality and features detailed herein are based on the author’s understanding of the software version
stated earlier. Every attempt has been made to create an accurate document by testing workflows and
confirming understanding with corporate development, but the reader assumes all risk in the using the
information contained herein.

Resources needed
The following documents or files will be needed to perform the task:

Item description Filename


Universe for reporting on security and universe domains of the repository Combined.unv
Demo Islands Resort Marketing MS Access database that is delivered with Business Objects 5.x and 6.x Club.mdb

Other universe performance techniques


Index Awareness is only one of the many techniques available in universe design to improve the performance
of queries. Please check out the following in the relevant documentation and Knowledge Base articles:
• Aggregate Awarenes
• Short cut joins

Steven White Raising Index Awareness Page 5 of 73


Database performance concepts
This document does not intend to explain database performance tuning concepts in any depth but terminology
used in later sections of this documents should be explained to avoid any confusion.
• Database index
• Primary Key
• Foreign Key
• Surrogate keys
• Partitions
• Referential Integrity
• Influencing the database optimizer
• Tuning philosophy

Database Index
A database construct which allows faster access to rows in a table. Indexes are defined by the database
designer and maintained automatically by the RDBMS. Since indexes are maintained in a sorted sequence, the
database can quickly locate a row or rows in the table.

Primary Key
A database designer defined column or columns which uniquely identify a row in a table. If the RDBMS is
told about the Primary Key during database design, it will automatically maintain an index to help it guarantee
uniqueness and provide referential integrity.

Foreign Key
A database designer defined column or columns which match the Primary Key of another table. This facilitates
referential integrity and provides likely columns for joins to a table that contains the Primary Key or other
tables with Foreign Key instances.

Surrogate keys
A common Data Warehouse type of column where the actual Primary Key and Foreign Key values are
referenced via a smaller, faster column type i.e. a number instead of a longer character type

This allows joins and indexes to be faster because indexes are smaller in size (same rows, less width) and
therefore less memory and/or disk space is consumed during their navigation.

Partitions
A way of dividing the physical layout of a table across separate disk locations to avoid contention when
accessing the table via the disk system. Usually the partitioning is based on a commonly accessed dimension
such as time e.g. break a fact table into separate partitions based on the year\month of the transactions.

Referential Integrity
This is not really a performance tuning term but it is related because this RDBMS feature requires Foreign Key
and Primary Keys to exist.

Influencing the database optimizer


The optimizer, if it exists at all in your database, follows a set of internal rules which help it govern which path
would result in the fastest query execution. It uses the database constructs detailed earlier and knowledge of
the tables and query to makes its decision.

There is no guarantee that the database optimizer will use any specific database performance improving
construct just because the user has provided the opportunity for it to do so e.g. just because a index could be
used does not mean it will be used. The database optimizer may decide that other rules, limits and conditions
apply and it will choose a different route to the data.

Although Index Awareness will help the optimizer make a better decision, it is no guarantee that a Primary Key
or Foreign Key entry (see later in document) will cause the index to be used.

Steven White Raising Index Awareness Page 6 of 73


As with any performance tuning exercise, monitoring of the RDBMS path to the data (sometimes called the
‘explain plan’) is vital. The DBA must also keep RDBMS statistics up to date e.g. how large are the tables?
What is the cardinality of the values?

Tuning philosophy
Although no one will argue that using the Primary Key of a table is a good thing there seems to be 2 opinions
on whether tuning should avoid dimension tables or use them explicitly

Avoid dimension tables


This method suggests that all conditions should be directed to the ‘weightiest table’ i.e. largest which is usually
the fact table in database. This helps reduce the number of tables and therefore joins in a query because joins
are considered very database resource intensive.

Once the restrictions have been processed by the fact table, the database will join out to the smaller dimension
tables.

If this is your intention that you should setup your Index Awareness with Foreign Key entries that will reduce
the number of tables when the weightiest tables are involved.

Pro : less joins and tables


Con : possibly more rows in the join between fact and dimension tables

Use dimension tables


Alternatively this methods suggests that aim of the optimizer is to reduce the number of rows that join into the
fact table as a priority over the number of tables and joins.

Leaving conditions on the dimension tables causes the database to restrict the smaller dimension tables first
and then join into the fact table with less rows.

Pro : less rows in the join to the larger fact table


Con : more tables and joins in the query

If this is your intention that you should setup your Index Awareness with less Foreign Key entries that directly
point to the larger tables in the schema

Getting more information on tuning


Ask your DBA (database administrator) for more information on these concepts.

Steven White Raising Index Awareness Page 7 of 73


Examples in this document
Most of the examples in this document use a variation of the ‘club.mdb’ (aka Islands Resort Marketing) and
‘beach.unv’ the latter has been which has been renamed ‘IndxAwre.unv’

The performance improvements discussed in this document are theoretical. Because the ‘club.mdb’ is so small
and Microsoft Access is not considered a production capable database, it is unlikely you will notice any
performance improvement in running the small queries that ‘club.mdb’ is capable of.

Any LOV that are shown in this document that involve editing the LOV (DESIGNER\OBJECT
PROPERTIES\PROPERTIES\EDIT) have sorts applied to the objects to allow the hierarchical view of the
LOV dialog to display values without duplicates. This step is discussed in another document by the same
author.

Note : MS Access may have a ‘explain plan’ feature (mentioned earlier) but this was not investigated by the
author

Schema and its indexes


The following schema (club.mdb) will be used for the examples:
club.mdb

Steven White Raising Index Awareness Page 8 of 73


Example 1 – Use of a table’s primary key

Create a query that returns Customer and Revenue

The SQL generated will be:


SELECT
Customer.last_name,
sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM
Customer,
Invoice_Line,
Service,
Sales
WHERE
( Customer.cust_id=Sales.cust_id )
AND ( Sales.inv_id=Invoice_Line.inv_id )
AND ( Invoice_Line.service_id=Service.service_id )
GROUP BY
Customer.last_name

Now add a condition that restricts it to return only customers from Dallas, Houston, Los Angeles, San Diego
and San Francisco.

Steven White Raising Index Awareness Page 9 of 73


Steven White Raising Index Awareness Page 10 of 73
The SQL will be:
SELECT
Customer.last_name,
sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM
Customer,
Invoice_Line,
Service,
City,
Sales
WHERE
( City.city_id=Customer.city_id )
AND ( Customer.cust_id=Sales.cust_id )
AND ( Sales.inv_id=Invoice_Line.inv_id )
AND ( Invoice_Line.service_id=Service.service_id )
AND (
City.city IN ('Dallas', 'Houston', 'Los Angeles', 'San
Diego', 'San Francisco')
)
GROUP BY
Customer.last_name

Everything is as it should be.

But what if we knew that the ‘city’ column is not indexed and therefore is not a good candidate for WHERE
clause restrictions? (see schema description earlier)

We also know that column ‘city_id’ is the primary key of the city table but more so, it is indexed. From our
discussions on good performance earlier we determined that using the primary key of a table is one of the
fastest ways of retrieving the rows of that table.

So if we could get the SQL to automatically use the Primary Key of the city table, we could achieve a faster
retrieval of the records.

Note : Of course we could change the object’s SELECT definition to point to that Primary Key column
(city_id) but that means we expect users to know what those Primary Key values mean. Since Primary Key and
Foreign Key values are typically based on surrogate keys, and surrogate keys are automatically allocated by
the ETL or RDBMS itself, an end user will not normally know what Primary Key or Foreign Key value relates
to a real live value that they understand

Steven White Raising Index Awareness Page 11 of 73


Setting up the Index Awareness
1. Open Designer
2. Open the relevant universe e.g. IndxAwre.UNV
3. Locate the City object (in the Customer class)
4. Right click on the City object and choose ‘Object Properties’
5. Select KEYS tab
6. Press INSERT
7. In SELECT column, select the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog
8. Select or type the SQL ‘City.city_id’ note : more on this dialog later in the document
9. OK

Steven White Raising Index Awareness Page 12 of 73


Now let’s return to the query we had before, edit data provider and then view SQL and we discover that the
SQL has changed:

SELECT
Customer.last_name,
sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM
Customer,
Invoice_Line,
Service,
City,
Sales
WHERE
( City.city_id=Customer.city_id )
AND ( Customer.cust_id=Sales.cust_id )
AND ( Sales.inv_id=Invoice_Line.inv_id )
AND ( Invoice_Line.service_id=Service.service_id )
AND (
City.city_id IN (11, 10, 13, 14, 12)
)
GROUP BY
Customer.last_name

Notice that the ‘city.city’ column reference has been replaced by the ‘city.city_id’ column and that the city
names have been replaced with the primary key values that represent their name.

This query will likely run much faster because the main condition is now on an indexed column.

Note : if the SQL does not show any change this could be a sporadic bug noticed by the author where the SQL
would only change when the LOV dialog was visited and OK’d whether any changes were made to the LOV or
not.

What happened behind the scenes?


When we told the City object about its Primary Key, we offered Business Objects an alternative piece of SQL
to use in any WHERE clause should that object be involved in a condition.

It would have used ‘City.city’ but with Index Awareness it used ‘City.city_id’ instead.

It determined this from the KEYS entries you made.

But how did Business Objects convert the City names you selected in the LOV into the primary key values
necessary for the SQL? Remember that the City table has the following values (that are relevant to our
example) :
City table
City_id (Primary Key) City Region
11 Dallas 20
10 Houston 20
13 Los Angeles 21
14 San Diego 21
12 San Francisco 21

It did this by adding the Primary Key SELECT to the query it normally generates for the LOV and when you
selected the City names, Business Objects matched the City names to the Primary Key values then.

Steven White Raising Index Awareness Page 13 of 73


The List of Values SQL

The SQL generated for a non-Index Aware LOV involving City would be:
SELECT DISTINCT City.city FROM City

But the SQL generated for the LOV on the Index Aware City object was:
SELECT DISTINCT City.city, City.city_id FROM City

Notice that the ‘City.city_id’ column has been added.

Steven White Raising Index Awareness Page 14 of 73


This SQL is visible by either editing the LOV for the City object and/or tracing SQL that passes thru the
middleware to the database.

So when you selected the City names from the LOV dialog, you were indirectly selecting the Primary Key
values as well. Think of the Primary Key values as a hidden column in the LOV dialog.

Summary
Index Awareness allowed us to automatically redirect a WHERE clause condition to another column (on the
same table for this example) that we know would provide better performance at query time.

We determined which column to choose as an alternative based on our knowledge of the database schema and
the RDBMS optimizer.

The LOV values we select actually tell Business Objects what Primary Key values to substitute in final query
SQL. The KEYS tab tells Business Objects which SQL syntax to substitute in the final query SQL.

Rule – when an Index Aware object is used in the CONDITIONS pane, the
Primary Key entry will replace the object’s SELECT

Rule – the operand ‘Show list of values’ dialog returns the Primary Key values
that match the visible values in the LOV dialog

Example 2 – Avoiding joins or tables

Steven White Raising Index Awareness Page 15 of 73


Another use of Index Awareness is to reduce the number of tables or joins involved in a query. This reduces the
number of joins in a query which can improve query performance. From previous sections we discussed that
joins are very expensive in performance terms for the RDBMS, so any possibility at avoiding them should be
sought.

The SQL from the last example improved performance by restricting on the Primary Key (indexed) of the city
table.
SELECT
Customer.last_name,
sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM
Invoice_Line,
Sales,
City,
Customer,
Service
WHERE
( City.city_id=Customer.city_id )
AND ( Customer.cust_id=Sales.cust_id )
AND ( Sales.inv_id=Invoice_Line.inv_id )
AND ( Invoice_Line.service_id=Service.service_id )
AND (
City.city_id IN (11, 10, 13, 14, 12)
)
GROUP BY
Customer.last_name

This query has the Customer and City tables in it which both can provide the City_id column.

The City table is only needed to satisfy the WHERE clause and is not needed in the SELECT or GROUP BY
clauses.

Is it possible to remove the City table from the query completely? i.e. tell Business Objects to use the
Customer table to get City_id from if it can
City_id can come from more than one table

1. Locate the City object (in the Customer class)


2. Right click on the City object and choose ‘Object Properties’

Steven White Raising Index Awareness Page 16 of 73


3. Select KEYS tab
4. Skip to step 9 if the Primary Key from the previous example is still in place
5. Press INSERT
6. In the SELECT column, select the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog
7. Select or type the SQL ‘City.city_id’
8. OK
9. Start here if the Primary Key of ‘City.City_id’ is still in place from the previous example
10. Press INSERT
11. In the Key Type column, choose ‘Foreign Key’ if not already chosen
12. In the SELECT column, click on the ‘…’ (3 dots) to open the ‘Edit SELECT statement’ dialog
13. Select or type the SQL ‘Customer.city_id’
14. OK the dialog

Steven White Raising Index Awareness Page 17 of 73


Let’s return to the query panel and see what changes have taken place in the SQL:

SELECT
Customer.last_name,
sum(Invoice_Line.days * Invoice_Line.nb_guests * Service.price)
FROM
Invoice_Line,
Sales,
Customer,
Service
WHERE
( Customer.cust_id=Sales.cust_id )
AND ( Sales.inv_id=Invoice_Line.inv_id )
AND ( Invoice_Line.service_id=Service.service_id )
AND (
Customer.city_id IN (11, 10, 13, 14, 12)
)
GROUP BY
Customer.last_name

Note that the City table is no longer referenced in the query and that the City_id is being restricted using the
Customer table instead.

What affect did this have on our LOV?


None

When we check the SQL of the LOV for the City object we see it remains as it was in example 1 i.e. it has the
Primary Key SELECT added to it.

So we can conclude that the LOV SQL is only affected by the Primary Key entry and that the LOV selection
will always return the Primary Key entries ‘behind the scenes’. Therefore all columns you provide in Foreign
Key entries must have the same type of values and column type as the column referenced by the Primary Key
entry.

Steven White Raising Index Awareness Page 18 of 73


Will the Foreign Key entry always be applied?
No

If the Primary Key table is still needed in the query’s RESULTS, SORTS or CONDITIONS which don’t
involve Index Aware on the same table, then the table will remain in the query and the Foreign Key entry will
not be applied

City object used in the RESULTS causing the Foreign Key entry not to be used i.e. City table remains in the FROM
clause and is used in the WHERE clause (restricting on its Primary Key)

But if the Primary Key table (the original object’s table) is not needed anywhere but the WHERE clause, then
the Foreign Key entry will be applied:

Example of Foreign Key being applied because Primary Key table not being needed anywhere else in the query

Rule – a Foreign Key entry will be ignored if it does not result in less tables
being used. Business Objects can only use less tables if the Primary Key table
is referenced only inside the WHERE clause.

Steven White Raising Index Awareness Page 19 of 73


Example 3 – Multiple Foreign Key entries
The KEYS dialog allows more than one Foreign Key entry which allows us to deal with values which are
repeated (denormalized in DBA speak) in multiple tables.

Database designers may denormalize Primary Key values beyond whats necessary to satisfy constraints to
assist in performance and simplification of SQL generation.

It is expected to see an entity repeated twice. Initially in the entity own table as its Primary Key and then in a
table that refers to this Primary Key thru its Foreign Key.

Denormalization implies that the database designer has gone beyond this duplication and has more than 2
instances of the value in a schema.

If we search the ‘club.mdb’ (Islands Resorts Marketing) database to find such entities, we will find only one
i.e. sales person

Tables list
Sponsor.sales_id
Sales_Person.sales_id
Customer.sales_id

That gives us three places that we can get sales_id from.

So if we setup the KEYS for the ‘Sales Person’ object (in


Sales class) properly we can give Business Objects 3 choices
from which it can retrieve these values for the purposes of
restricting data in the WHERE clause (and hopefully speeding
up the query).

KEYS entries for Sales Person object

Steven White Raising Index Awareness Page 20 of 73


Customer, Sponsor and Revenue object with Sales Person condition

The Sponsor table is used to restrict on Sales Person (sales_id) because it is one of the Foreign Key entries and
the Sales_Person table is not used anywhere else in the query so can be dropped by Business Objects.

But why does Business Objects use the Sponsor table and not the Customer table? Both tables exist in the
FROM clause? Perhaps it’s the sequence of Foreign Key entries in the KEYS dialog? If we rearrange them so
that they are listed as follows:

And check the query SQL we see:

The SQL has changed again, this time using the Customer table to restrict Sales Person via sales_id.

Rule – the sequence of Foreign Key entries in the KEYS dialog determines
which entry is given preference i.e. the last enabled entry in the list that
results in the least number of tables.
Rule - If the ‘best’ Foreign Key entry does not result in a table count reduction
compared to the original query, the Primary Key entry will be the only one
that applies

Steven White Raising Index Awareness Page 21 of 73


Example 4 – Unique and non-unique values
In the previous examples all the values that were restricted were unique e.g. Sales Person and City. But what
happens when we use a value that is not unique e.g. Service

First setup the Primary Key entry:

And the build an example query (using Customer and Revenue as RESULTS and Service IN LIST <list of
values):

Why does the LOV has duplicates? Normally LOV have a DISTINCT on them and
no duplicates are allowed as shown in the screenshot on the right:

Steven White Raising Index Awareness Page 22 of 73


Let’s look at the LOV SQL and note that as expected (as we discovered in example 1) the Primary Key entry
has been added to the SELECT of the LOV SQL.

When we look at the service table, we see the duplicates there as well e.g. Hotel Room is repeated for SL_ID =
21, 31 and 41. SL_ID refers to Service Line.

Steven White Raising Index Awareness Page 23 of 73


This tell us that Service is not completely unique in the Service table. In fact it is only unique within a Service
Line (SL_ID).

So how can we help a user navigate to the correct value in a LOV when there are duplicates? Easy, we use
normal customized LOV in Designer.

EDIT the LOV, add the Service Line object, apply a primary ascending sort to the Service Line and a secondary ascending sort
to the Service object note : the sorts are only necessary if the LOV is to be viewed in Hierarchical View

This results in a LOV that looks like this:

Steven White Raising Index Awareness Page 24 of 73


As expected the Primary Key value does not show in the LOV dialog but note that the Hierarchical View and
the Tabular View show a different number of values. The Tabular View lists the values that are returned from
the LOV query, excluding the Primary Key entry ofcourse. The Hierarchical View applies a further ‘distinct’
on the display values before it allows the user to navigate the LOV hierarchy. This means that hierarchy view
ignores the impact of the hidden Primary Key value in the LOV.

If we use this LOV as it is with the Primary Key entry, we’ll see the following SQL generated (based on
original query RESULTS=Customer, Revenue CONDITIONS=Service IN LIST <values>):

The Primary Key values from the Service table has been selected (behind the scenes) and put into the SQL. These values
represent all the ‘Accommodation’ Service Line values. This is as expected.

Although the duplication is annoying, selecting all the Accomodation values while in Tabular View (in this
example) does result in the correct SQL and therefor results.

Steven White Raising Index Awareness Page 25 of 73


Let’s see what happens when we use the Hierarchical View of the LOV dialog.

This time only 3 Primary Key values have been selected e.g. 212, 211 and 213 which represent Bungalow, Hotel Room
and Hotel Suite where SL_ID=21!! This means using the Hierarchical View compared to the Tabular View would give
different results.

Clearly it would be unacceptable to users that depending on whether they used Tabular View or Hierarchical
View they got the correct results or not.

Note : Even before Index Aware, the Hierarchical View would always apply a further DISTINCT on the display
values. The problem only arises now because using Index Aware we are not actually limting the query by the
values we select in the dialog but by the Primary Key values indirectly being selected. See a separate article
by the author on the LOV dialog

So how do we make this LOV work correctly every time?

We look at the data and see that the Service name is not really unique in its own Service Line table
table. That table is actually a Service within Service Line table. So we look at
Service Line table and discover the Service Line name is not unique without being
qualified by which Resort the Service Line is available at.

We could look beyond Resort because it is shown with its country but the Resort
name is acutally unique across all countries. Although we determined this by
looking at the data, we really should confirm it by talking to the schema designer
as well. Since this may be test data which not reflect all the true relationships in
the lifetime of the database

So we need to determine what values should be displayed in the LOV to help


guarantee that the unique Primary Key values the user indirectly selects from the
hidden column represent the values that the user can see on screen.

Steven White Raising Index Awareness Page 26 of 73


Steven White Raising Index Awareness Page 27 of 73
We do this by further customizing the LOV for the Service object to the point that the Service name is fully
qualified by all the values we think are necessary i.e. Service Line and Resort

Tabular View – selecting the Bungalow, Hotel Room and Hote Suite values for Bahamas Beach and French Riveria
generates the correct SQL much like before.
1

Steven White Raising Index Awareness Page 28 of 73


But note what happens you when you return the LOV to review your selections i.e. only the Bahamas Beach
values are highlighted and if you OK the dialog only the Primary Key values representing those values are put
in the SQL.

Rule - The LOV dialog matches the values on the first match it can find and
does not take into consideration that previous Primary Key values were
remembered.

Steven White Raising Index Awareness Page 29 of 73


With Hierarchical View things are no different. Let’s see the affect of Hierarchical View on the LOV and SQL
now that we’ve customized the LOV to allow the user to navigate correctly to the values.

Hierarchical View gives the correct SQL

But when we return to the LOV dialog:

The same thing happens.

Clearly the problem is that the LOV dialog matches on the first occurrence of the values selected and does not
take the Primary Key value into account.

This limits our use of the Primary Key feature because


a. depending on Tabular View or Hierarchical View we could get different results
b. we can navigate to the correct values select them and then use them but the next time we return to the
LOV dialog, it will remove duplicates and only select the first instance of our value selected

Rule – use of Primary Key entries in LOV can only be used with accuracy when
the value is unique e.g. customer social security number, product code.

Steven White Raising Index Awareness Page 30 of 73


Example 5 – Using a Primary Key object in
RESULTS pane
So far in our examples we’ve placed Index Aware objects in the CONDITIONS pane of the qeury. This is the
main use of Index Awareness but what happens when those same objects are used in the RESULTS pane?

Let’s build a query with Customer and Revenue in the RESULTS pane.

First of all we’ll see what the SQL will be without Index Aware being active:

SQL without Index Aware being active

Now, let’s set the


Customer object (in the
Customer class) to have a
Primary Key entry of
‘Customer.cust_id’ i.e.

And the SQL is now:

Notice that the SELECT of the query has changed, a MAX function has been added to the Customer.last_name
(the Customer object did not have a MAX function in its SELECT definition).
Also the GROUP BY clause has been changed i.e. Customer.last_name replaced with Customer.cust_id

Steven White Raising Index Awareness Page 31 of 73


Note : If a query does not have a GROUP BY clause (no measures etc) then the MAX function is not applied
i.e. the Index Awareness itself does NOT cause a GROUP BY to be generated, that is done by normal Business
Objects SQL generation

Steven White Raising Index Awareness Page 32 of 73


What affect does this have on our results? None for this query. With or without Index Aware this query will
return the same results. Why? Because Customer.last_name is unique across the whole data set i.e. cust_id has
a 1-to-1 relationship to last_name.

Now in a real database (I apologize to the designer of the ‘club.mdb’ demo database!) would have last names
that are not unique.

The row counts for the 2 refreshes are indentical. The first refresh had Index Aware disabled and the second
has it enabled.

What happens when values are not unique?


Let’s make the last name of existing record in the Customer table equal to another last name:

Customer with last name ‘Oneda’ was changed to ‘Okumura’

Then run the same query with and without


Index Awareness and see what the row
counts are:

20 rows are returned when Index Awareness is not applied but 21 are returned when it is applied. The
difference is the result set is that Okumura last name has been aggregated at the database level when Index
Aware is not enabled but when Index Awareness was enabled, both Okumura records were returned from the
database and optionally aggregated locally within the report.

Steven White Raising Index Awareness Page 33 of 73


The reporter feature ‘Avoid duplicate row aggregation’ is used in the screenshot below. The reporter tables below with the header
‘Dupicates Aggregated’ have this feature unchecked i.e. OFF

So this means that Index Awareness can affect result sets and can cause less aggregation to be performed on
the database, which is exactly where we’d want aggregation to be peformed to reduce the load on Reporter
(BUSOBJ.EXE).

Rule – when Index Aware objects are used in the RESULTS pane their Primary
Key entries can cause additional rows to be returned from the database.
Report results should remain the same if no local reporter calculations
depend on the row count and level of uniqueness of the result set

Steven White Raising Index Awareness Page 34 of 73


Example 6 – Using an Index Awareness WHERE
clause

So far all our examples have ignored the WHERE clause part of the Primary Key and Foreign Key entries
within the KEYS dialog.

First, let’s see what happens when you add a WHERE clause to the Primary Key entry.

The WHERE clause SQL gets added to the LOV query for that object:

Steven White Raising Index Awareness Page 35 of 73


And it gets added to the query that the object is used in, whether the Index Aware object is used in the
RESULTS pane or the CONDITIONS pane:

So far I cannot think of a use for adding WHERE clauses to the Primary Key entries. I would assume if the
original object has a WHERE clause you would want the Primary Key WHERE clause to replace that much as
the Primary Key SELECT clause replaces the original SELECT clause but this means the universe developer
would need to look up the Primary Key values in advance

Note : TODO CHECK THIS STATEMENT If these Primary Key values are based on surrogate keys there is no
guarantee that a database reload will maintain the same surrogate key values.

Steven White Raising Index Awareness Page 36 of 73


What happens to the Foreign Key WHERE clause?
Assuming the Foreign Key entry is applied (depending on rules discussed elsewhere) its WHERE clause is
simply ANDed to the query once the Foreign Key replacement has taken place.

Based on:

We get:

Again, right now I cannot think of a use for the addition of the WHERE clause entry.

Rules:
• WHERE clause entries for Primary Key and Foreign Key entries are
ANDed to the query once the SELECT entry has been implemented.
• WHERE clauses must reference the same tables as the WHERE clause
for Primary Key entries
• WHERE clauses should try to reference the same tables as the WHERE
clause for Foreign Key entries to avoid introducing more complexity
into the query

Steven White Raising Index Awareness Page 37 of 73


Example 7 – Multi-column primary keys

Where a table does not have one column which uniquely identifies the rows but instead relies on more than
one column, this is a ‘multi-column primary key’

Note : this is not be to confused with ‘concatenated index’ or ‘composite index’ which is a database index that
is made up of more than one column in a table.

There are no examples of this in the ‘club.mdb’ so we have to create our own schema or make believe using
‘club.mdb’. To save time let’s make believe…

Let’s pretend that the Resort table has a


multi-column primary key based on
Country_id and Resort_id.

Since the KEYS dialog only allows one Primary Key entry we have to make that one entry relate to our multi-
column primary key. So we define a Primary Key entry as “Country_id & Resort.Resort_id” (the & is MS

The value of 61 comes from the concantenation of ‘6’ and ‘1’. ‘6’ represents the country_id for Australia and
‘1’ represents the resort_id for ‘Australian Reef’

Access syntax for concatenation).

Steven White Raising Index Awareness Page 38 of 73


Since this concatenation will be stated in the final query, it will likely negate any indexes that exist (typically
any manipulation of values at SQL query run time negate indexes that could have been used on the columns
affected by the manipulation) unless that index has been setup as a ‘function based index’ (see Database
Performance section earlier for an explanation)

So although the Index Awareness feature allows SELECT clauses that span multiple columns (or even tables) it
is unlikely that the database indexes will do the same. You should avoid multi-column primary key entries
and/or discuss indexes with your DBA. Surrogate keys are a good way around multi-column primary keys in
database design and your database designer should consider them

Rule – Index Awareness is best used with single column Primary Keys. Multi-
column primary keys should be avoided and/or your DBA should consider
‘function based indexes’ or surrogate keys

Steven White Raising Index Awareness Page 39 of 73


Example 8 –Affect of Row Based Restrictions

Row Based Restrictions (RBR) are settings created in Supervisor that allow additional WHERE clauses to be
ANDed to the final query SQL should that table be involved in the query. See Designer documentation for
more information.

Let’s see what happens:

City object with Primary Key and Foreign Key entries and no RBR applied to the City table. As expected the
restriction of ‘City EQUAL TO Albertville’ has been converted to a Primary Key value (25) pointing to the
Customer.City_id Foreign Key

Steven White Raising Index Awareness Page 40 of 73


Now with a RBR active on the City table:

The Customer.City_id is not used, instead the original Primary Key entry is used and the RBR is applied to the
City table (as expected)

Rule - If an Index Aware object tables list references a table that has RBR, we
cannot use any Foreign Key entries. Why? because the RBR in Supervisor will
be invalid for the Foreign Key table.

Steven White Raising Index Awareness Page 41 of 73


KEYS dialog explained

The KEYS tab of the Object Properties dialog contains the following controls:
• ‘type of values in PK and FK’
• DETECT
• PARSE
• Key Type
• Select
• Where
• Enable
• OK
• Cancel
• Apply
• Help
• Insert
• Delete

Note : Buttons INSERT, DELETE, Cancel and Help are not detailed below because they are standard
Designer (and Windows) buttons and their purpose should be self explanatory

Steven White Raising Index Awareness Page 42 of 73


Drop down - Type of values in PK and FK

This drop down lists all the standard data types that Business Objects handles i.e. Character, Date, Long Text
and Number. You have to inform Business Objects which data type is the relevant for the series of Primary
Key and Foreign Keys because the data type may be different from the object you are defining Index
Awareness for e.g. column ‘City.city’ is a character column and the object ‘City’ is also a Character type but its
Primary Key column ‘City.city_id’ is a numeric type.

All Primary Key and Foreign Key entries must be of the same value type because there is only one drop down
for all entries. This makes sense because depending on the rules applied, any of the Primary Key and Foreign
Key entries could be used in the final query.

If you select the wrong type and then try to PARSE then you will receive the error ‘The expression type is not
compatible with the object type’ as shown below:

Steven White Raising Index Awareness Page 43 of 73


DETECT

Detects Primary Key and Foreign Keys if the middleware and database support their detection. Some
databases do not support their detection e.g. MS Access.

Should the database be unable to supply key information, the error ‘Key is not supported’ is raised by Business
Objects.

Should the Index Aware object reference no tables or more than one table, the error ‘No detected key for

objects involving several tables’ will be raised:

Note : this error description is slightly misleading because it is raised even when the object makes reference to
no tables i.e. the Index Aware object SELECT and WHERE clauses result in no tables being referenced in the
TABLES list

Any Primary Key detected will over-write without warning any existing Primary Key entry

Once the Primary Key has been detected then the Foreign Keys are sought. All Foreign Key entries will be
added to the list if they didn’t previously exist i.e. no duplicates will be allowed

Steven White Raising Index Awareness Page 44 of 73


Knowing if the database allows key detection
The PRM parameter KEY_INFO_SUPPORTED states whether the target database allows key detection. This
parameter has a default value of Y (for Yes) so if it is missing from a PRM file, Business Objects will assume it
can supply key info and attempt it.

PRM file parameter that states if database allows key detection

Steven White Raising Index Awareness Page 45 of 73


PARSE
Sends the SELECT and WHERE SQL of each entry, enabled or not, to the target database to confirm its
syntax. Errors are listed in a single dialog:

Apart from the Business Objects caught error ‘The expression type is not compatible with the object type’ all
other errors will be database specific, usually relating to errors in the building of the SELECT and WHERE
SQL statements.

The following limitations of the PARSE detection should be noted:


a. Primary Key SELECT that does not reference any table is not detected (but the WHERE does
reference a table)
b. Does not detect the potential Cartesian Product in the final query when the Primary Key or Foreign
Key SELECT or WHERE clauses are not joined to at least one of the Index Aware object’s tables list
c. Does not detect the potential ‘Incompatible combination of objects’ in the final query when the tables
referenced by the Primary Key or Foreign Key SELECT or WHERE clauses are not on the same
context as the Index Aware object’s tables list.
Warning – In case you’re using MS Access for testing this you should note that MS Access does not always
detect syntax errors e.g. WHERE clause with a single word ‘steven’ is not detected. Expressions with words
separated by a space are detected as incorrect

Key Type
There are only 2 key types i.e. Primary Key and Foreign Key. Only one Primary Key
entry can exist for a single Index Aware object. There can be zero or more Foreign Key
entries but before even 1 Foreign Key entry can be used a Primary Key entry must
exist.

Note : Should you try to create more than one Primary Key entry, an existing Primary Key entry will be
converted automatically without warning to a Foreign Key. Its SQL will remain unchanged only its type.
This could lead to incorrect SQL in the final query if the changed Primary Key entry is not reviewed by
the universe developer

Select
Enter or build the SELECT SQL which will be used to:
a. Replace the Index Aware object’s original SELECT SQL should the object be used in the
CONDITIONS pane of a query (subject to other rules stated elsewhere)
b. Added to the SQL of the Index Aware object’s LOV query
c. Replace the Index Aware object’s SELECT SQL in a GROUP BY clause should the Index Aware
object be used in the RESULTS pane of a query

Tip – add SQL comments to the Primary Key and Foreign Key entries so that they easily identifiable in the
final query SQL e.g. [ Customer.City_id /* Primary Key */ ] for Oracle, where /* comment */ is Oracle
comment syntax

Steven White Raising Index Awareness Page 46 of 73


Where
Enter or build the WHERE SQL which will be ANDed to any query where the SELECT part is also used.

ANDing additional conditions can cause unexpected results. You cannot chose whether the condition ORed
instead. It is always ANDed. Consider the effect of ANDing the conditions carefully.

Tip – add SQL comments to the Primary Key and Foreign Key entries so that they are easily identifiable in the
final query SQL e.g. [ Customer.City_id IN (10, 11, 12) /* Foreign Key */ ]

Enable
Only Primary Key or Foreign Key entries that are enabled will be taken into consideration during SQL
generation.

But enabled or not, the entries will be checked when the PARSE button is used.

Apply or OK
Applies the changes and optionally closes the dialog.

Further checking is done to detect the following errors:


• “There is at least one key without a Select statement. Each key requires at least one Select statement.
You need to verify your keys” this matches the rules discussed earlier
• “You defined multiple foreign keys without a primary key. Each foreign key requires at least one
primary key. You need to redefine your foreign keys” this wording is a little confusing because it
should not say “requires at least one SELECT” but should say “requires one SELECT” instead

These errors should be self explanatory based on our understanding of the rules determined so far.

Steven White Raising Index Awareness Page 47 of 73


SQL editor dialog
The SQL editor available to the management of Index Aware is the same SQL editor dialog available to
predefined conditions, objects SELECT and objects WHERE clauses.

This dialog provides normal SQL constructs as well as more complex items such as:

If your target database allows them, you can add SQL comments to the Primary Key and Foreign Key entries
so that they easily identifiable in the final query SQL e.g. [ Customer.city_id /* Index
Awareness Foreign Key */ ] for Oracle

note : for Primary Key and Foreign Key


WHERE clauses, the PARSE button
within this dialog always presents the
error ‘Parse failed : Invalid Definition
(UNV0023)’ not matter what you enter.
Ignore this error for now.

TODO – check what affect @Aggregate_Aware which also changes tables involved in a query has on
Index Awareness

Steven White Raising Index Awareness Page 48 of 73


Limitations and rules
Like any product functionality there are limits to its design. The following issues have been identified below.

Limited set of operands


Index Awareness will only apply with the following operands:
• Equal to
• Different From
• In List
• Not In List

Right hand side of operand must be selected


When the operands above are used, only value(s) selected from the
‘Show list of values’ dialog will trigger Index Awareness functionality.

Typing or entering a value will not trigger it even if the values are
identical to those in the LOV

Why? Because Index Awareness uses the selection from the LOV
dialog to grab the Primary Key value for each entry. If you entered
your selection using ‘Type a new constant’ (for example) Index
Awareness would then have to go out to the database again and locate
the matching Primary Key value. (or at least interrogate the cached
LOV which it currently does not do)

Also, creating a condition by using the ‘Simple Condition’ toolbar button within the query panel does not
permit Index Awareness either

Only works with Full Client reports


Disappointing but we have to start somewhere!

Does not work with prompts


Disappointing but expected because of the following reasons:
a. Complexity – prompt population takes place after all the tables in a query has been decided upon.
Whereas Index Awareness helps build the overall query, changing tables etc. The point in the query
building workflow where these things happen are entirely different.
b. Shared LOVs – if more than one prompt exists in the report and/or universe with the same name it is
reduced to a single prompt display and common LOV. But what if the prompt was related to different
objects with completely different LOVs and KEY definitions? This complication has not been
overcome in the current version of Index Awareness
c. Prompt definition – prompts can be defined in the universe with hard coded LOV or LOVs that come
from another object entirely. This complication has not been overcome in this feature.

Steven White Raising Index Awareness Page 49 of 73


List of rules found in examples

The following are repetition of rules discovered and explained within the examples earlier in the document.
They are listed here for reference:

• A Foreign Key entry will be ignored if it does not result in less tables being used. Business Objects
can only use less tables if the Primary Key table is referenced only inside the WHERE clause.
• The sequence of Foreign Key entries in the KEYS dialog determines which entry is given preference
i.e. the last enabled entry in the list that results in the least number of tables. But if the ‘best’ Foreign
Key entry does not result in a table count reduction compared to the original query, the Primary Key
entry will be the only one that applies
• Use of Primary Key entries in LOV can only be used with accuracy when the value is unique e.g.
customer social security number, product code.
• When Index Aware objects are used in the RESULTS pane their Primary Key entries can cause
additional rows to be returned from the database. Report results should remain the same if no local
reporter calculations depend on the row count and level of uniqueness of the result set
• WHERE clause entries for Primary Key and Foreign Key entries are ANDed to the query once the
SELECT entry has been implemented.
• For Primary Key entries the WHERE clauses must reference the same tables as the Index Aware
object’s original tables list
• For Foreign Key entries the WHERE clauses should try to reference the same tables as the Index
Aware original object’s tables list avoid introducing more complexity into the query
• Index Awareness is best used with single column Primary Keys. Multi-column primary keys should
be avoided and/or your DBA should consider ‘function based indexes’ and multi-column indexes.
• If an Index Aware object tables list references a table that has RBR, we cannot use any Foreign Key
entries. Why? because the RBR in Supervisor will be invalid for the Foreign Key table.

Steven White Raising Index Awareness Page 50 of 73


Appendix A – Bugs and Enhancement Requests
Please find below a list of bugs and enhancement requests that the author has raised during the creation of this
document. If you notice the same problems, please contact Business Objects Customer Support at
www.techsupport.businessobjects,com and have them raise support cases which are linked to the bugs and
enhancement requests listed below.

The more feedback the developers receive, will help guide their coding efforts.

If they don’t know its broke, they wont fix it!

List of bugs noticed in product


Bug Description
n/a Query SQL sometimes not being updated when returning to query
NOTE : could not raise as a bug because could not replicate it consistently
1099793 SQL editor dialog - for Primary Key and Foreign Key WHERE clauses, the PARSE
button within this dialog always presents the error ‘Parse failed : Invalid Definition
(UNV0023)’ not matter what you enter. Ignore this error for now.

List of Enhancement Requests


ER number Description
30602 When editing WHERE the dialog says ‘edit SELECT’
30615 Allow Primary Key and Foreign Key entries in KEYS tab to be reordered
30613 Error on Apply/OK wording “You defined multiple foreign keys without a primary key.
Each foreign key requires at least one primary key. You need to redefine your foreign
keys” this wording is a little confusing. Basically you cannot have one Foreign Key
without first having a single Primary Key entry.
n/a Work with prompts
Note : did not raise it as a ER because I realize the difficulty involved
30612 Work with BOTH and EXCEPT
30614 Applying a simple filter to an object in results pane does not trigger Index Aware (only
LOV from ‘list of values’ operand does)
30610 within KEYS property of an object, the DETECT button raises an error "No detected key
for objects involving several tables". but the error is also raised when the object does not
refer to any table. Suggest the error text is "Keys can only be detected for objects that refer
to one and only one table"
30616 Should you try to create more than one Primary Key entry, an existing Primary Key entry
will be converted automatically without warning to a Foreign Key. Its SQL will remain
unchanged only its type. This could lead to incorrect SQL in the final query if the changed
Primary Key entry is not reviewed by the universe developer suggest a dialog is raised
warning the user to review the changed PK entry (which is now a FK)

Steven White Raising Index Awareness Page 51 of 73


Appendix B – Accessing Index Aware information

There are 2 ways to automate access to universe information:


1. Designer SDK
2. Business Objects repository tables

Designer SDK
There seems to be no way using the SDK to access Index Aware information i.e. the properties and methods
are not exposed.

Business Objects repository tables


The new repository universe domain table that contains Index Aware information is UNV_OBJECT_KEY but
existing UNV_OBJECT_DATA and UNIVERSE_OBJECT tables are also affected.
Table UNV_OBJECT_KEY

A universe exists, called ‘combined.unv’ which was written by the author of this document, will include a first
attempt at reporting on the KEY information. The additions to the universe are not perfect, its limitations are:
1. outer joins are not handled – so trying to list objects whether they have KEY records or not will not
work i.e. only objects with KEY records will be returned
2. returning the SELECT and WHERE clauses in a single Business Objects query will generate 2
SELECT statements because these are on separate contexts. This may not be correct but it works for
now

Steven White Raising Index Awareness Page 52 of 73


The following is a piece of SQL that shows the SELECT clauses of any Index Aware objects:
SELECT
base_class.CLS_NAME,
UNV_OBJECT.OBJ_NAME,
object_key_links.KEY_STATES,
object_key_links.KEY_POSITION,
object_keys.OBJ_NAME,
key_select_clause.OBJ_DATAVALUE
FROM
UNV_OBJECT_KEY object_key_links,
UNV_CLASS base_class,
UNV_OBJECT,
UNV_OBJECT object_keys,
UNV_OBJECT_DATA key_select_clause
WHERE
( UNV_OBJECT.OBJECT_ID=object_key_links.OBJECT_ID and
UNV_OBJECT.UNIVERSE_ID=object_key_links.UNIVERSE_ID )
AND ( key_select_clause.OBJ_DATATYPE = 'S' )
AND ( key_select_clause.OBJ_SLICE=1 )
AND ( object_key_links.UNIVERSE_ID=object_keys.UNIVERSE_ID and
object_key_links.KEY_ID=object_keys.OBJECT_ID )
AND ( object_keys.OBJECT_ID=key_select_clause.OBJECT_ID and
object_keys.UNIVERSE_ID=key_select_clause.UNIVERSE_ID )
AND base_class.CLASS_ID=UNV_OBJECT.CLASS_ID
AND base_class.UNIVERSE_ID=UNV_OBJECT.UNIVERSE_ID

It is suggested you experiment with the 3 tables highlighted and the combined.unv universe.

Steven White Raising Index Awareness Page 53 of 73


Appendix C – Outstanding Items
The list below are items that need to be clarified or completed in this document. This will not teach the reader
anything about Index Awareness, they are simply a reminder to the author to complete them for the next
release of the document.

1. The use Primary Key to solve the old problem of hierarchical prompts only returning the value
selected and not the path

2. Consider prefixing object value with Primary Key value

3. What happens when you ignore ‘the expression type us not compatible with the object type’

The Secrets of Oracle Bitmap Indexes

Steven White Raising Index Awareness Page 54 of 73


Overview
Oracle's two major index types are Bitmap indexes and B-Tree indexes. B-Tree indexes are the regu
OLTP systems make much use of, and bitmap indexes are a highly compressed index type that tends
primarily for data warehouses.
Characteristic of Bitmap Indexes
• For columns with very few unique values (low cardinality)

Columns that have low cardinality are good candidates (if the cardinality of a column is <= 0.1 % th
column is ideal candidate, consider also 0.2% – 1%)
• Tables that have no or little insert/update are good candidates (static data in warehouse)

• Stream of bits: each bit relates to a column value in a single row of table

create bitmap index person_region on person (region);

Row Region North East West South


1 North 1 0 0 0
2 East 0 1 0 0
3 West 0 0 1 0
4 West 0 0 1 0
5 South 0 0 0 1
6 North 1 0 0 0
Advantage of Bitmap Indexes
The advantages of them are that they have a highly compressed structure, making them fast to read
structure makes it possible for the system to combine multiple indexes together for fast access to the
table.
Compressed indexes, like bitmap indexes, represent a trade-off between CPU usage and disk space u
compressed structure is faster to read from disk but takes additional CPU cycles to decompress for a
uncompressed structure imposes a lower CPU load but requires more bandwidth to read in a short tim
One belief concerning bitmap indexes is that they are only suitable for indexing low-cardinality data
necessarily true, and bitmap indexes can be used very successfully for indexing columns with many
different values.
Disadvantage of Bitmap Indexes
The reason for confining bitmap indexes to data warehouses is that the overhead on maintaining th
enormous. A modification to a bitmap index requires a great deal more work on behalf of the system
modification to a b-tree index. In addition, the concurrency for modifications on bitmap indexes is dr
Bitmap Indexes and Deadlocks
Bitmap indexes are not appropriate for tables that have lots of single row DML operations (inserts) a
concurrent single row DML operations. Deadlock situations are the result of concurrent inserts as the
example shows: Open two windows, one for Session 1 and one for Session 2
Session 1 Session 2

create table bitmap_index_demo (


value varchar2(20)
);

Steven White Raising Index Awareness Page 55 of 73


Bitmap Index vs. B-tree Index: Which and When?
by Vivek Sharma
Understanding the proper application of each index can have a big impact on performance.
Conventional wisdom holds that bitmap indexes are most appropriate for columns having
low distinct values—such as GENDER, MARITAL_STATUS, and RELATION. This
assumption is not completely accurate, however. In reality, a bitmap index is always
advisable for systems in which data is not frequently updated by many concurrent systems.
In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique values
(a column candidate for primary key) is as efficient as a B-tree index.
In this article I'll provide some examples, along with optimizer decisions, that are common
for both types of indexes on a low-cardinality column as well as a high-cardinality one.
These examples will help DBAs understand that the usage of bitmap indexes is not in fact
cardinality dependent but rather application dependent.
Comparing the Indexes
There are several disadvantages to using a bitmap index on a unique column—one being the
need for sufficient space (and Oracle does not recommend it). However, the size of the
bitmap index depends on the cardinality of the column on which it is created as well as the
data distribution. Consequently, a bitmap index on the GENDER column will be smaller
than a B-tree index on the same column. In contrast, a bitmap index on EMPNO (a
candidate for primary key) will be much larger than a B-tree index on this column. But
because fewer users access decision-support systems (DSS) systems than would access
transaction-processing (OLTP) ones, resources are not a problem for these applications.
To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I
inserted one million rows into the TEST_NORMAL table using a PL/SQL block, and then
inserted these rows into the TEST_RANDOM table in random order:
Create table test_normal (empno number(10), ename varchar2(30), sal
number(10));

Begin
For i in 1..1000000
Loop
Insert into test_normal
values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
If mod(i, 10000) = 0 then
Commit;
End if;
End loop;
End;
/

Create table test_random


as
select /*+ append */ * from test_normal order by dbms_random.random;

SQL> select count(*) "Total Rows" from test_normal;

Total Rows
----------
1000000

Steven White Raising Index Awareness Page 56 of 73


Elapsed: 00:00:01.09

SQL> select count(distinct empno) "Distinct Values" from test_normal;

Distinct Values
---------------
1000000

Elapsed: 00:00:06.09
SQL> select count(*) "Total Rows" from test_random;

Total Rows
----------
1000000

Elapsed: 00:00:03.05
SQL> select count(distinct empno) "Distinct Values" from test_random;

Distinct Values
---------------
1000000

Elapsed: 00:00:12.07
Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is
randomly created and hence has disorganized data. In the above table, column EMPNO has
100-percent distinct values and is a good candidate to become a primary key. If you define
this column as a primary key, you will create a B-tree index and not a bitmap index because
Oracle does not support bitmap primary key indexes.
To analyze the behavior of these indexes, we will perform the following steps:
1. On TEST_NORMAL:
A. Create a bitmap index on the EMPNO column and execute some queries with
equality predicates.
B. Create a B-tree index on the EMPNO column, execute some queries with
equality predicates, and compare the logical and physical I/Os done by the
queries to fetch the results for different sets of values.
2. On TEST_RANDOM:
A. Same as Step 1A.
B. Same as Step 1B.
3. On TEST_NORMAL:
A. Same as Step 1A, except that the queries are executed within a range of
predicates.
B. Same as Step 1B, except that the queries are executed within a range of
predicates. Now compare the statistics.
4. On TEST_RANDOM:
A. Same as Step 3A.
B. Same as Step 3B.
5. On TEST_NORMAL:

Steven White Raising Index Awareness Page 57 of 73


A. Create a bitmap index on the SAL column, and then execute some queries
with equality predicates and some with range predicates.
B. Create a B-tree index on the SAL column, and then execute some queries
with equality predicates and some with range predicates (same set of values
as in Step 5A). Compare the I/Os done by the queries to fetch the results.
6. Add a GENDER column to both of the tables, and update the column with three
possible values: M for male, F for female, and null for N/A. This column is updated
with these values based on some condition.
7. Create a bitmap index on this column, and then execute some queries with equality
predicates.
8. Create a B-tree index on the GENDER column, and then execute some queries with
equality predicates. Compare to results from Step 7.
Steps 1 to 4 involve a high-cardinality (100-percent distinct) column, Step 5 a normal-
cardinality column, and Steps 7 and 8 a low-cardinality column.
Step 1A (on TEST_NORMAL)
In this step, we will create a bitmap index on the TEST_NORMAL table and then check for
the size of this index, its clustering factor, and the size of the table. Then we will run some
queries with equality predicates and note the I/Os of these queries using this bitmap index.
SQL> create bitmap index normal_empno_bmx on test_normal(empno);

Index created.

Elapsed: 00:00:29.06
SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

Elapsed: 00:00:19.01
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size
in MB"
2 from user_segments
3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');

SEGMENT_NAME Size in MB
------------------------------------ ---------------
TEST_NORMAL 50
NORMAL_EMPNO_BMX 28

Elapsed: 00:00:02.00
SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ---------------------------------
NORMAL_EMPNO_BMX 1000000

Elapsed: 00:00:00.00
You can see in the preceding table that the size of the index is 28MB and that the clustering
factor is equal to the number of rows in the table. Now let's execute the queries with equality
predicates for different sets of values:
SQL> set autotrace only
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_normal where empno=&empno

Steven White Raising Index Awareness Page 58 of 73


new 1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
d=1 Bytes=34)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Step 1B (on TEST_NORMAL)
Now we will drop this bitmap index and create a B-tree index on the EMPNO column. As
before, we will check for the size of the index and its clustering factor and execute the same
queries for the same set of values, to compare the I/Os.
SQL> drop index NORMAL_EMPNO_BMX;

Index dropped.

SQL> create index normal_empno_idx on test_normal(empno);

Index created.

SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size


in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');

SEGMENT_NAME Size in MB
---------------------------------- ---------------
TEST_NORMAL 50
NORMAL_EMPNO_IDX 18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
---------------------------------- ----------------------------------
NORMAL_EMPNO_IDX 6210
It is clear in this table that the B-tree index is smaller than the bitmap index on the EMPNO
column. The clustering factor of the B-tree index is much nearer to the number of blocks in
a table; for that reason, the B-tree index is efficient for range predicate queries.
Now we'll run the same queries for the same set of values, using our B-tree index.

Steven White Raising Index Awareness Page 59 of 73


SQL> set autot trace
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_normal where empno=&empno
new 1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
d=1 Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
ost=3 Card=1)

Statistics
----------------------------------------------------------
29 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As you can see, when the queries are executed for different set of values, the number of
consistent gets and physical reads are identical for bitmap and B-tree indexes on a 100-
percent unique column.
BITMAP B-TREE
Consistent Physical EMPNO Consistent Physical
Reads Reads Reads Reads
5 0 1000 5 0
5 2 2398 5 2
5 2 8545 5 2
5 2 98008 5 2
5 2 85342 5 2
5 2 128444 5 2
5 2 858 5 2
Step 2A (on TEST_RANDOM)
Now we'll perform the same experiment on TEST_RANDOM:
SQL> create bitmap index random_empno_bmx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size


in MB"

Steven White Raising Index Awareness Page 60 of 73


2 from user_segments
3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');

SEGMENT_NAME Size in MB
------------------------------------ ---------------
TEST_RANDOM 50
RANDOM_EMPNO_BMX 28

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ---------------------------------
RANDOM_EMPNO_BMX 1000000
Again, the statistics (size and clustering factor) are identical to those of the index on the
TEST_NORMAL table:
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_random where empno=&empno
new 1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1
Bytes=34)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Step 2B (on TEST_RANDOM)
Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO
column.
SQL> drop index RANDOM_EMPNO_BMX;

Index dropped.

SQL> create index random_empno_idx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size


in MB"
2 from user_segments
3 where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');

Steven White Raising Index Awareness Page 61 of 73


SEGMENT_NAME Size in MB
---------------------------------- ---------------
TEST_RANDOM 50
RANDOM_EMPNO_IDX 18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
---------------------------------- ----------------------------------
RANDOM_EMPNO_IDX 999830
This table shows that the size of the index is equal to the size of this index on
TEST_NORMAL table but the clustering factor is much nearer to the number of rows,
which makes this index inefficient for range predicate queries (which we'll see in Step 4).
This clustering factor will not affect the equality predicate queries because the rows have
100-percent distinct values and the number of rows per key is 1.
Now let's run the queries with equality predicates and the same set of values.
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_random where empno=&empno
new 1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1
Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE)
(Cost=3 Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did
not affect the amount of consistent gets and physical reads for a unique column.
Step 3A (on TEST_NORMAL)
In this step, we will create the bitmap index (similar to Step 1A). We know the size and the
clustering factor of the index, which equals the number of rows in the table. Now let's run
some queries with range predicates.
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_normal where empno between &range1 and &range2
new 1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Steven White Raising Index Awareness Page 62 of 73


Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451
Card=2299 Bytes=78166)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
331 consistent gets
0 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
Step 3B (on TEST_NORMAL)
In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index
on it.
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_normal where empno between &range1 and &range2
new 1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23
Card=2299 Bytes=78166)
2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE)
(Cost=8 Card=2299)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
329 consistent gets
15 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
When these queries are executed for different sets of ranges, the results below show:

Steven White Raising Index Awareness Page 63 of 73


BITMAP B-TREE
EMPNO
Consistent Physical (Range) Consistent Physical
Reads Reads Reads Reads
331 0 1-2300 329 0
285 0 8-1980 283 0
346 19 1850-4250 344 16
427 31 28888-31850 424 28
371 27 82900-85478 367 23
2157 149 984888-1000000 2139 35
As you can see, the number of consistent gets and physical reads with both indexes is again
nearly identical. The last range (984888-1000000) returned almost 15,000 rows, which was
the maximum number of rows fetched for all the ranges given above. So when we asked for
a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and
physical read counts were 7,239 and 5,663, respectively.
Step 4A (on TEST_RANDOM)
In this step, we will run the queries with range predicates on the TEST_RANDOM table
with bitmap index and check for consistent gets and physical reads. Here you'll see the
impact of the clustering factor.
SQL>select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_random where empno between &range1 and &range2
new 1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:08.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453
Card=2299 Bytes=78166)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2463 consistent gets
1200 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed

Step 4B (on TEST_RANDOM)

Steven White Raising Index Awareness Page 64 of 73


In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree
index on it. Recall that the clustering factor of this index was very close to the number of
rows in a table (and thus inefficient). Here's what the optimizer has to say about that:
SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_random where empno between &range1 and &range2
new 1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299
Bytes=78166)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6415 consistent gets
4910 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
The optimizer opted for a full table scan rather than using the index because of the
clustering factor:
BITMAP B-TREE
EMPNO
Consistent Physical (Range) Consistent Physical
Reads Reads Reads Reads
2463 1200 1-2300 6415 4910
2114 31 8-1980 6389 4910
2572 1135 1850-4250 6418 4909
3173 1620 28888-31850 6456 4909
2762 1358 82900-85478 6431 4909
7254 3329 984888-1000000 7254 4909
For the last range (984888-1000000) only, the optimizer opted for a full table scan for the
bitmap index, whereas for all ranges, it opted for a full table scan for the B-tree index. This
disparity is due to the clustering factor: The optimizer does not consider the value of the
clustering factor when generating execution plans using a bitmap index, whereas for a B-
tree index, it does. In this scenario, the bitmap index performs more efficiently than the B-
tree index.
The following steps reveal more interesting facts about these indexes.
Step 5A (on TEST_NORMAL)

Steven White Raising Index Awareness Page 65 of 73


Create a bitmap index on the SAL column of the TEST_NORMAL table. This column has
normal cardinality.
SQL> create bitmap index normal_sal_bmx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.
Now let's get the size of the index and the clustering factor.
SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size
in MB"
2* from user_segments
3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');

SEGMENT_NAME Size in MB
------------------------------ --------------
TEST_NORMAL 50
NORMAL_SAL_BMX 4

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_BMX 6001
Now for the queries. First run them with equality predicates:
SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old 1: select * from test_normal where sal=&sal
new 1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39
Card=168 Bytes=4032)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
165 consistent gets
0 physical reads
0 redo size
8461 bytes sent via SQL*Net to client
609 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
164 rows processed
and then with range predicates:
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000

Steven White Raising Index Awareness Page 66 of 73


old 1: select * from test_normal where sal between &sal1 and &sal2
new 1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:05.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
=2001024)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
Bytes=2001024)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11778 consistent gets
5850 physical reads
0 redo size
4123553 bytes sent via SQL*Net to client
61901 bytes received via SQL*Net from client
5584 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
83743 rows processed
Now drop the bitmap index and create a B-tree index on TEST_NORMAL.
SQL> create index normal_sal_idx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.
Take a look at the size of the index and the clustering factor.
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size
in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');

SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_SAL_IDX 17

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_IDX 986778

In the above table, you can see that this index is larger than the bitmap index on the same
column. The clustering factor is also near the number of rows in this table.
Now for the tests; equality predicates first:
SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old 1: select * from test_normal where sal=&sal
new 1: select * from test_normal where sal=1869

164 rows selected.

Steven White Raising Index Awareness Page 67 of 73


Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169
Card=168 Bytes=4032)
2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3
Card=168)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
177 consistent gets
0 physical reads
0 redo size
8461 bytes sent via SQL*Net to client
609 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
164 rows processed
...and then, range predicates:
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old 1: select * from test_normal where sal between &sal1 and &sal2
new 1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:04.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
=2001024)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
Bytes=2001024)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11778 consistent gets
3891 physical reads
0 redo size
4123553 bytes sent via SQL*Net to client
61901 bytes received via SQL*Net from client
5584 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
83743 rows processed
When the queries were executed for different set of values, the resulting output, as shown in
the tables below, reveals that the numbers of consistent gets and physical reads are identical.
BITMAP B-TREE
SAL Rows
Consistent Physical (Equality) Consistent Reads Physical Fetched
Reads Reads Reads
165 0 1869 177 164

Steven White Raising Index Awareness Page 68 of 73


169 163 3548 181 167
174 166 6500 187 172
75 69 7000 81 73
177 163 2500 190 175
BITMAP B-TREE
SAL Rows
Consistent Physical (Range) Consistent Physical Fetched
Reads Reads Reads Reads
11778 5850 1500-2000 11778 3891 83743
11765 5468 2000-2500 11765 3879 83328
11753 5471 2500-3000 11753 3884 83318
17309 5472 3000-4000 17309 3892 166999
39398 5454 4000-7000 39398 3973 500520
For range predicates the optimizer opted for a full table scan for all the different set of
values—it didn't use the indexes at all—whereas for equality predicates, the optimizer used
the indexes. Again, the consistent gets and physical reads are identical.
Consequently, you can conclude that for a normal-cardinality column, the optimizer
decisions for the two types of indexes were the same and there were no significant
differences between the I/Os.
Step 6 (add a GENDER column)
Before performing the test on a low-cardinality column, let's add a GENDER column to this
table and update it with M, F, and null values.
SQL> alter table test_normal add GENDER varchar2(1);

Table altered.

SQL> select GENDER, count(*) from test_normal group by GENDER;

S COUNT(*)
- ----------
F 333769
M 499921
166310

3 rows selected.
The size of the bitmap index on this column is around 570KB, as indicated in the table
below:
SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);

Index created.

Elapsed: 00:00:02.08

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size


in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');

SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50

Steven White Raising Index Awareness Page 69 of 73


NORMAL_GENDER_BMX .5625

2 rows selected.
In contrast, the B-tree index on this column is 13MB in size, which is much bigger than the
bitmap index on this column.
SQL> create index normal_GENDER_idx on test_normal(GENDER);

Index created.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size


in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');

SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_GENDER_IDX 13

2 rows selected.
Now, if we execute a query with equality predicates, the optimizer will not make use of this
index, be it a bitmap or a B-tree. Rather, it will prefer a full table scan.
SQL> select * from test_normal where GENDER is null;

166310 rows selected.

Elapsed: 00:00:06.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310
Bytes=4157750)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310
Bytes=4157750)

SQL> select * from test_normal where GENDER='M';

499921 rows selected.

Elapsed: 00:00:16.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921
Bytes=12498025)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601
Card=499921Bytes=12498025)

SQL>select * from test_normal where GENDER='F'


/

333769 rows selected.

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte
s=8344225)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769
Bytes=8344225)
Conclusions

Steven White Raising Index Awareness Page 70 of 73


Now that we understood how the optimizer reacts to these techniques, let's examine a
scenario that clearly demonstrates the best respective applications of bitmap indexes and B-
tree indexes.
With a bitmap index on the GENDER column in place, create another bitmap index on the
SAL column and then execute some queries. The queries will be re-executed with B-tree
indexes on these columns.
From the TEST_NORMAL table, you need the employee number of all the male employees
whose monthly salaries equal any of the following values:
1000
1500
2000
2500
3000
3500
4000
4500
Thus:
SQL>select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and
GENDER='M';
This is a typical data warehouse query, which, of course, you should never execute on an
OLTP system. Here are the results with the bitmap index in place on both columns:
SQL>select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and
GENDER='M';

1453 rows selected.

Elapsed: 00:00:02.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754
Bytes=18850)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198
Card=754 Bytes=18850)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP AND
4 3 BITMAP OR
5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1353 consistent gets
920 physical reads
0 redo size
75604 bytes sent via SQL*Net to client
1555 bytes received via SQL*Net from client
98 SQL*Net roundtrips to/from client

Steven White Raising Index Awareness Page 71 of 73


0 sorts (memory)
0 sorts (disk)
1453 rows processed
And with the B-tree index in place:
SQL>select * from test_normal
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and
GENDER='M';

1453 rows selected.

Elapsed: 00:00:03.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754
Bytes=18850)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754
Bytes=18850)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6333 consistent gets
4412 physical reads
0 redo size
75604 bytes sent via SQL*Net to client
1555 bytes received via SQL*Net from client
98 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1453 rows processed
As you can see here, with the B-tree index, the optimizer opted for a full table scan, whereas
in the case of the bitmap index, it used the index to answer the query. You can deduce
performance by the number of I/Os required to fetch the result.
In summary, bitmap indexes are best suited for DSS regardless of cardinality for these
reasons:
• With bitmap indexes, the optimizer can efficiently answer queries that include AND,
OR, or XOR. (Oracle supports dynamic B-tree-to-bitmap conversion, but it can be
inefficient.)
• With bitmaps, the optimizer can answer queries when searching or counting for
nulls. Null values are also indexed in bitmap indexes (unlike B-tree indexes).
• Most important, bitmap indexes in DSS systems support ad hoc queries, whereas B-
tree indexes do not. More specifically, if you have a table with 50 columns and users
frequently query on 10 of them—either the combination of all 10 columns or
sometimes a single column—creating a B-tree index will be very difficult. If you
create 10 bitmap indexes on all these columns, all the queries can be answered by
these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of
the 10, or on a single column. The AND_EQUAL hint provides this functionality for
B-tree indexes, but no more than five indexes can be used by a query. This limit is
not imposed with bitmap indexes.
In contrast, B-tree indexes are well suited for OLTP applications in which users' queries are
relatively routine (and well tuned before deployment in production), as opposed to ad hoc
queries, which are much less frequent and executed during nonpeak business hours. Because
data is frequently updated in and deleted from OLTP applications, bitmap indexes can cause
a serious locking problem in these situations.

Steven White Raising Index Awareness Page 72 of 73


The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as
possible. But your choice of which one to use should depend purely on the type of
application, not on the level of cardinality.

Steven White Raising Index Awareness Page 73 of 73

You might also like