SQL Tuning


Contents

1. Introduction
   • Reduce the Workload
   • Balance the Workload
   • Parallelize the Workload
   • Upgrade
2. Improve SQL statement tuning
   2.1 Reviewing the Execution Plan
   2.2 Restructuring SQL statement
      2.2.1 AND or ‘=’ clause
      2.2.2 IN or BETWEEN clauses
      2.2.3 LIKE clause
      2.2.4 IN and EXISTS clauses
      2.2.5 DISTINCT clause
      2.2.6 UNION or UNION ALL clauses
      2.2.7 CASE statement
      2.2.8 SELECT DISTINCT clause
   2.3 Eliminate the use of Temporary Tables
   2.4 Avoid Mixed Type Expressions
3. Indexes
   3.1 Secondary Indexes
   3.2 Join Index
      3.2.1 Single table Join Index
      3.2.2 Aggregate Join Index
      3.2.3 Sparse Index
      3.2.4 Global (Join) Index
      3.2.5 Join Index performance
   3.3 Partitioned Primary Index
   3.4 Index Usage

1. Introduction
The performance of a Teradata SQL-based application depends on several factors, including database
design, network latency, query optimization, and hardware specifications. Poorly tuned queries are a
frequent cause of performance problems. The objective of tuning a system is either to reduce the
response time for end users of the system, or to reduce the resources used to process the same work.
Both of these objectives can be achieved in several ways:
Reduce the Workload

This is what commonly constitutes SQL tuning: finding more efficient ways to process the same
workload. The execution plan of a statement can often be changed to reduce resource consumption
without altering its functionality.
Two examples of how resource usage can be reduced are:

1. If a commonly executed query needs to access only a small percentage of the data in a table, it
can be executed more efficiently by using an index. Creating and using such an index reduces
the amount of resources consumed.

2. If a user is looking at the first twenty rows of the 10,000 rows returned in a specific sort order,
and if the query (and sort order) can be satisfied by an index, then the user does not need to
access and sort the 10,000 rows to see the first 20 rows.

Balance the Workload

Systems often tend to have peak usage in the daytime when real users are connected to the system and
low usage in the nighttime. If non-critical reports and batch jobs can be scheduled to run in the
nighttime and their concurrency during daytime reduced, then it frees up resources for the more critical
programs in the day.
Parallelize the Workload

Queries that access large amounts of data (typical data warehouse queries) can often be parallelized.
This is extremely useful for reducing response time in low-concurrency data warehouse environments.
However, for OLTP environments, which tend to have high concurrency, parallelizing can adversely
impact other users by increasing the overall resource usage of the program.
Upgrade

A software or hardware upgrade can be another means of attaining the desired performance levels,
but it should be considered only after SQL tuning, data model restructuring, and database tuning
have been exhausted.

To ensure performance after a major or minor software or hardware upgrade, perform the following
tasks on the production system.

• Recollect statistics where possible.
• Save the EXPLAINs for the base queries.
• Run the test bed again after the upgrade and get the new EXPLAINs.
• Compare the two sets of EXPLAINs.
• Check for faster or slower response on any of the test queries.
• If a query is slower, compare its before and after EXPLAIN plans.

2. Improve SQL statement tuning


2.1 Reviewing the Execution Plan

When tuning (or writing) a SQL statement in an OLTP environment, the goal is to drive from the table
that has the most selective filter, so that fewer rows are passed to the next step. If the next step is a
join, then fewer rows are joined. Check whether the access paths are optimal.

When examining the optimizer execution plan, look for the following:

• The plan is such that the driving table has the best filter.
• The join order in each step returns the fewest rows to the next step (that is, the join order
should reflect, where possible, going to the best not-yet-used filters).
• There are no unintentional Cartesian products (check even with small tables).
• Each table is being accessed efficiently:

Consider the predicates in the SQL statement and the number of rows in the table. Look for suspicious
activity, such as a full table scan on a table with a large number of rows that has predicates in the
WHERE clause, and determine why an index is not used for such a selective predicate.
A full table scan does not always mean inefficiency. It might be more efficient to perform a full table
scan on a small table, or to perform a full table scan to leverage a better join method (for example, a
hash join) for the number of rows returned.
If any of these conditions are not optimal, then consider restructuring the SQL statement or the indexes
available on the tables.
2.2 Restructuring SQL statement

Rewriting an inefficient SQL statement is often easier than repairing it. Because SQL is a flexible
language, more than one SQL statement can meet the same requirements, and although two statements
may produce the same result, one may be processed faster than the other. The output of the EXPLAIN
statement can be used to compare the execution plans of the two statements and determine which is
more efficient.

2.2.1 AND or ‘=’ clause

To improve SQL efficiency, it is advantageous to use equijoins whenever possible. Statements that
perform equijoins on untransformed column values are easier to tune.
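As a minimal sketch (the employees and departments tables and their join columns are assumed here
for illustration), compare an equijoin on untransformed columns with one that wraps the join column
in a function:

SELECT e.employee_id, d.department_name
FROM employees e
INNER JOIN departments d
    ON e.department_id = d.department_id;

-- Avoid transforming the join column, e.g.
--     ON TRIM(e.dept_code) = d.dept_code
-- because the function hides the column value from the Optimizer
-- and can prevent index use and efficient row redistribution.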
2.2.2 IN or BETWEEN clauses
When there is a choice between the IN and the BETWEEN clauses in a query, it is advantageous to use
the BETWEEN clause, as it is much more efficient. For example:

SELECT customer_number, customer_name
FROM customer
WHERE customer_number IN (1000, 1001, 1002, 1003, 1004);

is much less efficient than:

SELECT customer_number, customer_name
FROM customer
WHERE customer_number BETWEEN 1000 AND 1004;

Assuming there is a useful index on customer_number, the Query Optimizer can locate a range of
numbers (using BETWEEN) much faster than it can find a series of discrete numbers using the IN clause.
2.2.3 LIKE clause
If LIKE is used in a WHERE clause, try to use one or more leading characters in the pattern, if at all
possible. For example, use:

LIKE 'm%' not LIKE '%m'

If a leading character is used in the LIKE pattern, the Query Optimizer can potentially use an index to
perform the query, thereby speeding it up.
But if the leading character in a LIKE pattern is a wildcard, the Query Optimizer cannot use an index,
and a full table scan must be run, which reduces performance and takes more time.
2.2.4 IN and EXISTS clauses
In certain circumstances, it is better to use IN rather than EXISTS. In general, if the selective predicate is
in the subquery, then use IN. If the selective predicate is in the parent query, then use EXISTS.
A subquery when used with an IN clause can take advantage of selectivity specified in the subquery. This
is most beneficial when the most selective filter appears in the subquery and there are indexes on the
join columns. Conversely, using EXISTS is beneficial when the most selective filter is in the parent query.
This allows the selective predicates in the parent query to be applied before filtering the rows against
the EXISTS criteria.
Below are two examples that demonstrate the benefits of IN and EXISTS. Both examples use the same
schema with the following characteristics:
• There is a unique index on the employees.employee_id field.
• There is an index on the orders.customer_id field.
• There is an index on the employees.department_id field.
• The employees table has 27,000 rows.
• The orders table has 10,000 rows.

Example 1:

This example demonstrates how rewriting a query to use IN can improve performance. This query
identifies all employees who have placed orders on behalf of customer 144.
The following SQL statement uses EXISTS:
SELECT e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE EXISTS (SELECT 1
FROM orders o
WHERE e.employee_id = o.sales_rep_id
AND o.customer_id = 144);
Rewriting the statement using IN results in significantly fewer resources being used. The SQL statement
using IN:
SELECT e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.employee_id IN (SELECT o.sales_rep_id
FROM orders o
WHERE o.customer_id = 144);

Explanation:

In the query using EXISTS, an extra, unnecessary step is performed by the parent query. The
subquery equijoins the employees and orders tables on employee_id = sales_rep_id and filters
the result for customer_id = 144. That result is already the table we need, but the parent query
then compares employee_id once more for every candidate row.

In the query using IN, however, the subquery simply returns the sales_rep_id values filtered by
the specified customer_id, and the join work is performed once by the parent query. Thus the
query using the IN clause does much less work than the query using the EXISTS clause.

Example 2:

This example demonstrates how rewriting a query to use EXISTS can improve performance. This query
identifies all employees from department 80 who are sales reps who have placed orders. The following
SQL statement uses IN:
SELECT e.employee_id, e.first_name, e.last_name, e.department_id, e.salary
FROM employees e
WHERE e.department_id = 80
AND e.job_id = 'SA_REP'
AND e.employee_id IN (SELECT o.sales_rep_id
FROM orders o);
The following SQL statement uses EXISTS:
SELECT e.employee_id, e.first_name, e.last_name, e.salary
FROM employees e
WHERE e.department_id = 80
AND e.job_id = 'SA_REP'
AND EXISTS (SELECT 1
FROM orders o
WHERE e.employee_id = o.sales_rep_id);
Explanation:

In the query using IN, the subquery returns the entire orders table's required column data; the rest of
the filtering and the joining are done in the parent query.

In the query using EXISTS, however, the selective filters in the parent query (department and job) are
applied first, and only the surviving employees rows are checked against the orders table. Thus the
query using the EXISTS clause does much less work than the query using the IN clause.
2.2.5 DISTINCT clause
At times this clause is added to every SELECT statement, even when it is not necessary. The DISTINCT
clause should be used in SELECT statements only if it is known that duplicate returned rows are a
possibility, and that having duplicate rows in the result set would violate the requirements. The
DISTINCT clause creates a lot of extra work and reduces the physical resources that other SQL
statements have at their disposal, so use it only when necessary.
2.2.6 UNION or UNION ALL clauses
When using the UNION statement, keep in mind that, by default, it performs the equivalent of a SELECT
DISTINCT on the final result set. In other words, UNION takes the results of two like recordsets, combines
them, and then performs a SELECT DISTINCT to eliminate any duplicate rows. This process occurs even
if there are no duplicate records in the final recordset. Producing a distinct result requires building a
temporary worktable, storing all rows in it, and sorting them before producing the output. If it is known
that there are duplicate records and they present a problem for the application, then the UNION
statement should be used to eliminate the duplicate rows.

On the other hand, if it is known that there will never be any duplicate rows, or if there are, and this
presents no problem to your application, then the UNION ALL statement should be used instead of the
UNION statement. The advantage of the UNION ALL is that it does not perform the SELECT DISTINCT
function, which saves a lot of unnecessary resources from being used. UNION ALL requires no worktable
and no sorting. In most cases it’s much more efficient.

One more potential problem with UNION is the danger of flooding the temporary database with a huge
worktable, which can happen if a large result set is expected from a UNION query.
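As an illustrative sketch (current_orders and archived_orders are assumed tables with identical
layouts):

-- UNION removes duplicates: a worktable is built and sorted.
SELECT customer_id FROM current_orders
UNION
SELECT customer_id FROM archived_orders;

-- UNION ALL appends the rows as-is: no worktable, no sort.
SELECT customer_id FROM current_orders
UNION ALL
SELECT customer_id FROM archived_orders;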
2.2.7 CASE statement
Often it is necessary to calculate different aggregates over various sets of rows. Usually this is done
with multiple scans of the table, but it is easy to calculate all the aggregates with one single scan;
eliminating n-1 scans can greatly improve performance.
Combining multiple scans into one scan is done by moving the WHERE condition of each scan into a
CASE statement, which filters the data for each aggregation.

Example:
The following example asks for the count of all employees who earn less than 2000, between 2000 and
4000, and more than 4000 each month. This can be done with three separate queries:
SELECT COUNT (*)
FROM employees
WHERE salary < 2000;

SELECT COUNT (*)
FROM employees
WHERE salary BETWEEN 2000 AND 4000;

SELECT COUNT (*)
FROM employees
WHERE salary > 4000;
However, it is more efficient to run the entire query in a single statement. Each number is calculated as
one column. The count uses a filter with the CASE statement to count only the rows where the condition
is valid. For example:
SELECT COUNT (CASE WHEN salary < 2000
              THEN 1 ELSE NULL END) AS count1,
       COUNT (CASE WHEN salary BETWEEN 2000 AND 4000
              THEN 1 ELSE NULL END) AS count2,
       COUNT (CASE WHEN salary > 4000
              THEN 1 ELSE NULL END) AS count3
FROM employees;
2.2.8 SELECT DISTINCT clause
The DISTINCT option is used in a SELECT statement to filter out duplicate results from a query's output.
In a simple SELECT from one table this is the easiest and quickest way of doing things. However, with a
more complex query, the query can be recoded to gain a performance advantage. Consider the
following example:
Example:
The query returns authors that have a book already published.

SELECT DISTINCT au_fname, au_lname
FROM authors a
JOIN titleAuthor t ON t.au_id = a.au_id;

The same result can be obtained by writing the query in the following manner,

SELECT au_fname, au_lname
FROM authors a
WHERE EXISTS (SELECT *
              FROM titleAuthor t
              WHERE t.au_id = a.au_id);
The second query performs slightly better than the first. The reason is that the EXISTS clause causes a
name to be returned as soon as the first book is found, and no further books for that author are
considered (we already have the author's name and only want to see it once).
The DISTINCT query, on the other hand, returns one copy of the author's name for each book the
author has worked on, and the resulting list of names must then be examined for duplicates to satisfy
the DISTINCT clause.
The DISTINCT clause involves a worktable; the EXISTS version needs none.

Note that AND has precedence over OR: a condition such as a = 1 OR b = 2 AND c = 3 is evaluated as
a = 1 OR (b = 2 AND c = 3), so parenthesize explicitly when mixing the two in a WHERE clause.


2.3 Eliminate the use of Temporary Tables

As queries become more complex, temporary tables are used more and more. While temporary tables
may sometimes be unavoidable, they can often be sidestepped by using derived tables instead.

A derived table is the result of using a SELECT statement in the FROM clause of another SELECT
statement. By using derived tables instead of temporary tables, the application's performance can be
boosted. Temporary tables can slow performance dramatically because of the overhead involved in
creating and populating them. To get the fastest queries possible, the goal must be to make them do as
little work as possible. The biggest benefit of derived tables over temporary tables is that they require
fewer steps, and more of the work happens in memory instead of a combination of memory and disk.
The fewer the steps involved, along with less I/O, the faster the performance.
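A minimal sketch of the idea, reusing the employees table from earlier examples: instead of
materializing department averages into a temporary table and then joining to it, the aggregate is
written as a derived table inside a single statement:

SELECT e.employee_id, e.salary, d.avg_salary
FROM employees e
INNER JOIN (SELECT department_id, AVG(salary) AS avg_salary
            FROM employees
            GROUP BY department_id) d
    ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;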
2.4 Avoid Mixed Type Expressions

It is always better to avoid mixed-mode expressions, and beware of implicit type conversions.
Consider the following example:

Example:
Assume a table with a column charcol defined as VARCHAR(2), which is also the primary index of the
table. The following SQL query is run on this table:
SELECT * FROM <tablename>
WHERE charcol = <numexpr>;

where numexpr is an expression of numeric type. In SQL, the default implicit type conversion is from the
character to the numeric data type, so in the above query charcol is converted to a numeric value.

Since the primary index is defined on a column of VARCHAR data type and the implicit conversion makes
the comparison numeric, a full table scan is performed, which hurts the performance of the query.
This can be avoided by specifying an explicit conversion. The rewritten query is:

SELECT * FROM <tablename>
WHERE charcol = CAST (<numexpr> AS VARCHAR(2));

In this case, since the column on which the primary index is defined remains of the same type, a
single-AMP retrieval takes place, which improves the performance of the query.

3. Indexes
3.1 Secondary Indexes

Secondary Indexes (SIs) supply alternate access paths, and the use of appropriate secondary indexes
can increase retrieval performance. For best results, secondary indexes should be based on frequently
used set selections and equality searches.
A table can have up to 32 secondary indexes, which can be created and dropped dynamically. However,
it is not a good idea to create a number of SIs on each table just to speed up set selection, because SIs
consume the following extra resources:

• SIs require additional storage to hold their subtables. In the case of a Fallback table, the SI
subtables are Fallback as well, so twice the additional storage space is required.
• SIs require additional I/O to maintain these subtables.

When deciding whether or not to define a NUSI, there are other considerations. The
Optimizer may choose to do a Full Table Scan rather than utilize the NUSI in two cases:

• When the NUSI is not selective enough.
• When no COLLECTed STATISTICS are available.

As a guideline, choose as NUSI candidates only those columns that are frequently used for access. After
the table has been loaded, create the NUSI indexes, COLLECT STATISTICS on the indexes, and then run
an EXPLAIN referencing each NUSI. If the Parser chooses a Full Table Scan over using the NUSI, drop the
index. The workflow is sketched below.
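A sketch of that workflow, assuming a last_name column on the employees table used in earlier
examples:

-- Create the NUSI after the table has been loaded.
CREATE INDEX (last_name) ON employees;

-- Collect statistics so the Optimizer can judge selectivity.
COLLECT STATISTICS ON employees INDEX (last_name);

-- Verify that the Parser actually uses the index.
EXPLAIN
SELECT employee_id, salary
FROM employees
WHERE last_name = 'Smith';

-- If the plan still shows a full table scan, drop the index:
-- DROP INDEX (last_name) ON employees;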
3.2 Join Index
Join indexes are defined in a way that allows join queries to be resolved without accessing or joining
their underlying base tables, and they also reduce the number of rows processed in generating result
sets. A join index is useful for queries where the index structure contains all the columns referenced by
one or more joins, thereby allowing the index to cover all or part of the query. For a join index, the join
result is stored persistently and maintained automatically.
3.2.1 Single table Join Index
A single table join index is very useful for resolving joins on large tables without having to redistribute
the joined rows across the AMPs. A single-table join index partitions all or a subset of a base table using
a primary index based on the table’s foreign key (preferably the primary index of the table to which it is
to be joined).
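A hedged sketch: assume an orders table with primary index order_id that is frequently joined to
customer on customer_id. A single-table join index that carries the needed order columns, hashed on
the foreign key, lets the join proceed without redistributing the base rows:

CREATE JOIN INDEX orders_by_customer AS
SELECT customer_id, order_id, order_total
FROM orders
PRIMARY INDEX (customer_id);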

3.2.2 Aggregate Join Index


Aggregate join indexes offer an extremely efficient, cost-effective method of resolving queries that
frequently specify the same aggregate operations on the same column or columns. When aggregate join
indexes are available, the system does not have to repeat aggregate calculations for every query.

An aggregate join index can be defined on two or more tables, or on a single table. A single-table
aggregate join index includes:

• A columnar subset of a base table
• Additional columns for the aggregate summaries of the base table columns

An aggregate join index can be created using:

• The SUM function
• The COUNT function
• The GROUP BY clause

The use of an aggregate join index entails:

• Initial time consumed to calculate and create the index.
• Time to recalculate the aggregate and update the index whenever a value in a join-index
column of the base table is updated.
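A minimal sketch using the assumed orders table, pre-aggregating order totals per customer so that
matching aggregate queries can be answered from the index instead of the base table:

CREATE JOIN INDEX cust_order_totals AS
SELECT customer_id,
       SUM(order_total) AS sum_total,
       COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
PRIMARY INDEX (customer_id);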
3.2.3 Sparse Index
Sparse indexes index a portion of a table, using WHERE clause predicates to limit the rows indexed.
Allowing constant expressions in the WHERE clause of the CREATE JOIN INDEX statement limits the rows
included in the join index to a subset of the rows in the table, based on an SQL query result. This
capability in effect allows the creation of sparse indexes. When base tables are large, this feature can be
used to reduce the content of the join index to only the frequently used portion of the table, if the
typical query references only a portion of the rows.

A sparse index can focus on the portions of the tables that are most frequently used. This capability:

• Reduces the storage requirements for a join index.
• Makes the cost of maintaining an index proportional to the percentage of rows actually
referenced in the index.
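For example (a sketch; the orders table and its order_status column are assumed), indexing only open
orders keeps the join index small when typical queries touch only open orders:

CREATE JOIN INDEX open_orders_ji AS
SELECT order_id, customer_id, order_total
FROM orders
WHERE order_status = 'OPEN'
PRIMARY INDEX (order_id);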
3.2.4 Global (Join) Index
A Global Index is a term used for a join index that contains the Row IDs of the base table rows. Queries
may use the join index to qualify a few rows, then refer to the base tables to obtain requested columns
that aren't stored in the join index. Such queries are said to be partially covered by the index, and the
index is referred to as a partially covering global index.

Because the RDBMS supports multi-table, partially-covering join indexes, all types of join indexes, except
the aggregate join index, can be joined to their base tables to retrieve columns that are referenced by a
query but are not stored in the join index.

A partial-covering join index takes less space than a covering join index. Not all columns that are
involved in a query selection condition have to be stored in a partial-covering join index. The benefits
are:

• Disk storage space for the join index decreases when fewer columns are stored in the join
index.
• Performance increases as the number of selection conditions that can be evaluated on the
join index increases.
3.2.5 Join Index performance

3.2.5.1 Collecting Statistics

To provide the Optimizer with the information needed to generate the best plans, you need to have
collected statistics on the primary index columns of each join index.
Consider collecting statistics to improve performance during:

• Creation of a join index
• Update maintenance of a join index

3.2.5.2 Cost considerations

Join indexes, like secondary indexes, incur both space and maintenance costs. For example, insert,
update, and delete operations must be performed twice, once for the base table and once for the join
index. However, if join indexes are suited to your applications, the improvements in query performance
can far outweigh the costs.

3.2.5.3 Load utilities

The MultiLoad and FastLoad utilities cannot be used to load data into base tables that have an
associated join index defined on them, because join indexes are not maintained during the execution of
these utilities; attempting to do so produces an error. The join index must be dropped before the load
and recreated after the table has been loaded. The TPump utility, which performs standard SQL row
inserts and updates, can be used, because join indexes are properly maintained during its execution.

3.3 Partitioned Primary Index


The Partitioned Primary Index (PPI) feature allows a class of queries to access a portion of a large table,
instead of the whole table. Rows are still hashed to the virtual AMPs as with a normal PI, but local
partitions are created within each virtual AMP. Normal PI access remains unchanged, but in the case of
a range query, for example, each virtual AMP can immediately focus its search on specific partitions
within its workspace.
PPIs are defined on a table to increase query efficiency by avoiding full table scans, without the
overhead and maintenance costs of secondary indexes.

A Partitioned Primary Index (PPI) allows the data rows of a table to be:

• Hash partitioned to the AMPs by the hash of the primary index columns
• Partitioned on some set of columns on each AMP
• Ordered by the hash of the primary index columns within that partition

Two functions, RANGE_N and CASE_N, can be used to simplify the specification of a partitioning
expression.
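A sketch of a PPI definition (the table and its columns are assumed for illustration), partitioning a year
of sales by month with RANGE_N:

CREATE TABLE sales_history (
    sale_id   INTEGER NOT NULL,
    sale_date DATE NOT NULL,
    store_id  INTEGER,
    amount    DECIMAL(12,2)
)
PRIMARY INDEX (sale_id)
PARTITION BY RANGE_N (
    sale_date BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
    EACH INTERVAL '1' MONTH
);

-- A range query like this touches only one partition per AMP:
SELECT SUM(amount)
FROM sales_history
WHERE sale_date BETWEEN DATE '2023-03-01' AND DATE '2023-03-31';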
PPI improves performance as follows:

• Automatic optimization occurs for queries that specify a restrictive condition on the partitioning
column.
• Partition elimination improves the efficiency of range searches when, for example, the table is
partitioned on the range being searched.
• Only the rows of the partitions qualified by a query need to be accessed, avoiding full table
scans.
• PPI provides an access path to the rows in the base table while still allowing efficient join
strategies.
• If the same partition is consistently targeted, the part of the table being updated may fit largely
in cache, significantly boosting performance.

Benefits that are the result of using PPI vary based on:

• The number of partitions defined
• The number of partitions that can be eliminated given the query workloads
• Whether or not the user follows an update strategy that takes advantage of partitioning

However, the main disadvantages of using a PPI are:

• The PI access disadvantage occurs only when the partitioning column is not part of the PI. In this
situation, a query specifying a PI value, but no value for the partitioning column, must look in
each partition for that value, instead of positioning directly to the first row for the PI value.
• The direct join disadvantage occurs when another table with the same PI is joined with an
equality condition on every PI column. For two non-PPI tables, the rows of the two tables will be
ordered the same, and the join can be performed directly. If one of the tables is partitioned, the
rows won't be ordered the same, and the task, in effect, becomes a set of sub-joins, one for
each partition of the PPI table.

3.4 Index Usage

The EXPLAIN facility expresses the ‘confidence’ level for retrieval of data from a table. Some of the
phrases used are:

…with high confidence

• Restricting conditions exist on index(es) or column(s) that have collected statistics.

…with low confidence

• Restricting conditions exist on index(es) having no statistics, but estimates can be based upon a
sampling of the index(es).
• Restricting conditions exist on index(es) or column(s) that have collected statistics but are “AND-
ed” together with conditions on non-indexed columns.
• Restricting conditions exist on index(es) or column(s) that have collected statistics but are “OR-
ed” together with other conditions.

…with no confidence

• Conditions outside the above.

The execution strategy provided by the EXPLAIN facility gives direct feedback on the steps the
Optimizer chooses. Studying EXPLAIN output is an excellent way to learn the inner workings of the
Teradata DBS and how it handles SQL. Using the EXPLAIN facility regularly saves a lot of time and
computing resources by pointing out problems in SQL statements before they are actually run.

A) Explain the EXPLAIN: Check the EXPLAIN plan to see exactly how Teradata will execute the query.
Learn to read the basic elements of the plan, such as the confidence levels, the join strategy used, and
whether row redistribution is happening.

B) Collect STATS: The statistics on the columns used in join conditions should be kept up to date.
Secondary indexes without proper STATS can be of little or no help, so check the STATS status of the
tables.
C) Use a proper PI: Verify that the primary index is properly defined on each of the tables in the query,
and check that the PI of the loaded target table is unique.

D) Use the PPI: If a partitioned primary index is created on a table, try to use it. If the partitioning
column is not used in the filter conditions, the benefit is lost and performance can degrade.

E) No FUNCTIONS in conditions: Try to avoid using functions in join conditions. For example, applying
COALESCE or TRIM to a join column causes high CPU consumption.


F) Same column DATA TYPES: Define the same data type for the joining columns.

G) Avoid long IN lists in filter conditions: When a WHERE condition can contain a huge number of
values, a better option is to insert those values into a volatile table and use the volatile table with an
INNER JOIN in the main query, as sketched below.
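A sketch of this pattern, reusing the customer table from section 2.2.2 (the inserted values are
illustrative):

-- NO LOG skips transient journaling the table does not need.
CREATE VOLATILE TABLE wanted_customers, NO LOG (
    customer_number INTEGER NOT NULL
)
UNIQUE PRIMARY INDEX (customer_number)
ON COMMIT PRESERVE ROWS;

INSERT INTO wanted_customers VALUES (1000);
INSERT INTO wanted_customers VALUES (1001);
-- ...one INSERT per value in the original IN list.

SELECT c.customer_number, c.customer_name
FROM customer c
INNER JOIN wanted_customers w
    ON c.customer_number = w.customer_number;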

H) Use the same PI in source and target: PI columns also determine how rows are stored on disk. If the
source and target have the same PI, data can be copied from source to target very efficiently.

I) Collect STATS on VOLATILE tables: Collecting stats on volatile tables where required can save AMP
CPU; remove stats that are present but never used. If the volatile table has a UNIQUE PI, prefer sample
stats to full stats. For example:
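On the volatile table sketched in tip G:

-- Full statistics on the join column...
COLLECT STATISTICS ON wanted_customers COLUMN (customer_number);

-- ...or, since the PI is unique here, cheaper sampled statistics:
COLLECT STATISTICS USING SAMPLE ON wanted_customers COLUMN (customer_number);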

J) DROP volatile tables explicitly: Once a volatile table is no longer required, drop it; don't wait for the
whole procedure to finish. This frees spool space immediately and can be very helpful in avoiding the
No More Spool Space error.

K) NO LOG for volatile tables: Create volatile tables with the NO LOG option, which skips transient
journaling they do not need.

L) Check DBQL stats: Keep your performance statistics accessible, and target the most AMP-CPU-
consuming queries first.

M) UPDATE clause: Do not write an UPDATE with just a SET clause and no WHERE clause. Even if the
target or source has just one row, add a WHERE clause on the PI column, for example:
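A sketch, reusing the customer table from section 2.2.2:

UPDATE customer
SET customer_name = 'Acme Corp'
WHERE customer_number = 1001;   -- always qualify on the PI column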

N) DELETE & INSERT: Sometimes replacing an UPDATE with a DELETE and INSERT can save a good
amount of AMP CPU. Check whether this holds for your query; one way to structure it is sketched below.
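One hedged way to structure it, assuming an orders table with an order_status column: capture the
changed rows first, then delete the originals and re-insert the modified copies.

-- 1. Capture the rows with the new value already applied.
CREATE VOLATILE TABLE closing_orders AS (
    SELECT order_id, customer_id, order_date,
           'CLOSED' AS order_status, order_total
    FROM orders
    WHERE order_status = 'OPEN'
      AND order_date < CURRENT_DATE - 90
) WITH DATA
PRIMARY INDEX (order_id)
ON COMMIT PRESERVE ROWS;

-- 2. Delete the originals.
DELETE FROM orders
WHERE order_status = 'OPEN'
  AND order_date < CURRENT_DATE - 90;

-- 3. Re-insert the modified copies.
INSERT INTO orders SELECT * FROM closing_orders;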

O) Query SPLITS: Logically split large queries into several smaller queries, using volatile tables with a
proper PI.

P) Try MSR: If the same target table is loaded multiple times, try a multi-statement request (MSR) for
the several sections. This speeds up the final MERGE step into the target table, and you may see a good
CPU gain.

Q) Try OLAP functions: Check whether replacing a correlated subquery with an OLAP function yields an
AMP CPU saving, as in the sketch below.
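A sketch using the assumed orders table: the correlated subquery that finds each customer's latest
order is replaced by ROW_NUMBER with Teradata's QUALIFY clause, so the table is scanned once
instead of being probed per outer row (note that ROW_NUMBER breaks date ties arbitrarily):

-- Correlated-subquery form: the inner query runs per outer row.
SELECT o.customer_id, o.order_id, o.order_total
FROM orders o
WHERE o.order_date = (SELECT MAX(i.order_date)
                      FROM orders i
                      WHERE i.customer_id = o.customer_id);

-- OLAP form: one scan, ranked within each customer.
SELECT customer_id, order_id, order_total
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
                           ORDER BY order_date DESC) = 1;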
R) Avoid DUPLICATE data: If the join columns in the tables involved in the query contain duplicates, use
DISTINCT or GROUP BY to load the deduplicated rows into a volatile table, collect stats, and use the
volatile table in the join.

S) Use proper JOINS: Where joins are used, don't use RIGHT, LEFT, or FULL OUTER joins where an INNER
join is sufficient.

T) Use proper ALIASES: Check the aliases in the joins; a small mistake can lead to a product join.

U) Avoid CAST: Avoid unnecessary casting of DATE columns. Once defined as DATE, columns can be
compared against each other even when they are displayed in different formats, because DATE is stored
internally as an INTEGER. CAST is required mainly when a VARCHAR value must be compared as a DATE.

V) Avoid UDFs: Most data-manipulation functions are available natively in Teradata, so avoid User
Defined Functions.

W) Avoid FULL TABLE SCANs: Avoid FTS scenarios by defining an SI on the columns that are used in
joins or as alternate access paths, and collect stats on the SI columns; otherwise the Optimizer may
choose an FTS even when an SI is defined on the column.

X) Avoid IN/NOT IN for large lists: For a large list of values, avoid IN/NOT IN in SQL; write the values to a
temporary table and use that table in the query (see tip G).

Y) Use CONSTANTS: Use constants to specify index column contents whenever possible, instead of
specifying the constant once and joining the tables. This may provide a small performance saving.
