Performance Tuning Overview
If you tune all the bottlenecks, you can further optimize session performance by
increasing the number of pipeline partitions in the session. Adding partitions can
improve performance by utilizing more of the system hardware while processing
the session. Complete the following tasks to improve session performance:
1. Optimize the target. Enables the Integration Service to write to the targets
efficiently.
2. Optimize the source. Enables the Integration Service to read source data
efficiently.
3. Optimize the mapping. Enables the Integration Service to transform and
move data efficiently.
4. Optimize the transformation. Enables the Integration Service to process
transformations in a mapping efficiently.
5. Optimize the session. Enables the Integration Service to run the session
more quickly.
6. Optimize the grid deployments. Enables the Integration Service to run on a
grid with optimal performance.
7. Optimize the Power Center components. Enables the Integration Service
and Repository Service to function optimally.
8. Optimize the system. Enables Power Center service processes to run more
quickly.
Identification of Bottlenecks
The performance of Informatica depends on the performance of several components:
the database, the network, transformations, mappings, sessions, and so on. To tune
the performance of Informatica, we first have to identify the bottleneck.
To test for a source bottleneck, add a Filter transformation with a FALSE condition
immediately after the Source Qualifier and run the session. All records are filtered
off, so the session time now reflects mainly the time taken to read the source. So if
the source were fine, the session in the latter case should take less time. If the
session still takes nearly as much time as the former case, there is a source
bottleneck.
To test for a transformation bottleneck, we can put a filter with a FALSE condition
just after the transformation and run the session. If the session run takes nearly
equal time with and without this test filter, then the transformation is the bottleneck.
To tune the source:
1. If the source is a flat file, ensure that the flat file is local to the Informatica
server. If the source is a relational table, try not to use synonyms or aliases.
2. If the source is a flat file, reduce the number of bytes the Informatica server
reads per line (by default 1024 bytes per line) by decreasing the Line Sequential
Buffer Length setting in the session properties.
3. If possible, give a conditional query in the Source Qualifier so that records are
filtered off as early as possible in the process.
4. If the Source Qualifier query has an ORDER BY or GROUP BY, create an index
on the source table and order by the indexed field of the source table (a sketch
follows this list).
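A minimal sketch of items 3 and 4, assuming a hypothetical ORDERS source table
with ORDER_DATE, CUSTOMER_ID, ORDER_ID, and ORDER_AMOUNT columns (the table and
column names are illustrative only):

-- Source Qualifier SQL override: filter records as early as possible
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
FROM   ORDERS
WHERE  ORDER_DATE >= DATE '2024-01-01'
ORDER  BY CUSTOMER_ID;

-- Supporting index so the ORDER BY does not force a full sort of the source table
CREATE INDEX IDX_ORDERS_CUST ON ORDERS (CUSTOMER_ID);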
To tune the target and the session:
1. Partition the session: this creates multiple connections to the source and
target, and loads data in parallel pipelines. Each pipeline is independent of the
others. But the performance of the session will not improve if the number of
records is small, and it will not improve if the session mainly performs updates
and deletes. So session partitioning should be used only when the volume of data
is huge and the job mainly inserts data.
2. Run the sessions in parallel rather than serial to gain time, if they are
independent of each other.
3. Drop constraints and indexes before running the session and rebuild them after
the session run completes. Dropping can be done in a pre-session script and
rebuilding in a post-session script (a sketch follows this list). But if the data
volume is very large, dropping and then rebuilding the indexes may not be feasible.
In such cases, stage all the data, pre-create the index, use a transportable
tablespace, and then load it into the database.
4. Use bulk loading, external loading etc. Bulk loading can be used only if the
table does not have an index.
5. In a session, the Treat Source Rows As property can be set to Data Driven,
Insert, Update, or Delete. If update strategies are used, it has to be kept as Data
Driven. But when the session only inserts rows into the target table, set it to
Insert to improve performance.
6. Increase the commit interval (the point at which the Informatica server commits
data to the target table; for example, the commit interval can be set to every
50,000 records).
7. Avoid built-in functions as much as possible to improve performance. For
example, for concatenation the operator '||' is faster than the function CONCAT(),
so use operators instead of functions where possible. Functions like IS_SPACES(),
IS_NUMBER(), IIF(), and DECODE() reduce performance to a large extent, in this
order; preference should be given in the opposite order.
8. String functions like SUBSTR, LTRIM, and RTRIM reduce performance. If the
sources are flat files, use delimited files, or use the varchar data type, so that
trimming is not required.
9. Manipulating high precision data types slows down the Informatica server, so
disable the 'high precision' option where it is not needed.
10. Localize all source and target tables, stored procedures, views, sequences,
and so on. Try not to connect across synonyms; synonyms and aliases slow down
performance.
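A minimal sketch of the pre- and post-session SQL mentioned in item 3, assuming a
hypothetical target table T_SALES with an index IDX_SALES_CUST (Oracle-style
syntax):

-- Pre-session SQL: drop the index so the bulk load does not maintain it row by row
DROP INDEX IDX_SALES_CUST;

-- Post-session SQL: rebuild the index once the load has completed
CREATE INDEX IDX_SALES_CUST ON T_SALES (CUSTOMER_ID);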
DATABASE OPTIMISATION
To gain the best Informatica performance, the database tables, stored procedures
and queries used in Informatica should be tuned well.
1. If the source and target are flat files, they should reside on the system on
which the Informatica server runs.
2. Increase the network packet size.
3. The performance of the Informatica server is related to network connections.
Data generally moves across a network at less than 1 MB per second, whereas a
local disk moves data five to twenty times faster. Thus network connections often
affect session performance, so minimize the number of network connections.
4. Optimize target databases.
1. To improve performance, cache the lookup tables. Informatica can cache all the
lookup and reference tables; this makes operations run very fast. (Meaning of
cache is given in point 2 of this section and the procedure for determining the
optimum cache size is given at the end of this document.)
2. Even after caching, performance can be further improved by minimizing the size
of the lookup cache. Reduce the number of cached rows by using an SQL override
with a restriction.
Cache: Cache stores data in memory so that Informatica does not have to read the
table each time it is referenced. This reduces the time taken by the process to a
large extent. The cache is automatically generated by Informatica based on the
marked lookup ports or by a user-defined SQL query.
For example, suppose a lookup on an employee table (call it EMPLOYEE_TABLE) is
matched against the employee_id input that comes from the source table,
SUPPORT_TABLE. With the default lookup query, every employee_id is cached: if
there are 50,000 employee_id values, the size of the lookup cache will be 50,000
rows. If the lookup SQL override is restricted with a condition such as
'where e.employee_id = s.eno', and there are only 1000 eno values, the size of the
lookup cache will be only 1000 rows (a sketch of such an override follows). But
here the performance gain will happen only if the number of records in
SUPPORT_TABLE is not huge. Our concern is to make the size of the cache as small
as possible.
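A minimal sketch of such a restricted lookup SQL override, reusing the table and
column names from the example above; the employee_name column is illustrative
only:

-- Default lookup query: caches every row of EMPLOYEE_TABLE
SELECT employee_id, employee_name
FROM   EMPLOYEE_TABLE;

-- Restricted override: caches only the employees that actually occur in the source
SELECT e.employee_id, e.employee_name
FROM   EMPLOYEE_TABLE e, SUPPORT_TABLE s
WHERE  e.employee_id = s.eno;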
3. In lookup tables, delete all unused columns and keep only the fields that are used
in the mapping.
6. In the SQL override query of the lookup table there will be an ORDER BY clause.
Remove it if it is not needed, or put fewer column names in the ORDER BY list.
9. If the lookup data is static, use a persistent cache. Persistent caches help to
save and reuse cache files. If several sessions in the same job use the same
lookup table, using a persistent cache helps the sessions reuse the cache files.
For static lookups, the cache is then built from the saved cache files instead of
from the database, which improves performance.
10. If source is huge and lookup table is also huge, then also use persistent cache.
11. If target table is the lookup table, then use dynamic cache. The Informatica
server updates the lookup cache as it passes rows to the target.
12. Use only the lookups you want in the mapping. Too many lookups inside a
mapping will slow down the session.
13. If the lookup table has a lot of data, it will take too long to cache or will
not fit in memory. In that case, bring the required fields into the Source
Qualifier and join with the main table there instead.
14. If there are several lookups with the same data set, then share the caches.
15. If we are going to return only 1 row, then use unconnected lookup.
16. All data are read into the cache in the order the fields are listed in the
lookup ports. If there is an index that is even partially in this order, the
loading of the lookup cache can be sped up.
17. If the lookup table has an index (or if we have the privilege to add an index
to the table in the database, do so), performance improves for both cached and
uncached lookups (a sketch follows this list).
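A minimal sketch for items 16 and 17, reusing the EMPLOYEE_TABLE lookup above and
assuming the lookup condition is on employee_id:

-- An index matching the lookup condition (and the cache ORDER BY) speeds up
-- both cached and uncached lookups
CREATE INDEX IDX_EMPLOYEE_LKP ON EMPLOYEE_TABLE (employee_id);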
The amount of transformation logic you can push to the database depends on the
database, transformation logic, and mapping and session configuration. The
Integration Service processes all transformation logic that it cannot push to a
database.
Use the Pushdown Optimization Viewer to preview the SQL statements and
mapping logic that the Integration Service can push to the source or target
database. You can also use the Pushdown Optimization Viewer to view the
messages related to pushdown optimization.
The following figure shows a mapping containing transformation logic that can be
pushed to the source database:
The Integration Service generates and executes a SELECT statement based on the
transformation logic for each transformation it can push to the database. Then, it
reads the results of this SQL query and processes the remaining transformations.
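As an illustration only, for a hypothetical mapping whose Filter and Aggregator
logic can be pushed to the source, the generated statement might resemble the
following (the table and column names are assumptions, not output from the
Pushdown Optimization Viewer):

-- Filter (ORDER_STATUS = 'SHIPPED') and Aggregator (sum by customer) pushed to the source
SELECT CUSTOMER_ID, SUM(ORDER_AMOUNT) AS TOTAL_AMOUNT
FROM   ORDERS
WHERE  ORDER_STATUS = 'SHIPPED'
GROUP  BY CUSTOMER_ID;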
When you run a session with large quantities of data and full pushdown
optimization, the database server must run a long transaction. Consider the
following database performance issues when you generate a long transaction: a long
transaction uses more database resources; it locks database objects for longer
periods of time, which reduces concurrency and increases the likelihood of
deadlock; and it increases the likelihood of an unexpected event.
TESTING
Unit Testing
Unit testing can be broadly classified into 2 categories.
Quantitative Testing
Validate your Source and Target
b) If you are using a flat file, make sure you have enough read/write permissions
on the file share.
Analyze the load time:
b) Check the Read and Write counters and how long it takes to perform the load.
c) Use the session and workflow logs to capture the load statistics.
Analyze the success rows and rejections:
a) Have customized SQL queries to check the source and targets; here we perform
the record count verification (a sketch follows this list).
b) Analyze the rejections and build a process to handle them. This requires a
clear requirement from the business on how to handle data rejections: do we need
to reload, or reject and inform? Discussions are required and an appropriate
process must be developed.
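A minimal sketch of such a record count verification, assuming hypothetical
SRC_CUSTOMER and TGT_CUSTOMER tables reachable from the same connection:

-- Record count verification: the two counts should match (allowing for rejected rows)
SELECT (SELECT COUNT(*) FROM SRC_CUSTOMER) AS source_count,
       (SELECT COUNT(*) FROM TGT_CUSTOMER) AS target_count
FROM   DUAL;  -- DUAL assumes an Oracle database; omit the FROM clause elsewhere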
Performance Improvement
a) Network Performance
b) Session Performance
c) Database Performance
Qualitative Testing
Analyze and validate your transformation business rules; this is more of a
functional testing exercise.
e) You need to review field by field from source to target and ensure that the
required transformation logic is applied.
f) If you are making changes to existing mappings, make use of the data lineage
feature available with Informatica Power Center. This will help you to find the
consequences of altering or deleting a port from an existing mapping.
g) Ensure that appropriate dimension lookups have been used and your development
is in sync with your business requirements.
Integration Testing
After unit testing is complete, it should form the basis for starting integration
testing. Integration testing covers end-to-end testing for the DWH. The coverage
of the tests would include the below:
Count Validation
Control totals: To ensure accuracy in data entry and processing, control totals
can be compared by the system with manually entered or otherwise calculated
control totals, using data fields such as quantities, line items, documents, or
dollars, or simple record counts.
Hash totals: This is a technique for improving data accuracy, whereby totals are
obtained on identifier fields (i.e., fields for which it would logically be meaningless
to construct a total), such as account number, social security number, part number,
or employee number. These totals have no significance other than for internal
system control purposes.
Limit checks: The program tests specified data fields against defined high or low
value limits (e.g., quantities or dollars) for acceptability before further processing.
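A minimal sketch of these count, control total, hash total, and limit checks in
SQL, assuming hypothetical ORDERS (source) and TGT_ORDERS (target) tables with
ORDER_AMOUNT, ORDER_QTY, and ACCOUNT_NUMBER columns:

-- Control totals and record counts: run against source and target and compare
SELECT COUNT(*) AS record_count, SUM(ORDER_AMOUNT) AS amount_total FROM ORDERS;
SELECT COUNT(*) AS record_count, SUM(ORDER_AMOUNT) AS amount_total FROM TGT_ORDERS;

-- Hash total: a sum over an identifier field, meaningless in itself, but it must match
SELECT SUM(ACCOUNT_NUMBER) AS hash_total FROM ORDERS;
SELECT SUM(ACCOUNT_NUMBER) AS hash_total FROM TGT_ORDERS;

-- Limit check: flag quantities outside the defined high/low limits
SELECT * FROM TGT_ORDERS WHERE ORDER_QTY NOT BETWEEN 1 AND 10000;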
Dimensional Analysis
Statistical Analysis
When you validate the calculations, you do not need to load all the rows into the
target and validate them. Instead, use the Enable Test Load feature available in
Informatica Power Center.
Property: Enable Test Load
Description: You can configure the Integration Service to perform a test load.
Check for missing data, negatives and consistency. Field-by-Field data verification
can be done to check the consistency of source and target data.
Overflow checks: This is a limit check based on the capacity of a data field or data
file area to accept data. This programming technique can be used to detect the
truncation of a financial or quantity data field value after computation (e.g.,
addition, multiplication, and division). Usually, the first digit is the one lost.
Format checks: These are used to determine that data are entered in the proper
mode, as numeric or alphabetical characters, within designated fields of
information. The proper mode in each case depends on the data field definition.
Sign test: This is a test for a numeric data field containing a designation of an
algebraic sign, + or - , which can be used to denote, for example, debits or credits
for financial data fields.
Size test: This test can be used to check the full size of a data field. For
example, a social security number in the United States should have nine digits.
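A minimal sketch of a combined format and size check, assuming a hypothetical
TGT_EMPLOYEE table with an SSN column (REGEXP_LIKE assumes an Oracle-style
database):

-- Rows whose SSN is not exactly nine numeric digits fail the format/size check
SELECT *
FROM   TGT_EMPLOYEE
WHERE  NOT REGEXP_LIKE(SSN, '^[0-9]{9}$');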
Granularity
Other validations
Note: Based on your project and business needs you might have additional
testing requirements.
Any changes to the business requirement will follow the change management
process and eventually those changes have to follow the SDLC process.
Syntax Testing: Test your customized queries using your source qualifier before
executing the session. Performance Testing for identifying the following
bottlenecks:
Target
Source
Mapping
Session
System
Run test sessions. You can configure a test session to read from a flat file
source or to write to a flat file target to identify source and target
bottlenecks.
Analyze performance details. Analyze performance details, such as
performance counters, to determine where session performance decreases.
Analyze thread statistics. Analyze thread statistics to determine the optimal
number of partition points.
Monitor system performance. You can use system monitoring tools to
view the percentage of CPU use, I/O waits, and paging to identify system
bottlenecks. You can also use the Workflow Monitor to view system
resource usage.
Use the Power Center conditional filter in the Source Qualifier to improve
performance.
Share metadata. You can share metadata with a third party. For example, you
want to send a mapping to someone else for testing or analysis, but you do not
want to disclose repository connection information for security reasons. You
can export the mapping to an XML file and edit the repository connection
information before sending the XML file. The third party can import the
mapping from the XML file and analyze the metadata.
Debugger
You can debug a valid mapping to gain troubleshooting information about data and
error conditions. To debug a mapping, you configure and run the Debugger from
within the Mapping Designer. The Debugger uses a session to run the mapping on
the Integration Service. When you run the Debugger, it pauses at breakpoints and
you can view and edit transformation output data.
Before you run a session. After you save a mapping, you can run some
initial tests with a debug session before you create and configure a session in
the Workflow Manager.
After you run a session. If a session fails or if you receive unexpected
results in the target, you can run the Debugger against the session. You
might also want to run the Debugger against a session if you want to debug
the mapping using the configured session properties.
You can select three different Debugger session types when you configure the
Debugger. The Debugger runs a workflow for each session type. You can choose from
the following Debugger session types when you configure the Debugger: use an
existing non-reusable session, use an existing reusable session, or create a debug
session instance.
Debug Process
1. Create breakpoints. Create breakpoints in a mapping where you want the
Integration Service to evaluate data and error conditions.
2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger
for the mapping. Select the session type the Integration Service uses when it runs
the Debugger. When you create a debug session, you configure a subset of session
properties within the Debugger Wizard, such as source and target location. You can
also choose to load or discard target data.
3. Run the Debugger. Run the Debugger from within the Mapping Designer.
When you run the Debugger, the Designer connects to the Integration Service. The
Integration Service initializes the Debugger and runs the debugging session and
workflow. The Integration Service reads the breakpoints and pauses the Debugger
when a breakpoint evaluates to true.
4. Monitor the Debugger. While you run the Debugger, you can monitor the target
data, transformation and mapplet output data, the debug log, and the session log.
When you run the Debugger, the Designer displays the debug log, the target window,
and the instance window.
When you complete the Debugger Wizard, the Integration Service starts the
session and initializes the Debugger. After initialization, the Debugger moves in
and out of running and paused states based on breakpoints and commands that you
issue from the Mapping Designer. The Debugger can be in one of the following
states: initializing (the Designer connects to the Integration Service), running
(the Integration Service processes the data), or paused (the Integration Service
encounters a break and pauses the Debugger).
Note: To enable multiple users to debug the same mapping at the same time, each
user must configure different port numbers in the Tools > Options > Debug tab.
When you run the Debugger, you can monitor the session status, the movement of
data through the transformations, data that meets breakpoint conditions, and the
target data. The Mapping Designer displays windows and debug indicators that help
you monitor the session.
While you monitor the Debugger, you might want to change the transformation
output data to see the effect on subsequent transformations or targets in the data
flow. You might also want to edit or add more breakpoint information to monitor
the session more closely.
Restrictions
Active source. Related target tables must have the same active source.
Key relationships. Target tables must have key relationships.
Target connection groups. Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You
cannot use updates with constraint-based loading.
Active Source:
When target tables receive rows from different active sources, the Integration
Service reverts to normal loading for those tables, but loads all other targets in the
session using constraint-based loading when possible. For example, a mapping
contains three distinct pipelines. The first two contain a source, source qualifier,
and target. Since these two targets receive data from different active sources, the
Integration Service reverts to normal loading for both targets. The third pipeline
contains a source, Normalizer, and two targets. Since these two targets share a
single active source (the Normalizer), the Integration Service performs constraint-
based loading: loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does not
perform constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration
Service reverts to a normal load. For example, you have one target containing a
primary key and a foreign key related to the primary key in a second target. The
second target also contains a foreign key that references the primary key in the first
target. The Integration Service cannot enforce constraint-based loading for these
tables. It reverts to a normal load.
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same
target connection group. If you want to specify constraint-based loading for
multiple targets that receive data from the same active source, you must verify the
tables are in the same target connection group. If the tables with the primary key-
foreign key relationship are in different target connection groups, the Integration
Service cannot enforce constraint-based loading when you run the workflow. To
verify that all targets are in the same target connection group, complete the
following tasks:
Verify all targets are in the same target load order group and receive data
from the same active source.
Use the default partition properties and do not add partitions or partition
points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session
properties.
Choose normal mode for the target load type for all targets in the session
properties.
Use constraint-based loading when the session option Treat Source Rows As is set
to insert. You might get inconsistent data if you select a different Treat Source
Rows As option and you configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load
data to a primary key table first, split the mapping using one of the following
options:
Load primary key table in one mapping and dependent tables in another
mapping. Use constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping.
Target load ordering defines the order the Integration Service reads the sources in
each target load order group in the mapping. A target load order group is a
collection of source qualifiers, transformations, and targets linked together in a
mapping. Constraint-based loading establishes the order in which the Integration
Service loads individual targets within a set of targets receiving data from a single
source qualifier.
Example
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain
foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4
references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the
Integration Service loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies
and contains a primary key referenced by T_2 and T_3. The Integration Service
then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not
loaded in any particular order. The Integration Service loads T_4 last, because it
has a foreign key that references a primary key in T_3. After loading the first set of
targets, the Integration Service begins reading source B. If there are no key
relationships between T_5 and T_6, the Integration Service reverts to a normal load
for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6
receive data from a single active source, the Aggregator AGGTRANS, the
Integration Service loads rows to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same
database connection for each target, and you use the default partition properties.
T_5 and T_6 are in another target connection group together if you use the same
database connection for each target and you use the default partition properties.
The Integration Service includes T_5 and T_6 in a different target connection
group because they are in a different target load order group from the first four
targets.
When you enable constraint-based loading, the Integration Service orders the target
load on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the
Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint
Based Load Ordering.
3. Click OK.
Target Load Order
When you use a mapplet in a mapping, the Mapping Designer lets you set the
target load plan for sources within the mapplet.
You can configure the target load order for a mapping containing any type of target
definition. In the Designer, you can set the order in which the Integration Service
sends rows to targets in different target load order groups in a mapping. A target
load order group is the collection of source qualifiers, transformations, and targets
linked together in a mapping. You can set the target load order if you want to
maintain referential integrity when inserting, deleting, or updating tables that have
the primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and
it processes target load order groups sequentially.
To specify the order in which the Integration Service sends data to targets, create
one source qualifier for each target within a mapping. To set the target load order,
you then determine in which order the Integration Service reads each source in the
mapping.
The following figure shows two target load order groups in one mapping:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and
T_ITEMS. The second target load order group includes all other objects in the
mapping, including the TOTAL_ORDERS target. The Integration Service
processes the first target load order group, and then the second target load order
group.
When it processes the second target load order group, it reads data from both
sources at the same time.