PERFORMANCE TUNING IN INFORMATICA

Performance Tuning Overview


The goal of performance tuning is to optimize session performance by eliminating
performance bottlenecks. To tune session performance, first identify a performance
bottleneck, eliminate it, and then identify the next performance bottleneck until
you are satisfied with the session performance. You can use the test load option to
run sessions when you tune session performance.

If you tune all the bottlenecks, you can further optimize session performance by
increasing the number of pipeline partitions in the session. Adding partitions can
improve performance by utilizing more of the system hardware while processing
the session.

Because determining the best way to improve performance can be complex, change
one variable at a time, and time the session both before and after the change. If
session performance does not improve, you might want to return to the original
configuration.

Complete the following tasks to improve session performance:

1. Optimize the target. Enables the Integration Service to write to the targets
efficiently.
2. Optimize the source. Enables the Integration Service to read source data
efficiently.
3. Optimize the mapping. Enables the Integration Service to transform and
move data efficiently.
4. Optimize the transformation. Enables the Integration Service to process
transformations in a mapping efficiently.
5. Optimize the session. Enables the Integration Service to run the session
more quickly.
6. Optimize the grid deployments. Enables the Integration Service to run on a
grid with optimal performance.
7. Optimize the Power Center components. Enables the Integration Service
and Repository Service to function optimally.
8. Optimize the system. Enables Power Center service processes to run more
quickly.
Identification of Bottlenecks
The performance of Informatica depends on the performance of its several
components, such as the database, network, transformations, mappings, and
sessions. To tune Informatica performance, we have to identify the bottleneck first.

A bottleneck may be present in the source, target, transformations, mapping,
session, database, or network. It is best to check the components for performance
issues in the order source, target, transformations, mapping, and session. After
identifying the bottleneck, apply the tuning mechanisms that are applicable to the
project.

Identify bottleneck in Source


If the source is a relational table, put a Filter transformation in the mapping just
after the source qualifier and set the filter condition to FALSE, so that all records
are filtered out and none proceed to the rest of the mapping. In the original case,
without the test filter, the total time taken is as follows:

Total Time = time taken by (source + transformations + target load)

With the test filter in place, Total Time = time taken by the source

If the source is fine, the session with the test filter should take noticeably less time.
If it still takes nearly as long as the original session, there is a source bottleneck.
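As a complementary check (not part of the filter test above), you can time the source qualifier query directly in the database and compare its elapsed time with the session read time. The sketch below assumes an Oracle source queried from SQL*Plus; the table and columns are hypothetical, so substitute the actual query the source qualifier generates, which you can copy from the session log.

-- Time the source qualifier query on its own (SQL*Plus shown; any client with timing works)
SET TIMING ON
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
FROM ORDERS
WHERE ORDER_DATE >= DATE '2024-01-01';

If the query itself runs slowly in the database, the bottleneck is on the source side (indexes, execution plan, or network transfer) rather than in the mapping.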

Identify bottleneck in Target


If the target is a relational table, substitute it with a flat file and run the session. If
the time taken now is much less than the time taken to load the table, then the
target table is the bottleneck.

Identify bottleneck in Transformation


Remove the transformation from the mapping and run it, noting the time taken.
Then put the transformation back and run the mapping again. If the time taken now
is significantly more than before, the transformation is the bottleneck. However,
removing a transformation for testing can be painful for the developer, since it
might require further changes for the session to get back into a working state.

So instead we can put a Filter transformation with a FALSE condition just after the
transformation in question and run the session. If the session takes roughly the
same time with and without this test filter, then the transformation is the bottleneck.

Identify bottleneck in Sessions


We can use the session log to identify whether the source, target, or transformations
are the performance bottleneck. Session logs contain thread summary records like
the following:

MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of
partition point [SQ_test_all_text_data] has completed: Total Run Time
=[11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage
=[18.304876].

MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the
transformation stage of partition point [SQ_test_all_text_data] has completed:
Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy
Percentage = [100.000000].

If a thread's busy percentage is close to 100, that stage is the bottleneck (in the
example above, the transformation stage).

Basically, we have to rely on thread statistics to identify the cause of performance
issues. Once the ‘Collect Performance Data’ option (on the session ‘Properties’ tab)
is enabled, all the performance-related information appears in the log created by
the session.

Optimizing the Bottlenecks

1. If the source is a flat file, ensure that the flat file is local to the Informatica
server. If the source is a relational table, then try not to use synonyms or aliases.
2. If the source is a flat file, reduce the number of bytes Informatica reads per
line (by default, 1024 bytes per line). We can do this by decreasing the Line
Sequential Buffer Length setting in the session properties.
3. If possible, give a conditional query in the source qualifier so that records
are filtered out as early as possible in the process.
4. In the source qualifier, if the query has ORDER BY or GROUP BY, then
create an index on the source table and order by the indexed field of the source
table (see the sketch below).
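A minimal sketch of item 4, assuming a hypothetical source table ORDERS whose source qualifier query groups or orders by CUSTOMER_ID; the table, column, and index names are illustrative only:

-- Create an index that supports the ORDER BY / GROUP BY used in the source qualifier query
CREATE INDEX IDX_ORDERS_CUSTOMER_ID ON ORDERS (CUSTOMER_ID);

-- Source qualifier SQL override that orders by the indexed column
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT
FROM ORDERS
ORDER BY CUSTOMER_ID;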

PERFORMANCE TUNING OF TARGETS


If the target is a flat file, ensure that the flat file is local to the Informatica server. If
the target is a relational table, then try not to use synonyms or aliases.

1. Use bulk load whenever possible.
2. Increase the commit level.
3. Drop the constraints and indexes of the table before loading and rebuild
them after the load completes (a sketch follows this list).
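A minimal sketch of item 3 as pre-session and post-session SQL, assuming an Oracle target and a hypothetical target table T_SALES with one index and one foreign key constraint; adapt the object names to your schema:

-- Pre-session SQL: remove the index and disable the constraint before the load
DROP INDEX IDX_T_SALES_CUST;
ALTER TABLE T_SALES DISABLE CONSTRAINT FK_T_SALES_CUST;

-- Post-session SQL: restore them after the load completes
CREATE INDEX IDX_T_SALES_CUST ON T_SALES (CUSTOMER_ID);
ALTER TABLE T_SALES ENABLE CONSTRAINT FK_T_SALES_CUST;

For very large volumes, the staging and transportable tablespace approach described later under session tuning may be more practical than rebuilding indexes after every load.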

PERFORMANCE TUNING OF MAPPINGS


A mapping channels the flow of data from source to target through all the
transformations in between; it is the skeleton of the Informatica loading process.

1. Avoid executing major SQL queries from mapplets or mappings.
2. When such queries are needed, make sure they are optimized.
3. Reduce the number of transformations in the mapping. Active
transformations like Rank, Joiner, Filter, and Aggregator should be used as
sparingly as possible.
4. Remove all the unnecessary links between the transformations from
mapping.
5. If a single mapping contains many targets, then dividing them into separate
mappings can improve performance.
6. If we need to use a single source more than once in a mapping, then keep
only one source and source qualifier in the mapping. Then create different
data flows as required into different targets or same target.
7. If a session joins many source tables in one source qualifier, an optimized
query will improve performance (see the override sketch after this list).
8. In the SQL query that Informatica generates, an ORDER BY clause will be
present. Remove the ORDER BY clause if it is not needed, or at least reduce the
number of column names in that list. For better performance it is best to order by
the indexed field of that table.
9. Combine the mappings that use same set of source data.
10.Within a mapping, fields that carry the same information should be given the
same type and length throughout the mapping. Otherwise time will be spent on
field conversions.
11.Instead of doing complex calculations in the query, use an Expression
transformation and do the calculation in the mapping.
12.If data is passing through multiple staging areas, removing the staging area
will increase performance.
13.Stored procedures reduce performance. Try to keep the stored procedures
simple in the mappings.
14.Unnecessary data type conversions should be avoided since the data type
conversions impact performance.
15.Transformation errors result in performance degradation. Try running the
mapping after removing all transformations. If it is taking significantly less
time than with the transformations, then we have to fine-tune the
transformation.
16.Keep database interactions to a minimum.
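A minimal sketch for items 7 and 8, assuming two hypothetical source tables ORDERS and CUSTOMERS joined in one source qualifier; the SQL override filters rows early, performs the join in the database, and orders only by an indexed column:

-- Source qualifier SQL override (hypothetical tables and columns)
SELECT O.ORDER_ID, O.ORDER_AMOUNT, C.CUSTOMER_NAME
FROM ORDERS O
JOIN CUSTOMERS C ON C.CUSTOMER_ID = O.CUSTOMER_ID
WHERE O.ORDER_STATUS = 'OPEN'
ORDER BY O.ORDER_ID;

In a real mapping, the columns selected in the override must match the ports of the Source Qualifier transformation.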

PERFORMANCE TUNING OF SESSIONS


A session specifies where the data is read from, where the transformations are
done, and where the data is loaded. It has various properties that help us schedule
and run the job the way we want.

1. Partition the session: This creates many connections to the source and
target, and loads data in parallel pipelines. Each pipeline will be independent
of the others. However, the performance of the session will not improve if the
number of records is small, nor will it improve if the session performs
updates and deletes. So session partitioning should be used only if the
volume of data is huge and the job mainly inserts data.
2. Run the sessions in parallel rather than serial to gain time, if they are
independent of each other.
3. Drop constraints and indexes before running the session and rebuild them
after the session completes. Dropping can be done in a pre-session script and
rebuilding in a post-session script. If the data volume is so large that dropping
and rebuilding indexes is not practical, stage all the data, pre-create the index,
use a transportable tablespace, and then load it into the database.
4. Use bulk loading, external loading etc. Bulk loading can be used only if the
table does not have an index.
5. In a session, the ‘Treat Source Rows As’ property can be set to Data Driven,
Insert, Update, or Delete. If update strategies are used, it has to be kept as
‘Data Driven’. But when the session only inserts rows into the target table, set
it to ‘Insert’ to improve performance.
6. Increase the database commit level (the point at which the Informatica
server commits data to the target table; for example, the commit level can be
set at every 50,000 records).
7. By avoiding built-in functions as much as possible, we can improve
performance. For example, for concatenation the operator ‘||’ is faster than the
function CONCAT(), so use operators instead of functions where possible.
Functions like IS_SPACES(), IS_NUMBER(), IIF(), and DECODE() reduce
performance to a large extent, in this order; preference should be in the
opposite order.
8. String functions like SUBSTR, LTRIM, and RTRIM reduce performance. For
flat file sources, use delimited strings or the varchar data type so that such
trimming is not needed.
9. Manipulating high precision data types will slow down Informatica server.
So disable ‘high precision’.
10.Localize all source and target tables, stored procedures, views, sequences
etc. Try not to connect across synonyms. Synonyms and aliases slow down
the performance.

DATABASE OPTIMISATION
To gain the best Informatica performance, the database tables, stored procedures
and queries used in Informatica should be tuned well.

1. If the source and target are flat files, then they should be present in the
system in which the Informatica server is present.
2. Increase the network packet size.
3. The performance of the Informatica server is related to network
connections. Data generally moves across a network at less than 1 MB per
second, whereas a local disk moves data five to twenty times faster. Network
connections therefore often affect session performance, so minimize them
where possible.
4. Optimize target databases.

PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS

Lookup transformations are used to look up a set of values in another table.
Lookups slow down performance.

1. To improve performance, cache the lookup tables. Informatica can cache all the
lookup and reference tables; this makes operations run very fast. (Meaning of
cache is given in point 2 of this section and the procedure for determining the
optimum cache size is given at the end of this document.)

2. Even after caching, the performance can be further improved by minimizing the
size of the lookup cache. Reduce the number of cached rows by using a sql
override with a restriction.

Cache: Cache stores data in memory so that Informatica does not have to read the
table each time it is referenced. This reduces the time taken by the process to a
large extent. Cache is automatically generated by Informatica depending on the
marked lookup ports or by a user defined sql query.

Example of caching by a user-defined query:

Suppose we need to look up records where employee_id = eno.

‘employee_id’ is from the lookup table, EMPLOYEE_TABLE, and ‘eno’ is the
input that comes from the source table, SUPPORT_TABLE.

We put the following SQL query override in the Lookup transformation:

‘select employee_id from EMPLOYEE_TABLE’

If there are 50,000 employee_id values, the lookup cache will hold 50,000 rows.

Instead of the above query, we put the following:

‘select e.employee_id from EMPLOYEE_TABLE e, SUPPORT_TABLE s
where e.employee_id = s.eno’

If there are 1,000 eno values, the lookup cache will hold only 1,000 rows. But here
the performance gain will happen only if the number of records in
SUPPORT_TABLE is not huge. Our concern is to keep the size of the cache as
small as possible.

3. In lookup tables, delete all unused columns and keep only the fields that are used
in the mapping.

4. If possible, replace lookups with a Joiner transformation or a single source
qualifier. A Joiner transformation takes more time than a Source Qualifier
transformation, which performs the join in the database.

5. If the Lookup transformation specifies several conditions, place the conditions
that use the equality operator ‘=’ first in the list on the Condition tab.

6. In the SQL override query of the lookup, there will be an ORDER BY clause.
Remove it if it is not needed, or put fewer column names in the ORDER BY list
(see the sketch below).
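A minimal sketch for point 6, reusing the EMPLOYEE_TABLE example above and ordering only by the lookup condition column. The trailing comment marker is a commonly used way to suppress the ORDER BY clause that Informatica appends to a lookup override; treat it as an assumption and verify it against the documentation for your PowerCenter version.

-- Lookup SQL override with a minimal ORDER BY list
SELECT employee_id
FROM EMPLOYEE_TABLE
ORDER BY employee_id --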

7. Do not use caching in the following cases: -

-Source is small and lookup table is large.

-If lookup is done on the primary key of the lookup table.


8. Cache the lookup table columns definitely in the following case: -

-If lookup table is small and source is large.

9. If lookup data is static, use persistent cache. Persistent caches help to save and
reuse cache files. If several sessions in the same job use the same lookup table,
then using persistent cache will help the sessions to reuse cache files. In case of
static lookups, cache files will be built from memory cache instead of from the
database, which will improve the performance.

10. If source is huge and lookup table is also huge, then also use persistent cache.

11. If target table is the lookup table, then use dynamic cache. The Informatica
server updates the lookup cache as it passes rows to the target.

12. Use only the lookups you want in the mapping. Too many lookups inside a
mapping will slow down the session.

13. If the lookup table has a lot of data, it may take too long to cache or may not fit
in memory. In that case, move those fields to the source qualifier and join with the
main table there.

14. If there are several lookups with the same data set, then share the caches.

15. If we are going to return only 1 row, then use unconnected lookup.

16. All data is read into the cache in the order the fields are listed in the lookup
ports. If we have an index that is even partially in this order, the loading of these
lookups can be sped up.

17. If the table that we use for the lookup has an index (or if we have the privilege
to add an index to the table in the database, do so), then performance improves for
both cached and uncached lookups (see the sketch below).
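A minimal sketch for point 17, again using the EMPLOYEE_TABLE example; the index name is illustrative:

-- An index on the lookup condition column helps both cached and uncached lookups
CREATE INDEX IDX_EMPLOYEE_ID ON EMPLOYEE_TABLE (employee_id);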

PUSH DOWN OPTIMISATION


You can push transformation logic to the source or target database using pushdown
optimization. When you run a session configured for pushdown optimization, the
Integration Service translates the transformation logic into SQL queries and sends
the SQL queries to the database. The source or target database executes the SQL
queries to process the transformations.

The amount of transformation logic you can push to the database depends on the
database, transformation logic, and mapping and session configuration. The
Integration Service processes all transformation logic that it cannot push to a
database.

Use the Pushdown Optimization Viewer to preview the SQL statements and
mapping logic that the Integration Service can push to the source or target
database. You can also use the Pushdown Optimization Viewer to view the
messages related to pushdown optimization.

The following figure shows a mapping containing transformation logic that can be
pushed to the source database:

This mapping contains an Expression transformation that creates an item ID based
on the store number 5419 and the item ID from the source. To push the
transformation logic to the database, the Integration Service generates the
following SQL statement:

INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC)
SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' +
(CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE ITEMS.ITEM_ID END)
AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC
FROM ITEMS2 ITEMS

The Integration Service generates an INSERT SELECT statement to retrieve the
ID, name, and description values from the source table, create new item IDs, and
insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in
the target table. It concatenates the store number 5419, an underscore, and the
original ITEM ID to get the new item ID.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:

 Source-side pushdown optimization. The Integration Service pushes as
much transformation logic as possible to the source database.
 Target-side pushdown optimization. The Integration Service pushes as
much transformation logic as possible to the target database.
 Full pushdown optimization. The Integration Service attempts to push all
transformation logic to the target database. If the Integration Service cannot
push all transformation logic to the database, it performs both source-side
and target-side pushdown optimization.

Running Source-Side Pushdown Optimization Sessions


When you run a session configured for source-side pushdown optimization, the
Integration Service analyzes the mapping from the source to the target or until it
reaches a downstream transformation it cannot push to the source database.

The Integration Service generates and executes a SELECT statement based on the
transformation logic for each transformation it can push to the database. Then, it
reads the results of this SQL query and processes the remaining transformations.

Running Target-Side Pushdown Optimization Sessions


When you run a session configured for target-side pushdown optimization, the
Integration Service analyzes the mapping from the target to the source or until it
reaches an upstream transformation it cannot push to the target database. It
generates an INSERT, DELETE, or UPDATE statement based on the
transformation logic for each transformation it can push to the target database. The
Integration Service processes the transformation logic up to the point that it can
push the transformation logic to the database. Then, it executes the generated SQL
on the Target database.

Running Full Pushdown Optimization Sessions


To use full pushdown optimization, the source and target databases must be in the
same relational database management system. When you run a session configured
for full pushdown optimization, the Integration Service analyzes the mapping from
the source to the target or until it reaches a downstream transformation it cannot
push to the target database. It generates and executes SQL statements against the
source or target based on the transformation logic it can push to the database.

When you run a session with large quantities of data and full pushdown
optimization, the database server must run a long transaction. Consider the
following database performance issues when you generate a long transaction:

 A long transaction uses more database resources.
 A long transaction locks the database for longer periods of time. This
reduces database concurrency and increases the likelihood of deadlock.
 A long transaction increases the likelihood of an unexpected event. To
minimize database performance issues for long transactions, consider using
source-side or target-side pushdown optimization.

Rules and Guidelines for Functions in Pushdown Optimization


Use the following rules and guidelines when pushing functions to a database:

 If you use ADD_TO_DATE in transformation logic to change days, hours,
minutes, or seconds, you cannot push the function to a Teradata database.
 When you push LAST_DAY () to Oracle, Oracle returns the date up to the
second. If the input date contains sub seconds, Oracle trims the date to the
second.
 When you push LTRIM, RTRIM, or SOUNDEX to a database, the database
treats the argument (' ') as NULL, but the Integration Service treats the
argument (' ') as spaces.
 An IBM DB2 database and the Integration Service produce different results
for STDDEV and VARIANCE. IBM DB2 uses a different algorithm than
other databases to calculate STDDEV and VARIANCE.
 When you push SYSDATE or SYSTIMESTAMP to the database, the
database server returns the timestamp in the time zone of the database
server, not the Integration Service.
 If you push SYSTIMESTAMP to an IBM DB2 or a Sybase database, and
you specify the format for SYSTIMESTAMP, the database ignores the
format and returns the complete time stamp.
 You can push SYSTIMESTAMP (‘SS’) to a Netezza database, but not
SYSTIMESTAMP (‘MS’) or SYSTIMESTAMP (‘US’).
 When you push TO_CHAR (DATE) or TO_DATE () to Netezza, dates with
subsecond precision must be in the YYYY-MM-DD HH24:MI:SS.US format. If
the format is different, the Integration Service does not push the function to
Netezza.

TESTING

Unit Testing
Unit testing can be broadly classified into two categories.

Quantitative Testing
Validate your Source and Target

a) Ensure that your connectors are configured properly.

b) If you are using a flat file, make sure you have enough read/write permissions on
the file share.

c) You need to document all the connector information.

Analyze the Load Time

a) Execute the session and review the session statistics.

b) Check the Read and Write counters and note how long it takes to perform the load.

c) Use the session and workflow logs to capture the load statistics.

d) You need to document all the load timing information.

Analyze the success rows and rejections.

a) Prepare customized SQL queries to check the source and targets; this is where
record count verification is performed (a sketch follows below).

b) Analyze the rejections and build a process to handle them. This requires a clear
requirement from the business on how to handle data rejections: do we need to
reload, or reject and inform? Discussions are required and an appropriate process
must be developed.
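A minimal sketch of the record count verification mentioned in a), assuming hypothetical source and target tables SRC_ORDERS and TGT_ORDERS and an illustrative load-date filter; run one query against each system and compare the counts:

-- Record count verification for one load
SELECT COUNT(*) FROM SRC_ORDERS WHERE ORDER_DATE = DATE '2024-01-01';
SELECT COUNT(*) FROM TGT_ORDERS WHERE ORDER_DATE = DATE '2024-01-01';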

Performance Improvement

a) Network Performance

b) Session Performance

c) Database Performance

d) Analyze and, if required, define the Informatica and database partitioning
requirements.

Qualitative Testing
Analyze and validate your transformation business rules. This is more of a
functional test.

e) You need to review field by field from source to target and ensure that the
required transformation logic is applied.

f) If you are making changes to existing mappings, make use of the data lineage
feature available with Informatica Power Center. This will help you find the
consequences of altering or deleting a port in an existing mapping.

g) Ensure that appropriate dimension lookups have been used and your
development is in sync with your business requirements.

Integration Testing
After unit testing is complete, it should form the basis for starting integration
testing. Integration testing should test out the initial and incremental loading of the
data warehouse.

Integration testing will involve the following:

1. Sequence of ETL jobs in batch.
2. Initial loading of records into the data warehouse.
3. Incremental loading of records at a later date to verify the newly inserted or
updated data.
4. Testing the rejected records that don’t fulfill transformation rules.
5. Error log generation.

Integration testing would cover end-to-end testing for the DWH. The coverage of
the tests would include the following:

Count Validation

Record Count Verification: DWH backend/reporting queries against source and
target as an initial check.

Control totals: To ensure accuracy in data entry and processing, control totals can
be compared by the system with manually entered or otherwise calculated control
totals, using data fields such as quantities, line items, documents, or dollars, or
simple record counts (a sketch follows at the end of this section).

Hash totals: This is a technique for improving data accuracy, whereby totals are
obtained on identifier fields (i.e., fields for which it would logically be meaningless
to construct a total), such as account number, social security number, part number,
or employee number. These totals have no significance other than for internal
system control purposes.

Limit checks: The program tests specified data fields against defined high or low
value limits (e.g., quantities or dollars) for acceptability before further processing.
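A minimal sketch of control total and hash total queries, assuming hypothetical source and target tables SRC_ORDERS and TGT_ORDERS with a numeric ORDER_AMOUNT field and a numeric ACCOUNT_NO identifier; run each pair against source and target and compare the results:

-- Control total on an amount field plus a simple record count
SELECT COUNT(*) AS ROW_COUNT, SUM(ORDER_AMOUNT) AS CONTROL_TOTAL FROM SRC_ORDERS;
SELECT COUNT(*) AS ROW_COUNT, SUM(ORDER_AMOUNT) AS CONTROL_TOTAL FROM TGT_ORDERS;

-- Hash total on an identifier field (meaningless as a business figure, useful only as a check)
SELECT SUM(ACCOUNT_NO) AS HASH_TOTAL FROM SRC_ORDERS;
SELECT SUM(ACCOUNT_NO) AS HASH_TOTAL FROM TGT_ORDERS;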

Dimensional Analysis

Data integrity between the various source tables and relationships.

Statistical Analysis

Validation for various calculations.

 When you validate the calculations, you do not need to load all the rows
into the target and validate them.
 Instead, you can use the Enable Test Load feature available in Informatica Power Center.

Property: Enable Test Load
Description: You can configure the Integration Service to perform a test load.

With a test load, the Integration Service reads and transforms data without writing
to targets. The Integration Service generates all session files and performs all pre-
and post-session functions, as if running the full session.

The Integration Service writes data to relational targets, but rolls back the data
when the session completes. For all other target types, such as flat file and SAP
BW, the Integration Service does not write data to the targets.

Enter the number of source rows you want to test in the Number of Rows to Test
field. You cannot perform a test load on sessions using XML sources. You can
perform a test load for relational targets when you configure a session for normal
mode. If you configure the session for bulk mode, the session fails.

Property: Number of Rows to Test
Description: Enter the number of source rows you want the Integration Service to
test load. The Integration Service reads the number of rows you configure for the
test load.

Data Quality Validation

Check for missing data, negative values, and consistency. Field-by-field data
verification can be done to check the consistency of source and target data (see the
sketch below).
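A minimal sketch of field-by-field verification, assuming hypothetical tables SRC_ORDERS and TGT_ORDERS with comparable columns in the same database; MINUS is Oracle syntax (use EXCEPT on databases that support it), and cross-database comparisons would instead require extracting both sides:

-- Rows present in the source but missing or different in the target
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT FROM SRC_ORDERS
MINUS
SELECT ORDER_ID, CUSTOMER_ID, ORDER_AMOUNT FROM TGT_ORDERS;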

Overflow checks: This is a limit check based on the capacity of a data field or data
file area to accept data. This programming technique can be used to detect the
truncation of a financial or quantity data field value after computation (e.g.,
addition, multiplication, and division). Usually, the first digit is the one lost.

Format checks: These are used to determine that data are entered in the proper
mode, as numeric or alphabetical characters, within designated fields of
information. The proper mode in each case depends on the data field definition.
Sign test: This is a test for a numeric data field containing a designation of an
algebraic sign, + or - , which can be used to denote, for example, debits or credits
for financial data fields.

Size test: This test can be used to test the full size of the data field. For example, a
social security number in the United States should have nine digits.

Granularity

Validate at the lowest granular level possible

Other validations

Audit Trails, Transaction Logs, Error Logs and Validity checks.

Note: Based on your project and business needs you might have additional
testing requirements.

User Acceptance Test


In this phase you involve the user to test the end results and ensure that the
business is satisfied with the quality of the data.

Any changes to the business requirement will follow the change management
process and eventually those changes have to follow the SDLC process.

Optimize Development, Testing, and Training Systems

 Dramatically accelerate development and test cycles and reduce storage
costs by creating fully functional, smaller targeted data subsets for
development, testing, and training systems, while maintaining full data
integrity.
 Quickly build and update nonproduction systems with a small subset of
production data and replicate current subsets of nonproduction copies faster.
 Simplify test data management and shrink the footprint of nonproduction
systems to significantly reduce IT infrastructure and maintenance costs.
 Reduce application and upgrade deployment risks by properly testing
configuration updates with up-to-date, realistic data before introducing them
into production.
 Easily customize provisioning rules to meet each organization’s changing
business requirements.
 Lower training costs by standardizing on one approach and one
infrastructure.
 Train employees effectively using reliable, production-like data in training
systems.

Support Corporate Divestitures and Reorganizations

 Untangle complex operational systems and separate data along business
lines to quickly build the divested organization’s system.
 Accelerate the provisioning of new systems by using only data that’s
relevant to the divested organization.
 Decrease the cost and time of data divestiture with no reimplementation
costs.

Reduce the Total Cost of Storage Ownership

 Dramatically increase an IT team’s productivity by reusing a comprehensive
list of data objects for data selection and updating processes across multiple
projects, instead of coding by hand, which is expensive, resource intensive,
and time consuming.
 Accelerate application delivery by decreasing R&D cycle time and
streamlining test data management.
 Improve the reliability of application delivery by ensuring IT teams have
ready access to updated quality production data.
 Lower administration costs by centrally managing data growth solutions
across all packaged and custom applications.
 Substantially accelerate time to value for subsets of packaged applications.
 Decrease maintenance costs by eliminating custom code and scripting.
Informatica Power Center Testing
Debugger: A very useful tool for debugging a valid mapping to gain troubleshooting
information about data and error conditions. Refer to the Informatica documentation
to learn more about the Debugger tool.

Test Load Options – Relational Targets.

Running the Integration Service in Safe Mode

 Test a development environment. Run the Integration Service in safe mode
to test a development environment before migrating to production.
 Troubleshoot the Integration Service. Configure the Integration Service to
fail over in safe mode and troubleshoot errors when you migrate or test a
production environment configured for high availability. After the
Integration Service fails over in safe mode, you can correct the error that
caused the Integration Service to fail over.

Syntax Testing: Test your customized queries using your source qualifier before
executing the session.

Performance Testing: Identify the following bottlenecks:

 Target
 Source
 Mapping
 Session
 System

Use the following methods to identify performance bottlenecks:

 Run test sessions. You can configure a test session to read from a flat file
source or to write to a flat file target to identify source and target
bottlenecks.
 Analyze performance details. Analyze performance details, such as
performance counters, to determine where session performance decreases.
 Analyze thread statistics. Analyze thread statistics to determine the optimal
number of partition points.
 Monitor system performance. You can use system monitoring tools to
view the percentage of CPU use, I/O waits, and paging to identify system
bottlenecks. You can also use the Workflow Monitor to view system
resource usage. Use Power Center conditional filter in the Source Qualifier
to improve performance.

Share metadata. You can share metadata with a third party. For example, you
want to send a mapping to someone else for testing or analysis, but you do not
want to disclose repository connection information for security reasons. You
can export the mapping to an XML file and edit the repository connection
information before sending the XML file. The third party can import the
mapping from the XML file and analyze the metadata.

Debugger
You can debug a valid mapping to gain troubleshooting information about data and
error conditions. To debug a mapping, you configure and run the Debugger from
within the Mapping Designer. The Debugger uses a session to run the mapping on
the Integration Service. When you run the Debugger, it pauses at breakpoints and
you can view and edit transformation output data.

You might want to run the Debugger in the following situations:

 Before you run a session. After you save a mapping, you can run some
initial tests with a debug session before you create and configure a session in
the Workflow Manager.
 After you run a session. If a session fails or if you receive unexpected
results in the target, you can run the Debugger against the session. You
might also want to run the Debugger against a session if you want to debug
the mapping using the configured session properties.

Debugger Session Types:

You can select three different debugger session types when you configure the
Debugger. The Debugger runs a workflow for each session type. You can choose
from the following Debugger session types when you configure the Debugger:

 Use an existing non-reusable session. The Debugger uses existing source,
target, and session configuration properties. When you run the Debugger, the
Integration Service runs the non-reusable session and the existing workflow.
The Debugger does not suspend on error.
 Use an existing reusable session. The Debugger uses existing source,
target, and session configuration properties. When you run the Debugger, the
Integration Service runs a debug instance of the reusable session and creates
and runs a debug workflow for the session.
 Create a debug session instance. You can configure source, target, and
session configuration properties through the Debugger Wizard. When you
run the Debugger, the Integration Service creates and runs a debug workflow
for the debug session.

Debug Process

To debug a mapping, complete the following steps:

1. Create breakpoints. Create breakpoints in a mapping where you want the
Integration Service to evaluate data and error conditions.

2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger
for the mapping. Select the session type the Integration Service uses when it runs
the Debugger. When you create a debug session, you configure a subset of session
properties within the Debugger Wizard, such as source and target location. You can
also choose to load or discard target data.

3. Run the Debugger. Run the Debugger from within the Mapping Designer.
When you run the Debugger, the Designer connects to the Integration Service. The
Integration Service initializes the Debugger and runs the debugging session and
workflow. The Integration Service reads the breakpoints and pauses the Debugger
when the breakpoints evaluate to true.

4. Monitor the Debugger. While you run the Debugger, you can monitor the target
data, transformation and mapplet output data, the debug log, and the session log.
When you run the Debugger, the Designer displays the following windows:

 Debug log. View messages from the Debugger.
 Target window. View target data.
 Instance window. View transformation data.
5. Modify data and breakpoints. When the Debugger pauses, you can modify
data and see the effect on transformations, mapplets, and targets as the data moves
through the pipeline. You can also modify breakpoint information.

The Designer saves mapping breakpoint and Debugger information in the
workspace files. You can copy breakpoint information and the Debugger
configuration to another mapping. If you want to run the Debugger from another
Power Center Client machine, you can copy the breakpoint information and the
Debugger configuration to the other Power Center Client machine.

Running the Debugger:

When you complete the Debugger Wizard, the Integration Service starts the
session and initializes the Debugger. After initialization, the Debugger moves in
and out of running and paused states based on breakpoints and commands that you
issue from the Mapping Designer. The Debugger can be in one of the following
states:

 Initializing. The Designer connects to the Integration Service.
 Running. The Integration Service processes the data.
 Paused. The Integration Service encounters a break and pauses the
Debugger.

Note: To enable multiple users to debug the same mapping at the same time, each
user must configure different port numbers in the Tools > Options > Debug tab.

The Debugger does not use the high availability functionality.


Monitoring the Debugger:

When you run the Debugger, you can monitor the following information:

 Session status. Monitor the status of the session.
 Data movement. Monitor data as it moves through transformations.
 Breakpoints. Monitor data that meets breakpoint conditions.
 Target data. Monitor target data on a row-by-row basis.

The Mapping Designer displays windows and debug indicators that help you
monitor the session:

 Debug indicators. Debug indicators on transformations help you follow
breakpoints and data flow.
 Instance window. When the Debugger pauses, you can view transformation
data and row information in the Instance window.
 Target window. View target data for each target in the mapping.
 Output window. The Integration Service writes messages to the following
tabs in the Output window:
 Debugger tab. The debug log displays in the Debugger tab.
 Session Log tab. The session log displays in the Session Log tab.
 Notifications tab. Displays messages from the Repository Service.

While you monitor the Debugger, you might want to change the transformation
output data to see the effect on subsequent transformations or targets in the data
flow. You might also want to edit or add more breakpoint information to monitor
the session more closely.

Restrictions

You cannot change data for the following output ports:

 Normalizer transformation. Generated Keys and Generated Column ID
ports.
 Rank transformation. RANKINDEX port.
 Router transformation. All output ports.
 Sequence Generator transformation. CURRVAL and NEXTVAL ports.
 Lookup transformation. NewLookupRow port for a Lookup
transformation configured to use a dynamic cache.
 Custom transformation. Ports in output groups other than the current
output group.
 Java transformation. Ports in output groups other than the current output
group.

Additionally, you cannot change data associated with the following:

 Mapplets that are not selected for debugging
 Input or input/output ports
 Output ports when the Debugger pauses on an error breakpoint
Constraint-Based Loading:
In the Workflow Manager, you can specify constraint-based loading for a session.
When you select this option, the Integration Service orders the target load on a
row-by-row basis. For every row generated by an active source, the Integration
Service loads the corresponding transformed row first to the primary key table,
then to any foreign key tables. Constraint-based loading depends on the following
requirements:

 Active source. Related target tables must have the same active source.
 Key relationships. Target tables must have key relationships.
 Target connection groups. Targets must be in one target connection group.
 Treat rows as insert. Use this option when you insert into the target. You
cannot use updates with constraint based loading.

Active Source:

When target tables receive rows from different active sources, the Integration
Service reverts to normal loading for those tables, but loads all other targets in the
session using constraint-based loading when possible. For example, a mapping
contains three distinct pipelines. The first two contain a source, source qualifier,
and target. Since these two targets receive data from different active sources, the
Integration Service reverts to normal loading for both targets. The third pipeline
contains a source, Normalizer, and two targets. Since these two targets share a
single active source (the Normalizer), the Integration Service performs constraint-
based loading: loading the primary key table first, then the foreign key table.

Key Relationships:

When target tables have no key relationships, the Integration Service does not
perform constraint-based loading.

Similarly, when target tables have circular key relationships, the Integration
Service reverts to a normal load. For example, you have one target containing a
primary key and a foreign key related to the primary key in a second target. The
second target also contains a foreign key that references the primary key in the first
target. The Integration Service cannot enforce constraint-based loading for these
tables. It reverts to a normal load.
Target Connection Groups:

The Integration Service enforces constraint-based loading for targets in the same
target connection group. If you want to specify constraint-based loading for
multiple targets that receive data from the same active source, you must verify the
tables are in the same target connection group. If the tables with the primary key-
foreign key relationship are in different target connection groups, the Integration
Service cannot enforce constraint-based loading when you run the workflow. To
verify that all targets are in the same target connection group, complete the
following tasks:

 Verify all targets are in the same target load order group and receive data
from the same active source.
 Use the default partition properties and do not add partitions or partition
points.
 Define the same target type for all targets in the session properties.
 Define the same database connection name for all targets in the session
properties.
 Choose normal mode for the target load type for all targets in the session
properties.

Treat Rows as Insert:

Use constraint-based loading when the session option Treat Source Rows As is set
to insert. You might get inconsistent data if you select a different Treat Source
Rows As option and you configure the session for constraint-based loading.

When the mapping contains Update Strategy transformations and you need to load
data to a primary key table first, split the mapping using one of the following
options:

 Load primary key table in one mapping and dependent tables in another
mapping. Use constraint-based loading to load the primary table.
 Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping.
Target load ordering defines the order the Integration Service reads the sources in
each target load order group in the mapping. A target load order group is a
collection of source qualifiers, transformations, and targets linked together in a
mapping. Constraint based loading establishes the order in which the Integration
Service loads individual targets within a set of targets receiving data from a single
source qualifier.

Example

The following mapping is configured to perform constraint-based loading:

In the first pipeline, target T_1 has a primary key; T_2 and T_3 contain foreign
keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as
a foreign key (a DDL sketch of these key relationships follows this example).

Since these tables receive records from a single active source, SQ_A, the
Integration Service loads rows to the target in the following order:

1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4

The Integration Service loads T_1 first because it has no foreign key dependencies
and contains a primary key referenced by T_2 and T_3. The Integration Service
then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not
loaded in any particular order. The Integration Service loads T_4 last, because it
has a foreign key that references a primary key in T_3. After loading the first set of
targets, the Integration Service begins reading source B. If there are no key
relationships between T_5 and T_6, the Integration Service reverts to a normal load
for both targets.

If T_6 has a foreign key that references a primary key in T_5, then since T_5 and
T_6 receive data from a single active source (the Aggregator AGGTRANS), the
Integration Service loads rows to the tables in the following order:

 T_5
 T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same
database connection for each target, and you use the default partition properties.
T_5 and T_6 are in another target connection group together if you use the same
database connection for each target and you use the default partition properties.
The Integration Service includes T_5 and T_6 in a different target connection
group because they are in a different target load order group from the first four
targets.
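A minimal DDL sketch of the key relationships described for the first pipeline, using hypothetical column names; it only illustrates why the Integration Service loads T_1 first, then T_2 and T_3, and T_4 last:

CREATE TABLE T_1 (ID_1 INTEGER PRIMARY KEY, COL_1 VARCHAR(50));
CREATE TABLE T_2 (ID_2 INTEGER PRIMARY KEY, ID_1 INTEGER REFERENCES T_1 (ID_1));
CREATE TABLE T_3 (ID_3 INTEGER PRIMARY KEY, ID_1 INTEGER REFERENCES T_1 (ID_1));
CREATE TABLE T_4 (ID_4 INTEGER PRIMARY KEY, ID_3 INTEGER REFERENCES T_3 (ID_3));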

Enabling Constraint-Based Loading:

When you enable constraint-based loading, the Integration Service orders the target
load on a row-by-row basis. To enable constraint-based loading:

1. In the General Options settings of the Properties tab, choose Insert for the
Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint
Based Load Ordering.
3. Click OK.
Target Load Order
When you use a mapplet in a mapping, the Mapping Designer lets you set the
target load plan for sources within the mapplet.

Setting the Target Load Order

You can configure the target load order for a mapping containing any type of target
definition. In the Designer, you can set the order in which the Integration Service
sends rows to targets in different target load order groups in a mapping. A target
load order group is the collection of source qualifiers, transformations, and targets
linked together in a mapping. You can set the target load order if you want to
maintain referential integrity when inserting, deleting, or updating tables that have
the primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and
it processes target load order groups sequentially.

To specify the order in which the Integration Service sends data to targets, create
one source qualifier for each target within a mapping. To set the target load order,
you then determine in which order the Integration Service reads each source in the
mapping.

The following figure shows two target load order groups in one mapping:

In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and
T_ITEMS. The second target load order group includes all other objects in the
mapping, including the TOTAL_ORDERS target. The Integration Service
processes the first target load order group, and then the second target load order
group.

When it processes the second target load order group, it reads data from both
sources at the same time.

To set the target load order:

1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier transformations in
the mapping and the targets that receive data from each source qualifier.
4. Select a source qualifier from the list.
5. Click the Up and Down buttons to move the source qualifier within the load
order.
6. Repeat steps 4 and 5 for other source qualifiers you want to reorder. Click OK.
