
PowerMart/Center 6

Performance/Tuning Overview
Informatica Developer Network

Mark Haas
Senior Consultant
Professional Services

P&T022603
A Performance Tuning Methodology
 How do you optimize and tune Informatica?
 You need a combination of basic knowledge and
techniques
 Knowledge
 Know the basic Informatica architecture
 Know the other building blocks in the system
 Database features and architecture
 Operating system features and architecture
 Know the limits of what is possible
 Know your goals

 Techniques
 Know how to find the bottlenecks
 Know how to eliminate them

2
The Production Environment
[Diagram: a multi-vendor production environment: Informatica and the OS at the center, connected to a DBMS over the LAN/WAN, with many disks attached throughout]
 This is a multi-vendor, multi-system environment
 There are many components involved
 Operating systems, databases, networks and I/O
 Usually need to monitor performance in several places
 Usually need to monitor outside Informatica

 Tuning involves an iterative approach


 1) Find the biggest performance problem
 2) Eliminate or reduce it
 3) Go to step 1

3
The Production Environment

[Diagram highlight: the DBMS and OS layers of the production environment]

 Database and OS considerations
 Databases are usually not configured for optimal performance when first installed out of the box
 The amount of memory and the number of CPUs dedicated to the ETL process also play a significant role

4
The Production Environment

[Diagram highlight: the Informatica server layer of the production environment]

 Session performance will be affected by:
 Properly configuring cache attributes on mapping objects (Lookups, Aggregators, etc.)
 Partitioning wherever possible
 Using sound mapping strategies

5
Performance Tuning
 There are two general areas to optimize and tune
 Components external to Informatica (OS, memory, etc.)
 Components internal to Informatica (tasks, mappings, workflows, etc.)

 Getting data through the Informatica engine
 This involves optimizing at the task and mapping level
 It also involves optimizing the system so that Informatica itself runs well

 Getting data into and out of the Informatica engine
 This usually involves optimizing non-Informatica components
 The engine can’t run faster than the source or target

6
Measuring Performance
 Several types of bottlenecks can affect performance
 Network
 System
 Database
 Informatica Mappings and Tasks

 There are several ways to measure performance, such as the total amount of data (volume) per unit of time
 Volume can be measured as:
 Number of bytes
 Number of rows
 Time can be measured as:
 CPU or process time
 “Wall Clock” time

7
Measuring Performance
 For the purpose of identifying bottlenecks we will use:
 “Wall clock” time as a relative measure of elapsed time
 Number of rows loaded over that period of time (rows per second)

 Rows per second (rows/sec) allows us to measure a session’s performance over a period of time and across changes to our environment.
 Rows per second can vary widely depending on the size of the row (number of bytes), the type of source/target (flat file or relational), and the underlying hardware.

8
Measuring Performance
 Establishing the baseline using the Workflow Manager
 Run the workflow with the session task to be measured
 View the session properties in the Workflow Monitor at the end of the run and record the number of rows loaded and the session’s start and end times
 Subtract the start time from the end time and convert to seconds to get the total session time
 Divide the number of rows loaded by the total session time in seconds (see the sketch below)
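
A minimal sketch of the baseline arithmetic in Python; the timestamp format is an assumption (use whatever the Workflow Monitor actually displays):

from datetime import datetime

def baseline_rows_per_sec(rows_loaded, start, end):
    # Timestamps as recorded from the Workflow Monitor (format is an assumption)
    fmt = "%m/%d/%Y %H:%M:%S"
    elapsed = (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()
    return rows_loaded / elapsed

# Example: 1,200,000 rows loaded in a 10-minute session -> 2000.0 rows/sec
print(baseline_rows_per_sec(1200000, "02/26/2003 14:00:05", "02/26/2003 14:10:05"))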

9
Measuring Performance
 Things to note:
 Calculated rows per second is not the same as “Write Throughput”
 For multiple targets, use the sum of rows loaded for targets that are similar in row size
 For multiple partitions, use the sum of rows loaded across all partitions
 Complex mappings with multiple data flows may require the creation of multiple mappings
 Monitor background processes external to Informatica
 Establish a baseline and work to make improvements
 Use the MX views and Real-Time Metadata Reporter to view historical session task information

10
Server Resource Architecture
 Two session task parameters control the processing
pipeline
 The session shared memory size (DTM Buffer Size)
 The buffer block size

 These parameters are specified per session task, in the Workflow Manager

11
Server Resource Architecture
 Session Shared Memory Size controls the total amount of memory used to buffer rows internally by the reader and writer
 This sets the total number of blocks available
 The usual value is about 25 MB (25,000,000)
 If the block size is 64K, then you get about 16 blocks/MB * 25 MB = 400 blocks

 Buffer Block Size controls the size of the blocks that move through the pipeline
 Optimum size depends on the row size being processed (see the sketch below)
 64K (64,000) → 64 rows of 1K
 128K (128,000) → 128 rows of 1K
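
A minimal sketch of the block arithmetic above; the function names are illustrative, not actual Informatica settings:

def buffer_blocks(dtm_buffer_bytes, block_bytes):
    # Total number of buffer blocks available to the session
    return dtm_buffer_bytes // block_bytes

def rows_per_block(block_bytes, row_bytes):
    # How many rows fit in a single buffer block
    return block_bytes // row_bytes

print(buffer_blocks(25000000, 64000))  # 390, close to the 400-block rule of thumb
print(rows_per_block(64000, 1000))     # 64 rows of 1K per 64K block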

12
Server Resource Architecture

[Chart: rows/sec (0–900) vs. buffer block size (32K, 64K, 96K, 128K) for 1K, 2K, and 3K rows, with shared memory held constant at 25 MB]

13
Server Resource Architecture

[Chart: rows/sec (0–1000) vs. shared memory size (12–35 MB) for 1K, 2K, and 3K rows, with buffer block size held constant at 64K]

14
Identifying Source Bottlenecks
 Reading from a flat file usually does not cause a bottleneck
 Configure a session with a flat file target instead of a relational target
 You can reuse the one created for a write test

 Place a Filter set to FALSE on the output of each source qualifier in the mapping
 Execute the SQL generated by the source qualifier externally and time it (see the sketch below)
[Mapping diagrams: Original vs. Modified, where a Filter set to FALSE creates a read-throughput-only test]
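
A minimal sketch of timing the generated SQL outside Informatica, assuming a Python DB-API driver; pyodbc, the DSN, and the query text are all placeholders:

import time
import pyodbc  # assumption: any DB-API driver for your source database works similarly

conn = pyodbc.connect("DSN=source_db")  # placeholder connection string
cursor = conn.cursor()

# Paste in the SQL generated by the source qualifier (visible in the session log)
generated_sql = "SELECT ORDER_ID, ORDER_DATE, AMOUNT FROM ORDERS"  # placeholder

start = time.time()
cursor.execute(generated_sql)
rows = 0
batch = cursor.fetchmany(10000)
while batch:
    rows += len(batch)               # discard the rows; we only want raw read speed
    batch = cursor.fetchmany(10000)
elapsed = time.time() - start
print("~%.0f rows/sec read from the source" % (rows / elapsed))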

15
Identifying Source Bottlenecks
 Modified “Read Test” mapping
 Used to identify a read or mapping bottleneck
 Create a new mapping that bypasses transformations
 If throughput changes significantly, the problem could be with the transformations

[Mapping diagrams: Original vs. Modified read-test mapping with transformations bypassed]

16
Identifying Target Bottlenecks
 Writing to a flat file usually does not cause a
bottleneck
 Configure a session task to write to a flat file target
instead of a relational target
 If the target is a flat file, the problem is most likely elsewhere

17
Identifying Mapping Bottlenecks
 Generally, if the bottleneck is not in the reader or writer process, the next step is to review the mapping
 Mapping bottlenecks can be created by improperly configured aggregator, joiner, sorter, rank, and lookup caches

18
Identifying Session Task Bottlenecks
 Check the commit interval
 Check the session log for excessive transformation errors
 Decimal arithmetic enabled
 Update (else insert) enabled
 Incorrect partitioning choices (PowerCenter)
 Pre- and post-session task commands
 Tracing level

19
Mapping Optimizing
 Single-Pass Read
 Use a single SQL statement when reading multiple tables from the same database
 Data type conversions are expensive
 Watch out for hidden port-to-port data type conversions
 Overuse of string and character conversion functions
 Data type conversions are expensive
 Watch out for hidden port to port data type conversions
 Over use of string and character conversion functions

 Use filters early and often
 Filters can be applied as SQL overrides (Source Qualifiers, Lookups) and as transformations to reduce the amount of data processed

 Simplify expressions
 Factor out common logic
 Use variables to reduce the number of times a function is evaluated

20
Mapping Optimizing
 Use operators instead of functions when possible
 The concatenation operator (‘||’) is faster than the CONCAT function

 Simplify nested IIFs when possible

 Use proper cache sizing for Aggregator, Rank, Sorter, Joiner and Lookup transformations
 Incorrect cache sizing causes additional disk swapping, which can degrade performance significantly
 Use the performance counters to verify correct sizing

21
Using Performance Counters
 All transformations have basic counters that are maintained by the server
 Counters are enabled at the session level using the Collect Performance Data option
 The server creates a “session_name.perf” file for counter statistics
 Default location is the session log directory
 Collecting performance data has some impact on session performance, similar to tracing
 The important counters are the ones showing reads from and writes to disk for Aggregators, Ranks, Sorters, and Joiners, and the rows read from cache for Lookups (see the sketch below)
 Refer to the help documentation to find the cache calculations for the Aggregator, Lookup, Sorter, Joiner, and Rank
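
A minimal sketch of scanning a session’s .perf file for nonzero disk counters; the “counter = value” line layout assumed here is illustrative, so adjust the parsing to your server version’s actual format:

def disk_counters(perf_path):
    # Counters whose names indicate cache reads/writes hitting disk
    suspects = ("readfromdisk", "writetodisk", "readsfromdisk", "writestodisk")
    with open(perf_path) as f:
        for line in f:
            if "=" not in line:
                continue
            name, _, value = line.rpartition("=")
            key = name.strip().lower().replace(" ", "").replace("_", "")
            if any(s in key for s in suspects) and value.strip() not in ("", "0"):
                yield line.strip()

for hit in disk_counters("s_m_load_orders.perf"):  # hypothetical session name
    print(hit)  # any nonzero disk counter suggests an undersized cache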

22
Using Performance Counters
 Performance counters provide a variety of statistics for each transformation in a mapping
 Counters are enabled in the session properties

[Screenshot: Session Wizard with the “Collect Performance Data” option checked]

23
Session Task Optimizing
 Run partitioned sessions
 Improved performance
 Better utilization of CPU, I/O and data source/target
bandwidth

 Use “Incremental Aggregation” when possible
 Good for “rolling average” type aggregation

 Reduce transformation errors
 Put logic in place to reduce “bad data” such as nulls in a calculation

 Reduce the level of tracing

Tip: See the “Velocity Methodology” document for further information

24
Partitioned Extraction and Load
 Key Range
 Round Robin
 Hash Auto Keys
 Hash User Keys
 Pass Through

25
Partitioned Extraction and Load
 Key Range Partition
 Data is distributed between partitions according to
pre-defined range values for keys
 Available in PowerCenter 5, but only for Source
Qualifier
 Key Range partitioning can now be applied to other
transformations
 Common uses for this new functionality:
 Apply to a Target Definition to align output with the physical partitioning scheme of the target table
 Apply to a Target Definition to write all data to a single file to stream into a database bulk loader that does not support concurrent loads (e.g. Teradata, DB2)

26
Partitioned Extraction and Load
 Key Range Partition (continued)
 You can select input or input/output ports for the keys, but not variable or output-only ports
 Remember, the partition occurs BEFORE the transformation; variable and output-only ports are not allowed because they have not yet been evaluated
 You can select multiple keys to form a composite key
 Range specification is: Start Range and End Range
 You can also specify an open range
 NULL values will go to the first partition
 All unmatched rows will also go to the first partition; the user will see the following warning message once in the log file (the routing logic is sketched below):
TRANSF_1_1_2_1> TT_11083 WARNING! A row did not match any of the key ranges
specified at transformation [EXPTRANS]. This row and all subsequent unmatched rows will
be sent to the first target partition.
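
A minimal sketch of the key-range routing semantics described above; the ranges are illustrative and the end-exclusive comparison is an assumption:

def key_range_partition(key, ranges):
    # ranges: list of (start, end) per partition; None means an open range.
    # NULL keys and unmatched keys go to the first partition, mirroring the
    # server behavior described above.
    if key is None:
        return 0
    for i, (start, end) in enumerate(ranges):
        if (start is None or key >= start) and (end is None or key < end):
            return i
    return 0  # unmatched rows fall back to the first partition

ranges = [(None, 1000), (1000, 2000), (2000, None)]  # open-ended first and last
print(key_range_partition(1500, ranges))  # 1
print(key_range_partition(None, ranges))  # 0 (NULL goes to the first partition)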

27
Partitioned Extraction and Load
 Round Robin Partitioning
 The Informatica Server evenly distributes the data to each partition (sketched below)
 The user need not specify anything because key values are not interrogated
 Common use:
 Apply to a flat file source qualifier when dealing with unequal
input file sizes
– Use “user hash” when there are downstream lookups/joiners
 Trick: All but one of the input files can be empty… you no
longer have to physically partition input files
– Note: There are performance implications with doing this.
Sometimes it’s better, sometimes it’s worse.
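
A minimal sketch of round-robin distribution, for intuition only:

from itertools import cycle

def round_robin(rows, n_partitions):
    # Distribute rows evenly across partitions, ignoring key values
    partitions = [[] for _ in range(n_partitions)]
    for row, p in zip(rows, cycle(range(n_partitions))):
        partitions[p].append(row)
    return partitions

print(round_robin(["r1", "r2", "r3", "r4", "r5"], 2))  # [['r1', 'r3', 'r5'], ['r2', 'r4']]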

28
Partitioned Extraction and Load
 Hash Partitioning
 Data is distributed between partitions according to a “hash” function applied to the key values
 PowerCenter 5 applies Auto Hash Partitioning automatically for Aggregator and Rank transformations
 Goal: Evenly distribute data, while making sure that like key values are always processed by the same partition
 A “hash function” is applied to a set of ports
 The hash function returns a value between 1 and the number of partitions
 A “good” hash function provides a uniform distribution of return values
 – Not all 1s or all 2s, but an even mix
 Based on this return value, the row is routed to the corresponding partition (e.g. if 1, send to partition 1; if 2, send to partition 2, etc.), as sketched below
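
A minimal sketch of hash-based routing; zlib.crc32 merely stands in for the server’s internal hash function, which is not documented here:

import zlib

def hash_partition(key_ports, n_partitions):
    # Like key values always hash to the same partition; a good hash
    # spreads distinct keys evenly across partitions.
    key = "|".join(str(p) for p in key_ports)
    return zlib.crc32(key.encode()) % n_partitions + 1  # returns 1..n_partitions

print(hash_partition(["ACME", "2003-02-26"], 4))  # same inputs always -> same partition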

29
Partitioned Extraction and Load
 Hash Auto Key
 No need to specify the keys to hash on; automatically uses all key ports (e.g. the Group By key or Sort key) as a composite key
 Only valid for unsorted Aggregator, Rank and Sorter
 The default partition type for unsorted Aggregator and Rank
 NULL values are converted to zero for hashing

 Hash User Key
 Just like Hash Auto Key, but the user explicitly specifies the ports to be used as the key
 Only input and input/output ports are allowed
 Common uses:
 When dealing with input files of unequal sizes, hash partition (vs. round-robin) data into downstream Lookups and Joiners to improve “locality of reference” of caches (hash on the ports used in the lookup/join condition)
 “Override” the auto hash for performance reasons
 – Hashing is faster on numeric values than on strings

30
Partitioned Extraction and Load
 Pass Through Partitioning
 Data is passed through to the next stage within the current partition
 Since data is not redistributed, the user need not specify anything
 Common use:
 Create additional stages (processing threads) within the pipeline to improve performance

31
Partitioned Extraction and Load
 The Partition tab appears in the Session Task within the Workflow Manager

[Screenshot: the session task’s Partition tab; partitioning is based on the transformation, so select the appropriate partition type]
32
Partitioned Extraction and Load
 By default, session tasks have the following partition points and partition schemes:
 Relational Source, Target (Pass Through)
 File Source, Target (Pass Through)
 Unsorted Aggregator, Rank (Auto Hash)
 NOTE: You cannot delete the default partition points for sources and targets
 You cannot run a debug session with more than one partition
 Just like PowerCenter 5…

33
Partitioning Do’s
 Cache as much as possible in memory
 Spread cache files across multiple physical devices, both within and across partitions
 Unless the directory is hosted on some kind of disk array, configure disk-based caches to use as many disk devices as possible

 “Round Robin” or “Hash” partition a single input file until you determine this is a bottleneck
 “Range Partition” to align with the physical partitioning scheme of source/target tables
 “Pass Through” partition to apply more CPU resources to a pipeline (when the transformation stage is the bottleneck)

34
Partitioning Don’ts
 Don’t add partition points if the session is already source or target constrained
 Tune the source or target to eliminate the bottleneck

 Don’t add partition points if the CPUs are already maxed out (%idle < ~5%)
 Eliminate unnecessary processing and/or buy more CPUs

 Don’t add partition points if the system performance monitor shows regular “page out” activity
 Eliminate unnecessary processing and/or buy more memory

 Don’t add multiple partitions until you’ve tested and tuned a single partition
 You’ll be glad you did

35
Default Partition Points

[Diagram: default partition points along the pipeline: Reader -> Transformation -> Transformation -> Writer]

Default Partition Points

Default Partition Point       Default Partition Type   Description

Source Qualifier or           Pass-through             Controls how the Server reads data from the
Normalizer Transformation                              source and passes it into the source qualifier

Rank and unsorted             Hash auto-keys           Ensures that the Server groups rows before it
Aggregator Transformation                              sends them to the transformation

Target Instances              Pass-through             Controls how the instances distribute data to
                                                       the targets

36
Session Performance Thread Statistics
***** RUN INFO FOR TGT LOAD ORDER GROUP [1], SRC PIPELINE [1] *****

Reader stage:
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_order_data] has completed: Total Run Time = [79.694595] secs, Total Idle Time = [21.450840] secs, Busy Percentage = [73.083695].

DTM stage:
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_order_data] has completed: Total Run Time = [80.135229] secs, Total Idle Time = [0.711023] secs, Busy Percentage = [99.112721].

Writer stage:
MASTER> PETL_24022 Thread [WRITER_1_1_1] created for the write stage of partition point(s) [order_data_out] has completed: Total Run Time = [80.936382] secs, Total Idle Time = [2.123060] secs, Busy Percentage = [97.376878].

MASTER> PETL_24021 ***** END RUN INFO *****

[Diagram: Reader -> Data Transformation -> Writer]
Note: Stages overlap when possible

37
Using Thread Statistics
 Look for stages that are ~100% busy; these are likely candidates for “pass-through” partition points to allow concurrent processing (a parsing sketch follows)
 If the repartition rules do not allow sub-dividing the “100% busy” thread, then consider adding another partition
 This only helps if you have available CPU capacity
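
A minimal sketch of pulling busy percentages out of a session log to spot the ~100% busy stages; the regular expression targets the PETL lines shown on the previous slide, and the log path is hypothetical:

import re

PAT = re.compile(r"Thread \[(\w+)\].*?Busy Percentage = \[([\d.]+)\]", re.S)

def busy_stages(log_text, threshold=95.0):
    # Return (thread, busy%) pairs at or above the threshold
    return [(t, float(b)) for t, b in PAT.findall(log_text) if float(b) >= threshold]

with open("session.log") as f:  # hypothetical session log path
    for thread, busy in busy_stages(f.read()):
        print("%s: %.1f%% busy (candidate for a pass-through partition point)" % (thread, busy))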

38
Informatica Resources
 Professional Services
 Contact the Regional Manager for details and pricing

 Educational Services
 Performance & Tuning Courses

 Informatica Methodology
 http://www1.informatica.com/methodology

 Informatica Developer Network downloads & forums
 http://devnet.informatica.com

39