Improve Job Performance in Informatica
INDEX
1. INTRODUCTION
   1.1. Datawarehousing
   1.2. Informatica as a datawarehousing tool
   1.3. Need for Performance Tuning
2. IDENTIFICATION OF BOTTLENECKS
   2.1. Identify bottleneck in Source
   2.2. Identify bottleneck in Target
   2.3. Identify bottleneck in Transformation
   2.4. Identify bottleneck in Sessions
3. PERFORMANCE TUNING OF SOURCES
4. PERFORMANCE TUNING OF TARGETS
5. PERFORMANCE TUNING OF LOOKUP TRANSFORMATIONS
6. PERFORMANCE TUNING OF OTHER TRANSFORMATIONS
   6.1. Update Strategy Transformation
   6.2. Sequence Generator Transformation
   6.3. Sorter Transformation
   6.4. Aggregator Transformation
   6.5. Joiner Transformation
   6.6. Filter Transformation
   6.7. Expression Transformation
7. PERFORMANCE TUNING OF MAPPINGS
8. PERFORMANCE TUNING OF SESSIONS
9. DATABASE OPTIMISATION
10. OPTIMUM CACHE SIZE IN LOOKUPS
    Calculating Lookup Index Cache
    Calculating Lookup Data Cache
1. INTRODUCTION
1.1. Datawarehousing
The classic definition of a datawarehouse was given by William Inmon, a pioneer in the field who popularized the term: "a subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management decisions." A datawarehouse is a place where a wide variety of data is prepared, organized and presented to its users in the best possible way. It helps consolidate information stored in heterogeneous business systems. It is a database that does not delete, purge or update records, and hence is a valuable historical store. The success of any business depends on its users, and a datawarehouse provides several advantages for a company struggling to give its business users an effective decision-support solution: it publishes the organization's data assets and provides high-quality information so that management can make timely, consistent and reliable decisions that affect the business.
performance. This document lists all the techniques available to tune Informatica performance.
2. IDENTIFICATION OF BOTTLENECKS
The performance of Informatica is dependent on the performance of its components: the database, the network, transformations, mappings, sessions and so on. To tune Informatica, we first have to identify the bottleneck, which may lie in the source, target, transformations, mapping, session, database or network. It is best to check the components in this order: source, target, transformations, mapping, session. After identifying the bottleneck, apply whichever tuning mechanisms are applicable to the project.
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
If a thread's busy percentage is 100, that stage is the bottleneck; thread statistics like these are the primary means of identifying the cause of a performance issue. Once the Collect Performance Data option (in the session's Properties tab) is enabled, all performance-related information appears in the log the session creates.
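The busy percentage in those log lines is derived from the run and idle times. As a quick sanity check when reading thread statistics, the arithmetic can be sketched as follows (the helper name is mine, not an Informatica API):

```python
# Hypothetical helper: derive a thread's busy percentage from the
# Total Run Time and Total Idle Time reported in PETL_24018/24019
# log lines. A stage with busy percentage near 100 is the bottleneck.
def busy_percentage(total_run_secs: float, total_idle_secs: float) -> float:
    """Busy % = time spent working / total run time * 100."""
    if total_run_secs == 0:
        return 0.0
    return (total_run_secs - total_idle_secs) / total_run_secs * 100

# Figures from the sample log above:
reader = busy_percentage(11.703201, 9.560945)  # ~18.3: mostly idle
transf = busy_percentage(11.764368, 0.0)       # 100.0: the bottleneck
```

Here the reader thread is idle most of the time while the transformation thread is busy throughout, which points at the transformation stage.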
… then using persistent cache will help the sessions reuse cache files. In the case of static lookups, cache files are built from the memory cache instead of from the database, which improves performance.
10. If both the source and the lookup table are huge, also use persistent cache.
11. If the target table is the lookup table, use dynamic cache. The Informatica server updates the lookup cache as it passes rows to the target.
12. Use only the lookups we need in the mapping. Too many lookups inside a mapping will slow down the session.
13. If the lookup table holds a lot of data, it will take too long to cache or will not fit in memory. In that case, move those fields to the source qualifier and join with the main table there.
14. If several lookups use the same data set, share the caches.
15. If only one row will be returned, use an unconnected lookup.
16. Data is read into the cache in the order the fields are listed in the lookup ports. If we have an index that is even partially in this order, loading of the lookup cache can be sped up.
17. If the lookup table has an index (or if we have the privilege to add an index in the database, do so), performance improves for both cached and uncached lookups.
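The benefit of a lookup cache can be illustrated with a minimal sketch (this is not Informatica's internal implementation; the table and field names are invented): the lookup table is read once into memory, after which each source row costs an O(1) in-memory probe instead of a database round trip.

```python
# Minimal sketch of a static lookup cache: read the lookup table once,
# then resolve every source row from memory instead of querying the
# database per row. Table contents and field names are hypothetical.
def build_lookup_cache(rows, key_col, value_col):
    """One pass over the lookup table builds the in-memory cache."""
    return {row[key_col]: row[value_col] for row in rows}

lookup_table = [
    {"ITEM_ID": 1, "ITEM_NAME": "bolt"},
    {"ITEM_ID": 2, "ITEM_NAME": "nut"},
]
cache = build_lookup_cache(lookup_table, "ITEM_ID", "ITEM_NAME")

source_rows = [1, 2, 1, 2]
# Each resolution is a dict probe; no database access per row.
resolved = [cache.get(item_id) for item_id in source_rows]
```

Persistent cache extends the same idea across sessions by saving the built cache to files so subsequent runs can skip rebuilding it.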
An Update Strategy transformation determines whether a row should be inserted, updated, deleted or rejected in the target.
1. Use the Update Strategy transformation as little as possible in the mapping.
2. Do not use an Update Strategy transformation if we just want to insert into the target table; use a direct mapping, direct filtering, etc. instead.
3. For updating or deleting rows from the target table, the Update Strategy transformation itself can be used.
3. We can also generate the sequence in the source qualifier by adding a dummy field to the source definition and source qualifier, and then overriding the SQL query, e.g. select seq_name.nextval, <other column names> from <source table name> where <condition, if any>. Here seq_name is an Oracle sequence object that generates primary keys for our source table, and seq_name.nextval returns its next value. This method of primary key generation is faster than using a Sequence Generator transformation.
names in that list. For better performance it is best to order by the indexed field of that table.
9. Combine mappings that use the same set of source data.
10. Within a mapping, a field carrying the same information should keep the same type and length throughout; otherwise time is spent on field conversions.
11. Instead of doing complex calculations in the source query, use an Expression transformation and do the calculation in the mapping.
12. If data passes through multiple staging areas, removing a staging area will increase performance.
13. Stored procedures reduce performance. Try to keep the stored procedures in the mappings simple.
14. Avoid unnecessary data type conversions, since they impact performance.
15. Transformation errors degrade performance. Try running the mapping with all transformations removed; if it takes significantly less time, the transformations need fine-tuning.
16. Keep database interactions to a minimum.
6. Increase the database commit interval (the point at which the Informatica server commits data to the target table; for example, it can be set to commit every 50,000 records).
7. Avoid built-in functions as much as possible. For example, for concatenation the || operator is faster than the CONCAT() function, so use operators instead of functions where possible. Functions like IS_SPACES(), IS_NUMBER(), IIF() and DECODE() reduce performance to a large extent, in that order; preference should be in the opposite order.
8. String functions like SUBSTR, LTRIM and RTRIM reduce performance. In the sources, use delimited strings in the case of flat files, or use the varchar data type.
9. Manipulating high-precision data types slows down the Informatica server, so disable high precision.
10. Localize all source and target tables, stored procedures, views, sequences, etc. Try not to connect across synonyms; synonyms and aliases slow down performance.
9. DATABASE OPTIMISATION
To get the best Informatica performance, the database tables, stored procedures and queries used in Informatica should be well tuned.
1. If the source and target are flat files, they should reside on the machine where the Informatica server runs.
2. Increase the network packet size.
3. The performance of the Informatica server is tied to its network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster; network connections therefore often affect session performance, so minimize them.
4. Optimize the target databases.
To calculate the minimum lookup index cache size, use the formula:
  Minimum index cache size = 200 * [<column size> + 16]
To calculate the maximum lookup index cache size, use the formula:
  Maximum index cache size = <number of rows in lookup table> * [<column size> + 16] * 2
Example: Suppose the lookup table is keyed on the field ITEM_ID, with the lookup condition ITEM_ID = IN_ITEM_ID1. ITEM_ID has data type integer and size 16, so the total column size is 16. The table contains 60,000 rows.
  Minimum lookup index cache size = 200 * [16 + 16] = 6,400
  Maximum lookup index cache size = 60,000 * [16 + 16] * 2 = 3,840,000
So this Lookup transformation needs an index cache size between 6,400 and 3,840,000 bytes. For best session performance, give it the full 3,840,000 bytes.
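The two formulas above can be expressed as small functions for reuse when sizing other lookups (the function names are mine; the arithmetic follows the document, with 16 bytes of overhead added to each column size and the maximum doubled):

```python
# Lookup index cache sizing, per the formulas above.
def min_index_cache(column_size: int) -> int:
    """Minimum index cache size in bytes: 200 * (column size + 16)."""
    return 200 * (column_size + 16)

def max_index_cache(rows: int, column_size: int) -> int:
    """Maximum index cache size in bytes: rows * (column size + 16) * 2."""
    return rows * (column_size + 16) * 2

# Worked example from the text: ITEM_ID, size 16, 60,000 rows.
lo = min_index_cache(16)          # 6,400 bytes
hi = max_index_cache(60000, 16)   # 3,840,000 bytes
```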
*********************