Error Handling Techniques - PowerCenter Mappings

Last Updated: June 27, 2015

Challenge

Attempting to load data containing errors can directly impact load performance and data quality given enough volume, and the downstream impacts can affect business decisions at all levels. A mapping's target system constraints are not efficient filters for valid data when loading high volumes; an alternative approach is required.

Description

Identifying errors and creating an error handling strategy is an essential part of a data integration project. Informatica's best practice recommendation is to use the mapping to identify and capture data errors and then make them available for further processing or correction through continuing data quality management processes.

In the production environment, data must be checked and validated prior to entry into the target system. One strategy for catching data errors is to use PowerCenter mappings and error logging capabilities to catch specific data validation errors as well as unexpected transformation or database constraint errors.

Data Validation Errors

The first step in using a mapping to trap data validation errors is to understand and identify the error handling requirements. Consider the following questions:

* What types of data errors are likely to be encountered?
* Of these errors, which ones should be captured?
* What process can capture the possible errors?
* Should errors be captured before they have a chance to be written to the target database?
* Will any of these errors need to be reloaded or corrected?
* How will the users know if errors are encountered?
* How will the errors be stored?
* Should descriptions be assigned for individual errors?
* Can a table be designed to store captured errors and the error descriptions?

Capturing data errors within a mapping and re-routing these errors to an error table facilitates analysis by end users and improves performance. One practical application of the mapping approach is to capture foreign key constraint errors (e.g., by executing a lookup on a dimension table prior to loading a fact table). Referential integrity is assured by including this sort of functionality in a mapping. While the database still enforces the foreign key constraints, erroneous data is not written to the target table; constraint errors are captured within the mapping so that the PowerCenter server does not have to write them to the session log and the reject/bad file, thus improving performance.

Data content errors can also be captured in a mapping. Mapping logic can identify content errors and attach descriptions to them. This approach can be effective for many types of data content error, including date conversion errors, null values intended for not null target fields, and incorrect data formats or data types.

Sample Mapping Approach for Data Validation Errors

In the following example, customer data is to be checked to ensure that invalid null values are intercepted before being written to not null columns in a target CUSTOMER table. Once a null value is identified, the row containing the error is to be separated from the data flow and logged in an error table.
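Before building the mapping, it can be useful to gauge how many incoming rows would actually violate the target's not null constraints. A simple profiling query such as the sketch below can answer that; the staging table CUSTOMER_SRC and its column names are assumptions used purely for illustration.

```sql
-- Rough profile of rows that would violate NOT NULL constraints in the target CUSTOMER table.
-- CUSTOMER_SRC is a hypothetical staging table holding the incoming customer rows.
SELECT COUNT(*)                                          AS total_rows,
       SUM(CASE WHEN name    IS NULL THEN 1 ELSE 0 END)  AS null_name_rows,
       SUM(CASE WHEN dob     IS NULL THEN 1 ELSE 0 END)  AS null_dob_rows,
       SUM(CASE WHEN address IS NULL THEN 1 ELSE 0 END)  AS null_address_rows
FROM   customer_src;
```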
One solution is to implement a mapping similar to the one shown below:

(Figure: sample mapping - source, address lookup, validation expression, error router, sequence generator, and the CUSTOMER and error targets.)

Once the customer data has been read from the source table, a lookup on the address table validates the customer's address. The next transformation, an expression, validates the source data against name and address validation rules (for example, checking whether the customer's name is NULL) and flags records that have one or more errors. A router transformation then separates valid rows from those containing errors. The valid records are loaded directly to the target CUSTOMER table.

It is good practice to append error rows with a unique key; this can be a composite consisting of a MAPPING_ID and a ROW_ID, for example. The MAPPING_ID refers to the mapping name, and the ROW_ID is created by a sequence generator. The composite key is designed to allow developers to trace rows written to the error tables, which store information useful for error reporting and investigation. In this example, two error tables are suggested: CUSTOMER_ERR and ERR_DESC_TBL.

(Figure: suggested error table structures. CUSTOMER_ERR: NAME varchar2, DOB date, ADDRESS varchar2, ROW_ID number, MAPPING_ID varchar2. ERR_DESC_TBL: FOLDER_NAME varchar2, ROW_ID number, MAPPING_ID varchar2, ERROR_DESC varchar2, LOAD_DATE date, SOURCE varchar2, TARGET varchar2.)

The table ERR_DESC_TBL is designed to hold information about the error, such as the folder name, mapping name, ROW_ID, and error description. This table can be used to hold the data validation error descriptions for all mappings, giving a single point of reference for reporting. The other audit columns, such as LOAD_DATE, are captured when the error is recorded.

The CUSTOMER_ERR table can be an almost exact copy of the target CUSTOMER table:

* All columns are defined as optional to allow null values to be stored.
* Two additional columns are added: ROW_ID and MAPPING_ID. These columns allow the two error tables to be joined.

The CUSTOMER_ERR table stores the entire rejected row, enabling the user to trace error rows back to the source and potentially build mappings to reprocess them.

The mapping logic must assign a unique description to each error in the rejected row. In this example, any null value intended for a not null target field could generate an error message such as 'NAME is NULL' or 'DOB is NULL'. This step can be done in an expression transformation (e.g., EXP_VALIDATION in the sample mapping). After the error descriptions are assigned, the error row can be split into several rows, one for each possible error, using a normalizer transformation. After a single source row is normalized, the resulting rows can be filtered to leave only the errors that are actually present (i.e., each record can have zero to many errors). For example, if a row has three errors, three error rows are generated with appropriate error descriptions (ERROR_DESC) in the table ERR_DESC_TBL.
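For reference, the two error tables described above could be created with DDL along the following lines. This is a minimal sketch based on the column lists in the figure; the data types, lengths, and Oracle-style syntax are assumptions rather than a prescribed design.

```sql
-- Sketch of the two error tables from the example; data types and lengths are assumptions.
CREATE TABLE customer_err (
    name        VARCHAR2(100),   -- every column nullable so any rejected row can be stored as-is
    dob         DATE,
    address     VARCHAR2(240),
    row_id      NUMBER,          -- generated by the sequence generator in the mapping
    mapping_id  VARCHAR2(100)    -- name of the mapping that rejected the row
);

CREATE TABLE err_desc_tbl (
    folder_name VARCHAR2(100),
    row_id      NUMBER,
    mapping_id  VARCHAR2(100),
    error_desc  VARCHAR2(4000),  -- e.g. 'NAME is NULL'
    load_date   DATE,
    source      VARCHAR2(100),   -- e.g. CUSTOMER_FF
    target      VARCHAR2(100)    -- e.g. CUSTOMER
);
```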
The following tables show how the resulting error data could look.

Table Name: CUSTOMER_ERR

NAME | DOB | ADDRESS | ROW_ID | MAPPING_ID
NULL | NULL | NULL | 1 | DIM_LOAD

Table Name: ERR_DESC_TBL

FOLDER_NAME | MAPPING_ID | ROW_ID | ERROR_DESC | LOAD_DATE | SOURCE | TARGET
CUST | DIM_LOAD | 1 | NAME is NULL | 10/11/2006 | CUSTOMER_FF | CUSTOMER
CUST | DIM_LOAD | 1 | DOB is NULL | 10/11/2006 | CUSTOMER_FF | CUSTOMER
CUST | DIM_LOAD | 1 | ADDRESS is NULL | 10/11/2006 | CUSTOMER_FF | CUSTOMER

The efficiency of the mapping approach can be increased by employing reusable objects. Common logic should be placed in mapplets, which can be shared by multiple mappings. This improves productivity in implementing and managing the capture of data validation errors.

Data validation error handling can be extended by including mapping logic to grade error severity, for example by flagging data validation errors as 'soft' or 'hard':

* A 'hard' error can be defined as one that would fail when being written to the database, such as a constraint error.
* A 'soft' error can be defined as a data content error.

A record flagged as 'hard' can be filtered from the target and written to the error tables, while a record flagged as 'soft' can be written to both the target system and the error tables. This gives business analysts an opportunity to evaluate and correct data imperfections while still allowing the records to be processed for end-user reporting. Ultimately, business organizations need to decide whether the analysts should fix the data in the reject table or in the source systems.

The advantage of the mapping approach is that all errors are identified as either data errors or constraint errors and can be properly addressed. The mapping approach also reports errors based on projects or categories by identifying the mappings that contain errors. The most important aspect of the mapping approach, however, is its flexibility: once an error type is identified, the error handling logic can be placed anywhere within a mapping. By using the mapping approach to capture identified errors, the operations team can effectively communicate data quality issues to the business users.
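For example, once the error tables are being populated, an analyst or operations user might review outstanding rejects with a query along these lines. This is a sketch only, using the example tables and columns above:

```sql
-- Rejected customer rows together with every error description recorded for them.
SELECT c.row_id,
       c.mapping_id,
       c.name,
       c.dob,
       c.address,
       d.error_desc,
       d.load_date
FROM   customer_err c
JOIN   err_desc_tbl d
  ON   d.row_id     = c.row_id
 AND   d.mapping_id = c.mapping_id   -- the composite key that links the two error tables
ORDER BY d.load_date DESC, c.row_id;
```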
Constraint and Transformation Errors

Perfect data can never be guaranteed. Having implemented the mapping approach described above to detect errors and log them to an error table, how can we handle unexpected errors that arise during the load? For example, PowerCenter may apply the validated data to the database, yet the relational database management system (RDBMS) may reject it for some unexpected reason; an RDBMS may, for example, reject data if constraints are violated. Ideally, we would like to detect these database-level errors automatically and send them to the same error table used to store the 'soft' errors caught by the mapping approach described above.

In some cases, the 'stop on errors' session property can be set to '1' to prevent source data for which unhandled errors were encountered from being loaded. In this case, the process stops with a failure, the data must be corrected, and the entire source may need to be reloaded or recovered. This is not always an acceptable approach.

An alternative is to have the load process continue when records are rejected, and then reprocess only the records that were found to be in error. This can be achieved by setting the 'stop on errors' property to 0 and capturing all rejected rows in error logging tables or files. By default, the error messages from the RDBMS and any uncaught transformation errors are sent to the session log. Switching on error logging using the Relational Database option shown in Fig 1 redirects these messages to a selected database (an application-specific work area) in which four tables are automatically created:

* PMERR_MSG
* PMERR_DATA
* PMERR_TRANS
* PMERR_SESS

By creating an application-specific work area, table growth can be managed per application and the tables are easier to purge later. Note that Informatica does not recommend creating these tables in the Informatica repository database.

Fig 1: Error Logging Options: Session Log, Relational Database & File
(Figure: session Config Object / Error handling properties, including 'Stop on errors' and 'Error Log Type'.)

Note that flat files can also be used for separate error logging, as shown in Fig 1.

The PowerCenter Advanced Workflow Guide contains detailed information on the structure of these tables. The PMERR_MSG table stores the error messages that were encountered in a session. The following columns of this table allow us to retrieve any RDBMS errors:

* WORKFLOW_RUN_ID: unique identifier of the workflow run.
* SESS_INST_ID: unique identifier of the session.
* TRANS_NAME: name of the transformation where an error occurred. When an RDBMS error occurs, this is the name of the target transformation.
* TRANS_ROW_ID: the row ID generated by the last active source. This field contains the row number at the target when the error occurred.
* ERROR_MSG: error message generated by the RDBMS.

The PMERR_SESS table contains information about the session in which the errors were generated. The combination of these tables can be used to transfer the records into the applicable error table. This can be done using a post-load session (i.e., an additional PowerCenter session) that reads the PMERR tables and inserts the error details into ERR_DESC_TBL. When the post-load process ends, ERR_DESC_TBL contains both 'soft' errors and 'hard' errors.

One problem with capturing RDBMS errors in this way is mapping them back to the relevant source key to provide lineage. This can be difficult when the source and target rows are not directly related (i.e., one source row can result in zero or more rows at the target). In this case, the mapping that loads the source must write translation data to a staging table (including the source key and target row number). The translation table can then be used by the post-load session to identify the source key from the target row number retrieved from the error log. The source key stored in the translation table could be a row number in the case of a flat file, or a primary key in the case of a relational data source.
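In SQL terms, the post-load session described above essentially performs an insert-select from the PMERR tables into ERR_DESC_TBL. The following is a sketch only: the PMERR_MSG columns are those listed above, while the PMERR_SESS columns used here (FOLDER_NAME, MAPPING_NAME) and the hard-coded source name are assumptions that should be verified against the Advanced Workflow Guide and the actual mapping.

```sql
-- Copy database and transformation errors captured by PowerCenter into the common error table.
INSERT INTO err_desc_tbl
    (folder_name, mapping_id, row_id, error_desc, load_date, source, target)
SELECT s.folder_name,
       s.mapping_name,
       m.trans_row_id,    -- row number at the target; may need translating back to the source key
       m.error_msg,       -- message generated by the RDBMS
       SYSDATE,
       'CUSTOMER_FF',     -- source name hard-coded here purely for illustration
       m.trans_name       -- target transformation where the error occurred
FROM   pmerr_msg  m
JOIN   pmerr_sess s
  ON   s.workflow_run_id = m.workflow_run_id
 AND   s.sess_inst_id    = m.sess_inst_id;
```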
Typically, relational database logging is preferred because the error log is queryable and permissions can be managed through the database layer. In a high-volume environment with a large number of untrapped errors, however, performance considerations may make file logging preferable to relational database logging when error logging is taken out of the session log file.

PowerCenter provides built-in functions such as ERROR() and ABORT() that can be used in a transformation. ERROR() makes the PowerCenter Integration Service skip a row, whereas ABORT() stops the session. Both functions issue a user-defined error message that is captured in the session log or written to the error log tables, depending on the session configuration. An example of the ERROR() function is shown below; here the Start Date field is validated to check whether it is greater than the current date.

(Figure: Expression Editor showing a port expression that calls ERROR('Invalid Start Date') when START_DT > SYSDATE.)

Reprocessing

After the load and post-load sessions are complete, the error table (e.g., ERR_DESC_TBL) can be analyzed by members of the business or operational teams. The rows listed in this table have not been loaded into the target database. The operations team can therefore fix the source data that resulted in 'soft' errors and may be able to explain and remediate the 'hard' errors.

Once the errors have been fixed, the source data can be reloaded. Ideally, only the rows that resulted in errors during the first run should be reprocessed in the reload. This can be achieved by including a filter and a lookup in the original load mapping and using a parameter to configure the mapping for an initial load or a reprocess load. When reprocessing, the lookup searches for each source row number in the error table, while the filter removes source rows for which the lookup has not found errors. When loading initially, all rows are passed through the filter, validated, and loaded.

With this approach, the same mapping can be used for initial and reprocess loads. During a reprocess run, the records successfully loaded should be deleted (or marked for deletion) from the error table, while any new errors encountered should be inserted as in an initial run. On completion, the post-load process is executed to capture any new RDBMS errors. This ensures that reprocessing loads are repeatable and that the number of records in the error table reduces over time.
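As an illustration, the reprocess-mode lookup and the post-run cleanup described above might translate into SQL along these lines. This is a sketch using the example DIM_LOAD mapping and the error tables defined earlier, not a prescribed implementation:

```sql
-- Rows previously rejected by the DIM_LOAD mapping; this is essentially what the
-- reprocess-mode lookup caches so the filter can drop source rows without errors.
SELECT DISTINCT row_id
FROM   customer_err
WHERE  mapping_id = 'DIM_LOAD';

-- After the corrected rows have been reloaded successfully, clear them from the error tables
-- (row_id = 1 is the example row shown in the tables above).
DELETE FROM err_desc_tbl WHERE mapping_id = 'DIM_LOAD' AND row_id = 1;
DELETE FROM customer_err WHERE mapping_id = 'DIM_LOAD' AND row_id = 1;
```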

Tags: Big Data, Data Synchronization, Data Warehousing, Master Data Management