
Informatica Training - Presentation Transcript

1. ETL Tool – Informatica Training


2. Course Objectives
o At the end of this course you will:
o Understand how to use all major PowerCenter 7 components
o Be able to perform basic Repository administration tasks
o Be able to build basic ETL Mappings and Mapplets
o Be able to create, run and monitor Workflows
o Be able to troubleshoot most problems
3. PowerCenter 7 Architecture
[Diagram: heterogeneous Sources and Targets connect to the Server natively; the
Repository Server and its Repository Agent talk to the Repository natively and to the
client tools -- Designer, Workflow Manager, Workflow Monitor, Repository Manager and
the Repository Server Administrative Console -- over TCP/IP. Not shown: client ODBC
connections for Source and Target metadata.]
4. Repository Server
o Each Repository has an independent architecture for the management of the
physical Repository tables
o Components: one Repository Server, and a Repository Agent for each Repository

Client overhead for Repository management is greatly reduced by the Repository Server.
[Diagram: the Repository Manager and the Repository Server Administration Console
connect to the Repository Server, whose Repository Agent manages the Repository]

5. Design Process
o Create Source definition(s)
o Create Target definition(s)
o Create a Mapping
o Create a Session Task
o Create a Workflow from Task components
o Run the Workflow and verify the results
6. Methods of Analyzing Sources
o Import from Database
o Import from File
o Import from Cobol File
o Import from XML file
o Create manually

[Diagram: the Source Analyzer imports Relational, Flat file, COBOL file and XML file
definitions into the Repository]

7. Creating Target Definitions


o Methods of creating Target Definitions
o Import from Database
o Import from an XML file
o Manual Creation
o Automatic Creation
8. Import Definition from Database
o Can “reverse engineer” existing object definitions from a database system
catalog or data dictionary

[Diagram: the Warehouse Designer imports Table, View and Synonym definitions from the
database via ODBC; the definition is written to the Repository through the Repository
Server and Repository Agent]

9. Transformation Types
Informatica PowerCenter 6 provides 17 objects for data transformation:
o Source Qualifier: reads data from Flat File and Relational Sources
o XML Source Qualifier: reads XML data
o Normalizer: reorganizes records from VSAM, Relational and Flat File
o Expression: performs row-level calculations
o Aggregator: performs aggregate calculations
o Filter: drops rows conditionally
o Router: splits rows conditionally
o Sorter: sorts data

10. Transformation Types


o Update Strategy: tags rows for insert, update, delete, reject
o Lookup: looks up values and passes them to other objects
o Joiner: joins heterogeneous sources
o Stored Procedure: calls a database stored procedure
o Sequence Generator: generates unique ID values
o Rank: limits records to the top or bottom of a range

Transformation objects (continued)

11. Filter Transformation Drops rows conditionally
o Active Transformation
o Connected
o Ports
 All input / output
o Specify a Filter condition
o Usage
 Filter rows from flat file sources
 Single pass source(s) into multiple targets
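The Filter's row-dropping behaviour can be sketched in plain Python (a hedched illustration, not Informatica code; the field names and the sample condition are invented):

```python
# A minimal sketch of what a Filter transformation does: rows that satisfy
# the Filter condition pass through; all others are dropped.

def filter_transformation(rows, condition):
    """Keep only the rows for which the Filter condition evaluates true."""
    return [row for row in rows if condition(row)]

rows = [
    {"item_id": 1313, "price": 250.00},
    {"item_id": 1314, "price": 365.00},
    {"item_id": 1390, "price": 170.00},
]

# Equivalent of a Filter condition such as PRICE > 200
passed = filter_transformation(rows, lambda r: r["price"] > 200)
print([r["item_id"] for r in passed])  # [1313, 1314]
```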

12. Aggregator Transformation


o Active Transformation
o Connected
o Ports
 Mixed
 Variables allowed
 Group By allowed
o Create expressions in output or variable ports
o Usage
 Standard aggregations

Performs aggregate calculations
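The Group By behaviour above can be sketched in plain Python (an illustration only, not Informatica code; field names are hypothetical and SUM stands in for any standard aggregation):

```python
from collections import defaultdict

# A rough sketch of an Aggregator: rows are grouped on a Group By port and
# an output expression (here a SUM) is computed once per group.

def aggregator(rows, group_by, value_field):
    """Return one summed output row per distinct Group By value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[value_field]
    return dict(totals)

sales = [
    {"category": "Air Regulators", "price": 250.00},
    {"category": "Air Regulators", "price": 365.00},
    {"category": "Small Instruments", "price": 105.00},
]
print(aggregator(sales, "category", "price"))
```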

13. Incremental Aggregation Trigger in Session Properties, Performance Tab
o Cache is saved into $PMCacheDir:
 aggregatorname.DAT
 aggregatorname.IDX
o Upon next run, files are overwritten with new cache information

Best Practice is to copy these files in case a rerun of the data is ever required.
Reinitialize the cache when it is no longer needed, e.g. at the beginning of
new-month processing. Example: MTD calculation -- when triggered, the PowerCenter
Server saves the new MTD totals; on the next run it subtracts the old totals from
the new ones and passes the difference forward.

14. Joiner Transformation


o Active Transformation
o Connected
o Ports
 All input or input / output
 “M” denotes port comes from the master source
o Specify the Join condition
o Usage
 Join two flat files
 Join two tables from different databases
 Join a flat file with a relational table

Performs heterogeneous joins on records from different databases or flat file sources
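A heterogeneous join can be sketched in plain Python (a simplified illustration with invented names; the real Joiner also supports master outer, detail outer and full outer join types):

```python
# Joiner sketch: the master side (the "M" ports) is indexed first, then
# each detail row is matched against it on the join condition.

def joiner(master_rows, detail_rows, key):
    """Normal join of a master source with a detail source on an equality key."""
    master_index = {row[key]: row for row in master_rows}
    joined = []
    for detail in detail_rows:
        master = master_index.get(detail[key])
        if master is not None:
            joined.append({**master, **detail})
    return joined

# e.g. a flat file joined with a relational table
flat_file = [{"item_id": 1313, "name": "Regulator System"}]
db_table = [{"item_id": 1313, "price": 250.00},
            {"item_id": 9999, "price": 10.00}]
print(joiner(flat_file, db_table, "item_id"))
```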

15. Mid-Mapping Join


o The Joiner does not accept input in the following situations:
 Both input pipelines begin with the same Source Qualifier
 Both input pipelines begin with the same Normalizer
 Both input pipelines begin with the same Joiner
 Either input pipeline contains an Update Strategy
16. Sorter Transformation Sorts data from any source, at any point in a data flow
o Active Transformation
o Connected
o Ports
o Input/Output
o Define one or more sort keys
o Define sort order for each key
o Usage
o Sort data from Flat Files
o Re-sort data in a mapping


17. Sequence Generator Transformation Generates unique keys for any port on a row
o Passive Transformation
o Connected
o Ports
 Two predefined output ports: NEXTVAL and CURRVAL
 No input ports allowed
o Usage
 Generate sequence numbers
 Shareable across mappings
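The NEXTVAL/CURRVAL ports can be sketched as a toy Python class (simplified semantics only; the real transformation also supports configurable start value, increment, cycling and caching):

```python
# Sequence Generator sketch: each row pulling NEXTVAL receives the next
# unique value; CURRVAL reflects the last value handed out.

class SequenceGenerator:
    def __init__(self, start=1, increment=1):
        self.increment = increment
        self.currval = start - increment  # nothing handed out yet

    def nextval(self):
        """Advance the sequence and return the new unique value."""
        self.currval += self.increment
        return self.currval

seq = SequenceGenerator()
keys = [seq.nextval() for _ in range(3)]
print(keys, seq.currval)  # [1, 2, 3] 3
```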
18. Lookup Transformation Looks up values in a database table and provides data to other
components in a Mapping
o Passive Transformation
o Connected / Unconnected
o Ports
o Mixed
o “L” denotes Lookup port
o “R” denotes port used as a return value (unconnected Lookup only)
o Specify the Lookup Condition
o Usage
o Get related values
o Verify if a record exists or if data has changed
19. To Cache or not to Cache?
o Caching can significantly impact performance
o Cached
 Lookup table data is cached locally on the Server
 Mapping rows are looked up against the cache
 Only one SQL SELECT is needed
o Uncached
 Each Mapping row needs one SQL SELECT
o Rule of Thumb: Cache if the number (and size) of records in the Lookup table is
small relative to the number of mapping rows requiring lookup
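The trade-off can be sketched schematically in plain Python (invented names; a real uncached lookup would issue one SQL SELECT per row against the database):

```python
# Cached vs. uncached lookup sketch: cached does one bulk read and then
# probes local memory per mapping row; uncached queries once per row.

def cached_lookup(mapping_rows, read_lookup_table, key):
    """One bulk read builds a local cache; every mapping row probes the cache."""
    cache = {r[key]: r for r in read_lookup_table()}  # the single SELECT
    return [cache.get(row[key]) for row in mapping_rows]

def uncached_lookup(mapping_rows, select_one, key):
    """One query per mapping row, sent straight to the source."""
    return [select_one(row[key]) for row in mapping_rows]

# Stand-in for the lookup table (would live in a database in practice)
table = [{"id": 1, "desc": "Air Regulators"}, {"id": 2, "desc": "Gauges"}]
rows = [{"id": 2}, {"id": 1}, {"id": 2}]

hits = cached_lookup(rows, lambda: table, "id")
print([h["desc"] for h in hits])  # ['Gauges', 'Air Regulators', 'Gauges']
```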
20. Update Strategy Transformation Used to specify how each individual row will be used to
update target tables (insert, update, delete, reject)
o Active Transformation
o Connected
o Ports
o All input / output
o Specify the Update Strategy Expression
o Usage
o Updating Slowly Changing Dimensions
o IIF or DECODE logic determines how to handle the record
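The IIF/DECODE-style tagging can be sketched in plain Python. The numeric flags match the row indicators PowerCenter writes (0 = insert, 1 = update, 2 = delete, 3 = reject); the "status" field and its values are hypothetical:

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def update_strategy(row):
    """Tag each row with how it should be applied to the target table."""
    if row["status"] == "new":
        return DD_INSERT
    if row["status"] == "changed":
        return DD_UPDATE
    if row["status"] == "obsolete":
        return DD_DELETE
    return DD_REJECT

tags = [update_strategy(r) for r in
        [{"status": "new"}, {"status": "changed"}, {"status": "bad"}]]
print(tags)  # [0, 1, 3]
```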
21. Router Transformation Rows sent to multiple filter conditions
o Active Transformation
o Connected
o Ports
o All input/output
o Specify filter conditions for each Group
o Usage
o Link source data in one pass to multiple filter conditions
22. Router Groups
o Input group (always one)
o User-defined groups
o Each group has one condition
o ALL group conditions are evaluated for EACH row
o One row can pass multiple conditions
o Unlinked Group outputs are ignored
o Default group (always one) can capture rows that fail all Group conditions
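The group semantics above can be sketched in plain Python (an illustration only; group names and conditions are invented):

```python
# Router sketch: every group condition is evaluated for every row, one row
# may land in several groups, and rows failing all conditions fall into
# the default group.

def router(rows, groups):
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                out[name].append(row)
                matched = True
        if not matched:
            out["DEFAULT"].append(row)
    return out

routed = router(
    [{"price": 300}, {"price": 50}],
    {"EXPENSIVE": lambda r: r["price"] > 200,       # 300 passes
     "ROUND": lambda r: r["price"] % 100 == 0},     # 300 passes here too
)
print({k: len(v) for k, v in routed.items()})
```

Note that the 300-priced row satisfies both conditions and is emitted by both groups, while the 50-priced row fails every condition and is captured by the default group.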
23. Mappings
o By the end of this section you will be familiar with:
o Mapping components
o Source Qualifier transformation
o Mapplets
o Mapping validation
o Data flow rules
o System Variables
o Mapping Parameters and Variables
24. Mapplet Advantages
o Useful for repetitive tasks / logic
o Represents a set of transformations
o Mapplets are reusable
o Use an ‘instance’ of a Mapplet in a Mapping
o Changes to a Mapplet are inherited by all instances
o Server expands the Mapplet at runtime
25. Unsupported Transformations
o Use any transformation in a Mapplet except:
o XML Source definitions
o COBOL Source definitions
o Normalizer
o Pre- and Post-Session stored procedures
o Target definitions
o Other Mapplets
26. Object Sharing
o Reuse existing objects
o Enforces consistency
o Decreases development time
o Share objects by using copies and shortcuts
o Required security settings for sharing objects:
 Repository Privilege: Use Designer
 Originating Folder Permission: Read
 Destination Folder Permissions: Read/Write

SHORTCUT: link to an object in another folder; created from a shared folder;
preserves space; dynamically reflects changes to the original object.
COPY: copy an object to another folder; copy from a shared or unshared folder;
duplicates space; changes to the original object are not captured.

27. Data Flow Rules


o Each Source Qualifier starts a single data stream (a dataflow)
o Transformations can send rows to more than one transformation (split one data
flow into multiple pipelines)
o Two or more data flows can meet together -- if (and only if) they originate from a
common active transformation
 Cannot add an active transformation into the mix

The example also holds with a Normalizer in lieu of the Source Qualifier.
Exceptions are the Mapplet Input and Joiner transformations.
[Diagram: merging two flows into an Active transformation -- DISALLOWED; merging
them into a Passive transformation -- ALLOWED]

28. Mapping Validation


o Mappings must:
 Be valid for a Session to run
 Be end-to-end complete and contain valid expressions
 Pass all data flow rules
o Mappings are always validated when saved; can be validated without being saved
o Output Window will always display reason for invalidity
29. Workflows
o By the end of this section, you will be familiar with:
o The Workflow Manager GUI interface
o Workflow Schedules
o Setting up Server Connections
 Relational, FTP and External Loader
o Creating and configuring Workflows
o Workflow properties
o Workflow components
o Workflow Tasks
30. Workflow Manager Tools
o Workflow Designer
 Maps the execution order and dependencies of Sessions, Tasks and
Worklets, for the Informatica Server
o Task Developer
 Create Session, Shell Command and Email tasks
 Tasks created in the Task Developer are reusable
o Worklet Designer
 Creates objects that represent a set of tasks
 Worklet objects are reusable
31. Workflow Scheduler Objects
o Setup reusable schedules to associate with multiple Workflows
 Used in Workflows and Session Tasks
32. Server Connections
o Configure Server data access connections
 Used in Session Tasks
o Configure:
o Relational
o MQ Series
o FTP
o Custom
o External Loader
33. Task Developer
o Create basic Reusable “building blocks” – to use in any Workflow
o Reusable Tasks
 Session -- Set of instructions to execute Mapping logic
 Command -- Specify OS shell / script command(s) to run during the Workflow
 Email -- Send email at any point in the Workflow

34. Session Tasks


o After this section, you will be familiar with:
o How to create and configure Session Tasks
o Session Task properties
o Transformation property overrides
o Reusable vs. non-reusable Sessions
o Session partitions
35. Session Task
o Created to execute the logic of a mapping (one mapping only)
o Session Tasks can be created in the Task Developer (reusable) or Workflow
Designer (Workflow-specific)
o Steps to create a Session Task
 Select the Session button from the Task Toolbar or
 Select menu Tasks | Create

[Screenshot: Session icon on the Task toolbar]

36. Session Task


o Steps to create a Session Task (continued)
 Double click on the Session object
 Valid Mappings are displayed in the dialog box
o Session Task tabs
 General
 Properties
 Config Object
 Sources
 Targets
 Components
 Transformation
 Partitions
 Metadata Extensions
37. Session Task
o Server instructions to run the logic of ONE specific Mapping
 e.g. - source and target data location specifications, memory allocation,
optional Mapping overrides, scheduling, processing and load instructions
o Becomes a component of a Workflow (or Worklet)
o If configured in the Task Developer, the Session Task is reusable (optional)
38. Monitor Workflows
o By the end of this section you will be familiar with:
o The Workflow Monitor GUI interface
o Monitoring views
o Server monitoring modes
o Filtering displayed items
o Actions initiated from the Workflow Monitor
o Truncating Monitor Logs
39. Monitor Workflows
o The Workflow Monitor is the tool for monitoring Workflows and Tasks
o Review details about a Workflow or Task in two views
 Gantt Chart view
 Task view


40. Monitoring Workflows


o Perform operations in the Workflow Monitor
 Restart -- restart a Task, Workflow or Worklet
 Stop -- stop a Task, Workflow, or Worklet
 Abort -- abort a Task, Workflow, or Worklet
 Resume -- resume a suspended Workflow after a failed Task is corrected
o View Session and Workflow logs
o Abort has a 60 second timeout
 If the Server has not completed processing and committing data during the
timeout period, the threads and processes associated with the Session are
killed

Stopping a Session Task means the Server stops reading data

41. Parameters and Variables


o By the end of this section you will understand:
 System Variables
 Creating Parameters and Variables
 Features and advantages
 Establishing values for Parameters and Variables
42. System Variables

$$$SessStartTime
o Returns the system date value as a string; uses the system clock on the machine
hosting the Informatica Server
 Format of the string is database type dependent
 Used in SQL overrides
 Has a constant value

SESSSTARTTIME
o Returns the system date value on the Informatica Server
 Used with any function that accepts transformation date/time data types
 Not to be used in a SQL override
 Has a constant value

SYSDATE
o Provides the current datetime on the Informatica Server machine
 Not a static value
43. Mapping Parameters and Variables
o Apply to all transformations within one Mapping
o Represent declared values
o Variables can change in value during run-time
o Parameters remain constant during run-time
o Provide increased development flexibility
o Defined in Mapping menu
o Format is $$VariableName or $$ParameterName
44. Mapping Parameters and Variables
o Sample declarations

[Screenshot: Variables and Parameters declared in the Designer Mappings menu, with
user-defined names, the appropriate aggregation type and an optional Initial Value]
45. Mapping Parameters and Variables Apply Parameter / Variable in formula
46. Functions to Set Mapping Variables
o SetCountVariable -- Counts the number of evaluated rows and increments or
decrements a mapping variable for each row
o SetMaxVariable -- Evaluates the value of a mapping variable to the higher of two
values
o SetMinVariable -- Evaluates the value of a mapping variable to the lower of two
values
o SetVariable -- Sets the value of a mapping variable to a specified value
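The aggregation each function applies as rows flow past can be sketched in plain Python (a conceptual illustration; the real functions also reconcile with the variable's value saved in the Repository between runs):

```python
# Sketch of SetCountVariable / SetMaxVariable / SetMinVariable semantics
# evaluated once per row in a single pass over the data.

def run_variable_functions(rows, field):
    count, max_seen, min_seen = 0, float("-inf"), float("inf")
    for row in rows:
        count += 1                            # SetCountVariable (increment)
        max_seen = max(max_seen, row[field])  # SetMaxVariable
        min_seen = min(min_seen, row[field])  # SetMinVariable
    return count, max_seen, min_seen

result = run_variable_functions(
    [{"price": 250.0}, {"price": 365.0}, {"price": 170.0}], "price")
print(result)  # (3, 365.0, 170.0)
```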
47. Error Handling (Row Level)
o Reject files on the Server store data rejected by the Writer and/or the target database
o Conditions causing data to be rejected include:
 Target database constraint violations, out-of-space errors, log space errors,
null values not accepted
 Data-driven records, containing value ‘3’ or DD_REJECT
 Target table properties ‘reject truncated/overflowed rows’

Row indicator (first column): 0 = INSERT, 1 = UPDATE, 2 = DELETE, 3 = REJECT
Column-level indicators: D = Data (valid), O = Overflow, N = Null, T = Truncated

0,D,1313,D,Regulator System,D,Air Regulators,D,250.00,D,150.00,D
1,D,1314,D,Second Stage Regulator,D,Air Regulators,D,365.00,D,265.00,D
2,D,1390,D,First Stage Regulator,D,Air Regulators,D,170.00,D,70.00,D
3,D,2341,D,Depth/Pressure Gauge,D,Small Instruments,D,105.00,D,5.00,D
Transformation errors are written to the Session log file, not to the .bad file
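The layout can be read back with a small parser, sketched here from the sample rows above (an inference from this example only; real .bad file layouts can vary with session settings):

```python
# Reject-file sketch: field 0 is the row indicator; thereafter each column
# value is followed by its single-letter column indicator (D/O/N/T).

ROW_TYPES = {"0": "INSERT", "1": "UPDATE", "2": "DELETE", "3": "REJECT"}

def parse_reject_row(line):
    """Split a .bad-file line into (row type, [(value, indicator), ...])."""
    fields = line.split(",")
    row_type = ROW_TYPES[fields[0]]
    # Pair each column value with the indicator that follows it
    pairs = list(zip(fields[2::2], fields[3::2]))
    return row_type, pairs

row_type, pairs = parse_reject_row(
    "3,D,2341,D,Depth/Pressure Gauge,D,Small Instruments,D,105.00,D,5.00,D")
print(row_type, pairs[0])  # REJECT ('2341', 'D')
```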

48.
o Thank you!
