
PowerCenter Basic Concepts

Ale Ribeiro
June 6, 2006

1
Agenda

• What is PowerCenter?
• PowerCenter Client Applications
• Demo
• PowerCenter – Designer, Workflow Manager, Workflow Monitor
• PowerCenter Architecture

• Where do we use PowerCenter in IT?


• Q&A

2
PowerCenter

• Is a single, unified enterprise data integration platform that allows
companies and government organizations of all sizes to access, discover,
and integrate data from virtually any business system, in any format, and
deliver that data throughout the enterprise at any speed
• An ETL Tool (Extract, Transform and Load)

3
PowerCenter Client Applications

Administration
• Repository Manager: manage repository
  • connections
  • folders
  • objects
  • users and groups
• Administration Console (browser-based): perform domain and repository service tasks
  • Create/configure nodes and repository services
  • Upgrade/delete
  • Start/stop
  • Backup/restore

Development
• Designer: create ETL mappings
• Workflow Manager: create and start workflows
• Workflow Monitor: monitor and control workflows

4
Designer Tools – Create mappings

• Source Analyzer: create source objects
• Target Designer: create target objects
• Transformation Developer: create reusable transformations
• Mapplet Designer: create mapplets
• Mapping Designer: create mappings

5
Mapping

Logically defines the ETL process:
• Reads data from sources
• Applies transformation logic to data
• Writes transformed data to targets

Source → Transformations → Target

Note: Sources and targets can be flat files, relational tables, XML files,
application systems, message queues, etc.

Unit 1 6
Mapping (cont’d)
• A mapping is a set of source and target definitions linked by transformation
objects that define the rules for data transformation. Mappings represent the
data flow between sources and targets. When the Integration Service runs a
session, it uses the instructions configured in the mapping to read,
transform, and write data.
• Every mapping must contain the following components:
  • Source definition. Describes the characteristics of a source table or file.
  • Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.
  • Target definition. Defines the target table or file.
  • Links. Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.
• A mapping can also contain one or more mapplets. A mapplet is a set of
transformations that you build in the Mapplet Designer and can use in
multiple mappings.

7
Example

• Give me an Excel file with the Total Order Amount per Customer. I also
need to know when this data was extracted (date) and the customer type
initial (first letter of the customer type)
• Define the sources
• Orders
• Customers
• Define any required transformation
• Sum of order amount
• Get extracted date
• Get first letter of customer type
• Create the file
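
The request above can be sketched outside PowerCenter to make the data flow concrete. This is only an illustration in plain Python, not how a mapping is built in the Designer: the sample rows, the field names (`customer_id`, `amount`, `customer_type`) and the function names are all hypothetical, and the output is written as CSV (which Excel can open) rather than a native Excel file.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for the Orders and Customers sources.
ORDERS = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.5},
]
CUSTOMERS = [
    {"customer_id": 1, "name": "Acme", "customer_type": "Retail"},
    {"customer_id": 2, "name": "Globex", "customer_type": "Wholesale"},
]


def build_report(orders, customers, extracted_on):
    """Return one row per customer with total, extract date and type initial."""
    # Aggregation step: sum order amounts per customer.
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]

    report = []
    for customer in customers:  # Join step: match totals to customers.
        report.append({
            "customer": customer["name"],
            "total_order_amount": totals.get(customer["customer_id"], 0.0),
            # Row-level expressions: extraction date and first letter of type.
            "extracted_date": extracted_on.isoformat(),
            "customer_type_initial": customer["customer_type"][:1],
        })
    return report


def write_csv(rows, stream):
    """Target step: write the rows as CSV."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    out = io.StringIO()
    write_csv(build_report(ORDERS, CUSTOMERS, date.today()), out)
    print(out.getvalue())
```

In a mapping, the three steps would roughly correspond to an Aggregator, a Joiner and an Expression transformation placed between the source and target definitions.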

8
Transformations

• Generate, modify, or pass data
• Data passes into and out of transformations through ports that you
link in a mapping
• Passive transformations do not change the number of rows received
• Active transformations can change the number of rows received
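
The passive/active distinction can be illustrated with two small functions. This is a plain-Python analogy, assuming rows are dictionaries; the function names and sample data are made up for the illustration and are not PowerCenter objects.

```python
def expression_upper_name(rows):
    """Passive: one output row per input row (row-level calculation)."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def filter_positive_amount(rows):
    """Active: may drop rows, so the row count can change."""
    return [row for row in rows if row["amount"] > 0]


rows = [
    {"name": "acme", "amount": 10},
    {"name": "globex", "amount": 0},
    {"name": "initech", "amount": 25},
]

passive_out = expression_upper_name(rows)
active_out = filter_positive_amount(passive_out)
print(len(rows), len(passive_out), len(active_out))  # prints: 3 3 2
```

The first function behaves like an Expression transformation (same row count in and out), the second like a Filter (fewer rows may come out than went in).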

Unit 1 9
PowerCenter Transformations (partial list)

Source Qualifier: reads data from flat file and relational sources
Expression: performs row-level calculations
Filter: drops rows conditionally
Sorter: sorts data
Aggregator: performs aggregate calculations
Joiner: joins heterogeneous sources
Lookup: looks up values and passes them to other objects
Update Strategy: tags rows for insert, update, delete, reject
Router: routes rows conditionally
Transaction Control: allows data-driven commits and rollbacks

10
Advanced PowerCenter Transformations

Union: merges rows from two or more data streams, like a SQL UNION ALL (duplicate rows are kept)
Java: allows Java syntax to be used within PowerCenter
Midstream XML Parser: reads XML from anywhere in mapping
Midstream XML Generator: writes XML to anywhere

More Source Qualifiers: read from XML, message queues and applications
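
As a rough analogy for the Union transformation listed above, the sketch below concatenates row streams the way a SQL UNION ALL does: rows are appended, not matched on a key, and duplicates are kept. The function name and the sample rows are hypothetical.

```python
from itertools import chain


def union_all(*streams):
    """Merge row streams end to end without removing duplicates."""
    return list(chain.from_iterable(streams))


stream_a = [{"id": 1}, {"id": 2}]
stream_b = [{"id": 2}, {"id": 3}]

merged = union_all(stream_a, stream_b)
print(len(merged))  # prints: 4 (the duplicate id 2 is kept)
```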

11
Mapplet – Set of transformations that can be reused

Mapplet Input and Output transformations pass data from or to the mapping

Mapplet Designer Tool

Unit 14 12
Example: Data Sources Defined Outside Mapplet

[Diagram: a mapping in which the source data is defined outside the mapplet; the mapplet contains a Mapplet Input transformation and a Mapplet Output transformation]

Unit 14 13
Recap

1. ETL a. Extract, transform and load data
2. Designer b. Create mapping objects
3. Mapping c. Logically defines the ETL process
4. Transformation d. Generates or manipulates data
5. Mapplet e. Set of transformations that can be reused in multiple mappings

14
Workflow Manager Tools – Create and Start Workflow

• Task Developer: create reusable tasks
• Worklet Designer: create worklets
• Workflow Designer: create workflows

15
Task

• An executable set of actions, functions or commands
• Examples:
  • Session task runs a mapping
  • Command task runs a shell script
  • Email task sends an email
  • Decision task branches workflow conditionally
  • Timer task waits for a specified period

16
Session

• Task that executes a mapping
• Define log options, error handling and connections

17
Decision Task
Tests for a condition during the workflow and sets a flag based on the condition
Use a link condition (or a Control task) downstream to test the flag and control execution flow
Can use workflow variables in the condition

All tasks have options to fail the parent and to disable the task
Inputs can be treated as AND or OR
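
The flag-then-branch pattern of a Decision task can be sketched in plain Python. This is an analogy only: the function names, the `rows_loaded` variable and the task names are hypothetical, and real link conditions are written in the Workflow Manager, not in Python.

```python
def run_decision(condition, variables):
    """Evaluate the decision condition against workflow variables."""
    return bool(condition(variables))


def run_branch(flag, on_true, on_false):
    """Link conditions: follow the branch that matches the flag."""
    return on_true() if flag else on_false()


# Hypothetical workflow variable holding a row count from an earlier session.
workflow_vars = {"rows_loaded": 120}

flag = run_decision(lambda v: v["rows_loaded"] > 0, workflow_vars)
next_task = run_branch(
    flag,
    on_true=lambda: "s_load_warehouse",
    on_false=lambda: "email_no_data",
)
print(next_task)  # prints: s_load_warehouse
```

Keeping the decision (setting the flag) separate from the branching (testing the flag) mirrors the slide: the Decision task only evaluates, and the downstream links decide what runs.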

Unit 16 18
Email Task
Sends an email within a workflow
Note: emails can also be sent post-session in a Session task
Can be used with a link condition to notify success or failure of prior
tasks

Unit 16 19
Event Wait Task
Pauses processing of the pipeline until a specified event occurs
Events can be:
Pre-defined – file watch
User-defined – created by an Event Raise task elsewhere in the workflow

Unit 17 20
Event Wait Task (cont’d)
Events Tab

Specify either a pre-defined or user-defined event

User-defined events must be declared in the workflow Events tab

21
Event Raise Task
Sets the location of a user-defined event in the workflow
User-defined events are triggered when the Integration Service executes
the Event Raise Task
User-defined events must be declared in the workflow Events tab

Used with the Event Wait Task
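
The interplay between Event Raise and Event Wait resembles a standard thread-synchronization event. The sketch below uses Python's `threading.Event` as an analogy; the event name `file_ready`, the two branch functions and the 5-second timeout are assumptions made for the illustration.

```python
import threading

# User-defined event, declared once at the "workflow" level.
file_ready = threading.Event()
log = []


def event_wait_branch():
    """Event Wait: block this branch until the event is raised."""
    raised = file_ready.wait(timeout=5)
    log.append("downstream task ran" if raised else "timed out")


def event_raise_branch():
    """Event Raise: mark the point in the workflow where the event fires."""
    log.append("upstream task finished")
    file_ready.set()


waiter = threading.Thread(target=event_wait_branch)
raiser = threading.Thread(target=event_raise_branch)
waiter.start()
raiser.start()
waiter.join()
raiser.join()
print(log)
```

The waiting branch cannot continue until the raising branch has reached its Event Raise step, which is why the upstream entry always appears first in `log`.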

22
Command Task

Specifies one or more UNIX commands or shell scripts, or DOS commands or
batch files, for the Integration Service to run during a workflow

Note: UNIX and DOS commands can also be run pre- or post-session in a
Session task

Command task status (success or failure) is held in the task-specific
variable $command_task_name.STATUS
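
The idea of running a list of commands and exposing a single task status can be sketched with Python's `subprocess` module. This is an analogy, not PowerCenter behavior: the `"SUCCEEDED"`/`"FAILED"` labels are hypothetical, and stopping at the first non-zero exit code is a simplification chosen for the sketch.

```python
import subprocess
import sys


def run_command_task(commands):
    """Run commands in order; stop at the first failure and report status."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return "FAILED"
    return "SUCCEEDED"


# sys.executable keeps the sketch portable; a real task would call
# shell scripts or batch files instead.
ok = [sys.executable, "-c", "print('archive done')"]
bad = [sys.executable, "-c", "import sys; sys.exit(1)"]

status_ok = run_command_task([ok])
status_bad = run_command_task([ok, bad, ok])
print(status_ok, status_bad)
```

A downstream link condition would then test the status value, much as a workflow tests the task-specific status variable.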

23
Command Task (cont’d)

[Screenshot: Command task editor with the Add Cmd and Remove Cmd buttons]

24
Reusable Tasks

• Session, Email and Command tasks can be reusable
• Use the Task Developer to create reusable tasks
• Reusable tasks appear in the Navigator Tasks node and can be
dragged and dropped into any workflow

In a workflow, a reusable task is indicated by a special symbol

Unit 17 25
Worklet
An object representing a set or grouping of Tasks
Can contain any Task available in the Workflow Manager
Worklets expand and execute inside a Workflow
A Workflow which contains a Worklet is called the “parent
Workflow”
Worklets CAN be nested
Reusable Worklets – create in the Worklet Designer
Non-reusable Worklets – create in the Workflow Designer

Unit 18 26
Workflow

• A collection of ordered tasks
• Tasks can be linked sequentially, concurrently and/or combined
• Links can be conditional on previous tasks completing

Unit 1 27
Workflow Structure

• Workflow 1
  1. Session 1
  • Worklet A
    1. Session A1
    2. Session A2
    3. Session A3
  • Worklet B
    1. Session B1
    2. Session B2
  • Worklet C
    3. Session C1
    4. Session C2

28
Workflow Schedule
• A workflow can be scheduled to run continuously, to repeat at a given time or
interval, or to start manually.
• The Integration Service runs a scheduled workflow as configured unless the
prior workflow run fails.
• When a workflow fails, the Integration Service removes the workflow from the
schedule, and you must reschedule it.
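
The "a failed run unschedules the workflow" rule can be modeled with a toy scheduler. This is a sketch under stated assumptions: the `Scheduler` class, the workflow names and the convention that a run returns `True` on success are all invented for the illustration.

```python
class Scheduler:
    """Toy scheduler: a failed run unschedules the workflow."""

    def __init__(self):
        self.scheduled = {}

    def schedule(self, name, run):
        self.scheduled[name] = run

    def tick(self):
        """Run every scheduled workflow once; drop the ones that fail."""
        for name, run in list(self.scheduled.items()):
            if not run():
                del self.scheduled[name]  # must be rescheduled by hand


scheduler = Scheduler()
scheduler.schedule("wf_daily_load", lambda: True)
scheduler.schedule("wf_broken", lambda: False)

scheduler.tick()
print(sorted(scheduler.scheduled))  # prints: ['wf_daily_load']
```

After the failed run, `wf_broken` stays off the schedule until someone calls `schedule` for it again, which matches the manual rescheduling step described above.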

29
Workflow Monitor

• Check workflow status
• Recover workflow
• Get session log

30
Recap

1. Workflow a. A collection of ordered tasks
2. Worklet b. Set of tasks
3. Task c. An executable set of actions, functions or commands
4. Workflow Manager d. Create and start workflows
5. Workflow Monitor e. Monitor and control workflows

Unit 1 31
PowerCenter Architecture

[Architecture diagram: Domain, Sources, Targets, PowerCenter Client, Repository]

32
Architecture – Components
• A domain is a collection of nodes and services. It is the primary unit of administration.
• The Repository Service manages connections to the PowerCenter repository from
client applications. The Repository Service is a separate, multi-threaded process that
retrieves, inserts, and updates metadata in the repository database tables. The
Repository Service ensures the consistency of metadata in the repository.
• The Integration Service reads mapping and session information from the repository.
It extracts data from the mapping sources and stores the data in memory while it
applies the transformation rules that you configure in the mapping. The Integration
Service loads the transformed data into the mapping targets.

• The Administration Console is a web application that you use to manage a
PowerCenter domain. If you have a user login to the domain, you can access the
Administration Console. Use the Administration Console to perform administrative
tasks such as managing logs, user accounts, and domain objects. Domain objects
include services, nodes, and licenses.

• The PowerCenter repository resides in a relational database. The repository
database tables contain the instructions required to extract, transform, and load data.
PowerCenter Client applications access the repository database tables through the
Repository Service.

33
Metadata

• Defines data and processes
• Examples:
  • Source and target definitions
    • Type (flat file, database table, XML file, etc.)
    • Datatype (character string, integer, decimal, etc.)
    • Other attributes (length, precision, etc.)
  • Mapping logic
  • Workflow logic

• Stored in a metadata repository


Repository
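
To make "metadata defines data" concrete, the sketch below models a source definition as a small data structure. It is only an illustration: the class names, the field names and the `ORDERS` example are hypothetical and do not reflect the actual repository tables.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    datatype: str          # e.g. character string, integer, decimal
    length: int = 0
    precision: int = 0


@dataclass
class SourceDefinition:
    name: str
    kind: str              # e.g. flat file, database table, XML file
    columns: list = field(default_factory=list)


orders = SourceDefinition(
    name="ORDERS",
    kind="database table",
    columns=[
        Column("ORDER_ID", "integer"),
        Column("AMOUNT", "decimal", length=10, precision=2),
    ],
)

# A repository is, in this sketch, just a lookup of definitions by name.
repository = {orders.name: orders}
print([c.name for c in repository["ORDERS"].columns])
```

Note that the structure describes the data (names, types, sizes) without holding any order rows itself, which is the distinction between metadata and data.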

34
Recap

Match the terms and explanations:

1. Metadata a. Defines data and processes
2. Repository b. Collection of tables that contains PowerCenter metadata
3. Repository Manager c. Repository organization and security
4. Integration Service d. ETL processing engine

Unit 1 35
Where do we use PowerCenter?

• Data Warehouse (SalesVision) and Data Mart (Horizon) Loads
• Customer Hub Load
• Interfaces –
• PowerCafe Orders Peoplesoft
• Magic Leads PowerCafe
• Customer Portal Online Support Access Atlas
• ADS Sales Rep Accounts SalesPortal LDAP

36
PowerCenter Connect Options

Packaged Applications and Systems
• Hyperion Essbase
• Lotus Notes
• PeopleSoft
• SAP NetWeaver BW
• SAS
• Siebel

Databases and Flat Files
• DB2
• Flat files
• Informix
• Netezza
• SQL Server
• Sybase
• Teradata
• Web logs

Messaging and Standards
• HTTP
• IBM MQSeries
• JMS
• LDAP
• MSMQ
• ODBC
• TIBCO Rendezvous
• webMethods
• Web Services
• XML

Hierarchical*
• Adabas
• C-ISAM
• Complex flat files
• Datacom
• IDMS
• IMS
• VSAM

Software as a Service (SaaS)
• salesforce.com

37
Questions?

38
