PowerCenter Basic Concepts
Agenda
What is PowerCenter?
PowerCenter Client Applications Demo: PowerCenter Designer, Workflow Manager, Workflow Monitor
PowerCenter Architecture
PowerCenter
PowerCenter is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover, and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed.
It is an ETL (Extract, Transform and Load) tool.
Development
Repository Manager: manage repository connections, folders, objects, users and groups
Designer: create mapping objects
Workflow Manager: create and start workflows
Workflow Monitor: monitor and control workflows
Designer tools:
Source Analyzer: create source objects
Target Designer: create target objects
Transformation Developer: create reusable transformations
Mapplet Designer: create mapplets
Mapping
Logically defines the ETL process: reads data from sources, applies transformation logic to the data, and writes the transformed data to targets.
Source -> Transformations -> Target
Note: Sources and targets can be flat files, relational tables, XML files, application systems, message queues, etc.
Unit 1
Mapping (contd)
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the Integration Service runs a session, it uses the instructions configured in the mapping to read, transform, and write data. Every mapping must contain the following components:
Source definition. Describes the characteristics of a source table or file.
Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.
Target definition. Defines the target table or file.
Links. Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.
A mapping can also contain one or more mapplets. A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.
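The mapplet idea can be pictured in plain Python as a reusable bundle of transformation steps that several mappings share. This is an analogy, not PowerCenter code: `make_mapplet`, `trim_name`, `upper_country` and the two mapping functions are hypothetical names, and for simplicity every step works row by row.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Transformation = Callable[[Row], Row]


def make_mapplet(*steps: Transformation) -> Transformation:
    """Bundle row-level transformations into one reusable unit."""
    def mapplet(row: Row) -> Row:
        for step in steps:  # apply each step in the order given
            row = step(row)
        return row
    return mapplet


# Hypothetical transformations, for illustration only.
def trim_name(row: Row) -> Row:
    return {**row, "name": str(row["name"]).strip()}


def upper_country(row: Row) -> Row:
    return {**row, "country": str(row["country"]).upper()}


# Built once...
clean_party = make_mapplet(trim_name, upper_country)


# ...and reused in two different hypothetical mappings.
def customer_mapping(rows: Iterable[Row]) -> List[Row]:
    return [clean_party(r) for r in rows]


def supplier_mapping(rows: Iterable[Row]) -> List[Row]:
    return [{**clean_party(r), "kind": "supplier"} for r in rows]


print(customer_mapping([{"name": " Ann ", "country": "us"}]))
# [{'name': 'Ann', 'country': 'US'}]
```

Changing `clean_party` once changes both mappings, which is the point of building the logic as a mapplet instead of copying it.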
Example
Give me an Excel file with the total order amount per customer. I also need to know when this data was extracted (date) and the customer type initial (first letter of the customer type).
Define the sources: Orders and Customers
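The logic this example asks for can be sketched in Python to show what the mapping must do: aggregate orders per customer, combine the totals with customer data, and derive the extraction date and customer type initial. It is only an analogy for the mapping, with assumptions: the column names (`customer_id`, `customer_type`, `amount`) are hypothetical, a CSV file stands in for the requested Excel file, and customers without orders are kept with a total of 0.0.

```python
import csv
import io
from collections import defaultdict
from datetime import date
from typing import Dict, List, TextIO

# Hypothetical source rows; table and column names are assumptions.
customers = [
    {"customer_id": 1, "name": "Acme", "customer_type": "Retail"},
    {"customer_id": 2, "name": "Globex", "customer_type": "Wholesale"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 100.0},
    {"order_id": 11, "customer_id": 1, "amount": 50.0},
    {"order_id": 12, "customer_id": 2, "amount": 75.0},
]


def run_mapping(customers: List[dict], orders: List[dict], extract_date: date) -> List[dict]:
    # Aggregator-style step: one total per customer (fewer rows out than in).
    totals: Dict[int, float] = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
    # Joiner- and Expression-style steps: attach totals and derive new columns.
    return [
        {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "total_order_amount": totals.get(c["customer_id"], 0.0),
            "extract_date": extract_date.isoformat(),
            "customer_type_initial": c["customer_type"][:1],
        }
        for c in customers
    ]


def write_target(rows: List[dict], out: TextIO) -> None:
    # CSV stands in for the requested Excel file.
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


rows = run_mapping(customers, orders, date.today())
buffer = io.StringIO()
write_target(rows, buffer)
print(buffer.getvalue())
```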
Transformations
Transformations generate, modify, or pass data. Data passes into and out of transformations through ports that you link in a mapping.
Passive transformations do not change the number of rows they receive.
Active transformations can change the number of rows they receive.
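The passive/active distinction comes down to row counts. In this minimal Python sketch, `expression` behaves like a passive step (one row out per row in) and `filter_rows` like an active step (it may drop rows). Both function names are hypothetical and only loosely modeled on the Expression (passive) and Filter (active) transformations.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def expression(rows: List[Row], derive: Callable[[Row], Row]) -> List[Row]:
    """Passive-style step: exactly one output row per input row."""
    return [derive(r) for r in rows]


def filter_rows(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Active-style step: may emit fewer rows than it receives."""
    return [r for r in rows if keep(r)]


rows = [{"amount": 100}, {"amount": -5}, {"amount": 40}]
with_tax = expression(rows, lambda r: {**r, "amount_with_tax": r["amount"] * 1.1})
positive = filter_rows(rows, lambda r: r["amount"] > 0)

print(len(rows), len(with_tax), len(positive))  # 3 3 2
```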
More Source Qualifiers: read from XML, message queues and applications
Unit 14
Mapplet
Recap
Match the terms and explanations:
1. ETL
2. Designer
3. Mapping
4. Transformation
5. Mapplet
a. Extract, transform and load data
b. Create mapping objects
c. Logically defines the ETL process
d. Generates or manipulates data
e. Set of transformations that can be reused in multiple mappings
Worklet Designer: create worklets
Workflow Designer: create workflows
Task
An executable set of actions, functions or commands. Examples:
Session task: runs a mapping
Command task: runs a shell script
Email task: sends an email
Decision task: branches the workflow conditionally
Timer task: waits for a specified period
Session
Task that executes a mapping. Defines log options, error handling and connections.
Decision Task
Tests for a condition during the workflow and sets a flag based on the condition.
Use a link condition (or a Control task) downstream to test the flag and control the execution flow.
Workflow variables can be used in the condition.
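The Decision task pattern (evaluate once, set a flag, let downstream link conditions test it) can be sketched in Python. Everything here is hypothetical and illustrative: `decision_task`, `next_tasks`, the workflow variable `rows_loaded`, and the task names `s_load_summary` and `email_no_data` are not PowerCenter identifiers.

```python
from typing import Callable, Dict, List, Tuple

WorkflowVars = Dict[str, int]
LinkCondition = Callable[[bool], bool]


def decision_task(condition: Callable[[WorkflowVars], bool], wf_vars: WorkflowVars) -> bool:
    """Evaluate the decision condition once and return the resulting flag."""
    return bool(condition(wf_vars))


def next_tasks(flag: bool, links: List[Tuple[LinkCondition, str]]) -> List[str]:
    """Follow only the outgoing links whose condition accepts the flag."""
    return [task for accepts, task in links if accepts(flag)]


links = [
    (lambda flag: flag, "s_load_summary"),     # hypothetical session
    (lambda flag: not flag, "email_no_data"),  # hypothetical email task
]

wf_vars = {"rows_loaded": 0}  # hypothetical workflow variable
flag = decision_task(lambda v: v["rows_loaded"] > 0, wf_vars)
print(next_tasks(flag, links))  # ['email_no_data']
```

Keeping the decision in one place and the branching in the links mirrors the split described above: the Decision task only sets the flag, and the links decide where execution goes.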
Unit 16
Email Task
Sends an email within a workflow. Can be used with a link condition to notify success or failure of prior tasks.
Note: emails can also be sent post-session in a Session task.
Unit 17
Command Task
Specifies one or more UNIX commands or shell scripts, or DOS commands or batch files, for the Integration Service to run during a workflow.
Note: UNIX and DOS commands can also be run pre-session or post-session in a Session task.
Reusable Tasks
Session, Email and Command tasks can be reusable. Use the Task Developer to create reusable tasks. Reusable tasks appear in the Navigator Tasks node and can be dragged and dropped into any workflow.
Worklet
An object representing a set or grouping of tasks. A worklet can contain any task available in the Workflow Manager.
Worklets expand and execute inside a workflow. A workflow that contains a worklet is called the parent workflow.
Worklets can be nested.
Create reusable worklets in the Worklet Designer; create non-reusable worklets in the Workflow Designer.
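Worklet expansion and nesting can be sketched in Python as a recursive structure that is flattened into the tasks that actually run. This is a simplified analogy, assuming purely sequential execution (real workflows can also run tasks concurrently). The `Task`/`Worklet` classes and the task names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    name: str


@dataclass
class Worklet:
    name: str
    items: List[Union["Task", "Worklet"]] = field(default_factory=list)


def expand(items: List[Union[Task, Worklet]], path: str = "") -> List[str]:
    """Expand worklets recursively into the sequence of task names that would run."""
    order: List[str] = []
    for item in items:
        if isinstance(item, Worklet):
            order.extend(expand(item.items, f"{path}{item.name}."))
        else:
            order.append(f"{path}{item.name}")
    return order


# Hypothetical parent workflow with one worklet nested inside another.
workflow = [
    Task("s_session1"),
    Worklet("wl_A", [Task("s_A1"), Worklet("wl_B", [Task("s_B1"), Task("s_B2")])]),
]
print(expand(workflow))
# ['s_session1', 'wl_A.s_A1', 'wl_A.wl_B.s_B1', 'wl_A.wl_B.s_B2']
```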
Unit 18
Workflow
A collection of ordered tasks. Tasks can be linked sequentially, concurrently, or in a combination of both. Links can be conditional on previous tasks completing.
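The way a workflow orders tasks and gates them on link conditions can be sketched in Python with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Several assumptions apply: tasks run one at a time in dependency order, a task runs only if all of its incoming link conditions hold, and a task whose links do not hold is marked `SKIPPED`. The task names and status strings are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[str, str, Callable[[str], bool]]  # (from_task, to_task, condition on from_task's status)


def run_workflow(tasks: Dict[str, Callable[[], bool]], links: List[Link]) -> Dict[str, str]:
    """Run tasks in link order; a task runs only if all its incoming link conditions hold."""
    predecessors: Dict[str, Set[str]] = {name: set() for name in tasks}
    for src, dst, _ in links:
        predecessors[dst].add(src)

    status: Dict[str, str] = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(predecessors).static_order():
        incoming = [(src, cond) for src, dst, cond in links if dst == name]
        if all(cond(status[src]) for src, cond in incoming):
            status[name] = "SUCCEEDED" if tasks[name]() else "FAILED"
        else:
            status[name] = "SKIPPED"
    return status


def succeeded(s: str) -> bool:
    return s == "SUCCEEDED"


def failed(s: str) -> bool:
    return s == "FAILED"


result = run_workflow(
    {"s_extract": lambda: True, "s_load": lambda: True, "email_failure": lambda: True},
    [("s_extract", "s_load", succeeded), ("s_extract", "email_failure", failed)],
)
print(result["s_load"], result["email_failure"])  # SUCCEEDED SKIPPED
```

Here `s_load` and `email_failure` both hang off `s_extract`, so they could in principle run concurrently; the link conditions ensure only one of them actually fires.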
Workflow Structure
[Figure: structure of Workflow 1, containing Session 1 and Worklets A (Sessions A1, A2, A3), B (Sessions B1, B2) and C (Sessions C1, C2)]
Workflow Schedule
A workflow can be scheduled to run continuously, repeat at a given time or interval, or start manually. The Integration Service runs a scheduled workflow as configured unless the prior workflow run fails. When a workflow fails, the Integration Service removes the workflow from the schedule, and you must reschedule it.
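The scheduling rule can be simulated in a few lines of Python. Each scheduled start runs the workflow until one run fails; after that, the workflow is off the schedule until someone reschedules it. `ScheduledWorkflow`, `fire_schedule` and `wf_daily_load` are hypothetical names, and "ticks" stand in for scheduled start times.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScheduledWorkflow:
    name: str
    run: Callable[[], bool]  # returns True if the run succeeded
    scheduled: bool = True


def fire_schedule(wf: ScheduledWorkflow, ticks: int) -> List[str]:
    """Simulate `ticks` scheduled start times; stop scheduling after the first failed run."""
    history: List[str] = []
    for _ in range(ticks):
        if not wf.scheduled:
            history.append("not scheduled")
            continue
        if wf.run():
            history.append("succeeded")
        else:
            history.append("failed")
            wf.scheduled = False  # removed from the schedule; must be rescheduled manually
    return history


outcomes = iter([True, False, True])
wf = ScheduledWorkflow("wf_daily_load", run=lambda: next(outcomes))
print(fire_schedule(wf, 4))
# ['succeeded', 'failed', 'not scheduled', 'not scheduled']
```

The third outcome is never consumed: once the second run fails, later runs that might have succeeded simply do not happen.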
Workflow Monitor
Check workflow status, recover workflows, and get session logs.
Recap
Match the terms and explanations:
1. Workflow
2. Worklet
3. Task
4. Workflow Manager
5. Workflow Monitor
a. A collection of ordered tasks
b. Set of tasks
c. An executable set of actions, functions or commands
d. Create and start workflows
e. Monitor and control workflows
PowerCenter Architecture
[Figure: PowerCenter architecture diagram showing the Domain, Sources, Integration Service, Targets, Administration Console, PowerCenter Client and Repository]
Architecture Components
A domain is a collection of nodes and services. It is the primary unit of administration.
The Repository Service manages connections to the PowerCenter repository from client applications. It is a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables, and it ensures the consistency of metadata in the repository.
The Integration Service reads mapping and session information from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules that you configure in the mapping. The Integration Service loads the transformed data into the mapping targets.
The Administration Console is a web application that you use to manage a PowerCenter domain. If you have a user login to the domain, you can access the Administration Console. Use it to perform administrative tasks such as managing logs, user accounts, and domain objects. Domain objects include services, nodes, and licenses.
The PowerCenter repository resides in a relational database. The repository database tables contain the instructions required to extract, transform, and load data. PowerCenter Client applications access the repository database tables through the Repository Service.
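The division of labor between the repository and the Integration Service can be sketched in Python. The repository holds named mapping instructions, and the service looks one up, extracts the source rows, transforms them in memory, and loads the result. This is a toy analogy, not PowerCenter's API: `Repository`, `IntegrationService`, `run_session` and the mapping dictionary layout are hypothetical, and the Repository Service layer that mediates access is folded into a plain object.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class Repository:
    """Stands in for the repository database: named mapping instructions (metadata)."""

    def __init__(self) -> None:
        self._mappings: Dict[str, dict] = {}

    def save(self, name: str, mapping: dict) -> None:
        self._mappings[name] = mapping

    def get(self, name: str) -> dict:
        return self._mappings[name]


class IntegrationService:
    """Reads a mapping from the repository, then extracts, transforms in memory, and loads."""

    def __init__(self, repo: Repository) -> None:
        self.repo = repo

    def run_session(self, mapping_name: str, sources: Dict[str, List[Row]],
                    targets: Dict[str, List[Row]]) -> int:
        mapping = self.repo.get(mapping_name)           # read instructions
        rows = list(sources[mapping["source"]])         # extract
        for step in mapping["steps"]:                   # transform in memory
            rows = step(rows)
        targets.setdefault(mapping["target"], []).extend(rows)  # load
        return len(rows)


repo = Repository()
repo.save("m_positive_orders", {
    "source": "ORDERS",
    "steps": [lambda rows: [r for r in rows if r["amount"] > 0]],
    "target": "ORDERS_OUT",
})
targets: Dict[str, List[Row]] = {}
loaded = IntegrationService(repo).run_session(
    "m_positive_orders", {"ORDERS": [{"amount": 5}, {"amount": -1}]}, targets)
print(loaded, targets)  # 1 {'ORDERS_OUT': [{'amount': 5}]}
```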
Metadata
Defines data and processes. Examples:
Source and target definitions:
Type (flat file, database table, XML file, etc.)
Datatype (character string, integer, decimal, etc.)
Other attributes (length, precision, etc.)
Metadata is stored in the repository.
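Source and target definitions are data about data. The Python sketch below records a definition's type, column datatypes, length and precision, and uses that metadata to check values. The datatype names, the `fits` helper and the `ORDERS` definition are illustrative assumptions, not PowerCenter's internal representation.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Column:
    name: str
    datatype: str                    # illustrative names: "string", "integer", "decimal"
    length: Optional[int] = None     # maximum characters for strings
    precision: Optional[int] = None  # maximum total digits for decimals


@dataclass(frozen=True)
class SourceDefinition:
    name: str
    source_type: str                 # e.g. "flat file", "database table", "XML file"
    columns: Tuple[Column, ...]


def fits(col: Column, value: object) -> bool:
    """Check a single value against the column's metadata."""
    if col.datatype == "string":
        return isinstance(value, str) and (col.length is None or len(value) <= col.length)
    if col.datatype == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if col.datatype == "decimal":
        digits = len(Decimal(str(value)).as_tuple().digits)
        return col.precision is None or digits <= col.precision
    return False  # unknown datatype in this sketch


orders_src = SourceDefinition(
    "ORDERS", "database table",
    (Column("CUSTOMER_TYPE", "string", length=10), Column("AMOUNT", "decimal", precision=5)),
)
print(fits(orders_src.columns[0], "Retail"))   # True
print(fits(orders_src.columns[1], 1234.567))   # False: 7 digits exceed precision 5
```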
Recap
Match the terms and explanations:
1. Metadata
2. Repository
3. Repository Manager
4. Integration Service
a. Defines data and processes
b. Collection of tables that contains PowerCenter metadata
c. Repository organization and security
d. ETL processing engine
Questions?