Introduction to DataStage
5.4
What is DataStage?
DB2, by default
Repository stores objects built and used by Information Server
applications
DataStage jobs
Metadata imported into DataStage
Security
Repository
Logging and reporting
Metadata management
[Figure: Information Server components. Labels: Information Services Director, Business Glossary, Information Analyzer, FastTrack, DataStage / QualityStage, Metadata Access Services, Metadata Analysis Services, Metadata Server, MetaBrokers, Metadata Workbench, IS Users, Reporting]
DataStage architecture
DataStage clients
Administrator
Designer
Director
DataStage engines
Parallel engine
Runs parallel jobs
Server engine
Runs server jobs
Runs job sequences
DataStage Clients
DataStage Administrator
Project environment variables
DataStage Designer
Menus / toolbar
Job log
DataStage parallel job with DB2 Connector stage
DataStage Director
Log messages
Developing in DataStage
In Designer, only the job log for the currently opened job is
available
Standard jobs folder
Standard table definitions folder
Parallel jobs
Server jobs
Job sequences
A type of server job that runs and controls jobs and other activities
specified on the diagram
Can run both parallel jobs and other job sequences
Provides a common interface to the set of jobs it controls
Runtime monitoring in the job log
Stages
Read data
Write data
Examples: Sequential File, DB2, Oracle, Peek stages
Transform data (Transformer stage)
Filter data (Transformer stage)
Aggregate data (Aggregator stage)
Generate data (Row Generator stage)
Merge data (Join, Lookup stages)
Links
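As a rough illustration of how stages and links fit together, the sketch below models each stage as a Python function and each link as the stream of rows passed from one function to the next. This is only an analogy, not DataStage code: the function names, the column names (`id`, `region`, `amount`, `amount_cents`), and the sample data are all hypothetical.

```python
from collections import defaultdict


def row_generator(n):
    """Generate data: stands in for a Row Generator stage (hypothetical rows)."""
    for i in range(n):
        region = "east" if i % 2 == 0 else "west"
        yield {"id": i, "region": region, "amount": 10 * i}


def transformer(rows):
    """Transform and filter data: stands in for a Transformer stage."""
    for row in rows:
        if row["amount"] > 0:  # filter: drop rows with a zero amount
            # transform: derive a new column from an existing one
            yield {**row, "amount_cents": row["amount"] * 100}


def aggregator(rows):
    """Aggregate data: stands in for an Aggregator stage (sum per region)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount_cents"]
    return dict(totals)


def run_flow(n):
    """Each nested call plays the role of a link between two stages."""
    return aggregator(transformer(row_generator(n)))


if __name__ == "__main__":
    print(run_flow(5))
```

Because the functions are generators, a row moves to the next stage as soon as it is produced, which loosely mirrors rows flowing along links rather than being staged in full between steps.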
Job Parallelism
Pipeline parallelism
Advantages:
Partition parallelism
Three-node partitioning
[Figure: Data is divided into subset1, subset2, and subset3; a copy of the Stage processes subset1 on Node 1, subset2 on Node 2, and subset3 on Node 3]
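The idea of three-node partitioning can be sketched in a few lines of Python: the data is divided into subsets, and the same stage logic runs on each subset at the same time. This is an analogy only. The round-robin split, the doubling "stage", and the use of a thread pool are assumptions chosen for the illustration; they do not describe how the parallel engine is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, num_nodes):
    """Split rows into num_nodes subsets (round-robin is an assumed method)."""
    subsets = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        subsets[i % num_nodes].append(row)
    return subsets


def stage(subset):
    """Hypothetical stage logic: the same operation runs on every subset."""
    return [value * 2 for value in subset]


def run_partitioned(rows, num_nodes=3):
    """Run one copy of the stage per node, each on its own subset."""
    subsets = partition_round_robin(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        # pool.map returns results in the same order as the subsets
        return list(pool.map(stage, subsets))


if __name__ == "__main__":
    for node, result in enumerate(run_partitioned(list(range(1, 10))), start=1):
        print(f"Node {node}: {result}")
```

Changing `num_nodes` changes how many subsets are created without touching the stage logic, which is the same separation the configuration file provides for a job.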
Configuration file
Determines the degree of parallelism (number of partitions) of
jobs that use it
Every job runs under a configuration file
Each DataStage project has a default configuration file
Specified by the $APT_CONFIG_FILE job parameter
Individual jobs can run under different configuration files than the
project default
The same job can also run using different configuration files on different
job runs
Resources attached to the node
Node (partition)
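To make the link between the configuration file and the degree of parallelism concrete, the sketch below embeds a small two-node configuration as text and counts its `node` entries. The sample follows the general shape of a parallel configuration file, but the host name (`etlhost`) and all directory paths are hypothetical, and the regular-expression parsing is a simplification for illustration, not a full parser.

```python
import re

# Hypothetical two-node configuration; host name and paths are made up.
SAMPLE_CONFIG = '''
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node1" {pools ""}
    resource scratchdisk "/scratch/ds/node1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/ds/node2" {pools ""}
    resource scratchdisk "/scratch/ds/node2" {pools ""}
  }
}
'''


def node_names(config_text):
    """Return the names of the nodes declared in the configuration text."""
    return re.findall(r'^\s*node\s+"([^"]+)"', config_text, flags=re.MULTILINE)


def degree_of_parallelism(config_text):
    """One partition per declared node."""
    return len(node_names(config_text))


if __name__ == "__main__":
    print(node_names(SAMPLE_CONFIG))
    print(degree_of_parallelism(SAMPLE_CONFIG))
```

Running the same job under a file with a different number of `node` entries would change the number of partitions without any change to the job design.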
Checkpoint
1. True or false: DataStage Director is used to build and
compile your ETL jobs
2. True or false: Use Designer to monitor your job during
execution
3. True or false: Administrator is used to set global and project
properties
Checkpoint solutions
1. False.
2. True. The job log is available both in Director and Designer.
In Designer, you can only view log messages for a job open
in Designer.
3. True.