DataStage Guide
PALETTE:
The palette contains all kinds of stages:
1. General
2. Database
3. Development and Debug
4. File
5. Processing
6. Real Time
7. Restructure
GENERAL STAGES:
The General group contains 4 stages:
1. Annotation
2. Container
3. Description Annotation
4. Link
Link: It is used to connect two stages.
Annotation: It is used to add comments to a job.
DATABASE STAGES:
It contains all kinds of database stages:
1. DB2/UDB API
2. DB2/UDB Load
3. Dynamic RDBMS
4. ODBC Enterprise
5. SQL Server Enterprise
6. Oracle Enterprise
7. Stored Procedure
8. Teradata Enterprise
9. Teradata Multi Load
FILE STAGES:
1. Complex Flat File
2. Data set
3. External Source
4. External Target
5. File set
6. Look up File set
7. Sequential file
8. SAS Parallel Data Set
PROCESSING STAGES:
1. Aggregator
2. Change Apply
3. Change Capture
4. Compare
5. Compress
6. Copy
7. Decode
8. Difference
9. Encode
10. Expand
11. Filter
12. Funnel
13. Generic
14. Join
15. Lookup
16. Merge
17. Modify
18. Pivot
19. Remove Duplicates
20. External Filter
21. SAS
22. Sort
23. Surrogate Key Generator
24. Switch
25. Transformer
RESTRUCTURE STAGES:
1. Column Export
2. Column Import
3. Combine Records
4. Make Subrecord
5. Make Vector
6. Promote Subrecord
7. Split Subrecord
8. Split Vector
EXAMPLE JOBS FOR SEQUENTIAL FILE STAGE:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Job:
Properties for Sequential_File_0:
Importing a table definition from a sequential file:
Right-click or double-click on Sequential_File_0.
Click on the Load tab at the bottom.
It will show a window like this:
Now, in the file list, select the file emp1.txt, tick the check box "First line is column
names", and click on Define.
Now click on OK and then on Close. The file emp1.txt will now appear in the table
definition list.
It will show a window like this.
Now click on OK, and click OK again. This is the procedure for importing a table
definition.
Input sequential_File_0
Properties:
Emp1.txt
Emp2.txt
Job:
Output Sequential_File_1 Properties:
Input sequential file_0 data:
Job:
Output sequential_File_01 Properties:
Output data:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Input sequential file data:
Job:
Input Seq file data:
Input Properties:
Columns:
Output Properties:
Output data:
Job:
Input properties:
Columns:
Input Data:
Output Properties:
Output data:
7) Example Job for Sequential File:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Input sequential file data:
Job:
Input properties:
Input Columns:
Output properties:
Output Data:
Job:
Columns:
Output sequential file properties:
9) Example Job for Sequential File:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Grep options:
1) grep "string": prints lines that contain the string. Ex: grep "bhaskar"
2) grep -v "string": prints lines that do NOT contain the string. Ex: grep -v "bhaskar"
3) grep -i "string": matches the string case-insensitively. Ex: grep -i "bhaskar"
Source file data:
Job:
Input sequential file Properties:
Columns:
Output Data:
Source file data:
Job:
Columns:
DATA SET STAGE:
1. The Data Set stage is a file stage which is used for staging the data for dependent jobs.
2. Data Set is an internal stage of DataStage.
3. The extension of a data set is .ds.
4. It is never used to extract data from the client location.
5. It is used as an intermediate stage between two jobs.
Types of data sets:
1. Persistent data set
2. Virtual data set
Descriptor file: It holds information about the address and the structure of the data.
Data file: Represents the data in native format.
Control and header files: These files operate at the OS level to control the descriptor
and data files.
JOB:
Note: The target data set file extension is .ds.
Output data set data:
It will show the list of files and data sets.
You can view the record schema by clicking on the table definition icon.
Note: By using Data Set Management, we can open a data set, show the schema window,
show the data window, copy a data set, and delete a data set.
DIFFERENCES BETWEEN SEQUENTIAL FILE AND DATA SET:
Sequential File: It is used to extract data from client flat files.
Data Set: It is never used to extract data from client flat files.
SORT STAGE:
It is used to sort the data in either ascending or descending order, based on a key
column, while populating data from source to target. It takes one input link and one
output link.
Double-click on the Sort stage, go to stage properties, and set key = CID (the column the
data is sorted on) and sort order = ascending or descending. Go to the output tab and
drag and drop all columns. In the Stage tab, under Advanced, there is an execution mode
option: if the developer wants to execute in sequential mode, set the execution mode to
sequential (by default it is parallel). Click OK, save, compile, and run it.
Sequential_File_0 properties:
Output Columns:
View data:
Output Requirement:
Job:
Sort_4 Properties:
Output Mapping:
Output columns:
Sort_7 Properties:
Output: mappings
Output columns:
Target Dataset_1 properties:
OUTPUT DATA:
REMOVE DUPLICATES STAGE:
It is used to remove duplicates based on a key column while populating data from source
to target. It takes one input link and one output link.
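The behaviour above can be sketched as a rough Python analogue; this is illustrative, not DataStage code, and the column name "cid" is an assumption:

```python
# Hedged analogue of the Remove Duplicates stage: keep the first row
# seen for each key value (the "Duplicate To Retain = First" option),
# assuming the input is grouped/sorted on that key.
def remove_duplicates(rows, key):
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [
    {"cid": 1, "name": "A"},
    {"cid": 1, "name": "B"},   # duplicate key, dropped
    {"cid": 2, "name": "C"},
]
deduped = remove_duplicates(rows, "cid")
```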
Input file :
Output File:
JOB:
Remove duplicate properties:
Example job for Remove Duplicates:
Input data:
Output requirement:
Job:
Remove duplicate properties:
Target file properties:
COPY STAGE:
It is used to copy the source data into multiple targets. It takes one input link and gives
one or more output links.
Double-click on the Copy stage, go to the output tab, and drag and drop the required
target columns by choosing the output names.
FILTER STAGE:
It is used to filter records while populating data from source to target. It takes one input
link and gives one or more output links and one reject link.
Double-click on the Filter stage and give a constraint in the where clause, like
SAL_ID < 300; if you want more where clauses, click on predicates. Go to the output tab,
drag and drop all columns for the 3 target tables, click OK, and choose the link ordering
for the corresponding constraints.
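The routing just described can be sketched as a rough Python analogue (the column name mirrors the SAL_ID < 300 example above; by default a row satisfying several predicates would go to several links):

```python
# Hedged sketch of Filter-stage routing: each where-clause predicate
# feeds one output link; rows matching no predicate go to the reject link.
def filter_stage(rows, predicates):
    outputs = [[] for _ in predicates]
    reject = []
    for row in rows:
        matched = False
        for i, pred in enumerate(predicates):
            if pred(row):
                outputs[i].append(row)
                matched = True
        if not matched:
            reject.append(row)
    return outputs, reject

rows = [{"sal_id": 100}, {"sal_id": 250}, {"sal_id": 400}]
outputs, reject = filter_stage(rows, [lambda r: r["sal_id"] < 300])
```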
MODIFY STAGE:
It is used to modify column names, data types, and nullability. It takes one input link
and one output link.
1. Renaming a column:
New_Column_Name = Old_Column_Name
2. Dropping a column:
DROP Column_Name
3. Keeping a column:
KEEP Column_Name
4. Type conversion:
Column_Name=type_conversion('Column_Name')
EX: HIREDATE=date_from_timestamp("HIREDATE")
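A rough Python analogue of those four operations on a single row (column names and the timestamp format are illustrative assumptions, not DataStage specifications):

```python
# Hedged analogue of Modify-stage operations: rename, DROP, KEEP,
# and a type conversion similar to date_from_timestamp.
from datetime import datetime

row = {"ENAME": "SMITH", "COMM": 300, "HIREDATE": "1981-02-20 00:00:00"}

# 1. rename: EMP_NAME = ENAME
renamed = {("EMP_NAME" if k == "ENAME" else k): v for k, v in row.items()}
# 2. DROP COMM
dropped = {k: v for k, v in renamed.items() if k != "COMM"}
# 3. KEEP EMP_NAME, HIREDATE
kept = {k: v for k, v in dropped.items() if k in ("EMP_NAME", "HIREDATE")}
# 4. HIREDATE = date_from_timestamp(HIREDATE)
kept["HIREDATE"] = datetime.strptime(kept["HIREDATE"], "%Y-%m-%d %H:%M:%S").date()
```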
FUNNEL STAGE:
It is used to combine the data of all source tables into a single target table. It takes
multiple input links and gives one output link.
Note: It works like a UNION ALL operation; the metadata of all source input links
should be the same.
Double-click on the Funnel stage, go to the output tab, and drag and drop all columns.
CONTINUOUS FUNNEL:
In continuous funnel, records are taken from each input link in turn and populated to the
target.
SORT FUNNEL:
In sort funnel, records are extracted from all input links, merged in ascending order on a
key, and populated to the target.
SEQUENCE FUNNEL:
In sequence funnel, all records from the first input link are populated to the target, then
all records from the second input link, and so on.
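The three modes can be sketched with two small input links; this is a hedged Python analogue of the ordering behaviour, not DataStage code (sort funnel assumes each input is already sorted on the key):

```python
# Hedged sketch of the three funnel modes over two input links.
from heapq import merge
from itertools import chain, zip_longest

link1 = [10, 30]
link2 = [5, 20]

# sequence funnel: drain link1 completely, then link2
sequence_out = list(chain(link1, link2))
# sort funnel: merge the (sorted) links into one sorted stream
sort_out = list(merge(link1, link2))
# continuous funnel: take one record from each link in turn
continuous_out = [r for pair in zip_longest(link1, link2)
                  for r in pair if r is not None]
```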
Example:
AGGREGATOR STAGE:
It is used to find aggregate values, like sum, average, max, and min, after grouping by
key values. It takes one input link and gives one output link.
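A rough Python analogue of grouping and aggregating (the DEPTNO/SAL column names are illustrative):

```python
# Hedged sketch of the Aggregator stage: group by DEPTNO and compute
# sum/max/min of SAL for each group.
from collections import defaultdict

rows = [
    {"DEPTNO": 10, "SAL": 1000},
    {"DEPTNO": 10, "SAL": 3000},
    {"DEPTNO": 20, "SAL": 2000},
]

groups = defaultdict(list)
for r in rows:
    groups[r["DEPTNO"]].append(r["SAL"])

result = {d: {"sum": sum(s), "max": max(s), "min": min(s)}
          for d, s in groups.items()}
```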
3. EXAMPLE JOB FOR AGGREGATOR STAGE:
Input data:
Requirement:
Output file11 data:
JOB:
Database properties:
Aggregator 2 properties:
Output tab:
Column generator properties:
Column Generator output tab:
Aggregator_4 properties:
Output tab:
ADVANCED PROCESSING STAGES:
JOIN STAGE:
It is used to join more than one table based on a key column and populate the data into a
target table while populating from source to target. It takes more than one input link and
gives one output link.
Double-click on the Join stage and set key = the common column name from the input
links, i.e. DEPTNO; it requires the same column name on all input links.
Set join type = inner / left outer / right outer / full outer (choose any one), then go to the
output tab and drag and drop all columns.
1. Inner join:
In an inner join, the Join stage extracts only the matched records from all input links,
based on the key column, and populates them into the target.
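A minimal Python sketch of the inner-join behaviour on DEPTNO (table contents are illustrative, not the EMP/DEPT data shown below):

```python
# Hedged sketch of an inner join: only rows whose key appears on both
# input links reach the output link.
emp = [{"ENAME": "KING", "DEPTNO": 10}, {"ENAME": "SMITH", "DEPTNO": 40}]
dept = {10: "ACCOUNTING", 20: "RESEARCH"}   # DEPTNO -> DNAME

inner = [{**e, "DNAME": dept[e["DEPTNO"]]}
         for e in emp if e["DEPTNO"] in dept]
# SMITH (DEPTNO 40) has no match in dept, so it is dropped
```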
Example:
Emp Table:
Dept Table
Inner join:
LOOKUP STAGE:
It is used to join more than one table based on different key columns and populate the
result into a target table.
It takes one input (stream) link, one or more reference links, and gives one output link
and one reject link.
There are two types of lookup available in DataStage:
1. Normal lookup
2. Sparse lookup
Normal lookup:
A normal lookup loads the reference data into memory, so if the reference tables have a
small amount of data, prefer a normal lookup.
Sparse lookup:
A sparse lookup sends a query to the database for each input row instead of holding the
reference data in memory, so if the reference tables contain a huge amount of data, prefer
a sparse lookup.
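The stream/reference/reject behaviour can be sketched as a hedged Python analogue (column names and the "reject on lookup failure" option are illustrative assumptions):

```python
# Hedged sketch of Lookup-stage behaviour: each stream row is probed
# against a reference table; misses go to the reject link.
stream = [{"EMPNO": 1, "DEPTNO": 10}, {"EMPNO": 2, "DEPTNO": 99}]
reference = {10: "ACCOUNTING"}   # DEPTNO -> DNAME

output, reject = [], []
for row in stream:
    dname = reference.get(row["DEPTNO"])
    if dname is None:
        reject.append(row)              # lookup failure
    else:
        output.append({**row, "DNAME": dname})
```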
MERGE STAGE:
It is used to join more than one table based on a common column while populating from
source to target. It takes more than one input link (one master link, and the remaining
are child links, or update links) and gives one output link, plus (n-1) reject links if there
are "n" source links.
Note: The number of reject links should equal the number of child (update) tables.
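A hedged Python sketch of one master link plus one update link (how unmatched master rows are handled depends on the stage's options; here they pass through unchanged):

```python
# Hedged sketch of the Merge stage: master rows are enriched with the
# matching update row on a shared key; update rows with no master match
# go to that update link's reject link.
master = [{"ID": 1, "NAME": "A"}, {"ID": 2, "NAME": "B"}]
update = [{"ID": 1, "CITY": "HYD"}, {"ID": 3, "CITY": "BLR"}]

update_by_id = {u["ID"]: u for u in update}
master_ids = {m["ID"] for m in master}

merged = [{**m, **update_by_id.get(m["ID"], {})} for m in master]
update_reject = [u for u in update if u["ID"] not in master_ids]
```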
Example:
TRANSFORMER STAGE:
The Transformer stage plays a major role in DataStage. It is used to modify the data and
apply functions while populating data from source to target.
It takes one input link and gives one or more output links.
It has 3 components:
1. Stage variables
2. Constraints
3. Derivations (or expressions)
1. The Transformer stage can work as a Copy stage and as a Filter stage.
2. The Transformer stage requires a C++ compiler; the stage's logic is generated as C++
code and compiled into machine code.
Double-click on the Transformer stage, drag and drop the required target columns, and
click OK.
Each output link contains only one constraint, and each target column contains only one
derivation; a Transformer stage can define any number of stage variables.
The order of execution of the components is:
1. Stage variables
2. Constraints
3. Derivations
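The per-row evaluation order can be sketched as a rough Python analogue (the column names and the 10% bonus calculation are illustrative assumptions, not from any job in this guide):

```python
# Hedged sketch of the Transformer's per-row evaluation order:
# stage variables first, then the link constraint, then the column
# derivations for rows that pass the constraint.
rows = [{"ENAME": "smith", "SAL": 800}, {"ENAME": "king", "SAL": 5000}]
target = []
for row in rows:
    sv_total = row["SAL"] * 1.1          # 1. stage variable
    if sv_total > 1000:                  # 2. constraint
        target.append({                  # 3. derivations
            "ENAME": row["ENAME"].upper(),
            "TOTAL": round(sv_total, 2),
        })
```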
Example:
How to make the Transformer work as a filter stage (or how to apply constraints in the
Transformer stage):
Double-click on the Transformer stage, then double-click on the constraint of a particular
link; this window provides all the information automatically. To route non-matching
rows to a reject link, tick the "Otherwise/Log" option.
Example derivations:
1. DS macros:
DS macros provide some built-in functions like:
1. DSProjectName
2. DSJobName
3. DSHostName
4. DSJobStartDate
5. DSJobStartTime
6. DSJobStartTimestamp
2. DS routines:
These are set-up (user-defined) functions.
3. Job parameters:
Job parameters are variables; they are used to reduce the redundancy of work.
4. Input columns:
It provides all input column names
5. Stage variables:
Stage variables are used to increase performance and to reduce the redundancy of work.
How to define stage variable properties:
Click on a stage variable, right-click, select stage variable properties, and define the
stage variables.
6. System variables:
The Transformer provides some built-in system variables like:
1. @INROWNUM
2. @OUTROWNUM
3. @NUMPARTITIONS
@INROWNUM and @OUTROWNUM give the number of records loaded into the
Transformer stage and the number of records extracted from it; @NUMPARTITIONS
tells how many partitions (nodes) are handling the data.
7. String:
A string is a hard-coded value provided within double quotation marks.
8. Functions:
There are several built-in functions in DataStage:
1. Date&Time
2. Logical
3. Mathematical
4. Null Handling
5. Number
6. Raw
7. String
8. Type Conversion
9. Utility
Input file:
Output requirement
JOB:
Sequential file:
Transformer Stage properties:
INPUT COLUMNS:
OUTPUTLINK:
TARGET FILE:
2) EXAMPLE JOB FOR TRANSFORMER:
JOB2:
Input file:
Output requirement:
JOB:
Transformer stage properties:
Output2:
JOB:
INPUT:
Transformer1:
Constraint logic:
OUTPUTINVC:
OUTPUTPRODID:
4. EXAMPLE JOB FOR TRANSFORMER STAGE:
Input file:seqfile1:
Input file:seqfile0:
Job:
Input file file1 properties:
Join properties:
Stage variable Status derivation:
1. TRANSFORMER DATE&TIME FUNCTIONS:
1.CurrentDate: Returns the date that the job runs in date format.
Syntax: CurrentDate()
Inputdata:
Output Data:
JOB:
Transformer_Time_Date Properties:
2.CurrentTime: Returns the time at which the job runs in time format.
Syntax:CurrentTime()
InputData:
OutputData:
JOB:
Transformer_Time_Date Properties:
3.CurrentTimeStamp: Returns a timestamp giving the date and time that the job runs in
timestamp format
Syntax:CurrentTimeStamp()
InputData:
OutputData:
JOB:
Transformer_Time_Date Properties:
4.DateFromDaysSince:
Syntax: DateFromDaysSince(%number%,[%"yyyy-mm-dd"%])
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
DateFromDaysSince field derivation:
DateFromDaysSince(DSLink5.Field003, DSLink5.Field004)
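In Python terms, DateFromDaysSince amounts to adding a day offset to a base date given as "yyyy-mm-dd"; a hedged analogue (the example values are illustrative):

```python
# Hedged Python analogue of DateFromDaysSince(%number%, %"yyyy-mm-dd"%):
# return the base date plus the given number of days.
from datetime import date, timedelta

def date_from_days_since(number, base="1970-01-01"):
    return date.fromisoformat(base) + timedelta(days=number)

# e.g. date_from_days_since(18, "2011-01-01") gives 2011-01-19
```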
5. DateFromJulianDay
Syntax: DateFromJulianDay(%juliandate%)
Inputdata:
Outputdata:
JOB:
Transformer_TimeandDate:
6. DaysSinceFromDate
Syntax: DaysSinceFromDate(%date%,%"yyyy-mm-dd"%)
Inputdata:
OutputData:
JOB:
Transformer_TimeandDate Properties:
7.HoursFromTime:
Syntax:HoursFromTime(%time%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
8.JulianDayFromDate:
Syntax:JulianDayFromDate(%date%)
Outputdata:
JOB:
Transformer_TimeandDate Properties:
9.MicroSecondsFromTime:
Syntax: MicroSecondsFromTime(%time%)
OutputData:
10.MinutesfromTime:
Syntax:MinutesFromTime(%time%)
Inputdata:
Outputdata:
JOB:
Transformer_TimeandDate Properties:
11.MonthDayFromDate:
Syntax: MonthDayFromDate(%date%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
12. MonthFromDate
Syntax:MonthFromDate(%date%)
InputData:
Outputdata:
JOB:
Transformer_TimeandDate Properties:
13.NextWeekDayFromDate:
Syntax: NextWeekdayFromDate(%sourcedate%,%dayname%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
14.PreviousWeekdayFromDate
Syntax: PreviousWeekdayFromDate(%sourcedate%,%dayname%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date properties:
15. SecondsFromTime
Syntax: SecondsFromTime(%time%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
16.SecondsSinceFromTimeStamp:
Syntax:SecondsSinceFromTimestamp(%timestamp%,%"yyyy-mm-dd hh:nn:ss"%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
17.TimeDate:
Syntax:TimeDate()
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
18.TimeFromMidNightSeconds
Syntax: TimeFromMidnightSeconds(%seconds%)
InputData
OutputData:
JOB:
Transformer_TimeandDate Properties:
19.TimestampFromDateTime
Syntax: TimestampFromDateTime(%date%,%time%)
Inputdata:
OutputData:
JOB:
Transformer_TimeandDate Properties:
20. TimestampFromSecondsSince
Syntax: TimestampFromSecondsSince(%seconds%,[%timestamp%])
InputData:
OutputData:
21. TimetFromTimestamp
Syntax: TimetFromTimestamp(%timestamp%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
TimeTfromTimestamp Derivation:
TimetFromTimestamp(Intput.DATEOFJOIN)
22. WeekdayFromDate
Syntax:WeekdayFromDate(%date%,[%startdayname%])
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
23. YeardayFromDate
Syntax: YeardayFromDate(%date%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
24. YearFromDate
Syntax: YearFromDate(%date%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
25. YearweekFromDate
Syntax: YearweekFromDate(%date%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
TRANSFORMER LOGICAL FUNCTIONS
1. BitAnd:
Syntax: BitAnd(%integer%,%integer%)
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
2. BitCompress:
Syntax: BitCompress(%binarystring%)
Inputdata:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
3. BitExpand
Syntax: BitExpand(%bitfield%)
InputData:
Outputdata:
JOB:
Transformer_Logical_Functions Properties:
4.BitOr:
Syntax:BitOr(%integer%,%integer%)
InputData:
OutputData
JOB:
Transformer_Logical_Functions Properties:
5. BitXOr
Syntax: BitXOr(%integer%,%integer%)
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
6. Not
Syntax: Not(%expression%)
Returns the complement of the logical value of an expression. If the value of expression
is true, the Not function returns a value of false (0). If the value of expression is false, the
NOT function returns a value of true (1).
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
7. SetBit
Syntax: SetBit(%bitfield%,%bitliststring%,%bitstate%)
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
TRANSFORMER NULL HANDLING FUNCTIONS:
IsNotNull:
Syntax: IsNotNull(%value%)
InputData:
Outputdata:
3.NullToEmpty:
Output:
JOB:
Transformer_Logical_Functions Properties:
NULLTOEMPTY Field Derivation:
NullToEmpty(Intput.COMM)
4. NullToValue
NullToValue(%inputcol%,%value%)
Inputdata:
Outputdata:
JOB:
Transformer_Logical_Functions Properties:
Outputdata:
JOB:
Transformer_Logical_Functions Properties:
STRING FUNCTIONS:
1. Alnum: Checks whether the given string contains only alphanumeric characters.
Syntax: Alnum(%string%)
Input data:
Output data:
JOB:
INPUT PROPERTIES:
Transformer_Str_Alnum properties:
Logic:Alnum(INPUT.EMPNO)
Output data:
2. FUNCTION:
Alpha
Syntax: Alpha(%string%)
Input data:
Outputdata:
JOB:
Transformer_str_Alpha properties:
3.FUNCTION:
CompactWhiteSpace:
Syntax: CompactWhiteSpace(%string%)
Input Data:
Outputdata:
JOB:
Transformer_Str_CompactWhiteSpace properties:
4. FUNCTION
Compare
Syntax: Compare(%string1%,%string2%,[%justification%])
Input Data:
Output data:
5. FUNCTION
CompareNum:
Syntax: CompareNum(%string1%,%string2%,%length%)
Input data:
Output data:
Note: In this job, if the first 5 characters of string1 and string2 are the same, it returns 0;
if string1 > string2 it returns 1; if string1 < string2 it returns -1.
JOB:
Transformer_str_compareNum properties:
6.FUNCTION:
CompareNumNoCase
Syntax: CompareNumNoCase (%string1%,%string2%,%length%)
Input data:
OutputData:
JOB:
Transformer_str_compareNumNoCase properties:
Note: In this job, if the first 5 characters of string1 and string2 are the same, it returns 0;
if string1 > string2 it returns 1; if string1 < string2 it returns -1.
7.FUNCTION:
Convert:
Syntax: Convert(%fromlist%,%tolist%,%expression%)
Inputdata:
Outputdata:
JOB:
Transformer_str_Convert properties:
Outputdata:
8. FUNCTION
Count:
Syntax: Count(%string%,%substring%)
Inputdata:
Outputdata:
JOB:
Transformer_str_Count properties:
Transformer_str_Count field derivation:
Count(Input.STRING1,"A")
Note: In the above data, the field STRING1 contains the substring "A" twice.
9.FUNCTION:
Dcount:
Syntax: Dcount(%string%,%delimiter%)
Counts the number of delimited fields in a string.
InputData:
Outputdata:
JOB:
Transformer_str_Dcount properties:
Transformer_str_Dcount field derivation:
Dcount(Input.STRING1,"|")
10.FUNCTION:
DownCase:
Syntax: DownCase(%string%)
Inputdata:
Output Data:
JOB:
Transformer_str_DownCase properties:
11.FUNCTION:
Dquote:
Syntax: DQuote(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Dquote properties:
12. FUNCTION:
Field:
Syntax: Field(%string%,%delimiter%,%occurrence%,[%number%])
Input data:
Outputdata:
JOB:
Transformer_Str_Field properties:
13. FUNCTION:
Index:
Syntax: Index(%string%,%substring%,%occurrence%)
Input data:
Outputdata:
JOB:
Transformer_Str_Index properties:
Trxr_Str_Index field derivation:
Index(input.FLAVOUR,"chocolate",2)
14. FUNCTION:
Left:
Syntax: Left(%string%,%length%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_left properties:
Trx_Str_left field derivation:
Left(input.FLAVOUR,9)
15. FUNCTION:
Len:
Syntax: Len(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_length properties:
16. FUNCTION:
Num:
Syntax: Num(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Num properties:
Note: If the given field value is numeric, it returns 1; if the value is alphabetic or
alphanumeric, it returns 0.
17.FUNCTION
PadString
Syntax: PadString(%string%,%padstring%,%padlength%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_PadString properties:
18. FUNCTION:
Right:
Syntax: Right(%string%,%length%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Right properties:
19.FUNCTION:
Soundex:
Syntax: Soundex(%string%)
Returns a code which identifies a set of words that are (roughly) phonetically alike based
on the standard, open algorithm for SOUNDEX evaluation.
Inputdata:
Outputdata:
JOB:
Transformer_Str_Soundex properties
20.FUNCTION
Space:
Syntax: Space(%length%)
Returns a string of n space characters.
Inputdata:
Output:
JOB:
Transformer_Str_Space Properties:
21.FUNCTION
Squote:
Syntax: Squote(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Squote properties:
22.FUNCTION
Str
Syntax: Str(%string%,%repeats%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Str properties;
23.FUNCTION:
StripWhiteSpace
Syntax: StripWhiteSpace(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_StripWhiteSpace properties:
24. FUNCTION:
Trim:
Syntax: Trim(%string%,[%stripchar%],[%option%])
Inputdata:
Outputdata:
JOB:
Transformer_Str_Trim properties:
Outputdata:
JOB:
Transformer_Str_Trim_Tab Properties:
25.FUNCTION:
TrimF:
Syntax: TrimF(%string%)
Inputdata:
Output data:
JOB:
Transformer_Str_TrimF Properties:
ADDRESS Field Derivation;
TrimF(DSLink5.ADDRESS)
26. FUNCTION:
TrimB:
Syntax: TrimB(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_TrimB Properties:
TrimB(DSLink5.CNAME)
27.FUNCTION:
TrimLeadingTrailing:
Syntax: TrimLeadingTrailing(%string%)
Inputdata:
Output data:
JOB:
Transformer_Str_Trim properties:
TrimLeadingTrailing(DSLink5.CNAME)
28.FUNCTION:
UpCase:
Syntax: UpCase(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_UpCase properties:
UpCase(DSLink5.ADDRESS)
7. TRANSFORMER TYPE CONVERSION FUNCTIONS:
JOB PARAMETERS:
Job parameters are variables which are used to reduce the redundancy of work. There
are two types of job parameters available:
1. Local parameters
2. Global parameters
Note: A global parameter should have the $ symbol as a prefix.
CONTAINERS:
Containers are used to minimize the complexity of a job, for better understanding and
for reusability.
There are two types of containers available in DataStage:
1. Local container
2. Shared container
Local container:
It is used only to minimize the complexity of a job for better understanding; it is never
used for reusability, and its scope is limited to the job.
Shared container:
It is used for both purposes, to minimize the complexity of a job and for reusability, and
its scope is limited to the project.
Differences between local containers and shared containers:
Map the old output link (the shared container link) to the new output link, go to columns,
click on Load, select reconcile from the container link (the old link), click on Validate,
do the same for the remaining links, and click OK.
SWITCH STAGE:
It is used to filter records based on a constraint while populating data from source to
target. It takes one input link and gives more than one output link.
Note: It supports only the equality operator. Double-click on the Switch stage, set
selector = DEPTNO and case = 10, case = 20, case = 30, then drag and drop the required
column links.
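A rough Python analogue of the equality-based routing (DEPTNO and the case values mirror the example above; rows matching no case would need a default or reject link):

```python
# Hedged sketch of Switch-stage routing: the selector column is compared
# for equality against each case value; each case feeds one output link.
rows = [{"DEPTNO": 10}, {"DEPTNO": 20}, {"DEPTNO": 30}, {"DEPTNO": 10}]
cases = {10: [], 20: [], 30: []}
for row in rows:
    link = cases.get(row["DEPTNO"])    # selector = DEPTNO
    if link is not None:
        link.append(row)
```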
Example:
Oracle_Enterprise_0 properties:
Switch stage properties:
Output mappings:
Output name = T1:
Output_Dataset_2 properties:
Output:
Output name = T2:
Output Dataset_3 Properties:
Output:
Output name = T3:
Output Dataset_4 Properties:
Output:
DIFFERENCE STAGE:
It is used to find the difference between two input files (or two input tables).
It takes exactly two input links and gives one output link.
The Difference stage takes the key metadata from the two input links and adds one extra
column, called change_code, to the output link.
If change_code = 1 it is a new record; if change_code = 2 it is a copy record; if
change_code = 3 it is an updated record.
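The change_code assignment can be sketched as a hedged Python analogue (column names and data are illustrative):

```python
# Hedged sketch of the Difference stage's change_code values:
# 1 = new record, 2 = copy (unchanged) record, 3 = updated record,
# comparing "after" rows against "before" rows on the key column ID.
before = {1: {"ID": 1, "SAL": 1000}, 2: {"ID": 2, "SAL": 2000}}
after = [{"ID": 2, "SAL": 2000}, {"ID": 3, "SAL": 3000}, {"ID": 1, "SAL": 1500}]

diff = []
for row in after:
    old = before.get(row["ID"])
    if old is None:
        code = 1          # new record
    elif old == row:
        code = 2          # copy record
    else:
        code = 3          # updated record
    diff.append({**row, "change_code": code})
```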
Example:
COMPRESS STAGE:
It is used to zip a file by using a UNIX command (compress or gzip). It takes one input
link and gives one output link.
EXPAND STAGE:
It is used to unzip the zipped file back into its normal format. It takes one input link and
gives one output link.
DECODE STAGE:
It is used to decode an encoded file back into its normal format. It takes one input link
and gives one output link.
ENCODE STAGE:
It is used to encode a file into an unreadable format, which is preferred for security
purposes. It takes one input link and gives one output link.
PEEK STAGE:
It is used to find out which records are going to which node. It is like a file-type stage,
but it cannot save data to a file. It takes one input link and one or more output links.
JOB:
Input sequential file properties:
Peek stage Output columns:
Output sequential file data:
Option: Peek output mode = Job log:
Job:
Input seqfile data:
Peek stage properties:
Here we set the option Peek output mode = Job log, so we can see the data only in the
log.
Procedure to see the data in the logs:
Go to Tools, then Run Director; now click on View Log and it will show a screen like
this.
In the above screen, click on the 8th row from the bottom; it will show the log details.
EXAMPLE JOB FOR PEEKSTAGE:
Inputdata:
JOB:
Input seqfile data:
Peek stage properties:
Peek output1 mappings:
Peek output3 columns:
Peek output3 mappings:
Peek output1 properties:
Peek output1 data:
Peekoutput2 properties:
Peekoutput2 data:
Peekoutput2 properties:
Peekoutput3 data:
Peekoutput3 properties:
MULTIPLE INSTANCE:
Multiple instance is a useful concept available in DataStage parallel jobs. Through
multiple instances, a developer can run one physical job more than once at the same
time, in parallel, each run with its own invocation ID.
How to allow a job to run more than once at a time:
Go to DataStage Designer, open the required job, go to job properties, check the box
"Allow multiple instance", and click OK.
CONFIGURATION FILE:
The configuration file is available on the DataStage server; its extension is .apt, and it is used to know how many nodes are available to a particular project.
Each node entry contains four components:
1. Fast Name: the network name of the node (server)
2. Pools: used to reserve nodes for specific tasks
3. Resource Scratch Disk: temporary storage
4. Resource Disk: permanent storage for data sets
The default configuration file name is default.apt.
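For illustration, a two-node configuration file might look like the following. The server name and resource paths here are examples only; they vary by installation.

```
{
	node "node1"
	{
		fastname "etl_server"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/tmp" {pools ""}
	}
	node "node2"
	{
		fastname "etl_server"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/tmp" {pools ""}
	}
}
```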
How to view the configuration file:
Go to DataStage Manager → click on Tools → go to Configurations → open default
How to run a specific job with two nodes if the project is running with four nodes:
Go to DataStage Manager → open the default configuration file → save it under another name by clicking the Save button (say, sample) → now delete two nodes from the sample file → save and close.
Go to DataStage Designer → open the particular job → go to Job Properties → go to Parameters → click on Add Environment Variable → select APT_CONFIG_FILE → double click on the default value → choose sample.apt → click OK. Now the required job runs with two nodes.
COMBINABILITY MODE:
It is used to assign a single process to all similar (combinable) stages in a DataStage job, instead of one process per stage.
How to set combinability:
Go to DataStage Designer → open the required job → go to a processing stage → go to Stage → Advanced → set Combinability mode (Auto / Combinable / Don't Combine).
ACTIVE STAGES:
An active stage is a stage in which the data is modified as it passes through.
Example: Transformer, Aggregator.
PASSIVE STAGE:
A passive stage is a stage in which the data is not modified; it only reads or writes data.
Example: Sequential File.
RUNNING A JOB THROUGH A UNIX COMMAND:
dsjob -run -jobstatus -param <parameter1>=<value1> -param <parameter2>=<value2> ... -param <parameterN>=<valueN> <project name> <job name>
JOB SEQUENCE:
It is used to run jobs in a sequence (in order) by considering their dependencies. It has many activities.
How to build a job sequence:
Select Job Sequence → drag and drop the required jobs from Jobs in the Repository → give connections → save it → compile it. Now these 3 jobs will run sequentially.
NOTIFICATION ACTIVITY:
It is used to send a mail to the required persons automatically.
Double click on the Notification activity → go to Notification → SMTP Mail server name: company mail server (www.xyz.com), Sender's email address: abreddy2@xyz.com, Recipients email address: abreddy2@xyz.com, Email subject: Aggregator job has been aborted → give some information in the body → click OK.
TERMINATOR ACTIVITY:
It is used to send a stop request to all running jobs.
WAIT FOR FILE ACTIVITY:
It is used to make the sequence wait for a file.
Double click on the Wait For File activity → go to File → File name: select the file and set the timeout (24-hour time only).
SEQUENCER:
It is used to connect one activity to another activity; it takes multiple input links and gives one output link.
ROUTINE ACTIVITY:
It is used to execute a routine between two jobs.
Double click on the Routine activity → choose the routine name → if the routine requires parameters, supply them.
SLOWLY CHANGING DIMENSIONS:
There are 3 types of SCDs available in DWH.
Type1: maintains only current data; existing records are overwritten with updated data
Type2: maintains current data and full historical data
Type3: maintains current data and partial historical data
EXERCISE-1:
Source table:
no name sal
100 Bhaskar 1500
101 Mohan 2000
103 Sanjeev 2000
Target table (before load):
no name sal
100 Bhaskar 1000
101 Mohan 1500
102 Srikanth 2000
Target table (after load):
no name sal
100 Bhaskar 1500
101 Mohan 2000
102 Srikanth 2000
103 Sanjeev 2000
Type-I:
In SCD Type-I, if a record exists in the source table and not in the target table, simply insert the record into the target table (record 103). If a record exists in both the source and target tables, simply update the target record with the source values (records 100, 101).
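The Type-I logic above can be sketched in Python; this is a minimal illustration on in-memory rows, not DataStage code.

```python
def scd_type1(source, target, key):
    """SCD Type-1 upsert: overwrite target rows with source values
    (no history is kept)."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        merged[row[key]] = dict(row)  # insert new keys, overwrite existing
    return sorted(merged.values(), key=lambda r: r[key])

source = [{"no": 100, "name": "Bhaskar", "sal": 1500},
          {"no": 101, "name": "Mohan",   "sal": 2000},
          {"no": 103, "name": "Sanjeev", "sal": 2000}]
target = [{"no": 100, "name": "Bhaskar",  "sal": 1000},
          {"no": 101, "name": "Mohan",    "sal": 1500},
          {"no": 102, "name": "Srikanth", "sal": 2000}]
result = scd_type1(source, target, "no")
# result matches the "after load" table of Exercise-1:
# 100/1500, 101/2000, 102/2000, 103/2000
```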
Type-II:
While implementing SCD Type-II, two extra columns are maintained in the target, called Effective Start Date and Effective End Date. Effective Start Date is also part of the primary key.
If a record exists in the source and not in the target table, simply insert the record into the target table; while inserting, set Effective Start Date to the current date and Effective End Date to null.
If a record exists in both the source and target tables, we still insert the source record into the target table, but before inserting it we update the existing target record's Effective End Date = Current Date - 1.
Then insert the source record into the target table with Effective Start Date = Current Date and Effective End Date = null.
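The Type-II steps above can be sketched as follows. This is illustrative Python, not DataStage; the column names eff_start and eff_end stand in for Effective Start Date and Effective End Date.

```python
from datetime import date, timedelta

def scd_type2(source, target, key, today):
    # Target rows carry two extra columns: eff_start (part of the
    # primary key) and eff_end (None = current version).
    out = [dict(r) for r in target]
    current = {r[key]: r for r in out if r["eff_end"] is None}
    for row in source:
        old = current.get(row[key])
        if old is not None:
            if all(old.get(k) == v for k, v in row.items()):
                continue                                  # unchanged, skip
            old["eff_end"] = today - timedelta(days=1)    # close old version
        out.append({**row, "eff_start": today, "eff_end": None})
    return out

today = date(2024, 1, 15)
target = [{"no": 100, "sal": 1000, "eff_start": date(2023, 1, 1), "eff_end": None}]
source = [{"no": 100, "sal": 1500}, {"no": 103, "sal": 2000}]
hist = scd_type2(source, target, "no", today)
# the old 100-row is closed with eff_end = 2024-01-14;
# new versions of 100 and 103 are open (eff_end = None)
```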
Type-III:
If a record exists in the source and not in the target table, simply insert the record into the target table; while inserting, set Effective Start Date to the current date and Effective End Date to null.
If a record exists in both the source and target tables, check the target table count grouped by the primary key. If count = 1, update Effective End Date = Current Date - 1, then simply insert the source record into the target table.
If the count is greater than one, delete the record from the target table (grouped by primary key) where Effective End Date is not null. Then update the remaining target record's Effective End Date = Current Date - 1, and simply insert the source record into the target.
DATA WAREHOUSE:
A data warehouse is a collection of transactional and historical data, maintained for analysis purposes.
There are 3 types of tools maintained on any data warehousing project:
1. ETL Tools
2. OLAP Tools (or) Reporting Tools
3. Modeling Tools
ETL TOOL:
ETL stands for Extraction, Transformation, and Loading. An ETL developer extracts data from heterogeneous databases (or) flat files, transforms data from source to target (DWH), applying transformation rules while transforming, and finally loads the data into the DWH.
There are several ETL tools available in the market:
1. DataStage
2. Informatica
3. Ab Initio
4. Oracle Warehouse Builder
5. BODI (Business Objects Data Integrator)
6. SSIS (Microsoft SQL Server Integration Services)
OLAP:
OLAP stands for Online Analytical Processing, and these tools are also called reporting tools.
An OLAP developer analyses the data warehouse and generates reports based on selection criteria.
There are several OLAP tools available:
1. Business Objects
2. Cognos
3. ReportNet
4. SAS
5. MicroStrategy
6. Hyperion
7. SSAS (Microsoft SQL Server Analysis Services)
MODELING TOOL:
Those who work with a modeling tool such as ERwin are called data modelers. A data modeler designs the database of the DWH with the help of such tools.
An ETL developer extracts data from source databases (or) flat files (.txt, .csv, .xls etc.) and populates it into the DWH. While populating data into the DWH, some staging areas may be maintained between source and target; these are called staging area 1 and staging area 2.
STAGING AREA:
A staging area is a temporary place used for cleansing unnecessary data (or) unwanted data (or) inconsistent data.
ER Modeling:
ER modeling stands for entity-relationship modeling. In this model tables are always called entities, and they may be in second normal form (or) third normal form (or) in between 2nd and 3rd normal form.
Dimensional Modeling:
In this model tables are called dimensions (or) fact tables. It can be subdivided into three schemas:
1. Star Schema
2. Snow Flake Schema
3. Multi Star Schema (or) Hybrid (or) Galaxy Schema
Star Schema:
A fact table surrounded by dimensions is called a star schema; it looks like a star.
In a star schema, if there is only one fact table, it is called a simple star schema.
If there is more than one fact table, it is called a complex star schema.
Sales Fact table:
Sale_id
Customer_id
Product_id
Account_id
Time_id
Promotion_id
Sales_per_day
Profit_per_day
Account Dimension:
Account_id
Account_type
Account_holder_name
Account_open_date
Account_nominee
Account_open_balance
Promotion:
Promotion_id
Promotion_type
Promotion_date
Promotion_designation
Promotion_Area
Product:
Product_id
Product_name
Product_type
Product_desc
Product_version
Product_startdate
Product_expdate
Product_maxprice
Product_wholeprice
Customer:
Cust_id
Cust_name
Cust_type
Cust_address
Cust_phone
Cust_nationality
Cust_gender
Cust_father_name
Cust_middle_name
Time:
Time_id
Time_zone
Time_format
Month_day
Week_day
Year_day
Week_Year
DIMENSION TABLE:
A table that contains a primary key and provides detailed information (or) master information is called a dimension table.
FACT TABLE:
A table that contains multiple foreign keys and holds transactions, providing summarized information, is called a fact table.
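To make the fact/dimension relationship concrete, here is a sketch of a star-schema query in Python: the fact table's foreign keys are resolved against a dimension and the additive fact is summed. The rows are hypothetical illustrations, loosely based on the sales schema above.

```python
# Star-join sketch: resolve fact foreign keys against a dimension
# table, then aggregate the fact by a dimension attribute.
product = {1: {"product_name": "Soap"}, 2: {"product_name": "Tea"}}
sales_fact = [
    {"product_id": 1, "customer_id": 10, "sales_per_day": 300},
    {"product_id": 1, "customer_id": 11, "sales_per_day": 200},
    {"product_id": 2, "customer_id": 10, "sales_per_day": 150},
]

sales_by_product = {}
for f in sales_fact:
    name = product[f["product_id"]]["product_name"]  # FK lookup (join)
    sales_by_product[name] = sales_by_product.get(name, 0) + f["sales_per_day"]
print(sales_by_product)  # {'Soap': 500, 'Tea': 150}
```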
DIMENSION TYPES:
There are several dimension types available.
CONFORMED DIMENSION:
If a dimension table is shared with more than one fact table (or) has its foreign key in more than one fact table, that dimension table is called a conformed dimension.
DEGENERATED DIMENSION:
If a fact table acts as a dimension and is shared with another fact table (or) maintains a foreign key in another fact table, such a table is called a degenerated dimension.
JUNK DIMENSION:
A junk dimension contains text values such as genders (male/female) and flag values (true/false), which are not useful for generating reports. Such a dimension is called a junk dimension.
DIRTY DIMENSION:
If a record occurs more than once in a table, differing only in non-key attributes, such a table is called a dirty dimension.
ADDITIVE FACTS:
If a fact in the fact table can be meaningfully added (summed) across every dimension, we call it an additive fact.
SEMI-ADDITIVE FACTS:
If a fact can be added only up to some extent (across some dimensions but not others), we call it a semi-additive fact.
DIFFERENCE BETWEEN STAR SCHEMA AND SNOW FLAKE SCHEMA:
In a star schema the dimension tables are denormalized and join directly to the fact table, giving simpler queries with fewer joins. In a snow flake schema the dimension tables are normalized into sub-dimension tables, which reduces redundancy but requires more joins.
PREPARED BY
BHASKAR REDDY.A
Mail:abreddy2003@gmail.com
Contact: 91-9948047694