DataStage Guide
PALETTE:
The palette contains all kinds of stages:
1. General
2. Database
3. Development and Debug
4. File
5. Processing
6. Real Time
7. Restructure
GENERAL STAGES:
The General group contains 4 stages:
1. Annotation
2. Container
3. Description Annotation
4. Link
Link: It is used to connect two stages.
Annotation: It is used to add comments to a job.
DATABASE STAGES:
It contains all kinds of database stages:
1. DB2/UDB API
2. DB2/UDB Load
3. Dynamic RDBMS
4. ODBC Enterprise
5. SQL Server Enterprise
6. Oracle Enterprise
7. Stored Procedure
8. Teradata Enterprise
9. Teradata Multi Load
FILE STAGES:
1. Complex Flat File
2. Data set
3. External Source
4. External Target
5. File set
6. Look up File set
7. Sequential file
8. SAS Parallel Data Set
PROCESSING STAGES:
1. Aggregator
2. Change Apply
3. Change Capture
4. Compare
5. Compress
6. Copy
7. Decode
8. Difference
9. Encode
10. Expand
11. Filter
12. Funnel
13. Generic
14. Join
15. Lookup
16. Merge
17. Modify
18. Pivot
19. Remove Duplicates
20. External Filter
21. SAS
22. Sort
23. Surrogate Key Generator
24. Switch
25. Transformer
RESTRUCTURE STAGES:
1. Column Export
2. Column Import
3. Combine Records
4. Make Subrecord
5. Make Vector
6. Promote Subrecord
7. Split Subrecord
8. Split Vector
EXAMPLE JOBS FOR SEQUENTIAL FILE STAGE:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Job:
Properties for Sequential_File_0:
Importing a table definition from a sequential file:
Right-click or double-click on Sequential_File_0.
Click on the Load tab at the bottom.
It will show a window like this:
Now, in the file list, select the file emp1.txt, tick the check box "First line is column
names", and click on Define.
Now click on OK and then on Close. The file emp1.txt will now appear in the table
definition list.
It will show a window like this.
Now click on OK, and click OK again. This is the procedure for importing a table
definition.
Input sequential_File_0
Properties:
Emp1.txt
Emp2.txt
Job:
Output Sequential_File_1 Properties:
Input sequential file_0 data:
Job:
Output sequential_File_01 Properties:
Output data:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Input sequential file data:
Job:
Input Seq file data:
Input Properties:
Columns:
Output Properties:
Output data:
Job:
Input properties:
Columns:
Input Data:
Output Properties:
Output data:
7) Example Job for Sequential File:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Input sequential file data:
Job:
Input properties:
Input Columns:
Output properties:
Output Data:
Job:
Columns:
Output sequential file properties:
9) Example Job for Sequential File:
Requirement: Extract EMP data from a text file and load it into a text file using the
Sequential File stage.
Grep options:
1) grep "string": prints lines that contain the string. Ex: grep "bhaskar"
2) grep -v "string": prints lines that do NOT contain the string. Ex: grep -v "bhaskar"
3) grep -i "string": matches the string case-insensitively. Ex: grep -i "bhaskar"
Source file data:
Job:
Input sequential file Properties:
Columns:
Output Data:
Source file data:
Job:
Columns:
DATA SET STAGE:
1. The Data Set stage is a file stage which is used for staging the data for dependent jobs.
2. Data Set is an internal stage of DataStage.
3. The extension of a data set is .ds.
4. It is never used to extract data from the client location.
5. It is used as an intermediate stage between two jobs.
Types of data sets:
1. Persistent data set
2. Virtual data set
Descriptor file: It holds information about the address and the structure of the data.
Data file: Represents the data in native format.
Control and header files: These files operate at the OS level to control the descriptor
and data files.
JOB:
Note: The target data set file extension is .ds.
Output data set data:
It will show the list of files and data sets.
You can view the record schema by clicking on the table definition icon.
Note: By using Data Set Management, we can open a data set, show the schema window,
show the data window, copy a data set, and delete a data set.
DIFFERENCES BETWEEN SEQUENTIAL FILE AND DATA SET:
Sequential File: It is used to extract data from client flat files.
Data Set: It is never used to extract data from client flat files.
SORT STAGE:
It is used to sort the data in either ascending or descending order, based on a key
column, while populating data from source to target. It takes one input link and one
output link.
Double-click on the Sort stage, go to stage properties, and set key = CID (the column the
data is sorted on) and sort order = ascending or descending. Go to the output tab and
drag and drop all columns. In the Stage tab, under Advanced, there is an execution mode
option: if the developer wants to execute in sequential mode, set the execution mode to
sequential (by default it is parallel). Click OK, save, compile, and run it.
Sequential_File_0 properties:
Output Columns:
View data:
Output Requirement:
Job:
Sort_4 Properties:
Output Mapping:
Output columns:
Sort_7 Properties:
Output: mappings
Output columns:
Target Dataset_1 properties:
OUTPUT DATA:
REMOVE DUPLICATES STAGE:
It is used to remove duplicates based on a key column while populating data from source
to target. It takes one input link and one output link.
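The behaviour above can be sketched as a rough Python analogue; this is illustrative, not DataStage code, and the column name "cid" is an assumption:

```python
# Hedged analogue of the Remove Duplicates stage: keep the first row
# seen for each key value (the "Duplicate To Retain = First" option),
# assuming the input is grouped/sorted on that key.
def remove_duplicates(rows, key):
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [
    {"cid": 1, "name": "A"},
    {"cid": 1, "name": "B"},   # duplicate key, dropped
    {"cid": 2, "name": "C"},
]
deduped = remove_duplicates(rows, "cid")
```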
Input file :
Output File:
JOB:
Remove duplicate properties:
Example job for Remove Duplicates:
Input data:
Output requirement:
Job:
Remove duplicate properties:
Target file properties:
COPY STAGE:
It is used to copy the source data into multiple targets. It takes one input link and gives
one or more output links.
Double-click on the Copy stage, go to the output tab, and drag and drop the required
target columns by choosing the output names.
FILTER STAGE:
It is used to filter records while populating data from source to target. It takes one input
link and gives one or more output links and one reject link.
Double-click on the Filter stage and give a constraint in the where clause, like
SAL_ID < 300; if you want more where clauses, click on predicates. Go to the output tab,
drag and drop all columns for the 3 target tables, click OK, and choose the link ordering
for the corresponding constraints.
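The routing just described can be sketched as a rough Python analogue (the column name mirrors the SAL_ID < 300 example above; by default a row satisfying several predicates would go to several links):

```python
# Hedged sketch of Filter-stage routing: each where-clause predicate
# feeds one output link; rows matching no predicate go to the reject link.
def filter_stage(rows, predicates):
    outputs = [[] for _ in predicates]
    reject = []
    for row in rows:
        matched = False
        for i, pred in enumerate(predicates):
            if pred(row):
                outputs[i].append(row)
                matched = True
        if not matched:
            reject.append(row)
    return outputs, reject

rows = [{"sal_id": 100}, {"sal_id": 250}, {"sal_id": 400}]
outputs, reject = filter_stage(rows, [lambda r: r["sal_id"] < 300])
```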
MODIFY STAGE:
It is used to modify column names, data types, and nullability. It takes one input link
and one output link.
1. Renaming a column:
New_Column_Name = Old_Column_Name
2. Dropping a column:
DROP Column_Name
3. Keeping a column:
KEEP Column_Name
4. Type conversion:
Column_Name=type_conversion('Column_Name')
EX: HIREDATE=date_from_timestamp("HIREDATE")
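A rough Python analogue of those four operations on a single row (column names and the timestamp format are illustrative assumptions, not DataStage specifications):

```python
# Hedged analogue of Modify-stage operations: rename, DROP, KEEP,
# and a type conversion similar to date_from_timestamp.
from datetime import datetime

row = {"ENAME": "SMITH", "COMM": 300, "HIREDATE": "1981-02-20 00:00:00"}

# 1. rename: EMP_NAME = ENAME
renamed = {("EMP_NAME" if k == "ENAME" else k): v for k, v in row.items()}
# 2. DROP COMM
dropped = {k: v for k, v in renamed.items() if k != "COMM"}
# 3. KEEP EMP_NAME, HIREDATE
kept = {k: v for k, v in dropped.items() if k in ("EMP_NAME", "HIREDATE")}
# 4. HIREDATE = date_from_timestamp(HIREDATE)
kept["HIREDATE"] = datetime.strptime(kept["HIREDATE"], "%Y-%m-%d %H:%M:%S").date()
```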
FUNNEL STAGE:
It is used to combine the data of all source tables into a single target table. It takes
multiple input links and gives one output link.
Note: It works like a UNION ALL operation; the metadata of all source input links
should be the same.
Double-click on the Funnel stage, go to the output tab, and drag and drop all columns.
CONTINUOUS FUNNEL:
In continuous funnel, records are taken from each input link in turn and populated to the
target.
SORT FUNNEL:
In sort funnel, records are extracted from all input links, merged in ascending order on a
key, and populated to the target.
SEQUENCE FUNNEL:
In sequence funnel, all records from the first input link are populated to the target, then
all records from the second input link, and so on.
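The three modes can be sketched with two small input links; this is a hedged Python analogue of the ordering behaviour, not DataStage code (sort funnel assumes each input is already sorted on the key):

```python
# Hedged sketch of the three funnel modes over two input links.
from heapq import merge
from itertools import chain, zip_longest

link1 = [10, 30]
link2 = [5, 20]

# sequence funnel: drain link1 completely, then link2
sequence_out = list(chain(link1, link2))
# sort funnel: merge the (sorted) links into one sorted stream
sort_out = list(merge(link1, link2))
# continuous funnel: take one record from each link in turn
continuous_out = [r for pair in zip_longest(link1, link2)
                  for r in pair if r is not None]
```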
Example:
AGGREGATOR STAGE:
It is used to find aggregate values, like sum, average, max, and min, after grouping by
key values. It takes one input link and gives one output link.
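A rough Python analogue of grouping and aggregating (the DEPTNO/SAL column names are illustrative):

```python
# Hedged sketch of the Aggregator stage: group by DEPTNO and compute
# sum/max/min of SAL for each group.
from collections import defaultdict

rows = [
    {"DEPTNO": 10, "SAL": 1000},
    {"DEPTNO": 10, "SAL": 3000},
    {"DEPTNO": 20, "SAL": 2000},
]

groups = defaultdict(list)
for r in rows:
    groups[r["DEPTNO"]].append(r["SAL"])

result = {d: {"sum": sum(s), "max": max(s), "min": min(s)}
          for d, s in groups.items()}
```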
3. EXAMPLE JOB FOR AGGREGATOR STAGE:
Input data:
Requirement:
Output file11 data:
JOB:
Database properties:
Aggregator 2 properties:
Output tab:
Column generator properties:
Column Generator output tab:
Aggregator_4 properties:
Output tab:
ADVANCED PROCESSING STAGES:
JOIN STAGE:
It is used to join more than one table based on a key column and populate the data into a
target table while populating from source to target. It takes more than one input link and
gives one output link.
Double-click on the Join stage and set key = the common column name from the input
links, i.e. DEPTNO; it requires the same column name on all input links.
Set join type = inner / left outer / right outer / full outer (choose any one), then go to the
output tab and drag and drop all columns.
1. Inner join:
In an inner join, the Join stage extracts only the matched records from all input links,
based on the key column, and populates them into the target.
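A minimal Python sketch of the inner-join behaviour on DEPTNO (table contents are illustrative, not the EMP/DEPT data shown below):

```python
# Hedged sketch of an inner join: only rows whose key appears on both
# input links reach the output link.
emp = [{"ENAME": "KING", "DEPTNO": 10}, {"ENAME": "SMITH", "DEPTNO": 40}]
dept = {10: "ACCOUNTING", 20: "RESEARCH"}   # DEPTNO -> DNAME

inner = [{**e, "DNAME": dept[e["DEPTNO"]]}
         for e in emp if e["DEPTNO"] in dept]
# SMITH (DEPTNO 40) has no match in dept, so it is dropped
```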
Example:
Emp Table:
Dept Table
Inner join:
LOOKUP STAGE:
It is used to join more than one table based on different key columns and populate the
result into a target table.
It takes one input (stream) link, one or more reference links, and gives one output link
and one reject link.
There are two types of lookup available in DataStage:
1. Normal lookup
2. Sparse lookup
Normal lookup:
A normal lookup loads the reference data into memory, so if the reference tables have a
small amount of data, prefer a normal lookup.
Sparse lookup:
A sparse lookup sends a query to the database for each input row instead of holding the
reference data in memory, so if the reference tables contain a huge amount of data, prefer
a sparse lookup.
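The stream/reference/reject behaviour can be sketched as a hedged Python analogue (column names and the "reject on lookup failure" option are illustrative assumptions):

```python
# Hedged sketch of Lookup-stage behaviour: each stream row is probed
# against a reference table; misses go to the reject link.
stream = [{"EMPNO": 1, "DEPTNO": 10}, {"EMPNO": 2, "DEPTNO": 99}]
reference = {10: "ACCOUNTING"}   # DEPTNO -> DNAME

output, reject = [], []
for row in stream:
    dname = reference.get(row["DEPTNO"])
    if dname is None:
        reject.append(row)              # lookup failure
    else:
        output.append({**row, "DNAME": dname})
```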
MERGE STAGE:
It is used to join more than one table based on a common column while populating from
source to target. It takes more than one input link (one master link, and the remaining
are child links, or update links) and gives one output link, plus (n-1) reject links if there
are "n" source links.
Note: The number of reject links should equal the number of child (update) tables.
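A hedged Python sketch of one master link plus one update link (how unmatched master rows are handled depends on the stage's options; here they pass through unchanged):

```python
# Hedged sketch of the Merge stage: master rows are enriched with the
# matching update row on a shared key; update rows with no master match
# go to that update link's reject link.
master = [{"ID": 1, "NAME": "A"}, {"ID": 2, "NAME": "B"}]
update = [{"ID": 1, "CITY": "HYD"}, {"ID": 3, "CITY": "BLR"}]

update_by_id = {u["ID"]: u for u in update}
master_ids = {m["ID"] for m in master}

merged = [{**m, **update_by_id.get(m["ID"], {})} for m in master]
update_reject = [u for u in update if u["ID"] not in master_ids]
```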
Example:
TRANSFORMER STAGE:
The Transformer stage plays a major role in DataStage. It is used to modify the data and
apply functions while populating data from source to target.
It takes one input link and gives one or more output links.
It has 3 components:
1. Stage variables
2. Constraints
3. Derivations (or expressions)
1. The Transformer stage can work as a Copy stage and as a Filter stage.
2. The Transformer stage requires a C++ compiler; the stage's logic is generated as C++
code and compiled into machine code.
Double-click on the Transformer stage, drag and drop the required target columns, and
click OK.
Each output link contains only one constraint, and each target column contains only one
derivation; a Transformer stage can define any number of stage variables.
The order of execution of the components is:
1. Stage variables
2. Constraints
3. Derivations
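The per-row evaluation order can be sketched as a rough Python analogue (the column names and the 10% bonus calculation are illustrative assumptions, not from any job in this guide):

```python
# Hedged sketch of the Transformer's per-row evaluation order:
# stage variables first, then the link constraint, then the column
# derivations for rows that pass the constraint.
rows = [{"ENAME": "smith", "SAL": 800}, {"ENAME": "king", "SAL": 5000}]
target = []
for row in rows:
    sv_total = row["SAL"] * 1.1          # 1. stage variable
    if sv_total > 1000:                  # 2. constraint
        target.append({                  # 3. derivations
            "ENAME": row["ENAME"].upper(),
            "TOTAL": round(sv_total, 2),
        })
```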
Example:
How to make the Transformer work as a filter stage (or how to apply constraints in the
Transformer stage):
Double-click on the Transformer stage, then double-click on the constraint of a particular
link; this window provides all the information automatically. To route non-matching
rows to a reject link, tick the "Otherwise/Log" option.
Example derivations:
1. DS macros:
DS macros provide some built-in functions like:
1. DSProjectName
2. DSJobName
3. DSHostName
4. DSJobStartDate
5. DSJobStartTime
6. DSJobStartTimestamp
2. DS routines:
These are set-up (user-defined) functions.
3. Job parameters:
Job parameters are variables; they are used to reduce the redundancy of work.
4. Input columns:
It provides all input column names
5. Stage variables:
Stage variables are used to increase performance and to reduce the redundancy of work.
How to define stage variable properties:
Click on a stage variable, right-click, select stage variable properties, and define the
stage variables.
6. System variables:
The Transformer provides some built-in system variables like:
1. @INROWNUM
2. @OUTROWNUM
3. @NUMPARTITIONS
@INROWNUM and @OUTROWNUM give the number of records loaded into the
Transformer stage and the number of records extracted from it; @NUMPARTITIONS
tells how many partitions (nodes) are handling the data.
7. String:
A string is a hard-coded value provided within double quotation marks.
8. Functions:
There are several built-in functions in DataStage:
1. Date&Time
2. Logical
3. Mathematical
4. Null Handling
5. Number
6. Raw
7. String
8. Type Conversion
9. Utility
Input file:
Output requirement
JOB:
Sequential file:
Transformer Stage properties:
INPUT COLUMNS:
OUTPUTLINK:
TARGET FILE:
2) EXAMPLE JOB FOR TRANSFORMER:
JOB2:
Input file:
Output requirement:
JOB:
Transformer stage properties:
Output2:
JOB:
INPUT:
Transformer1:
Constraint logic:
OUTPUTINVC:
OUTPUTPRODID:
4. EXAMPLE JOB FOR TRANSFORMER STAGE:
Input file:seqfile1:
Input file:seqfile0:
Job:
Input file file1 properties:
Join properties:
Stage variable Status derivation:
1. TRANSFORMER DATE&TIME FUNCTIONS:
1.CurrentDate: Returns the date that the job runs in date format.
Syntax: CurrentDate()
Inputdata:
Output Data:
JOB:
Transformer_Time_Date Properties:
2.CurrentTime: Returns the time at which the job runs in time format.
Syntax:CurrentTime()
InputData:
OutputData:
JOB:
Transformer_Time_Date Properties:
3.CurrentTimeStamp: Returns a timestamp giving the date and time that the job runs in
timestamp format
Syntax:CurrentTimeStamp()
InputData:
OutputData:
JOB:
Transformer_Time_Date Properties:
4.DateFromDaysSince:
Syntax: DateFromDaysSince(%number%,[%"yyyy-mm-dd"%])
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
DateFromDaysSince field derivation:
DateFromDaysSince(DSLink5.Field003, DSLink5.Field004)
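In Python terms, DateFromDaysSince amounts to adding a day offset to a base date given as "yyyy-mm-dd"; a hedged analogue (the example values are illustrative):

```python
# Hedged Python analogue of DateFromDaysSince(%number%, %"yyyy-mm-dd"%):
# return the base date plus the given number of days.
from datetime import date, timedelta

def date_from_days_since(number, base="1970-01-01"):
    return date.fromisoformat(base) + timedelta(days=number)

# e.g. date_from_days_since(18, "2011-01-01") gives 2011-01-19
```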
5. DateFromJulianDay
Syntax: DateFromJulianDay(%juliandate%)
Inputdata:
Outputdata:
JOB:
Transformer_TimeandDate:
6. DaysSinceFromDate
Syntax: DaysSinceFromDate(%date%,%"yyyy-mm-dd"%)
Inputdata:
OutputData:
JOB:
Transformer_TimeandDate Properties:
7.HoursFromTime:
Syntax:HoursFromTime(%time%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
8.JulianDayFromDate:
Syntax:JulianDayFromDate(%date%)
Outputdata:
JOB:
Transformer_TimeandDate Properties:
9.MicroSecondsFromTime:
Syntax: MicroSecondsFromTime(%time%)
OutputData:
10.MinutesfromTime:
Syntax:MinutesFromTime(%time%)
Inputdata:
Outputdata:
JOB:
Transformer_TimeandDate Properties:
11.MonthDayFromDate:
Syntax: MonthDayFromDate(%date%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
12. MonthFromDate
Syntax:MonthFromDate(%date%)
InputData:
Outputdata:
JOB:
Transformer_TimeandDate Properties:
13.NextWeekDayFromDate:
Syntax: NextWeekdayFromDate(%sourcedate%,%dayname%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
14.PreviousWeekdayFromDate
Syntax: PreviousWeekdayFromDate(%sourcedate%,%dayname%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date properties:
15. SecondsFromTime
Syntax: SecondsFromTime(%time%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
16.SecondsSinceFromTimeStamp:
Syntax:SecondsSinceFromTimestamp(%timestamp%,%"yyyy-mm-dd hh:nn:ss"%)
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
17.TimeDate:
Syntax:TimeDate()
InputData:
OutputData:
JOB:
Transformer_TimeandDate Properties:
18.TimeFromMidNightSeconds
Syntax: TimeFromMidnightSeconds(%seconds%)
InputData
OutputData:
JOB:
Transformer_TimeandDate Properties:
19.TimestampFromDateTime
Syntax: TimestampFromDateTime(%date%,%time%)
Inputdata:
OutputData:
JOB:
Transformer_TimeandDate Properties:
20. TimestampFromSecondsSince
Syntax: TimestampFromSecondsSince(%seconds%,[%timestamp%])
InputData:
OutputData:
21. TimetFromTimestamp
Syntax: TimetFromTimestamp(%timestamp%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
TimeTfromTimestamp Derivation:
TimetFromTimestamp(Intput.DATEOFJOIN)
22. WeekdayFromDate
Syntax:WeekdayFromDate(%date%,[%startdayname%])
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
23. YeardayFromDate
Syntax: YeardayFromDate(%date%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
24. YearFromDate
Syntax: YearFromDate(%date%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
25. YearweekFromDate
Syntax: YearweekFromDate(%date%)
Inputdata:
Outputdata:
JOB:
Transformer_Time_Date Properties:
TRANSFORMER LOGICAL FUNCTIONS
1. BitAnd:
Syntax: BitAnd(%integer%,%integer%)
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
2. BitCompress:
Syntax: BitCompress(%binarystring%)
Inputdata:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
3. BitExpand
Syntax: BitExpand(%bitfield%)
InputData:
Outputdata:
JOB:
Transformer_Logical_Functions Properties:
4.BitOr:
Syntax:BitOr(%integer%,%integer%)
InputData:
OutputData
JOB:
Transformer_Logical_Functions Properties:
5. BitXOr
Syntax: BitXOr(%integer%,%integer%)
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
6. Not
Syntax: Not(%expression%)
Returns the complement of the logical value of an expression. If the value of expression
is true, the Not function returns a value of false (0). If the value of expression is false, the
NOT function returns a value of true (1).
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
7. SetBit
Syntax: SetBit(%bitfield%,%bitliststring%,%bitstate%)
InputData:
OutputData:
JOB:
Transformer_Logical_Functions Properties:
TRANSFORMER NULL HANDLING FUNCTIONS:
IsNotNull:
Syntax: IsNotNull(%value%)
InputData:
Outputdata:
3.NullToEmpty:
Output:
JOB:
Transformer_Logical_Functions Properties:
NULLTOEMPTY Field Derivation:
NullToEmpty(Intput.COMM)
4. NullToValue
NullToValue(%inputcol%,%value%)
Inputdata:
Outputdata:
JOB:
Transformer_Logical_Functions Properties:
Outputdata:
JOB:
Transformer_Logical_Functions Properties:
STRING FUNCTIONS:
1. Alnum: Checks whether the given string contains only alphanumeric characters.
Syntax: Alnum(%string%)
Input data:
Output data:
JOB:
INPUT PROPERTIES:
Transformer_Str_Alnum properties:
Logic:Alnum(INPUT.EMPNO)
Output data:
2. FUNCTION:
Alpha
Syntax: Alpha(%string%)
Input data:
Outputdata:
JOB:
Transformer_str_Alpha properties:
3.FUNCTION:
CompactWhiteSpace:
Syntax: CompactWhiteSpace(%string%)
Input Data:
Outputdata:
JOB:
Transformer_Str_CompactWhiteSpace properties:
4. FUNCTION
Compare
Syntax: Compare(%string1%,%string2%,[%justification%])
Input Data:
Output data:
5. FUNCTION
CompareNum:
Syntax: CompareNum(%string1%,%string2%,%length%)
Input data:
Output data:
Note: In this job, if the first 5 characters of string1 and string2 are the same, it returns 0;
if string1 > string2 it returns 1; if string1 < string2 it returns -1.
JOB:
Transformer_str_compareNum properties:
6.FUNCTION:
CompareNumNoCase
Syntax: CompareNumNoCase (%string1%,%string2%,%length%)
Input data:
OutputData:
JOB:
Transformer_str_compareNumNoCase properties:
Note: In this job, if the first 5 characters of string1 and string2 are the same, it returns 0;
if string1 > string2 it returns 1; if string1 < string2 it returns -1.
7.FUNCTION:
Convert:
Syntax: Convert(%fromlist%,%tolist%,%expression%)
Inputdata:
Outputdata:
JOB:
Transformer_str_Convert properties:
Outputdata:
8. FUNCTION
Count:
Syntax: Count(%string%,%substring%)
Inputdata:
Outputdata:
JOB:
Transformer_str_Count properties:
Transformer_str_Count field derivation:
Count(Input.STRING1,"A")
Note: In the above data, the field STRING1 contains the substring "A" twice.
9.FUNCTION:
Dcount:
Syntax: Dcount(%string%,%delimiter%)
Counts the number of delimited fields in a string.
InputData:
Outputdata:
JOB:
Transformer_str_Dcount properties:
Transformer_str_Dcount field derivation:
Dcount(Input.STRING1,"|")
10.FUNCTION:
DownCase:
Syntax: DownCase(%string%)
Inputdata:
Output Data:
JOB:
Transformer_str_DownCase properties:
11.FUNCTION:
Dquote:
Syntax: DQuote(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Dquote properties:
12. FUNCTION:
Field:
Syntax: Field(%string%,%delimiter%,%occurrence%,[%number%])
Input data:
Outputdata:
JOB:
Transformer_Str_Field properties:
13. FUNCTION:
Index:
Syntax: Index(%string%,%substring%,%occurrence%)
Input data:
Outputdata:
JOB:
Transformer_Str_Index properties:
Trxr_Str_Index field derivation:
Index(input.FLAVOUR,"chocolate",2)
14. FUNCTION:
Left:
Syntax: Left(%string%,%length%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_left properties:
Trx_Str_left field derivation:
Left(input.FLAVOUR,9)
15. FUNCTION:
Len:
Syntax: Len(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_length properties:
16. FUNCTION:
Num:
Syntax: Num(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Num properties:
Note: If the given field value is numeric, it returns 1; if the value is alphabetic or
alphanumeric, it returns 0.
17.FUNCTION
PadString
Syntax: PadString(%string%,%padstring%,%padlength%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_PadString properties:
18. FUNCTION:
Right:
Syntax: Right(%string%,%length%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Right properties:
19.FUNCTION:
Soundex:
Syntax: Soundex(%string%)
Returns a code which identifies a set of words that are (roughly) phonetically alike based
on the standard, open algorithm for SOUNDEX evaluation.
Inputdata:
Outputdata:
JOB:
Transformer_Str_Soundex properties
20.FUNCTION
Space:
Syntax: Space(%length%)
Returns a string of n space characters.
Inputdata:
Output:
JOB:
Transformer_Str_Space Properties:
21.FUNCTION
Squote:
Syntax: Squote(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Squote properties:
22.FUNCTION
Str
Syntax: Str(%string%,%repeats%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_Str properties;
23.FUNCTION:
StripWhiteSpace
Syntax: StripWhiteSpace(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_StripWhiteSpace properties:
24. FUNCTION:
Trim:
Syntax: Trim(%string%,[%stripchar%],[%option%])
Inputdata:
Outputdata:
JOB:
Transformer_Str_Trim properties:
Outputdata:
JOB:
Transformer_Str_Trim_Tab Properties:
25.FUNCTION:
TrimF:
Syntax: TrimF(%string%)
Inputdata:
Output data:
JOB:
Transformer_Str_TrimF Properties:
ADDRESS Field Derivation;
TrimF(DSLink5.ADDRESS)
26. FUNCTION:
TrimB:
Syntax: TrimB(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_TrimB Properties:
TrimB(DSLink5.CNAME)
27.FUNCTION:
TrimLeadingTrailing:
Syntax: TrimLeadingTrailing(%string%)
Inputdata:
Output data:
JOB:
Transformer_Str_Trim properties:
TrimLeadingTrailing(DSLink5.CNAME)
28.FUNCTION:
UpCase:
Syntax: UpCase(%string%)
Inputdata:
Outputdata:
JOB:
Transformer_Str_UpCase properties:
UpCase(DSLink5.ADDRESS)
7. TRANSFORMER TYPE CONVERSION FUNCTIONS:
JOB PARAMETERS:
Job parameters are variables which are used to reduce the redundancy of work. There
are two types of job parameters available:
1. Local parameters
2. Global parameters
Note: A global parameter should have the $ symbol as a prefix.
CONTAINERS:
Containers are used to minimize the complexity of a job, for better understanding and
for reusability.
There are two types of containers available in DataStage:
1. Local container
2. Shared container
Local container:
It is used only to minimize the complexity of a job for better understanding; it is never
used for reusability, and its scope is limited to the job.
Shared container:
It is used for both purposes, to minimize the complexity of a job and for reusability, and
its scope is limited to the project.
Differences between local containers and shared containers:
Map the old output link (the shared container link) to the new output link, go to columns,
click on Load, select reconcile from the container link (the old link), click on Validate,
do the same for the remaining links, and click OK.
SWITCH STAGE:
It is used to filter records based on a constraint while populating data from source to
target. It takes one input link and gives more than one output link.
Note: It supports only the equality operator. Double-click on the Switch stage, set
selector = DEPTNO and case = 10, case = 20, case = 30, then drag and drop the required
column links.
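A rough Python analogue of the equality-based routing (DEPTNO and the case values mirror the example above; rows matching no case would need a default or reject link):

```python
# Hedged sketch of Switch-stage routing: the selector column is compared
# for equality against each case value; each case feeds one output link.
rows = [{"DEPTNO": 10}, {"DEPTNO": 20}, {"DEPTNO": 30}, {"DEPTNO": 10}]
cases = {10: [], 20: [], 30: []}
for row in rows:
    link = cases.get(row["DEPTNO"])    # selector = DEPTNO
    if link is not None:
        link.append(row)
```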
Example:
Oracle_Enterprise_0 properties:
Switch stage properties:
Output mappings:
Output name = T1:
Output_Dataset_2 properties:
Output:
Output name = T2:
Output Dataset_3 Properties:
Output:
Output name = T3:
Output Dataset_4 Properties:
Output:
DIFFERENCE STAGE:
It is used to find the difference between two input files (or two input tables).
It takes exactly two input links and gives one output link.
The Difference stage takes the key metadata from the two input links and adds one extra
column, called change_code, to the output link.
If change_code = 1 it is a new record; if change_code = 2 it is a copy record; if
change_code = 3 it is an updated record.
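The change_code assignment can be sketched as a hedged Python analogue (column names and data are illustrative):

```python
# Hedged sketch of the Difference stage's change_code values:
# 1 = new record, 2 = copy (unchanged) record, 3 = updated record,
# comparing "after" rows against "before" rows on the key column ID.
before = {1: {"ID": 1, "SAL": 1000}, 2: {"ID": 2, "SAL": 2000}}
after = [{"ID": 2, "SAL": 2000}, {"ID": 3, "SAL": 3000}, {"ID": 1, "SAL": 1500}]

diff = []
for row in after:
    old = before.get(row["ID"])
    if old is None:
        code = 1          # new record
    elif old == row:
        code = 2          # copy record
    else:
        code = 3          # updated record
    diff.append({**row, "change_code": code})
```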
Example:
COMPRESS STAGE:
It is used to zip a file by using a UNIX command (compress or gzip). It takes one input
link and gives one output link.
EXPAND STAGE:
It is used to unzip the zipped file back into its normal format. It takes one input link and
gives one output link.
DECODE STAGE:
It is used to decode an encoded file back into its normal format. It takes one input link
and gives one output link.
ENCODE STAGE:
It is used to encode a file into an unreadable format, which is preferred for security
purposes. It takes one input link and gives one output link.
PEEK STAGE:
It is used to find out which records are going to which node. It is like a file-type stage,
but it cannot save data to a file. It takes one input link and one or more output links.
JOB:
Input sequential file properties:
Peek stage Output columns:
Output sequential file data:
Option: Peek output mode = Job log:
Job:
Input seqfile data:
Peek stage properties:
Here we set the option Peek output mode = Job log, so we can see the data only in the
log.
Procedure to see the data in the logs:
Go to Tools, then Run Director; now click on View Log and it will show a screen like
this.
In the above screen, click on the 8th row from the bottom; it will show the log details.
EXAMPLE JOB FOR PEEKSTAGE:
Inputdata:
JOB:
Input seqfile data:
Peek stage properties:
Peek output1 mappings:
Peek output3 columns:
Peek output3 mappings:
Peek output1 properties:
Peek output1 data:
Peekoutput2 properties:
Peekoutput2 data:
Peekoutput2 properties:
Peekoutput3 data:
Peekoutput3 properties:
MULTIPLE INSTANCE:
Multiple instance is a useful concept available in DataStage parallel jobs. Through
multiple instances, a developer can run one physical job more than once at the same
time, in parallel, each run with its own invocation ID.
How to allow a job to run more than once at a time:
Go to DataStage Designer, open the required job, go to job properties, check the box
"Allow multiple instance", and click OK.
CONFIGURATION FILE:
The configuration file is available on the DataStage server; its extension is .apt, and it is used to know how many nodes are available to a particular project.
Each node entry contains four components:
1. Fast Name: the network name of the node (server)
2. Pools: used to reserve nodes for specific tasks
3. Resource Scratch Disk: temporary storage
4. Resource Disk: permanent storage for data sets
The default configuration file name is default.apt.
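For illustration, a two-node configuration file might look like the following. The server name and resource paths here are examples only; they vary by installation.

```
{
	node "node1"
	{
		fastname "etl_server"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/tmp" {pools ""}
	}
	node "node2"
	{
		fastname "etl_server"
		pools ""
		resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
		resource scratchdisk "/tmp" {pools ""}
	}
}
```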
How to view the configuration file:
Go to DataStage Manager → click on Tools → go to Configurations → open default
How to run a specific job with two nodes if the project is running with four nodes:
Go to DataStage Manager → open the default configuration file → save it under another name by clicking the Save button (say, sample) → now delete two nodes from the sample file → save and close.
Go to DataStage Designer → open the particular job → go to Job Properties → go to Parameters → click on Add Environment Variable → select APT_CONFIG_FILE → double click on the default value → choose sample.apt → click OK. Now the required job runs with two nodes.
COMBINABILITY MODE:
It is used to assign a single process to all similar (combinable) stages in a DataStage job, instead of one process per stage.
How to set combinability:
Go to DataStage Designer → open the required job → go to a processing stage → go to Stage → Advanced → set Combinability mode (Auto / Combinable / Don't Combine).
ACTIVE STAGES:
An active stage is a stage in which the data is modified as it passes through.
Example: Transformer, Aggregator.
PASSIVE STAGE:
A passive stage is a stage in which the data is not modified; it only reads or writes data.
Example: Sequential File.
RUNNING A JOB THROUGH A UNIX COMMAND:
dsjob -run -jobstatus -param <parameter1>=<value1> -param <parameter2>=<value2> ... -param <parameterN>=<valueN> <project name> <job name>
JOB SEQUENCE:
It is used to run jobs in a sequence (in order) by considering their dependencies. It has many activities.
How to build a job sequence:
Select Job Sequence → drag and drop the required jobs from Jobs in the Repository → give connections → save it → compile it. Now these 3 jobs will run sequentially.
NOTIFICATION ACTIVITY:
It is used to send a mail to the required persons automatically.
Double click on the Notification activity → go to Notification → SMTP Mail server name: company mail server (www.xyz.com), Sender's email address: abreddy2@xyz.com, Recipients email address: abreddy2@xyz.com, Email subject: Aggregator job has been aborted → give some information in the body → click OK.
TERMINATOR ACTIVITY:
It is used to send a stop request to all running jobs.
WAIT FOR FILE ACTIVITY:
It is used to make the sequence wait for a file.
Double click on the Wait For File activity → go to File → File name: select the file and set the timeout (24-hour time only).
SEQUENCER:
It is used to connect one activity to another activity; it takes multiple input links and gives one output link.
ROUTINE ACTIVITY:
It is used to execute a routine between two jobs.
Double click on the Routine activity → choose the routine name → if the routine requires parameters, supply them.
SLOWLY CHANGING DIMENSIONS:
There are 3 types of SCDs available in DWH.
Type1: maintains only current data; existing records are overwritten with updated data
Type2: maintains current data and full historical data
Type3: maintains current data and partial historical data
EXERCISE-1:
Source table:
no name sal
100 Bhaskar 1500
101 Mohan 2000
103 Sanjeev 2000
Target table (before load):
no name sal
100 Bhaskar 1000
101 Mohan 1500
102 Srikanth 2000
Target table (after load):
no name sal
100 Bhaskar 1500
101 Mohan 2000
102 Srikanth 2000
103 Sanjeev 2000
Type-I:
In SCD Type-I, if a record exists in the source table and not in the target table, simply insert the record into the target table (record 103). If a record exists in both the source and target tables, simply update the target record with the source values (records 100, 101).
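The Type-I logic above can be sketched in Python; this is a minimal illustration on in-memory rows, not DataStage code.

```python
def scd_type1(source, target, key):
    """SCD Type-1 upsert: overwrite target rows with source values
    (no history is kept)."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        merged[row[key]] = dict(row)  # insert new keys, overwrite existing
    return sorted(merged.values(), key=lambda r: r[key])

source = [{"no": 100, "name": "Bhaskar", "sal": 1500},
          {"no": 101, "name": "Mohan",   "sal": 2000},
          {"no": 103, "name": "Sanjeev", "sal": 2000}]
target = [{"no": 100, "name": "Bhaskar",  "sal": 1000},
          {"no": 101, "name": "Mohan",    "sal": 1500},
          {"no": 102, "name": "Srikanth", "sal": 2000}]
result = scd_type1(source, target, "no")
# result matches the "after load" table of Exercise-1:
# 100/1500, 101/2000, 102/2000, 103/2000
```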
Type-II:
While implementing SCD Type-II, two extra columns are maintained in the target, called Effective Start Date and Effective End Date. Effective Start Date is also part of the primary key.
If a record exists in the source and not in the target table, simply insert the record into the target table; while inserting, set Effective Start Date to the current date and Effective End Date to null.
If a record exists in both the source and target tables, we still insert the source record into the target table, but before inserting it we update the existing target record's Effective End Date = Current Date - 1.
Then insert the source record into the target table with Effective Start Date = Current Date and Effective End Date = null.
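The Type-II steps above can be sketched as follows. This is illustrative Python, not DataStage; the column names eff_start and eff_end stand in for Effective Start Date and Effective End Date.

```python
from datetime import date, timedelta

def scd_type2(source, target, key, today):
    # Target rows carry two extra columns: eff_start (part of the
    # primary key) and eff_end (None = current version).
    out = [dict(r) for r in target]
    current = {r[key]: r for r in out if r["eff_end"] is None}
    for row in source:
        old = current.get(row[key])
        if old is not None:
            if all(old.get(k) == v for k, v in row.items()):
                continue                                  # unchanged, skip
            old["eff_end"] = today - timedelta(days=1)    # close old version
        out.append({**row, "eff_start": today, "eff_end": None})
    return out

today = date(2024, 1, 15)
target = [{"no": 100, "sal": 1000, "eff_start": date(2023, 1, 1), "eff_end": None}]
source = [{"no": 100, "sal": 1500}, {"no": 103, "sal": 2000}]
hist = scd_type2(source, target, "no", today)
# the old 100-row is closed with eff_end = 2024-01-14;
# new versions of 100 and 103 are open (eff_end = None)
```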
Type-III:
If a record exists in the source and not in the target table, simply insert the record into the target table; while inserting, set Effective Start Date to the current date and Effective End Date to null.
If a record exists in both the source and target tables, check the target table count grouped by the primary key. If count = 1, update Effective End Date = Current Date - 1, then simply insert the source record into the target table.
If the count is greater than one, delete the record from the target table (grouped by primary key) where Effective End Date is not null. Then update the remaining target record's Effective End Date = Current Date - 1, and simply insert the source record into the target.
DATA WAREHOUSE:
A data warehouse is a collection of transactional and historical data, maintained for analysis purposes.
There are 3 types of tools maintained on any data warehousing project:
1. ETL Tools
2. OLAP Tools (or) Reporting Tools
3. Modeling Tools
ETL TOOL:
ETL stands for Extraction, Transformation, and Loading. An ETL developer extracts data from heterogeneous databases (or) flat files, transforms data from source to target (DWH), applying transformation rules while transforming, and finally loads the data into the DWH.
There are several ETL tools available in the market:
1. DataStage
2. Informatica
3. Ab Initio
4. Oracle Warehouse Builder
5. BODI (Business Objects Data Integrator)
6. SSIS (Microsoft SQL Server Integration Services)
OLAP:
OLAP stands for Online Analytical Processing, and these tools are also called reporting tools.
An OLAP developer analyses the data warehouse and generates reports based on selection criteria.
There are several OLAP tools available:
1. Business Objects
2. Cognos
3. ReportNet
4. SAS
5. MicroStrategy
6. Hyperion
7. SSAS (Microsoft SQL Server Analysis Services)
MODELING TOOL:
Those who work with a modeling tool such as ERwin are called data modelers. A data modeler designs the database of the DWH with the help of such tools.
An ETL developer extracts data from source databases (or) flat files (.txt, .csv, .xls etc.) and populates it into the DWH. While populating data into the DWH, some staging areas may be maintained between source and target; these are called staging area 1 and staging area 2.
STAGING AREA:
A staging area is a temporary place used for cleansing unnecessary data (or) unwanted data (or) inconsistent data.
ER Modeling:
ER modeling stands for entity-relationship modeling. In this model tables are always called entities, and they may be in second normal form (or) third normal form (or) in between 2nd and 3rd normal form.
Dimensional Modeling:
In this model tables are called dimensions (or) fact tables. It can be subdivided into three schemas:
1. Star Schema
2. Snow Flake Schema
3. Multi Star Schema (or) Hybrid (or) Galaxy Schema
Star Schema:
A fact table surrounded by dimensions is called a star schema; it looks like a star.
In a star schema, if there is only one fact table, it is called a simple star schema.
If there is more than one fact table, it is called a complex star schema.
Sales Fact table:
Sale_id
Customer_id
Product_id
Account_id
Time_id
Promotion_id
Sales_per_day
Profit_per_day
Account Dimension:
Account_id
Account_type
Account_holder_name
Account_open_date
Account_nominee
Account_open_balance
Promotion:
Promotion_id
Promotion_type
Promotion_date
Promotion_designation
Promotion_Area
Product:
Product_id
Product_name
Product_type
Product_desc
Product_version
Product_startdate
Product_expdate
Product_maxprice
Product_wholeprice
Customer:
Cust_id
Cust_name
Cust_type
Cust_address
Cust_phone
Cust_nationality
Cust_gender
Cust_father_name
Cust_middle_name
Time:
Time_id
Time_zone
Time_format
Month_day
Week_day
Year_day
Week_Year
DIMENSION TABLE:
A table that contains a primary key and provides detailed information (or) master information is called a dimension table.
FACT TABLE:
A table that contains multiple foreign keys and holds transactions, providing summarized information, is called a fact table.
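To make the fact/dimension relationship concrete, here is a sketch of a star-schema query in Python: the fact table's foreign keys are resolved against a dimension and the additive fact is summed. The rows are hypothetical illustrations, loosely based on the sales schema above.

```python
# Star-join sketch: resolve fact foreign keys against a dimension
# table, then aggregate the fact by a dimension attribute.
product = {1: {"product_name": "Soap"}, 2: {"product_name": "Tea"}}
sales_fact = [
    {"product_id": 1, "customer_id": 10, "sales_per_day": 300},
    {"product_id": 1, "customer_id": 11, "sales_per_day": 200},
    {"product_id": 2, "customer_id": 10, "sales_per_day": 150},
]

sales_by_product = {}
for f in sales_fact:
    name = product[f["product_id"]]["product_name"]  # FK lookup (join)
    sales_by_product[name] = sales_by_product.get(name, 0) + f["sales_per_day"]
print(sales_by_product)  # {'Soap': 500, 'Tea': 150}
```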
DIMENSION TYPES:
There are several dimension types available.
CONFORMED DIMENSION:
If a dimension table is shared with more than one fact table (or) has its foreign key in more than one fact table, that dimension table is called a conformed dimension.
DEGENERATED DIMENSION:
If a fact table acts as a dimension and is shared with another fact table (or) maintains a foreign key in another fact table, such a table is called a degenerated dimension.
JUNK DIMENSION:
A junk dimension contains text values such as genders (male/female) and flag values (true/false), which are not useful for generating reports. Such a dimension is called a junk dimension.
DIRTY DIMENSION:
If a record occurs more than once in a table, differing only in non-key attributes, such a table is called a dirty dimension.
ADDITIVE FACTS:
If a fact in the fact table can be meaningfully added (summed) across every dimension, we call it an additive fact.
SEMI-ADDITIVE FACTS:
If a fact can be added only up to some extent (across some dimensions but not others), we call it a semi-additive fact.
DIFFERENCE BETWEEN STAR SCHEMA AND SNOW FLAKE SCHEMA:
In a star schema the dimension tables are denormalized and join directly to the fact table, giving simpler queries with fewer joins. In a snow flake schema the dimension tables are normalized into sub-dimension tables, which reduces redundancy but requires more joins.
PREPARED BY
BHASKAR REDDY.A
Mail:abreddy2003@gmail.com
Contact: 91-9948047694