DataStage Material (Question and Answers Collection)
DATASTAGE
4/16/2012
By Pappu Kumar
The client components (Datastage Designer, Datastage Director, Datastage Manager, Datastage Administrator) connect over TCP/IP to the server components (Datastage Server and Datastage Repository).
1. What are the components of DataStage?
Ans:- When DataStage is installed on a PC, four client components appear automatically: DATASTAGE ADMINISTRATOR, DATASTAGE DESIGNER, DATASTAGE DIRECTOR and DATASTAGE MANAGER. These are the client components.
DS Client components:-
1) Data Stage Administrator:- This component is used to create or delete projects, clean up the metadata stored in the repository, and install NLS.
2) Data Stage Manager:- It is used to perform the following tasks:
a) Create the table definitions.
b) Metadata back-up and recovery can be performed.
c) Create the customized components.
3) Data Stage Director:- It is used to validate, schedule, run and monitor DataStage jobs.
4) Data Stage Designer:- It is used to create the DataStage application, known as a job. The following activities can be performed in the Designer window:
a) Create the source definition.
b) Create the target definition.
c) Develop Transformation Rules
d) Design Jobs.
Data Stage Repository:-
It is one of the server-side components; it stores the information needed to build the data warehouse.
Data Stage Server:- This component executes the DataStage jobs that we create.
2. What is a job? And what are the types of jobs?
Ans:- A job is an ordered series of individual stages linked together to describe the flow of data from source to target.
Three types of jobs can be designed:
a) Server jobs
b) Parallel Jobs
c) Mainframe Jobs
3. Have you worked on parallel jobs or server jobs?
Ans:- I have been working on parallel jobs for 3+ years.
5. What are active and passive stages?
I. Passive stages:- Stages which define just read or write access (for example, the file stages) are known as passive stages.
II. Active stages:- Stages which define data transformation and filtering are known as active stages.
Ex:- All processing stages.
6. Explain parallelism techniques?
Ans:- Parallelism is the process of performing the ETL task in a parallel approach to build the data warehouse. Parallel jobs support hardware systems like SMP and MPP to achieve parallelism.
There are two types of parallelism techniques:
A. Pipeline parallelism.
B. Partition parallelism.
Pipeline parallelism:- The data flows continuously through a pipeline, and all stages in the job operate simultaneously.
For example, if my source has 4 records, as soon as the first record starts processing in one stage, the remaining records are processed simultaneously by the other stages.
Partition parallelism:- In this parallelism, the same job is effectively run simultaneously by several processors. Each processor handles a separate subset of the total records.
For example, if my source has 100 records and 4 partitions, the data will be equally partitioned across the 4 partitions, meaning each partition gets 25 records. All four partitions start processing simultaneously and in parallel.
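The two techniques above can be illustrated with a small sketch in plain Python (this is not DataStage code; the `partition` and `process` helpers are invented for illustration, and the thread pool merely stands in for the separate processors DataStage would use). It mirrors the 100-records/4-partitions example:

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, n):
    """Round-robin the records into n subsets, as a partitioner would."""
    return [records[i::n] for i in range(n)]

def process(subset):
    # Stand-in for the per-partition ETL work (e.g. a transform).
    return [r * 10 for r in subset]

source = list(range(100))        # 100 source records
parts = partition(source, 4)     # 4 partitions, 25 records each

# All four partitions are processed concurrently, as in partition
# parallelism; each worker handles its own subset of the records.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, parts))

print([len(p) for p in parts])   # prints [25, 25, 25, 25]
```

Pipeline parallelism would correspond to the stages themselves overlapping (record 1 in stage 3 while record 2 is in stage 2), which DataStage gives you automatically at runtime.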
7. What is a configuration file? What is its use in DataStage?
Ans:- It is a normal text file. It holds information about the processing and storage resources that are available for use during parallel job execution.
A default configuration file looks like this:
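As a sketch, a minimal single-node configuration file has the following shape (the hostname and resource paths here are illustrative, not taken from any particular installation):

```
{
  node "node1"
  {
    fastname "server1"
    pools ""
    resource disk "/opt/datastage/Datasets" {pools ""}
    resource scratchdisk "/opt/datastage/Scratch" {pools ""}
  }
}
```

Adding more node entries to this file increases the degree of partition parallelism without any change to the job design.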
8. Explain partitioning techniques?
f) Round Robin
g) Same
9) Explain each and every File stage?
File stages
Note:- All file stages are passive stages, which means they define just read or write access only.
Sequential File:- It is one of the file stages; it can be used to read data from a file or write data to a file. It supports a single input link or a single output link, as well as a reject link.
Dataset:- It is also one of the file stages; it stores data in an internal format tied to the operating system, so it takes less time to read or write the data.
File Set:- It is also one of the file stages; it can be used to read or write data in a file set. The file is saved with the extension .fs, and it operates in parallel.
Complex Flat File:- This stage is used to read data from a mainframe file. Using CFF we can read ASCII or EBCDIC (Extended Binary Coded Decimal Interchange Code) data. We can select the required columns and omit the remaining ones. We can collect the rejects (badly formatted records) by setting the Reject property to Save (other options: Continue, Fail). We can also flatten arrays (COBOL files).
11) Explain the various types of processing stages?
Processing Stages
Aggregator stage:- It is one of the processing stages; it performs summaries for groups of input data. It supports a single input link, which carries the input data, and a single output link, which carries the aggregated data.
Double-click on the Aggregator stage to open its properties.
Copy stage:- It is also one of the processing stages; it simply copies the input data to a number of output links. It supports a single input link and any number of output links.
Double-click on the Copy stage to open its properties.
Filter stage:- It is also one of the processing stages; it filters the data based on a given condition. It supports a single input link and n number of output links, and optionally one reject link.
Double-click on the Filter stage to open its properties.
Switch stage:- It is also one of the processing stages; it filters the input data based on given conditions. It supports a single input link and up to 128 output links.
Double-click on the Switch stage to open its properties.
Join stage:- It is also one of the processing stages; it combines two or more input datasets based on a key field. It supports two or more input links and one output link, and does not support a reject link.
Join can perform inner, left-outer, right-outer and full-outer joins.
Inner join displays the matched records from both side tables.
Left-outer join shows the matched records from both sides as well as the unmatched records from the left-side table.
Right-outer join shows the matched records from both sides as well as the unmatched records from the right-side table.
Full-outer join shows the matched as well as unmatched records from both sides.
Double-click on the Join stage to open its properties.
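The four join types can be sketched in plain Python (this is only an illustration of the semantics, not DataStage code; the `join` helper and its `how` parameter are invented names):

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on a key field, like the Join stage."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)
    matched_keys = set()
    out = []
    for l in left:
        matches = right_by_key.get(l[key], [])
        if matches:
            matched_keys.add(l[key])
            for r in matches:
                out.append({**r, **l})     # matched: columns from both sides
        elif how in ("left", "full"):
            out.append(dict(l))            # unmatched left-side record
    if how in ("right", "full"):
        for r in right:
            if r[key] not in matched_keys:
                out.append(dict(r))        # unmatched right-side record
    return out

emp = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
dept = [{"id": 1, "dept": "HR"}, {"id": 3, "dept": "IT"}]
print(len(join(emp, dept, "id", "inner")))  # matched only: prints 1
print(len(join(emp, dept, "id", "full")))   # matched + both unmatched: prints 3
```

Note how only the `how` argument changes which unmatched records survive; the matching itself is identical in all four cases.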
Merge stage:- It is also one of the processing stages; it merges multiple input datasets. It supports multiple input links: the first input link is called the master input link and the remaining links are called update links.
It can perform inner join and left-outer join only.
Double-click on the Merge stage to open its properties.
Q) In which case does it perform an inner join, and in which case a left-outer join?
The Merge stage has a property called Unmatched Masters Mode:
If Unmatched Masters Mode = Drop, it performs an inner join.
If Unmatched Masters Mode = Keep, it performs a left-outer join.
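A Python sketch of that behaviour (illustrative only; the `merge` helper and the `unmatched_masters_mode` string values mimic the stage property, they are not a real API):

```python
def merge(master, updates, key, unmatched_masters_mode="keep"):
    """Merge stage sketch: enrich master rows from one update link."""
    upd = {u[key]: u for u in updates}
    out = []
    for m in master:
        if m[key] in upd:
            out.append({**m, **upd[m[key]]})  # matched master row
        elif unmatched_masters_mode == "keep":
            out.append(dict(m))               # Keep => left-outer join
        # Drop => unmatched masters discarded => inner join
    return out
```

With `"keep"` every master record survives (left-outer join); with `"drop"` only matched master records survive (inner join).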
Look-up stage:- This is also one of the processing stages; it performs look-ups on relational tables. It supports multiple input links, a single output link, and a single reject link.
Funnel stage:- It is also one of the active processing stages; it combines multiple input datasets into a single output dataset.
Note:- All the input datasets must have the same structure.
Remove Duplicates stage:- It is also one of the processing stages; it removes duplicate records based on a key field.
It supports a single input link and a single output link.
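The idea can be sketched in a few lines of Python (an illustration, not DataStage code; the real stage expects the input to be sorted or partitioned on the key):

```python
def remove_duplicates(rows, key):
    """Keep the first record per key value; later duplicates are dropped."""
    seen = set()
    out = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(row)
    return out

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
print(remove_duplicates(rows, "id"))  # two rows remain, one per id
```

The stage also offers "retain last" instead of "retain first"; that would simply mean keeping the final record seen for each key.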
Sort stage:- It is also one of the processing stages; it sorts data based on a key field, in either ascending or descending order.
It supports a single input link and a single output link.
Modify stage:- It is also one of the processing stages; it is used for null handling and data type changes. It is used to change data types: if the source contains varchar and the target contains integer, we use the Modify stage and change the type according to the requirement. We can also make some modifications to length.
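The varchar-to-integer conversion with null handling can be sketched like this in Python (illustration only; `modify` and its `default` parameter are invented names standing in for the stage's handle_null and type-conversion specifications):

```python
def modify(rows, column, default=0):
    """Convert a varchar column to integer, replacing nulls with a default,
    as the Modify stage's handle_null + conversion specs would."""
    out = []
    for row in rows:
        row = dict(row)                      # don't mutate the input row
        value = row.get(column)
        row[column] = int(value) if value not in (None, "") else default
        out.append(row)
    return out

print(modify([{"sal": "100"}, {"sal": None}], "sal"))
# the string "100" becomes the integer 100; the null becomes 0
```

In the real stage this is expressed declaratively as a specification such as `sal:int32 = handle_null(sal, 0)` rather than as procedural code.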
Pivot stage:- Basically, you would use a Pivot stage when you need to convert fields like m1, m2, m3 into a single field, marks, which contains one value per row.
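A horizontal pivot of m1, m2, m3 into marks can be sketched in Python (illustration only; the `pivot` helper is an invented name, and column names follow the example above):

```python
def pivot(rows, columns, target):
    """Turn several columns into one target column,
    producing one output row per pivoted source column."""
    out = []
    for row in rows:
        for col in columns:
            fixed = {k: v for k, v in row.items() if k not in columns}
            fixed[target] = row[col]         # one value of m1/m2/m3 per row
            out.append(fixed)
    return out

src = [{"name": "ram", "m1": 10, "m2": 20, "m3": 30}]
for r in pivot(src, ["m1", "m2", "m3"], "marks"):
    print(r)   # three rows, each with name and a single marks value
```

One input row with three mark columns becomes three output rows, each carrying the unchanged key columns plus one marks value.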
Surrogate Key Generator:- It is also an important processing stage; it generates sequence numbers while implementing slowly changing dimensions. A surrogate key is a system-generated key on dimension tables.
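The essential behaviour, a counter that keeps assigning the next free number across runs, can be sketched in Python (illustration only; the class and method names are invented, and the real stage persists its last value in a state file or database sequence):

```python
import itertools

class SurrogateKeyGenerator:
    """Assign a system-generated sequence number to each dimension row,
    continuing from the last used value."""
    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def assign(self, rows, key_column="skey"):
        # Each call continues the sequence rather than restarting it.
        return [{**row, key_column: next(self._counter)} for row in rows]

gen = SurrogateKeyGenerator()
batch1 = gen.assign([{"city": "Pune"}, {"city": "Delhi"}])   # skey 1, 2
batch2 = gen.assign([{"city": "Goa"}])                       # skey 3
```

The continuation across calls is the point: as noted later in this document, 8.0.1 keeps the maximum used value in a state file so keys never restart from the initial value.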
Transformer stage:- It is an active processing stage which allows filtering the data based on a given condition and can derive new data definitions by developing expressions. This stage requires a C++ compiler for its compilation (on Windows installations, the Microsoft Visual Studio .NET compiler is used).
The Transformer stage can perform data cleaning and data scrubbing operations. It can have a single input link, any number of output links, and also a reject link.
Change Capture stage:- This is also one of the active processing stages; it captures the changes between two sources, called before and after. The before dataset is the original data used as the reference; the after dataset is the one we examine for changes. A change code is added to the output dataset, and by this change code deletes, inserts and updates are recognized.
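A Python sketch of the comparison (illustration only; the `change_capture` helper is invented, but the change-code values follow the stage's defaults: 0 = copy, 1 = insert, 2 = delete, 3 = edit, and the real stage by default emits only the changed rows):

```python
def change_capture(before, after, key):
    """Compare before/after datasets on a key and add a change_code:
    0 = copy (unchanged), 1 = insert, 2 = delete, 3 = edit."""
    before_by_key = {r[key]: r for r in before}
    out = []
    for row in after:
        old = before_by_key.pop(row[key], None)
        if old is None:
            out.append({**row, "change_code": 1})   # new key: insert
        elif old != row:
            out.append({**row, "change_code": 3})   # same key, changed: edit
        else:
            out.append({**row, "change_code": 0})   # identical: copy
    for row in before_by_key.values():
        out.append({**row, "change_code": 2})       # missing from after: delete
    return out
```

A downstream Transformer can then route rows by change_code, exactly as in the SCD Type-2 flow described later in this document.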
15) Explain Development and Debug stages?
Column Generator stage:- This stage adds columns to the incoming data and generates mock data for those columns for each row processed. It can have a single input link and a single output link.
Head stage:- This stage is helpful for testing and debugging applications with large datasets. It selects the TOP N rows from the input dataset and copies the selected rows to an output dataset. It can have a single input link and a single output link.
Tail stage:- This stage is helpful for testing and debugging applications with large datasets. It selects the BOTTOM N rows from the input dataset and copies the selected rows to an output dataset. It can have a single input link and a single output link.
Sample stage:- This stage has a single input link and any number of output links when operating in percent or period mode.
Peek stage:- It can have a single input link and any number of output links. It can be used to print record column values to the job log.
21) Can you explain a Type-2 implementation?
SCD Type-2 is a common problem in DWH. It maintains the history information for the organization in the target: for every update in the source, a new record is inserted in the target.
In this implementation there are two input datasets, before and after. These are connected to a Change Capture stage, which is connected to a Transformer stage with two output links, an insert link and an update link. The insert link is connected to a Stored Procedure stage, which is connected to a Transformer, which is connected to the target stage. The other output link (the update link) of the Transformer stage is joined with the target stage after removing duplicate records using a Remove Duplicates stage. The output link of the Join stage is connected to a Transformer stage, which is connected to the target update stage.
For example, the source is an EMP table with 100 records. When I run the job, the records are initially loaded into the target insert stage. How? First the two input datasets are compared; the first time there is no existing version of the records, so the Change Capture stage gives change code = 1. The Transformer stage transforms the records from source to target, generating a sequence for the records by using the Stored Procedure stage.
If any update occurs at the source, the updated records are stored on the target side (TGT_UPDATE). How? First the two input datasets are compared; since a change occurred at the source level, the Change Capture stage gives change code = 3. By using this change
code, the Transformer stage transforms the records to the Join stage through the update link. The Join stage joins the updated records and the target records after removing duplicate records using a Remove Duplicates stage. The output of the Join stage is connected to a Transformer stage, which transforms the update records to the target update stage.
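The whole flow can be condensed into a Python sketch (an illustration of the SCD Type-2 logic, not the actual job; `scd2_apply`, `next_skey` and the `current`/`eff_date` column names are invented for the example):

```python
from datetime import date

def scd2_apply(dimension, incoming, key, next_skey):
    """Minimal SCD Type-2 sketch: new keys are inserted (change code 1);
    changed rows expire the current version and insert a new one
    (change code 3), so the full history is kept in the dimension."""
    today = date.today().isoformat()
    current = {r[key]: r for r in dimension if r["current"]}
    for row in incoming:
        old = current.get(row[key])
        if old is None:                               # insert path
            dimension.append({**row, "skey": next_skey(),
                              "eff_date": today, "current": True})
        elif any(old[c] != row[c] for c in row):      # update path
            old["current"] = False                    # expire old version
            dimension.append({**row, "skey": next_skey(),
                              "eff_date": today, "current": True})
    return dimension
```

Running it twice with a changed city shows the history: the first load inserts one current row; the update expires it and inserts a second row, leaving two versions of the same business key with only the latest marked current.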
17) What are Environment Variables?
Basically, an environment variable is a predefined variable that we can use while creating a DS job. We create/declare these variables in the DS Administrator, and while designing the job we set the properties for these variables. Environment variables are also called global variables.
There are two types of variables:
1. Local variables
2. Environment variables/global variables
Local variables:- apply to one particular job only.
Environment variables:- can be used in any job throughout your project; some default variables exist, and we can also define user-defined variables.
How? Creating project-specific environment variables:
- Start up DataStage Administrator.
- Choose the project and click the "Properties" button.
- On the General tab click the "Environment..." button.
- Click on the "User Defined" folder to see the list of job-specific environment variables.
Here is an example of an environment variable, to make it clearer:
To connect to a database you need a user id, password and schema. These are constant throughout the project, so they are created as environment variables. Use them wherever you want with #Variable#.
By using this, if there is any change to the password or schema there is no need to touch all the jobs: change it at the level of the environment variable and that takes care of all the jobs.
18) Explain Job Parameters?
There is an icon in the toolbar to go to Job Parameters, or you can press Ctrl+J to open the Job Parameters dialog box. There, give a parameter name and a corresponding default value. This lets you enter the value when you run the job; it is not necessary to open the job to change a parameter value. Also, when the job runs through a script, it is enough to give the parameter value on the command line of the script; otherwise you would have to change the value in the job, compile, and then run the script. So parameters make it easy for users to handle jobs.
20) What are the differences between DataStage 7.5 and 8.0.1?
Main differences between DataStage 7.5.2 and 8.0.1:
1. In 7.5.2 we have Manager as a client; in 8.0.1 there is no Manager client, it is embedded in the Designer client.
2. In 7.5.2 QualityStage has a separate designer; in 8.0.1 QualityStage is integrated in the Designer.
3. In 7.5.2 code and metadata are stored in a file-based system; in 8.0.1 code is file-based, whereas metadata is stored in a database.
4. In 7.5.2 we require operating system authentication; in 8.0.1 we require operating system authentication and DataStage authentication.
5. In 7.5.2 we don't have range lookup; in 8.0.1 we have range lookup.
6. In 7.5.2 a single Join stage can't support multiple references; in 8.0.1 a single Join stage can support multiple references.
7. In 7.5.2, when a developer opens a particular job, another developer cannot open the same job. In 8.0.1, when a developer opens a particular job, another developer can open the same job as read-only.
8. In 8.0.1 a compare utility is available to compare two jobs, one in development and another in production; in 7.5.2 this is not possible.
9. In 8.0.1 quick find and advanced find features are available; in 7.5.2 they are not.
10. In 7.5.2, the first time a job is run the surrogate key is generated from the initial value to n; the next time the same job is compiled and run, the surrogate key is again generated from the initial value to n. Automatic increment of the surrogate key does not exist in 7.5.2, but in 8.0.1 the surrogate key is incremented automatically; a state file is used to store the maximum value of the surrogate key.
Q) How did you handle rejected data?
A reject link is defined and the reject data is loaded back into the DWH. A reject link has to be defined for every output link from which you wish to collect rejected data. Reject data is typically bad data, like duplicate primary keys or null rows where data is expected.
Q) What are Routines and where/how are they written? Have you written any routine before?
I didn't use Routines at any time in my project, but I know them. Routines are stored in the Routines branch of the DS Repository, where you can create, view or edit them. The following are the different types of routines:
1) Transform functions
2) Before/after subroutines
3) Job control routines
Q) Explain METASTAGE in DS 8.0.1?
It is used to handle the metadata, which is very useful for data lineage and data analysis later on. Metadata is the type of data we are handling. These data definitions are stored in the repository and can be accessed with the use of MetaStage.
QualityStage can be integrated with DataStage. In QualityStage we have many stages like Investigate, Match and Survivorship; with these we can do the quality-related work. To integrate with DataStage we need the QualityStage plug-in to achieve the task.
There is a stage named Stored Procedure available in the DataStage palette under the Database category. You can use that stage to call your procedure in DataStage jobs.
Controlling DataStage jobs through other DataStage jobs. Ex: Consider two jobs, XXX and YYY. Job YYY can be executed from job XXX by using DataStage macros in Routines.
To execute one job from another job, the following steps need to be followed in Routines:
1. Attach the job using the DSAttachJob function.
2. Run the other job using the DSRunJob function.
3. Stop the job using the DSStopJob function.