
Case Transform
The Case transform allows you to specify multiple conditions and route the data to multiple targets depending on those conditions. As the transform icon indicates, it accepts one source and produces multiple outputs, and only one of the branches is executed per row. Let's see what options are available:

Label: The name of the condition; when a case condition is true, the data is routed to the target connected under that label.
Expression: The expression that defines the case for each output path.
Default: Available only when the "Produce default output when all expressions are false" option is enabled; rows that satisfy no expression are routed here.
Row can be TRUE for one case only: When this option is enabled, a row is passed only to the first case whose expression returns true; otherwise it is passed to every case whose expression returns true.
Data outputs: The connection between the Case transform and the object used for a particular case must be labeled, and each output label must be used at least once.

Design Steps:

Design a DF that extracts records from the EMP table. Place a Case transform on to the workspace from the local object library and connect the source to it. Double-click the Case transform; the Case editor window opens. Now you can add the case conditions according to your requirement.

Here my requirement is to route the records based on DEPTNO and load them into the corresponding targets:
If DEPTNO = 10 then load into TGT_10
If DEPTNO = 20 then load into TGT_20
If DEPTNO = 30 then load into TGT_30
DEPTNO = 40 (and anything else) goes to the default target
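Conceptually, the routing works like a conditional dispatch over each incoming row. The following Python sketch is only an analogy of that behaviour, not DI code; the row layout and target names mirror the example above but are illustrative assumptions.

    # Conceptual sketch of Case-transform routing (not DI code).
    # Each row is sent to exactly one target based on DEPTNO.
    targets = {"TGT_10": [], "TGT_20": [], "TGT_30": [], "TGT_DEFAULT": []}

    def route(row):
        deptno = row["DEPTNO"]
        if deptno == 10:
            targets["TGT_10"].append(row)       # label CASE_10
        elif deptno == 20:
            targets["TGT_20"].append(row)       # label CASE_20
        elif deptno == 30:
            targets["TGT_30"].append(row)       # label CASE_30
        else:
            targets["TGT_DEFAULT"].append(row)  # default output

    for row in [{"EMPNO": 7369, "DEPTNO": 20}, {"EMPNO": 7839, "DEPTNO": 10}]:
        route(row)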

After defining the case conditions, come out of the editor and link the appropriate targets.

Validate the DF, save the job and execute it.

Now check the data in each target.

Date_Generation Transform
Notes on the Date_Generation transform: This is the ultimate transform for creating time dimension tables. It generates a sequence of dates incremented as you specify. Options:

Start date: The DI documentation says the date range starts from 1900.01.01. In my sanity tests, the range actually starts from 1752.09.14 onwards. In version 11.5, if you enter an earlier date, validating the DF produces an error stating that the range starts from 1900.01.01. There is a simple workaround: save the job after designing your time dimension without validating the DF/job. This limitation has been resolved in later versions, and you can also pass variables instead of selecting fixed values.

End date: The date range ends at 9999.12.31. Instead of selecting values we can pass variables as well.

Increment: Specifies the date interval of the sequence; you can increment daily, weekly, or monthly.

Join rank: When constructing a join, sources are joined based on their ranks.

Cache: The dataset is cached in memory so it can be used by later transforms.
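For intuition only, the sequence the transform produces behaves like the Python sketch below (a start date, an end date, and a daily/weekly/monthly increment). This is an analogy under simplifying assumptions, not the transform's internal logic; the monthly branch clamps to the first of the month for brevity.

    # Conceptual sketch of Date_Generation (not DI code).
    from datetime import date, timedelta

    def generate_dates(start, end, increment="daily"):
        current = start
        while current <= end:
            yield current
            if increment == "daily":
                current += timedelta(days=1)
            elif increment == "weekly":
                current += timedelta(weeks=1)
            elif increment == "monthly":
                # advance one calendar month, clamped to the 1st for simplicity
                year = current.year + (current.month // 12)
                month = current.month % 12 + 1
                current = date(year, month, 1)

    # e.g. every day of 2024
    dates = list(generate_dates(date(2024, 1, 1), date(2024, 12, 31)))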

Design Steps:

Drag the Date_Generation transform from the object library on to the workspace, connect it to a Query transform, and connect the Query transform to your target object. Your design would look like this:

Now open the Date_Generation transform and specify the start date, end date, and increment; check the image for reference.

In the Query transform I applied functions such as month(), year(), quarter(), day_in_month(), week_in_year(), etc. Check the image to see how I mapped them.
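As a rough analogy for those mappings, the Python sketch below derives the same kind of time-dimension attributes from each generated date. The standard datetime calls only approximate the DI functions named above and may differ in edge cases (e.g. week numbering), so treat it purely as an illustration.

    # Rough Python analogy of the time-dimension attributes (not DI functions).
    from datetime import date

    def time_dim_row(d: date) -> dict:
        return {
            "CAL_DATE": d,
            "YEAR": d.year,                      # year()
            "MONTH": d.month,                    # month()
            "QUARTER": (d.month - 1) // 3 + 1,   # quarter()
            "DAY_IN_MONTH": d.day,               # day_in_month()
            "WEEK_IN_YEAR": d.isocalendar()[1],  # week_in_year() (ISO week; may differ)
        }

    print(time_dim_row(date(2024, 3, 15)))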

Now you're done with the design part; save the job, execute it, and view the data.

Merge Transform
Merge Transform: Combines two or more schemas into a single schema; it is equivalent to the UNION ALL operator in Oracle. The input datasets must have the same structure: the same number of columns, the same column names, and matching data types and sizes. A small sketch of this behaviour follows the design steps below.

Design Steps:

Place your source tables on the work area. Drag the Merge transform from the object library and connect each source to it. Open the Merge transform; there is nothing to configure here, just review the window and come out of the transform. Now connect it to the target table, click Validate All, save the job, and execute it.
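To picture the UNION ALL semantics mentioned above: every row from every input is kept, duplicates included. The Python sketch below is only an illustration of that idea with made-up rows, not DI code.

    # Conceptual sketch of Merge / UNION ALL semantics (not DI code).
    src_a = [{"EMPNO": 7369, "DEPTNO": 20}]
    src_b = [{"EMPNO": 7839, "DEPTNO": 10}, {"EMPNO": 7369, "DEPTNO": 20}]

    # All inputs must share the same columns and data types.
    merged = src_a + src_b          # duplicates are preserved, unlike UNION
    print(len(merged))              # 3 rows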

Key_Generation Transform
When creating a dimension table in a data warehouse, we generally create the table with a system-generated key to uniquely identify a row in the dimension. This key is also known as a surrogate key.

Note on Key_Generation: To generate artificial keys in DI we can use either the Key_Generation transform or the key_generation function. It looks into the target table, fetches the maximum existing key value, and uses that as the starting value. Based on this starting value, the transform/function increments the value for each new row.

Options: We have three options


Table name: Provide the table name along with the datastore and owner (DATASTORE.OWNER.TABLE).
Generated key column: The new artificial keys are inserted into this column. The key column must have a numeric data type (REAL, FLOAT, DOUBLE, INTEGER, DECIMAL); if it is any other data type, DI will throw an error.
Increment value: Specify the interval for the system-generated key values. The surrogate key is incremented by this value. From version 11.7 onwards, we can pass variables as well.
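The behaviour amounts to "look up the current maximum key, then hand out max + increment, max + 2 * increment, ...". The Python sketch below illustrates that idea only; the column name and starting value are placeholders, not DI code.

    # Conceptual sketch of Key_Generation (not DI code).
    def add_surrogate_keys(existing_max_key, new_rows, increment=1):
        key = existing_max_key
        for row in new_rows:
            key += increment
            row["SURROGATE_KEY"] = key
        return new_rows

    # If the target currently has max SURROGATE_KEY = 105:
    rows = add_surrogate_keys(105, [{"CUSTOMER_ID": "C-9"}, {"CUSTOMER_ID": "C-10"}])
    # rows now carry SURROGATE_KEY 106 and 107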

Design Steps: Here I'm populating customer information. I have a primary key called Customer_ID in both the source and target tables, but I also want to maintain a SURROGATE_KEY.

Have a glance at the source data; here it is.

The Key_Generation transform always expects the SURROGATE_KEY column to be present in Schema In.

After the job execution completes, here is the data in the CUSTOMERS_DIM target table with the surrogate key values.

Map_Operation Transform
Map_Operation: This transform allows you to change the opcodes on your data. Before discussing it, we should understand opcodes precisely. In DI we have five opcodes: Normal, Insert, Update, Delete, and Discard (you will see the Discard option in Map_Operation only).
Normal: Creates a new row in the target. Data coming from a source is usually flagged with the Normal opcode.
Insert: Does the same thing; it creates a new row in the target, and the rows are flagged as I (Insert).
Update: Rows flagged as U overwrite an existing row in the target.
Delete: Rows flagged as D are deleted from the target.

Discard: If you select this option, the rows are not loaded into the target.

Understanding opcodes: Here is an example. In the figure below I am using the Normal-to-Normal, Normal-to-Insert, Normal-to-Update, and Normal-to-Delete mappings. I took Normal as the input opcode mainly because the Query transform always takes normal rows as input and produces normal rows as output. I will cover the remaining opcodes in the Table_Comparison and History_Preserving transforms.
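A way to picture what Map_Operation does: it does not filter on data values or change columns, it only re-tags (or discards) rows according to their opcode. The Python sketch below is an illustrative analogy under that assumption, not DI code.

    # Conceptual sketch of Map_Operation opcode mapping (not DI code).
    # mapping says what each incoming opcode becomes: another opcode or "DISCARD".
    def map_operation(rows, mapping):
        out = []
        for opcode, row in rows:                 # rows arrive tagged with an opcode
            new_opcode = mapping.get(opcode, "DISCARD")
            if new_opcode != "DISCARD":
                out.append((new_opcode, row))
        return out

    incoming = [("NORMAL", {"EMPNO": 7369, "DEPTNO": 20})]

    # e.g. the MO_Insert flow: Normal -> Insert, everything else discarded
    print(map_operation(incoming, {"NORMAL": "INSERT"}))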

In the first flow, MO_Normal, I selected Normal as Normal and discarded all the other opcodes.

This flow inserts all records coming from the source into the target. In the second flow, MO_Insert, I selected Normal as Insert and discarded all the other opcodes.

It does the same thing: it inserts all records into the target. Have a glance at both data sets before they are loaded into the target. You will see no opcode marker for the Normal-as-Normal rows (first flow), but you can see the Insert opcode indicated as I for the Normal-as-Insert rows (second flow).

In the third flow, I want to update a few records in the target. Let's say I want to update all the records whose DEPTNO = 20.

Now, I selected Normal as Update in the Map_Operation and discarded all the other opcodes.

Check the data; you can see the updated rows flagged as U.

In the fourth flow, I want to delete some records from the target. Let's say I want to delete the rows whose DEPTNO = 30; in the Map_Operation transform I selected Normal as Delete and discarded all the other opcodes. In the Query transform I filtered down to the records that I want to delete from the target.

You can see the dataset after the Map_Operation along with the Delete opcode D.

In the target dataset, the above records are deleted. Check the target data.

In the sub-flow, I inserted these deleted records into another table. For this I added one more Map_Operation and mapped the Delete row type to Insert.

In the final flow, I discarded all the opcodes because I do not want to load any data into the target.

Check the data

Pivot Transform
Pivot: This transform creates a new row for each value in the columns you specify as pivot columns. Observe the icon: it indicates that the transform converts columns to rows. Options:

Pivot sequence column: Creates a sequence number for each row generated from a pivoted column.
Non-pivot columns: The columns you list here are passed through to the target unchanged.
Pivot sets: For each pivot set you define a group of columns together with a header column and a data column. The header column holds the names of the pivoted columns, and the data column holds the actual data from those columns.
Pivot columns: The set of columns swivelled into rows.

Design Steps: The source table has five columns (Sno, Sname, Jan_sal, Feb_sal, Mar_sal). I want to convert the salary column values into rows.
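The column-to-row conversion can be pictured with the Python sketch below. It uses the column names from this example, but the sample row values are made up for illustration; it is an analogy, not the transform's code.

    # Conceptual sketch of the Pivot transform (not DI code).
    def pivot(rows, non_pivot_cols, pivot_cols):
        out = []
        for row in rows:
            for seq, col in enumerate(pivot_cols, start=1):
                new_row = {c: row[c] for c in non_pivot_cols}
                new_row["PIVOT_SEQ"] = seq        # sequence per pivoted column
                new_row["PIVOT_HDR"] = col        # original column name
                new_row["PIVOT_DATA"] = row[col]  # original column value
                out.append(new_row)
        return out

    src = [{"SNO": 1, "SNAME": "Ravi", "JAN_SAL": 1000, "FEB_SAL": 1100, "MAR_SAL": 1200}]
    for r in pivot(src, ["SNO", "SNAME"], ["JAN_SAL", "FEB_SAL", "MAR_SAL"]):
        print(r)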

Drag the source and target tables from the datastore object library on to the workspace, then drag the Pivot transform and place it between your source and target. Now connect the objects as shown in the figure below.

Have a glance at the source data.

Double-click on the Pivot transform. Check the pivot sequence name; by default it is PIVOT_SEQ, which you can rename or leave as is. I want to load Sno and Sname as they are, so I dragged these two columns into the non-pivot columns list. Now drag all the SAL columns into the pivot columns list. The default PIVOT_DATA and PIVOT_HDR names are generated.

Save the definition; now you can see the columns (SNO, SNAME, PIVOT_SEQ, PIVOT_HDR, PIVOT_DATA) in Schema Out.

Come out of the Pivot transform by pressing the Back button on the standard toolbar. Save the dataflow, validate it, and execute the job. Check the resultant data.
