Example of SCD1 and Update Strategy
Many beginners struggle to get SCD logic working manually in Informatica. Let's see how, with a simple example, step by step.
Assumption: working as the SCOTT user in Oracle.
SRC:
create table emps_us as select empno,ename,sal from emp ;
TGT:
create table empt_us as select empno,ename,sal from scott.emp where 1=2 ;
alter table empt_us add constraint eno_pk primary key(empno) ;
Step 1: Let's get the simple pass-through working.
Transfer the data from source to target using Informatica.
Note: if this initial load is not done, the update in the next step has no matching target rows and may appear not to work.
Step 2: Let's get the update working.
Objective: when source rows change, update only those changed rows in the target.
SRC: update emps_us set sal=sal+1000 where empno in ( 7900,7902,7934);
Power Center Designer
Drag in the source and target, and create an Update Strategy transformation.
Straight link: source - Update Strategy - target.
Now we need to look up into the target to see which rows to update. We will do this using an unconnected Lookup transformation, looking for rows where SAL has changed.
Create the Lookup transformation, select the target table, and add two input ports, IN_EMPNO and IN_SAL. These values will be supplied when the lookup transformation is called. Add the conditions:
empno = IN_EMPNO and SAL != IN_SAL
In the Ports tab, enable R (return) for EMPNO. This signifies that a non-NULL value is returned when the conditions are met.
Now set the Update Strategy expression to:
IIF( NOT ISNULL( :LKP.LKPTRANS(EMPNO,SAL)),DD_UPDATE,DD_REJECT )
Save and create a Workflow.
Important:
1) Workflow -> Properties -> Treat Source Rows As: change it to Data Driven.
2) The target must have a primary key for the update to work.
Now run the workflow and observe that the modified rows get updated.
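The Step 2 flow can be sketched in plain Python (the function and variable names here are illustrative, not PowerCenter APIs): the unconnected lookup returns EMPNO only when a target row with the same EMPNO but a different SAL exists, and the Update Strategy expression maps that to DD_UPDATE or DD_REJECT.

```python
# Sketch of the unconnected-lookup update logic from Step 2.
DD_UPDATE, DD_REJECT = 1, 3  # Informatica's numeric values for these constants

def lkp_changed_sal(target_rows, in_empno, in_sal):
    """Mimics the lookup condition: empno = IN_EMPNO and SAL != IN_SAL."""
    for empno, ename, sal in target_rows:
        if empno == in_empno and sal != in_sal:
            return empno          # the return port (R) delivers EMPNO
    return None                   # no match -> NULL

def flag_row(target_rows, src_row):
    empno, ename, sal = src_row
    # IIF( NOT ISNULL( :LKP.LKPTRANS(EMPNO,SAL)), DD_UPDATE, DD_REJECT )
    return DD_UPDATE if lkp_changed_sal(target_rows, empno, sal) is not None else DD_REJECT

target = [(7900, 'JAMES', 950), (7902, 'FORD', 3000)]
print(flag_row(target, (7900, 'JAMES', 1950)))  # changed SAL -> DD_UPDATE
print(flag_row(target, (7902, 'FORD', 3000)))   # unchanged -> DD_REJECT
```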
Step 3: Now let's get the insert working too.
Objective: along with updating the existing rows, we need to add new rows.
Select the Dynamic Lookup Cache and Insert Else Update options in the Lookup transformation properties.
Create one dummy output port in an Expression transformation to pass a date to the target, and assign SYSDATE in the expression editor.
Create two groups in a Router transformation, one for INSERT and one for UPDATE.
Give the condition NewLookupRow = 1 for the insert group and NewLookupRow = 2 for the update group.
Connect the insert group of the router to the insert pipeline of the target, and the update group to the update pipeline of the target through an Update Strategy transformation.
This completes the coding for SCD Type 1 using a dynamic Lookup transformation.
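The dynamic-cache behaviour that drives the router can be sketched as follows (a simplified model, not the actual Integration Service code): NewLookupRow is 1 when a row is inserted into the cache, 2 when it changes an existing cache entry, and 0 when nothing changed.

```python
# Sketch of the dynamic lookup cache: a dict maps EMPNO -> SAL.
def dynamic_lookup(cache, empno, sal):
    if empno not in cache:
        cache[empno] = sal
        return 1          # NewLookupRow = 1 -> route to the insert group
    if cache[empno] != sal:
        cache[empno] = sal
        return 2          # NewLookupRow = 2 -> route to the update group
    return 0              # unchanged -> filtered out

cache = {7900: 950}
print(dynamic_lookup(cache, 7900, 1950))  # 2 (existing row, SAL changed)
print(dynamic_lookup(cache, 8000, 4000))  # 1 (new row)
print(dynamic_lookup(cache, 8000, 4000))  # 0 (no change)
```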
Execution:
Insert records into the source CUST table using the following script:
SET DEFINE OFF;
Insert into CUST (CUST_ID, CUST_NM, ADDRESS, CITY, STATE, INSERT_DT, UPDATE_DT)
Values (80001, 'Marion Atkins', '100 Main St.', 'Bangalore', 'KA', SYSDATE, SYSDATE);
Insert into CUST (CUST_ID, CUST_NM, ADDRESS, CITY, STATE, INSERT_DT, UPDATE_DT)
Values (80002, 'Laura Jones', '510 Broadway Ave.', 'Hyderabad', 'AP', SYSDATE, SYSDATE);
Insert into CUST (CUST_ID, CUST_NM, ADDRESS, CITY, STATE, INSERT_DT, UPDATE_DT)
Values (80003, 'Jon Freeman', '555 6th Ave.', 'Bangalore', 'KA', SYSDATE, SYSDATE);
COMMIT;
The data in the source will look like below.
Start the workflow after inserting the records into the CUST table. After it completes, all the records are loaded into the target, and the data will look like below.
Now update any record in the source and rerun the workflow: the changed record is updated in the target. Any source records not present in the target are inserted into the target table.
( DEPTNO != DEPTNO1 ) ), 1, 0 )
Step 7: Get a Router transformation and connect the source rows (the ones with "1" suffixed) to it. Make sure the insert_flg and update_flg columns that you created are also connected, and give group names to create two groups as follows:
Group 1. Insert_rows: group filter condition: Insert_flg
Group 2. Update_rows: group filter condition: Update_flg
Step 8: Now connect the Insert_rows group of the router to the target. Note: here you should not connect the SK column from the source/router transformation; instead, use a Sequence Generator transformation and connect its NEXTVAL port to the SK of the target, since the sequence needs to advance to the next value whenever a new row is added.
Step 9: Connect the Update_rows group of the router to an Update Strategy transformation with the expression DD_UPDATE, and then connect it to target instance 2. (Note: target instance 2 is nothing but a copy-paste of the target table in the mapping; it is not created separately in the target database.)
Step 10: Now create a workflow and a session task for the mapping and run it.
Make sure you commit the changes made on the source side :-)
The logic is very simple:
1. First, the lookup checks the cache for the existence of the given row.
2. If the SK does not exist, the row is inserted into the target/dimension table.
3. After this, the sequence generator advances to the next value with respect to the target/dimension table.
4. If the SK exists, the condition points to the update_flg and a DD_UPDATE is done on the corresponding row in the target table.
5. The same process then continues with the next row onwards.
6. Note: the SK in the target table must be a primary key, without fail.
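The six points above amount to the following sketch (illustrative Python, with a dict standing in for the dimension table and itertools.count for the sequence generator):

```python
# Sketch of SK-based SCD Type 1 loading: insert new natural keys with the
# next sequence value, overwrite existing ones in place.
import itertools

def load_row(dim, seq, natural_key, attrs):
    # look up the surrogate key (SK) by natural key
    sk = next((k for k, v in dim.items() if v[0] == natural_key), None)
    if sk is None:
        sk = next(seq)                    # sequence generator NEXTVAL
        dim[sk] = (natural_key, attrs)    # DD_INSERT path
    else:
        dim[sk] = (natural_key, attrs)    # DD_UPDATE path (Type 1 overwrite)
    return sk

dim, seq = {}, itertools.count(1)
print(load_row(dim, seq, 'ABC', 'CA'))  # 1: new key, inserted with SK 1
print(load_row(dim, seq, 'ABC', 'IL'))  # 1: existing key, updated in place
```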
Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly, rather than on a regular, time-based schedule.
For example, you may have a dimension in your database that tracks the sales records of your
company's salespeople. Creating sales reports seems simple enough, until a salesperson is
transferred from one regional office to another. How do you record such a change in your sales
dimension?
You could sum or average the sales by salesperson, but if you use that to compare the
performance of salesmen, that might give misleading information. If the salesperson that was
transferred used to work in a hot market where sales were easy, and now works in a market
where sales are infrequent, her totals will look much stronger than the other salespeople in her
new region, even if they are just as good. Or you could create a second salesperson record and
treat the transferred person as a new sales person, but that creates problems also.
Dealing with these issues involves SCD management methodologies:
Type 1:
The Type 1 methodology overwrites old data with new data, and therefore does not track
historical data at all. This is most appropriate when correcting certain types of data errors, such
as the spelling of a name. (Assuming you won't ever need to know how it used to be misspelled
in the past.)
Here is an example of a database table that keeps supplier information:
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State
123          | ABC           | Acme Supply Co | CA
In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key.
Technically, the surrogate key is not necessary, since the table will be unique by the natural key
(Supplier_Code). However, the joins will perform better on an integer than on a character string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would
simply overwrite this record:
Supplier_Key | Supplier_Code | Supplier_Name  | Supplier_State
123          | ABC           | Acme Supply Co | IL
The obvious disadvantage to this method of managing SCDs is that there is no historical record
kept in the data warehouse. You can't tell if your suppliers are tending to move to the Midwest,
for example. But an advantage to Type 1 SCDs is that they are very easy to maintain.
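The supplier example can be reproduced with a few lines of SQL; here an in-memory SQLite database stands in for the warehouse (the table and column names follow the example above):

```python
# Type 1 overwrite on the supplier dimension: update in place, no history.
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""CREATE TABLE supplier_dim (
    supplier_key INTEGER PRIMARY KEY,
    supplier_code TEXT, supplier_name TEXT, supplier_state TEXT)""")
con.execute("INSERT INTO supplier_dim VALUES (123, 'ABC', 'Acme Supply Co', 'CA')")

# The supplier moves to Illinois: Type 1 simply overwrites the record.
con.execute("UPDATE supplier_dim SET supplier_state = 'IL' WHERE supplier_code = 'ABC'")

state = con.execute("SELECT supplier_state FROM supplier_dim "
                    "WHERE supplier_key = 123").fetchone()[0]
print(state)  # IL -- the fact that the supplier was ever in CA is lost
```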
Explanation with an Example:
Source Table: (01-01-11)
Empno | Ename | Sal
101   | A     | 1000
102   | B     | 2000
103   | C     | 3000

Target Table: (01-01-11)
Empno | Ename | Sal
101   | A     | 1000
102   | B     | 2000
103   | C     | 3000
The necessity of the lookup transformation is illustrated using the above source and target table.
Source Table: (01-02-11)
Empno | Ename | Sal
101   | A     | 1000
102   | B     | 2500
103   | C     | 3000
104   | D     | 4000

Target Table: (01-02-11)
Empno | Ename | Sal
101   | A     | 1000
102   | B     | 2500
103   | C     | 3000
104   | D     | 4000
In the second month, one more employee (Ename D) has been added to the table, and the salary of employee 102 has changed from 2000 to 2500.
Create a table named emp_source with three columns, as shown above, in Oracle.
In the same way, create two target tables named emp_target1 and emp_target2.
Go to the Targets menu and click Generate and Execute to confirm the creation of the target tables.
The snapshot of the connections between the different transformations is shown below.
In this mapping we use four kinds of transformations, namely the Lookup, Expression, Filter, and Update Strategy transformations. The necessity and usage of each transformation is discussed in detail below.
The first thing we are going to do is create a Lookup transformation and connect the Empno port from the Source Qualifier to it.
What the Lookup transformation does in our mapping is look into the target table (emp_target1) and compare it with the Source Qualifier output to determine whether to insert, update, delete or reject rows.
In the Ports tab, add a new column and name it EMPNO1; this is the column we connect from the Source Qualifier.
For the existing columns, the Input box should be unchecked while the Output and Lookup boxes remain checked. For the newly created column, only the Input and Output boxes should be checked.
In the Condition tab, the lookup table column should be EMPNO, the transformation port EMPNO1, and the operator =.
Expression Transformation: after the Lookup transformation, we use an Expression transformation to check whether we need to insert a record or update it. The steps to create an Expression transformation are shown below.
Drag all the columns from both the Source Qualifier and the Lookup transformation and drop them onto the Expression transformation.
Now double-click the transformation, go to the Ports tab, and create two new columns named insert and update. Both columns carry output data, so check only the Output box for each.
The snapshot of the Edit Transformation window is shown below.
The conditions we want our two output ports to evaluate are:
insert: IsNull(EMPNO1)
update: IIF(NOT ISNULL(EMPNO1) AND DECODE(SAL, SAL1, 1, 0) = 0, 1, 0)
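Restated in Python (with None playing the role of NULL, and EMPNO1/SAL1 being the lookup return ports), the two flag expressions behave like this:

```python
# The insert/update flag expressions from the Expression transformation.
def insert_flag(empno1):
    # IsNull(EMPNO1): the row is new when the lookup returned nothing
    return empno1 is None

def update_flag(empno1, sal, sal1):
    # IIF(NOT ISNULL(EMPNO1) AND DECODE(SAL, SAL1, 1, 0) = 0, 1, 0):
    # the row exists and SAL differs from the target's SAL1
    return 1 if (empno1 is not None and sal != sal1) else 0

print(insert_flag(None))             # True: new row, routed to the insert filter
print(update_flag(102, 2500, 2000))  # 1: changed SAL, routed to the update filter
print(update_flag(101, 1000, 1000))  # 0: unchanged, filtered out
```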
Filter Transformation: we will have two Filter transformations, one for inserts and one for updates.
Connect the insert column from the Expression transformation to the first filter, and in the same way connect the update column from the Expression transformation to the second filter.
Then connect the Empno, Ename, and Sal ports from the Expression transformation to both filter transformations.
If a row is new, filter transformation 1 forwards it to Update Strategy transformation 1, and the row appears in the target table as an insert.
If a row's data has changed, filter transformation 2 forwards it to Update Strategy transformation 2, which forwards the updated row to the target table.
Update Strategy Transformation: determines whether to insert, delete, update or reject the rows.
Drag the respective Empno, Ename, and Sal ports from the filter transformations and drop them on the respective Update Strategy transformations.
In the Properties tab, set the update strategy expression to 0 (DD_INSERT) on the first Update Strategy transformation.
Set the update strategy expression to 1 (DD_UPDATE) on the second Update Strategy transformation.
We are all set; finally, connect the outputs of the Update Strategy transformations to the target tables.
We now see the implementation of SCD Type 1 using the customer dimension table as an example.
The source table looks as follows:
CREATE TABLE Customers (
  Customer_Id   Number,
  Customer_Name Varchar2(30),
  Location      Varchar2(30)
);
Now I have to load the data of the source into the customer dimension table using SCD Type 1. The dimension table structure is shown below:
CREATE TABLE Customers_Dim (
  Cust_Key      Number,
  Customer_Id   Number,
  Customer_Name Varchar2(30),
  Location      Varchar2(30)
);
Open the Mapping Designer, go to the Source Analyzer, and either create or import the source definition.
Go to the Warehouse Designer (Target Designer) and import the target definition.
Select the Lookup transformation, enter a name, and click Create. You will get a window as shown in the image below.
Edit the Lookup transformation, go to the Ports tab, and add a new port IN_Customer_Id. Connect this new port to the Customer_Id port of the Source Qualifier transformation.
Go to the Condition tab of the Lookup transformation, enter the lookup condition Customer_Id = IN_Customer_Id, and click OK.
Create the Expression transformation with input ports Cust_Key, Name, Location, Src_Name, Src_Location and output ports New_Flag, Changed_Flag.
For the output ports of the Expression transformation, enter the expressions below and click OK:
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (Name != Src_Name
OR Location != Src_Location),
1, 0 )
Now connect the Lookup transformation ports (Cust_Key, Name, Location) to the Expression transformation ports (Cust_Key, Name, Location), and the Source Qualifier transformation ports (Name, Location) to the Expression transformation ports (Src_Name, Src_Location), respectively.
Create a filter transformation and drag the ports of source qualifier transformation into it.
Also drag the New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as
New_Flag=1. Then click on ok.
Now create an update strategy transformation and connect all the ports of the filter
transformation (except the New_Flag port) to the update strategy. Go to the properties tab
of update strategy and enter the update strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the appropriate ports from
update strategy to the target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target
surrogate key (cust_key) port.
The part of the mapping diagram for inserting a new row is shown below:
Now create another filter transformation and drag the ports from lkp transformation
(Cust_Key), source qualifier transformation (Name, Location), expression transformation
(changed_flag) ports into the filter transformation.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as
Changed_Flag=1. Then click on ok.
Now create an update strategy transformation and connect the ports of the filter
transformation (Cust_Key, Name, and Location) to the update strategy. Go to the
properties tab of update strategy and enter the update strategy expression as DD_Update
Now drag the target definition into the mapping and connect the appropriate ports from
update strategy to the target definition.
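The whole customer-dimension mapping can be simulated end to end with SQLite (a sketch of the logic only; the sequence generator and the flag routing are modelled in plain Python):

```python
# End-to-end simulation of the SCD Type 1 customer mapping:
# New_Flag rows are inserted with a surrogate key, Changed_Flag rows updated.
import sqlite3, itertools

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, location TEXT)")
con.execute("""CREATE TABLE customers_dim (cust_key INTEGER PRIMARY KEY,
               customer_id INTEGER, customer_name TEXT, location TEXT)""")
seq = itertools.count(1)  # stands in for the Sequence Generator's NEXTVAL

def run_mapping():
    for cid, name, loc in con.execute("SELECT * FROM customers").fetchall():
        row = con.execute("SELECT cust_key, customer_name, location FROM customers_dim "
                          "WHERE customer_id = ?", (cid,)).fetchone()
        if row is None:                                   # New_Flag = 1 -> DD_INSERT
            con.execute("INSERT INTO customers_dim VALUES (?,?,?,?)",
                        (next(seq), cid, name, loc))
        elif (name, loc) != row[1:]:                      # Changed_Flag = 1 -> DD_UPDATE
            con.execute("UPDATE customers_dim SET customer_name=?, location=? "
                        "WHERE cust_key=?", (name, loc, row[0]))

con.execute("INSERT INTO customers VALUES (1, 'Marion Atkins', 'Bangalore')")
run_mapping()
con.execute("UPDATE customers SET location='Hyderabad' WHERE customer_id=1")
run_mapping()
dim_rows = con.execute("SELECT cust_key, location FROM customers_dim").fetchall()
print(dim_rows)  # [(1, 'Hyderabad')]: same surrogate key, location overwritten
```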
You might have come across an ETL scenario where you need to update a huge table with a few records and occasional inserts. The straightforward approach of using a Lookup transformation to identify the inserts and updates, and an Update Strategy to perform them, may not be right for this scenario, mainly because the Lookup transformation's performance degrades as the lookup table grows.
In this article, let's talk about a design that handles exactly this scenario.
The Theory
When you configure an Informatica PowerCenter session, you have several options
for handling database operations such as insert, update, delete.
Specifying an Operation for All Rows
During session configuration, you can select a single database operation for all rows
using the Treat Source Rows As setting from the 'Properties' tab of the session.
1. Insert :- Treat all rows as inserts.
2. Delete :- Treat all rows as deletes.
3. Update :- Treat all rows as updates.
4. Data Driven :- The Integration Service follows instructions coded into Update Strategy transformations to flag rows for insert, delete, update, or reject.
Specifying Operations for Individual Target Rows
Once you determine how to treat all rows in the session, you can also set options for individual rows, which gives additional control over how each row behaves. Define these options in the Transformations view on the Mapping tab of the session properties.
1. Insert :- Select this option to insert a row into a target table.
2. Delete :- Select this option to delete a row from a table.
3. Update :- You have three options here (Update as Update, Update as Insert, Update else Insert), covered in detail below.
4. Truncate Table :- Select this option to truncate the target table before loading data.
We can create the mapping just like an 'INSERT'-only mapping, without Lookup or Update Strategy transformations. During session configuration, let's set the session properties so that the session has the capability to both insert and update.
Now let's set the properties for the target table as shown below. Choose the properties Insert and Update else Insert.
That's all we need to set up the session for update and insert without an Update Strategy.
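What this session-level setup achieves is effectively an upsert. As a rough analogy (SQLite's ON CONFLICT clause standing in for the Integration Service's Update else Insert handling; this is an illustration, not what PowerCenter literally executes):

```python
# "Treat Source Rows As Update" + target option "Update else Insert",
# sketched as a SQL upsert against an in-memory SQLite table.
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE tgt (empno INTEGER PRIMARY KEY, sal INTEGER)")

def update_else_insert(empno, sal):
    # update the row if the key exists, otherwise insert it
    con.execute("INSERT INTO tgt VALUES (?, ?) "
                "ON CONFLICT(empno) DO UPDATE SET sal = excluded.sal", (empno, sal))

update_else_insert(7900, 950)    # not in target -> inserted
update_else_insert(7900, 1950)   # already there -> updated
tgt_rows = con.execute("SELECT * FROM tgt").fetchall()
print(tgt_rows)  # [(7900, 1950)]
```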
Hope you enjoyed this article. Please leave us a comment below, if you have any
difficulties implementing this. We will be more than happy to help you.
An update strategy can be defined at two levels:
1. Within a session
2. Within a mapping
Update Strategy within a session:
When we configure a session, we can instruct the Integration Service either to treat all rows in the same way or to use instructions coded into the session's mapping to flag rows for different database operations.
Session Configuration:
Edit Session -> Properties -> Treat Source Rows as: (Insert, Update, Delete, and Data Driven).
Insert is default.
Specifying Operations for Individual Target Tables:
You can set the following update strategy options:
1. Insert: Select this option to insert a row into a target table.
2. Delete: Select this option to delete a row from a table.
3. Update: We have the following options in this situation:
   i.   Update as Update: update each row flagged for update if it exists in the target table.
   ii.  Update as Insert: insert each row flagged for update into the target table.
   iii. Update else Insert: update the row if it exists; otherwise, insert it.
4. Truncate: Select this option to truncate the target table before loading data.
Flagging Rows within a Mapping:
Within a mapping, we use the Update Strategy transformation to flag rows for insert, delete,
update, or reject.
Operation | Constant  | Numeric Value
INSERT    | DD_INSERT | 0
UPDATE    | DD_UPDATE | 1
DELETE    | DD_DELETE | 2
REJECT    | DD_REJECT | 3
In simple terms, when you set the Treat Source Rows As property, it tells Informatica that each row has to be tagged as Insert, Update or Delete. This property, coupled with the target-level properties allowing Insert, Update, and Delete, works wonders even in the absence of an Update Strategy. It also leads to a clear-cut mapping design. I am not opposing the use of Update Strategy, but in some situations this approach keeps the mapping more transparent, since I don't have to peek into the Strategy to see what action it performs, e.g.
IIF(ISNULL(PK)=1, DD_INSERT, DD_UPDATE).
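For reference, the constants map to the numeric values shown earlier, and the quoted IIF pattern can be read as follows (Python sketch, with None standing in for NULL):

```python
# The row-flagging constants and the IIF(ISNULL(PK)=1, DD_INSERT, DD_UPDATE)
# pattern, restated in Python.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def flag(pk):
    # no surrogate key found by the lookup -> insert, otherwise update
    return DD_INSERT if pk is None else DD_UPDATE

print(flag(None))  # 0 -> insert
print(flag(42))    # 1 -> update
```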
Let's buckle up and go for a ride to understand the use of these properties.
Assume a scenario where I have the following table structure in Stage.
Keeping things simple, the target table would be something like this.
As you can see, the target has UserID as a surrogate key, which I will populate through a sequence. Also note that Username is unique.
Now I have a scenario where I have to update the existing records and insert the new ones supplied in the staging table.
Before we begin writing code, let's first understand TSA (Treat Source Rows As) and the target properties in more detail. Treat Source Rows As accepts 4 settings:
1. Insert :- Informatica will mark all rows read from source as Insert; the rows will only be inserted.
2. Update :- Informatica will mark all rows read from source as Update; when the rows arrive at the target, they are to be updated in it.
3. Delete :- The rows will be marked as to be deleted from the target once they have been read from source.
4. Data Driven :- This tells Informatica that we are using an Update Strategy to indicate what has to be done with the rows, so no marking is done when rows are read from source. What happens to each row arriving at the target is decided immediately before any insert/update/delete operation on the target.
However, setting TSA alone will not let you modify rows in the target. Each target must itself allow insert, update, and delete (IUD) operations. So once you have set the TSA property, you also have to set the target-level properties controlling whether rows can be inserted, updated or deleted. This can be done as follows.
Insert and Delete are self-explanatory; Update, however, is categorised into 3 options. Note that setting any of them will allow updates on your table:
1. Update as Update :- A simple property which says that if the row arrives at the target, it has to be updated there. If you check the logs, Informatica will generate an update template something like UPDATE INFA_TARGET_RECORDS SET EMAIL = ? WHERE USERNAME = ?
2. Update as Insert :- This means that when a row flagged for update arrives at the target, the update behaviour is to insert it instead. In this case Informatica will not generate any update template for the target; the incoming row will be inserted using the template INSERT INTO INFA_TARGET_RECORDS (USERID, USERNAME, EMAIL) VALUES (?, ?, ?)
3. Update else Insert :- Means that the incoming row flagged as update should be either updated or inserted. In a nutshell: if the key column of the incoming row is also present in the target, Informatica will update that row; if the incoming key column is not present in the target, the row will be inserted.
PS :- The last two options also require you to set the Insert property of the target, because if it is not checked then Update as Insert and Update else Insert will not work, and the session will fail stating that the target does not allow inserts. Why? Simply because these update clauses have an insert hidden inside them.
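The three update modes can be contrasted with a small SQLite sketch (the table and column names follow the example above, and the SQL mirrors the quoted templates; this is an illustration, not what the Integration Service literally executes):

```python
# Contrast of the target-level update modes for a row flagged as update
# (username is the key, email the changing column).
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE infa_target_records (userid INTEGER, username TEXT, email TEXT)")
con.execute("INSERT INTO infa_target_records VALUES (1, 'alice', 'old@example.com')")

def apply_update(mode, userid, username, email):
    if mode == 'update_as_update':      # UPDATE ... WHERE USERNAME = ?
        con.execute("UPDATE infa_target_records SET email=? WHERE username=?",
                    (email, username))
    elif mode == 'update_as_insert':    # INSERT instead of UPDATE
        con.execute("INSERT INTO infa_target_records VALUES (?,?,?)",
                    (userid, username, email))
    elif mode == 'update_else_insert':  # UPDATE, fall back to INSERT on no match
        cur = con.execute("UPDATE infa_target_records SET email=? WHERE username=?",
                          (email, username))
        if cur.rowcount == 0:
            con.execute("INSERT INTO infa_target_records VALUES (?,?,?)",
                        (userid, username, email))

apply_update('update_as_update', 1, 'alice', 'new@example.com')
apply_update('update_else_insert', 2, 'bob', 'bob@example.com')  # absent -> inserted
print(con.execute("SELECT username, email FROM infa_target_records "
                  "ORDER BY userid").fetchall())
# [('alice', 'new@example.com'), ('bob', 'bob@example.com')]
```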
Ok, enough theory? Fine, let's get our hands dirty. Coming back to our scenario: we have rows read from the source and want them either inserted into or updated in the target, depending on whether they are already present there. My mapping looks something like this:
Here I have used a lookup to fetch the user ID for a username coming from stage. In the router, the following has been set:
What actually happened is that I treated all rows from source as flagged for update. Secondly, I modified the behaviour of the update and set it to Update as Insert. Due to this property, the "update" actually allowed me to insert rows into the target. When the session runs, it will update the existing rows in the target and insert the new rows (actually, update-as-insert).
Try it out and let me know if it works for you. I am not attaching a run demo, because it's better if you do it yourself and see even more clearly what is happening behind the scenes.
insert rows from the source that don't exist in the target
delete rows from the target that no longer exist in the source
1. Insert: ISNULL(Target_PK)
2. Delete: ISNULL(Source_PK)
3. Default: used for Update
6. Insert an Update Strategy transformation coming from the Delete group, using DD_DELETE
1. Connect this transformation to the Target
7. Insert a Filter Transformation coming from the Update Group
1. (
DECODE(Source_Field1, Target_Field1, 1, 0) = 0
OR
DECODE(Source_Field2, Target_Field2, 1, 0) = 0
)
2. Modify as needed to compare all non-key fields
8. Insert an Update Strategy transformation coming from the Filter transformation, using DD_UPDATE
1. Connect this transformation to the Target
9. Connect the Insert group in the Router transformation to the Target
Please leave a comment if you have questions on this Informatica mapping process.
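The full-synchronisation mapping above computes three row sets; as a compact sketch in Python (dicts keyed by primary key stand in for the source and target tables):

```python
# Full sync: insert missing rows, delete stale rows, update changed rows.
def sync(source, target):
    inserts = [k for k in source if k not in target]                  # ISNULL(Target_PK)
    deletes = [k for k in target if k not in source]                  # ISNULL(Source_PK)
    updates = [k for k in source
               if k in target and source[k] != target[k]]             # non-key fields differ
    for k in inserts: target[k] = source[k]   # DD_INSERT
    for k in deletes: del target[k]           # DD_DELETE
    for k in updates: target[k] = source[k]   # DD_UPDATE
    return inserts, deletes, updates

src = {1: 'a', 2: 'b2', 4: 'd'}
tgt = {1: 'a', 2: 'b', 3: 'c'}
print(sync(src, tgt))   # ([4], [3], [2])
print(tgt)              # {1: 'a', 2: 'b2', 4: 'd'}
```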