Slowly Changing Dimension
Dimensional modeling (DM) is a set of techniques and concepts used in data
warehouse design. Unlike entity-relationship (ER) modeling, dimensional modeling does
not necessarily involve a relational database.
DM is a technique designed to support end-user queries in a data warehouse. It focuses on
understandability and performance.
Dimensional modeling always uses the concepts of facts (measures) and dimensions (context).
Fact:
Dimension:
Surrogate Key:
A surrogate key is a unique identifier for the entity in the modeled world
It is not derived from application data
It is not meant to be shown outside the DWH
Its only significance is to act as the primary key
Having the key independent of all other columns insulates the database relationships from
changes in the data values or database design (making the database more agile) and guarantees
uniqueness.
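As a rough illustration of that insulation, the sketch below keeps fact rows pointing at a surrogate key while the natural key changes underneath. The fact table, the amount, and the new supplier code are hypothetical values made up for this example.

```python
# Dimension rows keyed by surrogate key (123 is the key used in the supplier example).
supplier_dim = {
    123: {"Supplier_Code": "ABC", "Supplier_Name": "Acme Supply Co"},
}

# Hypothetical fact rows: they store only the surrogate key, never the natural key.
fact_orders = [{"Supplier_Key": 123, "Amount": 250}]


def supplier_for(fact_row):
    """Resolve a fact row to its dimension row through the surrogate key."""
    return supplier_dim[fact_row["Supplier_Key"]]


# The source system re-codes the supplier (hypothetical new code).
# Only the dimension row changes; no fact row has to be touched.
supplier_dim[123]["Supplier_Code"] = "ACME-01"

print(supplier_for(fact_orders[0]))
```

Because the join column never carries business meaning, a change to Supplier_Code does not ripple into the fact table.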
Type 1:
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data
at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a
name. (Assuming you won't ever need to know how it used to be misspelled in the past.)
Here is an example of a database table that keeps supplier information:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the
surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code).
However, the joins will perform better on an integer than on a character string.
Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply
overwrite this record:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in
the data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example.
But an advantage to Type 1 SCDs is that they are very easy to maintain.
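The overwrite can be sketched in a few lines of Python. This is only an illustration of the idea, using an in-memory dictionary in place of the warehouse table; the function name apply_type1 is an assumption made for the example.

```python
# Dimension held in memory, indexed by the natural key (Supplier_Code).
supplier_dim = {
    "ABC": {
        "Supplier_Key": 123,
        "Supplier_Name": "Acme Supply Co",
        "Supplier_State": "CA",
    },
}


def apply_type1(dim, supplier_code, **changes):
    """Type 1: overwrite the attributes in place; the old values are lost."""
    dim[supplier_code].update(changes)
    return dim[supplier_code]


row = apply_type1(supplier_dim, "ABC", Supplier_State="IL")
print(row)
```

After the call there is still exactly one row for the supplier, and nothing records that the state was ever CA.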
Important points:
We will see the implementation of SCD Type 1 using the customer dimension table as an example. The
source table is shown below.
CREATE TABLE Customers (
  Customer_Id   Number,
  Customer_Name Varchar2(30),
  Location      Varchar2(30)
)
Now I have to load the data of the source into the customer dimension table using SCD Type 1. The
Dimension table structure is shown below.
CREATE TABLE Customers_Dim (
  Cust_Key      Number,
  Customer_Id   Number,
  Customer_Name Varchar2(30),
  Location      Varchar2(30)
)
Go to the condition tab of lkp transformation and enter the lookup condition as Customer_Id =
IN_Customer_Id. Then click on OK.
Connect the customer_id port of source qualifier transformation to the IN_Customer_Id port of
lkp transformation.
Create the expression transformation with input ports Cust_Key, Name, Location, Src_Name, and
Src_Location, and output ports New_Flag and Changed_Flag.
For the output ports of the expression transformation, enter the expressions below and click on OK.
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (Name != Src_Name
OR Location != Src_Location),
1, 0 )
Now connect the ports of the lkp transformation (Cust_Key, Name, Location) to the expression
transformation ports (Cust_Key, Name, Location), and the ports of the source qualifier
transformation (Name, Location) to the expression transformation ports (Src_Name,
Src_Location), respectively.
The mapping diagram so far created is shown in the below image.
Create a filter transformation and drag the ports of source qualifier transformation into it. Also
drag the New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as
New_Flag=1. Then click on ok.
Now create an update strategy transformation and connect all the ports of the filter
transformation (except the New_Flag port) to the update strategy. Go to the properties tab of
update strategy and enter the update strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the appropriate ports from update
strategy to the target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target
surrogate key (cust_key) port.
The part of the mapping diagram for inserting a new row is shown below:
Now create another filter transformation and drag the ports from lkp transformation
(Cust_Key), source qualifier transformation (Name, Location), expression transformation
(changed_flag) ports into the filter transformation.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as
Changed_Flag=1. Then click on ok.
Now create an update strategy transformation and connect the ports of the filter
transformation (Cust_Key, Name, and Location) to the update strategy. Go to the properties tab
of the update strategy and enter the update strategy expression as DD_UPDATE.
Now drag the target definition into the mapping and connect the appropriate ports from update
strategy to the target definition.
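The routing logic of this mapping can be summarised outside Informatica. The following is a minimal Python sketch, not the mapping itself: a dictionary lookup stands in for the lkp transformation, an iterator stands in for the sequence generator, and NULL comparison semantics are simplified. The function names and the sample customers are assumptions made for the example.

```python
from itertools import count


def scd1_flags(lkp_row, src_row):
    """Mirror New_Flag and Changed_Flag; lkp_row is None when the lookup finds no match."""
    new_flag = 1 if lkp_row is None else 0
    changed_flag = 1 if (
        lkp_row is not None
        and (lkp_row["Customer_Name"] != src_row["Customer_Name"]
             or lkp_row["Location"] != src_row["Location"])
    ) else 0
    return new_flag, changed_flag


def load_scd1(dim, source_rows, next_key):
    """dim maps Customer_Id to its dimension row; next_key plays the sequence generator."""
    for src in source_rows:
        lkp = dim.get(src["Customer_Id"])
        new_flag, changed_flag = scd1_flags(lkp, src)
        if new_flag:        # filter New_Flag=1, then DD_INSERT
            dim[src["Customer_Id"]] = {"Cust_Key": next(next_key), **src}
        elif changed_flag:  # filter Changed_Flag=1, then DD_UPDATE
            lkp.update(Customer_Name=src["Customer_Name"], Location=src["Location"])
    return dim


# Hypothetical sample data.
dim = {1: {"Cust_Key": 10, "Customer_Id": 1, "Customer_Name": "Ann", "Location": "Pune"}}
source = [
    {"Customer_Id": 1, "Customer_Name": "Ann", "Location": "Delhi"},  # changed
    {"Customer_Id": 2, "Customer_Name": "Bob", "Location": "Goa"},    # new
]
load_scd1(dim, source, count(11))
```

The existing customer keeps its surrogate key and has its location overwritten, while the new customer receives the next key from the sequence.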
Type 2
This method tracks historical data by creating multiple records for a given natural key in the dimensional
tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for
each insert.
Let us drive the point home with a simple scenario. In the current month (01-01-2010) we are given
a source table with three columns (Empno, Ename, Sal) and three rows. In the following month
(01-02-2010) one new employee is added and one existing record changes. We are going to use the
SCD Type 2 approach to extract and load the records into the target table.
Source Table: (01-01-11)
Emp no  Ename  Sal
101     A      1000
102     B      2000
103     C      3000
Target Table:
Skey  Emp no  Ename  Sal   S-date    E-date  Ver  Flag
100   101     A      1000  01-01-10  Null
200   102     B      2000  01-01-10  Null
300   103     C      3000  01-01-10  Null
In the second month, one more employee, with Ename D, is added to the table, and the salary of
employee B changes from 2000 to 2500.
Source Table: (01-02-11)
Emp no  Ename  Sal
101     A      1000
102     B      2500
103     C      3000
104     D      4000
Target Table:
Skey  Emp no  Ename  Sal   S-date    E-date    Ver  Flag
100   101     A      1000  01-02-10  Null
200   102     B      2000  01-02-10  01-01-10
300   103     C      3000  01-02-10  Null
201   102     B      2500  01-02-10  Null
400   104     D      4000  01-02-10  Null
The thing to notice here is that if the salary of any employee is updated, a new row is
inserted with the current date as the start date and Null as the end date. The previous row is
closed by setting its end date to the current date minus one day.
If we use the Version option, the new row is inserted with an incremented version: if the row is
for a key that already exists in the table, the version is incremented; if the key does not yet
exist in the table, the row is added with version=1.
If we use the Flag option, the new row is added with Flag=1 and the previous row is updated
and closed with Flag=0.
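The rules above can be put together in a short Python sketch. It is an illustration under simplifying assumptions rather than a reference implementation: the dimension is a list of dictionaries, surrogate keys come from a plain counter instead of the key values shown in the tables, and only Sal is compared to detect a change. The function name apply_scd2 is hypothetical.

```python
from datetime import date, timedelta
from itertools import count


def apply_scd2(dim, source_rows, load_date, keys):
    """Close the current row and insert a new version when Sal changes."""
    for src in source_rows:
        # The open row for this employee is the one whose end date is still None.
        current = next(
            (r for r in dim if r["Empno"] == src["Empno"] and r["E_date"] is None),
            None,
        )
        if current is not None and current["Sal"] == src["Sal"]:
            continue  # unchanged: nothing to do
        version = 1
        if current is not None:
            current["E_date"] = load_date - timedelta(days=1)  # current date - 1
            current["Flag"] = 0
            version = current["Ver"] + 1
        dim.append({
            "Skey": next(keys), "Empno": src["Empno"], "Ename": src["Ename"],
            "Sal": src["Sal"], "S_date": load_date, "E_date": None,
            "Ver": version, "Flag": 1,
        })
    return dim


keys = count(1)
dim = []
# First month: initial load.
apply_scd2(dim, [{"Empno": 101, "Ename": "A", "Sal": 1000},
                 {"Empno": 102, "Ename": "B", "Sal": 2000}], date(2010, 1, 1), keys)
# Second month: B's salary changes and D is added.
apply_scd2(dim, [{"Empno": 101, "Ename": "A", "Sal": 1000},
                 {"Empno": 102, "Ename": "B", "Sal": 2500},
                 {"Empno": 104, "Ename": "D", "Sal": 4000}], date(2010, 2, 1), keys)
```

In practice a design usually relies on just one of the three markers (date range, version, or flag); the sketch maintains all of them only to show how each one behaves.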
Important points:
We will see how to implement the SCD Type 2 version method in Informatica. As an example, consider
the customer dimension. The source and target table structures are shown below:
--Source Table
Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);
--Target Dimension Table
Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Version     Number
);
The basic steps involved in creating an SCD Type 2 version mapping are:
Identifying the new records and inserting them into the dimension table with version number one.
Identifying the changed records and inserting them into the dimension table with an incremented
version number.
Let's divide the steps to implement the SCD Type 2 version mapping into three parts.
Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
IN_Customer_Id
Go to the properties tab of the LKP transformation and enter the below query in Lookup SQL
Override. Alternatively you can generate the SQL query by connecting the database in the
Lookup SQL Override expression editor and then add the order by clause.
SELECT
Customers_Dim.Cust_Key as Cust_Key,
Customers_Dim.Location as Location,
Customers_Dim.Version as Version,
Customers_Dim.Customer_Id as Customer_Id
FROM
Customers_Dim
ORDER BY
Customers_Dim.Customer_Id, Customers_Dim.Version--
You have to use an order by clause in the above query. If you sort the version column in
ascending order, then you have to specify "Use Last Value" in the "Lookup policy on multiple
match" property. If you have sorted the version column in descending order then you have to
specify the "Lookup policy on multiple match" option as "Use First Value"
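The interaction between the sort order and the multiple-match policy can be checked with a small Python sketch. It only imitates the behaviour described above on an in-memory list; the two function names and the sample rows are assumptions made for the example.

```python
def lookup_last_value(rows, customer_id):
    """Version sorted ascending, policy 'Use Last Value'."""
    ordered = sorted(rows, key=lambda r: (r["Customer_Id"], r["Version"]))
    matches = [r for r in ordered if r["Customer_Id"] == customer_id]
    return matches[-1] if matches else None


def lookup_first_value(rows, customer_id):
    """Version sorted descending, policy 'Use First Value'."""
    ordered = sorted(rows, key=lambda r: (r["Customer_Id"], -r["Version"]))
    matches = [r for r in ordered if r["Customer_Id"] == customer_id]
    return matches[0] if matches else None


# Hypothetical contents of Customers_Dim.
rows = [
    {"Cust_Key": 1, "Customer_Id": 10, "Location": "Pune", "Version": 1},
    {"Cust_Key": 3, "Customer_Id": 10, "Location": "Delhi", "Version": 2},
    {"Cust_Key": 2, "Customer_Id": 20, "Location": "Goa", "Version": 1},
]

print(lookup_last_value(rows, 10))
print(lookup_first_value(rows, 10))
```

Either pairing returns the row with the highest version for the customer, which is the row the later comparison needs.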
Click on Ok in the lookup transformation. Connect the customer_id port of source qualifier
transformation to the In_Customer_Id port of the LKP transformation.
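Create an expression transformation with input/output ports Cust_Key, LKP_Location, and
Src_Location, and output ports New_Flag and Changed_Flag. Enter the expressions below for the
output ports.
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (LKP_Location != Src_Location),
1, 0 )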
Now create a filter transformation to identify new records and insert them into the dimension table.
Drag the ports of the expression transformation (New_Flag) and the source qualifier transformation
(Customer_Id, Location) into the filter transformation.
Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
Now create an update strategy transformation and connect the ports of the filter transformation
(Customer_Id, Location) to it. Go to the properties tab and enter the update strategy expression as
DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update
strategy transformation to the target definition.
Create a sequence generator and an expression transformation. Name this expression
transformation "Expr_Ver".
Drag and connect the NextVal port of sequence generator to the Expression transformation. In
the expression transformation create a new output port (Version) and assign value 1 to it.
Now connect the ports of expression transformation (Nextval, Version) to the Target definition
ports (Cust_Key, Version). The part of the mapping flow is shown in the below image.
Create a filter transformation. This is used to find the changed records. Now drag the ports from
the expression transformation (changed_flag), the source qualifier transformation (customer_id,
location), and the LKP transformation (version) into the filter transformation.
Go to the filter transformation properties and enter the filter condition as changed_flag =1.
Create an expression transformation and drag the ports of filter transformation except the
changed_flag port into the expression transformation.
Go to the ports tab of expression transformation and create a new output port (O_Version) and
assign the expression as (version+1).
Now create an update strategy transformation and drag the ports of expression transformation
(customer_id, location,o_version) into the update strategy transformation. Go to the properties
tab and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of update
strategy transformation to the target definition.
Now connect the NextVal port of the expression transformation (Expr_Ver, created in part 2) to the
cust_key port of the target definition. The complete mapping diagram is shown in the below
image:
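Taken together, the three parts amount to the logic sketched below in Python. This is a simplified stand-in for the mapping, assuming the dimension fits in a list and that the lookup returns the highest version per customer; the function name load_scd2_version and the sample locations are hypothetical.

```python
from itertools import count


def load_scd2_version(dim, source_rows, next_key):
    """Insert version 1 for new customers and version+1 for changed ones."""
    for src in source_rows:
        versions = [r for r in dim if r["Customer_Id"] == src["Customer_Id"]]
        # Equivalent of the lookup returning the latest version (None if no match).
        latest = max(versions, key=lambda r: r["Version"], default=None)
        if latest is None:                              # New_Flag = 1
            version = 1
        elif latest["Location"] != src["Location"]:     # Changed_Flag = 1
            version = latest["Version"] + 1
        else:
            continue                                    # unchanged: filtered out
        # Both branches are DD_INSERT; existing rows are never updated.
        dim.append({
            "Cust_Key": next(next_key),
            "Customer_Id": src["Customer_Id"],
            "Location": src["Location"],
            "Version": version,
        })
    return dim


keys = count(1)
dim = []
load_scd2_version(dim, [{"Customer_Id": 10, "Location": "Pune"}], keys)
load_scd2_version(dim, [{"Customer_Id": 10, "Location": "Delhi"},
                        {"Customer_Id": 20, "Location": "Goa"}], keys)
```

Sharing one key sequence between the two insert branches matters here, since both write to the same Cust_Key column.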
Type 3
This method preserves only limited history, and we are going to use a surrogate key as the primary
key here.
Source table: (01-01-2011)
Empno  Ename  Sal
101           1000
102           2000
103           3000
Target table:
Empno  Ename  C-sal  P-sal
101           1000
102           2000
103           3000
Updated source table:
Empno  Ename  Sal
101           1000
102           4566
103           3000
Updated target table:
Empno  Ename  C-sal  P-sal
101           1000
102           4566   2000
103           3000
Important points:
We will see the implementation of SCD Type 3 using the customer dimension table as an example.
The source table is shown below.
CREATE TABLE Customers (
  Customer_Id Number,
  Location    Varchar2(30)
)
Now I have to load the data of the source into the customer dimension table using SCD Type 3. The
Dimension table structure is shown below.
CREATE TABLE Customers_Dim (
  Cust_Key          Number,
  Customer_Id       Number,
  Current_Location  Varchar2(30),
  Previous_Location Varchar2(30)
)
Select the lookup Transformation, enter a name and click on create. You will get a window as
shown in the below image.
Go to the condition tab of LKP transformation and enter the lookup condition as Customer_Id =
IN_Customer_Id. Then click on OK.
Connect the customer_id port of source qualifier transformation to the IN_Customer_Id port of
LKP transformation.
Create the expression transformation with input ports as Cust_Key, Prev_Location,
Curr_Location and output ports as New_Flag, Changed_Flag
For the output ports of expression transformation enter the below expressions and click on ok
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (Prev_Location != Curr_Location),
1, 0 )
Now connect the ports of the LKP transformation (Cust_Key, Current_Location) to the expression
transformation ports (Cust_Key, Prev_Location), and the port of the source qualifier transformation
(Location) to the expression transformation port (Curr_Location).
The mapping diagram so far created is shown in the below image.
Create a filter transformation and drag the ports of source qualifier transformation into it. Also
drag the New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as
New_Flag=1. Then click on ok.
Now create an update strategy transformation and connect all the ports of the filter
transformation (except the New_Flag port) to the update strategy. Go to the properties tab of
update strategy and enter the update strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the appropriate ports from update
strategy to the target definition. Connect Location port of update strategy to the
Current_Location port of the target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target
surrogate key (cust_key) port.
The part of the mapping diagram for inserting a new row is shown below:
Now create another filter transformation. Go to the ports tab and create the ports Cust_Key,
Curr_Location, Prev_Location, and Changed_Flag. Connect the LKP transformation ports (Cust_Key,
Current_Location) to the filter transformation ports (Cust_Key, Prev_Location), the source qualifier
transformation port (Location) to the filter transformation port (Curr_Location), and the expression
transformation port (Changed_Flag) to the Changed_Flag port of the filter transformation.
Edit the filter transformation, go to the properties tab and enter the Filter Condition as
Changed_Flag=1. Then click on ok.
Now create an update strategy transformation and connect the ports of the filter
transformation (Cust_Key, Curr_Location, Prev_Location) to the update strategy. Go to the
properties tab of the update strategy and enter the update strategy expression as DD_UPDATE.
Now drag the target definition into the mapping and connect the appropriate ports from update
strategy to the target definition.
The complete mapping diagram is shown in the below image.
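For comparison with the mapping, the Type 3 behaviour can be sketched in Python as follows. This is a minimal illustration assuming an in-memory dictionary keyed by Customer_Id and a counter in place of the sequence generator; the function name load_scd3 and the sample locations are hypothetical.

```python
from itertools import count


def load_scd3(dim, source_rows, next_key):
    """Keep only the current and the immediately previous location per customer."""
    for src in source_rows:
        row = dim.get(src["Customer_Id"])
        if row is None:                                   # New_Flag = 1, DD_INSERT
            dim[src["Customer_Id"]] = {
                "Cust_Key": next(next_key),
                "Customer_Id": src["Customer_Id"],
                "Current_Location": src["Location"],
                "Previous_Location": None,
            }
        elif row["Current_Location"] != src["Location"]:  # Changed_Flag = 1, DD_UPDATE
            row["Previous_Location"] = row["Current_Location"]
            row["Current_Location"] = src["Location"]
    return dim


keys = count(1)
dim = {}
load_scd3(dim, [{"Customer_Id": 10, "Location": "Pune"}], keys)
load_scd3(dim, [{"Customer_Id": 10, "Location": "Delhi"}], keys)
load_scd3(dim, [{"Customer_Id": 10, "Location": "Goa"}], keys)
```

After the third load the row holds Goa and Delhi; Pune is gone, which is the limited history that Type 3 trades for a single row per customer.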