Slowly Changing Dimension

Dimensional Modeling

Dimensional modeling (DM) names a set of techniques and concepts used in data
warehouse design. Unlike entity-relationship (ER) modeling, dimensional modeling does
not necessarily involve a relational database.
DM is a technique designed to support end-user queries in a data warehouse. It focuses on
understandability and performance.
Dimensional modeling always uses the concepts of facts (measures) and dimensions (context).

Fact:

Facts are the measurable, quantitative data about a business.

Fact tables record metrics for a particular event; they generally contain numeric values and foreign
keys to dimension tables.
Fact tables are designed to a low, uniform level of detail (referred to as "granularity" or "grain"),
so they can record events in great detail. As a result, a large number of records accumulate in a
fact table over time.
Fact tables are often defined by their grain. The grain of a fact table represents the most atomic
level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales
volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined
by a day, product and store.
A fact table may also carry a surrogate key to uniquely identify each row.
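The grain can be enforced directly in the table definition. The following is a minimal sketch using Python's built-in sqlite3 module rather than a production warehouse. The table and column names (sales_fact, sale_date, product, store, sales_volume) and the sample rows are hypothetical, and plain text stands in for the dimension foreign keys a real star schema would use. The composite primary key expresses the grain "Sales volume by Day by Product by Store", so a second row for the same day, product, and store is rejected.

```python
import sqlite3

# Hypothetical SALES fact table at the grain "Sales volume by Day by Product by Store".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        sale_date    TEXT    NOT NULL,
        product      TEXT    NOT NULL,
        store        TEXT    NOT NULL,
        sales_volume INTEGER NOT NULL,
        PRIMARY KEY (sale_date, product, store)  -- one row per day, product and store
    )
""")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("2010-01-01", "Widget", "Store 1", 10),
        ("2010-01-01", "Widget", "Store 2", 7),
        ("2010-01-02", "Widget", "Store 1", 4),
    ],
)

# A second row at the same grain violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES ('2010-01-01', 'Widget', 'Store 1', 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```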

Dimension:

Dimensions hold the descriptive attributes related to fact data.

Dimensions provide structured labeling information to otherwise unordered numeric measures.
Dimension tables usually have far fewer records than fact tables; however, each
record may have many attributes that describe the fact data.
Dimensions can describe a wide variety of characteristics; some of the most common
dimension tables are:
o Time dimension tables describe time at the lowest level of time granularity for which
events are recorded in the star schema
o Geography dimension tables describe location data, such as country, state, or city
o Product dimension tables describe products
o Employee dimension tables describe employees, such as salespeople
o Range dimension tables describe ranges of time, dollar values, or other measurable
quantities to simplify reporting
A dimension table has a primary key column that uniquely identifies each dimension record
(row). The dimension table is associated with a fact table using this key. Data in the fact table
can be filtered and grouped (sliced and diced) by various combinations of attributes.
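To illustrate filtering and grouping through dimension attributes, here is a small star schema sketched in Python with sqlite3. All table names, keys, and values are hypothetical. The fact table holds only key references and a measure; the first query groups the measure by a combination of two dimension attributes, and the second filters on one attribute and groups by another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE sales_fact  (
        product_key  INTEGER REFERENCES product_dim (product_key),
        store_key    INTEGER REFERENCES store_dim (store_key),
        sales_volume INTEGER
    );
    INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                                   (3, 'Manual', 'Books');
    INSERT INTO store_dim VALUES (10, 'Chicago', 'IL'), (20, 'San Diego', 'CA');
    INSERT INTO sales_fact VALUES (1, 10, 5), (2, 10, 3), (3, 20, 4), (1, 20, 2);
""")

# Group the measure by a combination of dimension attributes.
by_state_and_category = conn.execute("""
    SELECT s.state, p.category, SUM(f.sales_volume)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN store_dim   s ON s.store_key   = f.store_key
    GROUP BY s.state, p.category
    ORDER BY s.state, p.category
""").fetchall()

# Filter on one attribute, then group by another.
hardware_by_state = conn.execute("""
    SELECT s.state, SUM(f.sales_volume)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN store_dim   s ON s.store_key   = f.store_key
    WHERE p.category = 'Hardware'
    GROUP BY s.state
    ORDER BY s.state
""").fetchall()

print(by_state_and_category)  # [('CA', 'Books', 4), ('CA', 'Hardware', 2), ('IL', 'Hardware', 8)]
print(hardware_by_state)      # [('CA', 2), ('IL', 8)]
```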

Surrogate Key:

A surrogate key is a unique identifier for the entity in the modeled world
It is not derived from application data
It is not meant to be shown outside the DWH
Its only purpose is to act as the primary key
Having the key independent of all other columns insulates the database relationships from
changes in the data values or database design (making the database more agile) and guarantees
uniqueness.
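A minimal Python sketch of the idea. SurrogateKeyAssigner is a hypothetical helper, not part of any tool mentioned in this document: it hands out meaningless sequential integers, one per natural key, and the fact rows store only the integer. If the source system later changes a natural key value, only the mapping changes; the fact rows that reference the surrogate key are untouched.

```python
from itertools import count


class SurrogateKeyAssigner:
    """Hypothetical helper: issues meaningless sequential integer keys, one per natural key."""

    def __init__(self, start=1):
        self._next = count(start)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or issue the next integer in the sequence.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next)
        return self._by_natural_key[natural_key]

    def rename(self, old_natural_key, new_natural_key):
        # The source system changed a natural key value: re-point it to the same surrogate key.
        self._by_natural_key[new_natural_key] = self._by_natural_key.pop(old_natural_key)


keys = SurrogateKeyAssigner()
# Fact rows store only (surrogate key, measure); they never see the natural key.
fact_rows = [(keys.key_for("ABC"), 100), (keys.key_for("XYZ"), 250), (keys.key_for("ABC"), 75)]
keys.rename("ABC", "ABC-01")

print(fact_rows)               # [(1, 100), (2, 250), (1, 75)]
print(keys.key_for("ABC-01"))  # 1
```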

Slowly Changing Dimensions


Dimensions in data management and data warehousing contain relatively static data about such entities
as geographical locations, customers, or products. Data captured by Slowly Changing Dimensions
(SCDs) changes slowly but unpredictably, rather than according to a regular schedule. In a data
warehouse, changes to dimension attributes often need to be tracked in order to report historical data.
In other words, implementing one of the SCD types should let users assign the correct dimension
attribute value for a given date.
Some scenarios can cause referential integrity problems.
For example, a database may contain a fact table that stores sales records. This fact table would be
linked to dimensions by means of foreign keys. One of these dimensions may contain data about the
company's salespeople: e.g., the regional offices in which they work. However, the salespeople are
sometimes transferred from one regional office to another. For historical sales reporting purposes it
may be necessary to keep a record of the fact that a particular salesperson had been assigned to a
particular regional office at an earlier date, whereas that salesperson is presently assigned to a different
regional office.
Dealing with these issues involves SCD management methodologies such as the following:

Type 1:
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data
at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a
name. (Assuming you won't ever need to know how it used to be misspelled in the past.)
Here is an example of a database table that keeps supplier information:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  CA
In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the
surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code).
However, joins typically perform better on an integer than on a character string.

Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply
overwrite this record:
Supplier_Key  Supplier_Code  Supplier_Name   Supplier_State
123           ABC            Acme Supply Co  IL
The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in
the data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example.
But an advantage to Type 1 SCDs is that they are very easy to maintain.
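The Type 1 overwrite in the supplier example amounts to a single UPDATE located by the natural key. A minimal sketch with Python's sqlite3 module; the table name supplier_dim is an assumption, and the columns and values follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplier_dim (
        Supplier_Key   INTEGER PRIMARY KEY,  -- surrogate key
        Supplier_Code  TEXT UNIQUE,          -- natural key
        Supplier_Name  TEXT,
        Supplier_State TEXT
    )
""")
conn.execute("INSERT INTO supplier_dim VALUES (123, 'ABC', 'Acme Supply Co', 'CA')")

# Type 1: overwrite in place, located by the natural key. The old value 'CA' is lost.
conn.execute(
    "UPDATE supplier_dim SET Supplier_State = ? WHERE Supplier_Code = ?",
    ("IL", "ABC"),
)

print(conn.execute("SELECT * FROM supplier_dim").fetchall())
# [(123, 'ABC', 'Acme Supply Co', 'IL')]
```

Note that the surrogate key (123) does not change, so any fact rows that reference it remain valid.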

Important points:

Usually the changes relate to correction of errors in the source system
Sometimes the change in the source system has no significance
The old value in the source system needs to be discarded
The change in the source system need not be preserved in the DWH
Overwrite the attribute value in the dimension table row with the new value
No other changes are made in the dimension table row
The key of the dimension table or any other key values are not affected
Easiest to implement

Explanation with an Example:


The process for implementing SCD Type 1 in Informatica is:

Identifying the new record and inserting it into the dimension table.
Identifying the changed record and updating the dimension table.

We will see the implementation of SCD Type 1 using the customer dimension table as an example. The
source table looks like this:
CREATE TABLE Customers (
    Customer_Id   Number,
    Customer_Name Varchar2(30),
    Location      Varchar2(30)
);
Now we load the source data into the customer dimension table using SCD Type 1. The
dimension table structure is shown below.
CREATE TABLE Customers_Dim (
    Cust_Key      Number,
    Customer_Id   Number,
    Customer_Name Varchar2(30),
    Location      Varchar2(30)
);

Steps to Create SCD Type 1 Mapping

Follow the steps below to create an SCD Type 1 mapping in Informatica:

Create the source and dimension tables in the database.
Open the Designer tool, go to the Source Analyzer, and either create or import the source
definition.
Go to the Warehouse Designer or Target Designer and import the target definition.
Go to the Mapping Designer tab and create a new mapping.
Drag the source into the mapping.
From the toolbar, choose Transformation and then Create.
Select the Lookup transformation, enter a name, and click Create. A window opens, as
shown in the image below.

Select the customer dimension table and click OK.

Edit the LKP transformation, go to the Ports tab, and add a new port IN_Customer_Id. This
new port needs to be connected to the Customer_Id port of the source qualifier transformation.

Go to the Condition tab of the LKP transformation and enter the lookup condition as Customer_Id =
IN_Customer_Id. Then click OK.

Connect the Customer_Id port of the source qualifier transformation to the IN_Customer_Id port of
the LKP transformation.
Create an expression transformation with input ports Cust_Key, Name, Location, Src_Name, and
Src_Location, and output ports New_Flag and Changed_Flag.
For the output ports of the expression transformation, enter the expressions below and click OK:

New_Flag     = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
                   AND (Name != Src_Name
                        OR Location != Src_Location),
                   1, 0)

Now connect the ports of the LKP transformation (Cust_Key, Name, Location) to the expression
transformation ports (Cust_Key, Name, Location), and the ports of the source qualifier
transformation (Name, Location) to the expression transformation ports (Src_Name,
Src_Location), respectively.
The mapping diagram created so far is shown in the image below.

Create a filter transformation and drag the ports of the source qualifier transformation into it. Also
drag the New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the Properties tab, and enter the Filter Condition as
New_Flag=1. Then click OK.
Now create an update strategy transformation and connect all the ports of the filter
transformation (except the New_Flag port) to the update strategy. Go to the Properties tab of the
update strategy and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports from the update
strategy to the target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target
surrogate key (Cust_Key) port.
The part of the mapping diagram for inserting a new row is shown below.

Now create another filter transformation and drag the ports from the LKP transformation
(Cust_Key), source qualifier transformation (Name, Location), and expression transformation
(Changed_Flag) into the filter transformation.
Edit the filter transformation, go to the Properties tab, and enter the Filter Condition as
Changed_Flag=1. Then click OK.
Now create an update strategy transformation and connect the ports of the filter
transformation (Cust_Key, Name, and Location) to the update strategy. Go to the Properties tab
of the update strategy and enter the update strategy expression as DD_UPDATE.
Now drag the target definition into the mapping and connect the appropriate ports from the update
strategy to the target definition.

The complete mapping diagram is shown in the image below.
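To check the logic outside Informatica, here is a rough Python and sqlite3 sketch of what the Type 1 mapping does for each source row: the lookup on Customer_Id, the New_Flag and Changed_Flag expressions (mirrored by the hypothetical function type1_flags), and the two filter and update strategy branches. It is not generated by Informatica and is not equivalent to it. The sample rows are made up, column types are simplified for SQLite, and new surrogate keys are assumed to continue after the highest existing Cust_Key as a stand-in for the sequence generator. It also does not model how Informatica evaluates comparisons involving NULL values.

```python
import sqlite3
from itertools import count


def type1_flags(cust_key, name, location, src_name, src_location):
    """Mirror of New_Flag / Changed_Flag; lookup values are None when there is no match."""
    new_flag = 1 if cust_key is None else 0
    changed_flag = 1 if cust_key is not None and (name != src_name or location != src_location) else 0
    return new_flag, changed_flag


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_Id INTEGER, Customer_Name TEXT, Location TEXT);
    CREATE TABLE Customers_Dim (Cust_Key INTEGER PRIMARY KEY, Customer_Id INTEGER,
                                Customer_Name TEXT, Location TEXT);
    INSERT INTO Customers_Dim VALUES (1, 10, 'Ann', 'Pune'), (2, 20, 'Bob', 'Delhi');
    INSERT INTO Customers VALUES (10, 'Ann', 'Mumbai'), (20, 'Bob', 'Delhi'), (30, 'Cid', 'Goa');
""")

# Stand-in for the sequence generator (assumption: continue after the highest existing key).
max_key = conn.execute("SELECT COALESCE(MAX(Cust_Key), 0) FROM Customers_Dim").fetchone()[0]
next_key = count(max_key + 1)

source_rows = conn.execute(
    "SELECT Customer_Id, Customer_Name, Location FROM Customers ORDER BY Customer_Id"
).fetchall()
for cust_id, src_name, src_location in source_rows:
    # Lookup condition: Customer_Id = IN_Customer_Id.
    match = conn.execute(
        "SELECT Cust_Key, Customer_Name, Location FROM Customers_Dim WHERE Customer_Id = ?",
        (cust_id,),
    ).fetchone()
    cust_key, name, location = match if match else (None, None, None)
    new_flag, changed_flag = type1_flags(cust_key, name, location, src_name, src_location)
    if new_flag == 1:
        # Filter New_Flag=1 -> DD_INSERT with the next surrogate key.
        conn.execute("INSERT INTO Customers_Dim VALUES (?, ?, ?, ?)",
                     (next(next_key), cust_id, src_name, src_location))
    elif changed_flag == 1:
        # Filter Changed_Flag=1 -> DD_UPDATE: overwrite the attributes, keep the surrogate key.
        conn.execute("UPDATE Customers_Dim SET Customer_Name = ?, Location = ? WHERE Cust_Key = ?",
                     (src_name, src_location, cust_key))

print(conn.execute("SELECT * FROM Customers_Dim ORDER BY Cust_Key").fetchall())
# [(1, 10, 'Ann', 'Mumbai'), (2, 20, 'Bob', 'Delhi'), (3, 30, 'Cid', 'Goa')]
```

Customer 10 is overwritten in place (its old location is gone), customer 20 is unchanged, and customer 30 is inserted with a new key.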

Type 2:
This method tracks historical data by creating multiple records for a given natural key in the dimension
tables, with separate surrogate keys and/or different version numbers. Unlimited history is preserved,
because every change is stored as a new row.
Let us drive the point home with a simple scenario. In the current month (01-01-2010), we are given a
source table with three columns (Empno, Ename, Sal) and three rows. In the next month (01-02-2010),
one new employee is added and one existing record changes. We use the SCD Type 2 approach to
extract and load the records into the target table.
Dates are in dd-mm-yy format.
Source Table (01-01-10):
Empno  Ename  Sal
101    A      1000
102    B      2000
103    C      3000

Target Table (01-01-10):
Skey  Empno  Ename  Sal   S-date    E-date    Ver  Flag
100   101    A      1000  01-01-10  Null      1    1
200   102    B      2000  01-01-10  Null      1    1
300   103    C      3000  01-01-10  Null      1    1

In the second month, one more employee (Ename D) is added to the table, and the salary of
employee B changes from 2000 to 2500.
Source Table (01-02-10):
Empno  Ename  Sal
101    A      1000
102    B      2500
103    C      3000
104    D      4000

Target Table (01-02-10):
Skey  Empno  Ename  Sal   S-date    E-date    Ver  Flag
100   101    A      1000  01-01-10  Null      1    1
200   102    B      2000  01-01-10  31-01-10  1    0
300   103    C      3000  01-01-10  Null      1    1
201   102    B      2500  01-02-10  Null      2    1
400   104    D      4000  01-02-10  Null      1    1

Notice that when an employee's salary changes, a new row is inserted with the current date as the
start date and Null as the end date, and the previous row is closed by setting its end date to the
current date minus one day.
If we use the Version column, the new row is inserted with an incremented version number: a row for
a key that already exists in the table gets the previous version plus one, while a row for a key that
does not yet exist is added with Version = 1.
If we use the Flag column, the new row is added with Flag = 1, and the previous row is updated
and closed with Flag = 0.
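The rules above can be sketched in Python to reproduce the employee example. This is an illustration under stated assumptions, not an Informatica mapping: each row is a plain dict, the Flag column identifies the current row, and example_skey is a hypothetical key function chosen only so that the surrogate keys match the tables above (a warehouse would normally draw them from a sequence).

```python
from datetime import date, timedelta


def scd2_load(target, source, load_date, make_skey):
    """Apply one SCD Type 2 load to target (list of row dicts) from source {empno: (ename, sal)}."""
    current = {row["empno"]: row for row in target if row["flag"] == 1}
    for empno, (ename, sal) in source.items():
        old = current.get(empno)
        if old is not None and (old["ename"], old["sal"]) == (ename, sal):
            continue  # unchanged: no new row
        version = 1
        if old is not None:
            # Close the previous row: end date = load date - 1 day, Flag = 0.
            old["e_date"] = load_date - timedelta(days=1)
            old["flag"] = 0
            version = old["ver"] + 1
        target.append({"skey": make_skey(empno, version), "empno": empno, "ename": ename,
                       "sal": sal, "s_date": load_date, "e_date": None,
                       "ver": version, "flag": 1})
    return target


def example_skey(empno, version):
    # Hypothetical: reproduces the keys in the tables (101 -> 100, 102 v2 -> 201, ...).
    return (empno - 100) * 100 + version - 1


jan = {101: ("A", 1000), 102: ("B", 2000), 103: ("C", 3000)}
feb = {101: ("A", 1000), 102: ("B", 2500), 103: ("C", 3000), 104: ("D", 4000)}
dim = scd2_load([], jan, date(2010, 1, 1), example_skey)
dim = scd2_load(dim, feb, date(2010, 2, 1), example_skey)

for row in dim:
    print(row["skey"], row["empno"], row["sal"], row["s_date"], row["e_date"], row["ver"], row["flag"])
```

After the second load, row 200 is closed with end date 2010-01-31 and Flag 0, row 201 carries Version 2, and the unchanged rows 100 and 300 keep their original start dates.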

Important points:

They usually relate to true changes in source systems
There is a need to preserve history in the DWH
This type of change partitions the history in the DWH
Every change for the same attribute must be preserved
An effective date is included in the dimension table
Apart from being closed (end date or flag, when those columns are used), the original row in the dimension table is not changed
The key of the original row is not affected
The new row is inserted with a new surrogate key

Explanation with an Example:

We will see how to implement the SCD Type 2 version in Informatica. As an example, consider the
customer dimension. The source and target table structures are shown below:
--Source Table
Create Table Customers
(
    Customer_Id Number Primary Key,
    Location    Varchar2(30)
);
--Target Dimension Table
Create Table Customers_Dim
(
    Cust_Key    Number Primary Key,
    Customer_Id Number,
    Location    Varchar2(30),
    Version     Number
);

The basic steps involved in creating an SCD Type 2 version mapping are:

Identifying the new records and inserting them into the dimension table with version number 1.
Identifying the changed records and inserting them into the dimension table with an incremented
version number.

Let's divide the steps to implement the SCD Type 2 version mapping into three parts.

SCD Type 2 version implementation - Part 1


Here we will see the basic setup and mapping flow required for SCD Type 2 version. The steps involved
are:
Create the source and dimension tables in the database.
Open the Designer tool, go to the Source Analyzer, and either create or import the source
definition.
Go to the Warehouse Designer or Target Designer and import the target definition.
Go to the Mapping Designer tab and create a new mapping.
Drag the source into the mapping.
From the toolbar, choose Transformation and then Create.
Select the Lookup transformation, enter a name, and click Create. A window opens, as
shown in the image below.

Select the customer dimension table and click OK.

Edit the lookup transformation, go to the Ports tab, and remove unnecessary ports. Keep
only the Cust_Key, Customer_Id, Location, and Version ports in the lookup transformation.
Create a new port (IN_Customer_Id) in the lookup transformation. This new port needs to be
connected to the Customer_Id port of the source qualifier transformation.

Go to the Condition tab of the lookup transformation and enter the condition as Customer_Id =
IN_Customer_Id.
Go to the Properties tab of the LKP transformation and enter the query below in Lookup SQL
Override. Alternatively, you can generate the SQL query by connecting to the database in the
Lookup SQL Override expression editor and then add the ORDER BY clause.

SELECT
    Customers_Dim.Cust_Key    AS Cust_Key,
    Customers_Dim.Location    AS Location,
    Customers_Dim.Version     AS Version,
    Customers_Dim.Customer_Id AS Customer_Id
FROM
    Customers_Dim
ORDER BY
    Customers_Dim.Customer_Id, Customers_Dim.Version --

You have to use an ORDER BY clause in the query above. The trailing "--" comments out the ORDER BY
clause that Informatica generates for the lookup query, so only your ORDER BY takes effect. If you sort
the Version column in ascending order, then you have to specify "Use Last Value" in the "Lookup policy
on multiple match" property. If you sort the Version column in descending order, then you have to
specify "Use First Value" for the "Lookup policy on multiple match" option.
Click OK in the lookup transformation. Connect the Customer_Id port of the source qualifier
transformation to the IN_Customer_Id port of the LKP transformation.
Create an expression transformation with input/output ports Cust_Key, LKP_Location, and
Src_Location, and output ports New_Flag and Changed_Flag. Enter the expressions below for the
output ports.

New_Flag     = IIF(ISNULL(Cust_Key), 1, 0)

Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND
                   LKP_Location != Src_Location, 1, 0)

The part of the mapping flow is shown below.
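The interaction between the ORDER BY in the lookup override and the multiple-match policy can be modeled in a few lines of Python. This is a simplified model, not Informatica's actual cache implementation: build_lookup is a hypothetical helper that reads dimension rows in the query's sort order and keeps either the first or the last row per Customer_Id. The sample rows are made up. It shows why ascending order pairs with "Use Last Value", descending order pairs with "Use First Value", and the wrong combination returns the oldest version instead of the latest.

```python
def build_lookup(rows, policy):
    """Simplified model of a lookup cache keyed on Customer_Id.

    rows: (cust_key, location, version, customer_id) tuples in the query's ORDER BY order.
    policy: "first" or "last", standing in for "Use First Value" / "Use Last Value".
    """
    cache = {}
    for cust_key, location, version, customer_id in rows:
        if policy == "last" or customer_id not in cache:
            cache[customer_id] = (cust_key, location, version)
    return cache


dim_rows = [(1, "Pune", 1, 10), (5, "Mumbai", 2, 10), (2, "Delhi", 1, 20)]
ascending = sorted(dim_rows, key=lambda r: (r[3], r[2]))    # ORDER BY Customer_Id, Version
descending = sorted(dim_rows, key=lambda r: (r[3], -r[2]))  # ORDER BY Customer_Id, Version DESC

print(build_lookup(ascending, "last")[10])    # (5, 'Mumbai', 2)  latest version
print(build_lookup(descending, "first")[10])  # (5, 'Mumbai', 2)  latest version
print(build_lookup(ascending, "first")[10])   # (1, 'Pune', 1)    wrong policy: oldest version
```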

SCD Type 2 version implementation - Part 2


In this part, we will identify the new records and insert them into the target with version value 1. The
steps involved are:

Now create a filter transformation to identify and insert new records into the dimension table.
Drag the ports of the expression transformation (New_Flag) and source qualifier transformation
(Customer_Id, Location) into the filter transformation.
Go to the Properties tab of the filter transformation and enter the filter condition as New_Flag=1.

Now create an update strategy transformation and connect the ports of the filter transformation
(Customer_Id, Location) to it. Go to the Properties tab and enter the update strategy expression as
DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of the update
strategy transformation to the target definition.
Create a sequence generator and an expression transformation. Name this expression
transformation "Expr_Ver".
Drag and connect the NextVal port of the sequence generator to the expression transformation. In
the expression transformation, create a new output port (Version) and assign the value 1 to it.
Now connect the ports of the expression transformation (NextVal, Version) to the target definition
ports (Cust_Key, Version). The part of the mapping flow is shown in the image below.

SCD Type 2 Version implementation - Part 3


In this part, we will identify the changed records and insert them into the target with an incremented
version number. The steps involved are:

Create a filter transformation. This is used to find the changed records. Now drag the ports from the
expression transformation (Changed_Flag), source qualifier transformation (Customer_Id,
Location), and LKP transformation (Version) into the filter transformation.
Go to the filter transformation properties and enter the filter condition as Changed_Flag=1.
Create an expression transformation and drag the ports of the filter transformation, except the
Changed_Flag port, into the expression transformation.
Go to the Ports tab of the expression transformation, create a new output port (O_Version), and
assign the expression (Version+1) to it.
Now create an update strategy transformation and drag the ports of the expression transformation
(Customer_Id, Location, O_Version) into the update strategy transformation. Go to the Properties
tab and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports of the update
strategy transformation to the target definition.
Now connect the NextVal port of the expression transformation (Expr_Ver, created in Part 2) to the
Cust_Key port of the target definition. The complete mapping diagram is shown in the image
below.
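As a cross-check of Parts 2 and 3, here is a rough Python and sqlite3 sketch of the net effect of the Type 2 version mapping. It is an approximation, not Informatica itself: the sample rows are made up, column types are simplified for SQLite, and one Python counter stands in for the single sequence generator shared by both insert branches (assumed to continue after the highest existing Cust_Key). The lookup query returns only the latest version per customer, which is the effect the ORDER BY and "Use Last Value" setting are meant to achieve.

```python
import sqlite3
from itertools import count

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_Id INTEGER PRIMARY KEY, Location TEXT);
    CREATE TABLE Customers_Dim (Cust_Key INTEGER PRIMARY KEY, Customer_Id INTEGER,
                                Location TEXT, Version INTEGER);
    INSERT INTO Customers_Dim VALUES (1, 10, 'Pune', 1), (2, 20, 'Delhi', 1);
    INSERT INTO Customers VALUES (10, 'Mumbai'), (20, 'Delhi'), (30, 'Goa');
""")

# One sequence feeds both insert branches (assumption: continue after the highest Cust_Key).
start = conn.execute("SELECT COALESCE(MAX(Cust_Key), 0) FROM Customers_Dim").fetchone()[0] + 1
next_key = count(start)

source_rows = conn.execute("SELECT Customer_Id, Location FROM Customers ORDER BY Customer_Id").fetchall()
for customer_id, src_location in source_rows:
    # Lookup: latest version only (same effect as ORDER BY Version with "Use Last Value").
    latest = conn.execute(
        "SELECT Location, Version FROM Customers_Dim WHERE Customer_Id = ? "
        "ORDER BY Version DESC LIMIT 1",
        (customer_id,),
    ).fetchone()
    if latest is None:
        version = 1               # Part 2: new customer -> Version = 1
    elif latest[0] != src_location:
        version = latest[1] + 1   # Part 3: changed customer -> O_Version = Version + 1
    else:
        continue                  # unchanged: nothing is written
    # Both branches use DD_INSERT: existing rows are never updated in this variant.
    conn.execute("INSERT INTO Customers_Dim VALUES (?, ?, ?, ?)",
                 (next(next_key), customer_id, src_location, version))

print(conn.execute("SELECT * FROM Customers_Dim ORDER BY Cust_Key").fetchall())
# [(1, 10, 'Pune', 1), (2, 20, 'Delhi', 1), (3, 10, 'Mumbai', 2), (4, 30, 'Goa', 1)]
```

Customer 10 now has two rows (Pune as version 1, Mumbai as version 2), so the full history is kept.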

Type 3:
This method preserves limited history, and we use the surrogate key as the primary key
here.
Source Table (01-01-2011):
Empno  Ename  Sal
101           1000
102           2000
103           3000

Target Table (01-01-2011):
Empno  Ename  C-sal  P-sal
101           1000
102           2000
103           3000
(C-sal = current salary, P-sal = previous salary.)

Source Table (01-02-2011):
Empno  Ename  Sal
101           1000
102           4566
103           3000

Target Table (01-02-2011):
Empno  Ename  C-sal  P-sal
101           1000
102           4566   2000
103           3000

In other words, when an employee's salary changes, the new salary overwrites C-sal and the old value moves to P-sal, so only one previous value is kept.
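The same two loads can be sketched in Python. scd3_apply is a hypothetical helper, the target is a plain dict keyed by Empno rather than a database table, and the third load at the end is an extra made-up change added only to show that Type 3 keeps a single previous value.

```python
def scd3_apply(target, source):
    """Type 3 sketch. target: {empno: {"c_sal": ..., "p_sal": ...}}; source: {empno: sal}."""
    for empno, sal in source.items():
        row = target.get(empno)
        if row is None:
            # New employee: no previous value yet.
            target[empno] = {"c_sal": sal, "p_sal": None}
        elif row["c_sal"] != sal:
            # Changed salary: shift the current value to P-sal, then overwrite C-sal.
            row["p_sal"] = row["c_sal"]
            row["c_sal"] = sal
    return target


target = scd3_apply({}, {101: 1000, 102: 2000, 103: 3000})      # 01-01-2011 load
target = scd3_apply(target, {101: 1000, 102: 4566, 103: 3000})  # 01-02-2011 load
print(target[102])  # {'c_sal': 4566, 'p_sal': 2000}

# A further (hypothetical) change keeps only one level of history: 2000 is lost.
target = scd3_apply(target, {102: 5000})
print(target[102])  # {'c_sal': 5000, 'p_sal': 4566}
```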

Important points:

They usually relate to soft or tentative changes in the source systems
There is a need to keep track of history with old and new values of the changed attribute
They are used to compare performances across the transition
They provide the ability to track forward and backward

Explanation with an Example:

The process for implementing SCD Type 3 in Informatica is:

Identifying the new record and inserting it into the dimension table.
Identifying the changed record and updating the existing record in the dimension table.

We will see the implementation of SCD Type 3 using the customer dimension table as an example.
The source table looks like this:

CREATE TABLE Customers (
    Customer_Id Number,
    Location    Varchar2(30)
);

Now we load the source data into the customer dimension table using SCD Type 3. The
dimension table structure is shown below.

CREATE TABLE Customers_Dim (
    Cust_Key          Number,
    Customer_Id       Number,
    Current_Location  Varchar2(30),
    Previous_Location Varchar2(30)
);

Steps to Create SCD Type 3 Mapping

Follow the steps below to create an SCD Type 3 mapping in Informatica:
Create the source and dimension tables in the database.
Open the Designer tool, go to the Source Analyzer, and either create or import the source
definition.
Go to the Warehouse Designer or Target Designer and import the target definition.
Go to the Mapping Designer tab and create a new mapping.
Drag the source into the mapping.
From the toolbar, choose Transformation and then Create.
Select the Lookup transformation, enter a name, and click Create. A window opens, as
shown in the image below.

Select the customer dimension table and click OK.

Edit the LKP transformation, go to the Ports tab, remove the Previous_Location port, and
add a new port IN_Customer_Id. This new port needs to be connected to the Customer_Id port
of the source qualifier transformation.

Go to the Condition tab of the LKP transformation and enter the lookup condition as Customer_Id =
IN_Customer_Id. Then click OK.
Connect the Customer_Id port of the source qualifier transformation to the IN_Customer_Id port of
the LKP transformation.
Create an expression transformation with input ports Cust_Key, Prev_Location, and
Curr_Location, and output ports New_Flag and Changed_Flag.
For the output ports of the expression transformation, enter the expressions below and click OK:

New_Flag     = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
                   AND Prev_Location != Curr_Location,
                   1, 0)

Now connect the ports of the LKP transformation (Cust_Key, Current_Location) to the expression
transformation ports (Cust_Key, Prev_Location), and the port of the source qualifier transformation
(Location) to the expression transformation port (Curr_Location), respectively.
The mapping diagram created so far is shown in the image below.

Create a filter transformation and drag the ports of the source qualifier transformation into it. Also
drag the New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the Properties tab, and enter the Filter Condition as
New_Flag=1. Then click OK.
Now create an update strategy transformation and connect all the ports of the filter
transformation (except the New_Flag port) to the update strategy. Go to the Properties tab of the
update strategy and enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the appropriate ports from the update
strategy to the target definition. Connect the Location port of the update strategy to the
Current_Location port of the target definition.
Create a sequence generator transformation and connect the NEXTVAL port to the target
surrogate key (Cust_Key) port.
The part of the mapping diagram for inserting a new row is shown below.

Now create another filter transformation, go to the Ports tab, and create the ports Cust_Key,
Curr_Location, Prev_Location, and Changed_Flag. Connect the ports of the LKP transformation (Cust_Key,
Current_Location) to the filter transformation ports (Cust_Key, Prev_Location), the source qualifier
transformation port (Location) to the filter transformation port (Curr_Location), and the expression
transformation port (Changed_Flag) to the Changed_Flag port of the filter transformation.
Edit the filter transformation, go to the Properties tab, and enter the Filter Condition as
Changed_Flag=1. Then click OK.
Now create an update strategy transformation and connect the ports of the filter
transformation (Cust_Key, Curr_Location, Prev_Location) to the update strategy. Go to the
Properties tab of the update strategy and enter the update strategy expression as DD_UPDATE.

Now drag the target definition into the mapping and connect the appropriate ports from the update
strategy to the target definition: Curr_Location to Current_Location and Prev_Location to
Previous_Location.
The complete mapping diagram is shown in the image below.
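For comparison with the finished Type 3 mapping, here is a rough Python and sqlite3 sketch of its net effect. It is an approximation, not Informatica itself: the sample rows are made up, column types are simplified for SQLite, and new surrogate keys are assumed to continue after the highest existing Cust_Key as a stand-in for the sequence generator. The insert branch fills only Current_Location; the update branch moves the old Current_Location into Previous_Location.

```python
import sqlite3
from itertools import count

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_Id INTEGER, Location TEXT);
    CREATE TABLE Customers_Dim (Cust_Key INTEGER PRIMARY KEY, Customer_Id INTEGER,
                                Current_Location TEXT, Previous_Location TEXT);
    INSERT INTO Customers_Dim VALUES (1, 10, 'Pune', NULL), (2, 20, 'Delhi', NULL);
    INSERT INTO Customers VALUES (10, 'Mumbai'), (20, 'Delhi'), (30, 'Goa');
""")

# Stand-in for the sequence generator (assumption: continue after the highest existing key).
start = conn.execute("SELECT COALESCE(MAX(Cust_Key), 0) FROM Customers_Dim").fetchone()[0] + 1
next_key = count(start)

source_rows = conn.execute("SELECT Customer_Id, Location FROM Customers ORDER BY Customer_Id").fetchall()
for customer_id, curr_location in source_rows:
    # Lookup returns Cust_Key and Current_Location (Previous_Location was removed from the lookup).
    match = conn.execute(
        "SELECT Cust_Key, Current_Location FROM Customers_Dim WHERE Customer_Id = ?",
        (customer_id,),
    ).fetchone()
    if match is None:
        # New_Flag = 1 -> DD_INSERT; Previous_Location stays NULL.
        conn.execute("INSERT INTO Customers_Dim VALUES (?, ?, ?, NULL)",
                     (next(next_key), customer_id, curr_location))
    elif match[1] != curr_location:
        # Changed_Flag = 1 -> DD_UPDATE; the old Current_Location becomes Previous_Location.
        conn.execute(
            "UPDATE Customers_Dim SET Current_Location = ?, Previous_Location = ? WHERE Cust_Key = ?",
            (curr_location, match[1], match[0]),
        )

print(conn.execute("SELECT * FROM Customers_Dim ORDER BY Cust_Key").fetchall())
# [(1, 10, 'Mumbai', 'Pune'), (2, 20, 'Delhi', None), (3, 30, 'Goa', None)]
```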
