DWH Slowly Changing Dimension

The definition of a slowly changing dimension is in its name: it is a dimension that
changes slowly over time. A customer dimension table represents customers. When a
customer record is created, the usual assumption is that it is independent of time. But what
if the customer's address or name changes?
This change can be implemented in three ways:
Type I: Replace the old record with a new record holding the updated data. The history is
thereby lost. A data warehouse, however, has a responsibility to track history effectively,
and this is where the Type I implementation falls short.
Type II: Create an additional dimension table record with the new value. This way the history
is kept. We can determine which record is current by adding a current-record flag or a
timestamp to the dimension row.
Type III: Add a new field to the dimension table that stores the old value of the attribute.
When an attribute of the dimension changes, we push the updated value to the current field
and the old value to the old field.
----------------------------------------------------------------------------
Slowly Changing Dimensions
Entities change over time. Changes in customer demographics, product characteristics,
classification rules, customer status and so on lead to changes in the attributes of dimensions.
In a transaction system, the change is often overwritten and the record of the change is lost.
For example, a source system may hold only the latest customer PIN Code, because it is needed
to send marketing and billing statements. A data warehouse, however, needs to maintain all the
previous PIN Codes as well, because we need to track how many customers move to new
locations and how often.
A key benefit of a data warehouse is that it provides historical information, which is typically
overwritten (and thus lost) in the transaction systems. How slowly changing dimensions are
handled in a dimensional model is a key determinant of that benefit.
There are three ways to handle them:

Slowly Changing Dimension Method 1 (in short SCD 1)
This is how most source systems handle a change: overwrite the attribute value. For example, if
a customer’s marital status has moved from 'Unmarried' to 'Married', we overwrite 'Unmarried'
with 'Married'. Similarly, if an insurance policy status has moved from 'Lapsed' to 'Re-instated',
the new status is written over the old status. This is done when we do not need to analyze
the historical information.
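As a rough illustration of the overwrite described above, here is a minimal Python sketch. The in-memory dictionary `customer_dim` and the helper `apply_scd1` are hypothetical names invented for this example; a real warehouse would issue an UPDATE against the dimension table instead.

```python
# Hypothetical in-memory dimension keyed by the production customer ID.
customer_dim = {
    "110002": {"marital_status": "Unmarried", "city": "Palo Alto"},
}


def apply_scd1(dim, customer_id, changes):
    """Overwrite the changed attributes in place; previous values are not kept."""
    row = dim.setdefault(customer_id, {})
    row.update(changes)
    return row


apply_scd1(customer_dim, "110002", {"marital_status": "Married"})
print(customer_dim["110002"])
```

After the call, nothing in the dimension records that the customer was ever 'Unmarried', which is exactly the loss of history that SCD 1 accepts.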
Slowly Changing Dimension Method 2 (in short SCD 2)
This is the true-blue technique for precise historical analysis. It is used when the attributes
of an entity can change more than once and we need to track the date of each change.
In this method, a new record is added and given a separate identifier as its primary key.
We cannot use the production key as the primary key here because it has not changed
(Customer ID has remained the same, while the value of its attribute 'marital status' has
changed). This new identifier is called the surrogate key.
Apart from adding a new record and assigning a new primary (surrogate) key, the validity period
for the new record is also recorded.
For example, you have a dimension table with customer_ID '110002' and marital status
'Single'. Over time, the customer gets married and also moves to a new location. The customer
dimension records will be:
Surrogate Key  Customer ID  Date Valid     Marital Status  Date of Birth  City
1100021        110002       Sept 23, 2004  Single          Jan 8, 1982    Palo Alto
1100022        110002       Oct 25, 2005   Married         Jan 8, 1982    Palo Alto
1100023        110002       Nov 23, 2005   Married         Jan 8, 1982    San Francisco
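The three rows above can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the `CustomerRow` class, the `is_current` flag, the helper names, and the counter-based key generator (its start value is chosen only so that the keys match the table).

```python
from dataclasses import dataclass, replace
from datetime import date
from itertools import count


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str
    date_valid: date
    marital_status: str
    date_of_birth: date
    city: str
    is_current: bool = True


# Assumption: surrogate keys come from a simple counter.
_next_key = count(1100021)


def add_customer(rows, customer_id, date_valid, marital_status, date_of_birth, city):
    """Insert the first version of a customer."""
    row = CustomerRow(next(_next_key), customer_id, date_valid,
                      marital_status, date_of_birth, city)
    rows.append(row)
    return row


def apply_scd2(rows, customer_id, change_date, **changes):
    """Flag the current row as historical and append a new version.

    Raises StopIteration if the customer has no current row.
    """
    current = next(r for r in rows if r.customer_id == customer_id and r.is_current)
    current.is_current = False
    new_row = replace(current, surrogate_key=next(_next_key),
                      date_valid=change_date, is_current=True, **changes)
    rows.append(new_row)
    return new_row


dim = []
add_customer(dim, "110002", date(2004, 9, 23), "Single", date(1982, 1, 8), "Palo Alto")
apply_scd2(dim, "110002", date(2005, 10, 25), marital_status="Married")
apply_scd2(dim, "110002", date(2005, 11, 23), city="San Francisco")
for row in dim:
    print(row.surrogate_key, row.date_valid, row.marital_status, row.city, row.is_current)
```

Each change leaves the earlier rows untouched apart from the current-record flag, so the full history stays queryable by Customer ID.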
Slowly Changing Dimension Method 3 (in short SCD 3)
This is a midway between method 1 and method 2. Here we don’t add an additional record, but
add a new field, 'old attribute value'. However, this has limitations. First, we have to know
from the beginning which attributes will change, because a new field has to be added in the
design for every attribute that can change. Second, only one previous value is kept, so an
attribute can effectively change at most once in the lifetime of the entity, or at least in the
lifetime of the data warehouse.
Surrogate Key  Customer ID  Marital Status  Date of Birth  City           Marital Status Old  City Old
1100021        110002       Married         Jan 8, 1982    San Francisco  Single              Palo Alto
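A minimal Python sketch of the SCD 3 update follows. The dictionary layout, the `_old` column suffix and the helper `apply_scd3` are assumptions made for this example, not part of the original description.

```python
def apply_scd3(row, attribute, new_value):
    """Move the current value into '<attribute>_old', then store the new value."""
    old_field = f"{attribute}_old"
    if old_field not in row:
        # SCD 3 only works for attributes that were given an 'old' column up front.
        raise KeyError(f"no '{old_field}' column was designed for '{attribute}'")
    row[old_field] = row[attribute]
    row[attribute] = new_value
    return row


customer = {
    "surrogate_key": 1100021,
    "customer_id": "110002",
    "marital_status": "Single",
    "marital_status_old": None,
    "date_of_birth": "1982-01-08",
    "city": "Palo Alto",
    "city_old": None,
}

apply_scd3(customer, "marital_status", "Married")
apply_scd3(customer, "city", "San Francisco")
print(customer)
```

A second call on the same attribute would overwrite the value held in the 'old' column, which is the single-change limitation described above.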
NOTE – The term 'slowly changing dimension' is used because it is the universally
acknowledged term. However, the same methods apply to fast changing dimensions as well.
Surrogate Keys as Primary keys of dimension tables

It is a best practice in dimensional model design not to use the production primary key as
the primary key of the dimension table. This goes against conventional logic, but there is a reason.
A data warehouse has a core need to maintain historical information and to show how an entity
has changed over time. Source systems typically have much less need for this kind of
information. Where source systems do track history, they have the luxury of using a
multiple-field primary key (the key identifier of the entity plus a date stamp). For example,
if an insurance policy lapses and is reinstated two months later, the 'policy history table'
can use a primary key that combines policy number + date/time + status. However, data
warehouse practice does not recommend a multiple-field primary key in a dimension table.
Therefore the concept of a 'surrogate key' comes into play: the primary key is not the
production key, but a key generated by the system. The production key is kept as an
attribute within the same dimension table.
The situations in which a surrogate key is used:
• 'Slowly changing dimensions'
• When the primary key itself is repeated.
• When there is a multiple-field primary key. A dimensional model typically does not use a
multiple-field primary key to link to the fact table.
Therefore it is always recommended to use surrogate keys. It is difficult to find organizations
that will not face the situations highlighted above. If there are any, they could just as well
manage their needs using Excel and pivot tables.
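To illustrate how a fact row can link to the dimension through a single surrogate key rather than a multiple-field key, here is a hedged Python sketch of a point-in-time lookup. The function `surrogate_key_for`, the tuple layout, and the rule that a version stays valid until the next version's start date are assumptions made for this example; the sample values are borrowed from the SCD 2 example.

```python
from bisect import bisect_right
from datetime import date

# (surrogate_key, production_key, date_valid)
customer_dim = [
    (1100021, "110002", date(2004, 9, 23)),
    (1100022, "110002", date(2005, 10, 25)),
    (1100023, "110002", date(2005, 11, 23)),
]


def surrogate_key_for(dim, production_key, as_of):
    """Return the surrogate key of the version in effect on `as_of`, or None."""
    versions = sorted((valid, key) for key, prod, valid in dim if prod == production_key)
    start_dates = [valid for valid, _ in versions]
    # Index of the last version whose start date is on or before `as_of`.
    idx = bisect_right(start_dates, as_of) - 1
    if idx < 0:
        return None
    return versions[idx][1]


# A transaction dated Nov 1, 2005 falls inside the second version's validity period.
print(surrogate_key_for(customer_dim, "110002", date(2005, 11, 1)))
```

The fact table then stores only the returned surrogate key, while the production key remains an ordinary attribute of the dimension row.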
Acquisition Phase
1) Access lock
2) Acquisition lock, which sets a flag in the table header rejecting any DDL except DROP.
How do these two locks differ from each other? Basically, the access lock doesn't allow any
exclusive lock, but the acquisition lock allows DROP. Does that mean it overrides the access lock?
Application Phase
1) Write lock
2) Application lock, which sets a flag in the table header rejecting any DDL except DROP and
allowing DML with access lock only (SELECT).
Same question here: how do these two differ?
RELEASE MLOAD;
Will this release the access lock on the target table, the acquisition lock, or both?
Also, please specify what causes these errors and what we should do in each case:
a) Error 2571: MLOAD can't be released. What is the reason for this error?
b) Error 2652: Operation not allowed. Table is being mloaded.
----------------------------------------------------------------------------
