29.3 Soft & Hard Rules in Data Vault - Intro
The sad truth is that operational systems often contain bad data. Some of it comes from human error, such as typos, ignored standards, and duplicates; the rest comes from missing input validation in those systems, for example mandatory fields that are not declared and references to other entities (primary/foreign key constraints) that are not defined.
In decision support systems, business users expect to see high-quality data downstream. However, data quality can be subjective: what one business user considers wrong data may be correct and valuable data to another.
For this reason, when loading a data warehouse, we want to load all data and leave nothing behind. In any case, the data warehouse should provide both a "single version of the facts" and "versions of the truth", represented in Data Vault 2.0 by the Raw Data Vault and the Business Vault.
Soft business rules are implemented in the Business Vault or in the loading process of the Information Marts. By implementing data quality as soft business rules, the incoming raw data is not modified in any way and remains within the enterprise data warehouse for further analysis. If data quality rules change, or new knowledge about the data is obtained, the data quality routines can be adjusted without having to reload any previous raw data.
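This separation can be sketched as follows. The table and field names here are hypothetical, purely for illustration: the raw rows are never modified, and a soft rule is applied only on the way out to the mart, so a rule change just means rebuilding the mart from the same raw rows.

```python
# Sketch (hypothetical names): raw data stays immutable in the Raw Data
# Vault; a soft business rule derives a cleansed view for the mart.

raw_satellite = [
    {"customer_key": "C1", "country": "germany"},
    {"customer_key": "C2", "country": "DE"},
]

def soft_rule_country(value):
    """Soft business rule: standardize country codes. May change any time."""
    mapping = {"germany": "DE", "de": "DE"}
    return mapping.get(value.lower(), value.upper())

def build_mart(rows, rule):
    # Raw rows are copied, never modified; the rule is applied on the way out.
    return [{**row, "country": rule(row["country"])} for row in rows]

mart = build_mart(raw_satellite, soft_rule_country)
# If the rule changes later, rebuild the mart from the same raw rows --
# no reload of source data is required.
```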
This practice also makes it possible to write corrected data from the Information Marts back to the operational systems.
Here's a good distinction between soft rules and hard rules:
Hard Rules.
These should be applied before data is stored in the Data Vault. Rules applied here do not alter the contents or the granularity of the data, and they maintain auditability.
Data typing
Normalization / Denormalization
De-duplication
Soft Rules.
Rules that change or interpret the data, for example by adding business logic. These rules can change the granularity of the data.
Standardizing addresses
Coalescing
Consolidation
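The distinction above can be illustrated with a minimal sketch. The record layout and field names are assumptions for illustration: the hard rule only enforces data types, keeping content and grain intact, while the soft rule consolidates records and thereby changes the grain.

```python
# Illustrative sketch (assumed record layout): a hard rule applied on the
# way into the Raw Data Vault vs. a soft rule applied downstream.

def hard_rule_typing(record):
    """Hard rule: enforce data types only. Content and grain are unchanged."""
    return {
        "customer_id": str(record["customer_id"]).strip(),
        "order_total": float(record["order_total"]),  # type cast, no reinterpretation
    }

def soft_rule_consolidate(records):
    """Soft rule: interpret and consolidate the data -- this changes the grain."""
    totals = {}
    for r in records:
        totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["order_total"]
    return totals

staged = [
    hard_rule_typing({"customer_id": " 42 ", "order_total": "10.00"}),
    hard_rule_typing({"customer_id": "42", "order_total": "5.50"}),
]
consolidated = soft_rule_consolidate(staged)  # one row per customer: {"42": 15.5}
```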
Write back Quality Data to Operational System (Source System)
Remember that data quality routines (soft rules) take place in the Business Vault. One example of our data quality routines at Company.com is the quality check of phone numbers in the CRM system Salesforce.
With this routine, we read the data from the Raw Data Vault, and a cleansing or quality job script rearranges the phone numbers into a readable, comprehensible form. The data is then provided to an Information Mart (or, in this case, what we call an Interface Mart), which in turn can be used to send the data back to the operational source system itself.
Business users can then also use this data in their business operations, and the next time the raw data is loaded into the Raw Data Vault, it is already cleansed.
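A routine like the one described could be sketched as below. This is not the actual Company.com job; the normalization logic and the default country code are assumptions for illustration, showing only the shape of the flow: read free-form numbers from the Raw Data Vault, normalize them, and stage the result for the Interface Mart that writes back to the source system.

```python
import re

def cleanse_phone(raw, default_country="+49"):
    """Rearrange a free-form phone number into one canonical, readable form.

    The default country code (+49) is an assumption for this sketch.
    """
    digits = re.sub(r"[^\d+]", "", raw)        # drop spaces, dashes, parentheses
    if digits.startswith("00"):
        digits = "+" + digits[2:]              # 0049... -> +49...
    elif digits.startswith("0"):
        digits = default_country + digits[1:]  # national format -> international
    return digits

# Hypothetical rows read from the Raw Data Vault:
raw_vault_rows = ["0171 / 123-4567", "0049 (171) 1234567"]

# Staged for the Interface Mart, ready for write-back to the source system:
interface_mart = [cleanse_phone(n) for n in raw_vault_rows]
# Both inputs normalize to the same international form: "+491711234567"
```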