
Soft & Hard Rules – Data Quality in Data Vault 2.0 Architecture

In making business decisions, whether daily or long term, the quality of
data is a critical factor in the decision-making process. Immediate access
to the data, combined with certainty about its quality, can enhance business
performance immensely.

The sad truth, however, is that we see bad data in operational systems due to
human-caused errors such as typos, ignored standards, and duplicates, as well
as a lack of input validators, for example mandatory fields that are not
declared and references to other entities (primary-key/foreign-key
constraints) that are not defined.
In decision support systems, business users eventually expect to see
high-quality data downstream. However, data quality can be subjective:
what one business user considers wrong data may be correct and valuable
data to another.

For this reason, when loading a data warehouse, we want to load all data and
leave nothing behind. In any case, the data warehouse should provide both a
"single version of the facts" and "versions of the truth", which are
represented in Data Vault 2.0 by the Raw Data Vault and the Business Vault.

The second best practice is to implement data quality routines by
applying soft business rules in the data warehouse.

Soft business rules are implemented in the Business Vault or in the loading
processes of the Information Marts. By implementing data quality as soft
business rules, the incoming raw data is not modified in any way and remains
within the enterprise data warehouse for further analysis. If data quality
rules change, or new knowledge about the data is obtained, the data quality
routines can be adjusted without having to reload any previous raw data.
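
As an illustration, a soft rule can be expressed as a derivation over the
unchanged raw data. The following Python sketch is an assumed example, not
the article's actual code; the names standardize_country and
apply_soft_rules, and the country field, are hypothetical:

    from copy import deepcopy

    def standardize_country(value):
        """Soft rule: map free-text country entries to ISO codes
        (hypothetical mapping for illustration)."""
        mapping = {"germany": "DE", "deutschland": "DE", "u.s.": "US"}
        return mapping.get(value.strip().lower(), value.strip().upper())

    def apply_soft_rules(raw_customer_sat):
        """Derive Business Vault records; the raw input is never mutated."""
        cleansed = []
        for record in raw_customer_sat:
            out = deepcopy(record)  # copy, never modify the raw record
            out["country"] = standardize_country(record["country"])
            cleansed.append(out)
        return cleansed

    # If the rule changes, simply re-run apply_soft_rules over the raw
    # data -- no reload of the Raw Data Vault is required.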

This practice also makes it possible to write corrected data from the
Information Marts back to the operational systems.
Here's a good distinction between soft rules and hard rules:

Hard Rules.

These should be applied before data is stored in the Data Vault. Rules
applied here do not alter the contents or the granularity of the data and
thereby maintain auditability (a sketch of such rules follows the list
below):

• Data typing

• Normalization / Denormalization

• Adding system fields (tags)

• De-duplication

• Splitting by record structure

• Trimming spaces from character strings
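
To make the distinction concrete, here is a minimal Python sketch of hard
rules applied while staging a row, before it reaches the Raw Data Vault.
The field names are illustrative assumptions; load_date and record_source
stand in for the usual Data Vault system fields:

    from datetime import datetime, timezone

    def apply_hard_rules(raw_row, record_source):
        """Hard rules only: enforce data types, trim spaces, and add
        system fields (tags). Content and grain stay unchanged, which
        keeps the data auditable."""
        return {
            "customer_id": str(raw_row["customer_id"]).strip(),  # data typing
            "name": str(raw_row["name"]).strip(),                # trim spaces
            "revenue": float(raw_row["revenue"]),                # data typing
            # system fields (tags):
            "load_date": datetime.now(timezone.utc).isoformat(),
            "record_source": record_source,
        }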

Soft Rules.

Rules that change or interpret the data, for example by adding business
logic. These rules can change the granularity of the data (two of them are
sketched after the list below):

• Concatenating name fields

• Standardizing addresses

• Computing monthly sales

• Coalescing

• Consolidation
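
For contrast, the following Python sketch implements two of the soft rules
above, concatenating name fields and computing monthly sales; note that the
aggregation changes the granularity from order line to month. The field
names are assumed for illustration:

    from collections import defaultdict

    def full_name(row):
        """Soft rule: concatenate name fields (business logic)."""
        return f"{row['first_name']} {row['last_name']}".strip()

    def monthly_sales(order_lines):
        """Soft rule: aggregate order lines per month -- this changes
        the granularity from order line to month."""
        totals = defaultdict(float)
        for line in order_lines:
            month = line["order_date"][:7]  # e.g. "2023-04"
            totals[month] += line["amount"]
        return dict(totals)
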
Writing Quality Data Back to the Operational System (Source System)

Remember that data quality routines (soft rules) take place in the
Business Vault. An example of our data quality routines at Company.com is
the quality check of phone numbers in the CRM system Salesforce.

Phone numbers sometimes appear in an unreadable or hard-to-read form, or
cannot be interpreted by automated processes and other operational
applications.

With this routine, we read the data from the Raw Data Vault, and a cleansing
or quality job script then rearranges these numbers into a readable and
comprehensible form. The data is then provided to an Information Mart (or,
in this case, what we call an Interface Mart), which in turn can be used to
send the data back to the operational source system itself.
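
A simplified version of such a cleansing job might look like the following
Python sketch. The regular expression and the international output format
are assumptions for illustration, not the actual Company.com implementation:

    import re

    def cleanse_phone_number(raw, default_country="+1"):
        """Rearrange a free-text phone number into a readable,
        machine-comprehensible form."""
        digits = re.sub(r"[^\d+]", "", raw)        # drop spaces, dashes, etc.
        if digits.startswith("00"):
            digits = "+" + digits[2:]              # 0049... becomes +49...
        elif digits.startswith("0"):
            digits = default_country + digits[1:]  # national -> international
        return digits

    # Example: build the Interface Mart value from the Raw Data Vault
    # record before writing it back to the source system (e.g. Salesforce).
    print(cleanse_phone_number("0 151 / 234-5678", default_country="+49"))
    # -> +491512345678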

Business users can then use this data in their business operations, and the
next time the raw data is loaded into the Raw Data Vault, it is already
cleansed.
