
Data Warehouse and Business Intelligence (DW/BI) systems have become a solution of
choice for organizations seeking to answer the many critical business questions buried in their
vital data. While providing an essential data integration process that supports a wide variety of
reporting requirements, DW/BI systems often involve significant construction effort, ongoing
maintenance, and reengineering whenever data structures and business rules change. And
they do change; presenting us with problem #1.

Building effective and efficient DW/BI systems can quickly become a daunting technical
ordeal, even for veteran engineering teams. Several integrated technologies are required, from
database systems, data transformation (ETL) tools, various programming languages, and
administration and reporting software to high-performance networks and computers with very
large storage capacities. In many cases even the sheer volume of data can be overwhelming.
Yup, problem #2.

Once reporting is delivered to the diverse user communities that consume the data, simplified
usability is expected, yet today's technological realities often require highly trained end users.
The design, creation, delivery, and support of a robust, effortless DW/BI system that can be
used intelligently are, sadly, complex. You guessed it: problem #3.

Furthermore, as upstream systems change, as DW/BI technology plows ahead, and as the
dynamic complexities involved prevail, we also find that every so often new data sources
need to be added to the mix. These are usually unpredicted and unplanned for, and the
integration impact can be enormous, sometimes requiring complete regeneration of the
aggregated data. Hence, problem #4.

So how do we solve these problems? Well …

Bill Inmon, widely regarded as the father of data warehousing, defines a data warehouse as:

“a subject oriented, nonvolatile, time-variant collection of data in support of management’s
decisions” (http://en.wikipedia.org/wiki/Bill_Inmon).

Ralph Kimball, a pioneering data warehousing architect, developed the “dimensional
modeling” methodology now regarded as the de facto standard in the area of decision
support. The Dimensional Model (called a “star schema”) is different from
Inmon’s “normalized modeling” (sometimes called a “snowflake schema”)
methodology. In Kimball’s Star Schema, transactional data is partitioned into aggregated
“facts” with referential “dimensions” surrounding and providing descriptors that define the
facts. The Normalized Model (3NF, or “third normal form”) stores data in related “tables”
following relational database design rules established by E. F. Codd and Raymond F. Boyce
in the early 1970s to eliminate data redundancy. The question of which methodology is best
has fostered vigorous debate among DW/BI architects, but both have weaknesses when dealing
with the inevitable changes in the systems feeding the data warehouse and with cleansing data
to conform to strict methodology requirements.
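
To make the contrast concrete, here is a minimal sketch of a star schema using Python’s built-in sqlite3 module. The table and column names (dim_date, dim_product, fact_sales, and so on) are my own illustrative examples, not taken from the article:

```python
import sqlite3

# A minimal, hypothetical star schema: one fact table surrounded by
# descriptive dimension tables (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,   -- surrogate key
    full_date   TEXT,
    year        INTEGER,
    month       INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,   -- surrogate key
    sku         TEXT,
    name        TEXT,
    category    TEXT
);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL                   -- additive measure to aggregate
);
""")
```

A snowflake or 3NF design would normalize further, for example splitting dim_product into separate product and category tables, trading simpler queries for reduced redundancy.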

Further, the OLAP cube (for “online analytical processing”) is a data structure that allows
fast analysis of data from multiple perspectives. The cube structure is created from either a
Star or Snowflake Schema stored as metadata, from which one can view or “pivot” the data in
various ways. Generally cubes have one time-based dimension that supports a historical
representation of the data. Creating OLAP cubes can be very expensive, and they often contain
a significant amount of data that is of little or no use. The 80/20 rule appears in many cases to
hold true (only 20% of the OLAP cube data proves useful), which begs the question: built upon
a traditional architecture, does an OLAP cube truly deliver sufficient ROI? Often, the answer is
a resounding NO! Durable DW/BI systems must deliver real value.
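
For a feel of what “pivoting” means, here is a tiny sketch in plain Python that aggregates one measure across two dimensions; the sample rows and column names are hypothetical, but the grouping is essentially what a cube precomputes at scale:

```python
from collections import defaultdict

# Hypothetical fact rows: (year, product_category, sales_amount)
facts = [
    (2013, "Bikes", 1200.0),
    (2013, "Helmets", 300.0),
    (2014, "Bikes", 1500.0),
    (2014, "Helmets", 450.0),
]

# "Pivot" the measure by two dimensions: rows = category, columns = year.
pivot = defaultdict(lambda: defaultdict(float))
for year, category, amount in facts:
    pivot[category][year] += amount

for category, by_year in sorted(pivot.items()):
    print(category, dict(by_year))
# Bikes {2013: 1200.0, 2014: 1500.0}
# Helmets {2013: 300.0, 2014: 450.0}
```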

** A Different Approach **

Data Vault Modeling is a hybrid methodology providing historical data storage from multiple
sources designed to be resilient to environmental changes. Dan Linstedt, the creator of this
methodology, describes the resulting database as:

“A detail oriented, historical tracking and uniquely linked set of normalized tables that
support one or more functional areas of business. It is a hybrid approach encompassing the
best of breed between 3NF and Star Schemas. The design is flexible, scalable, consistent and
adaptable to the needs of the
enterprise.” (http://en.wikipedia.org/wiki/Data_Vault_Modeling)

One Data Vault philosophy is that all data is relevant, even if it is wrong. Data being wrong is
a business problem, not a technical one. I agree! The simple premise of the Data Vault is to
ingest 100% of the source data 100% of the time: good, bad, or ugly. Auditability and
traceability of all the data in the data warehouse, so relevant in today’s world, thus become a
standard requirement. This data model is architected specifically to meet the needs of today’s
enterprise DW/BI systems:

“To understand the Data Vault is to understand the business”


Focused on the business process, the Data Vault, as a data integration architecture, has robust
standards and definitional methods that unite information in order to make sense of it. The
Data Vault model comprises three basic table types (see the sketch after this list):

HUB – contains a list of unique business keys and has its own surrogate key

LNK – establishes relationships between business keys (typically hubs, but links can also
link to other links)

SAT – holds descriptive attributes that can change over time (similar to a Kimball
Type II slowly changing dimension)
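
As a rough illustration only (the table and column names are mine, not a formal Data Vault standard), the three table types for a “customer places order” business process might look like this in Python’s sqlite3:

```python
import sqlite3

# Hypothetical Data Vault structures; naming and columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (             -- HUB: unique business keys
    customer_hkey  INTEGER PRIMARY KEY, -- surrogate key
    customer_bk    TEXT UNIQUE,         -- business key from the source
    load_date      TEXT,
    record_source  TEXT                 -- supports auditability/traceability
);
CREATE TABLE hub_order (
    order_hkey     INTEGER PRIMARY KEY,
    order_bk       TEXT UNIQUE,
    load_date      TEXT,
    record_source  TEXT
);
CREATE TABLE lnk_customer_order (       -- LNK: relationship between hubs
    link_hkey      INTEGER PRIMARY KEY,
    customer_hkey  INTEGER REFERENCES hub_customer(customer_hkey),
    order_hkey     INTEGER REFERENCES hub_order(order_hkey),
    load_date      TEXT,
    record_source  TEXT
);
CREATE TABLE sat_customer_details (     -- SAT: descriptive attributes over time
    customer_hkey  INTEGER REFERENCES hub_customer(customer_hkey),
    load_date      TEXT,                -- history, like a Type II dimension
    name           TEXT,
    city           TEXT,
    record_source  TEXT,
    PRIMARY KEY (customer_hkey, load_date)
);
""")
```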

There are several advantages to the Data Vault approach:

o Simplifies the data ingestion process


o Removes the cleansing requirement of a Star Schema
o Instantly provides auditability for HIPAA and other regulations
o Puts the focus on the real problem instead of programming around it
o Easily allows for the addition of new data sources without disruption to the
existing schema (see the sketch below)
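
To see why adding a source is non-disruptive, consider this hypothetical continuation of the sketch above: attributes arriving from a new CRM feed get their own satellite keyed to the existing hub, and no existing hub, link, or satellite is altered.

```python
import sqlite3

# Hypothetical: assume the hub/link/satellite tables above already live in
# dv.db. A new CRM feed only requires one additional satellite table.
conn = sqlite3.connect("dv.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS sat_customer_crm (
    customer_hkey  INTEGER,             -- keyed to hub_customer(customer_hkey)
    load_date      TEXT,
    loyalty_tier   TEXT,
    email          TEXT,
    record_source  TEXT,                -- e.g. 'CRM'
    PRIMARY KEY (customer_hkey, load_date)
);
""")
```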

Take a look at this short video showing the power of the Data Vault model. Visit
www.danlinstedt.com and http://learndatavault.com for much more on Data Vault modeling
and methodology.

In conclusion, Data Vault modeling and methodology address most elements of the
problems we identified above:

o It adapts to a changing business environment


o It simplifies the DW/BI design complexities
o It increases usability by business users because it is modeled after the business
domain
o It allows for new data sources to be added without impacting the existing
design

Very Cool!
As the architect of TrinityIDC, a cloud-based DV/BI solution, I can say that the more I use
Data Vault modeling and methodology, the more I love using it! Ask us at DB Best to
help you design yours!

Read more: http://www.dbbest.com/blog/the-data-vault-what-it-why-need-it/
