
DATA WAREHOUSE CONCEPTS

A variation on the staging structure is the addition of data marts to the data
warehouse. A data mart stores summarized data for a particular line of business,
making that data easily accessible for specific forms of analysis. For example,
adding data marts can let a financial analyst run detailed queries on sales data
to make predictions about customer behavior. Data marts make analysis easier by
tailoring data specifically to the needs of the end user.
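
In practice, a data mart is often just a summarized slice of the warehouse. Here is a minimal sketch in Python with pandas - the table and column names are invented for illustration - of how a finance-oriented mart might be derived from raw sales data:

```python
import pandas as pd

# Hypothetical sales data as it might sit in the warehouse.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["roses", "tulips", "roses", "tulips"],
    "revenue": [19.99, 12.50, 39.98, 25.00],
})

# A finance-focused "mart": revenue summarized by region, ready for
# the analyst's queries without scanning the full warehouse tables.
finance_mart = sales.groupby("region", as_index=False)["revenue"].sum()
print(finance_mart)
```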

Facts, Dimensions, and Measures


The core building blocks of information in a data warehouse are facts, dimensions,
and measures.

A fact is the part of your data that indicates a specific occurrence or transaction. For
example, if your business sells flowers, some facts you would see in your data
warehouse are:

• Sold 30 roses in-store for $19.99

• Ordered 500 new flower pots from China for $1500

• Paid this month's cashier salary of $1000
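
In code, each fact usually corresponds to one row or record. A minimal sketch - the class and field names below are invented, not a standard schema - of the first fact as a structured record:

```python
from dataclasses import dataclass

# Field names are invented for illustration.
@dataclass
class SaleFact:
    product: str
    quantity: int
    channel: str
    amount_usd: float

fact = SaleFact(product="roses", quantity=30, channel="in-store", amount_usd=19.99)
print(fact)
```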

Several numbers can describe each fact, and we call these numbers measures.
Some measures to describe the fact ‘ordered 500 new flower pots from China for
$1500’ are:

• Quantity ordered - 500

• Cost - $1500

When analysts are working with data, they perform calculations on measures (e.g.,
sum, maximum, average) to glean insights. For example, you may want to know the
average number of flower pots you order each month.
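
As a concrete sketch (pandas, with invented order data), computing the average number of flower pots ordered per month from a quantity measure:

```python
import pandas as pd

# Hypothetical purchase-order facts; the column names are invented.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2023-01-10", "2023-01-25", "2023-02-08"]),
    "item":       ["flower pot"] * 3,
    "quantity":   [500, 200, 350],
})

# Aggregate the 'quantity' measure: total pots per month, then take
# the average of those monthly totals.
monthly = orders.set_index("order_date").resample("MS")["quantity"].sum()
print("Average pots ordered per month:", monthly.mean())
```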

A dimension categorizes facts and measures and provides structured labeling
information for them - otherwise, they would just be a collection of unordered
numbers! Some dimensions to describe the fact ‘ordered 500 new flower pots from
China for $1500’ are:

• Country purchased from - China

• Time purchased - 1 pm

• Expected date of arrival - June 6th

You cannot perform calculations on dimensions explicitly, and doing so probably
would not be very helpful - how can you find the ‘average arrival date for orders’?
However, it is possible to create new measures from dimensions, and these are
useful. For example, if you know the average number of days between the order
date and arrival date, you can better plan stock purchases.
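
A minimal sketch (pandas, invented dates) of deriving that new measure - lead time in days - from two date dimensions:

```python
import pandas as pd

# Each order fact carries two date dimensions (invented data).
orders = pd.DataFrame({
    "order_date":   pd.to_datetime(["2023-05-20", "2023-06-01"]),
    "arrival_date": pd.to_datetime(["2023-06-06", "2023-06-15"]),
})

# Derive a new measure from the two dimensions: days between order
# and arrival. Unlike a raw date, this number can be averaged.
orders["lead_time_days"] = (orders["arrival_date"] - orders["order_date"]).dt.days
print("Average lead time:", orders["lead_time_days"].mean(), "days")
```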

Normalization and Denormalization


Normalization is the process of efficiently organizing data in a data warehouse (or
any other place that stores data). The main goals are to reduce data redundancy -
i.e., remove any duplicate data - and improve data integrity - i.e., improve the
accuracy of data. There are different levels of normalization and no consensus for
the ‘best’ method. However, all methods involve storing separate but related pieces
of information in different tables.

There are many benefits to normalization, such as:

• Faster searching and sorting on each table

• Simpler tables make data modification commands faster to write and execute

• Less redundant data means you save on disk space, and so you can collect
and store more data
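
To make this concrete, here is a minimal sketch (pandas, invented supplier data) of normalizing one wide table: repeated supplier details move into their own table, and orders reference them by key:

```python
import pandas as pd

# One wide table: supplier details repeat on every order row.
orders = pd.DataFrame({
    "order_id":         [1, 2, 3],
    "supplier_name":    ["PotsCo", "PotsCo", "BloomLtd"],
    "supplier_country": ["China", "China", "Netherlands"],
    "quantity":         [500, 200, 350],
})

# Normalized: each supplier is stored once in its own table...
suppliers = (orders[["supplier_name", "supplier_country"]]
             .drop_duplicates().reset_index(drop=True))
suppliers["supplier_id"] = suppliers.index

# ...and the orders table keeps only a key pointing at it.
orders_norm = (orders.merge(suppliers, on=["supplier_name", "supplier_country"])
                     [["order_id", "supplier_id", "quantity"]])
print(suppliers, orders_norm, sep="\n\n")
```
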
Denormalization is the process of deliberately adding redundant copies or groups of
data to already normalized data. It is not the same as un-normalized data.
Denormalization improves read performance and makes it much easier to
manipulate tables into forms you want. When analysts work with data warehouses,
they typically only perform reads on the data. Thus, denormalized data can save
them vast amounts of time and headaches.

Benefits of denormalization:

• Fewer tables minimize the need for table joins, which speeds up data analysts’
workflow and helps them discover more useful insights in the data

• Fewer tables simplify queries leading to fewer bugs
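
A minimal sketch (pandas, continuing the invented supplier example) of denormalizing: join once, store the wide table, and analysts query it with no joins at all:

```python
import pandas as pd

# Normalized tables: every query over both needs a join.
orders = pd.DataFrame({"order_id": [1, 2], "supplier_id": [0, 1],
                       "quantity": [500, 350]})
suppliers = pd.DataFrame({"supplier_id": [0, 1],
                          "country": ["China", "Netherlands"]})

# Denormalized: the join is done once up front and stored, so reads
# become simple filters and aggregations on a single wide table.
orders_wide = orders.merge(suppliers, on="supplier_id")
print(orders_wide[orders_wide["country"] == "China"])
```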

Data Models


It would be wildly inefficient to store all your data in one massive table. So, your data
warehouse contains many tables that you can join together to get specific
information. The main table is called a fact table, and dimension tables surround it.
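
A minimal sketch (pandas, invented names) of this layout - one fact table whose key columns point at the dimension tables that surround it:

```python
import pandas as pd

# Dimension tables: descriptive labels, stored once.
dim_product = pd.DataFrame({"product_id": [1, 2],
                            "name": ["roses", "tulips"]})
dim_store = pd.DataFrame({"store_id": [10], "city": ["Boston"]})

# Fact table: one row per sale, holding measures plus dimension keys.
fact_sales = pd.DataFrame({"product_id": [1, 2, 1],
                           "store_id":   [10, 10, 10],
                           "amount_usd": [19.99, 12.50, 39.98]})

# Join the fact table to its dimensions to answer a question such as
# "revenue per product per city".
report = (fact_sales.merge(dim_product, on="product_id")
                    .merge(dim_store, on="store_id")
                    .groupby(["city", "name"], as_index=False)["amount_usd"].sum())
print(report)
```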

The first step in designing a data warehouse is to build a conceptual data model that
defines the data you want and the high-level relationships between them.

OLAP vs. OLTP


Online transaction processing (OLTP) is characterized by short write transactions
that involve the front-end applications of an enterprise’s data architecture. OLTP
databases emphasize fast query processing and only deal with current data.
Businesses use these to capture information for business processes and provide
source data for the data warehouse.

Online analytical processing (OLAP) allows you to run complex read queries and
thus perform a detailed analysis of historical transactional data. OLAP systems help
to analyze the data in the data warehouse.
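
To illustrate the contrast, a minimal sketch using Python's built-in sqlite3 module (the table and data are invented): OLTP workloads look like many short writes, while OLAP workloads look like aggregate reads over history:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")

# OLTP-style work: short write transactions from front-end applications.
conn.execute("INSERT INTO sales (product, amount) VALUES (?, ?)", ("roses", 19.99))
conn.execute("INSERT INTO sales (product, amount) VALUES (?, ?)", ("tulips", 12.50))
conn.commit()

# OLAP-style work: a read query that scans the history and aggregates it.
for product, total in conn.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product"):
    print(product, total)
```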
