
Dimensional Modeling in Data Warehousing

Bachelor of Technology
Computer Science and Engineering
OCTOBER 2019

TISL/CSE/Term-Paper/Semester-7 2
TABLE OF CONTENTS

1. Abstract
2. Introduction
3. Body
i. E-R Modeling Versus Dimensional Modeling
ii. Elements of Dimensional Data Model
iii. Dimensional Modelling process
iv. Schemas of Multi-Dimensional Modeling
v. OLAP Operations
vi. Rules for Dimensional Modelling
vii. Benefits of dimensional modelling
4. Conclusion
5. References

ABSTRACT

The Data Warehouse (DW) is a collection of integrated, detailed, historical data
collected from different sources. A DW is used to collect data designed to support management
decision making. There are many approaches to designing a data warehouse in both the conceptual
and logical design phases. The conceptual design approaches are the dimensional fact model,
multidimensional E/R model, star-ER model and object-oriented multidimensional model. The
logical design approaches are the flat schema, star schema, fact constellation schema, galaxy schema
and snowflake schema. This paper focuses on Dimensional Modeling in the Data
Warehouse. Dimensional Modeling (DM) is one of the most popular techniques in data warehousing. In DM, a
model of tables and relations is used to optimize decision support query performance in relational
databases.

INTRODUCTION

In data modeling, Dimensional Modeling is used for OLAP application design, whereas ER Modeling is
used for OLTP application design.

MODELS

ENTITY-RELATIONSHIP MODELING (ER): Entity-relationship modeling is a logical design technique that
seeks to eliminate data redundancy. ER models show the relationships between data.
These models are difficult to read and understand unless the reader is trained in the modeling methodology. It is
also difficult to understand the business from viewing the ER model.

DIMENSIONAL MODELING: Dimensional modeling is the name of a logical design technique used for
data warehouses. Every dimensional model is composed of a “fact” table and a set of “dimension” tables.
Conformed fact and dimension elements are elements that conform to the enterprise's centralized metadata
database. For example, “store_id” would have a common definition and attributes across the enterprise and,
as such, would carry the same information across dimension tables.

E-R Modeling Versus Dimensional Modeling

Elements of Dimensional Data Model

Fact
Facts are the measurements or metrics from your business process. For a Sales business process, a
measurement would be the quarterly sales number.

Dimension
A dimension provides the context surrounding a business process event. In simple terms, dimensions give the who, what,
and where of a fact. In the Sales business process, for the fact quarterly sales number, the dimensions would be:
• Who – Customer Names
• Where – Location
• What – Product Name
In other words, a dimension is a window through which to view the information in the facts.

Attributes
Attributes are the various characteristics of a dimension. In the Location dimension, the
attributes can be:
• State
• Country
• Zipcode, etc.
Attributes are used to search, filter, or classify facts. Dimension tables contain attributes.

Fact Table
A fact table is the primary table in a dimensional model.
A fact table contains:
1. Measurements/facts
2. Foreign keys to the dimension tables

Dimension table
• A dimension table contains the dimensions of a fact.
• Dimension tables are joined to the fact table via a foreign key.
• Dimension tables are de-normalized tables.
• The dimension attributes are the various columns in a dimension table.
• Dimensions offer descriptive characteristics of the facts with the help of their attributes.
• There is no set limit on the number of dimensions.
• A dimension can also contain one or more hierarchical relationships.
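
The relationship between a fact table, a dimension table, and its attributes can be illustrated with a minimal Python sketch. The table names, column names, and values here are hypothetical and chosen only for illustration; a real warehouse would hold these as database tables.

```python
# Hypothetical sample data; table and column names are illustrative only.

# Dimension table: one row per location, keyed by a surrogate key.
# The columns (state, country, zipcode) are the dimension's attributes.
location_dim = {
    1: {"state": "Ontario", "country": "Canada", "zipcode": "M5H"},
    2: {"state": "British Columbia", "country": "Canada", "zipcode": "V6B"},
}

# Fact table: each row holds a foreign key into the dimension plus a measure.
sales_fact = [
    {"location_key": 1, "sales_amount": 1200.0},
    {"location_key": 2, "sales_amount": 800.0},
    {"location_key": 1, "sales_amount": 300.0},
]


def total_sales_by_attribute(attribute, value):
    """Sum the sales measure over facts whose location matches attribute == value."""
    return sum(
        row["sales_amount"]
        for row in sales_fact
        if location_dim[row["location_key"]][attribute] == value
    )


ontario_total = total_sales_by_attribute("state", "Ontario")
print(ontario_total)
```

The attribute is used only to filter the facts; the number being summed always comes from the fact table.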

Dimensional Modelling process

Steps of Dimensional Modelling

The dimensional data model is built on a star schema, with a fact table at the center surrounded by a
number of dimension tables. The following four-step process is commonly used in dimensional modeling
design:
1. Select the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts

Let’s examine each step in the modelling process in greater detail.

1. Select the business process to model – a business process is a set of daily activities performed in your
company and supported by an online transaction processing (OLTP) system or source system. In this step, we
gather the requirements from business users to select the business process or source of
measurement to model. Good examples of business processes are order processing, shipments,
materials purchasing, general ledger (GL), etc.
2. Declare the grain – after selecting a business process to model, we need to declare the grain of that
business process. Declaring the grain means describing exactly what a record in a fact table represents.
The grain expresses the level of detail associated with the facts in the fact table.
3. Identify the dimensions – in this step, we add the dimensions that represent all possible
descriptions that take on single values in the context of each fact in the fact table. Date, time,
product, customer, store, etc., are good examples of common dimensions.
4. Identify the facts – in the last step, we select the numeric facts that will be loaded into the fact table.
To identify the facts, we need to find the KPIs of the business process or find out what we are trying
to measure.
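
The four steps can be recorded as a small design note, and the declared grain can then be checked against candidate fact rows. The following is a minimal Python sketch under assumed names: the `design` dictionary, the sample rows, and the helper `grain_violations` are all hypothetical and not part of any standard tool.

```python
from collections import Counter

# Hypothetical design notes for a retail sales process (names are illustrative).
design = {
    "business_process": "order processing",       # step 1
    "grain": ("date", "product", "store"),        # step 2: one row per product per store per day
    "dimensions": ["date", "product", "store"],   # step 3
    "facts": ["quantity", "sales_amount"],        # step 4
}

# Hypothetical fact rows to be loaded.
fact_rows = [
    {"date": "2019-10-01", "product": "Mobile", "store": "S1", "quantity": 3, "sales_amount": 900.0},
    {"date": "2019-10-01", "product": "Modem", "store": "S1", "quantity": 1, "sales_amount": 80.0},
    {"date": "2019-10-02", "product": "Mobile", "store": "S2", "quantity": 2, "sales_amount": 600.0},
]


def grain_violations(rows, grain):
    """Return the grain-key combinations that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


violations = grain_violations(fact_rows, design["grain"])
print(violations)  # an empty list means no two rows share the same grain key
```

If two rows shared the same date, product, and store, the declared grain would not describe exactly what a record represents, and either the grain or the load would need to be revisited.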

Conceptual Modeling of Data Warehouses

Schemas of Multi-Dimensional Modeling:

• Star schema: A fact table in the middle connected to a set of dimension tables.
• Snowflake schema: A refinement of the star schema in which some dimensional
hierarchies are normalized into sets of smaller dimension tables, forming a shape
similar to a snowflake.
• Fact constellation: Multiple fact tables share dimension tables. The result can be viewed as a
collection of stars and is therefore called a galaxy schema or fact constellation.

STAR SCHEMA
It is the basic structure for a dimensional model: a fact table in the middle connected to a set of dimension
tables.
It contains:
• A large central table (the fact table)
• A set of smaller attendant tables (the dimension tables), one for each dimension
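
As a minimal sketch of a star schema, the following Python example builds one fact table and two dimension tables in an in-memory SQLite database using the standard `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_store`), the columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension, each with a surrogate key.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Central fact table: foreign keys to each dimension plus a numeric measure.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER
);

INSERT INTO dim_product VALUES (1, 'Mobile'), (2, 'Modem');
INSERT INTO dim_store   VALUES (1, 'Toronto'), (2, 'Vancouver');
INSERT INTO fact_sales  VALUES (1, 1, 10), (2, 1, 4), (1, 2, 7);
""")

# A typical star query: join the fact table to each dimension, then aggregate.
star_rows = conn.execute("""
    SELECT s.city, p.product_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store   s ON s.store_key   = f.store_key
    GROUP BY s.city, p.product_name
    ORDER BY s.city, p.product_name
""").fetchall()
print(star_rows)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.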

SNOWFLAKE SCHEMA

A refinement of the star schema in which some dimensional hierarchies are further split (normalized)
into sets of smaller dimension tables, forming a shape similar to a snowflake.

However, the snowflake structure can reduce the effectiveness of browsing, since more
joins will be needed.
The snowflake schema is in a sense an extension of the star schema; it differs from the star schema in
how it handles large dimension tables, which it splits into smaller normalized tables. A star schema,
by contrast, keeps a centralized design in which the fact table connects directly to each dimension table.
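
The extra join that a snowflake schema introduces can be seen in a minimal Python sketch using the standard `sqlite3` module. Here a hypothetical product dimension has its category level normalized into a separate table; all table names, columns, and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: the category level is split into its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER
);

INSERT INTO dim_category VALUES (1, 'Phones'), (2, 'Networking');
INSERT INTO dim_product  VALUES (1, 'Mobile', 1), (2, 'Modem', 2);
INSERT INTO fact_sales   VALUES (1, 10), (2, 4), (1, 7);
""")

# Reaching the category name now takes two joins instead of one.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_rows)
```

In a star schema the category name would simply be another column of `dim_product`, so the second join would not be needed.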

FACT CONSTELLATION SCHEMA


Multiple fact tables share dimension tables. The schema can be viewed as a collection of stars and is
therefore called a galaxy schema or fact constellation.
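
A minimal Python sketch of this idea, using the standard `sqlite3` module, has two fact tables that share one dimension table. The tables `fact_sales` and `fact_shipments`, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Two fact tables, each a separate star that references the same dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER
);
CREATE TABLE fact_shipments (
    product_key   INTEGER REFERENCES dim_product(product_key),
    units_shipped INTEGER
);

INSERT INTO dim_product    VALUES (1, 'Mobile'), (2, 'Modem');
INSERT INTO fact_sales     VALUES (1, 10), (2, 4), (1, 7);
INSERT INTO fact_shipments VALUES (1, 20), (2, 5);
""")

# Aggregate each fact table separately, then line the totals up by product.
constellation_rows = conn.execute("""
    SELECT p.product_name,
           (SELECT SUM(units_sold)    FROM fact_sales     WHERE product_key = p.product_key),
           (SELECT SUM(units_shipped) FROM fact_shipments WHERE product_key = p.product_key)
    FROM dim_product p
    ORDER BY p.product_name
""").fetchall()
print(constellation_rows)
```

Each fact table is summed on its own before the results are combined, which avoids double counting when the two fact tables have different numbers of rows per product.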

GALAXY SCHEMA
Galaxy schema is a schema where multiple fact tables share dimension tables. Unlike a fact
constellation schema, the fact tables in a galaxy do not need to be directly related [12]. The
following figure, Fig 5, illustrates a sample of a galaxy schema.

OLAP Operations

OLAP servers are based on a multidimensional view of data. The main OLAP operations are:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
• Roll-up performs aggregation on a data cube in either of the following ways:
    • By climbing up a concept hierarchy for a dimension
    • By dimension reduction
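
Both forms of roll-up can be sketched in Python on a tiny cube stored as a dictionary. The cell values and the city-to-country hierarchy are hypothetical, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Mobile"): 12,
    ("Vancouver", "Q1", "Mobile"): 7,
    ("Vancouver", "Q1", "Modem"): 3,
}

# Assumed concept hierarchy for the location dimension: city -> country.
city_to_country = {"Toronto": "Canada", "Vancouver": "Canada"}


def roll_up_location(cells):
    """Climb the location hierarchy: aggregate cities into countries."""
    out = defaultdict(int)
    for (city, quarter, item), units in cells.items():
        out[(city_to_country[city], quarter, item)] += units
    return dict(out)


def drop_item_dimension(cells):
    """Dimension reduction: aggregate away the item dimension."""
    out = defaultdict(int)
    for (city, quarter, _item), units in cells.items():
        out[(city, quarter)] += units
    return dict(out)


by_country = roll_up_location(cube)
by_city_quarter = drop_item_dimension(cube)
print(by_country)
print(by_city_quarter)
```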

Drill-down
• Drill-down is the reverse operation of roll-up. It is performed in either of the following
ways:
    • By stepping down a concept hierarchy for a dimension
    • By introducing a new dimension
• It navigates from less detailed data to more detailed data.
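
Stepping down a concept hierarchy can be sketched in Python as follows. The sketch assumes a time hierarchy of quarter and month, and the monthly figures are hypothetical; drill-down is only possible because the detailed data is kept.

```python
from collections import defaultdict

# Hypothetical detailed data: (quarter, month) -> units sold.
monthly_units = {
    ("Q1", "Jan"): 4,
    ("Q1", "Feb"): 5,
    ("Q1", "Mar"): 8,
    ("Q2", "Apr"): 6,
}


def quarterly_view(detail):
    """The summarized (rolled-up) view: total units per quarter."""
    out = defaultdict(int)
    for (quarter, _month), units in detail.items():
        out[quarter] += units
    return dict(out)


def drill_down(detail, quarter):
    """Step down the time hierarchy: show the months inside one quarter."""
    return {month: units for (q, month), units in detail.items() if q == quarter}


summary = quarterly_view(monthly_units)
q1_months = drill_down(monthly_units, "Q1")
print(summary)
print(q1_months)
```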

Slice
• The slice operation selects a single value along one dimension of a given cube and provides a new
sub-cube.
• For example, a slice can be performed on the dimension "time" using the criterion time = "Q1".
• The resulting sub-cube has one fewer dimension than the original cube.
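
A slice on time = "Q1" can be sketched in Python on a small dictionary-based cube. The cell values are hypothetical and the function name `slice_time` is illustrative only.

```python
# Hypothetical cube cells: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Mobile"): 12,
    ("Vancouver", "Q1", "Modem"): 3,
}


def slice_time(cells, quarter):
    """Fix time to one value and drop it, leaving a (location, item) sub-cube."""
    return {
        (location, item): units
        for (location, time, item), units in cells.items()
        if time == quarter
    }


q1_slice = slice_time(cube, "Q1")
print(q1_slice)
```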
Dice
• Dice applies selection criteria on two or more dimensions of a given cube and provides a new sub-cube.
• For example, a dice operation based on the following selection criteria involves three
dimensions:
(location = "Toronto" or "Vancouver") and (time = "Q1" or "Q2") and (item = "Mobile" or
"Modem")
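
The selection criteria above can be applied in a minimal Python sketch. The cube cells are hypothetical (including the extra city, quarter, and item used to show what gets filtered out), and the function name `dice` is illustrative only.

```python
# Hypothetical cube cells: (location, time, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q3", "Mobile"): 12,     # excluded: time is not Q1 or Q2
    ("Vancouver", "Q2", "Modem"): 3,
    ("Chicago", "Q1", "Modem"): 5,       # excluded: location not selected
    ("Vancouver", "Q1", "Phone"): 9,     # excluded: item not selected
}


def dice(cells, locations, times, items):
    """Keep only cells whose coordinates fall inside all three selections."""
    return {
        key: units
        for key, units in cells.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


diced = dice(
    cube,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
print(diced)
```

Unlike a slice, the diced sub-cube keeps all three dimensions; only the range of values on each one is narrowed.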

Pivot
• The pivot operation is also known as rotation. It rotates the data axes in view in order to
provide an alternative presentation of the data.
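
A pivot of a two-dimensional view can be sketched in Python by swapping the row and column axes. The view below (locations as rows, items as columns) and its values are hypothetical.

```python
# Hypothetical 2-D view: rows are locations, columns are items.
view = {
    "Toronto": {"Mobile": 10, "Modem": 4},
    "Vancouver": {"Mobile": 7, "Modem": 3},
}


def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


pivoted = pivot(view)
print(pivoted)
```

The data itself is unchanged; pivoting twice returns the original view.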

Rules for Dimensional Modelling
• Load atomic data into dimensional structures.
• Build dimensional models around business processes.
• Ensure that every fact table has an associated date dimension table.
• Ensure that all facts in a single fact table are at the same grain or level of detail.
• Store report labels and filter domain values in dimension tables.
• Ensure that dimension tables use a surrogate key.
• Continuously balance requirements and realities to deliver a business solution that supports
decision-making.
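
The surrogate key rule can be illustrated with a minimal Python sketch that assigns a sequential integer to each distinct natural key from a source system. The customer codes and the function name `assign_surrogate_keys` are hypothetical; real warehouses usually generate such keys in the database or the loading tool.

```python
def assign_surrogate_keys(natural_keys):
    """Map each distinct natural key to a sequential integer surrogate key."""
    mapping = {}
    for natural_key in natural_keys:
        if natural_key not in mapping:
            mapping[natural_key] = len(mapping) + 1
    return mapping


# Hypothetical customer codes arriving from a source system, with a repeat.
surrogate_keys = assign_surrogate_keys(["CUST-A17", "CUST-B02", "CUST-A17"])
print(surrogate_keys)
```

The fact table then stores the small integer key, so a change in the source system's code format does not ripple into the fact rows.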

Benefits of dimensional modelling

• The dimensional model has proved to be more understandable – in the dimensional model, data
is grouped into coherent dimensions that help business users analyze the data more easily.
• The dimensional model boosts query performance – the dimensional model is more
de-normalized and is therefore optimized for querying. In addition, the predictable framework
of a dimensional model allows the database engine to make strong assumptions about the data,
which helps it boost query performance.
• The dimensional model is extensible.

CONCLUSION

Data Modeling landscapes have evolved significantly in the past few years, from the traditional
relational model to now include non-relational models as well. The growth of Big Data and its
unstructured and semi-structured data formats, along with trends in Cloud Computing, Artificial
Intelligence, Data Lakes, Machine Learning, Blockchain and others, is pushing the need for more
advanced concepts and practices.

Such a monumental shift has caused Data Modeling to also advance. The fundamental changes in
data infrastructure and newer technology evolutions have together contributed to this Data
Management development.

Data Modeling is facing new challenges, as database design now includes not only traditional
relational databases but also newer NoSQL databases that handle large amounts of unstructured data.
Moreover, advanced database analysts now demand Predictive Models, which
are incorporated only via Artificial Intelligence or Machine Learning technologies.

Now, it is possible for DBAs to access large libraries of Machine Learning or Deep Learning Data
Models offered by third-party vendors. This was unheard of in the database world five or six years
ago. Modern databases are equipped to handle cognitive technologies and live data sources
provisioned through the Cloud.
