Dimensional Modeling in Data Warehousing: Bachelor of Technology Computer Science and Engineering
OCTOBER 2019
TISL/CSE/Term-Paper/Semester-7 2
TABLE OF CONTENTS
1. Abstract
2. Introduction
3. Body
i. E-R Modeling Versus Dimensional Modeling
ii. Elements of Dimensional Data Model
iii. Dimensional Modelling process
iv. Schemas of Multi-Dimensional Modeling
v. OLAP Operations
vi. Rules for Dimensional Modelling
vii. Benefits of dimensional modelling
4. Conclusion
5. References
ABSTRACT
A data warehouse (DW) is a collection of integrated, detailed, historical data collected from
different sources and organized to support management decision making. There are many approaches
to designing a data warehouse in both the conceptual and the logical design phases. The conceptual
design approaches include the dimensional fact model, the multidimensional E/R model, the star-ER
model and the object-oriented multidimensional model. The logical design approaches include the
flat schema, star schema, fact constellation schema, galaxy schema and snowflake schema. This
paper focuses on dimensional modeling in the data warehouse. Dimensional modeling (DM) is one of
the most widely used techniques in data warehousing. In DM, a model of tables and relations is
used to optimize decision-support query performance in relational databases.
INTRODUCTION
In data modeling, dimensional modeling is used to design OLAP applications, whereas E-R modeling is
used to design OLTP applications.
DIMENSIONAL MODELING: Dimensional modeling is the name of a logical design technique used for
data warehouses. Every dimensional model is composed of a “fact” table and a set of “dimension” tables.
Conformed fact and dimension elements are elements that conform to the enterprise's centralized metadata
database. For example, “store_id” would have a common definition and attributes across the enterprise and
as such would carry the same information in every dimension table that uses it.
E-R Modeling Versus Dimensional Modeling
Elements of Dimensional Data Model
Fact
Facts are the measurements or metrics of a business process. For a sales business process, a
measurement would be the quarterly sales number.
Dimension
A dimension provides the context surrounding a business process event. In simple terms, dimensions give the
who, what and where of a fact. In the sales business process, for the fact quarterly sales number, the dimensions would be:
Who – Customer Names
Where – Location
What – Product Name
In other words, a dimension is a window to view information in the facts.
Attributes
The attributes are the various characteristics of a dimension. In the Location dimension, the
attributes can be:
State
Country
Zip code, etc.
Attributes are used to search, filter, or classify facts. Dimension tables contain attributes.
Fact Table
A fact table is the primary table in a dimensional model.
A fact table contains:
1. Measurements/facts
2. Foreign keys to the dimension tables
Dimension table
A dimension table contains the dimensions of a fact.
Dimension tables are joined to the fact table via a foreign key.
Dimension tables are de-normalized tables.
The dimension attributes are the various columns in a dimension table.
Dimensions offer descriptive characteristics of the facts with the help of their attributes.
There is no set limit on the number of dimensions.
A dimension can also contain one or more hierarchical relationships.
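The relationship between a fact table and a de-normalized dimension table can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names (dim_location, fact_sales), the columns and the figures are assumptions made for the example and do not come from any particular warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# De-normalized dimension table: one row per location, attributes as columns.
conn.execute("""
    CREATE TABLE dim_location (
        location_key INTEGER PRIMARY KEY,
        zipcode      TEXT,
        state        TEXT,
        country      TEXT
    )
""")

# Fact table: a numeric measurement plus a foreign key to the dimension.
conn.execute("""
    CREATE TABLE fact_sales (
        location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
        sales_amount REAL NOT NULL
    )
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO dim_location VALUES (?, ?, ?, ?)",
    [(1, "10001", "New York", "USA"), (2, "M5H", "Ontario", "Canada")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 70.0)],
)

# A dimension attribute (country) is used to classify the facts.
sales_by_country = dict(conn.execute("""
    SELECT d.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_location AS d ON d.location_key = f.location_key
    GROUP BY d.country
"""))
print(sales_by_country)
```

The fact table holds only the key and the measurement; every descriptive column used for filtering or grouping lives in the dimension table.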
Dimensional Modelling Process
The dimensional data model is built on a star schema, with a fact table at the center surrounded by a
number of dimension tables. The following four-step process is commonly used in dimensional modeling
design:
1. Select the business process
2. Declare the grain
3. Identify the dimensions
4. Identify the facts
1. Select the business process to model: a business process is an activity performed routinely in the
organization and supported by an online transaction processing (OLTP) system or other source system.
In this step, we gather requirements from business users in order to select the business process, or
source of measurement, to model. Good examples of business processes are order processing, shipments,
materials purchasing, general ledger (GL), etc.
2. Declare the grain: once the business process is chosen, we declare its grain. Declaring the grain
means describing exactly what a single record in the fact table represents. The grain expresses the
level of detail associated with the facts in the fact table.
3. Identify the dimensions: in this step, we add the dimensions that represent all possible
descriptions that take on single values in the context of each fact in the fact table. Date, time,
product, customer and store are good examples of common dimensions.
4. Identify the facts: in the last step, we select the numeric facts that will be loaded into the fact table.
To identify the facts, we look for the KPIs of the business process, that is, what we are trying
to measure.
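Declaring the grain has a practical consequence that can be checked mechanically: no two fact rows should share the same combination of grain keys. The sketch below assumes a hypothetical grain of one row per product, per store, per day; the field names and the rows are invented for illustration.

```python
from collections import Counter

# Step 2: the declared grain (assumed here): one row per date, store and product.
GRAIN = ("date", "store", "product")

# Hypothetical fact rows; quantity and sales_amount are the numeric facts (step 4).
rows = [
    {"date": "2019-10-01", "store": "S1", "product": "P1", "quantity": 2, "sales_amount": 40.0},
    {"date": "2019-10-01", "store": "S1", "product": "P2", "quantity": 1, "sales_amount": 15.0},
    {"date": "2019-10-02", "store": "S1", "product": "P1", "quantity": 3, "sales_amount": 60.0},
]


def grain_violations(rows, grain):
    """Return the grain-key combinations that occur in more than one row."""
    counts = Counter(tuple(row[key] for key in grain) for row in rows)
    return [combo for combo, n in counts.items() if n > 1]


violations = grain_violations(rows, GRAIN)
print(violations)
```

An empty result means every row sits at the declared grain; a repeated combination would indicate that rows of a different level of detail have been mixed in.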
Conceptual Modeling of Data Warehouses
STAR SCHEMA
It is the basic structure of a dimensional model: a fact table in the middle connected to a set of dimension
tables.
It contains:
A large central table (fact table)
A set of smaller attendant tables (dimension tables), one for each dimension
SNOWFLAKE SCHEMA
A refinement of the star schema in which some dimensional hierarchies are further split (normalized)
into a set of smaller dimension tables, forming a shape similar to a snowflake.
However, the snowflake structure can reduce the effectiveness of browsing, since more
joins are needed.
The snowflake schema is an extension of the star schema; it differs from the star schema in how it
handles large dimension tables. A star schema keeps a centralized design in which the fact table
connects directly to each dimension table.
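The cost of the extra joins can be seen in a small sqlite3 sketch. Here a location dimension is snowflaked into a city table and a country table; all table names, columns and figures are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked location dimension: the country level is split into its own table.
    CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE dim_city (
        city_key    INTEGER PRIMARY KEY,
        city        TEXT,
        country_key INTEGER REFERENCES dim_country(country_key)
    );
    CREATE TABLE fact_sales (
        city_key     INTEGER REFERENCES dim_city(city_key),
        sales_amount REAL
    );
    INSERT INTO dim_country VALUES (1, 'Canada'), (2, 'USA');
    INSERT INTO dim_city VALUES (1, 'Toronto', 1), (2, 'Vancouver', 1), (3, 'Chicago', 2);
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 70.0);
""")

# Two joins are needed to reach the country attribute. A star schema that kept
# country as a column of a single location dimension would need only one.
sales_by_country = dict(conn.execute("""
    SELECT co.country, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_city    AS ci ON ci.city_key = f.city_key
    JOIN dim_country AS co ON co.country_key = ci.country_key
    GROUP BY co.country
"""))
print(sales_by_country)
```

The normalized form stores each country name once, at the price of a longer join path for every query that groups or filters by country.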
GALAXY SCHEMA
A galaxy schema is a schema in which multiple fact tables share dimension tables. Unlike a fact
constellation schema, the fact tables in a galaxy do not need to be directly related [12].
Fig. 5 illustrates a sample galaxy schema.
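A minimal sketch of two fact tables sharing one dimension table, again using sqlite3. The tables (fact_sales, fact_shipments, dim_product) and their contents are hypothetical. Each fact table is aggregated on its own before the results are combined through the shared dimension, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One shared dimension table.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    -- Two fact tables that both reference it.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        units_sold  INTEGER
    );
    CREATE TABLE fact_shipments (
        product_key   INTEGER REFERENCES dim_product(product_key),
        units_shipped INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Mobile'), (2, 'Modem');
    INSERT INTO fact_sales VALUES (1, 3), (1, 2), (2, 4);
    INSERT INTO fact_shipments VALUES (1, 5), (2, 1), (2, 2);
""")

# Aggregate each fact table separately, then line the totals up by product.
report = list(conn.execute("""
    SELECT p.product_name, s.units_sold, sh.units_shipped
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units_sold) AS units_sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(units_shipped) AS units_shipped
          FROM fact_shipments GROUP BY product_key) AS sh
      ON sh.product_key = p.product_key
    ORDER BY p.product_name
"""))
print(report)
```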
OLAP Operations
OLAP servers are based on a multidimensional view of data. The basic OLAP operations are:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in either of the following ways:
By climbing up a concept hierarchy for a dimension
By dimension reduction
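Both forms of roll-up can be sketched in plain Python on a tiny cube held as a dictionary. The cell values and the city-to-country hierarchy are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter) -> sales.
cube = {
    ("Toronto", "Q1"): 100,
    ("Vancouver", "Q1"): 50,
    ("Chicago", "Q1"): 70,
    ("Toronto", "Q2"): 120,
}

# Concept hierarchy for the location dimension: city -> country.
country_of = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

# Roll-up by climbing the hierarchy: city level -> country level.
by_country = defaultdict(int)
for (city, quarter), sales in cube.items():
    by_country[(country_of[city], quarter)] += sales

# Roll-up by dimension reduction: the location dimension is dropped entirely.
by_quarter = defaultdict(int)
for (city, quarter), sales in cube.items():
    by_quarter[quarter] += sales

print(dict(by_country))
print(dict(by_quarter))
```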
Drill-down
Drill-down is the reverse operation of roll-up. It is performed in either of the following
ways:
By stepping down a concept hierarchy for a dimension
By introducing a new dimension
It navigates from less detailed data to more detailed data.
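Drill-down is only possible when the detailed data has been kept. The sketch below assumes month-level figures (invented for the example) and steps down the time hierarchy from quarter to month.

```python
from collections import defaultdict

# Detailed data kept at the month level (hypothetical figures).
monthly = {
    ("Toronto", "Jan"): 30,
    ("Toronto", "Feb"): 30,
    ("Toronto", "Mar"): 40,
    ("Toronto", "Apr"): 60,
}

# Concept hierarchy for the time dimension: month -> quarter.
quarter_of = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# The summarized view a user starts from: sales per city per quarter.
quarterly = defaultdict(int)
for (city, month), sales in monthly.items():
    quarterly[(city, quarter_of[month])] += sales


def drill_down(city, quarter):
    """Step down the time hierarchy: return the months behind one quarter cell."""
    return {
        month: sales
        for (c, month), sales in monthly.items()
        if c == city and quarter_of[month] == quarter
    }


q1_detail = drill_down("Toronto", "Q1")
print(q1_detail)
```

The monthly values returned for a cell always add up to the quarterly value the user started from.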
Slice
The slice operation fixes a single value on one dimension of a given cube and produces a new
sub-cube over the remaining dimensions.
For example, a slice on the dimension "time" using the criterion time = "Q1" yields a sub-cube
that contains only the Q1 data.
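The slice on time = "Q1" can be sketched on a small three-dimensional cube. The cube contents are hypothetical; the key order (location, time, item) is an assumption of the example.

```python
# Hypothetical cube cells: (location, time, item) -> sales.
cube = {
    ("Toronto", "Q1", "Mobile"): 100,
    ("Toronto", "Q2", "Mobile"): 120,
    ("Vancouver", "Q1", "Modem"): 50,
    ("Chicago", "Q1", "Mobile"): 70,
}


def slice_time(cube, quarter):
    """Fix time to one value and return the sub-cube over (location, item)."""
    return {
        (location, item): sales
        for (location, time, item), sales in cube.items()
        if time == quarter
    }


q1 = slice_time(cube, "Q1")
print(q1)
```

The result has one dimension fewer than the original cube, because time is now constant.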
Dice
The dice operation applies selections on two or more dimensions of a given cube and produces a new sub-cube.
For example, the following selection criteria involve three dimensions:
(location = "Toronto" or "Vancouver") and (time = "Q1" or "Q2") and (item = "Mobile" or "Modem")
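The selection criteria above can be applied to a small cube as follows. The cube cells are hypothetical, and some are deliberately chosen to fall outside the criteria.

```python
# Hypothetical cube cells: (location, time, item) -> sales.
cube = {
    ("Toronto", "Q1", "Mobile"): 100,
    ("Toronto", "Q3", "Mobile"): 90,    # excluded: time is Q3
    ("Vancouver", "Q2", "Modem"): 50,
    ("Chicago", "Q1", "Mobile"): 70,    # excluded: location is Chicago
    ("Vancouver", "Q1", "Phone"): 20,   # excluded: item is Phone
}


def dice(cube, locations, times, items):
    """Keep the cells whose value on every dimension is in the allowed set."""
    return {
        key: sales
        for key, sales in cube.items()
        if key[0] in locations and key[1] in times and key[2] in items
    }


sub_cube = dice(
    cube,
    locations={"Toronto", "Vancouver"},
    times={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
print(sub_cube)
```

Unlike a slice, the sub-cube keeps all three dimensions; each is only restricted to a subset of its values.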
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in order to
provide an alternative presentation of data.
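A pivot does not change any values; it only changes which dimension is laid out as rows and which as columns. A minimal sketch on a two-dimensional cube with invented figures:

```python
# Hypothetical cube cells: (location, time) -> sales.
cube = {
    ("Toronto", "Q1"): 100,
    ("Toronto", "Q2"): 120,
    ("Vancouver", "Q1"): 50,
    ("Vancouver", "Q2"): 60,
}


def as_table(cells, row_axis):
    """Lay out 2-D cells as nested dicts, with the chosen axis (0 or 1) as rows."""
    table = {}
    for key, value in cells.items():
        row, column = key[row_axis], key[1 - row_axis]
        table.setdefault(row, {})[column] = value
    return table


by_location = as_table(cube, row_axis=0)  # rows: location, columns: time
by_time = as_table(cube, row_axis=1)      # pivoted: rows: time, columns: location
print(by_location)
print(by_time)
```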
Rules for Dimensional Modelling
Load atomic data into dimensional structures.
Build dimensional models around business processes.
Ensure that every fact table has an associated date dimension table.
Ensure that all facts in a single fact table are at the same grain or level of detail.
Store report labels and filter domain values in dimension tables.
Ensure that dimension tables use a surrogate key.
Continuously balance requirements and realities to deliver a business solution that supports
decision-making.
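The surrogate key rule can be illustrated with a short sketch: the warehouse assigns its own integer key to each dimension member instead of reusing the source system's identifier. The class name SurrogateKeyGenerator and the customer codes are hypothetical, and a real load process would normally persist this mapping rather than keep it in memory.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Map natural (source-system) keys to warehouse-assigned integer keys."""

    def __init__(self):
        self._next_key = count(1)
        self._assigned = {}

    def key_for(self, natural_key):
        # Assign a new surrogate key the first time a natural key is seen;
        # return the same surrogate key on every later lookup.
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._next_key)
        return self._assigned[natural_key]


keys = SurrogateKeyGenerator()
first = keys.key_for("CUST-0042")
second = keys.key_for("CUST-0007")
repeat = keys.key_for("CUST-0042")
print(first, second, repeat)
```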
Benefits of Dimensional Modelling
The dimensional model is more understandable: data is grouped into coherent dimensions,
which makes it easier for business users to analyze.
The dimensional model improves query performance: it is more de-normalized and is therefore
optimized for querying. In addition, its predictable framework allows the database engine to
make strong assumptions about the data, which helps the engine boost query performance.
The dimensional model is extensible.
CONCLUSION
The data modeling landscape has evolved significantly in the past few years, from the traditional
relational model to include non-relational models as well. The growth of Big Data and its
unstructured and semi-structured data formats, along with trends in cloud computing, artificial
intelligence, data lakes, machine learning, blockchain and others, is pushing the need for more
advanced concepts and practices.
This shift has caused data modeling to advance as well. Fundamental changes in data infrastructure
and newer technologies have together contributed to this development in data management.
Data modeling now faces new challenges, because database design includes not only traditional
relational databases but also newer NoSQL databases that handle large amounts of unstructured
data. Moreover, advanced database analysts now demand predictive models, which are incorporated
through artificial intelligence or machine learning technologies.
It is now possible for DBAs to access large libraries of machine learning or deep learning
models offered by third-party vendors. This was unheard of in the database world five or six years
ago. Modern databases are equipped to handle cognitive technologies and live data sources
provisioned through the cloud.