
Lecture 4

Data Warehouses

Data Warehouses - 2021/22


Lecture Goals

• Goal:
 Multidimensional model
 Logical model – facts, measures and dimensions
 What is a data cube?
 Drilling into data using dimensions
 OLAP operations
 What can we do with a data cube?
Multidimensional Data Model

Information systems

• Information systems fall into two major categories:
 those that support the execution of business processes
 those that support the analysis of business processes
• The principles of dimensional design have evolved as a direct response to the unique requirements of the latter
 The core of every dimensional model is a set of business metrics that captures how a process is evaluated, and a description of the context of every measurement.
Information systems
• OLTP systems are primarily data capture systems.
 On the other hand, OLAP systems are information delivery systems.
• A data warehouse is an information delivery system.
 It is not about technology, but about solving users’ problems and providing strategic information to the user.
 At design time we focus on what information the users need, not so much on how we are going to provide it.
 When you begin to collect requirements for your proposed data warehouse, your mindset will have to be different.
 You have to go from a data capture model to an information delivery model.
 In several ways, building a data warehouse is very different from building an operational system.
Information systems
• “Through measurement comes knowledge.”
 Heike Kamerlingh Onnes
Information systems

• OLAP users
 Manager, analyst, management, etc.
 The management of the organization is typically interested in aggregated data – specific numbers, like:
   Quantitative comparison of product sales values over the years in the same region
   What is the total value of sales of a given product in a specific store during the period
   How much money was spent on a specific product promotion in a specific industry and in a specific city
Information systems
• A data warehouse is an example of the journey that data takes, when combined with context, to become information.
 Prior to application of context it is just a collection of numbers and letters, bits and bytes.
 Yet information is still not enough to enable an organization to learn from and act based on what they have collected.
 The ability to use accurate data and timely information to objectively measure and, therefore, proactively manage outcomes and business processes demonstrates the value of a data warehouse.
Information systems

• End users usually can't fully describe what they want in a BI system:
 They can provide very important insight into how they think about the company:
   They can tell you what performance indicators are important to them
   Each department can determine how it measures success in that particular department.
   They can give insight into how they combine and use different information to make strategic decisions
Information systems
• For OLAP
 Data is organized around subjects
   Not geared towards a particular application; data in a data warehouse cuts across applications
 Data reflects states from the past
   Data is rather frozen for analysis
   States that can be measured by certain measures
   Measures can be further analysed – data crunching
 Data is maintained for years
   Broader time horizon
   Allows more generic analysis
Information systems

• Business processes
 operational activities performed by the organization
   such as taking an order, processing an insurance claim, enrolling students for class, or creating a snapshot of each account each month
 managers are interested in capturing performance metrics
 events generated
   which translate into facts
 process selection
   is important because it defines a specific design goal and allows you to declare grain, dimensions and facts.
   Kimball Group
Information systems
• Facts are concepts of key importance for the decision-making process
 Correspond to events dynamically occurring in a company
• The term fact represents a business measurement
 Imagine that you are standing in a store and watching the products being sold; unit quantity, sales amount, tax, etc. are recorded for each product in each sales transaction.
 These measurements are recorded when products are scanned at the checkout
Information systems
• Facts
 In an ER model a fact can be represented as an entity or an n-ary relationship
 are the measurements that result from a business process event and are almost always numeric.
 A fact corresponds to a physical observable event, and not to the demands of a particular report
 defined at a certain level of detail, called grain, for example
   a single product sold as part of a sales transaction
   a daily summary of sales of individual products within daily sales transactions
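The two example grains can be sketched in Python. The rows and field names below (`transaction_id`, `product`, `quantity`, `amount`) are made up for illustration; the point is only that the daily grain is a summary of the transaction grain, not a different set of events.

```python
from collections import defaultdict

# Hypothetical fact rows at the finest grain:
# one product line within one sales transaction.
transaction_grain = [
    {"date": "2021-10-01", "transaction_id": 1, "product": "milk", "quantity": 2, "amount": 3.0},
    {"date": "2021-10-01", "transaction_id": 2, "product": "milk", "quantity": 1, "amount": 1.5},
    {"date": "2021-10-01", "transaction_id": 2, "product": "cream", "quantity": 1, "amount": 2.0},
    {"date": "2021-10-02", "transaction_id": 3, "product": "milk", "quantity": 4, "amount": 6.0},
]


def to_daily_grain(rows):
    """Summarize transaction-level facts to one row per (date, product)."""
    totals = defaultdict(lambda: {"quantity": 0, "amount": 0.0})
    for row in rows:
        key = (row["date"], row["product"])
        totals[key]["quantity"] += row["quantity"]
        totals[key]["amount"] += row["amount"]
    return dict(totals)


daily_grain = to_daily_grain(transaction_grain)
```

Going from the transaction grain to the daily grain is always possible; the reverse is not, which is why the grain has to be declared before the facts are designed.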
Multi-dimensional Model

• Measurement is easy to discern
 whether by listening to people talk, or reading a report or a chart
• Every dimensional solution describes a process
 by capturing what is measured
 and the context in which the measurements are evaluated.
Multi-dimensional Model

• Consider the following business questions:
 What are gross margins by product category for January?
 What is the average account balance by education level?
 What is the return rate by supplier?
• Each focuses on a business process:
 sales, account management, return processing
• These process-centric questions do not focus on individual activities or transactions.
 To answer them, it is necessary to look at a group of transactions.
Multi-dimensional Model
• Consider the following business questions:
 What are gross margins by product category for January?
 What is the average account balance by education level?
 What is the return rate by supplier?

• Most importantly, each of these questions reveals something about how its respective business process is measured.
 The study of sales involves the measurement of gross margin.
 Financial institutions measure account balance.
 Purchasing managers watch the return quantities.
Multi-dimensional Model

• Without some kind of context, a measurement is meaningless.
 If you are told “sales were $10,000”:
   There is not much you can do with this information.
   Is that sales of a single product, or many products?
   Does it represent a single transaction, or the company’s total sales from conception to date?
Multi-dimensional Model

• Dimensions provide context for facts.
 To understand what “$10,000” means, you need more information.
   “Order dollars were $10,000 for electronic products in January 2009.”
   By adding dimensional context (a product category, a month, and a year) the fact has been made useful.
Multi-dimensional Model
• Consider the following business questions:
 What are gross margins by product category for January?
 What is the average account balance by education level?
 What is the return rate by supplier?

• Context, again, is revealed in business questions or reports.
 In the example questions:
   Gross margin is viewed in the context of product categories and time (the month of January).
   Average account balance is viewed in the context of education level.
   Return rate is viewed in the context of supplier.
Information systems

• Managers
 can tell you what measurement units are important for them
 think of the business in terms of business dimensions
Information systems

• If the users of your data warehouse think in terms of business dimensions for decision making
 you should also think in terms of business dimensions
Information systems

• The concept of business dimensions is fundamental to the multidimensional model.
• Business dimensions describe the business-specific objects within the model
 such as products, customers, regions, employees, …
Information systems

• Dimensions provide the “who, what, where, when, why, and how” context surrounding a business process event
 contain the descriptive attributes used by BI applications for filtering and grouping the facts
• Dimension tables are sometimes called the “soul” of the data warehouse
 they contain the entry points and descriptive labels that enable the DW/BI system to be leveraged for business analysis
Information systems
• Let us quickly review the business model of a large retail operation.
 If you just look at daily sales, you soon realize that the sales are interrelated to many business dimensions.
 The daily sales are meaningful only when they are related to the dates of the sales, the products, the distribution channels, the stores, the sales territories, the promotions, and a few more dimensions.
 Multidimensional views are inherently representative of any business model.
 Very few models are limited to three dimensions or fewer.
 For planning and making strategic decisions, managers and executives probe into business data through scenarios.
 For example, they compare actual sales against targets and against sales in prior periods.
 They examine the breakdown of sales by product, by store, by sales territory, by promotion, and so on.
Information systems

• A multidimensional data model is organized around a central theme
 ability to define data by subject matter, for example – sales
 delivers information about that theme – sales
   rather than the organisation’s current operations – handling of a particular sales order
 focuses on snapshots – measurements – of the operational data at given moments/events
Information systems
• SNAPSHOTS
 Data in the data warehouse is stored in units of "snapshots"
 The records in the data warehouse are
   created as of some moment in time
   and are in effect a snapshot taken as of that moment in time.
 In this regard the data in the data warehouse is fundamentally different from the data in an operational database environment.
   Data in an operational database environment can be updated.
   Since data in the data warehouse environment is snapshot data it cannot be updated.
Information systems
• What exactly is a process?
 A business process refers to a wide range of structured, often chained, activities or tasks conducted by people or equipment to produce a specific service or product for a particular user or consumer
• The entity-relationship model is used to describe information
 the entity-relationship model guides the database design of an operational system,
• The process model is used to describe business activity
 the process model guides design of the functional components.
• In short:
 process models involve functional decomposition
• DW works
 by capturing details about significant events or transactions, it constructs a record of activities
• DW focus is on:
 modelling the measurement of business processes
 supporting the analysis of business processes
 not supporting the execution of business processes – OLTP
Information systems
• Snapshots
 There are many different forms of taking snapshots
 EVENTS
   The most basic consideration of a snapshot is that the snapshot has been taken as a result of an event.
   The event may be triggered by a wide variety of occurrences:
     an occurrence of a transaction,
     the periodic passage of time,
     a threshold having been reached,
     an audit,
     a special request, etc.
Information systems
• A snapshot triggered by an event has four basic components:
 A key
   identifies the record and primary data
 A unit of time
   usually refers to the moment when the event occurred
 Primary data that relates directly to the key
 Secondary data captured as part of the snapshot process that has no direct relationship to the primary data or key
   incidental data that might be later used to support decisions
   extraneous information captured at the moment of snapshot
Information systems
• Example:
 A key
   identifies a sale of a product – SalesOrderDetailID
 A unit of time
   identifies when this sale happened – OrderDate
 Primary data that relates directly to the key
   identifies product, price, location, etc. – ProductName, LocationName
 Secondary data
   how much product is in stock, interest rate, weather, etc.
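A minimal sketch of such a snapshot record in Python, assuming a frozen dataclass is an acceptable stand-in for "snapshot data cannot be updated". The field names follow the example above; `unit_price` and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class SalesSnapshot:
    sales_order_detail_id: int   # key
    order_date: date             # unit of time
    product_name: str            # primary data
    location_name: str           # primary data
    unit_price: float            # primary data (hypothetical field)
    secondary: dict = field(default_factory=dict)  # incidental data, e.g. stock level, weather


snapshot = SalesSnapshot(
    sales_order_detail_id=1001,
    order_date=date(2021, 10, 1),
    product_name="milk",
    location_name="Store A",
    unit_price=1.5,
    secondary={"units_in_stock": 40, "weather": "rain"},
)
```

Trying to assign `snapshot.unit_price = 2.0` raises `dataclasses.FrozenInstanceError`; a correction would be recorded as a new snapshot rather than an update.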
Information systems

• Interaction with an analytic system takes place exclusively through queries that retrieve data about business processes;
 information is not created or modified.
 queries involve large numbers of transactions,
   in contrast to the operational system’s typical focus on individual transactions.
 specific questions asked are less predictable,
   and more likely to change over time.
 historic data will remain important to the analytic system
   long after its operational use has passed.
Information systems
• For effective analysis users must have easy methods of performing complex analysis along several business dimensions.
 They need an environment that presents a multidimensional view of data, providing the foundation for analytical processing through easy and flexible access to information.
• Decision makers must be able to analyze data along any number of dimensions, at any level of aggregation, with the capability of viewing results in a variety of ways.
 They must have the ability to drill down and roll up along the hierarchies of every dimension.
• Without a solid system for true multidimensional analysis, your data warehouse is incomplete.
Information systems

• The analysis does not stop with this single multidimensional query.
 The user continues to ask further questions
   comparisons to similar products,
   comparisons among territories,
   views of the results by rotating the presentation between columns and rows
Multidimensional data

• The principles of dimensional modelling address the unique requirements of analytic systems.
 A dimensional design is optimized for queries that may access large volumes of transactions, not just individual transactions.
 It is not burdened with supporting concurrent, high-performance updates.
 It supports the maintenance of historic data, even as the operational systems change or delete information.
Multidimensional data

• Differs from normalised data model in the degree of data normalisation
 Limited elasticity
   It is relatively easy to add new tables to a normalised model
 Higher risk of data inconsistency
   Data redundancy
   Storing calculated values
 Efficiency
 Intuitiveness
Multidimensional data
• Our mental model of a dataset changes the way we ask questions.
 the way we ask questions results from the particular mental model we use
 it’s easier to comprehend data if the data model is “compatible” with our mental model
• Mental model of a dataset
 whether we think of the data as a
   collection of rows of numbers that we can aggregate bottom-up,
   complete dataset that we can slice top-down to ask questions.
   etc.
• Data model affects performance

OLAP

• In this context, OLAP is a design paradigm
 Logical representation of data
 A way to search for information from a physical data store
 Focused on data aggregation
   aggregates information from multiple systems
   stores it in a multidimensional format
• Dimensional modeling is widely accepted as the preferred technique for presenting analytical data because it meets two simultaneous requirements:
 Deliver data that business users understand.
 Ensure fast query performance.
Data Model
• Logical – conceptual and abstract
 Does not specify the details of the physical implementation
 Defines the types of information needed
 Three objectives:
   Simplicity
   Completeness
   Efficiency
Multidimensional data

• At the beginning there were rows of data
 Data at row-level
   The numbers are there, listed, one by one
   Suitable for low-level, detailed workloads focusing on individual data points
   In a way a bottom-up perspective
 Dataset-level
   Most questions you want to ask involve many data points
     typically somehow grouped in different ways
     even for only moderately large datasets
   In a way a top-down approach to data
Multidimensional data

[Figure: three charts built up over successive slides: Order Amount by order ID, Order Amount by Day, and SUM of Order Amount by Day]
Multidimensional data

• The founding principle of dimensional design is disarmingly simple.
 Dimensional design supports analysis of a business process by modeling how it is measured.
• The dimensional model of a business process is made up of two components:
 measurements and their context
Managerial Perspective

• “There are typically a number of different dimensions from which a given pool of data can be analysed.”
• “This plural perspective, or Multidimensional Conceptual View appears to be the way most business persons naturally view their enterprise.”
 E.F. Codd, S.B. Codd and C.T. Salley, 1993
Managerial Perspective

• Data Warehouse users
 That translate directly to:
   Facts and measures:
     Sales amount, Sales value, Profit, etc.
   And dimensions:
     Product, Markets, Time, etc.
Multi-dimensional Model
• These two simple concepts, measurement and context, are the foundation of dimensional design.
• A multidimensional data model is organized around a central theme
 ability to define data by subject matter, for example – sales
 delivers information about a theme, rather than the organisation’s current operations
• Every dimensional solution describes a process by capturing what is measured and the context in which the measurements are evaluated.
 measurements are called facts
 context descriptors are called dimensions
Multi-dimensional Model
• Facts and measures
 Phenomena we are focused on (Facts), and the numerical values that describe the phenomena (Measures)
   E.g. each sales record is a fact, and its sales value is a measure
 central element of the dimensional model and, further, of the data warehouse
• Dimensions
 Perspectives on the phenomena
 Group correlated attributes into the same dimension to ease analysis tasks
   E.g. each sales record is associated with its values of Product, Store, Time
• In general
 if an attribute is commonly aggregated or summarized, it is a fact
 if an attribute is used to drive aggregations or summarizations, it is a dimension.
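The rule of thumb can be illustrated with a small Python sketch. The sales rows and the helper `aggregate` are hypothetical; the attribute that gets summed (`sales_value`) plays the role of the fact, and the attributes that form the grouping key (`product`, `store`, `month`) play the role of dimensions.

```python
from collections import defaultdict

# Hypothetical sales records: three dimension attributes and one measure.
sales = [
    {"product": "milk", "store": "S1", "month": "2021-01", "sales_value": 10.0},
    {"product": "milk", "store": "S2", "month": "2021-01", "sales_value": 4.0},
    {"product": "cream", "store": "S1", "month": "2021-02", "sales_value": 6.0},
    {"product": "milk", "store": "S1", "month": "2021-02", "sales_value": 5.0},
]


def aggregate(rows, dimensions, measure):
    """Sum `measure` for each combination of values of `dimensions`."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)  # dimensions drive the grouping
        totals[key] += row[measure]              # the measure is what gets aggregated
    return dict(totals)


by_product = aggregate(sales, ["product"], "sales_value")
by_store_month = aggregate(sales, ["store", "month"], "sales_value")
```

Swapping the roles (grouping by `sales_value` and "summing" `product`) makes no sense, which is the practical test the rule describes.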
Facts and measures

• Facts
 describe business processes, certain business events
   E.g. single game at a slot machine
• Measures
 quantitative context
   E.g. win value, time spent
 high volume of data
Dimensions

• Dimensions
 provide the “who, what, where, when, why, and how” context
   surrounding the subject – business process event
 contain the descriptive attributes
   used by BI applications for filtering and grouping of the facts
• Examples:
 Customer, Product, Time, Store, ...
• If the element values are used to
 filter queries, order data, control aggregation, or drive master–detail relationships, it is most likely a dimension.
Facts and Dimensions
• Fact:
 A fact is something that has happened or been measured.
   Like the sale of a single product to one customer or the total amount of sales of a particular item in a month.
 a fact is a numerical value
 usually associated with several dimensions
• Dimension:
 A dimension is the main analytical object in a BI space.
   can be a list of products or customers, time, geographic location, or any other entity that is used to analyze numeric data.
 Dimensions are related to facts.
   The reason for their existence is the addition of qualitative information to the numerical information contained in the facts.
   Sometimes a dimension may relate to other dimensions, but directly or indirectly it will always be somehow related to the facts.
Facts

[Figure: three diagrams of OLTP transactions (tran1, tran2, …) feeding the DW: as individual events, as a status, and as successive states s1, s2, s3 of one event]
Multi-dimensional Model

• Query
 Show me sales, profit and average call volume per day for my 10 most profitable salespeople
• Components
 Facts
   a focus of interest for decision-making - subject
 Measures
   attributes that describe facts from different points of view - value
 Dimensions
   discrete attributes which determine the granularity adopted to represent facts - perspective
OLAP

• Example of OLAP needs:
 Comparisons (this period vs. last period)
   Show me the sales per store for this year and compare it to that of the previous year to identify discrepancies
 Ranking and statistical profiles (top N/bottom N)
   Show me sales, profit and average call volume per day for my 10 most profitable salespeople
 Custom consolidation (market segments, ad hoc groups)
   Show me an abbreviated income statement by quarter for the last four quarters for my northeast region operations
Managerial Perspective

• Consider an example of a car rental company
 Facts ?
 Measures ?
 Dimensions ?
• Consider an example of a medical center
 Facts ?
 Measures ?
 Dimensions ?
DATA as a cube

Data as a cube
• Analysis of sales units along the three business dimensions of
 product, time, and geography
• Three dimensions form a collection of cubes
 In each of the small dimensional cubes
   you find the sales units for that particular slice of time, product, and geographical division
Data as a cube

• Logical
 data as a multidimensional cube
• Facts and dimensions form a cube
 Axes represent dimensions
 Points represent measures
Data as a cube

• Perspective
 It’s not the bottom-up perspective of rows
   The dataset is treated as a whole
   We do not really care about the individual rows.
 Dataset perspective
   Asking questions means breaking the data up in many different ways.
   This becomes more natural as the dataset gets larger and more complex.
Data as a cube

• Multidimensional data
 viewed as cubes:
   e.g. sales cube
   e.g. Olympic standings cube
 identified data
   can be modelled and viewed in multiple dimensions
Data as a cube

• Multi-dimensional data
 Each measure’s value is accessed
   Using a coordinate system composed of a set of dimensions
   Particular attribute values of the dimensions define a single point in the cube
 Example
   Q1, Vancouver, Computer → 825
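The coordinate idea can be sketched as a Python dictionary keyed by (time, location, item) tuples. Only the cell (Q1, Vancouver, Computer) → 825 comes from the slide; the other cells, and the helper names `cell` and `slice_cube`, are assumptions made for illustration.

```python
# A tiny cube: each key is one coordinate per dimension, each value is the measure.
cube = {
    ("Q1", "Vancouver", "Computer"): 825,   # the cell from the example
    ("Q1", "Vancouver", "Phone"): 14,       # hypothetical
    ("Q1", "Toronto", "Computer"): 968,     # hypothetical
    ("Q2", "Vancouver", "Computer"): 952,   # hypothetical
}


def cell(cube, time, location, item):
    """Look up the measure stored at a single point of the cube (None if empty)."""
    return cube.get((time, location, item))


def slice_cube(cube, axis, value):
    """Fix one dimension (by position) to `value` and return the remaining sub-cube."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


q1_only = slice_cube(cube, 0, "Q1")  # a 2-D view: (location, item) -> measure
```

A sparse dictionary is only one possible representation; it is used here because most cells of a real cube are empty.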
Data as a cube
Location: South

Category    2012   2013   2014
Laptops     1250   1016   1023
Displays     506    619    602
Printers     720    531    646

[Figure: the table shown as one face of a cube; the other axes are labelled Category and Location]
Data as a Cube

• Multidimensional
 many dimensions, many axes

Data as a cube

• Multi-dimensional data
 Regional manager perspective
 Product manager perspective
 Financial manager perspective
[Figures: one view of the cube per perspective, with the Product and Region dimensions marked]
Data as a cube

• Contrary to what the name suggests
 a data cube / OLAP cube / hypercube is not necessarily a strict mathematical cube
   as its sides are not always equal.
Data as a cube

• Multi-dimensional data
 Multiple measures
   A point in a multi-dimensional space can represent several values
   Add virtual dimensions representing measures
Data as a cube

• Aggregations
 OLAP queries also require the ability to aggregate data over different dimensions
 We need to have totals
Data as a cube

• Aggregations
 Many dimensions = many aggregates
 We need to calculate many different totals
Data as a cube

• Left to right
 How fine-grained data turns into a cube.
• Right to left
 How the cube containing all the data in one value is broken down:
   first by time, then by product, then by location.
Data as a cube

• How do you ask questions of such a dataset?
 You specify how you want it broken down and how to aggregate the measures.
• Example
 Dataset of incomes for many different job titles, ages, genders, education levels, etc.
 Ask about the average income for men vs. women
• It’s the dataset perspective
• Process:
 Split the dataset along the gender dimension – into two groups.
 Calculate the averages for each gender.
Data as a cube
• This way, you can quickly ask many different questions:
 What’s the average by education level?
 How about gender and education level?
 What about age?
 Etc.
• It’s all just a matter of asking the question.
• Different aggregations in each slice are possible
 average, median, minimum, maximum, etc.
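The "split, then aggregate" process can be sketched in a few lines of Python. The income rows and the helper `break_down` are hypothetical; the aggregation function is passed in, so the same breakdown can be asked with an average, a median, a maximum, and so on.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical income dataset with two dimensions and one measure.
people = [
    {"gender": "F", "education": "MSc", "income": 5200},
    {"gender": "F", "education": "BSc", "income": 4100},
    {"gender": "M", "education": "MSc", "income": 5000},
    {"gender": "M", "education": "BSc", "income": 4300},
    {"gender": "M", "education": "BSc", "income": 3900},
]


def break_down(rows, dimensions, measure, aggregate):
    """Split `rows` along `dimensions`, then apply `aggregate` to each group's measure values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in dimensions)].append(row[measure])
    return {key: aggregate(values) for key, values in groups.items()}


avg_by_gender = break_down(people, ["gender"], "income", mean)
max_by_gender_education = break_down(people, ["gender", "education"], "income", max)
median_overall = break_down(people, [], "income", median)  # no split: one group with key ()
```

Each new question changes only the arguments (which dimensions, which aggregation), not the code, which is what "just a matter of asking the question" amounts to.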
Multidimensional Data

Dimensions

• Dimensions are rarely flat
 When you look at a report, for example, you might decide you want to know more.
   analysis as the process of “drilling into data”
   “drill down” - a summarized view is replaced with a more detailed view
 This interactive exploration of facts characterizes much of the interaction users have with the data warehouse or data mart.
Dimensions

• Attribute hierarchies offer a natural way to organize facts at successively deeper levels of detail.
 Users understand them intuitively, and drilling through a hierarchy may closely match the way many users prefer to break down key business measurements.
Dimensions
• Many dimensions can be understood as a hierarchy of attributes, participating in successive master–detail relationships:
 the bottom of such a hierarchy represents the lowest level of detail
   Highest granularity of data (finest grain)
 the top represents the highest level of summarization
   Lowest granularity of data (coarsest grain)
 each level may have a set of attributes and participates in a parent–child relationship with the level beneath it.
Dimensions
• Dimensions are rarely flat
 Many dimensions can be understood as a hierarchy of attributes, participating in successive master–detail relationships.
   the bottom of such a hierarchy represents the lowest level of detail
   the top represents the highest level of summarization
 Each level may have a set of attributes and participates in a parent–child relationship with the level beneath it.
   Days make up months, months fall into quarters, etc.
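The day → month → quarter hierarchy can be sketched as a chain of parent functions, with a roll-up that moves a measure one level up at a time. The daily figures and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical measure values at the lowest level of the Time hierarchy (days).
daily_sales = {
    date(2021, 1, 15): 100.0,
    date(2021, 1, 20): 50.0,
    date(2021, 2, 3): 70.0,
    date(2021, 4, 1): 30.0,
}


def month_of(day):
    """Parent of a day: its (year, month)."""
    return (day.year, day.month)


def quarter_of(month):
    """Parent of a (year, month): its (year, quarter)."""
    year, month_number = month
    return (year, (month_number - 1) // 3 + 1)


def roll_up(values, parent_of):
    """Sum the values of all children under each parent member."""
    totals = defaultdict(float)
    for member, value in values.items():
        totals[parent_of(member)] += value
    return dict(totals)


monthly_sales = roll_up(daily_sales, month_of)
quarterly_sales = roll_up(monthly_sales, quarter_of)
```

Because every day has exactly one month and every month exactly one quarter, the quarterly totals are the same whether they are computed from days or from months.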
Dimensions

• Dimension hierarchies
 are organized in classification levels
   e.g., Day, Month, …
 dependencies between the classification levels
   are described by the classification schema through functional dependencies
Dimensions
• In general - partially ordered set of dimensional attributes
 $(\{D_1, \ldots, D_n, \mathrm{TopD}\}; \rightarrow)$
 $\rightarrow$ denotes functional dependency
   attribute A determines B ($A \rightarrow B$), if the value of B is uniquely determined by the value of A
 TopD is the maximum element regarding $\rightarrow$
   $\forall i \in \{1, \ldots, n\}: D_i \rightarrow \mathrm{TopD}$
 Partial ordering allows for parallel hierarchies
 A fully ordered set of classification levels is called a Path
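Whether one attribute functionally determines another can be checked against data. The sketch below assumes a small, made-up product dimension and a hypothetical helper `determines`; the constant column `top` stands in for TopD, which every attribute determines trivially.

```python
# Hypothetical product dimension rows; "top" is the single member of the TopD level.
products = [
    {"name": "Cola 0.5l", "brand": "FizzCo", "category": "Drinks", "top": "ALL"},
    {"name": "Cola 1l", "brand": "FizzCo", "category": "Drinks", "top": "ALL"},
    {"name": "Lemonade", "brand": "Zest", "category": "Drinks", "top": "ALL"},
    {"name": "Crisps", "brand": "Crunchy", "category": "Snacks", "top": "ALL"},
]


def determines(rows, a, b):
    """Return True if, in `rows`, each value of attribute `a` maps to exactly one value of `b`."""
    seen = {}
    for row in rows:
        # setdefault stores the first b-value seen for this a-value and returns it afterwards
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


name_to_brand = determines(products, "name", "brand")          # holds in this data
brand_to_category = determines(products, "brand", "category")  # holds in this data
category_to_brand = determines(products, "category", "brand")  # fails: Drinks has two brands
```

A check like this can only refute a dependency; that name → brand → category is a valid path is a modelling decision, not something the data can prove.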

Dimensions

• The highest level of summarization, all products, is added for convenience.
 It represents a complete summarization of the product dimension
 studying a fact by all products results in a single row of data
Dimensions

• Example:
 classification hierarchy path from product dimension
 fully-ordered set of classification levels is called a Path

Dimensions

• Some software tools link the concept of drilling to the concept of an attribute hierarchy.
 hierarchy as a predefined drill path
 sometimes referred to as drilling within an attribute hierarchy
• When viewing a fact:
 drilling down is accomplished by adding a dimension attribute from the next level down the hierarchy
 drilling up is achieved by removing a dimension attribute
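Drilling within a hierarchy can be sketched as changing how many levels of a predefined drill path are used as the grouping key. The hierarchy category → brand → name, the sales rows, and the helper names are assumptions for illustration.

```python
from collections import defaultdict

# Predefined drill path, from most summarized to most detailed.
HIERARCHY = ["category", "brand", "name"]

# Hypothetical sales facts already joined with product attributes.
sales = [
    {"category": "Drinks", "brand": "FizzCo", "name": "Cola 0.5l", "amount": 10.0},
    {"category": "Drinks", "brand": "FizzCo", "name": "Cola 1l", "amount": 15.0},
    {"category": "Drinks", "brand": "Zest", "name": "Lemonade", "amount": 5.0},
    {"category": "Snacks", "brand": "Crunchy", "name": "Crisps", "amount": 8.0},
]


def view(rows, depth):
    """Total amount grouped by the first `depth` attributes of the drill path."""
    attributes = HIERARCHY[:depth]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[a] for a in attributes)] += row["amount"]
    return dict(totals)


def drill_down(depth):
    """Add the attribute from the next level down (no effect at the bottom)."""
    return min(depth + 1, len(HIERARCHY))


def drill_up(depth):
    """Remove the lowest attribute currently in the view (no effect at the top)."""
    return max(depth - 1, 0)


depth = 1
by_category = view(sales, depth)        # summarized view
depth = drill_down(depth)
by_category_brand = view(sales, depth)  # more detailed view
depth = drill_up(depth)                 # back to the category level
```

With `depth` equal to 0 the view collapses to a single row, the "all products" level.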
Dimensions

• Dimension hierarchies
 A concept hierarchy defines a sequence of mappings from a set
of low-level concepts to higher-level, more general concepts.

Dimensions

• Dimension hierarchies
 may also be defined by discretizing or grouping values for a given
dimension or attribute, resulting in a set-grouping hierarchy
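A set-grouping hierarchy can be sketched by discretizing a numeric attribute into named bands. The band boundaries (10 and 100), the band names, and the prices below are arbitrary choices made for illustration.

```python
from collections import Counter


def price_band(price):
    """Map a price to a coarser, named range (one level up in the set-grouping hierarchy)."""
    if price < 10:
        return "low"
    if price < 100:
        return "medium"
    return "high"


# Hypothetical low-level values.
prices = {"pen": 2.5, "book": 18.0, "lamp": 45.0, "desk": 240.0}

band_of = {item: price_band(price) for item, price in prices.items()}
items_per_band = Counter(band_of.values())
```

Unlike day → month, this hierarchy is not inherent in the data; the bands are defined by the analyst and can be redefined without touching the facts.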

Using hierarchies

Dimensions

• Attribute hierarchies describe relationships among dimension attributes
 E.g. products fall within brands; brands within categories
 these rules can be inferred without referring to actual data
• Another form of hierarchy may exist among instances of dimensions
 E.g. employees report to other employees
 E.g. companies own other companies
• The relationship is recursive
 there may be any number of levels in the hierarchy
Dimensions

• Instance hierarchy may be useful in studying facts
 suppose each call has an assigned employee
 employees have managers that they report to
 It may be useful to roll up all transactions to regional managers level
 explore the data by drilling down through multiple levels
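Because an instance hierarchy is recursive, rolling facts up to a manager means walking the reporting tree to whatever depth it happens to have. The employees, the `reports_to` mapping, and the call counts below are hypothetical.

```python
# Hypothetical reporting structure: employee -> manager (None for the top of the tree).
reports_to = {"ann": None, "bob": "ann", "cid": "bob", "dee": "ann"}

# Hypothetical facts: number of calls assigned directly to each employee.
calls = {"ann": 1, "bob": 2, "cid": 5, "dee": 3}


def rolled_up_calls(employee):
    """Calls handled by `employee` plus everyone below them in the reporting hierarchy."""
    total = calls.get(employee, 0)
    for subordinate, manager in reports_to.items():
        if manager == employee:
            total += rolled_up_calls(subordinate)  # recurse: depth is not fixed
    return total


calls_under_ann = rolled_up_calls("ann")
calls_under_bob = rolled_up_calls("bob")
```

The recursion assumes the reporting structure has no cycles; an attribute hierarchy with a fixed number of levels would not need recursion at all.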

Measures
• A measure has two components
 Numerical value (e.g. sales price)
 Aggregation formula (e.g. SUM): used for combining a number of measure values into one

• A measure value is determined by a combination of dimension values
 The measure value should be meaningful at all aggregation levels

• Aggregates
 together with measure values we store summarizing information
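
One way to picture the two components of a measure is a name paired with its aggregation formula. The sketch below is only illustrative: the `Measure` class and the `temperature` measure (aggregated with MAX rather than SUM) are assumptions, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Measure:
    """A measure: a named numerical value plus the formula used to aggregate it."""
    name: str
    aggregate: Callable[[Iterable[float]], float]


sales_price = Measure("sales_price", sum)   # additive measure, aggregated with SUM
temperature = Measure("temperature", max)   # hypothetical measure where SUM is not meaningful

values = [10.0, 20.0, 5.0]
total = sales_price.aggregate(values)
peak = temperature.aggregate(values)
print(total, peak)
```

Keeping the formula next to the value makes it harder to apply an aggregation that is not meaningful for that measure.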
Multi-dimensional data

• Data cube
 Lattice of cuboids

Multi-dimensional data
• The lowest level of summarization is called the base cuboid
 In general it is N-D, where N is the number of dimensions
 E.g. the base cuboid for the given time, item, location, and supplier dimensions is a 4-D cuboid

• M-D cuboid
 M dimensions, with the remaining N-M dimensions summarized
 E.g. the 3-D cuboid for time, item, and location is summarized over all suppliers

• The 0-D cuboid
 holds the highest level of summarization
 called the apex cuboid
 E.g. total sales summarized over all four dimensions
 typically denoted by all
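
The lattice of cuboids for the four dimensions above can be enumerated with a short sketch. It assumes each dimension is either kept or fully summarized (hierarchy levels within a dimension are ignored), which gives 2^N cuboids; the function name `cuboids` is hypothetical.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")


def cuboids(dims):
    """Yield every cuboid as a tuple of kept dimensions, from the base (N-D) to the apex (0-D)."""
    for m in range(len(dims), -1, -1):
        for kept in combinations(dims, m):
            yield kept


lattice = list(cuboids(dimensions))
base_cuboid = lattice[0]    # all four dimensions kept
apex_cuboid = lattice[-1]   # empty tuple: everything summarized ("all")
print(len(lattice), base_cuboid, apex_cuboid)
```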
Multi-dimensional data

• 2 dimensions
 location
   country
   state
   city
 topic
   topic
   cat
   subcat
Multi-dimensional data

• Navigation paths
 All: ALL
 Category: CD, DVD
 Brand: TOS, BOS, TOS
 Name: CD1, CD2, DVD1

• GROUP BY
 <>
 <Name>
 <Brand>
 <Category>
 <Name, Category>
 <Name, Brand>
 <Brand, Category>
 <Name, Brand, Category>
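
The eight GROUP BY sets listed above can be computed with a small sketch. The pairing of names with brands and categories is read off the slide's example values and is an assumption, as are the sales figures and the helper name `group_by`.

```python
from collections import defaultdict
from itertools import combinations

# Rows: (name, brand, category, sales). Pairings and sales values are assumed.
rows = [
    ("CD1", "TOS", "CD", 10),
    ("CD2", "BOS", "CD", 20),
    ("DVD1", "TOS", "DVD", 30),
]
columns = {"Name": 0, "Brand": 1, "Category": 2}


def group_by(attrs):
    """Sum sales for one GROUP BY set, given as a tuple of attribute names."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[columns[a]] for a in attrs)
        totals[key] += row[3]
    return dict(totals)


# One aggregate table per grouping set, from <> up to <Name, Brand, Category>.
grouping_sets = [
    attrs
    for size in range(len(columns) + 1)
    for attrs in combinations(columns, size)
]
results = {attrs: group_by(attrs) for attrs in grouping_sets}
print(results[("Brand",)])
```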
Multidimensional Data

Dimensions
• Dimensions
 are used for
   Selection of data
   Grouping of data at the right level of detail
 consist of dimension values
   Product dimension has values "milk", "cream", …
   Time dimension has values "1/1/2001", "2/1/2001", …
 values may have an ordering
   Used for comparing cube data across values
   Example: "percent sales increase compared with last month"
   Especially used for the Time dimension

• Dimensions are sometimes called the "soul" of the data warehouse
 they contain the entry points and descriptive labels that enable the DW/BI system to be leveraged for business analysis
Facts

• Facts
 A fact is an element of the multi-dimensional space
   Associates a set of dimension elements with measures
   a "cube cell"
 Granularity relates to dimensions
   A fact is uniquely identified through a combination of dimension elements
   Granularity should be defined depending on user requirements and data availability
   Single granularity for a fact table
 Qualifying and quantifying information
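
A fact as a "cube cell" can be sketched as a mapping from a combination of dimension elements to its measures. The dimension order (product, date, store), the store name, the measure names and all values below are hypothetical.

```python
# A cube as a mapping from dimension coordinates to measures.
# Dimension order (product, date, store) and all values are hypothetical.
cube = {
    ("milk", "1/1/2001", "store A"): {"sales": 12.0, "units": 8},
    ("cream", "1/1/2001", "store A"): {"sales": 5.0, "units": 2},
    ("milk", "2/1/2001", "store A"): {"sales": 9.0, "units": 6},
}


def add_fact(cube, coordinates, measures):
    """Insert a fact, refusing a second fact for the same combination of dimension elements."""
    if coordinates in cube:
        raise KeyError(f"a fact already exists at {coordinates}")
    cube[coordinates] = measures


add_fact(cube, ("cream", "2/1/2001", "store A"), {"sales": 4.0, "units": 1})

# Qualifying information selects cells; quantifying information is what gets added up.
milk_sales = sum(m["sales"] for (product, _, _), m in cube.items() if product == "milk")
print(milk_sales)
```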
Measures

• Measures
 Quantifying information
 Usually numeric
 in simple terms, measures are normally the elements you want to add up in reporting
 Examples:
   sales figures
   turnover
   measurements (temperature, rainfall, …)
Dimensional and ER
• ER Model
 OLTP systems
   Data consistency, non-redundancy, and efficient data storage are critical
 Entity-Relationship Modelling
   Removes data redundancy
   Ensures data consistency
   Expresses microscopic relationships

• Dimensional Model
 Focus is on how managers view the business
   Information is centered around a business process: how the business measures the process in many ways along several business dimensions
 Dimensional Modelling
   Captures critical measures
   Views along dimensions
   Intuitive to business users
Multidimensional data

• Multidimensional data
 Aggregates
 together with measure values
we store summarizing
information

Data as a cube

• How to physically represent a cube?


 MOLAP
 ROLAP

Data as a cube

• Implemented in a relational database,
 the dimensional model is called a star schema.

• Implemented in a multidimensional database,
 it is known as a data cube.

• If any part of your data warehouse includes a star schema or a cube,
 it leverages dimensional design.
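
A star schema can be sketched with Python's built-in `sqlite3` module: one fact table surrounded by dimension tables, queried by joining and grouping. The table names, column names and rows below are hypothetical and only illustrate the shape of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context for each measurement.
    CREATE TABLE product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
    -- Fact table: one row per measurement, keyed by the dimensions.
    CREATE TABLE sales_fact (
        product_key  INTEGER REFERENCES product(product_key),
        customer_key INTEGER REFERENCES customer(customer_key),
        amount       REAL
    );
    INSERT INTO product  VALUES (1, 'milk', 'dairy'), (2, 'cream', 'dairy'), (3, 'bread', 'bakery');
    INSERT INTO customer VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Bergen');
    INSERT INTO sales_fact VALUES (1, 1, 4.0), (2, 1, 6.0), (3, 2, 3.0), (1, 2, 2.0);
""")

# Aggregate the measure along one dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
conn.close()
```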
Data as a data cube

[Figure: two panels, "JAVA" and "C like", each with Product and Customer axes]
Data as a star

[Figure: star with Product and Customer]
Cubes and Stars

OLAP Server
• A multidimensional database is a form of database in which data is stored in a multidimensional model
 each cell contains values
 the cell represents a snapshot, a particular measurement of a business process
 the coordinates of each cell are defined by a number of hierarchical dimensions

• The term multidimensional database is often confused with the term OLAP (analytical processing):
 MDB is a database
 OLAP is an activity/process used to analyze a database

• The confusion is caused by the term OLAP cube
 OLAP cube has the same meaning as MDB
 it means a multidimensional database
OLAP Server
• Multidimensional DB (MDB)
 Optimized for DW and OLAP applications
 created using input from the staging area
 designed for efficient and convenient storage and retrieval of large
volumes of data

OLAP Server
• What do we use to manage multidimensional databases?
 The system that manages and operates multidimensional databases is called a multidimensional database management system (MDBMS)
 also known as an OLAP server or OLAP engine

• The standard interface that connects to an MDBMS is XML for Analysis
 known as XMLA
OLAP Operations

• Data in a cube
 What can we do with the cube?

• Starnet query model

[Figure: starnet with four radial lines: all → continent → country → branch; all → year → month → day; all → cat → subcat → item; all → group → customer]
OLAP Operations

• Drilling into data


 The word drill connotes digging deeper into something.
 In a dimensional context, that something is a fact.
 A generic concept of drilling is expressed simply as the
addition of dimensional detail.
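
Drilling as "the addition of dimensional detail" can be sketched by summarizing the same facts twice, the second time with one more dimension attribute in the grouping. The facts, field names and the helper `summarize` are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: (category, product, month, sales).
facts = [
    ("dairy", "milk", "Jan", 10),
    ("dairy", "cream", "Jan", 4),
    ("dairy", "milk", "Feb", 7),
    ("bakery", "bread", "Jan", 5),
]
FIELDS = {"category": 0, "product": 1, "month": 2}


def summarize(group_by):
    """Sum sales grouped by the listed dimension attributes."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[FIELDS[name]] for name in group_by)
        totals[key] += fact[3]
    return dict(totals)


by_category = summarize(["category"])
# Drilling down: the same fact, viewed with one more attribute of detail.
by_category_and_product = summarize(["category", "product"])
print(by_category)
print(by_category_and_product)
```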

OLAP Operations
• OLAP operations
 Drill down (roll-down)
   Category → Product
   Region → City
   Quarter → Month

OLAP Operations
• OLAP operations
 Roll up (drill-up)
   Product → Category
   City → Region
   Month → Quarter
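
A Month → Quarter roll-up can be sketched as re-keying each monthly value by its parent in the hierarchy and summing. The monthly figures and function names are hypothetical, and SUM is assumed as the aggregation formula.

```python
from collections import defaultdict

# Hypothetical monthly sales (month number -> amount).
monthly_sales = {1: 10, 2: 12, 3: 8, 4: 20, 5: 5, 6: 15}


def quarter_of(month):
    """Map a month number (1-12) to its quarter label."""
    return f"Q{(month - 1) // 3 + 1}"


def roll_up(sales_by_month):
    """Climb one level of the time hierarchy: Month -> Quarter."""
    totals = defaultdict(int)
    for month, amount in sales_by_month.items():
        totals[quarter_of(month)] += amount
    return dict(totals)


quarterly_sales = roll_up(monthly_sales)
print(quarterly_sales)
```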

OLAP Operations

• OLAP operations
 Slice
 Selection on one dimension of
the given cube
 Result is a subcube
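
A slice can be sketched on a cube stored as a dictionary of cells. The cube contents and the function `slice_cube` are hypothetical; the sketch also assumes the common convention that the sliced dimension is dropped from the result.

```python
# Hypothetical cube: (quarter, item, city) -> sales.
cube = {
    ("Q1", "milk", "Oslo"): 10,
    ("Q1", "cream", "Oslo"): 4,
    ("Q2", "milk", "Oslo"): 7,
    ("Q1", "milk", "Bergen"): 3,
}
DIMENSIONS = ("quarter", "item", "city")


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value; the result has one dimension fewer."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


q1_slice = slice_cube(cube, "quarter", "Q1")
print(q1_slice)
```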

OLAP Operations

• OLAP operations
 Dice
   Selection on two or more dimensions
   Results in a subcube
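
A dice can be sketched as a selection on several dimensions at once, keeping all dimensions in the result. The cube contents and the function `dice` are hypothetical.

```python
# Hypothetical cube: (quarter, item, city) -> sales.
cube = {
    ("Q1", "milk", "Oslo"): 10,
    ("Q1", "cream", "Oslo"): 4,
    ("Q2", "milk", "Oslo"): 7,
    ("Q1", "milk", "Bergen"): 3,
}
DIMENSIONS = ("quarter", "item", "city")


def dice(cube, **allowed):
    """Keep the cells whose coordinates fall inside the allowed value set of every named dimension."""
    axes = {DIMENSIONS.index(name): values for name, values in allowed.items()}
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in axes.items())
    }


subcube = dice(cube, quarter={"Q1"}, city={"Oslo"})
print(subcube)
```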
OLAP Operations
• OLAP operations
 Slice and dice
OLAP Operations

• OLAP operations
 Pivot
 also known as rotation
 rotates data axes
 to provide an alternative
presentation of data
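
A pivot on a 2-D view can be sketched as swapping the row and column axes. The view layout (a dictionary with `rows`, `columns` and `cells`) and the values are hypothetical.

```python
# Hypothetical 2-D view: rows are items, columns are quarters.
view = {
    "rows": ["milk", "cream"],
    "columns": ["Q1", "Q2"],
    "cells": [[10, 7], [4, 6]],
}


def pivot(view):
    """Swap the row and column axes; the cell values themselves do not change."""
    return {
        "rows": view["columns"],
        "columns": view["rows"],
        "cells": [list(column) for column in zip(*view["cells"])],
    }


rotated = pivot(view)
print(rotated)
```

Pivoting twice returns the original presentation, which is a quick check that no data is altered by the rotation.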

OLAP Operations
• OLAP operations
 Roll up (drill-up)
   summarizes data by climbing up a hierarchy or by dimension reduction
   navigates from highly detailed data to less detailed data
 Drill down (roll-down)
   the reverse of roll-up: from a higher-level summary to a lower-level summary or detailed data, or by introducing new dimensions
   navigates from less detailed data to highly detailed data
 Slice
   project and select
 Dice
   form a new subcube
 Pivot (rotate)
   reorient the cube for visualization, e.g. a 3-D cube to a series of 2-D planes
OLAP Operations
• OLAP operation
 Drill across
   Brings together multiple facts
   Requires a conformed dimension
   executes queries involving (i.e., across) more than one fact table
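
Drilling across can be sketched in two steps: summarize each fact table separately to the shared (conformed) dimensions, then merge the results on those dimensions. The two fact tables, their contents and the function names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical fact tables sharing the conformed dimensions (month, item).
sales_facts = [     # (month, item, customer, amount sold)
    ("Jan", "milk", "Ann", 10),
    ("Jan", "milk", "Bob", 5),
    ("Feb", "milk", "Ann", 7),
]
supply_facts = [    # (month, item, supplier, amount supplied)
    ("Jan", "milk", "S1", 20),
    ("Feb", "milk", "S1", 6),
    ("Feb", "milk", "S2", 4),
]


def summarize(facts):
    """Aggregate one fact table to the conformed dimensions (month, item)."""
    totals = defaultdict(int)
    for month, item, _, amount in facts:
        totals[(month, item)] += amount
    return totals


def drill_across(first, second):
    """Summarize each fact table separately, then merge on the shared dimension values."""
    a, b = summarize(first), summarize(second)
    return {key: (a.get(key, 0), b.get(key, 0)) for key in sorted(set(a) | set(b))}


report = drill_across(sales_facts, supply_facts)
print(report)
```

Aggregating before merging avoids joining the two fact tables row by row, which would double count amounts wherever one side has several rows per month and item.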
Bibliography

• Adamson C.
 Star Schema: The Complete Reference
 McGraw-Hill, 2010

• Jensen C.S., Pedersen T.B., Thomsen C.
 Multidimensional Databases and Data Warehousing
 Morgan & Claypool Publishers, series "Synthesis Lectures on Data Management", 2010

• Inmon W.
 Building the Data Warehouse
 John Wiley & Sons, New York 2002

• Imhoff C., Galemmo N., Geiger J.G.
 Mastering Data Warehouse Design: Relational and Dimensional Techniques
 Wiley Publishing, Inc., 2003

• Turban E., Sharda R., Delen D., King D., Aronson J.E.
 Business Intelligence: A Managerial Approach
 Prentice Hall, 2010
