Professional Documents
Culture Documents
05 Molap Rolap
05 Molap Rolap
hamed.amir.2015@gmail.com
An enterprise wide fall in profit What was the quarterly sales during
last year ??
Profit down by a large percentage What was the quarterly sales at
consistently during last quarter only. regional level during last year ??
Rest is OK
Analysis is directional
Drill Down
Roll Up More in
subsequent
slides
Pivot
Solution
Compute answers to “all” possible “queries”. But how?
NOTE: Queries are multidimensional aggregates at some level
?
Transaction
Data
Data
Loading
OLAP
Reports
Decision
Maker
Data Cube
(MOLAP) Presentation
Tools
Product
Bread 8
Category Division Quarter Eggs 45
Butter 13
Product District Month Week 12
Jam
Juice 10
City Day
w1 w2 w3 w4 w5 w6
Red
Red
Blue
Blue WA WA
OR OR
Gray Gray
CA CA
Jul Aug Sep Jul Aug Sep
WA
Blue
Total OR
Blue
17
Jul Aug Sep Datawarehousing & Datamining-Lecture 5
CA
Jul Aug Sep
Querying the Data Cube
Number of Autos Sold
Cross-tabulation
“Cross-tab” for short CA OR WA Total
Report data grouped by 2 Jul 45 33 30 108
dimensions
Aug 50 36 42 128
Aggregate across other
dimensions Sep 38 31 40 109
Include subtotals Total 133 100 112 345
Operations on a cross-tab
Roll up (further aggregation)
Drill down (less aggregation)
CA OR WA Total
Red 40 29 40 109
Blue 45 31 37 113
Gray 48 40 35 123
19 Total 133
Datawarehousing & Datamining-Lecture 5 100 112 345
“Standard” Data Cube Query
Measurements
Which fact(s) should be reported?
Filters
What slice(s) of the cube should be used?
Grouping attributes
How finely should the cube be diced?
Each dimension is either:
(a) A grouping attribute
(b) Aggregated over (“Rolled up” into a single total)
n dimensions → 2n sets of grouping attributes
Aggregation = projection to a lower-dimensional subspace
State, Month,
Color
Total
22 Datawarehousing & Datamining-Lecture 5
MOLAP evaluation
Advantages of MOLAP:
Instant response (pre-calculated aggregates).
Impossible to ask question without an answer.
Drawbacks of MOLAP:
Long load time ( pre-calculating the cube may take days!).
"curse of dimensionality" i.e. as the number of dimensions increases,
the number of possible aggregates increases exponentially
Time
Product
Geography
Example: Usually the sale price of different items varies based on the geography
and the time of the year. Instead of storing this information using an additional
dimension, the said price is calculated at run time, this may be slow, but can result
in tremendous saving in space, as all the items are not sold throughout the year.
Product
Fact Table
Part 2 P
ar
P2 t
P3 1
Total Part 3
Better Solution:
CUBE & ROLLUP (sql operator)
…
24 summary tables, add in geography, results
in 120 tables
- The largest aggregate will be the Summarization by day
and product because this is the most detailed
- Smart tools will allow less detailed aggregates to be constructed
from more detailed aggregates (full aggregate awareness)
38
Datawarehousing & Datamining-
ROLAP Issues
Maintenance.
Non standard hierarchy of dimensions.
Non standard conventions.
Explosion of storage space requirement.
Should plan for twice the size of the raw data for ROLAP
summaries in most environments.
% Gain
60
40 Aggregation answers
few queries
20
2 4 MB 6 8
46 Datawarehousing & Datamining-Lecture 5
Let’s put some steps forward