Professional Documents
Culture Documents
DWH Lectures OLAP
DWH Lectures OLAP
Lecture-5
1
Online Analytical Processing
(OLAP)
2
DWH & OLAP
percentage consistently during regional level during last year ??
last quarter only. Rest is OK
Analysis is directional
Drill Down
More in
Roll Up subsequent
slides
Pivot
5
Challenges…
Not feasible to write predefined queries.
Fails to remain user_driven (becomes
programmer driven).
Solution
Compute answers to “all” possible “queries”. But
how?
?
Data
Loading
Reports
Transaction
Data
Decision
OLAP
Maker
Data Cube
(MOLAP) Presentation
Tools
11
OLTP vs. OLAP
Feature OLTP OLAP
Level of data Detailed Aggregated
Amount of data per Small Large
transaction
Views Pre-defined User-defined
Typical write Update, insert, delete Bulk insert
operation
“age” of data Current (60-90 days) Historical 5-10 years and
also current
Number of users High Low-Med
Tables Flat tables Multi-Dimensional tables
Database size Med (109 B – 1012 B) High (1012 B – 1015 B)
Query Optimizing Requires experience Already “optimized”
Data availability High Low-Med 12
OLAP FASMI Test
Fast: Delivers information to the user at a fairly constant rate.
Most queries answered in under five seconds.
14
OLAP Implementations
1. MOLAP: OLAP implemented with a multi-
dimensional data structure.
17
Aggregations in MOLAP
Sales volume as a function of (i) product, (ii) time,
and (iii) geography
og
E
Ge
Hierarchical summarization paths N
Industry Province Year Milk 23
Product
Bread 8
Category Division Quarter Eggs 45
Butter 13
Product District Month Week 12
Jam
12,000
OJ RK 8UP PK MJ BU AJ
10,000
8,000 Drill-down
6,000
4,000
2,000
-
Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 20
2001 2002
Querying the cube: Pivoting
40,000
Juices Soda Drinks
35,000
30,000
25,000
20,000
15,000
10,000
5,000
-
2001 2002
18,000
2001 2002
16,000
14,000
12,000
10,000
8,000
6,000
4,000
2,000
-
Orange Mango Apple Rola- 8-UP Bubbly- Pola-
juice juice juice Kola UP Kola
21
Question: Juices are selling well or the sodas?
Looking at the highest level summary, with the lowest data granularity,
shows that the sales of Soda Drinks are higher for years 2001 and
2002.
Question: Which sodas are doing well?
Just clicking on the Sodas column shows the second histogram. It
shows that the sales of 8-UP cola are highest in both years i.e. 2001
and 2002.
Question: Is 8-UP a clear winner?
Flipping the dimensions i.e. instead of product on x-axis putting time on
x-axis and further drilling-down shows that there is a trend of increasing
sales of 8-UP, BUT the sale is alarmingly down during QTR3 in 2001
and QTR1 and QTR3 during 2002.
All of the above information was extracted by exploring 46 table entries
and three levels of aggregates. The total exploration time took less
than 3 seconds.
Question: Why erratic sales of 8-UP?
This may probably require exploring other dimensions or their
combinations.
22
MOLAP evaluation
Advantages of MOLAP:
23
MOLAP evaluation
Drawbacks of MOLAP:
24
MOLAP Implementation issues
Maintenance issue: Every data item
received must be aggregated into every cube
(assuming “to-date” summaries are
maintained). Lot of work.
26
Storage
If you have entries above 100 in each dimension then things tend to
blow up. When does it actually blows up depends on the tool being
used and how sparse matrices are stored. So you might end up
requiring more cube space then allocated or even available.
27
Partitioned Cubes
To overcome the space limitation of MOLAP, the cube is
partitioned.
Time
Product
Geography
31
Why ROLAP?
Issue of scalability i.e. curse of dimensionality
for MOLAP
.
og
Product Ge
Fact Table
33
Time
How to create “Cube” in ROLAP
Cube is a logical entity containing values of a
certain fact at a certain aggregation level at an
intersection of a combination of dimensions.
P2
P3
Total
34
How to create “Cube” in ROLAP using SQL
For the table entries, without the totals
SELECT S.Month_Id, S.Product_Id,
SUM(S.Sales_Amt)
FROM Sales
GROUP BY S.Month_Id, S.Product_Id;
36
CUBE Clause
37
ROLAP & Space Requirement
If one is not careful, with the increase in number of
dimensions, the number of summary tables gets very
large
38
EXAMPLE: ROLAP & Space Requirement
A naïve implementation will require all combinations of
summary tables at each and every aggregation level.
…
24 summary tables, add in
geography, results in 120 tables
39
ROLAP Issues
Maintenance.
Non standard hierarchy of dimensions.
Non standard conventions.
Explosion of storage space requirement.
Aggregation pit-falls.
40
ROLAP Issue: Maintenance
Summary tables are mostly a maintenance
issue (similar to MOLAP) than a storage
issue.
Calendar:
01 Jan. to 31 Dec or
01 Jul. to 30 Jun. or
01 Sep to 30 Aug.
Week:
Mon. to Sat. or Thu. to Wed. 43
ROLAP Issue: Storage space explosion
44
ROALP Issues: Aggregation pitfalls
45
How to Reduce Summary tables?
Many ROLAP products have developed means
to reduce the number of summary tables by:
46
Performance vs. Space Trade-Off
Maximum performance boost implies using
lots of disk space for storing every pre-
calculation.
47
Performance vs. Space Trade-off using Wizard
60
40 Aggregation
answers few queries
20
2 4 MB 6 8
48
HOLAP
Target is to get the best of both worlds.
49
DOLAP
Cube on the
remote server
Local
Machine/Server
50
End
51