Lecture 4: Queries, Query Processing and Optimization: Data Warehouse, Business Intelligence, Data Mining
Introduction (1/2)
Typical queries on data warehouses contain aggregations, such as:
How many articles in the product group "electronic devices" were sold per month in each region in 2017?
Characteristics of typical data warehouse queries:
they cover a large number of tuples
they select tuples in several or all dimensions
an aggregate function is often applied
Introduction (2/2)
Multidimensional queries:
queries include many or even all dimensions
specific optimization techniques are useful
The data cube contains a lot of data
Problem: aggregations over large data sets
Query processing: Overview
[Figure: overview of query processing. Recoverable labels: standardization and simplification, logical optimization (algebra plan), physical optimization (physical plan), cost-based plan choice, plan parameterization, code generation; the steps are divided into compile time and run time.]
Query processing phases (1/2)
Query processing phases (2/2)
We have already seen the multidimensional operations:
dice
slice
rollup
drill down
drill across
rotate
Different kinds of multidimensional queries
[Figure: range query and partial-match query, each drawn over the axes Product (item) and Time (days)]
Relational implementation of multidimensional queries
Star-join: Example
Star-join: Construction
select clause
key figures, possibly aggregated: SUM, AVG, MAX, MIN, COUNT
granularity of the result, e.g. month, region
from clause
fact table and dimension tables, join conditions
where clause
restrictions, e.g.:
Product.Product_category = 'Textbook' and
Geography.Country = 'Germany' and Time.Year = 2017
group by clause
grouping according to the granularity of the result
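The construction of a star join can be tried out on a toy star schema. The following is a minimal sketch using Python's built-in sqlite3 module. The key columns (Product_id, Geo_id, Time_id), the columns Region, Month and Amount, and all sample rows are assumptions made for illustration; only the restrictions in the where clause are taken from the slide.

```python
import sqlite3

# Toy star schema; table layout and contents are made-up sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
create table Product   (Product_id integer primary key, Product_category text);
create table Geography (Geo_id integer primary key, Country text, Region text);
create table Time      (Time_id integer primary key, Year integer, Month integer);
create table Sale      (Product_id integer, Geo_id integer, Time_id integer, Amount integer);

insert into Product   values (1, 'Textbook'), (2, 'Novel');
insert into Geography values (1, 'Germany', 'Berlin'), (2, 'Germany', 'MV'), (3, 'France', 'Paris');
insert into Time      values (1, 2017, 1), (2, 2017, 2), (3, 2016, 1);
insert into Sale      values
  (1, 1, 1, 5), (1, 1, 1, 3), (1, 2, 2, 4),
  (2, 1, 1, 9), (1, 3, 1, 7), (1, 1, 3, 2);
""")

star_join = """
select g.Region, t.Month, sum(s.Amount) as Amount   -- key figure and granularity
from Sale s
join Product   p on s.Product_id = p.Product_id     -- fact table joined with each dimension
join Geography g on s.Geo_id     = g.Geo_id
join Time      t on s.Time_id    = t.Time_id
where p.Product_category = 'Textbook'               -- restrictions on the dimensions
  and g.Country = 'Germany'
  and t.Year = 2017
group by g.Region, t.Month                          -- grouping follows the result granularity
order by g.Region, t.Month
"""
rows = con.execute(star_join).fetchall()
print(rows)  # [('Berlin', 1, 8), ('MV', 2, 4)]
```

The last three sample rows in Sale are each eliminated by exactly one of the three restrictions (category, country, year), so only the first three rows contribute to the result.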
A bit longer quote
Optimizing Star Joins
Optimizing Star Joins / 2
Example: join over the fact table Sale and the three dimension tables Product, Time and Geography: a 4-way join
An RDBMS usually supports only pairwise joins, so a sequence of pairwise joins is necessary
4! = 24 possible join orders
Heuristic to reduce the number of possible execution plans:
only joins between relations that are linked by a join condition in the query are considered
joins between relations that are not linked by a join condition in the query (cross products) are not considered
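The effect of this heuristic on the search space can be counted directly. The sketch below assumes that a plan is a simple sequence of pairwise joins (each relation is joined to the result built so far); the helper name avoids_cross_products is made up for this illustration.

```python
from itertools import permutations

# Join graph of the star query: the fact table Sale is linked to every
# dimension table; the dimension tables are not linked to each other.
relations = ["Sale", "Product", "Time", "Geography"]
join_edges = {frozenset(("Sale", d)) for d in ("Product", "Time", "Geography")}

def avoids_cross_products(order):
    """True if every relation after the first joins with one already in the plan."""
    for i in range(1, len(order)):
        if not any(frozenset((order[i], prev)) in join_edges for prev in order[:i]):
            return False
    return True

all_orders = list(permutations(relations))
linked_orders = [o for o in all_orders if avoids_cross_products(o)]

print(len(all_orders))     # 24 = 4!
print(len(linked_orders))  # 12
```

Under these assumptions the heuristic halves the number of join orders, because Sale has to appear in first or second position. Any order that starts with two dimension tables is discarded.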
Optimizing Star Joins / 2
Assumptions:
Table Sale: 10,000,000 records
10 stores in Germany (out of 100)
20 selling days in January 2017 (out of 1000 stored days)
50 products in the product category "Textbook-Computer Science" (out of 1000)
Uniform distribution: all individual values have the same selectivity
Optimizing Star Joins / 3
The heuristic yields, for example, the following execution plan:
Plan A:
((Sale ⋈ σ Country = 'Germany' (Geography)) ⋈ σ Month = 'January 2017' (Time)) ⋈ σ Product_category = 'Textbook-Computer Science' (Product)
Optimizing Star Joins / 4
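The following execution plan, which first forms the cross product of the dimension tables, is usually not considered:
Plan B:
Sale ⋈ ((σ Country = 'Germany' (Geography) × σ Month = 'January 2017' (Time)) × σ Product_category = 'Textbook-Computer Science' (Product))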
Star Join: Example of calculation
Assumptions:
Table Sale: 10,000,000 records
10 stores in Germany (out of 100)
20 selling days in January 2017 (out of 1000 stored days)
50 products in the product category "Textbook-Computer Science" (out of 1000)
Uniform distribution: all individual values have the same selectivity
Plan A:
1st join: 1,000,000 result tuples
2nd join: 20,000 tuples
3rd join: 1,000 tuples
Plan B:
1st cross product: 200 result tuples
2nd cross product: 10,000 tuples
Join: 1,000 tuples
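The intermediate result sizes follow from multiplying the table sizes by the selectivities. The sketch below recomputes them under the stated assumptions (uniform distribution, independent dimensions); the variable names are chosen for this illustration only.

```python
from fractions import Fraction

SALE_ROWS = 10_000_000
# (selected rows, total rows) per dimension, taken from the assumptions
geography_dim = (10, 100)    # stores in Germany
time_dim = (20, 1000)        # selling days in January 2017
product_dim = (50, 1000)     # products in the category

def selectivity(dim):
    selected, total = dim
    return Fraction(selected, total)

# Plan A: join Sale with one filtered dimension after another.
plan_a = []
size = Fraction(SALE_ROWS)
for dim in (geography_dim, time_dim, product_dim):
    size *= selectivity(dim)
    plan_a.append(int(size))

# Plan B: cross products of the filtered dimensions first, one join at the end.
cross_1 = geography_dim[0] * time_dim[0]
cross_2 = cross_1 * product_dim[0]
final = int(SALE_ROWS * selectivity(geography_dim)
            * selectivity(time_dim) * selectivity(product_dim))
plan_b = [cross_1, cross_2, final]

print(plan_a)  # [1000000, 20000, 1000]
print(plan_b)  # [200, 10000, 1000]
```

Both plans end with the same 1,000 tuples, but the intermediate results of Plan B are far smaller, which is why the cross-product plan can be cheaper even though the heuristic excludes it.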
Oracle optimizer hints
select /*+ star */ ... from ...
Tells the optimizer to treat the query as a star query; the use of hints is not entirely uncontroversial
Hint categories:
hints for optimization approaches and goals
hints for access paths
hints for query transformations
hints for join orders
hints for join operations
hints for parallel execution
additional hints
Grouping and aggregation
Data analysis: aggregation of multidimensional data
SQL extensions for OLAP: grouping sets, rollup, cube
Grouping Sets
SQL syntax:
group by grouping sets ((A, B), (A, C), (C));
Example
Initial data:
Region       Year  Sale
MV           2017    20
Brandenburg  2017    30
Berlin       2017    50
MV           2016    23
Brandenburg  2016    35
Berlin       2016    40
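To see what a grouping sets query returns for this initial data, the operator can be emulated in a few lines. This is a sketch, not the lecture's own result: the choice of the grouping sets (Region) and (Year) is hypothetical, and the function name grouping_sets is made up for the illustration.

```python
from collections import defaultdict

# Initial data from the example: (Region, Year, Sale)
rows = [
    ("MV", 2017, 20), ("Brandenburg", 2017, 30), ("Berlin", 2017, 50),
    ("MV", 2016, 23), ("Brandenburg", 2016, 35), ("Berlin", 2016, 40),
]
columns = ("Region", "Year")

def grouping_sets(rows, sets):
    """Sum Sale for every grouping set; columns outside the set become None (SQL NULL)."""
    result = []
    for group_cols in sets:
        totals = defaultdict(int)
        for *dims, sale in rows:
            key = tuple(v if c in group_cols else None for c, v in zip(columns, dims))
            totals[key] += sale
        result.extend((*key, total) for key, total in totals.items())
    return result

# Hypothetical choice of grouping sets: one total per region, one total per year.
result = grouping_sets(rows, [("Region",), ("Year",)])
for line in result:
    print(line)
```

Each grouping set contributes its own block of rows, and the columns that are not part of the set are filled with NULL (None in Python).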
Grouping sets, another example
select Year, Quarter, sum (Orders) as Orders
from sales_order
group by grouping sets ((Year, Quarter), (Year))
order by Year, Quarter
Understanding GROUPING SETS / 2
Understanding GROUPING SETS / 3
Understanding GROUPING SETS / 4
Grouping Sets query:
select a, b, c, sum(d)
from t
group by grouping sets (a, b, c)

Equivalent query without Grouping Sets:
(select a, null, null, sum(d)
from t
group by a)
union all
(select null, b, null, sum(d)
from t
group by b)
union all
(select null, null, c, sum(d)
from t
group by c)
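The rewritten form can be run on systems that lack the grouping sets clause. The sketch below uses SQLite through Python's sqlite3 module, which does not provide grouping sets, so the union form is what actually executes. It uses union all, which keeps every row of every branch; the contents of t are made up for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (a text, b text, c text, d integer)")
con.executemany(
    "insert into t values (?, ?, ?, ?)",
    [("a1", "b1", "c1", 1), ("a1", "b2", "c1", 2), ("a2", "b1", "c2", 4)],
)

# One branch per grouping set; columns outside the set are filled with null.
union_form = """
select a, null as b, null as c, sum(d) as total from t group by a
union all
select null, b, null, sum(d) from t group by b
union all
select null, null, c, sum(d) from t group by c
"""
result = con.execute(union_form).fetchall()
for row in sorted(result, key=repr):
    print(row)
```

Each branch scans t once, which hints at why a native grouping sets implementation that shares work between the groupings can be cheaper than the union form.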
RollUp operator
RollUp
Subtotal rows (super-aggregate rows) are inserted into the result set of a query with a group by clause.
The clause specifies the dimension attributes on which the RollUp operation is performed, e.g.
rollup(region, year)
Also usable to form aggregates along dimension hierarchies
Examples:
rollup(year, month, day)
rollup(country, state, location)
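The groupings that a rollup produces are the prefixes of its attribute list, from the full list down to the empty grouping (the grand total). A minimal sketch, with the helper name rollup_sets chosen for this illustration:

```python
def rollup_sets(*attributes):
    """Grouping sets produced by rollup: every prefix of the attribute list, down to ()."""
    return [tuple(attributes[:n]) for n in range(len(attributes), -1, -1)]

print(rollup_sets("year", "month", "day"))
# [('year', 'month', 'day'), ('year', 'month'), ('year',), ()]
```

This is why the attribute order matters: rollup(year, month, day) aggregates day into month into year, following the hierarchy from fine to coarse.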
RollUp: Example
select Year, Quarter, sum (Orders) as Orders
from sales_order ...
group by rollup (Year, Quarter)
order by Year, Quarter
The result contains:
1) orders per year and quarter: (Year, Quarter)
2) orders per year: (Year)
3) total number of orders: ()
RollUp: Example / 2
Understanding Rollup
Understanding Rollup /2
RollUp query:
select a, b, c, sum(d)
from t
group by rollup (a, (b, c))

Equivalent query without RollUp:
(select a, b, c, sum(d)
from t
group by a, b, c)
union all
(select a, null, null, sum(d)
from t
group by a)
union all
(select null, null, null, sum(d)
from t
group by ())
CUBE operator
CUBE operator: Example
Product Region Year Sales
Product Region Year Sales Data Warehouses SANH 2017 45
Data Warehouses SANH 2017 45 Data Warehouses THÜR 2017 43
Data Warehouses THÜR 2017 43 CUBE ... ... ... ...
Data Warehouses SANH 2016 47 Data Warehouses SANH NULL 92
Data Warehouses THÜR 2016 42 Data Warehouses THÜR NULL 85
Cube
SQL syntax:
group by cube (A, B, C);
Results in the following grouping sets:
(A, B, C)
(A, B)
(A, C)
(B, C)
(A)
(B)
(C)
()
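The list of grouping sets is the power set of the cube attributes. A short sketch that enumerates it, with the helper name cube_sets chosen for this illustration:

```python
from itertools import combinations

def cube_sets(*attributes):
    """All subsets of the attributes, largest first, as grouped by cube."""
    return [
        subset
        for size in range(len(attributes), -1, -1)
        for subset in combinations(attributes, size)
    ]

sets = cube_sets("A", "B", "C")
print(sets)
# [('A', 'B', 'C'), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A',), ('B',), ('C',), ()]
print(len(sets))  # 8 = 2**3
```

The number of grouping sets doubles with every additional attribute, so a cube over many attributes quickly becomes expensive.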
Example for the CUBE operator
select Year, Quarter, sum(Orders) as Orders
from sales_order
group by cube (Year, Quarter)
order by Year, Quarter
Understanding CUBE
CUBE operator: SQL syntax
cube: extension of group by
Implemented in SQL Server, DB2, and Oracle (since 8i)
Syntax: group by cube (A, B, C)
Iceberg query
Summary: Cube, RollUp, and Grouping Sets operators
Cube operator:
Generates all combinations, e.g. 16 combinations for 4 grouping attributes and 8 combinations for 3 grouping attributes
Example: (A, B, C) → (A, B, C), (A, B), (A, C), (B, C), (A), (B), (C), ()
RollUp operator:
Generates only the combinations with super-aggregates
For the example: (A, B, C) → (), (A), (A, B), (A, B, C)
Grouping Sets:
Generates aggregate values only for exactly the specified sets
For the example: (A, B, C) → (A, B, C)
Outlook
We will also have an exercise on the topic of SQL extensions for data warehouses
also: building a star schema and using it
Queries with grouping sets, rollup, and cube should be tried out there