Adaptive Optimization of Very Large Join Queries: Thomas Neumann Bernhard Radke Conference: SIGMOD 2018

Adaptive Optimization of Very Large Join Queries
Thomas Neumann
Bernhard Radke
Conference: SIGMOD 2018
Authors
• Thomas Neumann
• Professor at the Technical University of Munich
• Six papers from Thomas Neumann's group were accepted at VLDB 2020
• The project on adaptive join optimization for large queries received funding from the European Research Council
• Bernhard Radke, M.Sc.
• Technical University of Munich
2/21
Motivation for adaptive optimization
• Queries are becoming more complex (more relations and joins).
• The database should perform equally well for complex queries.
• PostgreSQL uses a genetic algorithm (a heuristic approach) for queries with 12 or more relations.
• This can cause a sudden drop in the quality of the query plan.
3/21
Formalizing the problem
• Target: commercial databases (which also support non-inner joins such as outer joins); the optimizer should adapt and perform well for simple queries as well as for complex queries with a large number of relations.
Cost function (Cout)
• Sum of the sizes of intermediate results
• Follows the ASI property
[Figure: Query → Query Graph Generator → Query Graph Q = (V, E) → Query Optimizer]
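As a concrete reading of the cost function, the sketch below computes Cout for a join tree: a base relation costs 0, and each join costs the size of its result plus the cost of its two inputs. The cardinality estimate multiplies base cardinalities with the selectivities of all predicates inside the joined set, which assumes independent predicates; the tree encoding (an index or a pair) and the function names are illustrative choices, not taken from the paper.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A join tree is a relation index (leaf) or a pair (left, right) of join trees.
Tree = Union[int, Tuple["Tree", "Tree"]]

def relations(t: Tree) -> FrozenSet[int]:
    """All base relations below t."""
    if isinstance(t, int):
        return frozenset([t])
    return relations(t[0]) | relations(t[1])

def cardinality(rels: FrozenSet[int], card: List[float],
                sel: Dict[Tuple[int, int], float]) -> float:
    """Estimated result size: product of cardinalities and inner selectivities."""
    size = 1.0
    for r in rels:
        size *= card[r]
    for (a, b), s in sel.items():
        if a in rels and b in rels:
            size *= s
    return size

def c_out(t: Tree, card: List[float], sel: Dict[Tuple[int, int], float]) -> float:
    """Cout: sum of the sizes of all intermediate (and final) join results."""
    if isinstance(t, int):
        return 0.0
    left, right = t
    return (cardinality(relations(t), card, sel)
            + c_out(left, card, sel) + c_out(right, card, sel))
```

For a chain R1-R2-R3 with cardinalities 10, 100, 1000 and selectivities 0.1 and 0.01, the tree ((R1 ⋈ R2) ⋈ R3) costs 100 + 1000 = 1100, while (R1 ⋈ (R2 ⋈ R3)) costs 1000 + 1000 = 2000.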
4/21
Queries can be divided into three categories (depending on the query graph and the number of relations):
Small queries, medium queries, and large queries
5/21
Small Queries
• Queries with fewer than 14 relations.
• If the query has more than 13 relations, countCC(Q) is used to check it against a budget.
• countCC(Q): counts the number of connected subgraphs of the given query graph Q.
• If the number of connected subgraphs is below the budget, the query is still treated as a small query.
• Algorithm used: DPHyp
• DPHyp works on hypergraphs and supports non-inner joins, cross joins, and similar operators.
6/21
CountCC(Q)
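The countCC(Q) check can be sketched as a bounded enumeration of connected subgraphs. This sketch assumes a simple (non-hyper) query graph stored as one adjacency bitmask per relation and uses the EnumerateCsg recursion known from DPccp, which emits every connected subgraph exactly once; the names and the convention of returning budget + 1 on overflow are illustrative, not the paper's code.

```python
from typing import List, Tuple

def build_adj(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Adjacency bitmask per relation for an undirected query graph."""
    adj = [0] * n
    for a, b in edges:
        adj[a] |= 1 << b
        adj[b] |= 1 << a
    return adj

def neighbors(adj: List[int], s: int) -> int:
    """Relations adjacent to some relation in s, excluding s itself."""
    out, rest = 0, s
    while rest:
        low = rest & -rest
        out |= adj[low.bit_length() - 1]
        rest ^= low
    return out & ~s

def count_cc(adj: List[int], budget: int) -> int:
    """Number of connected subgraphs, or budget + 1 as soon as the budget is exceeded."""
    count = 0

    def grow(s: int, excluded: int) -> bool:
        nonlocal count
        nb = neighbors(adj, s) & ~excluded
        sub = nb
        while sub:  # each non-empty subset of nb extends s to a new connected subgraph
            count += 1
            if count > budget:
                return False
            sub = (sub - 1) & nb
        sub = nb
        while sub:  # recurse with the whole neighborhood excluded to avoid duplicates
            if not grow(s | sub, excluded | nb):
                return False
            sub = (sub - 1) & nb
        return True

    for i in reversed(range(len(adj))):
        count += 1  # the single relation R_i
        # Only relations with a higher index than i may be added to {R_i}.
        if count > budget or not grow(1 << i, (1 << (i + 1)) - 1):
            return count
    return count
```

Because the count stops at budget + 1, the check stays cheap even for huge graphs: under the rule on the previous slide, a query with 14 or more relations would still be handled by DPHyp when count_cc(adj, budget) <= budget.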
7/21
Basic Idea of Hypergraphs
• Hypergraph
• Subgraph
• Connected
• Connected subgraph
• Connected complement subgraph
• CSG-CMP pair
• Neighborhood
• min(S) (and ordering)
R1 < R2 < R3 < R4 < R5 < R6
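To make min(S) and the neighborhood concrete, the sketch below encodes sets of relations as bitmasks (bit 0 = R1, following the ordering R1 < R2 < ...) and hyperedges as pairs of disjoint bitmasks. It is a simplified version: for each hyperedge with one side inside S and the other side disjoint from S and the excluded set X, it adds the representative min of the other side; DPHyp additionally discards subsumed hyperedges, which is omitted here. The example hypergraph and all names are illustrative assumptions.

```python
from typing import List, Tuple

Hyperedge = Tuple[int, int]  # (u, v): disjoint, non-empty sets of relations as bitmasks

def rs(*indices: int) -> int:
    """Bitmask for relations R_i (1-based), e.g. rs(1, 3) = {R1, R3}."""
    mask = 0
    for i in indices:
        mask |= 1 << (i - 1)
    return mask

def min_set(s: int) -> int:
    """min(S): the smallest relation of S under R1 < R2 < ..., as a bitmask."""
    return s & -s

def neighborhood(edges: List[Hyperedge], s: int, x: int) -> int:
    """Simplified N(S, X): min(v) for every hyperedge (u, v) with u inside S and
    v disjoint from S and X (subsumed hyperedges are not filtered out)."""
    result = 0
    for u, v in edges:
        for a, b in ((u, v), (v, u)):  # hyperedges are undirected
            if a & s == a and b & (s | x) == 0:
                result |= min_set(b)
    return result

# Example: two chains R1-R2-R3 and R4-R5-R6 linked by one hyperedge.
E = [(rs(1), rs(2)), (rs(2), rs(3)), (rs(4), rs(5)), (rs(5), rs(6)),
     (rs(1, 2, 3), rs(4, 5, 6))]
```

With S = {R1, R2, R3}, the hyperedge becomes usable and contributes only its representative R4; with S = {R1, R2}, it does not, and the neighborhood is just R3.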
8/21
Intuition of DPHyp
Image source: Dynamic Programming Strikes Back, by Guido Moerkotte and Thomas Neumann
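The dynamic programming idea behind DPHyp can be illustrated with a deliberately naive bitmask DP in the spirit of DPsub: for every connected set of relations, it tries every split into two connected halves linked by a predicate (a csg-cmp pair) and keeps the cheapest plan under Cout. This is a sketch for simple graphs with inner joins only, assuming a connected query graph and independent predicates; DPHyp enumerates the csg-cmp pairs far more efficiently and also handles hyperedges. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

def dp_connected_subsets(card: List[float],
                         sel: Dict[Tuple[int, int], float]) -> Tuple[float, str]:
    """Cheapest bushy plan without cross products under Cout (exponential time)."""
    n = len(card)
    adj = [0] * n
    for i, j in sel:
        adj[i] |= 1 << j
        adj[j] |= 1 << i

    def size(mask: int) -> float:
        result = 1.0
        for i in range(n):
            if mask >> i & 1:
                result *= card[i]
        for (i, j), s in sel.items():
            if mask >> i & 1 and mask >> j & 1:
                result *= s
        return result

    def nbrs(mask: int) -> int:
        out = 0
        for i in range(n):
            if mask >> i & 1:
                out |= adj[i]
        return out & ~mask

    # best[mask] exists only for sets that can be joined without a cross product.
    best = {1 << i: (0.0, f"R{i + 1}") for i in range(n)}
    for mask in range(1, 1 << n):  # proper subsets always come first numerically
        if mask & (mask - 1) == 0:
            continue  # single relation
        low = mask & -mask
        rest = mask ^ low
        sub = rest
        while True:  # s1 always contains the lowest relation, so each split is seen once
            s1 = low | sub
            s2 = mask ^ s1
            if s2 and s1 in best and s2 in best and nbrs(s1) & s2:
                cost = size(mask) + best[s1][0] + best[s2][0]
                if mask not in best or cost < best[mask][0]:
                    best[mask] = (cost, f"({best[s1][1]} ⋈ {best[s2][1]})")
            if sub == 0:
                break
            sub = (sub - 1) & rest
    return best[(1 << n) - 1]
```

The outer loop visits all 2^n subsets and the inner loop all their splits, which is where the exponential cost discussed on the next slide comes from.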
9/21
Medium Queries
• DP (DPHyp) cannot be used beyond some point, because its optimization time becomes too expensive: for clique queries the time complexity is O(3^n).
• Whether DP becomes too expensive depends on the shape of the query graph.
• Linear query graphs (chains) are easier for join ordering than complex query graphs (cliques).
• Reason: complex query graphs have many more connected subgraphs.
• Solution: linearize the search space (consider only connected sub-chains of a linear ordering).
• The IKKBZ algorithm is used for linearization, and then a DP algorithm is used to find the best join tree within that restricted search space.
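The gap between chains and cliques can be checked by counting what a DP join enumerator has to visit: connected subgraphs (csg) and csg-cmp pairs (ccp, unordered pairs of disjoint connected subgraphs linked by an edge). The brute-force counter below is a sketch for small n on simple graphs with bitmask adjacency; the helper names are illustrative. For chains the counts grow polynomially (n(n+1)/2 and (n^3 - n)/6), for cliques exponentially (2^n - 1 and (3^n - 2^(n+1) + 1)/2).

```python
from typing import List, Tuple

def build_adj(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    adj = [0] * n
    for a, b in edges:
        adj[a] |= 1 << b
        adj[b] |= 1 << a
    return adj

def neighbors(adj: List[int], mask: int) -> int:
    """Relations adjacent to mask, excluding mask itself."""
    out, rest = 0, mask
    while rest:
        low = rest & -rest
        out |= adj[low.bit_length() - 1]
        rest ^= low
    return out & ~mask

def is_connected(adj: List[int], mask: int) -> bool:
    seen = frontier = mask & -mask  # start from the lowest relation in mask
    while frontier:
        frontier = neighbors(adj, seen) & mask & ~seen
        seen |= frontier
    return seen == mask

def count_csg_ccp(adj: List[int]) -> Tuple[int, int]:
    """(#connected subgraphs, #csg-cmp pairs), counted by brute force."""
    n = len(adj)
    csgs = [m for m in range(1, 1 << n) if is_connected(adj, m)]
    ccps = sum(1 for i, s1 in enumerate(csgs) for s2 in csgs[i + 1:]
               if s1 & s2 == 0 and neighbors(adj, s1) & s2)
    return len(csgs), ccps

def chain(n: int) -> List[int]:
    return build_adj(n, [(i, i + 1) for i in range(n - 1)])

def clique(n: int) -> List[int]:
    return build_adj(n, [(i, j) for i in range(n) for j in range(i + 1, n)])

for n in (4, 8):
    print(n, count_csg_ccp(chain(n)), count_csg_ccp(clique(n)))
# 4 (10, 10) (15, 25)
# 8 (36, 84) (255, 3025)
```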
10/21
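IKKBZ Algorithm (IK/KBZ)
Toshihide Ibaraki, Tiko Kameda (IK)
Ravi Krishnamurthy, Haran Boral, Carlo Zaniolo (KBZ)
Rank function (for ASI cost functions such as Cout): rank(s) = (T(s) - 1) / C(s), where T(s) is the product of cardinality × selectivity over the relations in s and C(s) is the cost of s
Cyclic query graph: use a minimum spanning tree (minimizing join selectivities)
Time complexity: O(n^2)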
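The rank-based linearization can be sketched for tree-shaped query graphs as follows, assuming Cout, independent predicates, and selectivities keyed by relation pairs. For each candidate root, children's chains are merged by ascending rank, and a node is glued to its successor into a compound module while its rank is larger (it must precede its children). The cheapest root wins. Cyclic graphs would first be reduced to a spanning tree, which is not shown, and this straightforward version does not aim for the O(n^2) bound. brute_force_cost is only a checking aid for tiny inputs; all names are illustrative.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Module = Tuple[List[int], float, float]  # (relations in order, T, C)

def edge_sel(sel: Dict[Edge, float], a: int, b: int) -> float:
    """Selectivity of the predicate between a and b (1.0 if there is none)."""
    return sel.get((a, b), sel.get((b, a), 1.0))

def rank(m: Module) -> float:
    """IKKBZ rank of a (possibly compound) module: (T - 1) / C."""
    return (m[1] - 1.0) / m[2]

def combine(a: Module, b: Module) -> Module:
    """Glue two modules: T(ab) = T(a) * T(b), C(ab) = C(a) + T(a) * C(b)."""
    return (a[0] + b[0], a[1] * b[1], a[2] + a[1] * b[2])

def left_deep_cost(order: List[int], card: List[float], sel: Dict[Edge, float]) -> float:
    """Cout of the left-deep plan for `order`: sum of all intermediate result sizes."""
    size, cost = card[order[0]], 0.0
    for i in range(1, len(order)):
        size *= card[order[i]]
        for p in order[:i]:
            size *= edge_sel(sel, order[i], p)
        cost += size
    return cost

def ikkbz(card: List[float], sel: Dict[Edge, float]) -> List[int]:
    """Best left-deep order without cross products for a tree-shaped query graph."""
    adj: Dict[int, List[int]] = {i: [] for i in range(len(card))}
    for a, b in sel:
        adj[a].append(b)
        adj[b].append(a)

    def linearize(node: int, parent: int) -> List[Module]:
        # Children are already rank-ascending chains; merge them by rank.
        # sorted() is stable, so modules of one chain keep their order on ties.
        merged = sorted((m for c in adj[node] if c != parent
                         for m in linearize(c, node)), key=rank)
        if parent < 0:  # the root always comes first and needs no rank
            return [([node], card[node], 0.0)] + merged
        t = card[node] * edge_sel(sel, node, parent)
        head: Module = ([node], t, t)
        # Normalize: while the node's rank exceeds its successor's, glue them together.
        while merged and rank(head) > rank(merged[0]):
            head = combine(head, merged.pop(0))
        return [head] + merged

    # Try every relation as the root and keep the cheapest linear order.
    orders = [[r for m in linearize(root, -1) for r in m[0]] for root in adj]
    return min(orders, key=lambda o: left_deep_cost(o, card, sel))

def brute_force_cost(card: List[float], sel: Dict[Edge, float]) -> float:
    """Reference for tiny queries: cheapest left-deep order without cross products."""
    def no_cross(o):
        return all(any((o[i], p) in sel or (p, o[i]) in sel for p in o[:i])
                   for i in range(1, len(o)))
    return min(left_deep_cost(list(o), card, sel)
               for o in permutations(range(len(card))) if no_cross(o))
```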
11/21
LinearizedDP
[Figure: Query graph → Linear order (left-deep tree) → LinearizedDP]
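The restricted search space of LinearizedDP can be sketched as an interval DP: given a linear order of the relations (for example, one produced by IKKBZ), only contiguous sub-chains of that order are considered as intermediate results. There are O(n^2) sub-chains with O(n) split points each, which is where the O(n^3) DP phase comes from. This sketch assumes inner joins only, Cout, and independent predicates; its size and edge helpers are naive, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

def linearized_dp(order: List[int], card: List[float],
                  sel: Dict[Edge, float]) -> Tuple[float, str]:
    """Best bushy plan (Cout) built only from contiguous sub-chains of `order`."""
    n = len(order)

    def size(rels: List[int]) -> float:
        out = 1.0
        for r in rels:
            out *= card[r]
        for (a, b), s in sel.items():
            if a in rels and b in rels:
                out *= s
        return out

    def has_edge(left: List[int], right: List[int]) -> bool:
        return any((a, b) in sel or (b, a) in sel for a in left for b in right)

    # best[(i, j)] = (cost, plan) for the sub-chain order[i..j]; missing if that
    # sub-chain cannot be joined without a cross product.
    best: Dict[Tuple[int, int], Tuple[float, str]] = {
        (i, i): (0.0, f"R{order[i] + 1}") for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            out = size(order[i:j + 1])
            for k in range(i, j):  # split into order[i..k] and order[k+1..j]
                left, right = (i, k), (k + 1, j)
                if (left in best and right in best
                        and has_edge(order[i:k + 1], order[k + 1:j + 1])):
                    cost = out + best[left][0] + best[right][0]
                    if (i, j) not in best or cost < best[(i, j)][0]:
                        best[(i, j)] = (cost, f"({best[left][1]} ⋈ {best[right][1]})")
    return best[(0, n - 1)]
```

On a four-relation chain whose two outer joins are selective, this finds the bushy plan ((R1 ⋈ R2) ⋈ (R3 ⋈ R4)), which a left-deep plan over the same order cannot express.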
12/21
Large Queries
• Problem with LinearizedDP: its DP phase is O(n^3), which is fine up to about 100 relations.
• Combine a greedy approach with DP (idea from Iterative DP).
• Greedy algorithm used: Greedy Operator Ordering (GOO).
• GOO produces good bushy plans and runs efficiently.
• Run LinearizedDP iteratively over subplans of size k (k = 100), each time choosing the subplan of size k with the maximum cost.
• LinearizedDP keeps running until the whole budget (size of the DP table) is used up.
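A minimal sketch of Greedy Operator Ordering: start with one tree per relation and repeatedly join the two trees whose join result is estimated to be smallest, until a single bushy tree remains. In this sketch, only pairs linked by at least one predicate are considered, sets are bitmasks, predicates are assumed independent, and the pair scan is a simple quadratic loop; the function name is illustrative.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]

def goo(card: List[float], sel: Dict[Edge, float]) -> Tuple[float, str]:
    """Greedy Operator Ordering: always perform the join with the smallest result."""
    # Each entry is (set of relations as a bitmask, estimated size, plan text).
    trees: List[Tuple[int, float, str]] = [
        (1 << i, card[i], f"R{i + 1}") for i in range(len(card))]
    total = 0.0  # Cout of the plan built so far
    while len(trees) > 1:
        best: Optional[Tuple[float, int, int]] = None  # (result size, index a, index b)
        for a in range(len(trees)):
            for b in range(a + 1, len(trees)):
                ma, mb = trees[a][0], trees[b][0]
                f, connected = 1.0, False
                for (i, j), s in sel.items():
                    if (ma >> i & 1 and mb >> j & 1) or (ma >> j & 1 and mb >> i & 1):
                        f *= s
                        connected = True
                if connected:  # never introduce a cross product
                    out = trees[a][1] * trees[b][1] * f
                    if best is None or out < best[0]:
                        best = (out, a, b)
        if best is None:
            raise ValueError("query graph is not connected")
        out, a, b = best
        joined = (trees[a][0] | trees[b][0], out, f"({trees[a][2]} ⋈ {trees[b][2]})")
        total += out
        trees = [t for idx, t in enumerate(trees) if idx not in (a, b)] + [joined]
    return total, trees[0][2]
```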
13/21
GOO-DP
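The step of GOO-DP that picks which part of the greedy plan to re-optimize can be sketched as below. Assumptions: plans are binary trees with precomputed result sizes, cost is Cout, and the chosen subtree is the most expensive join subtree with at most k relations; re-optimizing it (for example with LinearizedDP) and the budget loop are left out. The Plan class and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Plan:
    size: float                      # estimated result size of this (sub)plan
    name: str = ""                   # relation name for leaves
    left: Optional["Plan"] = None
    right: Optional["Plan"] = None

    def leaves(self) -> int:
        if self.left is None or self.right is None:
            return 1
        return self.left.leaves() + self.right.leaves()

    def cost(self) -> float:
        """Cout: sum of the result sizes of all joins in this subtree."""
        if self.left is None or self.right is None:
            return 0.0
        return self.size + self.left.cost() + self.right.cost()

def most_expensive_subtree(plan: Plan, k: int) -> Optional[Plan]:
    """Join subtree with at most k relations and maximal Cout (None if none exists)."""
    best: Optional[Plan] = None
    stack: List[Plan] = [plan]
    while stack:
        p = stack.pop()
        if p.left is None or p.right is None:
            continue  # a base relation has nothing to re-optimize
        if p.leaves() <= k:
            # Costs are non-negative, so no subtree of p can cost more than p itself.
            if best is None or p.cost() > best.cost():
                best = p
        else:
            stack.extend([p.left, p.right])
    return best
```

The pruning relies on Cout being monotone: once a subtree fits into k relations, descending further can only find cheaper candidates.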
14/21
Join ordering constructed with GOO
Image source: A New Heuristic for Optimizing Large Queries, Leonidas Fegaras
15/21
LinearizedDP++:
Adaptive algorithm that works for non-inner joins and cross joins
16/21
Experimental Evaluation
17/21
Some Details
• Total optimization time: sum of the time taken by all benchmark queries.
• Normalized cost: all costs are normalized by the cost of the best plan without cross products found.
• MinSel: orders joins by increasing selectivity
• DPSize: bushy trees (System R-style DP)
• DPSizeLinear: left-deep trees (System R-style DP)
• MILP: also considers cross products
• QuickPick: randomized algorithm
• Genetic: similar to the genetic algorithm used in PostgreSQL
18/21
Median optimization time for different queries
[Figure: Median Optimization Time for Random Tree Queries of Sizes 10–100 (100 queries per size)]
[Figure: Median Optimization Time for Random Tree Queries of Sizes 10–1000 (100 queries per size)]
19/21
Comparison with existing database systems
20/21
Conclusion
• Handles a wide range of queries.
• Can be used in commercial databases and supports all kinds of SQL queries.
• For queries with many relations, optimization time is much lower than in commercial databases.
• Using a better cost function than Cout can improve plan quality.
• If the query has fewer than 14 relations but its query graph is a clique, the complexity of DPHyp is O(3^13) (3^13 ≈ 1.6 million).
• With the addition of LinearizedDP++, optimization of large queries with non-inner joins becomes faster and yields better plans.
21/21
Questions
