
Adaptive Optimization of Very Large Join Queries
Thomas Neumann
Bernhard Radke
Conference : SIGMOD 2018
Authors

 Thomas Neumann
 Professor at the Technical University of Munich
 Thomas Neumann's group had 6 papers accepted at VLDB 2020
 The project on adaptive join ordering for large queries received funding from the European Research Council.

 Bernhard Radke, M.Sc.
 Technical University of Munich

Motivation for adaptive optimization

 Queries are becoming more complex (more relations and more joins)
 A database should perform equally well on complex queries
 PostgreSQL switches to a genetic algorithm (a heuristic approach) for queries with 12 or more relations
 Such a switch can cause an abrupt drop in query plan quality

Formalizing the problem

 Target: commercial databases (which also support non-inner joins, outer joins, etc.) should adapt and perform well for simple queries as well as for complex queries with a large number of relations.

Cost function (Cout)
• Sum of the sizes of the intermediate results
• Follows the ASI property

Query → Query Graph Generator → Query Graph (Q = (V, E)) → Query Optimizer
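A minimal Python sketch of Cout as described above: the cost of a join tree is the sum of the sizes of all intermediate (join) results, and base relations cost nothing. The tree encoding (nested tuples of relation names), the `card` and `sel` dictionaries, and the independence assumption used to estimate join sizes are illustrative choices, not taken from the slides.

```python
def relations(tree):
    """Return the set of base relations referenced by a join tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return relations(left) | relations(right)


def cardinality(tree, card, sel):
    """Estimate the result size of a join tree.

    Assumption: predicates are independent, so the size is the product of
    the input sizes and of the selectivities of all edges between the inputs.
    """
    if isinstance(tree, str):
        return float(card[tree])
    left, right = tree
    size = cardinality(left, card, sel) * cardinality(right, card, sel)
    for a in relations(left):
        for b in relations(right):
            size *= sel.get(frozenset((a, b)), 1.0)
    return size


def c_out(tree, card, sel):
    """Cout: sum of the sizes of all join results in the tree (leaves cost 0)."""
    if isinstance(tree, str):
        return 0.0
    left, right = tree
    return cardinality(tree, card, sel) + c_out(left, card, sel) + c_out(right, card, sel)
```

For example, with relations A (100 rows), B (200 rows), C (50 rows) and selectivities 0.01 on A-B and 0.1 on B-C, the tree `(("A", "B"), "C")` has join results of 200 and 1000 rows, so its Cout is about 1200, whereas `("A", ("B", "C"))` has join results of 1000 and 1000 rows, so its Cout is about 2000.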

Queries can be divided into three categories (depending on the query graph and the number of relations)

 Small Queries
 Medium Queries
 Large Queries
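A minimal Python sketch of how the three categories could be told apart, using the thresholds stated on the later slides (fewer than 14 relations, the countCC budget, and about 100 relations for LinearizedDP). The function name, the parameter names, and the returned labels are assumptions made for illustration. The slides give no value for the budget, so it is left as a parameter.

```python
def choose_algorithm(num_relations, num_connected_subgraphs, budget,
                     small_limit=14, medium_limit=100):
    """Pick a join-ordering strategy from the query size and graph shape.

    num_connected_subgraphs is the result of countCC(Q); budget is the
    (hypothetical) limit on how many subgraphs exact DP may explore.
    """
    # Small: few relations, or a graph shape with few connected subgraphs.
    if num_relations < small_limit or num_connected_subgraphs < budget:
        return "DPHyp"
    # Medium: the DP phase over a linearized search space is still affordable.
    if num_relations <= medium_limit:
        return "LinearizedDP"
    # Large: greedy plan first, then DP on the most expensive subplans.
    return "GOO-DP"
```

The point of the design is that the decision depends on the shape of the query graph and not only on the number of relations: a chain of 50 relations has few connected subgraphs and can still be handled exactly, while a clique of 50 relations cannot.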

Small Queries

 Queries with fewer than 14 relations.
 If the number of relations is greater than 13, use countCC(Q) to check the query graph against the budget.
 countCC(Q): counts the number of connected subgraphs of the given query graph Q.
 If the number of connected subgraphs is less than the budget, it is still a small query.
 Algorithm used: DPHyp
 DPHyp works on hypergraphs and supports non-inner joins, cross joins, and other such counterparts.

CountCC(Q)
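The slide's figure is not available here, so the following is only a sketch of the idea in Python, not the authors' pseudocode: count the connected subgraphs of a query graph and stop as soon as the count exceeds the budget. It is restricted to simple graphs (no hyperedges), encodes node sets as bitmasks, and uses the usual trick of growing each subgraph only by neighbors that are not yet excluded, so every connected subgraph is counted once. The function name and the adjacency-mask input format are assumptions.

```python
def count_cc(adj, budget):
    """Count connected subgraphs of a simple graph, capped at budget + 1.

    adj[v] is a bitmask of the neighbors of node v.
    """
    n = len(adj)
    count = 0

    def neighbors(s):
        # Union of the neighbors of all nodes in s, minus s itself.
        mask = 0
        for v in range(n):
            if (s >> v) & 1:
                mask |= adj[v]
        return mask & ~s

    def rec(s, excluded):
        nonlocal count
        nbh = neighbors(s) & ~excluded
        sub = nbh
        while sub:  # every non-empty subset of the neighborhood
            count += 1
            if count > budget:
                return
            rec(s | sub, excluded | nbh)
            if count > budget:
                return
            sub = (sub - 1) & nbh

    for i in reversed(range(n)):
        count += 1  # the single node {i}
        if count > budget:
            break
        # Exclude all nodes with index <= i so i is the smallest node of s.
        rec(1 << i, (1 << (i + 1)) - 1)
        if count > budget:
            break
    return count
```

A chain of n relations has n(n+1)/2 connected subgraphs and a clique has 2^n - 1, which is why the same number of relations can fall on either side of the budget.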

Basic Idea of Hypergraphs

 Hypergraph
 Subgraph
 Connected
 Connected Subgraph
 Connected Complement Subgraph
 CSG-CMP pair
 Neighborhood
 min(S) (and node ordering)

R1 < R2 < R3 < R4 < R5 < R6

Intuition of DPHyp

Image source: Dynamic Programming Strikes Back, by Guido Moerkotte and Thomas Neumann

Medium Queries

 DP cannot be used beyond a certain point because optimization becomes too expensive: the time complexity of DPHyp for clique queries is O(3^n).
 Whether DP becomes too expensive depends on the shape of the query graph.
 Linear query graphs are easier for join ordering than complex query graphs (cliques).
 Reason: complex query graphs have more connected subgraphs.
 Solution: linearize the search space (consider only connected sub-chains of a linear ordering).
 The IKKBZ algorithm is used for the linearization, and then a DP algorithm is used to find the optimal join tree within that ordering.

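IKKBZ Algorithm (IK/KBZ)

Named after: Toshihide Ibaraki, Tiko Kameda, Ravi Krishnamurthy, Haran Boral, Carlo Zaniolo

Rank used for ordering (under Cout): rank = (T - 1) / C, where T is the selectivity-scaled cardinality and C is the cost

Cyclic query graphs: IKKBZ requires an acyclic query graph, so it is run on a minimum spanning tree (MST) built by minimizing join selectivities

Time complexity: O(n^2)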

LinearizedDP

[Figure: query graph → linear order (left-deep tree) → LinearizedDP]
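The DP phase over a linearized search space can be sketched in Python as an interval DP: given a linear order of the relations, only contiguous sub-chains are considered, and two adjacent sub-chains are combined only if a join predicate connects them. This is a sketch under assumptions, not the authors' implementation: the linear order is taken as input (the slides obtain it from IKKBZ), the cost is Cout with independent selectivities, and the helper that looks up crossing predicates is deliberately naive, so the code does not reach the O(n^3) bound.

```python
import math


def linearized_dp(order, card, sel):
    """Best bushy tree over contiguous sub-chains of `order` under Cout.

    Returns (cost, tree); tree is a nested tuple of relation names.
    """
    n = len(order)
    cost = [[math.inf] * n for _ in range(n)]
    size = [[0.0] * n for _ in range(n)]
    tree = [[None] * n for _ in range(n)]
    for i, rel in enumerate(order):
        cost[i][i], size[i][i], tree[i][i] = 0.0, float(card[rel]), rel

    def crossing_selectivity(i, k, j):
        # Product of selectivities between order[i..k] and order[k+1..j];
        # None if no predicate connects the two sides (would be a cross product).
        product, connected = 1.0, False
        for a in order[i:k + 1]:
            for b in order[k + 1:j + 1]:
                key = frozenset((a, b))
                if key in sel:
                    product *= sel[key]
                    connected = True
        return product if connected else None

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point inside the sub-chain
                if math.isinf(cost[i][k]) or math.isinf(cost[k + 1][j]):
                    continue
                s = crossing_selectivity(i, k, j)
                if s is None:
                    continue
                out = size[i][k] * size[k + 1][j] * s
                total = cost[i][k] + cost[k + 1][j] + out
                if total < cost[i][j]:
                    cost[i][j], size[i][j] = total, out
                    tree[i][j] = (tree[i][k], tree[k + 1][j])
    return cost[0][n - 1], tree[0][n - 1]
```

There are O(n^2) sub-chains and O(n) split points per sub-chain, which is where the cubic cost of the DP phase comes from.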

Large Queries

 The problem with LinearizedDP is its O(n^3) DP phase, which is fine up to about 100 relations.
 Idea: greedy approach + DP (borrowed from iterative DP)
 Greedy algorithm used: Greedy Operator Ordering (GOO)
 GOO produces good bushy plans and runs efficiently
 Run LinearizedDP iteratively over subplans of size k (k = 100), each time choosing the subplan of size k with the maximum cost.
 LinearizedDP keeps running until the whole budget (size of the DP table) is used up.
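The greedy step can be sketched in Python as follows: GOO keeps a set of partial join trees and repeatedly merges the connected pair whose join result is smallest, which naturally yields bushy trees. This is a simplified illustration only. The data layout and names are assumptions, the pair search is a plain quadratic scan, and the subsequent DP refinement of expensive subplans is not shown.

```python
def goo(card, sel):
    """Greedy Operator Ordering: always join the pair with the smallest result.

    Returns (tree, estimated_size); tree is a nested tuple of relation names.
    """
    # Each entry: (tree, set of relations in the tree, estimated size).
    parts = [(rel, frozenset([rel]), float(c)) for rel, c in card.items()]
    while len(parts) > 1:
        best = None
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                product, connected = 1.0, False
                for a in parts[i][1]:
                    for b in parts[j][1]:
                        key = frozenset((a, b))
                        if key in sel:
                            product *= sel[key]
                            connected = True
                if not connected:
                    continue  # skip cross products
                out = parts[i][2] * parts[j][2] * product
                if best is None or out < best[0]:
                    best = (out, i, j)
        if best is None:
            raise ValueError("query graph is disconnected")
        out, i, j = best
        merged = ((parts[i][0], parts[j][0]), parts[i][1] | parts[j][1], out)
        parts = [p for k, p in enumerate(parts) if k not in (i, j)] + [merged]
    return parts[0][0], parts[0][2]
```

On a chain A-B-C-D with sizes 100, 200, 50, 10 and selectivities 0.01, 0.1, 0.5, the sketch first joins A and B (about 200 rows), then C and D (about 250 rows), and finally the two results, giving the bushy tree `(("A", "B"), ("C", "D"))`.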

GOO-DP

Join ordering constructed with GOO

Image source: A New Heuristic for Optimizing Large Queries, by Leonidas Fegaras

LinearizedDP++: the adaptive algorithm also works for non-inner joins and cross joins

Experimental Evaluation

Some Details

 Total optimization time: sum of the time taken by all benchmark queries.
 Normalized cost: all costs are normalized by the cost of the best plan without cross products that was found.
 Minsel: orders joins by increasing selectivity
 DPSize: bushy trees (System R)
 DPSizeLinear: left-deep trees (System R)
 MILP: considers cross products as well
 QuickPick: randomized algorithm
 Genetic: similar to the genetic algorithm used in PostgreSQL

Median optimization time for different queries

[Figure: Median Optimization Time for Random Tree Queries of Sizes 10–100 (100 queries per size)]
[Figure: Median Optimization Time for Random Tree Queries of Sizes 10–1000 (100 queries per size)]

Comparison with existing database systems

Conclusion

 Handles a wide range of queries.
 Can be used in commercial databases and supports all kinds of SQL queries.
 Optimization time is considerably lower than in commercial databases for queries with a large number of relations.
 Using a better cost function than Cout could improve plan quality.
 If the number of relations is less than 14 but the query graph is a clique, the complexity of DPHyp is O(3^13) in the worst case (3^13 = 1,594,323).
 Adding LinearizedDP++ should further speed up the optimization of large queries that contain non-inner joins, and yield better plans.

Questions
