Professional Documents
Culture Documents
05.query Optimization.2pp
05.query Optimization.2pp
05.query Optimization.2pp
Structure of a DBMS
Applications
Application
Web Forms Front Ends SQL Interface
1!
Steps in Processing a Query
SQL statement
Query
Query Optimizer
Optimizer
Execution plan
Runtime DB Processor
2!
Conceptual Evaluation Strategy
Represented as a:!
Query Tree!
3!
Query Tree
Query Tree
π S.sname
SELECT S.sname
FROM Student S, Enrolled E
S.sid = E.sid WHERE S.sid = E.sid
AND S.gpa > 2
σ S.gpa > 2 σ E.cid = CS2550 AND E.cid = CS2550;
S E
4!
Query Processing: Selections
Query Type!
Do we have range query, point query, conjunction?!
Statistics!
Selectivity… 0-1!
– # of selected tuples / cardinality!
Examples: what is the (estimated) selectivity?!
‒ σ sex=ʻmʼ (Employee) !
‒ σ ssn=ʻ123456789ʼ (Employee)!
‒ σ salary_level=30000 (Employee)!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 10!
5!
Implementations for Selection
A1: linear search (brute force)!
Full scan!
A2: binary search!
Assume file is ordered on attribute !
A3: using primary index (or hash key)!
Equality on key attribute!
A4: using primary/clustering index – multiple records!
Equality on non-key attribute!
6!
Implementations for Selection…
How to handle conjunction (AND) / disjunction (OR)?!
External Sorting
7!
Sort-Merge
Example:!
Sort-Merge
Example: !
min(4, 205) then Dm = 4!
8!
Sort-Merge
Passes:!
Example: Dm = 4 then 205 runs to 52, 13, 4, 1!
Number of passes = logDm Nr!
J4: Hash-join!
9!
J1: Nested-loop join
10!
J2: Single-loop join
11!
J2: Single-loop join performance
Rule 1: Smaller file should be the outer join loop!
Rule 2: The file that has high JSF should be the outer join
loop !
12!
J4: Hash-join
Join relations R and S!
A is the common attribute in R,
B is the common attribute in S!
J4: Hash-join!
Partition Hash Join!
Hybrid Hash Join!
13!
Partition Hash Join
Join relations R and S!
Partitioning Phase!
Partition hash function!
R into M partitions: R1, R2, …, RM!
S into M partitions: S1, S2, …, SM!
– IDEA: Ri only needs to be joined with Si!
14!
Hybrid Hash Join
How ?!
Push selections down the tree!
Identify Joins!
Push project operations down the tree!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 30!
15!
Example of Query Tree
Π P.PNumber, P.DNum, E.Last, E.Address, E.DOB!
x!
x!
P!
E! D!
D.MgrSSN=E.SSN
P.DNum=D.DNumber E!
σ P.Location = “PGH”!
D!
P!
16!
General Transformation Rules - 1
Cascade of σ
σc1 AND c2 AND … AND Cn (R) = σc1 (σc2 ( … (σcn (R)) … ))!
Commutativity of σ
σc1 ( σc2 (R)) = σc2 ( σc1 (R)) !
Cascade of Π
Πlist1 (Πlist2 (… (ΠlistN (R)) … )) = Πlist1 (R)!
Commuting σ with Π
ΠA1, A2, …, An (σc(R)) = σc (ΠA1, A2, …, An (R))!
17!
General Transformation Rules - 3
Associativity of join, x, , !
(R * S) * T = R * (S * T)!
– where * can be any of join, x, , #
Commuting of P operation:!
ΠL (R S) = (ΠL (R)) (ΠL (S))!
18!
General Transformation Rules - 5
Other transformations:!
Any boolean transformation can still be applied,
e.g.:!
– not (C1 and C2) = (not C1) or (not C2)!
– not (C1 or C2) = (not C1) and (not C2)!
x!
x!
P!
E! D!
19!
Example of Query Tree (Step 2)
Π P.PNumber, P.DNum, E.Last, E.Address, E.DOB!
σ P.DNum=D.DNumber!
x!
x!
P!
E! D!
P.DNum=D.DNumber!
σ P.Location = “PGH”!
D.MgrSSN=E.SSN!
E! D! P!
20!
Example of Query Tree (Step 4)
D.MgrSSN=E.SSN!
P.DNum=D.DNumber! E!
σ P.Location = “PGH”! D!
P!
D.MgrSSN=E.SSN!
Π P.PNumber, P.Dnum!
E!
σ P.Location = “PGH”! D!
P!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 42!
21!
Example of Query Tree (Step 5½)
Π P.PNumber, P.DNum, E.Last, E.Address, E.DOB!
D.MgrSSN=E.SSN!
Π P.PNumber, P.Dnum!
E!
Π D.MgrSSN, D.DNumber!
σ P.Location = “PGH”!
P! D!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 43!
Π P.PNumber, P.Dnum! E!
σ P.Location = “PGH”! Π D.MgrSSN, D.DNumber!
P! D!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 44!
22!
Outline of algebraic optimization
1. Break up selections (with conjunctive conditions) into
a cascade of selection operators!
2. Push selection operators as far down in the tree as
possible!
3. Convert Cartesian products into joins!
4. Rearrange leaf nodes to:!
Execute first the most restrictive select operators!
– What is restrictive? (Fewest tuples or Smallest size)!
Make sure we donʼt have Cartesian products!
5. Move projections as far down as possible!
6. Identify subtrees that represent groups of operations
which can be executed by single algorithm!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 45!
Q: Is single-operator-at-a-time appropriate?!
Materialized evaluation or processing !
A: NO. Why?!
A: We would need temporary relations (and extra disk
space) to store intermediate results!
23!
Cost-based query optimization
Compiled queries VS interpreted queries!
Cost-based query optimization!
Full-scale optimization, taking costs & selectivity into
account to produce the lowest estimated cost plan!
Usually happens only for compiled queries!
Costs:!
Access cost to secondary storage!
Storage cost (for intermediate results)!
Computation cost !
Memory usage cost!
Communication cost (data/query shipping)!
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 47!
24!
Cost Functions for Selection
A1: linear search (brute force)!
B/2 if found, B otherwise!
A2: binary search!
log2B if unique key!
log2B + s/bfr -1 if not key!
A3: using primary index (or hash key)!
1 for static or linear hashing, 2 for extensible!
Levels + 1!
A4: using primary/clustering index – multiple records!
Levels + (B/2) for primary index!
Levels + (s/bfr) for clustering index !
CS1655, Alexandros Labrinidis -- University of Pittsburgh! 49!
25!
Multiple relation queries with joins
A!
D! B!
C! C! D!
A! B!
A! B! C! D!
26!