Query Optimizer

Prepared by Harish Patnaik

School of Computer Engineering, KIIT Deemed to be University


Content

1. Steps for query optimization
2. Translating queries into algebra
3. Relational algebra equivalences
4. Cost estimation of query plan
5. Estimation of result size
Query Optimizer
• Steps for query optimization:
✓ Queries are converted into blocks
✓ Blocks are translated into relational algebra
expressions
✓ Alternative plans for evaluating these expressions
are enumerated
✓ The cost of each plan is estimated and the plan
with the lowest cost is chosen
Converting into blocks

• SQL queries are optimized by decomposing them
into smaller blocks and then optimizing each block

• A block is an SQL query with no nesting: it has exactly
one SELECT clause and one FROM clause, and at most
one each of the WHERE, GROUP BY, and HAVING
clauses
Translating Queries into Algebra
Sailors(sid, sname, rating, age)
Boats (bid, bname, color)
Reserves (sid, bid, day, rname)
• For each sailor with the highest rating and at least two
reservations for red boats, find the sailor id and the
earliest date on which the sailor has a reservation for a
red boat.
• SELECT S.sid, MIN(R.day)
  FROM Sailors S, Reserves R, Boats B
  WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'Red'
    AND S.rating = (SELECT MAX(S2.rating) FROM Sailors S2)
  GROUP BY S.sid
  HAVING COUNT(*) > 1;
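The query above contains two blocks: the nested block that computes the maximum rating, and the outer block that uses it. A minimal sketch of evaluating them separately, using Python's built-in sqlite3 module and a few made-up rows (the sample data and the variable names are assumptions for illustration, not part of the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Boats (bid INTEGER, bname TEXT, color TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT, rname TEXT);
    -- Made-up sample rows, for illustration only
    INSERT INTO Sailors VALUES (1, 'Ann', 10, 30.0), (2, 'Bob', 7, 41.0), (3, 'Cy', 10, 25.0);
    INSERT INTO Boats VALUES (101, 'Dart', 'Red'), (102, 'Gull', 'Red'), (103, 'Wren', 'Blue');
    INSERT INTO Reserves VALUES
        (1, 101, '2024-05-02', 'r1'), (1, 102, '2024-04-11', 'r2'),
        (2, 101, '2024-03-01', 'r3'), (2, 102, '2024-03-09', 'r4'),
        (3, 101, '2024-06-20', 'r5'), (3, 103, '2024-01-15', 'r6');
""")

# Nested block: no nesting of its own, one SELECT and one FROM clause.
(max_rating,) = conn.execute("SELECT MAX(S2.rating) FROM Sailors S2").fetchone()

# Outer block: the nested block is replaced by the value it produced.
outer_block = """
    SELECT S.sid, MIN(R.day)
    FROM Sailors S, Reserves R, Boats B
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'Red'
      AND S.rating = ?
    GROUP BY S.sid
    HAVING COUNT(*) > 1
"""
two_step = conn.execute(outer_block, (max_rating,)).fetchall()

# The original nested query, for comparison.
nested = conn.execute(
    outer_block.replace("?", "(SELECT MAX(S2.rating) FROM Sailors S2)")
).fetchall()

print(max_rating, two_step, nested)
```

On this sample data both forms return the same rows, because the nested block does not refer to the outer block and so yields a single value.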
Translating Queries into Algebra
• Relational algebra operators for the clauses of the first block:
SELECT → π (projection)
WHERE → σ (selection)
FROM → × (cross product)

• The resulting relational algebra expression is called a σπ×
expression

• The optimizer finds the best plan for the σπ× expression. The
GROUP BY and HAVING clauses are then applied to the result of
that plan.
Relational algebra equivalences
• An optimizer enumerates plans by applying
several equivalences between relational
algebra expressions.
• Selections
• σc1∧c2∧...∧cn(R) ≡ σc1(σc2(...(σcn(R)))) (Cascade)
• σc1(σc2(R)) ≡ σc2(σc1(R)) (Commutative)
• Projections
• Successively eliminating columns from a relation is
equivalent to simply eliminating all but the columns
retained by the final projection
πa1(R) ≡ πa1(πa2(...(πan(R)))) (Cascade), where each ai is a set of attributes and ai ⊆ ai+1
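The selection and projection equivalences can be checked on a toy relation. The sketch below models a relation as a list of dictionaries; the helper names `select` and `project` and the sample rows are assumptions made for this illustration.

```python
def select(pred, rel):
    """Selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def project(cols, rel):
    """Projection: keep only cols, eliminating duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out

# Made-up sample rows
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 10, "age": 30},
    {"sid": 2, "sname": "Bob", "rating": 7, "age": 41},
    {"sid": 3, "sname": "Cy", "rating": 10, "age": 25},
]
c1 = lambda r: r["rating"] > 8
c2 = lambda r: r["age"] < 28

# Cascade and commutativity of selections
combined = select(lambda r: c1(r) and c2(r), sailors)
cascade = select(c1, select(c2, sailors))
swapped = select(c2, select(c1, sailors))

# Cascade of projections: {sid} is a subset of {sid, rating}
direct = project(["sid"], sailors)
cascaded = project(["sid"], project(["sid", "rating"], sailors))

print(combined == cascade == swapped, direct == cascaded)
```

All three selection forms keep the same rows, and projecting through an intermediate superset of columns gives the same result as projecting directly.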
Relational algebra equivalences
• Cross-Product and Joins
• R × S ≡ S × R and R ⋈ S ≡ S ⋈ R (Commutative)
• R × (S × T) ≡ (R × S) × T and
R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T (Associative)
• When joining several relations, we are free to join
the relations in any order we choose
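Because every join order yields the same result, an optimizer can pick the order that is cheapest to evaluate. The sketch below enumerates the orders for three relations and compares them. The cardinalities, the join reduction factors, and the cost measure (the sum of estimated intermediate result sizes, used here as a simple stand-in for page I/Os) are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical cardinalities (number of tuples)
CARD = {"Sailors": 40000, "Reserves": 100000, "Boats": 100}

# Hypothetical reduction factors of the join conditions.
# Sailors and Boats share no join condition, so joining them is a cross product.
RF = {
    frozenset({"Sailors", "Reserves"}): Fraction(1, 40000),
    frozenset({"Reserves", "Boats"}): Fraction(1, 100),
}

def intermediate_sizes(order):
    """Estimated result size after each join, joining relations left to right."""
    joined = [order[0]]
    size = CARD[order[0]]
    sizes = []
    for rel in order[1:]:
        size *= CARD[rel]
        for prev in joined:
            size *= RF.get(frozenset({prev, rel}), 1)
        joined.append(rel)
        sizes.append(size)
    return sizes

def plan_cost(order):
    """Stand-in cost: total number of tuples produced by all joins in the plan."""
    return sum(intermediate_sizes(order))

best = min(permutations(CARD), key=plan_cost)
for order in permutations(CARD):
    print(order, plan_cost(order))
print("cheapest:", best)
```

Every order ends with the same estimated final size, but the orders that start by pairing Sailors with Boats pay for a large cross product first.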
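Relational algebra equivalences
• Select, Project and Join
• πa(σc(R)) ≡ σc(πa(R)) (Commute), provided c involves only
attributes retained by the projection a
• R ⋈c S ≡ σc(R × S) (a join is a selection on a cross-product)
• If the selection condition involves only attributes of
one of the arguments of the cross-product or join:
σc(R × S) ≡ σc(R) × S and σc(R ⋈ S) ≡ σc(R) ⋈ S
• A selection can be replaced by a cascade of selections:
σc(R × S) ≡ σc1∧c2∧c3(R × S) ≡ σc1(σc2(σc3(R × S)))
≡ σc1(σc2(R) × σc3(S))
where c = c1 ∧ c2 ∧ c3, c1 involves attributes of both R and S,
c2 involves only attributes of R, and c3 only attributes of S
• πa(R × S) ≡ πa1(R) × πa2(S) (Commute)
where a1 is the subset of a in R and a2 is the subset of a in S
• πa(R ⋈c S) ≡ πa1(R) ⋈c πa2(S), provided every attribute in c
appears in a
• πa(R ⋈c S) ≡ πa(πa1(R) ⋈c πa2(S))
where a1 and a2 are the attributes of R and S that appear in a or c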
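The practical value of these rules is that a selection can be applied before a cross-product or join, shrinking its input. A small sketch on made-up data, with relations as lists of dictionaries (the helper names and the prefixed column names `r_sid` and `r_bid` are assumptions for this illustration):

```python
from itertools import product

def cross(r, s):
    """Cross-product of two relations with disjoint column names."""
    return [{**x, **y} for x, y in product(r, s)]

def select(pred, rel):
    return [row for row in rel if pred(row)]

# Made-up sample rows
boats = [
    {"bid": 101, "color": "Red"},
    {"bid": 102, "color": "Red"},
    {"bid": 103, "color": "Blue"},
]
reserves = [
    {"r_sid": 1, "r_bid": 101},
    {"r_sid": 1, "r_bid": 102},
    {"r_sid": 2, "r_bid": 103},
    {"r_sid": 3, "r_bid": 101},
]

is_red = lambda row: row["color"] == "Red"            # involves only Boats
same_boat = lambda row: row["bid"] == row["r_bid"]    # involves both relations

# Selection applied after the cross-product
naive = select(lambda row: same_boat(row) and is_red(row), cross(boats, reserves))

# Selection on Boats pushed below the cross-product
red_boats = select(is_red, boats)
pushed = select(same_boat, cross(red_boats, reserves))

print(naive == pushed)
print(len(cross(boats, reserves)), "vs", len(cross(red_boats, reserves)), "intermediate rows")
```

Both forms return the same rows, but the pushed-down form builds a smaller intermediate cross-product.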
Cost estimation of query plan
• Cost estimation is required for each enumerated
plan.
• For each node in the tree, we must estimate the cost
of performing the corresponding operation. Costs are
significantly affected by whether pipelining is used or
temporary relations are created to pass results to the parent.
• For each node, we must estimate the size of the
result and whether it is sorted. This result is the input
for the operation of the parent of the current node.
• The number of page I/Os is used as the unit of cost.
Estimation of result size
• Size estimation plays an important role in cost
estimation as output of one operator can be the input
to another operator and the cost of an operator
depends on input size.
Ex: SELECT attr_list FROM rel_list WHERE term1 ∧ ... ∧ termn
• The maximum number of tuples in the result of the
query is the product of the cardinalities of relations in
the FROM clause. Every term of WHERE clause
eliminates some of the potential result tuples.
• The actual size of the result can be estimated as the
maximum size times the product of the reduction
factors for the terms in WHERE clause.
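The size estimate described above is a product of two products. A minimal sketch, where the function name and the example numbers are assumptions for illustration:

```python
from fractions import Fraction
from math import prod

def estimate_result_size(cardinalities, reduction_factors):
    """Maximum size (product of cardinalities) times the product of reduction factors."""
    max_size = prod(cardinalities)
    return max_size * prod(reduction_factors)

# Hypothetical query: two relations of 1000 and 50 tuples,
# with two WHERE terms whose reduction factors are 1/50 and 1/10.
estimate = estimate_result_size([1000, 50], [Fraction(1, 50), Fraction(1, 10)])
print(estimate)
```

Here the maximum size is 1000 × 50 = 50000 tuples, and the two terms reduce it by a factor of 500 overall. Multiplying the factors together treats the terms as independent of one another.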
Computation of reduction factor
• column = value: the reduction factor can be approximated by
1/NKeys(I) if there is an index I on column for the relation in
question.
NKeys(I) - number of distinct key values for index I
✓ If there is no index on column, the System R optimizer
arbitrarily assumes that the reduction factor is 1/10
• column1 = column2: the reduction factor can be approximated by
1/MAX(NKeys(I1), NKeys(I2)) if I1 and I2 are the indexes on
column1 and column2 respectively.
✓ If only one of the two columns has an index I, the reduction
factor is 1/NKeys(I)
✓ If neither column has an index, the reduction factor is 1/10
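The equality rules above can be written as two small functions. In this sketch the function names are assumptions, and a missing index is represented by passing `None` for its NKeys value:

```python
from fractions import Fraction
from typing import Optional

DEFAULT_RF = Fraction(1, 10)  # the arbitrary 1/10 used when no index is available

def rf_column_equals_value(nkeys: Optional[int]) -> Fraction:
    """Reduction factor for 'column = value'; nkeys is NKeys(I), or None without an index."""
    return Fraction(1, nkeys) if nkeys else DEFAULT_RF

def rf_column_equals_column(nkeys1: Optional[int], nkeys2: Optional[int]) -> Fraction:
    """Reduction factor for 'column1 = column2'."""
    known = [n for n in (nkeys1, nkeys2) if n]
    # Both indexes: 1/MAX(NKeys(I1), NKeys(I2)); one index: 1/NKeys(I); none: 1/10
    return Fraction(1, max(known)) if known else DEFAULT_RF

print(rf_column_equals_value(5))             # index with 5 distinct key values
print(rf_column_equals_value(None))          # no index
print(rf_column_equals_column(40000, 100))   # both columns indexed
print(rf_column_equals_column(None, 100))    # only one column indexed
```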
Computation of reduction factor (cont.)
• column > value: the reduction factor is approximated by
(High(I)−value) / (High(I)−Low(I)) if there is an index I on
column
where High(I) - highest value in index I
Low(I) - lowest value in index I
✓ If the column is not of an arithmetic type or there is no
index, a fraction less than half is chosen
• column IN (list of values): the reduction factor is the
reduction factor for 'column = value' multiplied by the
number of items in the list
Assumption: values are uniformly distributed
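A sketch of the range and IN rules follows. The function names are assumptions. So is the fallback of 1/3, which merely stands in for "a fraction less than half". The clamping of results to the range 0 to 1 is also an addition here, so that out-of-range values or long lists cannot produce an impossible factor.

```python
from fractions import Fraction

def rf_column_greater_than(value, low=None, high=None, fallback=Fraction(1, 3)):
    """Reduction factor for 'column > value'.

    low/high are Low(I)/High(I); pass None when there is no index
    or the column is not of an arithmetic type.
    """
    if low is None or high is None or high == low:
        return fallback  # assumed stand-in for "a fraction less than half"
    rf = Fraction(high - value) / Fraction(high - low)
    return min(max(rf, Fraction(0)), Fraction(1))  # clamp (an addition in this sketch)

def rf_column_in_list(n_items, nkeys=None):
    """Reduction factor for 'column IN (list)': the equality factor times the list length."""
    rf_equals = Fraction(1, nkeys) if nkeys else Fraction(1, 10)
    return min(rf_equals * n_items, Fraction(1))  # clamp (an addition in this sketch)

print(rf_column_greater_than(8, low=1, high=10))
print(rf_column_greater_than("M"))
print(rf_column_in_list(3, nkeys=10))
```

For example, with ratings indexed between 1 and 10, the term `rating > 8` is estimated to keep (10 − 8)/(10 − 1) = 2/9 of the tuples.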
Computation of reduction factor (cont.)

• Example: SELECT sal FROM employee WHERE sal = 100000

Number of employees = 200
Number of distinct sal values = 5
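Working this example through, and assuming there is an index on sal so that NKeys is the number of distinct sal values, the estimate can be compared with an actual count on synthetic data. The salary values and the perfectly uniform assignment below are made up for illustration.

```python
from fractions import Fraction

n_employees = 200
n_distinct_sal = 5

# 'sal = 100000' is a 'column = value' term: reduction factor 1/NKeys(I)
reduction_factor = Fraction(1, n_distinct_sal)
estimate = n_employees * reduction_factor

# Synthetic table in which the 5 salary values are spread uniformly
salaries = [60000, 70000, 80000, 90000, 100000]
employee = [salaries[i % n_distinct_sal] for i in range(n_employees)]
actual = sum(1 for sal in employee if sal == 100000)

print(reduction_factor, estimate, actual)
```

The estimate is 200 × 1/5 = 40 tuples. It matches the actual count here only because the synthetic data is exactly uniform; with skewed salaries the estimate and the true count can differ.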
