Query Optimization

These slides are prepared based on the Lecture Notes of CS186, Berkeley 2021
Yücel Saygın
Sample SQL Query
SELECT sname, S.sid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid AND date = '28.05.2024'
Let's write the RA expression and an execution plan

Different Execution Plans
• Many execution plans!
• Which one to choose?
• What are the criteria for choosing an execution plan?
• Is it possible to find the optimal query execution plan?
Heuristics and Estimation
• Estimation of the Cost of a given execution plan.
• The selectivity of an operator is an approximation for the
percentage of pages it will return (or pass to the next
operator).
• Why is it important to estimate the selectivity of an
operator?
Selectivity of some operators
• Ex. rating = 5
• With uniform distribution assumption, what would be the
selectivity of rating = 5 assuming that rating is between 1
and 10?
• In general selectivity of X=a: 1/(unique vals in X)
• Ex. rating > 5
• What would be the selectivity of the above condition?
• In general selectivity of X>a :
(max(X)- a) / (max(X)- min(X) + 1)
• Ex. Sailors.sid = Reserves.sid
• What would be the selectivity of the above condition?
• In general selectivity of X=Y :
1/max(unique vals in X, unique vals in Y)
• Ex. rating = 5 AND age = 20
• What would be the selectivity of the above condition
assuming that the age is between 0 and 99?
• In general selectivity of cond1 AND cond2:
Selectivity(cond1) * Selectivity(cond2)
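The formulas above can be sketched in Python as follows. This is a minimal illustration under the uniform distribution assumption and for integer-valued columns; the function names are hypothetical and not part of any real optimizer. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def sel_equals(distinct_values: int) -> Fraction:
    """Selectivity of X = a: 1 / (unique vals in X)."""
    return Fraction(1, distinct_values)

def sel_greater(a: int, lo: int, hi: int) -> Fraction:
    """Selectivity of X > a for integer X with min(X) = lo and max(X) = hi."""
    raw = Fraction(hi - a, hi - lo + 1)
    # Clamp to [0, 1] in case a lies outside the column's range.
    return min(max(raw, Fraction(0)), Fraction(1))

def sel_join_equals(distinct_x: int, distinct_y: int) -> Fraction:
    """Selectivity of X = Y: 1 / max(unique vals in X, unique vals in Y)."""
    return Fraction(1, max(distinct_x, distinct_y))

def sel_and(*selectivities: Fraction) -> Fraction:
    """Selectivity of cond1 AND cond2 AND ...: product of the selectivities
    (this assumes the conditions are independent)."""
    result = Fraction(1)
    for s in selectivities:
        result *= s
    return result

# The examples from the slide: rating in 1..10, age in 0..99.
rating_eq_5 = sel_equals(10)             # 1/10
rating_gt_5 = sel_greater(5, 1, 10)      # (10 - 5) / (10 - 1 + 1) = 1/2
age_eq_20 = sel_equals(100)              # 1/100
both = sel_and(rating_eq_5, age_eq_20)   # 1/10 * 1/100 = 1/1000

print(rating_eq_5, rating_gt_5, both)
```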
Selectivity of Joins
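• If we join tables A and B on the condition A.id = B.id, the selectivity of the join is:
1/max(unique vals in A.id, unique vals in B.id)
• The estimated number of records in the join result is therefore:
|A|∗|B|/max(unique vals in A.id, unique vals in B.id)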
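As a rough sketch, the expression |A|∗|B|/max(...) above can be computed as below; it yields an estimated record count for the join result. The function name and the table statistics are hypothetical values chosen only for illustration.

```python
def estimated_join_size(card_a: int, card_b: int,
                        distinct_a: int, distinct_b: int) -> float:
    """Estimated number of records produced by joining A and B on A.id = B.id.

    card_a, card_b: number of records |A| and |B|.
    distinct_a, distinct_b: number of unique values in A.id and B.id.
    """
    return card_a * card_b / max(distinct_a, distinct_b)

# Hypothetical statistics: 1,000 sailors with 1,000 distinct sids,
# 10,000 reservations that mention 800 distinct sids.
size = estimated_join_size(1_000, 10_000, 1_000, 800)
print(size)
```

With these assumed numbers the estimate is 10,000 records, far below the 10,000,000 records of the full cross product.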
Basic Heuristics
• There are many possible query plans for a reasonably complex
query
• We need some way to reduce the number of plans that we
actually consider
1. Push down projects (π) and selects (σ) as far as they can go
down
2. Only consider left deep plans
3. Do not consider cross products unless they are the only option
Benefits of Pushing down
selection and projection operators
• Pushing down selection is an obvious choice.

• How about the projection operator?
• Can you push down all projection operators?
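To make the benefit of pushing down selection concrete, here is a small sketch that runs the sample query over tiny in-memory tables in two ways: join first and select afterwards, or select first and join afterwards. The table contents and helper names are made up for illustration; a real system works on pages, not Python lists.

```python
# Hypothetical data. Sailors: (sid, sname). Reserves: (sid, bid, date).
sailors = [(1, "Ann"), (2, "Bob"), (3, "Cem")]
reserves = [
    (1, 101, "28.05.2024"),
    (1, 102, "01.06.2024"),
    (2, 101, "28.05.2024"),
    (3, 103, "02.06.2024"),
]
TARGET_DATE = "28.05.2024"

def join_on_sid(left, right):
    """Nested-loop join on the first column (sid) of both inputs."""
    return [l + r for l in left for r in right if l[0] == r[0]]

# Plan 1: join, then select on date, then project (sname, sid).
# Joined tuples look like (sid, sname, sid, bid, date).
joined = join_on_sid(sailors, reserves)
plan1 = [(t[1], t[0]) for t in joined if t[4] == TARGET_DATE]

# Plan 2: push the selection below the join, then join and project.
filtered = [r for r in reserves if r[2] == TARGET_DATE]
joined_after_filter = join_on_sid(sailors, filtered)
plan2 = [(t[1], t[0]) for t in joined_after_filter]

print(plan1 == plan2)                         # both plans give the same answer
print(len(joined), len(joined_after_filter))  # sizes of the intermediate results
```

Both plans return the same rows, but the second one feeds fewer tuples into the join. The sketch also hints at the answer to the projection question: a projection can only be pushed down if it keeps every column that a later operator still needs, such as sid for the join.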
Considering left deep plans
• Aim: reducing the search space and pipelining

• Which one of the above allows pipelining?
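To get a feel for how much the left-deep restriction shrinks the search space, the sketch below counts join orders for n tables. It counts tree shapes and table orderings only, ignoring the choice of join algorithm and the cross-product heuristic; the function names are hypothetical.

```python
import math

def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders for n tables: n!."""
    return math.factorial(n)

def all_join_trees(n: int) -> int:
    """Number of binary join trees (any shape, left/right order matters)
    over n tables: n! * Catalan(n - 1)."""
    catalan = math.comb(2 * (n - 1), n - 1) // n
    return math.factorial(n) * catalan

for n in (3, 4, 5, 8):
    print(n, left_deep_orders(n), all_join_trees(n))
```

For 4 tables this gives 24 left-deep orders against 120 trees overall, and the gap widens quickly as n grows.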
Avoiding Cross Products
• Cross products are the worst: A × B produces |A|∗|B| records, usually far more than a join with a condition.
System R Query Optimizer
• System R uses all the heuristics described in the previous
slides.
• The first pass of System R determines how to access
tables optimally or interestingly (we will define what we
mean by interesting later on).
• We have two options for how to access a table during the
first pass:
1. Full Scan
2. Index Scan (for every index the table has built on it)
• Cost of Full Scan for a table A is [A] I/O operations since
it needs to read in every page.
• For an index scan, the number of I/O operations
depends on how the records are stored and whether or
not the index is clustered.
Let's recall the alternatives for
Data Entry k* in Index
Three alternatives:
1. Data record with key value k
2. <k, rid of data record with search key value k>
3. <k, list of rids of data records with search key k>
• If the data entries are the data records themselves (i.e. Alternative 1), then an index scan has an I/O cost of:
(cost to reach level above leaf) + (num leaves read)
Example
• Table A has [A] pages
• There is an alternative 1 index built on C1 of height 2
• There are 2 conditions in our query: C1 > 5 and C2 < 6
• C1 and C2 both have values in the range 1-10
• What is the selectivity of C1 > 5 and C2 < 6 ?

• The selectivity of C1 > 5 is (10 - 5) / (10 - 1 + 1) = 0.5
• We can do an index scan for C1: 2 I/O operations to read
the internal nodes (since the index is of height 2), then
we read half of the leaf pages, adding 0.5[A] I/O
operations, for a total of 2 + 0.5[A]
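The Alternative 1 cost formula used in this example can be sketched as below. The function name and the page count of 100 are assumptions for illustration; only the C1 condition is used to drive the index, since the index is built on C1, and the C2 condition would be checked on the records as they are read.

```python
def alt1_index_scan_cost(height: int, selectivity: float, num_pages: int) -> float:
    """I/O cost of an Alternative 1 index scan:
    (cost to reach level above leaf) + (num leaves read).

    The leaves hold the data records themselves, so the number of leaves
    read is approximated as selectivity * [A].
    """
    return height + selectivity * num_pages

pages_a = 100                                      # hypothetical [A]
index_cost = alt1_index_scan_cost(2, 0.5, pages_a)  # 2 + 0.5 * [A]
full_scan_cost = pages_a                            # a full scan reads every page

print(index_cost, full_scan_cost)
```

Under these assumed numbers the index scan costs 52 I/Os against 100 for the full scan.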
• For alternative 2 and 3 indexes, the formula is a little different:
(cost to reach level above leaf) +
(num of leaf nodes read) +
(num of data pages read)
For alternatives 2 and 3:
The index could be Clustered or Unclustered
[Figure: clustered vs. unclustered index. In both cases the index entries direct the search for data entries in the index file, and the data entries point to the data records in the data file.]

Example
• Table B with [B] data pages and |B| records
• Alt 2 index on column C1, with a height of 2 and [L] leaf
pages
• There are two conditions: C1 > 5 and C2 < 6
• If the index is clustered, the scan takes 2 I/Os to reach the
index node above the leaf level, then reads 0.5[L] leaf pages,
and then 0.5[B] data pages. The total is 2 + 0.5[L] + 0.5[B].
• If the index is unclustered, matching records can be scattered
across the data file, so we assume one data page read per matching
record: 0.5|B| I/Os instead of 0.5[B]. The total number of I/Os
is 2 + 0.5[L] + 0.5|B|.
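The two totals above can be compared with a short sketch. The function name and the statistics ([L] = 50, [B] = 100, |B| = 1000) are hypothetical and chosen only to show how the clustered and unclustered cases diverge.

```python
def alt2_index_scan_cost(height: int, selectivity: float, leaf_pages: int,
                         data_pages: int, num_records: int,
                         clustered: bool) -> float:
    """I/O cost of an Alternative 2 (or 3) index scan:
    (cost to reach level above leaf) + (num leaf nodes read) + (num data pages read).

    Clustered: matching records sit on adjacent pages, so we read
    selectivity * [B] data pages.
    Unclustered: assume one page read per matching record, so we read
    selectivity * |B| data pages.
    """
    leaves_read = selectivity * leaf_pages
    data_read = selectivity * (data_pages if clustered else num_records)
    return height + leaves_read + data_read

# Hypothetical statistics for table B.
L_PAGES, B_PAGES, B_RECORDS = 50, 100, 1000

clustered_cost = alt2_index_scan_cost(2, 0.5, L_PAGES, B_PAGES, B_RECORDS, True)
unclustered_cost = alt2_index_scan_cost(2, 0.5, L_PAGES, B_PAGES, B_RECORDS, False)
full_scan_cost = B_PAGES

print(clustered_cost, unclustered_cost, full_scan_cost)
```

With these assumed numbers the clustered scan costs 77 I/Os, the unclustered scan 527, and a full scan 100, so an unclustered index can be worse than simply scanning the table when the selectivity is not small.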
