Chapter 1: Query Processing and Optimization: Slides By: Ms. Shree Jaswal
ADMT chp1
• The slides in this presentation were prepared with reference to the above-mentioned author’s slides.
The material in this presentation belongs to St. Francis Institute of Technology and is solely for educational purposes. Distribution and modifications of the content is prohibited.
Topics to be covered
• Overview: Introduction, Query processing in DBMS,
Steps of Query Processing
• Measures of Query Cost: Selection Operation, Sorting,
Join Operation, Other Operations
• Evaluation of Expressions.
• Self-learning Topics: Solve problems on query
optimization.
Prerequisite
Topics
• Reviewing basic concepts of a Relational database,
• SQL concepts
Relational Database
• Entity
• Relationship
AGGREGATE FUNCTIONS (these compute a summary of
information: for example, SUM, COUNT, AVG, MIN, MAX)
Examples on Relational Algebra Operations
• Consider the following schema
58 Rusty 10 35.0
58 103 11/12/96 8
sname rating
Yuppy 9
Rusty 10
Set Operations
• Union (Either-or) : Two relation instances are said to be union-compatible if the
following conditions hold:
they have the same number of the fields,
and corresponding fields, taken in order from left to right, have the same domains.
• S1 ∪ S2
• Is R1 ∪ S1 possible?
• No
• not a valid operation because the two relations are not union-compatible.
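The two union-compatibility conditions can be checked mechanically. The following is a minimal sketch, assuming a schema is represented as a list of (field name, domain) pairs. The field names `bid`, `day` and `age` and the domain labels are illustrative assumptions inferred from the sample rows, not taken from the slides.

```python
def union_compatible(schema_a, schema_b):
    """Return True if both schemas have the same number of fields and
    corresponding fields (left to right) have the same domain."""
    if len(schema_a) != len(schema_b):
        return False
    return all(dom_a == dom_b
               for (_, dom_a), (_, dom_b) in zip(schema_a, schema_b))

# Assumed schemas: field names and domain labels are illustrative only.
S1 = [("sid", "int"), ("sname", "str"), ("rating", "int"), ("age", "float")]
S2 = [("sid", "int"), ("sname", "str"), ("rating", "int"), ("age", "float")]
R1 = [("sid", "int"), ("bid", "int"), ("day", "date")]

print(union_compatible(S1, S2))  # True
print(union_compatible(R1, S1))  # False
```

Field names are deliberately ignored: only the number of fields and their domains, in order, matter for the check.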
Set Operations
• Intersection (Both): S1 ∩ S2
22 Dustin 7 45.0
Set Operations
• Cross-product (Cartesian product): S1 x R1 returns a relation instance
whose schema contains all the fields of S1 (in the same order as they appear
in S1) followed by all the fields of R1 (in the same order as they appear in R1).
• S1 x R1
31 Lubber 8 55.5 58 103 11/12/96
58 Rusty 10 35.0 22 101 10/10/96
58 Rusty 10 35.0 58 103 11/12/96
Equijoin: when the join condition consists solely of equalities of the form R.name1
= S.name2, that is, equalities between two fields in R and S.
S1 ⋈_(S1.sid = R1.sid) R1
sid
22
58
• S1 ⋈_(S1.sid = R1.sid) R1 is actually a natural join
58 Rusty 10 35.0 58 103 11/12/96
58 Rusty 10 35.0 58 103 11/12/96
Aggregate Functions
• Use of the Aggregate Functional operation ℱ
ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from the
EMPLOYEE relation
ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from the
EMPLOYEE relation
Recap of Relational Algebra Operations
Query Tree
• A query tree is a tree data structure representing a relational
algebra expression.
• The tables of the query are represented as leaf nodes.
• The node is then replaced by the result table. This process continues
for all internal nodes until the root node is executed and replaced by
the result table.
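The bottom-up execution described above can be sketched as a recursive walk over the tree. This is a minimal illustration, not a DBMS implementation: the `Node` class, the operator names and the sample rows are all assumptions made for the example, and projection here keeps duplicates.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

@dataclass
class Node:
    op: str                                          # "table", "select", "project" or "join"
    children: List["Node"] = field(default_factory=list)
    rows: List[Row] = field(default_factory=list)    # used by leaf (table) nodes only
    pred: Optional[Callable[..., bool]] = None       # used by select and join
    attrs: List[str] = field(default_factory=list)   # used by project

def evaluate(node: Node) -> List[Row]:
    """Evaluate the children first, then replace the node by its result table."""
    if node.op == "table":
        return node.rows
    inputs = [evaluate(child) for child in node.children]
    if node.op == "select":
        return [r for r in inputs[0] if node.pred(r)]
    if node.op == "project":
        return [{a: r[a] for a in node.attrs} for r in inputs[0]]
    if node.op == "join":
        return [{**l, **r} for l in inputs[0] for r in inputs[1] if node.pred(l, r)]
    raise ValueError(f"unknown operator: {node.op}")

# Hypothetical Employee rows (values are made up for illustration).
employee = Node("table", rows=[
    {"EmpID": 1, "EName": "Asha", "Salary": 50000, "DeptNo": 10},
    {"EmpID": 2, "EName": "Ravi", "Salary": 30000, "DeptNo": 20},
])

# Tree for: project EName over (select Salary > 40000 over Employee)
tree = Node("project", attrs=["EName"], children=[
    Node("select", pred=lambda r: r["Salary"] > 40000, children=[employee]),
])

result = evaluate(tree)
print(result)  # [{'EName': 'Asha'}]
```

The leaf is returned as-is, each internal node runs only after its inputs are available, and the value returned for the root is the query result.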
Query Tree
• A B C
Example
• Employee
EmpID EName Salary DeptNo DateOfJoining
Example
• Let us consider the query as the following.
Example
• Let us consider another query involving a join.
2. Optimization
3. Evaluation
evaluated in many ways.
Disk
structure
data is read back after being written to ensure that the write
was successful
disk I/O
But hard to take into account for cost estimation
• The index is called an access path on the field.
Primary Index
Also referred to as a clustering index
Defined on an ordered data file
The data file is ordered on a key field
Includes one index entry for each block in the
data file; the index entry has the key field value for
An index that is not a primary index is called a
secondary index
Primary Index
Secondary Index
B+ tree
• A B+ tree is an n-ary tree with a variable but often large number of children per
node. A B+ tree consists of a root, internal nodes and leaves. The root may be either
a leaf or a node with two or more children.
• In a B+ tree, data is stored only in leaf nodes.
• In a B+ tree file organization, the leaf nodes store the actual records rather than pointers to records.
B+ tree
Selection Operation
• File scan
• Algorithm A1 (linear search). Scan each file block and test all records to see
whether they satisfy the selection condition.
Cost estimate = br block transfers + 1 seek
br denotes number of blocks containing records from relation r
If selection is on a key attribute, can stop on finding record
• Note: binary search generally does not make sense since data is not stored
consecutively
except when there is an index available,
and binary search requires more seeks than index search
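Algorithm A1 can be sketched as a scan over blocks that counts how many blocks are read. This is a minimal illustration, assuming a file is modelled as a list of blocks and each block as a list of records; the function name and the `stop_at_first` flag are hypothetical.

```python
def linear_search(blocks, predicate, stop_at_first=False):
    """Scan every block and test every record against the selection condition.

    With stop_at_first=True the scan ends at the first match, which is only
    safe when the selection is an equality on a key attribute.
    Returns (matching records, number of blocks read).
    """
    matches, blocks_read = [], 0
    for block in blocks:
        blocks_read += 1                 # one block transfer per block scanned
        for record in block:
            if predicate(record):
                matches.append(record)
                if stop_at_first:
                    return matches, blocks_read
    return matches, blocks_read

# A file of 3 blocks with 2 records each (made-up key values).
blocks = [[1, 2], [3, 4], [5, 6]]

print(linear_search(blocks, lambda r: r == 3))                      # ([3], 3)
print(linear_search(blocks, lambda r: r == 3, stop_at_first=True))  # ([3], 2)
```

Without the early stop the scan always reads all br blocks; with it, the number of blocks read depends on where the matching record lies.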
1 seek for each level of tree and one seek for 1st block
Cost = hi * (tT + tS) + tT * b
Cost = (hi + n) * (tT + tS)
Can be very expensive!
tuple > v; do not use index
Cost is identical to A3
Cost is identical to A4
all the obtained sets of record pointers.
Then fetch records from file
If some conditions do not have appropriate indices, apply test on
retrieved records in memory.
Implementation of Complex
Selections
• Disjunction: σ_(θ1 ∨ θ2 ∨ . . . ∨ θn)(r).
• A10 (disjunctive selection by union of identifiers).
Applicable if all conditions have available indices.
Otherwise use linear scan.
If very few records satisfy ¬θ, and an index is applicable to θ:
Find satisfying records using index and fetch from file
Sorting
• Sorting in DBMS may be required for 2 reasons:
SQL queries specify that the output be sorted, or
Several operations like joins can be implemented efficiently if
input relations are sorted first
used. For relations that don’t fit in memory, external sort-
merge is a good choice.
External Sort-Merge
• Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
[Figure: sorted runs R0 to R4]
read the next block (if any) of the run into the buffer.
3. until all input buffer pages are empty:
[Figure: 5-way merge of runs R0 to R4 through an output buffer to disk]
Example: External Sorting Using Sort-Merge
[Figure: external sort-merge example showing the initial relation, the sorted runs created from it, the runs after merge pass 1, and the sorted output after merge pass 2]
In pass 2: ⌈6/4⌉ = 2 sorted runs of 80 pages and 28 pages
In pass 3: sorted file of 108 pages
• Thus no. of passes = 1 + ⌈log_(5−1) ⌈108 / 5⌉⌉ = 4
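The pass count can be checked with a short simulation. This is a sketch that assumes a file of 108 pages and 5 buffer pages, as the formula above suggests: pass 0 creates runs of 5 pages, and each later pass merges 5 − 1 = 4 runs at a time. The function name is hypothetical.

```python
import math

def sort_passes(num_pages, buffer_pages):
    """Count passes of external sort-merge: one run-creation pass,
    then (buffer_pages - 1)-way merge passes until one run remains."""
    runs = math.ceil(num_pages / buffer_pages)   # pass 0: create sorted runs
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffer_pages - 1))
        passes += 1
    return passes

print(sort_passes(108, 5))  # 4
```

The run counts go 22, 6, 2, 1, which matches the passes listed above.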
Simple Example
= br (2 ⌈log_(M−1)(br / M)⌉ + 1) = 12(2∗2 + 1) = 60 block transfers,
where br = 12 and M = 3
• Seeks: next slide
Total number of seeks:
2 ⌈br / M⌉ + 2 br ⌈log_(M−1)(br / M)⌉ − br
= 2 ⌈br / M⌉ + br (2 ⌈log_(M−1)(br / M)⌉ − 1)
= 2(4) + 12(2(2) − 1) = 44 disk seeks
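Both cost figures for the simple example can be reproduced in a few lines. This is a sketch under the assumptions used in the formulas above: one buffer block per run during merging, and the final write of the sorted output not counted. The function name is hypothetical, and the merge-pass count is computed by repeated ceiling division rather than a floating-point logarithm.

```python
import math

def external_sort_cost(b_r, M):
    """Return (block transfers, seeks) for external sort-merge of b_r blocks
    with M memory blocks, assuming one buffer block per run while merging."""
    runs = math.ceil(b_r / M)           # initial sorted runs
    merge_passes = 0                    # equals ceil(log_{M-1}(runs))
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        merge_passes += 1
    transfers = b_r * (2 * merge_passes + 1)
    seeks = 2 * math.ceil(b_r / M) + b_r * (2 * merge_passes - 1)
    return transfers, seeks

print(external_sort_cost(12, 3))  # (60, 44)
```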
• Assumptions: No. of buffers fit in memory =3, no. of tuples fit in one buffer =1
EmpId Name
1001 Jayanti
1002 Pramod
1003 Neha
1010 Abhishek
1011 Santosh
Example –Stage 1
Input, one tuple per block (tuple no., Name):
1 Jayanti, 2 Pramod, 3 Neha, 4 Nilesh, 5 Mayur, 6 Vinayak, 7 Shree, 8 Akshata, 9 Jaya, 10 Abhishek, 11 Santosh
After creating sorted runs of M = 3 blocks each (sorted on Name):
Run 1: 1 Jayanti, 3 Neha, 2 Pramod
Run 2: 5 Mayur, 4 Nilesh, 6 Vinayak
Run 3: 8 Akshata, 9 Jaya, 7 Shree
Run 4: 10 Abhishek, 11 Santosh
Example –Stage 2
• Use M−1 input buffers to read the runs and 1 output buffer to write the sorted output
EmpId Name
1001 Jayanti
1005 Mayur
1009 Jaya
1011 Santosh
1007 Shree
Example –Stage 2
EmpId Name
1010 Abhishek
1008 Akshata
1009 Jaya
1007 Shree
1006 Vinayak
Join Operation
• Several different algorithms to implement joins
Nested-loop join
Block nested-loop join
• Expensive since it examines every pair of tuples in the two relations.
Nested-Loop Join (Cont.)
• In the worst case, if there is enough memory only to hold one block of each relation,
the estimated cost is
nr ∗ bs + br block transfers, plus
nr + br seeks
• If the smaller relation fits entirely in memory, use that as the inner relation.
Reduces cost to br + bs block transfers and 2 seeks
• and cost estimate will be 100+400 =500 block transfers.
end
end
end
If the equi-join attribute forms a key on the inner relation, stop the inner
loop on the first match
• In the worst case, we have to read each block of takes once for each block of
student. Thus, in the worst case, a total of 100 ∗ 400 + 100 = 40,100 block
transfers plus 2∗100 = 200 seeks are required.
• The best-case cost remains the same—namely, 100 + 400 = 500 block
transfers and 2 seeks.
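The worst-case and best-case figures for nested-loop style joins can be reproduced with small cost functions. This is a sketch with hypothetical function names; r is the outer relation, and the block counts 100 (student) and 400 (takes) come from the example above. The tuple count of 5000 for student is an assumption used only to exercise the tuple-at-a-time formula.

```python
def nested_loop_cost(n_r, b_r, b_s):
    """Worst case, tuple-at-a-time: (n_r * b_s + b_r transfers, n_r + b_r seeks)."""
    return n_r * b_s + b_r, n_r + b_r

def block_nested_loop_cost(b_r, b_s):
    """Worst case, block-at-a-time: (b_r * b_s + b_r transfers, 2 * b_r seeks)."""
    return b_r * b_s + b_r, 2 * b_r

def in_memory_inner_cost(b_r, b_s):
    """Best case, inner relation fits in memory: (b_r + b_s transfers, 2 seeks)."""
    return b_r + b_s, 2

print(block_nested_loop_cost(100, 400))  # (40100, 200)
print(in_memory_inner_cost(100, 400))    # (500, 2)
print(nested_loop_cost(5000, 100, 400))  # (2000100, 5100), with the assumed 5000 tuples
```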
c can be estimated as cost of a single selection on s using the join condition.
• If indices are available on join attributes of both r and s,
use the relation with fewer tuples as the outer relation.
c is computed by applying Algorithm A2 cost = (hi + 1) * (tT + tS) = (4+1)*1
CPU cost likely to be less than that for block nested loops join
Merge-Join (sort-merge join)
1. Sort both relations on their join attribute (if not already sorted on
the join attributes).
2. Merge the sorted relations to join them
1. Join step is similar to the merge stage of the sort-merge
algorithm.
2. Main difference is handling of duplicate values in join attribute
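The two steps, including the handling of duplicate join values, can be sketched as follows. This is an in-memory illustration only, assuming relations are lists of dictionaries; the function name and the sample rows are hypothetical. When the two pointers meet on an equal value, every tuple of that group on one side is paired with every tuple of the group on the other side.

```python
def merge_join(r, s, key):
    """Equi-join r and s on `key` by sorting both and merging."""
    r = sorted(r, key=lambda t: t[key])   # step 1: sort on the join attribute
    s = sorted(s, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):      # step 2: merge
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            value = r[i][key]
            i_end = i
            while i_end < len(r) and r[i_end][key] == value:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][key] == value:
                j_end += 1
            # Pair every r tuple with every s tuple that share this value.
            for tr in r[i:i_end]:
                for ts in s[j:j_end]:
                    out.append({**tr, **ts})
            i, j = i_end, j_end
    return out

# Made-up rows for illustration.
emp = [
    {"EmpId": 1010, "Name": "Abhishek", "DeptNo": 4},
    {"EmpId": 1011, "Name": "Santosh", "DeptNo": 4},
    {"EmpId": 1001, "Name": "Jayanti", "DeptNo": 1},
]
dept = [{"DeptNo": 4, "DName": "Marketing"}, {"DeptNo": 2, "DName": "Sales"}]

joined = merge_join(emp, dept, "DeptNo")
print([(t["Name"], t["DName"]) for t in joined])
# [('Abhishek', 'Marketing'), ('Santosh', 'Marketing')]
```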
1010 Abhishek 04
1011 Santosh 04
1010 Abhishek 04 Marketing
1011 Santosh 04 Marketing
Merge-Join (Cont.)
• Can be used only for equi-joins and natural joins
• Each block needs to be read only once (assuming all
tuples for any given value of the join attributes fit in
memory)
Merge-Join (Cont.)
• Example of student ⋈ takes
If the relations are already sorted on the join attribute ID, then merge join takes a total of 400 + 100
= 500 block transfers.
If we assume that in the worst case only one buffer block is allocated to each input relation (that is,
bb = 1), a total of 400 + 100 = 500 seeks would also be required.
• Suppose the relations are not sorted, and the memory size is the worst case, only three blocks. The
cost is as follows:
1. Sorting relation takes requires ⌈log_(M−1)(br / M)⌉ = ⌈log_(3−1)(400/3)⌉ = 8 merge passes.
2. For sorting student, the number of seeks = 2 ∗ ⌈100/3⌉ + 100 ∗ (2 ∗ 6 − 1) = 1168, and 100 seeks are required for writing
the output, for a total of 1268 seeks.
3. Merging the two relations takes 400 + 100 = 500 block transfers and 500 seeks.
• Thus, the total cost is 9100 block transfers plus 8436 seeks (6668 for sorting takes, 1268 for sorting student, 500 for merging) if the relations are not sorted, and the
memory size is just 3 blocks.
Example
• Suppose that the relation R (with 150 pages) consists of one attribute a and S (with 90
pages) also consists of one attribute a. Determine the optimal join method for processing the
following query:
• Assume also that the DBMS only has available the following join methods: nested-loop,
block nested loop and sort-merge.
• Determine the number of page I/Os required by each method to work out which is the
cheapest.
Example Solution
• Simple Nested Loops: nr ∗ bs + br
We use relation S as the outer loop. Total Cost = 90 + (90×150) = 13590
• Block Nested Loops: ⌈br / (M−2)⌉ ∗ bs + br block transfers + 2 ⌈br / (M−2)⌉ seeks
If R is outer: Total Cost = 150 + (90×ceil(150/(10-2))) = 1860
If S is outer: Total Cost = 90 + (150×ceil(90/(10-2))) = 1890
• Total Cost = 1500 + 270 + 240 = 2010
• Therefore, the optimal way to process the query is Block Nested Loop join.
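The comparison can be reproduced with a short script. This is a sketch, assuming M = 10 buffer pages as implied by the (10-2) terms in the solution, and taking the sort-merge total of 2010 page I/Os directly from the solution rather than deriving it. The function names are hypothetical.

```python
import math

def simple_nested_loops(b_outer, b_inner):
    """Page I/Os as used in the solution: scan outer once, inner once per outer page."""
    return b_outer + b_outer * b_inner

def block_nested_loops(b_outer, b_inner, M):
    """Page I/Os with M - 2 buffer pages holding a chunk of the outer relation."""
    return b_outer + b_inner * math.ceil(b_outer / (M - 2))

R_PAGES, S_PAGES, M = 150, 90, 10   # M = 10 is inferred from the solution

costs = {
    "simple nested loops (S outer)": simple_nested_loops(S_PAGES, R_PAGES),
    "block nested loops (R outer)": block_nested_loops(R_PAGES, S_PAGES, M),
    "block nested loops (S outer)": block_nested_loops(S_PAGES, R_PAGES, M),
    "sort-merge (total as given)": 2010,
}
best = min(costs, key=costs.get)
print(costs)
print(best)  # block nested loops (R outer)
```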
Hash-Join
• Applicable for equi-joins and natural joins.
• A hash function h is used to partition tuples of both relations
• h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the
common attributes of r and s used in the natural join.
n is denoted as nh.
Hash-Join (Cont.)
• r tuples in ri need only be compared with s
tuples in si; they need not be compared with s tuples
in any other partition, since:
Hash-Join Algorithm
• The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h. When partitioning
a relation, one block of memory is reserved as the output buffer for
each partition.
index. Output the concatenation of their attributes.
• Relation s is called the build input and r is called the probe
input.
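The partition, build and probe phases can be sketched in memory as follows. This is a minimal illustration, assuming relations are lists of dictionaries; the function name, the number of partitions and the sample rows are hypothetical. Python's built-in `hash` stands in for the partitioning function h, and a dictionary stands in for the in-memory hash index built on each partition of s.

```python
from collections import defaultdict

def hash_join(r, s, key, n_partitions=4):
    """Equi-join r (probe input) and s (build input) on `key`."""
    def h(value):
        return hash(value) % n_partitions   # partitions numbered 0 .. n_partitions - 1

    # Partition both relations with the same function h.
    s_parts, r_parts = defaultdict(list), defaultdict(list)
    for ts in s:
        s_parts[h(ts[key])].append(ts)
    for tr in r:
        r_parts[h(tr[key])].append(tr)

    out = []
    for i in range(n_partitions):
        # Build: in-memory hash index on partition s_i.
        index = defaultdict(list)
        for ts in s_parts[i]:
            index[ts[key]].append(ts)
        # Probe: each r tuple in r_i looks up matching s tuples in s_i only.
        for tr in r_parts[i]:
            for ts in index.get(tr[key], []):
                out.append({**tr, **ts})
    return out

# Made-up rows for illustration.
emp = [
    {"EmpId": 1010, "DeptNo": 4},
    {"EmpId": 1011, "DeptNo": 4},
    {"EmpId": 1001, "DeptNo": 1},
]
dept = [{"DeptNo": 4, "DName": "Marketing"}, {"DeptNo": 2, "DName": "Sales"}]

joined = sorted(hash_join(emp, dept, "DeptNo"), key=lambda t: t["EmpId"])
print([(t["EmpId"], t["DName"]) for t in joined])
# [(1010, 'Marketing'), (1011, 'Marketing')]
```

The output order of `hash_join` depends on how tuples fall into partitions, so the example sorts the result before printing.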
Rarely required
Handling of Overflows
• Partitioning is said to be skewed if some partitions have significantly more tuples than
some others
• Hash-table overflow occurs in partition si if si does not fit in memory. Reasons could
be
Many tuples in s with same value for join attributes
Bad hash function
• Both approaches fail with large numbers of duplicates
Fallback option: use block nested loops join on overflowed partitions
Cost of Hash-Join
• If recursive partitioning is not required: cost of hash join is
3(br + bs) + 4 ∗ nh block transfers +
2(⌈br / bb⌉ + ⌈bs / bb⌉) seeks
nh is overhead for partially filled blocks which can be ignored
• If the entire build input can be kept in main memory no partitioning is
required
Cost estimate goes down to br + bs.
3(100 + 400) = 1500 block transfers +
2(⌈100/3⌉ + ⌈400/3⌉) = 336 seeks
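The two figures can be checked with the hash-join cost formula for the case without recursive partitioning. This is a sketch with a hypothetical function name; it assumes 100 and 400 blocks for the two relations, bb = 3 buffer blocks, and that the nh overhead term is ignored by default.

```python
import math

def hash_join_cost(b_r, b_s, b_b, n_h=0):
    """Return (block transfers, seeks) for hash join without recursive partitioning."""
    transfers = 3 * (b_r + b_s) + 4 * n_h
    seeks = 2 * (math.ceil(b_r / b_b) + math.ceil(b_s / b_b))
    return transfers, seeks

print(hash_join_cost(100, 400, 3))  # (1500, 336)
```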
Other Operations
• Duplicate elimination can be implemented via hashing or
sorting.
On sorting, duplicates will come adjacent to each other, and
all but one copy of each set of duplicates can be deleted.
Optimization: duplicates can be deleted during run
perform projection on each tuple followed by duplicate
elimination.
If projection includes a key, no duplicates will exist
Hash Index
aggregates
For avg, keep sum and count, and divide sum by count at the
end
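Keeping a running sum and count per group, and dividing only at the end, can be sketched with a hash table. This is an in-memory illustration; the function name, attribute names and rows are hypothetical.

```python
from collections import defaultdict

def group_avg(rows, group_key, value_key):
    """Compute avg(value_key) per group by keeping [sum, count] for each group."""
    state = defaultdict(lambda: [0, 0])      # group value -> [running sum, running count]
    for row in rows:
        entry = state[row[group_key]]
        entry[0] += row[value_key]
        entry[1] += 1
    # Divide sum by count once, after all tuples have been seen.
    return {group: total / count for group, (total, count) in state.items()}

# Made-up rows for illustration.
rows = [
    {"dept": "A", "salary": 10},
    {"dept": "A", "salary": 20},
    {"dept": "B", "salary": 30},
]
print(group_avg(rows, "dept", "salary"))  # {'A': 15.0, 'B': 30.0}
```

Because only the partial aggregates are stored, the tuples of a group never need to be held together in memory.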
2. At end of si add the tuples in the hash index to the result.
delete it from the index.
2. At end of si add remaining tuples in the hash index
to the result.
Right outer-join and full outer-join can be computed similarly.
Example
• SELECT FNAME, DNAME
FROM (EMPLOYEE LEFT OUTER JOIN DEPARTMENT
ON DNO = DNUMBER);
• Note: The result of this query is a table of employee names and their associated departments. It is
similar to a regular join result, with the exception that if an employee does not have an associated
department, the employee's name will still appear in the resulting table, although the department name
would be indicated as null.
{UNION the temporary tables to produce the LEFT OUTER JOIN}
RESULT ← TEMP1 ∪ TEMP2
• The cost of the outer join, as computed above, would include the cost of the
associated steps (i.e., join, projections and union).
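One way to read the join, projection and union steps is sketched below with Python sets. The intermediate steps are assumptions, since only the final union survives in the slide text: TEMP1 holds the projected inner join, TEMP2 holds the employees that found no match, padded with a null department name. The sample rows are made up, and set semantics are assumed, so employees are identified by FNAME.

```python
def left_outer_join(employees, departments):
    """FNAME, DNAME of EMPLOYEE LEFT OUTER JOIN DEPARTMENT ON DNO = DNUMBER."""
    # TEMP1: inner join projected on (FNAME, DNAME).
    temp1 = {(e["FNAME"], d["DNAME"])
             for e in employees for d in departments
             if e["DNO"] == d["DNUMBER"]}
    # TEMP2: employees with no matching department, padded with None (null).
    unmatched = {e["FNAME"] for e in employees} - {fname for fname, _ in temp1}
    temp2 = {(fname, None) for fname in unmatched}
    # RESULT: union of the two temporary tables.
    return temp1 | temp2

# Made-up rows for illustration.
employees = [{"FNAME": "John", "DNO": 5}, {"FNAME": "Alicia", "DNO": None}]
departments = [{"DNUMBER": 5, "DNAME": "Research"}]

result = left_outer_join(employees, departments)
print(result == {("John", "Research"), ("Alicia", None)})  # True
```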
Evaluation of Expressions
• So far: we have seen algorithms for individual operations
• Alternatives for evaluating an entire expression tree
Materialization: generate results of an expression
Materialization
• Materialized evaluation: evaluate one operation at a time, starting at
the lowest-level. Use intermediate results materialized into temporary
relations to evaluate next-level operations.
• E.g.,
and finally compute the projection on name.
Materialization (Cont.)
• Materialized evaluation is always applicable
• Cost of writing results to disk and reading them back can be
quite high
Our cost formulas for operations ignore cost of writing results
execution time
Pipelining
• Pipelined evaluation : evaluate several operations simultaneously,
passing the results of one operation on to the next.
• E.g., in previous expression tree, don’t store result of
σ_(building = "Watson")(department)
instead, pass tuples directly to the join. Similarly, don’t store result of
output tuples even as tuples are received for inputs to the operation.
• Pipelines can be executed in two ways: demand driven and producer
driven
Pipelining (Cont.)
• In demand driven or lazy evaluation
system repeatedly requests next tuple from top level operation
Each operation requests next tuple from children operations as
required, in order to output its next tuple
In between calls, operation has to maintain “state” so it knows
Pipelining (Cont.)
• Implementation of demand-driven pipelining
Each operation is implemented as an iterator implementing the
following operations
open()
E.g. file scan: initialize file scan
state: pointer to beginning of file
next output tuple is found. Save pointers as iterator state.
close(): tells iterator that no more tuples are required
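The open(), next() and close() interface can be sketched with two small operators. This is a minimal illustration, assuming a file is a Python list and that `None` marks the end of the input; the class names `FileScan` and `Select` are hypothetical.

```python
class FileScan:
    """Leaf operator: returns the records of a file one at a time."""
    def __init__(self, records):
        self.records = records
        self.pos = None                 # iterator state: current position in the file

    def open(self):
        self.pos = 0                    # point to the beginning of the file

    def next(self):
        if self.pos >= len(self.records):
            return None                 # no more tuples
        record = self.records[self.pos]
        self.pos += 1                   # save the position for the next call
        return record

    def close(self):
        self.pos = None


class Select:
    """Selection operator: pulls tuples from its child on demand."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        record = self.child.next()
        while record is not None:
            if self.predicate(record):
                return record
            record = self.child.next()
        return None

    def close(self):
        self.child.close()


# The top-level operator is asked repeatedly for its next tuple.
plan = Select(FileScan([5, 12, 7, 20]), lambda x: x > 6)
plan.open()
out = []
record = plan.next()
while record is not None:
    out.append(record)
    record = plan.next()
plan.close()
print(out)  # [12, 7, 20]
```

No intermediate result is stored: each call to `next()` on the selection pulls just enough tuples from the scan to produce one output tuple.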
Pipelining (Cont.)
• In producer-driven or eager pipelining
Operators produce tuples eagerly and pass them up to their
parents
Buffer maintained between operators, child puts tuples in
buffer, parent removes tuples from buffer
• Alternative name: push model of pipelining
• Useful in parallel processing systems
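A producer-driven pipeline can be sketched with one thread per operator and a bounded buffer between them. This is only an illustration under stated assumptions: the function names `scan`, `select` and `run_pipeline`, the `DONE` sentinel and the buffer size are hypothetical choices, and Python's `queue.Queue` stands in for the buffer between operators.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the tuple stream (assumption)


def scan(tuples, out_buf):
    """Producer: eagerly pushes every tuple to its parent's buffer."""
    for t in tuples:
        out_buf.put(t)  # blocks while the buffer is full
    out_buf.put(DONE)


def select(predicate, in_buf, out_buf):
    """Removes tuples from the child's buffer and pushes matches upward."""
    while True:
        t = in_buf.get()  # blocks while the buffer is empty
        if t is DONE:
            out_buf.put(DONE)
            return
        if predicate(t):
            out_buf.put(t)


def run_pipeline(tuples, predicate, buffer_size=2):
    b1 = queue.Queue(maxsize=buffer_size)
    b2 = queue.Queue(maxsize=buffer_size)
    workers = [
        threading.Thread(target=scan, args=(tuples, b1)),
        threading.Thread(target=select, args=(predicate, b1, b2)),
    ]
    for w in workers:
        w.start()
    out = []
    while True:
        t = b2.get()
        if t is DONE:
            break
        out.append(t)
    for w in workers:
        w.join()
    return out


evens = run_pipeline(list(range(10)), lambda x: x % 2 == 0)
```

Because each operator runs in its own thread, the operators can make progress concurrently, which is one reason the push model suits parallel systems.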
Query Optimization
• DBMS Structure
Introduction
• Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Introduction (Cont.)
• An evaluation plan defines exactly what algorithm is used for
each operation, and how the execution of the operations is
coordinated.
Introduction (Cont.)
• Cost difference between evaluation plans for a query can
be enormous
E.g. seconds vs. days in some cases
• Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
Introduction (Cont.)
• Estimation of plan cost based on:
Statistical information about relations.
Examples:
statistics
Transformation of Relational
Expressions
• Two relational algebra expressions are said to be
equivalent if the two expressions generate the
same set of tuples on every legal database
Transformation of Relational
Expressions
• In SQL, inputs and outputs are multisets of
tuples
Two expressions in the multiset version of the
relational algebra are said to be equivalent if
or vice versa
Equivalence Rules
Example
• Consider the following University example with relation
schemas:
• Instructor(ID, name, dept_name, salary)
size of the relation to be joined.
σ_{dept_name = “Music”}(instructor) ⋈ σ_{year = 2009}(teaches)
Example
An un-optimized relational algebra expression:
Name (
GPA3.5 and Title = ′Ada Programming Language’ and Students.SSN = Enrollment.SSN and Enrollment.Course_no = Courses.Course_no
Example contd.
• Initial query tree:
Name
GPA 3.5 and Title = 'Ada Programming Language’
Courses
Students Enrollment
Example contd.
• Perform selections as early as possible.
Name
Enrollment.Course_no = Courses.Course_no
Courses
Students
Transformation Example: Pushing Projections
• Consider: Π_{name, title}((σ_{dept_name = “Music”}(instructor) ⋈ teaches)
⋈ Π_{course_id, title}(course))
• When we compute
(σ_{dept_name = “Music”}(instructor ⋈ teaches))
course_id, title (course))))
• Performing the projection as early as possible reduces the size of the relation
to be joined.
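The effect of pushing a projection can be sketched on toy data. This is an illustration only: the helper functions, the sample rows, and the `teaches` and `course` schemas used here are assumptions for the example, and `project` keeps duplicates (multiset semantics).

```python
def select(rows, pred):
    return [r for r in rows if pred(r)]


def project(rows, attrs):
    # Multiset projection: duplicates are kept.
    return [{a: r[a] for a in attrs} for r in rows]


def natural_join(left, right):
    out = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                out.append({**l, **r})
    return out


# Hypothetical sample data
instructor = [
    {"ID": 1, "name": "Mozart", "dept_name": "Music", "salary": 40000},
    {"ID": 2, "name": "Einstein", "dept_name": "Physics", "salary": 95000},
]
teaches = [
    {"ID": 1, "course_id": "MU-199", "sec_id": 1, "semester": "Spring", "year": 2010},
    {"ID": 2, "course_id": "PHY-101", "sec_id": 1, "semester": "Fall", "year": 2009},
]
course = [
    {"course_id": "MU-199", "title": "Music Video Production", "credits": 3},
    {"course_id": "PHY-101", "title": "Physical Principles", "credits": 4},
]

music = select(instructor, lambda r: r["dept_name"] == "Music")
course_titles = project(course, ["course_id", "title"])

# Without pushing: the intermediate result carries all attributes.
wide = natural_join(music, teaches)
plan1 = project(natural_join(wide, course_titles), ["name", "title"])

# With the projection pushed down: only name and course_id survive.
narrow = project(wide, ["name", "course_id"])
plan2 = project(natural_join(narrow, course_titles), ["name", "title"])
```

Both plans give the same answer, but the intermediate tuples in the second plan have 2 attributes instead of 8.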
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary relation.
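The point about temporary relation size can be checked on a small example. The relations `r1`, `r2`, `r3` below and their contents are hypothetical, chosen so that r1 ⋈ r2 is small while r2 ⋈ r3 is large; `natural_join` is a simple helper written for this sketch.

```python
def natural_join(left, right):
    out = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                out.append({**l, **r})
    return out


def canonical(rows):
    """Order-independent form of a result, used to compare two plans."""
    return sorted(tuple(sorted(row.items())) for row in rows)


# Hypothetical relations r1(a, b), r2(b, c), r3(c, d)
r1 = [{"a": 1, "b": 1}]
r2 = [{"b": i, "c": 0} for i in range(1, 6)]
r3 = [{"c": 0, "d": j} for j in range(4)]

temp_left = natural_join(r1, r2)   # temporary relation for (r1 ⋈ r2) ⋈ r3
temp_right = natural_join(r2, r3)  # temporary relation for r1 ⋈ (r2 ⋈ r3)

result_left = natural_join(temp_left, r3)
result_right = natural_join(r1, temp_right)
```

Here the first ordering stores a 1-tuple temporary relation and the second stores 20 tuples, while the final results are identical.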
σ_{dept_name = “Music”}(instructor) ⋈ teaches
first.
Enumeration of Equivalent
Expressions
• Query optimizers use equivalence rules to systematically generate
expressions equivalent to the given expression
• Can generate all equivalent expressions as follows:
Repeat
Optimized plan generation based on transformation rules
Special case approach for queries with only selections, projections and
joins
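One way to picture systematic generation of equivalent expressions is a loop that keeps applying rules until no new expression appears. The sketch below is an assumption-laden illustration, not the procedure from the slides: it handles only joins, uses only join commutativity and associativity, and represents an expression as a relation name or a tuple `("join", left, right)`.

```python
def rewrites(expr):
    """Yield every expression obtained by applying one rule at one position."""
    if isinstance(expr, str):
        return  # a base relation has no rewrites
    _, left, right = expr
    yield ("join", right, left)  # commutativity
    if isinstance(left, tuple):  # (a ⋈ b) ⋈ c  ->  a ⋈ (b ⋈ c)
        _, a, b = left
        yield ("join", a, ("join", b, right))
    if isinstance(right, tuple):  # a ⋈ (b ⋈ c)  ->  (a ⋈ b) ⋈ c
        _, b, c = right
        yield ("join", ("join", left, b), c)
    for new_left in rewrites(left):  # apply rules inside the left subexpression
        yield ("join", new_left, right)
    for new_right in rewrites(right):  # apply rules inside the right subexpression
        yield ("join", left, new_right)


def enumerate_equivalent(expr):
    """Repeat rule application until no new expression is generated."""
    seen = {expr}
    frontier = [expr]
    while frontier:
        current = frontier.pop()
        for candidate in rewrites(current):
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen


start = ("join", ("join", "r1", "r2"), "r3")
all_orders = enumerate_equivalent(start)
```

Even for three relations this produces 12 expressions, which hints at why exhaustive enumeration becomes expensive as the number of joins grows.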
Estimating Statistics of Expression Results
• Statistical information for cost estimation comes from the catalog:
the database system catalog stores information about the database relations…
• nr: number of tuples in a relation r.
• br: number of blocks containing tuples of r.
• br = ⌈nr / fr⌉
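The block-count formula can be worked through in a few lines. This sketch assumes that fr denotes the number of tuples of r that fit into one block and that the tuples of r are stored together; the function name `blocks_needed` and the sample numbers are hypothetical.

```python
def blocks_needed(n_r, f_r):
    """Return ceil(n_r / f_r) using integer arithmetic.

    n_r: number of tuples in relation r
    f_r: tuples of r per block (assumed meaning of fr)
    """
    if f_r <= 0:
        raise ValueError("f_r must be positive")
    return (n_r + f_r - 1) // f_r


exact_fit = blocks_needed(10000, 25)   # 10000 tuples, 25 per block
one_extra = blocks_needed(10001, 25)   # one more tuple needs one more block
```

The ceiling matters because a partially filled last block still has to be read.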
Histograms
[Figure: histogram of frequency against value, over the value ranges 1–5, 6–10, 11–15, 16–20, 21–25]
• Equi-width histograms
• Equi-depth histograms
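The two histogram types can be contrasted with a short sketch. It assumes integer attribute values, takes equi-width to mean buckets covering equal value ranges, and takes equi-depth to mean buckets holding roughly equal numbers of values; the function names and the sample values are hypothetical.

```python
def equi_width(values, lo, hi, buckets):
    """Count values per bucket, where every bucket spans the same value range."""
    width = (hi - lo + 1) // buckets
    counts = [0] * buckets
    for v in values:
        counts[(v - lo) // width] += 1
    return counts


def equi_depth(values, buckets):
    """Return (low, high, count) per bucket, with roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for i in range(buckets):
        chunk = ordered[i * n // buckets:(i + 1) * n // buckets]
        if chunk:
            result.append((chunk[0], chunk[-1], len(chunk)))
    return result


# Hypothetical attribute values in the range 1 to 25
values = [1, 2, 2, 3, 3, 3, 7, 8, 12, 13, 14, 21]
width_hist = equi_width(values, 1, 25, 5)  # buckets 1-5, 6-10, 11-15, 16-20, 21-25
depth_hist = equi_depth(values, 4)
```

With skewed data the equi-width buckets have very uneven counts, while the equi-depth buckets adapt their ranges so that each holds the same number of values.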
In absence of statistical information c is assumed to be nr / 2.
1. Search all the plans and choose the best plan in a cost-based
fashion.
2. Use heuristics to choose a plan.
future use.
Heuristic Optimization
• Cost-based optimization is expensive
• Systems may use heuristics to reduce the number of choices that must
be made in a cost-based fashion.
Some systems use only heuristics, others combine heuristics with
partial cost-based optimization.
• Explain brute force nested loop join algorithm (Dec 2018, 2020)……10M…Ans: Chp12,
pg 550
• Why is BCNF called stricter than 3NF? Justify your answer. (Dec
2019)……5M….out of syllabus…Ans: Chp8, pg 333- pg336, watch video
https://www.youtube.com/watch?time_continue=18&v=NNjUhvvwOrk&feature=emb_logo
•
syllabus…Ans: Chp13, pg 607, watch video
https://www.youtube.com/watch?v=06HlvmB8mDk
Note: Chapter and page numbers are from the book: Korth, Silberschatz, Sudarshan, “Database
System Concepts”, 6th Edition, McGraw-Hill
4. Apply SELECT and PROJECT operations as early as possible, before
applying the JOIN operation