
Week 3

Query Processing and Optimization


Relational Algebra Revision

Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe


Relational Algebra Overview
• Relational Algebra consists of several groups of operations
  - Unary Relational Operations
    - SELECT (symbol: σ (sigma))
    - PROJECT (symbol: π (pi))
    - RENAME (symbol: ρ (rho))
  - Relational Algebra Operations From Set Theory
    - UNION (∪), INTERSECTION (∩), DIFFERENCE (or MINUS, −)
    - CARTESIAN PRODUCT (×)
  - Binary Relational Operations
    - JOIN (several variations of JOIN exist)
  - Additional Relational Operations
    - OUTER JOINS
    - AGGREGATE FUNCTIONS (ℱ): these compute a summary of information, for example SUM, COUNT, AVG, MIN, MAX



Examples
• Select the EMPLOYEE tuples whose department number is 4:
  $\sigma_{DNO=4}(EMPLOYEE)$
• Select the EMPLOYEE tuples whose salary is greater than 30,000 dollars:
  $\sigma_{SALARY>30000}(EMPLOYEE)$

• Example: To list each employee's first and last name and salary, the following is used:
  $\pi_{LNAME, FNAME, SALARY}(EMPLOYEE)$
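As a minimal sketch of the three expressions above, assuming a relation is modelled as a Python list of dicts (the EMPLOYEE rows are made-up sample data, and `select`/`project` are hypothetical helper names), σ keeps whole tuples that satisfy a predicate while π keeps only the listed attributes and drops duplicate tuples:

```python
# Made-up EMPLOYEE sample rows; each tuple is a dict from attribute name to value.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123", "SALARY": 30000, "DNO": 5},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987", "SALARY": 43000, "DNO": 4},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999", "SALARY": 25000, "DNO": 4},
]


def select(relation, predicate):
    """SELECT (sigma): keep the tuples for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """PROJECT (pi): keep only the listed attributes and drop duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


dno4 = select(EMPLOYEE, lambda t: t["DNO"] == 4)             # sigma_{DNO=4}(EMPLOYEE)
high_paid = select(EMPLOYEE, lambda t: t["SALARY"] > 30000)  # sigma_{SALARY>30000}(EMPLOYEE)
names = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])      # pi_{LNAME,FNAME,SALARY}(EMPLOYEE)
print([t["LNAME"] for t in dno4])       # ['Wallace', 'Zelaya']
print([t["LNAME"] for t in high_paid])  # ['Wallace']
```

Note that John Smith (salary exactly 30,000) is not selected by the strict `>` comparison.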



Examples
• To retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a select and a project operation.
• We can write a single relational algebra expression as follows:
  $\pi_{FNAME, LNAME, SALARY}(\sigma_{DNO=5}(EMPLOYEE))$

• Or we can explicitly show the sequence of operations, giving a name to each intermediate relation:
  $DEP5\_EMPS \leftarrow \sigma_{DNO=5}(EMPLOYEE)$
  $RESULT \leftarrow \pi_{FNAME, LNAME, SALARY}(DEP5\_EMPS)$
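The nested expression and the named-intermediate sequence describe the same result. A small sketch, assuming list-of-dict relations and made-up sample rows, evaluates both forms and compares them:

```python
EMPLOYEE = [  # made-up sample rows
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"FNAME": "Franklin", "LNAME": "Wong", "SALARY": 40000, "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SALARY": 25000, "DNO": 4},
]


def select(relation, predicate):
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    # Set semantics: dict keys drop duplicate rows while keeping first-seen order.
    rows = dict.fromkeys(tuple(t[a] for a in attributes) for t in relation)
    return [dict(zip(attributes, row)) for row in rows]


ATTRS = ["FNAME", "LNAME", "SALARY"]

# Single nested expression: pi_{FNAME,LNAME,SALARY}(sigma_{DNO=5}(EMPLOYEE))
nested = project(select(EMPLOYEE, lambda t: t["DNO"] == 5), ATTRS)

# The same query as a sequence of named intermediate relations
DEP5_EMPS = select(EMPLOYEE, lambda t: t["DNO"] == 5)
RESULT = project(DEP5_EMPS, ATTRS)

print(RESULT == nested)              # True
print([t["LNAME"] for t in RESULT])  # ['Smith', 'Wong']
```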



Examples
• To retrieve the social security numbers of all employees who either work in department 5 (RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2 below)
• We can use the UNION operation as follows:
  $DEP5\_EMPS \leftarrow \sigma_{DNO=5}(EMPLOYEE)$
  $RESULT1 \leftarrow \pi_{SSN}(DEP5\_EMPS)$
  $RESULT2(SSN) \leftarrow \pi_{SUPERSSN}(DEP5\_EMPS)$
  $RESULT \leftarrow RESULT1 \cup RESULT2$
• The union operation produces the tuples that are in either RESULT1 or RESULT2 or both.
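A sketch of the UNION example, assuming made-up EMPLOYEE rows and modelling each single-attribute result relation as a Python set of SSN values (renaming SUPERSSN to SSN is what makes the two sets union-compatible):

```python
# Made-up rows: SUPERSSN is the SSN of the employee's direct supervisor.
EMPLOYEE = [
    {"SSN": "333", "SUPERSSN": "888", "DNO": 5},
    {"SSN": "123", "SUPERSSN": "333", "DNO": 5},
    {"SSN": "453", "SUPERSSN": "333", "DNO": 5},
    {"SSN": "987", "SUPERSSN": "888", "DNO": 4},
    {"SSN": "888", "SUPERSSN": None, "DNO": 1},
]

DEP5_EMPS = [t for t in EMPLOYEE if t["DNO"] == 5]  # sigma_{DNO=5}(EMPLOYEE)
RESULT1 = {t["SSN"] for t in DEP5_EMPS}             # pi_{SSN}(DEP5_EMPS)
RESULT2 = {t["SUPERSSN"] for t in DEP5_EMPS}        # pi_{SUPERSSN}(DEP5_EMPS), renamed SSN
RESULT = RESULT1 | RESULT2                          # RESULT1 UNION RESULT2

print(sorted(RESULT))  # ['123', '333', '453', '888']
```

SSN 333 appears in both RESULT1 and RESULT2 but only once in RESULT, because union is a set operation.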



Examples
• Generally, CROSS PRODUCT is not a meaningful operation
  - It can become meaningful when followed by other operations
• Example (not meaningful):
  $FEMALE\_EMPS \leftarrow \sigma_{SEX='F'}(EMPLOYEE)$
  $EMPNAMES \leftarrow \pi_{FNAME, LNAME, SSN}(FEMALE\_EMPS)$
  $EMP\_DEPENDENTS \leftarrow EMPNAMES \times DEPENDENT$
• EMP_DEPENDENTS will contain every combination of EMPNAMES and DEPENDENT tuples, whether or not they are actually related
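A small sketch of why the bare product is not meaningful, assuming made-up rows and a DEPENDENT attribute ESSN holding the employee's SSN (ESSN is an assumed name, not defined on this slide). Every pairing is produced, and only a follow-up selection keeps the related pairs:

```python
from itertools import product

EMPNAMES = [  # made-up result of pi_{FNAME,LNAME,SSN}(FEMALE_EMPS)
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987"},
    {"FNAME": "Joyce", "LNAME": "English", "SSN": "453"},
]
DEPENDENT = [  # ESSN (assumed attribute) = SSN of the employee the dependent belongs to
    {"ESSN": "333", "DEPENDENT_NAME": "Alice"},
    {"ESSN": "987", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "123", "DEPENDENT_NAME": "Michael"},
]


def cross(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]


EMP_DEPENDENTS = cross(EMPNAMES, DEPENDENT)
# Following the product with a selection is what makes it meaningful.
ACTUAL = [t for t in EMP_DEPENDENTS if t["SSN"] == t["ESSN"]]

print(len(EMP_DEPENDENTS))                                  # 9
print([(t["FNAME"], t["DEPENDENT_NAME"]) for t in ACTUAL])  # [('Jennifer', 'Abner')]
```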



Examples
• Example: Suppose that we want to retrieve the name of the manager of each department.
  - To get the manager's name, we need to combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.
  - We do this by using the join operation:

    $DEPT\_MGR \leftarrow DEPARTMENT \bowtie_{MGRSSN=SSN} EMPLOYEE$
• MGRSSN=SSN is the join condition
  - It combines each department record with the employee who manages the department
  - The join condition can also be specified as DEPARTMENT.MGRSSN = EMPLOYEE.SSN
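A minimal nested-loop sketch of this join, assuming made-up DEPARTMENT and EMPLOYEE rows stored as lists of dicts; `theta_join` is a hypothetical helper that keeps each concatenated pair satisfying the join condition:

```python
DEPARTMENT = [  # made-up sample rows
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987"},
]
EMPLOYEE = [
    {"FNAME": "Franklin", "LNAME": "Wong", "SSN": "333"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123"},
]


def theta_join(r, s, condition):
    """R join_condition S: concatenations of tuple pairs (a, b) with condition(a, b) true."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]


DEPT_MGR = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
for t in DEPT_MGR:
    print(t["DNAME"], t["FNAME"], t["LNAME"])
# Research Franklin Wong
# Administration Jennifer Wallace
```

John Smith manages no department, so he appears in no DEPT_MGR tuple.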



Table 8.1 Operations of Relational Algebra

(continued on next slide)



Table 8.1 Operations of Relational Algebra (continued)



Query Tree Notation
 Query Tree
 An internal data structure to represent a query
 Standard technique for estimating the work involved in
executing the query, the generation of intermediate results,
and the optimization of execution
 Nodes stand for operations like selection, projection, join,
renaming, division, …
 Leaf nodes represent base relations
 A tree gives a good visual feel of the complexity of the query
and the operations involved
 Algebraic Query Optimization consists of rewriting the query
or modifying the query tree into an equivalent tree.
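A query tree can be sketched as a small recursive data structure. In this sketch (the `Node` class and its fields are hypothetical, not a DBMS API), internal nodes are operations, leaves are base relations, and the tree shown encodes $\pi_{DNAME, LNAME}(DEPARTMENT \bowtie_{MGRSSN=SSN} EMPLOYEE)$ from the earlier join example:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a query tree: an operation, its argument, and its child subtrees."""
    op: str                                       # "PROJECT", "SELECT", "JOIN", ...; "RELATION" for leaves
    arg: str = ""                                 # attribute list, predicate, or base-relation name
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def render(node, depth=0):
    """Indented text rendering: root first, each child indented below its parent."""
    lines = ["  " * depth + f"{node.op}({node.arg})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def base_relations(node):
    """Leaf nodes are the base relations the query reads."""
    if node.is_leaf():
        return [node.arg]
    return [name for child in node.children for name in base_relations(child)]


tree = Node("PROJECT", "DNAME, LNAME", [
    Node("JOIN", "MGRSSN=SSN", [Node("RELATION", "DEPARTMENT"), Node("RELATION", "EMPLOYEE")]),
])
print("\n".join(render(tree)))
print(base_relations(tree))  # ['DEPARTMENT', 'EMPLOYEE']
```

Rewriting the query then means building a different, equivalent tree of `Node` objects.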



Example of Query Tree



How are queries processed in SQL?

• SQL is a nonprocedural language in which we specify what we need without specifying how to get it.

• With higher-level database query languages such as SQL, a special component of the DBMS called the Query Processor takes care of arranging the underlying access routines to satisfy a given query.



DB ACCESS

[Figure: Programmers/users → Database System: DBMS software (Software: Query Processing → Software: Data Access) → Database and Database Definition]



How are queries processed?

A query is processed in four general steps:

1. Scanning and Parsing
2. Query Optimization, or planning the execution strategy
3. Query Code Generation (interpreted or compiled)
4. Execution in the runtime database processor



Relational Query Processing

[Figure: Query → Scanning, Parsing, Validating → Intermediate form of query (query tree) → Query Optimizer (with the Catalog) → Execution Plan → Query Code Generator → Code, or Compiled Query Executable Code → Execution in Runtime processor]



1. Query Recognition
• Scanning is the process of identifying the tokens in the query.
  - The tokenized representation is suitable for processing by the parser.
  - Token examples are SQL keywords, attribute names, table names, …
  - This representation may be in a tree form.

• The parser checks the tokenized representation for correct syntax, according to the rules of the language grammar.
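A toy scanner shows what "identifying the tokens" produces. This is a sketch only, assuming a hypothetical, much-reduced token set (real SQL lexers also handle quoted identifiers, comments, escaped quotes, and the full keyword list); a parser would then check the resulting token sequence against the grammar:

```python
import re

# A tiny, illustrative subset of SQL keywords.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "NOT"}

# Token classes, tried in order; the first alternative that matches wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?"),  # table, column, or table.column
    ("OP", r"<=|>=|<>|[=<>*,;()]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(sql):
    """Split an SQL string into (token_class, text) pairs."""
    tokens = []
    for m in MASTER.finditer(sql):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at position {m.start()}")
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


for token in scan("SELECT B, D FROM R, S WHERE R.C = S.C AND R.A = 'c'"):
    print(token)
```

The first tokens printed are `('KEYWORD', 'SELECT')`, `('IDENT', 'B')`, `('OP', ',')`; an unexpected character such as `#` raises `SyntaxError`.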



1. Query Recognition

• Validating: checks are made to determine whether the columns and tables identified in the query exist in the database.

• If the query passes the recognition checks, the output (intermediate form of the query) is called the Canonical Query Tree.
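Validation can be sketched as lookups against the catalog. Here the catalog is a hypothetical dict mapping table names to column sets for the R(A,B,C), S(C,D,E) example used later; the ambiguity check on bare column names is an extra illustration beyond the slide's existence check:

```python
CATALOG = {  # hypothetical catalog: table name -> set of column names
    "R": {"A", "B", "C"},
    "S": {"C", "D", "E"},
}


def validate(tables, columns, catalog=CATALOG):
    """Return a list of error messages; an empty list means the query passed validation.

    tables  -- table names from the FROM clause
    columns -- column references, either 'T.col' or a bare 'col'
    """
    errors = [f"unknown table {t}" for t in tables if t not in catalog]
    known = [t for t in tables if t in catalog]
    for ref in columns:
        if "." in ref:
            table, col = ref.split(".", 1)
            if table not in known:
                errors.append(f"unknown table {table} in {ref}")
            elif col not in catalog[table]:
                errors.append(f"unknown column {ref}")
        else:
            owners = [t for t in known if ref in catalog[t]]
            if not owners:
                errors.append(f"unknown column {ref}")
            elif len(owners) > 1:
                errors.append(f"ambiguous column {ref}")
    return errors


print(validate(["R", "S"], ["B", "D", "R.C", "S.C", "R.A", "S.E"]))  # []
print(validate(["R", "S"], ["C", "R.Z"]))  # ['ambiguous column C', 'unknown column R.Z']
```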





2. Query Optimization
• The goal of the query optimizer is to find an efficient strategy for executing the query using the access routines.

• Optimization typically takes one of two forms: Heuristic Optimization or Cost-Based Optimization.



2. Query Optimization

• For any given query, there may be a number of different ways to execute it.
• Each operation in the query (SELECT, JOIN, etc.) can be implemented using one or more different access routines.
• For example, an access routine that employs an index to retrieve a few rows would typically be more efficient than an access routine that performs a full table scan.
• At the end of this step, the query optimizer has determined the execution plan.
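Two access routines for the same selection can be compared by counting the rows each one touches. In this sketch the table is made-up data, and a sorted list of (key, row position) pairs stands in for a real index such as a B+-tree (an assumption for illustration):

```python
import bisect

# Made-up table of 1000 rows spread evenly over 50 departments.
ROWS = [{"id": i, "dno": i % 50} for i in range(1000)]


def full_scan(rows, key, value):
    """Full table scan: examine every row. Returns (matches, rows_examined)."""
    return [r for r in rows if r[key] == value], len(rows)


def build_index(rows, key):
    """Sorted (key value, row position) pairs: a simplified stand-in for a B+-tree."""
    return sorted((r[key], pos) for pos, r in enumerate(rows))


def index_lookup(rows, index, value):
    """Binary-search the index, then fetch only the qualifying rows.
    Returns (matches, rows_examined); the O(log n) index probe itself is not counted."""
    lo = bisect.bisect_left(index, (value, -1))
    hi = bisect.bisect_right(index, (value, len(rows)))
    positions = [pos for _, pos in index[lo:hi]]
    return [rows[p] for p in positions], len(positions)


scan_rows, scan_cost = full_scan(ROWS, "dno", 7)
idx = build_index(ROWS, "dno")
idx_rows, idx_cost = index_lookup(ROWS, idx, 7)
print(scan_cost, idx_cost, scan_rows == idx_rows)  # 1000 20 True
```

Both routines return the same 20 rows, but the index-based one reads only those rows instead of all 1000; choosing between such routines is exactly the optimizer's job.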





3. Query Code Generator
 Once the query optimizer has determined the
execution plan (the specific ordering of access
routines), the code generator writes out the actual
access routines to be executed.
 With an interactive session, the query code is
interpreted and passed directly to the runtime
database processor for execution.
 It is also possible to compile the access routines
and store them for later execution
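The difference between interpreting a plan and compiling it for later reuse can be illustrated, by analogy only, in Python. Everything here is hypothetical: the plan is a list of ("select", predicate) and ("project", attributes) steps, `interpret` decodes the steps on every run, and `compile_plan` translates them once into a stored callable:

```python
PLAN = [  # hypothetical plan: pi_{FNAME,LNAME}(sigma_{DNO=5}(EMPLOYEE))
    ("select", lambda t: t["DNO"] == 5),
    ("project", ("FNAME", "LNAME")),
]
EMPLOYEE = [  # made-up sample rows
    {"FNAME": "John", "LNAME": "Smith", "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "DNO": 4},
]


def interpret(plan, relation):
    """Interpreted mode: decode and run each step of the plan every time the query runs."""
    for op, arg in plan:
        if op == "select":
            relation = [t for t in relation if arg(t)]
        elif op == "project":  # keeps duplicates, like SQL without DISTINCT
            relation = [{a: t[a] for a in arg} for t in relation]
        else:
            raise ValueError(f"unknown operation {op!r}")
    return relation


def compile_plan(plan):
    """Compiled mode (by analogy): translate the plan once into a reusable callable."""
    steps = []
    for op, arg in plan:
        if op == "select":
            steps.append(lambda rel, pred=arg: [t for t in rel if pred(t)])
        elif op == "project":
            steps.append(lambda rel, attrs=arg: [{a: t[a] for a in attrs} for t in rel])
        else:
            raise ValueError(f"unknown operation {op!r}")

    def run(relation):
        for step in steps:
            relation = step(relation)
        return relation

    return run


stored_query = compile_plan(PLAN)  # translated once, kept for later executions
print(interpret(PLAN, EMPLOYEE))   # [{'FNAME': 'John', 'LNAME': 'Smith'}]
print(stored_query(EMPLOYEE) == interpret(PLAN, EMPLOYEE))  # True
```

The default arguments (`pred=arg`, `attrs=arg`) bind each step's argument when the lambda is created, avoiding Python's late-binding pitfall.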



Access Routines
• Access routines are algorithms that are used to access and aggregate data in a database.
• An RDBMS may have a collection of general access routines that can be combined to implement a query execution plan.
• We are interested in access routines for selection, projection, join, and set operations such as union, intersection, set difference, Cartesian product, etc.





4. Execution in the runtime DB processor
 At this point, the query has been scanned,
parsed, planned and (possibly) compiled.
 The runtime database processor then executes
the access routines against the database.
 The results are returned to the application that
made the query in the first place.
 Any runtime errors are also returned.



Query Processing & Optimization

What is Query Processing?

• The steps required to transform a high-level SQL query into a correct and “efficient” strategy for execution and retrieval.

What is Query Optimization?

• The activity of choosing a single “efficient” execution strategy (from possibly hundreds of candidates), as determined by database catalog statistics.



Example

R(A,B,C)
S(C,D,E)
SELECT B, D
FROM R, S
WHERE R.C=S.C AND
R.A = 'c' AND
S.E = 2



R:            S:
A  B  C       C   D  E
a  1  10      10  x  2
b  1  20      20  y  2
c  2  10      30  z  2
d  2  35      40  x  1
e  3  45      50  y  3

Answer:
B  D
2  x

But this is how you would find the answer intelligently, by inspection.

• How would the DBMS execute the query? Basic idea:
  - Do the Cartesian product R × S.
  - Select the tuples.
  - Do the projection.
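The three-step basic idea can be sketched directly on the R and S rows above (tuples are modelled as plain Python tuples in (A, B, C) and (C, D, E) order, an assumption of this sketch):

```python
from itertools import product

R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]  # (A, B, C)
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]  # (C, D, E)

# 1. Cartesian product R x S: every R tuple paired with every S tuple.
RxS = list(product(R, S))
# 2. Selection: R.A = 'c' AND S.E = 2 AND R.C = S.C
selected = [(r, s) for r, s in RxS if r[0] == "c" and s[2] == 2 and r[2] == s[0]]
# 3. Projection onto B, D (a set, so duplicate rows collapse)
answer = {(r[1], s[1]) for r, s in selected}

print(len(RxS), answer)  # 25 {(2, 'x')}
```

All 25 pairs are built and tested to find the single qualifying one.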



R × S:
R.A  R.B  R.C  S.C  S.D  S.E
a    1    10   10   x    2
a    1    10   20   y    2
…
c    2    10   10   x    2    ← Got one...
…



Problem

• A Cartesian product R × S may be LARGE:
  - We need to create and examine n × m tuples, where n = |R| and m = |S|.
  - For example, n = m = 1000 gives 10^6 records.

• We need more efficient evaluation methods.



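Relational Algebra: used to describe logical plans.
Ex: Original logical query plan

[Figure: SELECT B,D maps to $\pi_{B,D}$ at the root; WHERE ... maps to $\sigma_{R.A='c' \wedge S.E=2 \wedge R.C=S.C}$ below it; FROM R,S maps to $R \times S$ at the leaves]

OR: $\pi_{B,D}[\sigma_{R.A='c' \wedge S.E=2 \wedge R.C=S.C}(R \times S)]$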



Improved logical query plan:

Plan II
[Figure: $\pi_{B,D}$ at the root, over a natural join of $\sigma_{R.A='c'}(R)$ and $\sigma_{S.E=2}(S)$]



R:            S:
A  B  C       C   D  E
a  1  10      10  x  2
b  1  20      20  y  2
c  2  10      30  z  2
d  2  35      40  x  1
e  3  45      50  y  3

$\sigma_{A='c'}(R)$:     $\sigma_{E=2}(S)$:
A  B  C                  C   D  E
c  2  10                 10  x  2
                         20  y  2
                         30  z  2

$\pi_{B,D}$ of the natural join of the two selections:
B  D
2  x
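Plan II can be sketched on the same data, again modelling tuples as Python tuples; the nested-loop join is just one possible join algorithm, chosen here for simplicity. The selections run first, so only the reduced inputs are paired up:

```python
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]  # (A, B, C)
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]  # (C, D, E)

r_sel = [r for r in R if r[0] == "c"]  # sigma_{A='c'}(R): one scan of R, 1 tuple kept
s_sel = [s for s in S if s[2] == 2]    # sigma_{E=2}(S): one scan of S, 3 tuples kept

# Natural join on the shared attribute C, as a nested loop over the reduced inputs.
joined = [(r, s) for r in r_sel for s in s_sel if r[2] == s[0]]
answer = {(r[1], s[1]) for r, s in joined}  # pi_{B,D}

pairs_compared = len(r_sel) * len(s_sel)
print(len(r_sel), len(s_sel), pairs_compared, answer)  # 1 3 3 {(2, 'x')}
```

The original plan builds all 5 × 5 = 25 pairs of R × S; here only 1 × 3 = 3 pairs are compared, for the same answer.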
Questions for Query Optimization
• Which relational algebra expression, equivalent to the given query, will lead to the most efficient solution plan?

• For each algebraic operator, which algorithm (of several available) do we use to compute that operator?

• How do operations pass data (main memory buffer, disk buffer, …)?

• Will this plan minimize resource usage (CPU, response time, disk)?



Overview of Query Execution

[Figure: SQL query → parse → parse tree → convert → logical query plan (l.q.p.) → apply laws → “improved” l.q.p. → estimate result sizes (using statistics) → l.q.p. + sizes → consider physical plans → {P1, P2, …} → estimate costs → {(P1, C1), …} → pick best → Pi → execute → answer]
Processing Steps



Three Major Steps of Processing
(1) Query Decomposition
  - Analysis
  - Derive Relational Algebra Tree
  - Normalization

(2) Query Optimization
  - Heuristic: Improve and refine the relational algebra tree to create equivalent Logical Query Plans
  - Cost-Based: Use database statistics to estimate the physical costs of the logical operators in the LQP to create Physical Execution Plans

(3) Query Execution



Query Decomposition

• RELATIONAL ALGEBRA TREE
  - Root: the desired result of the query
  - Leaf: base relations of the query
  - Non-leaf: intermediate relation created from a relational algebra operation
• NORMALIZATION
  - Convert the WHERE clause into a more easily manipulated form
  - Conjunctive Normal Form (CNF), a conjunction (∧) of disjunctions (∨): $(a \vee b) \wedge [(c \vee d) \wedge e] \wedge f$ (more efficient)
  - Disjunctive Normal Form (DNF): a disjunction (∨) of conjunctions (∧)
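The CNF example can be stored as a flat list of clauses (each clause an OR of terms, the clauses ANDed together), the form that lets an optimizer treat each clause as a separate selection. This sketch, using a hypothetical encoding, checks by brute force over all 64 truth assignments that the flat clause list agrees with the bracketed form on the slide:

```python
from itertools import product

# CNF as a list of clauses; each clause lists variable names joined by OR.
CNF = [["a", "b"], ["c", "d"], ["e"], ["f"]]


def eval_cnf(clauses, env):
    """True iff every clause contains at least one true term."""
    return all(any(env[v] for v in clause) for clause in clauses)


def slide_form(env):
    """The bracketed expression (a OR b) AND [(c OR d) AND e] AND f."""
    a, b, c, d, e, f = (env[k] for k in "abcdef")
    return (a or b) and ((c or d) and e) and f


assignments = [dict(zip("abcdef", bits)) for bits in product([False, True], repeat=6)]
assert all(eval_cnf(CNF, env) == slide_form(env) for env in assignments)
print(len(assignments), "assignments checked")  # 64 assignments checked
```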



Query Processing: Who needs it?
A motivating example:
Identify all managers who work in a London branch

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
      s.position = 'Manager' AND
      b.city = 'London';

This results in these equivalent relational algebra statements:

(1) $\sigma_{(position='Manager') \wedge (city='London') \wedge (Staff.branchNo=Branch.branchNo)}(Staff \times Branch)$

(2) $\sigma_{(position='Manager') \wedge (city='London')}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch)$

(3) $[\sigma_{position='Manager'}(Staff)] \bowtie_{Staff.branchNo=Branch.branchNo} [\sigma_{city='London'}(Branch)]$



A Motivating Example (cont…)
Assume:
 1000 tuples in Staff.
 ~ 50 Managers

 50 tuples in Branch.
 ~ 5 London branches

 No indexes or sort keys

 All temporary results are written back to disk (memory is small)

 Tuples are accessed one at a time (not in blocks)



Motivating Example: Query 1 (Bad)
$\sigma_{(position='Manager') \wedge (city='London') \wedge (Staff.branchNo=Branch.branchNo)}(Staff \times Branch)$

• Requires (1000 + 50) disk accesses to read from the Staff and Branch relations
• Creates a temporary relation holding the Cartesian product (1000 × 50 tuples), written to disk
• Requires (1000 × 50) disk accesses to read in the temporary relation and test the predicate

Total Work = (1000 + 50) + 2 × (1000 × 50) = 101,050 I/O operations



Motivating Example: Query 2 (Better)
$\sigma_{(position='Manager') \wedge (city='London')}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch)$

• Again requires (1000 + 50) disk accesses to read from Staff and Branch
• Joins Staff and Branch on branchNo, producing 1000 tuples (each staff member works at exactly one branch), written to disk
• Requires 1000 disk accesses to read in the joined relation and check the predicate

Total Work = (1000 + 50) + 2 × 1000 = 3,050 I/O operations

About 33 times fewer I/O operations than Query 1 (101,050 / 3,050 ≈ 33)



Motivating Example: Query 3 (Best)
$[\sigma_{position='Manager'}(Staff)] \bowtie_{Staff.branchNo=Branch.branchNo} [\sigma_{city='London'}(Branch)]$

• Read the Staff relation to determine the 'Managers' (1000 reads)
• Create a 50-tuple relation (50 writes)
• Read the Branch relation to determine the 'London' branches (50 reads)
• Create a 5-tuple relation (5 writes)
• Join the reduced relations and check the predicate (50 + 5 reads)

Total Work = 1000 + 50 + 50 + 5 + (50 + 5) = 1,160 I/O operations

About 87 times fewer I/O operations than Query 1 (101,050 / 1,160 ≈ 87)

Consider what happens if the Staff and Branch relations were 10× or 100× larger!
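The three cost formulas can be written as one function of the relation sizes, which makes the closing question easy to explore. This is a sketch under the slides' assumptions (no indexes, one tuple per I/O, every temporary result written to disk and read back, and the join in Query 3 reading each reduced relation once); `plan_costs` is a hypothetical name:

```python
def plan_costs(n_staff=1000, n_branch=50, n_managers=50, n_london=5):
    """I/O counts for Queries 1-3 under the slides' assumptions."""
    read_inputs = n_staff + n_branch
    q1 = read_inputs + 2 * (n_staff * n_branch)  # write + re-read the Cartesian product
    q2 = read_inputs + 2 * n_staff               # write + re-read the join (one branch per staff tuple)
    q3 = (n_staff + n_managers                   # scan Staff, write the managers
          + n_branch + n_london                  # scan Branch, write the London branches
          + (n_managers + n_london))             # read both reduced relations for the join
    return q1, q2, q3


print(plan_costs())  # (101050, 3050, 1160)
for k in (10, 100):  # scale every relation and selection result by k
    print(k, plan_costs(1000 * k, 50 * k, 50 * k, 5 * k))
```

Query 1 grows with the product of the two relation sizes, so scaling both by 100 multiplies its cost by roughly 10,000, while Queries 2 and 3 grow only about 100-fold.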
Heuristic Optimization
GOAL:
• Use relational algebra equivalence rules to improve the expected performance of a given query tree.

Consider the example given earlier:
• Join followed by Selection (~3,050 I/O operations)
• Selection followed by Join (~1,160 I/O operations)



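Relational Algebra Transformations
Cascade of Selection
(1) $\sigma_{p \wedge q \wedge r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$

Commutativity of Selection Operations
(2) $\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$

In a sequence of projections only the last is required (provided $L \subseteq M \subseteq \dots \subseteq N$)
(3) $\pi_L(\pi_M(\cdots \pi_N(R))) = \pi_L(R)$

Selections can be combined with Cartesian Products and Joins
(4) $\sigma_p(R \times S) = R \bowtie_p S$
(5) $\sigma_p(R \bowtie_q S) = R \bowtie_{q \wedge p} S$

[Figure: query-tree visuals of these rules: $\sigma_p$ above $R \times S$ becomes $R \bowtie_p S$; $\sigma_p$ above $R \bowtie_q S$ becomes $R \bowtie_{q \wedge p} S$]

Note: The above is an incomplete list! For a complete list, see the text.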
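Rules (1) to (5) can be spot-checked on small made-up relations. This is a sketch, not a proof (one data set cannot establish an equivalence), assuming list-of-dict relations compared with set semantics; S's join column is called SC so merged tuples keep both values. Rule (4) is essentially the definition of the theta join, so its check mainly confirms the helper is consistent:

```python
from itertools import product

# Made-up relations; S's join column is called SC so merged tuples keep both C values.
R = [{"A": "a", "B": 1, "C": 10}, {"A": "c", "B": 2, "C": 10}, {"A": "d", "B": 2, "C": 35}]
S = [{"SC": 10, "D": "x", "E": 2}, {"SC": 35, "D": "z", "E": 1}, {"SC": 40, "D": "x", "E": 2}]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    rows = dict.fromkeys(tuple(t[a] for a in attrs) for t in rel)  # drop duplicates
    return [dict(zip(attrs, row)) for row in rows]


def cross(left, right):
    return [{**a, **b} for a, b in product(left, right)]


def join(left, right, cond):
    """Nested-loop theta join: pairs whose concatenation satisfies cond."""
    return [{**a, **b} for a in left for b in right if cond({**a, **b})]


def same(r1, r2):
    """Set-semantics equality: same tuples, order ignored."""
    return {frozenset(t.items()) for t in r1} == {frozenset(t.items()) for t in r2}


p = lambda t: t["B"] == 2
q = lambda t: t["C"] < 30
r = lambda t: t["A"] != "a"
on_c = lambda t: t["C"] == t["SC"]

checks = {
    "(1) cascade": same(select(R, lambda t: p(t) and q(t) and r(t)),
                        select(select(select(R, r), q), p)),
    "(2) commutativity": same(select(select(R, q), p), select(select(R, p), q)),
    "(3) projections": same(project(project(R, ["A", "B"]), ["A"]), project(R, ["A"])),
    "(4) selection over product": same(select(cross(R, S), on_c), join(R, S, on_c)),
    "(5) selection over join": same(select(join(R, S, on_c), p),
                                    join(R, S, lambda t: on_c(t) and p(t))),
}
print(all(checks.values()))  # True
```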



More Relational Algebra Transformations

Join and Cartesian Product Operations are Commutative and Associative
(6) $R \times S = S \times R$
(7) $R \times (S \times T) = (R \times S) \times T$
(8) $R \bowtie_p S = S \bowtie_p R$
(9) $(R \bowtie_p S) \bowtie_q T = R \bowtie_p (S \bowtie_q T)$ (provided q involves only attributes of S and T)

Selection Distributes over Joins
• If predicate p involves attributes of R only:
(10) $\sigma_p(R \bowtie_q S) = \sigma_p(R) \bowtie_q S$
• If predicate p involves only attributes of R and q involves only attributes of S:
(11) $\sigma_{p \wedge q}(R \bowtie_r S) = \sigma_p(R) \bowtie_r \sigma_q(S)$
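Rules (8), (10), and (11) can be spot-checked on the R(A,B,C) and S(C,D,E) rows from the earlier example, again as a sketch rather than a proof. S's C column is renamed SC (an assumption) so the merged tuples keep both values, and the hypothetical `join` helper also reports how many tuple pairs it compared, which shows why pushing selections below the join pays off:

```python
# The example's R(A, B, C) and S(C, D, E) rows; S's C is renamed SC.
R = [dict(zip(("A", "B", "C"), row))
     for row in [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]]
S = [dict(zip(("SC", "D", "E"), row))
     for row in [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]]


def select(rel, pred):
    return [t for t in rel if pred(t)]


def join(left, right, cond):
    """Nested-loop theta join; also returns how many tuple pairs were compared."""
    out = [{**a, **b} for a in left for b in right if cond({**a, **b})]
    return out, len(left) * len(right)


def same(r1, r2):
    """Set-semantics equality: same tuples, order ignored."""
    return {frozenset(t.items()) for t in r1} == {frozenset(t.items()) for t in r2}


on_c = lambda t: t["C"] == t["SC"]  # the join condition
p = lambda t: t["A"] == "c"         # mentions attributes of R only
q = lambda t: t["E"] == 2           # mentions attributes of S only

full, full_pairs = join(R, S, on_c)
rule8 = same(full, join(S, R, on_c)[0])
rule10 = same(select(full, p), join(select(R, p), S, on_c)[0])
late = select(full, lambda t: p(t) and q(t))
early, early_pairs = join(select(R, p), select(S, q), on_c)
rule11 = same(late, early)

print(rule8, rule10, rule11)    # True True True
print(full_pairs, early_pairs)  # 25 3
```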



Optimization Uses The Following Heuristics

• Break apart conjunctive selections into a sequence of simpler selections (preparatory step for the next heuristic).

• Move σ down the query tree for the earliest possible execution (reduces the number of tuples processed).

• Replace each σ over × pair by ⋈ (avoids large intermediate results).

• Break apart lists of projection attributes and move them as far down the tree as possible, creating new projections where possible (reduces tuple widths early).

• Perform the joins with the smallest expected result first.



Heuristic Optimization Example
“What are the ticket numbers of the pilots flying to France on 01-01-06?”

SELECT p.ticketno
FROM Flight f, Passenger p, Crew c
WHERE f.flightNo = p.flightNo AND
      f.flightNo = c.flightNo AND
      f.date = '01-01-06' AND
      f.to = 'FRA' AND
      p.name = c.name AND
      c.job = 'Pilot'

[Figure: Canonical Relational Algebra Expression]
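The first three heuristics (split conjunctive selections, push each σ as far down as possible, turn a σ left directly above × into ⋈) can be sketched as one recursive rewrite over the canonical tree for this query. The tagged-tuple plan representation and every helper name are hypothetical; projection push-down and join reordering (the last two heuristics) are not attempted:

```python
# Plan nodes are tagged tuples:
#   ("rel", name, attrs)  ("select", preds, child)  ("product", left, right)
#   ("join", preds, left, right)  ("project", attrs, child)
# A predicate is (text, frozenset of the attributes it mentions).


def pred(text, *attrs):
    return (text, frozenset(attrs))


def rel(name, *attrs):
    return ("rel", name, frozenset(attrs))


def attrs_of(node):
    """Attributes available in the output of a subtree."""
    kind = node[0]
    if kind == "rel":
        return node[2]
    if kind == "select":
        return attrs_of(node[2])
    if kind == "project":
        return frozenset(node[1])
    if kind == "product":
        return attrs_of(node[1]) | attrs_of(node[2])
    return attrs_of(node[2]) | attrs_of(node[3])  # join


def push(node, preds=()):
    """Split selections into conjuncts, move each as far down as its attributes allow,
    and turn a selection that ends up directly above a product into a join."""
    kind = node[0]
    if kind == "select":
        return push(node[2], list(preds) + node[1])
    if kind == "project":
        return ("project", node[1], push(node[2], preds))
    if kind == "rel":
        return ("select", list(preds), node) if preds else node
    left, right = (node[1], node[2]) if kind == "product" else (node[2], node[3])
    here = [] if kind == "product" else list(node[1])
    left_attrs, right_attrs = attrs_of(left), attrs_of(right)
    to_left = [p for p in preds if p[1] <= left_attrs]
    to_right = [p for p in preds if p[1] <= right_attrs and not p[1] <= left_attrs]
    here += [p for p in preds if not (p[1] <= left_attrs or p[1] <= right_attrs)]
    new_left, new_right = push(left, to_left), push(right, to_right)
    return ("join", here, new_left, new_right) if here else ("product", new_left, new_right)


def show(node):
    kind = node[0]
    if kind == "rel":
        return node[1]
    if kind == "select":
        return f"σ[{' ∧ '.join(p[0] for p in node[1])}]({show(node[2])})"
    if kind == "project":
        return f"π[{', '.join(node[1])}]({show(node[2])})"
    if kind == "product":
        return f"({show(node[1])} × {show(node[2])})"
    return f"({show(node[2])} ⋈[{' ∧ '.join(p[0] for p in node[1])}] {show(node[3])})"


F = rel("Flight", "f.flightNo", "f.date", "f.to")
P = rel("Passenger", "p.flightNo", "p.name", "p.ticketno")
C = rel("Crew", "c.flightNo", "c.name", "c.job")
canonical = ("project", ["p.ticketno"], ("select", [
    pred("f.flightNo=p.flightNo", "f.flightNo", "p.flightNo"),
    pred("f.flightNo=c.flightNo", "f.flightNo", "c.flightNo"),
    pred("f.date='01-01-06'", "f.date"),
    pred("f.to='FRA'", "f.to"),
    pred("p.name=c.name", "p.name", "c.name"),
    pred("c.job='Pilot'", "c.job"),
], ("product", ("product", F, P), C)))

print(show(canonical))
print(show(push(canonical)))
```

In the rewritten tree, the date and destination tests sit directly on Flight, the job test sits on Crew, and both × operators have become joins.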



Heuristic Optimization (Step 1)



Heuristic Optimization (Step 2)



Heuristic Optimization (Step 3)



Heuristic Optimization (Step 4)



Heuristic Optimization (Step 5)



Heuristic Optimization (Step 6)



Query Transformation Example

Figure 19.2 Steps in converting a query tree during heuristic optimization. (a) Initial (canonical) query tree for SQL query Q.



Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization. (b) Moving SELECT operations down the query tree.



Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization. (c) Applying the more restrictive SELECT operation first.



Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization. (d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.



Query Transformation Example (cont’d.)

Figure 19.2 Steps in converting a query tree during heuristic optimization. (e) Moving PROJECT operations down the query tree.



Covered Topics
 Relational Algebra Revision

 Query Processing

 Query Optimization

