ADB Chapter Two

CHAPTER TWO
Query processing and Optimization

Translating SQL Queries into Relational Algebra
Relational algebra is a widely used procedural query language that takes relations as input and
produces a relation as output.
Relational algebra operations can be applied recursively: the output of each operation is a new
relation, formed from one or more input relations, which can in turn be the input of another
operation. Queries are built from operators, each of which is either unary or binary.
Relational Algebra operations
Basic Operations:
1. Selection (σ): Selects a subset of rows (tuples) from a relation.
2. Projection (π): Deletes unwanted columns from a relation.
3. Cross product (X): Combines two relations.
4. Set difference (-): Tuples in relation 1 but not in relation 2.
5. Union (∪): Tuples in relation 1 or in relation 2.
Additional Operations:
Intersection, join, division, and renaming are not essential, but (very!) useful.
1. Selection
• The selection operator is a unary operator that acts like a filter on a relation, returning
only those rows that satisfy the selection condition.
• The resulting relation has the same degree (schema) as the original (input) relation.
• The resulting relation may have fewer rows than the original relation.
• It is denoted by sigma (σ):
σ C (R) (returns only those tuples in R that satisfy condition C)
• The tuples (rows) returned depend on the condition that is part of the selection operator.
• A condition C can be made up of any combination of comparison and logical operators
that operate on the attributes of R.
• Comparison operators: =, ≠, <, ≤, >, ≥
• Logical operators: ∧ (AND), ∨ (OR), ¬ (NOT)
Examples:
Assume the following relation is EMP.
Name    Office  Dept  Rank
Smith   400     CS    Assistant
Jones   220     Econ  Adjunct
Green   160     Econ  Assistant
Brown   420     CS    Associate
Smith   500     Fin   Associate
• Select only those employees in the CS department:
SELECT * FROM EMP WHERE Dept = 'CS';
σ Dept = 'CS' (EMP)
The resulting relation will be:
Name    Office  Dept  Rank
Smith   400     CS    Assistant
Brown   420     CS    Associate
• Select only those employees with last name Smith who are assistant professors:
SELECT * FROM EMP WHERE Name = 'Smith' AND Rank = 'Assistant';
σ Name = 'Smith' ∧ Rank = 'Assistant' (EMP)
The resulting relation will be:
Name    Office  Dept  Rank
Smith   400     CS    Assistant
• Select only those employees who are either assistant professors or in the Economics
department:
SELECT * FROM EMP WHERE Rank = 'Assistant' OR Dept = 'Econ';
σ Rank = 'Assistant' ∨ Dept = 'Econ' (EMP)
The resulting relation will be:
Name    Office  Dept  Rank
Smith   400     CS    Assistant
Jones   220     Econ  Adjunct
Green   160     Econ  Assistant
• Select only those employees who are neither in the CS department nor adjuncts:
SELECT * FROM EMP WHERE NOT (Rank = 'Adjunct' OR Dept = 'CS');
σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP)
The resulting relation will be:
Name    Office  Dept  Rank
Green   160     Econ  Assistant
Smith   500     Fin   Associate
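The selection examples above can be tried out with a short Python sketch. It assumes each tuple of EMP is stored as a dictionary; the helper name select is only illustrative and is not part of any DBMS.

```python
# EMP relation from the example above, one dictionary per tuple.
EMP = [
    {"Name": "Smith", "Office": 400, "Dept": "CS", "Rank": "Assistant"},
    {"Name": "Jones", "Office": 220, "Dept": "Econ", "Rank": "Adjunct"},
    {"Name": "Green", "Office": 160, "Dept": "Econ", "Rank": "Assistant"},
    {"Name": "Brown", "Office": 420, "Dept": "CS", "Rank": "Associate"},
    {"Name": "Smith", "Office": 500, "Dept": "Fin", "Rank": "Associate"},
]


def select(relation, condition):
    """sigma_condition(relation): keep the tuples for which condition is true."""
    return [t for t in relation if condition(t)]


# sigma Dept = 'CS' (EMP)
cs_staff = select(EMP, lambda t: t["Dept"] == "CS")

# sigma NOT (Rank = 'Adjunct' OR Dept = 'CS') (EMP)
neither = select(EMP, lambda t: not (t["Rank"] == "Adjunct" or t["Dept"] == "CS"))

print([t["Name"] for t in cs_staff])  # ['Smith', 'Brown']
print([t["Name"] for t in neither])   # ['Green', 'Smith']
```

The schema of the result is unchanged: every returned dictionary still has all four attributes, only the number of tuples shrinks.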

2. Projection
• Projection is also a unary operator; it limits the attributes that are returned from the
original relation.
• The projection operator is pi (π).
• The general syntax is:
π attributes (R)
(where attributes is the list of attributes to be displayed and R is the relation.)
• Projection defines a relation that contains a vertical subset of the input relation.
• The resulting relation has the same number of tuples as the original relation, unless the
projection produces duplicate tuples.
• The degree of the resulting relation is equal to or less than that of the original relation,
because it contains exactly the fields in the projection list.
• The projection operator has to eliminate duplicates, since a relation is a set of tuples.
Examples
Assume the same EMP relation above is used.
• Project only the names and departments of the employees:
π name, dept (EMP)
The resulting relation will be:
Name   Dept
Smith  CS
Jones  Econ
Green  Econ
Brown  CS
Smith  Fin
Example 2: The following relation is called Sailor
Sid  Sname   Rating  Age
28   Smith   9       35
31   lubber  8       55
44   Guppy   5       35
58   Rusty   10      35
• SELECT Sname, Rating FROM Sailor;
π Sname, Rating (Sailor)
Sname   Rating
Smith   9
lubber  8
Guppy   5
Rusty   10
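A minimal Python sketch of projection on the Sailor relation follows. The function name project and the dictionary representation are assumptions made for illustration; the point is that duplicates are removed after the unwanted columns are dropped.

```python
# Sailor relation from Example 2, one dictionary per tuple.
SAILOR = [
    {"Sid": 28, "Sname": "Smith", "Rating": 9, "Age": 35},
    {"Sid": 31, "Sname": "lubber", "Rating": 8, "Age": 55},
    {"Sid": 44, "Sname": "Guppy", "Rating": 5, "Age": 35},
    {"Sid": 58, "Sname": "Rusty", "Rating": 10, "Age": 35},
]


def project(relation, attributes):
    """pi_attributes(relation): keep only the listed columns, dropping duplicates."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # duplicate elimination
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


names_ratings = project(SAILOR, ["Sname", "Rating"])
ages = project(SAILOR, ["Age"])

print(len(names_ratings))  # 4
print(ages)                # [{'Age': 35}, {'Age': 55}]
```

Projecting on Age alone returns two tuples rather than four, because three sailors share the age 35.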
Combining Selection and Projection
• The selection and projection operators can be combined to perform both operations.
Example: Show the names of all employees working in the CS department:
π name (σ Dept = 'CS' (EMP))
Name
Smith
Brown
Example: Show the name and rank of those employees who are neither in the CS department nor
adjuncts:
π name, rank (σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP))
Name   Rank
Green  Assistant
Smith  Associate
3. Cartesian product
The Cartesian product, also referred to as a cross join, returns every combination of rows from
the tables listed in the query: each row in the first table is paired with every row in the second
table. In SQL this is the result when no join condition relates the two tables.
R X S: The Cartesian product operation defines a relation that is the concatenation of every tuple
of relation R with every tuple of relation S. If R has n tuples and S has m tuples, R X S has n * m
tuples.
Example: The following two tables are named R and S
R
First  Last   Age
Bill   Smith  22
Mary   Keen   23
Tony   Jones  32
S
Dinner   Dessert
Steak    Ice Cream
Lobster  Cheesecake
R X S
First  Last   Age  Dinner   Dessert
Bill   Smith  22   Steak    Ice Cream
Bill   Smith  22   Lobster  Cheesecake
Mary   Keen   23   Steak    Ice Cream
Mary   Keen   23   Lobster  Cheesecake
Tony   Jones  32   Steak    Ice Cream
Tony   Jones  32   Lobster  Cheesecake
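The R X S example above can be reproduced with a few lines of Python. This is only a sketch that represents each tuple as a Python tuple and uses itertools.product from the standard library; the function name cartesian_product is illustrative.

```python
from itertools import product

# R(First, Last, Age) and S(Dinner, Dessert) from the example above.
R = [("Bill", "Smith", 22), ("Mary", "Keen", 23), ("Tony", "Jones", 32)]
S = [("Steak", "Ice Cream"), ("Lobster", "Cheesecake")]


def cartesian_product(r, s):
    """R X S: concatenate every tuple of r with every tuple of s."""
    return [x + y for x, y in product(r, s)]


RxS = cartesian_product(R, S)

print(len(RxS))  # 6, that is 3 * 2
print(RxS[0])    # ('Bill', 'Smith', 22, 'Steak', 'Ice Cream')
```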
4. Union, Intersection and Set Difference
All of these operations take two input relations, which must be union-compatible:
• Same number of fields.
• Corresponding fields have the same type.
Union
R ∪ S: The union of two relations R and S defines a relation that contains all the tuples that are
in R, in S, or in both. Duplicate tuples are eliminated. R and S must be union-compatible.
Example: Consider the following relations R and S
Relation R
First    Last     Age
Bill     Smith    22
Sally    Green    28
Mary     Keen     23
Tony     Jones    32
Relation S
First    Last     Age
Forrest  Gump     36
Sally    Green    28
DonJuan  DeMarco  27
R ∪ S
First    Last     Age
Bill     Smith    22
Sally    Green    28
Mary     Keen     23
Tony     Jones    32
Forrest  Gump     36
DonJuan  DeMarco  27
Intersection
R ∩ S: The intersection operation defines a relation consisting of the set of all tuples that are in
both R and S. R and S must be union-compatible.
Example: Consider the following relations R and S
Relation R
First    Last     Age
Bill     Smith    22
Sally    Green    28
Mary     Keen     23
Tony     Jones    32
Relation S
First    Last     Age
Forrest  Gump     36
Sally    Green    28
DonJuan  DeMarco  27
R ∩ S
First    Last     Age
Sally    Green    28
Set Difference
R - S: The set difference operation defines a relation consisting of the tuples that are in relation
R but not in S. It returns all tuples belonging to the first (left) relation that do not appear in
the second (right) relation. R and S must be union-compatible.
Relation R
First    Last     Age
Bill     Smith    22
Sally    Green    28
Mary     Keen     23
Tony     Jones    32
Relation S
First    Last     Age
Forrest  Gump     36
Sally    Green    28
DonJuan  DeMarco  27
R - S
First    Last     Age
Bill     Smith    22
Mary     Keen     23
Tony     Jones    32
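Because relations are sets of tuples, the three set operations above map directly onto Python's built-in set type. The sketch below uses the R and S relations from the examples; the union_compatible helper is a hypothetical check written for illustration, and it inspects the stored tuples rather than a declared schema.

```python
# R and S from the examples above, as sets of (First, Last, Age) tuples.
R = {("Bill", "Smith", 22), ("Sally", "Green", 28),
     ("Mary", "Keen", 23), ("Tony", "Jones", 32)}
S = {("Forrest", "Gump", 36), ("Sally", "Green", 28),
     ("DonJuan", "DeMarco", 27)}


def union_compatible(r, s):
    """True if all tuples have the same number of fields with the same types."""
    shapes = {tuple(type(v) for v in t) for t in r | s}
    return len(shapes) <= 1


assert union_compatible(R, S)

union = R | S         # R ∪ S, duplicates eliminated
intersection = R & S  # R ∩ S
difference = R - S    # R - S

print(len(union))          # 6
print(intersection)        # {('Sally', 'Green', 28)}
print(sorted(difference))  # [('Bill', 'Smith', 22), ('Mary', 'Keen', 23), ('Tony', 'Jones', 32)]
```

Note that S - R would give a different result (Forrest Gump and DonJuan DeMarco), which is why set difference is not commutative.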
5. Join Operations
A join operation combines rows from two or more tables based on a related column between them.
Typically we want only those combinations of the Cartesian product that satisfy certain
conditions, so we would normally use a join operation instead of the Cartesian product operation.
Join is derived from the Cartesian product: it is equivalent to performing a selection, with the
join predicate as the selection condition, over the Cartesian product of the operand relations. It
is one of the most difficult operations to implement efficiently in an RDBMS and is one of the
reasons why relational systems have intrinsic performance problems.
There are various forms of Join operations:
1. (INNER) JOIN: It returns records that have matching values in both tables.
2. LEFT (OUTER) JOIN: It returns all records from the left table and the matched records
from the right table. Tuples from R that do not have matching values in the common
attributes of S are also included in the result relation. Missing values in the second relation
are set to null.
3. RIGHT (OUTER) JOIN: It returns all records from the right table and the matched
records from the left table.
4. FULL (OUTER) JOIN: It returns all records when there is a match in either left or right
table.
Condition join: R ⋈ C S = σ C (R X S)
Example: S1 ⋈ S1.sid < R1.sid R1
S1
Sid  Sname     Rating  Age
22   Peter     7       45
31   Mengistu  8       55
58   Solomon   10      35
R1
Sid  Bid  Day
22   101  10/10/96
58   103  11/12/96
The resulting relation will be:
Sid  Sname     Rating  Age  Sid  Bid  Day
22   Peter     7       45   58   103  11/12/96
31   Mengistu  8       55   58   103  11/12/96
5. Equijoin: a special case of condition join where the condition C contains only equalities.
A natural join is an equijoin on the common attributes in which one copy of each duplicated
attribute is removed.
Example:
π sid, .., age, bid, .. (S1 ⋈ sid R1)
S1 and R1 are the same relations as in the condition join example.
The resulting relation will be:
Sid  Sname    Rating  Age  Bid  Day
22   Peter    7       45   101  10/10/96
58   Solomon  10      35   103  11/12/96
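The definition of a join as a selection over the Cartesian product can be sketched directly in Python using the S1 and R1 relations from the examples. The function name theta_join is illustrative, and the nested loop is the naive strategy implied by the definition, not how a real DBMS would necessarily implement joins.

```python
# S1(Sid, Sname, Rating, Age) and R1(Sid, Bid, Day) from the examples above.
S1 = [(22, "Peter", 7, 45), (31, "Mengistu", 8, 55), (58, "Solomon", 10, 35)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]


def theta_join(r, s, condition):
    """R join_C S = sigma_C(R X S): pair every tuple, keep pairs satisfying C."""
    return [x + y for x in r for y in s if condition(x, y)]


# Condition join on S1.sid < R1.sid
condition_join = theta_join(S1, R1, lambda s, r: s[0] < r[0])

# Equijoin on sid, then drop the second Sid column (index 4) to get the natural join
equijoin = theta_join(S1, R1, lambda s, r: s[0] == r[0])
natural_join = [row[:4] + row[5:] for row in equijoin]

for row in condition_join:
    print(row)
for row in natural_join:
    print(row)
```

The condition join keeps both Sid columns, while the natural join result has six attributes instead of seven.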
6. Division Operation
• Division is not supported as a primitive operator, but it is useful for expressing queries
like: "Find the sailors who have reserved all boats."
• Precondition: in A/B, the attributes of B must be included in the schema of A. The result
has the attributes of A that are not in B (A - B).
• Example schemas: SALES(SupId, ProdId) and PRODUCTS(ProdId).
• The relations SALES and PRODUCTS must be built using projections.
• SALES/PRODUCTS: the ids of the suppliers supplying all products.
Example:
A
Sno  Pno
S1   P1
S1   P2
S1   P3
S1   P4
S2   P1
S2   P2
S3   P2
S4   P2
S4   P4
B1         A/B1
Pno        Sno
P2         S1
           S2
           S3
           S4
B2         A/B2
Pno        Sno
P2         S1
P4         S4
B3         A/B3
Pno        Sno
P1         S1
P2
P4
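The division example above can be checked with a small Python sketch. It assumes A has schema (Sno, Pno) and B has schema (Pno); the function name divide is illustrative and handles only this two-column case.

```python
# A(Sno, Pno) from the example above.
A = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"),
}


def divide(a, b):
    """A / B: the Sno values that are paired in A with every Pno value in B."""
    suppliers = {sno for sno, _ in a}
    return {sno for sno in suppliers if all((sno, pno) in a for pno in b)}


print(sorted(divide(A, {"P2"})))              # ['S1', 'S2', 'S3', 'S4']
print(sorted(divide(A, {"P2", "P4"})))        # ['S1', 'S4']
print(sorted(divide(A, {"P1", "P2", "P4"})))  # ['S1']
```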
OVERVIEW OF QUERY PROCESSING
There are many ways in which a complex query can be performed, and one of the aims of query
processing is to determine which one is the most cost-effective. In declarative languages such as
SQL, the user specifies what data is required rather than how it is to be retrieved.
Query processing
Query processing comprises the activities involved in retrieving data from the database. Giving
the DBMS the responsibility for selecting the best strategy prevents users from choosing
strategies that are known to be inefficient and gives the DBMS more control over system
performance.
The aims of query processing are to transform a query written in a high-level language, typically
SQL, into a correct and efficient execution strategy expressed in a low-level language
(implementing the relational algebra), and to execute the strategy to retrieve the required data.
Query processing can be divided into four main phases:
1. Decomposition(consists of parsing and validation)
2. Optimization
3. Code generation and
4. Execution
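[Figure: Phases of query processing. At compile time, a query in a high-level language (typically
SQL) passes through query decomposition (which consults the system catalog) to produce a
relational algebra expression, then through query optimization (which uses database statistics) to
produce an execution plan, then through code generation to produce generated code. At runtime,
runtime query execution runs the generated code against the main database and returns the query
output.]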
Query Decomposition
The aim of query decomposition is to transform a high-level query into a relational algebra query,
and to check that the query is syntactically and semantically correct
The typical stages of query decomposition are:
 Analysis
 Normalization
 Semantic analysis
 Simplification and
 Query restructuring.
1. Analysis
In this stage, the query is lexically and syntactically analyzed using the techniques of programming
language compilers.
In addition, this stage verifies that the relations and attributes specified in the query are defined in
the system catalog. It also verifies that any operations applied to database objects are appropriate
for the object type.
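Example:
SELECT StaffNumber
FROM staff
WHERE position > 10;
This query is rejected because of a data type mismatch: position is a character string, so it
cannot be compared with the numeric constant 10.
On completion of this stage, the high-level query has been transformed into some internal
representation that is more suitable for processing.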
The internal form that is typically chosen is some kind of tree, which is constructed as follows:
 A leaf node is created for each base relation in the query

 A non-leaf node is created for each intermediate relation produced by a relational algebra
operation.
 The root of the tree represents the result of the query
 The sequence of operations is derived from the leaves to the root
Example: Find all managers who work at a Wolkite branch

              ⋈ s.branchNo = b.branchNo                  (root)
             /                         \
σ s.position = 'manager'       σ b.city = 'Wolkite'      (intermediate operations)
            |                           |
          Staff                      Branch              (leaves)
2. Normalization
This stage converts the query into a normalized form that can be more easily manipulated. The
predicate of the WHERE clause is converted to conjunctive or disjunctive normal form.
Conjunctive normal form: A sequence of conjuncts that are connected with the ∧ (AND)
operator. Each conjunct contains one or more terms connected by the ∨ (OR) operator. A
conjunctive selection contains only those tuples that satisfy all conjuncts.
Example:
(position = 'Manager' ∨ salary > 20000) ∧ (branchNo = 'B003')
Disjunctive normal form: A sequence of disjuncts that are connected with the ∨ (OR) operator.
Each disjunct contains one or more terms connected by the ∧ (AND) operator. A disjunctive
selection contains those tuples formed by the union of all tuples that satisfy the disjuncts.
Example:
(position = 'Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧ branchNo = 'B003')
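The two example predicates above express the same condition, once in conjunctive and once in disjunctive normal form. A brute-force truth table in Python can confirm this; the three Boolean parameters are stand-ins (an assumption for illustration) for the atomic conditions position = 'Manager', salary > 20000, and branchNo = 'B003'.

```python
from itertools import product


def cnf(manager, high_salary, b003):
    """(position = 'Manager' OR salary > 20000) AND (branchNo = 'B003')"""
    return (manager or high_salary) and b003


def dnf(manager, high_salary, b003):
    """(position = 'Manager' AND branchNo = 'B003') OR (salary > 20000 AND branchNo = 'B003')"""
    return (manager and b003) or (high_salary and b003)


# Compare the two forms on all 2**3 combinations of truth values.
equivalent = all(cnf(*v) == dnf(*v) for v in product([False, True], repeat=3))
print(equivalent)  # True
```

The equivalence follows from the distributive law of AND over OR, so a normalizer can move between the two forms without changing which tuples qualify.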
3. Semantic Analysis
The objective of semantic analysis is to reject normalized queries that are incorrectly formulated
or contradictory.
Incorrectly formulated: A query is incorrectly formulated if some of its components do not
contribute to the generation of the result. This may happen if some join specifications are missing.
Contradictory: A query is contradictory if its predicate cannot be satisfied by any tuple.
Example:
(position = 'Manager' ∧ position = 'Assistant') ∨ salary > 20000
The first disjunct is contradictory, because position cannot have two values at once, so the
predicate becomes:
• False ∨ salary > 20000
and the query can be simplified to (salary > 20000).
4. Simplifications
The objectives of the simplification stage are:
• To detect redundant qualifications,
• To eliminate common subexpressions, and
• To transform the query into a semantically equivalent but more easily and efficiently
computed form.
Typically, access restrictions, view definitions, and integrity constraints are considered at this
stage, some of which may also introduce redundancy.
If the user does not have the appropriate access to all the components of the query, the query
must be rejected.
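View definition
Consider the following view:
CREATE VIEW Staff3 AS
SELECT staffNo, fName, lName, salary, branchNo
FROM Staff
WHERE branchNo = 'B003';
and the following query on the view:
SELECT *
FROM Staff3
WHERE (branchNo = 'B003' AND salary > 20000);
After view resolution, the query becomes:
SELECT staffNo, fName, lName, salary, branchNo
FROM Staff
WHERE (branchNo = 'B003' AND salary > 20000) AND branchNo = 'B003';
The WHERE condition then reduces to:
WHERE (branchNo = 'B003' AND salary > 20000)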
Assuming that the user has the appropriate access privileges, an initial optimization is to apply the
well-known idempotency rules of Boolean algebra, such as:
• p ∧ p = p
• p ∧ false = false
• p ∧ true = p
• p ∧ ¬p = false
• p ∧ (p ∨ q) = p
• p ∨ p = p
• p ∨ false = p
• p ∨ true = true
• p ∨ ¬p = true
• p ∨ (p ∧ q) = p
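The Boolean rules listed above can be verified exhaustively, since p and q each take only two values. The Python sketch below is an illustration, not part of any optimizer; it also checks the view example, where b and s are assumed stand-ins for branchNo = 'B003' and salary > 20000.

```python
from itertools import product

# Each rule is written as a function that returns True when both sides agree.
RULES = {
    "p AND p = p": lambda p, q: (p and p) == p,
    "p AND false = false": lambda p, q: (p and False) == False,
    "p AND true = p": lambda p, q: (p and True) == p,
    "p AND NOT p = false": lambda p, q: (p and not p) == False,
    "p AND (p OR q) = p": lambda p, q: (p and (p or q)) == p,
    "p OR p = p": lambda p, q: (p or p) == p,
    "p OR false = p": lambda p, q: (p or False) == p,
    "p OR true = true": lambda p, q: (p or True) == True,
    "p OR NOT p = true": lambda p, q: (p or not p) == True,
    "p OR (p AND q) = p": lambda p, q: (p or (p and q)) == p,
}

truth_values = list(product([False, True], repeat=2))
all_hold = all(rule(p, q) for rule in RULES.values() for p, q in truth_values)
print(all_hold)  # True

# View example: (b AND s) AND b reduces to (b AND s).
view_reduction_ok = all(((b and s) and b) == (b and s) for b, s in truth_values)
print(view_reduction_ok)  # True
```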
Integrity constraints
The following assertion ensures that only managers have a salary greater than 20000:
CREATE ASSERTION OnlyManagerSalaryHigh
CHECK ((position <> 'Manager' AND salary < 20000)
OR (position = 'Manager' AND salary > 20000));
5. Query restructuring
The query is restructured to provide a more efficient implementation.
The heuristic approach uses transformation rules to convert one relational algebra expression into
an equivalent form that is known to be more efficient.
Example: perform selections before joins, as in the query for managers working at a Wolkite
branch.
TRANSFORMATION RULES
In listing these rules, we use three relations R, S, and T, where:
• R is defined over the attributes A = {A1, A2, …, An},
• S is defined over B = {B1, B2, …, Bn},
• p, q, and r denote predicates, and
• L, L1, L2, M, M1, M2, and N denote sets of attributes.
1. Conjunctive selection operations can cascade into individual selection operations (and vice
versa):
• σ p ∧ q ∧ r (R) = σ p (σ q (σ r (R)))
• Example: σ branchNo = 'B0003' ∧ salary > 15000 (Staff) =
σ branchNo = 'B0003' (σ salary > 15000 (Staff))
2. Commutativity of selection operations:
• σ p (σ q (R)) = σ q (σ p (R))
• Example: σ branchNo = 'B0003' (σ salary > 15000 (Staff)) =
σ salary > 15000 (σ branchNo = 'B0003' (Staff))
• Similarly, to find male students whose grade is above 3.2, the two selections can be
applied in either order.
3. In a sequence of projection operations, only the last in the sequence is required:
• π lName (π branchNo, lName (Staff)) = π lName (Staff)
4. Commutativity of selection and projection:
If the predicate p involves only the attributes in the projection list, then the selection and
projection operations commute:
• π A1, …, Am (σ p (R)) = σ p (π A1, …, Am (R))
• Example: π fName, lName (σ lName = 'Beech' (Staff)) =
σ lName = 'Beech' (π fName, lName (Staff))
5. Commutativity of theta join (and Cartesian product):
• R ⋈ p S = S ⋈ p R
• R X S = S X R
• Example: Staff ⋈ Staff.branchNo = Branch.branchNo Branch =
Branch ⋈ Staff.branchNo = Branch.branchNo Staff
6. Commutativity of union and intersection (but not set difference):
• R ∪ S = S ∪ R
• R ∩ S = S ∩ R
7. Commutativity of selection and set operations (union, intersection, and set difference):
• σ p (R ∪ S) = σ p (R) ∪ σ p (S)
• σ p (R ∩ S) = σ p (R) ∩ σ p (S)
• σ p (R - S) = σ p (R) - σ p (S)
8. Commutativity of projection and union:
• π L (R ∪ S) = π L (S) ∪ π L (R)
9. Associativity of theta join (and Cartesian product):
• (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
• (R X S) X T = R X (S X T)
For theta joins, this holds provided each join condition refers only to attributes of the
relations that it joins.
10. Associativity of union and intersection (but not set difference):
• (R ∪ S) ∪ T = S ∪ (R ∪ T)
• (R ∩ S) ∩ T = S ∩ (R ∩ T)
Many DBMSs use heuristics such as these to determine strategies for query processing.
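A few of the transformation rules above (rules 1, 2, and 7) can be spot-checked in Python. The Staff tuples below are hypothetical sample data invented for this sketch, and agreement on one sample illustrates the rules rather than proving them.

```python
# Hypothetical Staff tuples: (staffNo, position, salary, branchNo).
STAFF = [
    ("S1", "Manager", 30000, "B003"),
    ("S2", "Assistant", 12000, "B003"),
    ("S3", "Supervisor", 18000, "B005"),
    ("S4", "Manager", 24000, "B007"),
]


def select(relation, pred):
    """sigma_pred(relation) on a list of tuples."""
    return [t for t in relation if pred(t)]


def in_b003(t):      # predicate p: branchNo = 'B003'
    return t[3] == "B003"


def well_paid(t):    # predicate q: salary > 15000
    return t[2] > 15000


# Rule 1: a conjunctive selection equals a cascade of selections.
combined = select(STAFF, lambda t: in_b003(t) and well_paid(t))
cascaded = select(select(STAFF, well_paid), in_b003)

# Rule 2: the order of the two selections does not matter.
swapped = select(select(STAFF, in_b003), well_paid)

# Rule 7: selection distributes over union and set difference.
R, S = set(STAFF[:3]), set(STAFF[1:])


def sel(rel):
    return {t for t in rel if well_paid(t)}


rule7_union = sel(R | S) == sel(R) | sel(S)
rule7_difference = sel(R - S) == sel(R) - sel(S)

print(combined == cascaded == swapped)  # True
print(rule7_union, rule7_difference)    # True True
```

In an optimizer, rules like these are applied to push selections as close to the base relations as possible, so that later operations see fewer tuples.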
QUERY OPTIMIZATION
Query optimization is the activity of choosing an efficient execution strategy for processing a
query. Since there are usually many equivalent strategies, the aim is to choose the one that
minimizes resource usage.
Generally, we try to reduce the total execution time of the query, which is the sum of the
execution times of all the individual operations that make up the query.
However, resource usage may also be viewed as the response time of the query, in which case we
concentrate on maximizing the number of parallel operations (pipelining).
Both methods of query optimization depend on database statistics to evaluate properly the
different options that are available.
The accuracy and currency of these statistics have a significant bearing on the efficiency of the
execution strategy chosen.
The statistics cover information about relations, attributes, and indexes. For example, the system
catalog may store statistics giving:
• the cardinality of relations,
• the number of distinct values for each attribute, and
• the number of levels in a multilevel index.
Example: Find all managers who work at a Wolkite branch
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'manager' AND b.city = 'Wolkite');
Three equivalent relational algebra queries corresponding to this SQL statement are:
1. σ (position = 'manager') ∧ (city = 'Wolkite') ∧ (Staff.branchNo = Branch.branchNo) (Staff X Branch)
2. σ (position = 'manager') ∧ (city = 'Wolkite') (Staff ⋈ Staff.branchNo = Branch.branchNo Branch)
3. (σ position = 'manager' (Staff)) ⋈ Staff.branchNo = Branch.branchNo (σ city = 'Wolkite' (Branch))
We compare these queries based on the number of disk accesses required, under the following
assumptions:
• There are 1000 tuples in Staff, 50 tuples in Branch, 50 managers (one for each branch),
and 5 Wolkite branches.
• There are no indexes or sort keys on either relation, and the result of any intermediate
operation is stored on disk.
• Tuples are accessed one at a time.
• Main memory is large enough to process entire relations for each relational algebra
operation.
The first query calculates the Cartesian product of Staff and Branch, which requires (1000 + 50)
disk accesses to read the relations and creates a relation with (1000 * 50) tuples, which must be
written to disk. We then have to read each of these tuples again to test them against the selection
predicate, at a cost of another (1000 * 50) disk accesses, giving a total cost of:
(1000 + 50) + 2 * (1000 * 50) = 101050 disk accesses
The second query joins Staff and Branch on the branch number branchNo, which again requires
(1000 + 50) disk accesses to read each of the relations. We know that the join of the two relations
has 1000 tuples, one for each member of staff (a member of staff can only work at one branch).
Writing the join result costs 1000 disk accesses, and the selection operation requires another 1000
disk accesses to read it back, giving a total cost of:
2 * 1000 + (1000 + 50) = 3050 disk accesses
The final query first reads each Staff tuple to determine the manager tuples, which requires 1000
disk accesses and produces a relation with 50 tuples that is written to disk (50 accesses). The
second selection operation reads each Branch tuple to determine the Wolkite branches, which
requires 50 disk accesses and produces a relation with 5 tuples that is written to disk (5 accesses).
The final operation is the join of the reduced Staff and Branch relations, which requires (50 + 5)
disk accesses, giving a total cost of:
1000 + 2 * 50 + 5 + (50 + 5) = 1160 disk accesses
The third strategy is therefore better than the first by a factor of about 87, which is why
selections are performed as early as possible.
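The three cost calculations above can be reproduced with a short Python sketch. The variable names are illustrative, and the model is the simplified one stated in the assumptions: one disk access per tuple read or written, with every intermediate result stored on disk.

```python
# Cardinalities taken from the assumptions above.
STAFF = 1000     # tuples in Staff
BRANCH = 50      # tuples in Branch
MANAGERS = 50    # Staff tuples with position = 'manager'
WOLKITE = 5      # Branch tuples with city = 'Wolkite'

# Strategy 1: Cartesian product, then selection.
product_size = STAFF * BRANCH
cost1 = (STAFF + BRANCH) + product_size + product_size  # read inputs, write, reread

# Strategy 2: join, then selection.
join_size = STAFF  # each member of staff works at exactly one branch
cost2 = (STAFF + BRANCH) + join_size + join_size        # read inputs, write, reread

# Strategy 3: selections first, then join of the reduced relations.
cost3 = (
    STAFF                   # read Staff
    + MANAGERS              # write the manager tuples
    + BRANCH                # read Branch
    + WOLKITE               # write the Wolkite branch tuples
    + (MANAGERS + WOLKITE)  # read both reduced relations for the join
)

print(cost1, cost2, cost3)   # 101050 3050 1160
print(round(cost1 / cost3))  # 87
```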
COST ESTIMATION AND STATISTICS
The dominant cost in query processing is usually that of disk accesses, which are slow compared
with memory accesses. Many of the cost estimates are based on the cardinality of the relation.
The success of estimating the size and cost of intermediate relational algebra operations depends
on the amount and currency of the statistical information that the DBMS holds.
