
Information Systems

Application - Databases

Relational Algebra

Shawn A. Butler, Ph.D.


Senior Lecturer, Executive Education Program
Institute for Software Research
Carnegie Mellon University

Objectives
▪ Understand the fundamentals of relational algebra
▪ Understand how to construct a query
▪ Understand:
  • Selections
  • Projections
  • Unions
  • Joins

© 2009, CMU-ISR 2
Queries
▪ A query is applied to relation instances
▪ The result of a query is also a relation instance
  • The schemas of a query's input relations are fixed
  • The query will run regardless of which instances it is given
  • The schema of the result is also fixed! It is determined by the definitions of the query language constructs
  • Queries often refer to fields by name, but can also refer to them by position
▪ Positional notation is easier for formal definitions; named-field notation is more readable
▪ Both are used in SQL

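A minimal Python sketch of the two notations, assuming a relation is encoded as a tuple of field names plus a set of row tuples (an illustrative encoding, not something the slides define). Python positions are 0-based, whereas formal relational-algebra notation usually counts fields from 1.

```python
# Illustrative encoding (an assumption): a schema is a tuple of field names,
# and an instance is a set of row tuples stored in schema order.
S1_SCHEMA = ("sid", "sname", "rating", "age")
S1 = {
    (22, "Dustin", 7, 45.0),
    (31, "Lubber", 8, 55.5),
    (58, "Rusty", 10, 35.0),
}


def position(schema, name):
    """Translate a field name into its 0-based position in the schema."""
    return schema.index(name)


# Positional notation: 'rating' is the third field (index 2 in Python).
ratings_by_position = {row[2] for row in S1}

# Named-field notation: look up the position from the name first.
ratings_by_name = {row[position(S1_SCHEMA, "rating")] for row in S1}

print(ratings_by_position == ratings_by_name)  # True
```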
Example Instances
Example instances of ‘Sailors’ (S1, S2) and ‘Reserves’ (R1). Field names in query results are “inherited” from the names of fields in the query input relations.

R1
sid  bid  day
22   101  10/10/08
58   103  11/12/08

S1
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   Lubber  8       55.5
44   guppy   5       35.0
58   Rusty   10      35.0

Relational Algebra
▪ Basic operations:
  • Selection (σ): selects a subset of rows from a relation
  • Projection (π): chooses which columns appear in the new relation
  • Cross-product (×): combines two relations
  • Set difference (−): tuples in relation 1 but not in relation 2
  • Union (∪): tuples in relation 1, in relation 2, or in both
▪ Additional operations:
  • Intersection (∩): only tuples that appear in both relation 1 and relation 2
  • Join, division, renaming

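Intersection does not need to be a primitive operator, because $R \cap S = R - (R - S)$. The sketch below checks that identity on the Sailors instances, treating each relation as a Python set of tuples (an illustrative encoding).

```python
# Relations modeled as Python sets of tuples (set semantics: no duplicates).
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}


def intersect_via_difference(r, s):
    """Compute R ∩ S using only set difference: R − (R − S)."""
    return r - (r - s)


# The derived form agrees with Python's built-in set intersection.
assert intersect_via_difference(S1, S2) == S1 & S2
print(sorted(intersect_via_difference(S1, S2)))
# [(31, 'Lubber', 8, 55.5), (58, 'Rusty', 10, 35.0)]
```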
Selection
▪ Selects the rows that satisfy the selection condition
▪ No duplicates in the result!
▪ The schema of the result is identical to the schema of the input relation
▪ The result relation can be the input for another relational algebra operation!

$\sigma_{rating>8}(S2)$
sid  sname  rating  age
28   yuppy  9       35.0
58   Rusty  10      35.0

$\sigma_{sname='yuppy'}(\sigma_{rating>8}(S2))$
sid  sname  rating  age
28   yuppy  9       35.0

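A minimal sketch of selection, assuming rows are tuples in the order of a fixed schema and the condition is a Python predicate over a name-to-value mapping (both assumptions for illustration). Because `select` returns another set of rows with the same schema, its output can be passed straight back into `select`, mirroring the nested selection above.

```python
S2_SCHEMA = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}


def select(schema, rows, predicate):
    """σ: keep the rows whose field values satisfy the predicate.

    The result keeps the input schema, and since the input is a set,
    the result (a subset of it) cannot contain duplicates.
    """
    return {row for row in rows if predicate(dict(zip(schema, row)))}


# σ_{rating>8}(S2)
high_rated = select(S2_SCHEMA, S2, lambda t: t["rating"] > 8)

# σ_{sname='yuppy'}(σ_{rating>8}(S2)): the first result is the second input.
yuppy = select(S2_SCHEMA, high_rated, lambda t: t["sname"] == "yuppy")
```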
Projection
▪ Deletes attributes that are not in the projection list
▪ The schema of the result contains exactly the fields in the projection list, with the same names that they had in the input relation
▪ The projection operator has to eliminate duplicates (Why?)
  • Real systems don’t usually delete duplicates unless specifically asked to

$\pi_{sname,rating}(S2)$
sname   rating
yuppy   9
Lubber  8
guppy   5
Rusty   10

$\pi_{age}(S2)$
age
35.0
55.5
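The sketch below contrasts set-style projection, which eliminates duplicates as the formal operator must, with a bag-style projection that keeps them, roughly as SQL does for `SELECT` without `DISTINCT`. The helper names and the tuple encoding are illustrative assumptions.

```python
S2_SCHEMA = ("sid", "sname", "rating", "age")
# A list rather than a set, so the bag-style projection can show repeats.
S2 = [
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
]


def project_set(schema, rows, fields):
    """π with set semantics: keep only `fields`, drop duplicate results."""
    idx = [schema.index(f) for f in fields]
    return {tuple(row[i] for i in idx) for row in rows}


def project_bag(schema, rows, fields):
    """Projection that keeps duplicates (bag semantics)."""
    idx = [schema.index(f) for f in fields]
    return [tuple(row[i] for i in idx) for row in rows]


names_ratings = project_set(S2_SCHEMA, S2, ["sname", "rating"])  # 4 rows
ages_set = project_set(S2_SCHEMA, S2, ["age"])  # {(35.0,), (55.5,)}
ages_bag = project_bag(S2_SCHEMA, S2, ["age"])  # 35.0 appears three times
```

Removing duplicates requires an extra sort or hash pass over the result, which is one reason systems tend to do it only on request.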
Composition

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   Lubber  8       55.5
44   guppy   5       35.0
58   Rusty   10      35.0

$\pi_{sname,rating}(\sigma_{rating>8}(S2))$
sname  rating
yuppy  9
Rusty  10

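Composition works because every operator returns a relation. In this sketch (again assuming the illustrative tuple encoding), each operator returns a `(schema, rows)` pair, so the output of `select` can be handed directly to `project` to compute $\pi_{sname,rating}(\sigma_{rating>8}(S2))$.

```python
S2_SCHEMA = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}


def select(schema, rows, predicate):
    """σ: same schema, only the rows that satisfy the predicate."""
    return schema, {r for r in rows if predicate(dict(zip(schema, r)))}


def project(schema, rows, fields):
    """π: the schema becomes exactly `fields`; the set removes duplicates."""
    idx = [schema.index(f) for f in fields]
    return tuple(fields), {tuple(r[i] for i in idx) for r in rows}


# Evaluate the inner operator first, then feed its result to the outer one.
inner_schema, inner_rows = select(S2_SCHEMA, S2, lambda t: t["rating"] > 8)
result_schema, result = project(inner_schema, inner_rows, ["sname", "rating"])
```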
Union ($S1 \cup S2$)

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   Lubber  8       55.5
44   guppy   5       35.0
58   Rusty   10      35.0

S1
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0

$S1 \cup S2$
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
28   yuppy   9       35.0
44   guppy   5       35.0

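With relations modeled as Python sets of tuples (an illustrative assumption), union is the set `|` operator; rows present in both inputs, such as Lubber and Rusty, appear only once in the result.

```python
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}

# S1 ∪ S2: every tuple that is in S1, in S2, or in both.
s1_union_s2 = S1 | S2

print(len(S1) + len(S2), len(s1_union_s2))  # 7 5
```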
Union Compatible
▪ Union, intersection, and set difference each take two input relations, which must be union-compatible:
  • Same number of fields
  • Corresponding fields have the same domain, i.e., the same type

Are S1 and S3 union-compatible?

S1
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0

S3
sid  sname   rating  age
22   Dustin  7       45
31   Lubber  8       55
58   Rusty   10      35

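One way to phrase the check in code, assuming each schema is a list of `(field name, domain)` pairs and that Python types stand in for domains. Reading S3's `age` values (45, 55, 35) as integers is itself an assumption; under it, S1 and S3 fail the test because `age` is real in S1 but integer in S3.

```python
# Schemas as (field name, domain) pairs; Python types stand in for domains.
S1_SCHEMA = [("sid", int), ("sname", str), ("rating", int), ("age", float)]
# Assumption: S3's ages (45, 55, 35) are stored as integers.
S3_SCHEMA = [("sid", int), ("sname", str), ("rating", int), ("age", int)]


def union_compatible(schema_a, schema_b):
    """True if both schemas have the same number of fields and each pair of
    corresponding fields has the same domain. Field names are not compared."""
    if len(schema_a) != len(schema_b):
        return False
    return all(dom_a == dom_b for (_, dom_a), (_, dom_b) in zip(schema_a, schema_b))


print(union_compatible(S1_SCHEMA, S1_SCHEMA))  # True
print(union_compatible(S1_SCHEMA, S3_SCHEMA))  # False: age is float vs int
```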
Intersection and Set-Difference

S2
sid  sname   rating  age
28   yuppy   9       35.0
31   Lubber  8       55.5
44   guppy   5       35.0
58   Rusty   10      35.0

S1
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0

$S1 \cap S2$
sid  sname   rating  age
31   Lubber  8       55.5
58   Rusty   10      35.0

$S1 - S2$
sid  sname   rating  age
22   Dustin  7       45.0

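In the same illustrative set-of-tuples encoding, intersection and set difference map to Python's `&` and `-` operators. The last line shows that set difference, unlike union and intersection, is not commutative.

```python
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "Lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "Rusty", 10, 35.0),
}

both = S1 & S2        # S1 ∩ S2: tuples in both relations
only_in_s1 = S1 - S2  # S1 − S2: tuples in S1 but not in S2
only_in_s2 = S2 - S1  # S2 − S1: a different result, so order matters
```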
Cross Product ($S1 \times R1$)
▪ Each row of S1 is paired with each row of R1
▪ The result schema has one field per field of S1 and R1, with field names inherited if possible

sid  sname   rating  age   sid  bid  day
22   Dustin  7       45.0  22   101  10/10/08
22   Dustin  7       45.0  58   103  11/12/08
31   Lubber  8       55.5  22   101  10/10/08
31   Lubber  8       55.5  58   103  11/12/08
58   Rusty   10      35.0  22   101  10/10/08
58   Rusty   10      35.0  58   103  11/12/08

Notice that there are now two fields called ‘sid’!

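A minimal cross-product sketch using `itertools.product`, under the same illustrative `(schema, rows)` encoding. Concatenating the two schemas reproduces the naming conflict noted above: the result has two fields called `sid`.

```python
from itertools import product

S1_SCHEMA = ("sid", "sname", "rating", "age")
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1_SCHEMA = ("sid", "bid", "day")
R1 = {(22, 101, "10/10/08"), (58, 103, "11/12/08")}


def cross(schema_a, rows_a, schema_b, rows_b):
    """×: pair every row of A with every row of B.

    The result schema is A's fields followed by B's fields, so names
    shared by both inputs (here 'sid') now appear twice.
    """
    rows = {a + b for a, b in product(rows_a, rows_b)}
    return schema_a + schema_b, rows


schema, rows = cross(S1_SCHEMA, S1, R1_SCHEMA, R1)
print(len(rows), schema.count("sid"))  # 6 2
```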
Renaming Operator
▪ Name conflicts require new field names
▪ It is also convenient to name the results of relational expressions
▪ The renaming operator is written $\rho(R(F), E)$, where:
  • E is an arbitrary relational algebra expression
  • R is the name given to the result of E
  • F is the list of renamed fields, each in the form
    ▪ oldname → newname, or
    ▪ oldposition → newname
▪ Example: $\rho(C(1 \to sid1, 5 \to sid2), S1 \times R1)$
  • C(sid1:integer, sname:string, rating:integer, age:real, sid2:integer, bid:integer, day:date)
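A sketch of the field-renaming part of ρ, assuming `renames` is a dict whose keys are either old field names or 1-based positions, as in the slide's notation. The `rename` helper is hypothetical; it refuses a rename by name when the name is ambiguous, which is exactly why the example renames the two `sid` fields by position.

```python
CROSS_SCHEMA = ("sid", "sname", "rating", "age", "sid", "bid", "day")


def rename(schema, renames):
    """ρ on the field list: `renames` maps an old field name or a 1-based
    position to a new name. Positions are needed when a name is ambiguous,
    as with the two 'sid' fields in S1 × R1."""
    fields = list(schema)
    for key, new_name in renames.items():
        if isinstance(key, int):
            pos = key - 1  # slide notation counts positions from 1
        else:
            if fields.count(key) != 1:
                raise ValueError(f"field name {key!r} is missing or ambiguous")
            pos = fields.index(key)
        fields[pos] = new_name
    return tuple(fields)


# ρ(C(1 → sid1, 5 → sid2), S1 × R1): only the schema changes, not the rows.
C_SCHEMA = rename(CROSS_SCHEMA, {1: "sid1", 5: "sid2"})
```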
Joins
▪ Condition join: a cross product followed by a selection, $R \bowtie_c S = \sigma_c(R \times S)$
  • The condition c can refer to attributes of both R and S
  • Example: $S1 \bowtie_{S1.sid < R1.sid} R1$

sid  sname   rating  age   sid  bid  day
22   Dustin  7       45.0  58   103  11/12/08
31   Lubber  8       55.5  58   103  11/12/08

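The definition $R \bowtie_c S = \sigma_c(R \times S)$ translates almost literally into the sketch below (illustrative encoding and helper name). Filtering pairs as `itertools.product` generates them yields the same rows as building the full cross product and then selecting, without storing the intermediate result.

```python
from itertools import product

S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/08"), (58, 103, "11/12/08")}


def condition_join(rows_a, rows_b, condition):
    """R ⋈_c S = σ_c(R × S): form every pair from the cross product,
    then keep only the pairs that satisfy the condition."""
    return {a + b for a, b in product(rows_a, rows_b) if condition(a, b)}


# S1 ⋈_{S1.sid < R1.sid} R1; positional access (index 0 is sid in both
# inputs) sidesteps the duplicate field name.
result = condition_join(S1, R1, lambda s, r: s[0] < r[0])
```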
Equi-joins
▪ A special case of the condition join in which the condition c contains only equalities
▪ The result schema is similar to that of the cross product, but contains only one copy of each field for which equality is specified
▪ Typically produces far fewer tuples than the cross product

$S1 \bowtie_{S1.sid = R1.sid} R1$
sid  sname   rating  age   bid  day
22   Dustin  7       45.0  101  10/10/08
58   Rusty   10      35.0  103  11/12/08

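A nested-loop equi-join sketch under the same illustrative encoding; the `equijoin` helper is hypothetical. It drops the second copy of the join field, giving the one-`sid` schema shown above. Real database systems usually evaluate equalities with hash or sort-merge joins rather than nested loops.

```python
S1_SCHEMA = ("sid", "sname", "rating", "age")
S1 = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)}
R1_SCHEMA = ("sid", "bid", "day")
R1 = {(22, 101, "10/10/08"), (58, 103, "11/12/08")}


def equijoin(schema_a, rows_a, schema_b, rows_b, field):
    """Join on A.field = B.field, keeping a single copy of `field`."""
    ia, ib = schema_a.index(field), schema_b.index(field)
    keep_b = [i for i in range(len(schema_b)) if i != ib]  # drop B's copy
    schema = schema_a + tuple(schema_b[i] for i in keep_b)
    rows = {
        a + tuple(b[i] for i in keep_b)
        for a in rows_a
        for b in rows_b
        if a[ia] == b[ib]
    }
    return schema, rows


schema, rows = equijoin(S1_SCHEMA, S1, R1_SCHEMA, R1, "sid")
```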
Division
▪ Not supported as a primitive operator, but useful for expressing queries such as:
  • Find the x values that are paired with all y values (e.g., sailors who have reserved all boats)
  • Let A have two fields, x and y, and let B have only one field, y:
    ▪ $A/B = \{\langle x \rangle \mid \langle x \rangle \in \pi_x(A) \wedge \forall \langle y \rangle \in B : \langle x, y \rangle \in A\}$
  • A/B contains all x tuples such that for every y tuple in B, there is an xy tuple in A
  • More generally, x and y can be any lists of fields: y is the list of fields in B, and x ∪ y is the list of fields of A

Division Example

A
f1  f2
S1  P1
S1  P2
S1  P3
S1  P4
S2  P1
S2  P2
S3  P2
S4  P2
S4  P4

B1   B2   B3
f2   f2   f2
P2   P2   P1
     P4   P2
          P4

A/B1  A/B2  A/B3
f1    f1    f1
S1    S1    S1
S2    S4
S3
S4
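A direct transcription of the division definition into Python, using the A, B1, B2 and B3 instances from the division example. Here A is a set of (f1, f2) pairs and each B is a set of f2 values; this encoding and the `divide` helper are assumptions for illustration.

```python
A = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"),
}
B1 = {"P2"}
B2 = {"P2", "P4"}
B3 = {"P1", "P2", "P4"}


def divide(a, b):
    """A/B: every x in π_x(A) such that (x, y) is in A for every y in B."""
    xs = {x for x, _ in a}  # candidate x values come from A itself
    return {x for x in xs if all((x, y) in a for y in b)}
```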
Expressing Division Using Basic Operators
▪ For A/B, compute all x values that are not “disqualified” by some y value in B
  • An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A

Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$

$A/B = \pi_x(A) - \text{(all disqualified x values)}$

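The same division computed only from projection, cross product, and set difference, following $A/B = \pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$. The set-of-pairs encoding and the helper name are illustrative assumptions; the results should match the A/B1, A/B2, and A/B3 columns of the division example.

```python
A = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"),
}
B1 = {"P2"}
B2 = {"P2", "P4"}
B3 = {"P1", "P2", "P4"}


def divide_with_basic_ops(a, b):
    """A/B = π_x(A) − π_x((π_x(A) × B) − A)."""
    pi_x_a = {x for x, _ in a}                       # π_x(A)
    all_pairs = {(x, y) for x in pi_x_a for y in b}  # π_x(A) × B
    disqualified = {x for x, _ in all_pairs - a}     # π_x((π_x(A) × B) − A)
    return pi_x_a - disqualified
```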
Lots of Example Queries

Example 2
▪ Find the names of sailors who have reserved a red boat
  • Solution 1: $\pi_{sname}((\sigma_{color='red'}(Boats)) \bowtie Reserves \bowtie Sailors)$
  • Solution 2: $\pi_{sname}(\pi_{sid}((\pi_{bid}(\sigma_{color='red'}(Boats))) \bowtie Reserves) \bowtie Sailors)$
Which solution is more efficient?

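The two solutions can be checked against each other on the slide instances, treating S1 as Sailors and R1 as Reserves. The slides never show a Boats instance, so the one below is hypothetical and has only `bid` and `color` fields. Rows are dicts, `natural_join` joins on every shared field name, and all helper names are illustrative.

```python
# Slide instances: S1 as Sailors, R1 as Reserves.
sailors = [
    {"sid": 22, "sname": "Dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "Lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "Rusty", "rating": 10, "age": 35.0},
]
reserves = [
    {"sid": 22, "bid": 101, "day": "10/10/08"},
    {"sid": 58, "bid": 103, "day": "11/12/08"},
]
# Hypothetical Boats instance (not on the slides): only bid and color.
boats = [{"bid": 101, "color": "red"}, {"bid": 103, "color": "green"}]


def select(rel, pred):
    """σ over a list of dict rows."""
    return [t for t in rel if pred(t)]


def project(rel, fields):
    """π with duplicate elimination; rows stay dicts keyed by field name."""
    unique = {tuple(t[f] for f in fields) for t in rel}
    return [dict(zip(fields, values)) for values in sorted(unique)]


def natural_join(r, s):
    """⋈ on all field names the two inputs share (nested-loop sketch)."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


red = select(boats, lambda t: t["color"] == "red")

# Solution 1: join whole tuples, project sname only at the end.
plan1 = project(natural_join(natural_join(red, reserves), sailors), ["sname"])

# Solution 2: project away unneeded fields before each join.
red_bids = project(red, ["bid"])
reserving_sids = project(natural_join(red_bids, reserves), ["sid"])
plan2 = project(natural_join(reserving_sids, sailors), ["sname"])
```

Both plans return the same answer. Solution 2 carries narrower tuples into each join, which tends to reduce work, but which plan is actually faster in a real system depends on indexes and data sizes, and choosing between them is the optimizer's job.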
Example 3

Relational Algebra Summary
▪ The relational model has a rigorously defined query language that is simple and powerful
▪ Relational algebra is more operational; it is useful as an internal representation for query evaluation plans
▪ There are often several ways of expressing a given query; the query optimizer should choose the most efficient version
