
INTRODUCTION TO DATABASE
LEARN BY EXAMPLES

DESIGN THEORY FOR RELATIONAL DATABASES
Objectives
I2DB: Design Theory for Relational Databases

• Understand concepts of:
  ▪ Functional Dependencies
  ▪ Normalization
  ▪ Decomposition
  ▪ Multi-valued Dependencies
Design Theory for Relations

• Helps us examine a design and make improvements based on a few simple principles
• States the constraints that apply to the relation
  ▪ The most common constraints are functional dependencies
3.1 Functional Dependency

• A functional dependency (FD) is a constraint between two sets of attributes in a relation
• A set of attributes X = {A1, A2, …, An} in R functionally determines another set of attributes Y = {B1, B2, …, Bm}, also in R (written X → Y), if and only if each X value is associated with precisely one Y value
• A functional dependency A1A2…An → B1B2…Bm holds on relation R if, whenever two tuples of R agree on all of the attributes A1, A2, …, An, they must also agree on all of the attributes B1, B2, …, Bm
3.1 Functional Dependency

Movies1

• It is easy to see that the following FD holds:
  ▪ title, year → length, genre, studioName
• Exercise: what about the following FD?
  ▪ title, year → starName

title, year → starName does not hold in the Movies1 relation
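
The definition of an FD can be tested directly against a relation instance. The sketch below is illustrative only: the helper name `fd_holds` is an assumption, and the rows are the Movies1 instance shown on the Anomalies slide later in this deck. An instance can refute an FD, but it cannot prove that the FD is a constraint on every legal instance of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but not on rhs
    return True


COLUMNS = ("title", "year", "length", "genre", "studioName", "starName")
MOVIES1 = [dict(zip(COLUMNS, r)) for r in [
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, 124, "SciFi", "Fox", "Harrison Ford"),
    ("Gone With The Wind", 1939, 231, "drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", "Mike Meyers"),
]]

print(fd_holds(MOVIES1, ["title", "year"], ["length", "genre", "studioName"]))  # True
print(fd_holds(MOVIES1, ["title", "year"], ["starName"]))                       # False
```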


Keys of Relations

• A set of one or more attributes {A1, A2, …, An} is a key for a relation R if:
  1) Those attributes functionally determine all other attributes of the relation R
  2) No proper subset of {A1, A2, …, An} functionally determines all other attributes of R, i.e., a key must be minimal
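
The two conditions can be checked mechanically once the FD's are known. The sketch below is a brute-force illustration, not an efficient method; the function names are assumptions, and it relies on attribute closure, which Section 3.2 of this deck introduces. It enumerates attribute sets from smallest to largest and keeps those that determine everything and contain no smaller key.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Return every minimal attribute set whose closure is all_attrs."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == all_attrs:
                keys.append(cand)
    return keys


ATTRS = ["title", "year", "length", "genre", "studioName", "starName"]
FDS = [(("title", "year"), ("length", "genre", "studioName"))]
print(candidate_keys(ATTRS, FDS))  # one key: title, year, starName
```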
Keys of Relations

• Example
  ▪ (title, year, starName) forms a key for the Movies1 relation
  ▪ (year, starName) is not a key of the Movies1 relation


Keys of Relations

• A relation can have more than one key
  ▪ All of the keys are candidate keys; one of them is chosen as the primary key
  ▪ The others are called alternate keys


Superkeys

• A set of attributes that contains a key is called a superkey
• Every superkey satisfies the first condition of a key: it functionally determines all other attributes of the relation
• If L is a superkey, then there is some key K such that K ⊆ L
• A key is also a superkey
Superkeys

• Example
  ▪ Some superkeys of Movies1:
    ▪ (title, year, starName)
    ▪ (title, year, starName, length, studioName)
3.2 Rules about Functional Dependencies

• A set of FD’s S follows from a set of FD’s T if every relation instance that satisfies all the FD’s in T also satisfies all the FD’s in S
• Two sets of FD’s S and T are equivalent if and only if S follows from T, and T follows from S
Splitting and Combining Rules

• Splitting Rule
  ▪ We can replace an FD A1A2…An → B1B2…Bm by a set of FD’s A1A2…An → Bi, for i = 1, 2, …, m
• Combining Rule
  ▪ We can replace a set of FD’s A1A2…An → Bi, for i = 1, 2, …, m, by the single FD A1A2…An → B1B2…Bm


Splitting and Combining Rules

• Example: the three FD’s
  ▪ title year → length
  ▪ title year → genre
  ▪ title year → studioName
• are equivalent to the single FD
  ▪ title year → length genre studioName


Trivial Functional Dependencies

• The FD A1A2…An → B1B2…Bm is trivial (that is, the right side is a subset of the left side) if {B1, B2, …, Bm} ⊆ {A1, A2, …, An}
• Every trivial FD holds in every relation
Trivial Functional Dependencies

• The FD A1A2…An → B1B2…Bm is equivalent to A1A2…An → C1C2…Ck, where the C’s are all those B’s that are not also A’s
• Ex: if A,B,C,D → C,D,E,F then A,B,C,D → E,F


The Closure of Attributes

• The closure of a set of attributes {A1, A2, …, An} under the FD’s in S (denoted {A1, A2, …, An}+) is the set of attributes B such that every relation that satisfies all the FD’s in S also satisfies A1A2…An → B
• That is, A1A2…An → B follows from the FD’s of S
• A1, A2, …, An ∈ {A1, A2, …, An}+, because A1A2…An → Ai is trivial
The Closure of Attributes

• Example
  ▪ R(A,B,C,D,E,F)
  ▪ S = {A → C, A → D, D → E, E → F}, {A}+ = ? (use the FD’s that are relevant to the closure)
  ▪ {A}+ = {A,C,D,E,F}
The Closure of Attributes

• Algorithm 3.7: Closure of a set of attributes
  ▪ Input: A set of attributes {A1, A2, …, An} and a set of FD’s S
  ▪ Output: The closure {A1, A2, …, An}+
  1. If necessary, split the FD’s of S so that each FD in S has a singleton right side
  2. Let X be a set of attributes that will become the closure. Initialize X to be {A1, A2, …, An}
  3. Repeatedly search for some FD B1B2…Bm → C such that B1, B2, …, Bm are all in X, but C is not
     a) If such a C is found, add it to X and repeat the search
     b) If no such C is found, no more attributes can be added to X
  4. The set X is the correct value of {A1, A2, …, An}+
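
The four steps above can be sketched in a few lines. This is a minimal illustration under the assumption that every attribute is a single letter, so that an FD can be written as a pair of strings such as ("BC", "AD"); the function name is hypothetical.

```python
def closure(attrs, fds):
    """Algorithm 3.7 sketch. fds is a list of (lhs, rhs) attribute strings."""
    # Step 1: split right sides so every FD has a singleton right side.
    split = [(set(lhs), c) for lhs, rhs in fds for c in rhs]
    # Step 2: X starts as the given attribute set.
    x = set(attrs)
    # Step 3: keep adding C while some FD B1...Bm -> C has all B's in X.
    changed = True
    while changed:
        changed = False
        for lhs, c in split:
            if lhs <= x and c not in x:
                x.add(c)
                changed = True
    # Step 4: X is now the closure.
    return x


S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", S)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("CF", S)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

With this S, F never appears on a right side, so every key of R(A,B,C,D,E,F) must contain F.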


The Closure of Attributes

• Example
  ▪ R(A,B,C,D,E,F)
  ▪ S = {AB → C, BC → AD, D → E, CF → B}
  ▪ Compute {A,B}+ and {B,C}+
  ▪ {A,B}+ = {A,B,C,D,E}; the closure is still missing F, so {A,B,F} determines all attributes
  ▪ Similarly, {B,C}+ = {A,B,C,D,E}, so {B,C,F} determines all attributes
The Closure of Attributes

• Example
  ▪ S = {AB → C, BC → AD, D → E, CF → B}
  ▪ Split BC → AD into BC → A and BC → D
  ▪ S = {AB → C, BC → A, BC → D, D → E, CF → B}
  ▪ Start with Result = {A,B}
  ▪ First: since AB → C, add C to Result; Result = {A,B,C}
  ▪ Second: since BC → A and BC → D, and A is already in Result but D is not, add D to Result; Result = {A,B,C,D}
  ▪ Third: since D → E, and E is not in Result, add E to Result; Result = {A,B,C,D,E}
  ▪ No more changes to Result are possible (CF → B cannot be applied because F is not in Result), so the algorithm finishes here, and {A,B}+ = {A,B,C,D,E}
The Closure of Attributes

• R(A,B,C,D,E,F)
• S = {AB → C, BC → AD, D → E, CF → B}
• Compute {A,B}+ and {B,C}+
• What are some of the keys of R?
The Closure of Attributes

• R(A, B, C, D)
• S = {BC → D, D → A, A → B}
• Compute {C,D}+ and {B,C}+
• What are some of the keys of R?
The Closure of Attributes

• R(A, B, C, D)
• S = {A → B, B → C, C → D, D → A}
• Compute {A}+ and {B}+
• What are some of the keys of R?
The Transitive Rule

• If A1A2…An → B1B2…Bm and B1B2…Bm → C1C2…Ck hold in relation R, then A1A2…An → C1C2…Ck also holds in R
• If some of the C’s are among the A’s, we may eliminate them from the right side
The Transitive Rule

• Example: Let’s see another version of the Movies relation

title          year  length  genre   studioName  studioAddr
Star Wars      1977  124     scifi   Fox         Hollywood
Eight Below    2005  120     drama   Disney      Buena Vista
Wayne’s World  1992  95      comedy  Paramount   Hollywood

• Two of the FD’s that we might claim to hold are:
  ▪ title year → studioName
  ▪ studioName → studioAddr
• The transitive rule allows us to construct a third FD:
  ▪ title year → studioAddr


Exercises

• Exercise 3.2.1 (page 83)
Closing Sets of Functional Dependencies

• Given a set of FD’s S, any set of FD’s T that is equivalent to S is said to be a basis for S
• We work only with FD’s that have singleton right sides
• A minimal basis for a set of FD’s S is a basis B that satisfies three conditions:
  ▪ All the FD’s in B have singleton right sides
  ▪ If any FD is removed from B, the result is no longer a basis
  ▪ If for any FD in B we remove one or more attributes from the left side, the result is no longer a basis


Closing Sets of Functional Dependencies

• Example
  ▪ R(A,B,C)
  ▪ S = {A → B, A → C, B → A, B → C, C → A, C → B, AB → C, BC → A, AC → B, A → BC, B → AC, C → AB}
  ▪ R and its FD’s have several minimal bases, such as
    ▪ {A → B, B → A, B → C, C → B}, or
    ▪ {A → B, B → C, C → A}
What happens to …

• … a set of FD’s S of R when we project R onto some of its attributes?
• That is, suppose a relation R has a set of FD’s S, and R1 = πL(R). What FD’s hold in R1?
Projecting Functional Dependencies

• The functional dependencies of the projection R1 are those FD’s that
  ▪ Follow from S, and
  ▪ Involve only attributes of R1


Projecting Functional Dependencies

• Algorithm 3.12: Projecting a Set of FD’s
  ▪ Input: R, R1 = πL(R), and S, a set of FD’s that hold in R
  ▪ Output: the set of FD’s that hold in R1
  ▪ Method:
    ▪ Let T be the eventual output set of FD’s that hold in R1. Initially, T is empty
    ▪ For each set of attributes X of R1, compute X+ with respect to S. Add to T all nontrivial FD’s X → A such that A is both in X+ and an attribute of R1
    ▪ Construct a minimal basis from T
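
The first two method steps (collecting T, before any minimization) might look like the sketch below. It assumes single-letter attribute names and FD's written as string pairs; `project_fds` is a hypothetical name. It also skips supersets of any set whose closure already covers all of R1, since those cannot contribute FD's that are not already implied.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(r1_attrs, fds):
    """Nontrivial FD's X -> A over r1_attrs that follow from fds."""
    r1 = set(r1_attrs)
    t = set()
    saturated = []  # sets whose closure already covers all of R1
    # The empty set and the full set cannot yield a nontrivial FD.
    for size in range(1, len(r1)):
        for combo in combinations(sorted(r1), size):
            x = set(combo)
            if any(s <= x for s in saturated):
                continue
            x_plus = closure(x, fds)  # closure taken in R, using all of S
            for a in sorted((x_plus & r1) - x):
                t.add(("".join(combo), a))
            if r1 <= x_plus:
                saturated.append(x)
    return t


S = [("A", "B"), ("B", "C"), ("C", "D")]
print(sorted(project_fds("ACD", S)))  # [('A', 'C'), ('A', 'D'), ('C', 'D')]
```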


Projecting Functional Dependencies

• Algorithm 3.12: Projecting a Set of FD’s (cont.)
  ▪ Compute a minimal basis from T:
    ▪ If there is an FD F in T that follows from the other FD’s in T, then remove F from T
    ▪ Let Y → B be an FD in T with at least two attributes in Y, and let Z be Y with one of its attributes removed. If Z → B follows from the FD’s in T (including Y → B), then replace Y → B by Z → B
    ▪ Repeat the above steps in all possible ways until no more changes to T can be made
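
The minimization steps can be sketched as a loop that keeps applying both rules until nothing changes. This is an illustrative sketch, assuming single-letter attributes and FD's given as string pairs; the names `follows` and `minimal_basis` are hypothetical. A set of FD's can have several minimal bases, and which one this loop returns depends on the order in which FD's are examined.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(lhs, rhs, fds):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


def minimal_basis(fds):
    # Work with singleton right sides only.
    t = [(frozenset(lhs), c) for lhs, rhs in fds for c in rhs]
    changed = True
    while changed:
        changed = False
        t = list(dict.fromkeys(t))  # drop exact duplicates, keep order
        # Rule 1: remove an FD that follows from the others.
        for fd in list(t):
            rest = [g for g in t if g != fd]
            if follows(fd[0], fd[1], rest):
                t, changed = rest, True
        # Rule 2: drop one attribute from a left side when still implied.
        for i, (lhs, c) in enumerate(t):
            for a in sorted(lhs):
                z = lhs - {a}
                if z and follows(z, c, t):
                    t[i], changed = (z, c), True
                    break
    return {("".join(sorted(lhs)), c) for lhs, c in t}


T = [("A", "C"), ("A", "D"), ("C", "D")]
print(sorted(minimal_basis(T)))  # [('A', 'C'), ('C', 'D')]
```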
Projecting Functional Dependencies

• Two notes
  ▪ (1) Closing the empty set and the set of all attributes cannot yield a nontrivial FD
  ▪ (2) If we already know that the closure of some set X is all attributes, then we cannot discover any new FD’s by closing supersets of X
Projecting Functional Dependencies

• Example: Suppose R(A,B,C,D) has FD’s A → B, B → C, and C → D, and R1 = πA,C,D(R). Find the FD’s of R1
  ▪ Compute the closures of the singleton sets
    ▪ {A}+ = {A,B,C,D}; B is not in R1, so the new FD’s are A → C and A → D
    ▪ {C}+ = {C,D}, so the new FD is C → D
    ▪ {D}+ = {D}, no new FD’s
  ▪ Compute the closures of the doubleton sets
    ▪ Since {A}+ includes all attributes, we need not consider supersets of {A}
    ▪ {C,D}+ = {C,D}, so no new FD’s hold in R1
  ▪ Finally, three FD’s hold in R1: A → C, A → D, and C → D
  ▪ A → D follows from A → C and C → D by transitivity
  ▪ So, a minimal basis is {A → C, C → D}


Exercises

• Exercise 3.2.1, 3.2.2 (page 83)
• Exercise 3.2.3 (page 83/84)


3.3 Design of Relational Database Schema

• Anomalies: problems, such as redundancy, that occur when we try to cram too much into a single relation
• Some kinds of anomalies:
  1) Redundancy: information may be repeated unnecessarily in several tuples
  2) Update Anomalies: information may be changed in one tuple but left unchanged in another
  3) Deletion Anomalies: if a set of values becomes empty, we may lose other information as a side effect
Anomalies

• Example

title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With The Wind  1939  231     drama   MGM         Vivien Leigh
Wayne's World       1992  95      comedy  Paramount   Dana Carvey
Wayne's World       1992  95      comedy  Paramount   Mike Meyers

1) The length and genre are repeated unnecessarily
2) We might change the length of Star Wars to 125 in the first tuple, but not in the other tuples
3) If we delete ‘Fox’ from the set of studios, then Star Wars no longer has a studio
How can we …

… eliminate these anomalies?
Decomposing Relations

• One way to eliminate these anomalies is to decompose relations:
  ▪ Given a relation R(A1,A2,…,An), we may decompose R into two relations S(B1,B2,…,Bm) and T(C1,C2,…,Ck) such that:
    ▪ {A1,A2,…,An} = {B1,B2,…,Bm} ∪ {C1,C2,…,Ck}
    ▪ S = πB1,B2,…,Bm(R)
    ▪ T = πC1,C2,…,Ck(R)
Decomposing Relations

• Example: Decompose the Movies1 relation

title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With The Wind  1939  231     drama   MGM         Vivien Leigh
Wayne's World       1992  95      comedy  Paramount   Dana Carvey
Wayne's World       1992  95      comedy  Paramount   Mike Meyers

into
  ▪ A relation Movies2, with all the attributes except for starName
  ▪ A relation Movies3, with title, year, and starName


Decomposing Relations

• Example (cont.):

Movies1
title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With The Wind  1939  231     drama   MGM         Vivien Leigh
Wayne's World       1992  95      comedy  Paramount   Dana Carvey
Wayne's World       1992  95      comedy  Paramount   Mike Meyers

Movies2
title               year  length  genre   studioName
Star Wars           1977  124     SciFi   Fox
Gone With The Wind  1939  231     drama   MGM
Wayne's World       1992  95      comedy  Paramount

Movies3
title               year  starName
Star Wars           1977  Carrie Fisher
Star Wars           1977  Mark Hamill
Star Wars           1977  Harrison Ford
Gone With The Wind  1939  Vivien Leigh
Wayne's World       1992  Dana Carvey
Wayne's World       1992  Mike Meyers
Decomposing Relations

• Discussion of the example

Movies2
title               year  length  genre   studioName
Star Wars           1977  124     SciFi   Fox
Gone With The Wind  1939  231     drama   MGM
Wayne's World       1992  95      comedy  Paramount

Movies3
title               year  starName
Star Wars           1977  Carrie Fisher
Star Wars           1977  Mark Hamill
Star Wars           1977  Harrison Ford
Gone With The Wind  1939  Vivien Leigh
Wayne's World       1992  Dana Carvey
Wayne's World       1992  Mike Meyers

• The redundancy is eliminated
• The risk of an update anomaly is gone
• The risk of a deletion anomaly is gone
• What about the repetition of (title, year) in the Movies3 relation?
How can we guarantee …

• … that the anomalies do not exist?
Boyce-Codd Normal Form

• Boyce-Codd Normal Form (BCNF) is a simple condition that guarantees that the anomalies discussed above do not exist
• A relation R is in BCNF if and only if, whenever there is a nontrivial FD A1A2…An → B1B2…Bm for R, it is the case that {A1,A2,…,An} is a superkey for R
• Notes
  ▪ The left side of every nontrivial FD must be a superkey
  ▪ Equivalently, the left side of every nontrivial FD must contain a key


Boyce-Codd Normal Form

• Example: BCNF or not?

title               year  length  genre   studioName  starName
Star Wars           1977  124     SciFi   Fox         Carrie Fisher
Star Wars           1977  124     SciFi   Fox         Mark Hamill
Star Wars           1977  124     SciFi   Fox         Harrison Ford
Gone With The Wind  1939  231     drama   MGM         Vivien Leigh
Wayne's World       1992  95      comedy  Paramount   Dana Carvey
Wayne's World       1992  95      comedy  Paramount   Mike Meyers

• This relation is not in BCNF, because:
  ▪ title year → length genre studioName is an FD, and
  ▪ {title, year} is not a superkey


Boyce-Codd Normal Form

• Example: BCNF or not?

title               year  length  genre   studioName
Star Wars           1977  124     SciFi   Fox
Gone With The Wind  1939  231     drama   MGM
Wayne's World       1992  95      comedy  Paramount

• This relation is in BCNF, because:
  ▪ title year → length genre studioName is an FD, and
  ▪ {title, year} is a key


Boyce-Codd Normal Form

• Example: What about the case where a relation R has only two attributes A and B?
• ⇒ R(A,B) is in BCNF (the only possible nontrivial FD’s are A → B and B → A, and the left side of each is a key whenever the FD holds)
Example: Loss of information after decomposition

R            R1        R2        R3
A  B  C      A  B      B  C      A  B  C
1  2  3      1  2      2  3      1  2  3
4  2  5      4  2      2  5      1  2  5
                                 4  2  3
                                 4  2  5

• Suppose we have R(A,B,C), but neither of the FD’s B → A nor B → C holds
• R is decomposed into R1 and R2 as above
• When we try to reconstruct R by the natural join of R1 and R2, we get R3 = R1 ⋈ R2, but R3 ≠ R, so we have lost information
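
The loss can be reproduced with two small helpers. This is only a sketch: `project` and `natural_join` are hypothetical names, and relations are modelled as lists of dictionaries rather than as tables in a DBMS.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(left, right):
    """Join tuples that agree on every attribute they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


R = [{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 2, "C": 5}]
R1 = project(R, ["A", "B"])
R2 = project(R, ["B", "C"])
R3 = natural_join(R1, R2)

print(len(R), len(R3))                   # 2 4
print([t for t in R3 if t not in R])     # the two spurious tuples
```

The join never loses original tuples; the information is lost because we can no longer tell which of the four tuples were really in R.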
Decomposition into BCNF

• We can break any relation schema into a collection of subsets of its attributes such that:
  ▪ These subsets are the schemas of relations in BCNF
  ▪ From the data in the decomposed relations, we can reconstruct the original relation instance exactly


Decomposition into BCNF

• Algorithm 3.20: BCNF Decomposition Algorithm
  ▪ Input: a relation R0 with a set of functional dependencies S0
  ▪ Output: a decomposition of R0 into a collection of relations, all of which are in BCNF
  ▪ Method: R = R0, S = S0
    ▪ Check whether R is in BCNF. If so, nothing more needs to be done; return {R}
    ▪ If there are BCNF violations, let one be X → Y
    ▪ Compute X+
    ▪ Choose R1 = X+, and let R2 have the attributes X and those attributes of R that are not in X+
    ▪ Compute the sets of FD’s for R1 and R2; let these be S1 and S2
    ▪ Recursively decompose R1 and R2 using this algorithm. Return the union of the results of these decompositions
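
A compact sketch of the recursion follows. It makes one simplifying assumption that is not spelled out in the algorithm above: instead of materializing S1 and S2, it finds violations in a sub-relation by closing each attribute subset under the original FD's and intersecting with the sub-relation's attributes, which yields the same projected FD's. The function names are hypothetical, and the subset enumeration is exponential, so this is for small classroom examples only.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(attrs, fds):
    """Return a set of frozensets, each the schema of a BCNF relation."""
    attrs = frozenset(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs
            # Violation: X determines something new but is not a superkey.
            if x_plus != x and x_plus != attrs:
                r1 = x_plus
                r2 = x | (attrs - x_plus)
                return bcnf_decompose(r1, fds) | bcnf_decompose(r2, fds)
    return {attrs}


MOVIES = ["title", "year", "studioName", "president", "presAddr"]
FDS = [
    (("title", "year"), ("studioName",)),
    (("studioName",), ("president",)),
    (("president",), ("presAddr",)),
]
for schema in bcnf_decompose(MOVIES, FDS):
    print(sorted(schema))
```

Different choices of the violating FD can lead to different, equally valid BCNF decompositions; for this Movies example the sketch arrives at the same three schemas as the worked example on the following slides.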
Decomposition into BCNF

• Example: Movies(title, year, studioName, president, presAddr)
  ▪ Three FD’s are
    ▪ title, year → studioName
    ▪ studioName → president
    ▪ president → presAddr
  ▪ {title, year} is the only key for this relation
  ▪ Apply Algorithm 3.20 above:


Decomposition into BCNF

• Example (cont.)
  ▪ Decomposition starts with studioName → president
    ▪ Compute {studioName}+ = {studioName, president, presAddr}
    ▪ R1 = {studioName, president, presAddr}
    ▪ R2 = {title, year, studioName}
    ▪ S1 = {studioName → president, president → presAddr}
    ▪ S2 = {title year → studioName}
    ▪ In R2, {title, year} is a key, so R2 is in BCNF
    ▪ In R1, studioName is a key, but president → presAddr violates BCNF because president is not a superkey of R1
Decomposition into BCNF

• Example (cont.)
  ▪ Decomposition continues with president → presAddr
    ▪ Compute {president}+ = {president, presAddr}
    ▪ R11 = {president, presAddr}
    ▪ R12 = {president, studioName}
    ▪ S11 = {president → presAddr}
    ▪ S12 = {studioName → president}
    ▪ So, R11 and R12 are both in BCNF
Decomposition into BCNF

• Example (cont.)
  ▪ Final decomposition
    ▪ {title, year, studioName}
    ▪ {studioName, president}
    ▪ {president, presAddr}
3.4 Decomposition: The Good, Bad and Ugly

• Three properties we would like a decomposition to have
  ▪ Elimination of anomalies by decomposition
  ▪ Recoverability of information
  ▪ Preservation of dependencies
• In general, there is no way to guarantee all three at once


Recovering Information from a Decomposition

• If we decompose a relation R into two-attribute relations, then we might not get the instance of R back when we join these decomposed relations
• If we decompose a relation by Algorithm 3.20, then the original relation can be recovered exactly by the natural join
The Chase Test for Lossless Join

• Suppose we decompose relation R into relations with sets of attributes R1, R2, …, Rk
• We are given a set of FD’s F that hold in R
• Question: is πR1(R) ⋈ πR2(R) ⋈ … ⋈ πRk(R) = R?
The Chase Test for Lossless Join

• Remember
  ▪ The natural join is associative and commutative
  ▪ Any tuple t in R is surely in πR1(R) ⋈ πR2(R) ⋈ … ⋈ πRk(R)
  ▪ So, when the FD’s in F hold for R, πR1(R) ⋈ πR2(R) ⋈ … ⋈ πRk(R) = R if and only if every tuple in the join is also in R


The Chase Test for Lossless Join

• The chase test checks whether a tuple t in πR1(R) ⋈ πR2(R) ⋈ … ⋈ πRk(R) can be proved, using the FD’s in F, also to be a tuple in R
• If t is in the join, there are tuples t1, t2, …, tk in R such that t is the join of the projections of each ti onto Ri
  ▪ ti agrees with t on the attributes of Ri
  ▪ ti has unknown values in its components not in Ri
• We draw a tableau and use the given set of FD’s F to prove that t is really in R
The Chase Test for Lossless Join

• Example: R(A,B,C,D), S1 = {A,D}, S2 = {A,C}, S3 = {B,C,D}. FD’s: A → B, B → C, CD → A
• The tableau for this decomposition

A   B   C   D
a   b1  c1  d
a   b2  c   d2
a3  b   c   d

• Goal: (a,b,c,d) appears in this tableau after applying the FD’s


The Chase Test for Lossless Join

• Example: (A → B, B → C, CD → A)

A   B   C   D
a   b1  c1  d
a   b2  c   d2
a3  b   c   d

• Since A → B and the first two rows agree on A, b1 = b2; replace b2 by b1:

A   B   C   D
a   b1  c1  d
a   b1  c   d2
a3  b   c   d
The Chase Test for Lossless Join

• Example: (A → B, B → C, CD → A)

A   B   C   D
a   b1  c1  d
a   b1  c   d2
a3  b   c   d

• Since B → C and the first two rows agree on B, c1 = c; replace c1 by c:

A   B   C   D
a   b1  c   d
a   b1  c   d2
a3  b   c   d
The Chase Test for Lossless Join

• Example: (A → B, B → C, CD → A)

A   B   C   D
a   b1  c   d
a   b1  c   d2
a3  b   c   d

• Since CD → A and the first and third rows agree on C and D, a3 = a; replace a3 by a:

A   B   C   D
a   b1  c   d
a   b1  c   d2
a   b   c   d
The Chase Test for Lossless Join

• Example: (A → B, B → C, CD → A)

A   B   C   D
a   b1  c   d
a   b1  c   d2
a   b   c   d

• The unsubscripted tuple t = (a,b,c,d) now appears in the tableau
• We have proved that, if R satisfies the FD’s A → B, B → C, and CD → A, then whenever we project onto {A,D}, {A,C}, and {B,C,D} and rejoin, every tuple we get must have been in R
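
The tableau procedure can be sketched as follows. The code assumes single-letter attribute names (so the unsubscripted symbol for attribute A is just "a"), and `chase_lossless` is a hypothetical name. When two symbols are equated, the unsubscripted one is kept if there is one.

```python
def chase_lossless(all_attrs, schemas, fds):
    """Return True if the chase derives the row of unsubscripted symbols."""
    attrs = list(all_attrs)
    # Row i has the plain symbol on attributes of schema i, else symbol+i.
    rows = [
        {a: (a.lower() if a in s else f"{a.lower()}{i}") for a in attrs}
        for i, s in enumerate(schemas, start=1)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for b in rhs:
                        if r1[b] != r2[b]:
                            # Shorter symbol first: plain beats subscripted.
                            keep, drop = sorted((r1[b], r2[b]), key=len)
                            for row in rows:
                                if row[b] == drop:
                                    row[b] = keep
                            changed = True
    return {a: a.lower() for a in attrs} in rows


FDS = [("A", "B"), ("B", "C"), ("CD", "A")]
print(chase_lossless("ABCD", ["AD", "AC", "BCD"], FDS))  # True
print(chase_lossless("ABC", ["AB", "BC"], []))           # False
```

The second call corresponds to the earlier loss-of-information example, where neither B → A nor B → C holds.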
Dependency Preservation

• In some cases, it is not possible for a BCNF decomposition to have both a lossless join and dependency preservation
Dependency Preservation

• Example: Bookings(title, theater, city)
  ▪ FD’s
    ▪ theater → city
    ▪ title city → theater
  ▪ Keys
    ▪ {title, city}
    ▪ {theater, title}
  ▪ theater → city violates BCNF (theater is not a superkey)
  ▪ Decomposition into two relations
    ▪ {theater, city}
    ▪ {theater, title}

theater  city          theater  title
Guild    Menlo Park    Guild    Antz
Park     Menlo Park    Park     Antz

theater  city        title
Guild    Menlo Park  Antz
Park     Menlo Park  Antz

• When we join these two decomposed relations, the resulting relation may not satisfy the FD title city → theater
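
The problem can be seen by checking the FD's on the two small instances above and on their join. This is an illustrative sketch; `fd_holds` is a hypothetical helper, and the rows are the ones shown on this slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


theater_city = [
    {"theater": "Guild", "city": "Menlo Park"},
    {"theater": "Park", "city": "Menlo Park"},
]
theater_title = [
    {"theater": "Guild", "title": "Antz"},
    {"theater": "Park", "title": "Antz"},
]

# Natural join on the shared attribute theater.
joined = [
    {**tc, **tt}
    for tc in theater_city
    for tt in theater_title
    if tc["theater"] == tt["theater"]
]

print(fd_holds(theater_city, ["theater"], ["city"]))     # True
print(fd_holds(joined, ["title", "city"], ["theater"]))  # False
```

Each decomposed relation is legal on its own, yet together they describe a Bookings instance that violates title city → theater; the FD cannot be checked without performing the join.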


How can we …

• … address BCNF’s weakness, namely that it may lose
  ▪ The preservation of dependencies
• ⇒ 3NF
Exercises

• Exercise 3.3.1 (page 92)
• Exercise 3.4.1 (page 102)


3.5 Third Normal Form (3NF)

• 3NF is a normal form for which we can always find a decomposition that has both the lossless-join and dependency-preservation properties
• A relation R is in third normal form (3NF) if and only if, whenever A1A2…An → B1B2…Bm is a nontrivial FD, either
  ▪ {A1,A2,…,An} is a superkey,
  ▪ or those of B1,B2,…,Bm that are not among the A’s are each a member of some key (not necessarily the same key)
• Notes
  ▪ An attribute that is a member of some key is often said to be prime
  ▪ For each nontrivial FD, either the left side is a superkey, or the right side consists of prime attributes only
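
The BCNF and 3NF conditions differ only in the "prime attribute" escape clause, which the sketch below makes explicit. The function names are hypothetical, keys are found by brute force, and only the given FD's are examined (checking the given set is enough when testing the relation they are stated for). The Bookings relation from the Dependency Preservation example is used as input.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Brute-force search for all minimal superkeys."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == all_attrs:
                keys.append(cand)
    return keys


def violations(all_attrs, fds):
    """Return (BCNF violations, 3NF violations) among the given FD's."""
    all_attrs = set(all_attrs)
    prime = set().union(*candidate_keys(all_attrs, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        new = set(rhs) - set(lhs)
        if not new or closure(lhs, fds) >= all_attrs:
            continue  # trivial FD, or left side is a superkey
        bcnf.append((lhs, rhs))
        if not new <= prime:
            third.append((lhs, rhs))  # some right-side attribute is not prime
    return bcnf, third


BOOKINGS = ["title", "theater", "city"]
FDS = [(("theater",), ("city",)), (("title", "city"), ("theater",))]
bcnf_bad, third_bad = violations(BOOKINGS, FDS)
print(bcnf_bad)   # [(('theater',), ('city',))]
print(third_bad)  # []
```

Bookings is therefore in 3NF but not in BCNF: theater is not a superkey, but city is a member of the key {title, city}.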


Differences between BCNF and 3NF

3NF: whenever a nontrivial functional dependency X → B holds in R, either
  (a) X is a superkey of R, or
  (b) B is a prime attribute of R

BCNF: whenever a nontrivial functional dependency X → B holds in R, then
  (a) X is a superkey of R

Note: A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency no longer holds. A partial functional dependency is one that is not a full functional dependency
The Synthesis Algorithm for 3NF Schemas

• Goal: decompose a relation R into a set of relations such that:
  ▪ The relations of the decomposition are all in 3NF
  ▪ The decomposition has a lossless join
  ▪ The decomposition has the dependency-preservation property
The Synthesis Algorithm for 3NF Schemas

• Algorithm 3.26: Synthesis of 3NF relations
  ▪ Input: A relation R, and a set F of functional dependencies that hold for R
  ▪ Output: A decomposition of R into a collection of relations, each of which is in 3NF. The decomposition has the lossless-join and dependency-preservation properties
  ▪ Method: Perform the following steps
    ▪ 1) Find a minimal basis for F, say G
    ▪ 2) For each FD X → A in G, use {X, A} as the schema of one of the relations in the decomposition
    ▪ 3) If none of the relation schemas from Step 2 is a superkey for R, add another relation whose schema is a key for R
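
Steps 2 and 3 can be sketched as below, assuming Step 1 has already produced a minimal basis. Attributes are single letters, the function names are hypothetical, and schemas contained in another schema are dropped. The key added in Step 3 is found by shrinking the full attribute set, so which key is chosen depends on the order in which attributes are tried.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_key(all_attrs, fds):
    """Shrink the full attribute set to one key (tries Z..A order)."""
    key = set(all_attrs)
    for a in sorted(all_attrs, reverse=True):
        if closure(key - {a}, fds) >= set(all_attrs):
            key.discard(a)
    return key


def synthesize_3nf(all_attrs, min_basis):
    all_attrs = set(all_attrs)
    # Step 2: one schema per FD of the minimal basis.
    schemas = []
    for lhs, rhs in min_basis:
        s = frozenset(lhs) | frozenset(rhs)
        if s not in schemas:
            schemas.append(s)
    # Drop any schema that is a proper subset of another schema.
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Step 3: add a key if no schema is a superkey for R.
    if not any(closure(s, min_basis) >= all_attrs for s in schemas):
        schemas.append(frozenset(find_key(all_attrs, min_basis)))
    return schemas


G = [("AB", "C"), ("C", "B"), ("A", "D")]
for schema in synthesize_3nf("ABCDE", G):
    print("".join(sorted(schema)))  # ABC, AD, ABE
```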
The Synthesis Algorithm for 3NF Schemas

• Example 3.27
  ▪ Given R(A,B,C,D,E), with FD’s AB → C, C → B, A → D
  ▪ Notice: the given FD’s are their own minimal basis
    ▪ We cannot eliminate any of the given FD’s
    ▪ We cannot eliminate any attribute from a left side
  ▪ Start the 3NF synthesis by following steps (1), (2), and (3)
    ▪ New relations: S1(A,B,C), S2(B,C), S3(A,D)
    ▪ S2 is a subset of S1, so we can drop S2
    ▪ Consider whether we need to add a relation whose schema is a key
    ▪ In this case, R has two keys: {A,B,E} and {A,C,E}
    ▪ Since neither of these keys is a subset of one of the schemas, we must add one of them, say S4(A,B,E)
    ▪ The final decomposition of R is S1(A,B,C), S3(A,D), S4(A,B,E)


The Synthesis Algorithm for 3NF Schemas

• Why does this algorithm work?
  ▪ Lossless Join
  ▪ Dependency Preservation
  ▪ Third Normal Form


Exercises

• Exercise 3.5.1 (page 105)
Normal Form Summary

• Unnormalized Form (UNF)
  ▪ A relation that contains one or more repeating groups
  ▪ Example:

Manufacture  Model  CPU   RAM   HD
HP           2001   2.00  2048  240
             2002   2.40  2048  320
Toshiba      2003   2.00  2048  120
             2004   2.80  4096  500
Sony         2010   2.40  2048  300
IBM          2011   2.20  4096  300
Normal Form Summary

• First Normal Form (1NF)
  ▪ Every component of every tuple is an atomic value (the intersection of each row and column contains one and only one value)

Manufacture  Model  CPU   RAM   HD
HP           2001   2.00  2048  240
HP           2002   2.40  2048  320
Toshiba      2003   2.00  2048  120
Toshiba      2004   2.80  4096  500
Sony         2010   2.40  2048  300
IBM          2011   2.20  4096  300
Normal Form Summary

• Second Normal Form (2NF)
  ▪ Based on the concept of full functional dependency
  ▪ X → Y is a full functional dependency if Y is functionally dependent on X, but not on any proper subset of X, i.e., ∄ V ⊂ X such that V → Y
  ▪ Second Normal Form (2NF): a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key
  ▪ Example 1:
    ▪ R(A,B,C,D) and S = {AB → CD, B → D}, so the key of R is AB
    ▪ B → D: D is functionally dependent on B, a proper subset of the key AB, so D is not fully functionally dependent on the key AB
    ▪ R is not in 2NF
  ▪ Example 2:
    ▪ R(A,B,C,D) and S = {AB → CD, C → D}
    ▪ R is in 2NF
Normal Form Summary

• Third Normal Form (3NF)
  ▪ Based on the concept of transitive dependency
  ▪ Transitive dependency is a condition where
    ▪ A, B and C are attributes of a relation such that A → B and B → C;
    ▪ then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C)
  ▪ Third Normal Form (3NF): a relation that is in 1NF and 2NF and in which no non-key attribute is transitively dependent on the key
  ▪ Example 1: R(A,B,C,D), AB → CD and C → D
    ▪ AB → C and C → D, so D is transitively dependent on AB
    ▪ R is not in 3NF
  ▪ Example 2: R(A,B,C,D), AB → CD and C → A (or, alternatively, C → B)
    ▪ C is not a superkey, but A and B are key attributes (prime attributes)
    ▪ R is in 3NF
    ▪ (If both C → A and C → B held, C itself would be a key)
3.6 Multi-valued Dependencies (self-study)

• Suppose we have a relation R. An MVD on R, X ->-> Y, says that if two tuples of R agree on all the attributes of X, then their components in Y may be swapped, and the result will be two tuples that are also in the relation
Multi-valued Dependencies

• Example about MVD

name       street         city       title                year
C. Fisher  123 Maple St.  Hollywood  Star Wars            1977
C. Fisher  5 Locust Ln.   Malibu     Star Wars            1977
C. Fisher  123 Maple St.  Hollywood  Empire Strikes Back  1980
C. Fisher  5 Locust Ln.   Malibu     Empire Strikes Back  1980
C. Fisher  123 Maple St.  Hollywood  Return of the Jedi   1983
C. Fisher  5 Locust Ln.   Malibu     Return of the Jedi   1983

• name ->-> street city


Multi-valued Dependencies

• Example about MVD

name       street         city       title                year
C. Fisher  123 Maple St.  Hollywood  Star Wars            1977
C. Fisher  5 Locust Ln.   Malibu     Star Wars            1977
C. Fisher  123 Maple St.  Hollywood  Empire Strikes Back  1980
C. Fisher  5 Locust Ln.   Malibu     Empire Strikes Back  1980
C. Fisher  123 Maple St.  Hollywood  Return of the Jedi   1983
C. Fisher  5 Locust Ln.   Malibu     Return of the Jedi   1983

name       street         city       title                year
C. Fisher  123 Maple St.  Hollywood  Star Wars            1977
C. Fisher  5 Locust Ln.   Malibu     Empire Strikes Back  1980

• name ->-> street city
• t = first tuple, u = fourth tuple
  ▪ Since t[name] = u[name], there must be tuples v1 and v2 in relation R such that
  ▪ v1 has name C. Fisher, its street and city agree with t, and its title and year agree with u (v1 is the third tuple)
  ▪ v2 has name C. Fisher, its street and city agree with u, and its title and year agree with t (v2 is the second tuple)
Multi-valued Dependencies

• Picture of the MVD X ->-> Y
[Figure: columns X (A1, …, An), Y (B1, …, Bm), and others (Z). Tuples t and u are equal on X; swapping their Y and Z components gives the new tuples v1 and v2]
Multi valued Dependencies

 MVD Rules
I2DB: Design Theory for Relational Databases

 Trivial MVD’s

▪ The MVD A1A2…An ->-> B1B2…Bm holds in any relation if {B1,B2,…,Bm}  {A1,A2,…,An}

 The transitive rule

▪ If A1A2…An ->-> B1B2…Bm and B1B2…Bm ->-> C1C2…Ck hold for some relation, then so does
A1A2…An ->-> C1C2…Ck, provided any C’s that are also B’s are first removed from the right side

 FD promotion

▪ Every FD is an MVD; that is, if A1A2…An -> B1B2…Bm, then A1A2…An ->-> B1B2…Bm

 Complementation rule
▪ If X ->-> Y, and Z is all the other attributes, then X->->Z

 More Trivial MVD’s

▪ If all the attributes of R are (A1A2…An , B1B2…Bm) then A1A2…An ->-> B1B2…Bm holds in R
Multi valued Dependencies

 Splitting doesn’t hold



 As with FD’s, we cannot generally split the left side of an MVD

 But unlike FD’s, we cannot generally split the right side of an MVD either
Name street city title year
C. Fisher 123 Maple St. Hollywood Star Wars 1977
C. Fisher 5 Locust Ln. Malibu Star Wars 1977
C. Fisher 123 Maple St. Hollywood Empire Strikes Back 1980
C. Fisher 5 Locust Ln. Malibu Empire Strikes Back 1980
C. Fisher 123 Maple St. Hollywood Return of the Jedi 1983
C. Fisher 5 Locust Ln. Malibu Return of the Jedi 1983

 So, name ->-> street city holds, but neither name ->-> street nor name ->-> city holds for this relation
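To see concretely why the right side cannot be split, one can list the tuples that a split MVD would require but that the relation does not contain. This is a minimal sketch, assuming the same six-tuple instance; ADDRESSES, MOVIES and missing_tuples are names made up for this illustration.

```python
ATTRS = ("name", "street", "city", "title", "year")
ADDRESSES = [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")]
MOVIES = [("Star Wars", 1977), ("Empire Strikes Back", 1980),
          ("Return of the Jedi", 1983)]
# Every address is paired with every movie, which is exactly the table above.
ROWS = {("C. Fisher", s, c, t, y) for (s, c) in ADDRESSES for (t, y) in MOVIES}


def missing_tuples(rows, attrs, x, y):
    """Return the tuples that X ->-> Y requires but that are absent from rows."""
    missing = set()
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i, a in enumerate(attrs) if a in x):
                # v takes its Y components from t and everything else from u.
                v = tuple(t[i] if a in y else u[i] for i, a in enumerate(attrs))
                if v not in rows:
                    missing.add(v)
    return missing


whole = missing_tuples(ROWS, ATTRS, {"name"}, {"street", "city"})
only_street = missing_tuples(ROWS, ATTRS, {"name"}, {"street"})
only_city = missing_tuples(ROWS, ATTRS, {"name"}, {"city"})
print(len(whole), len(only_street), len(only_city))
```

Splitting the right side would demand tuples such as (C. Fisher, 123 Maple St., Malibu, Star Wars, 1977), which pair a street with the wrong city.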


What do we do with …

 … redundancy caused by MVD’s?



Name street city title year


C. Fisher 123 Maple St. Hollywood Star Wars 1977
C. Fisher 5 Locust Ln. Malibu Star Wars 1977
C. Fisher 123 Maple St. Hollywood Empire Strikes Back 1980
C. Fisher 5 Locust Ln. Malibu Empire Strikes Back 1980
C. Fisher 123 Maple St. Hollywood Return of the Jedi 1983
C. Fisher 5 Locust Ln. Malibu Return of the Jedi 1983
Fourth Normal Form

 The redundancy that comes from MVD’s is not removable by putting the database schema in BCNF
 There is a stronger normal form, called 4NF, that
treats MVD’s as FD’s when it comes to
decomposition, but not when determining keys of
the relation
Fourth Normal Form

 4NF statement

 A relation R is in 4NF if, whenever X ->-> Y is a nontrivial MVD, X is a super key


 Nontrivial MVD means that
 Y is not a subset of X

 X and Y are not, together, all the attributes

 Note that the definition of ‘super key’ still depends on FD’s only
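The 4NF condition can be written as a small test. The sketch below is an assumption-laden illustration: the candidate keys are passed in as input (they are assumed to have been derived from the FD’s beforehand), and the function names is_trivial_mvd, is_superkey and violates_4nf are invented here.

```python
def is_trivial_mvd(x, y, schema):
    """X ->-> Y is trivial if Y is a subset of X, or X and Y together are all attributes."""
    x, y, schema = set(x), set(y), set(schema)
    return y <= x or (x | y) == schema


def is_superkey(x, keys):
    """X is a super key if it contains some key (keys are assumed to come from the FD's)."""
    return any(set(key) <= set(x) for key in keys)


def violates_4nf(x, y, schema, keys):
    """A 4NF violation is a nontrivial MVD whose left side is not a super key."""
    return not is_trivial_mvd(x, y, schema) and not is_superkey(x, keys)


SCHEMA = {"name", "street", "city", "title", "year"}
KEYS = [SCHEMA]  # the only key is all five attributes

print(violates_4nf({"name"}, {"street", "city"}, SCHEMA, KEYS))
print(violates_4nf({"name"}, {"street", "city", "title", "year"}, SCHEMA, KEYS))
```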
BCNF Versus 4NF

 Every FD X→Y is also MVD X->->Y



 Thus, if R is in 4NF, it is certainly in BCNF


 But R could be in BCNF and not in 4NF
 Example: R(name,street,city,title,year) is in BCNF but not in 4NF, because name ->-> street city is a nontrivial MVD and name is not a super key of R (the only key of R is all five attributes)
Decomposition into 4NF

 Algorithm 3.33

 Input: A relation R0, with a set of FD’s and MVD’s S0

 Output: A decomposition of R0 into relations all of which are in 4NF

 Method:

▪ Initialize: R=R0, S=S0

▪ Find a 4NF violation in R, say X->->Y, where X is not a super key. If there is none,
return; R by itself is a suitable decomposition
▪ If there is such a violation, break the schema of R into two schemas:
▪ R1, whose schema is X and Y

▪ R2, whose schema is X and all attributes of R that are not among the X or Y

▪ Find the FD’s and MVD’s that hold in R1 and R2. Recursively decompose R1 and R2 with respect to their projected dependencies
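The splitting step of the method can be sketched as a short recursion. This is a simplified illustration, not a full implementation: projecting the FD’s and MVD’s onto R1 and R2 is not done here. Instead, a caller-supplied function find_violation (a hypothetical callback) is assumed to return a 4NF violation (X, Y) for a given schema, or None when the schema is already in 4NF.

```python
def decompose_4nf(schema, find_violation):
    """Recursively split `schema` until find_violation reports no 4NF violation."""
    schema = frozenset(schema)
    violation = find_violation(schema)
    if violation is None:
        return [schema]
    x, y = (frozenset(part) for part in violation)
    r1 = x | y                # R1 has the attributes of X and Y
    r2 = x | (schema - y)     # R2 has X plus everything not among X or Y
    return decompose_4nf(r1, find_violation) + decompose_4nf(r2, find_violation)


def sample_violation(schema):
    """Hypothetical oracle: only the five-attribute schema has a violation."""
    if schema == frozenset({"name", "street", "city", "title", "year"}):
        return {"name"}, {"street", "city"}
    return None


parts = decompose_4nf({"name", "street", "city", "title", "year"}, sample_violation)
print([sorted(p) for p in parts])
```

Because a violation is a nontrivial MVD, both R1 and R2 have fewer attributes than R, so the recursion stops.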
Decomposition into 4NF

 Example 3.34

 The R(name, street, city, title, year) with MVD name ->-> street city is a 4NF violation

 Decomposition relations:
▪ R1(name, street, city) with MVD name ->-> street city
▪ And R2(name, title, year) with MVD name ->-> title year
▪ Both MVD’s are now trivial in their relations, so R1 and R2 are in 4NF
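As a sanity check on this decomposition, one can project the six-tuple instance onto R1 and R2 and join the projections back. The sketch below assumes the instance shown earlier; project and the variable names are illustrative only.

```python
ATTRS = ("name", "street", "city", "title", "year")
ADDRESSES = [("123 Maple St.", "Hollywood"), ("5 Locust Ln.", "Malibu")]
MOVIES = [("Star Wars", 1977), ("Empire Strikes Back", 1980),
          ("Return of the Jedi", 1983)]
ROWS = {("C. Fisher", s, c, t, y) for (s, c) in ADDRESSES for (t, y) in MOVIES}


def project(rows, attrs, keep):
    """Project a set of tuples onto the attributes in `keep` (duplicates vanish)."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


r1 = project(ROWS, ATTRS, ("name", "street", "city"))
r2 = project(ROWS, ATTRS, ("name", "title", "year"))

# Natural join of R1 and R2 on the shared attribute name.
rejoined = {(n, s, c, t, y) for (n, s, c) in r1 for (m, t, y) in r2 if n == m}

print(len(r1), len(r2), rejoined == ROWS)
```

Each address and each movie is now stored once, and the join recovers the original six tuples.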
3.7 An Algorithm for Discovering MVD’s
(self-study)

 Reasoning about MVD’s and FD’s


 Problem: Given a set of MVD’s and/or FD’s that hold for a relation R, does a certain MVD or FD also hold in R?

 Solution: Use a tableau to explore all inferences from the given set, to see if you can prove the target dependency
An Algorithm for Discovering
MVD’s

 Why do we care?

 4NF technically requires an MVD violation


▪ Need to infer MVD’s from given FD’s and MVD’s that may not be
violations themselves

 When we decompose, we need to project FD’s, MVD’s


The Closure and the Chase to FD

 Goal: Does X → Y follow from a set of FD’s F that hold in a relation R?

 Solution: Close X with respect to F and see whether Y ⊆ X+
 Method: Build a tableau and chase it using the FD’s of F
 A chase-based test:
 Start with a tableau having two rows that agree only on X

 Chase the tableau using the FD’s of F

 If the final tableau agrees in all columns of Y, then X → Y holds; otherwise it does not
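The closure-based solution can be sketched in a few lines. This is a minimal illustration; the function names closure and follows are chosen here, FD’s are written as (left, right) pairs of attribute strings, and the sample FD’s AB→C, BC→AD, D→E, CF→B over single-letter attributes are used only as test data.

```python
def closure(x, fds):
    """Compute X+ : repeatedly add the right side of any FD whose left side is covered."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(x, y, fds):
    """X -> Y follows from the FD's exactly when Y is contained in X+."""
    return set(y) <= closure(x, fds)


# Each attribute is a single letter, so a string stands for a set of attributes.
FDS = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(sorted(closure("AB", FDS)))
print(follows("AB", "D", FDS))
```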


The Closure and Chase to FD

 Example: Suppose a relation R(A,B,C,D,E,F) with FD’s AB→C, BC→AD, D→E, CF→B. Test whether AB→D holds in R


 Start with the tableau having two rows that agree on A,B:
A B C D E F
a b c1 d1 e1 f1
a b c2 d2 e2 f2

 As AB→C, then c1=c2, we have


A B C D E F
a b c d1 e1 f1
a b c d2 e2 f2
The Closure and Chase to FD

 Example (cont)

A B C D E F
a b c d1 e1 f1
a b c d2 e2 f2

 As BC→AD, then d1=d2; then, as D→E, e1=e2


A B C D E F
a b c d e f1
a b c d e f2

 We can go no further. Since the two tuples now agree in the D column, AB→D does follow from the given FD’s
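The two-row chase used in this example can be sketched as follows. It is a simplified illustration that assumes single-letter attribute names and writes symbols as lowercase letters with a 1 or 2 suffix; chase_fd is a name introduced here, not one from the slides.

```python
def chase_fd(attrs, fds, x, y):
    """Chase a two-row tableau with FD's; return (does X -> Y follow, row1, row2)."""
    # The rows agree only on X: unsubscripted symbols there, subscripted elsewhere.
    row1 = {a: (a.lower() if a in x else a.lower() + "1") for a in attrs}
    row2 = {a: (a.lower() if a in x else a.lower() + "2") for a in attrs}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if all(row1[a] == row2[a] for a in lhs):
                for a in rhs:
                    if row1[a] != row2[a]:
                        # Equate the two symbols in this column.
                        row1[a] = row2[a] = a.lower()
                        changed = True
    return all(row1[a] == row2[a] for a in y), row1, row2


FDS = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
holds, row1, row2 = chase_fd("ABCDEF", FDS, "AB", "D")
print(holds)
print(" ".join(row1[a] for a in "ABCDEF"))
print(" ".join(row2[a] for a in "ABCDEF"))
```

With only two rows and FD’s, equating a symbol never affects any other column, which is why no global renaming is needed in this sketch.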


Extending the Chase to MVD’s

 For MVD X->->Y,

 We start with a tableau consisting of two tuples that
agree in X and disagree in all attributes not in the set X.
 We apply the given FD’s to equate symbols, and we
apply the given MVD’s to swap the values in certain
attributes between two existing rows of the tableau to
add new rows to the tableau
An Algorithm for Discovering
MVD’s

 Example: Suppose a relation R(A,B,C,D), and A→B, B->->C. Test whether A->->C holds in R


 Start with the tableau having two rows (for A->->C)
A B C D
a b1 c d1
a b c2 d

 Since A→B, then b1=b


A B C D
a b c d1
a b c2 d
An Algorithm for Discovering
MVD’s
 Example (cont)

A B C D
a b c d1
a b c2 d

 Since B->->C, and the two rows agree in the B column, we swap their C components to get two more rows:
A B C D
a b c d1
a b c2 d
a b c2 d1
a b c d

 We get a row (a,b,c,d), which proves that A->->C holds in relation R
An Algorithm for Discovering
MVD’s

 Example: Suppose a relation R(A,B,C,D), A->->BC, D→C. Test whether A→C holds


 Start with the tableau with two rows
A B C D
a b1 c1 d1
a b2 c2 d2

 Since A->->BC, we swap the B and C components of the two rows to add two new tuples to the tableau


A B C D
a b1 c1 d1
a b2 c2 d2
a b2 c2 d1
a b1 c1 d2
An Algorithm for Discovering
MVD’s

 Example (cont)

 We have pairs of rows that agree on D, so we can apply the FD D→C, which gives c1=c2
A B C D
a b1 c1 d1
a b2 c1 d2
a b2 c1 d1
a b1 c1 d2
 The two original rows now agree in the C column, so we conclude that A→C holds
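The chase steps used in these examples (equate symbols for an FD, add swapped rows for an MVD) can be put together in one sketch. It is an illustration under stated assumptions, not a definitive implementation: attributes are single uppercase letters, symbols are lowercase letters with an optional 1 or 2 suffix, and the names find_equation, chase, implies_fd and implies_mvd are invented here.

```python
def find_equation(rows, pos, fds):
    """Return two unequal symbols that some FD forces to be equal, or None."""
    for lhs, rhs in fds:
        for t in rows:
            for u in rows:
                if all(t[pos[a]] == u[pos[a]] for a in lhs):
                    for a in rhs:
                        if t[pos[a]] != u[pos[a]]:
                            return t[pos[a]], u[pos[a]]
    return None


def chase(attrs, fds, mvds, start):
    """Chase to a fixpoint; return (all rows, the start rows after renaming)."""
    pos = {a: i for i, a in enumerate(attrs)}
    start = list(start)
    rows = set(start)
    while True:
        pair = find_equation(rows, pos, fds)
        if pair is not None:
            # Keep the unsubscripted (shorter) symbol when there is one.
            keep, drop = sorted(pair, key=lambda s: (len(s), s))

            def rename(row):
                return tuple(keep if s == drop else s for s in row)

            rows = {rename(r) for r in rows}
            start = [rename(r) for r in start]
            continue
        new_rows = set(rows)
        for lhs, rhs in mvds:
            for t in rows:
                for u in rows:
                    if all(t[pos[a]] == u[pos[a]] for a in lhs):
                        # New row: right-side components from t, the rest from u.
                        new_rows.add(tuple(t[pos[a]] if a in rhs else u[pos[a]]
                                           for a in attrs))
        if new_rows == rows:
            return rows, start
        rows = new_rows


def implies_fd(attrs, fds, mvds, x, y):
    """Test X -> Y: start with two rows that agree only on X."""
    row1 = tuple(a.lower() if a in x else a.lower() + "1" for a in attrs)
    row2 = tuple(a.lower() if a in x else a.lower() + "2" for a in attrs)
    _, (r1, r2) = chase(attrs, fds, mvds, [row1, row2])
    return all(r1[i] == r2[i] for i, a in enumerate(attrs) if a in y)


def implies_mvd(attrs, fds, mvds, x, y):
    """Test X ->-> Y: succeed if the all-unsubscripted row is produced."""
    row1 = tuple(a.lower() if a in x or a in y else a.lower() + "1" for a in attrs)
    row2 = tuple(a.lower() + "2" if a in y and a not in x else a.lower()
                 for a in attrs)
    rows, _ = chase(attrs, fds, mvds, [row1, row2])
    return tuple(a.lower() for a in attrs) in rows


print(implies_mvd("ABCD", [("A", "B")], [("B", "C")], "A", "C"))
print(implies_fd("ABCD", [("D", "C")], [("A", "BC")], "A", "C"))
```

The two calls correspond to the two examples above: A→B with B->->C, testing A->->C, and A->->BC with D→C, testing A→C.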
An Algorithm for Discovering
MVD’s

 Example: Suppose a relation R(A,B,C,D), A->->B, B->->C. Test whether A->->C holds


A B C D
a b1 c d1
a b c2 d

 The two rows agree on A, so we apply A->->B and swap their B components


A B C D
a b1 c d1
a b c2 d
a b c d1
a b1 c2 d
An Algorithm for Discovering
MVD’s
 Example (cont): R(A,B,C,D), A->->B, B->->C. Test whether A->->C holds

 As B->->C, and the 2nd and 3rd rows agree on B, we swap their C components, and we have
A B C D
a b1 c d1
a b c2 d
a b c d1
a b c d
a b c2 d1
a b1 c2 d

 We have (a,b,c,d) which proves that A->->C


An Algorithm for Discovering
MVD’s

 Example: Suppose a relation R(A,B,C,D,E), and one of the decomposed relations S(A,B,C). MVD A->->CD holds in R. Does A->->C hold in S?
A B C D E
a b1 c d1 e1
a b c2 d e

 As A->->CD, swap the C and D columns


A B C D E
a b1 c d1 e1
a b c2 d e
a b1 c2 d e1
a b c d1 e

 The last row has unsubscripted symbols in all the attributes of S (A, B, C), so A->->C holds in S
Exercises

 Exercise 3.6.2 (page 114)
