
C20.0046: Database Management Systems
Lecture #7
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004

M.P. Johnson, DBMS, Stern/NYU, Sp2004 1


Agenda
 Last time: Normalization
 Homework 1 due now
 Project part 2 is up, due on the 19th (Thurs.)
 This time:
1. Finish BCNF
2. 3NF
3. 4NF
4. Relational Algebra…



BCNF Review
 Q: What’s required for BCNF?

 Q: What’s the slogan for BCNF?

 Q: Who are B & C?

 Q: What are the two types of violations?



BCNF Review
 Q: How do we fix a non-BCNF relation?

 Q: If As → Bs violates BCNF, what do we do?

 Q: In this case, could the decomposition be lossy?

 Q: Under what circumstances could a decomposition be lossy?

 Q: How do we combine two relations?



Decomposition algorithm example
 R(N,O,R,P) F = {N → O, O → R, R → N}

Name    Office  Residence  Phone
George  Pres.   WH         202-…
George  Pres.   WH         486-…
Dick    VP      NO         202-…
Dick    VP      NO         307-…

 Key: N,P
 Violations of BCNF: N → O, O → R, N → OR
 which kinds of violations are these?
 Pick N → OR (on board)
 Can we rejoin? (on board)
 What happens if we pick N → O instead?
 Can we rejoin? (on board)
Lossless BCNF decomposition
 Consider simple relation: R(A,B,C)
 Only FD: A → B (assume C ↛ A)
 Key: A,C
 Diff vars from text! Also goes through if assumption is false
 BCNF violation (which kind?): no key on the left
 Thus: Decomposition to BCNF:
 Create R1(A,B) and R2(A,C)
 Could this be lossy?
 We will join R1 and R2 on A to find out

Q: Since C ↛ A, what kind of bad FD do we have?
Q: If C → A, then what kind do we have?



Lossless BCNF decomposition
 Suppose R contains (b,a,c) and (b’,a,c’)
 In projection onto (B,A):
 (b,a,c) → (b,a), (b’,a,c’) → (b’,a)
 In projection onto (A,C):
 (b,a,c) → (a,c), (b’,a,c’) → (a,c’)
 In joining, (b’,a), (a,c) → (b’,a,c)
 Q: Is/must/can this be correct?
 A: Yes! A → B, so b = b’
 So this was lossless
 We assumed C ↛ A, but argument also goes
through when C → A
 Moral: BCNF decomp alg is always lossless
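The project-then-join argument can be tried on small instances. The sketch below is illustrative only: the sample tuples and the helper names `project` and `rejoin` are made up for the example.

```python
SCHEMA = ("A", "B", "C")

def project(rows, cols):
    """Project a set of (A, B, C) tuples onto the named columns."""
    idx = [SCHEMA.index(c) for c in cols]
    return {tuple(row[i] for i in idx) for row in rows}

def rejoin(rows):
    """Project onto (A,B) and (A,C), then natural-join the two on A."""
    r1 = project(rows, ("A", "B"))
    r2 = project(rows, ("A", "C"))
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}

# Instance that satisfies A -> B: each A value has a single B value.
good = {("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")}

# Instance that violates A -> B: a1 appears with both b1 and b2.
bad = {("a1", "b1", "c1"), ("a1", "b2", "c2")}

lossless_when_fd_holds = rejoin(good) == good
spurious = rejoin(bad) - bad  # tuples the join invents
```

When A → B holds, the join gives back exactly the original rows. When it does not, the join produces extra tuples, which is why the decomposition must split on an FD that actually holds.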
BCNF summary
 BCNF decomposition is lossless
 Can reproduce original by joining
 Saw last time: Every 2-attribute relation is in
BCNF
 Final set of decomposed relations might be
different depending on
 Order of bad FDs chosen
 Saw last time: But all results will be in BCNF



A problem with BCNF
 Relation: R(Title, Theater, Neighborhood)
 FDs:
 Title,N’hood → Theater
 Assume movie can’t play twice in same neighborhood
 Theater → N’hood
 Keys:
 {Title, N’hood}
 {Theater, Title}

Title        Theater   N’hood
City of God  Angelica  Village
Fog of War   Angelica  Village



A problem with BCNF
 BCNF violation: Theater → N’hood
 Decompose:
 {Theater, N’Hood}
 {Theater, Title}
 Resulting relations:

R1
Theater   N’hood
Angelica  Village

R2
Theater   Title
Angelica  City of God
Angelica  Fog of War



Problem - continued
 Suppose we add new rows to R1 and R2:

R1
Theater     N’hood
Angelica    Village
Film Forum  Village

R2
Theater     Title
Angelica    City of God
Angelica    Fog of War
Film Forum  City of God

 Their join:

R’
Theater     N’hood   Title
Angelica    Village  City of God
Angelica    Village  Fog of War
Film Forum  Village  City of God

R1 and R2 could not enforce FD Title,N’hood → Theater
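The same situation can be reproduced in a few lines of Python. This is a sketch: the helper `satisfies_fd` and the column name `Nhood` are assumptions made for the example.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if rows (dicts) agree on rhs whenever they agree on lhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

r1 = [
    {"Theater": "Angelica", "Nhood": "Village"},
    {"Theater": "Film Forum", "Nhood": "Village"},
]
r2 = [
    {"Theater": "Angelica", "Title": "City of God"},
    {"Theater": "Angelica", "Title": "Fog of War"},
    {"Theater": "Film Forum", "Title": "City of God"},
]

# Natural join of R1 and R2 on Theater.
joined = [{**a, **b} for a in r1 for b in r2 if a["Theater"] == b["Theater"]]

local_ok = satisfies_fd(r1, ["Theater"], ["Nhood"])
global_ok = satisfies_fd(joined, ["Title", "Nhood"], ["Theater"])
```

The only FD that can be checked inside R1 still holds, yet the join violates Title,N’hood → Theater: City of God now plays at two theaters in the Village.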


Third normal form: motivation
 There are some situations in which
 BCNF is not dependency-preserving, and
 Efficient checking for FD violation on updates is
important
 In these cases BCNF is too severe a req.
 Solution: define a weaker normal form, called
Third Normal Form
 in which FDs can be checked on individual relations
without performing a join (no inter-relational FDs)
 to which relations can be converted, preserving both
data and FDs



Third Normal Form
 BCNF decomposition is not always dependency-preserving!
 We now define the (weaker) Third Normal Form
 Turns out: this example was already in 3NF

A relation R is in 3rd normal form if:

For every nontrivial dependency A1, A2, ..., An → B for R,
{A1, A2, ..., An} is a super-key for R,
or B is part of a key, i.e., B is prime

Tradeoff:
BCNF = no FD anomalies, but may lose some FDs
3NF = keeps all FDs, but may have some anomalies



BCNF: vices and virtues
 Be clear on the problem just described vs. the
argument that BCNF decomp is lossless
 BCNF decomp does not lose data
 Resulting relations can be rejoined to obtain the
original
 But: it can lose dependencies
 After decomp, possible to add rows whose
corresponding rows would be illegal in (rejoined)
original



Recap: goals of normalization
 When we decompose a relation R with FDs F into
R1..Rn we want:
1. lossless-join decomposition – no data lost
2. no/little redundancy: the relations Ri should be in either
BCNF or at least 3NF
3. Dependency preservation: let Fi be the set of
dependencies in F+ that include only attributes in Ri:
 F is the “sum” of the FDs of the new relations
 (F1 ∪ F2 ∪ F3 ∪ … ∪ Fn)+ = F+
 Otherwise checking updates for violation of FDs
may require computing joins, which is expensive
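Goal 3 can be tested without computing F+ explicitly. The sketch below uses the standard textbook procedure (grow the left side using closures restricted to each Ri) and applies it to the movie-theater decomposition. The function names and the tuple encoding of attributes are assumptions made for the example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) tuples of names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, fds, parts):
    """Does fd = (lhs, rhs) follow from the FDs projected onto parts?"""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What F lets us derive while looking only at this one relation.
            gained = closure(result & set(part), fds) & set(part)
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

F = [(("Title", "Nhood"), ("Theater",)), (("Theater",), ("Nhood",))]
parts = [("Theater", "Nhood"), ("Theater", "Title")]

preserved = [is_preserved(fd, F, parts) for fd in F]
```

Here Theater → N’hood survives inside the first relation, but Title,N’hood → Theater cannot be derived from either relation alone, so checking it needs a join.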
Dependency preservation
 Saw that last req. didn’t hold in movie-theater
example
 Did it hold in R(N,O,R,P) example?
(on board)



Testing for 3NF
 For each dependency X → Y, use attribute closure
to check if X is a superkey
 If X is not a superkey, verify that each attribute in Y
is prime
 This test is rather more expensive, since it involves finding
candidate keys
 Testing for 3NF is NP-complete (in what?)
 Interestingly, decomposition into 3NF can be done in
polynomial time
 ⇒ Testing for 3NF is harder than decomposing into 3NF!
 Optimization: need to check only FDs in F, need not
check all FDs in F+ (why?)



3NF Example
 R = (J, K, L)
 F = (JK → L, L → K)
 Two candidate keys: JK and JL
 R is in 3NF
 JK → L: JK is a superkey
 L → K: K is prime
 BCNF decomposition yields
 R1 = (L,K), R2 = (L,J)
 testing for JK → L requires a join
 There is some redundancy in R
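The 3NF test can be run on this example with a brute-force search for candidate keys. This is a sketch: the function names are made up for the example, and the key search tries every subset of attributes, so it is exponential and only suitable for tiny schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is all of attrs (smallest first)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            is_superkey = closure(combo, fds) == set(attrs)
            if is_superkey and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

def is_bcnf(attrs, fds):
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) == set(attrs)
               for lhs, rhs in fds)

def is_3nf(attrs, fds):
    prime = {a for key in candidate_keys(attrs, fds) for a in key}
    return all(closure(lhs, fds) == set(attrs) or (set(rhs) - set(lhs)) <= prime
               for lhs, rhs in fds)

R = "JKL"
F = [("JK", "L"), ("L", "K")]
keys = candidate_keys(R, F)
```

For this R and F the search finds the two keys JK and JL, `is_3nf` accepts the schema because K is prime, and `is_bcnf` rejects it because L is not a superkey.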
BCNF and 3NF Comparison
 Example of problems due to redundancy in 3NF
 R = (J, K, L)
 F = (JK → L, L → K)

J     K   L
j1    k1  l1
j2    k1  l1
j3    k1  l1
NULL  k2  l2

 A schema that is in 3NF but not BCNF has the
problems of:
 redundancy (e.g., the relationship between l1 and k1)
 need to use null values (if allowed!), e.g. to represent the
relationship between l2 and k2 when there is no
corresponding value for attribute J



Comparison of BCNF and 3NF
 It is always possible to decompose a relation
into relations in 3NF such that:
 the decomposition is lossless
 the dependencies are preserved

 It is always possible to decompose a relation
into relations in BCNF such that:
 the decomposition is lossless
 but it may not be possible to preserve
dependencies
 But may eliminate more redundancy



The Normal Forms (so far)
 1NF: every attribute has an atomic value
 2NF: 1NF and no partial dependencies
 3NF: for each FD X → Y either
 it is trivial, or
 X is a superkey, or
 Y is a part of some key
 BCNF:
 3NF and third 3NF option disallowed
 I.e., 2NF and no transitive dependencies



Distinguishing examples
 1NF but not 2NF: R(Name, SSN, Mailing-address, Phone)
 Key: SSN,Phone
 Partial: ssn → name, address
 2NF but not 3NF: R(Title,Year,Studio,Pres,Pres-Addr)
 Key: Title,Year
 Transitive: studio → president
 3NF but not BCNF: R(Title, Theater, N’hood)
 Title,N’hood → Theater
 Prime-on-right: Theater → N’hood



Design Goals
 Goal for a relational database design is:
 No redundancy
 Lossless Join
 Dependency Preservation
 If we cannot achieve this, we accept one of
 dependency loss
 use of more expensive inter-relational methods to preserve
dependencies
 data redundancy due to use of 3NF
 Interesting: SQL does not provide a direct way of
specifying FDs other than superkeys
 can specify FDs using assertions, but they are expensive to test



3NF
 3NF means we may have anomalies
 Example: TEACH(student, teacher, subject)
 student, subject → teacher (students not allowed in the
same subject with two teachers)
 teacher → subject (each teacher teaches one subject)
 Subject is prime, so this is 3NF
 But we have anomalies:
 Insertion: cannot insert a teacher until we have a
student taking his subject
 If we convert to BCNF, we lose
student, subject → teacher
BCNF and over-normalization
 What is the problem?
 Schema overload – trying to capture two meanings:
 1) subject X can be taught by teacher Y
 2) student Z takes subject W from teacher V
 What to do?
 3NF has anomalies, normalizing to BCNF loses FDs
 One soln: keep the 3NF TEACH and another
(BCNF) relation SUBJECT-TAUGHT (teacher,
subject)
 Still (more!) redundancy, but no more insert and
delete anomalies



New topic: MVDs (3.7)
 Consider this relation
 People ~ their jobs ~ their residences
 Person-address/city: many-many
 Person-job: many-many
 Address/city-job: independent
Name     SSN  Jobs        Streets               Citys
Michael  123  Mayor       111 East 60th Street  New York
Michael  123  Mayor       222 Brompton Road     London
Michael  123  CEO         111 East 60th Street  New York
Michael  123  CEO         222 Brompton Road     London
Hilary   456  Senator     333 Some Street       Chappaqua
Hilary   456  Senator     444 Embassy Row       Washington
Hilary   456  First Lady  333 Some Street       Chappaqua
Hilary   456  First Lady  444 Embassy Row       Washington
Hilary   789  Lawyer      333 Some Street       Chappaqua
Hilary   789  Lawyer      444 Embassy Row       Washington
Redundancy in BCNF
Name     Streets               Citys       Jobs
Michael  111 East 60th Street  New York    Mayor
Michael  222 Brompton Road     London      Mayor
Michael  111 East 60th Street  New York    CEO
Michael  222 Brompton Road     London      CEO
Hilary   333 Some Street       Chappaqua   Senator
Hilary   444 Embassy Row       Washington  Senator
Hilary   333 Some Street       Chappaqua   First Lady
Hilary   444 Embassy Row       Washington  First Lady
Hilary   333 Some Street       Chappaqua   Lawyer
Hilary   444 Embassy Row       Washington  Lawyer

 Lots of redundancy!
 Key? All fields
 None determined by others!
 Non-trivial FDs? None!
 ⇒ In BCNF? Yes!

 Now what?
 New concept, leading to another normal form:
 Multivalued dependencies
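The redundancy in this table can be measured directly. The sketch below checks that, for each name, the rows are exactly every address paired with every job, and compares the row count with two smaller projections. The variable and function names are made up for the example.

```python
from itertools import product

rows = {
    ("Michael", "111 East 60th Street", "New York", "Mayor"),
    ("Michael", "222 Brompton Road", "London", "Mayor"),
    ("Michael", "111 East 60th Street", "New York", "CEO"),
    ("Michael", "222 Brompton Road", "London", "CEO"),
    ("Hilary", "333 Some Street", "Chappaqua", "Senator"),
    ("Hilary", "444 Embassy Row", "Washington", "Senator"),
    ("Hilary", "333 Some Street", "Chappaqua", "First Lady"),
    ("Hilary", "444 Embassy Row", "Washington", "First Lady"),
    ("Hilary", "333 Some Street", "Chappaqua", "Lawyer"),
    ("Hilary", "444 Embassy Row", "Washington", "Lawyer"),
}

def addresses_independent_of_jobs(rows):
    """True if each name's rows are the full product addresses x jobs."""
    for name in {r[0] for r in rows}:
        mine = {r for r in rows if r[0] == name}
        addresses = {(r[1], r[2]) for r in mine}
        jobs = {r[3] for r in mine}
        expected = {(name, s, c, j) for (s, c), j in product(addresses, jobs)}
        if mine != expected:
            return False
    return True

# Two smaller relations that carry the same information.
name_address = {(r[0], r[1], r[2]) for r in rows}
name_job = {(r[0], r[3]) for r in rows}
rejoined = {(n, s, c, j) for (n, s, c) in name_address
            for (n2, j) in name_job if n == n2}
```

On this instance the 10 rows are rebuilt exactly from 4 name-address pairs and 5 name-job pairs. No FD describes this kind of redundancy, which motivates the new concept.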