ER Diagrams (Concluded), Schema Refinement, and Normalization




Zachary G. Ives
University of Pennsylvania
CIS 550 Database & Information Systems


October 6, 2005

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan
2
Examples of ER Diagrams
Please interpret these ER diagrams:
[Figure: three ER diagrams, each connecting the entity sets STUDENTS and COURSES through the relationship set Takes.]
3
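Converting ER Relationship Sets to
Tables: 1:n Relationships
[Figure: ER diagram in which the relationship set Teaches connects COURSES and PROFESSORS.]
Option 1: the key of the entity set that participates at most once (COURSES) is the key of the relationship set:
CREATE TABLE Teaches(
  fid INTEGER,
  serno CHAR(15),
  semester CHAR(4),
  PRIMARY KEY (serno),    -- each course appears at most once, so it has at most one professor
  FOREIGN KEY (fid) REFERENCES PROFESSORS,
  FOREIGN KEY (serno) REFERENCES COURSES)
Option 2: embed the relationship in the table for the "many" entity set:
CREATE TABLE Teaches_Course(
  serno INTEGER,
  subj VARCHAR(30),
  cid CHAR(15),
  fid CHAR(15),           -- the professor teaching this course
  semester CHAR(4),       -- named semester because WHEN is a reserved word in SQL
  PRIMARY KEY (serno),
  FOREIGN KEY (fid) REFERENCES PROFESSORS)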
4
1:1 Relationships
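If you borrow money or have credit, you might get:
[Figure: ER diagram with entity sets CreditReport and Borrower joined by the relationship set Describes; attributes shown are rid, delinquent?, ssn, name, and debt.]
What are the table options?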
5
ISA Relationships: Subclassing
(Structurally)
Inheritance states that one entity is a special kind
of another entity: a subclass member should also be a member of the
base class
[Figure: ER diagram in which Employees (salary) ISA People (id, name).]
6
But How Does this Translate
into the Relational Model?
Compare these options:
Two tables, disjoint tuples
Two tables, disjoint attributes
One table with NULLs
Object-relational databases (allow subclassing of tables)
7
Weak Entities
A weak entity can only be identified uniquely using the primary
key of another (owner) entity.
Owner and weak entity sets in a one-to-many relationship
set, 1 owner : many weak entities
Weak entity set must have total participation
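[Figure: ER diagram with owner entity set People (ssn, name), identifying relationship set Feeds (weeklyCost), and weak entity set Pets (name, species).]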
8
Translating Weak Entity Sets
Weak entity set and identifying relationship set are translated
into a single table; when the owner entity is deleted, all
owned weak entities must also be deleted
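CREATE TABLE Feed_Pets (
  pname VARCHAR(20),       -- the pet's name, which is only a partial key
  species INTEGER,
  weeklyCost REAL,
  ssn CHAR(11) NOT NULL,   -- the owner's key; NOT NULL enforces total participation
  PRIMARY KEY (pname, ssn),
  FOREIGN KEY (ssn) REFERENCES People
    ON DELETE CASCADE)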
9
N-ary Relationships
Relationship sets can relate an arbitrary number of
entity sets:
[Figure: ternary relationship set Indep Study connecting the entity sets Student, Project, and Advisor.]
10
Summary of ER Diagrams
One of the primary ways of designing logical
schemas
CASE tools exist built around ER
(e.g. ERWin, PowerBuilder, etc.)
Translate the design automatically into DDL, XML, UML,
etc.
Use a slightly different notation that is better suited to
graphical displays
Some tools support constraints beyond what ER diagrams
can capture
Can you get different ER diagrams from the same data?
11
Schema Refinement & Design Theory
ER Diagrams give us a start in logical schema design
Sometimes need to refine our designs further
There's a system and theory for this
Focus is on redundancy of data
Causes update, insertion, deletion anomalies
12
Not All Designs are Equally Good
Why is this a poor schema design?
Stuff(sid, name, serno, subj, cid, exp-grade)

And why is this one better?
Student(sid, name)
Course(serno, cid)
Subject(cid, subj)
Takes(sid, serno, exp-grade)
13
Focus on the Bad Design






sid  name   serno   subj  cid  exp-grade
1    Sam    570103  AI    520  B
23   Nitin  550103  DB    550  A
45   Jill   505103  OS    505  A
1    Sam    505103  OS    505  C

Certain items (e.g., name) get repeated
Some information (e.g., a course) can be recorded only if a student
is enrolled, because sid is part of the key
14
Functional Dependencies
Describe Key-Like Relationships
A key is a set of attributes where:
If keys match, then the tuples match
A functional dependency (FD) is a generalization:
If an attribute set X determines another set Y, written X → Y,
then any two tuples that agree on attribute set X must
also agree on Y:

sid → name

What other FDs are there in this data?
FDs are independent of our schema design choice
15
Formal Definition of FDs
Def. Given a relation schema R and subsets X, Y of R:
An instance r of R satisfies FD X → Y if,
for any two tuples t1, t2 ∈ r,
t1[X] = t2[X] implies t1[Y] = t2[Y]
For an FD to hold for schema R, it must hold for
every possible instance r of R

(Can a DBMS verify this? Can we determine this by looking
at an instance?)
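
The definition can be checked mechanically against a single instance. The sketch below is illustrative only: the helper name satisfies_fd and the dictionary encoding of tuples are assumptions, and the data is the Stuff instance from the "bad design" slide. An instance can refute an FD, but it can never establish that the FD holds for the schema, because that requires every possible instance.

```python
from itertools import combinations


def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_on_rhs = all(t1[b] == t2[b] for b in rhs)
        if agree_on_lhs and not agree_on_rhs:
            return False
    return True


# The Stuff instance, one dict per tuple (hypothetical encoding).
stuff = [
    {"sid": 1, "name": "Sam", "serno": 570103, "subj": "AI", "cid": 520, "exp-grade": "B"},
    {"sid": 23, "name": "Nitin", "serno": 550103, "subj": "DB", "cid": 550, "exp-grade": "A"},
    {"sid": 45, "name": "Jill", "serno": 505103, "subj": "OS", "cid": 505, "exp-grade": "A"},
    {"sid": 1, "name": "Sam", "serno": 505103, "subj": "OS", "cid": 505, "exp-grade": "C"},
]

print(satisfies_fd(stuff, ["sid"], ["name"]))         # True: both sid=1 rows say Sam
print(satisfies_fd(stuff, ["serno"], ["exp-grade"]))  # False: 505103 has grades A and C
print(satisfies_fd(stuff, ["name"], ["sid"]))         # True here, but only for this instance
```

The last call shows why looking at one instance is not enough: name → sid happens to be satisfied by these four tuples, yet two students could share a name.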
16
General Thoughts on Good Schemas
We want all attributes in every tuple to be determined
by the tuple's key attributes, i.e. part of a superkey
(for key X → Y, a superkey is a non-minimal X)
What does this say about redundancy?
But:
What about tuples that don't have keys (other than the entire
value)?
What about the fact that every attribute determines itself?

17
Armstrong's Axioms: Inferring FDs
Some FDs exist due to others; can compute using
Armstrong's axioms:
Reflexivity: If Y ⊆ X then X → Y (trivial dependencies)
name, sid → name
Augmentation: If X → Y then XW → YW
serno → subj so serno, exp-grade → subj, exp-grade
Transitivity: If X → Y and Y → Z then X → Z
serno → cid and cid → subj
so serno → subj

18
Armstrong's Axioms Lead to...
Union: If X → Y and X → Z
then X → YZ
Pseudotransitivity: If X → Y and WY → Z
then XW → Z
Decomposition: If X → Y and Z ⊆ Y
then X → Z

Let's prove these from Armstrong's Axioms
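
One possible set of derivations is sketched below; each step uses only reflexivity, augmentation, or transitivity. This is one route among several, not necessarily the one given in lecture.

```latex
\begin{align*}
\textbf{Union.}\quad
  & X \to Y,\ X \to Z                && \text{given} \\
  & X \to XY                         && \text{augment } X \to Y \text{ with } X \\
  & XY \to YZ                        && \text{augment } X \to Z \text{ with } Y \\
  & X \to YZ                         && \text{transitivity} \\[6pt]
\textbf{Pseudotransitivity.}\quad
  & X \to Y,\ WY \to Z               && \text{given} \\
  & XW \to YW                        && \text{augment } X \to Y \text{ with } W \\
  & XW \to Z                         && \text{transitivity with } WY \to Z \\[6pt]
\textbf{Decomposition.}\quad
  & X \to Y,\ Z \subseteq Y          && \text{given} \\
  & Y \to Z                          && \text{reflexivity} \\
  & X \to Z                          && \text{transitivity}
\end{align*}
```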
19
Closure of a Set of FDs
Defn. Let F be a set of FDs.
Its closure, F+, is the set of all FDs:
{X → Y | X → Y is derivable from F by Armstrong's Axioms}
Which of the following are in the closure of our Student-Course
FDs?
name → name
cid → subj
serno → subj
cid, sid → subj
cid → sid
20
Attribute Closures: Is Something
Dependent on X?
Defn. The closure of an attribute set X, X+, is:
X+ = {Y | X → Y ∈ F+}
This answers the question "is Y determined
(transitively) by X?"; compute X+ by:

closure := X;
repeat until no change {
  if there is an FD U → V in F
     such that U ⊆ closure
  then add V to closure}

Does sid, serno → subj, exp-grade?
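
The fixpoint loop above translates almost line for line into code. The following is a minimal sketch: the function name attribute_closure and the encoding of FDs as (left set, right set) pairs are assumptions. The FDs sid → name, serno → cid, and cid → subj come from the earlier examples; sid, serno → exp-grade is an assumption added here, based on the key of Takes.

```python
def attribute_closure(attrs, fds):
    """Return X+ for attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:                       # repeat until no change
        changed = False
        for lhs, rhs in fds:
            # U -> V applies when U is a subset of the closure so far
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


fds = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),   # assumed FD, not stated on the slide
]

x_plus = attribute_closure({"sid", "serno"}, fds)
print(sorted(x_plus))
print({"subj", "exp-grade"} <= x_plus)                 # True: sid, serno -> subj, exp-grade
print({"sid"} <= attribute_closure({"cid"}, fds))      # False: cid -> sid is not implied
```

Testing whether X → Y is in F+ therefore never requires materializing F+; it is enough to check Y ⊆ X+.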
21
Equivalence of FD sets
Defn. Two sets of FDs, F and G, are equivalent if
their closures are equal, F+ = G+

e.g., these two sets are equivalent:
{XY → Z, X → Y} and
{X → Z, X → Y}

F+ contains a huge number of FDs
(exponential in the size of the schema)
Would like to have smallest representative FD
set
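
Equivalence can be tested without enumerating either closure: F and G are equivalent exactly when every FD of G follows from F and vice versa, and each such test is one attribute closure. The sketch below uses hypothetical helper names (attribute_closure, covers, equivalent) and the two FD sets from this slide.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(f, g):
    """True if every FD in g is implied by f."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f and g have the same closure."""
    return covers(f, g) and covers(g, f)


F = [({"X", "Y"}, {"Z"}), ({"X"}, {"Y"})]
G = [({"X"}, {"Z"}), ({"X"}, {"Y"})]
H = [({"X"}, {"Y"})]                     # drops the FD that yields Z

print(equivalent(F, G))   # True
print(equivalent(F, H))   # False: XY -> Z does not follow from H
```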
22
Minimal Cover
Defn. An FD set F is minimal if:
1. Every FD in F is of the form X → A,
where A is a single attribute
2. For no X → A in F is:
F - {X → A} equivalent to F
3. For no X → A in F and Z ⊂ X is:
(F - {X → A}) ∪ {Z → A} equivalent to F
(Slide callouts: in a sense, each FD is "essential" to the cover; we express each FD in simplest form.)
Defn. F is a minimal cover for G if F is minimal and is
equivalent to G.
e.g.,
{X → Z, X → Y} is a minimal cover for
{XY → Z, X → Z, X → Y}
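
The three conditions suggest a three-pass procedure: split right-hand sides, shrink left-hand sides, then drop implied FDs. The sketch below is one common way to do this, under the assumption that FDs are encoded as (left set, right set) pairs; the function names are hypothetical. A minimal cover is not unique in general, so the result can depend on the order in which attributes and FDs are tried.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def minimal_cover(fds):
    # Condition 1: one attribute on every right-hand side.
    cover = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Condition 3: drop left-hand-side attributes that are not needed.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= attribute_closure(smaller, cover):
                lhs = smaller
                cover[i] = (lhs, rhs)
    # Condition 2: drop FDs implied by the remaining ones.
    cover = list(dict.fromkeys(cover))          # remove exact duplicates, keep order
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= attribute_closure(fd[0], rest):
            cover = rest
    return cover


G = [({"X", "Y"}, {"Z"}), ({"X"}, {"Z"}), ({"X"}, {"Y"})]
for lhs, rhs in minimal_cover(G):
    print(sorted(lhs), "->", sorted(rhs))       # X -> Z, then X -> Y
```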
23
More on Closures
If F is a set of FDs and X → Y ∉ F+
then for some attribute A ∈ Y, X → A ∉ F+

Proof by contradiction.
Assume otherwise and let Y = {A1, ..., An}
Since we assume X → A1, ..., X → An are in F+
then X → A1 ... An is in F+ by the union rule,
hence X → Y is in F+, which is a contradiction


24
Why Armstrong's Axioms?
Why are Armstrong's axioms (or an equivalent rule
set) appropriate for FDs? They are:
Consistent: any relation satisfying FDs in F will satisfy
those in F+
Complete: if an FD X → Y cannot be derived by
Armstrong's axioms from F, then there exists some
relational instance satisfying F but not X → Y

In other words, Armstrong's axioms derive all the
FDs that should hold
25
Proving Consistency
We prove that the axioms' definitions must be true
for any instance, e.g.:
For augmentation (if X → Y then XW → YW):

If an instance r satisfies X → Y, then:
For any tuples t1, t2 ∈ r,
if t1[X] = t2[X] then t1[Y] = t2[Y] by defn.

If, additionally, it is given that t1[W] = t2[W],
then t1[YW] = t2[YW]
26
Proving Completeness
Suppose X → Y ∉ F+ and define a relational instance
r that satisfies F+ but not X → Y:
Then for some attribute A ∈ Y, X → A ∉ F+
Let some pair of tuples in r agree on X+ but disagree
everywhere else:

X              A        X+ - X         R - X+ - {A}
x1 x2 ... xn   a_{1,1}  v1 v2 ... vm   w_{1,1} w_{2,1} ...
x1 x2 ... xn   a_{1,2}  v1 v2 ... vm   w_{1,2} w_{2,2} ...
27
Proof of Completeness cont'd
Clearly this relation fails to satisfy X → A and X → Y.
We also have to check that it satisfies any FD in F+.
The tuples agree on only X+.
Thus the only FDs that might be violated are of the form
X' → Y' where X' ⊆ X+ and Y' contains attributes in
R - X+ - {A}.
But if X' → Y' ∈ F+ and X' ⊆ X+ then Y' ⊆ X+ (reflexivity
and augmentation).
Therefore X' → Y' is satisfied.
28
Decomposition
Consider our original "bad" attribute set
Stuff(sid, name, serno, subj, cid, exp-grade)

We could decompose it into
Student(sid, name)
Course(serno, cid)
Subject(cid, subj)

But this decomposition loses information about
the relationship between students and courses.
Why?
29
Lossless Join Decomposition
R1, ..., Rk is a lossless join decomposition of R w.r.t. an FD set F if
for every instance r of R that satisfies F,
π_R1(r) ⋈ ... ⋈ π_Rk(r) = r
Consider:

sid  name   serno   subj  cid  exp-grade
1    Sam    570103  AI    570  B
23   Nitin  550103  DB    550  A

What if we decompose on
(sid, name) and (serno, subj, cid, exp-grade)?
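
Running the decomposition on the two-tuple instance makes the loss visible. The sketch below is illustrative: project and natural_join are hypothetical helpers over lists of dictionaries, not a DBMS API. Because the two fragments share no attribute, the natural join degenerates into a cross product.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Join two lists of dict tuples on all attribute names they share."""
    out = []
    for l in left:
        for r in right:
            shared = set(l) & set(r)
            if all(l[a] == r[a] for a in shared):
                t = {**l, **r}
                if t not in out:
                    out.append(t)
    return out


r = [
    {"sid": 1, "name": "Sam", "serno": 570103, "subj": "AI", "cid": 570, "exp-grade": "B"},
    {"sid": 23, "name": "Nitin", "serno": 550103, "subj": "DB", "cid": 550, "exp-grade": "A"},
]

r1 = project(r, ["sid", "name"])
r2 = project(r, ["serno", "subj", "cid", "exp-grade"])
joined = natural_join(r1, r2)

print(len(r), len(joined))                      # 2 4
spurious = [t for t in joined if t not in r]
print(len(spurious))                            # 2: e.g. Sam paired with the DB course
```

The join contains every original tuple plus two that were never in r, so the original instance cannot be recovered.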
30
Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F
iff at least one of the following dependencies is in F+
(R1 ∩ R2) → R1 - R2
(R1 ∩ R2) → R2 - R1
So for the FD set:
sid → name
serno → cid, exp-grade
cid → subj

Is (sid, name) and (serno, subj, cid, exp-grade) a lossless
decomposition?
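
The test reduces to one attribute closure: compute (R1 ∩ R2)+ and see whether it contains R1 - R2 or R2 - R1. The sketch below assumes FDs encoded as (left set, right set) pairs and uses hypothetical function names. The second call, which keeps sid in both fragments, is an added illustration and not a decomposition proposed on the slide.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2)+ must contain R1 - R2 or R2 - R1."""
    common_closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


fds = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

# The slide's decomposition: no shared attributes, so nothing is determined.
print(is_lossless_binary({"sid", "name"},
                         {"serno", "subj", "cid", "exp-grade"}, fds))          # False

# Hypothetical alternative that keeps sid in both fragments.
print(is_lossless_binary({"sid", "name"},
                         {"sid", "serno", "subj", "cid", "exp-grade"}, fds))   # True
```

The first answer does not depend on the FD set at all: with an empty intersection, the closure of the shared attributes is empty.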
31
Dependency Preservation
Ensures we can easily check whether an FD X → Y
is violated during an update to a database:

The projection of an FD set F onto a set of attributes Z,
F_Z, is
{X → Y | X → Y ∈ F+, X ∪ Y ⊆ Z}
i.e., it is those FDs local to Z's attributes
A decomposition R1, ..., Rk is dependency preserving if
F+ = (F_R1 ∪ ... ∪ F_Rk)+
The decomposition hasn't lost any essential FDs, so
we can check without doing a join
32
Example of Lossless and
Dependency-Preserving Decompositions
Given relation scheme
R(name, street, city, st, zip, item, price)
And FD set:
name → street, city
street, city → st
street, city → zip
name, item → price
Consider the decomposition
R1(name, street, city, st, zip) and R2(name, item, price)
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD by name, street → city?
33
Another Example
Given scheme: R(sid, fid, subj)
and FD set:
fid → subj
sid, subj → fid
Consider the decomposition
R1(sid, fid) and R2(fid, subj)

Is it lossless?
Is it dependency preserving?
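
Both questions can be answered mechanically for this example. The sketch below uses hypothetical function names and FDs encoded as (left set, right set) pairs. The preservation test avoids computing the projections F_R1 and F_R2 explicitly: for each FD X → Y it grows X using only closures taken inside one fragment at a time, then checks whether Y was reached. This is a standard technique, not something taken from the slides.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    common_closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


def is_preserved(fd, schemas, fds):
    """True if fd follows from the FDs that are local to the individual schemas."""
    lhs, rhs = fd
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            # Attributes derivable from what we have so far, staying inside ri.
            gained = attribute_closure(reached & ri, fds) & ri
            if not gained <= reached:
                reached |= gained
                changed = True
    return rhs <= reached


fds = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
r1, r2 = {"sid", "fid"}, {"fid", "subj"}

print(is_lossless_binary(r1, r2, fds))                          # True: fid -> subj
lost = [fd for fd in fds if not is_preserved(fd, [r1, r2], fds)]
print([(sorted(l), sorted(r)) for l, r in lost])                # sid, subj -> fid is lost
```

So the decomposition is lossless but not dependency preserving: checking sid, subj → fid after an update would require joining R1 and R2.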
34
FDs and Keys
Ideally, we want a design s.t. for each nontrivial
dependency X → Y, X is a superkey for some
relation schema in R
We just saw that this isn't always possible
Hence we have two kinds of normal forms
35
Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation
scheme R and for every X → A that holds over R,
either A ∈ X (it is trivial), or
X is a superkey for R
Third Normal Form (3NF). For every relation scheme
R and for every X → A that holds over R,
either A ∈ X (it is trivial), or
X is a superkey for R, or
A is a member of some key for R
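
The two definitions differ only in the last escape clause, which the sketch below makes explicit. It is a brute-force illustration with hypothetical function names: candidate keys are found by trying attribute subsets in order of size, and only the FDs listed in F are checked, which is enough for a schema that has not been decomposed. The example is R(sid, fid, subj) with fid → subj and sid, subj → fid.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if attribute_closure(s, fds) >= attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def normal_form_violations(attrs, fds, form):
    """List (lhs, attribute) pairs that violate form, either "BCNF" or "3NF"."""
    prime = set().union(*candidate_keys(attrs, fds))   # members of some key
    bad = []
    for lhs, rhs in fds:
        if attribute_closure(lhs, fds) >= attrs:       # X is a superkey
            continue
        for a in sorted(rhs - lhs):                    # skip the trivial part
            if form == "3NF" and a in prime:           # extra escape clause of 3NF
                continue
            bad.append((sorted(lhs), a))
    return bad


R = {"sid", "fid", "subj"}
fds = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]

print(candidate_keys(R, fds))                    # {fid, sid} and {sid, subj}
print(normal_form_violations(R, fds, "BCNF"))    # [(['fid'], 'subj')]
print(normal_form_violations(R, fds, "3NF"))     # []
```

Here fid is not a superkey, so fid → subj breaks BCNF, but subj belongs to the key {sid, subj}, so the schema is still in 3NF.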
36
Normal Forms Compared
BCNF is preferable, but sometimes in conflict with
the goal of dependency preservation
It's strictly stronger than 3NF

Let's see algorithms to obtain:
A BCNF lossless join decomposition
A 3NF lossless join, dependency preserving
decomposition
37
BCNF Decomposition Algorithm
(from Korth et al.; our book gives recursive version)
result := {R}
compute F+
while there is a schema Ri in result that is not in BCNF
{
  let A → B be a nontrivial FD on Ri
    s.t. A → Ri is not in F+
    and A and B are disjoint

  result := (result - {Ri}) ∪ {(Ri - B), (A,B)}
}
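
A small runnable sketch of the loop above follows. Several choices in it are assumptions rather than part of the algorithm as stated: it never materializes F+ but searches attribute subsets of each Ri (exponential, fine only for small schemas); it always takes B to be everything A determines inside Ri; and it picks the first violation in sorted order. The FD sid, serno → exp-grade is also an assumption. Different choices can produce different decompositions.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def find_violation(ri, fds):
    """Return (A, B) with A -> B nontrivial on ri and A not a superkey of ri, or None."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            a = set(combo)
            b = (attribute_closure(a, fds) & ri) - a   # what A determines within ri
            if b and not (a | b) >= ri:                # nontrivial, and A -> ri fails
                return a, b
    return None


def bcnf_decompose(r, fds):
    result = [set(r)]
    while True:
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                a, b = violation
                result.remove(ri)
                result += [ri - b, a | b]   # (result - {Ri}) ∪ {(Ri - B), (A,B)}
                break
        else:
            return result                   # every schema is in BCNF


stuff = {"sid", "name", "serno", "subj", "cid", "exp-grade"}
fds = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),      # assumed FD, not stated on the slide
]

for schema in bcnf_decompose(stuff, fds):
    print(sorted(schema))
```

With these inputs the result is the four-table design shown earlier: Student(sid, name), Course(serno, cid), Subject(cid, subj), and Takes(sid, serno, exp-grade).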
38
3NF Decomposition Algorithm
by Phil Bernstein, now @ MS Research
Let F be a minimal cover
i := 0
for each FD A → B in F {                 (build dep.-preserving decomp.)
  if none of the schemas Rj, 1 ≤ j ≤ i, contains AB
  {
    increment i
    Ri := (A, B)
  }
}
if no schema Rj, 1 ≤ j ≤ i, contains a candidate key for R {   (ensure lossless decomp.)
  increment i
  Ri := any candidate key for R
}
return (R1, ..., Ri)
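
The synthesis procedure above can be sketched directly. The version below assumes its input is already a minimal cover, encodes FDs as (left set, right set) pairs, and uses hypothetical function names. "Contains a candidate key" is tested as "is a superkey", which is equivalent. The FD sid, serno → exp-grade is an assumption carried over for illustration.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def one_candidate_key(attrs, fds):
    """Shrink the full attribute set until no attribute can be dropped."""
    key = set(attrs)
    for a in sorted(attrs):
        if attribute_closure(key - {a}, fds) >= attrs:
            key -= {a}
    return key


def synthesize_3nf(attrs, minimal_fds):
    schemas = []
    # One schema per FD, unless an existing schema already contains AB.
    for lhs, rhs in minimal_fds:
        ab = set(lhs) | set(rhs)
        if not any(ab <= s for s in schemas):
            schemas.append(ab)
    # Add a key schema if no schema is a superkey (keeps the join lossless).
    if not any(attribute_closure(s, minimal_fds) >= attrs for s in schemas):
        schemas.append(one_candidate_key(attrs, minimal_fds))
    return schemas


stuff = {"sid", "name", "serno", "subj", "cid", "exp-grade"}
fds = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid"}),
    ({"cid"}, {"subj"}),
    ({"sid", "serno"}, {"exp-grade"}),      # assumed FD, not stated on the slides
]
print([sorted(s) for s in synthesize_3nf(stuff, fds)])

# Without the last FD no schema contains a key, so (serno, sid) is added.
no_grade = stuff - {"exp-grade"}
print([sorted(s) for s in synthesize_3nf(no_grade, fds[:3])])
```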
39
Summary
We can always decompose into 3NF and get:
Lossless join
Dependency preservation
But with BCNF we are only guaranteed lossless joins
BCNF is stronger than 3NF: every BCNF schema is
also in 3NF
The BCNF algorithm is nondeterministic, so there is
not a unique decomposition for a given schema R
