Ch15 FDs and Normalization

Chapter 15
Basics of Functional Dependencies and Normalization
Chapter 15 Outline
• Problems in Bad DB Design
• Functional Dependencies
• Normal Forms Based on Primary Keys
• General Definitions of Second and Third Normal Forms
• Boyce-Codd Normal Form
Problems in Bad DB Design
Student#  Studentname  Course#  CourseName
100       Ali          CS100    C++
100       Ali          CS101    Java
200       Ahmad        CS200    OS
• Redundant data
• More space
• Slow system
• Complexities of update
• Update anomalies:
   Insertion, Deletion, and Update anomalies
• Attributes in which most values are Null
• Ambiguous meaning of Null:
   - Exists but is unknown at present (e.g. Address)
   - Not applicable (e.g. student average)
   - Applicable but not assigned yet (e.g. student mark)
Update Anomalies
Insert anomalies
You cannot insert a new course unless at least one student is enrolled in it.
Update anomalies
To update a Studentname, you have to update many rows.
Delete anomalies
If a course has only one student, deleting that student also deletes the course.
Student#  Studentname  Course#  CourseName
100       Ali          CS100    C++
100       Ali          CS101    Java
200       Ahmad        CS200    OS
Functional Dependency
A functional dependency (FD) is a constraint
between two sets of attributes in a relation schema
• If X and Y are two sets of attributes in the same
relation schema R, then X → Y means that X
functionally determines Y.
• FD is a property of the meaning or semantics of
the attributes
• The FD specifies a restriction on the possible
tuples that can form a relation instance r of R
The FD Constraint - Informally
Functional dependency X → Y holds if and only if whenever two tuples agree on their X-value, they must necessarily agree on their Y-value.
Example: A relation DEPARTMENT (DNO, DNAME, DLOC) can have the following FDs:
FD1: DNO → DLOC
FD2: DNO → DNAME
The FD Constraint - Formally
The FD constraint is that for any two tuples t1 and
t2 in the relation instance r(R) that have:
t1[X] = t2[X]
we must also have t1[Y] = t2[Y]
This means that the values of the Y component of a tuple depend on, or
are determined by, the X component
• The values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component
If X → Y holds, then Y is functionally dependent on X
- X is termed the left-hand-side (LHS) of the FD or determinant
- Y is termed the right-hand-side (RHS) of the FD
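The formal condition can be checked mechanically against a given relation instance. The sketch below is only an illustration: the helper name fd_holds is hypothetical, and the rows are the student table from the start of the chapter. Keep in mind that an instance can only refute an FD. Because an FD is a property of the meaning of the attributes, passing this check on one instance does not prove that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True

students = [
    {"Student#": 100, "Studentname": "Ali", "Course#": "CS100", "CourseName": "C++"},
    {"Student#": 100, "Studentname": "Ali", "Course#": "CS101", "CourseName": "Java"},
    {"Student#": 200, "Studentname": "Ahmad", "Course#": "CS200", "CourseName": "OS"},
]

print(fd_holds(students, ["Student#"], ["Studentname"]))  # True
print(fd_holds(students, ["Student#"], ["Course#"]))      # False
```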
Inference Rules
An inference rule in logic is a procedure that combines known facts to produce ("infer") new facts
Example: If A is true, and A implies B,
then B is true
There are 6 inference rules: IR1 - IR6
IR1-IR3 are referred to as Armstrong’s
Inference Rules
IR1: Reflexive Rule
If Y ⊆ X then X → Y
A set of attributes always determines itself
or any of its subsets
Example:
Since {ESSN} ⊆ {ESSN, Dependent_Name},
{ESSN, Dependent_Name} → ESSN holds
IR2: Augmentation Rule
If X → Y then XZ → YZ
Adding the same set of attributes to both
the LHS and RHS of an FD results in another
valid FD
Example:
If SSN → Ename then
{SSN, Address} → {Ename, Address}
IR3: Transitive Rule
If X → Y and Y → Z then X → Z
FDs are transitive
Example:
If SSN → Dno and Dno → Dlocation
then SSN → Dlocation
Armstrong's Inference Rules
The rules IR1-IR3 are sound and complete.
• Sound: any FD derived with these rules is logically correct (it holds whenever the given FDs hold)
• Complete: every FD that logically follows from the given FDs can be derived with these rules
IR4: Decomposition Rule
If X → YZ then X → Y and X → Z
We can remove attributes from the RHS of
a dependency, and decompose the FD
Example:
If SSN → {Ename, Dno} then
SSN → Ename and SSN → Dno
IR5: Additive (Union) Rule
If X → Y and X → Z then X → YZ
We can union attributes from the RHS of
dependencies, and combine a set of FDs
into a single FD (reverse of IR4)
Example:
If SSN → Ename and SSN → Dno then
SSN → {Ename, Dno}
IR6: Pseudotransitive Rule
If X → Y and WY → Z then WX → Z
Represents a variant of IR3
Example:
If SSN → MgrSSN and
{MgrSSN, Dependent_Name} → Relationship then
{SSN, Dependent_Name} → Relationship

Closure of a Set of FD’s (F+)
Definition: Given a set F of functional dependencies
on R, the closure of F, denoted by F+, is the set of all
functional dependencies that can be inferred from F via the
inference rules given previously.
To compute F+:
Let F+ = F
Apply the inference rules repeatedly until no more
changes occur in F+
Example (1)
Let R(A, B, C, D) be a relation schema and
F = {A → B, A → C, BC → D} be a set of FDs that hold on R.
Find F+
Members of F+ derived from F include:
A → B and A → C, then A → BC (IR5)
A → BC and BC → D, then A → D (IR3)
A → B and A → D, then A → BD (IR5)
A → C and A → D, then A → CD (IR5)
A → B and A → C and A → D, then A → BCD (IR5)
Example (2)
Given R(A, B, C, G, H, I) and
F = {A → B, A → C, CG → H, CG → I, B → H}.
We list some members of F+ below
A → B and B → H, therefore A → H (using IR3)
CG → H and CG → I, therefore CG → HI (using IR5)
A → C, then AG → CG (using IR2, by adding G)
AG → CG and CG → I, therefore AG → I (using IR3)
(or by using IR6: A → C and CG → I, therefore AG → I)
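Listing F+ by hand grows quickly. A common shortcut, sketched here with hypothetical helper names (closure, implies), tests whether a single FD X → Y belongs to F+ by checking whether Y is contained in the closure of X under F. The attribute closure itself is defined in the next section. Attributes are single letters, so a string such as "CG" stands for the set {C, G}.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. rhs is inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# F from Example (2)
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(implies(F, "A", "H"))    # True
print(implies(F, "CG", "HI"))  # True
print(implies(F, "AG", "I"))   # True
print(implies(F, "B", "A"))    # False
```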
Closure of Attribute Set
Given a relation schema R and a set of FDs
that hold on R, let α be a set of attributes
in R. Then
α+ = α plus all attributes that are determined,
directly or indirectly, by α
Example (1)
Given R(A, B, C) with functional dependencies
F = {A → B, B → C}. Calculate A+
Initially, A+ = {A}.
Then use the given FDs:
From A → B we get A+ = {A, B}.
From B → C we get A+ = {A, B, C}.
Therefore,
A+ = {A, B, C}, which is all attributes of R,
so A is a candidate key.
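The steps of this example can be sketched as a small fixed-point loop. The function name attribute_closure and the (lhs, rhs) pair representation are choices made for this illustration, not notation from the chapter. The loop keeps adding right-hand sides whose left-hand side is already covered, and stops when a full pass adds nothing.

```python
def attribute_closure(attrs, fds):
    """Compute attrs+ under fds, printing each step that adds attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs only if lhs is covered and rhs adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
                print(f"from {lhs} -> {rhs}: {sorted(result)}")
    return result

F = [("A", "B"), ("B", "C")]
a_plus = attribute_closure("A", F)
# from A -> B: ['A', 'B']
# from B -> C: ['A', 'B', 'C']
print(a_plus == set("ABC"))  # True
```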
Example (2)
Given R ( A, B, C, D, E, F ) with a set of FDs
F = {A → BC, E → CF, B → E, CD → EF}
Find the candidate key for R.
A+ = {A, B, C, E, F} (by using the algorithm)
B+ = {B, C, E, F}
……
AB+ = {A, B, C, E, F}
AD+ = {A, B, C, D, E, F}, which is all attributes of R, so AD is a candidate key
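The search for a candidate key can be sketched as a brute-force loop over attribute subsets, smallest first. This is an illustration under stated assumptions (the names closure and candidate_keys are hypothetical), and it is only practical for small schemas because the number of subsets grows exponentially.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == set(attributes):
                keys.append(subset)
    return keys

R = "ABCDEF"
F = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
print([sorted(key) for key in candidate_keys(R, F)])  # [['A', 'D']]
```

Neither A nor D appears on the right-hand side of any FD, so both must be part of every key, which is why AD is the only result.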
Normalization
Normalization is a method for organizing data elements in a
database into tables to minimize duplication
Why Normalization?
Reduce Redundant data
Remove Inconsistent data
Reduce anomalies
Increase data integrity
Simplify data maintenance
Take less disk space
Goal of Normalization
In each table all non-key attributes should be
dependent on the primary key
Normalization
Normal forms (in order of increasing strength):
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
First Normal Form (1NF)
A relation schema is in 1NF if:
the domains of its attributes include only atomic (simple,
indivisible) values,
and the value of each attribute in a tuple is a single value from the
domain of that attribute
Example of an un-normalized relation
Let R(SSN, Name(F-name, L-name), {telephone})
Note: R has a composite attribute (Name) and a
multivalued attribute (telephone). So R is not in 1NF (i.e.,
it is an un-normalized relation)
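One way to bring such a relation into 1NF is sketched below. The sample rows and the function name to_1nf are invented for illustration only. The composite attribute Name is replaced by its simple components, and the multivalued attribute telephone is moved into a separate relation with one value per tuple.

```python
# Hypothetical un-normalized rows for R(SSN, Name(F-name, L-name), {telephone})
people = [
    {"SSN": "111", "Name": {"F-name": "Ali", "L-name": "Hasan"},
     "telephone": ["555-0100", "555-0101"]},
    {"SSN": "222", "Name": {"F-name": "Ahmad", "L-name": "Saleh"},
     "telephone": ["555-0200"]},
]

def to_1nf(rows):
    """Split un-normalized rows into two relations holding only atomic values."""
    person, phone = [], []
    for row in rows:
        # Composite attribute Name becomes two simple attributes
        person.append((row["SSN"], row["Name"]["F-name"], row["Name"]["L-name"]))
        # Multivalued attribute telephone gets its own relation, one value per tuple
        for number in row["telephone"]:
            phone.append((row["SSN"], number))
    return person, phone

person, phone = to_1nf(people)
print(person)  # PERSON(SSN, F-name, L-name)
print(phone)   # PHONE(SSN, telephone)
```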
Boyce-Codd Normal Form (BCNF)
Rule: Given a relation schema R and a set F of
FDs of the form α → β that hold on R,
R is in BCNF if for every FD in F at least one
of the following conditions is satisfied:
1) β ⊆ α, or
2) α is a superkey (S.K)
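The two conditions translate directly into a check over the FDs in F. The sketch below uses hypothetical helper names (closure, bcnf_violations) and borrows the schema R(A, B, C, D, E) with F = {A → B, AC → DE, D → C} from a later example in this chapter. It only inspects the FDs listed in F, which is enough for testing R itself but not, in general, for testing a relation produced by decomposition.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the FDs in fds that satisfy neither BCNF condition."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)              # condition 1: beta is a subset of alpha
        superkey = closure(lhs, fds) == all_attrs   # condition 2: alpha is a superkey
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations

R = "ABCDE"
F = [("A", "B"), ("AC", "DE"), ("D", "C")]
print(bcnf_violations(R, F))  # [('A', 'B'), ('D', 'C')]
```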
BCNF Example
Lending(Branch-name,Branch-city,Branch-assets,Loan-no,Amount,Customer)
FD1: Branch-name → {Branch-city, Branch-assets}
FD2: Loan-no → {Amount, Branch-name}
FD1: α is not a superkey and β ⊈ α
Then Lending must be decomposed into:
R1, which includes α and β
R2, which includes R − β
R1(Branch-name, Branch-city, Branch-assets)
R2(Branch-name, Loan-no, Amount, Customer)

BCNF Example Cont.
Repeat the procedure for R1 and R2:
R1 has only one FD, and its α is a superkey. So, R1 is in BCNF
R2 has one FD that does not satisfy the conditions. So
decompose R2 into R21 and R22:
R21(Loan-no, Amount, Branch-name), which satisfies the superkey condition

BCNF Example Cont.
R22(Loan-no, Customer) (R2 − β)
Rule: Any attribute that is not determined by any FD must
be part of a key.
Lending is decomposed as follows:
Lending (R)
  R1
  R2
    R21
    R22
Only R1, R21, and R22 will be in the DB
3NF
Rule: Given a relation schema R and a set F of FDs of the
form α → β that hold on R, R is in 3NF if for every
FD in F at least one of the following conditions is satisfied:
1) β ⊆ α, or
2) α is a superkey, or
3) each attribute in β is prime
Prime attribute: an attribute that is a member of some
candidate key
Nonprime attribute: an attribute that is not a member of
any candidate key
3NF Example
R(Branch-name, Customer-name, Banker-name, Office-no)
FD2: Banker-name → {Branch-name, Office-no}
α is not a superkey
β ⊈ α
β1 (Branch-name) is prime but β2 (Office-no) is not
Then, R is not in 3NF
R must be decomposed into:
R1, which includes α and all nonprime attributes of β
R2, which includes R − all nonprime attributes of β
R1(Banker-name,Office-no)
R2(Branch-name, Customer-name, Banker-name)

3NF Example
R1: α is a superkey. So, R1 is in 3NF
R2: β is a prime attribute. So, R2 is in 3NF
2NF
Rule: Given a relation schema R and a set F of FDs of the
form α → β that hold on R, R is in 2NF if for every
FD in F at least one of the following conditions is satisfied:
1) β ⊆ α, or
2) α is a superkey, or
3) each attribute in β is prime, or
4) α is not a proper subset of a key
2NF Example
FD1: α is a superkey
FD2: α is not a proper subset of a key
So, R is in 2NF
Example
R(A, B, C, D, E, F)
[Figure: arrows over the attributes of R marking a full dependency, a partial dependency, and a transitive dependency]
Normalization Steps
If a relation has repeating groups or multivalued attributes,
remove the repeating groups and split each
multivalued attribute into a new relation to reach 1NF
Remove partial dependencies to reach 2NF
Remove transitive dependencies to reach 3NF
When a relation schema satisfies 3NF:
Partial dependencies are removed
Transitive dependencies are removed
All attributes are dependent on the primary key (P.K)
Tables are small and well-formed
Example (when R should not be put in BCNF)
Let R(A, B, C, D, E) be a relation schema and F = {A → B,
AC → DE, D → C} be a set of functional
dependencies that hold on R. Check whether R is in BCNF.
Solution: R(A, B, C, D, E)
FD1: A → B
FD2: AC → DE
FD3: D → C
FD1: α is not a superkey. So, decompose R into R1 and R2
R1(A, B)
R2(A, C, D, E)
Example
R2(A, C, D, E)
FD1 (AC → DE): α is a superkey
FD2 (D → C): α is not a superkey. But if we decompose R2 according to FD2, we lose a dependency:
R21(D, C) R22(A, D, E)
FD1 (AC → DE) is lost
So, we return to the previous
normal form, which is 3NF.
Then, R1 is in BCNF and
R2 is in 3NF because β (C) is prime
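The conclusion for R2 can be checked with a small sketch of the 3NF test. All function names here are hypothetical. The test first finds the candidate keys of R2(A, C, D, E) under AC → DE and D → C, collects the prime attributes, and then applies the three 3NF conditions to each FD. Condition 3 is applied to the attributes of β that are not already in α.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            subset = set(combo)
            if not any(k <= subset for k in keys) and closure(subset, fds) == set(attributes):
                keys.append(subset)
    return keys

def violations_3nf(attributes, fds):
    """Return the FDs in fds that satisfy none of the three 3NF conditions."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # empty when the FD is trivial
        if closure(lhs, fds) == set(attributes):
            continue  # alpha is a superkey
        if extra <= prime:
            continue  # trivial FD, or every remaining attribute of beta is prime
        bad.append((lhs, rhs))
    return bad

R2 = "ACDE"
F2 = [("AC", "DE"), ("D", "C")]
print([sorted(k) for k in candidate_keys(R2, F2)])  # [['A', 'C'], ['A', 'D']]
print(violations_3nf(R2, F2))                        # []
```

Both AC and AD are candidate keys of R2, so C is prime and D → C is allowed by 3NF even though D is not a superkey.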
Example
R(SSN, Pno, Hours, Ename, Pname, Plocation)
1NF: R is in 1NF because there is no repeating group (composite attribute) and no
multivalued attribute.
2NF:
FD1: α (SSN, Pno) is a superkey
FD2: α (SSN) is not a superkey
β (Ename) is not a prime attribute
α (SSN) is part of a key. So, R is not in 2NF. Then decompose R into:
R1 = (α, β) = (SSN, Ename)
R2 = (R − β) = (SSN, Pno, Hours, Pname, Plocation)

Example
R1 = (α, β) = (SSN, Ename). R1 is in 2NF
R2 = (R − β) = (SSN, Pno, Hours, Pname, Plocation)
R2:
FD1: α (SSN, Pno) is a superkey
FD2: α (Pno) is not a superkey
β (Pname, Plocation) is not a prime attribute
α (Pno) is part of a key. So, R2 is not in 2NF. Then decompose R2 into:
R21 = (R2 − β) = (SSN, Pno, Hours)
R22 = (α, β) = (Pno, Pname, Plocation)
R21 and R22 are in 2NF, and also in 3NF and BCNF
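The partial dependencies removed in this example can be found mechanically. The sketch below makes two assumptions that are not spelled out in the slides: the only candidate key of R is {SSN, Pno}, and the right-hand side of FD1 is Hours. The helper names closure and partial_dependencies are hypothetical. For every proper subset of the key, the sketch reports the nonprime attributes that the subset determines.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Map each proper subset of key to the nonprime attributes it determines."""
    nonprime = set(attributes) - set(key)  # valid only if key is the sole candidate key
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependent = closure(part, fds) & nonprime
            if dependent:
                found[part] = sorted(dependent)
    return found

R = ["SSN", "Pno", "Hours", "Ename", "Pname", "Plocation"]
F = [
    (["SSN", "Pno"], ["Hours"]),        # FD1, right-hand side assumed
    (["SSN"], ["Ename"]),               # FD2 of the first step
    (["Pno"], ["Pname", "Plocation"]),  # FD2 of the second step
]
print(partial_dependencies(["SSN", "Pno"], R, F))
# {('Pno',): ['Plocation', 'Pname'], ('SSN',): ['Ename']}
```

Running the same check on R21(SSN, Pno, Hours) with only FD1 returns an empty result, which matches the claim that R21 is in 2NF.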
