Ch15 FDs and Normalization

Chapter 15

Basics of Functional Dependencies and Normalization
Chapter 15 Outline
• Problems in Bad DB Design
• Functional Dependencies
• Normal Forms Based on Primary Keys
• General Definitions of Second and Third Normal Forms
• Boyce-Codd Normal Form
Problems in Bad DB Design
Student#  Studentname  Course#  CourseName
100       Ali          CS100    C++
100       Ali          CS101    Java
200       Ahmad        CS200    OS

• Redundant data
  - More space → slow system
  - Complexities of update
• Update anomalies:
  - Insertion, deletion, and update anomalies
• Attributes in which most values are null
• Ambiguous meaning of null
  - Exists but unknown at present (e.g. Address)
  - Not applicable (e.g. student average)
  - Applicable but not assigned yet (e.g. student mark)
Update Anomalies

Insert anomalies
You cannot insert a new course unless at least one student is enrolled in it.
Update anomalies
To update a Studentname, you have to update every row in which that student appears.
Delete anomalies
If a course has only one student, deleting that student also deletes the course.

Student#  Studentname  Course#  CourseName
100       Ali          CS100    C++
100       Ali          CS101    Java
200       Ahmad        CS200    OS
Functional Dependency
A functional dependency (FD) is a constraint
between two sets of attributes in a relation schema
• If X and Y are two sets of attributes in the same
relation schema R, then X → Y means that X
functionally determines Y.
• FD is a property of the meaning or semantics of
the attributes
• The FD specifies a restriction on the possible
tuples that can form a relation instance r of R
The FD Constraint - Informally
Functional dependency X → Y holds if and only if whenever two tuples agree on their X-value, they must necessarily agree on their Y-value.
Example: A relation DEPARTMENT (DNO, DNAME, DLOC) can have the following FDs:
FD1: DNO → DLOC
FD2: DNO → DNAME
The FD Constraint - Formally
The FD constraint is that for any two tuples t1 and
t2 in the relation instance r(R) that have:
t1[X] = t2[X]
we must also have t1[Y] = t2[Y]

This means that the values of the Y component of a tuple depend on, or
are determined by, the X component
• The values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component
If X → Y holds, then Y is functionally dependent on X
- X is termed the left-hand-side (LHS) of the FD or determinant
- Y is termed the right-hand-side (RHS) of the FD
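The formal constraint can be tested against a concrete relation instance. The sketch below is only an illustration: the function name `fd_holds` and the DEPARTMENT rows are invented here, and a passing check only shows that this particular instance does not violate the FD (the FD itself is a property of the schema's meaning).

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs while differing on rhs."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# Hypothetical DEPARTMENT instance (values invented for illustration).
department = [
    {"DNO": 1, "DNAME": "Research", "DLOC": "Amman"},
    {"DNO": 2, "DNAME": "Sales", "DLOC": "Amman"},
    {"DNO": 3, "DNAME": "Sales", "DLOC": "Irbid"},
]

print(fd_holds(department, ["DNO"], ["DNAME"]))   # True
print(fd_holds(department, ["DNAME"], ["DLOC"]))  # False
```

The second call fails because two tuples share DNAME = "Sales" but have different DLOC values.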
Inference Rules

An inference rule in logic is a procedure that combines known facts to produce ("infer") new facts.
Example: If A is true, and A implies B, then B is true.
There are 6 inference rules: IR1 - IR6.
IR1-IR3 are referred to as Armstrong's Inference Rules.
IR1: Reflexive Rule
If Y ⊆ X then X → Y
A set of attributes always determines itself
or any of its subsets

Example:
Since {ESSN} ⊆ {ESSN, Dependent_Name},
{ESSN, Dependent_Name} → ESSN holds
IR2: Augmentation Rule

If X → Y then XZ → YZ
Adding the same set of attributes to both
the LHS and RHS of an FD results in another
valid FD
Example:
If SSN → Ename then
{SSN, Address} → {Ename, Address}
IR3: Transitive Rule

If X → Y and Y → Z then X → Z
FDs are transitive
Example:
If SSN → Dno and Dno → Dlocation
then SSN → Dlocation
Armstrong's Inference Rules

The rules IR1-IR3 are sound and complete.
• Sound: every FD derived from F with these rules holds in every relation instance that satisfies F
• Complete: every FD that logically follows from F can be derived with these rules
IR4: Decomposition Rule

If X → YZ then X → Y and X → Z
We can remove attributes from the RHS of
a dependency, and decompose the FD
Example:
If SSN → {Ename, Dno} then
SSN → Ename and SSN → Dno
IR5: Additive (Union) Rule

If X → Y and X → Z then X → YZ
We can union attributes from the RHS of a
dependency, and combine a set of FDs
into a single FD (reverse of IR4)
Example:
If SSN → Ename and SSN → Dno then
SSN → {Ename, Dno}
IR6: Pseudotransitive Rule
If X → Y and WY → Z then WX → Z
Represents a variant of IR3
Example:
If SSN → MgrSSN and
{MgrSSN, Dependent_Name} → Relationship then
{SSN, Dependent_Name} → Relationship
Closure of a Set of FDs (F+)
Definition: Given a set F of functional dependencies on R, the closure of F, denoted by F+, is the set of all functional dependencies that can be inferred from F via the inference rules given previously.
To compute F+:
Let F+ = F
Apply the inference rules repeatedly until no more changes occur in F+
Example (1)
Let R(A, B, C, D) be a relation schema and
F = {A → B, A → C, BC → D} be a set of FDs that hold on R.
Find F+ (some of its members are derived below)
A → B and A → C, then A → BC (IR5)
A → BC and BC → D, then A → D (IR3)
A → B and A → D, then A → BD (IR5)
A → C and A → D, then A → CD (IR5)
A → B, A → C, and A → D, then A → BCD (IR5)
Example (2)
Given R(A, B, C, G, H, I) and
F = {A → B, A → C, CG → H, CG → I, B → H}.
We list some members of F+ below
A → B and B → H, therefore A → H (using IR3)
CG → H and CG → I, therefore CG → HI (using IR5)
A → C, then AG → CG (using IR2, by adding G)
AG → CG and CG → I, therefore AG → I (using IR3)
(or, by using IR6: A → C and CG → I, therefore AG → I)
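The derivations in Example (2) can be replayed mechanically. The following is a minimal sketch, assuming single-letter attribute names so that a string such as "CG" stands for the set {C, G}; the helper names `fd`, `augment`, `transitive`, and `union` are made up for this illustration.

```python
def fd(lhs, rhs):
    """Build an FD as a pair of attribute sets; "CG" means {C, G}."""
    return (frozenset(lhs), frozenset(rhs))


def augment(f, extra):
    """IR2: from X -> Y infer XZ -> YZ."""
    x, y = f
    z = frozenset(extra)
    return (x | z, y | z)


def transitive(f, g):
    """IR3: from X -> Y and Y -> Z infer X -> Z."""
    (x, y), (y2, z) = f, g
    if y != y2:
        raise ValueError("RHS of the first FD must equal LHS of the second")
    return (x, z)


def union(f, g):
    """IR5: from X -> Y and X -> Z infer X -> YZ."""
    (x, y), (x2, z) = f, g
    if x != x2:
        raise ValueError("both FDs must have the same LHS")
    return (x, y | z)


# The four derivations listed in Example (2).
a_h = transitive(fd("A", "B"), fd("B", "H"))   # A -> H
cg_hi = union(fd("CG", "H"), fd("CG", "I"))    # CG -> HI
ag_cg = augment(fd("A", "C"), "G")             # AG -> CG
ag_i = transitive(ag_cg, fd("CG", "I"))        # AG -> I

print(a_h == fd("A", "H"), cg_hi == fd("CG", "HI"), ag_i == fd("AG", "I"))
```

Each helper refuses to combine FDs whose sides do not line up, which mirrors the precondition of the corresponding rule.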
Closure of Attribute Set
Given a relation schema R and a set F of FDs that hold on R, let α be a set of attributes in R. Then
α+ = α plus all attributes that are determined, directly or indirectly, by α under F
Example (1)
Given R(A, B, C) with functional dependencies
F = {A → B, B → C}. Calculate A+
Initially, A+ = {A}.
Then use the given FDs:
From A → B we get A+ = {A, B}.
From B → C we get A+ = {A, B, C}.
Therefore,
A+ = {A, B, C}, which is all attributes of R,
so A is a candidate key.
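The calculation in Example (1) follows a simple fixed-point loop. Here is a minimal sketch of it, assuming single-letter attributes and FDs written as (LHS, RHS) string pairs; the function name `closure` is chosen for this illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already in the result, the RHS joins it.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("B", "C")]
print(sorted(closure("A", F)))  # ['A', 'B', 'C']
print(sorted(closure("B", F)))  # ['B', 'C']
```

The loop stops as soon as a full pass over F adds nothing new, so it always terminates.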
Example (2)
Given R(A, B, C, D, E, F) with a set of FDs
F = {A → BC, E → CF, B → E, CD → EF}
Find the candidate key for R.
A+ = {A, B, C, E, F} (by using the algorithm)
B+ = {B, C, E, F}
……
AB+ = {A, B, C, E, F}
AD+ = {A, B, C, D, E, F}, so AD is a candidate key
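Trying attribute sets by hand can be automated with a brute-force search. This is only a sketch under stated assumptions: attributes are single letters, the names `closure` and `candidate_keys` are invented here, and the search is exponential in the number of attributes, so it suits small classroom schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    # Smaller sets are tried first, so any set containing a found key is not minimal.
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


F = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
print([sorted(k) for k in candidate_keys("ABCDEF", F)])  # [['A', 'D']]
```

A and D never appear on the right-hand side of any FD, so both must be in every key, which is why AD is the only result.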
Normalization
Normalization is a method for organizing data elements in a
database into tables to minimize duplication
Why Normalization?
Reduce Redundant data
Remove Inconsistent data
Reduce anomalies
Increase data integrity
Simplify data maintenance
Take less disk space
Goal of Normalization
In each table all non-key attributes should be
dependent on the primary key
Normalization

Normal forms (in order of increasing strength):
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
First Normal Form (1NF)
A relation schema is in 1NF if:
domains of attributes include only atomic (simple,
indivisible) values
and the value of an attribute is a single value from the
domain of that attribute
Example of an unnormalized relation
Let R(SSN, Name(F-name, L-name), {telephone})
Note: R has a composite attribute (Name) and a multivalued attribute (telephone). Therefore R is not in 1NF (i.e. it is an unnormalized relation).
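One way to bring such a relation into 1NF is to flatten the composite attribute into atomic columns and move the multivalued attribute into its own relation. The sketch below assumes a nested-dictionary representation; the sample record, the function name `to_1nf`, and the relation names `person` and `person_phone` are all invented for illustration.

```python
# Hypothetical unnormalized record for R(SSN, Name(F-name, L-name), {telephone}).
unnormalized = [
    {
        "SSN": "111",
        "Name": {"F-name": "Ali", "L-name": "Omar"},
        "telephone": ["0791", "0792"],
    },
]


def to_1nf(records):
    """Split records into two 1NF relations: person and person_phone."""
    person, person_phone = [], []
    for r in records:
        # The composite attribute Name becomes two atomic columns.
        person.append((r["SSN"], r["Name"]["F-name"], r["Name"]["L-name"]))
        # The multivalued attribute telephone becomes one row per value.
        for tel in r["telephone"]:
            person_phone.append((r["SSN"], tel))
    return person, person_phone


person, person_phone = to_1nf(unnormalized)
print(person)        # [('111', 'Ali', 'Omar')]
print(person_phone)  # [('111', '0791'), ('111', '0792')]
```

Every value in both results is atomic, and SSN links each telephone row back to its person.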
BCNF

Rule: Given a relation schema R and a set F of FDs of the form α → β that hold on R, R is in BCNF if, for every FD in F, at least one of the following conditions is satisfied:
1) β ⊆ α (the FD is trivial), or
2) α is a superkey (S.K) of R
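The two conditions translate directly into a check over the given FDs. This is a minimal sketch, assuming single-letter attributes; the names `closure` and `bcnf_violations` and the small schema R(A, B, C) with A → B and B → C are used here purely for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the FDs that are neither trivial nor have a superkey as LHS."""
    schema = set(schema)
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)             # condition 1
        superkey = closure(lhs, fds) >= schema     # condition 2
        if not (trivial or superkey):
            bad.append((lhs, rhs))
    return bad


F = [("A", "B"), ("B", "C")]
print(bcnf_violations("ABC", F))  # [('B', 'C')]
```

An empty result means the schema is in BCNF with respect to the given FDs; here B → C is reported because B+ = {B, C} does not cover R.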
BCNF Example
Lending(Branch-name, Branch-city, Branch-assets, Loan-no, Amount, Customer)
FD1: Branch-name → {Branch-city, Branch-assets}
FD2: Loan-no → {Amount, Branch-name}

FD1: α is not a S.K and β ⊈ α, so FD1 violates BCNF.
Then Lending must be decomposed into:
R1, which includes α and β
R2, which includes R - β
R1(Branch-name, Branch-city, Branch-assets)
R2(Branch-name, Loan-no, Amount, Customer)
BCNF Example Cont.
Repeat the procedure for R1 and R2:
R1(Branch-name, Branch-city, Branch-assets)
R1 has only one FD, and its α is a S.K. So, R1 is in BCNF.
R2(Branch-name, Loan-no, Amount, Customer)
R2 has one FD that satisfies neither condition. So, decompose R2 into R21 and R22:
R21(Loan-no, Amount, Branch-name), which satisfies the S.K condition
BCNF Example Cont.
R22(Loan-no, Customer) (R2 - β)
Rule: Any attribute that is not determined by any FD must be part of the key.
The decomposition of Lending is as follows:
Lending (R)
  R1
  R2
    R21
    R22
Only R1, R21, and R22 will be in the DB.
3NF
Rule: Given a relation schema R and a set F of FDs of the form α → β that hold on R, R is in 3NF if, for every FD in F, at least one of the following conditions is satisfied:
1) β ⊆ α, or
2) α is a superkey (S.K), or
3) Each attribute in β is prime
Prime attribute: An attribute that is a member of some candidate key
Nonprime attribute: An attribute that is not a member of any candidate key
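3NF Example
R(Branch-name, Customer-name, Banker-name, Office-no)
FD1: {Branch-name, Customer-name} → Banker-name
FD2: Banker-name → {Branch-name, Office-no}

FD2:
α (Banker-name) is not a S.K
β ⊈ α
β1 (Branch-name) is prime but β2 (Office-no) is not
Then, R is not in 3NF
R must be decomposed into:
R1, which includes α and all nonprime attributes of β
R2, which includes R - all nonprime attributes of β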
3NF Example Cont.
R1(Banker-name, Office-no)
R2(Branch-name, Customer-name, Banker-name)

R1: α is a S.K. So, R1 is in 3NF.
R2: β is a prime attribute. So, R2 is in 3NF.
2NF
Rule: Given a relation schema R and a set F of FDs of the form α → β that hold on R, R is in 2NF if, for every FD in F, at least one of the following conditions is satisfied:
1) β ⊆ α, or
2) α is a superkey (S.K), or
3) Each attribute in β is prime, or
4) α is not a proper subset of a key
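2NF Example
R(Branch-name, Customer-name, Banker-name, Office-no)
FD1: {Branch-name, Customer-name} → Banker-name
α is a S.K
FD2: Banker-name → {Branch-name, Office-no}
α is not a proper subset of the key {Branch-name, Customer-name}
So, R is in 2NF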
Example
[Figure: relation R(A, B, C, D, E, F) with arrows marking a full dependency, a partial dependency, and a transitive dependency]
Normalization Steps
If a relation has repeating groups or multivalued attributes, remove the repeating groups and split each multivalued attribute into a new relation to reach 1NF
Remove partial dependencies to reach 2NF
Remove transitive dependencies to reach 3NF
When a relation schema satisfies 3NF:
Partial dependencies are removed
Transitive dependencies are removed
All non-key attributes depend on the P.K
Tables are small and well-formed
Example (When R Cannot Reach BCNF Without Losing an FD)
Let R(A, B, C, D, E) be a relation schema and F = {A → B, AC → DE, D → C} be a set of functional dependencies that hold on R. Is R in BCNF?
Solution: R(A, B, C, D, E)
FD1: A → B
FD2: AC → DE
FD3: D → C

FD1: α is not a super key. So, decompose R into R1 and R2:
R1(A, B)
R2(A, C, D, E)
Example Cont.
R2(A, C, D, E)

AC → DE: α is a S.K of R2.
D → C: α is not a S.K. But if we decompose R2 according to D → C into R21(D, C) and R22(A, D, E), AC → DE is lost, because A, C, D, and E no longer appear together in one relation.

So, we return to the previous normal form, which is 3NF.
Then, R1 is in BCNF and R2 is in 3NF, because β (C) is prime.
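The conclusion for R2 can be cross-checked by classifying each FD against the BCNF and 3NF conditions. The sketch below assumes single-letter attributes and treats AC → DE and D → C as the FDs that apply to R2(A, C, D, E); the names `closure`, `candidate_keys`, and `nf_report` are invented for this illustration, and the prime test is applied to β - α, a common refinement of condition 3.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if not any(k <= cand for k in keys) and closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def nf_report(schema, fds):
    """Label each FD as satisfying BCNF, only 3NF, or neither."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))  # members of any key
    report = {}
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs) or closure(lhs, fds) >= schema:
            report[(lhs, rhs)] = "BCNF"
        elif set(rhs) - set(lhs) <= prime:
            report[(lhs, rhs)] = "3NF only"
        else:
            report[(lhs, rhs)] = "violates 3NF"
    return report


F2 = [("AC", "DE"), ("D", "C")]
print(nf_report("ACDE", F2))
# {('AC', 'DE'): 'BCNF', ('D', 'C'): '3NF only'}
```

The candidate keys of R2 are AC and AD, so C is prime and D → C passes the 3NF test even though D is not a superkey.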
Example
R(SSN, Pno, Hours, Ename, Pname, Plocation)

1NF: R is in 1NF because it has no repeating group, no composite attribute, and no multivalued attribute.
2NF:
FD1: α (SSN, Pno) is a super key
FD2: α (SSN) is not a super key
β (Ename) is not a prime attribute
α (SSN) is part of a key. So, R is not in 2NF. Then decompose R into:
R1 = (α, β) = (SSN, Ename)
R2 = (R - β) = (SSN, Pno, Hours, Pname, Plocation)
Example Cont.
R1 = (α, β) = (SSN, Ename). R1 is in 2NF.
R2 = (R - β) = (SSN, Pno, Hours, Pname, Plocation)

R2:
FD1: α (SSN, Pno) is a super key
FD2: α (Pno) is not a super key
β (Pname, Plocation) is not a prime attribute
α (Pno) is part of a key. So, R2 is not in 2NF. Then decompose R2 into:
R21 = (R2 - β) = (SSN, Pno, Hours)
R22 = (α, β) = (Pno, Pname, Plocation)

R21 and R22 are in 2NF, and also in 3NF and BCNF.
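The partial dependencies that drive this decomposition can be listed programmatically. The sketch below makes two assumptions: {SSN, Pno} is the only candidate key, so every attribute outside it is nonprime, and FD1 has Hours on its right-hand side, which the text above does not spell out. The function name `partial_dependencies` is invented for this illustration.

```python
def partial_dependencies(key, fds):
    """Return FDs whose LHS is a proper subset of key and whose RHS has a non-key attribute."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        # '<' on sets tests for a proper subset.
        if set(lhs) < key and set(rhs) - key:
            found.append((lhs, rhs))
    return found


key = {"SSN", "Pno"}
fds = [
    (("SSN", "Pno"), ("Hours",)),         # assumed RHS, not stated in the text
    (("SSN",), ("Ename",)),
    (("Pno",), ("Pname", "Plocation")),
]

for lhs, rhs in partial_dependencies(key, fds):
    print(lhs, "->", rhs)
```

The two FDs reported are exactly the ones used to split off (SSN, Ename) and (Pno, Pname, Plocation); the full dependency on {SSN, Pno} is not reported.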
