
Functional Dependencies and Normalization
Relational Database Design-Part II
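Good relation schema design
The goodness of a relation schema can be assessed at two levels:
(a) Logical level (how users interpret the relation schemas and the meaning of their attributes)
(b) Implementation level (how the tuples are stored and updated)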
Informal Design Guidelines For Relation Schemas
i) Semantics of the Relation Attributes
Guideline 1:
Design a relation schema so that it does not combine
attributes from multiple entity types and relationship types.
Ex. Emp_dept(ssn, ename, add, dno, dname, mgrssn) mixes employee and department attributes.
Ex. Emp_proj mixes employee and project attributes.
ii) Redundant Information in Tuples and Update Anomalies
Redundancy and update anomalies arise when attributes of two entity types are mixed in one relation.
Update anomalies
i) Insertion anomaly
ii) deletion anomaly
iii) Modification anomaly
Ex. Emp_Proj

EmpID  ProjID  No_of_Hrs  Ename     Pname      Ploc
101    1       25         Kumar     Finance    Pune
101    2       30         Kumar     Marketing  Mumbai
102    1       45         Ram       Finance    Pune
102    3       10         Ram       Banking    Chennai
103    4       20         Prasanth  Inventory  Hyderabad
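Update Anomaly
Changing the name of project 3 from “Banking” to “Customer-Accounting” requires the change to be made in the tuple of every employee working on project 3 (for example, in 100 tuples if 100 employees work on it); missing any of them leaves the data inconsistent.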
Insert Anomaly
Cannot insert a project unless an employee is assigned to it.
Inversely, cannot insert an employee unless he/she is assigned to a project.
Delete Anomaly
When a project is deleted, all the employees who work on that project are deleted as well.
Alternatively, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.


Guideline 2:
Design the base relation schemas so that no insertion, deletion, or modification anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that the programs that update the database will operate correctly.
Views that join the base relations can be created for easy querying.
iii) Reducing the Null Values in Tuples
Nulls waste storage space.
Nulls make it hard to understand the meaning of attributes and to specify join operations at the logical level.
Guideline 3:
Avoid placing attributes in a base relation whose values may frequently be null.
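iv) Disallowing the Possibility of Generating Spurious Tuples
Ex. Emp_loc
Ename   Ploc
Ram     Pune
Kumar   Delhi
Mukesh  Pune

Ex. Emp_Proj
SSN  Pno  Hrs  Pname  Ploc
11   44   10   PA     Pune
12   45   15   PB     Delhi
13   46   14   PC     Pune

A natural join of Emp_Proj and Emp_loc on Ploc gives tuples such as:
SSN  Pno  Pname  Ename  Ploc
11   44   PA     Ram    Pune
12   45   PB     Kumar  Delhi
13   46   PC     Ram    Pune
Because Ploc is neither a primary key nor a foreign key, every project located in Pune is paired with every employee located in Pune (Ram and Mukesh), so the join also yields tuples such as (11, 44, PA, Mukesh, Pune). Tuples that do not represent real facts are called spurious tuples.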
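A minimal Python sketch of the problem above, assuming relations are modelled as lists of dicts (an illustration only, not how a DBMS stores them). `natural_join` is a hypothetical helper that joins on every attribute name the two relations share, which here is only Ploc:

```python
# Emp_loc(Ename, Ploc) and Emp_Proj(SSN, Pno, Hrs, Pname, Ploc) from the example above
emp_loc = [
    {"Ename": "Ram", "Ploc": "Pune"},
    {"Ename": "Kumar", "Ploc": "Delhi"},
    {"Ename": "Mukesh", "Ploc": "Pune"},
]
emp_proj = [
    {"SSN": 11, "Pno": 44, "Hrs": 10, "Pname": "PA", "Ploc": "Pune"},
    {"SSN": 12, "Pno": 45, "Hrs": 15, "Pname": "PB", "Ploc": "Delhi"},
    {"SSN": 13, "Pno": 46, "Hrs": 14, "Pname": "PC", "Ploc": "Pune"},
]


def natural_join(r, s):
    """Join two lists of dicts on all attribute names they have in common."""
    common = set(r[0]) & set(s[0])
    result = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})  # merge the two matching tuples
    return result


joined = natural_join(emp_proj, emp_loc)
for row in joined:
    print(row["SSN"], row["Pno"], row["Pname"], row["Ename"], row["Ploc"])
```

Five tuples come back for three projects: each Pune project is paired with both Ram and Mukesh, and nothing in the joined result separates the genuine pairings from the spurious ones.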
Guideline 4:
The relations should be designed to satisfy the lossless join condition: no spurious tuples should be generated by doing a natural join of any relations.
Design relation schemas so that they can be joined with equality conditions on attributes that are either PRIMARY KEYS or FOREIGN KEYS.
Avoid relations whose common (matching) attributes are not (foreign key, primary key) combinations.
If such relations are unavoidable, do not join them on those attributes, because the join may produce spurious tuples.
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
A functional dependency is a property of the semantics (meaning) of the attributes.
Database designers specify these semantics by means of functional dependencies.
FD X -> Y
X determines Y (or) Y is determined by X
t1[X] = t2[X] => t1[Y] = t2[Y]
X and Y are sets of attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y:
if t1[X] = t2[X], then t1[Y] = t2[Y] in any relation instance r(R).
X -> Y in R specifies a constraint on all relation instances r(R).
If K is a key of relation R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
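Because an FD is a property of the semantics, a single relation instance can refute X -> Y but never prove it. The sketch below (the name `fd_holds` is hypothetical) checks the condition t1[X] = t2[X] => t1[Y] = t2[Y] on the Emp_Proj instance shown earlier:

```python
COLUMNS = ["EmpID", "ProjID", "No_of_Hrs", "Ename", "Pname", "Ploc"]
ROWS = [
    dict(zip(COLUMNS, values))
    for values in [
        (101, 1, 25, "Kumar", "Finance", "Pune"),
        (101, 2, 30, "Kumar", "Marketing", "Mumbai"),
        (102, 1, 45, "Ram", "Finance", "Pune"),
        (102, 3, 10, "Ram", "Banking", "Chennai"),
        (103, 4, 20, "Prasanth", "Inventory", "Hyderabad"),
    ]
]


def fd_holds(rows, x, y):
    """Check t1[X] = t2[X] => t1[Y] = t2[Y] over every pair of tuples in rows."""
    seen = {}  # X-value -> the Y-value that first appeared with it
    for t in rows:
        x_val = tuple(t[a] for a in x)
        y_val = tuple(t[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # two tuples agree on X but differ on Y
    return True


print(fd_holds(ROWS, ["EmpID"], ["Ename"]))           # True
print(fd_holds(ROWS, ["ProjID"], ["Pname", "Ploc"]))  # True
print(fd_holds(ROWS, ["Ename"], ["ProjID"]))          # False
```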
Employee ID determines employee name:
Empid -> ENAME
Project number determines project name and location:
PNUMBER -> {PNAME, PLOCATION}
Employee ID and project number determine the hours per week that the employee works on the project:
{Empid, PNUMBER} -> HOURS
An FD is a property of the attributes in the schema R
The constraint must hold on every relation instance r(R)
Given a set of FDs F, we can infer additional FDs that hold
whenever the FDs in F hold
Armstrong's inference rules:
 IR1. (Reflexive) If Y subset-of X, then X -> Y
 IR2. (Augmentation) If X -> Y, then XZ -> YZ
 (Notation: XZ stands for X U Z)
 IR3. (Transitive) If X -> Y and Y -> Z, then X -> Z

IR1, IR2 and IR3 form a sound and complete set of inference rules:
they are sound because every FD they derive actually holds, and complete because every FD that holds can be deduced from them.
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union: If X -> Y and X -> Z, then X -> YZ
Pseudotransitivity: If X -> Y and WY -> Z, then WX -> Z
The last three inference rules, as well as any other inference rules, can be inferred from IR1, IR2, and IR3 (completeness property).
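To illustrate the completeness claim, each of the three additional rules can be derived step by step using only IR1, IR2 and IR3 (standard derivations, written out here as a worked example):

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\textbf{Decomposition}}\\
1. & X \to YZ & \text{given}\\
2. & YZ \to Y & \text{IR1, since } Y \subseteq YZ\\
3. & X \to Y & \text{IR3 on 1, 2 (similarly } X \to Z\text{)}\\[4pt]
\multicolumn{3}{l}{\textbf{Union}}\\
1. & X \to Y,\; X \to Z & \text{given}\\
2. & X \to XZ & \text{IR2: augment } X \to Z \text{ with } X \text{ (} XX = X\text{)}\\
3. & XZ \to YZ & \text{IR2: augment } X \to Y \text{ with } Z\\
4. & X \to YZ & \text{IR3 on 2, 3}\\[4pt]
\multicolumn{3}{l}{\textbf{Pseudotransitivity}}\\
1. & X \to Y,\; WY \to Z & \text{given}\\
2. & WX \to WY & \text{IR2: augment } X \to Y \text{ with } W\\
3. & WX \to Z & \text{IR3 on 2 and } WY \to Z
\end{array}
```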

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F.
Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X.
X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F.
Ex. Emp_Proj(SSN, Ename, Pno, Pname, Plocation, hours)
F = {SSN -> Ename, Pno -> {Pname, Plocation}, {SSN, Pno} -> hours}
{SSN}+ = {SSN, Ename}
{Pno}+ = {Pno, Pname, Plocation}
{SSN, Pno}+ = {SSN, Ename, Pno, Pname, Plocation, hours}
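The repeated application of the FDs can be sketched in Python as a fixpoint loop: whenever the whole LHS of an FD is already in X+, its RHS is added. This is the usual attribute-closure algorithm; the function name and the representation of FDs as (lhs, rhs) pairs of sets are assumptions for illustration. It reproduces the three closures above:

```python
def attribute_closure(attrs, fds):
    """Compute X+ under fds, a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:  # stop once a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # If all of the LHS is already in X+, the RHS is determined too
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


F = [
    ({"SSN"}, {"Ename"}),
    ({"Pno"}, {"Pname", "Plocation"}),
    ({"SSN", "Pno"}, {"hours"}),
]
print(sorted(attribute_closure({"SSN"}, F)))  # ['Ename', 'SSN']
print(sorted(attribute_closure({"Pno"}, F)))  # ['Plocation', 'Pname', 'Pno']
print(sorted(attribute_closure({"SSN", "Pno"}, F)))
```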
Finding the candidate keys
The first step in the process of finding a normal form and decomposing a relation is to find the candidate keys.
Example-1
R = (ABCDE), F = {A -> C, E -> D, B -> C}
The first step in finding the candidate keys is to compute the attribute closures under F:
A+ = AC
B+ = BC
E+ = DE
Any attribute that never appears on the right-hand side of a nontrivial FD must be in every candidate key. Here that includes A, B and E. Does ABE+ give us a candidate key?
ABE+ = ABCDE, so yes, it does. Since each of A, B and E must be in every key, ABE is also minimal. The candidate key is ABE.
Example-2
R = (ABCDE)
F = {A -> BE, C -> BE, B -> D}
Let’s compute the attribute closures:
A+ = ABDE
C+ = BCDE
B+ = BD
D+ = D
E+ = E
A and C never appear on a right-hand side, and AC+ = ABCDE, so the candidate key is AC.
Example-3
R = (ABCDEF),
F = {A -> B, B -> D, C -> D, E -> F}
Let’s compute the attribute closures:
A+ = ABD
B+ = BD
C+ = CD
D+ = D
E+ = EF
ACE+ = ABCDEF,
so ACE is a candidate key.
Example-4
R = (ABCD)
F = {AB -> C, BC -> D, CD -> A}
AB+ = ABCD,
BC+ = ABCD,
CD+ = ACD
So our candidate keys are AB and BC.
Why isn’t BCD a candidate key?
Because it is not minimal: BCD+ = ABCD, but its proper subset BC is already a key.
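A brute-force Python sketch of the procedure used in these examples, assuming single-letter attribute names (`candidate_keys` is a hypothetical helper): start from the attributes that never appear on a right-hand side, add other attributes in increasing number, and keep each superkey that does not contain a key found earlier. The search is exponential in the number of attributes, which is acceptable for examples of this size:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: keep applying FDs whose LHS is already contained."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return every minimal attribute set whose closure is the whole schema."""
    schema = set(schema)
    # Attributes that never appear on a right-hand side must be in every key
    core = schema - set().union(*(set(rhs) for _, rhs in fds))
    others = sorted(schema - core)
    keys = []
    # Grow the core by 0, 1, 2, ... extra attributes so smaller keys are found first
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            cand = core | set(extra)
            if any(k <= cand for k in keys):
                continue  # contains a key already found, so it is not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return ["".join(sorted(k)) for k in keys]


# FDs are (lhs, rhs) pairs of single-letter attribute strings
print(candidate_keys("ABCDE", [("A", "C"), ("E", "D"), ("B", "C")]))   # ['ABE']
print(candidate_keys("ABCD", [("AB", "C"), ("BC", "D"), ("CD", "A")]))  # ['AB', 'BC']
```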
Cover
We say that a set of functional dependencies F covers another set of functional dependencies G if every functional dependency in G can be inferred from F.
More formally, F covers G if G+ ⊆ F+.
F is a minimal cover of G if F is equivalent to G (F+ = G+) and F is as small as possible: every right-hand side is a single attribute, no left-hand side contains an extraneous attribute, and no FD can be removed.
Every set of functional dependencies has a minimal cover.
Also, note that there may be more than one minimal cover.
We find the minimal cover by iteratively simplifying the set of functional dependencies.
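Checking whether F covers G does not require computing F+ itself: X -> Y is in F+ exactly when Y ⊆ X+ computed under F. A small Python sketch of that test, with hypothetical function names and single-letter attributes, using the set from the minimal cover example in this section:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """F covers G if every X -> Y in G follows from F, i.e. Y is inside X+ under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent when each covers the other (F+ = G+)."""
    return covers(f, g) and covers(g, f)


G = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
F = [("A", "B"), ("B", "C")]
print(equivalent(F, G))         # True
print(covers([("A", "B")], G))  # False: B -> C cannot be inferred
```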
Minimal Cover
{A -> BC, B -> C, A -> B, AB -> C}
First step: Break down the RHS of each functional dependency into a single attribute:
{A -> B, A -> C, B -> C, A -> B, AB -> C}
Then remove the duplicate A -> B, so we get:
{A -> B, A -> C, B -> C, AB -> C}
Second step: Remove extraneous attributes.
An extraneous attribute is a redundant attribute on the LHS of a functional dependency (whose LHS has 2 or more attributes):
For AB -> C, check if A is necessary:
replace AB -> C with B -> C; if B+ contains C, then A is unnecessary:
B+ = B (B -> C) = BC (so A is unnecessary).
Check if B is necessary:
replace AB -> C with A -> C; if A+ contains C, then B is unnecessary:
A+ = A (A -> B) = AB (A -> C) = ABC (so B is unnecessary).
Either way AB -> C reduces to an FD that is already in the set, so now we have:
{A -> B, A -> C, B -> C}
Third step: Remove unnecessary or redundant functional dependencies:
For A -> B, check if A+ contains B without using A -> B:
A+ = A (A -> C) = AC (so A -> B is necessary).
For A -> C, check if A+ contains C without using A -> C:
A+ = A (A -> B) = AB (B -> C) = ABC (so A -> C is not necessary).
For B -> C, check if B+ contains C without using B -> C:
B+ = B (so B -> C is necessary).
Now we have:
{A -> B, B -> C}
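The three steps can be sketched in Python as follows, assuming single-letter attributes (`minimal_cover` and `show` are hypothetical names). The result depends on the order in which FDs and attributes are examined, so this yields one minimal cover, not necessarily the only one:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """fds: (lhs, rhs) pairs of single-letter attribute strings, e.g. ("AB", "C")."""
    # Step 1: one attribute per RHS, dropping duplicate FDs
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset(a))
            if fd not in g:
                g.append(fd)
    # Step 2: a in LHS is extraneous if (LHS - a)+ already contains the RHS
    for i, (lhs, rhs) in enumerate(g):
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, g):
                lhs = lhs - {a}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # shrinking an LHS can create duplicates
    # Step 3: drop X -> A if X+ computed without it still contains A
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest  # redundant; the next FD has moved into position i
        else:
            i += 1
    return g


def show(fds):
    """Render FDs as strings such as 'AB->C'."""
    return ["".join(sorted(lhs)) + "->" + "".join(sorted(rhs)) for lhs, rhs in fds]


print(show(minimal_cover([("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")])))
# ['A->B', 'B->C']
```

Run on the exercise that follows, the same function returns the six FDs of F2 (with EC written as CE).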
Find the minimal cover of the given set of functional dependencies: {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
Simple properties/steps of minimal cover:
1. The right-hand side (RHS) of every FD should be a single attribute.
F1 = {A → C, AB → C, C → D, C → I, CD → I, EC → A, EC → B, EI → C}
2. Remove extraneous attributes.
In the set of FDs, AB → C, CD → I, EC → A, EC → B, and EI → C have more than one attribute in the LHS. Hence, we check whether any of these LHS attributes is extraneous, using the closures of single attributes under F1:
(i) A+ = ACDI
(ii) B+ = B
(iii) C+ = CDI
(iv) D+ = D
(v) E+ = E
(vi) I+ = I
From (i), the closure of A includes the attribute C. So, B is extraneous in AB → C and can be removed; the resulting A → C is already in the set.
From (iii), the closure of C includes the attribute I. So, D is extraneous in CD → I and can be removed; the resulting C → I is already in the set.
From (iii), (v) and (vi), neither E nor C alone determines A or B, and neither E nor I alone determines C, so EC → A, EC → B and EI → C have no extraneous attributes.
F2 = {A → C, C → D, C → I, EC → A, EC → B, EI → C}
3. Eliminate redundant functional dependencies.
None of the FDs in F2 is redundant. Hence, the set of functional dependencies F2 is a minimal cover for the set F.
Normalization
Refining the database
Relational DB design by analysis
Taking relations through normal forms
Normalization of data
Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Decomposing relations to minimize redundancy and update anomalies.

Properties of Normalization
There are two important properties of decompositions:
a) Lossless join / nonadditive join property
Joining the decomposed relations does not generate spurious tuples.
b) Dependency preservation
Each functional dependency is represented in some decomposed relation.
Note that property (a) is extremely important and cannot be sacrificed.
Property (b) is less stringent and may be sacrificed.
A Lossy Decomposition
Example of Lossless Decomposition
Decomposition of R = (A, B, C)
R1 = (A, B) R2 = (B, C)
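For a decomposition into exactly two relations there is a well-known shortcut: {R1, R2} is lossless with respect to F if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The slide does not give F for R = (A, B, C), so the sketch below assumes two hypothetical FD sets purely for illustration:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff R1 ∩ R2 determines R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


# Hypothetical FD sets for R = (A, B, C), R1 = (A, B), R2 = (B, C)
print(is_lossless_binary("AB", "BC", [("B", "C")]))  # True: B determines R2 - R1 = C
print(is_lossless_binary("AB", "BC", [("A", "B")]))  # False: B determines neither A nor C
```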
Denormalization - getting back the base relation by joining the decomposed relations (typically done to speed up queries).

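First Normal Form (1NF)
Attributes should have atomic (indivisible) values.
Multivalued and composite attributes (which an ORDBMS allows) are removed.
Eg. Department(dno, dname, mgrssn, dloc), where a department can have several locations (dloc)
1NF:
D1(dno, dloc), D2(dno, dname, mgrssn)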


Second Normal Form (2NF)
Prime attribute - an attribute that is a member of the primary key K.
Full functional dependency - an FD X -> Y where removal of any attribute from X means the FD does not hold any more.
An attribute is called non-prime if it is not a prime attribute.
R is in 2NF if every non-prime attribute A in R is fully functionally dependent on the primary key, i.e. there should be no partial dependency on the primary key.
Eg. Emp_proj(Ssn, Pno, Hrs, Ename, Pname, Ploc)
FD1: Ssn, Pno -> Hrs
FD2: Ssn -> Ename (partial dependency)
FD3: Pno -> Pname, Ploc (partial dependency)
2NF: (Ssn, Pno, Hrs), (Ssn, Ename), (Pno, Pname, Ploc)
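Partial dependencies can be found mechanically. A Python sketch, assuming (as in the definition above) that prime means belonging to the primary key: compute the closure of every proper subset of the key and collect the non-prime attributes it reaches. `partial_dependencies` is a hypothetical helper:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """Map each proper subset of the primary key to the non-prime attributes it determines."""
    key = set(key)
    nonprime = set(schema) - key
    found = {}
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) & nonprime
            if dependents:
                found[subset] = sorted(dependents)
    return found


schema = ["Ssn", "Pno", "Hrs", "Ename", "Pname", "Ploc"]
fds = [({"Ssn", "Pno"}, {"Hrs"}), ({"Ssn"}, {"Ename"}), ({"Pno"}, {"Pname", "Ploc"})]
for part, attrs in partial_dependencies(schema, ["Ssn", "Pno"], fds).items():
    print(part, "->", attrs)
```

Each key subset together with its dependents becomes its own relation, and what remains stays with the full key, which gives the three 2NF relations listed above.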
Third Normal Form (3NF)
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
Eg. Emp_dept(Ssn, Name, Address, Dno, Dname, Mgrssn)
Ssn -> Name, Address
Ssn -> Dno
Dno -> Dname, Mgrssn
Dname and Mgrssn are transitively dependent on Ssn (Ssn -> Dno -> Dname, Mgrssn).
3NF: (Ssn, Name, Address, Dno), (Dno, Dname, Mgrssn)
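A Python sketch of a 3NF check, assuming the general form of the test: a listed FD X -> A is a violation when X is not a superkey and A is non-prime (here, not part of the primary key, as in the slides). For a relation already in 2NF, such a violation is a transitive dependency through X. The helper name is hypothetical, and it inspects only the FDs as listed:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(schema, key, fds):
    """Return (X, A) for each listed FD X -> A with X not a superkey and A non-prime."""
    schema, key = set(schema), set(key)
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey, so X -> A is allowed
        for a in sorted(set(rhs) - set(lhs)):
            if a not in key:  # a non-prime attribute depends on a non-key X
                found.append((tuple(sorted(lhs)), a))
    return found


emp_dept = ["Ssn", "Name", "Address", "Dno", "Dname", "Mgrssn"]
fds = [({"Ssn"}, {"Name", "Address"}), ({"Ssn"}, {"Dno"}), ({"Dno"}, {"Dname", "Mgrssn"})]
print(violations_3nf(emp_dept, ["Ssn"], fds))
# [(('Dno',), 'Dname'), (('Dno',), 'Mgrssn')]
print(violations_3nf(["Dno", "Dname", "Mgrssn"], ["Dno"], fds[2:]))  # []
```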


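BOYCE-CODD NORMAL FORM (BCNF)
A relation is in BCNF if every determinant in it is a candidate key (more precisely, for every nontrivial FD X -> Y, X is a superkey).
FD X -> Y
X, the determinant, can be a single or a composite attribute.
Eg. Teaching(Student, Course, Instructor)
FD1: Student, Course -> Instructor
FD2: Instructor -> Course
Teaching is not in BCNF, because Instructor is a determinant (FD2) but not a candidate key.
Relational decomposition into BCNF with nonadditive join property
1. Take a relation R that is not in BCNF.
2. Take an FD X -> Y in R that violates BCNF.
3. Replace R by two relations (R - Y) and (X ∪ Y).
4. Repeat until every relation is in BCNF.
Eg. Teaching is decomposed into (Student, Instructor) and (Instructor, Course).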
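The decomposition steps can be sketched in Python as below (`bcnf_decompose` is a hypothetical name). For brevity it only tests the FDs as listed against each sub-relation rather than the projection of F+ onto it; that is enough for this example, but a complete implementation must also consider implied FDs:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(schema, fds):
    """Repeatedly split R on a violating FD X -> Y into (R - Y) and (X ∪ Y)."""
    done, todo = [], [frozenset(schema)]
    while todo:
        r = todo.pop()
        violation = None
        for lhs, rhs in fds:
            x = set(lhs)
            y = (set(rhs) & r) - x  # the nontrivial part of Y that lies inside R
            if x <= r and y and not closure(x, fds) >= r:
                violation = (x, y)  # X is not a superkey of R
                break
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo.append(frozenset(r - y))
            todo.append(frozenset(x | y))
    return done


teaching = {"Student", "Course", "Instructor"}
fds = [({"Student", "Course"}, {"Instructor"}), ({"Instructor"}, {"Course"})]
for rel in bcnf_decompose(teaching, fds):
    print(sorted(rel))
```

The result is (Instructor, Course) and (Student, Instructor). Neither relation contains all of Student, Course and Instructor, so FD1 is no longer preserved, which is the trade-off allowed for property (b).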


Example
Repayment (borrower_id, name, address, loanamount, requestdate, repayment_date, repayment_amount)
A borrower is identified with a unique borrower_id and has only one address. Borrowers can have multiple simultaneous loans, but they always have different request dates. The borrower can make multiple repayments on the same day, but not more than one repayment per loan per day.
a) State a key (candidate key) for Repayment.
b) Normalize the relation to BCNF. Show the steps.
Answer a)
We can derive the following set of functional dependencies:
Borrower_id → name, address
Borrower_id, requestdate → loanamount [given: a borrower's loans always have different request dates, so a borrower cannot request more than one loan on the same date]
Borrower_id, requestdate, repayment_date → repayment_amount [given: a borrower can make multiple repayments on a single day, but not more than one per loan]
Borrower_id, requestdate, repayment_date → name, address, loanamount, repayment_amount.
Hence, the attributes (Borrower_id, requestdate, repayment_date) together form a candidate key.
Answer b):
Is the relation Repayment in 1NF? Yes.
Is the relation in 2NF?
No. We have the following partial key dependencies:
1. We can derive the name and address of every borrower if we know the borrower_id, from the FDs Borrower_id → name and Borrower_id → address.
2. We can derive the loanamount if we know borrower_id and requestdate, from the FD Borrower_id, requestdate → loanamount.
Hence, the relation Repayment is not in 2NF. To convert it into 2NF relations, we can decompose Repayment into the following relations:
Borrower (Borrower_id, Name, Address)
Borrower_loan (Borrower_id, Requestdate, Loanamount)
Repayment (Borrower_id, Requestdate, Repayment_date, Repayment_amount)
Are these tables in 3NF?
Yes. There are no transitive dependencies in the above tables' sets of functional dependencies. Hence, all these tables are in 3NF.
Are these tables in BCNF?
Yes. Each table has only one candidate key, and the determinant of every FD in it is that key.
Hence the following decomposed tables are in Boyce-Codd Normal Form:
Borrower (Borrower_id, Name, Address)
Borrower_loan (Borrower_id, Requestdate, Loanamount)
Repayment (Borrower_id, Requestdate, Repayment_date, Repayment_amount)
Example
Shipping (ShipName, ShipType, VoyageID, Cargo, Port, ArrivalDate)
ShipName → ShipType
ShipName, ArrivalDate → VoyageID, Port
VoyageID → ShipName, Cargo
Assume that the Shipping relation is in 1NF.
We have a partial key dependency, as ShipName alone can determine ShipType, so the relation is not in 2NF.
SHIPS (ShipName, ShipType)
ShipName → ShipType
VOYAGES (ShipName (fk), VoyageID, Cargo, Port, ArrivalDate)
ShipName, ArrivalDate → VoyageID, Port
VoyageID → ShipName, Cargo
VOYAGES is not in 3NF: it has the transitive dependency ShipName, ArrivalDate → VoyageID → Cargo. Decompose it into:
SHIPPORTS (ShipName (fk), VoyageID (fk), Port, ArrivalDate)
ShipName, ArrivalDate → VoyageID, Port
VoyageID → ShipName
CARGO (VoyageID, Cargo)
VoyageID → Cargo
SHIPS (ShipName, ShipType)
ShipName → ShipType
SHIPPORTS is not in BCNF, since it has VoyageID as a determinant but VoyageID is not a candidate key.
So split SHIPPORTS, using X = VoyageID and Y = ShipName in (R - Y) and (X ∪ Y), into SHIPDATES and SHIPVOYAGE:
SHIPDATES (VoyageID (fk), Port, ArrivalDate)
VoyageID, ArrivalDate → Port
SHIPVOYAGE (VoyageID (fk), ShipName (fk))
VoyageID → ShipName
CARGO (VoyageID, Cargo)
VoyageID → Cargo
SHIPS (ShipName, ShipType)
The split is lossless because the common attribute VoyageID is the key of SHIPVOYAGE; the FD ShipName, ArrivalDate → VoyageID is no longer preserved.
Sample problem
R(A,B,C,D,E,F,G)
FD = {A->DF, B->E, AB->C, F->G}
Check whether the relation R is in 3NF. If not, normalize the relation till 3NF.
(Assume that R is in 1NF.)
Candidate key of R?
AB+ = ABCDEFG, so AB is the candidate key.
There are partial dependencies in R (A->DF, B->E); hence the relation is not in 2NF. Decompose it into:
R1(A,D,F,G) with A->D,F and F->G
R2(B,E) with B->E
R3(A,B,C) with A,B->C
R1(A,D,F,G)
A->D,F
F->G
The relation R1 is not in 3NF, as it has a transitive dependency (A->F and F->G, hence A->G).
Decompose the relation R1:
D1(A,D,F) with A->D,F
D2(F,G) with F->G
The decomposed relations are:
D1(A,D,F), D2(F,G), R2(B,E), R3(A,B,C)


Dependency Preservation Property of a Decomposition
A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is,
((πR1(F)) ∪ ... ∪ (πRm(F)))+ = F+
Ex. R(ABCDE), F = {A->B, B->CD, D->E}
R1(A,B) R2(B,C,D) R3(D,E)
The above decomposed relations satisfy the dependency preservation property, since each FD in F lies entirely within one of R1, R2, R3.
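Computing the projections πRi(F) directly means working with F+. A common alternative test, sketched here in Python for single-letter attributes (the function name is hypothetical), grows a set Z from X using only what each Ri can derive on its own, (Z ∩ Ri)+ ∩ Ri, and then checks whether Y ended up inside Z:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if every X -> Y in fds can be enforced using only the relations Ri."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # What Ri alone lets us derive from the part of Z it contains
                t = closure(z & ri, fds) & ri
                if not t <= z:
                    z |= t
                    changed = True
        if not set(rhs) <= z:
            return False  # X -> Y is lost by this decomposition
    return True


F = [("A", "B"), ("B", "CD"), ("D", "E")]
print(preserves_dependencies(["AB", "BCD", "DE"], F))  # True
```

With S, C, I standing for Student, Course, Instructor (an abbreviation for this sketch), the BCNF decomposition of Teaching gives preserves_dependencies(["SI", "IC"], [("SC", "I"), ("I", "C")]) == False.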
Claim 1:
It is always possible to find a dependency-preserving decomposition D with respect to F such that each relation Ri in D is in 3NF.
Non-additive (Lossless) Join Property of a Decomposition:
Definition: Lossless join property: a decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:
* (πR1(r), ..., πRm(r)) = r
Note: The word loss in lossless refers to loss of information, not to loss of tuples. In fact, for “loss of information” a better term is “addition of spurious information”.
Lossless (Non-additive) Join Property of a Decomposition:
Algorithm 15.3: Testing for Lossless Join Property
Input: A universal relation R, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies.
1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
2. Set S(i,j) := bij for all matrix entries. (* each bij is a distinct symbol associated with indices (i,j) *)
3. For each row i representing relation schema Ri
   {for each column j representing attribute Aj
      {if (relation Ri includes attribute Aj) then set S(i,j) := aj;};};
   (* each aj is a distinct symbol associated with index (j) *)
4. Repeat the following loop until a complete loop execution results in no changes to S
   {for each functional dependency X -> Y in F
      {for all rows in S which have the same symbols in the columns corresponding to attributes in X
         {make the symbols in each column that correspond to an attribute in Y be the same in all these rows as follows:
          If any of the rows has an “a” symbol for the column, set the other rows to that same “a” symbol in the column.
          If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column;};
      };
   };
5. If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not.
Nonadditive join test for n-ary decompositions:
(a) Case 1: Decomposition of EMP_PROJ into EMP_PROJ1 and EMP_LOCS fails the test.
(c) Case 2: Decomposition of EMP_PROJ into EMP, PROJECT, and WORKS_ON satisfies the test.
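A Python sketch of Algorithm 15.3, using the symbol encoding described in the docstring (an assumption of this sketch). Since the figure with the matrices is not reproduced here, the attribute lists are assumed: EMP_PROJ is taken as Emp_Proj(SSN, Ename, Pno, Pname, Plocation, Hours) with the FDs used earlier, EMP_LOCS as (Ename, Plocation), EMP_PROJ1 as (SSN, Pno, Hours, Pname, Plocation), and EMP, PROJECT, WORKS_ON as (SSN, Ename), (Pno, Pname, Plocation), (SSN, Pno, Hours):

```python
def lossless_chase(attributes, decomposition, fds):
    """Matrix test for the nonadditive join property.

    A symbol a_j is encoded as ("a", j) and b_ij as ("b", i, j).
    """
    col = {a: j for j, a in enumerate(attributes)}
    # Steps 1-3: row i holds a_j where R_i contains A_j, and b_ij elsewhere
    s = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attributes)]
         for i, ri in enumerate(decomposition)]
    # Step 4: repeat until a complete pass over F changes nothing
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}  # rows that agree on every column of X
            for row in s:
                key = tuple(row[col[a]] for a in lhs)
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                if len(rows) < 2:
                    continue
                for a in rhs:
                    j = col[a]
                    symbols = {row[j] for row in rows}
                    a_symbols = [x for x in symbols if x[0] == "a"]
                    # Prefer the "a" symbol; otherwise pick one "b" symbol deterministically
                    target = a_symbols[0] if a_symbols else min(symbols)
                    for row in rows:
                        if row[j] != target:
                            row[j] = target
                            changed = True
    # Step 5: lossless if some row consists only of "a" symbols
    return any(all(x[0] == "a" for x in row) for row in s)


ATTRS = ["SSN", "Ename", "Pno", "Pname", "Plocation", "Hours"]
F = [({"SSN"}, {"Ename"}), ({"Pno"}, {"Pname", "Plocation"}), ({"SSN", "Pno"}, {"Hours"})]

case1 = [{"Ename", "Plocation"}, {"SSN", "Pno", "Hours", "Pname", "Plocation"}]
case2 = [{"SSN", "Ename"}, {"Pno", "Pname", "Plocation"}, {"SSN", "Pno", "Hours"}]
print(lossless_chase(ATTRS, case1, F))  # False
print(lossless_chase(ATTRS, case2, F))  # True
```

In Case 1 the two rows never agree on SSN or Pno, so no FD fires and no row becomes all "a"; in Case 2, SSN -> Ename and Pno -> Pname, Plocation fill in the WORKS_ON row completely.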
Multivalued Dependencies and Fourth Normal Form (4NF)
(a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
(b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Join Dependencies and Fifth Normal Form
(5NF or Project-Join Normal Form (PJNF))
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
