Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 38

Normalization

1
What Is Normalization
• Normalization is the process of
minimizing redundancy from a relation or set
of relations. Redundancy in relation may cause
insertion, deletion and updation anomalies.

2
Anomalies
• Insertion Anomalies
• Deletion Anomalies.
• Modification Anomalies.

3
• Insert anomalies − We tried to insert data in a record that
does not exist at all.
• Deletion anomalies − We tried to delete a record, but parts of
it was left undeleted because of unawareness, the data is also
saved somewhere else.
• Update anomalies − If data items are scattered and are not
linked to each other properly, then it could lead to strange
situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances
get updated properly while a few others are left with old
values. Such instances leave the database in an inconsistent
state.
4
Student Info
Stu-id Stu-name Age Address Branch Dept Hod name
Code
E101 Ajay 21 Indore 101 CS VIJAY
E102 Mohan 21 Indore 101 CS Vijay
E103 Jay 22 Bhopal 102 IT Saritha
E104 Ankit 22 Bhopal 102 IT Saritha
E105 Sonia 22 Ujjain 101 CS VIJAY
E106 Rahul 20 Ujjain 105 ME R.R
Sharma
E107 Sumit 20 Ujjain 101 CS VIJAY
E108 Sudha 20 Mumbai 105 ME R.R
Sharma
E109 Edwin 21 Delhi 103 EC Dr.joshi
E110 Neha 21 Pune 103 EC Dr. Joshi
E111 Neha 21 Pune 104 CE Prof.R Jain
E112 Rahul 19 Dewas 101 CS VIJAY

5
Solution of Problem
Stu-id Stu-name Age Address Branch- Branch Dept HOD
code _ocde
E101 Ajay 21 Indore 101 101 CS Vijay
E102 Mohan 21 Indore 101 102 IT Saritha
E103 Jay 22 Bhopal 102 103 EC Dr.joshi
E104 Ankit 22 Bhopal 102 104 CE Prof.R Jain
E105 Sonia 22 Ujjain 101 105 ME R.R Sharma
E106 Rahul 20 Ujjain 105 106 EX Jay Singh
E107 Sumit 20 Ujjain 101
E108 Sudha 20 Mumbai 105
E109 Edwin 21 Delhi 103
E110 Neha 21 Pune 103
E111 Neha 21 Pune 104
E112 Rahul 19 Dewas 101
6
First Normal Form
• A relation schema R is in first normal form (1NF) if the
domains of all attributes of R are atomic.
• A domain is atomic if elements of the domain are considered
to be indivisible units.
• 1NF disallows having a set of values, a tuple of values, or a
combination of both as an attribute value for a single tuple.

7
First Normal Form (cont’d.)
• Techniques to achieve first normal form
– Remove attribute and place in separate relation
– Expand the key
– Use several atomic attributes
• Does not allow nested relations
– Each tuple can have a relation within it
• To change to 1NF:
– Remove nested relation attributes into a new relation
– Propagate the primary key into it
– Unnest relation into a set of 1NF relations

8
9
Second Normal Form
• Second normal form (2NF) is based on the
concept of full functional dependency.
• Full Function dependency
X Y
if removal of any attribute A from X means
that the dependency does not hold any more;
that is, for any attribute A ε X, (X – {A}) does not
functionally determine Y.

10
Definition:-A relation schema R is in 2NF if every nonprime attribute A
in R is fully functionally dependent on the primary key of R.
• Prime attribute − An attribute, which is a part of the candidate-key, is
known as a prime attribute.
• Non-prime attribute − An attribute, which is not a part of the prime-key, is
said to be a non-prime attribute.

11
12
key
• K is a superkey for relation schema R if and only if K  R
• K is a candidate key for R if and only if
– K  R, and
– for no   K,   R
• Functional dependencies allow us to express constraints that
cannot be expressed using superkeys. Consider the schema:
bor_loan = (customer_id, loan_number, amount ).
We expect this functional dependency to hold:
loan_number  amount
but would not expect the following to hold:
amount  customer_name

13
Find Candidate Key
• Consider the universal relation R = {A, B, C, D, E, F,
G, H, I, J} and the set of functional dependencies F =
{ {A, B}→{C}, {A}→{D, E}, {B}→{F},
{F}→{G,H}, {D}→{I, J} }. What is the key for R?
Decompose R into 2NF and then 3NF relations.

14
Redundancy in 3NFBranch
Stu-id Branch
Code Name
• There is some redundancy in this schema
• Example of problems due to redundancy in 3NFE101 101 CS
Student (Stu_id,Branch_code,Dept) E102 101 CS
Stu_id Branch_code E103 102 IT
Branch_code Dept E104 102 IT
E105 101 CS
E106 105 ME
E107 101 CS
E108 105 ME
E109 103 EC
E110 103 EC
E111 104 CE
E112 101 CS
15
Decomposition
Stu_id Branch_code
E101 CS
Dept Branch_co
E102 CS de
E103 IT CS 101
E104 IT IT 102
E105 CS EC 103
E106 ME CE 104
E107 CS ME 105
E108 ME EX 106

E109 EC
E110 EC
E111 CE
E112 CS
16
Third Normal Form
• Third normal form (3NF) is based on the concept of transitive
dependency.
• A functional dependency X→Y in a relation schema R is a
transitive dependency if there exists a set of attributes Z in R
that is neither a candidate key nor a subset of any key of R,
and both X→Z and Z→Y hold.

17
18
Third Normal Form
• Definition. According to Codd’s original definition, a
relation schema R is in 3NF if it satisfies 2NF and no
nonprime attribute of R is transitively dependent on the
primary key.

19
Testing for 3NF
• Optimization: Need to check only FDs in F,
need not check all FDs in F+.
• Use attribute closure to check for each
dependency   , if  is a superkey.
• If  is not a superkey, we have to verify if each
attribute in  is contained in a candidate key
of R

20
Example
1. R=(ABCDE) and FD={AB C, B D,D E } What is
the key for R? Decompose R into 3NF

2.  Consider relation R(A, B, C, D, E)


A -> BC,
CD -> E,
B -> D,
E -> A,
What is the key for R? Decompose R into 3NF
21
General Definitions of Second
and Third Normal Forms

22
BCNF (Boyce-Codd Normal Form)

• A relation schema R is in Boyce-Codd Normal Form


(BCNF) if whenever an FD X  A holds in R, then X is
a superkey of R
– Each normal form is strictly stronger than the previous
one:
• Every 2NF relation is in 1NF
• Every 3NF relation is in 2NF
• Every BCNF relation is in 3NF
– There exist relations that are in 3NF but not in BCNF
– The goal is to have each relation in BCNF (or 3NF)

23
BCNF (Boyce-Codd Normal Form)

• A relation schema R is in BCNF with respect to a set F of


functional dependencies if for all functional dependencies in
F+ of the form

 

where   R and   R, at least one of the following holds:


1.    is trivial (i.e.,   )
2.  is a superkey for R

24
Decomposing a Schema into BCNF

• Suppose we have a schema R and a non-trivial dependency


 causes a violation of BCNF.
We decompose R into:
1. (U )
2. (R-( -  ) )

• In our example,
  = loan_number
  = amount
and bor_loan is replaced by
 (U  ) = ( loan_number, amount )
 ( R - (  -  ) ) = ( customer_id, loan_number )
25
Example of BCNF Decomposition
• R = (A, B, C, )
F = {A  B
B  C}
Key = {A}
• R is not in BCNF (B  C but B is not super key)
• Decomposition
R1 = (B, C)
R2 = (A,B)

26
Example of BCNF Decomposition

• Original relation R and functional dependency F


R = (branch_name, branch_city, assets,
customer_name, loan_number, amount )
F = {branch_name  assets branch_city
loan_number  amount branch_name }
Key = {loan_number, customer_name}
• Decomposition
– R1 = (branch_name, branch_city, assets )
– R2 = (branch_name, customer_name, loan_number, amount )
– R3 = (branch_name, loan_number, amount )
– R4 = (customer_name, loan_number )
• Final decomposition
R 1 , R 3 , R4

27
Multivalued Dependencies (MVDs)

• Let R be a relation schema and let   R and   R. The


multivalued dependency
  
holds on R if in any legal relation r(R), for all pairs for tuples t1
and t2 in r such that t1[] = t2 [], there exist tuples t3 and t4 in
r such that:
t1[] = t2 [] = t3 [] = t4 []
t3[] = t1 []
t3[R – ] = t2[R – ]
t4 [] = t2[]
t4[R – ] = t1[R – ]
28
MVD (Cont.)
• Tabular representation of   
Example
• Let R be a relation schema with a set of attributes that are
partitioned into 3 nonempty subsets.
Y, Z, W
• We say that Y  Z (Y multidetermines Z )
if and only if for all possible relations r (R )
< y1, z1, w1 >  r and < y1, z2, w2 >  r
then
< y1, z1, w2 >  r and < y1, z2, w1 >  r
• Note that since the behavior of Z and W are identical it follows
that
Y  Z if Y  W
Example (Cont.)
• In our example:
course  teacher
course  book
• The above formal definition is supposed to formalize the
notion that given a particular value of Y (course) it has
associated with it a set of values of Z (teacher) and a set of
values of W (book), and these two sets are in some sense
independent of each other.
• Note:
– If Y  Z then Y  Z
– Indeed we have (in above notation) Z1 = Z2
The claim follows.
Use of Multivalued Dependencies
• We use multivalued dependencies in two ways:
1.To test relations to determine whether they are legal under
a given set of functional and multivalued dependencies
2.To specify constraints on the set of legal relations. We shall
thus concern ourselves only with relations that satisfy a
given set of functional and multivalued dependencies.
• If a relation r fails to satisfy a given multivalued
dependency, we can construct a relations r that does
satisfy the multivalued dependency by adding tuples
to r.
Theory of MVDs
• From the definition of multivalued dependency, we can derive the
following rule:
– If   , then   
That is, every functional dependency is also a multivalued
dependency
• The closure D+ of D is the set of all functional and multivalued
dependencies logically implied by D.
– We can compute D+ from D, using the formal definitions of functional
dependencies and multivalued dependencies.
– We can manage with such reasoning for very simple multivalued
dependencies, which seem to be most common in practice
– For complex dependencies, it is better to reason about sets of
dependencies using a system of inference rules (see Appendix C).
Fourth Normal Form
• A relation schema R is in 4NF with respect
to a set D of functional and multivalued
dependencies if for all multivalued
dependencies in D+ of the form   ,
where   R and   R, at least one of the
following hold:
–    is trivial (i.e.,    or    = R)
–  is a superkey for schema R
• If a relation is in 4NF it is in BCNF
Restriction of Multivalued Dependencies

• The restriction of D to R is the set D


i i

consisting of
– All functional dependencies in D+ that include only
attributes of Ri
– All multivalued dependencies of the form
  (  Ri)
where   Ri and    is in D+
4NF Decomposition Algorithm
result: = {R};
done := false;
compute D+;
Let D denote the restriction of D+ to R
i i

while (not done)


if (there is a schema Ri in result that is not in 4NF) then
begin
let    be a nontrivial multivalued dependency that holds
on Ri such that   Ri is not in D , and ;
i

result := (result - Ri)  (Ri - )  (, );


end
else done:= true;
Note: each Ri is in 4NF, and decomposition is lossless-join
Example
• R =(A, B, C, G, H, I)
F ={ A  B
B  HI
CG  H }
• R is not in 4NF since A  B and A is not a superkey for R
• Decomposition
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF)
• Since A  B and B  HI, A  HI, A  I
e) R5 = (A, I) (R5 is in 4NF)
f)R6 = (A, C, G) (R6 is in 4NF)
Further Normal Forms
• Join dependencies generalize multivalued dependencies
– lead to project-join normal form (PJNF) (also called fifth normal
form)
• A class of even more general constraints, leads to a normal
form called domain-key normal form.
• Problem with these generalized constraints: are hard to
reason with, and no set of sound and complete set of
inference rules exists.
• Hence rarely used

You might also like