Unit 2 Normalization-3


Normalization

Normalization is defined as organizing data so as to reduce unnecessary redundancy and to preserve information.

A normal form is a measure of the quality of the design of a relation schema and hence of a relational database.

Normalization
Normalization of data
◦ Normalization: Process of decomposing unsatisfactory "bad" relations
by breaking up their attributes into smaller relations.
◦ Decomposing relations to minimize redundancy and update anomalies.
Properties of Normalization
There are two important properties of decompositions:
a) Lossless join / non-additive join property
Joining the decomposed relations does not produce spurious tuples.
b) Dependency preservation
Each functional dependency is represented in some decomposed relation.
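
The lossless join property can be illustrated on a small instance. The sketch below is a minimal example with hypothetical data and helper names (none of them come from the slides): it projects a relation onto two attribute sets, joins the projections back, and compares the result with the original. It only tests one instance; it does not prove the property for the schema.

```python
from itertools import product

def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join of two projected relations on their shared attribute names."""
    joined = set()
    for l_tuple, r_tuple in product(left, right):
        l_row, r_row = dict(l_tuple), dict(r_tuple)
        shared = l_row.keys() & r_row.keys()
        if all(l_row[a] == r_row[a] for a in shared):
            joined.add(frozenset({**l_row, **r_row}.items()))
    return joined

def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original tuples."""
    original = project(rows, list(rows[0]))
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Hypothetical sample data, not taken from the slides.
employees = [
    {"Ssn": 1, "Ename": "Ann", "Dept": "Research"},
    {"Ssn": 2, "Ename": "Bob", "Dept": "Research"},
]

# Splitting on the key Ssn rejoins cleanly; splitting on Dept creates spurious tuples.
good = rejoins_exactly(employees, ["Ssn", "Ename"], ["Ssn", "Dept"])
bad = rejoins_exactly(employees, ["Ssn", "Dept"], ["Ename", "Dept"])
print(good, bad)
```

In the second split the join attribute Dept does not determine either side, so every employee name is paired with every Ssn in the same department.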
Normal forms & Normal tests
The normal form of a relation schema is the highest normal form satisfied by the schema.
There are various normal forms, namely first normal form, second normal form, third normal form and Boyce-Codd normal form, together with tests to verify whether a relation schema is in a desired normal form.

First normal form (1NF)
A relation schema is said to be in first
normal form if all its attributes are atomic.

A schema which is not in first normal form

Dlocations is a
multi-valued attribute.

Conversion into first normal form

The above relation schema is decomposed into two relation schemas.

Now DEPARTMENT & DEPT_LOCATIONS are in 1NF.
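
As a rough illustration of this decomposition, the sketch below moves the multi-valued attribute into its own relation keyed by Dnumber. The sample values are those of the DEPARTMENT table used in these slides; the helper name split_multivalued and the single-valued column name Dlocation are assumptions made for the example.

```python
# Dlocations holds a list of values, so this relation is not in 1NF.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": ["Stafford"]},
    {"Dname": "Headquarters", "Dnumber": 1, "Dmgr_ssn": "888665555",
     "Dlocations": ["Houston"]},
]

def split_multivalued(rows, key, multi, single):
    """Return (base relation without `multi`, relation of (key, single) pairs)."""
    base = [{a: v for a, v in row.items() if a != multi} for row in rows]
    values = [{key: row[key], single: v} for row in rows for v in row[multi]]
    return base, values

department_1nf, dept_locations = split_multivalued(
    department, key="Dnumber", multi="Dlocations", single="Dlocation"
)

for row in dept_locations:
    print(row)
```

Each (Dnumber, Dlocation) pair becomes one tuple, so every attribute in both resulting relations is atomic.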

Conversion into first normal form
An alternative technique to decomposition also exists.
You may expand the primary key by incorporating the multi-valued attribute into it, so that each value of the attribute is stored in a separate tuple.

This solution has the disadvantage of introducing redundancy in the relation.

Conversion into first normal form
If the maximum number of values of the multi-valued attribute is known, then you may replace the multi-valued attribute by a number of attributes.
In the example, instead of using Dlocations, you may use three attributes, namely
Dlocation1, Dlocation2, Dlocation3
assuming that the maximum number of values of Dlocations can be three.
This solution has the disadvantage of introducing NULL values if most departments have fewer than three locations. It further introduces spurious semantics about the ordering among the location values that is not originally intended. Querying on this attribute becomes more difficult.
Multi-valued attribute replaced

DEPARTMENT
Dname           Dnumber  Dmgr_ssn   Dlocation1  Dlocation2  Dlocation3
Research        5        333445555  Bellaire    Sugarland   Houston
Administration  4        987654321  Stafford    NULL        NULL
Headquarters    1        888665555  Houston     NULL        NULL

Conversion into first normal form
First normal form does not allow complex attributes either.

Conversion into first normal form
Decompose

into two relations

Multiple multi-valued attributes
This relation is NOT in 1NF, so decompose it into two relations, namely

Multiple complex attributes can be dealt with in a similar fashion.
First, the composite part of each complex attribute is replaced by its components; then the multi-valued aspect of each component is dealt with in the above manner.

Second normal form (2NF)
A relation schema R is said to be in second normal form if

(i) it is in first normal form and

(ii) there is no partial dependency on the primary key of R.

Example

FD1: {Ssn, Pnumber} → {Hours} (It is a full functional dependency.)

FD2: {Ssn} → {Ename} (It is a partial functional dependency.)

FD3: {Pnumber} → {Pname, Plocation} (It is a partial functional dependency.)

EMP_PROJ is already in 1NF because all its attributes are atomic.

But it is NOT in 2NF because of the partial dependencies FD2 and FD3.
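
One way to find such partial dependencies mechanically is to compute the attribute closure of every proper subset of the primary key. The following is a minimal sketch of that idea; the function names closure and partial_dependencies are chosen for this example and are not part of the slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            determined = closure(part, fds) & non_key
            if determined:
                found[part] = determined
    return found

# EMP_PROJ with FD1, FD2 and FD3 from the example.
emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

violations = partial_dependencies({"Ssn", "Pnumber"}, emp_proj, fds)
for part, dependents in violations.items():
    print(part, "->", sorted(dependents))
```

Hours does not appear in the result because it depends on the whole key, which is what makes FD1 a full functional dependency.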

Decomposition into 2NF
In order to reduce the schema EMP_PROJ into 2NF, we decompose it with respect to each partial functional dependency.
Decomposition with respect to
{Ssn} → {Ename} results in
R1(Ssn, Ename) &
R2(Ssn, Pnumber, Hours, Pname, Plocation).
R1 is in 2NF but R2 is not because of the partial dependency {Pnumber} → {Pname, Plocation}.
So decompose R2 with respect to
{Pnumber} → {Pname, Plocation}

Decomposition into 2NF
The decomposition of R2 with respect to {Pnumber} → {Pname, Plocation} results in
R3(Pnumber, Pname, Plocation) &
R4(Ssn, Pnumber, Hours).
So decomposition of EMP_PROJ with respect to the partial dependencies results in three relation schemas, namely
R1(Ssn, Ename), R3(Pnumber, Pname, Plocation) &
R4(Ssn, Pnumber, Hours).
All these relations are in 2NF. (In fact, they are in higher normal forms than 2NF.)
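
The two decomposition steps can be sketched with plain set operations. The helper decompose_on_fd below is a hypothetical name for this example: for a dependency X → Y it returns one schema holding X and Y, and a second schema equal to the original with Y removed.

```python
def decompose_on_fd(schema, lhs, rhs):
    """Split schema on lhs -> rhs into (lhs plus rhs, schema without rhs)."""
    lhs = set(lhs)
    rhs = set(rhs) - lhs          # never remove the determinant itself
    return lhs | rhs, set(schema) - rhs

emp_proj = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}

# Step 1: decompose with respect to {Ssn} -> {Ename}.
r1, r2 = decompose_on_fd(emp_proj, {"Ssn"}, {"Ename"})

# Step 2: decompose R2 with respect to {Pnumber} -> {Pname, Plocation}.
r3, r4 = decompose_on_fd(r2, {"Pnumber"}, {"Pname", "Plocation"})

print(sorted(r1), sorted(r3), sorted(r4))
```

Because the determinant stays in both halves of each split, the halves can be joined back on it.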

Decomposed relation schemas of EMP_PROJ

R1 R3

R4

Third normal form (3NF)
A relation schema R is said to be in third normal form if

(i) it is in second normal form and

(ii) there is no transitive dependency of a non-key attribute on the primary key of R.

Example
Consider the relation schema

This schema is in 2NF because all its attributes are atomic and there is no partial dependency.

But it is not in 3NF because of the transitive dependency

{Dnumber} → {Dname, Dmgr_ssn}

Decomposition
So decompose the schema with respect to the transitive dependency.

After decomposition, one relation is R1(Dnumber, Dname, Dmgr_ssn)
and the other is
R2(Ename, Ssn, Address, Dnumber).
R1 and R2 are in 3NF.
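
A 3NF test can be sketched with attribute closures: a dependency X → A violates 3NF when X is not a superkey and A is a non-key attribute. The sketch below assumes that Ssn is the primary key of the original schema and that Ssn determines Ename, Address and Dnumber; the slides do not state this dependency explicitly, so treat it as an assumption, along with the function names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attributes, key, fds):
    """List (lhs, non-key rhs) pairs where lhs is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        non_key_rhs = set(rhs) - set(key) - set(lhs)
        if not is_superkey and non_key_rhs:
            violations.append((set(lhs), non_key_rhs))
    return violations

# Assumed dependency: Ssn determines the employee attributes (not stated in the slides).
fd_emp = ({"Ssn"}, {"Ename", "Address", "Dnumber"})
fd_dept = ({"Dnumber"}, {"Dname", "Dmgr_ssn"})

emp_dept = {"Ename", "Ssn", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
before = third_nf_violations(emp_dept, {"Ssn"}, [fd_emp, fd_dept])

# After decomposition each relation keeps only the dependency that applies to it.
r1_check = third_nf_violations({"Dnumber", "Dname", "Dmgr_ssn"}, {"Dnumber"}, [fd_dept])
r2_check = third_nf_violations({"Ename", "Ssn", "Address", "Dnumber"}, {"Ssn"}, [fd_emp])

print(before, r1_check, r2_check)
```

In the original schema Dnumber is not a superkey, so the dependency on Dname and Dmgr_ssn is reported; in R1 the same dependency has a key on its left-hand side and no longer counts as a violation.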

Decomposition

Boyce-Codd normal form (BCNF)
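A relation schema is said to be in BCNF if
(i) it is in third normal form and
(ii) no key attribute depends on a non-key attribute. More precisely, the left-hand side of every non-trivial functional dependency must be a superkey.

FD1: AB → C
FD2: C → B

This relation schema is NOT in BCNF because of the dependency C → B: C is not a superkey, yet it determines the key attribute B.

Decompose R into R1(C, B) & R2(A, C)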
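
The superkey condition can be checked directly with attribute closures. The sketch below is a minimal illustration on R(A, B, C); attributes are single letters so that a string such as "AB" stands for the set {A, B}, and the function names are assumptions made for the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs; each FD is a (lhs, rhs) pair of letter strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    attributes = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if not closure(lhs, fds) >= attributes:
            violations.append((lhs, rhs))
    return violations

r_violations = bcnf_violations("ABC", [("AB", "C"), ("C", "B")])
r1_violations = bcnf_violations("CB", [("C", "B")])
r2_violations = bcnf_violations("AC", [])

print(r_violations, r1_violations, r2_violations)
```

Note that AB → C can no longer be checked inside R1 or R2 alone, so this BCNF decomposition is not dependency preserving.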

Another example
Consider another relation schema TEACH(Student#, Course#, Instructor#) and suppose that Instructor# → Course#.
This functional dependency means that an instructor can teach at most one course.
Since Course# is a key attribute, the schema TEACH is NOT in BCNF.
Decomposition of TEACH with respect to Instructor# → Course# results in
R1(Instructor#, Course#) & R2(Student#, Instructor#)

Fourth normal form (4NF)
A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

A multi-valued dependency, written A →→ B, exists if a single value of A is associated with multiple values of B and that set of B values is independent of the other attributes of the relation.
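
Whether a multi-valued dependency holds in a given instance can be tested by checking that, for every pair of tuples agreeing on A, the tuple that combines the B values of one with the remaining values of the other is also present. The sketch below uses a hypothetical STUDENT(Student, Course, Hobby) instance and a hypothetical helper name; neither appears in the slides.

```python
def satisfies_mvd(rows, x, y):
    """Check the multi-valued dependency x ->> y on a relation instance (list of dicts)."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Take x and y from t1 and the remaining attributes from t2.
                mixed = {a: t1[a] for a in list(x) + list(y)}
                mixed.update({a: t2[a] for a in z})
                if tuple(mixed[a] for a in attrs) not in present:
                    return False
    return True

# Hypothetical data: courses and hobbies of a student are independent of each other.
student = [
    {"Student": "s1", "Course": "Math", "Hobby": "Chess"},
    {"Student": "s1", "Course": "Math", "Hobby": "Tennis"},
    {"Student": "s1", "Course": "Physics", "Hobby": "Chess"},
    {"Student": "s1", "Course": "Physics", "Hobby": "Tennis"},
]

full = satisfies_mvd(student, ["Student"], ["Course"])
missing_one = satisfies_mvd(student[:3], ["Student"], ["Course"])
print(full, missing_one)
```

The first instance stores every course and hobby combination, which is the redundancy that 4NF removes by splitting the relation into (Student, Course) and (Student, Hobby).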
Fifth normal form (5NF)
A relation is in 5NF if it is in 4NF and does not contain any join dependency, and joining should be lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
5NF is also known as project-join normal form (PJ/NF).
In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this case, the combination of all these fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not know the subject or who will be taking it, so we leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
A schema with two keys

Primary Key = {Property_id#}

Secondary Key = {County_name, Lot#}

FDs on the schema
FD1: {Property_id#} → {County_name, Lot#, Area, Price, Tax_rate}

FD2: {County_name, Lot#} → {Property_id#, Area, Price, Tax_rate}

FD3: {County_name} → {Tax_rate} (partial FD)

FD4: {Area} → {Price} (transitive FD)
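
That both attribute sets are keys can be confirmed by computing their closures under FD1 to FD4 and checking minimality. This is a minimal sketch; the function names are chosen for the example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_candidate_key(attrs, all_attrs, fds):
    """True if attrs determines every attribute and no attribute of attrs can be dropped."""
    attrs, all_attrs = set(attrs), set(all_attrs)
    if closure(attrs, fds) != all_attrs:
        return False
    return all(closure(attrs - {a}, fds) != all_attrs for a in attrs)

schema = {"Property_id#", "County_name", "Lot#", "Area", "Price", "Tax_rate"}
fds = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),   # FD1
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),   # FD2
    ({"County_name"}, {"Tax_rate"}),                                            # FD3
    ({"Area"}, {"Price"}),                                                      # FD4
]

primary_ok = is_candidate_key({"Property_id#"}, schema, fds)
secondary_ok = is_candidate_key({"County_name", "Lot#"}, schema, fds)
county_only = is_candidate_key({"County_name"}, schema, fds)
print(primary_ok, secondary_ok, county_only)
```

County_name alone determines only Tax_rate, which is why FD3 is a dependency on part of a key rather than on a key.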

Partial dependency
The attributes that are not part of any of the keys are called
non-key attributes.

Dependency of a non-key attribute on a part of any of the keys is defined as partial dependency.

General definition of second normal form
A relation schema is said to be in second normal form if
(i) it is in first normal form and
(ii) there is no partial dependency of a non-key attribute on any of the keys of the schema.
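
The general definition can be tested by examining the proper subsets of every key, not only of the primary key. The sketch below applies this to the property schema with its two keys and the dependencies FD1 to FD4; the function names are assumptions made for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def general_2nf_violations(keys, attributes, fds):
    """List (part of a key, non-key attributes it determines) over all keys."""
    non_key = set(attributes) - set().union(*keys)
    violations = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) & non_key
                if dependents:
                    violations.append((part, dependents))
    return violations

schema = {"Property_id#", "County_name", "Lot#", "Area", "Price", "Tax_rate"}
keys = [{"Property_id#"}, {"County_name", "Lot#"}]
fds = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),
    ({"County_name"}, {"Tax_rate"}),
    ({"Area"}, {"Price"}),
]

found = general_2nf_violations(keys, schema, fds)
print(found)
```

A test against the primary key {Property_id#} alone would miss this violation, because a single-attribute key has no proper non-empty subset.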

Decomposition into 2NF

Decomposition into 3NF

Not in 3NF

Decomposition into 3NF

Exercise 2
Consider the following relation for published books:
BOOK (Book_title, Author_name, Book_type, List_price, Author_affil, Publisher).
Author_affil refers to the affiliation of author.
Suppose that the following dependencies exist:

Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil

What normal form is the relation in? Explain your answer.

Apply normalization until you cannot decompose the relations further. State the reasons behind each decomposition.

Exercise 4
Consider the universal relation
R = {A, B, C, D, E, G, H, I, J, K}
and the set of functional dependencies
F = {AB → C, A → DE, B → K, K → GH, D → IJ}.

What is the key for R? Decompose R into 2NF and then 3NF
relations.
