Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 48

Normalizationis a process of organizing the data in

database to avoid data redundancy, insertion anomaly,


update anomaly & deletion anomaly.
There are three types of anomalies that occur
when the database is not normalized. These
are Insertion, update and deletion anomaly.

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004


Update anomaly: In the above table we have two
rows for employee Rick as he belongs to two
departments of the company. If we want to update
the address of Rick then we have to update the
same in two rows or the data will become
inconsistent. If somehow, the correct address gets
updated in one department but not in other then as
per the database, Rick would be having two
different addresses, which is not correct and would
lead to inconsistent data.
Insert anomaly: Suppose a new employee joins
the company, who is under training and currently
not assigned to any department then we would not
be able to insert the data into the table if emp_dept
field doesnt allow nulls.
Delete anomaly: Suppose, if at a point of time the
company closes the department D890 then deleting
the rows that are having emp_dept as D890 would
also delete the information of employee Maggie
since she is assigned only to this department.
Un-Normalized Form (UNF)
If a table contains non-atomic values at each row, it is
said to be in UNF. An atomic value is something that
can not be further decomposed. A non-atomic value,
as the name suggests, can be further decomposed and
simplified. Consider the following table:
Emp-Id Emp-Name Month Sales Bank-Id Bank-Name

E01 AA Jan 1000 B01 SBI

Feb 1200

Mar 850

E02 BB Jan 2200 B02 UTI

Feb 2500

E03 CC Jan 1700 B01 SBI

Feb 1800

Mar 1850

Apr 1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id.
Although considered to be the primary key, Emp-Id cannot give us the unique identification
facility for any single row. Further, each primary key points to a variable length record (3
for E01, 2 for E02 and 4 for E03).
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce & Codd normal form (BCNF)
A relation is said to be in 1NF if it contains no non-
atomic values and each row can provide a unique
combination of values.

As per the rule of first normal form, an


attribute (column) of a table cannot hold
multiple values. It should hold only atomic
values.
Emp-Id Emp-Name Month Sales Bank-Id Bank-Name

E01 AA Jan 1000 B01 SBI

E01 AA Feb 1200 B01 SBI

E01 AA Mar 850 B01 SBI

E02 BB Jan 2200 B02 UTI

E02 BB Feb 2500 B02 UTI

E03 CC Jan 1700 B01 SBI

E03 CC Feb 1800 B01 SBI

E03 CC Mar 1850 B01 SBI

E03 CC Apr 1725 B01 SBI

As you can see now, each row contains unique combination of values.
Unlike in UNF, this relation contains only atomic values, i.e. the rows
can not be further decomposed, so the relation is now in 1NF.
emp_addres
p_id emp_name emp_address emp_mobile emp_id emp_name emp_mobile
s

1 Herschel New Delhi 8912312390


101 Herschel New Delhi 8912312390

8812121212
102 Jon Kanpur 8812121212
2 Jon Kanpur
9900012222 102 Jon Kanpur 9900012222

3 Ron Chennai 7778881212 103 Ron Chennai 7778881212

9990000123 104 Lester Bangalore 9990000123


4 Lester Bangalore
8123450987 104 Lester Bangalore 8123450987
A relation is said to be in 2NF if already it is in 1NF and each and
every attribute fully depends on the primary key of the relation.
Speaking inversely, if a table has some attributes which is not
dependant on the primary key of that table, then it is not in 2NF.

. Emp-Id is the primary key of the above relation. Emp-Name,


Month, Sales and Bank- Name all depend upon Emp-Id.
But the attribute Bank-Name depends on Bank-Id, which is not the
primary key of the table. So the table is in 1NF, but not in 2NF.
If this position can be removed into another related relation, it would
come to 2NF.
A table is said to be in 2NF if both the
following conditions hold:
A relation is said to be in 2NF if it is already
in 1NF
and each and every attribute fully depends
on the primary key of the relation.
non-prime attribute
The attribute which is not in the primary key
is called non-prime attribute.
Emp-Id Emp-Name Month Sales Bank-Id

E01 AA JAN 1000 B01

E01 AA FEB 1200 B01

E01 AA MAR 850 B01 Bank-Id Bank-Name

E02 BB JAN 2200 B02 B01 SBI

E02 BB FEB 2500 B02 B02 UTI

E03 CC JAN 1700 B01

E03 CC FEB 1800 B01

E03 CC MAR 1850 B01

E03 CC APR 1726 B01


After removing the portion into another relation we store lesser amount
of data in two relations without any loss information. There is also a
significant reduction in redundancy.
Example: Suppose a school wants to store
the data of teachers and the subjects they
teach. They create a table that looks like
this: Since a teacher can teach more than
one subjects, the table can have multiple
rows for a same teacher.
teacher_id subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non prime attribute: teacher_age

The table is in 1 NF because each attribute has atomic values.


However, it is not in 2NF because non prime attribute teacher_age
is dependent on teacher_id alone which is a proper subset of
candidate key. This violates the rule for 2NF as the rule says
nonon-prime attribute is dependent on the proper subset of any
candidate key of the table.
teacher_details table

teacher_id teacher_age
teacher_subject
111 38 table:

222 38 teacher_id subject

333 40 111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry
The attributes of a table is said to be dependent on
each other when an attribute of a table uniquely
identifies another attribute of the same table.

For example: Suppose we have a student table with


attributes:
Stu_Id, Stu_Name, Stu_Age.

Here Stu_Id attribute uniquely identifies the Stu_Name


attribute of student table because if we know the
student id we can tell the student name associated
with it.
This is known as functional dependency and can be
written as Stu_Id->Stu_Name or in words we can say
Stu_Name is functionally dependent on Stu_Id.
If column A of a table uniquely identifies
the column B of same table then it can
represented as A->B (Attribute B is
functionally dependent on attribute A)

Functional dependency is represented by


an arrow sign () that is, XY, where X
functionally determines Y.
The left-hand side attributes determine
the values of attributes on the right-hand
side.
Since for each value of A there is associated one and only one value of B.
Since for A = 3 there is associated more than one value of B.
Various Types of Functional Dependencies are

Single Valued Functional Dependency
Fully Functional Dependency
Partial Functional Dependency
Transitive Functional Dependency
Trivial Functional Dependency
A simple example of single value functional
dependency is when A is the primary key of
an entity (eg. SID) and B is some single
valued attribute of the entity (eg. Sname).
Then, A B must always hold.

CID SID Sname


C1 S1 A
C1 S2 A
C2 S1 A
C3 S1 A
Sname is depend on SID
SIDSname
S1 A
S1 A
S1 A

SID is not depend on Sname .


SnameSID
A SI
A S2
A functional dependency P Q is full
functional dependency if removal of any
attribute A from P means that the
dependency does not hold any more.
If AD C, is fully functional dependency,

then we cannot remove A or D. i.e. C is fully


functional dependent on AD. If we are able
to remove A or D, then it is not full
functional dependency.
{SSN, PNUMBER} HOURS is a full FD since neither SSN HOURS
nor PNUMBER HOURS hold
{SSN, PNUMBER} ENAME is not a full FD (it is called a partial
dependency ) since SSN ENAME also holds
A Functional Dependency in which one or more non key attributes are
functionally depending on a part of the primary key is called partial
functional dependency. Or

where the determinant consists of key attributes, but not the entire primary
key, and the determined consist of non-key attributes.

For example, Consider a Relation R(A,B,C,D,E) having FD : AB CDE


where PK is AB.
Then, { A C; A D; A E; B C; B D; B E } all are Partial
Dependencies.
Given a relation R(A,B,C) then dependency like A>B, B>C is a
transitive dependency, since A>C is implied .

Ex: SSN --> DMGRSSN is a transitive FD {since SSN -->


DNUMBER and DNUMBER --> DMGRSSN hold} SSN -->
ENAME is non-transitive FD since there is no set of attributes X
where SSN --> X and X --> ENAME.
A relation is said to be in 3NF, if it is already in 2NF and there exists no
transitive dependency in that relation. Speaking inversely, if a table
contains transitive dependency, then it is not in 3NF, and the table must be
split to bring it into 3NF.

What is a transitive dependency?


Within a relation if we see
A B [B depends on A]
And
B C [C depends on B]
Then we may derive
A C[C depends on A]
Such derived dependencies hold well in most of the situations.
For example if we have Roll Marks
And
Marks Grade
Then we may safely derive Roll Grade.
This third dependency was not originally specified but we have
derived it.

The derived dependency is called a transitive dependency when


such dependency becomes improbable.
emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam

1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}


so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are
not part of any candidate keys.
Here, emp_state, emp_city & emp_district dependent on emp_zip. And,
emp_zip is dependent on emp_id that makes non-prime attributes (emp_state,
emp_city & emp_district) transitively dependent on super key (emp_id). This
violates the rule of 3NF.
Employee table
2NF=3NF Employee_zip table

emp_id emp_name emp_zip emp_zip emp_state emp_city emp_district

1001 John 282005 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 222008 TN Chennai M-City

1006 Lora 282007 282007 TN Chennai Urrapakkam

1101 Lilly 292008 292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan


1201 Steve 222999
Student_Detail Table :

Student Studen DOB Street city State Zip


_id t_nam
e

New Student_Detail Table :

Student_id Student_na DOB Zip


me

Address Table :
Zip Street city state
A relationship is said to be in BCNF if it is already in 3NF
and the left hand side of every dependency is a candidate
key. A relation which is in 3NF is almost always in BCNF.
These could be same situation when a 3NF relation may not
be in BCNF the following conditions are found true.

1. The candidate keys are composite.


2. There are more than one candidate keys in the relation.
3. There are some common attributes in the relation.
Professor Code Department Head of Dept. Percent Time

P1 Physics Ghosh 50

P1 Mathematics Krishnan 50

P2 Chemistry Rao 25

P2 Physics Ghosh 75

P3 Mathematics Krishnan 100

Consider, as an example, the above relation. It is assumed that:


1. A professor can work in more than one department
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the
names of Dept. and Head of Dept. are duplicated. Further, if
Professor P2 resigns, rows 3 and 4 are deleted. We lose the
information that Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new


relation for Dept. and Head of Dept. and deleting Head of
Dept. form the given relation. The normalized relations are
shown in the following.
Example: Suppose there is a company wherein employees work
inmore than one department. They store the data like this:

emp_nationalit dept_typ dept_no_of_em


emp_id emp_dept
y e p
Production
1001 Austrian and D001 200
planning
1001 Austrian stores D001 250

design and
1002 American technical D134 100
support

Purchasing
1002 American D134 600
department
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone
are keys.
To make the table comply with BCNF we can break the table in
three tables like this:

emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:

dept_no_o
emp_dept dept_type
f_emp emp_dept_mapping table
Production
and D001 200 emp_id emp_dept
planning Production
1001
stores D001 250 and planning
design 1001 stores
and design and
D134 100
technical 1002 technical
support support
Purchasin Purchasing
g 1002
D134 600 department
departme
nt
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:

For first table: emp_id


For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both the functional

dependencies left side part is a key.


When attributes in a relation have multi-valued dependency, further Normalization
to 4NF and 5NF are required. Let us first find out what multi-valued dependency is.

A multi-valued dependency is a typical kind of dependency in which each and


every attribute within a relation depends upon the other, yet none of them is a
unique primary key.

We will illustrate this with an example. Consider a vendor supplying many items to
many projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.
A multi valued dependency exists here because all the
attributes depend upon the other and yet none of them
is a primary key having unique value.

Vendor Code Item Code Project No.


V1 I1 P1
V1 I2 P1
V1 I1 P3
V1 I2 P3
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3
The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for
item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned above.
The problem is reduced by expressing this relation as two relations in the Fourth Normal Form
(4NF). A relation is in 4NF if it has no more than one independent multi valued dependency or one
independent multi valued dependency with a functional dependency.

The table can be expressed as the two 4NF relations given as following. The fact that vendors are
capable of supplying certain items and that they are assigned to supply for some projects in
independently specified in the 4NF relation.
These relations still have a problem. While defining the 4NF we mentioned
that all the attributes depend upon each other. While creating the two tables
in the 4NF, although we have preserved the dependencies between Vendor
Code and Item code in the first table and Vendor Code and Item code in the
second table, we have lost the relationship between Item Code and Project
No. If there were a primary key then this loss of dependency would not
have occurred. In order to revive this relationship we must add a new table
like the following. Please note that during the entire process of
normalization, this is the only step where a new table is created by joining
two attributes, rather than splitting them into separate tables.

You might also like