
NORMALIZATION OF DATABASE TABLES MODULE 4

 DATABASE TABLES AND NORMALIZATION


Good relational database software is not enough to avoid data redundancy in
database systems. If the database tables are treated as though they are files in a file
system, the RDBMS never has a chance to demonstrate its superior data-handling
capabilities.
The table is the basic building block in the database design process.
Consequently, the table’s structure is of great interest. Ideally, a database design
process based on Entity Relationship (ER) modeling yields good table structures.

Yet it is possible to create poor table structures even in a good database design.
So how do you recognize a poor table structure, and how do you produce a good
table?
The answer to both questions involves normalization.
Normalization is a process for evaluating and correcting table structures to minimize
data redundancies, thereby reducing the likelihood of data anomalies.
Normalization works through a series of stages called normal forms.
The first three stages are described as first normal form (1NF), second normal form
(2NF), and third normal form (3NF).
From a structural point of view, 2NF is better than 1NF, and 3NF is better than 2NF.
For most purposes in business database design, 3NF is as high as we need to go in the
normalization process, and properly designed 3NF structures often also meet the
requirements of fourth normal form (4NF).

Although normalization is a very important database design ingredient, we should not
assume that the highest level of normalization is always the most desirable.
Generally, the higher the normal form, the more relational join operations required to
produce a specified output and the more resources required by the database system to
respond to end-user queries.
A successful design must also consider end-user demand for fast performance.
Therefore, we will occasionally be expected to denormalize some portions of a database
design in order to meet performance requirements.
Denormalization produces a lower normal form; for example, a 3NF structure is converted
to a 2NF structure through denormalization. However, the price we pay for increased
performance through denormalization is greater data redundancy.

Dept., of CSE, GST



 THE NEED FOR NORMALIZATION


Normalization is the process of removing redundant data from your tables in order to
improve storage efficiency, data integrity and scalability. This improvement is balanced
against an increase in complexity and potential performance losses from the joining of
the normalized tables at query-time.
There are two goals of the normalization process:
 Eliminating redundant data (for example, the same data stored in more than one
table).
 Ensuring data dependencies make sense (only storing related data in a table).
Both of these are worthy goals, as they reduce the amount of space a database consumes
and ensure that data is logically stored.

Normalization is one of the aims of a well-designed relational database. It is a
step-by-step set of rules by which data is put into its simplest form.
We normalize a relational database for the following reasons:
 To minimize data redundancy, i.e. no unnecessary duplication of data.
 To make the database structure flexible, i.e. it should be possible to add new data
values and rows without reorganizing the database structure.
 To keep data consistent throughout the database, i.e. it should not suffer from the
following anomalies:
o Insert Anomaly - A row cannot be inserted because part of the data it needs is not
yet available; key columns cannot hold null values, so the insertion is blocked.
This kind of anomaly can seriously damage a database.
o Update Anomaly - Caused by data redundancy, i.e. multiple occurrences of the same
value in a column. Every copy must be changed, which is inefficient and leads to
inconsistency if a copy is missed.
o Deletion Anomaly - Deleting a row also deletes facts that are not stored
elsewhere. It could result in the loss of vital data.
 To make complex queries required by the user easy to handle.
When normalization decomposes a relation into smaller relations with fewer attributes,
joining the resulting relations must reproduce the original relation exactly, with no
extra rows and no missing rows. The join operations can be performed in any order. This
is known as lossless join decomposition.
The relations (tables) obtained by normalization should have the following properties:
each row is identified by a unique key, there are no repeating groups, each column is
homogeneous, and each column has a unique name.


Data Anomalies
 Normalization is the process of splitting relations into well-structured relations
that allow users to insert, delete, and update tuples without introducing database
inconsistencies.
 Without normalization, many problems can occur when trying to load an integrated
conceptual model into the DBMS.
 These problems, which arise from relations that are generated directly from user
views, are called anomalies.
 There are three types of anomalies: update, deletion and insertion anomalies.

For example, each employee in a company has a department associated with them as
well as the student group they participate in.

Employee_ID  Name           Department  Student_Group
123          J. Longfellow  Accounting  Beta Alpha Psi
234          B. Rech        Marketing   Marketing Club
234          B. Rech        Marketing   Management Club
456          A. Bruchs      CIS         Technology Org.
456          A. Bruchs      CIS         Beta Alpha Psi

If A. Bruchs’ department is recorded incorrectly, it must be corrected in both of that
employee's rows or the database will contain inconsistent data. If the user performing
the update does not realize that the data is stored redundantly, the update will not be
done properly.

An update anomaly is a data inconsistency that results from data redundancy and a
partial update.

A deletion anomaly is the unintended loss of data due to deletion of other data.

For example, if the student group Beta Alpha Psi disbanded and was deleted from the
table above, J. Longfellow and the Accounting department would cease to exist. This
results in database inconsistencies and is an example of how combining information
that does not really belong together into one table can cause problems.

An insertion anomaly is the inability to add data to the database due to absence of
other data.

For example, assume Student_Group is defined so that null values are not allowed. If a
new employee is hired but not immediately assigned to a Student_Group then this
employee could not be entered into the database. This results in database
inconsistencies due to omission.


Update, deletion, and insertion anomalies are very undesirable in any database.
Anomalies are avoided by the process of normalization.

 THE NORMALIZATION PROCESS


We use normalization to produce a set of normalized tables that store the data used to
generate the required information.

The objective of normalization is to ensure that each table conforms to the concept of
well-formed relations, that is, tables that have the following characteristics:

 Each table represents a single subject. For example, a course table will contain only
data that directly pertains to courses. Similarly, a student table will contain only
student data.
 No data item will be unnecessarily stored in more than one table (in short, tables
have minimum controlled redundancy). The reason for this requirement is to
ensure that the data are updated in only one place.
 All nonprime attributes in a table are dependent on the primary key—the entire
primary key and nothing but the primary key. The reason for this requirement is to
ensure that the data are uniquely identifiable by a primary key value.
 Each table is void of insertion, update, or deletion anomalies. This is to ensure the
integrity and consistency of the data.

To accomplish the objective, the normalization process takes you through the steps that
lead to successively higher normal forms. The most common normal forms and their
basic characteristics are listed in the table below.

NORMAL FORM                     CHARACTERISTIC
First normal form (1NF)         Table format, no repeating groups, and PK identified
Second normal form (2NF)        1NF and no partial dependencies
Third normal form (3NF)         2NF and no transitive dependencies
Boyce-Codd normal form (BCNF)   Every determinant is a candidate key (special case of 3NF)
Fourth normal form (4NF)        3NF and no independent multivalued dependencies

 From the data modeler’s point of view, the objective of normalization is to ensure that
all tables are at least in third normal form (3NF). Even higher-level normal forms exist.
However, normal forms such as the fifth normal form (5NF) and domain-key normal
form (DKNF) are not likely to be encountered in a business environment and are mainly
of theoretical interest.
Such higher normal forms usually increase joins (slowing performance) without adding
any value in the elimination of data redundancy. Some very specialized applications,
such as statistical research, might require normalization beyond the 4NF, but those
applications fall outside the scope of most business operations. Because this book
focuses on practical applications of database techniques, the higher-level normal forms
are not covered.

Functional Dependency
A functional dependency is a relationship that exists when one attribute (or set of
attributes) uniquely determines another attribute.
If R is a relation with attributes X and Y, a functional dependency between the
attributes is written X -> Y, which specifies that Y is functionally dependent on X.
Here X is the determinant and Y is the dependent attribute. Each value of X is
associated with precisely one value of Y.
A functional dependency serves as a constraint between two sets of attributes.
Identifying functional dependencies is an important part of relational database design
and is the basis of normalization.
Full Functional Dependency: In a relation, Y is fully functionally dependent on X when
X -> Y holds and Y is not functionally dependent on any proper subset of X.
Partial Functional Dependency: In a relation, a partial dependency exists when a
non-prime attribute (an attribute that is not part of any candidate key) is
functionally dependent on a proper subset of a candidate key.
For example: let there be a relation R(Course, Sid, Sname, Fid, Schedule, Room, Marks)
with candidate key {Course, Sid}.
 Full functional dependency: {Course, Sid} -> Marks, because Marks depends on neither
Course alone nor Sid alone.
 Partial functional dependencies: Course -> Schedule, Course -> Room, Sid -> Sname.


Trivial functional dependency
o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Non-trivial functional dependency
o A → B is a non-trivial functional dependency if B is not a subset of A.
o When the intersection of A and B is empty, A → B is called completely non-trivial.

Six rules IR1 through IR6 (inference rules for functional dependencies)

 IR1 (reflexive rule): If X ⊇ Y, then X→Y.
 IR2 (augmentation rule): {X→Y} |= XZ→YZ.
 IR3 (transitive rule): {X→Y, Y→Z} |= X→Z.
 IR4 (decomposition, or projective, rule): {X→YZ} |= X→Y.
 IR5 (union, or additive, rule): {X→Y, X→Z} |= X→YZ.
 IR6 (pseudotransitive rule): {X→Y, WY→Z} |= WX→Z.

Detailed explanation:

IR1 (reflexive rule)
 In the reflexive rule, if Y is a subset of X, then X determines Y.
If X ⊇ Y, then X → Y.

IR2 (augmentation rule)
 In augmentation, if X determines Y, then XZ determines YZ for any Z.
If X → Y then XZ → YZ
 Example:
sid -> sname then
sid,phone -> sname,phone

IR3 (transitive rule):
 In the transitive rule, if X determines Y and Y determines Z, then X must also determine Z.
If X → Y and Y → Z then X → Z
 Example:
sid -> sname and sname -> city then sid -> city

IR4 (decomposition, or projective, rule):
 This rule says that if X determines Y and Z together, then X determines Y and X
determines Z separately.
If X → YZ then X → Y and X → Z

IR5 (union, or additive, rule):
 The converse of decomposition: if X determines Y and X determines Z, then X determines
Y and Z together.
If X → Y and X → Z then X → YZ

IR6 (pseudotransitive rule):
 In the pseudotransitive rule, if X determines Y and WY determines Z, then WX determines Z.
If X → Y and WY → Z then WX → Z

Normal forms
 First normal form

First normal form is the first step in database normalization.
The following conditions must be satisfied for a table to be in first normal form:
 All fields should have atomic values. Atomicity means that every column is split
until its values cannot be split any further. We cannot use one tuple for two
entities, and we cannot merge attributes into one column.
 In other words, avoid storing values in comma-separated format in the database.
In most cases they cause a table to fail first normal form.

Ex (wrong: the first row stores two values in NAME):
ID  NAME
1   a,b
2   c


 Each value in a column should be of the same kind (that is, of the same data type).

Ex (wrong: ID holds both a number and a letter):
ID  NAME
1   a
h   c

 Each column should have a unique name in a table.

Ex:
Wrong:              Right:
ID  NAME  NAME      ID  FNAME  LNAME
1   A     B         1   A      B

 The order in which the data is stored in the table does not matter.


Problems 1NF Addresses

There are two cases in which a table fails 1NF, and they cause two different
problems.
If a table stores a multi-valued attribute in comma-separated format, it becomes
difficult to update or insert another value: one must retrieve the whole string, change
it, and then write it back to the database.
In the other case, one tuple is used to represent two entities because all of their
other attribute values are the same, so the two entities can no longer be told apart.
Normalization to 1NF
A table can be normalized to 1NF either by adding more columns to make the tuples
unique or by using a separate tuple for every value of a multi-valued attribute,
depending on which property of 1NF the table design lacks.
Example 1: Relation STUDENT in Table 1 is not in 1NF because of the multi-valued
attribute Subject. Its decomposition into 1NF is shown in Table 2.
Table 1
Rollno  Name  Subject
101     Akon  Os, cn
102     Lkon  Java
103     Bkon  c

Table 2
Rollno  Name  Subject
101     Akon  Os
101     Akon  Cn
102     Lkon  Java
103     Bkon  c
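
The step from Table 1 to Table 2 can be sketched in a few lines of Python. This is only an illustration: the function name `to_1nf` and the tuple layout are assumptions made for this sketch, not part of the original notes.

```python
# Table 1 from the example: Subject holds a comma-separated list (not atomic).
table1 = [
    (101, "Akon", "Os, cn"),
    (102, "Lkon", "Java"),
    (103, "Bkon", "c"),
]

def to_1nf(rows):
    """Emit one row per (rollno, name, single subject)."""
    result = []
    for rollno, name, subjects in rows:
        for subject in subjects.split(","):
            result.append((rollno, name, subject.strip()))
    return result

table2 = to_1nf(table1)
for row in table2:
    print(row)
```

Each multi-valued Subject entry becomes one row per value, so student 101 now appears twice, once for each subject.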


The first normal form states that:

 Every column in the table must be unique.
 Separate tables must be created for each set of related data.
 Each table must be identified with a unique column or concatenated columns called
the primary key.
 No rows may be duplicated.
 No columns may be duplicated.
 No row/column intersections contain a null value.
 No row/column intersections contain multi-valued fields.


 Second Normal Form

A table is in second normal form if it satisfies the following conditions:

 It is in first normal form.
 All non-key attributes are fully functionally dependent on the primary key (that is,
it has no partial dependencies).

Note that if the primary key is not a composite key, all non-key attributes are always
fully functionally dependent on the primary key.

A table that is in first normal form and has a single-attribute primary key is
automatically in second normal form.

Example 1: Consider the following example:

The table in this example is in 1NF, since all the attributes are single valued. But it
is not yet in 2NF. If student 1 leaves the university and the tuple is deleted, then we
lose all the information about professor Schmid, since this attribute is fully
functionally dependent on the primary key IDSt. To solve this problem, we must create a
new table Professor with the attribute Professor (the name) and the key IDProf. A third
table, Grade, is necessary to combine the two relations Student and Professor and to
manage the grades. Besides the grade, it contains only the two IDs of the student and
the professor. If a student is now deleted, we do not lose the information about the
professor.

Example 2: Suppose a school wants to store the data of teachers and the subjects they
teach. Since a teacher can teach more than one subject, the table can have multiple
rows for the same teacher. They create a table that looks like this:

teacher_id  subject    teacher_age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Candidate key: {teacher_id, subject}
Non-prime attribute: teacher_age

The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF, which says that
“no non-prime attribute is dependent on a proper subset of any candidate key of the
table”.

To make the table comply with 2NF, we can break it into two tables like this:

teacher_details table:
teacher_id  teacher_age
111         38
222         38
333         40

teacher_subject table:
teacher_id  subject
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry
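
As a rough check of this decomposition, the Python sketch below projects the original teacher table onto the two new tables and joins them back on teacher_id. The variable names are illustrative assumptions; only the data comes from the example above.

```python
# Original table: (teacher_id, subject, teacher_age)
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Projections; using sets removes the repeated (teacher_id, teacher_age) pairs.
teacher_details = {(tid, age) for tid, _, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _ in teacher}

# Natural join of the two projections on teacher_id.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_details
    if tid == tid2
}

print(len(teacher_details))        # each teacher's age is now stored once
print(rejoined == set(teacher))    # the join gives back the original rows
```

The age of each teacher is stored in exactly one row of teacher_details, and the join returns the original five rows, so nothing was lost by splitting the table.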


3rd Normal Form

A table is in third normal form if it satisfies the following conditions:

 It is in second normal form.
 There is no transitive functional dependency.

By transitive functional dependency, we mean the following relationships in the table:
A is functionally dependent on B, and B is functionally dependent on C. In this case, A
is transitively dependent on C via B. (In other words, a non-prime attribute should not
depend on another non-prime attribute.)

3rd Normal Form Example

Consider the following example:

A bank uses the following relation: Vendor(ID, Name, Account_No, Bank_Code_No, Bank)
The attribute ID is the identification key. All attributes are single valued (1NF). The
table is also in 2NF.
The following dependencies exist:

ID is the prime attribute; the other attributes (Name, Account_No, Bank_Code_No, Bank)
are non-prime attributes.
1. Name, Account_No and Bank_Code_No are functionally dependent on ID (ID --> Name,
Account_No, Bank_Code_No).
2. Bank is functionally dependent on Bank_Code_No (Bank_Code_No --> Bank), which is a
non-prime attribute.


The table in this example is in 1NF and 2NF. But there is a transitive dependency
between ID and Bank via Bank_Code_No, because Bank_Code_No is not the primary key of
this relation. To reach third normal form (3NF), we have to put the bank name in a
separate table together with the bank code number that identifies it.

Example 2: Consider a student table.

Rollno  State      City       Sname
1       Karnataka  Bangalore  Ram
2       Karnataka  Hubli      Shree

In this table Rollno is a prime attribute, and State, City and Sname are non-prime
attributes.

The following dependencies exist:

1. Rollno -> City, City -> State => Rollno -> State

Here City and State are both non-prime attributes, and one depends on the other. Rollno
determines State through the non-prime attribute City. (The sample rows show that State
cannot determine City, because Karnataka appears with two different cities.)

To bring the table into 3NF, divide it into two tables:

Table 1: Rollno, Sname, City
Table 2: City, State


 Boyce and Codd Normal Form (BCNF)

Boyce and Codd Normal Form is a stricter version of the third normal form, and is also
known as 3.5 Normal Form.
This form deals with a certain type of anomaly that is not handled by 3NF. A 3NF table
that does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
For a table to be in BCNF, the following conditions must be satisfied:
 R must be in 3rd Normal Form.
 For each non-trivial functional dependency (X → Y), X should be a super key.
(Note: in particular, a non-prime attribute should never determine a prime attribute.)
Super key: a combination of one or more attributes that uniquely identifies a tuple
in a table.

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:

1. It should be in the Third Normal Form.
2. And, for any dependency A → B, A should be a super key.


Example
Below we have a college enrolment table with columns student_id, subject and professor.

student_id  subject  professor
101         Java     P.Java
101         C++      P.Cpp
102         Java     P.Java2
103         C#       P.Chash
104         Java     P.Java


As you can see, we have also added some sample data to the table.
In the table above:

 One student can enroll for multiple subjects. For example, the student with
student_id 101 has opted for the subjects Java and C++.
 For each subject, a professor is assigned to the student.
 There can be multiple professors teaching one subject, as we have for Java.

In the table above, student_id and subject together form the primary key, because
using student_id and subject we can find all the columns of the table.
One more important point to note here is that one professor teaches only one subject,
but one subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends
on the professor name.
This table satisfies the 1st Normal Form because all the values are atomic, column
names are unique and all the values stored in a particular column are of the same domain.
This table also satisfies the 2nd Normal Form, as there is no partial dependency.
And there is no transitive dependency; hence the table also satisfies the 3rd Normal
Form.


But this table is not in Boyce-Codd Normal Form.

Why is this table not in BCNF?

In the table above, student_id and subject form the primary key, which means the
subject column is a prime attribute.
But there is one more dependency, professor → subject.
Here professor is not a super key, so this dependency is not allowed by BCNF: a
non-prime attribute (professor) determines a prime attribute (subject).
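
The BCNF test can be sketched in Python by computing the closure of each determinant and checking whether it covers every attribute. The helper names `closure` and `bcnf_violations` are assumptions made for this sketch; the two dependencies are the ones stated in the example.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

attributes = {"student_id", "subject", "professor"}
fds = [
    (frozenset({"student_id", "subject"}), frozenset({"professor"})),
    (frozenset({"professor"}), frozenset({"subject"})),
]

def bcnf_violations(attributes, fds):
    """List the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]

violations = bcnf_violations(attributes, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Only professor → subject is reported, because the closure of {professor} does not include student_id, so professor is not a super key.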

How to satisfy BCNF?

To make this relation (table) satisfy BCNF, we decompose it into two tables, a student
table and a professor table.
Below is the structure of both tables.

Student Table
student_id  p_id
101         1
101         2
and so on...

Professor Table
p_id  professor  subject
1     P.Java     Java
2     P.Cpp      C++
and so on...


A more Generic Explanation

[Picture: BCNF explained in terms of relations.]

 4th Normal Form

Rules for 4th Normal Form

For a table to satisfy the Fourth Normal Form, it should satisfy the following two
conditions:

1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any multi-valued dependency.

What is Multi-valued Dependency?

A table is said to have a multi-valued dependency if the following conditions are true:

1. For a dependency A →→ B, a single value of A is associated with multiple values of B.
2. The table has at least 3 columns.
3. For a relation R(A,B,C), if there is a multi-valued dependency between A and B,
then B and C are independent of each other.

If all these conditions are true for a relation (table), it is said to have a
multi-valued dependency.

Example
Below we have a college enrolment table

s_id  course   hobby
1     Science  Cricket
1     Maths    Hockey
2     C#       Cricket
2     Php      Hockey


In the table above, the student with s_id 1 has opted for two courses, Science and
Maths, and has two hobbies, Cricket and Hockey. Because course and hobby are unrelated,
each hobby has to be recorded with each course, so the two records for the student with
s_id 1 give rise to two more records, as shown below.

s_id  course   hobby
1     Science  Cricket
1     Maths    Hockey
1     Science  Hockey
1     Maths    Cricket

In the table above, there is no relationship between the columns course and hobby.
They are independent of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data
and other anomalies as well.
To satisfy 4th Normal Form: to make the above relation satisfy the 4th normal form,
we can decompose the table into 2 tables.

CourseOpted Table

s_id  course
1     Science
1     Maths
2     C#
2     Php


Hobbies Table

s_id  hobby
1     Cricket
1     Hockey
2     Cricket
2     Hockey

Now these relations satisfy the fourth normal form.
A table can also have a functional dependency along with a multi-valued dependency. In
that case, the functionally dependent columns are moved into a separate table and the
multi-valued dependent columns are moved into separate tables.

 5th Normal Form


 A relation is in 5NF if it is in 4NF and contains no join dependency other than
those implied by its candidate keys; any decomposition of it must be lossless.
 5NF is satisfied when the tables have been broken into as many tables as possible in
order to avoid redundancy.
 5NF is also known as Project-join normal form (PJ/NF).

1) Lossless / lossy join decomposition

 If we decompose a table further to eliminate redundancy and anomalies, then
re-joining the decomposed tables on their keys must neither lose original data nor
produce any new records. In simple words, joining two or more decomposed tables should
neither lose records nor create new records.


SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

 In the above table, John takes both the Computer and Math classes for Semester 1 but
he doesn't take the Math class for Semester 2. In this case, the combination of all
three fields is required to identify a valid row.
 Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be teaching it, so we would have to leave Lecturer and Subject as NULL. But all
three columns together act as the primary key, so we can't leave the other two columns
blank.
 So to bring the above table into 5NF, we can decompose it into three relations P1, P2
and P3:

P1

SEMESTER    SUBJECT
Semester 1  Computer
Semester 1  Math
Semester 1  Chemistry
Semester 2  Math


P2

SUBJECT    LECTURER
Computer   Anshika
Computer   John
Math       John
Math       Akash
Chemistry  Praveen

P3

SEMESTER    LECTURER
Semester 1  Anshika
Semester 1  John
Semester 2  Akash
Semester 1  Praveen
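
The following Python sketch, offered as an illustration rather than a definitive method, rebuilds P1, P2 and P3 as projections of the original table and compares a two-table join with the three-table join. The variable names are assumptions made for this sketch.

```python
# Original relation: (SUBJECT, LECTURER, SEMESTER)
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, subj) for subj, _, sem in original}    # SEMESTER, SUBJECT
p2 = {(subj, lect) for subj, lect, _ in original}  # SUBJECT, LECTURER
p3 = {(sem, lect) for _, lect, sem in original}    # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT creates rows that were never recorded.
two_way = {
    (subj, lect, sem)
    for sem, subj in p1
    for subj2, lect in p2
    if subj == subj2
}

# Filtering by P3 (SEMESTER, LECTURER) completes the three-way join.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

print(len(two_way - original))   # spurious rows from the two-way join
print(three_way == original)     # the three-way join restores the table
```

Joining only two of the projections produces spurious Math rows (for example John in Semester 2), while joining all three gives back exactly the original five rows.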


Closure of a set of FD’s

The closure of a set of attributes X, written X+, is the set of all attributes that X
functionally determines under the given FDs. Closures are used to find all candidate
keys of a relation. The following examples show how to find a closure.
Example 1. R(ABCD) FD { A->B , B->C , C->D }
Sol: A+ = ABCD (all attributes of the relation are present, so A can be a candidate key)
B+ = BCD (not all attributes of the relation are present, so B cannot be a candidate key)
C+ = CD
D+ = D
AB+ = ABCD (all attributes of the relation are present, but AB is still not a candidate
key. Why?)

RULE: a candidate key must be minimal, meaning that no proper subset of it determines
all attributes. Here A alone already determines every attribute, so AB is a super key
but not a candidate key.

So candidate key = {A}

Prime attribute = A    Non-prime attributes = B, C, D
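
The closures in Example 1 can be reproduced with a short Python sketch. It assumes single-letter attribute names so that a string such as "AB" can stand for the set {A, B}; the function name `closure` is chosen here for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "B"), ("B", "C"), ("C", "D")]
for x in ["A", "B", "C", "D", "AB"]:
    print(x + "+ =", "".join(sorted(closure(x, fds))))
```

The loop keeps applying the FDs until no new attribute can be added, which is the transitive rule applied repeatedly.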

Example 2: R(ABCD) FD { A->B, B->C, C->D, D->A }

Sol: A+ = ABCD
B+ = BCDA
C+ = CDAB
D+ = DABC

Candidate keys = A, B, C, D (each attribute is a candidate key on its own)

Prime attributes = A, B, C, D    Non-prime attributes = nil

Example 3. R(ABCDE) FD { A->B, BC->D, E->C, D->A }

SOL: The question is how many closures (A+, B+, ...) we need to compute. A shortcut:
collect the right-hand sides of all the FDs, which gives BDCA. Comparing this with
relation R, the attribute E is missing. No FD can derive E, so E can be obtained only
from itself (reflexive property) and must be part of every candidate key. We therefore
start by finding the closure of E.


E+ = EC (this cannot be a candidate key, as not all attributes are present). So no
single attribute can be the key.
We now need to try combinations of other attributes with E.
AE+ = ABECD    BE+ = BECDA    CE+ = CE    DE+ = DEACB

Candidate keys = { AE, BE, DE }

Prime attributes = A, B, D, E    Non-prime attribute = C
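
For small relations the candidate keys can also be found by brute force: try attribute sets in order of size and keep those whose closure is the whole relation and which contain no smaller key. The sketch below assumes single-letter attribute names, and the function names are illustrative. It checks every subset, so it is only practical for a handful of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the full relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found is a super key, not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

fds = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
keys = candidate_keys("ABCDE", fds)
print(["".join(sorted(k)) for k in keys])
```

For Example 3 this search agrees with the shortcut: every key contains E, and the keys are AE, BE and DE.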

Given a relation, find whether it is in 2NF or not

Example 1. R(ABCDEF) FD { C->F , E->A , EC->D , A->B }
(using the shortcut described above)
E and C do not appear on the right-hand side of any FD, so both must be part of every
candidate key.
EC+ = ECFADB
To look for other candidate keys we use a second shortcut: having found the candidate
key EC, check whether E or C appears on the RHS of any FD. Neither does, so there is
only one candidate key, {EC}.
Prime attributes: {E, C}    Non-prime attributes: {A, B, D, F}
To decide whether relation R is in 2NF, we check for partial dependencies.
RULE: a partial dependency exists when the LHS is a proper subset of a candidate key
and the RHS is a non-prime attribute.
We apply this rule to each FD.
i) C->F: C (LHS) is a proper subset of the candidate key and F (RHS) is non-prime. So a
partial dependency exists; this FD violates 2NF.
ii) E->A: a partial dependency exists; this FD violates 2NF.
iii) EC->D: the LHS is the whole candidate key, not a proper subset, so there is no
partial dependency.
iv) A->B: both attributes are non-prime, so there is no partial dependency.

So R(ABCDEF) is not in 2NF, because partial dependencies exist.
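
The rule applied above can be written as a small Python check. This sketch takes the candidate key EC from the worked example as given and only inspects the listed FDs (not every FD that could be derived from them); the function name `partial_dependencies` is an assumption for illustration.

```python
fds = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
candidate_keys = [set("EC")]              # taken from the worked example
prime = set().union(*candidate_keys)      # attributes that belong to some key

def partial_dependencies(fds, candidate_keys, prime):
    """Return FDs whose LHS is a proper subset of a key and whose RHS is non-prime."""
    found = []
    for lhs, rhs in fds:
        lhs_set = set(lhs)
        nonprime_rhs = set(rhs) - prime
        # '<' on sets means "proper subset".
        if nonprime_rhs and any(lhs_set < key for key in candidate_keys):
            found.append((lhs, rhs))
    return found

partial = partial_dependencies(fds, candidate_keys, prime)
print(partial)
```

The check flags C->F and E->A, the same two dependencies identified by hand, so the relation is not in 2NF.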


Given a relation, find whether it is in 3NF or not

Example 1: R(ABCD) FD { AB->CD , D->A }
SOL: AB+ = ABCD    DB+ = DBAC
Candidate keys: {AB, DB}
Prime attributes: {A, B, D}    Non-prime attributes: {C}
Now we check whether relation R is in 3NF.
RULE: for each FD, the LHS must be a super key OR the RHS must be a prime attribute (at
least one of the two conditions should be true).
i. AB->CD: the LHS is a candidate key, so the FD satisfies 3NF even though C on the RHS
is non-prime.
ii. D->A: the LHS is not a candidate key, but the RHS is a prime attribute, so the FD
satisfies 3NF.

So, the relation R(ABCD) is in 3NF.
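
The same 3NF rule can be sketched in Python. The prime attributes are taken from the candidate keys found above, attribute names are assumed to be single letters, and the helper names are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

attributes = set("ABCD")
fds = [("AB", "CD"), ("D", "A")]
prime = set("ABD")          # from the candidate keys AB and DB

def violates_3nf(lhs, rhs):
    """An FD violates 3NF if its LHS is not a super key and its RHS has a non-prime attribute."""
    is_superkey = closure(lhs, fds) == attributes
    rhs_extra = set(rhs) - set(lhs)      # ignore the trivial part of the FD
    return not is_superkey and not rhs_extra <= prime

violations = [(lhs, rhs) for lhs, rhs in fds if violates_3nf(lhs, rhs)]
print(violations)
```

No FD is reported, which matches the conclusion that R(ABCD) is in 3NF. Note that D->A would fail the stricter BCNF test, since D is not a super key.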

Properties of normalization

1) Lossless / lossy join decomposition
2) Dependency preserving decomposition

Lossless / lossy join decomposition

This property says that joining the decomposed tables produces neither extra tuples nor
fewer tuples than the original. Let us explain with an example. We have a relation R:

R
A  B  C
1  2  1
2  2  2
3  3  2

The relation R is divided arbitrarily into R1(AB) and R2(BC).

R1        R2
A  B      B  C
1  2      2  1
2  2      2  2
3  3      3  2


Having divided the relation R, we need a common attribute between the two tables in
order to query them together. Here B is the common attribute.

Query: find the value of C if the value of A = 1.

Sol: select R2.C from R2 natural join R1 where R1.A=1;

Conceptually, a natural join first forms the cross product of the two tables and then
keeps only the rows in which the common attribute (here B) has the same value in both
tables.

Cross product (R1 X R2)

A  B  B  C
1  2  2  1
1  2  2  2
1  2  3  2
2  2  2  1
2  2  2  2
2  2  3  2
3  3  2  1
3  3  2  2
3  3  3  2

After applying natural join to the above cross product , we get the following result R11

R11
A B C == R
1 2 1
1 2 2 Compare A B C
2 2 1 this with 1 2 1
2 2 2 original 2 2 2
table R 3 3 2
3 3 1
3 3 2

Observe in R11, for A=1, it has two values of C (1 and 2). But in original table A =1 means C
has only one value. This indicates after joining extra tuples are added.

So, this is called a lossy decomposition. Lossy does not mean that tuples are lost; it means that
information is lost, because the join produces extra (spurious) tuples and we can no longer tell
which tuples belonged to the original relation. Data inconsistency occurs: not all the tuples in
the new table R11 are valid.
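
The join above can be reproduced with plain Python sets. This is a small sketch in which tuples stand for rows (A, B, C); the variable names are chosen only for this illustration.

```python
# Original relation R with columns (A, B, C), as in the example above.
R = {(1, 2, 1), (2, 2, 2), (3, 3, 2)}

R1 = {(a, b) for a, b, c in R}   # projection of R on (A, B)
R2 = {(b, c) for a, b, c in R}   # projection of R on (B, C)

# Natural join of R1 and R2 on the common attribute B.
R11 = {(a, b, c) for a, b in R1 for b2, c in R2 if b == b2}

spurious = R11 - R               # tuples that were never in R
print(sorted(R11))       # [(1, 2, 1), (1, 2, 2), (2, 2, 1), (2, 2, 2), (3, 3, 2)]
print(sorted(spurious))  # [(1, 2, 2), (2, 2, 1)]
```

The two spurious tuples come from B=2, which appears twice in both R1 and R2.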

We should always get a lossless decomposition when we join the tables. For that, there is a rule
for choosing the attribute that should be common between the tables.

Rule: common attribute should be a candidate key or super key of either R1 or R2 or both.

In the above relation R, attribute A is a candidate key (it has no duplicate values). So attribute
A should be the common attribute between the tables.

R1         R2
A  B       A  C
1  2       1  1
2  2       2  2
3  3       3  2

If we apply natural join to the above tables, we get back exactly the original relation R. So the
decomposition is lossless: no spurious tuples are produced.

Conditions are:
1) R1 U R2 = R ( ex AB U AC = ABC)
2) R1 ∩ R2 ≠ ϕ (ex AB ∩ AC = A ≠ ϕ )
3) The common attribute should be a candidate key or super key of R1 or R2.
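
These conditions can be tested mechanically when the FDs are known. The sketch below assumes the FDs A->B and A->C for R(ABC); they are not stated in the text and are inferred only from A being a key of the sample table. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(schema, r1, r2, fds):
    """Binary decomposition test following the three conditions above."""
    common = r1 & r2
    if (r1 | r2) != schema or not common:   # conditions 1 and 2
        return False
    reachable = closure(common, fds)
    # Condition 3: the common attributes determine all of R1 or all of R2.
    return r1 <= reachable or r2 <= reachable

schema = set("ABC")
fds = [(set("A"), set("B")), (set("A"), set("C"))]   # assumed FDs, see lead-in

print(is_lossless(schema, set("AB"), set("AC"), fds))   # True
print(is_lossless(schema, set("AB"), set("BC"), fds))   # False
```

With B as the common attribute the test fails because B+ = B, so B is a key of neither R1(AB) nor R2(BC).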

2) Dependency preserving decomposition

The decomposition of relation R with FDs F into R1 and R2 with FDs F1 and F2 respectively is said
to be dependency preserving if (F1 U F2)+ = F+

i.e. when a relation R is divided into R1 and R2, its functional dependencies are likewise divided
into F1 and F2. Every dependency of F must still be derivable from F1 U F2, so that each
dependency can be checked on R1 or R2 alone without joining the tables.


Example 1: Let a relation R(A,B,C,D) and a set of FDs F = { A -> B , A -> C , C -> D } be given.

The relation R is decomposed into:

R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and R2 = (C, D) with FDs F2 = {C -> D}.

F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}

So F' = F, and therefore F'+ = F+.

Hence, dependency preserving means that the functional dependencies are preserved even after
decomposing and rejoining the tables.

Example 2: If a relation R(ABC) with F = { A -> B , B -> C } is decomposed into R1(AB) and R2(BC),
does this decomposition preserve the dependencies or not?

Sol: R1(AB) and R2(BC)

F1 = { A->A , B->B , A->B }    F2 = { B->B , C->C , B->C }

(tip: list the trivial functional dependencies as well; they can be dropped when taking the union)

(F1 U F2) = { A->B , B->C } = F

So, the dependencies are preserved in this decomposition.
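
Both examples can be checked with a short Python sketch. The helper names are made up for this illustration. `project` computes F1 or F2 by taking the closure of every subset of a sub-relation, which is exponential and meant only for small classroom examples. The third decomposition, R1(AB) and R2(AC), is not from the text and is added as a contrasting case.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project(fds, sub_schema):
    """Non-trivial FDs implied by `fds` that use only attributes of `sub_schema`."""
    projected = []
    attrs = sorted(sub_schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            lhs = set(combo)
            rhs = (closure(lhs, fds) & sub_schema) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected

def preserves_dependencies(fds, decomposition):
    """True if every FD of F follows from F1 U F2 U ..."""
    union = [fd for sub in decomposition for fd in project(fds, sub)]
    return all(rhs <= closure(lhs, union) for lhs, rhs in fds)

# Example 1: R(ABCD), F = { A->B , A->C , C->D }, R1(ABC), R2(CD)
f_ex1 = [(set("A"), set("B")), (set("A"), set("C")), (set("C"), set("D"))]
print(preserves_dependencies(f_ex1, [set("ABC"), set("CD")]))   # True

# Example 2: R(ABC), F = { A->B , B->C }, R1(AB), R2(BC)
f_ex2 = [(set("A"), set("B")), (set("B"), set("C"))]
print(preserves_dependencies(f_ex2, [set("AB"), set("BC")]))    # True

# Contrasting case (not in the text): B->C is lost with R1(AB), R2(AC)
print(preserves_dependencies(f_ex2, [set("AB"), set("AC")]))    # False
```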


Minimal cover for a set of FDs ( Canonical cover)

A minimal cover for a set F of FDs is a set G of FDs such that

1. Every dependency in G is of the form X -> A, where A is a single attribute.
2. The closure F+ is equal to the closure G+.
3. If we obtain a set H of dependencies from G by deleting one or more dependencies or by
deleting attributes from a dependency in G, then F+ ≠ H+.

General algorithm for obtaining a minimal cover of a set F of FDs:

1. Put the FDs in a standard form: obtain a collection G of equivalent FDs with a single
attribute on right side.
2. Minimize the left side of each FD: for each FD in G, check each attribute in the left side to
see if it can be deleted while preserving equivalence to F+.
3. Delete redundant FDs : check each remaining FD in G to see if it can be deleted while
preserving equivalence to F+.

That means:
1. Singleton RHS ( A -> BC becomes A->B , A->C ).
2. No extraneous attribute on the LHS.
3. No redundant FD.

Example 1: R(ABC) F = { A->B , AB->C }

Solution: We find A+ = ABC, which indicates that attribute A alone can determine all attributes.
So in the functional dependency AB->C, B is an extraneous attribute on the LHS. Removing it
reduces AB->C to A->C.

So now F is { A->B , A->C }

Example 2: R(ABC) F = { A->B , B->C , A->C }

Solution: by the transitive rule we know that A->B and B->C imply A->C.

So in F, A->C is a redundant functional dependency and can be deleted.

So now F is { A->B , B->C }
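
The three steps of the general algorithm can be sketched in Python as follows. This is one straightforward way to implement them, with illustrative names. A set of FDs can have more than one minimal cover, and which one is returned depends on the order in which attributes and FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    g = [(set(lhs), {a}) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: remove extraneous attributes from each left-hand side.
    for i, (lhs, rhs) in enumerate(g):
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            # attr is extraneous if the remaining LHS still determines the RHS.
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, rhs)

    # Step 3: remove FDs that follow from the remaining ones.
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g

def show(fds):
    return ["".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds]

# Example 1: F = { A->B , AB->C }
print(show(minimal_cover([(set("A"), set("B")), (set("AB"), set("C"))])))
# ['A->B', 'A->C']

# Example 2: F = { A->B , B->C , A->C }
print(show(minimal_cover([(set("A"), set("B")), (set("B"), set("C")),
                          (set("A"), set("C"))])))
# ['A->B', 'B->C']
```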


Equivalence of FDs:

We are given a relation R along with two sets of FDs, and need to find whether the two sets of
functional dependencies are equivalent or not. Let us solve an example.

Example 1: R(ABC) given two sets of FDs


X { A->B , B->C} Y{ A->B , B->C , A->C}

Solution: X covers Y
(tip: using the functional dependencies of X, find the closure of the LHS of each FD in Y)

A+ = ABC
B+ = BC

Check from these closures whether the functional dependencies of Y are determined or not.
Yes, A+ = ABC determines A->B and A->C, and B+ = BC determines B->C.
So we can say X covers Y.

Similarly, Y covers X
(tip: using the functional dependencies of Y, find the closure of the LHS of each FD in X)

A+ = ABC
B+ = BC
Check from these closures whether the functional dependencies of X are determined or not.
Yes, A+ = ABC determines A->B, and B+ = BC determines B->C.
So we can say Y covers X.

This implies that the two sets of FDs are equivalent.

Example 2: R(ABCD), the two sets of FDs are

X { AB->CD , B->C , C->D }    Y { AB->C , AB->D , C->D }

Solution: X covers Y

AB+ = ABCD
C+ = CD
Check from these closures whether the functional dependencies of Y are determined or not.
Yes, AB+ = ABCD determines AB->C and AB->D.
C+ = CD determines C->D.
So we can say X covers Y.

Y covers X


AB+ = ABCD
B+ = B
C+ = CD

Check from these closures whether the functional dependencies of X are determined or not.
AB+ = ABCD determines AB->CD, and C+ = CD determines C->D, but B+ = B does not determine B->C.

So not all functional dependencies of X are determined from Y. So Y doesn’t cover X.


So they are not equivalent.
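
The cover test used in Examples 1 and 2 can be written as a short Python sketch. The names `fd`, `covers` and `equivalent` are chosen only for this illustration. `fd` is a small parser that assumes single-letter attribute names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(text):
    """Parse 'AB->CD' into ({'A', 'B'}, {'C', 'D'})."""
    lhs, rhs = text.split("->")
    return set(lhs.strip()), set(rhs.strip())

def covers(f, g):
    """f covers g if every FD of g can be derived using only the FDs of f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

# Example 1
x1 = [fd("A->B"), fd("B->C")]
y1 = [fd("A->B"), fd("B->C"), fd("A->C")]
print(equivalent(x1, y1))                  # True

# Example 2
x2 = [fd("AB->CD"), fd("B->C"), fd("C->D")]
y2 = [fd("AB->C"), fd("AB->D"), fd("C->D")]
print(covers(x2, y2), covers(y2, x2))      # True False
```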

Example 3: R(ACDEH), the two sets of FDs are

F { A->C , AC->D , E->AD , E->H }    G { A->CD , E->AH }
Solve it.

 Denormalization
 Denormalization is a database optimization technique in which we add redundant
data to one or more tables. This can help us avoid costly joins in a relational
database.
 Note that denormalization does not mean not doing normalization. It is an
optimization technique that is applied after doing normalization.
 In a traditional normalized database, we store data in separate logical tables and
attempt to minimize redundant data. We may strive to have only one copy of each
piece of data in database.
 For example, in a normalized database, we might have a Courses table and a
Teachers table. Each entry in Courses would store the teacherID for a Course but not
the teacherName. When we need to retrieve a list of all Courses with the Teacher
name, we would do a join between these two tables.
 In some ways, this is great; if a teacher changes his or her name, we only have to
update the name in one place.
 The drawback is that if tables are large, we may spend an unnecessarily long time
doing joins on tables.
 Denormalization, then, strikes a different compromise. Under denormalization, we
decide that we’re okay with some redundancy and some extra effort to update the
database in order to get the efficiency advantages of fewer joins.
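
The trade-off can be tried out with Python's built-in sqlite3 module. In this sketch the tables Courses and Teachers and the columns teacherID and teacherName come from the example above. The CoursesDenorm table, the courseID and title columns, and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Teachers (teacherID INTEGER PRIMARY KEY, teacherName TEXT);
    CREATE TABLE Courses (courseID INTEGER PRIMARY KEY, title TEXT, teacherID INTEGER);
    -- Denormalized variant: teacherName is repeated in every course row.
    CREATE TABLE CoursesDenorm (courseID INTEGER PRIMARY KEY, title TEXT,
                                teacherID INTEGER, teacherName TEXT);
    INSERT INTO Teachers VALUES (1, 'Asha');
    INSERT INTO Teachers VALUES (2, 'Ravi');
    INSERT INTO Courses VALUES (10, 'DBMS', 1);
    INSERT INTO Courses VALUES (11, 'Networks', 1);
    INSERT INTO Courses VALUES (12, 'Compilers', 2);
    INSERT INTO CoursesDenorm VALUES (10, 'DBMS', 1, 'Asha');
    INSERT INTO CoursesDenorm VALUES (11, 'Networks', 1, 'Asha');
    INSERT INTO CoursesDenorm VALUES (12, 'Compilers', 2, 'Ravi');
""")

# Normalized design: listing courses with teacher names needs a join.
normalized = conn.execute("""
    SELECT c.title, t.teacherName
    FROM Courses c JOIN Teachers t ON c.teacherID = t.teacherID
    ORDER BY c.courseID
""").fetchall()

# Denormalized design: the same list comes from a single table.
denormalized = conn.execute(
    "SELECT title, teacherName FROM CoursesDenorm ORDER BY courseID"
).fetchall()
print(normalized == denormalized)   # True

# A name change touches one row when normalized, but every copy when denormalized.
rows_normalized = conn.execute(
    "UPDATE Teachers SET teacherName = 'Asha K' WHERE teacherID = 1").rowcount
rows_denormalized = conn.execute(
    "UPDATE CoursesDenorm SET teacherName = 'Asha K' WHERE teacherID = 1").rowcount
print(rows_normalized, rows_denormalized)   # 1 2
```

If the second update were forgotten or only partly applied, the copies of the name would disagree, which is the inconsistency risk of denormalization.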
Pros of Denormalization:-
1. Retrieving data is faster since we do fewer joins
2. Queries to retrieve data can be simpler (and therefore less likely to have bugs),
since we need to look at fewer tables.


Cons of Denormalization:-
1. Updates and inserts are more expensive.
2. Denormalization can make update and insert code harder to write.
3. Data may be inconsistent.
4. Data redundancy necessitates more storage.

Why is BCNF better than 3NF?

3NF requires that every non-key attribute is fully and non-transitively dependent on each candidate key.
There is no such requirement for key attributes.

BCNF requires that every attribute is fully and non-transitively dependent on each candidate key. This
can also be phrased as: the relation satisfies 3NF and, in addition, every key attribute is fully and
non-transitively dependent on each candidate key. Equivalently, in BCNF the LHS of every non-trivial
functional dependency must be a super key.

The above means that every BCNF relation is also in 3NF, but not every 3NF relation is in BCNF.
Hence BCNF is stronger than 3NF.
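
As an illustration, the relation R(ABCD) with FDs { AB->CD , D->A } from the 3NF example earlier in this module is in 3NF but not in BCNF. The sketch below uses an illustrative helper, `bcnf_violations`, to list the FDs whose LHS is not a super key.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(schema)]

schema = set("ABCD")
fds = [(set("AB"), set("CD")), (set("D"), set("A"))]

violations = bcnf_violations(schema, fds)
print(violations)   # [({'D'}, {'A'})]
```

D->A passes the 3NF rule because A is a prime attribute, but it fails the BCNF rule because D+ = DA, so D is not a super key.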
