Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

UNIT-4

SCHEMA REFINEMENT and NORMAL FORMS


Syllabus: Problems caused by redundancy, Decompositions, problem related to
decomposition, reasoning about FDS, FIRST, SECOND, THIRD Normal forms, BCNKF,
Lossless join Decomposition, Dependency preserving Decomposition, Schema refinement in
Data base Design, Multi valued Dependencies, FORTH Normal Form.

Schema Refinement: The purpose of Schema refinement is used for a refinement


approach based on decompositions. Redundant storage of information (i.e. duplication of
data) is main cause of problem. This redundancy is eliminated by decompose the relation.

Q. Problems caused by redundancy: Redundancy is a method of storing the same


information repeatedly. It means, storing the same data more than one place with in a
database is can lead several problems. Such as
1) Redundant Storage: It removes the multi-valued attribute. It means, some tuples or
information is stored repeatedly.
2) Update Anomalies: Suppose, if we update one row (or record) then DBMS will update
more than one similar row, causes update anomaly.
For example, if we update the department name those who getting the salary 40000 then
that will update more than one row of employee table is causes update anomaly.
3) Insertion Anomalies: In insertion anomaly, when allows insertion for already existed
record again, causes insertion anomaly.
4) Deletion Anomalies: In deletion anomaly, when more than one record is deleted
instead of specified or one, causes deletion anomaly.
Consider the following example,
Eno ename salary rating hourly_wages hours_worked
111 suresh 25000 8 10 40
222 eswar 30000 8 10 30
333 sankar 32000 5 7 30
444 pada 31000 5 7 32
555 aswani 35000 8 10 40

In this example,
1) Redundancy storage: The rating value 8 corresponds to the hourly_wages 10
and there is repeated three times.
2) Update anomalies: If the hourly_wages in the first tuple is updated, it does
not make changes in the corresponding second and last tuples. Because, the
key element of tuples is emp_id.
i.e. update employee set hourly_wages = 12 where emp_id = 140; But the
question is update hourly_wages from 10 to 12.
3) Insertion anomalies: If we have to inset a new tuple for an employee, we should
know both the values rating value as well as hoURLy_wages value.
4) Deletion anomalies: Delete all the tupes through a given rating value, causes
deletion anomaly.
Q. Decompositions: A relation schema (table) R can be decomposition into a
collection of smaller relation schemas (tables) (i.e. R 1, R2, . . . . Rm) to eliminate
anomalies caused by the redundancy in the original relation R is called Decomposition
of relation. This shown in Relational Algebra as
R1 с R for 1 ≤ i ≤ m and R1 U R2 U R3 . . . Rm = R.
(or)
A decomposition of a relation schema R consists of replacing the relation schema by
two or more relation schemas that each contains a subset of the attributes of R
and together include all attributes in R.
(or)
In simple, words, “The process of breaking larger relations into smaller relations is
known as decomposition”.
Consider the above example that can be decomposing the following hourly_emps
relation into two relations.
Hourly_emp (eno, ename, salary, rating, houly_wages, hours_worked)
This is decomposed into
Hourly_empsd( eno, ename, salary, rating, hours_worked) and
Wages(rating, houly_wages)
Eno ename salary rating hours_worked
111 suresh 25000 8 40
222 eswar 30000 8 30
333 sankar 32000 5 30
444 padma 31000 5 32
555 aswani 35000 8 40
rating hourly_wages
8 10
5 7

Q. Problems Related to Decomposition:


The use of Decomposition is to split the relation into smaller parts to eliminate
the anomalies. The question is
1. What are problems that can be caused by using decomposition?
2. When do we have to decompose a relation?
The answer for the first question is, two properties of decomposition are
considered.
1) Lossless-join Property: This property helps to identify any instance of the
original relation from the corresponding instance of the smaller relation
attained after decomposition.
2) Dependency Preservation: This property helps to enforce constraint on
smaller relation in order to enforce the constraints on the original relation.
The answer for the second question is, number of normal forms exists. Every
relation schema is in one of these normal forms and these normal forms can help to
decide whether to decompose the relation schema further or not.
The Disadvantage of decomposition is that it enforces us to join the decomposed
relations in order to solve the queries of the original relation.

Q. What is Relation?: A relation is a named two–dimensional table of data.


Each relation consists of a set of named columns and an arbitrary number of
unnamed rows. Emp-id ename dept name salary
For example, a relation named 100 Simpson Marketing 48000
Employee contains following 140 smith Accounting 52000
110 Lucero Info-systems 43000
attributes, emp-id, ename, 190 Davis Finance 55000
dept name and salary. 150 martin marketing 42000
marketing 42000

What are the Properties of relations?:


The properties of relations are defined on two dimensional tables. They are:
❑ Each relation (or table) in a database has a unique name.
❑ An entry at the intersection of each row and column is atomic or single. These can be
no multiplied atttributes in a relation.
❑ Each row is unique, no two rows in a relation are identical.
❑ Each attribute or column within a table has a unique name.
❑ The sequence of columns (left to right) is insignificant the column of a relation can be
interchanged without changing the meaning use of the relation.
❑ The sequence of rows (top to bottom) is insignificant. As with column, the rows of
relation may be interchanged or stored in any sequence.
Q. Removing multi valued attributes from tables: the “second property of
relations” in the above is applied to this table. In this property, there is no multi valied
attributes in a relation. This rule is applied to the table or relation to eliminate the one or
more multi valued attribute. Consider the following the example, the employee table contain
6 records. In this the course title has multi valued values/attributes. The employee 100
taken two courses vc++ and ms-office. The record 150 did not taken any course. So it is null.
Emp-id name dept-name salary course_title
100 Krishna cse 20000 vc++, msoffice
140 Rajasekhar cse 18000 C++, DBMS,DS

Now, this multi valued attributes are eliminated and shown in the following
employee2 table.
Emp-id name dept-name salary course_title
100 Krishna cs 20000 vc++
100 Krishna cs 20000 MSoffice
140 Rajasekhar cs 18000 C++
140 Rajasekhar cs 18000 DBMS
140 Rajasekhar cs 18000 DS
Functional dependencies : A functional dependency is a constraint between two
attributes (or) two sets of attributes.
For example, the table EMPLOYEE has 4 columns that are Functionally dependencies on
EMP_ID.

Emp_id Ename Dept_name Salry

Partial functional dependency : It is a functional dependency in which one or more non- key
attributes are functionally dependent on part of the primary key. Consider the following
graphical representation, in that some of the attributes are partially depend on primary key.

Emp_id Course_title Ename Dept_name Salry date completed

In this example, Ename, Dept_name, and salary are fully functionally depend on Primary key
of Emp_id. But Course_title and date_completed are partial functional dependency. In this
case, the partial functional dependency creates redundancy in that relation.
Q. What is Normal Form? What are steps in Normal Form?
NORMALIZATION: Normalization is the process of decomposing relations to produce
smaller, well-structured relation.
To produce smaller and well structured relations, the user needs to follow six
normal forms.
Steps in Normalization:
A normal form is state of relation that result from applying simple rules from
regarding functional dependencies ( relationships between attributes to that relation. The
normal form are
1. First normal form 2. Second normal form
3. Third normal form 4. Boyce/codd normal form
5. Fourth normal form 6. Fifth normal form

1) First Normal Form: Any multi-valued attributes (also called repeating groups) have
been removed,
2) Second Normal Form: Any partial functional dependencies have been removed.
3) Third Normal Form: Any transitive dependencies have been removed.
4) Boyce/Codd Normal Form: Any remaining anomalies that result from functional
dependencies have been removed.
5) Fourth Normal Form: Any multi-valued dependencies have been removed.
6) Fifth Normal Form: Any remaining anomalies have been removed.

Advantages of Normalized Relations Over the Un-normalized


Relations: The advantages of normalized relations over un-normalized relations are,
1) Normalized relation (table) does not contain repeating groups whereas, un-
normalized relation (table) contains one or more repeating groups.
2) Normalized relation consists of a primary key. There is no primary key presents in
un-normalized relation.
3) Normalization removes the repeating group which occurs many times in a table.
4) With the help of normalization process, we can transform un-normalized table to
First Normal Form (1NF) by removing repeating groups from un-normalized tables.
5) Normalized relations (tables) gives the more simplified result whereas un-
normalized relation gives more complicated results.
6) Normalized relations improve storage efficiency, data integrity and scalability. But
un-normalized relations cannot improvise the storage efficiency and data integrity.
7) Normalization results in database consistency, flexible data accesses.

Q. FIRST NORMAL FORM (1NF): A relation is in first normal form (1NF) contains no
multi-Valued attributes. Consider the example employee, that contain multi valued attributes
that are removing and converting into single valued attributes
Multi valued attributes in course title
Emp-id name dept-name salary course_title
100 Krishna Cse 20000 vc++,
msoffice
140 Raja It 18000 C++,
DBMS,
DS
Removing the multi valued attributes and converting single valied using First NF
Emp-id name dept-name salary course_title
100 Krishna Cse 20000 vc++
100 Krishna Cse 20000 msoffice
140 Raja I 18000 C++
t
140 Raja I 18000 DBMS
t
140 Raja I 18000 DS
t

SECOND NORMAL FORM(2NF): A relation in Second Normal Form (2NF) if it is in the 1NF
and if all non-key attributes are fully functionally dependent on the primary key. In a
functional dependency X → Y, the attribute on left hand side ( i.e. x) is the primary key of
the relation and right side attributes on right hand side i.e. Y is the non-key attributes. In
some situation some non key attributes are partial functional dependency on primary key.
Consider the following example for partial functional specification and also that convert into
2 NF to decompose that into two relations.

Emp_id Course_title Ename Dept_name Salry date completed

In this example, Ename, Dept_name, and salary are fully functionally depend on Primary key
of Emp_id. But Course_title and date_completed are partial functional dependency. In this
case, the partial functional dependency creates redundancy in that relation.
➢ To avoid this, convert this into Second Normal Form. The 2NF will decompose
the relation into two relations, shown in graphical representation.

EMPLOYEE Emp_id Ename Dept_name Salry

COURSE
Course_title Date_Completed Emp_id

In the above graphical representation


➢ the EMPLOYEE relation satisfies rule of 1 NF in Second Normal form and
➢ the COURSE relation satisfies rule of 2 NF by decomposing into two relation.

THIRD NORMAL FORM(3NF): A relation that is in Second Normal form and has no
transitive dependencies present.
Transitive dependency : A transitive is a functional dependency between two non key
attributes. For example, consider the relation Sales with attributes cust_id, name, sales
person and region that shown in graphical representation.

Cust_id Cname Sales_person Region


Salry

CUST_ID NAME SALESPERSON REGION


1001 Anand Smith South
1002 Sunil kiran West
1003 Govind babu rao East
1004 Manohar Smith South
1005 Madhu Somu North
In this example, to insert, delete and update any row that facing Anomaly.

a) Insertion Anomaly: A new salesperson is assigned to North Region without assign a


customer to that salesperson. This causes insertion Anomaly.
b) Deletion Anomaly: If a customer number say 1003 is deleted from the table, we lose the
information of salesperson who is assigned to that customer. This causes, Deletion
Anomaly.
c) Modification Anomaly: If salesperson Smith is reassigned to the East region, several
rows must be changed to reflect that fact. This causes, update anomaly.
To avoid this Anomaly problem, the transitive dependency can be removed by decomposition
of SALES into two relations in 3NF.

Consider the following example, that removes Anomaly by decomposing into two relations.
CUST_ID NAME SALESPERSON SALESPERSON REGION
1001 Anand Smith Smith South
1002 Sunil kiran kiran West
1003 Govind babu rao babu rao East
1004 Manohar Smith Smith South
1005 Madhu Somu Somu North

SALES PERSON CUSTOMER

Sales_person Region Cust_id Cname Salry

Q. BOYCE/CODD NORMAL FORM(BCNF): A relation is in BCNF if it is in


3NF and every determinant is a candidate key.
+
FD in F of the form X → A where X с S and A є S, X is a super key of R.
Boyce-Codd normal form removes the remaining anomalies in 3NF that are resulting
from functional dependency, we can get the result of relation in BCNF.
For example, STUDENT-ADVIDSOR IN 3NF
STUDENT-ADVISOR

Student-id major-subject faculty-advisor

STUDENT-ADVISOR relstion with simple data.

S TYDENT-ID MAJOR-SUBJECT FACULTY-ADVISOR

1 MATHS B
2 MATHS B
3 MATHS B
4 STATISTICS A
5 STATISTICS A

In the above relation the primary key in student-id and major-subject. Here the part of the
primary key major-subject is dependent upon a non key attribute faculty–advisor. So, here
the determenant the faculty-advisor. But it is not candidate key.
Here in this example there are no partial dependencies and transitive dependencies.
There is only functional dependency between part of the primary key and non key attribute.
Because of this dependency there is anomely in this relation.
Suppose that in maths subject the advisor’ B’ is replaced by X. this change must be
made in two or more rows in this relation. This is an updation anomely.
To convert a relation to BCNF the first step in the original relation is modified that the
determinant(non key attributes) becoms a component of the primary key of new relation.
The attribute that is dependent on determinant becomes a non key attributes .
STUDENT-ADVISOR

STUDENT-ID FACULTY-ADVIDSOR MAJOR-SUBJEST

The second step in the conversion process is decompose the relation to eleminate the partial
functional dependency. This results in two relations. these relations are in 3NF and BCNF
. since there is only one candidate key. That is determinant.
Two relations are in BCNF.
ADVISOR STUDENT

Faculty- advisor major-subject Student-id faculty-advisor

In these two relations the student relation has a composite key , which contains attributes
student-id and faculty-advisor. Here faculty–advisor a foreign key which is referenced to
the primary key of the advisor relation.
Two relations are in BCNF with simple data. Student_id Faculty_Advisor
Faculty Advisor Major_subject 1 B
2 B
B MATHS 3 A
A PHYSICS 4 A
5 A
Q.Fourth Normal Form (4 NF) : A relation is in BCNF that contain no multi- valued
dependency. In this case, 1 NF will repeated in this step. For example, R be a relation
schema, X and Y be attributes of R, and F be a set of dependencies that includes both FDs
and MVDs. (i.e. Functional Dependency and Multi-valued Dependencies). Then R is said to be
in Fourth Normal Form (4NF) if for every MVD X →→ Y that holds over R, one of the
following statements is true.
1) Y с X or XY = R, or 2) X is a super key.
Example: Consider a relation schema ABCD and suppose that are FD A → BCD and the MVD
B → → C are given as shown in Table B C A D tuples
It shows three tuples from relation ABCD that satisfies
the given MVD B → → C. From the definition of a MVD, b c1 a1 d1 - tuple t1
given tuples t1 and t2, it follows that tuples t3 must also
be included in the above relation. Now, consider b c2 a2 d2 - tuple t2

tuples t2 and t3. From the given FD A → BCD and b c1 a2 d2 - tuple t3


the fact that these tuples have the same A-value, we
can compute
the c1 = c2. Therefore, we see that the FD B → C must hold over ABCD whenever the FD
A → BCD and the MVD B → → C holds. If B → C holds, the relation is not in BCNF but the
relation is in 4 NF.
The fourth normal from is useful because it overcomes the problems of the various
approaches in which it represents the multi-valued attributes in a single relation.

Q. Fifth Normal Form (5 NF) : Any remaining anomalies from 4 NF relation have
been removed.
A relation schema R is said to be in Fifth Normal Form (5NF) if, for every join dependency
* (R1, ......... Rn) that holds over R, one of the following statements is true.
*Ri = R for some I, or
* The JD is implied by the set of those FDs over R in which the left side is a key for R.
It deals with a property loss less joins.

Q. LOSSELESS-JOIN DECOMPOSITION:
Let R be a relation schema and let F be a set FDs (Functional Dependencies) over R. A
decomposition of R into two schemas with attribute X and Y is said to be lossless-join
decomposition with respect to F, if for every instance r of R that satisfies the dependencies
in Fr.
πx (r) πy (r) = r
In simple words, we can recover the original relation from the decomposed
relations.
In general, if we take projection of a relation and recombine them using natural join, we
obtain some additional tuples that were not in the original relation.
S P D S P P D

s1 p1 d1 s1 p1
P1 d1
s2 p2 d2 s2 p2 p2 d2

s2 p1 d3 s2 p1 p1 d3

S P D
s1 p1 d1

s1 p1 d3

s2 p2 d2

s2 p1 d1

s2 p1 d3
The decomposition of relation schema r i.e. SPD into SP i.e. PROJECTING πsp (r
) and PD i.e., projecting πPD (r) is therefore lossless decomposition as it gains back all
original tuples of relation ‘r’ as well as with some additional tuples that were not in original
relation ‘r’.
Q. Dependency-Preserving Decomposition:
Dependency-preserving decomposition property allows us to check integrity constraints
efficiently. In simple words, a dependency-preserving decomposition allows us to enforce all
FDs by examining a single relation on each insertion or modification of a tuple.
Let R be a relation schema that is decomposed in two schemas with attributes X and
Y and let F be a set of FDs over R. The projection of F on X is the set of FD s in the closure
F+ that involves only attributes in X. We denote the projection of F on attributes X as F x.
Note that a dependency U → V in F+ is in Fx only if all the attributes in U and Y are in X.
The decomposition of relation schema R with FDs F into schemas with attributes
X and Y is dependency-preserving if ( F x U F y)+ = F+.
That is, if we take the dependencies in Fx and Fy and compute the closure of their
union, then we get back all dependencies in the closure of F. To enforce F x. we need to
examine only relation X (that inserts on that relation) to enforce F y, we need to examine
only relation Y.
Example: Suppose that a relation R with attributes ABC is decomposed into relations with
attributes AB and BC. The set F of FDs over r includes A → B , B → C and C → A. Here, A
→ B is in FAB and B → C is in FBC and both are dependency-preserving. Where as C → A is
not implied by the dependencies of FAB and FBC. Therefore C → A is not dependency-
preserving.
Consequently, FAB also contains B → A as well as A → B and FBC contains C →
B as well as B → C. Therefore FAB U FBC contain A → B , B → C, B → A and C → B.
Now, the closure of the dependencies in FAB and FBC includes C → A (because, from
C → B, B → A and transitivity rule, we compute as C → A).
Unit – IV
SCHEMA REFINEMENT (NORMALIZATION) : Purpose of Normalization or schema refinement, concept
of functional dependency, normal forms based on functional dependency(1NF, 2NF and 3 NF), concept of
surrogate key, Boyce-codd normal form(BCNF), Lossless join and dependency preserving decomposition,
Fourth normal form(4NF).

PURPOSE OF NORMALIZATION OR SCHEMA REFINEMENT


Database Normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like
Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form by
removing duplicated data from the relation tables. If a database design is not perfect, it may contain anomalies,
which are like a bad dream for any database administrator. Managing a database with anomalies is next to
impossible.
Normalization is used for mainly two purpose,
 Eliminating redundant (useless) data.
 Ensuring data dependencies make sense i.e., data is logically stored.

Problem Without Normalization


Without Normalization, it becomes difficult to handle and update the database, without facing data loss.
Insertion, Updation and Deletion Anamolies are very frequent if Database is not Normalized. To understand
these anomalies let us take an example of Student table.
S_id S_name S_Address Subject_opted
401 Kesava Noida JAVA
402 Rama Panipat DBMS
403 Krishna Jammu DBMS
404 Kesava Noida Data Mining

 Updation Anamoly : If data items are scattered and are not linked to each other properly, then it could
lead to strange situations. For example, when we try to update one data item having its copies scattered
over several places, a few instances get updated properly while a few others are left with old values.
Such instances leave the database in an inconsistent state.

Example: To update address of a student who occurs twice or more than twice in a table, we will have
to update S_Address column in all the rows, else data will become inconsistent.

 Insertion Anamoly : We tried to insert data in a record that does not exist at all.

Example: Suppose for a new admission, we have a Student id(S_id), name and address of a student but
if student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion
Anamoly.

 Deletion Anamoly : We tried to delete a record, but parts of it was left undeleted because of
unawareness, the data is also saved somewhere else.

Example: If (S_id) 402 has only one subject and temporarily he drops it, when we delete that row,
entire student record will be deleted along with it.
CONCEPT OF FUNCTIONAL DEPENDENCY
Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional dependency
says that if two tuples have same values for attributes A1, A2,..., An, then those two tuples must have to have
same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y.
The left-hand side attributes determine the values of attributes on the right-hand side.

ARMSTRONG'S AXIOMS
If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional
dependencies logically implied by F. Armstrong's Axioms are a set of rules, that when applied repeatedly,
generates a closure of functional dependencies.
 Reflexive rule − If alpha is a set of attributes and beta is_subset_of alpha, then alpha holds beta.
 Augmentation rule − If a → b holds and y is attribute set, then ay → by also holds. That is adding
attributes in dependencies, does not change the basic dependencies.
 Transitivity rule − Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c also
holds. a → b is called as a functionally that determines b.

TRIVIAL FUNCTIONAL DEPENDENCY


 Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a
trivial FD. Trivial FDs always hold.
 Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
 Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is said to be a completely
non-trivial FD.

PROPERTIES OF FUNCTIONAL DEPENDENCIES:


1. Reflexive: If Y⊆X then X → Y is a Reflexive Functional Dependency.
Ex: AB→A , A⊆AB holds. Therefore AB→A is a Reflexive Functional Dependency.

2. Augmentation: If X→Y is a functional dependency then by augmentation, XZ→YZ is also a functional


dependency.

3.Transitivity: IF X→Y and Y→Z are two functional dependencies then by transitivity, X→Z is also a
functional dependency.

4. Union: If X→Y and X→Z are two functional dependencies then, X→YZ is also a functional dependency.

5. Decomposition: If X→YZ is a functional dependency then X→Y and X→Z are also functional
dependencies.
CLOSURE SET OF A FUNCTIONAL DEPENDENCY (F+)
It is a set of all functional dependencies that can be determined using the given set of dependencies. It is
denoted by F+.
Attribute Closure (X+): It is a set of all the attributes that can be determined using X. It is denoted by X+,
where X is any set of attributes.
Example:
R(A,B,C) F:{A→B , B→C}
A+={A,B,C} B+={B,C} C+={C}
AB+={A,B,C} AC+={A,C,B} BC+={B,C} ABC+={A,B,C}

Identifying keys in the given relation based on Functional Dependencies associated with it
X+ is a set of attributes that can be determined using the given set X of attributes.
 If X+ contains all the attributes of a relation, then X is called "Super key" of that relation.
 If X+ is minimal set, then X is called "Candidate Key" of that relation.
If no closure contains all the elements then in such a case we can find independent attributes of that
relation i.e., the attributes that which are not in the R.H.S. of any dependency. If the closure of the Independent
attributes contains all the elements then it can be treated as a candidate key.
If the closure of independent attributes also doesn't contain all the elements then we try to find the key
by adding dependent attributes one by one. If we couldn't find key then we can add groups of dependent
attributes till we find a key to that relation.

FIRST NORMAL FORM (1NF)


1NF is designed to disallow multi valued attributes, composite attributes and their combinations. Means
1NF allows only atomic values i.e., the attribute of any tuple must be either 1 or NULL value.
A relation having multi valued and composite attributes is known as Un Normalized Relation. Removal
of These multi valued and composite attributes will turn the UN Normalized Relation to 1NF Relation.
Example:
Professor
Un Normalized Relation Since
ID Name Salary salary is a Multi valued
attribute
1 Rama {40000,10000,15000,10000}
We can eliminate this multi valued attribute by splitting the salary column to more specific columns like
Basic, TA, DA, HRA. The Above relation in 1NF is as follows.
Professor
ID Name Basic TA DA HRA 1NF
1 Rama 40000 10000 15000 10000

Every Relation in the Relation Database must be in 1NF.


SECOND NORMAL FORM (2NF):
It depends on the concept of Full Functional Dependences(FFD) and disallows Partial Functional
Dependencies(PFD) i.e., for a relation that has concatenated primary key, each attribute in the table that is not
part of the primary key must depend upon the entire concatenated key for its existence. If any attribute depends
only on one part of the concatenated key, then the relation fails Second normal form.
A relation with Partial Functional dependencies can be made as in 2NF by removing them.
PFD: part of key → Non Key
FFD: Key → Non key
Example:
Medication Dependencies
Patient No. Drug No. of units Pname
P1 D1 10 Kiran Patient No. → Pname
P1 D2 20 Kiran
P1 D3 15 Kiran Patient No.,Drug → No. of units

P2 D4 15 Raj

The above relation is in 1NF but not in 2NF. Key for the relation is Patient No. and Drug. After
eliminating the Partial functional dependencies, the decomposed relations are:
Patient Parent Relation Dependencies
Patient No. Pname
P1 Kiran Key: Patient No. Patient No. → Pname
P2 Raj

Medication Child Relation Dependencies


Patient No. Drug No. of units
P1 D1 10 Key: Patient No. and Drug Patient No.,Drug → No. of
Patient No. is Foreign key units
P1 D2 20
referring Patient No. in the
P1 D3 15 Parent Relation (Patient)
P2 D4 15

The above 2 relations satisfy 2NF. They don't have partial functional dependencies.
Note: If key is only one attribute then the relation is always in 2NF.
THIRD NORMAL FORM (3NF):
Third Normal form applies that every non-prime attribute of a relation must be dependent on primary
key, or we can say that, there should not be the case that a non-prime attribute is determined by another non-
prime attribute. So this transitive functional dependency should be removed from the relation and also the
relation must be in Second Normal form. Simply, It is based on the concept of transitive dependencies. 3NF
disallows the transitive dependencies.
Dependency: Key → Non Key
Transitive Dependency: Non Key → Non Key
A relation satisfying 2NF and with Transitive Functional dependencies can be made as in 3NF by removing the
transitive functional dependencies.
Example:
Contains Key
Patient No. Pname Ward No. Ward Name Patient No. and Pname
P1 Kumar W1 ICU Dependencies
P2 Kiran W1 ICU Patient No.,Pname → Ward No.
P3 Kamal W1 ICU Ward No. → Ward Name (Transitive
Dependency)
P4 Sharath W2 General

Ward No. → Ward Name is transitive because Ward No. is not a key. To make this relation satisfy 3NF
transitive dependency must be removed from it. It can be done by decomposing the relation. The decomposed
relations are as follows:
Ward Key Dependencies
Ward No. Ward Name
W1 ICU Ward No. Ward No. → Ward Name
W2 General

Contains Primary Key Dependencies


Patient No. Pname Ward No. Patient No. and Pname
P1 Kumar W1 Patient No.,Pname → Ward No.
P2 Kiran W1 Foreign Key
P3 Kamal W1 Word No. refers Ward
No. in relation Ward
P4 Sharath W2

The above two relations satisfy 3NF. They don't have transitive dependencies.
Note 1: A relation R is said to be in 3NF if whenever a non trivial functional dependency of the form X → A
holds then either X is a super key or A is a prime attribute.
Note 2: If all attributes are prime attributes then the relation is in 3NFbecause with such attributes no partial
functional dependencies and transitive dependencies exists.
BOYCE - CODD NORMAL FORM(BCNF):
BCNF is a higher version of the Third Normal form. This form deals with certain type of anamoly that is not
handled by 3NF. A 3NF relation which does not have multiple overlapping candidate keys is said to be in
BCNF. For a relation to be in BCNF, following conditions must be satisfied:

 R must be in 3rd Normal Form


 and, for each functional dependency ( X -> Y ), X should be a super Key.
(OR)
A relationship is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a
candidate key. A relation which is in 3NF is almost always in BCNF. These could be same situation when a
3NF relation may not be in BCNF the following conditions are found true.
1. The candidate keys are composite.
2. There are more than one candidate keys in the relation.
3. There are some common attributes in the relation.

Professor Code Department Head of Dept. Percent Time


P1 Physics Ghosh 50
P1 Mathematics Krishnan 50
P2 Chemistry Rao 25
P2 Physics Ghosh 75
P3 Mathematics Krishnan 100

Consider, as an example, the above relation. It is assumed that:


1. A professor can work in more than one department
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
Dependencies of the above relation are:
Department,Professor Code → Head of the Depatrment
Department,Professor Code → Percent time
Department → Head of the Depatrment
Head of the Department,Professor Code → Depatrment
Head of the Department,Professor Code → Percent time

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are
duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose the information that Rao is the
Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head of Dept. and deleting
Head of Dept. form the given relation. The normalized relations are shown in the following.
Professor_work Dependencies
Professor Code Department Percent Time Department,Professor Code → Percent time
P1 Physics 50
P1 Mathematics 50 Department is foreign key referring
Department in the Deartment_Details relation
P2 Chemistry 25
P2 Physics 75
P3 Mathematics 100

Depatrment_Details Dependencies
Department Head of Dept. Department → Head of the Department
Physics Ghosh
Mathematics Krishnan
Chemistry Rao

FOURTH NORMAL FORM (4NF)


When attributes in a relation have multi-valued dependency, further Normalization to 4NF and 5NF are
required. A multi-valued dependency is a typical kind of dependency in which each and every attribute within
a relation depends upon the other, yet none of them is a unique primary key. Consider a vendor supplying many
items to many projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.
A multi valued dependency exists here because all the attributes depend upon the other and yet none of them is
a primary key having unique value.

Vendor Code Item Code Project No.


V1 I1 P1
V1 I2 P1
V1 I1 P3
V1 I2 P3
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3
The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for
item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Observe that the relation given is in 3NF and also in BCNF. It still has the problem mentioned above. The
problem is reduced by expressing this relation as two relations in the Fourth Normal Form (4NF). A
relation is in 4NF if it has no more than one independent multi valued dependency or one independent
multi valued dependency with a functional dependency.
The table can be expressed as the two 4NF relations given as following. The fact that vendors are capable
of supplying certain items and that they are assigned to supply for some projects in independently specified
in the 4NF relation.

Vendor_Supply Vendor_Project
Vendor Code Item Code Vendor Code Project No.
V1 I1 V1 P1
V1 I2 V1 P3
V2 I2 V2 P1
V2 I3 V3 P2
V3 I1 .

SUMMARY
Input Transformation Output
Relation Relation
All Relations Eliminate variable length record. Remove multi-attribute lines in table. 1NF
1NF Remove dependency of non-key attributes on part of a multi-attribute key. 2NF
2NF Remove dependency of non-key attributes on other non-key attributes. 3NF
3NF Remove dependency of an attribute of a multi attribute key on an attribute of BCNF
another (overlapping) multi-attribute key.
BCNF Remove more than one independent multi-valued dependency from relation 4NF
by splitting relation.
4NF Add one relation relating attributes with multi-valued dependency. 5NF

PROPERTIES OF DECOMPOSITION:
Every Decomposition must satisfy 2 properties.
1. Lossless join
2. Dependency Preserving
1. Lossless join:
If we decompose a relation R into relations R1 and R2,
 Decomposition is lossy if R1 ⋈ R2 ⊃ R
 Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using FD set, following conditions must hold:
1. Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute of R must be either in R1
or in R2. Att(R1) U Att(R2)=Att(R)
2. Intersection of Attributes of R1 and R2 must not be NULL. Att(R1) ∩ Att(R2) ≠ Φ
3. Common attribute must be a key for at least one relation (R1 or R2) Att(R1) ∩ Att(R2) -> Att(R1)
or Att(R1) ∩ Att(R2) -> Att(R2)

For Example, A relation R (A, B, C, D) with FD set {A->BC} is decomposed into R1(ABC) and R2(AD) which
is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) = Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because A->BC is given.

Dependency Preserving Decomposition

If we decompose a relation R into relations R1 and R2, All dependencies of R either must be a part of
R1 or R2 or must be derivable from combination of FD’s of R1 and R2.
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and R2(AD)
which is dependency preserving because FD A->BC is a part of R1(ABC).

Decomposition of a relation is done when a relation in relational model is not in appropriate normal form.
Relation R is decomposed into two or more relations if decomposition is lossless join as well as dependency
preserving.

CONCEPT OF SURROGATE KEY:


In database design, it is a good practice to have a primary key for each table. There are two ways to specify a
primary key: The first is to use part of the data as the primary key. For example, a table that includes
information on employees may use Social Security Number as the primary key. This type of key is called a
natural key. The second is to use a new field with artificially-generated values whose sole purpose is to be used
as a primary key. This is called a surrogate key.
A surrogate key has the following characteristics:
1) It is typically an integer.
2) It has no meaning. You will not be able to know the meaning of that row of data based on the surrogate
key value.
3) It is not visible to end users. End users should not see a surrogate key in a report.
Surrogate keys can be generated in a variety of ways, and most databases offer ways to generate
surrogate keys. For ex, Oracle uses SEQUENCE , MySQL uses AUTO_INCREMENT , and SQL Server uses
IDENTITY.
Surrogate keys are often used in data warehousing systems, as the high data volume in a data warehouse
means that optimizing query speed becomes important. Using a surrogate key is advantageous because it is
quicker to join on a numeric field rather than a non-numeric field. This does come at a price — when you insert
data into a table, whether via an ETL Process or via an "IINSERT INTO" statement, the system needs to take
more resources to generate the surrogate key.
There are no hard rules on when to employ a surrogate key as opposed to using the natural key. Often
the data architect would need to look at the nature of the data being modeled and stored and consider any
possible performance implications.

You might also like