
UNIT 4

RELATIONAL DATABASE DESIGN


In general, the goal of relational database design is to generate a set of relation schemas that
allows us to store information without unnecessary redundancy, yet also allows us to retrieve
information easily. This is accomplished by designing schemas that are in an appropriate normal
form. To determine whether a relation schema is in one of the desirable normal forms, we need
information about the real-world enterprise that we are modeling with the database. Some of this
information exists in a well-designed E-R diagram, but additional information about the enterprise
may be needed as well.

FEATURES OF GOOD RELATIONAL DESIGNS


Entity-relationship design provides an excellent starting point for creating a relational database
design. Obviously, the goodness (or badness) of the resulting set of schemas depends on how good
the E-R design was in the first place.
The schemas for the university database are given in the figure below.

Design Alternative: Larger Schemas


Suppose that instead of having the schemas instructor and department, we have the schema:

inst_dept (ID, name, salary, dept_name, building, budget)

This represents the result of a natural join on the relations corresponding to instructor and
department. This seems like a good idea because some queries can be expressed using fewer joins,
until we think carefully about the facts about the university that led to our E-R design.
Let us consider the instance of the inst_dept relation shown in the figure below. Notice that we
have to repeat the department information (“building” and “budget”) once for each instructor in the
department. For example, the information about the Comp. Sci. department (Taylor, 100000) is
included in the tuples of instructors Katz, Srinivasan, and Brandt.
It is important that all these tuples agree as to the budget amount since otherwise our
database would be inconsistent. In our original design using instructor and department, we stored
the amount of each budget exactly once. This suggests that using inst_dept is a bad idea since it

Department of IT Page 1
stores the budget amounts redundantly and runs the risk that some user might update the budget
amount in one tuple but not all, and thus create inconsistency.
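
The risk described above can be illustrated with a minimal Python sketch. The rows model the
inst_dept relation; only the Comp. Sci. building and budget (Taylor, 100000) and the three instructor
names come from the text, while the ID and salary values are invented for illustration.

```python
# Hypothetical inst_dept rows: ID and salary values are invented for illustration.
inst_dept = [
    {"ID": 1, "name": "Katz", "salary": 75000,
     "dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
    {"ID": 2, "name": "Srinivasan", "salary": 65000,
     "dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
    {"ID": 3, "name": "Brandt", "salary": 92000,
     "dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
]


def inconsistent_departments(rows):
    """Return the department names whose tuples disagree on (building, budget)."""
    seen = {}
    bad = set()
    for row in rows:
        facts = (row["building"], row["budget"])
        # setdefault stores the first facts seen for a department and returns them.
        if seen.setdefault(row["dept_name"], facts) != facts:
            bad.add(row["dept_name"])
    return bad


before = inconsistent_departments(inst_dept)

# A user updates the budget in one tuple but not in the others.
inst_dept[0]["budget"] = 120000
after = inconsistent_departments(inst_dept)

print(before, after)
```

With separate instructor and department relations the budget would be stored once, so this kind
of disagreement could not arise.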

Even if we decided to live with the redundancy problem, there is still another problem with
the inst_dept schema. Suppose we are creating a new department in the university. In the
alternative design above, we cannot represent directly the information concerning a department
(dept_name, building, budget) unless that department has at least one instructor at the university.
This is because tuples in the inst_dept table require values for ID, name, and salary. This means that
we cannot record information about the newly created department until the first instructor is hired
for the new department.

Problems Caused by Redundancy: -


Storing the same information redundantly, that is, in more than one place within a database, can
lead to several problems:
• Redundant Storage: - Some information is stored repeatedly.

• Update Anomalies: - If one copy of such repeated data is updated, an inconsistency is created
unless all copies are similarly updated.

• Insertion Anomalies: - It may not be possible to store certain information unless some other,
unrelated, information is stored as well.

• Deletion Anomalies: - It may not be possible to delete certain information without losing some
other, unrelated, information as well.

Design Alternative: Smaller Schemas


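Consider an extreme case where all we had were schemas consisting of one attribute. No
interesting relationships of any kind could be expressed.

Now consider a less extreme case, in which the schema employee (ID, name, street, city, salary)
is decomposed into the following two schemas:

employee1 (ID, name)
employee2 (name, street, city, salary)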

The flaw in this decomposition arises from the possibility that the enterprise has two employees
with the same name. This is not unlikely in practice, as many cultures have certain highly popular
names. Of course each person would have a unique employee-id, which is why ID can serve as the

primary key. As an example, let us assume two employees, both named Kim, work at the university
and have the following tuples in the relation on schema employee in the original design:

The above figure shows these tuples, the resulting tuples using the schemas resulting from the
decomposition, and the result if we attempted to regenerate the original tuples using a natural join.
As we see in the figure, the two original tuples appear in the result along with two new tuples that
incorrectly mix data values pertaining to the two employees named Kim. Although we have more
tuples, we actually have less information in the following sense. We can indicate that a certain
street, city, and salary pertain to someone named Kim, but we are unable to distinguish which of
the Kims. Thus, our decomposition is unable to represent certain important facts about the
university employees. Clearly, we would like to avoid such decompositions. We shall refer to such
decompositions as being lossy decompositions, and, conversely, to those that are not as lossless
decompositions.
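
The effect can be reproduced with a short Python sketch. It assumes the decomposition splits
employee into (ID, name) and (name, street, city, salary), which is what the discussion of the two
Kims implies; all attribute values below are invented for illustration.

```python
# Two hypothetical employees named Kim; every value here is invented for illustration.
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project onto (ID, name) and (name, street, city, salary).
employee1 = {(emp_id, name) for (emp_id, name, _, _, _) in employee}
employee2 = {(name, street, city, salary) for (_, name, street, city, salary) in employee}

# Natural join on the only shared attribute, name.
rejoined = {
    (emp_id, name1, street, city, salary)
    for (emp_id, name1) in employee1
    for (name2, street, city, salary) in employee2
    if name1 == name2
}

# Tuples that the join produces but that were never in the original relation.
spurious = rejoined - employee

print(len(employee), len(rejoined), len(spurious))
```

The join returns more tuples than the original relation, and the extra ones pair each ID with the
other Kim's address and salary, which is exactly the loss of information described above.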

NORMALIZATION: -
Database normalization is a technique for organizing the data in a database. It is a systematic
approach of decomposing tables to eliminate data redundancy (repetition) and undesirable
characteristics such as insertion, update, and deletion anomalies. Normalization divides a larger
table into smaller tables and links them using relationships. The normal forms are used to reduce
redundancy in database tables.
Normalization is used mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e., data is logically stored.
Problems without Normalization
If a table is not properly normalized and has data redundancy, it not only takes up extra
storage space but also becomes difficult to handle and update without risking data loss.
Insertion, update, and deletion anomalies are very frequent if a database is not normalized. To
understand these anomalies, let us take an example of a Student table.
Rollno  name  branch  hod    office_tel
401     Akon  CSE     Mr. X  53337
402     Bkon  CSE     Mr. X  53337
403     Ckon  CSE     Mr. X  53337
404     Dkon  CSE     Mr. X  53337


In the table above, we have data of 4 Computer Sci. students. As we can see, the data for the
fields branch, hod (Head of Department) and office_tel is repeated for the students who are in the
same branch in the college. This is Data Redundancy.

Insertion Anomaly: -
Suppose that, for a new admission, the student has not yet opted for a branch. Then the data of
the student cannot be inserted, or else we will have to set the branch information to NULL. Also,
if we have to insert data of 100 students of the same branch, the branch information will be
repeated for all those 100 students. These scenarios are Insertion anomalies.

Updation Anomaly: -
What if Mr. X leaves the college, or is no longer the HOD of the computer science department? In
that case all the student records will have to be updated, and if by mistake we miss any record,
it will lead to data inconsistency. This is Updation anomaly.

Deletion Anomaly: -
In our Student table, two different kinds of information are kept together: Student information
and Branch information. Hence, at the end of the academic year, if student records are deleted, we
will also lose the branch information. This is Deletion anomaly.

Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
6. Fifth Normal Form

Normal Form   Description
1NF           A relation is in 1NF if every attribute contains only atomic values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully
              functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no transitive dependency exists.
4NF           A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued
              dependency.
5NF           A relation is in 5NF if it is in 4NF and has no join dependency other than those
              implied by its candidate keys; any further decomposition must remain lossless.

1. First Normal Form (1NF): -


A relation is in 1NF if every attribute contains only atomic values. An attribute of a table
cannot hold multiple values; it must hold a single value. First normal form disallows
multi-valued attributes, composite attributes, and their combinations.
The first normal form expects you to design your table in such a way that it can easily be
extended and data can easily be retrieved from it whenever required.
If the tables in a database are not even in the first normal form, the database design is
considered bad.
Equivalently, a relation that contains a composite or multi-valued attribute violates first
normal form; a relation is in first normal form only if every attribute in it is single-valued.

Rules for First Normal Form: -


The first normal form expects you to follow a few simple rules while designing your database,
and they are:

Rule 1: Single Valued Attributes


Each column of your table should be single valued which means they should not contain multiple
values.

Rule 2: Attribute Domain should not change


This is more of a "Common Sense" rule. In each column the values stored must be of the same
kind or type.
For example: If you have a column dob to store the dates of birth of a set of people, then you
must not store the names of some of them in that column alongside the dates of birth of others.
It should hold only a date of birth for all the records/rows.

Rule 3: Unique name for Attributes/Columns


This rule expects that each column in a table should have a unique name. This is to avoid
confusion at the time of retrieving data or performing any other operation on the stored data. If
two or more columns have the same name, the DBMS cannot tell them apart.

Rule 4: Order doesn't matter


This rule says that the order in which you store the data in your table doesn't matter.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.


EMPLOYEE table:
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
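
The conversion above can be sketched in Python as follows. This is only an illustration under
the assumption that the multi-valued EMP_PHONE cell is available as a list; the function name
to_1nf is hypothetical.

```python
# Rows before normalization: EMP_PHONE holds a list of values.
employee = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]


def to_1nf(rows):
    """Emit one row per (employee, phone) pair so that every value is atomic."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]


employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row)
```

Note that after the conversion EMP_ID alone no longer identifies a row; in this sample the pair
(EMP_ID, EMP_PHONE) does.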


2. Second Normal Form (2NF): -
For a table to be in the Second Normal Form, it must satisfy two conditions:
1) The table should be in the First Normal Form.
2) There should be no Partial Dependency, i.e., all non-key attributes are fully functionally
dependent on the primary key.

What is Dependency?
Let's take an example of a Student table with columns student_id, name, reg_no(registration
number), branch and address(student's home address).
student_id name reg_no branch address

In this table, student_id is the primary key and will be unique for every row, hence we can
use student_id to fetch any row of data from this table. Even when two students have the same
name, if we know the student_id we can easily fetch the correct record.
student_id  name  reg_no  branch  address
10          Akon  07-WY   CSE     Kerala
11          Akon  08-WY   IT      Gujarat

Hence we can say that a Primary Key for a table is the column or group of columns (composite
key) which can uniquely identify each record in the table. We can get the branch name of the
student with student_id 10. Similarly, we can get the name of the student with student_id 10 or 11.
So all we need is student_id, and every other column depends on it, or can be fetched using it.
This is Dependency, and we also call it Functional Dependency.
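
As an illustration, the following Python sketch checks whether a functional dependency holds in
a given set of rows. The helper name fd_holds is hypothetical. Note that sample rows can only
refute a dependency; whether it holds in general is a statement about the real-world rules, not
about one instance.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in these rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


student = [
    {"student_id": 10, "name": "Akon", "reg_no": "07-WY", "branch": "CSE", "address": "Kerala"},
    {"student_id": 11, "name": "Akon", "reg_no": "08-WY", "branch": "IT", "address": "Gujarat"},
]

print(fd_holds(student, ["student_id"], ["name", "branch"]))
print(fd_holds(student, ["name"], ["branch"]))
```

The first check succeeds because student_id is unique; the second fails because the two students
named Akon are in different branches.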

What is Partial Dependency?


For a simple table like Student, a single column like student_id can uniquely identify all the
records in a table. But this is not true all the time. So now let's extend our example to see how
more than one column together can act as a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields,
with subject_id as the primary key.
subject_id  subject_name
1           Java
2           C++
3           Php
Now we have a Student table with student information and another table Subject for
storing subject information.
Let's create another table Score, to store the marks obtained by students in the respective
subjects. We will also be saving name of the teacher who teaches that subject along with marks.
score_id  student_id  subject_id  marks  teacher
1         10          1           70     Java Teacher
2         10          2           75     C++ Teacher
3         11          1           80     Java Teacher
In the Score table we are saving the student_id to know which student's marks these are
and the subject_id to know which subject the marks are for. Together, student_id +
subject_id forms a Candidate Key for this table, which can be the Primary key. Now if you look
at the Score table, we have a column named teacher which is only dependent on the subject: for
Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
The primary key for this table is a composition of two columns, student_id and subject_id, but
the teacher's name depends only on the subject, hence on subject_id, and has nothing to do with
student_id. This is Partial Dependency, where an attribute in a table depends on only a part of
the primary key and not on the whole key.
There can be many different ways to remove Partial Dependency, but our objective is to remove
the teacher's name from the Score table. The simplest solution is to remove the column teacher
from the Score table and add it to the Subject table. Hence, the Subject table will become:
subject_id  subject_name  teacher
1           Java          Java Teacher
2           C++           C++ Teacher
3           Php           Php Teacher


And our Score table is now in the second normal form, with no partial dependency.
score_id  student_id  subject_id  marks
1         10          1           70
2         10          2           75
3         11          1           80
Quick Recap: -
1. For a table to be in the Second Normal form, it should be in the First Normal form and it
should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the table
depends only on a part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is causing
partial dependency, and move it to some other table where it fits in well.
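
The recap can be sketched in Python using the Score rows from the example. The variable names
subject_teacher and score_2nf are hypothetical; the sketch only shows the mechanics of moving the
partially dependent column out of the table.

```python
score = [
    {"score_id": 1, "student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"score_id": 2, "student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"score_id": 3, "student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

# teacher depends on subject_id alone, so it moves to a mapping keyed by subject_id.
subject_teacher = {row["subject_id"]: row["teacher"] for row in score}

# The remaining Score rows keep only the attributes that depend on the whole key.
score_2nf = [{col: val for col, val in row.items() if col != "teacher"} for row in score]

print(subject_teacher)
print(score_2nf)
```

Each teacher name is now stored once per subject instead of once per score row.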

3. Third Normal Form (3NF): -


Here we have 3 tables, Student, Subject and Score.
Student Table
student_id  name  reg_no  branch  address
10          Akon  07-WY   CSE     Kerala
11          Akon  08-WY   IT      Gujarat
12          Bkon  09-WY   IT      Rajasthan

In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.
score_id student_id subject_id marks exam_name total_marks

Requirements for Third Normal Form


For a table to be in the third normal form,
1. It should be in the Second Normal form.
2. And it should not have Transitive Dependency.

What is Transitive Dependency?


With exam_name and total_marks added to our Score table, it stores more data now. The primary
key for our Score table is a composite key, which means it is made up of two attributes or
columns: student_id + subject_id. Our new column exam_name depends on both student and
subject. For example, a mechanical engineering student will have a Workshop exam but a
computer science student won't. And for some subjects you have Practical exams and for some
you don't. So we can say that exam_name is dependent on both student_id and subject_id.
But the second new column, total_marks, depends on exam_name, because the total score changes
with the exam type. For example, practicals carry fewer marks while theory exams carry more.
But exam_name is just another column in the Score table. It is not a primary key or even a part
of the primary key, and total_marks depends on it. This is Transitive Dependency: a non-prime
attribute depends on another non-prime attribute rather than on the prime attributes or the
primary key.

How to remove Transitive Dependency?
The solution is very simple. Take out the columns exam_name and total_marks from Score table
and put them in an Exam table and use the exam_id wherever required.
Score Table: In 3rd Normal Form
score_id student_id subject_id marks exam_id

The new Exam table


exam_id  exam_name   total_marks
1        Workshop    200
2        Mains       70
3        Practicals  30

Advantage of removing Transitive Dependency


The advantages of removing transitive dependency are:
• The amount of data duplication is reduced.
• Data integrity is achieved.

4. Boyce-Codd Normal Form (BCNF): -


Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also known as
3.5 Normal Form.
Rules for BCNF: -
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
1. It should be in the Third Normal Form.
2. And, for any non-trivial dependency A → B, A should be a super key.
In simple words, the second point means that, for a dependency A → B, A cannot be a non-prime
attribute if B is a prime attribute.
Example: -
Below we have a college enrolment table with columns student_id, subject and professor.
student_id  subject  professor
101         Java     P.Java
101         C++      P.Cpp
102         Java     P.Java2
103         C#       P.Chash
104         Java     P.Java


As you can see, we have also added some sample data to the table. In the table above:
• One student can enrol for multiple subjects. For example, the student with student_id 101 has
opted for the subjects Java and C++.
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as we have for Java.

In the table above, student_id and subject together form the primary key, because
using student_id and subject we can find all the columns of the table. One more important point
is that one professor teaches only one subject, but one subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the
professor name.
• This table satisfies the 1st Normal Form because all the values are atomic, column names are
unique and all the values stored in a particular column are of the same domain.
• This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
• There is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
• But this table is not in Boyce-Codd Normal Form.
In the table above, student_id and subject form the primary key, which means the subject column
is a prime attribute. But there is one more dependency, professor → subject. And while subject is
a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.
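
The argument can be checked against the sample data with a small Python sketch. The helper
determines is hypothetical and works on column positions; it only tests the rows shown, so it
illustrates the violation rather than proving anything about the schema in general.

```python
enrolment = [
    (101, "Java", "P.Java"),
    (101, "C++", "P.Cpp"),
    (102, "Java", "P.Java2"),
    (103, "C#", "P.Chash"),
    (104, "Java", "P.Java"),
]

STUDENT_ID, SUBJECT, PROFESSOR = 0, 1, 2


def determines(rows, lhs, rhs):
    """True if rows that agree on the lhs positions always agree on the rhs positions."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


all_columns = [STUDENT_ID, SUBJECT, PROFESSOR]

# professor -> subject holds in the sample data.
dependency_holds = determines(enrolment, [PROFESSOR], [SUBJECT])
# A super key must determine every column; professor alone does not.
professor_is_superkey = determines(enrolment, [PROFESSOR], all_columns)
# (student_id, subject) does determine every column.
key_is_superkey = determines(enrolment, [STUDENT_ID, SUBJECT], all_columns)

violates_bcnf = dependency_holds and not professor_is_superkey
print(violates_bcnf)
```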

How to satisfy BCNF?


To make this relation (table) satisfy BCNF, we decompose it into two tables, a student table
and a professor table.

And now, these relations satisfy Boyce-Codd Normal Form.

A more Generic Explanation: -


We have tried to explain BCNF in terms of relations.

5. Fourth Normal Form (4NF): -
Fourth Normal Form comes into the picture when a multi-valued dependency occurs in a relation.

Rules for 4th Normal Form: -


For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any Multi-valued Dependency.
What is Multi-valued Dependency?
A table is said to have a multi-valued dependency if the following conditions are true:
1. For a dependency A ->> B, a single value of A is associated with multiple values of B.
2. The table has at least 3 columns.
3. For a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B
and C are independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued
dependency.
Example: -
Below we have a college enrolment table with columns s_id, course and hobby.

s_id  course   hobby
1     Science  Cricket
1     Maths    Hockey
2     C#       Cricket
2     Php      Hockey

In the table above, the student with s_id 1 has opted for two courses, Science and Maths, and has
two hobbies, Cricket and Hockey. The two records for the student with s_id 1 give rise to
two more records, as shown below, because one student has two hobbies, and so both hobbies
have to be specified along with each of the courses.

s_id  course   hobby
1     Science  Cricket
1     Maths    Hockey
1     Science  Hockey
1     Maths    Cricket

In the table above, there is no relationship between the columns course and hobby. They
are independent of each other. So there is a multi-valued dependency, which leads to unnecessary
repetition of data and other anomalies as well.
How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.

Now this relation satisfies the fourth normal form. A table can also have functional dependency
along with multi-valued dependency. In that case, the functionally dependent columns are
moved in a separate table and the multi-valued dependent columns are moved to separate
tables. If you design your database carefully, you can easily avoid these issues.
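
A minimal Python sketch of such a decomposition follows, using the four rows for s_id 1 from the
example. The table names student_course and student_hobby are assumptions made for illustration,
since the figure with the decomposed tables is not reproduced here.

```python
enrolment = {
    (1, "Science", "Cricket"),
    (1, "Maths", "Hockey"),
    (1, "Science", "Hockey"),
    (1, "Maths", "Cricket"),
}

# Each independent multi-valued fact gets its own table.
student_course = {(s_id, course) for (s_id, course, _) in enrolment}
student_hobby = {(s_id, hobby) for (s_id, _, hobby) in enrolment}

# Joining the two tables on s_id reproduces the original rows.
rejoined = {
    (s1, course, hobby)
    for (s1, course) in student_course
    for (s2, hobby) in student_hobby
    if s1 == s2
}

print(len(student_course), len(student_hobby), rejoined == enrolment)
```

The two small tables hold two rows each, yet together they carry the same information as the
four-row table, without the course-hobby cross product.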

6. Fifth normal form (5NF): -


A relation is in 5NF if it is in 4NF and has no join dependency other than those implied by its
candidate keys, so that it cannot be decomposed any further without loss. 5NF is reached when
the tables have been broken into as many smaller tables as possible in order to avoid
redundancy, while every join remains lossless. 5NF is also known as Project-join normal form
(PJ/NF).
Join Dependency: -
Join dependency is a further generalization of multivalued dependency. If the join of R1 and
R2 over C is equal to relation R, then we can say that a join dependency (JD) exists, where R1
and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation R(A, B, C, D).
Alternatively, R1 and R2 are a lossless decomposition of R. A JD ⋈ {R1, R2, ..., Rn} is said to
hold over a relation R if R1, R2, ..., Rn is a lossless-join decomposition of R. *((A, B, C), (C, D))
is a JD of R if the join of the projections of R on these attribute sets is equal to R. Here,
*(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on form a JD of R.

Example: -
SUBJECT    LECTURER  SEMESTER
Computer   Anshika   Semester 1
Computer   John      Semester 1
Math       John      Semester 1
Math       Akash     Semester 2
Chemistry  Praveen   Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't
take the Math class for Semester 2. In this case, the combination of all three fields is required
to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
taking it, so we would have to leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other two columns blank. So, to bring the
above table into 5NF, we can decompose it into three relations P1, P2 & P3:
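
The figure defining P1, P2 and P3 is not reproduced here. The Python sketch below assumes the
usual choice of the three pairwise projections, (SEMESTER, SUBJECT), (SUBJECT, LECTURER) and
(SEMESTER, LECTURER), and checks on the sample data that joining all three gives back the
original table, while joining only two of them does not.

```python
teaching = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# Assumed projections; the schemas of P1, P2 and P3 are not given in the text.
p1 = {(sem, sub) for (sub, _, sem) in teaching}   # (SEMESTER, SUBJECT)
p2 = {(sub, lec) for (sub, lec, _) in teaching}   # (SUBJECT, LECTURER)
p3 = {(sem, lec) for (_, lec, sem) in teaching}   # (SEMESTER, LECTURER)

# Join of P1 and P2 only, on SUBJECT.
pairwise = {
    (sub, lec, sem)
    for (sem, sub) in p1
    for (sub2, lec) in p2
    if sub == sub2
}

# Join of all three: keep only rows whose (SEMESTER, LECTURER) pair appears in P3.
rejoined = {row for row in pairwise if (row[2], row[1]) in p3}

print(len(pairwise), rejoined == teaching)
```

With these projections a new semester, subject, or lecturer pairing can be recorded in one of
the small tables without needing values for the third column.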

FUNCTIONAL DEPENDENCY THEORY: -
Functional Dependency (FD) describes how one attribute relates to another attribute in a
database management system (DBMS). A functional dependency is a relationship that
exists between two attributes. It typically exists between the primary key and a non-key attribute
within a table. Functional dependency helps you to maintain the quality of data in the database.
A functional dependency is denoted by an arrow →. The functional dependency of Y on X is
represented by X → Y. The left side of an FD is known as the determinant, and the right side is
known as the dependent. Functional dependency plays a vital role in telling good database design
from bad.
For example:
Assume we have an employee table with attributes Emp_Id, Emp_Name, Emp_Address. Here the
Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if
we know the Emp_Id, we can tell the employee name associated with it. The functional dependency
can be written as:
• Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.

Key terms: -
Here are some key terms for functional dependency:
S.NO  KEY TERM       DESCRIPTION
1     Axiom          Axioms are a set of inference rules used to infer all the functional
                     dependencies on a relational database.
2     Decomposition  It is a rule that suggests if you have a table that appears to contain two
                     entities which are determined by the same primary key, then you should
                     consider breaking them up into two different tables.
3     Dependent      It is displayed on the right side of the functional dependency diagram.
4     Determinant    It is displayed on the left side of the functional dependency diagram.
5     Union          It suggests that if two tables are separate, and the PK is the same, you
                     should consider putting them together.

How to find functional dependencies for a relation?


Functional Dependencies in a relation are dependent on the domain of the relation. Consider the
STUDENT relation given below.

• We know that STUD_NO is unique for each student. So STUD_NO->STUD_NAME,
STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and
STUD_NO->STUD_AGE will all be true.
• Similarly, STUD_STATE->STUD_COUNTRY will be true, because if two records have the same
STUD_STATE, they will have the same STUD_COUNTRY as well.
• For relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will be true, as two records with
the same COURSE_NO will have the same COURSE_NAME.

Functional Dependency Set: -


The Functional Dependency set, or FD set, of a relation is the set of all FDs present in the
relation. For example, the FD set for relation STUDENT is:
{STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE,
STUD_NO->STUD_COUNTRY, STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY}

Attribute Closure: -
Attribute closure of an attribute set can be defined as set of attributes which can be functionally
determined from it.
How to find attribute closure of an attribute set?
To find the attribute closure of an attribute set:
• Add the elements of the attribute set to the result set.
• Repeatedly add to the result set any elements which can be functionally determined from the
elements already in the result set, until nothing more can be added.
Using FD set, attribute closure can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
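
The two steps above can be sketched in Python as follows. The function name closure and the
representation of each FD as a pair of sets are choices made for this illustration; the FD set is
the one listed for STUDENT.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


student_fds = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(closure({"STUD_NO"}, student_fds)))
print(sorted(closure({"STUD_STATE"}, student_fds)))
```

Because the closure of STUD_NO contains every attribute of STUDENT, STUD_NO is a key of the
relation; STUD_STATE is not.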

Rules of Functional Dependencies: -


Given below are the three most important rules for Functional Dependency:
1. Reflexive rule: - If X is a set of attributes and Y is a subset of X, then X -> Y holds.
2. Augmentation rule: - When x -> y holds, and c is a set of attributes, then xc -> yc also holds.
That is, adding the same attributes to both sides does not change the basic dependency.
3. Transitivity rule: - This rule is very similar to the transitive rule in algebra: if x -> y holds
and y -> z holds, then x -> z also holds. Here x -> y is read as "x functionally determines y".

Types of Functional Dependencies: -


They are of four types.
1. Trivial functional dependency.
2. Non-trivial functional dependency.
3. Multi valued dependency.
4. Transitive dependency.

1. Trivial functional dependency: - The dependency of an attribute on a set of attributes is known
as trivial functional dependency if the set of attributes includes that attribute.

Symbolically: A -> B is a trivial functional dependency if B is a subset of A. The following
dependencies are also trivial: A -> A and B -> B.
For Example: -
Consider a table with two columns Student_id and Student_Name. {Student_Id, Student_Name} ->
Student_Id is a trivial functional dependency as Student_Id is a subset of {Student_Id,
Student_Name}. That makes sense because if we know the values of Student_Id and
Student_Name then the value of Student_Id can be uniquely determined. Also, Student_Id ->
Student_Id & Student_Name -> Student_Name are trivial dependencies too.

2. Non-trivial functional dependency: - If a functional dependency X->Y holds true where Y is not
a subset of X then this dependency is called non trivial Functional dependency.
For Example: -
An employee table with three attributes: emp_id, emp_name, emp_address. The following
functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
On the other hand, the following dependency is trivial:
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
If an FD X->Y holds true where the intersection of X and Y is empty, then this dependency is said
to be a completely non-trivial functional dependency.

3. Multi valued dependency: -


Multivalued dependency occurs when there is more than one independent multivalued
attribute in a table.
For example: Consider a bike manufacturing company, which produces two colors (Black and
Red) of each model every year.
bike_model  manuf_year  color
M1001       2007        Black
M1001       2007        Red
M2012       2008        Black
M2012       2008        Red
M2222       2009        Black
M2222       2009        Red

Here columns manuf_year and color are independent of each other and dependent on
bike_model. In this case these two columns are said to be multivalued dependent on bike_model.
These dependencies can be represented like this:
bike_model ->> manuf_year
bike_model ->> color

4. Transitive dependency: -
A functional dependency is said to be transitive if it is indirectly formed by two functional
dependencies.
X -> Z is a transitive dependency if the following three conditions hold true:
• X -> Y
• Y does not -> X
• Y -> Z
Note: A transitive dependency can only occur in a relation of three or more attributes. This
dependency helps us in normalizing the database to 3NF (3rd Normal Form).

Example: -
Company    CEO            Age
Microsoft  Satya Nadella  51
Google     Sundar Pichai  46
Alibaba    Jack Ma        54

{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency, {Company} -> {Age} should hold. That
makes sense, because if we know the company name, we can determine its CEO's age.

Decomposition: -
The process of breaking up or dividing a single relation into two or more sub-relations is called
decomposition of a relation. Decomposition is required when a relation in the relational model is
not in an appropriate normal form. In a database, it breaks the table into multiple tables. If a
relation is not decomposed properly, it may lead to problems such as loss of information.
Decomposition is used to eliminate some of the problems of bad design, such as anomalies,
inconsistencies, and redundancy.
Types of Decomposition: -
1. Lossless Decomposition.
2. Dependency Preserving.

1. Lossless Decomposition: - If no information is lost from the relation that is decomposed,
then the decomposition is lossless. A lossless decomposition guarantees that joining the
decomposed relations results in exactly the relation that was decomposed. A decomposition is said
to be lossless if the natural join of all the decomposed relations gives the original relation.
Example: -

EMPLOYEE_DEPARTMENT table

EMP_ID  EMP_NAME   EMP_AGE  EMP_CITY   DEPT_ID  DEPT_NAME
22      Denim      28       Mumbai     827      Sales
33      Alina      25       Delhi      438      Marketing
46      Stephan    30       Bangalore  869      Finance
52      Katherine  36       Mumbai     575      Production
60      Jack       40       Noida      678      Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

Department of IT Page 16
EMPLOYEE table

EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY
22       Denim       28        Mumbai
33       Alina       25        Delhi
46       Stephan     30        Bangalore
52       Katherine   36        Mumbai
60       Jack        40        Noida

DEPARTMENT table

DEPT_ID   EMP_ID   DEPT_NAME
827       22       Sales
438       33       Marketing
869       46       Finance
575       52       Production
678       60       Testing

Now, when these two relations are joined on the common column "EMP_ID", the resulting
relation is:

Employee ⋈ Department

EMP_ID   EMP_NAME    EMP_AGE   EMP_CITY    DEPT_ID   DEPT_NAME
22       Denim       28        Mumbai      827       Sales
33       Alina       25        Delhi       438       Marketing
46       Stephan     30        Bangalore   869       Finance
52       Katherine   36        Mumbai      575       Production
60       Jack        40        Noida       678       Testing

Hence, the decomposition is a lossless-join decomposition.
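
The check carried out by hand above can be sketched in Python: project the original relation onto each sub-schema, take the natural join, and compare the result with the original. The helper names project, natural_join, and as_set are hypothetical, and the join assumes both inputs are non-empty. As a contrast, the sketch also tries a second split that shares only EMP_CITY; that split is an assumption added for illustration and is not in the notes.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    seen, out = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out

def natural_join(left, right):
    """Natural join on all attribute names the two relations share."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {frozenset(r.items()) for r in rows}

cols = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
data = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
emp_dept = [dict(zip(cols, t)) for t in data]

# Decomposition from the notes: the common column is EMP_ID
employee = project(emp_dept, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(emp_dept, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])
lossless = as_set(natural_join(employee, department)) == as_set(emp_dept)

# Illustrative lossy split (assumption): the common column is EMP_CITY
city_dept = project(emp_dept, ["EMP_CITY", "DEPT_ID", "DEPT_NAME"])
lossy_join = natural_join(employee, city_dept)
```

Because two employees live in Mumbai, joining on EMP_CITY pairs each of them with both Mumbai departments and produces spurious tuples, so the second split does not give back the original relation.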

2. Dependency Preserving: - A dependency is an important constraint on the database. In a
dependency-preserving decomposition, every dependency must be satisfied by at least one
decomposed table or be derivable from the dependencies that hold on the decomposed tables. If a
relation R is decomposed into relations R1 and R2, then each dependency of R must either be a
part of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and
R2. For example, suppose there is a relation R(A, B, C, D) with the functional dependency set
{A->BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving
because the FD A->BC is a part of relation R1(ABC).
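
A minimal Python sketch of a dependency-preservation test follows. It uses the standard attribute-closure technique rather than anything specific to these notes: for each FD X->Y, it grows X using only what can be derived inside each sub-schema, and then checks whether Y was reached. The function names closure and preserves are hypothetical, and the second example (A->B, B->C split into AB and AC) is an assumption added to show a failing case.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fds, parts):
    """True if every FD in fds can be enforced using the parts alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable from what we have, restricted to this part
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True

# Example from the notes: R(A, B, C, D), F = {A -> BC}, R1(ABC), R2(AD)
fds = [(frozenset("A"), frozenset("BC"))]
preserved = preserves(fds, [set("ABC"), set("AD")])

# Illustrative failing case (assumption): B -> C fits in neither AB nor AC
fds2 = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
not_preserved = preserves(fds2, [set("AB"), set("AC")])
```

In the first case A->BC lies entirely inside R1, so the test succeeds. In the second case B and C never appear together in one sub-schema and B->C cannot be derived from the projected dependencies, so the test fails.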

DATABASE DESIGN PROCESS: -


We assumed that a relation schema r(R) is given, and proceeded to normalize it. There are several
ways in which we could have come up with the schema r(R):
1. r(R) could have been generated in converting an E-R diagram to a set of relation schemas.
2. r(R) could have been a single relation schema containing all attributes that are of interest. The
normalization process then breaks up r(R) into smaller schemas.
3. r(R) could have been the result of an ad hoc design of relations, which we then test to
verify that it satisfies a desired normal form.

E-R Model and Normalization: -


When we define an E-R diagram carefully, identifying all entities correctly, the relation
schemas generated from the E-R diagram should not need much further normalization. However,
there can be functional dependencies between attributes of an entity. For instance, suppose an
instructor entity set had attributes dept_name and dept_address, and there is a functional
dependency dept_name → dept_address. We would then need to normalize the relation generated
from instructor.
Most examples of such dependencies arise out of poor E-R diagram design. In the above
example, if we had designed the E-R diagram correctly, we would have created a department entity
set with attribute dept_address and a relationship set between instructor and department.
Similarly, a relationship set involving more than two entity sets may result in a schema that may
not be in a desirable normal form.
Functional dependencies can help us detect poor E-R design. If the generated relation
schemas are not in desired normal form, the problem can be fixed in the E-R diagram. That is,
normalization can be done formally as part of data modeling. Alternatively, normalization can be
left to the designer’s intuition during E-R modeling, and can be done formally on the relation
schemas generated from the E-R model.
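
The instructor example above can be sketched in Python. The notes only ask for "a desired normal form"; the sketch tests BCNF, which is a choice made here. The attributes ID and name, and the assumption that ID is the key of instructor, are hypothetical additions; only dept_name → dept_address comes from the text. The function names are also hypothetical, and the check looks only at the listed dependencies, not at every dependency they imply.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Listed nontrivial FDs whose left side is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]

def split_on(schema, lhs, rhs):
    """One decomposition step: (lhs + rhs) and (schema minus (rhs minus lhs))."""
    return lhs | rhs, schema - (rhs - lhs)

schema = frozenset({"ID", "name", "dept_name", "dept_address"})
fds = [
    # Assumption: ID is the key of instructor
    (frozenset({"ID"}), frozenset({"name", "dept_name", "dept_address"})),
    # From the text
    (frozenset({"dept_name"}), frozenset({"dept_address"})),
]

violations = bcnf_violations(schema, fds)
department, instructor = split_on(schema, *violations[0])
```

The only violation found is dept_name → dept_address, and splitting on it gives department(dept_name, dept_address) and instructor(ID, name, dept_name), which matches the corrected E-R design with a separate department entity set.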

Naming of Attributes and Relationships: -


A desirable feature of a database design is the unique-role assumption, which means that
each attribute name has a unique meaning in the database. This prevents us from using the same
attribute to mean different things in different schemas. For example, we might otherwise consider
using the attribute number for phone number in the instructor schema and for room number in the
classroom schema. The join of a relation on schema instructor with one on classroom is
meaningless.

Denormalization for Performance: -


Occasionally database designers choose a schema that has redundant information; that is, it
is not normalized. They use the redundancy to improve performance for specific applications. The
penalty paid for not using a normalized schema is the extra work (in terms of coding time and
execution time) to keep redundant data consistent. The process of taking a normalized schema and
making it nonnormalized is called denormalization.

Performance benefits of denormalization


Denormalization can improve performance by:
• Minimizing the need for joins.
• Precomputing aggregate values, that is, computing them at data modification time rather than
at select time.
• Reducing the number of tables, in some cases.
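
The second point, precomputing aggregates, can be sketched in Python with a small in-memory store. The class name DeptSalaries, its methods, and the salary figures are hypothetical; the department and instructor names reuse the Comp. Sci. example from earlier in the unit. The sketch keeps a redundant per-department total that is updated on every modification, which is the extra work the text describes.

```python
class DeptSalaries:
    """Toy store that keeps a redundant, precomputed total per department."""

    def __init__(self):
        self.salaries = {}  # (dept, instructor) -> salary: the base data
        self.total = {}     # dept -> sum of salaries: redundant copy

    def set_salary(self, dept, instructor, salary):
        # Extra work at modification time keeps the redundant total consistent
        old = self.salaries.get((dept, instructor), 0)
        self.salaries[(dept, instructor)] = salary
        self.total[dept] = self.total.get(dept, 0) + salary - old

    def total_at_select_time(self, dept):
        # Normalized alternative: scan the base data on every query
        return sum(s for (d, _), s in self.salaries.items() if d == dept)

store = DeptSalaries()
store.set_salary("Comp. Sci.", "Katz", 75000)        # hypothetical figures
store.set_salary("Comp. Sci.", "Srinivasan", 65000)
store.set_salary("Comp. Sci.", "Brandt", 92000)
store.set_salary("Comp. Sci.", "Katz", 80000)        # an update, not an insert

precomputed = store.total["Comp. Sci."]
recomputed = store.total_at_select_time("Comp. Sci.")
```

Reading store.total is a single lookup, while total_at_select_time scans every row. The price is that any code path that changes salaries without going through set_salary would leave the stored total inconsistent with the base data.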

UNIT WISE IMPORTANT QUESTIONS: -
TWO MARK QUESTIONS: -
1. Define Functional Dependency [APRIL 2019]
(OR)
Define Functional Dependency [APRIL 2018]
(OR)
Define Functional Dependency [MAY 2016]
2. What is the necessity of normalization in a database? [APRIL 2019]
3. Write the definition of BCNF. [APRIL 2018]
4. Define Functional Dependency. Why are some functional dependencies trivial? [APRIL 2017]
5. Demonstrate transitive dependency. Give an example. [APRIL 2017]
6. Define 3rd Normal form. [APRIL 2017]
7. Write the definition of 3NF. [OCTOBER 2018]
8. Define MVD. [OCTOBER 2018]
9. Write any two reasons for going to normalization on a relation. [OCTOBER 2017]
10. What are multivalued dependencies? Give an example. [OCTOBER 2017]
11. Why are certain functional dependencies called trivial? [OCTOBER 2017]
12. Identify the problems caused by Redundancy. [DECEMBER 2016]
13. Define Transitive dependencies. [DECEMBER 2016]

ESSAY QUESTIONS: -
1. Explain how 3NF and BCNF can remove redundancy from relations. [APRIL 2019]
2. Discuss Multi valued dependency and 4NF in detail. [APRIL 2019]
3. What is MVD and write the MVD axioms? [APRIL 2018]
4. Explain 4NF and uses of 4NF with examples. [APRIL 2018]
5. Consider a relation scheme R=(A,B,C,D,E,H) on which the following functional dependencies hold:
{A⟶B, BC⟶D, E⟶C, D⟶A}. Write the candidate keys of R. [APRIL 2017]
6. Consider the statement “Every relation in 3NF is also in BCNF and vice versa”. Judge whether the
statement is correct or not. Give an explanation. [APRIL 2017]
7. Explain Lossless Join Decomposition and Dependency Preserving Decomposition. [MAY 2016]
8. Define BCNF? How does BCNF differ from 3NF? Explain with an example. [MAY 2016]
9. What is FD and write the FD axioms. [OCTOBER 2018]
10. Explain BCNF and uses of BCNF with examples. [OCTOBER 2018]
11. Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional dependencies hold:
{A⟶B, BC⟶D, E⟶C, D⟶A}. Compute the canonical cover. [OCTOBER 2017]
12. Define first normal form and second normal form. Give an example. [OCTOBER 2017]
13. What is schema refinement? Discuss the problems caused by redundancy. [DECEMBER 2016]
14. Contrast 3NF decomposition method with BCNF decomposition method illustratively.
[DECEMBER 2016]
