Kalinga University Department of Computer Science: Dependencies
Course-MCA Sem-II
Subject-Database Management System
Subject Code- MCA204
UNIT-III
Dependencies
A functional dependency (FD) is a relationship between two sets of attributes of a relation,
typically between the primary key (PK) and the non-key attributes of a table. For any relation R,
attribute Y is functionally dependent on attribute X (usually the PK) if, in every valid instance
of R, each value of X uniquely determines the value of Y.
A dependency FD: X → Y means that the values of Y are determined by the values of X: two
tuples sharing the same values of X will necessarily have the same values of Y. A functional
dependency FD: X → Y is called trivial if Y is a subset of X.
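The definition can be checked mechanically against a table instance. The following is a minimal Python sketch and not part of the course notes: the helper name fd_holds and the third sample row (E03) are assumptions made for illustration. Keep in mind that a single instance can only refute an FD; it cannot prove that the FD holds for every valid instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any later mismatch breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data: E03 is added so that the reverse FD fails.
employees = [
    {"EmpID": "E01", "EmpName": "Amit"},
    {"EmpID": "E02", "EmpName": "Rohit"},
    {"EmpID": "E03", "EmpName": "Amit"},
]

id_determines_name = fd_holds(employees, ["EmpID"], ["EmpName"])   # True
name_determines_id = fd_holds(employees, ["EmpName"], ["EmpID"])   # False
```

Here EmpID → EmpName holds in the sample, while EmpName → EmpID does not, because the name Amit appears with two different IDs.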
Functional Dependency
If one piece of information stored in a table uniquely determines another piece of information in
the same table, the relationship is called a functional dependency. Consider it as an association
between two attributes of the same relation.
P -> Q
E01 Amit 28
E02 Rohit 31
In the above table, EmpName is functionally dependent on EmpID because EmpName can take
only one value for the given value of EmpID:
Full Functional Dependency
An attribute is fully functionally dependent on a set of attributes if it is functionally dependent
on that set and not on any proper subset of it.
<ProjectCost>
ProjectID ProjectCost
001 1000
002 5000
<EmployeeProject>
Transitive Dependency
A functional dependency that arises through an indirect relationship is called a transitive
dependency: if X → Y and Y → Z hold, then X → Z holds transitively.
Multivalued Dependency
A multivalued dependency occurs when the existence of one or more rows in a table implies the
existence of one or more other rows in the same table.
->->
P->->Q
Q->->R
In the above case, Multivalued Dependency exists only if Q and R are independent attributes.
Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally dependent on part of a
candidate key.
The 2nd Normal Form (2NF) eliminates the Partial Dependency. Let us see an example −
<StudentProject>
As stated, for a partial dependency to exist, the non-prime attributes, i.e. StudentName and
ProjectName, must be functionally dependent on part of a candidate key.
StudentName can be determined by StudentID alone, which makes the relation partially dependent.
ProjectName can be determined by ProjectID alone, which also makes the relation partially dependent.
Normalization
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values.
It should hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It
creates a table that looks like this:
8812121212
102 Jon Kanpur
9900012222
103 Ron Chennai 7778881212
9990000123
104 Lester Bangalore
8123450987
Two employees (Jon and Lester) have two mobile numbers each, so the company stored both numbers in
the same field, as you can see in the table above.
This table is not in 1NF: the rule says “each attribute of a table must have atomic (single)
values”, and the emp_mobile values for Jon and Lester violate that rule. To make the table
comply with 1NF, the data should be stored like this:
Example: Suppose a school wants to store data about its teachers and the subjects they teach. It
creates a table that looks like this. Since a teacher can teach more than one subject, the table
can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because
the non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the
candidate key {teacher_id, subject}. This violates the rule for 2NF, which says “no non-prime
attribute is dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
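The split into teacher_details and teacher_subject is simply a pair of projections of the original table. The sketch below reproduces it in Python under stated assumptions: the helper name project is hypothetical, and rows are modelled as dictionaries keyed by the column names used in the notes.

```python
def project(rows, attrs):
    """Project rows onto `attrs`, dropping duplicate tuples (set semantics)."""
    result = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in result:
            result.append(item)
    return result


teacher = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]

# teacher_age now appears once per teacher instead of once per subject taught.
teacher_details = project(teacher, ["teacher_id", "teacher_age"])
teacher_subject = project(teacher, ["teacher_id", "subject"])
```

With this data, teacher_details has three rows and teacher_subject keeps all five, which matches the two tables shown above.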
Example
Consider the following table.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
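{Note that many courses have the same course fee.}
Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate
key {STUD_NO, COURSE_NO}. However, COURSE_NO → COURSE_FEE, i.e. COURSE_FEE depends on
COURSE_NO, which is a proper subset of the candidate key. Because of this partial dependency the
relation is not in 2NF, and it must be split into two tables: one for (STUD_NO, COURSE_NO) and one
for (COURSE_NO, COURSE_FEE).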
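The claim that {STUD_NO, COURSE_NO} is the only candidate key can be cross-checked against the sample rows. The sketch below is an illustration rather than a general method: the function name minimal_unique_sets is an assumption, and uniqueness within one instance only suggests a key, since a key is a property of every valid instance.

```python
from itertools import combinations


def minimal_unique_sets(rows, attrs):
    """Attribute sets whose values are unique per row, with no unique proper subset."""
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(set(k) <= set(combo) for k in found):
                continue  # contains a smaller unique set, so it is not minimal
            values = [tuple(r[a] for a in combo) for r in rows]
            if len(set(values)) == len(values):
                found.append(combo)
    return found


attrs = ["STUD_NO", "COURSE_NO", "COURSE_FEE"]
data = [(1, "C1", 1000), (2, "C2", 1500), (1, "C4", 2000),
        (4, "C3", 1000), (4, "C1", 1000), (2, "C5", 2000)]
enrolment = [dict(zip(attrs, values)) for values in data]

keys = minimal_unique_sets(enrolment, attrs)
```

For the six rows of the example, keys contains exactly one entry, ("STUD_NO", "COURSE_NO"), in line with the three observations listed above.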
Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5
Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
Note – 2NF tries to reduce the redundant data stored in memory. For instance, if 100 students
take course C1, we don't need to store its fee of 1000 in all 100 records; instead we store it
once in the second table as the course fee for C1.
An attribute that is not part of any candidate key is known as a non-prime attribute.
An attribute that is part of one of the candidate keys is known as a prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each
functional dependency X -> Y, at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
Example: Suppose a company wants to store the complete address of each employee, they create
a table named employee_details that looks like this:
Here, emp_state, emp_city and emp_district depend on emp_zip, and emp_zip depends on emp_id.
This makes the non-prime attributes (emp_state, emp_city and emp_district) transitively
dependent on the super key (emp_id), which violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the
transitive dependency:
employee table:
employee_zip table:
Example-2
Student Table
Subject Table
Score Table
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.
With exam_name and total_marks added to our Score table, it saves more data now. Primary key
for our Score table is a composite key, which means it's made up of two attributes or columns
→ student_id + subject_id.
Our new column exam_name depends on both student and subject. For example, a mechanical
engineering student will have a Workshop exam but a computer science student won't, and some
subjects have Practical exams while others don't. So we can say that exam_name is
dependent on both student_id and subject_id.
And what about our second new column total_marks? Does it depend on our Score table's
primary key?
Well, the column total_marks depends on exam_name, because the total score changes with the exam
type. For example, practicals carry fewer marks while theory exams carry more.
But exam_name is just another column in the Score table. It is not the primary key or even part
of the primary key, and total_marks depends on it.
This is a transitive dependency: a non-prime attribute depends on another non-prime attribute
rather than on the prime attributes or the primary key.
Again the solution is very simple. Take out the columns exam_name and total_marks from Score
table and put them in an Exam table and use the exam_id wherever required.
1 Workshop 200
2 Mains 70
3 Practicals 30
Multivalued Dependency
A multivalued dependency arises when two or more independent relationships are kept in a single
relation. In other words, a multivalued dependency occurs when the presence of one or more rows
in a table implies the presence of one or more other rows in that same table. Put another way,
two attributes (or columns) in a table are independent of one another, but both depend on a third
attribute. A multivalued dependency always requires at least three attributes because it consists
of at least two attributes that are dependent on a third.
For a dependency A ->> B, if multiple values of B exist for a single value of A, then the table
may have a multivalued dependency. The table should have at least 3 attributes, and for A ->> B to
be a multivalued dependency in R(A, B, C), B and C should be independent of each other.
o Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third
attribute that's why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors(white and
black) of each model every year.
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
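One way to test a multivalued dependency on a concrete table is to check that, for each BIKE_MODEL, every colour appears with every manufacturing year. The Python sketch below does this under stated assumptions: the function name mvd_holds and the sample rows (model M1, years 2020 and 2021) are hypothetical and are not taken from the notes.

```python
from collections import defaultdict
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on a non-empty instance: within each X-group, the Y-values
    and the values of the remaining attributes (Z) must occur in every combination."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in x)
        groups[key].add((tuple(row[a] for a in y), tuple(row[a] for a in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if set(product(ys, zs)) != pairs:
            return False
    return True


# Hypothetical instance: one model, two colours, two years, all four combinations.
bikes = [
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2020, "COLOR": "White"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2020, "COLOR": "Black"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2021, "COLOR": "White"},
    {"BIKE_MODEL": "M1", "MANUF_YEAR": 2021, "COLOR": "Black"},
]

complete = mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"])         # True
incomplete = mvd_holds(bikes[:-1], ["BIKE_MODEL"], ["COLOR"])  # False
```

Dropping any one of the four rows breaks the dependency, which shows how a multivalued dependency forces combinations of rows to be stored together.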
Example
Let us see an example −
<Student>
In the above table, we can see Students Amit and Akash have interest in more than one activity.
To correct it, divide the table into two separate tables and break Multivalued Dependency −
<StudentCourse>
StudentName CourseDiscipline
Amit Mathematics
Amit Mathematics
Yuvraj Computers
Akash Literature
Akash Literature
Akash Literature
<StudentActivities>
StudentName Activities
Amit Singing
Amit Dancing
Yuvraj Cricket
Akash Dancing
Akash Cricket
Akash Singing
This breaks the multivalued dependency and now we have two functional dependencies −
Fourth normal form (4NF) is a level of database normalization in which there are no non-trivial
multivalued dependencies other than those determined by a candidate key. It builds on the first
three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in
addition to meeting the requirements of BCNF, a database must not contain more than one
multivalued dependency.
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
A table with a multivalued dependency violates the normalization standard of Fourth Normal
Form (4NF) because it creates unnecessary redundancies and can contribute to inconsistent data.
To bring this up to 4NF, it is necessary to break this information into two tables.
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if multiple values of B exist for a single value of A, then the
relation has a multi-valued dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes: there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Example
Let us see an example −
<Movie>
Movie_Name Shooting_Location Listing
MovieOne UK Comedy
MovieOne UK Thriller
MovieTwo Australia Action
<Movie_Shooting>
Movie_Name Shooting_Location
MovieOne UK
MovieOne UK
MovieTwo Australia
MovieTwo Australia
MovieThree India
<Movie_Listing>
Movie_Name Listing
MovieOne Comedy
MovieOne Thriller
MovieTwo Action
MovieTwo Crime
MovieThree Drama
Let R be a relation schema and let R1, R2, R3, ..., Rn be a decomposition of R. An instance r(R)
is said to satisfy the join dependency *(R1, R2, ..., Rn) if and only if the natural join of the
projections of r onto R1, R2, ..., Rn is equal to r.
If a table can be recreated by joining multiple tables, each of which has a subset of the
attributes of the table, then the table has a join dependency. It is a generalization of
multivalued dependency.
Join dependency is related to 5NF: a relation is in 5NF only if it is already in 4NF
and cannot be decomposed further without loss of information.
Example
<Employee>
EmpName EmpSkills EmpJob (Assigned Work)
The above table can be decomposed into the following three tables; therefore it is not in 5NF:
<EmployeeSkills>
EmpName EmpSkills
Tom Networking
Katie Programming
<EmployeeJob>
EmpName EmpJob
Tom EJ001
Harry EJ002
Katie EJ002
<JobSkills>
EmpSkills EmpJob
Networking EJ001
Programming EJ002
The above relations have join dependency, so they are not in 5NF. That would mean that a join
relation of the above three relations is equal to our original relation <Employee>.
A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys
of R. A relation decomposed into two relations must have loss-less join Property, which ensures
that no spurious or extra tuples are generated, when relations are reunited through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies following conditions:
Application of the general definitions of 2NF and 3NF may identify additional redundancy
caused by dependencies that violate one or more candidate keys. However, despite these
additional constraints, dependencies can still exist that cause redundancy in
3NF relations. This weakness in 3NF resulted in the presentation of a stronger normal form
called Boyce–Codd Normal Form (Codd, 1974).
Although 3NF is an adequate normal form for relational databases, it may not remove 100% of the
redundancy when a functional dependency X → Y exists in which X is not a candidate key of the
given relation. This can be solved by Boyce-Codd Normal Form (BCNF).
Boyce–Codd Normal Form (BCNF) is based on functional dependencies that take into
account all candidate keys in a relation; however, BCNF also has additional constraints
compared with the general definition of 3NF.
A relation is in BCNF iff X is a superkey for every non-trivial functional dependency (FD) X → Y
in the given relation.
In other words,
A relation is in BCNF if and only if every determinant is a candidate key.
You came across a similar hierarchy, the Chomsky hierarchy, in Theory of Computation. Now
carefully study the hierarchy of normal forms (1NF, 2NF, 3NF, BCNF, each more restrictive than
the one before). It can be inferred that every relation in BCNF is also in 3NF. The converse does
not hold: a relation in 3NF need not be in BCNF. Ponder over this statement for a while.
To determine the highest normal form of a given relation R with functional dependencies, the
first step is to check whether the BCNF condition holds. If R is found to be in BCNF, it can be
safely deduced that the relation is also in 3NF, 2NF and 1NF as the hierarchy shows. The 1NF
has the least restrictive constraint – it only requires a relation R to have atomic values in each
tuple. The 2NF has a slightly more restrictive constraint.
The 3NF has more restrictive constraint than the first two normal forms but is less restrictive
than the BCNF. In this manner, the restriction increases as we traverse down the hierarchy.
Example-1:
Find the highest normal form of a relation R(A, B, C, D, E) with FD set {BC->D, AC->BE, B->E}.
Explanation:
Step-1: As we can see, (AC)+ = {A, C, B, E, D}, but none of its proper subsets can determine all
attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any
other attribute of the relation, so there is only one candidate key, {AC}.
Step-2: Prime attributes are those attributes that are part of a candidate key, {A, C} in this
example; the others, {B, D, E}, are non-prime.
Step-3: The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued
or composite attributes.
The relation is in 2nd normal form because BC->D satisfies 2nd normal form (BC is not a proper
subset of the candidate key AC), AC->BE satisfies it (AC is the candidate key) and B->E
satisfies it (B is not a proper subset of the candidate key AC).
The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is D a
prime attribute, and in B->E neither is B a super key nor is E a prime attribute. To satisfy 3rd
normal form, either the LHS of an FD should be a super key or the RHS should be a prime
attribute. So the highest normal form of the relation is 2nd normal form.
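The three steps can be reproduced with a short script. This is a sketch under stated assumptions, not the method prescribed by the notes: the function names are hypothetical, the FD set {BC->D, AC->BE, B->E} is the one used in the explanation, candidate keys are found by brute force, and the 3NF test inspects only the listed FDs.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


def partial_dependencies(keys, prime, fds):
    """Proper subsets of a key that determine a non-prime attribute (2NF test)."""
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                determined = closure(set(combo), fds) - prime
                if determined:
                    found.append((set(combo), determined))
    return found


def third_nf_violations(schema, prime, fds):
    """Listed FDs whose LHS is not a super key and whose RHS is not all prime."""
    return [(lhs, rhs) for lhs, rhs in fds
            if closure(lhs, fds) != schema and not (rhs - lhs) <= prime]


schema = set("ABCDE")
fds = [(set("BC"), set("D")), (set("AC"), set("BE")), (set("B"), set("E"))]

keys = candidate_keys(schema, fds)                    # only {A, C}
prime = set().union(*keys)                            # {A, C}
partial = partial_dependencies(keys, prime, fds)      # empty, so 2NF holds
violations = third_nf_violations(schema, prime, fds)  # BC->D and B->E
```

The script finds the single candidate key {A, C}, no partial dependency, and two FDs that break 3NF, so it agrees with the conclusion that the highest normal form is 2NF.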
Below we have a college enrolment table with columns student_id, subject and professor.
103 C# P.Chash
As you can see, we have also added some sample data to the table.
One student can enrol for multiple subjects. For example, the student with student_id 101
has opted for the subjects Java and C++.
And there can be multiple professors teaching one subject, as we have for Java.
In the table above, student_id and subject together form the primary key, because
using student_id and subject we can find all the columns of the table.
One more important point to note here is that one professor teaches only one subject, but one
subject may have two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the
professor name.
This table satisfies the 1st Normal Form because all the values are atomic, column names are
unique and all the values stored in a particular column are of the same domain.
This table also satisfies the 2nd Normal Form, as there is no partial dependency.
And there is no transitive dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
In the table above, student_id and subject form the primary key, which means the subject column
is a prime attribute.
But there is one more dependency, professor → subject. While subject is a prime attribute,
professor is a non-prime attribute and not a super key, which is not allowed by BCNF.
To make this relation (table) satisfy BCNF, we will decompose it into two
tables, a student table and a professor table.
Student Table
student_id p_id
101 1
101 2
and so on...
2 P.Cpp C++
and so on...
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now, this is in BCNF because the left side of both functional dependencies is a key.
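The BCNF test used in this example (the left side of every FD must be a super key) can be written as a closure check. The sketch below is illustrative only: the function names are hypothetical, and the assignment of each FD to one decomposed table is an assumption based on the attributes that the table contains.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left side is not a super key of `schema`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not schema <= closure(lhs, fds)]


fd_country = ({"EMP_ID"}, {"EMP_COUNTRY"})
fd_dept = ({"EMP_DEPT"}, {"DEPT_TYPE", "EMP_DEPT_NO"})

# The original EMPLOYEE table: neither left side determines all five attributes.
employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
before = bcnf_violations(employee, [fd_country, fd_dept])

# The three decomposed tables, each with the FDs that fall inside it (assumed).
decomposed = {
    "EMP_COUNTRY": ({"EMP_ID", "EMP_COUNTRY"}, [fd_country]),
    "EMP_DEPT": ({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept]),
    "EMP_DEPT_MAPPING": ({"EMP_ID", "EMP_DEPT"}, []),
}
after = {name: bcnf_violations(s, f) for name, (s, f) in decomposed.items()}
```

Both FDs are reported as violations for the original table, and none are reported for any of the three decomposed tables.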
Inclusion Dependency
o Multivalued dependencies and join dependencies can be used to guide database design,
although both are less common than functional dependencies.
o Inclusion dependencies are quite common, but they typically have little influence on the
design of the database.
o An inclusion dependency is a statement that some columns of a relation are
contained in other columns.
o A foreign key is an example of an inclusion dependency: the referring column(s) in one
relation are contained in the primary key column(s) of the referenced relation.
o Suppose we have two relations R and S that were obtained by translating two entity sets
such that every R entity is also an S entity.
o An inclusion dependency then holds if projecting R on its key attributes yields a
relation that is contained in the relation obtained by projecting S on its key attributes.
o We should not split groups of attributes that participate in an
inclusion dependency.
o In practice, most inclusion dependencies are key-based, that is, they involve only keys.
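A foreign-key style inclusion dependency amounts to a subset test between two projections. The Python sketch below illustrates this with hypothetical names and data (inclusion_holds, the departments and employees tables and their columns are assumptions, not taken from the notes).

```python
def inclusion_holds(r, r_cols, s, s_cols):
    """True if every `r_cols` value combination in r also appears under `s_cols` in s."""
    r_values = {tuple(row[c] for c in r_cols) for row in r}
    s_values = {tuple(row[c] for c in s_cols) for row in s}
    return r_values <= s_values


# Hypothetical referenced relation (primary key: dept_id).
departments = [
    {"dept_id": "D1", "dept_name": "Operations"},
    {"dept_id": "D2", "dept_name": "HR"},
]

# Hypothetical referring relation; dept_id plays the role of a foreign key.
employees = [
    {"emp_id": "E001", "dept_id": "D1"},
    {"emp_id": "E002", "dept_id": "D2"},
]

valid = inclusion_holds(employees, ["dept_id"], departments, ["dept_id"])

# A row that refers to a missing department breaks the inclusion dependency.
orphaned = employees + [{"emp_id": "E003", "dept_id": "D9"}]
broken = inclusion_holds(orphaned, ["dept_id"], departments, ["dept_id"])
```

This is the same condition that a DBMS enforces when a foreign key constraint is declared.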
Lossless Decomposition
o Decomposition is lossless if it is feasible to reconstruct relation R from the decomposed
tables using joins. This is the preferred choice: no information is lost from the
relation when it is decomposed, and the join results in the same original relation.
<EmpInfo>
<DeptDetails>
Dpt2 E002 HR
o Therefore, the above relation had lossless decomposition i.e. no loss of information.
Lossy Decomposition
o As the name suggests, when a relation is decomposed into two or more relational
schemas, the loss of information is unavoidable when the original relation is retrieved.
<EmpInfo>
<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
o Now, you won’t be able to join the above tables, since Emp_ID isn’t part of
the DeptDetails relation.
A B C
1 2 1
2 5 3
3 3 3
R( A , B , C )
Suppose this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-
R(A,B,C)
1. R1(A,B)
2. R2(B,C)
The two sub relations are-
A B
1 2
2 5
3 3
R1( A , B )
B C
2 1
5 3
3 3
R2( B , C )
Now, let us check whether this decomposition is lossless or not.
For lossless decomposition, we must have-
R1 ⋈ R2 = R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
A B C
1 2 1
2 5 3
3 3 3
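The joined result is identical to the original relation R, so the decomposition is lossless. The same check can be scripted; the following is a minimal sketch in which the helper names natural_join and same_relation are assumptions and rows are modelled as dictionaries.

```python
def natural_join(r1, r2):
    """Natural join of two lists of dict rows on their shared attribute names."""
    if not r1 or not r2:
        return []
    common = [a for a in r1[0] if a in r2[0]]
    joined = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                row = {**t1, **t2}
                if row not in joined:
                    joined.append(row)
    return joined


def same_relation(r1, r2):
    """Compare two relations as sets of tuples, ignoring row order."""
    def as_set(rows):
        return {tuple(sorted(row.items())) for row in rows}
    return as_set(r1) == as_set(r2)


R = [{"A": 1, "B": 2, "C": 1}, {"A": 2, "B": 5, "C": 3}, {"A": 3, "B": 3, "C": 3}]
R1 = [{"A": 1, "B": 2}, {"A": 2, "B": 5}, {"A": 3, "B": 3}]
R2 = [{"B": 2, "C": 1}, {"B": 5, "C": 3}, {"B": 3, "C": 3}]

lossless = same_relation(natural_join(R1, R2), R)  # True for this data
```

The join matches rows on the shared attribute B, and because every B value is distinct here, each row of R1 pairs with exactly one row of R2.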
No_name
Roll_no Sname
111 parimal
222 parimal
name_dept
Sname Dept
parimal COMPUTER
parimal ELECTRICAL
In a lossy decomposition, spurious tuples are generated when a natural join is applied to the
relations in the decomposition.
stu_joined
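The spurious tuples can be made visible by joining the two tables on Sname and comparing the result with the original relation. In the sketch below the original student relation is an assumption (the notes do not show it): it is taken to hold one row per roll number, with 111 in COMPUTER and 222 in ELECTRICAL.

```python
# The two decomposed tables from the example, as sets of tuples.
roll_name = {(111, "parimal"), (222, "parimal")}
name_dept = {("parimal", "COMPUTER"), ("parimal", "ELECTRICAL")}

# Assumed original relation (Roll_no, Sname, Dept); not shown in the notes.
student = {(111, "parimal", "COMPUTER"), (222, "parimal", "ELECTRICAL")}

# Natural join on the shared attribute Sname.
stu_joined = {
    (roll, sname, dept)
    for (roll, sname) in roll_name
    for (other_sname, dept) in name_dept
    if sname == other_sname
}

# Tuples that the join produces but the original relation never contained.
spurious = stu_joined - student
```

Under that assumption the join yields four tuples, two of which, (111, parimal, ELECTRICAL) and (222, parimal, COMPUTER), are spurious, because Sname does not uniquely identify a row in either table.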
Join Dependency
o Join dependency is a further generalization of multivalued dependency.
o If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists,
o where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relation
R(A, B, C, D).
o Alternatively, R1 and R2 are a lossless decomposition of R.
o A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a
lossless-join decomposition of R.
o *((A, B, C), (C, D)) is a JD of R if the join over the common attribute C is equal to the
relation R.
o Here, *(R1, R2, R3) is used to indicate that the relations R1, R2, R3 and so on are a JD of R.
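A join dependency over any number of components can be tested by projecting the relation onto each component and joining the projections back together. The sketch below is one possible way to do this, assuming hypothetical helper names and hypothetical sample rows for R(A, B, C, D); rows are stored as frozensets of (attribute, value) pairs so that relations behave as sets.

```python
from functools import reduce


def relation(rows):
    """Turn a list of dict rows into a set of hashable tuples."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    """Projection onto `attrs`; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join on whatever attributes each pair of tuples shares."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


def jd_holds(rel, components):
    """True if joining the projections onto `components` gives back `rel`."""
    parts = [project(rel, attrs) for attrs in components]
    return reduce(natural_join, parts) == rel


components = [{"A", "B", "C"}, {"C", "D"}]

# Hypothetical instance in which C determines D: the JD holds.
r_ok = relation([{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 3, "D": 4}])

# Same C value with two different D values: the join creates extra tuples.
r_bad = relation([{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 3, "D": 7}])

holds = jd_holds(r_ok, components)    # True
fails = jd_holds(r_bad, components)   # False
```

With two components this is exactly the lossless-join test; passing three or more components checks the general *(R1, R2, ..., Rn) form.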