
DBMS 1

UNIT-III
SCHEMA REFINEMENT AND NORMAL FORMS: Introduction to Schema Refinement -
Problems Caused by redundancy, Decompositions - Problem related to decomposition, Functional
Dependencies -Reasoning about FDS, Normal Forms - FIRST, SECOND, THIRD Normal forms -
BCNF - Properties of Decompositions - Lossless join Decomposition, Dependency preserving
Decomposition, Multi valued Dependencies - FOURTH Normal Form, Join Dependencies, FIFTH
Normal form.

Problems Caused by redundancy


Data redundancy in databases refers to the unnecessary duplication of data. It can arise from poor
database design or lack of proper normalization. Redundancy can cause several issues:

1. Wasted Storage: Storing duplicate data consumes more storage than necessary.
2. Data Anomalies: These are inconsistencies that arise due to redundancy.
Update Anomalies: When the same piece of data is stored in multiple places, updating it in one place
leaves the copies inconsistent unless every copy is updated.
Insertion Anomalies: Adding a new fact may require re-entering data that is already stored, or may be
impossible until some unrelated data is also available.
Deletion Anomalies: Deleting a row might unintentionally remove the only copy of other data
that is still needed.
3. Increased Complexity: Querying and maintaining redundant data can be more complex.
4. Performance Issues: Duplicate data can slow down search, update, and insert operations.
5. Data Integrity Issues: If data is inconsistent across tables, it can lead to data integrity issues.

Example for Problems Caused by Redundancy:


Let's consider a simplistic example. Suppose you have a table called "Orders" with the following structure
and data:

| OrderID | CustomerName | Product  | CustomerAddress |
|---------|--------------|----------|-----------------|
| 1       | Madhu        | Laptop   | Hyderabad       |
| 2       | Madhu        | Mouse    | Hyderabad       |
| 3       | Naveen       | Keyboard | Bengaluru       |
From the table:

Redundancy: The `CustomerName` "Madhu" and his `CustomerAddress` "Hyderabad" are repeated for
two orders.

Problems:
Update Anomaly: If Madhu moves to a new address, you have to update every row that mentions him. If you
miss one, the table holds two different addresses for the same customer.
Insertion Anomaly: To insert a new order for Madhu, you have to re-enter his address, leading to further
redundancy. You also cannot record a new customer's address until that customer places an order.
Deletion Anomaly: If you delete order 3, the only row that mentions Naveen, you also lose Naveen's
address, even though you only meant to remove the order.
Solution:
Normalizing the database can resolve these problems. In this example, splitting the table into two tables,
`Orders` and `Customers`, would be a start:

1. Customers Table:

| CustomerID | CustomerName | CustomerAddress |
|------------|--------------|-----------------|
| 101        | Madhu        | Hyderabad       |
| 102        | Naveen       | Bengaluru       |
2. Orders Table:

| OrderID | CustomerID | Product  |
|---------|------------|----------|
| 1       | 101        | Laptop   |
| 2       | 101        | Mouse    |
| 3       | 102        | Keyboard |
This design reduces redundancy and eliminates the anomalies.
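
The two-table design can be tried out with a small sketch using Python's built-in `sqlite3` module. The column types, the foreign key, and the new address "Chennai" are assumptions made for illustration only; they are not part of the example above.

```python
import sqlite3

# In-memory database holding the normalized Customers and Orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID      INTEGER PRIMARY KEY,
        CustomerName    TEXT NOT NULL,
        CustomerAddress TEXT NOT NULL
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        Product    TEXT NOT NULL
    );
    INSERT INTO Customers VALUES (101, 'Madhu', 'Hyderabad'), (102, 'Naveen', 'Bengaluru');
    INSERT INTO Orders VALUES (1, 101, 'Laptop'), (2, 101, 'Mouse'), (3, 102, 'Keyboard');
""")

# Update: Madhu moves (hypothetical new address). Only one row changes.
conn.execute(
    "UPDATE Customers SET CustomerAddress = ? WHERE CustomerID = ?", ("Chennai", 101)
)

# Deletion: removing Naveen's only order does not remove Naveen's address.
conn.execute("DELETE FROM Orders WHERE OrderID = 3")

# Rebuild the original wide view with a join.
orders_view = conn.execute("""
    SELECT o.OrderID, c.CustomerName, o.Product, c.CustomerAddress
    FROM Orders AS o JOIN Customers AS c ON c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()
naveen = conn.execute(
    "SELECT CustomerAddress FROM Customers WHERE CustomerID = 102"
).fetchone()

print(orders_view)
print(naveen)
```

Both of Madhu's orders pick up the new address from a single update, and Naveen's address is still stored after his order is deleted.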

Decompositions and its problems in DBMS


Decomposition in the context of database design refers to the process of breaking down a single table into
multiple tables in order to eliminate redundancy, reduce data anomalies, and achieve normalization.
Decomposition is typically done using rules defined by normalization forms.

However, while decomposition can be helpful, it is not without challenges. Done incorrectly,
decomposition can lead to its own set of problems.

Problems Related to Decomposition


1. Loss of Information
Lossless (non-loss) decomposition: When a relation is decomposed into two or more smaller relations, and
the original relation can be perfectly reconstructed by taking the natural join of the decomposed relations,
then it is termed a lossless decomposition. If not, it is termed a "lossy decomposition."
Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you decompose it into
`R1(A, B)` and `R2(B, C)`, the decomposition can be lossy: the common attribute B determines neither A
nor C, so the natural join may produce rows that were never in the original table.
Example: Consider a relation R(A,B,C) with the following data:

|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).

R1(A, B):

|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
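R2(A, C):

|A |C |
|----|----|
|1 |P |
|2 |Q |
A projection keeps each distinct row only once, so (1, P) appears a single time.
Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation R.
This works because A → C holds in R, so the common attribute A is a key of R2.
Therefore, this is a lossless decomposition.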
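
The project-then-join check can be sketched in Python. The helper names (`project`, `natural_join`, `is_lossless_for`) and the second instance `s_rows` are hypothetical and exist only for this illustration. Note that the check is made on one table instance: it can expose a lossy decomposition, but it does not prove that a decomposition is lossless for every possible instance.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; the set drops duplicate rows."""
    return {tuple((a, r[a]) for a in attrs) for r in rows}

def natural_join(left, right):
    """Natural join of two projections (sets of (attribute, value) tuples)."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Rows match when they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined

def is_lossless_for(rows, attrs1, attrs2):
    """True if joining the two projections returns exactly the given rows."""
    original = {tuple(sorted(r.items())) for r in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# The relation R from the example above.
r_rows = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 1, "B": "Y", "C": "P"},
    {"A": 2, "B": "Z", "C": "Q"},
]
# Hypothetical instance in which B determines neither A nor C.
s_rows = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 2, "B": "X", "C": "Q"},
]

print(is_lossless_for(r_rows, ["A", "B"], ["A", "C"]))  # True
print(is_lossless_for(s_rows, ["A", "B"], ["B", "C"]))  # False: spurious rows appear
```

For `s_rows`, the join on B pairs every A value with every C value and returns four rows instead of two.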

2. Loss of Functional Dependency


Once tables are decomposed, certain functional dependencies might not be preserved, which can lead to
the inability to enforce specific integrity constraints.
Example: If you have the functional dependency `A → B` in the original table, but no decomposed table
contains both `A` and `B`, and `A → B` does not follow from the dependencies that can be checked within
the individual tables, then this functional dependency is not preserved.
Example: Let's consider a relation R with attributes A,B, and C and the following functional
dependencies:

A→B
B→C

Now, suppose we decompose R into two relations:

R1(A,B) with FD A → B
R2(B,C) with FD B → C

In this case, the decomposition is dependency-preserving because all the functional dependencies of the
original relation R can be found in the decomposed relations R1 and R2. We do not need to join R1 and
R2 to enforce or check any of the functional dependencies.

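Note that A → C also holds in R. No single table contains both A and C, but A → C follows from A → B and
B → C by transitivity, so it is still preserved. By contrast, if R had the dependencies A → B and A → C but
not B → C, then A → C could neither be checked in R1 or R2 alone nor derived from the dependencies that
hold in them, and the decomposition would not be dependency-preserving for that specific FD.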
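
Whether a particular FD survives a decomposition can be tested without computing any join. The sketch below follows the usual textbook procedure: it grows the left-hand side using only what each decomposed schema can "see". The function names and the second FD set (`fan`, with A → B and A → C but no B → C) are assumptions chosen for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fd, fds, schemas):
    """True if fd follows from the projections of fds onto the schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # What this one table can derive from the attributes it contains.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

schemas = [frozenset("AB"), frozenset("BC")]   # R1(A, B) and R2(B, C)
chain = [fd("A", "B"), fd("B", "C")]           # the FDs used in the text
fan = [fd("A", "B"), fd("A", "C")]             # hypothetical: no B -> C

print(preserves(fd("A", "C"), chain, schemas))  # True
print(preserves(fd("A", "C"), fan, schemas))    # False
```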

3. Increased Complexity
Decomposition leads to an increase in the number of tables, which can complicate queries and
maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can mitigate this to some
extent, it still adds complexity.
4. Redundancy
Incorrect decomposition might not eliminate redundancy, and in some cases, can even introduce new
redundancies.
5. Performance Overhead
An increased number of tables, while aiding normalization, can also lead to more complex SQL queries
involving multiple joins, which can introduce performance overheads.
Best Practices
Ensure decomposition is non-lossy. After decomposition, it should be possible to recreate the original data
using natural joins.
Preserve functional dependencies to enforce integrity constraints.
Strike a balance. While normalization and decomposition are essential, in some scenarios (like reporting
databases), a certain level of denormalization might be preferred for performance reasons.
Regularly review and optimize the database design, especially as the application's requirements evolve.

In essence, while decomposition is a powerful tool in achieving database normalization and reducing
anomalies, it must be done thoughtfully and judiciously to avoid introducing new problems.

Functional Dependencies and its reasoning in DBMS

Functional dependencies play a vital role in the normalization process in relational database design. They
help in defining the relationships between attributes in a relation and are used to formalize the properties
of the relation and drive the process of decomposition.

Functional Dependencies (FD)


A functional dependency `X→Y` between two sets of attributes X and Y in a relation R is defined as: if two
tuples (rows) of R have the same value for attributes X, then they must also have the same values for
attributes Y. In other words, the values of X determine the values of Y.

Consider a relation (table) Students:

| sid  | sname    | zipcode | cityname   | state       |
|------|----------|---------|------------|-------------|
| S001 | Aarav    | 110001  | New Delhi  | Delhi       |
| S002 | Priyanka | 400001  | Mumbai     | Maharashtra |
| S003 | Rohit    | 700001  | Kolkata    | West Bengal |
| S004 | Ananya   | 560001  | Bengaluru  | Karnataka   |
| S005 | Kartik   | 600001  | Chennai    | Tamil Nadu  |
| S006 | Lakshmi  | 500001  | Hyderabad  | Telangana   |
| S007 | Aditya   | 160017  | Chandigarh | Chandigarh  |
sid functionally determines sname, because for a given student ID there is only one possible student name.
zipcode functionally determines cityname, because a specific zip code belongs to exactly one city.
cityname functionally determines state, assuming no two states share a city name.
Mathematically, these functional dependencies can be represented as:

sid→sname

zipcode→cityname

cityname→state
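
An FD can be tested against a table instance by looking for two rows that agree on the left side but differ on the right. The sketch below is a minimal illustration; `fd_holds` is a hypothetical helper, and the row for S008 is an invented extra row that is not in the Students table. Keep in mind that data can only refute an FD. Whether an FD truly holds is a statement about the real-world rules, not about one snapshot of the table.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare.
        if seen.setdefault(key, value) != value:
            return False
    return True

students = [
    {"sid": "S001", "zipcode": "110001", "cityname": "New Delhi", "state": "Delhi"},
    {"sid": "S002", "zipcode": "400001", "cityname": "Mumbai", "state": "Maharashtra"},
    {"sid": "S006", "zipcode": "500001", "cityname": "Hyderabad", "state": "Telangana"},
    # Hypothetical extra row: a second zip code in Mumbai.
    {"sid": "S008", "zipcode": "400002", "cityname": "Mumbai", "state": "Maharashtra"},
]

print(fd_holds(students, ["zipcode"], ["cityname"]))  # True
print(fd_holds(students, ["cityname"], ["state"]))    # True
print(fd_holds(students, ["cityname"], ["zipcode"]))  # False: Mumbai has two zip codes
```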

Reasoning About Functional Dependencies


1. Trivial Dependency
- If Y is a subset of X, then the dependency X -> Y is trivial.
- For example, in {A, B} -> {A}, the dependency is trivial because A is part of {A, B}.

Example: For attributes A,B,C:

A→A is a trivial dependency because an attribute always determines itself.
AB→A is a trivial dependency because A is a subset of the combined attributes A and B.
ABC→AC is a trivial dependency for the same reason; the combined attributes A, B, and C always determine
A and C.
2. Full Functional Dependency

- An attribute functionally depends on a set of attributes, X, and does not functionally depend on any
proper subset of X.

Example: Consider a relation StudentCourses that has the following attributes:

StudentID (unique identifier for each student)


CourseID (unique identifier for each course)
Instructor (name of the instructor teaching the course)
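The relation is used to keep track of which student is enrolled in which course and which instructor
teaches that student. Assume a course can be taught by several instructors, and each student takes a
course with exactly one of them.

Now, assume we have the following functional dependency:

(StudentID,CourseID) → Instructor

This means that a combination of a specific student and a specific course will determine who the
instructor is. Neither StudentID alone nor CourseID alone is enough, because a student has different
instructors for different courses and a course has several instructors.

Thus, Instructor is fully functionally dependent on the combined attributes StudentID and CourseID.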

3. Transitive Dependency
- If A -> B and B -> C, then C is transitively dependent on A through B.

Example: Consider a relation Employees with the following attributes:

EmployeeID (unique identifier for each employee)


EmployeeName
Department (department in which the employee works)
DepartmentLocation (location of the department)
Now, let's consider the following functional dependencies:

EmployeeID → Department
Department → DepartmentLocation

From the above functional dependencies:

An EmployeeID determines the Department an employee works in.


A Department determines its DepartmentLocation.

However, the DepartmentLocation is also dependent on the EmployeeID through Department. This means
the DepartmentLocation has a transitive dependency on EmployeeID via Department.

This kind of transitive dependency can lead to redundancy.



4. Closure
- The closure of a set of attributes X with respect to a set of functional dependencies FD, denoted as X+,
is the set of attributes that are functionally determined by X.
- For example, given FDs: {A -> B, B -> C}, the closure of {A}, denoted as A+, would be {A, B, C}.
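
The closure can be computed by repeatedly applying every FD whose left side is already covered, until nothing new is added. The sketch below is one straightforward way to write this; representing FDs as `(lhs, rhs)` pairs of Python sets is an assumption of the sketch.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left side is inside the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
print(sorted(closure({"B"}, fds)))  # ['B', 'C']
print(sorted(closure({"C"}, fds)))  # ['C']
```

Since A+ contains every attribute, A is a key of a relation R(A, B, C) with these FDs. The same closure test is what decides whether a set of attributes is a superkey.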

Introduction to Normal Forms

In database management systems (DBMS), the concept of normalization is employed to organize
relational databases efficiently, to eliminate redundant data, and to ensure that data dependencies make
sense so that data integrity is maintained. The process of normalization is divided into several stages,
called "normal forms." Each normal form has a specific set of rules and criteria that a database schema
must meet.

Here's a brief overview of the main normal forms:

1. First Normal Form (1NF)
Each table should have a primary key.
Atomic values: Each attribute (column) of a table should hold only a single value, meaning no repeating
groups or arrays.
All entries in any column must be of the same kind.
2. Second Normal Form (2NF)
It meets all the requirements of 1NF.
It ensures that non-key attributes are fully functionally dependent on the primary key. In other words, if a
table has a composite primary key, then every non-key attribute should be dependent on the full set of
primary key attributes.
3. Third Normal Form (3NF)
It meets all the requirements of 2NF.
It ensures that non-key columns depend on the primary key directly and not through another non-key
column. This means there should be no transitive dependencies.
4. Boyce-Codd Normal Form (BCNF)
Meets all requirements of 3NF.
For every non-trivial functional dependency X → Y, X should be a superkey. It's a more stringent version
of 3NF.
5. Fourth Normal Form (4NF)
Meets all the requirements of BCNF.
For every non-trivial multi-valued dependency X ↠ Y, X should be a superkey. This separates independent
multi-valued facts about the same key (for example, a student's hobbies and a student's courses) into
different tables.
6. Fifth Normal Form (5NF or Project-Join Normal Form - PJNF)
Meets all the requirements of 4NF.
Every non-trivial join dependency should be implied by the candidate keys. This covers tables that can
be rebuilt without loss only by joining three or more of their projections.
7. Sixth Normal Form (6NF)
Often considered when dealing with temporal databases (databases that have time-dependent data).
Deals with how data evolves over time and is less commonly discussed in most relational database design
contexts.

Normalization often involves trade-offs. While higher normal forms eliminate redundancy and improve
data integrity, they can also result in more complex relational schemas and sometimes require more joins,
which can affect performance. As such, it's essential to understand the data and the specific application's
requirements when deciding the level of normalization suitable for a particular situation. Sometimes,
denormalization (intentionally introducing redundancy) is implemented to improve performance,
especially in read-heavy databases.

First Normal Form (1NF) in DBMS

The First Normal Form (1NF) is the first step in the normalization process of organizing data within a
relational database to reduce redundancy and improve data integrity. A relation (table) is said to be in 1NF
if it adheres to the following rules:

1. Atomic Values:
Each attribute (column) contains only atomic (indivisible) values. This means values in each column are
indivisible units and there should be no sets, arrays, or lists.
For example, a column called "Phone Numbers" shouldn't contain multiple phone numbers for a single
record. Instead, you'd typically break it into additional rows or another related table.
2. Primary Key:
Each table should have a primary key that uniquely identifies each row. This ensures that each row in the
table can be uniquely identified.
3. No Duplicate Rows:
There shouldn’t be any duplicate rows in the table. This is often ensured by the use of the primary key.
4. Order Doesn't Matter:
The order in which data is stored doesn't matter in the context of 1NF (or any of the normal forms).
Relational databases don't guarantee an order for rows in a table unless explicitly sorted.
5. Single Valued Attributes:
Columns should not contain multiple values of the same type. For example, a column "Skills" shouldn't
contain a list like "Java, Python, C++" for a single record. Instead, these skills should be split across
multiple rows or placed in a separate related table.
Example for First Normal Form (1NF)
Consider a table with a structure:

| Student_ID | Subjects |
|------------|-------------------|
|1 | Math, English |

|2 | English, Science |
|3 | Math, History |
The table above is not in 1NF because the "Subjects" column contains multiple values.

To transform it to 1NF:

| Student_ID | Subject |
|------------|-----------|
|1 | Math |
|1 | English |
|2 | English |
|2 | Science |
|3 | Math |
|3 | History |
Now, each combination of "Student_ID" and "Subject" is unique, and every attribute contains only atomic
values, ensuring the table is in 1NF.
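
The transformation to 1NF amounts to splitting each multi-valued cell into one row per value. A minimal Python sketch, assuming the subjects are stored as a comma-separated string (the function name `to_1nf` is hypothetical):

```python
def to_1nf(rows, sep=","):
    """Split each multi-valued Subjects cell into one (Student_ID, Subject) row."""
    return [
        (student_id, subject.strip())
        for student_id, subjects in rows
        for subject in subjects.split(sep)
    ]

unnormalized = [
    (1, "Math, English"),
    (2, "English, Science"),
    (3, "Math, History"),
]

atomic_rows = to_1nf(unnormalized)
for row in atomic_rows:
    print(row)

# (Student_ID, Subject) can serve as the key only if no pair repeats.
print(len(atomic_rows) == len(set(atomic_rows)))  # True
```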

Achieving 1NF is a fundamental step in database normalization, laying the foundation for further
normalization processes to eliminate redundancy and ensure data integrity.

Second Normal Form (2NF) in DBMS

The Second Normal Form (2NF) is the next stage in the normalization process after the First Normal
Form (1NF). A relation is in 2NF if:

1. It is already in 1NF:
This means the relation contains only atomic values, there are no duplicate rows, and it has a primary key.
2. No Partial Dependencies:
All non-key attributes (i.e., columns that aren't part of the primary key) should be functionally dependent
on the *entire* primary key. This rule is especially relevant for tables with composite primary keys (i.e.,
primary keys made up of more than one column).
In simpler terms, no column should depend on just a part of the composite primary key.
Example for Second Normal Form
Let's consider a table that keeps track of the courses that students are enrolled in, with the faculty who
teach those courses:

| Student_ID | Course_ID | Course_Name | Faculty |
|------------|-----------|-------------|---------|
| 1          | C1        | Math        | Mr. A   |
| 1          | C2        | English     | Ms. B   |
| 2          | C1        | Math        | Mr. A   |
| 3          | C3        | History     | Ms. C   |
Here, a combination of `Student_ID` and `Course_ID` can be considered as a primary key because a
student can be enrolled in multiple courses, and each course might be taken by many students.

However, you'll notice that `Course_Name` and `Faculty` depend only on `Course_ID` and not on the
combination of `Student_ID` and `Course_ID`. This is a partial dependency.

To bring the table to 2NF, we need to remove the partial dependencies:

StudentCourse Table

| Student_ID | Course_ID |
|------------|-----------|
|1 | C1 |
|1 | C2 |
|2 | C1 |
|3 | C3 |
Course Table

| Course_ID | Course_Name | Faculty |


|-----------|---------------|-------------|
| C1 | Math | Mr. A |
| C2 | English | Ms. B |
| C3 | History | Ms. C |
Now, the `StudentCourse` table relates students to courses, and the `Course` table holds information about
each course. There are no more partial dependencies.
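
Partial dependencies can be found mechanically: take every proper subset of the composite key and see which non-key attributes its closure reaches. The sketch below assumes the only FD is `Course_ID → Course_Name, Faculty`, as described in the example; the function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = determined
    return found

fds = [({"Course_ID"}, {"Course_Name", "Faculty"})]

# Original table: composite key (Student_ID, Course_ID).
before = partial_dependencies(
    {"Student_ID", "Course_ID"},
    {"Student_ID", "Course_ID", "Course_Name", "Faculty"},
    fds,
)
# After the split: the Course table has the single-column key Course_ID.
after = partial_dependencies(
    {"Course_ID"}, {"Course_ID", "Course_Name", "Faculty"}, fds
)

print(before)  # Course_ID alone determines Course_Name and Faculty
print(after)   # {} (no partial dependencies remain)
```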

It's worth noting that while 2NF does improve the structure of our database by reducing redundancy and
eliminating partial dependencies, it might not eliminate all anomalies or redundancy. Further
normalization forms (like 3NF and BCNF) address additional types of dependencies and potential issues.

Third Normal Form (3NF) in DBMS



The Third Normal Form (3NF) is a further step in the normalization process after achieving Second
Normal Form (2NF). A relation is considered to be in 3NF if:

1. It is already in 2NF:
This means the relation has no partial dependencies of non-key attributes on the primary key.
2. No Transitive Dependencies:
All non-key attributes are functionally dependent only on the primary key and not on any other non-key
attributes. If there is a dependency of one non-key attribute on another non-key attribute, it is called a
transitive dependency, and such a dependency violates 3NF.
Simply put, in 3NF, non-key attributes should not depend on other non-key attributes; they should only
depend on the primary key.

Example for Third Normal Form (3NF)


Consider a table storing information about products sold by different vendors:

| Product_ID | Product_Name | Vendor_Name | Vendor_Address |
|------------|--------------|-------------|----------------|
| P1         | Laptop       | TechCorp    | 123 Tech St.   |
| P2         | Mouse        | TechCorp    | 123 Tech St.   |
| P3         | Chair        | FurniShop   | 456 Furni Rd.  |
In the table above, `Product_ID` is the primary key. `Product_ID` determines `Vendor_Name`, and
`Vendor_Name` in turn determines `Vendor_Address`, so `Vendor_Address` depends on the key only through
the non-key attribute `Vendor_Name`. This is a transitive dependency.

To convert this table to 3NF, we can split it into two tables:

Product Table

| Product_ID | Product_Name | Vendor_Name |
|------------|--------------|-------------|
| P1         | Laptop       | TechCorp    |
| P2         | Mouse        | TechCorp    |
| P3         | Chair        | FurniShop   |
Vendor Table

| Vendor_Name | Vendor_Address |
|-------------|-----------------|
| TechCorp | 123 Tech St. |
| FurniShop | 456 Furni Rd. |
Now, the `Product` table has `Product_ID` as the primary key, and all attributes in this table depend only
on the primary key. The `Vendor` table has `Vendor_Name` as its primary key, and the address in this
table depends only on the vendor name.

This normalization eliminates the transitive dependency and reduces redundancy. If we need to change a
vendor's address, we now only have to make the change in one place in the `Vendor` table.
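
A 3NF check can be sketched with the attribute closure: an FD X → A breaks 3NF when X is not a superkey and A is not part of any candidate key. The code below assumes the candidate keys are supplied by the caller and that the FDs are the two described in this example; `third_nf_violations` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def third_nf_violations(attributes, candidate_keys, fds):
    """List (lhs, offending attributes) for every FD that breaks 3NF."""
    prime = set().union(*candidate_keys)  # attributes that belong to some key
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attributes)
        offending = set(rhs) - set(lhs) - prime
        if not is_superkey and offending:
            violations.append((set(lhs), offending))
    return violations

product_fd = ({"Product_ID"}, {"Product_Name", "Vendor_Name"})
vendor_fd = ({"Vendor_Name"}, {"Vendor_Address"})

wide = third_nf_violations(
    {"Product_ID", "Product_Name", "Vendor_Name", "Vendor_Address"},
    [{"Product_ID"}],
    [product_fd, vendor_fd],
)
product = third_nf_violations(
    {"Product_ID", "Product_Name", "Vendor_Name"}, [{"Product_ID"}], [product_fd]
)
vendor = third_nf_violations(
    {"Vendor_Name", "Vendor_Address"}, [{"Vendor_Name"}], [vendor_fd]
)

print(wide)     # Vendor_Name -> Vendor_Address is the offending FD
print(product)  # []
print(vendor)   # []
```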

To further refine the database structure, we might proceed to other normalization forms like BCNF, but
3NF is often sufficient for many practical applications and strikes a good balance between minimizing
redundancy and maintaining a manageable schema.

Boyce-Codd Normal Form (BCNF) in DBMS

Boyce-Codd Normal Form (BCNF) is an advanced step in the normalization process, and it's a stronger
version of the Third Normal Form (3NF). In fact, every relation in BCNF is also in 3NF, but the converse
isn't necessarily true. BCNF was introduced to handle certain anomalies that 3NF does not deal with.

A relation is in BCNF if:

1. It is already in 3NF.

2. For every non-trivial functional dependency X→Y, X is a superkey. This essentially means that the only
determinants in the relation are superkeys.

Here, "non-trivial" means that Y is not a subset of X, and a "superkey" is a set of attributes that
functionally determines all other attributes in the relation.

Example for Boyce-Codd Normal Form (BCNF)


Consider a university scenario where professors supervise student theses in various topics. Now, let's
assume each professor can only supervise one topic, but multiple professors can supervise the same topic.

Initial Table

| Student | Professor | Topic   |
|---------|-----------|---------|
| Alice   | Mr. A     | Math    |
| Bob     | Mr. B     | Math    |
| Charlie | Mr. C     | Physics |
Here:
Each professor is associated with exactly one topic.
The primary key is {Student, Professor}: a professor can supervise multiple students, and a student can
work with more than one professor, so neither attribute alone identifies a row.
Assume also that a student works with only one professor on a given topic, so {Student, Topic} is another
candidate key. Topic is therefore part of a key, which is why the table is still in 3NF.
There's a functional dependency {Professor} → {Topic} since each professor supervises only one topic.
Now, observe that {Professor} is not a superkey (it is only part of the key {Student, Professor}), but it
determines another attribute in the table (Topic). This violates the definition of BCNF.

To bring this table into BCNF, we can decompose it into two tables:

StudentSupervision Table:

| Student | Professor |
|---------|-----------|
| Alice | Mr. A |
| Bob | Mr. B |
| Charlie | Mr. C |
ProfessorTopic Table:

| Professor | Topic |
|-----------|--------|
| Mr. A | Math |
| Mr. B | Math |
| Mr. C | Physics|
This decomposition moves the dependency {Professor} → {Topic} into its own table, where Professor is the
key. The only determinants are now superkeys, so the structure adheres to BCNF.
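
The BCNF condition can be checked directly from its definition: for each non-trivial FD, the closure of the left side must cover every attribute of the table. The sketch below uses only the dependency {Professor} → {Topic} stated in the example; the function name `bcnf_violations` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(attributes)
    ]

professor_topic_fd = ({"Professor"}, {"Topic"})

initial = bcnf_violations({"Student", "Professor", "Topic"}, [professor_topic_fd])
# After decomposition: no non-trivial FD applies to StudentSupervision.
supervision = bcnf_violations({"Student", "Professor"}, [])
topics = bcnf_violations({"Professor", "Topic"}, [professor_topic_fd])

print(initial)      # Professor -> Topic, because Professor is not a superkey
print(supervision)  # []
print(topics)       # []
```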

In practice, BCNF is a highly normalized form, and while it can minimize redundancy, it can also increase
the complexity of the database design. Designers often have to make trade-offs between achieving higher
normal forms and maintaining simplicity, depending on the specific use case and requirements of the
system.

Properties of Decomposition
Decomposition must have the following properties:

1. Decomposition Must be Lossless

2. Dependency Preservation

3. Lack of Data Redundancy

1. Decomposition Must be Lossless


Decomposition must always be lossless, which means no information is lost when a relation is
decomposed. This guarantees that joining the decomposed relations yields exactly the relation that was
decomposed, with no missing rows and no extra rows.

2. Dependency Preservation

Dependencies are crucial constraints on a database, and every dependency should be satisfied by at least
one decomposed table or be implied by dependencies that are. If {P → Q} holds, checking it is easiest when
P and Q sit in the same relation. When this property holds, every update can be validated against a
single table, without having to compute the natural join of the decomposed relations.

3. Lack of Data Redundancy


Data redundancy means repetition of data. According to this property, the decomposed relations must not
store the same information more than once. A careless decomposition can leave redundancy in place or
even add to it, which in turn causes the anomalies described earlier. Decomposing according to the normal
forms achieves this property.

What is Lossless Decomposition in DBMS?


Lossless decomposition, also known as lossless join property or lossless-join decomposition, is a
desirable property in database normalization and decomposition within database management systems
(DBMS).

It refers to the ability to recreate the original relation or relations through the process of joining the
decomposed relations, without losing any information or introducing spurious tuples.

How to Achieve Lossless Decomposition in DBMS?


To achieve lossless decomposition of a relation R into R1 and R2, the following conditions must be
satisfied:

Attribute Coverage: Every attribute of R appears in R1 or R2, so no column is dropped.
Common Attribute: There should be at least one common attribute shared by the decomposed relations.
This attribute enables the join operation to match rows.
Key Condition: The common attributes must functionally determine all attributes of R1 or all attributes
of R2, that is, they form a superkey of at least one of the two relations. Otherwise the join can produce
spurious tuples.
Dependency preservation is a separate property: a decomposition can be lossless without preserving
every functional dependency.
Lossless decomposition is important because it ensures that the decomposition process does not introduce
any information loss or anomalies in the resulting relations. It helps maintain the integrity and correctness
of the database.
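
For a split into two relations, a standard test is to take the attributes the two relations share and compute their closure: the decomposition is lossless if that closure covers all of R1 or all of R2. The sketch below is a minimal version of this test; the function names and the sample FD sets are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if the shared attributes are a superkey of r1 or of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return reach >= set(r1) or reach >= set(r2)

chain = [({"A"}, {"B"}), ({"B"}, {"C"})]   # A -> B and B -> C
only_ab = [({"A"}, {"B"})]                  # A -> B alone

print(lossless_binary("AB", "BC", chain))    # True: B determines C
print(lossless_binary("AB", "AC", chain))    # True: A determines everything
print(lossless_binary("AB", "BC", only_ab))  # False: B determines nothing else
```

The last call corresponds to a table R(A, B, C) in which only A → B holds: splitting it into (A, B) and (B, C) is not guaranteed to be lossless.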

Lossless decomposition is a key consideration in database normalization techniques such as Boyce-Codd
Normal Form (BCNF) and Third Normal Form (3NF). These normal forms aim to decompose relations
into smaller, well-structured relations while avoiding anomalies. A lossless, dependency-preserving
decomposition into 3NF always exists, whereas a decomposition into BCNF may have to give up some
dependencies.

Advantages of Lossless Decomposition


Data Integrity: Lossless decomposition ensures that the original information in the database is preserved
without any loss or corruption.
No Spurious Tuples: Joining the decomposed relations returns exactly the original rows, with nothing
extra. (Dependency preservation is a separate property and has to be checked on its own.)
Database Normalization: Lossless decomposition is a key component of the normalization process. It
helps eliminate redundancy, minimize data anomalies, and improve overall database structure and design.
Efficient Join Operations: Lossless decomposition allows for efficient join operations between
decomposed relations.
Disadvantages of Lossless Decomposition
Increased Storage Requirements: Lossless decomposition can result in an increased number of relations or
tables compared to the original relation.
Complex Querying: Decomposing a relation into smaller relations can make complex querying more
challenging.
Performance Impact: The decomposition process and subsequent join operations can impact performance,
especially when dealing with large databases or complex queries.
Maintenance Complexity: Managing and maintaining a decomposed database with multiple relations can
be more complex than managing a single relation.

Fourth Normal Form (4NF) in DBMS


The Fourth Normal Form (4NF) is an advanced level in the normalization process, aiming to handle
certain types of anomalies which aren't addressed by the Third Normal Form (3NF) or BCNF. Specifically,
4NF addresses multi-valued dependencies.

A relation is in 4NF if:

1. It is already in BCNF.

2. For every non-trivial multi-valued dependency X↠Y, X is a superkey.

To clarify, consider a relation R with attributes X, Y, and Z. We say that there is a multi-valued
dependency from X to Y, denoted X↠Y, if each value of X is associated with a set of Y values that is
independent of Z. In other words, for a given X, every associated Y value appears with every associated
Z value.

Example for Fourth Normal Form (4NF)


Let's illustrate 4NF with a scenario involving students, their hobbies, and the courses they've taken:

Initial Table

| Student_ID | Hobby    | Course    |
|------------|----------|-----------|
| S1         | Painting | Math      |
| S1         | Painting | Physics   |
| S1         | Hiking   | Math      |
| S1         | Hiking   | Physics   |
| S2         | Reading  | Chemistry |
| S2         | Reading  | Biology   |
In the table:

For student `S1`, there are two hobbies (`Painting` and `Hiking`) and two courses (`Math` and `Physics`),
resulting in a combination of every hobby with every course.
This design suggests a multi-valued dependency between `Student_ID` and `Hobby`, and also between
`Student_ID` and `Course`.
To bring the table to 4NF, we can decompose it into two separate tables:

StudentHobbies Table:

| Student_ID | Hobby |
|------------|------------|
| S1 | Painting |
| S1 | Hiking |
| S2 | Reading |
StudentCourses Table:

| Student_ID | Course |
|------------|------------|
| S1 | Math |
| S1 | Physics |
| S2 | Chemistry |
| S2 | Biology |
With this separation:

The `StudentHobbies` table lists the hobbies of each student.


The `StudentCourses` table lists the courses taken by each student.
There are no more multi-valued dependencies. This setup not only reduces redundancy but also prevents
the possibility of certain types of inconsistencies and anomalies in the data.
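
A multi-valued dependency can be tested on a table instance by grouping the rows by the left-hand attribute and checking that each group is a full cross product of its hobby values and its course values. The following is a rough sketch; `mvd_holds` is a hypothetical helper, and the check only speaks for the rows it is given.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict rows (all rows share the same columns)."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(set)
    for r in rows:
        x_val = tuple(r[a] for a in x)
        groups[x_val].add((tuple(r[a] for a in y), tuple(r[a] for a in z)))
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        # Every Y value must appear with every Z value for this X value.
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True

columns = ("Student_ID", "Hobby", "Course")
table = [
    dict(zip(columns, values))
    for values in [
        ("S1", "Painting", "Math"),
        ("S1", "Painting", "Physics"),
        ("S1", "Hiking", "Math"),
        ("S1", "Hiking", "Physics"),
        ("S2", "Reading", "Chemistry"),
        ("S2", "Reading", "Biology"),
    ]
]

print(mvd_holds(table, ["Student_ID"], ["Hobby"]))       # True
print(mvd_holds(table, ["Student_ID"], ["Course"]))      # True
# Dropping one combination for S1 breaks the dependency.
print(mvd_holds(table[:3] + table[4:], ["Student_ID"], ["Hobby"]))  # False
```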

For most practical applications, normalization up to 3NF or BCNF is often adequate. However, when
specific types of redundancy or data anomalies are a concern, proceeding to 4NF or even 5NF can be
beneficial.

Fifth Normal Form (5NF or PJNF) in DBMS



The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a further step in the
normalization process. It aims to address redundancy arising from certain types of join dependencies that
aren't covered by earlier normal forms.

A relation is in 5NF or PJNF if:

1. It is already in 4NF.

2. Every non-trivial join dependency in the relation is implied by the candidate keys.

A join dependency occurs in a relation R when it is always possible to reconstruct R by joining multiple
projections of R. A join dependency is represented as {R1, R2, ..., Rn} ⟶ R, which means that when R is
decomposed into R1, R2, ..., Rn, the natural join of these projections results in the original relation R.

The join dependency is non-trivial if none of the projections Ri is equal to R.

Example for Fifth Normal Form (5NF)


Consider a relation involving suppliers, parts, and projects:

Initial Table (SupplierPartsProjects)

| Supplier | Part | Project |
|----------|------|---------|
| S1       | P1   | J1      |
| S1       | P2   | J1      |
| S1       | P1   | J2      |
| S1       | P2   | J2      |
| S2       | P2   | J2      |
Assume the following constraint for our example:

If a supplier supplies a part, the supplier supplies some part to a project, and the part is used in that
project, then the supplier supplies that part to that project.
Given the above constraint, the following join dependency exists on the table:

{(Supplier, Part), (Supplier, Project), (Part, Project)} ⟶ SupplierPartsProjects
To decompose the relation into 5NF:

SupplierParts:

| Supplier | Part |
|----------|-------|
| S1 | P1 |
| S1 | P2 |

| S2 | P2 |
SupplierProjects:

| Supplier | Project |
|----------|---------|
| S1 | J1 |
| S1 | J2 |
| S2 | J2 |
PartsProjects:

| Part | Project |
|-------|---------|
| P1 | J1 |
| P2 | J1 |
| P1 | J2 |
| P2 | J2 |
Now, these decomposed tables eliminate the redundancy caused by the specific constraints and join
dependencies of the original relation. When you take the natural join of these tables, you will get back the
original table.
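
Whether the three-way join really returns the stored rows depends on the data, so it is worth checking. The sketch below projects a set of (Supplier, Part, Project) rows onto the three pairs and joins them back. It runs the check on two instances, one without and one with the row (S1, P2, J2); the function name is hypothetical.

```python
def join_of_projections(rows):
    """Project onto (S, P), (S, J), (P, J) and natural-join the three results."""
    sp = {(s, p) for s, p, j in rows}
    sj = {(s, j) for s, p, j in rows}
    pj = {(p, j) for s, p, j in rows}
    return {
        (s, p, j)
        for s, p in sp
        for s2, j in sj
        if s == s2 and (p, j) in pj
    }

without_row = {
    ("S1", "P1", "J1"),
    ("S1", "P2", "J1"),
    ("S1", "P1", "J2"),
    ("S2", "P2", "J2"),
}
with_row = without_row | {("S1", "P2", "J2")}

# Without (S1, P2, J2) the join invents that row, so the join dependency fails.
print(join_of_projections(without_row) - without_row)  # {('S1', 'P2', 'J2')}
# With the row present, the join returns exactly the stored rows.
print(join_of_projections(with_row) == with_row)        # True
```

The row (S1, P2, J2) is forced by the data: S1 supplies P2, S1 supplies to J2, and P2 is used in J2.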

It's worth noting that reaching 5NF can lead to an increased number of tables, which can complicate
queries and database operations. Thus, achieving 5NF should be a conscious decision made based on the
specific requirements and constraints of a given application.

When a relation is decomposed into multiple smaller relations, lossless decomposition ensures that the
original information contained in the original relation can be reconstructed by joining the decomposed
relations using suitable join operations.
