DBMS Unit 3
UNIT-III
SCHEMA REFINEMENT AND NORMAL FORMS: Introduction to Schema Refinement -
Problems Caused by redundancy, Decompositions - Problem related to decomposition, Functional
Dependencies - Reasoning about FDs, Normal Forms - FIRST, SECOND, THIRD Normal forms -
BCNF - Properties of Decompositions - Lossless join Decomposition, Dependency preserving
Decomposition, Multivalued Dependencies - FOURTH Normal Form, Join Dependencies, FIFTH
Normal form.
Redundancy: The `CustomerName` "Madhu" and his `CustomerAddress` "Hyderabad" are repeated for
two orders.
Problems:
Update Anomaly: If Madhu moves to a new address, you'd have to update multiple rows. If you forget to
update all the rows, it leads to inconsistent data.
Insertion Anomaly: To insert a new order for Madhu, you have to re-enter his address, leading to further
redundancy.
Deletion Anomaly: If you delete the order for the mouse, you cannot also remove Madhu's details, because
the laptop order still depends on them. Conversely, deleting a customer's only order would erase that
customer's name and address altogether.
Solution:
Normalizing the database can resolve these problems. In this example, splitting the table into two tables,
`Orders` and `Customers`, would be a start:
1. Customers Table:
However, while decomposition can be helpful, it is not without challenges. Done incorrectly,
decomposition can lead to its own set of problems.
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).
R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C):
|A |C |
|----|----|
|1 |P |
|2 |Q |
Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation R.
Therefore, this is a lossless decomposition.
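The join described above can be checked mechanically. The following is a minimal sketch, assuming relations are held as Python lists of dictionaries; the helper names `project` and `natural_join` are illustrative and not part of any library. Note that it only confirms the result for this one instance of R, whereas losslessness is formally a property of every legal instance.

```python
def project(rows, attrs):
    """Project a relation onto attrs; using a set removes duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two projections, matching on all shared attribute names."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            common = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in common):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


R = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 1, "B": "Y", "C": "P"},
    {"A": 2, "B": "Z", "C": "Q"},
]

R1 = project(R, ["A", "B"])
R2 = project(R, ["A", "C"])  # duplicates collapse: only (1, P) and (2, Q) remain

original = {tuple(sorted(row.items())) for row in R}
is_lossless = natural_join(R1, R2) == original
print(is_lossless)  # True
```

Because projection is a set operation, R2 holds two tuples rather than three, and joining on A still reproduces exactly the three tuples of R.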
A→B
B→C
R1(A,B) with FD A → B
R2(B,C) with FD B → C
In this case, the decomposition is dependency-preserving because all the functional dependencies of the
original relation R can be found in the decomposed relations R1 and R2. We do not need to join R1 and
R2 to enforce or check any of the functional dependencies.
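By contrast, had R been decomposed into R1(A, B) and R2(A, C), the dependency B → C could not be
checked in either relation without joining them, because B and C never appear together and R2 carries only
the implied dependency A → C. That decomposition would not be dependency-preserving. Note that
A → C causes no such problem in the decomposition above: it follows from A → B and B → C by
transitivity, so enforcing those two dependencies enforces it as well.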
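Dependency preservation can be tested without enumerating every implied dependency. The sketch below follows a standard textbook procedure (repeatedly closing the left-hand side within each decomposed schema); the function names and the set-based representation of FDs are assumptions made for illustration. Because it works with closures, a dependency that follows by transitivity, such as A → C here, counts as preserved.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, fds, schemas):
    """True if fd can be enforced using only dependencies local to each schema."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            gained = closure(z & schema, fds) & schema
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

split_ab_bc = [{"A", "B"}, {"B", "C"}]
split_ab_ac = [{"A", "B"}, {"A", "C"}]

print(all(is_preserved(fd, fds, split_ab_bc) for fd in fds))  # True
print(is_preserved(({"A"}, {"C"}), fds, split_ab_bc))         # True
print(is_preserved(({"B"}, {"C"}), fds, split_ab_ac))         # False
```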
3. Increased Complexity
Decomposition leads to an increase in the number of tables, which can complicate queries and
maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can mitigate this to some
extent, it still adds complexity.
4. Redundancy
Incorrect decomposition might not eliminate redundancy, and in some cases, can even introduce new
redundancies.
5. Performance Overhead
An increased number of tables, while aiding normalization, can also lead to more complex SQL queries
involving multiple joins, which can introduce performance overheads.
Best Practices
Ensure decomposition is non-lossy. After decomposition, it should be possible to recreate the original data
using natural joins.
Preserve functional dependencies to enforce integrity constraints.
Strike a balance. While normalization and decomposition are essential, in some scenarios (like reporting
databases), a certain level of denormalization might be preferred for performance reasons.
Regularly review and optimize the database design, especially as the application's requirements evolve.
In essence, while decomposition is a powerful tool in achieving database normalization and reducing
anomalies, it must be done thoughtfully and judiciously to avoid introducing new problems.
Functional dependencies play a vital role in the normalization process in relational database design. They
help in defining the relationships between attributes in a relation and are used to formalize the properties
of the relation and drive the process of decomposition.
sid→sname
zipcode→cityname
cityname→state
A→A
is a trivial dependency because an attribute always determines itself.
AB→A
is a trivial dependency because the combined attributes A and B always determine A as it's a subset.
ABC→AC
is a trivial dependency for the same reason; the combined attributes A,B, and C always determine A and
C.
2. Full Functional Dependency
- An attribute functionally depends on a set of attributes, X, and does not functionally depend on any
proper subset of X.
This means that a combination of a specific student and a specific course will determine who the
instructor is.
Thus, Instructor is fully functionally dependent on the combined attributes StudentID and CourseID.
3. Transitive Dependency
- If A -> B and B -> C, then C is transitively dependent on A through B.
EmployeeID → Department
Department → DepartmentLocation
However, the DepartmentLocation is also dependent on the EmployeeID through Department. This means
the DepartmentLocation has a transitive dependency on EmployeeID via Department.
4. Closure
- The closure of a set of attributes X with respect to a set of functional dependencies FD, denoted as X+,
is the set of attributes that are functionally determined by X.
- For example, given FDs: {A -> B, B -> C}, the closure of {A}, denoted as A+, would be {A, B, C}.
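The closure can be computed by repeatedly applying every dependency whose left-hand side is already covered. This is a minimal sketch of that fixed-point loop; the function name `closure` and the representation of each FD as a pair of Python sets are choices made here for illustration.

```python
def closure(attrs, fds):
    """Return X+ for the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once lhs is contained in what we have so far.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_plus = closure({"A"}, fds)
print(sorted(a_plus))                # ['A', 'B', 'C']
print(sorted(closure({"B"}, fds)))   # ['B', 'C']
```

If the closure of a set of attributes contains every attribute of the relation, that set is a superkey.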
Often considered when dealing with temporal databases (databases that have time-dependent data).
Deals with how data evolves over time and is less commonly discussed in most relational database design
contexts.
Normalization often involves trade-offs. While higher normal forms eliminate redundancy and improve
data integrity, they can also result in more complex relational schemas and sometimes require more joins,
which can affect performance. As such, it's essential to understand the data and the specific application's
requirements when deciding the level of normalization suitable for a particular situation. Sometimes,
denormalization (intentionally introducing redundancy) is implemented to improve performance,
especially in read-heavy databases.
The First Normal Form (1NF) is the first step in the normalization process of organizing data within a
relational database to reduce redundancy and improve data integrity. A relation (table) is said to be in 1NF
if it adheres to the following rules:
1. Atomic Values:
Each attribute (column) contains only atomic (indivisible) values. A single cell must not hold a set, array,
or list of values.
For example, a column called "Phone Numbers" shouldn't contain multiple phone numbers for a single
record. Instead, you'd typically break it into additional rows or another related table.
2. Primary Key:
Each table should have a primary key that uniquely identifies each row. This ensures that each row in the
table can be uniquely identified.
3. No Duplicate Rows:
There shouldn’t be any duplicate rows in the table. This is often ensured by the use of the primary key.
4. Order Doesn't Matter:
The order in which data is stored doesn't matter in the context of 1NF (or any of the normal forms).
Relational databases don't guarantee an order for rows in a table unless explicitly sorted.
5. Single Valued Attributes:
Columns should not contain multiple values of the same type. For example, a column "Skills" shouldn't
contain a list like "Java, Python, C++" for a single record. Instead, these skills should be split across
multiple rows or placed in a separate related table.
Example for First Normal Form (1NF)
Consider a table with a structure:
| Student_ID | Subjects |
|------------|-------------------|
|1 | Math, English |
|2 | English, Science |
|3 | Math, History |
The table above is not in 1NF because the "Subjects" column contains multiple values.
To transform it to 1NF:
| Student_ID | Subject |
|------------|-----------|
|1 | Math |
|1 | English |
|2 | English |
|2 | Science |
|3 | Math |
|3 | History |
Now, each combination of "Student_ID" and "Subject" is unique, and every attribute contains only atomic
values, ensuring the table is in 1NF.
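The transformation above amounts to splitting each multi-valued cell into one row per value. A minimal sketch, assuming the unnormalized rows are available as Python tuples with the subjects stored as a comma-separated string (the variable names are illustrative):

```python
unnormalized = [
    (1, "Math, English"),
    (2, "English, Science"),
    (3, "Math, History"),
]

# One output row per (student, subject) pair; strip() removes the space after each comma.
first_nf = [
    (student_id, subject.strip())
    for student_id, subjects in unnormalized
    for subject in subjects.split(",")
]

for row in first_nf:
    print(row)
```

Each resulting pair is unique, so (`Student_ID`, `Subject`) can serve as the key of the 1NF table.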
Achieving 1NF is a fundamental step in database normalization, laying the foundation for further
normalization processes to eliminate redundancy and ensure data integrity.
The Second Normal Form (2NF) is the next stage in the normalization process after the First Normal
Form (1NF). A relation is in 2NF if:
1. It is already in 1NF:
This means the relation contains only atomic values, there are no duplicate rows, and it has a primary key.
2. No Partial Dependencies:
All non-key attributes (i.e., columns that aren't part of the primary key) should be functionally dependent
on the *entire* primary key. This rule is especially relevant for tables with composite primary keys (i.e.,
primary keys made up of more than one column).
In simpler terms, no column should depend on just a part of the composite primary key.
Example for Second Normal Form
Let's consider a table that keeps track of the courses that students are enrolled in, with the faculty who
teach those courses:
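| Student_ID | Course_ID | Course_Name | Faculty |
|------------|-----------|-------------|---------|
|1 | C1 | Math | Mr. A |
|1 | C2 | English | Ms. B |
|2 | C1 | Math | Mr. A |
|3 | C3 | History | Ms. C |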
Here, a combination of `Student_ID` and `Course_ID` can be considered as a primary key because a
student can be enrolled in multiple courses, and each course might be taken by many students.
However, you'll notice that `Course_Name` and `Faculty` depend only on `Course_ID` and not on the
combination of `Student_ID` and `Course_ID`. This is a partial dependency.
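A partial dependency of this kind can be spotted by looking for a dependency whose left-hand side is a proper subset of the composite key and whose right-hand side includes non-key attributes. The sketch below assumes the functional dependencies are listed explicitly as pairs of Python sets and that the table has a single candidate key; the function name is illustrative.

```python
def partial_dependencies(key, fds):
    """Return (lhs, non-key attributes) for each listed FD whose lhs is a proper subset of key."""
    return [(lhs, rhs - key) for lhs, rhs in fds if lhs < key and rhs - key]


key = {"Student_ID", "Course_ID"}
fds = [({"Course_ID"}, {"Course_Name", "Faculty"})]

violations = partial_dependencies(key, fds)
for lhs, attrs in violations:
    print(sorted(lhs), "->", sorted(attrs))
# ['Course_ID'] -> ['Course_Name', 'Faculty']
```

Every violation reported this way points to attributes that belong in a separate table keyed by the left-hand side.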
StudentCourse Table
| Student_ID | Course_ID |
|------------|-----------|
|1 | C1 |
|1 | C2 |
|2 | C1 |
|3 | C3 |
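Course Table
| Course_ID | Course_Name | Faculty |
|-----------|-------------|---------|
| C1 | Math | Mr. A |
| C2 | English | Ms. B |
| C3 | History | Ms. C |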
It's worth noting that while 2NF does improve the structure of our database by reducing redundancy and
eliminating partial dependencies, it might not eliminate all anomalies or redundancy. Further
normalization forms (like 3NF and BCNF) address additional types of dependencies and potential issues.
The Third Normal Form (3NF) is a further step in the normalization process after achieving Second
Normal Form (2NF). A relation is considered to be in 3NF if:
1. It is already in 2NF:
This means the relation has no partial dependencies of non-key attributes on the primary key.
2. No Transitive Dependencies:
All non-key attributes are functionally dependent only on the primary key and not on any other non-key
attributes. If there is a dependency of one non-key attribute on another non-key attribute, it is called a
transitive dependency, and such a dependency violates 3NF.
Simply put, in 3NF, non-key attributes should not depend on other non-key attributes; they should only
depend on the primary key.
Product Table
Vendor Table
| Vendor_Name | Vendor_Address |
|-------------|-----------------|
| TechCorp | 123 Tech St. |
| FurniShop | 456 Furni Rd. |
Now, the `Product` table has `Product_ID` as the primary key, and all attributes in this table depend only
on the primary key. The `Vendor` table has `Vendor_Name` as its primary key, and the address in this
table depends only on the vendor name.
This normalization eliminates the transitive dependency and reduces redundancy. If we need to change a
vendor's address, we now only have to make the change in one place in the `Vendor` table.
To further refine the database structure, we might proceed to other normalization forms like BCNF, but
3NF is often sufficient for many practical applications and strikes a good balance between minimizing
redundancy and maintaining a manageable schema.
Boyce-Codd Normal Form (BCNF) is an advanced step in the normalization process, and it's a stronger
version of the Third Normal Form (3NF). In fact, every relation in BCNF is also in 3NF, but the converse
isn't necessarily true. BCNF was introduced to handle certain anomalies that 3NF does not deal with.
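A relation is in BCNF if:
1. It is already in 3NF.
2. For every non-trivial functional dependency X → Y that holds in the relation, X is a superkey.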
Here, "non-trivial" means that Y is not a subset of X, and a "superkey" is a set of attributes that
functionally determines all other attributes in the relation.
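The superkey condition can be tested with attribute closures: a determinant is a superkey exactly when its closure contains every attribute. The sketch below is illustrative only. The attribute names and the two dependencies ({Student, Topic} → Professor and Professor → Topic) are assumptions chosen to match a typical student, professor, and topic example; they are not taken from a table in these notes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Listed non-trivial FDs whose left-hand side is not a superkey of attrs."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attrs)
    ]


# Assumed example schema and dependencies (hypothetical).
attrs = {"Student", "Topic", "Professor"}
fds = [
    ({"Student", "Topic"}, {"Professor"}),
    ({"Professor"}, {"Topic"}),
]

violations = bcnf_violations(attrs, fds)
print(violations)  # [({'Professor'}, {'Topic'})]
```

Under these assumed dependencies the relation is in 3NF, since `Topic` is part of a candidate key, yet it fails BCNF because `Professor` alone is not a superkey.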
Initial Table
To bring this table into BCNF, we can decompose it into two tables:
StudentSupervision Table:
| Student | Professor |
|---------|-----------|
| Alice | Mr. A |
| Bob | Mr. B |
| Charlie | Mr. C |
ProfessorTopic Table:
| Professor | Topic |
|-----------|--------|
| Mr. A | Math |
| Mr. B | Math |
| Mr. C | Physics|
This decomposition removes the dependency whose determinant is not a superkey (`Professor` determines
`Topic`, as the ProfessorTopic table shows), so that the only determinants left are superkeys, making the
structure adhere to BCNF.
In practice, BCNF is a highly normalized form, and while it can minimize redundancy, it can also increase
the complexity of the database design. Designers often have to make trade-offs between achieving higher
normal forms and maintaining simplicity, depending on the specific use case and requirements of the
system.
Properties of Decomposition
Decomposition must have the following properties:
1. Lossless Join Decomposition
2. Dependency Preservation
Functional dependencies are important constraints on a database, so every dependency should be
enforceable in at least one decomposed relation. If P → Q holds, the dependency is easiest to check when
P and Q both appear in the same relation. A decomposition is dependency-preserving when every
functional dependency of the original relation either appears in one of the decomposed relations or is
implied by those that do. This property lets us validate updates without computing the natural join of the
decomposed relations.
Lossless join decomposition refers to the ability to recreate the original relation through the process of
joining the decomposed relations, without losing any information or introducing spurious tuples.
Dependency Preservation: The decomposed relations should preserve all the functional dependencies that
existed in the original relation.
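Common Attribute: The decomposed relations must share at least one attribute so that they can be joined.
Sharing an attribute is necessary but not sufficient: for a decomposition into two relations, the shared
attributes must form a superkey of at least one of them, otherwise the join can produce spurious tuples.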
Lossless decomposition is important because it ensures that the decomposition process does not introduce
any information loss or anomalies in the resulting relations. It helps maintain the integrity and correctness
of the database.
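For a decomposition into two relations, losslessness can be tested from the functional dependencies alone: the closure of the shared attributes must cover one of the two relations. This is a minimal sketch of that test; the function names and the set-based representation are assumptions for illustration, and the FDs A → B and B → C are reused from the earlier example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the shared attributes functionally determine all of r1 or all of r2."""
    shared_closure = closure(r1 & r2, fds)
    return r1 <= shared_closure or r2 <= shared_closure


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(is_lossless_binary({"A", "B"}, {"B", "C"}, fds))  # True: B determines C
print(is_lossless_binary({"A", "B"}, {"A", "C"}, fds))  # True: A determines B and C
print(is_lossless_binary({"A", "C"}, {"B", "C"}, fds))  # False: C determines nothing else
```

The third split shares the attribute C, which shows that having a common attribute is not enough on its own.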
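A relation is in Fourth Normal Form (4NF) if:
1. It is already in BCNF.
2. For every non-trivial multi-valued dependency X ↠ Y that holds in the relation, X is a superkey.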
To clarify, consider a relation R with attributes X, Y, and Z. We say that there is a multi-valued
dependency from X to Y, denoted X ↠ Y, if each value of X is associated with a set of Y values that is
independent of the Z values appearing with that X.
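Initial Table
| Student_ID | Hobby | Course |
|------------|------------|------------|
| S1 | Painting | Math |
| S1 | Painting | Physics |
| S1 | Hiking | Math |
| S1 | Hiking | Physics |
| S2 | Reading | Chemistry |
| S2 | Reading | Biology |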
In the table:
For student `S1`, there are two hobbies (`Painting` and `Hiking`) and two courses (`Math` and `Physics`),
resulting in a combination of every hobby with every course.
This design suggests a multi-valued dependency between `Student_ID` and `Hobby`, and also between
`Student_ID` and `Course`.
To bring the table to 4NF, we can decompose it into two separate tables:
StudentHobbies Table:
| Student_ID | Hobby |
|------------|------------|
| S1 | Painting |
| S1 | Hiking |
| S2 | Reading |
StudentCourses Table:
| Student_ID | Course |
|------------|------------|
| S1 | Math |
| S1 | Physics |
| S2 | Chemistry |
| S2 | Biology |
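Joining the two tables on `Student_ID` pairs every hobby of a student with every course of that student. A minimal sketch, assuming the rows are held as Python sets of tuples (the variable names are illustrative):

```python
student_hobbies = {("S1", "Painting"), ("S1", "Hiking"), ("S2", "Reading")}
student_courses = {
    ("S1", "Math"), ("S1", "Physics"),
    ("S2", "Chemistry"), ("S2", "Biology"),
}

# Natural join on Student_ID: every hobby is combined with every course per student.
rejoined = sorted(
    (sid, hobby, course)
    for sid, hobby in student_hobbies
    for sid2, course in student_courses
    if sid == sid2
)

for row in rejoined:
    print(row)
print(len(rejoined))  # 6
```

S1 contributes 2 x 2 = 4 combinations and S2 contributes 1 x 2 = 2, so the combined table has to store every hobby and course pairing, while the two decomposed tables record each fact once.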
With this separation:
For most practical applications, normalization up to 3NF or BCNF is often adequate. However, when
specific types of redundancy or data anomalies are a concern, proceeding to 4NF or even 5NF can be
beneficial.
The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a further step in the
normalization process. It aims to address redundancy arising from certain types of join dependencies that
aren't covered by earlier normal forms.
A relation is in 5NF if:
1. It is already in 4NF.
2. Every non-trivial join dependency in the relation is implied by the candidate keys.
A join dependency occurs in a relation R when it is always possible to reconstruct R by joining multiple
projections of R. A join dependency is represented as {R1, R2, ..., Rn} ⟶ R, which means that when R is
decomposed into R1, R2, ..., Rn, the natural join of these projections results in the original relation R.
Every part supplied for a project is supplied by all suppliers supplying any part for that project.
Every part supplied by a supplier is supplied by that supplier for all projects to which that supplier
supplies any part.
Given the above constraints, the following join dependencies exist on the table:
SupplierParts:
| Supplier | Part |
|----------|-------|
| S1 | P1 |
| S1 | P2 |
| S2 | P2 |
SupplierProjects:
| Supplier | Project |
|----------|---------|
| S1 | J1 |
| S1 | J2 |
| S2 | J2 |
PartsProjects:
| Part | Project |
|-------|---------|
| P1 | J1 |
| P2 | J1 |
| P1 | J2 |
| P2 | J2 |
Now, these decomposed tables eliminate the redundancy caused by the specific constraints and join
dependencies of the original relation. When you take the natural join of these tables, you will get back the
original table.
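The three-way join can be computed directly from the projections. This is a minimal sketch, assuming the three tables above are held as Python sets of tuples; since the original table is not reproduced in these notes, the output shown is simply what the three projections imply.

```python
supplier_parts = {("S1", "P1"), ("S1", "P2"), ("S2", "P2")}
supplier_projects = {("S1", "J1"), ("S1", "J2"), ("S2", "J2")}
parts_projects = {("P1", "J1"), ("P2", "J1"), ("P1", "J2"), ("P2", "J2")}

# A (supplier, part, project) triple survives only if all three pairwise
# facts are present in the corresponding projection.
rejoined = sorted(
    (s, p, j)
    for s, p in supplier_parts
    for s2, j in supplier_projects
    if s == s2 and (p, j) in parts_projects
)

for row in rejoined:
    print(row)
# ('S1', 'P1', 'J1')
# ('S1', 'P1', 'J2')
# ('S1', 'P2', 'J1')
# ('S1', 'P2', 'J2')
# ('S2', 'P2', 'J2')
```

Joining only two of the three projections is not guaranteed to be enough in general; the join dependency is a statement about all three taken together.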
It's worth noting that reaching 5NF can lead to an increased number of tables, which can complicate
queries and database operations. Thus, achieving 5NF should be a conscious decision made based on the
specific requirements and constraints of a given application.
When a relation is decomposed into multiple smaller relations, lossless decomposition ensures that the
information contained in the original relation can be reconstructed by joining the decomposed
relations using suitable join operations.