
DBMS Normalization

In a DBMS, normalization ensures that the database is designed so that it can handle a large amount of data while
maintaining accuracy and consistency.

A large database can contain redundant, repeated data. This repetition may result in:
• Very large tables.
• Difficulty maintaining and updating data, since an update involves searching many records in the relation.
• Wasted disk space and poorly utilized resources.
• A higher likelihood of errors and inconsistencies.

A database anomaly is a flaw in the database that arises from poor planning and redundancy.

1. Insertion anomalies: These occur when data cannot be inserted into the database because values for some required attributes are
not yet available at the time of insertion.
2. Update anomalies: These occur when the same data item is stored in several places that are not linked to each other, so updating
one copy without the others leaves the database inconsistent.
3. Deletion anomalies: These occur when deleting one piece of data unintentionally removes other information that is still needed.
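
The update and deletion anomalies can be illustrated with a minimal Python sketch. The flat `orders` table, its column names, and its rows are hypothetical and serve only to show the effect of storing customer details on every order row.

```python
# Hypothetical flat table: customer details are repeated on every order row.
orders = [
    {"order_id": 1, "customer": "Asha", "address": "12 Lake Rd", "item": "Pen"},
    {"order_id": 2, "customer": "Asha", "address": "12 Lake Rd", "item": "Ink"},
    {"order_id": 3, "customer": "Ravi", "address": "9 Hill St", "item": "Pad"},
]

def addresses_of(rows, customer):
    """Return the set of distinct addresses stored for one customer."""
    return {r["address"] for r in rows if r["customer"] == customer}

# Update anomaly: only one of Asha's two rows is changed,
# so the table now holds two different addresses for her.
orders[0]["address"] = "5 Park Ave"
inconsistent = addresses_of(orders, "Asha")

# Deletion anomaly: removing Ravi's only order also erases his address.
remaining = [r for r in orders if r["order_id"] != 3]
ravi_known = any(r["customer"] == "Ravi" for r in remaining)
```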

To handle these problems, we analyze the relations that hold redundant data and decompose them into smaller, simpler,
well-structured relations that satisfy desirable properties. Normalization is this process of decomposing relations into
relations with fewer attributes.
Here are some examples of why normalization is needed in DBMS:

• Reducing data redundancy: Normalization helps to eliminate data redundancy by dividing data into smaller, more
manageable tables. For example, consider a table that contains information about customers and their orders. If this
information is stored in a single table, there may be redundant data, such as the customer's name and address being
repeated for each order. Normalization can help to eliminate such redundancies by creating separate tables for
customers and orders.

• Avoiding update anomalies: Update anomalies occur when changing data in one place can cause inconsistencies in other
places. For example, suppose that a table contains information about customers and the products they have purchased.
If a customer changes their address, this change would need to be made in multiple places in the table, which could lead
to inconsistencies. By normalizing the data, updates can be made in one place, reducing the risk of update anomalies.

• Improving data integrity: Normalization helps to ensure that data in the database is accurate and consistent. By
organizing data into smaller tables and minimizing redundancy, the database becomes less prone to errors and
inconsistencies. This can lead to improved data integrity and higher-quality data.
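
As a rough illustration of the customers-and-orders split described above, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, and rows are assumptions made for the example, not a schema given in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product     TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Asha', '12 Lake Rd');
    INSERT INTO orders VALUES (10, 1, 'Pen'), (11, 1, 'Ink');
""")

# The address is stored in exactly one row, so a single UPDATE
# changes it for every order that refers to this customer.
conn.execute(
    "UPDATE customers SET address = ? WHERE customer_id = ?",
    ("5 Park Ave", 1),
)

rows = conn.execute("""
    SELECT o.order_id, c.name, c.address
    FROM orders AS o JOIN customers AS c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
```

Both order rows now report the new address, even though only one row was updated.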
Normal Forms:

• 1NF: A relation is in 1NF if all its attributes hold atomic values.
• 2NF: A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on each candidate key.
• 3NF: A relation is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on a candidate key.
• BCNF: A relation is in BCNF if it is in 3NF and, for every non-trivial functional dependency, the left-hand side (LHS) is a super key.
• 4NF: A relation is in 4NF if it is in BCNF and has no non-trivial multi-valued dependency whose left-hand side is not a super key.
This means that the relation should not store two or more independent sets of values for the same key.
• 5NF: A relation is said to be in 5th Normal Form (5NF) when it satisfies the conditions of Fourth Normal Form (4NF) and every
non-trivial join dependency is implied by its candidate keys, meaning that it cannot be decomposed any further without losing
data or introducing anomalies.
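
Most of these definitions rest on functional dependencies. The sketch below shows one simple way to test whether a dependency holds in a given set of rows. The helper `fd_holds` and the employee rows are hypothetical. A check on sample data can only refute a dependency, since whether it holds in general is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # row with the same key must agree with it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows.
rows = [
    {"emp_id": 1, "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 2, "dept_id": "D1", "dept_name": "Sales"},
    {"emp_id": 3, "dept_id": "D2", "dept_name": "Support"},
]

dept_determines_name = fd_holds(rows, ["dept_id"], ["dept_name"])
dept_determines_emp = fd_holds(rows, ["dept_id"], ["emp_id"])
```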
DBMS Normalization = 1NF
Definition: A table is in 1NF if and only if every attribute value is atomic, that is, indivisible.

Properties:

• Each column of the table should contain only atomic values, meaning that it should not contain arrays, lists, or any other
complex data structures.
• There should be no repeating groups of data. This means that each row in the table should represent a single entity, and
multiple occurrences of a single attribute should be represented as separate rows in the table.
Original Table –
In this table, the attribute "Subject1" contains a list of values (e.g., "Maths, Science, English"), violating the 1NF rule. To
bring this table into 1NF, we need to create a separate table for subjects and link it to the "Students" table using a foreign
key.

Name   Subject1                  Subject2             Subject3
Tom    Maths, Science, English   History, Geography   Physics
Alex   Science, Maths            Physics, Chemistry   English, Art

Modified Table

Name   SubjectID
Tom    1
Tom    2
Tom    3
Alex   1
Alex   2
Alex   4

SubjectID   Subject
1           Maths
2           Science
3           English
4           Art
5           History
6           Geography
7           Physics
8           Chemistry
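
The conversion to 1NF can be sketched in Python as follows. The original rows are taken from the example, while the helper `to_1nf` is hypothetical. The subject IDs are generated alphabetically, which is an assumption, so they will not match the numbering shown in the slide.

```python
# Original (non-1NF) rows: each subject cell holds a comma-separated list.
students = [
    {"Name": "Tom", "Subject1": "Maths, Science, English",
     "Subject2": "History, Geography", "Subject3": "Physics"},
    {"Name": "Alex", "Subject1": "Science, Maths",
     "Subject2": "Physics, Chemistry", "Subject3": "English, Art"},
]

def to_1nf(rows):
    """Flatten multi-valued subject cells into one (name, subject) pair per row."""
    pairs = []
    for row in rows:
        for column in ("Subject1", "Subject2", "Subject3"):
            for subject in row[column].split(","):
                pairs.append((row["Name"], subject.strip()))
    return pairs

enrolments = to_1nf(students)

# Separate subjects table with surrogate IDs (alphabetical order, an assumption).
subjects = sorted({subject for _, subject in enrolments})
subject_ids = {subject: i for i, subject in enumerate(subjects, start=1)}

# Linking table: one atomic (name, subject ID) pair per row.
student_subject = [(name, subject_ids[subject]) for name, subject in enrolments]
```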
Original Table –
In this table, the attributes "Product_Name", "Quantity", and "Price" are repeating groups since they can have multiple
values for each order. To bring this table into 1NF, we need to create a separate table for the repeating attributes and link it
to the "Orders" table using a foreign key

Modified Table
DBMS Normalization = 2NF
Definition: A relation is in 2NF if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the
primary key.
• Each non-key attribute of the table must be dependent on the entire primary key, not just a part of it.

Properties:

• The table must be in 1NF.


• Every non-key attribute must be fully functionally dependent on the primary key.
Original Table –
In this table, the attribute "Course_Instructor and Name" depends only on the "Course_ID" attribute, not on the entire
primary key. To bring this table into 2NF, we need to create a separate table for courses and link it to the "Students" table
using a foreign key.

Modified Table
Course_ID Course_Instructor
101 Dr. Brown
102 Dr. Smith
103 Dr. Johnson
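
A decomposition of this kind amounts to projecting the original table onto two attribute sets. In the sketch below, the course and instructor values come from the table above, while the `Student_ID` values, the composite key (Student_ID, Course_ID), and the helper `project` are assumptions made for illustration.

```python
# Hypothetical enrolment rows; the key is assumed to be (Student_ID, Course_ID).
enrolments = [
    {"Student_ID": 1, "Course_ID": 101, "Course_Instructor": "Dr. Brown"},
    {"Student_ID": 1, "Course_ID": 102, "Course_Instructor": "Dr. Smith"},
    {"Student_ID": 2, "Course_ID": 101, "Course_Instructor": "Dr. Brown"},
    {"Student_ID": 3, "Course_ID": 103, "Course_Instructor": "Dr. Johnson"},
]

def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(tuple(row[a] for a in attrs) for row in rows))

# Course_Instructor depends only on Course_ID, so it moves to its own table.
courses = project(enrolments, ["Course_ID", "Course_Instructor"])

# The remaining table keeps only the full key.
student_course = project(enrolments, ["Student_ID", "Course_ID"])
```

Each instructor is now stored once per course rather than once per enrolment.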
Original Table –
In this table, the attributes "Department_Name" and "Manager_Name" depend only on the "Department_ID" and
"Manager_ID" attributes, respectively, violating the 2NF rule. To bring this table into 2NF, we need to create separate tables
for departments and managers and link them to the "Employees" table using foreign keys.

Modified Table
DBMS Normalization = 3NF
Definition:
A relation is said to be in 3NF if it is in 2NF and there are no transitive dependencies between non-key attributes.

Advantages:
• Less data redundancy: Each fact is stored only once, reducing the storage space required and minimizing the risk
of inconsistencies and errors.
• Data integrity: With no transitive dependencies between non-key attributes, there is a reduced risk of data anomalies
and inconsistencies.

Transitive Dependency:
A transitive dependency is an indirect functional dependency in which a non-key attribute depends on the key only through
another non-key attribute. For example, if A → B and B → C, where A is the key and B is a non-key attribute, then C depends on A transitively.
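
Transitive dependencies can be found by computing an attribute closure, as in the sketch below. The helper `closure` is hypothetical, and the attribute names and the choice of `Order_ID` as the key are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed dependencies: the key determines the supplier,
# and the supplier determines its own name and country.
key = {"Order_ID"}
fds = [
    ({"Order_ID"}, {"Supplier_ID"}),
    ({"Supplier_ID"}, {"Supplier_Name", "Supplier_Country"}),
]

reachable = closure(key, fds)

# Attributes determined directly by the key.
direct = set().union(*(rhs for lhs, rhs in fds if lhs <= key))

# Attributes reached only through another non-key attribute.
transitive = reachable - key - direct
```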
Original Table –
In this table, the attributes "Supplier_Name" and "Supplier_Country" depend on the "Supplier_ID" attribute, which is not a
part of the primary key. To bring this table into 3NF, we need to create a separate table for suppliers and link it to the
"Orders" table using a foreign key

Modified Table
DBMS Normalization = BCNF
Definition:
A relation is said to be in BCNF if it is in 3NF and, for every non-trivial functional dependency X → Y, X is a super key.
In other words, every attribute should depend on a candidate key and on nothing else.
• A 3NF relation can violate BCNF only through a dependency whose right-hand side is part of a candidate key, so at least
one candidate key of the table needs to have 2 or more attributes.
• If every candidate key of the table has only 1 attribute, a 3NF table already satisfies BCNF.

BCNF is an advanced version of 3NF. It is stricter than 3NF.


Original Table –
In the table below:
• One student can enrol for multiple subjects. For example, student with student_id 101, has opted for subjects - Java & C++
• For each subject, a professor is assigned to the student.
• And, there can be multiple professors teaching one subject like we have for Java.

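Besides (student_id, subject) → professor, there is one more dependency, professor → subject.

The candidate key is (student_id, subject), so subject is a prime attribute and professor is a non-prime attribute. Because the
determinant professor is not a super key, this dependency is not allowed by BCNF.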
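
The violation can be checked mechanically, as in the sketch below: a dependency breaks BCNF when its left-hand side does not determine every attribute of the relation. The helpers `closure` and `bcnf_violations` are hypothetical, and the dependency (student_id, subject) → professor is an assumption inferred from the description above.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]

attributes = {"student_id", "subject", "professor"}
fds = [
    (("student_id", "subject"), ("professor",)),  # assumed from the description
    (("professor",), ("subject",)),
]

violations = bcnf_violations(attributes, fds)
```

Only professor → subject is reported, because the closure of professor does not include student_id.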

Table Transformation
DBMS Normalization = 4NF
Definition:
A relation is said to be in 4NF if it is in BCNF and has no non-trivial multi-valued dependencies other than those whose
left-hand side is a super key.

Multi-Valued Dependencies:
A multi-valued dependency occurs when two attributes in a table are independent of each other but both depend on a third
attribute.
Because it involves at least two attributes that depend on a third, a multi-valued dependency always requires at least three
attributes.
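
One way to test a multi-valued dependency X →→ Y on sample data is to check that, for each value of X, every combination of the Y values and the remaining values appears. The sketch below does this with a hypothetical helper `mvd_holds`. The column names and rows are assumptions chosen for illustration.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Return True if the multi-valued dependency x ->> y holds in rows."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    # Within each group, the (y, z) pairs must form a full cross product.
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

# Hypothetical rows: one model, two years, two colors, all combinations present.
bikes = [
    {"BIKE_MODEL": "M1001", "MANUF_YEAR": 2007, "COLOR": "Black"},
    {"BIKE_MODEL": "M1001", "MANUF_YEAR": 2007, "COLOR": "White"},
    {"BIKE_MODEL": "M1001", "MANUF_YEAR": 2008, "COLOR": "Black"},
    {"BIKE_MODEL": "M1001", "MANUF_YEAR": 2008, "COLOR": "White"},
]

complete = mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"])
incomplete = mvd_holds(bikes[:3], ["BIKE_MODEL"], ["COLOR"])
```

Dropping a single combination makes the check fail, which is why a table with such a dependency has to repeat data.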
Original Table –
Suppose a bike manufacturer produces each model in two colors (white and black) every year.
Here the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent of each other.

Modified Table
Original Table –
In the STUDENT relation, the student with STU_ID 21 takes two courses, Computer and Math, and has two hobbies, Dancing
and Singing. So there is a multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey

Modified Table

STU_ID   COURSE
21       Computer
21       Math
34       Chemistry
74       Biology
59       Physics

STU_ID   HOBBY
21       Dancing
21       Singing
34       Dancing
74       Cricket
59       Hockey
DBMS Normalization = 5NF
Definition:
A relation is said to be in 5NF if it is in 4NF and contains no join dependency that is not implied by its candidate keys. Any
decomposition must also be lossless, so that joining the parts reproduces the original relation exactly.
5NF is reached when the tables have been broken into as many smaller tables as possible in order to avoid redundancy.

Original Table –
In the table below, John takes both the Computer and Math classes in Semester 1 but does not take any class in Semester 2. In this
case, the combination of all the fields is required to identify a valid row (all columns together constitute the primary key).

Modified Table
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
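
Whether a decomposition is lossless can be tested by joining the projections and comparing the result with the original rows. The sketch below does this for a three-way split. The helpers `project`, `natural_join`, and `is_lossless` are hypothetical, and the SUBJECT column and the five sample rows are assumptions loosely modelled on the semester and lecturer example.

```python
def project(rows, attrs):
    """Project rows onto attrs, removing duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Join two lists of dict rows on all attribute names they share."""
    joined = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

def is_lossless(rows, attrs, parts):
    """Return True if joining the projections in parts gives back rows exactly."""
    joined = project(rows, parts[0])
    for part in parts[1:]:
        joined = natural_join(joined, project(rows, part))
    as_set = lambda rs: {tuple(r[a] for a in attrs) for r in rs}
    return as_set(joined) == as_set(rows)

attrs = ["SEMESTER", "SUBJECT", "LECTURER"]
# Assumed sample rows for illustration.
data = [
    ("Semester 1", "Computer", "Anshika"),
    ("Semester 1", "Computer", "John"),
    ("Semester 1", "Math", "John"),
    ("Semester 2", "Math", "Akash"),
    ("Semester 1", "Chemistry", "Praveen"),
]
rows = [dict(zip(attrs, values)) for values in data]

two_way = is_lossless(rows, attrs, [["SEMESTER", "SUBJECT"], ["SUBJECT", "LECTURER"]])
three_way = is_lossless(
    rows, attrs,
    [["SEMESTER", "SUBJECT"], ["SUBJECT", "LECTURER"], ["SEMESTER", "LECTURER"]],
)
```

With these rows, joining only two of the projections produces spurious combinations, while joining all three restores the original table.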
