Database Normalization


Database Normalization
What is Database Normalization?

Database Normalization is a technique of organizing the data in a database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e., that data is logically stored.
Why Normalize a Database?

Make the database more efficient.
Prevent the same data from being stored in more than one place.
Prevent records that cannot be inserted without also supplying unrelated data (called an "insert anomaly").
Prevent updates being made to some copies of the data but not others (called an "update anomaly").
Prevent data not being deleted when it is supposed to be, or data being lost when it is not supposed to be (called a "delete anomaly").
Ensure the data is accurate.
Reduce the storage space that a database takes up.
Ensure the queries on a database run as fast as possible.
Data Anomalies

An anomaly is an issue in the data that is not meant to be there. This can happen if a database is not normalised.
We'll be using a student database as an example, which records student, class, and teacher information.

Student

Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
-----------|---------------|-----------|------------------|-------------|----------------|--------------
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |

This is not a normalised table, and there are a few issues with this.
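To make the structure concrete, here is a minimal SQL sketch of how this unnormalised table might be defined; the column names and types are assumptions based on the table above, not a schema from the original material.

```sql
-- Hypothetical definition of the unnormalised Student table shown above.
-- The repeating class_1/class_2/class_3 columns and the course information
-- mixed into the student row are what cause the anomalies discussed next.
CREATE TABLE student (
    student_id   INT PRIMARY KEY,
    student_name VARCHAR(100),
    fees_paid    DECIMAL(10, 2),
    course_name  VARCHAR(100),
    class_1      VARCHAR(100),
    class_2      VARCHAR(100),
    class_3      VARCHAR(100)
);
```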
Insert Anomaly
An insert anomaly happens when we try to insert a record into this table without knowing all
the data we need to know.
For example, if we wanted to add a new student but did not know their course name.
The new record would look like this:
Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
-----------|---------------|-----------|------------------|-------------|----------------|--------------
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |
5          | Jared Oldham  | 0         | ?                |             |                |

We would be adding incomplete data to our table, which can cause issues when trying to analyse this data.
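In SQL terms, the insert would look roughly like the sketch below, using the hypothetical student table defined earlier; the unknown course forces NULLs into columns that have nothing to do with the fact we actually want to record.

```sql
-- Sketch of the insert anomaly: the course is unknown, so unrelated columns
-- must be filled with NULL (or placeholder values) just to add the student.
INSERT INTO student (student_id, student_name, fees_paid, course_name, class_1, class_2, class_3)
VALUES (5, 'Jared Oldham', 0, NULL, NULL, NULL, NULL);
```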
Update Anomaly
An update anomaly happens when we want to update data, and we update some of the data
but not other data.
For example, let’s say the class Biology 1 was changed to “Intro to Biology”. We would have to
query all of the columns that could have this Class field and rename each one that was found.
Student ID | Student Name  | Fees Paid | Course Name      | Class 1          | Class 2          | Class 3
-----------|---------------|-----------|------------------|------------------|------------------|--------------
1          | John Smith    | 200       | Economics        | Economics 1      | Intro to Biology |
2          | Maria Griffin | 500       | Computer Science | Intro to Biology | Business Intro   | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2        |                  |
4          | Matt Long     | 850       | Dentistry        |                  |                  |

There’s a risk that we miss out on a value, which would cause issues.
Ideally, we would only update the value once, in one location.
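A rough SQL sketch of what that rename would involve on the hypothetical student table: because the class name can appear in any of three columns, one logical change needs several statements, and missing any of them leaves the data inconsistent.

```sql
-- Renaming a class requires touching every column that might hold it.
UPDATE student SET class_1 = 'Intro to Biology' WHERE class_1 = 'Biology 1';
UPDATE student SET class_2 = 'Intro to Biology' WHERE class_2 = 'Biology 1';
UPDATE student SET class_3 = 'Intro to Biology' WHERE class_3 = 'Biology 1';
```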
Delete Anomaly
A delete anomaly occurs when we want to delete data from the table, but we end up deleting more than
what we intended.

For example, let’s say Susan Johnson quits and her record needs to be deleted from the system. We could
delete her row:

Student ID | Student Name  | Fees Paid | Course Name      | Class 1     | Class 2        | Class 3
-----------|---------------|-----------|------------------|-------------|----------------|--------------
1          | John Smith    | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Griffin | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Susan Johnson | 400       | Medicine         | Biology 2   |                |
4          | Matt Long     | 850       | Dentistry        |             |                |

But, if we delete this row, we lose the record of the Biology 2 class, because it’s not stored anywhere else. The same
can be said for the Medicine course.
We should be able to delete one type of data or one record without having impacts on other records we don’t want
to delete.
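As a sketch against the hypothetical student table, the delete is a single harmless-looking statement, yet it silently removes the only trace of the Medicine course and the Biology 2 class.

```sql
-- Sketch of the delete anomaly: deleting the student also deletes the only
-- record that the Medicine course and the Biology 2 class exist.
DELETE FROM student WHERE student_id = 3;
```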
Without any normalization, all information is stored in one table. In that example table, the Movies Rented column holds multiple values in a single cell.

1NF (First Normal Form) Rules
Each table cell should contain a single value.
Each record needs to be unique.
1NF Example
First Normal Form (1NF)

•A relation is in 1NF if every attribute contains only atomic values.
•An attribute of a table cannot hold multiple values; it must hold only a single value.
•First normal form disallows multi-valued attributes, composite attributes, and combinations of these.
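A minimal sketch of a 1NF decomposition, assuming the missing example table looks roughly like member(membership_id, full_name, movies_rented) with a comma-separated list of movies in one cell; the names and types here are illustrative assumptions, not the original schema.

```sql
-- Hypothetical 1NF decomposition: the multi-valued Movies Rented column is
-- replaced by one row per rented movie in a separate table.
CREATE TABLE member (
    membership_id INT PRIMARY KEY,
    full_name     VARCHAR(100)
);

CREATE TABLE member_movie (
    membership_id INT REFERENCES member (membership_id),
    movie_title   VARCHAR(200),
    PRIMARY KEY (membership_id, movie_title)  -- every cell now holds a single value
);
```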
2NF (Second Normal Form) Rules
Rule 1 - Be in 1NF
Rule 2 - Every non-prime attribute must be fully functionally dependent on the whole of every candidate key (no partial dependencies)

Second Normal Form (2NF)

•In 2NF, the relation must be in 1NF.
•In the second normal form, all non-key attributes must be fully functionally dependent on the primary key.
In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID, which is a proper subset of a candidate key. That is why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
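The decomposed tables are not reproduced here, but one plausible 2NF decomposition is sketched below, assuming the original table has the shape TEACHER(TEACHER_ID, SUBJECT, TEACHER_AGE) with composite key (TEACHER_ID, SUBJECT); the column types are assumptions.

```sql
-- Hypothetical 2NF decomposition: TEACHER_AGE depends only on TEACHER_ID
-- (a partial dependency), so it moves into its own table.
CREATE TABLE teacher_detail (
    teacher_id  INT PRIMARY KEY,
    teacher_age INT                      -- now fully dependent on the whole key
);

CREATE TABLE teacher_subject (
    teacher_id INT REFERENCES teacher_detail (teacher_id),
    subject    VARCHAR(100),
    PRIMARY KEY (teacher_id, subject)
);
```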
3NF (Third Normal Form) Rules
Rule 1 - Be in 2NF
Rule 2 - Has no transitive functional dependencies

Third Normal Form (3NF)

By transitive functional dependency, we mean we have the following relationship in the table: A determines B, and B determines C. In this case, C is transitively dependent on A via B.
In the table, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID]; we have a transitive functional dependency, and this structure does not satisfy third normal form.
To bring this table to third normal form, we split the table into two as follows:

Now all non-key attributes are fully functionally dependent only on the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] are dependent only on [Book ID]. In [TABLE_GENRE], [Genre Type] is dependent only on [Genre ID].
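As a sketch, the decomposition described above might be written in SQL as follows; the column types and exact table names are assumptions based on the attributes mentioned in the text.

```sql
-- Hypothetical 3NF decomposition: Genre Type lives only in TABLE_GENRE, so
-- Book ID no longer determines it transitively within a single table.
CREATE TABLE table_genre (
    genre_id   INT PRIMARY KEY,
    genre_type VARCHAR(100)
);

CREATE TABLE table_book (
    book_id  INT PRIMARY KEY,
    genre_id INT REFERENCES table_genre (genre_id),
    price    DECIMAL(10, 2)
);
```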
SUMMARY
Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update and Deletion Anomalies.
It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
THANK YOU
