
NORMALIZATION

Normalization is the process of organizing the data
in the database.
Normalization divides a larger table into smaller
tables and links them using relationships.
This process is used to reduce the redundancy in
the large table.
Terms to be understood:
1. Redundancy
2. Insertion Anomaly
3. Deletion Anomaly
4. Updation Anomaly
DATA REDUNDANCY
Repetition of data at multiple places.
It leads to:
1. Increased size of the database
2. Insertion Anomaly
3. Deletion Anomaly
4. Updation Anomaly
INSERTION ANOMALY
ANOMALY = PROBLEM

INSERTION ANOMALY:
When the user inserts data
into a table, the same data has to be
repeated in every new row,
unnecessarily, many times.
This is called an INSERTION
ANOMALY.
DELETION ANOMALY
ANOMALY = PROBLEM

DELETION ANOMALY:
When the user deletes some
data, other related data stored in the
same rows is deleted along with it.
The unintended loss of related data when
some other data is deleted
is called a DELETION ANOMALY.
UPDATION ANOMALY
ANOMALY = PROBLEM

UPDATION ANOMALY:
When the user updates a value, the
same value has to be updated in all the
rows where it is repeated.
If any row is missed during the
update, it leads to data
inconsistency.
This is called an UPDATION ANOMALY.
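The following is a minimal Python sketch of the update problem described above. The rows are a hypothetical unnormalized table (the subject code and teacher name are borrowed from the SCORE TABLE example later in these slides, and the replacement name is made up).

```python
# Hypothetical unnormalized rows: the teacher name is repeated in every row.
rows = [
    {"roll_number": 100, "sid": "18MCAP201", "teacher": "BALAN"},
    {"roll_number": 101, "sid": "18MCAP201", "teacher": "BALAN"},
    {"roll_number": 201, "sid": "18MCAP201", "teacher": "BALAN"},
]


def teachers_for(rows, sid):
    """Return the set of teacher names stored for one subject."""
    return {r["teacher"] for r in rows if r["sid"] == sid}


# Update the teacher, but miss the last row (simulating a partial update).
for r in rows[:-1]:
    r["teacher"] = "NEW_TEACHER"  # made-up replacement name

# More than one teacher for the same subject means the data is inconsistent.
inconsistent = len(teachers_for(rows, "18MCAP201")) > 1
print(inconsistent)
```

If the teacher were stored only once (in a separate subject table), a single update could not leave the data in this state.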
NORMALIZATION
HOW NORMALIZATION SOLVES THIS
PROBLEM?
Less redundancy
Solves:
INSERTION ANOMALY
DELETION ANOMALY
UPDATION ANOMALY
HOW TO ACHIEVE
FIRST NORMAL FORM
SECOND NORMAL FORM
THIRD NORMAL FORM
FIRST NORMAL FORM
The table must be scalable, so that
the table can be easily extended.
If the table is not in first normal
form, it is considered a bad
database design.
Normalization Process
Very good database

4 NF etc

3 NF

2 NF
1 NF
UNNORMALIZED DATA
1ST NORMAL FORM
1. Each column of the relation must
have ATOMIC VALUES
2. A column must contain values of the
SAME TYPE (data type)
3. Each column must have a UNIQUE
NAME
4. The values in the columns can be
stored in any order
2ND NORMAL FORM
1. The table must be in
1st NORMAL FORM
2. There should be no
PARTIAL DEPENDENCY
in the relation
What is PARTIAL DEPENDENCY?
First, what is a DEPENDENCY?
If the value of one column (usually the
primary key column) determines the values
of the other columns, this is called a
DEPENDENCY or
FUNCTIONAL DEPENDENCY
A functional dependency is a
relationship that exists between two
attributes, typically between the primary
key and a non-key attribute.
PARTIAL DEPENDENCY?
For a simple table, it is not difficult to
identify the other columns using
a single primary key column.
This is not the case in all situations.
There are tables where a
combination of attributes forms
the primary key.
A PARTIAL DEPENDENCY exists when a non-key
column depends on only a part of such a
combined primary key.
CONSIDER THE FOLLOWING
Now we have the following tables

STUDENT TABLE SUBJECT TABLE

SCORE TABLE
PARTIAL DEPENDENCY
SID SNAME
18MCAP201 C PROGRAMMING
18MCAP202 DWDM
18MCAP203 MAD
18MCAP204 CNS
18MCAP214 CNS LAB
18MCAP213 MAD LAB
18MCAP211 C PROGRAMMING LAB
SCORE TABLE
PRIMARY KEY: ROLL_NUMBER + SID
ROLL_NUMBER SID MARKS TEACHER
100 18MCAP201 87 BALAN
100 18MCAP202 86 HARI
101 18MCAP201 67 BALAN
101 18MCAP202 76 HARI
200 18MCAP204 90 KISHORE
201 18MCAP203 64 SUSEELA
201 18MCAP201 56 BALAN


PARTIAL DEPENDENCY

PRIMARY KEY:
ROLL_NUMBER + SID
PARTIAL DEPENDENCY

ROLL_NUMBER + SID -> MARKS, TEACHER
SID -> TEACHER
(TEACHER depends on SID alone, which is only a
part of the primary key)
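As a rough illustration, the dependencies on the SCORE TABLE can be checked against sample rows with a small Python helper. The function name `holds` is hypothetical, and a check like this only shows that the sample data does not contradict a dependency; it cannot prove the dependency for all future data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    The dependency holds when equal lhs values always come with equal rhs values.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows taken from the SCORE TABLE in these slides.
columns = ["ROLL_NUMBER", "SID", "MARKS", "TEACHER"]
score = [dict(zip(columns, values)) for values in [
    (100, "18MCAP201", 87, "BALAN"),
    (100, "18MCAP202", 86, "HARI"),
    (101, "18MCAP201", 67, "BALAN"),
    (101, "18MCAP202", 76, "HARI"),
]]

# The full key determines MARKS, but ROLL_NUMBER alone does not.
key_to_marks = holds(score, ["ROLL_NUMBER", "SID"], ["MARKS"])
roll_to_marks = holds(score, ["ROLL_NUMBER"], ["MARKS"])
# TEACHER is determined by SID alone: a partial dependency.
sid_to_teacher = holds(score, ["SID"], ["TEACHER"])
print(key_to_marks, roll_to_marks, sid_to_teacher)
```

Because TEACHER is already determined by SID, which is only one part of the key, it is the column that creates the partial dependency.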
HOW TO REMOVE?
1. Remove the column that creates the
PARTIAL DEPENDENCY from the
score table
2. Add it to the subject table as a
NEW COLUMN
SCORE TABLE
PRIMARY KEY: ROLL_NUMBER + SID
ROLL_NUMBER SID MARKS TEACHER
100 18MCAP201 87 BALAN
100 18MCAP202 86 HARI
101 18MCAP201 67 BALAN
101 18MCAP202 76 HARI
200 18MCAP204 90 KISHORE
201 18MCAP203 64 SUSEELA
201 18MCAP201 56 BALAN
SUBJECT TABLE
SID SNAME TEACHER
18MCAP201 C PROGRAMMING BALAN
18MCAP202 DWDM HARI
18MCAP203 MAD SUSEELA
18MCAP204 CNS KISHORE
18MCAP214 CNS LAB KISHORE
18MCAP213 MAD LAB SUSEELA
18MCAP211 C PROGRAMMING LAB BALAN
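The two tables after removing the partial dependency could be sketched with Python's built-in sqlite3 module as follows. The lowercase table and column names and the column types are assumptions made for this illustration; only a few rows from the slides are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    sid     TEXT PRIMARY KEY,
    sname   TEXT NOT NULL,
    teacher TEXT NOT NULL          -- moved here from the score table
);
CREATE TABLE score (
    roll_number INTEGER NOT NULL,
    sid         TEXT NOT NULL REFERENCES subject(sid),
    marks       INTEGER NOT NULL,
    PRIMARY KEY (roll_number, sid) -- composite primary key
);
""")
conn.executemany("INSERT INTO subject VALUES (?, ?, ?)", [
    ("18MCAP201", "C PROGRAMMING", "BALAN"),
    ("18MCAP202", "DWDM", "HARI"),
])
conn.executemany("INSERT INTO score VALUES (?, ?, ?)", [
    (100, "18MCAP201", 87),
    (100, "18MCAP202", 86),
    (101, "18MCAP201", 67),
])

# The teacher is looked up through the subject table instead of being repeated.
report = conn.execute("""
    SELECT score.roll_number, score.sid, score.marks, subject.teacher
    FROM score JOIN subject ON subject.sid = score.sid
    ORDER BY score.roll_number, score.sid
""").fetchall()
print(report)
```

Each teacher name is now stored once per subject, and a join brings it back next to the marks when it is needed.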
3RD NORMAL FORM
It should be in 2nd NORMAL
FORM
It should not have
TRANSITIVE DEPENDENCY
SCORE TABLE
ROLL_NUMBER SID MARKS EXAM_NAME TOTAL_MARKS
100 18MCAP201 87 CPROGRAMMING 100

100 18MCAP202 86 DWDM 60

101 18MCAP201 67 CPROGRAMMING 100

101 18MCAP202 76 DWDM 60

200 18MCAP204 90 CNS 60

201 18MCAP203 64 MAD 60

201 18MCAP201 56 C_LAB 40


TRANSITIVE DEPENDENCY?
ROLL_NUMBER SID MARKS EXAM_NAME TOTAL_MARKS

NOT A PART OF
PRIMARY KEY
TRANSITIVE DEPENDENCY
The situation where some attribute
of a relation depends on a
NON-PRIME ATTRIBUTE
rather than directly on a
PRIME ATTRIBUTE
is called a
TRANSITIVE DEPENDENCY
Here, TOTAL_MARKS depends on EXAM_NAME,
which is not a part of the primary key.
How to remove this?
Move the two columns that
create the transitive dependency into
a SEPARATE TABLE.
Keep the NON PRIME attribute in
the original table as well, so that the
two tables stay linked.
EXAM TABLE
EXAM_NAME TOTAL_MARKS

CPROGRAMMING 100

DWDM 60

DATA STRUCTURES 100

DATABASE MANAGEMENT 100

CNS 60

MAD 60

C_LAB 40
SCORE TABLE
ROLL_NUMBER SID MARKS EXAM_NAME

100 18MCAP201 87 CPROGRAMMING

100 18MCAP202 86 DWDM

101 18MCAP201 67 CPROGRAMMING

101 18MCAP202 76 DWDM

200 18MCAP204 90 CNS

201 18MCAP203 64 MAD

201 18MCAP201 56 C_LAB
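A minimal Python sketch of this split, assuming the rows are held as a list of dictionaries: the hypothetical helper `project` keeps only the named columns and drops duplicate rows, which is how the EXAM TABLE ends up with one row per exam. Only the first four rows of the original SCORE TABLE are used.

```python
def project(rows, columns):
    """Project rows onto columns, dropping duplicate rows but keeping order."""
    seen, result = set(), []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


columns = ["ROLL_NUMBER", "SID", "MARKS", "EXAM_NAME", "TOTAL_MARKS"]
score_rows = [dict(zip(columns, values)) for values in [
    (100, "18MCAP201", 87, "CPROGRAMMING", 100),
    (100, "18MCAP202", 86, "DWDM", 60),
    (101, "18MCAP201", 67, "CPROGRAMMING", 100),
    (101, "18MCAP202", 76, "DWDM", 60),
]]

# The two transitively dependent columns go into their own table.
exam = project(score_rows, ["EXAM_NAME", "TOTAL_MARKS"])
# EXAM_NAME stays in the score table so the two tables remain linked.
score = project(score_rows, ["ROLL_NUMBER", "SID", "MARKS", "EXAM_NAME"])
print(exam)
print(len(score))
```

The total marks of each exam are now stored once, so changing them needs a single update.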


So, the bigger table is divided into:
STUDENT: ROLL_NUMBER NAME BRANCH
BRANCH: BRANCH HOD PHONE_OFFICE
SUBJECT: SID SNAME TEACHER
SCORE: ROLL_NUMBER SID MARKS EXAM_NAME
EXAM: EXAM_NAME TOTAL_MARKS
BCNF or 3.5NF (Boyce-Codd Normal Form)
1. The table must be in
3rd NORMAL FORM
2. No NON PRIME ATTRIBUTE
should identify a PRIME ATTRIBUTE
(a part of the PRIMARY key)
i.e. for every dependency A -> B,
A should be a super key of the table.
Consider the situation:
What happens if a NON PRIME
ATTRIBUTE is able to determine a PRIME
ATTRIBUTE value in a relation?
This means the relation is not in BCNF.
BCNF does not allow a NON PRIME
ATTRIBUTE to identify a PRIME
ATTRIBUTE.
ENROLL TABLE
PRIMARY KEY: ROLL_NUMBER + SNAME
ROLL_NUMBER SNAME TEACHER
100 C PROGRAMMING BALAN

101 DBMS SHRINIVAS

102 OS JOSEPH

102 DBMS MARAN

101 OS SIVA

104 MAD SHRINIVAS

100 CO KISHORE
DEPENDENCIES
ROLL_NUMBER + SNAME -> TEACHER
TEACHER -> SNAME
(TEACHER is a NON PRIME ATTRIBUTE, SNAME is a PRIME ATTRIBUTE)

After decomposition:
ROLL_NUMBER TID
TID TEACHER SUBJECT
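One way to test for a BCNF violation is to compute the closure of the left side of each dependency and see whether it covers the whole table. The sketch below assumes the two dependencies listed for the ENROLL TABLE; the function names `closure` and `bcnf_violations` are made up for this illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under the given FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know lhs, we also know rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


enroll = {"ROLL_NUMBER", "SNAME", "TEACHER"}
fds = [
    ({"ROLL_NUMBER", "SNAME"}, {"TEACHER"}),
    ({"TEACHER"}, {"SNAME"}),
]
violations = bcnf_violations(enroll, fds)
print(violations)
```

Only TEACHER -> SNAME is reported, because TEACHER alone does not determine ROLL_NUMBER and so is not a super key.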


JOINS
Combining the relations using a
common column.
1. INNER JOIN
2. OUTER JOIN –
Left Outer Join,
Right Outer Join,
Full Outer Join
3. NATURAL JOIN
4. SELF JOIN
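As a small illustration of the first two join types, here is a sketch using Python's built-in sqlite3 module. The table layout follows the STUDENT and SCORE tables of these slides, but the student names are made up and the column types are assumptions. Only INNER JOIN and LEFT OUTER JOIN are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE score (roll_number INTEGER, sid TEXT, marks INTEGER);
INSERT INTO student VALUES (100, 'A'), (101, 'B'), (102, 'C');
INSERT INTO score VALUES (100, '18MCAP201', 87), (101, '18MCAP201', 67);
""")

# INNER JOIN keeps only the students that have a matching score row.
inner = conn.execute("""
    SELECT s.roll_number, sc.marks
    FROM student s INNER JOIN score sc ON sc.roll_number = s.roll_number
    ORDER BY s.roll_number
""").fetchall()

# LEFT OUTER JOIN keeps every student; missing marks come back as NULL (None).
left = conn.execute("""
    SELECT s.roll_number, sc.marks
    FROM student s LEFT OUTER JOIN score sc ON sc.roll_number = s.roll_number
    ORDER BY s.roll_number
""").fetchall()

print(inner)
print(left)
```

Student 102 has no score row, so it appears only in the left outer join result.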
LOSSLESS JOIN DECOMPOSITION
What is decomposition?
Separating a relation into multiple relations
Why decomposition? To remove
1. Redundancy 2. The Anomalies 3.
Inconsistencies
Lossless Decomposition:
A decomposition of a relation into multiple
relations is lossless if the original relation
can be reconstructed by joining the
decomposed relations (natural join).
The join must produce exactly the same original relation.
EXAMPLE
EMP_ID ENAME AGE LOCATION DEPT_ID DNAME
100 JOHN 34 BANGALORE 10 TESTING
101 KRISHNA 25 HYDERABAD 10 TESTING
102 BALAN 34 HYDERABAD 11 R&D
103 SIVA 43 TIRUPATHI 12 DEVELOPING
EMP_ID ENAME AGE LOCATION DEPT_ID

100 JOHN 34 BANGALORE 10


101 KRISHNA 25 HYDERABAD 10
102 BALAN 34 HYDERABAD 11
103 SIVA 43 TIRUPATHI 12

EMP_ID DEPT_ID DNAME

100 10 TESTING
101 10 TESTING
102 11 R&D
103 12 DEVELOPING
LOSSY DECOMPOSITION
EMP_ID ENAME AGE LOCATION

100 JOHN 34 BANGALORE


101 KRISHNA 25 HYDERABAD
102 BALAN 34 HYDERABAD
103 SIVA 43 TIRUPATHI

DEPT_ID DNAME

10 TESTING
10 TESTING
11 R&D
12 DEVELOPING
Lossless Join decomposition?
If a relation R is decomposed into R1 and R2,
then it is a lossy decomposition if R1 ⋈ R2 is a
proper superset of R (the join creates extra rows).
The decomposition is lossless if R1 ⋈ R2 = R
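The definition above can be tried out on the employee example with a small natural join written in Python. This is only a sketch: the helper names are hypothetical, relations are plain lists of dictionaries, and the AGE and LOCATION columns are left out to keep it short.

```python
def project(rows, columns):
    """Project a relation (list of dicts) onto columns, removing duplicates."""
    result = []
    for row in rows:
        item = {c: row[c] for c in columns}
        if item not in result:
            result.append(item)
    return result


def natural_join(r1, r2):
    """Join two non-empty relations on all the column names they share."""
    common = set(r1[0]) & set(r2[0])
    result = []
    for a in r1:
        for b in r2:
            # With no common columns this is always true (a cross product).
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}
                if row not in result:
                    result.append(row)
    return result


def as_set(rows):
    """Order-independent view of a relation, used for comparison."""
    return {frozenset(row.items()) for row in rows}


columns = ["EMP_ID", "ENAME", "DEPT_ID", "DNAME"]
employee = [dict(zip(columns, values)) for values in [
    (100, "JOHN", 10, "TESTING"),
    (101, "KRISHNA", 10, "TESTING"),
    (102, "BALAN", 11, "R&D"),
    (103, "SIVA", 12, "DEVELOPING"),
]]

# Decomposition that shares EMP_ID and DEPT_ID, as in the first example.
lossless = natural_join(
    project(employee, ["EMP_ID", "ENAME", "DEPT_ID"]),
    project(employee, ["EMP_ID", "DEPT_ID", "DNAME"]),
)
# Decomposition with no common column, as in the LOSSY DECOMPOSITION slide.
lossy = natural_join(
    project(employee, ["EMP_ID", "ENAME"]),
    project(employee, ["DEPT_ID", "DNAME"]),
)

print(as_set(lossless) == as_set(employee))
print(len(lossy), as_set(employee) < as_set(lossy))
```

The first join gives back exactly the four original rows. The second pairs every employee with every department, so the result is a proper superset of the original relation.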

Lossless Join Decomposition

To have a lossless join decomposition, a relation should
satisfy these conditions.
1. The union of the attributes of R1 and R2 must be equal to
the attributes of R, i.e. each attribute of R must be in R1 or in R2
That means,
Attributes of R1 U Attributes of R2 =
Attributes of R
2. The intersection of R1 and R2 must NOT be
empty
That means,
R1 Intersect R2 is not equal to the empty set
3. The common attributes must be a key of
at least one of R1 or R2
Consider a relation
R (A,B,C,D)
A->BC
Now R can be decomposed into R1(A,B,C) and
R2(A,D)
The conditions hold:
1. R1 U R2 -> R1(A,B,C) U R2(A,D) -> R(A,B,C,D)
2. R1 intersect R2 -> R1(A,B,C) Intersect R2(A,D) -> A,
WHICH IS NOT EMPTY
Also, since A->BC, the common attribute A is a key of R1.
4TH NORMAL FORM
For a table to satisfy the Fourth Normal
Form, it should satisfy the following two
conditions:
1. It should be in the
BOYCE CODD NORMAL FORM
2. The table should not have any
MULTI VALUED DEPENDENCY
MULTIVALUED DEPENDENCY
For a dependency A →→ B, if for a single
value of A multiple values of B exist, then
the table may have a multi-valued
dependency.
Also, a table should have at least 3
columns for it to have a multi-valued
dependency.
And, for a relation R(A,B,C), if there is a
multi-valued dependency between A and
B, then B and C should be independent of
each other.
EXAMPLE
ROLL_NUMBER COURSE HOBBIES
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 PHP Hockey
3 MAD Football
4 MAD Hockey
4 JAVA Chess
4 ANIMATION Football
5 MAD Carrom
5 JAVA Tennis
EXAMPLE
The student with ROLL_NUMBER 1 has opted for two
courses, Science and Maths, and has two
hobbies, Cricket and Hockey.

• What is the problem?
Every course has to be paired with every
hobby, so extra rows are needed:
ROLL_NUMBER COURSE HOBBIES
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
2 C# Cricket
2 PHP Hockey
2 C# Hockey
2 PHP Cricket
MULTI VALUED DEPENDENCY
There is no relationship between the
columns COURSE and HOBBIES. They are
independent of each other.
So there is a multi-valued dependency,
which leads to unnecessary repetition of
data and other anomalies as well.
How to satisfy 4th Normal Form?
CourseOpted Table

ROLL_NUMBER COURSE

1 Science
1 Maths
2 C#
2 PHP
3 MAD
4 MAD
4 JAVA
4 ANIMATION
Hobbies Table

ROLL_NUMBER HOBBIES
1 Cricket
1 Hockey
2 Cricket
2 Hockey
3 Football
4 Hockey
4 Chess
4 Football
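The saving from this split can be sketched in Python. The courses and hobbies of students 1 and 4 are taken from the tables above; holding them in dictionaries keyed by ROLL_NUMBER is just an assumption made for the illustration.

```python
from itertools import product

# Courses and hobbies of students 1 and 4, stored independently.
courses = {1: ["Science", "Maths"], 4: ["MAD", "JAVA", "ANIMATION"]}
hobbies = {1: ["Cricket", "Hockey"], 4: ["Hockey", "Chess", "Football"]}

# One combined table needs every course/hobby combination per student.
combined = [
    (roll, course, hobby)
    for roll in courses
    for course, hobby in product(courses[roll], hobbies[roll])
]

# Two separate tables store each fact once.
course_opted = [(roll, c) for roll, cs in courses.items() for c in cs]
hobby_table = [(roll, h) for roll, hs in hobbies.items() for h in hs]

# Joining the two tables on ROLL_NUMBER gives back the combined table.
rejoined = {
    (roll, course, hobby)
    for roll, course in course_opted
    for roll2, hobby in hobby_table
    if roll == roll2
}

print(len(combined), len(course_opted) + len(hobby_table))
print(rejoined == set(combined))
```

The combined table needs 13 rows for these two students, while the two separate tables need 10 in total, and the gap grows with every extra course or hobby.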
A table can also have a functional
dependency along with a multi-valued
dependency.
In that case, the functionally
dependent columns are moved to a
separate table and the multi-valued
dependent columns are moved to
separate tables.
5th NORMAL FORM
1. It should be in the
4TH NORMAL FORM
2. It should not have a JOIN
DEPENDENCY
5th NORMAL FORM is otherwise called
PROJECT-JOIN NORMAL FORM
Join dependency
If a table can be recreated
by joining multiple tables,
and each of these tables has
a subset of the attributes of
the table, then the table has a
Join Dependency.
R -> R1, R2 -> R
(R is decomposed into R1 and R2, and joining R1 and R2 gives back R)
What is the problem if JOIN
dependency does not exist?
Two problems
1. Data Loss (OR)
2. New Entries are Created
EXAMPLE
SUPPLIER PRODUCT CUSTOMER
CROMPTON FAN HARIHARAN
CROMPTON MOTOR SAMUVEL
PHILIPS SPEAKER HARIHARAN
PHILIPS TV KRISHNAN
HAVELLES FAN SAMUVEL
HAVELLES LED BULB KRISHNAN
HAVELLES MOTOR KRISHNAN
TRY TO DIVIDE
SUPPLIER PRODUCT CUSTOMER
CROMPTON FAN HARIHARAN
CROMPTON MOTOR SAMUVEL
PHILIPS SPEAKER HARIHARAN
PHILIPS LED BULB KRISHNAN
HAVELLES FAN SAMUVEL
HAVELLES LED BULB KRISHNAN
HAVELLES MOTOR KRISHNAN

SUPPLIER PRODUCT
CROMPTON FAN
CROMPTON MOTOR
PHILIPS SPEAKER
PHILIPS LED BULB
HAVELLES FAN
HAVELLES LED BULB
HAVELLES MOTOR

SUPPLIER CUSTOMER
CROMPTON HARIHARAN
CROMPTON SAMUVEL
PHILIPS HARIHARAN
PHILIPS KRISHNAN
HAVELLES SAMUVEL
HAVELLES KRISHNAN
HAVELLES KRISHNAN

CUSTOMER PRODUCT
HARIHARAN FAN
SAMUVEL MOTOR
HARIHARAN SPEAKER
KRISHNAN LED BULB
SAMUVEL FAN
KRISHNAN LED BULB
KRISHNAN MOTOR


WHAT JOIN DEPENDENCY SAYS?
R -> R1, R2 -> R
(joining the decomposed tables must give back the original table R)
What is the conclusion?
The above three tables cannot
reproduce the original information after
decomposition.
That is, the three tables do not satisfy
the JOIN DEPENDENCY.
Without decomposition, the information
can be read correctly.
So, the table should not be decomposed
further. (Otherwise information is lost.)
Conclusion: If decomposition leads to no loss
of information, then decompose. Otherwise, do not.
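This conclusion can be checked with a short Python sketch. It takes the seven rows of the SUPPLIER / PRODUCT / CUSTOMER table from the EXAMPLE slide (with CROMPTON spelled consistently), projects them onto the three pairs of columns, and joins the projections back. Representing the tables as sets of tuples is an assumption made for this illustration.

```python
rows = {
    ("CROMPTON", "FAN", "HARIHARAN"),
    ("CROMPTON", "MOTOR", "SAMUVEL"),
    ("PHILIPS", "SPEAKER", "HARIHARAN"),
    ("PHILIPS", "TV", "KRISHNAN"),
    ("HAVELLES", "FAN", "SAMUVEL"),
    ("HAVELLES", "LED BULB", "KRISHNAN"),
    ("HAVELLES", "MOTOR", "KRISHNAN"),
}

# The three projections (sets remove duplicate pairs automatically).
supplier_product = {(s, p) for s, p, c in rows}
supplier_customer = {(s, c) for s, p, c in rows}
customer_product = {(c, p) for s, p, c in rows}

# Join all three: keep (s, p, c) only if every pair appears in its table.
rejoined = {
    (s, p, c)
    for s, p in supplier_product
    for s2, c in supplier_customer
    if s2 == s and (c, p) in customer_product
}

# Rows that the join produces but that were never in the original table.
spurious = rejoined - rows
print(len(rejoined))
print(sorted(spurious))
```

The join returns nine rows instead of seven. The two extra rows (CROMPTON FAN SAMUVEL and HAVELLES MOTOR SAMUVEL) are the "new entries" problem mentioned earlier, so this table should stay as it is.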
Example – Unnormalized table
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389, 8589830302 Punjab
14 John 7272826385, 9064738238 UP
20 Harry 8574783832 Bihar


EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar
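The conversion shown above, one row per phone number, can be sketched in Python as follows. The function name `to_first_normal_form` is hypothetical, and the sketch assumes the phone numbers of one employee are stored as a single comma-separated string.

```python
# Unnormalized rows: EMP_PHONE may hold several comma-separated numbers.
unnormalized = [
    (14, "John", "7272826385, 9064738238", "UP"),
    (20, "Harry", "8574783832", "Bihar"),
    (12, "Sam", "7390372389, 8589830302", "Punjab"),
]


def to_first_normal_form(rows):
    """Emit one row per phone number so EMP_PHONE holds atomic values."""
    result = []
    for emp_id, name, phones, state in rows:
        for phone in phones.split(","):
            result.append((emp_id, name, phone.strip(), state))
    return result


atomic = to_first_normal_form(unnormalized)
for row in atomic:
    print(row)
```

The three input rows become five rows, each with exactly one phone number.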
