Normalization Notes

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

DBMS – NORMALIZATION

Functional Dependency

Functional dependency FD is a set of constraints between two attributes in


a relation. Functional dependency says that if two tuples have same values
for attributes A1, A2,..., An, then those two tuples must have to have same
values for attributes B1, B2, ..., Bn. Functional dependency is represented
by an arrow sign → that is, X→Y, where X functionally determines Y. The
left-hand side attributes determine the values of attributes on the right-hand
side.

Armstrong's Axioms

If F is a set of functional dependencies then the closure of F, denoted as


F+, is the set of all functional dependencies logically implied by F.
Armstrong's Axioms are a set of rules, that when applied repeatedly,
generates a closure of functional dependencies. Reflexive rule − If alpha
is a set of attributes and beta is_subset_of alpha, then alpha holds beta.
Augmentation rule − If a → b holds and y is attribute set, then ay → by
also holds. That is adding attributes in dependencies, does not change the
basic dependencies. Transitivity rule − same as transitive rule in algebra,
if a → b holds and b → c holds, then a→ c also holds. a → b is called as a
functionally that determines b.

Trivial Functional Dependency

Trivial − If a functional dependency FD X → Y holds, where Y is a subset of


X, then it is called a trivial FD. Trivial FDs always hold. Non-trivial − If an
FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial
FD. Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ,
it is said to be a completely non-trivial FD.

Page 1 of 14
COMPILED BY: SANDIP GHOSHAL
NORMALIZATION (Normal Form)

➢ 1NF →Remove repeating groups or non atomic attribute.

➢ 2NF →Remove partial dependency.

➢ 3NF →Remove transitive dependency.

➢ BCNF / 3.5NF →All the determinants should be the key element.

➢ 4NF →Separate SVD(Single Valued Dependency) and


MVD(Multi Valued Dependency).

➢ 5NF →Remove JD(Join Dependency). Create tables for


each existing MVD.

➢ 6NF →Same as 5NF. Applied for Temporal Data Base.

➢ DK/NF →Separate all the attributes which have no explicit


constraint.

Page 2 of 14
COMPILED BY: SANDIP GHOSHAL
Normalization

If a database design is not perfect, it may contain anomalies, which are like
a bad dream for any database administrator. Managing a database with
anomalies is next to impossible.

Basically normalization is a process through which we can analyze a


given relational schema to reduce data redundancy and also to
remove following three anomalies.

▪ Update anomalies
▪ Deletion anomalies
▪ Insert anomalies

First Normal Form

First Normal Form is defined in the definition of relations tables itself. This
rule defines that all the attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.

UNF → 1NF:

✓ Remove repeating groups: ƒ


✓ Entering appropriate data in the empty columns of rows.
✓ Placing repeating data along with a copy of the original key attribute
in a separate relation. Identifying a primary key for each of the new
relations.

Second Normal Form

Before we learn about the second normal form, we need to understand the
following –

Prime attribute − An attribute, which is a part of the prime-key, is known as


a prime attribute.

Non-prime attribute − An attribute, which is not a part of the prime-key, is


said to be anon-prime attribute.

Page 3 of 14
COMPILED BY: SANDIP GHOSHAL
If we follow second normal form, then every non-prime attribute should be
fully functionally dependent on prime key attribute. That is, if X → A holds,
then there should not be any proper subset Y of X, for which Y → A also
holds true.

1NF → 2NF:

✓ Remove partial dependencies:


✓ The functionally dependent attributes are removed from the relation
by placing them in a new relation along with a copy of their
determinant.

Third Normal Form

For a relation to be in Third Normal Form, it must be in Second Normal


form and the following must satisfy –

No non-prime attribute is transitively dependent on prime key attribute. For


any non-trivial functional dependency, X → A, then either –

X is a super key or,


A is prime attribute.

2NF → 3NF:

✓ Remove transitive dependencies:


✓ The transitively dependent attributes are removed from the relation by
placing them in a new relation along with a copy of their determinant.

Boyce-Codd Normal Form [ 3.5 Normal Form ]

Boyce-Codd Normal Form BCNF is an extension of Third Normal Form on


strict terms. BCNF states that –

For any non-trivial functional dependency, X → A, X must be a super-


key.

Page 4 of 14
COMPILED BY: SANDIP GHOSHAL
Transformation to BCNF:

✓ Remove violating functional dependencies by placing them in a new


relation.

Multi Valued Dependency [ MVD ]

• Multi valued dependency occurs when two attributes in a table are


independent of each other but, both depend on a third attribute.
• A multi valued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least
three attributes.

Example: Suppose there is a bike manufacturer company which produces


two colors (white and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on


BIKE_MODEL and independent of each other.

In this case, these two columns can be called as multi valued dependent on
BIKE_MODEL. The representation of these dependencies is shown below:

BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR

Page 5 of 14
COMPILED BY: SANDIP GHOSHAL
Fourth normal form (4NF):

Fourth normal form (4NF) is a level of database normalization where there


are no non-trivial multi valued dependencies other than a candidate key. It
builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-
Codd Normal Form (BCNF). It states that, in addition to a database
meeting the requirements of BCNF, it must not contain more than one
multivalued dependency.

Properties – A relation R is in 4NF if and only if the following conditions


are satisfied:

1. It should be in the Boyce-Codd Normal Form (BCNF).


2. The table should not have any Multi-valued Dependency.

Join Dependency

Join decomposition is a further generalization of Multi valued


dependencies. If the join of R1 and R2 over C is equal to relation R, then
we can say that a join dependency (JD) exists. Where R1 and R2 are the
decompositions R1 (A, B, C) and R2 (C, D) of a given relations R (A, B, C,
D).

Fifth Normal Form / Projected Normal Form (5NF/PJNF):

A relation R is in 5NF if and only if every join dependency in R is implied by


the candidate keys of R. A relation decomposed into two relations must
have loss-less join Property, which ensures that no spurious or extra tuples
are generated, when relations are reunited through a natural join.

Properties – A relation R is in 5NF if and only if it satisfies following


conditions:

1. R should be already in 4NF.


2. It cannot be further non loss decomposed (join dependency)

Page 6 of 14
COMPILED BY: SANDIP GHOSHAL
A spurious tuple is, basically, a record in a database that gets created
when two tables are joined badly. In database-ese, spurious tuples
are created when two tables are joined on attributes that are neither
primary keys nor foreign keys.

Types of Data Base

1. Non Temporal or Time Stamp


2. Temporal
A. Uni – Temporal
B. Bi – Temporal
C. Tri - Temporal

Temporal database

A temporal database stores data relating to time instances. It offers


temporal data types and stores information relating to past, present and
future time. Temporal databases could be uni-temporal, bi-temporal or tri-
temporal.

More specifically the temporal aspects usually include valid time,


transaction time or decision time

• Valid time is the time period during which a fact is true in the real
world. It consists of VST [Valid Start Time] & VET [Valid End Time].
• Transaction time is the time period during which a fact stored in the
database was known. It consists of TST [Transaction Start Time] &
TET [Transaction End Time].
• Decision time is the time period during which a fact stored in the
database was decided to be valid.

Uni-Temporal

A uni-temporal database has one axis of time, either the validity range or
the system time range.

Page 7 of 14
COMPILED BY: SANDIP GHOSHAL
Bi-Temporal

A bi-temporal database has two axis of time.

• valid time. [VST & VET]


• transaction time or decision time. [TST & TET]

Tri-Temporal

A tri-temporal database has three axes of time.

• valid time.
• transaction time
• decision time.

sixth normal form

A R [table] is in sixth normal form (abbreviated 6NF) if and only if it


satisfies no nontrivial join dependencies at all — where, as before, a join
dependency is trivial if and only if at least one of the projections (possibly
U_projections) involved is taken over the set of all attributes of the R [table]
concerned.

Domain-key normal form (DK/NF)

Domain-key normal form (DK/NF) is a normal form used in database


normalization which requires that the database contains no constraints
other than domain constraints and key constraints.

A domain constraint specifies the permissible values for a given attribute,


while a key constraint specifies the attributes that uniquely identify a row in
a given table.

The domain/key normal form is achieved when every constraint on the


relation is a logical consequence of the definition of keys and domains, and
enforcing key and domain restraints and conditions causes all constraints
to be met. Thus, it avoids all non-temporal anomalies.

Page 8 of 14
COMPILED BY: SANDIP GHOSHAL
Example: -

Example of 1NF; 2NF; 3NF & BCNF (3.5 NF)

TABLE1
{ROLLNO, DEPT_NO, HOD, DEPT_NAME, S_NAME, STATUS, SUB, SECTION}

FDs =>

{ROLLNO, DEPT_NO, HOD} →{DEPT_NAME, S_NAME, STATUS, SUB, SECTION}


{DEPT_NO, HOD} →{DEPT_NAME, STATUS} ------- [Partial Dependency]
{S_NAME}→ {SECTION} ------------------------------------ [Transitive Dependency]
{DEPT_NAME} →{HOD}

[Assuming HOD & S_NAME are atomic]

1NF => Remove SUB as repeating group / non atomic attribute.

TABLE2
{ROLLNO, DEPT_NO, HOD, DEPT_NAME, S_NAME, STATUS, SECTION}

TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}

2NF => Remove PARTIAL DEPENDENCY.

TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}

TABLE4
{ROLLNO, DEPT_NO, HOD, S_NAME, SECTION}

TABLE5
{ DEPT_NO, HOD, DEPT_NAME, STATUS}

Page 9 of 14
COMPILED BY: SANDIP GHOSHAL
3NF => Remove TRANSITIVE DEPENDENCY.

TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}

TABLE5
{ DEPT_NO, HOD, DEPT_NAME, STATUS}

TABLE6
{ROLLNO, DEPT_NO, HOD, S_NAME}

TABLE7
{S_NAME, SECTION}

BCNF (3.5 NF) => Remove “Non Key Determinants”.

TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}

TABLE6
{ROLLNO, DEPT_NO, HOD, S_NAME}

TABLE7
{S_NAME, SECTION}

TABLE8
{ DEPT_NO, HOD, STATUS}

TABLE9
{DEPT_NAME, HOD}

Page 10 of 14
COMPILED BY: SANDIP GHOSHAL
TABLE1 ------------------------------------------------- UNF

TABLE2 TABLE3 ----------------------- 1NF

TABLE4 TABLE5 TABLE3 ---------------------- 2NF

TABLE6 TABLE7 TABLE5 TABLE3 ---------------------- 3NF

TABLE6 TABLE7 TABLE8 TABLE9 TABLE3 ---------------------BCNF

Example of 4NF; 5NF; Spurious Tuples

TABLE1

ROLLNO NAME SUBJECT MARKS


1 AB DBMS 30
1 AB C 29
2 CD DBMS 29
1 AB DBMS 29

FDs =>

{ROLLNO } → {NAME} ----------------------------------- SVD

{ROLLNO } → →{SUBJECT}
{ROLLNO } → → {MARKS} --------------------- MVD
{SUBJECT } → → {MARKS}
[Assuming NAME is atomic]

Page 11 of 14
COMPILED BY: SANDIP GHOSHAL
4NF => Separate SVD & MVD.

TABLE2 {ROLLNO, NAME}


ROLLNO NAME
1 AB
2 CD
TABLE3 {ROLLNO, SUBJECT, MARKS}
ROLLNO SUBJECT MARKS
1 DBMS 30
1 C 29
2 DBMS 29
1 DBMS 29

5NF / PJNF => Remove JD. Separate all MVDs.

TABLE2 {ROLLNO, NAME}


ROLLNO NAME
1 AB
2 CD
TABLE4 {ROLLNO, SUBJECT}
ROLLNO SUBJECT
1 DBMS
1 C
2 DBMS
TABLE5 {ROLLNO, MARKS}
ROLLNO MARKS
1 30
1 29
2 29
TABLE6 { SUBJECT, MARKS}
SUBJECT MARKS
DBMS 30
C 29
DBMS 29
Page 12 of 14
COMPILED BY: SANDIP GHOSHAL
TABLE1

TABLE2 TABLE3 ----------------------- 4NF

TABLE2 TABLE4 TABLE5 TABLE6 ---------------------- 5NF

EX. OF SPURIOUS TOUPLE.

During normalizing into 5NF if any one decomposes TABLE3 into TABLE4
& TABLE5 [As those two tables cover all the attributes of TABLE3] only then that will be
a bad decomposition or wrong normalization which may cause a problem of
spurious touple.

After joining TABLE4 & TABLE5 we get following set of data which is not
same as TABLE3.

TABLE4 TABLE5

ROLLNO SUBJECT MARKS


1 DBMS 30
1 DBMS 29
1 C 30
1 C 29
2 DBMS 29

The 3rd row is an extra information or spurious touple.

But if we join TABLE4, TABLE5 & TABLE6 then, we get same data as
TABLE3.

So, spurious touple may caused by bad or wrong normalization.

Page 13 of 14
COMPILED BY: SANDIP GHOSHAL
Example of DK/NF

TABLE1
{ROLLNO, NAME, FEES_PAID, SECTION}

FDs =>

{ROLLNO } → {NAME, FEES_PAID, SECTION}

[Assuming NAME is atomic]

List of Implicit Constraint and Explicit Constraint / Domain Constraint

ROLLNO → Primary Key


NAME → Char(50) Implicit Constraint
FEES_PAID → {‘Yes’ or ‘No’} Explicit Constraint
SECTION → {‘A’ or ‘B’ or ‘K1’ or ‘K2’} Domain Constraint

DK/NF => Separate Implicit and Explicit Constraint.

TABLE2
{ROLLNO, NAME}

TABLE3
{ROLLNO, FEES_PAID, SECTION}

TABLE1

TABLE2 TABLE3 -------------------- DK/NF


Page 14 of 14
COMPILED BY: SANDIP GHOSHAL

You might also like