Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

NORMALIZATION

F7S / MIT

September 26, 2006

Nasrullah Memon

Objectives of Normalization
• Develop a good description of the data, its
relationships and constraints
• Produce a stable set of relations that
• Is a faithful model of the enterprise
• Is highly flexible
• Reduces redundancy-saves space and reduces
inconsistency in data
• Is free of update, insertion and deletion
anomalies

1
The Purpose of Normalization

Normalization is a technique for producing a set of


relations with desirable properties, given the data
requirements of an enterprise.

The process of normalization is a formal method that


identifies relations based on their primary or candidate
keys and the functional dependencies among their
attributes.

Anomalies
• An anomaly is an inconsistent, incomplete,
or contradictory state of the database
• Insertion anomaly – user is unable to insert a new
record when it should be possible to do so
• Deletion anomaly – when a record is deleted,
other information that is tied to it is also deleted
• Update anomaly –a record is updated, but other
appearances of the same items are not updated

2
Redundancy

Dependencies between attributes within a relation cause


redundancy.

Example: All addresses in same town have the same postal


code.
SSN Name Town PostCode

1234 Kim Tjæreborg 6731


2334 Ann Beech Tjæreborg 6731
3456 David Ford Tjæreborg 6731

There is clearly redundant information stored here.


Consistency and integrity are harder to maintain even in this
simple example.
5

Redundancy and Other Problems


•Set-valued or multi-valued attributes in the ER diagram result in
multiple rows in corresponding table.

Example: Person (SSN, Name, Adrress, Hobbies)


A person entity with multiple hobbies yields multiple rows in table Person
Hence, the association between Name and Address for the same person is
stored redundantly.
SSN is the key of the entity set, but (SSN, Hobby) is the key of the
corresponding relation
SSN Name Addresss Hobbies

1234 Kim Tjæreborg hiking


1234 Kim Tjæreborg biking

3456 David Ford Tjæreborg lacross

3
Decomposition
Solution: Use two relations to store Person information
Person(SSN, NAme, Address)
Hobbies(SSN, Hobby)
The decomposition is more general: people with hobbies now can be
described.
No update anomalies: Name and address stored once
A hobby can be separately supplied or deleted
Decomposition is the process of breaking a relation into two or more
relations to eliminate the redundancies and corresponding anomalies

Functional Dependencies

Functional dependency describes the relationship between


attributes in a relation.
For example, if A and B are attributes of relation R, and B is
functionally dependent on A ( denoted A B), if each value of
A is associated with exactly one value of B. ( A and B may each
consist of one or more attributes.)

B is functionally
A B
dependent on A

Determinant Refers to the attribute or group of attributes on the


left-hand side of the arrow of a functional
dependency 8

4
Functional Dependencies (2)
Trival functional dependency means that the right-hand
side is a subset ( not necessarily a proper subset) of the left-
hand side.
For example:
staffNo, sName Æ sName
staffNo, sName Æ staffNo

They do not provide any additional information about possible integrity


constraints on the values held by these attributes.

We are normally more interested in nontrivial dependencies because they


represent integrity constraints for the relation.

Functional Dependencies (3)


Main characteristics of functional dependencies in normalization

• Have a one-to-one relationship between attribute(s) on the


left- and right- hand side of a dependency;

• hold for all time;

• are nontrivial.

10

5
Functional Dependencies (4)

Identifying the primary key

Functional dependency is a property of the meaning or


semantics of the attributes in a relation. When a functional
dependency is present, the dependency is specified as a
constraint between the attributes.

An important integrity constraint to consider first is the


identification of candidate keys, one of which is selected to
be the primary key for the relation using functional dependency.

11

Functional Dependencies (5)


Inference Rules
A set of all functional dependencies that are implied by a given
set of functional dependencies X is called closure of X, written
X+. A set of inference rule is needed to compute X+ from X.
Armstrong’s axioms

1. Relfexivity: If B is a subset of A, them A Æ B


2. Augmentation: If A Æ B, then A, C Æ B
3. Transitivity: If A Æ B and B Æ C, then AÆ C
4. Self-determination: AÆA
5. Decomposition: If A Æ B,C then A Æ B and AÆ C
6. Union: If A Æ B and A Æ C, then AÆ B,C
7. Composition: If A Æ B and C Æ D, then A,CÆ B,
12

6
Functional Dependencies (6)

Minimal Sets of Functional Dependencies


A set of functional dependencies X is minimal if it satisfies
the following condition:
• Every dependency in X has a single attribute on its
right-hand side

• We cannot replace any dependency A Æ B in X with


dependency C ÆB, where C is a proper subset of A, and
still have a set of dependencies that is equivalent to X.

• We cannot remove any dependency from X and still


have a set of dependencies that is equivalent to X.
13

Functional Dependencies (7)

Example of A Minimal Sets of Functional Dependencies

A set of functional dependencies for the StaffBranch relation


satisfies the three conditions for producing a minimal set.

staffNo Æ sName
staffNo Æ position
staffNo Æ salary
staffNo Æ branchNo
staffNo Æ bAddress
branchNo Æ bAddress
branchNo, position Æ salary
bAddress, position Æ salary

14

7
The Process of Normalization

• Normalization is often executed as a series of steps. Each step


corresponds to a specific normal form that has known properties.

• As normalization proceeds, the relations become progressively


more restricted in format, and also less vulnerable to update
anomalies.

• For the relational data model, it is important to recognize that


it is only first normal form (1NF) that is critical in creating
relations. All the subsequent normal forms are optional.

15

The Process of Normalization


• Usually three steps (in industry) giving rise to
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
• In academia
– Boyce -Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
• At each step we consider relationships between
an entity's attributes
– These relationships are known as functional
dependencies
16

8
Data Normalization
UNORMALISED ENTITY

step1 ... remove repeating groups

1st NORMAL FORM

step2 ... remove partial dependencies

2nd NORMAL FORM

step3 ... remove indirect dependencies

3rd NORMAL FORM

step4 ... remove multi-dependencies step4 ..every determinate a key

4th NORMAL FORM BOYCE-CODD NORMAL FORM


17

Example
ORDER NUMBER 1023

SUPPLIER NUMBER 500028

ORDER DATE 09/05/88

DELIVERY DATE 25/07/88

PART NO. PART-DESC QTY-ORD PRICE

O463 Hook 150 15.00

1492 Bolt 1000 10.00

3164 Spanner 10 5.00


TOTAL 30.00

PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE


DELIVERY-DATE, (PART#, PART-
DESCRIPTION,QUANTITY-ORDERED, PRICE), TOTAL-
PRICE)
18

9
First Normal Form
• An entity type is in 1NF if there are no repeating
groups of attribute types
• Any un-normalised entity type is transformed to
1NF
– Remove all repeating attribute groups
– Repeating attribute groups become new entity
types in their own right
– The identifier of the original entity type must
be an attribute (but not necessarily an
identifier) of the derived entity type.
19

Example of First Normal Form


ORDER NUMBER 1023
SUPPLIER NUMBER 500028

ORDER DATE 09/05/88

DELIVERY DATE 25/07/88

PART NO. PART-DESC QTY-ORD PRICE

O463 Hook 150 15.00


1492 Bolt 1000 10.00

3164 Spanner 10 5.00


TOTAL 30.00

UN-NORMALIZED ENTITY TYPE


PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE
DELIVERY-DATE, (PART#, PART-DESCRIPTION,
QUANTITY-ORDERED, PRICE), TOTAL-PRICE)
20

10
Example in 1NF
ORDER NUMBER 1023
SUPPLIER NUMBER 500028

ORDER DATE 09/05/88

DELIVERY DATE 25/07/88

PART NO. PART-DESC QTY-ORD PRICE

O463 Hook 150 15.00


1492 Bolt 1000 10.00
3164 Spanner 10 5.00
TOTAL 30.00

ENTITY TYPES IN 1NF


PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE
DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 ( ORDER#, PART#, PART-DESCRIPTION,
QUANTITY-ORDERED, PRICE)

[NOTE: PART# ALONE DOES NOTE IDENTIFY PURCHASE-ITEM]

21

Example
REGISTRATION FORM

STUDENT NUMBER S0843215

STUDENT NAME P. Smith

STUDENT ADDRESS 1, South Downs


Hale

COURSE NO COURSE TUTOR NAME TUTOR NO


PM951 Computing T. Long 037428

S212 Biology S. Short 096524

STUDENT (Student#, student-name, student-address)

ENROLMENT (Student#, Course#, course-title,


tutor-name,tutor-staff#)

22

11
Benefits from 1ST Normal Form
• Any 'hidden' entities are identified
• Process results in separation of different objects
• BUT anomalies may still exist
PURCHASE-ITEM-1( ORDER#, PART#, PART-
DESCRIPTION,QUANTITY-ORDERED, PRICE)
– PART-DESCRIPTION appears on every
PURCHASE-ITEM occurence.
– This may result in anomalies when updating or
deleting records
– The problem in the example is that PART-
DESCRIPTION is functionally dependent only on
PART# (part of the identifier)
23

Second Normal Form

• An enity type is in 2NF if it is in 1NF and


each non identifying attribute depends
upon the whole identifier
• Any enity type in 1NF is transformed to
2NF
– Identify functional dependencies
– Re-write entity types so that each non-
identifying attribute is functionally
dependent on the whole of the identifier
24

12
Example
ORDER NUMBER 1023
SUPPLIER NUMBER 500028

ORDER DATE 09/05/88

DELIVERY DATE 25/07/88

PART NO. PART-DESC QTY-ORD PRICE

O463 Hook 150 15.00


1492 Bolt 1000 10.00
3164 Spanner 10 5.00
TOTAL 30.00

ENTITY TYPES IN 1NF


PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE
DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 ( ORDER#, PART#, PART-DESCRIPTION,
QUANTITY-ORDERED, PRICE)

25

Functional Dependencies
PURCHASE-ORDER (ORDER#, SUPPLIER#, ORDER-DATE
DELIVERY-DATE, TOTAL-PRICE)
PURCHASE-ITEM-1 ( ORDER#, PART#, PART-DESCRIPTION,
QUANTITY-ORDERED, PRICE)

PRICE

ORDER#
QUANTITY-
ORDERED
PART#

PART-
DESCRIPTION

26

13
In 2nd Normal Form

• Decompose PURCHASE-ITEM into two entity types


PURCHASE-ITEM (Order#, Part#, Quantity-Ordered,
Price)
PART (Part#, Part-Description)
• Original enity type decomposed into three entity types in
2nd normal form
PURCHASE-ORDER (Order#,Supplier#, Order-Date,
Delivery-Date, Total-Price)
PURCHASE-ITEM (Order#, Part#,Quantity-Ordered, Price)
PART (Part#, Part-Description)

27

Example in 2NF
REGISTRATION FORM

STUDENT NUMBER S0843215

STUDENT NAME P. Smith

STUDENT ADDRESS 1, South Downs


Hale

COURSE NO COURSE TUTOR NAME TUTOR NO


PM951 Computing T. Long 037428

S212 Biology S. Short 096524

ENTITY TYPES IN 2NF


STUDENT (Student#,Student-Name, Student-Adderss)
ENROLMENT ( Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#, Course-Title)
28

14
Third normal Form

• An enity type is in 3NF if it is in 2NF and


all non identifying attributes are
independent
• Any enity type in 2NF is transformed in
3NF
– Determine functional dependencies between
non identifying attributes
– Decompose enity into new entities

29

Example
REGISTRATION FORM

STUDENT NUMBER S0843215

STUDENT NAME P. Smith

STUDENT ADDRESS 1, South Downs


Hale

COURSE NO COURSE TUTOR NAME TUTOR NO

PM951 Computing T. Long 037428

S212 Biology S. Short 096524

ENTITY TYPES IN 2NF


STUDENT (Student#,Student-Name, Student-Adderss)
ENROLMENT ( Student#, Course#, Tutor-Name, Tutor-Staff#)
COURSE (Course#,, Course-Title)
30

15
Functional Dependencies

STUDENT (Student#,Student-Name, Student-Adderss)


ENROLMENT ( Student#, Course#, Tutor-Name,
Tutor-Staff#)
COURSE (Course#,, Course-Title)

Tutor-staff#

Student#

Course#

Tutor-name

31

Example in 3NF
REGISTRATION FORM

STUDENT NUMBER S0843215

STUDENT NAME P. Smith

STUDENT ADDRESS 1, South Downs


Hale

COURSE NO COURSE TUTOR NAME TUTOR NO


PM951 Computing T. Long 037428

S212 Biology S. Short 096524

ENTITY TYPES IN 3NF


STUDENT (Student#,Student-Name, Student-
Adderss)
ENROLMENT ( Student#, Course#, Tutor-Staff#)
COURSE (Course#,, Course-Title)
TUTOR (Tutor-Staff#, Tutor-Name)
32

16
Boyce-Codd Normal Form (BCNF)

• A relation is in BCNF if every determinate is a


candidate key
• For a relation with only one candidate key , 3NF
and BCNF are equivalent
• Violation of BCNF is rare, may occur in a
relation that :
contains two (or more) composite candidate
keys and which overlap, that is share at least one
attribute in common

33

CLIENT_INTERVIEW BCNF
Interview Interview
Client_no Date Time Staff_no Room_no

CR76 13-May-95 10.30 SG G101


5
CR56 13-May-95 12.00 SG5 G101

CR74 13-May-95 12.00 SG37 G102

CR56 10-Jun-95 10.00 SG5 G102

The following FDs hold :


Client_No,Interview_Date ->Interview_time,Staff_no,Room_no
Staff_no,Interview_Date,Interview_time -> Client_no
Staff_no,Interview_date -> Room_no

Client_no,Interview_date and Staff_no,Interview_date are


composite candidate keys that share the common attribute
Interview_date 34

17
BCNF
The relation CLIENT_INTERVIEW is in 3NF but not BCNF

To transform to BCNF:
Remove the violating FD and create two relations:

INTERVIEW (Client_no, Interview_date, Interview_time, Staff_no


STAFF_ROOM (Staff_no,Interview_date,Room_no)

35

Fourth Normal Form

• An entity type is in 4NF if it is in 3NF and


there are no multivalued dependencies
between its attribute types
• Any entity type in 3NF is transformed to
4NF
– Detect any multivalued dependencies
– Decompose entity type

36

18
Multivalued Dependencies - 1
AUTHOR_NO BOOK_NO SUBJECT BOOK_TITLE AUTHOR_NAME

A1 B1 Comp. Sc. Methods Jones

A1 B1 Maths Method Jones

A2 B1 Comp. Sc. Methods Smith

A2 B1 Maths Methods Smith

A3 B2 Maths Calculus Brown

IN 3rd NORMAL FORM


author_name
author_no AUTHOR (Author_no, Author-name)
book_no BOOK (Book_no, Book-_title)
book_title
AUTHOR-BOOK-SUBJECT
subject
(Author_no, Book_no, Subject)

37

Multivalued Dependencies - 2
• Example models that "each AUTHOR is associated
with all the SUBJECTS under which the BOOK is
classified"
• The attribute SUBJECT contains redundant values. If
SUBJECT were deleted from rows 1 & 2 the values
could be deduced from rows 3 & 4
Anomaly because the same set
AUTHOR_NO BOOK_NO SUBJECT of SUBJECT is associated with
A1 B1 Comp. Sc. each AUTHOR of the same
BOOK
A1 B1 Maths

multidetermines
A2 B1 Comp. Sc.
BOOK_NO AUTHOR_NO

A2 B1 Maths

A3 B2 Maths BOOK_NO SUBJECT

38

19
Fourth Normal Form
AUTHOR_NO BOOK_NO SUBJECT
IN 4th NORMAL FORM
A1 B1 Comp. Sc.
AUTHOR (Author_no,
A1 B1 Maths Author_name)
A2 B1 Comp. Sc. BOOK (Book_no, Book_Title)
A2 B1 Maths AUTHOR-BOOK (Author_no,
A3 B2 Maths Book_no)
BOOK-SUBJECT (Book_no,
Subject)

AUTHOR_NO BOOK_NO BOOK_NO SUBJECT

A1 B1 B1 Comp. Sc.

A2 B1 B1 Maths

A3 B2 B2 Maths
39

Conclusions
• Data Normalization is a bottom-up technique
that ensures the basic properties of the
relational model
– no duplicate tuples
– no nested relations
• Data normalisation is often used as the only
technique for database design - implementation
view
• A more appropriate approach is to complement
conceptual modelling with data normalization
40

20

You might also like