Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 27

Normalization

Prepared by Caselyn F. Silvestre


The Need for Normalization
Normalization
We discuss four normal forms: first, second, third, and
Boyce-Codd normal forms
1NF, 2NF, 3NF, and BCNF

Normalization is a process that “improves” a database


design by generating relations that are of higher normal
forms.

The objective of normalization:


“to create relations where every dependency is on the key,
the whole key, and nothing but the key”.
Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest
\
Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
The Normalization Process
• 1. Each table represents a single subject.
• Ex. A course table will contain only data that directly pertain to courses.
Similarity, a student table will contain only student data.
• 2. NO data item will be unnecessarily stored in more than one table.
This requirement is to ensure the that data are updated in only one
place.
• 3. All nonprime attributes are dependent on the primary key, the
entire primary key and nothing but the primary key to have uniquely
identify each value.
• 4. Each table is void, update or deletion anomalies, which ensure
integrity and consistency of the data.
Insert report layout here
Normal Forms
NORMAL FORM CHARACTERISTICS
First Normal form (1NF) Table format, no repeating groups, and PK identified
Second Normal form (2NF) 1NF and no partial dependencies
Third Normal form (3NF) 2NF and no transitive dependencies
Boyce-Codd Normal form (BCNF) Every determinant is a candidate key (special case of 3NF)
Fourth normal form (4NF) 3NF and no independent multivalued dependencies
Functional dependencies

EmpNum is the determinant attribute and EmpEmail is the dependent attribute


EmpNum  EmpEmail
EmpNum  EmpFname
EmpNum  EmpLname

EmpEmail
EmpNum EmpFname

EmpLname

EmpNum EmpEmail EmpFname EmpLname

91.2914 8
Transitive dependency
DeptName is transitively dependent on EmpNum via DeptNum
EmpNum  DeptName

EmpNum  DeptNum

EmpNum EmpEmail DeptNum DeptNname

DeptNum  DeptName

EmpNum EmpEmail DeptNum DeptNname


CONVERSION TO FIRST NORMAL FORM
CRITERIA 1: Eliminate the Repeating Groups by eliminating the NULLs
PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS CHG_HO HOURS

15
Table in 1st Normal Form
Evergreen 103 June E. Arbough Elect. Engineer
UR
84.50 23.8
15 Evergreen 101 John G. News Database Designer 105.00 19.4
15 Evergreen 105 Alice K. Johnson Database Designer 105.00 35.7
15 Evergreen 106 William Smithfield Programmer 35.75 12.8
15 Evergreen 102 David H. Senior System Analyst 96.75 23.8
18 Amber Wave 114 Annelise Jones Applications Designer 48.10 24.6
18 Amber Wave 118 James J. Frommer General Support 18.36 45.3
18 Amber Wave 104 Anne K. Ramoras System Analyst 96.75 32.4
18 Amber Wave 112 Darlene M. Smithson DSS Analyst 45.95 44.0
22 Rolling Tide 105 Alice K. Johnson Database designer 105.00 35.7
22 Rolling Tide 104 Anne K. Ramoras System Analyst 96.75 23.8
22 Rolling Tide 113 Delbert K. Joenbrood Applications Designer 48.10 24.6
22 Rolling Tide 111 Geoff B. Wabash Clerical Support 26.87 22.0
22 Rolling Tide 106 William Smithfield Programmer 35.75 24.6
25 Starflight 107 Maria D. Alonzo Programmer 35.75 24.6
25 Starflight 115 Travis B. Bawangi System Anlayst 96.75 45.8
25 Starflight 101 John G. News Database Designer 105.00 56.3
25 Starflight 114 Annelise Jones Applications Designer 48.10 33.1
25 Starflight 108 Raplh B. Washington System Analyst 96.75 23.6
CONVERSION TO FIRST NORMAL FORM
CRITERIA 2: Identify the Primary Key
PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS CHG_HO HOURS

15
Table in 1st Normal Form
Evergreen 103 June E. Arbough Elect. Engineer
UR
84.50 23.8
15 Evergreen 101 John G. News Database Designer 105.00 19.4
15 Evergreen 105 Alice K. Johnson Database Designer 105.00 35.7
15 Evergreen 106 William Smithfield Programmer 35.75 12.8
15 Evergreen 102 David H. Senior System Analyst 96.75 23.8
18 Amber Wave 114 Annelise Jones Applications Designer 48.10 24.6
18 Amber Wave 118 James J. Frommer General Support 18.36 45.3
18 Amber Wave 104 Anne K. Ramoras System Analyst 96.75 32.4
18 Amber Wave 112 Darlene M. Smithson DSS Analyst 45.95 44.0
22 Rolling Tide 105 Alice K. Johnson Database designer 105.00 35.7
22 Rolling Tide 104 Anne K. Ramoras System Analyst 96.75 23.8
22 Rolling Tide 113 Delbert K. Joenbrood Applications Designer 48.10 24.6
22 Rolling Tide 111 Geoff B. Wabash Clerical Support 26.87 22.0
22 Rolling Tide 106 William Smithfield Programmer 35.75 24.6
25 Starflight 107 Maria D. Alonzo Programmer 35.75 24.6
25 Starflight 115 Travis B. Bawangi System Anlayst 96.75 45.8
25 Starflight 101 John G. News Database Designer 105.00 56.3
25 Starflight 114 Annelise Jones Applications Designer 48.10 33.1
25 Starflight 108 Raplh B. Washington System Analyst 96.75 23.6
CONVERSION TO FIRST NORMAL FORM
CRITERIA 3: Identify All Dependencies
FIRST NORMAL FORM DEPENDENCY DIAGRAM

PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR HOURS

Transitive dependency
Partial dependency Partial dependency
1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOURS, HOURS)
PARTIAL DEPENDENCIES
(PROJ_NUM PROJ_NAME) PROJ_NUM is the determinant; PROJ_NAME is dependent attri..
(EMP_NUM EMP_NAME, JOB_CLASS, CHG_HOUR)
TRANSITIVE DEPENDENCIES
(JOB_CLASS CHG_HOUR)
SECOND NORMAL FORM CONVERSTION RESULT
Table name: PROJECT
PROJECT (PROJ_NUM, PROJ_NAME)

PROJ_NUM PROJ_NAME

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)


Table name: EMPLOYEE TRANSITIVE DEPENDENCY (JOB_CLASS CHG_HOUR)

EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR

Transitive dependency
Table name: ASSIGNMENT

PROJ_NUM EMP_NUM ASSING_HOURS ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)

CRITERIA 1: Make new table to eliminate Partial dependencies


CRITERIA 2: Reassign corresponding dependent attributes
THIRD NORMAL FORM CONVERSTION RESULT

PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS

Table name: PROJECT Table name: EMPLOYEE


PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)

PROJ_NUM EMP_NUM ASSIGN_HOURS JOB_CLASS CHG_HOUR


Table name: ASSIGNMENT Table name: JOB
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS) JOB (JOB_CLASS, CHG_HOUR)

CRITERIA 1: Make new table to eliminate Transitive dependencies


CRITERIA 2: Reassign corresponding dependent attributes
IMPROVING DATABASE DESIGN
1. Evaluate PK Assignments
Each time a new employee is entered into the
EMPLOYEE table, a JOB_CLASS value must be
entered. Unfortunately, if we will enter DB
Designer and Database Designer it triggers
violation.
Solution:
JOB_CODE  JOB_CLASS, CHG_HOUR
2. Evaluate Naming Conventions
Therefore, CHG_HOUR will be changed to
JOB_CHG_HOUR to indicate association to JOB table.

JOB_CLASS  JOB_DESCRIPTION in the JOB table


HOURS  ASSIGN_HOURS in the ASSIGNMENT table
3. Refine Attribute Atomicity
Easily generate report by sorting
EX. EMP_NAME  EMP_FNAME, EMP_LNAME, EMP_INITIAL

4. Identifying New Attributes


If the EMPLOYEE table were used in real-world environment,
other attributes would have be added.
EX. EMP_HIREDATE, EMP_MEDICAREPAYMENT, EMP_SALARYPAYMENT
5. Identifying New Relationship
According to the original report, the user needs to track
which employee is acting as the manager of each project.
This can only be implemented as a relationship between
EMPLOYEE and PROJECT.
Each PROJECT has only one MANAGER.
Thus ensure the EMP_NUM as the foreign key in PROJECT.
This action ensure that access PROJECT’s manager data
without producing unnecessary and undesirable duplication.
6. Refine Primary Key as Required for Data Granularity
Granularity – refers to the level of detail
represented by the values stored in a table’s row.
EX.
The ASSIGNMENT table, attribute ASSIGN_HOURS
can be stored at their lowest granularity such as
hourly total, daily total, weekly total, monthly total or
yearly total.
For what time frame? Hour, day, week,?
7. Maintain Historical Accuracy –
Writing the job charge per hour into the ASSIGNMENT table
is crucial to maintaining the historical accuracy of the table’s
data.
It would be appropriate to name the attribute
ASSIGN_CHG_HOUR rather than JOB_CHG_HOUR because the
value of job charge per hour will change over time.
8. Evaluate Using Derived Attributes –
You can derived attribute in the ASSIGNMENT table to
store the actual charge made to a project.
The derived attribute is ASSIGN_CHARGE, is the result of
multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR.
This creates a transitive dependency such that:
(ASSIGN_CHARGE + ASSIGN_HOURS)  ASSIGN_CHG_HOUR
The Complete Database
PROJ_NUM PROJ_NAME EMP_NUM

PROJ_NUM PROJ_NAME EMP_NUM


15 Evergreen 105
18 Amber Wave 104
22 Rolling Tide 113
25 Starflight 101

Table name: PROJECT


JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR

JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR


500 Programmer 35.75
501 Systems Analyst 96.75
502 Database Designer 105.00
503 Electrical Engineer 84.50
504 Mechanical Engineer 67.90
505 Civil Engineer 55.78
506 Clerical Support 26.87
507 DSS Analyst 45.95
508 Applications Designer 48.10
509 Bio Technician 34.55
510 General Support 18.36

Table name: JOB


ASSIGN_NUM ASSIGN_DATE PROJ_NUM EMP_NUM ASSIGN_HOURS ASSIGN_CHG_HOUR ASSIGN_CHARGE

ASSIGN_NUM ASSIGN_DATE PROJ_NUM EMP_NUM ASSIGN_HOURS ASSIGN_CHG_HOUR ASSIGN_CHARGE


1001 04-Mar-12 15 103 2.6 84.50 219.70
1002 04-Mar-12 18 118 1.4 18.36 25.70
1003 05-Mar-12 15 101 3.6 105.00 378.00
1004 05-Mar-12 22 113 2.5 48.10 120.25
1005 05-Mar-12 15 103 1.9 84.50 160.55
1006 05-Mar-12 25 115 4.2 96.75 406.35
1007 05-Mar-12 22 105 5.2 105.00 546.00
1008 05-Mar-12 25 101 1.7 105.00 178.50
1009 05-Mar-12 15 105 2.0 105.00 210.00
1010 06-Mar-12 15 102 3.8 96.75 367.65
1011 06-Mar-12 22 104 2.6 96.75 251.55
1012 06-Mar-12 15 101 2.3 105.00 241.50
1013 06-Mar-12 25 114 1.8 48.10 86.58

Table name: ASSIGNMENT


EMP_NUM EMP_LNAME EMP_FNAME EMP_INITIAL EMP_HIREDATE JOB_CODE

EMP_NUM EMP_LNAME EMP_FNAME EMP_INITIAL EMP_HIREDATE JOB_CODE

101 News John G 08-Nov-00 502

102 Senior David H 12-Jul-89 501

103 Arbough June E 01-Dec-97 503

104 Ramoras Anne K 15-Nov-88 501

105 Johnson Alice K 01-Feb-94 502

106 Smithfield William L 22-Jun-05 500

107 Olenko Gerald A 18-Jul-99 500

108 Wabash Geoff B 11-Dec-96 501

109 Smithson Darlen M 04-Apr-89 501

Table name: EMPLOYEE


Surrogate Key Considerations

When primary key is considered to be unsuitable, designers use surrogate keys


Data entries in Table 5.3 are inappropriate because they duplicate existing records
-Yet there has been no violation of either entity integrity or referential integrity
***END***

You might also like