
Normalization

Dr. Shyamala Sivakumar, Ph.D., P.Eng., SMIEEE


Normalization is a process

 Learn the distinctions between 1st, 2nd, and 3rd normal forms
 Learn to diagram and resolve functional dependencies
 Data → 1NF
 1NF → 2NF
 2NF → 3NF
 Identify and explain 3 types of anomalies
 Insertion
 Deletion
 Update
Data Normalization
 Decomposing relations with anomalies (problems) into smaller,
well-structured relations
 Validates and improves logical design to avoid unnecessary
duplication of data
 Good data modeling makes formal normalization easier
 You may find that your expertly designed tables require no
changes to remove anomalies, because you did such a good
job at improving your design through enterprise,
conceptual, and logical ERDs.
 Use normalization procedures to check your work and to
improve the data models provided to you by other
designers.
First Normal Form (1NF)

 Unique field names
 Unique rows (requires a complete key)
 No multi-valued attributes
 Every attribute value is atomic (no composite attributes)
 Order of rows and columns is irrelevant
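Some of these checks can be automated. The following is a minimal sketch, assuming a table is held as a list of Python dicts; the function name and the sample order rows are hypothetical. It can only spot multi-valued cells that are stored as collections. Whether a text value such as an address is really composite still depends on understanding what the field means.

```python
def first_normal_form_problems(field_names, rows):
    """Return a list of 1NF problems found in a table given as a list of dicts."""
    problems = []
    if len(set(field_names)) != len(field_names):
        problems.append("field names are not unique")
    # repr() makes list-valued cells hashable so duplicate rows can be detected.
    seen_rows = [tuple(repr(row[name]) for name in field_names) for row in rows]
    if len(set(seen_rows)) != len(seen_rows):
        problems.append("rows are not unique")
    for name in dict.fromkeys(field_names):
        if any(isinstance(row[name], (list, tuple, set, dict)) for row in rows):
            problems.append(f"multi-valued field: {name}")
    return problems


# Hypothetical data: one order row that packs two products into a single cell.
fields = ["Order_ID", "Product_ID"]
unnormalized = [{"Order_ID": 1, "Product_ID": [10, 20]}]
flattened = [{"Order_ID": 1, "Product_ID": 10}, {"Order_ID": 1, "Product_ID": 20}]

print(first_normal_form_problems(fields, unnormalized))  # ['multi-valued field: Product_ID']
print(first_normal_form_problems(fields, flattened))     # []
```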
Table with Multi-Valued Attributes: Not in 1NF

Note: this is NOT a relation


Discussion
 Why isn’t this a relation?
 What is the primary key? Order_ID and Product_ID.
 The 2 “records” represented above have a problem with
repeating values (see that there are 3 different values for
Product_ID, 3 different values for Product_Description,
etc.).
 Also, Customer_Address is a composite field. We can
assume that the only parts of the address to be recorded
are city and state.
First Normal Form (1NF)
 Unnormalized table
 Contains a repeating group
 Table in 1NF
 Contains no repeating groups
 Removal of repeating groups is the starting point in the quest for problem-free tables
Now it is in 1NF

Note: this is a relation, but not a particularly well-structured one


Discussion
 To satisfy 1NF requirements, we duplicate the information contained in the first 5 columns
for those rows missing values.
 Note: Customer_Address should be split into two fields (Customer_City and Customer_State).
This is an oversight in the graphic.
 Represented in DBDL form as:
 INVOICE (Order_ID, Order_Date, Customer_ID, Customer_Name, Customer_City,
Customer_State, Product_ID, Product_Description, Product_Finish, Unit_Price,
Ordered_Quantity)
 But there are still issues (anomalies) in this relation:
 Insertion–if new product is ordered for order 1007 of existing customer, customer data
must be re-entered, causing excessive duplication
 Deletion–if we delete the Dining Table from Order 1006, we lose information concerning
this item's finish and price
 Update–changing the price of product ID 4 requires update in several records
 Why do these anomalies exist?
 Because there are multiple themes (entity types) in one relation. This results in duplication and an
unnecessary dependency between the entities
Example: Is this relation in 1NF?

Are the field names unique?


Are the rows unique (is there a primary key)?
Are there any composite fields?
Are there any multi-valued fields?
Is the order of the rows or columns important for the meaning of the
data?
Discussion
 This list represents data about employees who have taken professional training courses through their
employer.
 What’s the primary key?
 Answer–If employee can only take a course once, then it is composite: Emp_ID, Course_Title. But one employee
doesn’t have a Course_Title. We know that PK fields are always required.
 What happens if--
 Insertion–can’t enter a new employee without having the employee take a class; can’t add
a new class without having a student enrolled
 Deletion–if we remove employee 140, we lose information about the existence of a Tax Acc
class
 Modification–giving a salary increase to employee 100 forces us to update multiple records
 These are called anomalies: problems with the structure of a relation
 Why do these anomalies exist?
 Because there are two themes (entity classes) in this one relation. This results in data duplication and an
unnecessary dependency between the entities
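The deletion and modification anomalies can be reproduced on a few rows. This is a minimal sketch: the field names and the employee IDs 100 and 140 come from the discussion above, while the salary figures and the two course titles for employee 100 are hypothetical stand-ins for the missing graphic.

```python
# Hypothetical rows; salaries and employee 100's course titles are made up.
rows = [
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "SPSS"},
    {"Emp_ID": 100, "Salary": 48000, "Course_Title": "Surveys"},
    {"Emp_ID": 140, "Salary": 52000, "Course_Title": "Tax Acc"},
]


def courses(table):
    """All course titles that the table still knows about."""
    return {row["Course_Title"] for row in table}


def salaries_by_employee(table):
    """Map each Emp_ID to the set of salaries recorded for that employee."""
    result = {}
    for row in table:
        result.setdefault(row["Emp_ID"], set()).add(row["Salary"])
    return result


# Deletion anomaly: removing employee 140 also removes the only trace of "Tax Acc".
after_delete = [row for row in rows if row["Emp_ID"] != 140]
print("Tax Acc" in courses(after_delete))  # False

# Modification anomaly: a raise applied to only one of employee 100's rows
# leaves two different salaries on record for the same person.
partially_updated = [dict(row) for row in rows]
partially_updated[0]["Salary"] = 50000
print(sorted(salaries_by_employee(partially_updated)[100]))  # [48000, 50000]
```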
Second Normal Form (2NF)
 1NF Tables may contain problems
 Redundancy
 Update Anomalies
 Update, inconsistent data, additions, deletions
 Occur because a column is dependent on a
portion of a multi-column primary key
 2NF Table
 In 1NF and every nonkey column is functionally
dependent on the entire primary key (more on the
next slide)
Second Normal Form
 1NF plus every non-key attribute is functionally
dependent on the ENTIRE primary key
 No partial dependencies
 Hint: If there is no composite key, the relation is
automatically in 2NF!
 “…the whole key…”
 When you diagram all dependency sets in the relation,
there should be no partial key determinants
 remember the determinant is on the left side of the
arrow.
 2NF is only in doubt if there is a composite key in the
relation.
Functional Dependencies

 Functional Dependency: The value of one attribute
(the determinant) allows us to identify the value of
other attribute(s)
 Diagram dependencies:
Determinant field(s) → list of dependent field(s)
Field A → Field B, Field C
This means:
If I know the value of Field A, I have enough
information to find the values of Fields B & C
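The definition can be tested against sample data. Below is a minimal sketch with a hypothetical helper, `fd_holds`, and made-up rows for Fields A, B, and C. Sample rows can only disprove a dependency; confirming one still requires knowing what the fields mean.

```python
def fd_holds(rows, determinant, dependents):
    """Return True if determinant -> dependents is not contradicted by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[field] for field in determinant)
        value = tuple(row[field] for field in dependents)
        # The first value seen for a key is remembered; any later mismatch
        # means the same determinant value maps to two different results.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "y", "C": 10},
]

print(fd_holds(rows, ["A"], ["B", "C"]))  # True: each A value has one (B, C) pair
print(fd_holds(rows, ["C"], ["A"]))       # False: C = 10 appears with A = 1 and A = 2
```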
Discussion
 The determinant is a field(s) that uniquely identifies the value of another field. You
must understand the definition and content of the fields to accurately normalize
the design.
 Refer to the relation on slide 10 (Employee2).
 If you find the Emp_ID in one column you can look in that row to find the
employee’s name, their department, and their salary.
 The Emp_ID is unique to the employee and in every row for that employee, they
have the same department and salary.
 Note that Emp_ID does NOT determine the value of Course_Title or Date-Completed,
because the values of these fields may change on different rows for the same Emp_ID.
 That is because the employee may take different courses and complete them on
different dates.
Update Anomalies
 Update
 Information is in multiple rows, difficult to update
 Inconsistent data
 Because of the duplication, a row that is not
updated causes inconsistency
 Additions
 Dummy records are required to add new unused
dependent rows
 Deletions
 Nonkey column (nonkey attribute) – when a
column is not a part of the primary key
Dependency Diagram

 Dependency diagram – uses arrows to indicate
all the functional dependencies present in a
table

 Partial dependencies – dependencies only on a
portion of the primary key
New Example: Partial Dependencies?

ORDER (Order_ID, Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State, Product_ID, Product_Description,
Product_Finish, Unit_Price, Ordered_Quantity)

Removing Partial Dependencies


1. Move partial dependency sets to their own relation,
with the determinant as the primary key
2. Leave a copy of the determinant behind as a foreign
key
3. Rename all relations to be unique and meaningful
Partial dependencies in ORDER?
ORDER (Order_ID, Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State, Product_ID, Product_Description,
Product_Finish, Unit_Price, Ordered_Quantity)

Diagram Partial Dependencies

Start with 1 field from the composite key

Order_ID ➔ Order_Date, Customer_ID, Customer_Name, Customer_City,
Customer_State

Partial dependencies in ORDER?
ORDER (Order_ID, Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State, Product_ID, Product_Description,
Product_Finish, Unit_Price, Ordered_Quantity)

Diagram Partial Dependencies

Then the other field from the composite key

Order_ID ➔ Order_Date, Customer_ID, Customer_Name, Customer_City,
Customer_State

Product_ID ➔ Product_Description, Product_Finish, Unit_Price
Partial dependencies in ORDER?

ORDER (Order_ID (FK), Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State, Product_ID, Product_Description,
Product_Finish, Unit_Price, Ordered_Quantity)

Convert Partial Dependency into its own table &
leave new PK behind as a FK (repeat)

ORDER (Order_ID, Order_Date, Customer_ID,
Customer_Name, Customer_City, Customer_State)

Partial dependencies in ORDER?

ORDER (Order_ID (FK), Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State, Product_ID (FK), Product_Description,
Product_Finish, Unit_Price, Ordered_Quantity)

Create New Tables from Partial Dependencies

ORDER (Order_ID, Order_Date, Customer_ID,
Customer_Name, Customer_City, Customer_State)

PRODUCT (Product_ID, Product_Description,
Product_Finish, Unit_Price)
Partial dependencies in ORDER?

ORDER (Order_ID (FK), Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State, Product_ID (FK), Product_Description,
Product_Finish, Unit_Price, Ordered_Quantity)

Rename Tables to be Meaningful

ORDER (Order_ID, Order_Date, Customer_ID,
Customer_Name, Customer_City, Customer_State)
PRODUCT (Product_ID, Product_Description,
Product_Finish, Unit_Price)
OD_DETAIL (Order_ID (FK), Product_ID (FK),
Ordered_Quantity)
Functional dependencies
 Identify the functional dependencies (if you know the full primary key, you know
that all of the rest of the fields will be dependent on it)
 Start here: Order_ID, Product_ID ➔ all fields
 Order_ID ➔ Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State
 Product_ID ➔ Product_Description, Product_Finish, Unit_Price
 Therefore, NOT in 2nd Normal Form
 New relations:
 ORDER (Order_ID, Order_Date, Customer_ID, Customer_Name,
Customer_City, Customer_State)
 PRODUCT (Product_ID, Product_Description, Product_Finish,
Unit_Price)
 OD_DETAIL (Order_ID (FK), Product_ID (FK), Ordered_Quantity)
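The three steps for removing partial dependencies can be applied mechanically once the dependencies are written down. The sketch below is a hypothetical helper (`decompose_to_2nf` is not part of any library) that works on attribute names only; the attributes and dependencies are the ones listed above for ORDER. It assumes the listed dependencies are complete and correct.

```python
def decompose_to_2nf(attributes, primary_key, fds):
    """Split a 1NF relation into 2NF relations by removing partial dependencies.

    attributes:  ordered list of all attribute names in the relation
    primary_key: tuple of the key attribute names
    fds:         list of (determinant, dependents) pairs
    Returns a dict mapping each new primary key to that relation's attributes.
    """
    key = set(primary_key)
    relations = {}
    moved = set()
    for determinant, dependents in fds:
        if set(determinant) < key:  # proper subset of the key: partial dependency
            # Step 1: the dependency set becomes its own relation.
            relations[tuple(determinant)] = list(determinant) + list(dependents)
            moved.update(dependents)
    # Step 2: the determinants stay behind (as foreign keys) with what is left.
    relations[tuple(primary_key)] = [a for a in attributes if a not in moved]
    return relations


order_attributes = [
    "Order_ID", "Order_Date", "Customer_ID", "Customer_Name", "Customer_City",
    "Customer_State", "Product_ID", "Product_Description", "Product_Finish",
    "Unit_Price", "Ordered_Quantity",
]
order_fds = [
    (("Order_ID",), ("Order_Date", "Customer_ID", "Customer_Name",
                     "Customer_City", "Customer_State")),
    (("Product_ID",), ("Product_Description", "Product_Finish", "Unit_Price")),
]

relations = decompose_to_2nf(order_attributes, ("Order_ID", "Product_ID"), order_fds)
for key, attrs in relations.items():
    print(key, attrs)
```

Step 3, choosing meaningful names for the new relations, is left to the designer.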
Relations in 2NF

ORDER (Order_ID, Order_Date, Customer_ID,
Customer_Name, Customer_City, Customer_State)

PRODUCT (Product_ID, Product_Description,
Product_Finish, Unit_Price)

OD_DETAIL (Order_ID (FK), Product_ID (FK),
Ordered_Quantity)

Third Normal Form (3NF)

 2NF Tables may still contain problems
 Redundancy and wasted space
 Update Anomalies
 Update, inconsistent data, additions, deletions
 Occur because a nonkey column is dependent on
another nonkey column (a transitive dependency)
 3NF Table
 In 2NF and the only determinants contained are
candidate keys
Relations in 3NF
 2NF plus every non-key attribute is functionally
dependent only on the ENTIRE primary key
 No transitive dependencies (a transitive dependency means a
non-key field is determined by another non-key field)
1. Move transitive dependency sets to their own
relation, with the determinant as the primary key
2. Leave a copy of the determinant behind as a foreign
key
3. Rename all relations to be unique and meaningful
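Relations in 3NF

ORDER (Order_ID, Order_Date, Customer_ID,
Customer_Name, Customer_City, Customer_State)
Transitive dependency: Customer_Name, Customer_City, and Customer_State
depend on Customer_ID

After moving the transitive dependency to its own relation:

ORDER (Order_ID, Order_Date, Customer_ID (FK))

CUSTOMER (Customer_ID, Customer_Name, Customer_City, Customer_State)

PRODUCT (Product_ID, Product_Description,
Product_Finish, Unit_Price)

OD_DETAIL (Order_ID (FK), Product_ID (FK),
Ordered_Quantity)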
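The same mechanical approach works for transitive dependencies. This is a minimal sketch with a hypothetical helper, `decompose_to_3nf`, applied to the 2NF ORDER relation. It assumes the address is stored as Customer_City and Customer_State, as in the partial-dependency walk-through, and that Customer_ID is the only non-key determinant.

```python
def decompose_to_3nf(attributes, primary_key, fds):
    """Split a 2NF relation into 3NF relations by removing transitive dependencies."""
    key = set(primary_key)
    relations = {}
    moved = set()
    for determinant, dependents in fds:
        if key.isdisjoint(determinant):  # determinant consists of non-key fields
            # The dependency set moves out, keyed by its determinant.
            relations[tuple(determinant)] = list(determinant) + list(dependents)
            moved.update(dependents)
    # The determinant itself stays behind as a foreign key.
    relations[tuple(primary_key)] = [a for a in attributes if a not in moved]
    return relations


order_2nf = ["Order_ID", "Order_Date", "Customer_ID",
             "Customer_Name", "Customer_City", "Customer_State"]
transitive_fds = [
    (("Customer_ID",), ("Customer_Name", "Customer_City", "Customer_State")),
]

result = decompose_to_3nf(order_2nf, ("Order_ID",), transitive_fds)
print(result[("Order_ID",)])     # ['Order_ID', 'Order_Date', 'Customer_ID']
print(result[("Customer_ID",)])  # ['Customer_ID', 'Customer_Name', 'Customer_City', 'Customer_State']
```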

Problems with Incorrect Decomposition
 Decomposition must follow the steps
described for 3NF

 Even though you may decompose a table, you
run the risk of splitting the functional
dependence across different tables
Fourth Normal Form (4NF)
 3NF Tables may still contain problems
 Dependencies

 Update Anomalies
 Update, additions, deletions
 Occur because of multivalued dependencies
 4NF Table
 In 3NF and has no multivalued dependencies (we
will not do much of this)
Summary: Normal Forms
Summary
 Normalization is a process of optimizing
databases to prevent update anomalies
 Normalization attempts to correct update
issues by eliminating duplication
 Duplication also creates inconsistency
 Insertions can violate database integrity if the
database is not normalized
 Deletions can violate database integrity if the
database is not normalized
Summary (cont.)
 Normal Forms – First (1NF), Second (2NF),
Third (3NF), and Fourth (4NF)
 1NF has no repeating groups
 2NF is in 1NF and no non-key field is dependent
on only a portion of the primary key
 3NF is in 2NF and the only determinants are
candidate keys
Normalization Practice
A Normalization Practice
 Consider the following table structure holding all information about products (developed by
someone who didn’t know as much about database design as you do):
 PRODUCT (ProdID, CatID, SupplierID, Color, Qty, Price, Name, Desc, Loc)
 Consider the following business rules related to this table:
 Each product is supplied by a single supplier, but each supplier may provide multiple products.
 Each product is classified into a single product category, but each product category may
include multiple products.
 Each product is in only one warehouse location, but a location will house multiple products.
 The fields are described below:
ProdID      Unique product ID number
CatID       Unique product category ID number
SupplierID  Unique supplier ID number
Color       Color of the product (red, yellow, etc.)
Qty         Number of units of the product in inventory
Price       The price for each unit of the product
Name        Company name of the supplier
Desc        Description of each product category
Loc         Warehouse location for each category of product
Answer the following questions

Questions
1. Is the table a relation (in 1NF)? How do you know? If not, resolve.
2. Is the original/corrected relation in 2NF?
 How do you know?
 If not 2NF, diagram only relevant dependencies and resolve to new 2NF relations.
DO NOT jump ahead to 3NF!
3. Are the relations created in #2 already in 3NF?
 How do you know?
 If not 3NF, diagram only relevant dependencies and resolve to new 3NF relations.

Answers
1. Is the table a relation (in 1NF)? How do you know? If not, resolve.
 Yes, it is in 1NF. It (presumably) has unique field names, unique rows, no composite or
multi-valued fields, and no ordered rows or columns.
2. Is the original/corrected relation in 2NF?
 Yes, it is in 2NF, because it has a simple PK.
3. Are the relations created in #2 already in 3NF?
 No, because there are transitive dependencies.
Removing transitive dependencies

Transitive dependencies
 PRODUCT (ProdID, CatID, SupplierID, Color, Qty, Price, Name, Desc, Loc)
 CatID → Desc, Loc
 SupplierID → Name

3NF Tables
 CATEGORY (CatID, Desc, Loc)
 SUPPLIER (SupplierID, Name)
 PRODUCT (ProdID, CatID (FK), SupplierID (FK), Color, Qty, Price)
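The 3NF result can be tried out in a database. Here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types and every inserted value are hypothetical, and `Desc` is quoted because DESC is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE CATEGORY (
    CatID  INTEGER PRIMARY KEY,
    "Desc" TEXT,
    Loc    TEXT
);
CREATE TABLE SUPPLIER (
    SupplierID INTEGER PRIMARY KEY,
    Name       TEXT
);
CREATE TABLE PRODUCT (
    ProdID     INTEGER PRIMARY KEY,
    CatID      INTEGER REFERENCES CATEGORY(CatID),
    SupplierID INTEGER REFERENCES SUPPLIER(SupplierID),
    Color      TEXT,
    Qty        INTEGER,
    Price      REAL
);
""")

# A supplier and a category can be recorded before any product exists
# (no insertion anomaly). All values below are made up.
conn.execute("INSERT INTO SUPPLIER VALUES (1, 'Hypothetical Supply Co.')")
conn.execute("INSERT INTO CATEGORY VALUES (10, 'Hypothetical category', 'Warehouse A')")
conn.execute("INSERT INTO PRODUCT VALUES (500, 10, 1, 'red', 12, 9.99)")

# A product that points at an unknown supplier is rejected by the foreign key.
try:
    conn.execute("INSERT INTO PRODUCT VALUES (501, 10, 99, 'yellow', 3, 4.50)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Joining the three tables rebuilds the original wide PRODUCT row.
wide = conn.execute("""
    SELECT p.ProdID, p.CatID, p.SupplierID, p.Color, p.Qty, p.Price,
           s.Name, c."Desc", c.Loc
    FROM PRODUCT p
    JOIN CATEGORY c ON c.CatID = p.CatID
    JOIN SUPPLIER s ON s.SupplierID = p.SupplierID
""").fetchall()
print(rejected, wide)
```

The supplier name and the category description are now stored once each, so changing either one is a single-row update.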
B Normalization Practice
 Consider the following table structure holding all of a company’s data about a hotel and
reservations:
 RESERVATION (Room_No, Room_Quality, Beds, Arrive_Date, Depart_Date, Mgr_ID, Cust_No,
Last_Name, No_Party, Mgr_Name, Mgr_Phone)
 Consider the following business rules related to this table:
 Each room can have only one reservation per arrival date and can be reserved by only one
customer at a time.
 A customer can make multiple reservations or may have no reservations.
 Each reservation is for exactly one room; each reservation is for exactly one customer.
 Duty managers are assigned based on a schedule: one manager per day
 Each room quality level can have any number of beds.
 The fields are described below:
Attribute     Description
Room_No       Unique hotel room number
Room_Quality  Standard, Deluxe, Executive
Beds          1, 2, or 2+
Arrive_Date   Check-in date
Depart_Date   Check-out date
Mgr_ID        Manager on duty on arrival date
Cust_No       Unique customer identifier
Last_Name     Customer’s last name
No_Party      Number of people staying in room
Mgr_Name      Last name of manager on duty
Mgr_Phone     Office phone for manager on duty
Answer the following questions

Questions
1. Is the table a relation (in 1NF)? How do you know? If not, resolve.
2. Is the original/corrected relation in 2NF?
 How do you know?
 If not 2NF, diagram only relevant dependencies and resolve to new 2NF relations.
DO NOT jump ahead to 3NF!
3. Are the relations created in #2 already in 3NF?
 How do you know?
 If not 3NF, diagram only relevant dependencies and resolve to new 3NF relations.

Answers
1. Is the table a relation (in 1NF)? How do you know? If not, resolve.
 Yes, it is in 1NF. It (presumably) has unique field names, unique rows, no composite or
multi-valued fields, and no ordered rows or columns.
2. Is the original/corrected relation in 2NF?
 NO, it is NOT in 2NF, because there are partial dependencies
 Hint: When an entity has a composite PK, check for partial dependencies
3. Are the relations created in #2 already in 3NF?
 No, because there are transitive dependencies.
Removing partial dependencies

Partial dependencies
 RESERVATION (Room_No, Room_Quality, Beds, Arrive_Date, Depart_Date, Mgr_ID, Cust_No,
Last_Name, No_Party, Mgr_Name, Mgr_Phone)
 Room_No → Room_Quality, Beds
 Arrive_Date → Mgr_ID, Mgr_Name, Mgr_Phone

2NF Tables
 ROOM (Room_No, Room_Quality, Beds)
 SCHEDULE (Arrive_Date, Mgr_ID, Mgr_Name, Mgr_Phone)
 RESERVATION (Room_No (FK), Arrive_Date (FK), Depart_Date, Cust_No, Last_Name, No_Party)
Removing transitive dependencies

Transitive dependencies
 Are there TDs in ROOM? No
 Are there TDs in SCHEDULE?
 Mgr_ID → Mgr_Name, Mgr_Phone
 Are there TDs in RESERVATION?
 Cust_No → Last_Name

3NF Tables
 ROOM (Room_No, Room_Quality, Beds)
 SCHEDULE (Arrive_Date, Mgr_ID (FK))
 MANAGER (Mgr_ID, Mgr_Name, Mgr_Phone)
 CUSTOMER (Cust_No, Last_Name)
 RESERVATION (Room_No (FK), Arrive_Date (FK), Depart_Date, Cust_No (FK), No_Party)
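A final design can be checked against the dependency list from this exercise. The helper below is a hypothetical sketch that classifies each dependency applying to a relation as partial or transitive. It is deliberately simplified and ignores determinants that mix key and non-key fields. The final relations are written with Mgr_Name and Mgr_Phone stored only in MANAGER and Last_Name stored only in CUSTOMER.

```python
def normal_form_violations(attributes, primary_key, fds):
    """Classify each FD that applies to the relation as partial or transitive."""
    attrs, key = set(attributes), set(primary_key)
    problems = []
    for determinant, dependents in fds:
        det = set(determinant)
        inside = [d for d in dependents if d in attrs and d not in key]
        if not det <= attrs or not inside:
            continue  # this FD does not apply to the relation
        if det < key:
            problems.append(("partial", tuple(determinant), tuple(inside)))
        elif det.isdisjoint(key):
            problems.append(("transitive", tuple(determinant), tuple(inside)))
    return problems


hotel_fds = [
    (("Room_No",), ("Room_Quality", "Beds")),
    (("Arrive_Date",), ("Mgr_ID", "Mgr_Name", "Mgr_Phone")),
    (("Mgr_ID",), ("Mgr_Name", "Mgr_Phone")),
    (("Cust_No",), ("Last_Name",)),
]

original = (["Room_No", "Room_Quality", "Beds", "Arrive_Date", "Depart_Date", "Mgr_ID",
             "Cust_No", "Last_Name", "No_Party", "Mgr_Name", "Mgr_Phone"],
            ("Room_No", "Arrive_Date"))

final_design = {
    "ROOM": (["Room_No", "Room_Quality", "Beds"], ("Room_No",)),
    "SCHEDULE": (["Arrive_Date", "Mgr_ID"], ("Arrive_Date",)),
    "MANAGER": (["Mgr_ID", "Mgr_Name", "Mgr_Phone"], ("Mgr_ID",)),
    "CUSTOMER": (["Cust_No", "Last_Name"], ("Cust_No",)),
    "RESERVATION": (["Room_No", "Arrive_Date", "Depart_Date", "Cust_No", "No_Party"],
                    ("Room_No", "Arrive_Date")),
}

original_problems = normal_form_violations(*original, hotel_fds)
print([kind for kind, _, _ in original_problems])
for name, (attrs, key) in final_design.items():
    print(name, normal_form_violations(attrs, key, hotel_fds))
```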
C Normalization Practice

 Consider the following data blob (developed by someone who didn’t know as much about
database design as you do):
 ITEM (Item_No, Description, {Vendor}, {Vendor_City}, {Item_Cost})
 You may assume that Vendor names are unique

Item_No  Description  Vendor              Vendor_City  Item_Cost
54321    Wagon        Quality Toys, Inc.  Toronto      25.00
                      Acme Toys, Inc.     Winnipeg     23.00
33303    Skateboard   Quality Toys, Inc.  Toronto      19.00
                      Acme Toys, Inc.     Winnipeg     21.00
                      Sporty Toys, Inc.   Toronto      23.00
Answer the following questions

Questions
1. Is the table a relation (in 1NF)? How do you know? If not, resolve.
 DO NOT jump ahead to 2NF!

Answers
1. Is the table a relation (in 1NF)? How do you know? If not, resolve.
 NO, it is NOT in 1NF. It has unique field names, unique rows and no ordered rows or
columns. However, it has multi-valued fields.
Convert to 1NF

Steps
 See modifications to table
 ITEM (Item_No, Description, Vendor, Vendor_City, Item_Cost)

1NF Table
Item_No  Description  Vendor              Vendor_City  Item_Cost
54321    Wagon        Quality Toys, Inc.  Toronto      25.00
54321    Wagon        Acme Toys, Inc.     Winnipeg     23.00
33303    Skateboard   Quality Toys, Inc.  Toronto      19.00
33303    Skateboard   Acme Toys, Inc.     Winnipeg     21.00
33303    Skateboard   Sporty Toys, Inc.   Toronto      23.00

2. Is the corrected 1NF relation in 2NF?
 Hint: When an entity has a composite PK, check for partial dependencies
 NO, it is NOT in 2NF, because there are partial dependencies
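The conversion to 1NF amounts to writing out one row per item and vendor pair. The following is a minimal sketch using the toy data from this exercise; the nested representation with a "Vendors" list is an assumption made here to model the repeating group.

```python
# Each item carries a repeating group of (Vendor, Vendor_City, Item_Cost) entries.
nested_items = [
    {"Item_No": 54321, "Description": "Wagon",
     "Vendors": [("Quality Toys, Inc.", "Toronto", 25.00),
                 ("Acme Toys, Inc.", "Winnipeg", 23.00)]},
    {"Item_No": 33303, "Description": "Skateboard",
     "Vendors": [("Quality Toys, Inc.", "Toronto", 19.00),
                 ("Acme Toys, Inc.", "Winnipeg", 21.00),
                 ("Sporty Toys, Inc.", "Toronto", 23.00)]},
]


def to_1nf(items):
    """Repeat the item columns once for every entry in the repeating group."""
    rows = []
    for item in items:
        for vendor, city, cost in item["Vendors"]:
            rows.append((item["Item_No"], item["Description"], vendor, city, cost))
    return rows


item_rows = to_1nf(nested_items)
for row in item_rows:
    print(row)

# Item_No alone no longer identifies a row; Item_No plus Vendor does.
print(len({row[0] for row in item_rows}), len({(row[0], row[2]) for row in item_rows}))  # 2 5
```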
Convert to 2NF

Partial dependencies
 ITEM (Item_No, Description, Vendor, Vendor_City, Item_Cost)
 Item_No → Description
 Vendor → Vendor_City

1NF relation to 2NF relations
 ITEM (Item_No, Description)
 VENDOR (Vendor, Vendor_City)
 COST (Item_No (FK), Vendor (FK), Item_Cost)
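One way to sanity-check a decomposition is to project the 1NF rows onto the new relations and confirm that joining them back gives exactly the original rows. This is a minimal sketch on the exercise data; modelling each relation as a Python dict keyed by its primary key is an assumption made here for brevity.

```python
item_rows = [
    (54321, "Wagon", "Quality Toys, Inc.", "Toronto", 25.00),
    (54321, "Wagon", "Acme Toys, Inc.", "Winnipeg", 23.00),
    (33303, "Skateboard", "Quality Toys, Inc.", "Toronto", 19.00),
    (33303, "Skateboard", "Acme Toys, Inc.", "Winnipeg", 21.00),
    (33303, "Skateboard", "Sporty Toys, Inc.", "Toronto", 23.00),
]

# Project onto the three 2NF relations, keyed by each relation's primary key.
item = {item_no: desc for item_no, desc, _, _, _ in item_rows}      # ITEM
vendor = {name: city for _, _, name, city, _ in item_rows}          # VENDOR
cost = {(item_no, name): price for item_no, _, name, _, price in item_rows}  # COST

print(len(item), len(vendor), len(cost))  # 2 3 5

# Join back through the foreign keys held in COST.
rejoined = [(item_no, item[item_no], name, vendor[name], price)
            for (item_no, name), price in cost.items()]
print(sorted(rejoined) == sorted(item_rows))  # True
```

Each description and each vendor city is now stored once, yet no information from the 1NF table is lost.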
Convert to 3NF

Questions
3. Are the 2NF relations created already in 3NF?
 How do you know?
 If not 3NF, diagram only relevant dependencies and resolve to new 3NF relations.

Answer
3. Yes, all 2NF relations are also in 3NF, because each table has only one non-key field, so
there are no transitive dependencies.
