Chapter 5


DATABASES

5. Database normalization

bogdan.florea@upb.ro
Chapter 5. Database normalization

• Database normalization
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)
• Database denormalization
Database normalization

• Database normalization is the process of restructuring a relational
database in accordance with a set of rules (normal forms)
• The goal is to reduce data redundancy and improve data integrity
• The normalization process organizes the columns (attributes) of the
tables (relations) so that their dependencies are properly enforced by
the integrity constraints
• This process can be accomplished by:
• synthesis – creating a new database design
• decomposition – improving an existing database design
Database normalization

• Consider the following table structure:

students
| id | first_name | last_name | group | specialization | tutor |
|----|------------|-----------|-------|----------------|-------|
| 1 | Bob | Belcher | 441F | ELA | Calvin Fischoeder |
| 2 | Teddy | Francisco | 441F | ELA | Calvin Fischoeder |
| 3 | Courtney | Wheeler | 442F | ELA | Calvin Fischoeder |
| 4 | Henry | Haber | 442F | ELA | Calvin Fischoeder |
| 5 | Jimmy | Pesto | 441G | TST | Hugo Habercore |
| 6 | Tammy | Larsen | 441G | TST | Hugo Habercore |
| 7 | Chuck | Charles | 442G | TST | Hugo Habercore |
| 8 | Edith | Cranwinkle | 442G | TST | Hugo Habercore |
Database normalization

• Insert anomaly
• For a newly admitted student, if the specialization and tutor columns
are not explicitly set to accept NULL values, the record cannot be inserted
• The specialization of a student is not set until the 3rd year of study

• Update anomaly
• If a student transfers from one group to another, this might imply
changing the tutor as well
• If this is not handled appropriately, it will lead to data inconsistency

• Delete anomaly
• The students table stores both the student records and the group
information (specialization and tutor)
• If all the student records of a group are deleted, the group
information is lost too
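The sketch below reproduces the update and delete anomalies. It uses Python's built-in sqlite3 module with an in-memory database. Only four rows of the students table above are loaded, and the replacement tutor name 'Another Tutor' is hypothetical.

```python
import sqlite3

# In-memory database holding a subset of the unnormalized students table
conn = sqlite3.connect(":memory:")
# "group" is an SQL keyword, so the column name must be quoted
conn.execute(
    'CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, '
    'last_name TEXT, "group" TEXT, specialization TEXT, tutor TEXT)'
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Belcher", "441F", "ELA", "Calvin Fischoeder"),
        (2, "Teddy", "Francisco", "441F", "ELA", "Calvin Fischoeder"),
        (5, "Jimmy", "Pesto", "441G", "TST", "Hugo Habercore"),
        (6, "Tammy", "Larsen", "441G", "TST", "Hugo Habercore"),
    ],
)

# Update anomaly: the tutor is changed on one ELA row only, so the table
# now records two different tutors for the same specialization
conn.execute("UPDATE students SET tutor = 'Another Tutor' WHERE id = 1")
ela_tutors = conn.execute(
    "SELECT DISTINCT tutor FROM students WHERE specialization = 'ELA'"
).fetchall()

# Delete anomaly: removing the students of group 441G also removes the
# only rows recording that TST is tutored by Hugo Habercore
conn.execute('DELETE FROM students WHERE "group" = ?', ("441G",))
tst_tutors = conn.execute(
    "SELECT DISTINCT tutor FROM students WHERE specialization = 'TST'"
).fetchall()

print(len(ela_tutors), tst_tutors)  # 2 []
```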
First normal form (1NF)

• A relation is in the first normal form (1NF) if and only if the
domain of each attribute contains only atomic (indivisible) values
and the value of each attribute contains only a single value from
that domain
• The first normal form also requires that each tuple is identified
with a primary key
• Bringing a table to the first normal form involves:
• Identifying each tuple with a primary key
• Eliminating repeating groups in individual tables
• Creating a separate table for each set of related data
First normal form (1NF)

• Example: In the previous students table, a telephone number can
also be stored. One requirement is to be able to store multiple phone
numbers for a student
• Since phone numbers can be written in various formats using a
character data type, one apparent solution is to simply store all the
telephone numbers of a student in the same column, using a delimiter

| id | first_name | last_name | group | … | telephone |
|----|------------|-----------|-------|---|-----------|
| 1 | Bob | Belcher | 441F | … | 0754.324.156 |
| 2 | Teddy | Francisco | 441F | … | 0752 148 569, 0766 59 08 72 |
| 3 | Courtney | Wheeler | 442F | … | |
| 4 | Henry | Haber | 442F | … | +4 (021) 345 76 33 |
First normal form (1NF)

• If the telephone column stored arbitrary text, there would be no
problem, but in this case we want the values to be viewed and
dealt with as valid phone numbers
• In this case, the values are not atomic (indivisible): a single cell
can hold several phone numbers
• Another solution is to introduce separate columns for this data

| id | first_name | last_name | group | … | telephone_1 | telephone_2 |
|----|------------|-----------|-------|---|-------------|-------------|
| 1 | Bob | Belcher | 441F | … | 0754.324.156 | |
| 2 | Teddy | Francisco | 441F | … | 0752 148 569 | 0766 59 08 72 |
| 3 | Courtney | Wheeler | 442F | … | | |
| 4 | Henry | Haber | 442F | … | +4 (021) 345 76 33 | |
First normal form (1NF)

• Technically, this structure does not violate the first normal form,
since the values are atomic
• Informally, they still represent a repeating group (they are
conceptually the same attribute)
• What happens if more than two phone numbers are needed? How
many columns should be added?
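| id | first_name | last_name | group | … | telephone_1 | telephone_2 |
|----|------------|-----------|-------|---|-------------|-------------|
| 1 | Bob | Belcher | 441F | … | 0754.324.156 | |
| 2 | Teddy | Francisco | 441F | … | 0752 148 569 | 0766 59 08 72 |
| 3 | Courtney | Wheeler | 442F | … | | |
| 4 | Henry | Haber | 442F | … | +4 (021) 345 76 33 | |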
First normal form (1NF)

• One possible resolution which would respect the first normal form
is to split the groups of strings or columns into atomic entities and
ensure that no row contains more than one phone number
• In this case, the id is no longer unique and instead the primary key
becomes the combination of id and phone number

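| id | first_name | last_name | group | … | telephone |
|----|------------|-----------|-------|---|-----------|
| 1 | Bob | Belcher | 441F | … | 0754.324.156 |
| 2 | Teddy | Francisco | 441F | … | 0752 148 569 |
| 2 | Teddy | Francisco | 441F | … | 0766 59 08 72 |
| 3 | Courtney | Wheeler | 442F | … | |
| 4 | Henry | Haber | 442F | … | +4 (021) 345 76 33 |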
First normal form (1NF)

• A better solution is to create a separate table for storing the phone
numbers
• A one-to-many relationship will exist between the students table
and the phone_numbers table
• This structure will also respect the other normal forms (2NF, 3NF)

students
| id | first_name | last_name | group | … |
|----|------------|-----------|-------|---|
| 1 | Bob | Belcher | 441F | … |
| 2 | Teddy | Francisco | 441F | … |
| 3 | Courtney | Wheeler | 442F | … |
| 4 | Henry | Haber | 442F | … |

phone_numbers
| id | student_id | telephone |
|----|------------|-----------|
| 1 | 1 | 0754.324.156 |
| 2 | 2 | 0752 148 569 |
| 3 | 2 | 0766 59 08 72 |
| 4 | 4 | +4 (021) 345 76 33 |
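Here is a sketch of the two-table design in SQLite, built with Python's sqlite3 module. The column types and the NOT NULL and foreign key constraints are assumptions for illustration, since the slides do not prescribe an exact schema. Student 99 and its phone number in the last insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute(
    'CREATE TABLE students (id INTEGER PRIMARY KEY, first_name TEXT, '
    'last_name TEXT, "group" TEXT)'
)
# One row per phone number, linked to its student (one-to-many)
conn.execute(
    "CREATE TABLE phone_numbers ("
    " id INTEGER PRIMARY KEY,"
    " student_id INTEGER NOT NULL REFERENCES students(id),"
    " telephone TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Bob", "Belcher", "441F"), (2, "Teddy", "Francisco", "441F"),
     (3, "Courtney", "Wheeler", "442F"), (4, "Henry", "Haber", "442F")],
)
conn.executemany(
    "INSERT INTO phone_numbers VALUES (?, ?, ?)",
    [(1, 1, "0754.324.156"), (2, 2, "0752 148 569"),
     (3, 2, "0766 59 08 72"), (4, 4, "+4 (021) 345 76 33")],
)

# Any number of phones per student; LEFT JOIN keeps students with none
rows = conn.execute(
    "SELECT s.id, COUNT(p.id) FROM students s "
    "LEFT JOIN phone_numbers p ON p.student_id = s.id "
    "GROUP BY s.id ORDER BY s.id"
).fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 0), (4, 1)]

# The foreign key rejects a phone number for a non-existent student
fk_rejected = False
try:
    conn.execute("INSERT INTO phone_numbers VALUES (5, 99, '0700 000 000')")
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```

Adding a third number for a student is just another row, so no schema change is needed.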
Second normal form (2NF)

• A relation is in the second normal form (2NF) if it is in 1NF and no
non-prime attribute is dependent on any proper subset of any
candidate key of the relation
• A non-prime attribute is an attribute which is not a part of any
candidate key
• A functional dependency of a non-prime attribute on part of a
candidate key is a violation of 2NF

schedules
| course_id | date | course_title | room | capacity | … |
|-----------|------|--------------|------|----------|---|
| 1 | 2018-11-28 | Information Theory | A01 | 15 | … |
| 2 | 2018-11-29 | Databases | A04 | 12 | … |
| 1 | 2018-12-05 | Information Theory | A03 | 20 | … |
Second normal form (2NF)

• In the schedules table, the candidate key CK is (course_id, date)
• The table dependencies are:
• course_id → course_title
• { course_id, date } → room
• The course_title attribute has a partial dependency on the CK: it
depends only on course_id, which is a proper subset of the CK
• A table in 1NF whose candidate keys each consist of a single
attribute is always in 2NF

schedules
| course_id | date | course_title | room | capacity | … |
|-----------|------|--------------|------|----------|---|
| 1 | 2018-11-28 | Information Theory | A01 | 15 | … |
| 2 | 2018-11-29 | Databases | A04 | 12 | … |
| 1 | 2018-12-05 | Information Theory | A03 | 20 | … |
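Partial dependencies can be found mechanically by computing attribute closures. The sketch below encodes the schedules dependencies as Python sets. It also adds room → capacity, which holds in the example data. It then confirms that (course_id, date) determines every attribute and lists the non-prime attributes that a proper subset of the key already determines. It assumes that (course_id, date) is the only candidate key.

```python
from itertools import combinations

# Functional dependencies as (left side, right side) pairs of attribute sets
FDS = [
    ({"course_id"}, {"course_title"}),
    ({"course_id", "date"}, {"room"}),
    ({"room"}, {"capacity"}),
]
ATTRS = {"course_id", "date", "course_title", "room", "capacity"}


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attrs, fds):
    """Map each proper subset of key to the non-prime attributes it determines.

    Assumes key is the only candidate key, so non-prime = attrs - key.
    """
    non_prime = attrs - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & non_prime
            if determined:
                found[subset] = determined
    return found


key = ("course_id", "date")
print(closure(key, FDS) == ATTRS)  # True: (course_id, date) is a superkey
print(partial_dependencies(key, ATTRS, FDS))  # {('course_id',): {'course_title'}}
```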
Second normal form (2NF)

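courses
| id | title | … |
|----|-------|---|
| 1 | Information Theory | … |
| 2 | Databases | … |

schedules
| course_id | date | room | capacity | … |
|-----------|------|------|----------|---|
| 1 | 2018-11-28 | A01 | 15 | … |
| 2 | 2018-11-29 | A04 | 12 | … |
| 1 | 2018-12-05 | A03 | 20 | … |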
Third normal form (3NF)

• A relation is in the third normal form (3NF) if it is in 2NF and all
the attributes in a table are determined only by the candidate keys
of the relation and not by any non-prime attributes
• This implies that there are no transitive dependencies between
non-prime attributes of a relation
• In the schedules table, capacity depends on room, and neither
attribute is part of any candidate key

schedules
| course_id | date | room | capacity | … |
|-----------|------|------|----------|---|
| 1 | 2018-11-28 | A01 | 15 | … |
| 2 | 2018-11-29 | A04 | 12 | … |
| 1 | 2018-12-05 | A03 | 20 | … |
Third normal form (3NF)

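• In the previous relation, the dependency room → capacity is a
dependency between two non-prime attributes, so capacity depends on
the key only transitively (CK → room → capacity); moving it to a
separate rooms table removes this dependency

rooms
| room | capacity | … |
|------|----------|---|
| A01 | 15 | … |
| A04 | 12 | … |
| A03 | 20 | … |

schedules
| course_id | date | room | … |
|-----------|------|------|---|
| 1 | 2018-11-28 | A01 | … |
| 2 | 2018-11-29 | A04 | … |
| 1 | 2018-12-05 | A03 | … |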
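An attribute-closure helper can also flag dependencies that break 3NF. A dependency X → A is a problem when X is not a superkey and A is a non-prime attribute outside X. The sketch below assumes that (course_id, date) is the only candidate key of schedules, so its prime attributes are course_id and date.

```python
# Functional dependencies of schedules after the 2NF step
ATTRS = {"course_id", "date", "room", "capacity"}
FDS = [
    ({"course_id", "date"}, {"room"}),
    ({"room"}, {"capacity"}),
]
PRIME = {"course_id", "date"}  # attributes of the (assumed only) candidate key


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attrs, fds, prime):
    """List (lhs, attributes) pairs where a non-superkey determines non-prime attributes."""
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        offending = rhs - lhs - prime
        if not is_superkey and offending:
            found.append((sorted(lhs), sorted(offending)))
    return found


print(violations_3nf(ATTRS, FDS, PRIME))  # [(['room'], ['capacity'])]
# The decomposed schedules (course_id, date, room) has no such dependency
print(violations_3nf({"course_id", "date", "room"}, FDS[:1], PRIME))  # []
```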
Boyce-Codd normal form (BCNF or 3.5NF)

• A relation is in the Boyce-Codd normal form (BCNF) if and only if
for every functional dependency A → B, one of the following conditions holds:
• A → B is a trivial dependency (B is a subset of A)
• A is a superkey of the relation
• Unlike 3NF, BCNF therefore does not allow a dependency A → B where
A is not a superkey, even if B is a prime attribute
• Informally, the Boyce-Codd normal form is expressed as “Each
attribute must represent a fact about the key, the whole key, and
nothing but the key”.
Boyce-Codd normal form (BCNF or 3.5NF)

enrolments
| student_id | course | professor |
|------------|--------|-----------|
| 1 | Java | John Smith |
| 1 | Databases | Anthony Page |
| 2 | Databases | Laura Palmer |
| 3 | C++ | Michael Angelo |
| 4 | Java | John Smith |

• One student can enroll in multiple courses
• For each course, a professor is assigned to the student
• Multiple professors can teach the same course
• One professor teaches only one course
Boyce-Codd normal form (BCNF or 3.5NF)

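enrolments
| student_id | course | professor |
|------------|--------|-----------|
| 1 | Java | John Smith |
| 1 | Databases | Anthony Page |
| 2 | Databases | Laura Palmer |
| 3 | C++ | Michael Angelo |
| 4 | Java | John Smith |

• Primary key: (student_id, course)
• The dependency professor → course violates BCNF: professor is not a
superkey (it does not identify a single row), yet it determines course,
which is part of the primary key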
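The sketch below applies the BCNF test from the definition to enrolments. Every non-trivial dependency A → B must have a superkey on its left side, and this is checked by computing the closure of A. The two dependencies encode the rules listed for this table. The last check shows that (student_id, professor) is also a superkey, which makes course a prime attribute. That is why this table can satisfy 3NF while still failing BCNF.

```python
ATTRS = {"student_id", "course", "professor"}
FDS = [
    ({"student_id", "course"}, {"professor"}),  # one professor per student and course
    ({"professor"}, {"course"}),                # one professor teaches one course
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Non-trivial dependencies whose left side is not a superkey."""
    found = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= attrs
        if not trivial and not superkey:
            found.append((sorted(lhs), sorted(rhs)))
    return found


print(bcnf_violations(ATTRS, FDS))  # [(['professor'], ['course'])]
# (student_id, professor) also determines every attribute
print(closure({"student_id", "professor"}, FDS) >= ATTRS)  # True
```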
Boyce-Codd normal form (BCNF or 3.5NF)

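enrolments
| student_id | professor_id |
|------------|--------------|
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 1 |

professors
| id | professor | course |
|----|-----------|--------|
| 1 | John Smith | Java |
| 2 | Anthony Page | Databases |
| 3 | Laura Palmer | Databases |
| 4 | Michael Angelo | C++ |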
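Here is a sketch of the decomposed schema in SQLite, built with Python's sqlite3 module. It loads the rows above and joins the two tables back together. The column types and constraints are assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause
conn.execute(
    "CREATE TABLE professors (id INTEGER PRIMARY KEY,"
    " professor TEXT NOT NULL, course TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE enrolments ("
    " student_id INTEGER NOT NULL,"
    " professor_id INTEGER NOT NULL REFERENCES professors(id),"
    " PRIMARY KEY (student_id, professor_id))"
)
conn.executemany(
    "INSERT INTO professors VALUES (?, ?, ?)",
    [(1, "John Smith", "Java"), (2, "Anthony Page", "Databases"),
     (3, "Laura Palmer", "Databases"), (4, "Michael Angelo", "C++")],
)
conn.executemany(
    "INSERT INTO enrolments VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3), (3, 4), (4, 1)],
)

# Joining on professor_id reconstructs the original enrolments rows
rows = conn.execute(
    "SELECT e.student_id, p.course, p.professor"
    " FROM enrolments e JOIN professors p ON p.id = e.professor_id"
    " ORDER BY e.student_id, p.course"
).fetchall()
for row in rows:
    print(row)
```

This schema no longer enforces that a student has only one professor per course. For example, inserting (1, 3) into enrolments would succeed even though student 1 already has Anthony Page for Databases. Losing such a dependency is a known cost of some BCNF decompositions.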
Database denormalization

• Each phase in the normalization process is usually resolved by
decomposing relations into multiple tables
• This increases the complexity of the database structure and of the
required queries
• While the advantages are clear and the increase in complexity is
usually justified, sometimes it may be easier or more beneficial to
deliberately break some of the normalization rules for specific
application cases
• This process is called denormalization: the normalization rules are
purposely not followed
• Denormalization should be a conscious decision, not an accident
Database denormalization

• Example: Each employee has a work and a personal email address
and a work and a personal telephone number
id first_name last_name w_email p_email w_phone p_phone
1 Mickey Lee m.lee@... mick77@... 0740998877
2 Andrea Smith a.smith@... andreas@... 0723413241 0214132415
3 Julia Sky j.sky@... jules54@... 0312314351

• Informally, this table is not in 1NF because it contains repeating
groups (w_email/p_email and w_phone/p_phone)
• However, since this company will only ever assign one work email
address and request one personal email address per employee, the
compromise is acceptable
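The sketch below illustrates the query cost that this denormalized design avoids. It stores the same email data twice in SQLite. The first copy goes into a normalized employee_emails table with one row per email. The second goes into fixed w_email/p_email columns. The table names other than the column names from the slide, and the email values, are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized variant (hypothetical): one row per employee and email kind
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute(
    "CREATE TABLE employee_emails ("
    " employee_id INTEGER REFERENCES employees(id),"
    " kind TEXT CHECK (kind IN ('work', 'personal')),"
    " email TEXT,"
    " PRIMARY KEY (employee_id, kind))"
)
# Denormalized variant: one fixed column per email kind
conn.execute(
    "CREATE TABLE employees_flat (id INTEGER PRIMARY KEY, last_name TEXT,"
    " w_email TEXT, p_email TEXT)"
)

# Placeholder values; the real addresses are elided in the slides
conn.execute("INSERT INTO employees VALUES (1, 'Lee')")
conn.executemany(
    "INSERT INTO employee_emails VALUES (?, ?, ?)",
    [(1, "work", "work-address-1"), (1, "personal", "personal-address-1")],
)
conn.execute(
    "INSERT INTO employees_flat VALUES"
    " (1, 'Lee', 'work-address-1', 'personal-address-1')"
)

# The normalized schema needs a join and a pivot to produce one row per employee
normalized = conn.execute(
    "SELECT e.id, e.last_name,"
    " MAX(CASE WHEN m.kind = 'work' THEN m.email END),"
    " MAX(CASE WHEN m.kind = 'personal' THEN m.email END)"
    " FROM employees e LEFT JOIN employee_emails m ON m.employee_id = e.id"
    " GROUP BY e.id, e.last_name"
).fetchall()
# The denormalized schema answers the same question with a plain SELECT
flat = conn.execute(
    "SELECT id, last_name, w_email, p_email FROM employees_flat"
).fetchall()
print(normalized == flat)  # True
```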
