Professional Documents
Culture Documents
4 0 Normalization
4 0 Normalization
4 0 Normalization
Normalization
1
Fundamentals of Databases 1
2
Fundamentals of Databases 1
Course Overview:
•Introduction
•ER modelling
•Relational database design
• Mapping ER model into relational DB model
• Design of the database scheme
•Normalization
•Referential Integrity: Constraints
•Implementation
• Creation of database, tables and fields
• Loading data into the database
•Structured Query Language (SQL)
• Simple queries
• Complex queries: Joins, views, nested queries
•Referential Integrity: Temporal Data and Triggers
•Physical data organization
•Indexing Structures
Fundamentals of Databases 1
Learning objectives
• You can analyze a given relation and identify what kind of anomalies it would allow or avoid
and which normal form it is.
• You can analyze a given relation to identify functional dependencies.
• You know and understand the 1st, 2nd, 3rd and BC normal forms.
• You can transform relations that are not in these normal forms into these normal forms.
• You know and understand the terms "repeating group", "atomic value range", "functional
dependency", "full functional dependency“, “partial functional dependency” and "transitive
functional dependency".
Reading
• El, chapter 14
• Co, Chapter 6
4
Fundamentals of Databases 1
5
Fundamentals of Databases 1
• Let's assume that a new article "sports bag" is to be added to the assortment. It
already has an articleID (5) and a description (kid‘s sports bag). The article will be
available in Vienna and Berlin.
There are a couple of potential suppliers. Negotiations are still going on with potential
suppliers. How do you do add the article to the relation?
• Let's assume that the article „tent" is to be removed from the assortment. What
happens?
A relation that is not normalized (not in normal form) causes anomaly problems.
A relation that is not normalized (not in normal form) shows a lot of redundancies.
Redundancies lead to inconsistencies.
6
knirsch@htw-berlin.de
Fundamentals of Databases 1
• Insertion anomaly
Example: A new article is to be added - initially there is not yet a supplier.
The new article cannot be added because in this case part of the key would carry a
NULL value, which is not allowed.
A phantom supplier („dummy“) would have to be inserted as key.
• Deletion anomaly
Example: Let us assume that the article „tent" is to be removed from the assortment
and the tuple is to be deleted. When deleting the article „tent", the supplier C and
all the information belonging to the supplier are also deleted.
To keep the vendor, the other attributes would have to be set to NULL values. This is
not possible because, again, part of the key would carry NULL values.
7
knirsch@htw-berlin.de
Fundamentals of Databases 1
In addition, data should be stored in the relations in such a way that the data in the
database can be easily read and processed by the DBMS.
El, p.475: Normalization of data can be considered a process of analyzing the given
relation schemas based on their FDs and primary keys to achieve the desirable
properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion,
and update anomalies …... The normalization process is primarily achieved via
decomposition.
8
knirsch@htw-berlin.de
Fundamentals of Databases 1
Redundancy Example
9
knirsch@kiu.edu.ge
Fundamentals of Databases 1
Functional Dependency - FD
R ⊆ {D1 x D2 x …. Dn}
R = {A, B, …. N} with A,B,…N attributes of R
Functional Dependency X → Y
t1, t2 R
if t1[X] = t2[X] t1[Y] = t2[Y]
if t1[X] = t2[X] then follows t1[Y] = t2[Y]
10
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
Functional Dependency - FD
R = {A1, A2, …. An}
X,Y ⊆ R
Functional Dependency X → Y if t1[X] = t2[X] then t1[Y] = t2[Y]
• In other words: With given attribute sets X and Y with X,Y subsets of the relation
R: Y is functionally dependent on X if the following holds true: The values of X
determine the values of Y.
11
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
Functional Dependency - FD
{A} → {B} ?
{B} → {C} ?
{A} → {C} ?
{C,D} → {B} ?
1
{B,C} → {D} ?
2
3
4
5
Kemper, p.180
FDs represent semantic constraints that need to hold true for all possible
instances of a relation.
FDs represent contraints for the database schema, not for any given
representation.
12
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
Functional Dependency - FD
EL, p.473:
13
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
Functional Dependency - FD
14
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
Functional Dependency - FD
El, p.472:
If a constraint on R states that there cannot be more than one tuple with a given X-
value in any relation instance r(R)—that is, X is a candidate key of R—this implies
that X → Y for any subset of attributes Y of R (because the key constraint implies
that no two tuples in any legal state r(R) will have the same value of X).
If X is a candidate key of R, then X → R.
15
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
enrollment
18
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
Normalization Process
roomEquip
buildingID cYear roomID maxOcc category equipment price
1 1980 1 25 B chair: 25; desk: 4; 100
projector: 1;
1 1980 2 50 A chair: 50; desk: 12; 200
projector: 1;
flip chart: 2
2 1960 1 10 C chair: 10; desk: 1; 50
3 2012 1 12 C chair: 12; desk: 1; 50
3 2012 3 100 A chair: 100; desk: 2; 200
projector: 3;
There are a total of 6 normal forms: The 1st to 3rd N, the Boyce-Codd normal form as
refinement of the 3rd NF and the 4th and 5th NF. The NF build on each other. This
means: if a relation is in the 3rd NF, it is automatically also in the 2nd NF.
El, p.475: The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.
Of essential importance are the normal forms 1 to 3 and Boyce-Codd normal form: A
relational database design should only contain relations that correspond at least to
the 3rd NF / Boyce-Codd normal form. (Unless relations are kept unnormalized on
purpose.)
21
knirsch@htw-berlin.de
Fundamentals of Databases 1
• 1st normal form does not mean that in a relation "by chance" all attribute
values are atomic. It requires that only atomic attribute values are possible.
22
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
mariadb documentation
https://mariadb.com/kb/en/database-normalization-1st-normal-form/
• What this means is that data must be able to fit into a tabular format, where each
field contains one value. This is also the stage where the primary key is defined.
Some sources claim that defining the primary key is not necessary for a table to be in
first normal form, but usually it's done at this stage and is necessary before we can
progress to the next stage. Theoretical debates aside, you'll have to define your
primary keys at this point.
• Although not always seen as part of the definition of 1st normal form, the principle
of atomicity is usually applied at this stage as well. This means that all columns must
contain their smallest parts or be indivisible. A common example of this is where
someone creates a name field, rather than first name and surname fields. They
usually regret it later.
23
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
• El, p. 477: First normal form (1NF) is now considered to be part of the formal
definition of a relation in the basic (flat) relational model; historically, it was
defined to disallow multivalued attributes, composite attributes, and their
combinations. It states that the domain of an attribute must include only atomic
(simple, indivisible) values and that the value of any attribute in a tuple must be
a single value from the domain of that attribute. Hence, 1NF disallows having a
set of values, a tuple of values, or a combination of both as an attribute value
for a single tuple.
• In other words, 1NF disallows relations within relations or relations as attribute
values within tuples. The only attribute values permitted by 1NF are single
atomic (or indivisible) values.
24
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
In our example roomEquip we actually have a two-fold violation of the 1st normal
form:
roomEquip:
t1: {1,1, 2020, 25, A, 100, {chair:25, desk:4; projector: 1}}
• 2nd violation of 1st normal form: composite attribute within the repeating group
- each value of the repeating group holds two components / two attributes:
equipmentType equipmentNumber.
25
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
26
Fundamentals of Databases 1
Key?
27
Fundamentals of Databases 1
Example:
roomEquip: {1,1, 2020, 25, {chair:25, desk:4; projector: 1}, A, 100}
1. Each equipment set is considered a single value of the attribute domain. The PK
determines the equipment set.
2. Each value stands for its own. In this case the attribute is not functionally
dependent on the primary key.
studentName hobbies
29
knirsch@kiu.edu.ge
Fundamentals of Databases 1
30
knirsch@kiu.edu.ge
Fundamentals of Databases 1
N1NF
• NoSQL databases, e.g. document databases, are usually not in 1st NF.
31
Fundamentals of Databases 1
A relation is in 2nd NF
• if it is in 1st NF and
• if all non-key attributes are fully functionally dependent on the entire key.
(There are no partial dependencies)
For relations with atomic key it holds that they are in the 2nd NF if they are in
the 1st NF.
32
Fundamentals of Databases 1
33
Fundamentals of Databases 1
Relations in 2nd NF
34
Fundamentals of Databases 1
• The attribute "price" is determined by the attribute "category" and the attribute
"category" by the entire key {buildingID, roomID}. Thus, the attribute "price" is
transitively dependent on the key {buildingID, roomID}.
35
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1
36
Fundamentals of Databases 1
1 1 25 B B 100
1 2 50 A A 200
2 1 10 C C 50
3 3 100 A
37
Fundamentals of Databases 1
Relations in 3rd NF
38
Fundamentals of Databases 1
39
Fundamentals of Databases 1
There is a difference between BCNF and 3rd NF only if there are multiple
candidate keys with overlapping attributes.
Example: candidate key 1 consisting of attributes {A,B} and candidate key 2
consisting of attributes {B,C}. Overlapping attribute in both key candidates: B
41
Fundamentals of Databases 1
Normalizing to BCNF:
Decomposition of the relation into two relations.
42
Fundamentals of Databases 1
Candidate keys?
FDs
Is the relation in 3rd normal form?
43
Fundamentals of Databases 1
Any professor can only examine one course. Each course is only examined by one
professor.
Candidate keys?
FDs
44
Fundamentals of Databases 1
When decomposing relations from 3rd to BCNF, one MUST do it in a way that
renders non-additive joins and does not generate spurious tuples.
When decomposing relations from 3rd to BCNF, it is not always possible to
preserve all FDs. Sometimes one needs to decide between arriving at BCNF
but losing FDs or staying with 3rd NF and preserving FDs.
45
Fundamentals of Databases 1
Any professor can only examine one course but each course can be examined by
multiple professors.
Candidate keys?
FDs
FDs cannot all be preserved but non-additive join property can be achieved
46
Fundamentals of Databases 1
Wrong solution:
Generation of spurious (incorrect) tuples (additive join property)
47
Fundamentals of Databases 1
Summary Normalization
NF StudID
1st NF all attrbutes have atomic values only, no repearting groups, no
composite attributes, table format, PK
2nd NF 1st NF plus no partial dependency of non-key attributes
3rd NF 2nd NF plus no transitive dependencies of non-key attributes
BCNF 3rd NF plus every determinant is a candidate key
Definition: An attribute (or group of attributes) of which others are fully functionally
dependent is called a determinant.
48
Fundamentals of Databases 1
Summary Normalization
49
Fundamentals of Databases 1
In IPD course we work with this table that displays all item orders of a company.
What NF is it in?
What is the key?
Why is that perfectly okay for the purpose? Do we get anomalies?
Fundamentals of Databases 1
51
eva.knirsch@kiu.edu.ge