4 0 Normalization

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 51

Fundamentals of Databases 1

Normalization

1
Fundamentals of Databases 1

2
Fundamentals of Databases 1

Course Overview:
•Introduction
•ER modelling
•Relational database design
• Mapping ER model into relational DB model
• Design of the database scheme
•Normalization
•Referential Integrity: Constraints
•Implementation
• Creation of database, tables and fields
• Loading data into the database
•Structured Query Language (SQL)
• Simple queries
• Complex queries: Joins, views, nested queries
•Referential Integrity: Temporal Data and Triggers
•Physical data organization
•Indexing Structures
Fundamentals of Databases 1

Learning objectives

• You can analyze a given relation and identify what kind of anomalies it would allow or avoid
and which normal form it is.
• You can analyze a given relation to identify functional dependencies.
• You know and understand the 1st, 2nd, 3rd and BC normal forms.
• You can transform relations that are not in these normal forms into these normal forms.
• You know and understand the terms "repeating group", "atomic value range", "functional
dependency", "full functional dependency“, “partial functional dependency” and "transitive
functional dependency".

Reading

• El, chapter 14
• Co, Chapter 6

4
Fundamentals of Databases 1

Introduction Normal Forms


articleID article price storage stock supplier telephone
Description Location Supplier
1 skis 200 Vienna 50 A 123456

1 skis 200 Munich 25 B 234567


2 tent 150 Berlin 10 C 345678
3 snowshoes 100 Berlin 50 A 123456
4 boots 50 Vienna 150 A 123456

4 boots 50 Cologne 5 A 123456

4 boots 50 Munich 15 B 234567

4 boots 50 Munich 5 D 589944

The relation shows a table out of a merchandise management system.

What is the key of the relation?

5
Fundamentals of Databases 1

Introduction Normal Forms

• Let's assume that a new article "sports bag" is to be added to the assortment. It
already has an articleID (5) and a description (kid‘s sports bag). The article will be
available in Vienna and Berlin.
There are a couple of potential suppliers. Negotiations are still going on with potential
suppliers. How do you do add the article to the relation?

• Let's assume that the article „tent" is to be removed from the assortment. What
happens?

• Let's assume that supplier A communicates a new telephone number.

A relation that is not normalized (not in normal form) causes anomaly problems.

A relation that is not normalized (not in normal form) shows a lot of redundancies.
Redundancies lead to inconsistencies.

6
knirsch@htw-berlin.de
Fundamentals of Databases 1

Introduction Normal Forms


The examples of the previous slides are summarized under the terms:

• Insertion anomaly
Example: A new article is to be added - initially there is not yet a supplier.
The new article cannot be added because in this case part of the key would carry a
NULL value, which is not allowed.
A phantom supplier („dummy“) would have to be inserted as key.

• Deletion anomaly
Example: Let us assume that the article „tent" is to be removed from the assortment
and the tuple is to be deleted. When deleting the article „tent", the supplier C and
all the information belonging to the supplier are also deleted.
To keep the vendor, the other attributes would have to be set to NULL values. This is
not possible because, again, part of the key would carry NULL values.

• Update anomaly: Supplier A communicates a new telephone number. As a result, all


tuples with A as the supplier must be updated.

7
knirsch@htw-berlin.de
Fundamentals of Databases 1

Introduction Normal Forms


The transformation of a relation into a normal form is called normalization. The goals
of normalization are:

Avoiding anomalies and


Reducing redundancy.

In addition, data should be stored in the relations in such a way that the data in the
database can be easily read and processed by the DBMS.

Drawback of normalization: it costs performance.

El, p.475: Normalization of data can be considered a process of analyzing the given
relation schemas based on their FDs and primary keys to achieve the desirable
properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion,
and update anomalies …... The normalization process is primarily achieved via
decomposition.

8
knirsch@htw-berlin.de
Fundamentals of Databases 1
Redundancy Example

• Trainers can also be


members.
• With the shown design,
personal information of a
trainer-member would be
stored twice: Once in the
member table and once in
the trainer table.
• If a member (that is also a
trainer) updates his
personal account
information, e.g. new email,
the information is updated
in the member table but
not in the trainer table.
→ data becomes
inconsistent

9
knirsch@kiu.edu.ge
Fundamentals of Databases 1

Functional Dependency - FD
R ⊆ {D1 x D2 x …. Dn}
R = {A, B, …. N} with A,B,…N attributes of R

X,Y ⊆ R X and Y are subsets of attributes of R ,


X can be one attribute or an attribute group.
Y can be one attribute or an attribute group.

Functional Dependency X → Y

 t1, t2  R
if t1[X] = t2[X]  t1[Y] = t2[Y]
if t1[X] = t2[X] then follows t1[Y] = t2[Y]

Y ⊆ R is functionally dependent on X ⊆ R if the value of X specifies a unique value of Y.

The values ​of X ⊆ R thus uniquely determine the values ​of Y ⊆ R

10
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Functional Dependency - FD
R = {A1, A2, …. An}
X,Y ⊆ R
Functional Dependency X → Y if t1[X] = t2[X] then t1[Y] = t2[Y]

• Definition of functional dependence:


• Given is a relation R:
• A set of attributes Y of relation R is functionally dependent on a set of attributes
X of relation R, if for a given value of the X attributes there can be only at most
one Y value.
• For functional dependency (FD) the notation X --> Y is used.
• X,Y can be atomic attributes or sets of attributes.

• In other words: With given attribute sets X and Y with X,Y subsets of the relation
R: Y is functionally dependent on X if the following holds true: The values of X
determine the values of Y.

11
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Functional Dependency - FD

{A} → {B} ?
{B} → {C} ?
{A} → {C} ?
{C,D} → {B} ?
1
{B,C} → {D} ?
2
3
4
5
Kemper, p.180

FDs represent semantic constraints that need to hold true for all possible
instances of a relation.
FDs represent contraints for the database schema, not for any given
representation.

12
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Functional Dependency - FD

EL, p.473:

A functional dependency is a property of the relation schema R, not of a particular


legal relation state r of R. Therefore, an FD cannot be inferred automatically from a
given relation extension r but must be defined explicitly by someone who knows
the semantics of the attributes of R.

Given a populated relation, we cannot determine which FDs hold and which do not
unless we know the meaning of and the relationships among the attributes. All we
can say is that a certain FD may exist if it holds in that particular extension. We
cannot guarantee its existence until we understand the meaning of the
corresponding attributes. We can, however, emphatically state that a certain FD
does not hold if there are tuples that show the violation of such an FD.

13
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Functional Dependency - FD

ISBN → {publishingCompany, title, author}


studID →
{streetName, postalCode} → {locality}
articleID →
{studID,courseID} →

Teacher Reading Course


Smith Bartram Data Structures
Smith Martin Data Management
Hall Hoffman Compilers
Brown Horowitz Data Structures

Which FDs may hold true, which can be ruled out?

14
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Functional Dependency - FD

El, p.472:
If a constraint on R states that there cannot be more than one tuple with a given X-
value in any relation instance r(R)—that is, X is a candidate key of R—this implies
that X → Y for any subset of attributes Y of R (because the key constraint implies
that no two tuples in any legal state r(R) will have the same value of X).
If X is a candidate key of R, then X → R.

• consequently: All attributes are functionally dependent on the primary key

15
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Partial functional dependency and full functional dependency

enrollment

studID studName courseID courseName enrollStatus


1 john 1 FoP active
1 john 2 FoDB waiting
2 ana 1 FoP active
3 giorgi 1 FoP denied
3 giorgi 3 Algorithms active

What is the key?


FDs?
Fundamentals of Databases 1

Full functional dependency and partial functional dependency


enrollment
studID studName courseID courseName enrollStatus
1 john 1 FoP active
1 john 2 FoDB waiting
2 ana 1 FoP active
3 giorgi 1 FoP denied
3 giorgi 3 Algorithms active

Fully functional dependence: If attribute Y is functionally dependent on a composite key


X but not on any subset of that composite key, the attribute Y is fully functionally
dependent on X.
Partial functional dependency: A constraint in which an attribute Y is dependent on only
a portion (subset) of the primary key X.

Partial dependency is only applicable when the key is …..

Identify partial and full dependencies in the example!


Fundamentals of Databases 1

Transitive Functional Dependency

• El, p. 483: A functional dependency X → Y in a relation schema R is a transitive


dependency if there exists a set of attributes Z in R that is neither a candidate
key nor a subset of any key of R and both X → Z and Z→ Y hold.

18
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

buildingID cYear roomID maxOcc category equipment price


1 1980 1 25 B chair: 25; desk: 4; 100
projector: 1;
1 1980 2 50 A chair: 50; desk: 12; 200
projector: 1;
flip chart: 2
2 1960 1 10 C chair: 10; desk: 1; 50
3 2012 1 12 C chair: 12; desk: 1; 50
3 2012 3 100 A chair: 100; desk: 2; 200
projector: 3;
Fundamentals of Databases 1

Normalization Process
roomEquip
buildingID cYear roomID maxOcc category equipment price
1 1980 1 25 B chair: 25; desk: 4; 100
projector: 1;
1 1980 2 50 A chair: 50; desk: 12; 200
projector: 1;
flip chart: 2
2 1960 1 10 C chair: 10; desk: 1; 50
3 2012 1 12 C chair: 12; desk: 1; 50
3 2012 3 100 A chair: 100; desk: 2; 200
projector: 3;

What is the key?


What types of FDs exist?
Fundamentals of Databases 1

Introduction Normal Forms

There are a total of 6 normal forms: The 1st to 3rd N, the Boyce-Codd normal form as
refinement of the 3rd NF and the 4th and 5th NF. The NF build on each other. This
means: if a relation is in the 3rd NF, it is automatically also in the 2nd NF.

El, p.475: The normal form of a relation refers to the highest normal form condition
that it meets, and hence indicates the degree to which it has been normalized.

Of essential importance are the normal forms 1 to 3 and Boyce-Codd normal form: A
relational database design should only contain relations that correspond at least to
the 3rd NF / Boyce-Codd normal form. (Unless relations are kept unnormalized on
purpose.)

21
knirsch@htw-berlin.de
Fundamentals of Databases 1

1st normal form


• Starting point: A relation in a database is initially said to be in zero normal form.
No preconditions.

• 1st normal form:


• A relation is said to be in first normal form when:
• The relation has a primary key: all of the key attributes are identified.
• No single attribute (column) has multiple values: There are no repeating
groups in the table. Each cell contains one and only one value, not a set of
values. No attribute has relations as elements.

• 1st normal form does not mean that in a relation "by chance" all attribute
values are atomic. It requires that only atomic attribute values are possible.

22
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

mariadb documentation
https://mariadb.com/kb/en/database-normalization-1st-normal-form/

• Tables are in 1st normal form if they follow these rules:

• There are no repeating groups.


• All the key attributes are defined.
• All attributes are dependent on the primary key.

• What this means is that data must be able to fit into a tabular format, where each
field contains one value. This is also the stage where the primary key is defined.
Some sources claim that defining the primary key is not necessary for a table to be in
first normal form, but usually it's done at this stage and is necessary before we can
progress to the next stage. Theoretical debates aside, you'll have to define your
primary keys at this point.

• Although not always seen as part of the definition of 1st normal form, the principle
of atomicity is usually applied at this stage as well. This means that all columns must
contain their smallest parts or be indivisible. A common example of this is where
someone creates a name field, rather than first name and surname fields. They
usually regret it later.

23
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

1st normal form

• El, p. 477: First normal form (1NF) is now considered to be part of the formal
definition of a relation in the basic (flat) relational model; historically, it was
defined to disallow multivalued attributes, composite attributes, and their
combinations. It states that the domain of an attribute must include only atomic
(simple, indivisible) values and that the value of any attribute in a tuple must be
a single value from the domain of that attribute. Hence, 1NF disallows having a
set of values, a tuple of values, or a combination of both as an attribute value
for a single tuple.
• In other words, 1NF disallows relations within relations or relations as attribute
values within tuples. The only attribute values permitted by 1NF are single
atomic (or indivisible) values.

24
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

1st normal form

In our example roomEquip we actually have a two-fold violation of the 1st normal
form:

roomEquip:
t1: {1,1, 2020, 25, A, 100, {chair:25, desk:4; projector: 1}}

• 1st violation of 1st normal form: repeating group


- an attribute that has more than one value in each tuple

• 2nd violation of 1st normal form: composite attribute within the repeating group
- each value of the repeating group holds two components / two attributes:
equipmentType equipmentNumber.

25
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Transfer to 1st NF: Decomposition


• Insert additional tuples to resolve the repeating groups,
• Insert additional columns to resolve composite attributes.

Definition of Repeating Group:


A repeating group denotes a set of values for an attribute. That is, each tuple
contains not only one attribute value for the attribute in question, but a set
(group) of values for an attribute. Repeating groups are sets of data of the
same type.
Example: Equipment type: chair, desk, projector, smartboard, …

Definition of Composite Attribute:


A composite attribute is an attribute that is composed of more than one
attribute, e.g. address, name.
Example: equipment is composed of equipment type and amount

26
Fundamentals of Databases 1

Relation scheme in 1st NF

buildingID cYear roomID maxOcc eType eNumber category price

1 1980 1 25 chair 25 B 100


1 1980 1 25 desk 4 B 100
1 1980 1 25 projector 1 B 100
1 1980 2 50 chair 50 A 200
1 1980 2 50 desk 12 A 200
1 1980 2 50 projector 1 A 200
1 1980 2 50 flipChart 2 A 200
2 1960 1 10 chair 10 C 50
2 1960 1 10 desk 1 C 50
3 2012 3 100 chair 100 A 200
3 2012 3 100 desk 2 A 200
3 2012 3 100 projector 3 A 200

Key?
27
Fundamentals of Databases 1

1st Normal Form or Non-1st Normal Form


Non-atomic attributes can be looked at in 2 ways:

Example:
roomEquip: {1,1, 2020, 25, {chair:25, desk:4; projector: 1}, A, 100}

1. Each equipment set is considered a single value of the attribute domain. The PK
determines the equipment set.

{1,1} → {chair: 25; desk: 4; projector: 1}


is considered a single value;
→ querying of this attribute becomes more difficult
(unless the usual query pattern is to display equipment for one room).
→ you forego this performance / querying advantage

2. Each value stands for its own. In this case the attribute is not functionally
dependent on the primary key.

equipment: {1,1} → chair: 25


{1,1} → desk:4
{1,1} → projector:1
28
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

Non-1st Normal Form

studentName hobbies

giorgi {football, basketball, swimming}


mariam {volleyball, swimming, horse riding}
lasha {football}
saba {singing, climbing}
daviti {singing, reading, travelling, painting, football}
ana {ice-skating}

• Set of hobbies to be considered as single value.


• All relational databases allow this, using a set datatypes.
• Query Pattern: display all hobbies of a specific student.
• Updates (additions or deletions) of hobbies are challenging.
• Advantage: All student-hobby information is stored together.
Typical: ALL student information is sdtored at one place.
• NoSQL databases (e.g. document databases) store all information about a unit
(student) together in a document.

29
knirsch@kiu.edu.ge
Fundamentals of Databases 1

1st Normal Form


studentName hobby • Query Patterns:
– we need a football team. Display all
giorgi football students that have football as hobby.
giorgi basketball
– What is the most favorite hobby?
giorgi swimming (Analyzing data)
mariam volleyball
– How many students like to sing?
mariam swimming
• Easy to maintain and update (add / delete
mariam horseriding hobbies)
lasha football
• Simple to query and analyze
saba singing
• Disadvantage:
saba climbing
daviti football
– student-hobby information is stored in
different tuples.
daviti singing
– All student information is stored across
daviti reading
several tables and potentially need to be
daviti travelling joined.
daviti painting
ana ice-skating

30
knirsch@kiu.edu.ge
Fundamentals of Databases 1

N1NF
• NoSQL databases, e.g. document databases, are usually not in 1st NF.

• Relational databases usually are in 1NF, but:


• Relational databases also have N1NF datatype(s)
• mariadb has the set datatype – for repeating groups
• set ('value1','value2',...)
set (“swimming“,“singing“,“football“,“collecting stamps“)

• Advantage: repeating group stays together, less decomposition


• Beneficial if query pattern asks for all values of the set

• Disadvantage: More difficult to maintain as values are part of the structure


• Disadvantage if query patterns typically ask for singular values of the set, if
aggregations or analysis are needed.

31
Fundamentals of Databases 1

Definition 2nd normal form:

A relation is in 2nd NF
• if it is in 1st NF and
• if all non-key attributes are fully functionally dependent on the entire key.
(There are no partial dependencies)

For relations with atomic key it holds that they are in the 2nd NF if they are in
the 1st NF.

Transfer to 2nd NF: Decomposition


Decomposition of the relation in such a way that all non-key attributes in the
sub-relations are fully functionally dependent on the entire key of the
respective sub-relation.

32
Fundamentals of Databases 1

Relation scheme in 1st NF

buildingID cYear roomID maxOcc eType eNumber category price

1 1980 1 25 chair 25 B 100


1 1980 1 25 desk 4 B 100
1 1980 1 25 projector 1 B 100
1 1980 2 50 chair 50 A 200
1 1980 2 50 desk 12 A 200
1 1980 2 50 projector 1 A 200
1 1980 2 50 flipChart 2 A 200
2 1960 1 10 chair 10 C 50
2 1960 1 10 desk 1 C 50
3 2012 3 100 chair 100 A 200
3 2012 3 100 desk 2 A 200
3 2012 3 100 projector 3 A 200

33
Fundamentals of Databases 1

Relations in 2nd NF

buildin room maxOcc category price buildingI roomID eType eNumber


gID ID D
1 1 25 B 100 1 1 Chairs 25
1 2 50 A 200 1 1 Tables 4
2 1 10 C 50 1 1 Projector 1
3 3 100 A 200 1 2 Chairs 50
1 2 Tables 12
buildingID cYear 1 2 Projector 1
1 2 Flipchart 2
1 1980 1 2 Moderator 1
2 1960 s
3 2012 wall
2 1 Chairs 10
2 1 Tables 1
3 3 Chairs 100
3 3 Tables 2
3 3 Projector 3

34
Fundamentals of Databases 1

3rd normal form


• A transitive dependency between two attribute sets Y and A exists if Y is fully
functionally dependent on Z and Z is fully dependent on X.
• Thus, the following is true: X --> Z and Z --> Y.

• Looking at relation room In our example:


– {buildingID, roomID} is the key X
– the attribute category is Z and
– the attribute price is Y

• The following is true: X --> Z and Z --> Y.

• The attribute "price" is determined by the attribute "category" and the attribute
"category" by the entire key {buildingID, roomID}. Thus, the attribute "price" is
transitively dependent on the key {buildingID, roomID}.

35
eva.knirsch@kiu.edu.ge
Fundamentals of Databases 1

3rd normal form


Definition 3rd normal form:
A relation is in 3rd NF if it is in 2nd NF and no non-key attribute is transitively
dependent on the key.

Transfer to 3rd NF:


Decomposition of the relation in such a way that all non-key attributes that are
transitively dependent on the key are put into a new relation that contains these
attributes and the associated key.

36
Fundamentals of Databases 1

Relations in the 3rd NF

buildingID roomID maxOcc inCat category price

1 1 25 B B 100
1 2 50 A A 200
2 1 10 C C 50
3 3 100 A

37
Fundamentals of Databases 1

Summary 1st to 3rd normal forms


building Build category price building_id room_id equipment eNumber
_id ing 1 1 Chairs 25
year B 100 1 1 Tables 4
1 1980 A 200 1 1 Projector 1
2 1960 C 50 1 2 Chairs 50
3 2012 1 2 Tables 12
1 2 Projector 1
building_ room_id max_ in_cat 1 2 Flipchart 2
id occupa 1 2 Moderators 1
ncy wall
1 1 25 B 2 1 Chairs 10
1 2 50 A 2 1 Tables 1
2 1 10 C 3 3 Chairs 100
3 3 100 A 3 3 Tables 2
3 3 Projector 3

Relations in 3rd NF

38
Fundamentals of Databases 1

Relations‘ scheme SportsClub (ER-model)


member {[member_id: integer, l_name: string, f_name: string; zip: integer,
city:string DoB: date, gender: enum, email:string; is_child_of: integer]}
trainer: {[tr_id: integer; tr_l_name:string, license: boolean]}
area {[area:string, description: string, tr_id: integer]}
course {[course_id: integer, course_name: string, target_group: string, area:
string, tr_id: integer]}
enrollment {[member_id: integer, course_id: integer]}
Key is both, PK and FK key on relation members and relation courses

What NF is our sportsClub schema in?

39
Fundamentals of Databases 1

Introduction Normal Forms


article_ article_ price storage stock supplier telephone
ID description Location supplier
1 skis 200 Vienna 50 A 123456

1 skis 200 Munich 25 B 234567


2 tent 150 Berlin 10 C 345678
3 snowshoes 100 Berlin 50 A 123456
4 boots 50 Vienna 150 A 123456

4 boots 50 Cologne 5 A 123456

4 boots 50 Munich 15 B 234567

4 boots 50 Munich 5 D 589944

In which normal form is the relation?


Write down the assumend existing FDs.
Decompose into 3rd normal form.
40
Fundamentals of Databases 1

Boyce Codd Normal Form (BCNF)


The BCNF is a „refined" 3rd NF.
Definition: a relation is in BCNF if it is in 3rd NF and if for each X→ Y holds:
X is a candidate key.
Every attribute must depend on the entire (candidate) key.
(For comparison 2nd NF: Every non-key attribute must depend on the whole
key).
If a relation is in 3rd NF but not in BCNF, then there are still update-anomalies
(redundancy).

There is a difference between BCNF and 3rd NF only if there are multiple
candidate keys with overlapping attributes.
Example: candidate key 1 consisting of attributes {A,B} and candidate key 2
consisting of attributes {B,C}. Overlapping attribute in both key candidates: B

41
Fundamentals of Databases 1

Boyce Codd Normal Form (BCNF)


is only to be applied under following conditions:

1. Candidate keys (composite keys) with partially overlapping attributes.


2. Functional dependencies between parts of the keys, e.g. if a part of the
key {A,B} is functionally dependent on a part of the key {B,C}.

Normalizing to BCNF:
Decomposition of the relation into two relations.

42
Fundamentals of Databases 1

Boyce Codd Normal Form (BCNF)


Village
locality region governor population
Achalsopheli Kartli Giorgi 500
Achalsopheli Imereti Levan 200
Sairme Imereti Levan 300
Sairme Ratcha Ketevan 100
Achalziche Djawacheti Irakli 8000

locality region Achalziche Jawacheti Irakli 8000

Candidate keys?
FDs
Is the relation in 3rd normal form?
43
Fundamentals of Databases 1

Boyce Codd Normal Form (BCNF)


Exam
Professor StudID Course Grade
Seager 1234 Usability a
Seager 4711 Usability b

Seager 5699 Usability a

Smith 1234 Programming c


Hugh 5699 Algorithms b

Any professor can only examine one course. Each course is only examined by one
professor.

Candidate keys?
FDs

44
Fundamentals of Databases 1

3rd normal form or BCNF

When decomposing relations in the process of normalization, two properties


must / should be maintained:
1. Nonadditive join property - MUST
Relations that contain matching attributes that are not (foreign key,
primary key) combinations cause additive joins with incorrect (spurious)
tuples.

2. Preservation of FDs - SHOULD (Desirable)

When decomposing relations from 3rd to BCNF, one MUST do it in a way that
renders non-additive joins and does not generate spurious tuples.
When decomposing relations from 3rd to BCNF, it is not always possible to
preserve all FDs. Sometimes one needs to decide between arriving at BCNF
but losing FDs or staying with 3rd NF and preserving FDs.

45
Fundamentals of Databases 1

Boyce Codd Normal Form (BCNF)


Exam
Professor StudID Course Grade
Seager 1234 Usability a
Seager 4711 Usability b

Seager 5699 Usability a

Smith 1234 Programming c


Hugh 5699 Programming b

Any professor can only examine one course but each course can be examined by
multiple professors.
Candidate keys?
FDs
FDs cannot all be preserved but non-additive join property can be achieved
46
Fundamentals of Databases 1

Wrong solution:
Generation of spurious (incorrect) tuples (additive join property)

→R1: {[professor, course]} Joining these two relations


→R2: {[studID, course, grade]} renders a spurious tuples.

StudID course grade Professor course

1234 Programming c Hugh Programming


5699 Programming b Smith Programming

StudID course grade Professor


1234 Programming c Hugh
1234 Programming c Smith
5699 Programming b Hugh
5699 Programming b Smith

47
Fundamentals of Databases 1

Summary Normalization

NF StudID
1st NF all attrbutes have atomic values only, no repearting groups, no
composite attributes, table format, PK
2nd NF 1st NF plus no partial dependency of non-key attributes
3rd NF 2nd NF plus no transitive dependencies of non-key attributes
BCNF 3rd NF plus every determinant is a candidate key

Definition: An attribute (or group of attributes) of which others are fully functionally
dependent is called a determinant.

48
Fundamentals of Databases 1

Summary Normalization

• Normalization rules help the database designer to build a consistent and


robust data model that prevents anomalies and reduces redundancy
.
• But: normalization also has disadvantages:
– when splitting the model into many single relations, clarity suffers at
some point
– when querying the data, the relations have to be reassembled (joined)
again. This costs performance.

• A good compromise is needed. Relations in 3rd NF / BCNF are seen as such


a good compromise

49
Fundamentals of Databases 1

In IPD course we work with this table that displays all item orders of a company.
What NF is it in?
What is the key?
Why is that perfectly okay for the purpose? Do we get anomalies?
Fundamentals of Databases 1

Denormalization: Relations staying in 1st NF


• Co, p. 227: The problem with normalization is that as tables are decomposed to
conform to normalization requirements, the number of database tables
expands. Therefore, in order to generate information, data must be put together
from various tables. Joining a large number of tables takes additional
input/output (I/O) operations and processing logic, thereby reducing system
speed.

• Denormalized relations – relations in 1st NF - get better performance.


• When is it possible to denormalize / leave in 1st NF - without getting anomalies?

• Data in NoSQL databases are usually stored in not normalized format.

51
eva.knirsch@kiu.edu.ge

You might also like