Chapter 4

Chapter 4
Normalization
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe
Chapter Outline
3 Normal Forms Based on Primary Keys

3.1 Normalization of Relations 3.2 Practical Use of Normal Forms 3.3 First Normal Form 3.4 Second Normal Form 3.5 Third Normal Form
5 BCNF (Boyce-Codd Normal Form) 3.Multivalued Dependencies and Fourth Normal Form Mapping EER Model Constructs to Relations

Step 8: Options for Mapping Specialization or Generalization. Step 9: Mapping of Union Types (Categories).
Slide 4- 2
3.1 Normalization of Relations (1)
Normalization:
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Normal form:
Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
Slide 4- 3
Normalization of Relations (2)
2NF, 3NF, BCNF
based on keys and FDs of a relation schema based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11)
4NF
Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)
Slide 4- 4
3.2 Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect The database designers need not normalize to the highest possible normal form
(usually up to 3NF, BCNF or 4NF) The process of storing the join of higher normal form relations as a base relationwhich is in a lower normal form
Denormalization:
Slide 4- 5
3.2 First Normal Form
Disallows

composite attributes multivalued attributes nested relations; attributes whose values for an individual tuple are non-atomic
Considered to be part of the definition of relation
Slide 4- 6
Figure 10.8 Normalization into 1NF
Slide 4- 7
Figure 10.9 Normalization nested relations into 1NF
Slide 4- 8
3.3 Second Normal Form (1)

Uses the concepts of FDs, primary key Definitions
Prime attribute: An attribute that is member of the primary key K Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds
Examples:
Slide 4- 9
Second Normal Form (2)
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key R can be decomposed into 2NF relations via the process of 2NF normalization
Slide 4- 10
Figure 10.10 Normalizing into 2NF and 3NF
Slide 4- 11
Figure 10.11 Normalization into 2NF and 3NF
Slide 4- 12
3.4 Third Normal Form (1)
Definition:
Transitive functional dependency: a FD X -> Z that can be derived from two FDs X -> Y and Y -> Z SSN -> DMGRSSN is a transitive FD
Examples:
Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold Since there is no set of attributes X where SSN -> X and X -> ENAME
Slide 4- 13
SSN -> ENAME is non-transitive
Third Normal Form (2)
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key R can be decomposed into 3NF relations via the process of 3NF normalization NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency . E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Slide 4- 14
General Normal Form Definitions (2)
Definition:
Superkey of relation schema R - a set of attributes S of R that contains a key of R A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds in R, then either:

(a) X is a superkey of R, or (b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b) above
Slide 4- 15
5 BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R Each normal form is strictly stronger than the previous one

Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF)
Slide 4- 16
Figure 10.12 Boyce-Codd normal form
Slide 4- 17
Figure 10.13 a relation TEACH that is in 3NF but not in BCNF
Slide 4- 18
Achieving the BCNF by Decomposition (1)
Two FDs exist in the relation TEACH:

fd1: { student, course} -> instructor fd2: instructor -> course
{student, course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12 (b).
So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
(See Algorithm 11.3)

Slide 4- 19
Achieving the BCNF by Decomposition (2)
Three possible decompositions for relation TEACH {student, instructor} and {student, course} {course, instructor } and {course, student} {instructor, course } and {instructor, student} All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition. Out of the above three, only the 3rd decomposition will not generate spurious tuples after join.(and hence has the non-additivity property). A test to determine whether a binary decomposition (decomposition into two relations) is non-additive (lossless) is discussed in section 11.1.4 under Property LJ1. Verify that the third decomposition above meets the property.
Slide 4- 20
3. Multivalued Dependencies and Fourth Normal Form (1)

(a) The EMP relation with two MVDs: ENAME >> PNAME and ENAME >> DNAME. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Slide 4- 21
3. Multivalued Dependencies and Fourth Normal Form (1)

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
Slide 4- 22
Multivalued Dependencies and Fourth Normal Form (2)

Definition: A multivalued dependecy (MVD) X >> Y specified on relation schema R, where X and Y are both subsets of R, specifies the following constraint on any relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then two tuples t3 and t4 should also exist in r with the following properties, where we use Z to denote (R 2 (X Y)): t [X] = t [X] = t [X] = t [X]. 3 4 1 2 t [X] = t [X] = t [X] = t [X]. 3 1 4 2 t [X] = t [X] = t [X] = t [X]. 3 2 4 1 An (MVD) X >> Y in R is called a trivial MVD if (a) Y is a subset of X or (b) X Y = R.
Slide 4- 23
Multivalued Dependencies and Fourth Normal Form (4) Definition: A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X >> Y in F+, X is a superkey for R.
Note: F+ is the (complete) set of all dependencies (functional or multivalued) that will hold in every relation state r of R that satisfies F. It is also called the closure of F.
Slide 4- 24
Multivalued Dependencies and Fourth Normal Form (5)

Decomposing a relation state of EMP that is not in 4NF: (a) EMP relation with additional tuples. (b) Two corresponding 4NF relations EMP_PROJECTS and EMP_DEPENDENTS.
Slide 4- 25
Mapping EER Model Constructs to Relations
Step8: Options for Mapping Specialization or Generalization.
Convert each specialization with m subclasses {S1,S2, ,Sm} and generalized superclass C, where the attributes of C are {k,a1,an} and k is the (primary) key, into relational schemas using one of the four following options:
Option 8A: Multiple relations-Superclass and subclasses Option 8B: Multiple relations-Subclass relations only Option 8C: Single relation with one type attribute Option 8D: Single relation with multiple type attributes
Slide 4- 26
Mapping EER Model Constructs to Relations
Option 8A: Multiple relations-Superclass and subclasses
Create a relation L for C with attributes Attrs(L) = {k,a1, an} and PK(L) = k. Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attrs(Li) = {k} U {attributes of Si} and PK(Li) = k. This option works for any specialization (total or partial, disjoint of over-lapping.) Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attrs(Li) = {attributes of Si} U {k,a1,an} and PK(Li) = k. This option only works for a specialization whose subclasses are total (every entity in the superclass must belong to (at least) one of the subclasses).
Option 8B: Multiple relations-Subclass relation only
Slide 4- 27
FIGURE 4.4 EER diagram notation for an attribute-defined specialization on JobType.
Slide 4- 28
FIGURE 7.4 Options for mapping specialization or generalization. (a) Mapping the EER schema in Figure 4.4 using option 8A.
Slide 4- 29
FIGURE 4.3 Generalization. (b) Generalizing CAR and TRUCK into the superclass VEHICLE.
Slide 4- 30
FIGURE 7.4 Options for mapping specialization or generalization. (b) Mapping the EER schema in Figure 4.3b using option 8B.
Slide 4- 31
Mapping EER Model Constructs to Relations (contd.)
Option 8C: Single relation with one type attribute
Create a single relation L with attributes Attrs(L) = {k,a1,an} U {attributes of S1} UU {attributes of Sm} U {t} and PK(L) = k. The attribute t is called a type (or discriminating) attribute that indicates the subclass to which each tuple belongs Create a single relation schema L with attributes Attrs(L) = {k,a1,an} U {attributes of S1} UU {attributes of Sm} U {t1, t2, ,tm} and PK(L) = k. Each ti, 1 < I < m, is a Boolean type attribute indicating whether a tuple belongs to the subclass Si.
Option 8D: Single relation with multiple type attributes
Slide 4- 32
FIGURE 4.4 EER diagram notation for an attribute-defined specialization on JobType.
Slide 4- 33
FIGURE 7.4 Options for mapping specialization or generalization. (c) Mapping the EER schema in Figure 4.4 using option 8C.
Slide 4- 34
FIGURE 4.5 EER diagram notation for an overlapping (non-disjoint) specialization.
Slide 4- 35
FIGURE 7.4 Options for mapping specialization or generalization. (d) Mapping Figure 4.5 using option 8D with Boolean type fields Mflag and Pflag.
Slide 4- 36
Mapping of Shared Subclasses (Multiple Inheritance)
A shared subclass, such as STUDENT_ASSISTANT, is a subclass of several classes, indicating multiple inheritance. These classes must all have the same key attribute; otherwise, the shared subclass would be modeled as a category. We can apply any of the options discussed in Step 8 to a shared subclass, subject to the restriction discussed in Step 8 of the mapping algorithm. Below both 8C and 8D are used for the shared class STUDENT_ASSISTANT.
Slide 4- 37
FIGURE 4.7 A specialization lattice with multiple inheritance for a UNIVERSITY database.
Slide 4- 38
FIGURE 7.5 Mapping the EER specialization lattice in Figure 4.6 using multiple options.
Slide 4- 39
Step 9: Mapping of Union Types (Categories).
For mapping a category whose defining superclass have different keys, it is customary to specify a new key attribute, called a surrogate key, when creating a relation to correspond to the category. In the example below we can create a relation OWNER to correspond to the OWNER category and include any attributes of the category in this relation. The primary key of the OWNER relation is the surrogate key, which we called OwnerId.
Slide 4- 40
FIGURE 4.8 Two categories (union types): OWNER and REGISTERED_VEHICLE.
Slide 4- 41
FIGURE 7.6 Mapping the EER categories (union types) in Figure 4.7 to relations.
Slide 4- 42
Mapping Exercise
Exercise 7.4. FIGURE 7.7 An ER schema for a SHIP_TRACKING database.
Slide 4- 43

Chapter 4

Uploaded by

Copyright:

Available Formats

You might also like

Chapter 4

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 4

Uploaded by

Copyright:

Available Formats

Chapter 4

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3 Normal Forms Based on Primary Keys

3.1 Normalization of Relations (1)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Normalization of Relations (2)

2NF, 3NF, BCNF

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3.2 Practical Use of Normal Forms

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3.2 First Normal Form

Considered to be part of the definition of relation

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Figure 10.8 Normalization into 1NF

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Figure 10.9 Normalization nested relations into 1NF

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3.3 Second Normal Form (1)

Uses the concepts of FDs, primary key Definitions

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Second Normal Form (2)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Figure 10.10 Normalizing into 2NF and 3NF

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Figure 10.11 Normalization into 2NF and 3NF

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3.4 Third Normal Form (1)

SSN -> ENAME is non-transitive

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Third Normal Form (2)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

General Normal Form Definitions (2)

(a) X is a superkey of R, or (b) A is a prime attribute of R

NOTE: Boyce-Codd normal form disallows condition (b) above

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

5 BCNF (Boyce-Codd Normal Form)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Figure 10.12 Boyce-Codd normal form

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Figure 10.13 a relation TEACH that is in 3NF but not in BCNF

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Achieving the BCNF by Decomposition (1)

Two FDs exist in the relation TEACH:

fd1: { student, course} -> instructor fd2: instructor -> course

So this relation is in 3NF but not in BCNF

(See Algorithm 11.3)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Achieving the BCNF by Decomposition (2)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3. Multivalued Dependencies and Fourth Normal Form (1)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

3. Multivalued Dependencies and Fourth Normal Form (1)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Multivalued Dependencies and Fourth Normal Form (2)

Multivalued Dependencies and Fourth Normal Form (5)

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe

Mapping EER Model Constructs to Relations

Step8: Options for Mapping Specialization or Generalization.