Professional Documents
Culture Documents
Chapter 4
Chapter 4
Chapter 4
Normalization
Chapter Outline
3.1 Normalization of Relations 3.2 Practical Use of Normal Forms 3.3 First Normal Form 3.4 Second Normal Form 3.5 Third Normal Form
5 BCNF (Boyce-Codd Normal Form) 3.Multivalued Dependencies and Fourth Normal Form Mapping EER Model Constructs to Relations
Step 8: Options for Mapping Specialization or Generalization. Step 9: Mapping of Union Types (Categories).
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 4- 2
Normalization:
The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
Normal form:
Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form
Slide 4- 3
based on keys and FDs of a relation schema based on keys, multi-valued dependencies : MVDs; 5NF based on keys, join dependencies : JDs (Chapter 11)
4NF
Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation; Chapter 11)
Slide 4- 4
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect The database designers need not normalize to the highest possible normal form
(usually up to 3NF, BCNF or 4NF) The process of storing the join of higher normal form relations as a base relationwhich is in a lower normal form
Denormalization:
Slide 4- 5
Disallows
composite attributes multivalued attributes nested relations; attributes whose values for an individual tuple are non-atomic
Slide 4- 6
Slide 4- 7
Slide 4- 8
Prime attribute: An attribute that is member of the primary key K Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds
Examples:
Slide 4- 9
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key R can be decomposed into 2NF relations via the process of 2NF normalization
Slide 4- 10
Slide 4- 11
Slide 4- 12
Definition:
Transitive functional dependency: a FD X -> Z that can be derived from two FDs X -> Y and Y -> Z SSN -> DMGRSSN is a transitive FD
Examples:
Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold Since there is no set of attributes X where SSN -> X and X -> ENAME
Slide 4- 13
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key R can be decomposed into 3NF relations via the process of 3NF normalization NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency . E.g., Consider EMP (SSN, Emp#, Salary ).
Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Slide 4- 14
Definition:
Superkey of relation schema R - a set of attributes S of R that contains a key of R A relation schema R is in third normal form (3NF) if whenever a FD X -> A holds in R, then either:
Slide 4- 15
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X -> A holds in R, then X is a superkey of R Each normal form is strictly stronger than the previous one
Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF The goal is to have each relation in BCNF (or 3NF)
Slide 4- 16
Slide 4- 17
Slide 4- 18
{student, course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12 (b).
A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Three possible decompositions for relation TEACH {student, instructor} and {student, course} {course, instructor } and {course, student} {instructor, course } and {instructor, student} All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition. Out of the above three, only the 3rd decomposition will not generate spurious tuples after join.(and hence has the non-additivity property). A test to determine whether a binary decomposition (decomposition into two relations) is non-additive (lossless) is discussed in section 11.1.4 under Property LJ1. Verify that the third decomposition above meets the property.
Slide 4- 20
Slide 4- 21
Slide 4- 22
Slide 4- 23
Multivalued Dependencies and Fourth Normal Form (4) Definition: A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X >> Y in F+, X is a superkey for R.
Note: F+ is the (complete) set of all dependencies (functional or multivalued) that will hold in every relation state r of R that satisfies F. It is also called the closure of F.
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe
Slide 4- 24
Slide 4- 25
Convert each specialization with m subclasses {S1,S2, ,Sm} and generalized superclass C, where the attributes of C are {k,a1,an} and k is the (primary) key, into relational schemas using one of the four following options:
Option 8A: Multiple relations-Superclass and subclasses Option 8B: Multiple relations-Subclass relations only Option 8C: Single relation with one type attribute Option 8D: Single relation with multiple type attributes
Slide 4- 26
Create a relation L for C with attributes Attrs(L) = {k,a1, an} and PK(L) = k. Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attrs(Li) = {k} U {attributes of Si} and PK(Li) = k. This option works for any specialization (total or partial, disjoint of over-lapping.) Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attrs(Li) = {attributes of Si} U {k,a1,an} and PK(Li) = k. This option only works for a specialization whose subclasses are total (every entity in the superclass must belong to (at least) one of the subclasses).
Slide 4- 27
Slide 4- 28
FIGURE 7.4 Options for mapping specialization or generalization. (a) Mapping the EER schema in Figure 4.4 using option 8A.
Slide 4- 29
FIGURE 4.3 Generalization. (b) Generalizing CAR and TRUCK into the superclass VEHICLE.
Slide 4- 30
FIGURE 7.4 Options for mapping specialization or generalization. (b) Mapping the EER schema in Figure 4.3b using option 8B.
Slide 4- 31
Create a single relation L with attributes Attrs(L) = {k,a1,an} U {attributes of S1} UU {attributes of Sm} U {t} and PK(L) = k. The attribute t is called a type (or discriminating) attribute that indicates the subclass to which each tuple belongs Create a single relation schema L with attributes Attrs(L) = {k,a1,an} U {attributes of S1} UU {attributes of Sm} U {t1, t2, ,tm} and PK(L) = k. Each ti, 1 < I < m, is a Boolean type attribute indicating whether a tuple belongs to the subclass Si.
Slide 4- 32
Slide 4- 33
FIGURE 7.4 Options for mapping specialization or generalization. (c) Mapping the EER schema in Figure 4.4 using option 8C.
Slide 4- 34
Slide 4- 35
FIGURE 7.4 Options for mapping specialization or generalization. (d) Mapping Figure 4.5 using option 8D with Boolean type fields Mflag and Pflag.
Slide 4- 36
A shared subclass, such as STUDENT_ASSISTANT, is a subclass of several classes, indicating multiple inheritance. These classes must all have the same key attribute; otherwise, the shared subclass would be modeled as a category. We can apply any of the options discussed in Step 8 to a shared subclass, subject to the restriction discussed in Step 8 of the mapping algorithm. Below both 8C and 8D are used for the shared class STUDENT_ASSISTANT.
Slide 4- 37
FIGURE 4.7 A specialization lattice with multiple inheritance for a UNIVERSITY database.
Slide 4- 38
FIGURE 7.5 Mapping the EER specialization lattice in Figure 4.6 using multiple options.
Slide 4- 39
For mapping a category whose defining superclass have different keys, it is customary to specify a new key attribute, called a surrogate key, when creating a relation to correspond to the category. In the example below we can create a relation OWNER to correspond to the OWNER category and include any attributes of the category in this relation. The primary key of the OWNER relation is the surrogate key, which we called OwnerId.
Slide 4- 40
Slide 4- 41
FIGURE 7.6 Mapping the EER categories (union types) in Figure 4.7 to relations.
Slide 4- 42
Mapping Exercise
Exercise 7.4. FIGURE 7.7 An ER schema for a SHIP_TRACKING database.
Slide 4- 43