Professional Documents
Culture Documents
Normalization
Normalization
and
Database Normalization
Functional Dependencies
• Redundancy: root of several problems with
relational schemas:
– redundant storage, insert/delete/update anomalies
• FD, a form of integrity constraint that can
identify schemas with such problems and suggest
refinements.
• X → Y means
– Given any two tuples in r, if the X values are the
same, then the Y values must also be the same. (but
not vice versa)
• Read “→” as “determines”
Example Functional Dependencies
• Let R be NewStudent(stuId, lastName, major, credits, status, socSecNo)
• FDs in R include
– {stuId}→{lastName}, but not the reverse
– {stuId} →{lastName, major, credits, status, socSecNo, stuId}
– {socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
– {credits}→{status}, but not {status}→{credits}
• ZipCode→AddressCity
– 16652 is Huntingdon’s ZIP
• ArtistName→BirthYear
– Picasso was born in 1881
• Autobrand→Manufacturer, Engine type
– Pontiac is built by General Motors with gasoline engine
• Author, Title→PublDate
– Shakespeare’s Hamlet was published in 1600
• Trivial Functional Dependency
– The FD X→Y is trivial if set {Y} is a subset of set {X}
– Examples: If A and B are attributes of R,
• {A}→{A}
• {A,B} →{A}
• {A,B} →{B}
• {A,B} →{A,B}
– are all trivial FDs and will not contribute to the evaluation of normalization.
FD Axioms
• Consider the following attributes and functional dependencies: A B C D E F H
A--> D
AE --> H
DF --> BC
E --> C
H --> E
a) List all keys (not superkeys):
(Hint: Given a relation R and a set of functional dependencies F, the key K is found as follows:
Initially set K:=R
For each attribute A in R, do {
Compute {K-A}+
If {K-A}+ contains all attributes in R, then set K={K-A}
} end
Keys: AEF, AFH
b) Which of the following dependencies are implied by those above:
– A→AD implied
– A→DH not implied
– AED→C implied
– DH→C implied
– ADF→E not implied
Normalization - Definition
• This is the process which allows you to clean out
redundant data within your database.
• This involves restructuring the tables to successively
meeting higher forms of Normalization.
• A properly normalized database should have the
following characteristics
– Scalar values in each fields
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.
Levels of Normalization
• Levels of normalization based on the amount of
redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
Redundancy
Number of Tables
– Third Normal Form (3NF)
Complexity
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)
0-55-123456-9 Main Street Small House 714-000-0000 $22.95 0-55-123456-9 Jones 123-333-3333
a)Candidate keys: L
2NF…There is no partial dependency. Not 3NF ... e.g. due to transitive dependency (M →
K) (K is not part of a key).
b) Candidate keys: LN
Not 2NF, After decomposing, 3NF ... There is no transitive dependency. BCNF ... every
determinant in that table is a candidate key.
c)Candidate keys: KLM LMN
Assuming KLM as primary key; R is 3NF ... There is no transitive dependency. R is not
BCNF ... e.g. in N → K, N does not contain a key.
d)Candidate keys: K
2NF… There is no partial dependency. Not 3NF ... e.g. due to transitive dependency (LM
→ N) (LM does not contain a key).
R = {A, B, C, D, E, F, G, H, I}
F = { {A, B} -> {C},
{A} -> {D, E},
{B} -> {F},
{F} ->{G, H},
{D} -> {I, J} }
What is the key for R? Decompose R into 2NF, then 3NF relations.
R = {A, B, C, D, E, F, G, H, I}
F = { {A, B} -> {C},
{A} -> {D, E},
{B} -> {F},
{F} ->{G, H},
{D} -> {I, J} }
What is the key for R? Decompose R into 2NF, then 3NF relations.
Key : AB
{A}+ = {A, D, E, I, J} and {B}+ = {B, F, G, H}, Not 2NF,
R1 = {A, D, E, I, J}, R2 = {B, F, G, H}, R3 = {A, B, C}
{A} -> {D} -> {I, J} Not 3NF,
R11 = {D, I, J}, R12 = {A, D, E}
{B} -> {F} -> {G, H} Not 3NF,
R21 = {F, G, H}, R22 = {B, F}
The final set of relations in 3NF are {R11, R12, R21, R22, R3}
R = {A, B, C, D, E, F, G, H, I}
G = { {A, B} -> {C},
{B, D} -> {E, F},
{A, D} -> {G, H},
{A} -> {I},
{H} -> {J} }.
What is the key for R? Decompose R into 2NF, then 3NF relations.
R = {A, B, C, D, E, F, G, H, I}
G = { {A, B} -> {C},
{B, D} -> {E, F},
{A, D} -> {G, H},
{A} -> {I},
{H} -> {J} }.
What is the key for R? Decompose R into 2NF, then 3NF relations.
Key : ABD
{A, B} -> {C, I}, {B, D} -> {E, F}, {A, D} -> {G, H, I, J} Not 2NF,
R1 = {A, B, C}, R2 = {B, D, E, F}, R3 = {A, D, G, H, J}, R4 = {A, B, D}, R5 = {A, I}
{A, D} -> {H} -> {J} Not 3NF,
R31 = {H, J}, R32 = {A, D, G, H}
The final set of 3NF relations is {R1, R2, R31, R32, R4, R5}
Consider a relation R(A,B,C,D,E) with the following dependencies:
AB -> C
CD -> E
DE -> B
If not, is ABD?