Normalization


Employee Information (Employee schema)

Emp-No  Emp-Name  Dept          Manager  Proj-id  Proj-Start-Date  Location  Weeks-on-Project
005     Smith     Marketing     Jones    A        12-93            Peking    11
                                         B        6-94             Kolkata   15
                                         C        09-94            Delhi     6
007     Bond      Accounts      Bloggs   B        06-94            Kolkata   3
                                         D        06-94            Berlin    9
009     King      Info Systems  Hurne    C        09-94            Delhi     10
010     Holt      Accounts      Bloggs   A        12-93            Peking    21
                                         B        06-94            Belfast   10
                                         D        06-94            Hamburg   12
Store the information of emp-project in a DBMS.
Data Redundancy
Emp-No Emp-Name Dept Manager Proj-id Proj-Start-Date Location Weeks-on-Project
005 Smith Marketing Jones A 12-93 Peking 11
005 Smith Marketing Jones B 6-94 Kolkata 15
005 Smith Marketing Jones C 09-94 Delhi 6
007 Paul Accounts James B 06-94 Kolkata 3
007 Paul Accounts James D 06-94 Berlin 9
009 King Systems Hurne C 09-94 Delhi 10
010 Holt Accounts James A 12-93 Peking 21
010 Holt Accounts James B 06-94 Belfast 10
 What is the problem of data redundancy?
Insert Anomalies
Emp-No Emp-Name Dept Manager Proj-id Proj-Start-Date Location Weeks-on-Project
005 Smith Marketing Jones A 12-93 Peking 11
005 Smith Marketing Jones B 6-94 Kolkata 15
005 Smith Marketing Jones C 09-94 Delhi 6
007 Paul Accounts James B 06-94 Kolkata 3
007 Paul Accounts James D 06-94 Berlin 9
009 King Systems Hurne C 09-94 Delhi 10
010 Holt Accounts James A 12-93 Peking 21
010 Holt Accounts James B 06-94 Belfast 10
006 John Accounts James C 09-94 Mumbai 15

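A new employee (006) is assigned to project C in Mumbai for 15 weeks. To insert this fact, every other attribute of the row (name, department, manager, project start date) must also be entered, and entered consistently with the existing rows, or the database becomes inconsistent. Project information alone cannot be added: a project with no employee assigned has no row to hold it.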
Delete Anomalies
Emp-No Emp-Name Dept Manager Proj-id Proj-Start-Date Location Weeks-on-Project
005 Smith Marketing Jones A 12-93 Peking 11
005 Smith Marketing Jones B 6-94 Kolkata 15
005 Smith Marketing Jones C 09-94 Delhi 6
007 Paul Accounts James B 06-94 Kolkata 3
007 Paul Accounts James D 06-94 Berlin 9
009 King Systems Jack C 09-94 Delhi 10
010 Holt Accounts James A 12-93 Peking 21
010 Holt Accounts James B 06-94 Belfast 10
006 John Accounts James C 09-94 Delhi 6

Suppose employee 007 resigns. Deleting that employee's tuples also deletes the only record of project D, so the information about project D is lost.
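The delete anomaly can be reproduced in a few lines of Python. This is only a sketch: the tuple layout and the helper name `project_ids` are assumptions made for illustration, and the rows are copied from the table above.

```python
# Each tuple: (emp_no, emp_name, dept, manager, proj_id, start, location, weeks)
rows = [
    ("005", "Smith", "Marketing", "Jones", "A", "12-93", "Peking", 11),
    ("005", "Smith", "Marketing", "Jones", "B", "6-94", "Kolkata", 15),
    ("005", "Smith", "Marketing", "Jones", "C", "09-94", "Delhi", 6),
    ("007", "Paul", "Accounts", "James", "B", "06-94", "Kolkata", 3),
    ("007", "Paul", "Accounts", "James", "D", "06-94", "Berlin", 9),
    ("009", "King", "Systems", "Jack", "C", "09-94", "Delhi", 10),
    ("010", "Holt", "Accounts", "James", "A", "12-93", "Peking", 21),
    ("010", "Holt", "Accounts", "James", "B", "06-94", "Belfast", 10),
    ("006", "John", "Accounts", "James", "C", "09-94", "Delhi", 6),
]


def project_ids(table):
    """Return the set of project ids that the table still knows about."""
    return {row[4] for row in table}


# Employee 007 resigns: remove every tuple for that employee.
remaining = [row for row in rows if row[0] != "007"]

# Projects that existed before the delete but are no longer recorded anywhere.
lost_projects = project_ids(rows) - project_ids(remaining)
print(sorted(lost_projects))  # ['D']
```

Project B survives because other employees also work on it; project D disappears because 007 was its only employee.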
Update Anomalies
Emp-No Emp-Name Dept Manager Proj-id Proj-Start-Date Location Weeks-on-Project
005 Smith Marketing Tom A 12-93 Peking 11
005 Smith Marketing Tom B 6-94 Kolkata 15
005 Smith Marketing Tom C 09-94 Delhi 6
007 Paul Accounts James B 06-94 Kolkata 3
007 Paul Accounts James D 06-94 Berlin 9
009 King Systems Hurne C 09-94 Delhi 10
010 Holt Accounts James A 12-93 Peking 21
010 Holt Accounts James B 06-94 Belfast 10
006 John Accounts James C 09-94 Delhi 6

A new manager, 'Tom', is assigned to the Marketing department. Because the manager is repeated in every Marketing row, this single change requires updating many rows. If any of those rows is missed, the data becomes inconsistent.
Normalization
 Definition: Normalization is the process of analyzing the given relation schemas, based on their FDs and primary keys, to achieve the desirable properties of
➢ minimizing redundancy and
➢ minimizing the insertion, deletion and update anomalies.
First Normal Form
 First Normal Form:
 Definition: A relation R is in first normal form if and only if
all underlying domains contain atomic (i.e. simple &
indivisible) values only.

 First normal form was defined to disallow:


 Multi-valued attributes
 Composite attributes and
 The combination of both
Identifying Primary Key
 The most difficult part of constructing a first normal form set
of relations is identifying the primary key for each relation.

 It is best to select primary key attributes that reduce the number of attributes that repeat. A primary key should have the following properties:
➢ stable
➢ minimal
➢ definite
➢ accessible
First Normal Form
 Case 1 (composite attribute):
 Employee_Schema(emp_id, ename, eaddr, salary)
 Let emp_id have the form IT-212, where IT is the department name and 212 is the unique employee number.
 Employee_Schema is not in 1NF.
 Disadvantage of a composite attribute: extra programming is required to extract the components.
 Solution: store only the component attributes.
First Normal Form
 Case 2 (Multivalued attribute):
 Department_Schema( D-number, dname, d-mgr-ssn, d-loc)

D-number  dname          D-mgr-ssn  D-loc
5         Research       M-100      {kolkata, Delhi, Mumbai}
4         Administrator  M-101      bangalore
1         Headquarters   M-102      kolkata

 Department_schema is not in 1NF as D-loc is a multivalued attribute.


 What will be the solution?
First Normal Form
 1st solution:
 Department_schema(D-number, dname, d-mgr-ssn, d-loc)

D-number  dname          D-mgr-ssn  D-loc
5         Research       M-100      kolkata
5         Research       M-100      Delhi
5         Research       M-100      Mumbai
4         Administrator  M-101      bangalore
1         Headquarters   M-102      kolkata

It is not free from the insert, delete & update anomalies


2nd Solution:
 Remove the multi-valued attribute.
 Place it in a separate relation along with the primary key (which is chosen without considering the multi-valued attribute).
Identify FDs & primary key
D-number  dname          D-mgr-ssn  D-loc
5         Research       M-100      {kolkata, Delhi, Mumbai}
4         Administrator  M-101      bangalore
1         Headquarters   M-102      kolkata

FDs: Dnumber → dname D-number→ D-mgr-ssn

If dloc is not considered dnumber will be the primary key.


First Normal Form
2nd Solution:
 Remove the attribute d-loc that violates 1NF.
 Place it in a separate relation along with the primary key
d-no.
Department(D-number, dname, d-mngr-ssn, d-loc)
is decomposed into
Dept(D-number, dname, d-mngr-ssn)
Dept_location(D-number, d-loc)
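The second solution can be sketched in Python. The dictionary keys (`dnumber`, `mgr_ssn`, `dloc`) and the variable names are assumptions chosen for illustration; the values come from the Department table above.

```python
# Un-normalized Department: dloc holds a set of values (not atomic).
department = [
    {"dnumber": 5, "dname": "Research", "mgr_ssn": "M-100",
     "dloc": ["kolkata", "Delhi", "Mumbai"]},
    {"dnumber": 4, "dname": "Administrator", "mgr_ssn": "M-101",
     "dloc": ["bangalore"]},
    {"dnumber": 1, "dname": "Headquarters", "mgr_ssn": "M-102",
     "dloc": ["kolkata"]},
]

# Dept(D-number, dname, d-mngr-ssn): everything except the multi-valued attribute.
dept = [(d["dnumber"], d["dname"], d["mgr_ssn"]) for d in department]

# Dept_location(D-number, d-loc): one atomic location per row,
# carrying the primary key of Dept along with it.
dept_location = [(d["dnumber"], loc) for d in department for loc in d["dloc"]]

for row in dept_location:
    print(row)
```

Every value in both relations is now atomic, and the department name and manager are stored once per department rather than once per location.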
Guidelines to construct 1NF Relations
1. Identify the set of multi-valued attributes (repeating group).
2. Select a primary key for the set of attributes without considering the multi-valued attributes.
3. Create a relation from the attributes which are not in the repeating group.
4. Make the key of this relation the key identified in (2).
5. Identify the primary key for the repeating group by taking each value of the primary key in (2) and identifying a unique attribute(s) in the repeating group.
6. Create a relation from the attributes in the repeating group and the primary key identified in (2).
7. Make the key for this relation the key identified in (5) and (if the key in (5) is not unique) the key identified in (2).
8. If the repeating group itself contains a set of attributes which repeat, then apply the guidelines from (2).
EMPLOYEE relation
Emp-No  Emp-Name  Dept          Manager  Proj-id  Proj-Start-Date  Location  Weeks-on-Project
005     Smith     Marketing     Jones    A        12-93            Peking    11
                                         B        6-94             Kolkata   15
                                         C        09-94            Delhi     6
007     Bond      Accounts      Bloggs   B        06-94            Kolkata   3
                                         D        06-94            Berlin    9
009     King      Info Systems  Hurne    C        09-94            Delhi     10
010     Holt      Accounts      Bloggs   A        12-93            Peking    21
                                         B        06-94            Belfast   10
                                         D        06-94            Hamburg   12
Information of Employee
Employee(emp-no, emp-name, dept, manager, proj-id, proj-start-date, location, weeks-on-project)
 Every employee belongs to a particular department, which has a manager.
 Each employee may be assigned to more than one project.
 Every project starts on a particular date.
 Employees assigned to a project may work at different branches, which are in different locations.
 Each employee works on a particular project for some number of weeks.
 EMPLOYEE is not in 1NF!
1st Solution (with many repetitions)
Emp-No Emp-Name Dept Manager Proj-id Proj-Start-Date Location Weeks-on-Project
005 Smith Marketing Jones A 12-93 Peking 11
005 Smith Marketing Jones B 6-94 Kolkata 15
005 Smith Marketing Jones C 09-94 Delhi 6
007 Paul Accounts James B 06-94 Kolkata 3
007 Paul Accounts James D 06-94 Berlin 9
009 King Systems Hurne C 09-94 Delhi 10
010 Holt Accounts James A 12-93 Peking 21
010 Holt Accounts James B 06-94 Belfast 10
Identify FDs & primary key
 FDs:
 emp-no → {emp-name, dept, manager}
 dept → manager
 proj-id → proj-start-date
 {emp-no, proj-id} → {location, weeks-on-project}

 Multi-valued attributes (repeating group):
proj-id, proj-start-date, location, weeks-on-project

 Primary key without the multi-valued attributes:
emp-no
Solution
 The primary key is emp-no.
 The set of attributes which repeat for each value of emp-no is
 proj-id, proj-start-date, location, and weeks-on-project.
 Removing these attributes from the full attribute set produces the relation:
 (emp-no, emp-name, dept, manager)
 The primary key of this relation is emp-no.
Solution
 The primary key for the repeating group is proj-id, since for each emp-no the proj-id uniquely identifies the proj-start-date, location, and weeks-on-project attributes.
 The new relation is:
 (emp-no, proj-id, proj-start-date, location, weeks-on-project)
 The primary key for the repeating group, proj-id, is not unique in this relation, so the key of this relation is (emp-no, proj-id).
Solution
 There are no more repeating groups.
 The first normal form relations are:
 employee1(emp-no, emp-name, dept, manager)
 emp-on-p1(emp-no, proj-id, proj-start-date, location, weeks-on-project)
Properties of Relational Decompositions
 Attribute preservation condition: Each attribute in R appears in at least one relation schema Ri in the decomposition, so that no attributes are “lost”.
 Dependency preservation property: A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is, (π_R1(F) ∪ ... ∪ π_Rm(F))+ = F+
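Dependency preservation can be tested without computing the projections of F explicitly. The sketch below uses the standard iterative test: starting from the left-hand side of an FD, it repeatedly adds whatever can be derived inside a single Ri. The relation R(A, B, C), its FDs and the function names are hypothetical and chosen only for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """True if fd = (lhs, rhs) can be enforced using only the projected FDs."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Attributes of ri derivable from the part of result visible in ri.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


# Hypothetical example: R(A, B, C) with A -> B and B -> C.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
good = [{"A", "B"}, {"B", "C"}]
bad = [{"A", "B"}, {"A", "C"}]

print([preserves(fd, good, F) for fd in F])  # [True, True]
print([preserves(fd, bad, F) for fd in F])   # [True, False]
```

In the second decomposition B and C never appear together in one relation, so B → C cannot be checked without a join.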
Properties of Relational Decompositions
Lossless (Non-additive) Join Property of a Decomposition:
A decomposition D = {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:
*(π_R1(r), ..., π_Rm(r)) = r

Note: The word "loss" in lossless refers to loss of information, not to loss of tuples. A lossy join does not drop tuples; it adds spurious ones, so "addition of spurious information" is the more accurate description.
Decomposition with loss of information
R (FDs: {A1,A2}→A3, A2→A4)
A1  A2  A3  A4
a   1   ab  X
b   2   ab  Y
a   2   bc  Y

R1
A1  A2  A3
a   1   ab
b   2   ab
a   2   bc

R2
A1  A4
a   X
b   Y
a   Y

Join of R1 and R2 over A1 (lossy)
A1  A2  A3  A4
a   1   ab  X
a   1   ab  Y   (spurious tuple)
b   2   ab  Y
a   2   bc  Y
a   2   bc  X   (spurious tuple)
Loss-less Decomposition
R (FDs: {A1,A2}→A3, A2→A4)
A1  A2  A3  A4
a   1   ab  X
b   2   ab  Y
a   2   bc  Y

R1
A1  A2  A3
a   1   ab
b   2   ab
a   2   bc

R2
A2  A4
1   X
2   Y

Join of R1 and R2 over A2
A1  A2  A3  A4
a   1   ab  X
b   2   ab  Y
a   2   bc  Y
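Both decompositions of R can be replayed in Python to see where the spurious tuples come from. This is a minimal sketch: the helpers `project` and `natural_join` are illustrative names, not a library API, and tuples are modelled as frozensets of (attribute, value) pairs.

```python
def project(rows, attrs):
    """Projection onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


attrs = ["A1", "A2", "A3", "A4"]
R = [
    {"A1": "a", "A2": 1, "A3": "ab", "A4": "X"},
    {"A1": "b", "A2": 2, "A3": "ab", "A4": "Y"},
    {"A1": "a", "A2": 2, "A3": "bc", "A4": "Y"},
]
r = project(R, attrs)
r1 = project(R, ["A1", "A2", "A3"])

lossy = natural_join(r1, project(R, ["A1", "A4"]))     # shared attribute A1
lossless = natural_join(r1, project(R, ["A2", "A4"]))  # shared attribute A2

spurious = sorted(tuple(dict(t)[a] for a in attrs) for t in lossy - r)
print(spurious)       # [('a', 1, 'ab', 'Y'), ('a', 2, 'bc', 'X')]
print(lossless == r)  # True
```

Joining over A1 fails because A1 determines neither side; joining over A2 works because A2 → A4 makes A2 a key of R2.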
Properties of Relational Decompositions
 Lossless Binary Decomposition:
A decomposition D = {R1, R2} of R has the lossless join property with respect to a set of functional dependencies F on R if and only if either
 (R1 ∩ R2) → R1 is in F+, or
 (R1 ∩ R2) → R2 is in F+.
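The binary test reduces to one attribute-closure computation: the common attributes must determine all of R1 or all of R2. A minimal sketch follows; the function names are assumptions, and the example reuses R(A1, A2, A3, A4) with FDs {A1,A2}→A3 and A2→A4.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [({"A1", "A2"}, {"A3"}), ({"A2"}, {"A4"})]
R1 = {"A1", "A2", "A3"}

print(lossless_binary(R1, {"A2", "A4"}, F))  # True
print(lossless_binary(R1, {"A1", "A4"}, F))  # False
```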

Properties of Relational Decompositions
Algorithm : Testing for Lossless Join Property
Input: A universal relation R, a decomposition D = {R1, R2, ..., Rm}
of R, and a set F of functional dependencies.

1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each attribute Aj in R.
2. Set S(i,j) := bij for all matrix entries. (* each bij is a distinct symbol associated with indices (i,j) *)
3. For each row i representing relation schema Ri
{for each column j representing attribute Aj
{if (relation Ri includes attribute Aj) then set S(i,j) := aj;};};
(* each aj is a distinct symbol associated with index (j) *)
Properties of Relational Decompositions
Algorithm: Testing for Lossless Join Property (cont.)
4. Repeat the following loop until a complete loop execution results in no changes to S
{for each functional dependency X → Y in F
{for all rows in S which have the same symbols in the columns corresponding to attributes in X
{make the symbols in each column that corresponds to an attribute in Y be the same in all these rows as follows: if any of the rows has an “a” symbol for the column, set the other rows to that same “a” symbol in the column. If no “a” symbol exists for the attribute in any of the rows, choose one of the “b” symbols that appear in one of the rows for the attribute and set the other rows to that same “b” symbol in the column;};};};
5. If a row is made up entirely of “a” symbols, then the decomposition has the lossless join property; otherwise it does not.
Is decomposition of R lossless?
 R= {ssn, ename, pno, ploc, pname,hours}
 F = {ssn → ename, pno → {pname, ploc}, {ssn, pno} → hours}
 R is decomposed into
 R1(ssn ,ename)
 R2(pno, pname,ploc)
 R3(ssn, pno, hours)
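After executing steps 1, 2, 3:
     ssn  ename  pno  ploc  pname  hours
R1   a1   a2     b13  b14   b15    b16
R2   b21  b22    a3   a4    a5     b26
R3   a1   b32    a3   b34   b35    a6

After executing steps 4, 5:
     ssn  ename  pno  ploc  pname  hours
R1   a1   a2     b13  b14   b15    b16
R2   b21  b22    a3   a4    a5     b26
R3   a1   a2     a3   a4    a5     a6
Row R3 consists entirely of “a” symbols, so the decomposition is lossless.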
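The five steps of the matrix test can be written as a short Python function and run on this example. This is a sketch under stated assumptions: the function name `lossless_join` and the string encoding of the symbols ("a3", "b21") are illustrative choices, and when no "a" symbol is available the code picks the smallest "b" symbol, which is one valid way to "choose one of the b symbols".

```python
def lossless_join(attrs, decomposition, fds):
    """Matrix test for the lossless join property; returns (is_lossless, S)."""
    col = {a: j for j, a in enumerate(attrs)}
    # Steps 1-3: a_j where Ri contains attribute Aj, otherwise b_ij.
    S = [[f"a{j + 1}" if a in ri else f"b{i + 1}{j + 1}"
          for j, a in enumerate(attrs)]
         for i, ri in enumerate(decomposition)]
    # Step 4: apply the FDs until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in S:
                groups.setdefault(tuple(row[col[a]] for a in lhs), []).append(row)
            for rows in groups.values():
                for a in rhs:
                    j = col[a]
                    symbols = {row[j] for row in rows}
                    if len(symbols) < 2:
                        continue
                    a_symbols = [s for s in symbols if s.startswith("a")]
                    target = a_symbols[0] if a_symbols else min(symbols)
                    for row in rows:
                        if row[j] != target:
                            row[j] = target
                            changed = True
    # Step 5: lossless if some row consists only of "a" symbols.
    return any(all(s.startswith("a") for s in row) for row in S), S


attrs = ["ssn", "ename", "pno", "ploc", "pname", "hours"]
D = [{"ssn", "ename"}, {"pno", "pname", "ploc"}, {"ssn", "pno", "hours"}]
F = [(["ssn"], ["ename"]), (["pno"], ["pname", "ploc"]), (["ssn", "pno"], ["hours"])]

ok, S = lossless_join(attrs, D, F)
print(ok)    # True
print(S[2])  # ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']
```

The row for R3 picks up ename from R1 (via ssn → ename) and pname, ploc from R2 (via pno → {pname, ploc}), which is why it ends up with only "a" symbols.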
Second normal form (2NF)
(based on the concept of full functional dependency)
Full & partial functional dependency
 Full functional dependency: An FD X→Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more, i.e. for every A ∈ X, (X-{A}) does not functionally determine Y.
 Partial dependency: An FD X→Y is a partial dependency if some attribute A ∈ X can be removed from X and the dependency still holds, i.e. for some A ∈ X, (X-{A})→Y.
 A partial dependency on the primary key can only occur in a relation with a composite key.
Full & partial functional dependency
 Emp-proj-schema(ssn, pno, hours, ename, pname, p-loc)
 FDs: { ssn → ename, pno → {pname, p-loc}, {ssn, pno} → hours, {ssn, pno} → ename }
 Examples (full functional dependency):
ssn → ename, pno → {pname, p-loc}, {ssn, pno} → hours
 Example (partial functional dependency):
{ssn, pno} → ename, as ssn alone functionally determines ename.
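Whether an FD is full or partial can be checked mechanically: drop each left-hand-side attribute in turn and see whether the rest still determines the right-hand side. The sketch below assumes a hypothetical helper `is_partial` built on attribute closure, with the attribute p-loc written as `ploc`.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_partial(lhs, rhs, fds):
    """True if some attribute can be removed from lhs and lhs -> rhs still holds."""
    return any(rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [
    ({"ssn"}, {"ename"}),
    ({"pno"}, {"pname", "ploc"}),
    ({"ssn", "pno"}, {"hours"}),
]

print(is_partial({"ssn", "pno"}, {"ename"}, F))  # True: ssn alone suffices
print(is_partial({"ssn", "pno"}, {"hours"}, F))  # False: full dependency
print(is_partial({"ssn"}, {"ename"}, F))         # False: single-attribute lhs
```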
Second Normal form

Definition: A relation R is in second normal form (2 NF)


if and only if
 it is in 1NF and
 every non-key attribute is fully functionally dependent on
the primary key.
Example of 2NF
Emp-proj-schema( ssn, pno, hours, pname, p-loc)
FDs:
 P-no→ {pname, p-loc}
 {ssn, p-no}→ hours

Q. Is this relation in 2NF ?


Example of 2NF
 Emp-proj-schema is not in 2NF.
 Primary key: {ssn, pno}
 FD for which the relation schema violates 2NF:
pno → {pname, ploc}
 Partial dependency on the primary key:
{ssn, pno} → {pname, ploc}
 Emp-proj-schema should be decomposed into 2NF relations.
Second Normal Form

Emp-proj-schema(ssn, pno, hours, pname, p-loc)
Solution: decompose into
Emp1-schema(ssn, pno, hours), with {ssn, pno} → hours
Emp2-schema(pno, pname, p-loc), with pno → {pname, p-loc}
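A simple 2NF check can be sketched as follows. It is deliberately limited: it inspects only the FDs as listed (not everything derivable from them) and assumes a single primary key, and the name `partial_dependencies` is illustrative.

```python
def partial_dependencies(key, fds):
    """List the given FDs whose lhs is a proper subset of the key
    and whose rhs contains at least one non-key attribute."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        non_key = set(rhs) - key
        if set(lhs) < key and non_key:
            found.append((sorted(lhs), sorted(non_key)))
    return found


# Emp-proj-schema(ssn, pno, hours, pname, p-loc), key {ssn, pno}
before = partial_dependencies(
    ["ssn", "pno"],
    [(["pno"], ["pname", "ploc"]), (["ssn", "pno"], ["hours"])],
)
print(before)  # [(['pno'], ['ploc', 'pname'])]

# After decomposition each relation is checked against its own key.
emp1 = partial_dependencies(["ssn", "pno"], [(["ssn", "pno"], ["hours"])])
emp2 = partial_dependencies(["pno"], [(["pno"], ["pname", "ploc"])])
print(emp1, emp2)  # [] []
```

An empty result for both Emp1-schema and Emp2-schema is consistent with the decomposition being in 2NF.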
Why 2NF?
 The relation Emp-proj-schema(ssn, pno, hours, ename, pname, p-loc), which is not in 2NF, suffers from anomalies with respect to insertion, deletion and update operations.
 Insertion: We cannot record an employee's name and ssn until the employee is allocated to a project.
 Deletion: If we delete a project's information after its completion, the deleted tuples also contain employee information, which may be lost.
Why 2NF?
 Update Operation: The name of an employee may appear
many times if he is allocated to many projects. This
redundancy causes problems in update operation.
EMPLOYEE relation
Emp-No  Emp-Name  Dept          Manager  Proj-id  Proj-Start-Date  Location  Weeks-on-Project
005     Smith     Marketing     Jones    A        12-93            Peking    11
                                         B        6-94             Kolkata   15
                                         C        09-94            Delhi     6
007     Bond      Accounts      Bloggs   B        06-94            Kolkata   3
                                         D        06-94            Berlin    9
009     King      Info Systems  Hurne    C        09-94            Delhi     10
010     Holt      Accounts      Bloggs   A        12-93            Peking    21
                                         B        06-94            Belfast   10
                                         D        06-94            Hamburg   12
2NF on EMPLOYEE
 EMPLOYEE relation is already normalized in 1NF and
the new relations are
• Employee1(emp-no, emp-name, dept, manager)
• Emp-on-p1(emp-no, proj-id, proj-start-date, location,
weeks-on-project)

 Employee1 is in 2NF as the non-key attributes are fully


functionally dependent on primary key.
 Primary key={emp-no} (Not a composite key)
 FDs: {emp-no} →{emp-name, dept, manager}
dept → manager
2NF on EMP-PROJECT

 Emp-on-p1 is not in 2NF.


 Primary key={emp-no, proj-id} (composite key)

 FDs: {emp-no,proj-id}→{ location, weeks-on-project}


proj-id→proj-start-date

What is the solution?


EMP-PROJECT in 2NF
Applying 2NF on EMP-PROJECT we get:
 Employee1(emp-no, emp-name, dept, manager)
 Project1(proj-id, proj-start-date)
 Emp-p2(emp-no, proj-id, location, weeks-on-project)


Third normal form (3NF)
(based on the concept of transitive dependency)
Transitive dependency
Definition:
An FD X→Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X→Z and Z→Y hold.

X→Z, Z→Y
i.e. X→Z→Y
i.e. X→Y

All dependencies are non-trivial FDs.


Third Normal Form (3NF)
Definition:
A relation R is in 3NF if and only if
 it is in 2NF and
 every non-key attribute is non-transitively dependent
on the primary key.
Example
 Sales( Sid , status, city),
where each Sid value determines a city value and each city
has its status.

 FDs will be:


 Sid → city
 City→ status
Sid →status (transitive dependency holds)
Example
 Relation schema Sales(Sid, status, city),
Primary key: Sid
FDs: Sid→{city, status}, city→status
 The non-key attributes (status, city) are fully functionally dependent on the primary key Sid. Hence the Sales relation schema is in 2NF.
 But status depends on Sid transitively through city (Sid→city, city→status). So, it is not in 3NF.

What is the solution?


Solution
Decompose the Sales relation into relations which do not hold any transitive dependency.
Sales(Sid, status, city)
is decomposed into
SC(Sid, city), with Sid → city
CT(city, status), with city → status
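The 3NF condition can be checked with a short Python sketch. It uses the equivalent general formulation (for every FD X → A, either X is a superkey or A is a prime attribute) rather than the transitive-dependency wording; the function name is an assumption, and only the FDs as listed are examined.

```python
def violations_3nf(candidate_keys, fds):
    """Return the listed FDs X -> A where X is not a superkey
    and A is not part of any candidate key."""
    keys = [set(k) for k in candidate_keys]
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        is_superkey = any(k <= lhs for k in keys)
        non_prime = set(rhs) - prime - lhs
        if not is_superkey and non_prime:
            bad.append((sorted(lhs), sorted(non_prime)))
    return bad


# Sales(Sid, status, city) with key Sid
sales = violations_3nf([["Sid"]], [(["Sid"], ["city"]), (["city"], ["status"])])
print(sales)  # [(['city'], ['status'])]

# After decomposition into SC(Sid, city) and CT(city, status)
sc = violations_3nf([["Sid"]], [(["Sid"], ["city"])])
ct = violations_3nf([["city"]], [(["city"], ["status"])])
print(sc, ct)  # [] []
```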


Why 3NF?
Transitive dependencies lead to insertion, deletion and update anomalies.
 Insertion: We cannot record the fact that a particular city has a status value until some supplier is located in that city.
 Deletion: If we delete a tuple of the Sales relation, we destroy not only the information for the supplier concerned but possibly also the information that a city has a particular status value.
 Update operation: The status value for a given city appears many times in the relation; such redundancy may lead to an inconsistent state after updating the status of a city.
EMP-PROJECT relation
Emp-No  Emp-Name  Dept          Manager  Proj-id  Proj-Start-Date  Location  Weeks-on-Project
005     Smith     Marketing     Jones    A        12-93            Peking    11
                                         B        6-94             Kolkata   15
                                         C        09-94            Delhi     6
007     Bond      Accounts      Bloggs   B        06-94            Kolkata   3
                                         D        06-94            Berlin    9
009     King      Info Systems  Hurne    C        09-94            Delhi     10
010     Holt      Accounts      Bloggs   A        12-93            Peking    21
                                         B        06-94            Belfast   10
                                         D        06-94            Hamburg   12
EMP-PROJECT after 2NF decomposition
Applying 2NF on EMP-PROJECT we get:
 Employee1(emp-no, emp-name, dept, manager)
 Project1(proj-id, proj-start-date)
 Emp-p2(emp-no, proj-id, location, weeks-on-project)


Applying 3NF in EMP-PROJECT
Employee1(emp-no, emp-name, dept, manager)
FDs: {emp-no} →{emp-name, dept, manager}
dept → manager

Employee1 holds transitive dependency.


Emp-no → dept → manager
Hence it is not in 3NF.

Employee1 can be decomposed into two relations:


Department(dept, manager)
Employee3(emp-no, emp-name, dept)
3NF in EMP-PROJECT

 Project1(proj-id, proj-start-date)
 FDs: proj-id→proj-start-date
No transitive dependency.

 Emp-p2(emp-no, proj-id, location, weeks-on-project)
 FDs: {emp-no, proj-id}→{location, weeks-on-project}
No transitive dependency.
EMP-PROJECT in 3NF

Department(dept, manager)

Employee3(emp-no, emp-name, dept)

Project1(proj-id, proj-start-date)

Emp-p2(emp-no, proj-id, location, weeks-on-project)


Algorithm: Relational Synthesis into 3NF with Dependency Preservation and Lossless Join Property
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
1. Find a canonical cover G for F.
2. For each left-hand side X of a functional dependency that appears in G, create a relation schema in D with attributes {X ∪ {A1} ∪ {A2} ... ∪ {Ak}}, where X → A1, X → A2, ..., X → Ak are the only dependencies in G with X as left-hand side (X is the key of this relation).
3. If none of the relation schemas in D contains the key of R, then create one more relation schema in D that contains attributes that form a key of R.
4. Eliminate redundant relations from the resulting set.


Example of the algorithm
 Example: R(P, L, C, A) with
 FDs: {P→LCA, LC→PA, A→C}. Is R in 3NF?
Solution: Primary key of R is P.
Canonical cover G = {P→LA, LC→P, A→C}
For LHS P: R1(P, L, A)
For LHS LC: R2(L, C, P)
For LHS A: R3(A, C)
R1 contains the primary key of R, so no extra relation is needed.
R1, R2, R3 are in 3NF.
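Steps 2 to 4 of the synthesis algorithm can be sketched in Python, taking the canonical cover as given (step 1 is not implemented here). The function name `synthesize_3nf` and its argument layout are assumptions made for illustration.

```python
def synthesize_3nf(cover, key):
    """Build 3NF schemas from a canonical cover (list of (lhs, rhs)) and a key of R."""
    # Step 2: one schema per distinct left-hand side.
    grouped = {}
    for lhs, rhs in cover:
        grouped.setdefault(frozenset(lhs), set()).update(rhs)
    schemas = [set(lhs) | rhs for lhs, rhs in grouped.items()]
    # Step 3: add a key schema if no schema already contains the key.
    if not any(set(key) <= s for s in schemas):
        schemas.append(set(key))
    # Step 4: drop duplicates and schemas contained in another schema.
    result = []
    for s in schemas:
        if s not in result and not any(s < t for t in schemas):
            result.append(s)
    return result


G = [(["P"], ["L", "A"]), (["L", "C"], ["P"]), (["A"], ["C"])]
relations = synthesize_3nf(G, key=["P"])
print([sorted(r) for r in relations])
# [['A', 'L', 'P'], ['C', 'L', 'P'], ['A', 'C']]
```

The three schemas correspond to R1(P, L, A), R2(L, C, P) and R3(A, C); no key relation is added because R1 already contains P.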
Q. Normalize Student relation up to 3NF
STUDENT
Sno  SName  Course-Code  Course-Length  Unit-Code  Unit-Name     Lecturer
001  Smith  A203         3              U45        Databases II  Brown
                                        U87        Programming   Green
003  Soap   A104         4              U86        Algorithms    Purple
                                        U45        Databases II  Brown
                                        U25        Business I    Red
007  Who    A203         3              U12        Business II   Pink
                                        U46        Databases I
010  Lemon  A323         2              U12        Business II   Pink
                                        U86        Algorithms    Purple