Learning Goals: Data Warehousing (CS614)

Data Warehousing (CS614)
Lecture 7
Learning Goals
• What is normalization
• Anomalies
• Normalization: 1NF
6.1 Normalization
Normalization is the process of organizing data in a database efficiently by decomposing a table into smaller tables.
It has two basic goals:
1. Eliminate redundant data (for example, the same data stored in more than one table).
2. Avoid anomalies (insert, update, delete).
Sometimes, especially in a DSS environment, it becomes necessary to drift from this purist approach to meet practical
business requirements for performance. Such deviations reintroduce redundancy, along with the corresponding
inconsistencies and anomalies; this is what de-normalization is about.
Anomalies in Database
Anomalies are problems (in everyday language, errors) that arise when data in a poorly structured table is inserted,
updated, or deleted. There are three types of anomalies: insert, update, and delete.
Insert Anomaly
An insert anomaly occurs when certain attributes cannot be inserted into the database without the presence of other
attributes. To record a single fact, we are forced to supply values for all the other attributes as well.
Delete Anomaly
A delete anomaly exists when certain attributes are lost because of the deletion of other attributes.
Update Anomaly
An update anomaly exists when one or more instances of duplicated data are updated, but not all of them. If we miss
any record by mistake, the data becomes inconsistent.
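The update and delete anomalies can be reproduced on a tiny in-memory table. This is only a sketch: the rows, the column names, and the helper `teachers_of` are hypothetical and chosen for illustration.

```python
def teachers_of(rows, subject_code):
    """Return the set of teacher names recorded for one subject."""
    return {r["teacher"] for r in rows if r["subject_code"] == subject_code}

# Hypothetical denormalized table: the teacher is repeated on every enrolment row.
rows = [
    {"sid": 1, "subject_code": "CS103", "teacher": "XYZ"},
    {"sid": 2, "subject_code": "CS103", "teacher": "XYZ"},
    {"sid": 6, "subject_code": "CS213", "teacher": "DEF"},
]

# Update anomaly: the teacher of CS103 changes, but only one copy is updated,
# so the table now records two different teachers for the same subject.
rows[0]["teacher"] = "PQR"
update_anomaly = len(teachers_of(rows, "CS103")) > 1

# Delete anomaly: removing the only student of CS213 also removes
# the fact that DEF teaches CS213.
rows = [r for r in rows if r["sid"] != 6]
delete_anomaly = len(teachers_of(rows, "CS213")) == 0

print(update_anomaly, delete_anomaly)
```

An insert anomaly is the mirror image of the delete case: in this layout a new subject and its teacher cannot be recorded until at least one student is enrolled in it.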
Normalization means decomposing tables according to groups of attributes. To normalize a database, it is
essential to understand its dependencies. Dependencies are based on the business rules that apply.
First Normal Form (1NF)

For a table to be in First Normal Form, it should follow these four rules:
1. It should have only single-valued (atomic) attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored does not matter.
Example: a table that is not in first normal form

roll_no  name  subject
101      Akon  OS, CN
103      Ckon  Java
102      Bkon  C, C++

How to solve this problem? Give each subject its own row. Here is our updated table, which now satisfies the First
Normal Form:

roll_no  name  subject
101      Akon  OS
101      Akon  CN
103      Ckon  Java
102      Bkon  C
102      Bkon  C++
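The conversion to 1NF can be sketched in a few lines of Python. The function name `to_1nf` and the dictionary-per-row representation are assumptions made for this illustration; the data is the roll_no/name/subject example from this section.

```python
def to_1nf(rows, column, sep=","):
    """Return new rows in which `column` holds exactly one atomic value per row."""
    result = []
    for row in rows:
        for value in row[column].split(sep):
            atomic_row = dict(row)          # copy the other columns unchanged
            atomic_row[column] = value.strip()
            result.append(atomic_row)
    return result

# The unnormalized table: `subject` holds several values in one cell.
unnormalized = [
    {"roll_no": 101, "name": "Akon", "subject": "OS, CN"},
    {"roll_no": 103, "name": "Ckon", "subject": "Java"},
    {"roll_no": 102, "name": "Bkon", "subject": "C, C++"},
]

normalized = to_1nf(unnormalized, "subject")
for row in normalized:
    print(row["roll_no"], row["name"], row["subject"])
```

Note that roll_no alone no longer identifies a row after the split; the pair (roll_no, subject) does.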
Before Second Normal Form: Super Key and Candidate Key

A superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple (row). A table
can have many superkeys. A candidate key is a minimal set of attributes necessary to identify a tuple; it is also
called a minimal superkey.

Emp_SSN    Emp_Number  Emp_Name
123456789  226         Steve
999999321  227         Ajeet
888997212  228         Chaitanya
777778888  229         Robert

Note: SSN is Social Security number. Emp_SSN and Emp_Number are unique.

Super keys:
• {Emp_SSN} (candidate key, that is, a minimal superkey)
• {Emp_Number} (candidate key, that is, a minimal superkey)
• {Emp_SSN, Emp_Number}
• {Emp_SSN, Emp_Name}
• {Emp_SSN, Emp_Number, Emp_Name}
• {Emp_Number, Emp_Name}
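The relationship between the two kinds of key can be checked mechanically: a set of attributes is a superkey exactly when it contains some candidate key, and the candidate keys are the superkeys with no smaller superkey inside them. The sketch below takes the candidate keys {Emp_SSN} and {Emp_Number} as given (following the note that both are unique); the function names are hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ["Emp_SSN", "Emp_Number", "Emp_Name"]
# Taken as given: Emp_SSN and Emp_Number are each unique on their own.
CANDIDATE_KEYS = [{"Emp_SSN"}, {"Emp_Number"}]

def is_superkey(attrs, candidate_keys):
    """A set of attributes is a superkey if it contains some candidate key."""
    return any(key <= set(attrs) for key in candidate_keys)

def all_superkeys(attributes, candidate_keys):
    """Enumerate every non-empty attribute subset that is a superkey."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if is_superkey(combo, candidate_keys):
                found.append(set(combo))
    return found

def minimal(keys):
    """Keep only the keys that have no proper subset among the other keys."""
    return [k for k in keys if not any(other < k for other in keys)]

superkeys = all_superkeys(ATTRIBUTES, CANDIDATE_KEYS)
print(len(superkeys))        # number of superkeys found
print(minimal(superkeys))    # the minimal ones are the candidate keys
```

{Emp_Name} is not reported as a superkey even though the four sample names happen to differ, because nothing in the business rules says that names are unique.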
Dependencies
1. Full functional dependency: an attribute depends on the whole of a key (a single-attribute key or a candidate
key), not on just a part of the candidate key.
2. Partial dependency: an attribute depends on only a part of a candidate key, not on the entire candidate key.
3. Transitive dependency: when attribute C depends on attribute B, and attribute B depends on attribute A, then C
is transitively dependent on A.
Second Normal Form (2NF) example
First we need to find the candidate key, or the single-attribute primary key, of the table. Remember that dependencies
are based on the business rules of an organization. A useful tip: attributes that have no incoming dependency edges,
that is, attributes that do not depend on any other attribute(s), must belong to the candidate key. To confirm that
they are enough to identify a row, take the closure of those attributes. If the closure captures all the attributes
of the table, those attributes form a candidate key.
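The tip can be sketched as two small functions: one collects the attributes that never appear on the right-hand side of a dependency, the other computes an attribute closure. The functional dependencies below are an assumption about the business rules of the student/subject table used in this lecture (SID determines Name and Age; SubjectCode determines SubjectName and TeacherName); the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def never_determined(attributes, fds):
    """Attributes with no incoming edge: they appear on no right-hand side."""
    determined = {a for _, rhs in fds for a in rhs}
    return [a for a in attributes if a not in determined]

ATTRIBUTES = ["SID", "Name", "Age", "SubjectCode", "SubjectName", "TeacherName"]
# Assumed business rules, written as (left side, right side) pairs.
FDS = [
    (("SID",), ("Name", "Age")),
    (("SubjectCode",), ("SubjectName", "TeacherName")),
]

seed = never_determined(ATTRIBUTES, FDS)
is_candidate_key = closure(seed, FDS) == set(ATTRIBUTES)
print(seed, is_candidate_key)
```

If the closure of these attributes did not cover the whole table, they would still be part of every candidate key, but further attributes would have to be added before the set identified a row.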
2NF (Second Normal Form) Rules
• Rule 1: The table must be in 1NF.
• Rule 2: It must not have any partial dependency.
o With a single-attribute primary key: all non-key attributes are functionally dependent on the primary key.
o With a composite candidate key: no non-prime attribute may depend on any proper subset of any candidate key.
Not in second normal form (the values repeated across rows, such as ICT and XYZ, are redundant)

SID  Name    Age  SubjectCode  SubjectName  TeacherName
1    Ali     23   CS103        ICT          XYZ
2    Ahmad   22   CS103        ICT          XYZ
3    Kaleem  20   CS103        ICT          XYZ
4    Nabeel  24   CS102        DB           ABC
5    Imran   22   CS102        DB           ABC
6    Kashif  20   CS213        OOP          DEF
What to do:
1. Take each non-key attribute in turn and ask the question: is this attribute dependent on only one part of the
candidate key?
2. After underlining the key attributes and marking the dependencies with arrows, move all the dependent
attributes to a new table, together with a copy of the part of the key they depend upon. The part of the key they
depend upon becomes the key of the new table. Underline the key in this new table.
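The question in step 1 can be tested against sample rows. The sketch below checks whether equal values of one part of the key always come with equal values of a non-key attribute. It assumes the candidate key is {SID, SubjectCode}; the helper `fd_holds` is hypothetical. Sample data can only refute a dependency, never prove one, so the business rules still have the final word.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal values of `lhs` always imply equal `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = row[rhs]
        # setdefault stores the first value seen for `left` and returns it after that.
        if seen.setdefault(left, right) != right:
            return False
    return True

COLUMNS = ["SID", "Name", "Age", "SubjectCode", "SubjectName", "TeacherName"]
DATA = [
    (1, "Ali", 23, "CS103", "ICT", "XYZ"),
    (2, "Ahmad", 22, "CS103", "ICT", "XYZ"),
    (3, "Kaleem", 20, "CS103", "ICT", "XYZ"),
    (4, "Nabeel", 24, "CS102", "DB", "ABC"),
    (5, "Imran", 22, "CS102", "DB", "ABC"),
    (6, "Kashif", 20, "CS213", "OOP", "DEF"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]
KEY = ["SID", "SubjectCode"]  # assumed candidate key

non_key = [c for c in COLUMNS if c not in KEY]
depends_on_subject_code = [c for c in non_key if fd_holds(ROWS, ["SubjectCode"], c)]
print(depends_on_subject_code)
```

Because SID happens to be unique in these six rows, the same check would pass for every attribute against SID, which is exactly why the sample alone cannot settle what the key is.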
In second normal form:

SID  Name    Age  SubjectCode
1    Ali     23   CS103
2    Ahmad   22   CS103
3    Kaleem  20   CS103
4    Nabeel  24   CS102
5    Imran   22   CS102
6    Kashif  20   CS213

SubjectCode  SubjectName  TeacherName
CS103        ICT          XYZ
CS102        DB           ABC
CS213        OOP          DEF
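The split into two tables is a pair of projections with duplicate rows removed. A minimal sketch, assuming rows are held as Python dictionaries; the helper `project` is hypothetical.

```python
def project(rows, columns):
    """Project rows onto `columns`, dropping duplicate tuples and keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result

COLUMNS = ["SID", "Name", "Age", "SubjectCode", "SubjectName", "TeacherName"]
DATA = [
    (1, "Ali", 23, "CS103", "ICT", "XYZ"),
    (2, "Ahmad", 22, "CS103", "ICT", "XYZ"),
    (3, "Kaleem", 20, "CS103", "ICT", "XYZ"),
    (4, "Nabeel", 24, "CS102", "DB", "ABC"),
    (5, "Imran", 22, "CS102", "DB", "ABC"),
    (6, "Kashif", 20, "CS213", "OOP", "DEF"),
]
original = [dict(zip(COLUMNS, values)) for values in DATA]

# The dependent attributes move to a new table with a copy of the key part they depend on.
students = project(original, ["SID", "Name", "Age", "SubjectCode"])
subjects = project(original, ["SubjectCode", "SubjectName", "TeacherName"])

print(len(students), len(subjects))
```

Each subject name and teacher is now stored once per subject instead of once per student.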
Not in third normal form

SID  Name    Age  SubjectCode
1    Ali     23   CS103
2    Ahmad   22   CS103
3    Kaleem  20   CS103
4    Nabeel  24   CS102
5    Imran   22   CS102
6    Kashif  20   CS213

SubjectCode  SubjectName  TeacherName  Designation
CS103        ICT          XYZ          Lecturer
CS103        ICT          XYZ          Lecturer
CS103        ICT          XYZ          Lecturer
CS102        DB           ABC          A.P (Phd)
CS102        DB           ABC          A.P (Phd)
CS213        OOP          DEF          A.P (Phd)
3rd Normalization:
A table design is said to be in 3NF if both of the following conditions hold:
• The table must be in 2NF.
• No non-prime attribute may be transitively dependent on any super key; any such dependency should be removed.
An attribute that is not part of any candidate key is known as a non-prime attribute.
The teacher's details move to a separate table keyed by TID, and the subject table refers to the teacher through TID
only:

SID  Name    Age  SubjectCode
1    Ali     23   CS103
2    Ahmad   22   CS103
3    Kaleem  20   CS103
4    Nabeel  24   CS102
5    Imran   22   CS102
6    Kashif  20   CS213

SubjectCode  SubjectName  TID
CS103        ICT          1
CS102        DB           2
CS213        OOP          3

TID  TeacherName  Designation
1    XYZ          Lecturer
2    ABC          A.P (Phd)
3    DEF          A.P (Phd)
Another Example of Second Normal Form:
Consider a table with a composite primary key [Customer ID, Store ID] and a non-key attribute [Purchase Location]. In
this case, [Purchase Location] depends only on [Store ID], which is only part of the primary key. Therefore, this table
does not satisfy second normal form.
To bring this table to second normal form, we break it into two tables: one that keeps the key columns [Customer ID,
Store ID], and [TABLE_STORE], which holds [Store ID] and [Purchase Location].
What we have done is remove the partial functional dependency that we initially had. Now, in the table
[TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [Store
ID].
Third Normal Form (3NF)
A table design is said to be in 3NF if both of the following conditions hold:
• The table must be in 2NF.
• No non-prime attribute may be transitively dependent on any super key; any such dependency should be removed.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, a table is in 3NF if it is in 2NF and, for each non-trivial functional dependency X -> Y, at least
one of the following conditions holds:
• X is a super key of the table.
• Y is a prime attribute of the table.
An attribute that is part of one of the candidate keys is known as a prime attribute.
Example: Suppose a company wants to store the complete address of each employee. It creates a table named
employee details that looks like this:

emp_id  emp_name  emp_zip  emp_state  emp_city  emp_district
1001    John      282005   UP         Agra      Dayal Bagh
1002    Ajeet     222008   TN         Chennai   M-City
1006    Lora      282007   TN         Chennai   Urrapakkam
1101    Lilly     292008   UK         Pauri     Bhagwan
1201    Steve     222999   MP         Gwalior   Ratan
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}, and so on
Candidate keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key.
Here, emp_state, emp_city and emp_district depend on emp_zip, and emp_zip depends on emp_id. This makes the
non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key {emp_id}, which
violates the rule of 3NF.
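The X -> Y form of the 3NF rule translates directly into a check: a dependency is acceptable if X is a super key or Y is prime, and is a violation otherwise. The sketch below applies it to the employee example, with the dependencies written out as described in the text; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def third_nf_violations(attributes, fds, candidate_keys):
    """List (X, attribute) pairs where X is not a super key and the attribute is non-prime."""
    prime = {a for key in candidate_keys for a in key}
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == set(attributes)
        for attr in rhs:
            trivial = attr in lhs
            if not trivial and not lhs_is_superkey and attr not in prime:
                violations.append((lhs, attr))
    return violations

ATTRIBUTES = ["emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"]
# Dependencies as described in the example, written as (X, Y) pairs.
FDS = [
    (("emp_id",), ("emp_name", "emp_zip")),
    (("emp_zip",), ("emp_state", "emp_city", "emp_district")),
]
CANDIDATE_KEYS = [("emp_id",)]

violations = third_nf_violations(ATTRIBUTES, FDS, CANDIDATE_KEYS)
for lhs, attr in violations:
    print(lhs, "->", attr)
```

Every violation reported here has emp_zip on the left, which is the attribute that carries the transitive dependency.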
To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:

employee table:

emp_id  emp_name  emp_zip
1001    John      282005
1002    Ajeet     222008
1006    Lora      282007
1101    Lilly     292008
1201    Steve     222999

employee_zip table:

emp_zip  emp_state  emp_city  emp_district
282005   UP         Agra      Dayal Bagh
222008   TN         Chennai   M-City
282007   TN         Chennai   Urrapakkam
292008   UK         Pauri     Bhagwan
222999   MP         Gwalior   Ratan
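A quick way to see that nothing was lost in the split is to join the two tables back on emp_zip and compare the result with the original rows. This is a minimal sketch; the helper `natural_join` is hypothetical and assumes the join column is unique in the right-hand table.

```python
def natural_join(left, right, key):
    """Join two lists of dict rows on `key`; `key` must be unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left]

employee = [
    {"emp_id": 1001, "emp_name": "John", "emp_zip": "282005"},
    {"emp_id": 1002, "emp_name": "Ajeet", "emp_zip": "222008"},
    {"emp_id": 1006, "emp_name": "Lora", "emp_zip": "282007"},
    {"emp_id": 1101, "emp_name": "Lilly", "emp_zip": "292008"},
    {"emp_id": 1201, "emp_name": "Steve", "emp_zip": "222999"},
]
employee_zip = [
    {"emp_zip": "282005", "emp_state": "UP", "emp_city": "Agra", "emp_district": "Dayal Bagh"},
    {"emp_zip": "222008", "emp_state": "TN", "emp_city": "Chennai", "emp_district": "M-City"},
    {"emp_zip": "282007", "emp_state": "TN", "emp_city": "Chennai", "emp_district": "Urrapakkam"},
    {"emp_zip": "292008", "emp_state": "UK", "emp_city": "Pauri", "emp_district": "Bhagwan"},
    {"emp_zip": "222999", "emp_state": "MP", "emp_city": "Gwalior", "emp_district": "Ratan"},
]

rejoined = natural_join(employee, employee_zip, "emp_zip")
for row in rejoined:
    print(row)
```

The rejoined rows carry the same six columns as the original employee details table, so the decomposition removes the transitive dependency without discarding any information.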
