Learning Goals: Data Warehousing (CS614)
Lecture 7
Learning Goals
What is normalization
Anomalies
Normalization: 1NF
Normalization: 2NF
Normalization: 3NF
6.1 Normalization
Normalization is the process of efficiently organizing data in a database by decomposing a table into smaller tables.
Normalization has two basic goals:
1. Eliminate redundant data (for example, the same data stored in more than one table)
2. Avoid anomalies (insert, update, delete)
Sometimes (especially in a DSS environment) it becomes necessary to drift from this purist approach to meet practical
business requirements of performance. These deviations, together with the inconsistencies and anomalies they can
cause, are what de-normalization is about.
Anomalies in Database
Anomalies are, in plain language, errors that occur when data is inserted, updated or deleted.
There are three types of anomalies that occur in a database: insert, update and delete.
Insert Anomaly
An Insert Anomaly occurs when certain attributes cannot be inserted into the database without the presence of other
attributes.
Delete Anomaly
A Delete Anomaly exists when certain attributes are lost because of the deletion of other attributes.
Update Anomaly
An Update Anomaly exists when one or more instances of duplicated data are updated, but not all. If even one copy is
missed by mistake, the data becomes inconsistent.
Normalization means decomposing tables according to groups of attributes. To normalize a database, it is
essential to understand its dependencies. Dependencies are based on the business rules that apply.
Not in 1NF (one cell holds several values):
101  Akon  OS, CN
103  Ckon  Java
In 1NF (one value per cell):
101  Akon  OS
101  Akon  CN
103  Ckon  Java
Candidate key (that is, a minimal superkey)
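The rows above show the same data once with several values in a single cell and once with one value per cell. As a minimal sketch of that split, assuming the third column is a comma-separated list (the column meanings and the function name to_1nf are assumptions, since the table headers did not survive):

```python
def to_1nf(rows):
    """Return rows in which the last column holds exactly one value."""
    result = []
    for row_id, name, subjects in rows:
        # One output row per value in the comma-separated cell.
        for subject in subjects.split(","):
            result.append((row_id, name, subject.strip()))
    return result


unnormalized = [(101, "Akon", "OS, CN"), (103, "Ckon", "Java")]
normalized = to_1nf(unnormalized)
for row in normalized:
    print(row)
```

After the split, the ID alone no longer identifies a row, so the key of the new table has to include the third column as well.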
Dependencies
1. Functional dependency (full): an attribute depends on the whole of a single key or candidate key, not on just a
part of the candidate key
2. Partial dependency: an attribute depends on only a part of a candidate key, not on the entire candidate key
3. Transitive dependency: when attribute C is dependent on attribute B, and attribute B is dependent on A, then C
has a transitive dependency on A
First we need to find the candidate key, or the single primary key attribute, of the table. Remember that dependencies
are based on the business rules of an organization. A tip for finding a candidate key: attributes that no other attribute
determines (no incoming edges in the dependency diagram) must belong to every candidate key. In other words, look for
attributes that are independent of all other attributes and that together are enough to identify a row. To check this,
take the closure of those attributes. If the closure contains all the attributes of the table, the set is a superkey, and if
no attribute can be dropped from it without losing that property, it is a candidate key.
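The closure test described above can be sketched as follows. This is an illustrative sketch, not a prescribed method: the relation R(A, B, C, D), its dependencies A -> B and B -> C, and the function names are all hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) == set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A superkey that stops being one when any single attribute is removed."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return all(not is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


# Hypothetical relation: A and D have no incoming edges.
all_attrs = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))
print(is_candidate_key({"A", "D"}, all_attrs, fds))
print(is_candidate_key({"A", "B", "D"}, all_attrs, fds))
```

Removing one attribute at a time is enough for the minimality check, because any superset of a superkey is itself a superkey.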
Second Normal form (2NF)
Rule 1- Be in 1NF
Rule 2- And, it should not have Partial Dependency
o If there is a single-attribute primary key, then: all non-key attributes are functionally dependent on the primary key
o If there is a composite candidate key, then: no non-prime attribute should depend on any proper subset of any
candidate key.
What to do:
1. Take each non-key attribute in turn and ask the question: is this attribute dependent on only one part of the
candidate key?
2. After marking all such attributes (for example with arrows in a dependency diagram), move all the partially
dependent attributes to a new table, together with a copy of the part of the key they depend on. That part of the
key becomes the key of the new table. Underline the key in this new table.
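Step 1 can be sketched in a few lines, assuming the functional dependencies are already written down explicitly and there is a single candidate key. The attribute names (student_id, course_id, grade, course_title) and the function name are hypothetical, and the sketch only inspects the listed dependencies without inferring new ones.

```python
def partial_dependencies(key, fds):
    """List the dependencies whose left side is only a part of `key`.

    `fds` is a list of (lhs, rhs) pairs of attribute-name sets. Each result
    pairs a proper, non-empty subset of the key with the non-key attributes
    that depend on it.
    """
    key = set(key)
    found = []
    for lhs, rhs in fds:
        dependents = rhs - key
        # `lhs < key` is a proper-subset test: part of the key, not all of it.
        if lhs and lhs < key and dependents:
            found.append((lhs, dependents))
    return found


key = {"student_id", "course_id"}
fds = [
    ({"student_id", "course_id"}, {"grade"}),  # depends on the whole key
    ({"course_id"}, {"course_title"}),          # depends on part of the key
]

violations = partial_dependencies(key, fds)
print(violations)
```

Each pair in the result names the attributes to move to a new table and the part of the key that becomes that table's key.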
SID  Name    Age  SubjectCode
1    Ali     23   CS103
2    Ahmad   22   CS103
3    Kaleem  20   CS103
4    Nabeel  24   CS102
5    Imran   22   CS102
6    Kashif  20   CS213

SubjectCode  SubjectName  TID  TeacherName
CS103        ICT          1    XYZ
CS102        DB           2    ABC
CS213        OOP          3    DEF
Consider a table with a composite primary key [Customer ID, Store ID] and a single non-key attribute, [Purchase
Location]. In this case, [Purchase Location] depends only on [Store ID], which is only part of the primary key.
Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break it into two tables: one that keeps the key pair [Customer ID,
Store ID], and [TABLE_STORE], which holds [Store ID] and [Purchase Location].
What we have done is remove the partial functional dependency that we initially had. Now, in the table
[TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [Store
ID].
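The decomposition can be illustrated with a small projection helper. Only the column names and [TABLE_STORE] come from the text. The sample rows, the name customer_store for the first table, and the helper project are assumptions made for illustration.

```python
def project(rows, columns):
    """Project `rows` onto `columns`, dropping duplicate rows but keeping order."""
    seen = set()
    result = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


# Hypothetical sample data for the original, un-normalized table.
purchases = [
    {"Customer ID": 1, "Store ID": 1, "Purchase Location": "Los Angeles"},
    {"Customer ID": 1, "Store ID": 3, "Purchase Location": "San Francisco"},
    {"Customer ID": 2, "Store ID": 1, "Purchase Location": "Los Angeles"},
    {"Customer ID": 3, "Store ID": 2, "Purchase Location": "New York"},
]

customer_store = project(purchases, ["Customer ID", "Store ID"])
table_store = project(purchases, ["Store ID", "Purchase Location"])

# Joining the two tables on Store ID gives back the original rows.
location_of = dict(table_store)
rejoined = [
    {"Customer ID": c, "Store ID": s, "Purchase Location": location_of[s]}
    for c, s in customer_store
]
print(table_store)
print(rejoined == purchases)
```

Each store's location is now stored once, so correcting it means changing a single row.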
Third Normal form (3NF)
A table design is said to be in 3NF if both of the following conditions hold:
o The table is in 2NF
o No non-prime attribute is transitively dependent on any super key
An attribute that is not part of any candidate key is known as a non-prime attribute. An attribute that is a part of one
of the candidate keys is known as a prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each non-trivial functional
dependency X -> Y, at least one of the following conditions holds:
o X is a super key of the table
o Y is a prime attribute of the table
Example: Suppose a company wants to store the complete address of each employee. It creates a table named
employee details with the columns emp_id, emp_zip, emp_state, emp_city and emp_district.
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate key.
Here, emp_state, emp_city and emp_district depend on emp_zip, and emp_zip depends on emp_id. That makes the
non-prime attributes (emp_state, emp_city and emp_district) transitively dependent on the super key (emp_id). This
violates the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to remove the transitive dependency:
employee table: emp_id, emp_zip
employee_zip table: emp_zip, emp_state, emp_city, emp_district
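As a rough sketch of the resulting design, the two tables can be created in an in-memory SQLite database through Python's sqlite3 module. The column types and every sample value below are assumptions. Only the table and column names come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_zip (
        emp_zip      TEXT PRIMARY KEY,
        emp_state    TEXT,
        emp_city     TEXT,
        emp_district TEXT
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        emp_zip TEXT REFERENCES employee_zip(emp_zip)
    );
""")

# Hypothetical sample values.
conn.executemany(
    "INSERT INTO employee_zip VALUES (?, ?, ?, ?)",
    [("10001", "StateA", "CityA", "DistrictA"),
     ("20002", "StateB", "CityB", "DistrictB")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "10001"), (2, "10001"), (3, "20002")],
)

# A single UPDATE corrects the city for every employee in that zip code.
conn.execute(
    "UPDATE employee_zip SET emp_city = ? WHERE emp_zip = ?",
    ("CityA2", "10001"),
)

rows = conn.execute("""
    SELECT e.emp_id, e.emp_zip, z.emp_state, z.emp_city, z.emp_district
    FROM employee AS e
    JOIN employee_zip AS z ON e.emp_zip = z.emp_zip
    ORDER BY e.emp_id
""").fetchall()
for row in rows:
    print(row)
```

Because the address details live only in employee_zip, the update cannot leave two employees with the same zip code showing different cities, which is the update anomaly the decomposition is meant to prevent.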