Lec 7 - IDB


Balochistan University of Information Technology, Engineering & Management Sciences

Introduction to Database Systems
Mohammad Imran
Lecturer
Department of Information Technology

Introduction to Database Systems Spring 2015 Mohammad Imran June 17, 2015

Lecture 7

Database Normalization


Keys
• Before diving into normalization, let's first clarify some key concepts in databases
o Candidate Key
o Primary Key
o Foreign Key
o Alternate Key
o Composite Key
o Prime Attribute
o Non-Prime Attribute

Candidate Key
• A candidate key is a key that qualifies to become the primary key of a table
• It fulfills all the requirements of a primary key: its values are never null, they are unique in every row, and no attribute can be removed from it without losing that uniqueness (minimality)
• Every table must have at least one candidate key, but it can have several
• Example: STUDENT {SID, FNAME, LNAME, COURSEID}
o Candidate Keys: SID or FNAME+LNAME (assuming no two students share the same full name)
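The three requirements above (non-null, unique, minimal) can be checked mechanically. The sketch below is a minimal illustration in Python over a few hypothetical STUDENT rows; the helper names are made up for this example. Keep in mind that sample data can only show that a column set is *not* a key. Whether it really is one is a business rule, which is also why FNAME+LNAME is a candidate key only under the stated assumption.

```python
from itertools import combinations

# Hypothetical sample rows for STUDENT {SID, FNAME, LNAME, COURSEID}.
STUDENT = [
    {"SID": 1, "FNAME": "Ali",   "LNAME": "Khan",  "COURSEID": "DB101"},
    {"SID": 2, "FNAME": "Sidra", "LNAME": "Khan",  "COURSEID": "DB101"},
    {"SID": 3, "FNAME": "Ali",   "LNAME": "Ahmed", "COURSEID": "OS201"},
]

def is_unique_not_null(rows, attrs):
    """True if no value in attrs is NULL (None) and no two rows share the same combination."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if None in value or value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """Unique and non-null, and no proper subset is also unique and non-null (minimality)."""
    if not is_unique_not_null(rows, attrs):
        return False
    return not any(
        is_unique_not_null(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_candidate_key(STUDENT, ("SID",)))            # True
print(is_candidate_key(STUDENT, ("FNAME", "LNAME")))  # True
print(is_candidate_key(STUDENT, ("SID", "FNAME")))    # False: SID alone is already unique
```

The last call shows why minimality matters. SID+FNAME is unique, but it is a superkey, not a candidate key, because SID alone already identifies every row.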

Primary Key
• The candidate key chosen to identify the rows of a table is known as the primary key
• There is only one primary key per table
• Example: STUDENT {SID, FNAME, LNAME, COURSEID}
o Candidate Keys: SID or FNAME+LNAME
o Primary Key: SID (Chosen)


Foreign Key
• A foreign key is used to define a relationship between two tables
• When we implement a relationship between two tables, we use a foreign key. The rule it enforces is known as referential integrity: every foreign key value must match an existing primary key value in the related table (or be null)
• A table can have more than one foreign key
• A foreign key is generally the primary key of one table that appears as a field in another table, where the first table has a relationship to the second
• Example: STUDENT {SID, FNAME, LNAME, COURSEID}
o Candidate Keys: SID or FNAME+LNAME
o Primary Key: SID
o Foreign Key: COURSEID
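To see referential integrity being enforced, here is a minimal sketch using Python's built-in `sqlite3` module. The COURSE table and its TITLE column are hypothetical, since the slide only names COURSEID. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

# Hypothetical parent table that STUDENT.COURSEID refers to.
conn.execute("CREATE TABLE COURSE (COURSEID TEXT PRIMARY KEY, TITLE TEXT)")
conn.execute("""
    CREATE TABLE STUDENT (
        SID      INTEGER PRIMARY KEY,
        FNAME    TEXT NOT NULL,
        LNAME    TEXT NOT NULL,
        COURSEID TEXT REFERENCES COURSE(COURSEID)  -- the foreign key
    )""")

conn.execute("INSERT INTO COURSE VALUES ('DB101', 'Intro to Databases')")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Ali', 'Khan', 'DB101')")  # OK: course exists

try:
    # XX999 does not exist in COURSE, so referential integrity is violated.
    conn.execute("INSERT INTO STUDENT VALUES (2, 'Sidra', 'Khan', 'XX999')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

Only the first student row is stored. The second insert is rejected because it points to a course that does not exist.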


Alternate Key
• If a table has more than one candidate key, then after one of them is chosen as the primary key, the remaining candidate keys are known as alternate keys of that table
• Example: a table named Employee has two columns, EmpID and EmpMail, both of which are NOT NULL and hold unique values
o So both columns are candidate keys
o If we choose EmpID as the primary key of the table, then EmpMail becomes an alternate key
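An alternate key is not the primary key, but it should still keep its candidate-key guarantees. A common way to do that is to declare it `NOT NULL UNIQUE`. The sketch below assumes SQLite through Python's `sqlite3`, and the e-mail values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EmpID is the chosen primary key; EmpMail is the alternate key, so it keeps
# the candidate-key properties through NOT NULL + UNIQUE.
conn.execute("""
    CREATE TABLE Employee (
        EmpID   INTEGER PRIMARY KEY,
        EmpMail TEXT NOT NULL UNIQUE
    )""")

conn.execute("INSERT INTO Employee VALUES (1, 'ali@example.com')")
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'ali@example.com')")  # duplicate alternate key
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```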


Composite Key
• A key made up of more than one column is known as a composite key
• Example:
o A table Student has two columns, SID and SRefNo, and we define the primary key on both columns together
o SID + SRefNo is then a composite key
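With a composite key, uniqueness applies to the combination of columns, not to each column on its own. Here is a minimal SQLite sketch of the Student example, with made-up values. `NOT NULL` is spelled out because SQLite, for historical reasons, accepts NULLs in most PRIMARY KEY columns unless you forbid them explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        SID    INTEGER NOT NULL,
        SRefNo INTEGER NOT NULL,
        PRIMARY KEY (SID, SRefNo)   -- composite key: the pair must be unique
    )""")

conn.execute("INSERT INTO Student VALUES (1, 100)")
conn.execute("INSERT INTO Student VALUES (1, 101)")  # OK: same SID, different SRefNo
try:
    conn.execute("INSERT INTO Student VALUES (1, 100)")  # the same pair again
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```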


Keys Identified
• Example: STUDENT {SID, FNAME, LNAME, COURSEID}
• The STUDENT table has the following keys: candidate, primary, foreign, alternate, and composite
o Candidate Keys: SID or FNAME+LNAME
o Primary Key: SID
o Foreign Key: COURSEID
o Alternate Key: FNAME+LNAME
o Composite Key: FNAME+LNAME


Prime & Non-Prime Attribute


• Any attribute that is part of at least one candidate key is known as a prime attribute or key attribute
• Conversely, a non-prime attribute, or non-key attribute, is not part of any candidate key
• Example: STUDENT {SID, FNAME, LNAME, Age}
o Candidate Keys: SID or FNAME+LNAME
o Primary Key: SID
o Prime Attributes: SID, FNAME, LNAME
o Non-Prime Attribute: Age
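Once the candidate keys are known, classifying the attributes is a simple set operation. The sketch below hard-codes the candidate keys from the example above. In practice they would come from the table's business rules.

```python
# Attributes and candidate keys of STUDENT {SID, FNAME, LNAME, Age}, from the example above.
ATTRIBUTES = ["SID", "FNAME", "LNAME", "Age"]
CANDIDATE_KEYS = [{"SID"}, {"FNAME", "LNAME"}]

def split_prime(attributes, candidate_keys):
    """An attribute is prime if it appears in at least one candidate key."""
    prime_set = set().union(*candidate_keys)
    prime = [a for a in attributes if a in prime_set]
    non_prime = [a for a in attributes if a not in prime_set]
    return prime, non_prime

prime, non_prime = split_prime(ATTRIBUTES, CANDIDATE_KEYS)
print(prime)      # ['SID', 'FNAME', 'LNAME']
print(non_prime)  # ['Age']
```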

Anomalies
• An anomaly is an inconsistent, incomplete, or contradictory state of the database
• Insertion anomaly: the user is unable to insert a new record when it should be possible to do so
• Deletion anomaly: when a record is deleted, other information tied to it is unintentionally deleted as well
• Update anomaly: a record is updated, but other occurrences of the same data are not updated


Consider Table – Contains Anomalies


Customer  Purchase date  Product name  Amount  Price  Total price
Ali       2014-02-14     Football      1       80     80
Sidra     2014-02-16     Tennis Ball   2       30     60
Ali       2014-02-14     Tennis Ball   2       30     60
Ali       2014-02-14     Cricket Bat   1       200    200
Asif      2014-02-18     Joggers       2       300    600
Sidra     2014-02-16     Football      1       80     80

What's wrong with this table?

It's difficult to modify its data: upon modification, several anomalies can occur.


Consider Table – Insert Anomalies


• It's impossible to insert a product into the table if the product hasn't been bought by a customer yet, because every row records a purchase
• Similarly, it's impossible to insert a customer who hasn't made a purchase yet


Consider Table – Update Anomalies


• It's difficult to update data in the table
• If you want to change the name of a product, you have to update every row in which that product was bought
• There is no single place where a product's price is stored, so you cannot change the price of a product for all future purchases
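The update anomaly is easy to reproduce. The sketch below stores the purchases table as a list of Python dicts. The "Total price" column is left out because it is just amount × price. The price of Football is then changed in only one of the rows where it appears.

```python
# The purchases table from this lecture (Total price omitted: it is amount * price).
purchases = [
    {"customer": "Ali",   "date": "2014-02-14", "product": "Football",    "amount": 1, "price": 80},
    {"customer": "Sidra", "date": "2014-02-16", "product": "Tennis Ball", "amount": 2, "price": 30},
    {"customer": "Ali",   "date": "2014-02-14", "product": "Tennis Ball", "amount": 2, "price": 30},
    {"customer": "Ali",   "date": "2014-02-14", "product": "Cricket Bat", "amount": 1, "price": 200},
    {"customer": "Asif",  "date": "2014-02-18", "product": "Joggers",     "amount": 2, "price": 300},
    {"customer": "Sidra", "date": "2014-02-16", "product": "Football",    "amount": 1, "price": 80},
]

def prices_by_product(rows):
    """Collect every price stored for each product; more than one price means inconsistency."""
    result = {}
    for row in rows:
        result.setdefault(row["product"], set()).add(row["price"])
    return result

# Update anomaly: the price of Football is changed in only one of its two rows.
purchases[0]["price"] = 90
print(prices_by_product(purchases)["Football"])  # now two different prices for one product
```

Because the price is repeated in every purchase row, a single logical change needs several physical updates. Missing any one of them leaves the table contradicting itself.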


Consider Table – Delete Anomalies


• If you delete Asif's purchase (say, because the order was cancelled), you also delete the product "Joggers" and its price, since that row is the only place where the product appears
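Here is the same deletion run against an in-memory SQLite copy of the purchases table. It is a minimal sketch, and the column names are chosen for illustration. After Asif's row is deleted, nothing in the database records that Joggers exists or what it costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Purchases (customer TEXT, purchase_date TEXT,
                product TEXT, amount INTEGER, price INTEGER)""")
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?, ?)", [
    ("Ali",   "2014-02-14", "Football",    1, 80),
    ("Sidra", "2014-02-16", "Tennis Ball", 2, 30),
    ("Ali",   "2014-02-14", "Tennis Ball", 2, 30),
    ("Ali",   "2014-02-14", "Cricket Bat", 1, 200),
    ("Asif",  "2014-02-18", "Joggers",     2, 300),
    ("Sidra", "2014-02-16", "Football",    1, 80),
])

# Cancel Asif's order: his row was the only one mentioning Joggers.
conn.execute("DELETE FROM Purchases WHERE customer = 'Asif'")

remaining = [r[0] for r in conn.execute("SELECT DISTINCT product FROM Purchases")]
print("Joggers" in remaining)  # False: the product and its price are gone too
```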


Dealing with Anomalies


• With so many anomalies, how do you deal with tables like this?
• Solution: you have to normalize them!
• What does that mean? Let's see


Database Normalization
• Normalization is a process for evaluating and correcting
table structures to minimize data redundancies, thereby
reducing the likelihood of data anomalies


Goals of Database Normalization


• There are two goals of the normalization process:
o eliminating redundant data (for example, storing the
same data in more than one table)
o ensuring data dependencies make sense (only storing
related data in a table)
• Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure that
data is logically stored


Results of Normalization
• Reduce data redundancy
• Reduce chances of data becoming inconsistent
• A table is said to be normalized if it satisfies certain
constraints


Normal Forms
• A series of guidelines for ensuring that databases are normalized is referred to as the normal forms
• They are numbered from one (the lowest form of normalization, first normal form or 1NF) through five (fifth normal form or 5NF)
• For most purposes in business database design, 3NF is as high as you need to go in the normalization process
• So, we will discuss 1NF, 2NF, and 3NF
• Remember! Each normal form includes the requirements of the one before it: a table in 2NF is also in 1NF, and a table in 3NF is also in 2NF. Every level removes more redundancy than the previous one

Normalization also has cons!


• Normalization comes at a cost in performance:
o the higher the normal form, the more relational join operations you need to produce a specified output
o more resources are required by the database system to respond to end-user queries
o therefore, you will occasionally need to denormalize some portions of a database design to meet performance requirements


Denormalization
• Denormalization produces a lower normal form; for example, a 3NF table may be converted to 2NF through denormalization
• The price of the increased performance gained through denormalization:
o greater data redundancy


First Normal Form (1NF)


• As per First Normal Form:
1. No row may contain a repeating group of information, i.e.
o each column must hold a single (atomic) value, and the same kind of data must not be spread across several columns of a row
2. Each row should have a primary key that identifies it as unique


Before 1NF
• In First Normal Form, no row may have a column that stores more than one value, such as values separated by commas. Instead, such data must be split into multiple rows.

Student  Age  Subject
Shahid   15   Biology, Maths
Talha    14   Maths
Usama    17   Maths
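Splitting a repeating group into rows is mechanical. The sketch below takes the "Before 1NF" rows, assumes the subject list is always comma-separated, and emits one (Student, Subject, Age) row per subject. This produces the 1NF table.

```python
# The "Before 1NF" rows: Subject holds a comma-separated list (a repeating group).
before_1nf = [
    ("Shahid", 15, "Biology, Maths"),
    ("Talha", 14, "Maths"),
    ("Usama", 17, "Maths"),
]

def to_1nf(rows):
    """Emit one (Student, Subject, Age) row per subject so every field holds a single value."""
    return [
        (student, subject.strip(), age)
        for student, age, subjects in rows
        for subject in subjects.split(",")
    ]

for row in to_1nf(before_1nf):
    print(row)
# ('Shahid', 'Biology', 15)
# ('Shahid', 'Maths', 15)
# ('Talha', 'Maths', 14)
# ('Usama', 'Maths', 17)
```

Shahid's age is now stored twice. This is the redundancy that the later normal forms remove.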

After 1NF
• The repeating groups are split, and the PK is defined as a composite key (Student + Subject)
• Under First Normal Form, data redundancy increases, because the same data is repeated in several rows, but each row as a whole is unique

Student  Subject  Age
Shahid   Biology  15
Shahid   Maths    15
Talha    Maths    14
Usama    Maths    17


Functional Dependency
• Before moving on to 2NF, let's see what functional dependency is
• Functional dependency means that the value of one or more attributes determines the value of one or more other attributes: each value of the determining attributes is associated with exactly one value of the determined attributes
• The standard notation for representing the relationship between CMSID and STU_LNAME is:
o CMSID → STU_LNAME
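The definition can be turned into a test on data. A dependency X → Y is broken as soon as two rows agree on X but differ on Y. The sketch below uses hypothetical sample rows and treats CMSID as a student identifier, which is an assumption. A passing check on sample data only shows the dependency is not contradicted; the real rule comes from the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return False as soon as two rows agree on lhs but disagree on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

# Hypothetical sample rows; CMSID is treated as a student ID.
students = [
    {"CMSID": 101, "STU_FNAME": "Ali",   "STU_LNAME": "Khan"},
    {"CMSID": 102, "STU_FNAME": "Sidra", "STU_LNAME": "Khan"},
]

print(fd_holds(students, ["CMSID"], ["STU_LNAME"]))  # True
print(fd_holds(students, ["STU_LNAME"], ["CMSID"]))  # False: "Khan" appears with 101 and 102
```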


Functional Dependency (Contd.)


• The attribute whose value determines another is called the determinant or the key
• The attribute whose value is determined by the other attribute is called the dependent
• CMSID (determinant) → STU_LNAME (dependent)
• We can say that CMSID is the determinant and STU_LNAME is the dependent, OR
• CMSID functionally determines STU_LNAME, and STU_LNAME is functionally dependent on CMSID


Functional Dependency (Contd.)


• As stated earlier, functional dependence can involve a determinant made of more than one attribute (a composite key) and multiple dependent attributes, e.g. (A, B) → (C, D)


Types of Functional Dependency


• Two types of functional dependencies that are of special interest in
normalization are
o partial dependencies
o transitive dependencies
• A partial dependency exists when there is a functional dependence
in which the determinant is only part of the primary key
• For example, if (A, B) → (C, D), B → C, and (A, B) is the primary
key, then the functional dependence B → C is a partial dependency
because only part of the primary key (B) is needed to determine the
value of C

Partial Dependency
• Consider the Customer relation
• It has a composite key (Cust_ID and Order_ID)
• A partial dependency exists because we can determine the Name attribute from Cust_ID alone
• We don't need Order_ID to determine the Name attribute

Cust_ID  Order_ID  Name
101      1234      AT&T
101      156       AT&T
125      1250      Cisco
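Given the primary key and a list of declared functional dependencies, partial dependencies can be found by checking whether a dependency's determinant is a proper subset of the key. The sketch below works from declared dependencies rather than sample rows. On a small sample like the Customer table, Order_ID happens to be unique in every row and would therefore appear to determine everything. The dependency lists are taken from the abstract example and the Customer example in this lecture.

```python
# Each functional dependency is (determinant, dependents), both as sets of attribute names.
def partial_dependencies(primary_key, fds):
    """Return FDs whose determinant is a proper subset of the primary key
    and whose dependents include attributes outside the key."""
    pk = set(primary_key)
    result = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs < pk and rhs - pk:  # '<' on sets means proper subset
            result.append((sorted(lhs), sorted(rhs - pk)))
    return result

# Abstract example: PK (A, B), with (A, B) -> (C, D) and B -> C.
print(partial_dependencies({"A", "B"}, [({"A", "B"}, {"C", "D"}), ({"B"}, {"C"})]))
# [(['B'], ['C'])]

# Customer relation: PK (Cust_ID, Order_ID), with Cust_ID -> Name.
print(partial_dependencies({"Cust_ID", "Order_ID"}, [({"Cust_ID"}, {"Name"})]))
# [(['Cust_ID'], ['Name'])]
```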


Transitive Dependency
• When a non-key attribute determines another non-key attribute, it is called a transitive dependency

Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg



Transitive Dependency (Contd.)
• Consider the employee relation above (Emp_ID, F_Name, L_Name, Dept_ID, Dept_Name)
• The primary key Emp_ID determines F_Name, L_Name, and Dept_ID
• But Dept_ID (a non-key attribute) determines Dept_Name (another non-key attribute)
• So a transitive dependency exists
• In general, if A → B and B → C, then A → C. Here Emp_ID → Dept_ID → Dept_Name, so Emp_ID → Dept_Name is a transitive dependency
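The rule "A → B and B → C gives A → C" is what the standard attribute-closure algorithm applies repeatedly. The sketch below computes the closure of Emp_ID under the two dependencies on this slide. It then lists dependencies whose determinant is not a superkey and whose dependents lie outside the key. That simple test also catches partial dependencies, but in a 2NF table (which has none) what remains are transitive dependencies. The helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # lhs is already determined, so rhs is too
                changed = True
    return result

ALL = {"Emp_ID", "F_Name", "L_Name", "Dept_ID", "Dept_Name"}
KEY = {"Emp_ID"}
FDS = [
    ({"Emp_ID"}, {"F_Name", "L_Name", "Dept_ID"}),
    ({"Dept_ID"}, {"Dept_Name"}),
]

# Emp_ID -> Dept_ID and Dept_ID -> Dept_Name, so Emp_ID -> Dept_Name (transitively).
print(sorted(closure(KEY, FDS)))
# ['Dept_ID', 'Dept_Name', 'Emp_ID', 'F_Name', 'L_Name']

def non_key_dependencies(key, fds, all_attrs):
    """FDs whose determinant is not a superkey and whose dependents lie outside the key."""
    return [
        (sorted(lhs), sorted(rhs - key))
        for lhs, rhs in fds
        if closure(lhs, fds) != all_attrs and rhs - key
    ]

print(non_key_dependencies(KEY, FDS, ALL))  # [(['Dept_ID'], ['Dept_Name'])]
```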

Second Normal Form


• No non-key column may be partially dependent on the primary key
• Put simply, for a table that has a concatenated (composite) primary key, each column that is not part of the primary key must depend on the entire concatenated key for its existence
• If any column depends on only one part of the concatenated key, then the table does not qualify for Second Normal Form

Let's See What 1NF Lacks
• The 1NF table (Student, Subject, Age) is in First Normal Form
• It has two rows for Shahid, to include the multiple subjects he has opted for
• It is still searchable and follows First Normal Form, but it is an inefficient use of space: Shahid's age (15) is stored twice


Let's See What 1NF Lacks (Contd.)
• The candidate key is {Student + Subject}
• Age depends only on the Student column (a partial dependency), which is not allowed in Second Normal Form


Let's See What 1NF Lacks (Contd.)
• So we have identified a partial dependency that must be removed according to 2NF
• To achieve Second Normal Form, it would be helpful to split the subjects out into an independent table and match them up using the student names as foreign keys


Second Normal Form


1. The table should be in 1NF
2. Make new tables to eliminate partial dependencies
o Each part of the key that becomes the primary key of a new table must also remain in the original table, because it serves as the foreign key that relates the new table to the original table
3. Reassign the corresponding dependent attributes
• Remember! Conversion to 2NF is needed only when the 1NF table has a composite primary key
• If a 1NF table has a single-attribute primary key, then the table is automatically in 2NF

Second Normal Form (Contd.)


• Condition 1: the relation should be in 1NF
• Is the Student table (Student, Subject, Age) in 1NF?
• YES! So Condition 1 is satisfied


Second Normal Form (Contd.)


• Condition 2: make new tables to eliminate partial dependencies
• Does a partial dependency exist? YES! (Student → Age)
• So, to remove the partial dependency, we have to create a new relation for every component of the key


Second Normal Form (Contd.)


• Condition 2 is satisfied now because the partial dependency has been removed
• A new relation is created for every component of the key: Subject and Student
• Each key component of the original table becomes the PK of its new table
• The original table also keeps the same attributes as its composite key (Student + Subject), for reference
• Now we have 3 tables: Student, Subject, and Stu_Sub_Rel

Subject Table
Subject
Biology
Maths

Student Table
Student
Shahid
Talha
Usama

Stu_Sub_Rel Table
Student  Subject
Shahid   Biology
Shahid   Maths
Talha    Maths
Usama    Maths

Second Normal Form (Contd.)


• Condition 3 is not yet satisfied, because the original dependent attribute (Age) has not been assigned to its determinant
• So let's do it now

Second Normal Form (Contd.)


• Now the original dependent attribute is assigned to its determinant (Age is assigned to its determinant, Student)
• Condition 3 is also satisfied, so the relations are now in 2NF

Subject Table
Subject
Biology
Maths

Student Table
Student  Age
Shahid   15
Talha    14
Usama    17

Stu_Sub_Rel Table
Student  Subject
Shahid   Biology
Shahid   Maths
Talha    Maths
Usama    Maths
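The 2NF decomposition can be checked by projecting the 1NF table onto the three new relations and joining them back together. If the join returns exactly the original rows, no information was lost. The sketch below uses Python sets, which drop the duplicate rows that the 1NF table repeated.

```python
# The 1NF table from this lecture: (Student, Subject, Age), key (Student, Subject).
one_nf = [
    ("Shahid", "Biology", 15),
    ("Shahid", "Maths", 15),
    ("Talha", "Maths", 14),
    ("Usama", "Maths", 17),
]

# Project onto the new relations; sets remove the duplicates the 1NF table repeated.
student = {(name, age) for name, _, age in one_nf}          # Student(Student, Age)
subject = {subj for _, subj, _ in one_nf}                   # Subject(Subject)
stu_sub_rel = {(name, subj) for name, subj, _ in one_nf}    # Stu_Sub_Rel(Student, Subject)

# Rejoin Stu_Sub_Rel with Student on the student name to check nothing was lost.
age_of = dict(student)
rejoined = {(name, subj, age_of[name]) for name, subj in stu_sub_rel}

print(rejoined == set(one_nf))  # True: the decomposition is lossless here
print(len(student))             # 3: each student's age is now stored once
```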


Third Normal Form


1. Table/ relation must be in Second Normal Form (2NF)
2. Make New Tables to Eliminate Transitive Dependencies
3. Reassign Corresponding Dependent Attributes
• In simple terms, a table/relation is in third normal form (3NF) when:
o It is in 2NF
o It contains no transitive dependencies
• Recall Transitive Dependency
o When a non-key attribute determines another non-key attribute, it is
called Transitive Dependency


Third Normal Form (Contd.)


• Check the 3NF conditions against the tables we converted to 2NF previously (Subject, Student, and Stu_Sub_Rel)
• Condition 1 is satisfied because they are in 2NF
• Is Condition 2, no transitive dependency, also satisfied?
• Yes: the only non-key attribute is Age, which depends directly on Student, so no non-key attribute determines another
• Good news: our tables are already in 3NF

Third Normal Form (Contd.)


• So what if one of our relations contains a transitive dependency?
• Let's see an example of how to remove a transitive dependency so that the relation qualifies for 3NF


Third Normal Form (Contd.)


• Consider the relation below; it contains a transitive dependency
• StudentID is the only key in this relation, so the table is automatically in 2NF
• StudentID is the determinant of its dependents StudentName, DOB, and Zip, but Zip (a non-key attribute) is the determinant of two other non-key attributes, State and City, so a transitive dependency exists

StudentID  StudentName  DOB  Zip  State  City


Third Normal Form (Contd.)


• Hence, to apply 3NF, we need to move City and State to a new table with Zip as its primary key (rule 2 of 3NF) and reassign the dependent attributes to their determinant (rule 3 of 3NF)


Third Normal Form (Contd.)


• Zip has been moved to a new table named Address, where it is the primary key (rule 2 of 3NF), and its dependents have been moved to the Address table as well (rule 3 of 3NF). Now the relations satisfy 3NF
• Note: Zip has also been kept in the original table for reference (as a foreign key)

Student Table
StudentID  StudentName  DOB  Zip

Address Table
Zip  City  State
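As a schema, the 3NF result might look like the minimal SQLite sketch below. The sample rows and the Zip value are made up, and Zip in Student is declared as a foreign key to Address. City and State are now stored once per Zip, and a join recovers the original wide rows when they are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Address (Zip TEXT PRIMARY KEY, City TEXT, State TEXT)")
conn.execute("""
    CREATE TABLE Student (
        StudentID   INTEGER PRIMARY KEY,
        StudentName TEXT NOT NULL,
        DOB         TEXT,
        Zip         TEXT REFERENCES Address(Zip)  -- kept for reference, as in the slide
    )""")

# Hypothetical sample rows: both students share one address row.
conn.execute("INSERT INTO Address VALUES ('Z-001', 'CityA', 'StateA')")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", [
    (1, "Shahid", "2000-01-15", "Z-001"),
    (2, "Talha",  "2001-03-02", "Z-001"),
])

# A join on Zip recovers City and State for each student.
rows = conn.execute("""
    SELECT s.StudentID, s.StudentName, a.City, a.State
    FROM Student s JOIN Address a ON s.Zip = a.Zip
    ORDER BY s.StudentID""").fetchall()
print(rows)  # [(1, 'Shahid', 'CityA', 'StateA'), (2, 'Talha', 'CityA', 'StateA')]
```

The join is the performance cost mentioned earlier under "Normalization also has cons!". In exchange, correcting a city name now touches one Address row instead of every student living at that Zip.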


1NF to 3NF Summary


1NF
1. No row may contain a repeating group of information
2. Each row should have a primary key that identifies it as unique

2NF
1. The table should be in 1NF
2. Make new tables to eliminate partial dependencies
3. Reassign the corresponding dependent attributes

3NF
1. The table/relation must be in Second Normal Form (2NF)
2. Make new tables to eliminate transitive dependencies
3. Reassign the corresponding dependent attributes


References
• Database Systems: Design, Implementation, & Management, 10th/11th Edition (Chapter 6)


Thank you
