
CHAPTER 3: RELATIONAL DATA MODEL AND NORMALIZATION
CSI300
Dr Nour Charara, AUCE

DEFINITIONS
• Informally, a relation looks like a table of values (see Figure on next slide).

• A relation contains a set of rows.

• The data elements in each row represent certain facts that correspond to a real-world entity or relationship
– In the formal model, rows are called tuples

• Each column has a column header that gives an indication of the meaning of the data items in that column
– In the formal model, the column header is called an attribute name (or just attribute)
ATTRIBUTES & TUPLES
INFORMAL DEFINITIONS
• Key of a Relation:
– Each row (tuple) in the table is uniquely identified by the
value of a particular attribute (or several attributes
together)
• Called the key of the relation
– In the STUDENT relation, SSN is the key
– If no attributes possess this uniqueness property, a new
attribute can be added to the relation to assign unique
row-id values (e.g. unique sequential numbers) to identify
the rows in a relation
• Called artificial key or surrogate key
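The uniqueness property can be checked mechanically on a populated table. The following is a minimal sketch, assuming a small hypothetical STUDENT table held as a list of dicts; the helper names `is_key` and `add_surrogate_key` and all sample values are illustrative and do not come from the slides. Note that it only tests one table state, while a key must hold for every valid state.

```python
from itertools import count


def is_key(rows, attrs):
    """Return True if the combined values of `attrs` are unique across rows."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def add_surrogate_key(rows, name="row_id"):
    """Return copies of the rows with a unique sequential number added."""
    ids = count(1)
    return [{name: next(ids), **row} for row in rows]


# Hypothetical sample data (not from the slides).
students = [
    {"SSN": "111-11-1111", "Name": "Ann", "Major": "CS"},
    {"SSN": "222-22-2222", "Name": "Bob", "Major": "CS"},
    {"SSN": "333-33-3333", "Name": "Ann", "Major": "Math"},
]

ssn_is_key = is_key(students, ["SSN"])     # SSN values are all distinct
name_is_key = is_key(students, ["Name"])   # "Ann" appears twice
with_ids = add_surrogate_key(students)     # adds row_id 1, 2, 3
```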
FORMAL DEFINITIONS – RELATION SCHEMA
• Relation Schema (or description) of a Relation:
– Denoted by R(A1, A2, ..., An)
– R is the name of the relation
– The attributes of the relation are A1, A2, ..., An
– n is the degree (arity) of the relation, i.e. its number of attributes
• Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
– CUSTOMER is the relation name
– The CUSTOMER relation schema (or just relation) has four attributes:
Cust-id, Cust-name, Address, Phone#
• Each attribute has a domain or a set of valid values.
FORMAL DEFINITIONS - TUPLE
• A tuple is an ordered set of values (enclosed in angle brackets ‘< … >’)
• Each value is derived from an appropriate domain.
• A row in the CUSTOMER relation is a 4-tuple and would consist of
four values, for example:
– <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404)
894-2000">
– Called a 4-tuple because it has 4 values
– In general, a particular relation will have n-tuples, where n is the
number of attributes for the relation
• A relation is a set of such tuples (rows)
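As a rough illustration, the CUSTOMER schema and its tuples can be modelled in Python. This is only a sketch: the field names are adapted from the slide because Python identifiers cannot contain `-` or `#`, and the chosen types are assumptions.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """Relation schema CUSTOMER(Cust-id, Cust-name, Address, Phone#)."""
    cust_id: int
    cust_name: str
    address: str
    phone: str


# The degree of the relation is its number of attributes.
degree = len(Customer._fields)

# A 4-tuple: one value per attribute, in schema order.
row = Customer(632895, "John Smith",
               "101 Main St. Atlanta, GA 30332", "(404) 894-2000")

# A relation is a set of tuples, so adding the same tuple twice has no effect.
customer_relation = set()
customer_relation.add(row)
customer_relation.add(row)
```

Using a `set` mirrors the formal model: a relation has no duplicate tuples and no inherent tuple order.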
FORMAL DEFINITIONS - DOMAIN
• A domain of values can have a logical definition:
– Example: “USA_phone_numbers” is the set of 10-digit phone numbers valid in the U.S.

• A domain also has a data-type or a format defined for it.
– The USA_phone_numbers may have a format: (ddd)ddd-dddd where each d is a decimal digit.
– Dates have various formats, such as year, month, day formatted as yyyy-mm-dd, or as dd:mm:yyyy, etc.
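A domain's format can be enforced with a simple membership test. The sketch below assumes the (ddd)ddd-dddd format stated above; the names `USA_PHONE` and `in_usa_phone_domain` are hypothetical, and the check covers only the format, not whether a number is actually assigned in the U.S.

```python
import re

# (ddd)ddd-dddd, where each d is a decimal digit.
USA_PHONE = re.compile(r"\(\d{3}\)\d{3}-\d{4}")


def in_usa_phone_domain(value: str) -> bool:
    """Return True if `value` matches the (ddd)ddd-dddd format exactly."""
    return USA_PHONE.fullmatch(value) is not None


ok = in_usa_phone_domain("(404)894-2000")
wrong_layout = in_usa_phone_domain("404-894-2000")
too_short = in_usa_phone_domain("(404)894-200")
```

A strict pattern like this one also rejects a value written with a space after the area code, such as "(404) 894-2000", so the stored format has to be agreed on before values are validated.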
RELATION DEFINITIONS SUMMARY
Informal Terms                              Formal Terms
Table                                       Relation
Column Header                               Attribute
All possible Column Values or Data Type     Domain
Row                                         Tuple
Table Definition                            Schema of a Relation
Populated Table                             State of the Relation
DIFFERENT ORDER OF TUPLES
RELATIONAL DATABASE SCHEMA
• Relational Database Schema:
– A set S of relation schemas that belong to the same database.
– S is the name of the whole database schema
– S = {R1, R2, ..., Rn}
– R1, R2, …, Rn are the names of the individual relation schemas within the database S
• Next figure shows a COMPANY database schema with 6 relation schemas
[Figure: Entity-Relationship (ER) diagram; labels: Entity, Relation, Relationship]
RELATIONAL DATABASE STATE
• Next two slides show an example of a COMPANY database state
– Each relation has a set of tuples
• The tuples in each table satisfy key and other constraints
• If all constraints are satisfied by a database state, it is called a valid state
– The database state changes to another state whenever the tuples
in any relation are changed via insertions, deletions, or updates
ANOMALIES IN DBMS

Three types of anomalies can occur when the database is not normalized:
1. Insertion anomaly
2. Update anomaly
3. Deletion anomaly
Example: Suppose a manufacturing company stores employee details in a table
named employee that has four attributes: emp_id for the employee’s id,
emp_name for the employee’s name, emp_address for the employee’s address,
and emp_dept for the department in which the employee works.


To overcome these anomalies we need to normalize the data


Update anomaly: The table has two rows for employee Rick because he belongs to two
departments of the company. To update Rick's address, we must update it in both
rows. If the address is corrected in one row but not in the other, the database
records two different addresses for Rick, and the data becomes inconsistent.

Insert anomaly: Suppose a new employee joins the company who is still in training
and not yet assigned to any department. We cannot insert that employee's data into
the table if the emp_dept field does not allow nulls.

Delete anomaly: Suppose the company closes department D890 at some point. Deleting
the rows that have emp_dept D890 also deletes all information about employee
Maggie, since she is assigned only to this department.
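The three anomalies can be reproduced on a small in-memory table. This is a minimal sketch using Python's standard `sqlite3` module. Only the column names, Rick's two departments, and Maggie's department D890 follow the example; the ids, addresses, other department codes, and the new employee "Sam" are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER, emp_name TEXT, emp_address TEXT,"
    " emp_dept TEXT NOT NULL)"  # emp_dept does not allow nulls
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Old Street 1", "D001"),   # Rick appears twice:
        (101, "Rick", "Old Street 1", "D002"),   # one row per department
        (123, "Maggie", "Hill Road 5", "D890"),
    ],
)

# Update anomaly: changing only one of Rick's rows leaves two addresses.
conn.execute(
    "UPDATE employee SET emp_address = 'New Street 9' "
    "WHERE emp_name = 'Rick' AND emp_dept = 'D001'"
)
rick_addresses = {
    row[0]
    for row in conn.execute(
        "SELECT emp_address FROM employee WHERE emp_name = 'Rick'"
    )
}

# Insert anomaly: an employee without a department cannot be stored.
try:
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        (200, "Sam", "Lake View 3", None),
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: closing D890 also removes every trace of Maggie.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_name = 'Maggie'"
).fetchone()[0]
```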
NORMALIZATION
• Normalization is the process of removing redundant data from your tables to
improve storage efficiency, data integrity, and scalability.
• Normalization generally involves splitting existing tables into multiple ones,
which must be re-joined or linked each time a query is issued.
The most commonly used normal forms are:
1. First normal form (1NF)
2. Second normal form (2NF)
3. Third normal form (3NF)

FIRST NORMAL FORM (1NF)
The official qualifications for 1NF are:
1. Each attribute name must be unique.
2. Each attribute value must be single (atomic).
3. Each row must be unique.
4. There are no repeating groups.
Additional:
• Choose a primary key.
Reminder:
A primary key is unique, not null, and unchanging. A primary
key can be either a single attribute or a combination of attributes.
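One common way to reach 1NF is to replace a multi-valued attribute with one row per value. The sketch below assumes a hypothetical employee table whose emp_mobile attribute holds a list of numbers; the function name `to_first_normal_form` and all data are illustrative only.

```python
def to_first_normal_form(rows, multi_attr):
    """Emit one row per value of a list-valued attribute.

    Rows whose list is empty produce no output row in this sketch.
    """
    result = []
    for row in rows:
        for value in row[multi_attr]:
            flat = dict(row)
            flat[multi_attr] = value      # a single value per attribute
            if flat not in result:        # keep each row unique
                result.append(flat)
    return result


# Hypothetical data: emp_mobile is multi-valued, so the table is not in 1NF.
employees = [
    {"emp_id": 101, "emp_name": "Lena", "emp_mobile": ["5550101"]},
    {"emp_id": 102, "emp_name": "Omar", "emp_mobile": ["5550102", "5550103"]},
]

flat_rows = to_first_normal_form(employees, "emp_mobile")

# After flattening, emp_id alone no longer identifies a row, so the
# primary key becomes the combination (emp_id, emp_mobile).
primary_key_values = [(r["emp_id"], r["emp_mobile"]) for r in flat_rows]
```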
FUNCTIONAL DEPENDENCIES

DETERMINANT

SECOND NORMAL FORM (2NF)

THIRD NORMAL FORM (3NF)

BOYCE CODD NORMAL FORM (BCNF) – 3.5NF

SUPER KEY
A superkey is a combination of columns that uniquely identifies any row within
a relational database management system (RDBMS) table.
A candidate key is a minimal superkey: a superkey from which no column can be
removed without losing the ability to uniquely identify each row.
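The difference between a superkey and a candidate key can be illustrated by brute force on sample data. This is a sketch under a strong assumption: it inspects a single hypothetical table state, whereas real keys are properties of the schema and must hold for every valid state. The function names and the enrollment data are invented for illustration.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if the values of `attrs` are unique across the given rows."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)


def candidate_keys(rows, attributes):
    """Return the minimal superkeys, trying smaller column sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip any set that contains an already found key: not minimal.
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical enrollment data.
columns = ["student_id", "course", "student_age"]
enrollment = [
    {"student_id": 1, "course": "DB", "student_age": 20},
    {"student_id": 1, "course": "OS", "student_age": 20},
    {"student_id": 2, "course": "AI", "student_age": 21},
    {"student_id": 3, "course": "OS", "student_age": 20},
    {"student_id": 3, "course": "Net", "student_age": 20},
]

all_columns_superkey = is_superkey(enrollment, columns)
found_keys = candidate_keys(enrollment, columns)
```

Here all three columns together form a superkey, but only (student_id, course) is minimal, so it is the single candidate key found in this sample.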

Example: Suppose a school wants to store data about teachers and the subjects they
teach. They create a table that looks like this:

Since a teacher can teach more than one subject, the table can have multiple rows
for the same teacher.

The table is in 1NF because each attribute has atomic values. However, it is not in
2NF because the non-prime attribute teacher_age depends on teacher_id alone,
which is a proper subset of the candidate key (a partial dependency). This violates
the rule for 2NF, which says “no non-prime attribute is dependent on a part of primary
key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:

The resulting tables also comply with third normal form (3NF).
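The 2NF decomposition can be sketched as two projections of the original table. The attribute names teacher_id and teacher_age come from the example; the subject column name, the second table name teacher_subject, the helper `project`, and all sample rows are assumptions made for illustration.

```python
def project(rows, attrs):
    """Project rows onto `attrs`, dropping duplicate results."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical data; the candidate key is (teacher_id, subject).
teacher = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Biology", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 38},
]

# teacher_age depends on teacher_id alone, so it moves to its own table.
teacher_details = project(teacher, ["teacher_id", "teacher_age"])
teacher_subject = project(teacher, ["teacher_id", "subject"])

# Joining the two tables on teacher_id rebuilds the original rows,
# so no information is lost by the split.
rejoined = [
    {**s, "teacher_age": d["teacher_age"]}
    for s in teacher_subject
    for d in teacher_details
    if s["teacher_id"] == d["teacher_id"]
]
```

Each teacher's age is now stored once in teacher_details instead of once per subject taught.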
Example: Suppose a company wants to store the complete address of each
employee. They create a table named employee_details that looks like this:

Candidate Keys: {emp_id}


Non-prime attributes: all attributes except emp_id are non-prime, as they are not
part of any candidate key.
Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is
dependent on emp_id. This makes the non-prime attributes (emp_state, emp_city &
emp_district) transitively dependent on the primary key (emp_id), which violates
the rule of 3NF.
To make this table comply with 3NF, we have to break it into two tables to
remove the transitive dependency:
employee table:

employee_zip table:
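A minimal sketch of this decomposition using Python's standard `sqlite3` module follows. The table names and the emp_id, emp_zip, emp_state, emp_city and emp_district columns come from the example; the emp_name column, the column types, and all row values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_zip (
    emp_zip      TEXT PRIMARY KEY,
    emp_state    TEXT,
    emp_city     TEXT,
    emp_district TEXT
);
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT,                                  -- assumed column
    emp_zip  TEXT REFERENCES employee_zip(emp_zip)
);
""")

# Hypothetical data: two employees share the same zip code.
conn.executemany(
    "INSERT INTO employee_zip VALUES (?, ?, ?, ?)",
    [
        ("Z1", "State A", "City A", "District A"),
        ("Z2", "State B", "City B", "District B"),
    ],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ann", "Z1"), (2, "Bob", "Z1"), (3, "Cara", "Z2")],
)

# Location facts are stored once per zip, so one update fixes every employee.
conn.execute(
    "UPDATE employee_zip SET emp_district = 'District A2' "
    "WHERE emp_zip = 'Z1'"
)

# The complete address is rebuilt with a join when a query needs it.
full_details = conn.execute(
    "SELECT e.emp_id, e.emp_name, e.emp_zip, "
    "       z.emp_state, z.emp_city, z.emp_district "
    "FROM employee AS e JOIN employee_zip AS z ON e.emp_zip = z.emp_zip "
    "ORDER BY e.emp_id"
).fetchall()
```

The join is the cost of normalization: the split tables must be linked again whenever the full address is queried.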
