Download as pdf or txt
Download as pdf or txt
You are on page 1of 41

Database Management Systems

Relational Data Model


History of Relational Data
Model
Invented by E.F. Codd in 1970, IBM Research
Prototypes (1970s)
 System R (IBM San Jose Research Laboratory)
 Ingres (University of California, Berkeley)
Commercial systems were available from
1980
Most widely used data model
 IBM, Oracle, Microsoft, …
Basic concepts of the model
Based on mathematical theory
Based on the following components:
 Table: Data are organized into a form of tables
with rows and columns
 Data Manipulation: Efficient operations are used to
manipulate data stored in the relations
 Data Integrity: Facilities are included to specify
business rules that maintain the integrity of data
when they are manipulated
Relational Data Structure
Employee Degree

Name Born Salary


Cardinality

Steve White 1975-07-15 43.000


Susan Taylor 1984-12-28 280.000

Relation: named two dimensional table of


data
Attribute: named column
Record: one row in the relation
Properties of Relations
Each relation has a unique name in the database
An entry at the intersection of each row and column
is atomic (singe valued), no multi-valued attributes in
the relations
Each row is unique, no two rows in a relation, which
are identical
Each attribute (column) has unique name within a
table
The sequence of columns is insignificant
The sequence of rows is insignificant
Data Integrity
Domain Constraints
Entity Integrity
Referential Integrity
Operational Constraints
Functional Dependencies
Constraint between two attributes or two sets of
attributes
Attribute B is functionally dependent on attribute A if
the value of A uniquely determines the value of B.
 Denoted A  B
Example
SSN  Name, Address, BirthDate
ISBN  Title, Publisher
The attributes on the left hand side of the arrow in a
functional dependency are called determinant
Properties of the Functional
Dependencies

Armstrong’s Axioms
 Reflexivity: if X is subset of Y, then YX
 Augmentation: if XY, then XZYZ for any Z
 Transitivity: if XY and YZ, then XZ
Additional rules:
 Union: if XY and XZ, then XYZ
 Decomposition: if XYZ, then XY and XZ
Candidate Keys
1. Unique identification: Each non keys attribute is
functionally dependent on the key
2. Redundancy free: No attribute in the key can be
deleted without removing the property of unique
identification
Example: Employee (SSN, IDcard, Name,
Address, Born)
SSN → IDcard, Name, Address, Born
IDCard → SSN, Name, Address, Born
Name, Address, Born → SSN, IDcard
Keys
Key
 Simple: consists one attribute
 Composite key: consists more than one attribute
Primary key: The one of the candidates keys
are defined as primary key.
Foreign Key: An attribute in a relation that
serves as the primary key of another relation
in the same database
Benefit of the constraints
Automatically verified rules
Just definition is needed
The relational DBMS refuses a data
modification if it breaks a defined
constraints
Domain Constraints
All of the values in a column of a relation
must be taken from the same domain
 Data type
 Size (length)
 Domain (allowable values, or range)
E.G.: passport number
 String, 8 chars, first two chars are letter, the rest
are number
Entity Integrity
Every relation has primary key
Primary key or part of the primary key cannot
be null
Properties of the null value
 Null value means undefined value
 Not equal to 0
 Not equal to an empty string
 Not equal to anything else
 A null vale is not equal to an another null value
Referential Integrity
Maintain consistency among the rows of two
relations
Each foreign key value must match a primary
key value in the other relation or else the
foreign key value must be null.
Example
 Book (ISBN, Title, PublisherID)
 Publisher(PublisherID, Name, Address)
Operational Constraints
Come from the business rules
Cannot be defined in relational data
model
Example:
 Maximum 20 student can take an exam in
same time  Limitation for number of
records
Relational Algebra
An algebra whose operands are relations or
variables that represent relations.
Operators are designed to do the most
common things that we need to do with
relations in a database.
 The result is an algebra that can be used as a
query language for relations.
Relation  Relation
Product Table

ID Description Packet Unit Price


I26 Screw 10 230
I35 Nail 10 180
I87 Hammer 1 24
I22 Digger 1 454
I98 Screw driver 1 1203
I56 Scissors 1 442
I34 Brush 5 762
Selection
R1=σc(R2)
C is a condition (as
in “if” statements)
that refers to
attributes of R2.
R1 is all those
records of R2 that
satisfy C.
Selection Example
σPacket=10(Product)
ID Description Packet Unit Price
I26 Screw 10 230
I35 Nail 10 180
Projection
R1=πL(R2)

L is a list of attributes from


the schema of R2.
R1 is constructed by looking
at each record of R2,
extracting the attributes on
list L, in the order specified,
and creating from those
components a record for R1.
Eliminate duplicate records,
if any.
Projection Example
πID, Description (Product)

ID Description
I26 Screw
I35 Nail
I87 Hammer
I22 Digger
I98 Screw driver
I56 Scissors
I34 Brush
Cross Product
R3 = R1 x R2
 Pair each record t1 of R1 with each record
t2 of R2.
 Concatenation t1t2 is a record of R3.
 Schema of R3 is the attributes of R1 and
R2, in order.
 But beware attribute A of the same name
in R1 and R2: use R1.A and R2.A.
Product_Supplier table

ID Supplier
I26 S22
I87 S44
I22 S10
I28 S10
I98 S22
I98 S01
Theta-join
R3=(R1) C (R2)

Take the product R1 X R2


Then apply select c to the result
As for select c can be any boolean-
valued condition.
 Historic versions of this operator allowed
only A theta B, where theta was =, <, etc.;
hence the name “theta-join.”
Equi-join
Equi join is a special theta join where the condition
contains only equality operator

(Product) Product.ID=Product_Supplier.ID (Product_Supplier)

ID Description Packet UnitPrice ID Supplier


I26 Screw 10 230 I26 S22
I87 Hammer 1 24 I87 S44
I22 Digger 1 454 I22 S10
I26 Screw 10 230 I26 S10
I98 Screw Driver 1 1203 I98 S22
I98 Screw Driver 1 1203 I98 S01
Natural Join
A frequent type of join connects two
relations by:
 Equating attributes of the same name
 Projecting out one copy of each pair of
equated attributes.
ID Description Packet UnitPrice Supplier
I26 Screw 10 230 S22
I87 Hammer 1 24 S44
I22 Digger 1 454 S10
I26 Screw 10 230 S10
I98 Screw Driver 1 1203 S22
I98 Screw Driver 1 1203 S01

(Product) (Product_Supplier)
Outer Join
Suppose we join R C S.
A record of R that has no record of S
with which it joins is said to be dangling
Outer join preserves dangling records
by padding them with a special NULL
symbol in the result.
Outer Join Example
(Product) (Product_Supplier)

ID Description Unit UnitPrice Supplier


I26 Screw 10 230 S22
I87 Hammer 1 24 S44
I22 Digger 1 454 S10
I26 Screw 10 230 S10
I98 Screw Driver 1 1203 S22
I98 Screw Driver 1 1203 S01
I56 Scissors 1 442
I34 Brush 5 762
I35 Nail 10 180

Each product table records can be found in the result


If there is no supplier record for a product record
then the result contains null values in the product
suppliers attributes
Union, Intersection, Set difference
All of these operations take two input relations, which
must be union compatible
 Same number of fields
 Corresponding fields have same data type
Operations:
 Union: The result contains each of the records from both
table but the duplicated records are filtered
 Intersection: The result contains the records, which can be
found in both relation
 Set Difference: The result contains the records, which can
be found only in the first relation
“Well-structured” Relations
Contain minimal redundancy
Allow users to insert, delete and modify
the rows in a relation without errors or
inconsistency
Anomalies
Student
ID Name Class Specialization Dept Responsible
123545 John Doe 4 Tel. Tel Dr. George Com
324231 Simon Smith 5 Power Elct. PE Dr. John Power
423232 Charles Newton 5 Comp. Sci. CS Dr. George Relation
423444 Arthur Stewart 3 Comp. Sci CS Dr. George Relation
423869 Eric Stephenson 4 Comp. Sci CS Dr. George Relation

Result of the wrong design


Consequences of the redundancy:
 Update anomaly: one occurrence of a fact is changed, but
not all occurrences.
 Deletion anomaly: valid fact is lost when a record is deleted.
 Insertion anomaly: valid fact cannot be inserted into the
relation because some information is missing. But this
information is independent or not available, when the new
record is created.
Normalization
The process of decomposing relations with
anomalies to produce smaller, well-structured
relations
Validating method
Formal method
Normal Forms
 1NF
 2NF
 3NF
 BCNF
First Normal Form 1NF
Any multi valued attributes (repeating groups) have
been removed, so there is a single value at the
intersection of each rows and columns
ID Name Class Specialization Dept Responsible
123545 John Doe 4 Tel. Tel Dr. George Com
324231 Simon Smith 5 Power Elct. PE Dr. John Power
423232 Charles Newton 5
423444 Arthur Stewart 3 Comp. Sci. CS Dr. George Relation
423869 Eric Stephenson 4

ID Name Class Specialization Dept Responsible


123545 John Doe 4 Tel. Tel Dr. George Com
324231 Simon Smith 5 Power Elct. PE Dr. John Power
423232 Charles Newton 5 Comp. Sci. CS Dr. George Relation
423444 Arthur Stewart 3 Comp. Sci CS Dr. George Relation
423869 Eric Stephenson 4 Comp. Sci CS Dr. George Relation
Second Normal Form 2NF
1NF
Any Partial Functional Dependency have been
removed
 Every non key attribute functionally depend on the
full set of primary key attribute
Solution
 Decompose R Using X -> B
 Replace R by relations with schemas:
1. R1 = X U B
2. R2 = R – B
2 NF Example
League (Team_name, City, Player_id, Player_name)

FD:
Team_Name, Player_id → City, Player_name
Player_id → Player_name
Team_Name → City

Player (Player_id, Player_name)


Team (Team_name, City)
League (Team_name, Player_id)
Third Normal Form 3NF
2NF
Any transitive dependency have been
removed
 There is no functional dependency between two
(or more) non key attributes
Solution
 Like 2NF
3 NF Example
Studies (ID, Name, Class, Specialization, Department,
Responsible)
FD:
Spec. → Dept., Resp.
ID → Name, Class, Spec., Dept., Resp.

Student(ID, Name, Class, Spec.)


Studies(Spec., Dept., Resp.)
Boyce-Codd Normal Form BCNF
Determinants of each of the functional
dependencies are candidate key
Consequence
 Remove all anomalies, which came from
the functional dependencies
Solution
 Like 2NF and 3NF
BCNF Example
Interview (Candidate, Day, Hour, Employee_ID, Room)
Cand, Day → Hour, Emp_ID, Room
Day, Emp_ID → Room
Day, Emp_ID, Hour → Cand, Room

Interview (Candidate, Day, Hour, Employee_ID)


Emp_Room ( Day, Employee_ID, Room)
3NF - BCNF
If a relation is in BCNF then it is in 3NF
If a relation is in 3NF and there isn’t any
composite candidate keys then the relation is
in BCNF.
If a relation is in 3NF and there are composite
candidate keys but the determinant of
candidate keys do not have common
attributes then the relation is in BCNF.
Dependency preserve
Every relation has a dependency
preserve decomposition, which is in 3NF
Every relation has lossless join
decomposition, which is in BCNF
 Lossless decomposition: The original
relation can be reconstructed by natural
join of its decompositions

You might also like