Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Designing a Database

Week 10: Database Schema


Design (1,N) (1,N)
Relational
Part(Name,Description,Part#)
Part orders Customer
Supplier(Name, Addr)
Database Design (1,1) Customer(Name, Addr)
From an ER Schema to a Relational supplies
Date Supplies(Name,Part#, Date)
Orders(Name,Part#)
One (1,N)

Restructuring an ER schema Supplier


Network
Performance Analysis Hierarchical
Analysis of Redundancies, Removing Part Part Customer
name
Generalizations
Translation into a Relational Schema
Supplier Customer
Supplier

CSC343 – Introduction to Databases Database Design — 1 CSC343 – Introduction to Databases Database Design — 2

(Relational) Database Design


 Given a conceptual schema (ER, but could also be Database
a UML), generate a logical (relational) schema.
 This is not just a simple translation from one
Design
model to another for two main reasons: Process
o not all the constructs of the Entity-Relationship
model can be translated naturally into the
relational model;
o the schema must be restructured in such a way
as to make the execution of the projected
operations as efficient as possible.
 The topic is covered in Section 3.5 of the textbook.

CSC343 – Introduction to Databases Database Design — 3 CSC343 – Introduction to Databases Database Design — 4

1
Performance Analysis
Logical Design Steps
o It is helpful to divide the design into  An ER schema is restructured to optimize:
two steps:  Cost of an operation (evaluated in terms of the
number of occurrences of entities and relationships
o Restructuring of the Entity- that are visited during the execution of an
Relationship schema, based on operation);
criteria for the optimization of the  Storage requirements (evaluated in terms of
schema and the simplification of the number of bytes necessary to store the data
following step; described by the schema).
o Translation into the logical  In order to study these parameters, we need to know:
model, based on the features of the  Projected volume of data;
logical model (in our case, the  Projected operation characteristics.
relational model).

CSC343 – Introduction to Databases Database Design — 5 CSC343 – Introduction to Databases Database Design — 6

Cost Model
 The cost of an operation is measured in terms of Employee-Department Example
the number of disk accesses required. A disk
access is, generally, orders of magnitude more
expensive than in-memory accesses, or CPU
operations.
 For a coarse estimate of cost, we assume that
a Read operation (for one entity or
relationship) requires 1 disk access;
A Write operation (for one entity or
relationship) requires 2 disk accesses (read
from disk, change, write back to disk).
 There are many other cost models depending on
use and type of DB
Warehouse (OLAP - On-Line Analysis
Processing)
Operational DB (OLTP - On-Line Transaction
CSC343 – Introduction to Databases Database Design — 7 CSC343 – Introduction to Databases Database Design — 8

Processing)

2
Typical Operations Workload Design
 Operation 1: Assign an employee to a project.  During initial design and requirements
 Operation 2: Find an employee record, including her analysis
department, and the projects she works for.
 Operation 3: Find records of employees for a
Estimate operations and frequency
department. Gross estimates at best
 Operation 4: For each branch, retrieve its  After database is operational
departments, and for each department, retrieve the
last names of their managers, and the list of their Tools record actual workload
employees. characteristics
 Need operations and their volume/frequency

CSC343 – Introduction to Databases Database Design — 9 CSC343 – Introduction to Databases Database Design — 10

Analysis of Redundancies
Analysis
Steps  A redundancy in a conceptual
schema corresponds to a piece of
information that can be derived
(that is, obtained through a series
of retrieval operations) from other
data in the database.
 An Entity-Relationship schema
may contain various forms of
redundancy.
CSC343 – Introduction to Databases Database Design — 11 CSC343 – Introduction to Databases Database Design — 12

3
Deciding About Redundancies
Examples of Redundancies  The presence of a redundancy in a database
may be
 an advantage: a reduction in the number of
accesses necessary to obtain the derived
information;
 a disadvantage: because of larger storage
requirements, (but, usually at negligible cost) and
the necessity to carry out additional operations
in order to keep the derived data consistent.
 The decision to maintain or eliminate a
redundancy is made by comparing the cost of
operations that involve the redundant
information and the storage needed, in the
case of presence or absence of redundancy.

CSC343 – Introduction to Databases Database Design — 13 CSC343 – Introduction to Databases Database Design — 14

Removing Generalizations
Cost Comparison: An Example
 The relational model does not allow direct
representation of generalizations that may
be present in an E-R diagram.
 For example, here is an ER schema with
generalizations:

In this schema the attribute


NumberOfInhabitants is
redundant.

CSC343 – Introduction to Databases Database Design — 15 CSC343 – Introduction to Databases Database Design — 16

4
Option 1 Option 3
Note!
Possible
...Two
Restructuring
More...
s
Option 2 Option 4 (combination)

CSC343 – Introduction to Databases Database Design — 17 CSC343 – Introduction to Databases Database Design — 18

General Rules For Removing


Generalization Partitioning and Merging of
 Option 1 is convenient when the operations involve Entities and Relationships
the occurrences and the attributes of E0, E1 and E2  Entities and relationships of an E-R schema can be
more or less in the same way. partitioned or merged to improve the efficiency of
 Option 2 is possible only if the generalization operations, using the following principle:
satisfies the coverage constraint (i.e., every Accesses are reduced by separating
instance of E0 is either an instance of E1 or E2) and attributes of the same concept that are
is useful when there are operations that apply only accessed by different operations and
to occurrences of E1 or E2. by merging attributes of different
 Option 3 is useful when the generalization is not concepts that are accessed by the
coverage-compliant and the operations refer to same operations.
either occurrences and attributes of E1 (E2) or of  The same criteria with those discussed for
E0, and therefore make distinctions between child redundancies are valid in making a decision about
this type of restructuring.
and parent entities.
 Available options can be combined (see option 4)
CSC343 – Introduction to Databases Database Design — 19 CSC343 – Introduction to Databases Database Design — 20

5
Example of Partitioning
Deletion of Multi-Valued Attribute

Recall this is a weak entity

CSC343 – Introduction to Databases Database Design — 21 CSC343 – Introduction to Databases Database Design — 22

Merging Entities Partitioning of a Relationship

Suppose that
composition
represents Are these equivalent?
current and
past
compositions
of a team

Two candidate keys


CSC343 – Introduction to Databases Database Design — 23 CSC343 – Introduction to Databases Database Design — 24

6
Selecting a Primary Key Translation into a Logical Schema
 Every relation must have a unique primary
key.  The second step of logical design consists of a
translation between different data models.
 The criteria for this decision are as follows:
 Starting from an E-R schema, an equivalent
Attributes with null values cannot form primary
keys;
relational schema is constructed. By
“equivalent”, we mean a schema capable of
One/few attributes is preferable to many representing the same information.
attributes;
Internal key preferable to external ones (weak  We will deal with the translation problem
entity); systematically, beginning with the
A key that is used by many operations to
fundamental case, that of entities linked by
access the instances of an entity is preferable many-to-many relationships.
to others.
 At this stage, if none of the candidate keys
satisfies the above requirements, it may be
best to introduce a new attribute (e.g.,Database
CSC343 – Introduction to Databases
social Design — 25 CSC343 – Introduction to Databases Database Design — 26

insurance #, student #,…)

Many-to-Many Relationships Many-to-Many Recursive


Relationships

Employee(Number, Surname, Salary)


Project(Code, Name, Budget)
Participation(Number, Code, StartDate)
Product(Code, Name, Cost)
Composition(Part, SubPart, Quantity)
CSC343 – Introduction to Databases Database Design — 27 CSC343 – Introduction to Databases Database Design — 28

7
Ternary Relationships One-to-Many Relationships

Player(Surname, DateOfBirth, Position)


Team(Name, Town, TeamColours)
Contract(PlayerSurname, PlayerDateOfBirth,
Supplier(SupplierID, SupplierName) Team, Salary)
OR
Product(Code, Type)
Player(Surname, DateOfBirth, Position,
Department(Name, Telephone)
TeamName, Salary)
Supply(Supplier, Product, Department, Quantity)
Team(Name, Town, TeamColours)

CSC343 – Introduction to Databases Database Design — 29 CSC343 – Introduction to Databases Database Design — 30

Weak Entities One-to-One Relationships

Head(Number, Name, Salary, Department,


StartDate)
Department(Name, Telephone, Branch)
Student(RegistrationNumber, University, OR
Surname, EnrolmentYear) Head(Number, Name, Salary, StartDate)
University(Name, Town, Address) Department(Name, Telephone,
HeadNumber, Branch)
CSC343 – Introduction to Databases Database Design — 31 CSC343 – Introduction to Databases Database Design — 32

8
Optional One-to-One
Relationships A Sample ER Schema

Employee(Number, Name, Salary)


Department(Name, Telephone, Branch, Head,
StartDate)
Or, if both entities are optional
Employee(Number, Name, Salary)
Department(Name, Telephone, Branch)
Management(Head, Department, StartDate)

CSC343 – Introduction to Databases Database Design — 33 CSC343 – Introduction to Databases Database Design — 34

1-1 and Optional 1-1 Relationships


Entities with Internal Identifiers
R3

E5 R4 E6

E5 E6
R5

1-1 or optional 1-1


relationships can
E3(A31, A32) lead to messy
E4(A41, A42) E4 transformations
E5(A51, A52)
E4 E3
E6(A61, A62, A63)
E3 E5(A51, A52, A61R3, A62R3, AR3, A61R4,
A62R4, A61R5, A62R5, AR5)
CSC343 – Introduction to Databases Database Design — 35 CSC343 – Introduction to Databases Database Design — 36

9
Weak Entities Many-to-Many Relationships
R3
R3

E1 R1 E5 R4 E6
E1 R1 E5 R4 E6

R5
R5 R6
R6

E1(A11, A51, A12) E2


E2
E2(A21, A11, A51, A22) R2 E4
E4
E3
E3
R2(A21, A11, A51, A31, A41, AR21, AR22)
CSC343 – Introduction to Databases Database Design — 37 CSC343 – Introduction to Databases Database Design — 38

Homework - Fill in a Translation Summary of Transformation Rules

E1(A11, A51, A12)


E2(A21, A11, A51, A22)
E3(A31, A32)
E4(A41,A42)
E5(A51, A52, A61R3, A62R3, AR3, A61R4,
A62R4, A61R5, A62R5, AR5)
E6(A61, A62, A63)
R2(A21, A11, A51, A31, A41, AR21, AR22)

CSC343 – Introduction to Databases Database Design — 39 CSC343 – Introduction to Databases Database Design — 40

10
...More Rules... …Even More Rules...

CSC343 – Introduction to Databases Database Design — 41 CSC343 – Introduction to Databases Database Design — 42

…and the Last One... The Training Company Revisited

CSC343 – Introduction to Databases Database Design — 43 CSC343 – Introduction to Databases Database Design — 44

11
Logical Design Using CASE Tools
 The logical design phase is partially supported by
database design tools: Logical
 the translation to the relational model is carried out Design
by such tools semi-automatically; with a
 the restructuring step is difficult to automate and
CASE tools provide little or no support for it. CASE
 Most commercial CASE tools will generate automatically Tool
SQL code for the creation of the database.
 Some tools allow direct connection with a DBMS and
can construct the corresponding database
automatically.
 [CASE = Computer-Aided Software Engineering]

CSC343 – Introduction to Databases Database Design — 45 CSC343 – Introduction to Databases Database Design — 46

12

You might also like