Introduction to Database Design and



Outline (cont.)

• Data models
• Basic object modelling
• Advanced object modelling
• Object meta-modelling
• Function modelling
• Relational & object-oriented databases
• Reverse engineering
• Very large database systems
Basic Concepts

• Informally:
– a database is a collection of related data
– a database management system (DBMS) is the
software that provides the functionality to manage
and access a database
– a database contains data integrated from several
sources (e.g. departments of a company)
– data is available in a standard format for many
different applications
What is a database?
• Any collection of data can be described as a database
• Computerised database systems are now very widely used
• Information is stored in a database every time you:
– use a bank account
– book a travel ticket
– make an appointment with a doctor
– etc.
Basic Definitions
• Database:
– A collection of related data.
• Data:
– Known facts that can be recorded and that have implicit meaning.
• Mini-world:
– Some part of the real world about which data is stored in a database; for example, student grades and transcripts at a university.
• Database Management System (DBMS):
– A software package/system that facilitates the creation and maintenance of a computerized database.
• Database System:
– The DBMS software together with the data itself. Sometimes the applications are also included.
Why databases?
• To avoid:
– redundant data
– redundant processes/interfaces

• To enable:
– ease of maintenance
– sharing of data

• Motivation for Databases:

– Data is a very important asset of an organization
– Key idea: maintain data independently of the applications that use it
Database management systems

• A database is simply the collection of data which you need to store
• To actually store the data, and to do anything useful with it, you need a Database Management System (DBMS)
• A DBMS controls the way the data is stored on the computer, and provides ways of getting data in and out of the system
The Database Approach
[Figure: application programs 1 … n all send their data instructions to one shared DBMS, which manages the data]

• DBMS is “general purpose”
• self-describing
• contains data + metadata (“data about data”)
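A DBMS is self-describing: the metadata is stored in the database alongside the data. As a minimal sketch, assuming Python's built-in `sqlite3` module and SQLite (one of the compact RDBMSs listed later), the structure of a hypothetical `Book` table can be read back from SQLite's own catalog table, `sqlite_master`.

```python
import sqlite3

# An in-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Data definition records metadata (the table's structure) as well as
# creating an empty table. 'Book' and its columns are hypothetical.
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO Book VALUES ('0000000000', 'Example title')")  # hypothetical row

# The metadata lives in the database too, in SQLite's catalog table
# sqlite_master, and can be queried like ordinary data.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

for obj_type, name, sql in catalog:
    print(obj_type, name)
    print(sql)
```

Other RDBMSs expose the same idea under different names (the SQL standard defines `INFORMATION_SCHEMA` views, for example), so the exact catalog tables vary by product.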

Example of the Database Approach: a library system



What is a DBMS?

[Figure: A simplified database system environment. Application programs/queries are handled by the DBMS software (software to process them and software to access stored data), which works on the stored database definition and the stored database.]

• general-purpose software for the definition, construction and manipulation of databases
Characteristics of DBMS

• data is:
– integrated, shared, persistent
– self-describing
• abstraction: program-data independence
• multiple views of the data
(different users need different kinds of
Advantages of the database approach
– controlled redundancy
– data consistency and integrity
– sharing of data (multiple users)
– re-use of data across multiple applications
– improved security
– enforcement of standards and computation of
– economy of scale
– improved responsiveness, productivity
– DBMS provides: backup/recovery, concurrency control

– What about the downside?

Disadvantages of the Database Approach

• complexity
• size (of software and application)
• cost
• performance
• risk of (spectacular!) failures
Relational databases
• Relatively complex data like this is better handled with the relational model
• Devised by Edgar Codd around 1970
• Most databases nowadays are relational databases
– although there are others: object databases, XML databases, “NoSQL” databases
• A database management system which uses the relational model is called an RDBMS
Databases and Enterprise Information Systems
[Figure: layered architecture. UI layer: web pages, GUI. Business layer: command objects, domain objects, data access. Database layer: database.]

Database servers
[Figure: three ways of accessing a database.
Desktop application: the database is on the desktop PC and is accessed as a file or through a local server.
Client-server application: client desktop PCs access the database through a network server, over a network or the internet.
Enterprise web application: web browsers on desktop PCs access an enterprise database.]
Popular RDBMSs
• Microsoft Access
• aimed at small businesses, and useful for desktop applications and systems with a small number of users
• Microsoft SQL Server, Oracle,
• scalable and secure, and widely used by large organisations
• open-source and quite powerful, widely used in web sites
• Microsoft SQL Server Compact, JavaDB, SQLite
• compact DBMSs, suitable for mobile devices in particular
• ...and many more
RDBMS tools

• Most RDBMSs include tools to create complete applications, for example:
– form designers: to allow data entry forms to be created for the user interface
– report designers: to present data to the user
– stored procedures: to perform processing of data according to business rules
RDBMS and other tools

• Can use your RDBMS and its tools for everything, or
• Can use the RDBMS as a component and use other tools and programming languages to create the other components
SQL – the language of relational databases
• To develop applications which use relational databases you usually need to use SQL
• SQL: Structured Query Language
• This is the language which is used to define queries
• A query is a request to a DBMS for some specific information
• Relational databases are sometimes referred to as SQL databases
SQL example

• SQL queries can be quite easy to understand
• For example, the following query finds the last name of all the customers in a database:

SELECT lastName FROM Customers;

• SQL can also be used to add, update or delete data, and to build the database in the first place
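The query above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming SQLite: the `Customers` table definition, its `id` and `firstName` columns and the sample names are hypothetical; only `lastName` comes from the query in the text. It also shows SQL building the table (CREATE) and adding data (INSERT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Build the database in the first place (hypothetical table definition).
conn.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT)")

# Add some hypothetical sample data; '?' placeholders let the driver pass values safely.
conn.executemany(
    "INSERT INTO Customers (firstName, lastName) VALUES (?, ?)",
    [("Ann", "Smith"), ("Raj", "Patel"), ("Li", "Chen")],
)

# The query from the text: the last name of every customer.
rows = conn.execute("SELECT lastName FROM Customers").fetchall()
last_names = [row[0] for row in rows]
print(last_names)
```

Without an ORDER BY clause, SQL does not guarantee the order in which rows come back.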
SQL standards

• SQL is supposed to be a standard language which is supported by all RDBMSs
• In fact, you need to be careful because there are some important differences between the versions of SQL used by different systems
– Different versions of the SQL standard (SQL-92, SQL:1999, etc.)
– Different implementations by RDBMS vendors
SQL: characteristics

• based on the relational calculus (and on relational algebra)
– comprehensive data language:
• DDL + DML (data definition + data manipulation)
– can be used interactively or embedded in a procedural language
– terminology:
Relational model:   relation   tuple   attribute
SQL:                table      row     column
• some criticisms, e.g.:
– redundancy: more than one way to specify a query
– certain relational operations omitted
SQL statements / commands
• Consist of:
– reserved words (fixed, part of the syntax)
– user-defined words (names of relations, attributes, domains)
• Provide a comprehensive language to:
– create the database: creation/deletion of database
• schema, relations (tables), domains, views, indexes, …
– populate the database, and modify it
– query the database
A Selection of Data Types
Category   Type                        Description
Character  CHAR(length)                fixed length
           VARCHAR(length)             variable length (up to a maximum)
           TEXT                        variable length
Numeric    INTEGER, SMALLINT           integer
           OID                         object identifier
           NUMERIC(precision, scale)   exact decimal number
           FLOAT, DOUBLE               floating-point number
Temporal   DATE                        date
           TIME                        time
           TIMESTAMP                   date and time
           INTERVAL                    interval of time
Logical    BOOLEAN                     true or false

Data manipulation in SQL

• data manipulation involves:
– adding data to database tables
– modifying data in tables
– deleting data
– retrieving data from the database
• SQL provides four statements for data manipulation (DML): INSERT, UPDATE, DELETE and SELECT
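A minimal sketch of the four DML statements (INSERT, UPDATE, DELETE, SELECT), assuming SQLite through Python's built-in `sqlite3` module. The three products and their original prices come from the example database that appears later; the `Products` column types and the new price of 0.75 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (pid TEXT PRIMARY KEY, name TEXT, price NUMERIC)")

# INSERT: add rows
conn.executemany(
    "INSERT INTO Products (pid, name, price) VALUES (?, ?, ?)",
    [("p01", "Comb", 0.50), ("p02", "Brush", 0.50), ("p03", "Razor", 1.00)],
)

# UPDATE: modify existing rows (hypothetical new price for one product)
conn.execute("UPDATE Products SET price = ? WHERE pid = ?", (0.75, "p02"))

# DELETE: remove the rows that match a condition
conn.execute("DELETE FROM Products WHERE pid = ?", ("p03",))

# SELECT: retrieve what is left
remaining = conn.execute("SELECT pid, name, price FROM Products ORDER BY pid").fetchall()
conn.commit()
print(remaining)
```

Note that UPDATE and DELETE without a WHERE clause would affect every row in the table.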
SQL query syntax
• SELECT: retrieve and display data
• generic syntax:
SELECT [DISTINCT | ALL] {* | column_expression [AS new_name] [, ...]}
[FROM table_name [alias] [, ...]]
[WHERE condition]
[GROUP BY column_list] [HAVING condition]
[ORDER BY column_list]

– SELECT: specifies the columns to be retrieved
– FROM: specifies the table(s) to be used in the query
– WHERE: filter condition on rows
– GROUP BY: groups rows with the same column value
– HAVING: filter condition on groups
– ORDER BY: specifies the order of the output
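To see what each clause does, here is a minimal sketch, assuming SQLite through Python's `sqlite3` module. It reuses the order rows from the example database that follows; the column types and the particular filter values (`'Mar'`, `500`) are chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordno INTEGER PRIMARY KEY, month TEXT, cid TEXT, pid TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1011, "Jan", "c001", "p01", 1000),
        (1012, "Jan", "c002", "p03", 1000),
        (1013, "Feb", "c001", "p05", 400),
        (1014, "Feb", "c002", "p03", 200),
        (1015, "Feb", "c003", "p04", 500),
        (1016, "Mar", "c001", "p02", 100),
    ],
)

query = """
    SELECT cid, SUM(qty) AS total_qty   -- columns (and expressions) to retrieve
    FROM orders                         -- table used in the query
    WHERE month <> 'Mar'                -- filter rows before grouping
    GROUP BY cid                        -- one group per customer
    HAVING SUM(qty) > 500               -- filter groups after aggregation
    ORDER BY total_qty DESC             -- order of the output
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('c001', 1400), ('c002', 1200)]
```

WHERE removes order 1016 before grouping, so c001 totals 1400; HAVING then drops c003, whose total of 500 is not greater than 500.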
An example database
• the conceptual model
– relationships are all one-to-many
– primary keys are unique identifiers for each record
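[Figure: conceptual model. A customer entity (cid {p_key}, …, contact) and a product entity (pid {p_key}, …) are each associated 1..1 to 0..* with an order entity (ordno {p_key}), through the associations cust_ord and prod_ord.]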
extension of the example database (1)

cid    name    city      discount  contact
c001   TipTop  London    10.00     Mr Smith
c002   Basic   Paris     12.00     M Parot
c003   Allied  London    8.00      Ms Evans
c004   ACME    New York  8.00
c006   ACME    Kyoto     0.00
extension of the example database (2)

pid   name    city      quantity  price
p01   Comb    London    111400    0.50
p02   Brush   New York  303000    0.50
p03   Razor   New York  150600    1.00
p04   Pen     Paris     125300    1.00
p05   Pencil  Paris     221400    1.00
p06   case    London    123100    2.00
extension of the example database (3)

ordno  month  cid   pid  qty
1011   Jan    c001  p01  1000
1012   Jan    c002  p03  1000
1013   Feb    c001  p05  400
1014   Feb    c002  p03  200
1015   Feb    c003  p04  500
1016   Mar    c001  p02  100

Relationships are implemented by placing the primary key value as a ‘foreign key’ in the table at the ‘many’ end.
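The example database above can be built as a minimal sketch, assuming SQLite through Python's `sqlite3` module. The column types and NOT NULL constraints are assumptions (the slides only name the columns), and customer ids are written c001… to match the orders table. The foreign keys sit in `orders`, the table at the ‘many’ end of both relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

# Column types and NOT NULL constraints are assumptions.
conn.executescript("""
CREATE TABLE customers (
    cid      TEXT PRIMARY KEY,
    name     TEXT NOT NULL,
    city     TEXT,
    discount NUMERIC(5, 2),
    contact  TEXT
);
CREATE TABLE products (
    pid      TEXT PRIMARY KEY,
    name     TEXT NOT NULL,
    city     TEXT,
    quantity INTEGER,
    price    NUMERIC(8, 2)
);
-- orders is at the 'many' end of both relationships, so it holds the foreign keys
CREATE TABLE orders (
    ordno INTEGER PRIMARY KEY,
    month TEXT,
    cid   TEXT NOT NULL REFERENCES customers (cid),
    pid   TEXT NOT NULL REFERENCES products (pid),
    qty   INTEGER
);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", [
    ("c001", "TipTop", "London", 10.00, "Mr Smith"),
    ("c002", "Basic", "Paris", 12.00, "M Parot"),
    ("c003", "Allied", "London", 8.00, "Ms Evans"),
    ("c004", "ACME", "New York", 8.00, None),
    ("c006", "ACME", "Kyoto", 0.00, None),
])
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", [
    ("p01", "Comb", "London", 111400, 0.50),
    ("p02", "Brush", "New York", 303000, 0.50),
    ("p03", "Razor", "New York", 150600, 1.00),
    ("p04", "Pen", "Paris", 125300, 1.00),
    ("p05", "Pencil", "Paris", 221400, 1.00),
    ("p06", "case", "London", 123100, 2.00),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1011, "Jan", "c001", "p01", 1000),
    (1012, "Jan", "c002", "p03", 1000),
    (1013, "Feb", "c001", "p05", 400),
    (1014, "Feb", "c002", "p03", 200),
    (1015, "Feb", "c003", "p04", 500),
    (1016, "Mar", "c001", "p02", 100),
])

# Follow each foreign key back to the 'one' end to name the customer and product.
rows = conn.execute("""
    SELECT o.ordno, c.name, p.name, o.qty
    FROM orders o
    JOIN customers c ON c.cid = o.cid
    JOIN products p ON p.pid = o.pid
    ORDER BY o.ordno
""").fetchall()
for row in rows:
    print(row)
```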
Designing a database

• A well-designed database helps to make sure that the data stored is accurate and consistent and can be retrieved easily
• What do we mean by inconsistencies?
– It would, for example, be inconsistent to store a booking without storing the details of the customer making the booking
• With careful design, we can make sure the database won’t allow this to happen
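One way a design stops this from happening is a NOT NULL foreign key from the booking to the customer. A minimal sketch, assuming SQLite via Python's `sqlite3` module; the `Customer` and `Booking` tables, their columns and the helper `try_booking` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign key enforcement is off by default in SQLite

# Hypothetical tables: every booking must refer to an existing customer.
conn.execute("CREATE TABLE Customer (customerId INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Booking (
        bookingId  INTEGER PRIMARY KEY,
        customerId INTEGER NOT NULL REFERENCES Customer (customerId)
    )
""")

conn.execute("INSERT INTO Customer (customerId, name) VALUES (1, 'A. Customer')")
conn.execute("INSERT INTO Booking (bookingId, customerId) VALUES (100, 1)")  # accepted


def try_booking(booking_id, customer_id):
    """Hypothetical helper: True if the DBMS accepts the booking, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Booking (bookingId, customerId) VALUES (?, ?)",
            (booking_id, customer_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


missing_customer = try_booking(101, 99)   # no customer 99: FOREIGN KEY constraint fails
no_customer = try_booking(102, None)      # no customer at all: NOT NULL constraint fails
print(missing_customer, no_customer)
```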
Steps in designing a database

• Determining the intended uses of the system
• Creating a data model
• Implementing the database

• Determine the purpose of your database.
• Find and organize the information required.
• Divide the information into tables.
• Turn information items into columns.
• Specify primary keys.
• Set up the table relationships.
• Refine your design.
• Apply the normalization rules.
The data model

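• Data model = domain model classes which represent entities we need to store permanently
[Figure: UML class diagram of the data model (partly lost). Recoverable: class Package (name, description, adultprice, childprice) and class Tour (departuredate, offer), plus the attributes password and status of other classes; the associations have 1..1 to 0..* multiplicities.]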
Data models

• The way in which data is organised for storage in a database is known as the data model
• Early computer databases developed in the 1960s used a hierarchical model
– Similar to the way files and folders are still organised in modern computer file systems
• Most data does not fit very well into a simple hierarchy
Data models
[Figure: hierarchical data compared with “real-world” data, which has no clear hierarchy]

Data Models

• data models specify a set of concepts that can be used to describe the database
• can be categorised as:
– high level (conceptual):
• concepts at user level: entities / classes, attributes, relationships / associations …
– representational (logical):
• relational, network, hierarchical; may be record based, object-oriented …
– low level (physical):
• details of physical data storage; formats, access paths, ordering …
Data modelling techniques

• We are using object-oriented techniques with UML to design our data model
• There are other methods which are also commonly used in database design
– One widely used method is called Entity Relationship Modelling (ERM)
– Represents the data model as an Entity Relationship Diagram (ERD)
From data model to database

• Need to consider how the data model can be represented in a specific RDBMS
– This requires some further design
• RDBMS software has specific ways of representing and enforcing the entities, attributes and relationships in the data model
– For example, a data model entity is represented as a table in the relational database
Representing the data model in
Different representations
• business layer
– for example as Java classes and objects
– business logic in Java methods
• database layer
– data is stored permanently in a database
– the system queries the database to get the data it needs to carry out a particular action
• The system needs to map data from database tables to objects
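The mapping between tables and objects can be sketched in a few lines. This is a minimal sketch, assuming Python in place of the Java classes mentioned above and SQLite via `sqlite3`: a dataclass stands in for a business-layer class. The attribute names follow the Package class in the data model (name, description, adultprice, childprice); the sample row and the function `load_packages` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Package:
    """Business-layer object (stand-in for a Java class; attribute set assumed)."""
    name: str
    description: str
    adult_price: float
    child_price: float


def load_packages(conn):
    """Hypothetical data-access function: map each row of the Package table to a Package object."""
    rows = conn.execute(
        "SELECT name, description, adultprice, childprice FROM Package ORDER BY name"
    )
    return [Package(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Package (name TEXT PRIMARY KEY, description TEXT, adultprice REAL, childprice REAL)"
)
# Hypothetical sample row.
conn.execute("INSERT INTO Package VALUES ('City Break', 'Three nights in Paris', 450.0, 300.0)")

packages = load_packages(conn)
print(packages[0].adult_price)
```

Keeping the SQL inside one data-access function means the rest of the business layer only ever sees `Package` objects, which matches the layered architecture described earlier.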
Data modelling

data modelling for databases is concerned with:
– analysis of the universe of discourse
• understanding basic properties of data items/objects
• inter-relationships among these objects
– generic abstractions are used to
• understand, classify and model data objects
• define properties and semantics of objects

Basic data modelling concepts
• generic data modelling abstractions lead to basic data modelling concepts – ER/EER – (extended) entity relationship model:
– entity: a concrete or abstract object, possessing characteristics in which we are interested, e.g. person, book…
• class or kind of entity, e.g. Person
• instance (occurrence) of an entity, e.g. ‘John Smith’
– attribute: a property, feature or characteristic which an entity may possess, e.g. title, name, date of birth
– relationship: an association between two (or more) entities
• note: a relationship may also have attribute(s)
• various diagrammatic notations are used to represent these basic concepts, e.g. UML (Unified Modelling Language) class diagrams to express the conceptual data model
• represent a very high level approach to data

• the following abstractions are the building blocks for all data models:
– classification: set membership (is_member_of)
– aggregation: set composition (is_part_of)
– specialization and generalization: super/sub-class relationship (is_a)
• assigning objects with common properties to a class
or kind of entity
– an individual object is associated with its class by the
is_instance_of relationship (set membership)
– the same object can be classified in different ways i.e. an
object may belong to >1 class


[Figure: classification examples: Monday, Tuesday, …, Sunday; CompSci_stud, Biol-stud, Phys_stud]

• defines a new class from a set of other classes
• two kinds of aggregation
– (a) aggregation of attributes to form an entity or class; this is the commonest use of aggregation
[Figure: attributes UserID, Name and Faculty aggregated into one class]
– (b) combining entities into a higher-level aggregate entity; represents the is_part_of or is_component_of relationship
[Figure: Wheels, Engine and Doors as components of an aggregate entity]

Relationships (1)

• An association between elements of two entities/classes
Person owns Car
• there may be several mappings between two classes
Person owns Car
Person drives Car
• properties of relationships:
– cardinality
• degree: 1:1, 1:N, N:M
• participation constraints: mandatory, optional
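In a relational database an N:M relationship such as Person drives Car is usually represented by a separate junction table holding one row per (person, car) pair. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `Drives` table, column types, names and registration numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Car (regNo TEXT PRIMARY KEY, make TEXT);
-- N:M 'drives' relationship (hypothetical junction table): one row per (person, car) pair.
-- The composite primary key stops the same pair being recorded twice.
CREATE TABLE Drives (
    personId INTEGER NOT NULL REFERENCES Person (id),
    regNo    TEXT    NOT NULL REFERENCES Car (regNo),
    PRIMARY KEY (personId, regNo)
);
""")
conn.executemany("INSERT INTO Person VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO Car VALUES (?, ?)", [("AB12 CDE", "Ford"), ("XY34 ZZZ", "Fiat")])
# Ann drives both cars; Bob drives one of them.
conn.executemany(
    "INSERT INTO Drives VALUES (?, ?)",
    [(1, "AB12 CDE"), (1, "XY34 ZZZ"), (2, "AB12 CDE")],
)

# Each car can have many drivers, and each person can drive many cars.
drivers_per_car = conn.execute(
    "SELECT regNo, COUNT(*) FROM Drives GROUP BY regNo ORDER BY regNo"
).fetchall()
print(drivers_per_car)
```

A 1:N relationship needs no extra table: a foreign key at the ‘many’ end is enough, as in the example database.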
Relationships (2)

• cardinality specifies the minimum and maximum number of times that an instance of an entity can participate in a relationship
– minimum cardinality: can be any number, often 0 or 1
• if min-card = 0, then participation is optional, otherwise it is mandatory
– maximum cardinality: can be any number between 1 and * (any number); * usually signifies “no limit”
– most relationships are binary relationships, i.e. associations between two classes
• sometimes relationships can be ternary, or n-ary
Data model in UML notation

[Figure: UML class diagram. Person (Id {pk}, name, address, DoB, …) 1..1 owns 0..* Car (regNo {pk}, year, make, model, …); the owns association carries the attribute date_of_ownership.]
• entities
• attributes
• cardinalities and participation constraints of the relationship
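The Person 1..1 owns 0..* Car model can be mapped to tables as follows. This is a minimal sketch, assuming SQLite via Python's `sqlite3`; it is one common mapping, not the only one. The foreign key `ownerId`, the column types and the sample rows are assumptions. NOT NULL on the foreign key enforces the mandatory 1..1 end, nothing limits how many cars a person has (0..*), and the relationship attribute `date_of_ownership` is stored with the car at the ‘many’ end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT,
    dob     TEXT
);
-- 'owns' is 1..1 at the Person end: NOT NULL makes an owner mandatory,
-- and the foreign key (hypothetical name ownerId) points at exactly one Person.
CREATE TABLE Car (
    regNo             TEXT PRIMARY KEY,
    year              INTEGER,
    make              TEXT,
    model             TEXT,
    ownerId           INTEGER NOT NULL REFERENCES Person (id),
    date_of_ownership TEXT
);
""")
conn.execute("INSERT INTO Person (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO Person (id, name) VALUES (2, 'Bob')")  # 0..*: Bob owns no car
conn.execute(
    "INSERT INTO Car (regNo, make, ownerId, date_of_ownership) "
    "VALUES ('AB12 CDE', 'Ford', 1, '2020-05-01')"
)

try:
    conn.execute("INSERT INTO Car (regNo, make) VALUES ('XY34 ZZZ', 'Fiat')")  # no owner given
    ownerless_accepted = True
except sqlite3.IntegrityError:
    ownerless_accepted = False  # NOT NULL constraint failed

cars_per_person = conn.execute("""
    SELECT p.name, COUNT(c.regNo)
    FROM Person p LEFT JOIN Car c ON c.ownerId = p.id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()
print(ownerless_accepted, cars_per_person)
```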
Generalization / specialization (1)

• the generalisation/specialisation abstraction defines a hierarchical relationship between two or more classes (superclasses and subclasses)
– represents an is_a relationship
• inheritance: attributes of the superclass are inherited by its subclasses
• the semantics of this abstraction are related to coverage properties
Generalization / specialization (2)

• coverage can be described as:
– participation constraints: mandatory or optional
• mandatory: each instance of the superclass must also be a member of a subclass
• optional: an instance of the superclass need not be a member of any subclass
– disjoint/overlapping constraints:
• disjoint: an instance of a superclass may be a member of only one subclass (or)
• overlapping: an instance of a superclass may be a member of more than one subclass (and)
Graphical notation for generalisation/specialisation hierarchies (1)
• hierarchy with coverage properties:
– total and disjoint
• common attributes must be associated with the superclass entity
• they are inherited by the subclasses
[Figure: a superclass with attribute ID {pk} and subclasses including U/G, joined by a generalisation marked {Mandatory, Or}]
In this example, every student is either an undergrad or a postgrad, but not both
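One possible way to enforce a total and disjoint {Mandatory, Or} hierarchy in an RDBMS is to store the common attributes once in a single table with a type column. NOT NULL makes membership of a subclass mandatory, and a single value per row makes it disjoint. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `kind` column, its values `'UG'`/`'PG'` and the helper `add_student` are hypothetical. Another common mapping uses one table per subclass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single-table mapping (one option among several): common attributes appear once,
# and the hypothetical 'kind' column records the subclass.
#   NOT NULL  -> every student belongs to a subclass (Mandatory)
#   one value -> no student belongs to both subclasses (Or / disjoint)
conn.execute("""
    CREATE TABLE Student (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL CHECK (kind IN ('UG', 'PG'))
    )
""")


def add_student(student_id, name, kind):
    """Hypothetical helper: True if the row satisfies the {Mandatory, Or} constraints."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (student_id, name, kind))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = add_student(1, "Ann", "UG")     # an undergraduate
no_subclass = add_student(2, "Bob", None)  # rejected: violates Mandatory (NOT NULL)
unknown = add_student(3, "Cy", "XX")       # rejected: not a known subclass (CHECK)
print(accepted, no_subclass, unknown)
```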
Graphical notation for generalisation/specialisation hierarchies (2)
• hierarchy with coverage properties:
– partial and overlapping
[Figure: superclass Sports_Members with subclasses Cricket_Players, Hockey_Players and Football_Players, joined by a generalisation marked {Optional, And}]
– some Sports_Members do not play any sport, and others may play one or more sports
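For a partial and overlapping {Optional, And} hierarchy, one possible mapping records subclass membership as rows in a separate table: a member may have no rows (optional) or several (overlapping). A minimal sketch, assuming SQLite via Python's `sqlite3`; the `Plays` table, its columns and the sample members are hypothetical. If the subclasses had attributes of their own, each would more likely get its own table keyed by member id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Sports_Member (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Hypothetical membership table: one row per (member, sport).
-- No rows for a member = Optional; several rows = And (overlapping).
CREATE TABLE Plays (
    memberId INTEGER NOT NULL REFERENCES Sports_Member (id),
    sport    TEXT NOT NULL CHECK (sport IN ('Cricket', 'Hockey', 'Football')),
    PRIMARY KEY (memberId, sport)
);
""")
conn.executemany("INSERT INTO Sports_Member VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany(
    "INSERT INTO Plays VALUES (?, ?)",
    [(1, "Cricket"), (1, "Hockey"), (2, "Football")],  # Cy plays no sport
)

sports_per_member = conn.execute("""
    SELECT m.name, COUNT(p.sport)
    FROM Sports_Member m LEFT JOIN Plays p ON p.memberId = m.id
    GROUP BY m.id, m.name
    ORDER BY m.id
""").fetchall()
print(sports_per_member)
```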
Stages in database development


• Problem investigation: identification and understanding of data, constraints, facts, enterprise rules
• Data modelling: mapping the “real-world” facts into a generalised conceptual model
• Database design: mapping the conceptual model into a DBMS-specific data model
• Database implementation: structuring data in the physical
• Database monitoring/tuning: monitoring DB usage and re-structuring throughout its life, to maintain an optimised database
Database design

• Conceptual design
– high level; independent of DBMS type
– described by a conceptual model
• Logical design
– description of structure of database; depends on DBMS type
– described by a logical model
• Physical design
– description of implementation of the database in terms of
storage structures and access methods
– DBMS specific; described by a physical model
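A physical design decision such as adding an index changes the access paths but not the logical structure of the table. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `orders` table echoes the example database, and the index name `idx_orders_cid` is hypothetical. `EXPLAIN QUERY PLAN` is SQLite-specific, and the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordno INTEGER PRIMARY KEY, month TEXT, cid TEXT, pid TEXT, qty INTEGER)")

# Physical design: add an access path on a frequently searched column.
# Queries are written exactly as before; only how the DBMS finds the rows changes.
conn.execute("CREATE INDEX idx_orders_cid ON orders (cid)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE cid = ?", ("c001",)
).fetchall()
# The last field of each plan row is a human-readable description of the step,
# which names the index when the planner decides to use it.
details = [row[-1] for row in plan]
print(details)
```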
Roles in the Database Environment
• database administrators: DBAs
– oversee, manage resources
– authorisation
– security policy
– performance, tuning ….
• database designers
– identify data to be stored and structure to be used, identify
– design includes structural and functional aspects
• application programmers
– develop and implement end-user requirements
• end users
– may be sophisticated/naïve, regular/casual ...
Database Languages
• DBMSs support different users performing a range of tasks, so appropriate languages and interfaces are needed:
– Data Definition Language (DDL)
• includes structure and view definition languages
– Data Manipulation Language (DML)
• high or low level (declarative or procedural)
• standalone or embedded in a host procedural language
• set-oriented or record-oriented
• Most DBMSs provide a single ‘query language’ including all required functionality (SQL)
Data Definition Language (DDL) is used to define the structure of the database. The DBMS stores this definition as metadata: the tables and schemas, their names, indexes, the columns in each table, constraints, etc.
Here are some tasks that come under DDL:
o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to add comments to the data dictionary.
These commands change the database schema, which is why they come under Data Definition Language.
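A minimal sketch of several DDL tasks, assuming SQLite via Python's `sqlite3`. The `Tour` table follows the Tour class in the data model, but its columns and the new name `ScheduledTour` are hypothetical. SQLite supports CREATE, ALTER TABLE … ADD COLUMN, ALTER TABLE … RENAME TO and DROP, but it has no TRUNCATE or COMMENT statement, so a DELETE without a WHERE clause stands in for TRUNCATE here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables recorded in SQLite's catalog."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]


def column_names(table):
    # PRAGMA table_info returns one row per column; field 1 is the column name.
    # The table name is a trusted literal here, so formatting it in is safe.
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


conn.execute("CREATE TABLE Tour (id INTEGER PRIMARY KEY, departuredate TEXT)")  # Create
conn.execute("ALTER TABLE Tour ADD COLUMN offer TEXT")                          # Alter: add a column
conn.execute("ALTER TABLE Tour RENAME TO ScheduledTour")                        # Rename (hypothetical name)
names_after_rename = table_names()
cols_after_rename = column_names("ScheduledTour")

conn.execute("INSERT INTO ScheduledTour (departuredate) VALUES ('2024-06-01')")
conn.execute("DELETE FROM ScheduledTour")  # SQLite has no TRUNCATE: delete every row instead
conn.execute("DROP TABLE ScheduledTour")                                        # Drop
print(names_after_rename, cols_after_rename, table_names())
```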
Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It handles user requests.
Here are some tasks that come under DML:
o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table (all records if no condition is given).
o Merge: It performs an UPSERT operation, i.e. insert or update.
o Call: It is used to call a PL/SQL or Java subprogram.
o Explain Plan: It is used to show the execution plan the DBMS chooses for a statement.
o Lock Table: It controls concurrency.
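SQLite has no MERGE statement, but its INSERT … ON CONFLICT … DO UPDATE clause (SQLite 3.24 or later) performs the same insert-or-update operation. A minimal sketch via Python's `sqlite3`; the `Stock` table and the helper `upsert_stock` are hypothetical, and the first quantity comes from the example products table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (pid TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")


def upsert_stock(pid, quantity):
    """Hypothetical helper: insert a new row, or update it if the primary key already exists."""
    conn.execute(
        """
        INSERT INTO Stock (pid, quantity) VALUES (?, ?)
        ON CONFLICT (pid) DO UPDATE SET quantity = excluded.quantity
        """,
        (pid, quantity),
    )


upsert_stock("p01", 111400)  # no row yet: inserted
upsert_stock("p01", 110000)  # row exists: updated in place
rows = conn.execute("SELECT pid, quantity FROM Stock").fetchall()
print(rows)  # [('p01', 110000)]
```

`excluded` refers to the row that the INSERT tried to add; in systems that support it, MERGE expresses the same idea with WHEN MATCHED / WHEN NOT MATCHED branches.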
Data Control Language
o DCL stands for Data Control Language. It is used to control access to the data stored in the database.
o In some DBMSs, DCL statements are transactional and can be rolled back. (In Oracle Database, however, DCL statements cannot be rolled back.)
Here are some tasks that come under DCL:
o Grant: It is used to give user access privileges to a database.
o Revoke: It is used to take back permissions from the user.
 There are the following operations which have the authorization of
Transaction Control Language
TCL (Transaction Control Language) is used to manage the changes made by DML statements.
DML statements can be grouped into a logical transaction. Here are some tasks that come under TCL:
o Commit: It is used to save the transaction on the database.
o Rollback: It is used to undo the changes made since the last Commit, restoring the database to its state at that point.
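A minimal sketch of COMMIT and ROLLBACK, assuming SQLite via Python's `sqlite3`. Passing `isolation_level=None` stops the module from opening transactions implicitly, so the BEGIN/COMMIT/ROLLBACK statements below are exactly what SQLite receives. The `Account` table and the amounts are hypothetical.

```python
import sqlite3

# isolation_level=None: the sqlite3 module does not open transactions implicitly,
# so BEGIN / COMMIT / ROLLBACK below are sent to SQLite exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")  # hypothetical
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 50)")


def balances():
    return [r[0] for r in conn.execute("SELECT balance FROM Account ORDER BY id")]


# A transfer that is committed: both updates become permanent together.
conn.execute("BEGIN")
conn.execute("UPDATE Account SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE Account SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")
after_commit = balances()

# A change that is rolled back: the database returns to its state at the last COMMIT.
conn.execute("BEGIN")
conn.execute("UPDATE Account SET balance = balance - 500 WHERE id = 1")
conn.execute("ROLLBACK")
after_rollback = balances()

print(after_commit, after_rollback)  # [70, 80] [70, 80]
```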
Things to do …

• Research and Read about

– conceptual data modelling
– data modelling abstractions
– methodologies for data modelling

