
Introduction to Database Design and Development

1. DATABASES AND DATABASE DESIGN


Outline - cont

• Data models
• Basic object modelling
• Advanced object modelling
• Object meta-modelling
• Function modelling
• Relational & object-oriented databases
• Reverse Engineering
• Very large database systems
Basic Concepts

• Informally:
– a database is a collection of related data
– a database management system (DBMS) is the
software that provides the functionality to manage
and access a database
– a database contains data integrated from several
sources (e.g. departments of a company)
– data is available in a standard format for many
different applications
What is a database?
• Any collection of data can be described as a database
• Computerised database systems are now very commonplace
• Information is stored in a database every time we:
– use a bank account
– book a travel ticket
– make an appointment with a doctor
– etc.
Basic Definitions
• Database:
– A collection of related data.
• Data:
– Known facts that can be recorded and that have implicit meaning.
• Mini-world:
– Some part of the real world about which data is stored in a database; for example, student grades and transcripts at a university.
• Database Management System (DBMS):
– A software package or system that facilitates the creation and maintenance of a computerized database.
• Database System:
– The DBMS software together with the data itself. Sometimes the applications are also included.
Why databases?
• To avoid:
– redundant data
– redundant processes/interfaces

• To enable:
– ease of maintenance
– sharing of data

• Motivation for Databases:
– Data is a very important asset of an organization
– Key idea: maintain data independently of application programs
Database management systems
• A database is simply the collection of data which you need to store
• To actually store the data, and to do anything useful with it, you need a Database Management System (DBMS)
• A DBMS controls the way the data is stored on the computer, and provides ways of getting data in and out of the system
The Database Approach
Figure: application programs 1 to n send instructions to a single DBMS, which manages the stored data together with its metadata.
• DBMS is “general purpose”
• self-describing
• contains data + metadata (“data about data”)
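The self-describing property can be seen in any RDBMS catalog. The minimal sketch below assumes SQLite, driven from Python's standard `sqlite3` module, and a hypothetical `Books` table: the same database answers a query about its data and a query about its own structure (metadata), which SQLite keeps in its `sqlite_master` catalog and exposes through `PRAGMA table_info`.

```python
import sqlite3

# An in-memory SQLite database stands in for "the database" (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")  # hypothetical table
conn.execute("INSERT INTO Books VALUES ('978-0', 'Database Design')")

# Data: ordinary rows stored in the table.
rows = conn.execute("SELECT title FROM Books").fetchall()

# Metadata: SQLite records the schema in its own catalog table, sqlite_master,
# so the database "describes itself".
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column-level metadata: PRAGMA table_info yields (cid, name, type, notnull, default, pk).
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(Books)")]

print(rows)     # [('Database Design',)]
print(tables)   # ['Books']
print(columns)  # [('isbn', 'TEXT'), ('title', 'TEXT')]
```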
Example of the Database Approach: a library system
Figure: several applications (APPLIC 1, APPLIC 2, APPLIC 3, …) use the DBMS to access one DATABASE holding Books, Borrowers and Loans.
What is a DBMS?
Figure: a simplified database system environment. Users/programmers work through application programs/queries; the DBMS software (software to process queries/programs, and software to access stored data) uses the stored database definition (meta-data) and the stored database. Together these make up the database system.

• general purpose software for definition, construction and manipulation of databases
Characteristics of DBMS

• data is:
– integrated, shared, persistent
– self-describing
• abstraction: program-data
independence
• multiple views of the data
(different users need different kinds of
information)
Advantages of the database approach
– controlled redundancy
– data consistency and integrity
– sharing of data (multiple users)
– re-use of data across multiple applications
– improved security
– enforcement of standards and computation of
statistics
– economy of scale
– improved responsiveness, productivity
– DBMS provides: backup/recovery, concurrency

– What about the downside?


Disadvantages of the Database Approach

• complexity
• size (of software and application)
• cost
• performance
• risk of (spectacular!) failures
Relational databases
• Relatively complex data like this is better handled with the relational model
– Devised by Edgar Codd around 1970
• Most databases nowadays are relational databases
– although there are others: object databases, XML databases, “NoSQL” databases
• A database management system which uses the relational model is called an RDBMS
Databases and Enterprise Information Systems
Figure: layered architecture of an enterprise information system.
• UI Layer: web pages, GUI
• Business Layer: command objects, domain objects, data access objects
• Database Layer: database
Database servers
Figure: three ways of deploying a database.
• Desktop Application: the application and the database both run on a desktop PC; the database is accessed as a file or through a local server
• Client-Server Application: client applications on several desktop PCs access the database through a network server, over network or internet connections
• Enterprise Web Application: web browsers on desktop PCs connect to application servers, which access an enterprise database server
Popular RDBMSs
• Microsoft Access
– aimed at small businesses, and useful for desktop applications and systems with a small number of users
• Microsoft SQL Server, Oracle
– scalable and secure, and widely used by large organisations
• MySQL
– open-source and quite powerful, widely used for web sites
• Microsoft SQL Server Compact, JavaDB, SQLite
– compact DBMSs, suitable for mobile devices in particular
• ...and many more
RDBMS tools
• Most RDBMSs include tools for creating complete applications, for example:
– form designers: to allow data entry forms to be created for the user interface
– report designers: to present data to the user
– stored procedures: to perform processing of data according to business rules
RDBMS and other tools
• You can use your RDBMS and its tools for everything, or
• You can use the RDBMS as a component and use other tools and programming languages to create the other components
SQL – the language of relational databases
• To develop applications which use relational databases you usually need to use SQL
– Structured Query Language
– This is the language which is used to define queries
– A query is a request to a DBMS for some specific information
• Relational databases are sometimes referred to as SQL databases
SQL example
• SQL queries can be quite easy to understand
• For example, the following query finds the last names of all the customers in a database:

SELECT lastName FROM Customers;

• SQL can also be used to add, update or delete data, and to build the database in the first place
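To try the query, here is a minimal sketch assuming SQLite via Python's `sqlite3` module; the `Customers` table and its columns other than `lastName` are hypothetical. It first builds the table (data definition), adds rows (data manipulation) and then runs the query exactly as written above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Build the database in the first place (DDL): a hypothetical Customers table.
conn.execute("""
    CREATE TABLE Customers (
        customerId INTEGER PRIMARY KEY,
        firstName  TEXT,
        lastName   TEXT NOT NULL
    )""")
# Add data (DML); the ? placeholders let the driver pass the values safely.
conn.executemany(
    "INSERT INTO Customers (firstName, lastName) VALUES (?, ?)",
    [("Ann", "Smith"), ("Raj", "Patel"), ("Li", "Chen")])

# The query from the slide: the last name of every customer.
last_names = [row[0] for row in conn.execute("SELECT lastName FROM Customers;")]
print(last_names)
```

Without an ORDER BY clause SQL does not guarantee the order of the rows, so `ORDER BY lastName` would be needed for a predictable listing.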
SQL standards
• SQL is supposed to be a standard language which is supported by all RDBMSs
• In fact, you need to be careful because there are some important differences between the versions of SQL used by different systems
– Different versions of SQL standards (SQL-92, SQL:1999, etc.)
– Different implementations by RDBMS vendors
SQL: characteristics

• based on the relational calculus
– comprehensive data language:
•DDL + DML (data definition + data manipulation)
– can be used interactively or embedded in a procedural language
– terminology:
•relational model: relation, tuple, attribute
•SQL: table, row, column
• some criticisms, e.g.
– redundancy: more than one way to specify a query
– certain relational operations omitted
SQL statements / commands
• Consist of:
– reserved words (fixed, part of syntax)
– user-defined words (names of: relations,
attributes, domains)
• Provide a comprehensive language to:
– create the database; creation/deletion of database
objects:
•schema, relations (tables), domains, views, indexes, …
– populate the database, and modify it
– query the database
A Selection of Data Types
Category   Data type                    Description
Character  CHAR (length)                fixed length
           VARCHAR (length)             variable length (max)
           TEXT                         variable length
Numeric    INTEGER, SMALLINT            integer
           OID                          object identifier
           NUMERIC (precision, scale)   number
           FLOAT, DOUBLE                floating-point number
Temporal   DATE                         date
           TIME                         time
           TIMESTAMP                    date and time
           INTERVAL                     interval of time
Logical    BOOLEAN                      boolean, true or false
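Vendors map these type names differently. As a hedged illustration, assuming SQLite (via Python's `sqlite3` module) and a hypothetical `Member` table: SQLite accepts the declared types and records them in its metadata, but treats them as type affinities rather than strict types, so it stores a 10-character string in a `VARCHAR(5)` column and keeps the `DATE` and `BOOLEAN` values as text and as an integer. Many other RDBMSs would reject the over-long string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declared types taken from the table above; the Member table itself is hypothetical.
conn.execute("""
    CREATE TABLE Member (
        code    CHAR(4),
        name    VARCHAR(5),
        price   NUMERIC(8,2),
        joined  DATE,
        active  BOOLEAN
    )""")
# 'Birmingham' is longer than VARCHAR(5); SQLite does not enforce the length.
# The date is passed as ISO text and True is bound as the integer 1.
conn.execute("INSERT INTO Member VALUES (?, ?, ?, ?, ?)",
             ("M001", "Birmingham", 19.99, "2024-01-15", True))

# Declared types, as recorded in the schema metadata.
declared = {r[1]: r[2] for r in conn.execute("PRAGMA table_info(Member)")}
# What was actually stored: typeof() reports SQLite's storage class.
stored = conn.execute(
    "SELECT name, typeof(name), typeof(joined), active FROM Member").fetchone()
print(declared["name"], stored)  # VARCHAR(5) ('Birmingham', 'text', 'text', 1)
```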


Data manipulation in SQL
• data manipulation involves:
– adding data to database tables
– modifying data in tables
– deleting data
– retrieving data from the database
• SQL provides four statements for data manipulation (DML):
INSERT
UPDATE
DELETE
SELECT
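The four DML statements can be applied in turn to a cut-down version of the PRODUCTS table from the example database. This is a sketch assuming SQLite via Python's `sqlite3` module, with `price` declared as `REAL` for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (pid TEXT PRIMARY KEY, pname TEXT, price REAL)")

# INSERT: add new rows
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [("p01", "Comb", 0.50), ("p02", "Brush", 0.50), ("p03", "Razor", 1.00)])
# UPDATE: change existing rows that satisfy the WHERE condition
conn.execute("UPDATE Products SET price = 0.75 WHERE pid = 'p02'")
# DELETE: remove the rows that satisfy the WHERE condition (all rows if WHERE is omitted)
conn.execute("DELETE FROM Products WHERE pid = 'p03'")
# SELECT: retrieve the remaining data
rows = conn.execute("SELECT pid, pname, price FROM Products ORDER BY pid").fetchall()
print(rows)  # [('p01', 'Comb', 0.5), ('p02', 'Brush', 0.75)]
```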
SQL query syntax
• SELECT
– retrieve and display data
• generic syntax:
SELECT [DISTINCT | ALL] {* | column_expression [AS new_name] [, ...]}
[FROM table_name [alias] [, ...]]
[WHERE condition]
[GROUP BY column_list] [HAVING condition]
[ORDER BY column_list]

– SELECT: specifies columns to be retrieved
– FROM: specifies table(s) to be used in query
– WHERE: filter condition applied to individual rows
– GROUP BY: groups rows with same column value
– HAVING: filter condition applied to groups
– ORDER BY: specifies order of output
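Each clause of the generic syntax can be used once against the ORDERS data of the example database. This is a minimal sketch assuming SQLite via Python's `sqlite3` module; the query itself (total quantity per customer, ignoring March, keeping only totals above 600) is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (ordno INTEGER PRIMARY KEY, month TEXT, "
             "cid TEXT, pid TEXT, qty INTEGER)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", [
    (1011, "Jan", "c001", "p01", 1000), (1012, "Jan", "c002", "p03", 1000),
    (1013, "Feb", "c001", "p05", 400),  (1014, "Feb", "c002", "p03", 200),
    (1015, "Feb", "c003", "p04", 500),  (1016, "Mar", "c001", "p02", 100)])

query = """
    SELECT   cid, SUM(qty) AS total_qty  -- columns and an expression to retrieve
    FROM     Orders                      -- table used in the query
    WHERE    month <> 'Mar'              -- row filter, applied before grouping
    GROUP BY cid                         -- one group per customer
    HAVING   SUM(qty) > 600              -- group filter, applied after grouping
    ORDER BY total_qty DESC              -- order of the output
"""
result = conn.execute(query).fetchall()
print(result)  # [('c001', 1400), ('c002', 1200)]
```

Customer c003 has a total of 500 and is removed by HAVING, while c001's March order is removed earlier by WHERE.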
An example database
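• the conceptual model
– relationships are all one-to-many
– primary keys are unique identifiers for each record
Figure: conceptual model of the example database (UML class diagram).
• CUSTOMERS: cid {p_key}, cname, city, discount, contact
• PRODUCTS: pid {p_key}, pname, city, quantity, price
• ORDERS: ordno {p_key}, month, qty
• cust_ord: CUSTOMERS 1..1 to 0..* ORDERS (each order is placed by exactly one customer; a customer may place any number of orders)
• prod_ord: PRODUCTS 1..1 to 0..* ORDERS (each order is for exactly one product; a product may appear in any number of orders)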
extension of the example database (1)

CUSTOMERS
cid   name    city      discount  contact
c001  TipTop  London    10.00     Mr Smith
c002  Basic   Paris     12.00     M Parot
c003  Allied  London    8.00      Ms Evans
c004  ACME    New York  8.00
c006  ACME    Kyoto     0.00
extension of the example database (2)

PRODUCTS
pid  name    city      quantity  price
p01  Comb    London    111400    0.50
p02  Brush   New York  303000    0.50
p03  Razor   New York  150600    1.00
p04  Pen     Paris     125300    1.00
p05  Pencil  Paris     221400    1.00
p06  Case    London    123100    2.00
extension of the example database (3)

ORDERS
ordno  month  cid   pid  qty
1011   Jan    c001  p01  1000
1012   Jan    c002  p03  1000
1013   Feb    c001  p05  400
1014   Feb    c002  p03  200
1015   Feb    c003  p04  500
1016   Mar    c001  p02  100

Relationships are implemented by placing the primary key value as a ‘foreign key’ in the table at the ‘many’ end.
Designing a database
• A well-designed database helps to make sure that the data stored is accurate and consistent and can be retrieved easily
• What do we mean by inconsistencies?
– It would, for example, be inconsistent to store a booking without storing the details of the customer making the booking
– With careful design, we can make sure the database won’t allow this to happen
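With careful design the DBMS itself can enforce this rule. A minimal sketch, assuming SQLite via Python's `sqlite3` module and hypothetical `Customer` and `Booking` tables: a `NOT NULL` foreign key on `Booking.customer_id` makes the database refuse a booking for an unknown customer, or for no customer at all. Note that SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE Booking (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customer(id),  -- hypothetical schema
        tour_date   TEXT
    )""")
conn.execute("INSERT INTO Customer (id, name) VALUES (1, 'Ms Evans')")

def try_insert(customer_id):
    """Return True if the DBMS accepts a booking for customer_id, else False."""
    try:
        conn.execute("INSERT INTO Booking (customer_id, tour_date) VALUES (?, ?)",
                     (customer_id, "2024-06-01"))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1),     # customer 1 exists: accepted
    try_insert(99),    # no such customer: FOREIGN KEY constraint failed
    try_insert(None),  # no customer at all: NOT NULL constraint failed
]
print(results)  # [True, False, False]
```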
Steps in designing a database
• Determining the intended uses of the system
• Creating a data model
• Implementing the database
Steps

• Determine the purpose of your database.


• Find and organize the information required. ...
• Divide the information into tables. ...
• Turn information items into columns. ...
• Specify primary keys. ...
• Set up the table relationships. ...
• Refine your design. ...
• Apply the normalization rules.
The data model
• Data model = domain model classes which represent entities we need to store permanently
Figure: UML class diagram of the data model.
• User: name, address, username, password, datejoined
• Booking: adults, children, bookingdate, status
• Tour: departuredate, offer
• Package: location, name, description, adultprice, childprice, departure
• Associations: User 1..1 to 0..* Booking; Booking 0..* to 1..1 Tour; Package 1..1 to 0..* Tour
Data models
• The way in which data is organised for storage in a database is known as the data model
• Early computer databases developed in the 1960s used a hierarchical model
– Similar to the way files and folders are still organised in modern computer file systems
• Most data does not fit very well into a simple hierarchy
Data models
Figure: hierarchical data compared with “real-world” data, which has no clear hierarchy.
Data Models

• data models specify a set of concepts that can be used to describe the database
• can be categorised as:
– high level (conceptual):
•concepts at user level: entities / classes, attributes,
relationships / associations ….
– representational (logical):
•relational, network, hierarchical; – may be record based,
object-oriented ….
– low level (physical):
•details of physical data storage; formats, access paths,
ordering ….
Data modelling techniques
• We are using object-oriented techniques with UML to design our data model
• There are other methods which are also commonly used in database design
– One widely used method is called Entity Relationship Modelling (ERM)
– Represents the data model as an Entity Relationship Diagram (ERD)
From data model to database
• Need to consider how the data model can be represented in a specific RDBMS
– This requires some further design
• RDBMS software has specific ways of representing and enforcing the entities, attributes and relationships in the data model
– For example, a data model entity is represented as a table in the relational database
Representing the data model in
an RDBMS
Different representations
• business layer
– for example as Java classes and objects
– business logic in Java methods
• database layer
– data is stored permanently in a database
– system queries database to get data it needs to carry out a particular action
• The system needs to map data from database tables to classes
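One way to do this mapping is a data access object that runs the SQL and turns each row into an object. The sketch below uses Python rather than Java, with a dataclass standing in for a business-layer class; `Customer` and `CustomerDAO` are hypothetical names, the data comes from the example CUSTOMERS table, and SQLite via the standard `sqlite3` module is assumed.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Customer:
    """Hypothetical business-layer class for one row of the Customers table."""
    cid: str
    cname: str
    city: str


class CustomerDAO:
    """Hypothetical data access object: the only code that knows the SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find_by_city(self, city: str) -> List[Customer]:
        rows = self.conn.execute(
            "SELECT cid, cname, city FROM Customers WHERE city = ? ORDER BY cid",
            (city,))
        # Map each database row (a tuple) onto a Customer object.
        return [Customer(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (cid TEXT PRIMARY KEY, cname TEXT, city TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                 [("c001", "TipTop", "London"), ("c002", "Basic", "Paris"),
                  ("c003", "Allied", "London")])

londoners = CustomerDAO(conn).find_by_city("London")
print(londoners)
```

Keeping all SQL inside the DAO means the rest of the business layer works only with `Customer` objects, so the table layout can change without touching the business logic.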
Data modelling

data modelling for databases is concerned with:

– analysis of the universe of discourse


•understanding basic properties of data items/objects

•inter-relationship among these objects

– generic abstractions are used to

•understand, classify and model data objects

•define properties and semantics of objects


Basic data modelling concepts
• generic data modelling abstractions lead to basic data modelling
concepts – ER/EER – (extended) entity relationship model:
– entity: a concrete or abstract object, possessing
characteristics in which we are interested e.g. person, book…
•class or kind of entity – e.g. Person
•instance (occurrence) of an entity – e.g. ‘John Smith’

– attribute: a property, feature or characteristic which an entity may possess e.g. title, name, date of birth
– relationship: an association between two (or more) entities
•note: a relationship may also have attribute(s)
• various diagrammatic notations used to represent these basic
concepts – e.g. UML (Unified Modelling Language) class
diagrams to express the conceptual data model
Abstractions
• represent a very high level approach to data
modelling

• the following abstractions are the building blocks for all data models:
– classification: set membership (is_member_of)
– aggregation: set composition (is_part_of)
– specialization and generalization: super/sub-class relationship (is_a)
Classification
• assigning objects with common properties to a class
or kind of entity
– an individual object is associated with its class by the
is_instance_of relationship (set membership)
– the same object can be classified in different ways, i.e. an object may belong to more than one class
Examples from the diagram:
• class Day: Monday, Tuesday, Sunday
• class Student: CompSci_stud, Biol-stud, Phys_stud

Aggregation
• defines a new class from a set of other classes
• two kinds of aggregation
– (a) aggregation of attributes to form an entity or class; this is the commonest use of aggregation, e.g. STUDENT aggregates UserID, Name and Faculty
– (b) combining entities into a higher level aggregate entity; represents the is_part_of or is_component_of relationship, e.g. CAR aggregates Wheels, Engine and Doors

Relationships (1)

• An association between elements of two entities/classes, e.g. Person owns Car
• there may be several mappings between two classes, e.g. Person owns Car, Person drives Car
• properties of relationships:
– cardinality
•degree – 1:1, 1:N, N:M
•participation constraints – mandatory, optional
Relationships (2)

• cardinality specifies the minimum and maximum number of times that an instance of an entity can participate in a relationship
– minimum cardinality: can be any number, often 0 or 1
•if min-card = 0, then participation is optional, otherwise it is mandatory
– maximum cardinality: can be any number between 1 and * (any number); * usually signifies “no limit”
– most relationships are binary relationships, i.e. associations between two classes
•sometimes relationships can be ternary, or n-ary
Data model in UML notation

Figure: UML class diagram.
• Person: Id {pk}, name, address, DoB
• Car: regNo {pk}, year, make, model
• Person 1..1 owns 0..* Car; the owns relationship has the attribute date_of_ownership
• entities
• attributes
• cardinalities and participation constraints of the relationship
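One common way to implement this model in an RDBMS is sketched below, assuming SQLite via Python's `sqlite3` module (the rows are hypothetical). The Person end (1..1) becomes a `NOT NULL` foreign key in the Car table; because each ownership belongs to exactly one car, the relationship attribute `date_of_ownership` can be stored in the Car table too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, address TEXT, dob TEXT)")
# 1..1 at the Person end -> owner_id is NOT NULL.
# 0..* at the Car end    -> any number of Car rows may share one owner_id.
conn.execute("""
    CREATE TABLE Car (
        regNo             TEXT PRIMARY KEY,
        year              INTEGER,
        make              TEXT,
        model             TEXT,
        owner_id          INTEGER NOT NULL REFERENCES Person(id),
        date_of_ownership TEXT
    )""")
conn.executemany("INSERT INTO Person (id, name) VALUES (?, ?)",
                 [(1, "John Smith"), (2, "Mary Jones")])
conn.executemany("INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?)",
                 [("AB12 CDE", 2019, "Ford", "Focus", 1, "2020-03-01"),
                  ("XY34 ZZZ", 2021, "VW", "Golf", 1, "2022-07-15")])

# LEFT JOIN keeps people who own no car (the 0 in 0..*).
cars_per_person = conn.execute("""
    SELECT p.name, COUNT(c.regNo)
    FROM Person p LEFT JOIN Car c ON c.owner_id = p.id
    GROUP BY p.id ORDER BY p.id""").fetchall()

try:
    conn.execute("INSERT INTO Car (regNo, year) VALUES ('NO OWNER', 2020)")
    ownerless_accepted = True
except sqlite3.IntegrityError:  # NOT NULL constraint failed: Car.owner_id
    ownerless_accepted = False

print(cars_per_person, ownerless_accepted)  # [('John Smith', 2), ('Mary Jones', 0)] False
```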
Generalization / specialization (1)

• the generalisation / specialisation abstraction defines a hierarchical relationship between two or more classes (superclasses and subclasses)
– represents an is_a relationship
• inheritance: attributes of the superclass are inherited by the subclasses
• the semantics of this abstraction are related to coverage properties
Generalization / specialization (2)

• coverage can be described as:
– participation constraints: mandatory or optional
•mandatory: each instance of the superclass must also be a member of a subclass
•optional: each instance of a superclass need not be an element of a subclass
– disjoint/overlapping constraints:
•disjoint: an instance of a superclass may be a member of only one subclass (or)
•overlapping: an instance of a superclass may be a member of more than one subclass (and)
Graphical notation for generalisation/ specialisation hierarchies (1)
• hierarchy with coverage properties:
– total and disjoint
• common attributes must be associated with the superclass entity
• they are inherited by the subclasses
Figure: superclass Student (ID {pk}, Name, Faculty) with the constraint {Mandatory, Or} and subclasses U/G (degree, tutor) and P/G (res_topic, supervisor).

In this example, every student is either an undergrad or a postgrad, but not both
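Several table designs can implement such a hierarchy; the sketch below picks the single-table mapping, assuming SQLite via Python's `sqlite3` module. A hypothetical discriminator column `student_type` is `NOT NULL` (mandatory coverage) and holds exactly one of `'UG'` or `'PG'` (disjointness), while the subclass attributes become optional columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Single-table mapping (one of several possible mappings) of the Student hierarchy.
conn.execute("""
    CREATE TABLE Student (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        faculty      TEXT,
        -- NOT NULL: every student is in a subclass (mandatory)
        -- single value from a fixed set: never in both subclasses (disjoint)
        student_type TEXT NOT NULL CHECK (student_type IN ('UG', 'PG')),
        degree       TEXT,  -- U/G attributes
        tutor        TEXT,
        res_topic    TEXT,  -- P/G attributes
        supervisor   TEXT
    )""")

def add_student(name, student_type):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Student (name, student_type) VALUES (?, ?)",
                     (name, student_type))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add_student("Asha", "UG"),
    add_student("Ben", "PG"),
    add_student("Cai", None),     # rejected: violates mandatory coverage
    add_student("Dee", "UG+PG"),  # rejected: violates disjointness
]
print(results)  # [True, True, False, False]
```

This design does not stop a U/G row from having a `supervisor`; extra CHECK constraints, or separate subclass tables, would be needed for that.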
Graphical notation for generalisation/ specialisation hierarchies (2)
• hierarchy with coverage properties:
– partial and overlapping
Figure: superclass Sports_Members with the constraint {Optional, And} and subclasses Cricket_Players, Hockey_Players and Football_Players.
– some Sports_Members do not play any sport, and others may play one or more sports
Stages in database development

STAGE / PROCESS             TASKS
Problem Investigation       Identification and understanding of data, constraints, facts, enterprise rules
Data Modelling              Mapping the “real-world” facts into a generalised conceptual model
Database Design             Mapping the conceptual model into a DBMS-specific data model
Database Implementation     Structuring data in the physical database
Database Monitoring/Tuning  Monitoring DB usage and re-structuring throughout its life, to maintain an optimised database
Database design

• Conceptual design
– high level; independent of DBMS type
– described by a conceptual model
• Logical design
– description of structure of database; depends on DBMS type
– described by a logical model
• Physical design
– description of implementation of the database in terms of
storage structures and access methods
– DBMS specific; described by a physical model
Roles in the Database Environment
• database administrators: DBAs
– oversee, manage resources
– authorisation
– security policy
– performance, tuning ….
• database designers
– identify data to be stored and structure to be used, identify
constraints
– design includes structural and functional aspects
• application programmers
– develop and implement end-user requirements
• end users
– may be sophisticated/naïve, regular/casual ...
Database Languages
• DBMSs support different users performing a range of tasks, so appropriate languages and interfaces are required:
– Data Definition Language (DDL)
•includes structure and view definition languages
– Data Manipulation Language (DML)
•high or low level (declarative or procedural)
•standalone or embedded in a host procedural language
•set-oriented or record-oriented
• Most DBMSs provide a single ‘query language’ including all required functionality (SQL)
Data Definition Language
The data definition language (DDL) is used to define the database schema, i.e. the metadata such as the tables and schemas, their names, indexes, the columns in each table, constraints, etc.
Here are some tasks that come under DDL:
o Create: used to create objects in the database.
o Alter: used to alter the structure of the database.
o Drop: used to delete objects from the database.
o Truncate: used to remove all records from a table.
o Rename: used to rename an object.
o Comment: used to add comments to the data dictionary.
These commands change the database schema, which is why they belong to the data definition language.
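The DDL commands can be tried against SQLite through Python's `sqlite3` module, as a sketch with a hypothetical `Loan` table. SQLite supports CREATE, ALTER (including `RENAME TO`) and DROP, but has no TRUNCATE or COMMENT statement; an unconditional `DELETE` is its usual way to empty a table. Each schema change is read back from the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """Names of the tables currently recorded in SQLite's catalog."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def columns(table):
    """Column names of a table (table name assumed trusted, not user input)."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]

conn.execute("CREATE TABLE Loan (id INTEGER PRIMARY KEY, book TEXT)")  # CREATE
conn.execute("ALTER TABLE Loan ADD COLUMN due_date TEXT")             # ALTER
conn.execute("ALTER TABLE Loan RENAME TO BookLoan")                    # RENAME
conn.execute("INSERT INTO BookLoan (book) VALUES ('SQL Basics')")
conn.execute("DELETE FROM BookLoan")  # SQLite's stand-in for TRUNCATE
after_truncate = conn.execute("SELECT COUNT(*) FROM BookLoan").fetchone()[0]
cols = columns("BookLoan")
conn.execute("DROP TABLE BookLoan")                                    # DROP
remaining = tables()

print(cols, after_truncate, remaining)  # ['id', 'book', 'due_date'] 0 []
```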
Data Manipulation Language
• DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database, in response to user requests.
• Here are some tasks that come under DML:
o Select: used to retrieve data from a database.
o Insert: used to insert data into a table.
o Update: used to update existing data within a table.
o Delete: used to delete records from a table (all records, if no condition is given).
o Merge: performs an UPSERT operation, i.e. insert or update.
o Call: used to call a stored (e.g. PL/SQL) or Java subprogram.
o Explain Plan: shows the execution plan the DBMS will use for a statement.
o Lock Table: controls concurrency.
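SQLite has no MERGE statement, but since version 3.24 it provides an UPSERT clause (`INSERT ... ON CONFLICT ... DO UPDATE`) that does the same insert-or-update job. A minimal sketch, assuming SQLite via Python's `sqlite3` module and a hypothetical `Stock` table holding quantities from the PRODUCTS example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (pid TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")  # hypothetical

# Insert a new row, or update the existing row when the primary key already exists.
# 'excluded' refers to the row that the INSERT tried to add.
UPSERT = """
    INSERT INTO Stock (pid, quantity) VALUES (?, ?)
    ON CONFLICT (pid) DO UPDATE SET quantity = excluded.quantity
"""
conn.execute(UPSERT, ("p01", 111400))  # no row for p01 yet: insert
conn.execute(UPSERT, ("p02", 303000))  # insert
conn.execute(UPSERT, ("p01", 110000))  # p01 exists: update instead of failing
rows = conn.execute("SELECT pid, quantity FROM Stock ORDER BY pid").fetchall()
print(rows)  # [('p01', 110000), ('p02', 303000)]
```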
Data Control Language
• DCL stands for Data Control Language. It is used to control access to the data stored in the database, by granting and revoking privileges.
• Whether DCL statements can be rolled back depends on the DBMS: in some systems they are transactional, but in Oracle Database they cannot be rolled back.
• Here are some tasks that come under DCL:
o Grant: used to give a user access privileges to a database.
o Revoke: used to take back privileges from a user.
• Privileges that can be granted and revoked include CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
Transaction Control Language
• TCL is used to manage the changes made by DML statements.
• DML statements can be grouped into a logical transaction, which a TCL command then ends. Here are some tasks that come under TCL:
o Commit: used to save the transaction permanently in the database.
o Rollback: used to restore the database to its state as of the last Commit.
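A minimal sketch of COMMIT and ROLLBACK, assuming SQLite via Python's `sqlite3` module and a hypothetical `Account` table. The connection is opened with `isolation_level=None` so that the driver adds no implicit transactions and the BEGIN, COMMIT and ROLLBACK statements in the code are the only transaction boundaries.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transactions start only at an explicit BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 50)")

def balances():
    return [b for (b,) in conn.execute("SELECT balance FROM Account ORDER BY id")]

# Transaction 1: a transfer that is committed, so it becomes permanent.
conn.execute("BEGIN")
conn.execute("UPDATE Account SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE Account SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")
after_commit = balances()

# Transaction 2: a change that is rolled back, restoring the state as of the last COMMIT.
conn.execute("BEGIN")
conn.execute("UPDATE Account SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK")
after_rollback = balances()

print(after_commit, after_rollback)  # [70, 80] [70, 80]
```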
Things to do …

• Research and Read about


– conceptual data modelling
– data modelling abstractions
– methodologies for data modelling
