Databases Course 3: Data Models


DataBases

Course 3
Data Models
Delia-Alexandrina Mitrea, S.L. Eng., PhD
E-mail: Delia.Mitrea@cs.utcluj.ro

Interviewing Dr. Edgar F. Codd about databases is a bit like interviewing Einstein about nuclear physics.

Topics of this course


Describing and storing data in a DBMS. Data modeling.
Definition of a data model
Data models classification
Abstraction, generalization, aggregation
Entities, connections between the entities
Most significant data models
The hierarchical data model
The network data model
The relational data model. The Entity-Relationship (ER) model
The Entity-Relationship (ER) model detailed
Topics to review

Describing and storing data in a DBMS

A data model is a collection of high-level data description constructs that hide many low-level storage details
A DBMS allows a user to define the data to be stored in terms of a data model

Data modeling organizes the data in order to achieve the following:
The accurate representation of the real world
Adaptation of the data to computerized representation and processing
A data model is a theoretical instrument that lets us interpret data appropriately: it identifies the meaning, or information content, of an entire data collection, as opposed to the individual values of the data

Describing and storing data in a DBMS

Technically speaking:
A data model is a formalism having two components:
A set of rules for data structuring and organization
A set of rules for data manipulation

What is a data model?

Data Model Definition

A data model: an ensemble of rules for data organization, together with a set of operations allowed to be performed on the corresponding data

The database: a collection of data organized in a structure which is described by a conceptual model (schema)

The rules for data structuring and the operations allowed on the data are defined within the data model

A data model M can be defined as being composed of two parts:

The set of rules for data structuring, also called generative rules, denoted by G. They express the static properties of the data model and are realized within the DBMS through the Data Description Language (DDL)
The set of operations allowed on the data, denoted by O. They express the dynamic properties of the data model and are realized within a DBMS through the Data Manipulation Language (DML)

The data description language


Describes the data structures that are permitted within a data model M
These data structures can be specified in the following ways:

By specifying the permitted objects and the permitted relationships between these objects, using generic definition rules
By specifying the permitted objects and the forbidden relationships between these objects, by defining restrictions called constraints

Some data models partition the generative rules into two parts:

the part responsible for the specification of the data structure, called Gs
the part responsible for the specification of the constraints, called Gc

The data description language

A schema S consists of two parts:

A structure definition part, Ss
A constraint definition part, Sc, which contains an explicit list of constraints that must be respected
Besides the explicit constraints, there are implicit constraints, implied by the model itself, which are included in Ss
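The split between a structure part and a constraint part can be illustrated with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as the DBMS; the student table, its columns, and the 1 to 6 range for the year of study are made up for the illustration and do not come from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure part (Ss): the table and its columns.
# Constraint part (Sc): the explicit CHECK on the year of study.
# The table, columns and the 1..6 range are assumptions for this sketch.
conn.execute(
    """
    CREATE TABLE student (
        name TEXT NOT NULL,
        year INTEGER,
        CHECK (year BETWEEN 1 AND 6)
    )
    """
)

conn.execute("INSERT INTO student (name, year) VALUES (?, ?)", ("Ana", 2))

# A row that breaks the explicit constraint is rejected by the DBMS.
try:
    conn.execute("INSERT INTO student (name, year) VALUES (?, ?)", ("Dan", 9))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The constraint lives in the schema, so every program that writes to the table is subject to it without repeating the check.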

The data manipulation language


Refers to operations that produce a change in the state of the database
The state of the database: the set of values of the data stored in the database at a certain moment, together with the values of the position indicators used for data retrieval
The state of the database reflects the dynamic nature of a data model
This state changes after an operation on the database
The Data Manipulation Language (DML) operations cannot affect the structure of the database => these operations preserve the conceptual model of the database
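A minimal sketch of the distinction between structure and state, assuming SQLite via Python's sqlite3 module. The student table and the helper names columns and state are hypothetical; the point is only that the DML statements change the stored values while the column list stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(connection):
    """Column names of the student table, i.e. its structure."""
    return [row[1] for row in connection.execute("PRAGMA table_info(student)")]

def state(connection):
    """All stored values at this moment, i.e. the state."""
    return connection.execute(
        "SELECT name, year FROM student ORDER BY name"
    ).fetchall()

# DDL: defines the structure.
conn.execute("CREATE TABLE student (name TEXT, year INTEGER)")
columns_before = columns(conn)
state_before = state(conn)

# DML: changes the state only.
conn.execute("INSERT INTO student VALUES ('Ana', 1)")
conn.execute("UPDATE student SET year = 2 WHERE name = 'Ana'")
columns_after = columns(conn)
state_after = state(conn)
```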

Describing and storing data in a DBMS

Data models are divided into two main groups:

strongly typed data models
weakly typed data models

Data models classification

Strongly typed data models

Each data item must belong to a certain category
The available categories are defined a priori and cannot evolve dynamically

Weakly typed data models

Do not make any assumption concerning the categories
Categories are used only if they prove to be useful
The individual data items exist on their own and can be connected to other data
The information concerning the categories, if they exist, is treated in the same way as the information concerning the data

Abstraction, generalization, aggregation

Abstraction
neglecting the non-relevant aspects and concentrating on the properties of interest

The relevance criterion is determined by the objectives considered

In data modeling, abstraction is used to obtain data categories or to combine data categories into more general categories => a data type is defined starting from a class of similar objects

Abstraction can be done on multiple levels

Two forms of abstraction operations:

Generalization
Aggregation

Generalization

Associates a single generic type with a set of objects, or with a set of types, having similar features

Object-to-type generalization: classification
E.g.: several distinct students are considered in order to define the generic type student
Instantiation is the opposite of classification

Subtype-to-type generalization: the usual generalization operation
E.g.: type student + type professor => the generic type person
Specialization is the opposite of generalization

Generalization hierarchy
Example: the university domain
Person
    Student
    Employee
        Administrative Person
            Technician
            Secretary
        Teacher
            As.
            Lect.
            Conf.
            Prof.
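One way to picture a generalization hierarchy is as class inheritance. The sketch below is only an illustration in Python, covering part of the university example; the class bodies and the sample names are assumptions. Creating an object corresponds to instantiation, and a subclass corresponds to a specialization of its parent type.

```python
class Person:
    def __init__(self, name):
        self.name = name

# Specializations of Person.
class Student(Person):
    pass

class Employee(Person):
    pass

# Specializations of Employee.
class AdministrativePerson(Employee):
    pass

class Teacher(Employee):
    pass

# Instantiation: individual objects that belong to a type.
ana = Student("Ana")        # made-up sample data
popescu = Teacher("Popescu")

# Every Teacher is also an Employee and a Person (generalization),
# while a Student is a Person but not an Employee.
teacher_is_person = isinstance(popescu, Person)
student_is_employee = isinstance(ana, Employee)
```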

Aggregation

A form of abstraction that represents an object through its component parts
E.g.:

the type PERSON is constituted, through aggregation, from the types Name, Address, Age
the type ADDRESS is constituted, through aggregation, from the types Town, Street, Number

Aggregation hierarchy
Person
    Name
    Address
        Town
        Street
        Number
    Age
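Aggregation can be sketched as composition of record types. The following Python dataclasses mirror the PERSON and ADDRESS example; the field types and the sample values are assumptions made for the illustration.

```python
from dataclasses import dataclass

@dataclass
class Address:
    # ADDRESS aggregates Town, Street, Number.
    town: str
    street: str
    number: int

@dataclass
class Person:
    # PERSON aggregates Name, Address, Age; Address is itself an aggregate.
    name: str
    address: Address
    age: int

# Made-up sample values.
person = Person("Ana", Address("Cluj-Napoca", "Main Street", 2), 21)

# Components are reached by walking down the aggregation hierarchy.
town = person.address.town
```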

Combined generalization & aggregation


Person (aggregates Name, Address, Age)
    Employee (aggregates Function, Salary)
    Student (aggregates Specialization, Year)

What is an entity?

Entities, connections between the entities

An entity: something that has a well-defined existence, a reality that exists on its own

Every entity is characterized by its properties (features), which are represented through attributes within a data model
An entity type: a representation within a data model that corresponds to a category of objects in the real world and constitutes the intension of that category

Technically speaking, the entity type corresponds to the definition of an entity in terms of its attributes, resulting from the aggregation of those attributes

A conceptual model contains the description of all the entities of a database, together with all the connections that exist between these entities

Entities, connections between the entities

Generalization => an entity type is considered the result of classifying a set of entities that have some common properties

An entity type can also correspond to the generalization of one or more entity types
E.g.: the set of Student entities can be represented through the Student entity type, having the following attributes:
Name
Has_scholarship
Year
Gender

The extension of an entity type is formed by the set of entities that have common properties and are described by the given entity type
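The difference between the intension and the extension of an entity type can be sketched in Python. The attribute types and the two sample students are assumptions for the illustration; only the attribute names come from the Student example.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Student:
    name: str
    has_scholarship: bool
    year: int
    gender: str

# Intension: the attributes that define the entity type.
intension = [f.name for f in fields(Student)]

# Extension: the set of entities currently described by the type.
# The two students are made-up sample data.
extension = {
    Student("Ana", True, 2, "F"),
    Student("Dan", False, 1, "M"),
}
```

The intension stays fixed while entities are added to or removed from the extension.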

Relationships between entities


1:1 (One to one) Relationship
Example: Wives - Husbands

1:N (One to many) Relationship


Example: Groups - Students

Relationships between entities


M:N (Many to many) Relationship

Example: Students - Disciplines
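As a rough sketch of how these cardinalities are commonly stored in a relational schema, the example below uses SQLite through Python's sqlite3 module. All table and column names, and the sample rows, are assumptions. A 1:N relationship becomes a foreign key on the "many" side, and an M:N relationship needs a separate junction table; a 1:1 relationship could be obtained by additionally declaring the foreign key UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- 1:N: each student belongs to one group, a group has many students.
    CREATE TABLE study_group (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        group_id INTEGER REFERENCES study_group(id)
    );

    -- M:N: a junction table pairs students with disciplines.
    CREATE TABLE discipline (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        discipline_id INTEGER REFERENCES discipline(id),
        PRIMARY KEY (student_id, discipline_id)
    );

    INSERT INTO study_group VALUES (1, 'G1');
    INSERT INTO student VALUES (1, 'Ana', 1), (2, 'Dan', 1);
    INSERT INTO discipline VALUES (1, 'Databases'), (2, 'Algorithms');
    INSERT INTO enrollment VALUES (1, 1), (1, 2), (2, 1);
    """
)

students_in_group = conn.execute(
    "SELECT COUNT(*) FROM student WHERE group_id = 1"
).fetchone()[0]

disciplines_of_ana = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE student_id = 1"
).fetchone()[0]

students_in_databases = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE discipline_id = 1"
).fetchone()[0]
```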

The main data models.


The evolution of the data models.
The hierarchical data model
The network data model
The relational data model

The hierarchical data model


The first data model used within a DBMS (1960s)
The most representative implementation: IMS (Information Management System), developed by IBM in the context of the Apollo space program

Features specific to the model:

The data structure diagram: a directed graph whose basic type is a hierarchical tree
The connections between the entities are represented through the directed edges of the tree

Description of a database built according to the hierarchical model

Assumes the definition of three structural elements:
the hierarchical definition tree (the root node and the parent-child connections)
the record types (the tree nodes)
the fields (attributes) within the records (the data type and the corresponding size)

E.g.: using the DDL of the IMS hierarchical DBMS

Example: the Faculty database

Faculty(FCode, Name, Address)
Usual_empl(Name, Function, Salary)
Teacher(Name, Function, Discipline)
Room(Number, Address, Capacity)
Student(Name, Date_of_Birth, Has_scholarship, Year, Sex)

Faculty
    Usual employees
    Teacher
        Student
    Room

Drawbacks of the hierarchical data model


The data insertion anomaly
one cannot add data about a student until at least one of the student's teachers is known

The data deletion anomaly
if the record of a teacher is deleted, the data about those students for whom he or she is the only teacher is lost

The update anomaly
whenever the value of a student attribute must be changed, the whole database has to be searched in order to find all the instances of that student

The data structure lacks flexibility, being appropriate for modeling only simple 1:N relationships
The data must have an inherently hierarchical structure
The query domain is severely limited
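The anomalies can be reproduced with a small in-memory tree. This is only a sketch in Python, assuming that student records are stored as children of teacher records; the names and values are made up.

```python
# Students exist only as children of a teacher record, so a student
# with no known teacher has no place in the tree (insertion anomaly).
faculty = {
    "teachers": {
        "Popescu": {"students": [{"name": "Ana", "year": 2}]},
        "Ionescu": {"students": [{"name": "Ana", "year": 2},
                                 {"name": "Dan", "year": 1}]},
    }
}

def all_student_names(tree):
    return {s["name"]
            for teacher in tree["teachers"].values()
            for s in teacher["students"]}

def update_year(tree, student_name, new_year):
    """Update anomaly: every copy of the student must be found and changed."""
    touched = 0
    for teacher in tree["teachers"].values():
        for s in teacher["students"]:
            if s["name"] == student_name:
                s["year"] = new_year
                touched += 1
    return touched

# Ana is stored under both teachers, so two records are rewritten.
copies_updated = update_year(faculty, "Ana", 3)

# Deletion anomaly: Ionescu is the only teacher of Dan,
# so deleting Ionescu also loses the data about Dan.
del faculty["teachers"]["Ionescu"]
remaining = all_student_names(faculty)
```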

Advantages of the hierarchical data model


Efficient implementations are possible, even when sequential-access storage devices are used

The model is relatively simple and easy to understand

The number of manipulation operators is small compared with the network data model

The network data model

DBTG = Data Base Task Group


A task group of CODASYL (COnference on DAta SYstems Languages), the conference for database language standardization
The current standard for network databases was established by the DBTG
The initial report: 1971. The first detailed specification of a DBMS based on the network data model
Updates: 1973, 1976, 1977, 1978

The network data model


Definition
The network data models: data models based on tables and graphs, corresponding to the 2 forms of data structuring (the record type and the explicit connections), having the following features:
The vertices of the graph correspond to the entity types (represented by tables)
The directed edges of the graph correspond to the relationships between the entity sets, being represented as connections between the tables
The database diagram is a general directed graph (an entity type can be connected to multiple parent entities, or through multiple arcs to the same parent entity)
The arcs are always labeled (compulsory)

The network data model


Definition elements:
The conceptual model is represented through a data structure diagram, which is a general graph
The labeled arcs are called Set Types: logical connections between two types of records:

(a) The owner type
(b) The member type

The arcs are oriented from type (a) to type (b)
The name of the Set Type is the label of the corresponding arc

The network data model


OBS.1:
The Set Type is the central concept of the network data model, as it represents a functional connection between two entity types: the owner type and the member type.

OBS.2:
A Set Type can be used to represent 1:1 and 1:N connections, but it cannot be used to represent M:N connections

Equivalent names for the Set Type: fan-set or DBTG Set

EXAMPLE
I. The conceptual model for the Faculty Database

The network data model


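Record types: FACULTY, USUAL EMPLOYEES, TEACHER, STUDENT, ROOM, Marks
Set types (owner -> member):
Employees: FACULTY -> USUAL EMPLOYEES
Teachers: FACULTY -> TEACHER
Rooms: FACULTY -> ROOM
Given Marks: TEACHER -> Marks
Received Marks: STUDENT -> Marks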

The network data model


The software implementation of the set types:
Pointer chains
Pointer matrices
The sets implemented through pointer chains are circular lists having a record of the owner type as list head
Types of pointers that can be used by the database administrator (DBADMIN):
NEXT pointers: used for the simple forward chaining of the records
PRIOR pointers: used for the backward chaining of the records
OWNER pointers: connect a member record with the set owner

The PRIOR and OWNER pointers are optional

The network data model


Combinations of pointer types:
(C1) The structure with NEXT pointers:
The simplest and most economical structure
Allows unidirectional sequential access to the members of the set, starting from the owner
Drawbacks: deleting the member Mi requires updating the connection chain, so the previous member, Mi-1, must be accessed; in the absence of PRIOR pointers, this can be a difficult, costly task

Diagram: M1 -> M2 -> ... -> Mn (chain of NEXT pointers)
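The cost of deletion in a NEXT-only chain can be seen in a small sketch. The Python below is an illustration, not the DBTG implementation: the Record class and the helper functions are hypothetical, and the set is modeled as a circular list whose head is the owner record.

```python
class Record:
    def __init__(self, label):
        self.label = label
        self.next = self  # an empty set: the record points to itself

def insert_first(owner, member):
    """Link a member right after the owner."""
    member.next = owner.next
    owner.next = member

def members(owner):
    """Forward traversal, starting from the owner."""
    labels = []
    node = owner.next
    while node is not owner:
        labels.append(node.label)
        node = node.next
    return labels

def delete(owner, member):
    """Without PRIOR pointers the predecessor is found by walking the chain."""
    steps = 0
    prev = owner
    while prev.next is not member:
        prev = prev.next
        steps += 1
        if prev is owner:
            raise ValueError("record is not a member of this set")
    prev.next = member.next
    return steps

owner = Record("P")
m1, m2, m3 = Record("M1"), Record("M2"), Record("M3")
for m in (m3, m2, m1):
    insert_first(owner, m)

order_before = members(owner)
steps_to_find_predecessor = delete(owner, m3)
order_after = members(owner)
```

The number of steps grows with the position of the deleted member, which is the cost the PRIOR pointers are meant to remove.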

The network data model


(C2) The structure with NEXT and PRIOR pointers:
The circular doubly linked list; allows forward and backward access to the set members
Drawbacks: higher memory requirements and additional costs for pointer updates
Advantages: the structure is useful when delete operations are frequent (it gives direct access to the member Mi-1 from any position within the list)

Diagram: M1 <-> M2 <-> ... <-> Mn (NEXT and PRIOR pointers)

The network data model


(C3) The structure with NEXT and OWNER pointers
Useful in situations where the owner must be accessed frequently, starting from the member records

Diagram: members M1, M2, ..., Mn chained by NEXT pointers, each with an OWNER pointer to the owner P

(C4) The structure with NEXT, PRIOR and OWNER pointers
The circular triple-linked list: the most complete structure, but also the most costly => it must be used only when it is absolutely necessary

Diagram: members M1, M2, ..., Mn with NEXT, PRIOR and OWNER pointers
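For comparison, a sketch of the most complete variant, with NEXT, PRIOR and OWNER pointers on every member. The Owner and Member classes and the helper functions are hypothetical names for this Python illustration. Deletion no longer walks the chain, and the owner is reachable from any member in one step, at the price of three pointers per member record.

```python
class Owner:
    def __init__(self, label):
        self.label = label
        self.next = self   # empty ring: the owner points to itself
        self.prior = self

class Member:
    def __init__(self, label):
        self.label = label
        self.next = None
        self.prior = None
        self.owner = None

def append(owner, member):
    """Link a member at the end of the ring and set its three pointers."""
    last = owner.prior
    member.prior = last
    member.next = owner
    member.owner = owner
    last.next = member
    owner.prior = member

def delete(member):
    """The neighbours are known directly, so no traversal is needed."""
    member.prior.next = member.next
    member.next.prior = member.prior
    member.next = member.prior = member.owner = None

def forward(owner):
    labels, node = [], owner.next
    while node is not owner:
        labels.append(node.label)
        node = node.next
    return labels

def backward(owner):
    labels, node = [], owner.prior
    while node is not owner:
        labels.append(node.label)
        node = node.prior
    return labels

p = Owner("P")
m1, m2, m3 = Member("M1"), Member("M2"), Member("M3")
for m in (m1, m2, m3):
    append(p, m)

owner_of_m2 = m2.owner.label
delete(m2)
forward_after = forward(p)
backward_after = backward(p)
```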

Drawbacks of the network data model


Its schema (structure) is too close to the internal representation structure (in the memory of the computer), with pointers and explicit links between the entities => it is difficult to work with and has increased memory requirements
The structure of the database limits the set of possible queries; answering additional queries requires changes to the database structure

Data insertion, deletion, update

The data insertion operation
a trivial operation
when a new student is added, the chain of connection elements is empty and consists of a single pointer, which links the new record to itself

The data deletion operation
one can delete a teacher without affecting the corresponding students, which exist on their own

The update operation
the student appears within a single database record => there is no risk of inconsistency resulting from the update operation

The relational data model

Appeared relatively late in database theory and practice, once computing equipment had reached a sufficient level of performance
Represents:

a valuable instrument of study in database theory
a starting point for building DBMSs with competitive performance

The first relation-based data model: E.F. Codd (1970)

The basic principles: the mathematical theory of relations, logically extended to satisfy data management requirements

OBS: The relational data model underlies the majority of the commercial DBMSs in use today.

The relational data model


Advantages of the relational data model and of relational DBMSs:
They possess high-level data manipulation languages (DML), simple but very powerful, called Relational Languages (RL).

RL features:
They allow the definition of new relations based on the existing ones
They allow relational DBMSs to offer flexible and friendly interfaces that can be used directly by much broader categories of users than in the case of network and hierarchical databases

Codd's Rules
A relational database management system (RDBMS) must manage its stored data using only its relational capabilities
The rules primarily address implementation requirements for RDBMS vendors
Some of them also have an impact on application design

1. Information Rule
All information in the database should be represented in one and only one way: as values in a table

2. Guaranteed Access Rule
Each and every atomic value is guaranteed to be logically accessible by a combination of table name, primary key value, and column name
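A minimal illustration of the guaranteed access rule, assuming SQLite via Python's sqlite3 module and a made-up student table. The single value is addressed by the table name (student), the primary key value (2), and the column name (year), with no reference to storage position or pointers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ana", 2), (2, "Dan", 1)],  # made-up sample rows
)

# table name = student, primary key value = 2, column name = year
value = conn.execute(
    "SELECT year FROM student WHERE id = ?", (2,)
).fetchone()[0]
```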

3. Systematic Treatment of Null Values
Null values are supported in a fully relational DBMS for representing missing information, independently of data type

4. Dynamic Online Catalog
The database description is represented at the logical level in the same way as ordinary data
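SQLite offers one concrete example of such a catalog: its schema is stored in a table named sqlite_master, which can be read with an ordinary SELECT. The sketch below uses Python's sqlite3 module; the student and room tables are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE room (number INTEGER PRIMARY KEY, capacity INTEGER)")

# The catalog is itself a table, so the same query language reads the schema.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```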

5. Comprehensive Data Sublanguage Rule
A relational system may support several languages, but there must be at least one language that supports data definition, view definition, data manipulation, integrity constraints, authorization, and transaction boundaries

6. View Updating Rule
All views that are theoretically updatable must also be updatable by the system

7. High-Level Insert, Update, and Delete
The capability of handling a base or derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update, and deletion of data

8. Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods

9. Logical Data Independence
Application programs and terminal activities remain logically unimpaired when information-preserving changes are made to the base tables

10. Integrity Independence
Integrity constraints must be definable in the relational data sublanguage and storable in the catalog, not in the application programs

11. Distribution Independence
The data manipulation sublanguage must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed

12. Nonsubversion Rule
If a relational system supports a low-level language, that language cannot be used to subvert the integrity rules or constraints