Databases Course 3: Data Models

Delia-Alexandrina Mitrea, S.L. Eng.,PhD
E-mail: Delia.Mitrea@cs.utcluj.ro
Interviewing Dr. Edgar F. Codd about databases is a bit like interviewing Einstein about nuclear physics.
Topics of this course
Describing and storing data in a DBMS. Data modeling.
Definition of a data model
Data models classification
Abstraction, generalization, aggregation
Entities, connections between the entities
Most significant data models
The hierarchical data model
The network data model
The relational data model. The Entity-Relationship (ER) model
The Entity-Relationship (ER) model detailed
Topics to review
Describing and storing data in a DBMS
A data model is a collection of high-level data description constructs that hide many low-level storage details
A DBMS allows a user to define the data to be stored in terms of a data model
Data modeling involves organizing the data in order to achieve the following:
The accurate representation of the real world
Adaptation of the data to computerized representation and processing
A data model is a theoretical instrument that allows us to interpret the data appropriately, that is, to identify the significance or the information content of an entire data collection, in contrast with the individual values of the data
Describing and storing data in a DBMS
Technically speaking:
A data model is a formalism having two components:
A set of rules for data structuring and organization
A set of rules for data manipulation
What is a data model?
Data Model Definition
A data model: an ensemble of rules for data organization, together with a set of operations allowed to be performed on the corresponding data
The database: a collection of data organized in a structure which is described by a conceptual model (schema)
The rules for data structuring and the operations allowed on the data are defined within the data model
A data model M can be defined as being composed of two parts:
A set of rules for data structuring, also called generative rules, denoted by G. They express the static properties of the data model and are materialized within the DBMS through the Data Description Language (DDL)
A set of operations allowed on the data, denoted by O. They express the dynamic properties of the data model and are materialized within the DBMS through the Data Manipulation Language (DML)
The data description language
Describes the data structures which are permitted within a data model M
These data structures can be specified in the following ways:
By specifying the permitted objects and the permitted relationships between these objects, using generic definition rules
By specifying the permitted objects and the forbidden relationships between these objects, by defining restrictions called constraints
Some data models partition the generative rules into two parts:
the part responsible for the specification of the data structure, called Gs
the part responsible for the specification of the constraints, called Gc
The data description language
A schema S will consist of two parts:
A structure definition part, Ss
A constraint definition part, Sc, which contains an explicit list of constraints that must be respected
Besides the explicit constraints, there are the implicit constraints, implied by the model itself, which are included in Ss
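The split between Ss and Sc can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module and SQL as the DDL. The table student, its columns and the allowed range for year are illustrative assumptions, not part of the course material. The column list plays the role of the structure part, and the CHECK clause is one explicit constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Structure part (Ss): the record type and its fields.
# Constraint part (Sc): an explicit rule that the stored data must respect.
conn.execute(
    """
    CREATE TABLE student (
        name TEXT NOT NULL,
        year INTEGER NOT NULL,
        CHECK (year BETWEEN 1 AND 6)  -- explicit constraint (assumed range)
    )
    """
)


def try_insert(name, year):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO student (name, year) VALUES (?, ?)", (name, year)
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Ana", 2)
rejected = try_insert("Ion", 9)  # violates the CHECK constraint
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Only the row that satisfies the constraint is stored; the DBMS, not the application, enforces the rule.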
The data manipulation language
Refers to operations that produce a change in the state of the database
The state of the database is the ensemble of the values of the data stored in the database at a certain moment, together with the values of the position indicators used to perform data retrieval
The state of the database reflects the dynamic nature of a data model
This state changes after an operation on the database
The Data Manipulation Language (DML) operations cannot affect the structure of the database => these operations preserve the conceptual model of the database
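As a small sketch of this distinction, the following Python code (built-in sqlite3 module) compares the schema and the state of a database before and after some DML operations. The table teacher and the helper names schema and state are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (name TEXT, discipline TEXT)")  # DDL


def schema(connection):
    """Return the stored table definitions (the conceptual model)."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return rows.fetchall()


def state(connection):
    """Return the current contents of the teacher table (the database state)."""
    rows = connection.execute(
        "SELECT name, discipline FROM teacher ORDER BY name"
    )
    return rows.fetchall()


schema_before, state_before = schema(conn), state(conn)

# DML operations: each one moves the database to a new state.
conn.execute("INSERT INTO teacher VALUES ('Pop', 'Databases')")
conn.execute("UPDATE teacher SET discipline = 'Data Models' WHERE name = 'Pop'")

schema_after, state_after = schema(conn), state(conn)
```

The state differs before and after, while the schema read from the catalog is identical.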
Describing and storing data in a DBMS
Data models are divided into two main groups:
strongly typed data models
weakly typed data models
Data models classification
Strongly typed data models
Each data item must belong to a certain category
The available categories are defined a priori and cannot evolve dynamically
Weakly typed data models
Do not make any assumption concerning the categories
Categories are introduced only if they prove to be useful
The individual data items exist on their own and can be connected to other data
The information concerning the categories, if they exist, is treated in the same way as the information concerning the data
Abstraction, generalization, aggregation
Abstraction
Neglecting the non-relevant aspects and concentrating on the properties of interest
The relevance criterion is determined by the objectives considered
In data modeling, abstraction is used to obtain data categories or to combine data categories into more general categories => define a data type starting from a class of similar objects
Abstraction can be done on multiple levels
Two forms of abstraction operations:
Generalization
Aggregation
Generalization
Associates a single generic type to a set of objects, or to a set of types, having similar features
Object-type generalization: classification
E.g.: some distinct students are considered to define the generic type student
Instantiation is the opposite of classification
Subtype-type generalization: the usual generalization operation
E.g.: type student + type professor => the generic type person
Specialization is the opposite of generalization
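The two forms of generalization can be sketched with Python classes. This is only an analogy under stated assumptions: the attributes name, year and discipline are invented for the example, and class inheritance stands in for the subtype-type relationship.

```python
class Person:
    """Generic type obtained by generalizing Student and Professor."""

    def __init__(self, name):
        self.name = name


class Student(Person):
    """A specialization of Person."""

    def __init__(self, name, year):
        super().__init__(name)
        self.year = year


class Professor(Person):
    """Another specialization of Person."""

    def __init__(self, name, discipline):
        super().__init__(name)
        self.discipline = discipline


# Instantiation: from a type to individual objects.
ana = Student("Ana", 2)
pop = Professor("Pop", "Databases")

# Classification: from individual objects back to their type.
people = [ana, pop]
type_names = [type(p).__name__ for p in people]

# Generalization: both objects also belong to the generic type Person.
all_persons = all(isinstance(p, Person) for p in people)
```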
Generalization hierarchy
Example: the university domain
[Figure: generalization hierarchy rooted at Person, with the types Student, Employee, Administrative Person, Technician, Secretary, Teacher, As., Lect., Conf., Prof.]
Aggregation
A form of abstraction in which an object is represented through its component parts
E.g.:
the type PERSON is constituted, through aggregation, from the types Name, Address, Age
the type ADDRESS is constituted, through aggregation, from the types Town, Street, Number
Aggregation hierarchy
Person
    Name
    Address
        Town
        Street
        Number
    Age
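The aggregation hierarchy can be written down as nested record types. The sketch below uses Python dataclasses; the field types (str, int) and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    """ADDRESS is aggregated from Town, Street, Number."""
    town: str
    street: str
    number: int


@dataclass
class Person:
    """PERSON is aggregated from Name, Address, Age."""
    name: str
    address: Address
    age: int


# Sample values are invented for illustration only.
p = Person(name="Ana", address=Address("Cluj-Napoca", "Main Street", 10), age=20)

person_parts = [f.name for f in fields(Person)]
address_parts = [f.name for f in fields(Address)]
```

Accessing `p.address.town` walks two levels down the aggregation hierarchy.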
Combined generalization & aggregation
[Figure: Person aggregates Name, Address, Age; Employee (Function, Salary) and Student (Specialization, Year) are specializations of Person]
What is an entity?
Entities, connections between the entities
An entity is something that has a well-defined existence, a reality that exists on its own
Every entity is characterized by its properties (features), which are represented through attributes within a data model
An entity type is a representation within a data model that corresponds to a category of objects in the real world and constitutes the intension of that category
Technically speaking, the entity type corresponds to the definition of an entity in terms of its attributes, resulting from the aggregation of those attributes
A conceptual model contains the description of all the entities of a database, together with all the connections that exist between these entities
Entities, connections between the entities
Generalization => an entity type is considered the result of the classification of a set of entities that have some common properties
An entity type can also correspond to the generalization of one or more entity types
E.g.: the set of Student entities can be represented through the Student entity type, having the following attributes:
Name
Has_scholarship
Year
Gender
The extension of an entity type is formed by the set of entities that have common properties and are described by the given entity type
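The difference between the intension and the extension of the Student entity type can be sketched as follows. The attribute types and the two sample students are assumptions introduced for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Student:
    """The entity type: its attribute list is the intension."""
    name: str
    has_scholarship: bool
    year: int
    gender: str


# Intension: the definition of the type in terms of its attributes.
intension = [f.name for f in fields(Student)]

# Extension: the set of entities currently described by the type.
extension = {
    Student("Ana", True, 2, "F"),
    Student("Dan", False, 1, "M"),
}

second_year = sorted(s.name for s in extension if s.year == 2)
```

The intension stays fixed while the extension grows and shrinks as students are added or removed.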
Relationships between entities
1:1 (One to one) Relationship
Example: Wives - Husbands
1:N (One to many) Relationship
Example: Groups - Students
M:N (Many to many) Relationship
Example: Students - Disciplines
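One common way to store the 1:N and M:N examples is sketched below in relational form, using Python's built-in sqlite3 module. The table and column names (study_group, enrollment and so on) and the sample rows are assumptions; the 1:N relationship is carried by a foreign key, and the M:N relationship by a separate junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study_group (id INTEGER PRIMARY KEY, label TEXT);

    -- 1:N, each student belongs to exactly one group
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        group_id INTEGER NOT NULL REFERENCES study_group(id)
    );

    CREATE TABLE discipline (id INTEGER PRIMARY KEY, title TEXT);

    -- M:N, resolved through a junction table
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        discipline_id INTEGER REFERENCES discipline(id),
        PRIMARY KEY (student_id, discipline_id)
    );

    INSERT INTO study_group VALUES (1, 'G1');
    INSERT INTO student VALUES (1, 'Ana', 1), (2, 'Dan', 1);
    INSERT INTO discipline VALUES (1, 'Databases'), (2, 'Algorithms');
    INSERT INTO enrollment VALUES (1, 1), (1, 2), (2, 1);
    """
)

students_in_group = conn.execute(
    "SELECT COUNT(*) FROM student WHERE group_id = 1"
).fetchone()[0]

disciplines_of_ana = [
    row[0]
    for row in conn.execute(
        "SELECT d.title FROM discipline d "
        "JOIN enrollment e ON e.discipline_id = d.id "
        "WHERE e.student_id = 1 ORDER BY d.title"
    )
]

students_in_databases = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM student s "
        "JOIN enrollment e ON e.student_id = s.id "
        "WHERE e.discipline_id = 1 ORDER BY s.name"
    )
]
```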
The main data models.
The evolution of the data models.
The hierarchical data model
The first data model used within a DBMS (1960s)
The most representative implementation:
IMS (Information Management System), developed by IBM in the context of the Apollo space program
Features specific to the model:
The data structure diagram: a directed graph whose basic type is a hierarchical tree
The connections between the entities are represented through the directed edges of the tree
Description of the database built according to the hierarchical model
Assumes the definition of three structural elements:
the hierarchical definition tree (root node and the parent-child connections)
the record types (the tree nodes)
the fields (attributes) within the records (the data type and the corresponding size)
E.g.: using the DDL of the IMS hierarchical DBMS
Example: the Faculty database
Faculty(FCode, Name, Address)
Usual_empl(Name, Function, Salary)
Teacher(Name, Function, Discipline)
Room(Number, Address, Capacity)
Student(Name, Date_of_Birth, Has_scholarship, Year, Sex)
[Figure: hierarchical definition tree with the record types Faculty (root), Usual employees, Teacher, Student, Room]
Drawbacks of the hierarchical data model
The data insertion anomaly
one cannot add data about a student until at least one of his teachers is known
The data deletion anomaly
if the record of a teacher is deleted, the data about the students for whom he is the only teacher is lost
The update anomaly
whenever the value of a student attribute must be changed, the whole database has to be searched in order to find all the instances of that student
Other drawbacks
the data structure lacks flexibility, being appropriate for modeling only simple 1:N relationships
the data considered must have an inherently hierarchical structure
the query domain is severely limited
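The three anomalies can be reproduced on a toy tree. The sketch below is a plain Python model, not IMS: the Record class, the find_all helper and the sample data are hypothetical, and it assumes, as the anomalies above imply, that Student records are stored as children of Teacher records.

```python
class Record:
    """A node of the hierarchical tree: one record plus its child records."""

    def __init__(self, kind, **fields):
        self.kind = kind
        self.fields = fields
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def remove_child(self, child):
        self.children.remove(child)  # the whole subtree goes with it


def find_all(root, kind, **match):
    """Search the whole tree for records of a kind with the given field values."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.kind == kind and all(
            node.fields.get(k) == v for k, v in match.items()
        ):
            found.append(node)
        stack.extend(node.children)
    return found


faculty = Record("Faculty", FCode="F1")
pop = faculty.add_child(Record("Teacher", Name="Pop"))
ion = faculty.add_child(Record("Teacher", Name="Ion"))

# Insertion: a Student record can only be stored under a Teacher parent.
pop.add_child(Record("Student", Name="Ana", Year=2))
ion.add_child(Record("Student", Name="Ana", Year=2))  # same student, duplicated
ion.add_child(Record("Student", Name="Dan", Year=1))

# Update: every copy of Ana has to be located and changed.
copies_of_ana = find_all(faculty, "Student", Name="Ana")
for rec in copies_of_ana:
    rec.fields["Year"] = 3

# Deletion: removing Ion also removes Dan, whose only teacher was Ion.
faculty.remove_child(ion)
dan_left = find_all(faculty, "Student", Name="Dan")
ana_left = find_all(faculty, "Student", Name="Ana")
```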
Advantages of the hierarchical data model
Efficient implementations are possible, even when data storage devices with sequential access are used
The relative simplicity of the model, which makes it easy to understand
The number of manipulation operators is small compared with the network data model
The network data model
DBTG = Data Base Task Group
Founded in 1971, at the Conference for Database Language Standardization, CODASYL (COnference on DAta SYstems Languages)
The current standard for network databases was established by DBTG
The initial report: 1971. The first detailed specification of a DBMS based on the network data model
Updates: 1973, 1976, 1977, 1978
Definition
The network data models are data models based on tables and graphs, corresponding to the two forms of data structuring (the record type and the explicit connections), having the following features:
The vertices of the graph correspond to the entity types (represented by tables)
The directed edges of the graph correspond to the relationships between the entity sets, being represented as connections between the tables
The database diagram appears as a general directed graph (a certain entity type can be connected to multiple parent entities or through multiple arcs to the same parent entity)
The arcs are always labeled (compulsory)

Definition elements:
The conceptual model is represented through a data structure diagram, which is a general graph
The labeled arcs are called Set Types: logical connections between two types of records:
(a) The owner type
(b) The member type
The arcs are oriented from type (a) to type (b)
The name of the Set Type is the label of the corresponding arc
OBS.1:
The Set Type is the central concept of the network data model philosophy, as it represents a functional connection between two entity types: the owner type and the member type.
OBS.2:
A Set Type can be used to represent 1:1 and 1:N connections, but it cannot be used to represent M:N connections
Equivalent names for the Set Type: fan-set or DBTG Set

EXAMPLE
I. The conceptual model for the Faculty Database
[Figure: data structure diagram with the record types FACULTY, USUAL EMPLOYEES, TEACHER, MARKS, STUDENT, ROOM and the set types Employees, Teachers, Given Marks, Received Marks, Rooms]

The software implementation of the set types:
Pointer chains
Pointer matrices
The sets implemented through pointer chains (of type (a)) are circular lists having a record of the owner type as the list head
Types of pointers that can be used by the DBADMIN:
NEXT pointers: used for the simple forward chaining of the records
PRIOR pointers: used for the backward chaining of the records
OWNER pointers: connect a member record with the set owner
PRIOR and OWNER pointers are optional

Combinations of pointer types:
(C1) The structure with NEXT pointers:
The simplest and most economical structure
Allows unidirectional sequential access to the members of the set, starting from the owner
Drawbacks: in order to delete the member Mi, the connection chain must be updated; this requires access to the previous member, Mi-1, which, in the absence of PRIOR pointers, can be a difficult and costly task
[Figure: members M1, M2, ..., Mn chained by NEXT pointers]
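The cost of deletion in structure (C1) can be seen in a small sketch of a ring that has only NEXT pointers. The class Rec and the functions insert_member, members and delete_member are hypothetical names used for illustration; they are not DBTG operations.

```python
class Rec:
    """A record in a pointer chain; `next` is its NEXT pointer."""

    def __init__(self, label):
        self.label = label
        self.next = self  # a ring of one: the record points to itself


def insert_member(owner, member):
    """Link a member right after the owner."""
    member.next = owner.next
    owner.next = member


def members(owner):
    """Walk the ring forward, from the owner back to the owner."""
    out, node = [], owner.next
    while node is not owner:
        out.append(node.label)
        node = node.next
    return out


def delete_member(owner, member):
    """Unlink a member; return how many NEXT pointers were followed
    to reach its predecessor."""
    steps, prev = 0, owner
    while prev.next is not member:
        prev = prev.next
        steps += 1
        if prev is owner:
            raise ValueError("record is not a member of this set")
    prev.next = member.next
    member.next = member
    return steps


owner = Rec("P")
m1, m2, m3 = Rec("M1"), Rec("M2"), Rec("M3")
for m in (m3, m2, m1):  # inserting in reverse gives the order M1, M2, M3
    insert_member(owner, m)

before = members(owner)
steps_to_delete_m3 = delete_member(owner, m3)
after = members(owner)
```

Deleting the last member requires walking past every other member first, which is the difficulty noted for this structure.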

(C2) The structure with NEXT and PRIOR pointers:
The circular doubly linked list; allows forward and backward access to the set members
Drawbacks: higher memory requirements and additional costs for pointer updates
Advantages: the structure is useful when delete operations are frequent (allows more direct access to the member Mi-1, starting from any position within the list)
[Figure: members M1, M2, ..., Mn chained by NEXT and PRIOR pointers]
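A minimal sketch of structure (C2) follows, again with hypothetical names (Rec, insert_last, delete_member, forward, backward). With a PRIOR pointer in every record, a member can be unlinked without walking the ring, at the price of two pointers per record and two extra pointer updates per change.

```python
class Rec:
    """A record with both NEXT and PRIOR pointers."""

    def __init__(self, label):
        self.label = label
        self.next = self
        self.prior = self


def insert_last(owner, member):
    """Append a member at the end of the ring, just before the owner."""
    last = owner.prior
    last.next = member
    member.prior = last
    member.next = owner
    owner.prior = member


def delete_member(member):
    """Unlink in constant time: both neighbours are directly reachable."""
    member.prior.next = member.next
    member.next.prior = member.prior
    member.next = member.prior = member


def forward(owner):
    """Member labels in NEXT order."""
    out, node = [], owner.next
    while node is not owner:
        out.append(node.label)
        node = node.next
    return out


def backward(owner):
    """Member labels in PRIOR order."""
    out, node = [], owner.prior
    while node is not owner:
        out.append(node.label)
        node = node.prior
    return out


owner = Rec("P")
m1, m2, m3 = Rec("M1"), Rec("M2"), Rec("M3")
for m in (m1, m2, m3):
    insert_last(owner, m)

forward_before, backward_before = forward(owner), backward(owner)
delete_member(m2)
forward_after, backward_after = forward(owner), backward(owner)
```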

(C3) The structure with NEXT and OWNER pointers
Useful in situations when the owner must be accessed frequently, starting from the member records
[Figure: records P, M1, M2, ..., Mn linked by NEXT and OWNER pointers]
(C4) The structure with NEXT, PRIOR and OWNER pointers
The circular triple-linked list: the most complete structure, which involves the highest costs => it must be used only when it is absolutely necessary
[Figure: members M1, M2, ..., Mn linked by NEXT, PRIOR and OWNER pointers]
Drawbacks of the network data model
It has a schema (structure) which is too close to the internal representation structure (in the memory of the computer): pointers, explicit links between the entities => difficult to work with; increased memory requirements
The structure of the database limits the set of possible queries; in order to answer additional queries, changes to the database structure are required
Data insertion, deletion, update
The data insertion operation
a trivial operation
when a new student is added, the chain of connection elements will be empty and will consist of a single pointer representing a connection from the new record to itself
The data deletion operation
one can delete a teacher without affecting the corresponding students, which exist on their own
The update operation
the student appears within a single database record => there is no risk of inconsistency resulting from the update operation
The relational data model
Appeared relatively late in database theory and practice, as a result of computing equipment reaching a certain level of performance
Represents:
a valuable study instrument in database theory
a starting point for building DBMSs with competitive performance
The first relation-based data model: E.F. Codd (1970)
The basic principles: the mathematical theory of relations, logically extended to satisfy the data management requirements
OBS: The relational data model is the basis of the majority of the commercial DBMSs that exist and appear nowadays.

Advantages of the relational data models and of the relational DBMS:
They possess high-level data manipulation languages (DML), simple but very powerful, called Relational Languages (RL).
RL features:
The ability to define new relations based on the existing ones
They allow the development, with relational DBMSs, of flexible and friendly interfaces that can be used directly by much larger categories of users than in the case of the network and hierarchical databases
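The ability to define new relations from existing ones can be sketched with SQL, one widely used relational language, run here through Python's built-in sqlite3 module. The table teacher, the view professor and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE teacher (name TEXT, job_title TEXT, discipline TEXT);
    INSERT INTO teacher VALUES
        ('Pop', 'Prof.', 'Databases'),
        ('Ion', 'Lect.', 'Algorithms');

    -- A new relation derived from an existing one:
    -- keep some rows (selection) and some columns (projection).
    CREATE VIEW professor AS
        SELECT name, discipline FROM teacher WHERE job_title = 'Prof.';
    """
)

# The derived relation is queried exactly like a stored one.
professors = conn.execute("SELECT name, discipline FROM professor").fetchall()
```

The query states what is wanted and names no pointers or access paths, unlike the navigation required by the two earlier models.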
Codd's Rules
A relational database management system (RDBMS) must manage its stored data using only its relational capabilities
The rules primarily address implementation requirements for RDBMS vendors
Some of them also have an impact on application design
1. Information Rule
All information in the database should be represented in one and only one way: as values in a table
2. Guaranteed Access Rule
Each and every atomic value is guaranteed to be logically accessible by a combination of table name, primary key value, and column name
3. Systematic Treatment of Null Values
Null values are supported in a fully relational DBMS for representing missing information, independently of data type
4. Dynamic Online Catalog
The database description is represented at the logical level in the same way as ordinary data
5. Comprehensive Data Sublanguage Rule
A relational system may support several languages, but there must be at least one language that supports data definition, view definition, data manipulation, integrity constraints, authorization, and transaction boundaries
6. View Updating Rule
All views that are theoretically updatable must also be updatable by the system
7. High-Level Insert, Update, and Delete
The capability of handling a base or derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update, and deletion of data
8. Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representation or access methods
9. Logical Data Independence
Application programs and terminal activities remain logically unimpaired when information-preserving changes are made to the base tables
10. Integrity Independence
Integrity constraints must be definable in the relational data sublanguage and storable in the catalog, not in the application programs
11. Distribution Independence
The data manipulation sublanguage must enable application programs and terminal activities to remain logically unimpaired whether and whenever data are physically centralized or distributed
12. Nonsubversion Rule
If a relational system supports a low-level language, that language cannot be used to subvert or bypass the integrity rules or constraints
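Rules 2 and 4 lend themselves to a short illustration. The sketch below uses Python's built-in sqlite3 module; the table faculty, its sample row and the helper get_value are assumptions for the example, and SQLite is used only as a convenient engine, not as a system claimed to satisfy all twelve rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE faculty (fcode TEXT PRIMARY KEY, name TEXT, address TEXT);
    INSERT INTO faculty VALUES ('F1', 'Faculty A', 'Town A');
    """
)


def get_value(table, key_column, key, column):
    """Rule 2: reach one atomic value by table name, primary key value
    and column name. Identifiers cannot be bound as SQL parameters, so
    this helper must only receive trusted names."""
    sql = f'SELECT "{column}" FROM "{table}" WHERE "{key_column}" = ?'
    row = conn.execute(sql, (key,)).fetchone()
    return None if row is None else row[0]


value = get_value("faculty", "fcode", "F1", "name")

# Rule 4: the catalog is itself a table, queried with the same language
# as ordinary data (sqlite_master is SQLite's catalog table).
catalog_tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
]
```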
