A Relational Model of Data For Large Shared Data Banks


E. F. Codd
IBM Research Laboratory,
San Jose, California
T.M.Hennayake
I.Willarachchige
G.M.Ekanayake
Agenda
Abstract
This Paper is concerned with…
Introduction
Data Dependencies in Present Systems
A Relational View of Data
Normal Form
Some Linguistic Ability
Expressible, Named And Stored Relations
Redundancy and Consistency
2
Abstract

Users of large data banks should not need to know the internal
representation of the data

Changes in data representation are often needed as a result of:
Changes in query, update, and report traffic
Natural growth in the types of stored information

When the internal representation of data changes (and even when some aspects
of the external representation change), the following should remain unaffected:
Activities of users at terminals
Most application programs
3
This Paper is concerned with…
• Section 1:
Inadequacies of the existing tree-structured and network models are discussed
• A model based on n-ary relations, a normal form for
database relations, and the concept of a universal data
sublanguage are introduced
• Section 2: Certain operations on relations are discussed
• These operations are applied to the problems of redundancy and
consistency
4
Introduction
This paper is concerned with the application of elementary
relation theory to systems which provide shared access to
large banks of formatted data
Problems Identified
• Data independence
• Independence of application programs and terminal activities from
• Growth in data types
• Changes in data representation
• Data inconsistency
• Expected to become troublesome even in non-deductive
systems
5
Introduction Cont’d...
Why the relational model is superior to the graph or network model…
Describes data with its natural structure only, without superimposing
extra structure for machine representation
Provides a basis for a high-level data language
Provides a sound basis for treating derivability,
redundancy, and consistency of relations
Permits a clearer evaluation of the scope and logical
limitations of present formatted data systems
6
Data Dependencies in Present
Systems
Three main types:
Ordering Dependence
Indexing Dependence
Access Path Dependence
7
Ordering Dependence
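Data in a data bank can be stored in many different
ways. Elements can be stored with no concern for ordering,
in one order only, or in several different sorted orders
Present systems do not cleanly separate the stored ordering
from the ordering presented to the user

Ex:- Records of a file concerning parts might be stored in
ascending order by part serial number
Application programs that take advantage of this stored ordering
are likely to fail if the ordering is later replaced by a different one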
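
A minimal sketch of ordering dependence, in Python. The parts file, its record layout, and both function names are illustrative assumptions, not taken from the paper. The first lookup relies on ascending serial-number order; the second does not.

```python
# Hypothetical parts file: (serial number, part name). Illustrative data only.
parts_sorted = [(101, "bolt"), (205, "nut"), (317, "washer")]
parts_reordered = [(317, "washer"), (101, "bolt"), (205, "nut")]


def find_part_order_dependent(records, serial):
    """Scan that stops early, assuming ascending serial-number order."""
    for number, name in records:
        if number == serial:
            return name
        if number > serial:  # only valid while the stored order is ascending
            return None
    return None


def find_part_order_independent(records, serial):
    """Treats the file as an unordered collection of records."""
    for number, name in records:
        if number == serial:
            return name
    return None


print(find_part_order_dependent(parts_sorted, 205))       # nut
print(find_part_order_dependent(parts_reordered, 205))    # None
print(find_part_order_independent(parts_reordered, 205))  # nut
```

The order-dependent program silently returns a wrong answer once the stored ordering changes, which is the failure mode this slide describes.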
8
Indexing Dependence
An index is a purely performance-oriented component of the data representation
From an informational standpoint, it is a redundant component
• Improves response to queries and updates
• Slows down response to insertions and deletions
9
Indexing Dependence Cont’d…
Problem :-
• “Can application programs and terminal activities remain
invariant as indices come and go?”
Solution:-
Existing systems take different approaches to the
problem
• TDMS :- Unconditionally provides indexing on all attributes
• IMS :- Provides the user with a choice for each file: no indexing at all,
or indexing on the primary key only
• IDS :- Permits the file designers to select the attributes to be
indexed
10
Access Path Dependence
• An access path is the route the system follows through the stored data to
retrieve what a query (for example, an SQL request) asks for
• There can be more than one path to the same data
• Access path selection can have a tremendous impact on the overall
performance of the system
• Problem:-
• In tree-structured and network systems, terminal activities and programs become
dependent on the user access paths, so they break when those paths change
• Solution
• Adopt the policy that a defined user access path is not made obsolete until all
programs which use it have become obsolete. This is not practical, because the
number of access paths would eventually become excessively large
11
A Relational View of Data
R(s1, s2, s3, …, sn)
Given sets S1, S2, …, Sn, R is a relation on these sets if it is a set of
n-tuples whose jth element is drawn from Sj (the jth domain of R)
Relations are classified by their degree:
Degree 1 :: Unary
Degree 2 :: Binary
Degree 3 :: Ternary
Degree n :: N-ary
12
A Relational View of Data Cont’d…
N-ary Relation (array representation)
Each row represents an n-tuple of R
The ordering of rows is immaterial
All rows are distinct
The ordering of columns is significant: it corresponds to the ordering of the domains
The significance of each column is partially conveyed by labeling it
with the name of the corresponding domain
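
These properties can be checked with a small sketch that models a relation as a Python set of tuples. The rows are illustrative values, and modeling a relation this way is an assumption made for the example.

```python
# A relation of degree 3 modeled as a set of 3-tuples (illustrative rows).
component = {(1, 5, 9), (2, 5, 7), (3, 5, 2)}

# Same rows written in a different order: still the same relation.
same_rows_other_order = {(3, 5, 2), (1, 5, 9), (2, 5, 7)}
row_order_immaterial = component == same_rows_other_order

# A repeated row collapses, so all rows of a relation are distinct.
with_duplicate = {(1, 5, 9), (1, 5, 9), (2, 5, 7), (3, 5, 2)}
rows_distinct = len(with_duplicate) == 3

# Swapping the first two columns gives a different relation.
columns_swapped = {(b, a, q) for (a, b, q) in component}
column_order_significant = component != columns_swapped

# The degree is the length of each tuple.
degree = len(next(iter(component)))

print(row_order_immaterial, rows_distinct, column_order_significant, degree)
```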
13
Active Domain
Primary Key
Foreign Key
Simple Domain
Non simple Domain
14
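PK:- Normally, one domain (or combination of domains) of a given relation has
values which uniquely identify each element (n-tuple) of that relation.

Example relation: component (part, part, quantity)
part  part  quantity
1     5     9
2     5     7
3     5     2

FK:- A common requirement is for elements of a
relation to cross-reference other elements of the same
relation or elements of a different relation. A domain (or domain combination)
of relation R is a foreign key if it is not the primary key of R but its
elements are values of the primary key of some relation S (S may be R itself).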
15
Active Domain:- The set of values of a domain represented in the data bank at some
instant is called the active domain at that instant.
Simple Domain:- A domain whose elements are
nondecomposable (atomic).
Nonsimple Domain:- A domain whose elements are
decomposable (they are themselves relations).
16
Normal Form

Tree of interrelated domains:
employee
    jobhistory
        salaryhistory
    children

Unnormalized set
employee (man#, name, birthday, jobhistory, children)
jobhistory (jobdate, title, salaryhistory)
salaryhistory (salarydate, salary)
children (childname, birthyear)

Normalized set
employee (man#, name, birthday)
jobhistory (man#, jobdate, title)
salaryhistory (man#, jobdate, salarydate, salary)
children (man#, childname, birthyear)
17
Normalization Steps
Starting with the relation at the top of the tree, take its
primary key and expand each of the immediately
subordinate relations by inserting this primary key
domain or domain combination.
The primary key of each expanded relation consists of
the primary key before expansion augmented by the
primary key copied down from the parent relation.
Strike out from the parent relation all nonsimple
domains and remove the top node of the tree.
Repeat the same sequence of operations on each
remaining subtree.
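
The steps above can be sketched in Python on the employee tree. This is a minimal sketch: the nested-dictionary encoding, the sample values, the `PRIMARY_KEYS` table, and the `normalize` function are all assumptions made for illustration. Nonsimple domains are represented as lists of nested rows.

```python
# Assumed primary key of each relation before expansion.
PRIMARY_KEYS = {
    "employee": ["man#"],
    "jobhistory": ["jobdate"],
    "salaryhistory": ["salarydate"],
    "children": ["childname"],
}

# Unnormalized data: list-valued fields are nonsimple domains (made-up values).
employees = [
    {
        "man#": 1, "name": "Jones", "birthday": "1940-05-01",
        "jobhistory": [
            {"jobdate": "1965", "title": "clerk",
             "salaryhistory": [
                 {"salarydate": "1965", "salary": 5000},
                 {"salarydate": "1966", "salary": 5500},
             ]},
        ],
        "children": [{"childname": "Ann", "birthyear": 1962}],
    },
]


def normalize(name, rows, inherited=None, out=None):
    """Flatten a tree of nested relations into relations with simple domains."""
    inherited = inherited or {}
    out = {} if out is None else out
    for row in rows:
        simple = {k: v for k, v in row.items() if not isinstance(v, list)}
        nested = {k: v for k, v in row.items() if isinstance(v, list)}
        # Expand the row with the key copied down from the parent relation;
        # the nonsimple domains are struck out by keeping only simple fields.
        out.setdefault(name, []).append({**inherited, **simple})
        # New primary key: the inherited key augmented by this relation's key.
        key = {**inherited, **{k: row[k] for k in PRIMARY_KEYS[name]}}
        # Repeat the same sequence of operations on each subtree.
        for child_name, child_rows in nested.items():
            normalize(child_name, child_rows, key, out)
    return out


normalized = normalize("employee", employees)
for relation, rows in normalized.items():
    print(relation, rows)
```

Each resulting relation carries the keys of its ancestors, so salaryhistory rows are identified by man#, jobdate, and salarydate together.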
18
Some Linguistic Ability
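A normalized schema supports a data sublanguage of very high
linguistic power: a first-order predicate calculus suffices
when all relations are in normal form.
Descriptive ability
Sensible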
19
Expressible, named, and stored relations
The named set
• Relations that the community of users can
identify by means of a simple name (or identifier)
The expressible set
• The total collection of
relations that can be designated by expressions in
the data language.
20
Expressible, named, and
stored relations….
Problem
Determining the class of stored representations to
be supported.
Too great a variety of permitted data representations leads to
unnecessary storage overhead and continual
reinterpretation of descriptions for the structures
currently in effect.
21
Redundancy and Consistency
Redundancy
Data redundancy is the unnecessary duplication of data
within the database, arising from how the data is organized
Consistency
Consistency means that only valid data is written to the
database
22
Operations on Relations
Since relations are sets, all of the usual set
operations are applicable to them
The operations below can be used to address the problems of
redundancy and consistency
Permutation
Projection
Join
Composition
Restriction
23
Redundancy and Consistency
Operations on Relations…..
Permutations
A permutation of a set of values is an arrangement of
those values into a particular order
Applying a permutation to the columns of an n-ary relation yields a
relation that is said to be a permutation of the given relation
Example :
A relation with 3 columns has 3! = 6 column orderings:
[1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2] and [3,2,1]
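
As a small illustration, the six orderings can be enumerated with Python's `itertools.permutations`, and one of them applied to the columns of a relation. The `permute` helper and the sample rows are assumptions made for this sketch.

```python
from itertools import permutations

# All orderings of three column positions: 3! = 6 of them.
column_orders = list(permutations([1, 2, 3]))
print(len(column_orders))  # 6


def permute(relation, order):
    """Apply a column permutation (1-based positions) to every row."""
    return {tuple(row[i - 1] for i in order) for row in relation}


# Illustrative relation of degree 3, modeled as a set of tuples.
component = {(1, 5, 9), (2, 5, 7), (3, 5, 2)}

# Interchange the first two columns.
swapped = permute(component, (2, 1, 3))
print(sorted(swapped))  # [(5, 1, 9), (5, 2, 7), (5, 3, 2)]
```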
24
Operations on Relations Cont’d...
Projection
Select certain columns of a relation, strike out the others, and then
remove any duplicate rows from the resulting array
The final array represents a relation which is said to be a
projection of the given relation

Person
Name    Age  Weight
Harry   34   80
Sally   28   64
George  29   70
Helena  54   54
Peter   34   80

πAge,Weight(Person)
Age  Weight
34   80
28   64
29   70
54   54
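
The Person example can be reproduced with a short Python sketch. The `project` function and the use of zero-based column positions are assumptions for illustration. Building a set removes the duplicate (34, 80) row automatically.

```python
# Person (Name, Age, Weight), using the rows from the slide.
person = {
    ("Harry", 34, 80),
    ("Sally", 28, 64),
    ("George", 29, 70),
    ("Helena", 54, 54),
    ("Peter", 34, 80),
}


def project(relation, columns):
    """Keep the given column positions (0-based) and drop duplicate rows."""
    return {tuple(row[i] for i in columns) for row in relation}


age_weight = project(person, (1, 2))
print(sorted(age_weight))  # [(28, 64), (29, 70), (34, 80), (54, 54)]
```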
25
Operations on Relations Cont’d…
Join
Combines two relations which have some domain in common
into a single relation that preserves the information in both
The result of the natural join is the set of all
combinations of tuples in R and S that are equal
on their common attribute names
26
Example : Natural Join

Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales

Dept
DeptName    Manager
Finance     George
Sales       Harriet
Production  Charles

Employee join Dept
Name     EmpId  DeptName  Manager
Harry    3415   Finance   George
Sally    2241   Sales     Harriet
George   3401   Finance   George
Harriet  2202   Sales     Harriet
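
A minimal Python sketch of the natural join on this example. Representing rows as dictionaries keyed by attribute name, and the `natural_join` function itself, are assumptions made for illustration.

```python
employee = [
    {"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
    {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
    {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
    {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"},
]
dept = [
    {"DeptName": "Finance", "Manager": "George"},
    {"DeptName": "Sales", "Manager": "Harriet"},
    {"DeptName": "Production", "Manager": "Charles"},
]


def natural_join(r_rows, s_rows):
    """Combine every pair of rows that agree on all shared attribute names."""
    joined = []
    for r in r_rows:
        for s in s_rows:
            common = r.keys() & s.keys()
            if all(r[a] == s[a] for a in common):
                joined.append({**r, **s})
    return joined


result = natural_join(employee, dept)
for row in result:
    print(row["Name"], row["EmpId"], row["DeptName"], row["Manager"])
```

The Production row of Dept has no matching employee, so it does not appear in the result.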
27
Operations on Relations cont’d…
Composition
• The composition of binary relations forms a new relation R·S
from two given relations R and S: join R with S on their
common domain, then project that domain away
Thus, two relations are composable if and only if they
are joinable
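
A minimal sketch of composition as a join followed by a projection, for binary relations modeled as sets of pairs. The relation names, the sample rows, and both helper functions are illustrative assumptions.

```python
def natural_join(r, s):
    """Join R(a, b) with S(b, c) on the shared middle domain."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def compose(r, s):
    """Composition: project the join onto its first and third columns."""
    return {(a, c) for (a, _, c) in natural_join(r, s)}


# Hypothetical data: supplier supplies part, and part is used in project.
supplies = {(1, "bolt"), (2, "bolt"), (2, "nut")}
used_in = {("bolt", "engine"), ("nut", "frame")}

print(sorted(compose(supplies, used_in)))
# [(1, 'engine'), (2, 'engine'), (2, 'frame')]
```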
28
Restriction
A subset of a relation is a relation.
Restriction is one way in which a relation S may act on a relation R
to generate a subset of R.
29
Redundancy
Data redundancy is the unnecessary duplication of
data, arising from how the data is organized
Strong Redundancy
Weak Redundancy
30
Strong Redundancy
A set of relations is strongly redundant if it contains
at least one relation that possesses a
projection which is derivable from other
projections of relations in the set
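
A minimal sketch of a strong-redundancy check, assuming a hypothetical relation employee (serial#, name, manager#, managername) in which every manager is also an employee. The rows are made-up toy data. The (manager#, managername) projection is rebuilt from two other projections of the same relation and compared with the stored one.

```python
def project(relation, columns):
    """Keep the given column positions (0-based) and drop duplicate rows."""
    return {tuple(row[i] for i in columns) for row in relation}


# Toy rows; in this data the top manager is recorded as managing herself.
employee = {
    (1, "Ann", 3, "Cara"),
    (2, "Bob", 3, "Cara"),
    (3, "Cara", 3, "Cara"),
}

names = project(employee, (0, 1))       # (serial#, name)
reports_to = project(employee, (0, 2))  # (serial#, manager#)

# Derive (manager#, managername) from the two projections above.
name_of = dict(names)
derived = {(manager, name_of[manager]) for (_, manager) in reports_to}

stored = project(employee, (2, 3))      # (manager#, managername)
print(derived == stored)  # True: the managername column is redundant here
```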
31
Weak Redundancy
A collection of relations is weakly redundant if it
contains a relation that has a projection
which is not derivable from other members but is
at all times a projection of some join of other
projections of relations in the collection
32
Consistency
Consistency means that only valid data is
written to the database
33
Consistency
Ways to detect and respond to
inconsistencies
Approach 1: check for possible inconsistency whenever an insertion,
deletion, or key update occurs
Problem:- Naturally, such checking will slow these
operations down
If an inconsistency has been generated, details are logged internally
If it is not remedied within a reasonable time, either the user or someone
responsible for the security and integrity of the data is notified
Approach 2: conduct consistency checking as a batch operation once a day
or less frequently
Inputs which cause inconsistencies can be tracked down if the system keeps
a journal of all state-changing transactions
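
The check-on-insertion approach can be sketched as follows. The `DataBank` class, the constraint function, and the sample rows are hypothetical names and data chosen for illustration. The sketch only logs inconsistencies and leaves out notification and batch checking.

```python
def one_name_per_manager(rows):
    """Constraint statement: each manager# is paired with one managername."""
    managers = {m for (_, _, m, _) in rows}
    pairs = {(m, mname) for (_, _, m, mname) in rows}
    return len(managers) == len(pairs)


class DataBank:
    """Toy store that checks every constraint after each insertion."""

    def __init__(self, constraints):
        self.rows = set()
        self.constraints = constraints  # list of (description, predicate)
        self.log = []                   # internal log of inconsistencies

    def insert(self, row):
        self.rows.add(row)
        for description, holds in self.constraints:
            if not holds(self.rows):
                # Record which input produced the inconsistency.
                self.log.append((row, description))


bank = DataBank([("one name per manager#", one_name_per_manager)])
bank.insert((1, "Ann", 3, "Cara"))
bank.insert((2, "Bob", 3, "Carla"))  # conflicts with the first row
print(bank.log)
```

Checking on every insertion is what slows these operations down; a batch variant would run the same predicates once over `bank.rows` at a scheduled time.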
34
Thank You!
