
Relational Database Design by ER- and EER-to-Relational Mapping

The Extended (Enhanced) Entity-Relationship (EER) Model


ER modeling was developed in the databases “subculture”. As the uses of databases broadened and people saw a need for more
sophisticated modeling, it was extended to incorporate techniques from other areas like object-oriented software design and AI knowledge
representation.
Subclasses, Superclasses, and Inheritance
An entity that's a member of a subclass/subtype is also a member of its superclass/supertype. It inherits all the attributes and relationships
of its superclass. In general a subclass may have multiple superclasses, though some designs may want to restrict that to single-inheritance.
Similarly, in general an entity instance may belong to more than one entity type, but some designs restrict that to single-typing.

Specialization and Generalization


These terms refer to the two directions in which sub/supertyping is used in the design process:
Specialization:
Moving down—defining subclasses of an existing entity type.
Generalization:
Moving up—defining a common superclass to unify common
attributes/relationships of existing entity types.
Constraints and Hierarchies
In a predicate-defined (or condition-defined) subclass, the subclass membership of an entity can be determined from its attribute value(s)
in the superclass. For an example, see the EER version of the Person-Employee-Client example given earlier. A specialization group may
have a single common defining attribute, making it attribute-defined. Otherwise it's user-defined.

Then a specialization group must be identified as either disjoint or overlapping, symbolized by ⓓ or ⓞ specialization nodes. It may also be
either a total specialization (double line connecting the superclass) or a partial specialization (single line connecting the superclass).
Hierarchies and Lattices:
If limited to single inheritance, the result is a specialization hierarchy and has a tree topology. Otherwise, in general it forms a
specialization lattice with DAG topology. An entity type with more than one superclass is called a shared subclass. A shared subclass
inherits attributes from its superclasses only once, just like in most OO languages.
Modeling Union Types Using Categories
With multiple inheritance the shared subclass
inherits all the attributes of its superclasses. In
the left image above, an entity in D is also in A,
B, and C. What if you want a “shared subclass” to belong to just one of its superclasses, inheriting only some of the attributes? The right
image above shows a union type representing this scenario. This union type is partial, shown with a single line, meaning an entity may be a
member of type A without also having type D. If the union type were total, shown with a double line between D and the Ⓤ, every entity in
one of the superclasses would necessarily also be in D.
A Sample University EER Schema

Example of Other Notation (UML)

The ▲ represents an overlapping specialization group, equivalent to ⓞ, while the △ represents a disjoint one, equivalent to ⓓ. Recall that ◇
means composition and is the typical way to represent multivalued attributes in UML. Not shown is total vs. partial specialization.
Data Abstraction, Knowledge Representation, and Ontology Concepts
Modeling a domain of knowledge (miniworld) using an ontology (schema). Generally broader and more sophisticated/complicated.
Inference engines can derive new facts from existing ones to answer more types of queries.
 • Classification and Instantiation
 • Identification: distinguishing between different entities, and recognizing multiple pieces of information about one entity.
 • Specialization and Generalization
 • Aggregation and Association
Examples:
 • Cyc, OpenCyc
 • Semantic Web
 • DBpedia
 • Freebase: an ER model, on a graph model, on a triplestore. Used in Google's Knowledge Graph.
Having designed a database using an ER or EER model, let's automatically convert it to relational form as a first draft implementation
model.

Mapping ER to Relational
The Algorithm
Here we illustrate the 7-step procedure for deriving a relational database schema from an ER schema diagram (as described in Section 9.1
of E&N) by applying it to the COMPANY ER schema obtained in Chapter 7. (It is assumed that the reader is familiar with the relational
data model, the main concepts of which include relation, tuple, and attribute, or their loose equivalents table, row, and column,
respectively.)
Step 1: Regular Entity Types
Create an entity relation for each strong entity
type. Include all single-valued attributes.
Flatten composite attributes. Keys become
secondary keys, except for the one chosen to
be the primary key.
For each regular (i.e., strong, non-weak) entity
type E in the ER schema, create a relation E.
For each single-valued attribute A of E having
no multi-valued sub-attributes, make all its
atomic sub-attributes (including only A itself, if it is atomic) be attributes of E.
Choose as the primary key of E the collection of attributes arising from some key of E.
In the COMPANY example (using Figure 7.2 (3.2 in older editions)), this step leads us to the following (partial) relational schema:
 • EMPLOYEE(SSN, Fname, Minit, Lname, Bdate, Sex, Address, Salary)
 • DEPARTMENT(Number, Name)
 • PROJECT(Number, Name, Location)
Notice that the composite Name attribute of the EMPLOYEE entity type gives rise to three attributes in the relation EMPLOYEE,
corresponding to its three atomic sub-attributes.
We arbitrarily chose Number as the primary key for both DEPARTMENT and PROJECT, although we could have chosen Name for either
one. (According to Figure 7.2 (3.2 in older editions), both attributes (individually, that is) are keys of the corresponding entity types.)
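As a rough illustration of Step 1, the three relations above could be declared as follows. This is only a sketch using Python's built-in sqlite3 module; the column types and the use of UNIQUE for the non-chosen key are assumptions for illustration, not something the ER schema prescribes.

```python
import sqlite3

# Step 1 sketch: one table per strong entity type of COMPANY.
# Column types and UNIQUE constraints are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    SSN     TEXT PRIMARY KEY,
    Fname   TEXT,               -- composite Name flattened into
    Minit   TEXT,               -- its three atomic sub-attributes
    Lname   TEXT,
    Bdate   TEXT,
    Sex     TEXT,
    Address TEXT,
    Salary  REAL
);
CREATE TABLE DEPARTMENT (
    Number INTEGER PRIMARY KEY, -- key chosen as primary key
    Name   TEXT UNIQUE          -- other key kept as a secondary (unique) key
);
CREATE TABLE PROJECT (
    Number   INTEGER PRIMARY KEY,
    Name     TEXT UNIQUE,
    Location TEXT
);
""")

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Choosing Name instead of Number as the primary key would simply swap the PRIMARY KEY and UNIQUE declarations.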
Step 2: Weak Entity Types
Also create an entity relation for each weak entity type, similarly including its (flattened) single-valued attributes. In addition, add
the primary key of each owner entity type as a foreign key attribute here. Possibly make this foreign key CASCADE.
For each weak entity type W in the ER schema, create a relation W. As in Step 1, for each single-valued attribute of W having no multi-
valued sub-attributes, make all its atomic sub-attributes be attributes of W.
Let E, with corresponding relation E, be an owner entity type of W. Add as (foreign key) attributes of W the attributes forming the primary
key of E. (These attributes can be renamed, if desired.)
The primary key of W includes all the foreign key attributes added as a result of the previous paragraph, plus any partial key attributes of
weak entity type W.
For the COMPANY example, this step results in the introduction of relation DEPENDENT, which includes as attributes all those associated
with weak entity type DEPENDENT in Figure 7.2 (3.2 in older editions), plus the primary key SSN (renamed ESSN) of the relation
EMPLOYEE (which arose from owner entity type EMPLOYEE). The primary key of DEPENDENT is the partial key Name in
combination with the foreign key, ESSN.
We add the following to our relational schema:
 • DEPENDENT(ESSN, Name, Sex, Birthdate, Relationship)
Note: It is possible for a weak entity type W1 to have an owner entity type W2 that is itself weak. In that case, the introduction of relation W2
should occur before that of W1 in order to ensure that the attributes forming the primary key of W1 can be ascertained when it is introduced.
End of note.
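The Step 2 mapping of DEPENDENT, including the suggested CASCADE behavior, might look like the sketch below. The column types and the sample rows are made up for illustration, and EMPLOYEE is reduced to two columns to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, Lname TEXT);
CREATE TABLE DEPENDENT (
    ESSN         TEXT NOT NULL REFERENCES EMPLOYEE(SSN) ON DELETE CASCADE,
    Name         TEXT NOT NULL,      -- partial key of the weak entity type
    Sex          TEXT,
    Birthdate    TEXT,
    Relationship TEXT,
    PRIMARY KEY (ESSN, Name)         -- owner's key plus partial key
);
-- Illustrative sample data (not from the source).
INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith');
INSERT INTO DEPENDENT VALUES ('123456789', 'Alice', 'F', '1988-12-30', 'Daughter');
INSERT INTO DEPENDENT VALUES ('123456789', 'Michael', 'M', '1983-01-04', 'Son');
""")

# Deleting the owner removes its dependents because of ON DELETE CASCADE.
conn.execute("DELETE FROM EMPLOYEE WHERE SSN = '123456789'")
remaining = conn.execute("SELECT COUNT(*) FROM DEPENDENT").fetchone()[0]
print(remaining)
```

The cascade reflects the existence dependency of a weak entity on its owner: a DEPENDENT row has no meaning once its EMPLOYEE row is gone.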
Step 3: Binary 1:1 Relationship Types
For each binary 1:1 relationship type R with participating entity types S and T, choose one of the corresponding relations, say S, and include
as a foreign key in S the attribute(s) forming the primary key of T. (Note: If possible, the role of S should be played by an entity type
constrained to participate totally in R.) Include as attributes of S, in addition, all atomic sub-attributes of any attributes of R not involving
multi-valuedness.
With respect to COMPANY, consider the MANAGES relationship, which is 1:1. Among its participating entity types, EMPLOYEE and
DEPARTMENT, we choose the latter to play the role of S because it (and not the former) is constrained to participate totally. (Recall that
every department must have a manager; not every employee must be a manager, however.) Following the guidelines, we add foreign key
attribute MgrSSN (corresponding to primary key SSN of EMPLOYEE) to relation DEPARTMENT. We also add attribute MgrStartDate,
corresponding to the StartDate attribute of MANAGES.
Updating our relational schema, we get
 • DEPARTMENT(Name, Number, MgrSSN, MgrStartDate)
In the case that both participating entity types are constrained to participate totally in R, a viable alternative approach is to map the two entity
types, plus the relationship, into a single relation. This relation would include all the atomic sub-attributes of all attributes of S, T, and R not
involving multi-valuedness.
Let the relationship be of the form [S]——<R>——[T].

1. Foreign key approach: The primary key of T is added as a foreign key in S. Attributes of R are moved to S (possibly
renaming them for clarity). If only one of the entities has total participation it's better to call it S, to avoid null attributes. If
neither entity has total participation nulls may be unavoidable. This is the preferred approach in typical cases.
2. Merged relation approach: Both entity types are stored in the same relational table, “pre-joined”. If the relationship is not
total both ways, there will be null padding on tuples that represent just one entity type. Any attributes of R are also moved
to this table.
3. Cross-reference approach: A separate relation represents R; each tuple is a foreign key from S and a foreign key from T.
Any attributes of R are also added to this relation. Here foreign keys should CASCADE.
Approach          Join cost   Null-storage cost
Foreign key       1           low to moderate
Merged relation   0           very high, unless both are total
Cross-reference   2           none
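The foreign key approach for MANAGES can be sketched as follows. The NOT NULL constraint (for DEPARTMENT's total participation) and the UNIQUE constraint (so that an employee manages at most one department) are assumptions about how one might enforce the 1:1 cardinality in SQL; the helper try_insert and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, Lname TEXT);
CREATE TABLE DEPARTMENT (
    Number       INTEGER PRIMARY KEY,
    Name         TEXT UNIQUE,
    MgrSSN       TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE(SSN),  -- MANAGES
    MgrStartDate TEXT                                            -- attribute of MANAGES
);
INSERT INTO EMPLOYEE VALUES ('111', 'Wong'), ('222', 'Wallace');
INSERT INTO DEPARTMENT VALUES (5, 'Research', '111', '1988-05-22');
""")

def try_insert(sql):
    """Return True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Employee '111' already manages a department, so this violates UNIQUE(MgrSSN).
second_dept_same_mgr = try_insert(
    "INSERT INTO DEPARTMENT VALUES (4, 'Administration', '111', '1995-01-01')")
# A different manager is accepted.
dept_other_mgr = try_insert(
    "INSERT INTO DEPARTMENT VALUES (4, 'Administration', '222', '1995-01-01')")
print(second_dept_same_mgr, dept_other_mgr)
```

Placing the foreign key in DEPARTMENT rather than EMPLOYEE avoids nulls, since every department has a manager but most employees manage nothing.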
Step 4: Binary 1:N Relationship Types
1:N and N:1 Relationships
For each regular (i.e., non-identifying) binary N:1 relationship type R involving entity types S and T (where S is at the N-side), include as a
foreign key in S the attribute(s) forming the primary key of T. (This makes sense, as each entity in S's entity set will participate in at most
one instance of R.)
Include as attributes of S, in addition, any atomic sub-attributes of attributes of R not involving multi-valuedness.
For (regular) 1:N relationships, do the same thing, reversing the roles of S and T.
In the COMPANY example, this guideline leads us to augment our relational schema to obtain the following:
 • EMPLOYEE(SSN, Fname, Minit, Lname, Bdate, Sex, Address, Salary, DeptNum, SuperSSN)
 • PROJECT(Name, Number, Location, CntrlDeptNum)
Let the relationship be of the form [S]——N<R>1——[T]. The primary key of T is added as a foreign key in S. Attributes of R are
moved to S. This is the foreign key approach. The merged relation approach is not possible for 1:N relationships. (Why?) The
cross-reference approach might be used if avoiding null storage is worth the join cost.
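A small sketch of the Step 4 foreign keys on EMPLOYEE follows. The relationship names in the comments (WORKS_FOR, SUPERVISION) are assumed from the usual COMPANY schema, and the table is trimmed to the columns needed for the example; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Number INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE EMPLOYEE (
    SSN      TEXT PRIMARY KEY,
    Lname    TEXT,
    DeptNum  INTEGER REFERENCES DEPARTMENT(Number),  -- N side holds the key (WORKS_FOR, assumed name)
    SuperSSN TEXT REFERENCES EMPLOYEE(SSN)           -- self-reference (SUPERVISION, assumed name)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research');
INSERT INTO EMPLOYEE VALUES ('111', 'Wong', 5, NULL);
INSERT INTO EMPLOYEE VALUES ('222', 'Smith', 5, '111');
""")

# One join per foreign key recovers the related department and supervisor.
rows = conn.execute("""
    SELECT e.Lname, d.Name, s.Lname
    FROM EMPLOYEE e
    JOIN DEPARTMENT d ON d.Number = e.DeptNum
    LEFT JOIN EMPLOYEE s ON s.SSN = e.SuperSSN
    ORDER BY e.SSN
""").fetchall()
print(rows)
```

The NULL in SuperSSN for the first row is the null-storage cost that the cross-reference approach would avoid at the price of an extra join.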
Step 5: Binary M:N Relationship Types
For each M:N relationship type R, create a new relation R to represent R. Include as foreign key attributes in R the primary keys of the
relations corresponding to the participating entity types. Together those attributes will comprise the primary key of R.
Include as attributes of R, in addition, all atomic sub-attributes of attributes of R not involving multi-valuedness.
Note that, unlike 1:1, 1:N, and N:1 relationships, we cannot represent an M:N relationship by including a foreign key (plus any attributes of
the relationship) in a relation corresponding to one of the two participating entity types.
In the COMPANY example, the M:N relationship WORKS_ON is mapped into a relation WORKS_ON with foreign key attributes PNO
(referring to the number of the participating project instance) and ESSN (referring to the SSN of the participating employee instance). These
two attributes together form the primary key of WORKS_ON. The attribute Hours (taken from the WORKS_ON relationship) is included as
an attribute of WORKS_ON, too.
Note that the approach described here for mapping M:N relationships from the ER to the relational model could be applied (as an alternative
to what was described in Steps 3 and 4) to 1:1, 1:N, and N:1 relationships. Doing so is recommended for relationships that are expected to
have few instances (relative to the number of instances of the relation into which a foreign key attribute would be inserted, following the
guidelines of Step 3 or 4) in order to avoid lots of null values in foreign key attributes.
Following this alternative approach, the primary key of the "relationship relation" will be the foreign key attribute(s) that refers to the
participating entity type at the N-side of the relationship. (The foreign key attribute referring to the entity type at the 1-side would not be part
of the primary key.)
For example, suppose that we wanted to keep track of which employees were married to one another. Then we would have a binary 1:1
relationship MARRIED_TO with participating entity types EMPLOYEE and EMPLOYEE. Following the guidelines of Step 3, we would
include an attribute Spouse, say, in the relation EMPLOYEE. But if 95% of our employees are not part of a married couple of employees,
95% of the instances of EMPLOYEE will have the value null in the Spouse attribute. In this case, it would probably make more sense to
introduce, in accord with the guidelines of Step 5, a relation MARRIED_TO instead.
For M:N relationships, the cross-reference approach (also called a relationship relation) is the only possible way.
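The WORKS_ON relationship relation from Step 5 might be declared as in this sketch. The ON DELETE CASCADE clauses, column types, and sample rows are assumptions for illustration; EMPLOYEE and PROJECT are reduced to their keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY);
CREATE TABLE PROJECT (Number INTEGER PRIMARY KEY);
CREATE TABLE WORKS_ON (
    ESSN  TEXT    REFERENCES EMPLOYEE(SSN)   ON DELETE CASCADE,
    PNO   INTEGER REFERENCES PROJECT(Number) ON DELETE CASCADE,
    Hours REAL,                    -- attribute of the relationship
    PRIMARY KEY (ESSN, PNO)        -- both foreign keys together
);
-- Illustrative sample data (not from the source).
INSERT INTO EMPLOYEE VALUES ('111'), ('222');
INSERT INTO PROJECT VALUES (1), (2);
INSERT INTO WORKS_ON VALUES ('111', 1, 32.5), ('111', 2, 7.5), ('222', 1, 20.0);
""")

# An employee may appear with many projects and a project with many employees.
hours_per_project = conn.execute(
    "SELECT PNO, SUM(Hours) FROM WORKS_ON GROUP BY PNO ORDER BY PNO").fetchall()
print(hours_per_project)
```

The composite primary key allows each (employee, project) pair at most once, which is exactly one instance of the relationship.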
Step 6: Multivalued Attributes
Let an entity S have multivalued attribute A. Create a new relation R representing the attribute, with a foreign key into S added. The
primary key of R is the combination of the foreign key and A. Once again this relation is dependent on an “owner relation” so its
foreign key should CASCADE.
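As a sketch of Step 6, suppose DEPARTMENT has a multivalued attribute Location. That attribute, the relation name DEPT_LOCATIONS, and the sample rows are assumptions used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Number INTEGER PRIMARY KEY, Name TEXT);
-- New relation R for the multivalued attribute A (here: Location).
CREATE TABLE DEPT_LOCATIONS (
    DeptNum  INTEGER REFERENCES DEPARTMENT(Number) ON DELETE CASCADE,
    Location TEXT,
    PRIMARY KEY (DeptNum, Location)   -- foreign key combined with A
);
INSERT INTO DEPARTMENT VALUES (5, 'Research');
INSERT INTO DEPT_LOCATIONS VALUES (5, 'Bellaire'), (5, 'Houston');
""")

# Each value of the multivalued attribute is one row.
locations = [r[0] for r in conn.execute(
    "SELECT Location FROM DEPT_LOCATIONS WHERE DeptNum = 5 ORDER BY Location")]

# Removing the owner row removes its attribute values as well.
conn.execute("DELETE FROM DEPARTMENT WHERE Number = 5")
left_over = conn.execute("SELECT COUNT(*) FROM DEPT_LOCATIONS").fetchone()[0]
print(locations, left_over)
```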
Step 7: Higher-Arity Relationship Types

Here again, use the cross-reference approach. For each n-ary relationship create a relation to represent it. Add a foreign key into each
participating entity type. Also add any attributes of the relationship. The primary key of this relation is the combination of all foreign
keys into participating entity types that do not have a max cardinality of 1.
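A ternary relationship could be mapped as in the sketch below. The SUPPLY relationship among SUPPLIER, PART, and PROJECT is a hypothetical example, and it assumes that none of the three participants has a max cardinality of 1, so all three foreign keys enter the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SUPPLIER (Sname TEXT PRIMARY KEY);
CREATE TABLE PART (PartNo INTEGER PRIMARY KEY);
CREATE TABLE PROJECT (ProjName TEXT PRIMARY KEY);
-- Hypothetical ternary relationship SUPPLY.
CREATE TABLE SUPPLY (
    Sname    TEXT    REFERENCES SUPPLIER(Sname),
    PartNo   INTEGER REFERENCES PART(PartNo),
    ProjName TEXT    REFERENCES PROJECT(ProjName),
    Quantity INTEGER,                        -- attribute of the relationship
    PRIMARY KEY (Sname, PartNo, ProjName)    -- all participants are "many"
);
""")

# Column 5 of PRAGMA table_info is the column's position in the primary key (0 if none).
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(SUPPLY)") if row[5] > 0]
print(pk_columns)
```

If one participant did have a max cardinality of 1, its foreign key would be left out of the PRIMARY KEY clause, following the rule above.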
Summary
ER Model                        Relational Model
Entity type                     Entity relation
1:1 and 1:N relationship type   Foreign key or relationship relation
M:N relationship type           Relationship relation and two foreign keys
n-ary relationship type         Relationship relation and n foreign keys
Simple attribute                Attribute
Composite attribute             Set of component attributes
Multivalued attribute           Relation and foreign key
Value set                       Domain
Key attribute                   Primary key or secondary key
Mapping EER to Relational
The EER model adds three new complications to consider: how to represent a specialization group (generalization group), how to handle
shared subclasses, and how to represent union types.
Specialization/Generalization
Step 8: Specialization/Generalization
Use Attr(R) to mean the attributes of relation R, and PK(R) to mean the primary key of R. Let the subclasses be
{S₁,S₂,…,Sₘ} and the superclass be C, and let Attr(C) = {k,a₁,…,aₙ} and PK(C) = k.

A. Multiple relations: superclass and subclasses. Create a relation to represent C, having its attributes and primary key.
Also create relations for each Sᵢ, having attributes Attr(Sᵢ) ∪ {k}. In these relations k is both the primary key and a
foreign key into C.
B. Multiple relations: subclass relations only. Only create relations for each Sᵢ. The attributes of each of these relations
will be Attr(Sᵢ) ∪ {k} ∪ Attr(C). The primary key will be k.

C. Single relation with one type attribute. Create one relation representing the entire specialization set. Its attributes are
Attr(C) ∪ {t} ∪ Attr(S₁) ∪ … ∪ Attr(Sₘ). Attribute t is a type attribute (discriminator) that identifies which entity
subtype an entity belongs to. If the specialization is already attribute-defined it uses that as t, otherwise t is a new
attribute.
D. Single relation with multiple type attributes. Proceed as in the previous approach, except instead of one t create m type
attributes t₁,…,tₘ, each one a Boolean indicating whether a tuple is a member of its associated subtype. Together the tᵢ form a
bitmap of the entity's type.
Approach                        Preconditions          Considerations             Tradeoffs
A (super and sub relations)     None                                              Most flexible but incurs a join cost
B (sub relations)               Specialization must    Specialization should      In order to query all entities of the supertype
                                be total               be disjoint                one must OUTER UNION the subtype relations
                                                                                  (and project to Attr(C), technically)
C (one relation, one t)         Specialization must    If one table is            Potential for a large NULL cost
                                be disjoint            desirable
D (one relation, multiple t's)  None                   One table is desired but   Also very flexible but incurs a NULL cost
                                                       specialization is
                                                       overlapping
Note this step is applied to each specialization group independently. In a hierarchy or lattice, each application of the step is free to be any of
the four approaches.

Person/{Employee,Alumnus,Student}                           8A
Employee/{Staff,Faculty,Student_Assistant}                  8C
Student_Assistant/{Research_Assistant,Teaching_Assistant}   8D
Student/Student_Assistant                                   8D
Student/{Graduate_Student,Undergraduate_Student}            8D
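To make two of the Step 8 approaches concrete, the sketch below maps one specialization with approach A and another with approach C. Only the entity type names and Employee_type come from the text; the other attribute names (Name, Major, Salary, Staff_position, Faculty_rank) and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Approach A: a relation for superclass C plus one relation per subclass.
CREATE TABLE PERSON (SSN TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE STUDENT (
    SSN   TEXT PRIMARY KEY REFERENCES PERSON(SSN) ON DELETE CASCADE,  -- k: PK and FK
    Major TEXT
);
-- Approach C: a single relation with one type attribute t (Employee_type).
CREATE TABLE EMPLOYEE (
    SSN            TEXT PRIMARY KEY,
    Salary         REAL,
    Employee_type  TEXT CHECK (Employee_type IN ('Staff', 'Faculty', 'Student_Assistant')),
    Staff_position TEXT,   -- meaningful for Staff only, NULL otherwise
    Faculty_rank   TEXT    -- meaningful for Faculty only, NULL otherwise
);
INSERT INTO PERSON VALUES ('111', 'Ann'), ('222', 'Bob');
INSERT INTO STUDENT VALUES ('111', 'CS');
INSERT INTO EMPLOYEE VALUES ('333', 50000, 'Staff', 'Clerk', NULL);
""")

# Approach A: inherited attributes are recovered with a join (the join cost).
students = conn.execute(
    "SELECT p.SSN, p.Name, s.Major FROM STUDENT s JOIN PERSON p ON p.SSN = s.SSN"
).fetchall()

# Approach C: no join, but subclass-specific columns are NULL for other subtypes.
staff = conn.execute(
    "SELECT SSN, Staff_position FROM EMPLOYEE WHERE Employee_type = 'Staff'"
).fetchall()
print(students, staff)
```

Approach D would replace Employee_type with one Boolean column per subclass, which allows a row to belong to several subclasses at once.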
Shared Subclasses (Multiple Inheritance)
All classes must have the same key. Then, any of the four approaches can work, but an approach must be performed for each superclass of
the shared subclass (one for each specialization group). They can be different. In the previous example, Employee →
Student_Assistant used approach 8C, adding attribute Employee_type to the Employee relation. But Student →
Student_Assistant used approach 8D, adding Boolean attribute Student_assist_flag to the Student relation. (The
attribute of Student_Assistant, Percent_time, could have gone to either one of the relations, so long as it went to only one of
them.)
Union Types (Categories)
Step 9: Union Types
Let C₁,C₂,…,Cₘ be the entity types participating in the union and S be the union type. Create a relation for S. If the primary keys
of the Cᵢ relations differ, create a surrogate key kₛ so that PK(S)=kₛ, and also add kₛ to each Attr(Cᵢ) as a foreign key into S.
If all the Cᵢ have the same primary key type, use that as PK(S) instead.
By the way, the book mentions it may be a good idea to add a type discriminator to S, to indicate which entity type a tuple in the union
type belongs to. Why?
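The surrogate key case of Step 9 might look like the following sketch. The union type OWNER over PERSON and COMPANY, the column names, and the rows are all hypothetical; the UNIQUE constraints are an added assumption so that each OWNER row corresponds to at most one entity per superclass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Relation for the union type S, with a surrogate key because the
-- participating types have different primary keys.
CREATE TABLE OWNER (Owner_id INTEGER PRIMARY KEY);
CREATE TABLE PERSON (
    SSN      TEXT PRIMARY KEY,
    Owner_id INTEGER UNIQUE REFERENCES OWNER(Owner_id)  -- NULL if this person is not an owner
);
CREATE TABLE COMPANY (
    Cname    TEXT PRIMARY KEY,
    Owner_id INTEGER UNIQUE REFERENCES OWNER(Owner_id)
);
INSERT INTO OWNER VALUES (1), (2);
INSERT INTO PERSON VALUES ('111', 1), ('222', NULL);
INSERT INTO COMPANY VALUES ('Acme', 2);
""")

# Finding the entity behind each owner requires probing every participating relation.
owners = conn.execute("""
    SELECT o.Owner_id, COALESCE(p.SSN, c.Cname)
    FROM OWNER o
    LEFT JOIN PERSON p ON p.Owner_id = o.Owner_id
    LEFT JOIN COMPANY c ON c.Owner_id = o.Owner_id
    ORDER BY o.Owner_id
""").fetchall()
print(owners)
```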
Your Questions
 • “The "of" relationship should be made a relationship relation”: It's the defining relationship of a weak entity, so it shouldn't.
 • “Prop should not be part of the primary key in Prop_Appears_In”: It should, as per step 6.
 • “It seems like primary keys are arbitrary”: For entities it's one of the candidate keys, your choice. For others, it's specified as part
of the mapping process.
 • Do all mappings of a binary 1:1 relationship type require total participation? No, actually none of them require it but the
merged relation approach is pretty bad unless it's total both ways.
 • What's the advantage of ER diagrams over a listing of table schemas? Depending on size/complexity, perhaps none. But the
biggest advantage is in distinguishing between entities and relationships, both appearing as relations in the relational model. Then
there's also things like structured attribute types, multivalued attributes, and cardinality constraints.

 • Does having excess or useless information make it incorrect? No, not necessarily.
 • Do too many unneeded relationships impact the quality of a model? Yes.

ER and EER Practice

1. Regular entity types
2. Weak entity types
3. Binary 1:1 relationship types
4. Binary 1:N relationship types
5. Binary M:N relationship types
6. Multivalued attributes
7. Higher-arity relationship types

1. Regular entity types
2. Weak entity types
3. Binary 1:1 relationship types
4. Binary 1:N relationship types
5. Binary M:N relationship types
6. Multivalued attributes
7. Higher-arity relationship types
8. Specialization/Generalization
9. Union types

The Relational Model


Concepts
Table, Row, Column Header, Column Type :: Relation, Tuple, Attribute, Domain

domain
specialized data type, set of atomic values (indivisible to relational model)
relation schema
R(A₁,A₂,…,Aₙ), relation name and list of attributes (each of which has a name and a domain)
attribute
the name of a role played by a domain in that relation
relation (relation state)
r(R), a set of tuples. Each tuple is an ordered list of values, corresponding with the domain of their attribute, and representing a
fact. A relation state is a subset of the Cartesian product of the domains defining the relation schema.
characteristics of relations
 • a relation is a set – tuples are not in any order, and have no duplicates
 • flat relational model – values are atomic, not structures or lists
 • NULL (ω) – “information missing”, “not applicable”; ambiguous semantics, not a member of any domain
 • semantics of a relation – each tuple is an assertion of a true fact; sometimes confusing that this could be about an entity
or a relationship; the schema can be interpreted as a predicate, each tuple is a list of values for which the predicate is
true.
Notation
 • Q,R,S: relation names
 • R(A₁…): relation schema (intension of the relation)
 • q,r,s: relation states (extension of the relation)
 • t,u,v: tuples
 • a relation name by itself refers to the current relation state, not the relation schema
 • dot notation can qualify an attribute name (like Q.a versus R.a) but all attributes in a schema must be uniquely named
 • t=<v₁,v₂,…,vₙ>: a tuple with its component values
 • t[Aᵢ], t.Aᵢ, t[i]: all ways of referring to the i-th value of the tuple t, which corresponds with the attribute Aᵢ
 • t[Au,Aw,…,Az] (and similar): a subtuple formed from the values corresponding with the listed attributes
Constraints and schemas
Inherent model-based constraints or implicit constraints – already discussed. Schema-based constraints or explicit constraints – expressed
by the schema, written in the DDL, enforced by the DBMS. Application-based, semantic constraints, or business rules – other than the
above; typically enforced by the application.

domain constraints
within a tuple the value of each attribute must be an element from the domain of that attribute
key constraints
a subset of attributes, called a superkey (SK), that specifies uniqueness on subtuples of its relation (for any distinct tuples t₁
and t₂ that are members of a relation r(R), t₁[SK]≠t₂[SK]). A key (K) is a minimal superkey. In notation, the primary
key is usually underlined. The remaining candidate keys are not underlined and are called unique keys.
null constraints
applied to attributes, specifying that their values must be NOT NULL
relational database schema
A set of relational schemas and a set of integrity constraints. Also, a relational database state is a set of relation states (that satisfy
the integrity constraints).
entity integrity constraint
no primary key value can be NULL
foreign key
a set of attributes from one relation schema that have the same domains as, and refer to, the primary key of a second relation
schema
referential integrity constraint

A relation schema R₁ has a foreign key FK referencing a second relation schema R₂ with primary key PK. For every tuple t₁ ∈ r₁ with
t₁[FK] ≠ ω, there must be a tuple t₂ ∈ r₂ such that t₂[PK] = t₁[FK]. A foreign key must match its primary key unless it's NULL.
other constraints
semantic integrity constraints; enforced by applications, or by the DBMS if it supports a constraint specification language with
features like CREATE TRIGGER or CREATE ASSERTION.
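The key and referential integrity definitions above can be checked against a single relation state with a few lines of Python. This is a sketch only: relation states are modeled as lists of dicts, NULL (ω) as None, and the function names and sample data are hypothetical. Note that uniqueness in one state is necessary but not sufficient for a superkey, which must hold in every legal state.

```python
def is_superkey(relation, attrs):
    """True if no two distinct tuples of this state agree on all attributes in attrs."""
    seen = set()
    for t in relation:
        sub = tuple(t[a] for a in attrs)   # the subtuple t[SK]
        if sub in seen:
            return False
        seen.add(sub)
    return True

def satisfies_referential_integrity(r1, fk, r2, pk):
    """Every non-NULL t1[FK] in r1 must equal t2[PK] for some t2 in r2."""
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    for t1 in r1:
        ref = tuple(t1[a] for a in fk)
        if any(v is None for v in ref):
            continue                       # NULL foreign keys are exempt
        if ref not in pk_values:
            return False
    return True

# Illustrative relation states (made-up data).
department = [{"Number": 5, "Name": "Research"}, {"Number": 4, "Name": "Administration"}]
employee = [
    {"SSN": "111", "DeptNum": 5},
    {"SSN": "222", "DeptNum": None},       # no department assigned
    {"SSN": "333", "DeptNum": 5},
]

print(is_superkey(employee, ["SSN"]))      # SSN values are all distinct
print(is_superkey(employee, ["DeptNum"]))  # two employees share department 5
print(satisfies_referential_integrity(employee, ["DeptNum"], department, ["Number"]))
```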
Updates, transactions, constraint violations
Retrieval: A function taking a relational algebra expression, returning a result relation.
Insert: Adds a new tuple to a relation. Can potentially violate any constraint.
Delete: Removes a tuple from a relation. Can violate referential integrity.
Update: Modifies a tuple in a relation. Depending on what's modified, can violate whatever the previous two can.
Example of a Generalization or Specialization
