Chapter 3.1 Complete (Relational Model)
The relational model is today the primary data model for commercial
data processing applications. It has attained its primary position
because of its simplicity, which eases the job of the programmer, as
compared to earlier data models such as the network model or the
hierarchical model.
We study:
* The fundamentals of the relational model, which provides a very
simple yet powerful way of representing data.
* Three formal query languages (which are used to specify requests for
information). These are not user-friendly.
* The relational algebra, which we study in detail and which forms the
basis of the widely used SQL.
* The Tuple Relational Calculus (TRC) and the Domain Relational
Calculus (DRC), studied at the end, which are declarative query
languages based on mathematical logic.
* The domain relational calculus is the basis of the QBE
(Query-By-Example) query language.
Structure of Relational Database
A relational database consists of a collection of tables, each of which is
assigned a unique name. Each table has a structure similar to that
discussed earlier, where we represented E-R databases by tables. A row
in a table represents a relationship among a set of values. Since a table
is a collection of such relationships, there is a close correspondence
between the concept of table and the mathematical concept of relation,
from which the relational data model takes its name.
Consider the account table represented by Figure 3.1. It has 3 column
headers: account-number, branch-name, and balance. We refer to these
headers as attributes. For each attribute, there is a set of permitted
values, called the domain of that attribute. For the attribute
branch-name, the domain is the set of all branch names.
account-number  branch-name  balance
A-101           Downtown     500
A-102           Perryridge   400
A-201           Brighton     900
A-215           Mianus       700
A-217           Brighton     750
A-222           Redwood      700
A-305           Round Hill   350
Figure 3.1: The account relation
Let D1 denote the set of all account numbers, D2 the set
of all branch names, and D3 the set of all balances. Every row
of account must consist of a 3-tuple (v1, v2, v3) where v1 is an
account number, v2 is a branch name, and v3 is a balance.
In general, account will contain only a subset of the set of
all possible rows. Therefore, account is a subset of D1 × D2 ×
D3. In general, a table of n attributes must be a subset of D1 ×
D2 × … × Dn-1 × Dn.
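The subset relationship can be checked mechanically. The following Python sketch is only an illustration: the tiny domains D1, D2, and D3 are hypothetical stand-ins for the full sets of account numbers, branch names, and balances.

```python
from itertools import product

# Hypothetical, deliberately tiny domains (the real ones are much larger).
D1 = {"A-101", "A-102", "A-201"}              # account numbers
D2 = {"Downtown", "Perryridge", "Brighton"}   # branch names
D3 = {400, 500, 900}                          # balances

# Every possible row: the Cartesian product D1 x D2 x D3.
all_rows = set(product(D1, D2, D3))

# A relation is any subset of that product; here, three rows of Figure 3.1.
account = {
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Brighton", 900),
}

print(len(all_rows))         # 3 * 3 * 3 = 27 possible rows
print(account <= all_rows)   # True: account is a subset of D1 x D2 x D3
```

Most of the 27 possible rows, such as ("A-101", "Brighton", 400), are not in account, which is what "only a subset of all possible rows" means.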
In mathematics, a relation is a subset of a Cartesian
product of a list of domains. This definition corresponds
almost exactly with our definition of table. The only difference
is that we have assigned names to attributes, whereas
mathematicians rely on numeric names. Because tables are
essentially relations, we will use the mathematical terms
relation and tuple in place of the terms table and row.
Tuple variable: A tuple variable is a variable that stands for a
tuple. In other words, a tuple variable is a variable whose domain is
the set of all tuples.
In the account relation, there are seven tuples. Let the tuple variable t
refer to the first tuple of the relation. We use the notation t[account-number]
to denote the value of t on the account-number attribute. Then, t[account-
number]=“A-101,” and t[branch-name]=“Downtown”. Alternatively, we can
write t[1] to denote the value of tuple t on the first attribute (account-
number), t[2] to denote branch-name, and so on. Since a relation is a set of
tuples, we use the mathematical notation of t ∈ r to denote that tuple t is in
relation r.
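The notation maps closely onto a named tuple in Python. This is a sketch only: the type name Account and the underscore attribute names are assumptions made for the example, and Python positions start at 0 where the text counts from 1.

```python
from collections import namedtuple

# Hypothetical tuple type mirroring Account-schema.
Account = namedtuple("Account", ["account_number", "branch_name", "balance"])

# Two tuples of the account relation (Figure 3.1), stored as a set.
account = {
    Account("A-101", "Downtown", 500),
    Account("A-102", "Perryridge", 400),
}

t = Account("A-101", "Downtown", 500)   # tuple variable t, first tuple

print(t.account_number)   # t[account-number]  -> A-101
print(t.branch_name)      # t[branch-name]     -> Downtown
print(t[0])               # t[1] in the text (0-based here) -> A-101
print(t in account)       # t ∈ r -> True
```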
Remark: The order in which the tuples appear in a relation is irrelevant, since a
relation is a set of tuples.
It is possible for several attributes to have the same domain. For
example, we have a relation customer that has the three attributes customer-
name, customer-street, and customer-city, and a relation employee that
includes the attribute employee-name. It is possible that the attributes
customer-name and employee-name will have the same domain i.e. the set of
all person names, which at the physical level is the set of all character strings.
On the other hand, the domains of balance and branch-name must be distinct.
It is perhaps less clear whether customer-name and branch-name should have
the same domain. At the physical level, both customer names and branch
names are character strings. However, at the logical level, we may want
customer-name and branch-name to have distinct domains.
Database Schema: The database schema is the logical design of
the database whereas the database instance is the snapshot of
the data in the database at a given instant in time.
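Remark: The concept of a relation corresponds to the
programming-language notion of a variable, while the concept of a
relation schema corresponds to the programming-language notion of
type definition.
*Use lower-case names for relations, and names beginning with an
upper-case letter for relation schemas.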
Ex: We use Account-schema to denote the relation schema for
relation account. Thus,
Account-schema = (account-number, branch-name, balance)
We denote the fact that account is a relation on Account-schema by
account(Account-schema) or, in general, r(R), where r is the relation
and R is the relation schema.
Consider the branch relation:
branch-name  branch-city  balance
Brighton     Brooklyn     7100000
Downtown     Brooklyn     9000000
Mianus       Horseneck    400000
North Town   Rye          3700000
Perryridge   Horseneck    170000
Pownal       Bennington   300000
Redwood      Palo Alto    2100000
Round Hill   Horseneck    8000000
Figure 3.2: The branch relation
customer-name  loan-number
Adams          L-16
Curry          L-93
Hayes          L-15
Jackson        L-14
Jones          L-17
Smith          L-11
Smith          L-23
Williams       L-17
Figure 3.6: The borrower relation
The following E-R diagram depicts the banking enterprise that we have
discussed so far.
[E-R diagram not reproduced; surviving labels: branch-city, depositor,
loan-branch, loan-no, amount, customer-street]
Figure 3.7: Entity-Relationship diagram for the banking enterprise.
Tables and attributes
branch(branch_name, branch_city, balance)
account(account_number, branch_name, balance)
customer(customer_name, customer_street, customer_city)
depositor(customer_name, account_number)
loan(loan_number, branch_name, amount)
borrower(customer_name, loan_number)
KEYS
The notion of superkey, candidate key, and primary key studied
earlier are also applicable to the relational model.
Ex: In Branch-schema, {branch-name} and {branch-name, branch-city}
are both superkeys. {branch-name, branch-city} is not a candidate key,
because {branch-name} is a subset of {branch-name, branch-city} and
{branch-name} itself is a superkey. However, {branch-name} is a
candidate key, and for our purposes it will also serve as the primary
key. The attribute branch-city is not a superkey, since two branches in
the same city can have different names.
Let R be a relation schema. If we say that a subset K of R is a
superkey for R, we are restricting consideration to relations r(R) in which
no two distinct tuples have the same values on all attributes in K. That is,
if t1 and t2 are in r and t1 ≠ t2 , then t1[K] ≠ t2[K].
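The condition can be tested on a given relation instance. The sketch below is illustrative only: is_superkey is a hypothetical helper, and it inspects a single instance, whereas a superkey is a constraint on every legal relation r(R). A failed check therefore rules a set of attributes out, but a passed check does not prove that it is a superkey.

```python
def is_superkey(r, K):
    """True if no two tuples of the instance r agree on every attribute in K."""
    seen = set()
    for t in r:
        key = tuple(t[a] for a in K)   # t[K]: the values of t on K
        if key in seen:                # another tuple already has these values
            return False
        seen.add(key)
    return True

# The branch-name and branch-city columns of Figure 3.2.
rows = [
    ("Brighton", "Brooklyn"), ("Downtown", "Brooklyn"),
    ("Mianus", "Horseneck"), ("North Town", "Rye"),
    ("Perryridge", "Horseneck"), ("Pownal", "Bennington"),
    ("Redwood", "Palo Alto"), ("Round Hill", "Horseneck"),
]
branch = [{"branch-name": n, "branch-city": c} for n, c in rows]

print(is_superkey(branch, ["branch-name"]))                  # True
print(is_superkey(branch, ["branch-name", "branch-city"]))   # True
print(is_superkey(branch, ["branch-city"]))                  # False
```

The last call fails because Brighton and Downtown are both in Brooklyn, which matches the observation that branch-city is not a superkey.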
If a relational database schema is based on tables derived from an
E-R schema, it is possible to determine the primary key for a relation
schema from the primary keys of the entity or relationship sets from
which the schema is derived.
Strong entity set: The primary key of the entity set becomes the
primary key of the relation.
Weak entity set: The relation corresponding to a weak entity set
includes:
(i) The attributes of the weak entity set
(ii) The primary key of the strong entity set on which the weak entity set
depends.
The primary key of the relation consists of the union of the primary key
of the strong entity set and the discriminator of the weak entity set.
Relationship set: The union of the primary keys of the related entity sets
becomes a superkey of the relation. If the relationship is many-to-many,
this superkey is also the primary key.
Combined tables: We know that a binary many-to-one relationship set
from A to B can be represented by a table consisting of the attributes of
A and the attributes (if any) of the relationship set. The primary key of
the “many” entity set becomes the primary key of the relation. For
one-to-one relationship sets, the relation is constructed like that for a
many-to-one relationship set. However, we can choose either entity set’s
primary key as the primary key of the relation, since both are candidate
keys.
Multivalued attributes: A multivalued attribute M is represented
by a table consisting of the primary key of the entity set or relationship
set of which M is an attribute plus a column C holding an individual
value of M. The primary key of the entity or relationship set together
with the attribute C becomes the primary key for the relation.
A relation schema, say r1, derived from an E-R schema
may include among its attributes the primary key of another schema,
say r2. This attribute is called a foreign key from r1, referencing r2.
The relation r1 is also called the referencing relation of the foreign key
dependency, and r2 is called the referenced relation of the foreign key.
Ex: The attribute branch-name in Account-schema is a foreign key from
Account-schema referencing Branch-schema, since branch-name is the
primary key of Branch-schema.
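A foreign key is usually read as a referential constraint: every branch-name value in account should also appear as a branch-name in branch. The following sketch checks this on the data of Figures 3.1 and 3.2. The variable names are assumptions made for the example.

```python
# Primary-key values of the referenced relation, branch (Figure 3.2).
branch_names = {
    "Brighton", "Downtown", "Mianus", "North Town",
    "Perryridge", "Pownal", "Redwood", "Round Hill",
}

# The referencing relation, account (Figure 3.1).
account = [
    ("A-101", "Downtown", 500), ("A-102", "Perryridge", 400),
    ("A-201", "Brighton", 900), ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750), ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]

# Foreign-key values in account that match no tuple of branch.
dangling = {name for (_, name, _) in account if name not in branch_names}

print(dangling)   # set(): every account refers to an existing branch
```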
Note: It is customary to list the primary key attributes of a relation
schema before the other attributes.
Ex: The branch-name attribute of Branch-schema is listed first since it is
the primary key.
Schema diagram: A database schema, along with primary key and foreign key
dependencies, can be depicted pictorially by schema diagrams. The following
figure shows the schema diagram for our banking enterprise.
[Schema diagram not reproduced; surviving boxes: loan (loan-no,
branch-name, amount) and borrower (customer-name, loan-number)]
We can find all tuples in which the amount lent is more than $1200 by writing
σ_{amount > 1200}(loan)
loan-number  branch-name  amount
L-14         Downtown     1500
L-15         Perryridge   1500
L-16         Perryridge   1300
L-23         Redwood      2000
Figure 3.10: σ_{amount > 1200}(loan)
loan-number  branch-name  amount
L-15         Perryridge   1500
L-16         Perryridge   1300
Figure 3.11: σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection
predicate. Further, we can combine several predicates into a larger
predicate by using the connectives and (∧), or (∨), and not (¬).
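Selection can be sketched in Python as a filter over a set of tuples. The loan tuples below are the ones that appear in the figures of this section. The helper name select and the positional encoding (loan-number, branch-name, amount) are assumptions made for the example.

```python
# The loan relation as (loan-number, branch-name, amount) tuples.
loan = {
    ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}

def select(r, predicate):
    """sigma_predicate(r): keep the tuples of r that satisfy the predicate."""
    return {t for t in r if predicate(t)}

# sigma_{amount > 1200}(loan), as in Figure 3.10
big = select(loan, lambda t: t[2] > 1200)

# sigma_{branch-name = "Perryridge" AND amount > 1200}(loan), as in Figure 3.11
perry_big = select(loan, lambda t: t[1] == "Perryridge" and t[2] > 1200)

print(sorted(t[0] for t in big))         # ['L-14', 'L-15', 'L-16', 'L-23']
print(sorted(t[0] for t in perry_big))   # ['L-15', 'L-16']
```

The connectives ∧, ∨, and ¬ correspond to Python's and, or, and not inside the predicate.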
(2) The Project Operation:
Suppose we want to list all loan numbers and the amount of the loans,
but do not care about the branch name. The project operation
extracts attributes (columns or fields) from a relation (table).
The project operation is denoted by the Greek letter ∏.
The general form of project operation:
∏_{attribute-list}(R)
Ex: ∏_{loan-number, amount}(loan)
loan-number  amount
L-11         900
L-14         1500
L-15         1500
L-16         1300
L-17         1000
L-23         2000
L-93         500
Figure 3.12: ∏_{loan-number, amount}(loan)
Remark: Result of a relational operation is itself a relation.
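Projection can be sketched the same way. Because the result is itself a relation, that is, a set, duplicate rows produced by dropping columns disappear. The helper name project and the use of column positions instead of attribute names are assumptions made for the example.

```python
# The loan relation as (loan-number, branch-name, amount) tuples.
loan = {
    ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}

def project(r, positions):
    """PI_{attribute-list}(r): keep only the listed columns, as a set."""
    return {tuple(t[i] for i in positions) for t in r}

# PI_{loan-number, amount}(loan), as in Figure 3.12
numbers_and_amounts = project(loan, [0, 2])

# PI_{branch-name}(loan): seven tuples collapse to five distinct branches
branches = project(loan, [1])

print(len(numbers_and_amounts))   # 7
print(len(branches))              # 5
```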
Composition of Relational Operations / Combination of Select and Project
operations together:
General form: π_{attribute-list}(σ_{selection-condition}(R))
Suppose we want to find the customers who live in “Harrison”.
Ex: π_{customer-name}(σ_{customer-city = "Harrison"}(customer))
Note that instead of giving the name of a relation as the argument of the
projection operation, we give an expression that evaluates to a relation.
In general, since the result of a relational-algebra operation is of the same
type (relation) as its input, relational-algebra operations can be composed
together into a relational-algebra expression. Composing relational-algebra
operations into relational-algebra expressions is just like composing
arithmetic operations (such as +,-,*, and ÷ ) into arithmetic expressions.
(3) The Union Operation: Consider a query to find the names of all bank
customers who have either an account or a loan or both. Note that the
customer relation does not contain the information, since a customer
does not need to have either an account or a loan at the bank.
Here we need the information in the depositor relation (Figure 3.4) and
in the borrower relation (Figure 3.6).
We know how to find the names of all customers with a loan in the bank:
π_{customer-name}(borrower)
We also know how to find the names of all customers with an account in
the bank:
π_{customer-name}(depositor)
Now suppose we need all customer names that appear in either or both
of the two relations, i.e. the union of the above two sets:
π_{customer-name}(borrower) ∪ π_{customer-name}(depositor)
The result is a relation with the single attribute customer-name
(Figure 3.14).
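The union can be sketched with Python sets. The borrower tuples come from Figure 3.6. The depositor tuples are illustrative assumptions, since Figure 3.4 is not reproduced in this section, so the output below does not reproduce Figure 3.14.

```python
# borrower (Figure 3.6) as (customer-name, loan-number) tuples.
borrower = {
    ("Adams", "L-16"), ("Curry", "L-93"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Jones", "L-17"), ("Smith", "L-11"),
    ("Smith", "L-23"), ("Williams", "L-17"),
}

# Hypothetical depositor rows, (customer-name, account-number).
depositor = {
    ("Hayes", "A-102"), ("Johnson", "A-101"),
    ("Jones", "A-217"), ("Smith", "A-215"),
}

# PI_{customer-name}(borrower) and PI_{customer-name}(depositor)
borrower_names = {name for (name, _) in borrower}
depositor_names = {name for (name, _) in depositor}

# Union: customers with a loan, an account, or both. Each name appears once.
either = borrower_names | depositor_names

print(sorted(either))
# ['Adams', 'Curry', 'Hayes', 'Jackson', 'Johnson', 'Jones', 'Smith', 'Williams']
```

Both operands have the same single attribute, customer-name, which is what makes the union meaningful.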
(5) The Cartesian-Product operation:
The Cartesian-product operation is denoted by a cross (×) and allows us to
combine information from any two relations. We write the Cartesian
product of relations r1 and r2 as r1 × r2.
Ex: The relation schema for r = borrower × loan is
(borrower.customer-name, borrower.loan-number, loan.loan-number,
loan.branch-name, loan.amount).
With this schema, we can distinguish borrower.loan-number from
loan.loan-number. For those attributes that appear in only one of the two
schemas, we shall usually drop the relation-name prefix. This simplification
does not lead to any ambiguity. We can write the relation schema for r as
(customer-name, borrower.loan-number, loan.loan-number, branch-name,
amount).
This naming convention requires that the relations that are the
arguments of the Cartesian-product operation have distinct names. This
requirement causes problems in some cases, such as when the Cartesian
product of a relation with itself is desired. A similar problem arises if we
use the result of a relational-algebra expression in a Cartesian product,
since we will need a name for the relation so that we can refer to the
relation’s attributes. In these cases we use the rename operation.
Assume that we have n1 tuples in borrower and n2 tuples in loan. Then, there
are n1 × n2 ways of choosing a pair of tuples, one tuple from each relation. So
there are n1 × n2 tuples in r. In particular, note that for some tuples t in r, it may
be that t[borrower.loan-number] ≠ t[loan.loan-number].
In general, if we have relations r1(R1) and r2(R2), then r1 × r2 is a relation
whose schema is the concatenation of R1 and R2. The relation r1 × r2 contains
all tuples t for which there is a tuple t1 in r1 and a tuple t2 in r2 for which
t[R1] = t1[R1] and t[R2] = t2[R2].
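The tuple count can be verified on the data used in this section. In the sketch below, each tuple of r is the concatenation of one borrower tuple and one loan tuple, and the variable names are assumptions made for the example.

```python
from itertools import product

borrower = {
    ("Adams", "L-16"), ("Curry", "L-93"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Jones", "L-17"), ("Smith", "L-11"),
    ("Smith", "L-23"), ("Williams", "L-17"),
}
loan = {
    ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}

# r = borrower x loan; schema (customer-name, borrower.loan-number,
# loan.loan-number, branch-name, amount).
r = {b + l for b, l in product(borrower, loan)}

# Tuples where borrower.loan-number differs from loan.loan-number.
mismatched = {t for t in r if t[1] != t[2]}

print(len(r))            # 8 * 7 = 56
print(len(mismatched))   # 48: only 8 of the 56 pairs describe a real loan
```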
Suppose that we want to find the names of all customers who have a
loan at the Perryridge branch. We need the information in both the loan
relation and the borrower relation to do so.
Consider the query σ_{branch-name = "Perryridge"}(borrower × loan)
The result relation for this query appears in Figure 3.16.
We now have a relation that pertains only to the Perryridge branch.
However, the customer-name column can contain customers who do not have
a loan at the Perryridge branch.
customer-name  borrower.loan-number  loan.loan-number  branch-name  amount
Adams          L-16                  L-11              Round Hill   900
Adams          L-16                  L-14              Downtown     1500
Adams          L-16                  L-15              Perryridge   1500
Adams          L-16                  L-16              Perryridge   1300
Adams          L-16                  L-17              Downtown     1000
Adams          L-16                  L-23              Redwood      2000
Adams          L-16                  L-93              Mianus       500
Curry          L-93                  ...
...
Hayes          L-15                  ...
...
Smith          L-23                  ...
...
Williams       L-17                  ...
...
Fig. 3.16: Tuples of borrower × loan (abridged). The result of
σ_{branch-name = "Perryridge"}(borrower × loan) consists of the rows whose
branch-name is Perryridge.
Since the Cartesian-product operation associates every tuple of loan
with every tuple of borrower, we know that, if a customer has a loan in
the Perryridge branch, then there is some tuple in borrower × loan that
contains his name, and borrower.loan-number = loan.loan-number. So, if
we write
σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan))
we get only those tuples of borrower × loan that pertain to customers
who have a loan at the Perryridge branch.
Finally, since we want only customer-name, we use projection:
∏_{customer-name}(σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan)))
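The three steps of this expression (product, two selections, projection) can be followed one at a time in a short sketch. The data are the borrower and loan tuples used in this section, and the intermediate variable names are assumptions made for the example.

```python
from itertools import product

borrower = {
    ("Adams", "L-16"), ("Curry", "L-93"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Jones", "L-17"), ("Smith", "L-11"),
    ("Smith", "L-23"), ("Williams", "L-17"),
}
loan = {
    ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}

# borrower x loan: (customer-name, borrower.loan-number,
#                   loan.loan-number, branch-name, amount)
r = {b + l for b, l in product(borrower, loan)}

# Inner selection: branch-name = "Perryridge"
at_perryridge = {t for t in r if t[3] == "Perryridge"}

# Outer selection: borrower.loan-number = loan.loan-number
matched = {t for t in at_perryridge if t[1] == t[2]}

# Projection on customer-name
names = {t[0] for t in matched}

print(len(at_perryridge))   # 16: every borrower tuple paired with L-15 and L-16
print(sorted(names))        # ['Adams', 'Hayes']
```

The inner selection alone still lists all seven customers. Only the comparison of the two loan-number columns narrows the result to Adams and Hayes.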
This expression gives those balances in the account relation for which a
larger balance appears somewhere in the account relation (renamed as
d). The result contains all balances except the largest one. The following
figure shows this relation.
balance
500
400
700
750
350
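The figure can be reproduced with a short sketch. The expression itself is not shown above, so the code assumes the usual form: a selection account.balance < d.balance over the product of account with a renamed copy d of itself, projected on account.balance. The variable names are assumptions made for the example.

```python
from itertools import product

# The account relation (Figure 3.1).
account = {
    ("A-101", "Downtown", 500), ("A-102", "Perryridge", 400),
    ("A-201", "Brighton", 900), ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750), ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
}

d = account   # the same relation under a second name (the rename step)

# Balances for which a larger balance exists somewhere in account.
not_largest = {a[2] for a, b in product(account, d) if a[2] < b[2]}

# Removing them from all balances leaves only the largest one.
all_balances = {t[2] for t in account}
largest = all_balances - not_largest

print(sorted(not_largest))   # [350, 400, 500, 700, 750]
print(largest)               # {900}
```

The balance 700 occurs twice in account but only once in the result, because the result is a set.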