
KIST College of Science and Technology

Kamalpokhari, Kathmandu

Notes:
BIT276CO Database Management System
Semester IV

Notes Written By Biplap Bhattarai

May 30, 2018

Unit 3 Relational Model

3.1 Introduction to Relational Databases:

A relational database is a collection of data items organized as a set of formally described tables
from which data can be accessed or reassembled in many different ways without having to
reorganize the database tables. The relational database was invented by E. F. Codd at IBM in
1970. A software system used to maintain relational databases is a relational database
management system (RDBMS). A relational database is based on the relational model, which
organizes data into one or more tables (or "relations") of columns and rows, with a unique key
identifying each row. Rows are also called records or tuples. Columns are also called attributes.

A relational database organizes data in tables (or relations). A table is made up of rows and
columns. A row is also called a record (or tuple). A column is also called a field (or attribute). A
database table is similar to a spreadsheet. However, the relationships that can be created among
the tables enable a relational database to efficiently store huge amounts of data, and effectively
retrieve selected data.

The standard user and application program interface to a relational database is the structured
query language (SQL). SQL statements are used both for interactive queries for information from
a relational database and for gathering data for reports.

In addition to being relatively easy to create and access, a relational database has the important
advantage of being easy to extend. After the original database creation, a new data category can
be added without requiring that all existing applications be modified.

A relational database is a set of tables containing data fitted into predefined categories. Each
table (which is sometimes called a relation) contains one or more data categories in columns.
Each row contains a unique instance of data for the categories defined by the columns. For
example, a typical business order entry database would include a table that described a customer
with columns for name, address, phone number, and so forth. Another table would describe an
order: product, customer, date, sales price, and so forth. A user of the database could obtain a
view of the database that fitted the user's needs. For example, a branch office manager might like
a view or report on all customers that had bought products after a certain date. A financial
services manager in the same company could, from the same tables, obtain a report on accounts
that needed to be paid.

Today, there are many commercial Relational Database Management Systems (RDBMS), such as
Oracle, IBM DB2 and Microsoft SQL Server. There are also many free and open-source
RDBMS, such as MySQL, mSQL (mini-SQL) and the embedded JavaDB (Apache Derby).

Database Design Objective
A well-designed database shall:
• Eliminate data redundancy: the same piece of data shall not be stored in more than one
place, because duplicate data not only wastes storage space but also easily leads to
inconsistencies.
• Ensure data integrity and accuracy.

Relational Database Design Process
Database design is more art than science, as you have to make many decisions. Databases are
usually customized to suit a particular application. No two customized applications are alike, and
hence, no two databases are alike. Guidelines (usually in terms of what not to do instead of what
to do) are provided for making these design decisions, but the choices ultimately rest on you,
the designer.
Step 1: Define the Purpose of the Database (Requirement Analysis)
Gather the requirements and define the objective of your database, e.g. ...
Drafting out the sample input forms, queries and reports, often helps.

Step 2: Gather Data, Organize in tables and Specify the Primary Keys
Once you have decided on the purpose of the database, gather the data that need to be
stored in the database. Divide the data into subject-based tables. Choose one column (or a few
columns) as the so-called primary key, which uniquely identifies each row. In the
relational model, a table cannot contain duplicate rows. Most RDBMSs build an index on the
primary key to facilitate fast search and retrieval. Let's illustrate with an example: a table
customers contains columns lastName, firstName, phoneNumber, address, city, state, zipCode.
The candidates for primary key are name=(lastName, firstName), phoneNumber,
address1=(address, city, state), address2=(address, zipCode). Name may not be unique. Phone
number and address may change. Hence, it is better to create a fact-less auto-increment number,
say customerID, as the primary key.

Step 3: Create Relationships among Tables
A database consisting of independent and unrelated tables serves little purpose (you may
consider using a spreadsheet instead). The power of a relational database lies in the relationships
that can be defined between tables. The most crucial aspect of designing a relational database is
to identify the relationships among tables. The types of relationship include:
1. one-to-many
2. many-to-many
3. one-to-one

1. One-to-Many
In a "class roster" database, a teacher may teach zero or more classes, while a class is taught by
one (and only one) teacher. In a "company" database, a manager manages zero or more
employees, while an employee is managed by one (and only one) manager. In a "product sales"
database, a customer may place many orders; while an order is placed by one particular
customer. This kind of relationship is known as one-to-many.

A one-to-many relationship cannot be represented in a single table. For example, in a "class roster"
database, we may begin with a table called Teachers, which stores information about teachers
(such as name, office, phone and email). To store the classes taught by each teacher, we could
create columns class1, class2, class3, but we immediately face the problem of how many columns to
create. On the other hand, if we begin with a table called Classes, which stores information about
a class (courseCode, dayOfWeek, timeStart and timeEnd), we could create additional columns to
store information about the (one) teacher (such as name, office, phone and email). However,
since a teacher may teach many classes, the teacher's data would be duplicated in many rows of the
table Classes.
To support a one-to-many relationship, we need to design two tables: a table Classes to store
information about the classes with classID as the primary key; and a table Teachers to store
information about teachers with teacherID as the primary key. We can then create the
one-to-many relationship by storing the primary key of the table Teachers (i.e., teacherID) (the
"one"-end or the parent table) in the table Classes (the "many"-end or the child table).

The column teacherID in the child table Classes is known as the foreign key. A foreign key of a
child table is a primary key of a parent table, used to reference the parent table.
Take note that for every value in the parent table, there could be zero, one, or more rows in the
child table. For every value in the child table, there is one and only one row in the parent table.
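
The parent/child arrangement described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the sample rows, the teacher names and the reduced column set are hypothetical, and SQLite is used simply because it needs no server.

```python
import sqlite3

# Hypothetical "class roster" schema; only teacherID and classID come from the notes.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Teachers (
        teacherID INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE Classes (
        classID    INTEGER PRIMARY KEY,
        courseCode TEXT NOT NULL,
        teacherID  INTEGER NOT NULL REFERENCES Teachers(teacherID)  -- foreign key
    );
    INSERT INTO Teachers VALUES (1, 'Asha'), (2, 'Bikram');
    INSERT INTO Classes VALUES (10, 'BIT276', 1), (11, 'BIT277', 1);
""")

# One parent row may have zero, one or many child rows.
classes_per_teacher = conn.execute("""
    SELECT t.name, COUNT(c.classID)
    FROM Teachers t LEFT JOIN Classes c ON c.teacherID = t.teacherID
    GROUP BY t.teacherID
    ORDER BY t.teacherID
""").fetchall()

# A child row must reference an existing parent row.
try:
    conn.execute("INSERT INTO Classes VALUES (12, 'BIT278', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here one teacher has two classes and the other has none, while the attempt to store a class for a non-existent teacher is refused by the database.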

2. Many-to-Many
In a "product sales" database, a customer's order may contain one or more products; and a
product can appear in many orders. In a "bookstore" database, a book is written by one or more
authors; while an author may write zero or more books. This kind of relationship is known
as many-to-many.

Let's illustrate with a "product sales" database. We begin with two tables: Products and Orders.
The table Products contains information about the products (such
as name, description and quantityInStock) with productID as its primary key. The
table Orders contains customers' orders (customerID, dateOrdered, dateRequired and status).
Again, we cannot store the items ordered inside the Orders table, as we do not know how many
columns to reserve for the items. We also cannot store the order information in
the Products table.
To support many-to-many relationship, we need to create a third table (known as a junction
table), say OrderDetails (or OrderLines), where each row represents an item of a particular order.
For the OrderDetails table, the primary key consists of two columns: orderID and productID, that
uniquely identify each row. The columns orderID and productID in OrderDetails table are used
to reference Orders and Products tables, hence, they are also the foreign keys in
the OrderDetails table.

The many-to-many relationship is, in fact, implemented as two one-to-many relationships, with
the introduction of the junction table.
a) An order has many items in OrderDetails. An OrderDetails item belongs to one particular
order.
b) A product may appear in many OrderDetails. Each OrderDetails item specifies one
product.
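
A junction table of this kind might be set up as follows with Python's sqlite3 module. The product names, the quantity column and the sample rows are assumptions made for illustration; only the table names and the composite key (orderID, productID) come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (productID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders   (orderID INTEGER PRIMARY KEY, customerID INTEGER);
    -- Junction table: one row per (order, product) pair.
    CREATE TABLE OrderDetails (
        orderID   INTEGER REFERENCES Orders(orderID),
        productID INTEGER REFERENCES Products(productID),
        quantity  INTEGER,                      -- hypothetical extra column
        PRIMARY KEY (orderID, productID)
    );
    INSERT INTO Products VALUES (1, 'Pen'), (2, 'Notebook');
    INSERT INTO Orders VALUES (100, 7), (101, 8);
    INSERT INTO OrderDetails VALUES (100, 1, 3), (100, 2, 1), (101, 1, 5);
""")

# An order has many items ...
products_in_order_100 = [row[0] for row in conn.execute(
    "SELECT p.name FROM OrderDetails d JOIN Products p ON p.productID = d.productID "
    "WHERE d.orderID = 100 ORDER BY p.productID")]

# ... and a product appears in many orders.
orders_with_pen = [row[0] for row in conn.execute(
    "SELECT orderID FROM OrderDetails WHERE productID = 1 ORDER BY orderID")]
```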

3. One-to-One
In a "product sales" database, a product may have optional supplementary information such
as image, moreDescription and comment. Keeping them inside the Products table results in many
empty spaces (in those records without these optional data). Furthermore, these large data may
degrade the performance of the database.
Instead, we can create another table (say ProductDetails, ProductLines or ProductExtras) to store
the optional data. A record will only be created for those products with optional data. The two
tables, Products and ProductDetails, exhibit a one-to-one relationship. That is, for every row in
the parent table, there is at most one row (possibly zero) in the child table. The same
column productID should be used as the primary key for both tables.
Some databases limit the number of columns that can be created inside a table. You could use a
one-to-one relationship to split the data into two tables. A one-to-one relationship is also useful for
storing certain sensitive data in a secure table while keeping the non-sensitive data in the main table.

Column Data Types
You need to choose an appropriate data type for each column. Common data types include:
integers, floating-point numbers, strings (or text), date/time, binary, and collections (such as
enumeration and set).

Step 4: Refine & Normalize the Design
For example, you may refine the design by:
• adding more columns,
• creating a new table for optional data using a one-to-one relationship,
• splitting a large table into two smaller tables,
• others.

Normalization
Apply the so-called normalization rules to check whether your database is structurally correct
and optimal.
First Normal Form (1NF): A table is 1NF if every cell contains a single value, not a list of
values. This property is known as atomicity. 1NF also prohibits repeating groups of columns such
as item1, item2, ..., itemN. Instead, you should create another table using a one-to-many
relationship.
Second Normal Form (2NF): A table is 2NF if it is 1NF and every non-key column is fully
dependent on the primary key. Furthermore, if the primary key is made up of several columns,
every non-key column shall depend on the entire set and not on part of it. For example, the primary
key of the OrderDetails table comprises orderID and productID. If unitPrice is dependent only
on productID, it shall not be kept in the OrderDetails table (but in the Products table). On the
other hand, if the unitPrice is dependent on the product as well as the particular order, then it
shall be kept in the OrderDetails table.
Third Normal Form (3NF): A table is 3NF if it is 2NF and the non-key columns are
independent of each other. In other words, the non-key columns are dependent on the primary key,
only on the primary key and nothing else. For example, suppose that we have a Products table
with columns productID (primary key), name and unitPrice. The column discountRate shall not
belong to the Products table if it is also dependent on unitPrice, which is not part of the primary
key.

Integrity Rules
You should also apply the integrity rules to check the integrity of your design:
Entity Integrity Rule: The primary key cannot contain NULL. Otherwise, it cannot uniquely
identify the row. For a composite key made up of several columns, none of the columns can contain
NULL. Most RDBMSs check and enforce this rule.
Referential Integrity Rule: Each foreign key value must be matched to a primary key value in
the table referenced (or parent table).
• You can insert a row with a foreign key in the child table only if the value exists in the parent
table.
• If the value of the key changes in the parent table (e.g., the row is updated or deleted), all rows
with this foreign key in the child table(s) must be handled accordingly. You could either
(a) disallow the changes; (b) cascade the change (or delete the records) in the child tables
accordingly; or (c) set the key value in the child tables to NULL.

Most RDBMSs can be set up to perform the check and ensure referential integrity in the
specified manner.
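
The three ways of handling a deleted parent row, (a) disallow, (b) cascade and (c) set to NULL, can be tried out with a small sketch. It assumes SQLite's ON DELETE clause (RESTRICT, CASCADE, SET NULL) as one concrete way an RDBMS exposes these options; the Parent and Child tables and the helper function are hypothetical.

```python
import sqlite3

def children_after_parent_delete(action):
    """Delete the only parent row and report what is left in the child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.execute("CREATE TABLE Parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Child (id INTEGER PRIMARY KEY, "
        f"parentID INTEGER REFERENCES Parent(id) ON DELETE {action})"
    )
    conn.execute("INSERT INTO Parent VALUES (1)")
    conn.execute("INSERT INTO Child VALUES (10, 1)")
    conn.commit()
    try:
        conn.execute("DELETE FROM Parent WHERE id = 1")
    except sqlite3.IntegrityError:
        return "delete rejected"          # option (a): disallow the change
    return conn.execute("SELECT id, parentID FROM Child").fetchall()

disallow = children_after_parent_delete("RESTRICT")
cascade = children_after_parent_delete("CASCADE")    # option (b): child rows deleted too
set_null = children_after_parent_delete("SET NULL")  # option (c): foreign key becomes NULL
```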
Business Logic Integrity: Besides the above two general integrity rules, there could be integrity
(validation) rules pertaining to the business logic, e.g., zip code shall be 5 digits within a certain range,
delivery date and time shall fall within business hours, quantity ordered shall be equal to or less than
quantity in stock, etc. These could be carried out in a validation rule (for the specific column) or in
programming logic.

Query languages:
A query language is a language in which a user requests information from the database. Query
languages can be categorized as either procedural or non-procedural.
a) In a procedural language, the user instructs the system to perform a sequence of operations
on the database to compute the desired result. Example: relational algebra.
b) In a non-procedural language, the user describes the desired information without
giving a specific procedure for obtaining that information. Examples: tuple relational
calculus, domain relational calculus, SQL, etc.

3.2 Relational Algebra
The relational algebra is a procedural query language. It is a collection of
formal operations that act on relations and produce relations as results. The main operations of the
relational algebra are the set operations (such as union, intersection and Cartesian product),
selection (keeping only some rows of a table) and projection (keeping only some columns).
Operations in Relational Algebra:

1. Fundamental operations:
   a) Select operation
   b) Project operation
   c) Union operation
   d) Set difference operation
   e) Cartesian product operation
   f) Rename operation
2. Additional operations:
   a) Set-intersection operation
   b) Natural-join operation
   c) Division operation
   d) Assignment operation
3. Extended Relational Algebra Operations:
   a) Generalized projection
   b) Aggregate functions
   c) Outer join
   d) Null values

Types of Relational Algebra Operators:

Operator Symbol   Operator Name                      Basic/Derived   Unary/Binary/Logical
Π                 Projection Operator                Basic           Unary
σ                 Selection Operator                 Basic           Unary
ρ                 Rename Operator                    Basic           Unary
∪                 Union Operator                     Basic           Binary
×                 Cross Product Operator             Basic           Binary
−                 Minus or Set Difference Operator   Basic           Binary
∩                 Intersection                       Derived         Binary
⋈                 Join                               Derived         Binary
/ or ÷            Division                           Derived         Binary
←                 Assignment Operator
∧                 Logical AND                        Logical         Logical
∨                 Logical OR                         Logical         Logical
¬                 Logical NOT                        Logical         Logical

Characteristics of relational operators:
• Relational operators always work on one or more relational tables.
• Relational operators always produce another relational table.
• The result table produced by a relational operator has all the properties of the relational
model.
• For the union, intersection and difference operations, the two operand relations must be
union compatible.

1. Fundamental operations:
a) Select operation:
The selection operation is used to extract tuples (rows) from a relation that satisfy a given
predicate. It is denoted by the sigma symbol (σ).

Syntax: - σ<condition>(Relation)
Selection Example:
Assume the relation Employee has the following tuples:
Employee
Name     Office   Dept        Rank
Bipin    400      Computer    Assistant
Niky     220      Economics   Adjunct
Rahul    160      Economics   Assistant
Binita   420      Computer    Associate
Solu     500      Finance     Associate

• Select only those Employees who work in the Computer department:
σ Dept = 'Computer' (Employee)
Result:
Name     Office   Dept        Rank
Bipin    400      Computer    Assistant
Binita   420      Computer    Associate

• Select only those Employees with first name Solu who are associate professors:
σ Name = 'Solu' ∧ Rank = 'Associate' (Employee)
Result:
Name     Office   Dept        Rank
Solu     500      Finance     Associate

• Select only those Employees who are either Assistant Professors or in the Economics
department:
σ Rank = 'Assistant' ∨ Dept = 'Economics' (Employee)
Result:
Name     Office   Dept        Rank
Bipin    400      Computer    Assistant
Rahul    160      Economics   Assistant
Niky     220      Economics   Adjunct

• Select only those Employees who are neither in the Computer department nor Adjuncts:
σ ¬(Rank = 'Adjunct' ∨ Dept = 'Computer') (Employee)
Result:
Name     Office   Dept        Rank
Rahul    160      Economics   Assistant
Solu     500      Finance     Associate
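
The four selections above can be checked with a short Python sketch. Representing a relation as a list of dictionaries and the helper names select and names are choices made here for illustration, not part of the relational algebra itself.

```python
# The Employee relation from the example, one dict per tuple.
EMPLOYEE = [
    {"Name": "Bipin", "Office": 400, "Dept": "Computer", "Rank": "Assistant"},
    {"Name": "Niky", "Office": 220, "Dept": "Economics", "Rank": "Adjunct"},
    {"Name": "Rahul", "Office": 160, "Dept": "Economics", "Rank": "Assistant"},
    {"Name": "Binita", "Office": 420, "Dept": "Computer", "Rank": "Associate"},
    {"Name": "Solu", "Office": 500, "Dept": "Finance", "Rank": "Associate"},
]

def select(relation, predicate):
    """Selection (sigma): keep only the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]

def names(relation):
    """Helper for reading results: the Name value of every tuple."""
    return [row["Name"] for row in relation]

in_computer = select(EMPLOYEE, lambda t: t["Dept"] == "Computer")
solu_associate = select(
    EMPLOYEE, lambda t: t["Name"] == "Solu" and t["Rank"] == "Associate")
assistant_or_economics = select(
    EMPLOYEE, lambda t: t["Rank"] == "Assistant" or t["Dept"] == "Economics")
neither = select(
    EMPLOYEE, lambda t: not (t["Rank"] == "Adjunct" or t["Dept"] == "Computer"))
```

Each predicate mirrors the condition written under σ: "and" plays the role of ∧, "or" of ∨, and "not" of ¬.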

b) Project operation:
The projection operation is used to extract specified columns of a relation. With the help of this
operation, any number of columns can be omitted from a table, or the columns of a table can be rearranged.
Syntax: - π<attribute-list>(Relation)
Projection Examples:
Assume the same Employee relation above is used.
• Project only the names and departments of the employees:
π name, dept (Employee)
Results:
Name Dept
Bipin Computer
Niky Economics
Rahul Economics
Binita Computer
Solu Finance

Combining Selection and Projection Operations:
• The selection and projection operators can be combined to perform both operations.
• Show the names of all employees working in the 'Computer' department:
π name (σ Dept = 'Computer' (Employee))
Result:
Name
Bipin
Binita

• Show the name and rank of those Employees who are neither in the 'Computer' department
nor Adjuncts:
π name, rank (σ ¬(Rank = 'Adjunct' ∨ Dept = 'Computer') (Employee))
Result:
Name    Rank
Rahul   Assistant
Solu    Associate
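
Projection, and its combination with selection, can be sketched in the same style. The list-of-dictionaries representation and the function names project and select are illustrative assumptions. Note that projection removes duplicate rows, because a relation is a set.

```python
EMPLOYEE = [
    {"Name": "Bipin", "Office": 400, "Dept": "Computer", "Rank": "Assistant"},
    {"Name": "Niky", "Office": 220, "Dept": "Economics", "Rank": "Adjunct"},
    {"Name": "Rahul", "Office": 160, "Dept": "Economics", "Rank": "Assistant"},
    {"Name": "Binita", "Office": 420, "Dept": "Computer", "Rank": "Associate"},
    {"Name": "Solu", "Office": 500, "Dept": "Finance", "Rank": "Associate"},
]

def select(relation, predicate):
    """Selection (sigma): keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection (pi): keep the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

departments = project(EMPLOYEE, ["Dept"])
computer_names = project(select(EMPLOYEE, lambda t: t["Dept"] == "Computer"), ["Name"])
name_and_rank = project(
    select(EMPLOYEE, lambda t: not (t["Rank"] == "Adjunct" or t["Dept"] == "Computer")),
    ["Name", "Rank"],
)
```

The inner call is evaluated first, which matches reading the algebra expression from the innermost parentheses outwards.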
Exercises
Evaluate the following expressions:
1. π name, rank (σ ¬(Rank = 'Adjunct' ∨ Dept = 'Computer') (Employee))
2. π fname, age (σ Age > 22 (R ∪ S))
c) Union Operation (∪):
Consider the following relations R and S. The union of relations R and S is denoted by R ∪ S
and is the set of tuples that are in R, in S, or in both. For a union operation to be legal, the
two relations must have the same number of attributes, and corresponding attributes must have
the same type.
R
First    Last    Age
Bill     Smith   22
Kamala   Dhami   21
Maya     Singh   23
Anisha   Jha     22

S
First    Last    Age
Pinky    Ojha    36
Maya     Singh   23
Anisha   KC      22

Result: Relation with tuples from R and S with duplicates removed.
R ∪ S
First    Last    Age
Bill     Smith   22
Kamala   Dhami   21
Maya     Singh   23
Anisha   Jha     22
Pinky    Ojha    36
Anisha   KC      22

d) Set Difference Operation (−):
Set difference is denoted by the minus sign (−). It finds tuples that are in one relation but not in
another. Thus R − S results in a relation containing the tuples that are in R but not in S.
Result: Relation with tuples from R but not from S.
R − S
First    Last    Age
Bill     Smith   22
Kamala   Dhami   21
Anisha   Jha     22
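
Because relations are sets of tuples, Python's built-in set type gives a direct way to check the union and difference of R and S. This is a minimal sketch; the tuples are the (First, Last, Age) rows of the example with consistent capitalization.

```python
# R and S as sets of (First, Last, Age) tuples.
R = {("Bill", "Smith", 22), ("Kamala", "Dhami", 21),
     ("Maya", "Singh", 23), ("Anisha", "Jha", 22)}
S = {("Pinky", "Ojha", 36), ("Maya", "Singh", 23), ("Anisha", "KC", 22)}

union = R | S          # tuples in R, in S, or in both (duplicates collapse)
difference = R - S     # tuples in R that do not appear in S
intersection = R & S   # tuples that appear in both R and S
```

Maya Singh appears in both relations, so she is counted once in the union and is absent from R − S.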

e) Cartesian product (×):
The Cartesian product operation does not require the relations to be union-compatible, i.e. the
relations involved may have different schemas. The Cartesian product of two relations R and S,
denoted by R × S, is the set of all possible combinations of tuples of the two relations.
Example:
R
First    Last    Age
Kamala   Ojha    22
Pawan    Bhatt   23
Anisha   KC      32

S
Dinner    Dessert
Steak     Ice Cream
Lobster   Cheesecake

Result: all combinations of tuples from the two relations.
R × S
First    Last    Age   Dinner    Dessert
Kamala   Ojha    22    Steak     Ice Cream
Kamala   Ojha    22    Lobster   Cheesecake
Pawan    Bhatt   23    Steak     Ice Cream
Pawan    Bhatt   23    Lobster   Cheesecake
Anisha   KC      32    Steak     Ice Cream
Anisha   KC      32    Lobster   Cheesecake
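
The Cartesian product can be reproduced with itertools.product from the Python standard library. In this sketch each relation is a list of tuples, and each result tuple is the concatenation of one tuple from R and one from S.

```python
from itertools import product

R = [("Kamala", "Ojha", 22), ("Pawan", "Bhatt", 23), ("Anisha", "KC", 32)]
S = [("Steak", "Ice Cream"), ("Lobster", "Cheesecake")]

# Every tuple of R paired with every tuple of S: 3 x 2 = 6 result tuples.
cross = [r + s for r, s in product(R, S)]
```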

Key points to remember about Union Compatible Relations:
Two relations R and S are union compatible if and only if they have the same degree and
the domains of the corresponding attributes are the same.
• Attribute names of the relations need not be identical to perform union, intersection and
difference operations.
• However, the relations must have the same number of attributes (arity), and the domains
of corresponding attributes must be identical.
• Domain is the data type and size of an attribute.
• The degree of relation R is the number of attributes it contains.

f) Rename Operation:
The rename operator is denoted by rho (ρ).
It can be used in two ways:
• ρx(E) returns the result of expression E as a table named x.
• ρx(A1, A2, …, An)(E) returns the result of expression E as a table named x with the
attributes renamed to A1, A2, …, An.
It is mainly used when we need the Cartesian product of a relation with
itself, e.g. Account × Account. In that case we should rename one copy of the Account relation
to avoid confusion.
Example: ρEmp(Name1, Dept1)(Employee)

Employee (before rename)
Name    Department
Bhupi   IT
Arjun   CSC
Aayan   IT

Emp (after rename)
Name1   Dept1
Bhupi   IT
Arjun   CSC
Aayan   IT

2. Additional operations:
a) Set Intersection Operation (∩):
Set intersection is denoted by the symbol ∩ and it returns a relation that contains the tuples that are in
both of its argument relations.
Result: Relation with tuples that appear in both R and S (the relations from the union example).
R ∩ S
First   Last    Age
Maya    Singh   23

b) Join operations:
i. Natural join (⋈)
ii. Theta join (⋈θ)
iii. Outer Join
I. Left outer join (⟕)
II. Right outer join (⟖)
III. Full outer join (⟗)

The join operation is used to combine related tuples from two relations into single tuples.
i. Natural join operation (⋈):
The natural join is a binary operation that allows us to combine certain selections and a Cartesian
product into one operation. It is denoted by the join symbol ⋈. The natural join operation
forms the Cartesian product of the given relations, performs a selection by equating the attributes
that have the same name in both relations, and then eliminates the replicated attributes.
In brief, the result of the natural join of two relations R and S is the set of all combinations of
tuples in R and S that are equal on their common attribute names.
Formal definition of natural join:
Let R and S be any two relations and let {A1, A2, A3, …, An} be the attributes common to both.
Their natural join is denoted by R ⋈ S and is defined as follows:
R ⋈ S = π R∪S (σ R.A1 = S.A1 ∧ R.A2 = S.A2 ∧ R.A3 = S.A3 ∧ … ∧ R.An = S.An (R × S))
where R ∩ S = {A1, A2, A3, …, An} and R ∪ S is the union of the attribute sets of R and S.
For example, consider the tables Employee and Department and their natural join:
Employee
e-id   e-name   Dept
11     Bhupi    Computer
13     Anju     Finance
43     Manju    Computer
54     Nisha    Finance

Department
Dept         Manager
Computer     Anisha
Finance      Manisha
Production   Umesh

Employee ⋈ Department (this is equivalent to Employee ⋈ Employee.Dept = Department.Dept Department
with the duplicate Dept column removed)
e-id   e-name   Dept       Manager
11     Bhupi    Computer   Anisha
13     Anju     Finance    Manisha
43     Manju    Computer   Anisha
54     Nisha    Finance    Manisha
Note: The natural join is a special case of the equijoin: the join condition is equality on all
common attributes, and each common attribute appears only once in the result.
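
A natural join can be sketched in a few lines of Python. The list-of-dictionaries representation and the function name natural_join are assumptions for illustration; the function pairs up tuples that agree on every attribute name the two relations share and keeps each shared attribute once.

```python
EMPLOYEE = [
    {"e-id": 11, "e-name": "Bhupi", "Dept": "Computer"},
    {"e-id": 13, "e-name": "Anju", "Dept": "Finance"},
    {"e-id": 43, "e-name": "Manju", "Dept": "Computer"},
    {"e-id": 54, "e-name": "Nisha", "Dept": "Finance"},
]
DEPARTMENT = [
    {"Dept": "Computer", "Manager": "Anisha"},
    {"Dept": "Finance", "Manager": "Manisha"},
    {"Dept": "Production", "Manager": "Umesh"},
]

def natural_join(r, s):
    """Combine tuples of r and s that agree on all common attribute names."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                result.append({**x, **y})  # shared attributes appear only once
    return result

joined = natural_join(EMPLOYEE, DEPARTMENT)
pairs = [(row["e-name"], row["Manager"]) for row in joined]
```

The Production department has no employee, so it does not appear in the result.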

ii. Theta join operation:
The theta join operation is an extension of the natural join operation that allows us to specify the
join condition. The condition θ uses one of the comparison operators {=, <, <=, >, >=,
<>}. When the join condition is =, i.e. θ is =, the operation is called an equijoin.
Example:
Employee
e-id   e-name   salary
11     Bhupi    3000
13     Anju     4000
43     Manju    5000
54     Nisha    6000

Department
Dept-id   Manager
09        Anisha
22        Manisha
59        Umesh

Then Employee ⋈ e-id > Dept-id Department is
e-id e-name salary Dept-id Manager
11 Bhupi 3000 09 Anisha
13 Anju 4000 09 Anisha
43 Manju 5000 09 Anisha
43 Manju 5000 22 Manisha
54 Nisha 6000 09 Anisha
54 Nisha 6000 22 Manisha
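
A theta join is a Cartesian product followed by a selection on the join condition, which the following sketch makes explicit. Tuples are plain Python tuples, Dept-id values are written as integers, and the function name theta_join is an illustrative choice.

```python
EMPLOYEE = [(11, "Bhupi", 3000), (13, "Anju", 4000), (43, "Manju", 5000), (54, "Nisha", 6000)]
DEPARTMENT = [(9, "Anisha"), (22, "Manisha"), (59, "Umesh")]

def theta_join(r, s, condition):
    """Keep every combination (x, y) of tuples for which condition(x, y) holds."""
    return [x + y for x in r for y in s if condition(x, y)]

# Join condition: e-id > Dept-id
joined = theta_join(EMPLOYEE, DEPARTMENT, lambda e, d: e[0] > d[0])
id_pairs = [(row[0], row[3]) for row in joined]
```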

c) Division operation (÷):
It is denoted by the symbol ÷ and is suited to queries that include the phrase "for all". It takes
two relations and builds another relation, consisting of the values of an attribute of one
relation that match all the values in the other relation.

Example: consider the relations Depositor, Branch and Account below:

Depositor
customer-name   account-number
Pukar           A-101
Shikha          A-102
Anisha          A-201
Gaurab          A-209
Bikky           A-233
Binek           A-409
Kamala          A-511

Branch
branch-name       branch-city   Assets
Newroad-branch    Kathmandu     7000
Pokhara-branch    Pokhara       3000
Kirtipur-branch   Kirtipur      9000
Dodhara-branch    Lalitpur      6000
Kalanki-branch    Kathmandu     7200
Balkhu-branch     Kathmandu     2200
Banepa-branch     Banepa        4000

Account
Account-number branch-name Balance
A-101 Newroad-branch 50000
A-102 Kirtipur-branch 60000
A-201 Balkhu-branch 90000
A-206 Pokhara-branch 20000
A-301 Kalanki-branch 12000
A-401 Banepa-branch 22000
A-503 Dodhara-branch 41000

Suppose we want to find all the customers who have an account at all branches located in
Kathmandu.

Strategy: think of it as three steps.
First, we can obtain the names of all branches located in Kathmandu by
r1 = π branch-name (σ branch-city = "Kathmandu" (Branch))

branch-name
Newroad-branch
Kalanki-branch
Balkhu-branch

Second, we can find all (customer-name, branch-name) pairs for which the customer has an account by
r2 = π customer-name, branch-name (Depositor ⋈ Account)
customer-name branch-name
Pukar Newroad-branch
Shikha Kirtipur-branch
Anisha Balkhu-branch
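Finally, we need to find all customers who have an account at all branches located in Kathmandu.
The divide operation provides exactly those customers:
π customer-name, branch-name (Depositor ⋈ Account) ÷ π branch-name (σ branch-city = "Kathmandu" (Branch))
With the sample data above, the result is empty: Pukar and Anisha each have an account at only
one of the three Kathmandu branches, and Shikha has none there.
customer-name
(no tuples)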
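
Division can be checked mechanically with a small sketch. The function divide below is a hypothetical helper that handles the common case of a two-attribute dividend; r1 and r2 hold the intermediate results of the steps above. The two extra accounts for Pukar in r2_extended are invented here purely to show a non-empty answer.

```python
def divide(r, s):
    """r is a set of (a, b) pairs, s a set of b values.

    Returns every a that is paired in r with all b values in s.
    """
    candidates = {a for a, _ in r}
    return {a for a in candidates if all((a, b) in r for b in s)}

# r1: branches located in Kathmandu; r2: (customer, branch) pairs with an account.
r1 = {"Newroad-branch", "Kalanki-branch", "Balkhu-branch"}
r2 = {("Pukar", "Newroad-branch"), ("Shikha", "Kirtipur-branch"), ("Anisha", "Balkhu-branch")}

result = divide(r2, r1)

# Hypothetical extra tuples (not in the notes): Pukar now has an account at every branch in r1.
r2_extended = r2 | {("Pukar", "Kalanki-branch"), ("Pukar", "Balkhu-branch")}
result_extended = divide(r2_extended, r1)
```

With the data from the notes no customer qualifies; after adding the hypothetical accounts, only Pukar does.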

d) The Assignment Operation:
The assignment operation (←) provides a convenient way to express complex queries. It lets
us write out complex relational expressions in steps so that they can be more
easily understood.
The assignment operation is denoted by ← and works like assignment in a programming language.
Example:
Variable ← E, where E is any relational algebra expression.

3. Extended Relational Algebra Operations
1. Outer join operation:
The outer join operation is an extension of the join operation to deal with missing information.
There are three types of outer joins:
i. Left outer join operation (⟕):
It includes all tuples of the left-hand relation. Each is combined with its matching tuples from the
right-hand relation; a left tuple with no match is padded with NULL values.

Example:
Assume we have two relations, PEOPLE and MENU:
PEOPLE
Name    Age   Food
Alice   21    Hamburger
Bill    24    Pizza
Carl    23    Beer
Dina    19    Shrimp

MENU
Food        Day
Pizza       Monday
Hamburger   Tuesday
Chicken     Wednesday
Pasta       Thursday
Tacos       Friday

Then PEOPLE ⟕ MENU (joining on Food) is
Name Age people.Food menu.Food Day
Alice 21 Hamburger Hamburger Tuesday
Bill 24 Pizza Pizza Monday
Carl 23 Beer NULL NULL
Dina 19 Shrimp NULL NULL

ii. Right outer join (⟖):
It includes all tuples of the right-hand relation. Each is combined with its matching tuples from the
left-hand relation; a right tuple with no match is padded with NULL values.
Example:
Assume we have the two relations PEOPLE and MENU as above.
Then PEOPLE ⟖ MENU (joining on Food) is
Name Age people.Food menu.Food Day
Bill 24 Pizza Pizza Monday
Alice 21 Hamburger Hamburger Tuesday
NULL NULL NULL Chicken Wednesday
NULL NULL NULL Pasta Thursday
NULL NULL NULL Tacos Friday

iii. Full outer join (⟗):
It includes all tuples of both the left-hand and the right-hand relation; tuples with no match on
the other side are padded with NULL values.
Example:
Assume we have the two relations PEOPLE and MENU as above.
Then PEOPLE ⟗ MENU (joining on Food) is
Name Age people.Food menu.Food Day
Alice 21 Hamburger Hamburger Tuesday
Bill 24 Pizza Pizza Monday
Carl 23 Beer NULL NULL
Dina 19 Shrimp NULL NULL
NULL NULL NULL Chicken Wednesday
NULL NULL NULL Pasta Thursday
NULL NULL NULL Tacos Friday
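
The three outer joins differ only in which unmatched tuples they keep, which a single function with two flags can illustrate. This is a sketch under assumptions: the function outer_join and its parameters are hypothetical, the join condition is people.Food = menu.Food, and Python's None stands in for NULL.

```python
PEOPLE = [("Alice", 21, "Hamburger"), ("Bill", 24, "Pizza"),
          ("Carl", 23, "Beer"), ("Dina", 19, "Shrimp")]
MENU = [("Pizza", "Monday"), ("Hamburger", "Tuesday"), ("Chicken", "Wednesday"),
        ("Pasta", "Thursday"), ("Tacos", "Friday")]

def outer_join(people, menu, keep_left, keep_right):
    """Join on people.Food = menu.Food, padding unmatched tuples with None."""
    rows, matched_menu = [], set()
    for person in people:
        matches = [m for m in menu if m[0] == person[2]]
        for m in matches:
            rows.append(person + m)
            matched_menu.add(m)
        if not matches and keep_left:
            rows.append(person + (None, None))      # unmatched left tuple
    if keep_right:
        # Unmatched right tuples, padded on the left.
        rows += [(None, None, None) + m for m in menu if m not in matched_menu]
    return rows

left = outer_join(PEOPLE, MENU, keep_left=True, keep_right=False)
right = outer_join(PEOPLE, MENU, keep_left=False, keep_right=True)
full = outer_join(PEOPLE, MENU, keep_left=True, keep_right=True)
```

The full outer join contains exactly the tuples of the left and right outer joins taken together.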

2. Null values:
It is possible for tuples to have a null value, denoted by null, for some of their attributes.
• null signifies an unknown value or that a value does not exist.
• The result of any arithmetic expression involving null is null.
• Aggregate functions simply ignore null values (as in SQL).
• For duplicate elimination and grouping, null is treated like any other value, and two nulls
are assumed to be the same (as in SQL).

3. Generalized projection:
It extends the projection operation by allowing arithmetic functions to be used in the projection
list. The generalized projection operation has the form:
π F1, F2, …, Fn (E)
where E is any relational-algebra expression, and each of F1, F2, …, Fn is an arithmetic
expression involving constants and attributes in the schema of E.
Example: Given the relation instructor(ID, name, dept_name, salary),
where salary is the annual salary, we can get the same information but with the monthly salary using
the following generalized projection:
π ID, name, dept_name, salary/12 (instructor)

4. Aggregate Functions
Aggregate functions are functions that take a collection of values and return a single
value as a result. The aggregation operation is denoted by the symbol 𝒢 (read "calligraphic G").
Some aggregate functions are: sum, avg, count, max, min.
Example: let's take a relation "Fulltime-works" with the tuples below:

Fulltime-works
employee-name   branch-name    Salary
Ram             Patan-branch   30000
Shyam           Tokha-branch   20000
Rehman          Palpa-branch   40000
Ram             Patan-branch   25000

Problem: "Find the total salary of all the full-time employees at each branch."
branch-name 𝒢 sum(salary) (Fulltime-works)
The result of aggregate function with grouping specified above will be:
branch-name sum of salary
Patan-branch 55000
Tokha-branch 20000
Palpa-branch 40000

Problem: Find the minimum salary:
𝒢 min(salary) (Fulltime-works)
Result: MIN(salary)
20000

Problem: Count the number of employees in the Patan-branch:
𝒢 count(employee-name) (σ branch-name = "Patan-branch" (Fulltime-works))
Result: COUNT(employee-name)
2
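
The three aggregate queries above can be verified with plain Python. In this sketch the relation is a list of tuples, so duplicates are retained as the count example requires, and sum_by_branch is a hypothetical helper that plays the role of grouping by branch-name.

```python
from collections import defaultdict

FULLTIME_WORKS = [
    ("Ram", "Patan-branch", 30000),
    ("Shyam", "Tokha-branch", 20000),
    ("Rehman", "Palpa-branch", 40000),
    ("Ram", "Patan-branch", 25000),
]

def sum_by_branch(rows):
    """Group the tuples by branch-name and sum the salaries within each group."""
    totals = defaultdict(int)
    for _name, branch, salary in rows:
        totals[branch] += salary
    return dict(totals)

total_salary = sum_by_branch(FULLTIME_WORKS)
min_salary = min(salary for _, _, salary in FULLTIME_WORKS)
# Selection on branch-name first, then count the remaining tuples.
patan_count = sum(1 for _, branch, _ in FULLTIME_WORKS if branch == "Patan-branch")
```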

Database Manipulation:
Until now we have only extracted information from the database. In this section we will
modify the database. There are three types of operations for
modifying the database: insertion, deletion and updating.
All these operations can be expressed using the assignment operator.
1. Insertion operation:
To insert data into a relation, we specify a tuple (or a set of tuples) to be inserted.
Syntax: R ← R ∪ E
where R is a relation and E is a relational algebra expression.
Example: suppose we have a relation
Employee (Name, Salary, Address)
Suppose we wish to insert an employee "Bhupi" who has a salary of 50,000 and lives in Kathmandu. Then we
write,
Employee ← Employee ∪ {("Bhupi", 50000, "Kathmandu")}

2. Deletion operation:
We can remove selected tuples from the database. We can delete only whole tuples; we cannot
delete the values of particular attributes only.
Syntax: R ← R − E
where R is a relation and E is a relational algebra expression.
Example: Delete all information about Anju from the Employee relation:
Employee
e-id e-name Salary
11 Bhupi 3000
13 Anju 4000
43 Manju 5000
54 Nisha 6000
33 Anju 3400

Employee ← Employee − σ e-name = "Anju" (Employee)
Result:
Employee
e-id e-name Salary
11 Bhupi 3000
43 Manju 5000
54 Nisha 6000

3. Updating Operation:
In some situations we may wish to change a value in a tuple without changing all the values in the
tuple. We can use the generalized-projection operator to do this task.
Syntax: Relation ← π <F1, F2, …, Fn> (Relation)
where each Fi is either the i-th attribute of the relation (if it is not updated) or an expression
that computes its new value.
Example: All employees working in the department "Computer" receive a 15% salary increase.
Employee
e-id   e-name   department   salary
11     Bhupi    Computer     3000
13     Anju     Math         4000
43     Manju    Physics      5000
54     Nisha    Computer     6000
33     Anisha   Math         6400

Employee ← π e-id, e-name, department, salary + salary*0.15 (σ department = "Computer" (Employee))
∪ π e-id, e-name, department, salary (σ department ≠ "Computer" (Employee))
Result: Employee
e-id e-name department salary
11 Bhupi Computer 3450
13 Anju Math 4000
43 Manju Physics 5000
54 Nisha Computer 6900
33 Anisha Math 6400
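The update can be sketched in Python as the union of a projected "raised" part and an unchanged part. This is only an illustration; the integer arithmetic (s * 15 // 100) is a choice made here to avoid floating-point rounding.

```python
# Employee(e-id, e-name, department, salary) as a set of tuples.
employee = {
    (11, "Bhupi", "Computer", 3000),
    (13, "Anju", "Math", 4000),
    (43, "Manju", "Physics", 5000),
    (54, "Nisha", "Computer", 6000),
    (33, "Anisha", "Math", 6400),
}

# Generalized projection over the selected "Computer" tuples:
# salary is replaced by salary + 15% of salary.
raised = {(i, n, d, s + s * 15 // 100) for (i, n, d, s) in employee if d == "Computer"}

# Tuples of every other department are kept as they are.
unchanged = {t for t in employee if t[2] != "Computer"}

employee = raised | unchanged
print(sorted(employee))
```

Leaving out the `unchanged` part would silently drop every non-Computer employee, which is why the expression needs the union.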

Relational Algebra Examples:


Example: Consider the relational database:
employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)

Give an expression in relational algebra for each of the following requests:
a) Find the names of all employees who work for “NIBL Bank”.
Π person-name (σ company-name=“NIBL Bank” (works))
b) Find the names and cities of residence of all employees who work for “NIBL Bank”.
Π person-name, city (σ company-name=“NIBL Bank” (employee ⋈ works))

c) Find the names, street addresses, and cities of residence of all employees who work for
“Software Company” and earn more than 50000 per month.
Π person-name, street, city (σ company-name=“Software Company” ∧ salary > 50000 (employee ⋈ works))

d) Find the names of all employees in the database who live in the same city as the company for
which they work.
Π person-name (employee ⋈ works ⋈ company)

e) Find the names of all employees in the database who do not work for “SBI Bank”.
Π person-name (σ company-name≠“SBI Bank” (works))

f) Find the names of all employees who earn more than every employee of “SBI Bank”.
Temp ← 𝒢 max(salary) as max-salary (σ company-name=“SBI Bank” (works))
Π person-name (σ salary > max-salary (works × Temp))

g) Assume a company may be located in several cities. Find all companies located in every
city in which “SBI Bank” is located.
Π company-name, city (company) ÷ Π city (σ company-name=“SBI Bank” (company))

h) Give all employees of “SBI Bank” a 15% salary rise.
works ← Π person-name, company-name, salary + salary*0.15 (σ company-name=“SBI Bank” (works))
∪ σ company-name≠“SBI Bank” (works)

i) Delete all tuples in the employee relation where the employee’s city is “Kathmandu”.
employee ← employee − σ city=“Kathmandu” (employee)
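Request (g) relies on the division operator, which is the least familiar of the operators used above. The following Python sketch shows what division computes. The `divide` helper and the sample company tuples are hypothetical, introduced only for illustration.

```python
# company(company-name, city); the rows are hypothetical sample data.
company = {
    ("SBI Bank", "Kathmandu"),
    ("SBI Bank", "Pokhara"),
    ("NIBL Bank", "Kathmandu"),
    ("NIBL Bank", "Pokhara"),
    ("NIBL Bank", "Butwal"),
    ("Software Company", "Kathmandu"),
}


def divide(dividend, divisor):
    """Return every x such that (x, y) is in dividend for all y in divisor."""
    candidates = {x for (x, _) in dividend}
    return {x for x in candidates if all((x, y) in dividend for y in divisor)}


# PI city (sigma company-name="SBI Bank" (company))
sbi_cities = {city for (name, city) in company if name == "SBI Bank"}

# PI company-name, city (company) divided by the SBI Bank cities
result = divide(company, sbi_cities)
print(sorted(result))
```

With this data, "Software Company" is excluded because it has no office in Pokhara, while "SBI Bank" itself trivially qualifies.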

3.3 Relational Calculus
Relational calculus is non-procedural: it is a language for expressing what we want without
expressing how to obtain it. Relational calculus expressions use variables. In tuple relational
calculus, the variables range over the tuples of a relation. In domain relational calculus, the
variables range over the values of a domain.

a) Tuple Relational Calculus

• A logical language with variables ranging over tuples:
{ T | Cond }
Return all tuples T that satisfy the condition Cond.
• { T | R(T) }: returns all tuples T such that T is a tuple in relation R.
• { T.name | FACULTY(T) AND T.DeptId = 'CS' }: returns the values of the name field
of all faculty tuples with the value 'CS' in their department id field.
- The variable T is said to be free since it is not bound by a quantifier (for all, exists).
- The result of this statement is a relation (or a set of tuples) that corresponds to all possible
ways to satisfy this statement.
- Find all possible instances of T that make this statement true.
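A tuple relational calculus expression reads almost like a set comprehension. The Python sketch below evaluates the two expressions above. The `Faculty` type and its rows are hypothetical sample data.

```python
from typing import NamedTuple


class Faculty(NamedTuple):
    name: str
    dept_id: str


FACULTY = {
    Faculty("Sita", "CS"),
    Faculty("Ram", "MATH"),
    Faculty("Hari", "CS"),
}

# { T | FACULTY(T) }: every tuple of the relation.
all_faculty = {T for T in FACULTY}

# { T.name | FACULTY(T) AND T.DeptId = 'CS' }
cs_names = {T.name for T in FACULTY if T.dept_id == "CS"}

print(sorted(cs_names))
```

The comprehension only says which tuples qualify, not how to find them, which is the non-procedural flavour of the calculus.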

The basic construct of tuple calculus is a tuple calculus expression. Tuple calculus expressions
are made up of the following constructs or elements.
• Tuple Variable :
A tuple variable is a variable that ‘ranges over’ some named relation, i.e. a variable
whose only permitted values are tuples of that relation.
Tuple variables are denoted by uppercase letters, for example T, U, V etc. If the tuple
variable T represents tuple t (at a given time), then the expression T.A represents the A
component of t (at that time), where A is an attribute of the relation over which T
ranges.

• Conditions :
Conditions are of the form x * y, where * is any relational operator (=, !=, <, <=, >, >=),
at least one of x and y is an expression of the form T.A, and the other is either a similar
expression or a constant.

• Well Formed Formulas (WFFs) :
A WFF is constructed from conditions, Boolean operators (AND, OR, NOT) and
quantifiers (∃, ∀) according to the following rules:
- Every condition is a WFF.
- If f is a WFF, then (f) and NOT(f) are also WFFs.
- If f and g are WFFs, then (f AND g) and (f OR g) are also WFFs.
- If f is a WFF in which T occurs as a free variable, then ∃T(f) and ∀T(f) are also WFFs.
- Nothing else is a WFF.

• Criteria for free and bound variables :
- Within a condition, all tuple variable occurrences are free.
- Tuple variable occurrences in the WFFs (f) and NOT(f) are free/bound according as
they are free/bound in f.
- Tuple variable occurrences in the WFFs (f AND g) and (f OR g) are free/bound
according as they are free/bound in f or g (whichever of the two they occur in).
- Occurrences of T that are free in f are bound in the WFFs ∃T(f) and ∀T(f). Other
tuple variable occurrences in f are free/bound in these WFFs as they are
free/bound in f.
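The free/bound criteria above translate directly into a recursive function. The sketch below assumes a hypothetical encoding of WFFs as nested tuples (this encoding is not part of the notes): ("cond", [variables]), ("paren", f), ("not", f), ("and", f, g), ("or", f, g), ("exists", T, f) and ("forall", T, f).

```python
def free_vars(wff):
    """Return the set of tuple variables that occur free in a WFF."""
    kind = wff[0]
    if kind == "cond":
        # Within a condition, every tuple variable occurrence is free.
        return set(wff[1])
    if kind in ("paren", "not"):
        # (f) and NOT(f): same free variables as f.
        return free_vars(wff[1])
    if kind in ("and", "or"):
        # (f AND g), (f OR g): free in f or free in g.
        return free_vars(wff[1]) | free_vars(wff[2])
    if kind in ("exists", "forall"):
        # The quantifier binds its variable; everything else is unchanged.
        _, var, body = wff
        return free_vars(body) - {var}
    raise ValueError(f"unknown WFF node: {kind!r}")


# EXISTS U (T.A = U.B AND U.C > 5)
formula = ("exists", "U", ("and", ("cond", ["T", "U"]), ("cond", ["U"])))
print(free_vars(formula))
```

Here U is bound by the quantifier, so T is the only free variable of the formula.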

• Tuple calculus expression :
The form of the expression is
T.A, U.B, ..., V.C [WHERE f]
where T, U, ..., V are tuple variables; A, B, ..., C are attributes of the associated
relations; and f is a WFF containing exactly T, U, ..., V as free variables. The value of
this expression is a projection of that subset of the Cartesian product T x U x ... x V for
which f evaluates to true. If “WHERE f” is omitted, the value is a projection of the entire
Cartesian product.

Quantified Statements
• Each variable T ranges over all possible tuples in the universe.
• Variables can be constrained by quantified statements to tuples in a single relation:
- Existential Quantifier. ∃T ∈ R(Cond) will succeed if Cond succeeds for at least one tuple T
in R.
- Universal Quantifier. ∀T ∈ R(Cond) will succeed if Cond succeeds for all tuples T in R.
• Any variable that is not bound by a quantifier is said to be free.
• A tuple relational calculus expression of the form { T | Cond } may contain at most one free
variable.
• The following two expressions are equivalent:
{ T.name | FACULTY(T) AND T.DeptId = 'CS' }
is the same as:
{ R | ∃T ∈ FACULTY (T.DeptId = 'CS' AND R.name = T.name) }
• The first expression can be read as:
“Find all tuples T such that T is a tuple in the FACULTY relation and the value of its DeptId
field is 'CS'. Return a tuple with a single field name, which is equal to the name field of one
such tuple T.”
• The second expression, { R | ∃T ∈ FACULTY (T.DeptId = 'CS' AND R.name = T.name) },
can be read as:
“Find all tuples R such that there exists a tuple T in FACULTY with the DeptId field value
'CS', and the value of the name field of R is equal to the name field of this tuple T.”
Alternative reading:
“Find all tuples R that can be obtained by copying the name field of some tuple in
FACULTY with the value 'CS' in its DeptId attribute.”
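The two quantifiers behave like Python's any() and all(). The sketch below uses them to check, on hypothetical sample data, that the two equivalent expressions above give the same answer. The `exists` and `forall` helpers and the finite `candidate_names` universe for R are assumptions made for the illustration.

```python
FACULTY = [
    {"name": "Sita", "DeptId": "CS"},
    {"name": "Ram", "DeptId": "MATH"},
    {"name": "Hari", "DeptId": "CS"},
]


def exists(relation, cond):
    """Existential quantifier: Cond holds for at least one tuple of the relation."""
    return any(cond(t) for t in relation)


def forall(relation, cond):
    """Universal quantifier: Cond holds for every tuple of the relation."""
    return all(cond(t) for t in relation)


# { T.name | FACULTY(T) AND T.DeptId = 'CS' }
first = {T["name"] for T in FACULTY if T["DeptId"] == "CS"}

# { R | EXISTS T in FACULTY (T.DeptId = 'CS' AND R.name = T.name) }
# R must range over a finite set here, so a small universe of names is assumed.
candidate_names = {"Sita", "Ram", "Hari", "Gita"}
second = {
    r
    for r in candidate_names
    if exists(FACULTY, lambda T: T["DeptId"] == "CS" and r == T["name"])
}

print(first == second)
```

The need for `candidate_names` hints at why a free variable must be restricted to a finite domain.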

Tuple Relational Calculus Syntax
An atomic query condition is any of the following expressions:
• R(T), where T is a tuple variable and R is a relation name.
• T.A oper S.B, where T and S are tuple variables, A and B are attribute names, and oper is a
comparison operator.
• T.A oper const, where T is a tuple variable, A is an attribute name, oper is a comparison
operator, and const is a constant.
The satisfaction of atomic query conditions is defined in the usual way:
• R(T) evaluates to true if T is a tuple in relation R.
If T is a variable, then R(T) can be made true by substituting one of the tuples in R for T.
• T.A oper const evaluates to true if the comparison holds.
If T is an unbound variable, then this expression can be made true by any substitution
for T whose A value satisfies the condition.

Query Conditions
• Any atomic query condition is a query condition.
• If C1 and C2 are query conditions, then so are C1 AND C2, C1 OR C2, and NOT C1.
• If C is a query condition, R is a relation name, and T is a tuple variable, then ∀T ∈ R(C) and
∃T ∈ R(C) are both query conditions.
• A well formed tuple relational calculus query is an expression of the form
{ T | C }, where C is a query condition in which all variables except T are bound by
quantifiers, and T is restricted to a finite domain.

Query Condition Examples
• { T | STUDENT(T) AND FACULTY(T) } is satisfied only if T is a tuple in both the
STUDENT and FACULTY relations. However, this is not possible since the schemas of the
two relations are different, so a tuple of one can never be identical to a tuple of the other.
• Correct way to write this statement:
{ T | STUDENT(T) AND ∃T2 ∈ FACULTY (T.Name = T2.Name AND T.Address
= T2.Address AND T.Password = T2.Password AND T.Id = T2.Id) }
• What is the result of the following statement?
{ T.DeptId | STUDENT(T) AND T.DeptId = 'CS' }
Since T has no DeptId field, what is the value of T.DeptId?
Answer. NULL

How about the following statements?
Temp1 = { T | T.A = 5 }
Temp2 = { T | T.A > 5 }

• Temp2 = { T | T.A > 5 } is an example of an unbounded expression: the tuple T can be
instantiated to infinitely many values. This is not allowed. All tuple variables should be
restricted to the tuples of a specific relation, even if they are not quantified.
• If a tuple variable T is bound to a relation R, then it only has values for the attributes in R.
All other attribute values are null.
• A well formed query will have a single free (unquantified) variable. All other variables will
have a quantifier over them.

Example 1:
• Find the equivalent statement to this:
SELECT DISTINCT F.Name, C.CrsCode FROM FACULTY F, CLASS C
WHERE F.Id = C.InstructorId AND C.Year = 2002
{ T | ∃F ∈ FACULTY (∃C ∈ CLASS
(F.Id = C.InstructorId AND C.Year = 2002 AND
T.Name = F.Name AND T.CrsCode = C.CrsCode)) }

• Find the equivalent statement to this:
SELECT DISTINCT F.Name FROM FACULTY F WHERE NOT EXISTS
(SELECT * FROM CLASS C WHERE F.Id = C.InstructorId AND C.Year = 2002)
{ F.Name | FACULTY(F) AND NOT(∃C ∈ CLASS (
F.Id = C.InstructorId AND C.Year = 2002)) }
or carry the “NOT” inside the parentheses:
{ F.Name | FACULTY(F) AND (∀C ∈ CLASS (
F.Id <> C.InstructorId OR C.Year <> 2002)) }
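Carrying the NOT inside the quantifier is an application of De Morgan's laws: NOT ∃C (p AND q) is the same as ∀C (NOT p OR NOT q). The Python sketch below evaluates both forms on hypothetical FACULTY and CLASS rows to show that they agree.

```python
# FACULTY(Id, Name) and CLASS(InstructorId, CrsCode, Year); hypothetical rows.
FACULTY = [(1, "Acorn"), (2, "Birch"), (3, "Cedar")]
CLASS = [(1, "CS101", 2002), (2, "CS102", 2001), (1, "CS103", 2001)]

# { F.Name | FACULTY(F) AND NOT(EXISTS C in CLASS (F.Id = C.InstructorId AND C.Year = 2002)) }
not_exists_form = {
    name
    for (fid, name) in FACULTY
    if not any(fid == iid and year == 2002 for (iid, _, year) in CLASS)
}

# { F.Name | FACULTY(F) AND FORALL C in CLASS (F.Id <> C.InstructorId OR C.Year <> 2002) }
for_all_form = {
    name
    for (fid, name) in FACULTY
    if all(fid != iid or year != 2002 for (iid, _, year) in CLASS)
}

print(sorted(not_exists_form), not_exists_form == for_all_form)
```

Note that "Cedar", who teaches no class at all, is included by both forms: a universal condition over an empty set of matching classes is true.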

Example 2:
• Find all students who have taken all the courses required by 'CSCI4380'.
{ S.Name | STUDENT(S) AND ∀R ∈ REQUIRES (R.CrsCode <> 'CSCI4380' OR (∃T
∈ TRANSCRIPT (T.StudId = S.Id AND T.CrsCode = R.PrereqCrsCode AND
T.Grade IN ('A', 'B', 'C', 'D')))) }
• Find all students who have never taken a course from 'Prof. Acorn'. Return the name of the
student.
{ S.Name | STUDENT(S) AND ∀C ∈ CLASS (∃F ∈ FACULTY (F.Id = C.InstructorId
AND (NOT(F.Name LIKE '%Acorn') OR
NOT(∃T ∈ TRANSCRIPT (S.Id = T.StudId AND C.CrsCode = T.CrsCode AND
C.Year = T.Year AND C.SectionId = T.SectionId))))) }
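The first query of Example 2 combines a universal quantifier with "R.CrsCode <> 'CSCI4380' OR ...", which plays the role of an implication: if R is a requirement of CSCI4380, then the student has a passing transcript entry for it. A Python sketch on hypothetical rows (the attribute layout follows the expression above; the data is invented for illustration):

```python
# STUDENT(Id, Name), REQUIRES(CrsCode, PrereqCrsCode),
# TRANSCRIPT(StudId, CrsCode, Grade); all rows are hypothetical.
STUDENT = [(1, "Asha"), (2, "Bikash")]
REQUIRES = [
    ("CSCI4380", "CSCI2300"),
    ("CSCI4380", "CSCI1200"),
    ("CSCI4500", "CSCI4380"),
]
TRANSCRIPT = [
    (1, "CSCI2300", "A"),
    (1, "CSCI1200", "C"),
    (2, "CSCI2300", "B"),
    (2, "CSCI1200", "F"),
]
PASSING = {"A", "B", "C", "D"}

result = {
    name
    for (sid, name) in STUDENT
    if all(
        # Rows about other courses satisfy the condition trivially.
        crs != "CSCI4380"
        or any(
            t_sid == sid and t_crs == prereq and grade in PASSING
            for (t_sid, t_crs, grade) in TRANSCRIPT
        )
        for (crs, prereq) in REQUIRES
    )
}

print(sorted(result))
```

Bikash is excluded because his grade in CSCI1200 is not in the passing set.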

Questions on Tuple Relational Calculus :
Consider the following Relations :
1. Suppliers (SID, Sname, Rating)
2. Parts (PID, Pname, Color)
3. Catalog (SID,PID, Cost)

Query 1 : SID of suppliers whose rating is greater than 10.


Solution :

Query 2 : Sname of suppliers who supplied some part.


Solution :

Query 3 : Sname of suppliers who supply some red part.


Solution :

Query 4 : SID of suppliers who supplied some red part.


Solution :

Query 5 : SID of suppliers who supplied some red part or Green part.
Solution :

Query 6 : SID of suppliers who supplied some red and some Green parts.
Solution :

Query 7 : SID of suppliers who supplied every part.


Solution :
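As a way to check answers to these questions, the queries can be evaluated over a small instance. The Python sketch below covers Queries 1, 3, 6 and 7. It is one possible formulation, not the worked solution of these notes, and all rows and the `supplies_colour` helper are hypothetical.

```python
# Suppliers(SID, Sname, Rating), Parts(PID, Pname, Color), Catalog(SID, PID, Cost).
SUPPLIERS = [(1, "Acme", 12), (2, "Bolt", 8), (3, "Core", 15)]
PARTS = [(10, "nut", "Red"), (11, "bolt", "Green"), (12, "gear", "Blue")]
CATALOG = [(1, 10, 5), (1, 11, 6), (1, 12, 7), (2, 10, 4), (3, 11, 9)]


def supplies_colour(sid, colour):
    """EXISTS C in Catalog, P in Parts (C.SID = sid AND C.PID = P.PID AND P.Color = colour)."""
    return any(
        c_sid == sid and c_pid == pid and col == colour
        for (c_sid, c_pid, _) in CATALOG
        for (pid, _, col) in PARTS
    )


# Query 1: { T.SID | Suppliers(T) AND T.Rating > 10 }
q1 = {sid for (sid, _, rating) in SUPPLIERS if rating > 10}

# Query 3: Sname of suppliers who supply some red part.
q3 = {sname for (sid, sname, _) in SUPPLIERS if supplies_colour(sid, "Red")}

# Query 6: SID of suppliers who supplied some red and some green part.
q6 = {
    sid
    for (sid, _, _) in SUPPLIERS
    if supplies_colour(sid, "Red") and supplies_colour(sid, "Green")
}

# Query 7: SID of suppliers who supplied every part (universal quantifier over Parts).
q7 = {
    sid
    for (sid, _, _) in SUPPLIERS
    if all(
        any(c_sid == sid and c_pid == pid for (c_sid, c_pid, _) in CATALOG)
        for (pid, _, _) in PARTS
    )
}

print(q1, q3, q6, q7)
```

Query 5 (red or green) differs from Query 6 only in replacing `and` with `or`.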

b) Domain Oriented Relational Calculus
The domain oriented relational calculus differs from the tuple calculus in that its variables range
over domains rather than relations.
Expressions of the domain calculus are constructed from the following elements.
• Domain Variables :
Domain variables are denoted by uppercase letters, for example D, E, F etc. Each domain
variable is constrained to range over some specified domain.
• Conditions :
Conditions can take two forms :
1. Simple comparisons :
The form is x * y, the same as for the tuple calculus, except that x and y are now
domain variables (or constants).
2. Membership conditions :
The form is R (term, term, ...). Here R is a relation and each “term” is a pair A :
V, where A is an attribute of R and V is either a domain variable or a constant.

• Well Formed Formulas (WFFs):
Same as in the tuple calculus section, but with the revised definition of a condition.

• Free and Bound Variables :
Same as in the tuple calculus.

• Domain Calculus Expressions :
A domain calculus expression is then an expression of the form D, E, ..., F [WHERE f]
where D, E, ..., F are domain variables and f is a WFF containing exactly D, E, ..., F
as free variables. The value of this expression is that subset of the Cartesian product D
x E x ... x F (where D, E, ..., F range over all their possible values) for which f
evaluates to true or, if “WHERE f” is omitted, the entire Cartesian product.
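The definition above (a subset of the Cartesian product of the domains, filtered by f) can be followed literally in Python. This is only a sketch: the relation R(name, dept) and the two finite domains are hypothetical, and real systems never enumerate whole domains in this way.

```python
from itertools import product

# Hypothetical relation R(name, dept) and finite domains for D and E.
R = {("Bhupi", "Computer"), ("Anju", "Math")}
NAMES = {"Bhupi", "Anju", "Manju"}
DEPTS = {"Computer", "Math"}

# D, E WHERE R(name: D, dept: E) AND E = "Computer"
# Enumerate the Cartesian product of the domains and keep the pairs where f is true.
result = {
    (d, e)
    for d, e in product(NAMES, DEPTS)
    if (d, e) in R and e == "Computer"
}

# With "WHERE f" omitted, the value is the entire Cartesian product.
everything = set(product(NAMES, DEPTS))

print(result, len(everything))
```

The membership condition is what ties the otherwise independent domain variables back to actual tuples of R.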

In the tuple relational calculus, variables range over the tuples of a relation. In
the domain relational calculus, variables are also used, but in this case they take their
values from the domains of attributes rather than from the tuples of relations. A domain
relational calculus expression has the following general format:

{d1, d2, ..., dn | F(d1, d2, ..., dm)}, m ≥ n

where d1, d2, ..., dn, ..., dm stand for domain variables and F(d1, d2, ..., dm) stands for a
formula composed of atoms.

Formal Definitions

1. An expression is of the form
{< x1, x2, ..., xn > | P(x1, x2, ..., xn)}
where x1, x2, ..., xn represent domain variables, and P is a formula.

2. An atom in the domain relational calculus has one of the following forms:
o < x1, x2, ..., xn > ∈ r, where r is a relation on n attributes, and x1, x2, ..., xn are domain
variables or constants.
o x Θ y, where x and y are domain variables, and Θ is a comparison operator.
o x Θ c, where c is a constant.
3. Formulae are built up from atoms using the following rules:
o An atom is a formula.
o If P1 is a formula, then so are ¬P1 and (P1).
o If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2, and P1 ⇒ P2.
o If P1(x) is a formula where x is a domain variable, then so
are ∃x (P1(x)) and ∀x (P1(x)).

Example Queries
We now give domain-relational-calculus queries for the examples that we considered earlier.
Note the similarity of these expressions and the corresponding tuple-relational-calculus
expressions.
i. Find the instructor ID, name, dept name, and salary for instructors whose salary is greater
than $80,000: {< i, n, d, s > | < i, n, d, s > ∈ instructor ∧ s > 80000}
ii. Find all instructor IDs for instructors whose salary is greater than $80,000:
{< i > | ∃ n, d, s (< i, n, d, s > ∈ instructor ∧ s > 80000)}
iii. Find the names of all instructors in the Physics department together with the course id of all
courses they teach:
{< n, c > | ∃ i, a, se, y (< i, c, a, se, y > ∈ teaches ∧ ∃ d, s (< i, n, d, s > ∈ instructor ∧ d =
“Physics”))}
iv. Find the set of all courses taught in the Fall 2009 semester, the Spring 2010 semester, or
both: {< c > | ∃ a, s, y, b, r, t (< c, a, s, y, b, r, t > ∈ section ∧ s = “Fall” ∧ y = “2009”)
∨ ∃ a, s, y, b, r, t (< c, a, s, y, b, r, t > ∈ section ∧ s = “Spring” ∧ y = “2010”)}

v. Find branch name, loan number, customer name and amount for loans of over $1200.

vi. Find all customers who have a loan for an amount greater than $1200.

vii. Find all customers having a loan from the SFU branch, and the city in which they live.

viii. Find all customers having a loan, an account or both at the SFU branch.

ix. Find all customers who have an account at all branches located in Brooklyn.

If you find this example difficult to understand, try rewriting this expression using implication,
as in the tuple relational calculus example. Here's my attempt:

I've used two letter variable names to get away from the problem of having to remember
what stands for.
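Queries i to iii of the Example Queries above can be evaluated with a short Python sketch in which each domain variable is bound to one component of a tuple. The instructor and teaches rows are hypothetical sample data, and the attribute order follows the expressions in those queries.

```python
# instructor(ID, name, dept_name, salary) and
# teaches(ID, course_id, sec_id, semester, year); hypothetical rows.
instructor = [
    (10101, "Srinivasan", "Comp. Sci.", 65000),
    (22222, "Einstein", "Physics", 95000),
    (33456, "Gold", "Physics", 87000),
]
teaches = [
    (22222, "PHY-101", 1, "Fall", 2009),
    (10101, "CS-101", 1, "Fall", 2009),
]

# i. {<i, n, d, s> | <i, n, d, s> in instructor AND s > 80000}
q_i = {(i, n, d, s) for (i, n, d, s) in instructor if s > 80000}

# ii. {<i> | EXISTS n, d, s (<i, n, d, s> in instructor AND s > 80000)}
q_ii = {i for (i, n, d, s) in instructor if s > 80000}

# iii. names of Physics instructors together with the courses they teach;
# the shared variable i of the formula becomes the join condition i2 == i.
q_iii = {
    (n, c)
    for (i, c, a, se, y) in teaches
    for (i2, n, d, s) in instructor
    if i2 == i and d == "Physics"
}

print(sorted(q_ii), sorted(q_iii))
```

Variables that are existentially quantified in the formulas (n, d, s in query ii) are simply the components that do not appear in the output.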
