Database Systems



Department of Electrical Engineering

Addis Ababa University

Betiglu Mengistu
2007
Jimma University
Faculty of Technology
Department of Electrical Engineering
EENG 477 - Database Systems 1st Semester, 2007/08 (2000 E.C.)

Course Objective: This course is intended for Electrical and Computer Engineering students. It aims to:
- familiarize them with the fundamentals of database systems and modeling techniques.
- provide foundational knowledge for the analysis, design and implementation of database systems.
- discuss issues related to storage and security.
- introduce distributed and parallel database concepts.

Course Outcome: By the end of this course, students are expected to:
- understand the fundamental concepts of database systems.
- identify the different modeling levels and techniques and apply them.
- analyze, design and implement a database for a specific system.

Course Outline

1. Fundamental Concepts of a Database System (2Hrs)
- Introduction to Database Systems
- Components and Functionalities of a Database System
- Types of Data Models
- Steps of Database Design

2. Entity - Relationship (E/R) Data Model (8Hrs)
- Elements of E/R Model
- Design Principles
- Keys and Constraints
- Enhanced E/R Modeling

3. Relational Data Model (8Hrs)
- Structure of Relational Database
- Entity-Relationship (E/R) Model to Relational Model Mapping
- Dependencies
  - Functional Dependencies
  - Multi-valued Dependencies
- Normal Forms and Normalization

4. Relational Algebra (6Hrs)
- Fundamental Operations of Relational Algebra
- Additional and Extended Operations
- Introduction to Relational Calculus

5. Structured Query Language (SQL) (8Hrs)
- Introduction
- Schema Definition in SQL
- Simple Query Constructs and Syntax
- Nested Sub-queries and Complex Queries
- Views
- Embedded and Dynamic SQL
[Relational Database Design and Implementation Project]

6. Data Storage and Querying (8Hrs)
- Storage and File Structure
- Indexing and Hashing
- Query Processing and Optimization

7. Security and Integration (8Hrs)
- Constraints and Triggers
- Security and Authorization
- Encryption and Authentication

8. Object Oriented Databases (8Hrs)
- Object Analysis
- Object Definition Language (ODL) Data Model
  - Overview of Object Oriented Concept
  - ODL Design and Syntax
- Object-Relational Data Model
- Object Oriented Databases

9. Introduction to Distributed and Parallel Databases (4Hrs)

References:
1. Database System Concepts: Silberschatz, Korth, Sudarshan; McGraw Hill; 4th Edition
2. Fundamentals of Database Systems: Elmasri, Navathe; Pearson; 4th Edition
3. Database Systems: The Complete Book: H. Garcia-Molina, J.D. Ullman, J. Widom; Prentice Hall; 1st Edition
4. Database Management Systems: Raghu Ramakrishnan, Johannes Gehrke; McGraw Hill; 2nd Edition

Course Assessment (Subject to change)
- Semester Project: 30%
- Quiz: 20%
- Final Examination: 50%

Betiglu Mengistu
Department of Electrical and Computer Engineering
Faculty of Technology
Addis Ababa University

1. Fundamental Concepts of a Database System

Introduction to Database Systems

What Is a Database System?

In its simplest form, a database can be viewed as a “repository for data” or “a collection of
data.” The repository is tasked with storing, maintaining and presenting large amounts of data
in a consistent and efficient fashion to applications and to the users of those applications.
The words “collection” and “repository” are very general; more specifically, a
database has the following implicit properties:
- It represents some aspect of the real world.
- It is a collection of coherent (related) data.
- It is designed, built and populated to address a specific real-world situation.
A Database Management System (DBMS) is then a tool for creating and managing such large
amounts of data efficiently and for allowing the data to persist over long periods of time. Hence a DBMS is
general-purpose software that facilitates the processes of defining, constructing, manipulating, and
sharing a database.
- Defining: specifying the data types, structures and constraints.
- Constructing: storing the data on a storage medium.
- Manipulating: retrieving data from, and updating data in, the storage.
- Sharing: allowing multiple users to access the data.
The phrase “Database System” is used colloquially to refer to a database together with its database
management system (DBMS).
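As a minimal sketch of the defining, constructing and manipulating functions listed above, the following Python snippet uses the standard-library sqlite3 module with an in-memory database standing in for the DBMS. The table name, column names and sample values are illustrative assumptions and are not part of the course material.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name   TEXT NOT NULL)"
)

# Constructing: store the initial data.
conn.executemany(
    "INSERT INTO employee (emp_id, name) VALUES (?, ?)",
    [(1, "Abebe"), (2, "Sara")],
)

# Manipulating: update and then retrieve data.
conn.execute("UPDATE employee SET name = ? WHERE emp_id = ?", ("Sara T.", 2))
rows = conn.execute("SELECT emp_id, name FROM employee ORDER BY emp_id").fetchall()
print(rows)
```

Sharing is not shown here: an in-memory SQLite database is private to one connection, whereas a server-based DBMS would accept connections from many users at once.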

Evolution of a Database System

During the past three decades, database technology for information systems has undergone
four generations of evolution, and a fifth generation is now emerging.
- The first generation was the file system, such as ISAM and VSAM.
- The second generation was hierarchical database systems, such as IMS and System 2000.
- The third generation was the network-model Conference on Data Systems Languages
(CODASYL) database systems, such as IDS, TOTAL, ADABAS and IDMS. The second
and third generation systems realized the sharing of an integrated database among many
users within an application environment.
- The fourth generation, relational database technology, arose
to solve the lack of data independence and the tedious navigational access to the database in
the second and third generations. Relational database technology is characterized by the
notion of a declarative query.
- Fifth-generation database technology will be characterized by a richer data model and a
richer set of database facilities, necessary to meet the requirements of applications beyond
the business data-processing applications for which the first four generations of database
technology were developed.

Landmarks in Database System History

- 1950s and early 1960s: Magnetic tape came into use for data storage. Reading data from
tapes and punched cards for processing was sequential.
- Late 1960s and 1970s: Hard disks came into play in the late 1960s and made direct data access
possible. A paper by Codd [1970] on the relational model, querying and relational
databases energized the database system industry.
- 1980s: During the 1970s, research and development activities in databases focused
on realizing relational database technology. These efforts culminated in the
introduction of commercially available systems in the late 1970s and early 1980s, such as Oracle,
SQL/DS, DB2 and INGRES, which became competitive with the hierarchical and network
database systems. A good deal of research was also published on distributed and
parallel database systems.
- Late 1990s: The growth of the WWW and of multimedia pushed database systems towards
more reliable and extensive operation. Moreover, object-oriented programming languages
created pressure for a unified programming and database language. The reason is that an
object-oriented programming language is built on object-oriented concepts, and
these concepts include a number of data modeling concepts, such as
aggregation, generalization, and membership relationships. An object-oriented database
system that supports such a unified object-oriented programming and database
language will be a better platform for developing object-oriented database applications
than an extended relational database system that supports an extended relational
database language.

Database System Requirements

Databases evolved to take responsibility for the data away from the application and, most
importantly, to enable data to be shared. Hence a database system must provide:
- Consistency: It must ensure that the data is not only stored consistently but can also be
retrieved and shared efficiently.
- Concurrency: It must enable multiple users and systems to retrieve the data at the same
time, and to do so logically and consistently.
- Performance: It must support reasonable response times.
- Standard adherence: It should support a standard language for common understanding.
Structured Query Language (SQL) has to be supported. The two categories of SQL statements are:
  - Data Definition Language (DDL): allows users to create new databases and specify
  their schema.
  - Data Manipulation Language (DML): enables users to query and manipulate data.
- Security: It should provide a way to set access permissions (much like files at the operating
system level) as well as specific database mechanisms such as triggers.
- Reliability: It must keep the stored data intact. Additionally, it must cope well when things
go awry and, if set up properly, be able to recover to a known consistent point.

Database System versus File System

The traditional file-processing system is the file-directory structure supported by a conventional
operating system. A file-system organization of data lacks a number of major features of a
database system, which leads to problems such as:
- Data redundancy and inconsistency: Files and applications in a file
system are likely to use different formats and standards. Moreover, the same information may be
duplicated in several places.
- Difficulty in accessing data: A file system does not support convenient, efficient and responsive
retrieval for new kinds of requests on existing data.
- Data isolation: Related data may be scattered across files.
- Integrity problems: Maintaining constraints across files and applications is
difficult.
- Atomicity problems: For an all-or-none set of operations it is crucial that, if a failure
occurs, the data is restored to its last consistent state. That is, the set of operations must
be performed as a single unified operation.
- Concurrent access anomalies: Supervision is difficult to provide because
the data may be accessed by many programs that are not coordinated with one another.
- Security problems: Adding application programs to the system in an ad hoc fashion makes
the system more vulnerable to security threats and attacks.
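The atomicity point can be made concrete with a small sketch. It assumes an in-memory SQLite database accessed through Python's sqlite3 module; the account table, the CHECK constraint and the transfer helper are hypothetical names chosen for illustration. A transfer consists of two updates that must succeed or fail together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no  TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-none unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


ok = transfer(conn, "A", "B", 30)
failed = transfer(conn, "A", "B", 500)  # A cannot cover this amount
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, failed, balances)
```

With plain files, the application itself would have to detect the failed second step and undo the first one; here the DBMS rolls back the whole unit.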

Database System Architecture

Centralized Database System Architecture

Centralized database systems are those that run on a single computer system and do not
interact with other computer systems except for displaying information on display terminals.
Such database systems range from single-user database systems that run on a single personal
computer to high-performance database systems that run on a mainframe.

Client/Server Architecture for a Database System

In the client/server architecture the client processes run separately from the server processes,
usually on a different computer. The architecture enables specialized servers and
workstations (clients). The general structure of the client/server architecture is shown below.

[Figure: four clients and three servers (a file server, a mail server and a DBMS server)]

Fig 1. Structure of client/server architecture

Two-Tier Client/Server Architecture is the simplest client/server application. In this
architecture the client processes provide an interface for the user, and gather and present data,
usually either on a screen on the user's computer or in a printed report. The server processes
provide an interface with the data storage. The logic that validates data, monitors security and
permissions, and performs other business rules can be fully contained on either the client or the
server, or partly on the client and partly on the server. The exact division of the logic varies
from system to system.
The logic for the application can also be designed to form a separate middle tier. Applications
that are designed with a separate middle tier have three logical tiers but still run on two physical
tiers. The middle tier may be contained in either the client or the server. Client/server
applications that are designed to run the user and business tiers of the application on the client
side, and the data tier on the server side, are known as fat client applications. On the other hand,
applications that are designed to run the user tier on the client side and the business and data
tiers on the server side are known as thin client applications. Though fat and thin client/server
architectures have three logical tiers, such applications are intended to run on two computers as two
physical tiers. If the three tiers are separated so that the application can be run on three separate
computers, the implementation is known as a three-tier application.
Three-Tier Client/Server Architecture is an application that has three modularly separated
tiers that can be run on three machines. The standard model for a three-tier application has a User
tier (GUI or Web Interface), a Business tier (Application Server or Web Server) and a Data tier
(Data Server).
- The user tier presents the user interface for the application, displays data and collects user
input. It also sends requests for data to the next tier. It is often known as the
presentation tier.
- The business tier incorporates the business rules for the application. It receives requests for
data from the user tier, evaluates them against the business rules and passes them on to the
data tier. It then receives data from the data tier and passes it back to the user tier. It is also
known as the business logic tier.
- Finally, at the base, the data tier comprises the data storage and a layer that passes data
from the data storage to the business tier and vice versa.

[Figure: a client, a web server and a data server arranged as three tiers]

Fig 2. Logical three-tier client/server architecture for a web application
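The division of responsibilities among the three tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all three tiers run in one process, the data tier is an in-memory SQLite database, and the class names, the orders table and the MAX_QUANTITY rule are hypothetical.

```python
import sqlite3


class DataTier:
    """Data tier: the only layer that talks to the data storage."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER)")

    def insert_order(self, item, quantity):
        self.conn.execute("INSERT INTO orders VALUES (?, ?)", (item, quantity))

    def fetch_orders(self):
        return self.conn.execute("SELECT item, quantity FROM orders").fetchall()


class BusinessTier:
    """Business tier: evaluates requests against the business rules."""

    MAX_QUANTITY = 10  # hypothetical business rule

    def __init__(self, data_tier):
        self.data = data_tier

    def place_order(self, item, quantity):
        if not 0 < quantity <= self.MAX_QUANTITY:
            return False  # rejected before it reaches the data tier
        self.data.insert_order(item, quantity)
        return True

    def list_orders(self):
        return self.data.fetch_orders()


class UserTier:
    """User (presentation) tier: collects input and formats output."""

    def __init__(self, business_tier):
        self.business = business_tier

    def submit(self, item, quantity):
        accepted = self.business.place_order(item, quantity)
        return f"{item} x{quantity}: " + ("accepted" if accepted else "rejected")

    def show(self):
        return [f"{item} x{qty}" for item, qty in self.business.list_orders()]


ui = UserTier(BusinessTier(DataTier()))
messages = [ui.submit("keyboard", 2), ui.submit("monitor", 50)]
listing = ui.show()
print(messages, listing)
```

Because each tier only calls the tier directly beneath it, any of the three could in principle be moved to a separate machine without changing the other two.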

Components and Functionalities of a Database System

A database system can be partitioned into two modules: the storage manager and the query processor.
Storage manager: a program module that provides the interface between the low-level data stored
in the database and the application programs or queries submitted to the system. The storage
manager translates the various DML statements into low-level file system commands (the
conventional operating system commands); thus it is responsible for storing, retrieving and
updating data. The main components of the storage manager are:
- Authorization and integrity manager: checks the credentials of users and tests the
integrity constraints.
- Transaction manager: preserves consistency despite system failures and avoids
conflicts between concurrent transactions.
- File manager: manages disk storage allocation and the data structures used for stored data.
- Buffer manager: is responsible for fetching data from disk storage into main memory.
Query processor: a module that handles queries as well as requests for modification of the
data and metadata. Some of its components are:
- DDL interpreter (compiler): processes DDL statements for schema definition (metadata)
and records the definitions in the data dictionary.
- DML compiler: analyzes, translates and optimizes DML statements in a high-level query
language into an evaluation plan consisting of low-level instructions for the query
evaluation (execution) engine.
- Query evaluation engine: executes the low-level instructions generated by the DML compiler.
The components of a general database management system are summarized in the figure
shown below.
[Figure: four layers. Entry points: application interfaces, application programs, query
tools and administration tools. Query processor: compiler and linker (producing application
program object code), DML compiler and organizer, DDL interpreter, query evaluation engine.
Storage manager: buffer manager, file manager, authorization and integrity manager, transaction
manager. Disk storage: data, indices, data dictionary, statistical data.]

Fig 3. Database System Structure
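One way to observe the query processor at work is to ask a DBMS for its evaluation plan. The sketch below assumes SQLite (via Python's sqlite3 module) and its EXPLAIN QUERY PLAN statement; the employee table, the index name and the plan helper are hypothetical. The exact wording of the plan text differs between SQLite versions, so the snippet only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)

query = "SELECT name FROM employee WHERE dept = ?"


def plan(conn, sql, params):
    """Return the textual steps of SQLite's evaluation plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


# Without an index the only available plan is a scan of the whole table.
plan_without_index = plan(conn, query, ("IT",))

# After an index is created, the optimizer can choose an index search instead.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_with_index = plan(conn, query, ("IT",))

print(plan_without_index)
print(plan_with_index)
```

The same DML statement yields two different plans, which illustrates that the DML compiler, not the user, decides how the data is actually accessed.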

Types of Data Models

A data model is one of the fundamental elements of database design and provides some level of
data abstraction. It is a collection of conceptual tools for describing the database schema
(structure), which includes data, data relationships, data semantics, and consistency constraints.
In the last three decades many data models have been proposed; some of them are:

Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and
child data segments. This structure implies that a record can have repeating information,
generally in the child data segments. Each record in a series of records has a set of field values
attached to it. The model collects all the instances of a specific record together as a record type. These
record types are the equivalent of tables in the relational model, with the individual records
being the equivalent of rows. To create links between these record types, the hierarchical model
uses parent-child relationships. In a hierarchical database the parent-child relationship is one to
many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs
were popular from the late 1960s, with the introduction of IBM's Information Management
System (IMS) DBMS, through the 1970s.
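The one-parent restriction can be illustrated with a small tree structure. This is a sketch in plain Python, not the API of IMS or any real hierarchical DBMS; the Segment class and the sample segment names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A data segment that has at most one parent and any number of children."""

    name: str
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name):
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Walk up the single chain of parents and report it from the root down."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " / ".join(reversed(names))


department = Segment("Department: Software")
employee = department.add_child("Employee: Kevin Jones")
skill = employee.add_child("Skill: SQL")
print(skill.path())
```

Since every segment stores a single parent reference, an employee who belongs under two parents would have to be stored twice, which is the repetition the text refers to.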

Network Model
Some data is more naturally modeled with more than one parent per child. The network model
therefore permits the modeling of many-to-many relationships in data. In 1971, the Conference on Data
Systems Languages (CODASYL) formally defined the network model. The basic data modeling
construct in the network model is the set construct. A set consists of an owner record type, a set
name, and a member record type. A member record type can have that role in more than one set;
hence the multi-parent concept is supported. An owner record type can also be a member or
owner in another set. The data model is a simple network, and link and intersection record types
may exist, as well as sets between them.
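The set construct can be sketched as a simple record with the three parts named above. The SetType class and the two sample sets are hypothetical and are not taken from the CODASYL specification; they only show how one member record type can take part in several sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetType:
    """A network-model set: a set name, an owner record type, a member record type."""

    name: str
    owner: str
    member: str


# EMPLOYEE is the member record type in two different sets (illustrative names).
set_types = [
    SetType("WORKS_IN", owner="DEPARTMENT", member="EMPLOYEE"),
    SetType("ASSIGNED_TO", owner="PROJECT", member="EMPLOYEE"),
]


def owners_of(member, set_types):
    """List every owner record type that the given member record type has."""
    return [s.owner for s in set_types if s.member == member]


employee_owners = owners_of("EMPLOYEE", set_types)
print(employee_owners)
```

Unlike the hierarchical sketch, nothing here limits a record type to one owner, which is how the multi-parent concept is supported.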

Relational Model
The history of the relational database began with E.F. Codd's 1970 paper, A Relational Model of
Data for Large Shared Data Banks. The concept builds on his principles of relational algebra.
Most of the database systems in use today are based on the relational model and are known as
Relational Database Management Systems (RDBMS).
The model allows the definition of data structures, storage and retrieval operations, and
integrity constraints. In such a database the data and the relations between them are organized in
tables. A table is a collection of records, and each record in a table contains the same fields,
organized in columns. The records in the table form the rows of the table.
Properties of Relational Tables:
- Values Are Atomic
- Each Row is Unique
- Column Values Are of the Same Kind
- The Sequence of Columns is Insignificant
- The Sequence of Rows is Insignificant
- Each Column Has a Unique Name
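Some of these properties can be checked against a real table. The sketch below assumes SQLite through Python's sqlite3 module; the employee table and its rows are illustrative. Row uniqueness is enforced here by declaring a primary key, which is an assumption of this example, since an SQL table without a key would accept duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, position TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Kevin Jones', 'Programmer')")

# Each row is unique: a second row with the same key is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Kevin Jones', 'Programmer')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The sequence of columns is insignificant: columns are addressed by name.
a = conn.execute("SELECT name, emp_id FROM employee").fetchone()
b = conn.execute("SELECT emp_id, name FROM employee").fetchone()
same_data = (a[1], a[0]) == b

# The sequence of rows is insignificant: a query result is treated as a set.
conn.execute("INSERT INTO employee VALUES (2, 'Sara', 'Team Leader')")
rows_as_set = set(conn.execute("SELECT emp_id, name FROM employee"))

print(duplicate_rejected, same_data, rows_as_set)
```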
The three models above (the so-called legacy data models, namely the hierarchical and network
models, together with the relational model) are categorized as implementation (or representational) data
models, which are closer to the physical structure of the database. Implementation data models
provide concepts that users can understand but that are not too far removed from the way data
is organized within the computer.
The other category of data model is the high-level (or conceptual) data model.

High-level (Conceptual) Model

Conceptual data models provide concepts that are close to the way many users perceive data, but they do not
specify the physical structure of the stored data.

Entity – Relationship Model

The Entity – Relationship (E/R) model is a conceptual model that perceives the world
in terms of entities, attributes and relationships.
- Entity: represents a real-world object or concept, such as an employee or an account.
- Attribute: describes an entity in the database, such as name and birth date for an employee,
or account number and balance for an account.
- Relationship: is an association among entities. For example, a customer entity is
related to an account entity in a banking system.
The E/R model is a diagrammatic model of a database schema drawn with rectangles,
ellipses, diamonds and lines.
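The three notions can be mirrored in code using the banking example from the text. This is a sketch only: the class names, field names, the depositor relationship and the sample values are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:          # an entity type
    customer_id: int     # attributes describe the entity
    name: str


@dataclass(frozen=True)
class Account:           # another entity type
    account_number: str
    balance: float


# Entities (instances).
c1 = Customer(1, "Customer One")
c2 = Customer(2, "Customer Two")
a1 = Account("ACC-1", 500.0)
a2 = Account("ACC-2", 1200.0)

# A relationship associates entities; here it is held as (customer, account) pairs.
depositor = {(c1, a1), (c1, a2), (c2, a2)}

accounts_of_c1 = sorted(a.account_number for c, a in depositor if c == c1)
holders_of_a2 = sorted(c.name for c, a in depositor if a == a2)
print(accounts_of_c1, holders_of_a2)
```

In an E/R diagram the two classes would be drawn as rectangles, their fields as ellipses, and depositor as a diamond connected to both rectangles by lines.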

Object-Oriented Model
The advancement of Object-Oriented Programming (OOP) has led to a new kind of database
management system, namely the Object DBMS (ODBMS). The object data model is a way of
modeling a database in an ODBMS. It can be regarded as a high-level implementation data model
that is closer to the conceptual model. It is based on object-oriented concepts, mainly for
ODBMS implementation, but can also be used in the data model of an RDBMS implementation.
This combination of the object-oriented data model with the relational model leads to a data model
known as the object-relational data model.

Steps of Database Design


Information Systems Design in general involves three steps
- Requirements analysis – specifies what the system is required to do based on user input.
- Design – specifies how the system will address the requirements.
- Implementation – translates design specifications into a working system.

Requirement analysis
Requirement analysis for a database design determines the data, information, system components,
and data processing and analysis functions required by the system. It involves
identifying and documenting the data required by users to meet present and future information
needs.
Requirements are determined by interviewing the producers and users of data and producing a
formal requirements specification. The specification includes the data required for processing,
the natural data relationships, and the constraints with respect to performance, integrity and security.
The requirements analysis should address the following questions:
- What user views are required (present and future)?
- What data elements are required in these user views?
- What are the primary keys that uniquely identify entities in the organization?
- What are the relationships between data elements?
- What are the operational requirements, such as security, integrity, and response time?
Steps in requirements analysis for database design:
1. Identify the scope of the design effort.
2. Establish metadata collection standards: whom to interview, what to collect, how to
structure the interviews.
3. Identify user views, extracted by reviewing user tasks and types of decisions. Forms,
reports, graphs and maps can be useful information for defining views.
A user view is the subset of data used by a user in a specific context.
4. Build a data dictionary: define and describe each item in detail (name, description, type,
length, range and relationships).
5. Identify data volumes and usage patterns: how much data is used and how frequently the
data changes.
6. Identify operational (functional) requirements.
The output of the requirement analysis can be broadly classified into two: data requirements
and functional requirements.
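Step 4 above lists the details recorded for each data dictionary item. A minimal sketch of one such entry is given below; the DictionaryEntry class, its accepts helper and the sample values are hypothetical, and a real project would normally keep the dictionary in a design tool or in the DBMS catalog.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DictionaryEntry:
    """One data dictionary item with the details named in step 4."""

    name: str
    description: str
    data_type: str
    length: Optional[int] = None
    value_range: Optional[Tuple[float, float]] = None
    related_to: Tuple[str, ...] = ()

    def accepts(self, value):
        """Check a candidate value against the documented range, if any."""
        if self.value_range is None:
            return True
        low, high = self.value_range
        return low <= value <= high


hourly_rate = DictionaryEntry(
    name="hourly_rate",
    description="Pay per hour for a part-time employee",
    data_type="decimal",
    value_range=(0, 1000),        # assumed limits, for illustration only
    related_to=("EMPLOYEES",),
)

in_range = hourly_rate.accepts(50)
out_of_range = hourly_rate.accepts(-1)
print(in_range, out_of_range)
```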

Design
The design of a database involves three design steps:
Conceptual Design: Synthesis of the information from requirements analysis according to
semantic rules. The outcome is a conceptual model, which describes entities,
attributes and relationships among entities independently of implementation details.
Implementation (Logical) Design: Transforms the conceptual data model into an internal
model, that is, a schema that can be processed by a particular DBMS. An example is the
mapping of an E/R model to a relational model.
Physical Design: Involves the design of internal storage structures, record formats, access
methods, record blocking and so on. [Requires a higher level study]
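The logical design step can be illustrated by turning a conceptual description of one entity set into a table definition. The sketch assumes SQLite as the target DBMS; the dictionary layout, the column types and the to_create_table helper are hypothetical, and a full mapping would also have to handle relationships and constraints.

```python
import sqlite3

# Conceptual description of one entity set (attribute types are assumptions).
entity = {
    "name": "customer",
    "key": "customer_id",
    "attributes": {"customer_id": "INTEGER", "name": "TEXT", "address": "TEXT"},
}


def to_create_table(entity):
    """Map an entity description to a CREATE TABLE statement (hypothetical helper)."""
    columns = []
    for attr, sql_type in entity["attributes"].items():
        suffix = " PRIMARY KEY" if attr == entity["key"] else ""
        columns.append(f"{attr} {sql_type}{suffix}")
    return f"CREATE TABLE {entity['name']} ({', '.join(columns)})"


ddl = to_create_table(entity)

# Implementation: hand the logical schema to a particular DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(ddl)
print(columns)
```

The entity description is DBMS independent, while the generated statement is tied to the SQL dialect of the chosen system.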

Implementation
Implementation of a database is simply translating the implementation design into one of the
database management systems, that is, creating the entities and/or objects of the
database schema together with their relationships and constraints.
The steps in database design are summarized in the following diagram.

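[Figure: design flow. A problem in part of the real world goes through Requirement Analysis,
which yields Functional Requirements and Data Requirements. Data Requirements feed Conceptual
Design (output: Conceptual Schema, a high-level data model), then Implementation (Logical) Design
(output: Implementation (Logical) Schema), then Physical Design (output: Internal Schema, a
low-level data model). Functional Requirements feed Functional Analysis (output: High-level
Transaction Specification), then Application Program Design (output: Application Program
Structure), then Implementation (output: Application Program). The upper part of the flow is
DBMS independent and the lower part is DBMS dependent.]

Fig 4. Database System Design Steps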



2. Entity – Relationship (E/R) Data Model


The database design stages seen in the previous chapter can be split into five phases:
1. Planning and Analysis
2. Conceptual Design
3. Implementation (Logical) Design
4. Physical Design, and
5. Implementation
The planning and analysis phase of a database design is an investigation phase, during which all
the needed information is gathered and analyzed. This stage is generally done with the help of
users, and is crucial to the second phase, the conceptual design. The borders between the
analysis, conceptual, and logical design phases are often blurred, and the designer moves back
and forth between them.
The conceptual design stage develops a conceptual data model that is later transformed into
the implementation model. Here the conceptual data model mainly refers to the Entity –
Relationship (E/R) Data Model.
In a full design the conceptual design also includes functional requirements, which describe
the kinds of operations (or transactions) that will be performed on the data.

Entity - Relationship (E/R) Data Model


The Entity-Relationship (E/R) data model was introduced by Peter Chen in 1976-77 as a way to
unify the network and relational database views. It has become very popular because it is
simple to create and read (being a diagrammatic representation), and because it can be used directly to
create a relational model, with its elements transformed into database elements. The E/R model
translates the analyzed information into data requirements, and is used to facilitate communication
between the database architect and the future users of the new system.
The E/R data model views the real world as a set of basic objects (entities) and relationships
among these objects. It is intended primarily for the database design process, allowing the
specification of an enterprise schema that represents the overall logical structure of the
database.

Elements of E/R Model

The three basic notions of the E/R model are:
- Entity: represents existing real-world objects or concepts, such as places, objects, events,
persons, orders, customers, and so on.
- Relationship: represents associations between objects, such as the fact that a customer
may place an order.
- Attribute: describes the entity, such as the invoice date or the customer first name.

Case Study (The case here is only for teaching purposes and is in no way related to any company)

Consider a database system to be developed for “XYZ Software Share
Company”. The following is a brief list of facts about the structure of the
company.

- The company runs various projects with a total of 68 full-time
employees and over 120 part-time employees.

- A project has a unique id and a name, and may be set up for a
new software development or for the release of a new version of
software that has been developed by the company.

- Projects have a start date, due date, completion date and
status that describe their progress. Every project is led by a senior
manager and organized into teams of five to eight programmers
coordinated by a team leader.

- The owners of the projects are the customers of the company. A
single customer can own one or more projects. Customers
have a unique id, a name and an address.

- The company is organized into departments that are identified by
a unique name and led by department heads. A department head
can lead only a single department during his/her employment with the
company.

- An employee can belong to only one department. Every employee
is identified by an id, and has a name, an address, and a position. In
addition, full-time employees have a monthly salary and allowance
rate, and part-time employees have a contract period and hourly
rate. The working schedule of both full-time and part-time
employees is maintained on a weekly basis.

Entity Sets
Entities are the principal data objects about which information is to be collected in the E/R model.
Entities are usually recognizable concepts, either concrete or abstract, such as persons, places,
things, or events that have relevance to the database. An entity set is then a set of
entities of the same type that share the same properties.
In the case study, some specific examples of entity sets are:
EMPLOYEES, PROJECTS, CUSTOMERS …
The candidate entities in the requirement statements are the nouns and the adjective-noun
phrases. The “EMPLOYEES” entity set represents the set of all employees and the “PROJECTS”
entity set represents the set of all projects.
Entities are classified as independent (strong) or dependent (weak).
- A strong (independent) entity is one that does not rely on other entities for identification.
- A weak (dependent) entity is one that relies on other entities for identification.
An individual occurrence of an entity set is also known as an instance (object).

Attributes
Attributes are descriptive properties that are associated with an entity. A set of attributes
describes an entity.
A particular instance of an attribute is called a value.
For example, “Employee Id” and “Name” are attributes of the “EMPLOYEES” entity set, and
“Kevin Jones” is one value of the attribute “Name”.
The domain of an attribute is the collection of all possible values the attribute can have. The
domain of “Name” is the set of character strings.
Attributes can be classified as identifiers or descriptors.
- Identifiers: more commonly called keys, uniquely identify an instance of an entity.
Example: “Employee Id” uniquely identifies an employee entity within the entity
set.
- Descriptors: describe a non-unique characteristic of an entity instance.
Example: “Name” is a descriptor for the “EMPLOYEES” entity set.
Compiled By: Betiglu

Another way of categorizing attributes is as simple and composite attributes.
& Simple Attributes: also known as atomic attributes, are attributes that cannot be
divided into subparts; they are mainly of primitive types.
Example: “Age” and “Gender” of the “EMPLOYEES” entity set.

Department of Electrical and Computer Engineering | AAU


EENG 477- Database Systems 4
2. Entity - Relationship (E/R) Data Model

& Composite Attributes: are attributes that are composed of smaller subparts, each of which
is itself an attribute and may be subdivided further.
Example: “Address” of the “EMPLOYEES” entity set, which can be divided into
“City”, “Home Address”, “Phone”, and “P.O. Box”.
Hierarchical Composite Attributes
[Figure: “Address” is composed of “City”, “Home Address”, “Phone”, and “P.O. Box”; “Home Address” is further composed of “SubCity”, “Kebele”, and “H. No.”]

Fig 1. Hierarchical Composite Attribute

Another classification of attributes is based on the values that they can hold: single-valued
and multi-valued attributes.
& Single-valued Attributes: are attributes having only one possible value at any time.
Example: “Name” and “Gender” of the “EMPLOYEES” entity set.
& Multi-valued Attributes: are attributes that may have more than one value at a time.
Example: “Address” of the “EMPLOYEES” entity set.
Attributes can also be categorized as stored and derived attributes.
& Derived Attributes: are attributes that can be calculated from related stored
attributes, entities or general states.
& Stored Attributes: on the other hand, are attributes that cannot be calculated
from other stored attributes.
Example: “Birth Date” of the “EMPLOYEES” entity set is a stored attribute,
whereas “Age” is a derived attribute that can be calculated from the “Birth Date”
and the “Current Date”.
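
The attribute categories above can be illustrated with a small Python sketch. It is only an illustration under assumptions: the class and field names are hypothetical, a separate list of phone numbers is assumed as the multi-valued attribute, and “Age” is computed in completed years.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class HomeAddress:
    # Component of the composite "Address" attribute; itself composite.
    sub_city: str
    kebele: str
    house_no: str


@dataclass
class Address:
    # Composite attribute: built from smaller attributes.
    city: str
    home_address: HomeAddress
    po_box: str


@dataclass
class Employee:
    emp_id: str        # identifier (key): simple, single-valued
    name: str          # descriptor: simple, single-valued
    birth_date: date   # stored attribute
    address: Address   # composite attribute
    # Multi-valued attribute (assumed here to be a list of phone numbers).
    phones: List[str] = field(default_factory=list)

    def age(self, today: date) -> int:
        # Derived attribute: computed from the stored birth date and the current date.
        years = today.year - self.birth_date.year
        if (today.month, today.day) < (self.birth_date.month, self.birth_date.day):
            years -= 1  # birthday has not yet occurred this year
        return years
```

Because “Age” is derived, it is not stored in the object; it is recomputed from “Birth Date” whenever it is needed.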

Relationship Sets
A Relationship represents an association between two or more entities. An example of a
relationship would be:
- “EMPLOYEES” are Assigned to “TEAMS”
- “CUSTOMERS” Owns “PROJECTS”


- “TEAMS” works on “PROJECTS”

A Relationship Set is then a set of relationships of the same type. The entities involved in
a relationship are known as participating entities, and the function an entity plays in a
relationship is called the entity’s role.
Example: In the Assigned relationship, the “EMPLOYEES” and “TEAMS” entity sets are the
participating entity sets, and an “EMPLOYEES” entity has a role such as “Programmer” or
“Team Leader” in the relationship.
Relationships are classified in terms of degree, connectivity, cardinality, and existence.
Degree: The degree of a relationship is the number of entity sets associated with the relationship.
The n-ary (multi-way) relationship is the general form for degree n. Special cases are the binary
and ternary relationships, where the degree is 2 and 3, respectively.
Connectivity: The connectivity of a relationship describes the mapping of associated entity
instances in the relationship. The values of connectivity are “one” or “many”. The basic types of
connectivity for binary relationships are one-to-one, one-to-many, and many-to-many.
Cardinality: The cardinality of a relationship is the actual number of related occurrences for
each of the two entities.

[Figure: three panels, each mapping elements a1 … a4 of an entity set A to elements b1 … b4 of an entity set B]

Fig 2. Relationship cardinalities
(a) one-to-one (b) one-to-many (c) many-to-many
Existence: denotes whether the existence of an entity instance is dependent upon the existence
of another, related, entity instance. The existence of an entity in a relationship is defined as
either mandatory or optional.
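
As a rough illustration of connectivity, the following Python sketch classifies a binary relationship instance given as a list of (a, b) pairs. The function name is hypothetical, and the result only describes what is observed in the given instance; the connectivity of a relationship set is a design decision that an instance can contradict but never prove.

```python
from collections import defaultdict


def connectivity(pairs):
    """Classify a binary relationship instance from A to B, given as (a, b) pairs."""
    b_per_a = defaultdict(set)  # for each a, the b entities it is related to
    a_per_b = defaultdict(set)  # for each b, the a entities it is related to
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"   # one a is related to many b
    if b_has_many:
        return "many-to-one"   # many a are related to one b
    return "one-to-one"
```

For example, `connectivity([("a1", "b1"), ("a1", "b2")])` reports a one-to-many mapping because a1 is related to two B entities.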

ª Design Principles

E/R Diagram
The Entity Relationship (E/R) data model is a diagrammatic data model. The elements of the
E/R model are represented by:
- Rectangles - for the entity sets,
- Ellipses - for the attributes,
- Diamonds - for the relationships,
- Lines - for the links between the attributes and the entity sets and between the entity
sets and the relationships,
- Double border Rectangles - for the weak entity sets,
- Double border Ellipses - for the multi-valued attributes,
- Dashed border Ellipses - for the derived attributes, and
- Arrow Head Lines - for the link between an entity set and a one-to-one or many-to-one
relationship. The arrow points to the entity set on the “one” side.
Example
- “EMPLOYEES” are Assigned to “TEAMS”
- “CUSTOMERS” Owns “PROJECTS”
- “TEAMS” works on “PROJECTS”

[Figure: E/R diagram. “TEAMS” (Name, Descr) is linked through “Assigned” to “EMPLOYEES” (EmpId, Name, BDate, Age, Address); “Address” is composed of City, H Addrs and Phone, and “H Addrs” of SubCity, Kebele and H No. “TEAMS” is linked through “WorksOn” to “PROJECTS” (ProjId, Name, SDate, DDate), and “PROJECTS” through “Owns” to “CUSTOMERS” (CustId, Name, Address).]
Fig 3. Sample Partial E/R Model

Composite attributes are represented by linked ellipses, as depicted in Fig 3 with the
attributes “Address” and “H Addrs”.
Relationship Attributes: Attribute(s) may be used in some relationships to describe the
relationship further. Consider the relationship “WorksOn” between the “TEAMS” and
“PROJECTS” entity sets. The relationship can be further described if an attribute “Task” is
added to it, as shown in Fig 4.

[Figure: relationship “WorksOn” between “PROJECTS” and “TEAMS”, with the attribute “Task” attached to the relationship]
Fig 4. Relationship Attributes

Multi-way Relationship: Consider the three-way relationship among the “PROJECTS”,
“TEAMS”, and “SOFTWARE” entity sets.
Cardinality Limits of a Relationship: The cardinality limit of a relationship is labeled as:
- 0..* or 0..∞ indicating that the entity participates zero or more times in the relationship.
- 1..* or 1..∞ indicating that the entity participates one or more times in the relationship.
- 0..1 indicating that the entity participates zero times or once in the relationship.
- 1..1 indicating that the entity participates exactly once in the relationship.
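[Figure: ternary relationship “WorksOn”, with attribute “Task”, among “SOFTWARE”, “TEAMS”, and “PROJECTS”, and binary relationship “Assigned” between “TEAMS” and “EMPLOYEES”; the links carry the cardinality limits 1..1, 1..*, 0..*, 1..1, and 5..8]
Fig 5. Multi-way Relationships and cardinality limits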

The multi-way (ternary) relationship shown in Fig 5 above can be reduced to binary
relationships by using an entity set in place of the relationship and adding three new
relationships that link the participating entity sets to the new entity set.
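[Figure: entity set “ASSIGNMENT”, with attribute “Task”, linked to “SOFTWARE” through “Produce”, to “TEAMS” through “Formed”, and to “PROJECTS” through “For”]
Fig 6. Multi-way Relationship to Binary Relationships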

If the multi-way relationship set that is transformed into the binary relationship had any
attributes, these are assigned to the entity set that replaces the relationship.
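
The reduction can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name and the surrogate identifiers “A1”, “A2”, … given to the new ASSIGNMENT entities are assumptions made only for the example.

```python
def reify(ternary):
    """Replace a ternary relationship by an entity set and three binary relationships.

    `ternary` is a list of (software, team, project, task) tuples, where
    `task` is the attribute of the original relationship.
    """
    assignments = {}                     # ASSIGNMENT entity set: id -> attributes
    produce, formed, for_ = [], [], []   # the three new binary relationships
    for i, (software, team, project, task) in enumerate(ternary, start=1):
        aid = f"A{i}"                    # hypothetical surrogate identifier
        # The relationship attribute moves to the entity set that replaces it.
        assignments[aid] = {"Task": task}
        produce.append((software, aid))
        formed.append((aid, team))
        for_.append((aid, project))
    return assignments, produce, formed, for_
```

Each tuple of the ternary relationship becomes one ASSIGNMENT entity, so no association among the three participants is lost.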
Entity Set Roles in a Relationship: In some relationships a single entity set may participate
more than once. In such a case, a label on the link line from the entity set is used to
differentiate each participation of the entity set.

[Figure: “TEAMS” and “EMPLOYEES” joined by the “Assigned” relationship, with link lines labeled “Assigned By” and “Assigned For”]
Fig 7. Role of an entity set in a Relationship.

Total Participation in a Relationship: The participation of an entity set in a relationship is
said to be total if every entity of the entity set is related to at least one entity of the other
participating entity sets through the relationship; otherwise the participation is said to be partial.
Total participation in the E/R model is represented by a double line from the entity set to the
relationship. Consider the relationship between the “SOFTWARE” and “ASSIGNMENT” entity
sets above. Every software product is produced in an assignment, hence the participation of
“SOFTWARE” in the relationship with “ASSIGNMENT” is total.

[Figure: “SOFTWARE” connected by a double line to the “Produce” relationship, which links it to “ASSIGNMENT” (attribute “Task”)]
Fig 8. Total Participation of the SOFTWARE entity set.

Design Issues
The following are some useful principles to be followed in designing databases.
1. Faithfulness - first and foremost, the design should be faithful to the specifications. That
is, classes or entity sets and their attributes should reflect reality.
2. Avoiding Redundancy - be careful to say everything only once.
3. Simplicity - avoid introducing more elements into your design than are absolutely
necessary.
4. Picking the Right kind of Element - Sometimes we have options regarding the type of
design element used to represent a real-world concept.
Use of Entity Set versus Attributes: Generally, if something has more information associated
with it than just its name, it probably needs to be an entity set. However, if it has only its
name to contribute to the design, then it is probably better to make it an attribute.
Example: A “SOFTWARE” entity set may have a “Version” attribute, or
“VERSION” can be argued to be an entity set.
Entity versus Relationship Sets: Since relationships often represent events, it is not always
clear whether a concept is best modeled as an entity set or as a relationship set.
Binary versus n-ary Relationship Sets: An n-ary relationship set can be replaced by an
entity set and a number of binary relationship sets, as in Fig 6. The n-ary form is
preferable when it shows more clearly that several entity sets take part in a single
relationship.

Remarks on Designing
- Choose meaningful naming for the entities, attributes and relationships.
- Use short links.
- Cluster the diagram if it has too many entities and relationships.

ª Keys and Constraints


Constraints in a database design are assertions that must hold so that the database faithfully
reflects the real-world system being modeled. Some of the commonly used classifications of
database constraints are:
1. Keys – are attributes or sets of attributes that can be used to uniquely identify an entity
within the entity set. A key in the E/R model is represented by underlining the attributes
belonging to the key.
2. Single-value constraints – are requirements that the value in a certain context be unique;
for example, that a given attribute has at most one value for each entity.
3. Referential integrity constraints – are requirements that an entity referenced by a
relationship actually exists in the database. Referential integrity in the E/R
model is represented by a rounded arrow head pointing to the entity set required for
existence.
4. Domain constraints – are requirements that an attribute value be in a specified range of
values. There is no specific notation, but side marks may be used to represent domain constraints.
5. Participation constraints – are assertions about whether the participation of an entity in a
relationship is total or partial.

Keys
As described above, keys are attributes or sets of attributes that suffice to distinguish entities from
each other.
A super key, also known as a super set, is a set of one or more attributes that collectively
identify an entity uniquely within the entity set.
Example: Consider the “EMPLOYEES” entity set, then
- “EmpId”, “EmpId, Name”, “NationalId”, “NationalId, BDate”, … are super keys
- “Name”, “BDate” are NOT super keys

REMARK
˜ If K is a super key, then any set of attributes that contains K is also a super key.

The more interesting super key is the minimal super key, which is referred to as a candidate key.
A candidate key is a set of attributes that is both sufficient and necessary to distinguish the
entities of an entity set: no proper subset of it is a super key.
Example: In the “EMPLOYEES” entity set
- “EmpId”, “NationalId”, “Name, BDate” (assuming that no two employees with the same
name are born on the same day) … are candidate
keys

[Figure: “PROJECTS” (ProjId, Name, SDate, DDate) related to “CUSTOMERS” (CustId, Name, Address) through “Owns”, with the primary key attributes underlined]
Fig 9. Primary key representation in E/R model

The designer of the database chooses which of the candidate keys to use for
implementation, and the choice has to be made carefully. Primary key is the term used for
the candidate key that the designer selects for implementation.
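
The difference between super keys and candidate keys can be explored on sample data with the sketch below. The function names are hypothetical. Note the limitation: a key is a statement about every possible instance of the entity set, so a sample can only show that an attribute set is not a key; uniqueness in the sample makes the set a plausible key, nothing more.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows of the sample agree on all attributes in `attrs`."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal attribute sets that are unique in the sample, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys
```

Every attribute set that contains one of the returned keys is a super key of the sample, which mirrors the remark that any set containing a super key is again a super key.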

ª Enhanced E/R Modeling


E/R modeling can be further extended to include concepts such as inheritance (specialization)
and aggregation.

Ñ Weak Entity Set


From the previous discussion, recall that a weak entity can be uniquely identified only
with the help of another entity. That is, for the entity to be uniquely distinguished, some of its
attributes have to be used in conjunction with the primary key of another entity.
The entity set that contributes its primary key is called the identifying or owner entity set. The
identifying entity set is then said to own the weak entity set.
For an entity set to be called a weak entity set, the following restrictions have to hold:
& The owner entity set and the weak entity set must participate in a one-to-many
relationship set (one owner entity is associated with one or more weak entities, but each
weak entity has a single owner). This relationship set is called the identifying
relationship or supporting relationship of the weak entity set.
& The weak entity set must have total participation in the identifying relationship.

To distinguish the weak entities that depend on one particular identifying (strong) entity, an
attribute or set of attributes of the weak entity set is used. Such an attribute or set of attributes is
referred to as the discriminator.
The key (primary key) of a weak entity set consists of:
1. Zero or more of its own attributes (the discriminator), and
2. Key attributes of the owner (identifying) entity set.
Notation for Weak Entity Set
- Weak entity sets are represented by Double border Rectangles.
- The identifying many-to-one relationship is represented by Double border Diamonds.
- If the entity set has a discriminator, then it is represented by underlining the
attribute(s).
Example:
- Consider the “TEAMS” entity set. Teams with the same name can be formed to work on
different projects. Thus neither “Name” nor “Description” can uniquely identify a
“TEAMS” entity. Rather, an entity is distinguished when it is related to a
“PROJECTS” entity. Note that the relationship in between is a many-to-one relationship
when it is seen from the “TEAMS” side.

[Figure: “PROJECTS” (ProjId, Name, SDate, DDate) and the weak entity set “TEAMS” (Name, Descr) connected by the identifying relationship “WorksOn”]
Fig 10. Weak entity set representation in E/R model

The two most common sources of weak entity sets are:


- Inheritance or specialization and generalization, and
- Multi-way relationships that are transformed into binary relationships through entity sets.

Ñ Specialization and Generalization


Sometimes grouping entity sets in a hierarchical structure can be helpful to show the association
between the entities in the real world.
Specialization is a top-down process of grouping entities that are similar in some ways and
distinct in others, in which the distinctions are made explicit.

Generalization is a bottom-up design process in which multiple entity sets are
synthesized into a higher-level entity set based on their common features.
Specialization and generalization in the E/R model are represented by a triangle labeled “ISA”
between the entity sets. The vertex of the triangle points towards the generalized (super class) entity
set.
Example:
- “FULL-TIME EMPLOYEES” and “PART-TIME EMPLOYEES” can be generalized
to form an entity set “EMPLOYEES”
- “PROJECTS” entity set may be further specialized into “WEB-BASED PROJECTS”
and “WIN32 PROJECTS”

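[Figure: “PROJECTS” connected through an “ISA” triangle to “WEB-BASED PROJECTS” and “WIN32 PROJECTS”; “EMPLOYEES” connected through an “ISA” triangle to “FULL-TIME EMPLOYEES” and “PART-TIME EMPLOYEES”]

Fig 11. Specialization and generalization in E/R model.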

Constraints on Specialization and Generalization


Condition-defined vs. User-defined Lower-level Entity Sets: - Members of condition-defined
specialized (lower-level) entity sets are those entities of the higher-level entity set that satisfy
an explicit condition or predicate; whereas user-defined members are assigned by the user when the
entities are entered, irrespective of any condition.
Example:
- Consider the “EMPLOYEES” entity set specialization. If there is a field named “Emp
Type” in the higher-level entity set that determines to which of the two lower-level entity
sets, “FULL-TIME EMPLOYEES” or “PART-TIME EMPLOYEES”, a given entity belongs,
then the lower-level entity sets are said to be condition-defined.
Disjoint vs. Overlapping Specialization: - Specialization of a higher-level entity set into lower-level
entity sets may be either disjoint or overlapping, based on whether an entity may belong to one or
more lower-level entity sets. If the specialization allows a higher-level entity to belong to
more than one lower-level entity set, then the lower-level entity sets are said to be overlapping;
otherwise they are disjoint.
Example:
- Consider the “PROJECTS” entity set specialization. If a given project is allowed to
have both flavors, “web-based” and “win32”, then the lower-level entity
sets are said to be overlapping. However, the “EMPLOYEES” specialization entity sets
are disjoint, since an employee can be either full-time or part-time but not both.
Total vs. Partial Specialization or Generalization: - If each higher-level entity in a
specialization or generalization must belong to at least one of the lower-level entity sets, then the
specialization or generalization is said to be total; however, if an entity need not
belong to any of the lower-level entity sets, it is said to be partial.
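
The two constraints can be checked on a given set of entities with a short Python sketch. The function name and the data layout (sets of entity identifiers) are assumptions made for illustration; like any instance check, it reports what holds for the given data, not what the design requires.

```python
def classify_specialization(higher, lower_sets):
    """Describe a specialization instance.

    `higher` is the set of higher-level entities; `lower_sets` maps the name
    of each lower-level entity set to the set of its members.
    """
    members = list(lower_sets.values())
    covered = set().union(*members)             # entities in at least one lower-level set
    total_members = sum(len(m) for m in members)
    # If some entity is counted twice, two lower-level sets share it.
    overlap = "disjoint" if total_members == len(covered) else "overlapping"
    completeness = "total" if higher <= covered else "partial"
    return overlap, completeness
```

With every employee placed in exactly one of “FULL-TIME EMPLOYEES” and “PART-TIME EMPLOYEES”, the function reports a disjoint and total specialization.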

Ñ Aggregation
One deficiency of E/R modeling is that a relationship is allowed only between entity
sets. But in some cases it may be advantageous to have a relationship between a relationship and
an entity set or a collection of entity sets.
Example:
- Consider the “WorksOn” relationship between “TEAMS” and “PROJECTS”. From the
case study, note that every project is to be led by a senior manager. Hence
the manager is responsible for managing the teams, the projects and the outcome of the
project, the software. Therefore the resulting E/R model would be as follows.

[Figure: “PROJECTS” and “TEAMS” joined by “WorksOn”, with a separate “Manages” relationship involving “EMPLOYEES”]
Fig 12. Redundancy in E/R model

As can be seen from the diagram above, a redundant loop is introduced because the E/R model does not
allow a relationship to be associated directly with another relationship.

An alternative that avoids the redundancy is the use of aggregation. Aggregation is
an abstraction through which a collection of related entity sets and relationships is treated as
a high-level entity. It allows a relationship set (identified by a box drawn around it) to
participate in another relationship set.
Example:
- The previous example can be alternatively represented as follows

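[Figure: “PROJECTS”, “WorksOn”, and “TEAMS” enclosed in a box that is treated as a single high-level entity; “Manages” links this box to “EMPLOYEES”]
Fig 13. Aggregation in E/R model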
3. Relational Data Model


The relational data model is an implementation (representational) model proposed by E.F. Codd in
1970. It is the data model on which Relational Database Management Systems (RDBMS) are
based.

ª Structure of Relational Database


The main construct for representing data in the relational database is a two-dimensional table
called a relation.
Example
- “EMPLOYEES” relation

Employees
EmpId | Name         | BDate    | Sub City | Kebele | Phone
E001  | Alemu Girma  | 01/10/70 | Bole     | 06     | 011-663-0712
E004  | Kelem Belete | 12/04/68 | Gulele   | 03     | 011-227-2525
Fig 1. Typical Employee relation instance

The columns in the table represent the attributes of the relation, and the rows (other
than the heading row) represent tuples (records) of the relation.
A relation in a relational model consists of:
& The Relation schema: - that describes the column heads for the table and
& The Relation instance: - that is the table with the set of tuples.
The set of relation schemas forms the schema for the relational database, called the database schema
(relational database schema).
In the relational model the relation schemas are described first. A relation schema specifies
- The relation's name
- Name for each attribute (field or column)
- Domain of each attribute: - A domain is referred to in a relation schema by the domain
name and has a set of associated values.
Example
- Employees (EmpId:string, Name:string, BDate:date, SubCity:string, Kebele:integer, Phone:string)
- Projects (PrjId:integer, Name:string, SDate:date, DDate:date, CDate:date)
- Teams (Name:string, Descr:string)

Properties of Relations
& Rows (tuples) in a single relation are unique (that is, no two tuples are identical).
& Relations are sets of tuples, not lists (that is, the order of tuples in a relation is immaterial).
& Attributes are atomic.
& The values that appear in a column must be drawn from the domain associated with that
column.
& The degree, also called arity, of a relation is the number of attributes in the relation.
& The relation names in a relational database are distinct.

Key Constraints
A key constraint is a statement that a certain minimal subset of the attributes of a relation is a
unique identifier for a tuple in the relation.
A set of attributes that uniquely identifies a tuple according to a key constraint is called a
candidate key for the relation, often abbreviated to just key.
Key attributes in the relational model are indicated by underlining the attributes in the relation schema.
Example
- Employees (EmpId, Name, BDate, SubCity, Kebele, Phone)
- Projects (PrjId, Name, SDate, DDate, CDate)
- Teams (Name, Descr)

REMARK: Note that a key for a relation may not be directly inferred from the high-level
conceptual models in some cases.

Foreign Key Constraints


The most common integrity constraint involving two relations is a foreign key constraint. It
keeps the data consistent when a relation is modified.
The foreign key in the referencing relation must match a primary key in the referenced
relation. That is, the foreign key attribute must have a data type compatible with the
primary key attribute of the referenced relation, and each of its values must appear there as a
primary key value.
Example
- Employees (EmpId, Name, BDate, SubCity, Kebele, Phone)
- WorkSchedule (SDate, EDate, HoursPerDay, Employee)

In the above example, for “WorkSchedule” to refer to the “Employees” relation instance, it
has an attribute ‘Employee’ of the same type as ‘EmpId’, the primary key of the “Employees”
relation. The foreign key constraint is implemented through the ‘Employee’ attribute in
the referencing relation “WorkSchedule”.

(a) Referencing relation: WorkSchedule
… | Hours | Employee
… | 8     | E001
… | 6     | E004
… | 8     | E002
… | 4     | E004

(b) Referenced relation: Employees
EmpId | Name         | …
E001  | Alemu Girma  | …
E004  | Kelem Belete | …
E002  | Mulken Getu  | …

Fig 2. Foreign Key Constraint in Relational Model

NOTE: - A single tuple can be referenced by zero or more tuples in the referencing relation, but
a single tuple with a single foreign key attribute can only reference one tuple.
- A foreign key could refer to the same relation.
- The relations of a relational database are related to one another through foreign keys.
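
A foreign key constraint can be checked on relation instances with a few lines of Python. The sketch below uses the rows of Fig 2, keeping only the columns shown there; the function name and the representation of tuples as dictionaries are assumptions made for the example.

```python
def dangling_references(referencing, fk_attr, referenced, pk_attr):
    """Return the rows of `referencing` whose foreign key matches no primary key."""
    primary_keys = {row[pk_attr] for row in referenced}
    return [row for row in referencing if row[fk_attr] not in primary_keys]


# Referenced relation (only the columns visible in Fig 2).
employees = [
    {"EmpId": "E001", "Name": "Alemu Girma"},
    {"EmpId": "E004", "Name": "Kelem Belete"},
    {"EmpId": "E002", "Name": "Mulken Getu"},
]
# Referencing relation: the 'Employee' attribute is the foreign key.
work_schedule = [
    {"Hours": 8, "Employee": "E001"},
    {"Hours": 6, "Employee": "E004"},
    {"Hours": 8, "Employee": "E002"},
    {"Hours": 4, "Employee": "E004"},
]

violations = dangling_references(work_schedule, "Employee", employees, "EmpId")
```

Here `violations` is empty, since every ‘Employee’ value appears as an ‘EmpId’. Employee E004 is referenced by two tuples, which the constraint allows.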

ª Entity-Relational (E/R) Model to Relational Model Mapping

The second phase in database design is implementation design, which transforms the conceptual
data model into an internal model (schema), such as a relational data model, for implementation
in a relational database management system (RDBMS).
The entity sets and relationship sets of an E/R diagram describe a schema. The actual
entities and relationships form an instance of that schema, which is not
part of the database design.

Ñ Entity Sets to Relations


Strong entity sets in E/R model are mapped to relations in relational model with the same name
and attributes. The primary keys assigned for the entity sets are also represented as keys in the
relations.
Example
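Consider the following E/R diagram from the case study

[Figure: E/R diagram. “TEAMS” (Name, Descr) is linked through “Assigned” to “EMPLOYEES” (EmpId, Name, BDate, Age, Address); “Address” is composed of City, H Addrs and Phone, and “H Addrs” of SubCity, Kebele and H No. “TEAMS” is linked through “WorksOn” to “PROJECTS” (ProjId, Name, SDate, DDate), and “PROJECTS” through “Owns” to “CUSTOMERS” (CustId, Name, Address).]
Fig 3. Partial E/R Model from Conceptual Data Model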

Then the relations from the strong entity sets having only simple and single valued
attributes are as follows
- Projects(ProjId, Name, SDate, DDate)
- Customers(CustId, Name, Address)

Handling Weak Entity Sets


Suppose W is a weak entity set with attribute set {a1, a2, a3, …, an} and identifying strong entity
set E, and let the primary key of E be the set {b1, b2, …, bm}. Then the attributes of the relation
for the weak entity set must include the attributes of its complete key (including those belonging to
the identifying strong entity set) and its own non-key attributes. That is, the set of attributes of
the mapping relation is {a1, a2, a3, …, an} ∪ {b1, b2, …, bm}.
The primary key for the weak entity set relation thus includes:
˜ The discriminator of the weak entity set, and
˜ The primary key of the identifying strong entity set.
Example
For the weak entity set (TEAMS) in figure 3 above the corresponding relation is:
- Teams(ProjId, Name, Descr)
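
One possible realization of this mapping is sketched below with Python's built-in sqlite3 module. The column types, the choice of SQLite, and the function name are assumptions made for illustration; the notes do not prescribe a particular DBMS.

```python
import sqlite3


def create_schema(conn):
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Strong entity set PROJECTS: its own key identifies each tuple.
    conn.execute("""
        CREATE TABLE Projects (
            ProjId INTEGER PRIMARY KEY,
            Name   TEXT,
            SDate  TEXT,
            DDate  TEXT
        )""")
    # Weak entity set TEAMS: the key combines the owner's key (ProjId)
    # with the discriminator (Name).
    conn.execute("""
        CREATE TABLE Teams (
            ProjId INTEGER NOT NULL REFERENCES Projects(ProjId),
            Name   TEXT NOT NULL,
            Descr  TEXT,
            PRIMARY KEY (ProjId, Name)
        )""")


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

With this schema two teams may share a name as long as they belong to different projects, while a team cannot exist without an owning project.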

Handling Composite and Multivalued Attributes


˜ Composite attributes from the E/R model can be represented in the relational model by
creating a separate attribute for each component of the composite attribute (note that the
composite attribute itself is not mapped into a separate attribute).
˜ Multivalued attributes are handled by creating a relation named after the attribute,
whose attributes correspond to the components of the multivalued attribute plus
the primary key of the entity set or relationship set to which the attribute belongs. The
primary key for the newly created relation consists of:
- The primary key of the entity set or relationship set, and
- The attribute or set of attributes from the multivalued attribute.
Example
Consider the EMPLOYEES entity set in figure 3 above. The corresponding relations for the
entity set are:
- Employees(EmpId, Name, BDate, Age)
- Addresses(City, SubCity, Kebele, HNo, Phone1, Phone2, EmpId)
REMARK
Note that if a multivalued attribute has a small, fixed multiplicity, it can be
represented by a separate attribute for each value. For example, consider the phone
attribute above.

Ñ Relationship Sets to Relations


Suppose an entity set E with primary key {a11, a12, a13, …, a1n} is related to an entity set F with
primary key {a21, a22, a23, …, a2m} through a relationship R. Let the relationship R have the
descriptive attribute set {b1, b2, b3, …, bp}. Then the relationship is represented by a relation
whose attributes are:
˜ The keys of the connected entity sets: {a11, a12, a13, …, a1n} ∪ {a21, a22, a23, …, a2m}, and
˜ Attributes of the relationship itself: {b1, b2, b3, …, bp}.
The union of the primary keys of the related entity sets forms a super key for the relationship
relation. If the relationship is many-to-many, this super key also becomes the primary key for the
relation; otherwise the primary key from the many side becomes the primary key for the relation.
Example
From figure 3 above the corresponding relations for the relationship sets are:
- Assigned(EmpId, ProjId, TeamName)

- Owns(ProjId, CustId)
NOTE: Supporting relationships (for example WorksOn) need not be transformed to relations if
their purpose is solely for identifying a weak entity set by passing on the identifying
strong entity set’s primary key to the weak entity set; otherwise they will introduce
redundancy.
Suppose entity sets E and F are related through a many-to-one relationship R from E to F. Then it
is possible to join the relations for E and R that come out of this E/R model into a single relation
S with a schema consisting of:
˜ All attributes of the entity set E,
˜ The key attributes of the entity set F, and
˜ All attributes of the relationship R.
If the participation of E in R is total, it is also possible to include all attributes of F in the
relation S and have one single relation S in place of the three relations E, F and R.
The primary key for S would be the primary key of E.
Example
Consider the entity sets “PROJECTS” and “CUSTOMERS” and the corresponding
relationship “Owns”, then we can have:
- Projects(ProjId, Name, SDate, DDate, CustId) and Customers(CustId, Name, Address),
or
- Projects(ProjId, Name, SDate, DDate, CustId, CustName, CustAddress), with the customer's
“Name” and “Address” renamed so that attribute names stay distinct
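
The first alternative, folding the many-to-one relationship into the relation for E, can be sketched as follows. The function name and the use of None for an entity that does not take part in the relationship are assumptions made for the example.

```python
def merge_many_to_one(e_rows, e_key, r_rows):
    """Fold a many-to-one relationship R (from E to F) into the relation for E.

    Each row of `r_rows` carries E's key attribute, F's key attributes and
    any attributes of R itself.
    """
    # Attributes contributed by R: everything except E's key.
    r_attrs = [a for a in (r_rows[0] if r_rows else {}) if a != e_key]
    # Many-to-one from E to F: at most one R row per E entity.
    by_key = {row[e_key]: row for row in r_rows}
    merged = []
    for row in e_rows:
        match = by_key.get(row[e_key])
        extra = {a: (match[a] if match else None) for a in r_attrs}
        merged.append({**row, **extra})
    return merged
```

An E entity that does not participate in R receives None for the added attributes, which is why the single-relation design is cleanest when the participation of E in R is total.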

Ñ Representation of Generalization and Specialization


Hierarchical structure (Specialization and Generalization or Inheritance) in the relational model can be
represented in three different ways:
1. E/R Style: One relation for each lower-level entity set and the higher-level entity set.
Every relation of a lower-level entity set will include:
& Key attribute(s) of the higher-level entity set, which form the primary key of the
relation, and
& Attributes of that lower-level entity set.
For total and disjoint generalization, the higher-level entity set need not be mapped into a
relation; instead all its attributes are passed to the relations of all immediate lower-level entity
sets.

2. Use of Nulls: One relation holding the attributes of all the lower-level entity
sets and of the higher-level entity set; entities have NULL in attributes that don’t belong to
them. This involves a large number of NULL values for disjoint generalization.
3. Object-Oriented Approach: One relation per subset of subclasses, with all relevant
attributes including:
& Attributes of the higher-level entity set, and
& Attributes of that lower-level entity set.
The primary key of the higher-level entity set becomes the primary key of each relation.
Example
Consider the entity sets “EMPLOYEES” and its lower-level entity sets, then
- FullTimeEmployees(EmpId, Salary, Saving, Allowance)
- PartTimeEmployees(EmpId, HourlyPay, ContractPeriod)

ª Dependencies
In a database design the two most common pitfalls that result in bad designing are:
& Repetition of information, and
& Inability to present certain information (Loss of information).

Ñ Functional Dependencies
Functional dependency is a kind of constraint that helps to remove redundancy in relational
database design.
Definition: A functional dependency, denoted by X → A, is an assertion about a relation R that
whenever two tuples of R agree on all the attributes of X, then they must also agree
on the attribute A. We say that “X → A holds in R” or “X functionally determines A”.
Note that in the notation X → A, X represents a set of attributes and A represents a single
attribute. That is, A1 A2 A3 … An → B.
The functional dependency is a generalization of the notion of superkey.
Example:
- Consider the Teams relation: Teams(PrjId, Name, Descr), then
PrjId, Name → Descr
- For the Employees relation:
Employees(EmpId, NationalId, Name, BDate, Age, Gender, City, HAddr, Phone)
EmpId → Name; EmpId → Age; Name BDate → Gender


A functional dependency A1 A2 A3…An → B is said to be a trivial dependency if B is an element
of {A1, A2, A3…An}.
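The definition above can be tried out on a concrete relation instance. The following is a minimal sketch, assuming a relation is stored as a list of Python dictionaries; the function name fd_holds and the three sample rows are hypothetical and are not part of the course material. Note that an instance can only refute a functional dependency; it cannot prove that the dependency holds for every legal state of the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, else returns the stored value
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample instance (only three attributes of Employees are shown)
employees = [
    {"EmpId": "E001", "Name": "Alemu Girma", "Age": 37},
    {"EmpId": "E004", "Name": "Kelem Belete", "Age": 39},
    {"EmpId": "E005", "Name": "Alemu Girma", "Age": 38},
]

print(fd_holds(employees, ["EmpId"], ["Name"]))  # True
print(fd_holds(employees, ["Name"], ["Age"]))    # False: same name, different ages
```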

Rules of Functional Dependency


Combining Rule
The functional dependencies:
A1 A2 A3…An → B1
A1 A2 A3…An → B2
:
A1 A2 A3…An → Bm
can be written as:
A1 A2 A3…An → B1 B2 … Bm
Splitting Rule
The functional dependency A1 A2 A3…An → B1 B2 … Bm can be written as A1 A2 A3…An → Bi for
i = 1, 2, 3, … m
Closure of Attributes
Suppose {A1, A2, …An} is a set of attributes and S is a set of functional dependencies in
relation R. The closure of the set {A1, A2, …An} under the functional dependency set S is the
set of all attributes B such that A1 A2 …An → B follows from the set S. The closure of the set
of attributes A1, A2, …An is denoted by {A1, A2, …An}+.
The closure of a set of attributes can be determined by repeatedly applying the following three
rules, known as Armstrong’s Axioms:
Reflexivity Rule

If α is a set of attributes and β ⊆ α, then α → β holds.

Augmentation Rule

If α → β holds and γ is a set of attributes, then γα → γβ holds.

Transitivity Rule

If α → β holds and β → γ holds, then α → γ holds.

The algorithm for computing the closure of a set of attributes {A1, A2, …An} is given below.
1. Let X be a set of attributes that eventually will become the closure. First, we initialize X
to be {A1, A2, …An}.
2. Now, we repeatedly search for some functional dependency B1 B2 …Bm → C such that all
of B1, B2...Bm are in the set of attributes X but C is not. We then add C to the set X.
3. Repeat step 2 as many times as necessary until no more attributes can be added to X.
4. The set X, after no more attributes can be added to it, is the closure {A1, A2, …An}+.
Example: Consider a relation with attributes A, B, C, D, E, and F. Suppose that this relation
has the functional dependencies AB → C, BC → AD, D → E, and CF → B. What is
the closure of {A, B}, that is {A, B}+?
Solution:
X = {A, B}
From the functional dependency AB → C, we add C to X, that is X = {A, B, C}
Similarly; BC → AD ⇒ X = {A, B, C, D}
D → E ⇒ X = {A, B, C, D, E}
No more changes in X are possible. Thus {A, B}+ = {A, B, C, D, E}
From the closure set it follows that AB → D.
Exercise: Test whether D → A follows from the functional dependency set.
To test for D → A, first determine the closure set of {D}
X = {D}
From the functional dependency D → E, we add E to X, that is X = {D, E}
No more changes in X are possible. Thus {D}+ = {D, E}
Since A is not in the closure set, D → A does not hold.
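The closure algorithm and the two worked results above can be sketched in a few lines. This is an illustrative sketch only: the function name closure and the encoding of each functional dependency as a pair of strings (one character per attribute) are assumptions made for the example.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when all of lhs is already in the result
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# AB -> C, BC -> AD, D -> E, CF -> B (from the example above)
fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("D", fds)))   # ['D', 'E']
```

A dependency X → A follows from the set exactly when A is in the closure of X, so the exercise reduces to a membership test on the returned set.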

Ñ Multivalued Dependencies
A multivalued dependency in a relation R is a constraint stating that when the values of one set of
attributes are fixed, the values of certain other attributes are independent of the values of all the
other attributes in R.
That is, for a multivalued dependency X →→ Y in R, where X and Y are subsets of the set of
attributes of R: if t and u are tuples in the relation instance r of the schema R that agree on all
the attributes of X, then there exists a tuple v in r that agrees:
1. with both t and u on the attributes of X,
2. with t on the attributes of Y, and
3. with u on all attributes of R that are not among X or Y (R – (X ∪ Y)).

Rules of Multivalued Dependency


A multivalued dependency is a generalization of a functional dependency. That is:

If α → β holds, then α →→ β also holds.

All the rules for functional dependencies except the splitting rule are also applicable to
multivalued dependencies.
Complementation Rule
One additional rule for multivalued dependencies that has no counterpart for functional
dependencies is the complementation rule.
The rule states that if X →→ Y holds, then X →→ (R – (X ∪ Y)) also holds, where R is the set of
attributes of the relational schema.

ª Normalization and Normal Forms


In relational databases, normalization is a process that helps to:
& eliminate redundancy,
& organize data efficiently,
& reduce the potential for anomalies during data operations, and
& improve data consistency.
The formal classifications used for quantifying "how normalized" a relational database is are called
normal forms (abbreviated as NF).

Ñ Normalization and Denormalization


Following standard database normalization recommendations when designing databases can
improve a database's performance by helping to:
& Reduce the total amount of redundant data in the database. The less data there is, the less
work the RDBMS has to perform, which speeds up its operation.
& Reduce the use of NULLs in the database. The use of NULLs in a database can
reduce database performance, especially in WHERE clauses.
& Reduce the number of columns in tables. The fewer columns a table has, the more
rows can fit on a single data page, which helps to boost read performance of the
RDBMS.
& Reduce the amount of SQL code. The less code there is, the less has to run, which speeds up
your application.
& Maximize the use of clustered indexes. The more data is separated into multiple tables
because of normalization, the more clustered indexes become available to help speed
up data access.
& Reduce the total number of indexes. The fewer columns tables have, the less need there is
for multiple indexes to retrieve the data. And the fewer indexes there are, the smaller the
negative performance effect of data insertion, modification and deletion.
Redundancy in a database design results in data anomalies classified as:
˜ Insertion Anomalies
˜ Deletion Anomalies
˜ Modification Anomalies
Example: Consider the relation schemas for Employees and Teams combined in a single relation as
follows:
Emp_Teams(EmpId, Name, BDate, Gender, TeamId, Project, TeamName)
It can easily be noted that there is redundancy of data in the “Emp_Teams” relation for
the team details of the employees. Consider the following instance of the relation:
EmpId  Name          BDate     Gender  TeamId  Project  TeamName
E001   Alemu Girma   01/10/70  M       1       1        Programmer
E001   Alemu Girma   01/10/70  M       3       2        Programmer
E004   Kelem Belete  12/04/68  M       2       1        Tester
E005   Mulu Tasew    10/05/69  F       3       2        Programmer
E008   Belachew K    02/11/62  M       1       1        Programmer
E003   Almaz B       05/06/65  F       5       3        Programmer
E005   Mulu Tasew    10/05/69  F       2       1        Tester

- Insertion Anomalies: Suppose we want to insert a new employee who works in project 1
as a programmer; then the corresponding fields for the team
details have to be entered correctly. If the data is entered incorrectly,
consistency will be violated.
- Deletion Anomalies: Suppose E003 is to be removed from the employees list; then the
team information of TeamId 5 will also be removed, and vice
versa.
- Modification Anomalies: During a data update, consistency may also be violated, as in
the case of insertion.
Although normalization is a way to remove redundancy and anomalies and to preserve consistency,
integrity and maintainability, it may also lead to:
& An increase in storage space
& Complex queries (queries with many joins of tables)
In such situations it may be desirable to denormalize some of the tables in order to reduce storage
space and the number of required joins.
Denormalization is the process of selectively taking normalized tables and re-combining the data
in them. Sometimes the addition of a single column of redundant data to a table from another
table can reduce a 4-way join into a 2-way join, significantly boosting performance by reducing
the time it takes to perform the join.
Databases intended for Online Transaction Processing (OLTP) are normalized. By contrast,
databases intended for On Line Analytical Processing (OLAP) operations are primarily "read
only" databases and tend to extract historical data that has accumulated in the project for quite a
long time. For such databases, redundant or "denormalized" data may facilitate Business
Intelligence applications.
While denormalization can boost storage and query performance, it can also have negative
effects. For example, by adding redundant data to tables, you risk the following problems:
& More data means the RDBMS has to read more data pages than otherwise needed,
hurting performance.
& Redundant data can lead to data anomalies and bad data.
& In many cases, extra code will have to be written to keep redundant data in separate
tables in synch, which adds to database overhead.

Ñ Normal Forms
Normalization procedure provides:
& A framework for analyzing relation schemas based on functional and multivalued
dependencies.
& A series of normal form tests that can be carried out on individual relation schemas so
that the relational database can be normalized to any degree.
Normalization through decomposition needs to preserve two additional
properties of a relational schema:
˜ Lossless or Nonadditive Join: The nonadditive join property guarantees that spurious
tuples are not generated when the decomposed relations are joined back.
˜ Dependency Preservation: Dependency preservation ensures that each functional
dependency is represented in one of the individual relations resulting from the decomposition.

First Normal Form (1NF)


A relation (table) R is in 1NF if and only if all underlying domains of attributes contain only
atomic (simple, indivisible) values, i.e. the value of any attribute in a tuple (row) must be a single
value from the domain of that attribute.
Normalizing to 1NF removes multivalued attributes, composite attributes and their combinations
from the relational schema.
Normalization (Decomposition)
Form a new relation for each non-atomic attribute or nested relation.

Second Normal Form (2NF)


A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully
functionally dependent on the primary key (i.e. not partially dependent on a candidate key).
A functional dependency X → Y is said to be a full functional dependency if the removal of any
attribute from X causes the dependency to no longer hold.
NOTE: Mostly relational schemas that are mapped carefully from E/R model are in 2NF.
Normalization (Decomposition)
Decompose and set up a new relation for each partial key with its dependent attribute(s). Make
sure to keep relation with the original primary key and any attributes that are fully functionally
dependent on it.
Example: Consider the relation schemas for Employees and Teams combined in a single relation as
follows:
Emp_Teams(EmpId, Name, BDate, Gender, TeamId, Project, TeamName)
EmpId → Name, BDate, Gender
TeamId → Project, TeamName
Then upon decomposition we will have
Employees(EmpId, Name, BDate, Gender)
Teams(TeamId, Project, TeamName)
Emp_Teams(EmpId, TeamId)
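The nonadditive join property of this decomposition can be checked on sample data. The sketch below projects the first four rows of the Emp_Teams instance shown earlier onto the three new schemas and joins them back. The helper names project, natural_join and as_set are hypothetical, and relations are assumed to be lists of Python dictionaries.

```python
def project(rows, attrs):
    """Project rows (list of dicts) onto attrs, removing duplicates."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in seen]

def natural_join(r, s):
    """Join two lists of dicts on their common attribute names."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]

def as_set(rows):
    """Order-independent view of a relation, used for comparison."""
    return {tuple(sorted(r.items())) for r in rows}

attrs = ["EmpId", "Name", "BDate", "Gender", "TeamId", "Project", "TeamName"]
data = [
    ("E001", "Alemu Girma", "01/10/70", "M", 1, 1, "Programmer"),
    ("E001", "Alemu Girma", "01/10/70", "M", 3, 2, "Programmer"),
    ("E004", "Kelem Belete", "12/04/68", "M", 2, 1, "Tester"),
    ("E005", "Mulu Tasew", "10/05/69", "F", 3, 2, "Programmer"),
]
emp_teams = [dict(zip(attrs, t)) for t in data]

employees = project(emp_teams, ["EmpId", "Name", "BDate", "Gender"])
teams = project(emp_teams, ["TeamId", "Project", "TeamName"])
links = project(emp_teams, ["EmpId", "TeamId"])

rejoined = natural_join(natural_join(employees, links), teams)
print(as_set(rejoined) == as_set(emp_teams))  # True: no spurious tuples
```

The redundancy is visible in the sizes: the employee and team details are each stored once per employee or team after decomposition, instead of once per assignment.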


Third Normal Form (3NF)


3NF for a relation schema R requires that R be in 2NF and that no nonprime
attribute of R be transitively dependent on the primary key. In summary, all non-key
attributes are mutually independent. Thus, any relation in which all the attributes are prime
attributes (part of some key) is guaranteed to be in at least 3NF.
That is, if X → Y is a non-trivial functional dependency in R, then
& X is a superkey for schema R, or
& Attribute Y is a member of a candidate key (prime attribute).
Normalization (Decomposition)
Decompose and set up a relation that includes the non-key attribute(s) that functionally
determine(s) other non-key attributes.

Boyce-Codd Normal Form (BCNF)


BCNF requires that there be no non-trivial functional dependencies of attributes on
something other than a superset of a candidate key (called a superkey). At this stage, all
attributes are dependent on a key, a whole key and nothing but a key (excluding trivial
dependencies).
A table is said to be in BCNF if and only if it is in 3NF and every non-trivial, left-irreducible
functional dependency has a candidate key as its determinant. In more informal
terms, a table is in BCNF if it is in 3NF and the only determinants are the candidate keys.
That is, if X → Y is a non-trivial functional dependency in R, then
& X is a superkey for schema R.
Note that major goals of database design with functional dependencies are:
& BCNF,
& Lossless join, and
& Dependency preservation;
However, in certain situations BCNF has to be relaxed to 3NF in order to preserve
dependencies.
Example:
A relation that is in 3NF but not in BCNF:
R(A, B, C, D) and F = {AB → CD, BC → AD, A → C}
AB and BC are candidate keys; thus
A → C does not violate 3NF (C is a prime attribute), whereas it violates BCNF since A is not a superkey.
A relation that is in both 3NF and BCNF:
R(A, B) is guaranteed to be in BCNF since its only possible functional
dependencies are A → B, B → A and/or the trivial AB → AB.
Example: Consider the Projects relation from the E/R model
- Projects(ProjId, Name, SDate, DDate, CustId, Name, Address)
ProjId → Name, SDate, DDate, CustId, Name, Address
CustId → Name, Address
Then upon decomposition we will have
Projects(ProjId, Name, SDate, DDate, CustId)
Customers(CustId, Name, Address)
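The BCNF condition (every non-trivial dependency has a superkey on its left side) can be tested mechanically with attribute closures. The following is a minimal sketch for the relation R(A, B, C, D) of the first example; the function names closure and bcnf_violations and the string encoding of dependencies are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) == set(all_attrs)
        if nontrivial and not is_superkey:
            bad.append((lhs, rhs))
    return bad

# R(A, B, C, D) with F = {AB -> CD, BC -> AD, A -> C}
fds = [("AB", "CD"), ("BC", "AD"), ("A", "C")]
print(bcnf_violations("ABCD", fds))  # [('A', 'C')]
```

Only A → C is reported, which agrees with the discussion above: AB and BC determine all attributes, while the closure of A is only {A, C}.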

Fourth Normal Form (4NF)


4NF requires that there be no non-trivial multivalued dependencies of attribute sets on
something other than a superset of a candidate key.
A table is said to be in 4NF if and only if it is in BCNF and its multivalued dependencies are
functional dependencies. 4NF removes the redundancy caused by multivalued
dependencies.
That is, if X →→ Y is a non-trivial multivalued dependency in R, then
& X is a superkey for schema R.
Example:
Consider relation R and its dependency set
R(A, B, C, D) and F = {AB → CD, AB → C}
Then the relation can be normalized as:
R1(A, B, C) and R2(A, B, D)

Fifth Normal Form (5NF or PJNF)


A join dependency (JD), denoted by JD(R1, R2, … Rn), specified on a relational schema R,
specifies a constraint on the state r of R. The constraint states that every legal state r of R has a
nonadditive join decomposition into R1, R2, … Rn.
A multivalued dependency is a special case of a join dependency where n = 2 (i.e. JD(R1, R2)
implies R1 ∩ R2 →→ (R1 − R2) and, using the complementation rule, R1 ∩ R2 →→ (R2 − R1)).
5NF, also known as Project-Join Normal Form (PJNF), requires that there be no non-trivial join
dependencies that do not follow from the key constraints. A table is said to be in 5NF if and
only if it is in 4NF and every join dependency in it is implied by the candidate keys.
That is, if JD(R1, R2, … Rn) is a non-trivial join dependency in R, then
& Every Ri is a superkey of R.
Although there are also higher-level normal forms such as DKNF and 6NF, most
relational database designs are sufficiently normalized at the BCNF level or even at 3NF.

4. Relational Algebra
Relational Algebra is a procedural query language that consists of a set of operations that take
one or two relations as input and produce a new relation as a result. The algebra operations
enable a user to specify retrieval requests on a relational model. The relations produced by
these operations can be further manipulated using operations of the relational algebra. A sequence
of relational algebra operations that produces a new relation forms a relational algebra expression.

ª Fundamental Operations of Relational Algebra


The core relational algebra, traditionally thought of as the relational algebra,
consists of the fundamental operations, which can be grouped into two according to the number of
relation operands of the operator. These are:
˜ Unary Operators.
- Selection (σ)
- Projection (Π)
- Rename (ρ)
˜ Binary Operators.
- Product (Cartesian Product) (×)
- Union (∪)
- Difference (–)
The binary operators listed above are also known as set operators, as they are derived from
set theory.

Ñ Unary Operations

Select Operation
The select operation selects the subset of tuples of a relation instance that satisfy a given
predicate (condition).
It is denoted by

σ_C(R)

Where σ represents the SELECT operator, C is a boolean expression of the select
condition, and R is the relation or relational algebra expression.
Example
- To extract Senior Managers, or employees earning at least 3000, from the “EMPLOYEES”
relation, the selection operation can be written as
- Employees(EmpId, Name, BDate, Age, Gender, Position, Salary)

σ_{Position = "Senior Manager"}(Employees)

σ_{Salary >= 3000}(Employees)

Project Operation
While the select operation picks certain rows from a relation, the projection operation forms a
new relation by picking certain columns of the relation.
It is denoted by:

Π_A(R)

Where Π represents the PROJECT operator and A is a set of attributes in the relation R.
Example
- To extract Employees Name and Position only from the “EMPLOYEES” relation

Π_{Name, Position}(Employees)

Π_{Name, Position}(σ_{Salary >= 3000}(Employees))
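The two unary operations above can be mimicked on in-memory data. This is only a sketch: the function names select and project, the representation of a relation as a list of dictionaries, and the three sample employees are all hypothetical.

```python
def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate rows."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

# Hypothetical sample data
employees = [
    {"Name": "Abebe", "Position": "Senior Manager", "Salary": 5000},
    {"Name": "Sara", "Position": "Programmer", "Salary": 3200},
    {"Name": "Hana", "Position": "Tester", "Salary": 2500},
]

# Projection of Name and Position over the selection Salary >= 3000
well_paid = project(select(employees, lambda r: r["Salary"] >= 3000),
                    ["Name", "Position"])
print(well_paid)
```

As in the algebra, the operations compose: the output of select is itself a relation that project can consume.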

Rename Operation
Unlike relations in the relational model, the new relations derived from a relational algebra
expression do not have a name that would allow us to refer to them in other expressions. The
renaming operator can be used to explicitly name the resulting relation of an expression.
It is denoted by:

ρ_{S(A1, A2, … An)}(R)

Where ρ represents the RENAME operator, S is a name for the new relation and A1,
A2, … An are new names for the attributes in the relation R.
After the renaming, the name of the relation and the attributes can be used as an ordinary relation
and attributes in a sequence of relational algebra expressions.

Ñ Binary Operations

Cartesian Product Operation


The Cartesian product operation (also known as Cross Product or Cross Join or Product) is a
binary set operation that generates a new relation from two relations in a combinatorial fashion.
It is denoted by
R × S
Where × represents the PRODUCT operator and R and S are the relations to be joined.
The product operation is just like the product operation in set theory: it pairs each tuple in
R with every tuple in S.
Example
- Consider the following relations R and S; then R × S is as shown below.

R
A  B
1  2
3  4

S
B  C  D
2  a  x
4  b  y
5  c  z

R × S
A  R.B  S.B  C  D
1  2    2    a  x
1  2    4    b  y
1  2    5    c  z
3  4    2    a  x
3  4    4    b  y
3  4    5    c  z

- Consider the Employees, EmpTeams and Teams relations and develop a relational
algebra expression that retrieves the name and position of Employees that work on
Project 1 as Programmers and renames the relation as Programmers1.
- Employees(EmpId, Name, BDate, Age, Gender, Position, Salary)
- EmpTeams(EmpId, TeamId)
- Teams(TeamId, PrjId, Name, Descr)
ρ_{Programmers1(Name, Position)}(Π_{E.Name, E.Position}(σ_{T.PrjId = 1 AND T.Name = "Programmer"}(
σ_{ET.TeamId = T.TeamId}(ρ_T(Teams) × σ_{E.EmpId = ET.EmpId}(
ρ_E(Employees) × ρ_ET(EmpTeams))))))

The expression tree for the above relational algebra expression is:

ρ_{Programmers1(Name, Position)}
  Π_{E.Name, E.Position}
    σ_{T.PrjId = 1 AND T.Name = "Programmer"}
      σ_{ET.TeamId = T.TeamId}
        ×
          ρ_T
            Teams
          σ_{E.EmpId = ET.EmpId}
            ×
              ρ_E
                Employees
              ρ_ET
                EmpTeams
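The product of the small relations R and S from the first example in this subsection can be reproduced with the standard library. In this sketch each relation is assumed to be a list of tuples whose positions follow the schema given in the comments.

```python
from itertools import product

R = [(1, 2), (3, 4)]                               # schema (A, B)
S = [(2, "a", "x"), (4, "b", "y"), (5, "c", "z")]  # schema (B, C, D)

# Every tuple of R is concatenated with every tuple of S: (A, R.B, S.B, C, D)
RxS = [r + s for r, s in product(R, S)]

for row in RxS:
    print(row)
print(len(RxS))  # 6, that is 2 * 3
```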

Union Operation
The union operation on R and S, denoted by R ∪ S, results in a relation that includes all tuples
that are either in R or in S or in both. Duplicates are eliminated from the result.

Intersection Operation
The intersection operation on R and S, denoted by R ∩ S, results in a relation that includes all
tuples that are in both R and S.

Set Difference Operation

The result of the set difference operation on R and S, denoted by R − S, is the set of tuples in R
but not in S.
For the set operations (Union, Intersection, Set difference) the two relational operands R and S
must have the same type of tuples; this condition is known as Union Compatibility.
Two relations R(A1, A2, … An) and S(B1, B2, … Bn) are said to be union compatible if
1. They have the same degree n, and
2. Domain(Ai) = Domain(Bi) for all i = 1, 2, … n

Example
- Consider the following relations R and S; then R ∪ S and R – S are as shown below.

R
A  B
1  2
3  4
5  6

S
C  D
1  2
4  3

R ∪ S
A  B
1  2
3  4
4  3
5  6

R – S
A  B
3  4
5  6

- Find the name and position of Employees that work on both Projects 1 and 2 as
Programmers.
Similar to the previous example, we can have
ρ_{Programmers2(Name, Position)}(Π_{E.Name, E.Position}(σ_{T.PrjId = 2 AND T.Name = "Programmer"}(
σ_{ET.TeamId = T.TeamId}(σ_{E.EmpId = ET.EmpId}(
ρ_E(Employees) × ρ_ET(EmpTeams)) × ρ_T(Teams)))))

The employees working in both projects 1 and 2 are then given by the relational
algebra expression:
Programmers1 ∩ Programmers2
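The set operations map directly onto Python sets of tuples. The sketch below uses the relations R and S of the first example in this subsection; the helper union_compatible is hypothetical and checks only the degree, not the attribute domains.

```python
R = {(1, 2), (3, 4), (5, 6)}  # schema (A, B)
S = {(1, 2), (4, 3)}          # schema (C, D)

def union_compatible(r, s):
    """Degree check only: all tuples of both relations have the same length."""
    return len({len(t) for t in r | s}) <= 1

print(union_compatible(R, S))  # True
print(sorted(R | S))           # [(1, 2), (3, 4), (4, 3), (5, 6)]
print(sorted(R - S))           # [(3, 4), (5, 6)]
print(sorted(R & S))           # [(1, 2)]
```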

ª Additional Operations
The set of relational algebra operations {σ, Π, ρ, ×, ∪, –} is a complete set; that is, the other
original relational algebra operations such as intersection, join, division and assignment can be
expressed as a sequence of the fundamental operations. In situations where the use of the
fundamental operators results in complex and lengthy expressions, such operators are helpful to
reduce the complexity of queries.

Natural Join Operation


In the Cartesian product R × S operation in the above example, notice that a select operation is
used to retrieve the desired tuples from the product, which generates m*n tuples where m
and n are the numbers of distinct tuples of R and S.
A frequent type of join connects two relations by:
& Equating attributes of the same name, and
& Projecting out one copy of each pair of equated attributes.
Such a join is known as Natural Join and it is denoted by:

R ⋈ S
Where ⋈ represents the NATURAL JOIN operator and R and S are the relations to be
joined.
Unlike the product operation, which pairs each tuple in R with every tuple in S, the natural join
keeps only the pairs that agree on the common attributes.
Example
- Consider the following relations R and S; then R ⋈ S is as shown below.

R
A  B
1  2
3  4

S
B  C  D
2  a  x
4  b  y
5  c  z

R ⋈ S
A  B  C  D
1  2  a  x
3  4  b  y

- The previous example that retrieves the name and position of Employees that work on
Project 1 as Programmers can be simplified using the modified relations below:
- Employees(EmpId, FullName, BDate, Age, Gender, Position, Salary)
- EmpTeams(EmpId, TeamId)
- Teams(TeamId, PrjId, TName, Descr)
Π_{E.FullName, E.Position}(σ_{T.PrjId = 1 AND T.TName = "Programmer"}(
(ρ_E(Employees) ⋈ ρ_ET(EmpTeams)) ⋈ ρ_T(Teams)))
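The two steps of the natural join (equate attributes of the same name, keep one copy of each) can be sketched as follows. The function name natural_join and the choice of dictionaries keyed by attribute name are assumptions for the example; the data are the relations R and S from the first example in this subsection.

```python
def natural_join(r, s):
    """Join two lists of dicts on all attribute names they share."""
    out = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):
                # Merging the dicts keeps a single copy of each common attribute
                out.append({**x, **y})
    return out

R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": "a", "D": "x"},
     {"B": 4, "C": "b", "D": "y"},
     {"B": 5, "C": "c", "D": "z"}]

print(natural_join(R, S))
# [{'A': 1, 'B': 2, 'C': 'a', 'D': 'x'}, {'A': 3, 'B': 4, 'C': 'b', 'D': 'y'}]
```

If the two relations share no attribute name, the condition is vacuously true and the function returns the full product.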

Theta Join Operation


While the natural join enforces a join condition by equating attributes of the same name in the
relations to be joined, a theta join joins relations on an arbitrary condition C. The notation for
theta join is:

R ⋈_C S

The result of the theta join operation is constructed by:
& Taking the product of R and S, and
& Selecting only those tuples satisfying the condition C.
As with the product operation, the schema of the result of the theta join is the
union of the schemas of R and S. (That is, the operation does not eliminate repeated columns in
the two relations R and S, if any.)
Example
- Consider the following relations R and S; then R ⋈_{R.B < S.B} S is as shown below.

R
A  B
1  2
3  4

S
B  C  D
2  a  x
4  b  y
5  c  z

R ⋈_{R.B < S.B} S
A  R.B  S.B  C  D
1  2    4    b  y
1  2    5    c  z
3  4    5    c  z
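The construction above (product followed by selection) can be sketched directly. The function name theta_join is hypothetical; relations are assumed to be lists of tuples, with R over (A, B) and S over (B, C, D) as in the example.

```python
def theta_join(r, s, cond):
    """Product of r and s, keeping only the pairs that satisfy cond."""
    return [x + y for x in r for y in s if cond(x, y)]

R = [(1, 2), (3, 4)]                               # schema (A, B)
S = [(2, "a", "x"), (4, "b", "y"), (5, "c", "z")]  # schema (B, C, D)

# Condition R.B < S.B: position 1 of an R tuple against position 0 of an S tuple
result = theta_join(R, S, lambda x, y: x[1] < y[0])
print(result)
# [(1, 2, 4, 'b', 'y'), (1, 2, 5, 'c', 'z'), (3, 4, 5, 'c', 'z')]
```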

Assignment Operation
The assignment operation, denoted by ←, is similar to assignment in programming: it
assigns the result of a relational algebra expression on the right to a relation variable on the left.
Subsequent assignment operations can be used to develop complex sequential queries; the
intermediate assignment operations do not return any relation to the user.

Division Operation
The division operation denoted by ÷ is suited to queries that include “universal quantification” or
the phrase “for all”. A division operation is applied to two relations R(Z ) ÷ S ( X ) , where X ⊆ Z
and the result is T (Y ) where Y = Z − X . For tuples t to appear in the result T, the values in t
must appear in R in combination with every tuple in S.
The division operation can be expressed using a sequence of the fundamental operators as:

T1 ← Π_Y(R)
T2 ← Π_Y((S × T1) − R)
T ← T1 − T2
Example
- Consider the following relations R and S; then R ÷ S is as shown below.

R
A  B
1  2
1  4
3  2
2  4
4  4
3  4

S
B
2
4

R ÷ S
A
1
3

- Retrieve all the projects on which “Jhon” and “Dave” are jointly working as programmers.

Π_{Projects.Name}(Projects ⋈ (σ_{Name = "Programmer"}(Teams ⋈ (
EmpTeams ÷ Π_{EmpID}(σ_{Name = "Jhon" OR Name = "Dave"}(Employees))))))
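The three assignment steps T1, T2 and T that define division can be traced on the relations R(A, B) and S(B) from the first example of this subsection. In this sketch R is assumed to be a set of (A, B) pairs and S a set of B values.

```python
R = {(1, 2), (1, 4), (3, 2), (2, 4), (4, 4), (3, 4)}  # schema (A, B)
S = {2, 4}                                            # schema (B)

# T1: all A values that appear in R
T1 = {a for a, _ in R}
# T2: A values for which some (a, b) pair with b in S is missing from R
T2 = {a for a in T1 for b in S if (a, b) not in R}
# T: A values combined with every tuple of S
T = T1 - T2

print(sorted(T))  # [1, 3]
```

Values 2 and 4 are removed because neither appears in R together with B = 2.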

ª Extended Operations
The basic relational algebra operations have been extended in several ways to enhance the
expressive power of the original relational algebra. Some of the extended operations are:
- Outer Join
- Extended Projection
- Duplicate Elimination
- Aggregation and Grouping, …

Outer Join Operation


The natural join operation R ⋈ S produces a tuple only when tuples of R and S match on their
common attributes. Such joins are known as Inner Join operations. However,
there are cases when we want to keep all the tuples from the participating relations and form the
join where there is a match. In such cases outer join operations can be used to keep all the tuples in
R, or all those in S, or all those in both relations, irrespective of whether they have matching tuples
on their common attributes.
The three types of outer join operators are:
˜ Left Outer Join
The left outer join, denoted by
R ⟕ S
keeps every tuple in the left relation R; when a tuple in R has no matching
tuple in S, the attributes of S are filled (padded) with NULL values.
˜ Right Outer Join
The right outer join, denoted by
R ⟖ S
is similar to the left outer join: it keeps all tuples in the right relation S, and when
a tuple in S has no matching tuple in R, the attributes of R are padded with
NULL values.
˜ Full Outer Join
The full outer join, denoted by
R ⟗ S
keeps all tuples in both the left and right relations even when no matching tuples are found,
padding them with NULL values as needed.
Example
- Consider the following relations R and S; then R ⟖ S is as shown below.

R
A  B
1  2
3  4
4  6

S
B  C  D
2  a  x
4  b  y
5  c  z
7  d  w

R ⟖ S
A     B  C  D
1     2  a  x
3     4  b  y
NULL  5  c  z
NULL  7  d  w

- Write a relational algebra expression that retrieves all the projects and the corresponding teams in
the projects.
- Projects(PrjId, PName, SDate, DDate, CDate)
- Teams(TeamId, PrjId, TName, Descr)

Π_{P.PName, T.TName}(ρ_P(Projects) ⟕ ρ_T(Teams))
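The padding behaviour of the right outer join can be sketched on the relations R(A, B) and S(B, C, D) from the first example of this subsection. The function name right_outer_join is hypothetical, the join attribute B is hard-coded for brevity, and Python's None stands in for NULL.

```python
def right_outer_join(r, s):
    """Right outer join of r (A, B) with s (B, C, D) on B; result is (A, B, C, D)."""
    out = []
    for b, c, d in s:
        matches = [a for a, rb in r if rb == b]
        # Keep the S tuple even without a match, padding A with None
        for a in matches or [None]:
            out.append((a, b, c, d))
    return out

R = [(1, 2), (3, 4), (4, 6)]
S = [(2, "a", "x"), (4, "b", "y"), (5, "c", "z"), (7, "d", "w")]

for row in right_outer_join(R, S):
    print(row)
```

The R tuple (4, 6) does not appear in the result because a right outer join preserves only the unmatched tuples of S; a full outer join would keep it as well, padded with None on C and D.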
Extended Projection Operation
The extended (or generalized) projection extends the projection operation by allowing
arithmetic functions to be used in the projection list to compute and produce new columns.
Example
- Write a relational algebra expression for calculating the net pay of employees.

Π_{Name, Salary, 0.15*Salary → Tax, Salary − 0.15*Salary → NetIncome}(Employees)

Duplicate Elimination
Note that the projection operation and the set operations discussed so far are set operations; that
is, the resulting relation is a relation without duplicates. But there are cases in which operations
may produce relations with duplicates (bag operations). In such situations the duplicate elimination
operator (δ) is used to eliminate duplicate tuples from the resulting relation. It is denoted by:

δ(R)

Aggregation and Grouping Operation


Aggregation functions (operators) such as SUM, COUNT, MIN, MAX, and AVG are collection
operators that return a single value as a result. Aggregation operators are not relational algebra
operators themselves, but they are used by the grouping operator (γ), which groups tuples according to their
values in one or more attributes. It is denoted by:

γ_L(R)
Where L is a list whose elements are either grouping attributes or aggregation functions
applied to attributes of the relation R.
Example
- Write a relational algebra expression that determines the number of teams each employee is
working in.

γ_{Name, COUNT(TeamId)}(Employees ⋈ EmpTeams)
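Grouping by Name with COUNT(TeamId) can be sketched with the standard library. The (Name, TeamId) pairs below are taken from the Emp_Teams instance used in the normalization chapter and stand in for the result of the join of Employees and EmpTeams; the variable names are hypothetical.

```python
from collections import Counter

# (Name, TeamId) pairs, standing in for the join of Employees and EmpTeams
joined = [
    ("Alemu Girma", 1), ("Alemu Girma", 3),
    ("Kelem Belete", 2),
    ("Mulu Tasew", 3), ("Mulu Tasew", 2),
]

# Group on Name and count the TeamId values in each group
counts = Counter(name for name, _ in joined)
print(sorted(counts.items()))
# [('Alemu Girma', 2), ('Kelem Belete', 1), ('Mulu Tasew', 2)]
```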

Sorting Operation
The sorting operator (τ) turns a relation into a list of tuples sorted according to one or more
attributes. The operator is used as a final step, since all the other operators work on either a set
or a bag but not on a list. It is denoted by:

τ_S(R)

Where S is a list of attributes of R indicating the sort order of the resulting relation.
Example
- Write a relational algebra expression that presents the net pay of employees in order of name.

τ_{Name}(Π_{Name, Salary, 0.15*Salary → Tax, Salary − 0.15*Salary → NetIncome}(Employees))
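An extended projection followed by a sort can be sketched as below. The two employees are hypothetical sample data, the 15% tax rate is the one used in the example above, and integer arithmetic is used only to keep the printed values exact.

```python
# Hypothetical sample data: (Name, Salary)
employees = [("Sara", 3200), ("Abebe", 5000)]

# Extended projection: compute Tax and NetIncome as new columns
rows = []
for name, salary in employees:
    tax = salary * 15 // 100
    rows.append((name, salary, tax, salary - tax))

# Sorting on Name turns the relation into an ordered list
rows.sort(key=lambda r: r[0])
print(rows)
# [('Abebe', 5000, 750, 4250), ('Sara', 3200, 480, 2720)]
```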

ª Introduction to Relational Calculus


The relational algebra discussed so far is a procedural query language on the relational
database model. A relational calculus expression, however, is a declarative and nonprocedural
specification of a retrieval request, and hence it contains no description of how to evaluate the query.
Rather, a relational calculus expression specifies what is to be retrieved.
Relational calculus is classified into two types:
˜ Tuple Relational Calculus, and
˜ Domain Relational Calculus

Tuple Relational Calculus


A query in a tuple relational calculus (tuple calculus) is expressed as:

{t | p(t )}
Department of Electrical and Computer Engineering | AAU
EENG 477- Database Systems 11
4. Relational Algebra

Where t. is a tuple variable and p(t) a predicate (condition) that is to be true for the tuple t.
Formulas in the predicate of the tuple calculus are composed of atoms, variables and quantifiers
∃ (existential quantifier) and ∀ (universal quantifier)
Example
- Find all the employees that are working on projects having a due date before January 1,
2007.

{e.Name | Employees(e) AND ((∃et)(∃t)(∃p)(EmpTeams(et) AND et.EmpId = e.EmpId AND
Teams(t) AND et.TeamId = t.TeamId AND
Projects(p) AND t.PrjId = p.PrjId AND p.DDate < 1/1/2007))}
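
The declarative reading of a tuple calculus query can be imitated with a set comprehension. The following is a minimal Python sketch and not part of the original notes; the sample rows and the representation of tuples as dictionaries are assumptions made purely for illustration.

```python
from datetime import date

# Hypothetical sample data: each relation is a list of dicts (one dict per tuple).
employees = [{"EmpId": 1, "Name": "Abebe"}, {"EmpId": 2, "Name": "Sara"}]
emp_teams = [{"EmpId": 1, "TeamId": 10}, {"EmpId": 2, "TeamId": 20}]
teams = [{"TeamId": 10, "PrjId": 100}, {"TeamId": 20, "PrjId": 200}]
projects = [
    {"PrjId": 100, "DDate": date(2006, 6, 30)},
    {"PrjId": 200, "DDate": date(2007, 3, 1)},
]

# {e.Name | Employees(e) AND (∃et)(∃t)(∃p)(...)}: any() plays the role of ∃.
result = {
    e["Name"]
    for e in employees
    if any(
        et["EmpId"] == e["EmpId"]
        and t["TeamId"] == et["TeamId"]
        and p["PrjId"] == t["PrjId"]
        and p["DDate"] < date(2007, 1, 1)
        for et in emp_teams
        for t in teams
        for p in projects
    )
}
print(result)  # {'Abebe'}
```

The comprehension states only the condition a name must satisfy; the order in which the relations are scanned is an artifact of the sketch, not of the calculus.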

Domain Relational Calculus


A query in a domain relational calculus (domain calculus) uses domain variables that take on
values from an attribute's domain rather than values for an entire tuple. It is expressed as:

{<x1, x2, …, xn> | p(x1, x2, …, xn)}

Where x1, x2, …, xn represent domain variables and p is the predicate, as in the case of the
tuple calculus.
Formulas in the predicate are built in the same way as the tuple calculus predicates.
Example
- Retrieve name and description of all the teams working on project named “banking db”.
- Projects(PrjId, PName, SDate, DDate, CDate)
- Teams(TeamId, PrjId, TName, Descr)

{<y, z> | (∃x)(∃a)(∃b)(Teams(w, x, y, z) AND Projects(a, b, c, d, e) AND b = "banking db" AND x = a)}

5. Structured Query Language (SQL)

ª Introduction
Structured Query Language (SQL) is the query language standardized by the American
National Standards Institute (ANSI) and used by most commercial relational database management
systems (RDBMS). To retrieve or update information, users execute queries (SQL statements)
that pull or modify the requested information in the database using criteria defined by
the user.
There are many different versions of the SQL language, but to comply
with the ANSI standard they must support the same major keywords in a similar manner (such
as SELECT, UPDATE, DELETE, INSERT, WHERE, and others). Most SQL database
products also have their own proprietary extensions in addition to the SQL standard, such as
Transact-SQL (T-SQL) in Microsoft SQL Server and PL/SQL in Oracle.
SQL supports data definition through its Data Definition Language (DDL), and query and update
through its Data Manipulation Language (DML).

SQL Data Definition Language (DDL)


The Data Definition Language (DDL) part of SQL permits database tables to be created or
deleted. It can also define indexes (keys), specify links between tables, and impose constraints
between database tables.
The most important DDL statements in SQL are:
- CREATE TABLE - creates a new database table.
- ALTER TABLE - alters (changes) a database table.
- DROP TABLE - deletes a database table.
- CREATE INDEX - creates an index (search key).
- DROP INDEX - deletes an index.
The DDL statements are used for a schema definition of a relational database.

SQL Data Manipulation Language (DML)


The Data Manipulation Language (DML) is the part of the SQL syntax for executing queries to
insert, retrieve, update, and delete records. The statements are:
- INSERT INTO - inserts new data into a database table.
- SELECT - extracts data from a database table.
- UPDATE - updates data in a database table.
- DELETE - deletes data from a database table.
These four commands are also known as the SQL CRUD statements, after the words
Create, Read, Update and Delete.

ª Schema Definition in SQL


SQL uses the following terms for the corresponding terms in relational model
- Table – Relation
- Column – Attribute
- Row – Tuple

Ñ Schema Creation and Modification


The CREATE SCHEMA command is used to group database objects such
as tables, views and permissions.
The syntax for the command is:
CREATE SCHEMA <schema_name> AUTHORIZATION <owner>
<schema_name> is the name of the schema and <owner> identifies the user who is the owner of
the schema.
Example:
- CREATE SCHEMA swprjct AUTHORIZATION dbo
SQL statements that can be included as part of the CREATE SCHEMA statement are:
& CREATE TABLE statement
& CREATE VIEW statement
& GRANT statement
& CREATE INDEX statement (not supported in Microsoft SQL Server 2000)
While the CREATE SCHEMA command groups database objects, the CREATE DATABASE
command is used to create a new database and the corresponding files for
storing the database.
The syntax for the command is:
CREATE DATABASE <database_name>
<database_name> is the name of the new database.
Example:
- CREATE DATABASE SWPRJCT
The command also has optional parameters, which differ between RDBMS products, for specifying
the owner, files, growth, …

Ñ Table Creation and Modification


The CREATE TABLE command in the SQL statement is used to specify a new relation in a
database by giving it a name and listing its attributes.
The syntax for the command is:
CREATE TABLE <table_name> (
<column_name> <data_type> {column_constraint},
:
<column_name> <data_type> {column_constraint}
)
- <column_name> is the name of the column.
- <data_type> is one of the data types supported by SQL: CHAR(n), VARCHAR(n), INT,
SMALLINT, DECIMAL(i,j), DATE, TIME (DATETIME), …
- {column_constraint} is an optional constraint on the column, such as NULL, NOT NULL,
PRIMARY KEY, FOREIGN KEY, UNIQUE, DEFAULT, …
Example:
For the PROJECTS and TEAMS relations the corresponding tables can be defined as:
- Projects (PrjId:integer, Name:string, SDate:date, DDate:date, CDate:date)
- Teams (PrjId:integer, Name:string, Description:string)

CREATE TABLE Projects (
PrjId INT NOT NULL PRIMARY KEY,
Name VARCHAR(30) NOT NULL,
SDate DATE NOT NULL,
DDate DATE NULL,
CDate DATE NULL
)

CREATE TABLE Teams (
PrjId INT NOT NULL FOREIGN KEY REFERENCES Projects(PrjId),
Name VARCHAR(30) NOT NULL,
Description VARCHAR(100) NULL,
PRIMARY KEY (PrjId, Name)
)

The primary key constraint in a relation is enforced by using the keyword PRIMARY KEY
following the key attribute or, in case of multiple attributes, it can be specified on a separate
line as shown in the Teams table above.
The referential integrity constraint in a relational database is implemented by the use of a
foreign key. If an operation violates the referential integrity enforced by a FOREIGN KEY, the
default behavior is to reject the violating operation. However, by the use of the optional
referential triggered actions the designer can attach clauses to the foreign key constraint such as:
- ON DELETE {CASCADE | NO ACTION | SET DEFAULT | SET NULL}
- ON UPDATE {CASCADE | NO ACTION | SET DEFAULT | SET NULL}
The default is NO ACTION, under which the violating action is rejected. The CASCADE option
ON DELETE deletes all the referencing rows when the referenced row is deleted. SET DEFAULT and
SET NULL replace the foreign key column value in all the referencing rows by the default value
or by NULL, respectively. (Microsoft SQL Server 2000 doesn't support SET DEFAULT and SET NULL.)
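
The rejection of a violating row and the effect of ON DELETE CASCADE can be tried out with a small script. The following is a sketch only, using Python's built-in sqlite3 module with an in-memory database instead of SQL Server; the sample rows are hypothetical, and SQLite uses the column-level REFERENCES form and enforces foreign keys only after the PRAGMA shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Projects (PrjId INT NOT NULL PRIMARY KEY, Name VARCHAR(30) NOT NULL);
CREATE TABLE Teams (
    PrjId INT NOT NULL REFERENCES Projects(PrjId) ON DELETE CASCADE,
    Name VARCHAR(30) NOT NULL,
    PRIMARY KEY (PrjId, Name)
);
INSERT INTO Projects VALUES (1, 'Test Project');
INSERT INTO Teams VALUES (1, 'Programmers');
""")

# A row that references a non-existent project is rejected.
try:
    conn.execute("INSERT INTO Teams VALUES (99, 'Orphans')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the project cascades to the team rows that reference it.
conn.execute("DELETE FROM Projects WHERE PrjId = 1")
remaining_teams = conn.execute("SELECT COUNT(*) FROM Teams").fetchone()[0]
print(rejected, remaining_teams)
```

Without the ON DELETE CASCADE clause the DELETE itself would be rejected, which is the NO ACTION behavior described above.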
The ALTER TABLE command allows modification (adding, changing, or dropping) of a column
or constraint in a table.
The syntax for the command is:
ALTER TABLE <table_name>
[ALTER COLUMN <column_name> <new_data_type>] |
[ADD <column_definition> | <constraint>] |
[DROP <column_name> | < constraint>]
- <table_name> is the name of the table to be altered.
- The ALTER TABLE command takes one of three actions: ALTER
COLUMN, ADD or DROP. The ALTER COLUMN option modifies an existing column
definition, the ADD option adds a new column or constraint, and the DROP option drops
an existing column or constraint.
Example 1:

ALTER TABLE Projects ALTER COLUMN PrjId SMALLINT


ALTER TABLE Teams ALTER COLUMN PrjId SMALLINT

Example 2:
CREATE TABLE Projects (
PrjId SMALLINT NOT NULL,
Name VARCHAR(30) NOT NULL,
SDate DATE NOT NULL,
DDate DATE NULL,
CDate DATE NULL
)

CREATE TABLE Teams (
TeamId SMALLINT NOT NULL,
PrjId SMALLINT NOT NULL,
Name VARCHAR(30) NOT NULL,
Description VARCHAR(100) NULL
)

ALTER TABLE Projects ADD
CONSTRAINT [PK_Projects] PRIMARY KEY CLUSTERED (PrjId)

ALTER TABLE Projects ADD Description VARCHAR(200)

ALTER TABLE Teams ADD
CONSTRAINT [PK_Teams] PRIMARY KEY CLUSTERED (TeamId)

ALTER TABLE Teams ADD
CONSTRAINT [FK_Teams_Projects] FOREIGN KEY (PrjId)
REFERENCES Projects (PrjId)

The DROP command is used to drop an existing table, database or schema. The syntax for the
command is:
DROP TABLE <table_name>
DROP DATABASE <database_name>
DROP SCHEMA <schema_name>
Example:
DROP TABLE Projects

Ñ Index Creation and Modification


Indexes are the heart of fast data access, and they matter more as the database grows.
Data access can be fast without indexes, but only if the table is small. If the
table contains thousands or millions of rows, efficient data access has to go through indexes.
An index in a book helps to find information about a specific subject without having to read
the entire book. The same applies to a database index; it helps to find information about a
specific row or rows without having to search through the entire table.
An index for a table is managed as an external table whose columns are the search key (index
attribute) and a pointer to the location of the data.
Indexes are created with the CREATE INDEX statement.
The basic CREATE INDEX statement is:
CREATE [CLUSTERED | NONCLUSTERED] INDEX <index_name>
ON {<table> | <view> } ( <column> [ ASC | DESC ] [ ,...n ] )
Example:
The following statement creates the DueDate nonclustered index on the Projects table:
CREATE INDEX DueDate ON Projects(DDate)

Clustered versus Nonclustered Index


For a clustered index the data is both stored and sorted on the index key; whereas, for a
nonclustered index the actual data is not stored in the index.
The default index is always nonclustered. One can create a clustered index by specifying it, as in
the following example:
Example:
CREATE CLUSTERED INDEX PrjId ON Projects(PrjId)

NOTE: A table can have only one clustered index. If a primary key constraint is created on a
table, a clustered index may be created to support the constraint.
The DROP command is also used to drop an existing index on a table. The
syntax for the command is:
DROP INDEX <index_name> [,...n ]
Example:
DROP INDEX DueDate
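
Creating, using and dropping an index can be observed with a short script. This is an illustrative sketch using Python's built-in sqlite3 module; SQLite has no CLUSTERED keyword, and the catalog table sqlite_master and the EXPLAIN QUERY PLAN command are SQLite-specific, so other systems expose the same information differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Projects (PrjId INT PRIMARY KEY, Name VARCHAR(30), DDate DATE)")
conn.execute("CREATE INDEX DueDate ON Projects(DDate)")

def index_names(connection):
    """Return the names of all indexes SQLite keeps for the Projects table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Projects'"
    )
    return {name for (name,) in rows}

before_drop = index_names(conn)

# Ask the engine how it would evaluate a lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM Projects WHERE DDate = '2007-01-01'"
).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

conn.execute("DROP INDEX DueDate")
after_drop = index_names(conn)
print(plan_text)
```

The printed plan names the DueDate index, which indicates that the lookup goes through the index rather than scanning the whole table.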

ª Simple Query Constructs and Syntax


The simplest DML query in SQL is the SELECT-FROM-WHERE statement,
used for retrieving information from a database. The SQL DML also supports data insertion,
modification and deletion through the INSERT INTO, UPDATE and DELETE statements.

Ñ The SELECT-FROM-WHERE Statement


The SELECT-FROM-WHERE statement consists of the three clauses
SELECT, FROM and WHERE, as shown below:
SELECT <column_list>
FROM <table_list>
WHERE <condition>

- <column_list> is the list of column names whose values are retrieved by the query.
- <table_list> is the list of table names required in the process.
- <condition> is a Boolean expression (conditional expression) that determines the rows to be
selected in the query. The expression is built from the comparison operators (=, >,
<, >=, <= and <>), which can be combined with logical operators such as AND.
The column list in the SELECT clause can be replaced by an asterisk (*) to retrieve all the
columns in the participating tables.
The WHERE clause is optional and is needed only when a condition is to be set for the retrieval
of rows. If the clause is not used in the statement, all the rows for the selected columns in the
specified tables will be retrieved.
Example:
A query to retrieve all the columns for all projects:
SELECT *
FROM Projects

A query to retrieve the name and due date of projects that are not yet completed:
SELECT Name, DDate
FROM Projects
WHERE CDate IS NULL

A query to retrieve the project names and corresponding team names for projects that are
not yet completed:
SELECT Projects.Name, Teams.Name
FROM Projects, Teams
WHERE Projects.PrjId=Teams.PrjId AND CDate IS NULL

To retrieve the project name together with all the columns from the Teams table:
SELECT Projects.Name, Teams.*
FROM Projects, Teams
WHERE Projects.PrjId=Teams.PrjId AND CDate IS NULL

In SQL queries it may happen that two participating tables have columns with identical names.
To avoid ambiguity, the name of the table is used together with the column
name, as shown above. Ambiguity may also arise if a single table participates more than once
in a query; in such situations an alias may be used for the tables, as shown in the following query.
Example:
SELECT p.Name, t.Name
FROM Projects AS p, Teams AS t
WHERE p.PrjId=t.PrjId AND CDate IS NULL
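
One point worth checking when testing for projects that are not yet completed: in SQL a comparison with NULL using = never evaluates to true, so the IS NULL predicate is the one that finds rows with a missing completion date. The following sketch, using Python's built-in sqlite3 module and hypothetical sample rows, shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (PrjId INT PRIMARY KEY, Name VARCHAR(30), CDate DATE NULL);
INSERT INTO Projects VALUES (1, 'Payroll', '2006-12-01');
INSERT INTO Projects VALUES (2, 'Banking DB', NULL);
""")

# "= NULL" evaluates to unknown for every row, so nothing is returned.
with_equals = conn.execute("SELECT Name FROM Projects WHERE CDate = NULL").fetchall()

# IS NULL is the predicate that actually tests for a missing value.
with_is_null = conn.execute("SELECT Name FROM Projects WHERE CDate IS NULL").fetchall()

print(with_equals)   # []
print(with_is_null)  # [('Banking DB',)]
```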


The SELECT statement by default returns a bag of rows rather than a set of rows (i.e. duplicate
rows may exist in the result). To remove duplicates and obtain a set of rows as a result,
one can use the DISTINCT keyword in the SELECT clause as follows:
SELECT DISTINCT <column_list>
FROM <table_list>
WHERE <condition>
Example:
A query to retrieve the employees' names, the projects they are participating in and the due
date of each project.
SELECT DISTINCT e.Name, p.Name, p.DDate
FROM Employees AS e, EmpTeams AS et, Teams AS t, Projects AS p
WHERE e.EmpId=et.EmpId AND et.TeamId=t.TeamId AND p.PrjId=t.PrjId

If the SELECT is not DISTINCT, the resulting table will include identical
rows for an employee participating in different teams of the same project.
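
The difference between the bag and the set result can be seen on a tiny data set. This is a sketch using Python's built-in sqlite3 module; the two tables are cut down to the columns needed, and the rows (one employee in two teams of the same project) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmpTeams (EmpId INT, TeamId INT);
CREATE TABLE Teams (TeamId INT, PrjId INT);
INSERT INTO Teams VALUES (10, 100), (11, 100);
INSERT INTO EmpTeams VALUES (1, 10), (1, 11);
""")

# Employee 1 reaches project 100 through two teams, so the row appears twice.
bag = conn.execute("""
    SELECT et.EmpId, t.PrjId
    FROM EmpTeams AS et, Teams AS t
    WHERE et.TeamId = t.TeamId""").fetchall()

# DISTINCT removes the duplicate row.
as_set = conn.execute("""
    SELECT DISTINCT et.EmpId, t.PrjId
    FROM EmpTeams AS et, Teams AS t
    WHERE et.TeamId = t.TeamId""").fetchall()

print(bag)     # [(1, 100), (1, 100)]
print(as_set)  # [(1, 100)]
```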
Strings in the WHERE clause can be compared with the comparison operators (=, <,
>, <=, >= and <>) and also with the LIKE operator, which compares strings
on the basis of a pattern match. The expression is of the form:
S LIKE P
Where S is the string or the column name to be compared and P is the pattern, which may contain
two special characters:
- _ : matches any one character in S, and
- % : matches any sequence of zero or more characters in S.
String constants in SQL are enclosed in single apostrophes. If the string itself contains an
apostrophe, it is escaped with another apostrophe (i.e. two single apostrophes
represent a single apostrophe in a string constant).
The LIKE expression can also be used with the NOT operator as follows:
S NOT LIKE P
Example:
A query to retrieve employees with a name starting with the letter 'A'.
SELECT *
FROM Employees
WHERE Name LIKE 'A%'
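
The two pattern characters and the doubled apostrophe can be exercised as follows. This is a minimal sketch with Python's built-in sqlite3 module; the employee names and the helper function names() are hypothetical. Note that pattern matching needs the LIKE operator; with = the pattern would be compared as a literal string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name VARCHAR(30))")
conn.executemany(
    "INSERT INTO Employees VALUES (?)",
    [("Abebe",), ("Almaz",), ("Sara",), ("Sam",), ("O'Brien",)],
)

def names(where_clause):
    """Run a fixed query with the given WHERE condition and return the names."""
    sql = "SELECT Name FROM Employees WHERE " + where_clause + " ORDER BY Name"
    return [name for (name,) in conn.execute(sql)]

starts_with_a = names("Name LIKE 'A%'")       # % matches zero or more characters
three_letters = names("Name LIKE 'Sa_'")      # _ matches exactly one character
not_a = names("Name NOT LIKE 'A%'")
with_apostrophe = names("Name = 'O''Brien'")  # '' stands for one apostrophe

print(starts_with_a)    # ['Abebe', 'Almaz']
print(three_letters)    # ['Sam']
print(not_a)            # ["O'Brien", 'Sam', 'Sara']
print(with_apostrophe)  # ["O'Brien"]
```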


Ñ INSERT, UPDATE and DELETE

INSERT
The INSERT statement adds one or more new rows to a table. In a simplified treatment,
INSERT has this form:
INSERT INTO <table_name>| <view_name> [(column_list)] data_values
- data_values are one or more rows to be inserted into the named table or view.
- column_list is a list of column names, separated by commas, that can be used to specify the
columns for which data is supplied.
If column_list is not specified, all the columns in the table or view receive data. When a
column_list does not name all the columns in a table or view, a value of NULL (or the default
value if a default is defined for the column) is inserted into any column not named in the list. All
columns not specified in the column list must either allow null values or have a default assigned.
The data values supplied must match the column list. The number of data values must be the
same as the number of columns, and the data type, precision, and scale of each data value must
match those of the corresponding column.
There are two ways to specify the data values:
& VALUES (<value_or_expression> [,..n])
& SELECT <subquery>

The VALUES clause inserts a single row with the column values <value_or_expression> in
the columns listed in the INSERT INTO column list. The SELECT subquery is a standard
query that returns a temporary table, and the resulting rows are inserted into the table
named in the INSERT INTO clause. The columns in the subquery need to match the columns in the
column list.
Example
INSERT INTO Projects(PrjId, Name, SDate)
VALUES (1, 'Test Project', '05-25-2006')
INSERT INTO Teams
VALUES (1, 1, 'Programmers Team 1', 'Programmers team for project 1.')
INSERT INTO Teams(TeamId, PrjId, Name)
SELECT TeamId+10, 2, Name FROM Teams WHERE PrjId=1
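
The two ways of supplying data values can be checked with a short script. This is a sketch using Python's built-in sqlite3 module; the Teams table follows the four-column definition used in the ALTER TABLE example, and the description text is a hypothetical value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (
    TeamId INT NOT NULL PRIMARY KEY,
    PrjId INT NOT NULL,
    Name VARCHAR(30) NOT NULL,
    Description VARCHAR(100) NULL
);
-- VALUES form: one row, one value per column.
INSERT INTO Teams VALUES (1, 1, 'Programmers Team 1', 'Programmers team for project 1.');
-- SELECT form: copy the teams of project 1 to project 2.
-- Description is not in the column list, so it receives NULL.
INSERT INTO Teams (TeamId, PrjId, Name)
    SELECT TeamId + 10, 2, Name FROM Teams WHERE PrjId = 1;
""")

rows = conn.execute("SELECT * FROM Teams ORDER BY TeamId").fetchall()
for row in rows:
    print(row)
```

The second row comes back with None (NULL) in the Description column, as described for columns not named in the column list.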


UPDATE
The UPDATE statement changes the existing data in a table. The syntax for the UPDATE
command is:
UPDATE <table_name>| <view_name>
SET <column_name> = <value> [,..n]
WHERE <condition>
- <value> is the new value to be assigned to the column <column_name>.
- The WHERE clause specifies the <condition> for selecting the rows to be modified. If the
WHERE clause is not included, the update is applied to all existing rows in the table.
Example
UPDATE Teams
SET Description = 'Programmers team for project 2'
WHERE TeamId = 11

DELETE
The DELETE statement removes row(s) from a table. The syntax for the DELETE command
is:
DELETE FROM <table_name>| <view_name>
WHERE <condition>
- The WHERE clause specifies the <condition> for selecting the rows to be deleted. If the
WHERE clause is not included, all existing rows in the table will be deleted unless
there is a constraint that prevents the deletion of the rows.
Example
DELETE FROM Teams
WHERE TeamId=2
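
The role of the WHERE clause in UPDATE and DELETE can be illustrated as follows. This is a sketch using Python's built-in sqlite3 module; the rows are hypothetical, and rowcount is the Python DB-API attribute that reports how many rows a statement affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teams (TeamId INT PRIMARY KEY, Name VARCHAR(30), Description VARCHAR(100));
INSERT INTO Teams VALUES (1, 'Analysts', NULL), (2, 'Testers', NULL), (11, 'Programmers', NULL);
""")

# With a WHERE clause only the selected row is modified or removed.
updated = conn.execute(
    "UPDATE Teams SET Description = 'Programmers team for project 2' WHERE TeamId = 11"
).rowcount
deleted = conn.execute("DELETE FROM Teams WHERE TeamId = 2").rowcount
left_after_delete = conn.execute("SELECT COUNT(*) FROM Teams").fetchone()[0]

# Without a WHERE clause every remaining row is deleted.
conn.execute("DELETE FROM Teams")
left_at_end = conn.execute("SELECT COUNT(*) FROM Teams").fetchone()[0]

print(updated, deleted, left_after_delete, left_at_end)  # 1 1 2 0
```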

ª Nested Subqueries and Complex Queries


The SELECT-FROM-WHERE statement discussed so far is the simplest SQL statement for
querying a database. SQL SELECT statements can be combined to form subqueries.
A subquery is a complete SELECT-FROM-WHERE statement
that is contained in another query.
Subqueries can be used in different ways:
& Subqueries in the WHERE clause to form nested queries,
& Subqueries in set operations such as UNION, EXCEPT, …, and
& Subqueries in the FROM clause as constant tables

Ñ Nested Queries
SQL SELECT statements can be contained in the WHERE clause of another SQL statement to
form nested queries. The SELECT statement that contains the nested query is said to be the
outer query. Subqueries in a nested SQL statement can produce a scalar value (constant) or a table.
Subqueries that return a scalar value can be used in a comparison expression of the WHERE clause
just like constants or column values. For subqueries that return a table, special
operators are used in the test expression, such as the operator IN, which tests whether
a scalar value exists in the resulting table.
Example:
Considering the following relations,
- Employees(EmpId, Name, BDate, SubCity, Kebele, Phone, Salary)
- Teams(TeamId, PrjId, Name, Descr)
- EmpTeams(EmpId, TeamId)
- Projects(PrjId, Name, SDate, DDate, CDate, CustId)
- Customers(CustId, Name, Address)
Write a query to retrieve all the projects that are owned by the customer 'XYZ'. (Assume
the name of a customer is unique.)
SELECT Name, SDate, DDate, CDate
FROM Projects
WHERE CustId = (SELECT CustId
                FROM Customers
                WHERE Name='XYZ')

An alternative for the above query is:
SELECT p.Name, SDate, DDate, CDate
FROM Projects AS p, Customers AS c
WHERE p.CustId = c.CustId AND c.Name='XYZ'

Write a query to retrieve the name and phone of employees that are participating on projects
that are owned by the customer 'XYZ'.
SELECT Name, Phone
FROM Employees
WHERE EmpId IN (SELECT EmpId
                FROM EmpTeams AS et, Teams AS t
                WHERE et.TeamId= t.TeamId AND
                      PrjId IN (SELECT PrjId
                                FROM Projects AS p, Customers AS c
                                WHERE p.CustId=c.CustId
                                      AND c.Name='XYZ'))
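
That a scalar subquery and the equivalent join return the same rows can be verified on sample data. This is a sketch using Python's built-in sqlite3 module; the tables are reduced to the columns needed and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustId INT PRIMARY KEY, Name VARCHAR(30));
CREATE TABLE Projects (PrjId INT PRIMARY KEY, Name VARCHAR(30), CustId INT);
INSERT INTO Customers VALUES (1, 'XYZ'), (2, 'ABC');
INSERT INTO Projects VALUES (100, 'Payroll', 1), (200, 'Banking DB', 2), (300, 'Inventory', 1);
""")

# Nested form: the inner query yields a single CustId (customer names are unique).
nested = conn.execute("""
    SELECT Name FROM Projects
    WHERE CustId = (SELECT CustId FROM Customers WHERE Name = 'XYZ')
    ORDER BY Name""").fetchall()

# Join form: the same condition expressed over both tables.
joined = conn.execute("""
    SELECT p.Name FROM Projects AS p, Customers AS c
    WHERE p.CustId = c.CustId AND c.Name = 'XYZ'
    ORDER BY p.Name""").fetchall()

print(nested)  # [('Inventory',), ('Payroll',)]
print(joined)  # [('Inventory',), ('Payroll',)]
```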

TEST Operators in subquery


˜ IN:- The IN operator is used to test the existence of a value in the resulting table of the
subquery. The subquery needs to return a single-column table with a type compatible with the
scalar value in the test expression.
˜ ALL:- The ALL operator is used with comparison operators (=, <>, >, <, >= and <=) to
compare the scalar value with all the rows in the resulting table. The expression
evaluates to TRUE if all the rows satisfy the comparison. Otherwise, it
evaluates to FALSE.
˜ ANY:- The ANY operator is also used with the comparison operators to compare a scalar
value with the rows in the resulting table. If there is at least one row that satisfies
the comparison, the expression evaluates to TRUE.
˜ EXISTS:- The EXISTS function takes a subquery and returns TRUE if the resulting
table of the subquery has at least one row; otherwise it returns FALSE.
Example:
Write a query to retrieve customers that own at least one project
SELECT Name, Address
FROM Customers AS c
WHERE EXISTS (SELECT *
FROM Projects AS p
WHERE c.CustId=p.CustId)
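
The EXISTS test and its negation can be tried on a small data set. This is a sketch using Python's built-in sqlite3 module with hypothetical rows; it covers only EXISTS because SQLite does not implement the ALL and ANY subquery operators.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustId INT PRIMARY KEY, Name VARCHAR(30));
CREATE TABLE Projects (PrjId INT PRIMARY KEY, Name VARCHAR(30), CustId INT);
INSERT INTO Customers VALUES (1, 'XYZ'), (2, 'ABC'), (3, 'NoProjects');
INSERT INTO Projects VALUES (100, 'Payroll', 1), (200, 'Banking DB', 2);
""")

# The subquery refers to c.CustId of the outer query, so it is evaluated per customer.
owners = [name for (name,) in conn.execute("""
    SELECT Name FROM Customers AS c
    WHERE EXISTS (SELECT * FROM Projects AS p WHERE c.CustId = p.CustId)
    ORDER BY Name""")]

non_owners = [name for (name,) in conn.execute("""
    SELECT Name FROM Customers AS c
    WHERE NOT EXISTS (SELECT * FROM Projects AS p WHERE c.CustId = p.CustId)
    ORDER BY Name""")]

print(owners)      # ['ABC', 'XYZ']
print(non_owners)  # ['NoProjects']
```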

Ñ SET Operation of Queries


As the discussion so far shows, the result of a query is a collection of tuples (rows).
Hence the set operations union, intersection and difference (UNION, INTERSECT and
EXCEPT) can be used in SQL statements. As the operations are binary, such a statement consists
of two subqueries. The two subqueries are enclosed in parentheses and the operator is placed in
between. That is:
(Subquery 1)
UNION | INTERSECT | EXCEPT
(Subquery 2)
For the set operations the two subqueries should select identical columns (the same number of
columns with compatible types) in their <column_list> of the SELECT statement. If the need
arises, an alias (rename) of columns can be used to have a common set of attributes.
Example:
Write a query to retrieve all the projects that are owned by 'XYZ' and in which 'Dave' is not
participating.
(SELECT p.Name, SDate, DDate, CDate
 FROM Projects AS p, Customers AS c
 WHERE p.CustId = c.CustId AND c.Name='XYZ')
EXCEPT
(SELECT Name, SDate, DDate, CDate
 FROM Projects
 WHERE PrjId IN (SELECT PrjId
                 FROM EmpTeams AS et, Teams AS t
                 WHERE et.TeamId= t.TeamId AND
                       EmpId IN (SELECT EmpId
                                 FROM Employees
                                 WHERE Name='Dave')))
Unlike the SELECT statement, which retrieves duplicate rows if they exist, the set operators do
not include duplicates in the resulting table. If all the resulting rows are to be retrieved, the
ALL keyword is used with the set operators to prevent duplicate elimination. That is:
- UNION ALL
- INTERSECT ALL
- EXCEPT ALL
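
The effect of the set operators, and of ALL on duplicates, can be seen on two small tables. This is a sketch using Python's built-in sqlite3 module; the tables TeamA and TeamB are hypothetical, SQLite does not accept parentheses around the operands, and of the ALL variants it supports only UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TeamA (EmpId INT);
CREATE TABLE TeamB (EmpId INT);
INSERT INTO TeamA VALUES (1), (2), (2);
INSERT INTO TeamB VALUES (2), (3);
""")

def run(operator):
    """Combine the two single-column queries with the given set operator."""
    sql = "SELECT EmpId FROM TeamA " + operator + " SELECT EmpId FROM TeamB"
    return sorted(row[0] for row in conn.execute(sql))

union = run("UNION")          # duplicates eliminated
union_all = run("UNION ALL")  # duplicates kept
intersect = run("INTERSECT")
difference = run("EXCEPT")

print(union)       # [1, 2, 3]
print(union_all)   # [1, 2, 2, 2, 3]
print(intersect)   # [2]
print(difference)  # [1]
```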

Ñ Joined Tables
Most of the time queries are written to gather information from more than one table. In such
situations the FROM clause in the SELECT statement may consist of a list of tables, and a
joining condition may be specified in the WHERE clause.
The other way of reading data from more than one table is with the use of joined tables in the
FROM clause.
The variations of join operation are:

Cross Product
<left_table> CROSS JOIN <right_table>
The Cross product forms the Cartesian product set from the participating tables.

Natural Join
<left_table> NATURAL JOIN <right_table>
The natural join forms a join of rows with identical values in the common attributes of the
participating tables.

Theta Join
<left_table> JOIN <right_table> ON <condition>
The theta join joins the rows of the participating tables that satisfy the joining condition
specified by the ON clause.

Outer Join
<left_table> {NATURAL} [RIGHT | LEFT | FULL] OUTER JOIN
<right_table> {ON <condition>}
The outer join keeps all the rows from the right table (if RIGHT is used), or the
left table (if LEFT is used), or both tables (if FULL is used), and joins them to the other
table on the joining condition. When a row has no match in the other table, NULL values are
placed in the columns selected from that table. The NATURAL keyword may be used optionally,
in place of the ON clause, to join on the common attributes.
Example:
The query that retrieves the name and phone of employees that are participating on projects
that are owned by the customer 'XYZ' can be modified as follows using joined tables:
SELECT Name, Phone
FROM Employees
WHERE EmpId IN (SELECT EmpId
                FROM EmpTeams NATURAL JOIN Teams
                WHERE PrjId IN (SELECT PrjId
                                FROM Projects AS p JOIN Customers AS c
                                     ON p.CustId=c.CustId
                                WHERE c.Name='XYZ'))

A query to retrieve all the projects and the teams they consist of, if any:
SELECT p.Name, p.SDate, p.DDate, t.Name, t.Descr
FROM Projects AS p LEFT OUTER JOIN Teams AS t ON p.PrjId=t.PrjId

The query returns all the projects, and if a project has teams, the project
is joined with them as well. A project having more than one team is
joined with each of its teams in the resulting table.
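
The join variants can be compared side by side on sample data. This is a sketch using Python's built-in sqlite3 module; the rows are hypothetical, and the team name column is called TName here so that PrjId is the only attribute common to both tables for the natural join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (PrjId INT PRIMARY KEY, Name VARCHAR(30));
CREATE TABLE Teams (TeamId INT PRIMARY KEY, PrjId INT, TName VARCHAR(30));
INSERT INTO Projects VALUES (100, 'Payroll'), (200, 'Banking DB');
INSERT INTO Teams VALUES (10, 100, 'Analysts'), (11, 100, 'Programmers');
""")

# Theta join: only projects that have at least one matching team.
inner = conn.execute("""
    SELECT p.Name, t.TName
    FROM Projects AS p JOIN Teams AS t ON p.PrjId = t.PrjId
    ORDER BY p.Name, t.TName""").fetchall()

# Left outer join: every project, with NULL where no team matches.
left_outer = conn.execute("""
    SELECT p.Name, t.TName
    FROM Projects AS p LEFT OUTER JOIN Teams AS t ON p.PrjId = t.PrjId
    ORDER BY p.Name, t.TName""").fetchall()

# Natural join: joins on the common attribute PrjId without an ON clause.
natural = conn.execute("""
    SELECT Name, TName
    FROM Projects NATURAL JOIN Teams
    ORDER BY Name, TName""").fetchall()

print(inner)       # [('Payroll', 'Analysts'), ('Payroll', 'Programmers')]
print(left_outer)  # [('Banking DB', None), ('Payroll', 'Analysts'), ('Payroll', 'Programmers')]
```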

Ñ Ordering Query Results


The resulting table from the SQL SELECT statement can be organized in some order. To
specify the order in which the resulting table is organized, the ORDER BY clause is used in the
statement. That is:
ORDER BY {<order_column> [ASC | DESC]}[,..n]
The default order sequence is ASC, which can be omitted.
Example:
A query that retrieves employees having a salary of 1200 or more, ordered by their
salary (highest first) followed by their name:
SELECT *
FROM Employees
WHERE Salary >=1200.0
ORDER BY Salary DESC, Name ASC

Ñ Aggregate Function in SQL


SQL allows grouping of resulting rows in a query so that aggregate functions can be applied to
make analysis and summary. The aggregate functions supported by the SQL statement are:

Summation
SUM (<column_name>)
Returns the sum of values for the numeric field (column) <column_name>.

Average
AVG (<column_name>)
Returns the average of values for the numeric field (column) <column_name>.

Minimum
MIN (<column_name>)
Returns the minimum of values for the numeric field (column) <column_name>.

Maximum
MAX (<column_name>)
Returns the maximum of values for the numeric field (column) <column_name>.

Count
COUNT (<column_name> | *)
Returns the number of values in the column <column_name> or the number of rows in the
table if * is used.
Example:
A query that retrieves the number of employees
SELECT COUNT(*)
FROM Employees

A query that retrieves the average salary of employees


SELECT AVG(Salary)
FROM Employees

A NULL value is ignored by the aggregate functions. That is, the AVG function does not include
NULL values in the average calculation in any way, and the COUNT(A) function counts only
the non-null values of column A.
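
The treatment of NULL by the aggregate functions can be confirmed with a three-row table. This is a sketch using Python's built-in sqlite3 module; the employees and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmpId INT PRIMARY KEY, Name VARCHAR(30), Salary DECIMAL(10,2));
INSERT INTO Employees VALUES (1, 'Abebe', 1000), (2, 'Sara', 2000), (3, 'Almaz', NULL);
""")

# COUNT(*) counts rows; COUNT(Salary) and AVG(Salary) skip the NULL salary.
count_rows, count_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), AVG(Salary) FROM Employees"
).fetchone()

print(count_rows, count_salary, avg_salary)  # 3 2 1500.0
```

The average is taken over the two known salaries, not over three rows.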
The GROUP BY clause is used after the WHERE clause to group the result of the SQL
statement. The syntax for the clause is:
GROUP BY <group_column>[,..n]
HAVING <condition>
- The GROUP BY clause specifies the columns on which the grouping is made.
- The HAVING clause sets a condition <condition> that a group must satisfy to be included in
the result of the grouping query.
The columns that can be included in the HAVING clause are:
& Aggregated columns from the FROM clause tables, or
& Unaggregated columns from the GROUP BY list
Example:
A query that counts the number of teams a given Project is having
SELECT p.Name, COUNT(t.Name) AS NoOfTeams
FROM Projects p JOIN Teams t ON p.PrjId=t.PrjId
GROUP BY p.Name

To add a condition on the groups, such as only projects that have more than 2
teams:
SELECT p.Name, COUNT(t.Name) AS NoOfTeams
FROM Projects p JOIN Teams t ON p.PrjId=t.PrjId
GROUP BY p.Name
HAVING COUNT(t.Name)>2

For only the projects that have a name starting with 'A', the query can be written
as:
SELECT p.Name, COUNT(t.Name) AS NoOfTeams
FROM Projects p JOIN Teams t ON p.PrjId=t.PrjId
GROUP BY p.Name
HAVING p.Name LIKE 'A%'
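
Grouping with and without a HAVING condition can be compared on sample data. This is a sketch using Python's built-in sqlite3 module; the project and team rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (PrjId INT PRIMARY KEY, Name VARCHAR(30));
CREATE TABLE Teams (TeamId INT PRIMARY KEY, PrjId INT, Name VARCHAR(30));
INSERT INTO Projects VALUES (100, 'Payroll'), (200, 'Banking DB');
INSERT INTO Teams VALUES (10, 100, 'Analysts'), (11, 100, 'Programmers'),
                         (12, 100, 'Testers'), (20, 200, 'Programmers');
""")

# One result row per project, with the number of its teams.
grouped = conn.execute("""
    SELECT p.Name, COUNT(t.Name) AS NoOfTeams
    FROM Projects p JOIN Teams t ON p.PrjId = t.PrjId
    GROUP BY p.Name
    ORDER BY p.Name""").fetchall()

# HAVING filters whole groups after the aggregation.
large_only = conn.execute("""
    SELECT p.Name, COUNT(t.Name) AS NoOfTeams
    FROM Projects p JOIN Teams t ON p.PrjId = t.PrjId
    GROUP BY p.Name
    HAVING COUNT(t.Name) > 2""").fetchall()

print(grouped)     # [('Banking DB', 1), ('Payroll', 3)]
print(large_only)  # [('Payroll', 3)]
```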

SUMMARY
In summary, the general form of the SELECT statement is as follows:
SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC] ]
The clauses enclosed in square brackets are optional and can be omitted.

ª Views
A view in the context of SQL is a virtual table that is derived from one or more tables. A view
does not necessarily exist in physical form; rather, it presents data from other tables in a
frequently used alternative form, and can be used for retrieving data as well as for updating
or deleting rows in those tables.
Example:
It may be frequently required to retrieve employees and the projects they are
participating in. In such a case it would be advantageous to have a view that consists of the
employees and the projects instead of writing the join operation every time.
It should also be noted that as data in the original table changes, so does data in the view. This is
because a view isn't really a table itself, but only a way to look at part of the original table.
As views are generated from tables, there are restrictions that may prevent data update, insert
or delete through a view. But if a view allows data update, rows updated or deleted in the
view are updated or deleted in the corresponding table as well.
The CREATE VIEW statement is used to create views and the syntax is as follows:
CREATE VIEW <view_name>
AS <select_statement>
Example:
Considering the following relations,
- Employees(EmpId, EmpName, BDate, SubCity, Kebele, Phone, Salary)
- Teams(TeamId, PrjId, TeamName, Descr)
- EmpTeams(EmpId, TeamId)
- Projects(PrjId, PrjName, SDate, DDate, CDate, CustId)
A view for retrieving employees and the projects they are participating in.
CREATE VIEW ProjectEmployees
AS SELECT p.*, e.*
FROM Projects AS p NATURAL JOIN Teams AS t NATURAL JOIN EmpTeams AS
et NATURAL JOIN Employees AS e

REMARK: All the columns in the SELECT statement must have valid unique names. Otherwise
an alias has to be used for the columns.
Views, like other database objects, can be dropped using the DROP statement as follows:
DROP VIEW <view_name>
Example:
To drop the ProjectEmployees view.
DROP VIEW ProjectEmployees

Views can be used in the FROM clause of a SELECT statement in the same way as tables.
Example:
To retrieve all the employees working on the 'XYZ' project.
SELECT EmpName
FROM ProjectEmployees
WHERE PrjName='XYZ'
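
That a view always reflects the current contents of its underlying tables can be checked as follows. This is a sketch using Python's built-in sqlite3 module; the tables are reduced to the columns needed, the rows are hypothetical, and the view selects two named columns rather than p.*, e.* to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (PrjId INT PRIMARY KEY, PrjName VARCHAR(30));
CREATE TABLE Teams (TeamId INT PRIMARY KEY, PrjId INT, TeamName VARCHAR(30));
CREATE TABLE EmpTeams (EmpId INT, TeamId INT);
CREATE TABLE Employees (EmpId INT PRIMARY KEY, EmpName VARCHAR(30));
INSERT INTO Projects VALUES (100, 'XYZ');
INSERT INTO Teams VALUES (10, 100, 'Programmers');
INSERT INTO Employees VALUES (1, 'Abebe'), (2, 'Sara');
INSERT INTO EmpTeams VALUES (1, 10);
CREATE VIEW ProjectEmployees AS
    SELECT p.PrjName, e.EmpName
    FROM Projects AS p NATURAL JOIN Teams AS t
         NATURAL JOIN EmpTeams AS et NATURAL JOIN Employees AS e;
""")

query = "SELECT EmpName FROM ProjectEmployees WHERE PrjName = 'XYZ' ORDER BY EmpName"
before = [name for (name,) in conn.execute(query)]

# Changing a base table changes what the view shows; the view stores no rows itself.
conn.execute("INSERT INTO EmpTeams VALUES (2, 10)")
after = [name for (name,) in conn.execute(query)]

conn.execute("DROP VIEW ProjectEmployees")
print(before)  # ['Abebe']
print(after)   # ['Abebe', 'Sara']
```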

ª Embedded and Dynamic SQL


The idea of embedded SQL arose long before the SQL procedural extensions were developed.
It was introduced by IBM in the beginning of the 1980s and then implemented by many other
SQL vendors. Dynamic SQL was the logical continuation of the embedded SQL principles
that alleviated some limitations and inconveniences of the latter.
Embedded SQL is placed inside a host program, not built on the fly. The actual SQL code is
embedded into the application code, converted by a preprocessor into API calls appropriate for
the host language, and then compiled normally.
Static SQL is simply native SQL code that is handled normally. It doesn't change at runtime and
it is a constant string literal. Since SQL has to exist in a host program to talk to the outside
world, it will be embedded somewhere; it can also be part of a trigger, stored procedure, etc.
Dynamic SQL is made up on the fly by a procedure or end user as a string containing SQL
statements that may change at runtime. For instance, the WHERE clause of a SQL statement may
depend on factors not known at compile time. The ANSI standard has the PREPARE and
EXECUTE statements for this. Microsoft and other vendors do it differently, but the idea is the
same. It is like having Query Analyzer in your program, and just as dangerous.
The difference between static and dynamic SQL has to do with when the plan for database access
is determined. With static SQL the plan is determined before your program ever runs (or at least
could be). This means that the database doesn't have to figure out how to find the data you are
interested in at runtime. It also means that if the database statistics change radically the plan
used by your query may become out of date.
The plan used to execute dynamic SQL statements is determined at runtime. This means that
knowledge only available at runtime may be used to form the SQL statement. It also means that
the plan will be up to date with the current database statistics. Unfortunately the database will
have to do extra work at runtime to determine what the plan should be.
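The contrast between a fixed statement and one assembled at runtime can be sketched as follows. This is only an illustration using Python's sqlite3 module, not embedded SQL with a preprocessor; the function name find_employees and the sample rows (including the SubCity values) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmpId INTEGER, EmpName TEXT, SubCity TEXT, Salary REAL)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?, ?)", [
    (1, "Adam", "Bole", 2500.0),      # hypothetical sample rows
    (2, "Charles", "Arada", 1400.0),
    (3, "Dave", "Bole", 1200.0),
])

# Static SQL: the statement text is a constant; only the parameter value varies.
STATIC_QUERY = "SELECT EmpName FROM Employees WHERE Salary >= ? ORDER BY EmpName"

def find_employees(min_salary=None, sub_city=None):
    """Dynamic SQL: the WHERE clause is assembled at runtime from the given filters."""
    clauses, params = [], []
    if min_salary is not None:
        clauses.append("Salary >= ?")
        params.append(min_salary)
    if sub_city is not None:
        clauses.append("SubCity = ?")
        params.append(sub_city)
    sql = "SELECT EmpName FROM Employees"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY EmpName"
    # Values are passed as parameters, never pasted into the SQL string.
    return [row[0] for row in conn.execute(sql, params)]

print([row[0] for row in conn.execute(STATIC_QUERY, (1300,))])
print(find_employees(sub_city="Bole"))
```

Keeping user-supplied values in parameters rather than in the statement text is one way to limit the danger mentioned above.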

6. Data Storage and Querying


ª Storage and File Structure
Ñ Introduction
A database system is designed to hold large volumes of data that must be stored physically
(permanently) on a storage medium. The storage media in a computer can be categorized as:
˜ Primary Storage: Storage media that the CPU can access directly: the main memory
and the cache.
& Cache is the level of the memory hierarchy closest to the CPU; it is built inside the
microprocessor chip. Typically the response time is in nanoseconds.
& The main memory is the next level in the hierarchy; it provides the main working
environment for the CPU to keep programs and data.
˜ Secondary Storage: Storage media for permanent storage such as magnetic disk and optical
disk. Larger in capacity but significantly slower than primary storage. Typical
response time is in milliseconds. Secondary storage is used for virtual memory, disk
storage, and the file system.
˜ Tertiary Storage: Storage media used for archive and backup data, such
as magnetic tape. Typical response time is a few seconds or even minutes.
Fig 1. Memory Hierarchy (levels shown: Tertiary Storage, Secondary Storage, Main Memory, Cache)
Virtual memory is storage on the disk that is often addressed through a 32-bit address space;
hence 2^32 bytes = 4GB of data can be addressed.
Ñ Secondary Storage Device (Disk) Structure
The disk drive consists of two moving assemblies:
- Disk Assembly
- Head (Arm) Assembly
The disk consists of circular platters that are rotated around the spindle by the disk assembly.
Each platter surface is covered with a thin layer of magnetic material. The platters may be
double-sided (both upper and lower surfaces used) or single-sided. The surfaces are organized
into tracks, which are concentric circles of distinct diameter on each platter. The corresponding
tracks across the surfaces of the disk pack form cylinders. The tracks are further divided into
sectors, which are segments of the circle separated by gaps.

The head assembly places the disk head of each surface close to the required track, and the
disk assembly rotates the disk to bring the first sector to be read or written under the head.
The movement of the disk assembly and the head assembly for data read/write is managed by a
processor known as the disk controller.
While sectors are physical units of the disk for bit storage, blocks are logical units that are
set by the operating system during disk formatting. Typical block sizes range from 512 to
4096 bytes.
Fig 2. Disk Structure
Example:
A disk has 8 double-sided platters. Each surface is divided into 2^14 = 16,384 tracks
with 128 sectors per track. Each sector holds 4,096 bytes. Determine the size of the disk.
- Bytes per sector = 4,096 bytes
- Bytes per track = 128 * 4,096 = 524,288 bytes
- Bytes per surface = 16,384 * 524,288 = 8,589,934,592 bytes
- Bytes in disk = 16 * 8,589,934,592 = 137,438,953,472 bytes
- Disk Size = 128GB (2^37 bytes)
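The same calculation can be written as a small function. This is only a sketch; the function and parameter names are chosen here for illustration.

```python
def disk_capacity_bytes(platters, surfaces_per_platter, tracks_per_surface,
                        sectors_per_track, bytes_per_sector):
    """Capacity = surfaces * tracks per surface * sectors per track * bytes per sector."""
    surfaces = platters * surfaces_per_platter
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

# The disk from the example: 8 double-sided platters, 2^14 tracks, 128 sectors, 4096 bytes.
capacity = disk_capacity_bytes(8, 2, 2**14, 128, 4096)
print(capacity)           # 137438953472
print(capacity // 2**30)  # 128 (GB, taking 1 GB = 2^30 bytes)
```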
Exercise:
An HDD (hard disk drive) is labeled with parameters given below. Determine the
permissible sector size.
- 30GB
- 16383 Cylinders
- 16 Heads
- 224 Sectors per Track.
Data are transferred between disk and memory in blocks. To transfer a disk block, given its
address, the disk controller first locates the sector containing the block. The time taken from
the moment a disk read instruction is issued to the time the data appear in memory is known
as disk latency.
The major components of the latency of the disk are:


& Seek Time: time taken for the read/write head to move to the proper track (cylinder). Typical
seek time is 7 to 10 milliseconds.
& Rotational Latency (Delay): time taken for the sector containing the first desired
block to rotate under the head. Typically one full rotation takes about 10 milliseconds.
& Transfer Time: time to transfer the data to the memory.
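The three components add up to the latency of one block access. The sketch below assumes that, on average, the desired sector is half a rotation away from the head; the sample figures are hypothetical.

```python
def estimated_access_time_ms(seek_ms, rotation_ms, transfer_ms):
    """Seek time + average rotational delay (half a rotation) + transfer time."""
    average_rotational_delay_ms = rotation_ms / 2
    return seek_ms + average_rotational_delay_ms + transfer_ms

# Hypothetical figures: 8 ms seek, 10 ms per rotation, 0.5 ms transfer.
print(estimated_access_time_ms(8, 10, 0.5))  # 13.5
```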
Ñ Data Representation
Data is stored in the form of records, each consisting of a collection of related data items. The
data items or values form a sequence of bytes that corresponds to particular fields.
Data type representation:
- INTEGER – 4 Bytes
- FLOAT – 4 or 8 Bytes
- DATETIME – 8 Bytes
- CHAR(n) – n Bytes; a pad character (┴) is used to fill the unused bytes.
- VARCHAR(n) – maximum of n+1 Bytes; unused bytes are ignored.
- Enumerated types – represented as integer codes with the required number of bytes.
Fixed Length Record Representation
Example:
Consider the Employees table:
Employees(EmpId, Name, BDate, Address, Salary)
- EmpId – INTEGER – 4 Bytes
- Name – CHAR(30) – 30 Bytes
- BDate – DATETIME – 8 Bytes
- Address – VARCHAR(50) – 51 Bytes
- Salary – FLOAT – 4 Bytes

Thus the record is represented as:
Field:   EmpId   Name   BDate   Address   Salary
Offset:  0       4      34      42        93
The record takes 97 Bytes.
The number of bytes at which a field begins is said to be the offset of the field.
- Thus offset of EmpId is 0, Name is 4, BDate is 34, …


On some machines the offset of each field is required to be a multiple of 4.
Example:
The Employee record with offsets that are multiples of 4 is represented as follows:
Field:   EmpId   Name   BDate   Address   Salary
Offset:  0       4      36      44        96
The record takes 100 Bytes.
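The offsets in both layouts can be computed mechanically. The following sketch rounds each field's starting position up to the required multiple; the function name field_offsets and its parameters are introduced here only for illustration.

```python
def field_offsets(fields, alignment=1, start=0):
    """Return ([(name, offset), ...], record_size) for fields laid out in order."""
    def align(position):
        # Round position up to the next multiple of `alignment`.
        return -(-position // alignment) * alignment

    offsets = []
    position = start
    for name, size in fields:
        position = align(position)
        offsets.append((name, position))
        position += size
    return offsets, align(position)

employee_fields = [("EmpId", 4), ("Name", 30), ("BDate", 8), ("Address", 51), ("Salary", 4)]

print(field_offsets(employee_fields))               # packed layout, 97 bytes
print(field_offsets(employee_fields, alignment=4))  # offsets in multiples of 4, 100 bytes
```

Passing start=12 together with alignment=4 reserves 12 bytes at the front of the record, which is how a record header would shift the offsets.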
Record Header
A record representation may include information that describes the record, in the form of a
record header. A record header may consist of:
- The record schema: a pointer to the schema definition.
- The length of the record.
- Time stamp for the record.
Example:
The Employee record with a 12-byte header (To Schema, Length, Time Stamp) and offsets that are
multiples of 4:
Field:   Header   EmpId   Name   BDate   Address   Salary
Offset:  0        12      16     48      56        108
The record size is 112 Bytes.


Records are packed into blocks, each of which has a block header as well.

Block Header | R1 | R2 | … | Rn
The block header may consist of:
- Link to one or more other blocks.
- Information about the block.
- Information about the relation.
- Directory for the offset of each record.
- Block ID.
- Time stamp for the block.

Variable Length Record Representation


(Reading Assignment)
Ñ File Organization
File organization refers to the method of arranging the data of a file into records on external
storage. Records organized on storage media can be physically located by their record id.
However, the user mainly expects to apply a search condition based on a certain field of the
record. Hence the designer of the file organization needs to look for a structure that can locate
the records easily.
One possible way is the use of indexes: data structures that allow finding the record ids of
records with given values in the index search key fields.
There are also alternatives, each ideal for some situations, and not so good in others:
˜ Heap (Random Order) files
Heap Files, also known as Pile Files, are suitable when the typical access is a file scan
retrieving all records. Records are placed in the file in the order in which they are
inserted, wherever there is space.
Insertion is very efficient: the last disk block of the file is copied into memory, the new
record is added, and the block is written back to the disk.
Searching is expensive: the only search possible is a linear (exhaustive) search, block by
block.
Deletion requires periodic reorganization: the record to be deleted is located and its
block is fetched into memory; the record is then deleted and the block is rewritten to the
disk. Alternatively, a deletion marker may be used to flag deleted records, with a different
marker for valid records. The file is reorganized by accessing each block, packing the valid
records, and removing the deleted records to reclaim unused space.
˜ Sorted Files
Sorted Files, also known as Sequential Files, are best if records must be retrieved in
some order, or only a 'range' of records is needed. The records are physically ordered
based on the value of the desired field.
Insertion is expensive: the proper location for the incoming record needs to be found
and space has to be created (which may require data movement); only then can the record be
added. To minimize time and improve insertion efficiency, free space may be interleaved
between blocks as overflow (transaction) blocks.
Searching is efficient: binary search is applicable on the ordering key. But searching with
any other criterion is similar to the heap file organization.
Deletion is expensive: similar to the insertion operation, deletion may also involve large
data movement.
Update: may require data reorganization if the updated field is the ordering key;
otherwise update is a simple operation that requires reading the block, modifying the
record, and writing the block back to disk.
˜ Indexes
Indexes are data structures that organize records via trees or hashing. Like sorted files,
they speed up searches for a subset of records, based on values in certain “search key”
fields. Updates are much faster than in sorted files.
Hashing
A file organization based on hashing provides fast access to records under certain search
conditions. In hashing, a hash function h(), also known as a randomizing function, is applied
to the hash field value to yield the address of the disk block in which the record is to be
stored.
B-Trees
The B-Tree data structure can be used as the primary organization of the records. B-Trees
are also used in indexing.

ª Indexing and Hashing


Indexes are auxiliary access structures that are used to speed up the retrieval of records in
response to a certain search condition.
There are two basic kinds of indexes:
& Ordered Indexes: Based on a sorted ordering of the values in a key field.
& Hash Indexes: Based on a uniform distribution of values across a range of buckets,
determined by a hash function.
Ñ Ordered Indexes
A file with a record structure having several fields (or attributes) is often accessed through an
index structure defined on a single field of the file, called the search key or indexing field. A
single file may have several index structures on various search keys. If the file is physically
organized sequentially on the search key, the index is said to be a Primary Index or Clustering
Index; if the search key specifies an order different from the sequential order of the file, the
index is called a Secondary Index or Non-clustering Index.

Primary Index
An index file is a separate file from the data file. It consists of index records (or index
entries), each holding a search key value and pointers to one or more records. The index file is
ordered on the search key, and each pointer identifies a disk block and an offset within the
block to identify the record. The primary index is mostly built on the primary key, and in this
context the records have distinct values in the search key.
There are two types of ordered indexes, namely dense index and sparse index.
Dense Index: has an index record for every search key value in the data file. When the search
key values are distinct, the number of entries in a dense index is equal to the number of records
in the data file.
Sparse (Non-dense) Index: has an index entry only for the first record in each block, known as
the anchor record of the block. The number of entries in the index file is equal to the number of
blocks of the data file. To locate a record through a sparse index, the block pointed to by the
index entry with the largest search key value that is less than or equal to the searched value
is read.
NOTE: A single data file can have only one primary or clustering index.
Example:
a) Dense Index: one index entry per record
Index File (Key)   Data File: Employees (Name … Salary)
Adam               Adam … 2500
Charles            Charles … 1400
Dave               Dave … 1200
Elisabeth          Elisabeth … 1600
Helen              Helen … 1500
John               John … 1200
Kevin              Kevin … 2300
Mary               Mary … 1800
b) Sparse Index: one index entry per block
Index File (Key)   Data File: Employees (Name … Salary)
Adam               Adam … 2500, Charles … 1400, Dave … 1200
Elisabeth          Elisabeth … 1600, Helen … 1500, John … 1200
Kevin              Kevin … 2300, Mary … 1800
Fig 3. a) Dense index b) Sparse Index for Primary Index
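A lookup through a sparse index can be sketched with the Employees data of Fig 3. The sketch assumes three records per block and keeps the blocks as in-memory lists; the names blocks, sparse_index and lookup are illustrative.

```python
from bisect import bisect_right

# Data file: blocks of records sorted on the search key (Name).
blocks = [
    [("Adam", 2500), ("Charles", 1400), ("Dave", 1200)],
    [("Elisabeth", 1600), ("Helen", 1500), ("John", 1200)],
    [("Kevin", 2300), ("Mary", 1800)],
]

# Sparse index: the anchor (first) key of every block, in block order.
sparse_index = [block[0][0] for block in blocks]

def lookup(name):
    """Return the salary stored for `name`, or None if there is no such record."""
    # Entry with the largest key that is less than or equal to the searched value.
    position = bisect_right(sparse_index, name) - 1
    if position < 0:
        return None  # the searched value is smaller than every anchor key
    for key, salary in blocks[position]:  # scan only the selected block
        if key == name:
            return salary
    return None

print(lookup("Helen"))  # 1500
```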

Secondary Index
Secondary indexes provide a secondary way of accessing the data file. Since the data file is not
ordered on the search key of the secondary index, a block anchor cannot be used to build a
sparse index. Hence a secondary index is necessarily a dense index. Secondary
indexes enhance the performance of queries that use keys other than the search key of the
primary index.

Example:

Index File (Key → record)   Data File: Employees (Name … Salary)
1200 → Dave                 Adam … 2500
1200 → John                 Charles … 1400
1400 → Charles              Dave … 1200
1500 → Helen                Elisabeth … 1600
1600 → Elisabeth            Helen … 1500
1800 → Mary                 John … 1200
2300 → Kevin                Kevin … 2300
2500 → Adam                 Mary … 1800
Fig 4. Secondary Index

Multilevel Indexes
The main reason for having an index file is to enable a better search algorithm, such as binary
search, that reduces response time considerably. A binary search requires about log2(bi) block
accesses for an index file having bi blocks. For large data sizes the index file also grows and
may not fit in memory, hence searching it requires several disk block reads.
Example:
Consider a data file having 10,000,000 records, where a block holds 10 data records or
100 index entries (block factor). Determine the maximum number of block accesses
for a data search using:
a. A sequential search with no index.
b. A sequential search on the dense primary index search key.
c. A sequential search on the sparse primary index search key.
d. A binary search on a dense primary index search key, and
e. A binary search on a sparse primary index search key.
Solution (log2 values are rounded up; the extra 1 is the access to the data block)
a. 10,000,000/10 = 1,000,000 blocks
b. 10,000,000/100 + 1 = 100,001 blocks
c. (10,000,000/10)/100 + 1 = 10,001 blocks
d. log2(100,000) + 1 = 17 + 1 = 18 blocks
e. log2(10,000) + 1 = 14 + 1 = 15 blocks
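The five answers can be checked with a few lines of code. This is a sketch of the arithmetic only; the variable names are illustrative, and the logarithm is rounded up because a partial step still costs one block access.

```python
import math

RECORDS = 10_000_000
DATA_BFR = 10     # data records per block
INDEX_BFR = 100   # index entries per block

data_blocks = RECORDS // DATA_BFR              # 1,000,000
dense_index_blocks = RECORDS // INDEX_BFR      # one entry per record
sparse_index_blocks = data_blocks // INDEX_BFR # one entry per data block

def binary_search_accesses(index_blocks):
    """Binary search over the index blocks plus one access for the data block."""
    return math.ceil(math.log2(index_blocks)) + 1

results = {
    "a": data_blocks,
    "b": dense_index_blocks + 1,
    "c": sparse_index_blocks + 1,
    "d": binary_search_accesses(dense_index_blocks),
    "e": binary_search_accesses(sparse_index_blocks),
}
print(results)
```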

REMARK: Block Factor (bfr) is the ratio of the block size to the record size, for either the data
file or the index file. That is, the number of data or index records that can fit in a block.
To deal with this problem, the primary index file (this also applies to a secondary index) is
treated as a data file and a sparse index is built on top of it. The idea behind this is a
multilevel index that reduces the block accesses needed for reading the index file as well. The
index file on which the other index is built is referred to as the first-level index of the
multilevel index, the index on the first-level index is called the second-level index, and so on.

2nd Level Index (Key)   1st Level Index (Key)   Data File: Employees (Name … Salary)
Adam                    Adam                    Adam … 2500
John                    Charles                 Charles … 1400
                        Dave                    Dave … 1200
                        Elisabeth               Elisabeth … 1600
                        Helen                   Helen … 1500
                        John                    John … 1200
                        Kevin                   Kevin … 2300
                        Mary                    Mary … 1800
Fig 5. Two-level Index on a Dense Primary Index

Example:
For the previous example consider a two-level index, then
a. A binary search for the dense primary index will result:
10,000,000/100 = 100,000 → 100,000/100 = 1,000
log2(1,000) + 1 + 1 = 10 + 2 = 12 block accesses
b. A binary search for the sparse primary index will result:
1,000,000/100 = 10,000 → 10,000/100 = 100
log2(100) + 1 + 1 = 7 + 2 = 9 block accesses
For a three-level sparse primary index the third level has 100/100 = 1 block, which is read in
a single access, so the binary search will have:
1 + 1 + 1 + 1 = 4 block accesses
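The multilevel figures follow one pattern: a binary search on the top-level index, then one block access per lower index level, then one access for the data block. A sketch under that assumption (the function name multilevel_accesses is illustrative):

```python
import math

def multilevel_accesses(first_level_blocks, levels, index_bfr=100):
    """Block accesses for a search through a multilevel index with `levels` levels."""
    top_blocks = first_level_blocks
    for _ in range(levels - 1):
        # Each higher level has one entry per block of the level below it.
        top_blocks = math.ceil(top_blocks / index_bfr)
    # Binary search on the top level; a single block still costs one access.
    top_cost = max(1, math.ceil(math.log2(top_blocks)))
    # One access for each lower index level, plus one for the data block.
    return top_cost + (levels - 1) + 1

print(multilevel_accesses(100_000, levels=2))  # dense, two levels: 12
print(multilevel_accesses(10_000, levels=2))   # sparse, two levels: 9
print(multilevel_accesses(10_000, levels=3))   # sparse, three levels: 4
```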
Ñ B-Tree Index Files
B+-Tree Index Structure
The B+-tree index structure is a form of balanced tree in which every path from the root of the
tree to a leaf of the tree is of equal length. Each non-leaf node in the tree has between ⌈n/2⌉
and n children, where n is fixed for a particular tree.

The B+-tree index structure imposes a performance overhead on insertion and deletion and adds
space overhead too; however, it avoids the degradation of performance as the file grows, both for
index lookup and for sequential data scan.
A typical node of a B+-tree index structure is as follows:

P1 | K1 | P2 | K2 | P3 | … | Kn-1 | Pn
Fig 6. B+-Tree Node

K1, K2, … Kn-1 are the search keys and P1, P2, … Pn are pointers that point either to a file
record, if the node is a leaf, or otherwise to a node at the next level of the tree structure.
Example:
Root:      (John)
Non-leaf:  (Dave, Eliz)  (Kevin)
Leaves:    (Adam, Carl)  (Dave)  (Eliz, Helen)  (John)  (Kevin, Mary)

Fig 7. B+-Tree Index Structure with n=3

B-Tree Index Structure


A B-Tree index structure is a B+-tree index structure that does not allow the repetition of
search key values.
Typical nodes of a B-tree index structure are as follows:

(a) Leaf Node:     P1 | K1 | P2 | K2 | P3 | … | Kn-1 | Pn
(b) Non-leaf Node: P1 | B1 | K1 | P2 | B2 | K2 | P3 | B3 | … | Kn-1 | Pn
Fig 8. B-Tree Node

Example:

Root:    (Dave, John)
Leaves:  (Adam, Carl)  (Eliz, Helen)  (Kevin, Mary)

Fig 9. B-Tree Index Structure with n=3


(Read on Insertion and Deletion of a node from a B+-Tree/B-Tree index structure.)



Ñ Hash Index
The hashing technique of file organization avoids the need to access an index structure, which
may require more disk accesses (I/O operations). With a hashing file organization, the block of a
record is determined by computing a hash function on the search key. A unit of storage that can
store one or more records having the same hash function result is referred to as a bucket.
The hash function takes the search keys and distributes the records uniformly and randomly
over the buckets.
& Uniform distribution: the hash function assigns each bucket the same number of search
key values from the set of all possible search key values.
& Random distribution: the hash value will not be correlated to any externally visible
ordering on the search key values; the hash function will appear to be random.
Example:
A hash function that finds the sum of the binary representations of the characters in the
search key value and takes it modulo the number of buckets.
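The example hash function can be sketched as follows. It assumes the characters are represented by their UTF-8 byte values; the function name bucket_of is illustrative.

```python
def bucket_of(search_key, number_of_buckets):
    """Sum the byte values of the characters, then take modulo the number of buckets."""
    return sum(search_key.encode("utf-8")) % number_of_buckets

# 'A'=65, 'd'=100, 'a'=97, 'm'=109: the sum is 371, and 371 mod 4 = 3.
print(bucket_of("Adam", 4))  # 3

# Keys made of the same characters always land in the same bucket.
print(bucket_of("abc", 5) == bucket_of("cba", 5))  # True
```

The second call shows one weakness of this simple function: it ignores the order of the characters, which can contribute to skew.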
Bucket Overflow
The main reasons for bucket overflow are:
˜ Insufficient Buckets: the number of buckets assigned may not be sufficient for the
current data size. The number of buckets (nB) must be chosen in such a way that it is
greater than the number of records (nT) divided by the number of records that can fit in a
bucket (fT). That is, nB > nT/fT.
˜ Skew: some buckets may hold more records than others, and they may overflow while
the others still have space. The major reasons for skew are:
- Multiple records for the same search key,
- Non-uniform distribution of search keys by the hash function.
The best solution for bucket overflow is the use of dynamic hashing (example: extendable
hashing), which can be modified dynamically to accommodate the growth or the shrinkage of the
database. But if static hashing has to be used, then to limit the consequences of overflow one
can choose one of the following options:
- Choose a hash function based on the current size, or
- Choose a hash function based on the anticipated size of the file, or
- Periodically reorganize the hash structure in response to file growth or shrinkage.

ª Query Processing and Optimization


Query Processing refers to the range of activities involved in extracting data from a database.
The basic steps in query processing are:
& Parsing and Translation
& Optimization
& Evaluation
Parser and Translator: The parser identifies the language tokens such as SQL keywords,
attribute names, and relation names in the text of the query and checks the query syntax.
An internal representation of the query is created as a tree data structure known as a query
tree. The translator then translates the query blocks from the query tree into relational algebra
expressions.
Optimizer: The optimizer phase of the query processing optimizes the relational algebra
expression for the query blocks using various algorithms and produces an evaluation plan for
execution. The optimizer estimates the cost of operations to select the best evaluation plan.
Evaluation Engine: The evaluation engine, also known as the Query Execution Engine, takes a
query evaluation plan from the optimizer, executes the plan, and returns the answer to the query.

Fig 10. Steps in Query Processing (Query → Parser and Translator → Relational Algebra
Expression → Optimizer → Execution Plan → Evaluation Engine → Query Output; the Optimizer
uses Statistics about Data and the Evaluation Engine reads the Data)



7. Integrity and Security


The two fundamental concerns that need to be considered while designing database systems are:
& Maintaining the consistency of the database under all changes, and
& Protecting the database from unauthorized users.
These concerns lead us to the study of Integrity and Security.

ª Constraints and Triggers


Ñ Constraints
Integrity constraints ensure that the changes made to the database by authorized users do not
result in a loss of data consistency. An integrity constraint is a predicate on the database that
must hold at all times.
Types of Constraints
˜ Key Constraints (Entity Integrity)
˜ Foreign Key Constraints (Referential Integrity)
˜ Domain Constraints (Domain Integrity)
˜ General Constraints (User Defined Integrity)
Key Constraints and Foreign Key Constraints are discussed in chapter 5.
Domain Constraints
A domain constraint is a predicate requiring the value of an attribute A in each tuple of a
relation to be an atomic value from the domain set domain(A).
Example:
Salary of an employee is a numeric field with two decimal places in the range 150 to 6000.
A domain constraint is created using the CREATE DOMAIN command as follows.
CREATE DOMAIN <domain_name> <data_type>
CONSTRAINT <constraint_name> CHECK <constraint>
The CHECK statement can also be directly applied to a column without defining a domain.
Example:
CREATE DOMAIN BasicSalary NUMERIC(9, 2)
CONSTRAINT SalaryRange CHECK (VALUE>=150.00 AND VALUE<=6000.00)

Thereafter “BasicSalary” can be used as the data type in a column definition.

Or the CHECK constraint can be used in a column definition as follows:


Salary NUMERIC(9, 2) CHECK (Salary>=150.00 AND Salary<=6000.00)

The CHECK constraint can also be used in a table definition as a tuple-based constraint:
CHECK (<logical_expression>)
Example:
CREATE TABLE Employees (
:
CONSTRAINT EmpDate_Constraint CHECK (EmpDate <= GETDATE())
)
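The effect of a column-level CHECK constraint can be observed with a short sketch. It uses Python's sqlite3 module as a stand-in database (SQLite has no CREATE DOMAIN, so the range is written directly on the column); the reduced table definition and the helper try_insert are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmpId   INTEGER PRIMARY KEY,
        EmpName TEXT NOT NULL,
        Salary  NUMERIC CHECK (Salary >= 150.00 AND Salary <= 6000.00)
    )
""")

def try_insert(emp_id, name, salary):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", (emp_id, name, salary))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, "Adam", 2500.00))  # True: inside the allowed range
print(try_insert(2, "Dave", 100.00))   # False: below the minimum salary
```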

General Constraints
A general constraint or a user defined constraint is an assertion defined by the user requirement.
Domain and Referential integrity constraints are special types of general constraint set by the
requirement.
The syntax for general assertion is:
CREATE ASSERTION <assertion_name> CHECK <predicate>
The <predicate> is a valid conditional expression similar to the <condition> in the WHERE
clause of the SELECT-FROM-WHERE statement.
Example:
Constraint on the number of employees in a team (at most 8 members per team):
CREATE ASSERTION NumberOfTeamMembers CHECK
(8 >= ALL (SELECT COUNT(EmpId) FROM EmpTeams GROUP BY TeamId))

When an assertion is created, the system tests it for validity. If the assertion is valid, any
future modification to the database is allowed only if it does not cause the assertion to be
violated.
Ñ Triggers
Triggers are statements that the database management system executes automatically in
response to a modification to the database. Triggers need to specify:
- The event that will cause or initiate the trigger execution,
- The condition that must be satisfied for the trigger execution to proceed, and
- The action to be taken in response.

event --(condition?)--> action

The trigger action may be used to inform the respective administrators (for example through
email) so that they can take action, or it may execute some operation in response.
The trigger events are:
- INSERT, DELETE and UPDATE.
The actions for the triggers may be taken:
- After successful completion of the operation (event): AFTER
- Before the execution of the operation (event): BEFORE, or in place of it: INSTEAD OF
The syntax for the trigger statement is:
CREATE TRIGGER <trigger_name>
ON {<table>|<view>}
{FOR | AFTER | INSTEAD OF} {[INSERT] | [UPDATE] | [DELETE]}
AS
<SQL_Statement>
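The event, condition and action parts of a trigger can be seen in a small sketch. It uses Python's sqlite3 module, whose trigger syntax (AFTER ... ON ... WHEN ... BEGIN ... END) differs from the syntax shown above; the SalaryAudit table and the trigger name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees   (EmpId INTEGER PRIMARY KEY, EmpName TEXT, Salary NUMERIC);
CREATE TABLE SalaryAudit (EmpId INTEGER, OldSalary NUMERIC, NewSalary NUMERIC);

CREATE TRIGGER LogSalaryRaise
AFTER UPDATE OF Salary ON Employees   -- event
WHEN NEW.Salary > OLD.Salary          -- condition
BEGIN
    -- action: record the raise in the audit table
    INSERT INTO SalaryAudit VALUES (OLD.EmpId, OLD.Salary, NEW.Salary);
END;
""")

conn.execute("INSERT INTO Employees VALUES (1, 'Adam', 1500)")
conn.execute("UPDATE Employees SET Salary = 1800 WHERE EmpId = 1")  # raise: trigger fires
conn.execute("UPDATE Employees SET Salary = 1700 WHERE EmpId = 1")  # cut: condition is false

audit_rows = conn.execute("SELECT * FROM SalaryAudit").fetchall()
print(audit_rows)  # [(1, 1500, 1800)]
```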

ª Security and Authorization


Ñ Security
Database security refers to protection of the database from malicious access such as:
- Unauthorized reading of data,
- Unauthorized modification of data, and
- Unauthorized destruction of data.
Some of the threats to the database because of malicious access are:
- Loss of integrity,
- Loss of availability,
- Loss of confidentiality
Security measures must be taken at several levels:
- Database System,
- Operating System,
- Network,
- Physical,
- Human
Database System Security
Database system security can be implemented with the use of:
- Account and Role Creation,
- Privilege granting,
- Privilege revocation, and
- Security level assignment.
Ñ Authorization
Authorization levels in a database system fall into two broad categories:
˜ Data Level Authorization
- Read
- Insert
- Update
- Delete
˜ Schema Level Authorization
- Index
- Resource
- Alter
- Drop
Privilege Granting
The syntax for privilege granting is as follows:
GRANT <privilege_list> ON {<table>|<view>}
TO <account_list> [WITH GRANT OPTION]
<privilege_list> is possible data level authorization for the table or view stated as:
{SELECT | INSERT | UPDATE | DELETE | ALL}
To grant access to a specific column in a table
GRANT {SELECT | UPDATE } (<column>) ON {<table>|<view>}
TO <account_list> [WITH GRANT OPTION]

To grant access for the column to be referenced in a foreign key or view that requires schema
building use:
GRANT REFERENCES (<column>) ON {<table>|<view>}
TO <account_list> [WITH GRANT OPTION]
Privilege Revoking
The syntax for privilege revoking is as follows:
REVOKE <privilege_list> ON {<table>|<view>}
FROM <account_list> [RESTRICT | CASCADE]
To revoke grant option from an account:
REVOKE GRANT OPTION FOR <privilege_list> ON {<table>|<view>}
FROM <account_list>
Privilege Denying
The syntax to deny a privilege from an account list is:
DENY <privilege_list> ON {<table>|<view>}
TO <account_list> [CASCADE]

ª Encryption and Authentication


Ñ Encryption
Cryptography is the art or science concerning the principles, means, and methods for rendering
plain information unintelligible, and for restoring the encrypted information to intelligible form.
Encryption is the transformation of an intelligible message (plain text, P) into an unintelligible
message (cipher text, C) using a key K:
P --(encryption with key K)--> C
Decryption is the reverse process of encryption, in which the cipher text is translated back into
the plain text:
P <--(decryption with key K)-- C

Modern Cryptography systems can be broadly classified into Symmetric-key systems that use a
single key that both the sender and recipient have, and Asymmetric-key systems also known as
public-key systems that use two keys, a public key known to everyone and a private key that
only the recipient of messages uses.
˜ Symmetric Key Algorithms
- DES (Data Encryption Standard)
- IDEA (International Data Encryption Algorithm)
˜ Asymmetric Key Algorithms
- RSA (Rivest, Shamir and Adleman)
- DSA (Digital Signature Algorithm )
Ñ Authentication
Authentication is the process of verifying that a user is who he or she claims to be.
There are two ways of authenticating a user:
- Use of Password
- Challenge Response.
With the use of a password, a user is asked for a user name and password upon login to a
system.
In a challenge response, the system sends a challenge string to the user upon a login request;
the user then encrypts the challenge and sends the encrypted message back to the system. The
system verifies the user by comparing the originally sent challenge string with the decrypted
message received from the user. For the encryption process a symmetric-key or a public-key
algorithm may be used. With a symmetric-key algorithm the shared key is saved in the system,
whereas with a public-key algorithm the public key is the only key known by the system and the
private key remains secret with the user.
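The shared-key variant of challenge response can be sketched with Python's standard library. Note the substitution: instead of encrypting and decrypting the challenge, this sketch uses a keyed hash (HMAC-SHA256), so the system recomputes the expected response and compares it. The function names and the sample key are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

def make_challenge():
    """System side: a fresh random challenge for every login request."""
    return secrets.token_bytes(16)

def respond(shared_key, challenge):
    """User side: prove knowledge of the shared key without sending the key itself."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()

def verify(shared_key, challenge, response):
    """System side: recompute the expected response and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

shared_key = b"key known to both user and system"  # hypothetical shared secret
challenge = make_challenge()
print(verify(shared_key, challenge, respond(shared_key, challenge)))    # True
print(verify(shared_key, challenge, respond(b"wrong key", challenge)))  # False
```

Because the challenge changes on every login, a response captured from an earlier session cannot simply be replayed.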

8. Object Oriented Databases


ª Object Analysis
Object-oriented databases (OODBs) evolved from a need to support object-oriented
programming and to reap the benefits, such as system maintainability, from applying object
orientation to developing complex software systems. The first OODBs appeared in the late
1980s. Martin provides a complete list of these early OODBs. OODBs are based on the object
model and use the same conceptual models as Object-Oriented Analysis, Object-Oriented Design and
Object-Oriented Programming Languages. Using the same conceptual model simplifies
development; improves communication among users, analysts, and programmers; and lessens the
likelihood of errors.
Object-oriented analysis (OOA) is concerned with developing software engineering
requirements and specifications that are expressed as a system's object model (which is composed of
a population of interacting objects), as opposed to the traditional data or functional views of
systems. OOA can yield the following benefits:
˜ Maintainability: the ease with which a software system or component can
be modified to correct faults, improve performance or other attributes, or adapt to a
changed environment. Maintainability is supported through simplified mapping to the
real world, which provides for less analysis effort, less complexity in system design, and
easier verification by the user.
˜ Reusability: the degree to which a software module or other work product can be used in
more than one computing program or software system. Reusing analysis artifacts saves
time and costs; how much can be reused depends on the analysis method and programming
language.
˜ Productivity: gains achieved through direct
mapping to features of object-oriented programming languages.
An object is a representation of a real-life entity or abstraction. For example, objects in a flight
reservation system might include an airplane, an airline flight, an icon on a screen, or even a full
screen with which a travel agent interacts. OOA specifies the structure and the behavior of the
object; together these comprise the requirements of that object. Different types of models are required to
specify the requirements of the objects. The information or object model contains the definition
of objects in the system, which includes the object name, the object attributes, and object
relationships to other objects. The behavior or state model describes the behavior of the objects
in terms of the states the object exists in, the transitions allowed between states, and the events
that cause objects to change states. These models can be created and maintained using CASE
tools that support representation of objects and object behavior.
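As a rough illustration of the two models, the sketch below gives a flight object an information model (name, attributes, a reference to another object) and a state model (states, allowed transitions, triggering events). The states, events and attribute names are hypothetical choices made for this example only.

```python
from dataclasses import dataclass, field

# Hypothetical state model for a flight: (current state, event) -> next state.
TRANSITIONS = {
    ("Scheduled", "open_boarding"): "Boarding",
    ("Boarding", "depart"): "InFlight",
    ("InFlight", "land"): "Landed",
    ("Scheduled", "cancel"): "Cancelled",
}


@dataclass
class Flight:
    # Information (object) model: attributes and a relationship to another object.
    flight_no: str
    airplane: str  # stands in for a relationship to an Airplane object
    state: str = "Scheduled"
    history: list = field(default_factory=list)

    def handle(self, event: str) -> None:
        """Apply an event; reject transitions the state model does not allow."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.history.append(self.state)
        self.state = TRANSITIONS[key]
```

Keeping the transitions in a table separate from the class mirrors the separation between the object model and the state model.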
OOA views the world as objects with data structures and behaviors and events that trigger
operations, or object behavior changes, that change the state of objects. The idea that a system
can be viewed as a population of interacting objects, each of which is an atomic bundle of data
and functionality, is the foundation of object technology and provides an attractive alternative
for the development of complex systems. This is a radical departure from prior methods of
requirements specification, such as functional decomposition and structured analysis and design.
Object-oriented design (OOD) is concerned with developing an object-oriented model of a
software system to implement the identified requirements. Many OOD methods have been
described since the late 1980s. The most popular OOD methods include Booch, Buhr,
Wasserman, and the HOOD method developed by the European Space Agency. OOD can yield
the benefits of maintainability, reusability and productivity.
OOD builds on the products developed during Object-Oriented Analysis (OOA) by refining
candidate objects into classes, defining message protocols for all objects, defining data structures
and procedures, and mapping these into an object-oriented programming language (OOPL) (see
Object-Oriented Programming Languages). Several OOD methods (Booch, Shlaer-Mellor, Buhr,
Rumbaugh) describe these operations on objects, although none is an accepted industry standard.
Analysis and design are closer to each other in the object-oriented approach than in structured
analysis and design. For this reason, similar notations are often used during analysis and the
early stages of design. However, OOD requires the specification of concepts nonexistent in
analysis, such as the types of the attributes of a class, or the logic of its methods.
Design can be thought of in two phases. The first, called high-level design, deals with the
decomposition of the system into large, complex objects. The second phase is called low-level
design. In this phase, attributes and methods are specified at the level of individual objects. This
is also where a project can realize most of the reuse of object-oriented products, since it is
possible to guide the design so that lower-level objects correspond exactly to those in existing
object libraries or to develop objects with reuse potential. As in OOA, the OOD artifacts are
represented using CASE tools with object-oriented terminology.

ª Object Definition Language (ODL) Data Model


Object Definition Language (ODL) is an object-oriented approach to database design that is
standardized by the ODMG (Object Data Management Group). It is an extension of IDL
(Interface Definition Language), a component of the distributed object-oriented computing standard
CORBA (Common Object Request Broker Architecture).
Ñ Overview of Object Oriented Concept


Some of the basic ideas in object-oriented programming (OOP) are:
˜ Complex Types – Object-oriented programming supports a rich collection of complex data
types. Typical types that can be constructed in OOP are Structures (Records), Reference
Types (Pointers) and Collection Types.
˜ Classes and Objects – Classes are types (molds or templates) for sets of similar objects,
and objects are instances of classes. Every object has a (current) state and behavior.
Example: For a Vehicle class, Color, Make, Model and Gas are states and Driving and
Parking are behaviors.
- Variables represent the state of the object or the class.
- Methods/functions represent the behavior of the object or the class.
- Objects interact with each other through messages that initiate tasks.
˜ Object Identity – Every object has a unique identity that identifies the object
independently of its values.
˜ Inheritance – Objects can inherit properties, both state and behavior, from other objects.
A class hierarchy (super-class and sub-classes) can be formed with the help of inheritance.
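The ideas above can be seen together in a short sketch built on the Vehicle example. The Truck subclass and the sample values are assumptions added for illustration.

```python
class Vehicle:
    """A class is a template; each instance is an object with its own state."""

    def __init__(self, color: str, make: str, model: str, gas: float):
        # State: the instance variables.
        self.color, self.make, self.model, self.gas = color, make, model, gas
        self.parked = True

    # Behavior: the methods.
    def drive(self, litres_used: float) -> None:
        self.parked = False
        self.gas = max(0.0, self.gas - litres_used)

    def park(self) -> None:
        self.parked = True


class Truck(Vehicle):
    """Inheritance: Truck gets Vehicle's state and behavior and adds its own."""

    def __init__(self, color, make, model, gas, payload_kg: float):
        super().__init__(color, make, model, gas)
        self.payload_kg = payload_kg


a = Vehicle("red", "Toyota", "Corolla", 40.0)
b = Vehicle("red", "Toyota", "Corolla", 40.0)
# Object identity: a and b hold equal values but are distinct objects.
same_values = vars(a) == vars(b)
same_object = a is b
```

Here `same_values` is true while `same_object` is false, which is exactly the distinction between an object's values and its identity.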
Ñ ODL Design and Syntax
Object Definition Language (ODL) is an object-oriented approach that models the real world as a
composition of objects. Objects in the ODL model are somewhat similar to the entities in the E/R model.
Real-world concepts that are represented by objects of similar description and behavior are
grouped into classes.
ODL is used to define persistent classes, those whose objects may be stored permanently in the
database. The ODL classes are described by three properties (elements):
˜ Attributes – which are values associated with the objects whose types are from the built
in primitive types or a structure of the primitive types.
˜ Relationships – are connections between the object and another object of some class.
˜ Methods – are functions that may be applied to objects of the class.
NOTE: ODL classes look like Entity sets with binary relationships, plus methods.
An ODL class declaration includes:
1. A name for the class.
2. Optional key declaration(s).
3. Extent declaration: - name for the set of currently existing objects of the class.
4. Element declarations: - An element is either an attribute, a relationship, or a method.


The simplest form of the class declaration is:
class <name> {
<list of element declarations, separated by semicolons>
};

Attributes in ODL
Attributes are (usually) elements with a type that does not involve classes. They can be of simple,
enumerated or structured type. The syntaxes for the three types are as follows, respectively:
attribute <type> <name>;
attribute enum <typename>
{<enumlist1>, <enumlist2>,…}<name>;
attribute struct <typename>
{<type> <name1>, <type> <name2>,…}<name>;

Example:
- Consider the partial declaration of the “EMPLOYEES” class.
class Employee {
attribute string empid;
attribute string name;
attribute integer age;
attribute enum Gender {Male, Female} gender;
attribute struct Address {string city, string hAddr, string phone} address;
};

REMARK: Names for the enumerated and structured data types are optional in the
declaration, but a named type can be referred to outside the class
declaration by its scoped name, such as “Employee::Gender” and
“Employee::Address”.
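For readers more used to a programming language than to ODL, the three attribute kinds map roughly onto the following Python sketch. The field names follow the ODL example; the sample values are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Gender(Enum):  # mirrors: enum Gender {Male, Female}
    MALE = "Male"
    FEMALE = "Female"


@dataclass
class Address:  # mirrors: struct Address {string city, string hAddr, string phone}
    city: str
    h_addr: str
    phone: str


@dataclass
class Employee:
    empid: str        # simple attributes
    name: str
    age: int
    gender: Gender    # enumerated attribute
    address: Address  # structured attribute


e = Employee("E001", "Alemu Girma", 37, Gender.MALE,
             Address("Addis Ababa", "Bole 06", "011-663-0712"))
```

Declaring `Gender` and `Address` as named types is what lets other code refer to them, much as the scoped names do in ODL.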

Relationships in ODL
Syntax for a relationship in ODL is as follows
relationship <type> <name>
inverse <relationship>;
Relationship Types in ODL


The <type> in an ODL relationship declaration names the related class, optionally wrapped in a
collection type such as Set, which indicates the kind of relationship.
Example:
- Consider the “EMPLOYEES” class and its relationship to “TEAMS” class.
class Employee {
attribute string empid;
attribute string name;
:
relationship Set <Team> assigned;
:
};

The basic collection types of attributes and relationships in the ODL model are:
1. Set:- Set<T> denotes a relationship to class T with a finite number of associations to
objects of the class. It is an unordered collection of associations that does not allow
repetition.
2. Bag (Multiset):- Bag<T> is similar to Set<T> but allows repeated associations to
the same object.
3. List:- List<T> is a collection of associations in which order matters.
4. Array:- Array<T, i> denotes a fixed number, i, of indexed associations to objects of
class T.
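The difference between the four collection types shows up when the same three values, one of them repeated, are stored in each. The sketch uses Python's built-in containers as stand-ins; a tuple is used here as an approximation of a fixed-size array.

```python
from collections import Counter

teams_set = {"T1", "T2", "T1"}           # Set: unordered, duplicates collapse
teams_bag = Counter(["T1", "T2", "T1"])  # Bag: unordered, duplicates counted
teams_list = ["T1", "T2", "T1"]          # List: ordered, duplicates kept
teams_array = ("T1", "T2", "T1")         # Array<T, 3>: fixed length, indexed
```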
Inverse Relationships
Unlike in E/R design, relationships in the ODL model are only binary. Hence for every relationship
in a class C there is an inverse relationship in the related class D. Suppose class C has a
relationship R to class D; then class D must have some relationship S to class C. R and S must
be true inverses. That is, if object c is related to object d by R, then d must be related to c by
S.
Example:
- Consider the “EMPLOYEES” class and its relationship to “TEAMS” class.
class Employee {
relationship Set <Team> assigned;
};

class Team {
relationship Set <Employee> formed;
};

Inverse relationship in ODL design is represented by using the keyword inverse.


Example:
- Consider the “EMPLOYEES” class and its relationship to “TEAMS” class.
class Employee {
relationship Set <Team> assigned
inverse Team::formed;
};
class Team {
relationship Set <Employee> formed
inverse Employee::assigned;
};
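What "true inverses" means in practice can be sketched as follows: both directions are updated together, so that neither set can disagree with the other. The helper functions `assign` and `unassign` are hypothetical names for this illustration; an actual ODBMS maintains the inverse automatically.

```python
class Team:
    def __init__(self, name: str):
        self.name = name
        self.formed = set()    # relationship Set<Employee> formed


class Employee:
    def __init__(self, empid: str):
        self.empid = empid
        self.assigned = set()  # relationship Set<Team> assigned


def assign(employee: Employee, team: Team) -> None:
    """Update both directions together so the two sets stay true inverses."""
    employee.assigned.add(team)
    team.formed.add(employee)


def unassign(employee: Employee, team: Team) -> None:
    """Remove the association from both directions."""
    employee.assigned.discard(team)
    team.formed.discard(employee)
```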

Multiplicity of Relationships
Multiplicity of relationships in ODL design is indicated by the type operators in the relationship
and the inverse relationship.
˜ Many-to-many relationships: - are indicated by Set<…> for the type of the relationship
and its inverse.
˜ One-to-many relationships: - have Set<…> in the relationship declared in the class on the
“one” side and just the class name in the relationship declared in the class on the “many” side.
˜ One-to-one relationships: - have classes as the type in both directions.
REMARK: The Set operator is used only on a side that may be related to many objects; it is
omitted on the “many” side of a one-to-many relationship and on both sides of a one-to-one
relationship.
Example:
- In the previous relationship declaration between “EMPLOYEES” class and the
“TEAMS” class there is a many-to-many relationship.
class Employee {
relationship Set <Team> assigned
inverse Team::formed;
};
class Team {
relationship Set <Employee> formed
inverse Employee::assigned;
};

- Consider the one-to-many relationship between “PROJECTS” and “CUSTOMERS”


class Project {
relationship Customer ownedBy
inverse Customer::owns;
};
class Customer {
relationship Set <Project> owns
inverse Project::ownedBy;
};

- Consider the one-to-one relationship between “PROJECTS” and “SOFTWARE”


class Project {
relationship Software produce
inverse Software::producedIn;
};
class Software {
relationship Project producedIn
inverse Project::produce;
};

NOTE: Recall that ODL does not support 3-way or higher relationships. Multiway relationships
in ODL may be simulated by a “connecting” class, whose objects represent tuples of
objects that will be connected by the multiway relationship.
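A connecting class can be sketched as a class whose instances each stand for one tuple of the multiway relationship. The three-way relationship among employees, projects and teams below is a hypothetical example, and identifiers are used in place of object references to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Connecting object: one instance per (employee, project, team) triple."""
    empid: str
    proj_id: str
    team_id: str


assignments = {
    Assignment("E001", "P10", "T1"),
    Assignment("E001", "P11", "T2"),
    Assignment("E004", "P10", "T1"),
}


def projects_of(empid: str) -> set:
    """Follow the connecting objects from an employee to its projects."""
    return {a.proj_id for a in assignments if a.empid == empid}
```

In ODL itself the connecting class would carry one binary many-to-one relationship to each of the connected classes.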

Inheritance in ODL
Inheritance in ODL is similar to the usual object-oriented inheritance principle. It indicates a
relationship between a superclass and its subclasses. A subclass lists only the properties unique
to it and inherits its superclass's properties.
Inheritance in ODL design is indicated by the colon operator as follows:
class <Subclass>:<Superclass> {
<list of element declarations>
};
Example:
- Consider the “PART-TIME EMPLOYEES” class, which inherits from the
“EMPLOYEES” class.
class PartTimeEmployee:Employee {
attribute real hourlyPay;
attribute integer contractPeriod;
relationship Set <TimeSchedule> works
inverse TimeSchedule::working;
};

Declaring Keys in ODL


Recall that every object has an object identifier. Keys are attributes, or sets of attributes,
whose values identify an object uniquely among the objects of its class.
In ODL any number of keys can be declared for a class by adding the following code after the
class name.
(key <list of keys>)

A key consisting of more than one attribute needs additional parentheses around those
attributes.
Example:
class Employee (key EmpId, NationalId, (Name, BDate)) {
attribute string empid;
attribute string name;
:
relationship Set <Team> assigned
inverse Team::formed;
:
};

For each class in ODL there is an extent, the set of existing objects of that class. Extent is the
relation with that class as its schema (definition). It is indicated after the class name, along with
keys, as:
(extent <extent name> key <list of keys>)

Conventionally, singular nouns are used for class names and plural for the corresponding extent.
Example:
class Employee (extent Employees key EmpId, NationalId, (Name, BDate)) {
attribute string empid;
attribute string name;
:
relationship Set <Team> assigned
inverse Team::formed;
:
};
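The roles of the extent and the keys can be sketched as a container that holds the existing objects of a class and refuses any object that repeats a key value. The `Extent` class, the dictionary representation of objects and the sample values are assumptions made for this example.

```python
class Extent:
    """The set of existing objects of a class, with key uniqueness enforced."""

    def __init__(self, keys):
        # keys: list of attribute-name tuples, e.g. [("empid",), ("name", "bdate")]
        self.keys = keys
        self.objects = []

    def add(self, obj: dict) -> None:
        """Add obj unless it repeats the value of any declared key."""
        for key in self.keys:
            value = tuple(obj[a] for a in key)
            for other in self.objects:
                if tuple(other[a] for a in key) == value:
                    raise ValueError(f"duplicate value {value} for key {key}")
        self.objects.append(obj)


# Three keys, the last one composite, as in the ODL declaration above.
employees = Extent(keys=[("empid",), ("national_id",), ("name", "bdate")])
employees.add({"empid": "E001", "national_id": "N1",
               "name": "Alemu", "bdate": "01/10/70"})
employees.add({"empid": "E004", "national_id": "N2",
               "name": "Alemu", "bdate": "12/04/68"})
```

The second employee shares a name with the first, which is allowed because only the combination of name and birth date forms a key.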

ª Object-Relational Data Model


ODL model to relational model mapping is similar to the mapping process of E/R model to
relational.

ODL Class to Relations


ODL classes are mapped directly to relations in the relational model, with their attributes, as in the
case of E/R. Unlike E/R entity sets, ODL classes need not declare a key; when no key is declared, it is
necessary to add a new attribute as a primary key for the relation.
Non-atomic attributes in ODL classes are represented by simply expanding the structure
definition and making one attribute for each field of the structure.
Example:
- Consider the “EMPLOYEES” class partial declaration in chapter 2:
class Employee (extent Employees key EmpId, NationalId, (Name, BDate)){
attribute string empid;
attribute string name;
attribute integer age;
attribute enum Gender {Male, Female} gender;
attribute struct Address {string city, string hAddr, string phone} address;
};

Then the corresponding relational model is


- Employees(EmpId, Name, Age, Gender, City, HAddr, Phone)
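The mapping can be tried out with SQLite from the Python standard library. In this sketch the structured attribute is expanded into one column per field and the enumeration becomes a CHECK constraint; the column types and the sample row are assumptions, since the notes give only the attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        EmpId  TEXT PRIMARY KEY,
        Name   TEXT,
        Age    INTEGER,
        Gender TEXT CHECK (Gender IN ('Male', 'Female')),  -- the enum
        City   TEXT,   -- address.city
        HAddr  TEXT,   -- address.hAddr
        Phone  TEXT    -- address.phone
    )
""")
conn.execute(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("E001", "Alemu Girma", 37, "Male", "Addis Ababa", "Bole 06", "011-663-0712"),
)
# Column names of the resulting relation, in declaration order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Employees)")]
```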
Set-valued attributes: Recall that attributes in ODL class can be constructed from the type
constructors Set, Bag, List and Array. Such cases can be handled by three different methods:
1. By making one tuple for each value.
Example:
- Consider the “EMPLOYEES” class having a set of addresses:
class Employee (extent Employees key EmpId, NationalId, (Name, BDate)){
:
attribute Set <struct Address {string city, string hAddr, string phone}> address;
};

Then the corresponding relational model is


- Employees(EmpId, Name, Age, Gender, City, HAddr, Phone)
The relational schema may have the following instance:

Employees
EmpId  Name          BDate     Sub City  Kebele  Phone
E001   Alemu Girma   01/10/70  Bole      06      011-663-0712
E001   Alemu Girma   01/10/70  Bole      06      011-663-0712
E004   Kelem Belete  12/04/68  Gulele    03      011-227-2525

2. By separating out each set-valued attribute into a new relation and establishing a
many-to-many relationship.
3. By having multiple attribute sets for each set-valued attribute. This is applicable only if
the type constructor is a fixed-size array.

ODL Relationships to Relations


Recall that ODL relationships are declared in pairs, as a relationship and its inverse; hence
only one declaration of each pair is used when representing ODL relationships. As with E/R
relationships, an ODL relationship can be represented by a relation having the primary keys of
the related classes as attributes.


Example:
- Consider the relationship between “EMPLOYEES” class and the “TEAMS” class.
class Employee {
relationship Set <Team> assigned
inverse Team::formed;
};

class Team {
relationship Set <Employee> formed
inverse Employee::assigned;
};

Then the corresponding relational models are


- Employees(EmpId, Name, Age, Gender, City, HAddr, Phone)
- Teams(TeamId, Name, Descr)
- Assigned(EmpId, TeamId)
For a many-to-one or one-to-one relationship, the relationship's relation can be combined with the
relation of the “many” side, as in the case of E/R relationships.
Example:
- Consider the one-to-many relationship between “PROJECTS” and “CUSTOMERS”
class Project {
relationship Customer ownedBy
inverse Customer::owns;
};
class Customer {
relationship Set <Project> owns
inverse Project::ownedBy;
};

Then the corresponding relational models are


- Projects(ProjId, Name, SDate, DDate, CustId)
- Customers(CustId, Name, Address)
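Both mappings can be sketched in SQLite: the many-to-many relationship becomes its own relation holding the two keys, while the one-to-many relationship is folded into the relation of the “many” side as a foreign key. The schemas are abbreviated and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (EmpId TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Teams     (TeamId TEXT PRIMARY KEY, Name TEXT);
    -- Many-to-many: a separate relation holding both keys.
    CREATE TABLE Assigned (
        EmpId  TEXT REFERENCES Employees(EmpId),
        TeamId TEXT REFERENCES Teams(TeamId),
        PRIMARY KEY (EmpId, TeamId)
    );
    CREATE TABLE Customers (CustId TEXT PRIMARY KEY, Name TEXT);
    -- One-to-many: the customer's key is folded into the "many" side.
    CREATE TABLE Projects (
        ProjId TEXT PRIMARY KEY,
        Name   TEXT,
        CustId TEXT REFERENCES Customers(CustId)
    );
    INSERT INTO Employees VALUES ('E001', 'Alemu'), ('E004', 'Kelem');
    INSERT INTO Teams VALUES ('T1', 'Design');
    INSERT INTO Assigned VALUES ('E001', 'T1'), ('E004', 'T1');
    INSERT INTO Customers VALUES ('C1', 'Acme');
    INSERT INTO Projects VALUES ('P1', 'Billing', 'C1'), ('P2', 'Portal', 'C1');
""")

# Traverse Team::formed and Customer::owns with ordinary queries.
formed = [r[0] for r in conn.execute(
    "SELECT EmpId FROM Assigned WHERE TeamId = 'T1' ORDER BY EmpId")]
owns = [r[0] for r in conn.execute(
    "SELECT ProjId FROM Projects WHERE CustId = 'C1' ORDER BY ProjId")]
```

Only one declaration of each inverse pair needs a relation; the other direction is recovered by querying the same table on the other key.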

ª Object Oriented Databases


With Object-Oriented databases, the application and the database use exactly the same object
model. This isn’t the case with relational databases, with which users must utilize an object
model for the application and a relational-data model for the database. Users thus must develop
mapping procedures between the object and relational models.
OODBs are designed for the purpose of storing and sharing objects; they are a solution for
persistent object handling. Persistent data are data that remain after a process is terminated.
There is no universally-acknowledged standard for OODBs. There is, however, some
commonality in the architecture of the different OODBs because of three necessary components:
object managers, object servers, and object stores. Applications interact with object managers,
which work through object servers to gain access to object stores. OODBs provide the following
benefits:
˜ OODBs allow for the storage of complex data structures that cannot be easily stored
using conventional database technology.
˜ OODBs support all the persistence necessary when working with object-oriented
languages.
˜ OODBs contain active object servers that support not only the distribution of data but
also the distribution of work (in this context, relational database management systems
(DBMS) have limited capabilities).
In addition, OODBs were designed to be well integrated with object-oriented programming
languages such as C++ and Smalltalk. They use the same object model as these languages. With
OODBs, the programmer deals with transient (temporary) and persistent (permanent) objects in
a uniform manner. The persistent objects are in the OODB, and thus the conceptual walls
between programming and database are removed. As stated earlier, the employment of a unified
conceptual model greatly simplifies development.
The type of database application should dictate the choice of database management technology.
In general, database applications can be categorized into two types:
˜ Data collection applications focus on entering data into a database and providing queries
to obtain information about the data. Examples of these kinds of database applications are
accounts payable, accounts receivable, order processing, and inventory control. Because
these types of applications contain relatively simple data relationships and schema design,
relational database management systems (RDBMSs) are better suited for these
applications.
˜ Information analysis applications focus on providing the capability to navigate through
and analyze large volumes of data. Examples of these applications are CAD/CAM/CAE,
production planning, network planning, and financial engineering. These types of
applications are very dynamic and their database schemas are very complex. This type of
application requires a tightly-coupled language interface and the ability to handle the
creation and evolution of schema of arbitrary complexity without a lot of programmer
intervention. Object-oriented databases support these features to a great degree and are
therefore better suited for the information analysis type of applications.
OODBs are also used in applications handling BLOBs (binary large objects) such as images,
sound, video, and unformatted text. OODBs support diverse data types rather than only the
simple tables, columns and rows of relational databases.
An Object-Oriented Database-Management System (ODBMS) supports the modeling and
creation of data as objects. Users can support new media types with OO databases simply by
creating new objects.
The Object Database Management Group has developed ODMG standards for object-database
and object-relational-mapping products since 1993. However, the Object Database Management
Group hasn’t promoted ODMG widely within the ODBMS community and needs more vendors
on board to make the standard an important factor in the industry, said Philip Russom, director
of data warehousing and business-intelligence services for the Hurwitz Group consultancy.
For example, he said, the group created an OQL (Object Query Language) standard, but very
few database vendors implemented it.
A couple of years ago, industry observers touted object-oriented databases as a technology on
the rise, well suited for the emerging Internet age. They said object-oriented database-
management systems (ODBMSs) would soon become the primary database technology,
supplanting relational database-management systems (RDBMSs), which were not designed to
handle the type of multimedia data frequently found on the Internet. As further evidence of this,
they said the growth of intranets signaled a decline in the use of client-server networks, on
which most relational databases were used.
RDBMS programmers sometimes spend more than 25 percent of their coding time mapping
program objects to the database. ODBMSs avoid this mapping, and the result is less code to
develop, reduced development time, and reduced maintenance costs.
Meanwhile, OO databases are well suited for use with applications that must manage complex
relationships among data objects.
However, OO databases don’t scale up to high transaction volumes and user counts as well as
relational databases. For this and other reasons, although OO databases excel at handling some
data types, they have not become major players in the database market.
Fast-forward to today, and none of these predictions has come to pass. Relational databases are
still by far the most widely used databases. Meanwhile, object-relational database-management
systems (ORDBMSs) have added object capabilities to relational databases. They are gaining in
popularity and are expected to outsell even relational databases by 2003. OO databases are
still minor players with solid but strictly niche markets. Sales of relational databases have grown
considerably faster than the sales of OO databases, and annual worldwide RDBMS revenues are
now about 50 times larger.
Several technical issues have led to OO databases’ limited strength in the database marketplace.
Some of them are:
˜ Object-relational databases. RDBMS vendors began developing and marketing OR
databases in part in response to the perceived threat from OO databases. OR databases
work via an object layer that sits atop a conventional tabular relational engine. Vendors
integrate OO features into the databases via software modules (such as Informix’s
DataBlades or Oracle’s Cartridges), each designed to handle video, audio, text, or other
types of media. So, in addition to handling the numerical data generally used in relational
databases, OR databases can handle multimedia data types.
˜ Performance. OO databases can store data sets in their entirety and thus typically run
faster than relational databases, which must break data sets into parts for storage within
tables and then reassemble them in response to queries. In addition, said Sun’s Cattell,
OO databases can automatically cache data in the client application’s memory, thereby
eliminating extra calls to the DBMS’s back end and speeding up responses. And OO
databases use optimizers that determine the best way to use a database’s indices and
physical layout to satisfy a query. However, relational databases have reduced OO
databases’ performance advantage with improved optimizers. The optimizers improve
ways of finding information within relational databases’ tables and indices.
˜ Standardization. Relational databases use the long-established SQL (Structured Query
Language) standard, which has been adopted by the International Organization for
Standardization (ISO) and the American National Standards Institute (ANSI). SQL, used
for querying and updating a relational database, serves as a user interface and application
program interface to an RDBMS.
9. Introduction to Parallel and Distributed Database Systems

A database system can have various architectures, such as:
˜ Client – Server Database System: A system with tasks shared between server and client.
˜ Parallel Database System: Speeds up processing within a system by the use of parallel
query processing.
˜ Distributed Database System: Data are distributed across sites, keeping them closer to
where they are generated and needed often.
Ñ Client – Server System
In a client-server system the database functionalities are broadly divided into:
˜ Server, and
˜ Client

Clients run client (front-end) processes such as: user interface, form interface, report
interface, graphical interface, …
Servers run server processes having an SQL engine responsible for transaction and data
management.
Ñ Parallel Database Systems
Parallel systems improve processing and I/O speeds. Parallel database systems use parallel
processing to improve query performance.
Important issues in parallel database systems (parallel systems) are:
˜ Speed Up – response time
˜ Scale Up – throughput

Parallel Database Architectures

˜ Shared Memory
˜ Shared Disk
˜ Shared Nothing
˜ Hierarchical
I/O Parallelism
I/O parallelism in a parallel database system refers to reducing the time required to retrieve
relations (data) from disk by partitioning the relations on multiple disks.
Horizontal partitioning, which distributes the tuples of a relation among the disks, is one
method of achieving I/O parallelism. Some of the partitioning techniques are:
˜ Round-Robin
˜ Hash
˜ Range
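The three techniques differ only in how a tuple is assigned to a disk, as the following sketch shows. Disks are modeled as lists, the partitioning attribute is extracted by a `key` function, and integer keys are used directly as their own hash value; these are simplifying assumptions for illustration.

```python
from bisect import bisect_right


def round_robin(tuples, n):
    """The i-th tuple goes to disk i mod n."""
    disks = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        disks[i % n].append(t)
    return disks


def hash_partition(tuples, n, key):
    """A tuple goes to disk h(key) mod n; integer keys act as their own hash."""
    disks = [[] for _ in range(n)]
    for t in tuples:
        disks[key(t) % n].append(t)
    return disks


def range_partition(tuples, boundaries, key):
    """Boundaries [b0, b1, ...]: key < b0 -> disk 0, b0 <= key < b1 -> disk 1, ..."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        disks[bisect_right(boundaries, key(t))].append(t)
    return disks


rows = [(5, "a"), (12, "b"), (7, "c"), (20, "d"), (1, "e")]
```

Round-robin spreads tuples evenly regardless of their values, whereas hash and range partitioning let a query on the partitioning attribute go to only the disks that can hold matching tuples.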

Interquery Parallelism
In interquery parallelism, different queries or transactions are executed in parallel with one
another. Its primary use is to scale up a transaction processing system.
It is harder to implement in shared-nothing and shared-disk architectures, as it introduces the
cache-coherency problem, which can be handled by the use of a locking mechanism:
˜ Lock the page – read/write – flush page – release lock.

Intraquery Parallelism
Intraquery parallelism refers to executing a single query or transaction in parallel on multiple
processors and disks. Its primary use is to speed up long-running queries.
˜ Intraoperation Parallelism: parallelizing the execution of each individual operation,
such as sort, select, …
˜ Interoperation Parallelism: parallelizing the different operations in query execution.
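Intraoperation parallelism can be sketched for the sort operation: the input is partitioned, each partition is sorted by its own worker, and the sorted runs are merged. The function name and the round-robin split are assumptions of this sketch, and Python threads are used only to show the structure; under CPython they do not give a real speed-up for CPU-bound sorting.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, workers=4):
    """Split round-robin, sort the partitions concurrently, then merge the runs."""
    partitions = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(pool.map(sorted, partitions))
    return list(heapq.merge(*sorted_parts))
```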
Ñ Distributed Database Systems
The database is stored on several computers (known as sites or nodes) that communicate on a
network.

˜ Main reasons for distributed database:
¾ Sharing: share data across sites.
¾ Autonomy: each site has a degree of control over the data shared locally.
¾ Availability: if one fails other sites will remain in service.
˜ Implementation issues
¾ Atomicity
¾ Transaction Commit Protocols
¾ Concurrency Control
„ Locking, deadlock handling
˜ Complexity
¾ Software development cost
¾ Greater potential for bugs
¾ Increased processing overhead
Distributed Database Architectures

[Figure: classification tree of distributed databases. Levels, from top to bottom:
Single-DBMS vs. Multi-DBMS; Homogeneous (Unfederated) vs. Heterogeneous (Federated);
Loose Schema Coupling vs. Tight Schema Coupling; Single Federated Schema vs. Multiple
Federated Schema.]

Distributed Data Storage


The two most common approaches in distributed database data storage are:
˜ Replication: maintain several identical copies of a relation at different sites.
˜ Fragmentation: partition a relation into several fragments located at different
sites.
A combination of the two approaches can also be used.
˜ Replication
¾ Advantage
„ Availability
„ Increased Parallelism: minimize data movement.
¾ Disadvantage
„ Increased overhead on Update [Consistency]
˜ Fragmentation
¾ Horizontal Fragmentation
A relation r is fragmented into a set of subsets r1, r2, r3, …, rn such that
„ r = r1 ∪ r2 ∪ r3 ∪ … ∪ rn
¾ Vertical Fragmentation
The schema R of the relation r(R) is fragmented into a set of sub-schemas R1, R2, R3, …,
Rn such that
„ R = R1 ∪ R2 ∪ R3 ∪ … ∪ Rn
The fragmentation is done so that the relation can be reconstructed by a natural join of the fragments:
„ r = r1 ⋈ r2 ⋈ r3 ⋈ … ⋈ rn
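Both reconstruction rules can be checked on a small relation. In this sketch a relation is a list of dictionaries, each vertical fragment keeps the key EmpId so that the join can line the tuples up again, and the helper names `project` and `natural_join` are assumptions of the example.

```python
# A relation as a list of dicts; EmpId is the key (sample data is illustrative).
r = [
    {"EmpId": "E001", "Name": "Alemu", "City": "Bole"},
    {"EmpId": "E004", "Name": "Kelem", "City": "Gulele"},
]

# Horizontal fragmentation: each fragment is a subset of the tuples.
r1 = [t for t in r if t["City"] == "Bole"]
r2 = [t for t in r if t["City"] != "Bole"]
union = r1 + r2  # r = r1 ∪ r2 (the fragments are disjoint here)


def project(rel, attrs):
    """Keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attrs} for t in rel]


# Vertical fragmentation: each fragment keeps the key plus some attributes.
v1 = project(r, ["EmpId", "Name"])
v2 = project(r, ["EmpId", "City"])


def natural_join(left, right):
    """Join tuples that agree on every attribute the two fragments share."""
    out = []
    for a in left:
        for b in right:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out


joined = natural_join(v1, v2)  # r = v1 ⋈ v2
```

If the key were left out of one vertical fragment, the join would have no shared attribute to match on and would pair every tuple with every other, so the original relation could not be recovered.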

Data Transparency in Distributed Database


˜ Distribution or Network Transparency
¾ Naming transparency
¾ Location transparency
˜ Replication Transparency
˜ Fragmentation Transparency