DBMS

Chapter – 1: An Introduction to Database Systems

Writer: Dr. Kanwal Garg


Vetter: Prof. Rajender Nath
Structure:
1.1 Introduction
1.2 Objective
1.3 Presentation of Content
1.3.1 Data
1.3.2 Information
1.3.3 Knowledge
1.3.4 Difference between Data, Information and Knowledge
1.3.5 Manual Data Processing and its limitation
1.3.6 File Processing System and its Limitation
1.3.7 Database Approach
1.3.8 Characteristics of Database
1.3.9 The Database Management System
1.3.10 Advantages of Database Management Systems
1.3.11 Disadvantages of Database Management Systems
1.3.12 Difference between File System Approach and Database Management System
1.3.13 DBMS Terminology
1.3.14 Component of Database System
1.3.15 Database Administrator (DBA)
1.3.16 Disk Manager
1.3.17 Data Manager
1.3.18 File Manager
1.4 Summary
1.5 Suggested Reading/ Reference Material
1.6 Self Assessment Questions (SAQ)

1.1 Introduction

The word data refers to information or facts, usually collected as the result of experience,
observation or experiment, or produced by processes within a computer system, or taken as a set
of premises. Data may consist of numbers, words, or images, particularly as measurements or
observations of a set of variables. Data are often viewed as the lowest level of abstraction
from which information and knowledge are derived.

A database is an organized collection of data. The term originated within the computer industry,
but its meaning has been broadened by popular use, to the extent that some database directives
include non-electronic databases within their definition. A database is controlled by a
sophisticated software package called a database management system (DBMS). It has programs to
set up storage structures, load the data, and accept data requests from programs and users.

Databases play an important role in all areas where computers are used, such as libraries,
education, medicine and science. Even when you buy goods from a market, you are making use of
databases. Databases have also played a crucial role in the growth of the computer industry.
Before we take up the concept of databases, let us first look at the basics of data and
information.

1.2 Objective

This chapter discusses the basic concepts of database management systems. It explains data,
information and knowledge and differentiates between these three basic terms. The chapter covers
the file processing system and the database system along with their advantages and
disadvantages. It defines the basic DBMS terminology and explains the database components, along
with a brief account of the roles of the people who design and manage a database management
system.

1.3 Presentation of Content

1.3.1 Data

Data is a simple term to grasp. Data is defined as a collection of raw facts which can be stored
and processed by a human or a computer. In other words, data is the material on which a computer
program works. The word raw indicates that the facts have not yet been processed to reveal their
meaning. Data can be a number, a letter, a word, a special symbol and so on. Data can exist in
any form, usable or not. It does not have meaning in itself. In computer parlance, a spreadsheet
generally starts out by holding data.

Example 1: The sequence of digits 230504 is meaningless by itself, since it could refer to the
part number of an automobile, a date of birth, the number of rupees spent on a project, the
population of a town and so on. Therefore this sequence of digits would be considered data.

Example 2: A set of words like "Aryan, mathematics, highest mark, annual examination" would be
considered data, since it conveys no meaning on its own.

Figure 1.1: Data

1.3.2 Information

When data is processed and converted into a meaningful and useful form, it is known as
information. Information is generated by arranging data into a suitable and meaningful form. For
a business to be successful, fast access to information is vital, as important decisions are
based on the information available at any point of time. Such information can then be used as
the foundation for decision making. Traditionally, data was stored in voluminous repositories
such as files, books and ledgers.

However, storing data in and retrieving information from these repositories was a time-consuming
task. With the development of computers, the problem of information storage and retrieval was
largely resolved. Computers replaced tons of paper, file folders and ledgers as the principal
media for storing important information. In computer parlance, a relational database makes
information from the data stored within it.

Example 3: If we know what the sequence of digits in Example 1 refers to, then it becomes
meaningful and can be called information. When we write it as 23-05-04, it may mean a date of
birth.

Example 4: If the data mentioned in Example 2 is processed as "Aryan secured the highest marks
in mathematics in the annual examination", it becomes information.
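
The step from data to information in Examples 1 and 3 can be sketched in a few lines of Python.
This is only an illustration: the function name interpret_as_date and the choice of a
day-month-year reading are assumptions made here, not something the digits themselves dictate.

```python
from datetime import date, datetime


def interpret_as_date(digits: str) -> date:
    """Read six digits as day, month, two-digit year (an assumed convention)."""
    return datetime.strptime(digits, "%d%m%y").date()


raw = "230504"                     # data: a bare sequence of digits

as_amount = int(raw)               # one possible reading: rupees spent on a project
as_date = interpret_as_date(raw)   # another possible reading: a date of birth

print(as_amount)                   # 230504
print(as_date.isoformat())         # 2004-05-23
```

The same six digits yield two different pieces of information; it is the chosen interpretation,
not the digits, that supplies the meaning.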

Figure 1.2: Information

1.3.3 Knowledge

Knowledge is the appropriate collection of information, such that its intent is to be useful.

Knowledge is a deterministic process. Knowledge is derived from information in the same way

information is derived from data. When someone "memorizes" information, then they have

amassed knowledge. This knowledge has useful meaning to them. It can be considered as the

integration of human perceptive processes that helps them to infer further knowledge.

Example 5: Elementary school children memorize, or amass knowledge of, the "number table".
They can tell you that "2 x 2 = 4" because they have amassed that knowledge (it being included
in their number table). So, knowledge is gained continually from the information people acquire
in their day-to-day life.

Figure 1.3: Knowledge

Knowledge adds understanding and retention to information. It is the next natural progression
after information. Therefore, if we want to have appropriate knowledge, we must have the right
information, and to have the right information we must have complete and correct data. While
maintaining data in the database, we must therefore ensure that it contains no missing values,
noise, inconsistency or incompleteness.

1.3.4 Difference between Data, Information and Knowledge

Data | Information | Knowledge
-----|-------------|----------
Data represents unorganized and unprocessed facts and figures. | Information can be considered as an aggregation of data (processed data). | Knowledge is derived from information in the same way information is derived from data.
Data is not significant to a business in and of itself. | Information is significant to a business in and of itself. | Knowledge always helps a business in the decision-making process.
Data is a prerequisite to information. | The processed form of data represents information. | Knowledge is usually based on learning, thinking, and proper understanding of the information.
Data must be interpreted, by a human or machine, to derive meaning. | Information is exchangeable amongst people, about things, facts and concepts etc. | A continuous process of gaining information results in knowledge.
Data can be a number, letter, alphabet, word and special symbol etc. | Information provides answers to "who", "what", "where", and "when" questions. | Knowledge answers "how" questions.
Example: In the healthcare industry, data includes vital signs, weight and relevant assessment parameters. | Example: The processed data leads to certain information which helps in providing the right clinical treatment. | Example: The trend of vital signs over time provides a pattern that may lead to important decisions.

1.3.5 Manual Data Processing and its Limitation

In early systems, data was handled manually by the different users. People managed the whole
database without the support of computers. This approach has many problems, which are explained
below:

1. Address Dictionary: In an address book, a fixed number of pages is pre-allotted to the
addresses starting with each letter of the alphabet. Take the letter "A". If you keep writing
addresses for names beginning with "A", the pages allotted to "A" will eventually be used up,
and this becomes a problem. One solution is to buy a larger address book and transfer all the
previous addresses into the new one, which is a tiring and time-consuming process. The second
solution is to use some blank pages at the end of the same address book. This is again
cumbersome, because to find the address of a specific person you have to look through the pages
allotted to that letter and also through the last pages of the address book. So the search has
to be performed twice, at two different places in the same address book.

2. Repeated Transaction: Many transactions occur repeatedly on a day-to-day, week-to-week or
month-to-month basis. For example, to calculate salaries, all payroll transactions are recorded
manually in the ledger for a month, and the same transactions are recorded again manually for
the next month, and so on. It is just a calculation task and does not require any logic or
intelligence. Therefore it is not wise to waste human skill and intelligence on such repetitive
calculations.

3. Searching Process: Searching for a single entry in a large number of manual records is very
difficult. For example, Mr. Arpit is a subscriber of a publishing company and wants to renew his
subscription, so he sends a cheque to the company. The publisher then has to search the whole
list of subscribers to find Mr. Arpit's subscription number. It is a boring and tiring job.

4. Updating the Manual Records: It is difficult to update the records of a manual database. The
first issue is identifying the appropriate record to be updated, and the second is the problem
of overwriting. For security reasons we generally avoid overwriting records, because it may give
a wrong impression to the reader of that record.

Hence, when the database is large and difficult to manage, it is better to use computers than to
handle and process the database manually.

1.3.6 File Processing System and its Limitation

The file processing system was an early attempt to computerize the manual filing system that we
are all familiar with. A file system is a method for storing and organizing computer files and
the data they contain to make it easy to find and access them. The manual filing system works
well when the number of items to be stored is small. It even works quite adequately when there
are large numbers of items and we only have to store and retrieve them. However, the manual
filing system breaks down when we have to cross-reference or process the information in the
files. The following problems are associated with the file-based approach:

1. Duplication of data: In this system the data stored in the files are independent of each
other. Therefore, the same data may be stored in multiple files. This causes duplication of
data.

Example 6: A student's roll number may be stored in two different files.

2. Inconsistency of data: In a file processing system, the files being maintained are
independent of each other, which means that there is no relationship among them. Therefore, if
any data item is to be changed, all files containing related data need to be updated. If some of
the files are not updated, the data becomes inconsistent.

Example 7: An employee's qualification may be maintained in two or more files. When the
qualification improves, the data item may be updated in only one file while the rest are left
unchanged. This results in inconsistency.

3. Lack of Data integrity: This is the problem of ensuring that the data in the database is
accurate. In any application there are certain data integrity rules, in the form of conditions
and constraints, that need to be maintained. In a file system these rules have to be coded into
the application programs themselves, and every program that uses the data must be changed to
apply them. So, it is difficult to maintain data integrity.

Example 8: The integrity constraint that the phone number of a student must have exactly 10
digits has to be implemented in all application programs that use the student file. For one
application it is quite easy to incorporate this integrity rule, but for a number of application
programs it may be quite difficult to maintain.

4. Lack of Security: In a file system, security constraints are not easy to enforce because data
is stored in different independent files. Therefore unauthorized users can destroy the file
data, which leads to data inconsistency.

Example 9: An unauthorized user may access your files and perform fraudulent operations on your
data.

5. Data dependence: Data is stored in files, and the files are maintained to fulfil the
requirements of application programs. All application programs are independent of each other.
Therefore, if the definition of any data item changes, the change must be made in every
application program that uses that data. This is called data dependence.

Example 10: If an organization changes its employee IDs from 6 digits to 10 digits, all the
application programs that use this data item have to be modified.

6. Difficult to share data: Different files may hold data in different formats, so the format of
data stored in one file can differ from the format of data stored in another. If the data of two
such files ever needs to be shared, the different formats cause a problem. The solution is to
develop an interface, which is itself a time-consuming process.

Example 11: The gender of MBA students is stored as "1" and "0" (where "1" stands for male and
"0" stands for female), so the data type may be number, whereas the gender of MCA students is
stored as "M" and "F" (where "M" stands for male and "F" stands for female), so the data type
may be character. If we want to calculate the total number of male and female students in MBA
and MCA, the difference in data types will create a problem.

7. Difficult to get quick response: Queries in an application program are written to meet
specific requirements. If any clause of a query changes, it becomes difficult to get a quick
response.

Example 12: Suppose there is a condition that only a student whose age is 35 to 40 years can
apply for a specific job. If this age criterion changes to 30 to 35 years, the change first has
to be incorporated in all the related queries of that application program. This delays the
response.

8. Concurrency problem: In a file system, when two or more users access the same data file for
read and write operations at the same time, a concurrency problem may arise that leaves the data
in an inconsistent state.

Example 13: Suppose a couple opens a joint bank account with a balance of Rs. 5000. After some
days the husband withdraws Rs. 500, leaving a balance of Rs. 4500. At the same time the wife
withdraws Rs. 700 under the impression that the balance is still Rs. 5000. Since both
transactions execute concurrently, the problem of concurrency arises.

9. Inadequate to Represent Data Modeling of Real World: Data in a file system is simply
maintained to support one application program. It does not show any relationship among the data
in different files. Moreover, complex data cannot be defined in a file system.

10. Difficulty in Data Representation from User's View: To create a useful application for the
user, it is necessary to combine the data of different files. But a file system records
independent and isolated data, and the relationships among them are very hard to determine.
Therefore the data in a file system does not meet the user's requirements.
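
The arithmetic behind the concurrency problem in Example 13 can be traced with a short Python
sketch. It is a simplified simulation under stated assumptions: the two withdrawals are
interleaved by hand rather than run as real concurrent programs, and the helper name
withdraw_unsafely is invented for this illustration.

```python
def withdraw_unsafely(balance_read: int, amount: int) -> int:
    """Compute a new balance from a copy of the balance that may already be stale."""
    return balance_read - amount


stored_balance = 5000                  # balance held in the shared file (Rs.)

# Both users read the balance before either of them writes a new value back.
husband_copy = stored_balance
wife_copy = stored_balance

stored_balance = withdraw_unsafely(husband_copy, 500)   # file now holds 4500
stored_balance = withdraw_unsafely(wife_copy, 700)      # overwritten with 4300

expected_balance = 5000 - 500 - 700    # 3800 if the withdrawals ran one after the other
lost_amount = stored_balance - expected_balance

print(stored_balance, expected_balance, lost_amount)    # 4300 3800 500
```

The husband's withdrawal is silently overwritten, so the file shows Rs. 500 more than the
account should hold. A plain file system offers nothing to prevent this interleaving.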

To remove the above limitations of the file-based approach, a new and more effective approach
was required, known as the Database Approach.

1.3.7 Database Approach

A database is a shared collection of logically related data, designed to meet the information
needs of an organization. It is a computer-based record-keeping system whose overall purpose is
to record and maintain information. The database is a single, large repository of data which can
be used simultaneously by many departments and users. Instead of disconnected files with
redundant data, all data items are integrated with a minimum amount of duplication. The database
is no longer owned by one department but is a shared corporate resource. The database holds not
only the organization's operational data but also a description of this data. For this reason, a
database is also defined as a self-describing collection of integrated records. The description
of data is known as the Data Dictionary or Meta Data. It is the self-describing nature of a
database that provides program-data independence.
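
The idea of a self-describing database can be seen in a small sketch using Python's built-in
sqlite3 module. SQLite is used here only as a convenient stand-in for a DBMS, and the student
table and its columns are assumptions made for the illustration. The description of the table is
itself stored in the database and can be queried like any other data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " student_name TEXT NOT NULL,"
    " student_age INTEGER)"
)

# Table names are stored as data in SQLite's catalog table, sqlite_master.
table_names = [row[0] for row in
               conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(student)").fetchall()
field_names = [col[1] for col in columns]
primary_key = [col[1] for col in columns if col[5] > 0]
not_null = [col[1] for col in columns if col[3] == 1]

print(table_names)   # ['student']
print(field_names)   # ['student_id', 'student_name', 'student_age']
print(primary_key)   # ['student_id']
```

A program can therefore discover the structure of the data at run time instead of having the
file layout hard-coded into it.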

A database implies separation of the physical storage from the use of the data by an application
program, to achieve program-data independence. Using a database system, the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the users. Changes can be made to data without affecting other components of the
system. These changes include a change of data format or file structure, or relocation from one
device to another.
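
One common way to obtain this kind of independence is to let applications read through a view
while the stored tables change underneath. The sketch below uses Python's sqlite3 module, and
every name in it (employee_v1, employee_v2, the employee view, employee_names) is hypothetical.
It shows one possible technique, not the only mechanism a DBMS may use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original storage layout, hidden behind a view that applications query.
conn.execute("CREATE TABLE employee_v1 (emp_id TEXT, full_name TEXT)")
conn.execute("INSERT INTO employee_v1 VALUES ('000101', 'Arpit')")
conn.execute("CREATE VIEW employee AS "
             "SELECT emp_id, full_name AS name FROM employee_v1")


def employee_names(connection):
    """Application code: it knows only the view, not the stored tables."""
    return [row[0] for row in
            connection.execute("SELECT name FROM employee ORDER BY name")]


before = employee_names(conn)

# The storage structure changes: new table, new column names, extra key column.
conn.execute("CREATE TABLE employee_v2 ("
             " id INTEGER PRIMARY KEY, emp_id TEXT, first_name TEXT)")
conn.execute("INSERT INTO employee_v2 (emp_id, first_name) "
             "SELECT emp_id, full_name FROM employee_v1")
conn.execute("DROP VIEW employee")
conn.execute("CREATE VIEW employee AS "
             "SELECT emp_id, first_name AS name FROM employee_v2")
conn.execute("DROP TABLE employee_v1")

after = employee_names(conn)   # the application function is unchanged
print(before == after)         # True
```

Only the view definition had to be adjusted; the application function ran unmodified before and
after the change.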

1.3.8 Characteristics of Database

The data in a database should have the following features:

1. Shared: Data in a database are shared among different users and applications.

2. Persistence: Data in a database exist permanently, in the sense that the data can live beyond
the scope of the process that created it.

3. Validity/ Integrity/ Correctness: Data should be correct with respect to the real-world
entity that they represent.

4. Security: Data should be protected from unauthorized access.

5. Consistency: Whenever more than one data element in a database represents related real-world
values, the values should be consistent with respect to the relationship.

6. Non-redundancy: No two data items in a database should represent the same real-world entity.

7. Independence: Data at different levels should be independent of each other, so that changes
at one level do not affect the other levels.

To create, manage and manipulate data in databases, a management system known as a database
management system was developed.

1.3.9 The Database Management System

A database management system (DBMS) is a general-purpose software system. It is a collection of
programs that enables users to define, create and maintain a database and that provides
controlled access to the data. Defining a database involves specifying the data types,
structures, and constraints for the data to be stored in the database. A database may be defined
as a repository of data for an organization such that it can be shared and integrated. Creating
the database is the process of storing the data itself on some storage medium that is controlled
by the DBMS. Manipulating a database includes such functions as querying the database to
retrieve specific data, updating the database to reflect changes in the real world, and
generating reports from the data. DBMSs range from small systems that run on personal computers
to huge systems that run on mainframes. The following are some of the main examples of database
applications:

1. Computerized Library System

2. Automated teller machines

3. Railway/ Flight reservation systems

4. Computerized inventory systems and so on.

These systems allow users to create, update and extract information from their databases.
Compared to a manual filing system, the biggest advantages of a computerized database system are
speed, accuracy and accessibility. Its other advantages are described in the next section.
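
The three activities named in this section (defining, creating and manipulating a database) can
be sketched for a tiny library system with Python's sqlite3 module. The book table, its columns
and the sample rows are assumptions chosen for illustration; a real library system would be far
larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE book ("
    " book_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " copies INTEGER NOT NULL CHECK (copies >= 0))"
)

# Create: store the data itself under the control of the DBMS.
conn.executemany(
    "INSERT INTO book (book_id, title, copies) VALUES (?, ?, ?)",
    [(1, "Database Basics", 3), (2, "File Systems", 1)],
)

# Manipulate: update to reflect a change in the real world (one copy issued) ...
conn.execute("UPDATE book SET copies = copies - 1 WHERE book_id = 1")

# ... query to retrieve specific data ...
available = conn.execute(
    "SELECT title, copies FROM book WHERE copies > 0 ORDER BY book_id"
).fetchall()

# ... and generate a simple report figure from the data.
total_copies = conn.execute("SELECT SUM(copies) FROM book").fetchone()[0]

print(available)      # [('Database Basics', 2), ('File Systems', 1)]
print(total_copies)   # 3
```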

1.3.10 Advantages of Database Management Systems

A database management system offers a number of potential advantages, which are explained
below:

1. Data independence: Application programs should be as independent as possible from

details of data representation and storage. The DBMS can provide an abstract view of the

data to insulate application code from such details.

2. Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the

data in such a manner that users can think of the data as being accessed by only one user

at a time. Further, the DBMS protects users from system failures.

3. Reduced application development time: Clearly, the DBMS supports many important

functions that are common to many applications accessing data stored in the DBMS.

This, in conjunction with the high-level interface to the data, facilitates quick

development of applications. Such applications are also likely to be more robust than

applications developed from scratch because many important tasks are handled by the

DBMS instead of being implemented by the application.

4. Reduction of Redundancies: Centralized control of data by the DBA avoids

unnecessary duplication of data and effectively reduces the total amount of data storage

required. It also eliminates the extra processing necessary to trace the required data in a

large mass of data.

5. Elimination of Inconsistencies: The main advantage of avoiding duplication is the

elimination of inconsistencies that tend to be present in redundant data files. Any

redundancies that exist in the DBMS are controlled and the system ensures that these

multiple copies are consistent.

6. Shared Data: A database allows the sharing of data under its control by any number of

application programs or users. For example, the applications for the public relations and

payroll departments can share the same data.

7. Integrity: Centralized control can also ensure that adequate checks are incorporated in

the DBMS to provide data integrity. Data integrity means that the data contained in the

database is both accurate and consistent. Therefore, data values being entered for the

storage could be checked to ensure that they fall within a specified range and are of the

correct format.

8. Security: Data is of vital importance to an organization and may be confidential. Such

confidential data must not be accessed by unauthorized persons. The DBA who has the

ultimate responsibility for the data in the DBMS can ensure that proper access procedures

are followed, including proper authentication schemes for access to the DBMS and

additional checks before permitting access to sensitive data. Different levels of security

could be implemented for various types of data and operations.

9. Conflict Resolution: Since the database is under the control of the DBA, he/she should

resolve the conflicting requirements of various users and applications. In essence, the

DBA chooses the best file structure and access method to get optimal performance for the

response-critical applications, while permitting less critical applications to continue to

use the database, albeit with a relatively slower response.

10. Standards can be enforced: Since all access to the database must go through the DBMS,
standards can be enforced. Standards may relate to the naming of data, the format of data, the
structure of data etc. Standardizing stored data formats is usually desirable for the purpose of
data interchange or migration between systems.
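
The integrity advantage above (values checked for range and format before storage) can be
sketched with declarative constraints in Python's sqlite3 module. The rule is stated once in the
table definition, so every application that inserts data is subject to it. The table layout, the
age range of 5 to 100 and the helper try_insert are assumptions for this illustration; the
10-digit phone rule follows Example 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    # Range check: the age must lie within an assumed valid range.
    " age INTEGER CHECK (age BETWEEN 5 AND 100),"
    # Format check: exactly 10 characters, none of them a non-digit.
    " phone TEXT CHECK (length(phone) = 10 AND phone NOT GLOB '*[^0-9]*'))"
)


def try_insert(student_id, age, phone):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)",
                     (student_id, age, phone))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(101, 19, "9876543210")    # valid row
bad_phone = try_insert(102, 19, "98765")        # phone number too short
bad_age = try_insert(103, 150, "9876543210")    # age outside the range
stored_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(accepted, bad_phone, bad_age, stored_rows)   # True False False 1
```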

1.3.11 Disadvantages of Database Management Systems

Although a DBMS has many advantages, it also has some disadvantages. These are as follows:

1. Cost of software/hardware and migration: A significant disadvantage of the DBMS

system is cost. In addition to the cost of purchasing or developing the software, the

hardware has to be upgraded to allow for the extensive programs and work spaces

required for their execution and storage. The processing overhead introduced by DBMS

to implement security, integrity, and sharing of the data causes a degradation of the

response and through-put times. An additional cost is that of migration from a

traditionally separate application environment to an integrated one.

2. Problem associated with centralization: While centralization reduces duplication, the

lack of duplication requires that the database be adequately backed up so that in the case

of failure the data can be recovered. Centralization also means that the data is accessible

from a single source. This increases the potential severity of security breaches and

disruption of the operation of the organization because of downtimes and failures. The

replacement of a monolithic centralized database by a federation of independent and

cooperating distributed databases resolves some of the problems resulting from failures

and downtimes.

3. Complexity of Backup and Recovery: Backup and recovery operations are fairly

complex in a DBMS environment, and this is exacerbated in a concurrent multi user

database system. Furthermore, a database system requires a certain amount of controlled

redundancies and duplication to enable access to related data items.

4. Cost of Data Conversion: When a computer file-based system is replaced with a database
system, the data stored in files must be converted into database files. Converting data files
into a database is difficult and costly. For this purpose, database and system designers have to
be hired along with application programmers.

5. Cost of staff training: Most DBMSs are complex systems, so users need training to use them.
Training is required at all levels, including programming, application development and database
administration.

6. Database damage: In most organizations all data is integrated into a single database. If the
database is damaged due to a power failure, or is corrupted on the storage media, valuable data
may be lost forever.

7. High cost of DBMS: Because a complete DBMS is a very large and sophisticated piece of
software, it is expensive to purchase.

8. Slower processing in some applications: Although an integrated database is designed to
provide better information, certain applications may be slower due to the integration of data.

1.3.12 Difference between File System Approach and Database Management System

File System Approach | Database Management System
---------------------|---------------------------
1. It is used for small systems. | 1. It is used for large systems.
2. It is a relatively cheaper system. | 2. It is relatively more expensive.
3. Large files are stored under this system. | 3. Fewer files are stored under this system.
4. Data is stored in the form of files. | 4. Data is stored in the form of tables.
5. It has a simple structure. | 5. It has a comparatively complex structure.
6. The data may have redundancy under this system. | 6. Under this system there is reduced redundancy.
7. Data is isolated from other data. | 7. Data can be shared under this system.
8. There is no security and integrity of data. | 8. It maintains security and integrity of data.
9. Backup and recovery process is simple in this system. | 9. Backup and recovery process is complex in this system.
10. Its examples are C, COBOL. | 10. Its examples are Oracle, SQL.
11. There is no data independence. | 11. There exists data independence.

1.3.13 DBMS Terminology

Database – A collection (or list) of information. A database is comprised of one or more lists
(called tables) of data organized by columns, rows, and cells.

Tables – The view that displays the database as a combination of rows (records) and columns
(fields). The cells contain the bits and pieces of data for each record in each field. The first
row of a table is reserved for the field names.

Field names – Identify the different categories in a database. The top row is reserved for field
names. Examples of field names are first name, last name, address, city, state, zip, phone
number.

Field – A category in a database. Fields are displayed in columns. For example, in a database,
the address field contains the address for each of the records. These are the bits and pieces of
data.

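(Callouts in the figure: Field Name, Record, Cell)

Table Name: Student

Student_id | Student_name | Student_age | Student_grade
-----------|--------------|-------------|---------------
101        | Arpit        | 19          | Post graduate
102        | Siddhant     | 15          | Under graduate
103        | Aryan        | 16          | Under graduate
104        | Satvik       | 19          | Post graduate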

Figure 1.4: Basic Terminology of DBMS

Records – Related information that is separated by columns or fields. A name and address are
considered one record in the database. A second name and address are a different record.

Cells – The intersection of columns and rows that contain the data for each record.

Data – All of the records of information in a database including the field names, i.e. Data +
Field Names = Records, and All Records = a Database.
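
The terms above can be tied to the Student table of Figure 1.4 with a short Python sketch. It
uses the built-in sqlite3 module as a stand-in DBMS; the variable names are chosen here for
illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " Student_id INTEGER, Student_name TEXT,"
    " Student_age INTEGER, Student_grade TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (101, "Arpit", 19, "Post graduate"),
        (102, "Siddhant", 15, "Under graduate"),
        (103, "Aryan", 16, "Under graduate"),
        (104, "Satvik", 19, "Post graduate"),
    ],
)

cursor = conn.execute("SELECT * FROM Student ORDER BY Student_id")
field_names = [d[0] for d in cursor.description]    # the top row of the table
records = cursor.fetchall()                         # one tuple per record (row)

record = records[2]                                 # the record of student 103
cell = record[field_names.index("Student_name")]    # a single cell
age_field = [r[field_names.index("Student_age")] for r in records]   # a whole field

print(field_names)   # ['Student_id', 'Student_name', 'Student_age', 'Student_grade']
print(record)        # (103, 'Aryan', 16, 'Under graduate')
print(cell)          # Aryan
print(age_field)     # [19, 15, 16, 19]
```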

1.3.14 Component of Database System

A database system is composed of four components:

1. Data

2. Hardware

3. Software

4. Users

These components coordinate with each other to form an effective database system.

1. Data - It is a very important component of the database system. Most organizations generate,
store and process large amounts of data. The data acts as a bridge between the machine parts,
i.e. hardware and software, and the users who access it directly or through application
programs. Data may be of different types, as explained below:

a) User Data - It consists of table(s) of data called Relation(s), where the columns are called
fields or attributes and the rows are called records. A Relation must be structured properly.

b) Metadata - A description of the structure of the database is known as Metadata. It basically
means "data about data". System Tables store the Metadata, which includes:

- Number of Tables and Table Names

- Number of fields and field Names

- Primary Key Fields

- Null Constraint

c) Application Metadata - It stores the structure and format of queries, reports and other
application components.

Figure 1.5: Data Base System

2. Hardware - The hardware consists of the secondary storage devices on which data is stored,
such as magnetic disks (hard disks, zip disks, floppy disks), optical disks (CD-ROM) and
magnetic tapes, together with the input/output devices (mouse, keyboard, printers), processors,
main memory etc. which are used for storing and retrieving the data in a fast and efficient
manner. Since databases can range from those of a single user with a desktop computer to those
on mainframe computers with thousands of users, proper care should be taken in choosing
appropriate hardware for the required database.

3. Software - The software part consists of the DBMS, which acts as a bridge between the user
and the database. In other words, it is the software that interacts with the users, application
programs, the database and the file system of a particular storage medium (hard disk, magnetic
tape etc.) to insert, update, delete and retrieve data. To perform operations such as insertion,
deletion and updating, we can use either query languages like SQL, QUEL and Gupta SQL or
application software such as Visual Basic, Developer etc.

4. Users - Users are the persons who need information from the database to carry out their
primary business responsibilities, i.e. personnel, staff, clerical workers, managers, executives
etc. On the basis of their job and requirements, they are given full or partial access to the
database. The people who work with databases include database users, system analysts,
application programmers, and the database administrator (DBA).

Database users are those who interact with the database in order to query and update the
database, and generate reports. Database users are further classified into the following
categories:

a) Naive users: These users query and update the database by invoking application programs that have already been written. For example, the owner of a bookstore enters the details of various books into the database by invoking the appropriate application program. The naive user interacts with the database through a form interface.

b) Sophisticated users: These users, such as business analysts and scientists, are familiar with the facilities provided by a DBMS and interact with the system without writing any application programs. They use a database query language to retrieve information from the database to meet their complicated requirements.

c) Specialized users: These users write specialized database applications that differ from traditional data processing applications such as banking and payroll management, which use simple data types. Examples are computer-aided design systems, knowledge-base systems and expert systems, which store data having complex data types.

d) System analysts: System analysts determine the requirements of the database users (especially naive users) in order to create a solution for their business needs, and focus on both non-technical and technical aspects. The non-technical aspects involve defining system requirements, facilitating interaction between business users and technical staff, etc. The technical aspects involve developing the specification for the user interface (application programs).

e) Application programmers: These are the computer professionals who implement the specifications given by the system analysts and develop the application programs. They can choose tools, such as rapid application development (RAD) tools, to develop an application program with minimal effort. The application programs they develop give the database users easy access to the data.
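The difference between a naive user and a sophisticated user can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module stands in for the DBMS; the `Book` table, the `add_book` function and the book details are hypothetical names made up for the example.

```python
import sqlite3


def add_book(conn, title, author, price):
    """Pre-written application program that a naive user invokes via a form."""
    # Placeholders keep the values typed by the user separate from the SQL text.
    conn.execute(
        "INSERT INTO Book (Title, Author, Price) VALUES (?, ?, ?)",
        (title, author, price),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE Book (Title TEXT, Author TEXT, Price REAL)")

# Naive user: supplies the form values; the application program does the rest.
add_book(conn, "Database Basics", "A. Author", 450.0)
add_book(conn, "Advanced SQL", "B. Writer", 700.0)

# Sophisticated user: writes an ad hoc query directly in the query language.
expensive = conn.execute(
    "SELECT Title FROM Book WHERE Price > ? ORDER BY Title", (500,)
).fetchall()
print(expensive)
```

The naive user never sees any SQL, whereas the sophisticated user composes the query personally and can change it freely.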

1.3.15 Database Administrator (DBA)

Database Administrator is a person who has central control over both data and application

programs. The responsibilities of DBA vary depending upon the job description and corporate

and organization policies. Some of the responsibilities of DBA are given here.

a) Schema definition and modification: The overall structure of the database is known as

database schema. It is the responsibility of the DBA to create the database schema by

executing a set of data definition statements in DDL. The DBA also carries out the

changes to the schema according to the changing needs of the organization.

b) New software installation: It is the responsibility of the DBA to install new DBMS

software, application software, and other related software. After installation, the DBA

must test the new software.

c) Security enforcement and administration: DBA is responsible for establishing and

monitoring the security of the database system. It involves adding and removing users,

auditing, and checking for security problems.

d) Data analysis: DBA is responsible for analyzing the data stored in the database, and

studying its performance and efficiency in order to effectively use indexes, parallel query

execution, etc.

e) Preliminary database design: The DBA works along with the development team during

the database design stage due to which many potential problems that can arise later (after

installation) can be avoided.

f) Physical organization modification: The DBA is responsible for carrying out the

modifications in the physical organization of the database for better performance.

g) Routine maintenance checks: The DBA is responsible for taking periodic backups of the database so that it can be recovered after any hardware or software failure. Other routine maintenance tasks carried out by the DBA include checking data storage, ensuring that enough free disk space is available for normal operations, and upgrading disk space as and when required.
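As an illustration of the periodic backup duty listed above, the following sketch copies a database and then reads the copy after the original has lost its data. It assumes Python's built-in sqlite3 module (whose `Connection.backup` method is available from Python 3.7) as a stand-in for the DBMS; the `Account` table, its contents and the `backup_database` helper are hypothetical.

```python
import sqlite3


def backup_database(source, target):
    """Copy the whole of `source` into `target` (a routine backup)."""
    source.backup(target)  # sqlite3.Connection.backup, Python 3.7+


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Account (Id INTEGER PRIMARY KEY, Owner TEXT)")
live.execute("INSERT INTO Account (Owner) VALUES ('Asha')")
live.commit()

# In practice the copy would be a file kept on separate storage media.
copy = sqlite3.connect(":memory:")
backup_database(live, copy)

# Simulate a failure that destroys the data in the live database.
live.execute("DELETE FROM Account")
live.commit()

restored = copy.execute("SELECT Owner FROM Account").fetchall()
print(restored)
```

A real backup policy would also decide how often to run the copy and where to keep it; those choices depend on the organization and are not shown here.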

1.3.16 Disk Manager

The disk manager is part of the operating system of the host computer and all physical input and

output operations are performed by it. The disk manager transfers the block or page requested by

the file manager so that the latter need not be concerned with the physical characteristics of the

underlying storage media.

1.3.17 Data Manager

The data manager is the central software component of the DBMS. It is sometimes referred to as

the database control system. One of the functions of the data manager is to convert operations in

the user's queries coming directly via the query processor or indirectly via an application

program from the user's logical view to a physical file system. The data manager is responsible

for interfacing with the file system. In addition, the tasks of enforcing constraints to maintain the

consistency and integrity of the data, as well as its security, are also performed by the data

manager. It is also the responsibility of the data manager to synchronize the simultaneous operations performed by concurrent users and to handle the backup and recovery operations.

1.3.18 File Manager

Responsibility for the structure of the files and managing the file space rests with the file

manager. It is also responsible for locating the block containing the required record, requesting that block from the disk manager, and transmitting the required record to the data manager.

The file manager can be implemented using an interface to the existing file subsystem provided

by the operating system of the host computer or it can include a file subsystem written especially

for the DBMS.
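The division of work among the data manager, the file manager and the disk manager can be pictured with a toy model. The sketch below is not how any real DBMS is implemented; the class names, the fixed number of records per block and the in-memory "disk" are all simplifying assumptions made for illustration.

```python
class DiskManager:
    """Performs the 'physical' I/O: returns a whole block by its number."""

    def __init__(self, blocks):
        self.blocks = blocks  # each block is a list of records
        self.reads = 0        # number of block reads performed

    def read_block(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


class FileManager:
    """Locates the block holding a record and asks the disk manager for it."""

    def __init__(self, disk, records_per_block):
        self.disk = disk
        self.records_per_block = records_per_block

    def fetch_record(self, record_no):
        block_no, offset = divmod(record_no, self.records_per_block)
        block = self.disk.read_block(block_no)
        return block[offset]


class DataManager:
    """Converts a logical request (a key) into a request for a stored record."""

    def __init__(self, files, index):
        self.files = files
        self.index = index  # logical key -> record number

    def lookup(self, key):
        return self.files.fetch_record(self.index[key])


blocks = [
    [("S001", "Compware Computers"), ("S002", "Dell Computers")],
    [("S003", "Krishna Computers")],
]
disk = DiskManager(blocks)
data = DataManager(
    FileManager(disk, records_per_block=2),
    index={"S001": 0, "S002": 1, "S003": 2},
)
record = data.lookup("S003")
print(record, disk.reads)
```

Each layer talks only to the layer directly below it, so the data manager never needs to know block numbers and the file manager never needs to know how a block is physically read.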

1.4 Summary

Humans have been dealing with data and information for a very long time, perhaps since the beginning of civilization. The giving and taking of information has been in practice ever since, and it in turn contributes to meaningful knowledge.

To achieve the objective of this chapter, the origin of the database concept as an extension of file systems has been discussed. The flaws of file systems, the advantages of database systems and a comparison between the two have been sketched.

The basic terminology of database management systems, their characteristics, and the important components of database systems, along with the roles of the different users, have also been explained.

1.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database Systems, 3rd Edition, Addison Wesley, New Delhi.

2. Korth & Silberschatz: Database System Concepts, 4th Edition, McGraw Hill International Edition.

3. Raghu Ramakrishnan & Johannes Gehrke: Database Management Systems, 2nd Edition, McGraw Hill International Edition.

4. C. J. Date: An Introduction to Database Systems, 7th Edition, Addison Wesley, New Delhi.

5. Bipin C. Desai: An Introduction to Database Systems, Galgotia Publication, New Delhi.

6. O'Brien J. A.: Introduction to Information Systems in Business Management, 6th Edition, Richard D. Irwin, Inc., 1991.

1.6 Self Assessment Questions (SAQ)

1. What is information? How does it differ from data? Define database management system.

2. What is file system approach? What are its various limitations?

3. What is DBMS? What are its various advantages and disadvantages?

4. Differentiate between

a) File system approach and database approach.

b) Data, Information and Knowledge

5. List some advantages of DBMS as compared to a conventional data processing system.

6. Who is DBA? What are the responsibilities of DBA?

7. Discuss the role of the data manager, file manager and disk manager in a database management system.

8. What are the important components of database management system? Explain database

users.

Chapter – 2: Database Systems Architecture, Functions &
Component Modules
Writer: Dr. Kanwal Garg
Vetter: Prof. Rajender Nath
Structure:
2.1 Introduction
2.2 Objective
2.3 Presentation of Content
2.3.1 Database Instances and Schemas
2.3.2 Three Level Architecture of DBMS
2.3.3 Mapping
2.3.4 Data Independence
2.3.5 Database Language and Interface
(i) DBMS Languages
(ii) DBMS Interface
2.3.6 DBMS Functions
2.3.7 Component Modules of DBMS
2.4 Summary
2.5 Suggested Reading/ Reference Material
2.6 Self Assessment Questions (SAQ)

2.1 Introduction

A collection of data designed to be used by different people is called a database. It is a collection of interrelated data stored together with controlled redundancy to serve one or more applications in an optimal fashion. The data are stored in such a way that they are independent of the programs used by the people working with the data. A common and controlled approach is used for adding new data and for modifying and retrieving existing data within the database.

The users of the database do not have to worry about the physical implementation and internal working of the database. The database management system has different layers, and each user needs to interact only with the layer assigned to that user.

In this chapter, we present the three-tier architecture of a DBMS package, which evolved from the traditional system in which the whole database was tightly integrated. The mappings among the different views/levels of the three-tier architecture are explained. Data independence, that is, the ability to modify a schema at one level while keeping the data separated from the programs that use it, is discussed. DBMS languages, interfaces, functions and component modules are also covered to give a clear understanding of a DBMS.

2.2 Objective

After going through this chapter, the student will have a clear understanding of how the user applications are separated from the physical database. The three-tier architecture of a DBMS is presented to provide a framework on which subsequent chapters can build. Such a framework is useful for describing general database concepts. This architecture was proposed by ANSI/SPARC (American National Standards Institute/ Standards Planning and Requirements Committee) and describes a database in terms of three views. The mappings define the correspondence between the three view levels. The discussion of schema modification clarifies the points to be taken care of when a schema is changed. DBMS interfaces and languages, which are used to communicate with the database, are then introduced. A detailed account of DBMS functions and component modules is given in the last section of this chapter.

2.3 Presentation of Content

2.3.1 Instances and Schema

A database changes over time as information is inserted or deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema.

A schema diagram, as shown below, displays only names of record types (entities) and names of

data items (attributes) and does not show the relationship among the various files.

Schema of SupplierMaster

Sid | Sname | SCity | Product | Price | Qty

Schema of ClientMaster

CNo | Cname | Caddress | CPhoneno

Instances of table SupplierMaster are as follows:

Sid  | Sname              | SCity       | Product     | Price   | Qty
S001 | Compware Computers | Kurukshetra | HP Laptop   | 29000/- | 5
S002 | Dell Computers     | Panipat     | Dell Laptop | 31000/- | 5
S003 | Krishna Computers  | Karnal      | Acer Laptop | 27000/- | 8

Figure 2.1: Schema and Instance

The schema remains the same while the values filled into it change from instant to instant. When the schema framework is filled in with data item values, the result is referred to as an instance of the schema. The data in the database at a particular moment of time is called a database state or snapshot, which is also called the current set of occurrences or instances in the database.

In other words, "the description of a database is called the database schema, which is specified during database design and is not expected to change frequently".

A schema diagram displays only some aspects of a schema, such as the names of record types and data items and some types of constraints. Other aspects are not specified in the schema diagram. In the figure above, for example, neither the data types of the attributes nor the relationships among the files are shown.

The actual data in a database may change quite frequently, so the database state changes every time data is inserted, deleted or modified. The DBMS is partly responsible for ensuring that every state of the database is a valid state, that is, a state that satisfies the structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important, and the schema must be changed with care.
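The distinction between schema and instance can be observed directly. The sketch below assumes Python's built-in sqlite3 module as a stand-in DBMS and reuses the SupplierMaster fields of Figure 2.1; the helper names `schema` and `instance` are hypothetical, and the price is stored as a plain integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SupplierMaster "
    "(Sid TEXT PRIMARY KEY, Sname TEXT, SCity TEXT, "
    "Product TEXT, Price INTEGER, Qty INTEGER)"
)


def schema(conn):
    # PRAGMA table_info returns one row per column; field 1 is the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(SupplierMaster)")]


def instance(conn):
    # The current database state: whatever rows exist at this moment.
    return conn.execute("SELECT * FROM SupplierMaster ORDER BY Sid").fetchall()


schema_before = schema(conn)
state_1 = instance(conn)  # the empty state, right after the schema is defined

conn.execute(
    "INSERT INTO SupplierMaster VALUES "
    "('S001', 'Compware Computers', 'Kurukshetra', 'HP Laptop', 29000, 5)"
)
state_2 = instance(conn)  # a new state after the insertion
schema_after = schema(conn)

print(schema_before == schema_after, state_1, state_2)
```

The list of field names is identical before and after the insertion, while the set of rows differs: the instance changed, the schema did not.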

2.3.2 Three Level Architecture of DBMS

A database management system is a large software package designed to assist in managing, maintaining and utilizing large collections of data. A DBMS is hence a general-purpose software system that facilitates the processes of defining (task of the internal view), constructing (task of the conceptual view) and manipulating (task of the external view) databases for various applications. These processes, their roles and their implementation are discussed in detail in the three-level/ tier architecture of DBMS.

The three-level/ tier architecture shown in Figure 2.2 is also known as the three-schema architecture of a database management system. The purpose of the three-tier architecture is to separate the user applications from the physical database. The three-schema architecture is a convenient tool with which users can visualize the schema levels in a database system. DBMS architecture is a framework in which the structure of the DBMS is defined. The main aim of this architecture is to define abstract views of the data and to hide the storage details from end users.

The three-level architecture framework was suggested by ANSI/SPARC (American National Standards Institute/ Standards Planning and Requirements Committee). The view at each level is described by a schema. A schema is an outline or plan that describes the structure of the database. The word scheme, which means a systematic plan for achieving some goal, is sometimes used interchangeably with schema. A subset of the schema is known as a subschema. It refers to the user's view of the fields that he or she uses from the database. Each view accesses some portion of the database.

The architecture of DBMS is divided into three view levels;

1. External view level

2. Conceptual view level

3. Internal view level

1. External Level: - The external level is described by an external schema, i.e. it consists of the definitions of the logical records and relationships in the external view. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. It also contains the method of deriving the objects in the external view from the objects in the conceptual view.

[Figure: End users at the top work with the External Level (individual user views), which holds several external views/schemas. The External-Conceptual Mapping links these to the Conceptual Level (community user view), which holds the single conceptual view/schema. The Conceptual-Internal Mapping links that to the Internal Level (storage view), which holds the internal view/schema.]

Figure 2.2: Three-tier Architecture of DBMS

This is the highest level of abstraction, where only those parts of the entire database that are of concern to a user are included. Despite the use of simpler structures at the logical level, some complexity remains because of the large size of the database. Many users of the database system are not concerned with all this information; they need to access only a part of the database. The view level of abstraction is defined so that their interaction with the system is simplified. The system may provide many views for the same database. Users can fulfil all their demands using the view provided to them and may never need the entire database, so each such "user's view" is complete and independent as far as that user is concerned. The external view is written in the external schema using an external data sub-language (DSL).

2. Conceptual Level: - The conceptual level has a conceptual schema which represents the

structure of entire database for a community of users. Conceptual schema describes the records

and relationship included in the Conceptual view. The conceptual schema hides the details of

physical storage structures and concentrates on describing entities, data types, relationships, user

operations, and constraints. It also contains the method of deriving the objects in the conceptual

view from the objects in the internal view.

One conceptual view represents the entire database, and there is only one conceptual view per database. It is large, complex and sophisticated. A database changes over time as data is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema, and the schema changes infrequently. The description of data at this level is in a format independent of its physical representation. It also includes features that specify the checks needed to retain data consistency and integrity. The conceptual view is written in the conceptual schema using a conceptual data sub-language (DSL).

3. Internal Level: - The internal level has an internal schema. The internal level indicates how the data will be stored and describes the data structures and access methods to be used by the database. It contains the definitions of stored records, the methods of representing the data fields and the access aids used.

This is the lowest level of abstraction; it describes how the data are actually stored in the database. The levels above it describe the entire database in terms of a small number of relatively simple structures. Although implementing these simple structures may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. The internal view is written in the internal schema using an internal data sub-language (DSL).

2.3.3 Mapping

The processes of transforming requests and results between levels are called mappings. These

mappings may be time-consuming, so some DBMSs, especially those that are meant to support

small databases do not support external views. Even in such systems, however, a certain amount

of mapping is necessary to transform requests between the conceptual and internal levels. The

mapping description is stored in data dictionary. The DBMS is responsible for mapping between

these three types of schemas. There are two types of mapping.

1. External- Conceptual Mapping

2. Conceptual- Internal Mapping

External- Conceptual Mapping: A mapping between external and conceptual views gives the

correspondence among the records and relationship of the conceptual and external view. The

external and conceptual mapping tells the DBMS which objects on the conceptual level

correspond to the object requested on a particular user‘s external view. If changes are made to

either external view or conceptual view, then mapping must be changed accordingly.

Conceptual- Internal Mapping: The Conceptual- Internal mapping defines the correspondence between the conceptual view and the internal view, i.e. the database stored on the physical storage device. It describes how conceptual records are stored on and retrieved from the storage device; in other words, it tells the DBMS how the conceptual records are physically represented. If the structure of the stored database is changed, then the mapping must be changed accordingly. It is the responsibility of the DBA to manage such changes.

These mappings are used primarily to achieve data independence: the details of how one level corresponds to another are confined to the mappings, so that each view stays independent of changes at the level below it. Changing the mappings is the responsibility of the DBA. In addition to the mappings and the three views, there are three more points of reference in the architecture: the DBMS, the DBA and the user interface. An example of the three views is given below:

Conceptual View:

Schema Name = Student

{
    Regd    : Char(8);      Primary Key;
    RollNo  : Number(4);    Candidate Key;
    Name    : Char(20);
    Address : Varchar2(20);
    Marks1  : Number(3);
    Marks2  : Number(3);
    Marks3  : Number(3);
    TMarks  : Number(3);
    Grade   : Char(2);
}

External View:

View N1 = Student

{
    Regd   : Char(8);   Primary Key;
    Name   : Char(20);
    TMarks : Number(3);
    Grade  : Char(2);
}

View N2 = Student

{
    RollNo  : Number(4);    Candidate Key;
    Name    : Char(20);
    Address : Varchar2(20);
    Grade   : Char(2);
}

Internal View:

Schema = Student

Block size = 1MB

File Name = xyz

Offset = 0

Starting cylinder = xxxxxxxx

Ending Cylinder = xxxxxxxx

Organization = Index SQL etc.

The three views represented here give a broad idea of the database views. At conceptual level all

entries are made. At external level data availability is dependent on user. At internal level

technical details are provided.
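The conceptual and external views of the Student example can be tried out as follows. This is a sketch under stated assumptions: Python's built-in sqlite3 module plays the DBMS, SQLite's TEXT and INTEGER types replace Char, Varchar2 and Number, the sample row is invented, and the internal level (blocks, files, offsets) is left entirely to SQLite and does not appear in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the complete Student record type.
conn.execute("""
    CREATE TABLE Student (
        Regd    TEXT PRIMARY KEY,
        RollNo  INTEGER UNIQUE,     -- candidate key
        Name    TEXT,
        Address TEXT,
        Marks1  INTEGER,
        Marks2  INTEGER,
        Marks3  INTEGER,
        TMarks  INTEGER,
        Grade   TEXT
    )
""")

# External level: each view exposes only the fields one user group needs.
conn.execute("CREATE VIEW N1 AS SELECT Regd, Name, TMarks, Grade FROM Student")
conn.execute("CREATE VIEW N2 AS SELECT RollNo, Name, Address, Grade FROM Student")

conn.execute(
    "INSERT INTO Student VALUES "
    "('R0000001', 1, 'Asha', 'Karnal', 70, 80, 90, 240, 'A')"
)

n1_rows = conn.execute("SELECT * FROM N1").fetchall()
n2_rows = conn.execute("SELECT * FROM N2").fetchall()
print(n1_rows)
print(n2_rows)
```

The view definitions play the role of the external-conceptual mapping: they tell the DBMS which conceptual fields correspond to each user's view.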

2.3.4 Data Independence

The ability to modify a schema definition at one level of a database system without having to

change the schema at the next higher level is called data independence. Data independence is a

form of database management that keeps data separated from all programs that make use of data.

There are two types of data independence:

1. Logical data independence: It is the capacity to change the conceptual schema without having to change the external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), to change constraints, or to reduce the database (by removing a record type or data item). Logical data independence thus gives us the freedom to change the conceptual schema without worrying about the external schemas. For example, we may sometimes need to change the logical schema by adding or removing fields or attributes; with logical data independence, such a change is possible without affecting the external views that do not refer to the changed items.

Example 1: Adding or removing entities, attributes or relationships in the conceptual schema should be possible without having to change existing external schemas or rewrite existing application programs. Consider a relation Student (Name, Rollno, Class):

Student

Name | Rollno | Class

If one more attribute, Marks, is added to the existing relation Student, then the structure of the relation looks like:

Student

Name | Rollno | Class | Marks

Figure 2.3: Logical Data Independence

In Figure 2.3, the logical schema is changed by adding a field/ attribute to the database. With logical data independence, such a change is possible: it is absorbed by the view definitions and by the mapping between the external and the conceptual view.
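The Student example can be run to see the change being absorbed. In this sketch Python's built-in sqlite3 module stands in for the DBMS; the view name `ClassList`, the function `class_list` and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, Rollno INTEGER, Class TEXT)")
conn.execute("INSERT INTO Student VALUES ('Asha', 1, 'BCA')")

# External view used by an existing application.
conn.execute("CREATE VIEW ClassList AS SELECT Name, Rollno, Class FROM Student")


def class_list(conn):
    """Stand-in for an existing application program; it is never edited below."""
    return conn.execute("SELECT Name, Rollno, Class FROM ClassList").fetchall()


before = class_list(conn)

# Conceptual schema change: add the Marks attribute.
conn.execute("ALTER TABLE Student ADD COLUMN Marks INTEGER")

after = class_list(conn)
student_columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
print(before, after, student_columns)
```

The application program returns the same rows before and after the change, even though the conceptual schema now contains a fourth attribute.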

2. Physical data independence: It is the capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed either. Changes to the internal schema may be needed because of a different file organization, storage structure, storage device or indexing strategy, and they should be possible without having to change the conceptual or external schema.

Alterations in the internal schema might include:

- Using new storage devices.

- Using different data structures.

- Switching from one access method to another.

- Using different file organization or storage structures.

- Modifying indexes.
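One of the alterations listed above, modifying indexes, makes a convenient demonstration of physical data independence. The sketch assumes Python's built-in sqlite3 module as the DBMS; the table contents, the index name `idx_student_class` and the helper `access_plan` are made up for illustration, and the exact wording of the plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, Rollno INTEGER, Class TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("Asha", 1, "BCA"), ("Ravi", 2, "MCA"), ("Meena", 3, "BCA")],
)

QUERY = "SELECT Name FROM Student WHERE Class = 'BCA' ORDER BY Rollno"


def access_plan(conn):
    # EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
    # the last field of each row is a human-readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


rows_before, plan_before = conn.execute(QUERY).fetchall(), access_plan(conn)

# Internal schema change: add an index on Class.
conn.execute("CREATE INDEX idx_student_class ON Student (Class)")

rows_after, plan_after = conn.execute(QUERY).fetchall(), access_plan(conn)

print(rows_before == rows_after)
print(plan_before)
print(plan_after)
```

The query text and its result stay the same; only the way the DBMS chooses to reach the data is free to change, which is what the two printed plans let the reader compare.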

2.3.5 Database Language and Interface

The main objective of DBMS is to allow its users to perform a number of operations on database

such as retrieval, deletion and modification of data in abstract terms without knowing about the

physical representation of data. Therefore DBMS must provide appropriate languages and

interfaces for each category of users. In this section we discuss the types of languages and

interfaces provided by a DBMS and the user categories targeted by each interface.

(i) DBMS Languages

One language is needed to describe the database to the DBMS, and to provide facilities for changing that description and for defining and changing the physical data structures. Another language is needed to manipulate the data in the database. These are called the Data Description/ Definition Language (DDL) and the Data Manipulation Language (DML) respectively. Each DBMS has a DDL as well as a DML. The two languages may be parts of a unified database language. The DBMS languages are of three forms as explained below:

1. Extended Host Languages

These are subroutines called from one or more programming languages. For example, a system may provide extensions to COBOL, FORTRAN, C, C++ etc. to enable the user to interact with the database. The programming language that is extended is usually called the host language.

2. Query Language

These are special-purpose languages that usually provide more powerful facilities to interact with the database. These languages are often designed to be simple so that non-programmers can use them easily. There are four types of database languages, which may also be called SQL components: Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL) and Transaction Control Language (TCL).

DDL is a computer language for defining the data structures used in a database. DDL statements are used to create, modify and remove database objects such as tables, indexes and users. CREATE, ALTER, DROP, TRUNCATE and RENAME are the various DDL commands. Each command is used for a specific purpose as given below:

- CREATE- Creates objects in the database.

- ALTER- Alters the structure of the database.

- DROP- Deletes objects from the database.

- TRUNCATE- Removes all records from a table, including all space allocated to the records.

- RENAME- Renames an object.

DDL has a pre-defined syntax for describing data.
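The DDL commands can be tried in sequence as sketched below, assuming Python's built-in sqlite3 module as the DBMS. Two points are specific to SQLite and may differ in other systems: renaming is written as ALTER TABLE ... RENAME TO, and there is no TRUNCATE statement, so an unqualified DELETE is used in its place. The helper functions `tables` and `columns` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def tables(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def columns(conn, table):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE: make a new object.
conn.execute("CREATE TABLE Client (CNo INTEGER PRIMARY KEY, Cname TEXT)")
# ALTER: change the structure of an existing object.
conn.execute("ALTER TABLE Client ADD COLUMN CPhoneno TEXT")
# RENAME: give the object a new name.
conn.execute("ALTER TABLE Client RENAME TO ClientMaster")
after_rename = tables(conn)
client_columns = columns(conn, "ClientMaster")

conn.execute("INSERT INTO ClientMaster (Cname) VALUES ('Asha')")
# SQLite has no TRUNCATE; DELETE without WHERE empties the table instead.
conn.execute("DELETE FROM ClientMaster")
conn.commit()
rows_left = conn.execute("SELECT COUNT(*) FROM ClientMaster").fetchone()[0]

# DROP: remove the object itself.
conn.execute("DROP TABLE ClientMaster")
after_drop = tables(conn)

print(after_rename, client_columns, rows_left, after_drop)
```

Emptying the table leaves its definition in place, whereas dropping it removes the definition as well.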

DML is a family of computer languages used by computer programs or database users to retrieve, insert, delete and update data in a database. Currently the most popular DML is that of SQL, which is used to retrieve and manipulate data in a relational database. A DML may be of two types: procedural (the user specifies what data is needed and how to get it) and non-procedural (the user specifies only what data is needed). DML performs operations such as SELECT, INSERT, UPDATE and DELETE. Each command is used for a specific purpose as given below:

- SELECT- Retrieves data from a database.

- INSERT- Inserts data into a table.

- UPDATE- Updates existing data within a table.

- DELETE- Deletes records from a table (all of them, or only those matching a condition); the space for the records remains allocated.
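The four DML commands are shown together in the sketch below, which assumes Python's built-in sqlite3 module as the DBMS. The rows are taken from the SupplierMaster instance of Figure 2.1, with the price stored as a plain integer and only some of the fields kept for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SupplierMaster "
    "(Sid TEXT PRIMARY KEY, Sname TEXT, Price INTEGER, Qty INTEGER)"
)

# INSERT: add rows.
conn.executemany(
    "INSERT INTO SupplierMaster VALUES (?, ?, ?, ?)",
    [
        ("S001", "Compware Computers", 29000, 5),
        ("S002", "Dell Computers", 31000, 5),
        ("S003", "Krishna Computers", 27000, 8),
    ],
)

# UPDATE: change the existing rows that match the condition.
conn.execute("UPDATE SupplierMaster SET Qty = Qty + 2 WHERE Sid = ?", ("S001",))

# DELETE: remove only the matching rows; without WHERE every row would go.
conn.execute("DELETE FROM SupplierMaster WHERE Price > ?", (30000,))

# SELECT: retrieve what is left.
remaining = conn.execute(
    "SELECT Sid, Qty FROM SupplierMaster ORDER BY Sid"
).fetchall()
print(remaining)
```

All four statements are non-procedural: each says which rows are wanted, and the DBMS decides how to find them.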

DCL is a computer language, and a subset of SQL, used to control access to the data in a database. That is, a user can access data only according to the privileges given to him or her. DCL statements thus provide a kind of security for the database. GRANT and REVOKE are the commands that come under the purview of DCL. The purpose of these commands is as follows:

- GRANT- Allows specified users to perform specified tasks.

- REVOKE- Cancels previously granted or denied permissions.
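The effect of GRANT and REVOKE can be pictured with a toy model. Embedded engines such as SQLite do not implement these two statements, so the sketch below is plain Python rather than SQL; the `PrivilegeTable` class, the user name `clerk` and the way privileges are stored are all hypothetical and do not describe how any particular DBMS works internally.

```python
class PrivilegeTable:
    """Toy record of which user may perform which action on which table."""

    def __init__(self):
        self._granted = set()  # (user, action, table) triples

    def grant(self, user, action, table):
        # Comparable to: GRANT <action> ON <table> TO <user>
        self._granted.add((user, action, table))

    def revoke(self, user, action, table):
        # Comparable to: REVOKE <action> ON <table> FROM <user>
        self._granted.discard((user, action, table))

    def is_allowed(self, user, action, table):
        return (user, action, table) in self._granted


privileges = PrivilegeTable()
privileges.grant("clerk", "SELECT", "SupplierMaster")
privileges.grant("clerk", "UPDATE", "SupplierMaster")
privileges.revoke("clerk", "UPDATE", "SupplierMaster")

can_read = privileges.is_allowed("clerk", "SELECT", "SupplierMaster")
can_write = privileges.is_allowed("clerk", "UPDATE", "SupplierMaster")
print(can_read, can_write)
```

After the revoke, the clerk can still read the table but can no longer update it, which is the kind of partial access described for users in Chapter 1.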

TCL statements are used to manage the changes made by DML statements. They allow statements to be grouped together into logical transactions. TCL statements are used both to make the changes of a transaction permanent in the database and to undo them. COMMIT, ROLLBACK, SAVEPOINT and SET TRANSACTION are TCL statements. These statements work as follows:

- COMMIT- Saves the work done.

- SAVEPOINT- Identifies a point in a transaction to which you can later roll back.

- ROLLBACK- Restores the database to its state as of the last COMMIT.

- SET TRANSACTION- Changes transaction options such as the isolation level and which rollback segment to use.
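COMMIT, SAVEPOINT and ROLLBACK can be observed in the sketch below. It assumes Python's built-in sqlite3 module as the DBMS, opened with `isolation_level=None` so that the transaction statements are exactly the ones written in the code. SQLite has no SET TRANSACTION statement, so that command is not shown. The `Account` table, the savepoint name `before_bonus` and the amounts are made up.

```python
import sqlite3

# isolation_level=None leaves transaction control entirely to the SQL we issue.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (Owner TEXT PRIMARY KEY, Balance INTEGER)")


def balances(conn):
    return dict(conn.execute("SELECT Owner, Balance FROM Account"))


conn.execute("BEGIN")
conn.execute("INSERT INTO Account VALUES ('Asha', 100)")
conn.execute("COMMIT")  # save the work done

conn.execute("BEGIN")
conn.execute("UPDATE Account SET Balance = 150 WHERE Owner = 'Asha'")
conn.execute("SAVEPOINT before_bonus")  # mark a point to return to
conn.execute("UPDATE Account SET Balance = 999 WHERE Owner = 'Asha'")
conn.execute("ROLLBACK TO before_bonus")  # undo only the work after the savepoint
after_partial = balances(conn)

conn.execute("ROLLBACK")  # undo everything since the last COMMIT
after_full = balances(conn)

print(after_partial, after_full)
```

Rolling back to the savepoint keeps the first update of the open transaction, while the plain ROLLBACK discards the whole transaction and returns to the committed balance.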

3. Data Sublanguage

In relational database theory, the term sublanguage, first used for this purpose by E. F. Codd in 1970, refers to a computer language used to define or manipulate the structure and contents of a relational database management system (RDBMS). Typical sublanguages associated with modern RDBMSs are QBE (Query by Example) and SQL (Structured Query Language). In 1985, Codd encapsulated his thinking in twelve rules which every database management system must satisfy in order to be truly relational. The fifth rule is known as the comprehensive data sublanguage rule.

(ii) DBMS Interfaces

A DBMS interface is the abstraction of a piece of functionality of a DBMS. It usually refers to

the communication boundary between the DBMS and clients or to the abstraction provided by a

component within a DBMS.

Figure 2.4: Working Principle of DBMS Interface

A DBMS interface hides the implementation of the functionality of the component it encapsulates. An application that works with real-life data uses SQL, a query language, to pose a query to the database system. There, the corresponding answer (result set) is prepared and given back to the application, again with the help of SQL. This communication can take place interactively or be embedded into another language. The working principle of a database interface is shown in Figure 2.4 above.

DBMS provides the User-friendly interfaces which may include the following:

1. Menu-Based Interfaces for Web Clients or Browsing: These interfaces present the

user with lists of options, called menus, which lead the user through the formulation of a request.

Menus do away with the need to memorize the specific commands and syntax of a query

language; rather, the query is composed step by step by picking options from a menu that is

displayed by the system.

2. Forms-Based Interfaces: A forms-based interface displays a form to each user. Users

can fill out all of the form entries to insert new data, or they fill out only certain entries, in which

case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed

and programmed for naive users as interfaces to canned transactions. A form based interface is

shown in Figure 2.5 for reader understanding.

3. Graphical User Interfaces: A graphical user interface (GUI) typically displays a schema

to the user in diagrammatic form. The user can then specify a query by manipulating the

diagram. In many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device,

such as a mouse, to pick certain parts of the displayed schema diagram.

4. Interfaces for Parametric Users: Parametric users, such as bank tellers, often have a

small set of operations that they must perform repeatedly. Systems analysts and programmers

design and implement a special interface for each known class of naïve users. Usually, a small

set of abbreviated commands is included, with the goal of minimizing the number of keystrokes

required for each request.

Figure 2.5: Form Based Interface

5. Text Based Interface: For database administration, and for other professional users, it is possible to communicate with the DBMS directly in the query language (in code form), for example through an SQL input/output window as shown in Figure 2.6. Text-based interfaces are very powerful tools and allow comprehensive interaction with a DBMS. However, their use requires an active knowledge of the respective database language.

6. Interfaces for the DBA: Most database systems contain privileged commands that can

be used only by the DBA's staff. These include commands for creating accounts, setting system

parameters, granting account authorization, changing a schema, and reorganizing the storage

structures of a database.

Figure 2.6: Text Based Interface

7. Natural Language Interfaces: These interfaces accept requests written in English or

some other language and attempt to "understand" them. A natural language interface usually has

its own "schema," which is similar to the database conceptual schema, as well as a dictionary of

important words. The natural language interface refers to the words in its schema, as well as to

the set of standard words in its dictionary, to interpret the request. If the interpretation is

successful, the interface generates a high-level query corresponding to the natural language

request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user

to clarify the request.

2.3.6 DBMS Functions

A Database Management System (DBMS) serves many functions that are key to the operation of database management. When deciding to implement a DBMS in a business environment, the first task is to decide what type of DBMS one actually requires. A DBMS performs several important functions that guarantee integrity and consistency of data in the database. Most of these functions are transparent to end-users. The important functions and services provided by a DBMS are as follows:

1) Ability to Update and Retrieve Data: This is a fundamental component of a DBMS and

essential to database management. Without the ability to view or manipulate data, there would be

no point to using a database system. Updating data in a database includes adding new records,

deleting existing records and changing information within a record. The user does not need to know how the DBMS structures this data; the user only needs to know that information can be updated and retrieved. The DBMS handles the processes and the structure of the data on disk.
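The update and retrieval operations listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names and sample rows are assumptions made for the example and are not part of the text.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# Adding new records.
conn.executemany(
    "INSERT INTO student (student_id, name, age) VALUES (?, ?, ?)",
    [(101, "Aryan", 21), (102, "Meera", 22)],
)
# Changing information within a record.
conn.execute("UPDATE student SET age = ? WHERE student_id = ?", (23, 102))
# Deleting an existing record.
conn.execute("DELETE FROM student WHERE student_id = ?", (101,))
conn.commit()

# Retrieving data: the caller never sees how the rows are laid out on disk.
rows = conn.execute(
    "SELECT student_id, name, age FROM student ORDER BY student_id"
).fetchall()
print(rows)
```

The program states only what should change or be fetched; how the rows are physically stored is left entirely to the database engine.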

2) Support Concurrent Updates: Concurrent updates occur when multiple users make updates

to the database simultaneously. Supporting concurrent updates is also crucial to database

management as this component ensures that updates are made correctly and the end result is

accurate. Without DBMS intervention, important data could be lost and/or inaccurate data stored.

A DBMS supports concurrent updates with features such as batch processing, locking, two-phase locking, and time stamping, which make certain that updates are applied accurately. It is the responsibility of the database management system to make sure that all updates are stored properly, since each user is unaware of the updates made by others.
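The idea behind locking can be sketched in a few lines of Python. This is a simplified analogy rather than how any particular DBMS implements it: the Account class and the helper function are hypothetical, and a thread lock stands in for the lock a DBMS would place on a data item.

```python
import threading

class Account:
    """Hypothetical shared data item protected by a lock."""

    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()

    def deposit(self, amount):
        # The read-modify-write sequence runs while holding the lock, so two
        # concurrent deposits can never overwrite each other's result.
        with self.lock:
            current = self.balance
            self.balance = current + amount

def run_concurrent_deposits(account, workers, deposits_per_worker):
    """Start several threads that all update the same account."""
    def work():
        for _ in range(deposits_per_worker):
            account.deposit(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

account = Account()
run_concurrent_deposits(account, workers=4, deposits_per_worker=10_000)
print(account.balance)
```

Without the lock, two threads could read the same old balance and one deposit would be lost, which is the kind of inaccurate result the DBMS is expected to prevent.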

3) Recovery of Data: In the event a catastrophe occurs, DBMS must provide ways to recover a

database so that data is not permanently lost. There are times when computers may crash, a fire

or other natural disaster may occur, or a user may enter incorrect information invalidating or

making records inconsistent. If the database is destroyed or damaged in any way, the DBMS

must be able to recover the correct state of the database, and this process is called Recovery. The

easiest way to do this is to make regular backups of information. Backups can be taken at a set, scheduled time so that, in the event of a disaster, the database can be restored to the state it was last in prior to the crash.
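A backup followed by a restore can be sketched with the backup method of Python's sqlite3 connections (available from Python 3.7). The table, its contents and the simulated failure are assumptions made for the example.

```python
import sqlite3

# Live database with some hypothetical data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO account VALUES (1, 500)")
live.commit()

# Regular backup: copy the whole database into a separate backup database.
backup_db = sqlite3.connect(":memory:")
live.backup(backup_db)

# Simulated damage that happens after the backup was taken.
live.execute("DELETE FROM account")
live.commit()
damaged = live.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# Recovery: rebuild a database from the last backup.
recovered = sqlite3.connect(":memory:")
backup_db.backup(recovered)
restored = recovered.execute("SELECT id, balance FROM account").fetchall()
print(damaged, restored)
```

Anything written after the last backup is lost in this simple scheme, which is why production systems usually combine backups with a log of later changes.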

4) Data Storage Management: It provides a mechanism for management of permanent storage

of the data. The internal schema defines how the data should be stored by the storage

management mechanism and the storage manager interfaces with the operating system to access

the physical storage.

5) Self-Describing Nature of a Database System: A database system contains not only the database itself but also a complete definition or description of the database. This definition is stored in the system catalog. The information stored in the catalog is called meta-data.
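As one concrete illustration, SQLite keeps its catalog in a table named sqlite_master and exposes column descriptions through PRAGMA table_info. The sketch below assumes a hypothetical department table; other DBMSs organize their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)"
)

# The catalog stores the definition of each object (meta-data), not user data.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(department)")
]
print(catalog)
print(columns)
```

Note that the department table holds no rows at all, yet the database can already describe its structure.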

6) Program-Data Independence: DBMS access programs are written independently of any specific files. The structure of data files is stored in the DBMS catalog, separately from the access programs.

7) Program-Operation Independence: Users define operations as part of the database definitions. User application programs can operate on data by invoking these operations through their names and arguments. Users do not need to know how an operation is implemented.

8) Support of Multiple Views: Each user of the database may require a different perspective of the database. A view may be a subset of the database or it may contain virtual data that is derived from the database files.
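Both kinds of view can be sketched with sqlite3. The employee table, the two view names and the fixed reference year 2024 are assumptions chosen to keep the example small and repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 1990, 52000), (2, "Ravi", 1985, 61000)],
)

# View 1: a subset of the database (the salary column is hidden).
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")

# View 2: virtual data derived from stored data (age is computed, not stored).
conn.execute(
    "CREATE VIEW employee_age AS SELECT name, 2024 - birth_year AS age FROM employee"
)

directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()
ages = conn.execute("SELECT * FROM employee_age ORDER BY name").fetchall()
print(directory)
print(ages)
```

Each user group queries its own view as if it were a table, while the underlying employee table is stored only once.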

9) Authorization/Security Management: The DBMS protects the database against unauthorized access, either intentional or accidental. It furnishes mechanisms to ensure that only authorized users can access the database.

10) Database Access and Application Programming Interfaces: All DBMSs provide interfaces to enable applications to use DBMS services. They provide data access via Structured Query Language (SQL). The DBMS query language contains two components: (a) a Data Definition Language (DDL) and (b) a Data Manipulation Language (DML).

11) Concurrency Control Service: Since DBMSs support sharing of data among multiple users, they must provide a mechanism for managing concurrent access to the database. DBMSs ensure that the database is kept in a consistent state and that the integrity of the data is preserved.

12) Transaction Management: A transaction is a series of database operations, carried out by a

single user or application program, which accesses or changes the contents of the database.

Therefore, a DBMS must provide a mechanism to ensure either that all the updates

corresponding to a given transaction are made or that none of them is made.
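The all-or-nothing behaviour described above can be sketched with sqlite3, whose connection object commits a block of statements on success and rolls it back if an exception is raised. The account table, the transfer function and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are kept
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK: both are undone
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 had already been executed when the debit was rejected, yet the rollback removes it, so the database never shows a half-completed transaction.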

13) Backup and Recovery Management: The DBMS provides mechanisms for backing up data

periodically and recovering from different types of failures. This prevents the loss of data.

2.3.7 Component Modules of DBMS

The DBMS software is partitioned into several modules. Each module or component is assigned a specific operation to perform. Some functions of the DBMS are supported by the operating system (OS), which provides basic services on top of which the DBMS is built. As Figure 2.7 shows, a database system is a complex software system partitioned into several software components that handle tasks such as data definition and manipulation, security and data integrity, data recovery and concurrency control, and performance optimization, as explained below:

1) Data Definition: The DBMS provides functions to define the structure of the data. These

functions include defining and modifying the record structure, the data type of fields, and the

various constraints to be satisfied by the data in each field. It is the responsibility of DBA to

define the database, and make changes to its definition (if required) using the DDL and other

privileged commands. The DDL compiler component of DBMS processes these schema

definitions, and stores the schema descriptions in the DBMS catalog (data dictionary). Other

DBMS components then refer to the catalog information as and when required.

2) Data Manipulation: Once the data structure is defined, data needs to be manipulated. The

manipulation of data includes insertion, deletion, and modification of records. The functions that

perform these operations are also part of the DBMS. These functions can handle planned as well

as unplanned data manipulation needs.

I. The queries that are defined as a part of the application programs are known as planned queries. The application programs are submitted to a pre-compiler, which extracts the DML commands from the application program and sends them to the DML compiler for compilation. The rest of the program is sent to the host language compiler. The object codes of both the DML commands and the rest of the program are linked and sent to the query evaluation engine for execution.

II. Ad hoc queries that are executed as and when the need arises are known as unplanned queries (interactive queries). These queries are compiled by the query compiler and then optimized by the query optimizer. The query optimizer consults the data dictionary for statistical and other physical information about the stored data. The optimized query is finally passed to the query evaluation engine for execution. Naive users of the database can also query and update the database by using application program interfaces that have already been provided. The object code of these queries is also passed to the query evaluation engine for processing.

3) Data Security and Integrity: The DBMS contains functions, which handle the security and

integrity of data stored in the database. Since these functions can be easily invoked by the

application, the application programmer need not code these functions in any PL/SQL program.

Figure 2.7: Component Modules of DBMS

4) Concurrency and Data Recovery: The DBMS also contains some functions that deal with

the concurrent access of records by multiple users and the recovery of data after a system failure.

5) Performance Optimization: The DBMS has a set of functions that optimize the performance

of the queries by evaluating the different execution plans of a query and choosing the best among

them.
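Many systems let the user inspect the plan the optimizer has chosen. As a small sketch, SQLite offers the EXPLAIN QUERY PLAN statement; the table, the index name idx_student_name and the query are assumptions for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

def plan(sql):
    """Return the plan steps chosen by the optimizer, joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description

query = "SELECT age FROM student WHERE name = 'Aryan'"

plan_before = plan(query)  # no index on name: scanning the table is the only option
conn.execute("CREATE INDEX idx_student_name ON student (name)")
plan_after = plan(query)   # the optimizer can now choose an index search

print(plan_before)
print(plan_after)
```

The query text is identical in both cases; only the execution plan changes once a cheaper alternative becomes available.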

6) Run Time Database Manager: The run time database manager is the central software component of the DBMS, which interfaces with user-submitted application programs and queries. It handles database access at run time. It converts operations in users' queries, coming directly via the query processor or indirectly via an application program, from the user's logical view to a physical file system. It accepts queries and examines the external and conceptual schemas to determine what conceptual records are required to satisfy the user's request. It enforces constraints to maintain the consistency and integrity of the data, as well as its security. It also performs backup and recovery operations. The run time database manager is sometimes referred to as the database control system and has the following components:

• Authorization control: The authorization control module checks whether users hold the privileges required for the operations they request.

• Command processor: The command processor processes the queries passed by the authorization control module.

• Integrity checker: It checks the integrity constraints so that only valid data can be entered into the database.

• Query optimizer: The query optimizer determines an optimal strategy for the query execution.

• Transaction manager: The transaction manager ensures that the transaction properties are maintained by the system.

• Scheduler: It provides an environment in which multiple users can work on the same piece of data at the same time; in other words, it supports concurrency.
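The role of the integrity checker in the list above can be sketched with declarative constraints in sqlite3. The table, the age range of 16 to 100 and the sample rows are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "age INTEGER CHECK (age BETWEEN 16 AND 100))"
)

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, "Aryan", 21)),  # valid row
    try_insert((2, None, 20)),     # violates NOT NULL on name
    try_insert((3, "Meera", 5)),   # violates the CHECK on age
    try_insert((1, "Kiran", 30)),  # violates the primary key (duplicate id)
]
stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(results, stored)
```

The rules are declared once with the table definition, so every program that inserts data is checked in the same way without repeating the validation code.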

7) Query processor: The query processor transforms user queries into a series of low-level instructions. It interprets the online user's query and converts it into an efficient series of operations in a form capable of being sent to the run time database manager for execution. The query processor uses the data dictionary to find the structure of the relevant portion of the database and uses this information in modifying the query and preparing an optimal plan to access the database.

8) Data Manager: The data manager is responsible for the actual handling of data in the database. It provides recovery, meaning that the system should be able to recover the data after a failure. It includes the recovery manager and the buffer manager. The buffer manager is responsible for the transfer of data between main memory and secondary storage (such as disk or tape). It is also referred to as the cache manager.

9) Database Engine: The Database Engine is the core service for storing, processing, and

securing data. The Database Engine provides controlled access and rapid transaction processing

to meet the requirements of the most demanding data consuming applications within your

enterprise. Use the Database Engine to create relational databases for online transaction

processing or online analytical processing data. This includes creating tables for storing data, and

database objects such as indexes, views, and stored procedures for viewing, managing, and

securing data.

10) Data dictionary: A data dictionary is a reserved space within a database which is used to store information about the database itself. A data dictionary is a set of tables and views which users can only read and never alter directly. Most data dictionaries contain different information about the data used in the enterprise. In terms of the database representation of the data, the data dictionary defines all schema objects including views, tables, clusters, indexes, sequences, synonyms, procedures, packages, functions, triggers and many more. This ensures that all these objects follow one standard defined in the dictionary. The data dictionary also records how much space has been allocated for, and is currently in use by, all the schema objects. A data dictionary is used when finding information about users, objects, schemas and storage structures. Every time a data definition language (DDL) statement is issued, the DBMS updates the data dictionary. A data dictionary may contain information such as:

• Database design information

• Stored SQL procedures

• User permissions

• User statistics

• Database process information

• Database growth statistics

• Database performance statistics

11) Query Processor: A relational database consists of many parts, but at its heart are two major

components: the storage engine and the query processor. The storage engine writes data to and

reads data from the disk. It manages records, controls concurrency, and maintains log files. The

query processor accepts SQL syntax, selects a plan for executing the syntax, and then executes

the chosen plan. The user or program interacts with the query processor, and the query processor

in turn interacts with the storage engine. The query processor isolates the user from the details of

execution: The user specifies the result, and the query processor determines how this result is

obtained. The query processor components include

• DDL interpreter

• DML compiler

• Query evaluation engine

12) Report writer: Also called a report generator, a report writer is a program, usually part of a database management system, that extracts information from one or more files and presents the information in a specified format. Most report writers allow you to select records that meet certain conditions and to display selected fields in rows and columns. You can also format data into pie charts, bar charts, and other diagrams. Once you have created a format for a report, you can save the format specifications in a file and continue reusing it for new data.

In this way, the DBMS provides an environment that is both convenient and efficient to use

when there is a large volume of data and many transactions need to be processed concurrently.

2.4 Summary

In this chapter, a DBMS architecture was presented that cleanly separates three levels, with mappings between the schemas to transform requests and results from one level to the next. Most DBMSs do not separate the three levels completely. We used the three-schema architecture to define the concepts of logical and physical data independence.

Next, the different types of user-friendly interfaces provided by a DBMS, and the users with whom each interface is associated, were described. The main types of languages that a DBMS supports were also explained, including the high-level language that can be used as a standalone language, often called a query language.

Finally, the main functions of a DBMS and its different component modules were explained. When deciding to implement a DBMS in a business environment, the first and most important task is to decide what type of DBMS the business actually requires. It is also important to know how many and which modules are actually required to meet the business need. Therefore, the relevant modules or components must be selected to perform the requisite operations.

2.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database Systems, 3rd Edition, Addison Wesley, New Delhi.

2. Korth & Silberschatz: Database System Concepts, 4th Edition, McGraw Hill International Edition.

3. Raghu Ramakrishnan & Johannes Gehrke: Database Management Systems, 2nd Edition, McGraw Hill International Edition.

4. C. J. Date: An Introduction to Database Systems, 7th Edition, Addison Wesley, New Delhi.

5. Bipin C. Desai: An Introduction to Database System, Galgotia Publication, New Delhi.

2.6 Self Assessment Questions (SAQ)

1. What is the difference between schema and subschema?

2. Outline three-level schema architecture of DBMS; distinguish each of the level clearly.

3. What do you mean by mapping? Discuss the different type of mapping in three-tier

architecture of DBMS.

4. Distinguish between conceptual schema and external schema in DBMS architecture.

5. What do you mean by data independence? Discuss the different type of data

independence.

6. What are database languages? Explain in detail.

7. Define DBMS. Explain the different functions of database management system.

8. Discuss the different component modules of database management system.

Chapter – 3: Entity-Relationship (ER) Modeling

Writer: Dr. Kanwal Garg


Vetter: Prof. Rajender Nath
Structure:
3.1 Introduction
3.2 Objective
3.3 Presentation of Content
3.3.1 Entity-Relationship (ER) Model – Concept
(i) Entity
(ii) Attribute
(iii) Relationship Types
(iv) Degree of Relationship
(v) Cardinality of a Relationship
(vi) Representing Relationship Types
(vii) Role Names and Recursive Relationships
3.3.2 Relationship Constraints
(i) Cardinality for Binary Relationship
(ii) Participation Constraints and Existence Dependencies
(iii) Attributes of Relationship Types
3.3.3 Keys
3.3.4 ER Model/ Diagram
3.3.5 Mapping Logical ER-Diagram Models to Relational Tables
3.4 Summary
3.5 Suggested Reading/ Reference Material
3.6 Self Assessment Questions (SAQ)

3.1 Introduction

It is the responsibility of the database administrator (DBA) to perform the logical database design, assigning the related data items of the database to columns of tables in a manner that preserves desired properties. The most important test of a logical design is that tables and attributes faithfully reflect relationships among objects in the real world, and that this remains true after all likely future database updates. Databases built on different data models have different structures for representing data; in a relational database, the fundamental structure for representing data is what we have been calling relational tables.

The DBA starts by studying some real-world enterprise whose operations need to be supported on a computerized database system. After a detailed, expert examination of the existing system, the DBA comes up with a list of data items and underlying data objects that must be kept track of, together with a number of rules or constraints concerning the interrelationship of these data items. For all these purposes the DBA uses a data model to represent data items and their relationships, called the Entity-Relationship (E-R) model.

3.2 Objective

This chapter shows how to view real-world objects as entities and relationships among them by using a basic tool of relational database design, the Entity-Relationship diagram. The objective of an entity-relationship diagram is to show the business rules that apply to an organization's data. It contains entities, which are things of interest to a company, and relationships, which are associations between entities. It also documents volumetric data so that we know what the initial data storage requirements will be, together with the anticipated growth. To support the design of E-R diagrams, basic data structuring concepts, constraints, relationships, keys, cardinality ratios, etc. are elaborated so that these concepts can be used in designing the conceptual schema for database applications. This design is then used by a database developer to implement the database on specific database management software. The model can also be used to communicate with end-users.

3.3 Presentation of Content

3.3.1 Entity-Relationship (ER) Model – Concept

When a relational database is to be designed, an Entity-Relationship (ER) diagram is drawn at an

early stage and developed as the requirements of the database and its processing become better

understood. Drawing an entity-relationship diagram aids understanding of an organization's data

needs and can serve as a schema diagram for the required system's database. A schema diagram

is any diagram that attempts to show the structure of the data in a database. Nearly all systems

analysis and design methodologies contain entity-relationship diagramming as an important part

of the methodology and nearly all CASE (Computer Aided Software Engineering) tools contain

the facility for drawing entity-relationship diagrams. An entity-relationship diagram could serve

as the basis for the design of the files in a conventional file-based system as well as for a schema

diagram in a database system.

In 1976, Chen developed the Entity-Relationship (ER) model, defined as "a high-level data

model that is useful in developing a conceptual design for a database". An Entity Relationship

(ER) diagram is an excellent communications tool, which can be used to confirm business

requirements and provide direction to the architecture and design team as they move forward

with physical database design. An ER Model generally provides the following:

• Confirms business rules;

• Is used as the "target" for data movement mapping and helps ensure no data is overlooked;

• Provides direction to the architecture and design team to start physical database design; and

• Helps make important decisions about facts and dimensions required for business intelligence purposes.

Creation of an ER diagram, which is one of the first steps in designing a database, helps the

designer(s) to understand and to specify the desired components of the database and the

relationships among those components. An ER model is a diagram containing entities or "items",

relationships among them, attributes of the entities and the relationships. These three categories

are considered to be sufficient to model the essentially static data-base parts of any organization's

information processing needs.

(i) Entity: An entity is an object that exists and which is distinguishable from other objects.

An entity can be a person, a place, an object, an event, or a concept about which an organization

wishes to maintain data. The following are some examples of entities:

Example 1: Student, Employee, Department are the examples of entities.

It is important to understand the distinction between an entity type, an entity instance, and an

entity set. An entity type defines a collection of entities that have same attributes. An entity

instance is a single item in this collection. An entity set is a set of entity instances.

Example 2: Let STUDENT be an entity type; a student with ID number 13-PGDCA-1100 is an entity instance; and the collection of all students is an entity set.
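The three terms can be mirrored in a short Python sketch. This is only an analogy: the Student class, the second ID number and the names are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    """Entity type: defines the attributes that every student has."""
    student_id: str
    name: str

# Entity instances: individual students.
s1 = Student("13-PGDCA-1100", "Aryan")
s2 = Student("13-PGDCA-1101", "Meera")  # hypothetical second student

# Entity set: the collection of all student instances.
students = {s1, s2}
print(len(students), s1 in students)
```

The class corresponds to the entity type, each object to an entity instance, and the set of objects to the entity set.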

In the E-R diagram, we assign a name to each entity type. When assigning names to entity types,

we follow certain naming conventions. An entity name should be a concise singular noun that

captures the unique characteristics of the entity type. An E-R diagram depicts an entity type

using a rectangle with the name of the entity inside as shown in Figure 3.1.

Figure 3.1: The Entity Representation in an E-R diagram (three rectangles labeled STUDENT, EMPLOYEE and DEPARTMENT).

An entity type may be of two kinds: strong or weak. Entity types that have a key attribute (primary key) are called strong entity types; a strong entity type is also called a regular entity type. The entity type STUDENT is a strong entity type, since it has StudentID as a key attribute, as shown in Example 3. Entity types that do not have any key attribute of their own are called weak entity types. In Example 3, CLASS is a weak entity type, since it does not have any key attribute. A weak entity type is also called a child entity type or subordinate entity type. In an E-R diagram a strong entity type is shown in a rectangular box, as in Figure 3.2 (a), and a weak entity type in a double rectangular box, as in Figure 3.2 (b).

Figure 3.2: a) Strong Entity Type (STUDENT) b) Weak Entity Type (Class)

(ii) Attribute: We represent an entity with a set of attributes. An attribute is a property or

characteristic of an entity type that is of interest to an organization. Some attributes of entity

types include the following:

Example 3: STUDENT = {Student Id, Name, Address, PhoneNo, Age, Dateofbirth, Language}

EMPLOYEE = {Employee Id, Employee Name, Employee Age, Employee Salary}

DEPARTMENT = {Department Id, Department Name}

CLASS = {Subject, Department, Section}

A particular value of an attribute, such as 101 for StudentID or Aryan for Name in the STUDENT entity of Example 3, is called an attribute value. Most of the data in a database consists of values of attributes. The set of all possible values of an attribute is the attribute domain. Sometimes the value of an attribute is unknown or missing, and sometimes a value is not applicable. In such cases, the attribute can take the special value null.

Following conventions are used while naming attributes:

1. Each word in a name starts with an uppercase letter followed by lower case letters.

2. If an attribute name contains two or more words, the first letter of each subsequent word is

also in uppercase, unless it is an article or preposition, such as "a," "the," "of," or "about".

E-R diagrams depict an attribute inside an ellipse/oval and connect the ellipse/oval with a line to

the associated entity type. Figure 3.3 illustrates an E-R diagram of Student entity with some of

the possible attributes.

Note that the attributes shown in Figure 3.3 are of several different types, which use different notations. These include simple, composite, single-valued, multi-valued, stored, derived and key attributes. In the upcoming subsections, we discuss the distinctions between these types of attributes.

a) Simple and Composite Attributes: A simple or atomic attribute, such as PhoneNo, cannot be further divided into smaller components. A composite attribute, however, can be divided into smaller subparts, each of which represents an independent attribute. Name in this case is a composite attribute, since it can be further divided into First Name, Middle Name and Last Name. Similarly, Address can also be a composite attribute. All other attributes, even those that are subparts of Name and Address, are simple attributes. Figure 3.3 presents the notation that depicts a composite attribute. Simple and composite attributes are denoted by an oval/ellipse in an E-R diagram.

b) Single-Valued and Multi-Valued Attributes: Most attributes have a single value for an

entity instance; such attributes are called single-valued attributes. A multi-valued attribute, on the

other hand, may have more than one value for an entity instance.

Figure 3.3: Different Types of Attribute in E-R Diagram (Student entity with attributes StudentID, PhoneNo, Name [First Name, Middle Name, Last Name], Languages, Dateofbirth and Age)

Example 4: In Figure 3.3, Languages is an attribute of the STUDENT entity type. It stores the names of the languages that a student speaks; since a student may speak several languages, it is a multi-valued attribute. An attribute such as StudentID of the STUDENT entity type is single-valued, because a student has only one StudentID. In the E-R diagram, we denote a multi-valued attribute with a double-lined ellipse/oval, regardless of the number of values an instance actually has.

Note: Student entity is a strong entity; since it has StudentId as key attribute.

c) Stored and Derived Attributes: The value of a derived attribute can be determined by

analyzing other attributes. In Figure 3.3 Age is a derived attribute and DateofBirth is a stored

attribute of STUDENT entity type. The value of Age attribute can be derived from the current

date and the attribute DateofBirth. An attribute whose value cannot be derived from the values of

other attributes is called a stored attribute. A derived attribute Age is not stored in the database.

Derived attributes are depicted in the E-R diagram with a dashed (dotted) ellipse/ oval.
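The difference between a stored and a derived attribute can be sketched in Python. The class layout and the sample dates are assumptions; the reference date is passed in explicitly so that the result does not depend on when the code is run.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Student:
    student_id: int
    date_of_birth: date  # stored attribute: kept in the database

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth and the given date."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

student = Student(101, date(2000, 6, 15))
age_before_birthday = student.age(date(2024, 6, 14))
age_on_birthday = student.age(date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)
```

Because Age is recomputed whenever it is needed, it can never become stale, which is the usual reason for not storing a derived attribute.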

d) Key Attribute: A key attribute (or identifier) is a single attribute or a combination of

attributes that uniquely identify an individual instance of an entity type. No two instances within

an entity set can have the same key attribute value. For the STUDENT entity shown in Figure

3.3, StudentID is the key attribute since each student identification number is unique. Name, by

contrast, cannot be an identifier because two students can have the same name. We underline key

attributes in an E-R diagram.
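The uniqueness requirement can be checked mechanically on sample data. The function below is a hypothetical helper, and the three student records are invented for illustration. Passing the check on a sample only shows that an attribute could be a key; whether it really is one is a design decision about all possible instances.

```python
def is_unique_identifier(entity_set, attributes):
    """Return True if no two instances share the same values for `attributes`."""
    seen = set()
    for entity in entity_set:
        value = tuple(entity[a] for a in attributes)
        if value in seen:
            return False
        seen.add(value)
    return True

# Hypothetical STUDENT instances; two students happen to share a name.
students = [
    {"StudentID": 101, "Name": "Aryan"},
    {"StudentID": 102, "Name": "Meera"},
    {"StudentID": 103, "Name": "Aryan"},
]

id_is_key = is_unique_identifier(students, ["StudentID"])
name_is_key = is_unique_identifier(students, ["Name"])
print(id_is_key, name_is_key)
```

The attribute list may contain several names, which covers the case where a key is a combination of attributes rather than a single one.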

(iii) Relationship Types: The first two major elements of entity-relationship diagrams are

entity types and attributes. The final element is the relationship type. Sometimes, the word 'types'

is dropped and relationship types are called simply 'relationships' but since there is a difference

between the terms, one should really use the term relationship type.

Real-world entities have relationships between them, and relationships between entities on the

entity-relationship diagram are shown where appropriate. An entity-relationship diagram consists

of a network of entity types and connecting relationship types. A relationship type is a named

association between entities. Individual entities have individual relationships of the type between

them. An individual person (entity) occupies (relationship) an individual house (entity). In an

entity-relationship diagram, this is generalized into entity types and relationship types. The entity

type PERSON is related to the entity type HOUSE by the relationship type OCCUPIES. There

are lots of individual persons, lots of individual houses, and lots of individual relationships

linking them.

There can be more than one type of relationship between entities. Entities in an organization do

not exist in isolation but are related to each other. Students take courses and each STUDENT

entity is related to the COURSE entity. Faculty members teach courses and each FACULTY

entity is also related to the COURSE entity. Consequently, the STUDENT entity is related to the

FACULTY entity through the COURSE entity. E-R diagrams can also illustrate relationships

between entities. Therefore, we define a relationship as an association among several entities. A

relationship set is a grouping of all matching relationship instances, and the term relationship

type refers to the relationship between entity types.

Figure 3.4: The relationship between FACULTY and COURSE entities in an E-R diagram (Faculty, Teaches, Course).

In an E-R diagram, relationship types are represented with diamond-shaped boxes connected by straight lines to the rectangles that represent the participating entity types. A relationship type is given a name that is displayed in this diamond-shaped box and typically takes the form of a present tense verb or verb phrase that describes the relationship. Figure 3.4 depicts such a relationship between the entities FACULTY and COURSE.

(iv) Degree of Relationship: The number of entity sets that participate in a relationship is

called the degree of relationship.

Example 5: The degree of the relationship featured in Figure 3.4 is two because FACULTY and

COURSE are two separate entity types that participate in the relationship. The three most

common degrees of a relationship in a database are unary (degree 1), binary (degree 2), and

ternary (degree 3).

Let E1, E2, ..., En denote n entity sets and let R be the relationship among them. The degree of R is then n, and the common cases can be described as follows:

a) Unary Relationship: A unary relationship R is an association between two instances of

the same entity type.

Example 6: Suppose two students are roommates and stay together in a hostel. Because they share

the same address, a unary relationship exists between them on the attribute Address in Figure 3.3.

b) Binary Relationship: A binary relationship R is an association between two instances of

two different entity types.

Example 7: In a University, a binary relationship exists between a student (STUDENT entity)

and an instructor (FACULTY entity) of a single class; an instructor teaches a student.

c) Ternary Relationship: A ternary relationship R is an association between three instances

of three different entity types.

Example 8: Consider a student using certain equipment for a project. In this case, the

STUDENT, PROJECT, and EQUIPMENT entity types relate to each other with ternary

relationships: a student checks out equipment for a project.

(v) Cardinality of a Relationship: The term cardinal number refers to the number used in

counting. When we say cardinality of a relationship, we mean the ability to count the number of

entities involved in that relationship.

Example 9: If the entity types A and B are connected by a relationship, then the maximum

cardinality represents the maximum number of instances of entity B that can be associated with

any instance of entity A.

However, we don't need to assign a number value for every level of connection in a relationship.

In fact, the term maximum cardinality refers to only two possible values: one or many. While

this may seem to be too simple, the division between one and many allows us to categorize all of

the permutations possible in any relationship. The maximum cardinality value of a relationship,

then, allows us to define the four types of relationships possible between entity types A and B.

a) One-to-One Relationship: In a one-to-one relationship, at most one instance of entity B

can be associated with a given instance of entity A and vice versa.

b) One-to-Many Relationship: In a one-to-many relationship, many instances of entity B

can be associated with a given instance of entity A. However, only one instance of entity A can

be associated with a given instance of entity B.

Example 10: While a customer of a company can make many orders, an order can only be

related to a single customer.

c) Many-to-One Relationship: In a many-to-one relationship, many instances of entity A

can be associated with a given instance of entity B. However, only one instance of entity B can

be associated with a given instance of entity A.

Example 11: Many students of a class are taught by a faculty.

d) Many-to-Many Relationship: In a many-to-many relationship, many instances of entity

A can be associated with a given instance of entity B, and, likewise, many instances of entity B

can be associated with a given instance of entity A.

Example 12: A machine may have different parts, while each individual part may be used in

different machines.
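
The four maximum-cardinality categories can be checked mechanically against a set of relationship instances. The following is a minimal sketch in Python, assuming each relationship instance is given as an (a, b) pair; the function name max_cardinality and the sample identifiers are illustrative and do not come from the text. Note that it classifies the instances it is given, whereas a schema states what is permitted for all possible instances.

```python
from collections import defaultdict


def max_cardinality(pairs):
    """Classify a binary relationship from its (a, b) instance pairs."""
    b_per_a = defaultdict(set)  # instances of B linked to each instance of A
    a_per_b = defaultdict(set)  # instances of A linked to each instance of B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    a_has_many_b = any(len(linked) > 1 for linked in b_per_a.values())
    b_has_many_a = any(len(linked) > 1 for linked in a_per_b.values())

    if a_has_many_b and b_has_many_a:
        return "many-to-many"
    if a_has_many_b:
        return "one-to-many"
    if b_has_many_a:
        return "many-to-one"
    return "one-to-one"


# Example 10: a customer makes many orders, each order has one customer.
places = [("C1", "O1"), ("C1", "O2"), ("C2", "O3")]
# Example 11: many students are taught by one faculty member.
taught_by = [("S1", "F1"), ("S2", "F1")]
# Example 12: a machine has many parts, a part is used in many machines.
uses = [("M1", "P1"), ("M1", "P2"), ("M2", "P1")]

print(max_cardinality(places))     # one-to-many
print(max_cardinality(taught_by))  # many-to-one
print(max_cardinality(uses))       # many-to-many
```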

(vi) Representing Relationship Types: Figure 3.5 displays how we represent different

relationship types in an E-R diagram. An entity on the one side of the relationship is represented

by a vertical line, "|", which intersects the line connecting the entity and the relationship. Entities

on the many side of a relationship are designated by a crowfoot as depicted in Figure 3.5.

(vii) Role Names and Recursive Relationships: Each entity type in a relationship plays a

particular role. The role name specifies the role that a participating entity type plays in the

relationship and explains what the relationship means. For example, in the relationship between

Employee and Department, the Employee entity type plays the employee role, and the

Department entity type plays the department or employer role. In most cases the role names do

not have to be specified, but they are necessary in cases where the same entity type participates

more than once in a relationship type in different roles.

Example 13: Suppose there are two entity types, MANAGER and ORGANIZATION, and the

relationship name is manages. MANAGER plays the role of worker (employee) and

ORGANIZATION plays the role of owner (employer). Further, the employee manages the

assignments for an employer.

In a recursive relationship, the same entity type participates more than once in a relationship

type, in different roles.

Example 14: In the Company schema, each employee has a supervisor, so we need to include the

relationship "Supervises". However, a supervisor is also an employee. The Employee entity type

therefore participates twice in the relationship, once as an employee (the supervisee) and once

as a supervisor, and we can specify these two roles as shown in Figure 3.6.

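[Figure 3.5 has four panels, each joining entity A to entity B through relationship R:

One-to-One Relationship (A: 1, B: 1); One-to-Many Relationship (A: 1, B: M);

Many-to-One Relationship (A: M, B: 1); Many-to-Many Relationship (A: M, B: M).]

Figure 3.5: The relationship types based on maximum cardinality.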

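[Figure 3.6 shows the Employee entity type connected twice to the Supervises relationship,

once in the Supervisor role and once in the Supervisee role.]

Figure 3.6: Recursive relationship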
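
A recursive relationship such as Supervises is commonly stored as a column that refers back to the same table. The sketch below uses Python's built-in sqlite3 module; the table name employee, the column supervisor_id and the sample names are assumptions made for illustration and are not part of the Company schema described in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    """
    CREATE TABLE employee (
        emp_id        INT  NOT NULL PRIMARY KEY,
        name          TEXT NOT NULL,
        -- The same entity type in a second role: the supervisor of this row.
        supervisor_id INT REFERENCES employee(emp_id)
    )
    """
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ravi", 1), (3, "Meena", 1)],
)

# Join the table to itself: one copy plays the supervisee role,
# the other copy plays the supervisor role.
rows = conn.execute(
    """
    SELECT supervisee.name, supervisor.name
    FROM employee AS supervisee
    JOIN employee AS supervisor
      ON supervisee.supervisor_id = supervisor.emp_id
    ORDER BY supervisee.emp_id
    """
).fetchall()
print(rows)  # [('Ravi', 'Asha'), ('Meena', 'Asha')]
```

The two aliases in the query correspond to the two role names; without role names the join condition would be ambiguous.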

3.3.2 Relationship Constraints

Relationship types have certain constraints that limit the possible combination of entities that

may participate in relationship.

Example 15: An example of a constraint is that if we have the entities Doctor and Patient, the

organization may have a rule that a patient cannot be seen by more than one doctor. This

constraint needs to be described in the schema. There are two main types of relationship

constraints, cardinality ratio, and participation.

(i) Cardinality for Binary Relationship

Binary relationships are relationships between exactly two entities. The cardinality ratio specifies

the maximum number of relationship instances that an entity can participate in. The possible

cardinality ratios for binary relationship types are 1:1, 1:N, N:1 and M:N. Cardinality ratios are

shown on ER diagrams by displaying 1, M and N on the diamond box. The ratio shown closest

to an entity represents the ratio the other entity has to that entity.

(ii) Participation Constraints and Existence Dependencies

The participation constraint specifies whether the existence of an entity depends on its being

related to another entity via the relationship type. The constraint specifies the minimum number

of relationship instances that each entity can participate in. There are two types of participation

constraints:

a) Total Participation:

• If an entity can exist only if it participates in at least one relationship instance, then that

participation is called total participation, meaning that every entity in one set must be

related to at least one entity in a designated entity set.

• An example would be the Employee and Department relationship. If company policy

states that every employee must work for a department, then an employee can exist only

if it participates in at least one relationship instance (i.e. an employee can't exist without

a department).

• It is also sometimes called an existence dependency.

• Total participation is represented by a double line, going from the relationship to the

dependent entity.

b) Partial Participation:

• If only a part of the set of entities participates in a relationship, then it is called partial

participation.

• Using the Company example, not every employee will be a manager of a department, so

the participation of an employee in the "Manages" relationship is partial.

• Partial participation is represented by a single line.
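
In a relational implementation, total participation is often approximated with a NOT NULL foreign key, while partial participation simply leaves the link optional. The sketch below uses Python's sqlite3 module; the table and column names (department, employee, manages, dept_id, emp_id) and the sample rows are assumed for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE department (
        dept_id INT  NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INT  NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL,
        -- Total participation: every employee must work for a department.
        dept_id INT  NOT NULL REFERENCES department(dept_id)
    );
    CREATE TABLE manages (
        -- Partial participation: only employees who manage get a row here.
        dept_id INT NOT NULL PRIMARY KEY REFERENCES department(dept_id),
        emp_id  INT NOT NULL UNIQUE REFERENCES employee(emp_id)
    );
    """
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")
conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 10)")
conn.execute("INSERT INTO manages VALUES (10, 1)")  # only Asha manages

try:
    conn.execute("INSERT INTO employee VALUES (3, 'Meena', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # an employee without a department is refused

employees = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
managers = conn.execute("SELECT COUNT(*) FROM manages").fetchone()[0]
print(rejected, employees, managers)  # True 2 1
```

Ravi exists without a row in manages, which is what partial participation allows; Meena cannot exist without a department, which is what total participation demands.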

(iii) Attributes of Relationship Types

• Relationships can have attributes similar to entity types.

• For example, in the relationship Works_On between the Employee entity and the

Project entity, we would like to keep track of the number of hours an employee

works on a project. Therefore we can include Number of Hours as an attribute of the

relationship.

• Another example is the "Manages" relationship between Employee and Department:

we can add Start Date as an attribute of the Manages relationship.

• For some relationships (1:1 or 1:N), the attribute can be placed on one of the

participating entity types. For example, since the "Manages" relationship is 1:1, StartDate can

be migrated to either Employee or Department.
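
An attribute that belongs to a relationship, such as the number of hours an employee works on a project, ends up as a column of the table that stores the relationship. The following sketch assumes that Works_On links employees to projects; the table names, column names and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id  INT NOT NULL PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE project  (proj_id INT NOT NULL PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE works_on (
        emp_id  INT  NOT NULL REFERENCES employee(emp_id),
        proj_id INT  NOT NULL REFERENCES project(proj_id),
        hours   REAL NOT NULL,  -- attribute of the relationship, not of either entity
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (100, 'Payroll'), (200, 'Inventory');
    INSERT INTO works_on VALUES (1, 100, 12.5), (1, 200, 7.5), (2, 100, 20.0);
    """
)

# Hours are recorded per (employee, project) pair, so they can be
# summed per employee or per project as needed.
total_hours = conn.execute(
    """
    SELECT e.name, SUM(w.hours)
    FROM works_on AS w
    JOIN employee AS e ON w.emp_id = e.emp_id
    GROUP BY e.emp_id
    ORDER BY e.emp_id
    """
).fetchall()
print(total_hours)  # [('Asha', 20.0), ('Ravi', 20.0)]
```

The hours value cannot be moved to employee or project without losing information, because it depends on the pair; this is why an M:N relationship keeps its attributes in its own table.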

3.3.3 Keys

Keys are, as their name suggests, a key part of a relational database and a vital part of the

structure of a table. They ensure each record within a table can be uniquely identified by one or a

combination of fields within the table. They help enforce integrity and help identify the

relationship between tables. There are three main types of keys: candidate keys, primary keys

and foreign keys. Super keys, alternate (secondary) keys and composite keys are also used, as

explained below:

i) Candidate Key: A candidate key is any set of one or more columns whose combined

values are unique among all occurrences and the key cannot be further reduced. Since a null

value is not guaranteed to be unique, no component of candidate key is allowed to be null. There

can be any number of candidate keys in a table. Two properties must be satisfied by a candidate

key:

Uniqueness: No two rows of a relation may have the same value for the key.

Irreducible: No attribute can be removed from the key without losing uniqueness; that is, no

proper subset of the key is itself unique.

Table 3.1: Student Relation (StudentID, FirstName, LastName, Class, Marks)

StudentID         FirstName   LastName   Class   Marks
(Candidate Key)
---------------   ---------   --------   -----   -----
100               Arpit       Garg       PGDCA   2800
101               Satvik      Juneja     PGDCA   2900
102               Siddhant    Luthra     PGDCA   2850
103               Aryan       Goel       PGDCA   2875

Example 16: In Table 3.1, StudentID uniquely identifies each student, so it is a candidate key.

In the same table, the student's first name and last name, when combined, might also uniquely

identify the student; that combination would then be a second candidate key.

To be eligible as a candidate key, a column or set of columns must pass certain criteria:

• It must contain unique values.

• It must not contain null values.

• It must contain the minimum number of fields needed to ensure uniqueness.

• It must uniquely identify each record in the table.

Once your candidate keys have been identified, you can select one to be your primary key.

ii) Super Key: A super key is a combination of attributes that can uniquely identify a

database record. A table might have many super keys. Candidate keys are a special subset of

super keys that do not have any extraneous information in them. In other words if we add another

attribute to candidate key and it still satisfies the uniqueness property, then the combination of

those attributes is known as Super key. The main properties are as follows:

Uniqueness: It must uniquely identify the rows of a relation.

Irreducible: It may or may not be Irreducible.

Example 17: In Table 3.1, the attribute StudentID acts as a candidate key, and it is also a

super key because it satisfies the uniqueness property. If we add other attributes to that key, say

FirstName, LastName, Class or Marks, the combination still satisfies the uniqueness property, so it

is a super key.
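
The uniqueness and irreducibility properties can be tested directly against the rows of Table 3.1. The sketch below is one possible check in Python; the function names is_super_key and is_candidate_key are illustrative. It examines only the sample rows, whereas a key in a real schema is a statement about every row the relation may ever hold.

```python
from itertools import combinations

# Rows of Table 3.1.
STUDENTS = [
    {"StudentID": 100, "FirstName": "Arpit", "LastName": "Garg",
     "Class": "PGDCA", "Marks": 2800},
    {"StudentID": 101, "FirstName": "Satvik", "LastName": "Juneja",
     "Class": "PGDCA", "Marks": 2900},
    {"StudentID": 102, "FirstName": "Siddhant", "LastName": "Luthra",
     "Class": "PGDCA", "Marks": 2850},
    {"StudentID": 103, "FirstName": "Aryan", "LastName": "Goel",
     "Class": "PGDCA", "Marks": 2875},
]


def is_super_key(rows, columns):
    """Uniqueness: no two rows agree on every column in `columns`."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, columns):
    """Uniqueness plus irreducibility: no proper subset is a super key."""
    if not is_super_key(rows, columns):
        return False
    return not any(
        is_super_key(rows, subset)
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )


print(is_candidate_key(STUDENTS, ("StudentID",)))         # True
print(is_super_key(STUDENTS, ("StudentID", "Class")))     # True
print(is_candidate_key(STUDENTS, ("StudentID", "Class"))) # False
print(is_super_key(STUDENTS, ("Class",)))                 # False
```

Adding Class to StudentID keeps the combination unique, so it is still a super key, but it is no longer irreducible and therefore not a candidate key.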

iii) Primary Key (PK): Primary keys are used to uniquely identify rows in a relational

database design. A primary key usually comprises a single table column, but may consist of

multiple columns as well. It is possible for a table to have more than one column with unique

values; however, only one primary key can be defined. Each column with distinct values is called a

unique key. If there is more than one candidate key in the relation, choose one of them as the

primary key. Primary keys can be defined at the time of table creation or can be added after

the table has been created. The following points should be kept in mind while choosing the

primary key from the candidate keys:

a) No rows can have an empty value (called the null) in the primary key column.

b) The value of the primary key attribute must not be duplicated in any tuple/ row.

c) The primary key should be composed of the minimum number of attributes that satisfies the

condition of unique occurrence.

d) The value of the primary key should remain the same during the lifetime of the relation.

Table 3.2: Student Relation presenting Primary Key

Roll            Name        Class   Marks   Remark
(Primary Key)
-------------   ---------   -----   -----   -------------
100             Arpit       PGDCA   2800
101             Satvik      PGDCA   2900
102             Siddhant    PGDCA   2850
103             Aryan       PGDCA   2875
103             Mukta       PGDCA   2870    Invalid Entry
Null            Prerna      PGDCA   2829    Invalid Entry
Null            Prem Lata   M.Sc    3500    Invalid Entry

Example 18: In Table 3.2, row no. 5 is invalid because Roll 103 is duplicated: the rows for

Aryan and Mukta have the same Roll, 103, which does not satisfy the properties of a primary key.

The rows in which Roll is Null are also invalid entries, since a primary key does not allow null

values.
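
A DBMS rejects such invalid entries when the primary key is declared. The sketch below replays some rows of Table 3.2 against an in-memory SQLite database through Python's sqlite3 module; the table name student and the column name class_name are assumptions. NOT NULL is written explicitly because SQLite, unlike most systems, does not imply it for every kind of primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        roll       INT  NOT NULL PRIMARY KEY,
        name       TEXT NOT NULL,
        class_name TEXT NOT NULL,
        marks      INT  NOT NULL
    )
    """
)

candidates = [
    (100, "Arpit", "PGDCA", 2800),
    (101, "Satvik", "PGDCA", 2900),
    (102, "Siddhant", "PGDCA", 2850),
    (103, "Aryan", "PGDCA", 2875),
    (103, "Mukta", "PGDCA", 2870),    # duplicate Roll
    (None, "Prerna", "PGDCA", 2829),  # null Roll
]

rejected = []
for row in candidates:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[1])  # remember whose row was refused

stored = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected)  # ['Mukta', 'Prerna']
print(stored)    # 4
```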

iv) Foreign key: A foreign key (FK) is a field or group of fields in a database record that

points to a key field or group of fields forming a key of another database record in some (usually

different) table. Usually a foreign key in one table refers to the primary key (PK) of another

table. This way references can be made to link information together and it is an essential part of

database normalization.
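
The link made by a foreign key can be seen in a small sketch that uses the customer and order example from earlier in the chapter. The table and column names below are hypothetical, and the example relies on Python's sqlite3 module, where foreign key checking must be switched on with a PRAGMA statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(
    """
    CREATE TABLE customer (
        cust_id INT  NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INT NOT NULL PRIMARY KEY,
        -- Foreign key: must match the primary key of some customer row.
        cust_id  INT NOT NULL REFERENCES customer(cust_id)
    );
    INSERT INTO customer VALUES (1, 'Arpit');
    INSERT INTO orders VALUES (500, 1);
    """
)

try:
    conn.execute("INSERT INTO orders VALUES (501, 99)")  # there is no customer 99
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(dangling_allowed)  # False
print(order_count)       # 1
```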

v) Alternate Key: An alternate key or secondary key is a candidate key which is not

selected to be the primary key. In a relation there may be a number of attributes which uniquely

identify the rows of a table. These attributes are called candidate keys. Out of these candidate

keys, one is selected as the primary key of the relation, and the remaining candidate keys

are known as alternate keys.

vi) Composite Key: A composite (or compound) key is a key that consists of two or more

attributes that uniquely identify the rows of a relation. Composite keys are also known as

concatenated or aggregate keys. A composite key cannot be reduced further without losing

uniqueness, and it cannot contain null values.

3.3.4 ER Model/ Diagram

The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 as a way to unify

the network and relational database views. Simply stated, the ER model is a conceptual data

model that views the real world as entities and relationships. A basic component of the model is

the Entity-Relationship diagram, which is used to visually represent data objects. For the

database designer, the utility of the ER diagram is:

• It maps well to the relational model. The constructs used in the ER model can easily be

transformed into relational tables.

• It is simple and easy to understand with a minimum of training. Therefore, the model can

be used by the database designer to communicate the design to the end user.

• In addition, the model can be used as a design plan by the database developer to

implement a data model in specific database management software.

There are two techniques used for the purpose of data base designing from the system

requirements. These are:

• Top-down approach, known as Entity-Relationship Modeling

• Bottom-up approach, known as Normalization

An entity-relationship (ER) diagram is a top-down approach to designing a database. It is a

specialized graphic technique that illustrates the interrelationships between entities in a database.

ER diagrams use symbols to represent three different types of information. Boxes are

commonly used to represent entities, diamonds are used to represent relationships, and ovals

(ellipses) are used to represent attributes. E-R models are drawn as

Entity-Relationship diagrams, which represent the elements of the conceptual model. The overall

logical structure of a database can be expressed graphically by an E-R diagram. Table 3.3 shows

the summary of ER diagram notation.

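Table 3.3: Summary of the ER Diagram Notation

Notation                                               Meaning
----------------------------------------------------   ------------------------
Rectangle (box)                                        Entity type
Oval (ellipse)                                         Attribute
Oval with the attribute name underlined                Key attribute
Dashed oval                                            Derived attribute
Double oval                                            Multivalued attribute
Oval with component ovals attached                     Composite attribute
Diamond                                                Relationship type
Double line from relationship to entity                Total participation
M and 1 labels on the lines joining the relationship   Many-to-one relationship
Double rectangle                                       Weak entity type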

Following are advantages of an E-R Model:

1. Visual Representation: The foremost benefit of an ERD is that it provides a

visual representation of the design. An ERD is normally crucial if you are looking to

come up with an effective database design, because the patterns assist the designer in

focusing on the way the database will primarily work with all the data flows and interactions. It

is common for the ERD to be used together with data flow diagrams so as to attain a better visual

representation.

2. Effective Communication: An ERD clearly communicates the key entities in a certain

database and their relationship with each other. ERD normally uses symbols for representing

three varying kinds of information. Diamonds are used for representing the relationships, ovals

are usually used for representing attributes and boxes represent the entities. This allows a

designer to effectively communicate what exactly the database will be like.

3. Simple to Understand: An ERD is easy to understand and simple to create. In effect, the design

can be shown to the representatives for both approval and confirmation. The

representatives can also make their contributions to the design, allowing the possibility of

rectifying and enhancing the design.

4. High Flexibility: The ERD model is quite flexible to use as other relationships can be derived

easily from the already existing ones. This can be done using other relational tables and

mathematical formulae.

Following are disadvantages of an E-R Model:

1. No Industry Standard for Notation: There is no industry standard notation for developing

an E-R diagram.

2. Popular for High-Level Design: The E-R data model is popular mainly for high-level design.

3.3.5 Mapping Logical ER-Diagram Models to Relational Tables

For each entity set and relationship set, there is a unique table which is assigned the name of the

corresponding set. Each table has a number of columns with unique names.

Step 1: For regular entity type E in ER schema, create a relation R that includes all the

simple attributes, and component attributes of composite attributes. Select the

primary key.

Step 2: For weak entity type W in ER schema, with owner entity type E, create a relation

R, include all simple attributes of W as attributes of R. In addition, include the

primary key attributes of the relation Q for the owner entity type E. Primary key is

the combination of primary key of Q and partial key of W.

Step 3: For 1:1 relationship X, suppose S and T are the relations for the entity types

participating in it. Include primary key of T as foreign key of S. Include other

attributes of relationship X as attribute of S.

Step 4: For 1: N relationship Y, suppose S relation corresponds to the entity type at the N-

side and T relation corresponds to the entity type at the other side. Include

primary key of T as foreign key of S.

[Figure 3.7 shows the entity types CUSTOMER (attributes Customer_Id, Customer_name,

Customer_city) and BRANCH (attributes Branch_Id, Branch_name, Branch_city) joined by the

relationship Account (attributes Account_no, Balance).]

Figure 3.7: An ER Diagram to represent relationship between customer and branch.

Step 5: For M: N relationship Z, create a new relation R to represent Z. Include simple

attributes of Z in R. Include the primary keys of S and T as foreign keys of R,

their combination forms the primary key of R.

Step 6: For multi-valued attributes A, create a new relation R that includes an attribute

corresponding to A. Include primary key of the relation of the entity type having

A as an attribute. Primary key is their combination.

Step 7: For n-ary relationship type X, and n>2, create a new relation R, include primary

key of each participating entity type's relation as foreign key of R. Include

attributes of X as simple attributes of R.
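
The mapping steps can be tried out on the customer and branch diagram of Figure 3.7. The sketch below, written with Python's sqlite3 module, assumes that the Account relationship is M:N (the cardinality is not stated in the text) and adds a hypothetical multi-valued phone attribute to illustrate Step 6. Column types are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Step 1: one relation per regular entity type, with its primary key.
    CREATE TABLE customer (
        customer_id   INT  NOT NULL PRIMARY KEY,
        customer_name TEXT NOT NULL,
        customer_city TEXT
    );
    CREATE TABLE branch (
        branch_id   INT  NOT NULL PRIMARY KEY,
        branch_name TEXT NOT NULL,
        branch_city TEXT
    );
    -- Step 5: an M:N relationship becomes its own relation. The primary keys
    -- of both participants are foreign keys and together form the primary key.
    CREATE TABLE account (
        customer_id INT  NOT NULL REFERENCES customer(customer_id),
        branch_id   INT  NOT NULL REFERENCES branch(branch_id),
        account_no  TEXT NOT NULL,
        balance     REAL NOT NULL,
        PRIMARY KEY (customer_id, branch_id)
    );
    -- Step 6: a multi-valued attribute (hypothetical) gets a separate relation.
    CREATE TABLE customer_phone (
        customer_id INT  NOT NULL REFERENCES customer(customer_id),
        phone       TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone)
    );
    """
)

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
# PRAGMA table_info reports, in its last column, each column's position
# within the primary key (0 means the column is not part of it).
account_pk = [
    row[1] for row in conn.execute("PRAGMA table_info(account)") if row[5] > 0
]
print(tables)      # ['account', 'branch', 'customer', 'customer_phone']
print(account_pk)  # ['customer_id', 'branch_id']
```

If the relationship were 1:N instead, Step 4 would apply: the account table would disappear and the primary key of the one side would be added as a foreign key to the relation on the N side.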

3.4 Summary

The Entity-Relationship (E-R) model is a high-level conceptual data model developed by Chen in

1976 to facilitate database design. In this chapter, we presented an overview of E-R

modeling. Different types of entities, attributes and the relationships among them were elaborated.

We also illustrated the concept of keys, which are very important in the E-R design process, and

discussed how to construct an E-R diagram and how to map an E-R model into

relational tables.

3.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database systems, 3rd Edition, Addison Wesley,

New Delhi.

2. Korth & Silberschatz.: Database System Concept, 4th Edition, McGraw Hill International

Edition.

3. Raghu Ramakrishnan & Johannes Gehrke: Database Management Systems, 2nd edition,

Mcgraw Hill International Edition.

4. C.J.Date: An Introduction to Databases Systems, 7th Edition, Addison Wesley, New

Delhi.

5. Bipin C.Desai : An Introduction to Database System, Galgotia Publication, New Delhi

6. O'Brien J.A.: Introduction to Information System in Business Management, 6th Edition,

Richard D. Irwin, Inc. 1991.

3.6 Self Assessment Questions (SAQ)

1. What is E-R modeling? What are the components of E-R model? Discuss.

2. Differentiate between entity, entity type and entity set. Explain the different type of

entities.

3. What do you mean by attribute? Explore the different type of attributes with examples.

4. Outline the different notations and naming conventions used to represent an E-R diagram.

5. What do you mean by relationship? Discuss the degree of relationship. Explain

cardinality ratios of a relationship.

6. Distinguish between

a) Composite attributes and Atomic attributes

b) Single-Valued and Multi-Valued Attributes

Chapter – 4: Database Design: Case Studies

Writer: Dr. Kanwal Garg


Vetter: Prof. Rajender Nath
Structure:
4.1 Introduction
4.2 Objective
4.3 Presentation of Content
4.3.1 Database Design Process
4.3.2 Case Studies
(i) Draw an E-R diagram of Inventory System
(ii) Draw an E-R diagram of Payroll System
(iii) Draw an E-R diagram of Reservation System
(iv) Draw an E-R diagram of Online Book Store
4.3.3 Some Other Specimen ER Diagram
4.3.4 Benefits of ER Diagram
4.4 Summary
4.5 Suggested Reading/ Reference Material
4.6 Self Assessment Questions (SAQ)

4.1 Introduction

The performance of a DBMS is the ultimate measure of the database design and of the database

designer's skill. A DBA can improve performance by adjusting some DBMS parameters, such as the

size of the buffer pool or the frequency of checkpoints. The overall database design activity has to

follow a systematic process called the design methodology. The process includes

conceptual and external schema design, which produces a collection of relations and views

along with a set of integrity constraints. We must then address performance goals through physical

database design, in which we design the physical schema. It is usually necessary to tune the design

according to the user requirements.

4.2 Objective

The design process consists of two parallel activities. The first activity involves the design of

data content and structure of the database. The second activity relates to the design of the

database application. These two activities strongly influence each other. Traditionally, database

design methodologies have focused on different phases as discussed in upcoming sections. These

phases are similar to software design phases, but they are not strictly restricted to this sequence.

4.3.1 Database Design Process

Proper database design is the only way that the database application will be efficient, flexible,

and easy to manage and maintain. An important aspect of database design is to use relationships

between tables instead of throwing all your data into one long flat file. Types of relationships

include one-to-one, one-to-many, many-to-one and many-to-many.

Using relationships to properly organize your data is called normalization. There are many levels

of normalization, but the primary levels are the first, second, and third normal forms. Each level

has a rule or two that you must follow. Following all the rules helps ensure that your database is

well organized and flexible.

To take an idea from inception through to fruition, you should follow a design process. This

process essentially says, "Think before you act." Discuss rules, requirements, and objectives;

then create the final version of your normalized tables. The systematic process of designing a

database is known as design methodology. Database design involves understanding operational

and business needs of an organization, modeling the specified requirements, and realizing the

requirements using a database. The goal of designing a database is to produce an efficient,

high-quality, minimum-cost database. In large organizations, the database administrator (DBA) is

responsible for designing an efficient database system and for controlling the

database life-cycle process. The overall database design and implementation process consists of

several phases.

i) Requirement Collection and Analysis: It is the process of knowing and analyzing the

expectations of the users for the new database application in as much detail as possible. A team

of analysts or requirement experts are responsible for carrying out the task of requirement

analysis. They review the current file processing system or DBMS system, and interact with the

users extensively to analyze the nature of business area to be supported and to justify the need

for data and databases. The initial requirements may be informal, incomplete, inconsistent, and

partially incorrect. The requirement specification techniques such as object-oriented analysis

(OOA), data flow diagrams (DFDs), etc., are used to transform these requirements into better

structured form. This phase can be quite time-consuming; however, it plays the most crucial and

important role in the success of the database system. The result of this phase is the document

containing the specification of user requirements.

ii) Conceptual Database Design: In this phase, the database designer selects a suitable data

model and translates the data requirements resulting from previous phase into a conceptual

database schema by applying the concepts of chosen data model. The conceptual schema is

independent of any specific DBMS. The main objective of conceptual schema is to provide a

detailed overview of the organization. In this phase, a high-level description of the data and

constraints are developed. The entity-relationship (E-R) diagram is generally used to represent

the conceptual database design. The conceptual schema should be expressive, simple,

understandable, minimal, and formal.

iii) Choice of a DBMS: The choice of a DBMS depends on many factors such as cost,

DBMS features and tools, underlying model, portability, and DBMS hardware requirements. The

technical factors that affect the choice of a DBMS are the type of DBMS (relational, object,

object-relational, etc.), storage structures and access paths that DBMS supports, the interfaces

available, the types of high-level query languages, and the architecture it supports (client/server,

parallel or distributed). The various types of costs that must be considered while choosing a

DBMS are software and hardware acquisition cost, maintenance cost, database creation and

conversion cost, personnel cost, training cost, and operating cost.

iv) Logical Database Design: Once an appropriate DBMS is chosen, the next step is to map

the high-level conceptual schema onto the implementation data model of the selected DBMS. In

this phase, the database designer moves from an abstract data model to the implementation of the

database. In case of relational model, this phase generally consists of mapping the E-R model

into a relational schema.

v) Physical Database Design: In this phase, the physical features such as storage structures,

file organization, and access paths for the database files are specified to achieve good

performance. The various options for file organization and access paths include various types of

indexing, clustering of records, hashing techniques, etc.

vi) Database System Implementation: Once the logical and physical database designs are

completed, the database system can be implemented. DDL statements of the selected DBMS are

used and compiled to create the database schema and database files, and finally the database is

loaded with the data.

vii) Testing and Evaluation: In this phase, the database is tested and fine-tuned for the

performance, integrity, concurrent access, and security constraints. This phase is carried out in

parallel with application programming. If the testing fails, various actions are taken such as

modification of physical design, modification of logical design or upgrade or change DBMS

software or hardware.

We must keep it in view that once the application programs are developed, it is easier to change

the physical database design. However, it is difficult to modify the logical database design as it

may affect the queries (written using DML commands) embedded in the program code. Thus, it

is necessary to carry out the design process effectively before developing the application

programs. While designing a database schema, it is necessary to avoid two major issues, namely,

redundancy and incompleteness. These problems may lead to bad database design.
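
The remark that the physical design is easier to change than the logical design can be seen in a small experiment. The sketch below uses Python's sqlite3 module with a hypothetical product table and index name: an index is added after the query has been written, the query text stays the same, and so does its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INT NOT NULL PRIMARY KEY, pname TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(i, f"item-{i % 50}") for i in range(1000)],
)

# A query as it might be embedded in an application program.
QUERY = "SELECT product_id FROM product WHERE pname = ? ORDER BY product_id"

before = conn.execute(QUERY, ("item-7",)).fetchall()

# Physical design change: add an access path. The logical schema and the
# query text are untouched.
conn.execute("CREATE INDEX idx_product_pname ON product (pname)")
after = conn.execute(QUERY, ("item-7",)).fetchall()

print(len(before))      # 20
print(before == after)  # True

# Show which access path SQLite now chooses for the same query.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("item-7",)):
    print(row[-1])
```

Renaming the pname column, by contrast, would be a logical design change and would break every program that contains this query.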

4.3.2 Case Studies

Following points must be kept in mind before drawing the effective ER diagrams:

1. Identify all the relevant entities in a given system and determine the relationships among

these entities.

2. An entity should appear only once in a particular diagram.

3. Provide a precise and appropriate name for each entity, attribute, and relationship in the

diagram. Terms that are simple and familiar always beat vague, technical-sounding words.

In naming entities, remember to use singular nouns. However, adjectives may be used to

distinguish entities belonging to the same class (part-time employee and full time employee,

for example). Meanwhile attribute names must be meaningful, unique, system-independent,

and easily understandable.

4. Remove vague, redundant or unnecessary relationships between entities.

5. Never connect a relationship to another relationship.

i) Draw an E-R diagram of Inventory System:

The Inventory System provides a complete set of methods to support inventory handling. All

users of the Inventory System need the same functionality to complete their varied tasks.

The Inventory System allows you to:

• Remove items from inventory.

• Notify the store of a customer's intent to purchase an item that is not currently in stock

(backorder).

• Notify the store of a customer's intent to purchase an item that has never been in stock

(preorder).

The administrator of the store uses the inventory system to:

• Place a specific number of items on a shelf for customers to purchase, backorder, or

preorder.

• Decrease the number of items available for purchase, backorder, or preorder, perhaps

because of an error in stocking the item.

• Determine the number of items available for purchase, backorder, or preorder.

• Determine when a specific item will be back in stock.

For drawing an ER diagram of Inventory system, following components of ER diagram are taken

care of:

1. Entities identified for drawing an ER diagram of Inventory System are as follows:

Entity         Purpose
------------   --------------------------------------------------------------------------
Supplier       To maintain the complete personal details of the supplier.
Staff          To maintain the complete personal details of the employees of the organization.
Customer       To maintain the complete personal details of the customer.
Product        To maintain the details of the products the organization is dealing with.
Role           To record the role of the staff hired by the organization.
Order          To create a unique order number for an order placed by a customer.
Category       To maintain the category of the order, if it belongs to a specific category.
Payment        To maintain the payment details of the customer.
Order Detail   To maintain the details of the order placed by a customer.

2. Respective attributes of the entity along with their type and name of the constraint.

Entity         List of Attributes             Data Type of the Attribute   Name of Constraint
Supplier       Name (FName, MName, LName)     Char(15)                     Not Null
               Address                        Varchar2(25)                 Not Null
               Phone                          Number(10)                   Not Null
               Fax                            Number(10)
               Email                          Varchar2(25)
               Supplierid                     Varchar2(3)                  Primary Key
               PayMethod                      Char(10)
Staff          Name (FName, MName, LName)     Char(15)                     Not Null
               StaffId                        Varchar2(3)                  Primary Key
               Address                        Varchar2(25)                 Not Null
               Sex                            Char(1)                      Not Null
               Phone                          Number(10)                   Not Null
               UserName                       Varchar2(25)                 Not Null
               Password                       Varchar2(25)                 Not Null
Customer       CID                            Varchar2(3)                  Primary Key
               Name (FName, MName, LName)     Char(15)                     Not Null
               Phone                          Number(10)                   Not Null
               Fax                            Number(10)
               SID                            Varchar2(3)                  Foreign Key
               Email                          Varchar2(25)
Product        ProductId                      Varchar2(3)                  Primary Key
               Supplierid                     Varchar2(3)                  Foreign Key
               PName                          Char(15)                     Not Null
               Qperunit                       Number(3)                    Not Null
               Discount                       Number(3,2)
               Popstock                       Number(3)                    Not Null
               Pclstock                       Number(3)                    Not Null
               Pordered                       Number(3)                    Not Null
               Uprice                         Number(3,2)
Role           RoleId                         Varchar2(3)                  Primary Key
               Description                    Char(35)                     Not Null
               RoleName                       Char(15)                     Not Null
Order          OrderId                        Varchar2(3)                  Primary Key
               CID                            Varchar2(3)                  Foreign Key
               OrderDate                      Date                         Not Null
Category       CatID                          Varchar2(3)                  Primary Key
               CName                          Char(15)
               ProdID                         Varchar2(3)                  Foreign Key
Payment        PayId                          Varchar2(3)                  Primary Key
               Paydue                         Number(5,2)                  Not Null
               Paypaid                        Number(5,2)                  Not Null
               Paydate                        Date                         Not Null
               BillNo.                        Varchar2(5)                  Not Null
               Baldue                         Number(5,2)                  Not Null
               ODetailID                      Varchar2(3)                  Foreign Key
Order Detail   ODetailID                      Varchar2(3)                  Primary Key
               DeliDate                       Date                         Not Null
               Deliqty                        Number(3,2)                  Not Null
               Ordquantity                    Number(3)                    Not Null
               OrdDate                        Date                         Not Null
               Discount                       Number(3,2)
               OrderId                        Varchar2(3)                  Foreign Key

3. Relationship Name and Cardinality Ratio among Entity

Relationship Name   Entity                  Cardinality Ratio
Supplies            Supplier, Product       1:N
Belongs             Product, Category       N:1
Has                 Staff, Role             N:1
Register            Staff, Customer         1:N
Orders              Customer, Order         1:N
Contain             Order, OrderDetail      1:N
Takes               Staff, Order            1:N
Having              Payment, OrderDetail    N:1
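
As a rough illustration of how the Supplies relationship (Supplier, Product, 1:N) listed above might be carried into tables, the following Python sketch uses the standard sqlite3 module. It is only a sketch under stated assumptions: the column types are simplified to SQLite types rather than the Varchar2 and Number types given in the attribute table, only a few attributes are included, and the sample rows and the helper names products_of and try_insert_orphan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Simplified versions of the Supplier and Product entities (assumed subset of attributes).
conn.executescript("""
CREATE TABLE Supplier (
    SupplierId TEXT PRIMARY KEY,
    FName      TEXT NOT NULL,
    Address    TEXT NOT NULL,
    Phone      INTEGER NOT NULL
);
CREATE TABLE Product (
    ProductId  TEXT PRIMARY KEY,
    SupplierId TEXT REFERENCES Supplier(SupplierId),  -- the "many" side holds the foreign key
    PName      TEXT NOT NULL,
    Qperunit   INTEGER NOT NULL
);
""")

# Hypothetical sample rows: one supplier supplying two products (1:N).
conn.execute("INSERT INTO Supplier VALUES ('S01', 'Asha', 'Main Road', 9000000001)")
conn.execute("INSERT INTO Product VALUES ('P01', 'S01', 'Pen', 10)")
conn.execute("INSERT INTO Product VALUES ('P02', 'S01', 'Pencil', 12)")


def products_of(supplier_id):
    """Return the ids of all products supplied by one supplier."""
    rows = conn.execute(
        "SELECT ProductId FROM Product WHERE SupplierId = ? ORDER BY ProductId",
        (supplier_id,),
    )
    return [row[0] for row in rows]


def try_insert_orphan():
    """Try to add a product whose supplier does not exist; report whether it was accepted."""
    try:
        conn.execute("INSERT INTO Product VALUES ('P03', 'S99', 'Eraser', 5)")
        return True
    except sqlite3.IntegrityError:
        return False
```

Placing the foreign key in Product is one common way to represent a 1:N relationship; the foreign key constraint then rejects a product that refers to an unknown supplier.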

Specimen of an E-R diagram: The ER diagram given below is only one possible solution to the problem. It may change with the designer's perception and the customer's requirements.

[Figure: E-R diagram of the Inventory System. Entities: Supplier, Staff, Customer, Product, Role, Order, Category, Payment and OrderDetail, each with the attributes listed above, connected by the relationships Supplies, Belongs to, Has, Registers, Orders, Takes, Contains and Having.]

(ii) Draw an E-R diagram of Payroll System:

A payroll system refers to the scheme that is used to pay employees in a firm. A payroll refers to

the financial records that relate to the payment of the employees. A payroll database is an

automated system that allows you to input employees‘ payroll information and compensate them

accordingly. The database may be a stand-alone system that enables only payroll operations, or

an integrated system that enables related business functions. Here we restrict our scope to the

stand-alone system. A stand-alone payroll database is a single payroll application that you use

to perform payroll tasks. This option may come in handy if you already have HR and accounting

solutions in place. An effective stand-alone database gives you a complete range of services that

allows you to fully manage your payroll activities. This includes new-hire reporting, wage and

deduction calculations, check printing, direct deposit, wage garnishments, tax reporting and

management, paycheck reconciliation, multiple company management, and electronic record-keeping.

To draw an ER diagram of the Payroll System, the following components of the ER diagram are

considered:

1. Entities identified for drawing an ER diagram of Payroll System are as follow:

Entity         Purpose
Employee       To maintain the complete personal details of the Employee.
Company        To maintain the complete detail of the Company.
Branch         If the company has multiple branches, then to maintain the complete detail of the branch.
Employer       It is assumed that the Employer has multiple companies, branches and employees.
Salary         To record the complete detail of the salary to the employee.
Department     To record the information about the department.

2. Respective attributes of the entity along with their type and name of the constraint.

Entity         List of Attributes             Data Type of the Attribute   Name of Constraint
Employee       Name (F_NAM, L_NAM)            Char(15)                     Not Null
               E_ID                           Varchar2(3)                  Primary Key
               Phone                          Number(10)                   Not Null
               DID                            Number(3)                    Foreign Key
               Email                          Varchar2(20)
               Address                        Varchar2(25)                 Not Null
               Designation                    Varchar2(15)                 Not Null
               B_ID                           Number(3)                    Not Null
Department     DID                            Number(3)                    Primary Key
               Dname                          Varchar2(10)                 Not Null
               E_ID                           Varchar2(3)                  Not Null
Company        C_ID                           Number(3)                    Primary Key
               C_Name                         Char(15)                     Not Null
Employer       Emp_Name                       Varchar2(30)                 Not Null
               Emp_Add                        Varchar2(30)                 Not Null
               Email                          Varchar2(20)                 Not Null
               Phone                          Number(10)                   Not Null
Salary         Basic                          Varchar2(3)                  Primary Key
               Allowance                      Char(35)                     Not Null
               Perquisites                    Char(15)                     Not Null
               Total_Sal                      Varchar2(3)                  Primary Key
               Tax                            Varchar2(3)                  Foreign Key
               Net_Salary                     Date                         Not Null
               E_ID                           Varchar2(3)                  Primary Key
Branch         B_ID                           Number(3)                    Primary Key
               B_Name                         Char(20)                     Not Null
               DID                            Number(3)                    Foreign Key
               E_ID                           Varchar2(3)                  Foreign Key

3. Relationship Name and Cardinality Ratio among Entity

Relationship Name   Entity                  Cardinality Ratio
Belongs             Employee, Department    M:1
Gets                Employee, Salary        M:1
Works_In            Employee, Company       M:1
Employ              Employee, Employer      M:1
Has                 Company, Branch         1:M
Pays                Employer, Salary        1:N

Specimen of an E-R diagram: The ER diagram given below is only one possible solution to the problem. It may change with the designer's perception and the customer's requirements.

[Figure: E-R diagram of the Payroll System. Entities: Employee, Company, Branch, Department, Employer and Salary, each with the attributes listed above, connected by the relationships Works In, Has, Employs, Headed By, Belongs, Gets and Pays.]

iii) Draw an E-R diagram of Reservation System:

We now live in an era where practically everything, including business, is inextricable from the internet. It is now crucial that every business, no matter the sector, has a recognizable web presence. This helps in organizing and reserving tours and other activity services online. An online reservation system is "used to store and retrieve information about tour product, tour product options or lodging facility and conduct transactions for booking it." As a case study, we discuss an airline reservation system. An airline needs to maintain multiple types of information, such as route information, aircraft information, schedule information, fare information and reservation information.

To draw an ER diagram of the Reservation System, the following components of the ER diagram are

considered:

1. Entities identified for drawing an ER diagram of Reservation System are as follow:

Entity         Purpose
Flight         To maintain the flight details.
Passenger      To maintain the personal details of the passenger.
Airplane       To record the details of the aircraft.
Booking        To record the booking details.

2. Respective attributes of the entity along with their type and name of the constraint.

Entity         List of Attributes             Data Type of the Attribute   Name of Constraint
Flight         Flight_no.                     Varchar2(15)                 Primary Key
               From                           Varchar2(15)                 Not Null
               To                             Varchar2(15)                 Not Null
               DepartureDate                  Date                         Not Null
               DepartureTime                  Number(6)                    Not Null
               ArrivalDate                    Date                         Not Null
               ArrivalTime                    Number(6)                    Not Null
Airplane       ModelNo.                       Varchar2(15)                 Not Null
               Capacity                       Number(5)                    Not Null
               Registration_no.               Varchar2(15)                 Primary Key
Booking        Booking_Date                   Date                         Not Null
               Online_Payment                 Number(5,2)                  Not Null
               Seat_no.                       Number(5)                    Not Null
               Class                          Varchar2(10)                 Not Null
Passenger      Passenger_id                   Number(10)                   Primary Key
               Name (Fname, Lname)            Char(35)                     Not Null
               Address                        Varchar2(15)                 Not Null
               Contact_info                   Number(10)                   Not Null

3. Relationship Name and Cardinality Ratio among Entity

Relationship Name   Entity                  Cardinality Ratio
Flies               Flight, Airplane        M:1
Books               Passenger, Booking      M:N
To                  Flight, Passenger       1:N

Specimen of an E-R diagram: The ER diagram given below is only one possible solution to the problem. It may change with the designer's perception and the customer's requirements.

[Figure: E-R diagram of the Reservation System. Entities: Flight, Passenger, Airplane and Booking, each with the attributes listed above, connected by the relationships To, Flies and Books.]

iv) Draw an E-R diagram of Online Book Store:

Shopping for books online helps you find the best possible price for just about any book you

want. If you‘re in the market for rare, collectible or autographed books, it‘s much cheaper and

faster to search online than it would be to call up local used and independent bookstores that

carry these types of items. The features available on many online bookstores also allow you to

compare similar titles with the click of a mouse and read reviews from professionals and

customers. You can also resell your used books to get more cash in your pocket and to clear out

your cluttered bookshelf. It‘s never been easier to ensure you never get stuck with a crummy title

again. A quality online bookstore will have a good product selection, an easy-to-use -yet

comprehensive- website, a variety of shipping options, a number of payment options, excellent

customer support and a strong return policy.

1. Entities identified for drawing an ER diagram of Online Book Store are as follow:

Entity         Purpose
Customer       To maintain the complete personal details of the customer.
Order          To record each order placed by a customer under a unique order number.
Books          To maintain a complete record of books.
Author         To maintain the details of the author.
Warehouse      To maintain the record of the warehouse where the books are stocked.
Books PDF      To record the details of online books.
Publisher      To maintain the record of the publisher who publishes the books.
Book Store     To maintain the record of the book store.

2. Respective attributes of the entity along with their type and name of the constraint.

Entity         List of Attributes             Data Type of the Attribute   Name of Constraint
Customer       Name (FName, MName, LName)     Char(15)                     Not Null
               C_ID                           Varchar2(3)                  Primary Key
               Phone                          Number(10)                   Not Null
               Address                        Varchar2(25)                 Not Null
BookStore      Registration_no                Varchar2(10)                 Primary Key
               Address                        Varchar2(25)                 Not Null
Books          ISBN_NO                        Varchar2(3)                  Primary Key
               Price                          Number(4,2)                  Not Null
               Vol_No.                        Number(3)                    Not Null
               Year                           Number(4)                    Not Null
               Issue_No.                      Number(3)                    Not Null
               Book Name                      Varchar2(25)                 Not Null
Order          OrderId                        Varchar2(3)                  Primary Key
               O_Qty                          Number(3)                    Not Null
               OrderDate                      Date                         Not Null
BooksPDF       Name                           Varchar2(25)                 Not Null
               ISBN                           Char(15)                     Primary Key
               Author                         Varchar2(25)                 Not Null
Author         Name (FName, MName, LName)     Char(15)                     Not Null
               Address                        Varchar2(25)                 Not Null
               A_ID                           Varchar2(5)                  Not Null
Warehouse      Code                           Varchar2(3)                  Primary Key
               Address                        Varchar2(30)                 Not Null
               Phone Number                   Number(10)                   Not Null
Publisher      Name (FName, MName, LName)     Char(15)                     Not Null
               Phone                          Number(10)                   Not Null
               Address                        Varchar2(25)                 Not Null

3. Relationship Name and Cardinality Ratio among Entity

Relationship Name   Entity                  Cardinality Ratio
Visit               Customer, Book Store    M:1
Place Order         Customer, Book, Order   N:1
Written By          Book, Author            M:N
Stock               Book, Warehouse         M:N
Contains            Book Store, Book PDF    1:N
Publish By          Books, Publisher        M:N

Specimen of an E-R diagram: The ER diagram given below is only one possible solution to the problem. It may change with the designer's perception and the customer's requirements.

[Figure: E-R diagram of the Online Book Store. Entities: Customer, Bookstore, Books PDF, Order, Books, Author, Publisher and Warehouse, each with the attributes listed above, connected by the relationships Visits, Contains, Place Order, May Have, Written By, Published By and Stocks.]

4.3.3 Some Other Specimen ER Diagram

(i) An ER Diagram of Airline Reservation System:

(ii) An ER Diagram of University System:

(iii) An ER Diagram of an Organization System:

(iv) An ER Diagram of Banking System:

4.3.4 Benefits of ER Diagram:

ER diagrams constitute a very useful framework for creating and manipulating databases. Some

of the benefits of designing an ER diagram are as follow:

 First, ER diagrams are easy to understand and do not require a person to undergo extensive training to be able to work with them efficiently and accurately. This means that designers can use ER diagrams to easily communicate with developers, customers, and end users, regardless of their IT proficiency.

 Second, ER diagrams are readily translatable into relational tables which can be used to

quickly build databases. In addition, ER diagrams can directly be used by database

developers as the blueprint for implementing data in specific software applications.

 Lastly, ER diagrams may be applied in other contexts such as describing the different

relationships and operations within an organization.
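
The second benefit, translation into relational tables, usually follows a simple rule of thumb based on the cardinality ratio. The short Python sketch below illustrates that commonly used rule; the function name mapping_rule and the wording of its results are hypothetical and are not taken from the text.

```python
def mapping_rule(left, right, ratio):
    """Suggest how a binary relationship between two entities could be stored.

    ratio is a cardinality ratio such as "1:N", "N: 1", "M:N" or "1:1".
    """
    left_side, right_side = ratio.replace(" ", "").upper().split(":")
    many = {"M", "N"}
    if left_side == "1" and right_side == "1":
        return f"foreign key in either {left} or {right}"
    if left_side == "1" and right_side in many:
        # The "many" side carries the key of the "one" side.
        return f"foreign key in {right} referencing {left}"
    if left_side in many and right_side == "1":
        return f"foreign key in {left} referencing {right}"
    # Many-to-many: neither table can hold the key, so a linking table is used.
    return f"separate table linking {left} and {right}"
```

For instance, a 1:N relationship such as Supplies would place the supplier's key in the product table, while an M:N relationship such as Written By would need a linking table of its own.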

4.4 Summary

The entity-relationship diagrams in this chapter depict the major concepts and relationships needed for managing real-life case studies. An E-R diagram is neither a complete data model depicting every necessary relational database table, nor is it meant to be the exact design for implementations of such case studies. Alternate models may capture the necessary attributes and relationships. In this chapter, therefore, an attempt has been made to design some useful case studies that will assist developers in envisioning the complexity of the environment that an ERM system must address, and in ensuring that an E-R diagram covers the crucial relationships and features.

4.5 Suggested Reading/ Reference Material

1. www.tutorialspoint.com/dbms/er_diagram_representation.htm

2. www.umsl.edu/~bcjtz4/umsl/er_diagrams.html

3. https://www.google.co.in/search?q=e-r+diagrams+examples&sa=X&biw=1280&bih=590&tbm=isch&tbo=u&source=univ&ei=nKAIVLXcCsyIuAT2woCIAQ&ved=0CCcQ7Ak

4. Elmasri & Navathe: Fundamentals of Database systems, 3rd Edition, Addison Wesley,

New Delhi.

4.6 Self Assessment Questions (SAQ)

1. What are the different phases of database designing? Explain each in detail.

2. Discuss the importance of requirement analysis phase of database designing process.

3. What are the uses of E-R diagram? Draw an E-R diagram of library system of an

Institute.

4. What do you mean by an ER diagram? Outline the different notations of an ER diagram.

Explain the benefits of an ER diagram.

5. What steps should be kept in mind while designing an ER diagram of University system?

Discuss.

6. Draw an ER diagram of airline reservation system. Explain the components of ER

diagram.

Chapter – 5: Data Models

Writer: Dr. Kanwal Garg


Vetter: Prof. Rajender Nath
Structure:
5.1 Introduction
5.2 Objective
5.3 Presentation of Content
5.3.1 Data Model
5.3.2 Importance of Data Models
5.3.3 Data Model Evolution
5.3.4 Type of Data Models
(i) Hierarchical Data Model
(ii) Network Data Model
(iii) Relational Data Model
(iv) Comparison between Hierarchical Model, Network Model and Relational
Model
(v) Entity-Relationship Data Model
(vi) Object-Oriented Model
(vii) Object/ Relational Model
5.3.5 Usage of Data Model
5.4 Summary
5.5 Suggested Reading/ Reference Material
5.6 Self Assessment Questions (SAQ)

5.1 Introduction

Data models in a DBMS are frameworks that help you create and use databases. DBMS stands for database management system. Various types of DBMS exist, differing in speed, flexibility and implementation. Each type has advantages over the others, but no single kind is superior. The kind of structure and data you need determines which data model suits your needs best.

A data model can be thought of as a flowchart or diagram that shows data relationships among objects. Capturing all the data in a model can be time-intensive, but this step should not be rushed because it is quite important. A database management system is a collection of programs that allows end users to create, maintain and control records in a database. The features of a DBMS primarily address database creation, record interrogation, queries and data extraction. An application development environment differs from a DBMS in several respects, ranging from the personnel involved to the way data is used.

A data model not only describes the structure of the data, it also defines a set of operations that

can be performed on the data. A data model generally consists of data model theory, which is a

formal description of how data may be structured and used, and data model instance, which is a

practical data model designed for a particular application. The process of applying a data model

theory to create a data model instance is known as data modeling.

5.2 Objective

A model is a representation of reality: 'real world' objects, events and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an organization and ignores the accidental properties. A data model represents the organization itself. It should provide the basic concepts and notations that allow database designers and end users to communicate their understanding of the organizational data unambiguously and accurately. This chapter focuses on the popular data models, i.e. the Hierarchical Data Model, Network Data Model, and Relational Data Model. The terminologies of these models are discussed in detail using simple examples. Some other data models that can also be used to represent data are explained as well. A data model makes it easy to understand the actual meaning of data, to ensure that:

 Each user‘s requirement of data will be known.

 Nature of the data is independent from physical structure.

 How to use the data across the application program.

5.3 Presentation of Content

5.3.1 Data Model

The main objective of database system is to highlight only the essential features and to hide the

storage and data organization details from the user. This is known as data abstraction. A database

model provides the necessary means to achieve data abstraction. A database model or simply a

data model is an abstract model that describes how the data is represented and used. A data

model consists of a set of data structures and conceptual tools that are used to describe the

structure (data types, relationships, and constraints) of a database.


Depending on the concept they use to model the structure of the database, the data models are

categorized into three types, namely, high-level or conceptual data models, representational or

implementation data models and low-level or physical data models.

1) High-level or conceptual data models (based on entities & relationships) -

Conceptual data model describes the information used by an organization in a way that is

independent of any implementation-level issues and details. The main advantage of conceptual

data model is that it is independent of implementation details and hence, can be understood even

by the end users having non-technical background. The most popular conceptual data model is the

entity-relationship (E-R) model.

2) Low-level or physical data models - Physical data model describes the data in terms of

a collection of files, indices, and other storage structures such as record formats, record ordering,

and access paths. This model specifies how the database will be implemented in a particular DBMS

software such as Oracle, Sybase, etc., by taking into account the facilities and constraints of a

given database management system. It also describes how the data is stored on disk and what

access methods are available to it.

3) Representational or implementation data models (record-based, object-oriented) - The

representational or implementation data models hide some data storage details from the users;

however, they can be implemented directly on a computer system. Representational

data models are used most frequently in all traditional commercial DBMSs. The various

representational data models are discussed in section 5.3.4.

Some data models are schematics which depict the manner in which data records are connected

or related within a file structure. These are called record or structural data models. Some data

models are used to identify the subjects of corporate data processing - these are called entity-relationship

data models. Still another type of data model is used for analytic purposes, to help

the analyst solidify the semantics associated with critical corporate or business concepts.

5.3.2 Importance of Data Models

Data models are important to study, because of the following features:

I. Relatively simple representation, usually graphical, of complex real-world data structures

II. Communications tool to facilitate interaction among the designer, the applications

programmer, and the end user

III. Good database design uses an appropriate data model as its foundation

IV. End-users have different views and needs for data

V. Data model organizes data for various users

5.3.3 Data Model Evolution

Modern database implementation models were not created in a vacuum. Instead, they are the end result of decades of evolution, in the form of a series of progressively more sophisticated data models. Researchers designed these models with a view to achieving the following properties:

i. It should be able to represent all information diagrammatically.

ii. It should be simple and expressible to design the data in the database.

iii. There should be no redundancy in the data.

iv. It should be independent from application.

Some of the common characteristics among data models are given as follow:

i. Conceptual simplicity without compromising the semantic completeness of the database

ii. Represent the real world as closely as possible

iii. Representation of real-world transformations (behavior) must be in compliance with

consistency and integrity characteristics of any data model

Each new data model capitalized on the shortcomings of previous models.

[Figure: stages in the development of data models, in order of increasing semantics in the data model, with the notes attached to each stage.]

Hierarchical
     Difficult to represent M:N relationships (hierarchical model)
     Physical level dependency
Network
     No ad hoc queries
     Access path predefined
Relational
     Provides ad hoc queries
     Set-oriented access
     Weak semantic content
Entity-Relationship
     Easy to understand
     Incorporates more semantics
Semantic
     More semantics in data model
     Support for complex objects
     Inheritance
     Behaviour
Object-Oriented / Extended Relational (Object-Relational)

Figure 5.1: The Development of Data Models

5.3.4 Type of Data Models:

(i) Hierarchical Data Model

The hierarchical data model is the oldest type of data model, developed by IBM in 1968. This

data model organizes the data in a tree-like structure, in which each child node (also known as

dependents) can have only one parent node. The database based on the hierarchical data model

comprises a set of records connected to one another through links. The link is an association

between two or more records. The top of the tree structure consists of a single node that does not

have any parent and is called the root node.

The root may have any number of dependents; each of these dependents may have any number

of lower level dependents. Each child node can have only one parent node and a parent node can

have any number of (many) child nodes. It, therefore, represents only one-to-one and one-to-

many relationships. The collection of same type of records is known as a record type. Figure 5.2

shows the hierarchical model of Online Book database. It consists of three record types, namely,

PUBLISHER, BOOK, and REVIEW. For simplicity, only few fields of each record type are

shown. One complete record of each record type represents a node.

Advantages:

 It promotes data sharing.

 Parent/child relationship promotes conceptual simplicity.

 Database security is provided and enforced by DBMS.

 Parent/child relationship promotes data integrity.

 It is efficient with 1: m relationship.

Disadvantages:

 Complex implementation requires knowledge of physical data storage characteristics.

 Navigational system yields complex application development, management, and usage;

requires knowledge of hierarchical path.

 Changes in structure require changes in all applications.

 There are limitations with respect to its implementation.

 There is no data definition or data manipulation language in DBMS.

 There is a lack of standards.

Figure 5.2: Hierarchical Model of Online Book database

Sample Database for Hierarchical Data Model

In order to understand the hierarchical data model better, let us take the example of the sample

database consisting of supplier, parts and shipments. The record structure and some sample

records for supplier, parts and shipments elements are as given in following tables.

We assume that each row in Supplier table is identified by a unique SNo (Supplier Number) that

uniquely identifies the entire row of the table. Likewise each part has a unique Pno (Part

Number). Also we assume that no more than one shipment exists for a given supplier/part

combination in the shipments table.

Hierarchical View for the Suppliers-Parts Database

In the tree structure, the part record is superior to the supplier record; that is, parts form the parents and suppliers form the children. Each of the four trees in the figure consists of one part record occurrence, together with a set of subordinate supplier record occurrences. There is one supplier record for each supplier of a particular part. Each supplier occurrence includes the corresponding shipment quantity.

For example, supplier S3 supplies 300 units of part P2. Note that the set of supplier occurrences for a given part occurrence may contain any number of members, including zero (as in the case of part P4). Part P1 is supplied by two suppliers, S1 and S2. Part P2 is supplied by three suppliers, S1, S2 and S3, and part P3 is supplied only by supplier S1, as shown in the figure.
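
The hierarchical view described above can be sketched in Python as a dictionary of parts (parents), each holding a list of supplier records (children). This is only an illustrative sketch: the supplier/part combinations, the quantities 300 (S3, P2) and 250 (S2, P1) and the city Qadian for S1 come from the text, while the remaining quantities, the cities CityB and CityC, and the function names are assumptions.

```python
# Each part (parent) owns its supplier records (children); the shipment
# quantity is stored inside the child record, as in the hierarchical view.
parts = {
    "P1": [{"sno": "S1", "city": "Qadian", "qty": 100},   # qty assumed
           {"sno": "S2", "city": "CityB", "qty": 250}],
    "P2": [{"sno": "S1", "city": "Qadian", "qty": 200},   # qty assumed
           {"sno": "S2", "city": "CityB", "qty": 150},    # qty assumed
           {"sno": "S3", "city": "CityC", "qty": 300}],
    "P3": [{"sno": "S1", "city": "Qadian", "qty": 400}],  # qty assumed
    "P4": [],  # a parent may exist without any children
}


def suppliers_of(pno):
    """Walk down from a part to its child supplier records."""
    return [child["sno"] for child in parts[pno]]


def parts_supplied_by(sno):
    """The reverse question needs a scan of every tree."""
    return [pno for pno, children in parts.items()
            if any(child["sno"] == sno for child in children)]


def occurrences_of(sno):
    """Count how many copies of one supplier's data are stored."""
    return sum(child["sno"] == sno
               for children in parts.values() for child in children)
```

Going from a part to its suppliers is a direct lookup, whereas going from a supplier to its parts requires searching all trees, and supplier S1 is stored three times.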

Operations on Hierarchical Model

Four basic operations, Insert, Update, Delete and Retrieve, can be performed on each model. We now consider in detail how these basic operations are performed in the hierarchical database model.

Insert Operation: It is not possible to insert the information of a supplier, e.g. S4, who does not supply any part. This is because a child node cannot exist without a parent. However, a part P5 that is not supplied by any supplier can be inserted without any problem, because a parent can exist without any child. So, we can say that the insert anomaly exists only for those children that have no corresponding parent.
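
The insert anomaly can be illustrated with a small Python sketch. The structure below is a simplified stand-in for the hierarchical view (parts as parents, supplier numbers as children); the function names and the use of ValueError are assumptions made for illustration.

```python
# Parts are parents; each holds the supplier numbers stored under it.
parts = {"P1": ["S1", "S2"], "P2": ["S1", "S2", "S3"], "P3": ["S1"], "P4": []}


def insert_part(pno):
    """A parent (part) can be added even if it has no children yet."""
    parts.setdefault(pno, [])


def insert_supplier(sno, pno=None):
    """A child (supplier) can only be stored under an existing parent."""
    if pno is None or pno not in parts:
        raise ValueError(f"supplier {sno} has no parent part to be stored under")
    parts[pno].append(sno)


insert_part("P5")              # succeeds: P5 is not supplied by anyone yet

try:
    insert_supplier("S4")      # fails: S4 does not supply any part
    s4_stored = True
except ValueError:
    s4_stored = False
```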

Update Operation: Suppose we wish to change the city of supplier S1 from Qadian to Jalandhar. We then have to carry out two operations: searching for S1 under each part, and then updating every occurrence of S1. If any occurrence is missed, the database becomes inconsistent. But if we wish to change the city of part P1 from Qadian to Jalandhar, these problems do not occur, because there is only a single entry for part P1 and the problem of inconsistency does not arise. So, we can say that update anomalies exist only for children and not for parents, because children may have multiple entries in the database.
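
A minimal Python sketch of the update problem follows. The city Qadian for S1 is from the text; the cities CityB and CityC and the function names are assumptions.

```python
# Supplier data is repeated under every part the supplier supplies.
parts = {
    "P1": [{"sno": "S1", "city": "Qadian"}, {"sno": "S2", "city": "CityB"}],
    "P2": [{"sno": "S1", "city": "Qadian"}, {"sno": "S2", "city": "CityB"},
           {"sno": "S3", "city": "CityC"}],
    "P3": [{"sno": "S1", "city": "Qadian"}],
}


def update_supplier_city(sno, new_city):
    """Search every tree and change each occurrence; return how many were changed."""
    touched = 0
    for children in parts.values():
        for child in children:
            if child["sno"] == sno:
                child["city"] = new_city
                touched += 1
    return touched


def cities_of(sno):
    """All city values currently stored for one supplier (more than one means inconsistency)."""
    return {child["city"] for children in parts.values()
            for child in children if child["sno"] == sno}
```

Changing the city of S1 touches three separate records; the data stays consistent only if every one of them is updated.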

Delete Operation: In the hierarchical model, quantity information is incorporated into the supplier record. Hence, the only way to delete a shipment (or supplied quantity) is to delete the corresponding supplier record. But such an action leads to loss of the supplier's information, which is not desired. For example, if supplier S2 stops supplying 250 units of part P1, then the whole record of S2 under part P1 has to be deleted, which may lead to loss of the supplier's information. Another problem arises if we wish to delete the information of a part and that part happens to be the only part supplied by some supplier. In the hierarchical model, deletion of a parent also causes the deletion of its child records, and if a child occurrence is the only occurrence in the whole database, then the information of that child is also lost with the deletion of the parent. For example, if we wish to delete the information of part P2, then we also lose the information of suppliers S3, S2 and S1 stored under it. The information of S2 and S1 can still be obtained from P1, but the information about supplier S3 is lost with the deletion of the record for P2.
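
The loss of supplier S3 can be reproduced with a short Python sketch. The quantities 250 and 300 are from the text; the other quantities and the function names are assumptions.

```python
# Each part maps the suppliers stored under it to the supplied quantity.
parts = {
    "P1": {"S1": 100, "S2": 250},              # 100 assumed
    "P2": {"S1": 200, "S2": 150, "S3": 300},   # 200 and 150 assumed
    "P3": {"S1": 400},                         # 400 assumed
    "P4": {},
}


def known_suppliers():
    """Every supplier that still appears somewhere in the database."""
    return {sno for children in parts.values() for sno in children}


def delete_part(pno):
    """Deleting a parent removes every child record stored under it."""
    del parts[pno]


before = known_suppliers()
delete_part("P2")
after = known_suppliers()
lost = before - after   # suppliers whose only occurrence was under P2
```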

Record Retrieval: Record retrieval methods for hierarchical model are complex and

asymmetric.

(ii) Network Data Model

The first specification of network data model was presented by Conference on Data Systems

Languages (CODASYL) in 1969, followed by the second specification in 1971. It is powerful

but complicated. In a network model the data is also represented by a collection of records, and

relationships among data are represented by links. However, the link in a network data model

represents an association between precisely two records. Thus, the complete network of

relationships is represented by several pairwise sets; in each set some (one) record type is owner

(at the tail of the network arrow) and one or more record types are members (at the head of the

relationship arrow). Usually, a set defines a 1: M relationship, although 1:1 is permitted. Like

hierarchical data model, each record of a particular record type represents a node. However,

unlike hierarchical data model, all the nodes are linked to each other without any hierarchy. The

main difference between hierarchical and network data model is that in hierarchical data model,

the data is organized in the form of trees and in network data model, the data is organized in the

form of graphs. Figure 5.3 shows the network model of Online Book database.

Advantages:

 Conceptual simplicity is at least equal to that of the hierarchical model.

 It handles more relationship types, such as m:n and multi-parent.

 Data access is more flexible.

 The owner/member relationship promotes data integrity.

 There is conformance to standards.

 It includes data definition language (DDL) and data manipulation language (DML)

Disadvantages:

 System complexity limits efficiency.

 Navigational system yields complex implementation, application development and

management.

 Structural changes require changes in all application programs.

Figure 5.3: Network Model of Online Book Database

Network view of Sample Database

Considering again the sample supplier-part database, its network view is shown. In addition to

the part and supplier record types, a third record type is introduced, which we will call the

connector. A connector occurrence specifies the association (shipment) between one supplier and

one part. It contains data (the quantity of the parts supplied) describing the association between

supplier and part records.

All connector occurrences for a given supplier are placed on a chain. The chain starts from a

supplier and finally returns to the supplier. Similarly, all connector occurrences for a given part

are placed on a chain starting from the part and finally returning to the same part.

Operations on Network Model

Detailed description of all basic operations in Network Model is as under:

Insert Operation: To insert a new record containing the details of a new supplier, we simply

create a new record occurrence. Initially, there will be no connector. The new supplier's chain

will simply consist of a single pointer starting from the supplier to itself.

For example, a supplier S4 that does not supply any part can be inserted in the network model as a

new record occurrence with a single pointer from S4 to itself. This is not possible in the

hierarchical model. Similarly, a new part that is not supplied by any supplier can be inserted.

Consider another case: if supplier S1 now starts supplying part P3 with quantity 100, then a new

connector containing 100 as the supplied quantity is added to the model and the pointers of S1

and P3 are modified as shown below.

To summarize, the network model has no insert anomalies, unlike the hierarchical model.

Update Operation: Unlike the hierarchical model, where updating was carried out by search and

had many inconsistency problems, in a network model updating a record is a much easier

process. We can change the city of S1 from Qadian to Jalandhar without search or inconsistency

problems because the city for S1 appears at just one place in the network model. The same

operation is performed to change any attribute of a part.

Delete Operation: If we wish to delete the information of any part, say P1, then that record

occurrence can be deleted by removing the corresponding pointers and connectors, without

affecting the suppliers who supply that part; the model is modified as shown. The

same operation is performed to delete the information of a supplier.

In order to delete the shipment information, the connector for that shipment and its

corresponding pointers are removed without affecting supplier and part information.

For example, if supplier S1 stops the supply of part P1 with quantity 250, the model is modified as

shown below without affecting P1 and S1 information.

Retrieval Operation: Record retrieval methods for network model are symmetric but complex.
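
The insert, update and delete operations described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: connectors are kept in a plain list and a chain is obtained by filtering that list rather than by following circular pointers, and the names supplier_chain and part_chain are hypothetical helpers, not part of any CODASYL system.

```python
# Sketch of the supplier-part network model: suppliers and parts are owner
# records, and each connector links exactly one supplier to one part.
# Assumption: chains are derived by filtering a connector list instead of
# following circular pointers as a real network DBMS would.

suppliers = {"S1": {"city": "Qadian"}}
parts = {"P1": {}, "P3": {}}
connectors = [{"supplier": "S1", "part": "P1", "qty": 250}]


def supplier_chain(s_id):
    """Return the connectors on the chain owned by supplier s_id."""
    return [c for c in connectors if c["supplier"] == s_id]


def part_chain(p_id):
    """Return the connectors on the chain owned by part p_id."""
    return [c for c in connectors if c["part"] == p_id]


# Insert: a supplier with no shipments is just a record with an empty chain.
suppliers["S4"] = {"city": None}

# Insert: S1 starts supplying P3 with quantity 100 (one new connector).
connectors.append({"supplier": "S1", "part": "P3", "qty": 100})

# Update: the city of S1 is stored in exactly one place.
suppliers["S1"]["city"] = "Jalandhar"

# Delete a shipment: remove the S1-P1 connector only.
connectors[:] = [c for c in connectors
                 if not (c["supplier"] == "S1" and c["part"] == "P1")]

print(supplier_chain("S4"))  # chain of the supplier that supplies nothing
print(supplier_chain("S1"))  # connectors still owned by S1
print("P1" in parts)         # whether the part record survived the deletion
```

Because the shipment data lives only in the connector, removing it leaves both the supplier record and the part record untouched, which is the absence of deletion anomalies described above.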

(iii) Relational Data Model

The relational data model was developed by E. F. Codd in 1970. In the relational data model,

unlike the hierarchical and network models, there are no physical links. All data is maintained in

the form of tables (generally, known as relations) consisting of rows and columns. Each row

(record) represents an entity and a column (field) represents an attribute of the entity. The

relationship between the two tables is implemented through a common attribute in the tables and

not by physical links or pointers. This makes the querying much easier in a relational database

system than in the hierarchical or network database systems. Thus, the relational model has

become more programmer friendly and much more dominant and popular in both industrial and

academic scenarios. Oracle, Sybase, DB2, Ingres, Informix and MS SQL Server are a few of the

popular relational DBMSs. Figure 5.4 shows the relational model of Online Book database.
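
As a minimal sketch of how a common attribute replaces physical links, the following Python example uses the standard sqlite3 module. The schema is an assumption made for illustration: the p_id column in the book table, and the assignment of each book to a publisher, are hypothetical and are not given in the text.

```python
import sqlite3

# Hypothetical schema: the p_id column in book is an assumption added here
# so that the two tables share a common attribute.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (p_id TEXT PRIMARY KEY, pname TEXT, phone TEXT);
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, p_id TEXT);
    INSERT INTO publisher VALUES ('P001', 'Hills Publication', '7134019');
    INSERT INTO publisher VALUES ('P002', 'Pearson Education', '2134562');
    INSERT INTO book VALUES ('001-354-921-1', 'Ransack', 'P001');
    INSERT INTO book VALUES ('000-987-760-9', 'C++', 'P002');
""")

# The relationship is resolved by comparing attribute values,
# not by following pointers stored in the records.
rows = conn.execute("""
    SELECT book.title, publisher.pname
    FROM book JOIN publisher ON book.p_id = publisher.p_id
    ORDER BY book.title
""").fetchall()

for title, pname in rows:
    print(title, "is published by", pname)
conn.close()
```

The query names only tables and columns; how the rows are located on disk is left entirely to the DBMS.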

Properties of Relational Tables:

• Values are atomic in nature.

• Each row is unique.

• Column values are of the same kind.

• The sequence of columns is insignificant.

• The sequence of rows is insignificant.

• Each column has a unique name.

Advantages:

• Structural independence is promoted by the use of independent tables. Changes in a table's

structure do not affect data access or application programs.

• Tabular view substantially improves conceptual simplicity, thereby promoting easier

database design, implementation, management and use.

• Ad hoc query capability is based on SQL.

• A powerful RDBMS isolates the end user from physical-level details and improves

implementation and management simplicity.

Entity: Publisher

P_ID    Pname                 Phone
P001    Hills Publication     7134019
P002    Pearson Education     2134562
P003    Khanna Publication    7876543

Entity: Book

ISBN             Book       Price
001-354-921-1    Ransack    22/=
000-987-760-9    C++        25/=

Entity: Review

R_ID     Rating
A0002    6.0
A0006    7.5

Figure 5.4: Relational Model of Online Book Database

Disadvantages:

• The RDBMS requires substantial hardware and system software overhead.

• Conceptual simplicity gives relatively untrained people the tools to use a good system

poorly, and if unchecked, it may produce the same data anomalies found in file systems.

• It may promote islands of information.

(iv) Comparison between hierarchical model, network model and relational model

Comparing the hierarchical, network and relational data models, we can identify a number of

differences in terms of data structure, data manipulation and data integrity.

Characteristic: Data structure
  Hierarchical model: Data is stored in a structure that looks like a family tree, with one root and a number of branches or subdivisions. Record types are therefore organized in the form of a rooted tree.
  Network model: Multiple branches can emanate from one or more nodes, so the structure sometimes looks like several trees that share branches.
  Relational model: Based on relations among entities. A relation is a two-dimensional table, which can represent either information about an entity or a relationship between entities.

Characteristic: Data structure
  Hierarchical model: One-to-many or one-to-one relationships.
  Network model: Supports many-to-many relationships.
  Relational model: One-to-one, one-to-many and many-to-many relationships.

Characteristic: Data structure
  Hierarchical model: Based on the parent-child relationship.
  Network model: A record can have many parents as well as many children.
  Relational model: Based on relational data structures.

Characteristic: Data manipulation
  Hierarchical model: Does not provide an independent stand-alone query interface.
  Network model: CODASYL (Conference on Data Systems Languages).
  Relational model: Relational databases bring many sources together under a common query language (such as SQL).

Characteristic: Data manipulation
  Hierarchical model: Retrieval algorithms are complex and asymmetric.
  Network model: Retrieval algorithms are complex and symmetric.
  Relational model: Retrieval algorithms are simple and symmetric.

Characteristic: Data integrity
  Hierarchical model: Cannot insert the information of a child that does not have a parent.
  Network model: Does not suffer from any insertion anomaly.
  Relational model: Does not suffer from any insertion anomaly.

Characteristic: Data integrity
  Hierarchical model: Multiple occurrences of child records lead to problems of inconsistency during the update operation.
  Network model: Free from update anomalies.
  Relational model: Free from update anomalies.

Characteristic: Data integrity
  Hierarchical model: Deletion of a parent results in deletion of its child records.
  Network model: Free from deletion anomalies.
  Relational model: Free from deletion anomalies.

(v) Entity-Relationship Data Model

The Entity-Relationship Model (E-R Model) is a high-level conceptual data model developed

by Chen in 1976 to facilitate database design. Conceptual modeling is an important phase in

designing a successful database. A conceptual data model is a set of concepts that describe the

structure of a database and the associated retrieval and update transactions on the database. A

high-level model is chosen so that all the technical aspects are also covered. The E-R data model grew

out of the exercise of using commercially available DBMSs to model the database. The E-R

model is a generalization of the earlier commercial models such as the Hierarchical and

the Network Model. It also allows the representation of various constraints as well as their

relationships. The E-R Model is based on a view of the real

world as a set of objects called entities, grouped into entity sets of similar objects, and relationships

among those entity sets. A relationship between entity sets is represented by a

named E-R relationship and is of type 1:1, 1:N or M:N, which describes the mapping from one entity

set to another. The E-R model has one important advantage: inasmuch as it is not

DBMS-specific, and is in fact not a DBMS model at all, data models can be developed by the

design team without first having to choose which DBMS to use.

Features of the E-R Model:

1. The E-R diagram used for representing E-R Model can be easily converted into Relations

(tables) in Relational Model.

2. The E-R Model is used by database developers to produce a good database design, which can

then be implemented in various DBMSs.

3. It is helpful as a problem decomposition tool as it shows the entities and the relationship

between those entities.

4. It is inherently an iterative process. On later modifications, the entities can be inserted into this

model.

5. It is very simple and easy to understand by various types of users and designers because

specific standards are used for their representation.

(vi) Object-Oriented Model

Object DBMSs add database functionality to object-oriented programming languages. They

bring much more than persistent storage to programming language objects. Object DBMSs

extend the semantics of the C++, Smalltalk and Java object programming languages to provide

full-featured database programming capability, while retaining native language compatibility. A

major benefit of this approach is the unification of the application and database development into

a seamless data model and language environment. As a result, applications require less code, use

more natural data modeling, and code bases are easier to maintain. Object developers can write

complete database applications with a modest amount of additional effort.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of

object-oriented programming language (OOPL) systems and persistent systems. The power of

the OODB comes from the seamless treatment of both persistent data, as found in databases, and

transient data, as found in executing programs."

In contrast to a relational DBMS, where a complex data structure must be flattened out to fit into

tables or joined together from those tables to form the in-memory structure, object DBMSs

avoid this mapping overhead when storing or retrieving a web or hierarchy of interrelated objects. This

one-to-one mapping of object programming language objects to database objects has two benefits

over other storage approaches: it provides higher performance management of objects, and it

enables better management of the complex interrelationships between objects. This makes object

DBMSs better suited to support applications such as financial portfolio risk analysis systems,

telecommunications service applications, world wide web document structures, design and

manufacturing systems, and hospital patient record systems, which have complex relationships

between data.

Advantages:

• Semantic content is added.

• Visual representation includes semantic content.

• Inheritance promotes data integrity.

Disadvantages:

• Slow development of standards caused vendors to supply their own enhancements, thus

eliminating widely accepted standards.

• It is a complex navigational system.

• There is a steep learning curve.

• High system overhead slows transactions.

(vii) Object/Relational Model

Object/relational database management systems (ORDBMSs) add new object storage capabilities

to the relational systems at the core of modern information systems. These new facilities

integrate management of traditional fielded data, complex objects such as time-series and

geospatial data and diverse binary media such as audio, video, images, and applets. By

encapsulating methods with data structures, an ORDBMS server can execute complex analytical

and data manipulation operations to search and transform multimedia and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the robust

transaction- and performance-management features of its relational ancestor and the flexibility

of its object-oriented cousin. Database designers can work with familiar tabular structures and

data definition languages (DDLs) while assimilating new object-management possibilities.

Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor

procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of

RDBMS languages and interfaces. And the leading vendors are, of course, quite well known:

IBM, Informix, and Oracle.

(viii) Semi-structured Data Model

Unlike other data models, where every data item of a particular type must have the same set of

attributes, the semi-structured data model allows individual data items of the same type to have

different sets of attributes. In the semi-structured data model, the description of

the data (the schema) is contained within the data itself, which is why such data is sometimes called

self-describing data. In such databases there is no clear separation between the data and the schema,

which allows data of any type to be stored. The semi-structured data model has recently emerged as an

important topic of study for the reasons given here.

• Data sources such as the Web need to be treated as databases, but

they cannot be constrained by a schema.

• A flexible format is needed for data exchange between heterogeneous databases.

• It facilitates browsing of data.

The semi-structured data model facilitates data exchange among heterogeneous data sources. It helps

to discover new data easily and store it. It also facilitates querying the database without knowing

the data types. However, it loses the data type information.
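
A small Python sketch may make the idea of self-describing data concrete. JSON is used here only as one familiar semi-structured format (an assumption; the text does not name a format), and all field names and values are illustrative.

```python
import json

# Two items of the same type ("book") with different sets of attributes.
# The attribute names travel with the data, so no separate schema is needed.
# All field names and values here are illustrative.
books = [
    {"title": "Ransack", "price": 22},
    {"title": "C++", "price": 25, "reviews": [{"rating": 7.5}]},
]

# Exchange format: the structure is recoverable from the text itself.
text = json.dumps(books)
restored = json.loads(text)

# Querying without a fixed schema: tolerate missing attributes.
titles_with_reviews = [b["title"] for b in restored if "reviews" in b]
print(titles_with_reviews)

# The set of attributes differs from item to item.
print([sorted(b.keys()) for b in restored])
```

The second item carries an attribute that the first lacks, and the query has to check for its presence because nothing guarantees it.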

5.3.5 Usage of Data Model

(i) Useful for the Personnel: Now that you know the different data models, it is worth

knowing how data models are useful in a DBMS. A DBMS installation includes

database administrators and managers who oversee the entire operation of the DBMS. Their primary

duties are making sure the primary schedule is run daily, loading program releases and

maintaining database records. The application development staff consists of system analysts, computer

technicians and programmers, whose job includes finding errors in the software through testing.

(ii) Useful for Records Interrogation: Programs for records interrogation are designed to

provide information to end users through programs such as general inquiry programs,

report generators and query programs. The query program is the most popular one: it lets end users

construct simple data-extraction programs with a query language processor, developing basic

programming skills along the way. For records interrogation, query programs are

quite powerful.

(iii) Useful for Catalog Programs: In a DBMS, end users are able to catalog their

favorite programs for deleting, editing or viewing data. Each user can copy routines to a

user-defined catalog file for managing databases. The catalog is a personal tool

that lets end users run programs without having an applications specialist design programs

for them.

(iv) Useful for Accessing Data: DBMS systems typically have centralized

databases. End users can access the databases without an application developer

needing to write access programs and without intervention from a programmer. The record

structures and the database are already built into the software. The advantage in this area is direct

access to data records and structures.

5.4 Summary

Models are a blueprint of a plan and play a major role in the success of any project. In fact, models are

more suitable than pictures for expressing an idea because they add logic and reasoning

to the picture. Following this idea, an early

proposal for a standard terminology and general architecture of a database system was

produced in 1971 by the DBTG (Data Base Task Group) appointed by the Conference on Data

Systems Languages (CODASYL). The DBTG recognized the need for a two-level approach

with a system view called the schema and a user view called the subschema. The American National

Standards Institute Standards Planning and Requirements Committee (ANSI/SPARC) in 1975 recognized

the need for a three-level approach with a system catalog. The relational data model, developed by

E. F. Codd in 1970, replaced the physical links of the hierarchical and network models with tables.

As technology improved and new needs arose, further models were introduced,

as discussed in this chapter.

5.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database Systems, 3rd Edition, Addison Wesley,

New Delhi.

2. Korth & Silberschatz: Database System Concepts, 4th Edition, McGraw Hill International

Edition.

3. S. K. Singh: Database Systems: Concepts, Design and Applications, 2006, Pearson

Education, ISBN: 81-7758-567-3.

4. C. J. Date: An Introduction to Database Systems, 7th Edition, Addison Wesley, New Delhi.

5. Alexis Leon, Mathews Leon: Database Management Systems, Vikas Publishing House

Pvt. Ltd., ISBN: 0-81-259-1165-0.

5.6 Self Assessment Questions (SAQ)

1. What do you mean by a Data Model? Discuss the different types of data models used.

2. Define data model. Under which categories are data models broadly classified? What is

the importance of a data model?

3. Discuss the different data models along with their advantages and disadvantages.

4. Draw a comparative chart of the Hierarchical, Network and Relational data models.

5. Discuss the importance of a data model. How is it useful for an organization?

Chapter – 6: Relational Algebra

Writer: Dr. Kanwal Garg


Vetter: Prof. Rajender Nath
Structure:
6.1 Introduction
6.2 Objective
6.3 Presentation of Content
6.3.1 Relational Algebra
6.3.2 Uses of Relational Algebra
6.3.3 Relational-Oriented Operation
(i) Select Operation
(ii) Project Operation
(iii) Join Operation
(iv) Combining Operations
6.3.4 Set-oriented Operations
(i) Union
(ii) Intersection
(iii) SET Difference
(iv) Cartesian Join
(v) Division
6.3.5 Sample Queries Using Relational Algebra
6.3.6 Equivalence
6.3.7 Comparing Relational Algebra and SQL
6.4 Summary
6.5 Suggested Reading/ Reference Material
6.6 Self Assessment Questions (SAQ)

6.1 Introduction

Relational algebra is a formal language describing how new relations are created from old ones.

It is a useful tool for describing queries on a database management system. In addition to defining the

data structure and constraints, a data model must include a set of operations to manipulate the data. A

basic set of relational model operations constitutes the relational algebra. In modern relational

database management systems each relation is stored as a table. Each row in the table represents

one tuple from the relation, and each column one attribute. In that sense, we can think of

relational algebra as a language that can be used to describe operations for creating new tables

from existing tables in a database management system.

6.2 Objective

Relational algebra is a query language that is being used to explain basic relational operations

and their principles. Most of the currently used relational database management systems work

with SQL queries. Relational algebra is a good example of procedural language. This helps in

understanding the essential features of the relational model. Relational algebra explores the

various concepts of integrity that apply to the relational model. This chapter explains

how a relational algebra expression interrogates a relational database.

6.3 Presentation of Content

6.3.1 Relational Algebra

Relational algebra is a formal language describing how new relations are created from old ones.

It is a useful tool for describing queries on a database management system. In relational database

management systems each relation is stored as a table. Each row in the table represents one tuple

from the relation, and each column one attribute. In that sense, we can think of relational

algebra as a language that can be used to describe operations for creating new tables from

existing tables in a database management system. Given below are some important points about

relational algebra and the reasons it is studied.

• It is similar to normal algebra (as in 2+3*x-y), except that we use relations as values instead of

numbers, and the operations and operators are different.

• It is not used as a query language in actual DBMSs (SQL is used instead).

• The inner, lower-level operations of a relational DBMS are, or are similar to, relational

algebra operations. We need to know about relational algebra to understand query

execution and optimization in a relational DBMS.

• Some advanced SQL queries require explicit relational algebra operations, most

commonly outer join.

• Relations are seen as sets of tuples, which means that no duplicates are allowed. SQL

behaves differently in some cases. Remember the SQL keyword DISTINCT.

• SQL is declarative, which means that you tell the DBMS what you want, but not how it is

to be calculated. A C++ or Java program is procedural, which means that you have to

state, step by step, exactly how the result should be calculated. Relational algebra is

more procedural than SQL. (Strictly speaking, relational algebra consists of mathematical expressions.)
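
The point about duplicates can be checked with a short experiment. The sketch below uses Python's standard sqlite3 module with a small illustrative table (the table name and rows are assumptions chosen for the demonstration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER, city TEXT);
    INSERT INTO student VALUES (101, 'Panipat');
    INSERT INTO student VALUES (102, 'Kurukshetra');
    INSERT INTO student VALUES (104, 'Kurukshetra');
""")

# A plain SELECT keeps duplicate rows (a bag, not a set).
bag = conn.execute("SELECT city FROM student ORDER BY city").fetchall()

# DISTINCT gives the set semantics of a relational algebra result.
as_set = conn.execute(
    "SELECT DISTINCT city FROM student ORDER BY city").fetchall()

print(bag)
print(as_set)
conn.close()
```

The first query returns one row per student, so a city shared by two students appears twice; the second returns each city once, as a relational algebra projection would.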

6.3.2 Use of Relational Algebra

The schema of a relation is similar to the format or structure of a table. The schema of a relation

is the set of attributes that forms a tuple for that relation. For example, the following describes

the schema for the relation student:

Student(student #, first name, last name, street address, city, state, zip, phone, major, GPA)

When we wish to obtain information from a database, we use a language like SQL to create a

query for the database management system to process. The response from the database will be a

result set with a particular schema. The definition above says "Relational algebra is a formal

language describing how new relations are formed from existing relations." If we think of the

tables in the database as the "existing relations", and the result set of the query as the "new

relations", then relational algebra is a language that can be used to describe database queries that

will return a result set from the existing database.

result set = query (existing database)

Relational algebra is a procedural query language, which takes instances of relations as input and

yields instances of relations as output. It uses operators to perform queries. An operator can be

either unary or binary. The algebra operation thus produces new relations, which can be further

manipulated using operations of the same algebra. Relational algebra specifies the operations to

perform on existing relations to derive result relations. It defines the complete schema for each of

the result relations. The relational algebraic operations can be divided into two basic groups.

1) Relational-Oriented Operations

2) Set-Oriented Operations

6.3.3 Relational-Oriented Operations

This group consists of operations developed specifically for relational databases; these include

SELECT, PROJECT and JOIN. These operations are explained in upcoming paragraphs:

(i) The SELECT Operation (σ)

This operation is used to select only those tuples from a relation that satisfy a selection

condition (predicate), as shown in the figure given below. It can be considered a filter that keeps rows of

a relation on the basis of certain criteria. In relational algebra the SELECT operation is denoted by

the symbol σ (sigma).

In general, the SELECT operation is written as

σ <selection condition> (R)

where σ is the symbol denoting the SELECT operation, and the selection condition is a Boolean

expression specified on the attributes of relation R.

Figure 6.1: Select Operation

Consider a relation EMPLOYEE. We can retrieve the rows of employees who work for

the Finance department, or those who earn a salary of more than Rs. 25000/=. We can specify each

of these two conditions individually with a SELECT operation as follows.

σ DEP = 'Finance' (EMPLOYEE)

σ SALARY > 25000 (EMPLOYEE)

The Boolean expression specified in <selection condition> is made up of an attribute name, a comparison

operator, and a constant value or another attribute name. In the example above, SALARY is the attribute

name, the operator is greater than (>) and 25000 is a constant value.

The above two algebraic expressions are equivalent to the following SQL statements:

SELECT * FROM EMPLOYEE

WHERE DEP = 'Finance';

SELECT * FROM EMPLOYEE

WHERE SALARY > 25000;

The SELECT operation is unary, that is, it is applied to a single relation. The degree of the

relation resulting from the SELECT operation (its number of attributes) is the same as

the degree of relation R.

Example

Query: Retrieve the ID, name, age and city of students who live in Kurukshetra.

STUDENT

ID     NAME            AGE    CITY
101    Ram Kumar       24     Panipat
102    Prem Lata       21     Kurukshetra
103    Pankaj Garg     20     Yamuna Nagar
104    Hitesh Goyal    22     Kurukshetra

σ CITY = 'Kurukshetra' (STUDENT)

Result

ID     NAME            AGE    CITY
102    Prem Lata       21     Kurukshetra
104    Hitesh Goyal    22     Kurukshetra
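
The SELECT example above can be reproduced with a short Python sketch. Representing a relation as a list of dictionaries and the helper name select are choices made here for illustration; they are not part of relational algebra itself.

```python
# A relation modelled as a list of dicts (one dict per tuple).
STUDENT = [
    {"ID": 101, "NAME": "Ram Kumar", "AGE": 24, "CITY": "Panipat"},
    {"ID": 102, "NAME": "Prem Lata", "AGE": 21, "CITY": "Kurukshetra"},
    {"ID": 103, "NAME": "Pankaj Garg", "AGE": 20, "CITY": "Yamuna Nagar"},
    {"ID": 104, "NAME": "Hitesh Goyal", "AGE": 22, "CITY": "Kurukshetra"},
]


def select(relation, predicate):
    """sigma: keep the tuples for which predicate(tuple) is true."""
    return [t for t in relation if predicate(t)]


result = select(STUDENT, lambda t: t["CITY"] == "Kurukshetra")
for t in result:
    print(t["ID"], t["NAME"], t["AGE"], t["CITY"])
```

Every tuple in the result keeps all four attributes, so the degree of the result equals the degree of STUDENT; only the number of tuples changes.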

(ii) The Project Operation (π)

The project operation selects certain columns from the table and discards the other columns. The

projection operation is used to either reduce the number of attributes in the resultant relation or to

reorder attributes. The projection of a relation is defined as a projection of all tuples over some

set of attributes. In relational algebra PROJECT operation is denoted by the symbol π (pi).

In general, the PROJECT operation is written as

π <attribute list> (R)

where π is the symbol denoting the PROJECT operation, and the attribute list is a list of attributes from

the relation R that are to be chosen for display.

Figure 6.1: Project Operation

Consider a relation EMPLOYEE. We can retrieve only some columns (say NAME, AGE and SALARY,

or SEX and SSN) of the EMPLOYEE table. The corresponding PROJECT expressions are given as follows:

π NAME, AGE, SALARY (EMPLOYEE)

π SEX, SSN (EMPLOYEE)

The result of the PROJECT operation has only the attributes specified in <attribute list>, in the

same order as they appear in the list. The PROJECT operation removes any duplicate tuples.

This is also known as duplicate elimination.

The PROJECT operation is unary, that is, it is applied to a single relation. The number of

tuples in a relation resulting from the PROJECT operation is always less than or equal to the

number of tuples in relation R.

The above algebraic expressions correspond to the following SQL statements (SQL keeps duplicate

rows unless DISTINCT is specified):

SELECT NAME, AGE, SALARY FROM EMPLOYEE;

SELECT SEX, SSN FROM EMPLOYEE;

Example

Query: Retrieve the ID and name of students.

STUDENT

ID     NAME            AGE    CITY
101    Ram Kumar       24     Panipat
102    Prem Lata       21     Kurukshetra
103    Pankaj Garg     20     Yamuna Nagar
104    Hitesh Goyal    22     Kurukshetra

π ID, NAME (STUDENT)

Result

ID     NAME
101    Ram Kumar
102    Prem Lata
103    Pankaj Garg
104    Hitesh Goyal
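
The PROJECT example can be sketched in Python as follows. The tuple-based representation and the helper name project are illustrative assumptions; the second call, which projects on CITY alone, is added here to show duplicate elimination and does not appear in the text.

```python
ATTRS = ("ID", "NAME", "AGE", "CITY")
STUDENT = [
    (101, "Ram Kumar", 24, "Panipat"),
    (102, "Prem Lata", 21, "Kurukshetra"),
    (103, "Pankaj Garg", 20, "Yamuna Nagar"),
    (104, "Hitesh Goyal", 22, "Kurukshetra"),
]


def project(relation, attrs, wanted):
    """pi: keep only the wanted columns, in the order listed, without duplicates."""
    positions = [attrs.index(name) for name in wanted]
    result = []
    for row in relation:
        projected = tuple(row[i] for i in positions)
        if projected not in result:  # duplicate elimination
            result.append(projected)
    return result


id_name = project(STUDENT, ATTRS, ("ID", "NAME"))
cities = project(STUDENT, ATTRS, ("CITY",))
print(id_name)
print(cities)
```

Projecting on ID and NAME keeps all four tuples because ID is unique, while projecting on CITY alone yields fewer tuples than STUDENT has, since the two Kurukshetra rows collapse into one.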

(iii) The JOIN Operation (⋈)

The JOIN operation is used to combine related tuples from two relations into single tuples. The

tuples from the operand relations that participate in the operation and contribute to the result are

related. The JOIN operation allows the processing of relationships existing between the operand

relations. In relational algebra the JOIN operation is denoted by the symbol ⋈.

In general, the JOIN operation is written as

(R1) ⋈ <join condition> (R2)

where ⋈ is the symbol denoting the JOIN operation, and the join condition is a Boolean expression

specified on attributes of the relations R1 and R2.

Suppose we want to retrieve the name of the manager of each department. We need to combine each

department tuple with the employee tuple whose SSN matches the MGRSSN value in the

department tuple. In relational algebra this is represented by

DEPT_MGR ← DEPARTMENT ⋈ MGRSSN = SSN EMPLOYEE

RESULT ← π DNAME, LNAME, FNAME (DEPT_MGR)

The above algebraic expressions correspond to the following SQL statement:

SELECT DNAME, LNAME, FNAME FROM DEPARTMENT, EMPLOYEE

WHERE MGRSSN = SSN;

The JOIN operation is binary, that is, it is always applied to two relations. The result of joining

R1(A1, A2, ..., An) with R2(B1, B2, ..., Bm) is a relation Q with n + m attributes, Q(A1, A2, ..., An, B1, B2, ..., Bm).

Most commonly, a JOIN uses equality comparisons only. A JOIN in which the

only comparison operator used is equality (=) is called an EQUIJOIN. The result of an EQUIJOIN

always has one or more pairs of attributes that have identical values in every tuple.

A NATURAL JOIN is an EQUIJOIN on join attributes that have the same name in both relations, with

each shared attribute included only once in the result. NATURAL JOIN is denoted by the symbol *.

Example

Query: Retrieve the information of students who are enrolled in at least one course.

STUDENT

ID     NAME       CITY
100    Vinod      Kaithat
200    Jagdish    Hisar
300    Armaan     Hisar
400    Mandeep    Sirsa

ENROL

EID    COURSEID
200    101
100    113
300    101

(STUDENT) ⋈ <STUDENT.ID = ENROL.EID> (ENROL)

Result

ID     NAME       CITY     COURSEID
100    Vinod      Kaithat  113
200    Jagdish    Hisar    101
300    Armaan     Hisar    101
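
The join example can be sketched in Python as a nested loop over the two relations. The helper name join is hypothetical, and the condition matches STUDENT.ID with ENROL.EID, which is the pairing the result table reflects.

```python
STUDENT = [
    {"ID": 100, "NAME": "Vinod", "CITY": "Kaithat"},
    {"ID": 200, "NAME": "Jagdish", "CITY": "Hisar"},
    {"ID": 300, "NAME": "Armaan", "CITY": "Hisar"},
    {"ID": 400, "NAME": "Mandeep", "CITY": "Sirsa"},
]
ENROL = [
    {"EID": 200, "COURSEID": 101},
    {"EID": 100, "COURSEID": 113},
    {"EID": 300, "COURSEID": 101},
]


def join(r1, r2, condition):
    """Nested-loop join: pair every tuple of r1 with every tuple of r2
    and keep the combined tuple when condition(a, b) is true."""
    return [{**a, **b} for a in r1 for b in r2 if condition(a, b)]


joined = join(STUDENT, ENROL, lambda s, e: s["ID"] == e["EID"])
for t in joined:
    print(t["ID"], t["NAME"], t["CITY"], t["COURSEID"])
```

Student 400 has no matching ENROL tuple and therefore does not appear in the result. The combined tuples also carry EID, which the result table omits because it always equals ID.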

There are several types of joins, but the most basic type is the Cartesian join. Other joins,

including the natural join, the equi-join and the theta join, are variations of the Cartesian join in

which special rules are applied. The Cartesian product will be discussed later in this chapter. The

remaining join types are discussed below:

a) Natural Join

A natural join is performed on two relations that share at least one attribute, and is defined as

follows:

The natural join of relation A with relation B is a new relation formed by matching all tuples

from relation A one by one with all tuples from relation B one by one, but only where the value

of the shared attributes are the same. Each shared attribute is only included once in the schema

of the result set. A natural join can only be performed on two relations that have at least one

shared attribute.

The symbol for a natural join is ⋈. We would write C = A ⋈ B.

b) Theta Join

A theta join is similar to a Cartesian join, except that only those tuples are included that meet a

specified condition, as follows:

The theta join of relation A with relation B is a new relation formed by matching all tuples from

relation A one by one with all tuples from relation B one by one, but only where the tuples meet

a specified condition, called the theta predicate. If the relations share any attributes, then each

shared attribute is only included once in the schema of the result set.

The general symbol for a theta join is a composite symbol: the symbol for a natural join

subscripted with the Greek letter theta, ⋈θ. We would write C = A ⋈θ B. In practice, the theta

is replaced with the actual condition.

c) Equi-Join

An equi-join, which is similar to both a theta join and a natural join, is defined as follows:

The equi-join of relation A with relation B is a new relation formed by matching all tuples from

relation A one by one with all tuples from relation B one by one, but only where the tuples meet

a specified condition of equality, called the equi-join predicate. If the relations share any

attributes, then each shared attribute is only included once in the schema of the result set.

The symbol for an equi-join is the symbol for a natural join subscripted with the

equal sign, ⋈=. We would write C = A ⋈= B. Just as with the theta join, in practice the equal

sign is replaced with the actual condition.

The difference between an equi-join and a theta join is that the condition must be one of equality

in an equi-join. The difference between an equi-join and a natural join is that the two relations

do not need to have a common attribute in an equi-join.

Example:

We wish to match groups of people waiting for tables at a restaurant with the available tables, on

the condition that the number of people in the group equals the number of seats at the table.

Let W = the relation with data for all the groups waiting for tables

Let T = the relation with data for all of the available tables

Let M = the relation with data assigning groups to tables

M = W ⋈group.size = table.seats T

In this notation the attribute names are shown in their more complex form, relation.attribute, so

that group.size refers to the size attribute of the group relation, and table.seats refers to the seats

attribute of the table relation.
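The restaurant example can be tried out with a small Python sketch. The relation contents, the attribute names (`group`, `size`, `table`, `seats`) and the helper `theta_join` are all assumptions made for illustration; the predicate is passed in as a function, so the same helper covers both an equi-join and a general theta join.

```python
def theta_join(a, b, predicate):
    """Pair every tuple of a with every tuple of b; keep pairs that satisfy predicate."""
    return [(ta, tb) for ta in a for tb in b if predicate(ta, tb)]


# Hypothetical contents of W (waiting groups) and T (available tables)
waiting = [{"group": "Garg", "size": 4}, {"group": "Goel", "size": 2}]
tables = [{"table": 1, "seats": 2}, {"table": 2, "seats": 4}, {"table": 3, "seats": 6}]

# Equi-join: group.size = table.seats
matches = theta_join(waiting, tables, lambda g, t: g["size"] == t["seats"])
assignment = [(g["group"], t["table"]) for g, t in matches]

# Theta join with a non-equality predicate: any table large enough for the group
fits = theta_join(waiting, tables, lambda g, t: g["size"] <= t["seats"])

print(assignment)
print(len(fits))
```

With an equality predicate each group is matched only with tables of exactly the right size; relaxing the predicate to `<=` turns the same operation into a general theta join.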

iv) Combining Operation

In general, several relational algebra operations are applied one after another. In this situation we can either write the operations as a single nested algebra expression, or apply one operation at a time by creating intermediate results. For example, to retrieve the first name, last name and salary of all employees who work in the department 'Finance', we must apply the SELECT and PROJECT operations. The combined relational algebra expression can be given as follows:

Π FNAME, LNAME, SALARY (σ DEP = 'Finance' (EMPLOYEE))

Alternatively, this can be written as a sequence of steps, naming each intermediate result:

DEP_FINANCE ← σ DEP = 'Finance' (EMPLOYEE)

RESULT ← Π FNAME, LNAME, SALARY (DEP_FINANCE)
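The two ways of writing the combined operation can be compared in a short Python sketch. The helpers `select` and `project` and the sample EMPLOYEE tuples are assumptions made for illustration only.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Π: keep only the listed attributes, dropping duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Hypothetical EMPLOYEE tuples
employee = [
    {"FNAME": "Amit", "LNAME": "Goel", "SALARY": 15000, "DEP": "Finance"},
    {"FNAME": "Mukta", "LNAME": "Garg", "SALARY": 20000, "DEP": "Research"},
]
wanted = ["FNAME", "LNAME", "SALARY"]

# Nested form
nested = project(select(employee, lambda t: t["DEP"] == "Finance"), wanted)

# Step-by-step form with an intermediate result
dep_finance = select(employee, lambda t: t["DEP"] == "Finance")
result = project(dep_finance, wanted)

print(result)
```

Both forms produce the same relation; the step-by-step form simply gives a name to the intermediate result.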

6.3.4 Set-oriented Operations

These are the traditional set theory operations: UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT. They are binary operations, that is, each is applied to two relations. UNION, INTERSECTION and SET DIFFERENCE can be applied only to relations that have the same number of attributes with the same data types. This condition is called union compatibility. In other words, two relations R and S are said to be union compatible if they have the same degree n and each pair of corresponding attributes has the same domain. The Cartesian product does not require union compatibility.

(i) Union

The union of relation A and relation B is a new relation containing all of the tuples contained in either relation A or relation B. Union can only be performed on two relations that have the same schema. The symbol for union is ∪. In relational algebra we would write something like R3 = R1 ∪ R2. In other words, union is a binary operation performed on two relations, and the new table contains all the rows from both tables, but a row that appears in both tables is shown only once in the result. The two tables must be of the same degree, i.e. they must have the same number of columns, with the same data types in corresponding columns.

Example:

R1

Name     Rank  Age
Mukta    1     34
Satvik   2     25
Aryan    3     18

R2

Name     Rank  Age
Arpit    5     13
Sidhant  10    17
Prerna   15    41
Satvik   2     25
Aryan    3     18

R3 = R1 ∪ R2

Name     Rank  Age
Mukta    1     34
Satvik   2     25
Aryan    3     18
Arpit    5     13
Sidhant  10    17
Prerna   15    41

The union operation is both commutative and associative.

Commutative law of union: A ∪ B = B ∪ A

Associative law of union: (A ∪ B) ∪ C = A ∪ (B ∪ C)
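As a quick check, the union example and both laws can be reproduced with Python sets, where each tuple is a Python tuple and `|` is set union. The relation `r_extra` is a hypothetical third relation added only to exercise the associative law.

```python
r1 = {("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)}
r2 = {("Arpit", 5, 13), ("Sidhant", 10, 17), ("Prerna", 15, 41),
      ("Satvik", 2, 25), ("Aryan", 3, 18)}
r_extra = {("Kavish", 20, 12)}  # hypothetical third relation

r3 = r1 | r2  # duplicates (Satvik, Aryan) are kept only once

commutative = (r1 | r2) == (r2 | r1)
associative = ((r1 | r2) | r_extra) == (r1 | (r2 | r_extra))

print(len(r3), commutative, associative)
```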

(ii) Intersection

Like union, intersection means pretty much the same thing in relational algebra that it does in

simple set theory:

The Intersection of relation A and relation B is a new relation containing all of the tuples that are

contained in both relation A and relation B. Intersection can only be performed on two relations

that have the same schema. The symbol for intersection is ∩. In relational algebra we would

write something like R3 = R1 ∩ R2.

Example:

R1

Name     Rank  Age
Mukta    1     34
Satvik   2     25
Aryan    3     18

R2

Name     Rank  Age
Arpit    5     13
Sidhant  10    17
Prerna   15    41
Satvik   2     25
Aryan    3     18

R3 = R1 ∩ R2

Name     Rank  Age
Satvik   2     25
Aryan    3     18

The intersection operation is both commutative and associative.

Commutative law of intersection: A ∩ B = B ∩ A

Associative law of intersection: (A ∩ B) ∩ C = A ∩ (B ∩ C)

Unlike union, however, intersection is not considered as a basic operation, but a derived

operation, because it can be derived from the basic operations. We will look at difference next,

but the derivation looks like this:

R1 ∩ R2 = R1 – (R1 – R2)

For our purposes, however, it really doesn't matter that intersection is a derived operation.
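The derivation of intersection from difference can be verified on the example relations with Python sets (`&` is intersection, `-` is difference). This is a sketch on the sample data, not a proof.

```python
r1 = {("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)}
r2 = {("Arpit", 5, 13), ("Sidhant", 10, 17), ("Prerna", 15, 41),
      ("Satvik", 2, 25), ("Aryan", 3, 18)}

direct = r1 & r2          # built-in intersection
derived = r1 - (r1 - r2)  # intersection expressed through difference only

print(sorted(direct))
print(direct == derived)
```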

(iii) Set Difference

The difference operation also means pretty much the same thing in relational algebra that it does

in simple set theory:

The difference between relation A and relation B is a new relation containing all of the tuples

that are contained in relation A but not in relation B. Difference can only be performed on two

relations that have the same schema. The symbol for difference is the minus sign (–). We would write R3 = R1 – R2.

Example:

R1

Name     Rank  Age
Mukta    1     34
Satvik   2     25
Aryan    3     18

R2

Name     Rank  Age
Arpit    5     13
Sidhant  10    17
Prerna   15    41
Satvik   2     25

R3 = R1 – R2

Name     Rank  Age
Mukta    1     34
Aryan    3     18

R3 = R2 – R1

Name     Rank  Age
Arpit    5     13
Sidhant  10    17
Prerna   15    41

The difference operation is neither commutative nor associative.

A – B ≠ B – A

(A - B) - C ≠ A - (B - C)
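The example above, and the failure of both laws, can be checked with Python sets. The relation `c` is a hypothetical third relation introduced only to test associativity.

```python
r1 = {("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)}
r2 = {("Arpit", 5, 13), ("Sidhant", 10, 17), ("Prerna", 15, 41), ("Satvik", 2, 25)}
c = {("Mukta", 1, 34)}  # hypothetical third relation

r1_minus_r2 = r1 - r2
r2_minus_r1 = r2 - r1

left = (r1 - r2) - c    # remove r2's tuples, then c's tuples
right = r1 - (r2 - c)   # remove only the tuples of r2 that are not in c

print(sorted(r1_minus_r2))
print(sorted(r2_minus_r1))
print(left == right)
```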

(iv) Cartesian Join

The PRODUCT operation combines information from two relations by pairing their tuples. The Cartesian product of two relations is formed by concatenating each tuple of the first relation with each tuple of the second. In other words, the Cartesian product of two relations results in a new relation that includes every row of the first relation combined with every row of the second relation.

The two tables need not be union compatible, nor of the same degree. It is a binary operation, so it takes two relations as operands.

Formally, the Cartesian product of two sets X (for example, the points on the X-axis) and Y (for example, the points on the Y-axis), denoted X × Y, is the set of all possible ordered pairs whose first component is a member of X and whose second component is a member of Y (e.g. the whole of the X-Y plane).

Imagine that we have two sets, one composed of letters and one composed of numbers, as

follows:

S1 = { a, b, c, d} and S2 = { 1,2,3}

The cross product of the two sets is a set of ordered pairs, matching each value from S1 with

each value from S2.

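S1 × S2 = { (a,1), (a,2), (a,3), (b,1), (b,2), (b,3), (c,1), (c,2), (c,3), (d,1), (d,2), (d,3) }

The Cartesian product of sets is not commutative: S1 × S2 and S2 × S1 contain the same pairings, but with the components in the opposite order. The exceptions are when one of the sets is the empty set, which acts as a "zero", and when the two sets are equal.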

In relational algebra, the cross product of two relations is also called the Cartesian Product or

Cartesian Join.

Example

R1

Name     Rank  Age
Mukta    1     34
Satvik   2     25
Aryan    3     18

R2

ID   Hobby
101  Music
102  Dance
103  Cricket
104  Fine Arts

R1 × R2

Name     Rank  Age  ID   Hobby
Mukta    1     34   101  Music
Mukta    1     34   102  Dance
Mukta    1     34   103  Cricket
Mukta    1     34   104  Fine Arts
Satvik   2     25   101  Music
Satvik   2     25   102  Dance
Satvik   2     25   103  Cricket
Satvik   2     25   104  Fine Arts
Aryan    3     18   101  Music
Aryan    3     18   102  Dance
Aryan    3     18   103  Cricket
Aryan    3     18   104  Fine Arts
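The table above can be generated mechanically. The following sketch uses `itertools.product` from the Python standard library to pair every tuple of R1 with every tuple of R2 and concatenate them.

```python
from itertools import product

r1 = [("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)]
r2 = [(101, "Music"), (102, "Dance"), (103, "Cricket"), (104, "Fine Arts")]

# Concatenate each tuple of r1 with each tuple of r2
r1_x_r2 = [a + b for a, b in product(r1, r2)]

for row in r1_x_r2:
    print(row)
```

The result has 3 × 4 = 12 tuples, each of degree 3 + 2 = 5, which is why the size of a Cartesian product grows so quickly.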

Although Cartesian Joins form the conceptual basis for all other joins, they are rarely used in

actual database management systems because they often result in a relation with a large amount

of data. Consider the case of a table with data for 40,000 students, with each row needing 300

bytes of storage space, and a table for 2,000 advisors, with each row needing 200 bytes. The

two original tables would need about 12,000,000 and 400,000 bytes of storage space (12

megabytes and 400 kilobytes). The Cartesian join of these two would have 80,000,000 records, each needing about 500 bytes of storage space (300 + 200), for a total of about 40,000,000,000 bytes (40 gigabytes).

Another reason that Cartesian joins are not used often is this: What is the value of a Cartesian

join? How often do we really need to create such a table?

The other types of joins, which are based on the Cartesian join, are used more often, and are

commonly applied in combination with projection and selection operations.

(v) Division

Division is a binary operation that is written as R ÷ S. The result consists of the tuples of R restricted to the attributes unique to R (i.e. those in the header of R but not in the header of S) for which every combination with a tuple of S is present in R.

Division is very useful for a special kind of query, such as "Retrieve the names of the students who are enrolled in all the courses taught by Professor Ram Kumar".
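A minimal Python sketch of division for the simplest case, where R has two attributes and S has one. The function name `divide`, the enrolment data and the course names are hypothetical.

```python
def divide(r, s):
    """R ÷ S for R given as a set of (x, y) pairs and S as a set of y values.

    Returns every x that is paired in R with all the y values of S.
    """
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}


# Hypothetical data: (student, course) enrolments and one professor's courses
enrolled = {
    ("Satvik", "DBMS"), ("Satvik", "Networks"),
    ("Aryan", "DBMS"),
    ("Mukta", "DBMS"), ("Mukta", "Networks"), ("Mukta", "Java"),
}
taught_by_ram_kumar = {"DBMS", "Networks"}

students = divide(enrolled, taught_by_ram_kumar)
print(sorted(students))
```

Aryan is excluded because he is enrolled in only one of the two courses; extra enrolments (Mukta's Java course) do not matter.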

6.3.5 Sample Queries Using Relational Algebra:

Consider the following relations, with their respective attributes and tuples, for the relational algebra queries below. Only one specimen tuple is shown for each relation; you can add more tuples as required.

Employee

Fname  Mname  Lname  SSN    BDate   Address  Sex  Salary   SuperSSN  Dno.
Amit   Kr     Goel   12345  1-1-90  1,ABC    N    15,000/  23456     5
.      .      .      .      .       .        .    .        .         .
.      .      .      .      .       .        .    .        .         .

Department

DName     DNumber  MGRSSN  MGRSDate
Research  5        23456   2-5-1988
.         .        .       .
.         .        .       .

Dept_Location

DNumber  DLocation
5        Old Campus
.        .
.        .

Works_On

ESSN   Pno  Hours
12345  1    32
.      .    .
.      .    .

Project

PName  PNumber  PLocation   DNum
Zbase  3        Old Campus  5
.      .        .           .
.      .        .           .

Dependent

ESSN   DependentName  Sex  BDate      Relationship
12345  Kavish         M    1-10-2012  Son
.      .              .    .          .
.      .              .    .          .

QUERY 1

Retrieve the name and address of all employees who work for the 'Research' department.

RESEARCH_DEPT ← σ DNAME='Research' (DEPARTMENT)

RESEARCH_EMPS ← (RESEARCH_DEPT ⋈ DNUMBER=DNO EMPLOYEE)

RESULT ← Π FNAME, LNAME, ADDRESS (RESEARCH_EMPS)

This query could be specified in other ways; for example, the order of the JOIN and SELECT

operations could be reversed, or the JOIN could be replaced by a NATURAL JOIN after

renaming one of the join attributes.
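The three steps of Query 1 map directly onto three Python expressions. The tuples below are hypothetical and carry only the attributes the query needs.

```python
# Hypothetical tuples with only the attributes used by Query 1
department = [{"DNAME": "Research", "DNUMBER": 5}, {"DNAME": "Finance", "DNUMBER": 4}]
employee = [
    {"FNAME": "Amit", "LNAME": "Goel", "ADDRESS": "1,ABC", "DNO": 5},
    {"FNAME": "Mukta", "LNAME": "Garg", "ADDRESS": "107 ward 8", "DNO": 4},
]

# SELECT: keep the 'Research' department
research_dept = [d for d in department if d["DNAME"] == "Research"]

# JOIN on DNUMBER = DNO
research_emps = [{**d, **e} for d in research_dept for e in employee
                 if d["DNUMBER"] == e["DNO"]]

# PROJECT on FNAME, LNAME, ADDRESS
result = [(t["FNAME"], t["LNAME"], t["ADDRESS"]) for t in research_emps]
print(result)
```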

QUERY 2

For every project located in 'Stafford', list the project number, the controlling department

number, and the department manager's last name, address, and birth date.

STAFFORD_PROJS ← σ PLOCATION='Stafford' (PROJECT)

CONTR_DEPT ← (STAFFORD_PROJS ⋈ DNUM=DNUMBER DEPARTMENT)

PROJ_DEPT_MGR ← (CONTR_DEPT ⋈ MGRSSN=SSN EMPLOYEE)

RESULT ← Π PNUMBER, DNUM, LNAME, ADDRESS, BDATE (PROJ_DEPT_MGR)

QUERY 3

Find the names of employees who work on all the projects controlled by department number 5.

DEPT5_PROJS (PNO) ← Π PNUMBER (σ DNUM=5 (PROJECT))

EMP_PROJ (SSN, PNO) ← Π ESSN, PNO (WORKS_ON)

RESULT_EMP_SSNS ← EMP_PROJ ÷ DEPT5_PROJS

RESULT ← Π LNAME, FNAME (RESULT_EMP_SSNS * EMPLOYEE)

QUERY 4

Make a list of project numbers for projects that involve an employee whose last name is 'Smith',

either as a worker or as a manager of the department that controls the project.

SMITHS (ESSN) ← Π SSN (σ LNAME='Smith' (EMPLOYEE))

SMITH_WORKER_PROJS ← Π PNO (WORKS_ON * SMITHS)

MGRS ← Π LNAME, DNUMBER (EMPLOYEE ⋈ SSN=MGRSSN DEPARTMENT)

SMITH_MANAGED_DEPTS (DNUM) ← Π DNUMBER (σ LNAME='Smith' (MGRS))

SMITH_MGR_PROJS (PNO) ← Π PNUMBER (SMITH_MANAGED_DEPTS * PROJECT)

RESULT ← (SMITH_WORKER_PROJS ∪ SMITH_MGR_PROJS)

QUERY 5

List the names of all employees with more than two dependents.

This query cannot be expressed in the basic (original) relational algebra. We have to use the AGGREGATE FUNCTION operation (written ℱ) with the COUNT aggregate function.

We assume that dependents of the same employee have distinct DEPENDENT_NAME values.

T1 (SSN, NO_OF_DEPS) ← ESSN ℱ COUNT DEPENDENT_NAME (DEPENDENT)

T2 ← σ NO_OF_DEPS > 2 (T1)

RESULT ← Π LNAME, FNAME (T2 * EMPLOYEE)
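The grouping and counting step of Query 5 can be imitated with `collections.Counter`. The dependent and employee data below are hypothetical.

```python
from collections import Counter

# Hypothetical (ESSN, DEPENDENT_NAME) tuples and an SSN -> (LNAME, FNAME) lookup
dependent = [(12345, "Kavish"), (12345, "Asha"), (12345, "Ravi"), (23456, "Neha")]
employee = {12345: ("Goel", "Amit"), 23456: ("Garg", "Mukta")}

# T1: group by ESSN and count the dependents of each employee
t1 = Counter(essn for essn, _ in dependent)

# T2: keep the employees with more than two dependents
t2 = {ssn for ssn, count in t1.items() if count > 2}

# RESULT: join with EMPLOYEE and project the names
result = sorted(employee[ssn] for ssn in t2)
print(result)
```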

QUERY 6

Retrieve the names of employees who have no dependents.

This is an example of the type of query that uses the MINUS (SET DIFFERENCE) operation.

ALL_EMPS ← Π SSN (EMPLOYEE)

EMPS_WITH_DEPS (SSN) ← Π ESSN (DEPENDENT)

EMPS_WITHOUT_DEPS ← (ALL_EMPS - EMPS_WITH_DEPS)

RESULT ← Π LNAME, FNAME (EMPS_WITHOUT_DEPS * EMPLOYEE)

QUERY 7

List the names of managers who have at least one dependent.

MGRS(SSN) ← Π MGRSSN(DEPARTMENT)

EMPS_WITH_DEPS (SSN) ← Π ESSN (DEPENDENT)

MGRS_WITH_DEPS ← (MGRS ∩ EMPS_WITH_DEPS)

RESULT ← Π LNAME, FNAME (MGRS_WITH_DEPS * EMPLOYEE)

The same query can in general be specified in many different ways. For example, the operations

can often be applied in various orders. In addition, some operations can be used to replace others;

for example, the INTERSECTION operation in Query 7 can be replaced by a NATURAL JOIN.

6.3.6 Equivalence

The same query can be written as a relational algebra expression in many different ways. The order in which tuples appear in relations is never significant.

• A × B ⇔ B × A

• A ∩ B ⇔ B ∩ A

• A ∪ B ⇔ B ∪ A

• (A – B) is not the same as (B – A)

• σc1 (σc2 (A)) ⇔ σc2 (σc1 (A)) ⇔ σc1 ∧ c2 (A)

• πa1 (A) ⇔ πa1 (πa1,etc (A)), where etc represents any other attributes of A.

• Many other equivalences exist.

While equivalent expressions always give the same result, some may be much easier to evaluate than others. When a query is submitted to the DBMS, its query optimizer tries to find the most efficient equivalent expression before evaluating it.
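Two of these equivalences can be checked on sample data with a short Python sketch. The helpers `select` and `project`, the conditions `c1` and `c2`, and the tuples are assumptions made for the example; a check on one data set illustrates the rule but does not prove it.

```python
def select(relation, predicate):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """π: keep only the listed attributes and drop duplicate tuples."""
    unique = {tuple(t[k] for k in attributes) for t in relation}
    return [dict(zip(attributes, row)) for row in sorted(unique)]


a = [
    {"name": "Mukta", "rank": 1, "age": 34},
    {"name": "Satvik", "rank": 2, "age": 25},
    {"name": "Aryan", "rank": 3, "age": 18},
]


def c1(t):
    return t["age"] > 20


def c2(t):
    return t["rank"] > 1


# Cascade of selections in either order, and a single combined selection
one = select(select(a, c2), c1)
two = select(select(a, c1), c2)
three = select(a, lambda t: c1(t) and c2(t))

# Projecting in two steps gives the same result as projecting once
direct = project(a, ["name"])
two_step = project(project(a, ["name", "age"]), ["name"])

print(one == two == three, direct == two_step)
```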

6.3.7 Comparing Relational Algebra and SQL

Relational Algebra: The result of every expression is a relation. It has a rigorous foundation and simple semantics. It is used for reasoning, query optimization etc. Relational algebra is the mathematical basis for relational databases developed by E.F. Codd. It is a kind of set theory that gives a solid, provable framework for software design that involves lots of data that must be managed.

SQL: The Structured Query Language (SQL) is the common language of most database software such as MySQL, PostgreSQL, Oracle, DB2, etc. This language translates relational theory into practice, but imperfectly: SQL is a loose implementation of relational theory, and it has been further modified in its actual implementation by each Relational Database Management System (RDBMS) that uses it. It is a superset of relational algebra. It has convenient formatting features, and it provides aggregate functions. It has complicated semantics. It is an end-user language.

6.4 Summary

Most commercial relational database systems offer a query language. A query language is a language in which a user requests information from the database. Query languages are of two types, i.e. Procedural Query Languages and Non-Procedural Query Languages. Relational algebra is a procedural query language. It is an offshoot of first-order logic and deals with a set of relations closed under operators: operators operate on one or more relations to yield a relation. Relational algebra belongs to pure mathematics; it is an algebraic structure with roots in mathematical logic and set theory.

6.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database systems, 3rd Edition, Addison Wesley,

New Delhi.

2. Korth & Silberschatz : Database System Concept, 4th Edition, McGraw Hill International

Edition.

3. S.K.Singh : Database Systems Concept, Design and Applications, 2006, Pearson

Education, ISBN: 81-7758-567-3.

4. C.J.Date: An Introduction to Databases Systems, 7th Edition, Addison Wesley, New

Delhi.

5. Vinod Kamboj : Fundamental of DBMS and Oracle, 2012, ABS Publications.

6.6 Self Assessment Questions (SAQ)

1. What is Relational Algebra? How many types of operations does it support? Define each.

2. Define the terms unary operation and binary operation.

3. What do you mean by Relational Algebra? Discuss the various relational-oriented operations. Explain with the help of examples.

4. What are the uses of Relational Algebra? Explain the different set-oriented operations with the help of examples.

5. Discuss equivalence in relational algebra. How would you compare relational algebra with SQL?

Chapter – 7: An Introduction to SQL
Writer: Dr. Kanwal Garg
Vetter: Prof. Rajender Nath
Structure:
7.1 Introduction
7.2 Objective
7.3 Presentation of Content
7.3.1 About SQL
7.3.2 SQL Data Types
7.3.3 SQL Commands
(i) Data Definition Language (DDL)
(ii) Data Manipulation Language (DML)
(iii) Transaction Control Language (TCL)
(iv) Data Control Language (DCL)
7.3.4 Views in SQL
(i) Creating Views
(ii) Updating a View
7.3.5 Queries and Sub-Query in SQL
7.3.6 Constraints in SQL
7.3.7 SQL Indexes
7.3.8 Sample SQL queries examples
7.4 Summary
7.5 Suggested Reading/ Reference Material
7.6 Self Assessment Questions (SAQ)

7.1 Introduction

SQL is a data sublanguage used to organize, manage and retrieve data from a relational database, which is managed by an RDBMS. The origin of SQL is closely tied to the development of the relational database. Dr. E.F. Codd, an IBM researcher, published the relational database concept in June 1970. SQL was conceived at the IBM San Jose Research Laboratory in the mid 1970s as a database language for the new relational database model. In the late 1970s IBM was ready to develop a relational database system, SQL/DS RDBMS. Upon the news of this development, vendors rushed to develop their own RDBMS. A small company, Relational Software, Inc., beat IBM to the market with its own RDBMS. Relational Software, Inc. later became Oracle Corporation.

7.2 Objective

SQL gives you everything you need to create, maintain and control your database. Some users will never have to create a database and will be concerned only with the querying facilities of SQL, which are found in the DML. Others will not only need to create databases but will also have to maintain and administer them. For these users, SQL provides DDL, DML and DCL.

7.3 Presentation of Content

7.3.1 About SQL

SQL stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database or retrieving data from a database. Some common relational database management systems that use SQL are: Oracle, Sybase, Microsoft SQL Server, Access, Ingres, etc.

7.3.2 SQL Data Types

Oracle uses tables to store information in rows and columns. Each column can contain only one type of data, which we must define. A data type is an attribute that specifies the type of data that the object can hold. The data types fall into the following categories:

Data type                       Description
char(size)                      Fixed-length character string. Size is specified in parentheses. Max 255 bytes.
varchar(size) / varchar2(size)  Variable-length character string. Max size is specified in parentheses.
number(size)                    Number value with a maximum number of digits specified in parentheses.
date                            Date value.
number(size,d)                  Number value with a maximum of "size" digits in total, with a maximum of "d" digits to the right of the decimal point.

7.3.3 SQL Commands

SQL commands are instructions, coded into SQL statements, which are used to communicate with the database to perform specific tasks, functions and queries on data.

SQL commands can be used not only for searching the database but also to perform various other functions: for example, you can create tables, add data to tables, modify data, drop a table, or set permissions for users. SQL commands are grouped into four major categories depending on their functionality:

(i) Data Definition Language (DDL) - These SQL commands are used for creating, modifying, and dropping the structure of database objects. The commands are CREATE, ALTER, DROP, RENAME, and TRUNCATE.

• CREATE TABLE - creates a new database table.

• ALTER TABLE - alters a database table.

• DROP TABLE - deletes a database table.

• RENAME - renames an object.

• TRUNCATE - removes all records from a table and releases the space allocated for them.

(ii) Data Manipulation Language (DML) - These SQL commands are used for storing, retrieving, modifying, and deleting data. These Data Manipulation Language commands are: SELECT, INSERT, UPDATE, and DELETE.

• SELECT - get data from a database table.

• INSERT INTO - insert new data in a database table.

• UPDATE - change data in a database table.

• DELETE - remove data from a database table.

(iii) Transaction Control Language (TCL) - These SQL commands are used for managing changes affecting the data. These commands are COMMIT, ROLLBACK, and SAVEPOINT.

• COMMIT - saves the work done.

• ROLLBACK - restores the database to its state at the last commit.

• SAVEPOINT - identifies a point in a transaction to which you can later roll back.

(iv) Data Control Language (DCL) - These SQL commands are used for providing security to database objects. These commands are GRANT and REVOKE.

• GRANT - allows specified users to perform specified tasks.

• REVOKE - cancels previously granted permissions.

(i) Data Definition Language (DDL)

(a) CREATE TABLE STATEMENT

The create table statement is used to create a new table. Here is the format of a simple create table statement:

Syntax:

create table "tablename"

("column1" "data type",

"column2" "data type",

"column3" "data type");

Format of create table if you were to use optional constraints:

create table "tablename"

("column1" "data type" [constraint],

"column2" "data type" [constraint],

"column3" "data type" [constraint]);

Note: You may have as many columns as you'd like. The constraints are optional; square brackets [ ] mark optional parts.

To create a new table, enter the keywords create table followed by the table name, followed by an open parenthesis, followed by the first column name, followed by the data type for that column, followed by any optional constraints, and followed by a closing parenthesis. Make sure you use an open parenthesis before the first column definition and a closing parenthesis after the last column definition. Make sure you separate each column definition with a comma. All SQL statements should end with a ";".

The table and column names must start with a letter and can be followed by letters, numbers, or

underscores - not to exceed a total of 30 characters in length. Do not use any SQL reserved

keywords as names for tables or column names (such as "select", "create", "insert", etc).

Data types specify what the type of data can be for that particular column. If a column called

"Last_Name", is to be used to hold names, then that particular column should have a "varchar/

varchar2" (variable-length character) data type.

What are constraints? When tables are created, it is common for one or more columns to have constraints associated with them. A constraint is basically a rule associated with a column that the data entered into that column must follow. For example, a "primary key" constraint specifies that no two records can have the same value in a particular column: the values must all be unique and cannot be null. The other two most popular constraints are "not null", which specifies that a column can't be left blank, and "unique", which specifies that no two records can have the same value in that column.

Example:

create table employee

(first varchar(15),

last varchar(20),

age number(3),

address varchar(30),

city varchar(20),

state varchar(20));
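The create table statement can be tried without an Oracle installation by using SQLite through Python's standard `sqlite3` module. This is only a sketch under stated assumptions: SQLite stands in for Oracle (it accepts the `varchar` and `number` type names but treats them loosely), the columns are named `first_name` and `last_name` rather than `first` and `last` to stay clear of keywords, and the `department` table is a hypothetical addition used to show constraints being enforced.

```python
import sqlite3

# SQLite (in memory) stands in for Oracle here; this is an assumption of the sketch.
conn = sqlite3.connect(":memory:")

conn.execute("""
    create table employee
    (first_name varchar(15),
     last_name  varchar(20),
     age        number(3),
     address    varchar(30),
     city       varchar(20),
     state      varchar(20))
""")
columns = [row[1] for row in conn.execute("pragma table_info(employee)")]

# Hypothetical second table showing the "primary key" and "not null" constraints.
conn.execute("""
    create table department
    (dno   number(3) primary key,
     dname varchar(20) not null)
""")
conn.execute("insert into department values (5, 'Research')")


def is_rejected(statement):
    """Return True when the database refuses the statement because of a constraint."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_key_rejected = is_rejected("insert into department values (5, 'Finance')")
missing_name_rejected = is_rejected("insert into department values (6, null)")
print(columns, duplicate_key_rejected, missing_name_rejected)
```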

(b) ALTER TABLE STATEMENT

The ALTER TABLE command can be used to add, delete or modify columns in an existing table. When you add a column you must specify a data type.

To add a column to an existing table:

Syntax:

ALTER TABLE "table_name"

ADD "column_name" datatype(size);

Example:

alter table employee

add phone_no number(10);

To drop a column from an existing table:

Syntax:

ALTER TABLE "table_name"

DROP COLUMN "column_name";

Example:

alter table employee

drop column phone_no;

To modify the size of an existing column of a table:

Syntax:

ALTER TABLE "table_name"

MODIFY "column_name" datatype(size);

Example:

alter table employee

modify phone_no number(12);

Note: This is the Oracle form. Some other database systems write ADD COLUMN instead of ADD, and ALTER COLUMN instead of MODIFY.
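Adding a column can be tried with SQLite through Python's `sqlite3` module. This is a sketch under assumptions: SQLite stands in for Oracle, it spells the statement `add column`, and it has no `modify` clause, so only the addition is shown. The table layout and data are hypothetical.

```python
import sqlite3

# SQLite stands in for Oracle; its dialect uses "add column".
conn = sqlite3.connect(":memory:")
conn.execute("create table employee (first_name varchar(15), city varchar(20))")
conn.execute("insert into employee values ('Satvik', 'Kurukshetra')")

# Add a new column to the existing table
conn.execute("alter table employee add column phone_no number(10)")

columns = [row[1] for row in conn.execute("pragma table_info(employee)")]
rows = conn.execute("select * from employee").fetchall()
print(columns)
print(rows)
```

Rows that existed before the change receive a null value in the new column.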

(c) DROP TABLE STATEMENT

To remove an entire table from the database, we use the DROP command.

Syntax:

DROP TABLE table_name;

Example:

drop table employee;

The drop table command is used to delete a table and all rows in the table. To delete an entire

table including all of its rows, issue the drop table command followed by the table_name. Drop

table is different from deleting all of the records in the table. Deleting all of the records in the

table leaves the table including column and constraint information. Dropping the table removes

the table definition as well as all of its rows.
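The difference between deleting all records and dropping the table can be observed with SQLite through Python's `sqlite3` module. SQLite is used here as a stand-in for Oracle, and the one-column table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle
conn.execute("create table employee (first_name varchar(15))")
conn.execute("insert into employee values ('Satvik')")

# Deleting all records leaves an empty table that can still be queried
conn.execute("delete from employee")
rows_after_delete = conn.execute("select count(*) from employee").fetchone()[0]

# Dropping the table removes its definition as well
conn.execute("drop table employee")
try:
    conn.execute("select count(*) from employee")
    table_exists = True
except sqlite3.OperationalError:  # raised by SQLite for a missing table
    table_exists = False

print(rows_after_delete, table_exists)
```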

(d) RENAME TABLE STATEMENT

The rename command changes the name of a table to a new name. The data of the table is not lost.

Syntax:

rename <old table_name> to < new table_name>;

Example:

rename employee to employee_master;

(e) TRUNCATE TABLE STATEMENT

The truncate command removes all rows from a table, but the table structure and its columns, constraints, indexes and so on remain. In SQL, the truncate table command quickly removes all data from a table, typically bypassing a number of integrity-enforcing mechanisms.

Syntax:

truncate table "table_name";

Example:

truncate table employee;

(ii) Data Manipulation Language (DML)

(a) SELECT STATEMENT

The SELECT statement is used to query the database and retrieve the data that match the criteria you specify:

Syntax:

SELECT column1 [, column2, ...]

FROM "table_name"

[WHERE Clause]

[GROUP BY clause]

[HAVING clause]

[ORDER BY clause];

The where clause can include these operators:

• = Equal

• > Greater than

• < Less than

• >= Greater than or equal

• <= Less than or equal

• <> Not equal to

• LIKE pattern matching operator

Note: You may use as many of these clauses as you like; the clauses in square brackets [ ] are optional.

Example:

select * from employee;

will return all the data from the employee table.

The SQL SELECT DISTINCT Statement

In a table, a column may contain many duplicate values; and sometimes you only want to list the

different (distinct) values. The DISTINCT keyword can be used to return only distinct (different)

values.

Syntax:

select distinct "column_name", "column_name" from "table_name";

Example:

Select distinct city from employee;
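The where clause operators and the DISTINCT keyword can be tried with SQLite through Python's `sqlite3` module. SQLite stands in for Oracle, and the three-column table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle
conn.execute("create table employee (first_name varchar(15), age number(3), city varchar(20))")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [("Satvik", 25, "Kurukshetra"), ("Mukta", 29, "Yamuna Nagar"), ("Aryan", 18, "Kurukshetra")],
)

# DISTINCT: each city is listed once although Kurukshetra occurs twice
cities = [r[0] for r in conn.execute("select distinct city from employee order by city")]

# WHERE with a comparison operator
aged_25_plus = [r[0] for r in conn.execute(
    "select first_name from employee where age >= 25 order by first_name")]

# WHERE with LIKE: names beginning with S
s_names = [r[0] for r in conn.execute(
    "select first_name from employee where first_name like 'S%'")]

print(cities, aged_25_plus, s_names)
```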

(b) INSERT STATEMENT

The insert statement is used to insert or add a row of data into the table. There are two basic forms of the INSERT INTO statement, as follows:

Syntax1:

insert into "tablename"

(first_column,...last_column)

values (first_value,...last_value);

In the example below, the column name first will match up with the value 'Satvik', and the column

name state will match up with the value 'Haryana'.

Syntax2:

You may not need to specify the column names in the SQL query if you are adding values for all the columns of the table. But make sure the values are in the same order as the columns in the table. The SQL INSERT INTO syntax would be as follows:

insert into "table_name" values (first_value,second_value,...last_value);

Example 1:

insert into employee

(first, last, age, address, city, state)

values ('Satvik', 'Garg', 25, '1489 Sector 3','Kurukshetra', 'Haryana');

insert into employee

(first, last, age, address, city, state)

values ('Mukta', 'Garg', 29, '107 ward 8','Yamuna Nagar', 'Haryana');

Example 2:

insert into employee

values ('Satvik', 'Garg', 25, '1489 Sector 3','Kurukshetra', 'Haryana');

insert into employee

values ('Mukta', 'Garg', 29, '107 ward 8','Yamuna Nagar', 'Haryana');

Note: All strings should be enclosed between single quotes: 'string'

To insert records into a table, enter the key words insert into followed by the table name,

followed by an open parenthesis, followed by a list of column names separated by commas,

followed by a closing parenthesis, followed by the keyword values, followed by the list of values

enclosed in parenthesis. The values that you enter will be held in the rows and they will match up

with the column names that you specify. Strings should be enclosed in single quotes, and

numbers should not.
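Both forms of the insert statement can be tried with SQLite through Python's `sqlite3` module. SQLite stands in for Oracle, and the column names `first_name` and `last_name` are illustrative substitutes for `first` and `last`. When the values come from a program, passing them as parameters (the `?` placeholders) lets the driver handle the quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle
conn.execute("""
    create table employee
    (first_name varchar(15), last_name varchar(20), age number(3),
     address varchar(30), city varchar(20), state varchar(20))
""")

# Form 1: column list given; values passed as parameters
conn.execute(
    "insert into employee (first_name, last_name, age, address, city, state) "
    "values (?, ?, ?, ?, ?, ?)",
    ("Satvik", "Garg", 25, "1489 Sector 3", "Kurukshetra", "Haryana"),
)

# Form 2: no column list; values must follow the column order of the table
conn.execute(
    "insert into employee values ('Mukta', 'Garg', 29, '107 ward 8', 'Yamuna Nagar', 'Haryana')"
)

rows = conn.execute("select first_name, age, state from employee order by age").fetchall()
print(rows)
```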

(c) UPDATE STATEMENT

The update statement is used to update or change records that match specified criteria. This is

accomplished by carefully constructing a where clause.

Syntax1:

update "tablename"

set "columnname" = "newvalue"

[,"nextcolumn" = "newvalue2"...]

where "columnname" = "value" ;

In the above syntax, the where clause is included only when we want to update the table data based on a specified condition. If we want to update the attribute value in all the rows of the table, the where clause is not required.

Syntax2:

update "tablename"

set "columnname" = "newvalue"

[,"nextcolumn" = "newvalue2"...];

Example1:

update employee

set first = 'Aryan'

where city = 'Kurukshetra';

Note: Example 1 sets the attribute first to 'Aryan' only in the tuples whose city is 'Kurukshetra', since only those tuples match the condition.

Example2:

update employee

set first = 'Aryan';

Note: Example 2 sets the attribute first to 'Aryan' in all tuples.
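The effect of the where clause on an update can be observed with SQLite through Python's `sqlite3` module. SQLite stands in for Oracle; the two-column table, with `first_name` in place of `first`, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle
conn.execute("create table employee (first_name varchar(15), city varchar(20))")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Satvik", "Kurukshetra"), ("Mukta", "Yamuna Nagar")],
)


def names():
    return [r[0] for r in conn.execute("select first_name from employee order by rowid")]


# With a where clause: only the matching row changes
cursor = conn.execute("update employee set first_name = 'Aryan' where city = 'Kurukshetra'")
rows_changed = cursor.rowcount
after_conditional = names()

# Without a where clause: every row changes
conn.execute("update employee set first_name = 'Aryan'")
after_unconditional = names()

print(rows_changed, after_conditional, after_unconditional)
```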

(d) DELETE STATEMENT

The delete statement is used to delete records or rows from the table.

Syntax:

delete from "tablename"

[where "columnname" = "value"];

Note: The where clause is optional, as the square brackets [ ] indicate.

Example:

delete from employee

where city = 'Kurukshetra';

Note: if you leave off the where clause, all records will be deleted.

delete from employee;

To delete entire records (rows) from a table, enter "delete from" followed by the table name. If the delete statement is followed by a where clause, only those rows for which the where condition is true are deleted.
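A conditional and an unconditional delete can be compared with SQLite through Python's `sqlite3` module. SQLite stands in for Oracle, and the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle
conn.execute("create table employee (first_name varchar(15), city varchar(20))")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Satvik", "Kurukshetra"), ("Mukta", "Yamuna Nagar"), ("Aryan", "Kurukshetra")],
)


def remaining():
    return [r[0] for r in conn.execute("select first_name from employee order by rowid")]


# Only the rows for which the where condition is true are deleted
conn.execute("delete from employee where city = 'Kurukshetra'")
after_conditional = remaining()

# Without a where clause all records are deleted; the table itself remains
conn.execute("delete from employee")
after_unconditional = remaining()

print(after_conditional, after_unconditional)
```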

(iii) Transaction Control Language (TCL)

(a) Commit

The commit statement saves all changes made to the database since the last commit or rollback command. In Oracle, changes made to the database are not permanent until you tell Oracle to make them permanent. The commit statement makes permanent every change made to the database during the current transaction.

Syntax:

Commit;

Example:

Commit;

(b) Rollback

The rollback statement is the reverse of the commit statement. It undoes some or all of the database changes made during the current transaction, that is, changes that have not yet been saved to the database. The rollback command can only undo changes made since the last Commit or Rollback command was issued.

Syntax:

Rollback;

Example:

Rollback;

(c) Savepoint

A Savepoint is a special mark inside a transaction that allows all commands executed after it was established to be rolled back, restoring the transaction state to what it was at the time of the Savepoint. The Savepoint statement defines a Savepoint within a transaction. Changes made after a Savepoint can be undone at any time prior to the end of the transaction. A transaction can have multiple Savepoints.
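The three transaction control commands can be seen working together in SQLite through Python's `sqlite3` module. This is a sketch under assumptions: SQLite stands in for Oracle, the connection is opened with `isolation_level=None` so that the `begin`, `commit` and `rollback` statements are issued explicitly, and the savepoint name `after_aryan` is made up for the example.

```python
import sqlite3

# isolation_level=None: the sketch issues begin/commit/rollback itself
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table employee (first_name varchar(15))")


def names():
    return [r[0] for r in conn.execute("select first_name from employee order by rowid")]


conn.execute("begin")
conn.execute("insert into employee values ('Satvik')")
conn.execute("commit")                      # Satvik is now permanent

conn.execute("begin")
conn.execute("insert into employee values ('Mukta')")
conn.execute("rollback")                    # undoes the insert of Mukta
after_rollback = names()

conn.execute("begin")
conn.execute("insert into employee values ('Aryan')")
conn.execute("savepoint after_aryan")
conn.execute("insert into employee values ('Arpit')")
conn.execute("rollback to after_aryan")     # undoes Arpit only
conn.execute("commit")                      # keeps Aryan
after_savepoint = names()

print(after_rollback, after_savepoint)
```

Rolling back to the savepoint undoes only the work done after it; the transaction stays open until it is committed or rolled back as a whole.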

(iv) Data Control Language (DCL)

(a) Grant

This statement controls who has access to your database. The GRANT command enables you to grant privileges to users. You can grant privileges for viewing, adding, deleting, referencing and using a table.

If you decide that Mukta can see the tables in your database, you would use the GRANT SELECT statement. If you desire, you can also allow her to insert or update data in your tables. The commands you would use are GRANT INSERT and GRANT UPDATE.

Syntax:

Grant privilege on object to user;

Example:

Grant Select on Book to Mukta; (Here Book is the table on which the privilege has been granted to Mukta.)

(b) Revoke

Whatever you grant, however, you may also revoke. The REVOKE statement enables you to take away privileges you previously granted.

Syntax:

Revoke privilege on object from user;

Example:

Revoke Select on Book from Mukta; (Here Book is the table on which the privilege has been revoked from Mukta.)

7.3.4 Views in SQL

A view is nothing more than a SQL statement that is stored in the database with an associated name. A view is actually a composition of a table in the form of a predefined SQL query. A view can contain all the rows of a table or selected rows from a table. A view can be created from one or many tables, depending on the SQL query written to create the view.

Views, which are a kind of virtual table, allow users to do the following:
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data such that a user can see and (sometimes) modify exactly what they need and no more.
• Summarize data from various tables which can be used to generate reports.

(i) Creating Views:

Database views are created using the CREATE VIEW statement. Views can be created from a

single table, multiple tables, or another view. To create a view, a user must have the appropriate

system privilege according to the specific implementation.

Syntax:

CREATE VIEW view_name AS

SELECT column1, column2.....

FROM table_name

WHERE [condition];

Note: You can include multiple tables in your SELECT statement in much the same way as you use them in a normal SQL SELECT query.
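The CREATE VIEW syntax can be illustrated with a short script. This is a minimal sketch, assuming an in-memory SQLite database accessed through Python's sqlite3 module; the table book, the view costly_book and the sample prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, price REAL);
    INSERT INTO book VALUES (1, 'DBMS', 450.0), (2, 'Networks', 300.0), (3, 'Compilers', 520.0);

    -- The view stores only the query, not a copy of the rows.
    CREATE VIEW costly_book AS
        SELECT id, title
        FROM book
        WHERE price > 400;
""")

view_query = "SELECT id, title FROM costly_book ORDER BY id"
view_rows = conn.execute(view_query).fetchall()

# A row added to the base table later is visible through the view as well.
conn.execute("INSERT INTO book VALUES (4, 'Algorithms', 610.0)")
view_rows_after = conn.execute(view_query).fetchall()
print(view_rows, view_rows_after)
```

Because the view is re-evaluated each time it is queried, it always reflects the current contents of the base table.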

(ii) Updating a View:

A view can be updated under certain conditions:
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
• All NOT NULL columns from the base table must be included in the view in order for the INSERT query to function.

So if a view satisfies all the above-mentioned rules, then you can update it.

7.3.5 Queries and Sub-Query in SQL

A query is a means of retrieving meaningful information from the database. There are different ways to execute a query, as discussed earlier in this chapter. A sub-query (also called an inner query or nested query) is a query embedded within another SQL query, usually in its WHERE clause.

A sub-query returns data that the main query uses as a condition to further restrict the data to be retrieved.

Sub-queries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along

with the operators like =, <, >, >=, <=, IN, BETWEEN etc.

There are a few rules that sub-queries must follow:
• Sub-queries must be enclosed within parentheses.
• A sub-query can have only one column in the SELECT clause, unless multiple columns are in the main query for the sub-query to compare its selected columns.
• An ORDER BY clause cannot be used in a sub-query, although the main query can use an ORDER BY. The GROUP BY clause can be used to perform the same function as the ORDER BY in a sub-query.
• Sub-queries that return more than one row can only be used with multiple value operators, such as the IN operator.
• The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY, CLOB, or NCLOB.
• A sub-query cannot be immediately enclosed in a set function.
• The BETWEEN operator cannot be used with a sub-query; however, the BETWEEN operator can be used within the sub-query.
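Two of the rules above, the use of IN with a multi-row sub-query and of a comparison operator with a single-value sub-query, can be sketched as follows. The example assumes an in-memory SQLite database accessed through Python's sqlite3 module; the tables emp and job and all of their rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (job_code TEXT PRIMARY KEY, job_desc TEXT, chg_hour REAL);
    CREATE TABLE emp (emp_num TEXT PRIMARY KEY, emp_lname TEXT, job_code TEXT REFERENCES job);
    INSERT INTO job VALUES ('500', 'Programmer', 35.75),
                           ('501', 'Systems Analyst', 96.75),
                           ('502', 'Database Designer', 105.00);
    INSERT INTO emp VALUES ('101', 'News', '502'),
                           ('102', 'Senior', '501'),
                           ('103', 'Arbough', '500');
""")

# Multi-row sub-query: it may return several job codes, so it is combined with IN.
in_rows = conn.execute("""
    SELECT emp_lname
    FROM emp
    WHERE job_code IN (SELECT job_code FROM job WHERE chg_hour > 50)
    ORDER BY emp_lname
""").fetchall()

# Single-value sub-query: it returns one number, so a comparison operator is allowed.
avg_rows = conn.execute("""
    SELECT job_desc
    FROM job
    WHERE chg_hour > (SELECT AVG(chg_hour) FROM job)
    ORDER BY job_desc
""").fetchall()
print(in_rows, avg_rows)
```

In both statements the sub-query is enclosed in parentheses and selects a single column, and the ORDER BY clause appears only in the main query.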

7.3.6 Constraints in SQL

SQL constraints are rules that restrict the values a column may hold; they must remain true whenever data in the column is inserted, updated or deleted. Constraints can be specified when the table is first created with the CREATE TABLE statement, or later, when the structure of an existing table is modified with the ALTER TABLE statement.

SQL constraints are used to implement the rules of the table. If an action violates a rule defined by a constraint, the action is aborted.

There are 6 types of constraints:

1. Primary key constraint.

2. Foreign Key constraint.

3. Unique Key constraint.

4. Not Null constraint.

5. Check constraint

6. Default constraint

Description of the above constraints is as follows:

(i) Primary Key: The primary key of a relational table uniquely identifies each record in the table. It can be a normal attribute that is guaranteed to be unique. For example, in a school several students may have the same name, but no two students have the same roll number.

(ii) Foreign Key: One of the most important concepts in database is creating relationships

between database tables. These relationships provide a mechanism for linking data stored in

multiple tables and retrieving it in an efficient manner. In order to create a link between two

tables we must specify a foreign key in one table that references a column in another table.

(iii) Unique Key: A unique key constraint makes sure that there are no duplicate values in a column. Both the unique key and the primary key enforce uniqueness, but there is one difference between them: in SQL Server a unique key constraint allows one null value, whereas a primary key does not allow null values. A table can have only one primary key, but it can have more than one unique key.

(iv) Not Null Constraint: A not null constraint prevents null values from being inserted into a column. It is used for columns whose value must always be supplied.

(v) Check Constraint: This constraint checks a value at the time of insertion. For example, the salary of an employee must always be greater than zero, so we can create a check constraint on the salary field of the employee table that accepts only values greater than zero.

(vi) Default Constraint: The default constraint sets a specific value for a column when no value is supplied at the time of insertion. Through this constraint we set the default value of the column.
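The six constraints can be seen at work in one small schema. The following is a sketch under stated assumptions: it uses an in-memory SQLite database through Python's sqlite3 module rather than Oracle or SQL Server, and the tables dept and employee, their columns and the helper rejected are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE dept (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                 -- primary key
        name    TEXT NOT NULL,                       -- not null
        email   TEXT UNIQUE,                         -- unique key
        salary  REAL CHECK (salary > 0),             -- check
        city    TEXT DEFAULT 'Panipat',              -- default
        dept_id INTEGER REFERENCES dept (dept_id)    -- foreign key
    );
    INSERT INTO dept VALUES (1, 'Sales');
    INSERT INTO employee (emp_id, name, email, salary, dept_id)
        VALUES (1, 'Mukta', 'mukta@example.com', 25000, 1);
""")

def rejected(sql):
    """Return True if the statement is aborted because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "primary key": rejected("INSERT INTO employee (emp_id, name, salary) VALUES (1, 'Amit', 100)"),
    "not null": rejected("INSERT INTO employee (emp_id, name, salary) VALUES (2, NULL, 100)"),
    "unique": rejected("INSERT INTO employee (emp_id, name, email, salary) "
                       "VALUES (3, 'Amit', 'mukta@example.com', 100)"),
    "check": rejected("INSERT INTO employee (emp_id, name, salary) VALUES (4, 'Amit', -5)"),
    "foreign key": rejected("INSERT INTO employee (emp_id, name, salary, dept_id) "
                            "VALUES (5, 'Amit', 100, 99)"),
}

# No city was supplied for Mukta, so the default value was stored.
default_city = conn.execute("SELECT city FROM employee WHERE emp_id = 1").fetchone()[0]
print(results, default_city)
```

Each of the five INSERT statements breaks exactly one rule and is aborted, which is the behaviour described above for an action that violates a constraint.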

7.3.7 SQL Indexes

An index in SQL is created on an existing table to retrieve rows quickly. When a table contains thousands of records, retrieving information can take a long time. Indexes are therefore created on columns that are accessed frequently, so that the information can be retrieved quickly. Indexes can be created on a single column or on a group of columns. When an index is created, the values of the indexed columns are sorted and stored together with the ROWID of each row.

Syntax to create Index:

CREATE INDEX index_name

ON table_name (column_name1,column_name2...);

In Oracle there are two types of SQL index namely, implicit and explicit.

(i) Implicit Indexes:

They are created when a column is explicitly defined with a PRIMARY KEY or UNIQUE KEY constraint.

(ii) Explicit Indexes:

They are created using the "create index.. " syntax.

NOTE:

1) Even though SQL indexes are created to access the rows in a table quickly, they slow down DML operations such as INSERT, UPDATE and DELETE, because both the table and its indexes are updated when a DML operation is performed. So, use indexes only on columns that are frequently used to search the table.

2) It is not necessary to create indexes on tables that hold little data.

3) In an Oracle database the number of columns in a single index is limited: sixteen (16) in older releases, and up to 32 in later ones.
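The difference between implicit and explicit indexes can be observed directly. The sketch below assumes SQLite (through Python's sqlite3 module) instead of Oracle, so the catalogue commands PRAGMA index_list and EXPLAIN QUERY PLAN are SQLite-specific; the table emp and the index name emp_job_idx are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (
        emp_num   TEXT PRIMARY KEY,      -- implicit index created by the constraint
        emp_lname TEXT,
        job_code  TEXT
    );
    CREATE INDEX emp_job_idx ON emp (job_code, emp_lname);   -- explicit index
""")

# PRAGMA index_list reports every index on the table; column 1 holds the index name.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list(emp)"))

# EXPLAIN QUERY PLAN shows whether the optimizer would use an index for the search;
# the last column of each row is the human-readable description.
plan = " ".join(row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT emp_lname FROM emp WHERE job_code = '502'"))
print(index_names)
print(plan)
```

The list of indexes contains both the explicitly created emp_job_idx and an automatically generated index that backs the primary key.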

7.3.8 Sample SQL Queries Examples

Specimen of Table Structure

Relational Diagram

Table name: EMPLOYEE

Table name: PAINTER

Table name: JOB

Table name: ASSIGNMENT

Table name: PROJECT

Write the SQL code that will create the table structure for a table named EMP_1. This

table is a subset of the EMPLOYEE table as shown above.

CREATE TABLE EMP_1 (
EMP_NUM CHAR(3) PRIMARY KEY,
EMP_LNAME VARCHAR(15) NOT NULL,
EMP_FNAME VARCHAR(15) NOT NULL,
EMP_INITIAL CHAR(1),
EMP_HIREDATE DATE,
JOB_CODE CHAR(3),
FOREIGN KEY (JOB_CODE) REFERENCES JOB);

Write the SQL code to enter the first two rows for EMP_1 table.

INSERT INTO EMP_1 VALUES ('101', 'News', 'John', 'G', '08-Nov-00', '502');

INSERT INTO EMP_1 VALUES ('102', 'Senior', 'David', 'H', '12-Jul-89', '501');

After inserting multiple rows in table EMP_1, the records in the table are shown as below:

Assuming that the data shown in the EMP_1 table have been entered, write the SQL code

that will list all attributes for a job code of 502.

SELECT *

FROM EMP_1

WHERE JOB_CODE = '502';

Write the SQL code that will save the changes made to the EMP_1 table.

COMMIT;

Write the SQL code to change the job code to 501 for the person whose personnel number

is 107. After you have completed the task, examine the results, and then reset the job code

to its original value.

UPDATE EMP_1
SET JOB_CODE = '501'
WHERE EMP_NUM = '107';

To see the changes:

SELECT *
FROM EMP_1
WHERE EMP_NUM = '107';

To reset, use

ROLLBACK;

Write the SQL code to delete the row for the person named William Smithfield, who was

hired on June 22, 2004 and whose job code classification is 500.

DELETE FROM EMP_1

WHERE EMP_LNAME = 'Smithfield'

AND EMP_FNAME = 'William'

AND EMP_HIREDATE = '22-June-04'

AND JOB_CODE = '500';

Write the SQL code that will restore the data to its original status; that is, the table should

contain the data that existed before you made the changes in Questions 5 and 6.

ROLLBACK;

Write the SQL code to create a copy of EMP_1, naming the copy EMP_2. Then write the

SQL code that will add the attributes EMP_PCT and PROJ_NUM to its structure. The

EMP_PCT is the bonus percentage to be paid to each employee. The new attribute

characteristics are:

EMP_PCT NUMBER (4,2)

PROJ_NUM CHAR (3)

There are two ways to get this job done. The two possible solutions are shown next.

Solution A:

CREATE TABLE EMP_2 (

EMP_NUM CHAR(3) NOT NULL,

EMP_LNAME VARCHAR(15) NOT NULL,

EMP_FNAME VARCHAR(15) NOT NULL,

EMP_INITIAL CHAR(1),

EMP_HIREDATE DATE NOT NULL,

JOB_CODE CHAR(3) NOT NULL,

PRIMARY KEY (EMP_NUM),

FOREIGN KEY (JOB_CODE) REFERENCES JOB);

INSERT INTO EMP_2 SELECT * FROM EMP_1;

ALTER TABLE EMP_2
ADD (EMP_PCT NUMBER(4,2),
PROJ_NUM CHAR(3));

Solution B:

CREATE TABLE EMP_2 AS SELECT * FROM EMP_1;

ALTER TABLE EMP_2
ADD (EMP_PCT NUMBER(4,2),
PROJ_NUM CHAR(3));

Write the SQL code to enter an EMP_PCT value of 3.85 for the person whose employee

number (EMP_NUM) is 103.

UPDATE EMP_2

SET EMP_PCT = 3.85

WHERE EMP_NUM = '103';

Using a single command sequence, write the SQL code that will enter the project number

(PROJ_NUM) = 18 for all employees whose job classification (JOB_CODE) is 500.

UPDATE EMP_2

SET PROJ_NUM = '18'

WHERE JOB_CODE = '500';

Using a single command sequence, write the SQL code that will enter the project number

(PROJ_NUM) = 25 for all employees whose job classification (JOB_CODE) is 502 or

higher.

UPDATE EMP_2

SET PROJ_NUM = '25'

WHERE JOB_CODE >= '502';

Assume that the table looks as shown below after the above database operations:

Write the SQL code that will change the PROJ_NUM to 14 for those employees who were

hired before January 1, 1994 and whose job code is at least 501.

UPDATE EMP_2

SET PROJ_NUM = '14'

WHERE EMP_HIREDATE < '01-Jan-94'

AND JOB_CODE >= '501';

Write the SQL code required to list all employees whose last names start with Smith. In

other words, the rows for both Smith and Smithfield should be included in the listing.

Assume case sensitivity.

SELECT *

FROM EMP_2

WHERE EMP_LNAME LIKE 'Smith%';

7.4 Summary

The Structured Query Language, or SQL, is one of the most powerful tools available today for working with data sets and getting the information you need from databases. SQL differs from conventional programming in that you ask the database for the information you are looking for without specifying how that information is to be retrieved. You state only what information you want and the form in which you want it; the SQL engine takes care of working with the stored data to return it.

7.5 Suggested Reading/ Reference Material

1. Abbey, Abramson & Corey: Oracle 8i-A Beginner's Guide, Tata McGraw Hill Publishing

Company Ltd.

2. Ivan Bayross: SQL, PL/SQL-The Program Language of ORACLE, BPB Publication, New Delhi.

3. Elmasri & Navathe: Fundamentals of Database systems, 3rd Edition, Addison Wesley,

New Delhi.

4. Korth & Silberschatz: Database System Concept, 4th Edition, McGraw Hill International

Edition.

5. S.K.Singh: Database Systems Concept, Design and Applications, 2006, Pearson

Education, ISBN: 81-7758-567-3.

7.6 Self Assessment Questions (SAQ)

1. What are SQL commands? Discuss the different components of SQL with syntax and

suitable examples.

2. What are SQL DDL commands? How are integrity constraints achieved in SQL?

3. What is the difference between DROP and DELETE command? Explain with examples.

4. Differentiate between Alter and Update commands of SQL.

5. Explain GRANT and REVOKE command with suitable examples.

6. What are views? How can you create them?

7. What do you mean by constraints? Discuss the different types of Constraints.

8. Write a note on indexes.

Chapter – 8: Functional Dependencies
Writer: Dr. Kanwal Garg
Vetter: Prof. Rajender Nath
Structure:
8.1 Introduction
8.2 Objective
8.3 Presentation of Content
8.3.1 Functional Dependency
8.3.2 Importance of Dependencies
8.3.3 Types of Functional Dependency
8.3.4 Closure Set of Functional Dependency
8.3.5 Armstrong's Axioms
8.3.6 Minimal Functional Dependencies or Irreducible Set of Dependencies
8.4 Summary
8.5 Suggested Reading/ Reference Material
8.6 Self Assessment Questions (SAQ)

8.1 Introduction

The purpose of database design is to arrange the data fields into an organized structure that captures the relationships among them and stores information without unnecessary redundancy. In fact, redundancy and database consistency are the most important logical criteria in database design. A bad database design may result in repeated data and an inability to represent the desired information. It is, therefore, important to examine the relationships that exist among the data of an entity in order to refine the database design. In the present chapter, functional dependency concepts are discussed as a means of achieving minimum redundancy without compromising easy retrieval of data and information from the database.

8.2 Objective

To develop a good description of the data, its relationships and its constraints, it is necessary to understand the concept of functional dependency (FD). Functional dependencies help produce a stable set of relations that is a faithful model of the enterprise. Such models are highly flexible. They also help in reducing redundancy, saving memory space and maintaining consistency among the data. Database dependencies are important to understand because they provide the basic building blocks used in database normalization.

8.3 Presentation of Content

8.3.1 Functional Dependency

A functional dependency in a database serves as a constraint between two sets of attributes. Defining functional dependencies is an important part of relational database design and contributes to normalization.

A functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X → Y, which specifies that Y is functionally dependent on X. Here X is the determinant set and Y is the dependent attribute. Each value of X is associated with precisely one value of Y.

Functional dependencies are the basis for the definitions of third normal form and Boyce-Codd normal form in normalization, which preserve the dependencies between attributes while eliminating the repetition of information. Functional dependency is related to the notion of a candidate key, which uniquely identifies a tuple and determines the value of all other attributes in the relation. A set of functional dependencies is irreducible if:
• The right-hand side of each functional dependency holds only one attribute.
• The left-hand side of each functional dependency cannot be reduced, since this would change the content of the set.
• Removing any of the existing functional dependencies would change the content of the set.

Functional dependency can be defined as follows:

An FD is a relationship between an attribute "Y" and a determinant (one or more other attributes) "X" such that, for a given value of the determinant "X", the value of the attribute "Y" is uniquely defined.

Formally, the important properties of functional dependency are as follows:
• The concept assumes that the whole database is described as a single universal relation schema.
• Informal definition: A functional dependency is a constraint between two sets of attributes from the database.
• Formal definition:
  o A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y].
  o That is, the values of the Y component of a tuple in r depend on the values of the X component.
  o Or, there is a functional dependency from X to Y.
  o The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
  o X functionally determines Y in a relation schema R if and only if, whenever two tuples of r(R) agree on their X-value, they must necessarily agree on their Y-value.
• If X is a candidate key of R, then there exists an FD from X to Y for any subset of attributes Y of R.
• The main use of functional dependencies is to describe further a relation schema R by specifying constraints on its attributes that must hold at all times.
• Relation extensions r(R) that satisfy the functional dependency constraints are called legal extensions (or legal relation states) of R.
• A functional dependency is a property of the relation schema (intension) R, not of a particular legal relation state (extension) r of R. That is, a functional dependency must hold for ALL the extensions of R.

In a Functional Dependency:
• X is a determinant
• X determines Y
• Y is functionally dependent on X
• X → Y
• X → Y is trivial if Y ⊆ X

As per Figure 8.1 given below, the statement "C# is a determinant of Cname, Ccity and Cphone" can equally be stated as "Cname, Ccity and Cphone are functionally dependent on C#". Given a particular value of C#, there exists precisely one corresponding value for each of Cname, Ccity and Cphone. This is seen more clearly in the following functional dependency diagram:

Figure 8.1: Functional dependencies in the Transaction relation

Similarly, in Figure 8.2, the statement "(C#, P#, Date) is a determinant of Qnt" can equally be stated as "Qnt is functionally dependent on the set of attributes (C#, P#, Date)". Such a set of attributes is also known as a composite attribute.

Figure 8.2: Functional dependency on a composite attribute

Example 1:

In a Student relation, Student_ID determines Student_address, because corresponding to one Student_ID there will be only one address. Therefore the FD will be written as:

Student_ID → Student_address

but the converse is not true, because several students can live at one address.

Example 2:

In an Employee relation, the Social Security Number (SSN) determines the employee name and salary, because corresponding to one SSN there will be only one Emp_Name and Salary. Therefore the FD will be written as:

SSN → (Emp_Name, Salary)

Additionally, the above can be read as:

SSN → Emp_Name and SSN → Salary

but the converse is not true, because several employees can have the same name and salary.

Example 3: In this relation

A B

1 1

2 4

3 9

4 16

2 4

7 9

8 10

Since for each value of A there is associated one and only one value of B, we have

A → B

But, in the following relation

A B
1 1
2 4
3 9
4 16
2 4
7 9
8 9

As per the definition of functional dependency, "An attribute in a relational model is said to be functionally dependent on another attribute in the table if it can take only one value for a given value of the attribute upon which it is functionally dependent." Here A → B still holds, but the reverse dependency B → A does not, since B = 9 is associated with more than one value of A (3, 7 and 8).
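Whether a dependency holds in a particular relation instance can be checked mechanically by applying the formal definition: any two tuples that agree on the left-hand side must agree on the right-hand side. The following is a minimal sketch; the function name fd_holds and the representation of a relation as a list of dictionaries are our own assumptions, and the two instances are the A, B relations of Example 3.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given relation instance.

    rows is a list of dicts (one per tuple); lhs and rhs are tuples of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


first = [dict(A=a, B=b) for a, b in [(1, 1), (2, 4), (3, 9), (4, 16), (2, 4), (7, 9), (8, 10)]]
second = [dict(A=a, B=b) for a, b in [(1, 1), (2, 4), (3, 9), (4, 16), (2, 4), (7, 9), (8, 9)]]

for name, relation in (("first", first), ("second", second)):
    print(name,
          "A -> B:", fd_holds(relation, ("A",), ("B",)),
          "B -> A:", fd_holds(relation, ("B",), ("A",)))
```

Note that such a check only shows that a dependency is not contradicted by the data at hand. As stated earlier, a functional dependency is a property of the schema and must hold for all legal extensions, so it cannot be proved from one instance.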

Example: Consider the database having following tables:

Supplier Table

Sno Sname Status City

S1 Sumit 20 Panipat

S2 Ankit 10 Amritsar

S3 Amit 10 Amritsar

Part Table

Pno Pname Color Weight City

P1 Nut Red 12 Panipat

P2 Bolt Green 17 Amritsar

P3 Screw Blue 17 Amritsar

P4 Screw Red 14 Panipat

Shipment Table

SNo PNo Qty

S1 P1 270

S1 P2 300

S1 P3 700

S2 P1 270

S2 P2 700

S2 P3 300

Here in Supplier table

Sno - Supplier number of supplier that is unique

Sname - Supplier name

City - City of the supplier

Status - Status of the city e.g. A grade cities may have status 10, B grade cities

may have status 20 and so on.

Here, Sname is FD on Sno, because Sname can take only one value for a given value of Sno (e.g. S1); in other words, there must be exactly one Sname for supplier number S1.

The FD is represented as:

Sno → Sname

The FD is shown by the arrow →, which means that Sname is functionally dependent on Sno.

Similarly, City and Status are also FD on Sno, because for each value of Sno there will be only one city and status.

The FDs are represented as:

Sno → City

Sno → Status

S.Sno → S.(Sname, City, Status)

Consider the Shipment table, with the following attributes:

Sno - Supplier number of the supplier

Pno - Part number supplied by supplier

Qty - Quantity supplied by supplier for a particular Part no

In this case Qty is FD on the combination of Sno and Pno, because each combination of Sno and Pno corresponds to only one quantity.

SP.(Sno, Pno) → SP.Qty

Dependency Diagrams

A dependency diagram consists of the attribute names and all functional dependencies in a given table. The dependency diagram of the Supplier table is:

[Figure: dependency diagram of the Supplier table, with attributes Sno, Sname, City and Status]

Here, the following functional dependencies exist in the Supplier table:

Sno → Sname
Sname → Sno
Sno → City
Sno → Status
Sname → City
Sname → Status
City → Status

The FD diagram of relation P is:

[Figure: dependency diagram of the Part table, with Pno determining Pname, Color and Weight]

Here the following functional dependencies exist in the Part table:

Pno → Pname
Pno → Color
Pno → Weight

The FD diagram of relation Shipment is:

[Figure: dependency diagram of the Shipment table, with Sno and Pno together determining Qty]

Here the following functional dependency exists in the Shipment table:

SP.(Sno, Pno) → SP.Qty

The two most important things to remember about functional dependencies (FDs) are:

(1) FDs are determined by the meaning of the attributes and their role in the "real world" which is being modeled by the database.

(2) FDs are in turn used to group the attributes together to form the normalized relations of the database.

Consider the attributes:

C Class (of a course)
T Time for the class
R Room for the class
I Instructor of the class

which describe the arrangement of rooms and times for classes taught by the instructors.

These attributes are used to model part of the "real world" in which we have classes held at certain times in certain rooms and taught by certain instructors. We have certain constraints on the objects (such as class, time, etc.) in the "real world", and such constraints are in turn represented in terms of functional dependencies.

The constraints are:

(1) No two instructors teach the same course.

(2) At any time and in a given room, there is at most one class being taught there.

(3) No class can be taught at one given time in two rooms.

(4) No instructor can teach two classes at one given time.

The functional dependencies are:

(a) C → I
(b) TR → C
(c) CT → R
(d) IT → C

Given below is a relation in which some tuples violate the above functional dependencies.

Class      Instructor  Time    Room
CS-DE-14   KAVITA      1:30/M  101
CS-DE-20   BABITA      3:00/T  205
CS-DE-14   SUMIT       2:00/F  310  (a)
MCA-24-50  AMIT        3:00/T  205  (b)
CS-DE-20   BABITA      3:00/T  208  (c)
CS-DE-17   KAVITA      1:30/M  202  (d)

It is easy to see that the tuples marked (a), (b), (c), (d) violate the functional dependencies (a), (b), (c), (d) respectively.
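The violations can also be located by comparing every pair of tuples against each dependency. The sketch below is illustrative only: the function violations and the dictionary layout are our own, and the time values are written uniformly (for example 1:30/M) so that equal times compare as equal.

```python
from itertools import combinations


def violations(rows, lhs, rhs):
    """Return index pairs of rows that agree on lhs but differ on rhs."""
    bad = []
    for (i, r1), (j, r2) in combinations(enumerate(rows), 2):
        same_lhs = all(r1[a] == r2[a] for a in lhs)
        same_rhs = all(r1[a] == r2[a] for a in rhs)
        if same_lhs and not same_rhs:
            bad.append((i, j))
    return bad


columns = ("C", "I", "T", "R")          # Class, Instructor, Time, Room
schedule = [dict(zip(columns, t)) for t in [
    ("CS-DE-14", "KAVITA", "1:30/M", "101"),    # row 0
    ("CS-DE-20", "BABITA", "3:00/T", "205"),    # row 1
    ("CS-DE-14", "SUMIT", "2:00/F", "310"),     # row 2, marked (a)
    ("MCA-24-50", "AMIT", "3:00/T", "205"),     # row 3, marked (b)
    ("CS-DE-20", "BABITA", "3:00/T", "208"),    # row 4, marked (c)
    ("CS-DE-17", "KAVITA", "1:30/M", "202"),    # row 5, marked (d)
]]

# Each attribute name is a single letter, so "TR" stands for the attribute set {T, R}.
fds = {"a": ("C", "I"), "b": ("TR", "C"), "c": ("CT", "R"), "d": ("IT", "C")}
report = {name: violations(schedule, lhs, rhs) for name, (lhs, rhs) in fds.items()}
print(report)
```

Each dependency is violated by exactly one pair of rows, and in every pair the later row is the one carrying the corresponding mark in the table.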

8.3.2 Importance of Dependencies

Database dependencies are important to understand because they provide the basic building blocks used in database normalization. For example:
• For a table to be in second normal form (2NF), there must be no non-prime attribute in the table that is functionally dependent upon a proper subset of a candidate key.
• For a table to be in third normal form (3NF), every non-prime attribute must have a non-transitive functional dependency on every candidate key.
• For a table to be in Boyce-Codd Normal Form (BCNF), the determinant of every functional dependency (other than trivial dependencies) must be a super key.
• For a table to be in fourth normal form (4NF), it must have no non-trivial multivalued dependencies other than those whose determinant is a super key.

8.3.3 Types of Functional Dependency:

Functional dependencies can be classified as follows:

(i) Full Functional Dependency:

When all the non-key attributes of a relation 'R' are dependent on the key attributes, the functional dependency is called a full functional dependency.

The term full functional dependency is used to describe the minimum set of attributes in the determinant of an FD. If the set of attributes Y is to be fully dependent on the set of attributes X, the following must hold:
• Y is functionally dependent on X, and
• Y is not functionally dependent on any proper subset of X

Example 1:

Let A and B be two attributes of a relation 'R', where B is a non-key attribute that is functionally dependent on A (the key attribute), i.e. A → B, but not on any proper subset of A. If we remove any attribute from A, the functional dependency no longer holds.

Example 2:

Let there be a relation Employee with attributes (Emp_id, Emp_name, Emp_addr, Emp_phone). Here Emp_id is the primary key attribute and all other attributes are non-key attributes that are fully functionally dependent on the primary key. We can say that the relation exhibits full functional dependency.

Figure 8.1: Full functional dependency (Emp_id determines Emp_name, Emp_addr and Emp_phone)

(ii) Partial Functional Dependency:

Partial functional dependency means the following: if A and B are attributes of a relation 'R', B is partially functionally dependent on A (A → B) if some attribute can be removed from A and the dependency still holds.

Example: Let there be a relation Employee_project with attributes (Ecode, Pcode, Dept). Ecode and Pcode together form the composite key and Dept is a non-key attribute. Here (Ecode, Pcode) → Dept states that Dept is functionally dependent on the composite key. If we remove Pcode from the composite key and Ecode → Dept still holds, then Dept is partially functionally dependent on the key.

"Given a relation R, attribute B of R is fully functionally dependent on attribute A of R if it is functionally dependent on A and not functionally dependent on any proper subset of A (A must be composite)".

Figure 8.3: Functional dependencies in the Transaction relation

For the Transaction relation, we may now say that:

Cname is fully functionally dependent on C#

Ccity is fully functionally dependent on C#

Cphone is fully functionally dependent on C#

Qnt is fully functionally dependent on (C#, P#, Date)

Cname is not fully functionally dependent on (C#, P#, Date), it is only partially dependent on it

(and similarly for Ccity and Cphone).
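The distinction between full and partial dependency can be tested from a set of FDs: Y is fully dependent on X if X determines Y and no proper subset of X does. The following sketch assumes the two dependencies of the Transaction relation stated above; the helper names closure and is_fully_dependent are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Set of attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_fully_dependent(lhs, attr, fds):
    """True if lhs -> attr holds and no non-empty proper subset of lhs determines attr."""
    lhs = frozenset(lhs)
    if attr not in closure(lhs, fds):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if attr in closure(subset, fds):
                return False          # a smaller determinant suffices: partial dependency
    return True


# FDs of the Transaction relation as given in the text.
fds = [
    (frozenset({"C#"}), frozenset({"Cname", "Ccity", "Cphone"})),
    (frozenset({"C#", "P#", "Date"}), frozenset({"Qnt"})),
]
key = {"C#", "P#", "Date"}

print(is_fully_dependent(key, "Qnt", fds))       # Qnt needs the whole composite key
print(is_fully_dependent(key, "Cname", fds))     # Cname depends on C# alone
print(is_fully_dependent({"C#"}, "Cname", fds))
```

The first and third calls report a full dependency, while the second reports that Cname is only partially dependent on (C#, P#, Date).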

(iii) Transitive Functional Dependency:

A transitive functional dependency can occur only in a relation that has three or more attributes. Let A, B and C be three attributes of a relation 'R'. Suppose all three of the following conditions hold:
• A → B
• It is not the case that B → A
• B → C
Then A → C also holds, and C is said to be transitively dependent on A through B.

(iv) Multivalued Dependency (→→):

A multivalued dependency (MVD) occurs when two or more independent multivalued facts about the same attribute occur within the same table. That is, if in a relation R with attributes A, B and C, both B and C are multivalued facts about A, represented as A →→ B and A →→ C, then the multivalued dependency exists only if B and C are independent of each other.

The formal definition of multivalued dependency (X →→ Y) is:

If t1 and t2 are tuples such that t1.X = t2.X, then there are tuples t3 and t4 such that

1. t1.X = t3.X = t4.X
2. t1.Y = t3.Y and t2.Y = t4.Y
3. t1.Z = t4.Z and t2.Z = t3.Z

where Z = R − (X ∪ Y)

Example: Imagine a car company that manufactures many models of car, but always makes both red and blue versions of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.
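The formal definition of X →→ Y translates directly into a check on a relation instance: for every pair of tuples that agree on X, the tuple combining the Y value of the first with the Z value of the second must also be present. The sketch below is an assumption-laden illustration; the function mvd_holds, the model names ModelA and ModelB and the sample years are all made up.

```python
def mvd_holds(rows, x, y, all_attrs):
    """Check the multivalued dependency X ->> Y on a list of dict rows."""
    z = [a for a in all_attrs if a not in x and a not in y]   # Z = R - (X u Y)

    def project(row, attrs):
        return tuple(row[a] for a in attrs)

    present = {(project(r, x), project(r, y), project(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if project(t1, x) == project(t2, x):
                # t3 takes X and Y from t1 and Z from t2; it must exist in the relation.
                # The matching t4 is covered when the loop visits the pair (t2, t1).
                t3 = (project(t1, x), project(t1, y), project(t2, z))
                if t3 not in present:
                    return False
    return True


attrs = ("model", "color", "year")
complete = [dict(zip(attrs, t)) for t in [
    ("ModelA", "red", 2010), ("ModelA", "blue", 2010),
    ("ModelA", "red", 2011), ("ModelA", "blue", 2011),
    ("ModelB", "red", 2012), ("ModelB", "blue", 2012),
]]
# Drop the blue 2011 ModelA row: the red 2011 row now has no blue counterpart.
incomplete = [r for r in complete if (r["model"], r["color"], r["year"]) != ("ModelA", "blue", 2011)]

print(mvd_holds(complete, ("model",), ("color",), attrs))
print(mvd_holds(incomplete, ("model",), ("color",), attrs))
```

The dependency model →→ color holds in the complete table and fails as soon as one of the required color rows is missing.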

(v) Trivial and Non-Trivial Functional Dependency:

A functional dependency X → Y is said to be a trivial functional dependency if Y, the right-hand side of the functional dependency, is a subset of X.

Example: The functional dependency (Ecode, Pcode) → Ecode is trivial because the set {Ecode} (the R.H.S. of the functional dependency) is a subset of {Ecode, Pcode} (the L.H.S. of the functional dependency).

On the other hand, the functional dependency (Ecode, Pcode) → (Ecode, Ename) is non-trivial because the set {Ecode, Ename} is not a subset of the attribute set {Ecode, Pcode}.

8.3.4 Closure Set of Functional Dependencies

Let a relation 'R' have some functional dependencies 'F' specified. The closure of F (usually written as F+) is the set of all functional dependencies that may be logically derived from 'F'. Often 'F' is the set of the most obvious and important functional dependencies, and F+, the closure, is the set of all functional dependencies including those in F and those that can be deduced from F. The closure is important and may, for example, be needed in finding one or more candidate keys of the relation.

The set X+ of attributes that are functionally determined by X under F is called the closure of X under F. An algorithm to find X+ is given below:

Input: A relation schema R, a set of functional dependencies F and a set of attributes X.

Output: X+.

(1) X+ ← X.
(2) Repeat
      found ← false;
      for (each fd Y → Z ∈ F) do
        if (Y ⊆ X+)
        then begin
          found ← true;
          X+ ← X+ ∪ Z;
          remove Y → Z from further consideration;
        end;
    until (found = false) or (X+ = all attributes).
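The algorithm above can be written almost line for line in a programming language. The following is a minimal sketch in Python; the function name attribute_closure and the representation of each FD as a pair of sets are assumptions, and the dependencies used are three of those listed earlier for the Supplier table.

```python
def attribute_closure(x, fds):
    """Compute X+ for the attribute set x under fds, a list of (lhs, rhs) pairs of sets."""
    closure = set(x)                      # (1) X+ <- X
    remaining = list(fds)                 # FDs still under consideration
    found = True
    while found:                          # (2) repeat ... until found = false
        found = False
        for fd in list(remaining):        # iterate over a copy so removal is safe
            lhs, rhs = fd
            if set(lhs) <= closure:       # Y is a subset of X+
                found = True
                closure |= set(rhs)       # X+ <- X+ u Z
                remaining.remove(fd)      # remove Y -> Z from further consideration
    return closure


supplier_fds = [
    ({"Sno"}, {"Sname"}),
    ({"Sno"}, {"City"}),
    ({"City"}, {"Status"}),
]

print(attribute_closure({"Sno"}, supplier_fds))
print(attribute_closure({"City"}, supplier_fds))
```

Since the closure of {Sno} contains every attribute of the Supplier table, Sno is a key of that table, whereas City determines only itself and Status. The loop ends once a full pass finds no applicable FD, which also covers the case where X+ already holds all attributes.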

8.3.5 Armstrong’s Axioms

To infer dependencies in a systematic way, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. William W. Armstrong (1974) established a set of rules which can be used to infer the functional dependencies in a relational database. The rules are given below:

Table 8.1 Inference Rules

Inference Rule | Axiom Name | Axiom | Example
IR1 | Reflexivity | if a is a set of attributes and b ⊆ a, then a → b | SSN,Name → SSN
IR2 | Augmentation | if a → b holds and c is a set of attributes, then ca → cb | SSN → Name then SSN,Phone → Name,Phone
IR3 | Transitivity | if a → b holds and b → c holds, then a → c holds | SSN → Zip and Zip → City then SSN → City
IR4 | Union or Additivity | if a → b and a → c hold, then a → bc holds | SSN → Name and SSN → Zip then SSN → Name,Zip
IR5 | Decomposition or Projectivity | if a → bc holds, then a → b and a → c hold | SSN → Name,Zip then SSN → Name and SSN → Zip
IR6 | Pseudotransitivity | if a → b and cb → d hold, then ac → d holds | Address → Project and Project,Date → Amount then Address,Date → Amount
(NOTE) | | ab → c does NOT imply a → b and b → c |

• Inference rules IR1 through IR3 are known as Armstrong's inference rules.
• Inference rules IR1 through IR3 are sound and complete.
• By sound, we mean that, given a set of functional dependencies F specified on a relation schema R, any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F.
• By complete, we mean that using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F, that is, the closure of F.

To determine F+ (as shown in section 8.3.4), we need rules for deriving all functional dependencies that are implied by F. A set of rules that may be used to infer additional dependencies was proposed by Armstrong in 1974, as shown in Table 8.1. These rules (or axioms) are a complete set of rules in that all functional dependencies implied by F may be derived from them. The rules are explained below:

1. Reflexivity Rule: If X is a set of attributes and Y is a subset of X, then X →Y holds.

The reflexivity rule is the simplest (almost trivial) rule. It states that each subset of X is

functionally dependent on X. In other words trivial dependence is defined as follows:

Trivial functional dependency: A trivial functional dependency is a functional dependency of an

attribute on a superset of itself.

For example: {Employee ID, Employee Address} → {Employee Address} is trivial, here

{Employee Address} is a subset of {Employee ID, Employee Address}.

2. Augmentation Rule: If X → Y holds and W is a set of attributes, then WX → WY holds.

The augmentation rule is also quite simple. It states that if Y is determined by X, then W and Y together are determined by W and X together. Note that we use the notation WX to mean the collection of all attributes in W and X, and write WX rather than the more conventional (W, X) for convenience.

For example: let Rno → Name hold, and let the attributes Class and Marks act as W. Then {Rno, Class, Marks} → {Name, Class, Marks}.

3. Transitivity Rule: If X →Y and Y →Z hold, then X →Z holds.

The transitivity rule is perhaps the most important one. It states that if X functionally determines

Y and Y functionally determine Z then X functionally determines Z.

For example: if Rno → City and City → Status hold, then Rno → Status also holds.

These rules are called Armstrong's Axioms.

Further axioms may be derived from the above although the above three axioms are sound and

complete in that they do not generate any incorrect functional dependencies (soundness) and they

do generate all possible functional dependencies that can be inferred from F (completeness). The

most important additional axioms are:

a. Union Rule: If X →Y and X →Z hold, then X →YZ holds.

Proof: Using Armstrong's Axioms:

1. X→Y, Given

2. X→Z, Given

3. X→XZ, Augment 2 by X

4. XZ→YZ, Augment 1 by Z

5. X→YZ, Transitivity using 3 and 4.

b. Decomposition Rule: If X → YZ holds, then so do X → Y and X → Z.

Proof: Using Armstrong's Axioms:

1. X→YZ, Given

2. YZ→Y, Reflexivity

3. X→Y, Transitivity on 1 and 2.

The proof for X→Z is similar.

c. Pseudotransitivity Rule: If X → Y and WY → Z hold then so does WX →Z.
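The text gives no proof for this rule. One possible derivation from Armstrong's Axioms, written in the same style as the two proofs above, is:

```latex
\begin{align*}
1.\;& X \to Y   && \text{Given} \\
2.\;& WY \to Z  && \text{Given} \\
3.\;& WX \to WY && \text{Augment 1 by } W \\
4.\;& WX \to Z  && \text{Transitivity using 3 and 2}
\end{align*}
```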

Based on the above axioms and the functional dependencies specified for relation student, we may write a large number of functional dependencies. Some of these are:

(sno, cno) → sno (Rule 1)

(sno, cno) → cno (Rule 1)

(sno, cno) → (Sname, cname) (Rule 2)

cno → office (Rule 3)

sno → (Sname, address) (Union Rule), etc.

Often a very large list of dependencies can be derived from a given set F, since Rule 1 by itself leads to a large number of dependencies. Since we have seven attributes (sno, Sname, address, cno, cname, instructor, office), there are 128 (that is, 2^7) subsets of these attributes. These 128 subsets could form 128 values of X in functional dependencies of the type X → Y. Each value of X is then associated with a number of values for Y (Y being a subset of X), leading to several thousand dependencies. Such a large number of dependencies is not particularly helpful in achieving our aim of normalizing relations.

Although we could follow the present procedure and compute the closure of F to find all the functional dependencies, the computation requires exponential time, and the list of dependencies is often very large and therefore not very useful. There are two possible approaches to avoid dealing with the large number of dependencies in the closure. One is to deal with one attribute or a set of attributes at a time and find its closure (i.e. all functional dependencies relating to it). The aim of this exercise is to find what attributes depend on a given set of attributes and therefore ought to be together. The other approach is to find a minimal cover.

8.3.6 Minimal Functional Dependencies or Irreducible Set of Dependencies

In discussing equivalent sets of FDs, it is useful to define the concept of a minimal set of functional dependencies, or minimal cover, which eliminates unnecessary functional dependencies so that only the minimal number of dependencies needs to be enforced by the system. A minimal cover of F is sometimes called an irreducible set of F.

A set S of functional dependencies is irreducible if it has the following three properties:

• The right-hand side of every functional dependency in S contains only one attribute.

• The left-hand side of every functional dependency in S is irreducible, meaning that removing any attribute from the left-hand side changes the content of S (S loses some information).

• Removing any functional dependency from S changes the content of S.

Sets of functional dependencies with these properties are also called canonical or minimal.

Further, we can define minimal cover as:

Let F1 and F2 be two sets of functional dependencies. If F1 ≡ F2, then we say that F1 is a cover of F2 and F2 is a cover of F1. We also say that F1 covers F2 and vice versa. It is easy to show that every set of functional dependencies F is covered by a set of functional dependencies G in which the right hand side of each fd has only one attribute.

We say a set of dependencies F is minimal if:

(1) Every right hand side of each fd in F is a single attribute.

(2) The left hand side of each fd does not have any redundant attribute, i.e., for every fd X → A in F where X is a composite attribute, and for any proper subset Z of X, the functional dependency Z → A ∉ F+.

(3) F is reduced (without redundant fd's). This means that for every X → A in F, the set F - {X → A} is NOT equivalent to F.

It is easy to see that for each set F of functional dependencies, there exists a set of functional dependencies F′ such that F ≡ F′ and F′ is minimal. We call such an F′ a minimal cover of F.

The algorithm to compute F′, a minimal cover of F:

Input: F, a set of fd's.

Output: F′, a minimal cover of F.

(1) Let F′ = {X → A | X → A ∈ F and A is a single attribute}. For each fd X → A1, A2, ..., An ∈ F (n > 1), put the fd's X → A1, X → A2, ..., X → An into F′, where each Ai is a single attribute.

(2) While there is an fd X → A ∈ F′ such that X is a composite attribute, Z is a proper subset of X and Z → A ∈ (F′)+, do: replace X → A with Z → A.

(3) For each fd X → A ∈ F′, check whether it is redundant, that is, whether it can be derived from the other fd's in F′; if so, eliminate it.

It is important to note that for the above algorithm, the ordering between step (2) and step (3) is critical. If you first perform step (3) and then perform step (2) of the algorithm, the resulting set of fd's may still have redundant functional dependencies.

It should be pointed out that for a set of functional dependencies F, there may be more than one minimal cover of F.
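The three steps can be sketched in Python as below. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the names `closure`, `minimal_cover` and `show` are hypothetical, attribute names are assumed to be single letters, and the order in which left-hand-side attributes are tried (reverse alphabetical) is an arbitrary choice. A different order may produce a different, equally valid minimal cover.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return one minimal cover of `fds`.

    `fds` is a list of (lhs, rhs) strings over single-letter attribute
    names, e.g. ("CD", "AB") stands for CD -> AB.
    """
    # Step (1): split every right-hand side into single attributes.
    cover = []
    for lhs, rhs in fds:
        for attr in rhs:
            fd = (frozenset(lhs), frozenset(attr))
            if fd not in cover:
                cover.append(fd)

    # Step (2): drop redundant left-hand-side attributes (tried in
    # reverse alphabetical order, an arbitrary choice of this sketch).
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs, reverse=True):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # step (2) can create duplicates

    # Step (3): drop every fd that can be derived from the remaining ones.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    """Render a cover as sorted strings such as 'AB->C'."""
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in cover)


F = [("CD", "AB"), ("C", "D"), ("D", "EH"), ("AE", "C"), ("A", "C"), ("B", "D")]
print(show(minimal_cover(F)))
```

Step (3) deliberately runs after step (2), matching the ordering requirement stated above.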

Example: (Computing a minimal cover.)

Let R = R(ABCDEGH) and F = {CD → AB, C → D, D → EH, AE → C, A → C, B → D}. The

process of computing a minimal cover of F is as follows:

(1) Break down the right hand side of each fd. After performing step (1) in the algorithm, we get F′ = {CD → A, CD → B, C → D, D → E, D → H, AE → C, A → C, B → D}.

(2) Eliminate redundancy in the left hand side. The fd CD → A is replaced by C → A. This is because C → D ∈ (F′)+, hence C → CD ∈ (F′)+; from C → CD ∈ (F′)+ and CD → A ∈ F′, by transitivity, we have C → A ∈ (F′)+ and hence CD → A should be replaced by C → A. Similarly, CD → B is replaced by C → B, and AE → C is replaced by A → C. After step (2), F′ = {C → A, C → B, C → D, D → E, D → H, A → C, B → D}.

(3) Remove redundant fd's. The fd C → D is eliminated because it can be derived from C → B and B → D and hence it is redundant. F′ now becomes {C → A, C → B, D → E, D → H, A → C, B → D}, which is a minimal cover of F.

Example: (Computing a minimal cover.)

Let the relation R be R(ABCDE) and the set of functional dependencies be F = {AB → C,

ABC→ D, AE→ BC, BC → AE}.

We compute a minimal cover of F in the following steps:

(1) Break down the right hand side of each fd. F′ = {AB → C, ABC → D, AE → B, AE → C, BC → A, BC → E}.

(2) Eliminate redundancy in the left hand side. The fd ABC → D is replaced by AB → D because AB+ = ABCDE, and hence the attribute C in fd ABC → D is redundant. Note that we could also replace ABC → D by BC → D, because BC+ = ABCDE. But we do NOT need to include both AB → D and BC → D in F′: one of the two is sufficient. No other composite left hand side can be reduced further, and thus we get F′ = {AB → C, AB → D, AE → B, AE → C, BC → A, BC → E}.

(3) Remove redundant fd's. The fd AE → C is redundant because we can derive it from AE → B and AB → C. Thus AE → C is removed. No other fd is redundant. The final minimal cover of F is Fmin = {AB → C, AB → D, AE → B, BC → A, BC → E}.

Note: If we choose to replace ABC → D by BC→D in step (2) above, we would get an

alternative minimal cover of F.

8.4 Summary

A functional dependency (FD) is a many-to-one relationship between two sets of attributes of a given relation. It is a kind of integrity constraint that generalizes the concept of a key. Let X and Y be two sets of attributes of a relation. If, given the value of X, there is only one value of Y corresponding to it, then Y is said to be functionally dependent on X. A functional dependency is a property of the relation schema R, not of a particular legal relation state r of R. Hence an FD cannot be inferred automatically from a given relation state r but must be defined explicitly by someone who knows the semantics of the attributes of R.

8.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database systems, 3rd Edition, Addison Wesley,

New Delhi.

2. Korth & Silberschatz : Database System Concept, 4th Edition, McGraw Hill International

Edition.

3. Abbey, Abramson & Corey: Oracle 8i-A Beginner's Guide, Tata McGraw Hill Publishing

Company Ltd.

4. S.K.Singh: Database Systems Concept, Design and Applications, 2006, Pearson Education,

ISBN: 81-7758-567-3.

8.6 Self Assessment Questions (SAQ)

1. What is Functional Dependency? Define the term full and partial functional dependency

with suitable examples.

2. Define closure set of functional dependency with suitable examples.

3. What do you mean by functional dependency? Discuss the different type of functional

dependency.

4. State Armstrong's inference rules. How can transitivity be achieved? Explain.

5. Write a short note on Minimal Functional Dependencies.

Chapter – 9: Normalization
Writer: Dr. Kanwal Garg
Vetter: Prof. Rajender Nath
Structure:
9.1 Introduction
9.2 Objective
9.3 Presentation of Content
9.3.1 Bad Database Design
9.3.2 Database Anomalies
9.3.3 Normalization
(i) Rules of Normalization
9.3.4 Normal Forms
(i) First Normal Form (1NF)
(ii) Second Normal Form (2NF)
(iii) Third Normal Form (3NF)
(iv) Boyce-Codd Normal Form (BCNF)
9.3.5 Example First and Second Normal Form
9.4 Summary
9.5 Suggested Reading/ Reference Material
9.6 Self Assessment Questions (SAQ)

9.1 Introduction

Normalization is a rigorous design tool, based on the mathematical theory of relations, that results in very practical operational implementations. A properly normalized set of relations simplifies the retrieval and maintenance processes, and the effort spent in ensuring good structures is certainly a worthwhile investment. Furthermore, if database relations are simply seen as file structures of some vague file system, then the power and flexibility of an RDBMS cannot be exploited to the full. Good database design, needless to say, is important.

Normalization is a systematic way of ensuring that a database structure is suitable for general purpose querying and free from database anomalies. Dr. E.F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the first normal form in 1970. Dr. Codd went on to define second and third normal forms in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd normal form in 1974.

9.2 Objective

Normalization is a logical database design technique that involves organizing the data into more than one table. Normalization improves performance by reducing redundancy in database tables. The basic objective of normalization is to reduce redundancy, which means that each piece of information is stored only once. Storing information several times wastes storage space and increases the total size of the data stored. The goals of the normalization process are:

• Eliminating redundant data.

• Ensuring that data dependencies make sense, so that relations are logically stored and the space they require in the database is reduced.

• Eliminating the columns that are not dependent on the key attribute.

9.3 Presentation of Content

9.3.1 Bad Database Design

E. F. Codd identified certain structural features in a relation which create retrieval and update problems. Suppose we start off with a relation with a structure and details like:

This is a simple and straightforward design. It consists of one relation where we have a single

tuple for every customer and under that customer we keep all his transaction records about parts,

up to a possible maximum of 9 transactions. For every new transaction, we need not repeat the

customer details (of name, city and telephone), we simply add on a transaction detail.

However, we note the following disadvantages:

• The relation is wide and clumsy.

• We have set a limit of 9 (or whatever reasonable value) transactions per customer. What if a customer has more than 9 transactions?

• For customers with fewer than 9 transactions, it appears that we have to store null values in the remaining spaces. What a waste of space!

• The transactions appear to be kept in ascending order of P#s. What if we have to delete, for customer Codd, the part numbered 1? Should we move the part numbered 2 up (or rather, left)? If we did, what if we decide later to re-insert part 2? The additions and deletions can cause awkward data shuffling.

Let us try to construct a query to "Find which customer(s) bought P# 2". The query would have to access every customer tuple and, for each tuple, examine each of its transactions looking for

(P1# = 2) OR (P2# = 2) OR (P3# = 2) ... OR (P9# = 2)

A comparatively simple query seems to require a clumsy retrieval formulation!

Alternatively, why don't we re-structure our relation such that we do not restrict the number of

transactions per customer. We can do this with the following structure:

This way, a customer can have just any number of Part transactions without worrying about any

upper limit or wasted space through null values (as it was with the previous structure).

Constructing a query to "Find which customer(s) bought P# 2" is not as cumbersome as before as

one can now simply state: P# = 2.
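The difference between the two query formulations can be illustrated with a small Python sketch. The rows below are made-up sample data (the original figures are not reproduced here), only three transaction slots are shown instead of nine, and the function names `bought_wide` and `bought_narrow` are hypothetical.

```python
# Wide structure: one tuple per customer with numbered part slots (P1#, P2#, ...).
wide = [
    {"C#": 1, "Cname": "Codd",   "P1#": 1, "P2#": 2,    "P3#": None},
    {"C#": 2, "Cname": "Martin", "P1#": 1, "P2#": None, "P3#": None},
]

# Restructured relation: one tuple per (customer, part) transaction.
narrow = [
    {"C#": 1, "Cname": "Codd",   "P#": 1},
    {"C#": 1, "Cname": "Codd",   "P#": 2},
    {"C#": 2, "Cname": "Martin", "P#": 1},
]


def bought_wide(rows, part, slots=3):
    """Every slot must be tested: (P1# = part) OR (P2# = part) OR ..."""
    return {r["Cname"] for r in rows
            if any(r[f"P{i}#"] == part for i in range(1, slots + 1))}


def bought_narrow(rows, part):
    """A single condition is enough: P# = part."""
    return {r["Cname"] for r in rows if r["P#"] == part}


print(bought_wide(wide, 2), bought_narrow(narrow, 2))
```

Both functions return the same customers, but the first has to know how many slots exist, while the second does not change however many transactions a customer has.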

But again, this structure is not without its faults:

• It seems a waste of storage to keep repeated values of Cname, Ccity and Cphone.

• If C# 1 were to change his telephone number, we would have to ensure that we update ALL occurrences of C# 1's Cphone values. This means updating tuple 1, tuple 2 and all other tuples where there is an occurrence of C# 1. Otherwise, our database would be left in an inconsistent state.

• Suppose we now have a new customer with C# 4. However, there is no part transaction yet with the customer as he has not ordered anything yet. We may find that we cannot insert this new information because we do not have a P# which serves as part of the 'primary key' of a tuple. (A primary key cannot have null values.)

Suppose the third transaction has been canceled, i.e. we no longer need information about 25 of

P# 1 being ordered on 26 Jan. We thus delete the third tuple. We are then left with the following

relation:

But then, suppose we need information about the customer "Martin", say the city he is located in. Unfortunately, information about Martin was held only in that tuple, so deleting the entire tuple because of its P# transaction means that we have also lost all information about Martin from the relation.

As illustrated in the above instances, badly designed, un-normalized relations waste storage space. Worse, they give rise to database anomalies.

9.3.2 Database Anomalies

A serious problem with using such a relation as a base relation is that it suffers from insertion, deletion and update anomalies, as explained below. To understand these anomalies, let us consider a relation 'Deptt' with attributes {Deptt_id, Deptt_name, Deptt_course}.

Deptt

Deptt_id    Deptt_name    Deptt_course
101         DCSA          MCA
102         DCSA          M.SC
201         UIET          CSE
202         UIET          IT
203         IMS           MBA

• Insertion Anomalies: It may not be possible to store some information unless some other information is stored as well, e.g. if a department offers one more course, say M.Phil., we cannot enter this data into the table until a student opts for this course.

• Deletion Anomalies: It may not be possible to delete some information without losing some other information as well, e.g. if a department wants to close a specific course, we cannot do so until all the students enrolled in that course have been deleted.

• Update Anomalies: If one copy of repeated data is updated, an inconsistency is created unless all copies are similarly updated, e.g. if we want to change the name of a department, we must make the same change in every tuple, and in every other table, that stores this name.

9.3.3 Normalization

Normalization is a step by step process of removing redundancies of attributes in data structure.

It is a technique used to design relational database. In order to design a relational model we have

to decide the logical structure of the database.

Normalization can be defined as a process during which redundant relational schemas are decomposed into smaller ones.

Normalization is typically a refinement process after the initial exercise of identifying the data objects that should be in the database, identifying their relationships and defining the tables required and the columns within each table.

(i) Rules of Normalization

Normalization is a specific relational database analysis and design technique used to model

groups of related data within an organization. Its purpose is to ensure data stored within the

database adheres to best practices by following a set of rules with the purpose of eliminating

redundancies and optimizing the process of information retrieval. Normalization leaves us with a

structure that groups like data into relational models referenced by keys and linked to other

relational models to form a relational database schema.

Normalization is represented by a logical set of steps that follow simple rules that are applied to

each stage of the modeling process. At the highest level the stages are separated into something

called Normal Forms, identified by a particular named process.

Initially there were only three normal forms, First Normal Form (1NF), Second Normal Form

(2NF) and Third Normal Form (3NF), but over time three more were added. In general terms the

first three are more commonly used in database modeling. The additional three identify further potential redundancies; when applied in practice, however, they can lead to inefficiencies in performance, so they tend to be used only under special circumstances or with complex data structures.

In addition we have the Un-Normalized Form (UNF) which, though not generally considered part of the Normalization rules, represents the very first stage of the Normalization process.

We can identify each of the normal forms as follows and will define each in detail thereafter:

1. Un-Normalized Form (UNF) – Data Modeling

2. First Normal Form (1NF) – Repeating Groups

3. Second Normal Form (2NF) – Partial Dependencies

4. Third Normal Form (3NF) – Transitive Dependencies

Figure 9.1: Normalization Process

Normalization helps to simplify the structure of tables. The performance of an application is

directly linked to the database design. Some rules that should be followed to achieve a good

database design are:

• Each table should have an identifier (primary key attribute).

• Each table should store data for a single type of entity.

• Null values in columns should be avoided.

• Repetition of values in columns should be avoided.

Normalization depends on certain specified constraints and rules that support Codd's RDBMS rules. One of the constraints between two sets of attributes from the database is the Functional Dependency, as discussed in Chapter 8. We must be familiar with it before proceeding towards normalization.

The normal forms are applicable to individual tables; to say that an entire database is in a normal form is to say that all of its tables are in that normal form. The upcoming sections describe in detail the normal forms based on the primary key.

9.3.4 Normal Forms

Normalization works through a series of stages called normal forms. In a good database design we need some guidance on how to decompose a relation into smaller relations. To provide such guidance, several normal forms have been proposed. The normal forms based on functional dependency are first normal form (1NF), second normal form (2NF), third normal form (3NF) and Boyce-Codd normal form (BCNF). All of these normal forms are based on the primary key.

(i) First Normal Form (1NF)

It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of values or a combination of both as an attribute value for a single tuple.

In other words, "A relation is said to be in the first normal form if every attribute of that relation stores atomic or indivisible values in every tuple."

Consider the following non-normalized relation.

Employee

EmpId    EmpName    Project
101      Arpit      Networking, Software Engineering, Operating System
102      Aryan      Management Information System, Marketing
103      Satvik     System Analyst and Programming

The above relation does not fulfill the definition of first normal form, because the attribute {Project} is a multi-valued attribute. There are three techniques to convert this non-normalized relation into first normal form.

1. Horizontal Expansion:

Expand the number of attributes if the maximum number of values is known for an attribute. For example, if it is known that at most three projects can be allocated to one employee, then we can create new attributes {Project1, Project2, Project3}, one for every value of the Project attribute.

The above relation can be normalized as below:

Employee

EmpId    EmpName    Project1                          Project2                Project3
101      Arpit      Networking                        Software Engineering    Operating System
102      Aryan      Management Information System     Marketing               Null
103      Satvik     System Analyst and Programming    Null                    Null

This solution has the disadvantage of introducing null values if most employees have fewer than three projects. Therefore this solution is rejected.

2. Vertical Expansion:

Expand the key attribute so that there is a separate tuple in the original relation for each value of the project.

Employee

EmpId    EmpName    Project
101      Arpit      Networking
101      Arpit      Software Engineering
101      Arpit      Operating System
102      Aryan      Management Information System
102      Aryan      Marketing
103      Satvik     System Analyst and Programming

This solution has the disadvantage of introducing redundant data in the relation. Therefore this

solution is also rejected.

3. Decompose the Relation:

To convert the non-normalized relation into a normalized table, we may remove the attribute that violates the definition of 1NF and place that attribute in a separate relation along with the primary key. The primary key of a relation is an attribute or a set of attributes that uniquely identifies a tuple in the relation. This decomposes the non-normalized relation into two relations, namely Emp_Proj and Proj_Info, which fulfill the definition of 1NF.

Emp_Proj

EmpId    EmpName    ProjectId
101      Arpit      10
101      Arpit      11
101      Arpit      12
102      Aryan      13
102      Aryan      14
103      Satvik     15

Proj_Info

ProjectId    Project_Name
10           Networking
11           Software Engineering
12           Operating System
13           Management Information System
14           Marketing
15           System Analyst and Programming

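Of the above three techniques, the third is superior because it introduces no null values, is completely general and follows the rules of First Normal Form. The repetition of EmpName that remains in Emp_Proj is a partial dependency on the key (EmpName depends on EmpId alone), which is dealt with by the second normal form.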
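The third technique can be sketched in Python as follows. This is only an illustration: the function name `decompose` and the rule of numbering projects from 10 in order of first appearance are assumptions chosen so that the output reproduces the Emp_Proj and Proj_Info tables of the example.

```python
# The non-normalized Employee relation: Project holds a list of values.
employees = [
    (101, "Arpit", ["Networking", "Software Engineering", "Operating System"]),
    (102, "Aryan", ["Management Information System", "Marketing"]),
    (103, "Satvik", ["System Analyst and Programming"]),
]


def decompose(rows, first_id=10):
    """Split the multi-valued Project attribute into two 1NF relations."""
    project_ids = {}   # Project_Name -> ProjectId
    emp_proj = []      # tuples (EmpId, EmpName, ProjectId)
    for emp_id, emp_name, projects in rows:
        for project in projects:
            if project not in project_ids:
                project_ids[project] = first_id + len(project_ids)
            emp_proj.append((emp_id, emp_name, project_ids[project]))
    proj_info = [(pid, name) for name, pid in project_ids.items()]
    return emp_proj, proj_info


emp_proj, proj_info = decompose(employees)
for row in emp_proj:
    print(row)
for row in proj_info:
    print(row)
```

Every attribute value in both resulting relations is atomic, which is all that 1NF requires.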

(ii) Second Normal Form (2NF)

A relation is said to be in the second normal form if it is in the first normal form and all non-key attributes are fully functionally dependent on the key. The concepts of Full Functional Dependency and Partial Functional Dependency are already explained in Chapter 8 Section 3.2.

Consider the non-normalized relation "Order" as follows:

Order

Orderno    ItemNo    Orderdate     Units
100        1000      01/01/2014    200
101        1000      02/01/2014    250
102        1002      03/01/2014    300
103        1003      04/01/2014    250

In the above relation "Order", the attributes Orderno and Itemno together form a composite key, and Orderdate and Units are the non-key attributes. Orderdate is functionally dependent on Orderno alone, because each value of Orderno has a unique value of Orderdate. It is not functionally dependent on Itemno: for Itemno 1000, for example, there are two values of Orderdate, '01/01/2014' and '02/01/2014'. Orderdate therefore depends on only a part of the key, which is a partial dependency, and so the relation "Order" is not in second normal form. For a relation to be in second normal form, every non-key attribute must be fully functionally dependent on the whole of the primary key. To convert this non-normalized relation into a normalized relation, the following steps are performed.

• Find and remove the attributes that are functionally dependent on only a part of the key and not on the whole key, and place them in a different table.

• Group the remaining attributes.

To convert this relation into 2NF, we must remove the attributes that are not fully functionally dependent on the whole key and place them in a different table along with the attribute that they are functionally dependent on. In the above example, since Orderdate is not fully functionally dependent on the whole of the key, i.e. Orderno + Itemno, we can place Orderdate along with Orderno in a separate table called Orderinfo, and the attributes Orderno, Itemno and Units in a separate relation called Iteminfo.

Orderinfo

Orderno    Orderdate
100        01/01/2014
101        02/01/2014
102        03/01/2014
103        04/01/2014

Iteminfo

Orderno    Itemno    Units
100        1000      200
101        1000      250
102        1002      300
103        1003      250

Hence the resultant relation fulfills the definition of second normal form. Therefore normalized

relations are achieved as above.
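The reasoning above can be checked on the sample data with a short Python sketch. The helper names `holds`, `project` and `natural_join` are hypothetical. Note that `holds` only tests whether a dependency is satisfied by this particular relation state; as stated in Chapter 8, a functional dependency is a property of the schema and cannot be proved from one state.

```python
order = [
    {"Orderno": 100, "Itemno": 1000, "Orderdate": "01/01/2014", "Units": 200},
    {"Orderno": 101, "Itemno": 1000, "Orderdate": "02/01/2014", "Units": 250},
    {"Orderno": 102, "Itemno": 1002, "Orderdate": "03/01/2014", "Units": 300},
    {"Orderno": 103, "Itemno": 1003, "Orderdate": "04/01/2014", "Units": 250},
]


def holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs` in this state."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Projection onto `attrs` with duplicate tuples removed."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


def natural_join(left, right):
    """Join two relations on the attributes they have in common."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


print(holds(order, ["Orderno"], ["Orderdate"]))  # True: depends on part of the key
print(holds(order, ["Itemno"], ["Orderdate"]))   # False: item 1000 has two dates

orderinfo = project(order, ["Orderno", "Orderdate"])
iteminfo = project(order, ["Orderno", "Itemno", "Units"])
joined = natural_join(orderinfo, iteminfo)
```

Joining Orderinfo and Iteminfo on Orderno gives back exactly the tuples of the original relation, so no information is lost by the decomposition.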

(iii) Third Normal Form (3NF)

A relation is said to be in third normal form when it is already in second normal form and all the non-key attributes of the relation are independent of all other non-key attributes of the same table. In other words, every non-key attribute must depend directly on the primary key, and not transitively through any other field in the table. The concept of transitive functional dependency is already explained in Chapter 8 Section 3.2.

Hostel

IdNo    Name        Department    Course    Hostelno
100     Puneet      Computers     MCA       1
200     Vinod       Computers     MSc       2
300     Arman       Science       BSc       3
400     Jayshree    Management    MBA       4

In this relation, IdNo is the primary key attribute and all the non-key attributes are functionally dependent on it. Since the key is a single attribute, no partial dependency is possible, so the relation is in second normal form. However, the non-key attribute Hostelno is also functionally dependent on another non-key attribute, Course. Therefore this relation is not in 3NF. To convert this non-normalized relation into a normalized relation, the following steps are performed.

• Find and remove the attributes that are functionally dependent on attributes that are not the primary key, and place them in a different table.

• Group the remaining attributes.

Student_Detail

IdNo    Name        Department    Course
100     Puneet      Computers     MCA
200     Vinod       Computers     MSc
300     Arman       Science       BSc
400     Jayshree    Management    MBA

Course

Course    Hostelno
MCA       1
MSc       2
BSc       3
MBA       4

Hence the resultant relation fulfills the definition of third normal form. Therefore normalized

relations are achieved as above.

(iv) Boyce-Codd Normal Form (BCNF)

BCNF was introduced as a simpler but stricter form of 3NF, because 3NF was inadequate in some situations. It was not satisfactory for relations:

• That have multiple candidate keys;

• Where the multiple candidate keys are composite (multi-attribute); and

• Where the multiple candidate keys overlap, that is, have at least one attribute in common.

A relation is said to be in BCNF if it is in 3NF and no attribute of one multi-attribute key depends on an attribute of another multi-attribute key. The usual formal statement is that every determinant (the left-hand side of a non-trivial functional dependency) must be a candidate key.

Assume that a relation has more than one possible multi-attribute key, and that these keys have a common attribute. If an attribute of one composite key is dependent on an attribute of another composite key, the relation is not in BCNF.

Consider a relation "Teacher", where a teacher can work in more than one department and the percentage of time spent in each department is given, but each department has only one HOD.

Teacher

TeacherId    Department     HOD               PercentTime
100          Computers      Rajinder Nath     50
200          Mathematics    Anil Vashistha    60
200          Physics        M.S. Yadav        40
300          History        Mukta Garg        30

In the above relation, TeacherId and Department together form a composite key, and the attributes HOD and PercentTime are functionally dependent on it. TeacherId and HOD also form a composite key, on which Department and PercentTime are functionally dependent. Further, HOD is functionally dependent on Department, which is not a candidate key by itself. Hence the above relation is not in BCNF.

In order to normalize the relation into BCNF, we decompose the old relation into two sub-tables, i.e. the relations "Department" and "HOD".

Department

TeacherId    Department     PercentTime
100          Computers      50
200          Mathematics    60
200          Physics        40
300          History        30

HOD

Department     HOD
Computers      Rajinder Nath
Mathematics    Anil Vashistha
Physics        M.S. Yadav
History        Mukta Garg

Hence the resultant relation fulfills the definition of BCNF. Therefore normalized relations are

achieved as above.
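The BCNF condition, namely that the left-hand side of every non-trivial functional dependency determines all attributes of the relation, can be checked mechanically. The sketch below is a minimal illustration: the function names `closure` and `bcnf_violations` are hypothetical, and the dependency lists simply encode the dependencies stated in the text for each relation. It tests only the dependencies it is given, so it is not a complete normal-form checker.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial fds whose left-hand side is not a superkey of `schema`."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(schema):
            bad.append((lhs, rhs))
    return bad


teacher = ("TeacherId", "Department", "HOD", "PercentTime")
teacher_fds = [
    (("TeacherId", "Department"), ("HOD", "PercentTime")),
    (("TeacherId", "HOD"), ("Department", "PercentTime")),
    (("Department",), ("HOD",)),
]
print(bcnf_violations(teacher, teacher_fds))

# The two relations obtained by the decomposition.
department = ("TeacherId", "Department", "PercentTime")
department_fds = [(("TeacherId", "Department"), ("PercentTime",))]
hod = ("Department", "HOD")
hod_fds = [(("Department",), ("HOD",))]
print(bcnf_violations(department, department_fds), bcnf_violations(hod, hod_fds))
```

For the original relation the only reported violation is Department → HOD, and neither of the decomposed relations reports any.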

9.3.5 Example First and Second Normal Form

Example -1. Assume the following relation

Student-courses (Sid: pk, Sname, Phone, Courses-taken)

Where attribute Sid is the primary key, Sname is the student's name, Phone is the student's phone number and Courses-taken is a table that contains the course-id, course-description, credit hours and grade for each course taken by the student. A more precise definition of the table Course-taken is:

Course-taken (Course-id: pk, Course-description, Credit-hours, Grade)

According to the definition of first normal form, the relation Student-courses is not in first normal form because one of its attributes, Courses-taken, is itself a table and not a simple attribute.

To clarify this further, assume the above tables contain the data shown below:

Student-courses

Sid   Sname     Phone      Courses-taken
100   John      487 2454   St-100-courses-taken
200   Smith     671 8120   St-200-courses-taken
300   Russell   871 2356   St-300-courses-taken

St-100-Course-taken

Course-id   Course-description      Credit-hours   Grade
IS380       Database Concepts       3              A
IS416       Unix Operating System   3              B

St-200-Course-taken

Course-id   Course-description      Credit-hours   Grade
IS380       Database Concepts       3              B
IS416       Unix Operating System   3              B
IS420       Data Net Work           3              C

St-300-Course-taken

Course-id   Course-description      Credit-hours   Grade
IS417       System Analysis         3              A

Now we will verify the various anomalies.

1. Insertion anomaly means that some data cannot be inserted in the database. For example, we cannot add a new course to the database of Example-1 unless we insert a student who has taken that course.

2. Update anomaly means that we have data redundancy in the database, and to make any modification we have to change all copies of the redundant data or else the database will contain incorrect data. For example, in our database the course description "Database Concepts" for IS380 appears in both the St-100-Course-taken and St-200-Course-taken tables. To change its description to "New Database Concepts" we have to change it in all places. Indeed, one of the purposes of normalization is to eliminate data redundancy in the database.

3. Deletion anomaly means that deleting some data causes other information to be lost. For example, if student Russell is deleted together with his St-300-Course-taken table, we also lose the information that we had a course called IS417 with description System Analysis.

Thus the Student-courses table suffers from all three anomalies.

To convert the above structure to first normal form relations, all non-simple attributes must be removed or converted to simple attributes. To do that, a new relation is created by combining each row of Student-courses with all rows of the corresponding course table for that specific student. Following is the Student-courses table in first normal form.

Student-courses (Sid:pk1, Sname, Phone, Course-id:pk2, Course-description, Credit-hours, Grade)

Next we check whether the resultant table fulfils the properties of the normal forms.

Notice that the primary key of this table is a composite key made up of two parts; Sid and

Course-id. Note that pk1 following an attribute indicates that the attribute is the first part of the

primary key and pk2 indicates that the attribute is the second part of the primary key.

Student-courses

Sid   Sname     Phone      Course-id   Course-description      Credit-hours   Grade
100   John      487 2454   IS380       Database Concepts       3              A
100   John      487 2454   IS416       Unix Operating System   3              B
200   Smith     671 8120   IS380       Database Concepts       3              B
200   Smith     671 8120   IS416       Unix Operating System   3              B
200   Smith     671 8120   IS420       Data Net Work           3              C
300   Russell   871 2356   IS417       System Analysis         3              A

Examination of the above Student-courses relation reveals that Sid does not uniquely identify a row (tuple) in the relation and hence cannot be the primary key. For the same reason Course-id cannot be the primary key. However, the combination of Sid and Course-id uniquely identifies a row in Student-courses. Therefore (Sid, Course-id) is the primary key of the above relation.
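The conversion to first normal form and the choice of primary key can be reproduced in a few lines. The following is a rough sketch in Python; the helper names flatten and is_key are hypothetical and introduced only for illustration. It combines each student row with each of its course rows, then tests which column combinations are unique across the resulting rows.

```python
# Nested (non-1NF) data: each student carries a list of course rows.
students = [
    (100, "John", "487 2454", [
        ("IS380", "Database Concepts", 3, "A"),
        ("IS416", "Unix Operating System", 3, "B"),
    ]),
    (200, "Smith", "671 8120", [
        ("IS380", "Database Concepts", 3, "B"),
        ("IS416", "Unix Operating System", 3, "B"),
        ("IS420", "Data Net Work", 3, "C"),
    ]),
    (300, "Russell", "871 2356", [
        ("IS417", "System Analysis", 3, "A"),
    ]),
]


def flatten(nested):
    """Combine each student row with each of its course rows (1NF)."""
    return [
        (sid, sname, phone, *course)
        for sid, sname, phone, courses in nested
        for course in courses
    ]


def is_key(rows, positions):
    """True if the values at the given column positions are unique per row."""
    seen = {tuple(row[i] for i in positions) for row in rows}
    return len(seen) == len(rows)


student_courses = flatten(students)

# Column 0 is Sid and column 3 is Course-id in the flattened rows.
print("rows:", len(student_courses))
print("Sid alone is a key:", is_key(student_courses, [0]))
print("Course-id alone is a key:", is_key(student_courses, [3]))
print("(Sid, Course-id) is a key:", is_key(student_courses, [0, 3]))
```

Note that uniqueness in sample data only shows that a column combination could be a key; whether it really is one depends on the rules of the enterprise being modelled.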

The primary key determines every attribute. For example, if you know both Sid and Course-id for any student you will be able to retrieve Sname, Phone, Course-description, Credit-hours and Grade, because these attributes are dependent on the primary key. Figure 9.2 below is the graphical representation of the functional dependency between the primary key and the attributes of the above relation.

Note that the attribute to the right of the arrow is functionally dependent on the attribute to the left of the arrow. Thus the combination (Sid, Course-id) is the determinant (that determines other attributes) and the attributes Sname, Phone, Course-description, Credit-hours and Grade are dependent attributes.

Figure 9.2: Functional Dependency 1

Figure 9.3: Functional Dependency 2

Formally speaking, a determinant is an attribute or a group of attributes that determines the value of other attributes. In addition to (Sid, Course-id) there are two other determinants in the above Student-courses relation: the attributes Sid and Course-id. Note that Sid alone determines both Sname and Phone, and Course-id alone determines both Credit-hours and Course-description.

Attribute Grade is fully functionally dependent on the primary key (Sid, Course-id) because both parts of the primary key are needed to determine Grade. On the other hand, the attributes Sname and Phone are not fully functionally dependent on the primary key, because only a part of the primary key, namely Sid, is needed to determine them. Likewise, the attributes Credit-hours and Course-description are not fully functionally dependent on the primary key because only Course-id is needed to determine their values.
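These dependencies can be tested against the sample rows. The sketch below is illustrative only: the functions holds and fully_dependent are hypothetical helpers, not part of any library. A dependency X -> Y holds in the data if no two rows agree on X but differ on Y; an attribute is fully dependent on a composite key if it depends on the whole key but on no proper part of it. Sample data can refute a dependency but can never prove that it holds in general.

```python
from itertools import combinations

COLUMNS = ("Sid", "Sname", "Phone", "Course-id",
           "Course-description", "Credit-hours", "Grade")

ROWS = [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]


def holds(lhs, rhs, rows=ROWS, columns=COLUMNS):
    """Return True if the dependency lhs -> rhs is not violated by the rows."""
    left = [columns.index(c) for c in lhs]
    right = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in left)
        value = tuple(row[i] for i in right)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(key, attr):
    """attr depends on the whole key and on no proper, non-empty part of it."""
    if not holds(key, [attr]):
        return False
    for size in range(1, len(key)):
        for part in combinations(key, size):
            if holds(part, [attr]):
                return False
    return True


PK = ("Sid", "Course-id")
for attr in ("Sname", "Phone", "Course-description", "Credit-hours", "Grade"):
    print(attr, "fully dependent on primary key:", fully_dependent(PK, attr))
```

With the sample rows only Grade comes out as fully dependent on (Sid, Course-id), which matches the discussion above.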

The new relation Student-courses still suffers from all three anomalies for the following reasons:

1. The relation contains redundant data (note that Database Concepts as the course description for IS380 appears in more than one place).

2. The relation contains information about two entities, Student and Course.

Following is the detailed description of the anomalies that the relation Student-courses suffers from.

1. Insertion anomaly: We cannot add a new course such as IS247 with course description Programming Techniques to the database unless we add a student who takes the course.

2. Update anomaly: If we change the course description for IS380 from Database Concepts to New Database Concepts, we have to make the change in more than one place or else the database will be inconsistent. In other words, in some places the course description will be New Database Concepts and in any place where we forgot to make the change the description will still be Database Concepts.

3. Deletion anomaly: If student Russell is deleted from the database we also lose the information that we had on course IS417 with description System Analysis.

The above observation indicates that having a single table Student-courses for our database causes problems (anomalies). Therefore we break the table into smaller tables to get relations in a higher normal form.

To convert Student-courses to second normal form relations, we have to make all non-primary attributes fully functionally dependent on the primary key. To do that we need to project the Student-courses table (that is, break it down) into two or more tables. However, projections may cause problems. To avoid such problems it is important to keep attributes that are dependent on each other in the same table when a relation is projected into smaller relations. Following this principle, examination of Figure 9.2 indicates that we should divide the Student-courses relation into the following three relations:

PROJECT Student-courses ON (Sid, Sname, Phone) creates a table; call it Student. The relation Student will be Student (Sid:pk, Sname, Phone), and

PROJECT Student-courses ON (Sid, Course-id, Grade) creates a table; call it Student-grade. The relation Student-grade will be

Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade), and

PROJECT Student-courses ON (Course-id, Course-description, Credit-hours) creates a table; call it Courses. Following are these three relations and their contents:

Student (Sid:pk, Sname, Phone)

Sid   Sname     Phone
100   John      487 2454
200   Smith     671 8120
300   Russell   871 2356

Courses (Course-id:pk, Course-description, Credit-hours)

Course-id   Course-description      Credit-hours
IS380       Database Concepts       3
IS416       Unix Operating System   3
IS420       Data Net Work           3
IS417       System Analysis         3

Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade)

Sid   Course-id   Grade
100   IS380       A
100   IS416       B
200   IS380       B
200   IS416       B
200   IS420       C
300   IS417       A

All three of these relations are in second normal form. Examination of these relations shows that we have eliminated the redundancy in the database. Now the relation Student contains information related only to the entity Student, the relation Courses contains information related only to the entity Course, and the relation Student-grade contains information related to the relationship between these two entities.

Further, these three relations are free from all three anomalies:

1. Insertion anomaly: Now a new course with course-id IS247 and its course description can be inserted into the table Courses. Equally, we can add any new student to the database by adding their id, name and phone to the Student table. Therefore our database, which is made up of these three tables, does not suffer from insertion anomaly.

Figure 9.4: Functional Dependency 3

2. Update anomaly: Since redundancy of the data was eliminated no update anomaly can

occur. To change the course-description for IS380 only one change is needed in table

Courses.

3. Deletion anomaly: The deletion of student Russell from the database is achieved by deleting Russell's records from both the Student and Student-grade relations, and this does not have any side effect because the course IS417 remains untouched in the table Courses.
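The three checks above can be tried out on a small test database. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS; hyphens in the attribute names are replaced by underscores because SQL identifiers cannot contain unquoted hyphens, and the credit hours given to the new course IS247 are an assumed value for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Student (
        Sid   INTEGER PRIMARY KEY,
        Sname TEXT NOT NULL,
        Phone TEXT
    );
    CREATE TABLE Courses (
        Course_id          TEXT PRIMARY KEY,
        Course_description TEXT NOT NULL,
        Credit_hours       INTEGER NOT NULL
    );
    CREATE TABLE Student_grade (
        Sid       INTEGER NOT NULL REFERENCES Student(Sid),
        Course_id TEXT NOT NULL REFERENCES Courses(Course_id),
        Grade     TEXT,
        PRIMARY KEY (Sid, Course_id)
    );
""")

conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    (100, "John", "487 2454"),
    (200, "Smith", "671 8120"),
    (300, "Russell", "871 2356"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("IS380", "Database Concepts", 3),
    ("IS416", "Unix Operating System", 3),
    ("IS420", "Data Net Work", 3),
    ("IS417", "System Analysis", 3),
])
conn.executemany("INSERT INTO Student_grade VALUES (?, ?, ?)", [
    (100, "IS380", "A"), (100, "IS416", "B"), (200, "IS380", "B"),
    (200, "IS416", "B"), (200, "IS420", "C"), (300, "IS417", "A"),
])

# 1. Insertion: a course can be added before any student takes it.
#    (The credit hours for IS247 are assumed, not taken from the text.)
conn.execute("INSERT INTO Courses VALUES (?, ?, ?)",
             ("IS247", "Programming Techniques", 3))

# 2. Update: the description of IS380 is stored once, so one row changes.
updated_rows = conn.execute(
    "UPDATE Courses SET Course_description = ? WHERE Course_id = ?",
    ("New Database Concepts", "IS380"),
).rowcount

# 3. Deletion: removing Russell leaves course IS417 in place.
conn.execute("DELETE FROM Student_grade WHERE Sid = ?", (300,))
conn.execute("DELETE FROM Student WHERE Sid = ?", (300,))
is417 = conn.execute(
    "SELECT Course_description FROM Courses WHERE Course_id = ?", ("IS417",)
).fetchone()

print("rows updated:", updated_rows)
print("IS417 after deleting Russell:", is417)
```

Russell's grades are deleted before his Student row because, with foreign keys enabled, SQLite refuses to delete a student who is still referenced from Student_grade.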

9.4 Summary

The normal forms of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise.

9.5 Suggested Reading/ Reference Material

1. Elmasri & Navathe: Fundamentals of Database systems, 3rd Edition, Addison Wesley,

New Delhi.

2. Raghu Ramakrishnan & Johannes Gehrke: Database Management Systems, 2nd edition,

Mcgraw Hill International Edition.

3. C.J.Date: An Introduction to Databases Systems, 7th Edition, Addison Wesley, New

Delhi.

4. Bipin C.Desai : An Introduction to Database System, Galgotia Publication, New Delhi

9.6 Self Assessment Questions (SAQ)

1. Define Normalization? Why do we need to normalize the database?

2. What do you understand by database anomalies? Write the procedure to generate First

normal form.

3. "Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF." Comment on the statement.

4. What do you mean by Normalization? Discuss the normal forms based on Primary key.

Chapter – 10: An Introduction to MS-Access
Writer: Dr. Kanwal Garg
Vetter: Prof. Rajender Nath
Structure:
10.1 Introduction
10.2 Objective
10.3 Presentation of Content
10.3.1 Interface Elements in MS-Office Access 2007
10.3.2 Tool Bars and Their Icons
(i) Getting Started with Microsoft Office Access
(ii) The Ribbon

(iii) Command Tab


10.3.3 Creating a New Database
10.3.4 Creating a Table
(i) Create a Table in Datasheet View
(ii) Create a Table in Design View
(iii) Create a Table Based on a Table Template
10.3.5 Relationship
(i) Creating a Table by Using the Table Wizard
(ii) Creating a Table by Entering Data in a Datasheet
(iii) Creating a Table by Entering Data in Design View
10.3.6 Import/ Export Tables
(i) External Data Operations in Access
(ii) Types of Data That Access Can Import, Link To, Or Export
(iii) Import or Link to Data in another Format
(iv) Export Data to another Format
10.4 Summary
10.5 Suggested Reading/ Reference Material
10.6 Self Assessment Questions (SAQ)

10.1 Introduction

Microsoft Access is a relational database management system that comes as a part of the Microsoft Office suite. MS-Access is a graphical user interface (GUI) application that is easy to use yet powerful enough to manage large volumes of data. It is generally used to manage data in domains such as scientific work, inventory, finance, payroll, education and hospitality. MS-Access can be used at the client end or at the server end in a client-server computing architecture.

10.2 Objective

The default file extension in MS-Access 2007 is .accdb; earlier versions used .mdb. In a single MS-Access file we can create multiple database objects, i.e. tables, queries, forms, reports, data access pages, macros and modules. MS-Access 2007 comprises a number of elements that define how we interact with the product. These elements were chosen to help you find the commands you need quickly. The most significant interface element in MS-Access 2007 is called the Ribbon. The Ribbon is the strip across the top of the program window that contains groups of commands. The Office Fluent Ribbon provides a single home for commands and is the primary replacement for menus and toolbars. On the Ribbon are tabs that combine commands in ways that make sense. In Office Access 2007, the main Ribbon tabs are Home, Create, External Data, and Database Tools. Each tab contains groups of related commands, and these groups surface some of the additional new GUI elements, such as the gallery, which is a new type of control that presents choices visually. In the upcoming sections we will discuss some of the important items.

10.3 Presentation of Content

10.3.1 Interface Elements in MS-Office Access 2007

• Getting Started with Microsoft Office Access: The page that is displayed when you start Access from the Windows Start button or from a desktop shortcut.

• The Office Fluent Ribbon: The area at the top of the program window where you can choose commands.

• Command Tab: Commands are combined in ways that make sense.

• Contextual Command Tab: A command tab that appears depending on your context, that is, the object that you are working on or the task that you are performing.

• Gallery: A control that displays a choice visually so that you can see the results that you will get.

• Quick Access Toolbar: A single standard toolbar that appears on the Ribbon and offers single-click access to the most needed commands, such as Save and Undo.

• Navigation Pane: The area on the left side of the window that displays your database objects. The Navigation Pane replaces the Database window from earlier versions of Access.

• Tabbed Documents: Your tables, queries, forms, reports, pages, and macros are displayed as tabbed documents.

• Status Bar: The bar at the bottom of the program window that displays status information and includes buttons that allow you to change your view.

• Mini Toolbar: An on-object element that transparently appears above text that you have selected, so that you can easily apply formatting to the text.

10.3.2 Tool Bars and Their Icons

(i) Getting Started with Microsoft Office Access

When you start Office Access 2007 by clicking the Windows Start button or a desktop shortcut (but not when you click on a database), the Getting Started with Microsoft Office Access page appears, as shown in Figure 10.1. This page shows what you can do to get started in Office Access 2007.

Figure 10.1: Microsoft Office Access 2007

(ii) The Ribbon

The Office Fluent Ribbon is the primary replacement for menus and toolbars and provides the

main command interface in MS-Office Access 2007. One of the main advantages of the Ribbon

is that it consolidates, in one place, those tasks or entry points that used to require menus,

toolbars, task panes, and other GUI components to display. This way, you have only one place in

which to look for commands, instead of a multitude of places.

When you open a database, the Ribbon appears at the top of the main MS-Office Access 2007

window, where it displays the commands in the active command tab.

The Ribbon contains a series of command tabs that contain commands as shown in Figure 10.2.

In MS-Office Access 2007, the main command tabs are Home, Create, External Data, and

Database Tools. Each tab contains groups of related commands, and these groups surface some

of the additional new GUI elements, such as the gallery, which is a new type of control that

presents choices visually.

The commands on the Ribbon take into account the currently active object. For example, if you

have a table opened in Datasheet view and you click Form on the Create tab, in the Forms group,

MS-Office Access 2007 creates the form, based on the active table. That is, the name of the

active table is entered in the form's Record Source property.

Figure 10.2: Ribbon

You can use keyboard shortcuts with the Ribbon. All of the keyboard shortcuts from earlier versions of MS-Access continue to work. The Keyboard Access System replaces the menu accelerators from earlier versions of Access. This system uses small indicators with a single letter or combination of letters that appear on the Ribbon and indicate which keyboard shortcut activates the control underneath. When you have selected a command tab, you can browse the commands available within that tab.

(iii) Command Tab

1. Start MS-Access.

2. Click the tab that you want.

The following table shows a representative sampling of the tabs and the commands available on

each tab. The tabs and the commands available change depending on what you are doing.

Table 10.1: Tabs and Commands

Command Tab      Common things you can do

Home             Select a different view.
                 Copy and paste from the clipboard.
                 Set the current font characteristics.
                 Set the current font alignment.
                 Apply rich text formatting to a memo field.
                 Work with records (Refresh, New, Save, Delete, Totals, Spelling, More).
                 Sort and filter records.
                 Find records.

Create           Create a new blank table.
                 Create a new table using a table template.
                 Create a list on a SharePoint site and a table in the current database that links to the newly created list.
                 Create a new blank table in Design view.
                 Create a new form based on the active table or query.
                 Create a new pivot table or chart.
                 Create a new report based on the active table or query.
                 Create a new query, macro, module, or class module.

External Data    Import or link to external data.
                 Export data.
                 Collect and update data via e-mail.
                 Work with offline SharePoint lists.
                 Create saved imports and saved exports.
                 Move some or all parts of a database to a new or existing SharePoint site.

Database Tools   Launch the Visual Basic editor or run a macro.
                 Create and view table relationships.
                 Show/hide object dependencies or the property sheet.
                 Run the Database Documenter or analyze performance.
                 Move data to Microsoft SQL Server or to an Access (Tables only) database.
                 Run the Linked Table Manager.
                 Manage Access add-ins.
                 Create or edit a Visual Basic for Applications (VBA) module.

10.3.3 Creating a New Database

In MS-Access we have a variety of options for creating or opening a database. We can open a new blank database, create a new database from a featured template, create a new database from a Microsoft Office Online template, or open a recently used database. These options are explained below:

(i) Open a New Blank Database

1. Start Access from the Start menu or from a shortcut. The Getting Started with Microsoft

Office Access page appears.

2. On the Getting Started with Microsoft Office Access page, under New Blank Database,

click Blank Database.

3. In the Blank Database pane, in the File Name box, type a file name or use the one that is

provided for you.

4. Click Create.

The new database is created, and a new table is opened in Datasheet view.

Figure 10.3: Creating a New Database

Once you have created a blank database with a database name, you can create the following six

objects as described below:

• Tables - a collection of data about a specific topic, such as products or suppliers.

• Queries - a command for viewing or analyzing data in different ways, or the result of such a command.

• Forms - a friendly interface for adding new records.

• Reports - an object that presents the data in an organized way according to your specification. Examples are telephone bills, sales summaries etc.

• Macros - a set of one or more actions that each perform a particular operation, such as opening a form or printing a report. Macros can help you to automate common tasks. For example, you can run a macro that prints a report when a user clicks a command button.

• Modules - a collection of small programs and procedures that are stored together as a unit.

(ii) Create a New Database from a Featured Template

1. Start Access from the Start menu or from a shortcut. The Getting Started with Microsoft

Office Access page appears.

2. On the Getting Started with Microsoft Office Access page, under Featured Online

Templates, click a template.

3. In the File Name box, type a file name or use the one that is provided for you.

4. Optionally, select the Create and link your database to a Windows SharePoint Services site check box if you want to link to a Windows SharePoint Services site.

5. Click Create (or click Download). MS-Access creates a new database from the template and opens it.

(iii) Create a New Database from a Microsoft Office Online Template

1. Start Access from the Start menu or from a shortcut. The Getting Started with Microsoft

Office Access page appears.

2. On the Getting Started with Microsoft Office Access page, in the Template Categories

pane, click a category and then, when the templates in that category appear, click a

template.

3. In the File Name box, type a file name or use the one that is provided for you.

4. Click Download.

(iv) Open a Recently Used Database

1. Start Access.

2. On the Getting Started with Microsoft Office Access page, under Open Recent Database, click the database that you want to open. MS-Access will open the database.

10.3.4 Creating a Table:

There are three ways to create a table:

• Use Datasheet View, i.e. enter data directly

• Use Design View

• Use a Table Template

(i) Create a Table in Datasheet View

To create a blank (empty) table in Datasheet View, on the Ribbon you can:

• Click Create → Table, as shown in Figure 10.4.

Figure 10.5 shows a Datasheet View with the column headings ID and Add New Field across the top of the datasheet. Data can be entered directly into it. After you enter data and press the Enter key, the column heading Add New Field automatically changes to Field1 and the next column's heading becomes Add New Field. At the same time, an ID number is assigned to that row. When you save the new datasheet, Microsoft Access analyzes your data and automatically assigns the appropriate data type and format for each field. Because the default field names are not descriptive, you may want to rename the fields.

Figure 10.4: Ribbon for Creating New Table

a) Renaming Fields:

1. Place the cursor over the column heading you want to rename and double-click. The column heading will appear highlighted and the cursor will be blinking (edit mode).

2. Type the name you want to use and then press the Enter key.

3. Repeat the first two steps for the second column, and so on.

Figure 10.5: Creating a Table in Datasheet View (Renaming Fields)

Each column corresponds to a field and each row corresponds to a record. Now we are ready to add the information. Suppose we are building a database for a company: the first table we may have is Employee, and its fields may include SSN, LastName, FirstName, and so on. Figure 10.6 shows the Employee table as an example.

Figure 10.6: Datasheet View (Employee Table)

b) Summarizing Datasheet View

Figure 10.7: Summary of Datasheet View (Employee Table)

(ii) Create a Table in Design View

In Design View you can add new fields, define how each field appears or handles data, and create a primary key. To create a blank (empty) table in Design View, you can:

• Click Create → Table Design, as shown in Figure 10.4. The Design View shown in Figure 10.8 will appear.

• In this view, we can specify detailed properties for each field, including the length and type of information used in the field. To enter data into the table, however, we must use Datasheet View or forms. The Design View for the example Employee table mentioned before looks like Figure 10.9.

• There are three columns in the top portion of the window. The Field Name column holds the names of the fields. For example, SSN, FirstName and LastName are proper field names for the Employee table. The name of a field must follow the MS Access object-naming rules. The Data Type is like the domain of an attribute.

Figure 10.8: Design View

Figure 10.9: Design View (Employee Table)

It provides a list of data types that we can choose from, including Text, Memo, Number, Date, and so on. The Description column allows us to describe the field and is optional. This allows new users to easily understand the specifications and meaning of your fields. Table 10.3 summarizes all data types available in MS Access.

You can set up the properties of fields in the Field Properties pane in the bottom half of the window. Table 10.2 describes all properties available for setup.

Before we save the table and quit, we need to specify the primary key. In our Employee table, SSN is a good choice for the primary key. To define SSN as the primary key, click the Field Selector for the SSN field, as shown in Figure 10.8. The Field Selector is the gray bar on the left side of the Table Design grid by each field. When we click it, the whole row appears highlighted. Then click Edit → Primary Key on the menu, or click the Primary Key button (i.e. the key symbol, shown in Figure 10.9) on the toolbar in Design View; a key symbol will appear on the Field Selector. Save the table as Employee. Now we have created one table.
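What the primary key does for the Employee table can also be seen outside the Access user interface. The following is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for Access (an assumption made purely for illustration; Access itself is driven through Design View as described above). The sample SSN values and names are invented. Once SSN is declared as the primary key, the database refuses a second record with the same SSN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        SSN       TEXT PRIMARY KEY,  -- the field marked with the key symbol
        LastName  TEXT,
        FirstName TEXT
    )
""")

# Invented sample record, for illustration only.
conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
             ("000-00-0001", "Doe", "Jane"))

# A second record with the same SSN violates the primary key.
try:
    conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                 ("000-00-0001", "Roe", "Richard"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print("duplicate SSN rejected:", duplicate_rejected)
print("records stored:", employee_count)
```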

Table 10.2: Field Properties in Design View

Table 10.3: Data Types in MS Access

(c) Summarizing Design View

Figure 10.10: Summary of Design View (Employee Table)

(iii) Create a Table Based on a Table Template

To create a Contacts, Tasks, Issues, Events or Assets table, you might want to start with the table templates for these subjects that come with Office Access 2007. To choose a template for your table from the above predefined templates you can:

• Click Create → Table Templates, as shown in Figure 10.4.

• Select one of the available templates from the dropdown list.

10.3.5 Relationships

The tables in a database may be linked to each other by the creation of relationships between

specific fields in the database. These relationships can be viewed in the Relationships window:

Select Relationships on the Database Tools tab

Figure 10.11: Relationship
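A relationship between two tables corresponds to a foreign key in SQL terms. The sketch below again uses Python's sqlite3 module as a stand-in for Access; the Department table, the DeptId field and all sample values are hypothetical and are not taken from the text. It links Employee to Department through DeptId, shows that a record pointing to a non-existent department is refused, and follows the relationship with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when enabled
conn.executescript("""
    CREATE TABLE Department (            -- hypothetical table for illustration
        DeptId   INTEGER PRIMARY KEY,
        DeptName TEXT NOT NULL
    );
    CREATE TABLE Employee (
        SSN      TEXT PRIMARY KEY,
        LastName TEXT,
        DeptId   INTEGER REFERENCES Department(DeptId)  -- the relationship
    );
""")

conn.execute("INSERT INTO Department VALUES (?, ?)", (1, "Sales"))
conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", ("000-00-0001", "Doe", 1))

# An employee that refers to a missing department breaks the relationship.
try:
    conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                 ("000-00-0002", "Roe", 99))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Follow the relationship from Employee to Department.
linked = conn.execute("""
    SELECT e.LastName, d.DeptName
    FROM Employee AS e
    JOIN Department AS d ON d.DeptId = e.DeptId
""").fetchall()

print("orphan record rejected:", orphan_rejected)
print("linked records:", linked)
```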

(i) Creating a Table by Using the Table Wizard

Microsoft Access has a wizard named the Table Wizard that will create a table for you. This

wizard gives you suggestions about what type of table you can create (for example, a Mailing

List table, a Students table, a Tasks table, and so on) and gives you many different possible

names for fields within these tables. To use the Table Wizard to create a table, follow these

steps:

1. Create a new, blank database.

2. In the Database window, click Tables under Objects, and then click New.

3. In the New Table dialog box, double-click Table Wizard.

4. Follow the directions in the Table Wizard pages.

If you want to modify the table that the Table Wizard creates, open the table in Design view

when you have finished using the Table Wizard.

(ii) Creating a Table by Entering Data in a Datasheet

In Microsoft Access, you can also create a table by just entering data into columns (fields) in a

datasheet. If you enter data that is consistent in each column (for example, only names in one

column, or only numbers in another column), Access will automatically assign a data type to the

fields. To create a table by just entering data in a datasheet, follow these steps:

1. Create a new, blank database.

2. In the Database window, click Tables under Objects, and then click New.

3. In the New Table dialog box, double-click Datasheet View. A blank datasheet is

displayed with default column names Field1, Field2, and so on.

4. Rename each column that you want to use. To do so, double-click the column name, type

a name for the column, and then press ENTER.

You can insert additional columns at any time. To do so, click in the column to the right

of where you want to insert a new column, and then on the Insert menu, click Column.

Rename the column as described earlier.

5. Enter your data in the datasheet. Enter each kind of data in its own column. For example,

if you are entering names, enter the first name in its own column and the last name in a

separate column. If you are entering dates, times, or numbers, enter them in a consistent

format. If you enter data in a consistent manner, Microsoft Access can create an

appropriate data type and display format for the column. For example, for a column in

which you enter only names, Access will assign the Text data type; for a column in which

you enter only numbers, Access will assign a Number data type. Any columns that you

leave empty will be deleted when you save the datasheet.

6. When you have added data to all the columns that you want to use, click Save on the File

menu.

7. Microsoft Access asks you if you want to create a primary key. If you have not entered data that can be used to uniquely identify each row in your table, such as part numbers or ID numbers, it is recommended that you click Yes. If you have entered data that can uniquely identify each row, click No, and then specify the field that contains that data as your primary key in Design view after the table has been saved. To define a field as your primary key after the table has been saved, follow these steps:

a. In Design view, open the table that Access created from the data that you entered in the datasheet.

b. Select the field or fields that you want to define as the primary key.

To select one field, click the row selector for the desired field.

To select multiple fields, hold down the CTRL key, and then click the row

selector for each field.

c. On the Edit menu, click Primary Key.

If you want the order of the fields in a multiple-field primary key to be different from the

order of those fields in the table, click Indexes on the toolbar to display the Indexes

window, and then reorder the field names for the index named Primary Key.

As mentioned earlier, Microsoft Access will assign data types to each field (column)

based on the kind of data that you entered. If you want to customize a field's definition

further--for example, to change a data type that Access automatically assigned, or to

define a validation rule--open the table in Design view.
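The idea of assigning a data type from consistently entered data can be sketched as follows. This is only an illustration of the principle and not the actual rule set used by Access: the function infer_type is hypothetical, it recognises only numbers and dates written as YYYY-MM-DD, and it treats everything else as Text.

```python
from datetime import datetime


def _parses(value, parser):
    """Return True if parser accepts the value without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Guess a column type from the strings typed into it (illustrative rules)."""
    filled = [v.strip() for v in values if v.strip()]
    if not filled:
        return None  # empty column: the text says such columns are deleted on save
    if all(_parses(v, float) for v in filled):
        return "Number"
    if all(_parses(v, lambda s: datetime.strptime(s, "%Y-%m-%d")) for v in filled):
        return "Date/Time"
    return "Text"


print(infer_type(["John", "Smith", "Russell"]))
print(infer_type(["10", "20.5", "30"]))
print(infer_type(["2007-01-15", "2007-02-01"]))
print(infer_type(["10", "ten"]))
print(infer_type(["", "  "]))
```

A single inconsistent entry, such as "ten" in a column of numbers, is enough to make the whole column fall back to Text, which is why the steps above recommend entering data in a consistent format.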

(iii) Creating a Table by Entering Data in Design View

If you want to create the basic table structure yourself and define all the field names and data

types, you can create the table in Design view. To do so, follow these steps:

1. Create a new, blank database.

2. In the Database window, click Tables under Objects, and then click New.

3. In the New Table dialog box, double-click Design View.

4. In the <Table Name>: Table dialog box, define each of the fields that you want to include

in your table. To do so, follow these steps:

a. Click in the Field Name column, and then type a unique name for the field.

b. In the Data Type column, accept the default data type of Text that Access assigns or

click in the Data Type column, click the arrow, and then select the data type that you

want.

c. In the Description column, type a description of the information that this field will

contain. This description is displayed on the status bar when you are adding data to

the field, and it is included in the Object Definition of the table. The description is

optional.

d. Once you have added some fields, you may need to insert a field between two other

fields. To do so, click in the row below where you want to add the new field, and then

on the Insert menu, click Rows. This creates a blank row in which you can add a new

field.

To add a field to the end of the table, click in the first blank row. After you have

added all the fields, define a primary key field before saving your table. A primary

key is one or more fields whose value or values uniquely identify each record in a

table. To define a primary key, follow these steps:

e. Select the field or fields that you want to define as the primary key. To select one

field, click the row selector for the desired field. To select multiple fields, hold down

the CTRL key, and then click the row selector for each field.

f. On the Edit menu, click Primary Key.

If you want the order of the fields in a multiple-field primary key to be different from the order

of those fields in the table, click Indexes on the toolbar to display the Indexes dialog box, and

then reorder the field names for the index named Primary Key.

You do not have to define a primary key, but it is usually a good idea. If you do not define a

Primary key, Microsoft Access asks if you want Access to create one for you when you save the

table. When you are ready to save your table, on the File menu, click Save, and then type a

unique name for the table.
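The effect of a multiple-field primary key can be sketched outside Access. The following is a minimal illustration using Python's built-in sqlite3 module rather than Access itself; the table name `enrollment` and its fields are hypothetical and chosen only for the example. It shows that two records may share one key field, but not the whole combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE enrollment (
        student_id  INTEGER NOT NULL,
        course_code TEXT    NOT NULL,
        grade       TEXT,
        PRIMARY KEY (student_id, course_code)  -- multiple-field primary key
    )
    """
)
conn.execute("INSERT INTO enrollment VALUES (1, 'DB101', 'A')")
conn.execute("INSERT INTO enrollment VALUES (1, 'OS201', 'B')")  # same student, other course: allowed

try:
    # Repeats an existing (student_id, course_code) pair, so it must be rejected.
    conn.execute("INSERT INTO enrollment VALUES (1, 'DB101', 'C')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

The order of the fields inside `PRIMARY KEY (...)` plays the same role as the field order in the index named Primary Key.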

10.3.6 Import/ Export Tables

One of the most useful features of Access is its ability to interface with data from many other

programs. In fact, it's difficult to summarize in a single article all the ways in which you can

move data into and out of Access. For example, here are just a few ways in which you might use

the data-exchange features of Access:

• To combine data that was created in other programs.

• To transfer data between two other programs.

• To accumulate and store data over the long term, occasionally exporting data to other programs such as Excel for analysis.

(i) External Data Operations in Access

In many programs, you use the Save As command to save a document in another format, so that

you can open it in another program. In Access, however, the Save As command is not used in the

same way. You can save Access objects as other Access objects, and you can save Access

databases as earlier versions of Access databases, but you cannot save an Access database as,

say, a spreadsheet file. Likewise, you cannot save a spreadsheet file as an Access file (.accdb).

Instead, you use the commands on the External Data tab in Access to import or export data

between other file formats.

(ii) Types of Data That Access Can Import, Link To, Or Export

A quick way to learn about the data formats that Access can import or export is to open a

database and then explore the External Data tab on the ribbon.

Figure 10.12: External Data Tab

1. The Import & Link (1 given in Figure 10.12) group displays icons for the data formats

that Access can import from or link to.

2. The Export (2 given in Figure 10.12) group displays icons for all the formats that Access

can export data to.

3. In each group, you can click More (3 given in Figure 10.12) to see more formats that

Access can work with.

If you don't see the exact program or data type that you need, chances are your data can be

exported by the other program into a format that Access understands. For example, most

programs can export columnar data as delimited text, which is then easily imported into Access.

The following table shows which formats can be imported into, linked to, or exported out of

Access:

Table 10.4: Program or Formats

Program or format | Import allowed? | Linking allowed? | Exporting allowed?
Microsoft Office Excel | Yes | Yes | Yes
Microsoft Office Access | Yes | Yes | Yes
ODBC Databases (for example, SQL Server) | Yes | Yes | Yes
Text files (delimited or fixed-width) | Yes | Yes | Yes
XML Files | Yes | No | Yes
PDF or XPS files | No | No | Yes
E-mail (file attachments) | No | No | Yes
Microsoft Office Word | No, but you can save a Word file as a text file and then import the text file. | No, but you can save a Word file as a text file and then link to the text file. | Yes (you can export as Word Merge or as Rich Text)
SharePoint List | Yes | Yes | Yes
Data Services (see note) | No | Yes | No
HTML Documents | Yes | Yes | Yes
Outlook Folders | Yes | Yes | No, but you can export as a text file, and then import the text file into Outlook.
dBase files | Yes | Yes | Yes

(iii) Import or link to Data in Another Format

The general process for importing or linking data is as follows:

1. Open the database that you want to import or link data into.

2. On the External Data tab, click the type of data that you want to import or link to. For

example, if your source data is in a Microsoft Excel workbook, click Excel.

3. In most cases, Access starts the Get External Data wizard. In the wizard, you may be asked to do some or all of the following:

• Indicate whether the first row contains column headings, or whether it should be treated as data.

• Specify the data type of each column.

• Choose whether to import the structure only, or the structure and the data together.

• If importing, specify whether you want Access to add a new primary key to the new table, or use an existing key.

• Specify a name for the new table.

Figure 10.13: Source Data Tab

4. On the last page of the wizard, Access usually asks you if you want to save the details of

the import or link operation. If you think you'll need to perform the same operation on a

recurring basis, select the Save import steps check box, fill in the information, and then

click Close. Then, you can click Saved Imports on the External Data tab to re-run the

operation.

After you have completed the wizard, Access notifies you of any problems that might have

occurred during the import process. In some cases, Access might create a new table called

Import Errors, which contains any data that it was unable to import successfully. You can

examine the data in this table to try to find out why the data did not import correctly.
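The idea behind the import wizard (a heading row, a data type per column, and a separate table for rows that fail) can be sketched in a few lines. This is only an illustration with Python's csv and sqlite3 modules, not what Access does internally; the sample text, the `computers` table and the `import_errors` table are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical delimited text, standing in for a file exported by another program.
source_text = "id,name,cost\n1,Laptop,850.50\n2,Desktop,not-a-number\n3,Server,4200\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (id INTEGER PRIMARY KEY, name TEXT, cost REAL)")
conn.execute("CREATE TABLE import_errors (line_no INTEGER, reason TEXT)")

reader = csv.DictReader(io.StringIO(source_text))  # first row is treated as column headings
for line_no, row in enumerate(reader, start=2):    # data starts on line 2 of the text
    try:
        # Apply the data type chosen for each column.
        record = (int(row["id"]), row["name"], float(row["cost"]))
    except ValueError as exc:
        # Rows that cannot be converted are logged instead of imported.
        conn.execute("INSERT INTO import_errors VALUES (?, ?)", (line_no, str(exc)))
        continue
    conn.execute("INSERT INTO computers VALUES (?, ?, ?)", record)

imported = conn.execute("SELECT id, name, cost FROM computers ORDER BY id").fetchall()
failed_lines = [r[0] for r in conn.execute("SELECT line_no FROM import_errors")]
```

Examining `import_errors` afterwards corresponds to opening the Import Errors table to see why some data did not import.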

(iv) Export Data to another Format

The general process for exporting data from Access is as follows:

1. Open the database that you want to export data from.

2. In the Navigation Pane, select the object that you want to export the data from. You can

export data from table, query, form, and report objects, although not all export options are

available for all object types.

3. On the External Data tab, click the type of data that you want to export to. For example, to export data in a format that can be opened by Microsoft Excel, click Excel.

Figure 10.14: Export Data Tab

In most cases, Access starts the Export wizard. In the wizard, you may be asked for

information such as the destination file name and format, whether to include formatting

and layout, which records to export, and so on.

4. On the last page of the wizard, Access usually asks you if you want to save the details of

the export operation. If you think you will need to perform the same operation on a

recurring basis, select the Save export steps check box, fill in the information, and then

click Close. Then, you can click Saved Exports on the External Data tab to re-run the

operation.
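Exporting to delimited text amounts to writing the column headings followed by one line per record. A minimal sketch, again using Python's sqlite3 and csv modules as a stand-in for Access, with a hypothetical `computers` table and an in-memory buffer in place of the destination file:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (manufacturer TEXT, cost REAL)")
conn.executemany("INSERT INTO computers VALUES (?, ?)", [("Dell", 900.0), ("HP", 750.0)])

# The object being exported: here, the result of a query.
cursor = conn.execute("SELECT manufacturer, cost FROM computers ORDER BY manufacturer")

buffer = io.StringIO()  # stands in for the destination file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([col[0] for col in cursor.description])  # column headings first
writer.writerows(cursor)                                 # then one line per record
exported_text = buffer.getvalue()
```

The resulting text can be opened by a spreadsheet program or imported by any tool that reads comma-delimited files.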

10.4 Summary

MS-Access is a powerful RDBMS that is used to create and manage databases. It is a graphical user interface application that makes it easy to manage large volumes of data. It has many built-in features to assist you in constructing and viewing information for different environments such as scientific, inventory, financial, payroll, education and hospitality. The information can be viewed, sorted, manipulated, retrieved and printed in various ways. MS-Access gives you a platform from which you can retrieve information quickly and accurately. An MS-Access 2007 database file has the extension .accdb; earlier versions use .mdb.

10.5 Suggested Reading/ Reference Material

1. http://www.officetutorials.com

2. Windows XP Complete Reference, BPB Publications

3. MS Office XP Complete, BPB publication

10.6 Self Assessment Questions (SAQ)

1. Explain the steps to create a table in Design view. Discuss the process of creating relationships.

2. What data types are supported in MS-Access for creating a table?

3. Discuss the field properties of a table in MS-Access.

4. Discuss the formats and programs whose data may be imported into and exported from MS-Access.

5. What are the different steps for creating a database in MS-Access?

6. Why do we use MS-Access? Discuss the different ways to create a table.

7. What do you understand by external data operations in MS-Access?

Chapter – 11: Database Operation in MS-Access
Writer: Dr. Kanwal Garg
Vetter: Prof. Rajender Nath
Structure:
11.1 Introduction
11.2 Objective
11.3 Presentation of Content
11.3.1 Queries
(i) Creating Queries
(ii) Query Wizard: A Select Query
(iii) Design View of an Existing Query
(iv) Creating a Query Totally in Design View
(v) Use a Query Wizard to Create a Crosstab Query
(vi) Create a Parameter Query in Design View
(vii) Creating Action Queries in Design View
(viii) Make-Table Queries
11.3.2 Reports
(i) Views
(ii) Report Wizard
(iii) Report Tool
(iv) Report Design
11.3.3 Forms
(i) New Form Options
(ii) Design View of Forms
(iii) Design View Form Sections
(iv) Design View Info
11.4 Summary
11.5 Suggested Reading/ Reference Material
11.6 Self Assessment Questions (SAQ)

11.1 Introduction

This chapter describes essential tools of any DBMS or RDBMS package: queries, forms and reports. All of the information in a database is rarely required at one time; one usually needs selective data. A query can therefore filter data from a single table or a group of related tables.

Forms provide an interactive way to enter data into a table. We can view, modify or delete data stored in the table by using a form. The user can choose the design of the form from various ready-made designs provided by MS-Access.

Reports are used to present data in a predefined or user-defined format. They are generally prepared for presenting data in hard copy form using a printing device. Reports take data from database tables and present it in the way the user wants. One can group data on certain fields or conditions, or sort data on one or more fields in ascending or descending order.

11.2 Objective

The aim of this chapter is to make the student familiar with different database operations such as queries, reports and forms. For this purpose, the chapter provides an overview of MS-Access 2007 concentrating on these aspects. Screenshots are provided at appropriate places for better understanding.

11.3 Presentation of Content

11.3.1 Queries

A query is a way to define a permanent filter to retrieve data, or to create an action that is performed on records. The results of a query are also called dynasets, for dynamic subsets of a table.

(i) Creating Queries

Queries can be one of four main types:

• A Select Query retrieves and displays records from tables according to the fields you pick and the criteria you place on the query.

• A Crosstab Query displays sums, counts, and averages from one field in a table and shows them in a datasheet with fields on the left and across the top.

• An Action Query performs operations on the records that match your criteria. Action queries include make-table queries, update queries, append queries and delete queries.

• A Parameter Query prompts you for information to use when running the query. It can help you to query addresses state by state without creating 50 different queries.

(ii) Query Wizard: A Select Query

A query may be created at any time after you have a table. Access to all query options is found

on the Create Ribbon. Click the Create Ribbon and then the Query Wizard button. From this point on, Access 2007 is very similar to Access 2003.

• Select a query type first, then follow the wizard dialog boxes and answer the questions to create your select query.

• First choose the object to base it on. A query may be based on one or more tables or on another query.

• Select the fields you need to show in your query and send them across.

• If you need fields from a second table or query, reselect that table or query and add the fields to the list already chosen.

• In the next box choose whether you want Detail, which shows all fields of info selected, or Summary, which shows only the summarized results. If you choose Detail, your query is finished.

Figure 11.1: Select Query

• Click Summary and then Summary Options to see the numerical fields listed and the options for calculations.

• The Detail/Summary screen only appears if one of your fields has numerical data.

• Give the new query a name and choose to open the query or go directly into Design view.

• Click Finish. The result looks just like a datasheet, but it gives you filtered data, on command, without redoing a filter.

• You may click on the Design View icon to edit this query further.

• Note that in Access 2007 a table and a query can also show grand totals. You no longer need a report to show the totals.
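The difference between the Detail and Summary choices can be illustrated with plain SQL. The sketch below uses Python's sqlite3 module instead of Access, and the `computers` table and its contents are hypothetical; it is meant only to show what each choice returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (manufacturer TEXT, type TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?)",
    [("Dell", "laptop", 900.0), ("Dell", "desktop", 700.0), ("HP", "laptop", 800.0)],
)

# Detail: every selected field of every record.
detail = conn.execute(
    "SELECT manufacturer, type, cost FROM computers ORDER BY manufacturer, type"
).fetchall()

# Summary: one row per manufacturer, with calculations on the numeric field.
summary = conn.execute(
    "SELECT manufacturer, SUM(cost), AVG(cost), COUNT(*) "
    "FROM computers GROUP BY manufacturer ORDER BY manufacturer"
).fetchall()
```

Summary calculations need a numeric field to work on, which is why the Detail/Summary screen appears only when one of the chosen fields holds numerical data.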

(iii) Design View of an Existing Query

In query design view a query grid shows the fields you have selected and the field list from the

table you are using for the query.

• If you open an existing query, the object it is based on shows in the area at the top.

Figure 11.2: Design View of Query

• The query properties are down the left side of the grid and change depending on the type of query selected.

• Add more fields to your query either by clicking the down arrow of an empty field cell and choosing a field to add, or by dragging the field directly from the field list at the top and dropping it on the field row.

• Make the query do an alphanumeric or numeric sort on any chosen field by clicking on the Sort line in the chosen field and choosing ascending or descending.

• By default, queries auto-sort on the first field.

• If the box in the Show line is not checked, this field will not show in your finished query.

• A hidden field can still be used as a sort field or a limiting field if criteria are set.

• Queries can pull data from multiple tables or queries. If you have a true relational database, your query may be used to pull together all data from all tables into one large query.

• Click on the Show Table icon and select a second table or query.

• Choose the field from the popup list in the field cell or drag any field from the table or query field list to use in your query.
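Two ideas from the list above, a query over more than one table and a hidden field that still sorts and limits the result, can be sketched in SQL. This is an illustration with Python's sqlite3 module, not Access; the two tables and their fields are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE computers (
        id INTEGER PRIMARY KEY, manufacturer_id INTEGER, type TEXT, cost REAL
    );
    INSERT INTO manufacturers VALUES (1, 'Dell'), (2, 'HP');
    INSERT INTO computers VALUES
        (1, 1, 'laptop', 900), (2, 2, 'laptop', 800), (3, 1, 'server', 4000);
    """
)

# cost is not in the SELECT list (a "hidden" field), yet it still
# limits the rows (criteria) and determines their order (sort).
rows = conn.execute(
    """
    SELECT m.name, c.type
    FROM computers AS c
    JOIN manufacturers AS m ON m.id = c.manufacturer_id
    WHERE c.cost < 1000
    ORDER BY c.cost DESC
    """
).fetchall()
```

The `JOIN ... ON` clause plays the role of the relationship line drawn between the two field lists in the upper part of the query design window.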

(iv) Creating a Query Totally in Design View

• Choose Create\Query Design to start a blank query.

• Select the table(s) or query to base your new query on. Close the pop-up "Show Table" box to continue.

• A special Query Tools/ Design Ribbon opens on the ribbon bar when you close the Show Table box.

• To create a simple query, select the fields to query and set the criteria. Drag and drop fields to the bottom grid or choose them from the drop-down list that appears when you click on the field line boxes.

• Choose from various options for the other properties given.

• The query options listed will depend on the query type you have started.

• Select a field to sort and choose ascending or descending.

• Determine whether or not to show each field by using the check box.

Figure 11.3: Creating a Query in Design View

• Create an expression on the criteria line to filter out unwanted data or type in the exact data you wish to see.

• If you click the View button on the Ribbon, you return to datasheet view to PREVIEW your query. Remember that you are only previewing the query.

• You must close Design View of the query and choose to SAVE the query before you can really be finished with the design and ready to run the query or use your tables or reports.

• Choose to turn your simple query into an action or crosstab query by choosing that option from the Query Type group on the Query Tools/ Design Ribbon.

(v) Use a Query Wizard to Create a Crosstab Query

Crosstab queries will display sums, counts, and averages from one field in a table and show this

in a datasheet with fields on the left and across the top. Use the wizard to help make your

crosstab query easy.

• Open Create\Query Wizard and choose the Crosstab Query Wizard on the first screen.

• Choose the table or query to base this on.

Figure 11.4: Creating a Crosstab Query

• Choose the field you want to use as the rows.

• If you want the rows "grouped", use more than one field. The grouping follows the order in which the fields are chosen.

• Next choose a field for the column headings.

• Then choose the field you want calculated in the crosstab.

• Finally, select the function to use. Choose from Average, Count, First, Last, Maximum, Minimum, Standard Deviation, Sum etc.

• Click Next and name your crosstab query. Run the query to see how it looks.

• If the query doesn't include the data you want, start over and do it again.

• Click on design view to see how the fields and info are set up.

• Click design view after tuning it on.

• Save and run the query again to see the results.

• Other options for queries can be set in design view.

o Turn on the alpha sort of types.

o You can't put criteria below a value field. Put the criteria under a second field and make it hidden if necessary. Access tries to correct errors to make a query work.

o Save and re-open the query. Access creates extra fields and moves criteria when necessary.

o If you create an incorrect expression, Access tries to correct your expression, or gives you a warning message and refuses to save if it cannot correct the problem.

o Do not use the crosstab query wizard if you are querying multiple tables. Use Design View instead to create the query, then choose Crosstab from the query types on the Ribbon and fill in the fields and options.
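What a crosstab computes can be sketched as follows: one field supplies the row headings, a second supplies the column headings, and a function (here Sum) is applied to a third. The example uses Python's sqlite3 module with a hypothetical `computers` table; empty cells are represented by `None`, which is an assumption of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (manufacturer TEXT, type TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?)",
    [
        ("Dell", "laptop", 900.0),
        ("Dell", "laptop", 1100.0),
        ("Dell", "desktop", 700.0),
        ("HP", "laptop", 800.0),
    ],
)

# Column headings come from the distinct values of the chosen field.
types = [r[0] for r in conn.execute("SELECT DISTINCT type FROM computers ORDER BY type")]

# Row headings are manufacturers; each cell holds the Sum of cost.
crosstab = {}
for manufacturer, ctype, total in conn.execute(
    "SELECT manufacturer, type, SUM(cost) FROM computers GROUP BY manufacturer, type"
):
    row = crosstab.setdefault(manufacturer, dict.fromkeys(types))  # cells start empty (None)
    row[ctype] = total
```

Swapping `SUM` for `AVG`, `COUNT`, `MIN` or `MAX` corresponds to picking a different function on the wizard's last calculation screen.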

(vi) Create a Parameter Query in Design View

A parameter is a question set to ask for criteria before running the query. It can be added to any

existing query. Make a copy of the select query we created, and turn it into a parameter query.

• Open a query in design view, and check to be sure all desired fields are chosen.

• Choose which field is to be the basis of the parameter query.

• Type a question requesting the needed parameter on the criteria line of that same field. Example: [Enter the computer type:] on the criteria line under the Type field. Include the brackets!

• It helps to list the options in the parameter question: [Choose from the computer types: laptop, desktop, profile, or server:]. Save the query.

• Run the query. It asks you to input the parameter before compiling the query.

Figure 11.5: Parameter Query View
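A parameter query is one saved query whose criterion is supplied at run time. The same idea exists in most database interfaces as a placeholder in the SQL text. A minimal sketch with Python's sqlite3 module follows; the table, the field names and the helper function `computers_of_type` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (name TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO computers VALUES (?, ?)",
    [("PC-01", "laptop"), ("PC-02", "desktop"), ("PC-03", "laptop")],
)


def computers_of_type(computer_type):
    """Run the same stored query text with whatever value the caller supplies."""
    cursor = conn.execute(
        "SELECT name FROM computers WHERE type = ? ORDER BY name",  # ? is the parameter
        (computer_type,),
    )
    return [row[0] for row in cursor]


laptops = computers_of_type("laptop")
servers = computers_of_type("server")  # a valid value that matches no records
```

One query text serves every computer type, in the same way that one parameter query replaces a separate query for each value.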

(vii) Creating Action Queries in Design View

The action queries must all be started in Design View as per the steps given below:

• Click Create\Query Design.

• Add the table or query to use in your query and close the Show Table window.

• Choose the action query type you want from the Query Tools/ Design Ribbon, which opens when you close the Show Table box.

• Select the desired type of query and create one of each.

(viii) Make-Table Queries

• Start a new Design Query, and choose the table or query to base it on.

• Select the Make-Table option from the query types on the Design ribbon.

• Give your new table a name and specify whether you are adding it to the current database or to another database.

• Choose the fields to add to the new table. (Add manufacturer, computer type, cost, and date purchased.)

• Set criteria if needed, such as a sort or the manufacturer you need.

• Close, save, and name the query.

• Double-click to run the Make-Table query and create your new table.

• This beats copy and paste, and the original table is not affected.

Figure 11.6: Make Table Query
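In SQL terms, a make-table query creates a new table from the result of a select. The sketch below shows the idea with Python's sqlite3 module (SQLite spells it `CREATE TABLE ... AS SELECT`); the table names, the extra `serial_no` field and the ISO-formatted date text are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE computers "
    "(manufacturer TEXT, type TEXT, cost REAL, date_purchased TEXT, serial_no TEXT)"
)
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?, ?, ?)",
    [
        ("Dell", "laptop", 900.0, "2004-03-15", "A1"),
        ("HP", "desktop", 750.0, "2005-11-22", "B2"),
        ("Dell", "server", 4000.0, "2006-01-10", "C3"),
    ],
)

# Make-table: only the four chosen fields, only the rows matching the criteria.
conn.execute(
    "CREATE TABLE dell_computers AS "
    "SELECT manufacturer, type, cost, date_purchased "
    "FROM computers WHERE manufacturer = 'Dell'"
)

new_columns = [row[1] for row in conn.execute("PRAGMA table_info(dell_computers)")]
new_count = conn.execute("SELECT COUNT(*) FROM dell_computers").fetchone()[0]
original_count = conn.execute("SELECT COUNT(*) FROM computers").fetchone()[0]
```

The original table keeps all of its rows and fields; the new table is an independent copy of the selected part.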

The other action queries are (a) delete queries, (b) append queries and (c) update queries, as discussed below:

a) Delete Queries:

• Start a new Design Query and choose the newly made table as the base.

• Select the Delete query option from the query types.

• Choose only fields needed for criteria from the new table.

• Set criteria on the date purchased field, so that computers with a date purchased before 11/22/2005 will be deleted. (<11/22/2005)

• Note that "#" signs will appear around the date automatically if you forget to add them.

• Save and name the query. Check for the "#" signs.

• Run the Delete query and then look at the table. The oldest computers are gone.
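The delete query above corresponds to a SQL DELETE with a date criterion. A minimal sketch with Python's sqlite3 module follows. SQLite has no `#...#` date literal, so this example assumes dates stored as ISO text (YYYY-MM-DD), which compare correctly as strings; the table name `computers_copy` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers_copy (manufacturer TEXT, date_purchased TEXT)")
conn.executemany(
    "INSERT INTO computers_copy VALUES (?, ?)",
    [("Dell", "2004-03-15"), ("HP", "2005-11-22"), ("Dell", "2006-01-10")],
)

# Delete every record purchased before 22 November 2005.
deleted = conn.execute(
    "DELETE FROM computers_copy WHERE date_purchased < ?", ("2005-11-22",)
).rowcount

remaining = conn.execute(
    "SELECT manufacturer, date_purchased FROM computers_copy ORDER BY date_purchased"
).fetchall()
```

Because the criterion is "before" the date, the record purchased on 2005-11-22 itself is kept.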

b) Append Queries:

• Start a Design Query and choose the original table as a base to pull your data from.

• Select the Append query option from the query types.

• Next you are asked to select the table to append the data onto. Choose the one created by the make-table query.

• Choose the same four fields from the original table. (Add manufacturer, computer type, cost, and date purchased.)

• Set the criteria so computers with a date purchased before 11/22/2005 will append. (<#11/22/2005#)

• Save and name the query.

• Run the Append query and then look at the computers appended to the table. All computers are back.

• When you use an Append query, be sure the fields of data match in the two tables.
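An append query corresponds to SQL's `INSERT INTO ... SELECT`. The sketch below uses Python's sqlite3 module with hypothetical table names and ISO-formatted date text; it starts from a copy that is missing the oldest record and appends it back from the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = "(manufacturer TEXT, type TEXT, cost REAL, date_purchased TEXT)"
conn.execute(f"CREATE TABLE computers {columns}")
conn.execute(f"CREATE TABLE computers_copy {columns}")

all_rows = [
    ("Dell", "laptop", 900.0, "2004-03-15"),
    ("HP", "desktop", 750.0, "2005-11-22"),
    ("Dell", "server", 4000.0, "2006-01-10"),
]
conn.executemany("INSERT INTO computers VALUES (?, ?, ?, ?)", all_rows)
# The copy lacks the oldest record, as it would after a delete query.
conn.executemany("INSERT INTO computers_copy VALUES (?, ?, ?, ?)", all_rows[1:])

# Append: the field lists of the source and the target must match.
appended = conn.execute(
    "INSERT INTO computers_copy (manufacturer, type, cost, date_purchased) "
    "SELECT manufacturer, type, cost, date_purchased "
    "FROM computers WHERE date_purchased < ?",
    ("2005-11-22",),
).rowcount

copy_count = conn.execute("SELECT COUNT(*) FROM computers_copy").fetchone()[0]
```

Naming the fields explicitly on both sides is the SQL counterpart of checking that the fields of data match in the two tables.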

c) Update Queries:

• Start a new Design Query, and choose the appended make-table as a base.

• Select the Update Query option from the query type group.

• Choose only the field you are updating and the field you need to set the criteria. (If you are not limiting criteria, you won't need the second field.)

• Set the criteria. If you need to refer back to data in the table to check for criteria, click to open the table from the Navigation Pane. If necessary, copy and paste the data from the field you want to update. Example: use manufacturer: "Dell" for the criteria.

• When you paste criteria in, "equals" is understood, and the quotation marks automatically appear around Dell.

• Criteria containing periods may confuse Access and require you to type the quotation marks yourself to mark the value as text.

• Set the Update To row to the new information. Example: [cost] + 1000 adds 1000 to each cost amount on the Dells. The dollar sign and decimal are unneeded and will be ignored. DO NOT use a comma in the dollar amount. Just use the field name plus the amount to increase a cost, or the field name minus the amount to reduce it: [cost] + 1000 or [cost] - 500.

• Save and name the query. Run the Update query, and then look at the computer costs that were updated in the make-table.
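The update described above, adding 1000 to the cost of every Dell, can be written as a SQL UPDATE. A minimal sketch with Python's sqlite3 module and a hypothetical `computers_copy` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers_copy (manufacturer TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO computers_copy VALUES (?, ?)",
    [("Dell", 900.0), ("HP", 800.0), ("Dell", 1200.0)],
)

# Update To: cost + 1000, Criteria: manufacturer equals "Dell".
updated = conn.execute(
    "UPDATE computers_copy SET cost = cost + 1000 WHERE manufacturer = ?", ("Dell",)
).rowcount

costs = conn.execute(
    "SELECT manufacturer, cost FROM computers_copy ORDER BY rowid"
).fetchall()
```

Leaving out the `WHERE` clause would change every record, which is the case where the second (criteria) field is not needed.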

11.3.2 Reports

An Access Report is a formatted, stylized way to print out any part of your database information.

Information in a report can be sorted, queried, formatted, calculated, or summarized. Your report

can be based on either a table or a query.

(i) Views

View options have changed in Access 2007. Check out the options on these four views. Each

view has a specific purpose in creating and modifying your report.

• Report View gives the Access tabbed view of your finished report, lined up with any other open tabbed objects from the database.

• Print Preview takes you to the new print preview interface with its program ribbon and features.

• Layout View is new in 2007 and allows you more flexibility in setting up and modifying a report than ever before. You can fix almost any report problem in a graphical interface.

• Design View has also changed in 2007, but still looks similar to previous versions. You can use it to add more controls, edit control sources, and change properties.

(ii) Report Wizard

Click the Create ribbon and look at the Reports group. You'll see several options for creating reports. Click the Report Wizard tool and let Access lead you through the steps.

• If you first click the table or query you wish to base the report on, it will already be shown in the selection box. You may still change to another table or query.

• Go down the field list choosing which fields you need and clicking the center arrow buttons to send the fields to your report. Put them in the order you want on the report.

• Next a box comes up allowing you to set grouping options for your fields, e.g. group by manufacturer or computer type. More than one grouping field may be set.

• Choose a sort order for any field except the grouping field, which is already sorted alphabetically by default. You'll see the summary option if you have grouping and any number fields.

• If you don't pick a grouping field, the summary options do not show.

• Experiment with the choices for summary options by choosing from sum, average, minimum or maximum. Try a count as well.

Figure 11.7: Report Wizard-1

• Decide whether you want Summary Only (one grand total, no list) or Detail and Summary (shows all items and their totals).

Figure 11.8: Report Wizard-II

• The layout of your data is the next set of options. Also choose page orientation and whether or not all of a field will be shown.

• Next is a choice of pre-set styles, which are expanded in 2007.

• Give the report a name and choose whether to preview or modify.

• The report name becomes the title of the report. It can be changed later if needed.

• Choosing to preview shows the report; choosing to modify takes you to Design View. Click Finish.

• If the report doesn't look right, delete it, and start the wizard again.

• If you do not choose to use groupings, the wizard gives you options for columnar, tabular, or justified reports. You can make a report look more like a set of forms.

• Go to File\Page Setup to modify margins, orientation and set the number of columns.

• Switch to Layout View or Design View to make necessary modifications to the report before saving. (See below for further instructions on this.)

• Click the print icon or File\Print for all other printing options.

(iii) Report Tool

The old AutoReport available in MS-Access 2003 is missing, but the new Report Tool in MS-Access 2007 creates a report just as easily. It instantly places all fields from the table or query you have selected onto the report. The report also opens in Layout View, which gives you full editing options. Select a table or query first, and click on Create\Report.

• All existing fields in the chosen table or query appear on the new report, shown in Layout View on the screen.

• Layout View lets you edit without going to Design View.

• Any extra fields may be manually selected and deleted in Layout View.

• Rearrange the order of fields or other objects on your report by dragging.

• Field column sizes may be increased or decreased by dragging the edges of a field.

Figure 11.9: Report Tools

Three Report Layout Tools (Contextual Tabs) used in modifying your report open automatically

as given below:

Figure 11.10: Report layout - I

Format: fonts, formatting, grouping, totals, gridlines, logos, page numbers, auto formatting

changes are here.

Figure 11.11: Report layout – II

Arrange: control layout options, alignment, positions, and property sheet tools are on the

arrange ribbon.

Figure 11.12: Report layout – III

Page Setup: change paper size, orientation, margins, columns, and other page set options.

• Use Format to set up a gridline on whatever you select.

• Use a fill color in the grids.

• Turn on totals in your report or add a page number. The list of options is on the ribbon.

• Click the Add Existing Fields button to turn on the Field List pane. Extra fields may be added by dragging them onto the report from the Field List pane.

• The AutoFormat gallery has an extensive set of report styles to click and apply.

Figure 11.13: Report Style

• When you are finished making design changes to your report, click on Report View to see the finished version.

(iv) Report Design

If you initially start your report in Design View, it will not have a Record Source associated with

it, and you will have to manually assign one. All other report methods allow you to choose the

table or query to associate with the report.

To associate a table or query to the report:

• Click on Create\Report Design.

• Right-click the box at the upper left corner of the ruler bars, the one with a black button in it. This selects the report, and the shortcut menu that pops up gives you the Properties option. Click it to see the REPORT Property Sheet.

• If you select Properties but do not see the word REPORT as the selection type, click the drop-down menu and choose Report from the list.

• The first item in the properties box asks for a Record Source. Click the down arrow to see the list, and choose the table or query you want for the report source.

• Now you are ready to open the field list and add controls to your design grid.

(v) More Advanced Design Tools:

When a control is added there are two pieces, the label and the control data. When you select either item, they can be moved together or separately. You can see dark blocks in the upper left corners of both, but only the selected item has the yellow selection box showing. The items move together when the move pointer is anywhere on the box except on the dark block in the upper left corner.

To move the label only into the Page Header section, do the following:

Click to select the label so that the yellow box shows.

On the toolbar click the cut icon or press Ctrl+X.

Select the Page Header section by clicking on its section bar.

On the toolbar click the paste icon or press Ctrl+V, and the label appears.

Add a sum or average of a numerical field in a report.

The simplest way is to use the Report Wizard. If you use grouping, the next window will give you a button for summary options. Choose from the choices and view your report. To add a summary field directly in Design View, follow these steps:

• Be sure the Report Footer section is showing or turn it on.

• Drag the bottom edge of the grid down to create space for the expression.

• Select the text box icon and draw the box in the section you want.

• Delete the label or move it to the left to use as a label box with the word "Sum" in it.

• The expression =Sum([fieldname]) may be typed directly into the text box. For an average, type =Avg([fieldname]).

• Put the sum expression in the group footer for subtotals on your groups.

• Put the sum expression in the report footer for a grand total on the final page.

• Open the properties of the text box as a Format must also be set in properties. Use the drop

down menu to select from the list: Currency, Long integer, etc.

• The Expression Builder may also be used to create a summary of a field.
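The difference between a sum placed in a group footer (subtotals) and the same sum placed in the report footer (grand total) can be sketched in plain Python. The records and the output format below are hypothetical and serve only to illustrate where each value is computed.

```python
from itertools import groupby

# Hypothetical report records: (manufacturer, computer type, cost).
records = [
    ("Dell", "laptop", 900.0),
    ("Dell", "desktop", 700.0),
    ("HP", "laptop", 800.0),
]
records.sort(key=lambda r: r[0])  # groupby needs the grouping field sorted

lines = []
for manufacturer, group in groupby(records, key=lambda r: r[0]):
    costs = [cost for _, _, cost in group]
    # Group footer: Sum and Avg over the records of this group only.
    lines.append(f"{manufacturer}: Sum={sum(costs):.2f} Avg={sum(costs) / len(costs):.2f}")

# Report footer: Sum over every record in the report.
grand_total = sum(cost for _, _, cost in records)
lines.append(f"Grand total: {grand_total:.2f}")
```

The expression is the same in both places; only the set of records it runs over changes with the section that contains it.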

Add page numbering to the report footer.

• Create a Text Box by clicking that icon, and using the mouse to "draw" a rectangle.

• Click inside and type in the following code: ="Page " & [Page] & " of " & [Pages].

• Other page numbering expressions may be seen in the Expression Builder.

• Delete the label box as it is unneeded.

Add a current date to report footer.

• Create a Text Box by clicking that icon, and using the mouse to "draw" a rectangle.

• Click inside and type in the following code: =Now().

• Other expressions may be seen in the Expression Builder.

• Delete the label box as it is unneeded.
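The footer expressions above simply join fixed text with values supplied by the report. The following Python sketch builds the same two pieces of text; the function name `page_footer` and the date format are assumptions made for the illustration. Note the spaces inside the quoted text, which keep the words and numbers from running together.

```python
from datetime import datetime


def page_footer(page, pages, now=None):
    """Return the page-number text and the date text for a report footer."""
    now = now or datetime.now()  # comparable to Now() in the text box expression
    page_text = "Page " + str(page) + " of " + str(pages)
    date_text = now.strftime("%Y-%m-%d %H:%M")
    return page_text, date_text


example = page_footer(2, 5, datetime(2007, 1, 31, 9, 30))
```

Passing a fixed date, as in `example`, makes the result repeatable; calling the function without one uses the current date and time.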

Add different types of graphics to your report.

• Start in Layout View or Design View.

• Insert a picture or clip art by clicking on the Logo button.

Locate a file and insert it directly into the report. Access now has the capability to shrink a

picture to fit whatever size area you have. If the graphic is added to the report header it appears

only on page one. If it is added to the page header, it appears on every page.

• Lines may be added to your document in design view and layout view. Some

AutoFormats can interfere with graphical options.

11.3.3 Forms

An Access Form may be created to use as a simple interface to input records one at a time. It can

also be used to view, print or search for individual records.

Figure 11.14: Database Tools

(i) New Form Options

Click on Create to see all form options in the Forms group.

Form – Creates a simple form with all fields.

• Choose the table or query to base it on.

• When you click Form, all fields from the chosen object are added to a basic form in

Layout View.

• The two Form Layout Tools Ribbons appear with more formatting and arrangement options.

Layout View of a form offers options and design features similar to those you used in Report

Layout. Add an AutoFormat, labels, graphics, backgrounds, or other options to your form.

• Move, resize, edit, or delete fields.

Split Form – Creates a columnar form and includes the datasheet on a split screen, with all fields

in the selected query or table.

• A split form is created from a chosen table or query in Layout View.

• Edit using the Form Layout Tools options.

• Edit the table as well or move back and forth.

Figure 11.15: Forms

Figure 11.16: Split Forms

Blank Form – Creates a blank form in Layout View with the table field list turned on. You may

not use a query on this one. It is similar to the Blank Report.

• Drag and drop desired fields onto the form.

• Turn on Show all Tables if needed.

• Move and resize your chosen fields.

• Add formatting and style options.

Pivot Chart – Used to create a pivot chart form.

MS-Access includes four other kinds of form.

• Choose a pivot table form.

• Check on modal dialog forms.

• Use a datasheet form.

• Form Wizard – Allows you to step through the process, answering the questions and

creating a custom form of only the items you choose. Under More Forms, follow

these steps:

Decide the arrangement and position of the information on the form: columnar, tabular,

datasheet or justified.

Choose a style from the previews.

Name and save your form.

Form Design – Design brings up the blank design grid similar to the report design where you

build your own form from the ground up.

Use the Form tool or the Form Wizard to create a simple form quickly. If the form needs extensive editing,

it is simpler to edit the original fields and then recreate the form. Use Design View to make

simple modifications to such a form.

(ii) Design View of Forms

Use Form Design to create a personalized, custom form. Most forms are small and do not require

more than the detail section of the form. Creating a form from Design View is a complex task.

To create a form in Design View, follow these steps:

• Click Create\Form Design to open a new form in Design View.

• Double-click the black square in the upper-left corner.

• The Property Sheet opens and you must choose the Record Source.

Figure 11.13: Design View of Forms

Now click Add Existing Fields to open the pane that lists all table fields.

Drag the needed fields into the detail section and arrange them.

(iii) Design View Form Sections

• Form Header and Footer are not normally turned on, but may be found under Form Design

Tools\Arrange.

• A title for your form and any graphic image you may wish to add may be put in either the

Form Header or the detail section.

• The detail section is for the actual data you need to fill in for your table.

(iv) Design View Info

Each item in the form is represented in Design View by a control, the same as in a report.

• Bound Controls are fields of data. No calculations are performed in them.

• Unbound Controls contain a label or text box. You can calculate in an unbound control.

• Calculated Controls hold values that are calculated; they are not used in most forms.
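The distinction between stored field values and calculated values can be sketched in code. The example below is a loose analogy in Python, not a model of how Access implements controls; the class OrderLine and its field names are hypothetical. The stored attributes play the role of bound fields, and the property plays the role of a calculated control with an expression such as =[Quantity]*[UnitPrice].

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    # Bound values: stored in the record itself (hypothetical field names).
    quantity: int
    unit_price: float

    @property
    def line_total(self) -> float:
        # Calculated value: derived on demand and never stored in the table.
        return self.quantity * self.unit_price


line = OrderLine(quantity=3, unit_price=2.5)
print(line.line_total)
```

Changing quantity or unit_price changes line_total automatically, which is why a calculated value does not need its own field in the table.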

(v) Editing the Form

To edit any part of your form, follow the steps below in Design View:

• Add or remove controls from your form.

• Create a command button on your form:

• Click the "Use Control Wizards" icon.

• Click "Button" and draw a button on your design view form.

• From the Command Button Wizard click through the Category options and choose an action.

Examples: Print a record, a form, or a report.

• Select all controls on your form with the pointer, and apply different fonts, font sizes,

highlights, font colors, bold, italics, underlines, etc. to your form

• Resize the form and move the controls to the right half of the form to have room to insert a

graphic on the left side of the form.

• Add lines and boxes to the form. Click the line/border width icon to change the line width.

Pull on the end of a line to lengthen it.

• Use AutoFormat to add a style, or right-click to apply a fill color to the background.

• If you change any field to a lookup field after your form (or report) is created, the form will

not automatically update this field. You will need to recreate the form or change it yourself in

design view. If you want to fix it yourself follow these steps:

• Select and delete the old field from the form.

• Open the field list and drag the new lookup field onto your form.

• Add a combo box or list box to a form in design view by dragging the new field from the

field list onto the design view grid. Properties are set for you.

• One common problem with many forms is the size of the fill-in box. This is based on the

size of the field in the original table. Remember that Access 2007 has gone back to 255 characters as the

default size. To decrease the field size, go back to the properties of each field while editing

in table design view. You will have to recreate the form to finish resizing.

11.4 Summary

The new user interface in Office Access 2007 comprises a number of elements that define how

you interact with the product. These new elements were chosen to help you master Access, and

to help you find the commands that you need faster. The new design also makes it easy to

discover features that otherwise might have remained hidden beneath layers of toolbars and

menus. And you will get up and running faster, thanks to the new Getting Started with Microsoft

Office Access page, which provides you with quick access to the new getting started experience,

including a suite of professionally designed templates.

The most significant new interface element is called the Ribbon, which is part of the Microsoft

Office Fluent user interface. The Ribbon is the strip across the top of the program window that

contains groups of commands. The Office Fluent Ribbon provides a single home for commands

and is the primary replacement for menus and toolbars. On the Ribbon are tabs that combine

commands in ways that make sense. In Office Access 2007, the main Ribbon tabs are Home,

Create, External Data, and Database Tools. Each tab contains groups of related commands, and

these groups surface some of the additional new UI elements, such as the gallery, which is a new

type of control that presents choices visually. Queries, reports and forms in MS-Access are the

database objects that help the user retrieve meaningful information from the

database. A query is a request to retrieve data from a table that satisfies a particular condition.

Forms provide an environment in which the user can insert, edit and modify records.

Data stored in the database can be presented to top-level management in the form of a report,

which supports the decision-making process.

11.5 Suggested Reading/ Reference Material

1. Windows XP Complete Reference, BPB Publications.

2. MS Office XP Complete, BPB Publications.

3. Sandra Nees, Microsoft Access 2007: Forms and Reports, Creator and Presenter Booth

Library, EIU.

4. Sandra Nees, Microsoft Access 2007: Queries, Creator and Presenter Booth Library, EIU.

11.6 Self Assessment Questions (SAQ)

1. Explain the steps to create a query in design view.

2. What are the steps to create a form in design view?

3. What are the various methods to create queries in MS-Access?

4. What do you mean by reports? What are the uses of reports? Discuss the procedure of

report generation in MS-Access.

5. What are forms? Why are forms used in MS-Access? How are they different from tables?
