DBMS

Chapter – 1: An Introduction to Database Systems
Writer: Dr. Kanwal Garg

Vetter: Prof. Rajender Nath
Structure:
1.1 Introduction
1.2 Objective
1.3 Presentation of Content
1.3.1 Data
1.3.2 Information
1.3.3 Knowledge
1.3.4 Difference between Data, Information and Knowledge
1.3.5 Manual Data Processing and its limitation
1.3.6 File Processing System and its Limitation
1.3.7 Database Approach
1.3.8 Characteristics of Database
1.3.9 The Database Management System
1.3.10 Advantages of Database Management Systems
1.3.11 Disadvantages of Database Management Systems
1.3.12 Difference between File System Approach and Database Management System
1.3.13 DBMS Terminology
1.3.14 Component of Database System
1.3.15 Database Administrator (DBA)
1.3.16 Disk Manager
1.3.17 Data Manager
1.3.18 File Manager
1.4 Summary
1.5 Suggested Reading/ Reference Material
1.6 Self Assessment Questions (SAQ)
1.1 Introduction
The word data refers to information or facts, usually collected as the result of experience,
observation or experiment, produced by processes within a computer system, or given as a set of
premises. Data may consist of numbers, words, or images, particularly as measurements or
observations of a set of variables. Data are often viewed as the lowest level of abstraction from
which information and knowledge are derived.
A database is an organized collection of data. The term originated within the computer industry,
but its meaning has been broadened by popular use, to the extent that some database directives
include electronic databases within their definition. A database is controlled by a
sophisticated software package called a database management system (DBMS). It has programs to
set up storage structures, load the data, and accept data requests from programs and users.
Databases play an important role in every area where computers are used, such as libraries,
education, medicine and science. Even when you buy goods from a market, you are making use of
databases. Databases have also played a crucial role in the growth of the computer industry.
Before we take up the concept of databases, let us first look at the basics of data and
information.
1.2 Objective
This chapter discusses the basic concepts of database management systems and explains data,
information and knowledge, including the differences between these three basic terms. It
covers the file processing system and the database system along with their advantages and
disadvantages. It defines basic DBMS terminology and explains the components of a database
system, along with a brief account of the roles of the people who design and manage it.
1.3.1 Data
Data is a simple term to grasp. Data is defined as a collection of raw facts that can be stored
and processed by a human or a computer. In other words, data is the material on which a
computer program works. The word raw indicates that the facts have not yet been processed to
reveal their meaning. Data can be a number, a letter, a word, a special symbol and so on. Data
can exist in any form, usable or not. It does not have meaning of itself. In computer parlance,
a spreadsheet generally starts out by holding data.
Example 1: The sequence of digits 230504 is meaningless by itself, since it could refer to the
part number of an automobile, a date of birth, the number of rupees spent on a project, the
population of a town and so on. Therefore this sequence of digits would be considered data.
Example 2: A set of words like "Aryan, mathematics, highest mark, annual examination" would be
considered data, since by itself it conveys no meaning.
Figure 1.1: Data
1.3.2 Information
When data is processed and converted into a meaningful and useful form, it is known as
information. Information is generated by arranging data into a suitable and meaningful form.
For a business to be successful, fast access to information is vital, as important decisions
are based on the information available at any point of time. Such information can then be used
as the foundation for decision making. Traditionally, data was stored in voluminous
repositories such as files, books and ledgers.
However, storing data in these repositories and retrieving information from them was a
time-consuming task. With the development of computers, the problem of information storage and
retrieval was resolved. Computers replaced tons of paper, file folders, and ledgers as the
principal media for storing important information. In computer parlance, a relational database
makes information from the data stored within it.
Example 3: In Example 1, if we know what the sequence of digits refers to, then it becomes
meaningful and can be called information. When we write it as 23-05-04, it may mean a date of
birth.
Example 4: If the data mentioned in Example 2 is processed as "Aryan secured highest marks in
mathematics in annual examination", it is information.
Figure 1.2: Information
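The step from data to information in Examples 1 and 3 can be sketched in Python. This is only an illustration: the function names and the day-month-year layout are assumptions made here, not part of the original examples.

```python
from datetime import date, datetime

RAW = "230504"  # the digit sequence from Example 1; by itself it is only data


def as_date_of_birth(digits: str) -> date:
    """Interpret six digits as day, month and two-digit year (assumed layout)."""
    return datetime.strptime(digits, "%d%m%y").date()


def as_rupees(digits: str) -> int:
    """Interpret the same digits as an amount of rupees."""
    return int(digits)


# The same data yields different information under different interpretations.
print(as_date_of_birth(RAW).isoformat())
print(as_rupees(RAW))
```

The digits themselves never change. Only the interpretation applied to them turns the data into information.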
1.3.3 Knowledge
Knowledge is the appropriate collection of information, such that its intent is to be useful.
Knowledge is a deterministic process. Knowledge is derived from information in the same way
information is derived from data. When someone "memorizes" information, then they have
amassed knowledge. This knowledge has useful meaning to them. It can be considered as the
integration of human perceptive processes that helps them to infer further knowledge.
Example 5: Elementary school children memorize, or amass knowledge of, the "number table".
They can tell you that "2 x 2 = 4" because they have amassed that knowledge (it being included
in their number table). So, knowledge is gained continually from the information people
acquire in their day-to-day life.
Figure 1.3: Knowledge
Knowledge adds understanding and retention to information. It is the next natural progression
after information. Therefore, if we want to have appropriate knowledge, we must have the right
information. To have the right information, we must have complete and correct data. Therefore,
while maintaining data in the database, we must ensure that the data contains no missing
values, noise, inconsistency or incompleteness.
1.3.4 Difference between Data, Information and Knowledge
| Data | Information | Knowledge |
|------|-------------|-----------|
| Data represents unorganized and unprocessed facts and figures. | Information can be considered as an aggregation of data (processed data). | Knowledge is derived from information in the same way information is derived from data. |
| Data is not significant to a business in and of itself. | Information is significant to a business in and of itself. | Knowledge always helps a business in the decision-making process. |
| Data is a prerequisite to information. | The processed form of data represents information. | Knowledge is usually based on learning, thinking, and proper understanding of the information. |
| Data must be interpreted, by a human or machine, to derive meaning. | Information is exchangeable amongst people, about things, facts, concepts etc. | A continuous process of gaining information results in knowledge. |
| Data can be a number, letter, alphabet, word, special symbol etc. | Information provides answers to "who", "what", "where", and "when" questions. | Knowledge answers "how" questions. |
| Example: In the healthcare industry, data includes vital signs, weight and relevant assessment parameters. | Example: The processed data leads to certain information which helps in providing the right clinical treatment. | Example: The trend of vital signs over time provides a pattern that may lead to important decisions. |
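The healthcare example in the last row of the comparison can be sketched in Python. The readings below are made-up values used purely for illustration, and the variable names are assumptions.

```python
from statistics import mean

# Data: raw systolic blood-pressure readings (made-up values for illustration).
readings = [118, 121, 127, 133, 140]

# Information: the data processed into a meaningful summary.
average = mean(readings)

# Knowledge: a pattern observed over time that can support a decision.
rising = all(later > earlier for earlier, later in zip(readings, readings[1:]))

print(f"average systolic pressure: {average}")
print("steadily rising trend" if rising else "no steady trend")
```

The list of readings is data, the computed average is information, and the observation that every reading is higher than the one before is the kind of pattern the table calls knowledge.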
1.3.5 Manual Data Processing and its Limitation
In early systems, data was handled manually by the different users. Human beings, as the
users, managed the whole database without the support of computers. This approach has many
problems, which are explained below:
1. Address Dictionary: In an address book, a number of pages are pre-allotted for writing
the addresses of names starting with a specific letter, say "A". If you keep adding
addresses for names beginning with "A", the pages allotted to the letter "A" will
eventually be used up. One solution to the problem is to buy a new, larger
address book and to transfer all the previous addresses into the new one.
This solution is very tiring and time consuming. The second solution is to use
some blank pages at the end of the same address book. This is again cumbersome,
because if you want to find the address of a specific person, you have to look through the
pages allotted to that letter and also through the last pages of the address book. So
the search has to be performed twice, at two different places in the same address book.
2. Repeated Transaction: There are many transactions which occur repeatedly on a day-to-day,
week-to-week and month-to-month basis. For example, to calculate salaries,
all payroll transactions are recorded manually in the ledger for a month, and the same
transactions are recorded again manually for the next month, and so on. It is just a
calculation task and does not require any logic or intelligence. Therefore it is not a wise
decision to waste human skill and intelligence on such repetitive calculations.
3. Searching Process: Searching for a single entry in a large number of manual records is
very difficult. For example, suppose Mr. Arpit is a subscriber of a publishing company and
wants to renew his subscription. For this purpose he sends a cheque to the publishing
company. The publisher then has to search the whole list of subscribers to find
the subscription number of Mr. Arpit. It is a boring and tiring job.
4. Updating the Manual Records: It is a difficult task to update the records of a manual
database. The first issue is identifying the appropriate record to be updated, and the
second issue is the problem of overwriting. For security reasons we generally avoid
overwriting records, because it may give a wrong impression to the reader of that
record.
Hence, when the database is large and difficult to manage, it is better to use computers
than to handle and process the database manually.
1.3.6 File Processing System and its Limitation
File processing systems were an early attempt to computerize the manual filing system that we
are all familiar with. A file system is a method for storing and organizing computer files and
the data they contain, to make it easy to find and access them. The manual filing system works
well when the number of items to be stored is small. It even works quite adequately when there
are large numbers of items and we only have to store and retrieve them. However, the manual
filing system breaks down when we have to cross-reference or process the information in the
files. The following problems are associated with the file-based approach:
1. Duplication in data: In this system the data stored in the files are independent of each
other. Therefore, the same data may be stored in multiple files. This causes
duplication of data.
Example 6: A student's roll number may be stored in two different files.
2. Inconsistency of data: In a file processing system, the files being maintained are
independent of each other, which means that there is no relationship among them. Therefore,
if any data item is changed, all the files containing related data need to be updated.
A problem arises if some of the files are not updated, which causes inconsistency.
Example 7: An employee's qualification may be maintained in two or more files. When the
qualification improves, the data item may be updated in only one file while the rest of
the files are not updated. This results in inconsistency.
3. Lack of Data integrity: This is the problem of ensuring that the data in the database is
accurate. In any application there are certain data integrity rules, in the form of
conditions and constraints, that need to be maintained. In a file system these rules have to
be coded into the application programs, and it is hard to change every program to apply
them because the programs are application dependent. So, it is difficult to maintain data
integrity.
Example 8: The integrity constraint that the phone number of a student should have exactly
10 digits has to be implemented in all application programs using the student file. For one
application, it is quite easy to incorporate this integrity rule, but for a number of
application programs it may be quite difficult to maintain.
4. Lack of Security: In a file system, security constraints are not easy to enforce because
data is stored in different independent files. Therefore unauthorized users can destroy the
file data, which leads to data inconsistency.
Example 9: An unauthorized user can access your files and perform fraudulent
operations on your data.
5. Data dependence: Data is stored in files, and the files are maintained to fulfill the
requirements of an application program. All application programs are independent of each
other. Therefore, if the definition of a data item changes, every application program that
uses that data must be changed as well. This is called data dependence.
Example 10: If an organization changes its employee IDs from 6 digits to 10 digits,
all the application programs that use the data item have to be modified.
6. Difficult to share data: The files that hold the data may be in different formats, so the
format of data stored in one file may differ from the format of data stored in another. If
the data of two such files ever needs to be shared, the different data formats will
cause a problem. The solution is to develop an interface, which is itself a
time-consuming process.
Example 11: The gender of MBA students is stored as "1" and "0" (where "1" stands for
male and "0" stands for female), so the data type may be number, whereas the gender of
MCA students is stored as "M" and "F" (where "M" stands for male and "F" stands for
female), so the data type may be character. If we want to calculate the total number of
male and female students in MBA and MCA, the difference in data types will create a problem.
7. Difficult to get quick response: Queries in the application program are written to meet
specific requirements. If any clause of a query changes, it becomes difficult to
get a quick response.
Example 12: Suppose there is a condition that only a student whose age is 35 to 40 years can
apply for a specific job. If this age criterion changes to 30 to 35 years, then the
corresponding change has to be made first in all the related queries belonging to that
application program. This delays the response.
8. Concurrency problem: In a file system, when two or more users access the same data file
for read and write operations, a concurrency problem may arise which leaves the data
in an inconsistent state.
Example 13: Suppose a married couple opens a bank account with a balance of Rs. 5000. After
some days the husband withdraws Rs. 500, leaving a balance of Rs. 4500; at the same time
the wife withdraws Rs. 700 under the impression that the balance is still Rs. 5000.
Since both transactions execute concurrently, the problem of concurrency
arises.
9. Inadequate to Represent Data Modeling of Real World: Data in the file system is
maintained simply to support a single application program. It does not show any
relationship among the data in different files. Moreover, complex data cannot be defined in
the file system.
10. Difficulty in Data Representation from User's View: To create useful applications for
the user, it is necessary to combine the data of different files. But in a file system
independent and isolated data is recorded, and the relationships among them are very hard to
determine. Therefore data in a file system does not meet the user's requirements.
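The concurrency problem of Example 13 can be traced step by step. The sketch below simulates the interleaving in plain Python, without real threads or files, so that the outcome is repeatable. The function and variable names are assumptions chosen for illustration.

```python
def withdraw_unsynchronised(balance_seen: int, amount: int) -> int:
    """Compute a new balance from a possibly stale copy of the balance."""
    return balance_seen - amount


# Interleaved execution: both users read the balance before either writes.
stored_balance = 5000
husband_reads = stored_balance
wife_reads = stored_balance
stored_balance = withdraw_unsynchronised(husband_reads, 500)  # writes 4500
stored_balance = withdraw_unsynchronised(wife_reads, 700)     # overwrites with 4300
interleaved_result = stored_balance

# Serial execution: each withdrawal reads the balance left by the previous one.
stored_balance = 5000
stored_balance = withdraw_unsynchronised(stored_balance, 500)
stored_balance = withdraw_unsynchronised(stored_balance, 700)
serial_result = stored_balance

print(interleaved_result, serial_result)
```

In the interleaved run the second write silently discards the first withdrawal, so the stored balance is Rs. 4300 although Rs. 1200 was withdrawn in total. Run one after the other, the same withdrawals leave Rs. 3800.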
To remove all the above limitations of the file-based approach, a new and more effective
approach was required, known as the Database Approach.
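As a concrete reminder of the effort the file-based approach demands, the interface mentioned under "Difficult to share data" can be sketched for Example 11. The two lists below are hypothetical extracts of the MBA and MCA files, and the mapping names are assumptions.

```python
from collections import Counter

# Hypothetical extracts of the two files described in Example 11.
mba_genders = [1, 0, 1, 1]          # 1 = male, 0 = female
mca_genders = ["M", "F", "F", "M"]  # "M" = male, "F" = female

# The interface: translate both encodings into one shared vocabulary.
MBA_CODES = {1: "male", 0: "female"}
MCA_CODES = {"M": "male", "F": "female"}

combined = [MBA_CODES[g] for g in mba_genders] + [MCA_CODES[g] for g in mca_genders]
totals = Counter(combined)

print(totals["male"], totals["female"])
```

Every pair of files with different formats needs a translation of this kind, which is why sharing data between independent files is described as time consuming.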
1.3.7 Database Approach
The database is a shared collection of logically related data, designed to meet the information
needs of an organization. A database is a computer-based record-keeping system whose overall
purpose is to record and maintain information. The database is a single, large repository of
data, which can be used simultaneously by many departments and users. Instead of disconnected
files with redundant data, all data items are integrated with a minimum amount of duplication.
The database is no longer owned by one department but is a shared corporate resource. The
database holds not only the organization's operational data but also a description of this
data. For this reason, a database is also defined as a self-describing collection of
integrated records. The description of the data is known as the Data Dictionary or Meta Data.
It is the self-describing nature of a database that provides program-data independence.
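The self-describing nature of a database can be observed directly. The sketch below uses SQLite through Python's standard sqlite3 module as one example of a DBMS. The student table and its columns are assumptions made for illustration, and other systems expose their data dictionary through different catalogue tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " student_name TEXT NOT NULL,"
    " student_age INTEGER)"
)

# sqlite_master is SQLite's catalogue table: it stores the definition of every table.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = conn.execute("PRAGMA table_info(student)").fetchall()
field_names = [col[1] for col in columns]
primary_key = [col[1] for col in columns if col[5]]

print(table_names, field_names, primary_key)
conn.close()
```

The table names, field names and key information are read back from the database itself, not from the program that created it.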
A database implies separation of physical storage from the use of the data by an application
program, to achieve program/data independence. Using a database system, the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the users. Changes can be made to data without affecting other components of the
system. These changes include a change of data format or file structure, or relocation from one
device to another.
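A small sketch of program/data independence, again using SQLite via Python's sqlite3 module as an assumed example system. The employee table, its columns and the function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id TEXT, name TEXT)")
conn.execute("INSERT INTO employee VALUES ('100001', 'Arpit')")


def employee_name(emp_id: str) -> str:
    """Application code: refers to columns by name, not by storage layout."""
    row = conn.execute(
        "SELECT name FROM employee WHERE emp_id = ?", (emp_id,)).fetchone()
    return row[0]


before = employee_name("100001")

# The stored structure changes: a new column is added to the table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

after = employee_name("100001")  # the application function is unchanged
print(before, after)
```

The application function keeps working after the structure of the stored data changes, because it asks the DBMS for data by name and never deals with the storage layout itself.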
1.3.8 Characteristics of Database
The data in a database should have the following features:
1. Shared: Data in a database are shared among different users and applications.
2. Persistence: Data in a database exist permanently, in the sense that the data can live beyond
the scope of the process that created it.
3. Validity/ Integrity/ Correctness: Data should be correct with respect to the real world
entity that they represent.
4. Security: Data should be protected from unauthorized access.
5. Consistency: Whenever more than one data element in a database represents related real
world values, the values should be consistent with respect to the relationship.
6. Non-redundancy: No two data items in a database should represent the same real world
entity.
7. Independence: Data at different levels should be independent of each other, so that
changes at one level do not affect the other levels.
To create, manage and manipulate data in databases, a management system known as database
management system was developed.
1.3.9 The Database Management System
A database management system (DBMS) is a general-purpose software system. It is a collection
of programs that enables users to define, create and maintain a database and that provides
controlled access to the data. Defining a database involves specifying the data types,
structures, and constraints for the data to be stored in the database. A database may be
defined as a repository of data for an organization such that it can be shared and integrated.
Creating the database is the process of storing the data itself on some storage medium that is
controlled by the DBMS. Manipulating a database includes such functions as querying the
database to retrieve specific data, updating the database to reflect changes in the real world,
and generating reports from the data. There are different types of DBMS, ranging from small
systems that run on personal computers to huge systems that run on mainframes. The following
are the main examples of database applications:
1. Computerized library systems
2. Automated teller machines
3. Railway/ flight reservation systems
4. Computerized inventory systems and so on.
These systems allow users to create, update and extract information from their databases.
Compared to a manual filing system, the biggest advantages of a computerized database system
are speed, accuracy and accessibility. The other advantages of a DBMS are as follows.
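The three activities named above (defining, creating and manipulating a database) can be sketched for a tiny library system. SQLite via Python's sqlite3 module stands in for the DBMS, and the book table, its columns and the sample titles are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure and constraints.
conn.execute(
    "CREATE TABLE book ("
    " book_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " copies INTEGER NOT NULL CHECK (copies >= 0))"
)

# Creating: store the data itself under the control of the DBMS.
conn.executemany(
    "INSERT INTO book (book_id, title, copies) VALUES (?, ?, ?)",
    [(1, "Database Systems", 3), (2, "Operating Systems", 1)],
)

# Manipulating: update to reflect a real-world change, then query.
conn.execute("UPDATE book SET copies = copies - 1 WHERE book_id = 1")
available = conn.execute(
    "SELECT title, copies FROM book WHERE copies > 0 ORDER BY book_id").fetchall()

print(available)
```

The update models one copy of the first book being issued, and the final query is a minimal report of the titles still available.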
1.3.10 Advantages of Database Management Systems
A database management system has many potential advantages, which are explained
below:
1. Data independence: Application programs should be as independent as possible from
details of data representation and storage. The DBMS can provide an abstract view of the
data to insulate application code from such details.
2. Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the
data in such a manner that users can think of the data as being accessed by only one user
at a time. Further, the DBMS protects users from system failures.
3. Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS.
This, in conjunction with the high-level interface to the data, facilitates quick
development of applications. Such applications are also likely to be more robust than
applications developed from scratch because many important tasks are handled by the
DBMS instead of being implemented by the application.
4. Reduction of Redundancies: Centralized control of data by the DBA avoids
unnecessary duplication of data and effectively reduces the total amount of data storage
required. It also eliminates the extra processing necessary to trace the required data in a
large mass of data.
5. Elimination of Inconsistencies: The main advantage of avoiding duplication is the
elimination of inconsistencies that tend to be present in redundant data files. Any
redundancies that exist in the DBMS are controlled and the system ensures that these
multiple copies are consistent.
6. Shared Data: A database allows the sharing of data under its control by any number of
application programs or users. For example, the applications for the public relations and
payroll departments can share the same data.
7. Integrity: Centralized control can also ensure that adequate checks are incorporated in
the DBMS to provide data integrity. Data integrity means that the data contained in the
database is both accurate and consistent. Therefore, data values being entered for the
storage could be checked to ensure that they fall within a specified range and are of the
correct format.
8. Security: Data is of vital importance to an organization and may be confidential. Such
confidential data must not be accessed by unauthorized persons. The DBA who has the
ultimate responsibility for the data in the DBMS can ensure that proper access procedures
are followed, including proper authentication schemes for access to the DBMS and
additional checks before permitting access to sensitive data. Different levels of security
could be implemented for various types of data and operations.
9. Conflict Resolution: Since the database is under the control of the DBA, he/she should
resolve the conflicting requirements of various users and applications. In essence, the
DBA chooses the best file structure and access method to get optimal performance for the
response-critical applications, while permitting less critical applications to continue to
use the database, albeit with a relatively slower response.
10. Standards can be enforced: Since all access to the database must go through the DBMS,
standards can be enforced. Standards may relate to the naming of data, format of data,
structure of data etc. Standardizing stored data formats is usually desirable for the
purpose of data interchange or migration between systems.
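The integrity advantage can be made concrete with the 10-digit phone number rule from Example 8. In the sketch below the rule is declared once in the schema and enforced by the DBMS for every program that inserts data. SQLite via Python's sqlite3 module is the assumed example system, and the table layout, helper function and phone numbers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once, in the schema, instead of in every program.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " phone TEXT NOT NULL"
    " CHECK (length(phone) = 10 AND phone NOT GLOB '*[^0-9]*'))"
)


def try_insert(student_id: int, phone: str) -> bool:
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, phone))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(101, "9876543210")
rejected_short = try_insert(102, "12345")
rejected_letters = try_insert(103, "98765abcde")
print(accepted, rejected_short, rejected_letters)
```

Any application that writes to this table is subject to the same check, which is the centralized control that the file-based approach lacks.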
1.3.11 Disadvantages of Database Management Systems
Although a DBMS has many advantages, it also has some disadvantages. These are as
follows:
1. Cost of software/hardware and migration: A significant disadvantage of the DBMS
system is cost. In addition to the cost of purchasing or developing the software, the
hardware has to be upgraded to allow for the extensive programs and work spaces
required for their execution and storage. The processing overhead introduced by DBMS
to implement security, integrity, and sharing of the data causes a degradation of the
response and through-put times. An additional cost is that of migration from a
traditionally separate application environment to an integrated one.
2. Problem associated with centralization: While centralization reduces duplication, the
lack of duplication requires that the database be adequately backed up so that in the case
of failure the data can be recovered. Centralization also means that the data is accessible
from a single source. This increases the potential severity of security breaches and
disruption of the operation of the organization because of downtimes and failures. The
replacement of a monolithic centralized database by a federation of independent and
cooperating distributed databases resolves some of the problems resulting from failures
and downtimes.
3. Complexity of Backup and Recovery: Backup and recovery operations are fairly
complex in a DBMS environment, and this is exacerbated in a concurrent multi user
database system. Furthermore, a database system requires a certain amount of controlled
redundancies and duplication to enable access to related data items.
4. Cost of Data Conversion: When a computer file-based system is replaced with a
database system, the data stored in files must be converted into database files. Converting
data files into a database is difficult and costly. For this purpose, we have
to hire database and system designers along with application programmers.
5. Cost of staff training: Most DBMSs are complex systems, so users need training to use
them. Training is required at all levels, including programming,
application development and database administration.
6. Database damage: In most organizations all data is integrated into a single
database. If the database is damaged due to a power failure, or is corrupted on the
storage media, then valuable data may be lost forever.
7. High cost of DBMS: Because a complete DBMS is a very large and sophisticated piece of
software, it is expensive to purchase.
8. Slower processing in some applications: Although an integrated database is designed to
provide better information, certain applications may be slower due to the integration of
data.
1.3.12 Difference between File System Approach and Database Management System
| No. | File System Approach | Database Management System |
|-----|----------------------|----------------------------|
| 1 | It is used for small systems. | It is used for large systems. |
| 2 | It is a relatively cheaper system. | It is relatively more expensive. |
| 3 | Large files are stored under this system. | Fewer files are stored under this system. |
| 4 | Data is stored in the form of files. | Data is stored in the form of tables. |
| 5 | It has a simple structure. | It has a comparatively complex structure. |
| 6 | The data may have redundancy under this system. | Under this system there is reduced redundancy. |
| 7 | Data is isolated from other data. | Data can be shared under this system. |
| 8 | There is no security and integrity of data. | It maintains security and integrity of data. |
| 9 | The backup and recovery process is simple in this system. | The backup and recovery process is complex in this system. |
| 10 | Its examples are C, COBOL. | Its examples are Oracle, SQL. |
| 11 | There is no data independence. | There exists data independence. |
1.3.13 DBMS Terminology
Database – A collection (or list) of information. A database is composed of one or more lists
(called tables) of data organized by columns, rows, and cells.
Tables – The view that displays the database as a combination of rows (records) and columns
(fields). The cells contain the bits and pieces of data for each record in each field. The first
row of a table is reserved for the field names.
Field names – Identify the different categories in a database. The top row is reserved for field
names. Examples of field names are first name, last name, address, city, state, zip, phone
number.
Field – A category in a database. Fields are displayed in columns. For example, in a database,
the address field contains the address for each of the records. These are the bits and pieces of
data.
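Table Name: Student

| Student_id | Student_name | Student_age | Student_grade  |
|------------|--------------|-------------|----------------|
| 101        | Arpit        | 19          | Post graduate  |
| 102        | Siddhant     | 15          | Under graduate |
| 103        | Aryan        | 16          | Under graduate |
| 104        | Satvik       | 19          | Post graduate  |

Figure 1.4: Basic Terminology of DBMS (callouts in the figure: Field Name, Record, Cell)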
Records – Related information, separated into columns or fields, that makes up one row of a
table. A name and address are considered one record in the database. A second name and address
are a different record.
Cells – The intersections of columns and rows, which contain the data for each record.
Data – All of the records of information in a database including the field names i.e. Data + Field
Names = Records & All Records = a Database.
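The terms defined in this section can be seen on the Student table of Figure 1.4. The sketch below loads that table into SQLite through Python's sqlite3 module, which is an assumed example system, and picks out the field names, a record and a cell. The column types are assumptions, since the figure does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student ("
    " Student_id INTEGER PRIMARY KEY, Student_name TEXT,"
    " Student_age INTEGER, Student_grade TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (101, "Arpit", 19, "Post graduate"),
        (102, "Siddhant", 15, "Under graduate"),
        (103, "Aryan", 16, "Under graduate"),
        (104, "Satvik", 19, "Post graduate"),
    ],
)

cursor = conn.execute("SELECT * FROM Student ORDER BY Student_id")
field_names = [column[0] for column in cursor.description]  # the column headings
records = cursor.fetchall()                                 # one tuple per row

first_record = records[0]
# A cell is the value where one record meets one field.
cell = first_record[field_names.index("Student_name")]

print(field_names)
print(first_record)
print(cell)
```

Here `field_names` corresponds to the top row of the table, each tuple in `records` is one record, and `cell` is the single value found where the first record meets the Student_name field.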
1.3.14 Component of Database System
A database system is composed of four components:
1. Data
2. Hardware
3. Software
4. Users
These components coordinate with each other to form an effective database system.
1. Data - It is a very important component of the database system. Most organizations
generate, store and process large amounts of data. The data acts as a bridge between the
machine parts, i.e. hardware and software, and the users, who access it directly or
through application programs. Data may be of different types, as
explained below:
a) User Data - It consists of one or more tables of data called relations, where columns are
called fields or attributes and rows are called records. A relation must be
structured properly.
b) Metadata - A description of the structure of the database is known as metadata. It
basically means "data about data". System tables store the metadata, which includes:
- Number of tables and table names
- Number of fields and field names
- Primary key fields
- Null constraints
c) Application Metadata - It stores the structure and format of queries, reports and
other application components.
Figure 1.5: Data Base System
2. Hardware - The hardware consists of the secondary storage devices on which data is stored,
such as magnetic disks (hard disk, zip disk, floppy disks), optical disks (CD-ROM) and
magnetic tapes, together with the input/output devices (mouse, keyboard,
printers), processors, main memory etc. which are used for storing and retrieving the
data in a fast and efficient manner. Since databases can range from those of a single user
with a desktop computer to those on mainframe computers with thousands of users,
proper care should be taken in choosing appropriate hardware devices for the
required database.
3. Software - The software part consists of the DBMS, which acts as a bridge between the user
and the database. In other words, it is the software that interacts with the users, application
programs, the database and the file system of a particular storage medium (hard disk,
magnetic tapes etc.) to insert, update, delete and retrieve data. To perform
operations such as insertion, deletion and update, we can use either query
languages like SQL, QUEL, Gupta SQL or application software such as Visual Basic,
Developer etc.
4. Users - Users are those persons who need information from the database to carry out their
primary business responsibilities, i.e. personnel, staff, clerical workers, managers,
executives etc. On the basis of their jobs and requirements, they are
provided full or partial access to the database. The people who work with
databases include database users, system analysts, application programmers, and the
database administrator (DBA).
Database users are those who interact with the database in order to query and update the
database, and generate reports. Database users are further classified into the following categories:
a) Naive users: The users who query and update the database by invoking
already written application programs. For example, the owner of a bookstore enters the
details of various books in the database by invoking the appropriate application program. The
naive user interacts with the database using a form interface.
b) Sophisticated users: The users, such as business analysts, scientists, etc., who are
familiar with the facilities provided by a DBMS and interact with the system without writing
any application programs. Such users use a database query language to retrieve information
from the database to meet their complicated requirements.
c) Specialized users: The users who write specialized database programs, which are
different from traditional data processing applications such as banking and payroll
management that use simple data types. Specialized users write applications such as
computer-aided design systems, knowledge-base and expert systems that store data
having complex data types.
d) System analysts: The users who determine the requirements of the database users
(especially naive users) to create a solution for their business needs, and who focus on both
non-technical and technical aspects. The non-technical aspects involve defining system
requirements, facilitating interaction between business users and technical staff, etc.
Technical aspects involve developing the specification for the user interface (application
programs).
e) Application programmers: These are the computer professionals who
implement the specifications given by the system analysts, and develop application
programs. They can choose tools, such as rapid application development (RAD) tools, to
develop application programs with minimal effort. The database application
programmer develops application programs to facilitate easy data access for the database
users.
1.3.15 Database Administrator (DBA)
Database Administrator is a person who has central control over both data and application
programs. The responsibilities of DBA vary depending upon the job description and corporate
and organization policies. Some of the responsibilities of DBA are given here.
a) Schema definition and modification: The overall structure of the database is known as
database schema. It is the responsibility of the DBA to create the database schema by
executing a set of data definition statements in DDL. The DBA also carries out the
changes to the schema according to the changing needs of the organization.
b) New software installation: It is the responsibility of the DBA to install new DBMS
software, application software, and other related software. After installation, the DBA
must test the new software.
c) Security enforcement and administration: DBA is responsible for establishing and
monitoring the security of the database system. It involves adding and removing users,
auditing, and checking for security problems.
d) Data analysis: DBA is responsible for analyzing the data stored in the database, and
studying its performance and efficiency in order to effectively use indexes, parallel query
execution, etc.
e) Preliminary database design: The DBA works along with the development team during
the database design stage due to which many potential problems that can arise later (after
installation) can be avoided.
f) Physical organization modification: The DBA is responsible for carrying out the
modifications in the physical organization of the database for better performance.
g) Routine maintenance checks: The DBA is responsible for taking database backups
periodically in order to recover from any hardware or software failure (if one occurs). Other
routine maintenance checks carried out by the DBA include checking data storage,
ensuring the availability of free disk space for normal operations, upgrading disk
space as and when required, etc.
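Responsibilities (a) and (g) can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table book and its columns are hypothetical and are not taken from the text. It shows schema definition and modification through DDL, followed by a simple storage check.

```python
import sqlite3

def define_schema(conn):
    # Schema definition: a DDL statement creates the (hypothetical) book table.
    conn.execute(
        "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, price REAL)"
    )

def modify_schema(conn):
    # Schema modification: add a column as the organization's needs change.
    conn.execute("ALTER TABLE book ADD COLUMN publisher TEXT")

def column_names(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def storage_report(conn):
    # Routine maintenance check: space in use and pages on the free list.
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return {"bytes_used": page_size * page_count, "free_pages": free_pages}

conn = sqlite3.connect(":memory:")
define_schema(conn)
columns_before = column_names(conn, "book")
modify_schema(conn)
columns_after = column_names(conn, "book")
report = storage_report(conn)
```

In a production DBMS the DBA would issue equivalent DDL statements and use that system's own monitoring tools; the PRAGMA statements here are specific to SQLite.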
1.3.16 Disk Manager
The disk manager is part of the operating system of the host computer and all physical input and
output operations are performed by it. The disk manager transfers the block or page requested by
the file manager so that the latter need not be concerned with the physical characteristics of the
underlying storage media.
1.3.17 Data Manager
The data manager is the central software component of the DBMS. It is sometimes referred to as
the database control system. One of the functions of the data manager is to convert operations in
the user's queries coming directly via the query processor or indirectly via an application
program from the user's logical view to a physical file system. The data manager is responsible
for interfacing with the file system. In addition, the tasks of enforcing constraints to maintain the
consistency and integrity of the data, as well as its security, are also performed by the data
manager. It is also the responsibility of the Data Manager to provide the synchronization in the
simultaneous operations performed by concurrent users and to maintain the backup and recovery
operations.
1.3.18 File Manager
Responsibility for the structure of the files and managing the file space rests with the file
manager. It is also responsible for locating the block containing the required record, requesting
block from the disk manager, and transmitting the required record to the data manager as shown.
The file manager can be implemented using an interface to the existing file subsystem provided
by the operating system of the host computer or it can include a file subsystem written especially
for the DBMS.
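The division of work among the disk manager, file manager and data manager described in Sections 1.3.16 to 1.3.18 can be sketched as three small layers. This is only a toy model under stated assumptions: the class names, the 64-byte block size, the 16-byte fixed-length records and the in-memory "disk" are all hypothetical choices made for illustration.

```python
BLOCK_SIZE = 64    # bytes per block (illustrative value)
RECORD_SIZE = 16   # bytes per fixed-length record (illustrative value)

class DiskManager:
    """Performs physical I/O: reads and writes whole blocks by block number."""
    def __init__(self, num_blocks):
        self._storage = bytearray(BLOCK_SIZE * num_blocks)

    def read_block(self, block_no):
        start = block_no * BLOCK_SIZE
        return bytes(self._storage[start:start + BLOCK_SIZE])

    def write_block(self, block_no, data):
        start = block_no * BLOCK_SIZE
        self._storage[start:start + BLOCK_SIZE] = data

class FileManager:
    """Knows the file structure: locates the block that holds a record."""
    def __init__(self, disk):
        self._disk = disk
        self._per_block = BLOCK_SIZE // RECORD_SIZE

    def read_record(self, record_no):
        block_no, slot = divmod(record_no, self._per_block)
        block = self._disk.read_block(block_no)
        return block[slot * RECORD_SIZE:(slot + 1) * RECORD_SIZE]

    def write_record(self, record_no, raw):
        block_no, slot = divmod(record_no, self._per_block)
        block = bytearray(self._disk.read_block(block_no))
        block[slot * RECORD_SIZE:(slot + 1) * RECORD_SIZE] = raw
        self._disk.write_block(block_no, bytes(block))

class DataManager:
    """Translates logical requests (store or fetch a name) into record operations."""
    def __init__(self, files):
        self._files = files

    def store_name(self, record_no, name):
        raw = name.encode("utf-8")
        if len(raw) > RECORD_SIZE:
            raise ValueError("name too long for a fixed-length record")
        self._files.write_record(record_no, raw.ljust(RECORD_SIZE, b"\x00"))

    def fetch_name(self, record_no):
        return self._files.read_record(record_no).rstrip(b"\x00").decode("utf-8")

data_manager = DataManager(FileManager(DiskManager(num_blocks=4)))
data_manager.store_name(5, "Krishna")
fetched = data_manager.fetch_name(5)
```

Each layer calls only the layer directly below it, so the data manager never needs to know the block size and the file manager never touches the storage medium itself.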
1.4 Summary
Humans have been dealing with data and information for a long time, perhaps since the beginning
of civilization. Since then, the exchange of information has been in practice,
which further contributes to meaningful knowledge.
To achieve the objective of this chapter, the origin of the database concept as an extension of
file systems has been discussed. The flaws of file systems, the advantages of database systems
and a comparison between them are sketched.
The basic terminology of database management systems, their characteristics, and the important
components of database systems, along with the roles of different users, are explained.
1.5 Suggested Reading/ Reference Material
1. Elmasri & Navathe: Fundamentals of Database Systems, 3rd Edition, Addison Wesley,
New Delhi.
2. Korth & Silberschatz: Database System Concept, 4th Edition, McGraw Hill
International Edition.
3. Raghu Ramakrishnan & Johannes Gehrke: Database Management Systems, 2nd
Edition, McGraw Hill International Edition.
4. C. J. Date: An Introduction to Database Systems, 7th Edition, Addison Wesley, New
Delhi.
5. Bipin C. Desai: An Introduction to Database System, Galgotia Publication, New Delhi.
6. O'Brien J. A.: Introduction to Information System in Business Management, 6th Edition,
Richard D. Irwin, Inc. 1991.
1.6 Self Assessment Questions (SAQ)
1. What is information? How does it differ from data? Define database management system.
2. What is the file system approach? What are its various limitations?
3. What is a DBMS? What are its various advantages and disadvantages?
4. Differentiate between
a) File system approach and database approach.
b) Data, Information and Knowledge
5. List some advantages of a DBMS as compared to a conventional data processing system.
6. Who is the DBA? What are the responsibilities of the DBA?
7. Discuss the role of the data manager, file manager and disk manager in a database management
system.
8. What are the important components of a database management system? Explain database
users.
Chapter – 2: Database Systems Architecture, Functions &
Component Modules
Structure:
2.1 Introduction
2.2 Objective
2.3.1 Database Instances and Schemas
2.3.2 Three Level Architecture of DBMS
2.3.3 Mapping
2.3.4 Data Independence
2.3.5 Database Language and Interface
(i) DBMS Languages
(ii) DBMS Interface
2.3.6 DBMS Functions
2.3.7 Component Modules of DBMS
2.4 Summary
2.1 Introduction
A collection of data designed to be used by different people is called a database. It is a collection
of interrelated data stored together with controlled redundancy to serve one or more applications
in an optimal fashion. The data are stored in such a fashion that they are independent of the
programs that use them. A common and controlled approach is used in adding new data
and in modifying and retrieving existing data within the database.
The users of the database do not have to worry about the physical implementation and internal
working of the database. The database management system has different layers, and each
user needs to interact only with the layer assigned to that user.
In this chapter, we present the three-tier architecture of a DBMS package, which has evolved
from the traditional system in which the whole database was tightly integrated. The mappings among
the different views/levels of the three-tier architecture are explained. Modification of data at the
schema level, which keeps data separated from all programs, is discussed. DBMS languages, interfaces,
functions and component modules are covered for a clear understanding of the DBMS.
2.2 Objective
After going through this chapter, the student will have a clear understanding of how the
user applications are separated from the physical database. The three-tier architecture of DBMS is
presented to provide a framework on which subsequent chapters can build. Such a framework is useful
for describing general database concepts. This architecture was proposed by ANSI/SPARC (American
National Standards Institute/ Standards Planning and Requirements Committee) and describes a
database in terms of three views. The mappings define the correspondence between the three view
levels. The discussion of schema modification clarifies the points to be taken care of when a schema
is changed. To communicate with the database, DBMS interfaces and languages are incorporated. A
detailed knowledge of DBMS functions and component modules is delivered in the last section of this
chapter.
2.3.1 Instances and Schema
A database changes over time as information is inserted or deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema.
A schema diagram, as shown below, displays only the names of record types (entities) and the names of
data items (attributes) and does not show the relationships among the various files.
Schema of SupplierMaster
Sid | Sname | SCity | Product | Price | Qty
Schema of ClientMaster
CNo | Cname | Caddress | CPhoneno
Instances of Table SupplierMaster are as follows:
Sid  | Sname              | SCity       | Product     | Price   | Qty
S001 | Compware Computers | Kurukshetra | HP Laptop   | 29000/- | 5
S002 | Dell Computers     | Panipat     | Dell Laptop | 31000/- | 5
S003 | Krishna Computers  | Karnal      | Acer Laptop | 27000/- | 8
Figure 2.1: Schema and Instance
The schema will remain the same while the values filled into it change from instant to instant.
When the schema framework is filled in with data item values, it is referred to as an instance of the
schema. The data in the database at a particular moment of time is called a database state or
snapshot, which is also called the current set of occurrences or instances in the database.
In other words, "the description of a database is called the database schema, which is specified
during database design and is not expected to change frequently".
A schema diagram displays only some aspects of a schema, such as the names of record types
and data items and some types of constraints. Other aspects are not specified in the schema
diagram. In the above figure, for example, neither the data types of the attributes nor the
relationships among the files are shown.
The actual data in the database may change quite frequently. The DBMS is partially responsible
for ensuring that every state of the database is a valid state, that is, a state that satisfies the
structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS
is extremely important, and the schema must be changed with care.
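The difference between schema and instance can be observed directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the DBMS; the column types are assumptions, since the schema diagram in Figure 2.1 does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SupplierMaster ("
    "Sid TEXT PRIMARY KEY, Sname TEXT, SCity TEXT, "
    "Product TEXT, Price INTEGER, Qty INTEGER)"
)

def schema(conn):
    # The schema: the column names of the table (index 1 of each table_info row).
    return [row[1] for row in conn.execute("PRAGMA table_info(SupplierMaster)")]

def instance(conn):
    # The instance (database state): the rows stored at this moment.
    return conn.execute("SELECT * FROM SupplierMaster ORDER BY Sid").fetchall()

schema_before = schema(conn)
state_empty = instance(conn)

conn.executemany(
    "INSERT INTO SupplierMaster VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("S001", "Compware Computers", "Kurukshetra", "HP Laptop", 29000, 5),
        ("S002", "Dell Computers", "Panipat", "Dell Laptop", 31000, 5),
    ],
)
state_loaded = instance(conn)
schema_after = schema(conn)
```

Here schema_before and schema_after are identical, while state_empty and state_loaded differ: the state changed, the schema did not.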
2.3.2 Three Level Architecture of DBMS
A database management system is a large software package designed to assist in managing,
maintaining and utilizing large collections of data. A DBMS is hence a general purpose software
system that facilitates the process of defining (task of internal view), constructing (task of
conceptual view) and manipulating (task of external view) databases for various applications. These
processes, their roles and their implementation are discussed in detail in the three-level/ tier
architecture of DBMS.
The three-level/ tier architecture as shown in Figure 2.2 is also known as the three-schema
architecture of a database management system. The purpose of the three-tier architecture is to
separate the user applications and the physical database. The three-schema architecture is a
convenient tool with which users can visualize the schema levels in a database system. DBMS
architecture is a framework in which the structure of the DBMS is defined. The main aim of this
architecture is to achieve the characteristics of the database approach by defining an abstract
view of the data and by hiding the details from end users.
The three-level architecture framework was suggested by ANSI/SPARC (American National
Standards Institute/ Standards Planning and Requirements Committee). The view at each level is
described by a schema. A schema is an outline or plan that describes the structure of the database.
The word scheme means a systematic plan for achieving some goal, and it can be used
interchangeably with schema. A subset of a schema is known as a subschema. It refers to the
user's view of the fields that the user works with in the database. Each view accesses some portion
of the database. The architecture of DBMS is divided into three view levels:
1. External view level
2. Conceptual view level
3. Internal view level
1. External Level: - The external level is described by an external schema, i.e. it consists of
the definitions of the logical records and relationships in the external view. Each external schema
describes the part of the database that a particular user group is interested in and hides the rest of
the database from that user group. It also contains the method of deriving the objects in the
external view from the objects in the conceptual view.
Figure 2.2: Three-tier Architecture of DBMS (diagram: end users see the External Views/ Schemas at the
External Level (individual user view); the External-Conceptual Mapping links them to the Conceptual
View/ Schema at the Conceptual Level (community user view); the Conceptual-Internal Mapping links
that to the Internal View/ Schema at the Internal Level (storage view))
This is the highest level of abstraction, where only those parts of the entire database that are of
concern to a user are included. Despite the use of simpler structures at the logical level, some
complexity remains because of the large size of the database. Many users of the database system
will not be concerned with all this information. Instead, such users need to access only a part of
the database. The view level of abstraction is defined so that their interaction with the system is
simplified. The system may provide many views for the same database. Users can fulfill
all their demands using the part of the database provided by the view and may never need the entire
database, so it is called the "user's view" or simply a "view", which is complete and independent.
The external view is written in the external schema using an external data sub-language (DSL).
2. Conceptual Level: - The conceptual level has a conceptual schema which represents the
structure of the entire database for a community of users. The conceptual schema describes the records
and relationships included in the conceptual view. The conceptual schema hides the details of
physical storage structures and concentrates on describing entities, data types, relationships, user
operations, and constraints. It also contains the method of deriving the objects in the conceptual
view from the objects in the internal view.
One conceptual view represents the entire database. There is only one conceptual view per
database. It is large, complex and sophisticated. A database changes over time as data is inserted
and deleted. The collection of information stored in the database at a particular moment is called
an instance of the database. The overall design of the database is called the database schema, and
the schema changes infrequently. The description of data at this level is in a format independent
of its physical representation. It also includes features that specify the checks to retain data
consistency and integrity. The conceptual view is written in the conceptual schema using a conceptual
data sub-language (DSL).
3. Internal Level: - The internal level has an internal schema. The internal level indicates how the
data will be stored and describes the data structures and access methods to be used by the
database. It contains the definition of the stored records, the method of representing the data fields
and the access aids used.
This lowest level of abstraction describes how the data are actually stored in the database. The
logical level above it describes the entire database in terms of a small number of relatively simple
structures. Although implementation of the simple structures at the
logical level may involve complex physical-level structures, the user of the logical level does not
need to be aware of this complexity. The internal view is written in the internal schema using an
internal data sub-language (DSL).
2.3.3 Mapping
The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs, especially those meant to support
small databases, do not support external views. Even in such systems, however, a certain amount
of mapping is necessary to transform requests between the conceptual and internal levels. The
mapping description is stored in the data dictionary. The DBMS is responsible for mapping between
these three types of schemas. There are two types of mapping.
1. External- Conceptual Mapping
2. Conceptual- Internal Mapping
External- Conceptual Mapping: A mapping between the external and conceptual views gives the
correspondence among the records and relationships of the conceptual and external views. The
external-conceptual mapping tells the DBMS which objects on the conceptual level
correspond to the objects requested on a particular user's external view. If changes are made to
either the external view or the conceptual view, then the mapping must be changed accordingly.
Conceptual- Internal Mapping: The conceptual-internal mapping defines the correspondence
between the conceptual view and the internal view, i.e. the database stored on the physical
storage device. It describes how conceptual records are stored to and retrieved from the storage
device. This means that the conceptual-internal mapping tells the DBMS how the conceptual
records are physically represented. If the structure of the stored database is changed, then the
mapping must be changed accordingly. It is the responsibility of the DBA to manage such changes.
These mappings are used primarily for data independence. All the details are captured in these
mappings so as to make the overall view data independent. Changes in the mappings are the
responsibility of the DBA. In addition to the mappings and the three views, there are three more
points of reference in the architecture: the DBMS, the DBA and the user interface. An example of the
implementation of the above three views is given below:
Conceptual View:
Schema Name = Student
{Regd : Char(8) Primary Key;
RollNo : Number(4) Candidate Key;
Name : Char(20);
Address : Varchar2(20);
Marks1 : Number(3);
Marks2 : Number(3);
Marks3 : Number(3);
TMarks : Number(3);
Grade : Char(2);
}
External View:
View N1 = Student
{Regd : Char(8) Primary Key;
Name : Char(20);
TMarks : Number(3);
Grade : Char(2);
}
View N2 = Student
{RollNo : Number(4) Candidate Key;
Name : Char(20);
Address : Varchar2(20);
Grade : Char(2);
}
Internal View:
Schema = Student
Block size = 1MB
File Name = xyz
Offset =0
Starting cylinder = xxxxxxxx
Ending Cylinder = xxxxxxxx
Organization = Index SQL etc.
The three views represented here give a broad idea of the database views. At conceptual level all
entries are made. At external level data availability is dependent on user. At internal level
technical details are provided.
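The Student example can be approximated in a relational system, where the conceptual schema becomes a table and each external view becomes an SQL view. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the SQLite column types and the sample row are assumptions, and the internal level is left entirely to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the complete Student schema.
conn.execute(
    "CREATE TABLE Student ("
    "Regd TEXT PRIMARY KEY, RollNo INTEGER UNIQUE, Name TEXT, Address TEXT, "
    "Marks1 INTEGER, Marks2 INTEGER, Marks3 INTEGER, TMarks INTEGER, Grade TEXT)"
)

# External level: each view exposes only the part of Student one user group needs.
conn.execute("CREATE VIEW N1 AS SELECT Regd, Name, TMarks, Grade FROM Student")
conn.execute("CREATE VIEW N2 AS SELECT RollNo, Name, Address, Grade FROM Student")

# Hypothetical sample row, entered at the conceptual level.
conn.execute(
    "INSERT INTO Student VALUES "
    "('R0000001', 1, 'Asha', 'Karnal', 70, 80, 90, 240, 'A')"
)

n1_rows = conn.execute("SELECT * FROM N1").fetchall()
n2_rows = conn.execute("SELECT * FROM N2").fetchall()
```

Users of N1 never see the address, and users of N2 never see the marks, although both views are derived from the same stored row.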
2.3.4 Data Independence
The ability to modify a schema definition at one level of a database system without having to
change the schema at the next higher level is called data independence. Data independence is a
form of database management that keeps data separated from all programs that make use of data.
There are two types of data independence:
1. Logical data independence: It is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the conceptual
schema to expand the database (by adding a record type or data item), to change constraints, or
to reduce the database (by removing a record type or data item). Logical data independence thus
gives us the freedom to change the conceptual schema without worrying about the external schemas.
For example, we may sometimes need to change the logical schema by adding or removing fields or
attributes from the database; with logical data independence, such a change is possible.
Example 1: The addition or removal of entities, attributes, and relationships in the conceptual
schema should be possible without having to change existing external schemas or having to rewrite
existing application programs. Consider a relation Student (Name, Rollno, Class):
Student
Name Rollno Class
If one more attribute, Marks, is added to the existing relation Student, then the structure
of the relation looks like:
Student
Name Rollno Class Marks
Figure 2.3: Logical Data Independence
In Figure 2.3, the logical schema is changed by adding a field/attribute to the database. With
logical data independence, this change is possible: it is absorbed by the view definitions and the
mapping between the external and the conceptual view.
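Example 1 can be tried out in a relational system. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the view ClassList, the function class_list and the sample row are hypothetical. The "application program" is written against the external view only, so adding the Marks attribute to the conceptual schema does not affect it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, Rollno INTEGER, Class TEXT)")
conn.execute("INSERT INTO Student VALUES ('Asha', 1, 'X')")

# External view used by the application (hypothetical name).
conn.execute("CREATE VIEW ClassList AS SELECT Name, Rollno FROM Student")

def class_list(conn):
    # Application code: it refers to the view, never to the Student table itself.
    return conn.execute("SELECT Name, Rollno FROM ClassList").fetchall()

before = class_list(conn)

# Change to the conceptual schema: add the attribute Marks.
conn.execute("ALTER TABLE Student ADD COLUMN Marks INTEGER")

after = class_list(conn)
```

The function class_list returns the same result before and after the schema change, and it did not have to be rewritten.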
2. Physical data independence: It is the capacity to change the internal schema without having
to change the conceptual schema. Hence, the external schemas need not be changed either.
Changes to the internal schema may be needed because of a different file organization,
storage structure, storage device, or indexing strategy, and they should be possible without having
to change the conceptual or external schema.
Alteration in the internal schema might include:
• Using new storage devices.
• Using different data structures.
• Switching from one access method to another.
• Using different file organization or storage structures.
• Modifying indexes.
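One of the alterations listed above, modifying indexes, can be sketched as follows. The example assumes Python's built-in sqlite3 module; the index name idx_student_rollno and the sample rows are hypothetical. The query text and its result stay the same, while the access path chosen by the DBMS changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT, Rollno INTEGER, Class TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("Asha", 1, "X"), ("Ravi", 2, "X"), ("Meena", 3, "IX")],
)

QUERY = "SELECT Name FROM Student WHERE Rollno = 2"

def plan(conn):
    # EXPLAIN QUERY PLAN reports the access path; the last column is the description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))

result_before, plan_before = conn.execute(QUERY).fetchall(), plan(conn)

# Change at the internal level only: add an index on Rollno.
conn.execute("CREATE INDEX idx_student_rollno ON Student (Rollno)")

result_after, plan_after = conn.execute(QUERY).fetchall(), plan(conn)
```

Comparing plan_before with plan_after shows that the new index is used, yet neither the table definition nor the query had to change.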
2.3.5 Database Language and Interface
The main objective of DBMS is to allow its users to perform a number of operations on database
such as retrieval, deletion and modification of data in abstract terms without knowing about the
physical representation of data. Therefore DBMS must provide appropriate languages and
interfaces for each category of users. In this section we discuss the types of languages and
interfaces provided by a DBMS and the user categories targeted by each interface.
(i) DBMS Languages
A language is needed to describe the database to the DBMS as well as to provide facilities for
changing the database and for defining and changing physical data structures. Another language is
needed for manipulating the data. These languages are called the Data Description/ Definition
Language (DDL) and the Data Manipulation Language (DML)
respectively. Each DBMS has a DDL as well as a DML. The two languages may be parts of a
unified database language. The DBMS languages are of three forms as explained below:
1. Extended Host Languages
These are subroutines called from one or more programming languages. For example, a
system may provide extensions to COBOL, FORTRAN, C, C++ etc. to enable the user to interact
with the database. The programming language that is extended is usually called the host
language.
2. Query Language
These are special purpose languages that usually provide more powerful facilities to interact with
the database. These languages are often designed to be simple so that non-programmers may use
them easily. There are four types of database languages, which may also be called SQL components, i.e.
Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language
(DCL) and Transaction Control Language (TCL).
DDL is a computer language for defining the data structures used in a database. DDL
statements are used to create, modify and remove database objects such as tables, indexes and
users. CREATE, ALTER, DROP, TRUNCATE and RENAME are the various DDL commands.
These commands are used for specific purposes as given below:
• CREATE- Creates objects in the database
• ALTER- Alters the structure of the database
• DROP- Deletes objects from the database
• TRUNCATE- Removes all records from a table, including all space allocated to the
records.
• RENAME- Renames an object
DDL has a pre-defined syntax for describing data.
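The DDL commands can be tried out as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module; the table name client is hypothetical. Two SQLite-specific points are worth noting: renaming is written as ALTER TABLE ... RENAME TO, and SQLite has no TRUNCATE statement, so TRUNCATE is not shown.

```python
import sqlite3

def table_names(conn):
    # sqlite_master is SQLite's catalog of schema objects.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")

# CREATE: define a new table.
conn.execute("CREATE TABLE client (CNo INTEGER PRIMARY KEY, Cname TEXT)")

# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE client ADD COLUMN CPhoneno TEXT")

# RENAME: in SQLite, renaming is a form of ALTER TABLE.
conn.execute("ALTER TABLE client RENAME TO ClientMaster")
after_rename = table_names(conn)

# DROP: remove the table and its data from the database.
conn.execute("DROP TABLE ClientMaster")
after_drop = table_names(conn)
```

Other DBMSs spell some of these commands differently, so the exact syntax should be checked against the manual of the system in use.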
DML is a family of computer languages used by computer programs or database users to retrieve,
insert, delete and update data in a database. Currently the most popular DML is that of SQL, which
is used to retrieve and manipulate data in a relational database. DML may be of two types, i.e.
procedural (the user specifies what data is needed and how to get it) and non-procedural (the user
specifies only what data is needed). DML performs operations like SELECT, INSERT, UPDATE and
DELETE. These commands are used for specific purposes as given below:
• SELECT- Retrieves data from a database
• INSERT- Inserts data into a table
• UPDATE- Updates existing data within a table
• DELETE- Deletes records from a table (all of them, or only those matching a condition); the
space for the records remains.
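The four DML commands can be sketched on a reduced version of the SupplierMaster table from Figure 2.1. The example assumes Python's built-in sqlite3 module; the choice of columns and the quantities changed are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SupplierMaster (Sid TEXT PRIMARY KEY, Sname TEXT, Qty INTEGER)"
)

# INSERT: add new records.
conn.executemany(
    "INSERT INTO SupplierMaster VALUES (?, ?, ?)",
    [
        ("S001", "Compware Computers", 5),
        ("S002", "Dell Computers", 5),
        ("S003", "Krishna Computers", 8),
    ],
)

# UPDATE: change existing data within a record.
conn.execute("UPDATE SupplierMaster SET Qty = Qty + 2 WHERE Sid = ?", ("S001",))

# DELETE: remove only the records that match the condition.
conn.execute("DELETE FROM SupplierMaster WHERE Sid = ?", ("S002",))

# SELECT: retrieve data, stating what is wanted rather than how to find it.
remaining = conn.execute(
    "SELECT Sid, Qty FROM SupplierMaster ORDER BY Sid"
).fetchall()
```

All four statements are non-procedural: none of them says which blocks to read or which index to use.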
DCL is a computer language and a subset of SQL, used to control access to the data in a database.
That is, a user can access data only according to the privileges given to that user. DCL statements
are used to provide a kind of security to the database. GRANT and REVOKE are the commands that
come under the purview of DCL. The purpose of these commands is as follows:
• GRANT- To allow specified users to perform specified tasks.
• REVOKE- To cancel previously granted or denied permissions.
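The effect of GRANT and REVOKE can be pictured with a toy model. This sketch is plain Python and does not call any DBMS (SQLite, for instance, does not implement GRANT and REVOKE); the class PrivilegeTable, the user clerk and the privilege names are all hypothetical.

```python
class PrivilegeTable:
    """Toy model of DCL: records which user holds which privilege on which object."""

    def __init__(self):
        self._grants = set()  # entries are (user, privilege, object) triples

    def grant(self, user, privilege, obj):
        # GRANT: allow the user to perform the task on the object.
        self._grants.add((user, privilege, obj))

    def revoke(self, user, privilege, obj):
        # REVOKE: cancel a previously granted privilege (no error if it was absent).
        self._grants.discard((user, privilege, obj))

    def is_allowed(self, user, privilege, obj):
        return (user, privilege, obj) in self._grants

privileges = PrivilegeTable()
privileges.grant("clerk", "SELECT", "SupplierMaster")
can_read = privileges.is_allowed("clerk", "SELECT", "SupplierMaster")
can_write = privileges.is_allowed("clerk", "UPDATE", "SupplierMaster")
privileges.revoke("clerk", "SELECT", "SupplierMaster")
can_read_after_revoke = privileges.is_allowed("clerk", "SELECT", "SupplierMaster")
```

A real DBMS keeps such information in its catalog and checks it before executing each statement.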
TCL statements are used to manage the changes made by DML statements. They allow statements
to be grouped together into logical transactions. We use TCL statements to undo transactions and
to commit data to the database. COMMIT, ROLLBACK, SAVEPOINT and SET
TRANSACTION are TCL statements. These statements work as follows:
• COMMIT- Save work done
• SAVEPOINT- Identify a point in a transaction to which you can later roll back
• ROLLBACK- Restore database to original since the last COMMIT
• SET TRANSACTION- Change transaction options like isolation level and what rollback
segment to use.
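The behaviour of COMMIT, SAVEPOINT and ROLLBACK can be observed in a short session. The following is a minimal sketch, assuming Python's built-in sqlite3 module; the account table and the savepoint name before_b are hypothetical. SET TRANSACTION is not shown because SQLite does not provide it.

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from opening transactions on its
# own, so every BEGIN, COMMIT and ROLLBACK below is issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn.execute("BEGIN")
conn.execute("INSERT INTO account VALUES ('A', 100)")
conn.execute("COMMIT")                    # save the work done so far
after_commit = balances(conn)

conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = 150 WHERE name = 'A'")
conn.execute("SAVEPOINT before_b")        # mark a point to roll back to later
conn.execute("INSERT INTO account VALUES ('B', 50)")
conn.execute("ROLLBACK TO before_b")      # undo only the work after the savepoint
after_partial = balances(conn)

conn.execute("ROLLBACK")                  # undo everything since the last COMMIT
after_rollback = balances(conn)
```

After the partial rollback the update to account A is still visible but account B is gone; after the full rollback the database is back in the state saved by the last COMMIT.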
3. Data Sublanguage
In relational database theory, the term sublanguage, first used for this purpose by E. F. Codd in
1970, refers to a computer language used to define or manipulate the structure and contents of a
relational database management system (RDBMS). Typical sublanguages associated with
modern RDBMSs are QBE (Query by Example) and SQL (Structured Query Language). In
1985, Codd encapsulated his thinking in twelve rules which every database must satisfy in order
to be truly relational. The fifth rule is known as the comprehensive data sublanguage rule.
(ii) DBMS Interfaces
A DBMS interface is the abstraction of a piece of functionality of a DBMS. It usually refers to
the communication boundary between the DBMS and clients or to the abstraction provided by a
component within a DBMS.
Figure 2.4: Working Principle of DBMS Interface
A DBMS interface hides the implementation of the functionality of the component it
encapsulates. With the help of SQL, a query language, an application poses a query to the
database system. There, the corresponding answer (result set) is
prepared and, also with the help of SQL, given back to the application. This communication can
take place interactively or be embedded into another language. The working principle of a database
interface is shown in Figure 2.4 above.
DBMS provides the User-friendly interfaces which may include the following:
1. Menu-Based Interfaces for Web Clients or Browsing: These interfaces present the
user with lists of options, called menus, which lead the user through the formulation of a request.
Menus do away with the need to memorize the specific commands and syntax of a query
language; rather, the query is composed step by step by picking options from a menu that is
displayed by the system.
2. Forms-Based Interfaces: A forms-based interface displays a form to each user. Users
can fill out all of the form entries to insert new data, or they fill out only certain entries, in which
case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed
and programmed for naive users as interfaces to canned transactions. A form based interface is
shown in Figure 2.5 for reader understanding.
3. Graphical User Interfaces: A graphical user interface (GUI) typically displays a schema
to the user in diagrammatic form. The user can then specify a query by manipulating the
diagram. In many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device,
such as a mouse, to pick certain parts of the displayed schema diagram.
4. Interfaces for Parametric Users: Parametric users, such as bank tellers, often have a
small set of operations that they must perform repeatedly. Systems analysts and programmers
design and implement a special interface for each known class of naïve users. Usually, a small
set of abbreviated commands is included, with the goal of minimizing the number of keystrokes
required for each request.
Figure 2.5: Form Based Interface
5. Text Based Interface: Database administrators and other professional users can
communicate with the DBMS directly in the query language (in code form), for example through
an SQL input/output window as shown in Figure 2.6. Text-based interfaces are very
powerful tools and allow comprehensive interaction with a DBMS. However, their use
requires active knowledge of the respective database language.
6. Interfaces for the DBA: Most database systems contain privileged commands that can
be used only by the DBA's staff. These include commands for creating accounts, setting system
parameters, granting account authorization, changing a schema, and reorganizing the storage
structures of a database.
Figure 2.6: Text Based Interface
7. Natural Language Interfaces: These interfaces accept requests written in English or
some other language and attempt to "understand" them. A natural language interface usually has
its own "schema," which is similar to the database conceptual schema, as well as a dictionary of
important words. The natural language interface refers to the words in its schema, as well as to
the set of standard words in its dictionary, to interpret the request. If the interpretation is
successful, the interface generates a high-level query corresponding to the natural language
request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user
to clarify the request.
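The behaviour of a forms-based interface, where the user fills out only certain entries and the DBMS retrieves the matching data, can be sketched as follows. The example assumes Python's built-in sqlite3 module and the ClientMaster schema from Figure 2.1; the function search_by_form, the column types and the sample rows are hypothetical.

```python
import sqlite3

FORM_FIELDS = ("CNo", "Cname", "Caddress", "CPhoneno")

def search_by_form(conn, form):
    # Only the entries the user filled in become search conditions. Column names
    # come from the fixed FORM_FIELDS tuple; values are passed as bound parameters.
    filled = [(f, form[f]) for f in FORM_FIELDS if form.get(f) not in (None, "")]
    sql = "SELECT CNo, Cname, Caddress, CPhoneno FROM ClientMaster"
    if filled:
        sql += " WHERE " + " AND ".join(f"{name} = ?" for name, _ in filled)
    sql += " ORDER BY CNo"
    return conn.execute(sql, [value for _, value in filled]).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ClientMaster "
    "(CNo INTEGER PRIMARY KEY, Cname TEXT, Caddress TEXT, CPhoneno TEXT)"
)
conn.executemany(
    "INSERT INTO ClientMaster VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Karnal", "100"),
        (2, "Ravi", "Panipat", "200"),
        (3, "Meena", "Karnal", "300"),
    ],
)

# The user fills in only the address entry of the form.
matches = search_by_form(conn, {"Caddress": "Karnal"})
```

The naive user never sees the SQL text; the form is the whole interface, and the query is assembled behind it.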
2.3.6 DBMS Functions
A Database Management System (DBMS) serves many functions that are key
to the operation of database management. When deciding to implement a DBMS in
a business environment, the first task is to decide what type of DBMS one actually requires. A
DBMS performs several important functions that guarantee the integrity and consistency of the data in
the database. Most of these functions are transparent to end-users. The following are
important functions and services provided by a DBMS:
1) Ability to Update and Retrieve Data: This is a fundamental component of a DBMS and
essential to database management. Without the ability to view or manipulate data, there would be
no point to using a database system. Updating data in a database includes adding new records,
deleting existing records and changing information within a record. The user does not need to be
aware of how the DBMS structures this data; the user only needs to know that information can be
updated and retrieved. The DBMS handles the processes and the structure of the
data on a disk.
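The operations named above (adding, changing, deleting and retrieving records) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table student, its columns and the sample rows are assumptions made for the example and do not come from the text.

```python
import sqlite3

# In-memory database; the table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# Adding new records
conn.executemany("INSERT INTO student VALUES (?, ?)", [(101, "Aryan"), (102, "Meera")])
# Changing information within a record
conn.execute("UPDATE student SET name = ? WHERE student_id = ?", ("Aryan K.", 101))
# Deleting an existing record
conn.execute("DELETE FROM student WHERE student_id = ?", (102,))
conn.commit()

# Retrieving data; the caller never sees how the rows are laid out on disk.
rows = conn.execute("SELECT student_id, name FROM student ORDER BY student_id").fetchall()
print(rows)
```

The program states only what should happen to the data; how the records are structured and stored is left to the DBMS.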
2) Support Concurrent Updates: Concurrent updates occur when multiple users make updates
to the database simultaneously. Supporting concurrent updates is also crucial to database
management as this component ensures that updates are made correctly and the end result is
accurate. Without DBMS intervention, important data could be lost and/or inaccurate data stored.
The DBMS uses features such as batch processing, locking, two-phase locking, and time stamping to
support concurrent updates and to make certain that updates are done accurately. It is the
responsibility of the database management system to make sure that all updates are stored
properly, since the user is unaware of all these updates.
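As a loose analogy for the locking technique mentioned above, the following sketch uses a lock from Python's threading module so that several threads can update one shared value without losing updates. It is not how any particular DBMS implements locking; the account dictionary and the deposit function are hypothetical names chosen for the example.

```python
import threading

# Shared data item updated by several "users" (threads); names are illustrative.
account = {"balance": 0}
lock = threading.Lock()

def deposit(times):
    """Add 1 to the balance `times` times, holding the lock for each update."""
    for _ in range(times):
        with lock:  # only one thread may read-modify-write at a time
            current = account["balance"]
            account["balance"] = current + 1

threads = [threading.Thread(target=deposit, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account["balance"])
```

Because every read-modify-write step happens while the lock is held, no update can overwrite another, which is the effect a DBMS aims for when it locks a data item.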
3) Recovery of Data: In the event a catastrophe occurs, DBMS must provide ways to recover a
database so that data is not permanently lost. There are times when computers may crash, a fire
or other natural disaster may occur, or a user may enter incorrect information invalidating or
making records inconsistent. If the database is destroyed or damaged in any way, the DBMS
must be able to recover the correct state of the database, and this process is called Recovery. The
easiest way to do this is to make regular backups of information. Backups can be made at a set,
scheduled time, so that in the event of a disaster the database can be restored to the state that
it was last in prior to the crash.
4) Data Storage Management: It provides a mechanism for management of permanent storage
of the data. The internal schema defines how the data should be stored by the storage
management mechanism and the storage manager interfaces with the operating system to access
the physical storage.
5) Self- Describing Nature of a Database System: A database system contains not only the
database itself but also a complete definition or description of the database. This definition is
stored in the system catalog. The information stored in the catalog is called meta-data.
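To make the idea of meta-data concrete, the sketch below asks SQLite (through Python's sqlite3 module) to describe a table it holds. SQLite keeps its schema in a catalog table called sqlite_master; other systems name and organize their catalogs differently. The department table is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table; the name and columns are assumptions for this sketch.
conn.execute(
    "CREATE TABLE department (department_id INTEGER PRIMARY KEY, department_name TEXT)"
)

# The catalog stores the definition of every schema object alongside the data.
catalog = conn.execute("SELECT type, name, sql FROM sqlite_master").fetchall()

# Column-level meta-data for one table (second field of each row is the column name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(department)")]

print(catalog)
print(columns)
```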
6) Program- Data Independence: DBMS access programs are written independently of
any specific files. The structure of data files is stored in the DBMS catalog separately from the
access programs.
7) Program- Operation Independence: Users define operations as part of the database
definitions. User application programs can operate on data by invoking these operations through
their names and arguments. Users do not care about how the operation is implemented.
8) Support of Multiple Views: Each user of the database may require a different perspective of
the database. A view may be a subset of the database or it may contain virtual data that is derived
from the database files.
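Both kinds of view described above can be sketched with SQLite through Python's sqlite3 module. The employee table, the view names and the sample salaries are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", 50000.0), (2, "Ravi", 70000.0)],
)

# A view that is a subset of the database: it hides the salary column.
conn.execute("CREATE VIEW employee_directory AS SELECT employee_id, name FROM employee")

# A view containing virtual data that is derived from the stored data.
conn.execute(
    "CREATE VIEW payroll_summary AS "
    "SELECT COUNT(*) AS headcount, AVG(salary) AS average_salary FROM employee"
)

directory = conn.execute("SELECT * FROM employee_directory ORDER BY employee_id").fetchall()
summary = conn.execute("SELECT * FROM payroll_summary").fetchone()
print(directory)
print(summary)
```

Neither view stores data of its own; each is evaluated against the employee table whenever it is queried.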
9) Authorization/ Security Management: The DBMS protects the database against
unauthorized access, either intentional or accidental. It furnishes mechanisms to ensure that only
authorized users can access the database.
10) Database Access and Application Programming Interfaces: All DBMSs provide interfaces
to enable applications to use DBMS services. They provide data access via Structured Query
Language (SQL). The DBMS query language contains two components: (a) a Data Definition
Language (DDL) and (b) a Data Manipulation Language (DML).
11) Concurrency Control Service: Since DBMSs support sharing of data among multiple users,
they must provide a mechanism for managing concurrent access to the database. DBMSs ensure
that the database is kept in a consistent state and that the integrity of the data is preserved.
12) Transaction Management: A transaction is a series of database operations, carried out by a
single user or application program, which accesses or changes the contents of the database.
Therefore, a DBMS must provide a mechanism to ensure either that all the updates
corresponding to a given transaction are made or that none of them is made.
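The all-or-nothing behaviour described above can be sketched with SQLite through Python's sqlite3 module. The account table, the CHECK constraint and the transfer function are hypothetical and exist only for this example; a connection used as a context manager commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates are applied or neither is."""
    try:
        with conn:  # commit on success, roll back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative, so the credit is undone too.
        return False

ok = transfer(conn, 1, 2, 30)
failed = transfer(conn, 1, 2, 500)
balances = conn.execute("SELECT account_id, balance FROM account ORDER BY account_id").fetchall()
print(ok, failed, balances)
```

In the second call the credit to account 2 succeeds first, but it is rolled back once the debit violates the constraint, so the database never shows a half-completed transfer.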
13) Backup and Recovery Management: The DBMS provides mechanisms for backing up data
periodically and recovering from different types of failures. This prevents the loss of data.
2.3.7 Component Modules of DBMS
The DBMS software is partitioned into several modules. Each module or component is assigned
a specific operation to perform. Some of the functions of the DBMS are supported by the operating
system (OS), which provides basic services on top of which the DBMS is built. Figure 2.7 shows the
database system as a complex software system that is partitioned into several software
components handling various tasks such as data definition and manipulation, security and data
integrity, data recovery and concurrency control, and performance optimization, as explained
below:
1) Data Definition: The DBMS provides functions to define the structure of the data. These
functions include defining and modifying the record structure, the data type of fields, and the
various constraints to be satisfied by the data in each field. It is the responsibility of DBA to
define the database, and make changes to its definition (if required) using the DDL and other
privileged commands. The DDL compiler component of DBMS processes these schema
definitions, and stores the schema descriptions in the DBMS catalog (data dictionary). Other
DBMS components then refer to the catalog information as and when required.
2) Data Manipulation: Once the data structure is defined, data needs to be manipulated. The
manipulation of data includes insertion, deletion, and modification of records. The functions that
perform these operations are also part of the DBMS. These functions can handle planned as well
as unplanned data manipulation needs.
I. The queries that are defined as a part of the application programs are known as planned
queries. The application programs are submitted to a pre-compiler, which extracts DML
commands from the application program and sends them to the DML compiler for
compilation. The rest of the program is sent to the host language compiler. The object
codes of both the DML commands and the rest of the program are linked and sent to the
query evaluation engine for execution.
II. The ad hoc queries that are executed as and when the need arises are known as
unplanned queries (interactive queries). These queries are compiled by the query
compiler, and then optimized by the query optimizer. The query optimizer consults the
data dictionary for statistical and other physical information about the stored data. The
optimized query is finally passed to the query evaluation engine for execution. The naive
users of the database can also query and update the database by using some already given
application program interfaces. The object code of these queries is also passed to the query
evaluation engine for processing.
3) Data Security and Integrity: The DBMS contains functions, which handle the security and
integrity of data stored in the database. Since these functions can be easily invoked by the
application, the application programmer need not code these functions in any PL/SQL program.
Figure 2.7: Component Modules of DBMS
4) Concurrency and Data Recovery: The DBMS also contains some functions that deal with
the concurrent access of records by multiple users and the recovery of data after a system failure.
5) Performance Optimization: The DBMS has a set of functions that optimize the performance
of the queries by evaluating the different execution plans of a query and choosing the best among
them.
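One way to observe an optimizer choosing between execution plans is SQLite's EXPLAIN QUERY PLAN statement, used here through Python's sqlite3 module. The student table, the index name idx_student_name and the helper function plan are assumptions made for this sketch; the exact wording of the plan text differs between SQLite versions, so it is printed rather than relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

def plan(sql):
    """Return the textual steps of the execution plan SQLite selects for `sql`."""
    # The last field of each EXPLAIN QUERY PLAN row describes one step of the plan.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT student_id FROM student WHERE name = 'Aryan'"

plan_before = plan(query)  # no index on name is available yet
conn.execute("CREATE INDEX idx_student_name ON student (name)")
plan_after = plan(query)   # the optimizer can now consider the index

print(plan_before)
print(plan_after)
```

The query text is identical in both calls; only the available access paths change, and the optimizer revises its plan accordingly.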
6) Run Time Database Manager: Run time database manager is the central software
component of the DBMS, which interfaces with user-submitted application programs and
queries. It handles database access at run time. It converts operations in users' queries, coming
directly via the query processor or indirectly via an application program, from the user's logical
view to a physical file system. It accepts queries and examines the external and conceptual
schemas to determine what conceptual records are required to satisfy the user's request. It
enforces constraints to maintain the consistency and integrity of the data, as well as its security.
It also performs backup and recovery operations. The run time database manager is sometimes
referred to as the database control system and has the following components:
• Authorization control: The authorization control module checks the authorization of users
in terms of the various privileges granted to them.
• Command processor: The command processor processes the queries passed by the
authorization control module.
• Integrity checker: It checks the integrity constraints so that only valid data can be entered
into the database.
• Query optimizer: The query optimizer determines an optimal strategy for the query
execution.
• Transaction manager: The transaction manager ensures that the transaction properties
are maintained by the system.
• Scheduler: It provides an environment in which multiple users can work on the same piece of
data at the same time; in other words, it supports concurrency.
7) Query processor: The query processor transforms user queries into a series of low level
instructions. It is used to interpret the online user's query and convert it into an efficient series of
operations in a form capable of being sent to the run time data manager for execution. The query
processor uses the data dictionary to find the structure of the relevant portion of the database and
uses this information in modifying the query and preparing an optimal plan to access the
database.
8) Data Manager: The data manager is responsible for the actual handling of data in the
database. It provides recovery support, so that the system is able to recover the data
after a failure. It includes the Recovery manager and the Buffer manager. The buffer manager is
responsible for the transfer of data between the main memory and secondary storage (such as
disk or tape). It is also referred to as the cache manager.
9) Database Engine: The Database Engine is the core service for storing, processing, and
securing data. The Database Engine provides controlled access and rapid transaction processing
to meet the requirements of the most demanding data consuming applications within your
enterprise. Use the Database Engine to create relational databases for online transaction
processing or online analytical processing data. This includes creating tables for storing data, and
database objects such as indexes, views, and stored procedures for viewing, managing, and
securing data.
10) Data dictionary: A data dictionary is a reserved space within a database which is used to
store information about the database itself. A data dictionary is a set of tables and views which
can only be read and never altered. Most data dictionaries contain different information about the
data used in the enterprise. In terms of the database representation of the data, the data dictionary
defines all schema objects including views, tables, clusters, indexes, sequences, synonyms,
procedures, packages, functions, triggers and many more. This ensures that all these objects
follow one standard defined in the dictionary. The data dictionary also records how much space
has been allocated for, and/or is currently in use by, all the schema objects. A data dictionary is
used when finding information about users, objects, schema and storage structures. Every time a
data definition language (DDL) statement is issued, the data dictionary is modified. A
data dictionary may contain information such as:
• Database design information
• Stored SQL procedures
• User permissions
• User statistics
• Database process information
• Database growth statistics
• Database performance statistics
11) Query Processor: A relational database consists of many parts, but at its heart are two major
components: the storage engine and the query processor. The storage engine writes data to and
reads data from the disk. It manages records, controls concurrency, and maintains log files. The
query processor accepts SQL syntax, selects a plan for executing the syntax, and then executes
the chosen plan. The user or program interacts with the query processor, and the query processor
in turn interacts with the storage engine. The query processor isolates the user from the details of
execution: The user specifies the result, and the query processor determines how this result is
obtained. The query processor components include:
• DDL interpreter
• DML compiler
• Query evaluation engine
12) Report writer: Also called a report generator, a report writer is a program, usually part of a
database management system, which extracts information from one or more files and presents the
information in a specified format. Most report writers allow you to select records that meet
certain conditions and to display selected fields in rows and columns. You can also format data
into pie charts, bar charts, and other diagrams. Once you have created a format for a report, you
can save the format specifications in a file and continue reusing it for new data.
In this way, the DBMS provides an environment that is both convenient and efficient to use
when there is a large volume of data and many transactions need to be processed concurrently.
2.4 Summary
In this chapter, a DBMS architecture is presented that cleanly separates three levels, with
mappings between the schemas to transform requests and results from one level to the next. Most
DBMSs do not separate the three levels completely. We used the three-schema architecture to
define the concepts of logical and physical data independence.
Next, the different types of user-friendly interfaces provided by a DBMS, and the users with
which each interface is associated, were described. The main types of languages that a DBMS
supports were also explained, giving a thorough knowledge of the high-level language that can be
used as a standalone language, often called a query language.
Finally, the main functions of a DBMS and its different component modules were explained.
When deciding to implement a DBMS in a business environment, the first and most important
task is to decide what type of DBMS that business actually requires. It is also important to know
how many and which modules are actually required to meet the business needs. Therefore, the
appropriate module or component must be selected to perform the requisite operation.
1. What is the difference between schema and subschema?
2. Outline the three-level schema architecture of DBMS; distinguish each of the levels clearly.
3. What do you mean by mapping? Discuss the different types of mapping in the three-tier
architecture of DBMS.
4. Distinguish between conceptual schema and external schema in DBMS architecture.
5. What do you mean by data independence? Discuss the different types of data
independence.
6. What are database languages? Explain in detail.
7. Define DBMS. Explain the different functions of database management system.
8. Discuss the different component modules of database management system.
Chapter – 3: Entity-Relationship (ER) Modeling

Structure:
3.1 Introduction
3.2 Objective
3.3.1 Entity-Relationship (ER) Model – Concept
(i) Entity
(ii) Attribute
(iii) Relationship Types
(iv) Degree of Relationship
(v) Cardinality of a Relationship
(vi) Representing Relationship Types
(vii) Role Names and Recursive Relationships
3.3.2 Relationship Constraints
(i) Cardinality for Binary Relationship
(ii) Participation Constraints and Existence Dependencies
(iii) Attributes of Relationship Types
3.3.3 Keys
3.3.4 ER Model/ Diagram
3.3.5 Mapping Logical ER-Diagram Models to Relational Tables
3.4 Summary
3.1 Introduction
It is the responsibility of the database administrator (DBA) to perform the logical database design,
assigning the related data items of the database to columns of tables in a manner that preserves
desired properties. The most important test of a logical design is that the tables and attributes
faithfully reflect relationships among objects in the real world, and that this remains true after all
likely future database updates. Databases with different data models have different structures for
representing data; in a relational database, the fundamental structure for representing data is what
we have been calling relational tables.
The DBA starts by studying some real-world enterprise whose operations need to be supported
on a computerized database system. After a great deal of expert examination of the system, the
DBA comes up with a list of data items and underlying data objects that must be kept track of,
along with a number of rules or constraints concerning the interrelationships of these data items.
For all these purposes the DBA uses a data model for representing data items and their
relationships, called the Entity-Relationship (E-R) model.
3.2 Objective
This chapter shows how to view real-world objects as entities, with relationships among
them, using the Entity-Relationship diagram, a basic tool for designing relational databases. The
objective of an entity-relationship diagram is to show the business rules that apply to an
organization's data. It contains entities, which are things of interest to a company, and
relationships, which are associations between entities. It also documents volumetric data, so that
we know what the initial data storage requirements will be, together with the anticipated growth.
To design an E-R diagram, basic data structuring concepts, constraints, relationships, keys,
cardinality ratios etc. are elaborated so that these concepts can be used in designing the
conceptual schema for database applications. This design plan is prepared by a database
developer to implement specific database management software. The model can also be used to
communicate with the end-users.
3.3.1 Entity-Relationship (ER) Model – Concept
When a relational database is to be designed, an Entity-Relationship (ER) diagram is drawn at an
early stage and developed as the requirements of the database and its processing become better
understood. Drawing an entity-relationship diagram aids understanding of an organization's data
needs and can serve as a schema diagram for the required system's database. A schema diagram
is any diagram that attempts to show the structure of the data in a database. Nearly all systems
analysis and design methodologies contain entity-relationship diagramming as an important part
of the methodology and nearly all CASE (Computer Aided Software Engineering) tools contain
the facility for drawing entity-relationship diagrams. An entity-relationship diagram could serve
as the basis for the design of the files in a conventional file-based system as well as for a schema
diagram in a database system.
In 1976, Chen developed the Entity-Relationship (ER) model, defined as "a high-level data
model that is useful in developing a conceptual design for a database". An Entity Relationship
(ER) diagram is an excellent communications tool, which can be used to confirm business
requirements and provide direction to the architecture and design team as they move forward
with physical database design. An ER model generally does the following:
• Confirms business rules;
• Serves as the "target" for data movement mapping and helps ensure no data is
overlooked;
• Provides direction to the architecture and design team to start physical database design;
and
• Helps make important decisions about facts and dimensions required for business
intelligence purposes.
Creation of an ER diagram, which is one of the first steps in designing a database, helps the
designer(s) to understand and to specify the desired components of the database and the
relationships among those components. An ER model is a diagram containing entities or "items",
relationships among them, attributes of the entities and the relationships. These three categories
are considered to be sufficient to model the essentially static database parts of any organization's
information processing needs.
(i) Entity: An entity is an object that exists and which is distinguishable from other objects.
An entity can be a person, a place, an object, an event, or a concept about which an organization
wishes to maintain data.
Example 1: Student, Employee and Department are examples of entities.
It is important to understand the distinction between an entity type, an entity instance, and an
entity set. An entity type defines a collection of entities that have the same attributes. An entity
instance is a single item in this collection. An entity set is a set of entity instances.
Example 2: Let STUDENT be an entity type; a student with ID number 13-PGDCA-1100 is an entity
instance; and a collection of all students is an entity set.
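The distinction drawn in Example 2 can be sketched in Python, treating a class as the entity type, an object as an entity instance and a set of objects as the entity set. The class name Student, the attribute names and the second ID number are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    """Entity type: fixes the attributes that every student entity has."""
    student_id: str
    name: str

# Entity instances: single items of the STUDENT type.
s1 = Student("13-PGDCA-1100", "Aryan")
s2 = Student("13-PGDCA-1101", "Meera")  # hypothetical second student

# Entity set: the collection of all STUDENT instances currently held.
student_set = {s1, s2}

print(len(student_set))
print(all(isinstance(s, Student) for s in student_set))
```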
In the E-R diagram, we assign a name to each entity type. When assigning names to entity types,
we follow certain naming conventions. An entity name should be a concise singular noun that
captures the unique characteristics of the entity type. An E-R diagram depicts an entity type
using a rectangle with the name of the entity inside as shown in Figure 3.1.
Figure 3.1: The Entity Representation in an E-R diagram (rectangles labeled STUDENT, EMPLOYEE and DEPARTMENT).
An entity type may be of two types, i.e. strong entity and weak entity. Entity types that have a key
attribute (primary key) are called strong entity types. A strong entity type is also called a regular
entity type. The entity type STUDENT is a strong entity type, since it has StudentID as a key
attribute, as shown in Example 3. Entity types that do not have any key attribute are called
weak entity types. In Example 3, CLASS is a weak entity, since it does not have any key attribute.
A weak entity type is also called a child entity type or a subordinate entity type. In an E-R
diagram, a strong entity type is shown in a rectangular box, as in Figure 3.2 (a), and a weak entity
type in a double rectangular box, as in Figure 3.2 (b).
Figure 3.2: a) Strong Entity Type (STUDENT) b) Weak Entity Type (Class)
(ii) Attribute: We represent an entity with a set of attributes. An attribute is a property or
characteristic of an entity type that is of interest to an organization. Some attributes of entity
types include the following:
Example 3: STUDENT = {Student Id, Name, Address, PhoneNo, Age, Dateofbirth, Language}
EMPLOYEE = {Employee Id, Employee Name, Employee Age, Employee Salary}
DEPARTMENT = {Department Id, Department Name}
CLASS = {Subject, Department, Section}
A particular value of an attribute, such as 101 for StudentID or Aryan for Name of the Student
entity in Example 3, is called a value of the attribute. Most of the data in a database consists of
values of attributes. The set of all possible values of an attribute is the attribute domain.
Sometimes the value of an attribute is unknown or missing, and sometimes a value is not
applicable. In such cases, the attribute can have the special value as null.
Following conventions are used while naming attributes:
1. Each word in a name starts with an uppercase letter followed by lower case letters.
2. If an attribute name contains two or more words, the first letter of each subsequent word is
also in uppercase, unless it is an article or preposition, such as "a," "the," "of," or "about" etc.
E-R diagrams depict an attribute inside an ellipse/oval and connect the ellipse/oval with a line to
the associated entity type. Figure 3.3 illustrates an E-R diagram of Student entity with some of
the possible attributes.
Note that the attributes shown in Figure 3.3 are of several different types, each of which uses a
different notation. These include simple, composite, single-valued, multi-valued, stored, derived
and key attributes. In the upcoming subsections, we discuss the
distinctions between these types of attributes.
a) Simple and Composite Attributes: A simple or an atomic attribute, such as PhoneNo,
cannot be further divided into smaller components. A composite attribute, however, can be
divided into smaller subparts in which each subpart represents an independent attribute. Name in
this case is a composite attribute, since it can be further divided into smaller subparts. Similarly,
Address can also be a composite attribute. All other attributes, even those that are subcategories of
Name and Address, are simple attributes. Figure 3.3 presents the notation that depicts a composite
attribute. Simple and composite attributes are denoted by an oval/ellipse in an E-R diagram.
b) Single-Valued and Multi-Valued Attributes: Most attributes have a single value for an
entity instance; such attributes are called single-valued attributes. A multi-valued attribute, on the
other hand, may have more than one value for an entity instance.
Figure 3.3: Different Types of Attribute in E-R Diagram (the Student entity with attributes StudentID; Name, composed of First Name, Middle Name and Last Name; PhoneNo; Languages; Dateofbirth; and Age)
Example 4: Figure 3.3 shows that Languages is an attribute of the STUDENT entity type.
Languages is a multi-valued attribute, because it stores the names of the languages that a
student speaks, and a student may speak several languages.
An attribute like StudentID of the STUDENT entity type is a single-valued attribute, because a
student has only one StudentID. In the E-R diagram, we denote a multi-valued attribute with a
double-lined ellipse/oval, regardless of the number of values.
Note: The STUDENT entity is a strong entity, since it has StudentID as its key attribute.
c) Stored and Derived Attributes: The value of a derived attribute can be determined by
analyzing other attributes. In Figure 3.3 Age is a derived attribute and DateofBirth is a stored
attribute of STUDENT entity type. The value of Age attribute can be derived from the current
date and the attribute DateofBirth. An attribute whose value cannot be derived from the values of
other attributes is called a stored attribute. The derived attribute Age is not stored in the database.
Derived attributes are depicted in the E-R diagram with a dashed (dotted) ellipse/ oval.
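The derivation of Age from DateofBirth and the current date can be sketched as follows. The class and method names are assumptions made for the example, and the rule used (completed years) is one common convention rather than something the text prescribes.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Student:
    student_id: int
    date_of_birth: date  # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: completed years between date_of_birth and `today`."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

student = Student(101, date(2000, 6, 15))
print(student.age(date(2024, 6, 14)))  # the day before the 24th birthday
print(student.age(date(2024, 6, 15)))  # on the 24th birthday
```

Because Age is computed on demand, it can never disagree with DateofBirth, which is the reason for not storing it.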
d) Key Attribute: A key attribute (or identifier) is a single attribute or a combination of
attributes that uniquely identify an individual instance of an entity type. No two instances within
an entity set can have the same key attribute value. For the STUDENT entity shown in Figure
3.3, StudentID is the key attribute since each student identification number is unique. Name, by
contrast, cannot be an identifier because two students can have the same name. We underline key
attributes in an E-R diagram.
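The uniqueness requirement on a key attribute can be checked against a given entity set with a short Python function. The function name is_key and the sample data are assumptions made for this sketch. Note that passing the check only shows that the attribute is unique in the data at hand; whether it is a key for the entity type is a design decision.

```python
def is_key(instances, attributes):
    """Return True if no two instances share the same values for `attributes`."""
    seen = set()
    for instance in instances:
        value = tuple(instance[a] for a in attributes)
        if value in seen:
            return False  # two instances carry the same candidate key value
        seen.add(value)
    return True

# Hypothetical STUDENT entity set: two different students share one name.
students = [
    {"StudentID": 101, "Name": "Aryan"},
    {"StudentID": 102, "Name": "Aryan"},
]

print(is_key(students, ["StudentID"]))
print(is_key(students, ["Name"]))
```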
(iii) Relationship Types: The first two major elements of entity-relationship diagrams are
entity types and attributes. The final element is the relationship type. Sometimes, the word 'types'
is dropped and relationship types are called simply 'relationships' but since there is a difference
between the terms, one should really use the term relationship type.
Real-world entities have relationships between them, and relationships between entities on the
entity-relationship diagram are shown where appropriate. An entity-relationship diagram consists
of a network of entity types and connecting relationship types. A relationship type is a named
association between entities. Individual entities have individual relationships of the type between
them. An individual person (entity) occupies (relationship) an individual house (entity). In an
entity-relationship diagram, this is generalized into entity types and relationship types. The entity
type PERSON is related to the entity type HOUSE by the relationship type OCCUPIES. There
are lots of individual persons, lots of individual houses, and lots of individual relationships
linking them.
There can be more than one type of relationship between entities. Entities in an organization do
not exist in isolation but are related to each other. Students take courses and each STUDENT
entity is related to the COURSE entity. Faculty members teach courses and each FACULTY
entity is also related to the COURSE entity. Consequently, the STUDENT entity is related to the
FACULTY entity through the COURSE entity. E-R diagrams can also illustrate relationships
between entities. Therefore, we define a relationship as an association among several entities. A
relationship set is a grouping of all matching relationship instances, and the term relationship
type refers to the relationship between entity types.
Figure 3.4: The relationship between FACULTY and COURSE entities in an E-R diagram (Faculty, Teaches, Course).
In an E-R diagram, relationship types are represented with diamond-shaped boxes connected by
straight lines to the rectangles that represent participating entity types. A relationship type is
given a name that is displayed in this diamond-shaped box and typically takes the form of a
present tense verb or verb phrase that describes the relationship. An E-R diagram may depict a
relationship as shown in Figure 3.4 between the entities FACULTY and COURSE.
(iv) Degree of Relationship: The number of entity sets that participate in a relationship is
called the degree of relationship.
Example 5: The degree of the relationship featured in Figure 3.4 is two because FACULTY and
COURSE are two separate entity types that participate in the relationship. The three most
common degrees of a relationship in a database are unary (degree 1), binary (degree 2), and
ternary (degree 3).
Let E1, E2, . . . ,En denote n entity sets and let R be the relationship. The degree of the
relationship can also be expressed as follows:
a) Unary Relationship: A unary relationship R is an association between two instances of
the same entity type.
Example 6: Suppose two students are roommates and stay together in a hostel. Because they share
the same address, a unary relationship exists between them for the attribute Address in Figure 3.3.
b) Binary Relationship: A binary relationship R is an association between two instances of
two different entity types.
Example 7: In a University, a binary relationship exists between a student (STUDENT entity)
and an instructor (FACULTY entity) of a single class; an instructor teaches a student.
c) Ternary Relationship: A ternary relationship R is an association between three instances
of three different entity types.
Example 8: Consider a student using certain equipment for a project. In this case, the
STUDENT, PROJECT, and EQUIPMENT entity types relate to each other with ternary
relationships: a student checks out equipment for a project.
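The three degrees can be sketched in Python by recording which entity types take part in each relationship type and counting the distinct ones. The relationship names ROOMMATE_OF and CHECKS_OUT and the dictionary layout are assumptions made for illustration.

```python
def degree(relationship_type):
    """Number of distinct entity types that participate in the relationship type."""
    return len(set(relationship_type["participants"]))

# A unary relationship links two instances of the same entity type.
roommate_of = {"name": "ROOMMATE_OF", "participants": ("STUDENT", "STUDENT")}
# A binary relationship links instances of two different entity types.
teaches = {"name": "TEACHES", "participants": ("FACULTY", "COURSE")}
# A ternary relationship links instances of three different entity types.
checks_out = {"name": "CHECKS_OUT", "participants": ("STUDENT", "PROJECT", "EQUIPMENT")}

for rel in (roommate_of, teaches, checks_out):
    print(rel["name"], degree(rel))
```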
(v) Cardinality of a Relationship: The term cardinal number refers to the number used in
counting. When we say cardinality of a relationship, we mean the ability to count the number of
entities involved in that relationship.
Example 9: If the entity types A and B are connected by a relationship, then the maximum
cardinality represents the maximum number of instances of entity B that can be associated with
any instance of entity A.
However, we don't need to assign a number value for every level of connection in a relationship.
In fact, the term maximum cardinality refers to only two possible values: one or many. While
this may seem to be too simple, the division between one and many allows us to categorize all of
the permutations possible in any relationship. The maximum cardinality value of a relationship,
then, allows us to define the four types of relationships possible between entity types A and B.
a) One-to-One Relationship: In a one-to-one relationship, at most one instance of entity B
can be associated with a given instance of entity A and vice versa.
b) One-to-Many Relationship: In a one-to-many relationship, many instances of entity B
can be associated with a given instance of entity A. However, only one instance of entity A can
be associated with a given instance of entity B.
Example 10: While a customer of a company can make many orders, an order can only be
related to a single customer.
c) Many-to-One Relationship: In a many-to-one relationship, many instances of entity A
can be associated with a given instance of entity B. However, only one instance of entity B can
be associated with a given instance of entity A.
Example 11: Many students of a class are taught by a faculty.
d) Many-to-Many Relationship: In a many-to-many relationship, many instances of entity
A can be associated with a given instance of entity B, and, likewise, many instances of entity B
can be associated with a given instance of entity A.
Example 12: A machine may have different parts, while each individual part may be used in
different machines.
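The one-to-many and many-to-many cases of Examples 10 and 12 can be sketched in code. The following is a minimal illustration using Python's built-in sqlite3 module as a stand-in for a DBMS; the table names, column names and sample rows are hypothetical and are not taken from the text.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INT  NOT NULL PRIMARY KEY,
    name        TEXT NOT NULL
);
-- One-to-many: each order row points at exactly one customer,
-- while one customer may be referenced by many order rows.
CREATE TABLE customer_order (
    order_id    INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE machine (machine_id INT NOT NULL PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE part    (part_id    INT NOT NULL PRIMARY KEY, name TEXT NOT NULL);
-- Many-to-many: a linking table holds one row per (machine, part) pair.
CREATE TABLE machine_part (
    machine_id INT NOT NULL REFERENCES machine(machine_id),
    part_id    INT NOT NULL REFERENCES part(part_id),
    PRIMARY KEY (machine_id, part_id)
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
conn.executemany("INSERT INTO machine VALUES (?, ?)", [(1, "Lathe"), (2, "Drill")])
conn.executemany("INSERT INTO part VALUES (?, ?)", [(1, "Motor"), (2, "Belt")])
conn.executemany("INSERT INTO machine_part VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# Many orders per customer, but only one customer per order.
orders_per_customer = dict(conn.execute(
    "SELECT customer_id, COUNT(*) FROM customer_order GROUP BY customer_id"))
# A part is used in many machines, and a machine has many parts.
machines_using_motor = conn.execute(
    "SELECT COUNT(*) FROM machine_part WHERE part_id = 1").fetchone()[0]
parts_in_lathe = conn.execute(
    "SELECT COUNT(*) FROM machine_part WHERE machine_id = 1").fetchone()[0]
```

A many-to-one relationship is the same structure as one-to-many read from the other side, so it needs no separate table layout.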
(vi) Representing Relationship Types: Figure 3.5 displays how we represent different
relationship types in an E-R diagram. An entity on the one side of the relationship is represented
by a vertical line, "I", which intersects the line connecting the entity and the relationship. Entities
on the many side of a relationship are designated by a crowfoot as depicted in Figure 3.5.
(vii) Role Names and Recursive Relationships: Each entity type in a relationship plays a
particular role. The role name specifies the role that a participating entity type plays in the
relationship and explains what the relationship means. For example, in the relationship between
Employee and Department, the Employee entity type plays the employee role, and the
Department entity type plays the department or employer role. In most cases the role names do
not have to be specified, but they are necessary in cases where the same entity type participates
more than once in a relationship type in different roles.
Example 13: Suppose there are two entity types, MANAGER and ORGANIZATION, and the
relationship name is manages. MANAGER plays the role of worker (employee) and
ORGANIZATION plays the role of owner (employer). Further, the employee manages the
assignments for an employer.
In a recursive relationship, the same entity type participates more than once in a relationship
type, in different roles.
Example 14: In the Company schema, each employee has a supervisor, so we need to include the
relationship "Supervises". However, a supervisor is also an employee. The Employee
entity type therefore participates twice in the relationship, once as an employee and once as a supervisor,
and we can specify two roles, employee and supervisor, as shown in Figure 3.6.
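[Figure 3.5 shows four E-R diagrams, each connecting entity types A and B through a relationship R
and labelled with the maximum cardinality on each side: One-to-One (1, 1), One-to-Many (1, M),
Many-to-One (M, 1) and Many-to-Many (M, M).]
Figure 3.5: The relationship types based on maximum cardinality.
[Figure 3.6 shows the Employee entity type connected twice to the Supervises relationship, once in
the role Supervisor and once in the role Supervisee.]
Figure 3.6: Recursive relationship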
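A recursive relationship such as Supervises is commonly stored as a column that refers back to the same table. The sketch below assumes a hypothetical employee table in Python's built-in sqlite3 module; the column names and sample rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE employee (
        emp_id        INT  NOT NULL PRIMARY KEY,
        name          TEXT NOT NULL,
        -- Refers back to the same table: the supervisor is also an employee.
        -- NULL marks an employee who has no supervisor.
        supervisor_id INT REFERENCES employee(emp_id)
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Meera", None), (2, "Karan", 1), (3, "Divya", 1)],  # hypothetical rows
)

# The same table is joined to itself under two role names:
# 'sup' plays the supervisor role, 'sub' plays the supervisee role.
pairs = conn.execute("""
    SELECT sup.name, sub.name
    FROM employee AS sub
    JOIN employee AS sup ON sub.supervisor_id = sup.emp_id
    ORDER BY sub.emp_id
""").fetchall()
```

The two table aliases in the query correspond to the two role names of the relationship.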
3.3.2 Relationship Constraints
Relationship types have certain constraints that limit the possible combination of entities that
may participate in relationship.
Example 15: An example of a constraint is that if we have the entities Doctor and Patient, the
organization may have a rule that a patient cannot be seen by more than one doctor. This
constraint needs to be described in the schema. There are two main types of relationship
constraints: cardinality ratio and participation.
(i) Cardinality for Binary Relationship
Binary relationships are relationships between exactly two entities. The cardinality ratio specifies
the maximum number of relationship instances that an entity can participate in. The possible
cardinality ratios for binary relationship types are 1:1, 1:N, N:1 and M:N. Cardinality ratios are
shown on ER diagrams by displaying 1, M and N on the diamond box. The ratio shown closest
to an entity represents the ratio the other entity has to that entity.
(ii) Participation Constraints and Existence Dependencies
The participation constraint specifies whether the existence of an entity depends on its being
related to another entity via the relationship type. The constraint specifies the minimum number
of relationship instances that each entity can participate in. There are two types of participation
constraints:
a) Total Participation:
• If an entity can exist only if it participates in at least one relationship instance, the
participation is called total participation, meaning that every entity in one set must be
related to at least one entity in a designated entity set.
• An example would be the Employee and Department relationship. If company policy
states that every employee must work for a department, then an employee can exist only
if it participates in at least one relationship instance (i.e. an employee can't exist without
a department).
• It is also sometimes called an existence dependency.
• Total participation is represented by a double line, going from the relationship to the
dependent entity.
b) Partial Participation:
• If only a part of the set of entities participates in a relationship, then it is called partial
participation.
• Using the Company example, not every employee will be a manager of a department, so
the participation of an employee in the "Manages" relationship is partial.
• Partial participation is represented by a single line.
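One common way to express these two constraints in a relational table is through nullability: a mandatory (NOT NULL) reference for total participation and an optional reference for partial participation. The sketch below assumes hypothetical employee and department tables in Python's built-in sqlite3 module and is only one possible encoding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id    INT  NOT NULL PRIMARY KEY,
    name       TEXT NOT NULL,
    -- Partial participation of employees in Manages: only some
    -- employees ever appear here, and the column may be NULL.
    manager_id INT REFERENCES employee(emp_id)
);
CREATE TABLE employee (
    emp_id  INT  NOT NULL PRIMARY KEY,
    name    TEXT NOT NULL,
    -- Total participation in Works_For: every employee row must
    -- reference a department, so the column is NOT NULL.
    dept_id INT NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Sales', NULL);
INSERT INTO employee VALUES (1, 'Anil', 1);
INSERT INTO employee VALUES (2, 'Bina', 1);
UPDATE department SET manager_id = 1 WHERE dept_id = 1;
""")

# An employee without a department violates total participation.
try:
    conn.execute("INSERT INTO employee VALUES (3, 'Chetan', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
manager_count = conn.execute(
    "SELECT COUNT(*) FROM employee "
    "WHERE emp_id IN (SELECT manager_id FROM department)").fetchone()[0]
```

Here both employees work for a department, but only one of them manages one, which is the difference between total and partial participation.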
(iii) Attributes of Relationship Types
• Relationships can have attributes similar to entity types.
• For example, in the relationship Works_On, between the Employee entity and the
Project entity, we would like to keep track of the number of hours an employee
works on a project. Therefore we can include Number of Hours as an attribute of the
relationship.
• Another example is the "Manages" relationship between Employee and Department:
we can add Start Date as an attribute of the Manages relationship.
• For some relationships (1:1 or 1:N), the attribute can be placed on one of the
participating entity types. For example, since the "Manages" relationship is 1:1, StartDate can
be migrated to either Employee or Department.
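A relationship attribute such as Number of Hours can be sketched as a column of the table that stores the relationship. The example below uses Python's built-in sqlite3 module and assumes that Works_On links an employee to a project; the table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (emp_id  INT NOT NULL PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE project  (proj_id INT NOT NULL PRIMARY KEY, title TEXT NOT NULL);
-- The relationship has its own table, so its attribute (hours)
-- belongs to the pair (employee, project), not to either entity alone.
CREATE TABLE works_on (
    emp_id  INT  NOT NULL REFERENCES employee(emp_id),
    proj_id INT  NOT NULL REFERENCES project(proj_id),
    hours   REAL NOT NULL,
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Anil'), (2, 'Bina');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
INSERT INTO works_on VALUES (1, 10, 10.0), (1, 20, 5.5), (2, 10, 8.0);
""")

# Total hours per employee across all projects.
hours_by_employee = dict(conn.execute("""
    SELECT e.name, SUM(w.hours)
    FROM works_on AS w
    JOIN employee AS e ON e.emp_id = w.emp_id
    GROUP BY e.name
"""))
```

For a 1:1 or 1:N relationship, the same attribute could instead be stored as a column of one of the participating tables.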
3.3.3 Keys
Keys are, as their name suggests, a key part of a relational database and a vital part of the
structure of a table. They ensure each record within a table can be uniquely identified by one or a
combination of fields within the table. They help enforce integrity and help identify the
relationship between tables. There are three main types of keys: candidate keys, primary keys
and foreign keys. In addition, there are super keys, alternate (or secondary) keys, which are the
candidate keys not chosen as the primary key, and composite keys, as explained
below:
i) Candidate Key: A candidate key is any set of one or more columns whose combined
values are unique among all occurrences and which cannot be reduced further. Since a null
value is not guaranteed to be unique, no component of a candidate key is allowed to be null. There
can be any number of candidate keys in a table. Two properties must be satisfied by a candidate
key:
Uniqueness: No two rows of the relation may have the same value for the key.
Irreducibility: No proper subset of the attributes forming the key has the uniqueness property.
Table 3.1: Student Relation (StudentID, FirstName, LastName, Class, Marks)
StudentID (Candidate Key)   First Name   Last Name   Class   Marks
100                         Arpit        Garg        PGDCA   2800
101                         Satvik       Juneja      PGDCA   2900
102                         Siddhant     Luthra      PGDCA   2850
103                         Aryan        Goel        PGDCA   2875
Example 16: In Table 3.1, student_id uniquely identifies
the students in the student table, so it is a candidate key. In the same table, the
student's first name and last name, when combined, might also uniquely identify the
student. Both would then be candidate keys.
To be eligible as a candidate key, a set of fields must pass certain criteria:
• It must contain unique values.
• It must not contain null values.
• It must contain the minimum number of fields needed to ensure uniqueness.
• It must uniquely identify each record in the table.
Once your candidate keys have been identified, you can select one to be your primary key.
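The criteria above can be tried out against the rows of Table 3.1. The following Python sketch is only an illustration: the helper names is_unique and is_candidate_key are hypothetical, and a check over sample rows can refute a proposed key but cannot prove one, because a key is a rule about all possible rows.

```python
from itertools import combinations

COLUMNS = ("StudentID", "FirstName", "LastName", "Class", "Marks")
ROWS = [  # rows of Table 3.1
    (100, "Arpit", "Garg", "PGDCA", 2800),
    (101, "Satvik", "Juneja", "PGDCA", 2900),
    (102, "Siddhant", "Luthra", "PGDCA", 2850),
    (103, "Aryan", "Goel", "PGDCA", 2875),
]


def is_unique(attrs):
    """True if the chosen columns hold no nulls and no duplicate values."""
    idx = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    no_nulls = all(v is not None for values in projected for v in values)
    return no_nulls and len(set(projected)) == len(ROWS)


def is_candidate_key(attrs):
    """Unique, and no proper subset of the columns is already unique."""
    if not attrs or not is_unique(attrs):
        return False
    return not any(
        is_unique(subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


student_id_ok = is_candidate_key(("StudentID",))            # unique and minimal
reducible = is_candidate_key(("StudentID", "Class"))        # unique but reducible
not_unique = is_candidate_key(("Class",))                   # repeated values
```

In these four rows FirstName alone happens to be unique, so the sample data would not flag it; whether an attribute set really is a key has to come from the rules of the organization, not from a small sample.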
ii) Super Key: A super key is a combination of attributes that can uniquely identify a
database record. A table might have many super keys. Candidate keys are a special subset of
super keys that do not have any extraneous information in them. In other words, if we add another
attribute to a candidate key, the combination still satisfies the uniqueness property and
is known as a super key. The main properties are as follows:
Uniqueness: It must uniquely identify the rows of a relation.
Irreducibility: It may or may not be irreducible.
Example 17: In Table 3.1, the attribute student_id acts as the candidate key, and it is also a
super key because it satisfies the uniqueness property. If we add other attributes to that key, say
FirstName, LastName, Class or Marks, the combination still satisfies the uniqueness property, so it is
also a super key.
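The relation between super keys and candidate keys can be made concrete by enumeration. The Python sketch below assumes, following Example 16, that the Student relation has two candidate keys, StudentID and the pair (FirstName, LastName); the function name is_super_key is hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ("StudentID", "FirstName", "LastName", "Class", "Marks")
# Assumed candidate keys of the Student relation (see Example 16).
CANDIDATE_KEYS = [{"StudentID"}, {"FirstName", "LastName"}]


def is_super_key(attrs):
    """A set of attributes is a super key if it contains some candidate key."""
    chosen = set(attrs)
    return any(key <= chosen for key in CANDIDATE_KEYS)


# Every non-empty attribute combination that is a super key.
super_keys = [
    combo
    for size in range(1, len(ATTRIBUTES) + 1)
    for combo in combinations(ATTRIBUTES, size)
    if is_super_key(combo)
]

# The irreducible super keys are exactly the candidate keys.
minimal = [
    key for key in super_keys
    if not any(set(other) < set(key) for other in super_keys)
]
```

Under these assumptions there are 20 super keys, and filtering out every super key that properly contains another one leaves only the two candidate keys.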
iii) Primary Key (PK): A primary key is used to uniquely identify rows in a relational
database design. It usually comprises a single table column, but may consist of multiple
columns as well. It is possible for a table to have more than one column with unique values;
however, only one primary key can be defined. Each column with distinct values is called a
unique key. If we have more than one candidate key in our relation, we choose one of the
candidate keys as the primary key. Primary keys can be defined at the time of table creation or can be
added after the table has been created. The following points should be kept in mind while choosing a
primary key from the candidate keys:
a) No rows can have an empty value (called the null) in the primary key column.
b) The value of the primary key attribute must not be duplicated in any tuple/ row.
c) The primary key should be composed of the minimum number of attributes that satisfies the
condition of unique occurrence.
d) The value of the primary key should remain the same during the lifetime of the relation.
Table 3.2: Student Relation presenting Primary Key
Roll (Primary Key)   Name        Class   Marks   Remark
100                  Arpit       PGDCA   2800
101                  Satvik      PGDCA   2900
102                  Siddhant    PGDCA   2850
103                  Aryan       PGDCA   2875
103                  Mukta       PGDCA   2870    Invalid Entry
Null                 Prerna      PGDCA   2829    Invalid Entry
Null                 Prem Lata   M.Sc    3500    Invalid Entry
Example 18: In Table 3.2, row no. 5 is invalid because Roll 103 is a duplicate: the Name
values Aryan and Mukta share the same Roll, which violates the uniqueness property of a
primary key. Further, from row no. 6 onwards the Roll is Null. These are also invalid
entries, since a primary key does not allow null values.
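The invalid entries of Table 3.2 can be reproduced by letting a DBMS enforce the primary key. The sketch below uses Python's built-in sqlite3 module as a stand-in; the table and column names are hypothetical, and NOT NULL is declared explicitly because SQLite does not by itself reject nulls in every kind of primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll       INT  NOT NULL PRIMARY KEY,  -- no nulls, no duplicates
        name       TEXT NOT NULL,
        class_name TEXT NOT NULL,
        marks      INT  NOT NULL
    )
""")

rows = [  # rows of Table 3.2, in order
    (100, "Arpit", "PGDCA", 2800),
    (101, "Satvik", "PGDCA", 2900),
    (102, "Siddhant", "PGDCA", 2850),
    (103, "Aryan", "PGDCA", 2875),
    (103, "Mukta", "PGDCA", 2870),      # duplicate Roll
    (None, "Prerna", "PGDCA", 2829),    # null Roll
    (None, "Prem Lata", "M.Sc", 3500),  # null Roll
]

rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        # Raised for both the duplicate value and the null value.
        rejected.append(row[1])

accepted = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The first four rows are stored, and the rows marked invalid in the table are the ones the constraint refuses.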
iv) Foreign key: A foreign key (FK) is a field or group of fields in a database record that
points to a key field or group of fields forming a key of another database record in some (usually
different) table. Usually a foreign key in one table refers to the primary key (PK) of another
table. This way references can be made to link information together and it is an essential part of
database normalization.
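How a foreign key links two tables can be sketched as follows, again with Python's built-in sqlite3 module standing in for the DBMS. The programme and student tables, their columns and the helper attempt are hypothetical; note that SQLite checks foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
CREATE TABLE programme (
    code  TEXT NOT NULL PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE student (
    student_id INT  NOT NULL PRIMARY KEY,
    first_name TEXT NOT NULL,
    -- Foreign key: must match the primary key of some programme row.
    programme  TEXT NOT NULL REFERENCES programme(code)
);
INSERT INTO programme VALUES ('PGDCA', 'PGDCA programme');
INSERT INTO student VALUES (100, 'Arpit', 'PGDCA');
""")


def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Refers to a programme that does not exist.
orphan_insert = attempt("INSERT INTO student VALUES (101, 'Satvik', 'MCA')")
# Would leave student 100 pointing at a missing programme.
parent_delete = attempt("DELETE FROM programme WHERE code = 'PGDCA'")
# Refers to an existing programme.
valid_insert = attempt("INSERT INTO student VALUES (102, 'Siddhant', 'PGDCA')")
```

Both rejected statements would have broken the link between the two tables, which is the referential integrity that a foreign key protects.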
v) Alternate Key: An alternate key or secondary key is a candidate key which is not
selected to be the primary key. In a relation there may be a number of attributes or attribute sets
which uniquely identify the rows of a table. These are called candidate keys. Out of these candidate
keys, one is selected as the primary key of the relation, and the remaining candidate keys
are known as alternate keys.
vi) Composite Key: A composite key is a key that consists of two or more attributes that together
uniquely identify the rows of a relation. Composite keys are also known as compound, concatenated or
aggregate keys. A composite key must be irreducible, and it cannot contain null values.
3.3.4 ER Model/ Diagram
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 as a way to unify
the network and relational database views. Simply stated, the ER model is a conceptual data
model that views the real world as entities and relationships. A basic component of the model is
the Entity-Relationship diagram, which is used to visually represent data objects. For the
database designer, the utility of the ER diagram is:
• It maps well to the relational model. The constructs used in the ER model can easily be
transformed into relational tables.
• It is simple and easy to understand with a minimum of training. Therefore, the model can
be used by the database designer to communicate the design to the end user.
• In addition, the model can be used as a design plan by the database developer to
implement a data model in specific database management software.
There are two techniques used for designing a database from the system
requirements. These are:
• Top-down approach, known as Entity-Relationship Modeling
• Bottom-up approach, known as Normalization
An entity-relationship (ER) diagram is a top-down approach to designing a database. It is a
specialized graphic technique that illustrates the interrelationships between entities in a database.
ER diagrams often use symbols to represent three different types of information. Boxes are
commonly used to represent entities, diamonds are used to represent relationships, and ovals
(ellipses) are used to represent attributes. E-R models are designed diagrammatically using
Entity-Relationship diagrams, which represent the elements of the conceptual model. The overall
logical structure of a database can be expressed graphically by an E-R diagram. Table 3.3 shows
the summary of ER diagram notation.
Table 3.3: Summary of the ER Diagram Notation
[The Notation column of this table contains the graphical symbol for each of the meanings below.]
Meaning
Entity type
Attribute
Key attribute
Derived attribute
Multivalued attribute
Composite attribute
Relationship type
Total participation
Many-to-one relationship
Weak Entity Type
Following are advantages of an E-R Model:
1. Visual Representation: The foremost benefit of an ERD is that it provides a
visual representation of the design. An ERD is normally crucial if you are looking to
come up with an effective database design, because the patterns assist the designer in
focusing on the way the database will primarily work with all the data flows and interactions. It
is common for the ERD to be used together with data flow diagrams so as to attain a better visual
representation.
2. Effective Communication: An ERD clearly communicates the key entities in a certain
database and their relationship with each other. ERD normally uses symbols for representing
three varying kinds of information. Diamonds are used for representing the relationships, ovals
are usually used for representing attributes and boxes represent the entities. This allows a
designer to effectively communicate what exactly the database will be like.
3. Simple to Understand: An ERD is easy to understand and simple to create. In effect, the design
can be shown to the representatives for both approval and confirmation. The
representatives can also make their contributions to the design, allowing the possibility of
rectifying and enhancing the design.
4. High Flexibility: The ERD model is quite flexible to use as other relationships can be derived
easily from the already existing ones. This can be done using other relational tables and
mathematical formulae.
Following are disadvantages of an E-R Model:
1. No Industry Standard for Notation: There is no industry standard notation for developing
an E-R diagram.
2. Popular for High-Level Design: The E-R data model is especially popular for high-level design.
3.3.5 Mapping Logical ER-Diagram Models to Relational Tables
For each entity set and relationship set, there is a unique table which is assigned the name of the
corresponding set. Each table has a number of columns with unique names.
Step 1: For each regular entity type E in the ER schema, create a relation R that includes all the
simple attributes, and the component attributes of composite attributes. Select the
primary key.
Step 2: For each weak entity type W in the ER schema, with owner entity type E, create a relation
R and include all simple attributes of W as attributes of R. In addition, include the
primary key attributes of the relation Q for the owner entity type E. The primary key of R is
the combination of the primary key of Q and the partial key of W.
Step 3: For 1:1 relationship X, suppose S and T are the relations for the entity types
participating in it. Include primary key of T as foreign key of S. Include other
attributes of relationship X as attribute of S.
Step 4: For 1: N relationship Y, suppose S relation corresponds to the entity type at the N-
side and T relation corresponds to the entity type at the other side. Include
primary key of T as foreign key of S.
[Figure 3.7 shows the entity type CUSTOMER (Customer_Id, Customer_name, Customer_city) and the
entity type BRANCH (Branch_Id, Branch_name, Branch_city) connected by the relationship Account,
which has the attributes Account_no and Balance.]
Figure 3.7: An ER Diagram to represent relationship between customer and branch.
Step 5: For M: N relationship Z, create a new relation R to represent Z. Include simple
attributes of Z in R. Include the primary keys of S and T as foreign keys of R,
their combination forms the primary key of R.
Step 6: For multi-valued attributes A, create a new relation R that includes an attribute
corresponding to A. Include primary key of the relation of the entity type having
A as an attribute. Primary key is their combination.
Step 7: For n-ary relationship type X, and n>2, create a new relation R, include primary
key of each participating entity type‘s relation as foreign key of R. Include
attribute of X as simple attributes of R.
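As a rough illustration of Steps 1 and 5, the ER diagram of Figure 3.7 can be mapped to three tables. The sketch below uses Python's built-in sqlite3 module and assumes that the Account relationship between customer and branch is M:N, which the figure itself does not state; the data types are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Step 1: one relation per regular entity type, with its primary key.
CREATE TABLE customer (
    customer_id   INT  NOT NULL PRIMARY KEY,
    customer_name TEXT NOT NULL,
    customer_city TEXT
);
CREATE TABLE branch (
    branch_id   INT  NOT NULL PRIMARY KEY,
    branch_name TEXT NOT NULL,
    branch_city TEXT
);
-- Step 5 (assuming Account is M:N): the relationship becomes its own
-- relation, holding its simple attributes plus the primary keys of both
-- participating relations as foreign keys; together they form the key.
CREATE TABLE account (
    customer_id INT     NOT NULL REFERENCES customer(customer_id),
    branch_id   INT     NOT NULL REFERENCES branch(branch_id),
    account_no  TEXT    NOT NULL,
    balance     NUMERIC NOT NULL,
    PRIMARY KEY (customer_id, branch_id)
);
""")

# Inspect the result: which columns form the key, which tables are referenced.
pk_columns = [row[1] for row in conn.execute("PRAGMA table_info(account)")
              if row[5] > 0]
referenced = sorted({row[2] for row in
                     conn.execute("PRAGMA foreign_key_list(account)")})
```

Following Step 5 literally means a customer can hold only one account at a given branch; a design that needs several would have to include Account_no in the key, which is a choice outside the steps listed here.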
3.4 Summary
The Entity-Relationship (E-R) model is a high-level conceptual data model developed by Chen in
1976 to facilitate database design. In this chapter, we gave an overview of E-R
modeling. The different types of entities, attributes and relationships among them were elaborated.
We also illustrated the concept of keys, which is very important in the E-R design process, and
discussed how to construct an E-R diagram and how to map an E-R model into
relational tables.
1. What is E-R modeling? What are the components of E-R model? Discuss.
2. Differentiate between entity, entity type and entity set. Explain the different type of
entities.
3. What do you mean by attribute? Explore the different type of attributes with examples.
4. Outline the different notations and naming conventions used to represent an E-R diagram.
5. What do you mean by relationship? Discuss the degree of relationship. Explain
cardinality ratios of a relationship.
6. Distinguish between
a) Composite attributes and Atomic attributes
b) Single-Valued and Multi-Valued Attributes
Chapter – 4: Database Design: Case Studies

Structure:
4.1 Introduction
4.2 Objective
4.3.1 Database Design Process
4.3.2 Case Studies
(i) Draw an E-R diagram of Inventory System
(ii) Draw an E-R diagram of Payroll System
(iii) Draw an E-R diagram of Reservation System
(iv) Draw an E-R diagram of Online Book Store
4.3.3 Some Other Specimen ER Diagram
4.3.4 Benefits of ER Diagram
4.4 Summary
4.1 Introduction
The performance of a DBMS is the ultimate measure of the database design and of the database
designer's skill. A DBA can improve performance by adjusting some DBMS parameters, such as the size
of the buffer pool or the frequency of checkpoints. The overall database design activity has to
follow a systematic process called the design methodology. After the
conceptual and external schemas have been designed, that is, created as a collection of relations and
views along with a set of integrity constraints, we must address performance goals through physical
database design, in which we design the physical schema. It is usually necessary to tune the design
according to the user requirements.
4.2 Objective
The design process consists of two parallel activities. The first activity involves the design of
data content and structure of the database. The second activity relates to the design of the
database application. These two activities strongly influence each other. Traditionally, database
design methodologies have focused on different phases as discussed in upcoming sections. These
phases are similar to software design phases, but they need not be carried out strictly in sequence.
4.3.1 Database Design Process
Proper database design is the only way that the database application will be efficient, flexible,
and easy to manage and maintain. An important aspect of database design is to use relationships
between tables instead of throwing all your data into one long flat file. Types of relationships
include one-to-one, one-to-many, many-to-one and many-to-many.
Using relationships to properly organize your data is called normalization. There are many levels
of normalization, but the primary levels are the first, second, and third normal forms. Each level
has a rule or two that you must follow. Following all the rules helps ensure that your database is
well organized and flexible.
To take an idea from inception through to fruition, you should follow a design process. This
process essentially says, "Think before you act." Discuss rules, requirements, and objectives;
then create the final version of your normalized tables. The systematic process of designing a
database is known as design methodology. Database design involves understanding the operational
and business needs of an organization, modeling the specified requirements, and realizing the
requirements using a database. The goal of designing a database is to produce an efficient,
high-quality, and minimum-cost database. In large organizations, the database administrator (DBA) is
responsible for designing an efficient database system and for controlling the
database life-cycle process. The overall database design and implementation process consists of
several phases.
i) Requirement Collection and Analysis: It is the process of knowing and analyzing the
expectations of the users for the new database application in as much detail as possible. A team
of analysts or requirement experts is responsible for carrying out the task of requirement
analysis. They review the current file processing system or DBMS, and interact with the
users extensively to analyze the nature of business area to be supported and to justify the need
for data and databases. The initial requirements may be informal, incomplete, inconsistent, and
partially incorrect. The requirement specification techniques such as object-oriented analysis
(OOA), data flow diagrams (DFDs), etc., are used to transform these requirements into better
structured form. This phase can be quite time-consuming; however, it plays the most crucial and
important role in the success of the database system. The result of this phase is the document
containing the specification of user requirements.
ii) Conceptual Database Design: In this phase, the database designer selects a suitable data
model and translates the data requirements resulting from previous phase into a conceptual
database schema by applying the concepts of chosen data model. The conceptual schema is
independent of any specific DBMS. The main objective of conceptual schema is to provide a
detailed overview of the organization. In this phase, a high-level description of the data and
constraints is developed. The entity-relationship (E-R) diagram is generally used to represent
the conceptual database design. The conceptual schema should be expressive, simple,
understandable, minimal, and formal.
iii) Choice of a DBMS: The choice of a DBMS depends on many factors such as cost,
DBMS features and tools, underlying model, portability, and DBMS hardware requirements. The
technical factors that affect the choice of a DBMS are the type of DBMS (relational, object,
object-relational, etc.), storage structures and access paths that DBMS supports, the interfaces
available, the types of high-level query languages, and the architecture it supports (client/server,
parallel or distributed). The various types of costs that must be considered while choosing a
DBMS are software and hardware acquisition cost, maintenance cost, database creation and
conversion cost, personnel cost, training cost, and operating cost.
iv) Logical Database Design: Once an appropriate DBMS is chosen, the next step is to map
the high-level conceptual schema onto the implementation data model of the selected DBMS. In
this phase, the database designer moves from an abstract data model to the implementation of the
database. In case of relational model, this phase generally consists of mapping the E-R model
into a relational schema.
v) Physical Database Design: In this phase, the physical features such as storage structures,
file organization, and access paths for the database files are specified to achieve good
performance. The various options for file organization and access paths include various types of
indexing, clustering of records, hashing techniques, etc.
vi) Database System Implementation: Once the logical and physical database designs are
completed, the database system can be implemented. DDL statements of the selected DBMS are
used and compiled to create the database schema and database files, and finally the database is
loaded with the data.
vii) Testing and Evaluation: In this phase, the database is tested and fine-tuned for the
performance, integrity, concurrent access, and security constraints. This phase is carried out in
parallel with application programming. If the testing fails, various actions are taken such as
modification of physical design, modification of logical design or upgrade or change DBMS
software or hardware.
We must keep it in view that once the application programs are developed, it is easier to change
the physical database design. However, it is difficult to modify the logical database design as it
may affect the queries (written using DML commands) embedded in the program code. Thus, it
is necessary to carry out the design process effectively before developing the application
programs. While designing a database schema, it is necessary to avoid two major issues, namely,
redundancy and incompleteness. These problems may lead to bad database design.
4.3.2 Case Studies
Following points must be kept in mind before drawing the effective ER diagrams:
1. Identify all the relevant entities in a given system and determine the relationships among
these entities.
2. An entity should appear only once in a particular diagram.
3. Provide a precise and appropriate name for each entity, attribute, and relationship in the
diagram. Terms that are simple and familiar always beat vague, technical-sounding words.
In naming entities, remember to use singular nouns. However, adjectives may be used to
distinguish entities belonging to the same class (part-time employee and full time employee,
for example). Meanwhile attribute names must be meaningful, unique, system-independent,
and easily understandable.
4. Remove vague, redundant or unnecessary relationships between entities.
5. Never connect a relationship to another relationship.
i) Draw an E-R diagram of Inventory System:
The Inventory System provides a complete set of methods to support inventory handling. All
users of the Inventory System need the same functionality to complete their varied tasks.
The Inventory System allows you to:
• Remove items from inventory.
• Notify the store of a customer's intent to purchase an item that is not currently in stock
(backorder).
• Notify the store of a customer's intent to purchase an item that has never been in stock
(preorder).
The administrator of the store uses the inventory system to:
• Place a specific number of items on a shelf for customers to purchase, backorder, or
preorder.
• Decrease the number of items available for purchase, backorder, or preorder, perhaps
because of an error in stocking the item.
• Determine the number of items available for purchase, backorder, or preorder.
• Determine when a specific item will be back in stock.
For drawing an ER diagram of Inventory system, following components of ER diagram are taken
care of:
1. Entities identified for drawing an ER diagram of Inventory System are as follows:
Entity         Purpose
Supplier       To maintain the complete personal details of the supplier.
Staff          To maintain the complete personal details of the employees of the organization.
Customer       To maintain the complete personal details of the customer.
Product        To maintain the details of the products the organization is dealing with.
Role           To record the role of the staff hired by the organization.
Order          To create a unique order number for an order placed by a customer.
Category       To maintain the category of the order, if it belongs to a specific category.
Payment        To maintain the payment details of the customer.
Order Detail   To maintain the details of the order placed by a customer.
2. Respective attributes of each entity, along with their data type and the name of the constraint:
Entity         Attribute                     Data Type      Constraint
Supplier       Name (FName, MName, LName)    Char(15)       Not Null
               Address                       Varchar2(25)   Not Null
               Phone                         Number(10)     Not Null
               Fax                           Number(10)
               Email                         Varchar2(25)
               Supplierid                    Varchar2(3)    Primary Key
               PayMethod                     Char(10)
Staff          Name (FName, MName, LName)    Char(15)       Not Null
               StaffId                       Varchar2(3)    Primary Key
               Sex                           Char(1)        Not Null
               UserName                      Varchar2(25)   Not Null
               Password                      Varchar2(25)   Not Null
Customer       CID                           Varchar2(3)    Primary Key
               Name (FName, MName, LName)    Char(15)       Not Null
               Fax                           Number(10)
               SID                           Varchar2(3)    Foreign Key
               Email                         Varchar2(25)
Product        ProductId                     Varchar2(3)    Primary Key
               Supplierid                    Varchar2(3)    Foreign Key
               PName                         Char(15)       Not Null
               Qperunit                      Number(3)      Not Null
               Discount                      Number(3,2)
               Popstock                      Number(3)      Not Null
               Pclstock                      Number(3)      Not Null
               Pordered                      Number(3)      Not Null
               Uprice                        Number(3,2)
Role           RoleId                        Varchar2(3)    Primary Key
               Description                   Char(35)       Not Null
               RoleName                      Char(15)       Not Null
Order          OrderId                       Varchar2(3)    Primary Key
               CID                           Varchar2(3)    Foreign Key
               OrderDate                     Date           Not Null
Category       CatID                         Varchar2(3)    Primary Key
               CName                         Char(15)
               ProdID                        Varchar2(3)    Foreign Key
Payment        PayId                         Varchar2(3)    Primary Key
               Paydue                        Number(5,2)    Not Null
               Paypaid                       Number(5,2)    Not Null
               Paydate                       Date           Not Null
               BillNo.                       Varchar2(5)    Not Null
               Baldue                        Number(5,2)    Not Null
               ODetailID                     Varchar2(3)    Foreign Key
Order Detail   ODetailID                     Varchar2(3)    Primary Key
               DeliDate                      Date           Not Null
               Deliqty                       Number(3,2)    Not Null
               Ordquantity                   Number(3)      Not Null
               OrdDate                       Date           Not Null
               Discount                      Number(3,2)
               OrderId                       Varchar2(3)    Foreign Key
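Part of this attribute list can be turned into table definitions. The sketch below creates only the Supplier and Product entities, using Python's built-in sqlite3 module as a stand-in for the target DBMS. It assumes that each component of Name gets the listed type and constraint, and the sample rows are hypothetical. SQLite accepts the type names as written but does not enforce their lengths or precision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Supplier (
    Supplierid VARCHAR2(3)  NOT NULL PRIMARY KEY,
    FName      CHAR(15)     NOT NULL,   -- Name is composite: FName, MName, LName
    MName      CHAR(15)     NOT NULL,
    LName      CHAR(15)     NOT NULL,
    Address    VARCHAR2(25) NOT NULL,
    Phone      NUMBER(10)   NOT NULL,
    Fax        NUMBER(10),
    Email      VARCHAR2(25),
    PayMethod  CHAR(10)
);
CREATE TABLE Product (
    ProductId  VARCHAR2(3) NOT NULL PRIMARY KEY,
    Supplierid VARCHAR2(3) REFERENCES Supplier(Supplierid),  -- Supplies, 1:N
    PName      CHAR(15)    NOT NULL,
    Qperunit   NUMBER(3)   NOT NULL,
    Discount   NUMBER(3,2),
    Popstock   NUMBER(3)   NOT NULL,
    Pclstock   NUMBER(3)   NOT NULL,
    Pordered   NUMBER(3)   NOT NULL,
    Uprice     NUMBER(3,2)
);
""")

# Hypothetical sample rows.
conn.execute(
    "INSERT INTO Supplier VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("S01", "Ravi", "Kumar", "Sharma", "12 Market Road", 9876543210,
     None, None, "Cash"),
)
conn.execute(
    "INSERT INTO Product VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("P01", "S01", "Bolt", 10, 0.05, 100, 80, 20, 2.5),
)

# A product that names an unknown supplier breaks the foreign key.
try:
    conn.execute(
        "INSERT INTO Product VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        ("P02", "S99", "Nut", 10, None, 50, 50, 0, 1.5),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

product_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
not_null_columns = [row[1] for row in conn.execute("PRAGMA table_info(Product)")
                    if row[3]]
```

The remaining entities would follow the same pattern, with each Foreign Key entry becoming a REFERENCES clause to the table that holds the matching Primary Key.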
3. Relationship Name and Cardinality Ratio among Entities
Relationship Name   Entities               Cardinality Ratio
Supplies            Supplier, Product      1:N
Belongs             Product, Category      N:1
Has                 Staff, Role            N:1
Register            Staff, Customer        1:N
Orders              Customer, Order        1:N
Contain             Order, OrderDetail     1:N
Takes               Staff, Order           1:N
Having              Payment, OrderDetail   N:1
Specimen of an E-R diagram: The ER diagram given below is just one possible solution to the
problem. It may change as per the designer's perception and the customer's requirements.
[Figure: specimen E-R diagram of the Inventory System. It shows the entities Supplier, Staff,
Customer, Product, Role, Order, Category, Payment and OrderDetail with their attributes, connected
by the relationships Supplies, Belongs to, Has, Registers, Orders, Takes, Contains and Having, with
the cardinality (1 or N) marked on each side.]
(ii) Draw an E-R diagram of Payroll System:
A payroll system is the scheme a firm uses to pay its employees, and a payroll is the set of
financial records relating to those payments. A payroll database is an automated system that
allows you to enter employees' payroll information and compensate them accordingly. The database
may be a stand-alone system that supports only payroll operations, or an integrated system that
also supports related business functions. Here we restrict our scope to the stand-alone system. A
stand-alone payroll database is a single payroll application that you use to perform payroll
tasks. This option may come in handy if you already have HR and accounting solutions in place. An
effective stand-alone database gives you a complete range of services for managing your payroll
activities, including new-hire reporting, wage and deduction calculations, check printing, direct
deposit, wage garnishments, tax reporting and management, paycheck reconciliation, multiple
company management, and electronic record-keeping.
While drawing the E-R diagram, the following points should be taken care of:
1. Entities identified for drawing an ER diagram of Payroll System are as follows:
Entity        Purpose
Employee      To maintain the complete personal details of the employee.
Company       To maintain the complete details of the company.
Branch        To maintain the complete details of each branch, if the company has multiple
              branches.
Employer      It is assumed that the employer has multiple companies, branches and employees.
Salary        To record the complete details of the salary paid to the employee.
Department    To record the information about the department.
2. Attributes identified for each Entity
Entity        Attribute              Data Type       Constraint
Employee      Name (F_NAM, L_NAM)    Char(15)        Not Null
              E_ID                   Varchar2(3)     Primary Key
              DID                    Number(3)       Foreign Key
              Email                  Varchar2(20)
              Designation            Varchar2(15)    Not Null
              B_ID                   Number(3)       Not Null
Department    DID                    Number(3)       Primary Key
              Dname                  Varchar2(10)    Not Null
              E_ID                   Varchar2(3)     Not Null
Company       C_ID                   Number(3)       Primary Key
              C_Name                 Char(15)        Not Null
Employer      Emp_Name               Varchar2(30)    Not Null
              Emp_Add                Varchar2(30)    Not Null
              Email                  Varchar2(20)    Not Null
Salary        Basic                  Varchar2(3)     Primary Key
              Allowance              Char(35)        Not Null
              Perquisites            Char(15)        Not Null
              Total_Sal              Varchar2(3)     Primary Key
              Tax                    Varchar2(3)     Foreign Key
              Net_Salary             Date            Not Null
              E_ID                   Varchar2(3)     Primary Key
Branch        B_ID                   Number(3)       Primary Key
              B_Name                 Char(20)        Not Null
              DID                    Number(3)       Foreign Key
              E_ID                   Varchar2(3)     Foreign Key
3. Relationship Name and Cardinality Ratio among Entities
Relationship Name   Entities                Cardinality Ratio
Belongs             Employee, Department    M:1
Gets                Employee, Salary        M:1
Works_In            Employee, Company       M:1
Employ              Employee, Employer      M:1
Has                 Company, Branch         1:M
Pays                Employer, Salary        1:N
[Figure: E-R diagram of the Payroll System. Entities: Employee, Company, Branch, Employer,
Department and Salary. Relationships: Works In, Has, Employs, Headed By, Gets, Belongs and Pays.]
(iii) Draw an E-R diagram of Reservation System:
We now live in an era where practically everything, including business, is inextricably tied to
the internet. It is now crucial that every business, whatever its sector, has a recognizable web
presence. This helps in organizing and reserving tours and other activity services online. An
online reservation system is "used to store and retrieve information about tour product, tour
product options or lodging facility and conduct transactions for booking it." As a case study, we
discuss the airline reservation system. An airline needs to maintain several types of information,
such as route information, aircraft information, schedule information, fare information and
reservation information.
While drawing the E-R diagram, the following points should be taken care of:
1. Entities identified for drawing an ER diagram of Reservation System are as follows:
Entity        Purpose
Flight        To maintain the flight details.
Passenger     To maintain the passenger's personal details.
Airplane      To record the details of the aircraft.
Booking       To record the booking details.
2. Attributes identified for each Entity
Entity        Attribute              Data Type       Constraint
Flight        Flight_no.             Varchar2(15)    Primary Key
              From                   Varchar2(15)    Not Null
              To                     Varchar2(15)    Not Null
              DepartureDate          Date            Not Null
              DepartureTime          Number(6)       Not Null
              ArrivalDate            Date            Not Null
              ArrivalTime            Number(6)       Not Null
Airplane      ModelNo.               Varchar2(15)    Not Null
              Capacity               Number(5)       Not Null
              Registration_no.       Varchar2(15)    Primary Key
Booking       Booking_Date           Date            Not Null
              Online_Payment         Number(5,2)     Not Null
              Seat_no.               Number(5)       Not Null
              Class                  Varchar2(10)    Not Null
Passenger     Passenger_id           Number(10)      Primary Key
              Name (Fname, Lname)    Char(35)        Not Null
              Address                Varchar2(15)    Not Null
              Contact_info           Number(10)      Not Null
3. Relationship Name and Cardinality Ratio among Entities
Relationship Name   Entities                Cardinality Ratio
Flies               Flight, Airplane        M:1
Books               Passenger, Booking      M:N
To                  Flight, Passenger       1:N
[Figure: E-R diagram of the Reservation System. Entities: Flight, Passenger, Airplane and Booking.
Relationships: To, Flies and Books.]
(iv) Draw an E-R diagram of Online Book Store:
Shopping for books online helps you find the best possible price for just about any book you
want. If you're in the market for rare, collectible or autographed books, it's much cheaper and
faster to search online than it would be to call up local used and independent bookstores that
carry these types of items. The features available on many online bookstores also allow you to
compare similar titles with the click of a mouse and read reviews from professionals and
customers. You can also resell your used books to get more cash in your pocket and to clear out
your cluttered bookshelf. It's never been easier to ensure you never get stuck with a crummy title
again. A quality online bookstore will have a good product selection, an easy-to-use yet
comprehensive website, a variety of shipping options, a number of payment options, excellent
customer support and a strong return policy.
1. Entities identified for drawing an ER diagram of Online Book Store are as follows:
Entity        Purpose
Customer      To maintain the complete personal details of the customer.
Order         To record each order placed by a customer under a unique order number.
Books         To maintain the complete record of books.
Author        To maintain the details of the author.
Warehouse     To maintain the record of the warehouse where the books are stocked.
Books PDF     To record the details of online books.
Publisher     To maintain the record of the publisher who publishes the books.
Book Store    To maintain the record of the book store.
2. Attributes identified for each Entity
Entity        Attribute                     Data Type       Constraint
Customer      Name (FName, MName, LName)    Char(15)        Not Null
              C_ID                          Varchar2(3)     Primary Key
BookStore     Registration_no               Varchar2(10)    Primary Key
Books         ISBN_NO                       Varchar2(3)     Primary Key
              Price                         Number(4,2)     Not Null
              Vol_No.                       Number(3)       Not Null
              Year                          Number(4)       Not Null
              Issue_No.                     Number(3)       Not Null
              Book Name                     Varchar2(25)    Not Null
Order         OrderId                       Varchar2(3)     Primary Key
              O_Qty                         Number(3)       Not Null
              OrderDate                     Date            Not Null
BooksPDF      Name                          Varchar2(25)    Not Null
              ISBN                          Char(15)        Primary Key
              Author                        Varchar2(25)    Not Null
Author        Name (FName, MName, LName)    Char(15)        Not Null
              A_ID                          Varchar2(5)     Not Null
Warehouse     Code                          Varchar2(3)     Primary Key
              Phone Number                  Number(10)      Not Null
Publisher     Name (FName, MName, LName)    Char(15)        Not Null
3. Relationship Name and Cardinality Ratio among Entities
Relationship Name   Entities                   Cardinality Ratio
Visit               Customer, Book Store       M:1
Place Order         Customer, Book, Order      N:1
Written By          Book, Author               M:N
Stock               Book, Warehouse            M:N
Contains            Book Store, Book PDF       1:N
Publish By          Books, Publisher           M:N
[Figure: E-R diagram of the Online Book Store. Entities: Customer, Book Store, Books PDF, Order,
Books, Author, Publisher and Warehouse. Relationships: Visits, Contains, Place Order, May Have,
Written By, Published By and Stocks.]
4.3.3 Some Other Specimen ER Diagrams
(i) An ER Diagram of Airline Reservation System:
(ii) An ER Diagram of University System:
(iii) An ER Diagram of an Organization System:
(iv) An ER Diagram of Banking System:
4.3.4 Benefits of ER Diagram:
ER diagrams constitute a very useful framework for creating and manipulating databases. Some
of the benefits of designing an ER diagram are as follows:
• First, ER diagrams are easy to understand and do not require a person to undergo
extensive training to be able to work with them efficiently and accurately. This means that
designers can use ER diagrams to communicate easily with developers, customers, and
end users, regardless of their IT proficiency.
• Second, ER diagrams are readily translatable into relational tables, which can be used to
build databases quickly. In addition, database developers can use ER diagrams directly
as the blueprint for implementing data in specific software applications.
• Lastly, ER diagrams may be applied in other contexts, such as describing the different
relationships and operations within an organization.
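The translation of an ER diagram into relational tables can be illustrated with a minimal sketch.
It assumes Python's built-in sqlite3 module in place of Oracle, so the Varchar2 and Date types of
the case studies become SQLite's TEXT, and it maps the 1:N relationship "Orders" between Customer
and Order into two tables linked by a foreign key. The column lists are shortened and the sample
rows are invented for illustration only.

```python
import sqlite3

# In-memory database; SQLite stands in for Oracle here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Each entity becomes a table; its key attribute becomes the primary key.
conn.execute("""
    CREATE TABLE customer (
        cid   TEXT PRIMARY KEY,
        fname TEXT NOT NULL
    )
""")

# The 1:N relationship "Orders" becomes a foreign key on the N side.
conn.execute("""
    CREATE TABLE orders (
        order_id   TEXT PRIMARY KEY,
        cid        TEXT NOT NULL REFERENCES customer(cid),
        order_date TEXT NOT NULL
    )
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO customer VALUES ('C01', 'Asha')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("O01", "C01", "2014-09-01"), ("O02", "C01", "2014-09-05")],
)

# One customer may have many orders.
order_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE cid = 'C01'"
).fetchone()[0]

# An order that refers to a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES ('O03', 'C99', '2014-09-09')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Placing the foreign key on the "many" side is the usual way to represent a 1:N relationship; an
M:N relationship would need a separate linking table instead.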
4.4 Summary
The entity-relationship diagrams in this chapter depict the major concepts and relationships
needed for managing real-life case studies. They are neither complete data models depicting every
necessary relational database table, nor are they meant to be exact designs for implementing such
case studies. Alternative models may capture the necessary attributes and relationships equally
well. In this chapter, therefore, an attempt has been made to design some useful case studies that
will help developers envision the complexity of the environment that an ERM system must address,
and ensure that the E-R diagram captures the crucial relationships and features.
1. What are the different phases of database designing? Explain each in detail.
2. Discuss the importance of requirement analysis phase of database designing process.
3. What are the uses of E-R diagram? Draw an E-R diagram of library system of an
Institute.
4. What do you mean by an ER diagram? Outline the different notations of an ER diagram.
Explain the benefits of an ER diagram.
5. What steps should be kept in mind while designing an ER diagram of University system?
Discuss.
6. Draw an ER diagram of airline reservation system. Explain the components of ER
diagram.
Chapter – 5: Data Models

Structure:
5.1 Introduction
5.2 Objective
5.3.1 Data Model
5.3.2 Importance of Data Models
5.3.3 Data Model Evolution
5.3.4 Type of Data Models
(i) Hierarchical Data Model
(ii) Network Data Model
(iii) Relational Data Model
(iv) Comparison between Hierarchical Model, Network Model and Relational
Model
(v) Entity-Relationship Data Model
(vi) Object-Oriented Model
(vii) Object/ Relational Model
5.3.5 Usage of Data Model
5.4 Summary
5.1 Introduction
Data models underpin the way a database management system (DBMS) lets you create and use
databases. Various types of DBMS exist, differing in speed, flexibility and implementation. Each
type has advantages over the others, but no single type is superior in every respect. The kind of
structure and data you need determines which data model suits your needs best.
A data model can be thought of as a diagram, much like a flowchart, that shows the relationships
among data objects. Capturing all the data in a model can be time-intensive, but this step is
important and should not be rushed. A database management system is a collection of programs that
allows end users to create, maintain and control records in a database. The features of a DBMS
primarily address database creation, querying and data extraction. A DBMS differs from an
application development environment in aspects ranging from the personnel involved to the way the
data is used.
A data model not only describes the structure of the data, it also defines a set of operations that
can be performed on the data. A data model generally consists of data model theory, which is a
formal description of how data may be structured and used, and data model instance, which is a
practical data model designed for a particular application. The process of applying a data model
theory to create a data model instance is known as data modeling.
5.2 Objective
A model is a representation of reality: of 'real world' objects, events and their associations. It
is an abstraction that concentrates on the essential, inherent aspects of an organization and
ignores the accidental properties. A data model represents the organization itself. It should
provide the basic concepts and notations that allow database designers and end users to
communicate their understanding of the organizational data unambiguously and accurately. This
chapter focuses on the popular data models, i.e. the Hierarchical Data Model, the Network Data
Model and the Relational Data Model. The terminology of these models is discussed in detail using
simple examples. Some other data models that can be used to represent data are also explained. A
data model makes it easy to understand the actual meaning of the data and ensures that:
• Each user's data requirements are known.
• The nature of the data is independent of its physical structure.
• It is clear how the data is used across the application programs.
5.3.1 Data Model
The main objective of a database system is to present only the essential features of the data and
to hide the details of storage and data organization from the user. This is known as data
abstraction. A database model provides the necessary means to achieve data abstraction. A database
model, or simply a data model, is an abstract model that describes how the data is represented and
used. A data model consists of a set of data structures and conceptual tools that are used to
describe the structure (data types, relationships, and constraints) of a database.
Depending on the concepts they use to model the structure of the database, data models are
categorized into three types, namely, high-level or conceptual data models, representational or
implementation data models, and low-level or physical data models.
1) High-level or conceptual data models (based on entities & relationships) - A conceptual data
model describes the information used by an organization in a way that is independent of any
implementation-level issues and details. Because of this independence, it can be understood even
by end users with a non-technical background. The most popular conceptual data model is the
entity-relationship (E-R) model.
2) Low-level or physical data models - A physical data model describes the data in terms of
a collection of files, indices, and other storage structures such as record formats, record
ordering, and access paths. This model specifies how the database will be implemented in a
particular DBMS software such as Oracle, Sybase, etc., by taking into account the facilities and
constraints of that database management system. It also describes how the data is stored on disk
and what access methods are available to it.
3) Representational or implementation data models (record-based, object-oriented) -
Representational or implementation data models hide some data storage details from the users;
however, they can be implemented directly on a computer system. Representational data models are
the ones used most frequently in traditional commercial DBMSs. The various representational data
models are discussed in section 5.3.4.
Some data models are schematics that depict the manner in which data records are connected
or related within a file structure. These are called record or structural data models. Some data
models are used to identify the subjects of corporate data processing; these are called
entity-relationship data models. Still another type of data model is used for analytic purposes,
to help the analyst solidify the semantics associated with critical corporate or business concepts.
5.3.2 Importance of Data Models
Data models are important to study, because of the following features:
I. Relatively simple representation, usually graphical, of complex real-world data structures
II. Communications tool to facilitate interaction among the designer, the applications
programmer, and the end user
III. Good database design uses an appropriate data model as its foundation
IV. End-users have different views and needs for data
V. Data model organizes data for various users
5.3.3 Data Model Evolution
Modern database implementation models were not created in a vacuum. Instead, they are the
end result of decades of evolution, in the form of a series of progressively more sophisticated
data models. Researchers designed these models with a view to achieving the following properties:
i. It should be able to represent all information diagrammatically.
ii. It should be simple yet expressive enough to design the data in the database.
iii. There should be no redundancy in the data.
iv. It should be independent of the application.
Some of the common characteristics of data models are as follows:
i. Conceptual simplicity without compromising the semantic completeness of the database
ii. Representation of the real world as closely as possible
iii. Representation of real-world transformations (behavior) in compliance with the
consistency and integrity characteristics of any data model
Each new data model capitalized on the shortcomings of previous models.
Data Model                           Comments
Hierarchical                         • Difficult to represent M:N relationships (hierarchical model)
                                     • Physical level dependency
Network                              • No ad hoc queries
                                     • Access path predefined
Relational                           • Provides ad hoc queries
                                     • Set-oriented access
                                     • Weak semantic content
Entity-Relationship                  • Easy to understand
                                     • Incorporates more semantics
Semantic; Object-Oriented;           • More semantics in data model
Extended Relational                  • Support for complex objects
(Object-Relational)                  • Inheritance
                                     • Behaviour
(The models are arranged in order of increasing semantics in the data model.)
Figure 5.1: The Development of Data Models
5.3.4 Type of Data Models:
(i) Hierarchical Data Model
The hierarchical data model is the oldest type of data model, developed by IBM in 1968. This
data model organizes the data in a tree-like structure, in which each child node (also known as
a dependent) can have only one parent node. A database based on the hierarchical data model
comprises a set of records connected to one another through links. A link is an association
between two or more records. The top of the tree structure consists of a single node that does not
have any parent and is called the root node.
The root may have any number of dependents, and each of these dependents may have any number
of lower-level dependents. Each child node can have only one parent node, and a parent node can
have any number of child nodes. The model, therefore, represents only one-to-one and one-to-many
relationships. A collection of records of the same type is known as a record type. Figure 5.2
shows the hierarchical model of the Online Book database. It consists of three record types,
namely, PUBLISHER, BOOK, and REVIEW. For simplicity, only a few fields of each record type are
shown. One complete record of each record type represents a node.
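The tree structure described above can be sketched in a few lines of Python. This is only an
illustration under assumptions: the class name Record, the helper path_from_root and the field
values are hypothetical, and the pairing of the sample review with the sample book is invented.
The point it shows is that every node keeps at most one parent link while a parent may hold any
number of children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Record:
    """One record occurrence, i.e. one node of the tree (hypothetical helper)."""
    record_type: str                      # e.g. "PUBLISHER", "BOOK", "REVIEW"
    data: dict
    parent: Optional["Record"] = None     # at most one parent per node
    children: List["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        # A child may be linked to one parent only.
        if child.parent is not None:
            raise ValueError("a child record can have only one parent")
        child.parent = self
        self.children.append(child)
        return child


def path_from_root(record: Record) -> List[str]:
    """Follow the parent links upwards to recover the hierarchical path."""
    path = []
    node = record
    while node is not None:
        path.append(node.record_type)
        node = node.parent
    return list(reversed(path))


# The root node has no parent; the sample values are assumptions.
publisher = Record("PUBLISHER", {"Pname": "Hills Publication"})
book = publisher.add_child(Record("BOOK", {"Book": "Ransack"}))
review = book.add_child(Record("REVIEW", {"Rating": 7.5}))
```

Calling path_from_root(review) walks from the REVIEW record up through BOOK to PUBLISHER, which is
the hierarchical path an application has to know in order to reach a record.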
Advantages:
• It promotes data sharing.
• Parent/child relationship promotes conceptual simplicity.
• Database security is provided and enforced by DBMS.
• Parent/child relationship promotes data integrity.
• It is efficient with 1:M relationships.
Disadvantages:
• Complex implementation requires knowledge of physical data storage characteristics.
• Navigational system yields complex application development, management, and usage;
requires knowledge of hierarchical path.
• Changes in structure require changes in all applications.
• There are limitations with respect to its implementation.
• There is no data definition or data manipulation language in DBMS.
• There is a lack of standards.
Figure 5.2: Hierarchical Model of Online Book database
Sample Database for Hierarchical Data Model
To understand the hierarchical data model better, let us take the example of a sample database
consisting of suppliers, parts and shipments. The record structure and some sample records for
the supplier, part and shipment elements are given in the following tables.
We assume that each row in the Supplier table is uniquely identified by its SNo (Supplier Number).
Likewise, each part has a unique Pno (Part Number). We also assume that no more than one shipment
exists for a given supplier/part combination in the shipments table.
Hierarchical View for the Suppliers-Parts Database
The tree structure has the part record superior to the supplier record. That is, parts form the
parents and suppliers form the children. Each of the four trees in the figure consists of one part
record occurrence, together with a set of subordinate supplier record occurrences. There is one
supplier record for each supplier of a particular part. Each supplier occurrence includes the
corresponding shipment quantity.
For example, supplier S3 supplies a quantity of 300 of part P2. Note that the set of supplier
occurrences for a given part occurrence may contain any number of members, including zero (as in
the case of part P4). Part P1 is supplied by two suppliers, S1 and S2. Part P2 is supplied by three
suppliers, S1, S2 and S3, and part P3 is supplied only by supplier S1, as shown in the figure.
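The hierarchical view can be sketched as a nested Python structure in which each part owns its
list of supplier occurrences. The supplier/part combinations and the quantity 300 for S3 under P2
follow the text; the remaining quantities and most of the cities are assumed values, and the
function names are hypothetical.

```python
# Parts are parents; supplier occurrences are children.
# The shipment quantity is stored inside each supplier occurrence.
hierarchy = {
    "P1": {"city": "Qadian", "suppliers": [
        {"sno": "S1", "city": "Qadian", "qty": 250},
        {"sno": "S2", "city": "Amritsar", "qty": 250},
    ]},
    "P2": {"city": "Jalandhar", "suppliers": [
        {"sno": "S1", "city": "Qadian", "qty": 200},
        {"sno": "S2", "city": "Amritsar", "qty": 400},
        {"sno": "S3", "city": "Ludhiana", "qty": 300},
    ]},
    "P3": {"city": "Amritsar", "suppliers": [
        {"sno": "S1", "city": "Qadian", "qty": 100},
    ]},
    "P4": {"city": "Ludhiana", "suppliers": []},   # a parent with no children
}


def suppliers_of(pno):
    """Top-down query: read the children of a single tree."""
    return [s["sno"] for s in hierarchy[pno]["suppliers"]]


def parts_of(sno):
    """Bottom-up query: every tree has to be scanned for the supplier."""
    return [
        pno
        for pno, part in hierarchy.items()
        if any(s["sno"] == sno for s in part["suppliers"])
    ]
```

The two functions answer mirror-image questions, yet suppliers_of reads one tree while parts_of
must search all of them. This is the asymmetry of retrieval in the hierarchical model.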
Operations on Hierarchical Model
There are four basic operations, Insert, Update, Delete and Retrieve, that can be performed on
each model. We now consider in detail how these basic operations are performed in the
hierarchical database model.
Insert Operation: It is not possible to insert the information of a supplier, e.g. S4, who does not
supply any part. This is because a child node cannot exist without a parent. In contrast, a part P5
that is not supplied by any supplier can be inserted without any problem, because a parent can
exist without any child. So, we can say that an insert anomaly exists only for those children
which have no corresponding parents.
Update Operation: Suppose we wish to change the city of supplier S1 from Qadian to
Jalandhar. We will have to carry out two operations: searching for S1 under each part, and then
updating each of the occurrences of S1. But if we wish to change the city of part P1 from Qadian
to Jalandhar, these problems will not occur, because there is only a single entry for part P1 and
the problem of inconsistency will not arise. So, we can say that update anomalies exist only for
children, not for parents, because children may have multiple entries in the database.
Delete Operation: In the hierarchical model, quantity information is incorporated into the supplier
record. Hence, the only way to delete a shipment (or supplied quantity) is to delete the
corresponding supplier record. But such an action leads to loss of information about the supplier,
which is not desired. For example, if supplier S2 stops supplying 250 units of part P1, then the
whole record of S2 under part P1 has to be deleted, which may lead to loss of the supplier's
information. Another problem arises if we wish to delete the information of a part and that part
happens to be the only part supplied by some supplier. In the hierarchical model, deletion of a
parent causes the deletion of its child records also, and if a child occurrence is the only
occurrence in the whole database, then the information of that child record is lost with the
deletion of the parent. For example, if we delete the information of part P2, then we also lose
the occurrences of suppliers S3, S2 and S1 under it. The information of S2 and S1 can still be
obtained from P1, but the information about supplier S3 is lost with the deletion of the record
for P2.
Record Retrieval: Record retrieval methods for the hierarchical model are complex and
asymmetric.
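The three anomalies can be reproduced with a small sketch, assuming the same nested
part-to-suppliers layout as the hierarchical view. The data values other than those named in the
text are assumptions, and insert_supplier, update_supplier_city and known_suppliers are
hypothetical helper names.

```python
# Assumed sample data: parts are parents, supplier occurrences are children.
db = {
    "P1": {"city": "Qadian", "suppliers": [
        {"sno": "S1", "city": "Qadian", "qty": 250},
        {"sno": "S2", "city": "Amritsar", "qty": 250},
    ]},
    "P2": {"city": "Jalandhar", "suppliers": [
        {"sno": "S1", "city": "Qadian", "qty": 200},
        {"sno": "S2", "city": "Amritsar", "qty": 400},
        {"sno": "S3", "city": "Ludhiana", "qty": 300},
    ]},
}


def known_suppliers(tree):
    """All supplier numbers that appear anywhere in the hierarchy."""
    return sorted({s["sno"] for part in tree.values() for s in part["suppliers"]})


def insert_supplier(tree, supplier, pno=None):
    """A supplier occurrence can only be stored under an existing part."""
    if pno not in tree:
        raise ValueError("a supplier occurrence needs a parent part")
    tree[pno]["suppliers"].append(supplier)


def update_supplier_city(tree, sno, city):
    """Search every part and change each occurrence of the supplier."""
    changed = 0
    for part in tree.values():
        for s in part["suppliers"]:
            if s["sno"] == sno:
                s["city"] = city
                changed += 1
    return changed


suppliers_before = known_suppliers(db)

# Insert: a childless parent (P5) is fine, a parentless child (S4) is not.
db["P5"] = {"city": "Jalandhar", "suppliers": []}
try:
    insert_supplier(db, {"sno": "S4", "city": "Jalandhar", "qty": 0})
    s4_inserted = True
except ValueError:
    s4_inserted = False

# Update: S1 occurs under both P1 and P2, so two occurrences must change.
occurrences_changed = update_supplier_city(db, "S1", "Jalandhar")

# Delete: removing P2 removes its children; S3 occurred only under P2.
del db["P2"]
suppliers_after_delete = known_suppliers(db)
```

After the deletion, S1 and S2 are still known through P1, but S3 has disappeared from the database
altogether.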
(ii) Network Data Model
The first specification of the network data model was presented by the Conference on Data Systems
Languages (CODASYL) in 1969, followed by the second specification in 1971. It is powerful
but complicated. In a network model, too, the data is represented by a collection of records, and
relationships among data are represented by links. However, a link in a network data model
represents an association between precisely two records. Thus, the complete network of
relationships is represented by several pairwise sets; in each set, one record type is the owner
(at the tail of the relationship arrow) and one or more record types are members (at the head of
the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. As
in the hierarchical data model, each record of a particular record type represents a node.
However, unlike in the hierarchical data model, the nodes are linked to each other without any
hierarchy. The main difference between the two is that in the hierarchical data model the data is
organized in the form of trees, whereas in the network data model the data is organized in the
form of graphs. Figure 5.3 shows the network model of the Online Book database.
Advantages:
• Conceptual simplicity is at least equal to that of the hierarchical model.
• It handles more relationship types, such as M:N and multi-parent.
• Data access is more flexible.
• The data owner/member relationship promotes data integrity.
• There is conformance to standards.
• It includes a data definition language (DDL) and a data manipulation language (DML).
Disadvantages:
• System complexity limits efficiency.
• Navigational system yields complex implementation, application development and
management.
• Structural changes require changes in all application programs.
Figure 5.3: Network Model of Online Book Database
Network view of Sample Database
Considering again the sample supplier-part database, its network view is shown. In addition to
the part and supplier record types, a third record type is introduced, which we will call the
connector. A connector occurrence specifies the association (shipment) between one supplier and
one part. It contains data (the quantity of the part supplied) describing the association between
the supplier and part records.
All connector occurrences for a given supplier are placed on a chain. The chain starts from the
supplier and finally returns to the supplier. Similarly, all connector occurrences for a given part
are placed on a chain starting from the part and finally returning to the same part.
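The connector idea can be sketched as follows. This is a simplification under assumptions: each
chain is modelled as an ordinary Python list rather than as a ring of pointers that returns to its
owner, the class and function names are hypothetical, and the sample shipments are invented apart
from the quantity 300 for S3 and P2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Supplier:
    sno: str
    city: str
    chain: List["Connector"] = field(default_factory=list)  # this supplier's connectors


@dataclass(eq=False)
class Part:
    pno: str
    chain: List["Connector"] = field(default_factory=list)  # this part's connectors


@dataclass(eq=False)
class Connector:
    """One shipment: links exactly one supplier with exactly one part."""
    supplier: Supplier
    part: Part
    qty: int


def connect(supplier: Supplier, part: Part, qty: int) -> Connector:
    # The same connector occurrence is placed on both chains.
    connector = Connector(supplier, part, qty)
    supplier.chain.append(connector)
    part.chain.append(connector)
    return connector


def parts_of(supplier: Supplier) -> List[str]:
    return [c.part.pno for c in supplier.chain]


def suppliers_of(part: Part) -> List[str]:
    return [c.supplier.sno for c in part.chain]


# Assumed sample data.
s1, s2, s3 = Supplier("S1", "Qadian"), Supplier("S2", "Amritsar"), Supplier("S3", "Ludhiana")
p1, p2 = Part("P1"), Part("P2")
connect(s1, p1, 250)
connect(s2, p1, 250)
connect(s1, p2, 200)
connect(s2, p2, 400)
connect(s3, p2, 300)
```

Here parts_of and suppliers_of have exactly the same shape: each simply walks one chain. This is
the sense in which retrieval in the network model is symmetric.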
Operations on Network Model
A detailed description of all basic operations in the network model is given below:
Insert Operation: To insert a new record containing the details of a new supplier, we simply
create a new record occurrence. Initially, there will be no connector. The new supplier's chain
will simply consist of a single pointer from the supplier to itself.
For example, supplier S4, who does not supply any part, can be inserted in the network model as a
new record occurrence with a single pointer from S4 to itself. This is not possible in the
hierarchical model. Similarly, a new part that is not supplied by any supplier can be inserted.
Consider another case: if supplier S1 now starts supplying part P3 with quantity 100, then a new
connector containing 100 as the supplied quantity is added to the model, and the pointers of S1
and P3 are modified as shown below.
We can conclude that, unlike the hierarchical model, the network model has no insert anomalies.
Update Operation: Unlike the hierarchical model, where an update required a search and raised
many inconsistency problems, updating a record in the network model is much easier. We can change
the city of S1 from Qadian to Jalandhar without search or inconsistency problems, because the city
of S1 appears at just one place in the network model. The same operation is used to change any
attribute of a part.
Delete Operation: If we wish to delete the information of a part, say P1, then that record
occurrence can be deleted by removing the corresponding pointers and connectors, without
affecting the suppliers who supply P1; the model is modified as shown. The information of a
supplier is deleted in the same way.
To delete shipment information, the connector for that shipment and its corresponding pointers
are removed without affecting the supplier and part information.
For example, if supplier S1 stops supplying part P1 with quantity 250, the model is modified as
shown below without affecting the information of P1 and S1.
Retrieval Operation: Record retrieval methods for the network model are symmetric but complex.
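The same operations can be tried on a compact sketch of the network structure. As an assumption,
chains are again plain Python lists instead of pointer rings, Node and Connector are hypothetical
class names, and the starting shipments are invented sample data.

```python
class Node:
    """A supplier or part record; `chain` holds its connector occurrences."""
    def __init__(self, key, city):
        self.key, self.city, self.chain = key, city, []


class Connector:
    """A shipment linking one supplier and one part."""
    def __init__(self, supplier, part, qty):
        self.supplier, self.part, self.qty = supplier, part, qty
        supplier.chain.append(self)
        part.chain.append(self)

    def remove(self):
        # Unlink the shipment from both chains; the two records survive.
        self.supplier.chain.remove(self)
        self.part.chain.remove(self)


suppliers = {k: Node(k, c) for k, c in [("S1", "Qadian"), ("S2", "Amritsar")]}
parts = {k: Node(k, c) for k, c in [("P1", "Qadian"), ("P2", "Jalandhar"), ("P3", "Amritsar")]}
s1_p1 = Connector(suppliers["S1"], parts["P1"], 250)
Connector(suppliers["S2"], parts["P1"], 250)
Connector(suppliers["S1"], parts["P2"], 200)

# Insert: S4 supplies nothing yet, so it starts with an empty chain.
suppliers["S4"] = Node("S4", "Ludhiana")
# Insert: S1 starts supplying P3 with quantity 100, which adds one connector.
Connector(suppliers["S1"], parts["P3"], 100)

# Update: the city of S1 is stored in exactly one place.
suppliers["S1"].city = "Jalandhar"

# Delete a shipment: S1 stops supplying P1.
s1_p1.remove()


def delete_part(pno):
    """Remove a part together with its connectors, leaving suppliers intact."""
    for connector in list(parts[pno].chain):
        connector.remove()
    del parts[pno]


delete_part("P1")
```

After these steps S1, S2 and S4 all still exist, even though S2 and S4 no longer supply any part.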
(iii) Relational Data Model
The relational data model was developed by E. F. Codd in 1970. In the relational data model,
unlike the hierarchical and network models, there are no physical links. All data is maintained in
the form of tables (generally known as relations) consisting of rows and columns. Each row
(record) represents an entity and each column (field) represents an attribute of the entity. The
relationship between two tables is implemented through a common attribute in the tables and
not by physical links or pointers. This makes querying much easier in a relational database
system than in hierarchical or network database systems. As a result, the relational model is more
programmer friendly and has become dominant and popular in both industrial and academic settings.
Oracle, Sybase, DB2, Ingres, Informix and MS-SQL Server are a few of the popular relational
DBMSs. Figure 5.4 shows the relational model of the Online Book database.
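How a common attribute replaces physical links can be seen in a minimal sketch of the
supplier-part database. It assumes Python's built-in sqlite3 module rather than any of the
products named above; the table and column names and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (sno TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE part     (pno TEXT PRIMARY KEY, city TEXT NOT NULL);
    -- The shipment table relates the other two through the common
    -- attributes sno and pno; no pointers are stored.
    CREATE TABLE shipment (
        sno TEXT NOT NULL REFERENCES supplier(sno),
        pno TEXT NOT NULL REFERENCES part(pno),
        qty INTEGER NOT NULL,
        PRIMARY KEY (sno, pno)  -- at most one shipment per supplier/part pair
    );
    INSERT INTO supplier VALUES
        ('S1', 'Qadian'), ('S2', 'Amritsar'), ('S3', 'Ludhiana'), ('S4', 'Jalandhar');
    INSERT INTO part VALUES ('P1', 'Qadian'), ('P2', 'Jalandhar');
    INSERT INTO shipment VALUES ('S1', 'P1', 250), ('S2', 'P1', 250), ('S3', 'P2', 300);
""")

# Ad hoc query in one direction: which suppliers supply part P1?
suppliers_of_p1 = [row[0] for row in conn.execute(
    "SELECT s.sno FROM supplier s JOIN shipment sh ON sh.sno = s.sno "
    "WHERE sh.pno = 'P1' ORDER BY s.sno"
)]

# The mirror-image query has the same form: which parts does S3 supply?
parts_of_s3 = [row[0] for row in conn.execute(
    "SELECT p.pno FROM part p JOIN shipment sh ON sh.pno = p.pno "
    "WHERE sh.sno = 'S3' ORDER BY p.pno"
)]

# The city of S1 is stored once, so a single row is updated.
rows_updated = conn.execute(
    "UPDATE supplier SET city = 'Jalandhar' WHERE sno = 'S1'"
).rowcount
```

Supplier S4 is stored although it has no shipments, and both queries are written in the same way
regardless of direction.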
Properties of Relational Tables:
• Values are atomic in nature.
• Each row is unique.
• Column values are of the same kind.
• The sequence of columns is insignificant.
• The sequence of rows is insignificant.
• Each column has a unique name.
Advantages:
• Structural independence is promoted by the use of independent tables. Changes in a table's
structure do not affect data access or application programs.
• Tabular view substantially improves conceptual simplicity, thereby promoting easier
database design, implementation, management and use.
• Ad hoc query capability is based on SQL.
• Powerful RDBMS isolates the end user from physical level details and improves
implementation and management simplicity.
Entity: Publisher
P_ID    Pname                 Phone
P001    Hills Publication     7134019
P002    Pearson Education     2134562
P003    Khanna Publication    7876543
Entity: Book
ISBN             Book       Price
001-354-921-1    Ransack    22/=
000-987-760-9    C++        25/=
Entity: Review
R_ID     Rating
A0002    6.0
A0006    7.5
Figure 5.4: Relational Model of Online Book Database
Disadvantages:
• The RDBMS requires substantial hardware and system software overhead.
• Conceptual simplicity gives relatively untrained people the tools to use a good system
poorly, and if unchecked, it may produce the same data anomalies found in file systems.
• It may promote islands of information.
(iv) Comparison between hierarchical model, network model and relational model
When we compare the hierarchical, network and relational data models, we can identify a number
of differences in terms of data structure, data manipulation and data integrity.
Characteristic: Data structure
  Hierarchical model: It is used as a method for storing data in a database that looks like a
    family tree with one root and a number of branches or subdivisions. Therefore, record types are
    organized in the form of a rooted tree.
  Network model: In the network model we can identify multiple branches emanating from one or more
    nodes, so it sometimes looks like several trees which share branches.
  Relational model: The relational model is based on relations among entities. A relation is a
    two-dimensional table, so a table can be used to represent information about an entity or a
    relationship between entities.
Characteristic: Data structure
  Hierarchical model: One-to-many or one-to-one relationships.
  Network model: Supports many-to-many relationships.
  Relational model: One-to-one, one-to-many and many-to-many relationships.
Characteristic: Data structure
  Hierarchical model: Based on the parent-child relationship.
  Network model: A record can have many parents as well as many children.
  Relational model: Based on relational data structures.
Characteristic: Data manipulation
  Hierarchical model: Does not provide an independent stand-alone query interface.
  Network model: CODASYL (Conference on Data Systems Languages).
  Relational model: Relational databases bring many sources into a common query language (such as
    SQL).
Characteristic: Data manipulation
  Hierarchical model: Retrieval algorithms are complex and asymmetric.
  Network model: Retrieval algorithms are complex and symmetric.
  Relational model: Retrieval algorithms are simple and symmetric.
Characteristic: Data integrity
  Hierarchical model: Cannot insert the information of a child that does not have any parent.
  Network model: Does not suffer from any insertion anomaly.
  Relational model: Does not suffer from any insertion anomaly.
Characteristic: Data integrity
  Hierarchical model: Multiple occurrences of child records lead to problems of inconsistency
    during the update operation.
  Network model: Free from update anomalies.
  Relational model: Free from update anomalies.
Characteristic: Data integrity
  Hierarchical model: Deletion of a parent results in deletion of its child records.
  Network model: Free from deletion anomalies.
  Relational model: Free from deletion anomalies.
(v) Entity-Relationship Data Model
The Entity-Relationship Model (E-R Model) is a high-level conceptual data model developed
by Chen in 1976 to facilitate database design. Conceptual modeling is an important phase in
designing a successful database. A conceptual data model is a set of concepts that describe the
structure of a database and the associated retrieval and update transactions on the database. A
high-level model is chosen so that all the technical aspects are also covered. The E-R data model
grew out of the exercise of using commercially available DBMSs to model the database. The E-R
model is a generalization of the earlier commercial models such as the Hierarchical and
the Network Model. It also allows the representation of the various constraints as well as their
relationships. The Entity-Relationship (E-R) Model is based on a view of the real
world that consists of a set of objects called entities and of relationships among entity sets,
where an entity set is a group of similar objects. A relationship between entity sets is
represented by a named E-R relationship and is of 1:1, 1:N or M:N type, which gives the mapping
from one entity set to another. The Entity-Relationship model has one important advantage:
inasmuch as it is not DBMS specific, and is in fact not a DBMS model at all, data models can be
developed by the design team without first having to choose which DBMS to use.
Features of the E-R Model:
1. The E-R diagram used for representing the E-R Model can be easily converted into relations
(tables) in the Relational Model.
2. The E-R Model is used by database developers to produce a good database design, so
that the data model can be used with various DBMSs.
3. It is helpful as a problem decomposition tool, as it shows the entities and the relationships
between those entities.
4. It is inherently an iterative process: entities can be added to the model in later
modifications.
5. It is simple and easy for various types of users and designers to understand, because
specific standards are used for its representation.
(vi) Object - Oriented Model
Object DBMSs add database functionality to object-oriented programming languages. They
bring much more than persistent storage to programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming languages to provide
full-featured database programming capability, while retaining native language compatibility. A
major benefit of this approach is the unification of the application and database development into
a seamless data model and language environment. As a result, applications require less code, use
more natural data modeling, and code bases are easier to maintain. Object developers can write
complete database applications with a modest amount of additional effort.
According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of
object-oriented programming language (OOPL) systems and persistent systems. The power of
the OODB comes from the seamless treatment of both persistent data, as found in databases, and
transient data, as found in executing programs."
In contrast to a relational DBMS where a complex data structure must be flattened out to fit into
tables or joined together from those tables to form the in-memory structure, object DBMSs have
no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-
to-one mapping of object programming language objects to database objects has two benefits
over other storage approaches: it provides higher performance management of objects, and it
enables better management of the complex interrelationships between objects. This makes object
DBMSs better suited to support applications such as financial portfolio risk analysis systems,
telecommunications service applications, world wide web document structures, design and
manufacturing systems, and hospital patient record systems, which have complex relationships
between data.
Advantages:
• Semantic content is added.
• Visual representation includes semantic content.
• Inheritance promotes data integrity.
Disadvantages:
• Slow development of standards caused vendors to supply their own enhancements, thus
eliminating widely accepted standards.
• It is a complex navigational system.
• There is a steep learning curve.
• High system overhead slows transactions.
(vii) Object/Relational Model
Object/relational database management systems (ORDBMSs) add new object storage capabilities
to the relational systems at the core of modern information systems. These new facilities
integrate management of traditional fielded data, complex objects such as time-series and
geospatial data and diverse binary media such as audio, video, images, and applets. By
encapsulating methods with data structures, an ORDBMS server can execute complex analytical
and data manipulation operations to search and transform multimedia and other complex objects.
As an evolutionary technology, the object/relational (OR) approach has inherited the robust
transaction- and performance-management features of its relational ancestor and the flexibility
of its object-oriented cousin. Database designers can work with familiar tabular structures and
data definition languages (DDLs) while assimilating new object-management possibilities.
Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor
procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of
RDBMS languages and interfaces. And the leading vendors are, of course, quite well known:
IBM, Informix, and Oracle.
(viii) Semi-structured Data Model
Unlike other data models, where every data item of a particular type must have the same set of
attributes, the semi-structured data model allows individual data items of the same type to have
different sets of attributes. In the semi-structured data model, the information about the
description of the data (the schema) is contained within the data itself, which is why such data
is sometimes called self-describing data. In such databases there is no clear separation between
the data and the schema, so data of any type is allowed. The semi-structured data model has
recently emerged as an important topic of study for the reasons given here.
• There are data sources, such as the Web, which are to be treated as databases but
cannot be constrained by a schema.
• There is a need for a flexible format for data exchange between heterogeneous databases.
• It facilitates browsing of data.
The semi-structured data model facilitates data exchange among heterogeneous data sources. It
helps to discover new data easily and store it. It also facilitates querying the database without
knowing the data types. However, it loses the data type information.
5.3.5 Usage of Data Model
(i) Useful for the Personnel: Now that you know the different data models, it is only
right that you also know the advantages of data models in a DBMS. A DBMS installation has
database administrators and managers who oversee the entire operation of the DBMS. Their primary
duties are making sure the primary schedule is run daily, loading program releases and
maintaining the database records. Application development involves system analysts, computer
technicians and programmers whose job includes testing the software to find errors.
(ii) Useful for Records Interrogation: Programs for records interrogation are designed to
provide information to end users through programs such as general inquiry programs,
report generators and Query. The Query program is the most popular one; it allows end users to
develop basic programming skills by constructing simple programs that extract data with a
query language processor. For records interrogation, Query programs are quite powerful.
(iii) Useful for Catalog Programs: In a DBMS, end users are able to catalog their favorite
programs to delete, edit or view data. Each user is able to copy routines to a user-defined
catalog file for managing databases. The catalog system is a personal tool that lets end users
run programs without having an applications specialist design programs for them.
(iv) Useful for Accessing Data: Typically, DBMS systems use centralized databases. End users
can access the databases without an application program developer needing to write access
programs and without intervention from a programmer. The record structures and the database
are already built into the software. The advantage in this area is access to data records
and structures.
5.4 Summary
Models are a blueprint of a plan and play a major role in the success of any project. In fact,
models are more suitable than pictures for expressing a thought because they add logic and
reasoning to the picture. To carry this idea forward, an early proposal for a standard
terminology and general architecture of a database system was produced in 1971 by the DBTG
(Data Base Task Group) appointed by the Conference on Data Systems Languages (CODASYL).
The DBTG recognized the need for a two-level approach with a system view called the schema and
a user view called the subschema. In 1975 the Standards Planning and Requirements Committee
(SPARC) of the American National Standards Institute (ANSI), known as ANSI-SPARC, recognized
the need for a three-level approach with a system catalog. Therefore, they proposed the
relational data model. With technological improvements, and as the need arose, further models
were introduced, as discussed in this chapter.
1. What do you mean by a data model? Discuss the different types of data models used.
2. Define data model. Under which categories are data models broadly classified? What is
the importance of a data model?
3. Discuss the different data models along with their advantages and disadvantages.
4. Draw a comparative chart of the Hierarchical, Network and Relational data models.
5. Discuss the importance of a data model. How is it useful for an organization?
Chapter – 6: Relational Algebra

Structure:
6.1 Introduction
6.2 Objective
6.3.1 Relational Algebra
6.3.2 Uses of Relational Algebra
6.3.3 Relational-Oriented Operation
(i) Select Operation
(ii) Project Operation
(iii) Join Operation
(iv) Combining Operations
6.3.4 Set-oriented Operations
(i) Union
(ii) Intersection
(iii) SET Difference
(iv) Cartesian Join
(v) Division
6.3.5 Sample Queries Using Relational Algebra
6.3.6 Equivalence
6.3.7 Comparing Relational Algebra and SQL
6.4 Summary
6.1 Introduction
Relational algebra is a formal language describing how new relations are created from old ones.
It is a useful tool for describing queries on a database management system. In addition to
defining the data structure and constraints, a data model must include a set of operations to
manipulate the data. A basic set of relational model operations constitutes the relational
algebra. In modern relational database management systems each relation is stored as a table.
Each row in the table represents one tuple from the relation, and each column one attribute. In
that sense, we can think of relational algebra as a language that can be used to describe
operations for creating new tables from existing tables in a database management system.
6.2 Objective
Relational algebra is a query language that is used to explain the basic relational operations
and their principles. Most of the currently used relational database management systems work
with SQL queries. Relational algebra is a good example of a procedural language, and studying it
helps in understanding the essential features of the relational model. Relational algebra
explores the various concepts of integrity that apply to the relational model. This chapter
explains how a relational algebra expression interrogates a relational database.
6.3.1 Relational Algebra
Relational algebra is a formal language describing how new relations are created from old ones.
It is a useful tool for describing queries on a database management system. In relational database
management systems each relation is stored as a table. Each row in the table represents one tuple
from the relation, and each column one attribute. In that sense, we can think of relational
algebra as a language that can be used to describe operations for creating new tables from
existing tables in a database management system. Given below are some of the important points
about relational algebra and the reasons for studying it.
• Similar to normal algebra (as in 2+3*x-y), except we use relations as values instead of
numbers, and the operations and operators are different.
• Not used as a query language in actual DBMSs. (SQL is used instead.)
• The inner, lower-level operations of a relational DBMS are, or are similar to, relational
algebra operations. We need to know about relational algebra to understand query
execution and optimization in a relational DBMS.
• Some advanced SQL queries require explicit relational algebra operations, most
commonly outer join.
• Relations are seen as sets of tuples, which means that no duplicates are allowed. SQL
behaves differently in some cases. Remember the SQL keyword DISTINCT.
• SQL is declarative, which means that you tell the DBMS what you want, but not how it is
to be calculated. A C++ or Java program is procedural, which means that you have to
state, step by step, exactly how the result should be calculated. Relational algebra is
(more) procedural than SQL. (Actually, a relational algebra query is a mathematical expression.)
6.3.2 Uses of Relational Algebra
The schema of a relation is similar to the format or structure of a table. The schema of a relation
is the set of attributes that forms a tuple for that relation. For example, the following describes
the schema for the relation student:
Student(student #, first name, last name, street address, city, state, zip, phone, major, GPA)
When we wish to obtain information from a database, we use a language like SQL to create a
query for the database management system to process. The response from the database will be a
result set with a particular schema. The definition above says "Relational algebra is a formal
language describing how new relations are formed from existing relations." If we think of the
tables in the database as the "existing relations", and the result set of the query as the "new
relations", then relational algebra is a language that can be used to describe database queries
that will return a result set from the existing database.
result set = query (existing database)
Relational algebra is a procedural query language, which takes instances of relations as input and
yields instances of relations as output. It uses operators to perform queries. An operator can be
either unary or binary. The algebra operation thus produces new relations, which can be further
manipulated using operations of the same algebra. Relational algebra specifies the operations to
perform on existing relations to derive result relations. It defines the complete schema for each of
the result relations. The relational algebraic operations can be divided into two basic groups.
1) Relational-Oriented Operations
2) Set-Oriented Operations
6.3.3 Relational-Oriented Operations
This group consists of operations developed specifically for relational databases; these include
SELECT, PROJECT and JOIN. These operations are explained in upcoming paragraphs:
(i) The SELECT Operation (σ)
This operation is used to select only those tuples from a relation that satisfy a selection
condition (predicate), as shown in the figure given below. It can be considered a filter that
keeps rows of a relation on the basis of certain criteria. In relational algebra the SELECT
operation is denoted by the symbol σ (sigma).
In general, the SELECT operation expression is given by
σ <Selection Condition>(R)
where σ is the symbol that denotes the SELECT operation, and the selection condition is a Boolean
expression specified on the attributes of relation R.
Figure 6.1: Select Operation
Consider a relation EMPLOYEE. Suppose we want to retrieve the rows of employees who work for
the Finance department and of those who earn a salary of more than Rs. 25000/=. We can specify
each of these two conditions individually with a SELECT operation as follows.
σ DEP = 'Finance' (EMPLOYEE)
σ SALARY > 25000 (EMPLOYEE)
The Boolean expression specified in <selection condition> is made up of an attribute name, an
operator, and a constant value or another attribute name. In the above example SALARY is the
attribute name, the operator is greater than (>) and 25000 is a constant value.
The above two algebraic expressions are similar to the following SQL statements:
SELECT * FROM EMPLOYEE
WHERE DEP = 'Finance';
SELECT * FROM EMPLOYEE
WHERE SALARY > 25000;
The SELECT operation is unary; that is, it is applied to a single relation. The degree of the
relation (its number of attributes) resulting from the SELECT operation is the same as
the degree of relation R.
Example
Query: Retrieve the details (Id, Name, Age, City) of students who live in Kurukshetra.
STUDENT
ID NAME AGE CITY
101 Ram Kumar 24 Panipat
102 Prem Lata 21 Kurukshetra
103 Pankaj Garg 20 Yamuna Nagar
104 Hitesh Goyal 22 Kurukshetra
σ CITY = 'Kurukshetra' (STUDENT)
Result
ID NAME AGE CITY
102 Prem Lata 21 Kurukshetra
104 Hitesh Goyal 22 Kurukshetra
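The SELECT operation can also be sketched in a general-purpose language. The following Python fragment is only an illustration, assuming a relation is modelled as a list of dictionaries (one dictionary per tuple); the helper name select is hypothetical and is not part of any DBMS.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
STUDENT = [
    {"ID": 101, "NAME": "Ram Kumar", "AGE": 24, "CITY": "Panipat"},
    {"ID": 102, "NAME": "Prem Lata", "AGE": 21, "CITY": "Kurukshetra"},
    {"ID": 103, "NAME": "Pankaj Garg", "AGE": 20, "CITY": "Yamuna Nagar"},
    {"ID": 104, "NAME": "Hitesh Goyal", "AGE": 22, "CITY": "Kurukshetra"},
]


def select(relation, predicate):
    """Hypothetical helper: keep the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]


# sigma CITY = 'Kurukshetra' (STUDENT)
result = select(STUDENT, lambda row: row["CITY"] == "Kurukshetra")
for row in result:
    print(row["ID"], row["NAME"], row["AGE"], row["CITY"])
```

Every tuple in the result keeps all four attributes, which illustrates that SELECT filters rows but never changes the degree of the relation.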
(ii) The Project Operation (π)
The project operation selects certain columns from the table and discards the other columns. The
projection operation is used either to reduce the number of attributes in the resultant relation
or to reorder attributes. The projection of a relation is defined as a projection of all tuples
over some set of attributes. In relational algebra the PROJECT operation is denoted by the symbol
π (pi).
In general, the PROJECT operation expression is given by
π <Attribute list> (R)
where π is the symbol that denotes the PROJECT operation, and the attribute list is a list of
attributes from the relation R that are to be chosen for display.
Figure 6.1: Project Operation
Consider a relation EMPLOYEE. Suppose we want to retrieve only some columns of the EMPLOYEE
table, say NAME, AGE and SALARY, or only SEX and SSN. The corresponding PROJECT expressions are
given as follows:
π NAME, AGE, SALARY (EMPLOYEE)
π SEX, SSN (EMPLOYEE)
The result of the project operation has only the attributes specified in <attribute list>, in the
same order as they appear in the list. The PROJECT operation removes any duplicate tuples.
This is also known as duplicate elimination.
The PROJECT operation is unary; that is, it is applied to a single relation. The number of
tuples in a relation resulting from the PROJECT operation is always less than or equal to the
number of tuples in relation R, and its degree equals the number of attributes in the attribute
list.
The above algebraic expressions are similar to the following SQL statements (note that SQL keeps
duplicate rows unless DISTINCT is specified):
SELECT NAME, AGE, SALARY FROM EMPLOYEE;
SELECT SEX, SSN FROM EMPLOYEE;
Example
Query: Retrieve the Id, Name of Students.
STUDENT
ID NAME AGE CITY
101 Ram Kumar 24 Panipat
102 Prem Lata 21 Kurukshetra
103 Pankaj Garg 20 Yamuna Nagar
104 Hitesh Goyal 22 Kurukshetra
π ID, NAME (STUDENT)
Result
ID NAME
101 Ram Kumar
102 Prem Lata
103 Pankaj Garg
104 Hitesh Goyal
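A minimal Python sketch of the PROJECT operation is given here, again assuming a relation is a list of dictionaries. The helper name project is hypothetical. The second call projects on CITY alone to show duplicate elimination, which the ID, NAME example cannot show because ID is unique.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
STUDENT = [
    {"ID": 101, "NAME": "Ram Kumar", "AGE": 24, "CITY": "Panipat"},
    {"ID": 102, "NAME": "Prem Lata", "AGE": 21, "CITY": "Kurukshetra"},
    {"ID": 103, "NAME": "Pankaj Garg", "AGE": 20, "CITY": "Yamuna Nagar"},
    {"ID": 104, "NAME": "Hitesh Goyal", "AGE": 22, "CITY": "Kurukshetra"},
]


def project(relation, attributes):
    """Hypothetical helper: keep the listed attributes, in order, without duplicates."""
    # dict.fromkeys removes repeated value tuples while preserving first-seen order.
    unique = dict.fromkeys(tuple(row[a] for a in attributes) for row in relation)
    return [dict(zip(attributes, values)) for values in unique]


names = project(STUDENT, ["ID", "NAME"])   # pi ID, NAME (STUDENT)
cities = project(STUDENT, ["CITY"])        # pi CITY (STUDENT)
print(len(names), "tuples:", names)
print(len(cities), "tuples:", cities)
```

The projection on CITY returns fewer tuples than STUDENT contains, because the two students from Kurukshetra collapse into one tuple.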
(iii) The JOIN Operation (⋈)
The JOIN operation is used to combine related tuples from two relations into single tuples. The
tuples from the operand relations that participate in the operation and contribute to the result are
related. The join operation allows the processing of relationships existing between the operand
relations. In relational algebra the JOIN operation is denoted by the symbol ⋈.
In general, the JOIN operation expression is given by
(R1) ⋈ <Join condition> (R2)
where ⋈ is the symbol that denotes the JOIN operation, and the join condition is a Boolean
expression specified on attributes of the relations R1 and R2.
Suppose we want to retrieve the name of the manager of each department. We need to combine each
DEPARTMENT tuple with the EMPLOYEE tuple whose SSN matches the MGRSSN value in the
DEPARTMENT tuple. In relational algebra this is represented by
DEPT_MGR ← DEPARTMENT ⋈ <MGRSSN = SSN> EMPLOYEE
RESULT ← π DNAME, LNAME, FNAME (DEPT_MGR)
The above algebraic expression is similar to the following SQL statement:
SELECT DNAME, LNAME, FNAME FROM DEPARTMENT, EMPLOYEE
WHERE MGRSSN = SSN;
The JOIN operation is binary; that is, it is always applied to two relations. The result of the
JOIN of R1(A1, A2, ..., An) and R2(B1, B2, ..., Bm) is a relation Q with n+m attributes,
Q(A1, A2, ..., An, B1, B2, ..., Bm).
Most commonly, a JOIN operation uses equality comparisons only. A JOIN in which the
only comparison operator used is equality (=) is called an EQUIJOIN. The result of an EQUIJOIN
always has one or more pairs of attributes that have identical values in every tuple.
An EQUIJOIN on join attributes that have the same name in both relations, in which the repeated
attribute is kept only once in the result, is known as a NATURAL JOIN. NATURAL JOIN is denoted
by the symbol *.
Example
Query: Retrieve the information of students who are enrolled in at least one course.
STUDENT
ID NAME CITY
100 Vinod Kaithat
200 Jagdish Hisar
300 Armaan Hisar
400 Mandeep Sirsa
ENROL
EID COURSEID
200 101
100 113
300 101
(STUDENT) ⋈ <STUDENT.ID = ENROL.EID> (ENROL)
Result (the EID column, which repeats ID, is not shown)
ID NAME CITY COURSEID
100 Vinod Kaithat 113
200 Jagdish Hisar 101
300 Armaan Hisar 101
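The example above can be reproduced with a short Python sketch. It assumes relations are lists of dictionaries and uses a hypothetical helper named join that pairs every tuple of the first relation with every tuple of the second and keeps the pairs that satisfy the join condition.

```python
# Assumption: a relation is a list of dicts, one dict per tuple.
STUDENT = [
    {"ID": 100, "NAME": "Vinod", "CITY": "Kaithat"},
    {"ID": 200, "NAME": "Jagdish", "CITY": "Hisar"},
    {"ID": 300, "NAME": "Armaan", "CITY": "Hisar"},
    {"ID": 400, "NAME": "Mandeep", "CITY": "Sirsa"},
]
ENROL = [
    {"EID": 200, "COURSEID": 101},
    {"EID": 100, "COURSEID": 113},
    {"EID": 300, "COURSEID": 101},
]


def join(r1, r2, condition):
    """Hypothetical helper: concatenate each pair of tuples that meets the condition."""
    return [{**a, **b} for a in r1 for b in r2 if condition(a, b)]


# STUDENT join <STUDENT.ID = ENROL.EID> ENROL
joined = join(STUDENT, ENROL, lambda s, e: s["ID"] == e["EID"])
# Drop EID, which repeats ID, so the output matches the Result table.
result = [{k: v for k, v in row.items() if k != "EID"} for row in joined]
for row in result:
    print(row["ID"], row["NAME"], row["CITY"], row["COURSEID"])
```

Mandeep (ID 400) does not appear in the output because no ENROL tuple satisfies the join condition for that student.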
There are several types of joins, but the most basic type is the Cartesian join. Other joins,
including the natural join, the equi-join and the theta join, are variations of the Cartesian join in
which special rules are applied. Cartesian product will be discussed later in this chapter. Rest of
the Join types are discussed as below:
a) Natural Join
A natural join is performed on two relations that share at least one attribute, and is defined as
follows:
The natural join of relation A with relation B is a new relation formed by matching all tuples
from relation A one by one with all tuples from relation B one by one, but only where the value
of the shared attributes are the same. Each shared attribute is only included once in the schema
of the result set. A natural join can only be performed on two relations that have at least one
shared attribute.
The symbol for a natural join is ⋈. We would write C = A ⋈ B.
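The definition of the natural join can be sketched in Python as follows. The relations EMPLOYEE and DEPARTMENT and their contents are made-up illustration data, and natural_join is a hypothetical helper; the sketch assumes that all tuples of a relation have the same attributes.

```python
def natural_join(a, b):
    """Hypothetical helper: match tuples on all shared attributes, keeping each once."""
    if not a or not b:
        return []
    shared = [attr for attr in a[0] if attr in b[0]]
    if not shared:
        raise ValueError("a natural join needs at least one shared attribute")
    return [
        {**row_a, **row_b}  # shared attributes have equal values, so they appear once
        for row_a in a
        for row_b in b
        if all(row_a[attr] == row_b[attr] for attr in shared)
    ]


# Illustration data (not from the text): the shared attribute is DNO.
EMPLOYEE = [
    {"NAME": "Amit", "DNO": 5},
    {"NAME": "Ritu", "DNO": 4},
    {"NAME": "Sohan", "DNO": 7},
]
DEPARTMENT = [
    {"DNO": 4, "DNAME": "Finance"},
    {"DNO": 5, "DNAME": "Research"},
]

C = natural_join(EMPLOYEE, DEPARTMENT)  # C = EMPLOYEE natural-join DEPARTMENT
for row in C:
    print(row)
```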
b) Theta Join
A theta join is similar to a Cartesian join, except that only those tuples are included that meet a
specified condition, as follows:
The theta join of relation A with relation B is a new relation formed by matching all tuples from
relation A one by one with all tuples from relation B one by one, but only where the tuples meet
a specified condition, called the theta predicate. If the relations share any attributes, then each
shared attribute is only included once in the schema of the result set.
The general symbol for a theta join is a composite symbol: the symbol for a natural join
subscripted with the Greek letter theta, ⋈θ. We would write C = A ⋈θ B. In practice, the theta
is replaced with the actual condition.
c) Equi-Join
An equi-join, which is similar to both a theta join and a natural join, is defined as follows:
The equi-join of relation A with relation B is a new relation formed by matching all tuples from
relation A one by one with all tuples from relation B one by one, but only where the tuples meet
a specified condition of equality, called the equi-join predicate. If the relations share any
attributes, then each shared attribute is only included once in the schema of the result set.
The symbol for an equi-join is similar to the symbol for a natural join subscripted with the
equal sign: ⋈=. We would write C = A ⋈= B. Just as with the theta join, in practice the equal
sign is replaced with the actual condition.
The difference between an equi-join and a theta join is that the condition must be one of equality
in an equi-join. The difference between an equi-join and a natural join is that the two relations
do not need to have a common attribute in an equi-join.
Example:
We wish to match groups of people waiting for tables at a restaurant with the available tables, on
the condition that the number of people in the group equals the number of seats at the table.
Let W = the relation with data for all the groups waiting for tables
Let T = the relation with data for all of the available tables
Let M = the relation with data assigning groups to tables
M = W ⋈ <group.size = table.seats> T
In this notation the attribute names are shown in their more complex form, relation.attribute, so
that group.size refers to the size attribute of the group relation, and table.seats refers to the seats
attribute of the table relation.
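The restaurant example can be tried out with a small Python sketch. The attribute names (group, size, table, seats) and all the data values are assumptions made for illustration. Note that the join only lists every group and table pair that satisfies the condition; it does not by itself decide a one-to-one seating.

```python
# Illustration data (assumed): W holds waiting groups, T holds available tables.
W = [
    {"group": "Sharma", "size": 4},
    {"group": "Gupta", "size": 2},
    {"group": "Verma", "size": 6},
]
T = [
    {"table": 1, "seats": 2},
    {"table": 2, "seats": 4},
    {"table": 3, "seats": 4},
]

# Equi-join: M = W join <group.size = table.seats> T
M = [{**g, **t} for g in W for t in T if g["size"] == t["seats"]]

# Theta join with a different condition: the group merely has to fit at the table.
fits = [{**g, **t} for g in W for t in T if g["size"] <= t["seats"]]

for row in M:
    print(row["group"], "-> table", row["table"])
print(len(fits), "pairs satisfy size <= seats")
```

Replacing the equality with another comparison turns the equi-join into a general theta join, which is the only difference between the two list comprehensions.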
(iv) Combining Operations
In general, several relational algebra operations are applied one after another. In this situation
we can either write the operations as a single nested algebra expression or apply one operation at
a time, creating intermediate results. For example, to retrieve the first name, last name and
salary of all employees who work in the department 'Finance', we must apply the SELECT and
PROJECT operations. The combined relational algebra expression can be given as follows:
π FNAME, LNAME, SALARY (σ DEP = 'Finance' (EMPLOYEE))
Alternatively, this can be written as:
DEP_FINANCE ← σ DEP = 'Finance' (EMPLOYEE)
RESULT ← π FNAME, LNAME, SALARY (DEP_FINANCE)
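Both ways of writing the combined operation can be compared in a short Python sketch. The EMPLOYEE tuples are made-up illustration data, and select and project are hypothetical helpers that model a relation as a list of dictionaries.

```python
# Illustration data (assumed), with only the attributes needed for the query.
EMPLOYEE = [
    {"FNAME": "Amit", "LNAME": "Goel", "DEP": "Finance", "SALARY": 30000},
    {"FNAME": "Neha", "LNAME": "Jain", "DEP": "Sales", "SALARY": 28000},
    {"FNAME": "Ravi", "LNAME": "Sethi", "DEP": "Finance", "SALARY": 22000},
]


def select(relation, predicate):
    """Hypothetical helper for the SELECT operation."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Hypothetical helper for the PROJECT operation (with duplicate elimination)."""
    unique = dict.fromkeys(tuple(row[a] for a in attributes) for row in relation)
    return [dict(zip(attributes, values)) for values in unique]


wanted = ["FNAME", "LNAME", "SALARY"]

# Nested form: one expression, evaluated from the inside out.
nested = project(select(EMPLOYEE, lambda r: r["DEP"] == "Finance"), wanted)

# Step-by-step form: an intermediate relation is named first.
dep_finance = select(EMPLOYEE, lambda r: r["DEP"] == "Finance")
result = project(dep_finance, wanted)

print(nested == result)
print(result)
```

Because every operation takes relations as input and returns a relation, the output of one helper can be passed directly to the next.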
6.3.4 Set-oriented Operations
These are the traditional set theory operations: UNION, INTERSECTION, SET
DIFFERENCE and CARTESIAN PRODUCT. These are binary operations; that is, each is
applied to two relations. UNION, INTERSECTION and SET DIFFERENCE can be applied only to
relations that have the same number of attributes with the same data types. This condition is
called union compatibility. In other words, two relations R and S are said to be union compatible
if they have the same degree n and each pair of corresponding attributes has the same domain.
The CARTESIAN PRODUCT does not require union compatibility.
(i) Union
The Union of relation A and relation B is a new relation containing all of the tuples contained in
either relation A or relation B. Union can only be performed on two relations that have the same
schema. The symbol for union is ∪. In relational algebra we would write something like R3 =
R1 ∪ R2. In other words, Union is a binary operation performed on two relations, and the new
table contains all the rows from both tables, but rows that appear in both tables are shown only
once in the resulting table. The two tables must be of the same degree, i.e. have the same number
of columns with the same data types.
Example:
R1
Name Rank Age
Mukta 1 34
Satvik 2 25
Aryan 3 18
R2
Name Rank Age
Arpit 5 13
Sidhant 10 17
Prerna 15 41
Satvik 2 25
Aryan 3 18
R3 = R1 ∪ R2
Name Rank Age
Mukta 1 34
Satvik 2 25
Aryan 3 18
Arpit 5 13
Sidhant 10 17
Prerna 15 41
The union operation is both commutative and associative.
Commutative law of union: A ∪ B = B ∪ A
Associative law of union: (A ∪ B) ∪ C = A ∪ (B ∪ C)
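Since a relation is a set of tuples, the union example can be checked with Python's built-in set type. This is only a sketch, assuming each tuple is written as (Name, Rank, Age).

```python
# Assumption: each relation is a Python set of (Name, Rank, Age) tuples.
R1 = {("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)}
R2 = {
    ("Arpit", 5, 13),
    ("Sidhant", 10, 17),
    ("Prerna", 15, 41),
    ("Satvik", 2, 25),
    ("Aryan", 3, 18),
}

R3 = R1 | R2  # union: tuples present in both relations are kept only once

for name, rank, age in sorted(R3, key=lambda t: t[1]):
    print(name, rank, age)

print("commutative:", (R1 | R2) == (R2 | R1))
```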
(ii) Intersection
Like union, intersection means pretty much the same thing in relational algebra that it does in
simple set theory:
The Intersection of relation A and relation B is a new relation containing all of the tuples that are
contained in both relation A and relation B. Intersection can only be performed on two relations
that have the same schema. The symbol for intersection is ∩. In relational algebra we would
write something like R3 = R1 ∩ R2.
Example:
R1
Name Rank Age
Mukta 1 34
Satvik 2 25
Aryan 3 18
R2
Name Rank Age
Arpit 5 13
Sidhant 10 17
Prerna 15 41
Satvik 2 25
Aryan 3 18
R3 = R1 ∩ R2
Name Rank Age
Satvik 2 25
Aryan 3 18
The intersection operation is both commutative and associative.
Commutative law of intersection: A ∩ B = B ∩ A
Associative law of intersection: (A ∩ B) ∩ C = A ∩ (B ∩ C)
Unlike union, however, intersection is not considered as a basic operation, but a derived
operation, because it can be derived from the basic operations. We will look at difference next,
but the derivation looks like this:
R1 ∩ R2 = R1 – (R1 – R2)
For our purposes, however, it really doesn't matter that intersection is a derived operation.
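The derivation R1 ∩ R2 = R1 – (R1 – R2) can be verified on the example data with a few lines of Python, assuming again that relations are modelled as sets of (Name, Rank, Age) tuples.

```python
# Assumption: each relation is a Python set of (Name, Rank, Age) tuples.
R1 = {("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)}
R2 = {
    ("Arpit", 5, 13),
    ("Sidhant", 10, 17),
    ("Prerna", 15, 41),
    ("Satvik", 2, 25),
    ("Aryan", 3, 18),
}

direct = R1 & R2           # intersection taken directly
derived = R1 - (R1 - R2)   # intersection built from two difference operations

print(sorted(direct, key=lambda t: t[1]))
print("same result:", direct == derived)
```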
(iii) SET Difference
The difference operation also means pretty much the same thing in relational algebra that it does
in simple set theory:
The difference between relation A and relation B is a new relation containing all of the tuples
that are contained in relation A but not in relation B. Difference can only be performed on two
relations that have the same schema. The symbol for difference is the minus sign (–).
We would write R3 = R1 – R2.
Example:
R1
Name Rank Age
Mukta 1 34
Satvik 2 25
Aryan 3 18
R2
Name Rank Age
Arpit 5 13
Sidhant 10 17
Prerna 15 41
Satvik 2 25
R3 = R1 - R2
Name Rank Age
Mukta 1 34
Aryan 3 18
R3 = R2 – R1
Name Rank Age
Arpit 5 13
Sidhant 10 17
Prerna 15 41
The difference operation is neither commutative nor associative. In general:
A – B ≠ B – A
(A – B) – C ≠ A – (B – C)
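The difference example, and the fact that the operation is neither commutative nor associative, can be checked with Python sets. The relation C below is an extra relation assumed only to demonstrate non-associativity.

```python
# Assumption: each relation is a Python set of (Name, Rank, Age) tuples.
R1 = {("Mukta", 1, 34), ("Satvik", 2, 25), ("Aryan", 3, 18)}
R2 = {("Arpit", 5, 13), ("Sidhant", 10, 17), ("Prerna", 15, 41), ("Satvik", 2, 25)}
C = {("Satvik", 2, 25)}  # assumed third relation, used only for the associativity check

print("R1 - R2:", sorted(R1 - R2, key=lambda t: t[1]))
print("R2 - R1:", sorted(R2 - R1, key=lambda t: t[1]))

print("commutative:", (R1 - R2) == (R2 - R1))
print("associative:", ((R1 - R2) - C) == (R1 - (R2 - C)))
```

Both checks print False for this data, which is enough to show that the two laws do not hold in general.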
(iv) Cartesian Join
The PRODUCT operation combines information from two relations pairwise on tuples. The
Cartesian product of two relations is the concatenation of tuples that belong to the two relations.
In other words, the Cartesian product of two relations results in a new relation that pairs every
row of the first relation with every row of the second relation.
The two tables need not be union compatible or of the same degree. It is a
binary operation, so it takes two relations as operands.
Specifically, the Cartesian product of two sets X (for example the points on the X-axis) and Y
(for example the points on the Y-axis), denoted X × Y, is the set of all possible ordered pairs
whose first component is a member of X and whose second component is a member of Y (e.g. the
whole of the X-Y plane).
Imagine that we have two sets, one composed of letters and one composed of numbers, as
follows:
S1 = {a, b, c, d} and S2 = {1, 2, 3}
The cross product of the two sets is a set of ordered pairs, matching each value from S1 with
each value from S2.
S1 × S2 = {(a,1), (a,2), (a,3), (b,1), (b,2), (b,3), (c,1), (c,2), (c,3), (d,1), (d,2), (d,3)}
The Cartesian product of sets is not commutative: in general S1 × S2 ≠ S2 × S1, because the
ordered pairs are reversed. The exceptions are the empty set, which acts as a "zero", and equal
sets.
In relational algebra, the cross product of two relations is also called the Cartesian Product or
Cartesian Join.
Example
R1
Name Rank Age
Mukta 1 34
Satvik 2 25
Aryan 3 18
R2
ID Hobby
101 Music
102 Dance
103 Cricket
104 Fine Arts
R1 × R2
Name Rank Age ID Hobby
Mukta 1 34 101 Music
Mukta 1 34 102 Dance
Mukta 1 34 103 Cricket
Mukta 1 34 104 Fine Arts
Satvik 2 25 101 Music
Satvik 2 25 102 Dance
Satvik 2 25 103 Cricket
Satvik 2 25 104 Fine Arts
Aryan 3 18 101 Music
153
Aryan 3 18 102 Dance
Aryan 3 18 103 Cricket
Aryan 3 18 104 Fine Arts
Although Cartesian joins form the conceptual basis for all other joins, they are rarely used on
their own in actual database management systems because they often produce a very large relation.
Consider a table with data for 40,000 students, each row needing 300 bytes of storage space, and
a table for 2,000 advisors, each row needing 200 bytes. The two original tables would need about
12,000,000 and 400,000 bytes of storage space (12 megabytes and 400 kilobytes). The Cartesian
join of these two would have 80,000,000 records, each needing about 500 bytes (300 + 200), for a
total of 40,000,000,000 bytes (40 gigabytes).
Another reason Cartesian joins are seldom used is that the result rarely has any value in itself:
how often do we really need a table that pairs every student with every advisor?
The other types of joins, which are based on the Cartesian join, are used more often, and are
commonly applied in combination with projection and selection operations.
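The storage estimate for the student and advisor tables can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that a joined row is exactly as wide as the two source rows together (300 + 200 bytes) and ignores any storage overhead.

```python
students, student_row_bytes = 40_000, 300
advisors, advisor_row_bytes = 2_000, 200

student_table_bytes = students * student_row_bytes   # size of the student table
advisor_table_bytes = advisors * advisor_row_bytes   # size of the advisor table

# A Cartesian join pairs every student row with every advisor row.
joined_rows = students * advisors
joined_row_bytes = student_row_bytes + advisor_row_bytes  # assumption: widths simply add up
joined_table_bytes = joined_rows * joined_row_bytes

print(f"{joined_rows:,} rows of {joined_row_bytes} bytes = {joined_table_bytes:,} bytes")
```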
(v) Division
The division is a binary operation that is written as R ÷ S. The result consists of the tuples of
R restricted to the attributes unique to R (those in the header of R but not in the header of S),
keeping only those for which every combination with a tuple of S is present in R.
Division is very useful for a special kind of query such as "Retrieve the names of the students
who are enrolled in all courses taught by Professor Ram Kumar".
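A minimal Python sketch of division for the simplest case, R(x, y) ÷ S(y), is given below. The function name divide, the student names and the course names are all hypothetical; they only illustrate the "enrolled in all courses" style of query.

```python
def divide(r, s):
    """Return the x values that are paired in r with every y value in s."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

# Hypothetical data: (student, course) enrolments and the courses taught by one professor.
ENROLLED = {
    ("Asha", "DBMS"), ("Asha", "OS"),
    ("Ravi", "DBMS"),
    ("Neha", "DBMS"), ("Neha", "OS"), ("Neha", "Networks"),
}
TAUGHT = {"DBMS", "OS"}

# Students enrolled in every course in TAUGHT.
print(sorted(divide(ENROLLED, TAUGHT)))
```

Ravi is excluded because he is missing one of the two courses, while Neha's extra course does not matter.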
6.3.5 Sample Queries Using Relational Algebra:
Consider the following relations, with their attributes and sample tuples, for the relational
algebra queries below. Only one specimen tuple is shown for each relation; you can add more
tuples as required.
Employee
Fname  Mname  Lname  SSN    BDate   Address  Sex  Salary   SuperSSN  Dno.
Amit   Kr     Goel   12345  1-1-90  1,ABC    N    15,000/  23456     5
.      .      .      .      .       .        .    .        .         .
.      .      .      .      .       .        .    .        .         .
Department
DName     DNumber  MGRSSN  MGRSDate
Research  5        23456   2-5-1988
.         .        .       .
.         .        .       .
Dept_Location
DNumber  DLocation
5        Old Campus
.        .
.        .
Works_On
ESSN   Pno  Hours
12345  1    32
.      .    .
.      .    .
Project
PName  PNumber  PLocation   DNum
Zbase  3        Old Campus  5
.      .        .           .
.      .        .           .
Dependent
ESSN   DependentName  Sex  BDate      Relationship
12345  Kavish         M    1-10-2012  Son
.      .              .    .          .
.      .              .    .          .
QUERY 1
Retrieve the name and address of all employees who work for the 'Research' department.
RESEARCH_DEPT ← σ DNAME='Research' (DEPARTMENT)
RESEARCH_EMPS ← (RESEARCH_DEPT ⋈ DNUMBER=DNO EMPLOYEE)
RESULT ← Π FNAME, LNAME, ADDRESS (RESEARCH_EMPS)
This query could be specified in other ways; for example, the order of the JOIN and SELECT
operations could be reversed, or the JOIN could be replaced by a NATURAL JOIN after
renaming one of the join attributes.
QUERY 2
For every project located in 'Stafford', list the project number, the controlling department
number, and the department manager's last name, address, and birth date.
STAFFORD_PROJS ← σ PLOCATION='Stafford' (PROJECT)
CONTR_DEPT ← (STAFFORD_PROJS ⋈ DNUM=DNUMBER DEPARTMENT)
PROJ_DEPT_MGR ← (CONTR_DEPT ⋈ MGRSSN=SSN EMPLOYEE)
RESULT ← Π PNUMBER, DNUM, LNAME, ADDRESS, BDATE (PROJ_DEPT_MGR)
QUERY 3
Find the names of employees who work on all the projects controlled by department number 5.
DEPT5_PROJS(PNO) ← Π PNUMBER (σ DNUM=5 (PROJECT))
EMP_PROJ(SSN, PNO) ← Π ESSN, PNO (WORKS_ON)
RESULT_EMP_SSNS ← EMP_PROJ ÷ DEPT5_PROJS
RESULT ← Π LNAME, FNAME (RESULT_EMP_SSNS * EMPLOYEE)
QUERY 4
Make a list of project numbers for projects that involve an employee whose last name is 'Smith',
either as a worker or as a manager of the department that controls the project.
SMITHS(ESSN) ← Π SSN (σ LNAME='Smith' (EMPLOYEE))
SMITH_WORKER_PROJS ← Π PNO (WORKS_ON * SMITHS)
MGRS ← Π LNAME, DNUMBER (EMPLOYEE ⋈ SSN=MGRSSN DEPARTMENT)
SMITH_MANAGED_DEPTS(DNUM) ← Π DNUMBER (σ LNAME='Smith' (MGRS))
SMITH_MGR_PROJS(PNO) ← Π PNUMBER (SMITH_MANAGED_DEPTS * PROJECT)
RESULT ← (SMITH_WORKER_PROJS ∪ SMITH_MGR_PROJS)
QUERY 5
List the names of all employees with more than two dependents.
This query cannot be done in the basic (original) relational algebra. We have to use the
AGGREGATE FUNCTION operation with the COUNT aggregate function.
We assume that dependents of the same employee have distinct DEPENDENT_NAME values.
T1(SSN, NO_OF_DEPTS) ← ESSN ℱ COUNT DEPENDENT_NAME (DEPENDENT)
T2 ← σ NO_OF_DEPTS > 2 (T1)
RESULT ← Π LNAME, FNAME (T2 * EMPLOYEE)
QUERY 6
Retrieve the names of employees who have no dependents.
This is an example of the type of query that uses the MINUS (SET DIFFERENCE) operation.
ALL_EMPS ← Π SSN (EMPLOYEE)
EMPS_WITH_DEPS (SSN) ← Π ESSN (DEPENDENT)
EMPS_WITHOUT_DEPS ← (ALL_EMPS - EMPS_WITH_DEPS)
RESULT ← Π LNAME, FNAME (EMPS_WITHOUT_DEPS * EMPLOYEE)
QUERY 7
List the names of managers who have at least one dependent.
MGRS(SSN) ← Π MGRSSN(DEPARTMENT)
EMPS_WITH_DEPS (SSN) ← Π ESSN (DEPENDENT)
MGRS_WITH_DEPS ← (MGRS ∩ EMPS_WITH_DEPS)
RESULT ← Π LNAME, FNAME (MGRS_WITH_DEPS * EMPLOYEE)
The same query can in general be specified in many different ways. For example, the operations
can often be applied in various orders. In addition, some operations can be used to replace others;
for example, the INTERSECTION operation in Query 7 can be replaced by a NATURAL JOIN.
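Queries 6 and 7 can be imitated with Python sets, as a rough sketch. Only SSN 12345 (Amit Goel) comes from the specimen tuple; the other employees and the contents of the projected SSN sets are hypothetical sample data.

```python
# SSN -> (LNAME, FNAME); only 12345 comes from the specimen tuple, the rest is made up.
EMPLOYEE = {
    "12345": ("Goel", "Amit"),
    "23456": ("Sharma", "Ritu"),
    "34567": ("Verma", "Anil"),
}
DEPENDENT_ESSN = {"12345", "23456"}   # stands for the projection of ESSN from DEPENDENT
MGRSSN = {"23456"}                    # stands for the projection of MGRSSN from DEPARTMENT

# Query 6: employees with no dependents (set difference).
emps_without_deps = set(EMPLOYEE) - DEPENDENT_ESSN
result6 = sorted(EMPLOYEE[ssn] for ssn in emps_without_deps)

# Query 7: managers with at least one dependent (intersection).
mgrs_with_deps = MGRSSN & DEPENDENT_ESSN
result7 = sorted(EMPLOYEE[ssn] for ssn in mgrs_with_deps)

# Query 7 again, written as a natural join on the single common attribute SSN.
mgrs_with_deps_join = {m for m in MGRSSN for e in DEPENDENT_ESSN if m == e}

print(result6, result7, mgrs_with_deps == mgrs_with_deps_join)
```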
6.3.6 Equivalence
The same relational algebra query can be written as many different, equivalent expressions. The
order in which tuples appear in relations is never significant.
• A × B ⇔ B × A
• A ∩ B ⇔ B ∩ A
• A ∪ B ⇔ B ∪ A
• (A - B) is not the same as (B - A)
• σc1(σc2(A)) ⇔ σc2(σc1(A)) ⇔ σc1 ∧ c2(A)
• πa1(A) ⇔ πa1(πa1,etc(A)), where etc represents any other attributes of A.
• Many other equivalences exist.
While equivalent expressions always give the same result, some may be much easier to evaluate
than others. When a query is submitted to the DBMS, its query optimizer tries to find the
most efficient equivalent expression before evaluating it.
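The selection equivalence can be checked on a small relation with the following Python sketch. The helper select and the two conditions c1 and c2 are hypothetical; the rows reuse the Name, Rank and Age values from the earlier examples.

```python
A = [
    {"name": "Mukta", "rank": 1, "age": 34},
    {"name": "Satvik", "rank": 2, "age": 25},
    {"name": "Aryan", "rank": 3, "age": 18},
]

def select(condition, relation):
    """Selection: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]

def c1(t):
    return t["age"] > 20

def c2(t):
    return t["rank"] < 2

nested_c1_c2 = select(c1, select(c2, A))             # c2 applied first, then c1
nested_c2_c1 = select(c2, select(c1, A))             # c1 applied first, then c2
combined = select(lambda t: c1(t) and c2(t), A)      # both conditions at once

print(nested_c1_c2 == nested_c2_c1 == combined)
```

All three forms return the same tuples; an optimizer is free to pick whichever is cheapest to evaluate.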
6.3.7 Comparing Relational Algebra and SQL
Relational Algebra: The result of every expression is a relation. It has a rigorous foundation and
simple semantics, and it is used for reasoning about queries, query optimization etc. Relational
algebra is the mathematical basis for relational databases, developed by E.F. Codd. It is a kind of
set theory that gives a solid, provable framework for designing software that must manage large
amounts of data.
SQL: The Structured Query Language (SQL) is the common language of most database software
such as MySQL, PostgreSQL, Oracle, DB2, etc. The language translates relational theory into
practice, but imperfectly: SQL is a loose implementation of relational theory and has been
further modified in practice by each Relational Database Management System (RDBMS) that uses
it. It is a superset of relational algebra. It has convenient formatting features and provides
aggregate functions, but it has complicated semantics.
It is an end-user language.
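As an illustration of the aggregate functions that SQL adds, Query 5 (employees with more than two dependents) can be written with GROUP BY and HAVING. The sketch below runs the SQL through Python's built-in sqlite3 module, which is an assumption made here for convenience (the text itself targets Oracle); the reduced table layouts and all rows other than Amit Goel and Kavish are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE dependent (essn TEXT, dependent_name TEXT);
INSERT INTO employee VALUES ('12345', 'Amit', 'Goel'), ('23456', 'Ritu', 'Sharma');
INSERT INTO dependent VALUES
  ('12345', 'Kavish'), ('12345', 'Isha'), ('12345', 'Dev'),
  ('23456', 'Tara');
""")

# COUNT is evaluated per employee; HAVING keeps the groups with more than two dependents.
rows = conn.execute("""
    SELECT e.lname, e.fname
    FROM employee AS e
    JOIN dependent AS d ON d.essn = e.ssn
    GROUP BY e.ssn, e.lname, e.fname
    HAVING COUNT(d.dependent_name) > 2
""").fetchall()
print(rows)
```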
6.4 Summary
Most commercial relational database systems offer a query language, that is, a language in which
a user requests information from the database. Query languages are of two types: Procedural
Query Languages and Non-Procedural Query Languages. Relational Algebra is a procedural query
language. It is an offshoot of first-order logic and deals with a set of relations closed under
its operators: each operator operates on one or more relations to yield a relation. Relational
algebra is pure mathematics, an algebraic structure grounded in mathematical logic and set
theory.
6.6 Self Assessment Questions (SAQ)
1. What is Relational Algebra? How many types of operations does it support? Define each.
2. Define the terms unary operation and binary operation.
3. What do you mean by Relational Algebra? Discuss the various relational-oriented
operations. Explain with the help of examples.
4. What are the uses of Relational Algebra? Explain the different set-oriented operations
with the help of examples.
5. Discuss equivalence in relational algebra. How will you compare relational algebra
with SQL?
Chapter – 7: An Introduction to SQL
Structure:
7.1 Introduction
7.2 Objective
7.3.1 About SQL
7.3.2 SQL Data Types
7.3.3 SQL Commands
(i) Data Definition Language (DDL)
(ii) Data Manipulation Language (DML)
(iii) Transaction Control Language (TCL)
(iv) Data Control Language (DCL)
7.3.4 Views in SQL
(i) Creating Views
(ii) Updating a View
7.3.5 Queries and Sub-Query in SQL
7.3.6 Constraints in SQL
7.3.7 SQL Indexes
7.3.8 Sample SQL queries examples
7.4 Summary
7.1 Introduction
SQL is a data sublanguage used to organize, manage and retrieve data from a relational database,
which is managed by an RDBMS. The origin of SQL and the development of relational databases
followed the same evolutionary path. Dr. E.F. Codd, an IBM researcher, developed the relational
database concept in June 1970. SQL was conceived at the IBM San Jose Research Laboratory in the
mid 1970s as a database language for the new relational database model. In the late 1970s IBM
was preparing to develop a relational database system, SQL/DS. Upon the news of this
development, vendors rushed to develop their own RDBMS products. A small company, Relational
Software Incorporated, beat IBM to the market with its own RDBMS. Relational Software
Incorporated later became Oracle Corporation.
7.2 Objective
SQL gives you everything you need to create, maintain and control your database. Some users
will never have to create a database and will be concerned only with the querying facilities of
SQL found in the DML. Others will not only need to create a database but will also have to
maintain and administer it. For these users, SQL provides DDL, DML and DCL.
7.3.1 About SQL
SQL stands for Structured Query Language. SQL is used to communicate with a database.
According to ANSI (American National Standards Institute), it is the standard language for
relational database management systems. SQL statements are used to perform tasks such as
updating data in a database or retrieving data from a database. Some common relational database
management systems that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, Ingres,
etc.
7.3.2 SQL Data Types
Oracle stores information in tables made up of rows and columns. Each column can contain only
one type of data, which we must define. A data type is an attribute that specifies the type
of data that the object can hold. The data types fall into the following categories:
Data type                      Description
char(size)                     Fixed-length character string. Size is specified in parenthesis. Max 255 bytes.
varchar(size)/varchar2(size)   Variable-length character string. Max size is specified in parenthesis.
number(size)                   Number value with a max number of column digits specified in parenthesis.
date                           Date value.
number(size,d)                 Number value with a maximum number of digits of "size" total, with a
                               maximum number of "d" digits to the right of the decimal.
7.3.3 SQL Commands
SQL commands are instructions, coded into SQL statements, which are used to communicate
with the database to perform specific tasks, functions and queries with data.
SQL commands can be used not only for searching the database but also for various other
functions: for example, you can create tables, add data to tables, modify data, drop tables,
and set permissions for users. SQL commands are grouped into four major categories
depending on their functionality:
(i) Data Definition Language (DDL) - These SQL commands are used for creating,
modifying, and dropping the structure of database objects. The commands are CREATE,
ALTER, DROP, RENAME, and TRUNCATE.
• CREATE TABLE - creates a new database table.
• ALTER TABLE - alters a database table.
• DROP TABLE - deletes a database table.
• RENAME - renames an object.
• TRUNCATE - removes all records from a table and releases the space allocated for them.
(ii) Data Manipulation Language (DML) - These SQL commands are used for storing,
retrieving, modifying, and deleting data. These Data Manipulation Language commands
are: SELECT, INSERT, UPDATE, and DELETE.
• SELECT - gets data from a database table.
• INSERT INTO - inserts new data in a database table.
• UPDATE - changes data in a database table.
• DELETE - removes data from a database table.
(iii) Transaction Control Language (TCL) - These SQL commands are used for managing
changes affecting the data. These commands are COMMIT, ROLLBACK, and
SAVEPOINT.
• COMMIT - saves the work done.
• ROLLBACK - restores the database to its state at the last commit.
• SAVEPOINT - identifies a point in a transaction to which you can later roll back.
(iv) Data Control Language (DCL) - These SQL commands are used for providing security
to database objects. These commands are GRANT and REVOKE.
• GRANT - allows specified users to perform specified tasks.
• REVOKE - cancels previously granted permissions.
(i) Data Definition Language (DDL)
(a) CREATE TABLE STATEMENT
The create table statement is used to create a new table. Here is the format of a simple create
table statement:
Syntax:
create table "tablename"
("column1" "data type",
"column2" "data type",
"column3" "data type");
Format of create table if you were to use optional constraints:
create table "tablename"
("column1" "data type" [constraint],
"column2" "data type" [constraint],
"column3" "data type" [constraint]);
Note: You may have as many columns as you'd like, and the constraints are optional ([ ] =
optional).
To create a new table, enter the keywords create table followed by the table name, followed by
an open parenthesis, followed by the first column name, the data type for that column and any
optional constraints, then the remaining column definitions, and finally a closing parenthesis.
Make sure you use an open parenthesis before the first column definition and a closing
parenthesis after the last column definition, and separate each column definition with a comma.
All SQL statements should end with a ";".
The table and column names must start with a letter and can be followed by letters, numbers, or
underscores - not to exceed a total of 30 characters in length. Do not use any SQL reserved
keywords as names for tables or column names (such as "select", "create", "insert", etc).
Data types specify what the type of data can be for that particular column. If a column called
"Last_Name", is to be used to hold names, then that particular column should have a "varchar/
varchar2" (variable-length character) data type.
What are constraints? When tables are created, it is common for one or more columns to have
constraints associated with them. A constraint is basically a rule associated with a column that
the data entered into that column must follow. For example, a "primary key" constraint specifies
that no two records can have the same value in a particular column. They must all be unique and
cannot have null values. The other two most popular constraints are "not null" which specifies
that a column can't be left blank, and "unique key".
Example:
create table employee
(first varchar(15),
last varchar(20),
age number(3),
address varchar(30),
city varchar(20),
state varchar(20));
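The create table example can be tried out without an Oracle installation. The following sketch runs it through Python's built-in sqlite3 module, which is an assumption made here purely for experimentation: SQLite accepts the same column definitions but does not enforce the declared sizes. The column names first and last are double-quoted because FIRST and LAST are keywords in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("""
    create table employee
    ("first"  varchar(15),
     "last"   varchar(20),
     age      number(3),
     address  varchar(30),
     city     varchar(20),
     state    varchar(20))
""")

# PRAGMA table_info is SQLite-specific; each row holds (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]
for name, declared_type in columns:
    print(name, declared_type)
```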
(b) ALTER TABLE STATEMENT
The ALTER TABLE command can be used to add, delete or modify columns in an existing table.
When you add a column you must specify a data type.
To add a column to an existing table:
Syntax:
ALTER TABLE table_name
ADD (column_name datatype(size));
Example:
alter table employee
add (phone_no number(10));
To drop a column from an existing table:
Syntax:
ALTER TABLE table_name
DROP COLUMN column_name;
Example:
alter table employee
drop column phone_no;
To modify the size of an existing column of a table:
Syntax:
ALTER TABLE table_name
MODIFY (column_name datatype(size));
Example:
alter table employee
modify (phone_no number(12));
(c) DROP TABLE STATEMENT
To remove an entire table from the database, we use DROP command.
Syntax:
DROP TABLE table_name;
Example:
drop table employee;
The drop table command is used to delete a table and all rows in the table. To delete an entire
table including all of its rows, issue the drop table command followed by the table_name. Drop
table is different from deleting all of the records in the table. Deleting all of the records in the
table leaves the table including column and constraint information. Dropping the table removes
the table definition as well as all of its rows.
(d) RENAME TABLE STATEMENT
The rename command changes the name of a table to a new name. The data in the table is
not lost.
Syntax:
rename <old_table_name> to <new_table_name>;
Example:
rename employee to employee_master;
(e) TRUNCATE TABLE STATEMENT
The truncate command removes all rows from a table, but the table structure and its columns,
constraints, indexes and so on remain. In SQL, the truncate table command quickly removes all
data from a table, typically bypassing a number of integrity-enforcing mechanisms.
Syntax:
truncate table table_name;
Example:
truncate table employee;
(ii) Data Manipulation Language (DML)
(a) SELECT STATEMENT
The SELECT statement is used to query the database and retrieve the data that match the
criteria you specify:
Syntax:
SELECT column1 [, column2, ...]
FROM table_name
[WHERE clause]
[GROUP BY clause]
[HAVING clause]
[ORDER BY clause];
The where clause can include these operators:
• = Equal
• > Greater than
• < Less than
• >= Greater than or equal
• <= Less than or equal
• <> Not equal to
• LIKE Pattern matching operator
Note: The clauses in square brackets are optional ([ ] = optional); you may use as many of them
as you need.
Example:
select * from employee;
returns all the data from the employee table.
The SQL SELECT DISTINCT Statement
In a table, a column may contain many duplicate values; and sometimes you only want to list the
different (distinct) values. The DISTINCT keyword can be used to return only distinct (different)
values.
Syntax:
select distinct column_name, column_name from table_name;
Example:
Select distinct city from employee;
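The effect of DISTINCT can be seen in a small sketch run through Python's built-in sqlite3 module (an assumption for easy experimentation; the statements themselves are ordinary SQL). The reduced two-column employee table and its rows are hypothetical sample data, and the column first is double-quoted because FIRST is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table employee ("first" varchar(15), city varchar(20))')
conn.executemany(
    "insert into employee values (?, ?)",
    [("Satvik", "Kurukshetra"), ("Aryan", "Kurukshetra"), ("Mukta", "Yamuna Nagar")],
)

# Without DISTINCT every row contributes a value, so Kurukshetra appears twice.
all_cities = [row[0] for row in conn.execute("select city from employee")]

# With DISTINCT each city is listed once.
distinct_cities = [
    row[0] for row in conn.execute("select distinct city from employee order by city")
]
print(all_cities, distinct_cities)
```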
(b) INSERT STATEMENT
The insert statement is used to insert or add a row of data into the table. There are two basic
forms of the INSERT INTO statement:
Syntax1:
insert into "tablename"
(first_column,...last_column)
values (first_value,...last_value);
In Example 1 below, the column name first will match up with the value 'Satvik', and the column
name state will match up with the value 'Haryana'.
Syntax2:
You need not specify the column names in the SQL query if you are adding values for
all the columns of the table, but make sure the values are in the same order as the
columns in the table. The SQL INSERT INTO syntax is then as follows:
insert into "tablename" values (first_value, second_value,...last_value);
Example 1:
insert into employee
(first, last, age, address, city, state)
values ('Satvik', 'Garg', 25, '1489 Sector 3','Kurukshetra', 'Haryana');
insert into employee
(first, last, age, address, city, state)
values ('Mukta', 'Garg', 29, '107 ward 8','Yamuna Nagar', 'Haryana');
Example 2:
insert into employee
values ('Satvik', 'Garg', 25, '1489 Sector 3','Kurukshetra', 'Haryana');
insert into employee
values ('Mukta', 'Garg', 29, '107 ward 8','Yamuna Nagar', 'Haryana');
Note: All strings should be enclosed between single quotes: 'string'
To insert records into a table, enter the key words insert into followed by the table name,
followed by an open parenthesis, followed by a list of column names separated by commas,
followed by a closing parenthesis, followed by the keyword values, followed by the list of values
enclosed in parenthesis. The values that you enter will be held in the rows and they will match up
with the column names that you specify. Strings should be enclosed in single quotes, and
numbers should not.
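Both forms of the insert statement can be exercised with the sketch below, which uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption for experimentation only). The columns first and last are double-quoted because FIRST and LAST are keywords in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table employee
    ("first" varchar(15), "last" varchar(20), age number(3),
     address varchar(30), city varchar(20), state varchar(20))
""")

# Form 1: explicit column list; values are matched to the listed columns.
conn.execute("""
    insert into employee ("first", "last", age, address, city, state)
    values ('Satvik', 'Garg', 25, '1489 Sector 3', 'Kurukshetra', 'Haryana')
""")

# Form 2: no column list; values must follow the column order of the table.
conn.execute("""
    insert into employee
    values ('Mukta', 'Garg', 29, '107 ward 8', 'Yamuna Nagar', 'Haryana')
""")

rows = conn.execute('select "first", age, state from employee order by age').fetchall()
print(rows)
```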
(c) UPDATE STATEMENT
The update statement is used to update or change records that match specified criteria. This is
accomplished by carefully constructing a where clause.
Syntax1:
update "tablename"
set "columnname" = "newvalue"
[,"nextcolumn" = "newvalue2"...]
where "columnname" = "value";
The where clause is used only when we want to update the rows that satisfy a specified
condition. If we want to update the column value in all the rows of the table, the where
clause is not required.
Syntax2:
update "tablename"
set "columnname" = "newvalue"
[,"nextcolumn" = "newvalue2"...];
Example1:
update employee
set first = 'Aryan'
where city = 'Kurukshetra';
Note: Example 1 replaces the value of the attribute first with Aryan only in those tuples whose
city is Kurukshetra, since only they match the condition.
Example2:
update employee
set first = 'Aryan';
Note: Example 2 replaces the value of the attribute first with Aryan in all tuples.
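The difference between an update with and without a where clause shows up in the number of rows changed. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (an assumption); the two-column table and its rows are hypothetical, and first is double-quoted because FIRST is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table employee ("first" varchar(15), city varchar(20))')
conn.executemany(
    "insert into employee values (?, ?)",
    [("Satvik", "Kurukshetra"), ("Mukta", "Yamuna Nagar")],
)

# With a where clause only the matching rows are changed.
cursor = conn.execute("""update employee set "first" = 'Aryan' where city = 'Kurukshetra'""")
changed_with_where = cursor.rowcount

# Without a where clause every row in the table is changed.
cursor = conn.execute("""update employee set "first" = 'Aryan'""")
changed_without_where = cursor.rowcount

names = [row[0] for row in conn.execute('select "first" from employee')]
print(changed_with_where, changed_without_where, names)
```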
(d) Delete Statement
The delete statement is used to delete records or rows from the table.
Syntax:
delete from "tablename"
[where "columnname" = "value"];
Note: You may have where clause as optional [ ].
Example:
delete from employee
where city = 'Kurukshetra';
Note: If you leave off the where clause, all records will be deleted.
delete from employee;
To delete rows from a table, enter "delete from" followed by the table name. If the delete
statement includes a where clause, only those rows for which the where condition is true are
deleted.
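A short sketch of both forms of delete follows, again using Python's built-in sqlite3 module as an assumed stand-in engine and a hypothetical reduced employee table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (emp_name varchar(15), city varchar(20))")
conn.executemany(
    "insert into employee values (?, ?)",
    [("Satvik", "Kurukshetra"), ("Aryan", "Kurukshetra"), ("Mukta", "Yamuna Nagar")],
)

# Only the rows that satisfy the where condition are removed.
conn.execute("delete from employee where city = 'Kurukshetra'")
after_conditional_delete = conn.execute("select count(*) from employee").fetchone()[0]

# No where clause: every row is removed, but the table itself still exists.
conn.execute("delete from employee")
after_full_delete = conn.execute("select count(*) from employee").fetchone()[0]
print(after_conditional_delete, after_full_delete)
```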
(iii) Transaction Control Language (TCL)
(a) Commit
The commit statement saves all changes made to the database since the last commit or rollback
command. In Oracle, changes made to the database are not permanent until you tell Oracle to
make them permanent. The commit statement makes permanent any change made to the database
during the current session/transaction.
Syntax:
Commit;
Example:
Commit;
(b) Rollback
The rollback statement is the reverse of the commit statement. It undoes some or all of the
database changes made during the current transaction, that is, changes that have not already
been saved to the database. The rollback command can only undo changes made since the last
Commit or Rollback command was issued.
Syntax:
Rollback;
Example:
Rollback;
(c) Savepoint
A Savepoint is a special mark inside a transaction that allows all commands executed after it
was established to be rolled back, restoring the transaction state to what it was at the time
of the Savepoint. The Savepoint statement defines a Savepoint within a transaction. Changes
made after a Savepoint can be undone at any time prior to the end of the transaction. A
transaction can have multiple Savepoints.
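The interplay of commit, rollback and savepoint can be followed in the sketch below. It uses Python's built-in sqlite3 module rather than Oracle (an assumption; SQLite needs an explicit begin where Oracle starts a transaction implicitly). The table account and the savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None switches the module to autocommit mode, so the
# transaction statements below are sent to the engine exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table account (name text, balance integer)")

conn.execute("begin")
conn.execute("insert into account values ('Mukta', 100)")
conn.execute("commit")                                  # Mukta's row is now permanent

conn.execute("begin")
conn.execute("insert into account values ('Satvik', 200)")
conn.execute("savepoint after_satvik")
conn.execute("insert into account values ('Aryan', 300)")
conn.execute("rollback to savepoint after_satvik")      # undoes only Aryan's row
conn.execute("commit")

conn.execute("begin")
conn.execute("delete from account")
conn.execute("rollback")                                # the delete is undone

names = [row[0] for row in conn.execute("select name from account order by name")]
print(names)
```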
(iv) Data Control Language (DCL)
(a) Grant
This statement deals with who has access to your database. The GRANT command enables you
to grant privileges to users. You can grant privileges for viewing, adding, deleting, referencing
and using a table.
If you decide that Mukta can see the tables in your database, you would use the GRANT SELECT
statement. If you desire, you can also allow her to insert or update data in your tables. The
command you would use is GRANT INSERT or GRANT UPDATE.
Syntax:
Grant privilege on object to user;
Example:
Grant Select on Book to Mukta; (Here Book is the table on which the privilege has been granted
to Mukta.)
(b) Revoke
Whatever you grant, however, you may also revoke. The REVOKE statement enables you to
take away privileges you previously granted.
Syntax:
Revoke privilege on object from user;
Example:
Revoke Select on Book from Mukta; (Here Book is the table on which the privilege has been
revoked from Mukta.)
7.3.4 Views in SQL
A view is nothing more than a SQL statement that is stored in the database with an associated
name. A view is actually a composition of a table in the form of a predefined SQL query. A view
can contain all rows of a table or selected rows from a table. A view can be created from one or
many tables, depending on the SQL query written to create the view.
Views, which are a kind of virtual table, allow users to do the following:
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data such that a user can see and (sometimes) modify exactly what
they need and no more.
• Summarize data from various tables, which can be used to generate reports.
(i) Creating Views:
Database views are created using the CREATE VIEW statement. Views can be created from a
single table, multiple tables, or another view. To create a view, a user must have the appropriate
system privilege according to the specific implementation.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
Note: You can include multiple tables in your SELECT statement in very similar way as you use
them in normal SQL SELECT query.
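The sketch below creates and queries a view through Python's built-in sqlite3 module (an assumed stand-in for Oracle). The view name kurukshetra_staff, the reduced employee table and its rows are hypothetical; first is double-quoted because FIRST is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table employee ("first" varchar(15), age number(3), city varchar(20))')
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [("Satvik", 25, "Kurukshetra"), ("Mukta", 29, "Yamuna Nagar")],
)

# The view stores the query, not a copy of the rows.
conn.execute("""
    create view kurukshetra_staff as
    select "first", age from employee where city = 'Kurukshetra'
""")
before = conn.execute('select "first", age from kurukshetra_staff order by age').fetchall()

# A row added to the base table afterwards is visible through the view as well.
conn.execute("insert into employee values ('Aryan', 18, 'Kurukshetra')")
after = conn.execute('select "first", age from kurukshetra_staff order by age').fetchall()
print(before, after)
```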
(ii) Updating a View:
A view can be updated under certain conditions:
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
• All NOT NULL columns from the base table must be included in the view in order for
the INSERT query to function.
So if a view satisfies all the above-mentioned rules then you can update a view.
7.3.5 Queries and Sub-Query in SQL
A query is a means of retrieving meaningful information from the database. There are different
ways to execute a query, as discussed earlier in this chapter. A subquery (also called an inner
query or nested query) is a query within another SQL query, embedded within the WHERE clause.
A sub-query is used to return data that will be used in the main query as a condition to further
restrict the data to be retrieved.
Sub-queries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with operators such as =, <, >, >=, <=, IN, BETWEEN etc.
There are a few rules that sub-queries must follow:
• Sub-queries must be enclosed within parentheses.
• A sub-query can have only one column in the SELECT clause, unless multiple columns
are in the main query for the sub-query to compare its selected columns.
• An ORDER BY clause cannot be used in a sub-query, although the main query can use
an ORDER BY. The GROUP BY clause can be used to perform the same function as the
ORDER BY in a sub-query.
• Sub-queries that return more than one row can only be used with multiple value
operators, such as the IN operator.
• The SELECT list cannot include any references to values that evaluate to a BLOB,
ARRAY, CLOB, or NCLOB.
• A sub-query cannot be immediately enclosed in a set function.
• The BETWEEN operator cannot be used with a sub-query; however, the BETWEEN
operator can be used within the sub-query.
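A sub-query that returns several rows and is therefore combined with the IN operator might look like the sketch below. It runs through Python's built-in sqlite3 module (an assumed stand-in engine); the tables emp and job are reduced, hypothetical versions of the sample tables, and the job descriptions are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table job (job_code char(3), job_description varchar(25));
create table emp (emp_num char(3), emp_lname varchar(15), job_code char(3));
insert into job values ('500', 'Programmer'), ('501', 'Systems Analyst'),
                       ('502', 'Database Designer');
insert into emp values ('101', 'News', '502'), ('102', 'Senior', '501'),
                       ('103', 'Arbough', '500');
""")

# The inner query returns two job codes, so the outer query compares with IN, not =.
rows = conn.execute("""
    select emp_lname
    from emp
    where job_code in (select job_code
                       from job
                       where job_description like '%Analyst%'
                          or job_description like '%Designer%')
    order by emp_lname
""").fetchall()
print(rows)
```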
7.3.6 Constraints in SQL
SQL constraints are rules that restrict the values a column may hold; they must remain true
whenever data in the column is inserted, updated or deleted. Constraints can be specified when
the table is first created with the CREATE TABLE statement, or later, when the structure of an
existing table is modified with the ALTER TABLE statement.
SQL constraints are used to implement the rules of the table. If any action violates a rule
defined by a constraint, the action is aborted.
There are 6 types of constraints:
1. Primary key constraint.
2. Foreign Key constraint.
3. Unique Key constraint.
4. Not Null constraint.
5. Check constraint.
6. Default constraint.
Description of the above constraints is as follows:
(i) Primary Key: The primary key of a relational table uniquely identifies each record in the
table. It can be a normal attribute that is guaranteed to be unique: in a school, for example,
two students may have the same name, but no two students have the same roll number.
(ii) Foreign Key: One of the most important concepts in databases is creating relationships
between database tables. These relationships provide a mechanism for linking data stored in
multiple tables and retrieving it in an efficient manner. In order to create a link between two
tables we must specify a foreign key in one table that references a column in another table.
(iii) Unique Key: A unique key constraint ensures that there are no duplicate values in a
column. Both unique key and primary key constraints enforce the uniqueness of a column, but
there is one difference between them: in SQL Server, a unique key constraint allows one null
value, whereas a primary key does not allow null values. A table can have only one primary key
but more than one unique key.
(iv) Not Null Constraint: A not null constraint restricts the insertion of null values in a
column. It is used for columns whose value must always be supplied.
(v) Check Constraint: This constraint checks a value at the time of insertion. For example, the
salary of an employee must always be greater than zero, so we can create a check constraint on
the employee table that accepts only salary values greater than zero.
(vi) Default Constraint: The default constraint sets a specific value for a column when no
value is passed for it at the time of insertion. Through this constraint we set the default
value of the column.
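The constraints described above can be seen rejecting bad rows in the following sketch, which uses Python's built-in sqlite3 module as an assumed stand-in for Oracle. The student table and its columns are hypothetical. The foreign key constraint is left out because SQLite enforces it only after an extra configuration step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student (
        roll_no integer primary key,
        name    varchar(30) not null,
        email   varchar(40) unique,
        fee     number(7,2) check (fee > 0),
        city    varchar(20) default 'Kurukshetra'
    )
""")
conn.execute(
    "insert into student (roll_no, name, email, fee) values (1, 'Mukta', 'm@example.com', 500)"
)

# Each statement below breaks exactly one constraint.
bad_rows = {
    "primary key": "insert into student (roll_no, name, fee) values (1, 'Aryan', 100)",
    "not null": "insert into student (roll_no, name, fee) values (2, NULL, 100)",
    "unique": "insert into student (roll_no, name, email, fee) "
              "values (3, 'Aryan', 'm@example.com', 100)",
    "check": "insert into student (roll_no, name, fee) values (4, 'Aryan', 0)",
}
rejected = []
for label, statement in bad_rows.items():
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError:
        rejected.append(label)   # the engine aborted the offending insert

# The city column was not supplied for roll_no 1, so the default was used.
default_city = conn.execute("select city from student where roll_no = 1").fetchone()[0]
row_count = conn.execute("select count(*) from student").fetchone()[0]
print(rejected, default_city, row_count)
```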
7.3.7 SQL Indexes
An index in SQL is created on an existing table to retrieve rows quickly. When there are
thousands of records in a table, retrieving information can take a long time. Therefore indexes
are created on columns which are accessed frequently, so that the information can be retrieved
quickly. Indexes can be created on a single column or a group of columns. When an index is
created, the indexed column values are stored in sorted order together with the ROWID of each
row.
Syntax to create Index:
CREATE INDEX index_name
ON table_name (column_name1,column_name2...);
In Oracle there are two types of SQL index, namely implicit and explicit.
(i) Implicit Indexes:
They are created automatically when a column is defined with a PRIMARY KEY or UNIQUE KEY
constraint.
(ii) Explicit Indexes:
They are created using the "create index.. " syntax.
NOTE:
1) Even though SQL indexes are created to access the rows in the table quickly, they slow down
DML operations like INSERT, UPDATE, DELETE on the table, because both the indexes and the
table are updated when a DML operation is performed. So, use indexes only on columns
which are used to search the table frequently.
2) It is not necessary to create indexes on tables that hold little data.
3) In oracle database you can define up to sixteen (16) columns in an INDEX.
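Implicit and explicit indexes can be observed in the sketch below. It uses Python's built-in sqlite3 module, not Oracle, so the catalogue table sqlite_master and the automatic index name prefix are SQLite-specific assumptions; the index name idx_employee_city and the table layout are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table employee
    (emp_num char(3) primary key, emp_lname varchar(15), city varchar(20))
""")

# Explicit index on a column that is searched frequently.
conn.execute("create index idx_employee_city on employee (city)")

# List every index that now exists on the table.
index_names = [
    row[0]
    for row in conn.execute(
        "select name from sqlite_master "
        "where type = 'index' and tbl_name = 'employee' order by name"
    )
]
print(index_names)
```

The list contains the explicit index plus one that the engine created on its own to enforce the primary key.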
7.3.8 Sample SQL Queries Examples
Specimen of Table Structure
[Figure: relational diagram and sample contents of the tables EMPLOYEE, PAINTER, JOB,
ASSIGNMENT and PROJECT]
Write the SQL code that will create the table structure for a table named EMP_1. This
table is a subset of the EMPLOYEE table as shown above.
CREATE TABLE EMP_1 (
EMP_NUM CHAR(3) PRIMARY KEY,
EMP_LNAME VARCHAR(15) NOT NULL,
EMP_FNAME VARCHAR(15) NOT NULL,
EMP_INITIAL CHAR(1),
EMP_HIREDATE DATE,
JOB_CODE CHAR(3),
FOREIGN KEY (JOB_CODE) REFERENCES JOB);
Write the SQL code to enter the first two rows for EMP_1 table.
INSERT INTO EMP_1 VALUES ('101', 'News', 'John', 'G', '08-Nov-00', '502');
INSERT INTO EMP_1 VALUES ('102', 'Senior', 'David', 'H', '12-Jul-89', '501');
After inserting multiple rows in table EMP_1, the records in the table are shown as below:
Assuming that the data shown in the EMP_1 table have been entered, write the SQL code
that will list all attributes for a job code of 502.
SELECT *
FROM EMP_1
WHERE JOB_CODE = '502';
Write the SQL code that will save the changes made to the EMP_1 table.
COMMIT;
Write the SQL code to change the job code to 501 for the person whose personnel number
is 107. After you have completed the task, examine the results, and then reset the job code
to its original value.
UPDATE EMP_1
SET JOB_CODE = '501'
WHERE EMP_NUM = '107';
To see the changes:
SELECT *
FROM EMP_1
WHERE EMP_NUM = '107';
To reset, use
ROLLBACK;
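The COMMIT / UPDATE / ROLLBACK sequence can be tried without an Oracle installation. The following is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in database. It loads only the two rows given in the INSERT statements above (the remaining rows, including employee 107, appear only in the figure), so the update is applied to employee 101 instead. The foreign key to JOB is left out because the JOB table is not defined here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMP_1 (
        EMP_NUM      CHAR(3) PRIMARY KEY,
        EMP_LNAME    VARCHAR(15) NOT NULL,
        EMP_FNAME    VARCHAR(15) NOT NULL,
        EMP_INITIAL  CHAR(1),
        EMP_HIREDATE DATE,
        JOB_CODE     CHAR(3)
    )
""")
conn.executemany(
    "INSERT INTO EMP_1 VALUES (?, ?, ?, ?, ?, ?)",
    [("101", "News", "John", "G", "08-Nov-00", "502"),
     ("102", "Senior", "David", "H", "12-Jul-89", "501")],
)
conn.commit()  # COMMIT: the two rows are now permanent


def job_code(emp_num):
    """Return the current JOB_CODE of one employee."""
    query = "SELECT JOB_CODE FROM EMP_1 WHERE EMP_NUM = ?"
    return conn.execute(query, (emp_num,)).fetchone()[0]


conn.execute("UPDATE EMP_1 SET JOB_CODE = '501' WHERE EMP_NUM = '101'")
after_update = job_code("101")    # the change is visible inside the transaction

conn.rollback()                   # ROLLBACK: undo everything since the last COMMIT
after_rollback = job_code("101")  # back to the committed value

print(after_update, after_rollback)  # 501 502
```

The rollback restores the job code only because the rows were committed before the UPDATE. Changes made before the last COMMIT cannot be undone this way.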
Write the SQL code to delete the row for the person named William Smithfield, who was
hired on June 22, 2004 and whose job code classification is 500.
DELETE FROM EMP_1
WHERE EMP_LNAME = 'Smithfield'
AND EMP_FNAME = 'William'
AND EMP_HIREDATE = '22-Jun-04'
AND JOB_CODE = '500';
Write the SQL code that will restore the data to its original status; that is, the table should
contain the data that existed before you made the changes in the two preceding exercises.
ROLLBACK;
Write the SQL code to create a copy of EMP_1, naming the copy EMP_2. Then write the
SQL code that will add the attributes EMP_PCT and PROJ_NUM to its structure. The
EMP_PCT is the bonus percentage to be paid to each employee. The new attribute
characteristics are:
EMP_PCT NUMBER (4,2)
PROJ_NUM CHAR (3)
There are two ways to get this job done. The two possible solutions are shown next.
Solution A:
CREATE TABLE EMP_2 (
EMP_NUM CHAR(3) NOT NULL,
EMP_LNAME VARCHAR(15) NOT NULL,
EMP_FNAME VARCHAR(15) NOT NULL,
EMP_INITIAL CHAR(1),
EMP_HIREDATE DATE NOT NULL,
JOB_CODE CHAR(3) NOT NULL,
PRIMARY KEY (EMP_NUM),
FOREIGN KEY (JOB_CODE) REFERENCES JOB);
INSERT INTO EMP_2
SELECT EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_HIREDATE, JOB_CODE FROM EMP_1;
ALTER TABLE EMP_2
ADD (EMP_PCT NUMBER(4,2),
PROJ_NUM CHAR(3));
Solution B:
CREATE TABLE EMP_2 AS SELECT * FROM EMP_1;
ALTER TABLE EMP_2
ADD (EMP_PCT NUMBER(4,2),
PROJ_NUM CHAR(3));
Write the SQL code to enter an EMP_PCT value of 3.85 for the person whose employee
number (EMP_NUM) is 103.
UPDATE EMP_2
SET EMP_PCT = 3.85
WHERE EMP_NUM = '103';
Using a single command sequence, write the SQL code that will enter the project number
(PROJ_NUM) = 18 for all employees whose job classification (JOB_CODE) is 500.
UPDATE EMP_2
SET PROJ_NUM = '18'
WHERE JOB_CODE = '500';
Using a single command sequence, write the SQL code that will enter the project number
(PROJ_NUM) = 25 for all employees whose job classification (JOB_CODE) is 502 or
higher.
UPDATE EMP_2
SET PROJ_NUM = '25'
WHERE JOB_CODE >= '502';
Assume that, after the above operations, the table looks as shown below:
Write the SQL code that will change the PROJ_NUM to 14 for those employees who were
hired before January 1, 1994 and whose job code is at least 501.
UPDATE EMP_2
SET PROJ_NUM = '14'
WHERE EMP_HIREDATE < '01-Jan-94'
AND JOB_CODE >= '501';
Write the SQL code required to list all employees whose last names start with Smith. In
other words, the rows for both Smith and Smithfield should be included in the listing.
Assume case sensitivity.
SELECT *
FROM EMP_2
WHERE EMP_LNAME LIKE 'Smith%';
7.4 Summary
The Structured Query Language (SQL) is one of the most powerful tools available today for
working with data sets and retrieving the information you need from databases. SQL differs from
procedural programming in that you state what information you want from the database without
specifying how that information is to be retrieved. The SQL engine takes care of working with
the stored data and returns the information in the form you requested.
7.6 Self Assessment Questions (SAQ)
1. What are SQL commands? Discuss the different components of SQL with syntax and
suitable examples.
2. What are SQL DDL commands? How are integrity constraints enforced in SQL?
3. What is the difference between the DROP and DELETE commands? Explain with examples.
4. Differentiate between the ALTER and UPDATE commands of SQL.
5. Explain the GRANT and REVOKE commands with suitable examples.
6. What are views? How can you create them?
7. What do you mean by constraints? Discuss the different types of constraints.
8. Write a note on indexes.
Chapter – 8: Functional Dependencies
Structure:
8.1 Introduction
8.2 Objective
8.3.1 Functional Dependency
8.3.2 Importance of Dependencies
8.3.3 Types of Functional Dependency
8.3.4 Closure Set of Functional Dependency
8.3.5 Armstrong's Axioms
8.3.6 Minimal Functional Dependencies or Irreducible Set of Dependencies
8.4 Summary
8.1 Introduction
The purpose of database design is to arrange the data fields into an organized structure that
captures the relationships among them and stores information without unnecessary redundancy.
In fact, redundancy and consistency are the most important logical criteria in database design.
A bad database design may result in repeated data and in an inability to represent the desired
information. It is therefore important to examine the relationships that exist among the data of
an entity in order to refine the database design. This chapter discusses functional dependency
concepts, which help to achieve minimum redundancy without compromising easy retrieval of data
and information from the database.
8.2 Objective
To develop a good description of the data, its relationships and its constraints, it is necessary
to understand the concept of functional dependency (FD). Functional dependencies help to produce
a stable set of relations that is a faithful model of the enterprise, and such models are highly
flexible. They also help in reducing redundancy, saving storage space and maintaining consistency
among the data. Database dependencies are important to understand because they provide the basic
building blocks used in database normalization.
8.3.1 Functional Dependency
A functional dependency in a database serves as a constraint between two sets of attributes.
Defining functional dependencies is an important part of relational database design and
contributes to normalization.
A functional dependency is a relationship that exists when one attribute (or set of attributes)
uniquely determines another attribute. If R is a relation with attributes X and Y, a functional
dependency between the attributes is represented as X → Y, which specifies that Y is functionally
dependent on X. Here X is the determinant set and Y is the dependent attribute. Each value of X is
associated with precisely one value of Y.
Functional dependencies are used to define third normal form and Boyce-Codd normal form in
normalization. Normalizing on the basis of functional dependencies preserves the dependencies
between attributes and eliminates the repetition of information. Functional dependency is closely
related to the notion of a candidate key, which uniquely identifies a tuple and therefore
determines the values of all other attributes in the relation. A set of functional dependencies
is irreducible if:
• The right-hand side of every functional dependency contains only one attribute
• The left-hand side of every functional dependency cannot be reduced, since removing an
attribute from it would change what the set implies
• No functional dependency can be removed from the set without changing what the set implies
Functional dependency can be defined as follows:
An FD is a relationship between an attribute "Y" and a determinant (one or more other attributes)
"X" such that for a given value of the determinant "X" the value of the attribute "Y" is uniquely
defined.
Formally, the important properties of functional dependencies are as follows:
• The discussion assumes that the whole database is described by a single universal relation
schema R.
• Informal definition: A functional dependency is a constraint between two sets of attributes
from the database.
• Formal definition:
o A functional dependency, denoted by X → Y, between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuples that can form a relation
state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] =
t2[X], we must also have t1[Y] = t2[Y].
o That is, the values of the Y component of a tuple in r depend on (are determined by) the
values of the X component.
o Equivalently, there is a functional dependency from X to Y.
o The set of attributes X is called the left-hand side of the FD, and Y is called the
right-hand side.
o X functionally determines Y in a relation schema R if and only if, whenever two tuples of
r(R) agree on their X-value, they must necessarily agree on their Y-value.
• If X is a candidate key of R, then X → Y holds for any subset of attributes Y of R.
• The main use of functional dependencies is to describe a relation schema R further by
specifying constraints on its attributes that must hold at all times.
• Relation extensions r(R) that satisfy the functional dependency constraints are called legal
extensions (or legal relation states) of R.
• A functional dependency is a property of the relation schema (intension) R, not of a
particular legal relation state (extension) r of R. That is, a functional dependency must hold
for ALL the extensions of R.
In a functional dependency:
• X is a determinant
• X determines Y
• Y is functionally dependent on X
• The dependency is written X → Y
• X → Y is trivial if Y ⊆ X
As per Figure 8.1 given below, the statement "C# is a determinant of Cname, Ccity and Cphone" can
equally be stated as "Cname, Ccity and Cphone are functionally dependent on C#". Given a
particular value of C#, there exists precisely one corresponding value for each of Cname, Ccity
and Cphone. This is more clearly seen via the following functional dependency diagram:
Figure 8.1: Functional dependencies in the Transaction relation
Similarly, in Figure 8.2, "(C#, P#, Date) is a determinant of Qnt" can equally be stated as "Qnt
is functionally dependent on the set of attributes (C#, P#, Date)". The set of attributes is also
known as a composite attribute.
Figure 8.2: Functional dependency on a composite attribute
Example 1:
In a Student relation, Student_ID determines Student_address, because corresponding to one
Student_ID there will be only one address. Therefore the FD is written as:
Student_ID → Student_address
The converse is not true, because several students can live at the same address.
Example 2:
In an Employee relation, the Social Security Number (SSN) determines Employee Name and Salary,
because corresponding to one SSN there will be only one Emp_Name and Salary. Therefore the FD
is written as:
SSN → (Emp_Name, Salary)
Additionally, the above can be read as:
SSN → Emp_Name and SSN → Salary
The converse is not true, because several employees can have the same name and salary.
Example 3: In this relation
A B
1 1
2 4
3 9
4 16
2 4
7 9
8 10
for each value of A there is associated one and only one value of B. Hence
A → B
Now consider the following relation:
A B
1 1
2 4
3 9
4 16
2 4
7 9
8 9
As per the definition of functional dependency, "An attribute in a relational model is said to be
functionally dependent on another attribute in the table if it can take only one value for a given
value of the attribute upon which it is functionally dependent." In the rows shown, every value of
A is still associated with a single value of B, so A → B continues to hold. The converse
dependency B → A does not hold, because B = 9 is associated with more than one value of A
(A = 3, 7 and 8).
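Whether a set of rows satisfies a given FD can be checked mechanically by applying the definition directly: no two rows may agree on the left-hand side and differ on the right-hand side. The following is a minimal sketch (the function name fd_holds is an assumption of this example), run on the two tables of Example 3 exactly as printed.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}  # maps each lhs value to the rhs value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


first = [dict(A=a, B=b)
         for a, b in [(1, 1), (2, 4), (3, 9), (4, 16), (2, 4), (7, 9), (8, 10)]]
second = [dict(A=a, B=b)
          for a, b in [(1, 1), (2, 4), (3, 9), (4, 16), (2, 4), (7, 9), (8, 9)]]

print(fd_holds(first, ["A"], ["B"]))   # True
print(fd_holds(second, ["A"], ["B"]))  # True
print(fd_holds(second, ["B"], ["A"]))  # False: B = 9 occurs with A = 3, 7 and 8
```

Such a check can only refute a dependency. A result of True says that the rows at hand do not violate the FD, not that the FD is a property of the schema.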
Example: Consider the database having following tables:
Supplier Table
Sno Sname Status City
S1 Sumit 20 Panipat
S2 Ankit 10 Amritsar
S3 Amit 10 Amritsar
Part Table
Pno Pname Color Weight City
P1 Nut Red 12 Panipat
P2 Bolt Green 17 Amritsar
P3 Screw Blue 17 Amritsar
P4 Screw Red 14 Panipat
Shipment Table
SNo PNo Qty
S1 P1 270
S1 P2 300
S1 P3 700
S2 P1 270
S2 P2 700
S2 P3 300
Here in Supplier table
Sno - Supplier number of supplier that is unique
Sname - Supplier name
City - City of the supplier
Status - Status of the city e.g. A grade cities may have status 10, B grade cities
may have status 20 and so on.
Here, Sname is functionally dependent on Sno, because Sname can take only one value for a given
value of Sno (e.g. S1); in other words, there must be exactly one Sname for supplier number S1.
The FD is represented as:
Sno → Sname
The arrow (→) indicates that Sname is functionally dependent on Sno.
Similarly, City and Status are also functionally dependent on Sno, because for each value of Sno
there will be only one city and one status.
These FDs are represented as:
Sno → City
Sno → Status
S.Sno → S.(Sname, City, Status)
Consider another database of shipment with following attributes:
Sno - Supplier number of the supplier
Pno - Part number supplied by supplier
Qty - Quantity supplied by supplier for a particular Part no
In this case Qty is functionally dependent on the combination of Sno and Pno, because each
combination of Sno and Pno corresponds to only one quantity.
SP.(Sno, Pno) → SP.Qty
Dependency Diagrams
A dependency diagram consists of the attribute names and all functional dependencies in a given
table. The dependency diagram of the Supplier table is:
[Dependency diagram of the Supplier table: Sno, Sname, City, Status]
Here, the following functional dependencies exist in the Supplier table:
Sno → Sname
Sname → Sno
Sno → City
Sno → Status
Sname → City
Sname → Status
City → Status
The FD diagram of relation P (Part) is:
[Dependency diagram of the Part table: Pno determines Pname, Color and Weight]
Here, the following functional dependencies exist in the Part table:
Pno → Pname
Pno → Color
Pno → Weight
The FD diagram of relation Shipment is:
[Dependency diagram of the Shipment table: (Sno, Pno) determines Qty]
Here, the following functional dependency exists in the Shipment table:
SP.(Sno, Pno) → SP.Qty
The two most important things to remember about functional dependencies (FDs) are:
(1) FDs are determined by the meaning of the attributes and their role in the "real world" which
is being modeled by the database.
(2) FDs are in turn used to group the attributes together to form the normalized relations of the
database.
Consider the attributes:
C Class (of a course)
T Time for the class
R Room for the class
I Instructor of the class
These attributes describe the arrangement of rooms and times for classes taught by the
instructors. They are used to model part of the "real world" in which we have classes at certain
times in certain rooms, taught by certain instructors. We have certain constraints on the objects
(such as class, time, etc.) in the "real world", and such constraints are in turn represented in
terms of functional dependencies.
The constraints are:
(1) No two instructors teach the same course.
(2) At any time and in a given room, there is at most one class being taught there.
(3) No class can be taught at one given time in two rooms.
(4) No instructor can teach two classes at one given time.
The functional dependencies are:
(a) C → I
(b) TR → C
(c) CT → R
(d) IT → C
Given below is a database which has some tuples violating the above functional dependencies.
Class Instructor Time Room
CS-DE-14 KAVITA 1:30/ M 101
CS-DE-20 BABITA 3:00/ T 205
CS-DE-14 SUMIT 2:00/F 310 (a)
MCA-24-50 AMIT 3:00/T 205 (b)
CS-DE-20 BABITA 3:00/T 208 (c)
CS-DE-17 KAVITA 1:30/M 202 (d)
It is easy to see that the tuples marked (a), (b), (c), (d) violate the functional dependencies (a),
(b), (c), (d) respectively.
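The violations can also be located programmatically. The sketch below (the helper names project and violations are assumptions of this example) lists, for each of the four dependencies, the pairs of rows that agree on the left-hand side but differ on the right-hand side. Rows are numbered from 0 in the order of the table above, and the time values are written without the stray spaces.

```python
from itertools import combinations

COLUMNS = ("C", "I", "T", "R")  # Class, Instructor, Time, Room
ROWS = [
    ("CS-DE-14", "KAVITA", "1:30/M", "101"),
    ("CS-DE-20", "BABITA", "3:00/T", "205"),
    ("CS-DE-14", "SUMIT", "2:00/F", "310"),   # marked (a)
    ("MCA-24-50", "AMIT", "3:00/T", "205"),   # marked (b)
    ("CS-DE-20", "BABITA", "3:00/T", "208"),  # marked (c)
    ("CS-DE-17", "KAVITA", "1:30/M", "202"),  # marked (d)
]


def project(row, attrs):
    """Values of the given single-letter attributes in one row."""
    return tuple(row[COLUMNS.index(a)] for a in attrs)


def violations(rows, lhs, rhs):
    """Pairs of row positions that agree on lhs but differ on rhs."""
    return [(i, j)
            for (i, r1), (j, r2) in combinations(enumerate(rows), 2)
            if project(r1, lhs) == project(r2, lhs)
            and project(r1, rhs) != project(r2, rhs)]


for lhs, rhs in [("C", "I"), ("TR", "C"), ("CT", "R"), ("IT", "C")]:
    print(f"{lhs} -> {rhs}: {violations(ROWS, lhs, rhs)}")
# C -> I: [(0, 2)]
# TR -> C: [(1, 3)]
# CT -> R: [(1, 4)]
# IT -> C: [(0, 5)]
```

Each dependency is violated by exactly one pair of rows, and the second row of each pair is the tuple carrying the corresponding mark.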
8.3.2 Importance of Dependencies
Database dependencies are important to understand because they provide the basic building
blocks used in database normalization. For example:
• For a table to be in second normal form (2NF), no non-prime attribute in the table may be
functionally dependent on a proper subset of a candidate key.
• For a table to be in third normal form (3NF), every non-prime attribute must have a
non-transitive functional dependency on every candidate key.
• For a table to be in Boyce-Codd Normal Form (BCNF), the determinant of every functional
dependency (other than trivial dependencies) must be a super key.
• For a table to be in fourth normal form (4NF), the determinant of every non-trivial
multivalued dependency must be a super key.
8.3.3 Types of Functional Dependency:
Functional dependencies can be classified as follows:
(i) Full Functional Dependency:
When all the non-key attributes of a relation 'R' are dependent on the whole key, and not on just
a part of it, the functional dependency is called a full functional dependency.
The term full functional dependency is used to describe the minimum set of attributes in the
determinant of an FD. For the set of attributes Y to be fully functionally dependent on the set
of attributes X, the following must hold:
• Y is functionally dependent on X, and
• Y is not functionally dependent on any proper subset of X
Example 1:
Let A and B be two attributes (or sets of attributes) of a relation 'R', where B is a non-key
attribute which is functionally dependent on A (the key), but not on any proper subset of A,
i.e. A → B. If we remove any attribute from A, the dependency no longer holds.
Example 2:
Consider a relation Employee with attributes (Emp_id, Emp_name, Emp_addr, Emp_phone).
Here Emp_id is the primary key attribute and all other attributes are non-key attributes which
are fully functionally dependent on the primary key. We can say that every dependency in this
relation is a full functional dependency.
[Dependency diagram: Emp_id determines Emp_name, Emp_addr and Emp_phone]
Figure 8.1: Full Functional Dependency
(ii) Partial Functional Dependency:
A partial functional dependency is defined as follows: if A and B are attributes (or sets of
attributes) of a relation 'R', then B is partially functionally dependent on A (A → B) if some
attribute can be removed from A and the dependency still holds.
Example: Consider a relation Employee_project with attributes (Ecode, Pcode, Dept).
Ecode and Pcode together form the composite key and Dept is a non-key attribute. Here
(Ecode, Pcode) → Dept states that Dept is functionally dependent on the composite key. If
we remove Pcode from the composite key and Ecode → Dept still holds, then Dept is partially
functionally dependent on the key.
"Given a relation R, attribute B of R is fully functionally dependent on attribute A of R if it is
functionally dependent on A and not functionally dependent on any proper subset of A (the
distinction matters only when A is composite)."
Figure 8.3: Functional dependencies in the Transaction relation
For the Transaction relation, we may now say that:
Cname is fully functionally dependent on C#
Ccity is fully functionally dependent on C#
Cphone is fully functionally dependent on C#
Qnt is fully functionally dependent on (C#, P#, Date)
Cname is not fully functionally dependent on (C#, P#, Date), it is only partially dependent on it
(and similarly for Ccity and Cphone).
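The classification above can be reproduced from the two dependencies of the Transaction relation, C# → (Cname, Ccity, Cphone) and (C#, P#, Date) → Qnt. The sketch below is one possible way to do it, not a prescribed method: it computes the set of attributes determined by a candidate left-hand side and then tries every proper subset of that side. The function names closure and dependency_kind are assumptions of this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (pairs of attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(lhs, attr, fds):
    """Classify lhs -> attr as 'full', 'partial' or 'none' with respect to fds."""
    lhs = frozenset(lhs)
    if attr not in closure(lhs, fds):
        return "none"
    # Partial if some non-empty proper subset of lhs already determines attr.
    for size in range(1, len(lhs)):
        for subset in combinations(sorted(lhs), size):
            if attr in closure(subset, fds):
                return "partial"
    return "full"


FDS = [
    (frozenset({"C#"}), frozenset({"Cname", "Ccity", "Cphone"})),
    (frozenset({"C#", "P#", "Date"}), frozenset({"Qnt"})),
]
KEY = {"C#", "P#", "Date"}

print(dependency_kind(KEY, "Qnt", FDS))       # full
print(dependency_kind(KEY, "Cname", FDS))     # partial
print(dependency_kind({"C#"}, "Cname", FDS))  # full
```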
(iii) Transitive Functional Dependency:
A transitive functional dependency can occur only in a relation that has three or more attributes.
Let A, B and C be three attributes of a relation 'R', and suppose that all three of the following
conditions hold:
• A → B
• It is not the case that B → A
• B → C
Then A → C holds, and C is said to be transitively dependent on A via B.
(iv) Multivalued Dependency (→→):
A multivalued dependency (MVD) occurs when two or more independent multivalued facts
about the same attribute occur within the same table. That is, if in a relation R with attributes
A, B and C, B and C are multivalued facts about A, represented as A →→ B and A →→ C, then the
multivalued dependency exists only if B and C are independent of each other.
The formal definition of the multivalued dependency X →→ Y is:
If t1 and t2 are tuples such that t1.X = t2.X, then there are tuples t3 and t4 such that
1. t1.X = t3.X = t4.X
2. t1.Y = t3.Y and t2.Y = t4.Y
3. t1.Z = t4.Z and t2.Z = t3.Z
where Z = R - (X ∪ Y)
Example: Imagine a car company that manufactures many models of car, but
always makes both red and blue colors of each model. If you have a table that contains the model
name, color and year of each car the company manufactures, there is a multivalued dependency
in that table. If there is a row for a certain model name and year in blue, there must also be a
similar row corresponding to the red version of that same car.
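The formal definition can be turned into a direct check on a set of rows. The following sketch is illustrative only: the model name "M1" and the years are made-up sample values, and the car example is read here as the dependency model →→ color, which is an assumption about the intended reading. For every pair of rows t1, t2 that agree on X, the function looks for the tuple t3 that takes its X and Y values from t1 and its remaining values from t2.

```python
def mvd_holds(rows, attrs, x, y):
    """Check the MVD X ->> Y on rows (tuples whose fields are ordered like attrs)."""
    z = [a for a in attrs if a not in x and a not in y]  # Z = R - (X u Y)

    def pick(row, names):
        return tuple(row[attrs.index(a)] for a in names)

    present = {(pick(r, x), pick(r, y), pick(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, x) == pick(t2, x):
                # t3 combines X and Y of t1 with Z of t2; the tuple t4 of the
                # definition is found when the roles of t1 and t2 are swapped.
                if (pick(t1, x), pick(t1, y), pick(t2, z)) not in present:
                    return False
    return True


ATTRS = ["model", "year", "color"]
cars = [
    ("M1", 2020, "red"), ("M1", 2020, "blue"),
    ("M1", 2021, "red"), ("M1", 2021, "blue"),
]

print(mvd_holds(cars, ATTRS, ["model"], ["color"]))       # True
print(mvd_holds(cars[:-1], ATTRS, ["model"], ["color"]))  # False: 2021 blue is missing
```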
(v) Trivial and Non-Trivial Functional Dependency:
A functional dependency X→Y is said to be a trivial Functional Dependency if Y, the right hand
side of the Functional Dependency, is a subset of X.
Example: The functional dependency (Ecode, Pcode) → Ecode is trivial because the set {Ecode}
(the R.H.S. of the functional dependency) is a subset of {Ecode, Pcode} (the L.H.S. of the
functional dependency).
On the other hand, the functional dependency (Ecode, Pcode) → (Ecode, Ename) is non-trivial
because the set {Ecode, Ename} is not a subset of the attribute set {Ecode, Pcode}.
8.3.4 Closure Set of Functional Dependencies
Let a relation 'R' have a set of functional dependencies 'F' specified. The closure of F (usually
written as F+) is the set of all functional dependencies that may be logically derived from 'F'.
Often 'F' is the set of the most obvious and important functional dependencies, and F+, the
closure, is the set of all the functional dependencies including F and those that can be deduced
from F. The closure is important and may, for example, be needed in finding one or more candidate
keys of the relation.
A related notion is the closure of a set of attributes. The set X+ of all attributes that are
functionally determined by X under F is called the closure of X under F. The algorithm to find
X+ is given below:
Input: A relation schema R, a set of functional dependencies F and a set of attributes X.
Output: X+.
(1) X+ ← X.
(2) Repeat
        found ← false;
        for (each fd Y → Z ∈ F) do
            if (Y ⊆ X+)
            then begin
                found ← true;
                X+ ← X+ ∪ Z;
                remove Y → Z from further consideration;
            end;
    until (found = false) or (X+ = all attributes).
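The algorithm translates almost line by line into code. The sketch below is a minimal Python rendering (the function name attribute_closure is an assumption). It omits the early exit on "X+ = all attributes", since that is only an optimisation and would require the schema R as an extra argument. It is tried on three of the Supplier dependencies from Section 8.3.1.

```python
def attribute_closure(attrs, fds):
    """Return X+, the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is a set of attribute names.
    """
    closure = set(attrs)    # (1) X+ <- X
    remaining = list(fds)   # FDs still under consideration
    found = True
    while found:            # (2) repeat ... until found = false
        found = False
        for fd in list(remaining):
            lhs, rhs = fd
            if set(lhs) <= closure:   # Y is a subset of X+
                closure |= set(rhs)   # X+ <- X+ u Z
                remaining.remove(fd)  # remove Y -> Z from further consideration
                found = True
    return closure


SUPPLIER_FDS = [
    ({"Sno"}, {"Sname"}),
    ({"Sno"}, {"City"}),
    ({"City"}, {"Status"}),
]

print(sorted(attribute_closure({"Sno"}, SUPPLIER_FDS)))   # ['City', 'Sname', 'Sno', 'Status']
print(sorted(attribute_closure({"City"}, SUPPLIER_FDS)))  # ['City', 'Status']
```

Because the closure of {Sno} contains every attribute of the Supplier table, Sno is a candidate key. This is the use of the closure mentioned above.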
8.3.5 Armstrong’s Axioms
To infer dependencies in a systematic way, we need a set of inference rules that can be used to
derive new dependencies from a given set of dependencies. William W. Armstrong (1974) established
a set of rules which can be used to infer the functional dependencies in a relational database.
These rules are given below:
Table 8.1 Inference Rules
Rule   Axiom Name                      Axiom                                                       Example
IR1    Reflexivity                     If a is a set of attributes and b ⊆ a, then a → b           SSN, Name → SSN
IR2    Augmentation                    If a → b holds and c is a set of attributes, then ca → cb   If SSN → Name, then SSN, Phone → Name, Phone
IR3    Transitivity                    If a → b holds and b → c holds, then a → c holds            If SSN → Zip and Zip → City, then SSN → City
IR4    Union or Additivity             If a → b and a → c hold, then a → bc holds                  If SSN → Name and SSN → Zip, then SSN → Name, Zip
IR5    Decomposition or Projectivity   If a → bc holds, then a → b and a → c hold                  If SSN → Name, Zip, then SSN → Name and SSN → Zip
IR6    Pseudotransitivity              If a → b and cb → d hold, then ac → d holds                 If Address → Project and Project, Date → Amount, then Address, Date → Amount
(NOTE)                                 ab → c does NOT imply a → b and b → c
• Inference rules IR1 through IR3 are known as Armstrong's inference rules.
• Inference rules IR1 through IR3 are sound and complete.
• By sound, we mean that, given a set of functional dependencies F specified on a relation
schema R, any dependency that we can infer from F by using IR1 through IR3 holds in every
relation state r of R that satisfies the dependencies in F.
• By complete, we mean that using IR1 through IR3 repeatedly to infer dependencies until no
more dependencies can be inferred results in the complete set of all possible dependencies
that can be inferred from F, that is, the closure of F.
To determine F+ (see Section 8.3.4), we need rules for deriving all the functional
dependencies that are implied by F. The rules proposed by Armstrong in 1974, shown in Table 8.1,
serve this purpose. These rules (or axioms) are complete in that all the functional dependencies
implied by F may be derived from them. The rules are explained below:
1. Reflexivity Rule: If X is a set of attributes and Y is a subset of X, then X →Y holds.
The reflexivity rule is the simplest (almost trivial) rule. It states that each subset of X is
functionally dependent on X. In other words trivial dependence is defined as follows:
Trivial functional dependency: A trivial functional dependency is a functional dependency of an
attribute on a superset of itself.
For example: {Employee ID, Employee Address} → {Employee Address} is trivial, here
{Employee Address} is a subset of {Employee ID, Employee Address}.
2. Augmentation Rule: If X → Y holds and W is a set of attributes, then WX → WY holds.
The augmentation rule is also quite simple. It states that if Y is determined by X, then the set
of attributes W and Y together will be determined by W and X together. Note that we use the
notation WX to mean the collection of all attributes in W and X, and write WX rather than the
more conventional (W, X) for convenience.
For example: Let Rno → Name, and let the set of attributes {Class, Marks} act as W.
Then {Rno, Class, Marks} → {Name, Class, Marks}.
3. Transitivity Rule: If X →Y and Y →Z hold, then X →Z holds.
The transitivity rule is perhaps the most important one. It states that if X functionally determines
Y and Y functionally determine Z then X functionally determines Z.
For example: If Rno → City and City → Status, then Rno → Status holds.
These rules are called Armstrong's Axioms.
The three axioms above are sound and complete: they do not generate any incorrect functional
dependencies (soundness) and they generate all the functional dependencies that can be inferred
from F (completeness). Further rules may nevertheless be derived from them for convenience. The
most important additional rules are:
a. Union Rule: If X → Y and X → Z hold, then X → YZ holds.
Proof: Using Armstrong's Axioms:
1. X → Y, Given
2. X → Z, Given
3. X → XZ, Augment 2 by X
4. XZ → YZ, Augment 1 by Z
5. X → YZ, Transitivity using 3 and 4.
b. Decomposition Rule: If X → YZ holds, then so do X → Y and X → Z.
Proof: Using Armstrong's Axioms:
1. X → YZ, Given
2. YZ → Y, Reflexivity
3. X → Y, Transitivity on 1 and 2.
The proof for X → Z is similar.
c. Pseudotransitivity Rule: If X → Y and WY → Z hold, then so does WX → Z.
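The derived rules can also be checked mechanically, using the standard fact that a set F implies X → Y exactly when Y is contained in the closure of X under F. The sketch below is an illustration under the assumption of single-letter attribute names; the helper names fd, closure and implies are introduced only for this example.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds, i.e. rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


# Union: X -> Y and X -> Z give X -> YZ.
union_ok = implies([fd("X", "Y"), fd("X", "Z")], "X", "YZ")
# Decomposition: X -> YZ gives X -> Y and X -> Z.
decomposition_ok = (implies([fd("X", "YZ")], "X", "Y")
                    and implies([fd("X", "YZ")], "X", "Z"))
# Pseudotransitivity: X -> Y and WY -> Z give WX -> Z.
pseudo_ok = implies([fd("X", "Y"), fd("WY", "Z")], "WX", "Z")
# The NOTE of Table 8.1: ab -> c implies neither a -> b nor b -> c.
note_a_b = implies([fd("AB", "C")], "A", "B")
note_b_c = implies([fd("AB", "C")], "B", "C")

print(union_ok, decomposition_ok, pseudo_ok)  # True True True
print(note_a_b, note_b_c)                     # False False
```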
Based on the above axioms and the functional dependencies specified for a relation student with
attributes (sno, Sname, address, cno, cname, instructor, office), we may write a large number of
functional dependencies. Some of these are:
(sno, cno) → sno (Rule 1)
(sno, cno) → cno (Rule 1)
(sno, cno) → (Sname, cname) (Rule 2)
cno → office (Rule 3)
sno → (Sname, address) (Union Rule), etc.
Often a very large list of dependencies can be derived from a given set F, since Rule 1 by itself
leads to a large number of dependencies. Since we have seven attributes (sno, Sname, address,
cno, cname, instructor, office), there are 128 (that is, 2^7) subsets of these attributes. These
128 subsets could form 128 values of X in functional dependencies of the type X → Y. Each value
of X is then associated with a number of values of Y (Y being a subset of X), leading to several
thousand dependencies. Such large numbers of dependencies are not particularly helpful in
achieving our aim of normalizing relations.
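The size of this list is easy to confirm by enumeration. The short sketch below counts the subsets X of the seven attributes and, for each X, the subsets Y of X, that is, the dependencies X → Y obtained from the reflexivity rule alone. The empty set is included on both sides, which is consistent with the figure of 128 subsets.

```python
from itertools import combinations

ATTRIBUTES = ["sno", "Sname", "address", "cno", "cname", "instructor", "office"]


def subsets(items):
    """Yield every subset of items (as a tuple), including the empty one."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


determinants = list(subsets(ATTRIBUTES))
# One trivial dependency X -> Y for every subset Y of every determinant X.
trivial_fds = sum(1 for x in determinants for _ in subsets(x))

print(len(determinants))  # 128, i.e. 2**7
print(trivial_fds)        # 2187, i.e. 3**7
```

Reflexivity alone therefore already yields 2187 dependencies for seven attributes, before augmentation and transitivity add the non-trivial ones.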
Although we could follow this procedure and compute the closure of F to find all the
functional dependencies, the computation requires exponential time, and the list of dependencies
is often very large and therefore not very useful. There are two possible approaches to avoid
dealing with the large number of dependencies in the closure. One is to deal with one attribute
or one set of attributes at a time and find its closure (i.e. all the attributes functionally
determined by it). The aim of this exercise is to find which attributes depend on a given set of
attributes and therefore ought to be kept together. The other approach is to find a minimal cover.
8.3.6 Minimal Functional Dependencies or Irreducible Set of Dependencies
In discussing the concept of equivalent sets of FDs, it is useful to define the concept of
minimal functional dependencies or minimal cover, which is useful in eliminating unnecessary
functional dependencies so that only the minimal number of dependencies needs to be enforced by
the system. A minimal cover of F is sometimes called an irreducible set of F.
A set S of functional dependencies is irreducible if it has the following three properties:
• The right-hand side of each functional dependency in S contains only one attribute.
• The left-hand side of each functional dependency in S is irreducible: removing any one
attribute from the left-hand side would change the content of S (S would lose some information).
• Removing any functional dependency would change the content of S.
Sets of functional dependencies with these properties are also called canonical or minimal.
Further, we can define minimal cover as:
Let F1 and F2 be two sets of functional dependencies. If F1 ≡ F2 (that is, F1+ = F2+), then we
say that F1 is a cover of F2 and F2 is a cover of F1. We also say that F1 covers F2 and vice
versa. It is easy to show that every set of functional dependencies F is covered by a set of
functional dependencies G in which the right-hand side of each FD has only one attribute.
We say a set of dependencies F is minimal if:
(1) The right-hand side of every FD in F is a single attribute.
(2) The left-hand side of each FD does not have any redundant attribute, i.e., for every FD
X → A in F where X is a composite attribute, and for any proper subset Z of X, the functional
dependency Z → A ∉ F+.
(3) F is reduced (without redundant FDs). This means that for every X → A in F, the set
F - {X → A} is NOT equivalent to F.
It is easy to see that for each set F of functional dependencies, there exists a set of functional
dependencies F' such that F ≡ F' and F' is minimal. We call such an F' a minimal cover of F.
The algorithm to compute F', a minimal cover of F, is as follows.
Input: F, a set of FDs.
Output: F', a minimal cover of F.
(1) Let F' = {X → A | X → A ∈ F and A is a single attribute}. For each FD X → A1, A2, ..., An ∈ F
(n > 1), put the FDs X → A1, X → A2, ..., X → An into F', where each Ai is a single attribute.
(2) While there is an FD X → A ∈ F' such that X is a composite attribute and Z is a proper
subset of X with Z → A ∈ (F')+, do: replace X → A with Z → A.
(3) For each FD X → A ∈ F', check whether it is redundant, i.e. whether
X → A ∈ (F' - {X → A})+; if so, eliminate it.
It is important to note that for the above algorithm, the ordering between step (2) and step (3)
is critical. If you first perform step (3) and then perform step (2), the resulting set of FDs
may still contain redundant functional dependencies.
It should be pointed out that for a set of functional dependencies F, there may be more than one
minimal cover of F.
Example: (Computing a minimal cover.)
Let R = R(ABCDEGH) and F = {CD → AB, C → D, D → EH, AE → C, A → C, B → D}. The
process of computing a minimal cover of F is as follows:
(1) Break down the right-hand side of each FD. After performing step (1) of the algorithm, we
get F' = {CD → A, CD → B, C → D, D → E, D → H, AE → C, A → C, B → D}.
(2) Eliminate redundancy in the left-hand side. The FD CD → A is replaced by C → A. This is
because C → D ∈ (F')+, hence C → CD ∈ (F')+; from C → CD ∈ (F')+ and CD → A ∈ F', by
transitivity, we have C → A ∈ (F')+, and hence CD → A should be replaced by C → A.
Similarly, CD → B is replaced by C → B, and AE → C is replaced by A → C. After step (2),
F' = {C → A, C → B, C → D, D → E, D → H, A → C, B → D}.
(3) Remove redundant FDs. The FD C → D is eliminated because it can be derived from C → B
and B → D, and hence it is redundant. F' now becomes {C → A, C → B, D → E, D → H, A → C,
B → D}, which is a minimal cover of F.
Example: (Computing a minimal cover.)
Let the relation R be R(ABCDE) and the set of functional dependencies be F = {AB → C,
ABC→ D, AE→ BC, BC → AE}.
We compute a minimal cover of F in the following steps:
(1) Break down the right hand side of each fd’s. F‘ = {AB → C, ABC → D, AE → B,
AE → C, BC → A, BC → E}.
(2) Eliminate redundancy in the left hand side. The fd ABC → D is replaced by AB → D
because AB+ = ABCDE, and hence the attribute C in fd ABC → D is redundant. Note that we
could also replace ABC -> D by BC → D, because BC+ = BCADE. But we do NOT need to
include both AB → D and BC → D in F‘: one of the two is sufficient. No other composite left
hand side of fd‘s can be reduced further, and thus we get F‘ = {AB → C, AB → D, AE → B,
AE → C, BC → A, BC → E}.
212
(3) Remove redundant fds. The fd AE → C is redundant because we can derive it from
AE → B and AB → C. Thus AE → C is removed. No other fd is redundant. The final minimal
cover of F is Fmin = {AB → C, AB → D, AE → B, BC → A, BC → E}.
Note: If we choose to replace ABC → D by BC → D in step (2) above, we would get an
alternative minimal cover of F.
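The three steps used in both examples can be sketched in code. The following Python sketch is illustrative only: the function names closure, minimal_cover and show are our own, fds are written as (left-hand side, right-hand side) pairs of attribute strings, and in step (2) the left-hand-side attributes are tried for removal from last to first, an arbitrary choice that happens to reproduce the covers derived above.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Compute one minimal cover of `fds`, given as (lhs, rhs) string pairs."""
    # Step 1: break down right-hand sides into single attributes.
    cover = [(frozenset(lhs), frozenset(attr)) for lhs, rhs in fds for attr in rhs]
    # Step 2: drop redundant left-hand-side attributes. Attributes are tried
    # from last to first; a different order may yield a different cover.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs, reverse=True):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
        cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # remove duplicates, keep order
    # Step 3: drop every fd that can be derived from the remaining ones.
    for fd in list(cover):
        rest = [other for other in cover if other != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    """Format a cover as a sorted list of strings such as 'AB->C'."""
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in cover)


first = minimal_cover([("CD", "AB"), ("C", "D"), ("D", "EH"),
                       ("AE", "C"), ("A", "C"), ("B", "D")])
second = minimal_cover([("AB", "C"), ("ABC", "D"), ("AE", "BC"), ("BC", "AE")])
print(show(first))
print(show(second))
```

Trying the attributes in a different order in step (2) can produce the alternative cover containing BC → D, which is why a minimal cover is not unique in general.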
8.4 Summary
A functional dependency (FD) is a many-to-one relationship between two sets of attributes of a
given relation. It is a kind of integrity constraint that generalizes the concept of a key. Let X and
Y be two sets of attributes of a relation. If each value of X has exactly one value of Y
corresponding to it, then Y is said to be functionally dependent on X. A functional dependency is
a property of the relation schema R, not of a particular legal relation state 'r' of 'R'. Hence an FD
cannot be inferred automatically from a given relation state 'r' but must be defined explicitly by
someone who knows the semantics of the attributes of 'R'.
8.6 Self Assessment Questions (SAQ)
1. What is Functional Dependency? Define the terms full and partial functional dependency
with suitable examples.
2. Define the closure of a set of functional dependencies with suitable examples.
3. What do you mean by functional dependency? Discuss the different types of functional
dependency.
4. State Armstrong's inference rules. How can transitivity be achieved? Explain.
5. Write a short note on Minimal Functional Dependencies.
Chapter – 9: Normalization
Structure:
9.1 Introduction
9.2 Objective
9.3.1 Bad Database Design
9.3.2 Database Anomalies
9.3.3 Normalization
(i) Rules of Normalization
9.3.4 Normal Forms
(i) First Normal Form (1NF)
(ii) Second Normal Form (2NF)
(iii) Third Normal Form (3NF)
(iv) Boyce-Codd Normal Form (BCNF)
9.3.5 Example First and Second Normal Form
9.4 Summary
9.1 Introduction
Normalization is a rigorous design tool that is based on the mathematical theory of relations
and results in very practical operational implementations. A properly normalized set of
relations simplifies the retrieval and maintenance processes, and the effort spent in
ensuring good structures is a worthwhile investment. Furthermore, if database relations
are simply seen as file structures of some vague file system, then the power and flexibility of
an RDBMS cannot be exploited to the full. Good database design, needless to say, is important.
Normalization is a systematic way of ensuring
that a database structure is suitable for general-purpose querying and free from database
anomalies. Dr. E. F. Codd, the inventor of the relational model, introduced the concept of
normalization and what we now know as the first normal form in 1970. Dr. Codd went on to
define the second and third normal forms in 1971, and Codd and Raymond F. Boyce defined the
Boyce-Codd normal form in 1974.
9.2 Objective
Normalization is a logical database design technique that involves organizing the data into more
than one table. Normalization improves the design by reducing redundancy in database tables. Its
basic objective is to reduce redundancy, which means that each piece of information is to
be stored only once in a relation. Storing information several times wastes storage
space and increases the total size of the data stored. The goals of the normalization
process are:
• Eliminating redundant data.
• Ensuring that data dependencies make sense, i.e. that only logically related data is stored
together in a relation, which also reduces the space the relations require in the database.
• Eliminating the columns that are not dependent on the key attribute.
9.3.1 Bad Database Design
E. F. Codd identified certain structural features in a relation which create retrieval and update
problems. Suppose we start off with a relation with a structure and details like:
This is a simple and straightforward design. It consists of one relation where we have a single
tuple for every customer and under that customer we keep all his transaction records about parts,
up to a possible maximum of 9 transactions. For every new transaction, we need not repeat the
customer details (of name, city and telephone), we simply add on a transaction detail.
However, we note the following disadvantages:
• The relation is wide and clumsy.
• We have set a limit of 9 (or whatever reasonable value) transactions per customer. What
if a customer has more than 9 transactions?
• For customers with fewer than 9 transactions, it appears that we have to store null values in
the remaining spaces. What a waste of space!
• The transactions appear to be kept in ascending order of P#s. What if we have to delete,
for customer Codd, the part numbered 1? Should we move the part numbered 2 up (or
rather, left)? If we did, what if we decide later to re-insert part 2? The additions and
deletions can cause awkward data shuffling.
Let us try to construct a query to "Find which customer(s) bought P# 2". The query would have
to access every customer tuple and, for each tuple, examine each of its transactions looking for
(P1# = 2) OR (P2# = 2) OR (P3# = 2) ... OR (P9# = 2)
A comparatively simple query seems to require a clumsy retrieval formulation!
Alternatively, why don't we re-structure our relation so that we do not restrict the number of
transactions per customer? We can do this with the following structure:
This way, a customer can have any number of Part transactions without worrying about any
upper limit or wasted space through null values (as was the case with the previous structure).
Constructing a query to "Find which customer(s) bought P# 2" is not as cumbersome as before, as
one can now simply state: P# = 2.
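The difference between the two query formulations can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration under stated assumptions: the table names wide and tall, the column names, the customer "Deen" and all rows are hypothetical, and the wide table is cut down to three part columns instead of nine to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- First design: one tuple per customer, a fixed number of part columns.
CREATE TABLE wide (c_no INTEGER, cname TEXT, p1 INTEGER, p2 INTEGER, p3 INTEGER);
INSERT INTO wide VALUES (1, 'Codd', 1, 2, 3),
                        (2, 'Martin', 1, NULL, NULL),
                        (3, 'Deen', 2, NULL, NULL);
-- Second design: one tuple per customer transaction.
CREATE TABLE tall (c_no INTEGER, cname TEXT, p_no INTEGER);
INSERT INTO tall VALUES (1, 'Codd', 1), (1, 'Codd', 2), (1, 'Codd', 3),
                        (2, 'Martin', 1), (3, 'Deen', 2);
""")

# "Find which customer(s) bought P# 2" against each design.
wide_hits = [row[0] for row in conn.execute(
    "SELECT cname FROM wide WHERE p1 = 2 OR p2 = 2 OR p3 = 2 ORDER BY c_no")]
tall_hits = [row[0] for row in conn.execute(
    "SELECT cname FROM tall WHERE p_no = 2 ORDER BY c_no")]
print(wide_hits, tall_hits)
```

Both queries return the same customers, but the condition on the wide table grows with every part column that is added, while the condition on the second structure stays the same.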
But again, this structure is not without its faults:
• It seems a waste of storage to keep repeated values of Cname, Ccity and Cphone.
• If C# 1 were to change his telephone number, we would have to ensure that we update ALL
occurrences of C# 1's Cphone values. This means updating tuple 1, tuple 2 and all other
tuples where there is an occurrence of C# 1. Otherwise, our database would be left in an
inconsistent state.
• Suppose we now have a new customer with C# 4. However, there is no part transaction yet
with the customer as he has not ordered anything yet. We may find that we cannot insert this
new information because we do not have a P# which serves as part of the 'primary key' of a
tuple. (A primary key cannot have null values.)
Suppose the third transaction has been canceled, i.e. we no longer need information about 25 of
P# 1 being ordered on 26 Jan. We thus delete the third tuple. We are then left with the following
relation:
But then, suppose we need information about the customer "Martin", say the city he is located in.
Unfortunately, information about Martin was held only in that tuple, so deleting the entire tuple
because of its P# transaction means that we have also lost all information about Martin
from the relation.
As illustrated in the above instances, badly designed, un-normalized relations waste
storage space. Worse, they give rise to database anomalies.
9.3.2 Database Anomalies
A serious problem with using such a relation as a base relation is that it suffers from
insertion, deletion and update anomalies, as explained below. To understand these
anomalies, let us consider a relation 'Deptt' with attributes {Deptt_id, Deptt_name,
Deptt_course}.
Deptt
Deptt_id  Deptt_name  Deptt_course
101       DCSA        MCA
102       DCSA        M.SC
201       UIET        CSE
202       UIET        IT
203       IMS         MBA
• Insertion Anomalies: It may not be possible to store some information unless some other
information is stored as well, e.g. if a department offers one more course, say M.Phil.,
we cannot enter this data into the table until a student opts for this course.
• Deletion Anomalies: It may not be possible to delete some information without losing
some other information as well, e.g. if a department wants to close a specific course,
we cannot do so until all the students offered that course are deleted as well.
• Update Anomalies: If one copy of repeated data is updated, an inconsistency is created
unless all copies are similarly updated, e.g. if we want to change the name of a department,
we cannot do so consistently until the name is changed in every tuple and in every other table
storing it as well.
9.3.3 Normalization
Normalization is a step-by-step process of removing redundancies of attributes in a data structure.
It is a technique used to design relational databases. In order to design a relational model we have
to decide the logical structure of the database.
Normalization can be defined as a process during which redundant
relational schemas are decomposed into smaller ones.
Normalization is typically a refinement process after the initial exercise of identifying the data
objects that should be in the database, identifying their relationships and defining the tables
required and the columns within each table.
(i) Rules of Normalization
Normalization is a specific relational database analysis and design technique used to model
groups of related data within an organization. Its purpose is to ensure that data stored within the
database follows a set of rules that eliminate
redundancies and optimize information retrieval. Normalization leaves us with a
structure that groups like data into relations referenced by keys and linked to other
relations to form a relational database schema.
Normalization is represented by a logical set of steps that follow simple rules applied at
each stage of the modeling process. At the highest level the stages are separated into
Normal Forms, each identified by name.
Initially there were only three normal forms, First Normal Form (1NF), Second Normal Form
(2NF) and Third Normal Form (3NF), but over time three more were added. In general terms the
first three are the most commonly used in database modeling. The additional three identify
further potential redundancies, but applying them in practice can lead
to inefficiencies in performance, so they tend to be used only under special circumstances or
with complex data structures.
In addition there is the Un-Normalized Form (UNF), which, though not generally
considered part of the Normalization rules, represents the very first stage of the
Normalization process.
We can identify each of the normal forms as follows and will define each in detail thereafter:
1. Un-Normalized Form (UNF) – Data Modeling
2. First Normal Form (1NF) – Repeating Groups
3. Second Normal Form (2NF) – Partial Dependencies
4. Third Normal Form (3NF) – Transitive Dependencies
Figure 9.1: Normalization Process
Normalization helps to simplify the structure of tables. The performance of an application is
directly linked to the database design. Some rules that should be followed to achieve a good
database design are:
• Each table should have an identifier (primary key attribute).
• Each table should store data for a single type of entity.
• Null values in a column should be avoided.
• The repetition of values in a column should be avoided.
Normalization depends on certain specified constraints and rules that support Codd's
RDBMS rules. One of the constraints between two sets of attributes from the database is the
Functional Dependency, as discussed in Chapter 8. We must be familiar with it before proceeding
towards normalization.
The normal forms are applicable to individual tables; to say that an entire database is in a normal
form is to say that all of its tables are in that normal form. The following sections
describe in detail the normal forms based on the primary key.
9.3.4 Normal Forms
Normalization works through a series of stages called normal forms. In a good database design
we need some guidance to decompose a relation into smaller relations. To provide such
guidance, several normal forms have been proposed. The normal forms based on functional
dependency are first normal form (1NF), second normal form (2NF), third normal form (3NF)
and Boyce-Codd normal form (BCNF). All these normal forms are based on the primary key.
(i) First Normal Form (1NF)
It states that the domain of an attribute must include only atomic values and that the value of any
attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF
disallows having a set of values, a tuple of values or a combination of both as an attribute value for
a single tuple.
In other words, "A relation is said to be in the first normal form if every attribute of that relation
stores atomic or indivisible values in every tuple."
Consider the following non-normalized relation.
Employee
EmpId  EmpName  Project
101    Arpit    Networking, Software Engineering, Operating System
102    Aryan    Management Information System, Marketing
103    Satvik   System Analyst and Programming
The above relation does not fulfill the definition of first normal form, because the attribute
{Project} is a multi-valued attribute. There are three techniques to convert this non-normalized
relation into first normal form.
1. Horizontal Expansion:
Expand the number of attributes if the maximum number of values is known for an attribute. For
example, if it is known that at most three projects can be allocated to one employee, then we
can create new attributes {Project1, Project2, Project3}, one for each value of the Project attribute.
The above relation can be normalized as below:
Employee
EmpId  EmpName  Project1                        Project2              Project3
101    Arpit    Networking                      Software Engineering  Operating System
102    Aryan    Management Information System   Marketing             Null
103    Satvik   System Analyst and Programming  Null                  Null
This solution has the disadvantage of introducing null values if most employees have fewer
than three projects. Therefore this solution is rejected.
2. Vertical Expansion:
Expand the key attribute so that there will be a separate tuple in the original relation for each value
of the project.
Employee
EmpId EmpName Project
101 Arpit Networking
101 Arpit Software Engineering
101 Arpit Operating System
102 Aryan Management Information System
102 Aryan Marketing
103 Satvik System Analyst and Programming
This solution has the disadvantage of introducing redundant data in the relation. Therefore this
solution is also rejected.
3. Decompose the Relation:
To convert the non-normalized relation into a normalized table, we may remove the attribute that
violates the definition of 1NF and place that attribute in a separate relation along with the
primary key. The primary key of a relation is an attribute or a set of attributes that uniquely
identifies a tuple in the relation. This decomposes the non-normalized relation into two relations,
namely Emp_Proj and Proj_Info, fulfilling the definition of 1NF.
Emp_Proj
EmpId EmpName ProjectId
101 Arpit 10
101 Arpit 11
101 Arpit 12
102 Aryan 13
102 Aryan 14
103 Satvik 15
Proj_Info
ProjectId Project_Name
10 Networking
11 Software Engineering
12 Operating System
13 Management Information System
14 Marketing
15 System Analyst and Programming
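Of the above three techniques, the third one is superior because it introduces no null values,
is completely general and follows the rules of First Normal Form. (EmpName is still repeated in
Emp_Proj; this remaining redundancy is addressed by the second normal form.)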
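The third technique can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name decompose and its parameter first_project_id are our own, and project identifiers are simply assumed to be handed out in order of first appearance, starting from 10 as in the tables above.

```python
# The non-normalized Employee relation: Project holds a list of values.
employees = [
    (101, "Arpit", ["Networking", "Software Engineering", "Operating System"]),
    (102, "Aryan", ["Management Information System", "Marketing"]),
    (103, "Satvik", ["System Analyst and Programming"]),
]


def decompose(rows, first_project_id=10):
    """Split the multi-valued Project attribute into Emp_Proj and Proj_Info."""
    project_ids = {}  # Project_Name -> ProjectId, in order of first appearance
    emp_proj = []
    for emp_id, emp_name, projects in rows:
        for project in projects:
            if project not in project_ids:
                project_ids[project] = first_project_id + len(project_ids)
            # One tuple per (employee, project) pair, each value atomic.
            emp_proj.append((emp_id, emp_name, project_ids[project]))
    proj_info = [(pid, name) for name, pid in project_ids.items()]
    return emp_proj, proj_info


emp_proj, proj_info = decompose(employees)
for row in emp_proj + proj_info:
    print(row)
```

Every attribute value in the two resulting relations is a single, indivisible value, which is what the definition of 1NF asks for.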
(ii) Second Normal Form (2NF)
A relation is said to be in the second normal form if it is in the first normal form and all non-key
attributes are fully functionally dependent on the key. The concepts of Full Functional
Dependency and Partial Functional Dependency are already explained in Chapter 8 Section 3.2.
Consider the non-normalized relation "Order" as follows:
Order
Orderno ItemNo Orderdate Units
100 1000 01/01/2014 200
101 1000 02/01/2014 250
102 1002 03/01/2014 300
103 1003 04/01/2014 250
In the above relation "Order", the attributes Orderno and Itemno together form a composite key,
and the attributes Orderdate and Units are the non-key attributes. In this relation Orderdate is
functionally dependent on Orderno, because for each value of Orderno we have a unique value of
Orderdate. But for each value of Itemno, there may be more than one value of Orderdate. For
example, for Itemno value 1000, we have two values of Orderdate, i.e. '01/01/2014' and
'02/01/2014'. Hence Orderdate is not functionally dependent on Itemno. Orderdate thus depends
on only a part of the key (Orderno), which is a partial dependency. Therefore, the relation
"Order" is not in second normal form. For the relation to be in second normal form, the non-key
attributes must be fully functionally dependent on the whole of the primary key. To convert this
non-normalized relation into normalized relations, the following steps are performed.
• Find and remove the attributes that are functionally dependent on only a part of the key
and not on the whole key, and place them in a different table.
• Group the remaining attributes.
To convert this relation into 2NF, we must remove the attributes that are not fully functionally
dependent on the whole key and place them in a different table along with the attribute that they
are functionally dependent on. In the above example, since Orderdate is not fully functionally
dependent on the whole of the key, i.e. Orderno + Itemno, we place Orderdate along
with Orderno in a separate table called Orderinfo, and the attributes Orderno, Itemno and Units
in a separate relation called Iteminfo.
Orderinfo
Orderno Orderdate
100 01/01/2014
101 02/01/2014
102 03/01/2014
103 04/01/2014
Iteminfo
Orderno Itemno Units
100 1000 200
101 1000 250
102 1002 300
103 1003 250
Hence the resultant relations fulfill the definition of second normal form. Therefore normalized
relations are achieved as above.
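The dependency check and the projection used in this example can be sketched in Python. The helper names holds and project are our own. Note that a check against the stored tuples can only show that a dependency is violated or is consistent with this particular relation state; whether it really holds must still be decided from the meaning of the attributes.

```python
def holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project onto `attrs`, removing duplicate tuples and keeping order."""
    return list(dict.fromkeys(tuple(row[a] for a in attrs) for row in rows))


COLUMNS = ("Orderno", "Itemno", "Orderdate", "Units")
order = [dict(zip(COLUMNS, values)) for values in [
    (100, 1000, "01/01/2014", 200),
    (101, 1000, "02/01/2014", 250),
    (102, 1002, "03/01/2014", 300),
    (103, 1003, "04/01/2014", 250),
]]

date_by_orderno = holds(order, ["Orderno"], ["Orderdate"])
date_by_itemno = holds(order, ["Itemno"], ["Orderdate"])
orderinfo = project(order, ["Orderno", "Orderdate"])
iteminfo = project(order, ["Orderno", "Itemno", "Units"])
print(date_by_orderno, date_by_itemno)
```

Here Orderno → Orderdate is consistent with the data while Itemno → Orderdate is violated by the two tuples for item 1000, which matches the reasoning in the text.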
(iii) Third Normal Form (3NF)
A relation is said to be in third normal form when it is already in the second normal form and
all the non-key attributes of the relation are independent of all other non-key fields of the same
table. In other words, it requires that data stored in a table should be dependent
directly on the primary key, and not transitively through any other field in the table. The concept
of transitive functional dependency is already explained in Chapter 8 Section 3.2.
Hostel
IdNo Name Department Course Hostelno
100 Puneet Computers MCA 1
200 Vinod Computers MSc 2
300 Arman Science BSc 3
400 Jayshree Management MBA 4
In this relation, IdNo is the primary key attribute, and all other non-key attributes are
functionally dependent on it. Since the key is a single attribute, no partial dependency is
possible, so the relation is in the second normal form. However, the non-key attribute
Hostelno is also functionally dependent on another non-key attribute, namely Course. Therefore
this relation is not in 3NF. To convert this non-normalized relation into normalized relations,
the following steps are performed.
• Find and remove the attributes that are functionally dependent on attributes that are not
the primary key, and place them in a different table.
• Group the remaining attributes.
Student_Detail
IdNo Name Department Course
100 Puneet Computers MCA
200 Vinod Computers MSc
300 Arman Science BSc
400 Jayshree Management MBA
Course
Course Hostelno
MCA 1
MSc 2
BSc 3
MBA 4
Hence the resultant relations fulfill the definition of third normal form. Therefore normalized
relations are achieved as above.
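A decomposition is only useful if the original relation can be rebuilt from its parts. The short Python sketch below, with variable names of our own choosing, projects Hostel onto the two relations above and joins them again on Course to confirm that no tuple is lost or invented for this data.

```python
# The Hostel relation: (IdNo, Name, Department, Course, Hostelno).
hostel = [
    (100, "Puneet", "Computers", "MCA", 1),
    (200, "Vinod", "Computers", "MSc", 2),
    (300, "Arman", "Science", "BSc", 3),
    (400, "Jayshree", "Management", "MBA", 4),
]

# Project onto the two relations of the 3NF design (sets remove duplicates).
student_detail = sorted({(i, n, d, c) for i, n, d, c, h in hostel})
course = sorted({(c, h) for i, n, d, c, h in hostel})

# Natural join on Course: each course determines exactly one hostel number.
hostel_of = dict(course)
rejoined = sorted((i, n, d, c, hostel_of[c]) for i, n, d, c in student_detail)

print(rejoined == sorted(hostel))
```

The join reproduces the original tuples because Course, the attribute shared by the two relations, is the key of the Course relation.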
(iv) Boyce-Codd Normal Form (BCNF)
BCNF was introduced as a stricter form of 3NF, because 3NF was inadequate in some
situations. It was not satisfactory for relations:
• That have multiple candidate keys;
• Where the multiple candidate keys are composite (multi-attribute keys); and
• Where the multiple candidate keys overlap, i.e. they have at
least one attribute in common.
A relation is said to be in BCNF if it is in 3NF and no attribute of one multi-attribute
key is dependent on an attribute of another multi-attribute key. More generally, a relation is in
BCNF when the determinant of every non-trivial functional dependency is a superkey.
Assume that a relation has more than one possible multi-attribute key. Assume further that the
multi-attribute keys have a common attribute. If an attribute of a composite key is dependent on
an attribute of another composite key, the relation is not in BCNF.
Consider a relation "Teacher", where a teacher can work in more than one department and the
percentage of time he spends in each department is given, but each department has only one HOD.
Teacher
TeacherId Department HOD PercentTime
100 Computers Rajinder Nath 50
200 Mathematics Anil Vashistha 60
200 Physics M.S. Yadav 40
300 History Mukta Garg 30
In the above relation, TeacherId and Department together form a composite key. The attributes HOD
and PercentTime are functionally dependent on this composite key. TeacherId and
HOD also form a composite key; the attributes Department and PercentTime are functionally
dependent on it. Further, HOD is functionally dependent on Department, which is only a part of
a key. Hence the above relation is not in BCNF.
In order to normalize the relation into BCNF, we have to create new relations from the old
relation by breaking it into two sub-tables, i.e. the relations "Department" and "HOD".
Department
TeacherId Department PercentTime
100 Computers 50
200 Mathematics 60
200 Physics 40
300 History 30
HOD
Department HOD
Computers Rajinder Nath
Mathematics Anil Vashistha
Physics M.S. Yadav
History Mukta Garg
Hence the resultant relations fulfill the definition of BCNF. Therefore normalized relations are
achieved as above.
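The BCNF test "every determinant must be a superkey" can be checked mechanically with attribute closures. The Python sketch below is illustrative only. The function names are our own, the dependency HOD → Department is an assumption added so that (TeacherId, HOD) is a key as stated above, and restrict keeps just the dependencies that fit inside a sub-relation, a simplification that is adequate for this example.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial fds whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)]


def restrict(fds, attributes):
    """Keep only the fds whose attributes all belong to the given relation."""
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) | set(rhs) <= set(attributes)]


FDS = [
    (("TeacherId", "Department"), ("PercentTime",)),
    (("Department",), ("HOD",)),   # each department has only one HOD
    (("HOD",), ("Department",)),   # assumption: an HOD heads one department
]
TEACHER = ("TeacherId", "Department", "HOD", "PercentTime")
DEPARTMENT = ("TeacherId", "Department", "PercentTime")
HOD = ("Department", "HOD")

teacher_violations = bcnf_violations(TEACHER, FDS)
department_violations = bcnf_violations(DEPARTMENT, restrict(FDS, DEPARTMENT))
hod_violations = bcnf_violations(HOD, restrict(FDS, HOD))
print(teacher_violations, department_violations, hod_violations)
```

The original Teacher relation reports Department → HOD as a violation, while neither of the two decomposed relations reports any.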
9.3.5 Example First and Second Normal Form
Example -1. Assume the following relation
Student-courses (Sid: pk, Sname, Phone, Courses-taken)
where attribute Sid is the primary key, Sname is the student name, Phone is the student's phone
number and Courses-taken is a table that contains the course-id, course-description, credit hours
and grade for each course taken by the student. A more precise definition of the table
Course-taken is:
Course-taken (Course-id: pk, Course-description, Credit-hours, Grade)
According to the definition of first normal form, the relation Student-courses is not in first normal
form because one of its attributes, Courses-taken, is itself a table and not a simple attribute.
To clarify this further, assume the above tables contain the data shown below:
Student-courses
Sid Sname Phone Courses-taken
100 John 487 2454 St-100-courses-taken
200 Smith 671 8120 St-200-courses-taken
300 Russell 871 2356 St-300-courses-taken
St-100-Course-taken
Course-id Course-description Credit-hours Grade
IS380 Database Concepts 3 A
IS416 Unix Operating System 3 B
St-200-Course-taken
IS380 Database Concepts 3 B
IS416 Unix Operating System 3 B
IS420 Data Net Work 3 C
St-300-Course-taken
IS417 System Analysis 3 A
Now we will verify the various anomalies.
1. Insertion anomaly means that some data cannot be inserted into the database. For
example, we cannot add a new course to the database of Example-1 unless we insert a student
who has taken that course.
2. Update anomaly means we have data redundancy in the database, and to make any
modification we have to change all copies of the redundant data or else the database will
contain incorrect data. For example, in our database the course description
"Database Concepts" for IS380 appears in both the St-100-Course-taken and
St-200-Course-taken tables. To change its description to "New Database Concepts" we have to
change it in all places. Indeed, one of the purposes of normalization is to eliminate data
redundancy in the database.
3. Deletion anomaly means deleting some data causes other information to be lost. For example,
if student Russell is deleted together with his St-300-Course-taken table, we also lose the
information that we had a course called IS417 with description System Analysis.
Thus the Student-courses table suffers from all three anomalies.
To convert the above structure to first normal form relations, all non-simple attributes
must be removed or converted to simple attributes. To do that, a new relation is created by
combining each row of Student-courses with all rows of the corresponding course table
for that specific student. Following is the Student-courses table in first normal form.
Student-courses (Sid:pk1, Sname, Phone, Course-id:pk2, Course-description, Credit-hours,
Grade)
We now check which normal forms the resultant table satisfies.
Notice that the primary key of this table is a composite key made up of two parts; Sid and
Course-id. Note that pk1 following an attribute indicates that the attribute is the first part of the
primary key and pk2 indicates that the attribute is the second part of the primary key.
Student-courses
Sid Sname Phone Course-id Course-description Credit-hours Grade
100 John 487 2454 IS380 Database Concepts 3 A
100 John 487 2454 IS416 Unix Operating System 3 B
200 Smith 671 8120 IS380 Database Concepts 3 B
200 Smith 671 8120 IS416 Unix Operating System 3 B
200 Smith 671 8120 IS420 Data Net Work 3 C
300 Russell 871 2356 IS417 System Analysis 3 A
Examination of the above Student-courses relation reveals that Sid does not uniquely identify a
row (tuple) in the relation and hence cannot be the primary key. For the same reason Course-id
cannot be the primary key. However, the combination of Sid and Course-id uniquely identifies a
row in Student-courses. Therefore (Sid, Course-id) is the primary key of the above relation.
The primary key determines every attribute. For example, if you know both Sid and Course-id for
any student you will be able to retrieve Sname, Phone, Course-description, Credit-hours and
Grade, because these attributes are dependent on the primary key. Figure 9.2 below is the graphical
representation of the functional dependency between the primary key and the attributes of the
above relation.
Note that the attribute to the right of the arrow is functionally dependent on the attribute to the
left of the arrow. Thus the combination (Sid, Course-id) is the determinant (that determines other
attributes) and the attributes Sname, Phone, Course-description, Credit-hours and Grade are
dependent attributes.
Figure 9.2: Functional Dependency 1
Formally speaking, a determinant is an attribute or a group of attributes that determines the value
of other attributes. In addition to (Sid, Course-id) there are two other determinants in the above
Student-courses relation: the attributes Sid and Course-id. Note that Sid alone determines
both Sname and Phone, and Course-id alone determines both Credit-hours and
Course-description.
Attribute Grade is fully functionally dependent on the primary key (Sid, Course-id) because both
parts of the primary key are needed to determine Grade. On the other hand, the Sname and
Phone attributes are not fully functionally dependent on the primary key, because only a part of
the primary key, namely Sid, is needed to determine both Sname and Phone. Likewise, the attributes
Credit-hours and Course-description are not fully functionally dependent on the primary key
because only Course-id is needed to determine their values.
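The distinction between full and partial dependency can be tested against the tuples of Student-courses. In the Python sketch below the helper names holds and is_fully_dependent are our own and hyphens in attribute names are written as underscores. As before, a check on stored data can only support or refute a dependency for this relation state; it does not replace knowledge of what the attributes mean.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_fully_dependent(rows, key, attr):
    """True if `attr` depends on `key` but on no proper, non-empty part of it."""
    if not holds(rows, key, [attr]):
        return False
    for size in range(1, len(key)):
        for part in combinations(key, size):
            if holds(rows, part, [attr]):
                return False  # a part of the key suffices: partial dependency
    return True


COLUMNS = ("Sid", "Sname", "Phone", "Course_id",
           "Course_description", "Credit_hours", "Grade")
student_courses = [dict(zip(COLUMNS, values)) for values in [
    (100, "John", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "John", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Smith", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Smith", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Russell", "871 2356", "IS417", "System Analysis", 3, "A"),
]]

KEY = ("Sid", "Course_id")
full = {attr: is_fully_dependent(student_courses, KEY, attr)
        for attr in COLUMNS if attr not in KEY}
print(full)
```

Only Grade comes out as fully dependent on (Sid, Course_id); every other non-key attribute is already determined by one part of the key.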
The new relation Student-courses still suffers from all three anomalies for the following reasons:
1. The relation contains redundant data (note that Database Concepts as the course description
for IS380 appears in more than one place).
2. The relation contains information about two entities, Student and Course.
Following is a detailed description of the anomalies that the relation Student-courses suffers from.
1. Insertion anomaly: We cannot add a new course such as IS247 with course description
Programming Techniques to the database unless we add a student who takes the course.
2. Update anomaly: If we change the course description for IS380 from Database Concepts to
New Database Concepts, we have to make the change in more than one place or else the
database will be inconsistent. In other words, in some places the course description will be
New Database Concepts and in any place where we forgot to make the change the
description will still be Database Concepts.
3. Deletion anomaly: If student Russell is deleted from the database we also lose the information
that we had on course IS417 with description System Analysis.
The above observation indicates that having a single table Student-courses for our database
causes problems (anomalies). Therefore we break the table into smaller tables to get
higher normal form relations.
To convert Student-courses to second normal form relations we have to make all non-primary
attributes fully functionally dependent on the primary key. To do that we need to project
Student-courses (that is, break it down) into two or more tables.
However, projections may cause problems. To avoid such problems it is important to keep
attributes which are dependent on each other in the same table when a relation is projected into
smaller relations. Following this principle, examination of Figure 9.2 indicates that we should
divide the Student-courses relation into the following three relations:
PROJECT Student-courses ON (Sid, Sname, Phone) creates a table; call it Student. The relation
Student will be Student (Sid:pk, Sname, Phone), and
PROJECT Student-courses ON (Sid, Course-id, Grade) creates a table; call it Student-grade. The
relation Student-grade will be
Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade), and
PROJECT Student-courses ON (Course-id, Course-description, Credit-hours) creates a table; call it
Courses. Following are these three relations and their contents:
Student (Sid:pk, Sname, Phone)
Sid Sname Phone
100 John 487 2454
200 Smith 671 8120
300 Russell 871 2356
Courses (Course-id:pk, Course-description, Credit-hours)
Course-id Course-description Credit-hours
IS380 Database Concepts 3
IS416 Unix Operating System 3
IS420 Data Net Work 3
IS417 System Analysis 3
Student-grade (Sid:pk1:fk:Student, Course-id:pk2:fk:Courses, Grade)
Sid Course-id Grade
100 IS380 A
100 IS416 B
200 IS380 B
200 IS416 B
200 IS420 C
300 IS417 A
All three of these relations are in second normal form. Examination of these relations shows that
we have eliminated the redundancy in the database. Now the relation Student contains information
related only to the entity student, the relation Courses contains information related only to the
entity course, and the relation Student-grade contains information related to the relationship
between these two entities.
Further, these three relations are free from the three anomalies described above.
1. Insertion anomaly: Now a new course with course-id IS247 and its course description can be
inserted into the table Courses. Equally, we can add any new student to the database by adding
his id, name and phone to the Student table. Therefore our database, which is made up of these
three tables, does not suffer from the insertion anomaly.
2. Update anomaly: Since the redundancy of the data was eliminated, no update anomaly can
occur. To change the course description for IS380, only one change is needed, in the table
Courses.
3. Deletion anomaly: The deletion of student Russell from the database is achieved by deleting
Russell's records from both the Student and Student-grade relations, and this does not have any
side effect because the course IS417 remains untouched in the table Courses.
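The three observations can be tried out on a small in-memory database using Python's built-in sqlite3 module. The sketch below is illustrative only: hyphens in table and column names are replaced by underscores, the column types are our own choice, and the credit-hours value of 3 for the new course IS247 is an assumption, since the text does not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the fk references below
conn.executescript("""
CREATE TABLE Student (Sid INTEGER PRIMARY KEY, Sname TEXT, Phone TEXT);
CREATE TABLE Courses (Course_id TEXT PRIMARY KEY, Course_description TEXT,
                      Credit_hours INTEGER);
CREATE TABLE Student_grade (
    Sid INTEGER REFERENCES Student(Sid),
    Course_id TEXT REFERENCES Courses(Course_id),
    Grade TEXT,
    PRIMARY KEY (Sid, Course_id));
INSERT INTO Student VALUES (100, 'John', '487 2454'), (200, 'Smith', '671 8120'),
                           (300, 'Russell', '871 2356');
INSERT INTO Courses VALUES ('IS380', 'Database Concepts', 3),
    ('IS416', 'Unix Operating System', 3), ('IS420', 'Data Net Work', 3),
    ('IS417', 'System Analysis', 3);
INSERT INTO Student_grade VALUES (100, 'IS380', 'A'), (100, 'IS416', 'B'),
    (200, 'IS380', 'B'), (200, 'IS416', 'B'), (200, 'IS420', 'C'),
    (300, 'IS417', 'A');
""")

# Insertion: a course can be stored before any student takes it.
conn.execute("INSERT INTO Courses VALUES ('IS247', 'Programming Techniques', 3)")

# Update: the description of IS380 is stored in exactly one row.
updated_rows = conn.execute(
    "UPDATE Courses SET Course_description = 'New Database Concepts' "
    "WHERE Course_id = 'IS380'").rowcount

# Deletion: removing Russell leaves the course IS417 in place.
conn.execute("DELETE FROM Student_grade WHERE Sid = 300")
conn.execute("DELETE FROM Student WHERE Sid = 300")
is417 = conn.execute(
    "SELECT Course_description FROM Courses WHERE Course_id = 'IS417'").fetchone()
course_count = conn.execute("SELECT COUNT(*) FROM Courses").fetchone()[0]
print(updated_rows, is417, course_count)
```

Russell's grade rows are deleted before his Student row so that the foreign key from Student_grade to Student is never left dangling.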
9.4 Summary
The normal forms of relational database theory provide criteria for determining a table's degree
of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable
to a table, the less vulnerable it is to inconsistencies and anomalies. The purpose of
normalization is to produce a stable set of relations that is a faithful model of the operations of
the enterprise.
9.6 Self Assessment Questions (SAQ)
1. Define Normalization. Why do we need to normalize the database?
2. What do you understand by database anomalies? Write the procedure to generate First
normal form.
3. "Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in
BCNF." Comment on the statement.
4. What do you mean by Normalization? Discuss the normal forms based on the primary key.
Chapter – 10: An Introduction to MS-Access
Structure:
10.1 Introduction
10.2 Objective
10.3.1 Interface Elements in MS-Office Access 2007
10.3.2 Tool Bars and Their Icons
(i) Getting Started with Microsoft Office Access
(ii) The Ribbon
(iii) Command Tab

10.3.3 Creating a New Database
10.3.4 Creating a Table
(i) Create a Table in Datasheet View
(ii) Create a Table in Design View
(iii) Create a Table Based on a Table Template
10.3.5 Relationship
(i) Creating a Table by Using the Table Wizard
(ii) Creating a Table by Entering Data in a Datasheet
(iii) Creating a Table by Entering Data in Design View
10.3.6 Import/ Export Tables
(i) External Data Operations in Access
(ii) Types of Data That Access Can Import, Link To, Or Export
(iii) Import or Link to Data in another Format
(iv) Export Data to another Format
10.4 Summary
10.1 Introduction
Microsoft Access is a relational database management system that comes as part of the Microsoft
Office suite. MS-Access is a graphical user interface (GUI) application that is easy to use, yet
powerful enough to manage large volumes of data. It is used to manage data in many kinds of
environments, such as scientific, inventory, financial, payroll, education and hospitality.
MS-Access can be used at the client end or at the server end in a client-server computing
architecture.
10.2 Objective
The default file extension in MS-Access 2007 is .accdb (earlier versions used .mdb). In a single
MS-Access file we can create multiple database objects, i.e. tables, queries, forms, reports,
macros and modules. MS-Access 2007 comprises a number of elements that define how we interact
with the product. These elements were chosen to help you find commands faster. The most
significant interface element in MS-Access 2007 is called the Ribbon. The Ribbon is the strip
across the top of the program window that contains groups of commands. The Office Fluent
Ribbon provides a single home for commands and is the primary replacement for menus and
toolbars. On the Ribbon are tabs that combine commands in ways that make sense. In Office
Access 2007, the main Ribbon tabs are Home, Create, External Data, and Database Tools. Each
tab contains groups of related commands, and these groups surface some of the additional new
GUI elements, such as the gallery, which is a new type of control that presents choices visually.
In the upcoming sections we will discuss some of the important items.
10.3.1 Interface Elements in MS-Office Access 2007
• Getting Started with Microsoft Office Access: The page that is displayed when you
start Access from the Windows Start button or from a desktop shortcut.
• The Office Fluent Ribbon: The area at the top of the program window where you can
choose commands.
• Command Tab: Commands are combined in ways that make sense.
• Contextual Command Tab: A command tab that appears depending on your context, that is,
the object that you are working on or the task that you are performing.
• Gallery: A control that displays a choice visually so that you can see the results that you
will get.
• Quick Access Toolbar: A single standard toolbar that appears on the Ribbon and offers
single-click access to the most needed commands, such as Save and Undo.
• Navigation Pane: The area on the left side of the window that displays your database
objects. The Navigation Pane replaces the Database window from earlier versions of Access.
• Tabbed Documents: Your tables, queries, forms, reports, pages, and macros are displayed
as tabbed documents.
• Status Bar: The bar at the bottom of the program window that displays status information
and includes buttons that allow you to change your view.
• Mini Toolbar: An on-object element that transparently appears above text that you have
selected, so that you can easily apply formatting to the text.
10.3.2 Tool Bars and Their Icons
(i) Getting Started with Microsoft Office Access
When you start Office Access 2007 by clicking the Windows Start button or a desktop shortcut
(but not when you click on a database), the Getting Started with Microsoft Office Access page
appears as shown in Figure 10.1. This page shows what you can do to get started in Office
Access 2007.
Figure 10.1: Microsoft Office Access 2007
(ii) The Ribbon
The Office Fluent Ribbon is the primary replacement for menus and toolbars and provides the
main command interface in MS-Office Access 2007. One of the main advantages of the Ribbon
is that it consolidates, in one place, those tasks or entry points that used to require menus,
toolbars, task panes, and other GUI components to display. This way, you have only one place in
which to look for commands, instead of a multitude of places.
When you open a database, the Ribbon appears at the top of the main MS-Office Access 2007
window, where it displays the commands in the active command tab.
The Ribbon contains a series of command tabs that contain commands as shown in Figure 10.2.
In MS-Office Access 2007, the main command tabs are Home, Create, External Data, and
Database Tools. Each tab contains groups of related commands, and these groups surface some
of the additional new GUI elements, such as the gallery, which is a new type of control that
presents choices visually.
The commands on the Ribbon take into account the currently active object. For example, if you
have a table opened in Datasheet view and you click Form on the Create tab, in the Forms group,
MS-Office Access 2007 creates the form, based on the active table. That is, the name of the
active table is entered in the form's Record Source property.
Figure 10.2: Ribbon
You can use keyboard shortcuts with the Ribbon. All of the keyboard shortcuts from an earlier
version of MS-Access continue to work. The Keyboard Access System replaces the menu
accelerators from earlier versions of Access. This system uses small indicators with a single
letter or combination of letters that appear on the Ribbon and indicate what keyboard shortcut
activates the control underneath. When you have selected a command tab, you can browse the
commands available within that tab.
(iii) Command Tab
1. Start MS-Access.
2. Click the tab that you want.
The following table shows a representative sampling of the tabs and the commands available on
each tab. The tabs and the commands available change depending on what you are doing.
Table 10.1: Tabs and Commands
Command Tab       Common things you can do
Home              Select a different view.
                  Copy and paste from the clipboard.
                  Set the current font characteristics.
                  Set the current Font Alignment.
                  Apply rich text formatting to a memo field.
                  Work with records (Refresh, New, Save, Delete, Totals, Spelling, More).
                  Sort and filter records.
                  Find records.
Create            Create a new blank table.
                  Create a new table using a table template.
                  Create a list on a SharePoint site and a table in the current database that links to the newly created list.
                  Create a new blank table in Design view.
                  Create a new form based on the active table or query.
                  Create a new pivot table or chart.
                  Create a new report based on the active table or query.
                  Create a new query, macro, module, or class module.
External Data     Import or Link to external data.
                  Export data.
                  Collect and update data via e-mail.
                  Work with offline SharePoint lists.
                  Create saved imports and saved exports.
                  Move some or all parts of a database to a new or existing SharePoint site.
Database Tools    Launch the Visual Basic editor or run a macro.
                  Create and view table relationships.
                  Show/hide object dependencies or the property sheet.
                  Run the Database Documenter or analyze performance.
                  Move data to Microsoft SQL Server or to an Access (Tables only) database.
                  Run the Linked Table Manager.
                  Manage Access add-ins.
                  Create or edit a Visual Basic for Applications (VBA) module.
10.3.3 Creating a New Database
In MS-Access we have a variety of options for creating or opening a database. We can open a
new blank database, create a new database from a featured template, create a new database from
a Microsoft Office Online template, or open a recently used database. These options are
explained below:
(i) Open a New Blank Database
1. Start Access from the Start menu or from a shortcut. The Getting Started with Microsoft
Office Access page appears.
2. On the Getting Started with Microsoft Office Access page, under New Blank Database,
click Blank Database.
3. In the Blank Database pane, in the File Name box, type a file name or use the one that is
provided for you.
4. Click Create.
The new database is created, and a new table is opened in Datasheet view.
Figure 10.3: Creating a New Database
Once you have created a blank database with a database name, you can create the following six
kinds of objects:
• Tables - a collection of data about a specific topic, such as products or suppliers.
• Queries - a command for viewing or analyzing data in different ways, or the result of such a
command.
• Forms - a friendly interface for adding a new record.
• Reports - an object that presents the data in an organized way according to your specification.
Examples are telephone bills, sales summaries etc.
• Macros - a set of one or more actions that each perform a particular operation, such as
opening a form or printing a report. Macros can help you to automate common tasks. For
example, you can run a macro that prints a report when a user clicks a command button.
• Modules - a collection of small programs and procedures that are stored together as a unit.
(ii) Create a New Database from a Featured Template
1. Start Access from the Start menu or from a shortcut.
2. On the Getting Started with Microsoft Office Access page, under Featured Online
Templates, click a template.
3. In the File Name box, type a file name or use the one that is provided for you.
4. Optionally, check the Create and link your database to a Windows SharePoint Services
site box if you want to link to a Windows SharePoint Services site.
5. Click Create (or click Download). MS-Access creates a new database from the
template and opens it.
(iii) Create a New Database from a Microsoft Office Online Template
1. Start Access from the Start menu or from a shortcut.
2. On the Getting Started with Microsoft Office Access page, in the Template Categories
pane, click a category and then, when the templates in that category appear, click a
template.
3. In the File Name box, type a file name or use the one that is provided for you.
4. Click Download.
(iv) Open a Recently Used Database
1. Start Access.
2. On the Getting Started with Microsoft Office Access page, under Open Recent Database,
click the database that you want to open. MS-Access will open the database.
10.3.4 Creating a Table:
There are three ways to create a table:
• Use Datasheet View, i.e. enter data directly
• Use Design View
• Use a Table Template
(i) Create a Table in Datasheet View
To create a blank (empty) table in Datasheet View, on the Ribbon you can:
• Click Create → Table (see Figure 10.4).
Figure 10.5 shows a Datasheet View with the column headings ID and Add New Field across the
top of the datasheet. Data can be entered directly into it. After you enter data and press the Enter
key, the column heading Add New Field automatically changes to Field1 and the next column's
heading becomes Add New Field. At the same time, an ID number is assigned to that row. When
you save the new datasheet, Microsoft Access will analyze your data and automatically assign
the appropriate data type and format for each field. Because the default field names are not
descriptive, you may want to rename the fields.
Figure 10.4: Ribbon for Creating New Table
a) Renaming Fields:
1. Place the cursor over the column heading you want to rename and double click. The column
heading will appear highlighted and the cursor will be blinking (edit mode).
2. Type the name you want to use and then press the Enter key.
3. Repeat the first two steps for the second column, and so on.
Figure 10.5: Creating a Table in Datasheet View (Renaming Fields)
Each column corresponds to a field, and each row corresponds to a record. Now we are ready to
add the information. For example, if we are building a database for a company, the first table
may be Employee, with fields such as SSN, LastName, FirstName, and so on. Figure 10.6 shows
the Employee table as an example.
Figure 10.6: Datasheet View (Employee Table)
b) Summarizing Datasheet View
Figure 10.7: Summary of Datasheet View (Employee Table)
(ii) Create a Table in Design View
In Design View we can add new fields, define how each field appears or handles data, and
create a primary key. To create a blank (empty) table in Design View, you can:
• Click Create → Table Design as shown in Figure 10.4. The Design View shown in Figure 10.8
will appear.
• In this view, we can specify detailed properties for each field, including the length and
type of information stored in the field. To enter data into the table, however, we must use
Datasheet View or Forms. The Design View for the example Employee table mentioned
before will look like Figure 10.9.
• There are three columns in the top portion of the window. The Field Name column holds the
name of each field. For example, SSN, FirstName and LastName are proper field names for the
Employee table. The name of a field must follow the MS-Access object-naming rules. The Data
Type is like the domain of an attribute.
Figure 10.8: Design View
Figure 10.9: Design View (Employee Table)
It provides a list of data types that we can choose from, including Text, Memo, Number, Date,
and so on. The Description column allows us to describe the field; it is optional. A description
helps new users to understand the specifications and meaning of your fields. Table 10.3
summarizes the data types available in MS-Access.
You can set up the properties of a field in the Field Properties pane in the bottom half of the
window. Table 10.2 describes the properties available for setup.
Before we save the table and quit, we need to specify the primary key. In our Employee table,
SSN is a good choice for the primary key. To define SSN as the primary key, click the Field
Selector (shown in Figure 10.8) for the SSN field. The Field Selector is the gray bar on the left
side of the Table Design grid beside each field. When we click it, the whole row appears
highlighted. Then click Edit → Primary Key, or click the Primary Key button (the key symbol,
shown in Figure 10.9) on the toolbar in Design View. A key symbol will appear on the Field
Selector. Save the table as Employee. Now we have created one table.
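The effect of choosing SSN as the primary key can be illustrated outside Access. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than Access itself; the column types and the sample values are assumptions made for illustration. It shows that once SSN is the primary key, a second record with the same SSN is rejected.

```python
import sqlite3

def create_employee_table(conn):
    """Create an Employee table whose primary key is SSN."""
    conn.execute("""
        CREATE TABLE Employee (
            SSN       TEXT PRIMARY KEY,   -- uniquely identifies each record
            LastName  TEXT NOT NULL,
            FirstName TEXT NOT NULL
        )
    """)

def add_employee(conn, ssn, last_name, first_name):
    """Insert one record; return False if the SSN already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                         (ssn, last_name, first_name))
        return True
    except sqlite3.IntegrityError:
        return False
```

For example, after create_employee_table(conn), calling add_employee twice with the same SSN should succeed the first time and fail the second time, leaving a single record in the table.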
Table 10.2: Field Properties in Design View
Table 10.3: Data Types in MS Access
(c) Summarizing Design View
Figure 10.10: Summary of Design View (Employee Table)
(iii) Create a Table Based on a Table Template
To create a Contacts, Tasks, Issues, Events or Assets table, you might want to start with the table
templates for these subjects that come with Office Access 2007. To choose one of these
predefined templates for your table, you can:
• Click Create → Table Templates (see Figure 10.4).
• Select one of the available templates from the dropdown list.
10.3.5 Relationships
The tables in a database may be linked to each other by the creation of relationships between
specific fields in the database. These relationships can be viewed in the Relationships window:
Select Relationships on the Database Tools tab.
Figure 10.11: Relationship
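A relationship between two tables corresponds to a foreign key in the underlying relational model. The sketch below uses Python's built-in sqlite3 module, not Access; the Department table, its fields and the sample values are hypothetical and serve only to illustrate the idea. It links Employee to Department through a DeptId field, shows that a record pointing to a non-existent department is rejected, and shows how the linked tables are read together.

```python
import sqlite3

def build_related_tables(conn):
    """Create two tables linked through the DeptId field."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only when enabled
    conn.executescript("""
        CREATE TABLE Department (
            DeptId   INTEGER PRIMARY KEY,
            DeptName TEXT
        );
        CREATE TABLE Employee (
            SSN      TEXT PRIMARY KEY,
            LastName TEXT,
            DeptId   INTEGER REFERENCES Department(DeptId)
        );
        INSERT INTO Department VALUES (1, 'Sales');
        INSERT INTO Employee VALUES ('111-11-1111', 'Doe', 1);
    """)

def orphan_is_rejected(conn):
    """Try to add an employee whose department does not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO Employee VALUES ('999-99-9999', 'Ghost', 99)")
        return False
    except sqlite3.IntegrityError:
        return True

def employees_with_department(conn):
    """Read both tables together by following the relationship."""
    return conn.execute("""
        SELECT e.LastName, d.DeptName
        FROM Employee AS e
        JOIN Department AS d ON e.DeptId = d.DeptId
        ORDER BY e.LastName
    """).fetchall()
```

In Access the same kind of rule is set up graphically in the Relationships window; the sketch only mirrors the concept.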
(i) Creating a Table by Using the Table Wizard
Microsoft Access has a wizard named the Table Wizard that will create a table for you. This
wizard gives you suggestions about what type of table you can create (for example, a Mailing
List table, a Students table, a Tasks table, and so on) and gives you many different possible
names for fields within these tables. To use the Table Wizard to create a table, follow these
steps:
1. Create a new, blank database.
2. In the Database window, click Tables under Objects, and then click New.
3. In the New Table dialog box, double-click Table Wizard.
4. Follow the directions in the Table Wizard pages.
If you want to modify the table that the Table Wizard creates, open the table in Design view
when you have finished using the Table Wizard.
(ii) Creating a Table by Entering Data in a Datasheet
In Microsoft Access, you can also create a table by just entering data into columns (fields) in a
datasheet. If you enter data that is consistent in each column (for example, only names in one
column, or only numbers in another column), Access will automatically assign a data type to the
fields. To create a table by just entering data in a datasheet, follow these steps:
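1. Create a new, blank database.
2. In the Database window, click Tables under Objects, and then click New.
3. In the New Table dialog box, double-click Datasheet View. A blank datasheet is
displayed with default column names Field1, Field2, and so on.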
4. Rename each column that you want to use. To do so, double-click the column name, type
a name for the column, and then press ENTER.
You can insert additional columns at any time. To do so, click in the column to the right
of where you want to insert a new column, and then on the Insert menu, click Column.
Rename the column as described earlier.
5. Enter your data in the datasheet. Enter each kind of data in its own column. For example,
if you are entering names, enter the first name in its own column and the last name in a
separate column. If you are entering dates, times, or numbers, enter them in a consistent
format. If you enter data in a consistent manner, Microsoft Access can create an
appropriate data type and display format for the column. For example, for a column in
which you enter only names, Access will assign the Text data type; for a column in which
you enter only numbers, Access will assign a Number data type. Any columns that you
leave empty will be deleted when you save the datasheet.
6. When you have added data to all the columns that you want to use, click Save on the File
menu.
7. Microsoft Access asks you if you want to create a primary key. If you have not entered
data that can be used to uniquely identify each row in your table, such as part numbers or
ID numbers, it is recommended that you click Yes. If you have entered data that can
uniquely identify each row, click No, and then specify the field that contains that data as
your primary key in Design view after the table has been saved. To define a field as your
primary key after the table has been saved, follow these steps:
a. In Design view, open the table that Access created from the data that you entered in
the datasheet.
b. Select the field or fields that you want to define as the primary key.
To select one field, click the row selector for the desired field.
To select multiple fields, hold down the CTRL key, and then click the row
selector for each field.
c. On the Edit menu, click Primary Key.
If you want the order of the fields in a multiple-field primary key to be different from the
order of those fields in the table, click Indexes on the toolbar to display the Indexes
window, and then reorder the field names for the index named Primary Key.
As mentioned earlier, Microsoft Access will assign data types to each field (column)
based on the kind of data that you entered. If you want to customize a field's definition
further (for example, to change a data type that Access automatically assigned, or to
define a validation rule), open the table in Design view.
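The idea of assigning a data type from consistently entered data can be sketched as follows. This is not the algorithm that Access uses, which is not described here; it is a simplified illustration in Python, and the function name infer_type, the three type labels and the single accepted date format (YYYY-MM-DD) are assumptions made for the example.

```python
from datetime import datetime

def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def _is_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")  # assumed date format
        return True
    except ValueError:
        return False

def infer_type(values):
    """Guess a column type from the strings entered in that column.

    Returns None for a column with no data (such a column would be dropped),
    otherwise "Number", "Date/Time" or "Text".
    """
    entered = [v.strip() for v in values if v.strip()]
    if not entered:
        return None
    if all(_is_number(v) for v in entered):
        return "Number"
    if all(_is_date(v) for v in entered):
        return "Date/Time"
    return "Text"  # mixed or non-numeric data falls back to Text
```

The sketch also shows why consistent entry matters: a single non-numeric value in an otherwise numeric column makes the whole column fall back to Text.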
(iii) Creating a Table by Entering Data in Design View
If you want to create the basic table structure yourself and define all the field names and data
types, you can create the table in Design view. To do so, follow these steps:
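1. Create a new, blank database.
2. In the Database window, click Tables under Objects, and then click New.
3. In the New Table dialog box, double-click Design View.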
4. In the <Table Name>: Table dialog box, define each of the fields that you want to include
in your table. To do so, follow these steps:
a. Click in the Field Name column, and then type a unique name for the field.
b. In the Data Type column, accept the default data type of Text that Access assigns or
click in the Data Type column, click the arrow, and then select the data type that you
want.
c. In the Description column, type a description of the information that this field will
contain. This description is displayed on the status bar when you are adding data to
the field, and it is included in the Object Definition of the table. The description is
optional.
d. Once you have added some fields, you may need to insert a field between two other
fields. To do so, click in the row below where you want to add the new field, and then
on the Insert menu, click Rows. This creates a blank row in which you can add a new
field.
To add a field to the end of the table, click in the first blank row. After you have
added all the fields, define a primary key field before saving your table. A primary
key is one or more fields whose value or values uniquely identify each record in a
table. To define a primary key, follow these steps:
e. Select the field or fields that you want to define as the primary key. To select one
field, click the row selector for the desired field. To select multiple fields, hold down
the CTRL key, and then click the row selector for each field.
f. On the Edit menu, click Primary Key.
If you want the order of the fields in a multiple-field primary key to be different from the order
of those fields in the table, click Indexes on the toolbar to display the Indexes dialog box, and
then reorder the field names for the index named Primary Key.
You do not have to define a primary key, but it is usually a good idea. If you do not define a
Primary key, Microsoft Access asks if you want Access to create one for you when you save the
table. When you are ready to save your table, on the File menu, click Save, and then type a
unique name for the table.
10.3.6 Import/ Export Tables
One of the most useful features of Access is its ability to interface with data from many other
programs. In fact, it is difficult to summarize in a single section all the ways in which you can
move data into and out of Access. For example, here are just a few ways in which you might use
the data-exchange features of Access:
• To combine data that was created in other programs.
• To transfer data between two other programs.
• To accumulate and store data over the long term, occasionally exporting data to other
programs such as Excel for analysis.
(i) External Data Operations in Access
In many programs, you use the Save As command to save a document in another format, so that
you can open it in another program. In Access, however, the Save As command is not used in the
same way. You can save Access objects as other Access objects, and you can save Access
databases as earlier versions of Access databases, but you cannot save an Access database as,
say, a spreadsheet file. Likewise, you cannot save a spreadsheet file as an Access file (.accdb).
Instead, you use the commands on the External Data tab in Access to import data from, or
export data to, other file formats.
(ii) Types of Data That Access Can Import, Link To, Or Export
A quick way to learn about the data formats that Access can import or export is to open a
database and then explore the External Data tab on the ribbon.
Figure 10.12: External Data Tab
1. The Import & Link group (marked 1 in Figure 10.12) displays icons for the data formats
that Access can import from or link to.
2. The Export group (marked 2 in Figure 10.12) displays icons for all the formats that Access
can export data to.
3. In each group, you can click More (marked 3 in Figure 10.12) to see more formats that
Access can work with.
If you do not see the exact program or data type that you need, chances are your data can be
exported by the other program into a format that Access understands. For example, most
programs can export columnar data as delimited text, which is then easily imported into Access.
The following table shows which formats can be imported into, linked to, or exported out of
Access:
Table 10.4: Program or Formats
Program or format | Import allowed? | Linking allowed? | Exporting allowed?
Microsoft Office Excel | Yes | Yes | Yes
Microsoft Office Access | Yes | Yes | Yes
ODBC Databases (for example, SQL Server) | Yes | Yes | Yes
Text files (delimited or fixed-width) | Yes | Yes | Yes
XML Files | Yes | No | Yes
PDF or XPS files | No | No | Yes
E-mail (file attachments) | No | No | Yes
Microsoft Office Word | No, but you can save a Word file as a text file and then import the text file. | No, but you can save a Word file as a text file and then link to the text file. | Yes (you can export as Word Merge or as Rich Text)
SharePoint List | Yes | Yes | Yes
Data Services (see note) | No | Yes | No
HTML Documents | Yes | Yes | Yes
Outlook Folders | Yes | Yes | No, but you can export as a text file, and then import the text file into Outlook.
dBase files | Yes | Yes | Yes
(iii) Import or link to Data in Another Format
The general process for importing or linking data is as follows:
1. Open the database that you want to import or link data into.
2. On the External Data tab, click the type of data that you want to import or link to. For
example, if your source data is in a Microsoft Excel workbook, click Excel.
3. In most cases, Access starts the Get External Data wizard. In the wizard, you may be
asked for some or all of the following information:
• Indicate whether the first row contains column headings, or whether it should be treated
as data.
• Specify the data type of each column.
• Choose whether to import the structure only, or the structure and the data together.
• If importing, specify whether you want Access to add a new primary key to the new
table, or use an existing key.
• Specify a name for the new table.
Figure 10.13: Source Data Tab
4. On the last page of the wizard, Access usually asks you if you want to save the details of
the import or link operation. If you think you will need to perform the same operation on a
recurring basis, select the Save import steps check box, fill in the information, and then
click Close. Then, you can click Saved Imports on the External Data tab to re-run the
operation.
After you have completed the wizard, Access notifies you of any problems that might have
occurred during the import process. In some cases, Access might create a new table called
Import Errors, which contains any data that it was unable to import successfully. You can
examine the data in this table to try to find out why the data did not import correctly.
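The import process described above (first row as headings, a data type per column, an existing field used as the primary key, and a record of the rows that could not be imported) can be sketched in a few lines. This is an illustration in Python with the built-in csv and sqlite3 modules, not the Access wizard itself; the function name import_employees, the Employee columns and the sample data are assumptions.

```python
import csv
import io
import sqlite3

def import_employees(conn, csv_text, first_row_has_headings=True):
    """Import delimited text into an Employee table.

    Returns (number_of_rows_imported, import_errors), where import_errors
    lists (line_number, row, reason) for every row that was skipped.
    """
    conn.execute("""
        CREATE TABLE IF NOT EXISTS Employee (
            SSN      TEXT PRIMARY KEY,  -- existing field used as the key
            LastName TEXT,
            Salary   REAL               -- data type chosen for this column
        )
    """)
    reader = csv.reader(io.StringIO(csv_text))
    first_line = 1
    if first_row_has_headings:
        next(reader, None)              # skip the heading row
        first_line = 2
    imported, import_errors = 0, []
    for line_number, row in enumerate(reader, start=first_line):
        try:
            ssn, last_name, salary = row          # wrong column count -> ValueError
            with conn:
                conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                             (ssn, last_name, float(salary)))
            imported += 1
        except (ValueError, sqlite3.IntegrityError) as exc:
            import_errors.append((line_number, row, str(exc)))
    return imported, import_errors
```

The import_errors list plays the role of the Import Errors table: each entry records which line failed and why, so the source data can be corrected and imported again.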
(iv) Export Data to another Format
The general process for exporting data from Access is as follows:
1. Open the database that you want to export data from.
2. In the Navigation Pane, select the object that you want to export the data from. You can
export data from table, query, form, and report objects, although not all export options are
available for all object types.
3. On the External Data tab, click the type of data that you want to export to. For example, to
export data in a format that can be opened by Microsoft Excel, click Excel.
Figure 10.14: Export Data Tab
4. In most cases, Access starts the Export wizard. In the wizard, you may be asked for
information such as the destination file name and format, whether to include formatting
and layout, which records to export, and so on.
5. On the last page of the wizard, Access usually asks you if you want to save the details of
the export operation. If you think you will need to perform the same operation on a
recurring basis, select the Save export steps check box, fill in the information, and then
click Close. Then, you can click Saved Exports on the External Data tab to re-run the
operation.
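Exporting amounts to writing the rows of a table or query in a format that another program understands. The sketch below, written in Python with the built-in csv and sqlite3 modules rather than the Access Export wizard, writes the result of a query as delimited text with a heading row. The function name export_to_csv is an assumption made for the example.

```python
import csv
import io
import sqlite3

def export_to_csv(conn, query):
    """Run a query and return its result as comma-delimited text."""
    cursor = conn.execute(query)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # First row: the column headings reported by the query.
    writer.writerow([column[0] for column in cursor.description])
    # Remaining rows: one line per record.
    writer.writerows(cursor)
    return buffer.getvalue()
```

The returned text can be saved to a file and opened in a spreadsheet program, which is the same route the chapter suggests for programs that Access cannot export to directly.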
10.4 Summary
MS-Access is a powerful RDBMS that is used to create and manage databases. It is a
graphical user interface application that is easy to use, yet powerful enough to manage large
volumes of data. It has many built-in features to assist you in constructing and viewing
information related to different environments, such as scientific, inventory, financial, payroll,
education and hospitality. The information can be viewed, sorted, manipulated, retrieved and
printed in various ways. MS-Access gives you a platform from which you can retrieve
information accurately and quickly. The default file extension in MS-Access 2007 is .accdb
(earlier versions used .mdb).
1. Explain the steps to create a table in Design view. Discuss the process of creating relationships.
2. What data types are supported in MS-Access for creating a table?
3. Discuss the field properties of a table in MS-Access.
4. Discuss the formats and programs whose data may be imported into and exported from MS-Access.
5. What are the different steps for creating a database in MS-Access?
6. Why do we use MS-Access? Discuss the different ways to create a table.
7. What do you understand by external data operations in MS-Access?
Chapter – 11: Database Operation in MS-Access
Structure:
11.1 Introduction
11.2 Objective
11.3.1 Queries
(i) Creating Queries
(ii) Query Wizard: A Select Query
(iii) Design View of an Existing Query
(iv) Creating a Query Totally in Design View
(v) Use a Query Wizard to Create a Crosstab Query
(vi) Create a Parameter Query in Design View
(vii) Creating Action Queries in Design View
(viii) Make-Table Queries
11.3.2 Reports
(i) Views
(ii) Report Wizard
(iii) Report Tool
(iv) Report Design
11.3.3 Forms
3.3.1 New Form Options
3.3.2 Design View of Forms
3.3.3. Design View Form Sections
3.3.4 Design View Info
11.4 Summary
11.1 Introduction
This chapter covers essential operations of any DBMS or RDBMS package: queries, forms and
reports. Not all of the information in a database is required at one time; one usually needs
selected data. A query can therefore filter data from a single table or from a group of related tables.
Forms provide an interactive way to enter data into a table. We can view, modify or delete data
stored in the table by using a form. The user can choose the design of the form from various
ready-made designs provided by MS- Access.
Reports are used to present data in a predefined or user-defined format. They are generally
prepared for presenting data in hard copy form using a printing device. Reports take data from
database tables and present it, in a way the user wants. One can group data on certain fields or
conditions or sort data on one or more fields in ascending or descending order.
11.2 Objective
The idea of this chapter is to make the student familiar with different database operations such as
queries, reports and forms. For this purpose, the author of this chapter provides an overview of
MS-Access 2007 concentrating on the said aspects. The screen-shots are provided at the
appropriate places for the better understanding of the students.
11.3.1 Queries
A query defines a saved filter that retrieves data, or an action that is performed on records.
Query results are also called dynasets (dynamic subsets of a table).
(i) Creating Queries
Queries can be one of four main types:
• A Select Query retrieves and displays records from tables according to the fields you pick
and the criteria you place on the query.
• A Crosstab Query displays sums, counts, and averages from one field in a table and
shows them in a datasheet with fields on the left and across the top.
• An Action Query performs operations on the records that match your criteria. Action queries
include make-table queries, update queries, append queries and delete queries.
• A Parameter Query prompts you for information to use when the query runs. It can help
you to query addresses state by state without creating 50 different queries.
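The four query types can be compared side by side in a short sketch. It is written in Python with the built-in sqlite3 module instead of Access, so the SQL is SQLite's dialect; in particular, the crosstab is written with conditional sums because SQLite has no dedicated crosstab statement. The Orders table, its fields and its rows are hypothetical.

```python
import sqlite3

def build_orders(conn):
    """Create a small hypothetical Orders table."""
    conn.execute("CREATE TABLE Orders (Customer TEXT, State TEXT, Product TEXT, Amount INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
        ("Ann", "NY", "A", 50),
        ("Bob", "NY", "B", 150),
        ("Cy",  "CA", "A", 200),
    ])
    conn.commit()

def select_query(conn):
    """Select query: chosen fields, filtered by a fixed criterion."""
    return conn.execute(
        "SELECT Customer, Amount FROM Orders WHERE Amount > 100 ORDER BY Customer"
    ).fetchall()

def parameter_query(conn, state):
    """Parameter query: the criterion is supplied when the query runs."""
    return conn.execute(
        "SELECT Customer FROM Orders WHERE State = ? ORDER BY Customer", (state,)
    ).fetchall()

def crosstab_query(conn):
    """Crosstab: states down the left, products across the top, sums inside."""
    return conn.execute("""
        SELECT State,
               SUM(CASE WHEN Product = 'A' THEN Amount ELSE 0 END) AS A,
               SUM(CASE WHEN Product = 'B' THEN Amount ELSE 0 END) AS B
        FROM Orders GROUP BY State ORDER BY State
    """).fetchall()

def action_query(conn):
    """Action (update) query: changes the matching records, returns how many."""
    with conn:
        cursor = conn.execute("UPDATE Orders SET Amount = Amount * 2 WHERE State = 'NY'")
    return cursor.rowcount
```

Note that a single parameter_query replaces one fixed query per state, which is the point made in the last bullet above.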
(ii) Query Wizard: A Select Query
A query may be created at any time after you have a table. Access to all query options is found
on the Create Ribbon. Click the Create Ribbon and the Query Wizard button. From this point on
2007 is very similar to 2003.
 Follow the wizard dialog boxes and answer the questions to
create your select query. Select a query type first.
 First choose the object to base it on. A query may be based
on one or more tables or on another query.
 Select the fields you need to show in your query and send
them across.
 If you need fields from a second table or query, reselect that
table or query and add the fields to the list already chosen.
 In the next box choose whether you want Detail, which shows
all fields of info selected, or Summary, which shows only the
summarized results. If you choose Detail, your query is
finished.
Figure 11.1: Select Query
 Click Summary and then Summary Options to see the numerical fields listed and the option
for calculations.
 The Detail/Summary screen only appears if one of your fields has numerical data.
 Give the new query a name and choose to open the query or go directly into Design view.
 Click finish. It looks just like a datasheet, but it gives you filtered data, on command, without
redoing a filter.
 You may click on the design view icon to edit this query further with design view of a query.
 Note that in Access 2007 a table and a query can also make grand totals. You no longer need
to have a report to show the totals.
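The difference between the Detail and Summary choices can be sketched outside Access. The following is only an illustration using Python's built-in sqlite3 module as a stand-in for Access; the table name `computers`, its columns and the sample rows are assumptions based on the computer example used later in this chapter.

```python
import sqlite3

# Hypothetical sample table; SQLite stands in for Access here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE computers (manufacturer TEXT, computer_type TEXT, cost REAL)"
)
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?)",
    [
        ("Dell", "laptop", 1200.0),
        ("Dell", "desktop", 800.0),
        ("HP", "laptop", 1000.0),
    ],
)

# "Detail": every selected field of every record.
detail = conn.execute(
    "SELECT manufacturer, computer_type, cost "
    "FROM computers ORDER BY manufacturer, cost"
).fetchall()

# "Summary": one row per group, with calculations on the numerical field.
summary = conn.execute(
    "SELECT manufacturer, SUM(cost), AVG(cost) "
    "FROM computers GROUP BY manufacturer ORDER BY manufacturer"
).fetchall()

print(detail)
print(summary)
```

The Summary result has one row per manufacturer, which is why the wizard only offers it when at least one chosen field is numerical.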
(iii) Design View of an Existing Query
In query design view a query grid shows the fields you have selected and the field list from the
table you are using for the query.
 If you open an existing query, the object it is based on shows in the area at the top.
Figure 11.2: Design View of Query
 The query properties are down the left side of the grid and change depending on the type of
query selected.
 Add more fields to your query either by clicking on an empty field cell down arrow and
choosing a field to add, or by dragging the field directly from the field list at the top and
dropping it on the field row.
 Make the query do an alphanumeric or numeric sort on any chosen field by clicking on the
Sort line in the chosen field and choose ascending or descending.
 The Default in queries has the first field auto-sorting.
 If the box in the show line is not checked this field will not show on your finished query.
 A hidden field can still be used as a sort field or a limiting field if criteria are set.
 Queries can pull data from multiple tables or queries. If you have a true relational database,
your query may be used to pull together all data from all tables into one large query.
 Click on the Show Table icon and select a second table or query.
 Choose the field from the popup list in the field cell or drag any field from the table or query
field list to use in your query.
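The design-grid ideas above (a sort on one field, a hidden field that still limits the results, and a second table pulled into the query) can be sketched as follows. This is a rough analogy in Python with sqlite3, not Access itself; the `rooms` table, the `room_id` link and all sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rooms (room_id INTEGER PRIMARY KEY, room_name TEXT);
    CREATE TABLE computers (id INTEGER PRIMARY KEY, manufacturer TEXT,
                            computer_type TEXT, cost REAL, room_id INTEGER);
    INSERT INTO rooms VALUES (1, 'Lab A'), (2, 'Office');
    INSERT INTO computers VALUES
        (1, 'Dell', 'laptop', 1200, 1),
        (2, 'HP', 'desktop', 700, 2),
        (3, 'Dell', 'server', 3000, 1),
        (4, 'Acer', 'laptop', 900, 2);
    """
)

# computer_type is used as a criterion but is not in the SELECT list,
# like a field whose Show box is unchecked. ORDER BY plays the role of
# the Sort line, and the JOIN pulls fields from a second table.
rows = conn.execute(
    """
    SELECT c.manufacturer, c.cost, r.room_name
    FROM computers AS c
    JOIN rooms AS r ON r.room_id = c.room_id
    WHERE c.computer_type = 'laptop'
    ORDER BY c.cost DESC
    """
).fetchall()

print(rows)
```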
(iv) Creating a Query Totally in Design View
 Choose Create\Design View to start a blank query.
 Select the table(s) or query to base your new query on. Close the pop-up "Show Table" box
to continue.
 A special Query Tools/ Design Ribbon opens on the ribbon bar when you close the Show
Table box.
 To create a simple query, select the fields to query and set the criteria. Drag and drop fields
to the bottom grid or choose them from the drop down list that appears when you click on the
field line boxes.
 Choose from various options for the other properties given.
 Query options listed will depend on the query type you have started.
 Select a field to sort and choose ascending or descending.
 Determine whether or not to show each field by the check box.
Figure 11.3: Creating a Query in Design View
 Create an expression on the criteria line to filter out unwanted data or type in the exact data
you wish to see.
 If you click the View button on the Ribbon, you return to datasheet view to PREVIEW your
query. Remember that you are only previewing the query.
 You must close Design View of the query and choose to SAVE the query before you can
really be finished with the design and ready to run the query or use your tables or reports.
 Choose to turn your simple query into an action or crosstab query by choosing that option
from the Query type group on the Query Tools/ Design Ribbon.
(v) Use a Query Wizard to Create a Crosstab Query
Crosstab queries will display sums, counts, and averages from one field in a table and show this
in a datasheet with fields on the left and across the top. Use the wizard to help make your
crosstab query easy.
 Open Create\Query Wizard and choose the
Crosstab Query Wizard on the first screen.
 Choose the table or query to base this on.
Figure 11.4: Creating a Crosstab Query
 Choose the field you want to use as the rows.
 If you want the rows "grouped", use more than one field. The grouping follows the order in
which the fields are chosen.
 Next choose a field for the column headings.
 Next, choose the field you want calculated in the crosstab.
 The last step is the selection of the function to use. Choose from average, count, first, last,
maximum, minimum, standard deviation, sum, etc.
 Click next and name your crosstab query. Run the query to see how it looks.
 If the query doesn't include the data you want, start over and do it again.
 Click on design view to see how the fields and info are set up.
 Click design view after tuning it on.
 Save and run the query again to see the results.
 Other options for queries can be set in design view.
o Turn on the alpha sort of types.
o You can't put criteria below a value field. Put the criteria under a second field and make it
hidden if necessary. Access tries to correct errors to make a query work.
o Save and re-open the query. Access creates extra fields and moves criteria when
necessary.
o If you create an incorrect expression, Access tries to correct your expression or gives you
a warning message and refuses to save if you cannot correct the problem.
o Do not use the crosstab query wizard if you are querying multiple tables. Use Design
View instead to create the query and then choose crosstab query from the query types on
the Ribbon and fill in the fields and options.
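What a crosstab query computes can be sketched in a few lines of plain Python: one field supplies the row headings, a second supplies the column headings, and a function summarizes a third field in each cell. The helper name `crosstab` and the sample records are assumptions for illustration; Access builds the equivalent result itself.

```python
from collections import defaultdict

# Hypothetical records: (manufacturer, computer_type, cost)
records = [
    ("Dell", "laptop", 1200),
    ("Dell", "laptop", 1000),
    ("Dell", "desktop", 800),
    ("HP", "laptop", 900),
]


def crosstab(rows, func=sum):
    """Row heading = first field, column heading = second, value = third."""
    cells = defaultdict(list)
    for row_key, col_key, value in rows:
        cells[(row_key, col_key)].append(value)
    row_keys = sorted({r for r, _ in cells})
    col_keys = sorted({c for _, c in cells})
    table = {
        r: {c: (func(cells[(r, c)]) if (r, c) in cells else None) for c in col_keys}
        for r in row_keys
    }
    return col_keys, table


columns, table = crosstab(records)          # sums per cell
_, counts = crosstab(records, func=len)     # counts per cell

print(columns)
for manufacturer, row in table.items():
    print(manufacturer, row)
```

Cells with no matching records are left as `None`, which corresponds to an empty cell in the crosstab datasheet.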
(vi) Create a Parameter Query in Design View
A parameter is a question set to ask for criteria before running the query. It can be added to any
existing query. Make a copy of the select query we created, and turn it into a parameter query.
 Open a query in design view, and check to be sure all desired fields are chosen.
 Choose which field is to be the basis of the parameter query.
 Type in a question requesting the needed parameter as the criteria of that same field.
Example: [Enter the computer type:] on the criteria line under the Type field. Include
the brackets!
 It helps to list the options in the parameter
question: [Choose from the computer types:
laptop, desktop, profile, or server:]. Save the
query.
 Run the query. It asks you to enter the
parameter before running the query.
Figure 11.5: Parameter Query View
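The idea behind a parameter query, one saved query that is given its criterion at run time, can be sketched with a placeholder in Python's sqlite3 module. The function name `computers_of_type` and the sample data are hypothetical; in Access the bracketed prompt plays the role of the `?` placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (manufacturer TEXT, computer_type TEXT)")
conn.executemany(
    "INSERT INTO computers VALUES (?, ?)",
    [("Dell", "laptop"), ("HP", "desktop"), ("Acer", "laptop")],
)


def computers_of_type(connection, computer_type):
    """Run the same stored query with a different criterion each time."""
    cursor = connection.execute(
        "SELECT manufacturer FROM computers "
        "WHERE computer_type = ? ORDER BY manufacturer",
        (computer_type,),
    )
    return [row[0] for row in cursor]


print(computers_of_type(conn, "laptop"))
print(computers_of_type(conn, "desktop"))
```

One query definition serves every computer type, which is the same saving the text describes for querying addresses state by state.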
(vii) Creating Action Queries in Design View
The action queries must all be started in Design View as per the steps given below:
 Click Create\ Query Design.
 Add the table or query to use in your query and close the Show Table window.
 Choose the action query type you want from the Query Tools/ Design Ribbon which opens
when you close the show table box.
 Select the desired type of query and create one of each.
(viii) Make-Table Queries
 Start a new Design Query, and choose the table or query to base it on.
 Select the Make-Table option from the query types on the design ribbon.
 Give your new table a name and tell whether you are adding it to the current database,
another database or a new database.
 Choose the fields to add to the new table. (Add
manufacturer, computer type, cost, and date
purchased.)
 Set criteria if needed such as a sort or
manufacturer you need.
 Close, save, and name the query.
 Double-click to run the Make-Table query and
create your new table.
 This beats copy and paste, and the original table
is not affected.
Figure 11.6: Make Table Query
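The effect of a make-table query can be sketched as copying chosen fields into a new table while leaving the source untouched. The example below uses Python's sqlite3 module as a stand-in; the table names `computers` and `computers_copy`, the extra `serial_no` column and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE computers (manufacturer TEXT, computer_type TEXT, "
    "cost REAL, date_purchased TEXT, serial_no TEXT)"
)
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?, ?, ?)",
    [
        ("Dell", "laptop", 1200.0, "2004-03-01", "A1"),
        ("HP", "desktop", 700.0, "2006-01-15", "B2"),
    ],
)

# Make-table: build a new table from four chosen fields of the original.
conn.execute(
    "CREATE TABLE computers_copy AS "
    "SELECT manufacturer, computer_type, cost, date_purchased FROM computers"
)

copy_columns = [row[1] for row in conn.execute("PRAGMA table_info(computers_copy)")]
copy_count = conn.execute("SELECT COUNT(*) FROM computers_copy").fetchone()[0]
original_count = conn.execute("SELECT COUNT(*) FROM computers").fetchone()[0]

print(copy_columns, copy_count, original_count)
```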
The remaining action queries are (a) delete queries, (b) append queries and (c) update queries, as discussed
below:
a) Delete Queries:
 Start a new Design Query and choose the newly made table as the base.
 Select the Delete query option from the query types.
 Choose only fields needed for criteria from the new table.
 Set criteria on the date purchased field, so that computers with a date purchased before
11/22/2005 will be deleted. (<11/22/2005)
 Note that "#" signs will appear around the date automatically if you forget to add them.
 Save and name the query. Check for the "#" signs.
 Run the Delete query and then look at the table. The oldest computers are gone.
b) Append Queries:
 Start a Design Query and choose the original table as a base to pull your data from.
 Select the Append query option from the query types.
 Next you are asked to select the table to append the data onto. Choose the one created by
the make-table query.
 Choose the same four fields from the original table. (Add manufacturer, computer type,
cost, and date purchased.)
 Set the criteria so that computers with a date purchased before 11/22/2005 will be appended.
(<#11/22/2005#)
 Save and name the query.
 Run the Append query and then look at the computers appended to the table. All
computers are back.
 When you use an Append query, be sure the fields of data match in the two tables.
c) Update Queries:
 Start a new Design Query, and choose the appended make-table as a base.
 Select the Update Query option from the query type group.
 Choose only the field you are updating and the field you need to set the criteria. (If you
are not limiting criteria, you won't need the second field.)
 Set the criteria. If you need to refer back to data in the table to check for criteria, click to
open the table from the Navigation Pane. If necessary, select, copy and paste the data from the
field you want to update. Example: use manufacturer: "Dell" for the criteria.
 When you paste criteria in, "equals" is assumed and the quotation marks
automatically appear around Dell.
 Criteria with periods in them may confuse Access and require you to put in the quotation
marks to mark them as text.
 Set the Update To information for the field. Example: [cost] + 1000 adds 1000 to each cost
amount on the Dells. The dollar sign and decimal are unneeded and will be ignored. DO
NOT use a comma in the dollar amount. Just use the field name plus the amount to
increase a cost, or the field name minus the amount to reduce it: [cost] + 1000 or [cost] - 500.
 Save and name the query. Run the Update query, and then look at the computer costs that
updated in the make-table.
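The delete, append and update walk-through above can be traced end to end in a small sketch. It uses Python's sqlite3 module in place of Access, stores dates as ISO text (an assumption, so that plain string comparison orders them correctly), and works on hypothetical tables `computers` and `computers_copy` with made-up rows.

```python
import sqlite3

CUTOFF = "2005-11-22"  # ISO-formatted text dates compare correctly as strings

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE computers (manufacturer TEXT, computer_type TEXT,
                            cost REAL, date_purchased TEXT);
    INSERT INTO computers VALUES
        ('Dell', 'laptop', 1200, '2004-03-01'),
        ('HP', 'desktop', 700, '2006-01-15'),
        ('Dell', 'server', 3000, '2007-06-30');
    CREATE TABLE computers_copy AS SELECT * FROM computers;
    """
)


def count_rows(table):
    # Table names here are fixed literals from this sketch, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# a) Delete: remove computers purchased before the cutoff from the copy.
conn.execute("DELETE FROM computers_copy WHERE date_purchased < ?", (CUTOFF,))
after_delete = count_rows("computers_copy")

# b) Append: pull the same four fields for those computers back in.
conn.execute(
    "INSERT INTO computers_copy "
    "(manufacturer, computer_type, cost, date_purchased) "
    "SELECT manufacturer, computer_type, cost, date_purchased "
    "FROM computers WHERE date_purchased < ?",
    (CUTOFF,),
)
after_append = count_rows("computers_copy")

# c) Update: add 1000 to the cost of every Dell in the copy.
conn.execute("UPDATE computers_copy SET cost = cost + 1000 WHERE manufacturer = 'Dell'")
dell_costs = [
    row[0]
    for row in conn.execute(
        "SELECT cost FROM computers_copy WHERE manufacturer = 'Dell' ORDER BY cost"
    )
]
original_costs = [
    row[0] for row in conn.execute("SELECT cost FROM computers ORDER BY cost")
]

print(after_delete, after_append, dell_costs, original_costs)
```

The original table is never changed by any of the three steps, which mirrors the advice to practise action queries on the table created by the make-table query.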
11.3.2 Reports
An Access Report is a formatted, stylized way to print out any part of your database information.
Information in a report can be sorted, queried, formatted, calculated, or summarized. Your report
can be based on either a table or a query.
(i) Views
View options have changed in Access 2007. Check out the options on these four views. Each
view has a specific purpose in creating and modifying your report.
 Report View gives the Access tabbed view of your finished report,
lined up with any other open tabbed objects from the database.
 Print Preview takes you to the new print preview interface with
its program ribbon and features.
 Layout View is new in 2007 and allows you more flexibility in
setting up and modifying a report than before. You can do almost anything to fix
report problems in a graphical interface.
 Design View has also changed in 2007, but still looks similar to previous versions. You can
use it to add more controls, edit control sources, and change properties.
(ii) Report Wizard
Click the Create ribbon and look at the Reports group. You'll see several options for creating
reports. Click the Report Wizard tool and let Access lead you through the steps.
 If you first click on the table or query you wish to base the report on, it will already
be given in the selection box. You may still change to another table
or query.
 Go down the field list choosing which fields you need and
clicking the center arrow buttons to send the fields to your
report. Put them in the order you want on the report.
 Next a box comes up allowing you to set grouping options for
your field. E.g. group name by manufacturer or computer
type. More than one grouping field may be set.
 Choose a sort order for any field except the grouping field,
which is already sorted alphabetically (default). You'll see the
summary option if you have grouping and any number fields.
 If you don't pick a grouping field, the summary options do
not show.
 Experiment with the choices for summary options by
choosing from sum, average, minimum or maximum. Try a
count as well.
Figure 11.7: Report Wizard-1
 Decide whether you want Summary Only (one grand total‐no list) or the Detail and Summary
(shows all items and their totals).
Figure 11.8: Report Wizard-II
 The layout of your data is the next set of options. Also choose page orientation and whether
or not all of a field will be shown.
 Next is a choice of pre‐set styles which are expanded in 2007.
 Give the report a name and choose whether to preview or modify.
 The report name becomes the title of the report. It can be changed later if needed.
 Choosing to preview shows the finished report; choosing to modify takes you to Design View. Click Finish.
 If the report doesn't look right, delete it, and start the wizard again.
 If you do not choose to use groupings, the wizard gives you options for columnar, tabular, or
justified reports. You can make a report look more like a set of forms.
 Go to File\Page Setup to modify margins, orientation and set the number of columns.
 Switch to Layout View or Design View to make necessary modifications to the report before
saving. (See below for further instructions on this.)
 Click the print icon or File\Print for all other printing options.
(iii) Report Tool
The old AutoReport available in MS-Access 2003 is missing, but the new Report Tool in
MS-Access 2007 creates a report just as easily. It instantly places on the report all fields in the
table or query you have selected to base it on. The report also opens in Layout View, which gives
you full editing options. Select a table or query first, and then click Create\Report.
 All existing fields in the chosen table or query appear on the new report showing in Layout
view on the screen.
 Layout View lets you edit without going to Design View.
 Any extra fields may be manually selected and deleted in Layout View.
 Rearrange the order of fields or other objects on your report by dragging.
 Field column sizes may be increased or decreased by dragging the edges of a field.
Figure 11.9: Report Tools
Three Report Layout Tools (Contextual Tabs) used in modifying your report open automatically
as given below:
Figure 11.10: Report layout - I
Format: fonts, formatting, grouping, totals, gridlines, logos, page numbers, auto formatting
changes are here.
Figure 11.11: Report layout – II
Arrange: control layout options, alignment, positions, and property sheet tools are on the
arrange ribbon.
Figure 11.12: Report layout – III
Page Setup: change paper size, orientation, margins, columns, and other page set options.
 Use Format to set up a gridline on whatever you select.
 Use a fill color in the grids.
 Turn on totals in your report or add a page number. The list of options is on the ribbon.
 Click the Add Existing Fields button to get the Field List Pane turned on. Extra fields may be
added by dragging onto the report from the Field List pane.
 The Auto format gallery has an extensive set of report styles to click and apply.
Figure 11.13: Report Style
 When you are finished making design changes to your report, click on Report View to see
the finished version.
(iv) Report Design
If you initially start your report in Design View, it will not have a Record Source associated with
it, and you will have to manually assign one. All other report methods allow you to choose the
table or query to associate with the report.
To associate a table or query to the report:
 Click on Create\ Report Design.
 Right-click the box at the upper left corner of the ruler bars with a black button in it. This
selects the report, and the shortcut menu that pops up gives you the Properties option. Click it to
see the REPORT Property Sheet.
 If you select properties, but do not see the word REPORT in selection type, click the drop
down menu and choose report from the list.
 The first item in the properties box asks for a Record Source. Click the down arrow to see the
list, and choose the table or query you want for the report source.
 Now you are ready to open the field list and add controls to your design grid.
(v) More Advanced Design Tools:
When a control is added there are two pieces, the label and the control data. When you select
either item they can be moved together or separately. You can see dark blocks in the upper left
corners, but only one item has the yellow selection box showing. The items move together when
the compass is anywhere else on the box except on the dark block of the upper left corner.
To move the label only into the Page Header section, do the following:
Click to select the label so that the yellow box shows.
On the toolbar click the cut icon or press Ctrl+X.
Select the Page Header section by clicking on its section bar.
On the toolbar click the paste icon or press Ctrl+V, and the label appears.
Add a sum or average of a numerical field in a report.
The simplest way is to use the Report Wizard. If you use grouping, the next window will give
you a button for summary options. Pick from the choices and view your report. To add a
summary field directly in design view, follow these steps:
• Be sure the Report Footer section is showing or turn it on.
• Drag the bottom edge of the grid down to create space for the expression.
• Select the text box icon and draw the box in the section you want.
• Delete the label or move it to the left to use as a label box with the word "Sum" in it.
• The expression =Sum([fieldname]) may be typed directly into the text box. For an average, type
=Avg([fieldname]).
• Put the sum expression in the group footer for subtotals on your groups.
• Put the sum expression in the report footer for a grand total on the final page.
• Open the properties of the text box, because a Format must also be set there. Use the drop
down menu to select from the list: Currency, Long integer, etc.
• The Expression Builder may also be used to create a summary of a field.
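Where the sum expression sits decides what it totals: in a group footer it gives a subtotal per group, and in the report footer it gives one grand total. A minimal sketch of that layout in plain Python follows; the function name `build_report` and the sample records are assumptions for illustration, not part of Access.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (manufacturer, computer_type, cost)
records = [
    ("Dell", "laptop", 1200.0),
    ("HP", "desktop", 700.0),
    ("Dell", "server", 3000.0),
]


def build_report(rows):
    """Return report lines: group header, detail, group footer, report footer."""
    lines = []
    ordered = sorted(rows, key=itemgetter(0))  # the grouping field is sorted first
    for manufacturer, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        lines.append(manufacturer)                            # group header
        for _, computer_type, cost in group:
            lines.append(f"  {computer_type}: {cost:,.2f}")   # detail section
        subtotal = sum(cost for _, _, cost in group)
        lines.append(f"  Sum: {subtotal:,.2f}")               # group footer
    total = sum(cost for _, _, cost in rows)
    lines.append(f"Grand total: {total:,.2f}")                # report footer
    return lines


report_lines = build_report(records)
print("\n".join(report_lines))
```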
Add page numbering to the report footer.
• Create a Text Box by clicking that icon, and using the mouse to "draw" a rectangle.
• Click inside and type in the following code: ="Page " & [Page] & " of " & [Pages].
• Other page numbering expressions may be seen in the Expression Builder.
• Delete the label box as it is unneeded.
Add a current date to report footer.
• Create a Text Box by clicking that icon, and using the mouse to "draw" a rectangle.
• Click inside and type in the following code: =Now(). This shows the date and time; use =Date() for the date alone.
• Other expressions may be seen in the Expression Builder.
• Delete the label box as it is unneeded.
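Footer expressions of this kind simply join pieces of text, so any spaces must be typed inside the quoted parts. The sketch below mimics that concatenation in Python; the function name `page_footer` and the choice to pass the print date in as an argument are assumptions made so the example is repeatable.

```python
from datetime import date


def page_footer(page, pages, printed_on):
    """Join the footer pieces; note the spaces inside the quoted strings."""
    label = "Page " + str(page) + " of " + str(pages)
    return label + " | " + printed_on.isoformat()


footer = page_footer(2, 5, date(2007, 1, 31))

# Without the spaces inside the quotes, the pieces run together.
cramped = "Page" + str(2) + "of " + str(5)

print(footer)
print(cramped)
```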
Add different types of graphics to your report.
• Start in Layout View or Design View.
• Insert a picture or clip art by clicking on the Logo button.
Locate a file and insert it directly into the report. Access now has the capability to shrink a
picture to fit whatever size area you have. If the graphic is added to the report header it appears
only on page one. If it is added to the page header, it appears on every page.
• Lines may be added to your document in design view and layout view. Some
AutoFormats can interfere with graphical options.
11.3.3 Forms
An Access Form may be created to use as a simple interface to input records one at a time. It can
also be used to view, print or search for individual records.
Figure 11.14: Database Tools
(i) New Form Options
Click on Create to see all forms options in the Forms group.
Form creates a simple form with all fields.
 Choose the table or query to base it on.
 Now when you click Form all fields from the chosen object are added to a basic form in
Layout View.
 The two Form Layout Tools Ribbons appear with more formatting and arrangement options.
Layout View of a form is similar to the option and design features you used in Report
Layout. Add an auto format, labels, graphics, backgrounds, or other options to your form.
 Move, resize, edit, delete fields.
Split Form: It creates a columnar form and includes the datasheet on a split screen with all fields
in selected query or table.
 Split form is created from a chosen table or query in Layout View.
 Edit using the Form Layout Tools options.
 Edit the table as well or move back and forth.
Figure 11.15: Forms
Figure 11.16: Split Forms
Blank Form – Creates a blank form in Layout View with the table field list turned on. You may
not use a query on this one. It is similar to the Blank Report.
 Drag and drop desired fields onto the form.
 Turn on Show all Tables if needed.
 Move and resize your chosen fields.
 Add formatting and style options.
Pivot Chart – Used to create a pivot chart form.
MS-Access includes four other kinds of forms under More Forms:
 Pivot table form.
 Modal dialog form.
 Datasheet form.
 Form Wizard – Allows you to step through the process, answering the questions and
creating a custom form of only the items you choose. In the wizard, follow these
steps:
Decide the arrangement and position of the info on the form: columnar, tabular,
datasheet or justified.
Choose a style from the previews.
Name and save your form.
Form Design – Design brings up the blank design grid similar to the report design where you
build your own form from the ground up.
Use the Form tool or Form wizard to create an easy form. If you need to seriously edit the form,
it would be simpler to edit the original fields and then recreate the form. Use Design View to do
simple modifications on this form.
(ii) Design View of Forms
Use Design Form to create a personalized, custom form. Most forms are small and do not require
more than the detail section of the form. Creating a Form from design view is a complex task.
To work in the design view of a form, follow these steps:
 Click on Create\ Form Design to open the form.
 Double click the black square dot in the upper left corner.
 The Property Sheet opens and you must choose the Record Source.
Figure 11.17: Design View of Forms
Now you are ready to click Add existing fields and to see the pane open with all table fields.
Drag needed fields into the detail section and arrange.
(iii) Design View Form Sections
 Form Header and Footer are not normally turned on, but may be found under Form Design
Tools\Arrange.
 A title for your form and any graphic image you may wish to add may be put in either the
Form Header or the detail section.
 Detail section is for the actual data you need to fill in for your table.
(iv) Design View Info
Each item in the form is represented in design view by a control, the same as in a report.
 Bound Controls display fields of data from the record source. They perform no calculations.
 Unbound Controls, such as a label or a text box with no field behind it, are not tied to a field. You can enter an expression in an unbound text box to calculate a value.
 Calculated Controls show values that are worked out from an expression; they are not used in most forms.
(v) Editing the Form
To edit any part of your form, follow the steps below in design view:
• Add or remove controls from your form.
• Create a command button on your form:
• Click the "Use Control Wizards" icon.
• Click "Button" and draw a button on your design view form.
• From the Command Button Wizard click through the Category options and choose an action.
Examples: Print a record, a form, or a report.
• Select all controls on your form with the pointer, and apply different fonts, font sizes,
highlights, font colors, bold, italics, underlines, etc. to your form
• Resize the form and move the controls to the right half of the form to have room to insert a
graphic on the left side of the form.
• Add lines and boxes to the form. Click the line/border width icon to change the line width.
Pull on the end of line to lengthen.
• Use AutoFormat to add a style or right click to do a fill color in the background.
• If you change any field to a lookup field after your form (or report) is created, the form will
not automatically update this field. You will need to recreate the form or change it yourself in
design view. If you want to fix it yourself follow these steps:
• Select and delete the old field from the form.
• Open the field list and drag the new lookup field onto your form.
• Add a combo box or list box to a form in design view by dragging the new field from the
field list onto the design view grid. Properties are set for you.
• One common problem with many forms is the size of the fill-in box. This is based on the
size of the field in the original table. Remember that Access 2007 has gone back to 255 characters as the
default size. To decrease the field size, go back to the properties of each field while editing
in table design view. You will have to recreate the form to finish resizing.
11.4 Summary
The new user interface in Office Access 2007 comprises a number of elements that define how
you interact with the product. These new elements were chosen to help you master Access, and
to help you find the commands that you need faster. The new design also makes it easy to
discover features that otherwise might have remained hidden beneath layers of toolbars and
menus. And you will get up and running faster, thanks to the new Getting Started with Microsoft
Office Access page, which provides you with quick access to our new getting started experience,
including a suite of professionally designed templates.
The most significant new interface element is called the Ribbon, which is part of the Microsoft
Office Fluent user interface. The Ribbon is the strip across the top of the program window that
contains groups of commands. The Office Fluent Ribbon provides a single home for commands
and is the primary replacement for menus and toolbars. On the Ribbon are tabs that combine
commands in ways that make sense. In Office Access 2007, the main Ribbon tabs are Home,
Create, External Data, and Database Tools. Each tab contains groups of related commands, and
these groups surface some of the additional new UI elements, such as the gallery, which is a new
type of control that presents choices visually. Queries, reports and forms in MS-Access are the
database operations that help the user retrieve meaningful information from the
database. A query is a request to retrieve the data from a table that satisfies a particular condition.
Forms provide an environment where the user can insert new records and edit or modify existing ones.
Data stored in the database can be presented to top-level management in the form of reports,
which support the management's decision-making process.
11.5 Suggested Reading/ Reference Material
1. Windows XP Complete Reference. BPB Publications.
2. MS Office XP Complete. BPB Publications.
3. Sandra Nees, Microsoft Access 2007: Forms and Reports, Creator and Presenter, Booth
Library, EIU.
4. Sandra Nees, Microsoft Access 2007: Queries, Creator and Presenter, Booth Library, EIU.
11.6 Self Assessment Questions (SAQ)
1. Explain the steps to create a query in design view.
2. What are the steps to create a form in design view?
3. What are the various methods to create queries in MS-Access?
4. What do you mean by reports? What are the uses of reports? Discuss the procedure of
report generation in MS-Access.
5. What are forms? Why are forms used in MS-Access? How are they different from tables?
