Dbms-3bcom-1-3 Units

Database Management System

III B.Com Computer Applications


UNIT-I
Data and Information
• Data are raw facts that constitute the building blocks of information. Data are
the heart of the DBMS. It is to be noted that all the data will not convey
useful information. Useful information is obtained from processed data. In
other words, data has to be interpreted in order to obtain information.
Good, timely, relevant information is the key to decision making. Data are
a representation of facts, concepts, or instructions in a formalized manner
suitable for communication, interpretation, or processing by humans or
automatic means. The data in DBMS can be broadly classified into two
types, one is the collection of information needed by the organization and
the other is “metadata” which is the information about the database. The
term “metadata” will be discussed in detail later in this chapter. A
company needs to save information about employees, departments, and
salaries. These pieces of information are called data. Data held in permanent
storage is referred to as persistent data.
• Information is data that has been processed in such a way as to be
meaningful to the person who receives it. Information is data that has
been converted into a more useful or intelligible form. It is the set of
data that has been organized for direct utilization of mankind,
as information helps human beings in their decision making process.
Examples are: Time Table, Merit List, Report card, Headed tables,
printed documents, pay slips, receipts, reports etc. The information is
obtained by assembling items of data into a meaningful form. For
example, marks obtained by students and their roll numbers form
data; the report card/sheet is the information. Other forms of
information are pay slips, schedules, reports, worksheets, bar charts,
invoices, and account returns.
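The marks-to-report-card example above can be sketched in a few lines. This is only an illustration: the roll numbers, subjects, marks, and the function name build_report_cards are all made up for the sketch.

```python
# Hypothetical raw data: (roll number, subject, marks) facts, not yet interpreted.
raw_marks = [
    (101, "Accounting", 72),
    (101, "DBMS", 88),
    (102, "Accounting", 65),
    (102, "DBMS", 54),
]

def build_report_cards(marks):
    """Group raw marks by roll number and compute total and average per student."""
    cards = {}
    for roll_no, subject, score in marks:
        card = cards.setdefault(roll_no, {"subjects": {}, "total": 0})
        card["subjects"][subject] = score
        card["total"] += score
    for card in cards.values():
        card["average"] = card["total"] / len(card["subjects"])
    return cards

# The processed result is the "information": one report card per student.
report_cards = build_report_cards(raw_marks)
for roll_no, card in sorted(report_cards.items()):
    print(roll_no, card["subjects"], "total:", card["total"], "average:", card["average"])
```

The list of tuples plays the role of data; the dictionary of report cards is the information a teacher or parent would actually read.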
Database

• A database is an organized collection of information that can easily
be accessed, managed, and updated. A database is a well-organized
collection of data that are related in a meaningful way and can be
accessed in different logical orders. Database systems are systems in
which the interpretation and storage of information are of primary
importance. The database should contain all the data needed by the
organization. As a result, a huge volume of data, the need for
long-term storage of the data, and access to the data by a large
number of users generally characterize database systems.
Database Management System
• A database management system (DBMS) consists of a collection of interrelated
data and a set of programs to access that data. It is software that helps in
maintaining and utilizing a database. A DBMS consists of:
• A collection of interrelated and persistent data. This part of the DBMS is
referred to as the database (DB).
• A set of application programs used to access, update, and manage data. This
part constitutes the data management system (MS).
• A DBMS is general-purpose software i.e., not application specific. The same
DBMS (e.g., Oracle, Sybase, etc.) can be used in railway reservation system,
library management, university, etc.
• A DBMS takes care of storing and accessing data, leaving only application-
specific tasks to application programs. A DBMS allows users to input, share,
edit, manipulate, and display the data in the database. Because a DBMS allows
more than one user to share the data, the complexity extends to its design and
implementation.
Objectives of DBMS
The main objectives of database management system are data availability, data integrity,
data security, and data independence.

• Data Availability: Data availability refers to the fact that the data are made available to
a wide variety of users in a meaningful format at reasonable cost so that the users can
easily access the data.
• Data Integrity: Data integrity refers to the correctness of the data in the database. In
other words, the data available in the database are reliable.
• Data Security: Data security refers to the fact that only authorized users can access the
data. Data security can be enforced by passwords. If two separate users are accessing a
particular data item at the same time, the DBMS must not allow them to make conflicting
changes.
• Data Independence : DBMS allows the user to store, update, and retrieve data in an
efficient manner. DBMS provides an “abstract view” of how the data is stored in the
database. In order to store the information efficiently, complex data structures are used to
represent the data. The system hides certain details of how the data are stored and
maintained.
Evolution of Database Management Systems
• The file-based system was the predecessor to the database management
system. The Apollo moon-landing project began in the 1960s. At
that time, there was no system available to handle and manage large
amounts of information. As a result, North American Aviation, which is
now known as Rockwell International, developed software
known as Generalized Update Access Method (GUAM).
• In the mid-1960s, IBM joined North American Aviation to develop GUAM
into the Information Management System (IMS). IMS was based on the
hierarchical data model. In the mid-1960s, General Electric released
Integrated Data Store (IDS).
• IDS was based on the network data model. Charles Bachman was mainly
responsible for the development of IDS. The network database was
developed to fulfill the need to represent more complex data
relationships than could be modeled with hierarchical structures.
• The Conference on Data System Languages formed the Data Base Task Group (DBTG) in 1967.
DBTG specified three distinct languages for standardization: a Data Definition Language
(DDL), which would enable the Database Administrator to define the schema; a subschema DDL,
which would allow application programs to define the parts of the database they require; and
a Data Manipulation Language (DML) to manipulate the data.
• The network and hierarchical data models developed during that time
had the drawbacks of minimal data independence, minimal theoretical foundation, and
complex data access.
• To overcome these drawbacks, in 1970, Codd of IBM published a paper titled “A Relational
Model of Data for Large Shared Data Banks” in Communications of the ACM, vol. 13, No. 6,
pp. 377–387, June 1970.
• As a result of Codd’s paper, the System R project was developed during the late 1970s by the
IBM San Jose Research Laboratory in California. The project was developed to prove that the
relational data model was implementable. The outcome of the System R project was the
development of Structured Query Language (SQL), which is the standard language for
relational database management systems.
• In the 1980s, IBM released two commercial relational database management systems,
DB2 and SQL/DS, and Oracle Corporation released Oracle.
• Codd himself attempted to address some of the failings in his original work with
extended versions of the relational model: RM/T in 1979 and RM/V2 in 1990.
• In recent years, two approaches to DBMS are more popular, which are Object-
Oriented DBMS (OODBMS) and Object Relational DBMS (ORDBMS). The
chronological order of the development of DBMS is as follows:
• Flat files – 1960s–1980s
• Hierarchical – 1970s–1990s
• Network – 1970s–1990s
• Relational – 1980s–present
• Object-oriented – 1990s–present
• Object-relational – 1990s–present
• Data warehousing – 1980s–present
• Web-enabled – 1990s–present
• Early 1960s. Charles Bachman at GE created the first general-purpose DBMS,
Integrated Data Store. It created the basis for the network model, which was
standardized by CODASYL (Conference on Data System Languages).
• Late 1960s. IBM developed the Information Management System (IMS). IMS
used an alternate model, called the hierarchical data model.
• 1970. Edgar Codd of IBM created the relational data model.
• 1976. Peter Chen presented the Entity-Relationship model, which is widely used
in database design.
• 1980. SQL, developed by IBM, became the standard query language for
databases. SQL was standardized by ISO.
• 1981. Codd received the Turing Award for his contributions to database
theory. Codd passed away in April 2003.
• 1980s and 1990s. IBM, Oracle, Informix, and others developed powerful
DBMS.
Classification of Database Management
System
• The database management system can be broadly classified into two categories:
• (1) Passive Database Management System
• (2) Active Database Management System
1. Passive Database Management System.
• Passive Database Management Systems are program-driven. In passive database
management system the users query the current state of database and retrieve the
information currently available in the database. Traditional DBMS are passive in the
sense that they are explicitly and synchronously invoked by user or application program
initiated operations. Applications send requests for operations to be performed by the
DBMS and wait for the DBMS to confirm and return any possible answers. The
operations can be definitions and updates of the schema, as well as queries and updates
of the data.
2. Active Database Management System.
• Active Database Management Systems are data-driven or event-driven
systems. In an active database management system, the users specify
to the DBMS the information they need. If the information of interest is
not yet available, the DBMS actively monitors the arrival of the desired
information and provides it to the relevant users. The scope of a query in
a passive DBMS is limited to past and present data, whereas the
scope of a query in an active DBMS additionally includes future data. An
active DBMS reverses the control flow between applications and the
DBMS: instead of only applications calling the DBMS, the DBMS may also
call applications.
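The difference between passive and active behaviour can be approximated with a trigger. The sketch below uses Python's built-in sqlite3 module. The stock and alerts tables and the reorder threshold of 10 are assumptions made for the illustration, and a trigger is only a simple stand-in for a full active DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
CREATE TABLE alerts (item TEXT, message TEXT);

-- Event-condition-action rule: the DBMS itself reacts when the data changes.
CREATE TRIGGER low_stock_alert
AFTER UPDATE OF quantity ON stock
WHEN NEW.quantity < 10
BEGIN
    INSERT INTO alerts VALUES (NEW.item, 'reorder needed');
END;
""")

conn.execute("INSERT INTO stock VALUES ('desk', 25)")

# Passive use: the application asks about the current state and gets an answer.
current = conn.execute("SELECT quantity FROM stock WHERE item = 'desk'").fetchone()[0]

# Active behaviour: this update fires the trigger without any further request.
conn.execute("UPDATE stock SET quantity = 4 WHERE item = 'desk'")
alerts = conn.execute("SELECT item, message FROM alerts").fetchall()
print(current, alerts)
```

The SELECT is a passive query over present data. The trigger is registered once and then acts on data that arrives later, which is the "future data" idea described above.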
UNIT-II
Historical Roots of File and File System
• In earlier days, records were maintained in traditional file systems. A file
system is an organization of files.
• A file is a collection (or group) of records. A record is a collection of fields,
and each field contains the actual data. For example, a student file holds one
record per student, consisting of roll number, name, group, marks, and average.
• Early file systems maintained all the files in a flat manner (flat
files/text files). A flat file permits searching for a record by sequential access
only, which is cumbersome and slow. To overcome this slowness, indexed file
systems were introduced, which allow faster, random access. However, they
occupy extra memory to maintain the index table.
• In general, in a file system all the data has to be stored in the
corresponding folders (or directories). For example, a college can
maintain the admission details of its students in this way. Suppose the
director wants to know today’s admission status group-wise. In that case
the file manager (or clerk) has to open each folder to answer the
director’s question. This is time consuming, memory consuming, and
error prone.
• To overcome this, the database system evolved. It uses a fourth-generation
language (4GL), namely SQL, which allows such queries to be answered directly.
• The DBMS maintains all the records in the form of tables made up of
rows and columns. In a DBMS, before the data is stored, the schema has
to be created with the help of DDL.
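The trade-off between sequential access and an index table, described above, can be sketched as follows. The comma-separated record layout and the function names are assumptions made for the illustration. The four student records are the ones used in the File system section of this unit.

```python
# Hypothetical flat file: one student record per line, fields separated by commas.
flat_file = [
    "60012,Gopal,B.Sc,20000",
    "60013,Sainath,B.Com,19000",
    "60014,Surekha,B.Sc,20000",
    "60015,Ramya,BBA,30000",
]

def sequential_search(lines, roll_no):
    """Scan records one by one; return the record and how many lines were read."""
    for reads, line in enumerate(lines, start=1):
        if line.split(",")[0] == roll_no:
            return line, reads
    return None, len(lines)

def build_index(lines):
    """Index table mapping roll number to line position (this costs extra memory)."""
    return {line.split(",")[0]: pos for pos, line in enumerate(lines)}

record_seq, reads = sequential_search(flat_file, "60015")
index = build_index(flat_file)
record_idx = flat_file[index["60015"]]  # one direct lookup instead of a scan
print(reads, record_seq == record_idx)
```

The sequential search reads every line before it reaches the last record. The index finds the same record in one lookup, but it has to be stored and kept up to date alongside the file.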
File system

• Assume the first-year students’ data is maintained using a file system, with
one folder per course.
• Folder names: BBA, B.Com
• Each folder holds: Roll No, name, fees
• Every folder contains all the relevant fields, which occupies extra
memory, and calculation is slow. Constraints cannot be imposed in file
systems.
DBMS
Roll No   Name      Branch   Fees
60012     Gopal     B.Sc     20000
60013     Sainath   B.Com    19000
60014     Surekha   B.Sc     20000
60015     Ramya     BBA      30000

In a DBMS, all the students’ information is available in a centralized
place, i.e., a table, so it is easy to retrieve any data using a query.
Constraints (for example, a primary key) can be imposed in a DBMS, and
data redundancy can be removed easily.
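As a minimal sketch of the points above, the student table can be created and queried with Python's built-in sqlite3 module. The table and column names are assumptions chosen for the illustration. The rows are the ones shown in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the schema is created before any data is stored.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,   -- constraint: roll numbers must be unique
        name    TEXT NOT NULL,
        branch  TEXT NOT NULL,
        fees    INTEGER NOT NULL
    )
""")

rows = [
    (60012, "Gopal", "B.Sc", 20000),
    (60013, "Sainath", "B.Com", 19000),
    (60014, "Surekha", "B.Sc", 20000),
    (60015, "Ramya", "BBA", 30000),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

# One query answers the group-wise question without opening separate folders.
per_branch = conn.execute(
    "SELECT branch, COUNT(*), SUM(fees) FROM student GROUP BY branch ORDER BY branch"
).fetchall()

# The primary key rejects a second record with an existing roll number.
try:
    conn.execute("INSERT INTO student VALUES (60012, 'Duplicate', 'BBA', 30000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(per_branch, duplicate_rejected)
```

The GROUP BY query is the kind of group-wise question the director asks in the earlier example. The rejected insert shows a constraint that a plain file system cannot enforce on its own.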
File processing system

File processing systems at Pine Valley Furniture Company


Pine Valley Furniture Company manufactures high-quality all-wood furniture
and distributes it to retail stores nationwide. Among the firm’s several product
lines are computer desks, entertainment centers, dinette sets, bookcases, and
wall units. Customers submit orders to Pine Valley Furniture by any of several
means, i.e., telephone, mail, fax, or electronic forms via the Internet.
Early computer applications at Pine Valley Furniture used the traditional file
processing approach.
Three of the computer applications based on the file processing approach
are shown in the figure below. The systems illustrated are order filling, invoicing,
and payroll. The figure also shows the major data files associated with each
application. For example, the order filling system has three files: customer
master, inventory master, and backorder.
Disadvantages of file processing systems
• Several disadvantages are associated with conventional file processing
systems. These disadvantages are:
1. Program-Data dependency: File descriptions are stored within each
application program that accesses a given file. For example, in the
invoicing system in the above figure, program A accesses both the
inventory pricing file and the customer master file. Therefore, this
program contains a detailed file description for both of these files. As a
consequence, any change to a file structure requires changes to the file
description in all programs that access the file.
2. Duplication of Data: Since applications are often developed
independently in file processing systems, unplanned duplicate data files
are the rule rather than the exception. For example, in the above figure the
order filling system contains an inventory master file, while the
invoicing system contains an inventory pricing file. These files
undoubtedly both contain data describing Pine Valley Furniture
Company’s products, such as product description, unit price, and
quantity on hand. This duplication is wasteful, since it requires additional
storage space and increased effort to keep all files up to date.
Unfortunately, duplicate data files often result in loss of data integrity.
3. Limited data sharing: With the traditional file processing approach,
each application has its own private files, and users have little
opportunity to share data outside their own applications. Notice in the
above figure, for example, that users in the accounting department have
access to the invoicing system and its files, but they probably do not
have access to the order filling system or the payroll system and their
files. It is often frustrating to managers to find that a requested report
will require a major programming effort to obtain data from several
incompatible files in separate systems.
4. Lengthy Development Times: With traditional file processing
systems, there is little opportunity to leverage previous development
efforts. Each new application requires that the developer essentially
start from scratch by designing new file formats and descriptions and
then writing the file access logic for each new program. The lengthy
development times required are often inconsistent with today’s fast
paced business environment.
5. Excessive Program Maintenance: The preceding factors all combine
to create a heavy program maintenance load in organizations that rely
on traditional file processing systems. In fact, as much as 80 percent of
the total information systems development budget may be devoted to
program maintenance in such organizations. This of course leaves little
opportunity for developing new applications.
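Program-data dependency (disadvantage 1) can be illustrated with a small sketch. The fixed-width record layout, the product id D100, and the two function names are hypothetical and are not taken from the Pine Valley figure. The point is that each program carries its own copy of the file description.

```python
# Hypothetical fixed-width inventory record: id (4) + description (16) + price (4).
OLD_RECORD = f"{'D100':<4}{'Computer desk':<16}{325:04d}"

def invoicing_price(record):
    """'Program A': hard-codes that the price occupies columns 20-23."""
    return int(record[20:24])

def order_filling_description(record):
    """'Program B': hard-codes that the description occupies columns 4-19."""
    return record[4:20].strip()

# The file structure changes: the product id grows from 4 to 6 characters.
NEW_RECORD = f"{'D10000':<6}{'Computer desk':<16}{325:04d}"

old_price = invoicing_price(OLD_RECORD)
new_price = invoicing_price(NEW_RECORD)          # wrong value, and no error raised
new_description = order_filling_description(NEW_RECORD)
print(old_price, new_price, repr(new_description))
```

After the layout change, both programs still run but return wrong values. Every program that embeds the old description has to be found and changed, which is the maintenance burden described in point 5.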
Advantages and disadvantages of DBMS:

In a typical file-processing system, records are stored in various files. A
number of different application programs are written to extract records
from and add records to the appropriate files. A file-processing system
has a number of major disadvantages, such as data redundancy, data
inconsistency, unshareable data, unstandardized data, insecure data,
and incorrect data.
Database management system answers all these problems as it
provides a centralized control over data.
The advantages of DBMS are as follows:
1. Reduces the data redundancy to a large extent
Data redundancy means duplication of data. Non-database systems
maintain a separate copy of the data for each application. Database
systems do not maintain separate copies of the same data. Rather, all
the data are kept in one place, and all the applications that require the data
refer to the centrally maintained database.
2. Databases can control data inconsistency to a large extent
When redundancy is not controlled, there may be occasions on
which two entries about the same data do not agree. At such
times, the database is said to be inconsistent. Obviously, an inconsistent
database will provide incorrect or conflicting information.
3. Databases facilitate sharing of data
Sharing of data means that individual pieces of data in the database may
be shared among several different users, in the sense that each of those users
may have access to the same piece of data and each of them may use it for
different purposes.
4. Databases enforce standards
The database management system can ensure that all the data follow the
applicable standards. There may be certain standards laid by the company or
organization using the database. Similarly, there may be national or
international standards.
5. Databases can ensure data security
A database management system ensures data security and privacy by
ensuring that the only means of access to the database is through the proper
channel and also by carrying out authorization checks whenever access to
sensitive data is attempted.
6. Program-Data Independence
The separation of data description (metadata) from the application
programs that use the data is called data independence. With the
database approach, data descriptions are stored in a central location
called the repository.
7. Increased Productivity of Application Development
A major advantage of the database approach is that it greatly reduces
the cost and time for developing new business applications.
8. Improved Data Quality
The database approach provides a number of tools and processes to
improve data quality.
9. Improved Data Accessibility and Responsiveness
With a relational database, end users without programming
experience can often retrieve and display data, even when it crosses
traditional departmental boundaries.
Disadvantages of DBMS

• In spite of the advantages of using a DBMS, there are a few situations
in which such a system may involve unnecessary overhead costs that
would not be incurred in traditional file processing. The overhead
costs of using a DBMS are due to the following:
• High initial investment in hardware, software, and training.
• The generality that a DBMS provides for defining and processing data.
• Overhead for providing security, concurrency control, recovery, and
integrity functions.
• Additional problems may arise if the database designers and DBA do
not properly design the database or if the database system
applications are not implemented properly.
Functions of the DBMS
A DBMS performs several functions that guarantee the integrity and
consistency of the data in the database. Most of those functions are
transparent to end users, and most can be achieved only through the
use of a DBMS.
1. Data dictionary management: The DBMS stores definitions of the
data elements and their relationships (metadata) in a data dictionary.
In turn, all programs that access the data in the database work
through the DBMS, thus relieving you from having to code such
complex relationships in each program. In other words, the
DBMS provides data abstraction, and it removes structural and data
dependency from the system.
2. Data storage management: The DBMS creates and manages the
complex structures required for data storage, thus relieving you from the
difficult task of defining and programming the physical data
characteristics. Data storage management is also important for database
performance tuning. Performance tuning relates to the activities that
make the database perform more efficiently in terms of storage and
access speed.
3. Data transformation and presentation: The DBMS transforms entered
data to conform to required data structures. The DBMS relieves you of
the chore of making a distinction between the logical data format and the
physical data format. That is, the DBMS formats the physically retrieved
data to make it conform to the user’s logical expectations.
4. Security management: The DBMS creates a security system that
enforces user security and data privacy. Security rules determine which
users can access the database, which data items each user can access,
and which data operations the user can perform.
5. Multiuser access control: To provide data integrity and data
consistency, the DBMS uses sophisticated algorithms to ensure that
multiple users can access the database concurrently without
compromising the integrity of the database.
6. Backup and recovery management: The DBMS provides backup and
data recovery to ensure data safety and integrity. Current DBMS systems
provide special utilities that allow the DBA to perform routine and special
backup and restore procedures. Recovery management deals with the
recovery of the database after a failure, such as a bad sector in the disk
or a power failure.
7. Data integrity management: The DBMS promotes and enforces
integrity rules, thus minimizing data redundancy and maximizing data
consistency. The data relationships stored in the data dictionary are
used to enforce data integrity. Ensuring data integrity is especially
important in transaction-oriented database systems.
8. Database access languages and application programming interfaces:
The DBMS provides data access through a query language. A query
language is a nonprocedural language, one that lets the user specify
what must be done without having to specify how it is to be done.
9. Database communication interfaces: Current-generation DBMSs
accept end-user requests via multiple, different network environments.
For example, the DBMS might provide access to the database via the
Internet through the use of Web browsers such as Mozilla Firefox or
Microsoft Internet Explorer.
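Two of the functions above, data dictionary management (1) and nonprocedural access (8), can be sketched with Python's built-in sqlite3 module. The employee and department tables and the sample row are hypothetical. sqlite_master and PRAGMA table_info are SQLite's own catalog facilities, so other DBMS products expose their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  INTEGER,
        dept_id INTEGER REFERENCES department(dept_id)
    )
""")

# Data dictionary: the DBMS keeps the definitions (metadata) and can report them.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
employee_columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

# Nonprocedural query: state what is wanted, not how to scan or join the files.
conn.execute("INSERT INTO department VALUES (1, 'Accounting')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 42000, 1)")
result = conn.execute("""
    SELECT e.name, d.name FROM employee e
    JOIN department d ON d.dept_id = e.dept_id
    WHERE e.salary > 40000
""").fetchall()
print(tables, employee_columns, result)
```

The application never states how the two tables are stored or in what order they are read. It only names the columns and the condition, and the DBMS works out the rest from its own metadata.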
 
Data models
Data model: A model is an abstraction of a more complex real-world
object or event. A data model is a relatively simple representation,
usually graphical, of complex real-world data structures. The data
model’s main function is to help us understand the complexities of the
real-world environment.
• Within the database environment, a data model represents data
structures and their characteristics, relations, constraints, and
transformations. Good database design uses an appropriate data
model as its foundation.
• A data model provides a blueprint of the data that is required for a
functional system.
Types of database models
Data modeling or database modeling is a technique that records the inventory,
shape, size, contents, and rules of data elements used in the scope of a business
process. The business process scope may be as large as a multidiscipline global
corporation or as small as a receiving dock. Simply put, a data model
is the modeling of data for an organization.
Types of Database models:
1. Flat file database model
2. Hierarchical database model
3. Network database model
4. Relational database model
5. The E-R model.
6. Object oriented model
7. Object relational Database model
Flat file database model
A flat file database consists of one or more readable files, normally
stored in a text format. Information in these files is stored as fields, the
fields having either a constant length or a variable length.
Every flat file database system is different because companies store
different data and have different needs.
Hierarchical database model
• The architecture of a hierarchical model is based on the concept of
parent/child relationships. It was developed to overcome the problems of the
flat file model. In a hierarchical database, a root table, or parent table, resides
at the top of the structure and points to child tables containing related
data. The structure of the hierarchical database model appears as an inverted
tree.

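[Figure: hierarchical database model. Publishers is at the root; Authors and Book Store are on the second level; Titles, Inventory, and Orders are on the third level.]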


Network database model
Improvements were made to the hierarchical model in order to
derive the network model. One of the main advantages of the
network model is the capability of parent tables to share relationships with
child tables. This means that a child table can have multiple parent tables.
Additionally, a user can access data by starting with any table in the
structure, navigating either up or down in the tree. The user is not
required to access a root table first to get to child tables.

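[Figure: network database model. The same tables as in the hierarchical figure (Publishers, Authors, Book Store, Titles, Inventory, Orders), with child tables allowed to have more than one parent.]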
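The contrast between the two models can be sketched with plain Python dictionaries. The table names come from the two figures. The exact parent/child links, and the idea that Titles has two parents in the network version, are assumptions made for the illustration.

```python
# Hierarchical model: every child has exactly one parent, so access starts at the root.
hierarchy = {
    "Publishers": ["Authors", "Book Store"],
    "Authors": ["Titles"],
    "Book Store": ["Inventory", "Orders"],
}

def path_from_root(tree, root, target, path=()):
    """Depth-first walk from the root down to the target table."""
    path = path + (root,)
    if root == target:
        return list(path)
    for child in tree.get(root, []):
        found = path_from_root(tree, child, target, path)
        if found:
            return found
    return None

# Network model: a child may have several parents, and navigation can start anywhere.
network_parents = {
    "Authors": ["Publishers"],
    "Book Store": ["Publishers"],
    "Titles": ["Authors", "Book Store"],   # two parents: not possible in a strict hierarchy
    "Inventory": ["Book Store"],
    "Orders": ["Book Store"],
}

titles_path = path_from_root(hierarchy, "Publishers", "Titles")
titles_parents = network_parents["Titles"]
print(titles_path, titles_parents)
```

In the hierarchy, the only way to reach Titles is down from the root through Authors. In the network structure, Titles can be reached from either of its parents, and the walk can also go upward from the child.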


The Relational Database Model

• The relational model was introduced in 1970 by E. F. Codd in his
landmark paper “A Relational Model of Data for Large Shared
Data Banks”. The relational model represented a major breakthrough
for both users and designers. To use an analogy, the relational model
produced an “automatic transmission” database to replace the
“standard transmission” databases that preceded it. Its conceptual
simplicity set the stage for a genuine database revolution.
The Entity Relationship (E-R) model
• The entity relationship data model is based on a perception of a real
world that consists of a collection of basic objects called entities and
relationships among these objects. Entities are described in a
database by a set of attributes. For example, the attributes acc_no
and balance may describe one particular account in a bank, and they
form attributes of the account entity set. Similarly, the attributes
cust_name, cust_street, and cust_city may describe a customer entity.
[Figure: E-R diagram. The CUSTOMER entity (Cust_Id, C_Name, C_Street, C_City) is linked to the ACCOUNT entity (Acc_Num, balance) through the deposit relationship.]
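One common way to turn such an E-R diagram into tables is sketched below with Python's built-in sqlite3 module. The column names follow the figure. Mapping the deposit relationship to its own table is an assumption about its cardinality, and the sample customer and account values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity sets become tables; attributes become columns.
CREATE TABLE customer (
    cust_id  INTEGER PRIMARY KEY,
    c_name   TEXT NOT NULL,
    c_street TEXT,
    c_city   TEXT
);
CREATE TABLE account (
    acc_num  INTEGER PRIMARY KEY,
    balance  INTEGER NOT NULL
);
-- The relationship set becomes a table that references both entity sets.
CREATE TABLE deposit (
    cust_id INTEGER REFERENCES customer(cust_id),
    acc_num INTEGER REFERENCES account(acc_num),
    PRIMARY KEY (cust_id, acc_num)
);
INSERT INTO customer VALUES (1, 'Ramya', 'Main Road', 'Hyderabad');
INSERT INTO account  VALUES (500, 12000);
INSERT INTO deposit  VALUES (1, 500);
""")

# Follow the relationship from a customer to the accounts they hold.
rows = conn.execute("""
    SELECT c.c_name, a.acc_num, a.balance
    FROM customer c
    JOIN deposit d ON d.cust_id = c.cust_id
    JOIN account a ON a.acc_num = d.acc_num
""").fetchall()
print(rows)
```

If each account belonged to exactly one customer, the deposit table could be replaced by a cust_id column in account. The separate table is the more general choice when the cardinality is not known.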
 
Object oriented data base model
• An OO programming language allows the programmer to work with
objects to define an application that interacts with a relational
database. During the last few years, object-oriented programming has
become popular with languages such as C++, VB, and Java. For example,
elements within a program or database application are usually
represented as objects. These objects have properties, which can be
modified and can also be inherited from other objects. Related types
of objects are assigned various properties that can be adjusted to
define the particular object and determine how the object will act.
Object Relational Database Model
• Although some major differences exist between the object-oriented and relational
models, the object relational model was developed with the objective of combining the
concepts of the relational database model with an object-oriented programming
style. The OR model is supposed to represent the best of both worlds
(relational and OO), although the OR model is still early in development.

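[Figure: object relational schema, as far as it can be recovered from the slide]
Person
    Eno
    E_inf         person
    Address_inf   address
    Phone         number
Name fields (type of E_inf)
    fname         varchar
    &name         varchar
    initial       varchar
Address
    Street        varchar
    City          varchar
    State         varchar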
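The idea of a column whose type is itself a structured type can be sketched with Python dataclasses. This is an object-side illustration only, not the syntax of any particular ORDBMS. The field names follow the figure where they are legible. The lname field, the PersonName class name, and all sample values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str
    state: str

@dataclass
class PersonName:
    fname: str
    lname: str      # assumption: the source figure is garbled at this field
    initial: str

@dataclass
class Person:
    eno: int
    e_inf: PersonName      # an attribute whose type is itself a structured type
    address_inf: Address
    phone: int

p = Person(
    eno=1,
    e_inf=PersonName("Surekha", "Rao", "K"),
    address_inf=Address("MG Road", "Hyderabad", "Telangana"),
    phone=5550100,
)
city = p.address_inf.city   # nested attribute access through the structured type
print(city)
```

In a purely relational design, Address would be flattened into separate columns or moved to its own table. The object relational approach keeps it as one typed attribute of Person.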
Components and Interfaces of Database Management System

A database management system involves five major components:
• Hardware
• Software
• Data
• Users/People
• Procedures
The interfaces between the components are shown in the figure
• Hardware:
The hardware can range from a single personal computer to a mainframe or
to a network of computers. The particular hardware depends on the
requirements of the organization and the DBMS used. A DBMS requires a
minimum amount of main memory and disk space to run, but this minimum
configuration may not necessarily give acceptable performance.
• Software:
The software includes the DBMS software, application programs together
with the operating systems including the network software if the DBMS is
being used over a network. The application programs are written in third
generation programming languages like ‘C’, COBOL, etc. or using fourth
generation language such as SQL, embedded in a third generation language.
The target DBMS may have its own fourth generation tools which allow
development of applications through the provision of non procedural query
languages, report generators, graphics generators and application generators.
• Data:
Database is an organized collection of logically related data, usually designed to meet the
information needs of multiple users in an organization. It is important to distinguish between
the database and the repository. The repository contains definitions of data, whereas the
database contains occurrences of data.
The data in the database is integrated, shared and persistent.
• Integrated Data: A database can be considered a unification of several distinct data files,
with any redundancy among those files eliminated.
• Shared Data: A database contains data that can be shared by different users for different
applications simultaneously.
• Persistent Data: Data that cannot be removed from the database as a side effect of some
other process. Persistent data have a life span that is not limited to a single execution of the
programs that use them.

Users/people interacting with database:

Procedures: Procedures are the rules that govern the design and the use of the database.
Components of Database Environment:
The major components of a typical database environment and
their relationships are shown below
• Computer-aided software engineering (CASE) tools : CASE Tools are automated
tools used to design databases and application programs.
• Repository: A repository is a centralized knowledge base for all data definitions, data
relationships, screen and report formats, and other system components. A
repository contains an extended set of metadata important for managing
databases as well as other components of an information system.
• Database management system (DBMS): A DBMS is a commercial software (and
occasionally hardware and firmware) system that is used to define, create,
maintain, and provide controlled access to the database and also to the repository.
In other words, a DBMS is a collection of logically related data and a set of programs to
operate on that data.
• Database: Database is an organized collection of logically related data, usually
designed to meet the information needs of multiple users in an organization. It is
important to distinguish between the database and the repository. The repository
contains definitions of data, whereas the database contains occurrences of data.
• Application Programs: Application programs are Computer programs that are
used to create and maintain the database and provide information to users.
• User interface: Languages, menus, and other facilities by which users
interact with various system components, such as CASE tools, application
programs, the DBMS, and the repository.
• Data administrators: Persons who are responsible for the overall
information resources of an organization. Data administrators use CASE
tools to improve the productivity of database planning and design.
• System developers: Persons such as systems analysts and programmers
who design new application programs. System developers often use
CASE tools for system requirements analysis and program design.
• End users: Persons throughout the organization who add, delete, and
modify data in the database and who request or receive information
from it. All user interactions with the database must be routed through
the DBMS.
Ranges of Database Applications
The range of database applications can be divided into five categories: Personal
databases, workgroup databases, department databases, enterprise databases,
and Internet, Intranet, and Extranet databases.
Personal Databases:
Personal databases are designed to support one user. Personal databases have
long resided on personal computers (PCs), including laptops. Recently the
introduction of personal digital assistants (PDAs) has incorporated personal
databases into handheld devices that not only function as computing devices but
also as cellular phones, fax senders, and Web browsers.
• Personal databases are widely used because they can often improve personal
productivity. However, they entail a risk: The data cannot easily be shared with
other users. For this reason, personal databases should be limited to those rather
special situations (such as in a very small organization) where the need to share
the data among users of the personal database is unlikely to arise.
Workgroup Database:
A workgroup is a relatively small team of people who collaborate
on the same project or application or on a group of similar projects or
applications. A workgroup typically comprises fewer than 25 persons.
A workgroup database is designed to support the collaborative efforts
of such a team.
• The method of sharing the data in this database is shown in the figure
below. Each member of the workgroup has a desktop computer and
the computers are linked by means of a local area network (LAN). The
database is stored on a central device called the database server,
which is also connected to the network. Thus each member of the
workgroup has access to the shared data.
• Figure: Workgroup database with local area network. The project manager,
developers 1 to n, and the librarian are connected through the local area
network to the database server, which holds the workgroup database.

Department Databases:
A department is a functional unit within an organization. Typical examples
of department are personnel, marketing, manufacturing, and accounting.
A department is generally larger than a workgroup (typically between 25
and 100 persons) and is responsible for a more diverse range of functions.
• Department databases are designed to support the various functions and
activities of a department.
Enterprise Databases
An enterprise database is one whose scope is the entire
organization or enterprise (or, at least, many different departments).
Such databases are intended to support organization-wide operations
and decision making. An enterprise database does, however, support
information needs from many departments. Over the last decade, the
evolution of enterprise databases has resulted in two major
developments:
1. Enterprise resource planning (ERP) systems
2. Data warehousing implementations.
• Figure: An enterprise data warehouse. Branch Offices 1 to 5 are connected to
the Corporate Office, which holds the data warehouse.
Internet, Intranet and Extranet Databases

Internet: The most recent change that affects the database
environment is the ascendance of the Internet, a worldwide network
that connects users of multiple platforms easily through an interface
known as a Web browser.
Extranet: Use of Internet protocols to establish limited access to
company data and information by the company’s customers and
suppliers.
Intranet: Use of Internet protocols to establish access to company data
and information that is limited to the organization.
Database Architecture

• Database architecture essentially describes the location of all the pieces of
information that make up the database application. The database
architecture can be broadly classified into the following categories:
• Two-Tier Architecture
• Three-tier Architecture
• Multitier Architecture
Two-Tier Architecture
The two-tier architecture is a client–server architecture in which
the client contains the presentation code and the SQL statements for
data access. The database server processes the SQL statements and
sends query results back to the client. Two-tier client/server provides a
basic separation of tasks. The client, or first tier, is primarily responsible
for the presentation of data to the user and the “server,” or second tier,
is primarily responsible for supplying data services to the client.
Presentation Services
“Presentation services” refers to the portion of the application which presents data to
the user. In addition, it also provides for the mechanisms in which the user will interact with
the data. More simply put, presentation logic defines and interacts with the user interface.
The presentation of the data should generally not contain any validation rules.
Application Services
“Application services” provide other functions necessary for the application.
Business Services/objects
“Business services” are a category of application services. Business services encapsulate
an organizations business processes and requirements. These rules are derived from the
steps necessary to carry out day-today business in an organization. These rules can be
validation rules, used to be sure that the incoming information is of a valid type and format,
or they can be process rules, which ensure that the proper business process is followed in
order to complete an operation.
Data Services
“Data services” provide access to data independent of their location. The data can come
from legacy mainframe, SQL RDBMS, or proprietary data access systems.
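The services described above can be sketched as three small Python functions. This is only an illustrative sketch, not part of the original notes: the `account` table, the function names and the in-memory SQLite database are all assumptions chosen for the example.

```python
import sqlite3


def make_account_db():
    """Create a hypothetical in-memory database with one account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
    conn.execute("INSERT INTO account VALUES (1, 100.0)")
    return conn


def data_service(conn, account_id):
    """Data service: returns a balance and hides where the data is stored."""
    row = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return None if row is None else row[0]


def business_service(conn, account_id, amount):
    """Business service: applies a validation rule, then a process rule."""
    # Validation rule: the incoming value must have a valid type and format.
    if not isinstance(amount, (int, float)) or amount <= 0:
        raise ValueError("amount must be a positive number")
    balance = data_service(conn, account_id)
    if balance is None:
        raise LookupError("unknown account")
    # Process rule: a withdrawal may not exceed the current balance.
    if amount > balance:
        raise ValueError("insufficient funds")
    conn.execute(
        "UPDATE account SET balance = balance - ? WHERE id = ?",
        (amount, account_id),
    )
    return data_service(conn, account_id)


def presentation_service(balance):
    """Presentation service: formatting only, no validation rules."""
    return f"Remaining balance: {balance:.2f}"
```

Keeping the rules in `business_service` means a change to a business rule touches one function rather than every screen that displays the data.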
Advantages of Two-tier Architecture
• The two-tier architecture is a good approach for systems with stable
requirements and a moderate number of clients.
• The two-tier architecture is the simplest to implement, due to the number
of good commercial development environments.
Drawbacks of Two-tier Architecture
• Software maintenance can be difficult because PC clients contain a mixture
of presentation, validation, and business logic code.
• To make a significant change in the business logic, code must be modified
on many PC clients.
• Moreover the performance of two-tier architecture can be poor when a
large number of clients submit requests because the database server may
be overwhelmed with managing messages.
• With a large number of simultaneous clients, three-tier architecture may be
necessary.
Three-tier Architecture

Three-tier architecture offers a technology-neutral method of
building client/server applications with vendors who employ standard
interfaces which provide services for each logical “tier.” Through
standard tiered interfaces, services are made available to the
application. A single application can employ many different services
which may reside on dissimilar platforms or be developed and
maintained with different tools. This approach allows a developer to
leverage investments in existing systems while creating new applications
which can utilize existing resources.
Multitier Architecture
• A multi-tier, three-tier, or N-tier implementation employs a three-tier
logical architecture superimposed on a distributed physical model.
Application Servers can access other application servers in order to
supply services to the client application as well as to other Application
Servers. The multiple-tier architecture is the most general client–
server architecture.
• It can be most difficult to implement because of its generality.
However, a good design and implementation of multiple-tier
architecture can provide the most benefits in terms of scalability,
interoperability, and flexibility. In the above example, the client
application looks to Application Server #1 to supply data from a
mainframe-based application.
• Application Server #1 has no direct access to the mainframe application, but it
does know, through the development of application services, that Application
Server #2 provides a service to access the data from the mainframe application
which satisfies the client request. Application Server #1 then invokes the
appropriate service on Application Server #2 and receives the requested data
which is then passed on to the client. Application Servers can take many forms.
An Application Server may be anything from custom application services,
Transaction Processing Monitors, Database Middleware, Message Queue to a
CORBA/COM-based solution.
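The delegation described above, where Application Server #1 calls a service on Application Server #2, can be sketched in Python. The class names, the `customer` service name and the sample record are hypothetical; real application servers would communicate over a network rather than through direct method calls.

```python
class MainframeApp:
    """Stand-in for the mainframe-based application (hypothetical data)."""

    def __init__(self):
        self._records = {"C001": {"name": "Asha", "balance": 2500}}

    def lookup(self, key):
        return self._records.get(key)


class ApplicationServer2:
    """Has direct access to the mainframe application."""

    def __init__(self, mainframe):
        self._mainframe = mainframe

    def get_customer(self, key):
        return self._mainframe.lookup(key)


class ApplicationServer1:
    """Has no mainframe access; only knows which service to invoke."""

    def __init__(self, services):
        # Maps a service name to a callable offered by another server.
        self._services = services

    def handle_request(self, service_name, key):
        service = self._services[service_name]
        return service(key)  # result is passed back to the client
```

The client only ever talks to `ApplicationServer1`, so the mainframe can be replaced without changing client code.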
DBMS Vendors and their Products

Some of the popular DBMS vendors and their corresponding products are as follows:
Vendor        Products
IBM           DB2/MVS, DB2/UDB, DB2/400, Informix Dynamic Server (IDS)
Microsoft     Access, SQL Server, Desktop Edition (MSDE)
Open Source   MySQL, PostgreSQL
Oracle        Oracle DBMS, RDB
Sybase        Adaptive Server Enterprise (ASE), Adaptive Server Anywhere (ASA), Watcom
Different views of Database/Abstraction

A collection of interrelated files and a set of programs that allow users
to access and modify these files is known as a database management
system.
• A major purpose of a database system is to provide the users only
that much information that is required by them. This means that the
system does not disclose all the details of data, rather it hides certain
details of how the data is stored and maintained.
• Since the requirements of different users differ from one another,
the complexity of the database is hidden from them, if needed,
through several levels of abstraction in order to simplify their
interaction with the system.
Various Levels of Database Implementation
A database is implemented through three general levels: internal,
conceptual and external, so as to cater to the needs of its users.
1. Internal Level (Physical Level). The lowest level of abstraction, the internal
level, is the one closest to physical storage. This level is also sometimes
termed as physical level. It describes how the data are actually stored on the
storage medium. At this level, complex low-level data structures are
described in detail.
2. Conceptual Level. This level of abstraction describes what data are actually
stored in the database. It also describes the relationships existing among
data. At this level, the database is described logically in terms of simple data-
structures. The users of this level are not concerned with how these logical
data structures will be implemented at the physical level. Rather, they just
are concerned about what information is to be kept in the database.
3. External Level (View Level). This is the level closest to the users and is
concerned with the way in which the data are viewed by individual users.
Most of the users of the database are not concerned with all the information
contained in the database. Instead, they need only a part of the database
relevant to them. For example, even though the bank database stores a great
deal of information, an account holder (a user) is interested only in his account
details and not in the rest of the information stored in the database. To
simplify such users’ interaction with the system, this level of abstraction is
defined. The system thus provides many views for the same database. The
figure below illustrates the interrelationship among these three levels of
abstraction.
Figure: The three levels of abstraction for an item record.
External level (views): View 1, View 2 and View 3; a view may show only
Item-Name and Price.
Conceptual level:
Item-Number Character (6)
Item-Name Character (20)
Price Numeric (5+2)
Reorder-quantity Numeric (4)
Internal (physical) level:
Stored-Item Length = 40
Item# Type = Byte (6), Offset = 0, Index = 1x
Name Type = Byte (20), Offset = 6
Price Type = Byte (8), Offset = 26
ROQ Type = Byte (4), Offset = 34
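The conceptual and external levels can be illustrated with a small SQLite sketch in Python. The table and view names (`item`, `item_price`) and the sample row are assumptions made for this example. The internal level does not appear in the code because SQLite's storage engine handles it.

```python
import sqlite3


def build_item_db():
    conn = sqlite3.connect(":memory:")
    # Conceptual level: the logical description of the item record.
    conn.execute(
        """
        CREATE TABLE item (
            item_number      TEXT PRIMARY KEY,
            item_name        TEXT NOT NULL,
            price            NUMERIC,
            reorder_quantity INTEGER
        )
        """
    )
    conn.execute("INSERT INTO item VALUES ('A00001', 'Bolt', 2.50, 100)")
    # External level: a view that exposes only the name and the price.
    conn.execute("CREATE VIEW item_price AS SELECT item_name, price FROM item")
    return conn


def column_names(conn, relation):
    """Return the column names visible through a table or a view."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]
```

A user who queries `item_price` never sees `item_number` or `reorder_quantity`, which is the point of the external level.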
UNIT – III
Building Blocks of an Entity–Relationship Diagram

An ER diagram is a graphical modeling tool to standardize ER modeling. The
modeling can be carried out with the help of a pictorial representation of
entities, attributes, and relationships. The basic building blocks of an Entity-
Relationship diagram are Entity, Attribute and Relationship.
Entity
An entity is an object that exists and is distinguishable from other objects.
• Examples: a person, a place, a department, etc.
Entity Type
An entity type or entity set is a collection of similar entities. Some
examples of entity types are:
• All students in EDC, say STUDENT.
• All courses in EDC, say COURSE.
• All departments in EDC, say DEPARTMENT.
• An entity may belong to more than one entity type. For example, a
staff member working in a particular department can pursue higher education
part-time. Hence the same person is a LECTURER at one instance
and a STUDENT at another instance.
Relationship
A relationship is an association among entities that includes
one entity from each participating entity type, whereas a relationship type is
a meaningful association between entity types.
• Teaches is the relationship type between LECTURER and STUDENT.
• Buying is the relationship between VENDOR and CUSTOMER.
• Treatment is the relationship between DOCTOR and PATIENT.
Attributes
Attributes are properties of entity types. In other words, entities are
described in a database by a set of attributes.
• Brand, cost, and weight are the attributes of CELLPHONE.
• Roll number, name, and grade are the attributes of STUDENT.
ER Diagram
The ER diagram is used to represent database schema.
In ER diagram:
• A rectangle represents an entity set.
• An ellipse represents an attribute.
• A diamond represents a relationship.
• Lines represent linking of attributes to entity sets and of entity sets to
relationship sets.
Classification of Entity Sets

• Entity sets can be broadly classified into:
• Strong entity
• Weak entity
• Associative entity
Strong Entity
A strong entity is one whose existence does not depend on any other entity.
Example
• Consider the example "student takes course". Here STUDENT is a strong
entity.
Figure: STUDENT (entity set), Takes (relationship), COURSE (entity set).


Weak Entity
A weak entity is one whose existence depends on another entity. In many
cases, a weak entity does not have a primary key of its own.
Figure: EMPLOYEE and DEPENDENTS linked by the relationship "depends";
DEPENDENTS is the weak entity.
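One common way to implement a weak entity in a relational database is sketched below with Python and SQLite. The column names are assumptions, and the `ON DELETE CASCADE` clause is one possible design choice rather than something the notes prescribe. The dependent's key borrows the employee's key, so a dependent cannot exist without its employee.

```python
import sqlite3


def build_dependents_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)"
    )
    conn.execute(
        """
        CREATE TABLE dependent (
            emp_id   INTEGER NOT NULL
                     REFERENCES employee(emp_id) ON DELETE CASCADE,
            dep_name TEXT NOT NULL,
            -- The weak entity is identified by the owner's key plus a
            -- partial key of its own.
            PRIMARY KEY (emp_id, dep_name)
        )
        """
    )
    return conn


def dependent_count(conn):
    return conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
```

Deleting an employee row removes that employee's dependents as well, and inserting a dependent for a non-existent employee is rejected.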
Associative entity
An associative entity is a term used in relational and entity–relationship
theory. A relational database requires the implementation of a base
relation (or base table) to resolve many-to-many relationships. A base
relation representing this kind of entity is called, informally, an associative
table.
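An associative table can be sketched as follows, assuming a many-to-many relationship between hypothetical `student` and `course` tables. The `registration` table and all sample rows are illustrative only.

```python
import sqlite3


def build_registration_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
        -- Associative table: one row per (student, course) pairing.
        CREATE TABLE registration (
            student_id INTEGER REFERENCES student(student_id),
            course_id  INTEGER REFERENCES course(course_id),
            PRIMARY KEY (student_id, course_id)
        );
        INSERT INTO student VALUES (1, 'Anu'), (2, 'Ravi');
        INSERT INTO course  VALUES (10, 'DBMS'), (20, 'Accounting');
        INSERT INTO registration VALUES (1, 10), (1, 20), (2, 10);
        """
    )
    return conn


def courses_of(conn, student_id):
    """List the course titles a student is registered for."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM registration r
        JOIN course c ON c.course_id = r.course_id
        WHERE r.student_id = ?
        ORDER BY c.title
        """,
        (student_id,),
    )
    return [title for (title,) in rows]
```

Each side of the many-to-many relationship becomes a one-to-many relationship into the associative table.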
Attribute Classification
Attributes are used to describe the properties of an entity. Attributes can be
broadly classified based on value and structure. Based on value, an attribute
can be classified as a single value, multivalued, derived, or null value attribute.
Based on structure, an attribute can be classified as a simple or composite
attribute.
Symbols Used in ER Diagram

The elements of an ER diagram are Entity, Attribute, and Relationship. The
different types of entities (strong, weak, and associative), the different
types of attributes (such as multivalued and derived attributes), and
identifying relationships each have a corresponding symbol.
Single Value Attribute
A single value attribute has only one value associated with it.
Multivalued Attribute
A multivalued attribute has more than one value associated with it.
Derived Attribute
The value of a derived attribute can be derived from the values of
other related attributes or entities. In an ER diagram, a derived attribute is
represented by a dotted ellipse.
Null Value Attribute
In some cases, a particular entity may not have any applicable value
for an attribute. For such situations, a special value called the null value is
used.
Composite Attribute
A composite attribute is one that can be further subdivided into simple
attributes.
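The attribute classes above can be illustrated with a short Python sketch. The `Student` and `Name` classes and their fields are hypothetical examples, not taken from the notes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Name:
    """Composite attribute: can be subdivided into simple attributes."""
    first: str
    last: str


@dataclass
class Student:
    roll_number: int                     # single value attribute
    name: Name                           # composite attribute
    date_of_birth: date                  # simple, single value attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued
    email: Optional[str] = None          # may hold the null value

    def age_on(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, not stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)
```

Age is computed on demand, so it can never disagree with the stored date of birth.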
Relationship Degree
Relationship degree refers to the number of associated entities. The
relationship degree can be broadly classified into unary, binary, and ternary
relationships.
Unary Relationship
The unary relationship is otherwise known as a recursive relationship. In a
unary relationship the number of associated entity types is one: an entity
type is related to itself.
Binary Relationship
In a binary relationship, two entities are involved. Consider the example:
each staff member is assigned to a particular department. Here the two
entities are STAFF and DEPARTMENT.

Ternary Relationship
In a ternary relationship, three entities are simultaneously involved. Ternary
relationships are required when binary relationships are not sufficient to
accurately describe the semantics of an association among three entities.
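A unary (recursive) relationship is often implemented as a foreign key that points back to the same table. The sketch below assumes a hypothetical "manages" relationship on an `employee` table; the notes do not name a specific unary example.

```python
import sqlite3


def build_staff_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (
            emp_id     INTEGER PRIMARY KEY,
            emp_name   TEXT NOT NULL,
            -- Unary relationship: an employee is managed by an employee.
            manager_id INTEGER REFERENCES employee(emp_id)
        );
        INSERT INTO employee VALUES
            (1, 'Meena', NULL), (2, 'Kumar', 1), (3, 'Latha', 1);
        """
    )
    return conn


def reports_of(conn, manager_id):
    """Names of the employees managed by the given employee."""
    rows = conn.execute(
        """
        SELECT e.emp_name
        FROM employee e
        JOIN employee m ON e.manager_id = m.emp_id
        WHERE m.emp_id = ?
        ORDER BY e.emp_name
        """,
        (manager_id,),
    )
    return [name for (name,) in rows]
```

The query joins the table to itself, which is how a relationship of degree one shows up in SQL.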
Specialization and Generalization (or)
Characteristics of Super type/Subtypes

There are two processes, “Specialization” and “Generalization”, that
serve as mental models in developing supertype/subtype
relationships.
a) Generalization: In data modeling, “generalization is the process of
defining a more general entity type from a set of more specialized
entity types”. Thus generalization is a bottom-up process. An
example of generalization is shown in the following figure.
• The above figure represents three entity types: car, truck and motorcycle.
At this stage the data modeler intends to represent these separately on the E-R
diagram. However, on close examination, we see that the three entity
types have a number of attributes in common: vehicle-
identifier, vehicle-name, price and engine displacement.
• This fact suggests that each of the three entity types is really a version of a
more general entity type. This more general entity type, vehicle, together
with the resulting supertype/subtype relationship, is shown below.
• The entity CAR has the specific attribute no. of passengers, while truck has
two specific attributes capacity and cab-type. Thus, generalization has
allowed us to group entity types, along with their common attributes and at
the same time preserve specific attributes that are unique to each subtype.
• The entity type motorcycle is not included in the relationship because it
doesn’t satisfy the subtype conditions.
• Referring to the figure (d) you will notice that the only attributes of
motorcycle are those that are common to all vehicles, there are no
attributes specific to motorcycles. Motorcycle does not have a relationship
to another entity type. Thus, there is no need to create a motorcycle
subtype. The fact that there is no motorcycle subtype suggests that it must
be possible to have an instance of vehicle that is not a member of any of its
subtypes.
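The vehicle example maps naturally onto class inheritance, which can be sketched in Python as follows. The field names mirror the attributes listed above; the field types are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Supertype: attributes common to every vehicle."""
    vehicle_id: str
    vehicle_name: str
    price: float
    engine_displacement: float


@dataclass
class Car(Vehicle):
    """Subtype with one attribute specific to cars."""
    no_of_passengers: int


@dataclass
class Truck(Vehicle):
    """Subtype with two attributes specific to trucks."""
    capacity: float
    cab_type: str
```

A motorcycle is simply created as a plain `Vehicle`, since it has no attributes of its own; this corresponds to an instance of the supertype that belongs to no subtype.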
b) Specialization: Specialization is “the process of defining one or more
subtypes of the supertype and forming supertype/subtype relationships”. Each
subtype is formed based on some distinguishing characteristic, such as
attributes or relationships specific to the subtype. Specialization is a top-down
process, the direct reverse of generalization.
Fig. (b) Specialization to MANUFACTURED PART and PURCHASED
PART
• Fig. (a) shows an entity type named PART together with its attributes. The
identifier is part-no, and the other attributes include description, unit-price,
location, Quantity_On_Hand, routing-number and supplier. Supplier is a
multivalued attribute, since there may be more than one supplier, each with an
associated unit price, for a part.
• In discussion with users we discover that there are two possible sources
for parts: some are manufactured internally, while others are purchased
from outside suppliers. Further, we discover that some parts are
obtained from both sources.
• Some of the attributes of the entity type in fig. (a) apply to all parts
regardless of source. However, others depend on the source. Thus,
routing-number applies only to manufactured parts, while supplier-id
and unit-price apply only to purchased parts. The analyst suggests
creating a new relationship between purchased part and supplier.
This relationship (named supplies) allows users to more easily
associate purchased parts with their suppliers. The attribute unit-price
is now associated with the relationship supplies, so that the unit price
for a part may vary from one supplier to another. These factors suggest
that PART should be specialized by defining the subtypes manufactured part
and purchased part, as shown in fig. (b).
Combining Generalization and Specialization

Specialization and generalization are both valuable techniques for developing
supertype/subtype relationships. Which technique we use at a particular time
depends on several factors, such as the nature of the problem domain,
previous modeling efforts and personal preference. We should be prepared to
use both approaches and to alternate back and forth as dictated by the
preceding factors.
Specifying constraints in Supertype/Subtype Relationships.
Specifying constraints in supertype/subtype relationships allows us to
capture some of the important business rules that apply to these
relationships. The two most important types of constraints are
“completeness” and “disjointness” constraints.
1. Specifying completeness constraints:
“A completeness constraint addresses the question whether an instance
of a supertype must also be a member of at least one subtype”.
• The completeness constraint has two possible rules:
• a. Total specialization rule
• b. Partial specialization rule
a. Total Specialization Rule:- “The total specialization rule specifies that
each entity instance of the supertype must be a member of some subtype in
the relationship”.
• This figure introduces the notation for total specialization. In this example, the
business rule is the following “A patient must be either an OUTPATIENT or
RESIDENT PATIENT” total specialization is indicated by the double line extending
from the patient entity type to the circle.
b. Partial Specialization Rule:- The partial specialization rule specifies that an
entity instance of the supertype is allowed not to belong to any subtype.
If a vehicle is a car, it must appear as an instance of CAR, and if it is a truck, it
must appear as an instance of TRUCK. However, if the vehicle is a motorcycle, it
cannot appear as an instance of any subtype. This is an example of partial
specialization, and it is specified by the single line from the VEHICLE supertype to
the circle.
2. Specifying disjointness constraints:
A disjointness constraint addresses the question whether an instance of a
supertype may simultaneously be a member of two or more subtypes.
• The disjointness constraint has two possible rules:
• The disjoint rule and
• The overlap rule
a. The disjoint rule:- The disjoint rule specifies that if an entity instance (of the
supertype) is a member of one subtype, it cannot simultaneously be a member of
any other subtype.
• At any given time a PATIENT must be either an OUTPATIENT or a RESIDENT PATIENT
but cannot be both. This is the disjoint rule, as specified by the letter ‘d’ in the circle
joining the supertype and its subtypes.
b. Overlap rule:- The overlap rule specifies that an entity instance can simultaneously
be a member of two or more subtypes. The following figure shows an example of the
overlap rule: the entity type “PART” with its two subtypes,
“MANUFACTURED PART” and “PURCHASED PART”. Some parts are both manufactured
and purchased.
• The overlap rule is specified by placing the letter ‘O’ in the circle. Notice in the figure
that the total specialization rule is also specified, as indicated by the double line. Thus
any part must be either a purchased part or a manufactured part, or it may
simultaneously be both of these.
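The completeness and disjointness rules can be combined into one small check, sketched here in Python. The function name and its parameters are hypothetical. It takes the set of subtypes that a supertype instance belongs to and reports whether that membership is allowed.

```python
def membership_allowed(subtypes, total, disjoint):
    """Check one supertype instance against the two constraints.

    subtypes: set of subtype names the instance belongs to
    total:    True for total specialization, False for partial
    disjoint: True for the disjoint rule, False for the overlap rule
    """
    count = len(subtypes)
    if total and count == 0:
        return False  # total specialization: at least one subtype required
    if disjoint and count > 1:
        return False  # disjoint rule: at most one subtype allowed
    return True
```

For PATIENT (total, disjoint) exactly one subtype is allowed; for PART (total, overlap) one or both are allowed; for VEHICLE (partial) a motorcycle with no subtype is acceptable.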
3. Specifying subtype discriminators:
A subtype discriminator is an attribute of the supertype, whose values
determine the target subtype or subtypes.
a. Disjoint Subtypes:- An example of the use of a subtype discriminator is shown
in the following figure.
• In the above figure, a new attribute “Employee-type” has been added to the
supertype to serve as the subtype discriminator. When a new employee is added
to the supertype, this attribute is coded with one of three values as follows:
• “H” – for hourly
• “S” – for salaried
• “C” – for consultant.
• Depending on this code the instance is then assigned to the appropriate
subtype.
• The notation we use to specify the subtype discriminator is also shown in the
figure. The expression “Employee-type” is placed next to the line leading
from the supertype to the circle. The value of the attribute that selects the
appropriate subtype is placed adjacent to the line leading to the subtype.
Thus, for example, the condition Employee-type = “S” causes an entity
instance to be inserted into the SALARIED EMPLOYEE subtype.
b. Overlapping Subtypes:- When subtypes overlap, a slightly modified
approach must be applied for the subtype discriminator. The reason is that a
given instance of the supertype may require that we create an instance in
more than one subtype.
• An example of this situation is shown in the following figure.
• The figure shows an entity PART and its overlapping subtypes. A new attribute
named part-type has been added to PART. Part-type is a composite attribute
with the components “manufactured?” and “purchased?”. Each of these components is a
Boolean variable (i.e., it takes only the values yes “Y” and no “N”). When a
new instance is added to PART, these components are coded to indicate
whether the part is manufactured, purchased, or both.
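Both kinds of discriminator can be sketched in Python. The codes “H”, “S” and “C” and the subtype SALARIED EMPLOYEE come from the text above; the names HOURLY EMPLOYEE and CONSULTANT, and the function names, are assumptions made for the example.

```python
# Disjoint subtypes: a single code selects exactly one subtype.
EMPLOYEE_SUBTYPES = {
    "H": "HOURLY EMPLOYEE",    # assumed subtype name
    "S": "SALARIED EMPLOYEE",
    "C": "CONSULTANT",         # assumed subtype name
}


def employee_subtype(employee_type):
    """Return the subtype chosen by the Employee-type discriminator."""
    return EMPLOYEE_SUBTYPES[employee_type]


def part_subtypes(manufactured, purchased):
    """Overlapping subtypes: each Boolean component selects one subtype.

    Both arguments take the values "Y" or "N".
    """
    subtypes = []
    if manufactured == "Y":
        subtypes.append("MANUFACTURED PART")
    if purchased == "Y":
        subtypes.append("PURCHASED PART")
    return subtypes
```

With overlapping subtypes the discriminator returns a list, because one PART instance may need an instance in both subtypes.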
Normalization
Normalization is a design technique that is widely used as a guide
in designing relational databases. Normalization is essentially a
two step process that puts data into tabular form by removing
repeating groups and then removes duplicated data from the
relational tables. Normalization theory is based on the concepts
of normal forms. A relational table is said to be in a particular
normal form if it satisfies a certain set of constraints.
Advantages of Normalization

• Avoids data modification (INSERT/DELETE/UPDATE) anomalies, as each data
item lives in one place
• Greater flexibility in getting the expected data at an atomic level of granularity
• Normalization is conceptually cleaner and easier to maintain and change as
your needs change
• Fewer null values and less opportunity for inconsistency
• A better handle on database security
• Increased storage efficiency
• The normalization process helps maximize the use of clustered indexes, which
are the most powerful and useful type of index available. As more data is
separated into multiple tables because of normalization, more clustered
indexes become available to help speed up data access
Disadvantages
1. Requires much more CPU, memory, and I/O to process thus
normalized data gives reduced database performance
2. Requires more joins to get the desired result. A poorly-written query
can bring the database down
3. Maintenance overhead. The higher the level of normalization, the
greater the number of tables in the database.
Basic and Higher Normal forms
• Normal forms are classified into two main categories:
• 1. Basic Normal Forms
• 2. Higher Normal Forms.
1) Basic Normal Forms:
The basic normal forms are
• First normal form
• Second normal form
• Third normal form
First Normal Form:
A relation is in first normal form if it contains no multivalued attribute.
• For example, the relation EMPLOYEE contains the attributes Emp-Id, Emp-Name, Sal,
and Gender. There is no multivalued attribute here, so the relation EMPLOYEE is in first
normal form.
• EMPLOYEE
Emp-Id Emp-Name Sal Gender

Second Normal Form (2NF):


A relation is in second normal form if it is in first normal form and every non-key
attribute is fully functionally dependent on the primary key. That means there is no
partial functional dependency.
• A relation in first normal form is in second normal form if any one of the
following conditions applies:
• The primary key consists of only one attribute.
• There are no non-key attributes in the relation.
• Every non-key attribute is functionally dependent on the full set of primary key
attributes (full functional dependency).
• For example, the relation EMPLOYEE contains the attributes Emp-Id, Emp-
Name, Course-Id, Sal, and Date-completed. The primary key of
this relation is the composite key (Emp-Id, Course-Id). The relation EMPLOYEE
is not in second normal form because the non-key attributes Emp-Name and Sal are
functionally dependent on only part of the primary key (Emp-Id), which is a
partial functional dependency. So we decompose the relation EMPLOYEE into two new
relations, EMPLOYEE and COURSE.
EMPLOYEE
Emp-Id Course-Id Emp-Name Sal Date-completed
Emp-Id, Course-Id → Date-completed
Emp-Id → Emp-Name
Emp-Id → Sal
EMPLOYEE
Emp-Id Emp-Name Sal
COURSE
Emp-Id Course-Id Date-completed
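The partial dependency can be checked on sample data with a short Python sketch. The rows below are invented for illustration, and the helper names `fd_holds` and `project` are hypothetical. A sample can only refute a functional dependency, never prove that it holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always give the same result.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attributes):
    """Project the rows onto the given attributes, removing duplicates."""
    return sorted({tuple(row[a] for a in attributes) for row in rows})


# Hypothetical sample of the unnormalized EMPLOYEE relation.
EMPLOYEE_ROWS = [
    {"emp_id": 100, "course_id": "SPSS", "emp_name": "Simpson",
     "sal": 48000, "date_completed": "2023-06-19"},
    {"emp_id": 100, "course_id": "Surveys", "emp_name": "Simpson",
     "sal": 48000, "date_completed": "2023-10-07"},
    {"emp_id": 140, "course_id": "SPSS", "emp_name": "Beeton",
     "sal": 52000, "date_completed": "2023-12-08"},
]
```

Here `fd_holds(EMPLOYEE_ROWS, ["emp_id"], ["emp_name", "sal"])` is true, which exposes the partial dependency, while the date completed needs the whole key. Projecting onto `emp_id`, `emp_name` and `sal` stores each employee's name and salary once.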
Third Normal Form (3NF):
A relation is in third normal form if it is in second normal form and contains no
transitive dependency. A transitive dependency in a relation is a functional
dependency between two or more non-key attributes. For example, consider
the following relation SALES.
• SALES.
Cust-Id Cust-Name Salesperson-Id Region

• In the above relation Cust-Id is the primary key, so all of the remaining
attributes are functionally dependent on this attribute. However, there is a
transitive dependency: the attribute Region is functionally dependent on the
attribute Salesperson-Id. So we decompose the above relation into new
relations that satisfy third normal form.
CUSTOMER
Cust-Id Cust-Name Salesperson-Id
SALES
Salesperson-Id Region
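The benefit of this decomposition can be seen in a small SQLite sketch in Python. The sample customers, salespersons and regions are invented, and the table names follow the decomposed relations above.

```python
import sqlite3


def build_sales_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (
            salesperson_id INTEGER PRIMARY KEY,
            region         TEXT
        );
        CREATE TABLE customer (
            cust_id        INTEGER PRIMARY KEY,
            cust_name      TEXT,
            salesperson_id INTEGER REFERENCES sales(salesperson_id)
        );
        INSERT INTO sales VALUES (1, 'South'), (2, 'West');
        INSERT INTO customer VALUES
            (8023, 'Anderson', 1), (9167, 'Bancroft', 2), (7924, 'Hobbs', 1);
        """
    )
    return conn


def customer_regions(conn):
    """Rebuild the original SALES view by joining the two relations."""
    return conn.execute(
        """
        SELECT c.cust_name, s.region
        FROM customer c
        JOIN sales s ON s.salesperson_id = c.salesperson_id
        ORDER BY c.cust_name
        """
    ).fetchall()
```

Because each salesperson's region is stored once, moving a salesperson to a new region is a single-row update, and every customer of that salesperson sees the change.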

2) Advanced Normal Forms/Higher Normal Forms:
• The advanced normal forms are
• Boyce-Codd normal form (BCNF)
• Fourth normal form
• Fifth normal form
1) Boyce-Codd Normal Form:
• A relation is in Boyce-Codd normal form if it is in third normal form and every
determinant is a candidate key. For example, consider a relation STUDENT with the
attributes St-No, Subject and Adviser.
• In this relation the primary key is the composite key (St-No, Subject). Two functional
dependencies occur:
STUDENT
St-No Subject Adviser
St-No, Subject → Adviser
Adviser → Subject
In the second functional dependency the determinant is Adviser, which is not a
candidate key. So the above relation is not in Boyce-Codd normal form, and we
decompose it into new relations.
STUDENT
St-No Adviser

ADVISER
Subject Adviser
2) Fourth Normal Form:
A relation is in fourth normal form if it is in Boyce-Codd normal form and
contains no multivalued dependency. Here a multivalued dependency is taken
to exist when a non-key attribute is functionally dependent on two
or more sets of primary key attributes. (In the standard definition, a
multivalued dependency X →→ Y means that each value of X determines a set of
values of Y independently of the remaining attributes.)
• For example, consider a relation STUDENT with the attributes St-No, St-
Name, Course-Id and Grade. The primary key is the composite key (St-No,
St-Name, Course-Id). The non-key attribute Grade is
functionally dependent on (St-No, Course-Id) and on (St-Name, Course-Id).
STUDENT
St-No St-Name Course-Id Grade
St-No, Course-Id → Grade
St-Name, Course-Id → Grade
The above relation contains a multivalued dependency in this sense, so it is not in
fourth normal form. To avoid the dependency we decompose the above
relation into new relations.

St-No St-Name

St-No Course-Id Grade


Fifth Normal Form (5NF):- (Projection-Join Normal Form)
A relation is in fifth normal form if it is in fourth normal form and
contains no join dependency. Here a join dependency is taken to exist when a
relation contains a minimum of three attributes and every attribute may be
functionally dependent on the remaining attributes. For example, consider a
relation CLASS with the attributes Subject, Teacher and Text-book. The primary
key is the composite key (Subject, Teacher, Text-book). The relation
CLASS is not in fifth normal form because it satisfies a join dependency.
So we decompose the above relation into new relations.
CLASS
Subject Teacher Text-book
Subject, Teacher → Text-book
Text-book, Subject → Teacher
Text-book, Teacher → Subject
Decomposed relations:
Subject Teacher
Teacher Text-book
Text-book Subject
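Whether a three-way decomposition of this kind is safe can be tested on sample data: project the relation onto the three attribute pairs, join the projections back together, and compare the result with the original. The Python sketch below does this for (Subject, Teacher, Text-book) triples. The function names and the sample rows used to exercise it are hypothetical.

```python
def join_of_projections(rows):
    """Join the three two-attribute projections of (s, t, b) triples."""
    subject_teacher = {(s, t) for s, t, b in rows}
    teacher_book = {(t, b) for s, t, b in rows}
    book_subject = {(b, s) for s, t, b in rows}
    return {
        (s, t, b)
        for (s, t) in subject_teacher
        for (t2, b) in teacher_book
        if t == t2 and (b, s) in book_subject
    }


def satisfies_join_dependency(rows):
    """True if rejoining the projections gives back exactly the rows."""
    return join_of_projections(rows) == set(rows)
```

When the function returns true, the three smaller relations carry the same information as CLASS. When it returns false, the rejoin has produced spurious rows and the decomposition would lose information.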
Aggregation and Composition

• Relationships among relationships are not supported by the ER
model. Groups of entities and relationships can be abstracted into
higher level entities using aggregation. Aggregation represents a
“HAS-A” or “IS-PART-OF” relationship between entity types. One
entity type is the whole, the other is the part. Aggregation allows us
to indicate that a relationship set participates in another relationship
set. A car has various components like tires, doors, engine, seats,
etc., which vary from one car to another. The relationship drives is
insufficient to model the complexity of this system. Part-of
relationships allow abstraction into higher level entities. In this
example engine, tires, doors, and seats are aggregated into car.
• Composition is a stronger form of aggregation where the part cannot exist
without its containing whole entity type and the part can only be part of one
entity type. Consider the example of DEPARTMENT has PROJECT. Each
project is associated with a particular DEPARTMENT. There cannot be a
PROJECT without DEPARTMENT. Hence DEPARTMENT has PROJECT is an
example of composition.
Relationships within the relational database / Mapping cardinalities
Mapping cardinalities (or) cardinality ratios express the number of
entities to which another entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship
sets although occasionally they contribute to the description of
relationship sets that involve more than two entity sets.
• For a binary relationship set R between entity sets A and B, the
mapping cardinality must be one of the following.
(i) One – to – One: An entity in A is associated with at most one entity in B,
and an entity in B is associated with at most one entity in A.
[Figure: One – to – One mapping between entity sets A and B]
• Example: Employee --- Is-assigned --- Parking Place
(ii)One – to – Many An entity in A is associated with any number of entities
in B. An entity in B, however can be associated with at most one entity in
A.
[Figure: One – to – Many mapping from entity set A (a1, a2) to entity set B (b1 to b5), followed by an example diagram]
(iii) Many – to – One An entity in A is associated with at most one entity
in B. An entity in B however, can be associated with any number of
entities in A.
[Figure: Many – to – One mapping from entity set A (a1 to a5) to entity set B (b1 to b3)]
• Example: PRODUCT --- Contains --- Product_Line
(iv) Many-to-Many An entity in A is associated with any number of entities in
B and an entity in B is associated with any number of entities in A.

[Figure: Many – to – Many mapping between entity sets A (a1 to a4) and B (b1 to b4)]
Example: STUDENT --- Registers-for --- COURSE
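The four mapping cardinalities can be told apart by counting partners on each side. The following is a minimal Python sketch, assuming a relationship set is given as a list of (a, b) pairs. The function name and the sample pairs are hypothetical. It only describes the pairs it is given; the cardinality a designer declares is a rule about all possible data.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs."""
    pairs = set(pairs)
    b_per_a = Counter(a for a, _ in pairs)  # number of B entities linked to each A entity
    a_per_b = Counter(b for _, b in pairs)  # number of A entities linked to each B entity
    each_a_has_one_b = all(n <= 1 for n in b_per_a.values())
    each_b_has_one_a = all(n <= 1 for n in a_per_b.values())
    if each_a_has_one_b and each_b_has_one_a:
        return "one-to-one"
    if each_b_has_one_a:
        return "one-to-many"
    if each_a_has_one_b:
        return "many-to-one"
    return "many-to-many"


# Hypothetical instances of the relationships named in the examples.
is_assigned = [("emp1", "place1"), ("emp2", "place2")]
contains = [("product1", "lineA"), ("product2", "lineA")]
registers_for = [("stu1", "c1"), ("stu1", "c2"), ("stu2", "c1")]

print(mapping_cardinality(is_assigned))
print(mapping_cardinality(contains))
print(mapping_cardinality(registers_for))
```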
Entity Relationship model constructs(ER Model)

The basic constructs of the entity relationship model are entities,
relationships and attributes. The richness of the E-R Model allows
designers to model real world situations accurately and expressively,
which helps account for the popularity of the model.
Entity:
• An entity is an object that may represent a person, place, object, event
or concept in the user environment about which the organization
wishes to maintain data.
• Eg: Employee, Student, Store, Warehouse, Machine, Registration,
Account, Course.
Entity type versus Entity Instance:
Entity Type: An entity type “is a collection of entities that share common
properties or characteristics”. Each entity type in an E-R model is given a
name. Since the name represents a collection of items, it is always singular.
We use capital letters for names of entity types. In an E-R diagram the entity
name is placed inside the box representing the entity type.
Entity Instance: An entity instance “is a single occurrence of an entity
type”. An entity type is described just once (using metadata) in the database,
while many instances of that entity type may be represented by data stored
in the database.
Strong Vs weak entity type
Strong entity: An entity type that exists independently of other entity types is
called a strong entity type; equivalently, an entity set that has a primary key is
termed a strong entity set. A strong entity is denoted by a single-lined box.

• Weak entity: An entity set that does not have sufficient attributes to form a
primary key is termed a weak entity set; equivalently, an entity type whose
existence depends on some other entity type is called a weak entity type. A weak
entity type has no meaning in the E-R diagram without the entity on which it
depends. The entity type on which the weak entity type depends is called “the
identifying owner” (or simply the “owner”). A weak entity is denoted by a
double-lined box.
Identifying Relationship: The relationship that associates the weak entity set
with its owner is the identifying relationship.
Attributes: A property or characteristic of an entity type that is of interest to
the organization is called an attribute. An attribute is denoted by an ellipse.

• Ex: [Figure: entity FACULTY with attributes F_Id, Name, Dob, Age, Skill and Qualification]
• In the above example F_Id, Name, Dob, Age, Skill and Qualification
are attributes of the FACULTY entity.
Types of attributes

1. Simple Vs Composite Attribute:
Simple attribute:
A “simple” or atomic attribute is an attribute that cannot be broken down
into smaller components. In the above example F_Id, Dob, Skill, Age and
Qualification are simple attributes.
Composite attribute: A “composite” attribute is an attribute that can be broken
down into component parts. In the above example Name is a composite attribute,
because it can be broken into 3 parts (components), namely F_Name, M_Name
and L_Name.
Single valued Vs Multivalued Attribute
2. Single valued Vs Multivalued Attribute:
Single valued attribute: A “single valued” attribute is an attribute that takes
only one value for a given entity instance.
• Eg: The F_Id attribute of the entity FACULTY takes only one value for each
entity instance.
Multivalued Attribute: A “multivalued” attribute is an attribute that may
take more than one value for a given entity instance. We indicate a
multivalued attribute with a double-lined ellipse.
• Ex: Consider a FACULTY entity set with the attribute Skill. Any particular
faculty member may have Skill in more than one topic.
3. Null attributes: A null value is used when an entity does not have a value for
an attribute.
4. Stored Vs Derived Attributes:
Stored attribute: A “stored” attribute is an attribute whose value is stored
directly in the database; other attribute values may be calculated or derived
from it.
• Ex: In the FACULTY entity type, F_Id and Dob are stored attributes.
• Derived Attribute: A “derived” attribute is an attribute whose values can be
calculated from related attribute values. We indicate a derived attribute in
an E-R Diagram by using an ellipse with a dashed line.
• Ex: The FACULTY entity type has the attribute Age. If the users need to know
Age, that value can be derived from the attribute Dob.
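The difference between a stored and a derived attribute can be sketched in a few lines of Python. This is only an illustration: the class layout, the method name age and the sample date of birth are assumptions, not part of the FACULTY example itself. Dob is kept as data, while Age is computed whenever it is needed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Faculty:
    f_id: str   # stored attribute (identifier)
    dob: date   # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: calculated from dob, never stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


# Hypothetical faculty member.
member = Faculty(f_id="F1", dob=date(1980, 6, 15))
print(member.age(date(2024, 6, 14)))
print(member.age(date(2024, 6, 15)))
```

Because Age is recomputed from Dob, it can never go out of date, which is the usual reason for not storing it.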
5. Identifier attribute or Primary Key: An identifier is an attribute that
uniquely identifies individual instances of an entity type. The identifier for the
FACULTY entity type is F_Id. Each entity instance must have a single value for
the identifier, and the attribute must be associated with the entity. We underline
the identifier's name on the E-R Diagram.
Primary Key: A primary key is a set of one or more attributes that can uniquely
identify tuples within the relation.
Composite identifier or Composite key attribute: A composite identifier is an
identifier that consists of a composite attribute.
• The following figure shows the entity set FLIGHT with the composite
identifier Flight-id. The Flight-id composite attribute in turn has the component
attributes Flight-no and date. This combination is required to uniquely
identify individual occurrences of FLIGHT.
[Figure: entity FLIGHT with composite identifier Flight-id (components Flight-no and date) as primary key, and attribute No. of passengers]


Relationships:
A relationship is an association among the instances of one (or) more entity
types that is of interest to the organization.
• Eg: Consider the entity types EMPLOYEE and COURSE, where COURSE represents
training courses that may be taken by employees. To track the courses that have
been completed by particular employees, we define a relationship called
“Completes” between the two entity types, as shown in the following figure.
• Relationship type (Completes)
[Figure: EMPLOYEE (Employee-id, Employee-name, Birthdate) --- Completes --- COURSE (Course-id, Course-title, Topic)]


Relationship instances:
Employee Course
Raju C++
Kiran Java
Suman COBOL
Hema

• This is a many-to-many relationship, since each employee may complete any number
of courses, while a given course may be completed by any number of employees.
• Associative Entities: An associative entity is an entity type that associates the
instances of one or more entity types and contains attributes that are peculiar to the
relationship between those entity instances.
• In the E-R model, associative entities are represented with the diamond relationship
symbol enclosed within an entity box. The purpose of this symbol is to preserve the
information that the entity was initially specified as a relationship on the E-R Model.
• An associative entity (Certificate)
[Figure (b): EMPLOYEE (Employee-id, Employee-name) --- CERTIFICATE (Certificate-Number, DOC) --- COURSE (Course-id, Course-Title)]
• Figure (b) shows the relationship “Completes” converted to an
associative entity type. In this case the training department of the
company has decided to award a certificate to each employee who
completes a course. Thus, the entity is named CERTIFICATE, which
certainly has independent meaning to end users. Each certificate has
a number that serves as an identifier. The attribute date completed
(DOC in the figure) is also included.
• Constraint:
• A constraint is a restriction placed on the data. Constraints are
important because they help to ensure data integrity. Constraints are
normally expressed in the form of rules.
Different Keys
Keys: It is important to be able to specify how rows in a relation are
distinguished. Conceptually, rows are distinct from one another, but from a
database perspective the difference among them must be expressed in terms
of their attributes. Keys serve this purpose.
Primary key: A primary key is a set of one (or) more attributes that can
uniquely identify a tuple within the relation.
• Ex: In our sample database, Supp # is the primary key for the SUPPLIER relation
(table), as it contains a unique value for each tuple in the relation.
Composite Primary Key: In some tables, a combination of more than one
attribute provides a unique value for each row. In such tables, the group of
these attributes is declared as the primary key. Because the primary key
consists of more than one attribute, it is called a composite primary key.
• Ex: Supp # and Item # together form the primary key for the SHIPMENT relation (table).
Candidate Key: All attribute combinations inside a relation that can serve as the primary
key are candidate keys, as they are candidates for the primary key position.
• Ex: In our sample database, there are two candidate keys, Supp # and Supp-Name, in
the SUPPLIER relation. Both of these attributes contain unique values for each tuple.
Super Key
A super key is a column (or) set of columns that uniquely identifies a row within a
table.
• Ex:
Given table : Employees { Employee_Id, First_Name, Surname, Sal }
Possible super keys are :
• { Employee_Id }
• { Employee_Id, First_Name }
• { Employee_Id, First_Name, Surname }
• { Employee_Id, First_Name, Surname, Sal }
Secondary Key: It is defined as a key that is used strictly for data retrieval
purposes.
Ex: Suppose customer data are stored in a CUSTOMER table in which the
customer number is the primary key. Suppose that some of the customers
forget their number. Data retrieval for such a customer can be facilitated by
using the customer’s last name and phone number. In that case, the primary
key is the customer number, and the secondary key is the combination of the
customer’s last name and phone number.
Foreign Key: An attribute (or set of attributes) whose values are derived from
the primary key of some other table is known as a foreign key in its current table.
• Ex: Supp # in the SHIPMENT table is a foreign key that refers to the primary
key Supp # of the SUPPLIER table.
SUPPLIER
Supp #   Supp-Name      Status   City
S1       Britannia      10       Delhi
S2       New Bakers     30       Mumbai
S3       Mother Dairy   10       Delhi
S4       Cookz          50       Bangalore
S5       Haldiram       40       Jaipur

SHIPMENT
Supp #   Item #   Qty-supplied
S1       12       10
S1       13       20
S1       16       20
S2       14       20
S2       15       10
S3       11       10
S3       17       10
S4       18       30
S5       19       30
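The super key and candidate key definitions can be tried out on the SUPPLIER rows. The following is a minimal Python sketch; the function names and the lower-case column names are assumptions made for the example. It tests uniqueness on the sample rows only, whereas a real key is a rule that must hold for every possible state of the table.

```python
from itertools import combinations

COLUMNS = ("supp_no", "supp_name", "status", "city")
SUPPLIER = [
    ("S1", "Britannia", 10, "Delhi"),
    ("S2", "New Bakers", 30, "Mumbai"),
    ("S3", "Mother Dairy", 10, "Delhi"),
    ("S4", "Cookz", 50, "Bangalore"),
    ("S5", "Haldiram", 40, "Jaipur"),
]


def is_super_key(rows, columns, attrs):
    """True if no two rows agree on every attribute in attrs."""
    positions = [columns.index(a) for a in attrs]
    seen = {tuple(row[i] for i in positions) for row in rows}
    return len(seen) == len(rows)


def candidate_keys(rows, columns):
    """Minimal super keys: super keys that contain no smaller super key."""
    keys = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            if any(set(key) <= set(attrs) for key in keys):
                continue  # already contains a smaller key, so it is not minimal
            if is_super_key(rows, columns, attrs):
                keys.append(attrs)
    return keys


print(candidate_keys(SUPPLIER, COLUMNS))
```

Any set of columns that includes one of the returned keys is a super key, which is why every super key listed for the Employees table contains Employee_Id.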
Data dictionary and System catalog
Data Dictionary:
An integral part of an RDBMS is the data dictionary, which stores
metadata (or information about the database), including attribute names
and definitions for each table in the database. The data dictionary is usually
a part of the system catalog that is generated for each database.
• The system catalog describes all database objects, including
table-related data such as table names, table creators (or) owners, column
names and data types, foreign keys and primary keys, index files,
authorized users, user access privileges and so forth. The system catalog is
created by the DBMS and the information is stored in system tables, which
may be queried in the same manner as any other data table, if the user has
sufficient access privileges.
• The system catalog automatically produces database documentation. As
new tables are added to the database, that documentation also allows
the RDBMS to check for and eliminate homonyms and synonyms.
• Homonyms are similar-sounding words with different
meanings, such as boar and bore, (or) identically spelled words with
different meanings, such as fair (meaning “just”) and fair (meaning “festival”).
• In a database context, the word homonym indicates the
use of the same attribute name to label different attributes. To lessen
confusion, you should avoid database homonyms; the data dictionary is
very useful in this regard.
• A synonym is the opposite of a homonym and indicates the use of
different names to describe the same attribute. For example, car and auto
refer to the same object. Synonyms must be avoided.
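The idea that the system catalog is stored in tables and can be queried like ordinary data can be tried with SQLite through Python's built-in sqlite3 module. This is a minimal sketch for one particular DBMS; other products name their catalog tables differently. The supplier table and its lower-case column names are assumptions based on the sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier ("
    "supp_no TEXT PRIMARY KEY, supp_name TEXT, status INTEGER, city TEXT)"
)

# sqlite_master is SQLite's catalog table: one row per table, index, view or trigger.
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (row[1], row[2], row[5]) for row in conn.execute("PRAGMA table_info(supplier)")
]

print(tables)
for name, data_type, is_primary_key in columns:
    print(name, data_type, is_primary_key)
```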
Integrity constraints
Integrity Rules:
Relational database integrity rules are very important to good database
design. Many RDBMSs enforce integrity rules automatically. However, it is
much safer to make sure that your application design conforms to the
entity and referential integrity rules.
Entity Integrity:
All primary key entries are unique, and no part of a primary key may be
null. Each row will have a unique identity, and foreign key values can
properly reference primary key values.
Referential Integrity:
A foreign key may have either a null entry, as long as it is not a part of its table’s
primary key, or an entry that matches the primary key value in a table to which it is
related. (Every non-null foreign key value must reference an existing primary key
value.)
Integrity Constraints: The relational data model includes several types of integrity
constraints. The purpose of integrity constraints is to implement the business rules in
the database. The main types of integrity constraints in the relational data model are:
1. Domain Integrity Constraints:
A domain is a set of values that may be assigned to an attribute. All of the values
that may appear in a column of a table must be taken from the same domain. A
domain definition consists of components such as the domain name, data type, size
and allowable values. Domain integrity constraints are 1) Not Null and 2) Check. The
Not Null constraint is used to avoid null values. The Check constraint is used to
specify a condition for an attribute. In the relational data model a NULL value is not
the same as zero or an empty string, and one null value is not considered equal to
another null value.
Entity Integrity Constraints:
Entity integrity constraints are mainly of two types: (i) Primary key
and (ii) Unique.
• Every primary key attribute is non-null and contains unique values. In
some cases a particular attribute cannot be assigned a data value. There
are two situations where this is likely to occur: either there is no
applicable data value, (or) the applicable data value is not known. For an
attribute that must hold unique values but may be left without a value in
such cases, we use the entity integrity constraint Unique.
Referential Integrity Constraints:
In the relational data model, associations between tables are defined
with the help of referential integrity constraints (foreign keys). For example,
the association between the CUSTOMER and ORDER tables is identified by
including the Customer-Id attribute as a foreign key in the ORDER table.
CUSTOMER
Customer-Id (primary key)   Cust-Name   Add
ORDER
Order-Id   Order description   Customer-Id (foreign key)

• A referential integrity constraint is a rule that maintains consistency
among the rows of two relations. The rule says that if there is a foreign key in
one relation, each foreign key value must either match a primary key value in
the other relation or be null.
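The constraint types described above can be seen at work with SQLite through Python's sqlite3 module. The following is a minimal sketch under stated assumptions: the table is named orders because ORDER is a reserved word in SQL, and the email and credit columns, the helper rejected and all sample values are hypothetical additions used only to show Unique and Check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,          -- entity integrity: primary key
    cust_name   TEXT NOT NULL,                -- domain integrity: Not Null
    email       TEXT UNIQUE,                  -- entity integrity: Unique (hypothetical column)
    credit      INTEGER CHECK (credit >= 0)   -- domain integrity: Check (hypothetical column)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    description TEXT,
    customer_id INTEGER REFERENCES customer(customer_id)  -- referential integrity
);
INSERT INTO customer VALUES (1, 'Asha', 'asha@example.com', 100);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        with conn:  # commits on success, rolls back on failure
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO customer VALUES (1, 'Ravi', NULL, 0)"))   # repeats primary key 1
print(rejected("INSERT INTO customer VALUES (2, NULL, NULL, 0)"))     # name is null
print(rejected("INSERT INTO customer VALUES (3, 'Ravi', NULL, -5)"))  # credit below zero
print(rejected("INSERT INTO orders VALUES (10, 'Books', 99)"))        # there is no customer 99
print(rejected("INSERT INTO orders VALUES (11, 'Pens', NULL)"))       # null foreign key
```

The last statement is the only one SQLite accepts, which matches the rule that a foreign key may be null when it is not part of the primary key.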
Logical view of Data
Logical view of data: A collection of interrelated files and a set of programs that allow
users to access and modify these files is known as a database management system.
The database stores and manages both data and metadata. The DBMS manages and
controls access to the data and the database structure. Such an arrangement, i.e. placing
the DBMS between the application and the database, eliminates most of the file system’s
inherent limitations.
• The relational data model allows the designer to focus on the logical representation of
the data and its relationships, rather than on physical storage details. The relational
model enables you to view data logically rather than physically.
• The use of a table has the advantages of structural and data independence. A table
does resemble a file from a conceptual point of view. Because you can think of related
records as being stored in independent tables, the relational database model is much
easier to understand than the hierarchical and network models. Logical simplicity tends
to yield simple and effective database design methodologies. This is why the table plays
such a prominent role in the relational model.
Tables and their characteristics

• The logical view of the relational database is facilitated by the creation of
data relationships based on a logical construct known as a relation. Because
a relation is a mathematical construct, end-users find it much easier to think
of a relation as a table. A table is perceived as a two-dimensional structure
composed of rows and columns. A table is also called a relation because the
relational model’s creator, E.F. Codd, used the term relation as a synonym
for table.
Characteristics of a Relational Table

1. A table is perceived as a two-dimensional structure composed of rows and
columns.
2. Each table row (tuple) represents a single entity occurrence within the entity
set.
3. Each table column represents an attribute, and each column has a distinct
name.
4. Each row/column intersection represents a single data value.
5. All values in a column must conform to the same data format.
6. Each column has a specific range of values known as the attribute domain.
7. The order of the rows and columns is immaterial to the DBMS.
8. Each table must have an attribute or a combination of attributes that uniquely
identifies each row
Relational set operators

• The relational algebra is a collection of operations on relations. Each
operation takes one (or) more relations as its operand(s) and
produces another relation as its result. Relational algebra defines the
theoretical way of manipulating table contents using the eight
relational operators. The operations defined in relational algebra
include select, project, Cartesian product, union, set difference, set
intersection, natural join and division. Select and project are
unary operations, since they operate on a single relation.
1. The select operation
The select operation selects tuples (a horizontal subset) from a relation that
satisfy a given predicate (i.e. a given condition). The selection is denoted by
σ (sigma). Ex: To select those tuples from the ITEM relation where the price is
more than 14.00, we shall write
σ price > 14.00 (ITEM)

ITEM
Item #   Item-Name       Price
I1       Milk            15.00
I2       Cake             5.00
I4       Milk bread      14.00
I5       Plain biscuit    6.00
I6       Cream biscuit   10.00
I7       Ice cream       16.00
I8       Namkeen         15.00

Result of σ price > 14.00 (ITEM)
Item #   Item-Name       Price
I1       Milk            15.00
I7       Ice cream       16.00
I8       Namkeen         15.00
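The select operation can be sketched in Python as a filter over rows. This is an illustration only, assuming each tuple of ITEM is stored as a dictionary; the function name select and the key names are assumptions made for the example.

```python
ITEM = [
    {"item_no": "I1", "item_name": "Milk", "price": 15.00},
    {"item_no": "I2", "item_name": "Cake", "price": 5.00},
    {"item_no": "I4", "item_name": "Milk bread", "price": 14.00},
    {"item_no": "I5", "item_name": "Plain biscuit", "price": 6.00},
    {"item_no": "I6", "item_name": "Cream biscuit", "price": 10.00},
    {"item_no": "I7", "item_name": "Ice cream", "price": 16.00},
    {"item_no": "I8", "item_name": "Namkeen", "price": 15.00},
]


def select(relation, predicate):
    """Select (sigma): keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


expensive = select(ITEM, lambda row: row["price"] > 14.00)
for row in expensive:
    print(row["item_no"], row["item_name"], row["price"])
```

Note that Milk bread, priced at exactly 14.00, is left out because the condition is strictly greater than 14.00.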
2. The project operation
The project operation yields a “vertical” subset of a given relation.
The projection lets you select specified attributes in a specified order.
Projection is denoted by π (pi).
• Ex: To project supplier names and their cities from the relation SUPPLIER,
we shall write π Supp-Name, City (SUPPLIER)
• The relation resulting from this query is as shown in Fig (b).
SUPPLIER
Supp #   Supp-Name      Status   City
S1       Britannia      10       Delhi
S2       New Bakers     30       Mumbai
S3       Mother Dairy   10       Delhi
S4       Cookz          50       Bangalore
S5       Haldiram       40       Jaipur
Fig (a)

Result of π Supp-Name, City (SUPPLIER)
Supp-Name      City
Britannia      Delhi
New Bakers     Mumbai
Mother Dairy   Delhi
Cookz          Bangalore
Haldiram       Jaipur
Fig (b)
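The project operation can be sketched in Python in the same spirit. The function name project and the dictionary key names are assumptions made for the example. Because a relation is a set of tuples, the sketch also drops duplicate rows that appear once columns are removed.

```python
SUPPLIER = [
    {"supp_no": "S1", "supp_name": "Britannia", "status": 10, "city": "Delhi"},
    {"supp_no": "S2", "supp_name": "New Bakers", "status": 30, "city": "Mumbai"},
    {"supp_no": "S3", "supp_name": "Mother Dairy", "status": 10, "city": "Delhi"},
    {"supp_no": "S4", "supp_name": "Cookz", "status": 50, "city": "Bangalore"},
    {"supp_no": "S5", "supp_name": "Haldiram", "status": 40, "city": "Jaipur"},
]


def project(relation, attributes):
    """Project (pi): keep only the named attributes, in the given order, without duplicates."""
    result = []
    for row in relation:
        projected = tuple(row[a] for a in attributes)
        if projected not in result:
            result.append(projected)
    return result


names_and_cities = project(SUPPLIER, ["supp_name", "city"])
cities = project(SUPPLIER, ["city"])
print(names_and_cities)
print(cities)
```

Projecting on city alone gives four rows rather than five, since Delhi occurs twice in SUPPLIER.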
• The Cartesian product operation
• The Cartesian product is a binary operation and is denoted by a cross (×). The
Cartesian product of two relations A and B is written as A × B. It yields a new
relation whose degree (number of attributes) is equal to the sum of the degrees
of the two relations operated upon, and which contains all possible combinations
of the tuples of the two relations. To see how it works, let us assume there are
two relations, STUDENT and INSTRUCTOR.
STUDENT
Stud #   Stud-Name   Hosteler
S001     Meenakshi   Y
S002     Radhika     N
S003     Abhinav     N

INSTRUCTOR
INST#   INST-Name   Subject
101     K.Lal       English
102     R.L.Arora   Maths

The resulting relation STUDENT × INSTRUCTOR is shown below. It contains all
possible combinations of tuples of the two relations: 3 × 2 = 6 tuples, each with
3 + 3 = 6 attributes.
STUDENT_INSTRUCTOR
Stud #   Stud-Name   Hosteler   INST#   INST-Name   Subject
S001     Meenakshi   Y          101     K.Lal       English
S001     Meenakshi   Y          102     R.L.Arora   Maths
S002     Radhika     N          101     K.Lal       English
S002     Radhika     N          102     R.L.Arora   Maths
S003     Abhinav     N          101     K.Lal       English
S003     Abhinav     N          102     R.L.Arora   Maths
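The Cartesian product can be sketched in Python with itertools.product from the standard library. The function name cartesian_product is an assumption made for the example, and each relation is stored as a list of tuples.

```python
from itertools import product

STUDENT = [
    ("S001", "Meenakshi", "Y"),
    ("S002", "Radhika", "N"),
    ("S003", "Abhinav", "N"),
]
INSTRUCTOR = [
    (101, "K.Lal", "English"),
    (102, "R.L.Arora", "Maths"),
]


def cartesian_product(a, b):
    """A x B: concatenate every tuple of a with every tuple of b."""
    return [row_a + row_b for row_a, row_b in product(a, b)]


student_instructor = cartesian_product(STUDENT, INSTRUCTOR)
for row in student_instructor:
    print(row)
```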
The union operation
The union operation is a binary operation that requires two relations as its
operands. It produces a third relation that contains the tuples from both
operand relations, with duplicate tuples appearing only once. The union
operation is denoted by ∪. To denote the union of two relations X and Y we
write X ∪ Y.
• One thing must be remembered about union: both operand relations must be
union-compatible. For a union operation A ∪ B to be valid, the following two
conditions must be satisfied by the two operands A and B.
• The relations A and B must be of the same degree. That is, they must have
the same number of attributes.
• The domain of the ith attribute of A and the domain of the ith attribute of B
must be the same.
[Figure: Venn diagram of A ∪ B]
Consider that we have the following two relations DRAMA and SONG; then the
result of SONG ∪ DRAMA will be as shown in Fig (c).
DRAMA
Roll No.   Name      Age
13         Kush      15
17         Swathi    14

SONG
Roll No.   Name      Age
2          Manya     15
10         Rishabh   15
13         Kush      15

Result of SONG ∪ DRAMA will be
Roll No.   Name      Age
2          Manya     15
10         Rishabh   15
13         Kush      15
17         Swathi    14
Fig (c)
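The union operation can be sketched in Python with sets of tuples. The function name union is an assumption made for the example, and the sketch checks only the first union-compatibility condition (same degree); checking that the attribute domains match is left out.

```python
DRAMA = {(13, "Kush", 15), (17, "Swathi", 14)}
SONG = {(2, "Manya", 15), (10, "Rishabh", 15), (13, "Kush", 15)}


def union(a, b):
    """A union B for relations stored as sets of tuples of the same degree."""
    degrees = {len(row) for row in a | b}
    if len(degrees) > 1:
        raise ValueError("relations are not union-compatible: degrees differ")
    return a | b


for row in sorted(union(SONG, DRAMA)):
    print(row)
```

Kush belongs to both relations but appears only once in the result, because a set holds each tuple at most once.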
The set Intersection operation
The set intersection operation finds tuples that are common to the two operand
relations. The set intersection operation is denoted by ∩. A ∩ B will yield a
relation having the tuples common to A and B.
[Figure: Venn diagram of A ∩ B]
DRAMA
Roll No.   Name      Age
13         Kush      15
17         Swathi    14

SONG
Roll No.   Name      Age
2          Manya     15
10         Rishabh   15
13         Kush      15

Result of DRAMA ∩ SONG will be
Roll No.   Name      Age
13         Kush      15
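The set intersection can be sketched in Python as follows. The function name intersection is an assumption made for the example, and each relation is stored as a list of tuples.

```python
DRAMA = [(13, "Kush", 15), (17, "Swathi", 14)]
SONG = [(2, "Manya", 15), (10, "Rishabh", 15), (13, "Kush", 15)]


def intersection(a, b):
    """A intersect B: the tuples of a that also appear in b."""
    return [row for row in a if row in b]


common = intersection(DRAMA, SONG)
print(common)
```

Only the tuple for roll number 13 appears in both DRAMA and SONG, so it is the only tuple in the result.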
The join operation
The join operation joins two relations to form a new relation on the basis of
a common column that the two operand relations share. That is, if two tables
each have a column defined over some common domain, they may be joined
over those two columns. The result of the join is a new, wider table in which
each row is formed by concatenating two rows, one from each of the original
tables, such that the two rows have the same value in those two columns.
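The join described above can be sketched in Python using a few rows from the SUPPLIER and SHIPMENT tables, which share the supplier number column. The function name join and the dictionary key names are assumptions made for the example, and only some columns and rows of the sample tables are used.

```python
SUPPLIER = [
    {"supp_no": "S1", "supp_name": "Britannia"},
    {"supp_no": "S2", "supp_name": "New Bakers"},
]
SHIPMENT = [
    {"supp_no": "S1", "item_no": 12, "qty_supplied": 10},
    {"supp_no": "S1", "item_no": 13, "qty_supplied": 20},
    {"supp_no": "S2", "item_no": 14, "qty_supplied": 20},
]


def join(a, b, column):
    """Join on one common column: combine rows whose values in that column match."""
    return [
        {**row_a, **row_b}
        for row_a in a
        for row_b in b
        if row_a[column] == row_b[column]
    ]


joined = join(SUPPLIER, SHIPMENT, "supp_no")
for row in joined:
    print(row)
```

Each result row is wider than either input row, and the common column appears only once.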
• 8. Division Operation: The DIVIDE operation uses one single-column table
(i.e. column “a”) as the divisor and one 2-column table (i.e. columns “a” and
“b”) as the dividend. The tables must have a common column (i.e. column
“a”). The output of the DIVIDE operation is a single column with the values of
column “b” from the dividend table rows whose values of the common
column (i.e. column “a”) match every value in the divisor table.
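Division is easier to follow on a small example. The following is a minimal Python sketch, assuming the dividend is a set of (a, b) pairs and the divisor is a set of "a" values. The function name divide and the sample values are hypothetical. The result holds the "b" values that are paired with every "a" value of the divisor.

```python
def divide(dividend, divisor):
    """Return the b values that occur together with every a value in divisor."""
    b_values = {b for _, b in dividend}
    return {b for b in b_values if all((a, b) in dividend for a in divisor)}


# Hypothetical sample data: columns (a, b).
dividend = {("A", 5), ("B", 5), ("C", 6), ("A", 4), ("B", 3)}
divisor = {"A", "B"}

print(divide(dividend, divisor))
```

Here 5 is the only "b" value that occurs with both A and B. The value 4 occurs only with A, 3 only with B, and 6 only with C.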
E.F. CODD’S RELATIONAL DATABASE RULES
• In 1985, Dr. E.F. Codd published a list of 12 rules to define a relational database
system. The reason Dr. Codd published the list was his concern that many
vendors were marketing products as “relational” even though those products
did not meet the minimum relational standards. Dr. Codd’s list below serves as
a frame of reference for what a truly relational database should be.
1. Information Representation: All information in a relational database must
be logically represented as column values in rows within tables.
2. Guaranteed Access: Every value in a table is guaranteed to be accessible
through a combination of table name, primary key value, and column name.
3. Systematic Treatment of Nulls: Nulls must be represented and treated in a
systematic way, independent of data type.
4. Dynamic On-Line Catalog Based on the Relational Model: The metadata
must be stored and managed as ordinary data, that is, in tables within the
database. Such data must be available to authorized users using the standard
database relational language.
5. Comprehensive Data Sublanguage: The relational database may support
many languages. However, it must support one well defined, declarative
language with support for data definition, view definition, data manipulation
(interactive and by program), integrity constraints, authorization, and
transaction management (begin, commit, and rollback).
6. View Updating: Any view that is theoretically updatable must be updatable
through the system.
7. High-Level Insert, Update and Delete: The database must support set-level inserts,
updates, and deletes.
8. Physical Data Independence: Application programs and ad hoc facilities are logically
unaffected when physical access methods or storage structures are changed.
9. Logical Data Independence: Application programs and ad hoc facilities are logically
unaffected when changes are made to the table structures that preserve the original
table values (changing the order of columns or inserting columns).
10. Integrity Independence: All relational integrity constraints must be definable in the
relational language and stored in the system catalog, not at the application level.
11. Distribution Independence: The end users and application programs are unaware
of and unaffected by the data location (distributed vs. local databases).
12. Nonsubversion: If the system supports low-level access to the data, there must not
be a way to bypass the integrity rules of the database.
13. Rule Zero: All preceding rules are based on the notion that in order for a database
to be considered relational, it must use its relational facilities exclusively to manage the
database.
