
HND YEAR 1

SOFTWARE ENGINEERING

INTRODUCTION TO
DATABASE

BY
MR. NDUMU

HAND OUT
INTRODUCTION TO DATABASE SYSTEMS

INTRODUCTION TO DATABASE

A database is an organized collection of structured information, or data, typically stored
electronically in a computer system. A database is usually controlled by a database
management system (DBMS). Together, the data and the DBMS, along with the applications
that are associated with them, are referred to as a database system, often shortened to just
database.

Data within the most common types of databases in operation today is typically modeled in
rows and columns in a series of tables to make processing and data querying efficient. The
data can then be easily accessed, managed, modified, updated, controlled, and organized.
Most databases use structured query language (SQL) for writing and querying data.
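
The idea of rows, columns and SQL can be sketched in a few lines. The example below is only an illustration: it assumes Python's built-in sqlite3 module as the DBMS, and the table name, column names and sample values are hypothetical.

```python
import sqlite3

# Hypothetical example table; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Writing data: each tuple becomes one row, each value fills one column.
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Amina", 19), (2, "Brice", 22), (3, "Clara", 20)],
)

# Querying data: ask for the rows that satisfy a condition.
older = conn.execute(
    "SELECT name FROM student WHERE age >= 20 ORDER BY name"
).fetchall()
print(older)
```

The query names the data it wants (students aged 20 or more) and leaves it to the DBMS to work out how to find the matching rows.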

Databases have evolved dramatically since their inception in the early 1960s. Navigational
databases such as the hierarchical database (which relied on a tree-like model and allowed
only a one-to-many relationship), and the network database (a more flexible model that
allowed multiple relationships), were the original systems used to store and manipulate
data. Although simple, these early systems were inflexible. In the 1980s, relational
databases became popular, followed by object-oriented databases in the 1990s. More
recently, NoSQL databases came about as a response to the growth of the internet and the
need for faster speed and processing of unstructured data. Today, cloud databases and self-
driving databases are breaking new ground when it comes to how data is collected, stored,
managed, and utilized.

Databases and spreadsheets (such as Microsoft Excel) are both convenient ways to store
information. The primary differences between the two are:

• How the data is stored and manipulated
• Who can access the data
• How much data can be stored

Spreadsheets were originally designed for one user, and their characteristics reflect that.
They’re great for a single user or small number of users who don’t need to do a lot of
incredibly complicated data manipulation. Databases, on the other hand, are designed to
hold much larger collections of organized information—massive amounts, sometimes.
Databases allow multiple users at the same time to quickly and securely access and query
the data using highly complex logic and language.

TYPES OF DATABASES

There are many different types of databases. The best database for a specific organization
depends on how the organization intends to use the data.

• Relational databases. Relational databases became dominant in the 1980s. Items in a
relational database are organized as a set of tables with columns and rows. Relational
database technology provides an efficient and flexible way to access structured
information.
• Object-oriented databases. Information in an object-oriented database is represented
in the form of objects, as in object-oriented programming.
• Distributed databases. A distributed database consists of two or more files located in
different sites. The database may be stored on multiple computers, located in the
same physical location, or scattered over different networks.
• Data warehouses. A central repository for data, a data warehouse is a type of database
specifically designed for fast query and analysis.
• NoSQL databases. A NoSQL, or nonrelational database, allows unstructured and
semistructured data to be stored and manipulated (in contrast to a relational
database, which defines how all data inserted into the database must be composed).
NoSQL databases grew popular as web applications became more common and more
complex.
• Graph databases. A graph database stores data in terms of entities and the
relationships between entities.
• OLTP databases. An OLTP (online transaction processing) database is a fast database
designed for large numbers of transactions performed by multiple users.

These are only a few of the several dozen types of databases in use today. Other, less
common databases are tailored to very specific scientific, financial, or other functions. In
addition to the different database types, changes in technology development approaches
and dramatic advances such as the cloud and automation are propelling databases in
entirely new directions. Some of the latest databases include:

• Open source databases. An open source database system is one whose source code
is open source; such databases could be SQL or NoSQL databases.
• Cloud databases. A cloud database is a collection of data, either structured or
unstructured, that resides on a private, public, or hybrid cloud computing platform.
There are two types of cloud database models: traditional and database as a service
(DBaaS). With DBaaS, administrative tasks and maintenance are performed by a
service provider.
• Multimodel databases. Multimodel databases combine different types of database
models into a single, integrated back end. This means they can accommodate various
data types.
• Document/JSON databases. Designed for storing, retrieving, and managing
document-oriented information, document databases are a modern way to store data in JSON
format rather than rows and columns.
• Self-driving databases. The newest and most groundbreaking type of database,
self-driving databases (also known as autonomous databases) are cloud-based and use
machine learning to automate database tuning, security, backups, updates, and other
routine management tasks traditionally performed by database administrators.
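
To make the document/JSON idea from the list above concrete, the sketch below keeps one record as a single JSON document instead of spreading it over rows and columns. It uses only Python's standard json module, not a real document database, and the field names and values are hypothetical.

```python
import json

# Hypothetical student record; the field names are illustrative only.
document = {
    "name": "Amina",
    "age": 19,
    "address": {"street": "Main Street", "city": "Buea"},
    "courses": ["Databases", "Programming"],
}

# A document store keeps and returns the whole record as one JSON document.
stored = json.dumps(document)
loaded = json.loads(stored)

# Nested values are reached by following the structure of the document.
city = loaded["address"]["city"]
print(city)
```

Note how the address and the list of courses sit inside the record itself; in a relational design they would normally go into separate tables.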

WHAT IS A DATABASE?

A database (DB) is a collection of data that lives for a long time. Many systems fit this
definition, for example, a paper-based file system, a notebook, or even a string with knots
for counting.

A Database Management System (DBMS) is a system (software) that provides an interface
to a database for information storage and retrieval. We are more interested in software
systems than in manual systems because they can do the job more efficiently. The
common features of a DBMS include:

• capacity for large amounts of data
• an easy-to-use interface language (SQL, Structured Query Language)
• efficient retrieval mechanisms
• multi-user support
• security management
• concurrency and transaction control
• persistent storage with backup and recovery for reliability

The users of a database assume different roles, such as:

• end user - application programmers that use the DB as a storage subsystem
• designer - application programmers and/or business analysts who design the layout
of the DB
• administrator - operators who maintain the health and efficiency of the DB
• implementor - programmers who maintain and develop the DBMS

The key concepts of a database include:

• schema - the structure and the constraints of data
• data - the actual content of the DB representing information
• data definition language - used to specify the schema
• data manipulation and query language - used to change the data and query it

A schema is metadata that describes the data. Such metadata can describe the structure of the
data, which ranges from strictly enforced structure (relational) to semi-structured (XML) and
unstructured data (text files). Before we define the schema we must decide on a model of
the data - a metaphor. For a relational database, the n-ary relation is used to model data.
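
The split between schema, data, data definition language and data manipulation language can be seen in a short sketch. It assumes Python's built-in sqlite3 module; the course table and its contents are hypothetical, and the sqlite_master table used at the end is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language (DDL): specifies the schema (structure and constraints).
conn.execute(
    "CREATE TABLE course ("
    " code TEXT PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " credits INTEGER NOT NULL)"
)

# Data manipulation language (DML): changes the data held under that schema.
conn.execute("INSERT INTO course VALUES ('DB101', 'Introduction to Database', 4)")
conn.execute("UPDATE course SET credits = 5 WHERE code = 'DB101'")

# Query language: reads the data back.
row = conn.execute(
    "SELECT title, credits FROM course WHERE code = 'DB101'"
).fetchone()

# The schema is itself metadata that can be queried (SQLite-specific catalogue table).
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'course'"
).fetchone()[0]
print(row)
print(schema_sql)
```

The CREATE TABLE statement describes the data without containing any; the INSERT and UPDATE statements change the content without touching the structure.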

A database can be defined as a shared collection of interrelated data designed to meet the
varied information needs of an organisation (McFadden and Hoffer).

PROPERTIES OF A DATABASE

A database is:

Integrated: Previously distinct data files have been logically organised to
eliminate or reduce redundancy and to facilitate data access.

Shared: All qualified users in the organisation have access to the same data
for use in a variety of activities.

Interrelated: Structured in a manner that is logically meaningful to the
organisation.

ADVANTAGES

There are a number of advantages to the database approach. These are discussed in more
detail in the topic Database Systems. Some of the advantages of a database system are
summarised below.

Data management: A DBMS provides a mechanism for easier management of
the data held by an organisation.

Data independence: In a database system the data is held independently of the
application programs that use it. Changes to the data or
the way it is stored will not necessarily affect the
programs that use the data.

Data consistency: A database system is designed to reduce
redundancy. Items of data are ideally only stored in one
location. When updates are made to data, changes should
only need to be made to this one location. This greatly
improves the consistency of the results of queries made
on the database.

Data sharing: The collection of data in a central data repository and
managed by a DBMS allows the data to be shared by
several users and applications.

Increased productivity: File structures are already determined in the database
structure. New application programs can then be
developed that use these structures. This results in
shorter development times.

Reduced maintenance: A number of tools for maintaining the data are already
built in to most DBMS. Tools may include report
generators, query languages, form generators, code
checkers, data dictionaries and maintenance logs.

COMPONENTS

A database consists of:

Database structure: Which is conceived as a single entity consisting of a
collection of interrelated files.

Repository: Which contains the rules under which users can access the
data and the rules under which the data is organised, i.e.
metadata.

End users: Who access the database.

Interfaces: Which are the various ways an end user may access the data
via display terminals, phone links from remote terminals,
touch screens, bar code readers, by using voice commands,
touch pads, graphics selection, selections from menus with
keyboard or mouse, keying in commands or scanning
cards. The interface also is the form in which the
information is delivered to the end user, e.g. graph, printed
report, graphical display.

Application programs: Which are used to perform the main operations on the data,
i.e. create, modify, delete and retrieve. Application programs
also combine the data in meaningful ways to produce
reports.

Database Management System (DBMS): Which is the collection of programs that manages the
database structure and interprets the rules in the repository
and also manages user access in multi-user systems.

Computer Aided Software Engineering Tools (CASE tools): Are also considered to be part of the database
environment. These are automated tools to assist with the
design of systems and databases.

Database Administrators (DBA): Manage the DBMS and are responsible for the overall
information resources of an organisation. You will look at
the role of the DBA more fully in a later section.

System Developers: Include systems analysts and programmers. You will look at
the role of each of these in other modules. Both use CASE
tools to assist in the development of application programs.

DATABASE MANAGEMENT SYSTEM (DBMS)

A database management system (DBMS) is system software for creating and managing
databases. A DBMS makes it possible for end users to create, read, update and delete data
in a database. The most prevalent type of data management platform, the DBMS essentially
serves as an interface between databases and end users or application programs, ensuring
that data is consistently organized and remains easily accessible.

FUNCTIONS OF A DBMS

The DBMS manages three important things: the data, the database engine that allows data
to be accessed, locked and modified, and the database schema, which defines the database's
logical structure. These three foundational elements help provide concurrency, security,
data integrity and uniform data administration procedures. Typical database
administration tasks supported by the DBMS include change management, performance
monitoring and tuning, security, and backup and recovery. Many database management
systems are also responsible for automated rollbacks and restarts as well as the logging and
auditing of activity in databases.
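
A rollback can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module; the account table, the transfer function and the amounts are hypothetical. The `with conn:` block commits if every statement succeeds and rolls back if one of them raises an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("savings", 100), ("checking", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts; undo both updates if either one fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "savings", "checking", 30)       # both updates succeed
failed = transfer(conn, "savings", "checking", 500)  # would overdraw, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second transfer the credit to the checking account has already been applied when the debit fails; the rollback removes it again, so the data never shows half a transfer.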

The DBMS is perhaps most useful for providing a centralized view of data that can be
accessed by multiple users, from multiple locations, in a controlled manner. A DBMS can
limit what data the end user sees, as well as how that end user can view the data, providing
many views of a single database schema. End users and software programs are free from
having to understand where the data is physically located or on what type of storage media
it resides because the DBMS handles all requests.
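
As a rough illustration of how a view limits what a user sees, the sketch below defines a view that exposes names and departments but not salaries. It assumes Python's built-in sqlite3 module; the employee table and the staff_directory view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Amina", "Payroll", 900), (2, "Brice", "Sales", 700)],
)

# A view over the same data that leaves out the salary column.
conn.execute("CREATE VIEW staff_directory AS SELECT name, department FROM employee")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(visible_columns, rows)
```

A user or program that is given only the view works with the same underlying data but never sees the hidden column.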

The DBMS can offer both logical and physical data independence. This means it can protect
users and applications from needing to know where data is stored or having to be
concerned about changes to the physical structure of data. As long as programs use the
application programming interface (API) for the database that is provided by the DBMS,
developers won't have to modify programs just because changes have been made to the
database.

In a relational database management system (RDBMS), the most widely used type of DBMS,
this API is SQL, a standard programming language for defining, protecting and accessing
data in an RDBMS.

Popular types of DBMS technologies

Popular database models and management systems include:

• Relational database management system (RDBMS) -- adaptable to most use cases, but
RDBMS Tier-1 products can be quite expensive.
• NoSQL DBMS -- well-suited for loosely defined data structures that may evolve over
time.
• In-memory database management system (IMDBMS) -- provides faster response
times and better performance.
• Columnar database management system (CDBMS) -- well-suited for data warehouses
that have a large number of similar data items.
• Cloud-based database management system -- the cloud service provider is
responsible for providing and maintaining the DBMS.

ADVANTAGES OF USING A DBMS

Using a DBMS to store and manage data comes with advantages, but also processing
overhead. One of the biggest advantages of using a DBMS is that it lets end users and
application programmers access and use the same data while managing data integrity. Data
is better protected and maintained when it can be shared using a DBMS instead of creating
new iterations of the same data stored in new files for every new application. The DBMS
provides a central store of data that can be accessed by multiple users in a controlled
manner.

Central storage and management of data within the DBMS provides:

• Data abstraction and independence.
• Data security.
• A locking mechanism for concurrent access.
• An efficient handler to balance the needs of multiple applications using the same data.
• The ability to swiftly recover from crashes and errors, including restartability and
recoverability.
• Robust data integrity capabilities.
• Logging and auditing of activity.
• Simple access using a standard API.
• Uniform administration procedures for data.

Another advantage of a DBMS is that it can be used to impose a logical, structured
organization on the data. A DBMS delivers economy of scale for processing large amounts
of data because it is optimized for such operations.

A DBMS can also provide many views of a single database schema. A view defines what data
the user sees and how that user sees the data. The DBMS provides a level of abstraction
between the conceptual schema that defines the logical structure of the database and the
physical schema that describes the files, indexes and other physical mechanisms used by
the database. When a DBMS is used, systems can be modified much more easily when
business requirements change. New categories of data can be added to the database without
disrupting the existing system and applications can be insulated from how data is
structured and stored.

However, a DBMS must perform additional work to provide these advantages, which
brings overhead. A DBMS will use more memory and CPU than a simple file
storage system, and different types of DBMSes will require different types and levels of
system resources.

A database management system receives instructions from a database administrator (DBA)
and accordingly instructs the system to make the necessary changes. These commands can
be to load, retrieve or modify existing data in the system.

A DBMS provides data independence: changes to storage mechanisms and formats
can be made without modifying the entire application. There are four main types of
database organization:

• Relational Database: Data is organized as logically independent tables. Relationships
among tables are shown through shared data. The data in one table may reference
similar data in other tables, which maintains the integrity of the links among them.
This feature is referred to as referential integrity, an important concept in a
relational database system. Operations such as "select" and "join" can be performed
on these tables. This is the most widely used system of database organization.
• Flat Database: Data is organized in a single kind of record with a fixed number of
fields. This database type is more error-prone because data is repeated.
• Object-Oriented Database: Data is organized with similarity to object-oriented
programming concepts. An object consists of data and methods, while classes group
objects having similar data and methods.
• Hierarchical Database: Data is organized with hierarchical relationships. It becomes
a complex network if the one-to-many relationship is violated.
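
Referential integrity and the "join" operation described for relational databases above can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module; the department and employee tables are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " department_id INTEGER NOT NULL REFERENCES department(id))"
)
conn.execute("INSERT INTO department VALUES (1, 'Payroll')")
conn.execute("INSERT INTO employee VALUES (10, 'Amina', 1)")

# Referential integrity: a row may not point at a department that does not exist.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Brice', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# "Select" and "join": combine the two tables through the shared department id.
joined = conn.execute(
    "SELECT employee.name, department.name"
    " FROM employee JOIN department ON employee.department_id = department.id"
).fetchall()
print(rejected, joined)
```

The shared value (the department id) is what links the two tables, and the DBMS refuses any row that would break that link.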

NATURE OF DATA BASE MANAGEMENT SYSTEM:

Features and characteristics of a data base management system are as follows:

1. The data are combined to form operational units to minimise the duplication of data and
increase access to all data in the data base.

2. The data base can be extended, so that more data and programs can be added to the system.

3. The capacity to store the large amounts of data necessary for users’ needs. The data are stored on
direct-access devices for on-line support.

4. Controls in the system limit access to the data base files and protect the
confidentiality of all data in these files.

5. The capacity to interrogate data files, retrieve and modify data, and record the
changes.

Objectives of Data Base Management System (DBMS):

The objectives of a data base management system are to facilitate the creation of data
structures and relieve the programmer of the problems of setting up complicated files.

The objectives of DBMS can be stated as follows:

1. Eliminate redundant data.

2. Make access to the data easy for the user.

3. Provide for mass storage of relevant data.

4. Protect the data from physical harm and unauthorised access.

5. Allow for growth in the data base system.

6. Make the latest modifications to the data base available immediately.

7. Allow for multiple users to be active at one time.

8. Provide prompt response to user requests for data.

Functions of Data Base Management System (DBMS):

Every computer application has unique requirements. For example, special purpose
software systems that handle personnel, inventory, and marketing data, may differ not only
in the type of information they store, but also in the facilities they provide for data entry
and retrieval.

The cost of designing and building special purpose software systems for Data management
tasks often prohibits otherwise cost effective automation. Data base management systems
are general purpose programs that dramatically reduce the time necessary to computerise
an application.

The purpose of a DBMS is to provide the following main functions:

1. A mechanism for organising, structuring and storing data.

2. A mechanism for accessing data that provides a measure of data independence, i.e., to
some extent it insulates application programs from changes to the data structure.

3. Creating program and data independence. Either one can be altered independently of the
other.

4. Reducing data redundancy.

5. Providing security for the user’s data. Access is limited to authorized users by passwords
or similar schemes.

6. Reducing physical storage requirements by separating the logical and physical aspects of
the data base.

Merits of DBMS:

Advantages and merits of DBMS are as follows:

1. Integrity

2. Security

3. Data independence

4. Shared data

5. Conflict resolution

6. Reduction of redundancies.

1. Integrity:

Centralised control can also ensure that adequate checks are incorporated in the DBMS to
provide data integrity. Data integrity means that the data contained in the data base is both
accurate and consistent. Therefore, data values being entered for storage could be checked
to ensure that they fall within a specified range and are of the correct format.

For example, the value for the age of an employee may be required to lie in the range 16 to 75.

Another integrity check that should be incorporated in the data base is to ensure that if
there is a reference to a certain object, that object must exist. In the case of an automatic teller
machine, for example, a user is not allowed to transfer funds from a non-existent savings
account to a checking account.
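
The range check on an employee's age can be written as a constraint that the DBMS enforces itself. The sketch below assumes Python's built-in sqlite3 module; the employee table and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER NOT NULL CHECK (age BETWEEN 16 AND 75))"
)

def try_insert(emp_id, name, age):
    """Return True if the DBMS accepts the row, False if the range check rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_id, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

in_range_accepted = try_insert(1, "Amina", 30)      # age inside 16..75
out_of_range_accepted = try_insert(2, "Brice", 12)  # age outside 16..75
stored = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(in_range_accepted, out_of_range_accepted, stored)
```

Because the rule lives in the schema, every application that writes to the table is held to the same check.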

2. Security:

Data is of vital importance to an organisation and may be confidential. Such confidential
data must not be accessed by unauthorised persons. The data base administrator (DBA),
who has the ultimate responsibility for the data in the DBMS, can ensure that proper access
procedures are followed, including proper authentication schemes for access to the DBMS
and additional checks before permitting access to sensitive data.

Different levels of security could be implemented for various types of data and operations.
The enforcement of security could be data value dependent (e.g., a manager has access to
the salary details of employees in his department only), as well as data type dependent (e.g.,
the manager cannot access the medical history of any employee, including those in his
department).

3. Data Independence:

Data independence is usually considered from two points of view: physical data
independence and logical data independence. Physical data independence allows changes
in the physical storage devices or organisation of the files to be made without requiring
changes in the conceptual view or any of the external views and hence in the application
programs using the data base.
Thus, the files may migrate from one type of physical media to another or the file structure
may change without any need for changes in the application programs. Logical data
independence implies that application programs need not be changed if fields are added to
an existing record; nor do they have to be changed if fields not used by application programs
are deleted.

Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. Data independence is advantageous in the data base
environment since it allows for changes at one level of the data base without affecting other
levels. These changes are absorbed by the mappings between the levels.
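
Logical data independence can be illustrated by adding a field to an existing record type and checking that an existing program still works. The sketch assumes Python's built-in sqlite3 module; the employee table and the payroll_report function are hypothetical stand-ins for a real schema and a real application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Amina', 900)")

def payroll_report(conn):
    """Stand-in for an existing application program; it names the fields it needs."""
    return conn.execute("SELECT name, salary FROM employee ORDER BY id").fetchall()

before = payroll_report(conn)

# The schema changes: a new field is added to the existing record type.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

after = payroll_report(conn)  # the unchanged program still returns the same result
print(before, after)
```

The program keeps working because it asks for fields by name; a program that relied on the physical position or number of fields would not be insulated in the same way.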

4. Shared Data:

A data base allows the sharing of data under its control by any number of application
programs or users. For example, the applications for the public relations
and payroll departments could share the data held for the record type employee.

5. Conflict Resolution:

Since the data base is under the control of the data base administrator (DBA), the DBA should
resolve the conflicting requirements of various users and applications. In essence, the DBA
chooses the best file structure and access method to get optimal performance for the critical
applications, while permitting less critical applications to continue to use the data base,
albeit with a relatively slower response.

6. Reduction of Redundancies:

Centralised control of data by the DBA avoids unnecessary duplication of data and
effectively reduces the total amount of data storage required. It also eliminates the extra
processing necessary to trace the required data in a large mass of data.

Another advantage of avoiding duplication is the elimination of the inconsistencies that
tend to be present in redundant data files. Any redundancies that exist in the DBMS are
controlled and the system ensures that these multiple copies are consistent.

DISADVANTAGES OF DBMS:

Disadvantages of data base management system are:

1. Complexity of backup and recovery

2. Problem associated with centralization

3. Cost of software, hardware and migration.

1. Complexity of Backup and Recovery:

Backup and recovery operations are very complex in a data base management system
(DBMS) environment, and this is especially evident in a concurrent multi-user data base system. A data
base system requires a certain amount of controlled redundancy and duplication to enable
access to related data items.

2. Problem Associated with Centralization:

Centralisation increases the potential severity of security breaches and of disruption to the
operation of the organisation because of down time and failures. Centralisation means
that the data is accessible from a single data source or data base.

Decentralisation of the data base, that is, the replacement of a single centralised data
base by independent and co-operating distributed data bases, addresses the problems of
failure and down time that arise from centralisation.

3. Cost of Software, Hardware and Migration:

A well-designed and effective data base system requires software to be purchased or
developed, and the hardware has to be upgraded to allow for the extensive programs and
the work spaces required for their execution and storage. This involves considerable cost. An
additional cost is that of migration, that is, the shift from a traditional separate
application environment to an integrated application environment.

From my point of view, the basic objectives of a database system can be summarized as
below:

1. A database should act as a kind of medium to collect and store the incoming data in
an organized way. For example, in the case of a relational database such as Oracle, the
main purpose of the database is to store the input data and organize it in terms
of attributes (columns) and tuples (rows) grouped into relations (tables).
2. A database, in addition to storing the input data, should allow for efficient retrieval
of stored data as per the user’s requirements.
3. A database should be implemented with various security features so that it ensures
a high level of integrity, i.e., so that users can trust the data stored in
the database.
4. A database should be highly scalable as the amount of data increases over time. In this
context, it should also be highly adaptable to changes with respect to business needs.
5. A database should be highly consistent despite the number of concurrent transactions
operating on the data stored in it. Further, it should also be highly durable so that it
prevents loss of data even after a loss of power.

DBMS DATABASE MODELS

A Database model defines the logical design and structure of a database and defines how
data will be stored, accessed and updated in a database management system. While the
Relational Model is the most widely used database model, there are other models too:

• Hierarchical Model
• Network Model
• Entity-relationship Model
• Relational Model

HIERARCHICAL MODEL

This database model organises data into a tree-like structure, with a single root, to which
all the other data is linked. The hierarchy starts from the root data and expands like a tree,
adding child nodes to the parent nodes.

In this model, a child node will only have a single parent node.

This model efficiently describes many real-world relationships, such as the index of a book
or a recipe.

In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data. For example, one department can have
many courses, many professors and, of course, many students.
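
The tree structure of the hierarchical model can be sketched in plain Python. This is not how a hierarchical DBMS is implemented; it is a minimal in-memory illustration, and the Node class and the sample names are hypothetical.

```python
class Node:
    """One record in the hierarchy; it has at most one parent and any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Node(name, parent=self)  # exactly one parent per child
        self.children.append(child)
        return child

    def path(self):
        """Walk up through the parents to the root and return the full path."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))

department = Node("Department")            # the single root
courses = department.add_child("Courses")
students = department.add_child("Students")
databases = courses.add_child("Databases")

print(databases.path())
```

Because every node has a single parent, there is exactly one path from any record back to the root, which is what makes the structure a tree.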

NETWORK MODEL

This is an extension of the Hierarchical model. In this model data is organised more like a
graph, and a child node is allowed to have more than one parent node.

Because more relationships are established in this database model, the data is more
closely related, which can make accessing related data easier and faster. This database
model was used to map many-to-many data relationships.

This was the most widely used database model before the Relational Model was introduced.
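
The difference from a tree, namely that a child may have more than one parent, can be sketched with two dictionaries of links. This is an in-memory illustration only, and the course and student names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: courses and students linked many-to-many.
parents = defaultdict(set)   # child record -> the records it is linked under
children = defaultdict(set)  # parent record -> the records linked beneath it

def link(parent, child):
    """Record a link; unlike a tree, a child may be linked under several parents."""
    parents[child].add(parent)
    children[parent].add(child)

link("Databases", "Amina")
link("Programming", "Amina")  # Amina now has two parent nodes
link("Databases", "Brice")

amina_courses = sorted(parents["Amina"])
database_students = sorted(children["Databases"])
print(amina_courses, database_students)
```

One student appears under two courses and one course has two students, which is the many-to-many relationship a strict hierarchy cannot express.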

ENTITY-RELATIONSHIP MODEL

In this database model, an object of interest is modelled as an entity and its
characteristics as attributes.

Different entities are related using relationships.

E-R Models are defined to represent the relationships in pictorial form to make it easier
for different stakeholders to understand.

This model is good for designing a database, which can then be turned into tables in the
relational model (explained below).

For example, if we have to design a School Database, then Student will be an entity
with attributes name, age, address etc. As Address is generally complex, it can be another
entity with attributes street name, pincode, city etc., and there will be a relationship between
them.
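
The School Database example can be written down as two entities with attributes and a relationship between them. The sketch uses Python dataclasses purely as a notation; the attribute values are hypothetical, and a real design would go on to turn these entities into tables.

```python
from dataclasses import dataclass

@dataclass
class Address:          # entity
    street_name: str    # attributes
    pincode: str
    city: str

@dataclass
class Student:          # entity
    name: str           # attributes
    age: int
    address: Address    # relationship: a student lives at an address

# Hypothetical sample data.
student = Student("Amina", 19, Address("Main Street", "00237", "Buea"))
print(student.address.city)
```

Treating Address as its own entity keeps its attributes together and lets the relationship be drawn explicitly in an E-R diagram.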

Relationships can also be of different types.

RELATIONAL MODEL

In this model, data is organized in two-dimensional tables and the relationship is
maintained by storing a common field.

This model was introduced by E. F. Codd in 1970, and since then it has become the most widely
used database model.

The basic structure of data in the relational model is the table. All the information related to a
particular type is stored in rows of that table.

Hence, tables are also known as relations in the relational model.

In the coming tutorials we will learn how to design tables, normalize them to reduce data
redundancy and how to use Structured Query language to access data from tables.

DATABASE MODELS

A database model is a theory or specification describing how a database is structured and
used. Several such models have been suggested.

The common models include:

• Network Model - Any links supporting quick access.
• Hierarchical Model - Links but no cycles (hierarchy).
• Relational Model - Data Independence.
• Object Oriented Model - Entity Abstraction.

NETWORK MODEL

The popularity of the network data model coincided with the popularity of the hierarchical
data model. Some data were more naturally modeled with more than one parent per child.
So, the network model permitted the modeling of many-to-many relationships in data. In
1971, the Conference on Data Systems Languages (CODASYL) formally defined the network
model. The basic data modeling construct in the network model is the set construct. A set
consists of an owner record type, a set name, and a member record type. A member record
type can have that role in more than one set, hence the multiparent concept is supported.
An owner record type can also be a member or owner in another set. The data model is a
simple network, and link and intersection record types (called junction records by IDMS)
may exist, as well as sets between them. Thus, the complete network of relationships is
represented by several pairwise sets; in each set some (one) record type is owner (at the
tail of the network arrow) and one or more record types are members (at the head of the
relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The
CODASYL network model is based on mathematical set theory.

HIERARCHICAL MODEL

The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent
and child data segments. This structure implies that a record can have repeating
information, generally in the child data segments. Data is stored as a series of records, each
of which has a set of field values attached to it. The model collects all the instances of a
specific record together as a record type. These record types are the equivalent of tables in
the relational model, with the individual records being the equivalent of rows. To create
links between these record types, the hierarchical model uses parent-child relationships.
These are a 1:N mapping between record types. This is done by using trees, which, like the
set theory used in the relational model, are "borrowed" from mathematics. For example, an
organization might store information about an employee, such as name, employee number,
department and salary. The organization might also store information about an employee's
children, such as name and date of birth. The employee and children data form a hierarchy,
where the employee data represents the parent segment and the children data represents the
child segment. If an employee has three children, then there would be three child segments
associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent
segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's
Information Management System (IMS) DBMS, through the 1970s.
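The employee and children example can be pictured as a small tree in code. The following is only a sketch in Python, not how IMS or any hierarchical DBMS actually stores segments; the class names, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChildSegment:
    # Child segment: belongs to exactly one parent (employee) segment.
    name: str
    date_of_birth: str


@dataclass
class EmployeeSegment:
    # Parent segment: owns zero or more child segments (a 1:N mapping).
    name: str
    employee_number: int
    department: str
    salary: float
    children: List[ChildSegment] = field(default_factory=list)


# Hypothetical sample data: one employee with three children.
employee = EmployeeSegment("Employee A", 1001, "Sales", 50000.0)
for child_name, dob in [("Child 1", "2010-01-15"),
                        ("Child 2", "2012-06-30"),
                        ("Child 3", "2015-11-02")]:
    employee.children.append(ChildSegment(child_name, dob))

# Three child segments hang off the single employee segment.
child_count = len(employee.children)
print(child_count)
```

Because each child object is reachable only through its one parent, the sketch also shows the restriction described above: a child segment cannot have two parents.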

RELATIONAL MODEL

A relational database management system (RDBMS) is a database system based on the relational
model developed by E. F. Codd. A relational database allows the definition of data structures,
storage and retrieval operations and integrity constraints. In such a database the data and
the relations between them are organised in tables. A table is a collection of records, and
each record in a table contains the same fields.

Properties of Relational Tables:


# Values Are Atomic
# Each Row is Unique
# Column Values Are of the Same Kind
# The Sequence of Columns is Insignificant
# The Sequence of Rows is Insignificant
# Each Column Has a Unique Name

Certain fields may be designated as keys, which means that searches for specific values of
that field will use indexing to speed them up. Where fields in two different tables take
values from the same set, a join operation can be performed to select related records in
the two tables by matching values in those fields. Often, but not always, the fields will have
the same name in both tables. For example, an "orders" table might contain (customer-ID,
product-code) pairs and a "products" table might contain (product-code, price) pairs, so to
calculate a given customer's bill you would sum the prices of all products ordered by that
customer by joining on the product-code fields of the two tables. This can be extended to
joining multiple tables on multiple fields. Because these relationships are only specified at
retrieval time, relational databases are classed as dynamic database management systems.
The relational database model is based on relational algebra.
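The customer bill calculation described above can be sketched with Python's built-in sqlite3 module. The table layout follows the text, but the column names, the product codes, the prices (stored in cents to avoid rounding) and the helper customer_bill are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_code TEXT PRIMARY KEY, price_cents INTEGER NOT NULL);
    CREATE TABLE orders   (customer_id INTEGER NOT NULL, product_code TEXT NOT NULL);

    -- Hypothetical sample rows.
    INSERT INTO products VALUES ('P1', 250), ('P2', 400);
    INSERT INTO orders   VALUES (1, 'P1'), (1, 'P2'), (2, 'P1');
""")


def customer_bill(customer_id):
    """Sum the prices of all products ordered by one customer (in cents)."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price_cents), 0)
        FROM orders AS o
        JOIN products AS p ON p.product_code = o.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


print(customer_bill(1))  # customer 1 ordered P1 and P2
print(customer_bill(2))  # customer 2 ordered P1 only
```

The relationship between the two tables exists only in the JOIN condition of the query, which is what the text means by relationships being specified at retrieval time.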

OBJECT-ORIENTED MODEL

Object DBMSs add database functionality to object programming languages. They bring
much more than persistent storage of programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming languages to
provide full-featured database programming capability, while retaining native language
compatibility. A major benefit of this approach is the unification of the application and
database development into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code bases are easier
to maintain. Object developers can write complete database applications with a modest
amount of additional effort.

According to Rao (1994), "The object-oriented database (OODB) paradigm is the combination of
object-oriented programming language (OOPL) systems and persistent systems. The power of the
OODB comes from the seamless treatment of both persistent data, as found in databases, and
transient data, as found in executing programs." In contrast to a relational DBMS, where a
complex data structure must be flattened out to fit into tables or joined together from those
tables to form the in-memory structure, object DBMSs avoid this mapping overhead when
storing or retrieving a web or hierarchy of interrelated objects.

This one-to-one mapping of object programming language objects to database objects has
two benefits over other storage approaches: it provides higher performance management
of objects, and it enables better management of the complex interrelationships between
objects. This makes object DBMSs better suited to support applications such as financial
portfolio risk analysis systems, telecommunications service applications, World Wide Web
document structures, design and manufacturing systems, and hospital patient record
systems, which have complex relationships between data.

RELATIONAL DATA MODEL IN DBMS: CONCEPTS, CONSTRAINTS, EXAMPLE

What is Relational Model?

The relational model (RM) represents the database as a collection of relations. A relation
is nothing but a table of values. Every row in the table represents a collection of related data
values. These rows in the table denote a real-world entity or relationship.

The table name and column names are helpful to interpret the meaning of values in each
row. The data are represented as a set of relations. In the relational model, data are stored
as tables. However, the physical storage of the data is independent of the way the data are
logically organized.

Some popular relational database management systems are:

• DB2 and Informix Dynamic Server - IBM
• Oracle and RDB - Oracle
• SQL Server and Access - Microsoft

In this tutorial, you will learn:

• Relational Model Concepts
• Relational Integrity constraints
• Operations in Relational Model
• Best Practices for creating a Relational Model
• Advantages of using Relational model
• Disadvantages of using Relational model

Relational Model Concepts

1. Attribute: Each column in a table. Attributes are the properties which define a
relation, e.g., Student_Rollno, NAME, etc.
2. Table: In the relational model, relations are saved in table format. Each table is stored
together with the entity data it holds. A table has two properties, rows and columns. Rows
represent records and columns represent attributes.
3. Tuple: A single row of a table, which contains a single record.
4. Relation schema: The name of the relation together with its attributes.
5. Degree: The total number of attributes in the relation.
6. Cardinality: The total number of rows present in the table.
7. Column: The set of values for a specific attribute.
8. Relation instance: A finite set of tuples in the RDBMS. Relation instances never have
duplicate tuples.
9. Relation key: One or more attributes whose values uniquely identify each row of the
relation.
10. Attribute domain: The pre-defined set of values (and scope) that an attribute is
allowed to take.
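Several of these terms can be read directly off a table. The sketch below uses Python's sqlite3 module to compute the degree and cardinality of a small relation; the student table, its column names and its three rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_rollno INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben"), (3, "Chi")],  # hypothetical tuples
)

cursor = conn.execute("SELECT * FROM student")
attributes = [column[0] for column in cursor.description]  # attribute (column) names
tuples = cursor.fetchall()                                  # the relation instance

degree = len(attributes)    # number of attributes in the relation
cardinality = len(tuples)   # number of tuples (rows) in the relation

print(attributes, degree, cardinality)
```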

RELATIONAL INTEGRITY CONSTRAINTS

Relational integrity constraints are conditions which must hold for a relation to be valid.
These integrity constraints are derived from the rules in the mini-world that the database
represents.

There are many types of integrity constraints. Constraints in a relational database
management system are mostly divided into three main categories:

1. Domain constraints
2. Key constraints
3. Referential integrity constraints

Domain Constraints

Domain constraints are violated if an attribute value does not appear in the corresponding
domain or is not of the appropriate data type.

Domain constraints specify that, within each tuple, the value of each attribute must be an
atomic value taken from the domain of that attribute. The domain is specified as a data type;
standard data types include integers, real numbers, characters, Booleans, variable-length
strings, etc.

Example:

CREATE DOMAIN CustomerName AS VARCHAR(100)
CHECK (VALUE IS NOT NULL);

The example shown creates a domain constraint such that CustomerName is not NULL. The
VARCHAR(100) base type is illustrative; any suitable character type could be used.
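A domain constraint can be seen rejecting bad values in a short Python sketch. SQLite, used here through the sqlite3 module, does not support CREATE DOMAIN, so the sketch assumes column-level NOT NULL and CHECK constraints instead; the customer table, the allowed status values and the try_insert helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        status        TEXT NOT NULL CHECK (status IN ('Active', 'Inactive'))
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Google', 'Active')")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


null_name = try_insert((2, None, "Active"))        # NULL is outside the name domain
bad_status = try_insert((3, "Amazon", "Unknown"))  # value outside the status domain

print(null_name)
print(bad_status)
```

Both inserts fail, so the table still holds only the one valid row.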

Key constraints

An attribute that can uniquely identify a tuple in a relation is called the key of the table. The
value of the attribute for different tuples in the relation has to be unique.

Example:

In the given table, CustomerID is a key attribute of the Customer table. Each customer has
exactly one key value; for example, CustomerID = 1 belongs only to CustomerName = "Google".

CustomerID   CustomerName   Status
1            Google         Active
2            Amazon         Active
3            Apple          Inactive
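As a sketch of how a DBMS enforces a key constraint, the Python snippet below loads the three Customer rows into an in-memory SQLite table and then tries to reuse CustomerID 1. The column names and the extra "Duplicate Co" row are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,   -- the key attribute
        customer_name TEXT NOT NULL,
        status        TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Google", "Active"), (2, "Amazon", "Active"), (3, "Apple", "Inactive")],
)

# A second tuple with an existing key value must be rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate Co', 'Active')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_accepted, row_count)
```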

Referential integrity constraints

Referential integrity constraints are based on the concept of foreign keys. A foreign key is
an attribute of a relation that refers to the key attribute of another relation (or of the
same relation). A referential integrity constraint requires that every foreign key value
matches a key value that actually exists in the referenced table.

Example:

In the above example, we have 2 relations, Customer and Billing.

The tuple for CustomerID = 1 is referenced twice in the relation Billing. So we know that
CustomerName = Google has a billing amount of $300.
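The Customer and Billing relationship can be sketched in Python with sqlite3. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The column names, invoice numbers and amounts below are illustrative and are not taken from the original figure; violates_foreign_key is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE billing (
        invoice_no  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        amount      INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Google');
    -- Customer 1 is referenced twice from billing (illustrative amounts).
    INSERT INTO billing VALUES (1, 1, 100), (2, 1, 200);
""")


def violates_foreign_key(statement):
    """Return True if the DBMS rejects the statement on integrity grounds."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True


# A billing row pointing at a customer that does not exist is rejected.
orphan_rejected = violates_foreign_key("INSERT INTO billing VALUES (3, 99, 50)")
# Deleting a customer that billing rows still reference is rejected too.
delete_rejected = violates_foreign_key("DELETE FROM customer WHERE customer_id = 1")

print(orphan_rejected, delete_rejected)
```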

OPERATIONS IN RELATIONAL MODEL

Four basic operations performed on a relational database are insert, update, delete and
select.

• Insert is used to insert data into the relation.
• Delete is used to delete tuples from the table.
• Update (modify) allows you to change the values of some attributes in existing tuples.
• Select allows you to choose a specific range of data.

Whenever one of these operations is applied, the integrity constraints specified on the
relational database schema must never be violated.

Insert Operation

The insert operation gives values of the attribute for a new tuple which should be inserted
into a relation.

Update Operation

In this example, the tuple with CustomerName = 'Apple' in the Customer relation is updated
from Inactive to Active.

Delete Operation

To specify deletion, a condition on the attributes of the relation selects the tuple to be
deleted.

In this example, the tuple with CustomerName = 'Apple' is deleted from the table.

The delete operation could violate referential integrity if the tuple which is deleted is
referenced by foreign keys from other tuples in the same database.

Select Operation

In this example, the tuple with CustomerName = 'Amazon' is selected.
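The four operations can be sketched against the Customer relation using Python's sqlite3 module. The column names are assumptions; the rows and the changes made to them follow the Apple and Amazon examples in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        status        TEXT NOT NULL
    )
""")

# Insert: add new tuples to the relation.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Google", "Active"), (2, "Amazon", "Active"), (3, "Apple", "Inactive")],
)

# Update: change Apple from Inactive to Active.
conn.execute("UPDATE customer SET status = 'Active' WHERE customer_name = 'Apple'")
apple_status = conn.execute(
    "SELECT status FROM customer WHERE customer_name = 'Apple'"
).fetchone()[0]

# Select: choose the tuple for Amazon.
amazon_rows = conn.execute(
    "SELECT customer_id, customer_name, status FROM customer "
    "WHERE customer_name = 'Amazon'"
).fetchall()

# Delete: a condition on an attribute selects the tuple to remove.
conn.execute("DELETE FROM customer WHERE customer_name = 'Apple'")
remaining = [row[0] for row in conn.execute(
    "SELECT customer_name FROM customer ORDER BY customer_id"
)]

print(apple_status, amazon_rows, remaining)
```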

Best Practices for creating a Relational Model

• Data need to be represented as a collection of relations
• Each relation should be depicted clearly in the table
• Rows should contain data about instances of an entity
• Columns must contain data about attributes of the entity
• Cells of the table should hold a single value
• Each column should be given a unique name
• No two rows can be identical
• The values of an attribute should be from the same domain

Advantages of using Relational model

• Simplicity: A relational data model is simpler than the hierarchical and network models.
• Structural independence: The relational database is only concerned with data and not
with the physical structure in which it is stored. This can improve the performance of the
model.
• Easy to use: The relational model is easy to use because tables consisting of rows and
columns are quite natural and simple to understand.
• Query capability: It makes it possible for a high-level query language like SQL to avoid
complex database navigation.
• Data independence: The structure of a database can be changed without having to
change any application.
• Scalable: A database can be enlarged, both in the number of records (rows) and in the
number of fields, to enhance its usability.

Disadvantages of using Relational model

• Some relational databases have limits on field lengths which cannot be exceeded.
• Relational databases can sometimes become complex as the amount of data grows and the
relations between pieces of data become more complicated.
• Complex relational database systems may lead to isolated databases where the
information cannot be shared from one system to another.

Understanding the Relational Database Model

The relational database model was a huge leap forward from the network database model.
Instead of relying on a parent-child or owner-member relationship, the relational model
allows any file to be related to any other by means of a common field. Suddenly, the
complexity of the design was greatly reduced because changes could be made to the
database schema without affecting the system's ability to access data. And because access
was not by means of paths to and from files, but from a direct relationship between files,
new relations between these files could easily be added.

In 1970, when E. F. Codd developed the model, it was thought to be impractical. The
increased ease of use came at a large performance penalty, and the hardware of those days
was not able to implement the model efficiently. Since then, of course, hardware has taken
huge strides, to the point where today even the simplest computers can run sophisticated
relational database management systems.

Relational databases go hand-in-hand with the development of SQL. The simplicity of SQL -
where even a novice can learn to perform basic queries in a short period of time - is a large
part of the reason for the popularity of the relational model.

The two tables below relate to each other through the product_code field. Any two tables
can relate to each other simply by creating a field they have in common.

Table 1

Product_code   Description         Price
A416           Nails, box          $0.14
C923           Drawing pins, box   $0.08

Table 2

Invoice_code   Invoice_line   Product_code   Quantity
3804           1              A416           10
3804           2              C923           15
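The link through product_code can be followed in a short Python sketch using sqlite3. The data is taken from Table 1 and Table 2; the table names product and invoice_lines, the lowercase column names and the choice to store prices in cents are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_code TEXT PRIMARY KEY,
        description  TEXT NOT NULL,
        price_cents  INTEGER NOT NULL
    );
    CREATE TABLE invoice_lines (
        invoice_code INTEGER NOT NULL,
        invoice_line INTEGER NOT NULL,
        product_code TEXT NOT NULL REFERENCES product (product_code),
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (invoice_code, invoice_line)
    );
    INSERT INTO product VALUES ('A416', 'Nails, box', 14), ('C923', 'Drawing pins, box', 8);
    INSERT INTO invoice_lines VALUES (3804, 1, 'A416', 10), (3804, 2, 'C923', 15);
""")

# Join the two tables on the field they have in common.
lines = conn.execute("""
    SELECT l.invoice_line, p.description, l.quantity * p.price_cents
    FROM invoice_lines AS l
    JOIN product AS p ON p.product_code = l.product_code
    WHERE l.invoice_code = 3804
    ORDER BY l.invoice_line
""").fetchall()

invoice_total_cents = sum(line_total for _, _, line_total in lines)
print(lines, invoice_total_cents)
```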

BASIC CONCEPTS OF ER MODEL IN DBMS

As we described in the tutorial on database models, the Entity-Relationship (ER) model is a
model used for the design and representation of relationships between data.

The main data objects are termed entities, and their details are defined as attributes. Some
of these attributes are important and are used to identify the entity, and different entities
are related using relationships.

In short, to understand the ER model, we must understand:

• Entity and Entity Set
• Attributes and Types of Attributes
• Keys
• Relationships

Let's take an example to explain everything. For a School Management Software, we will
have to store Student information, Teacher information, Classes, Subjects taught in each
class etc.

ER MODEL: ENTITY AND ENTITY SET

Considering the above example, Student is an entity, Teacher is an entity, similarly, Class,
Subject etc are also entities.

An Entity is generally a real-world object which has characteristics and holds relationships
in a DBMS.

If a Student is an Entity, then the complete dataset of all the students will be the Entity Set.

ER Model: Attributes

If a Student is an Entity, then student's roll no., student's name, student's age, student's
gender etc will be its attributes.

An attribute can be of many types. Here are the different types of attributes defined in the
ER database model:

1. Simple attribute: An attribute whose values are atomic and cannot be broken down
further. For example, a student's age.
2. Composite attribute: An attribute made up of more than one simple attribute. For
example, a student's address contains house no., street name, pincode, etc.
3. Derived attribute: An attribute which is not stored in the database but is derived from
other attributes. For example, the average age of students in a class.
4. Single-valued attribute: An attribute that holds a single value for each entity.
5. Multi-valued attribute: An attribute that can hold multiple values for a single entity.

ER MODEL: KEYS

If the attribute roll no. can uniquely identify a student entity, amongst all the students, then
the attribute roll no. will be said to be a key.

Following are the types of keys:

1. Super Key
2. Candidate Key
3. Primary Key

Keys are covered in detail in the Database Keys tutorial.

ER Model: Relationships

When an Entity is related to another Entity, they are said to have a relationship. For
example, a Class entity is related to a Student entity because students study in classes;
hence this is a relationship.

Depending upon the number of entities involved, a degree is assigned to relationships.

For example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are
involved, it is said to be Ternary relationship, and so on.

In the next tutorial, we will learn how to create ER diagrams and design databases using ER
diagrams.

Working with ER Diagrams

An ER diagram is a visual representation of data that describes how data items are related to
each other. In the ER model, we break data down into entities and attributes and set up
relationships between entities; all of this can be represented visually using an ER diagram.

For example, in the diagram below, anyone can see and understand what the diagram wants
to convey: a Developer develops a website, whereas a Visitor visits a website.

COMPONENTS OF ER DIAGRAM

Entities, attributes, relationships, etc. form the components of an ER diagram, and there are
defined symbols and shapes to represent each one of them.

Let's see how we can represent these in our ER Diagram.

Entity

A simple rectangular box represents an entity.

Relationships between Entities - Weak and Strong

A rhombus is used to set up relationships between two or more entities.

Attributes for any Entity

An ellipse is used to represent the attributes of any entity. It is connected to the entity.

Weak Entity

A weak entity is represented using a double rectangular box. It is generally connected to
another entity.

Key Attribute for any Entity

To represent a Key attribute, the attribute name inside the Ellipse is underlined.

Derived Attribute for any Entity

Derived attributes are those which are derived based on other attributes, for example, age
can be derived from date of birth.

To represent a derived attribute, another dotted ellipse is created inside the main ellipse.
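A derived attribute is computed when needed rather than stored. As a minimal Python sketch of the age example, the hypothetical function age_on derives an age in whole years from a stored date of birth and a reference date; the function name and sample dates are assumptions.

```python
from datetime import date


def age_on(date_of_birth, today):
    """Derive age in completed years from the stored date of birth."""
    # Has this year's birthday already happened on the reference date?
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


# Hypothetical student born on 15 June 2005.
dob = date(2005, 6, 15)
age_before_birthday = age_on(dob, date(2024, 6, 14))
age_on_birthday = age_on(dob, date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)
```

Storing only the date of birth keeps the data consistent, because a stored age would become wrong as time passes.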

Multivalued Attribute for any Entity

Double Ellipse, one inside another, represents the attribute which can have multiple values.

Composite Attribute for any Entity

A composite attribute is an attribute that is itself made up of other attributes.

ER Diagram: Entity

An entity can be any object, place, person or class. In an ER diagram, an entity is represented
using a rectangle. Consider the example of an organisation: Employee, Manager, Department,
Product and many more can be taken as entities in an organisation.

The yellow rhombus in between represents a relationship.

ER Diagram: Weak Entity

A weak entity is an entity that depends on another entity. A weak entity doesn't have any key
attribute of its own. A double rectangle is used to represent a weak entity.

ER Diagram: Attribute

An attribute describes a property or characteristic of an entity. For example, Name, Age,
Address, etc. can be attributes of a Student. An attribute is represented using an ellipse.

ER Diagram: Key Attribute

A key attribute represents the main characteristic of an entity. It is used to represent a
primary key. An ellipse with the text underlined represents a key attribute.

ER Diagram: Composite Attribute

An attribute can also have its own attributes. Such attributes are known as composite
attributes.

ER Diagram: Relationship

A relationship describes a relation between entities. A relationship is represented using a
diamond (rhombus).

There are three types of relationship that exist between entities.

1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship

ER Diagram: Binary Relationship

A binary relationship is a relation between two entities. It is further divided into four
types.

One to One Relationship

This type of relationship is rarely seen in the real world.

The above example describes that one student can enroll for only one course and a course
will also have only one student. This is not what you will usually see in real-world
relationships.

One to Many Relationship

The example below showcases this relationship, which means that one student can opt for
many courses, but a course can have only one student. This may sound unusual, but it is what
the one-to-many relationship in the diagram states.

Many to One Relationship

It reflects a business rule that many entities can be associated with just one entity. For
example, a Student enrolls for only one Course, but a Course can have many Students.

Many to Many Relationship

The above diagram represents that one student can enroll for more than one course, and
a course can have more than one student enrolled in it.
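When a many-to-many relationship is later turned into tables, one common approach is a third table that pairs the keys of the two entities. The Python sketch below assumes that approach; the enrollment table, all column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- One row per (student, course) pair represents the relationship itself.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        course_id  INTEGER NOT NULL REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Student A'), (2, 'Student B');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_of_student_1 = [row[0] for row in conn.execute("""
    SELECT c.title FROM enrollment AS e
    JOIN course AS c ON c.course_id = e.course_id
    WHERE e.student_id = 1 ORDER BY c.course_id
""")]

# One course, many students.
students_in_course_10 = [row[0] for row in conn.execute("""
    SELECT s.name FROM enrollment AS e
    JOIN student AS s ON s.student_id = e.student_id
    WHERE e.course_id = 10 ORDER BY s.student_id
""")]

print(courses_of_student_1, students_in_course_10)
```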

ER Diagram: Recursive Relationship

When an Entity is related with itself it is known as Recursive Relationship.

ER Diagram: Ternary Relationship

Relationship of degree three is called Ternary relationship.

A ternary relationship involves three entities. In such relationships we always consider two
entities together and then look at the third.

For example, in the diagram above, we have three related entities: Company, Product and
Sector. To understand the relationship better or to define rules around the model, we
should relate two entities and then derive the third one.

A Company produces many Products / each Product is produced by exactly one Company.

A Company operates in only one Sector / each Sector has many Companies operating in it.

Considering the above two rules or relationships, we see that although the complete
relationship involves three entities, we are looking at two entities at a time.

Normalization of Database

Database normalization is a technique for organizing the data in a database. Normalization
is a systematic approach of decomposing tables to eliminate data redundancy (repetition)
and undesirable characteristics like insertion, update and deletion anomalies. It is a
multi-step process that puts data into tabular form, removing duplicated data from the
relation tables.

Normalization is used mainly for two purposes:

• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.


Problems Without Normalization

If a table is not properly normalized and has data redundancy, then it will not only eat up
extra storage space but will also make it difficult to handle and update the database
without facing data loss. Insertion, update and deletion anomalies are very frequent if a
database is not normalized. To understand these anomalies let us take the example of a
Student table.

rollno   name   branch   hod     office_tel
401      Akon   CSE      Mr. X   53337
402      Bkon   CSE      Mr. X   53337
403      Ckon   CSE      Mr. X   53337
404      Dkon   CSE      Mr. X   53337

In the table above, we have data for 4 Computer Science students. As we can see, data for
the fields branch, hod (Head of Department) and office_tel is repeated for the students who
are in the same branch in the college; this is data redundancy.

Insertion Anomaly

Suppose for a new admission, until and unless a student opts for a branch, data of the
student cannot be inserted, or else we will have to set the branch information as NULL.

Also, if we have to insert data of 100 students of same branch, then the branch information
will be repeated for all those 100 students.

These scenarios are nothing but Insertion anomalies.

Update Anomaly

What if Mr. X leaves the college, or is no longer the HOD of the computer science department?
In that case all the student records will have to be updated, and if by mistake we miss any
record, it will lead to data inconsistency. This is an update anomaly.

Deletion Anomaly

In our Student table, two different kinds of information are kept together: student
information and branch information. Hence, at the end of the academic year, if student
records are deleted, we will also lose the branch information. This is a deletion anomaly.
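The update and deletion anomalies can be reproduced in a small Python sketch using sqlite3. It compares the single Student table with one possible decomposition into student and branch tables. The table names student_flat, student and branch, the replacement head of department 'Mr. Y' and the decomposition itself are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The unnormalized table from the text.
    CREATE TABLE student_flat (
        rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT, hod TEXT, office_tel TEXT
    );
    INSERT INTO student_flat VALUES
        (401, 'Akon', 'CSE', 'Mr. X', '53337'),
        (402, 'Bkon', 'CSE', 'Mr. X', '53337'),
        (403, 'Ckon', 'CSE', 'Mr. X', '53337'),
        (404, 'Dkon', 'CSE', 'Mr. X', '53337');

    -- An assumed decomposition: branch facts are stored once.
    CREATE TABLE branch (branch TEXT PRIMARY KEY, hod TEXT, office_tel TEXT);
    CREATE TABLE student (
        rollno INTEGER PRIMARY KEY, name TEXT, branch TEXT REFERENCES branch (branch)
    );
    INSERT INTO branch VALUES ('CSE', 'Mr. X', '53337');
    INSERT INTO student SELECT rollno, name, branch FROM student_flat;
""")

# Update anomaly: the change of HOD misses one record by mistake.
conn.execute("UPDATE student_flat SET hod = 'Mr. Y' WHERE rollno != 404")
hods_flat = {row[0] for row in conn.execute("SELECT DISTINCT hod FROM student_flat")}

# In the decomposed design the same change touches exactly one row.
conn.execute("UPDATE branch SET hod = 'Mr. Y' WHERE branch = 'CSE'")
hods_normalized = {row[0] for row in conn.execute(
    "SELECT DISTINCT b.hod FROM student AS s JOIN branch AS b ON b.branch = s.branch"
)}

# Deletion anomaly avoided: removing all students keeps the branch facts.
conn.execute("DELETE FROM student")
branch_rows_left = conn.execute("SELECT COUNT(*) FROM branch").fetchone()[0]

print(hods_flat, hods_normalized, branch_rows_left)
```

The flat table ends up with two different heads of department for the same branch, while the decomposed tables stay consistent.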

NORMALIZATION RULE

Normalization rules are divided into the following normal forms:

1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form

First Normal Form (1NF)

For a table to be in the First Normal Form, it should follow the following 4 rules:

1. It should only have single (atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored does not matter.

The First Normal Form is discussed in detail in its own tutorial.

Second Normal Form (2NF)

For a table to be in the Second Normal Form,

1. It should be in the First Normal Form.
2. It should not have any partial dependency.

Partial dependency, and how to normalize a table to the Second Normal Form, are explained in
the Second Normal Form tutorial.

Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,

1. It is in the Second Normal Form.
2. It doesn't have any transitive dependency.

The Third Normal Form is covered in its own tutorial. We suggest that you first study the
Second Normal Form and then head over to the Third Normal Form.

Boyce and Codd Normal Form (BCNF)

Boyce-Codd Normal Form is a stricter version of the Third Normal Form. This form deals
with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is guaranteed to be in BCNF. For a table R to be in BCNF,
the following conditions must be satisfied:

• R must be in the Third Normal Form
• and, for each non-trivial functional dependency ( X → Y ), X should be a super key.

BCNF is covered in detail, with an easy-to-understand example, in the Boyce-Codd Normal
Form tutorial.

Fourth Normal Form (4NF)

A table is said to be in the Fourth Normal Form when,

1. It is in the Boyce-Codd Normal Form.
2. It doesn't have any multi-valued dependency.

The Fourth Normal Form is covered in its own tutorial. We suggest that you understand the
other normal forms before you head over to the Fourth Normal Form.

What is First Normal Form (1NF)?

In this tutorial we will learn about the First Normal Form (1NF), which is Step 1 of the
normalization process. The First Normal Form expects you to design your table in such a way
that it can easily be extended and it is easier for you to retrieve data from it whenever
required.

In our last tutorial we learned how data redundancy or repetition can lead to several issues
like insertion, deletion and update anomalies, and how normalization can reduce data
redundancy and make the data more meaningful.

If the tables in a database are not even in the First Normal Form, it is considered bad
database design.

Rules for First Normal Form

The first normal form expects you to follow a few simple rules while designing your
database, and they are:

Rule 1: Single Valued Attributes

Each column of your table should be single valued, which means it should not contain
multiple values. We will explain this with the help of an example later; let's see the other
rules for now.

Rule 2: Attribute Domain should not change

This is more of a "Common Sense" rule. In each column the values stored must be of the
same kind or type.

For example: If you have a column dob to save the dates of birth of a set of people, then you
must not save the names of some of them in that column along with the dates of birth of
others. It should hold only the date of birth for all the records/rows.

Rule 3: Unique name for Attributes/Columns

This rule expects that each column in a table should have a unique name. This is to avoid
confusion at the time of retrieving data or performing any other operation on the stored
data.

If two or more columns have the same name, the DBMS cannot tell them apart.

Rule 4: Order doesn't matter

This rule says that the order in which you store the data in your table doesn't matter.

Time for an Example

Although all the rules are self explanatory still let's take an example where we will create a
table to store student data which will have student's roll no., their name and the name of
subjects they have opted for.

Here is our table, with some sample data added to it.

roll_no   name   subject
101       Akon   OS, CN
103       Ckon   Java
102       Bkon   C, C++

Our table already satisfies 3 of the 4 rules: all our column names are unique, we have stored
data in the order we wanted to, and we have not intermixed different types of data in
columns.

But out of the 3 different students in our table, 2 have opted for more than 1 subject, and
we have stored the subject names in a single column. As per the First Normal Form, each
column must contain atomic values.

How to solve this Problem?

It's very simple, because all we have to do is break the values into atomic values.

Here is our updated table and it now satisfies the First Normal Form.

roll_no   name   subject
101       Akon   OS
101       Akon   CN
103       Ckon   Java
102       Bkon   C
102       Bkon   C++

By doing so, although a few values are repeated, the values in the subject column are now
atomic for each record/row.

Using the First Normal Form, data redundancy increases, as there will be many columns
with the same data in multiple rows, but each row as a whole will be unique.
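The step of breaking multi-valued cells into atomic values can be sketched in a few lines of Python. The sample rows come from the table in the text; the function name to_first_normal_form and the assumption that subjects are separated by commas are illustrative.

```python
# Rows as (roll_no, name, subject); the subject cell may hold several values.
unnormalized = [
    (101, "Akon", "OS, CN"),
    (103, "Ckon", "Java"),
    (102, "Bkon", "C, C++"),
]


def to_first_normal_form(rows):
    """Emit one row per subject so that every subject value is atomic."""
    atomic_rows = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic_rows.append((roll_no, name, subject.strip()))
    return atomic_rows


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

The roll_no and name values repeat across the new rows, but no two complete rows are the same.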

What is Second Normal Form?

For a table to be in the Second Normal Form, it must satisfy two conditions:

1. The table should be in the First Normal Form.
2. There should be no partial dependency.

What is partial dependency? Before answering that, let's first understand what dependency
in a table is.

What is Dependency?

Let's take an example of a Student table with columns student_id, name, reg_no(registration
number), branch and address(student's home address).

student_id   name   reg_no   branch   address

In this table, student_id is the primary key and will be unique for every row, hence we can
use student_id to fetch any row of data from this table.

Even in a case where student names are the same, if we know the student_id we can easily
fetch the correct record.

student_id   name   reg_no   branch   address
10           Akon   07-WY    CSE      Kerala
11           Akon   08-WY    IT       Gujarat

Hence we can say that a primary key for a table is the column or group of columns (composite
key) which can uniquely identify each record in the table.

I can ask for the branch name of the student with student_id 10, and I can get it. Similarly, if
I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is
student_id, and every other column depends on it, or can be fetched using it.

This is dependency, and we also call it functional dependency.
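One way to make the idea concrete is to check whether a functional dependency holds in some sample rows. The Python sketch below uses the two student rows from the text; the function name holds is hypothetical. Sample data can only disprove a dependency, since a dependency is really a rule about every row the table may ever contain.

```python
def holds(rows, determinant, dependent):
    """Check whether determinant -> dependent holds in the given rows.

    It holds when rows that agree on the determinant columns also agree
    on the dependent columns.
    """
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = tuple(row[column] for column in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"student_id": 10, "name": "Akon", "reg_no": "07-WY", "branch": "CSE", "address": "Kerala"},
    {"student_id": 11, "name": "Akon", "reg_no": "08-WY", "branch": "IT", "address": "Gujarat"},
]

# Every other column depends on student_id.
id_determines_rest = holds(students, ["student_id"], ["name", "reg_no", "branch", "address"])
# The name does not determine the branch: both students are called Akon.
name_determines_branch = holds(students, ["name"], ["branch"])

print(id_determines_rest, name_determines_branch)
```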

What is Partial Dependency?

Now that we know what dependency is, we are in a better state to understand what partial
dependency is.

For a simple table like Student, a single column like student_id can uniquely identify all the
records in the table.

But this is not true all the time. So now let's extend our example to see if more than 1 column
together can act as a primary key.

Let's create another table for Subject, which will have subject_id and subject_name fields
and subject_id will be the primary key.

subject_id subject_name

1 Java

2 C++

3 Php

Now we have a Student table with student information and another table Subject for storing
subject information.

Let's create another table Score, to store the marks obtained by students in the respective
subjects. We will also be saving name of the teacher who teaches that subject along with
marks.

score_id student_id subject_id marks teacher

1 10 1 70 Java Teacher

2 10 2 75 C++ Teacher

3 11 1 80 Java Teacher

In the Score table we save the student_id to know which student the marks belong to, and the
subject_id to know which subject the marks are for.

Together, student_id + subject_id form a Candidate Key for this table, which can be used as
the Primary Key.

Confused about how this combination can be a primary key?

If I ask you for the marks of the student with student_id 10, can you get them from this table?
No, because you don't know for which subject. And if I give you only a subject_id, you would
not know for which student. Hence we need student_id + subject_id to uniquely identify any
row.

But where is Partial Dependency?

Now if you look at the Score table, we have a column named teacher which depends only on
the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.

As we just discussed, the primary key for this table is a composite of two columns, student_id
and subject_id, but the teacher's name depends only on the subject, hence on subject_id, and
has nothing to do with student_id.

This is Partial Dependency, where an attribute in a table depends on only a part of the
primary key and not on the whole key.

How to remove Partial Dependency?

There can be many different solutions for this, but our objective is to remove the teacher's
name from the Score table.

The simplest solution is to remove the column teacher from the Score table and add it to the
Subject table. Hence, the Subject table will become:

subject_id subject_name teacher

1 Java Java Teacher

2 C++ C++ Teacher

3 Php Php Teacher

And our Score table is now in the second normal form, with no partial dependency.

score_id student_id subject_id marks

1 10 1 70

2 10 2 75

3 11 1 80
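The decomposition can be tried out with Python's built-in sqlite3 module. This is only a sketch: the SQLite column types and the REFERENCES clause are assumptions added for illustration, while the rows are the ones shown in the tables. The point is that no information is lost, because a join on subject_id still gives the teacher for every score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE Subject (
    subject_id   INTEGER PRIMARY KEY,
    subject_name TEXT,
    teacher      TEXT
);
CREATE TABLE Score (
    score_id   INTEGER PRIMARY KEY,
    student_id INTEGER,
    subject_id INTEGER REFERENCES Subject(subject_id),
    marks      INTEGER
);
INSERT INTO Subject VALUES (1, 'Java', 'Java Teacher'),
                           (2, 'C++', 'C++ Teacher'),
                           (3, 'Php', 'Php Teacher');
INSERT INTO Score VALUES (1, 10, 1, 70), (2, 10, 2, 75), (3, 11, 1, 80);
""")

# The teacher is stored once per subject and recovered through the join
rows = conn.execute("""
    SELECT sc.score_id, sc.student_id, sc.marks, su.teacher
    FROM Score sc JOIN Subject su ON su.subject_id = sc.subject_id
    ORDER BY sc.score_id
""").fetchall()
```

Each teacher name now lives in exactly one row of Subject, so changing the Java teacher means updating one row instead of every Java score.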

Quick Recap

1. For a table to be in the Second Normal form, it should be in the First Normal form and
it should not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the
table depends only on a part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is
causing partial dependency, and move it to some other table where it fits in well.

Third Normal Form (3NF)

Third Normal Form is an upgrade to Second Normal Form. When a table is in the Second
Normal Form and has no transitive dependency, then it is in the Third Normal Form.

In the previous section, we learned about the Second Normal Form and normalized our
Score table into it.

So let's use the same example, where we have 3 tables: Student, Subject and Score.

Student Table
student_id name reg_no branch address

10 Akon 07-WY CSE Kerala

11 Akon 08-WY IT Gujarat

12 Bkon 09-WY IT Rajasthan

Subject Table
subject_id subject_name teacher

1 Java Java Teacher

2 C++ C++ Teacher

3 Php Php Teacher

Score Table
score_id student_id subject_id marks

1 10 1 70

2 10 2 75

3 11 1 80

In the Score table, we need to store some more information, which is the exam name and
total marks, so let's add 2 more columns to the Score table.

score_id student_id subject_id marks exam_name total_marks

Requirements for Third Normal Form

For a table to be in the Third Normal Form:

1. It should be in the Second Normal Form.
2. It should not have Transitive Dependency.

What is Transitive Dependency?

With exam_name and total_marks added to our Score table, it saves more data now. Primary
key for our Score table is a composite key, which means it's made up of two attributes or
columns → student_id + subject_id.

Our new column exam_name depends on both student and subject. For example, a
mechanical engineering student will have a Workshop exam but a computer science student
won't. And some subjects have Practical exams while others don't. So we can say
that exam_name is dependent on both student_id and subject_id.

And what about our second new column total_marks? Does it depend on our Score table's
primary key?

Well, the column total_marks depends on exam_name, because the total score changes with
the exam type. For example, practicals carry fewer marks while theory exams carry more.

But exam_name is just another column in the Score table. It is not the primary key or even a
part of the primary key, and total_marks depends on it.

This is Transitive Dependency: a non-prime attribute depends on other non-prime
attributes rather than on the prime attributes or primary key.

How to remove Transitive Dependency?

Again the solution is very simple. Take out the columns exam_name and total_marks from
Score table and put them in an Exam table and use the exam_id wherever required.

Score Table: In 3rd Normal Form


score_id student_id subject_id marks exam_id

The new Exam table
exam_id exam_name total_marks

1 Workshop 200

2 Mains 70

3 Practicals 30

Advantage of removing Transitive Dependency

The advantages of removing transitive dependency are:

• The amount of data duplication is reduced.
• Data integrity is achieved.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and is also
known as 3.5 Normal Form.

In the previous section, we learned about the Third Normal Form and how to remove
transitive dependency from a table. It is worth reviewing that section before this one.

Rules for BCNF

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:

1. It should be in the Third Normal Form.
2. For every non-trivial dependency A → B, A should be a super key.

The second point sounds a bit tricky, right? In simple words, it means that in a dependency
A → B, A cannot be a non-prime attribute if B is a prime attribute.

Time for an Example

Below we have a college enrolment table with columns student_id, subject and professor.

student_id subject professor

101 Java P.Java

101 C++ P.Cpp

102 Java P.Java2

103 C# P.Chash

104 Java P.Java

As you can see, we have also added some sample data to the table.

In the table above:

• One student can enrol for multiple subjects. For example, the student with student_id
101 has opted for the subjects Java and C++.
• For each subject, a professor is assigned to the student.
• There can be multiple professors teaching one subject, as we have for Java.

What do you think should be the Primary Key?

Well, in the table above, student_id and subject together form the primary key, because
using student_id and subject we can find all the columns of the table.

One more important point to note here is that one professor teaches only one subject, but
one subject may have two different professors.

Hence, there is a dependency between subject and professor here, where subject depends
on the professor name.

This table satisfies the 1st Normal Form because all the values are atomic, column names
are unique and all the values stored in a particular column are of the same domain.

This table also satisfies the 2nd Normal Form as there is no Partial Dependency.

And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.

But this table is not in Boyce-Codd Normal Form.

Why is this table not in BCNF?

In the table above, student_id and subject form the primary key, which means the subject
column is a prime attribute.

But there is one more dependency, professor → subject.

And while subject is a prime attribute, professor is a non-prime attribute, which is not
allowed by BCNF.

How to satisfy BCNF?

To make this relation(table) satisfy BCNF, we will decompose this table into two tables,
student table and professor table.

Below we have the structure for both the tables.

Student Table

student_id p_id

101 1

101 2

and so on...

And, Professor Table

p_id professor subject

1 P.Java Java

2 P.Cpp C++

and so on...
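The BCNF violation and its fix can be sketched in plain Python. The variable names below are assumptions chosen for illustration, and the p_id values are simply assigned in order of first appearance, which happens to match the two rows shown above. The sketch first confirms that professor → subject holds on the sample rows while professor alone is not a key, and then builds the two decomposed tables.

```python
# Sample rows of the enrolment table: (student_id, subject, professor)
enrolment = [
    (101, "Java", "P.Java"),
    (101, "C++", "P.Cpp"),
    (102, "Java", "P.Java2"),
    (103, "C#", "P.Chash"),
    (104, "Java", "P.Java"),
]

# professor -> subject holds if every professor maps to a single subject
subject_of = {}
fd_holds = True
for _student_id, subject, professor in enrolment:
    if subject_of.setdefault(professor, subject) != subject:
        fd_holds = False

# professor is not a super key: the same professor appears in several rows
professors = [professor for _, _, professor in enrolment]
professor_is_key = len(set(professors)) == len(professors)

# Decompose: one row per professor, and student rows that point to p_id
p_id_of = {professor: i for i, professor in enumerate(subject_of, start=1)}
professor_table = [(p_id_of[p], p, s) for p, s in subject_of.items()]
student_table = [(sid, p_id_of[p]) for sid, _subject, p in enrolment]
```

In the Professor table, p_id is the key and it determines both professor and subject, so the dependency that caused the problem now has a key on its left-hand side.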

And now, this relation satisfies Boyce-Codd Normal Form.

INTRODUCTION TO SQL(STRUCTURED QUERY LANGUAGE)

Structured Query Language (SQL) is a database query language used for storing and
managing data in a relational DBMS. SQL was the first commercial language introduced for
E.F. Codd's relational model of databases. Today almost all RDBMS (MySQL, Oracle, Informix,
Sybase, MS Access) use SQL as the standard database query language. SQL is used to
perform all types of data operations in an RDBMS.

SQL Command

SQL defines following ways to manipulate data stored in an RDBMS.

DDL: Data Definition Language

This includes changes to the structure of the table like creation of table, altering table,
deleting a table etc.

In many RDBMS (for example Oracle and MySQL), DDL commands are auto-committed. That
means their changes are saved permanently in the database as soon as the command runs.

Command     Description

create      to create a new table or database
alter       for alteration
truncate    to delete all data from a table
drop        to drop a table
rename      to rename a table

DML: Data Manipulation Language

DML commands are used for manipulating the data stored in the table and not the table
itself.

DML commands are not auto-committed. It means their changes are not permanent in the
database until they are committed, and until then they can be rolled back.

Command     Description

insert      to insert a new row
update      to update an existing row
delete      to delete a row
merge       to merge two rows or two tables

TCL: Transaction Control Language

These commands keep a check on other commands and their effect on the database.
They can annul changes made by other commands by rolling the data back to
its original state. They can also make any temporary change permanent.

Command     Description

commit      to permanently save
rollback    to undo change
savepoint   to save temporarily

DCL: Data Control Language

Data control language consists of the commands used to grant and take back authority from
any database user.

Command     Description

grant       to grant a permission or right
revoke      to take back a permission

DQL: Data Query Language

Data query language is used to fetch data from tables based on conditions that we can easily
apply.

Command Description

select retrieve records from one or more table

SQL: create command

create is a DDL SQL command used to create a table or a database in relational database
management system.

CREATING A DATABASE

To create a database in RDBMS, create command is used. Following is the syntax,

CREATE DATABASE <DB_NAME>;

Example for creating Database


CREATE DATABASE Test;

The above command will create a database named Test, which will be an empty schema
without any table.

To create tables in this newly created database, we can again use the create command.

CREATING A TABLE

create command can also be used to create tables. Now when we create a table, we have to
specify the details of the columns of the tables too. We can specify the names and datatypes
of various columns in the create command itself.

Following is the syntax,

CREATE TABLE <TABLE_NAME>
(
column_name1 datatype1,
column_name2 datatype2,
column_name3 datatype3,
column_name4 datatype4
);

create table command will tell the database system to create a new table with the given
table name and column information.

Example for creating Table


CREATE TABLE Student(
student_id INT,
name VARCHAR(100),
age INT);

The above command will create a new table named Student in the current database with
3 columns, namely student_id, name and age. The column student_id will only store
integers, name will hold up to 100 characters and age will again store only integer values.
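To see the statement run, it can be executed against a throwaway database with Python's built-in sqlite3 module. This is a sketch under one assumption: SQLite stands in for whichever RDBMS is used in class, and PRAGMA table_info is a SQLite-specific way to list the columns a table ended up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute("""
    CREATE TABLE Student(
        student_id INT,
        name VARCHAR(100),
        age INT)
""")

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Student)")]
```

The list columns holds the name and declared datatype of each of the three columns, in the order they were written in the CREATE TABLE statement.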

If the database in which you want to create the table is not the one currently selected, then
you can also add the database name along with the table name, using the dot operator (.).

For example, if we have a database with name Test and we want to create a table Student in
it, then we can do so using the following query:

CREATE TABLE Test.Student(
student_id INT,
name VARCHAR(100),
age INT);

Most commonly used datatypes for Table columns

Here we have listed some of the most commonly used datatypes used for columns in tables.

Datatype   Use

INT        used for columns which will store integer values.

FLOAT      used for columns which will store floating-point values.

DOUBLE     used for columns which will store double-precision floating-point values.

VARCHAR    used for columns which will store variable-length strings (letters, digits and
           other characters) up to a given maximum length.

CHAR       used for columns which will store fixed-length strings; CHAR without a length
           stores a single character.

DATE       used for columns which will store date values.

TEXT       used for columns which will store text which is generally long in length. For
           example, if you create a table for storing profile information of a social
           networking website, then for the about me section you can have a column of type
           TEXT.

DROP COMMAND

DROP command completely removes a table from the database. This command will also
destroy the table structure and the data stored in it. Following is its syntax,

DROP TABLE table_name

Here is an example explaining it,

DROP TABLE student;

The above query will delete the Student table completely. It can also be used on Databases,
to delete the complete database. For example, to drop a database,

DROP DATABASE Test;

The above query will drop the database with name Test from the system.

RENAME query

RENAME command is used to set a new name for any existing table. Following is the syntax,

RENAME TABLE old_table_name to new_table_name

Here is an example explaining it.

RENAME TABLE student to students_info;

The above query will rename the table student to students_info.

USING INSERT SQL COMMAND

Data Manipulation Language (DML) statements are used for managing data in database.
DML commands are not auto-committed. It means changes made by DML command are not
permanent to database, it can be rolled back.

Talking about the Insert command, whenever we post a Tweet on Twitter, the text is stored
in some table, and as we post a new tweet, a new record gets inserted in that table.

INSERT command

Insert command is used to insert data into a table. Following is its general syntax,

INSERT INTO table_name VALUES(data1, data2, ...)

Lets see an example,

Consider a table student with the following fields.

s_id name age

INSERT INTO student VALUES(101, 'Adam', 15);

The above command will insert a new record into student table.

s_id name age

101 Adam 15

Insert value into only specific columns

We can use the INSERT command to insert values for only some specific columns of a row.
We can specify the column names along with the values to be inserted like this,

INSERT INTO student(s_id, name) VALUES(102, 'Alex');

The above SQL query will only insert s_id and name values in the newly inserted record.

Insert NULL value to a column

Both the statements below will insert a NULL value into the age column of the student table
(assuming the age column has no default value).

INSERT INTO student(s_id, name) VALUES(102, 'Alex');

Or,

INSERT INTO student VALUES(102, 'Alex', null);

Either command inserts only two column values, and the remaining column is set to null.

s_id name age

101 Adam 15

102 Alex

Insert Default value to a column

Suppose the column age in our table has a default value of 14.

INSERT INTO student VALUES(103, 'Chris', default);

s_id name age

101 Adam 15

102 Alex

103 Chris 14

Also, if you leave the age column out of the column list, as in the query below, the default
value will be inserted into the age column, whatever the default value may be.

INSERT INTO student(s_id, name) VALUES(103, 'Chris');
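The three kinds of insert can be tried together with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for the course RDBMS, and the student table is declared here with a default of 14 on age from the start, so row 102 uses an explicit NULL and row 103 uses the column-list form to pick up the default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT DEFAULT 14)")

# All columns supplied
conn.execute("INSERT INTO student VALUES (101, 'Adam', 15)")
# Explicit NULL for age
conn.execute("INSERT INTO student VALUES (102, 'Alex', NULL)")
# age left out of the column list, so its default value is used
conn.execute("INSERT INTO student (s_id, name) VALUES (103, 'Chris')")

rows = conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()
```

In Python, a NULL comes back as None, so the age of student 102 is None while student 103 gets the default of 14.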

Using UPDATE SQL command

Let's take an example of a real-world problem. These days, Facebook provides an option for
Editing your status update, how do you think it works? Yes, using the Update SQL command.

Let's learn about the syntax and usage of the UPDATE command.

UPDATE COMMAND

UPDATE command is used to update any record of data in a table. Following is its general
syntax,

UPDATE table_name SET column_name = new_value WHERE some_condition;


WHERE is used to add a condition to an SQL query; we will study it in detail later.

Let's take a sample table student,

s_id name age

101 Adam 15

102 Alex

103 Chris 14

UPDATE student SET age=18 WHERE s_id=102;

s_id name age

101 Adam 15

102 Alex 18

103 Chris 14

In the above statement, if we do not use the WHERE clause, then our update query will
update age for all the rows of the table to 18.

Updating Multiple Columns

We can also update values of multiple columns using a single UPDATE statement.

UPDATE student SET name='Abhi', age=17 WHERE s_id=103;

The above command will update two columns of the record which has s_id 103.

s_id name age

101 Adam 15

102 Alex 18

103 Abhi 17

UPDATE Command: Incrementing Integer Value

When we have to update a numeric value in a table, we can read and update the value
in a single statement.

For example, if we have to increment the age column of the student table every year for
every student, then we can simply run the following UPDATE statement:

UPDATE student SET age = age+1;

As you can see, we have used age = age + 1 to increment the value of age by 1.

NOTE: This style works for numeric columns (integer or decimal values).
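The UPDATE statements of this section can be replayed with Python's built-in sqlite3 module. This is a sketch, assuming SQLite as a stand-in RDBMS and the three-row student table from the examples. The first UPDATE passes its value through a ? placeholder, which is how sqlite3 binds parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", None), (103, "Chris", 14)],
)

def snapshot():
    """Return all rows ordered by s_id."""
    return conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()

# Update one column of one row, then two columns of another row
conn.execute("UPDATE student SET age = 18 WHERE s_id = ?", (102,))
conn.execute("UPDATE student SET name = 'Abhi', age = 17 WHERE s_id = 103")
after_targeted = snapshot()

# No WHERE clause: every row is incremented
conn.execute("UPDATE student SET age = age + 1")
after_increment = snapshot()
```

After the last statement every student is one year older, which shows why leaving out the WHERE clause should always be a deliberate choice.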

Using DELETE SQL command

When you post a question on an online forum, it gets saved into a table. And using the
Delete option, you can even delete a question asked by you. How do you think that works?
Yes, using the Delete DML command.

Let's study the syntax and the usage of the Delete command.

DELETE command

DELETE command is used to delete data from a table.

Following is its general syntax,

DELETE FROM table_name;

Let's take a sample table student:

s_id name age

101 Adam 15

102 Alex 18

103 Abhi 17

Delete all Records from a Table


DELETE FROM student;

The above command will delete all the records from the table student.

Delete a particular Record from a Table

In our student table if we want to delete a single record, we can use the WHERE clause to
provide a condition in our DELETE statement.

DELETE FROM student WHERE s_id=103;

The above command will delete the record where s_id is 103 from the table student.

s_id name age

101 Adam 15

102 Alex 18
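Both forms of DELETE can be tried with Python's built-in sqlite3 module. This is a sketch, assuming SQLite as a stand-in RDBMS; the rowcount attribute of the cursor reports how many rows the conditional statement removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)],
)

# Delete a single record using a WHERE condition
deleted = conn.execute("DELETE FROM student WHERE s_id = 103").rowcount
remaining = conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()

# Delete all records: the table itself still exists, but it is empty
conn.execute("DELETE FROM student")
count_after = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```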

Isn't DELETE same as TRUNCATE?

The TRUNCATE command is different from the DELETE command. DELETE removes rows from
a table (all of them, or only those matching a WHERE condition), whereas TRUNCATE not
only deletes all the records stored in the table, but also re-initializes the table (like a
newly created table).

For example, if you have a table with 10 rows and an auto_increment primary key, and you use
the DELETE command to delete all the rows, it will delete all the rows but will not
re-initialize the primary key. Hence if you insert any row after using the DELETE command,
the auto_increment primary key will start from 11. But in the case of the TRUNCATE command,
the primary key is re-initialized, and it will again start from 1.

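COMMIT command

Changes made by DML commands like INSERT, UPDATE or DELETE are not permanent and can
be rolled back. To avoid that, we use the COMMIT command to mark the changes as permanent.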

Following is commit command's syntax,

COMMIT;

ROLLBACK COMMAND

This command restores the database to the last committed state. It is also used with the
SAVEPOINT command to jump to a savepoint in an ongoing transaction.

If we have used the UPDATE command to make some changes to the database, and realise
that those changes were not required, then we can use the ROLLBACK command to roll back
those changes, if they were not committed using the COMMIT command.

Following is the rollback command's syntax. The plain form undoes everything since the last
commit, while the TO form rolls back only to a named savepoint,

ROLLBACK;

ROLLBACK TO savepoint_name;

SAVEPOINT command

SAVEPOINT command is used to mark a point within a transaction so that you can roll back
to that point whenever required.

Following is savepoint command's syntax,

SAVEPOINT savepoint_name;

In short, using this command we can name the different states of our data in any table and
then rollback to that state using the ROLLBACK command whenever required.

Using Savepoint and Rollback

Following is the table class,

id name

1 Abhi

2 Adam

4 Alex

Let's use some SQL queries on the above table and see the results.

INSERT INTO class VALUES(5, 'Rahul');

COMMIT;

UPDATE class SET name = 'Abhijit' WHERE id = 5;

SAVEPOINT A;

INSERT INTO class VALUES(6, 'Chris');

SAVEPOINT B;

INSERT INTO class VALUES(7, 'Bravo');

SAVEPOINT C;

SELECT * FROM class;


NOTE: SELECT statement is used to show the data stored in the table.

The resultant table will look like,

id name

1 Abhi

2 Adam

4 Alex

5 Abhijit

6 Chris

7 Bravo

Now let's use the ROLLBACK command to roll back the state of data to the savepoint B.

ROLLBACK TO B;

SELECT * FROM class;

Now our class table will look like,

id name

1 Abhi

2 Adam

4 Alex

5 Abhijit

6 Chris

Now let's again use the ROLLBACK command to roll back the state of data to the savepoint
A

ROLLBACK TO A;

SELECT * FROM class;

Now the table will look like,


id name

1 Abhi

2 Adam

4 Alex

5 Abhijit

So now you know how the commands COMMIT, ROLLBACK and SAVEPOINT work.
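The whole sequence can be replayed with Python's built-in sqlite3 module, since SQLite supports SAVEPOINT and ROLLBACK TO. This is a sketch under assumptions: the connection is opened with isolation_level=None so that BEGIN, COMMIT and ROLLBACK are issued by hand, and the helper names() is introduced only for illustration. A final plain ROLLBACK is added to show that it undoes everything since the last COMMIT.

```python
import sqlite3

# isolation_level=None: Python does not open transactions on our behalf
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class (id INT, name VARCHAR(100))")
conn.execute("INSERT INTO class VALUES (1, 'Abhi'), (2, 'Adam'), (4, 'Alex')")

def names():
    """Return the name column ordered by id."""
    return [row[0] for row in conn.execute("SELECT name FROM class ORDER BY id")]

conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Rahul')")
conn.execute("COMMIT")  # Rahul is now permanent

conn.execute("BEGIN")
conn.execute("UPDATE class SET name = 'Abhijit' WHERE id = 5")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
before = names()

conn.execute("ROLLBACK TO B")  # undoes the insert of Bravo
after_b = names()
conn.execute("ROLLBACK TO A")  # undoes the insert of Chris
after_a = names()
conn.execute("ROLLBACK")       # undoes the uncommitted UPDATE as well
after_full = names()
```

After the final ROLLBACK the name for id 5 is Rahul again, because the UPDATE to Abhijit was never committed.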

Using GRANT and REVOKE

Data Control Language (DCL) is used to control privileges in a database. To perform any
operation in the database, such as creating tables, sequences or views, a user needs
privileges. Privileges are of two types,

• System: This includes permissions for creating a session, table, etc. and all other types
of system privileges.
• Object: This includes permissions for any command or query to perform any
operation on the database tables.

In DCL we have two commands,

• GRANT: Used to provide any user access privileges or other privileges for the
database.
• REVOKE: Used to take back permissions from any user.

Allow a User to create session

When we create a user in SQL, the user is not even allowed to log in and create a session until
proper permissions/privileges are granted. The commands in this section follow Oracle's
syntax; other RDBMS may differ.

The following command can be used to grant the session-creating privilege.

GRANT CREATE SESSION TO username;

Allow a User to create table

To allow a user to create tables in the database, we can use the below command,

GRANT CREATE TABLE TO username;

Provide user with space on tablespace to store table

Allowing a user to create a table is not enough to start storing data in that table. We also
must provide the user with privileges to use the available tablespace for their table and data.

ALTER USER username QUOTA UNLIMITED ON SYSTEM;

The above command will alter the user details and will provide it access to unlimited
tablespace on system.

NOTE: Generally unlimited quota is provided to Admin users.

Grant all privilege to a User

sysdba is a set of privileges which has all the permissions in it. So if we want to provide all
the privileges to any user, we can simply grant them the sysdba permission.

GRANT sysdba TO username

Grant permission to create any table

Sometimes a user is restricted from creating some tables with names which are reserved for
system tables. But we can grant privileges to a user to create any table using the below
command,

GRANT CREATE ANY TABLE TO username

Grant permission to drop any table

As the title suggests, if you want to allow user to drop any table from the database, then
grant this privilege to the user,

GRANT DROP ANY TABLE TO username

To take back Permissions

And, if you want to take back the privileges from any user, use the REVOKE command.

REVOKE CREATE TABLE FROM username

Using the WHERE SQL clause

WHERE clause is used to specify/apply any condition while retrieving, updating or deleting
data from a table. This clause is used mostly with SELECT, UPDATE and DELETE queries.

When we specify a condition using the WHERE clause then the query executes only for those
records for which the condition specified by the WHERE clause is true.

Syntax for WHERE clause

Here is how you can use the WHERE clause with a DELETE statement, or any other
statement,

DELETE FROM table_name WHERE [condition];

The WHERE clause comes after the table name (and before any ORDER BY clause) and
specifies a condition for execution.

Time for an Example

Consider a table student,

s_id name age address

101 Adam 15 Chennai

102 Alex 18 Delhi

103 Abhi 17 Bangalore

104 Ankit 22 Mumbai

Now we will use the SELECT statement to display data of the table, based on a condition,
which we will add to our SELECT query using WHERE clause.

Let's write a simple SQL query to display the record for student with s_id as 101.

SELECT s_id,
name,
age,
address
FROM student WHERE s_id = 101;

Following will be the result of the above query.

s_id name age address

101 Adam 15 Chennai

Applying condition on Text Fields

In the above example we applied a condition to an integer field, but what if we want to
apply the condition on the name field? In that case we must enclose the value in single
quotes ' '. Some databases even accept double quotes, but single quotes are accepted by all.

SELECT s_id,
name,
age,
address
FROM student WHERE name = 'Adam';

Following will be the result of the above query.

s_id name age address

101 Adam 15 Chennai
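Both WHERE queries can be run with Python's built-in sqlite3 module. This is a sketch, assuming SQLite as a stand-in RDBMS and the four-row student table from this section. The values are passed through ? placeholders, so the integer and the text value are bound without writing the quotes by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT, address VARCHAR(100))")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (101, "Adam", 15, "Chennai"),
        (102, "Alex", 18, "Delhi"),
        (103, "Abhi", 17, "Bangalore"),
        (104, "Ankit", 22, "Mumbai"),
    ],
)

query = "SELECT s_id, name, age, address FROM student WHERE "
# Condition on an integer field
by_id = conn.execute(query + "s_id = ?", (101,)).fetchall()
# Condition on a text field
by_name = conn.execute(query + "name = ?", ("Adam",)).fetchall()
```

Both queries return the same single row, the record of Adam.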

OPERATORS FOR WHERE CLAUSE CONDITION

Following is a list of operators that can be used while specifying the WHERE clause
condition.

Operator   Description

=          Equal to
!=         Not Equal to (also written <>)
<          Less than
>          Greater than
<=         Less than or Equal to
>=         Greater than or Equal to
BETWEEN    Between a specified range of values
LIKE       This is used to search for a pattern in value.
IN         In a given set of values
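A few of these operators can be tried on the student table with Python's built-in sqlite3 module. This is a sketch, assuming SQLite as a stand-in RDBMS; the helper ids() and the particular conditions are made up for illustration. In a LIKE pattern, % matches any run of characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT, address VARCHAR(100))")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (101, "Adam", 15, "Chennai"),
        (102, "Alex", 18, "Delhi"),
        (103, "Abhi", 17, "Bangalore"),
        (104, "Ankit", 22, "Mumbai"),
    ],
)

def ids(condition):
    """Return the s_id values of the rows matching the WHERE condition."""
    sql = "SELECT s_id FROM student WHERE " + condition + " ORDER BY s_id"
    return [row[0] for row in conn.execute(sql)]

not_fifteen = ids("age != 15")
between = ids("age BETWEEN 17 AND 20")       # both ends are included
starts_with_ab = ids("name LIKE 'Ab%'")
in_cities = ids("address IN ('Delhi', 'Mumbai')")
```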

SQL ORDER BY Clause

The ORDER BY clause is used with the SELECT statement for arranging retrieved data in
sorted order. By default, it sorts the retrieved data in ascending order. To sort the data
in descending order, the DESC keyword is used with the ORDER BY clause.

Syntax of Order By

SELECT column-list|* FROM table-name ORDER BY column-name ASC | DESC;

Using default Order by

Consider the following Emp table,

eid name age salary

401 Anu 22 9000

402 Shane 29 8000

403 Rohan 34 6000

404 Scott 44 10000

405 Tiger 35 8000

SELECT * FROM Emp ORDER BY salary;

The above query will return the resultant data in ascending order of the salary.

eid name age salary

403 Rohan 34 6000

402 Shane 29 8000

405 Tiger 35 8000

401 Anu 22 9000

404 Scott 44 10000

Using Order by DESC

Consider the Emp table described above,

SELECT * FROM Emp ORDER BY salary DESC;

The above query will return the resultant data in descending order of the salary.

eid name age salary

404 Scott 44 10000

401 Anu 22 9000

405 Tiger 35 8000

402 Shane 29 8000

403 Rohan 34 6000
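Both orderings can be reproduced with Python's built-in sqlite3 module. This is a sketch, assuming SQLite as a stand-in RDBMS. One detail is added on purpose: two employees earn 8000, and SQL does not promise any order among such ties, so the sketch adds eid as a second sort key to make the result repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INT, name VARCHAR(100), age INT, salary INT)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [
        (401, "Anu", 22, 9000),
        (402, "Shane", 29, 8000),
        (403, "Rohan", 34, 6000),
        (404, "Scott", 44, 10000),
        (405, "Tiger", 35, 8000),
    ],
)

# Ascending is the default; eid breaks the tie between the two 8000 salaries
asc_ids = [r[0] for r in conn.execute("SELECT eid FROM Emp ORDER BY salary, eid")]
desc_ids = [r[0] for r in conn.execute("SELECT eid FROM Emp ORDER BY salary DESC, eid DESC")]
```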

RELATIONAL ALGEBRA IN DBMS

What is Relational Algebra?

RELATIONAL ALGEBRA is a widely used procedural query language. It
takes instances of relations as input and gives instances of relations
as output. It uses various operations to perform this action. Relational
algebra operations can be applied one after another, because the output
of each operation is a new relation, which might be formed from one
or more input relations.

Basic Relational Algebra Operations:

Relational Algebra operations are divided into various groups

Unary Relational Operations

• SELECT (symbol: σ)
• PROJECT (symbol: π)
• RENAME (symbol: ρ)

Relational Algebra Operations From Set Theory

• UNION (∪)
• INTERSECTION (∩)
• DIFFERENCE (-)
• CARTESIAN PRODUCT (x)

Binary Relational Operations

• JOIN
• DIVISION

Let's study them in detail:

SELECT (σ)

The SELECT operation is used for selecting a subset of the tuples
according to a given selection condition. It is denoted by the sigma (σ)
symbol. It is used as an expression to choose tuples which meet the
selection condition, that is, tuples that satisfy a given predicate.

σp(r)

σ is the selection operator

r stands for relation which is the name of the table

p is the predicate, a formula of propositional logic

Example 1

σ topic = "Database" (Tutorials)

Output - Selects tuples from Tutorials where topic = 'Database'.

Example 2

σ topic = "Database" and author = "guru99"( Tutorials)

Output - Selects tuples from Tutorials where the topic is 'Database' and
'author' is guru99.

Example 3

σ sales > 50000 (Customers)

Output - Selects tuples from Customers where sales is greater than
50000.

Projection(π)

The projection eliminates all attributes of the input relation except those
mentioned in the projection list. The projection operation defines a
relation that contains a vertical subset of the relation.

It extracts the values of the specified attributes and eliminates
duplicate values. The symbol π (pi) is used to choose attributes from a
relation. This operation helps you to keep specific columns from a
relation and discard the other columns.

Example of Projection:

Consider the following table

CustomerID CustomerName Status

1 Google Active

2 Amazon Active

3 Apple Inactive

4 Alibaba Active

Here, the projection of CustomerName and Status will give

π CustomerName, Status (Customers)


CustomerName Status

Google Active

Amazon Active

Apple Inactive

Alibaba Active
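Selection and projection can be sketched in a few lines of Python, treating a relation as a list of dictionaries. The function names select and project, and the predicate Status = 'Active', are assumptions made for illustration; they are not part of any library.

```python
customers = [
    {"CustomerID": 1, "CustomerName": "Google", "Status": "Active"},
    {"CustomerID": 2, "CustomerName": "Amazon", "Status": "Active"},
    {"CustomerID": 3, "CustomerName": "Apple", "Status": "Inactive"},
    {"CustomerID": 4, "CustomerName": "Alibaba", "Status": "Active"},
]

def select(relation, predicate):
    """sigma: keep the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """pi: keep only the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

active_ids = [t["CustomerID"] for t in select(customers, lambda t: t["Status"] == "Active")]
name_status = project(customers, ["CustomerName", "Status"])
statuses = project(customers, ["Status"])
```

Projecting on Status alone leaves only two rows, which shows the duplicate elimination that projection performs.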

Union operation (∪)

UNION is symbolized by the ∪ symbol. It includes all tuples that are in table
A or in table B. It also eliminates duplicate tuples. So, set A UNION set B would
be expressed as:

The result <- A ∪ B

For a union operation to be valid, the following conditions must hold -

• A and B must have the same number of attributes.
• Attribute domains need to be compatible.
• Duplicate tuples should be automatically removed.

Example

Consider the following tables.

Table A                  Table B

column 1   column 2      column 1   column 2

1          1             1          1

1          2             1          3

A ∪ B gives

Table A ∪ B

column 1 column 2

1 1

1 2

1 3

Set Difference (-)

The - symbol denotes it. The result of A - B is a relation which includes all
tuples that are in A but not in B.

• The attribute names of A have to match the attribute names of B.
• The two operand relations A and B should be union compatible.
• The result is a relation consisting of the tuples that are in
relation A, but not in B.

Example

A-B
Table A - B

column 1 column 2

1 2

Intersection

An intersection is denoted by the symbol ∩

A ∩ B

It defines a relation consisting of the set of all tuples that are in both A and B.
However, A and B must be union-compatible.

Example:

A∩B
Table A ∩ B

column 1 column 2

1 1
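Because relations are sets of tuples, the three set operations map directly onto Python's built-in set type. The following sketch uses the rows of tables A and B from the union example; sorting is applied only to give the results a fixed order for display.

```python
# Each tuple is (column 1, column 2)
A = {(1, 1), (1, 2)}
B = {(1, 1), (1, 3)}

union = sorted(A | B)         # tuples in A or in B, duplicates removed
difference = sorted(A - B)    # tuples in A but not in B
intersection = sorted(A & B)  # tuples in both A and B
```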

Cartesian product(X)

This type of operation is helpful to merge columns from two relations.
Generally, a Cartesian product is rarely a meaningful operation when it
is performed alone. However, it becomes meaningful when it is followed by
other operations.

Example – Cartesian product

σ column 2 = '1' (A X B)

Output – The above example shows all rows from relation A and B whose
column 2 has value 1

column 1 column 2

1 1

1 1

Join Operations

Join operation is essentially a cartesian product followed by a selection
criterion.

The join operation is denoted by ⋈.

The JOIN operation also allows joining variously related tuples from different
relations.

Types of JOIN:

Various forms of join operation are:

Inner Joins:

• Theta join
• EQUI join
• Natural join

Outer join:

• Left Outer Join
• Right Outer Join
• Full Outer Join

Inner Join:

In an inner join, only those tuples that satisfy the matching criteria are
included, while the rest are excluded. Let's study various types of Inner
Joins:

Theta Join:

The general case of the JOIN operation is called a theta join. It is denoted by
the symbol θ.

Example

A ⋈θ B

A theta join can use any condition in the selection criteria.

For example:

A ⋈ A.column 2 > B.column 2 (B)

Table A ⋈ A.column 2 > B.column 2 (B)

column 1    column 2
1           2

EQUI join:

When a theta join uses only an equality condition, it becomes an equi join.

For example:

A ⋈ A.column 2 = B.column 2 (B)

Table A ⋈ A.column 2 = B.column 2 (B)

column 1    column 2
1           1

Equi joins are among the most difficult operations to implement efficiently
in an RDBMS, and are one reason why an RDBMS can have serious performance
problems.
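
A theta join can be sketched in Python as a Cartesian product filtered by a condition. This is an illustrative sketch only: the helper name theta_join is made up here, Table A and Table B are the ones from the earlier examples, and the result keeps all four columns rather than only the columns of A.

```python
from itertools import product

A = {(1, 1), (1, 2)}
B = {(1, 1), (1, 3)}

def theta_join(r, s, condition):
    """Cartesian product of r and s, keeping pairs where condition(a, b) holds."""
    return {a + b for a, b in product(r, s) if condition(a, b)}

# Theta join with A.column 2 > B.column 2 (index 1 holds column 2).
greater = theta_join(A, B, lambda a, b: a[1] > b[1])

# Equi join: the same operation with an equality condition.
equal = theta_join(A, B, lambda a, b: a[1] == b[1])

print(greater)  # {(1, 2, 1, 1)}
print(equal)    # {(1, 1, 1, 1)}
```

The first two values of each result row are the tuple from A shown in the example tables. This sketch compares every pair of tuples, which is the simplest approach and becomes slow on large relations.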

NATURAL JOIN (⋈)

A natural join can only be performed if there is a common attribute
(column) between the relations. The name and type of the attribute must
be the same.

Example

Consider the following two tables

Table C

Num    Square
2      4
3      9

Table D

Num    Cube
2      8
3      27

Table C ⋈ D

Num    Square    Cube
2      4         8
3      9         27
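
The natural join of C and D can be sketched in Python by matching rows on every attribute name they share. The sketch assumes each relation is a list of dictionaries; the helper name natural_join is made up for illustration.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    result = []
    for row_r in r:
        for row_s in s:
            common = row_r.keys() & row_s.keys()
            # Keep the pair only if all shared attributes have equal values.
            if all(row_r[k] == row_s[k] for k in common):
                result.append({**row_r, **row_s})
    return result

C = [{"Num": 2, "Square": 4}, {"Num": 3, "Square": 9}]
D = [{"Num": 2, "Cube": 8}, {"Num": 3, "Cube": 27}]

for row in natural_join(C, D):
    print(row)
# {'Num': 2, 'Square': 4, 'Cube': 8}
# {'Num': 3, 'Square': 9, 'Cube': 27}
```

The shared attribute Num appears only once in each result row, which is what distinguishes a natural join from an equi join.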

OUTER JOIN

In an outer join, along with tuples that satisfy the matching criteria, we
also include some or all tuples that do not match the criteria.

Left Outer Join (A ⟕ B)

A left outer join keeps all tuples of the left relation. If no matching
tuple is found in the right relation, the attributes of the right relation
in the join result are filled with null values.

Consider the following 2 Tables

Table A

Num    Square
2      4
3      9
4      16

Table B

Num    Cube
2      8
3      18
5      75

Table A ⟕ B

Num    Square    Cube
2      4         8
3      9         18
4      16        -

Right Outer Join: (A ⟖ B)

A right outer join keeps all tuples of the right relation. If no matching
tuple is found in the left relation, the attributes of the left relation in
the join result are filled with null values.

Table A ⟖ B

Num    Cube    Square
2      8       4
3      18      9
5      75      -

Full Outer Join: (A ⟗ B)

In a full outer join, all tuples from both relations are included in the
result, irrespective of the matching condition.

Table A ⟗ B

Num    Square    Cube
2      4         8
3      9         18
4      16        -
5      -         75
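
The three outer joins differ only in which Num values they keep, and a short Python sketch shows this. It assumes Num is unique within each relation, so each relation can be held as a dictionary keyed by Num. The values are copied as they appear in the example tables, None stands in for a null value, and the helper name outer_join is made up for illustration.

```python
# Relations keyed by Num; values copied from the example tables.
A = {2: 4, 3: 9, 4: 16}    # Num -> Square
B = {2: 8, 3: 18, 5: 75}   # Num -> Cube

def outer_join(left, right, keys):
    """Return (Num, Square, Cube) rows for the given Num values.
    dict.get returns None (standing in for null) when a Num is missing."""
    return [(n, left.get(n), right.get(n)) for n in sorted(keys)]

left_outer = outer_join(A, B, A.keys())             # every Num in A
right_outer = outer_join(A, B, B.keys())            # every Num in B
full_outer = outer_join(A, B, A.keys() | B.keys())  # every Num in either

print(left_outer)   # [(2, 4, 8), (3, 9, 18), (4, 16, None)]
print(right_outer)  # [(2, 4, 8), (3, 9, 18), (5, None, 75)]
print(full_outer)   # [(2, 4, 8), (3, 9, 18), (4, 16, None), (5, None, 75)]
```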

RELATIONAL ALGEBRA

Relational database systems are expected to be equipped with a query language that can
assist its users to query the database instances. There are two kinds of query languages −
relational algebra and relational calculus.

Relational Algebra

Relational algebra is a procedural query language, which takes instances of relations as
input and yields instances of relations as output. It uses operators to perform queries. An
operator can be either unary or binary. Because every operator returns a relation, operators
can be nested: relational algebra is applied recursively, and intermediate results are also
considered relations.

The fundamental operations of relational algebra are as follows −

 Select
 Project
 Union
 Set difference
 Cartesian product
 Rename

We will discuss all these operations in the following sections.

Select Operation (σ)

It selects tuples that satisfy the given predicate from a relation.

Notation − σp(r)

Where σ is the selection operator, p is the selection predicate, and r stands for the
relation. p is a propositional logic formula which may use connectives such as and, or, and
not. Its terms may use relational operators such as =, ≠, ≥, <, >, ≤.

For example −

σsubject = "database"(Books)

Output − Selects tuples from books where subject is 'database'.

σsubject = "database" and price = "450"(Books)

Output − Selects tuples from books where subject is 'database' and 'price' is 450.

σsubject = "database" and price = "450" or year > "2010"(Books)

Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those
books published after 2010.
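
The select operation behaves like a filter, and a minimal Python sketch shows how the predicates above combine. The Books rows and the helper name select are made up for illustration; only the attribute names subject, price, and year come from the examples.

```python
# Hypothetical Books relation; the rows are made up for illustration.
books = [
    {"title": "T1", "subject": "database", "price": 450, "year": 2008},
    {"title": "T2", "subject": "database", "price": 300, "year": 2005},
    {"title": "T3", "subject": "networks", "price": 500, "year": 2015},
]

def select(predicate, relation):
    """Keep the tuples of the relation that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# subject = "database"
by_subject = select(lambda t: t["subject"] == "database", books)

# (subject = "database" and price = 450) or year > 2010
combined = select(
    lambda t: (t["subject"] == "database" and t["price"] == 450)
    or t["year"] > 2010,
    books,
)

print([t["title"] for t in by_subject])  # ['T1', 'T2']
print([t["title"] for t in combined])    # ['T1', 'T3']
```

The parentheses in the second predicate make explicit that and binds more tightly than or, which is the reading given in the last output description.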

Project Operation (∏)

It projects the listed column(s) of a relation and discards the rest.

Notation − ∏A1, A2, ..., An (r)

Where A1, A2, ..., An are attribute names of relation r.

Duplicate rows are automatically eliminated, as a relation is a set.

For example −

∏subject, author (Books)

Projects the columns named subject and author from the relation Books.
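
Projection with duplicate elimination can be sketched in Python as follows. The Books rows and the helper name project are made up for illustration; the result is built as a set so that repeated rows collapse into one.

```python
# Hypothetical Books relation; the rows are made up for illustration.
books = [
    {"title": "T1", "subject": "database", "author": "X"},
    {"title": "T2", "subject": "database", "author": "X"},
    {"title": "T3", "subject": "networks", "author": "Y"},
]

def project(attributes, relation):
    """Keep only the named columns; building a set removes duplicate rows."""
    return {tuple(t[a] for a in attributes) for t in relation}

result = project(["subject", "author"], books)
print(sorted(result))  # [('database', 'X'), ('networks', 'Y')]
```

Three input rows give only two output rows, because T1 and T2 agree on both projected columns.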

Union Operation (∪)

It performs binary union between two given relations and is defined as −

r ∪ s = { t | t ∈ r or t ∈ s}

Notation − r ∪ s

Where r and s are either database relations or relation result sets (temporary relations).

For a union operation to be valid, the following conditions must hold −

 r and s must have the same number of attributes.
 Attribute domains must be compatible.
 Duplicate tuples are automatically eliminated.

∏ author (Books) ∪ ∏ author (Articles)

Output − Projects the names of the authors who have either written a book or an article or
both.

Set Difference (−)

The result of the set difference operation is the set of tuples that are present in one
relation but not in the second relation.

Notation − r − s

Finds all the tuples that are present in r but not in s.

∏ author (Books) − ∏ author (Articles)

Output − Provides the name of authors who have written books but not articles.
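
The two author queries (union and set difference of projections) can be sketched in Python with set operations. The Books and Articles rows are made up for illustration.

```python
# Hypothetical relations; the rows are made up for illustration.
books = [{"title": "B1", "author": "Ann"}, {"title": "B2", "author": "Bob"}]
articles = [{"title": "A1", "author": "Bob"}, {"title": "A2", "author": "Cy"}]

book_authors = {t["author"] for t in books}        # projection of author from Books
article_authors = {t["author"] for t in articles}  # projection of author from Articles

# Authors of a book, an article, or both.
print(sorted(book_authors | article_authors))  # ['Ann', 'Bob', 'Cy']

# Authors of books who have written no article.
print(sorted(book_authors - article_authors))  # ['Ann']
```

Both projections have a single attribute of the same domain, so the two relations are union-compatible.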

Cartesian Product (Χ)

Combines information of two different relations into one.

Notation − r Χ s

Where r and s are relations and their output will be defined as −

r Χ s = { q t | q ∈ r and t ∈ s}

σauthor = 'tutorialspoint'(Books Χ Articles)

Output − Yields a relation, which shows all the books and articles written by tutorialspoint.

Rename Operation (ρ)

The results of relational algebra are also relations but without any name. The rename
operation allows us to rename the output relation. The rename operation is denoted by the
small Greek letter rho (ρ).

Notation − ρ x (E)

Where the result of expression E is saved under the name x.

Additional operations are −

 Set intersection
 Assignment
 Natural join

Relational Calculus

In contrast to Relational Algebra, Relational Calculus is a non-procedural query language,
that is, it tells what to do but never explains how to do it.

Relational calculus exists in two forms −

Tuple Relational Calculus (TRC)

The filtering variable ranges over tuples.

Notation − {T | Condition}

Returns all tuples T that satisfy the condition.

For example −

{ T.name | Author(T) AND T.article = 'database' }

Output − Returns the 'name' of every tuple in Author whose article is on 'database'.

TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).

For example −

{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}

Output − The above query will yield the same result as the previous one.
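
Both TRC queries can be imitated with Python set comprehensions, which also describe what is wanted rather than how to find it. This is only a sketch: the Authors rows are made up for illustration, and any() plays the role of the existential quantifier ∃.

```python
# Hypothetical Authors relation; the rows are made up for illustration.
authors = [
    {"name": "Ann", "article": "database"},
    {"name": "Bob", "article": "networks"},
    {"name": "Cy", "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
direct = {t["name"] for t in authors if t["article"] == "database"}

# { R | there exists T in Authors (T.article = 'database' AND R.name = T.name) }
candidates = {t["name"] for t in authors}
quantified = {
    r for r in candidates
    if any(t["article"] == "database" and t["name"] == r for t in authors)
}

print(sorted(direct))        # ['Ann', 'Cy']
print(direct == quantified)  # True
```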

Domain Relational Calculus (DRC)

In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as
done in TRC, mentioned above).

Notation −

{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}

Where a1, a2, ..., an are attributes and P stands for a formula built from those attributes.

For example −

{< article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database'}

Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is
database.
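
The DRC query can be sketched in Python by letting the variables range over individual attribute values instead of whole tuples. The rows of the TutorialsPoint relation are made up for illustration.

```python
# Hypothetical TutorialsPoint relation as (article, page, subject) tuples;
# the rows are made up for illustration.
tutorials_point = {
    ("Intro to SQL", 10, "database"),
    ("Routing", 25, "networks"),
    ("Normalization", 40, "database"),
}

# article, page and subject are domain variables: each one is bound to a
# single attribute value, not to a whole tuple.
result = {
    (article, page, subject)
    for (article, page, subject) in tutorials_point
    if subject == "database"
}

print(sorted(result))
# [('Intro to SQL', 10, 'database'), ('Normalization', 40, 'database')]
```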

Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators.

The expressive power of Tuple Relational Calculus and Domain Relational Calculus is
equivalent to that of Relational Algebra.
