RDBMS Material


DBMS

Relational Database Management System

Unit 1 : Database Architecture and ER Diagram


Data

In simple words, Data can be facts related to any object in consideration. For example, your
name, age, height, weight, etc. are some data related to you. A picture, image, file, pdf, etc.
can also be considered data.

Database :

 A database is a collection of inter-related data that can be retrieved, inserted, and
deleted efficiently.

 It also organizes the data in the form of tables, schemas, views, reports, etc.

Database Management System

 A database management system is software used to manage a database. For
example: MySQL, Oracle.
 DBMS provides an interface to perform various operations like database creation,
storing data in it, updating data, creating a table in the database and a lot more.
 It provides protection and security to the database. In the case of multiple users, it also
maintains data consistency.

DBMS allows users the following tasks:

 Data Definition: It is used for creation, modification, and removal of definition that
defines the organization of data in the database.
 Data Updation: It is used for the insertion, modification, and deletion of the actual
data in the database.
 Data Retrieval: It is used to retrieve the data from the database which can be used by
applications for various purposes.
 User Administration: It is used for registering and monitoring users, maintaining data
integrity, enforcing data security, dealing with concurrency control, monitoring
performance, and recovering information corrupted by unexpected failures.

Types of DBMS

The four main types of database management system are:

 Hierarchical database
 Network database
 Relational database
 Object-Oriented database

Hierarchical DBMS

In the hierarchical database model, data is organized in a tree-like structure and stored
hierarchically (in top-down or bottom-up format). Data is represented using a parent-child
relationship: a parent may have many children, but each child has only
one parent.

Network Model

The network database model allows each child to have multiple parents. It helps you to
address the need to model more complex relationships, such as the orders/parts many-to-many
relationship. In this model, entities are organized in a graph which can be accessed through
several paths.

Relational Model

Relational DBMS is the most widely used DBMS model because it is one of the easiest to work
with. This model is based on normalizing data into the rows and columns of tables. Data in the
relational model is stored in fixed table structures and manipulated using SQL.

Object-Oriented Model

In the object-oriented model, data is stored in the form of objects. The structure that holds
the data is called a class. This model defines a
database as a collection of objects that store both data member values and operations.

Advantages & Disadvantages of DBMS :

Advantages:

 It controls database redundancy.
 Reduced time: It reduces development time and maintenance needs.
 Data sharing: In DBMS, we can share the data among multiple users.
 Multiple user interfaces: It provides different types of UI, like Graphical User
Interfaces and Application Program Interfaces.

Disadvantages:

 Cost: It requires a high cost of hardware and software.
 Size: It occupies a large amount of disk space and memory to run efficiently.
 Complexity: A database system creates additional complexity and requirements.
 Higher impact of failure: All the data is stored in a single database, so a failure can
affect every application that depends on it.

Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It
provides a different view and helps in achieving data independence which is used to enhance
the security of data.
Levels of abstraction for DBMS

Database systems include complex data structures. To reduce complexity for users during data
retrieval and to make the system efficient, developers use levels of abstraction that hide
irrelevant details from the users. Levels of abstraction simplify database design.

Mainly there are three levels of abstraction for DBMS, which are as follows −

 Physical or Internal Level


 Logical or Conceptual Level
 View or External Level

Let us discuss each level in detail.

Physical or Internal Level

It is the lowest level of abstraction for a DBMS. It defines how the data is actually stored: the
data structures used to store the data and the access methods used by the database. How the
data is stored is decided by the developers or database application programmers.

Logical or Conceptual Level

The logical level is the intermediate, next-higher level. It describes what data is stored in the
database and what relationships exist among those data. It describes the entire data in terms
of which tables are to be created and what the links among those tables are.

View or External Level

It is the highest level. In view level, there are different levels of views and every view only defines a part of
the entire data. It also simplifies interaction with the user and it provides many views or multiple views of
the same database.

Application of DBMS

 Banking: For customer information, account activities, payments, deposits, loans, etc.
 Airlines: For reservations and schedule information.
 Universities: For student information, course registrations, colleges, and grades.
 Telecommunication: It helps to keep call records, monthly bills, maintaining
balances, etc.
 Finance: For storing information about stocks, sales, and purchases of financial
instruments like stocks and bonds.
 Sales: Used for storing customer, product, and sales information.
 Manufacturing: It is used for supply chain management, tracking production of items,
and inventory status in warehouses.
 HR Management: For information about employees, salaries, payroll, deductions,
generation of paychecks, etc.


Purpose of DBMS :

The purpose of DBMS is to transform the following −

 Data into information.
 Information into knowledge.
 Knowledge into action.

The diagram given below explains the process as to how the transformation of data to
information to knowledge to action happens respectively in the DBMS −

Previously, the database applications were built directly on top of the file system.

Database Language

 A DBMS has appropriate languages and interfaces to express database queries and
updates.
 Database languages can be used to read, store and update the data in the database.

Types of Database Language

1. Data Definition Language

 DDL stands for Data Definition Language. It is used to define database structure or
pattern.
 It is used to create schema, tables, indexes, constraints, etc. in the database.
 Using the DDL statements, you can create the skeleton of the database.
 Data definition language is used to store the information of metadata like the number
of tables and schemas, their names, indexes, columns in each table, constraints, etc.

Here are some tasks that come under DDL Commands :

 Create: It is used to create objects in the database.
 Alter: It is used to alter the structure of the database.
 Drop: It is used to delete objects from the database.
 Truncate: It is used to remove all records from a table.
 Rename: It is used to rename an object.
 Comment: It is used to add comments to the data dictionary.

These commands are used to update the database schema, which is why they come under Data
Definition Language.
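The DDL commands above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch using an in-memory database; the table and column names (student, pupil, id, name, age) are illustrative, not from the text, and SQLite expresses RENAME through ALTER TABLE:

```python
import sqlite3

# In-memory database for demonstrating DDL commands
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a new object (a table) in the schema
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER: change the structure of an existing table
cur.execute("ALTER TABLE student ADD COLUMN age INTEGER")

# RENAME: rename an object (SQLite spells this ALTER TABLE ... RENAME TO)
cur.execute("ALTER TABLE student RENAME TO pupil")

# DROP: delete the object and all its data
cur.execute("DROP TABLE pupil")

# The schema catalogue now contains no user tables
tables = cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables)  # []
```

Because each statement changes the schema rather than the data, the final catalogue query confirms the table is gone.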

2. Data Manipulation Language

DML stands for Data Manipulation Language. It is used for accessing and manipulating data
in a database. It handles user requests.

Here are some tasks that come under DML Commands :

 Select: It is used to retrieve data from a database.
 Insert: It is used to insert data into a table.
 Update: It is used to update existing data within a table.
 Delete: It is used to delete records from a table.
 Merge: It performs an UPSERT operation, i.e., insert or update.
 Call: It is used to call a PL/SQL or Java subprogram.
 Explain Plan: It describes the access path the database will use to execute a query.
 Lock Table: It controls concurrency.
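The core DML commands can be sketched the same way with sqlite3. The account table and its values here are made up for illustration; note that DELETE removes only the rows matching its WHERE clause (all rows only if the clause is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (no TEXT, balance INTEGER)")

# INSERT: add new rows to the table
cur.executemany("INSERT INTO account VALUES (?, ?)",
                [("A-101", 500), ("A-102", 700)])

# UPDATE: modify existing rows
cur.execute("UPDATE account SET balance = balance + 100 WHERE no = 'A-101'")

# DELETE: remove the rows matching the predicate
cur.execute("DELETE FROM account WHERE no = 'A-102'")

# SELECT: retrieve the remaining data
rows = cur.execute("SELECT no, balance FROM account").fetchall()
print(rows)  # [('A-101', 600)]
```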

3. Data Control Language

 DCL stands for Data Control Language. It is used to control access to the data stored in
the database.
 DCL execution is transactional; it also has rollback parameters.

Here are some tasks that come under DCL Commands :

 Grant: It is used to give users access privileges to a database.
 Revoke: It is used to take back permissions from the user.

The following privileges can be granted and revoked:

CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.

4. Transaction Control Language

TCL is used to manage the changes made by DML statements. DML statements can be grouped
together into a logical transaction.

Here are some tasks that come under TCL:

 Commit: It is used to save the transaction's changes to the database.
 Rollback: It is used to restore the database to its state as of the last Commit.
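Commit and Rollback can be demonstrated with sqlite3, whose connection object exposes them directly as `commit()` and `rollback()`. The table and values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
conn.commit()                 # COMMIT: make the schema change permanent

cur.execute("INSERT INTO t VALUES (1)")
conn.rollback()               # ROLLBACK: undo everything since the last commit

cur.execute("INSERT INTO t VALUES (2)")
conn.commit()                 # this insert is saved

rows = cur.execute("SELECT x FROM t").fetchall()
print(rows)  # [(2,)] -- the rolled-back row with x = 1 is gone
```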
DBMS Architecture

 The DBMS design depends upon its architecture. The basic client/server architecture is
used to deal with a large number of PCs, web servers, database servers and other
components that are connected with networks.
 The client/server architecture consists of many PCs and a workstation which are
connected via the network.
 DBMS architecture depends upon how users are connected to the database to get their
request done.

Types of DBMS Architecture

Database architecture can be seen as single-tier or multi-tier. There are mainly three types
of DBMS architecture:

 One-Tier Architecture (Single-Tier Architecture)
 Two-Tier Architecture
 Three-Tier Architecture
1-Tier Architecture
o In this architecture, the database is directly available to the user: the user interacts directly with the
DBMS and uses it.
o Any changes done here will directly be done on the database itself. It doesn't provide a handy tool for end
users.
o The 1-Tier architecture is used for development of the local application, where programmers can directly
communicate with the database for the quick response.

2-Tier Architecture

o The 2-Tier architecture is the same as the basic client-server model. In the two-tier architecture, applications
on the client end can directly communicate with the database at the server side. For this interaction, APIs
such as ODBC and JDBC are used.
o The user interfaces and application programs are run on the client-side.
o The server side is responsible to provide the functionalities like: query processing and transaction
management.
o To communicate with the DBMS, client-side application establishes a connection with the server side.

Fig: 2-tier Architecture

3-Tier Architecture

o The 3-Tier architecture contains another layer between the client and server. In this architecture, the client
can't directly communicate with the server.
o The application on the client end interacts with an application server, which in turn communicates with the
database system.
o The end user has no idea about the existence of the database beyond the application server, and the database
has no idea about any user beyond the application server.
o The 3-Tier architecture is used for large web applications.

Fig: 3-tier Architecture


The goals of three-tier client-server architecture are:

 To separate the user applications from the physical database
 To support DBMS characteristics
 Program-data independence
 Supporting multiple views of the data

Database Users and Administrators

Database users are the people who interact with the database and benefit from it. They are
differentiated into different types as follows:

1. Naive users: They are the unsophisticated users who interact with the system by using
permanent applications that already exist. Example: Online Library Management System,
ATMs (Automated Teller Machine), etc.
2. Application programmers: They are the computer professionals who interact with
system through DML. They write application programs.
3. Sophisticated users: They interact with the system by writing SQL queries directly
through the query processor without writing application programs.
4. Specialized users: They are also sophisticated users who write specialized database
applications that do not fit into the traditional data processing framework. Example:
Expert System, Knowledge Based System, etc.

DATABASE ADMINISTRATOR

A person who has central control over the system is called a database administrator
(DBA). The functions of a DBA include:

• Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.

• Storage structure and access-method definition.

• Schema and physical-organization modification. The DBA carries out changes
to the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.

• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the
database various users can access. The authorization information is kept in a
special system structure that the database system consults whenever someone
attempts to access the data in the system.

• Routine maintenance. Examples of the database administrator’s routine
maintenance activities are:
◦ Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding.
◦ Ensuring that enough free disk space is available for normal operations,
and upgrading disk space as required.
◦ Monitoring jobs running on the database and ensuring that performance
is not degraded by very expensive tasks submitted by some users.

ER Diagram / ER model

 ER model stands for Entity-Relationship model. It is a high-level data model used to
define the data elements and relationships for a specified system.
 It develops a conceptual design for the database and provides a simple,
easy-to-design view of data.
 In ER modeling, the database structure is portrayed as a diagram called an entity-
relationship diagram.

For example, suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc. The address can be another entity with
attributes like city, street name, pin code, etc., and there will be a relationship between them.

Component of ER Diagram
1. Entity:

An entity may be any object, class, person, or place. In the ER diagram, an entity is
represented as a rectangle.

Consider an organization as an example: manager, product, employee, department, etc. can be
taken as entities.

a. Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity doesn't
contain any key attribute of its own. A weak entity is represented by a double rectangle.

2. Attribute

An attribute is used to describe a property of an entity. An ellipse is used to represent an
attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.

b. Composite Attribute

An attribute that is composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, with its component attributes shown as
ellipses connected to it.

c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A
double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.
d. Derived Attribute

An attribute that can be derived from another attribute is known as a derived attribute. It is
represented by a dashed ellipse.

For example, A person's age changes over time and can be derived from another attribute
like Date of birth.
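The age example can be sketched in Python: the derived attribute Age is never stored, only computed on demand from the stored attribute Date of birth (the sample dates below are made up):

```python
from datetime import date

def age(date_of_birth, today):
    """Derive Age from the stored attribute DateOfBirth."""
    years = today.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred this year
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age(date(2000, 6, 15), date(2024, 6, 14)))  # 23 (day before birthday)
print(age(date(2000, 6, 15), date(2024, 6, 15)))  # 24 (on the birthday)
```

Storing only the date of birth avoids the anomaly of an Age column that silently becomes stale.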

3. Relationship

 A relationship is used to describe the relation between entities.
 A diamond or rhombus is used to represent a relationship.

Types of relationship

a. One-to-One Relationship

When only one instance of an entity is associated with the relationship, then it is known as
One to One relationship.

For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship

When only one instance of the entity on the left, and more than one instance of an entity on
the right associates with the relationship then this is known as a One-to-Many relationship.

For example, a scientist can make many inventions, but each invention is made by only one
specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left, and only one instance of an entity on
the right associates with the relationship then it is known as a Many-to-One relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left, and more than one instance of an
entity on the right associates with the relationship then it is known as a Many-to-Many
relationship.

For example, an employee can be assigned to many projects, and a project can have many employees.
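In a relational implementation, a many-to-many relationship like Employee/Project is usually stored as a separate junction table holding pairs of foreign keys. A minimal sketch with sqlite3 (the names Asha, Ravi, Payroll, and Billing are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (id INTEGER PRIMARY KEY, title TEXT);
    -- junction table: one row per (employee, project) pairing
    CREATE TABLE works_on (emp_id  INTEGER REFERENCES employee(id),
                           proj_id INTEGER REFERENCES project(id),
                           PRIMARY KEY (emp_id, proj_id));
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Billing');
    -- Asha works on both projects; Billing has both employees
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 20);
""")
pairs = cur.execute("""
    SELECT e.name, p.title
    FROM employee e JOIN works_on w ON e.id = w.emp_id
                    JOIN project  p ON p.id = w.proj_id
    ORDER BY e.name, p.title
""").fetchall()
print(pairs)
```

The join recovers every employee/project pairing without duplicating employee or project rows.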

Structural Constraints of Relationships in ER Model

Structural Constraints :
Structural Constraints are also called Structural properties of a database management
system (DBMS). Cardinality Ratios and Participation Constraints taken together are called
Structural Constraints.
Structural constraints are represented by min-max notation: a pair of
numbers (min, max) that appears on the connecting line between an entity and its
relationship.

Weak Entity

A weak entity cannot be used independently, as it depends on a strong entity type known
as its owner entity. The relationship that connects the weak entity to its owner entity
is called the identifying relationship.

In the ER diagram, both the weak entity and its corresponding relationship are represented
using a double line and the partial key is underlined with a dotted line.

In the given ER diagram, Dependent is the weak entity and it depends on the strong entity
Employee via the relationship Depends on. There can be an employee without a dependent in
the Company but there will be no record of the Dependent in the company systems unless the
dependent is associated with an Employee.

Enhanced / Extended ER Modeling ( EER )

EER is a high-level data model that incorporates extensions to the original ER model.
Enhanced ER diagrams are high-level models that represent the requirements and complexities
of complex databases.

In addition to ER model concepts, EER includes:

 Subclasses and superclasses.
 Specialization and generalization.
 Category or union type.
 Aggregation.

These concepts are used to create EER diagrams.

Subclasses and Super class

A superclass is an entity that can be divided into further subtypes.

For example, consider a Shape superclass:

 The superclass Shape has subgroups: Triangle, Square, and Circle.
 Subclasses are groups of entities with some unique attributes.
 A subclass inherits the properties and attributes of its superclass.

Inheritance

We use all the above features of the ER model to create classes of objects in object-
oriented programming. The details of entities are generally hidden from the user; this process
is known as abstraction.

Inheritance is an important feature of generalization and specialization. It allows lower-level
entities to inherit the attributes of higher-level entities.

For example, the attributes of a Person class such as name, age, and gender can be inherited
by lower-level entities such as Student or Teacher.

Specialization and Generalization


Specialization is a top-down approach, the opposite of generalization. In specialization,
one higher-level entity can be broken down into two or more lower-level entities.

 Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics. For example, a Vehicle entity can be specialized into Car,
Truck, or Motorcycle.
 In an employee management system, an EMPLOYEE entity can be specialized as
TESTER or DEVELOPER based on the role played in the company.

Generalization is the process of combining lower-level entities that share common attributes
into a single higher-level, generalized entity.

It is a bottom-up process: consider three sub-entities Car, Truck, and Motorcycle. These
three entities can be generalized into one superclass named Vehicle.

Unit 2 : RELATIONAL DATA MODEL

Relational Model in DBMS

 Relational Model was proposed by E.F. Codd to model data in the form of relations
or tables.
 After designing the conceptual model of the database using an ER diagram, we need to
convert the conceptual model into the relational model, which can be implemented using
any RDBMS, such as Oracle or MySQL.

Relational data model is the primary data model, which is used widely around the world for
data storage and processing. This model is simple and it has all the properties and
capabilities required to process data with storage efficiency.

Relational Model concept

The relational model represents data as tables with columns and rows. Each row is known as a
tuple. Each column of the table has a name, or attribute.
Tables − In relational data model, relations are saved in the format of Tables. This format stores the
relation among entities. A table has rows and columns, where rows represents records and columns
represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and their
names.
Relation key − Each row has one or more attributes, known as relation key, which can identify the row
in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.

In general, a relation schema consists of a list of attributes and their corresponding domains.
Some Common Relational Model Terms

Relation: A relation is a table with columns and rows.
Attribute: An attribute is a named column of a relation.
Domain: A domain is the set of allowable values for one or more attributes.
Tuple: A tuple is a row of a relation.

Constraints on Relational database model

Constraints in the databases can be categorized into 3 main categories:

1. Constraints that are applied in the data model are called Implicit constraints.
2. Constraints that are directly applied in the schemas of the data model, by specifying
them in the DDL(Data Definition Language). These are called as schema-based
constraints or Explicit constraints.
3. Constraints that cannot be directly applied in the schemas of the data model. We call
these Application based or semantic constraints.

So here we will deal with implicit constraints.

Constraints on the relational database are mainly of three types:

1. Domain constraints
2. Key constraints
3. Referential integrity constraints

Relational integrity constraints in DBMS refer to conditions that must hold for
a valid relation. These relational constraints are derived from the rules in the mini-
world that the database represents.

Domain Constraints

Domain constraints can be violated if an attribute value is not appearing in the corresponding
domain or it is not of the appropriate data type.

Domain constraints specify that within each tuple, the value of each attribute must be an
atomic value from the corresponding domain. The domain is specified as a data type, which
includes the standard data types: integers, real numbers, characters, Booleans, variable-length
strings, etc.

Example:

CREATE DOMAIN CustomerName
CHECK (VALUE NOT NULL)

The example shown demonstrates creating a domain constraint such that CustomerName is
not NULL.
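SQLite has no CREATE DOMAIN statement, so this sketch expresses the same rule as a NOT NULL column constraint plus a CHECK clause (the customer table and sample value are illustrative). Inserting a NULL violates the domain constraint and raises an integrity error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE customer (
        -- stand-in for the CustomerName domain: non-NULL, non-empty text
        name TEXT NOT NULL CHECK (length(name) > 0)
    )
""")
cur.execute("INSERT INTO customer VALUES ('Google')")   # OK: within the domain
try:
    cur.execute("INSERT INTO customer VALUES (NULL)")   # violates the domain
except sqlite3.IntegrityError as e:
    violation = str(e)
print(violation)  # mentions the failed NOT NULL constraint
```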

Key Constraints

An attribute that can uniquely identify a tuple in a relation is called the key of the table. The
value of the attribute for different tuples in the relation has to be unique.

Example:

In the given table, CustomerID is the key attribute of the Customer table, as it uniquely
identifies a single customer: CustomerID = 1 belongs only to CustomerName = "Google".

CustomerID CustomerName Status
1 Google Active
2 Amazon Active
3 Apple Inactive
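The key constraint on CustomerID can be demonstrated with sqlite3: declaring the column as a primary key makes the database reject a second tuple with the same key value (the extra name 'Microsoft' is invented for the demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE customer (
                   CustomerID   INTEGER PRIMARY KEY,
                   CustomerName TEXT,
                   Status       TEXT)""")
cur.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (1, "Google", "Active"),
    (2, "Amazon", "Active"),
    (3, "Apple",  "Inactive"),
])
try:
    # A second tuple with CustomerID = 1 violates the key constraint
    cur.execute("INSERT INTO customer VALUES (1, 'Microsoft', 'Active')")
    duplicated = True
except sqlite3.IntegrityError:
    duplicated = False
print(duplicated)  # False -- the duplicate key was rejected
```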

Referential Integrity Constraints

Referential integrity constraints in DBMS are based on the concept of foreign keys. A foreign
key is an attribute of a relation that refers to the key attribute of a different (or the same)
relation. A referential integrity constraint requires that the referenced key value actually
exist in the referenced table.

Example:

In the above example, we have 2 relations, Customer and Billing.

The tuple for CustomerID = 1 is referenced twice in the relation Billing, so we know
CustomerName = Google has a billing amount of $300.
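The Customer/Billing example can be reproduced with sqlite3, which enforces foreign keys only after `PRAGMA foreign_keys = ON`. The bill numbers and amounts (two bills of $150 summing to $300) are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE billing  (BillID INTEGER PRIMARY KEY,
                           CustomerID INTEGER REFERENCES customer(CustomerID),
                           Amount INTEGER);
    INSERT INTO customer VALUES (1, 'Google');
    -- CustomerID = 1 is referenced twice, for a $300 total
    INSERT INTO billing  VALUES (100, 1, 150), (101, 1, 150);
""")
try:
    # CustomerID = 9 does not exist in customer, so this insert
    # violates the referential integrity constraint
    cur.execute("INSERT INTO billing VALUES (102, 9, 99)")
    accepted = True
except sqlite3.IntegrityError:
    accepted = False
print(accepted)  # False -- the dangling foreign key was rejected
```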

Relational Languages

 Relational database systems are expected to be equipped with a query language that
can assist its users to query the database instances.
 There are two kinds of Query languages − Relational Algebra and Relational
Calculus.

Relational Algebra

Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.

Types of Relational operation


1. Select Operation (σ) :

 The select operation selects tuples that satisfy a given predicate.


 It is denoted by sigma (σ).

Notation : σ p(r)

Where:

σ is used for selection

r is used for the relation
p is a propositional logic formula, which may use connectors like AND, OR, and NOT,
and relational operators like =, ≠, ≥, <, >, ≤.

For example: LOAN Relation

BRANCH_NAME LOAN_NO AMOUNT


Downtown L-17 1000
Redwood L-23 2000
Perryride L-15 1500
Downtown L-14 1500
Mianus L-13 500
Roundhill L-11 900
Perryride L-16 1300

Input:

σ BRANCH_NAME="Perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT


Perryride L-15 1500
Perryride L-16 1300
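Select can be sketched directly in Python by modelling the LOAN relation as a list of tuples; σ keeps exactly the tuples that satisfy the predicate:

```python
# The LOAN relation from the text: (branch_name, loan_no, amount)
LOAN = [
    ("Downtown",  "L-17", 1000),
    ("Redwood",   "L-23", 2000),
    ("Perryride", "L-15", 1500),
    ("Downtown",  "L-14", 1500),
    ("Mianus",    "L-13", 500),
    ("Roundhill", "L-11", 900),
    ("Perryride", "L-16", 1300),
]

def select(predicate, relation):
    """sigma_p(r): keep only the tuples satisfying the predicate p."""
    return [t for t in relation if predicate(t)]

result = select(lambda t: t[0] == "Perryride", LOAN)
print(result)  # [('Perryride', 'L-15', 1500), ('Perryride', 'L-16', 1300)]
```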

2. Project Operation ( ∏ ):

 This operation shows the list of those attributes that we wish to appear in the result.
Rest of the attributes are eliminated from the table.
 It is denoted by ∏.

Notation: ∏ A1, A2, ..., An (r)

Where

A1, A2, ..., An are attribute names of relation r.

Example: CUSTOMER RELATION


NAME STREET CITY
Jones Main Harrison
Smith North Rye
Hays Main Harrison
Curry North Rye
Johnson Alma Brooklyn
Brooks Senator Brooklyn

Input :

∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
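Project can be sketched the same way; since a relation is a set, the projection also drops any duplicate tuples that arise after columns are removed:

```python
# The CUSTOMER relation from the text: (name, street, city)
CUSTOMER = [
    ("Jones",   "Main",    "Harrison"),
    ("Smith",   "North",   "Rye"),
    ("Hays",    "Main",    "Harrison"),
    ("Curry",   "North",   "Rye"),
    ("Johnson", "Alma",    "Brooklyn"),
    ("Brooks",  "Senator", "Brooklyn"),
]

def project(indexes, relation):
    """pi_{A1..An}(r): keep the chosen columns, dropping duplicate tuples."""
    seen, out = set(), []
    for t in relation:
        p = tuple(t[i] for i in indexes)
        if p not in seen:        # a relation is a set: no duplicates
            seen.add(p)
            out.append(p)
    return out

result = project([0, 2], CUSTOMER)   # ∏ NAME, CITY (CUSTOMER)
print(result)
```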

3. Union Operation ∪ :

 Suppose there are two relations R and S. The union operation contains all the tuples that
are either in R or S or both in R & S.
 It eliminates the duplicate tuples. It is denoted by ∪.

Notation : R ∪ S

A union operation must hold the following condition:

 R and S must have the same number of attributes.


 Duplicate tuples are eliminated automatically.

Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION

CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17

Input:

∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Johnson
Smith
Hayes
Mayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
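Because projected name columns behave like sets, the union query maps directly onto Python's set union; duplicates (such as Smith, who appears in both relations) are removed automatically:

```python
# Projected CUSTOMER_NAME columns of the two relations from the text
DEPOSITOR = ["Johnson", "Smith", "Mayes", "Turner", "Johnson", "Jones", "Lindsay"]
BORROW    = ["Jones", "Smith", "Hayes", "Jackson", "Curry", "Smith", "Williams"]

# R ∪ S: every name appearing in either relation, duplicates eliminated
union = sorted(set(BORROW) | set(DEPOSITOR))
print(union)
print(len(union))  # 10 distinct customer names
```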

4. Set Intersection ∩ :

 Suppose there are two relations R and S. The set intersection operation contains all tuples
that are in both R & S.
 It is denoted by ∩.

Notation : R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Smith
Jones
5. Set Difference - :

 Suppose there are two relations R and S. The set difference operation contains all tuples
that are in R but not in S.
 It is denoted by minus (-).

Notation : R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:

∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
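Intersection and difference likewise map onto Python's set operators over the projected name columns:

```python
# Distinct CUSTOMER_NAME values of the two relations from the text
DEPOSITOR = {"Johnson", "Smith", "Mayes", "Turner", "Jones", "Lindsay"}
BORROW    = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams"}

# R ∩ S: customers who both borrowed and deposited
intersection = sorted(BORROW & DEPOSITOR)
# R - S: customers who borrowed but never deposited
difference = sorted(BORROW - DEPOSITOR)

print(intersection)  # ['Jones', 'Smith']
print(difference)    # ['Curry', 'Hayes', 'Jackson', 'Williams']
```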

6. Cartesian product ( X ) :

 The Cartesian product is used to combine each row in one table with each row in the
other table. It is also known as a cross product.
 It is denoted by X.

Notation: E X D

Example:

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT


1 Smith A
2 Harry C
3 John B

DEPARTMENT

DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal

Input:

EMPLOYEE X DEPARTMENT
Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME


1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
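The Cartesian product pairs every tuple of one relation with every tuple of the other, which is exactly `itertools.product`:

```python
from itertools import product

# The EMPLOYEE and DEPARTMENT relations from the text
EMPLOYEE   = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# E X D: concatenate every employee tuple with every department tuple
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]
print(len(cross))  # 9 = 3 rows x 3 rows
print(cross[0])    # (1, 'Smith', 'A', 'A', 'Marketing')
```

A join is this cross product followed by a selection, e.g. keeping only rows where EMP_DEPT equals DEPT_NO.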

7. Rename Operation (ρ):

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to STUDENT1.

ρ(STUDENT1, STUDENT)

Note: Apart from these common operations, relational algebra also supports join operations.

Relational Calculus

 Relational calculus is a non-procedural query language.
 In a non-procedural query language, the user is not concerned with the details of how to
obtain the end results.
 Relational calculus tells what to do but never explains how to do it.

Types of Relational calculus:

1. Tuple Relational Calculus (TRC)

 The tuple relational calculus is specified to select the tuples in a relation. In TRC,
filtering variable uses the tuples of a relation.
 The result of the relation can have one or more tuples.

Notation : {T | P (T)} or {T | Condition (T)}

Where ,

T is the resulting tuples

P(T) is the condition used to fetch T.

For example ,

{ T.name | Author(T) AND T.article = 'database' }

OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with
'name' from Author who has written an article on 'database'.

TRC (Tuple Relation Calculus) can be quantified. In TRC, we can use Existential (∃) and
Universal Quantifiers (∀).

For example:

{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}

Output: This query will yield the same result as the previous one.
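A TRC expression reads naturally as a set comprehension: "the set of names T.name such that T is an Author tuple and T.article = 'database'". A sketch in Python, where the Author tuples (Codd, Knuth, Gray) are invented sample data:

```python
# Sample Author relation; each dict is one tuple T with attributes name, article
Author = [
    {"name": "Codd",  "article": "database"},
    {"name": "Knuth", "article": "algorithms"},
    {"name": "Gray",  "article": "database"},
]

# { T.name | Author(T) AND T.article = 'database' }
result = {T["name"] for T in Author if T["article"] == "database"}
print(sorted(result))  # ['Codd', 'Gray']
```

The comprehension states only the condition the answers must satisfy, not an evaluation order, mirroring the non-procedural character of the calculus.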

2. Domain Relational Calculus (DRC)

 The second form of relational calculus is known as domain relational calculus. In domain
relational calculus, the filtering variables use the domains of attributes.
 Domain relational calculus uses the same operators as tuple calculus. It uses the logical
connectives ∧ (and), ∨ (or) and ¬ (not).
 It uses Existential (∃) and Universal (∀) quantifiers to bind the variables.

Notation : { a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}

Where ,

a1, a2, ..., an are attributes

P stands for a formula built from inner attributes

For example: {<article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database'}

Output: This query will yield the article, page, and subject from the relation javatpoint,
where the subject is 'database'.
Structured Query Language ( SQL )

Basics :

 SQL stands for Structured Query Language. It is used for storing and managing data
in relational database management systems (RDBMS).
 It is a standard language for Relational Database System. It enables a user to create,
read, update and delete relational databases and tables.
 All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server use SQL as
their standard database language.
 SQL allows users to query the database in a number of ways, using English-like
statements.

SQL process:

 When an SQL command is executed against an RDBMS, the system figures out the best
way to carry out the request, and the SQL engine determines how to interpret the
task.
 Various components take part in this process, such as the optimization engine, query
engine, query dispatcher, and classic query engine.
 All non-SQL queries are handled by the classic query engine, but the SQL query engine
does not handle logical files.

SQL Datatype

 SQL Datatype is used to define the values that a column can contain.
 Every column is required to have a name and data type in the database table.

Datatype of SQL
1. Binary Datatypes

There are three types of binary datatypes, which are given below:

Data Type   Description

binary      It has a maximum length of 8000 bytes. It contains fixed-length binary data.
varbinary   It has a maximum length of 8000 bytes. It contains variable-length binary data.
image       It has a maximum length of 2,147,483,647 bytes. It contains variable-length binary data.

2. Approximate Numeric Datatype :

The subtypes are given below:

Data type   From           To            Description

float       -1.79E + 308   1.79E + 308   It is used to specify a floating-point value, e.g. 6.2, 2.9 etc.
real        -3.40E + 38    3.40E + 38    It specifies a single-precision floating-point number.

3. Exact Numeric Datatype

The subtypes are given below:

Data type Description


Int It is used to specify an integer value.
smallint It is used to specify small integer value.
Bit It has the number of bits to store.
decimal It specifies a numeric value that can have a decimal number.
numeric It is used to specify a numeric value.

4. Character String Datatype

The subtypes are given below:

Data type   Description

char        It has a maximum length of 8000 characters. It contains fixed-length non-Unicode characters.
varchar     It has a maximum length of 8000 characters. It contains variable-length non-Unicode characters.
text        It has a maximum length of 2,147,483,647 characters. It contains variable-length non-Unicode characters.

5. Date and time Datatypes

The subtypes are given below:

Datatype Description
Date It is used to store the year, month, and days value.
Time It is used to store the hour, minute, and second values.
timestamp It stores the year, month, day, hour, minute, and the second value.

SQL Set Operation

The SQL Set operation is used to combine the two or more SQL SELECT statements.

Types of Set Operation

1. Union
2. UnionAll
3. Intersect
4. Minus

1. Union

 The SQL Union operation is used to combine the result of two or more SQL SELECT
queries.
 In the union operation, the number of columns and their datatypes must be the same in
both the tables on which the UNION operation is applied.
 The union operation eliminates duplicate rows from its result set.

Syntax

SELECT * FROM t_employees UNION SELECT * FROM t2_employees;

Example:

The First table

ID NAME
1 Jack
2 Harry
3 Jackson

The Second table

ID NAME
3 Jackson
4 Stephan
5 David

Union SQL query will be:

1. SELECT * FROM First


2. UNION
3. SELECT * FROM Second;

The result set table will look like:

ID NAME
1 Jack
2 Harry
3 Jackson
4 Stephan
5 David

2. Union All

Union All operation is similar to the Union operation. It returns the result set without
removing duplicates and without sorting the data.

Syntax: SELECT * FROM t_employees UNION ALL SELECT * FROM t2_employees;

Example: Using the above First and Second tables, the Union All query will be:
1. SELECT * FROM First
2. UNION ALL
3. SELECT * FROM Second;

The result set table will look like:

ID NAME
1 Jack
2 Harry
3 Jackson
3 Jackson
4 Stephan
5 David

3. Intersect

 It is used to combine two SELECT statements. The Intersect operation returns the
common rows from both the SELECT statements.
 In the Intersect operation, the number of columns and their datatypes must be the same.
 It has no duplicates, and it arranges the data in ascending order by default.

Syntax: SELECT * FROM t_employees INTERSECT SELECT * FROM t2_employees;

Example:

Using the above First and Second table.

Intersect query will be:

1. SELECT * FROM First


2. INTERSECT
3. SELECT * FROM Second;

The resultset table will look like:

ID NAME
3 Jackson

4. Minus

 It combines the results of two SELECT statements. The Minus operator displays the
rows that are present in the first query but absent in the second query.
 It has no duplicates, and the data is arranged in ascending order by default.

Syntax: SELECT * FROM t_employees MINUS SELECT * FROM t2_employees;

Example

Using the above First and Second table.

Minus query will be:

1. SELECT * FROM First


2. MINUS
3. SELECT * FROM Second;

The resultset table will look like:

ID NAME
1 Jack
2 Harry
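The four set operations above can be checked from Python against an in-memory SQLite database. This is an illustrative sketch: SQLite spells the MINUS operation EXCEPT (MINUS is Oracle's keyword for the same set difference), and the First/Second data comes from the example tables above.

```python
import sqlite3

# Load the First and Second example tables into in-memory SQLite.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE First (ID INT, NAME TEXT);
CREATE TABLE Second (ID INT, NAME TEXT);
INSERT INTO First VALUES (1,'Jack'),(2,'Harry'),(3,'Jackson');
INSERT INTO Second VALUES (3,'Jackson'),(4,'Stephan'),(5,'David');
""")

union = con.execute(
    "SELECT * FROM First UNION SELECT * FROM Second ORDER BY ID").fetchall()
union_all = con.execute(
    "SELECT * FROM First UNION ALL SELECT * FROM Second").fetchall()
intersect = con.execute(
    "SELECT * FROM First INTERSECT SELECT * FROM Second").fetchall()
minus = con.execute(
    "SELECT * FROM First EXCEPT SELECT * FROM Second ORDER BY ID").fetchall()

print(len(union), len(union_all))  # 5 6 -- UNION drops the duplicate row
print(intersect)                   # [(3, 'Jackson')]
print(minus)                       # [(1, 'Jack'), (2, 'Harry')]
```

UNION ALL keeps the duplicated (3, 'Jackson') row, which is why it returns six rows where UNION returns five.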
SQL Aggregate Functions

 SQL aggregate functions are used to perform calculations on multiple rows of a single
column of a table. They return a single value.
 They are also used to summarize the data.

Types of SQL Aggregation Function

1. COUNT FUNCTION

 The COUNT function is used to count the number of rows in a database table. It can work
on both numeric and non-numeric data types.
 COUNT(*) returns the count of all the rows in a specified table, including duplicate
rows and rows containing NULL values.

Syntax

1. COUNT(*)

or

2. COUNT( [ALL|DISTINCT] expression )

Sample table:

PRODUCT_MAST

PRODUCT COMPANY QTY RATE COST


Item1 Com1 2 10 20
Item2 Com2 3 25 75
Item3 Com1 2 30 60
Item4 Com3 5 10 50
Item5 Com2 2 20 40
Item6 Com1 3 25 75
Item7 Com1 5 30 150
Item8 Com1 3 10 30
Item9 Com2 2 25 50
Item10 Com3 4 30 120
Example: COUNT()

1. SELECT COUNT(*)
2. FROM PRODUCT_MAST;

Output:

10

Example: COUNT with WHERE

1. SELECT COUNT(*)
2. FROM PRODUCT_MAST
3. WHERE RATE>=20;

Output:

7

Example: COUNT() with DISTINCT

1. SELECT COUNT(DISTINCT COMPANY)


2. FROM PRODUCT_MAST;

Output:

3

Example: COUNT() with GROUP BY

1. SELECT COMPANY, COUNT(*)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY;

Output:

Com1 5
Com2 3
Com3 2

Example: COUNT() with HAVING

1. SELECT COMPANY, COUNT(*)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY
4. HAVING COUNT(*)>2;

Output:

Com1 5
Com2 3
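Each COUNT variant above can be replayed against the PRODUCT_MAST sample data. SQLite is used here purely for illustration:

```python
import sqlite3

# Recreate the PRODUCT_MAST sample table from above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PRODUCT_MAST "
            "(PRODUCT TEXT, COMPANY TEXT, QTY INT, RATE INT, COST INT)")
con.executemany("INSERT INTO PRODUCT_MAST VALUES (?,?,?,?,?)", [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120),
])

total = con.execute("SELECT COUNT(*) FROM PRODUCT_MAST").fetchone()[0]
with_where = con.execute(
    "SELECT COUNT(*) FROM PRODUCT_MAST WHERE RATE >= 20").fetchone()[0]
distinct_cos = con.execute(
    "SELECT COUNT(DISTINCT COMPANY) FROM PRODUCT_MAST").fetchone()[0]
grouped = con.execute(
    "SELECT COMPANY, COUNT(*) FROM PRODUCT_MAST "
    "GROUP BY COMPANY HAVING COUNT(*) > 2 ORDER BY COMPANY").fetchall()

print(total, with_where, distinct_cos)  # 10 7 3
print(grouped)                          # [('Com1', 5), ('Com2', 3)]
```

HAVING filters the groups after GROUP BY has formed them, which is why Com3 (only 2 rows) drops out.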
2. SUM Function

Sum function is used to calculate the sum of all selected columns. It works on numeric fields
only.

Syntax

1. SUM() or
2. SUM( [ALL|DISTINCT] expression )

Example: SUM()

1. SELECT SUM(COST)
2. FROM PRODUCT_MAST;

Output:

670

Example: SUM() with WHERE

1. SELECT SUM(COST)
2. FROM PRODUCT_MAST
3. WHERE QTY>3;

Output:

320

Example: SUM() with GROUP BY

1. SELECT COMPANY, SUM(COST)
2. FROM PRODUCT_MAST
3. WHERE QTY>3
4. GROUP BY COMPANY;

Output:

Com1 150
Com3 170

Example: SUM() with HAVING

1. SELECT COMPANY, SUM(COST)


2. FROM PRODUCT_MAST
3. GROUP BY COMPANY
4. HAVING SUM(COST)>=170;

Output:

Com1 335
Com3 170
3. AVG function

The AVG function is used to calculate the average value of the numeric type. AVG function
returns the average of all non-Null values.

Syntax

1. AVG() or
2. AVG( [ALL|DISTINCT] expression )

Example:

1. SELECT AVG(COST)
2. FROM PRODUCT_MAST;

Output:

67.00

4. MAX Function

MAX function is used to find the maximum value of a certain column. This function
determines the largest value of all selected values of a column.

Syntax

1. MAX() or
2. MAX( [ALL|DISTINCT] expression )

Example:

1. SELECT MAX(RATE)
2. FROM PRODUCT_MAST;

Output :

30

5. MIN Function

MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.

Syntax

1. MIN() or
2. MIN( [ALL|DISTINCT] expression )

Example:

1. SELECT MIN(RATE)
2. FROM PRODUCT_MAST;
Output:

10
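The remaining aggregate functions can be verified against the same PRODUCT_MAST sample data (SQLite used for illustration):

```python
import sqlite3

# Recreate the PRODUCT_MAST sample table from above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PRODUCT_MAST "
            "(PRODUCT TEXT, COMPANY TEXT, QTY INT, RATE INT, COST INT)")
con.executemany("INSERT INTO PRODUCT_MAST VALUES (?,?,?,?,?)", [
    ("Item1", "Com1", 2, 10, 20), ("Item2", "Com2", 3, 25, 75),
    ("Item3", "Com1", 2, 30, 60), ("Item4", "Com3", 5, 10, 50),
    ("Item5", "Com2", 2, 20, 40), ("Item6", "Com1", 3, 25, 75),
    ("Item7", "Com1", 5, 30, 150), ("Item8", "Com1", 3, 10, 30),
    ("Item9", "Com2", 2, 25, 50), ("Item10", "Com3", 4, 30, 120),
])

total_cost = con.execute("SELECT SUM(COST) FROM PRODUCT_MAST").fetchone()[0]
qty_sum = con.execute(
    "SELECT SUM(COST) FROM PRODUCT_MAST WHERE QTY > 3").fetchone()[0]
avg_cost = con.execute("SELECT AVG(COST) FROM PRODUCT_MAST").fetchone()[0]
max_rate = con.execute("SELECT MAX(RATE) FROM PRODUCT_MAST").fetchone()[0]
min_rate = con.execute("SELECT MIN(RATE) FROM PRODUCT_MAST").fetchone()[0]

print(total_cost, qty_sum, avg_cost, max_rate, min_rate)  # 670 320 67.0 30 10
```

Note that AVG ignores NULL values (none are present here), so its result is SUM(COST)/COUNT(COST) = 670/10 = 67.0.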

Null Values

The SQL NULL is the term used to represent a missing value. A NULL value in a table is a
value in a field that appears to be blank.

A field with a NULL value is a field with no value. It is very important to understand that a
NULL value is different from a zero value or a field that contains spaces.

Syntax

The basic syntax of NULL while creating a table.


SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);

Here, NOT NULL signifies that the column should always accept an explicit value of the given
data type. There are two columns where we did not use NOT NULL, which means these
columns could be NULL.

A field with a NULL value is the one that has been left blank during the record creation.

Example

The NULL value can cause problems when selecting data, because comparing an unknown
value to any other value yields an unknown result, and such rows are not included in the
results. You must use the IS NULL or IS NOT NULL operators to check for a NULL value.

Consider the following CUSTOMERS table having the records as shown below.

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |          |
|  7 | Muffy    |  24 | Indore    |          |
+----+----------+-----+-----------+----------+
Now, following is the usage of the IS NOT NULL operator.

SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY


FROM CUSTOMERS
WHERE SALARY IS NOT NULL;

This would produce the following result −

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
+----+----------+-----+-----------+----------+

Now, following is the usage of the IS NULL operator.

SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY


FROM CUSTOMERS
WHERE SALARY IS NULL;

This would produce the following result −

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 6 | Komal | 22 | MP | |
| 7 | Muffy | 24 | Indore | |
+----+----------+-----+-----------+----------+
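The point of IS NULL / IS NOT NULL can be demonstrated from Python with the CUSTOMERS sample data above. A plain `= NULL` comparison never matches any row, because comparing with NULL yields unknown (SQLite used for illustration):

```python
import sqlite3

# CUSTOMERS table from above; Komal and Muffy have NULL salaries.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT NOT NULL, "
            "AGE INT NOT NULL, ADDRESS TEXT, SALARY REAL)")
con.executemany("INSERT INTO CUSTOMERS VALUES (?,?,?,?,?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00), (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00), (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00), (6, "Komal", 22, "MP", None),
    (7, "Muffy", 24, "Indore", None),
])

eq_null = con.execute(
    "SELECT COUNT(*) FROM CUSTOMERS WHERE SALARY = NULL").fetchone()[0]
is_null = con.execute(
    "SELECT COUNT(*) FROM CUSTOMERS WHERE SALARY IS NULL").fetchone()[0]
not_null = con.execute(
    "SELECT COUNT(*) FROM CUSTOMERS WHERE SALARY IS NOT NULL").fetchone()[0]

print(eq_null, is_null, not_null)  # 0 2 5 -- '=' never matches NULL
```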

Sub Queries

A Subquery or Inner query or a Nested query is a query within another SQL query, usually
embedded within the WHERE clause.

A subquery is used to return data that will be used in the main query as a condition to further
restrict the data to be retrieved.

Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements
along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.

Subqueries with the SELECT Statement

Subqueries are most frequently used with the SELECT statement. The basic syntax is as
follows −

SELECT column_name [, column_name ]


FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])

Example

Consider the CUSTOMERS table having the following records −

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

Now, let us check the following subquery with a SELECT statement.

SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;

This would produce the following result.

+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
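The subquery-in-WHERE example above can be replayed from Python (SQLite used for illustration, with the CUSTOMERS sample data):

```python
import sqlite3

# Recreate the CUSTOMERS sample table from above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL);
INSERT INTO CUSTOMERS VALUES
 (1,'Ramesh',35,'Ahmedabad',2000), (2,'Khilan',25,'Delhi',1500),
 (3,'kaushik',23,'Kota',2000), (4,'Chaitali',25,'Mumbai',6500),
 (5,'Hardik',27,'Bhopal',8500), (6,'Komal',22,'MP',4500),
 (7,'Muffy',24,'Indore',10000);
""")

# The inner query picks the IDs with SALARY > 4500;
# the outer query then fetches those customers.
rows = con.execute("""
    SELECT ID, NAME FROM CUSTOMERS
    WHERE ID IN (SELECT ID FROM CUSTOMERS WHERE SALARY > 4500)
    ORDER BY ID""").fetchall()
print(rows)  # [(4, 'Chaitali'), (5, 'Hardik'), (7, 'Muffy')]
```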

Subqueries with the INSERT Statement

Subqueries also can be used with INSERT statements. The INSERT statement uses the data
returned from the subquery to insert into another table. The selected data in the subquery
can be modified with any of the character, date or number functions.

The basic syntax is as follows.

INSERT INTO table_name [ (column1 [, column2 ]) ]


SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]

Example

Consider a table CUSTOMERS_BKP with a similar structure as the CUSTOMERS table. Now to
copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you can use the
following syntax.

SQL> INSERT INTO CUSTOMERS_BKP


SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS) ;

Subqueries with the UPDATE Statement

The subquery can be used in conjunction with the UPDATE statement. Either single or
multiple columns in a table can be updated when using a subquery with the UPDATE
statement.

The basic syntax is as follows.

UPDATE table
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME)
[ WHERE) ]

Example

Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS
table, the following example sets SALARY to 0.25 times its value in the CUSTOMERS table
for all the customers whose AGE is greater than or equal to 27.

SQL> UPDATE CUSTOMERS


SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );

This would impact two rows and finally CUSTOMERS table would have the following records.

+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Subqueries with the DELETE Statement

The subquery can be used in conjunction with the DELETE statement like with any other
statements mentioned above.

The basic syntax is as follows.

DELETE FROM TABLE_NAME


[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME)
[ WHERE) ]

Example

Assuming we have a CUSTOMERS_BKP table available which is a backup of the CUSTOMERS
table, the following example deletes the records from the CUSTOMERS table for all the
customers whose AGE is greater than or equal to 27.

SQL> DELETE FROM CUSTOMERS


WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );

This would impact two rows and finally the CUSTOMERS table would have the following
records.

+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
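Both subquery-driven statements above can be replayed in one script. This is an illustrative sketch on the sample data (SQLite), running the UPDATE first and then the DELETE:

```python
import sqlite3

# Recreate CUSTOMERS and take a backup copy, as in the examples above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (ID INT, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL);
INSERT INTO CUSTOMERS VALUES
 (1,'Ramesh',35,'Ahmedabad',2000), (2,'Khilan',25,'Delhi',1500),
 (3,'kaushik',23,'Kota',2000), (4,'Chaitali',25,'Mumbai',6500),
 (5,'Hardik',27,'Bhopal',8500), (6,'Komal',22,'MP',4500),
 (7,'Muffy',24,'Indore',10000);
CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS;
""")

# UPDATE with a subquery: quarter the salary of customers whose AGE
# appears in the backup with AGE >= 27 (Ramesh and Hardik).
con.execute("UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 WHERE AGE IN "
            "(SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)")
hardik_salary = con.execute(
    "SELECT SALARY FROM CUSTOMERS WHERE NAME = 'Hardik'").fetchone()[0]
print(hardik_salary)  # 2125.0

# DELETE with a subquery: remove those same customers.
con.execute("DELETE FROM CUSTOMERS WHERE AGE IN "
            "(SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)")
remaining = con.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
print(remaining)  # 5
```

The subquery runs against the backup table, so the rows being changed or deleted do not affect the condition while the statement executes.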

SQL Join Relations

As the name shows, JOIN means to combine something. In case of SQL, JOIN means "to
combine two or more tables".

In SQL, JOIN clause is used to combine the records from two or more tables in a database.

Types of SQL JOIN

1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN
Sample Table

EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE


1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los Angeles 200000 36

6 Marry Canada 600000 48

PROJECT

PROJECT_NO EMP_ID DEPARTMENT


101 1 Testing
102 2 Development
103 3 Designing
104 4 Development

1. INNER JOIN

In SQL, INNER JOIN selects records that have matching values in both tables as long as the
condition is satisfied. It returns the combination of all rows from both the tables where the
condition satisfies.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. INNER JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. INNER JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development

2. LEFT JOIN

The SQL LEFT JOIN returns all the rows from the left table and the matching rows from the
right table. If there is no matching row in the right table, it returns NULL.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. LEFT JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. LEFT JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL

3. RIGHT JOIN

In SQL, RIGHT JOIN returns all the rows from the right table and the matched values from
the left table. If there is no matching row in the left table, it returns NULL.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. RIGHT JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. RIGHT JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development

4. FULL JOIN

In SQL, FULL JOIN is the result of a combination of both left and right outer joins. The joined
table has all the records from both tables and puts NULL in place of matches not found.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....


2. FROM table1
3. FULL JOIN table2
4. ON table1.matching_column = table2.matching_column;

Query

1. SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT


2. FROM EMPLOYEE
3. FULL JOIN PROJECT
4. ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
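The INNER and LEFT join outputs above can be reproduced from Python with the EMPLOYEE/PROJECT sample data. A sketch in SQLite (note that older SQLite versions lack FULL JOIN; it can be emulated as the LEFT JOIN result UNION the right-side rows with no match):

```python
import sqlite3

# Recreate the EMPLOYEE and PROJECT sample tables from above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INT, EMP_NAME TEXT, CITY TEXT, SALARY INT, AGE INT);
CREATE TABLE PROJECT (PROJECT_NO INT, EMP_ID INT, DEPARTMENT TEXT);
INSERT INTO EMPLOYEE VALUES
 (1,'Angelina','Chicago',200000,30), (2,'Robert','Austin',300000,26),
 (3,'Christian','Denver',100000,42), (4,'Kristen','Washington',500000,29),
 (5,'Russell','Los Angeles',200000,36), (6,'Marry','Canada',600000,48);
INSERT INTO PROJECT VALUES
 (101,1,'Testing'), (102,2,'Development'),
 (103,3,'Designing'), (104,4,'Development');
""")

# INNER JOIN: only employees with a matching PROJECT row.
inner = con.execute(
    "SELECT E.EMP_NAME, P.DEPARTMENT FROM EMPLOYEE E "
    "INNER JOIN PROJECT P ON P.EMP_ID = E.EMP_ID").fetchall()

# LEFT JOIN: every employee; unmatched ones get NULL (None) departments.
left = con.execute(
    "SELECT E.EMP_NAME, P.DEPARTMENT FROM EMPLOYEE E "
    "LEFT JOIN PROJECT P ON P.EMP_ID = E.EMP_ID "
    "ORDER BY E.EMP_ID").fetchall()

print(len(inner))  # 4 matched rows
print(sum(1 for _, dept in left if dept is None))  # 2 (Russell, Marry)
```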
What is Embedded SQL?

As we have seen in our previous tutorials, SQL (Structured Query Language) is the language
that we use to perform operations and transactions on databases. Embedded SQL places SQL
statements directly inside the source code of a host programming language.

Some of the prominent examples of host languages with which we embed SQL are as follows:

 C++
 Java
 Python etc.

Dynamic SQL

Dynamic SQL is a programming technique that could be used to write SQL queries during
runtime. Dynamic SQL could be used to create general and flexible SQL queries.

The syntax for dynamic SQL is to build the statement as a string, as below :

'SELECT statement';

To run a dynamic SQL statement, run the stored procedure sp_executesql as shown below :

EXEC sp_executesql N'SELECT statement';

Use prefix N with the sp_executesql to use dynamic SQL as a Unicode string.

Steps to use Dynamic SQL :

1. Declare two variables, @var1 for holding the name of the table and @var2 for holding
   the dynamic SQL :

   DECLARE
   @var1 NVARCHAR(MAX),
   @var2 NVARCHAR(MAX);

2. Set the value of the @var1 variable to table_name :

   SET @var1 = N'table_name';

3. Create the dynamic SQL by adding the SELECT statement to the table name
   parameter :

   SET @var2 = N'SELECT *
   FROM ' + @var1;

4. Run the sp_executesql stored procedure by using the @var2 parameter :

   EXEC sp_executesql @var2;

Example –

SELECT *
from geek;
Table – Geek

ID NAME CITY
1 Khushi Jaipur
2 Neha Noida
3 Meera Delhi

Using Dynamic SQL :

DECLARE
@tab NVARCHAR(128),
@st NVARCHAR(MAX);
SET @tab = N'geek';
SET @st = N'SELECT *
FROM ' + @tab;
EXEC sp_executesql @st;

Table – Geek

ID NAME CITY
1 Khushi Jaipur
2 Neha Noida
3 Meera Delhi
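The same idea can be sketched in Python: the SQL text is assembled at runtime instead of being fixed at compile time. Table names cannot be bound as parameters, so this sketch validates the name against a whitelist before splicing it in (ordinary values should always go through `?` placeholders); the geek table data is from the example above:

```python
import sqlite3

# Sample table from the example above, in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE geek (ID INT, NAME TEXT, CITY TEXT);
INSERT INTO geek VALUES (1,'Khushi','Jaipur'),(2,'Neha','Noida'),(3,'Meera','Delhi');
""")

def select_all(table_name):
    """Build and run a SELECT statement at runtime (dynamic SQL)."""
    allowed = {"geek"}  # whitelist guards against SQL injection
    if table_name not in allowed:
        raise ValueError("unknown table")
    sql = "SELECT * FROM " + table_name  # assembled at runtime, like @st above
    return con.execute(sql).fetchall()

print(select_all("geek"))
# [(1, 'Khushi', 'Jaipur'), (2, 'Neha', 'Noida'), (3, 'Meera', 'Delhi')]
```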

Embedded SQL

Embedded SQL involves the placement of SQL language constructs in procedural language
code. Precompilation translates the embedded SQL into calls to Pro*Ada runtime library
procedures that handle the interaction between your program and the Oracle Server. After
precompilation, you simply compile the resulting source files using your standard, supported
Ada compiler, then build the application in the normal way. Pro*Ada supplies all the
necessary library procedures.

Additional Information: Consult your platform-specific Oracle documentation for details on
creating an Ada library for use with Pro*Ada.

Embedded SQL is an ANSI and ISO standard. It supports an extended method of database
access above and beyond interactive SQL. The SQL statements that you can embed in an Ada
program are a superset of the SQL statements that are supported by interactive SQL, and
may have a slightly different syntax. For example, using an interactive SQL tool such as
SQL*Plus, you can issue the statement

SELECT ename, sal


FROM emp
WHERE empno = &EMP_NUMBER;
SQL*Plus prompts you for the value of the substitution variable EMP_NUMBER, and once
you have entered this, it displays the results. In a Pro*Ada program, the equivalent embedded
SQL statement is:

EXEC SQL SELECT ename, sal


INTO :EMPLOYEE_NAME, :EMPLOYEE_SALARY
FROM emp
WHERE empno = :EMP_NUMBER;

Other SQL Functions

 AVG() - Returns the average value.


 COUNT() - Returns the number of rows.
 FIRST() - Returns the first value.
 LAST() - Returns the last value.
 MAX() - Returns the largest value.
 MIN() - Returns the smallest value.
 SUM() - Returns the sum.

Difference between Data Security and Data Integrity :

S.No.  Data Security                               Data Integrity

1.     Data security refers to the prevention      Data integrity refers to the quality of
       of data corruption through the use of       data, which assures the data is complete
       controlled access mechanisms.               and has a whole structure.
2.     Its motive is the protection of data.       Its motive is the validity of data.
3.     It ensures that only the people who         It checks that the data is correct and
       should have access to the data can          not corrupt.
       access it.
4.     It refers to making sure that data is       It refers to the structure of the data and
       accessed by its intended users, thus        how it matches the schema of the database.
       ensuring the privacy and protection
       of data.
5.     Some popular means of data security         Some means to preserve integrity are
       are backing up, authentication/             error detection, designing a suitable
       authorization, masking, and encryption.     user interface, and correcting data.
6.     It relates to the physical protection       It relates to the logical protection
       of data against accidental or               (correctness, completeness and
       intentional loss, misuse and                consistency) of data.
       destruction.
7.     It avoids unauthorized access of data.      It avoids human error when data is
                                                   entered.
8.     It can be implemented through :             It can be implemented by following rules :
        user accounts (passwords)                  Primary Key
        authentication schemes                     Foreign Key
                                                    Relationship
Unit 3 : DATA NORMALIZATION

Pitfalls in Relational Database Design

Relational database design requires that we find a "good" collection of relation schemas. A bad
design may lead to repetition of information and the inability to represent certain information.

Design Goals:
1. Avoid redundant data
2. Ensure that relationships among attributes are represented
3. Facilitate the checking of updates for violation of database integrity constraints.
Example: Consider the relation schema:

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

Branch-name  Branch-city  Assets   Customer-name  Loan-number  Amount

Downtown     Brooklyn     9000000  Jones          L-17         1000
Redwood      Palo Alto    2100000  Smith          L-23         2000
Perryridge   Horseneck    1700000  Hayes          L-15         1500
Downtown     Brooklyn     9000000  Jackson        L-14         1500

Problems:

Redundancy:
 Data for branch-name, branch-city and assets is repeated for each loan that a branch makes
 This wastes space
 It complicates updating, introducing the possibility of inconsistency of the assets value

Null values:
 Cannot store information about a branch if no loans exist
 Null values can be used, but they are difficult to handle

Relational Decomposition

 When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
 In a database, it breaks the table into multiple tables.
 If the relation has no proper decomposition, then it may lead to problems like loss of
information.
 Decomposition is used to eliminate some of the problems of bad design like anomalies,
inconsistencies, and redundancy.
Types of Decomposition

Lossless Decomposition

 If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
 The lossless decomposition guarantees that the join of relations will result in the same
relation as it was decomposed.
 The decomposition is said to be lossless if the natural join of all the decomposed
relations gives the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME


22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY


22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME


827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME


22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
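The lossless-join check above can be run mechanically: decompose, join back on EMP_ID, and confirm that exactly the original rows return, no spurious extras. A sketch in SQLite with the sample data:

```python
import sqlite3

# The two decomposed relations from the example above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INT, EMP_NAME TEXT, EMP_AGE INT, EMP_CITY TEXT);
CREATE TABLE DEPARTMENT (DEPT_ID INT, EMP_ID INT, DEPT_NAME TEXT);
INSERT INTO EMPLOYEE VALUES
 (22,'Denim',28,'Mumbai'), (33,'Alina',25,'Delhi'),
 (46,'Stephan',30,'Bangalore'), (52,'Katherine',36,'Mumbai'),
 (60,'Jack',40,'Noida');
INSERT INTO DEPARTMENT VALUES
 (827,22,'Sales'), (438,33,'Marketing'), (869,46,'Finance'),
 (575,52,'Production'), (678,60,'Testing');
""")

# Join back on the common column EMP_ID. Lossless decomposition means
# we recover exactly the original five EMPLOYEE_DEPARTMENT rows.
joined = con.execute("""
    SELECT E.EMP_ID, E.EMP_NAME, E.EMP_AGE, E.EMP_CITY, D.DEPT_ID, D.DEPT_NAME
    FROM EMPLOYEE E JOIN DEPARTMENT D ON E.EMP_ID = D.EMP_ID
    ORDER BY E.EMP_ID""").fetchall()

print(len(joined))  # 5 -- no rows lost, no spurious rows
print(joined[0])    # (22, 'Denim', 28, 'Mumbai', 827, 'Sales')
```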

Dependency Preserving

 It is an important constraint of the database.


 In dependency preservation, every functional dependency must be satisfied by at
least one decomposed table.
 If a relation R is decomposed into relation R1 and R2, then the dependencies of R either
must be a part of R1 or R2 or must be derivable from the combination of functional
dependencies of R1 and R2.
 For example, suppose there is a relation R (A, B, C, D) with functional dependency set
(A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is
dependency preserving because FD A->BC is a part of relation R1(ABC).

Functional Dependency

A functional dependency is a relationship between attributes. In a functional dependency,
we can obtain the value of one attribute from the value of another attribute.

For example,
If we know the value of a student's roll number, we can obtain the student's address, marks,
etc. By this, we say that student address and marks are functionally dependent on student
roll number.

Types of functional dependency:

1. Single Valued Functional Dependency


2. Fully Functional Dependency
3. Partial Functional Dependency
4. Transitive Functional Dependency
5. Trivial Functional Dependency

1) Single Valued Functional Dependency:


A simple example of single value functional dependency is when Roll_Number is the primary
key of an entity and Student_Name is some single valued attribute of the entity.

Then,
Roll_Number → Student_Name

Roll_Number Student_Name Student_Address


011 Jayesh Umre Burhanpur
012 Kunal Batra Burhanpur
013 Nilesh Nimbhorkar Ichchapur
014 Aryan Jagdale Ujjain

2) Fully Functional Dependency:


A functional dependency P → Q is a full functional dependency if removal of any attribute A
from P means that the dependency does not hold any more.

Roll_Number Subject_Name Paper_Hours


011 DBMS 3
012 Python 1
013 AWT 3
025 DBMS 2
From above table,

{Roll_Number, Subject_Name} --> Paper_Hours

Since neither Roll_Number --> Paper_Hours nor Subject_Name --> Paper_Hours holds.

3) Partial Functional Dependency:


A Functional Dependency in which one or more non key attributes are functionally depending
on a part of the primary key is called partial functional dependency.

Roll_Number Subject_Name Student_Name


011 DBMS Jayesh Umre
012 Python Kunal Batra
013 AWT Nilesh Nimbhorkar
014 DBMS Aryan Jagdale
From above table,

{Roll_Number, Subject_Name} --> Student_Name is not a full FD.

Since Roll_Number --> Student_Name also holds.

4) Transitive Functional Dependency:


Given a relation R(A,B,C), dependencies like A–>B and B–>C form a transitive dependency,
since A–>C is implied.

Roll_Number Pin_Code City_Name


011 450331 Burhanpur
012 450001 Khandwa
013 456001 Ujjain
014 452020 Indore
From above table,

Roll_Number --> Pin_Code and Pin_Code --> City_Name hold.

Then Roll_Number --> City_Name is a transitive FD.

5) Trivial Functional Dependency:


Functional dependency of the form A–>B is trivial if B is a subset of A or B = A.

Roll_Number Student_Name

012 Kunal Batra


013 Nilesh Nimbhorkar
014 Aryan Jagdale
From above table,

{Roll_Number, Student_Name} --> Roll_Number is a trivial functional dependency as


Roll_Number is a subset of {Roll_Number,Student_Name}.
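A functional dependency X → Y holds in a table exactly when no two rows agree on X but disagree on Y. That rule can be checked mechanically; the sketch below (an illustration, not a standard library routine) verifies the transitive example above:

```python
def holds(rows, X, Y):
    """Return True if the functional dependency X -> Y holds in rows."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        if key in seen and seen[key] != val:
            return False  # two rows agree on X but disagree on Y
        seen[key] = val
    return True

# Sample data from the transitive FD table above.
students = [
    {"Roll_Number": "011", "Pin_Code": "450331", "City_Name": "Burhanpur"},
    {"Roll_Number": "012", "Pin_Code": "450001", "City_Name": "Khandwa"},
    {"Roll_Number": "013", "Pin_Code": "456001", "City_Name": "Ujjain"},
    {"Roll_Number": "014", "Pin_Code": "452020", "City_Name": "Indore"},
]

print(holds(students, ["Roll_Number"], ["Pin_Code"]))   # True
print(holds(students, ["Pin_Code"], ["City_Name"]))     # True
# Transitivity: the two FDs above imply Roll_Number -> City_Name.
print(holds(students, ["Roll_Number"], ["City_Name"]))  # True
```

Note that a check over a data sample can only refute an FD; whether the dependency truly holds is a property of the schema, not of one table instance.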

Normalization

 Normalization is the process of organizing the data in the database.


 Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate the undesirable characteristics like Insertion, Update and
Deletion Anomalies.
 Normalization divides the larger table into smaller tables and links them using
relationships.
 The normal form is used to reduce redundancy from the database table.

Types of Normal Forms

The main normal forms are given below:

Normal Form Description


1NF A relation is in 1NF if every attribute contains only atomic values.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists.
BCNF A relation will be in BCNF if it is in 3NF and, for every functional
dependency, the left-hand side is a super key.
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no
multi-valued dependency.
5NF A relation is in 5NF if it is in 4NF, contains no join dependency, and
joining is lossless.
First Normal Form (1NF)

 A relation will be in 1NF if every attribute contains only atomic values.


 It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attributes.
 First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute


EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab

Second Normal Form (2NF)

 For 2NF, the relation must first be in 1NF.


 In the second normal form, all non-key attributes must be fully functionally dependent
on the primary key.

Example: Let's assume, a school can store the data of teachers and the subjects they teach.
In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE


25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID,
which is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer

Third Normal Form (3NF)

 A relation will be in 3NF if it is in 2NF and does not contain any transitive
dependency for non-prime attributes.
 3NF is used to reduce data duplication and to achieve data integrity.
 If there is no transitive dependency for non-prime attributes, then the relation is in
third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY


222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP} .... so on

Candidate key: {EMP_ID}


Non-prime attributes: In the given table, all attributes except EMP_ID are non-
prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. So the non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent
on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY


201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
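The transitive dependency can be checked mechanically on sample data. A small Python sketch (the data is taken from the tables above; `determines` is a hypothetical helper, not a standard API):

```python
# Test whether the attribute at index x functionally determines the
# attribute at index y in a sample relation.
def determines(rows, x, y):
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False              # same X value maps to two Y values
        seen[row[x]] = row[y]
    return True

# (EMP_ID, EMP_NAME, EMP_ZIP, EMP_STATE, EMP_CITY)
emp = [
    (222, "Harry",     "201010", "UP", "Noida"),
    (333, "Stephan",   "02228",  "US", "Boston"),
    (444, "Lan",       "60007",  "US", "Chicago"),
    (555, "Katharine", "06389",  "UK", "Norwich"),
    (666, "John",      "462007", "MP", "Bhopal"),
]

print(determines(emp, 0, 2))  # EMP_ID  -> EMP_ZIP : True
print(determines(emp, 2, 4))  # EMP_ZIP -> EMP_CITY: True, hence the transitive chain
```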

Boyce Codd normal form (BCNF)

 BCNF is the advanced version of 3NF. It is stricter than 3NF.


 A table is in BCNF if, for every functional dependency X → Y, X is a super key of the
table.
 In other words, for BCNF the table should be in 3NF and, for every FD, the LHS must be a super key.

Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

In the above table Functional dependencies are as follows:


1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a super key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO


Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID


For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF, because the left-hand side of each functional dependency is a key in its table.
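That the decomposition is lossless can be verified by rejoining the three tables. A Python sketch (data is from the example; the dict/list structures are illustrative):

```python
# Rejoin the three BCNF tables and check that the original EMPLOYEE
# relation comes back exactly (lossless decomposition).
emp_country = {264: "India", 364: "UK"}
dept_info = {"Designing": ("D394", 283), "Testing": ("D394", 300),
             "Stores": ("D283", 232), "Developing": ("D283", 549)}
mapping = [(264, "Designing"), (264, "Testing"),
           (364, "Stores"), (364, "Developing")]

rejoined = [(eid, emp_country[eid], dept) + dept_info[dept]
            for eid, dept in mapping]
for row in rejoined:
    print(row)
# (264, 'India', 'Designing', 'D394', 283) ... four rows in total
```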

Fourth normal form (4NF)

 A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
 For a dependency A → B, if multiple values of B exist for a single value of A, then
A → B is a multi-valued dependency.
Example

STUDENT

STU_ID COURSE HOBBY


21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes; there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and
Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency on
STU_ID, which leads to unnecessary repetition of data. To bring the table into 4NF, we
decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
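The effect of the decomposition can be seen by joining the two tables back on STU_ID: every course/hobby combination per student appears, which is exactly what a multi-valued dependency implies. A Python sketch with the example data:

```python
from itertools import product

# After the 4NF decomposition, COURSE and HOBBY are stored independently.
# Joining on STU_ID yields every course/hobby combination per student.
student_course = {21: ["Computer", "Math"], 34: ["Chemistry"],
                  74: ["Biology"], 59: ["Physics"]}
student_hobby = {21: ["Dancing", "Singing"], 34: ["Dancing"],
                 74: ["Cricket"], 59: ["Hockey"]}

joined = [(sid, c, h)
          for sid in student_course
          for c, h in product(student_course[sid], student_hobby[sid])]
print(len(joined))   # 7 rows: student 21 alone contributes 2 x 2 = 4
```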

Fifth normal form (5NF)

 A relation is in 5NF if it is in 4NF, contains no join dependency, and every join is
lossless.
 5NF is satisfied when all the tables are broken into as many tables as possible in order
to avoid redundancy.
 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he
doesn't take the Math class for Semester 2. In this case, a combination of all three fields is
required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
taking it, so we would have to leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we cannot leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math

P2

SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

P3

SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
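The 5NF decomposition can be checked for losslessness by joining the three projections back together. A Python sketch (sets of tuples stand in for the relations P1, P2, P3):

```python
# Join P1 (semester, subject), P2 (subject, lecturer) and P3
# (semester, lecturer) and check the original relation comes back.
p1 = {("Semester 1", "Computer"), ("Semester 1", "Math"),
      ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
p2 = {("Computer", "Anshika"), ("Computer", "John"),
      ("Math", "John"), ("Math", "Akash"), ("Chemistry", "Praveen")}
p3 = {("Semester 1", "Anshika"), ("Semester 1", "John"),
      ("Semester 2", "Akash"), ("Semester 1", "Praveen")}

rejoined = {(subj, lect, sem)
            for sem, subj in p1
            for s2, lect in p2 if s2 == subj
            if (sem, lect) in p3}
print(sorted(rejoined))   # exactly the five original tuples
```

Note how the third projection P3 filters out spurious tuples such as (Math, Akash, Semester 1) that the two-way join P1 ⋈ P2 alone would produce; this is why all three projections are needed.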
Unit 4 : STORAGE AND FILE ORGANIZATION

DBMS Storage System

Databases are stored in record-containing file formats. At the physical level, the actual data is
stored on some devices in electromagnetic format. These storage devices can be classified
generally into three types.

 Primary Storage − This category contains the memory storage that is directly
available to the CPU. The internal memory (registers), fast memory (cache), and main
memory ( RAM) of the CPU are directly accessible to the CPU since they are all placed
on the chipset of the motherboard or CPU. Usually, this storage is very small, ultra-
fast, and volatile. In order to maintain its condition, primary storage requires a
continuous power supply. All the data is lost in the event of a power failure.
 Secondary Storage − Secondary storage devices are used for future use or as backup
data storage. Secondary storage covers memory devices, such as magnetic disks,
optical disks (DVDs, CDs, etc.), disk drives, flash drives, and magnetic tapes, that are not
part of the CPU chipset or motherboard.
 Tertiary Storage − Tertiary storage is used to store immense data volumes. They are
the slowest in speed because such storage devices are external to the computer system.
For the most part, these storage devices are used to back up an entire system. Tertiary
storage is commonly used for optical disks and magnetic tapes.

Memory Hierarchy

A computer system has a well-established memory hierarchy. A CPU has direct access to its
primary memory and its built-in registers. However, the access time of the main memory is
much longer than the CPU's cycle time, so cache memory is introduced to reduce this speed
mismatch. Cache memory gives the fastest access time and holds the data that is most
frequently accessed by the CPU.

The memory with the fastest access is also the most expensive. Larger storage devices are
slower and less expensive, but compared to CPU registers or cache memory, they can hold huge
amounts of data.

Disks
Disks are online devices that can be accessed directly. Typical database applications
need only a small portion of the database at a time for processing. Many files can be stored on a
single disk storage device with little wasted space. Files can be opened and accessed
practically instantaneously.

Magnetic disk is the secondary storage device used to support direct access to a desired
location.

Parts in Magnetic disk

The different parts that are present in magnetic disk or hard disk are explained below. All
these parts are helpful to read, write and store the data in the hard disk.

 Disk blocks − The unit of data transfer between disk and main memory is a block. A disk
block is a contiguous sequence of bytes.
 Track − Blocks are arranged in concentric rings called tracks.
 Sectors − A sector is the smallest unit of information that can be read from or written
to disk; a typical sector size is 512 bytes.
 Platters − The surface of the platter is covered with a magnetic material. Information is
recorded on this surface. The set of all tracks with the same diameter is called a
cylinder. Typical platter diameters are 3.5 inch or 5.2 inch.
 Read-write head − Each platter has a read-write head on both sides. It is used for
reading and writing the data on a platter.
 Disk controller − A disk controller interfaces a disk driver to the computer.

Calculate the performance of a hard disk

The time to read or write a block varies depending on the location of data. The performance of
a hard disk can be calculated by using the below mentioned formula.

Access time = seek time + rotational delay + transfer time

Here,

 Seek time − The time to move the disk-head to the track on which a desired block is
located.
 Rotational delay − It is the waiting time for the desired block to rotate under disk
head.
 Transfer time − It is the time to read or write the data in the block once the head is
positioned.
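The formula can be evaluated numerically. A Python sketch with illustrative drive parameters (not the figures of any specific disk):

```python
# access time = seek time + rotational delay + transfer time
# Rotational delay is, on average, half a revolution.
def access_time_ms(seek_ms, rpm, block_bytes, transfer_rate_mb_s):
    rotational_delay = (60_000 / rpm) / 2                    # ms, half a revolution
    transfer = block_bytes / (transfer_rate_mb_s * 1_000_000) * 1000  # ms
    return seek_ms + rotational_delay + transfer

# 4 ms average seek, 7200 RPM, 4 KB block, 100 MB/s sustained transfer
print(round(access_time_ms(4, 7200, 4096, 100), 3))   # 8.208 ms
```

As the numbers show, seek time and rotational delay dominate; the transfer of a single block is comparatively negligible.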

Redundant Array of Independent Disks ( RAID )

RAID, or Redundant Array of Independent Disks, is a technology to connect multiple
secondary storage devices and use them as a single storage medium.

RAID consists of an array of disks in which multiple disks are connected together to achieve
different goals. RAID levels define the use of disk arrays.
RAID 0

In this level, a striped array of disks is implemented. The data is broken down into blocks and
the blocks are distributed among disks. Each disk receives a block of data to write/read in
parallel. It enhances the speed and performance of the storage device. There is no parity and
backup in Level 0.

RAID 1

RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of
data to all the disks in the array. RAID level 1 is also called mirroring and provides 100%
redundancy in case of a failure.

RAID 2

RAID 2 records Error Correction Codes using Hamming distance for its data, striped on
different disks. As in level 0, each data bit in a word is recorded on a separate disk, and the
ECC codes of the data words are stored on a different set of disks. Due to its complex structure
and high cost, RAID 2 is not commercially available.

RAID 3

RAID 3 stripes the data onto multiple disks. The parity bit generated for each data word is
stored on a dedicated parity disk. This technique makes it possible to recover from single disk failures.

RAID 4

In this level, an entire block of data is written onto data disks and then the parity is
generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas
level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to
implement RAID.
RAID 5

RAID 5 writes whole data blocks onto different disks, but the parity bits generated for each
data block stripe are distributed among all the disks rather than stored on a single
dedicated parity disk.

RAID 6

RAID 6 is an extension of level 5. In this level, two independent parities are generated and
stored in distributed fashion among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement RAID.
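The parity mechanism behind RAID levels 3 to 6 is a byte-wise XOR. A minimal Python sketch (block contents are illustrative):

```python
from functools import reduce

# Byte-wise XOR of equal-sized blocks: used both to compute the parity
# block and to rebuild a lost block from the survivors plus the parity.
def parity(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

d0, d1, d2 = b"RAID", b"DEMO", b"DATA"
p = parity([d0, d1, d2])              # parity block for the stripe

# Simulate losing d1, then rebuild it from the remaining blocks + parity
rebuilt = parity([d0, d2, p])
print(rebuilt)                        # b'DEMO'
```

RAID 6 extends this idea with a second, independently computed parity so that two simultaneous disk failures can be survived.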

File Organization

File – A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes, and optical disks.

What is File Organization?


File Organization refers to the logical relationships among the various records that constitute
the file, particularly with respect to the means of identification and access to any specific record.
In simple terms, storing files in a certain order is called file organization. File structure
refers to the format of the label and data blocks and of any logical control record.
Types of File Organizations –

Various methods have been introduced to organize files. These methods have
advantages and disadvantages depending on the mode of access or selection.
Some types of File Organizations are :

 Sequential File Organization


 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization

We will be discussing each of these file organizations in the following sections, along with the
differences and advantages/disadvantages of each method.

Sequential File Organization –

The easiest method of file organization is the sequential method. In this method the records
are stored one after another in a sequential manner. There are two ways to implement this
method:

 Pile File Method – This method is quite simple, in which we store the records in a
sequence i.e one after other in the order in which they are inserted into the tables.

1. Insertion of new record –


Let R1, R3, R5 and R4 be four records in the sequence, in the order in which they were
inserted. Here, a record is simply a row in a table. Suppose a new record R2 has to be
inserted in the sequence; it is simply placed at the end of the file.

 Sorted File Method – In this method, as the name itself suggests, whenever a new
record has to be inserted, it is always inserted in sorted (ascending or descending)
order. Sorting of records may be based on the primary key or on any other key.
1. Insertion of new record –
Let us assume there is a pre-existing sorted sequence of four records R1, R3, R7 and R8.
Suppose a new record R2 has to be inserted; it is first appended at the end of the file, and
then the sequence is re-sorted, placing R2 between R1 and R3.
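The net effect of the sorted file method (append, then re-sort) is that the new record ends up at its sorted position. A Python sketch using illustrative integer record keys:

```python
import bisect

# Sorted-file insertion: the end result of "append then re-sort" is the
# same as inserting the new key directly at its sorted position.
records = [1, 3, 5, 7, 8]        # keys of R1, R3, R5, R7, R8 in sorted order
bisect.insort(records, 2)        # insert R2: lands between R1 and R3
print(records)                   # [1, 2, 3, 5, 7, 8]
```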

Pros and Cons of Sequential File Organization –


Pros –

 Fast and efficient method for huge amounts of data.


 Simple design.
 Files can be easily stored in magnetic tapes i.e cheaper storage mechanism.

Cons –

 Time is wasted because we cannot jump directly to a required record; we have to
move through the file sequentially.
 The sorted file method is inefficient, as sorting records takes extra time and space.

Heap File Organization –

Heap File Organization works with data blocks. In this method records are inserted at the
end of the file, into data blocks. No sorting or ordering is required. If a data block is full,
the new record is stored in some other block; this other data block need not be the very
next data block, but can be any block in memory. It is the responsibility of the DBMS to
store and manage the new records.
Insertion of new record –

Suppose we have five records in the heap, R1, R5, R6, R4 and R3, and a new record R2
has to be inserted. Since the last data block, data block 3, is full, R2 will be inserted into
whichever data block the DBMS selects, let's say data block 1.

If we want to search, delete or update data in heap file organization, we have to traverse the
data from the beginning of the file until we reach the requested record. Thus, if the database
is very large, searching, deleting or updating a record takes a lot of time.

Pros and Cons of Heap File Organization –


Pros –

 Fetching and retrieving records is faster than in sequential organization, but only for
small databases.
 When a huge amount of data needs to be loaded into the database at one time, this
method of file organization is best suited.

Cons –

 Problem of unused memory blocks.


 Inefficient for larger databases.

Hashing is an efficient technique to directly search for the location of desired data on the disk
without using an index structure. Data is stored in data blocks whose addresses are generated
by using a hash function. The memory location where these records are stored is called a data
block or data bucket.

Hash File Organization :

 Data bucket – Data buckets are the memory locations where the records are stored.
These buckets are also considered as Unit Of Storage.

 Hash Function – A hash function is a mapping function that maps the set of search
keys to actual record addresses. Generally, the hash function uses the primary key to
generate the hash index, the address of the data block. The hash function can be any
simple or complex mathematical function.
 Hash Index – The prefix of an entire hash value is taken as a hash index. Every hash
index has a depth value to signify how many bits are used for computing the hash
function. These bits can address 2^n buckets. When all these bits are consumed, the
depth value is increased linearly and twice as many buckets are allocated.

The diagram below depicts how a hash function works:

Hashing is further divided into two sub categories :

Static Hashing –

In static hashing, when a search-key value is provided, the hash function always computes
the same address. For example, if we generate the address for STUDENT_ID = 104 using the
mod(5) hash function, it always results in the same bucket address, 4. The bucket address
does not change here. Hence the number of data buckets in memory remains constant
throughout for static hashing.

Operations –

 Insertion – When a new record is inserted into the table, the hash function h generates
a bucket address for the new record based on its hash key K:
Bucket address = h(K)
 Searching – When a record needs to be searched, the same hash function is used to
retrieve its bucket address. For example, if we want to retrieve the whole record for
ID 104 and the hash function is mod(5), the bucket address generated is 4. We then go
directly to address 4 and retrieve the whole record for ID 104. Here the ID acts as the
hash key.
 Deletion – To delete a record, we first fetch it using the hash function, and then
remove the record from that address in memory.
 Updation – The data record that needs to be updated is first searched using hash
function, and then the data record is updated.
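The static hashing operations above can be sketched in a few lines of Python (the bucket structure and record contents are illustrative):

```python
# Static hashing with h(K) = K mod 5, as in the example: the bucket
# address for a given key never changes.
buckets = {i: [] for i in range(5)}

def h(key):
    return key % 5

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    return [r for k, r in buckets[h(key)] if k == key]

insert(104, "whole record for ID 104")
print(h(104))        # bucket address 4, every time
print(search(104))
```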
Now suppose we want to insert a new record into the file, but the data bucket address
generated by the hash function is not empty, i.e., data already exists at that address. This
situation in static hashing is called bucket overflow, and it must be handled carefully.
How will we insert data in this case?
There are several methods provided to overcome this situation. Some commonly used methods
are discussed below:

1. Open Hashing –
In the open hashing method, the next available data bucket is used to store the new
record, instead of overwriting the older one. This method is also called linear probing.

For example, suppose D3 is a new record that needs to be inserted, and the hash
function generates the address 105. But that bucket is already full, so the system
searches for the next available data bucket, 123, and assigns D3 to it.

2. Closed Hashing –
In the closed hashing method, a new data bucket is allocated with the same address and
is linked after the full data bucket. This method is also known as overflow chaining.
For example, we have to insert a new record D3 into the table. The static hash
function generates the data bucket address 105, but this bucket is full. In this case a
new data bucket is added at the end of the 105 bucket's chain and linked to it, and the
new record D3 is inserted into that new bucket.

o Quadratic probing :
Quadratic probing is very similar to open hashing (linear probing). The
difference is that instead of a fixed linear interval, a quadratic function of
the probe number is used to determine the next bucket address.
o Double Hashing :
Double hashing is another method similar to linear probing. Here the interval
between probes is fixed, as in linear probing, but that interval is computed
using a second hash function. That's why the name is double hashing.
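The two overflow-handling strategies can be sketched as follows; note that this document uses "open hashing" for linear probing (terminology varies between texts). A Python sketch with a fixed, illustrative table size:

```python
# Two ways to handle bucket overflow in a fixed-size table:
# open hashing (linear probing) and closed hashing (overflow chaining).
SIZE = 5

def linear_probe_insert(table, key):
    i = key % SIZE
    while table[i] is not None:      # bucket full: try the next one
        i = (i + 1) % SIZE
    table[i] = key

def chain_insert(table, key):
    table[key % SIZE].append(key)    # bucket full? just extend its chain

probed = [None] * SIZE
for k in (5, 10, 7):                 # 5 and 10 both hash to bucket 0
    linear_probe_insert(probed, k)
print(probed)                        # [5, 10, 7, None, None]

chained = [[] for _ in range(SIZE)]
for k in (5, 10, 7):
    chain_insert(chained, k)
print(chained)                       # [[5, 10], [], [7], [], []]
```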
Dynamic Hashing –

The drawback of static hashing is that it does not expand or shrink dynamically as the
size of the database grows or shrinks. In dynamic hashing, data buckets grow or shrink
(are added or removed dynamically) as the number of records increases or decreases.
Dynamic hashing is also known as extended hashing.

In dynamic hashing, the hash function is made to produce a large number of values. For
example, suppose there are three data records D1, D2 and D3, and the hash function generates
the addresses 1001, 0101 and 1010 respectively. This method of storing considers only part of
the address, here only the first bit, to store the data. So it tries to place the three records at
the addresses 0 and 1.

But then no bucket address remains for D3. The buckets have to grow dynamically to
accommodate D3: the address is extended to 2 bits rather than 1, the existing data is
updated to 2-bit addresses, and then D3 is accommodated.

B+ Tree File Organization –

B+ Tree, as the name suggests, uses a tree-like structure to store records in a file. It uses the concept of key
indexing, where the primary key is used to sort the records. For each primary key, an index value is
generated and mapped to the record. The index of a record is the address of the record in the file.

A B+ tree is very similar to a binary search tree, with the difference that a node can have more than two
children. All the information is stored in the leaf nodes, and the intermediate nodes act as pointers to the
leaf nodes. The information in the leaf nodes always remains a sorted sequential linked list.
In the above diagram, 56 is the root node, which is also called the main node of the tree.
The intermediate nodes contain only the addresses of leaf nodes; they do not contain any actual records.
The leaf nodes contain the actual records. All leaf nodes are balanced.

Pros and Cons of B+ Tree File Organization –

Pros –

 Tree traversal is easier and faster.


 Searching becomes easy, as all records are stored only in the leaf nodes, which form a
sorted sequential linked list.
 There is no restriction on B+ tree size. It may grow/shrink as the size of the data
increases/decreases.

Cons –

 Inefficient for static tables.

Cluster File Organization –

In cluster file organization, two or more related tables/records are stored within the same file, known as a
cluster. These files hold two or more tables in the same data block, and the key attributes used to map
these tables together are stored only once.

Thus it lowers the cost of searching for and retrieving records that would otherwise sit in different files, as
they are now combined and kept in a single cluster.
For example, suppose we have two tables or relations, Employee and Department, that are related to each
other.

These tables can therefore be combined using a join operation and viewed together in a cluster file.
If we have to insert, update or delete any record, we can do so directly. Data is sorted based on the primary
key or the key with which searching is done. The cluster key is the key on which the join of the tables is
performed.

Types of Cluster File Organization – There are two ways to implement this method:

1. Indexed Clusters –
In indexed clustering, the records are grouped based on the cluster key and stored together. The
above-mentioned Employee and Department relationship is an example of an indexed
cluster, where the records are grouped based on the Department ID.
2. Hash Clusters –
This is very similar to an indexed cluster, with the only difference that instead of storing the
records based on the cluster key, we generate a hash key value and store together the records
with the same hash key value.

Fixed-Length Records

Fixed-length records means fixing a record length and storing records of that length in the file. If the
record size exceeds the block size, the record gets divided across more than one block. Due to the fixed
size, the following two problems occur:

1. Partially storing subparts of the record in more than one block requires access to all the
blocks containing the subparts to read or write in it.
2. It is difficult to delete a record in such a file organization, because the space freed by
the deleted record must be filled by another record, or a part of one, so that the block
stays packed.

Variable-Length Records

Variable-length records are the records that vary in size. It requires the creation of multiple
blocks of multiple sizes to store them. These variable-length records are kept in the following
ways in the database system:

1. Storage of multiple record types in a file.


2. It is kept as Record types that enable repeating fields like multisets or arrays.
3. It is kept as Record types that enable variable lengths either for one field or more.

In variable-length records, there exist the following two problems:


1. Defining the way of representing a single record so as to extract the individual
attributes easily.
2. Defining the way of storing variable-length records within a block so as to extract that
record in a block easily.

Slotted-page Structure

There occurs a problem to store variable-length records within the block. Thus, such records
are organized in a slotted-page structure within the block. In the slotted-page structure, a
header is present at the starting of each block. This header holds information such as:

1. The number of record entries in the block

2. The amount of free space remaining in the block
3. An array containing information on the location and size of each record

Inserting and Deleting Method

The variable-length records reside in a contiguous manner within the block.

When a new record is to be inserted, it gets the place at the end of the free space. It is because
free space is contiguous as well. Also, the header fills an entry with the size and location
information of the newly inserted record.

Data Dictionary Storage

So far, we have learned about relations and their representation. A relational database
system maintains all information about a relation or table, from its schema to the
applied constraints. All this metadata is stored; in general, metadata refers to data about
data. The structure that stores the relational schemas and other metadata about the relations
is known as the Data Dictionary or System Catalog.

A data dictionary is like the A-Z dictionary of the relational database system holding all
information of each relation in the database.

The types of information a system must store are:

 Name of the relations


 Name of the attributes of each relation
 Lengths and domains of attributes
 Name and definitions of the views defined on the database
 Various integrity constraints
With this, the system also keeps the following data based on users of the system:

 Name of authorized users


 Accounting and authorization information about users.
 The authentication information for users, such as passwords or other related
information.

In addition to this, the system may also store some statistical and descriptive data
about the relations, such as:

 Number of tuples in each relation


 Method of storage for each relation, such as clustered or non-clustered.

A system may also store the storage organization, whether sequential, hash, or
heap. It also notes the location where each relation is stored:

 If relations are stored in operating-system files, the data dictionary notes and stores
the names of those files.
 If the database stores all the relations in a single file, the data dictionary notes and
stores the blocks containing the records of each relation, in a data structure similar to
a linked list.

At last, it also stores the information regarding each index of all the relations:

 Name of the index.


 Name of the relation being indexed.
 Attributes on which the index is defined.
 The type of index formed.

Unit 5 : QUERY PROCESSING & TRANSACTION MANAGEMENT

Query Processing in DBMS

Query Processing is the activity performed in extracting data from the database. Query
processing takes several steps to fetch the data from the database. The steps involved
are:

1. Parsing and translation


2. Optimization
3. Evaluation

The query processing works in the following way:

Parsing and Translation

 When a user executes any query, in order to generate the internal form of the query,
the parser in the system checks the syntax of the query and verifies the names of the
relations and the required attributes used in the query.

 The parser creates a tree of the query, known as the parse tree, which is then
translated into relational algebra. In the process, any views used in the query are
replaced by their definitions.

Thus, we can understand the working of query processing from the diagram described below:

Suppose a user executes a query. As we have learned, there are various methods of
extracting data from the database. Suppose, in SQL, a user wants to fetch the names of the
employees whose salary is greater than 10000. For this, the following query is
written:

select emp_name from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be translated into
relational algebra. This query can be written in relational algebra as:

 πemp_name (σsalary>10000 (Employee))

Several equivalent relational algebra expressions can exist for the same query (for example,
a selection can be applied before or after a projection), and the system chooses among them.

After translating the given query, each relational algebra operation can be executed using one
of several different algorithms. This is how query processing begins its working.

Evaluation

In addition to the relational algebra translation, it is required to annotate the translated
relational algebra expression with the instructions that specify how to evaluate each
operation. Thus, after translating the user query, the system executes a query evaluation
plan.
Query Evaluation Plan

 In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
 The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
 Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
 Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
 A query execution engine is responsible for generating the output of the given
query. It takes the query execution plan, executes it, and finally makes the output for
the user query.

Optimization

 The cost of query evaluation can vary for different evaluation plans of the same query.
Since the system is responsible for constructing the evaluation plan, the user need not
write the query with efficiency in mind.
 Usually, a database system generates an efficient query evaluation plan that
minimizes the cost. This task, performed by the database system, is known as
Query Optimization.
 For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. It is because the overall operation cost depends on the memory
allocations to several operations, execution costs, and so on.

Finally, after selecting an evaluation plan, the system evaluates the query and produces the
output of the query.

Transaction

 A transaction is a set of logically related operations. It contains a group of tasks.

 A transaction is an action, or series of actions, performed by a single user to access
the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction consists of several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
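Both sequences of tasks must succeed or fail together. A hedged sketch using Python's built-in sqlite3 module (account names and balances are illustrative):

```python
import sqlite3

# The debit on X and the credit on Y are wrapped in one transaction:
# either both updates are committed, or both are rolled back.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)",
                [("X", 1000), ("Y", 500)])
con.commit()

def transfer(src, dst, amount):
    try:
        con.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                    (amount, src))
        con.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                    (amount, dst))
        con.commit()                  # both updates become durable together
    except sqlite3.Error:
        con.rollback()                # neither update survives a failure
        raise

transfer("X", "Y", 800)
print(con.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('X', 200), ('Y', 1300)]
```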

What are ACID Properties?

ACID properties are used for maintaining the integrity of a database during transaction
processing. ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability.

 Atomicity: A transaction is a single unit of operation. You either execute it entirely or
do not execute it at all; there cannot be partial execution.


 Suppose $10 is debited from account A but, due to a failure, is never credited to
account B; account B still holds its old balance. Such a transaction is not atomic.
 If both the debit and the credit operations complete successfully, the transaction is
atomic.
 When a transaction loses atomicity, this becomes a huge issue in banking systems, so
atomicity is a main focus there.

 Consistency: Once the transaction is executed, it should move the database from one
consistent state to another.
 Isolation: A transaction should be executed in isolation from other transactions.
During concurrent transaction execution, intermediate results of simultaneously
executing transactions should not be made available to each other. (Isolation levels
0, 1, 2, and 3 control how strictly this is enforced.)

Example: If two operations are running concurrently on two different accounts, the
value of one account should not affect the other. Suppose account A makes transactions
T1 and T2 to accounts B and C; both execute independently without affecting each
other. This is known as Isolation.


 Durability: After successful completion of a transaction, the changes made to the
database should persist, even in the case of system failures.

States of Transactions

A transaction in a database can be in one of the following states −

 Active state: In this state, the transaction is being executed. This is the initial state
of every transaction.
 Partially committed: When a transaction executes its final operation, it is said to be
in a partially committed state.
 Aborted: When the normal execution can no longer be performed.

Failed or aborted transactions may be restarted later, either automatically or after being
resubmitted by the user as new transactions.
Concurrency control in DBMS

Concurrency Control is the management procedure that is required for controlling concurrent execution of
the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent execution.

Concurrent Execution in DBMS

 In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same database
is executed simultaneously on a multi-user system by different users.
 While working on database transactions, the requirement arises for multiple users to
use the database for performing different operations; in that case, concurrent
execution of the database is performed.
 The simultaneous execution should be done in an interleaved manner, and no operation
should affect the other executing operations, thus maintaining the consistency of the
database. Making the concurrent execution of transaction operations safe raises
several challenging problems that need to be solved.

Problems with Concurrent Execution

In a database transaction, the two main operations are READ and WRITE. These operations must be
managed during concurrent execution, because if they are interleaved without control, the data may
become inconsistent. The following problems occur with the concurrent execution of operations:

Problem 1: Lost Update Problems (W - W Conflict)

This problem occurs when two different database transactions perform read/write operations on the
same database items in an interleaved manner (i.e., concurrent execution) in a way that makes the
item values incorrect, leaving the database inconsistent.

For example:

Consider two transactions TX and TY performed on the same account A, where the balance of
account A is $300.
 At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
 At time t2, transaction TX deducts $50 from account A, computing $250 (deducted in memory
only, not yet written).
 Meanwhile, at time t3, transaction TY reads the value of account A, which will still be $300
because TX has not written its update yet.
 At time t4, transaction TY adds $100 to account A, computing $400 (added in memory only,
not yet written).
 At time t6, transaction TX writes the value of account A, which is updated to $250, since TY
has not written its value yet.
 Similarly, at time t7, transaction TY writes its value of account A, the $400 computed at
time t4. The value written by TX is thus lost, i.e., the $250 update is overwritten.

Hence the data becomes incorrect, and the database is left in an inconsistent state.
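The schedule above can be replayed step by step in Python. This is a scripted interleaving, not real threads; the variable names are illustrative.

```python
# Lost update (W-W conflict): TY's late write overwrites TX's write.
A = 300                 # initial balance of account A
tx_read = A             # t1: TX reads 300
tx_val = tx_read - 50   # t2: TX computes 250 (not yet written)
ty_read = A             # t3: TY still sees 300, because TX has not written
ty_val = ty_read + 100  # t4: TY computes 400
A = tx_val              # t6: TX writes 250
A = ty_val              # t7: TY writes 400 -- TX's update is lost
print(A)  # 400, but any serial order of the two transactions gives 350
```

Either serial order (TX then TY, or TY then TX) leaves $350 in the account, so $400 is visibly wrong.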

Dirty Read Problems (W-R Conflict)

The dirty read problem occurs when one transaction updates a database item and then fails, and
before the update is rolled back, the updated item is accessed by another transaction. This creates
a Write-Read conflict between the two transactions.

For example:

Consider two transactions TX and TY performing read/write operations on account A, where the
available balance in account A is $300:

 At time t1, transaction TX reads the value of account A, i.e., $300.


 At time t2, transaction TX adds $50 to account A, making it $350.
 At time t3, transaction TX writes the updated value to account A, i.e., $350.
 Then at time t4, transaction TY reads account A, which is read as $350.
 Then at time t5, transaction TX rolls back due to a server problem, and the value changes back
to $300 (as initially).
 But transaction TY has already read $350, a value that was never committed. This is a dirty
read, and the situation is therefore known as the Dirty Read Problem.
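The same schedule, scripted in Python (no real threads; names are illustrative):

```python
# Dirty read (W-R conflict): TY reads TX's uncommitted write, then TX rolls back.
committed_A = 300                  # last committed value of account A
tx_uncommitted = committed_A + 50  # t2/t3: TX writes 350, not yet committed
ty_read = tx_uncommitted           # t4: TY reads the dirty value 350
tx_uncommitted = None              # t5: TX rolls back; the database still holds 300
print(committed_A, ty_read)  # 300 350 -- TY acted on a value that was never committed
```

Any decision TY makes based on $350 is based on a balance that, after the rollback, never existed.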

Unrepeatable Read Problem (R-W Conflict)

Also known as the Inconsistent Retrievals Problem, this occurs when two different values are read
for the same database item within a single transaction.

For example:

Consider two transactions, TX and TY, performing read/write operations on account A, which has
an available balance of $300:

 At time t1, transaction TX reads the value from account A, i.e., $300.
 At time t2, transaction TY reads the value from account A, i.e., $300.
 At time t3, transaction TY updates the value of account A by adding $100 to the available
balance, and then it becomes $400.
 At time t4, transaction TY writes the updated value, i.e., $400.
 After that, at time t5, transaction TX reads the available value of account A, and that will be
read as $400.
 This means that within the same transaction TX, two different values of account A are read:
$300 initially, and $400 after the update made by transaction TY. This is an unrepeatable
read and is therefore known as the Unrepeatable Read Problem.

Thus, in order to maintain consistency in the database and avoid such problems that take place in concurrent
execution, management is needed, and that is where the concept of Concurrency Control comes into role.

Lock-Based Protocols

To attain consistency, isolation between transactions is the most important tool. Isolation
is achieved by preventing other transactions from performing read/write operations on a data
item while one transaction is using it. This is known as locking the item. Through lock-based
protocols, desired operations are allowed to proceed while conflicting operations are blocked
by locks.

There are four kinds of lock-based protocols:

Simplistic Lock Protocol: This protocol locks all other operations on a data item while that
item is being updated. Transactions may unlock the data item after completing the write
operation.

Pre-claiming Lock Protocol: According to the pre-claiming lock protocol, an assessment of the
operations to be performed is conducted first. Then a list is prepared containing the data items
on which locks will be imposed. The transaction requests all the locks from the system before
starting the execution of its operations. If all the locks are granted, the operations in the
transaction run smoothly, and the locks are returned to the system on completion. If all the
locks are not granted, the transaction rolls back.

Two-phase Locking Protocol: This protocol divides a transaction's execution into two phases. In
the first (growing) phase, the transaction acquires the locks it needs and releases none. The
moment the transaction releases its first lock, the second (shrinking) phase begins, in which
locks can only be released and no new lock can be acquired.

Strict Two-Phase Locking Protocol: Strict 2PL is almost identical to 2PL. The only difference
is that strict 2PL does not release locks just after the execution of operations; it holds all
the locks and releases them only when the commit is triggered.
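A minimal sketch of the strict-2PL rule, assuming exclusive locks only and a single-threaded script. The Transaction class and lock table are illustrative, not a real lock manager (a real one would block and queue waiters instead of raising):

```python
# Strict 2PL bookkeeping: acquire during execution, release only at commit.
class Transaction:
    def __init__(self, lock_table, name):
        self.lock_table, self.name, self.held = lock_table, name, []

    def lock(self, item):                      # growing phase: acquire only
        holder = self.lock_table.get(item)
        if holder not in (None, self.name):
            raise RuntimeError(f"{item} is locked by {holder}")
        self.lock_table[item] = self.name
        self.held.append(item)

    def commit(self):                          # strict 2PL: release only at commit
        for item in self.held:
            del self.lock_table[item]
        self.held.clear()

locks = {}
t1 = Transaction(locks, "T1")
t1.lock("A"); t1.lock("B")
t2 = Transaction(locks, "T2")
try:
    t2.lock("A")                # conflicts: T1 still holds A until it commits
except RuntimeError as e:
    print(e)                    # A is locked by T1
t1.commit()                     # all of T1's locks released together
t2.lock("A")                    # now succeeds
```

Holding every lock until commit is what prevents other transactions from ever seeing an uncommitted value, at the cost of reduced concurrency.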

Deadlock Handling :

A deadlock is a condition where two or more transactions wait indefinitely for one another to give
up locks. Deadlock is said to be one of the most feared complications in a DBMS, as no task ever
finishes and every transaction remains in a waiting state forever.

For example: In the student table, transaction T1 holds a lock on some rows and needs to update some rows
in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade table and needs to
update the rows in the Student table held by Transaction T1.

Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and similarly,
transaction T2 is waiting for T1 to release its lock. All activities come to a halt state and remain at a
standstill. It will remain in a standstill until the DBMS detects the deadlock and aborts one of the
transactions.

Deadlock Avoidance

 Rather than letting the database get stuck in a deadlock state and then aborting or
restarting transactions, which wastes time and resources, it is better to avoid the
deadlock altogether.
 A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method
like the "wait-for graph" can be used for detecting a deadlock situation, but this method is
suitable only for smaller databases. For larger databases, the deadlock prevention method
can be used.
Deadlock Detection

In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect
whether the transaction is involved in a deadlock. The lock manager maintains a wait-for graph to
detect deadlock cycles in the database.

Wait for Graph

 This is a suitable method for deadlock detection. In this method, a graph is created based on
the transactions and their locks. If the created graph contains a cycle (closed loop), then
there is a deadlock.
 The wait-for graph is maintained by the system: an edge is recorded for every transaction that
is waiting for data held by another. The system keeps checking the graph for cycles.
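The cycle check itself is a plain depth-first search. A minimal sketch, where an edge Ti → Tj means Ti waits for a lock held by Tj (the graph encoding is an illustrative choice):

```python
# Deadlock detection on a wait-for graph via depth-first search.
def has_deadlock(wait_for):
    """Return True if the wait-for graph (dict: txn -> list of txns) has a cycle."""
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)          # nodes on the current DFS path
        for nxt in wait_for.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True         # back edge found: a cycle, hence a deadlock
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in wait_for if n not in visited)

print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True  (T1 and T2 wait on each other)
print(has_deadlock({"T1": ["T2"], "T2": []}))      # False (T2 can finish, then T1)
```

When a cycle is found, the DBMS picks a victim transaction in the cycle and aborts it to break the deadlock.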


Deadlock Prevention

 The deadlock prevention method is suitable for a large database. If resources are allocated
in such a way that a deadlock can never occur, then deadlock is prevented.
 The database management system analyzes the operations of a transaction to check whether they
can create a deadlock situation. If they can, the DBMS does not allow that transaction to be
executed.

Wait-Die scheme

In this scheme, if a transaction requests a resource that is already held with a conflicting lock
by another transaction, the DBMS checks the timestamps of both transactions and allows the older
transaction to wait until the resource is available.

Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any
transaction T. If Tj holds a lock on some resource and Ti requests that resource, the DBMS
performs the following actions:

1. If TS(Ti) < TS(Tj) - Ti is the older transaction and Tj holds the resource, so Ti is allowed
to wait until the data item is available. That is, if an older transaction is waiting for a
resource locked by a younger transaction, the older transaction is allowed to wait.
2. If TS(Ti) > TS(Tj) - Ti is the younger transaction and Tj holds the resource, so Ti is killed
(it "dies") and is restarted later after a random delay but with the same timestamp.
Wound wait scheme

 In the wound-wait scheme, if an older transaction requests a resource held by a younger
transaction, the older transaction wounds the younger one, i.e., forces it to abort and
release the resource. After a small delay, the younger transaction is restarted with the
same timestamp.
 If a younger transaction requests a resource held by an older transaction, the younger
transaction is asked to wait until the older one releases it.
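Both timestamp schemes reduce to one decision rule per case. A sketch, where a smaller timestamp means an older transaction (the function name and labels are illustrative):

```python
# Wait-die vs wound-wait: what happens when `requester` asks for a
# resource held by `holder`, given their timestamps.
def resolve(requester_ts, holder_ts, scheme):
    if scheme == "wait-die":
        # older requester waits; younger requester dies (is rolled back)
        return "wait" if requester_ts < holder_ts else "die"
    if scheme == "wound-wait":
        # older requester wounds (aborts) the younger holder; younger requester waits
        return "wound holder" if requester_ts < holder_ts else "wait"
    raise ValueError(f"unknown scheme: {scheme}")

print(resolve(1, 2, "wait-die"))    # wait          (older asks: it waits)
print(resolve(2, 1, "wait-die"))    # die           (younger asks: it is rolled back)
print(resolve(1, 2, "wound-wait"))  # wound holder  (older asks: younger holder aborts)
print(resolve(2, 1, "wound-wait"))  # wait          (younger asks: it waits)
```

In both schemes only the younger transaction is ever aborted, and restarting it with its original timestamp guarantees it eventually becomes the oldest and cannot starve.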

Recovery Systems

Database Recovery Techniques in DBMS

Database systems, like any other computer systems, are subject to failures, but the data stored
in them must be available as and when required. When a database fails, it must possess facilities
for fast recovery. It must also guarantee atomicity: either a transaction completes successfully
and commits (its effect is recorded permanently in the database), or it has no effect on the
database at all.

Recovery techniques are heavily dependent upon the existence of a special file known as a
system log. It contains information about the start and end of each transaction and any
updates which occur in the transaction. The log keeps track of all transaction operations
that affect the values of database items. This information is needed to recover from
transaction failure.

 The log is kept on disk. The following entry types are recorded:
 start_transaction(T): This log entry records that transaction T starts its execution.
 read_item(T, X): This log entry records that transaction T reads the value of database
item X.
 write_item(T, X, old_value, new_value): This log entry records that transaction T changes
the value of database item X from old_value to new_value. The old value is known as the
before-image of X, and the new value as the after-image of X.
 commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
 abort(T): This records that transaction T has been aborted.
 checkpoint: A checkpoint is a mechanism in which all previous log records are flushed from
the system and stored permanently on disk. A checkpoint declares a point before which the
DBMS was in a consistent state and all transactions had committed.

A transaction T reaches its commit point when all its operations that access the database
have been executed successfully i.e. the transaction has reached the point at which it will not
abort (terminate without completing). Once committed, the transaction is permanently
recorded in the database.

Commitment always involves writing a commit entry to the log and writing the log to disk. At the
time of a system crash, the log is searched backward for all transactions T that have written a
start_transaction(T) entry but no commit(T) entry; these transactions may have to be rolled back
to undo their effect on the database during the recovery process.
 Undoing – If a transaction crashes, the recovery manager may undo it, i.e., reverse its
operations. This involves examining the log for each write_item(T, X, old_value, new_value)
entry of the transaction and setting the value of item X in the database back to old_value.
There are two major techniques for recovery from non-catastrophic transaction failures:
deferred updates and immediate updates.

 Deferred update – This technique does not physically update the database on disk
until a transaction has reached its commit point. Before reaching commit, all
transaction updates are recorded in the local transaction workspace. If a transaction
fails before reaching its commit point, it will not have changed the database in any way
so UNDO is not needed. It may be necessary to REDO the effect of the operations that are
recorded in the local transaction workspace, because their effect may not yet have been
written to the database. Hence, a deferred update is also known as the No-undo/Redo algorithm.

 Immediate update – In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However,
these operations are recorded in a log on disk before they are applied to the database,
making recovery still possible. If a transaction fails to reach its commit point, the effect
of its operations must be undone, i.e., the transaction must be rolled back; hence we
require both undo and redo. This technique is known as the Undo/Redo algorithm.
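The two recovery passes for immediate update can be sketched over a toy log, assuming write entries carry the before/after images described above (the tuple layout is an illustrative choice, not a real log format):

```python
# Undo/Redo recovery sketch: REDO committed transactions forward,
# UNDO uncommitted transactions backward, using before/after images.
def recover(db, log):
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    for op, txn, *rest in log:                 # forward pass: REDO
        if op == "write" and txn in committed:
            item, _old, new = rest
            db[item] = new                     # reapply the after-image
    for op, txn, *rest in reversed(log):       # backward pass: UNDO
        if op == "write" and txn not in committed:
            item, old, _new = rest
            db[item] = old                     # restore the before-image
    return db

log = [
    ("start", "T1"), ("write", "T1", "A", 300, 250), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 500, 600),   # T2 never committed
]
print(recover({"A": 300, "B": 600}, log))  # {'A': 250, 'B': 500}
```

T1's committed write to A is redone, while T2's uncommitted write to B is rolled back to its before-image, matching the all-or-nothing guarantee.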

 Caching/Buffering – In this one or more disk pages that include data items to be
updated are cached into main memory buffers and then updated in memory before
being written back to disk. A collection of in-memory buffers called the DBMS cache is
kept under control of DBMS for holding these buffers. A directory is used to keep track
of which database items are in the buffer. A dirty bit is associated with each buffer: it is 0
if the buffer has not been modified and 1 if it has.

Shadow paging – It provides atomicity and durability. A directory with n entries is constructed,
where the ith entry points to the ith database page on disk. When a transaction begins executing,
the current directory is copied into a shadow directory. When a page is to be modified, a new
page is allocated, the changes are made there, and once the changes are ready to become durable,
all directory entries that referred to the original page are updated to refer to the new
replacement page.
Some of the backup techniques are as follows :
 Full database backup – In this, the full database, including the data and the metadata
needed to restore the whole database (such as full-text catalogs), is backed up at
predefined intervals.

 Differential backup – It stores only the data changes that have occurred since the last full
database backup. When the same data has changed many times since the last full backup, a
differential backup stores only the most recent version of the changed data. To use it, we
first need to restore the full database backup.

 Transaction log backup – In this, all events that have occurred in the database, i.e., a
record of every single statement executed, are backed up. It is a backup of the transaction
log entries and contains all transactions that have happened to the database. Through this,
the database can be recovered to a specific point in time. It is even possible to perform a
backup from the transaction log if the data files are destroyed, so that not even a single
committed transaction is lost.
