Unit IV


Unit-IV

Transactions
THE CONCEPT OF A TRANSACTION
A user writes data access/update programs in terms of the high-level query and update
language supported by the DBMS.

To understand how the DBMS handles such requests, with respect to concurrency control
and recovery, it is convenient to regard an execution of a user program, or transaction, as
a series of reads and writes of database objects:

• To read a database object, it is first brought into main memory (specifically, some frame
in the buffer pool) from disk, and then its value is copied into a program variable.
• To write a database object, an in-memory copy of the object is first modified and then
written to disk.

Database ‘objects’ are the units in which programs read or write information. The units
could be pages, records, and so on, but this is dependent on the DBMS and is not central to
the principles underlying concurrency control or recovery.
Database Transaction
A Database Transaction is a logical unit of processing in a DBMS which entails one or
more database access operations.
In a nutshell, database transactions represent real-world events of any enterprise.
Facts about Database Transactions
1. A transaction is a program unit whose execution may or may not change the
contents of a database.
2. The transaction concept in DBMS is executed as a single unit.
3. If the database operations do not update the database but only retrieve data, this type
of transaction is called a read-only transaction.
4. A successful transaction can change the database from one CONSISTENT STATE to
another.
5. DBMS transactions must be atomic, consistent, isolated and durable.
6. If the database were in an inconsistent state before a transaction, it would remain
in the inconsistent state after the transaction.
Need for Concurrency in Transactions

A database is a shared resource accessed by many users and processes
concurrently, for example in banking systems, railway and air reservation
systems, stock market monitoring, and supermarket inventory and checkouts.

Not managing concurrent access may create issues like:


• Hardware failure and system crashes
• Concurrent execution of the same transaction, deadlock, or slow performance
States of Transactions

The various states of a transaction concept in DBMS are listed below:

State                  Description

Active State           A transaction enters the active state when its execution
                       begins. Read or write operations can be performed in
                       this state.

Partially Committed    A transaction enters the partially committed state after
                       its final operation has been executed.

Committed State        A committed transaction has completed its execution
                       successfully, and all of its changes are recorded in the
                       database permanently.

Failed State           A transaction is considered failed when any one of the
                       checks fails or when it is aborted while in the active
                       state.

Terminated State       A transaction reaches the terminated state when it has
                       left the system and cannot be restarted.
Atomicity

A transaction T transfers Rs 100 from account A (balance 500) to account B
(balance 200). Either both the debit and the credit happen, or neither does.

Consistency
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.

Isolation
Let A = 500, B = 500. While one transaction is part-way through updating A and
B, another transaction must not be able to see the intermediate values.
Some important points:

Property Responsibility for maintaining properties

Atomicity Transaction Manager


Consistency Application programmer
Isolation Concurrency Control Manager
Durability Recovery Manager

Uses of ACID Properties


• In totality, the ACID properties of transactions provide a mechanism in DBMS to ensure the
consistency and correctness of any database.
• They ensure consistency in the sense that every transaction acts as a single
unit of operations, produces consistent results, executes in isolation from all
other operations, and makes its updates durable.
• These ensure the integrity of data in any given database.
Properties of Transaction

ACID Properties are used for maintaining the integrity of database during transaction
processing.
ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability.

Atomicity: A transaction is a single unit of operation. It is either executed
entirely or not executed at all; there cannot be partial execution.
Consistency: Once the transaction is executed, it should move the database from
one consistent state to another.
Isolation: Transactions should be executed in isolation from one another. During
concurrent execution, intermediate results of one transaction must not be made
available to other simultaneously executing transactions.
Durability: After successful completion of a transaction, its changes in the
database should persist, even in the case of system failures.
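To make these properties concrete, here is a minimal Python sketch (not the API of any particular DBMS) of a funds transfer that keeps the all-or-nothing behaviour described above; the Database class and transfer method are illustrative names only.

class Database:
    def __init__(self, data):
        self.data = dict(data)              # committed state

    def transfer(self, src, dst, amount):
        snapshot = dict(self.data)          # remember the state for rollback
        try:
            if self.data[src] < amount:
                raise ValueError("insufficient funds")
            self.data[src] -= amount        # debit
            self.data[dst] += amount        # credit
        except Exception:
            self.data = snapshot            # atomicity: undo partial effects
            raise

db = Database({"A": 500, "B": 200})
db.transfer("A", "B", 100)
assert sum(db.data.values()) == 700         # consistency: 400 + 300, same total as before

A real DBMS achieves the same effect with logging and locking rather than an in-memory snapshot, but the all-or-nothing contract is the same.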
Schedule
A series of operations from one transaction to another transaction is known as a schedule.
It preserves the order of the operations within each individual transaction.

A Schedule Involving Two Transactions

A schedule that contains either an abort or a commit for each transaction whose actions
are listed in it is called a complete schedule.
Serial Schedule
The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction.
In the serial schedule, when the first transaction completes its cycle, then the next
transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If it has no interleaving
of operations, then there are the following two possible outcomes:
• Execute all the operations of T1, followed by all the operations of T2.
• Execute all the operations of T2, followed by all the operations of T1.
Non-serial Schedule
If interleaving of operations is allowed, then there will be non-serial schedule.
It contains many possible orders in which the system can execute the individual
operations of the transactions.
Serializable schedule
The serializability of schedules is used to find non-serial schedules that allow
transactions to execute concurrently without interfering with one another.
It identifies which schedules are correct when the execution of the transactions
has interleaving of their operations.
A non-serial schedule is serializable if its result is equal to the result of
executing its transactions serially.
Serial Schedules Vs Serializable Schedules-

Serial Schedules:
• No concurrency is allowed; all the transactions necessarily execute serially,
  one after the other.
• Serial schedules lead to lower resource utilization and CPU throughput.
• Serial schedules are less efficient than serializable schedules.

Serializable Schedules:
• Concurrency is allowed; multiple transactions can execute concurrently.
• Serializable schedules improve both resource utilization and CPU throughput.
• Serializable schedules are always better than serial schedules.
Types of Serializability-

Serializability is mainly of two types-


Conflict Serializability: A schedule is called conflict serializable if it can be transformed into a
serial schedule by swapping non-conflicting operations.

Conflicting operations: Two operations are said to be conflicting if all of the
following conditions are satisfied:

• They belong to different transactions
• They operate on the same data item
• At least one of them is a write operation

Note: Conflict pairs on the same data item are: Read-Write, Write-Write, Write-Read.

Example: –

• Conflicting operations pair (R1(A), W2(A)) because they belong to two different transactions on
same data item A and one of them is write operation.
• Similarly, (W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
• On the other hand, the pair (R1(A), W2(B)) is non-conflicting because the two operations work on
different data items.
• Similarly, the pair (W1(A), W2(B)) is non-conflicting.
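The conflict pairs above translate directly into a precedence (serialization) graph test. The sketch below, with a schedule encoded as (transaction, operation, item) tuples purely for illustration, builds the graph from conflicting pairs and reports whether it is acyclic, i.e. whether the schedule is conflict serializable.

def conflict_serializable(schedule):
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))          # ti's conflicting operation comes first

    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(t, set(), set()) for t in txns)

# R1(A), W2(A), W1(A): edges T1->T2 and T2->T1 form a cycle, so not conflict serializable
print(conflict_serializable([("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]))   # False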
View-Serializability
There may be some schedules that are not conflict serializable but still give a
consistent result, because the concept of conflict serializability becomes
limited when the precedence graph of a schedule contains a loop/cycle.
In such a case we cannot predict from conflict serializability alone whether the
schedule would be consistent or inconsistent.
As per the concept of conflict serializability, a schedule is conflict
serializable (and hence serializable and consistent) iff its corresponding
precedence graph does not contain any loop/cycle.

View Serializability
• A schedule is view serializable if it is view equivalent to a serial schedule.
• If a schedule is conflict serializable, then it is also view serializable.
• A schedule that is view serializable but not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:

1. Initial Read
An initial read in both schedules must be the same. Suppose there are two
schedules S1 and S2. If transaction T1 reads the initial value of data item A in
schedule S1, then in S2 the initial read of A should also be done by T1.

The above two schedules are view equivalent because the initial read operation
in S1 is done by T1 and in S2 it is also done by T1.
2. Updated Read
In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should read A
which is updated by Tj.

The above two schedules are not view equivalent because, in S1, T3 is reading A
updated by T2, while in S2, T3 is reading A updated by T1.
3. Final Write
A final write must be the same between both the schedules. In schedule S1, if a transaction
T1 updates A at last then in S2, final writes operations should also be done by T1.

The above two schedules are view equivalent because the final write operation in
S1 is done by T3 and in S2 the final write operation is also done by T3.
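The three conditions can be checked mechanically. A compact sketch follows; schedules are again encoded as (transaction, operation, item) tuples, an assumption made only for illustration.

def reads_from(schedule):
    # For each transaction, the sequence of (item, transaction-it-reads-from);
    # None means the read is an initial read of the database value.
    last_writer, result = {}, {}
    for t, op, x in schedule:
        if op == "R":
            result.setdefault(t, []).append((x, last_writer.get(x)))
        else:                                # op == "W"
            last_writer[x] = t
    return result

def final_writes(schedule):
    last_writer = {}
    for t, op, x in schedule:
        if op == "W":
            last_writer[x] = t
    return last_writer

def view_equivalent(s1, s2):
    # Conditions 1 and 2: every read sees the same writer (or the initial value);
    # condition 3: the final write on each item is made by the same transaction.
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)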
Example:

Schedule S

With 3 transactions, the total number of possible serial schedules

= 3! = 6
S1 = <T1 T2 T3>
S2 = <T1 T3 T2>
S3 = <T2 T3 T1>
S4 = <T2 T1 T3>
S5 = <T3 T1 T2>
S6 = <T3 T2 T1>
Non-Serializable Schedules

A non-serial schedule which is not serializable is called a non-serializable schedule.


A non-serializable schedule is not guaranteed to produce the same effect as produced
by some serial schedule on any consistent database.

Characteristics-
Non-serializable schedules-

• may or may not be consistent


• may or may not be recoverable
Recoverable Schedules

Schedules in which transactions commit only after all transactions whose changes they
read commit are called recoverable schedules.
In other words, if some transaction Tj is reading value updated or written by some other
transaction Ti, then the commit of Tj must occur after the commit of Ti.
Example 1: (Recoverable Schedule)
T1                T2

R(A)
W(A)
                  W(A)
                  R(A)
commit
                  commit
Example 2: (Recoverable Schedule)
T1                T2

Read(A)
Write(A)
                  Read(A)    // dirty read
                  Write(A)
Commit
                  Commit     // delayed until after T1 commits
Irrecoverable Schedules
If a transaction performs a dirty read from an uncommitted transaction and
commits before the transaction from which it read the value, then such a
schedule is called an irrecoverable schedule.

Example: Irrecoverable schedule

T1                T2

Read(A)
Write(A)
                  Read(A)    // dirty read
                  Write(A)
                  Commit
Rollback
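The recoverability condition can also be checked programmatically: every transaction that reads a value written by another transaction must commit after that writer. The sketch below uses (transaction, operation, item) tuples with operations R, W and C (commit); the encoding is illustrative only.

def is_recoverable(schedule):
    last_writer = {}                 # item -> transaction that last wrote it
    reads_from = set()               # (reader, writer) pairs
    commit_pos = {}                  # transaction -> position of its commit
    for pos, (t, op, x) in enumerate(schedule):
        if op == "W":
            last_writer[x] = t
        elif op == "R" and x in last_writer and last_writer[x] != t:
            reads_from.add((t, last_writer[x]))
        elif op == "C":
            commit_pos[t] = pos
    return all(writer in commit_pos and reader in commit_pos
               and commit_pos[writer] < commit_pos[reader]
               for reader, writer in reads_from)

# The irrecoverable example above: T2 reads T1's uncommitted write and commits,
# while T1 never commits (it rolls back).
s = [("T1", "R", "A"), ("T1", "W", "A"),
     ("T2", "R", "A"), ("T2", "W", "A"), ("T2", "C", None)]
print(is_recoverable(s))             # False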
Types of Recoverable Schedules-

A recoverable schedule may be any one of these kinds-


Cascading Schedule
If, in a schedule, failure of one transaction causes several other dependent
transactions to roll back or abort, then such a schedule is called a Cascading
Schedule (also known as Cascading Rollback or Cascading Abort).
It simply leads to the wastage of CPU time.
Cascadeless Schedule
If, in a schedule, a transaction is not allowed to read a data item until the
last transaction that has written it is committed or aborted, then such a
schedule is called a Cascadeless Schedule.

In other words,
• Cascadeless schedule allows only committed read operations.
• Therefore, it avoids cascading roll back and thus saves CPU time.

NOTE-

Cascadeless schedule allows only committed read operations.


However, it allows uncommitted write operations.
Strict Schedule

If, in a schedule, a transaction is neither allowed to read nor to write a data
item until the last transaction that has written it is committed or aborted,
then such a schedule is called a Strict Schedule.

In other words,

Strict schedule allows only committed read and write operations.


Clearly, strict schedule implements more restrictions than cascadeless schedule.
Note
• Strict schedules are more strict than cascadeless schedules.
• All strict schedules are cascadeless schedules.
• All cascadeless schedules are not strict schedules.
Concurrency Control
Concurrency Control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database.

Concurrent Execution
• In a multi-user system, multiple users can access and use the same database at
one time, which is known as the concurrent execution of the database.
• It means that the same database is executed simultaneously on a multi-user system
by different users.
• While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case,
concurrent execution of the database is performed.
• The simultaneous execution should be performed in an interleaved manner, and
no operation should affect the other executing operations, thus maintaining the
consistency of the database. When transaction operations are executed
concurrently, however, several challenging problems arise that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These
two operations need to be managed during the concurrent execution of
transactions, because if they are interleaved carelessly the data may become
inconsistent.

The following problems occur with the Concurrent Execution of the operations:
1. Lost Update Problems (W - W Conflict)
2. Dirty Read Problems (W-R Conflict)
3. Unrepeatable Read Problem (W-R Conflict)
Lost Update Problems (W - W Conflict)

The problem occurs when two different database transactions perform the read/write
operations on the same database items in an interleaved manner (i.e., concurrent
execution) that makes the values of the items incorrect hence making the database
inconsistent.
For example:
Consider the diagram below, where two transactions TX and TY are performed on
the same account A, whose balance is Rs 300.

Hence the data becomes incorrect, and the database is left in an inconsistent state.
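The lost update can be reproduced in a few lines. The amounts below are illustrative (they are not taken from the original diagram); the point is only that TY's write, based on a stale read, silently overwrites TX's update.

A = 300                  # balance of account A
tx_a = A                 # TX reads A
ty_a = A                 # TY reads A (same stale value)
tx_a -= 50               # TX debits 50
A = tx_a                 # TX writes A = 250
ty_a += 100              # TY credits 100, still based on the stale value 300
A = ty_a                 # TY writes A = 400: TX's debit of 50 is lost
print(A)                 # 400 instead of 350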


Dirty Read Problems (W-R Conflict)

The dirty read problem occurs when one transaction updates an item of the
database and then fails, and before the data is rolled back, the updated
database item is accessed by another transaction.
There comes the Read-Write Conflict between both transactions.
For example:

Consider two transactions TX and TY in the below diagram performing read/write operations
on account A where the available balance in account A is Rs300:
Unrepeatable Read Problem (W-R Conflict)
Also known as the Inconsistent Retrievals Problem, it occurs when, within a
single transaction, two different values are read for the same database item.

For example:
Consider two transactions, TX and TY, performing the read/write operations on account A,
having an available balance = Rs300. The diagram is shown below:
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing
the concurrent execution of database operations and thus avoiding the
inconsistencies in the database.
Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.

Concurrency Control Protocols


The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions.

Therefore, these protocols are categorized as:

• Lock Based Concurrency Control Protocol


• Time Stamp Concurrency Control Protocol
• Validation Based Concurrency Control Protocol
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it
acquires an appropriate lock on it.

There are two types of lock:


1. Shared lock

2. Exclusive lock
Lock-Based Protocol
1. Shared lock (S):
It is also known as a Read-only lock. With a shared lock, the data item can only
be read by the transaction.
It can be shared between transactions because, while a transaction holds a
shared lock, it cannot update the data item.

2. Exclusive lock (X):
With an exclusive lock, the data item can be both read and written by the
transaction.
The lock is exclusive: multiple transactions cannot modify the same data item
simultaneously.
It is also called a write lock.
Lock Compatibility Matrix –

• A transaction may be granted a lock on an item if the requested lock is compatible


with locks already held on the item by other transactions.
• Any number of transactions can hold shared locks on an item, but if any
transaction holds an exclusive (X) lock on the item, no other transaction may
hold any lock on it.
• If a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released. Then the lock is
granted.
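The matrix itself is small enough to encode directly. This is a minimal sketch of the grant check: a requested lock is granted only if it is compatible with every lock already held on the item by other transactions.

COMPATIBLE = {
    ("S", "S"): True,  ("S", "X"): False,
    ("X", "S"): False, ("X", "X"): False,
}

def can_grant(requested_mode, held_modes):
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

print(can_grant("S", ["S", "S"]))    # True: any number of shared locks may coexist
print(can_grant("X", ["S"]))         # False: the requester must wait for the S lock to be released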

Upgrade / Downgrade locks: A transaction that holds a lock on an item A is
allowed, under certain conditions, to change the lock from one mode to another.
• Upgrade: an S(A) lock can be upgraded to X(A) if Ti is the only transaction
holding the S-lock on element A.
• Downgrade: we may downgrade X(A) to S(A) when we no longer need to write
data item A. As we were already holding the X-lock on A, no further conditions
need to be checked.
Lock-Based Protocol

There are four types of lock protocols available:


1. Simplistic lock protocol
It is the simplest way of locking data during a transaction.
Simplistic lock-based protocols require every transaction to obtain a lock on
the data before performing an insert, delete, or update on it.
The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
Pre-claiming Lock Protocols evaluate the transaction to list all the data items on
which they need locks.
Before initiating the execution of the transaction, it requests the DBMS for
locks on all those data items.
If all the locks are granted, this protocol allows the transaction to begin.
When the transaction is completed, it releases all the locks.
If all the locks are not granted, the transaction rolls back and waits until all
the locks are granted.
3. Two-phase locking (2PL)
The two-phase locking protocol divides the execution phase of the transaction into
three parts.
In the first part, when the execution of the transaction starts, it seeks permission for
the lock it requires.
In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on the data item may be acquired by
the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction
may be released, but no new locks can be acquired.

In the below example, if lock conversion is allowed then the following phase can happen:

Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.


Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.
The following example shows how
unlocking and locking work with 2PL.

Transaction T1:

Growing phase: from step 1-3


Shrinking phase: from step 5-7
Lock point: at 3

Transaction T2:

Growing phase: from step 2-6


Shrinking phase: from step 8-9
Lock point: at 6
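The growing/shrinking discipline illustrated above can be enforced with a small guard on the transaction side. The sketch below (class name and methods are illustrative; actual lock granting is omitted) raises an error if a transaction tries to acquire a lock after its lock point has passed.

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False          # becomes True at the first unlock

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: no new locks in the shrinking phase")
        self.held.add((item, mode))     # growing phase

    def unlock(self, item, mode):
        self.shrinking = True           # lock point has passed
        self.held.discard((item, mode))

t1 = TwoPhaseTransaction("T1")
t1.lock("A", "X")
t1.lock("B", "S")        # growing phase
t1.unlock("A", "X")      # shrinking phase begins
# t1.lock("C", "S")      # would raise: acquiring after the lock point is not allowed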
4. Strict Two-phase locking (Strict-2PL)
The first phase of Strict-2PL is similar to 2PL.
In the first phase, after acquiring all the locks, the transaction continues to execute
normally.
The only difference between 2PL and Strict-2PL is that Strict-2PL does not
release a lock immediately after using it.
Strict-2PL waits until the whole transaction commits, and only then releases all
the locks at once.
Thus, the Strict-2PL protocol does not have a shrinking phase of gradual lock release.

It does not suffer from cascading aborts, as 2PL can.



Timestamp based Concurrency Control
A timestamp is a unique identifier created by the DBMS to identify a transaction.
Timestamps are usually assigned in the order in which transactions are submitted
to the system. The timestamp of a transaction T is denoted TS(T).
Timestamp Ordering Protocol
• The Timestamp Ordering Protocol is used to order the transactions based on their
Timestamps. The order of transaction is nothing but the ascending order of the
transaction creation.
• The older transaction has the higher priority, which is why it executes first. To determine
the timestamp of a transaction, this protocol uses system time or a logical counter.
• The lock-based protocol is used to manage the order between conflicting pairs
among transactions at the execution time. But Timestamp based protocols start
working as soon as a transaction is created.
• Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system at time 007
and transaction T2 entered the system at time 009. T1 has the higher priority, so it executes first, as it
entered the system first.
• The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write'
operation on a data.
Basic Timestamp ordering protocol works as follows:

1. Check the following condition whenever a transaction Ti issues a Read (X) operation:

If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.

If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is updated to
max(R_TS(X), TS(Ti)).

2. Check the following condition whenever a transaction Ti issues a Write(X) operation:

If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.

If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back.
Otherwise, the operation is executed and W_TS(X) is set to TS(Ti).

Where,
TS(TI) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
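The two checks can be written down directly. This is a sketch only, assuming each data item X carries R_TS(X) and W_TS(X) and each transaction Ti carries TS(Ti); a rejected operation causes Ti to be rolled back and restarted with a new timestamp.

def to_read(ts_ti, item):
    if item["W_TS"] > ts_ti:
        return "reject"                              # Ti would read an already overwritten value
    item["R_TS"] = max(item["R_TS"], ts_ti)          # record the read
    return "execute"

def to_write(ts_ti, item):
    if ts_ti < item["R_TS"] or ts_ti < item["W_TS"]:
        return "reject"                              # Ti is rolled back
    item["W_TS"] = ts_ti
    return "execute"

X = {"R_TS": 0, "W_TS": 0}
print(to_read(5, X))     # execute; R_TS(X) becomes 5
print(to_write(3, X))    # reject: a younger transaction (TS = 5) has already read X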
Advantages and Disadvantages of TO protocol:
The TO protocol ensures serializability, since the precedence graph contains
edges only from older transactions to younger transactions and therefore cannot
contain a cycle.

The TO protocol ensures freedom from deadlock, because no transaction ever waits.
However, the schedule may not be recoverable and may not even be cascade-free.
Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency control
technique. In this protocol, a transaction is executed in the following three phases:

• Read phase: In this phase, the transaction T is read and executed. It is used to read
the value of various data items and stores them in temporary local variables. It can
perform all the write operations on temporary variables without an update to the
actual database.
• Validation phase: In this phase, the values in the temporary variables are
validated against the actual data to check whether serializability would be violated.
• Write phase: If the transaction passes validation, the temporary results are
written to the database; otherwise the transaction is rolled back.
Here each phase has the following different timestamps:

Start(Ti): It contains the time when Ti started its execution.

Validation (Ti): It contains the time when Ti finishes its read phase and starts its validation
phase.

Finish(Ti): It contains the time when Ti finishes its write phase.

This protocol is used to determine the time stamp for the transaction for serialization using the
time stamp of the validation phase, as it is the actual phase which determines if the
transaction will commit or rollback.
Hence TS(T) = validation(T).
Serializability is determined during the validation process; it cannot be decided in advance.
By validating transactions at the end of their read phase, this approach allows a greater
degree of concurrency with fewer conflicts, and therefore fewer rollbacks.
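The validation test itself is short. The sketch below follows the standard textbook test: the validating transaction Ti is checked against every transaction Tj that validated before it; the dictionary fields (start, validation, finish, read_set, write_set) are illustrative names, not a specific DBMS API.

def validate(ti, earlier_transactions):
    for tj in earlier_transactions:                  # all Tj that validated before Ti
        if tj["finish"] < ti["start"]:
            continue                                 # Tj finished before Ti even started
        if tj["finish"] < ti["validation"] and not (tj["write_set"] & ti["read_set"]):
            continue                                 # overlap, but Ti read nothing Tj wrote
        return False                                 # validation fails: Ti is rolled back
    return True                                      # Ti may enter its write phase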
Thomas write Rule
Thomas Write Rule provides the guarantee of serializability order for the protocol. It
improves the Basic Timestamp Ordering Algorithm.

The basic Thomas write rules, applied when transaction T issues a write(X), are as follows:

If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is
rejected.
If TS(T) < W_TS(X), then do not execute the write(X) operation; simply ignore the
obsolete write and continue processing.
If neither condition 1 nor condition 2 holds, then execute the write(X) operation of
transaction T and set W_TS(X) to TS(T).
If we use the Thomas write rule, then some serializable schedules can be permitted that are
not conflict serializable, as illustrated by the schedule in the figure below:

Figure: A Serializable Schedule that is not Conflict Serializable

In the above figure, T1's read of the data item precedes T1's write of the same data item.
This schedule is not conflict serializable.

The Thomas write rule observes that T2's write is never seen by any transaction. If we delete the
write operation of transaction T2, a conflict serializable schedule is obtained, as shown in the figure below.

Figure: A Conflict Serializable Schedule
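Compared with basic timestamp ordering, the only change the Thomas write rule makes is in the write check: an obsolete write is ignored instead of causing a rollback. A minimal sketch, using the same illustrative item structure as before:

def thomas_write(ts_t, item):
    if ts_t < item["R_TS"]:
        return "reject"          # a younger transaction already read X: abort and roll back T
    if ts_t < item["W_TS"]:
        return "ignore"          # obsolete write: skip it and continue processing
    item["W_TS"] = ts_t
    return "execute"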


Implementation of Isolation
Levels of isolation
There are four levels of isolation −

Read Uncommitted − The lowest level of isolation. At this level dirty reads are
allowed, which means a transaction may read uncommitted changes made by another
transaction.

Read Committed − No dirty reads are allowed; any data read is committed at the
moment it is read.

Repeatable Read − A more restrictive level of isolation. The transaction holds
read locks on all the rows it references and write locks on all the rows it
updates, inserts, or deletes, so there is no chance of non-repeatable reads.

Serializable − The highest level of isolation. It requires that all concurrent
transactions appear to have executed serially.
Multiple Granularity

Granularity: the size of the data item that is allowed to be locked.

Multiple Granularity:
It can be defined as hierarchically breaking up the database into blocks that can be
locked.
The multiple granularity protocol enhances concurrency and reduces lock overhead.
It keeps track of what to lock and how to lock.
It makes it easy to decide whether to lock or unlock a data item.
This type of hierarchy can be represented graphically as a tree.
Multiple Granularity
For example: Consider a tree which has four levels of nodes.
The first (highest) level represents the entire database.
The second level represents nodes of type area; the database consists of exactly these areas.
Each area has child nodes known as files; no file can be present in more than one area.
Finally, each file contains child nodes known as records; a file consists of exactly those
records that are its children, and no record is present in more than one file.

The levels of the tree starting from the top level :


• Database
• Area
• File
• Record
There are three additional lock modes with multiple granularity:

Intention Mode Lock

Intention-shared (IS): It contains explicit locking at a lower level of the tree but only
with shared locks.

Intention-Exclusive (IX): It contains explicit locking at a lower level with exclusive or


shared locks.

Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared mode, and
some lower-level node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: the compatibility matrix for these lock
modes is described below (see also the sketch after the protocol rules).

The protocol uses the intention lock modes to ensure serializability. It requires that if a
transaction attempts to lock a node, then that node must be locked according to the
following rules:

• Transaction T1 must follow the lock-compatibility matrix.
• Transaction T1 must lock the root of the tree first, and it can lock it in any mode.
• Transaction T1 can lock a node in S or IS mode only if it currently has the parent of that
  node locked in either IS or IX mode.
• Transaction T1 can lock a node in X, SIX, or IX mode only if it currently has the parent of
  that node locked in either IX or SIX mode.
• Transaction T1 can lock a node only if it has not previously unlocked any node (locks are
  acquired in a two-phase manner).
• Transaction T1 can unlock a node only if it currently has none of the children of that node
  locked.
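The compatibility matrix referred to above is the standard one for the five modes; the sketch below encodes it as a dictionary together with the grant check used by the first rule.

COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def compatible(requested, held):
    # a lock is granted only if it is compatible with every mode already held on the node
    return all(COMPAT[h][requested] for h in held)

print(compatible("IX", ["IS", "IX"]))   # True: several transactions may intend to lock below
print(compatible("S",  ["IX"]))         # False: someone intends to take an exclusive lock below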
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files
opened for them to modify the data items. Transactions are made of various operations, which
are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a
whole must be maintained, that is, either all the operations are executed or none.

When a DBMS recovers from a crash, it should maintain the following −

• It should check the states of all the transactions, which were being executed.

• A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of
the transaction in this case.

• It should check whether the transaction can be completed now or it needs to be rolled back.

• No transactions would be allowed to leave the DBMS in an inconsistent state.


There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −

Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.

Maintaining shadow paging, where the changes are done on a volatile memory, and
later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and stored
on a stable storage media, which is failsafe.

Log-based recovery works as follows −


The log file is kept on a stable storage media.
When a transaction enters the system and starts execution, it writes a log about it.
<Tn, Start>

When the transaction modifies an item X, it writes a log record as follows −

<Tn, X, V1, V2>
This record states that Tn has changed the value of X from V1 to V2.

When the transaction finishes, it logs −


<Tn, commit>
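A minimal sketch of these log records, with the write-ahead ordering made explicit (the update record is appended before the data item is modified); the in-memory list stands in for stable storage and is purely illustrative.

log = []

def start(txn):
    log.append(("start", txn))                                 # <Tn, Start>

def update(txn, db, item, new_value):
    old_value = db[item]
    log.append(("update", txn, item, old_value, new_value))    # <Tn, X, V1, V2>
    db[item] = new_value                                       # modify only after logging

def commit(txn):
    log.append(("commit", txn))                                # <Tn, commit>

db = {"X": 100}
start("T1"); update("T1", db, "X", 150); commit("T1")
print(log)   # [('start', 'T1'), ('update', 'T1', 'X', 100, 150), ('commit', 'T1')]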
Log-based Recovery

The database can be modified using two approaches −


Deferred database modification − All logs are written on to the stable storage and the
database is updated when a transaction commits.

Immediate database modification − Each log follows an actual database modification.


That is, the database is modified immediately after every operation.

While recovering data using log files, each transaction will be placed in one of the
following lists:
1. redo list
2. undo list
Write-Ahead Log
If a log were created only after executing a transaction, there would be no log information about the
data as it was before the transaction. Moreover, if the transaction fails before the log is written, no
log exists at all. Creating the log file after the transaction therefore loses the information needed for
recovery and is of no use.

Suppose instead that we first record in the log file the old value of the data. Then, if the system crashes
while executing the transaction, we know what its previous state/value was and we can easily revert the changes.
Hence it is always better to write the details into the log file before the transaction is executed. In
addition, the system should be forced to update the log file first and only then write the data into the
database. For example, in an ATM withdrawal, each stage of the transaction should be recorded in the log
file and placed on stable storage, and only then is the actual balance updated in the database. This
guarantees the atomicity of the transaction even if the system fails. This is known as the Write-Ahead
Logging (WAL) protocol.
Shadow Paging:
Concept of Shadow Paging Technique
Shadow paging is an alternative to transaction-log based recovery techniques.
Here, the database is considered to be made up of fixed-size disk blocks, called pages.
These pages are mapped to physical storage using a table called the page table.

The page table is indexed by the page number of the database. The information about the
physical pages, in which database pages are stored, is kept in this page table.
This technique is similar to the paging technique used by operating systems to allocate
memory, particularly to manage virtual memory.
Shadow Paging:
The figure depicts the concept of shadow paging.

Execution of Transaction
• During the execution of the transaction, two page tables are maintained.
1. Current Page Table: used to access data items during transaction execution.
2. Shadow Page Table: the original page table, which is not modified during
transaction execution.

• Whenever any page is about to be written for the first time:
1. A copy of this page is made onto a free page,
2. The current page table is made to point to the copy,
3. The update is made in this copy.
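A small sketch of the copy-on-first-write step just described; the page contents, table layout, and free-page counter are illustrative. The shadow page table keeps pointing at the old physical page, so aborting the transaction simply means discarding the current page table.

pages = {0: "A0", 1: "B0", 2: "C0"}          # physical pages
shadow_table = {0: 0, 1: 1, 2: 2}            # logical page -> physical page (unchanged)
current_table = dict(shadow_table)           # starts as a copy of the shadow table
next_free = 3

def write_page(logical, new_value):
    global next_free
    if current_table[logical] == shadow_table[logical]:    # first write to this page
        pages[next_free] = pages[current_table[logical]]   # 1. copy the page to a free page
        current_table[logical] = next_free                  # 2. point the current table at the copy
        next_free += 1
    pages[current_table[logical]] = new_value               # 3. apply the update to the copy

write_page(1, "B1")
print(pages[shadow_table[1]], pages[current_table[1]])      # B0 B1: the original page is intact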
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel, the logs are
interleaved. At the time of recovery, it becomes hard for the recovery system to
backtrack through all the logs and then start recovering.

To ease this situation, most modern DBMS use the concept of 'checkpoints'.

Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the
memory space available in the system. As time passes, the log file may grow too big to be
handled at all.
• Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently in a storage disk.
• Checkpoint declares a point before which the DBMS was in consistent state, and all
the transactions were committed.
Recovery with Concurrent Transactions

Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −

The recovery system reads the logs backwards from the end to the last checkpoint.

It maintains two lists, an undo-list and a redo-list.

• If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the
transaction in the redo-list.

• If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in
undo-list.

All the transactions in the undo-list are then undone, and their logs are removed.
All the transactions in the redo-list are then redone using their log records.
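The backward scan described above can be sketched as follows. It is simplified: it assumes every transaction in question started after the last checkpoint, and the log record formats are illustrative.

def build_lists(log):
    redo, undo, committed = set(), set(), set()
    for record in reversed(log):          # scan backwards from the end of the log
        kind = record[0]
        if kind == "checkpoint":
            break                         # stop at the last checkpoint
        txn = record[1]
        if kind == "commit":
            committed.add(txn)
        elif kind == "start":
            (redo if txn in committed else undo).add(txn)
    return redo, undo

log = [("checkpoint",), ("start", "T1"), ("start", "T2"),
       ("update", "T1", "X", 1, 2), ("commit", "T1"),
       ("update", "T2", "Y", 3, 4)]       # T2 never committed
print(build_lists(log))                   # ({'T1'}, {'T2'}): redo T1, undo T2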
Recovery Algorithm of ARIES ( or ARIES Recovery Algorithm)

• A steal, no-force approach


Steal: if a frame is dirty and chosen for replacement, the page it contains is written to
disk even if the modifying transaction is still active.
No-force: Pages in the buffer pool that are modified by a transaction are not forced to
disk when the transaction commits.

ARIES is a recovery algorithm designed to work with a no-force, steal database approach.
The ARIES recovery procedure consists of three main steps:
1. Analysis
The analysis step identifies the dirty (updated) pages in the buffer, and the set of
transactions active at the time of the crash. The appropriate point in the log where the REDO
operation should start is also determined
2. REDO
The REDO phase actually reapplies updates from the log to the database. Generally, the
REDO operation is applied to only committed transactions. However, in ARIES, this is not the
case. Certain information in the ARIES log will provide the start point for REDO, from which
REDO operations are applied until the end of the log is reached. In addition, information
stored by ARIES and in the data pages will allow ARIES to determine whether the operation to
be redone has actually been applied to the database and hence need not be reapplied. Thus
only the necessary REDO operations are applied during recovery.
3. UNDO
During the UNDO phase, the log is scanned backwards and the operations of transactions
that were active at the time of the crash are undone in reverse order. The information needed
for ARIES to accomplish its recovery procedure includes the log, the Transaction Table, and
the Dirty Page Table. In addition, check pointing is used. These two tables are maintained by
the transaction manager and written to the log during check pointing.
Data structures used in ARIES algorithm:
1. page table
2. dirty page table
3. pageLSN
4. RedoLSN
5. Transaction Table
6. Checkpoint Log
** LSN stands for Log Sequence Number
For efficient recovery, we need Transaction table and Dirty Page table .
Data structures used in ARIES algorithm:

The 2 tables are maintained by transaction manager


The Transaction Table contains an entry for each active transaction, with information such
as the transaction ID, transaction status, and the LSN of the most recent log record for the
transaction.

The Dirty Page Table contains an entry for each dirty page in the buffer, which includes the
page ID and the LSN corresponding to the earliest update to that page.
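A sketch of how these two tables might look; the field names follow the description above (status and last LSN per transaction, recLSN per dirty page) but the layout is illustrative, not ARIES's on-disk format.

transaction_table = {
    # transaction id -> status and LSN of its most recent log record
    "T1": {"status": "active",    "last_lsn": 42},
    "T2": {"status": "committed", "last_lsn": 40},
}

dirty_page_table = {
    # page id -> LSN of the earliest update that dirtied the page (recLSN)
    "P5": {"rec_lsn": 38},
    "P9": {"rec_lsn": 41},
}

# The analysis pass rebuilds these tables from the last checkpoint; REDO then
# starts from the smallest recLSN found in the dirty page table.
redo_start = min(entry["rec_lsn"] for entry in dirty_page_table.values())   # 38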
Checkpointing in ARIES consists of the following:
1. writing a begin_checkpoint record to the log,
2. writing an end_checkpoint record to the log, and
3. writing the LSN of the begin_checkpoint record to a special file.

This Checkpoint log file is accessed during recovery to locate the last checkpoint
information.
After a crash, the ARIES recovery manager takes over.
DEAD LOCKS:
Consider two transactions T1 and T2. If T1 holds a lock on data item X and T2 holds a lock
on data item Y, and then T1 requests a lock on Y while T2 requests a lock on X, a deadlock
occurs: neither transaction can proceed, and neither is ready to release its lock on X or Y.

The following two techniques can be used for deadlock handling (prevention):


1. wait-die
2. wound-wait
DEAD LOCKS:
1. Wait-die:
• In the wait-die scheme, the older transaction waits while the younger transaction dies
  (is aborted and restarted).
• The older transaction waits for the younger one if the younger transaction has accessed the granule first.
• The younger transaction is aborted (dies) and restarted if it tries to access a granule held by an older
  concurrent transaction.

• Wait-die decides between waiting and aborting using the timestamps of the transactions requesting
  conflicting resources. With ts(T1) < ts(T2), i.e. T1 older:
1) If T1 requests an item held by T2, T1 waits in the queue.
2) If T2 requests an item held by T1, T2 dies (aborts) and is restarted.

For example:
Suppose that transaction T22, T23, T24 have time-stamps 5, 10 and 15 respectively. If T22
requests a data item held by T23 then T22 will wait. If T24 requests a data item held by
T23, then T24 will be rolled back.
DEAD LOCKS:

2. Wound-wait:

• It is also based on the timestamps of the transactions requesting conflicting resources.
• It is a preemptive technique for deadlock prevention and is the counterpart of the wait-die
scheme. When transaction Ti requests a data item currently held by Tj, Ti is allowed to
wait only if it has a timestamp larger than that of Tj (i.e. Ti is younger); otherwise Tj is
rolled back (Tj is wounded by Ti).
For example:
Suppose that Transactions T22, T23, T24 have time-stamps 5, 10 and 15 respectively . If
T22 requests a data item held by T23, then data item will be preempted from T23 and T23
will be rolled back. If T24 requests a data item held by T23, then T24 will wait.
Here the younger transaction is made to wait in the queue, while the older transaction
wounds (aborts) the younger transaction that holds the item it needs. With ts(T1) < ts(T2),
i.e. T1 older:
1) If T1 requests an item held by T2, T2 is rolled back (wounded).
2) If T2 requests an item held by T1, T2 waits.
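Both prevention schemes reduce to a single timestamp comparison when a requester asks for an item held by another transaction (smaller timestamp = older). A minimal sketch, with the decision returned as a string for illustration:

def wait_die(ts_requester, ts_holder):
    # older requester waits; younger requester dies (is aborted and restarted)
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # older requester wounds (aborts) the holder; younger requester waits
    return "wound holder" if ts_requester < ts_holder else "wait"

# Using the examples above, with TS(T22)=5, TS(T23)=10, TS(T24)=15:
print(wait_die(5, 10))      # wait          (T22 requests an item held by T23)
print(wait_die(15, 10))     # die           (T24 requests an item held by T23)
print(wound_wait(5, 10))    # wound holder  (T23 is rolled back)
print(wound_wait(15, 10))   # wait          (T24 waits)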
DEAD LOCK AVOIDANCE:
Wait for graph:
• This is a simple method available to track if any deadlock situation may arise.
• For each transaction entering into the system, a node is created.
• When a transaction Ti requests for a lock on an item, say X, which is held by
some other transaction Tj, a directed edge is created from Ti to Tj. If Tj
releases item X, the edge between them is dropped and Ti locks the data
item.
• The system maintains this wait-for graph for every transaction waiting for
some data items held by others. The system keeps checking if there's any
cycle in the graph.
Here, we can use any of the two following approaches −
• First, do not allow any request for an item, which is already locked by
another transaction. This is not always feasible and may cause starvation,
where a transaction indefinitely waits for a data item and can never acquire it.
• The second option is to roll back one of the transactions. It is not always
feasible to roll back the younger transaction, as it may be more important than
the older one. With the help of some selection algorithm, a transaction is
chosen to be aborted. This transaction is known as the victim, and the process
is known as victim selection.
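Deadlock detection on the wait-for graph is a cycle search. In the sketch below an edge Ti -> Tj means Ti is waiting for a lock held by Tj; the dictionary encoding of the graph is illustrative.

def has_deadlock(wait_for):
    visiting, done = set(), set()

    def dfs(node):
        visiting.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in visiting:
                return True                       # back edge: a cycle, hence a deadlock
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(t) for t in wait_for if t not in done)

print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True: T1 and T2 wait for each other
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False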
Buffer Management
A database buffer is a temporary storage area in the main memory. It allows storing the data
temporarily when moving from one place to another. A database buffer stores a copy of disk blocks.
But, the version of block copies on the disk may be older than the version in the buffer.

Buffer Manager
• A Buffer Manager is responsible for allocating space to the buffer in order to store data into the
buffer.
• If a user requests a particular block and the block is available in the buffer, the buffer
manager provides the block's address in main memory.
• If the block is not available in the buffer, the buffer manager allocates the block in the buffer.
• If free space is not available, it throws out some existing blocks from the buffer to allocate the
required space for the new block.
• The blocks which are evicted are written back to the disk only if they have been modified
since they were last written to the disk.
• If the user later requests such an evicted block, the buffer manager reads the requested
block from the disk into the buffer and then passes the address of the block in main memory
to the user.
• The internal actions of the buffer manager are transparent to the programs that issue
disk-block requests; in this sense the buffer manager acts like a virtual machine.
The buffer manager uses the following methods:

1. Buffer Replacement Strategy: If no space is left in the buffer, an existing block must be
removed from the buffer before the new one is allocated. Most systems use the LRU (least
recently used) scheme, in which the block that was least recently used is removed from the
buffer and written back to the disk. This kind of policy is known as a buffer replacement
strategy.
2. Pinned Blocks: For a database system to be able to recover from crashes, it is necessary
to restrict the times at which a block may be written back to the disk. In fact, most
recovery systems do not allow a block to be written to disk while an update on it is in
progress. Blocks that are not allowed to be written to disk are known as pinned blocks.
Many operating systems do not support pinned blocks, so the DBMS must handle pinning in its
own buffer.
3. Forced Output of Blocks: In some cases, it becomes necessary to write a block back to
the disk even though the buffer space occupied by the block is not needed. Such a write is
known as the forced output of a block. It is required because data stored in the buffer may
be lost in a system crash, whereas data stored on the disk usually survives such a crash.
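The three mechanisms above fit together in a few lines. The sketch below is illustrative only (not a real DBMS buffer manager): LRU order is kept with an OrderedDict, pinned blocks are never chosen as victims, and an evicted block is written back only if it is dirty.

from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity, self.disk = capacity, disk
        self.pool = OrderedDict()                  # block id -> {"data", "dirty", "pinned"}

    def get_block(self, block_id):
        if block_id in self.pool:
            self.pool.move_to_end(block_id)        # mark as most recently used
            return self.pool[block_id]
        if len(self.pool) >= self.capacity:
            self._evict()
        frame = {"data": self.disk[block_id], "dirty": False, "pinned": False}
        self.pool[block_id] = frame
        return frame

    def _evict(self):
        victim = next((b for b, f in self.pool.items() if not f["pinned"]), None)
        if victim is None:
            raise RuntimeError("all blocks are pinned; nothing can be evicted")
        frame = self.pool.pop(victim)              # least recently used unpinned block
        if frame["dirty"]:
            self.disk[victim] = frame["data"]      # forced write-back only if modified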
