Unit IV
Transactions
THE CONCEPT OF A TRANSACTION
A user writes data access/update programs in terms of the high-level query and update
language supported by the DBMS.
To understand how the DBMS handles such requests, with respect to concurrency control
and recovery, it is convenient to regard an execution of a user program, or transaction, as
a series of reads and writes of database objects:
• To read a database object, it is first brought into main memory (specifically, some frame
in the buffer pool) from disk, and then its value is copied into a program variable.
• To write a database object, an in-memory copy of the object is first modified and then
written to disk.
Database ‘objects’ are the units in which programs read or write information. The units
could be pages, records, and so on, but this is dependent on the DBMS and is not central to
the principles underlying concurrency control or recovery.
Database Transaction
A Database Transaction is a logical unit of processing in a DBMS that entails one or
more database access operations.
In a nutshell, database transactions represent real-world events of any enterprise.
Facts about Database Transactions
1. A transaction is a program unit whose execution may or may not change the
contents of a database.
2. The transaction concept in DBMS is executed as a single unit.
3. If the database operations do not update the database but only retrieve data, this type
of transaction is called a read-only transaction.
4. A successful transaction can change the database from one CONSISTENT STATE to
another.
5. DBMS transactions must be atomic, consistent, isolated and durable.
6. If the database was in an inconsistent state before a transaction, it may remain in
an inconsistent state after the transaction: a transaction only guarantees consistency
when it starts from a consistent state.
Need for Concurrency Control in Transactions
Consistency: Suppose a transaction T transfers Rs 100 from account A (balance 500) to
account B (balance 200).
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
The total is preserved, so T takes the database from one consistent state to another.
Isolation: Let A = 500, B = 500. If two transactions operate on A and B concurrently,
each must appear to execute in isolation, so that neither transaction observes the
intermediate values produced by the other.
Some important points:
ACID properties are used to maintain the integrity of a database during transaction
processing.
ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability.
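As a concrete illustration, here is a minimal sketch of atomicity, consistency, and durability using Python's built-in sqlite3 module; the `account` table, the account names, and the transfer amount are illustrative assumptions, not taken from the notes.

```python
import sqlite3

# In-memory database with two illustrative accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Run both updates as one transaction: either both persist or neither."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()        # durability: both writes become permanent together
    except Exception:
        conn.rollback()      # atomicity: undo any partial update
        raise

transfer(conn, "A", "B", 100)
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
print(total)  # 700: the sum of balances is preserved (consistency)
```

If either UPDATE fails, the rollback leaves both balances untouched, which is exactly the all-or-nothing behavior the facts above describe.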
A schedule that contains either an abort or a commit for each transaction whose actions
are listed in it is called a complete schedule.
Schedule
Serial Schedule
The serial schedule is a type of schedule where one transaction is executed
completely before starting another transaction.
In the serial schedule, when the first transaction completes its cycle, then the next
transaction is executed.
For example: Suppose there are two transactions T1 and T2 which have some operations. If there
is no interleaving of operations, then there are the following two possible outcomes:
• Execute all the operations of T1, followed by all the operations of T2.
• Execute all the operations of T2, followed by all the operations of T1.
Non-serial Schedule
If interleaving of operations is allowed, then the schedule is a non-serial schedule.
It contains many possible orders in which the system can execute the individual
operations of the transactions.
Serializable schedule
The serializability of schedules is used to find non-serial schedules that allow
transactions to execute concurrently without interfering with one another.
It identifies which schedules are correct when the executions of the transactions have
interleaving of their operations.
A non-serial schedule is serializable if its result is equal to the result of its
transactions executed serially, in some order.
Conflicting operations: Two operations are said to be conflicting if all of the following
conditions are satisfied:
• They belong to different transactions.
• They operate on the same data item.
• At least one of them is a write operation.
Note: The conflict pairs for the same data item are Read-Write, Write-Write, and Write-Read.
Example: –
• The pair (R1(A), W2(A)) is conflicting because the operations belong to two different
transactions, operate on the same data item A, and one of them is a write operation.
• Similarly, the pairs (W1(A), W2(A)) and (W1(A), R2(A)) are also conflicting.
• On the other hand, the pair (R1(A), W2(B)) is non-conflicting because the operations work
on different data items.
• Similarly, the pair (W1(A), W2(B)) is non-conflicting.
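The three conditions above can be checked mechanically. The sketch below models an operation as a hypothetical (action, transaction, item) tuple such as ('R', 1, 'A') for R1(A); the encoding is an assumption for illustration.

```python
def conflicting(op1, op2):
    """Return True iff the two operations conflict."""
    a1, t1, x1 = op1
    a2, t2, x2 = op2
    return (t1 != t2              # they belong to different transactions
            and x1 == x2          # they operate on the same data item
            and 'W' in (a1, a2))  # at least one of them is a write

print(conflicting(('R', 1, 'A'), ('W', 2, 'A')))  # True:  (R1(A), W2(A))
print(conflicting(('W', 1, 'A'), ('W', 2, 'A')))  # True:  (W1(A), W2(A))
print(conflicting(('R', 1, 'A'), ('W', 2, 'B')))  # False: different data items
```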
View-Serializability
There may be schedules that are not conflict-serializable but still give a
consistent result, because the concept of conflict-serializability becomes limited when
the precedence graph of a schedule contains a loop/cycle.
In such a case, conflict-serializability alone cannot tell us whether the schedule is
consistent or inconsistent.
As per the concept of conflict-serializability, a schedule is conflict-serializable
(that is, equivalent to a serial, consistent schedule) iff its corresponding precedence
graph does not have any loop/cycle.
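This test can be sketched directly: build the precedence graph from conflicting operation pairs and check it for a cycle. The (action, txn, item) tuple encoding of operations is an assumption for illustration.

```python
def is_conflict_serializable(schedule):
    """`schedule` is a time-ordered list of (action, txn, item) tuples."""
    edges = set()
    for i, (a1, t1, x1) in enumerate(schedule):
        for a2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'W' in (a1, a2):
                edges.add((t1, t2))   # the earlier conflicting op forces t1 -> t2
    nodes = {t for _, t, _ in schedule}
    # Kahn-style check: repeatedly peel off nodes with no incoming edge.
    # If we ever get stuck, the remaining nodes form a cycle.
    while nodes:
        free = [t for t in nodes
                if not any(u in nodes and v == t for u, v in edges)]
        if not free:
            return False    # cycle in the precedence graph
        nodes -= set(free)
    return True

# R1(A) W2(A) W1(A): edges T1->T2 and T2->T1 form a cycle
print(is_conflict_serializable([('R', 1, 'A'), ('W', 2, 'A'), ('W', 1, 'A')]))  # False
```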
View Serializability
• A schedule will be view serializable if it is view equivalent to a serial schedule.
• If a schedule is conflict serializable, then it is also view serializable.
• A schedule that is view serializable but not conflict serializable contains blind writes.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:
1. Initial Read
The initial read of each data item must be the same in both schedules. Suppose two
schedules S1 and S2: if a transaction T1 reads the data item A first in S1, then in S2 the
initial read of A must also be done by T1.
For example, two schedules satisfy this condition when the initial read operation in S1 is
done by T1 and in S2 it is also done by T1.
2. Updated Read
In schedule S1, if Ti reads A after it is updated by Tj, then in S2, Ti must also read A
after it is updated by Tj.
For example, two schedules are not view equivalent when, in S1, T3 reads A updated by T2,
while in S2, T3 reads A updated by T1.
3. Final Write
The final write of each data item must be the same in both schedules. In schedule S1, if
transaction T1 performs the last update of A, then in S2 the final write on A must also be
done by T1.
For example, two schedules satisfy this condition when the final write operation in S1 is
done by T3 and in S2 the final write operation is also done by T3.
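The three conditions can be expressed compactly in code: conditions 1 and 2 both say that each read must read from the same writer (None standing for the initial value), and condition 3 fixes the final writer per item. A sketch, again using illustrative (action, txn, item) tuples:

```python
def reads_from(schedule):
    """Map each read to the transaction whose write it reads (None = initial read)."""
    last_writer, result = {}, []
    for a, t, x in schedule:
        if a == 'R':
            result.append((t, x, last_writer.get(x)))
        else:
            last_writer[x] = t
    return sorted(result)

def final_writes(schedule):
    """Last writer of each data item (condition 3)."""
    last_writer = {}
    for a, t, x in schedule:
        if a == 'W':
            last_writer[x] = t
    return last_writer

def view_equivalent(s1, s2):
    # Same reads-from relation covers conditions 1 and 2;
    # same last writer per item covers condition 3.
    return reads_from(s1) == reads_from(s2) and final_writes(s1) == final_writes(s2)

serial_12 = [('R', 1, 'A'), ('W', 1, 'A'), ('R', 2, 'A'), ('W', 2, 'A')]
serial_21 = [('R', 2, 'A'), ('W', 2, 'A'), ('R', 1, 'A'), ('W', 1, 'A')]
print(view_equivalent(serial_12, serial_21))  # False: different initial reads
```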
Example:
For a schedule S over three transactions, the number of possible serial schedules is 3! = 6:
S1 = <T1 T2 T3>
S2 = <T1 T3 T2>
S3 = <T2 T3 T1>
S4 = <T2 T1 T3>
S5 = <T3 T1 T2>
S6 = <T3 T2 T1>
Non-Serializable Schedules
A non-serializable schedule may be recoverable or irrecoverable.
Recoverable Schedules-
Schedules in which transactions commit only after all transactions whose changes they
read commit are called recoverable schedules.
In other words, if some transaction Tj is reading value updated or written by some other
transaction Ti, then the commit of Tj must occur after the commit of Ti.
Example 1: (Recoverable Schedule)
    T1            T2
    R(A)
    W(A)
                  W(A)
                  R(A)
    commit
                  commit
Example 2: (Recoverable Schedule)
    T1            T2
    Read(A)
    Write(A)
                  Read(A)
    Commit
                  Commit   // delayed until after T1's commit
Irrecoverable Schedules
If a transaction performs a dirty read from an uncommitted transaction and commits
before the transaction from which it read the value, then such a schedule is called an
irrecoverable schedule.
Example: (Irrecoverable Schedule)
    T1            T2
    Read(A)
    Write(A)
                  Read(A)    // dirty read
                  Write(A)
                  Commit
    Rollback
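The recoverability rule can be checked programmatically: track who each transaction read from, and refuse any commit that precedes the commit of a transaction it read from. In this sketch, schedules are illustrative lists of ('R'/'W', txn, item) and ('C', txn, None) entries.

```python
def is_recoverable(schedule):
    """Recoverable iff every transaction commits only after each
    transaction it read from has committed."""
    last_writer, reads_from, committed = {}, set(), set()
    for a, t, x in schedule:
        if a == 'W':
            last_writer[x] = t
        elif a == 'R' and last_writer.get(x, t) != t:
            reads_from.add((t, last_writer[x]))   # t read another txn's write
        elif a == 'C':
            if any(r == t and w not in committed for r, w in reads_from):
                return False    # t commits before a txn it read from
            committed.add(t)
    return True

recoverable   = [('W', 1, 'A'), ('R', 2, 'A'), ('C', 1, None), ('C', 2, None)]
irrecoverable = [('W', 1, 'A'), ('R', 2, 'A'), ('C', 2, None), ('C', 1, None)]
print(is_recoverable(recoverable))    # True:  T2 commits after T1
print(is_recoverable(irrecoverable))  # False: T2's dirty read commits first
```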
Types of Recoverable Schedules-
Cascadeless Schedule: a transaction is allowed to read a data item only after the
transaction that last wrote it has committed. In other words:
• A cascadeless schedule allows only committed read operations.
• Therefore, it avoids cascading rollback and thus saves CPU time.
NOTE-
If, in a schedule, a transaction is neither allowed to read nor write a data item until
the last transaction that wrote it has committed or aborted, then such a schedule is
called a Strict Schedule.
In other words, a strict schedule allows neither dirty reads nor dirty writes.
Concurrent Execution
• In a multi-user system, multiple users can access and use the same database at
one time, which is known as concurrent execution of the database.
• It means that the same database is used simultaneously by different users on a
multi-user system.
• While working with database transactions, multiple users often need to use the
database to perform different operations at the same time, and in that case the
transactions are executed concurrently.
• The simultaneous execution is performed in an interleaved manner, and no operation
should affect the other executing operations, thus maintaining the consistency of the
database. However, concurrent execution of transaction operations raises several
challenging problems that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These two
operations need to be managed during the concurrent execution of transactions, because
if their interleaving is not controlled, the data may become inconsistent.
The following problems occur with the Concurrent Execution of the operations:
1. Lost Update Problems (W - W Conflict)
2. Dirty Read Problems (W-R Conflict)
3. Unrepeatable Read Problem (R-W Conflict)
Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform the read/write
operations on the same database items in an interleaved manner (i.e., concurrent
execution) that makes the values of the items incorrect hence making the database
inconsistent.
For example:
Consider the diagram below, where two transactions TX and TY are performed on the same
account A, whose balance is Rs 300.
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database and
then fails, and before the data is rolled back, the updated database item is accessed by
another transaction.
This creates a write-read conflict between the two transactions.
For example:
Consider two transactions TX and TY in the below diagram performing read/write operations
on account A where the available balance in account A is Rs300:
Unrepeatable Read Problem (R-W Conflict)
Also known as the Inconsistent Retrievals Problem, this occurs when, within a single
transaction, two different values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write operations on account A,
having an available balance = Rs300. The diagram is shown below:
Concurrency Control
Concurrency Control is the working concept that is required for controlling and managing
the concurrent execution of database operations and thus avoiding the
inconsistencies in the database.
Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.
Lock-Based Protocol
1. Shared lock(S):
It is also known as a read-only lock. Under a shared lock, the data item can only be read
by the transaction.
It can be shared among transactions because a transaction holding a shared lock cannot
update the data item.
2. Exclusive lock(X):
Under an exclusive lock, the data item can be both read and written by the transaction.
This lock is exclusive: it prevents multiple transactions from modifying the same data
simultaneously.
It is also called a write lock.
Lock Compatibility Matrix – a shared lock is compatible with other shared locks but not
with an exclusive lock; an exclusive lock is compatible with neither.
Two-Phase Locking (2PL) divides the execution of a transaction into two phases:
Growing phase: In the growing phase, new locks on data items may be acquired by
the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
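The two-phase discipline can be sketched as a small guard object that refuses any lock request once the first release has happened; the class and method names are illustrative.

```python
class TwoPhaseTxn:
    """Sketch of the 2PL discipline: once any lock is released
    (the shrinking phase begins), no further lock may be acquired."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire after releasing")
        self.held.add(item)       # growing phase

    def unlock(self, item):
        self.shrinking = True     # the first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseTxn("T1")
t.lock('A'); t.lock('B')   # growing phase
t.unlock('A')              # shrinking phase begins
# t.lock('C') would now raise: disallowed under 2PL
```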
If lock conversion is allowed, a lock may additionally be upgraded (from S to X) during
the growing phase and downgraded (from X to S) during the shrinking phase.
Timestamp Ordering Protocol
1. Whenever a transaction Ti issues a Read(X) operation, check the following conditions:
• If W_TS(X) > TS(Ti), then abort and roll back Ti and reject the operation.
• Otherwise, execute the Read(X) operation and set R_TS(X) to the maximum of TS(Ti) and
the current R_TS(X).
2. Whenever a transaction Ti issues a Write(X) operation, check the following conditions:
• If R_TS(X) > TS(Ti) or W_TS(X) > TS(Ti), then abort and roll back Ti and reject the
operation.
• Otherwise, execute the Write(X) operation and set W_TS(X) to TS(Ti).
Where:
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
Advantages and Disadvantages of TO protocol:
• The TO protocol ensures serializability, since every edge in the precedence graph
points from an older transaction to a younger one, so the graph cannot contain a cycle.
• The TO protocol ensures freedom from deadlock, which means no transaction ever waits.
• But the schedule may not be recoverable and may not even be cascade-free.
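The read and write checks of the basic TO protocol can be sketched as follows, with the read/write timestamp tables modeled as plain dictionaries (an assumption for illustration; absent entries default to 0).

```python
class Rollback(Exception):
    """Raised when the basic TO protocol must abort the issuing transaction."""

def to_read(ts, item, R_TS, W_TS):
    """Read check for a transaction with timestamp `ts`."""
    if ts < W_TS.get(item, 0):
        raise Rollback("item was already overwritten by a younger transaction")
    R_TS[item] = max(R_TS.get(item, 0), ts)   # record the youngest reader

def to_write(ts, item, R_TS, W_TS):
    """Write check for a transaction with timestamp `ts`."""
    if ts < R_TS.get(item, 0) or ts < W_TS.get(item, 0):
        raise Rollback("write conflicts with a younger transaction")
    W_TS[item] = ts

R_TS, W_TS = {}, {}
to_write(10, 'X', R_TS, W_TS)    # transaction with ts=10 writes X
try:
    to_read(5, 'X', R_TS, W_TS)  # older txn (ts=5) reads X written by ts=10
except Rollback:
    print("transaction with ts=5 rolled back")
```

Because a conflicting operation causes an immediate rollback rather than a wait, no transaction ever blocks, matching the deadlock-freedom claim above.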
Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency control
technique. In the validation-based protocol, a transaction is executed in the following
three phases:
• Read phase: In this phase, the transaction T is read and executed. It reads the
values of the various data items and stores them in temporary local variables. It can
perform all the write operations on temporary variables without updating the
actual database.
• Validation phase: In this phase, the temporary variable values are validated
against the actual data to see whether serializability is violated.
• Write phase: If the transaction passes validation, the temporary results are written
to the database; otherwise, the transaction is rolled back.
Here each phase has the following timestamps:
Start(Ti): the time when Ti starts its execution.
Validation(Ti): the time when Ti finishes its read phase and starts its validation phase.
Finish(Ti): the time when Ti finishes its write phase.
This protocol is used to determine the time stamp for the transaction for serialization using the
time stamp of the validation phase, as it is the actual phase which determines if the
transaction will commit or rollback.
Hence TS(T) = validation(T).
The serializability is determined during the validation process; it cannot be decided in
advance.
While executing transactions, this protocol provides a greater degree of concurrency and
fewer conflicts.
Thus it results in transactions that have fewer rollbacks.
Thomas write Rule
Thomas Write Rule provides the guarantee of serializability order for the protocol. It
improves the Basic Timestamp Ordering Algorithm.
1. If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation
is rejected.
2. If TS(T) < W_TS(X), then do not execute the Write(X) operation of the transaction and
continue processing: the outdated write is simply ignored.
3. If neither condition 1 nor condition 2 holds, then execute the Write(X) operation of
transaction T and set W_TS(X) to TS(T).
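The three conditions translate directly into code. A sketch (dictionaries as timestamp tables, as before); the point of contrast is condition 2, where basic TO would abort but Thomas's rule silently skips the obsolete write.

```python
class Rollback(Exception):
    """Raised when the write must be rejected (condition 1)."""

def thomas_write(ts, item, R_TS, W_TS):
    if ts < R_TS.get(item, 0):
        raise Rollback("a younger transaction already read the item")  # cond. 1
    if ts < W_TS.get(item, 0):
        return "ignored"   # cond. 2: obsolete write, skipped without aborting
    W_TS[item] = ts        # cond. 3: perform the write
    return "written"

R_TS, W_TS = {}, {'X': 10}
print(thomas_write(5, 'X', R_TS, W_TS))   # 'ignored': basic TO would abort here
print(thomas_write(20, 'X', R_TS, W_TS))  # 'written': W_TS(X) becomes 20
```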
If we use the Thomas write rule, then some serializable schedules that are not conflict
serializable can be permitted, as illustrated by the schedule in the given figure.
In the above figure, T1's read of the data item precedes T1's write of the same data item,
and this schedule is not conflict serializable.
The Thomas write rule recognizes that T2's write is never seen by any transaction. If we
delete the write operation of transaction T2, a conflict serializable schedule is
obtained, as shown in the figure below.
Levels of Isolation
Read Uncommitted − This is the lowest level of isolation. At this level, dirty reads are
allowed: a transaction may read uncommitted changes made by another transaction.
Read Committed − This level allows no dirty reads: any data item read is guaranteed to
have been committed at the moment it is read.
Repeatable Read − This is the most restrictive of these levels. The transaction holds
read locks on all the rows it references and write locks on all the rows it
updates, inserts, or deletes, so there is no chance of non-repeatable reads.
Multiple Granularity:
It can be defined as hierarchically breaking up the database into blocks which can be
locked.
The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
It keeps track of what to lock and how to lock.
It makes it easy to decide whether to lock or unlock a data item.
This type of hierarchy can be graphically represented as a tree.
Multiple Granularity
For example: Consider a tree which has four levels of nodes.
The first (highest) level represents the entire database.
The second level represents nodes of type area; the database consists of exactly these areas.
Each area has child nodes which are known as files. No file can be present in more than one area.
Finally, each file has child nodes known as records. A file consists of exactly those records
that are its child nodes, and no record is present in more than one file.
Intention-Shared (IS): It indicates explicit locking at a lower level of the tree, but only
with shared locks.
Intention-Exclusive (IX): It indicates explicit locking at a lower level of the tree with
exclusive (or shared) locks.
Shared (S): The node is locked in shared mode, implicitly locking its whole subtree in
shared mode.
Shared & Intention-Exclusive (SIX): The node is locked in shared mode, and some lower-level
node is locked in exclusive mode by the same transaction.
Exclusive (X): The node is locked in exclusive mode, implicitly locking its whole subtree
in exclusive mode.
Compatibility Matrix with Intention Lock Modes: The below table describes the compatibility matrix for
these lock modes:
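Since the matrix itself is not reproduced here, the sketch below encodes the standard compatibility matrix for the five modes, together with a hypothetical `can_grant` helper (both names are illustrative assumptions).

```python
# Standard multiple-granularity compatibility matrix (True = compatible).
COMPATIBLE = {
    ('IS', 'IS'): True,  ('IS', 'IX'): True,  ('IS', 'S'): True,  ('IS', 'SIX'): True,  ('IS', 'X'): False,
    ('IX', 'IS'): True,  ('IX', 'IX'): True,  ('IX', 'S'): False, ('IX', 'SIX'): False, ('IX', 'X'): False,
    ('S',  'IS'): True,  ('S',  'IX'): False, ('S',  'S'): True,  ('S',  'SIX'): False, ('S',  'X'): False,
    ('SIX','IS'): True,  ('SIX','IX'): False, ('SIX','S'): False, ('SIX','SIX'): False, ('SIX','X'): False,
    ('X',  'IS'): False, ('X',  'IX'): False, ('X',  'S'): False, ('X',  'SIX'): False, ('X',  'X'): False,
}

def can_grant(requested, held_modes):
    """A lock is granted only if it is compatible with every mode
    currently held on the node by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant('IS', ['IX']))   # True:  intention modes coexist
print(can_grant('X',  ['IS']))   # False: X is compatible with nothing
```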
The multiple granularity protocol uses the intention lock modes to ensure serializability.
It requires that if a transaction attempts to lock a node, the appropriate intention locks
on that node's ancestors must be held first.
Failure and Recovery
When a DBMS recovers from a crash:
• It should check the states of all the transactions which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of
the transaction in this case.
• It should check whether the transaction can be completed now or needs to be rolled back.
Two techniques can help a DBMS recover while maintaining the atomicity of a transaction:
• Maintaining the logs of each transaction and writing them onto stable storage
before actually modifying the database.
• Maintaining shadow paging, where the changes are made in volatile memory and,
later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and stored
on a stable storage media, which is failsafe.
While recovering data from the log files, each transaction will be placed in one of the
lists below:
1. redo list
2. undo list
Write-Ahead Log
If a log were created only after executing a transaction, there would be no log information
about the data before the transaction. Moreover, if a transaction fails, there is no question
of creating the log at all. We would lose all the data if we created the log file after the
transaction, so such a log is of no use for recovering the data.
Suppose we created a log file first with before value of the data. Then if the system crashes while executing
the transaction, then we know what its previous state / value was and we can easily revert the changes.
Hence it is always a better idea to log the details into the log file before the transaction
is executed. In addition, the system should be forced to update the log file first and only
then write the data into the DB. For example, in an ATM withdrawal, each stage of the
transaction should be logged into the log file and stored on stable storage; only then is the
actual balance updated in the DB. This guarantees the atomicity of the transaction even if
the system fails. This is known as the Write-Ahead Logging protocol.
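A minimal sketch of the write-ahead discipline: append the log record carrying the before-image first, and only then modify the database. The in-memory `log` list stands in for stable storage, and the account value is illustrative.

```python
log = []                 # stands in for the stable-storage log
database = {"A": 500}    # illustrative data item

def wal_write(txn, item, new_value):
    """Log (with before- and after-image) first, then update the database."""
    log.append((txn, item, database[item], new_value))  # undo/redo info first
    # (a real DBMS would force the log record to disk here)
    database[item] = new_value                          # only then touch the DB

def undo(txn):
    """Roll back a transaction by replaying its before-images in reverse."""
    for t, item, old, _ in reversed(log):
        if t == txn:
            database[item] = old

wal_write("T1", "A", 400)   # T1 withdraws 100
undo("T1")                  # T1 fails: the before-image restores the old value
print(database["A"])        # 500
```

Because the before-image reaches the log before the database changes, a crash at any point leaves enough information to revert the update, which is the guarantee described above.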
Shadow Paging:
Concept of Shadow Paging Technique
Shadow paging is an alternative to transaction-log based recovery techniques.
Here, the database is considered to be made up of fixed-size disk blocks, called pages.
These pages are mapped to physical storage using a table called the page table.
The page table is indexed by the page number of the database. The information about the
physical pages in which database pages are stored is kept in this page table.
This technique is similar to the paging technique used by operating systems to allocate
memory, particularly to manage virtual memory.
Shadow Paging:
The figure depicts the concept of shadow paging.
Execution of Transaction
• During the execution of the transaction, two page tables are maintained:
1. Current Page Table: used to access data items during transaction execution.
2. Shadow Page Table: the original page table, which is not modified during transaction
execution.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill all the
memory space available in the system. As time passes, the log file may grow too big to be
handled at all. To ease this situation, most modern DBMS use the concept of 'checkpoints'.
• Checkpoint is a mechanism where all the previous logs are removed from the
system and stored permanently on a storage disk.
• Checkpoint declares a point before which the DBMS was in a consistent state and all
the transactions were committed.
Recovery with Concurrent Transactions
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner:
• The recovery system reads the logs backwards from the end to the last checkpoint.
• If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>,
it puts the transaction in the redo-list.
• If the recovery system sees a log with <Tn, Start> but no commit or abort log, it puts the
transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed.
All the transactions in the redo-list are then redone, and their logs are re-saved.
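The redo/undo classification can be sketched as a single pass over a simplified log; the record format ('start'/'commit'/'abort', txn) is an assumption for illustration and omits the per-update records a real log would contain.

```python
def classify(log_records):
    """Split transactions into a redo set (committed) and an undo set
    (started but neither committed nor aborted, i.e. active at the crash)."""
    started, finished, redo = set(), set(), set()
    for action, txn in log_records:
        if action == 'start':
            started.add(txn)
        elif action in ('commit', 'abort'):
            finished.add(txn)
            if action == 'commit':
                redo.add(txn)
    undo = started - finished      # active at the crash: must be rolled back
    return redo, undo

# T1 committed before the crash; T2 was still active.
redo, undo = classify([('start', 'T1'), ('start', 'T2'), ('commit', 'T1')])
print(sorted(redo), sorted(undo))  # ['T1'] ['T2']
```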
Recovery Algorithm of ARIES (or ARIES Recovery Algorithm)
ARIES is a recovery algorithm designed to work with a no-force, steal database approach.
The ARIES recovery procedure consists of three main steps:
1. Analysis
The analysis step identifies the dirty (updated) pages in the buffer, and the set of
transactions active at the time of the crash. The appropriate point in the log where the REDO
operation should start is also determined.
2. REDO
The REDO phase actually reapplies updates from the log to the database. Generally, the
REDO operation is applied to only committed transactions. However, in ARIES, this is not the
case. Certain information in the ARIES log will provide the start point for REDO, from which
REDO operations are applied until the end of the log is reached. In addition, information
stored by ARIES and in the data pages will allow ARIES to determine whether the operation to
be redone has actually been applied to the database and hence need not be reapplied. Thus
only the necessary REDO operations are applied during recovery.
3. UNDO
During the UNDO phase, the log is scanned backwards and the operations of transactions
that were active at the time of the crash are undone in reverse order. The information needed
for ARIES to accomplish its recovery procedure includes the log, the Transaction Table, and
the Dirty Page Table. In addition, check pointing is used. These two tables are maintained by
the transaction manager and written to the log during check pointing.
Data structures used in ARIES algorithm:
1. page table
2. dirty page table
3. pageLSN
4. RedoLSN
5. Transaction Table
6. Checkpoint Log
** LSN stands for Log Sequence Number
For efficient recovery, we need the Transaction Table and the Dirty Page Table.
The Dirty Page Table contains an entry for each dirty page in the buffer, which includes the
page ID and the LSN corresponding to the earliest update to that page.
Checkpointing in ARIES consists of the following:
1. writing a begin_checkpoint record to the log,
2. writing an end_checkpoint record to the log, and
3. writing the LSN of the begin_checkpoint record to a special file.
This Checkpoint log file is accessed during recovery to locate the last checkpoint
information.
After a crash, the ARIES recovery manager takes over.
DEADLOCKS:
Consider two transactions T1 and T2. If T1 holds a lock on data item X and T2 holds a lock
on data item Y, and T1 then requests a lock on Y while T2 requests a lock on X, a deadlock
occurs: neither transaction is willing to release its lock on X or Y.
Wait-Die scheme
• The wait-die scheme is based on the timestamps of the transactions requesting conflicting
resources. Suppose T1 requests a resource held by T2:
1) If TS(T1) < TS(T2): T1 is the older transaction, so T1 will wait in a queue.
2) If TS(T1) > TS(T2): T1 is the younger transaction, so T1 will die/abort and be restarted
later.
For example:
Suppose that transaction T22, T23, T24 have time-stamps 5, 10 and 15 respectively. If T22
requests a data item held by T23 then T22 will wait. If T24 requests a data item held by
T23, then T24 will be rolled back.
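The wait-die decision reduces to a single timestamp comparison; the sketch below reproduces the T22/T23/T24 example from the notes.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester (smaller timestamp) waits;
    a younger requester is aborted (dies)."""
    return "wait" if requester_ts < holder_ts else "die"

# From the example: T22, T23, T24 have timestamps 5, 10 and 15.
print(wait_die(5, 10))   # 'wait': T22 requests an item held by T23
print(wait_die(15, 10))  # 'die' : T24 requests an item held by T23
```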
Buffer Manager
• A Buffer Manager is responsible for allocating space to the buffer in order to store data into the
buffer.
• If a user requests a particular block and the block is available in the buffer, the buffer
manager provides the block's address in main memory.
• If the block is not available in the buffer, the buffer manager allocates space for the
block in the buffer.
• If free space is not available, the buffer manager throws out some existing blocks from the
buffer to allocate the required space for the new block.
• The blocks that are thrown out are written back to disk only if they have been modified
since they were last written to disk.
• If the user later requests such a thrown-out block, the buffer manager reads the requested
block from disk into the buffer and then passes the address of the requested block in main
memory to the user.
• The internal actions of the buffer manager are thus transparent to the programs that issue
disk-block requests; the buffer manager works like a virtual machine.
The buffer manager uses the following methods: