Professional Documents
Culture Documents
DBMS_UNIT 5_(2022-2023)
DBMS_UNIT 5_(2022-2023)
DBMS_UNIT 5_(2022-2023)
TRANSACTIONS
Introduction to Transaction Processing
Transaction is a logical unit of work that represents real-world events of any organization or an
enterprise whereas concurrency control is the management of concurrent transaction
execution. Transaction processing systems execute database transactions with large databases
and hundreds of concurrent users, for example, railway and air reservations systems, banking
system, credit card processing, stock market monitoring, super market inventory and checkouts
and so on.
A process is resumed at the point where it was suspended whenever it gets its turn to use the CPU
again. Hence, concurrent execution of processes is actually interleaved, as illustrated in Figure
21.1, which shows two processes, A and B, executing concurrently in an interleaved fashion.
Interleaving keeps the CPU busy when a process requires an input or output (I/O) operation, such
as reading a block from disk. The CPU is switched to execute another process rather than
remaining idle during I/O time. Interleaving also prevents a long process from delaying other
processes. If the computer system has multiple hardware processors (CPUs), parallel processing
of multiple processes is possible, as illustrated by processes C and D in Figure 21.1. Concurrency
Control deals with interleaved execution of more than one transaction.
Transactions, Database Items, Read and Write Operations, and DBMS Buffers: -
A transaction is an executing program that forms a logical unit of database processing. A
transaction includes one or more database access operations—these can include insertion,
deletion, modification, or retrieval operations. One way of specifying the transaction boundaries is
by specifying explicit begin transaction and end transaction statements in an application
program; in this case, all database access operations between the two are considered as forming
one transaction. A single application program may contain more than one transaction if it contains
several transaction boundaries. If the database operations in a transaction do not update the
database but only retrieve data, the transaction is called a read-only transaction; otherwise it is
known as a read-write transaction.
The database model that is used to present transaction processing concepts is quite simple when
compared to the data models such as the relational model or the object model. A database is
basically represented as a collection of named data items. The size of a data item is called its
granularity. A data item can be a database record, but it can also be a larger unit such as a whole
disk block, or even a smaller unit such as an individual field (attribute) value of some record in the
database. Using this simplified database model, the basic database access operations that a
transaction can include are as follows:
• read_item(X). Reads a database item named X into a program variable. To simplify our
notation, we assume that the program variable is also named X.
• write_item(X). Writes the value of program variable X into the database item named X.
The basic unit of data transfer from disk to main memory is one block. Executing a read_item(X)
command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in some
main memory buffer).
3. Copy item X from the buffer to the program variable named X.
Concurrency control and recovery mechanisms are mainly concerned with the database
commands in a transaction. Transactions submitted by the various users may execute
concurrently and may access and update the same database items. If this concurrent execution is
uncontrolled, it may lead to problems, such as an inconsistent database.
Here,
• At time t2, transaction-X reads A's value.
• At time t3, Transaction-Y reads A's value.
• At time t4, Transactions-X writes A's value on the basis of the value seen at time t2.
• At time t5, Transactions-Y writes A's value on the basis of the value seen at time t3.
• So, at time T5, the update of Transaction-X is lost because Transaction y overwrites it without
looking at its current value.
• Such type of problem is known as Lost Update Problem as update made by one transaction is
lost here.
The Temporary Update (or Dirty Read) Problem: - The dirty read occurs in the case when one
transaction updates an item of the database, and then the transaction fails for some reason. The
updated database item is accessed by another transaction before it is changed back to the original
value. A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values
which have never formed part of the stable database.
Example: -
Here,
• At time t2, transaction-Y writes A's value.
• At time t3, Transaction-X reads A's value.
• At time t4, Transactions-Y rollbacks. So, it changes A's value back to that of prior to t1.
• So, Transaction-X now contains a value which has never become part of the stable database.
• Such type of problem is known as Dirty Read Problem, as one transaction reads a dirty value
which has not been committed
• Transaction-X is doing the sum of all balance while transaction-Y is transferring an amount 50
from Account-1 to Account-3.
• Here, transaction-X produces the result of 550 which is incorrect. If we write this produced
result in the database, the database will become an inconsistent state because the actual sum is
600. Here, transaction-X has seen an inconsistent state of the database.
The Unrepeatable Read Problem: -
Another problem that may occur is called unrepeatable read, where a transaction T reads the
same item twice and the item is changed by another transaction T1_ between the two reads.
Hence, T receives different values for its two reads of the same item.
Types of Failures. Failures are generally classified as transaction, system, and media failures.
There are several possible reasons for a transaction to fail in the middle of execution:
1. A computer failure (system crash). A hardware, software, or network error occurs in the
computer system during transaction execution. Hardware crashes are usually media
failures—for example, main memory failure.
2. A transaction or system error. Some operation in the transaction may cause it to fail, such as
integer overflow or division by zero. Transaction failure may also occur because of erroneous
parameter values or because of a logical programming error. Additionally, the user may
interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction. During transaction
execution, certain conditions may occur that necessitate cancellation of the transaction. For
example, data for the transaction may not be found. An exception condition,4 such as
insufficient account balance in a banking database, may cause a transaction, such as a fund
withdrawal, to be canceled. This exception could be programmed in the transaction itself, and
in such a case would not be considered as a transaction failure.
4. Concurrency control enforcement. The concurrency control method may decide to abort a
transaction because it violates serializability, or it may abort one or more transactions to
resolve a state of deadlock among several transactions. Transactions aborted because of
serializability violations or deadlocks are typically restarted automatically at a later time.
5. Disk failure. Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash. This may happen during a read or a
write operation of the transaction.
6. Physical problems and catastrophes. This refers to an endless list of problems that includes
power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake,
and mounting of a wrong tape by the operator.
Failures of types 1, 2, 3, and 4 are more common than those of types 5 or 6. Whenever a failure of
type 1 through 4 occurs, the system must keep sufficient information to quickly recover from the
failure. Disk failure or other catastrophic failures of type 5 or 6 do not happen frequently; if they
do occur, recovery is a major task.
The concept of transaction is fundamental to many techniques for concurrency control and
recovery from failures.
Transaction and System Concepts: -
Transaction States and Additional Operations
A transaction is an atomic unit of work that should either be completed in its entirety or not done
at all. For recovery purposes, the system needs to keep track of when each transaction starts,
terminates, and commits or aborts. Therefore, the recovery manager of the DBMS needs to keep
track of the following operations:
• BEGIN_TRANSACTION. This marks the beginning of transaction execution.
• READ or WRITE. These specify read or write operations on the database items that are
executed as part of a transaction.
• END_TRANSACTION. This specifies that READ and WRITE transaction operations have ended
and marks the end of transaction execution. However, at this point it may be necessary to check
whether the changes introduced by the transaction can be permanently applied to the database
(committed) or whether the transaction has to be aborted because it violates serializability or
for some other reason.
• COMMIT_TRANSACTION. This signals a successful end of the transaction so that any changes
(updates) executed by the transaction can be safely committed to the database and will not be
undone.
• ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that
any changes or effects that the transaction may have applied to the database must be undone.
The above figure shows a state transition diagram that illustrates how a transaction moves
through its execution states. A transaction goes into an active state immediately after it starts
execution, where it can execute its READ and WRITE operations. When the transaction ends, it
moves to the partially committed state. At this point, some recovery protocols need to ensure
that a system failure will not result in an inability to record the changes of the transaction
permanently. Once this check is successful, the transaction is said to have reached its commit
point and enters the committed state. When a transaction is committed, it has concluded its
execution successfully and all its changes must be recorded permanently in the database, even if a
system failure occurs.
However, a transaction can go to the failed state if one of the checks fails or if the transaction is
aborted during its active state. The transaction may then have to be rolled back to undo the effect
of its WRITE operations on the database. The terminated state corresponds to the transaction
leaving the system. The transaction information that is maintained in system tables while the
transaction has been running is removed when the transaction terminates. Failed or aborted
transactions may be restarted later—either automatically or after being resubmitted by the
user—as brand new transactions.
Protocols for recovery that avoid cascading rollbacks —which include nearly all practical
protocols—do not require that READ operations are written to the system log. Additionally, some
recovery protocols require simpler WRITE entries only include one of new_value and old_value
instead of including both.
Notice that we are assuming that all permanent changes to the database occur within transactions,
so the notion of recovery from a transaction failure amounts to either undoing or redoing
transaction operations individually from the log. If the system crashes, we can recover to a
consistent database state by examining the log. Because the log contains a record of every WRITE
operation that changes the value of some database item, it is possible to undo the effect of these
WRITE operations of a transaction T by tracing backward through the log and resetting all items
changed by a WRITE operation of T to their old_values. Redo of an operation may also be
necessary if a transaction has its updates recorded in the log but a failure occurs before the system
can be sure that all these new_values have been written to the actual database on disk from the
main memory buffers.
T1 T2
Read(A) Read(B)
A:= A-100 Y:= Y+100
Write(A) Write(B)
The total amount must be maintained before or after the transaction.
Total before T occurs = 600+300=900
Total after T occurs= 500+400=900
Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.
The isolation property is enforced by the concurrency control subsystem of the DBMS. If
every transaction does not make its updates (write operations) visible to other transactions until
it is committed, one form of isolation is enforced that solves the temporary update problem and
eliminates cascading rollbacks but does not eliminate all other problems. There have been
attempts to define the level of isolation of a transaction. A transaction is said to have level 0
(zero) isolation if it does not overwrite the dirty reads of higher-level transactions. Level 1 (one)
isolation has no lost updates, and level 2 isolation has no lost updates and no dirty reads. Finally,
level 3 isolation (also called true isolation) has, in addition to level 2 properties, repeatable reads.
And last, the durability property is the responsibility of the recovery subsystem of the DBMS.
SERIALIZABILITY
When multiple transactions are running concurrently then there is a possibility that the database
may be left in an inconsistent state. Serializability is a concept that helps us to check
which schedules are Serializable. A Serializable schedule is the one that always leaves the
database in consistent state.
• Conflict Serializability: -
Conflict Serializability is one of the type of Serializability, which can be used to check whether
a non-serial schedule is conflict serializable or not.
Conflicting operations: -
Two operations are said to be in conflict, if they satisfy all the following three conditions:
• Both the operations should belong to different transactions.
• Both the operations are working on same data item.
• At least one of the operation is a write operation.
Some examples to understand this
Example 1: -
Operation W(X) of transaction T1 and operation R(X) of transaction T2 are conflicting
operations, because they satisfy all the three conditions mentioned above. They belong to
different transactions; they are working on same data item X, one of the operation in write
operation.
Example 2:
Similarly Operations W(X) of T1 and W(X) of T2 are conflicting operations.
Example 3:
Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations because both the
write operations are not working on same data item so these operations don’t satisfy the
second condition.
Example 4:
Similarly R(X) of T1 and R(X) of T2 are non-conflicting operations because none of them is
write operation.
Example 5:
Similarly W(X) of T1 and R(X) of T1 are non-conflicting operations because both the
operations belong to same transaction T1.
We finally got a serial schedule after swapping all the non-conflicting operations so we can say
that the given schedule is Conflict Serializable.
• View Serializability: -
View Serializability is a process to find out that a given schedule is view Serializable or not. To
check whether a given schedule is view Serializable, we need to check whether the given
schedule is View Equivalent to its serial schedule. Let’s take an example to understand what I
mean by that.
Given Schedule: -
T1 T2
----- ------
R(X)
W(X)
R(X)
W(X)
R(Y)
W(Y)
R(Y)
W(Y)
If we can prove that the given schedule is View Equivalent to its serial schedule then the given
schedule is called view Serializable
View Equivalent: -
Two schedules T1 and T2 are said to be view equivalent, if they satisfy all the following
conditions:
• Initial Read: -
Initial read of each data item in transactions must match in both schedules. For example, if
transaction T1 reads a data item X before transaction T2 in schedule S1 then in schedule
S2, T1 should read X before T2.
Read vs Initial Read: You may be confused by the term initial read. Here initial read
means the first read operation on a data item, for example, a data item X can be read
multiple times in a schedule but the first read operation on X is called the initial read. This
will be more clear once we will get to the example in the next section of this same article.
• Final Write: -
Final write operations on each data item must match in both the schedules. For example, a
data item X is last written by Transaction T1 in schedule S1 then in S2, the last write
operation on X should be performed by the transaction T1.
• Update Read: -:
If in schedule S1, the transaction T2 is reading a data item updated by T1 then in schedule
S2, T2 should read the value after the write operation of T1 on same data item. For
example, In schedule S1, T2 performs a read operation on X after the write operation on X
by T1 then in S2, T2 should read the X after T1 performs write on X.
View Serializable Example: -
Table 21.1 summarizes the possible violations for the different isolation levels. An entry of Yes
indicates that a violation is possible and an entry of No indicates that it is not possible. READ
UNCOMMITTED is the most forgiving, and SERIALIZABLE is the most restrictive in that it avoids
all three of the problems mentioned above.
Here,
• At time t2, transaction-X reads A's value.
• At time t3, Transaction-Y reads A's value.
• At time t4, Transactions-X writes A's value on the basis of the value seen at time t2.
• At time t5, Transactions-Y writes A's value on the basis of the value seen at time t3.
• So, at time T5, the update of Transaction-X is lost because Transaction y overwrites it without
looking at its current value.
• Such type of problem is known as Lost Update Problem as update made by one transaction is
lost here.
The Temporary Update (or Dirty Read) Problem: - The dirty read occurs in the case when one
transaction updates an item of the database, and then the transaction fails for some reason. The
updated database item is accessed by another transaction before it is changed back to the original
value. A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values
which have never formed part of the stable database.
Example: -
Here,
• At time t2, transaction-Y writes A's value.
• At time t3, Transaction-X reads A's value.
• At time t4, Transactions-Y rollbacks. So, it changes A's value back to that of prior to t1.
• So, Transaction-X now contains a value which has never become part of the stable database.
• Such type of problem is known as Dirty Read Problem, as one transaction reads a dirty value
which has not been committed
• Exclusive Lock (X): - Data item can be both read as well as written. This is Exclusive and
cannot be held simultaneously on the same data item. X-lock is requested using lock-X
instruction.
If a resource is already locked by another transaction, then a new lock request can be
granted only if the mode of the requested lock is compatible with the mode of the existing lock.
Any number of transactions can hold shared locks on an item, but if any transaction holds an
exclusive lock on item, no other transaction may hold any lock on the item.
Upgrade / Downgrade locks: - A transaction that holds a lock on an item A is allowed under
certain condition to change the lock state from one state to another.
Upgrade: - A S(A) can be upgraded to X(A) if Ti is the only transaction holding the S-lock on
element A.
Downgrade: - We may downgrade X(A) to S(A) when we feel that we no longer want to write on
data-item A. As we were holding X-lock on A, we need not check any conditions.
Applying simple locking, we may not always produce Serializable results; it may lead to
Deadlock Inconsistency.
Problem with Simple Locking: -
Consider the Partial Schedule:
T1 T2
1 lock-X(B)
2 read(B)
3 B:=B-50
4 write(B)
5 lock-S(A)
6 read(A)
7 lock-S(B)
8 lock-X(A)
9 …… ……
Deadlock: - Deadlock refers to a specific situation where two or more processes are waiting for
each other to release a resource or more than two processes are waiting for the resource in a
circular chain. Consider the above execution phase. Now, T1 holds an Exclusive lock over B,
and T2 holds a Shared lock over A. Consider Statement 7, T2 requests for lock on B, while in
Statement 8 T1 requests lock on A. This as you may notice imposes a Deadlock as none can
proceed with their execution.
Starvation: - Starvation is also possible if concurrency control manager is badly designed.
Starvation is the situation when a transaction needs to wait for an indefinite period to acquire a
lock. Following are the reasons for Starvation:
• When waiting scheme for locked items is not properly managed.
• In the case of resource leak.
• The same transaction is selected as a victim repeatedly.
For example: - A transaction may be waiting for an X-lock on an item, while a sequence of other
transactions request and are granted an S-lock on the same item. This may be avoided if the
concurrency control manager is properly designed.
The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in
two steps:
• Growing Phase: - In this phase transaction may obtain locks but may not release any locks.
• Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock.
It is true that the 2PL protocol offers Serializability. However, it does not ensure that
deadlocks do not happen.
In the below example, if lock conversion is allowed then the following phase can happen:
• Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.
• Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.
Example: -
T1 T2
1 LOCK – S(A)
2 LOCK – S(A)
3 LOCK – X(B)
4 ------ ------
5 UNLOCK(A)
6 LOCK – X(C)
7 UNLOCK(B)
8 UNLOCK(A)
9 UNLOCK(C)
10 ------ ------
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
• Growing phase: from step 1-3
• Shrinking phase: from step 5-7
• Lock point: at 3
Transaction T2:
• Growing phase: from step 2-6
• Shrinking phase: from step 8-9
• Lock point: at 6
Lock Point is the point at which the growing phase ends, i.e., when transaction takes the final lock
it needs to carry on its work.
2-PL ensures Serializability, but there are still some drawbacks of 2-PL.
• Cascading Rollback is possible under 2-PL.
• Deadlocks and Starvation is possible.
Deadlock in 2-PL: -
Consider this simple example; it will be easy to understand. Say we have two transactions
T1 and T2.
Schedule: Lock-X1(A) Lock-X2(B) Lock-X1(B) Lock-X2(A)
Drawing the precedence graph, you may detect the loop. So Deadlock is also possible in 2-PL.
Two-phase locking may also limit the amount of concurrency that occurs in a schedule
because a Transaction may not be able to release an item after it has used it. This may be
because of the protocols and other restrictions we may put on the schedule to ensure
Serializability, deadlock freedom and other factors. The 2-PL is called Basic 2PL. To sum it up
it ensures Conflict Serializability but does not prevent Cascading Rollback and Deadlock.
There three other types of 2PL, Strict 2PL, Conservative 2PL and Rigorous 2PL.
Strict Two-Phase Locking Method: - Strict-Two phase locking system is almost similar to 2PL.
The only difference is that Strict-2PL never releases a lock after using it. It holds all the locks until
the commit point and releases all the locks at one go when the process is over.
Rigorous 2PL: - This requires that in addition to the lock being 2 phase all Exclusive(X) and
Shared(S) Locks held by the transaction be released until after the transaction commits.
Conservative 2PL: -
Conservative 2-PL is Deadlock free but it does not ensure Strict schedule. However, it is
difficult to use in practice because of need to predeclare the read-set and the write-set which is
not possible in many situations. In practice, the most popular variation of 2-PL is Strict 2-PL.
The Venn diagram below shows the classification of schedules which are rigorous and
strict. The universe represents the schedules which can be serialized as 2-PL. Now as the diagram
suggests, and it can also be logically concluded, if a schedule is Rigorous then it is Strict. We put a
restriction on a schedule which makes it strict, adding another to the list of restrictions make it
Rigorous.
T1: T2:
Read(A) Read(A)
A:=A-50 Temp:=A*0.1
Write(A) A:=A-Temp
Read(B) Write(A)
B:=B+50 Read(B)
Write(B). B:=B+Temp
Write(B).
Impose a partial ordering ⟶ on the set D = {d1, d2, d3, ….., dn} of all data items.
• If di –> dj then any transaction accessing both di and dj must access di before accessing dj.
• Implies that the set D may now be viewed as a directed acyclic graph (DAG), called a database
graph.
Tree Based Protocols is a simple implementation of Graph Based Protocol
Database Graph: -
The following 4 transactions follow the tree protocol on previous data base graph.
• T1: lock – x(B); lock – x(E); lock – x(D); unlock – x(B); unlock – x(E); lock – x(G); unlock – x(D);
unlock – x(G).
• T2: lock – x(D); lock – x(H); unlock – x(D); unlock – x(H).
• T3: lock – x(B); lock – x(E); unlock – x(E); unlock – x(B).
• T4: lock – x(H); lock – x(J); unlock – x(H); unlock – x(J).
Advantages: -
• The tree protocol ensures conflict Serializability.
• Freedom from deadlock.
• Unlocking may occur earlier in the tree locking protocol than in the two phase locking
protocol.
• Shorter waiting times, and increase in concurrency.
• Protocol is deadlock free, no rollbacks are required.
Disadvantages: -
• In the tree locking protocol, a transaction may have to lock data items that it does not access.
• Increased locking over head, and additional waiting time.
• Potential decrease in concurrency.
In deferred update, the changes are not In immediate update, the changes are
1. applied immediately to the database. applied directly to the database.
The log file contains all the changes that The log file contains both old as well as new
2. are to be applied to the database. values.
In this method once rollback is done all In this method once rollback is done the old
the records of log file are discarded and values are restored into the database using
3. no changes are applied to the database. the records of the log file.
The major disadvantage of this method The major disadvantage of this method is
is that it requires a lot of time for that there are frequent I/O operations
5. recovery in case of system failure. while the transaction is active.
IMPORTANT QUESTIONS: -
1. Explain two multiversion techniques for concurrency control.
2. Discuss the problems of deadlocks and starvation.
3. Discuss two phase locking protocol and strict two phase locking protocols.
4. Explain these terms: conflict-serializable schedule, view- serializable schedule & Shadow
Paging technique.
5. Explain different types of locks that can be applied in brief.
6. Discuss the following terminology in Transaction support in SQL:
i) Access mode
ii) Isolation level
iii) Dirty read
iv) No repeatable read
7. Explain anomalies due to interleaved transactions. Write lock based concurrency control.
8. Write Recovery Techniques Based on Immediate Update operations.
9. Outline how Serializability is guaranteed by Two-Phase Locking.
10. Discuss how schedules are Characterized based on Recoverability & Serializability.
11. Discuss the UNDO and REDO operations and the recovery techniques that use each in
transactions.
12. Discuss the actions taken by the read_item and write_item operations on a database.
13. Explain anomalies due to interleaved transactions.
14. Write about any two Recovery Protocols
15. Discuss various Recovery Techniques Based on Immediate Update and Shadow Paging.
16. Explain the Transaction Support provided by SQL.
17. What type of lock is need for insert and deletes operations?
18. Discuss about serialization by 2 phase locking.