
DBMS_MODULE_5_NOTES

Single-User Vs. Multi-User System

• A DBMS is single-user if at most one user at a time can use the system.
• A DBMS is multi-user if many users can use the system concurrently.
• In a multi-user DBMS, the stored data items are the primary resources that may be accessed
concurrently by user programs, which are constantly retrieving information from and modifying
the database. The execution of a program that accesses or changes the contents of the database is
called a transaction.
Q] What is a transaction?
Q] What is transaction processing?
Transaction Processing
• A transaction can be defined as a group of tasks.
• A single task is the minimum processing unit which cannot be divided further.
• Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's
account to B's account.
• This very simple and small transaction involves several low-level tasks
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance – 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)

Q] Explain neatly the state transition diagram for a transaction.

State Transition Diagram for Transaction


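The diagram itself is not reproduced in these notes. In the usual textbook diagram a transaction passes through the states Active, Partially Committed, Committed, Failed and Aborted. A minimal Python sketch of the allowed transitions (state names follow that standard diagram; the code is only an illustration):

# Hypothetical sketch of the standard transaction state transition diagram.
ALLOWED_TRANSITIONS = {
    "ACTIVE":              {"PARTIALLY_COMMITTED", "FAILED"},
    "PARTIALLY_COMMITTED": {"COMMITTED", "FAILED"},
    "FAILED":              {"ABORTED"},
    "COMMITTED":           set(),   # terminal state
    "ABORTED":             set(),   # terminal state (the transaction may be restarted as a new one)
}

def move(state, new_state):
    """Return the new state if the transition is legal, else raise an error."""
    if new_state not in ALLOWED_TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

# Example: a successful transaction
s = "ACTIVE"
s = move(s, "PARTIALLY_COMMITTED")   # last operation has executed
s = move(s, "COMMITTED")             # changes made permanent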
Q] List and explain the properties of a transaction?
ACID Properties
A transaction in a database system must maintain
Atomicity, Consistency, Isolation, and Durability − commonly known as the ACID properties − in order to
ensure accuracy, completeness, and data integrity.

Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of
its operations are executed or none of them is.
There must be no state in the database where a transaction is left partially completed.
A database state is defined either before the execution of the transaction or after the
execution/abortion/failure of the transaction.

Consistency − The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database.
If the database was in a consistent state before the execution of a transaction, it must remain consistent
after the execution of the transaction as well

Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that all the transactions will be carried out
and executed as if each were the only transaction in the system.
No transaction will affect the existence of any other transaction.

Durability − The database should be durable enough to hold all its latest updates even if the system fails
or restarts. If a transaction updates a chunk of data in a database and commits, then the database will
hold the modified data.
If a transaction commits but the system fails before the data could be written on to the disk, then that
data will be updated once the system springs back into action.
Q] Explain concurrency control?
Concurrency control
• Concurrency control is the process of managing the simultaneous execution of transactions
(such as queries, updates, inserts, deletes and so on) in a multiprocessing database system
without having them interfere with one another.
• This property of a DBMS allows many transactions to access the same database at the same time
without interfering with each other.
• The primary goal of concurrency control is to ensure the atomicity of the execution of transactions in a
multi-user database environment.
• Concurrency control mechanisms attempt to interleave (parallelize) READ and WRITE operations of
multiple transactions so that the interleaved execution yields results that are identical to the
results of a serial schedule execution.
Q] Discuss various problems in concurrency control.
Problems of Concurrency Control :
• When concurrent transactions are executed in an uncontrolled manner, several problems can
occur.
• Concurrency control must address the following three main problems:
1. Lost updates.
2. Dirty read (or uncommitted data).
3. Unrepeatable read (or inconsistent retrievals)

Q] Explain Lost Update Problem in Concurrency control?


Lost Update Problem :

• A lost update problem occurs when two transactions that access the same database items have
their operations interleaved in a way that makes the value of some database item incorrect.
• In other words, if transactions T1 and T2 both read a record and then update it, the effects of the
first update will be overwritten by the second update.

Example:
Consider the situation given in figure that shows operations performed by two transactions,
Transaction- A and Transaction- B with respect to time.
Transaction- A Time Transaction- B

----- t0 ----

Read X t1 ----
---- t2 Read X

Update X t3 ----

---- t4 Update X

---- t5 ----

• At time t1, Transaction-A reads the value of X.
• At time t2, Transaction-B reads the value of X.
• At time t3, Transaction-A writes a value of X on the basis of the value seen at time t1.
• At time t4, Transaction-B writes a value of X on the basis of the value seen at time t2.
• So, the update made by Transaction-A is lost at time t4, because Transaction-B overwrites it without looking
at its current value.
• Such a problem is referred to as the Lost Update Problem, as the update made by one transaction
is lost.
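The interleaving above can be replayed step by step. A small Python sketch (variable names and the amounts +10 and -20 are illustrative) that reproduces schedule steps t1..t4 and shows Transaction-A's update being overwritten:

X = 100                      # shared database item

# t1: Transaction-A reads X        a_local = 100
a_local = X
# t2: Transaction-B reads X        b_local = 100 (same stale value)
b_local = X
# t3: Transaction-A writes X       X = 100 + 10 = 110
X = a_local + 10
# t4: Transaction-B writes X       X = 100 - 20 = 80, so A's update is lost
X = b_local - 20

print(X)   # 80, not 90: the +10 written by Transaction-A has disappeared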

Q] Explain Dirty read Problem in Concurrency control?

Dirty Read Problem :

• A dirty read problem occurs when one transaction updates a database item and then the
transaction fails for some reason.
• The updated database item is accessed by another transaction before it is changed back to the
original value.
• In other words, a transaction T1 updates a record, which is read by the transaction T2.
• Then T1 aborts and T2 now has values which have never formed part of the stable database

Example:
Consider the situation given in figure
Transaction- A Time Transaction- B

---- t0 Read X

---- t1 Update X
Read X t2 ----

---- t3 Rollback

---- t4 ----

• At time t1, Transaction-B writes a value of X.
• At time t2, Transaction-A reads the value of X.
• At time t3, Transaction-B rolls back.
• So, it changes the value of X back to what it was prior to t1.
• Transaction-A now has a value which has never become part of the stable database.
• Such a problem is referred to as the Dirty Read Problem, as one transaction reads a dirty value
which has not been committed.
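The same style of replay shows the dirty read (the values 100 and 150 are illustrative):

X = 100                      # committed value of X

# t1: Transaction-B updates X (not yet committed)
before_image = X
X = 150
# t2: Transaction-A reads the uncommitted (dirty) value
a_sees = X                   # 150
# t3: Transaction-B rolls back, restoring the value prior to t1
X = before_image             # 100

print(a_sees, X)             # 150 100: A holds a value that never became part of the stable database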

Q] Explain Inconsistent Retrievals Problem or Incorrect Summary problem?

Inconsistent Retrievals Problem or Incorrect Summary problem :


An unrepeatable read (or inconsistent retrieval) occurs when a transaction calculates some
summary (aggregate) function over a set of data while other transactions are updating the data.
The problem is that the transaction might read some data before they are changed and other data
after they are changed, thereby yielding inconsistent results.
In an unrepeatable read, the transaction T1 reads a record and then does some other processing
during which the transaction T2 updates the record. Now, if T1 rereads the record, the new value
will be inconsistent with the previous value.

• Example:
• Consider the situation given in the figure that shows two transactions operating on three accounts:
Account-1 Account-2 Account-3
Balance = 200 Balance = 250 Balance = 150

Transaction- A Time Transaction- B

----- t0 ----

Read Balance of Acc-1 t1 ----


sum <-- 200
Read Balance of Acc-2
Sum <-- Sum + 250 = 450 t2 ----

---- t3 Read Balance of Acc-3

---- t4 Update Balance of Acc-3


150 --> 150 - 50 --> 100

---- t5 Read Balance of Acc-1

---- t6 Update Balance of Acc-1


200 --> 200 + 50 --> 250

---- t7 COMMIT
Read Balance of Acc-3

Sum <-- Sum + 100 = 550 t8 ----

Transaction-A is summing all balances,
while Transaction-B is transferring an amount of 50 from Account-3 to Account-1.
Here, the result produced by Transaction-A is 550, which is incorrect.
If this result is written to the database, the database will be in an inconsistent state, as the actual sum is 600.
Here, Transaction-A has seen an inconsistent state of the database and has performed an inconsistent
analysis.
In a system where multiple transactions can be executed simultaneously, it is highly important to control the
concurrency of transactions.
We have concurrency control protocols to ensure the atomicity, isolation, and serializability of
concurrent transactions.
Q] Explain the two concurrency control protocols ?
Concurrency control protocols can be broadly divided into two categories −
• Lock based protocols
• Time stamp based protocols
Q] What is granularity of locks? Explain.
Lock Granularity :
• A database is basically represented as a collection of named data items.
• The size of the data item chosen as the unit of protection by a concurrency control program
is called GRANULARITY.
• Locking can take place at the following levels:
1. Database level.
2. Table level.
3. Row (Tuple) level.
4. Attributes (fields) level

Database level Locking :


At database level locking, the entire database is locked. Thus, it prevents the use of any table in
the database by transaction T2 while transaction T1 is being executed.
Table level Locking :
At table level locking, the entire table is locked. Thus, it prevents access to any row (tuple) by
transaction T2 while transaction T1 is using the table. If a transaction requires access to several
tables, each table may be locked. However, two transactions can access the same database as
long as they access different tables. Table level locking is less restrictive than database
level locking. Table level locks are not suitable for a multi-user DBMS.
Row (Tuple) level Locking :
At row level locking, a particular row (or tuple) is locked. A lock exists for each row in each table
of the database.
The DBMS allows concurrent transactions to access different rows of the same table.
Attributes (fields) level Locking :
At attribute level locking, a particular attribute (or field) is locked.
Attribute level locking allows concurrent transactions to access the same row, as long as they
require the use of different attributes within the row.

Lock-based Protocols
• Database systems equipped with lock-based protocols use a mechanism by which a transaction
cannot read or write a data item until it acquires an appropriate lock on it.
Q] Explain the two kinds of locks with respect to concurrency control?
Locks are of two kinds −
A] Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
B] Shared / Exclusive Locking
Shared lock :
These locks are referred to as read locks, and denoted by 'S'.
If a transaction T has obtained a shared lock on data item X, then T can read X, but cannot write X.
Multiple shared locks can be placed simultaneously on a data item.
Exclusive lock :
These locks are referred to as write locks, and denoted by 'X'.
If a transaction T has obtained an exclusive lock on data item X, then T can read as well as write X.
Only one exclusive lock can be placed on a data item at a time.
This means that multiple transactions do not modify the same data item simultaneously.
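These rules amount to a small compatibility matrix: a requested lock is granted only if it is compatible with every lock already held on the item by other transactions. A minimal Python sketch (the data structures are illustrative, not a real DBMS lock manager):

# Compatibility of a requested lock (first mode) with a lock already held (second mode).
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share an item
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,   # at most one writer, and no readers while it writes
}

held = {}   # item -> list of (transaction, mode)

def request_lock(txn, item, mode):
    """Grant the lock only if it is compatible with all locks other transactions hold on item."""
    for owner, held_mode in held.get(item, []):
        if owner != txn and not COMPATIBLE[(mode, held_mode)]:
            return False                      # caller must wait (or abort)
    held.setdefault(item, []).append((txn, mode))
    return True

print(request_lock("T1", "X_item", "S"))      # True
print(request_lock("T2", "X_item", "S"))      # True: shared locks coexist
print(request_lock("T3", "X_item", "X"))      # False: exclusive conflicts with the readers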
Q] Explain the two-phase locking protocol (2PL).
Two-Phase Locking (2PL) :
Two-phase locking (also called 2PL) is a method or protocol for controlling concurrent processing
in which all locking operations precede the first unlocking operation.
Thus, a transaction is said to follow the two-phase locking protocol if all locking operations (such
as read_Lock, write_Lock) precede the first unlock operation in the transaction.
The essential discipline is that after a transaction has released a lock, it may not obtain any further
locks.

2PL has the following two phases:


• A growing phase, in which a transaction acquires all the required locks without unlocking any
data. Once all the locks have been acquired, the transaction is at its locked point.
• A shrinking phase, in which a transaction releases all locks and cannot obtain any new lock.

Time Transaction Remarks

t0 Lock - X (A) acquire Exclusive lock on A.

t1 Read A read original value of A

t2 A = A - 100 subtract 100 from A

t3 Write A write new value of A

t4 Lock - X (B) acquire Exclusive lock on B.

t5 Read B read original value of B

t6 B = B + 100 add 100 to B

t7 Write B write new value of B

t8 Unlock (A) release lock on A


t9 Unlock (B) release lock on B
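In the schedule above, t0 to t7 form the growing phase and t8, t9 the shrinking phase. A transaction obeys 2PL as long as it never requests a new lock after its first unlock; a minimal Python sketch that enforces this discipline (a wrapper class for illustration, not an actual scheduler):

class TwoPhaseTransaction:
    """Refuses any lock request made after the first unlock (i.e., during the shrinking phase)."""
    def __init__(self, name):
        self.name = name
        self.shrinking = False
        self.locks = set()

    def lock(self, item, mode):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violated - lock requested after an unlock")
        self.locks.add((item, mode))

    def unlock(self, item):
        self.shrinking = True        # the growing phase is over
        self.locks = {l for l in self.locks if l[0] != item}

t = TwoPhaseTransaction("T1")
t.lock("A", "X"); t.lock("B", "X")   # growing phase (t0..t4 in the table)
t.unlock("A"); t.unlock("B")         # shrinking phase (t8, t9)
# t.lock("C", "S") would now raise: a 2PL transaction may not acquire further locks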

Q] Explain the timestamp-based protocol.


Timestamp Based Protocol
• One of the most commonly used concurrency protocols is the timestamp-based protocol.
• This protocol uses either the system time or a logical counter as a timestamp.
• Every transaction has a timestamp associated with it, and the ordering is determined by the age of
the transaction.
• A transaction created at clock time 0002 would be older than all other transactions that come
after it.
• For example, any transaction 'y' entering the system at 0004 is two seconds younger, and
priority would be given to the older one.
Timestamp Ordering Protocol
• The timestamp-ordering protocol ensures serializability among transactions in their conflicting
read and write operations.
• This is the responsibility of the protocol system that the conflicting pair of tasks should be
executed according to the timestamp values of the transactions.
• The timestamp of transaction Ti is denoted as TS(Ti).
• Read time-stamp of data-item X is denoted by RTS(X).
• Write time-stamp of data-item X is denoted by WTS(X).

Timestamp ordering protocol works as follows −


• If a transaction Ti issues a read(X) operation −
– If TS(Ti) < WTS(X)
• Operation rejected (Ti is rolled back).
– If TS(Ti) >= WTS(X)
• Operation executed.
• RTS(X) is updated to the maximum of RTS(X) and TS(Ti).
• If a transaction Ti issues a write(X) operation −
– If TS(Ti) < RTS(X)
• Operation rejected.
– If TS(Ti) < WTS(X)
• Operation rejected and Ti rolled back.
– Otherwise, operation executed and WTS(X) is set to TS(Ti).
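Restated as a minimal Python sketch (per-item RTS and WTS kept in dictionaries, timestamps as plain integers; an illustration of the rules above, not a real scheduler):

RTS = {}   # item -> largest timestamp that has read it
WTS = {}   # item -> timestamp of the last accepted write on it

def read_item(ts, item):
    """Basic timestamp-ordering rule for read(X)."""
    if ts < WTS.get(item, 0):
        return "rejected"                       # Ti would read an overwritten value: roll Ti back
    RTS[item] = max(RTS.get(item, 0), ts)       # record that a transaction this young has read X
    return "executed"

def write_item(ts, item):
    """Basic timestamp-ordering rule for write(X)."""
    if ts < RTS.get(item, 0):
        return "rejected"                       # a younger transaction has already read X
    if ts < WTS.get(item, 0):
        return "rejected, Ti rolled back"       # a younger transaction has already written X
    WTS[item] = ts
    return "executed"

print(write_item(5, "X"))   # executed, WTS(X) = 5
print(read_item(3, "X"))    # rejected: TS(Ti) = 3 < WTS(X) = 5
print(read_item(7, "X"))    # executed, RTS(X) = 7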
Q] Define Schedule, good schedule, bad schedule?

Schedule
A chronological execution sequence of the operations of one or more transactions is called a schedule.
A schedule can have many transactions in it, each comprising a number of instructions/tasks.
Operations
read(Q,q)
read the value of the database item Q and store it in the local variable q.
write(Q,q)
write the value of the local variable q into the database item Q.
Other operations, such as arithmetic.
commit
rollback

Example: A “Good” Schedule


One possible schedule, initially X = 10, Y=10

Resulting Database: X=21,Y=21, X=Y


Example: A “Bad” Schedule
Another possible schedule
Resulting Database: X=22, Y=21, X ≠ Y

In the previous schedule, the result was “incorrect.”


• What do we mean by correctness?
Definition of Correctness: The concurrent execution of a set of transactions is said to be correct if it is
“equivalent” to some serial execution of those transactions.
There are several notions of equivalence, e.g., result equivalence.
Serial means one after another
• For transactions T1 and T2 there are only two serial schedules.
• T1, T2
• T2, T1

Q] What is serializability? Explain.
Serializability
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there is a possibility that instructions of one transaction are interleaved with those of some other
transaction.
There is no guarantee that the results of all serial executions of a given set of transactions will be identical.

Serial Schedule − It is a schedule in which transactions are aligned in such a way that one transaction is
executed first.
When the first transaction completes its cycle, the next transaction is executed. Transactions are
ordered one after the other. This type of schedule is called a serial schedule, as transactions are executed
in a serial manner.
Nonserial Schedule − The objective of serializability is to find nonserial schedules that allow transactions to
execute concurrently without interfering with one another.
In other words, we want to find nonserial schedules that are equivalent to some serial schedule. Such a
schedule is called serializable.

Serializable
• Definition: A schedule is said to be serializable if the result of executing that schedule is the same
as the result of executing some serial schedule.
– A schedule, S, is serializable if S produces the same results as either
– T1, T2
– T2, T1
• A simplifying assumption: Operations in a schedule are
– read X
– write X
– commit
– abort
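Conflict serializability can be tested with a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj (same item, different transactions, at least one write); the schedule is conflict serializable if and only if the graph has no cycle. A minimal Python sketch (the schedule is represented as a list of (transaction, operation, item) tuples, an assumed encoding for the example):

def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'read', 'write'}."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and ("write" in (op_i, op_j)):
                edges.add((ti, tj))             # Ti must precede Tj in any equivalent serial order

    # The schedule is conflict serializable iff this graph has no cycle.
    nodes = {t for t, _, _ in schedule}
    visited, in_stack = set(), set()

    def has_cycle(n):
        visited.add(n); in_stack.add(n)
        for a, b in edges:
            if a == n:
                if b in in_stack or (b not in visited and has_cycle(b)):
                    return True
        in_stack.discard(n)
        return False

    return not any(has_cycle(n) for n in nodes if n not in visited)

# T1 finishes with X before T2 touches it: equivalent to the serial order T1, T2.
good = [("T1", "read", "X"), ("T1", "write", "X"), ("T2", "read", "X"), ("T2", "write", "X")]
# Lost-update style interleaving: edges T1->T2 and T2->T1 form a cycle.
bad  = [("T1", "read", "X"), ("T2", "read", "X"), ("T1", "write", "X"), ("T2", "write", "X")]
print(conflict_serializable(good), conflict_serializable(bad))   # True False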

Q] Write a short note on the multiversion concurrency control technique.

Multiversion Concurrency Control Technique


This concurrency control technique keeps the old values of a data item when the item is updated. Such
techniques are known as multiversion concurrency control techniques, because several versions (values) of an
item are maintained.

When a transaction requires access to an item, an appropriate version is chosen to maintain the
serializability of the currently executing schedule, if possible. The idea is that some read operations that
would be rejected in other techniques can still be accepted by reading an older version of the item to
maintain serializability. When a transaction writes an item, it writes a new version and the old version of
the item is retained. Some multiversion concurrency control algorithms use the concept of view
serializability rather than conflict serializability.

An obvious drawback of multiversion techniques is that more storage is needed to maintain multiple
versions of the database items. However, older versions may have to be maintained anyway—for
example, for recovery purposes. In addition, some database applications require older versions to be kept
to maintain a history of the evolution of data item values. The extreme case is a temporal database,
which keeps track of all changes and the times at which they occurred. In such cases, there is no
additional storage penalty for multiversion techniques, since older versions are already maintained.

Multiversion Technique Based on Timestamp Ordering

In this method, several versions X1, X2, ..., Xk of each data item X are maintained. For each version Xi, the value of
the version and the following two timestamps are kept:

1. read_TS(Xi): The read timestamp of Xi is the largest of all the timestamps of transactions that have
successfully read version Xi.

2. write_TS(Xi): The write timestamp of Xi is the timestamp of the transaction that wrote the value of version Xi.

Whenever a transaction T is allowed to execute a write_item(X) operation, a new version Xi of item X is
created, with both write_TS(Xi) and read_TS(Xi) set to TS(T). Correspondingly, when a transaction T is
allowed to read the value of version Xi, the value of read_TS(Xi) is set to the larger of the current read_TS(Xi)
and TS(T).
To ensure serializability, the following two rules are used:

1. If transaction T issues a write_item(X) operation, and version Xi of X has the highest write_TS(Xi) of all
versions of X that is also less than or equal to TS(T), and read_TS(Xi) > TS(T), then abort and roll back
transaction T; otherwise, create a new version Xj of X with read_TS(Xj) = write_TS(Xj) = TS(T).

2. If transaction T issues a read_item(X) operation, find the version Xi of X that has the highest write_TS(Xi) of
all versions of X that is also less than or equal to TS(T); then return the value of Xi to transaction T, and set
the value of read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).

As we can see in case 2, a read_item(X) is always successful, since it finds the appropriate version Xi to read
based on the write_TS of the various existing versions of X. In case 1, however, transaction T may be
aborted and rolled back. This happens if T attempts to write a version of X that should have been read
by another transaction T' whose timestamp is read_TS(Xi); however, T' has already read version Xi, which
was written by the transaction with timestamp equal to write_TS(Xi). If this conflict occurs, T is rolled back;
otherwise, a new version of X, written by transaction T, is created. Notice that if T is rolled back,
cascading rollback may occur. Hence, to ensure recoverability, a transaction T should not be allowed to
commit until after all the transactions that have written some version that T has read have committed.
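Restated as a minimal Python sketch (each version stored as a (value, read_TS, write_TS) tuple; an illustration of the two rules above, not a full implementation):

versions = {"X": [(0, 0, 0)]}   # item -> list of (value, read_TS, write_TS)

def latest_version_for(ts, item):
    """The version with the highest write_TS that is <= TS(T)."""
    eligible = [v for v in versions[item] if v[2] <= ts]
    return max(eligible, key=lambda v: v[2])

def read_item(ts, item):
    value, r, w = latest_version_for(ts, item)
    versions[item].remove((value, r, w))
    versions[item].append((value, max(r, ts), w))   # rule 2: a read always succeeds
    return value

def write_item(ts, item, value):
    _, r, w = latest_version_for(ts, item)
    if r > ts:
        return "abort"                               # rule 1: a younger transaction already read that version
    versions[item].append((value, ts, ts))           # otherwise create a new version
    return "ok"

print(write_item(5, "X", 42))   # ok: new version with write_TS = 5
print(read_item(7, "X"))        # 42, and read_TS of that version becomes 7
print(write_item(6, "X", 99))   # abort: the version with write_TS = 5 was already read by TS = 7 > 6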

Multiversion Two-Phase Locking Using Certify Locks

In this multiple-mode locking scheme, there are three locking modes for an item: read, write, and certify,
instead of just the two modes (read, write). Hence, the state of LOCK(X) for an item X can be one of read-
locked, write-locked, certify-locked, or unlocked.

In the standard locking scheme, once a transaction obtains a write lock on an item, no other transactions
can access that item. The idea behind multiversion 2PL is to allow other transactions T' to read an
item X while a single transaction T holds a write lock on X. This is accomplished by allowing two
versions for each item X; one version must always have been written by some committed transaction. The
second version X' is created when a transaction T acquires a write lock on the item. Other transactions can
continue to read the committed version of X while T holds the write lock. Transaction T can write the
value of X' as needed, without affecting the value of the committed version X. However, once T is ready to
commit, it must obtain a certify lock on all items that it currently holds write locks on before it can
commit. The certify lock is not compatible with read locks, so the transaction may have to delay its
commit until all its write-locked items are released by any reading transactions in order to obtain the
certify locks. Once the certify locks (which are exclusive locks) are acquired, the committed version X of
the data item is set to the value of version X', version X' is discarded, and the certify locks are then
released. In this multiversion 2PL scheme, reads can proceed concurrently with a single write operation,
an arrangement not permitted under the standard 2PL schemes.
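The behaviour described above can be summarized as a lock-compatibility check. A minimal Python sketch (True means a request in that mode can be granted while another transaction holds the lock in the other mode):

# Multiversion 2PL lock compatibility (read / write / certify), as described above:
# readers may coexist with one writer (the writer produces a new, uncommitted version),
# but the certify lock is exclusive and conflicts with everything.
COMPATIBLE = {
    "read":    {"read": True,  "write": True,  "certify": False},
    "write":   {"read": True,  "write": False, "certify": False},
    "certify": {"read": False, "write": False, "certify": False},
}

def can_grant(requested, already_held_modes):
    return all(COMPATIBLE[requested][m] for m in already_held_modes)

print(can_grant("read", ["write"]))      # True: readers continue to see the committed version
print(can_grant("certify", ["read"]))    # False: the commit must wait until readers release their locks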

Q] Write a short note on shadow paging ?

Shadow paging

Shadow paging is an alternative to log-based recovery techniques, which has both advantages and
disadvantages. The idea is to maintain two page tables during the life of a transaction: the current page
table and the shadow page table. When the transaction starts, both tables are identical. The shadow page
table is never changed during the life of the transaction. The current page table is updated with each write
operation.

Each table entry points to a page on the disk. When the transaction is committed, the shadow page table entry
is made a copy of the current page table entry and the disk block with the old data is released. If the
shadow page table is stored in nonvolatile storage and a system crash occurs, then the shadow page table is
copied to the current page table. This guarantees that the shadow page table will point to the database pages
corresponding to the state of the database prior to any transaction that was active at the time of the
crash, making aborts automatic.
There are drawbacks to the shadow-page technique:

1. Commit overhead. The commit of a single transaction using shadow paging requires multiple
blocks to be output: the current page table, the actual data and the disk address of the current
page table. Log-based schemes need to output only the log records.
2. Data fragmentation. Shadow paging causes database pages to change location (therefore, pages are no
longer contiguous on disk).
3. Garbage collection. Each time that a transaction commits, the database pages containing the old
version of data changed by the transactions must become inaccessible. Such pages are considered
to be garbage since they are not part of the free space and do not contain any usable information.
Periodically it is necessary to find all of the garbage pages and add them to the list of free pages.
This process is called garbage collection and imposes additional overhead and complexity on the
system.
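A toy Python sketch of the idea (the "disk" is a dict of blocks and page tables map page numbers to block ids; the data structures are illustrative, only the copy-on-write and the commit-time switch matter):

# Toy model of shadow paging.
disk = {0: "old page 0", 1: "old page 1"}
next_block = 2

shadow_page_table  = {0: 0, 1: 1}          # never changed while the transaction runs
current_page_table = dict(shadow_page_table)

def write_page(page_no, data):
    """Copy-on-write: a modified page goes to a fresh block, visible only through the current table."""
    global next_block
    disk[next_block] = data
    current_page_table[page_no] = next_block
    next_block += 1

write_page(0, "new page 0")                # the transaction updates page 0

# Commit: the current page table becomes the (durable) shadow; old blocks can be freed.
shadow_page_table = dict(current_page_table)

# Abort or crash before commit: simply keep the old shadow table; the new blocks are garbage,
# and no undo or redo of individual operations is required.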

Q] Recovery techniques based on deferred update?

Deferred update
The idea behind deferred update is to defer or postpone any actual updates to the database itself until the transaction
completes its execution successfully and reaches its commit point. During transaction execution, the
updates are recorded only in the log and in the transaction workspace. After the transaction reaches its
commit point and the log is force-written to disk, the updates are recorded in the database itself. If a
transaction fails before reaching its commit point, there is no need to undo any operations, because the
transaction has not affected the database in any way.
The steps involved in the deferred update protocol are as follows:
1. When a transaction starts, write an entry start_transaction(T) to the log.
2. When any operation is performed that will change values in the database, write a log entry
write_item(T, x, old_value, new_value).
3. When a transaction is about to commit, write a log record of the form commit(T); write all log
records to disk.
4. Commit the transaction, using the log to write the updates to the database; the writing of data to
disk need not occur immediately.
5. If the transaction aborts, ignore the log records and do not write the changes to disk.
The database is never updated until after the transaction commits, and there is never a need to UNDO
any operations. Hence this technique is known as the NO-UNDO/REDO algorithm. The REDO is needed in
case the system fails after the transaction commits but before all its changes are recorded in the
database. In this case, the transaction operations are redone from the log entries. The protocol and how
different entries are affected can be summarized as follows:
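A minimal Python sketch of the NO-UNDO/REDO idea (the log is a plain list of tuples in an assumed format; only transactions whose commit record reached the log are redone):

database = {"X": 10, "Y": 20}

# Deferred update (NO-UNDO/REDO): the log is force-written before any database write.
log = [
    ("start", "T1"),
    ("write_item", "T1", "X", 10, 15),   # (entry, txn, item, old_value, new_value)
    ("commit", "T1"),
    ("start", "T2"),
    ("write_item", "T2", "Y", 20, 99),
    # no commit(T2): the system failed before T2 reached its commit point
]

def recover(log, database):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _, item, _, new_value = rec
            database[item] = new_value   # REDO committed writes (idempotent)
        # writes of uncommitted transactions are simply ignored: there is nothing to UNDO
    return database

print(recover(log, database))   # {'X': 15, 'Y': 20}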

Q] Recovery techniques based on immediate update?

Immediate update

In the immediate update techniques, the database may be updated by the operations of a transaction
immediately, before the transaction reaches its commit point. However, these operations are typically
recorded in the log on disk by force-writing before they are applied to the database, so that recovery is
possible.
When immediate update is allowed, provisions must be made for undoing the effect of update operations
on the database, because a transaction can fail after it has applied some updates to the database itself.
Hence recovery schemes based on immediate update must include the capability to roll back a
transaction by undoing the effect of its write operations.
1. When a transaction starts, write an entry start_transaction(T) to the log;
2. When any operation is performed that will change values in the database, write a log entry
write_item(T, x, old_value, new_value);
3. Write the log to disk;
4. Once the log record is written, write the update to the database buffers;
5. When convenient, write the database buffers to the disk;
6. When a transaction is about to commit, write a log record of the form commit(T);
7. Write the log to disk.
The protocol and how different entries are affected can be best summarized below:

In general, we can distinguish two main categories of immediate update algorithms. If the recovery
technique ensures that all updates of a transaction are recorded in the database on disk before the
transaction commits, there is never a need to redo any operations of committed transactions. Such an
algorithm is called UNDO/NO-REDO.
On the other hand, if the transaction is allowed to commit before all its changes are written to the
database, we have the UNDO/REDO method, the most general recovery algorithm. This is also the most
complex technique. Recovery activities are summarised below:
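The most general UNDO/REDO variant can be sketched the same way: the effects of uncommitted transactions are undone using the old values recorded in the log, and committed transactions are redone (a minimal Python illustration, not a real recovery manager):

database = {"X": 15, "Y": 99}   # state on disk after the crash: T2's update already reached the database

log = [
    ("start", "T1"),
    ("write_item", "T1", "X", 10, 15),   # (entry, txn, item, old_value, new_value)
    ("commit", "T1"),
    ("start", "T2"),
    ("write_item", "T2", "Y", 20, 99),   # T2 never committed
]

def recover(log, database):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO: scan the log backwards and restore old values written by uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            database[item] = old_value
    # REDO: scan forwards and reapply new values written by committed transactions.
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, _, item, _, new_value = rec
            database[item] = new_value
    return database

print(recover(log, database))   # {'X': 15, 'Y': 20}: T1 redone, T2 undone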

Database Backup & Recovery from Catastrophic Failure


Non-Catastrophic Failures

A key assumption has been that the system log is maintained on the disk and is not lost as a result of the failure.
Similarly, the shadow directory must be stored on disk to allow recovery when shadow paging is used. The recovery
techniques we have discussed use the entries in the system log or the shadow directory to recover from failure by
bringing the database back to a consistent state.

Catastrophic Failures

The recovery manager of a DBMS must also be equipped to handle more catastrophic failures such as disk crashes.
The main technique used to handle such crashes is that of database backup. The whole database and the log are
periodically copied onto a cheap storage medium such as magnetic tapes. In case of a catastrophic system failure,
the latest backup copy can be reloaded from the tape to the disk, and the system can be restarted.

To avoid losing all the effects of transactions that have been executed since the last backup, it is customary to back
up the system log at more frequent intervals than full database backup by periodically copying it to magnetic tape.
The system log is usually substantially smaller than the database itself and hence can be backed up more
frequently. Thus users do not lose all transactions they have performed since the last database backup. All
committed transactions recorded in the portion of the system log that has been backed up to tape can have their
effect on the database redone. A new log is started after each database backup.

Hence, to recover from disk failure, the database is first recreated on disk from its latest backup copy on tape.
Following that, the effects of all the committed transactions whose operations have been recorded in the backed-
up copies of the system log are reconstructed.

Recovery in catastrophic failure


A catastrophic failure is one where a stable, secondary storage device gets corrupt. With the storage
device, all the valuable data that is stored inside is lost. We have two different strategies to recover data
from such a catastrophic failure −

• Remote backup − Here a backup copy of the database is stored at a remote location from
where it can be restored in case of a catastrophe.

• Alternatively, database backups can be taken on magnetic tapes and stored at a safer place. This
backup can later be transferred onto a freshly installed database to bring it to the point of
backup.

Large databases are too bulky to be backed up frequently. In such cases, we have techniques where
we can restore a database just by looking at its logs. So, all that we need to do here is to take a backup of
all the logs at frequent intervals of time. The database can be backed up once a week, and the logs, being
very small, can be backed up every day or as frequently as possible.

Remote Backup
Remote backup provides a sense of security in case the primary location where the database is located
gets destroyed. Remote backup can be offline or real-time (online). If it is offline, it is maintained
manually.

Online backup systems are more real-time and lifesavers for database administrators and investors. An
online backup system is a mechanism where every bit of the real-time data is backed up simultaneously
at two distant places. One of them is directly connected to the system and the other one is kept at a
remote place as backup.

As soon as the primary database storage fails, the backup system senses the failure and switches the
user system to the remote storage. Sometimes this is so instant that the users can’t even realize a
failure.
Crash Recovery
A DBMS is a highly complex system with hundreds of transactions being executed every second. The
durability and robustness of a DBMS depend on its complex architecture and its underlying hardware
and system software. If it fails or crashes amid transactions, it is expected that the system will follow
some sort of algorithm or technique to recover lost data.
