Unit X - Database Recovery Techniques
- 3hrs
Recovery Concepts
A database may fail because of inconsistency, network failure, errors, or accidental damage, yet the data stored in it must be available when required.
Database recovery is therefore the process of restoring data that has been deleted, hacked, or accidentally damaged to its previous state.
Recovery System in DBMS from Transaction Failure
• In a database recovery management system, two main recovery techniques help a DBMS recover from failure and maintain the atomicity of a transaction:
1. Log-Based Recovery
2. Shadow Paging
Log Based Recovery
A log is a sequence of records that contains the history of all updates made to the database. The log is the most commonly used structure for recording database modifications.
The log is sometimes also called the system log.
An update log record has the following fields:
1. Transaction identifier: identifies the transaction that performed the write.
2. Data item identifier: identifies the data item that was written.
3. The old value of the data item (before the write operation).
4. The new value of the data item (after the write operation).
The basic kinds of log records and their formats are as follows:
<T, Start>. Transaction T has started.
<T, X, V1, V2>. Transaction T has performed a write on data item X. V1 is the value of X before the write, and V2 is the value of X after the write.
<T, Commit>. Transaction T has committed.
<T, Abort>. Transaction T has aborted.
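The four record formats above can be sketched as small Python data classes. This is only an illustration: the class names, field names, and the `example_log` helper are assumptions made for this sketch, and the sample values reuse the A and B figures (1000, 950, 1050) from the immediate-update explanation.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Start:          # <T, Start>
    txn: str


@dataclass(frozen=True)
class Update:         # <T, X, V1, V2>
    txn: str
    item: str
    old_value: Any    # V1: value of X before the write
    new_value: Any    # V2: value of X after the write


@dataclass(frozen=True)
class Commit:         # <T, Commit>
    txn: str


@dataclass(frozen=True)
class Abort:          # <T, Abort>
    txn: str


LogRecord = Union[Start, Update, Commit, Abort]


def example_log() -> List[LogRecord]:
    """Log of a hypothetical transaction T1 that updates A and B, then commits."""
    return [
        Start("T1"),
        Update("T1", "A", 1000, 950),
        Update("T1", "B", 1000, 1050),
        Commit("T1"),
    ]
```

Keeping both the old and the new value in every update record is what later allows a recovery routine to choose between undoing and redoing a write.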
The following points should be remembered when doing log-based recovery:
• Whenever a transaction performs a write, the log record for that write must be created before the database is modified.
• Once a log record exists, the modification can be output to the database when required. The log record also makes it possible to undo a modification that has already been applied to the database.
Log-based recovery works in two modes:
• Immediate Mode
• Deferred Mode
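The rule that the log record must exist before the database is modified can be sketched as follows. The function names `logged_write` and `undo_last` are hypothetical, the database is modelled as a plain dictionary, and every item is assumed to exist before it is written.

```python
def logged_write(log, db, txn, item, new_value):
    """Create the log record first, then modify the database."""
    old_value = db[item]
    log.append((txn, item, old_value, new_value))  # step 1: log record
    db[item] = new_value                           # step 2: database update


def undo_last(log, db):
    """Undo the most recent logged write by restoring the old value."""
    txn, item, old_value, _new_value = log.pop()
    db[item] = old_value
```

Because the old value is captured in the log before the database changes, the write can always be reversed, which is the undo ability mentioned in the second point above.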
Log Based Recovery in immediate update
In the immediate update mode of log-based recovery, the database is modified while the transaction is still in the active state.
That is, as soon as the transaction executes a WRITE operation, the change is also saved in the database.
In immediate update mode, there is no need to wait for the commit statement to execute before updating the database.
Explanation
• Consider the transaction T1 as shown in the above table. The log of this transaction is written in the second column. When the values of data items A and B are changed from 1000 to 950 and 1050 respectively, the values of A and B are updated in the database at the same time.
• In immediate mode, the log file needs both the old value and the new value of each data item.
• If the system crashes or fails, the following cases are possible.
Case 1: The system crashes after the transaction has executed the commit statement.
• In this case, when the transaction executed the commit statement, the corresponding commit entry was also written to the log file immediately.
• To recover the database, the recovery manager checks the log file. It finds both <T, Start> and <T, Commit>, which shows that transaction T completed successfully before the system failed, so the REDO(T) operation is performed and the updated values of data items A and B are set in the database.
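A minimal sketch of recovery in immediate mode, assuming the log is a list of tuples of the form ("start", T), ("update", T, X, old, new) and ("commit", T). The tuple layout and the function name `recover_immediate` are assumptions of this sketch. Committed transactions are redone as in Case 1; transactions with a start record but no commit record are undone, which is the undo/redo behaviour these notes attribute to immediate update.

```python
def recover_immediate(log, db):
    """Undo uncommitted writes, then redo committed writes."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # UNDO pass: scan the log backwards and restore old values
    # written by transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value

    # REDO pass: scan the log forwards and reapply new values
    # written by committed transactions.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    return db
```

For example, if T1 committed after changing A and B while a hypothetical T2 changed C and never committed, recovery leaves A and B at their new values and restores C to its old value.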
Log Based Recovery in Deferred Mode
In the deferred mode of log-based recovery, all modifications are recorded in the log, but the actual WRITE to the database is deferred until the transaction is partially committed.
That is, in deferred mode the database is modified only after the transaction has performed its commit operation.
For database recovery in deferred mode, there are two possible cases.
Case 1: The system fails or crashes after the transaction has performed the commit operation. Since the commit succeeded, there is a commit entry for the transaction in the log file.
After the failure, the recovery manager checks the log file and finds both <T, Start> and <T, Commit>. This means the transaction completed successfully before the crash, so the REDO(T) operation is performed and the updated values of data items A and B are set in the database.
Case 2: The transaction fails before executing the commit. There is no commit statement in the transaction, as shown in the table given below, so there is no commit entry in the log file.
• When the system fails or crashes, the recovery manager checks the log file and finds the <T, Start> entry but no <T, Commit> entry. This means the transaction had not completed successfully before the failure, so to ensure atomicity the recovery manager must leave data items A and B at their old values.
Note: In deferred mode there is no need to perform UNDO(T), because updated values of a data item are not written to the database immediately after the WRITE operation.
• In deferred mode, updated values are written only after the transaction commits, so in this case the database still holds the old value of each data item.
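The two deferred-mode cases can be sketched as a redo-only recovery routine. The tuple layout ("start", T), ("update", T, X, new) and ("commit", T) and the name `recover_deferred` are assumptions of this sketch. Only the new value is kept in each update record, since nothing ever needs to be undone.

```python
def recover_deferred(log, db):
    """Redo writes of committed transactions; ignore all others."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, new_value = rec
            db[item] = new_value   # Case 1: REDO(T)
        # Case 2: no commit record, so the database was never
        # touched by T and there is nothing to undo.
    return db
```

Compared with immediate mode, the routine has no undo pass at all, which is why deferred update is described as a no-undo/redo scheme.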
Shadow Paging Recovery Method
It is a commonly used method for database recovery in DBMS. It may require fewer disk accesses than log-based methods.
Here the database is partitioned into a number of fixed-length blocks known as pages, and two page tables are maintained during the life cycle of a transaction.
• Each entry in a page table contains a pointer to a certain block on the disk. The key idea is to maintain two page tables during the transaction: 1) the current page table and 2) the shadow page table.
• When the transaction starts, both page tables are identical. During the transaction, all changes are made through the current page table, while the shadow page table remains as it was before. The shadow page table therefore always describes the state of the database before the transaction started.
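A minimal in-memory sketch of the two page tables, assuming one running transaction at a time. The class name `ShadowPagedDB` and its methods are hypothetical, the disk is modelled as a dictionary from block number to contents, and blocks that are no longer referenced are not reclaimed.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.disk = dict(enumerate(pages))     # block number -> contents
        self.shadow = list(self.disk.keys())   # page i -> block number
        self.current = list(self.shadow)       # identical at transaction start
        self._next_block = len(pages)

    def read(self, page):
        return self.disk[self.current[page]]

    def write(self, page, value):
        # The first write to a page goes to a fresh block, so the block
        # referenced by the shadow page table is never overwritten.
        if self.current[page] == self.shadow[page]:
            self.current[page] = self._next_block
            self._next_block += 1
        self.disk[self.current[page]] = value

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow = list(self.current)

    def abort(self):
        # Discard the changes by going back to the shadow page table.
        self.current = list(self.shadow)
```

Recovery after a failure then amounts to calling `abort()`: the shadow page table still points at the unmodified blocks, so no log scan is needed in this sketch.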
Checkpoints Recovery Methods in DBMS
A checkpoint is another recovery technique used in database recovery management in DBMS. In this technique, a checkpoint operation is performed periodically that copies log information from volatile storage onto stable storage. The following operations are performed at each checkpoint:
1. The start of the checkpoint, together with its date and time, is written to the log on a stable storage device.
2. All log data in the buffers in main memory is copied to the log on stable storage.
3. The modified database buffers in volatile storage are written to the physical database.
4. An end-of-checkpoint record is written, and the address of the checkpoint record is saved in a file accessible to the recovery routine on start-up after a system crash.
The frequency of checkpointing is a design consideration of the recovery system. The options are:
– A fixed interval of time.
– Transaction-consistent checkpoint.
– Action-consistent checkpoint.
– Transaction-oriented checkpoint.
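The checkpoint operations listed above can be sketched in a few lines. Everything here is a simplifying assumption: the function name `take_checkpoint`, lists standing in for the in-memory log buffer and the stable log, and dictionaries standing in for the database buffers and the physical database.

```python
def take_checkpoint(log_buffer, stable_log, db_buffer, disk_db, timestamp):
    """Perform one checkpoint and return the address of its end record."""
    # 1. Record the start of the checkpoint with its date and time.
    stable_log.append(("begin_checkpoint", timestamp))
    # 2. Copy all buffered log records to the log on stable storage.
    stable_log.extend(log_buffer)
    log_buffer.clear()
    # 3. Write the buffered database pages to the physical database.
    disk_db.update(db_buffer)
    # 4. Write the end-of-checkpoint record and report where it is.
    stable_log.append(("end_checkpoint",))
    return len(stable_log) - 1
```

The returned index plays the role of the saved checkpoint address: after a crash, a recovery routine could start reading the log from that position instead of from the beginning.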
Difference between Deferred update and immediate update
• Deferred update – This technique does not physically update the database on disk until a transaction has reached its commit point. Before reaching commit, all transaction updates are recorded in the local transaction workspace. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may be necessary to REDO the effect of the operations that are recorded in the local transaction workspace, because their effect may not yet have been written to the database. Hence, deferred update is also known as the no-undo/redo algorithm.
• Immediate update – In immediate update, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are recorded in a log on disk before they are applied to the database, so recovery is still possible. If a transaction fails to reach its commit point, the effect of its operations must be undone, i.e. the transaction must be rolled back. Both undo and redo are therefore required, and this technique is known as the undo/redo algorithm.
Database backup and recovery from catastrophic failures
Executing read_item(X) and Write_item(X)
Database Buffer
States of Transactions / Transaction Model
A transaction in a database can be in one of the following states −
1) Active − In this state, the transaction is being executed. This is the initial state of
every transaction.
2) Partially Committed − When a transaction executes its final operation, it is said to
be in a partially committed state.
3) Failed − A transaction is said to be in a failed state if any of the checks made by
the database recovery system fails. A failed transaction can no longer proceed
further.
4) Aborted − If any of the checks fails and the transaction has reached a failed state,
then the recovery manager rolls back all its write operations on the database to
bring the database back to its original state where it was prior to the execution of
the transaction. Transactions in this state are called aborted. The database
recovery module can select one of the two operations after a transaction aborts −
- Re-start the transaction
- Kill the transaction
5) Committed − If a transaction executes all its operations successfully, it is said to
be committed. All its effects are now permanently established on the database
system.
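The five states can be sketched as a small state machine. The transition table below follows the usual textbook state diagram, which these notes do not spell out, so treat it as an assumption; the names `ALLOWED` and `advance` are hypothetical.

```python
# Allowed transitions between transaction states (assumed from the
# standard state diagram: active is the initial state, committed and
# aborted are terminal).
ALLOWED = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": set(),
    "aborted": set(),
}


def advance(state, new_state):
    """Return new_state if the transition is allowed, else raise ValueError."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```

A successful transaction moves active, partially_committed, committed; a failing one moves through failed to aborted, after which the recovery module may restart or kill it.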
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level
tasks.
A transaction in a database system must maintain Atomicity, Consistency,
Isolation, and Durability − commonly known as ACID properties − in order to
ensure accuracy, completeness, and data integrity.
1) Atomicity
• This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none.
• There must be no state in a database where a transaction is left partially
completed.
• States should be defined either before the execution of the transaction or after the
execution/abortion/failure of the transaction.
Consider the following transaction T consisting of T1 and T2: transfer of 100 from account X to account Y.
Suppose the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)). The amount has then been deducted from X but not added to Y.
This results in an inconsistent database state.
Therefore, the transaction must be executed in its entirety to ensure the correctness of the database state.
2) Consistency
• The database must remain in a consistent state after any transaction.
• No transaction should have any adverse effect on the data residing in the database.
• If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
• Referring to the example above, the total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
3) Isolation
• In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that every transaction will be carried out and executed as if it were the only transaction in the system.
• No transaction will affect the existence of any other transaction.
4) Durability
• The database should be durable enough to hold all its latest updates even if the system fails or restarts.
• If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data.
• If a transaction commits but the system fails before the data could be written to the disk, then that data will be updated once the system is back in operation.
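The transfer of 100 from X to Y used in the atomicity and consistency examples can be sketched as follows. The function `transfer`, the dictionary of accounts, and the `fail_after_debit` switch that simulates a failure between write(X) and write(Y) are all assumptions of this sketch; a real DBMS would roll back using its log rather than an in-memory snapshot.

```python
def transfer(accounts, src, dst, amount, fail_after_debit=False):
    """Move amount from src to dst; restore the old state on failure."""
    snapshot = dict(accounts)            # state before the transaction
    try:
        accounts[src] -= amount          # T1: write(X)
        if fail_after_debit:
            raise RuntimeError("failure after write(X), before write(Y)")
        accounts[dst] += amount          # T2: write(Y)
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # roll back: all or nothing
        raise
```

With X = 500 and Y = 200, a successful run gives 400 and 300, so the total stays at 700. A failed run leaves 500 and 200, so the total is again 700 and no partial state survives.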
Isolation Example
Let X = 500, Y = 500.
Consider two transactions T and T''.
Suppose T has been executed until Read(Y) and then T'' starts. As a result, the operations are interleaved, so T'' reads the correct value of X but an incorrect value of Y, and the sum computed by T'' is inconsistent with the sum at the end of transaction T.
This results in database inconsistency, due to the loss of some units of value. Hence, transactions must take place in isolation, and changes should be visible only after they have been made to the main memory.
Buffer Replacement Policies
The critical choice that the buffer manager must make is which block to throw out of the buffer pool when a buffer is needed for a newly requested block.
The buffer-replacement strategies commonly used may be familiar to you from other applications of scheduling policies, such as in operating systems.
The frame to be replaced is chosen by a replacement policy:
• Least-recently-used (LRU)
• Most-recently-used (MRU)
• First-In-First-Out (FIFO)
• Clock / Circular order
The policy can have a big impact on the number of I/Os, depending on the access pattern.
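To illustrate how the policy and the access pattern interact, the sketch below counts buffer misses (each miss stands for one disk read) under LRU and MRU. The function name `count_misses` and the request sequence are assumptions chosen for illustration.

```python
from collections import OrderedDict


def count_misses(requests, frames, policy="lru"):
    """Count how many requested blocks are not already in the buffer pool."""
    pool = OrderedDict()   # ordered from least to most recently used
    misses = 0
    for block in requests:
        if block in pool:
            pool.move_to_end(block)          # mark as most recently used
            continue
        misses += 1
        if len(pool) >= frames:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            pool.popitem(last=(policy == "mru"))
        pool[block] = True
    return misses
```

On a repeated sequential scan of four blocks with only three frames, `count_misses([1, 2, 3, 4, 1, 2, 3, 4], 3, "lru")` gives 8 misses, while the same call with `"mru"` gives 5, which shows why the best policy depends on the access pattern.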
Serializability
When multiple transactions run concurrently, there is a possibility that the database may be left in an inconsistent state.
Serializability is a concept that helps us check which concurrent schedules are equivalent to a serial schedule.
A serializable schedule is one that always leaves the database in a consistent state.
Example of Conflict Serializability
Let's consider this schedule:
To convert this schedule into a serial schedule, we would have to swap the R(A) operation of transaction T2 with the W(A) operation of transaction T1.
However, we cannot swap these two operations because they are conflicting operations, so the given schedule is not conflict serializable.
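Conflict serializability is commonly tested with a precedence graph: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, and check the graph for cycles. The sketch below assumes a schedule is a list of (transaction, operation, item) tuples with operations "R" and "W"; the function name and the sample schedules are hypothetical and are not the schedule from the slide.

```python
def is_conflict_serializable(schedule):
    """Return True if the precedence graph of the schedule has no cycle."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they come from different
            # transactions, touch the same item, and one is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))

    # Repeatedly remove transactions with no incoming edge; if some
    # remain that cannot be removed, the graph contains a cycle.
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)}
        if not free:
            return False
        nodes -= free
    return True
```

For instance, the schedule R1(A), R2(A), W1(A), W2(A) produces edges in both directions between T1 and T2, so it is not conflict serializable, whereas R1(A), W1(A), R2(A), W2(A) is.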
View Serializability
View serializability is a process for finding out whether a given schedule is view serializable or not.
Two schedules S1 and S2 are said to be view equivalent if the conditions below are satisfied:
1) Initial Read
If a transaction T1 reads data item A from the initial database in S1, then in S2 T1 should also read A from the initial database.
2) Updated Read
If Ti is reading A which is updated by Tj in S1 then in S2 also Ti should read A which is
updated by Tj.
Concurrency Control
Concurrency control is the procedure in DBMS for managing simultaneous operations without them conflicting with one another.
Concurrent access is quite easy if all users are just reading data. There is no way they can interfere with one another.
Any practical database, however, has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system.
It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system in which two or more database transactions that require access to the same data are executed simultaneously.
Why use Concurrency method?
Reasons for using a concurrency control method in DBMS:
Concurrency Control Protocols
Different concurrency control protocols offer different trade-offs between the amount of concurrency they allow and the amount of overhead they impose.
• Lock-Based Protocols
• Two-Phase Locking Protocol
• Timestamp-Based Protocols
• Validation-Based Protocols
Lock-Based Protocols
A lock is a data variable associated with a data item. The lock signifies which operations can be performed on the data item.
Locks help synchronize access to database items by concurrent transactions.
All lock requests are made to the concurrency-control manager. A transaction proceeds only once its lock request is granted.
Binary locks: A binary lock on a data item can be in one of two states, locked or unlocked.
Shared/exclusive: This type of locking mechanism separates locks based on their use. If a lock is acquired on a data item to perform a write operation, it is called an exclusive lock.
1. Shared Lock (S):
A shared lock is also called a read-only lock. With a shared lock, the data item can be shared between transactions.
This is because a transaction holding a shared lock never has permission to update the data item.
For example, consider a case where two transactions are reading the account balance of a person.
The database will let them both read by placing a shared lock.
However, if another transaction wants to update that account's balance, the shared lock prevents it until the reading is over.
2. Exclusive Lock (X):
With an exclusive lock, a data item can be read as well as written.
The lock is exclusive and can't be held concurrently by several transactions on the same data item. An X-lock is requested using the lock-x instruction.
A transaction may unlock the data item after finishing the write operation.
For example, when a transaction needs to update the account balance of a person, the database allows it by placing an X lock on the balance.
Therefore, when a second transaction wants to read or write the balance, the exclusive lock prevents the operation.
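The shared and exclusive rules above can be sketched as a tiny lock table. The class `LockManager` and its methods are hypothetical. A denied request simply returns False instead of blocking, and lock upgrades (a holder of S asking for X) are not handled.

```python
class LockManager:
    def __init__(self):
        self.locks = {}   # item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        """Try to lock item in mode 'S' or 'X'; return True if granted."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        # Only shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        """Release txn's lock on item; free the item when no holder is left."""
        _mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]
```

Two readers of an account balance are both granted S locks, a writer asking for an X lock is refused until both readers release, and once the writer holds the X lock every other request is refused.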
Deadlock Handling
Deadlock refers to a specific situation where two or more processes are waiting for
each other to release a resource or more than two processes are waiting for the
resource in a circular chain.
A deadlock is a condition where two or more transactions are waiting indefinitely for
one another to give up locks.
Deadlock Example in DBMS
For example: transaction T1 holds a lock on some rows in the Student table and needs to update some rows in the Grade table.
Simultaneously, transaction T2 holds locks on some rows in the Grade table and needs to update the rows in the Student table held by transaction T1.
Now the main problem arises: transaction T1 is waiting for T2 to release its lock, and similarly transaction T2 is waiting for T1 to release its lock.
All activity comes to a halt and remains at a standstill until the DBMS detects the deadlock and aborts one of the transactions.
Deadlock Avoidance
When a database is stuck in a deadlock state, it is better to avoid the deadlock in the first place than to abort or restart the database, which is a waste of time and resources.
Deadlock Prevention
• The deadlock prevention method is suitable for a large database. If resources are allocated in such a way that a deadlock never occurs, then deadlock is prevented.
• The database management system analyzes the operations of a transaction to determine whether they can create a deadlock situation. If they can, the DBMS never allows that transaction to be executed.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in a deadlock or not.
The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
Wait-for Graph
• This is a suitable method for deadlock detection. In this method, a graph is created based on the transactions and their locks. If the graph has a cycle or closed loop, then there is a deadlock.
• The wait-for graph is maintained by the system for every transaction that is waiting for some data held by others. The system keeps checking the graph for cycles.
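Cycle detection on a wait-for graph can be sketched with a depth-first search. The representation is an assumption: a dictionary mapping each transaction to the set of transactions it is waiting for, and the function name `has_deadlock` is hypothetical.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}

    def visit(txn):
        color[txn] = GREY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GREY:              # back edge: a cycle exists
                return True
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t)
               for t in list(wait_for))
```

For the Student and Grade example, T1 waits for T2 and T2 waits for T1, so `has_deadlock({"T1": {"T2"}, "T2": {"T1"}})` is True, and the DBMS would pick one of the two transactions to abort.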
Timestamp-based Protocols
The timestamp-based algorithm uses timestamps to serialize the execution of concurrent transactions.
This protocol ensures that all conflicting read and write operations are executed in timestamp order.
The protocol uses the system time or a logical counter as the timestamp.
The older transaction is always given priority in this method. This is one of the most commonly used concurrency protocols.
Lock-based protocols manage the order between conflicting transactions at the time they execute. Timestamp-based protocols manage conflicts as soon as an operation is created.
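The checks of the basic timestamp-ordering protocol can be sketched as follows. The notes do not spell out the rules, so the read and write tests below follow the usual textbook formulation and should be read as an assumption; the class name, the method names, and the use of 0 as the initial timestamp are hypothetical.

```python
class TimestampOrdering:
    def __init__(self):
        self.read_ts = {}    # item -> largest timestamp that read it
        self.write_ts = {}   # item -> largest timestamp that wrote it

    def read(self, ts, item):
        """Return True if the read is allowed, False if it must roll back."""
        if ts < self.write_ts.get(item, 0):
            return False     # item was already overwritten by a younger txn
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        """Return True if the write is allowed, False if it must roll back."""
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False     # a younger txn has already read or written item
        self.write_ts[item] = ts
        return True
```

A transaction whose operation is rejected would be rolled back and restarted with a new timestamp, so conflicting operations always take effect in timestamp order.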