DBMS Unit 4 Notes


UNIT-4

ACID Properties
A transaction is a small unit of a program, and it may
contain several low-level tasks. A transaction in a
database system must maintain Atomicity, Consistency,
Isolation, and Durability − commonly known as the ACID
properties − in order to ensure accuracy, completeness, and
data integrity.
 Atomicity − This property states that a transaction must
be treated as an atomic unit; that is, either all of its
operations are executed or none of them are. There must be no state
in the database where a transaction is left partially
completed. States should be defined either before the
execution of the transaction or after its
execution/abortion/failure.
• It states that either all the operations of the transaction take place or
none do; if not, the transaction is aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either runs to completion or is not
executed at all.
• Atomicity involves the following two operations:
• Abort: If a transaction aborts then all the changes made are not
visible.
• Commit: If a transaction commits then all the changes made are
visible.
• Example: Let's assume a transaction T consisting of two parts, T1
and T2. Account A holds Rs 600 and account B holds Rs 300, and T
transfers Rs 100 from account A to account B.

T1              T2
Read(A)
A := A - 100
Write(A)
                Read(B)
                B := B + 100
                Write(B)

After completion of the transaction, A holds Rs 500 and B holds Rs 400.
• If the transaction T fails after the completion of T1 but
before the completion of T2, then the amount will have been
deducted from A but not added to B. This leaves the database
in an inconsistent state. To ensure the correctness of the database
state, the transaction must be executed in its entirety, as the
sketch below illustrates.
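To make the commit/abort behaviour concrete, here is a minimal Python sketch (illustrative only, not from the original notes): writes are staged in a local copy and applied only at commit, so a failure between the two steps leaves the database untouched.

db = {"A": 600, "B": 300}

def transfer(db, src, dst, amount):
    # Stage all changes locally (uncommitted state); db itself is untouched.
    staged = dict(db)
    staged[src] -= amount              # T1: Read(A), A := A - 100, Write(A)
    if staged[src] < 0:
        raise ValueError("insufficient balance")   # abort: db unchanged
    staged[dst] += amount              # T2: Read(B), B := B + 100, Write(B)
    db.update(staged)                  # commit: all changes appear at once

transfer(db, "A", "B", 100)
print(db)   # {'A': 500, 'B': 400}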

 Consistency − The database must remain in a consistent
state after any transaction. No transaction should have
any adverse effect on the data residing in the database. If
the database was in a consistent state before the execution
of a transaction, it must remain consistent after the
execution of the transaction as well.
 The integrity constraints are maintained so that the database is
consistent before and after the transaction.
 The execution of a transaction will leave a database in either its
prior stable state or a new stable state.
 The consistency property of the database states that every transaction
sees a consistent database instance.
 The transaction is used to transform the database from one
consistent state to another consistent state.
 For example: The total amount must be the same before and after
the transaction.
 Total before T occurs = 600 + 300 = 900
 Total after T occurs = 500 + 400 = 900
 Therefore, the database is consistent. In the case when T1
completes but T2 fails, an inconsistency will occur.

 Durability − The database should be durable enough to
hold all its latest updates even if the system fails or
restarts. If a transaction updates a chunk of data in a
database and commits, then the database will hold the
modified data. If a transaction commits but the system
fails before the data can be written to disk, then
that data will be updated once the system comes back
into action.

The durability property indicates the permanence of the database's
consistent state. It states that the changes made by a committed
transaction are permanent: they cannot be lost through the erroneous
operation of a faulty transaction or through system failure. When a
transaction is completed, the database reaches a state known as the
consistent state, and that consistent state cannot be lost, even in
the event of a system failure.
The recovery subsystem of the DBMS is responsible for the durability
property.

 Isolation − In a database system where more than one
transaction is being executed simultaneously and in
parallel, the property of isolation states that all the
transactions will be carried out and executed as if each
were the only transaction in the system. No transaction
will affect the existence of any other transaction.

It means that the data being used during the execution of one transaction
cannot be used by a second transaction until the first one is completed.
In isolation, if transaction T1 is being executed and is using the data item
X, then that data item cannot be accessed by any other transaction T2 until
T1 ends.
The concurrency control subsystem of the DBMS enforces the isolation
property.

States of Transactions
A transaction in a database can be in one of the following states −

 Active − In this state, the transaction is being executed.
This is the initial state of every transaction.
 Partially Committed − When a transaction executes its
final operation, it is said to be in a partially committed
state.
 Failed − A transaction is said to be in a failed state if any
of the checks made by the database recovery system fails.
A failed transaction can no longer proceed further.
 Aborted − If any of the checks fails and the transaction
has reached a failed state, then the recovery manager rolls
back all its write operations on the database to bring the
database back to its original state where it was prior to
the execution of the transaction. Transactions in this state
are called aborted. The database recovery module can
select one of the two operations after a transaction aborts

o Re-start the transaction
o Kill the transaction
 Committed − If a transaction executes all its operations
successfully, it is said to be committed. All its effects are
now permanently established on the database system.
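As an illustration (not part of the original notes), the legal state transitions listed above can be captured in a small transition table; the state names below are assumptions chosen to mirror the list.

# Hypothetical sketch of the transaction state machine described above.
TRANSITIONS = {
    "active":              {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             set(),   # recovery may then restart or kill the transaction
    "committed":           set(),   # terminal: effects are permanent
}

def move(state, new_state):
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

s = "active"
s = move(s, "partially_committed")
s = move(s, "committed")
print(s)   # committed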
Schedule
 A series of operations from one transaction to another transaction is
known as a schedule. It is used to preserve the order of the
operations in each of the individual transactions.
1. Serial Schedule
A serial schedule is a type of schedule where one transaction is
executed completely before another transaction starts. In a
serial schedule, when the first transaction completes its cycle,
the next transaction is executed.
For example: Suppose there are two transactions, T1 and T2, which
have some operations. If there is no interleaving of operations, then
there are the following two possible outcomes: execute all the
operations of T1 followed by all the operations of T2, or execute all
the operations of T2 followed by all the operations of T1.
In the given figure (a), Schedule A shows the serial schedule where
T1 is followed by T2.
In the given figure (b), Schedule B shows the serial schedule where
T2 is followed by T1.

2. Non-serial Schedule
If interleaving of operations is allowed, then there will be a
non-serial schedule.
There are many possible orders in which the system can execute
the individual operations of the transactions.
In the given figures (c) and (d), Schedule C and Schedule D are
non-serial schedules, since they interleave operations.

3. Serializable schedule
The serializability of schedules is used to find non-serial schedules
that allow the transactions to execute concurrently without
interfering with one another.
It identifies which schedules are correct when the executions of the
transactions have interleaving of their operations.
A non-serial schedule is serializable if its result is equal to the
result of its transactions executed serially.
Testing of Serializability
A serialization graph is used to test the serializability of a schedule.
Assume a schedule S. For S, we construct a graph known as a precedence
graph.
This graph is a pair G = (V, E), where V consists of a set of vertices and E
consists of a set of edges.
The set of vertices contains all the transactions participating in the
schedule.
The set of edges contains all edges Ti → Tj for which one of the
following three conditions holds:
Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).
If a precedence graph contains a single edge Ti → Tj, then all the instructions
of Ti must be executed before the first instruction of Tj in any equivalent
serial schedule.
If the precedence graph for schedule S contains a cycle, then S is
non-serializable. If the precedence graph has no cycle, then S is
serializable.
Explanation (for an example schedule over T1, T2, and T3):
Read(A): In T1, no subsequent writes to A, so no new edges

Read(B): In T2, no subsequent writes to B, so no new edges

Read(C): In T3, no subsequent writes to C, so no new edges

Write(B): B is subsequently read by T3, so add edge T2 → T3

Write(C): C is subsequently read by T1, so add edge T3 → T1

Write(A): A is subsequently read by T2, so add edge T1 → T2

Write(A): In T2, no subsequent reads to A, so no new edges

Write(C): In T1, no subsequent reads to C, so no new edges

Write(B): In T3, no subsequent reads to B, so no new edges
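The edge rules above can be checked mechanically. Below is a small illustrative Python sketch (not from the notes) that builds a precedence graph from a schedule written as (transaction, operation, item) triples and tests it for a cycle. The schedule is an assumed reconstruction consistent with the explanation above; it yields exactly the edges T2 → T3, T3 → T1, and T1 → T2.

from collections import defaultdict

schedule = [
    ("T1", "R", "A"), ("T2", "R", "B"), ("T3", "R", "C"),
    ("T2", "W", "B"), ("T3", "R", "B"),   # W(B) by T2 before R(B) by T3: T2 -> T3
    ("T3", "W", "C"), ("T1", "R", "C"),   # W(C) by T3 before R(C) by T1: T3 -> T1
    ("T1", "W", "A"), ("T2", "R", "A"),   # W(A) by T1 before R(A) by T2: T1 -> T2
    ("T2", "W", "A"), ("T1", "W", "C"), ("T3", "W", "B"),
]

def precedence_graph(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x == y and "W" in (op1, op2):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in graph[u]:
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in list(graph) if u not in done)

edges = precedence_graph(schedule)
print(sorted(edges))                          # [('T1','T2'), ('T2','T3'), ('T3','T1')]
print("serializable:", not has_cycle(edges))  # False: the graph contains a cycle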


We cannot change the sequence of statement execution within a transaction,
but we can interleave transactions through context switching, as shown above.
Multiple instructions are not executed at once, because a processor does not
execute more than one instruction at the same time.
Serial scheduling is when no context switching occurs; non-serial scheduling
is when context switching happens.

If we can convert a non-serial schedule to a serial schedule, then we can say
it is a consistent schedule.
But if we cannot convert a non-serial schedule to a serial one, that does not
mean it is inconsistent − just as a student not passing EAMCET is not
guaranteed to be a poor student.

• If we can swap two instructions belonging to two transactions (without
changing the result), we call them non-conflicting instructions; otherwise
they are called conflicting instructions.
• So we can turn a non-serial schedule into a serial schedule by swapping
its non-conflicting instructions.
• First we should find out when an instruction is conflicting and when it
is non-conflicting.
If swapping two statements would change the outcome, the statements are
conflicting, so no swapping can happen.

If a write follows a write, it is called a blind write, because the
transactions are not reading the value first. This is also conflicting,
because the final value written to the database differs depending on the order.
Two instructions are said to be conflicting when they belong to different
transactions, operate on the same data item, and at least one of them is
a write instruction.
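This three-part test translates directly into code; the small Python predicate below is an illustrative sketch (not from the notes) that mirrors it.

def conflicts(op1, op2):
    # A conflict requires: different transactions, the same data item,
    # and at least one of the two operations being a write.
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return t1 != t2 and item1 == item2 and "W" in (kind1, kind2)

print(conflicts(("T1", "W", "A"), ("T2", "R", "A")))   # True  (write-read)
print(conflicts(("T1", "R", "A"), ("T2", "R", "A")))   # False (read-read)
print(conflicts(("T1", "W", "A"), ("T1", "R", "A")))   # False (same transaction)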
How do we convert this non-serial schedule to a serial schedule?

• Conflict serializability definition:
If a non-serial schedule can be changed into a serial schedule by swapping
non-conflicting instructions, then we say it is conflict serializable, and
hence consistent.
Lock-based Protocols
Database systems equipped with lock-based protocols use a
mechanism by which a transaction cannot read or write data
until it acquires an appropriate lock on it. Locks are of two
kinds −
 Binary Locks − A lock on a data item can be in two
states; it is either locked or unlocked.
 Shared/exclusive − This type of locking mechanism
differentiates the locks based on their uses. If a lock is
acquired on a data item to perform a write operation, it is
an exclusive lock. Allowing more than one transaction to
write on the same data item would lead the database into
an inconsistent state. Read locks are shared because no
data value is being changed.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction
into three parts. In the first part, when the transaction starts
executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As
soon as the transaction releases its first lock, the third part
starts; in this part, the transaction cannot demand any new locks
and only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, where all
the locks are being acquired by the transaction, and a shrinking
phase, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first
acquire a shared (read) lock and then upgrade it to an
exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After
acquiring all the locks in the first phase, the transaction
continues to execute normally. But in contrast to 2PL, Strict-2PL
does not release a lock right after using it: Strict-2PL holds all
the locks until the commit point and releases them all at once.
Strict-2PL does not suffer from cascading aborts as 2PL does.
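A minimal single-transaction sketch of the two phases (illustrative Python, not from the notes; the class and method names are assumptions): once a transaction releases any lock, it enters its shrinking phase for good, and further lock requests are rejected.

class TwoPhaseTxn:
    # Minimal 2PL bookkeeping: locks may be acquired only while growing;
    # the first release switches the transaction to shrinking permanently.
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violation - "
                               "cannot acquire locks in the shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True      # the growing phase ends at the first release
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.lock("A"); t.lock("B")           # growing phase
t.unlock("A")                      # shrinking phase begins
try:
    t.lock("C")                    # violates 2PL
except RuntimeError as e:
    print(e)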
Timestamp-based Protocols
The most commonly used concurrency protocol is the
timestamp-based protocol. This protocol uses either the system
time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs
among transactions at the time of execution, whereas
timestamp-based protocols start working as soon as a transaction
is created.
Every transaction has a timestamp associated with it, and the
ordering is determined by the age of the transaction. A
transaction created at clock time 0002 would be older than all
other transactions that come after it. For example, any
transaction 'y' entering the system at 0004 is two seconds
younger, and priority is given to the older one.
In addition, every data item is given the latest read and write
timestamps. This lets the system know when the last read and
write operations were performed on the data item.
Timestamp Ordering Protocol
The timestamp-ordering protocol ensures serializability among
transactions in their conflicting read and write operations. It
is the responsibility of the protocol system that the conflicting
pair of tasks be executed according to the timestamp values of
the transactions.
 The timestamp of transaction Ti is denoted as TS(Ti).
 The read timestamp of data item X is denoted by R-timestamp(X).
 The write timestamp of data item X is denoted by W-timestamp(X).
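The notes introduce the notation but not the rules themselves; the sketch below (illustrative Python, following the standard textbook timestamp-ordering read/write rules, which are assumed here) shows how these timestamps are typically used.

class Item:
    def __init__(self):
        self.r_ts = 0    # R-timestamp(X): timestamp of the youngest reader of X
        self.w_ts = 0    # W-timestamp(X): timestamp of the youngest writer of X

def read(ts, item):
    if ts < item.w_ts:                      # X already overwritten by a younger txn
        raise RuntimeError("rollback: read arrives too late")
    item.r_ts = max(item.r_ts, ts)

def write(ts, item):
    if ts < item.r_ts or ts < item.w_ts:    # a younger txn already read or wrote X
        raise RuntimeError("rollback: write arrives too late")
    item.w_ts = ts

x = Item()
read(5, x)      # T5 reads X: R-timestamp(X) becomes 5
write(7, x)     # T7 writes X: W-timestamp(X) becomes 7
try:
    write(6, x)     # T6 is older than X's write timestamp -> rolled back
except RuntimeError as e:
    print(e)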
Failure Classification
To see where a problem has occurred, we generalize failures into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it
reaches a point from where it cannot go any further. This is
called a transaction failure, where only a few transactions or
processes are affected.
Reasons for a transaction failure could be −
 Logical errors − Where a transaction cannot complete
because it has some code error or any internal error
condition.
 System errors − Where the database system itself
terminates an active transaction because the DBMS is not
able to execute it, or it has to stop because of some
system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active
transaction.
System Crash
There are problems − external to the system − that may cause
the system to stop abruptly and crash. For example,
interruptions in the power supply may cause the failure of
underlying hardware or software.
Examples may also include operating system errors.
Disk Failure
In the early days of technology evolution, it was a common
problem that hard-disk drives or storage drives failed
frequently.
Disk failures include the formation of bad sectors, unreachability
of the disk, a disk head crash, or any other failure that
destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the
storage structure can be divided into two categories −
 Volatile storage − As the name suggests, a volatile
storage cannot survive system crashes. Volatile storage
devices are placed very close to the CPU; normally they
are embedded onto the chipset itself. For example, main
memory and cache memory are examples of volatile
storage. They are fast but can store only a small amount
of information.
 Non-volatile storage − These memories are made to
survive system crashes. They are huge in data storage
capacity, but slower in accessibility. Examples may
include hard-disks, magnetic tapes, flash memory, and
non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being
executed and various files opened for them to modify the data
items. Transactions are made of various operations, which are
atomic in nature. But according to ACID properties of DBMS,
atomicity of transactions as a whole must be maintained, that
is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the
following −
 It should check the states of all the transactions, which
were being executed.
 A transaction may be in the middle of some operation;
the DBMS must ensure the atomicity of the transaction
in this case.
 It should check whether the transaction can be
completed now or it needs to be rolled back.
 No transactions would be allowed to leave the DBMS in
an inconsistent state.
There are two types of techniques, which can help a DBMS
in recovering as well as maintaining the atomicity of a
transaction −
 Maintaining the logs of each transaction, and writing
them onto some stable storage before actually modifying
the database.
 Maintaining shadow paging, where the changes are done
on a volatile memory, and later, the actual database is
updated.
Log-based Recovery
The log is a sequence of records that maintains a record of the
actions performed by a transaction. It is important that the logs
are written prior to the actual modification and stored on a
stable storage medium, which is failsafe.
Log-based recovery works as follows −
 The log file is kept on a stable storage medium.
 When a transaction enters the system and starts execution, it
writes a log record about it −
<Tn, Start>
 When the transaction modifies an item X, it writes a log record
as follows −
<Tn, X, V1, V2>
This reads: Tn has changed the value of X from V1 to V2.
 When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
 Deferred database modification − All logs are written on
to the stable storage and the database is updated when a
transaction commits.
 Immediate database modification − Each log follows an
actual database modification. That is, the database is
modified immediately after every operation.
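The write-ahead discipline can be sketched as follows (illustrative Python, immediate-modification flavour; the record layout mirrors <Tn, X, V1, V2> above, but the function names are assumptions):

log = []             # stands in for the log file on stable storage
db = {"X": 10}

def wal_write(txn, item, new_value):
    log.append((txn, item, db[item], new_value))   # write <Tn, X, V1, V2> first...
    db[item] = new_value                           # ...then modify the database

log.append(("T1", "Start"))
wal_write("T1", "X", 25)
log.append(("T1", "commit"))
print(log)   # [('T1', 'Start'), ('T1', 'X', 10, 25), ('T1', 'commit')]
print(db)    # {'X': 25}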
Multiple Granularity
 Allow data items to be of various sizes and define a hierarchy of
data granularities, where the small granularities are nested
within larger ones.
 Can be represented graphically as a tree (but don't confuse it
with the tree-locking protocol).
 When a transaction locks a node in the tree explicitly, it
implicitly locks all the node's descendants in the same mode.
 Granularity of locking (the level in the tree where locking is done):
o Fine granularity (lower in the tree): high concurrency, high
locking overhead.
o Coarse granularity (higher in the tree): low locking overhead,
low concurrency.

Example of Granularity Hierarchy
The levels, starting from the coarsest (top) level, are:
DB → Tables → Records → Fields
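As a small illustration (not from the notes), locking a node implicitly covers everything beneath it; the toy hierarchy below follows the DB → Tables → Records → Fields example.

tree = {
    "DB":      ["table1", "table2"],
    "table1":  ["record1", "record2"],
    "record1": ["field1", "field2"],
    "table2": [], "record2": [], "field1": [], "field2": [],
}

def implicitly_locked(node):
    # All nodes covered by an explicit lock on `node` (the node and its subtree).
    covered = {node}
    for child in tree[node]:
        covered |= implicitly_locked(child)
    return covered

print(sorted(implicitly_locked("table1")))
# ['field1', 'field2', 'record1', 'record2', 'table1']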

Validation-Based Protocol
Execution of transaction Ti is done in three phases:
1. Read and execution phase: transaction Ti writes only to
temporary local variables.
2. Validation phase: transaction Ti performs a "validation test"
to determine whether its local writes can be applied to the
database without violating serializability.
3. Write phase: if Ti is validated, the updates are applied to
the database; otherwise, Ti is rolled back.
Multiversion Schemes
 Multiversion schemes keep old versions of each data item to
increase concurrency. Two variants are:
o Multiversion Timestamp Ordering
o Multiversion Two-Phase Locking
 Each successful write results in the creation of a new version
of the data item written.
 Timestamps are used to label versions.
 When a read(Q) operation is issued, an appropriate version of Q
is selected based on the timestamp of the transaction, and the
value of the selected version is returned.
 Reads never have to wait, as an appropriate version is returned
immediately.
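The read rule can be made concrete with a short sketch (illustrative Python; the version-selection rule shown - return the newest version whose write timestamp is at or below the reader's timestamp - is the standard multiversion timestamp-ordering read rule, assumed here):

versions = [(1, "v1"), (5, "v5"), (9, "v9")]   # (write timestamp, value) versions of Q

def mv_read(ts, versions):
    # Newest version written at or before the reader's timestamp; never waits.
    eligible = [(w, v) for w, v in versions if w <= ts]
    return max(eligible)[1]

print(mv_read(7, versions))   # 'v5' - a reader at time 7 ignores the version from time 9
print(mv_read(9, versions))   # 'v9'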
Why recovery is needed?

Basically, whenever a transaction is submitted to a DBMS for
execution, the system is responsible for making sure that either
all the operations that need to be performed in the transaction
complete successfully and their effect is recorded in the
database, or the transaction has no effect at all on the database
or on any other transactions.

The DBMS must not permit some operations of a transaction T to be
applied to the database while other operations of T are not. This
may happen if a transaction fails after executing some of its
operations but before executing all of them.

Types of failures –
There are basically the following types of failures that may
occur and lead to the failure of a transaction:
1. Transaction failure
2. System failure
3. Media failure, and so on.
Let us try to understand the different types of failures that may
occur during a transaction.

1. System crash –
A hardware, software, or network error that occurs during the
execution of the transaction comes under this category.
2. System error –
Some operation performed during the transaction, such as an
integer overflow or division by zero, is the reason for this type
of error. This type of failure may also occur because of
erroneous parameter values or because of a logical programming
error. In addition, the user may interrupt the execution during
the transaction, which may lead to the failure of the
transaction.
3. Local error –
This basically happens when, while performing a transaction,
certain conditions occur that lead to the cancellation of the
transaction. A simple example is that data needed for the
transaction may not be found, or we may want to debit money from
an account with an insufficient balance, which leads to the
cancellation of our request or transaction. Such an exception
should be programmed into the transaction itself, so that it
isn't treated as a failure.
4. Concurrency control enforcement –
The concurrency control method may decide to abort the
transaction and restart it, because it violates serializability
or because several transactions are in a deadlock.
5. Disk failure –
This type of failure basically occurs when a disk loses its data
because of a read or write malfunction or because of a disk
read/write head crash. This may happen during a read/write
operation of the transaction.
6. Catastrophe –
These are also known as physical problems. This category refers
to an endless list of problems, including power failure,
air-conditioning failure, fire, theft, sabotage, overwriting
disks or tapes by mistake, and mounting of the wrong tape by the
operator.
Protecting data against loss, corruption, disasters (manmade or natural)
and other problems is one of the top priorities for IT organizations. In
concept, the ideas are simple, although implementing an efficient and
effective set of backup operations can be difficult.

The term backup has become synonymous with data protection over the
past several decades and may be accomplished via several methods.
Backup software applications have been developed to reduce the
complexity of performing backup and recovery operations. Backing up
data is only one part of a disaster protection plan, and may not provide
the level of data and disaster recovery capabilities desired without
careful design and testing.

The purpose of most backups is to create a copy of data so that a
particular file or application may be restored after data is
lost, corrupted, deleted, or a disaster strikes. Thus, backup is
not the goal, but rather one means of accomplishing the goal of
protecting data. Testing backups is just as important as backing
up and restoring data. Again, the point of backing up data is to
enable restoration of data at a later point in time. Without
periodic testing, it is impossible to guarantee that the goal of
protecting data is being met.

Backing up data is sometimes confused with archiving data,
although these operations are different. A backup is a secondary
copy of data used for data protection. In contrast, an archive is
the primary data, which is moved to a less-expensive type of
media (such as tape) for long-term, low-cost storage.

Backup applications have long offered several types of backup
operations. The most common backup types are a full backup, an
incremental backup, and a differential backup. Other backup types
include synthetic full backups, mirroring, reverse incremental,
and continuous data protection (CDP).

Full backups

The most basic and complete type of backup operation is a
full backup. As the name implies, this type of backup makes
a copy of all data to another set of media, which can be tape,
disk, or a DVD or CD. The primary advantage of performing a
full backup during every operation is that a complete copy of
all data is available on a single set of media. This results in
a minimal time to restore data, a metric known as the recovery
time objective (RTO). However, the disadvantages are that it
takes longer to perform a full backup than other types
(sometimes by a factor of 10 or more), and it requires more
storage space.

Thus, full backups are typically run only periodically. Data
centers that have a small amount of data (or critical
applications) may choose to run a full backup daily, or even
more often in some cases. Typically, backup operations
employ a full backup in combination with either incremental
or differential backups.

Incremental backups

An incremental backup operation will result in copying only
the data that has changed since the last backup operation of
any type. The modified timestamp on files is typically used
and compared to the timestamp of the last backup. Backup
applications track and record the date and time that backup
operations occur in order to track files modified since these
operations.

Because an incremental backup will only copy data changed since
the last backup of any type, it may be run as often as desired,
with only the most recent changes stored. The benefit of an
incremental backup is that it copies a smaller amount of data
than a full backup. Thus, these operations complete faster and
require less media to store the backup.

Differential backups

A differential backup operation is similar to an incremental one
the first time it is performed, in that it will copy all data
changed since the previous backup. However, each time it is
run afterwards, it will continue to copy all data changed
since the previous full backup.
Thus, it will store more data than an incremental on subsequent
operations, although typically far less than a full backup.
Moreover, differential backups require more space and time to
complete than incremental backups, although less than full
backups.
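The practical difference between the two strategies reduces to which reference time a file's modification time is compared against; a minimal sketch (illustrative only, with made-up timestamps):

# name -> last-modified time (arbitrary units)
files = {"a.txt": 10, "b.txt": 25, "c.txt": 40}

last_full_backup = 0      # time of the last full backup
last_any_backup = 30      # e.g. an incremental backup ran at time 30

# Incremental: copy files modified since the last backup of ANY type.
incremental = [f for f, m in files.items() if m > last_any_backup]
# Differential: copy files modified since the last FULL backup.
differential = [f for f, m in files.items() if m > last_full_backup]

print(incremental)    # ['c.txt']
print(differential)   # ['a.txt', 'b.txt', 'c.txt']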

Remote Backup
Remote backup provides a sense of security in case the primary location
where the database is located gets destroyed. A remote backup can be
offline or real-time (online). In case it is offline, it is maintained
manually.
Online backup systems are more real-time and lifesavers for
database administrators and investors. An online backup system is a
mechanism where every bit of the real-time data is backed up
simultaneously at two distant places. One of them is directly connected
to the system and the other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses
the failure and switches the user system to the remote storage.
Sometimes this is so instant that the users can’t even realize a failure.
Log-based Recovery

The atomicity property of a DBMS states that either all the
operations of a transaction must be performed or none. The
modifications done by an aborted transaction should not be
visible in the database, and the modifications done by a
committed transaction should be visible.
To achieve our goal of atomicity, the user must first output to
stable storage information describing the modifications,
without modifying the database itself. This information can
help us ensure that all modifications performed by committed
transactions are reflected in the database. It can also help us
ensure that no modifications made by an aborted transaction
persist in the database.

Log and log records –

The log is a sequence of log records, recording all the update
activities in the database. Logs for each transaction are
maintained in stable storage. Any operation performed on the
database is recorded in the log. Prior to performing any
modification to the database, an update log record is created
to reflect that modification.
An update log record, represented as <Ti, Xj, V1, V2>, has these
fields:

1. Transaction identifier: unique identifier of the transaction
that performed the write operation.
2. Data item: unique identifier of the data item written.
3. Old value: value of the data item prior to the write.
4. New value: value of the data item after the write operation.

Other types of log records are:

1. <Ti start>: contains information about when transaction Ti
starts.
2. <Ti commit>: contains information about when transaction Ti
commits.
3. <Ti abort>: contains information about when transaction Ti
aborts.
Undo and Redo Operations –
Because all database modifications must be preceded by the
creation of a log record, the system has available both the old
value prior to the modification of the data item and the new
value that is to be written. This allows the system to perform
undo and redo operations as appropriate:
1. Undo: using a log record, sets the data item specified in the
log record to its old value.
2. Redo: using a log record, sets the data item specified in the
log record to its new value.
The database can be modified using two approaches –
1. Deferred Modification Technique: If the transaction
does not modify the database until it has partially
committed, it is said to use deferred modification
technique.
2. Immediate Modification Technique: If database
modification occur while transaction is still active, it is
said to use immediate modification technique.
Recovery using Log records –
After a system crash has occurred, the system consults the
log to determine which transactions need to be redone and
which need to be undone.
1. Transaction Ti needs to be undone if the log contains the
record <Ti start> but does not contain either the record <Ti
commit> or the record <Ti abort>.
2. Transaction Ti needs to be redone if log contains record
<Ti start> and either the record <Ti commit> or the record <Ti
abort>.
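These two rules can be applied in a single scan of the log; a small illustrative sketch (the record layout is assumed from the definitions above):

# Decide which transactions to undo/redo after a crash, per the rules above.
log = [("T1", "start"), ("T1", "commit"),
       ("T2", "start"),
       ("T3", "start"), ("T3", "abort")]

started, finished = set(), set()
for txn, rec in log:
    if rec == "start":
        started.add(txn)
    elif rec in ("commit", "abort"):
        finished.add(txn)

undo = started - finished      # <Ti start> with no <Ti commit>/<Ti abort>
redo = started & finished      # <Ti start> plus <Ti commit> or <Ti abort>
print("undo:", undo)   # {'T2'}
print("redo:", redo)   # {'T1', 'T3'}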
Transaction Isolation Levels in DBMS

Prerequisite – Concurrency control in DBMS, ACID properties in DBMS.

As we know, in order to maintain consistency in a database, it
follows the ACID properties. Among these four properties
(Atomicity, Consistency, Isolation, and Durability), Isolation
determines how the work of one transaction is visible to other
users and systems. It means that a transaction should take place
in a system in such a way that it is the only transaction that is
accessing the resources in the database system.
Isolation levels define the degree to which a transaction must
be isolated from the data modifications made by any other
transaction in the database system. A transaction isolation level
is defined by the following phenomena –

 Dirty Read – A dirty read is the situation when a transaction
reads data that has not yet been committed. For example, let's
say transaction 1 updates a row and leaves it uncommitted;
meanwhile, transaction 2 reads the updated row. If transaction 1
rolls back the change, transaction 2 will have read data that is
considered never to have existed.
 Non-Repeatable Read – A non-repeatable read occurs when a
transaction reads the same row twice and gets a different value
each time. For example, suppose transaction T1 reads data. Due to
concurrency, another transaction T2 updates the same data and
commits. Now if transaction T1 rereads the same data, it will
retrieve a different value.

 Phantom Read – A phantom read occurs when the same query is
executed twice, but the rows retrieved by the two executions are
different. For example, suppose transaction T1 retrieves a set of
rows that satisfy some search criteria. Now, transaction T2
inserts some new rows that match the search criteria for
transaction T1. If transaction T1 re-executes the statement that
reads the rows, it gets a different set of rows this time.
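A tiny simulation (illustrative only) of the non-repeatable read phenomenon described above: with no isolation, T1's two reads of the same row straddle T2's committed update and disagree.

row = {"balance": 100}

t1_first_read = row["balance"]     # T1 reads the row: 100
row["balance"] = 250               # T2 updates the same row and commits
t1_second_read = row["balance"]    # T1 re-reads the row: 250

print(t1_first_read, t1_second_read)                              # 100 250
print("non-repeatable read:", t1_first_read != t1_second_read)    # True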
