TX Processing
General Overview
Where we've been...
DBA skills for relational DBs:
Logical Schema Design
E/R diagrams
Decomposition and Normalization
Query languages
RA, RC, SQL
Integrity Constraints
Transactions
15.2
What Does a DBMS Manage?
1. Data organization
E/R Model
Relational Model
2. Data Retrieval
Relational Algebra
Relational Calculus
SQL
3. Data Integrity
Integrity Constraints
Transactions
15.3
What is a transaction
A transaction is the basic, atomic, logical unit of database processing in
an information system: it accesses and possibly updates various data items
in the DB.
A transaction is a sequence of operations that must be executed as a whole,
taking a consistent (and correct) database state into another consistent (and
correct) database state.
A collection of actions that make consistent transformations of system states
while preserving system consistency.
An indivisible unit of processing.
Example: Transfer £500 from Account A to Account B
                        Before (consistent state)    After (consistent state)
Account A  Fred Bloggs  £1000                        £500
Account B  Sue Smith    £0                           £500
15.5
The Threat to Data Integrity
Consistent DB
15.7
ACID Properties
Properties that a Transaction needs to have:
Atomicity: either all operations in a Transaction take effect, or
none
i.e. a transaction is an atomic unit of processing and it is either
performed entirely or not at all
15.8
ACID Properties contd…
Isolation / Independence: intermediate, inconsistent states
must be concealed from other Transactions
i.e. the updates of a transaction must not be made visible to
other transactions until it is committed (solves the temporary
update problem)
15.9
Demonstrating ACID
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
15.10
Threats to ACID
1. Programmer Error
e.g.: $50 subtracted from A, $30 added to B
threatens consistency
2. System Failures
e.g.: crash after write(A) and before write(B)
threatens atomicity
e.g.: crash after write(B)
threatens durability
3. Concurrency
E.g.: concurrent Transaction reads A, B between steps 3-6
threatens isolation
15.11
Isolation
Simplest way to guarantee: forbid concurrent transactions!
But, concurrency is desirable:
15.12
Isolation
Approach to ensuring Isolation:
Distinguish between “good” and “bad” concurrency
Prevent all “bad” (and sometimes some “good”) concurrency from
happening OR
Recognize “bad” concurrency when it happens and undo its effects
(abort some transactions)
Pessimistic vs Optimistic CC
15.13
Requirements for Database Consistency
Concurrency Control
Most DBMS are multi-user systems.
The concurrent execution of many different transactions submitted by
various users must be organised such that no transaction interferes
with another in a way that produces incorrect results.
The concurrent execution of transactions must be such that each
transaction appears to execute in isolation.
Recovery
System failures, either hardware or software, must not result in an
inconsistent database
15.14
Transaction as a Recovery Unit
If an error or hardware/software crash occurs between the begin and end,
the database will be inconsistent
Computer Failure (system crash)
A transaction or system error
Local errors or exception conditions detected by the transaction
Concurrency control enforcement
Disk failure
Physical problems and catastrophes
The database is restored to some state from the past so that a correct state
—close to the time of failure—can be reconstructed from the past state.
A DBMS ensures that if a transaction executes some updates and then a
failure occurs before the transaction reaches normal termination, then
those updates are undone.
The statements COMMIT and ROLLBACK (or their equivalent) ensure
Transaction Atomicity
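As a rough illustration of COMMIT and ROLLBACK enforcing atomicity, the sketch below uses Python's standard sqlite3 module with an in-memory database. The table name `account`, the helper `transfer` and the "no negative balance" rule are assumptions made for this example, not part of the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Hypothetical helper: move `amount` from src to dst as one atomic unit."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            # Assumed integrity rule for the sketch: no overdrafts.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()      # COMMIT: both updates become permanent together
        return True
    except Exception:
        conn.rollback()    # ROLLBACK: the partial update is undone
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 0)])
conn.commit()

ok = transfer(conn, "A", "B", 500)       # succeeds and commits
failed = transfer(conn, "A", "B", 9999)  # fails after the first UPDATE, rolls back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second call debits A before it detects the problem, yet the rollback leaves both balances exactly as the first transfer left them.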
15.15
Recovery
Mirroring
keep two copies of the database and maintain them simultaneously
Backup
periodically dump the complete state of the database to some form of tertiary
storage
System Logging
the log keeps track of all transaction operations affecting the values of
database items. The log is kept on disk so that it is not affected by failures
except for disk and catastrophic failures.
15.16
Recovery from Transaction Failures
Catastrophic failure
Restore a previous copy of the database from archival backup
Apply transaction log to copy to reconstruct more current state
by redoing committed transaction operations up to failure point
Incremental dump + log each transaction
Non-catastrophic failure
Reverse the changes that caused the inconsistency by undoing
the operations and possibly redoing legitimate changes which
were lost
The entries kept in the system log are consulted during recovery.
No need to use the complete archival copy of the database.
15.17
Transaction States
For recovery purposes the system needs to keep track of when a
transaction starts, terminates and commits.
Begin_Transaction: marks the beginning of a transaction execution;
End_Transaction: specifies that the read and write operations have ended and
marks the end limit of transaction execution (but may be aborted because of
concurrency control);
Commit_Transaction: signals a successful end of the transaction. Any updates
executed by the transaction can be safely committed to the database and will not
be undone;
Rollback (or Abort): signals that the transaction has ended unsuccessfully. Any
changes that the transaction may have applied to the database must be undone;
Undo: similar to ROLLBACK but it applies to a single operation rather than to a
whole transaction;
Redo: specifies that certain transaction operations must be redone to ensure
that all the operations of a committed transaction have been applied successfully
to the database;
15.18
Entries in the System Log
For every transaction a unique transaction-id is generated by the system.
[start_transaction, transaction-id]: the start of execution of the transaction identified by transaction-id
[Figure: example code declaring Credit_labmark (sno NUMBER, cno CHAR, credit NUMBER), old_mark NUMBER and new_mark NUMBER, annotated with ROLLBACK, READ, WRITE, terminated, failed]
If a system failure occurs, search the log and roll back the transactions that
have written into the log a
[start_transaction, transaction-id]
[write_item, transaction-id, X, old_value, new_value]
but have not recorded into the log a [commit, transaction-id]
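The log search described above can be sketched as follows. This is a minimal illustration, assuming the log is an in-memory list of tuples shaped like the entries on the slide and the database is a plain dictionary; the function names are hypothetical.

```python
def transactions_to_roll_back(log):
    """Transactions with a start entry but no commit entry in the log."""
    started, committed = set(), set()
    for entry in log:
        kind, tid = entry[0], entry[1]
        if kind == "start_transaction":
            started.add(tid)
        elif kind == "commit":
            committed.add(tid)
    return started - committed

def undo_uncommitted(db, log):
    """Restore old_value for every write made by an uncommitted transaction."""
    losers = transactions_to_roll_back(log)
    for entry in reversed(log):              # undo the newest writes first
        if entry[0] == "write_item" and entry[1] in losers:
            _, _, item, old_value, _new_value = entry
            db[item] = old_value
    return losers

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 1000, 500),
    ("start_transaction", "T2"),
    ("write_item", "T2", "C", 7, 8),
    ("commit", "T2"),
    # crash here: T1 never recorded [commit, T1]
]
db = {"A": 500, "B": 0, "C": 8}              # state found on disk after the crash
losers = undo_uncommitted(db, log)
```

Scanning the log backwards matters when one transaction wrote the same item several times: the oldest old_value is applied last and therefore wins.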
15.20
Read and Write Operations of a Transaction
Specify read or write operations on the database items that are executed as
part of a transaction
read_item(X):
reads a database item named X into a program variable also named X.
1. find the address of the disk block that contains item X
2. copy that disk block into a buffer in the main memory
3. copy item X from the buffer to the program variable named X
write_item(X):
writes the value of program variable X into the database item named X.
1. find the address of the disk block that contains item X
2. copy that disk block into a buffer in the main memory
3. copy item X from the program variable named X into its current location in the
buffer
4. store the updated block in the buffer back to disk (this step updates the
database on disk)
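The block-and-buffer steps behind read_item and write_item can be sketched with two dictionaries standing in for the disk and the main-memory buffers. The class name `TinyStore`, the block size and the item-to-block mapping are assumptions for this sketch only.

```python
BLOCK_SIZE = 4  # items per disk block (hypothetical)

class TinyStore:
    def __init__(self, items):
        names = sorted(items)
        # Assumed placement: items are packed into blocks in name order.
        self.block_of = {name: i // BLOCK_SIZE for i, name in enumerate(names)}
        self.disk = {}
        for name in names:
            self.disk.setdefault(self.block_of[name], {})[name] = items[name]
        self.buffers = {}                       # main-memory copies of blocks

    def _fetch(self, name):
        block_no = self.block_of[name]          # find the block holding the item
        if block_no not in self.buffers:        # copy that block into a buffer
            self.buffers[block_no] = dict(self.disk[block_no])
        return block_no

    def read_item(self, name):
        block_no = self._fetch(name)
        return self.buffers[block_no][name]     # copy item to program variable

    def write_item(self, name, value):
        block_no = self._fetch(name)
        self.buffers[block_no][name] = value    # update the item in the buffer
        # Store the updated block back to disk immediately, as the slide does.
        self.disk[block_no] = dict(self.buffers[block_no])

store = TinyStore({"X": 4, "Y": 8})
x = store.read_item("X")
store.write_item("X", x - 2)
```

Real systems usually delay the final store-to-disk step and let the buffer manager decide when to flush, which is why the log becomes necessary for recovery.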
15.21
Checkpoints in the System Log
A [checkpoint] record is written periodically into the
log when the system writes out to the database on
disk the effect of all WRITE operations of committed
transactions.
All transactions whose [commit, transaction-id] entries
can be found in the system log before a [checkpoint]
record will not require their WRITE operations to be
redone in the case of a system crash.
Before a transaction reaches its commit point, force-write
(flush) the log file to disk.
Actions Constituting a Checkpoint
temporary suspension of transaction execution
forced writing of all updated database blocks in main
memory buffers to disk
writing a [checkpoint] record to the log and force writing
the log to disk
resuming of transaction execution
15.22
Transaction as a Concurrency Unit
Transactions must be synchronised correctly to guarantee
database consistency
Simultaneous Execution
[Figure: transaction T1 operating on Account A (Fred Bloggs, £500) and Account B (Sue Smith, £0)]
15.23
Transaction scheduling algorithms
Transaction Serialisability
The effect on a database of any number of transactions executing
in parallel must be the same as if they were executed one after
another
15.24
The Lost Update Problem
Two transactions accessing the same database item have their operations
interleaved in a way that makes the database item incorrect
Initial values: X=4, Y=8, N=2, M=3
T1: (joe)         T2: (fred)        X     Y
read_item(X);                       4
X:= X - N;                          2
                  read_item(X);     4
                  X:= X + M;        7
write_item(X);                      2
read_item(Y);                             8
                  write_item(X);    7
Y:= Y + N;                                10
write_item(Y);                            10
Item X has an incorrect value because its update from T1 is “lost” (overwritten):
X ends at 7, whereas running T1 and T2 one after another would give 4 - 2 + 3 = 5.
T2 reads the value of X before T1 changes it in the database, and hence the updated
database value resulting from T1 is lost.
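The interleaving above can be replayed with a small interpreter. This is only a sketch: the step encoding (transaction, operation, item) and the helper `run` are assumptions for illustration, with the initial values taken from the slide.

```python
N, M = 2, 3

def run(schedule, db):
    """Execute (txn, op, item) steps; each transaction has private variables."""
    local = {}
    for txn, op, item in schedule:
        variables = local.setdefault(txn, {})
        if op == "read":
            variables[item] = db[item]              # read_item
        elif op == "write":
            db[item] = variables[item]              # write_item
        else:
            variables[item] = op(variables[item])   # local computation
    return db

sub_n = lambda v: v - N
add_n = lambda v: v + N
add_m = lambda v: v + M

interleaved = [
    ("T1", "read", "X"), ("T1", sub_n, "X"),
    ("T2", "read", "X"), ("T2", add_m, "X"),
    ("T1", "write", "X"), ("T1", "read", "Y"),
    ("T2", "write", "X"),
    ("T1", add_n, "Y"), ("T1", "write", "Y"),
]
# The same steps, run serially: all of T1, then all of T2.
serial = ([s for s in interleaved if s[0] == "T1"]
          + [s for s in interleaved if s[0] == "T2"])

lost = run(interleaved, {"X": 4, "Y": 8})
correct = run(serial, {"X": 4, "Y": 8})
```

In the interleaved run T2 computes X + M from the stale value 4, so its write replaces T1's result instead of building on it.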
15.25
The Incorrect Summary or Unrepeatable Read or Inconsistent Retrieval Problem
One transaction is calculating an aggregate summary function on a
number of records while other transactions are updating some of these
records.
The aggregate function may calculate some values before they are
updated and others after.
T1:               T2:                T1    T2    Sum
                  sum:= 0;                       0
                  read_item(A);            4
                  sum:= sum + A;                 4
read_item(X);                        4
X:= X - N;                           2
write_item(X);                       2
                  read_item(X);            2
                  sum:= sum + X;                 6
                  read_item(Y);            8
                  sum:= sum + Y;                 14
read_item(Y);                        8
Y:= Y + N;                           10
write_item(Y);                       10
T2 reads X after N is subtracted and reads Y before N is added, so a wrong
summary is the result (14 instead of 16).
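The same effect can be reproduced with two Python generators that pause after every database access. This is an illustrative sketch: the generator names, the `interleave` driver and the merging of each computation with its read or write are simplifications, with A=4, X=4, Y=8 and N=2 as in the example.

```python
N = 2

def t1_steps(db):
    """T1 moves N from X to Y, pausing after each database access."""
    x = db["X"]            # read_item(X)
    yield
    db["X"] = x - N        # X := X - N; write_item(X)
    yield
    y = db["Y"]            # read_item(Y)
    yield
    db["Y"] = y + N        # Y := Y + N; write_item(Y)
    yield

def t2_steps(db, out):
    """T2 sums A, X and Y, pausing after each read."""
    for item in ("A", "X", "Y"):
        out["sum"] += db[item]
        yield

def interleave(order, db):
    out = {"sum": 0}
    steps = {"T1": t1_steps(db), "T2": t2_steps(db, out)}
    for name in order:
        next(steps[name])  # run that transaction up to its next pause
    return out["sum"]

slide_order = ["T2", "T1", "T1", "T2", "T2", "T1", "T1"]
wrong_sum = interleave(slide_order, {"A": 4, "X": 4, "Y": 8})
sum_before_t1 = interleave(["T2"] * 3 + ["T1"] * 4, {"A": 4, "X": 4, "Y": 8})
sum_after_t1 = interleave(["T1"] * 4 + ["T2"] * 3, {"A": 4, "X": 4, "Y": 8})
```

Both serial orders give the same total because T1 only moves value between X and Y; the interleaved order sees X after the move and Y before it.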
15.26
Dirty Read or The Temporary Update Problem
One transaction updates a database item and then the transaction fails. The
updated item is accessed by another transaction before it is changed back to its
original value
15.27
Next class we will discuss
SCHEDULES
Serialisable and non-serialisable schedules
Deadlock & Locking protocols
Deadlock prevention
Recovery Techniques
15.28
Schedules
Schedules – sequences that indicate the chronological order in
which instructions of concurrent transactions are executed
a schedule for a set of transactions must consist of all instructions of
those transactions
must preserve the order in which the instructions appear in each
individual transaction
T1: 1, 2, 3
T2: A, B, C, D
One possible schedule: 1, A, B, 2, 3, C, D
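The two requirements on a schedule (all instructions present, per-transaction order preserved) amount to one check: projecting the schedule onto each transaction must give back that transaction's instruction list. A minimal sketch, with the function name and data layout chosen for illustration:

```python
def is_valid_schedule(schedule, transactions):
    """schedule: list of (txn, instruction) in chronological order.
    transactions: dict mapping txn name to its ordered instruction list."""
    seen = {name: [] for name in transactions}
    for txn, instruction in schedule:
        if txn not in seen:
            return False                 # instruction from an unknown transaction
        seen[txn].append(instruction)
    # Equal projections mean nothing is missing and nothing is reordered.
    return seen == transactions

transactions = {"T1": ["1", "2", "3"], "T2": ["A", "B", "C", "D"]}

good = [("T1", "1"), ("T2", "A"), ("T2", "B"), ("T1", "2"),
        ("T1", "3"), ("T2", "C"), ("T2", "D")]
reordered = [("T1", "2"), ("T2", "A"), ("T2", "B"), ("T1", "1"),
             ("T1", "3"), ("T2", "C"), ("T2", "D")]
missing = good[:-1]                      # drops instruction D of T2
```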
15.29
Example Schedules
Transactions:
T1: transfers $50 from A to B
T2: transfers 10% of A to B
Example 1: a “serial” schedule
T1                  T2
read(A)
A <- A - 50
write(A)
read(B)
B <- B + 50
write(B)
                    read(A)
                    tmp <- A*0.1
                    A <- A - tmp
                    write(A)
                    read(B)
                    B <- B + tmp
                    write(B)
Constraint: the sum A+B must be the same before and after
Before: 100+50 = 150, consistent
After: 45+105 = 150
15.30
Example Schedule
Another “serial” schedule:
T1                  T2
                    read(A)
                    tmp <- A*0.1
                    A <- A - tmp
                    write(A)
                    read(B)
                    B <- B + tmp
                    write(B)
read(A)
A <- A - 50
write(A)
read(B)
B <- B + 50
write(B)
Before: 100+50 = 150, consistent
After: 40+110 = 150
Consistent but not the same as the previous schedule. Either is OK!
15.31
Example Schedule (Cont.)
Another “good” schedule:
T1                  T2
read(A)
A <- A - 50
write(A)
                    read(A)
                    tmp <- A*0.1
                    A <- A - tmp
                    write(A)
read(B)
B <- B + 50
write(B)
                    read(B)
                    B <- B + tmp
                    write(B)
Effect:  Before  After
A        100     45
B        50      105
Same as one of the serial schedules: serializable
15.32
Example Schedules (Cont.)
A “bad” schedule
T1                  T2
read(A)
A <- A - 50
                    read(A)
                    tmp <- A*0.1
                    A <- A - tmp
                    write(A)
                    read(B)
write(A)
read(B)
B <- B + 50
write(B)
                    B <- B + tmp
                    write(B)
Before: 100+50 = 150
After: 50+60 = 110 !!
Not consistent
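The four example schedules can be checked mechanically. The sketch below encodes each instruction of T1 and T2 as a small function over the database and the transaction's private variables, then runs them in a given order. The encoding and the `run` helper are assumptions for illustration; integer division stands in for the 10% computation so the arithmetic stays exact.

```python
# Each step takes (db, v): the shared database and the transaction's variables.
T1 = [
    lambda db, v: v.update(A=db["A"]),              # read(A)
    lambda db, v: v.update(A=v["A"] - 50),          # A <- A - 50
    lambda db, v: db.update(A=v["A"]),              # write(A)
    lambda db, v: v.update(B=db["B"]),              # read(B)
    lambda db, v: v.update(B=v["B"] + 50),          # B <- B + 50
    lambda db, v: db.update(B=v["B"]),              # write(B)
]
T2 = [
    lambda db, v: v.update(A=db["A"]),              # read(A)
    lambda db, v: v.update(tmp=v["A"] // 10),       # tmp <- A*0.1
    lambda db, v: v.update(A=v["A"] - v["tmp"]),    # A <- A - tmp
    lambda db, v: db.update(A=v["A"]),              # write(A)
    lambda db, v: v.update(B=db["B"]),              # read(B)
    lambda db, v: v.update(B=v["B"] + v["tmp"]),    # B <- B + tmp
    lambda db, v: db.update(B=v["B"]),              # write(B)
]

def run(order, a=100, b=50):
    """order: list of transaction names, one entry per executed instruction."""
    db = {"A": a, "B": b}
    programs = {"T1": iter(T1), "T2": iter(T2)}
    local = {"T1": {}, "T2": {}}
    for name in order:
        step = next(programs[name])
        step(db, local[name])
    return db

serial_t1_t2 = run(["T1"] * 6 + ["T2"] * 7)
serial_t2_t1 = run(["T2"] * 7 + ["T1"] * 6)
good = run(["T1"] * 3 + ["T2"] * 4 + ["T1"] * 3 + ["T2"] * 3)
bad = run(["T1"] * 2 + ["T2"] * 5 + ["T1"] * 4 + ["T2"] * 2)
```

The "good" interleaving ends in the same state as the serial order T1, T2, while the "bad" one loses 40 from the total.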
15.33
Serializability
How to distinguish good and bad schedules?
For the previous example, any schedule leaving A+B = 150 is good. Can we test this in general?
Ans: No. In general we won’t know the invariant (here A+B), and can’t check its
value at a given time for consistency
Alternative: Serializability
15.34
Serializability
Serializable: A schedule is serializable if its effects on the db are
equivalent to those of some serial schedule.
Hard to ensure; more conservative approaches are used in practice
All schedules
SQL serializable
Serializable schedules
15.36
Conflict Serializability (Cont.)
If a schedule S can be transformed into a schedule S´ by a
series of swaps of non-conflicting instructions, we say that
S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is
conflict equivalent to a serial schedule
Ex: the schedule
T1                  T2
....                ....
read(A)
                    read(A)
....                ....
can be rewritten to the equivalent schedule
T1                  T2
....                ....
                    read(A)
read(A)
....                ....
15.37
Conflict Serializability (Cont.)
Example:
T1                  T2
1. Read(A)
2. A <- A - 50
3. Write(A)
                    a. Read(A)
                    b. tmp <- A * 0.1
                    c. A <- A - tmp
                    d. Write(A)
4. Read(B)
5. B <- B + 50
6. Write(B)
                    e. Read(B)
                    f. B <- B + tmp
                    g. Write(B)
Swaps:
4<->d  5<->d  6<->d
4<->c  5<->c  6<->c
4<->b  5<->b  6<->b
4<->a  5<->a  6<->a
Result: conflict serializable, equivalent to <T1, T2>
15.38
Conflict Serializability (Cont.)
The effects of swaps
T1                  T2
read(A)
A <- A - 50
write(A)
read(B)
B <- B + 50
write(B)
                    read(A)
                    tmp <- A*0.1
                    A <- A - tmp
                    write(A)
                    read(B)
                    B <- B + tmp
                    write(B)
Because the example schedule could be swapped to this schedule (<T1, T2>),
the example schedule is conflict serializable
15.39
The swaps we made
A. Reads and writes of different data elements
e.g.: write(A) in one transaction followed by read(B) in the other
= read(B) followed by write(A)
15.40
Swaps
Swap under consideration: write(A) followed by read(A) in the other transaction
= read(A) followed by write(A)
What effect on the state of the db could the above swap have?
Suppose what follows read(A) in T1 is:
read(C)
C <- C+A
write(C)
15.41
The swaps We Made
A. Reads and writes of different data elements
4 <-> d
6 <-> a
15.42
Conflict Serializability (Cont.)
Previous example without local operations:
T1                  T2
1. Read(A)
2. Write(A)
                    a. Read(A)
                    b. Write(A)
3. Read(B)
4. Write(B)
                    c. Read(B)
                    d. Write(B)
Swaps: 3<->b, 3<->a, 4<->b, 4<->a
Result: <T1, T2>
15.43
Swappable Operations
Swappable operations:
1. Any operations on different data elements
2. Reads of the same data (Read(A))
(regardless of order of reads, the same value for A is read)
15.44
Conflicts
(1) READ/WRITE conflicts:
conflict because value read depends on whether write has
occurred
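The swappable/conflicting distinction reduces to a small predicate. The sketch below is an illustration with assumed names (`Op`, `conflicts`); besides the READ/WRITE case listed above it also treats two writes of the same item as conflicting, which is the W/W case used in the exercise answer later in these slides.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str    # transaction name, e.g. "T1"
    kind: str   # "R" for read, "W" for write
    item: str   # data element, e.g. "A"

def conflicts(p, q):
    """True if adjacent operations p and q may NOT be swapped."""
    return (p.txn != q.txn              # operations of different transactions
            and p.item == q.item        # on the same data element
            and "W" in (p.kind, q.kind))  # at least one of them is a write

read_write = conflicts(Op("T1", "R", "A"), Op("T2", "W", "A"))
write_write = conflicts(Op("T1", "W", "A"), Op("T2", "W", "A"))
read_read = conflicts(Op("T1", "R", "A"), Op("T2", "R", "A"))
different_items = conflicts(Op("T1", "W", "A"), Op("T2", "R", "B"))
```

Operations of the same transaction are reported as non-conflicting here only because they are never swapped at all: a schedule must keep each transaction's own order.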
15.45
Conflict Serializability
Q: Is the following schedule conflict serializable? If so, what’s its equivalent
serial schedule? If not, why?
T1                  T2
(1) read(Q)
                    (a) write(Q)
(2) write(Q)
Ans: No. Swapping (a) with (1) is a R/W conflict, and swapping (a) with (2)
is a W/W conflict.
Not equivalent to <T1, T2> or <T2, T1>
15.46
Conflict Serializability
Q: Is the following schedule conflict serializable? If so, what’s its equivalent
serial schedule? If not, why?
T1              T2              T3
(1) Read(A)
                (a) Write(A)
                (b) Read(B)
                                (x) Write(B)
                                (y) Read(S)
(2) Write(S)
Ans.: NO. None of the possible serial schedules is conflict equivalent to it:
<T1, T2, T3>
<T1, T3, T2>
<T2, T1, T3>
. . .
15.47
Conflict Serializability
Testing: too expensive to test a schedule by swapping operations
(usually schedules are big!)
15.48
Precedence Graph
An example of a “Precedence Graph”:
T1              T2              T3
Read(A)
                Write(A)
                Read(B)
                                Write(B)
                                Read(S)
Graph edges: T1 -> T2 labelled R/W(A); T2 -> T3 labelled R/W(B)
15.49
Precedence Graph
Another example:
T1              T2              T3
Read(A)
                Write(A)
                Read(B)
                                Write(B)
                                Read(S)
Graph edges: T1 -> T2 labelled R/W(A); T2 -> T3 labelled R/W(B); a third edge labelled R/W(S)
15.50
Example Schedule (Schedule A)
T1 T2 T3 T4 T5
read(X)
read(Y)
read(Z)
read(V)
read(W)
read(Y)
write(Y)
write(Z)
read(U)
read(Y)
write(Y)
read(Z)
write(Z)
read(U)
write(U)
15.51
Precedence Graph for Schedule A
[Figure: precedence graph with nodes T1, T2, T3, T4 and edges labelled R/W(Y), R/W(Z) and W/W(Z)]
15.52
Test for Conflict Serializability
A schedule is conflict serializable if and only if its precedence
graph is acyclic.
Cycle-detection algorithms exist which take order n^2 time, where
n is the number of vertices in the graph. (Better algorithms take
order n + e where e is the number of edges.)
If precedence graph is acyclic, the serializability order can be
obtained by a topological sorting of the graph.
For example, a serializability order for Schedule A would be
T5 -> T1 -> T3 -> T2 -> T4.
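The test can be sketched with Python's standard graphlib module (Python 3.9+), whose TopologicalSorter both sorts the graph and raises CycleError when it is cyclic. The schedule encoding and function names below are assumptions for illustration. The pairwise scan that builds the graph is quadratic in the number of operations and is kept only for clarity.

```python
from graphlib import TopologicalSorter, CycleError

def precedence_graph(schedule):
    """schedule: list of (txn, kind, item) in execution order, kind 'R' or 'W'.
    Returns a dict mapping each transaction to the set of transactions
    that must precede it."""
    preds = {txn: set() for txn, _, _ in schedule}
    for i, (ti, ki, xi) in enumerate(schedule):
        for tj, kj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ki, kj):
                preds[tj].add(ti)        # conflicting pair gives edge ti -> tj
    return preds

def serial_order(schedule):
    """An equivalent serial order, or None if not conflict serializable."""
    try:
        return list(TopologicalSorter(precedence_graph(schedule)).static_order())
    except CycleError:
        return None

acyclic = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"),
           ("T3", "W", "B"), ("T3", "R", "S")]
cyclic = acyclic + [("T1", "W", "S")]    # adds edge T3 -> T1, closing a cycle

order = serial_order(acyclic)
no_order = serial_order(cyclic)
```

The first schedule matches the three-transaction precedence graph example shown earlier; appending T1's Write(S) gives the schedule from the earlier exercise whose answer is NO.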
15.53
View Serializability
“View Equivalence”:
S and S´ are view equivalent if the following three conditions
are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in
schedule S, then transaction Ti must, in schedule S´, also read the
initial value of Q.
2. For each data item Q, if transaction Ti reads the value of Q written
by Tj in S, then Ti must also read the value of Q written by Tj in S´.
3. For each data item Q, the transaction (if any) that performs the final
write(Q) operation in schedule S must perform the final write(Q)
operation in schedule S´.
As can be seen, view equivalence is also based purely on reads
and writes.
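The three conditions can be checked by comparing, for two schedules, who each read gets its value from and who writes each item last. The sketch below is a brute-force illustration with assumed names; the sample schedules are hypothetical, and trying every serial order is only practical for a handful of transactions.

```python
from collections import Counter
from itertools import permutations

def view_profile(schedule):
    """schedule: list of (txn, kind, item), kind 'R' or 'W'.
    Returns (reads, final_writers) capturing conditions 1-3."""
    last_writer = {}                     # item -> txn of the latest write so far
    nth_read = Counter()
    reads = set()
    for txn, kind, item in schedule:
        if kind == "R":
            nth_read[txn, item] += 1
            # A source of None means the read sees the initial value.
            reads.add((txn, item, nth_read[txn, item], last_writer.get(item)))
        else:
            last_writer[item] = txn
    return reads, last_writer            # last_writer now holds the final writers

def view_serializable(schedule):
    """Return a view-equivalent serial order of transactions, or None."""
    txns = sorted({txn for txn, _, _ in schedule})
    for order in permutations(txns):
        serial = [op for t in order for op in schedule if op[0] == t]
        if view_profile(serial) == view_profile(schedule):
            return order
    return None

# Hypothetical schedule with blind writes (writes without a preceding read).
blind_writes = [("T1", "R", "A"), ("T2", "W", "A"),
                ("T1", "W", "A"), ("T3", "W", "A")]
# Hypothetical lost-update pattern: both transactions read before either writes.
lost_update = [("T1", "R", "X"), ("T2", "R", "X"),
               ("T1", "W", "X"), ("T2", "W", "X")]

blind_order = view_serializable(blind_writes)
lost_order = view_serializable(lost_update)
```

The blind-write schedule is view equivalent to the serial order T1, T2, T3 even though its precedence graph has a cycle between T1 and T2, so it is view serializable without being conflict serializable.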
15.54
View Serializability (Cont.)
A schedule S is view serializable if it is view equivalent to a serial schedule.
Example:
T1 T2 T3
Read(A)
Write(A)
Write(A)
Write(A)
Is this schedule view serializable? Conflict serializable?
15.55
View Serializability
(1) We just showed: conflict serializable
view serializable
view serializable
(2) We can also show:
serializable
15.56
Other Notions of Serializability
The schedule below is equivalent to the serial schedule <T1, T2>, yet is not conflict equivalent or view equivalent to it.
T1                  T2
Read(A)
A <- A - 50
Write(A)
                    Read(B)
                    B <- B - 10
                    Write(B)
Read(B)
B <- B + 50
Write(B)
                    Read(A)
                    A <- A + 10
                    Write(A)
Determining such equivalence requires analysis of operations other than read and write.
15.57
Transaction Definition in SQL
Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
In SQL, a transaction begins implicitly.
A transaction in SQL ends by:
Commit work commits current transaction and begins a new one.
Rollback work causes current transaction to abort.
Levels of consistency specified by SQL-92:
Serializable — default (more conservative than conflict
serializable)
Repeatable read
Read committed
Read uncommitted
15.58
Transactions in SQL
Serializable — default
15.59