Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 59

Transaction Processing

General Overview
Where we ‘ve been...
DBA skills for relational DB’s:
Logical Schema Design
 E/R diagrams
 Decomposition and Normalization
Query languages
 RA, RC, SQL
Integrity Constraints
Transactions

Where we are going


Database Implementation Issues

15.2
What Does a DBMS Manage?
1. Data organization
 E/R Model
 Relational Model
 2. Data Retrieval
 Relational Algebra
 Relational Calculus
 SQL
 3. Data Integrity
 Integrity Constraints
 Transactions

15.3
What is a transaction
 A transaction is the basic or atomic and logical unit of execution or DB
processing in an information system which accesses & possibly updates
various data items in the DB.
 A transaction is a sequence of operations that must be executed as a whole,
taking a consistent (& correct) database state into another consistent (&
correct) database state;
 A collection of actions that make consistent transformations of system states
while preserving system consistency
 An indivisible unit of processing
database in a database in a
consistent state consistent state
Transfer £500
Account A Fred Bloggs £1000 Account A Fred Bloggs £500
Account B Sue Smith £0 Account B Sue Smith £500

begin Transaction execution of Transaction end Transaction


database may be
temporarily in an
inconsistent state
during execution
15.4
Updates in SQL
An example:
What takes place:
UPDATE account
SET balance = balance -50 memory
WHERE acct_no = A102 …
Disk
Dntn: A102: 300 (1) Read
Dntn: A15: 500
(2) update account

Transaction:
(3) write
Mian: A142: 300
1. Read(A)
2. A <- A -50
3. Write(A)

15.5
The Threat to Data Integrity
Consistent DB

Name Acct bal


-------- ------ ------
Joe A-33 300
Joe A-509 100 Inconsistent DB

Joe’s total: 400 Name Acct bal


-------- ------ ------
Joe A-33 250
transaction
Joe A-509 100

Joe’s total: 350


Consistent DB
What a Transaction should
Name Acct bal look like to Joe
-------- ------ ------
Joe A-33 250
Joe A-509 150 What actually happens
during execution
Joe’s total: 400
15.6
Transactions
What?:
 Updates to db that can be executed concurrently
Why?:
(1) Updates can require multiple reads, writes on a db
e.g., transfer $50 from A-33 to A509
= read(A)
A  A -50
write(A)
read(B)
BB+50
write(B)
(2) For performance reasons, db’s permit updates to be executed
concurrently
Concern: concurrent access/updates of data can compromise data integrity

15.7
ACID Properties
Properties that a Transaction needs to have:
 Atomicity: either all operations in a Transaction take effect, or
none
i.e. a transaction is an atomic unit of processing and it is either
performed entirely or not at all

 Consistency: operations, taken together preserve db


consistency
i.e. a transaction's correct execution must take the database
from one correct state to another

15.8
ACID Properties contd…
 Isolation / Independence: intermediate, inconsistent states
must be concealed from other Transactions
i.e. the updates of a transaction must not be made visible to
other transactions until it is committed (solves the temporary
update problem)

 Durability / Permanency. If a Transaction successfully


completes (“commits”), changes made to db must persist, even if
system crashes
i.e. if a transaction changes the database and is committed, the
changes must never be lost because of subsequent failure
Serialisability: transactions are considered serialisable if the effect
of running them in an interleaved fashion is equivalent to running
them serially in some order

15.9
Demonstrating ACID
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

Consistency: total value A+B, unchanged by Xaction

Atomicity: if Transaction fails after 3 and before 6, 3 should not affect db

Durability: once user notified of Transaction commit, updates to A,B should


not be undone by system failure
Isolation: other Xactions should not be able to see A, B between steps 3-6

15.10
Threats to ACID
1. Programmer Error
e.g.: $50 substracted from A, $30 added to B
 threatens consistency

2. System Failures
e.g.: crash after write(A) and before write(B)
 threatens atomicity
e.g.: crash after write(B)
 threatens durability

3. Concurrency
E.g.: concurrent Transaction reads A, B between steps 3-6

 threatens isolation

15.11
Isolation
Simplest way to guarantee: forbid concurrent Xactions!
But, concurrency is desirable:

(1) Achieves better throughput (TPS: transactions per second)


one Transaction can use CPU while another is waiting for disk to
service request

(2) Achieves better average response time


short Xactions don’t need to get stuck behind long ones

Prohibiting concurrency is not an option

15.12
Isolation
 Approach to ensuring Isolation:
 Distinguish between “good” and “bad” concurrency
 Prevent all “bad” (and sometime some “good”) concurrency from
happening OR
 Recognize “bad” concurrency when it happens and undo its effects
(abort some transactions)
 Pessimistic vs Optimistic CC

 Both pessimistic and optimistic approaches require


distinguishing between good and bad concurrency

How: concurrency characterized in terms of possible Transaction“schedules”

15.13
Requirements for Database Consistency
 Concurrency Control
 Most DBMS are multi-user systems.
 The concurrent execution of many different transactions submitted by
various users must be organised such that each transaction does not
interfere with another transaction with one another in a way that produces
incorrect results.
 The concurrent execution of transactions must be such that each
transaction appears to execute in isolation.

 Recovery
 System failures, either hardware or software, must not result in an
inconsistent database

15.14
Transaction as a Recovery Unit
 If an error or hardware/software crash occurs between the begin and end,
the database will be inconsistent
 Computer Failure (system crash)
 A transaction or system error
 Local errors or exception conditions detected by the transaction
 Concurrency control enforcement
 Disk failure
 Physical problems and catastrophes
 The database is restored to some state from the past so that a correct state
—close to the time of failure—can be reconstructed from the past state.
 A DBMS ensures that if a transaction executes some updates and then a
failure occurs before the transaction reaches normal termination, then
those updates are undone.
 The statements COMMIT and ROLLBACK (or their equivalent) ensure
Transaction Atomicity

15.15
Recovery
 Mirroring
 keep two copies of the database and maintain them simultaneously

 Backup
 periodically dump the complete state of the database to some form of tertiary
storage

 System Logging
 the log keeps track of all transaction operations affecting the values of
database items. The log is kept on disk so that it is not affected by failures
except for disk and catastrophic failures.

15.16
Recovery from Transaction Failures
Catastrophic failure
 Restore a previous copy of the database from archival backup
 Apply transaction log to copy to reconstruct more current state
by redoing committed transaction operations up to failure point
 Incremental dump + log each transaction

Non-catastrophic failure
 Reverse the changes that caused the inconsistency by undoing
the operations and possibly redoing legitimate changes which
were lost
 The entries kept in the system log are consulted during recovery.
 No need to use the complete archival copy of the database.

15.17
Transaction States
 For recovery purposes the system needs to keep track of when a
transaction starts, terminates and commits.
 Begin_Transaction: marks the beginning of a transaction execution;
 End_Transaction: specifies that the read and write operations have ended and
marks the end limit of transaction execution (but may be aborted because of
concurrency control);
 Commit_Transaction: signals a successful end of the transaction. Any updates
executed by the transaction can be safely committed to the database and will not
be undone;
 Rollback (or Abort): signals that the transaction has ended unsuccessfully. Any
changes that the transaction may have applied to the database must be undone;
 Undo: similar to ROLLBACK but it applies to a single operation rather than to a
whole transaction;
 Redo: specifies that certain transaction operations must be redone to ensure
that all the operations of a committed transaction have been applied successfully
to the database;

15.18
Entries in the System Log
For every transaction a unique transaction-id is generated by Credit_labmark (sno
the system. NUMBER, cno CHAR, credit
NUMBER)
 [start_transaction, transaction-id]: the start of old_mark NUMBER;
execution of the transaction identified by transaction-id new_mark NUMBER;

 [read_item, transaction-id, X]: the transaction identified SELECT labmark INTO


old_mark FROM enrol
by transaction-id reads the value of database item X. WHERE studno = sno and
Optional in some protocols. courseno = cno FOR UPDATE
OF labmark;
 [write_item, transaction-id, X, old_value, new_value]:
the transaction identified by transaction-id changes the new_ mark := old_ mark +
value of database item X from old_value to new_value credit;

UPDATE enrol SET labmark


 [commit, transaction-id]: the transaction identified by = new_mark WHERE studno =
transaction-id has completed all accesses to the database sno and courseno = cno ;
successfully and its effect can be recorded permanently
COMMIT;
(committed)
EXCEPTION
 [abort, transaction-id]: the transaction identified by WHEN OTHERS THEN
transaction-id has been aborted ROLLBACK;

15.19 END credit_labmark;


Transaction execution
A transaction reaches its commit point when all
operations accessing the database are completed
and the result has been recorded in the log. It then
writes a [commit, transaction-id].
BEGIN END
TRANSACTION TRANSACTION
active partially
committed COMMIT
committed

ROLLBACK ROLLBACK
READ, WRITE

terminated
failed

If a system failure occurs, searching the log and rollback the transactions that
have written into the log a
[start_transaction, transaction-id]
[write_item, transaction-id, X, old_value, new_value]
but have not recorded into the log a [commit, transaction-id]

15.20
Read and Write Operations of a
Transaction
 Specify read or write operations on the database items that are executed as
part of a transaction
 read_item(X):
 reads a database item named X into a program variable also named X.
1. find the address of the disk block that contains item X
2. copy that disk block into a buffer in the main memory
3. copy item X from the buffer to the program variable named
 write_item(X):
 writes the value of program variable X into the database item named X.
1. find the address of the disk block that contains item X
2. copy that disk block into a buffer in the main memory
3. copy item X from the program variable named X into its current location in the
buffer store the updated block in the buffer back to disk (this step updates the
database on disk)

X:= X

15.21
Checkpoints in the System Log
 A [checkpoint] record is written periodically into the
log when the system writes out to the database on
disk the effect of all WRITE operations of committed
transactions.
data
 All transactions whose [commit, transaction-id] entries
can be found in the system log will not require their
WRITE operations to be redone in the case of a
system crash.
 Before a transaction reaches commit point, force-
write or flush the log file to disk before commit
transaction.
 Actions Constituting a Checkpoint
 temporary suspension of transaction execution
 forced writing of all updated database blocks in main
memory buffers to disk log
 writing a [checkpoint] record to the log and force writing
the log to disk
 resuming of transaction execution
15.22
Transaction as a Concurrency Unit
 Transactions must be synchronised correctly to guarantee
database consistency

Simultaneous Execution
Account B Sue Smith £0
T1
Account A Fred Bloggs £500

Transfer £500 Account B Sue Smith £500


from A to B
Account A Fred Bloggs £1000

T2 Account A Fred Bloggs £800


Account C Jill Jones £700
Transfer £300 Account C Jill Jones £400
from C to A
Net result
Account A 800
Account B 500
Account C 400

15.23
Transaction scheduling algorithms
 Transaction Serialisability
 The effect on a database of any number of transactions executing
in parallel must be the same as if they were executed one after
another

 Problems due to the Concurrent Execution of Transactions


 The Lost Update Problem
 The Incorrect Summary or Unrepeatable Read Problem
 The Temporary Update (Dirty Read) Problem

15.24
The Lost Update Problem
 Two transactions accessing the same database item have their operations
interleaved in a way that makes the database item incorrect

X=4
T1: (joe) T2: (fred) X Y Y=8
N=2
read_item(X); 4 M=3
X:= X - N; 2
read_item(X); 4
X:= X + M; 7
write_item(X); 2
read_item(Y); 8
write_item(X); 7
Y:= Y + N; 10
write_item(Y); 10
 item X has incorrect value because its update from T1 is “lost” (overwritten)
 T2 reads the value of X before T1 changes it in the database and hence the updated
database value resulting from T1 is lost

15.25
The Incorrect Summary or Unrepeatable Read or
Inconsistent Retrieval Problem
 One transaction is calculating an aggregate summary function on a
number of records while other transactions are updating some of these
records.
 The aggregate function may calculate some values before they are
updated and others after.
T1: T2: T1 T2 Sum
sum:= 0; 0
read_item(A); 4
T2 reads X
sum:= sum + A; 4
after N is
read_item(X);. . 4
subtracted and X:= X - N; . 2
reads Y before write_item(X); 2
N is added, so read_item(X); 2
a wrong sum:= sum + X; 6
summary is read_item(Y); 8
the result sum:= sum + Y; 14
read_item(Y); 8
Y:= Y + N; 10
write_item(Y); 10
15.26
Dirty Read or The Temporary Update Problem
 One transaction updates a database item and then the transaction fails. The
updated item is accessed by another transaction before it is changed back to its
original value

T1: (joe) T2: (fred) Database Log Log


Joe books old new
seat on read_item(X); 4
flight X X:= X - N; 2
write_item(X); 2 4 2
read_item(X); 2
X:= X- N; -1
write_item(X); -1 2 -1
Joe failed write (X) 4 rollback T1
cancels
log

 transaction T1 fails Fred books seat on flight X


and must change the value of X back to its old value
because Joe was on Flight X
 meanwhile T2 has read the “temporary” incorrect value of X

15.27
 Next class we will discuss about

 SCHEDULES
 Serialisable and non-serialisable schedules
 Deadlock & Locking protocols
 Deadlock prevention
 Recovery Techniques

15.28
Schedules
 Schedules – sequences that indicate the chronological order in
which instructions of concurrent transactions are executed
 a schedule for a set of transactions must consist of all instructions of
those transactions
 must preserve the order in which the instructions appear in each
individual transaction

T1 T2
T1 T2 1
1 A one possible schedule: A
2 B B
3 C 2
D 3

C
D

15.29
Example Schedules
Transactions:
T1: transfers $50 from A to B
T2: transfers 10% of A to B
Example 1: a “serial” schedule

T1 T2
read(A)
Constraint: The sum of A+B
A <- A -50
must be the same
write(A)
read(B)
B<-B+50 Before: 100+50
write(B) =150, consistent
After: 45+105
read(A)
tmp <- A*0.1
A <- A – tmp
write(A)
read(B)
B <- B+ tmp
write(B)

15.30
Example Schedule
 Another “serial” schedule:

T1 T2 Before: 100+50
read(A)
tmp <- A*0.1 =150, consistent
A <- A – tmp After: 40+110
write(A)
read(B) Consistent but not the
B <- B+ tmp same as previous schedule..
write(B)
Either is OK!
read(A)
A <- A -50
write(A)
read(B)
B<-B+50
write(B)

15.31
Example Schedule (Cont.)
Another “good” schedule:

T1 T2
read(A)
Effect: Before After
A <- A -50
write(A)
A 100 45
read(A) B 50 105
tmp <- A*0.1
A <- A – tmp Same as one of the serial schedules
write(A) Serializable
read(B)
B<-B+50
write(B)

read(B)
B <- B+ tmp
write(B)

15.32
Example Schedules (Cont.)
A “bad” schedule

T1 T2
read(A)
A <- A -50
Before: 100+50 = 150
read(A)
tmp <- A*0.1
After: 50+60 = 110 !!
A <- A – tmp
write(A)
read(B) Not consistent

write(A)
read(B)
B<-B+50
write(B)

B <- B+ tmp
write(B)

15.33
Serializability
How to distinguish good and bad schedules?
 for previous example, any schedule leaving A+B = 150 is good

Q: could we express good schedules in terms of integrity constraints?

Ans: No. In general, won’t know A+B, can’t check value of A+B at given time
for consistency

Alternative: Serializability

15.34
Serializability
Serializable: A schedule is serializable if its effects on the db are the
equivalent to some serial schedule.
Hard to ensure; more conservative approaches are used in practice

All schedules

SQL serializable

Serializable schedules

“view serializable” schedules

“conflict serializable” schedules


15.35
Conflict Serializability
Conservative approximation of serializability
(conflict serializable => serializable but <= doesn’t hold)

Idea: can we swap the execution order of consecutive operation


wo/ affecting state of db, Xactions so as to leave a serial schedule?

15.36
Conflict Serializability (Cont.)
 If a schedule S can be transformed into a schedule S´ by a
series of swaps of non-conflicting instructions, we say that
S and S´ are conflict equivalent.
 We say that a schedule S is conflict serializable if it is
conflict equivalent to a serial schedule
 Ex:

T1 T2
T1 T2 …. ….
…. ….
can be rewritten
read(A) read(A)
to equivalent
read(A)
read(A) schedule
. .
. .
. .

15.37
Conflict Serializability (Cont.)
Example:

T1 T2
1. Read(A) Swaps:
2. A  A -50
3. Write(A) 4 <->d 5<->d 6<->d
4<->c 5<->c 6<->c
a. Read(A)
4<->b 5<->b 6<->b
b. tmp  A * 0.1
4<->a 5<->a 6<->a
c. A  A - tmp
d.Write(A)
4.Read(B) Conflict serializble
5. B  B + 50
6. Write(B) T1, T2

e. Read(B)
f. B  B + tmp
g. Write(B)

15.38
Conflict Serializability (Cont.)
The effects of swaps

T1 T2
read(A)
A <- A -50 Because example schedule could
write(A) be swapped to this schedule (<T1, T2>)
read(B)
B<-B+50 example schedule is
write(B) conflict serializable

read(A)
tmp <- A*0.1
A <- A – tmp
write(A)
read(B)
B <- B+ tmp
write(B)

15.39
The swaps we made
A. Reads and writes of different data elements

e.g.: T1 T2 T1 T2
write(A) read(B)
=
read(B) write(A)

OK because: value of B unaffected by write of A


( read(B) has same effect )
write of A is not undone by read of B
( write(A) has same effect)
Note : T1 T2 T1 T2
write(A) read(A)
=
read(A) write(A)
Why? In the first, T1 reads value of A written by T2. May be different
value than previous value of A

15.40
Swaps
T1 T2 T1 T2
write(A) read(A)
=
read(A) write(A)
What affect on state of db could above swap make?
Suppose what follows read(A) in T1 is:
read(C)
C C+A
write(C)

Unless T2 writes the same value to A,


the first schedule will leave a different value for C than the second

15.41
The swaps We Made
A. Reads and writes of different data elements
4 <-> d
6 <-> a

B. Reads of different data elements: 4 <-> a

C. Writes of different data elements: 6 <-> d

D. Any operation with a local operation

OK because local operations don’t go to disk. Therefore, unaffected by


other operations:
4 <-> b 5 <-> a ....
4 <-> c

To simplify, local operations are ommited from schedules

15.42
Conflict Serializability (Cont.)
Previous example wo/ local operations:

T1 T2 Swaps:
1. Read(A)
2. Write(A) 3 <->b
a. Read(A) 3<->a
b. Write(A) 4<->b
4<->a
3. Read(B)
4. Write(B)

c. Read(B) T1, T2
d. Write(B)

15.43
Swappable Operations
 Swappable operations:
1. Any operation on different data element
2. Reads of the same data (Read(A))
(regardless of order of reads, the same value for A is read)

 Conflicts: T2: Read(A) T2: Write(A)


T1: Read (A) OK R/W Conflict
T1: Write (A) W/R Conflict W/W Conflict

15.44
Conflicts
(1) READ/WRITE conflicts:
conflict because value read depends on whether write has
occurred

(2) WRITE/WRITE conflicts:


conflict because value left in db depends on which write
occurred last

(3) READ/READ : no conflict

15.45
Conflict Serializability
Q: Is the following shcedule conflict serializable? If so, what’s its equivalent
serial schedule? If not, why?
T1 T2

(1) read(Q)
write(Q) (a)
(2) write(Q)

Ans: No. Swapping (a) with (1) is a R/W conflict, and swapping (a) with (2)
is a W/W conflict.
Not equivalent to <T1, T2> or <T2, T1>

15.46
Conflict Serializability
Q: Is the following schedule conflict serializable? If so, what’s its equivalent
serial schedule? If not, why?

T1 T2 T3 Ans.: NO.
All possible serial schedules are
(1) Read(A) not conflict equivalent.
(a) Write(A)
(b) Read(B) <T1, T2, T3>
<T1, T3, T2>
(x) Write(B) <T2, T1, T3>
(y) Read(S) . . . . ..
(2) Write(S)

15.47
Conflict Serializability
Testing: too expensive to test a schedule by swapping operations
(usually schedules are big!)

Alternative: “Precedence Graphs”


* vertices = Xactions
* edges = conflicts between Xactions

E.g.: Ti  Tj if: (1) Ti, Tj have a conflicting operation, and


(2) Ti executed its operation in conflict first

15.48
Precedence Graph
An example of a “Precedence Graph”:

T1 T2 T3

Read(A) T1 R/W
(A )
Write(A) T2
Read(B) )
(B
W
R/
Write(B)
Read(S) T3

Q: When is a schedule not


conflict serializable?

15.49
Precedence Graph
Another example:

T1 T2 T3

Read(A) T1 R/W
(A )
Write(A) T2
Read(B) )
(B

R/W
W
R/

(S)
Write(B)
Read(S) T3

Write(S) Not conflict serializable!!


Because there is a cycle in the PG,
the cycle creates contradiction

15.50
Example Schedule (Schedule B)
T1 T2 T3 T4 T5

read(X)
read(Y)
read(Z)
read(V)
read(W)

read(Y)
write(Y)
write(Z)
read(U)
read(Y)
write(Y)
read(Z)
write(Z)
read(U)
write(U)

15.51
Precedence Graph for Schedule A

R/W (Y)

T1 T2
R/
W
(y
), R/W(Y)
R/W(Z) R /W
(Z
)

R/W(Z) , W/W(Z)
T3 T4

15.52
Test for Conflict Serializability
 A schedule is conflict serializable if and only if its precedence
graph is acyclic.
 Cycle-detection algorithms exist which take order n2 time, where
n is the number of vertices in the graph. (Better algorithms take
order n + e where e is the number of edges.)
 If precedence graph is acyclic, the serializability order can be
obtained by a topological sorting of the graph.
For example, a serializability order for Schedule A would be
T5  T1  T3  T2  T4 .

15.53
View Serializability
 “View Equivalence”:
S and S´ are view equivalent if the following three conditions
are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in
schedule S, then transaction Ti must, in schedule S´, also read the
initial value of Q.
2. For each data item Q, if transaction Ti reads the value of Q written
by Tj in S, it also does in S’
3. For each data item Q, the transaction (if any) that performs the final
write(Q) operation in schedule S must perform the final write(Q)
operation in schedule S´.
As can be seen, view equivalence is also based purely on reads
and writes alone.

15.54
View Serializability (Cont.)
 A schedule S is view serializable it is view equivalent to a serial schedule.
 Example:
T1 T2 T3

Read(A)
Write(A)
Is this schedule Write(A)
view serializable? Write(A)
conflict serializable?

VS: Yes. Equivalent to <T1, T2, T3>

CS: No. PG has a cycle.

Every view serializable schedule that is not conflict


serializable has blind writes.

15.55
View Serializability
(1) We just showed: conflict serializable

view serializable

view serializable
(2) We can also show:

serializable

15.56
Other Notions of Serializability
Equivalent to the serial schedule < T1, T2 >, yet is not conflict equivalent or view equivalent to it.

T1 T2

Read(A)
A  A -50 Determining such equivalence
Write(A) requires analysis of operations
Read(B)
B  B - 10 other than read and write.
Write(B)
Read(B)
B  B + 50
Write(B)

Read(A)
A  A + 10
Write(A)

15.57
Transaction Definition in SQL
 Data manipulation language must include a construct for
specifying the set of actions that comprise a transaction.
 In SQL, a transaction begins implicitly.
 A transaction in SQL ends by:
 Commit work commits current transaction and begins a new one.
 Rollback work causes current transaction to abort.
 Levels of consistency specified by SQL-92:
 Serializable — default (more conservative than conflict
serializable)
 Repeatable read
 Read committed
 Read uncommitted

15.58
Transactions in SQL
 Serializable — default

- can read only committed records


- if T is reading or writing X, no other Transaction can change
X until T commits
- if T is updating a set of records (identified by WHERE
clause), no other Transaction can change this set until T
commits
 Weaker versions (non-serializable) levels can also be declared

Idea: tradeoff: More concurrency => more overhead to ensure


valid schedule.
Lower degrees of consistency useful for gathering approximate
information about the database, e.g., statistics for query optimizer.

15.59

You might also like