A Distributed Computing Lecture by Steven Choy
Some Basics of Transaction
- A transaction is the basic logical unit of execution in an information system.
- A transaction is a sequence of operations that must be executed as a whole.
- Example: Transfer $500 from Account A to Account B
(1) Debit Account A $500
(2) Credit Account B $500
The database system must ensure that either (1) and (2) happen or that neither happens. Otherwise inconsistency occurs.
- Desirable Properties of Transactions (ACID)
- Atomicity - Either all or none of the operations in a transaction are performed. That means a transaction either completes successfully and the effects of all of its operations are recorded in the database or else, if it fails, it has no effect at all.
- Consistency - The set of operations taken together (in a transaction) should change the system from one consistent state to another. In fact, it is generally the responsibility of the programmers of servers and clients to ensure that transactions work in consistent states.
- Isolation - An incomplete transaction cannot reveal its result to other transactions before the result is committed. If a transaction is incomplete, we have no idea whether it will commit or abort. Thus, its incomplete result should not affect other transactions, and therefore we need to isolate its result from others. This property can be achieved if transactions are serializable (they have the same effect when transactions are run serially, one by one). In other words, each transaction perceives the system as if no other transactions were running concurrently.
- Durability - Once a transaction is committed, the system must guarantee that the results of its operations will persist in the data items, even if system failures occur. In other words, a transaction is not committed until its result is stored in its database and the modification can be identified by other transactions
- Transaction Serializabilty
The effect on a database of any number of transactions executing in parallel must be the same as if they were executed one after another
Transaction Operation Primitives
- BEGIN_TRANSACTION: This primitive marks the start of a transaction.
- END_TRANSACTION: This primitive normally terminates the transaction and tries to commit it.
- ABORT_TRANSACTION: This primitive kills the transaction and restores the old values.
- READ: This primitive reads data from a file (or other data object).
- WRITE: This primitive writes data to a file (or other data object).
Classification of transactions
- Flat and nested transactions
- Distributed transactions
- Issues: atomic commit protocol, implementation of distributed transactions
The Need for Concurrency Control of Transaction Execution
- Most DBMS are multi-user systems.
- Many transactions are executed concurrently.
- The concurrent execution of transactions may lead, if uncontrolled, to problems such as an inconsistent database.
- There is a need to ensure that concurrent transactions do not interfere with each others operations.
- That is, we need concurrency control techniques (transaction scheduling algorithms) to ensure transaction serializability.
Problems of concurrent execution of transactions
- Example: The initial balances of the three accounts A, B, and C are $400 each. Transaction T transfers $100 from Account A to Account B. Transaction U transfers $300 from Account C to Account B. Transaction V calculates the total amount of Accounts A, B, and C.
Transaction T:
balanceT1 = READ(A);
WRITE(A, balanceT1-100);
balanceT2 = READ(B);
WRITE(B, balanceT2+100);
Transaction U:
balanceU1 = READ(C);
WRITE(C, balanceU1-300);
balanceU2 = READ(B);
WRITE(B, balanceU2+300);
Transaction V:
total = READ(A);
total = total + READ(B);
total = total + READ(C);
- Class Exercise: Suppose the three transactions are done sequentially (serially, one at a time). What are the end results of Acccounts A, B, and C? Does the order of transactions matter?
- Class Discussions: What would happens when the three transactions are allowed to run concurrently without any concurrency control measures?
- Concurrent execution of transactions allows for performance improvement, but it might also create problems if there are no rules or control measures to take care of interleaving schedules.
The Lost Update Problem
T: balanceT1 = READ(A);
T: WRITE(A, balanceT1-100);
| Account A is changed from $400 to $300.
|
U: balanceU1 = READ(C);
U: WRITE(C, balanceU1-300);
| Account C is changed from $400 to $100.
|
T: balanceT2 = READ(B);
| No effect to Account B.
|
U: balanceU2 = READ(B);
U: WRITE(B, balanceU2+300);
| Account B is changed from $400 to $700.
|
T: WRITE(B, balanceT2+100);
| Account B is changed from $700 to $500.
|
- You see that the value of Account B is incorrect, because Transaction T reads the value of B before Transaction U changes it and hence the updated value resulting from U is lost. We call this the lost update problem.
- Another example:
T: balanceT1 = READ(A);
T: WRITE(A, balanceT1-100);
| Account A is changed from $400 to $300.
|
U: balanceU1 = READ(C);
U: WRITE(C, balanceU1-300);
U: balanceU2 = READ(B);
| Account C is changed from $400 to $100.
No effect to Account B.
|
T: balanceT2 = READ(B);
T: WRITE(B, balanceT2+100);
| Account B is changed from $400 to $500.
|
U: WRITE(B, balanceU2+300);
| Account B is changed from $400 to $700.
|
- Here the value of Account B is incorrect, because Transaction U reads the value of B before Transaction T changes it and hence the updated value resulting from T is lost.
- Class Exercise: Using transactions T and U as an example, there are still other possible interleaving schedules that will cause the lost update problem. Can you work out a few of them?
The Incorrect Summary (Inconsistent Analysis) Problem
T: balanceT1 = READ(A);
T: WRITE(A, balanceT1-100);
| Account A is changed from $400 to $300.
|
V: total = READ(A);
V: total = total + READ(B);
V: total = total + READ(C);
|
Value total is now 300.
Value total is now 700.
Value total is now 1100.
|
T: balanceT2 = READ(B);
T: WRITE(B, balanceT2+100);
| Account B is changed from $400 to $500.
|
- We expect the total amount of Accounts A, B, and C is equal to $1200. However, Transaction V gets $1100 as the total amount, which is incorrect. The problem happened because Transaction V reads inconsistent values of Accounts A and B.
- The Incorrect Summary Problem - One transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records. The aggregate function may calculate some values before they are updated and others after.
The Temporary Update (uncommitted dependency) Problem
- One transaction updates a database item and then the transaction -for some reason- fails. The updated item is accessed by another transaction before it is changed back to its original value. (Also known as dirty read problem)
Schedules of Transactions
- Serial, Nonserial and Serializable (Serially Equivalent) Schedules
- A schedule S is serial if, for every transaction T participating in S all of T's operations are executed consecutively in the schedule; otherwise it is called nonserial.
- A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
Serially Equivalent and Conflicting Operations
"For two transactions to be serially equivalent, it is necessary and sufficient that all pairs of conflicting operations of the two transactions be executed in the same order at all of the objects they both access." (p.476 of the textbook)
Two transactions T and U are defined as follows:
T: y = read(k); x = read(i); write(j,22);
U: write(i,33); write(j,44);
Identify any pair of conflicting operations.
Give three interleavings of transactions T and U that are not serially equivalent.
Give three interleavings of transactions T and U that are serially equivalent.
Concurrency Control Algorithms
- Locking
- Timestamp ordering
- Optimistic concurrency control
We will have a more detailed study on them in next lecture.
Class Exercises
A server manages the objects a1, a2,...an. The server provides two operations for its clients:
read (i) returns the value of ai;
write(i, Value) assigns Value to ai,
The transactions T and U are defined as follows:
T: x= read (j); y = read (i); write(j, 44); write(i, 33);
U: x= read(k); write(i, 55); y = read (j); write(k, 66).
- Question 1: What are the two possible serial executions and their effects?
- Question 2: Give three serially equivalent interleavings of the transactions T and U.
Thanks for Reading
If you would rather like to have this lecture note in printed format, please click the print action link in the top right corner.
If you find any problem in this lecture note, please feel free to reach Steven by steven@findaway.hk