Distributed system - Class 15

Two-Phase Locking (2PL)

continue obtaining locks, until the end, you release all locks.

If you start a write transaction, nobody can get the locker.

But this is not practical, because when you have a read lock, then want a write lock, but you can’t anymore lock if you release all locks. So you need to know the logic behind it and hold the lcok no longer than you need. So you must know the concurrency in a time unit.

Dead lock happens depends on your lock protocals. For example, one transaction holds read lock, and wait writing lock, but another transaction holds write lock, and wait for reading lock to finish. Then dead lock happends.

strict 2PL

holding all locks until the end

centralized 2PL
locker manager manages all locks for all transactions.
dsitributed 2PL

locks are on side of data, which means locks partition to mutilple sites. This way is better for distributed system but more cost for messages.

Deadlock

You holds the resource I want and I also hold the resource you want.

one solution: One people holds the non-shared resource.

Deadlock management

It’s doesn’t matter centralized system or distributed system, the only resolution is to kill one transaction.

prevent deadlock happen: all resource may be needed predelcare, 确保根本不发生 (highest level)
avoid potential deadlock: request lock in a certian order, 避免可能发生的情况 (middle level)
detection and recovery: check graph cycle and kill one transaction, 解决发生 (low level),kill youngest transaction which not done much work

Probing: each side know which lock others are waiting for, record the meassage and transfer to next, until one site sees the cycle knows it’s a deadlock, so it will abort or ask another site to abort.

Timestamp ordering

There are a banch of transaction waiting, I assgin each of them a stamp which is uique and I serialize them in order and then serve them in order.

If no confilct, I will serve them concurrently, but if conflict, I will serve them in timestamp order.

Every data item has a:

read-ts(x): largest timestamp from among all timestamps if txns that have successfully read x, if T2 reads, then read-ts(2); T3 wants to read, then read-ts(2) changes to read-ts(3); T1 wants to read, then read-ts(3) keeps because it’s the largeset.
Write-ts(y): largest ts from among all timestamps of txns that have successfully written x

compare write-ts(y) with read-ts(x), make sure y < x.

Eg,

r-ts(x)	0
w-ts(x)	0

(1) w_1(x)

write confilcts with read and write. 1 > 0, then update w-ts(x):1.

r-ts(x)	0
w-ts(x)	1

(2) r_2(x)

read only conflict with write. 2 > 1, then it can execute. Execute it, then update r-ts(x): 2.

r-ts(x)	2
w-ts(x)	1

(3) r_1(x)

read only conflicts with write. 1 = 1, then it can execute, after execution, 1 < 2, then no need to update r-ts(x).

r-ts(x)	2
w-ts(x)	1

(4) w_1(x)

wirte conflicts with read and write, 1 < 2, so it can’t execute.

r-ts(x)	2
w-ts(x)	1

Generally, if you abort one transaction, you should abort all related transactions. But in timestamp ordering transacton, you only need to abort one transaction adn restart it to give it a higher timestamp.

Reference material:
Book: Distributed Systems, Third edition, Version 3.02(2018), Maarten van Steen and Andrew S. Tanenbaum.
Lectures: University of Waterloo, CS 454/654 (Distributed System), 2020 winter term, Professor Khuzaima Daudjee.