Distributed system - Class 16(Replication)

fundamental issue of replication: improve good performance

Four reason how replication improve good performance:

decrease latency, bring data closer to client
for avaliablity, once server fails, client can still continue doing its job
for avaliablity, if one server dies, you should back up data to another server, “back up” is replication
For consistence (improving trust), on server, client A read value back, you trust. But if another client B change it, do you trust it? If you read two copy have the same value, you can trust it. Replication improves the trust.

When you design a distributed system, (1) you should think about the replication, how to map the logical copy to physical copies. (2) what is your replication requirements? eg, if client writes a file, you update all copies or only one copy. Transparency: you do the operation, don’t need to know how replication applied. (3) Then what algorithm you will use to satisfy requirements.

Some levels of consistency

strict consistency

everthing happens in real time, read file A - machine 2, write file A - machine 1, close ifle A.

result: mechine 1 should see file A changes.

Linearizability

sign timestamp to each operation. Server holds the order of all timestamps. Keeping global timestamp is chanllenge and expensive.

Sequential consistency

Compared to strict conssitency, sequential consistency doesn’t have global timestamps, which order should be applied?

Image you buy seeds online, read means you see 2 seeds avaliable online, write means you purchase online.

For example:

mechine 1 do write, mechine 2 do read on the same file. Which global order should be?

If read before write, machine 2 can only read old data. In verse, read new data.

For example:

Mechine 1 will do r1[x] and w1[x], machine 2 will do r2[x] and w2[x].

w1[x] r1[x] w2[x] r2[x]. Wrong, don’t inverse order for one meachine.
r1[x] w1[x] r2[x] w2[x]. Correct, for each machine, the order is keeped.
r2[x] w2[x] r1[x] w1[x]. Correct, it doesn’t matter who does write first, because when client wants to purchase, others may have purchased. Client doesn’t know the order behinds, he thinks it may happen because my order is late than others.
r1[x] r2[x] w1[x] w2[x]. Correct.
r2[x] r1[x] w2[x] w1[x]. / r2[x] r1[x] w1[x] w2[x]. Correct.

Main idea: Obey the order on each meachine, order of individual process depends on this program.

transactional replication consistency

copy stands for physical machine which stores copy.

eager/synchronous replication: if you do transaction on one copy, all conpies will be updated imediately. (1) expensive commit. One transaction, update all copies, then commit transactions on all copies. (2) when transaction is long-lined. One transaction takes long time during one copy, after while, when apply to another copy, the copy isn’t avalibale at this time, then replication skip it. The replication fail. (3) update everything at once. If one copy is down, you can’t be synchronous. You can wait or abort and start transaction again.
lazy/asynchronous replication: if transaction on one copy, it only updates one master copy. If one copy fails, transaction applies to another copy. If one large transaction, you break it into many small transactions on different copies. Long-lined transaction problem is solved. If one copy is not avalibale, you work on another cocpy. Weak: transaction 1 on copy A, transaction 2 can only apply on copy B, at the end, you want to merge two transactions on A and B. On A, T1 - T2. On B, T2 - T1. The merge order on two copies is different. Solution: one copy must be master copy. Then A and B know apply which transaction first.

Reference material:
Book: Distributed Systems, Third edition, Version 3.02(2018), Maarten van Steen and Andrew S. Tanenbaum.
Lectures: University of Waterloo, CS 454/654 (Distributed System), 2020 winter term, Professor Khuzaima Daudjee.