Distributed system - Class 13(Coordination)

Vector clocks (according to lampor timestamp)

Multiple demensions so that more machines can coordinate together. At least one dimension is higher and at least one dimension is equal, then you can infer the timestamp is before the other one.

if A happends before B, it doesn’t mean A causes B.

Mutual exclusion

Successful algorithm of system must ensure:

Correctness, your system doesn’t do multiple processes at one time.
The order of serve.
Liveness of processes, if a client holding the resource die, the server doesn’t get reply then it knows the client dies, so it can clean the data and give permission to others. If something is good, it must happen eventually if give enough time. Eg, if one person geting the entry permission is good thing, the person will get the permission eventually.

There is perfermance bottleneck, distributed system wants to do concurrent to imporve the perfermance, but it’s not allowed to execute many processes at the same time. How to solve this? Two approches:

Permission based
Token based

Permission based

sysmply use a coordinator: You ask coordinator for permission, coordinator say “ok” then you can access, otherwise put you in the waiting queue and not reply you, when last person finish then tell coordinator he is done, coordinator replies to you “ok”, then you can access.
according to timestamp, decide the order. Both of A and B send requests and also their timestamps, the timestamp lower will has permission first. When one leaves, send “ok” to the other person, then the person enter. How to solve one person’s failure? One person dies, he can’t send “ok” back to the other person. To centralize system, it works, but to distributed system, it doesn’t work.

Token based

when you can enter depends on your position in the ring, the token will pass to you when the last person in the ring is done. Eg, “5” sends token to “6”, let “7” knows “6” is holding the token, start to set time out. If “6” dies, “7” konws, then “5” sends token to “7” directly.

What happends if coordinator fails? – Election algorithm

Having different coordinators to do the same thing is not a good idea, because they may reply differently, then who you should believe.

higher ID can be new coordinator (bully algorithm)

4 sends reuqest to 7 to get permission, but 7 fails, 4 say it seems 7 fail, then it asks who want to be new coordinator between 5, 6, 7. Alive IDs “5” and “6” response “ok”. “5” and “6” start to held an election, “5” sends to “6” and “7”, only “6” response “ok, I can be a new coordinator”. At the same time, “6” sends request to “7”, “7” doesn’t response. At last, “6” wins, it tells everyone “I am the new coordinator”.

election algorithm using a ring

0 - 1 - 2 - 3 - 4 - 5 - 6 - 7 - 0. Start from “3”, “3” sets time out and know a coordinator die, and puts its own ID on message [3], “4” also puts its own ID on message [3,4], … until “6” passes [3,4,5,6] to “7”, “7” doesn’t response, the message goes to “0”, “0” only recieve [3,4,5,6], then it knows “7” fails, … the message goes to “3”, “3” knows “7” fails as well and pick highest ID as new coordinator. “3” starts to pass message to let everyone know “6” is the new coordinator.

Reference material:
Book: Distributed Systems, Third edition, Version 3.02(2018), Maarten van Steen and Andrew S. Tanenbaum.
Lectures: University of Waterloo, CS 454/654 (Distributed System), 2020 winter term, Professor Khuzaima Daudjee.