semantics of file sharing
file locking
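The file server keeps a callback promise for each client, so it knows exactly which clients hold a copy of the file. When some client is writing the file, the server lets the other clients know,
and it is then up to each of them to decide whether to keep the old copy or get the latest one.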
Andrew File System (AFS) keeps the idea of caching on the client side, but uses a different approach. The majority of users’ files are small, and reads are mostly sequential, over chunks of files.
Temporal locality means that if a block of a file is read now, the user is likely to read it again in the near future.
Caching algorithms weigh two things: frequency and recency. How good an algorithm is depends on how smartly it combines them.
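To make the frequency/recency trade-off concrete, here is a minimal sketch of two textbook eviction policies: LRU (recency only) and LFU (frequency only). The class names, the `access` method and the block identifiers are hypothetical and chosen for illustration; AFS itself is not claimed to use either policy.

```python
from collections import Counter, OrderedDict


class LRUCache:
    """Evicts the least recently used block (looks at recency only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used block comes first

    def access(self, block_id):
        """Touch a block; return True on a cache hit, False on a miss."""
        hit = block_id in self.blocks
        if hit:
            self.blocks.move_to_end(block_id)  # now the most recent block
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # drop the least recent block
            self.blocks[block_id] = True
        return hit


class LFUCache:
    """Evicts the least frequently used block (looks at frequency only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()  # access count of every cached block

    def access(self, block_id):
        """Touch a block; return True on a cache hit, False on a miss."""
        hit = block_id in self.counts
        if not hit and len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]  # drop the least frequently used block
        self.counts[block_id] += 1
        return hit
```

On a trace such as a, a, b, c, a with room for two blocks, LRU evicts "a" when "c" arrives because "a" is the oldest access, while LFU keeps "a" because it has been read twice. Neither signal alone is always right, which is why practical policies try to combine both.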
Sharing Files in Coda
client / server notation:
- t : freshness interval
- T_c : time the cache entry was last validated by the client, i.e. the time at which the client learned from the server that its copy is valid
- T : current time
- T_m client (or T_m server) : time the file was last modified at the server, as recorded by the client (or by the server)
The cached copy is considered valid if (T - T_c) < t or T_m client == T_m server. In words: if (current time - last validation time) < freshness interval, or if the last modified times are the same on both server and client, it’s OK to continue using the file.
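The validity condition can be written as a small function. This is a minimal sketch: the function name and the `fetch_T_m_server` callable are hypothetical, and it assumes the client only contacts the server once the freshness interval has expired (the first half of the condition is checked locally).

```python
def cache_entry_is_valid(T, T_c, t, T_m_client, fetch_T_m_server):
    """Return True if the cached copy may still be used.

    T                -- current time
    T_c              -- time the cache entry was last validated
    t                -- freshness interval
    T_m_client       -- last-modified time as recorded by the client
    fetch_T_m_server -- hypothetical callable standing in for a round trip
                        to the server; returns the server's last-modified time
    """
    if (T - T_c) < t:
        return True  # still fresh, no need to ask the server
    # Freshness expired: compare modification times with the server.
    return T_m_client == fetch_T_m_server()
```

For example, with t = 10, an entry validated at T_c = 100 is used without any server contact at T = 105; at T = 120 it is kept only if the server still reports the same modification time.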
When client B closes the file, its copy is sent to the server. Client A then gets a callback from the server (“copy invalidation”) and decides whether to get the new file or keep the old one. If A only reads, it is fine with the old copy. If client A wants the new copy, it closes the file and opens it again; the server sends the new copy when the “open” action happens. If client A opened the file read-only, there is no need to send its copy back to the server, and the server does not need to send B any message about the file while B is reading it, because B already has the latest copy.
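The open/close behaviour described above can be sketched as a toy simulation. All class and method names here are hypothetical, the "network" is a direct method call, and only a single file is modelled; it illustrates the callback idea, not Coda's actual implementation.

```python
class Server:
    """Holds the master copy and remembers which clients have a copy."""

    def __init__(self, data):
        self.data = data
        self.callbacks = set()  # clients that were promised a callback

    def open(self, client):
        self.callbacks.add(client)
        return self.data  # "open" always hands out the latest copy

    def close(self, client, new_data=None):
        if new_data is None:
            return  # read-only session: nothing is sent back
        self.data = new_data
        for other in self.callbacks - {client}:
            other.invalidate()  # callback: "copy invalidation"
        self.callbacks = {client}  # only the writer holds the latest copy


class Client:
    def __init__(self, server):
        self.server = server
        self.copy = None
        self.stale = False   # set when the server sends an invalidation
        self.dirty = False   # set when the local copy has been written

    def open(self):
        self.copy = self.server.open(self)
        self.stale = False
        self.dirty = False

    def write(self, data):
        self.copy = data
        self.dirty = True

    def close(self):
        # Only a modified copy is sent back to the server.
        self.server.close(self, self.copy if self.dirty else None)
        self.dirty = False

    def invalidate(self):
        self.stale = True  # the client may still keep reading its old copy
```

If A and B both open the file and B writes and closes, A is marked stale but can keep reading its old copy. When A closes (read-only, so nothing is sent) and opens again, it receives B's version.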
cluster-based distributed file systems
Blocks of files are stored on distributed storage servers. Two types:
- files are split into blocks and all blocks of the same file go to the same server; this gives high fault tolerance, because when one server is down the files on the other servers can still be accessed.
- the blocks of the same file are spread over different servers, which suits large-scale systems; blocks are kept in pairs, so that when one server is down any block can still be accessed.
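The two layouts can be compared with a short sketch. The function names and the round-robin placement are hypothetical, and the sketch assumes that "in pairs" means each block is kept on two different servers; the notes do not spell this out.

```python
def place_whole_files(file_names, servers):
    """Type 1: all blocks of one file live on a single server (round robin)."""
    return {name: servers[i % len(servers)] for i, name in enumerate(file_names)}


def stripe_blocks(num_blocks, servers, copies=2):
    """Type 2: block i of one file goes to `copies` consecutive servers."""
    n = len(servers)
    return {i: [servers[(i + k) % n] for k in range(copies)]
            for i in range(num_blocks)}


def available_files(placement, down):
    """Files that can still be read when server `down` has failed (type 1)."""
    return [name for name, server in placement.items() if server != down]


def available_blocks(layout, down):
    """Blocks that can still be read when server `down` has failed (type 2)."""
    return [block for block, holders in layout.items()
            if any(server != down for server in holders)]
```

With three servers, losing one server in the first layout makes the files placed on it unavailable while all other files stay readable. In the second layout every block still has a surviving holder, so the whole file remains readable.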
master: (file name space, mapping from file names to chunk servers)
- holds only metadata, i.e. the name space
- the mapping is kept in memory and periodically made persistent; if the master fails, the last valid mapping can still be sent
- each GFS file is divided into 64 MB chunks
- chunks are distributed across chunk servers
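Because chunks have a fixed size, finding the chunk that holds a given byte is plain integer arithmetic. A minimal sketch, with hypothetical function names and assuming 64 MB means 64 * 2^20 bytes:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk, as stated in the notes


def chunk_index(byte_offset):
    """Index of the chunk that contains the given byte offset of a file."""
    return byte_offset // CHUNK_SIZE


def chunk_count(file_size):
    """Number of chunks needed for a file of file_size bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division on integers
```

A 200 MB file therefore occupies four chunks, and the byte at offset 200 MB - 1 lies in chunk 3. A client would turn (file name, byte offset) into (file name, chunk index) in this way before asking the master where that chunk lives.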
chunk server:
- chunks are replicated across chunk servers (1 primary chunk server)
- keeps an identifier for each chunk so that the chunk can be located
The identifier tells the chunk server which chunk of the file should be returned. The master server doesn’t store real data, only metadata; it is also called the name server, and it knows which chunk server stores each chunk of a file. Chunks are replicated: at least one other chunk server holds a replica. Only the primary chunk server answers requests from clients; if the primary chunk server is down, the request goes to a replica chunk server. E.g. when a client “write” action happens, the master server sends back the index, the client finds the primary chunk server according to the index, and when the write is done the primary chunk server updates all replicas on the other chunk servers.
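The flow just described can be sketched as follows. This follows the simplified description in these notes rather than the full GFS protocol; all class and function names are hypothetical, servers are plain in-memory objects, and writes assume that every server holding the chunk is up.

```python
class ChunkServer:
    """Stores the real data, keyed by chunk identifier."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.chunks = {}  # chunk id -> bytes


class Master:
    """Stores metadata only: (file name, chunk index) -> (chunk id, servers)."""

    def __init__(self):
        self.table = {}

    def register(self, file_name, index, chunk_id, primary, replicas):
        # The primary is listed first, followed by the replica servers.
        self.table[(file_name, index)] = (chunk_id, [primary] + replicas)

    def lookup(self, file_name, index):
        return self.table[(file_name, index)]


def write_chunk(master, file_name, index, data):
    """Client write: ask the master, write to the primary, then replicate."""
    chunk_id, servers = master.lookup(file_name, index)
    primary, replicas = servers[0], servers[1:]
    primary.chunks[chunk_id] = data
    for replica in replicas:  # the primary updates all replicas
        replica.chunks[chunk_id] = data


def read_chunk(master, file_name, index):
    """Client read: the primary answers; a replica takes over if it is down."""
    chunk_id, servers = master.lookup(file_name, index)
    for server in servers:  # primary first, then replicas
        if server.up:
            return server.chunks[chunk_id]
    raise RuntimeError("no live chunk server holds this chunk")
```

Note that the master never touches `data`: it only maps a file name and chunk index to a chunk identifier and the servers that hold it, which is why it can keep the whole mapping in memory.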
The cluster takes care of mapping files to chunk servers.
Reference material:
Book: Distributed Systems, Third edition, Version 3.02(2018), Maarten van Steen and Andrew S. Tanenbaum.
Lectures: University of Waterloo, CS 454/654 (Distributed Systems), 2020 winter term, Professor Khuzaima Daudjee.