|
Distributed Computing
This website demonstrates using wikis as teaching and learning tool.
The course instructor is also happy to share the teaching materials here with those who find it readable.
|
Time in Distributed Systems
A Distributed Computing Lecture by Steven Choy
Acknowledgement: Some of the materials in this lecture are based on Chapter 11: Time and Global State in 'Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edition 4 (the website for the book is http://www.cdk4.net/). Please refer to this chapter of the textbook.
Introduction
Time in Distributed Systems
- Time is important. Why?
- However, it is problematic in distributed systems. Why?
- We want to know when something happened.
- Algorithms may depend upon clock synchronization.
- No global notion of time.
Clocks, Events, and Process States
- We define an event to be the occurrence of a single action that a process carries out as it executes.
- An event is a communication action or a state-transforming action.
- Clocks — every computer has one, and it can be used to timestamp any event.
System Model
- N processes in a distributed system:
- Each executes on a single processor.
- No shared memory.
- The only communication means is exchange of messages.
- pi has state si.
- An event is the occurrence of a single action
- An internal state-transforming action by pi, or
- a communication action
- e →i e' iff e occurs before e’ in pi
Clock Skew
- Each computer has its own physical clock.
- The software clock Ci can be used to timestamp any event at pi.
- Problems: Clock skew and clock drift.
Coordinated Universal Time (UTC)
- International standard for timekeeping
- Based on atomic time
- Synchronizes with astronomical time using leap seconds
- UTC signals are synchronized and broadcast regularly from land-based radio stations and satellites.
- Computers with receivers attached can synchronize their clocks with these timing signals.
Synchronizing physical clocks
Basics
- External synchronization:
- setting the time to some external source of time
- For each i, synchronize Ci with an authoritative, external source of time.
- Internal synchronization:
- setting the time based on "local agreement" (local time)
- For each i,j, synchronize Ci and Cj with each other
- Three typical methods of synchronization
- Cristians algorithm.
- The Berkeley algorithm.
- Network time protocol (NTP).
Cristian’s Algorithm
- Basic idea - Getting the current time from a "time server", using periodic client requests
- Major problem – what happens if the time from the time server is less than the client
- resulting in time running backwards on the client! (Which cannot happen – time does not go backwards)
- Minor problem - results from the delay introduced by the network request/response: latency
- Procedures:
Process p requests time in m r and receives t in m t from S
p records total round-trip time: T round
p sets its clock to: t + T round / 2
If minimal transmission time is known, accuracy can be calculated
Class Exercise
- A client attempts to synchronize with a time server. It records the round-trip times and timestamps returned by the server in the table below.
Round-trip time: 22ms; Time: 10:54:23.674
Round-trip time: 25ms; Time: 10:54:25.450
Round-trip time: 20ms; Time: 10:54:28.342
- Question 1: Which of these times should it use to set its clock? To what time should it set it?
- Question 2: Estimate the accuracy of the setting with respect to the server’ clock.
- Question 3: If it is known that the time between sending and receiving a message in the system concerned is at least 8 ms, do your answers change?
- Answers : The client should choose the minimum round-trip time of 20 ms = 0.02 s. It then estimates the current time to be 10:54:28.342 + 0.02/2 = 10:54:28.352. The accuracy is ± 10 ms. If the minimum message transfer time is known to be 8 ms, then the setting remains the same but the accuracy improves to ± 2 ms.
The Berkeley algorithm
- Internal synchronization
- A coordinator computer chosen to act as master
- Master periodically polls the other computers (the slaves)
- Slaves send back their clock values to master
- Master calculates an average (taking the roundtrip times into account)
- Master sends the amount by which each individual slave’s clock requires adjustment
- The algorithm eliminates readings from faulty clocks
The Network Time Protocol (NTP)
- The most used clock synchronization solution on the internet is the Internet Network Time Protocol (NTP) which is a layered client-server architecture based on UDP message passing.
- The Network Time Protocol (NTP) is a protocol for synchronizing the clocks of computer systems over packet-switched, variable-latency data networks. NTP uses UDP port 123 as its transport layer. It is designed particularly to resist the effects of variable latency
Experience NTP
pool.ntp.org: the internet cluster of ntp servers - The pool.ntp.org project is a big virtual cluster of timeservers striving to provide reliable easy to use NTP service for millions of clients without putting a strain on the big popular timeservers.
pool.ntp.org: How do I setup NTP to use the pool? - To synchronise your computers clock to the network, you just need to add few statements in the configuration file (for the ntpd program from the ntp.org distribution, on any supported operating system - Linux, *BSD, Windows and even some more exotic systems).
If you're using a recent Windows version, you can also use the ntp client that is built into the system. Just execute:
net time /setsntp:"0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org"
net time /set
NTP Design Goals
- To provide a service enabling clients across the Internet to be synchronized accurately to UTC
- To provide a reliable service that can survive lengthy losses of connectivity
- To enable clients to resynchronize sufficiently frequently to offset the rates of drift found in most computers
- To provide protection against interference with the time service, whether malicious or accidental
How NTP Works
- Provides a network of servers located across the Internet
- Primary servers attached to a UTC time source
- Secondary servers connected to a primary server for synchronization
- The servers are connected in a logical hierarchy called a "synchronization subnet", whose levels are called "strata"
- The synchronization subnet can reconfigure when servers become unreachable or failures occur
- Messages are delivered using UDP
- Some illustrations: From wikipedia, from usno.navy.mil, from process.com
Logical Clock and Vector Clock
Logical Clocks
- Synchronization is based on relative time.
- Note that (with this mechanism) there is no requirement for relative time to have any relation to the real time.
- What’s important is that the processes in the Distributed System agree on the ordering in which certain events occur.
- Such clocks are referred to as Logical Clocks.
Idea on Lamport Logical Clock
- If two processes do not interact, then their clocks do not need to be synchronized
- They can operate concurrently without fear of interfering with each other
- It does not matter that two processes share a common notion of what the “real” current time is. What does matter is that the processes have some agreement on the order in which certain events occur
- Lamport used these two observations to define the “happens-before” relation (also often referred to within the context of Lamport’s Timestamps)
The Happens-Before Relation
- If A and B are events in the same process, and A occurs before B, then we can state that: A “happens-before” B is true
- Equally, if A is the event of a message being sent by one process, and B is the event of the same message being received by another process, then A “happens-before” B is also true
- Note that a message cannot be received before it is sent, since it takes a finite, nonzero amount of time to arrive … and, of course, time is not allowed to run backwards
- Obviously, if A “happens-before” B and B “happens-before” C, then it follows that A “happens-before” C
- If the “happens-before” relation holds, deductions about the current clock “value” on each distributed system component can then be made
- It therefore follows that if C(A) is the time on A, then C(A) is less than C(B), and so on
- Now, assume three processes are in a distributed system: A, B and C
- All have their own physical clocks (which are running at differing rates due to “clock skew”, etc.)
- A sends a message to B and includes a “timestamp”
- If this sending timestamp is less than the time of arrival at B, things are OK, as the “happensbefore” relation still holds (i.e. A “happensbefore” B is true)
- However, if the timestamp is more than the time of arrival at B, things are NOT OK (as A “happensbefore” B is not true, and this cannot be as the receipt of a message has to occur after it was sent)
- The question to ask is: How can some event that “happens-before” some other event possibly have occurred at a later time?
- The answer is: it can’t!
- So, Lamport’s solution is to have the receiving process adjust its clock forward to one more than the sending timestamp value. This allows the “happens-before” relation to hold, and also keeps all the clocks running in a synchronized state. The clocks are all kept in sync relative to each other
Lamport logical clocks and timestamps
- A Lamport logical clock is a monotonically increasing software counter.
- Each process pi keeps its own logical clock Li which is used to apply Lamport timestamps to events.
- To capture the happened-before relation → , processes update their logical clocks and transmit the values of their logical clocks in messages as follows:
- Before each event at pi: Li := Li + 1
- When pi sends a message m, it piggybacks t=Li
- When pj receives (m,t): Lj := max(Lj,t) + 1
- e → e’ => L(e) < L(e’)
- Example
- More Examples from the Web
Vector Timestamps
- Observation: Lamport timestamps do not guarantee that if C(a) < C(b) that a indeed happened before b.
- We need vector timestamps for that.
- Each process Pi has an array Vi[1...n] , where Vi[j] denotes the number of events that process Pi knows have taken place at process Pj.
- When Pi sends a message m, it adds 1 to Vi[i], and sends Vi along with m as vector timestamp vt(m). Result: upon arrival, each other process knows Pi’s timestamp.
V i[i]: The number of events that p i has timestamped;
Vi[j]: The number of events that have occurred at pj that pi has potentially been affected by
- The simple rules for updating the vector clocks:
- Initially, Vi[j] = 0 for i, j = 1, 2..., N
- Just before pi timestamps an event, it sets Vi[i] := Vi[i] + 1
- pi includes the value t = Vi in every message it sends
- When pi receives a timestamp t in a message, it sets Vi[j] := max(Vi[j], t[j]), for j = 1, 2...,N. Taking the component-wise maximum of two vector timestamps in this way is known as a merge operation.
Ordering with Vector Timestamps
- Lamport clock: e1 → e2 => C(e1) < C(e2)
- Vector Clock: e1 → e2 => V(e1) < V(e2) and V(e1) < V(e2) => e1 → e2
Some Examples
Thanks for Reading
If you would rather like to have this lecture note in printed format, please click the print action link in the top right corner.
If you find any problem in this lecture note, please feel free to reach Steven by steven@findaway.hk
|