..
Termination Detection
Introduction
- Why should we care about termination detection?
- Usually in a distributed context many subproblems can only start after other subproblems are completed.
- Therefore it is essential that the termination is detected.
- Is termination detection in a distributed context trivial? If not justify.
- It is not trivial
- No system in a distributed context has complete knowledge of the other systems
- There is no global clock
- These factors make termination detection non-trivial
- Define global termination
- All the processes in the system are terminated locally.
- No messages are in transit.
- Define local termination
- Completed the underlying local computation.
- Won’t start any actions unless a message is received.
- How many computations are running during a termination detection algorithm
-
- The base computation and the computation to determine termination
- What are the type of messages
- Basic
- Control
-
- What should be ensured by a TD algorithm ?
- Should not freeze the underlying computation indefinitely.
- Shouldn’t add any new communication channels.
System Model of a Distributed Computation
- List the characteristics of a distributed computation
- A process can be in only one of two states
- Active/busy
- Idle/passive
- An active process can become idle at any time. This indicates that the process has finished its local computation and processed all its messages
- An idle process can only become active on receiving a message.
- Only active processes can send messages
- A process can receive a message in both states
- Send and receive are atomic operations
- What is the assumption here?
- All processes are initially idle and are put into active state by an external message
- Elaborate the reasoning behind the assumption All processes are initially idle and are put into active state by an external message?
- It is known that only active processes can send messages. In the beginning all the processes are idle. So no messages are being sent. Thus an external message is needed to kick off the computation
- By assuming the above, the rule “idle processes only become active on receiving a message” can be enforced
- A process can be in only one of two states
- Mathematically define the termination condition
- $(\forall i::p_i(t_0) = idle) \land(\forall i \forall j::c_{i,j}(t_0)=0)$
Termination Detection Using Distributed Snapshots
- Why will this algorithm work?
- Mention the one key idea behind this algorithm?
- For every computation there exists a unique process that became idle last
- Informally describe the algorithm in one sentence
- When a process goes from active to idle, it requests everyone to take a snapshot
- When a request is received, the process checks whether the requester became idle later than itself. If that was the case then a snapshot is taken
- Formally define the algorithm
- A process can send a basic message to another process via $B(x)$
- When a process receives a basic messages it does the following
- $x := x + 1$
- $if\ idle\ then\ become\ active$
- When a process goes from active to idle it,
- $x := x + 1$
- $k := i$
- Send out $R(x, k)$ to all other processes
- When a process receives a $R(x’, k’)$ it follows the following
- if $(x’, k’) > (x, k) \land is_idle()$ then take a local snapshot and $(x, k) := (x’, k’)$
- if $(x’, k’) \le (x, k) \land is_idle()$ then do nothing
- if $not\ is_idle()$ then $x := max(x, x’)$ - Here think of $x$ as global clock value and $k$ as the process that this process thinks became idle the last
Spanning-tree-based Termination Detection Algorithm
- What is a token ?
- This is the signal sent by the leaf nodes to it parent.
- What is a repeat signal ?
- If the termination is not detected by the root then it send out a repeat signal to its children to restart the process
- When does a node considered to be in $S$
- If it has one or more tokens
- When does a node considered to be outside of $S$
- Idle processes
- More specifically if a process doesn’t have a token and a node in the path from the root to itself is in $S$
- Give the simple algorithm
- All the leaf nodes are given with a token
- If a leaf node becomes idle it sends its token to its parent
- For all the inodes, when they become idle and have received a token from each of their children, they send a token to their parent
- Program is terminated when the root node is idle and has received a token from all of its children
- What is the problem with the simple algorithm
- The problem occurs when a message is sent to a node that is idle. This means that the nodes should become active again thus making the token it had sent void.
- How to fix this using a simple concept?
- This can be fixed by using a coloring scheme
- A process becomes black if it sends a message
- A black process sends black tokens
- A process who has at least one black token has to also send a black token
- A process becomes white when it sends a black token to its parent
- Give all the steps of this algorithm
- Who is initially in the set $S$ ?
- All the leaf nodes as they are given a token
- When does a process become black ?
- When it sends a message
- When does can a process become white ?
- When it sends a black token to its parent
- What is the condition for termination ?
- Root is idle
- Root is white
- Root received white tokens from its children
- Who is initially in the set $S$ ?
- What is the complexity of this algorithm?
- $\Omega(N)$ - Only one pass. No messages sent
- $\mathcal{O}(N\times{}M)$ - $M$ is the number of messages sent