Paxos

Paxos is a flexible and fault-tolerant consensus protocol that was defined by Leslie Lamport in his paper The part-time parliament. ACM Trans. on Comp. Syst. 16 (2), 133-169 (1998).

In order to fully describe the algorithm and construct, we must generalize the assumption we made about the preceding topology. Instead of a single P0 process making the change, essentially a process (or node, in Paxos parlance) can take up one of three roles:

  • Proposer: This is the node driving the consensus.
  • Acceptor: These are nodes that independently accept or reject the proposal.
  • Learner: Learners are not directly involved in the consensus building process, they learn of the accepted values from the Acceptor. Generally, Learners and Acceptors are packaged together in a single component.

The basic steps in Paxos are very similar to the two-phase commit. As in the two-phase protocol, in the standard Paxos algorithm, proposers send two types of messages to acceptors: Prepare and Accept. However, for the Prepare phase, in addition to the value being proposed, they also send a proposal number (n). These proposal numbers must be positive, monotonically increasing numbers and unique across all the processes. One way to achieve this is to construct the proposal number from two integersone identifying the process itself and the other being a per-process counter. Whenever an Acceptor gets conflicting proposals, it chooses the one with the higher proposal number. An acceptor has to remember the highest numbered proposal that it has ever accepted and the number of the highest-numbered prepare request to which it has responded.

The various stages are described here:

  • Phase 1:
    • Prepare: The Proposed constructs a Prepare message with the value (v) and the proposal number, N (which is greater than any previous number used by that process). This message is then sent to a Quorum of Acceptors.
    • Promise: When an Acceptor gets the Prepare message, it checks that the proposal's number (N) is higher than any previous proposal number that has been accepted. If so, it logs the latest accepted value and the sequence number, N. Any Prepare messages with proposal numbers less than N are ignored (even though response is not necessary, sending a NACK will help the algorithm to converge faster). If the Acceptor accepted a proposal at some point in the past, it must include the previous proposal number and previous value in its response to the Proposer.
An acceptor can be in a state where it has accepted multiple proposals.
  • Phase 2
    • Accept Request: Once the Proposer receives response messages from the majority of the nodes, it moves the algorithm to the Acceptance phase. The proposer essentially wants the acceptors to commit to what they accepted. Here, there are three cases:
      •  If a majority of the Acceptors reply with a NACK message or fail to reply, the Proposer abandons the proposal and will go back to the initial state/phase.
      • If none of the Acceptors have accepted a proposal up to this point, the Proposer may choose the original value, v, with proposal number, N.
      • If any Acceptors had previously accepted any proposal, the value and sequence numbers will be available at the Proposer. Here, if w is the value of the accepted values with the higher sequence number (say, w), Paxos forces the Proposer to drive the acceptance of w (not v). This prevents the new Proposer who died and came back up to not diverge the system from consensus.

  • The Proposer sends an Accept message with the chosen value to all the Acceptors:
    • Acceptance: When an Acceptor receives the Accept message, it checks for the following conditions:
      • The value is one from one of the previously accepted proposals.
      • The sequence number in the message is the highest proposal number the Acceptor has agreed on.
      • If both conditions are met, the Acceptor sends an Accept message back to Proposer. Otherwise, a Reject message is sent.

Paxos is more failure-tolerant than the multi-commit algorithm because of the following:

  • Proposer failure-tolerance: If a Proposer fails in-between, another node can take up the role and issue its own proposal.
  • If there are dueling Proposers, especially after an earlier Proposer recovery, then due to ordering imposed by the sequence number's only a previously accepted value can be chosen.
  • Network partitioning does not affect Paxos as it does to the three-phase commit protocol, because just a majority of acceptors is needed. If a majority is there, consensus is reached, even if the other nodes are not reachable and it's not there, the round is failed.

One potential issue with Paxos is that it is possible for two dueling proposers to keep issuing proposals with increasing numbers. Acceptors might ignore messages with lower proposal numbers, which might cause Proposers to continuously try with higher and higher proposal numbers. To overcome this and ensure that progress is made, a distinguished proposer is generally selected among the Proposers. This Leader sequences the proposals and avoids this situation. Leader-election is covered in a later section.

Paxos Go implementations can be found at https://github.com/go-distributed/epaxos and https://github.com/kkdai/paxos.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset