CAP theorem

As seen from the preceding discussion, consistency in a distributed system is a complicated subject. However, there is a theorem that cleanly summarizes the most important impacts, and it is called the CAP theorem. It was proposed by Eric Brewer in 1998, and states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:

  • Consistency (C): By consistency, we mean strict consistency. A read is guaranteed to return the most recent write, from any client.
  • Availability (A): The response is able to handle a request at any given point in time, and provide a legitimate response.
  • Partition-tolerance (P): The system remains operational, despite the loss of network connectivity or message loss between nodes.

According to the CAP theorem, only any two of the three are possible. In the face of Partition-tolerance, the system has to choose between either being available (such a system then becomeAP) or Consistent (CP):

The core choice of the CAP theorem takes space when one of the data nodes want to communicate to others (maybe for replication) and there is a timeout. Here, the code must decide between two actions:

  • Terminate the operation and declare the system as unavailable to the client
  • Proceed with the operation locally and other reachable node, and thus compromise consistency

We can always retry communications, but the decision needs to be made at some point.

Retrying indefinitely is effectively the first choice we mentionedchoosing Consistency over Availability.

Generally, most modern systems, such as Cassandra, leave the CP or AP choice as tuneables for the user. However, understanding the tradeoff and making the application work with a soft state (the eventually consistent model) can lead to massive increase in the scalability of the application.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset