Network-partitioning tolerance

Regional failures are rare, but remember that anything that can go wrong will go wrong; to build resiliency, we need to assume that networking components can fail at any time. When a regional failure disconnects clients because of connectivity issues, we can reroute traffic to a second region where the systems are up and running and where the data is, to some extent, synchronized with the primary region.
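
As a minimal sketch of the rerouting idea, the following picks the first region whose health check passes, in priority order. The region names, the `pick_region` function, and the health table are hypothetical and exist only for illustration; in practice this decision is more likely to be made by DNS or load-balancer health checks than by application code.

```python
from typing import Callable, Sequence


def pick_region(regions: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first region whose health check passes, in priority order."""
    for region in regions:
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")


# Hypothetical health state: the primary region has become unreachable.
health = {"primary-region": False, "secondary-region": True}
active = pick_region(["primary-region", "secondary-region"], lambda r: health[r])
print(active)
```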

Let's imagine this scenario: a user makes a purchase on an e-commerce website, and the application's distributed systems create a transaction that propagates from the middleware to a DBMS. Before the database is updated, a network switch fails, leaving the business and database tiers in mismatched, inconsistent states. The application cannot guarantee the consistency of the system, because at this moment there are two versions of it until a rollback is triggered at the middleware. One half of the system (the data tier) has not finished the update operation, and the other half (the business tier) is still waiting for the commit to be acknowledged; this leads to a split-brain scenario, in which each half of the brain knows nothing about its failing counterpart.
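
The scenario can be sketched with two toy tiers. This is an illustration under stated assumptions, not how any particular middleware or DBMS behaves: the `Database`, `Middleware`, and `NetworkPartition` names are hypothetical, and the `reachable` flag stands in for the failed network switch.

```python
class NetworkPartition(Exception):
    """Raised when the data tier cannot be reached (hypothetical)."""


class Database:
    """Toy data tier; `reachable` simulates the failed network switch."""

    def __init__(self):
        self.orders = {}
        self.reachable = True

    def write(self, order_id, order):
        if not self.reachable:
            raise NetworkPartition("database unreachable")
        self.orders[order_id] = order


class Middleware:
    """Toy business tier that tracks orders awaiting a commit."""

    def __init__(self, db):
        self.db = db
        self.pending = {}

    def purchase(self, order_id, order):
        # The business tier records the order before the data tier confirms it.
        self.pending[order_id] = order
        try:
            self.db.write(order_id, order)
        except NetworkPartition:
            # Roll back so both tiers agree again: no order on either side.
            del self.pending[order_id]
            return "rolled back"
        del self.pending[order_id]
        return "committed"


db = Database()
middleware = Middleware(db)
db.reachable = False  # the switch fails before the database updates
result = middleware.purchase("order-1", {"item": "book"})
print(result, db.orders, middleware.pending)
```

Between the moment the order is added to `pending` and the moment the rollback runs, the two tiers disagree about whether the order exists; that window is the inconsistent state described above.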

This idea is the essence of the CAP theorem, which tells us that any distributed system in the presence of a network partition must, by nature, choose between consistency and availability.
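
To make the trade-off concrete, here is a small sketch of a single replica that has been cut off from its peers. The `Replica` class and its `"CP"` and `"AP"` modes are hypothetical labels used only for this example: a consistency-first replica refuses to answer during the partition, while an availability-first replica answers with whatever value it last saw, which may be stale.

```python
class Replica:
    """Toy replica that favors consistency ("CP") or availability ("AP")."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = "v1"         # last value this replica received
        self.partitioned = False  # True while cut off from its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable during partition")
        # Availability first (or no partition): answer with the local value.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
cp.partitioned = ap.partitioned = True

ap_answer = ap.read()  # answers, but the value may be stale
try:
    cp_answer = cp.read()
except RuntimeError:
    cp_answer = None   # no answer until the partition heals
print(ap_answer, cp_answer)
```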

Some AWS services are designed to be highly available and tolerant of network partitioning, giving up strong consistency in favor of eventual consistency; it may be only a matter of milliseconds before the system converges and is consistent again.
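
One simple way to picture convergence is a last-writer-wins merge between two replicas that diverged during a partition. This is only a sketch of the general idea, not a description of how any AWS service reconciles data; the `merge` function and the `(version, value)` layout are assumptions made for the example.

```python
def merge(a: dict, b: dict) -> dict:
    """Last-writer-wins: for each key, keep the entry with the higher version."""
    merged = dict(a)
    for key, (version, value) in b.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


# Each replica maps key -> (version, value). During the partition,
# replica_b accepted a newer write that replica_a has not seen yet.
replica_a = {"cart": (1, ["book"])}
replica_b = {"cart": (2, ["book", "pen"])}

stale_read = replica_a["cart"][1]  # a read from replica_a is stale for now

# Once the partition heals, the replicas exchange state and converge.
converged = merge(replica_a, replica_b)
print(stale_read, converged)
```

Because the merge keeps the higher version regardless of argument order, both replicas arrive at the same state after the exchange.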
