Resilience

A lot of things can go wrong with the models described in this section:

The broker can fail, which takes away any messages stored on it. To overcome this eventuality, generally brokers are deployed in a cluster of redundant instances. Every message is replicated on a set of machines as part of the write commit. If a message is replicated onto n brokers, that means the system can tolerate the failure of n-1 instances.
The producer-to-broker communication can fail, which causes messages to be lost. This is generally solved by acknowledgements (as seen previously). Duplicate messages being produced can be avoided by having a sequence number for messages.
The consumer-to-broker message communication can fail, which causes messages to be lost. Hence, messages should not be deleted from the Broker, unless there is an explicit acknowledgement from the consumer that the message has been processed. As we saw in the Acknowledgement section, when we do the acknowledgement, it can result in at-least-once or at-most-once semantics. Exactly-once delivery can be enabled by deduplication on the consumer side and cooperation between the broker and the consumer.

Many applications also need some atomicity guarantees in terms of read from n queues and write to m queues. Messaging systems such as Kafka provide transaction constructs to enable this. We shall see this in detail in the Apache Kafka deep-dive section.

Table of Contents for Resilience

Create new playlist

Sign In

Sign Up

Table of Contents for
Resilience