Closing remarks on the Lambda and Kappa architectures

Two architecture paradigms are currently in vogue: the Lambda and Kappa architectures.

Lambda is the brainchild of Nathan Marz, the creator and main committer of Storm. It essentially advocates building a functional architecture on all data. The architecture has two branches. The first is a batch arm, envisioned to be powered by Hadoop, where historical, high-latency, high-throughput data is preprocessed and made ready for consumption. The second is a real-time arm, envisioned to be powered by Storm, which processes streaming data incrementally, derives insights on the fly, and feeds aggregated information back to the batch storage.

Kappa is the brainchild of Jay Kreps, one of the main committers of Kafka, and his colleagues at Confluent (previously at LinkedIn). It advocates a full streaming pipeline, effectively implementing, at the enterprise level, the unified log described in the previous pages.

Understanding Lambda architecture

Lambda architecture combines batch and streaming data to provide a unified query mechanism on all available data. It envisions three layers: a batch layer, where precomputed information is stored; a speed layer, where real-time incremental information is processed as data streams in; and a serving layer, which merges the batch and real-time views for ad hoc queries. The following diagram gives an overview of the Lambda architecture:

Figure: Overview of the Lambda architecture
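
The interplay of the three layers can be illustrated with a minimal Python sketch. It is not taken from Hadoop or Storm: the page-view events, the `BATCH_CUTOFF` value, and the function names are all hypothetical, and the two views are plain in-memory counters. It only shows the idea that a query is answered by merging a precomputed batch view with an incremental real-time view.

```python
from collections import Counter

# Hypothetical event stream: (timestamp, page) page-view records.
events = [
    (1, "home"), (2, "home"), (3, "about"),  # covered by the last batch run
    (4, "home"), (5, "pricing"),             # arrived after the last batch run
]

# Assumption: the batch layer has processed everything up to timestamp 3.
BATCH_CUTOFF = 3


def batch_view(all_events, cutoff):
    """Batch layer: recompute the view from scratch over historical data."""
    return Counter(page for ts, page in all_events if ts <= cutoff)


def speed_view(all_events, cutoff):
    """Speed layer: count incrementally, only events newer than the cutoff."""
    view = Counter()
    for ts, page in all_events:
        if ts > cutoff:
            view[page] += 1
    return view


def serve(page, batch, speed):
    """Serving layer: merge the batch and real-time views to answer a query."""
    return batch[page] + speed[page]


batch = batch_view(events, BATCH_CUTOFF)
speed = speed_view(events, BATCH_CUTOFF)
print(serve("home", batch, speed))
print(serve("pricing", batch, speed))
```

Note that the counting logic appears twice, once per layer. In a real deployment those two copies would live in two different systems.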

Understanding Kappa architecture

The Kappa architecture proposes to run the entire enterprise in streaming mode. It arose from a critique of the Lambda architecture by Jay Kreps and his colleagues, who were at LinkedIn at the time. They have since founded Confluent, with Apache Kafka as the main enabler of the Kappa architecture vision. The basic tenet is to move to an all-streaming mode, with a Unified Log as the backbone of the enterprise information architecture.

A Unified Log is a centralized, structured enterprise log available for real-time subscription. All of the organization's data is put in this central log. Records are numbered beginning with zero, in the order in which they are written. It is also known as a commit log or journal. The concept of the Unified Log is the central tenet of the Kappa architecture.

The properties of the unified log are as follows:

  • Unified: There is a single deployment for the entire organization
  • Append only: Events are immutable and are appended
  • Ordered: Each event has a unique offset within a shard
  • Distributed: For fault tolerance, the unified log is replicated across a cluster of computers
  • Fast: The system ingests thousands of messages per second
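
The append-only and ordered properties can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not Kafka's API: the `UnifiedLog` class, its `append` and `read` methods, and the shard names are hypothetical, and distribution and throughput are not modeled at all.

```python
from collections import defaultdict


class UnifiedLog:
    """Toy in-memory log: append-only, ordered by offset within each shard."""

    def __init__(self):
        # shard id -> list of records, in write order
        self._shards = defaultdict(list)

    def append(self, shard, event):
        """Append an event and return its offset (numbering starts at zero)."""
        records = self._shards[shard]
        records.append(event)
        return len(records) - 1

    def read(self, shard, from_offset=0):
        """Yield (offset, event) pairs in write order, starting at from_offset."""
        records = self._shards[shard]
        for offset in range(from_offset, len(records)):
            yield offset, records[offset]


log = UnifiedLog()
log.append("orders", {"id": "a", "amount": 10})
log.append("orders", {"id": "b", "amount": 25})
log.append("clicks", {"page": "home"})

# A subscriber resumes from the offset it last processed.
for offset, event in log.read("orders", from_offset=1):
    print(offset, event)
```

Because records are never modified and offsets never change, any subscriber can re-read the same shard from any offset and see the same events in the same order.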

The following screenshot captures the moment Jay Kreps announced his reservations about the Lambda architecture. His main reservation is that the same job has to be implemented in two different systems, Hadoop and Storm, each with its own idiosyncrasies and all the complexity that comes with them. The Kappa architecture processes real-time data and reprocesses historical data in the same framework, powered by Apache Kafka.

Figure: Jay Kreps announcing his reservations about the Lambda architecture
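
The idea of reprocessing in the same framework can be sketched as follows. This is a simplified illustration, assuming the log is a plain Python list and the job is an ordinary function. The record fields and the names `run_job`, `v1`, and `v2` are hypothetical and do not correspond to any Kafka API.

```python
# Hypothetical log contents: one record per purchase, in write order.
log = [
    {"user": "ann", "amount": 10},
    {"user": "bob", "amount": 25},
    {"user": "ann", "amount": 5},
]


def run_job(records, transform):
    """The single streaming job: fold records into a per-user total."""
    totals = {}
    for record in records:
        user, value = transform(record)
        totals[user] = totals.get(user, 0) + value
    return totals


def v1(record):
    """Original logic: sum the raw amounts."""
    return record["user"], record["amount"]


def v2(record):
    """Revised logic: count purchases instead of summing amounts."""
    return record["user"], 1


# Live view, built by consuming the log as records arrive.
live_view = run_job(iter(log), v1)

# Reprocessing: replay the same log from offset zero through the same job
# with the revised logic, then point readers at the new view.
reprocessed_view = run_job(iter(log), v2)

print(live_view)
print(reprocessed_view)
```

There is only one job to maintain here. Changing the logic means replaying the log through a new version of that job rather than updating a separate batch implementation.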