Summary

In this chapter, we laid out the foundations of streaming applications and described their challenges, constraints, and benefits. We went under the hood and examined the inner workings of Spark Streaming, how it fits with Spark Core, and how it interacts with Spark SQL and Spark MLlib. We illustrated the streaming concepts with TCP sockets, followed by live tweet ingestion and processing directly from the Twitter firehose. We discussed decoupling upstream data publishing from downstream data subscription and consumption using Kafka in order to maximize the resilience of the overall streaming architecture. We also discussed Flume, a reliable, flexible, and scalable data ingestion and transport pipeline system. The combination of Flume, Kafka, and Spark delivers robustness, speed, and agility in an ever-changing landscape. We closed the chapter with some remarks and observations on two streaming architectural paradigms: the Lambda and Kappa architectures.
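
To recall the socket example concretely, here is a minimal sketch of the TCP-socket word count pattern the chapter started from, assuming the classic DStream API and a text server on localhost port 9999 (the host, port, and application name are illustrative):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName='TcpStreamSketch')
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# Each line arriving on the socket becomes one record in the DStream.
lines = ssc.socketTextStream('localhost', 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()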

The Lambda architecture combines batch and streaming data in a common query front-end. It was initially envisioned with Hadoop and Storm in mind. Spark provides both batch and streaming paradigms in a single environment with a common code base, which makes it an effective platform for bringing this architectural paradigm to life.
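
As a sketch of how a common code base serves both layers, the following hypothetical example applies one transformation function to a batch RDD (the batch layer) and, via transform(), to each micro-batch of a DStream (the speed layer); the input path, host, and port are assumptions:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

def count_words(rdd):
    # Shared business logic: works on any RDD of text lines.
    return (rdd.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

sc = SparkContext(appName='LambdaSketch')

# Batch layer: recompute the view from historical data (path is illustrative).
batch_view = count_words(sc.textFile('hdfs:///archive/tweets'))

# Speed layer: the identical logic runs on each micro-batch of live data.
ssc = StreamingContext(sc, 10)
realtime_view = ssc.socketTextStream('localhost', 9999).transform(count_words)
realtime_view.pprint()

ssc.start()
ssc.awaitTermination()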

The Kappa architecture promotes the concept of a unified log, creating an event-oriented architecture in which all events in the enterprise are channeled into a centralized commit log that is available to all consuming systems in real time.
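
A minimal sketch of a consumer in this style, assuming Kafka as the unified log and the Spark Streaming Kafka receiver (the spark-streaming-kafka package for Spark 1.x); the ZooKeeper quorum, group id, and topic name are illustrative:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName='KappaSketch')
ssc = StreamingContext(sc, 10)

# Subscribe to the central commit log; any number of systems can consume
# the same topic independently, each at its own pace.
events = KafkaUtils.createStream(ssc,
                                 zkQuorum='localhost:2181',
                                 groupId='kappa-consumer',
                                 topics={'events': 1})
events.map(lambda kv: kv[1]).pprint()  # kv = (key, message value)

ssc.start()
ssc.awaitTermination()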

We are now ready to visualize the data collected and processed so far.
