In this part, we learn about Spark Streaming.
Spark Streaming was the first streaming API offered on Apache Spark and is currently used in production by many companies around the world. It provides a powerful and extensible functional API based on the core Spark abstractions. Nowadays, Spark Streaming is mature and stable.
Our exploration of Spark Streaming begins with a practical example that gives us a first feel for its API and programming model. As we progress through this part, we explore the different aspects involved in programming and running robust Spark Streaming applications:
Understanding the Discretized Stream (DStream) abstraction
Creating applications using the API and programming model
Consuming and producing data using streaming sources and Output Operations
Combining Spark SQL and other libraries into streaming applications
Understanding the fault-tolerance characteristics and how to create robust applications
Monitoring and managing streaming applications
After this part, you will have the knowledge required to design, implement, and execute stream-processing applications using Spark Streaming. You will also be prepared for Part IV, which covers more advanced topics such as the application of probabilistic data structures to stream processing and online machine learning.