Introduction

Streaming is the process of dividing continuously flowing input data into discrete units so that it can be processed easily. A familiar real-life example is streaming video and audio content: though a user could download a full movie before watching it, a faster approach is to stream the data in small chunks that start playing while the rest is downloaded in the background.

Real-world examples of streaming beyond multimedia include the processing of market feeds, weather data, electronic stock trading data, and so on. All of these applications produce large volumes of data at very fast rates and require special handling so that insights can be derived from the data in real time.

Streaming has a few basic concepts that are worth discussing before we focus on Spark Streaming. The rate at which a streaming application receives data is called the data rate and is expressed in units such as kilobytes per second (KBps) or megabytes per second (MBps).

One important use case of streaming is complex event processing (CEP). In CEP, it is important to control the scope of the data being processed. This scope is called a window, and it can be based on either time or size. An example of a time-based window is analyzing the data that has arrived in the last 1 minute. An example of a size-based window is computing the average asking price of the last 100 trades of a given stock.
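Spark Streaming's built-in windowing is time-based. The following Scala snippet is a minimal sketch of a time-based window; it assumes a hypothetical source of text lines arriving on a local socket at port 9999 and counts the words seen in the last 1 minute, recomputing the counts every 10 seconds:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TimeWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("TimeWindowSketch")
    // Incoming data is divided into 10-second micro-batches
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source: lines of text arriving on a local socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Time-based window: word counts over the last 1 minute (window duration = 60 seconds),
    // recomputed every 10 seconds (slide duration = 10 seconds)
    val windowedCounts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(60), Seconds(10))

    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Here the window duration controls how much history each result covers, while the slide duration controls how often the windowed result is recomputed. A size-based window, such as the last 100 trades, has no direct built-in equivalent and would typically have to be implemented with stateful operations.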

Spark Streaming is Spark's library for processing live data streams. These streams can come from many sources, such as Twitter, Kafka, or Flume.
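As a sketch of reading from one such source, the snippet below creates a direct stream from Kafka. It assumes the spark-streaming-kafka (Kafka 0.8 integration) artifact is on the classpath, and the broker address localhost:9092 and the topic name trades are hypothetical placeholders:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaSourceSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaSourceSketch")
    // Incoming data is divided into 5-second micro-batches
    val ssc = new StreamingContext(conf, Seconds(5))

    // Hypothetical broker address and topic name; replace with your own
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = Set("trades")

    // Direct stream of (key, value) pairs pulled from Kafka
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Print the message values of each micro-batch on the driver
    messages.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}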

Spark Streaming has a few fundamental building blocks that you need to understand well before diving into the recipes.
