As noted by Tathagata Das – committer and project management committee (PMC) member of the Apache Spark project and lead developer of Spark Streaming – in the Datanami article "Spark Streaming: What is It and Who's Using it" (https://www.datanami.com/2015/11/30/spark-streaming-what-is-it-and-whos-using-it/), there is a business need for streaming. With the prevalence of online transactions and social media, as well as sensors and devices, companies are generating and processing more data at a faster rate.
The ability to develop actionable insight at scale and in real time provides those businesses with a competitive advantage. Whether you are detecting fraudulent transactions, detecting sensor anomalies in real time, or reacting to the next viral tweet, streaming analytics is becoming an increasingly important part of the data scientist's and data engineer's toolbox.
Spark Streaming is itself being rapidly adopted because Apache Spark unifies these disparate data processing paradigms (machine learning via ML and MLlib, Spark SQL, and Streaming) within the same framework. So, you can go from training machine learning models (ML or MLlib), to scoring data with these models (Streaming), to performing analysis using your favourite BI tool (SQL), all within the same framework. Companies including Uber, Netflix, and Pinterest often showcase their Spark Streaming use cases.
Currently, there are four broad use cases surrounding Spark Streaming: