Part V. Beyond Apache Spark

In this part, we view Apache Spark’s streaming engines within a broader scope. We begin with a detailed comparison with other relevant projects in the distributed stream-processing industry, explaining both where Spark comes from and why no alternative is exactly like it.

We offer a brief description of and a focused comparison to other distributed processing engines, including the following:

Apache Storm

A historical landmark of distributed processing, and a system that still has a legacy footprint today

Apache Flink

A distributed stream processing engine that is the most active competitor of Spark

Apache Kafka Streams

A reliable distributed log and stream connector that is fast developing analytical chops

We also touch on the cloud offerings of the main players (Amazon and Microsoft) as well as the centralizing engine of Google Cloud Dataflow.

After you are equipped with a detailed sense of the potential and challenges of Apache Spark’s streaming ambitions, we’ll touch on how you can become involved with the community and ecosystem of stream processing with Apache Spark, providing references for contributing, discussing, and growing in the practice of streaming analytics.
