Summary

At the time of writing, Structured Streaming is not yet production-ready. It is, however, a paradigm shift in Spark that should make it easier for data scientists and data engineers to build continuous applications. Although the previous sections did not call them out explicitly, streaming applications raise many problems that you need to design for, such as late events, partial outputs, state recovery on failure, and distributed reads and writes. Structured Streaming abstracts many of these issues away, making continuous applications easier to build.

We encourage you to try Spark Structured Streaming now, so that you will be ready to build streaming applications as it matures. As Reynold Xin noted in his Spark Summit East 2016 presentation, The Future of Real-Time in Spark (http://www.slideshare.net/rxin/the-future-of-realtime-in-spark):

"The simplest way to perform streaming analytics is not having to reason about streaming."

For more information, here are some additional Structured Streaming resources:

PySpark 2.1 Documentation: pyspark.sql module: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html
Introducing Apache Spark 2.1: https://databricks.com/blog/2016/12/29/introducing-apache-spark-2-1.html
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets and Streaming - by Michael Armbrust: http://www.slideshare.net/databricks/structuring-spark-dataframes-datasets-and-streaming-62871797
Structured Streaming Programming Guide: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
Structured Streaming (aka Streaming DataFrames) [SPARK-8360]: https://issues.apache.org/jira/browse/SPARK-8360
Structured Streaming Programming Abstraction, Semantics, and APIs Apache JIRA: https://issues.apache.org/jira/secure/attachment/12793410/StructuredStreamingProgrammingAbstractionSemanticsandAPIs-ApacheJIRA.pdf

In the next chapter, we will show you how to modularize and package your PySpark application and submit it for execution programmatically.
