Summary

At the time of writing, Structured Streaming is not yet production-ready. It is, however, a paradigm shift in Spark that is intended to make it easier for data scientists and data engineers to build continuous applications. Although the previous sections did not call them out explicitly, streaming applications raise many potential problems that you need to design for, such as late events, partial outputs, state recovery on failure, and distributed reads and writes. Structured Streaming aims to abstract many of these issues away so that continuous applications are easier to build.
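To make one of these problems concrete, here is a minimal plain-Python sketch of the late-event issue. It is not Spark code and not how Spark implements anything. The function name `count_by_window`, the parameters `window_size` and `allowed_lateness`, and the sample events are all hypothetical and chosen only for illustration. Events carry an event time in seconds, arrive out of order, and are counted per tumbling window. An event is dropped once it falls more than `allowed_lateness` seconds behind the latest event time seen so far.

```python
from collections import defaultdict


def count_by_window(events, window_size=10, allowed_lateness=5):
    """Count events per tumbling event-time window.

    Simplified illustration (an assumption, not Spark's algorithm):
    an event is dropped if its event time is older than
    (latest event time seen so far) - allowed_lateness.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = None

    for event_time, payload in events:
        # Track the latest event time observed in arrival order.
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time

        # Anything older than this threshold is considered too late.
        threshold = max_event_time - allowed_lateness
        if event_time < threshold:
            dropped.append((event_time, payload))
            continue

        # Assign the event to the window that contains its event time.
        window_start = (event_time // window_size) * window_size
        counts[window_start] += 1

    return dict(counts), dropped


# Hypothetical (event_time, payload) pairs, listed in arrival order.
events = [(1, "a"), (4, "b"), (12, "c"), (8, "d"), (25, "e"), (3, "f")]
counts, dropped = count_by_window(events)
print(counts)
print(dropped)
```

In this sketch, the event at time 8 arrives after the event at time 12 but is still counted in its original window, while the event at time 3 arrives long after time 25 and is discarded. Deciding where to draw that line, and keeping the per-window state recoverable after a failure, is the kind of bookkeeping a streaming engine is expected to take off your hands.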

We encourage you to try Spark Structured Streaming now so that you are ready to build streaming applications as it matures. As Reynold Xin noted in his Spark Summit East 2016 presentation, The Future of Real-Time in Spark (http://www.slideshare.net/rxin/the-future-of-realtime-in-spark):

"The simplest way to perform streaming analytics is not having to reason about streaming."

For more information, here are some additional Structured Streaming resources:

In the next chapter, we will show you how to modularize and package your PySpark application and submit it for execution programmatically.
