Summary

The concepts presented in this chapter are just the beginning of the road to using Blaze. There are many other ways it can be used and data sources it can connect with. Treat this as a starting point to build your understanding of polyglot persistence.

Note, however, that these days most of the concepts explained in this chapter can be attained natively within Spark, as you can use SQLAlchemy directly alongside Spark, making it easy to work with a variety of data sources. The advantage of doing so, despite the initial investment of learning the SQLAlchemy API, is that the data returned will be stored in a Spark DataFrame and you will have access to everything that PySpark has to offer. This by no means implies that you should never use Blaze: the choice, as always, is yours.
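As a minimal sketch of that workflow (not code from this chapter), the snippet below queries a relational database through SQLAlchemy and turns the result into a Spark DataFrame; the connection string, table, and column names are purely hypothetical:

```python
from sqlalchemy import create_engine, text
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sqlalchemy_example").getOrCreate()

# SQLAlchemy handles the connection to the data source
# (the URL below is a placeholder for your own database).
engine = create_engine("postgresql://user:password@localhost:5432/sales_db")

with engine.connect() as conn:
    rows = [tuple(row) for row in
            conn.execute(text("SELECT id, amount FROM orders"))]

# The results become a Spark DataFrame, so everything PySpark
# offers (Spark SQL, MLlib, and so on) is available from here on.
df = spark.createDataFrame(rows, schema=["id", "amount"])
df.show()
```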

In the next chapter, you will learn about streaming and how to do it with Spark. Streaming has become an increasingly important topic, as the world produces roughly 2.5 exabytes of data every day (as of 2016; source: http://www.northeastern.edu/levelblog/2016/05/13/how-much-data-produced-every-day/), all of which needs to be ingested, processed, and made sense of.
