Chapter 9. Polyglot Persistence with Blaze

Our world is complex and no single approach exists that solves all problems. Likewise, in the data world one cannot solve all problems with one piece of technology.

Nowadays, any big technology company uses (in one form or another) a MapReduce paradigm to sift through terabytes (or even petabytes) of data collected daily. On the other hand, it is much easier to store, retrieve, extend, and update information about products in a document-type database (such as MongoDB) than it is in a relational database. Yet, persisting transaction records in a relational database aids later data summarizing and reporting.

Even these simple examples show that solving a vast array of business problems requires adapting to different technologies. This means that you, as a database manager, data scientist, or data engineer, would have to learn each of these technologies separately if you wanted to solve your problems with the tools designed for them. Doing so, however, does not make your company agile; it is error-prone and leaves your system needing a lot of tweaking and hacking.

Blaze abstracts most of the technologies and exposes a simple and elegant data structure and API.
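To make the idea of such an abstraction concrete, here is a minimal sketch that uses only the Python standard library. It is not Blaze's API: the helper names (total, total_from_sql, total_from_documents) and the sales figures are made up for illustration. It only shows how one entry point can hide whether the data lives in a relational table or in document-style records.

```python
import sqlite3

# Hypothetical helper names; this is NOT Blaze's API, only an illustration
# of hiding two different storage backends behind one function signature.

def total_from_sql(conn, table, column):
    """Sum a column stored in a relational (SQLite) table."""
    # Identifiers are interpolated directly, so pass only trusted names.
    row = conn.execute(f"SELECT SUM({column}) FROM {table}").fetchone()
    return row[0]

def total_from_documents(docs, column):
    """Sum a field across document-style records (a list of dicts)."""
    return sum(doc[column] for doc in docs)

def total(source, column, table=None):
    """Dispatch on the source type so callers use a single entry point."""
    if isinstance(source, sqlite3.Connection):
        return total_from_sql(source, table, column)
    return total_from_documents(source, column)

# The same made-up sales figures stored in two different ways.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10,), (20,), (30,)])
docs = [{"amount": 10}, {"amount": 20}, {"amount": 30}]

sql_total = total(conn, "amount", table="sales")
doc_total = total(docs, "amount")
print(sql_total, doc_total)
```

The caller never writes SQL or loops over records directly; it only asks for a total. A library such as Blaze applies the same principle on a much larger scale, across many more backends and operations.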

In this chapter, you will learn:

  • How to install Blaze
  • What polyglot persistence is about
  • How to abstract data stored in files, pandas DataFrames, or NumPy arrays
  • How to work with archives (GZip)
  • How to connect to SQL (PostgreSQL and SQLite) and NoSQL (MongoDB) databases with Blaze
  • How to query, join, sort, and transform the data, and perform simple summary statistics

Installing Blaze

If you run Anaconda, it is easy to install Blaze. Just issue the following command in your CLI (see Bonus Chapter 1, Installing Spark, if you do not know what a CLI is):

conda install blaze

Once the command is issued, you will see a screen similar to the following screenshot:

[Screenshot: Installing Blaze]

We will later use Blaze to connect to the PostgreSQL and MongoDB databases, so we need to install some additional packages that Blaze will use in the background.

We will install SQLAlchemy and PyMongo, both of which are part of Anaconda:

conda install sqlalchemy
conda install pymongo

All that is now left to do is to import Blaze itself in our notebook:

import blaze as bl
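If the import fails, it can help to check which of the packages installed above are actually visible to the Python interpreter that runs your notebook. The following is a minimal sketch of such a check; the helper name missing_packages is hypothetical, and the module names blaze, sqlalchemy, and pymongo are assumed to match the conda packages installed earlier.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be located by the current interpreter."""
    # find_spec() looks a top-level module up without importing it and
    # returns None when the module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed import names for the packages installed in this section.
required = ["blaze", "sqlalchemy", "pymongo"]
missing = missing_packages(required)

if missing:
    print("Not found, try installing with conda:", ", ".join(missing))
else:
    print("All required packages were found.")
```

The check only locates the modules, so it runs quickly and does not raise an error when a package is absent.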