ETL Using Node.js

In this chapter, we will focus on ETL operations. ETL stands for Extract-Transform-Load, and its name pretty much describes what we are about to do.

As a matter of fact, we are already familiar with loading data as well as extracting it; we also transformed data when we reprojected datasets, made flat data spatial, exported a subset of a dataset, or cropped a portion of a larger raster. In other words, we have done ETL already. Our approach, however, involved manually executed tasks, and it is easy to imagine how labor-intensive, and therefore time-consuming, those operations would become if we had to repeat them many times.

Not surprisingly, it is possible to make our lives easier with just a bit of scripting. Over the next few pages, we will define some hypothetical workflows, and then use Node.js to automate and chain the required operations.
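To give a feel for what "chaining" operations means in practice, here is a minimal sketch in plain Node.js with no external packages. It assumes each step is an async function that receives the output of the previous one; the `runPipeline` helper, the stage functions, and the in-memory `source` and `target` arrays are hypothetical stand-ins for real files, web services, or PostGIS tables.

```javascript
// Minimal ETL runner (hypothetical helper): each stage is an async function,
// and the runner awaits them in order, handing each output to the next stage.
async function runPipeline({ extract, transforms = [], load }) {
  let data = await extract(); // E: read the raw records
  for (const transform of transforms) {
    data = await transform(data); // T: apply each transformation in turn
  }
  return load(data); // L: write the result and report what was loaded
}

// Hypothetical in-memory source standing in for a file, service, or table.
const source = [
  { id: 1, lon: "21.01", lat: "52.23" },
  { id: 2, lon: "n/a", lat: "50.06" },
];
// Hypothetical in-memory target standing in for a PostGIS table.
const target = [];

const done = runPipeline({
  extract: async () => source.map((row) => ({ ...row })),
  transforms: [
    // Parse coordinate strings into numbers ("n/a" becomes NaN).
    async (rows) => rows.map((r) => ({ ...r, lon: Number(r.lon), lat: Number(r.lat) })),
    // Drop records whose coordinates could not be parsed.
    async (rows) => rows.filter((r) => Number.isFinite(r.lon) && Number.isFinite(r.lat)),
  ],
  load: async (rows) => {
    target.push(...rows);
    return rows.length;
  },
});

done.then((count) => console.log(`Loaded ${count} record(s)`));
```

Keeping each stage as a small function makes it easy to reorder, reuse, or test the steps independently, which is much of what automating a manual workflow buys us.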

We could, of course, use any other programming language, but because we will be doing some WebGIS-related work later on, JavaScript seemed the more natural choice.

The workflows presented here are not necessarily real-world examples in every detail, but they touch on processes I encounter fairly often.

It is important to stress that there are some really well-equipped ETL tools out there, both commercial and open source. However, the point of this chapter is not to compete with them, but to show how some relatively easy-to-use techniques can add more muscle to our already powerful PostGIS database.

One can think of ETL as being focused mainly on local or database resources. This is, indeed, often the case, but basically any data processing that leads to the creation of a new dataset, even a simple projection of data onto another model, can safely be called a transformation. And since we usually also have to read and write the data, we perform the E and the L of ETL anyway. ETL does not always have to mean heavy data-lifting operations. Also, the sequence does not always have to be E->T->L, and some steps may happen implicitly: for example, exporting data from PostGIS to a shapefile (SHP) may truncate some of it, and therefore transform it, regardless of our intent (the DBF attribute table of a shapefile, for instance, limits field names to 10 characters).
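The shapefile case is easy to check up front. A shapefile stores its attributes in a dBASE (DBF) file, where field names are limited to 10 characters, so longer PostGIS column names get shortened on export. The sketch below flags such columns and the name collisions that truncation can cause. The `checkDbfFieldNames` helper and the sample column names are hypothetical, and the code does not reproduce the exact renaming scheme of any particular export tool; it only applies the 10-character rule.

```javascript
// Sketch: predict which column names would be shortened when written to a
// shapefile's DBF table, where field names are limited to 10 characters.
// It flags truncations and collisions; it does not rename anything.
const DBF_MAX_FIELD_NAME = 10;

function checkDbfFieldNames(columns) {
  // Maps each (possibly truncated) name to the first column that produced it.
  const firstOwner = new Map();
  return columns.map((column) => {
    const truncated = column.slice(0, DBF_MAX_FIELD_NAME);
    const collidesWith = firstOwner.has(truncated) ? firstOwner.get(truncated) : null;
    if (!firstOwner.has(truncated)) firstOwner.set(truncated, column);
    return { column, truncated, changed: truncated !== column, collidesWith };
  });
}

// Hypothetical column names from a PostGIS table.
const report = checkDbfFieldNames(["gid", "population_2010", "population_2020", "name"]);

for (const r of report) {
  if (r.changed) {
    const note = r.collidesWith ? ` (collides with ${r.collidesWith})` : "";
    console.log(`${r.column} -> ${r.truncated}${note}`);
  }
}
```

For the sample columns, both `population_2010` and `population_2020` shorten to `population`, so the second one collides with the first; catching this before the export is cheaper than discovering it in the output file.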

We will focus on a couple of examples and go through the following steps:

  • Set up Node.js
  • Handshake with a database using a Node.js PgSQL client
  • Retrieve and process JSON data
  • Geocode address data
  • Consume WFS data
  • Output GeoJSON
  • Output TopoJSON
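As a taste of the "Output GeoJSON" step, the sketch below turns flat records carrying longitude and latitude attributes into a GeoJSON FeatureCollection by hand. The `lon`/`lat` field names, the `toFeatureCollection` helper, and the sample rows are assumptions for illustration; PostGIS can also emit GeoJSON geometries directly through `ST_AsGeoJSON`.

```javascript
// Sketch: turn flat records with longitude/latitude attributes into a
// GeoJSON FeatureCollection. The "lon"/"lat" field names are assumptions;
// adjust them to match the actual source columns.
function toFeatureCollection(rows, lonField = "lon", latField = "lat") {
  const features = rows
    // Skip records without usable numeric coordinates.
    .filter((row) => Number.isFinite(row[lonField]) && Number.isFinite(row[latField]))
    .map((row) => {
      // Every attribute except the coordinate fields becomes a property.
      const properties = { ...row };
      delete properties[lonField];
      delete properties[latField];
      return {
        type: "Feature",
        // GeoJSON (RFC 7946) positions are ordered [longitude, latitude].
        geometry: { type: "Point", coordinates: [row[lonField], row[latField]] },
        properties,
      };
    });
  return { type: "FeatureCollection", features };
}

// Hypothetical sample rows; the second one has no usable longitude.
const rows = [
  { id: 1, name: "Site A", lon: 21.01, lat: 52.23 },
  { id: 2, name: "Site B", lon: null, lat: 50.06 },
];

const geojson = toFeatureCollection(rows);
console.log(JSON.stringify(geojson, null, 2));
```

Note that GeoJSON positions put longitude first and latitude second; swapping them is one of the most common mistakes when building features by hand.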