For a powerful and fun visualization of the flight paths and connections in this dataset, we can use the Airports D3 visualization (https://mbostock.github.io/d3/talk/20111116/airports.html) within our Databricks notebook. By connecting GraphFrames, DataFrames, and D3, we can visualize the full scope of the flight connections for all on-time or early departing flights in this dataset.
The blue circles represent the vertices (that is, airports), and the size of each circle represents the number of edges (that is, flights) into and out of that airport. The black lines are the edges themselves (that is, flights), connecting each vertex to the other vertices (that is, airports). Note that any edges that run offscreen lead to vertices (that is, airports) in the states of Hawaii and Alaska.
For this to work, we first create a Scala package called d3a that is embedded in our notebook (you can download it from here: http://bit.ly/2kPkXkc). Because we're using Databricks notebooks, we can make Scala calls within our PySpark notebook:
%scala
// On-time and Early Arrivals
import d3a._
graphs.force(
  height = 800,
  width = 1200,
  clicks = sql("""select src, dst as dest, count(1) as count from departureDelays_geo where delay <= 0 group by src, dst""").as[Edge])
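To make the shape of the data passed to the visualization more concrete, here is a minimal plain-Python sketch of what the query computes: it keeps flights with a delay of zero or less, counts them per (src, dest) pair, and then sums those counts per airport, which is the quantity the circle size is described as reflecting. The sample rows and the helper names on_time_edges and airport_degree are made up for illustration; they are not part of the d3a package or the dataset.

```python
from collections import Counter

# Hypothetical sample rows: (delay in minutes, origin, destination).
# These values are invented for illustration, not taken from the dataset.
flights = [
    (-5, "SEA", "LAX"),
    (0, "SEA", "LAX"),
    (12, "SEA", "LAX"),
    (-2, "LAX", "SEA"),
    (-1, "SEA", "SFO"),
    (30, "SFO", "SEA"),
]


def on_time_edges(rows):
    """Mirror `where delay <= 0 group by src, dst`: count flights per (src, dest)."""
    return Counter((src, dst) for delay, src, dst in rows if delay <= 0)


def airport_degree(edges):
    """Sum the flight counts into and out of each airport."""
    degree = Counter()
    for (src, dst), count in edges.items():
        degree[src] += count
        degree[dst] += count
    return degree


edges = on_time_edges(flights)
degree = airport_degree(edges)
print(dict(edges))
print(dict(degree))
```

Each entry in edges corresponds to one row returned by the SQL query (src, dest, count), and each entry in degree is the combined in-and-out flight count for one airport.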
The results of the preceding query for on-time and early flights are visualized in the following screenshot:
In the Airports D3 visualization, you can hover over any airport (a blue circle, that is, a vertex); the lines are the edges (flights). The preceding visualization is a snapshot taken while hovering over the Seattle (SEA) airport, while the following visualization is a snapshot taken while hovering over the Los Angeles (LAX) airport: