We now turn to creating interactive maps with Bokeh. First, we create a world map on which we geo-locate sample tweets; moving the mouse over a location reveals the user and the corresponding tweet in a hover box.
The second map is focused on mapping upcoming meetups in London. It could be an interactive map that would act as a reminder of date, time, and location for upcoming meetups in a specific city.
The objective is to create a scatter plot of the locations of important tweets on a world map, where hovering over a point reveals the tweet and its author. We will go through three steps to build this interactive visualization:
In step one, we create a Python dictionary called data, keyed by country code, that contains the boundaries of all the world's countries as lists of latitudes and longitudes:
In [4]:
#
# This module exposes geometry data for World Country Boundaries.
#
import csv
import codecs
import gzip
import xml.etree.cElementTree as et
import os
from os.path import dirname, join

nan = float('NaN')
__file__ = os.getcwd()

data = {}
with gzip.open(join(dirname(__file__), 'AN_Spark/data/World_Country_Boundaries.csv.gz')) as f:
    decoded = codecs.iterdecode(f, "utf-8")
    next(decoded)
    reader = csv.reader(decoded, delimiter=',', quotechar='"')
    for row in reader:
        geometry, code, name = row
        xml = et.fromstring(geometry)
        lats = []
        lons = []
        for i, poly in enumerate(xml.findall('.//outerBoundaryIs/LinearRing/coordinates')):
            if i > 0:
                lats.append(nan)
                lons.append(nan)
            coords = (c.split(',')[:2] for c in poly.text.split())
            lat, lon = list(zip(*[(float(lat), float(lon)) for lon, lat in coords]))
            lats.extend(lat)
            lons.extend(lon)
        data[code] = {
            'name' : name,
            'lats' : lats,
            'lons' : lons,
        }

In [5]:
len(data)

Out[5]:
235
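The NaN separator used in the loop above may be easier to follow on a tiny input. The following is a minimal sketch, assuming a hand-written KML-like fragment: the SAMPLE_KML string and the rings_to_lat_lon helper are hypothetical names introduced for illustration and are not part of the data set. It shows how several boundary rings of one country end up in a single pair of lists, with a NaN marking where one ring ends and the next begins.

```python
import math
import xml.etree.ElementTree as et

# Hypothetical geometry with two rings; as in KML, each pair is "lon,lat".
SAMPLE_KML = (
    "<MultiGeometry>"
    "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
    "0.0,10.0 1.0,11.0 2.0,12.0"
    "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
    "<Polygon><outerBoundaryIs><LinearRing><coordinates>"
    "5.0,20.0 6.0,21.0"
    "</coordinates></LinearRing></outerBoundaryIs></Polygon>"
    "</MultiGeometry>"
)


def rings_to_lat_lon(geometry):
    """Flatten every outer ring into (lats, lons), separated by NaN."""
    nan = float("nan")
    lats, lons = [], []
    root = et.fromstring(geometry)
    rings = root.findall(".//outerBoundaryIs/LinearRing/coordinates")
    for i, ring in enumerate(rings):
        if i > 0:
            # A NaN in both lists marks the break between two rings.
            lats.append(nan)
            lons.append(nan)
        for pair in ring.text.split():
            lon, lat = pair.split(",")[:2]
            lats.append(float(lat))
            lons.append(float(lon))
    return lats, lons


lats, lons = rings_to_lat_lon(SAMPLE_KML)
print(len(lats), math.isnan(lats[3]))
```

Keeping all rings of a country in one list means each country stays a single entry in the data source, which is convenient when one hover label or fill color should apply to the whole country.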
In step two, we load a sample set of important tweets that we wish to visualize with their respective geo-location information:
In [69]:
#
# data
#

In [8]:
import pandas as pd
csv_in = '/home/an/spark/spark-1.5.0-bin-hadoop2.6/examples/AN_Spark/data/spark_tweets_20.csv'
t20_df = pd.read_csv(csv_in, index_col=None, header=0, sep=',', encoding='utf-8')

In [9]:
t20_df.head(3)

Out[9]:
   id                  created_at                      user_id     user_name     \
0  638818911773856000  Tue Sep 01 21:01:11 +0000 2015  2511247075  Noor Din
1  622142176768737000  Fri Jul 17 20:33:48 +0000 2015  24537879    IBM Cloudant
2  622140453069169000  Fri Jul 17 20:26:57 +0000 2015  515145898   Arno Candel

   tweet_text                                         htag                                   \
0  RT @kdnuggets: R leads RapidMiner, Python catc...  [#KDN]
1  Be one of the first to sign-up for IBM Analyti...  [#ApacheSpark, #SparkInsight]
2  Nice article on #apachespark, #hadoop and #dat...  [#apachespark, #hadoop, #datascience]

   urls                                      ptxt                                               \
0  [://t.co/3bsaTT7eUs]                      r leads rapidminer python catches up big data ...
1  [://t.co/C5TZpetVA6, ://t.co/R1L29DePaQ]  be one of the first to sign up for ibm analyti...
2  [://t.co/IyF44pV0f3]                      nice article on apachespark hadoop and datasci...

   tgrp             date                 user_handles  \
0  [spark, python]  2015-09-01 21:01:11  [@kdnuggets]
1  [spark]          2015-07-17 20:33:48  []
2  [spark]          2015-07-17 20:26:57  [@h2oai]

   txt_terms                                          search_grp       lat        lon
0  r leads rapidminer python catches up big data ...  [spark, python]  37.279518  -121.867905
1  be one of the first to sign up for ibm analyti...  [spark]          37.774930  -122.419420
2  nice article on apachespark hadoop and datasci...  [spark]          51.500130  -0.126305

In [98]:
len(t20_df.user_id.unique())

Out[98]:
19

In [17]:
t20_geo = t20_df[['date', 'lat', 'lon', 'user_name', 'tweet_text']]

In [24]:
# Rename the columns to the short names used by the hover tool
t20_geo.rename(columns={'user_name':'user', 'tweet_text':'text' }, inplace=True)

In [25]:
t20_geo.head(4)

Out[25]:
   date                 lat        lon          user                 text
0  2015-09-01 21:01:11  37.279518  -121.867905  Noor Din             RT @kdnuggets: R leads RapidMiner, Python catc...
1  2015-07-17 20:33:48  37.774930  -122.419420  IBM Cloudant         Be one of the first to sign-up for IBM Analyti...
2  2015-07-17 20:26:57  51.500130  -0.126305    Arno Candel          Nice article on #apachespark, #hadoop and #dat...
3  2015-07-17 19:35:31  51.500130  -0.126305    Ira Michael Blonder  Spark 101: Running Spark and #MapReduce togeth...

In [22]:
df = t20_geo
In step three, we first import all the necessary Bokeh libraries and direct the output to the Jupyter Notebook. We then load the world country boundary information and the geo-located tweet data into Bokeh data sources. Finally, we declare the Bokeh interactive tools, such as wheel zoom and box zoom as well as the hover tool.
In [29]:
#
# Bokeh Visualization of tweets on world map
#
from bokeh.plotting import *
from bokeh.models import HoverTool, ColumnDataSource
from collections import OrderedDict

# Output in Jupyter Notebook
output_notebook()

# Get the world map
world_countries = data.copy()

# Get the tweet data
tweets_source = ColumnDataSource(df)

# Create world map
countries_source = ColumnDataSource(data= dict(
    countries_xs=[world_countries[code]['lons'] for code in world_countries],
    countries_ys=[world_countries[code]['lats'] for code in world_countries],
    country = [world_countries[code]['name'] for code in world_countries],
))

# Instantiate the bokeh interactive tools
TOOLS="pan,wheel_zoom,box_zoom,reset,resize,hover,save"
We are now ready to layer the various elements gathered into a figure object called p. We define the title, width, and height of p and attach the tools. We create the world map background as patches with a light fill color and borders, then scatter plot the tweets according to their respective geo-coordinates. Next, we activate the hover tool with the users and their respective tweets. Finally, we render the figure in the browser. The code is as follows:
# Instantiate the figure object
p = figure(
    title="%s tweets " %(str(len(df.index))),
    title_text_font_size="20pt",
    plot_width=1000,
    plot_height=600,
    tools=TOOLS)

# Create world patches background
p.patches(xs="countries_xs", ys="countries_ys", source = countries_source,
          fill_color="#F1EEF6", fill_alpha=0.3,
          line_color="#999999", line_width=0.5)

# Scatter plots by longitude and latitude
p.scatter(x="lon", y="lat", source=tweets_source,
          fill_color="#FF0000", line_color="#FF0000")

# Activate hover tool with user and corresponding tweet information
hover = p.select(dict(type=HoverTool))
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("user", "@user"),
    ("tweet", "@text"),
])

# Render the figure on the browser
show(p)

BokehJS successfully loaded.
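Each tooltip entry pairs a display label with a field reference of the form @column, where column must be the name of a column in the data source; this is why the tweet columns were renamed to user and text earlier. The following is a minimal sketch of building such pairs while checking the column names up front. The make_tooltips helper is a hypothetical name introduced here, not a Bokeh function, and it uses the column name itself as the label.

```python
def make_tooltips(columns, wanted):
    """Return (label, '@column') pairs, failing early on unknown columns."""
    missing = [name for name in wanted if name not in columns]
    if missing:
        raise KeyError("columns not in data source: %s" % ", ".join(missing))
    return [(name, "@" + name) for name in wanted]


# Column names of the tweet data frame after the rename step.
columns = ["date", "lat", "lon", "user", "text"]
tooltips = make_tooltips(columns, ["user", "text"])
print(tooltips)
```

The resulting list of tuples has the same shape as the items of the OrderedDict assigned to hover.tooltips. A misspelled column is then caught when the list is built rather than showing up later as a wrong or empty value in the hover box.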
The following figure gives an overview of the world map, with the red dots representing the locations of the tweets' origins:
We can hover on a specific dot to reveal the tweets in that location:
We can zoom into a specific location:
Finally, we can reveal the tweets in the given zoomed-in location:
Now, our objective is to focus on upcoming meetups in London. We map three meetups: Data Science London, Apache Spark, and Machine Learning. We embed a Google Map within a Bokeh visualization and geo-locate the three meetups according to their coordinates. A hover tool then reveals information such as the name of the upcoming event for each meetup.
First, import all the necessary Bokeh libraries:
In [ ]:
#
# Bokeh Google Map Visualization of London with hover on specific points
#
#
from __future__ import print_function

from bokeh.browserlib import view
from bokeh.document import Document
from bokeh.embed import file_html
from bokeh.models.glyphs import Circle
from bokeh.models import (
    GMapPlot, Range1d, ColumnDataSource,
    PanTool, WheelZoomTool, BoxSelectTool,
    HoverTool, ResetTool,
    BoxSelectionOverlay, GMapOptions)
from bokeh.resources import INLINE

x_range = Range1d()
y_range = Range1d()
We will instantiate the Google Map that will act as the substrate upon which our Bokeh visualization will be layered:
# JSON style string taken from: https://snazzymaps.com/style/1/pale-dawn
map_options = GMapOptions(lat=51.50013, lng=-0.126305, map_type="roadmap", zoom=13, styles="""
[{"featureType":"administrative","elementType":"all","stylers":[{"visibility":"on"},{"lightness":33}]},
 {"featureType":"landscape","elementType":"all","stylers":[{"color":"#f2e5d4"}]},
 {"featureType":"poi.park","elementType":"geometry","stylers":[{"color":"#c5dac6"}]},
 {"featureType":"poi.park","elementType":"labels","stylers":[{"visibility":"on"},{"lightness":20}]},
 {"featureType":"road","elementType":"all","stylers":[{"lightness":20}]},
 {"featureType":"road.highway","elementType":"geometry","stylers":[{"color":"#c5c6c6"}]},
 {"featureType":"road.arterial","elementType":"geometry","stylers":[{"color":"#e4d7c6"}]},
 {"featureType":"road.local","elementType":"geometry","stylers":[{"color":"#fbfaf7"}]},
 {"featureType":"water","elementType":"all","stylers":[{"visibility":"on"},{"color":"#acbcc9"}]}]
""")
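The styles argument above is a JSON string, which is easy to break when edited by hand. As an alternative, the rules can be kept as ordinary Python lists and dictionaries and serialized with the standard json module. The following is a minimal sketch that uses two of the rules from the style above; the names style_rules and styles_json are introduced here for illustration only.

```python
import json

# Two of the map style rules from the style above, as plain Python data.
style_rules = [
    {"featureType": "landscape", "elementType": "all",
     "stylers": [{"color": "#f2e5d4"}]},
    {"featureType": "water", "elementType": "all",
     "stylers": [{"visibility": "on"}, {"color": "#acbcc9"}]},
]

# Serialize to the JSON text expected by the styles argument.
styles_json = json.dumps(style_rules)
print(styles_json)
```

The resulting string could then be passed as styles=styles_json. A malformed rule fails when the Python data is written rather than silently producing an unstyled map.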
Instantiate the Bokeh object plot from the class GMapPlot with the dimensions and map options from the previous step:
# Instantiate Google Map Plot
plot = GMapPlot(
    x_range=x_range, y_range=y_range,
    map_options=map_options,
    title="London Meetups"
)
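Bring in the information for the three meetups we wish to plot. Each column of the data source holds one attribute (coordinates, color, name, and event title) that is used either to draw the points or to display information when hovering over them: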
source = ColumnDataSource(
    data=dict(
        lat=[51.49013, 51.50013, 51.51013],
        lon=[-0.130305, -0.126305, -0.120305],
        fill=['orange', 'blue', 'green'],
        name=['LondonDataScience', 'Spark', 'MachineLearning'],
        text=['Graph Data & Algorithms','Spark Internals','Deep Learning on Spark']
    )
)
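A column-oriented data source is a dictionary of equally long lists, where position i in every list describes the same point. When such a dictionary is typed by hand, it is easy to leave one list an item short. The following is a minimal sketch of a sanity check in plain Python; the check_columns helper and the meetups variable are hypothetical names introduced for illustration.

```python
def check_columns(columns):
    """Return the common column length, or raise if the lengths differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError("column lengths differ: %r" % lengths)
    return next(iter(lengths.values()), 0)


# Same columns as the meetup data above, as a plain dictionary.
meetups = dict(
    lat=[51.49013, 51.50013, 51.51013],
    lon=[-0.130305, -0.126305, -0.120305],
    fill=['orange', 'blue', 'green'],
    name=['LondonDataScience', 'Spark', 'MachineLearning'],
    text=['Graph Data & Algorithms', 'Spark Internals', 'Deep Learning on Spark'],
)

n_points = check_columns(meetups)
print(n_points)
```

Running the check before building the data source gives an error message that names the offending columns and their lengths.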
Define the dots to be drawn on the Google Map:
circle = Circle(x="lon", y="lat", size=15, fill_color="fill", line_color=None)
plot.add_glyph(source, circle)
Define the Bokeh tools to be used in this visualization and add them to the plot:
# TOOLS="pan,wheel_zoom,box_zoom,reset,hover,save"
pan = PanTool()
wheel_zoom = WheelZoomTool()
box_select = BoxSelectTool()
reset = ResetTool()
hover = HoverTool()
# save = SaveTool()
plot.add_tools(pan, wheel_zoom, box_select, reset, hover)

overlay = BoxSelectionOverlay(tool=box_select)
plot.add_layout(overlay)
Activate the hover tool with the information that will be carried:
hover = plot.select(dict(type=HoverTool))
hover.point_policy = "follow_mouse"
hover.tooltips = OrderedDict([
    ("Name", "@name"),
    ("Text", "@text"),
    ("(Long, Lat)", "(@lon, @lat)"),
])

show(plot)
Render the plot that gives a pretty good view of London:
Once we hover on a highlighted dot, we can get the information of the given meetup:
Full smooth zooming capability is preserved, as the following screenshot shows: