Chapter 6. Visualizing Insights and Trends

So far, we have focused on the collection, analysis, and processing of data from Twitter. We have set the stage to use our data for visual rendering and for extracting insights and trends. We will give a quick overview of the visualization tools in the Python ecosystem, and we will highlight Bokeh as a powerful tool for rendering and viewing large datasets. Bokeh is part of the Python Anaconda Distribution ecosystem.

In this chapter, we will cover the following points:

  • Gauging the keywords and memes within a social network community using charts and word clouds
  • Mapping the most active locations where communities are growing around certain themes or topics

Revisiting the data-intensive apps architecture

We have reached the final layer of the data-intensive apps architecture: the engagement layer. This layer focuses on how to synthesize, emphasize, and visualize the key context-relevant information for the data consumers. A bunch of numbers in a console will not suffice to engage end users. It is critical to present the mass of information in a rapid, digestible, and attractive fashion.

The following diagram sets the context for the chapter's focus by highlighting the engagement layer.

[Diagram: Revisiting the data-intensive apps architecture]

For Python plotting and visualizations, we have quite a few tools and libraries. The most interesting and relevant ones for our purpose are the following:

  • Matplotlib is the grandfather of the Python plotting libraries. It was originally the brainchild of John Hunter, an open source software proponent who established Matplotlib as one of the most prevalent plotting libraries in both the academic and the data science communities. Matplotlib allows the generation of plots, histograms, power spectra, bar charts, error charts, scatterplots, and so on. Examples can be found on the dedicated Matplotlib website at http://matplotlib.org/examples/index.html.
  • Seaborn, developed by Michael Waskom, is a great library for quickly visualizing statistical information. It is built on top of Matplotlib and integrates seamlessly with pandas and the Python data stack, including NumPy. A gallery of graphs from Seaborn at http://stanford.edu/~mwaskom/software/seaborn/examples/index.html shows the potential of the library.
  • ggplot is relatively new and aims to offer Python data wranglers the equivalent of the famous ggplot2 from the R ecosystem. It has the same look and feel as ggplot2 and uses the same grammar of graphics as expounded by Hadley Wickham. The Python port of ggplot is developed by the team at yhat. More information can be found at http://ggplot.yhathq.com.
  • D3.js is a very popular JavaScript library developed by Mike Bostock. D3 stands for Data-Driven Documents and brings data to life in any modern browser by leveraging HTML, SVG, and CSS. It delivers dynamic, powerful, interactive visualizations by manipulating the DOM, the Document Object Model. The Python community could not wait to integrate D3 with Matplotlib. Under the impulse of Jake Vanderplas, mpld3 was created with the aim of bringing Matplotlib to the browser. Example graphics are hosted at the following address: http://mpld3.github.io/index.html.
  • Bokeh aims to deliver high-performance interactivity over very large or streaming datasets while leveraging many of the concepts of D3.js, without the burden of writing intimidating JavaScript and CSS code. Bokeh delivers dynamic visualizations in the browser with or without a server. It integrates seamlessly with Matplotlib, Seaborn, and ggplot and renders beautifully in IPython or Jupyter notebooks. Bokeh is actively developed by the team at Continuum.io and is an integral part of the Anaconda Python data stack.

The Bokeh server provides a full-fledged, dynamic plotting engine that materializes a reactive scene graph from JSON. It uses web sockets to keep state and updates the HTML5 canvas using Backbone.js and CoffeeScript under the hood. Because Bokeh is fueled by data in JSON, it is easy to create bindings for other languages such as R, Scala, and Julia.
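To make the JSON point concrete, the following is a minimal sketch, assuming a reasonably recent Bokeh release in which `bokeh.embed.json_item` is available. It builds a small bar chart and serializes it to a JSON string that the browser-side library (BokehJS) could render. The hashtag counts are invented for illustration, and `hashtag-plot` is an assumed id for the target element in a web page; neither comes from the chapter's data.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# Hypothetical hashtag counts, invented for illustration only.
terms = ["spark", "python", "bokeh"]
counts = [42, 30, 12]

# A categorical x-axis needs the list of categories passed as x_range.
plot = figure(x_range=terms, title="Hashtag counts (illustrative data)")
plot.vbar(x=terms, top=counts, width=0.8)

# json_item turns the plot into a plain dict describing the scene;
# "hashtag-plot" is an assumed id for the <div> that would host the plot.
item = json_item(plot, "hashtag-plot")

# The dict can be sent to a browser as an ordinary JSON string.
payload = json.dumps(item)
```

On the page side, the string would be parsed and handed to BokehJS for rendering. Because the description of the plot is plain JSON, nothing in it is specific to Python.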

This gives a high-level overview of the main plotting and visualization libraries. It is not exhaustive. Let's move to concrete examples of visualizations.
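As a minimal sketch of what such a visualization looks like in code, the following uses Matplotlib to draw a bar chart of term frequencies. The term counts are hypothetical, and the helper name `plot_term_counts` is an assumption introduced here for illustration. The non-interactive `Agg` backend is selected so that the snippet also runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: render without a display
import matplotlib.pyplot as plt


def plot_term_counts(term_counts):
    """Draw a bar chart of term frequencies and return the figure."""
    terms = list(term_counts)
    counts = [term_counts[term] for term in terms]
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(terms, counts)  # one bar per term
    ax.set_xlabel("Term")
    ax.set_ylabel("Occurrences")
    ax.set_title("Term frequencies (illustrative data)")
    fig.tight_layout()
    return fig


# Hypothetical counts, invented for illustration only.
sample_counts = {"spark": 42, "python": 30, "bokeh": 12}
figure_obj = plot_term_counts(sample_counts)

# Write the chart as PNG into an in-memory buffer instead of a file.
buffer = io.BytesIO()
figure_obj.savefig(buffer, format="png")
```

In a notebook, the figure would be displayed inline instead of being written to a buffer. Calling `figure_obj.savefig("terms.png")` would write it to a file.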
