Chapter 8. Advanced Visualization

Visualization methods have transformed from the traditional bar and pie graphs several decades ago to much more creative forms lately. Designing visualization is not as straightforward as picking one from the many choices that a particular tool offers. The right visualization conveys the right message, and the wrong visualization may distort, confuse, or convey the wrong message.

Computers are useful not only for storing large amounts of data in data structures, but also for processing that data with algorithms. According to Michael Bostock, the creator of D3.js and a leading visualization expert, we should visualize the algorithm and not just the data that feeds into it. An algorithm is the core engine behind any process or computational model; therefore, the algorithm itself has become an important use case for visualization.

Visualizing algorithms has gained recognition only in the past few years. One interesting place to explore this concept is visualgo.net, which uses animated visualizations to teach data structures and algorithms. Visualgo covers algorithms that can be found in Dr. Steven Halim's book titled Competitive Programming. Similar visualizations have been made available by Prof. David Galles of the University of San Francisco (https://www.cs.usfca.edu/~galles/visualization/). There are other such contributions for teaching algorithms and data structures.

In the previous chapters, we discussed many different areas, including numerical computing, financial models, statistical and machine learning, and network models. In this chapter, we will discuss some new and creative ideas about visualization along with some simulation and signal processing examples. We will cover the following topics:

  • Computer simulation, signal processing, and animation examples
  • Some interesting visualization methods using HTML5
  • How Julia differs from Python: advantages and disadvantages
  • Why D3.js is such a popular visualization tool when compared with Python
  • Tools to create dashboards

Computer simulation

Computer simulation is a discipline that has been gaining popularity for several decades. A simulation is a computer program that attempts to imitate an abstract model of a system. Simulation models can assist in the design of complex systems and offer a way to understand and evaluate hidden or unknown scenarios. Some notable examples of computer simulation modeling are weather forecasting and aircraft simulators used for training pilots.

Computer simulations have become a very productive part of mathematical modeling of systems in diverse fields, such as physics, chemistry, biology, economics, engineering, psychology, and social science.

Here are the benefits of simulation models:

  • Gaining a better understanding of an algorithm or process that is being studied
  • Identifying the problem areas in the processes and algorithm
  • Evaluating the impact of changes in anything that relates to the algorithmic model

The types of simulation models are as follows:

  • Discrete models: In this, changes to the system occur only at specific times
  • Continuous models: In this, the state of the system changes continuously over a period of time
  • Mixed models: This contains both discrete and continuous elements

In order to conduct a simulation, it is common to use random probabilistic inputs because it is unlikely that you would have real data before any such simulation experiment is performed. Simulation experiments therefore commonly involve random numbers, whether the underlying model is deterministic or not.

To begin with, let's consider several options to generate random numbers in Python and illustrate one or more examples in simulation.

Python's random package

Python provides a package called random that has several convenient functions that can be used for the following:

  • To generate random real numbers between 0.0 and 1.0, or between specific start and end values
  • To generate random integers between specific ranges of numbers
  • To get a list of random values from a list of numbers or letters

import random

print(random.random())  # real number between 0.0 and 1.0
print(random.uniform(2.54, 12.2))  # real number between 2.54 and 12.2
print(random.randint(5, 10))  # random integer between 5 and 10 (both included)

print(random.randrange(25))  # random integer from 0 to 24
# random number from the range of 5 to 500 with step 5 (500 excluded)
print(random.randrange(5, 500, 5))

# three random numbers from the list
print(random.sample([13, 15, 29, 31, 43, 46, 66, 89, 90, 94], 3))
# random choice from a list
print(random.choice([1, 2, 3, 5, 9]))
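
As a first taste of how these functions feed a simulation, the following is a minimal sketch of a Monte Carlo estimate of pi. The function name estimate_pi, the sample size, and the seed are illustrative assumptions rather than part of the original text; only random.Random and its random() method come from the standard library.

```python
import random


def estimate_pi(num_points, seed=None):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # private generator; the global one is untouched
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point falls inside the quarter circle
            inside += 1
    # (area of quarter circle) / (area of unit square) = pi / 4
    return 4.0 * inside / num_points


pi_estimate = estimate_pi(100000, seed=42)
print(pi_estimate)
```

The estimate improves slowly: the error shrinks roughly with the square root of the number of points, which is typical of simulations driven by random inputs.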

SciPy's random functions

NumPy and SciPy are Python modules that consist of mathematical and numerical routines. The Numeric Python (NumPy) package provides basic routines to manipulate large arrays and matrices of numeric data. The scipy package extends NumPy with algorithms and mathematical techniques.

NumPy has a built-in pseudorandom number generator. The numbers are pseudorandom, which means that they are generated deterministically from a single seed number. Using the same seed number, you can generate the same set of random numbers, as shown in the following code:

import numpy as np
np.random.seed(65536)

A different random sequence can be generated by not providing the seed value. NumPy then picks a seed automatically (from the operating system's entropy source if one is available, otherwise from the clock), so the sequence differs every time the program is run:

np.random.seed()
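
The effect of seeding can be checked directly. The following sketch (the variable names are illustrative) draws three numbers, resets the generator to the same seed, and draws again:

```python
import numpy as np

np.random.seed(65536)
first = np.random.rand(3)

np.random.seed(65536)       # reset to the same seed
second = np.random.rand(3)  # replays the same three numbers

np.random.seed()            # reseed from a fresh, unpredictable source
third = np.random.rand(3)   # almost surely differs from the first two draws

print(np.array_equal(first, second))
```

Fixing the seed is useful when a simulation has to be repeatable, for example when debugging or when comparing two model variants on identical random inputs.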

An array of five random numbers in the half-open interval [0.0, 1.0) can be generated as follows:

import numpy as np
np.random.rand(5)
#generates the following
array([ 0.2611664,  0.7176011,  0.1489994,  0.3872102,  0.4273531])

The rand function can be used to generate random two-dimensional arrays as well, as shown in the following code:

np.random.rand(2,4) 
array([
[0.83239852, 0.51848638, 0.01260612, 0.71026089],        
[0.20578852, 0.02212809, 0.68800472, 0.57239013]])

To generate random integers, you can use randint(low, high), where low and high define the range from which the random integer is drawn. The lower bound is inclusive and the upper bound is exclusive, so the following call returns an integer from 4 to 17:

np.random.randint(4,18) 

Use the following code to draw a sample from the discrete Poisson distribution with λ = 8.0:

np.random.poisson(8.0)

To draw from a continuous normal (Gaussian) distribution with the mean as μ = 2.5 and the standard deviation as σ = 3.0, use the following code:

np.random.normal(2.5, 3.0)

# for mean 0 and variance 1
np.random.normal()
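
A quick sanity check on such draws is to generate many samples at once and compare the sample statistics with the distribution parameters. The sketch below reuses the arguments of the calls above; the seed and the sample size are arbitrary assumptions.

```python
import numpy as np

np.random.seed(12345)  # arbitrary seed, only to make the run repeatable

normal_draws = np.random.normal(2.5, 3.0, size=100000)
poisson_draws = np.random.poisson(8.0, size=100000)

# Sample mean and standard deviation should sit close to 2.5 and 3.0.
print(normal_draws.mean(), normal_draws.std())
# For a Poisson distribution, the mean and the variance both equal lambda.
print(poisson_draws.mean(), poisson_draws.var())
```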

Simulation examples

In the first example, we will select geometric Brownian motion, which is also known as exponential Brownian motion, to model the stock price behavior with the following stochastic differential equation (SDE):

$$dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$$

In the preceding equation, W_t is a Brownian motion, μ is the percentage drift, and σ is the percentage volatility. The following code simulates and plots several sample paths:

from numpy.random import standard_normal
from numpy import zeros, sqrt
import matplotlib.pyplot as plt

S_init = 20.222
T =1
tstep =0.0002
sigma = 0.4
mu = 1
NumSimulation=6

colors = [ (214,27,31), (148,103,189), (229,109,0), (41,127,214), 
(227,119,194),(44,160,44),(227,119,194), (72,17,121), (196,156,148)]  

# Scale the RGB values to the [0, 1] range.

for i in range(len(colors)):  
    r, g, b = colors[i]  
    colors[i] = (r / 255., g / 255., b / 255.)

plt.figure(figsize=(12,12))

Steps = int(round(T / tstep))  # number of time steps
S = zeros([NumSimulation, Steps], dtype=float)
x = range(0, Steps, 1)

for j in range(0, NumSimulation, 1):

    S[j,0]= S_init
    for i in x[:-1]:
        S[j,i+1] = (S[j,i] + S[j,i]*(mu-0.5*pow(sigma,2))*tstep +
                    sigma*S[j,i]*sqrt(tstep)*standard_normal())
    plt.plot(x, S[j], linewidth=2., color=colors[j])

plt.title('%d Simulation using %d Steps, '
          r'$\sigma$=%.6f $\mu$=%.6f $S_0$=%.6f ' % (int(NumSimulation), int(Steps), sigma, mu, S_init),
          fontsize=18)
plt.xlabel('steps', fontsize=16)
plt.grid(True)
plt.ylabel('stock price', fontsize=16)
plt.ylim(0,90)

plt.show()

The following plot shows the results of six simulations using Brownian motion:

[Figure: six simulated stock price paths plotted against the number of steps]
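
Geometric Brownian motion also has a closed-form solution, S_t = S_0 exp((μ − σ²/2) t + σ W_t), so paths can be sampled without stepping through a loop. The following is a minimal vectorized sketch under that assumption. It is an alternative to the step-by-step listing above, not a restatement of it; the function name simulate_gbm and the seed are hypothetical, and the parameter values are borrowed from the listing.

```python
import numpy as np


def simulate_gbm(s_init, mu, sigma, horizon, num_steps, num_paths, seed=None):
    """Sample geometric Brownian motion paths from the closed-form solution."""
    rng = np.random.RandomState(seed)
    dt = horizon / float(num_steps)
    # Independent Gaussian increments of the Brownian motion W_t.
    dw = rng.standard_normal((num_paths, num_steps)) * np.sqrt(dt)
    # W_0 = 0; cumulative sums of the increments give W_t on the time grid.
    w = np.concatenate([np.zeros((num_paths, 1)), np.cumsum(dw, axis=1)], axis=1)
    t = np.linspace(0.0, horizon, num_steps + 1)
    return s_init * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * w)


paths = simulate_gbm(20.222, mu=1.0, sigma=0.4, horizon=1.0,
                     num_steps=5000, num_paths=6, seed=0)
print(paths.shape)
```

Each row of paths is one simulated price series and can be passed to plt.plot in the same way as S[j] in the listing.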

Another simulation example here demonstrates how you can apply the Hodrick–Prescott filter to get a smoothed curve representation of the stock price data that falls under the class of time series data:

[Figure: stock price series together with its Hodrick–Prescott smoothed curve]

Here, we will use the finance subpackage in matplotlib to generate the stock price data for a range of dates with the start date as May 2012 and the end date as Dec 2014. Using the hold method of matplotlib, you can show the smoothed curve together with the stock price plot, as shown in the following code:

import datetime

from matplotlib import finance
import matplotlib.pyplot as plt

import statsmodels.api as sm

titleStr='Stock price of FB from May. 2012 to Dec. 2014'
plt.figure(figsize=(11,10))

dt1 = datetime.datetime(2012, 5, 1)
dt2 = datetime.datetime(2014, 12, 1)
sp=finance.quotes_historical_yahoo('FB',dt1,dt2,asobject=None)

plt.title(titleStr, fontsize=16) 
plt.xlabel("Days", fontsize=14) 
plt.ylabel("Stock Price", fontsize=14)

xfilter = sm.tsa.filters.hpfilter(sp[:,2], lamb=100000)[1]

plt.plot(sp[:,2])
plt.hold(True)
plt.plot(xfilter,linewidth=5.)
plt.show()
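
The hpfilter function returns the cyclical component and the trend, in that order, which is why the listing takes element [1]. To make explicit what the trend is, here is a minimal NumPy-only sketch of the Hodrick–Prescott computation on synthetic data. The function name hp_trend and the synthetic series are assumptions for illustration; this is not the statsmodels implementation, which is better suited to long series.

```python
import numpy as np


def hp_trend(series, lamb):
    """Return the Hodrick-Prescott trend of a one-dimensional series."""
    y = np.asarray(series, dtype=float)
    n = y.size
    # (n-2) x n second-difference operator: each row looks like [1, -2, 1].
    d = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    d[rows, rows] = 1.0
    d[rows, rows + 1] = -2.0
    d[rows, rows + 2] = 1.0
    # Minimizing |y - tau|^2 + lamb * |D tau|^2 gives (I + lamb * D'D) tau = y.
    return np.linalg.solve(np.eye(n) + lamb * d.T.dot(d), y)


rng = np.random.RandomState(1)
noisy = np.linspace(20.0, 80.0, 300) + rng.normal(0.0, 3.0, 300)  # synthetic prices
trend = hp_trend(noisy, lamb=100000)
cycle = noisy - trend
```

A larger lamb penalizes curvature more heavily and therefore gives a smoother trend; a perfectly linear series has no curvature and is returned unchanged.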

In addition to these examples, you can simulate a queue system or any process that is event-based. For instance, you can simulate a neural network, and one such package that helps to model one quickly is available at http://briansimulator.org. Take a look at their demo programs for more details.
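
As an illustration of the queue idea, the following sketch simulates the waiting times of customers at a single server with random arrival and service times. It uses only the standard library; the function name simulate_queue, the rates, and the seed are assumptions chosen for the example.

```python
import random


def simulate_queue(num_customers, arrival_rate, service_rate, seed=None):
    """Return the waiting time in queue of each customer (one server, FIFO)."""
    rng = random.Random(seed)
    waits = [0.0]  # the first customer finds the server idle
    for _ in range(num_customers - 1):
        service = rng.expovariate(service_rate)  # time to serve the current customer
        gap = rng.expovariate(arrival_rate)      # time until the next arrival
        # Lindley recursion: work left over when the next customer arrives.
        waits.append(max(0.0, waits[-1] + service - gap))
    return waits


waits = simulate_queue(50000, arrival_rate=0.8, service_rate=1.0, seed=7)
print(sum(waits) / len(waits))
```

For exponential arrivals and services at these rates, queueing theory gives a long-run mean wait of 0.8 / (1.0 × (1.0 − 0.8)) = 4.0, so the printed average should land in that neighborhood, with noticeable run-to-run variation.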

Signal processing

There are many examples in signal processing that you could think of, but we will choose one specific example that involves convolution. A convolution of two signals is a way to combine them to produce a filtered third signal. In practice, convolutions are applied to smooth images. Convolution is also widely used to calculate signal interference. For more details, you can refer to a book on microwave measurements, but we will attempt to show you some simple examples.

Let's consider three simple examples here. The first example convolves a digital (rectangular pulse) signal with a normalized Hamming window, which stands in for an analog signal, as shown in the following code:

import matplotlib.pyplot as plt
from numpy import concatenate, zeros, ones, hamming, convolve

digital = concatenate ( (zeros(20), ones(25), zeros(20)))
norm_hamming = hamming(80)/sum(hamming(80))
res = convolve(digital, norm_hamming)
plt.figure(figsize=(10,10))
plt.ylim(0, 0.6)
plt.plot(res, color='r', linewidth=2)
plt.plot(digital, color='b', linewidth=3)
plt.plot(norm_hamming, color='g', linewidth=4)
plt.show()

In this example, we use concatenate, zeros, and ones from numpy to produce the digital signal, hamming to produce the analog signal, and convolve to apply the convolution.

If we plot all three signals, that is, the digital signal, the normalized Hamming window, and the convolved result (res), the resulting signal is shifted as expected, as shown in the following graph:

[Figure: digital signal, normalized Hamming window, and the convolved result]
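
The shift comes from the default 'full' mode of convolve, which returns every point of overlap and is therefore longer than the input. The sketch below illustrates this with a shorter, odd-length window (the window length of 21 is an assumption made for the example) and compares it with mode='same', which keeps only the centered part.

```python
import numpy as np

digital = np.concatenate((np.zeros(20), np.ones(25), np.zeros(20)))  # 65 samples
window = np.hamming(21)
kernel = window / window.sum()  # unit area, so the pulse keeps its total

full = np.convolve(digital, kernel)               # default mode='full'
same = np.convolve(digital, kernel, mode='same')  # centered, input length

# 'full' has len(digital) + len(kernel) - 1 samples.
print(len(digital), len(full), len(same))
# 'same' is the middle of 'full', offset by half the kernel length.
offset = (len(kernel) - 1) // 2
print(np.allclose(same, full[offset:offset + len(digital)]))
```

Plotting same instead of full lines the smoothed pulse up with the original digital signal.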

In another example, we will use a random signal, random_data, and apply the fast Fourier transform (FFT) as follows:

import matplotlib.pyplot as plt
from numpy.random import randn
from numpy import fft

plt.figure(figsize=(10,10))
random_data = randn(500)
res = fft.fft(random_data)
plt.plot(res, color='b')
plt.plot(random_data, color='r')
plt.show()

Here, randn from numpy.random generates the random signal data and fft from numpy performs the fast Fourier transform. The result of the transform is plotted in blue and the original random signal is plotted in red using matplotlib, as shown in the following image:

[Figure: FFT result (blue) and the original random signal (red)]
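
The output of the FFT is complex, so in practice it is usually the magnitude of the coefficients that is inspected. The following sketch shows this on a signal with known content, a 50 Hz tone buried in noise; the sampling rate, the tone frequency, the noise level, and the seed are all assumptions made for the illustration.

```python
import numpy as np

sample_rate = 1000                               # samples per second (assumed)
t = np.arange(sample_rate) / float(sample_rate)  # one second of samples
rng = np.random.RandomState(3)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.randn(t.size)  # 50 Hz tone + noise

spectrum = np.fft.rfft(signal)                   # complex coefficients, real input
freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
magnitude = np.abs(spectrum)                     # the quantity that is usually plotted

peak_freq = freqs[np.argmax(magnitude)]          # expected to be the 50 Hz bin
print(peak_freq)
```

Plotting magnitude against freqs gives a spectrum with a single dominant peak, whereas the noise spreads thinly over all frequencies.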

In the third example, we show a simple illustration of how to create an inverted image using the scipy package. Before we get to the actual Python code and the results, let's try to analyze how an inverted image helps in visualizing data.

It is argued that, in certain cases, inverted colors put less strain on our vision and are more comfortable to look at. Surprisingly, if we place the original image and the inverted image side by side, the inverted image can reveal certain areas that are otherwise difficult to see in the original, if not for all images, then at least in certain cases. The following code shows how you can convert an image to an inverted image using scipy.misc.pilutil.Image():

import scipy.misc as scm 
from scipy.misc.pilutil import Image  

# open original image 
orig_image = Image.open('/Users/kvenkatr/Desktop/filter.jpg')

# extract image data into array
image1 = scm.fromimage(orig_image)
# invert array values 
inv_image = 255 - image1

# using inverted array values, convert image 
inverted_image = scm.toimage(inv_image) 

#save inverted image
inverted_image.save('/Users/kvenkatr/Desktop/filter_invert.jpg')

The inverted image result is shown along with the original image here:

[Figure: original image and inverted image side by side]
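
The inversion itself is plain array arithmetic and does not depend on how the image was loaded. The following sketch applies the same 255 − value step to a tiny hand-made array that stands in for real pixel data (the pixel values are arbitrary assumptions):

```python
import numpy as np

# A tiny 2 x 2 RGB "image" with 8-bit channels stands in for real pixel data.
image = np.array([[[0, 0, 0], [255, 255, 255]],
                  [[255, 0, 0], [10, 200, 90]]], dtype=np.uint8)

inverted = 255 - image  # each channel value v becomes 255 - v

print(inverted[1, 1])   # the pixel (10, 200, 90) becomes (245, 55, 165)
```

Inverting twice returns the original array, so no information is lost in the process.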

Similarly, other filtering mechanisms can be applied to any image using some of the following functions:

convolve()         Multidimensional convolution
correlate()        Multidimensional correlation
gaussian_filter()  Multidimensional Gaussian filter

A full list of functions is shown at http://tinyurl.com/3xubv9p.

Animation

You can accomplish animation in Python using matplotlib; in the following example, the result is also saved to a file in the MP4 format so that it can be replayed later. The basic setup for the animation is as follows:

import numpy as np 
import matplotlib.pyplot as plt 
from matplotlib import animation  

# Set up the figure, axis, and the plot element to be animated 
fig = plt.figure() 
ax = plt.axes(xlim=(0, 3.2), ylim=(-2.14, 2.14)) 
line, = ax.plot([], [], lw=2)

The preceding code imports the animation package from matplotlib, sets up the axes, and prepares the plotting variable (initially just an empty line). Next, define the initialization function:

# initialization function: plot the background of each frame
def init():
    line.set_data([], [])
    return line,

The initialization needs to be performed before starting any animation because it creates the base frame. The animation function is defined as follows:

# animation function.  This is called sequentially
def animate(i):
    x = np.linspace(0, 2, 1000)
    xval = 2 * np.pi * (x - 0.01 * i)
    y = np.cos(xval) # Here we are trying to animate cos function
    line.set_data(x, y)
    return line,

The animate function takes the frame number as the input, computes the changed x and y values, and updates the plotting variable. Finally, create the animation and save it:

anim = animation.FuncAnimation(fig, animate, init_func=init,
            frames=200, interval=20, blit=True)
anim.save('basic_animation.mp4', fps=30)
plt.show()

The actual animation object is created via FuncAnimation, which is passed the figure, the animate() and init() functions, the number of frames, and the time interval between frames in milliseconds. The frames per second (fps) value is passed to save() and sets the playback rate of the saved file. The blit=True parameter means that only the changed parts of the display are redrawn (otherwise, one may see flickers).

Before you attempt to save an animation, make sure that mencoder or ffmpeg is installed; otherwise, running this program results in the following error: ValueError: Cannot save animation: no writers are available. Please install mencoder or ffmpeg to save animations. The following image shows an animation of trigonometric curves, such as sin or cos:
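
It can help to look at what the frames contain without opening a window. The sketch below recomputes the data that animate(i) produces (the helper name frame is hypothetical) and works out how long the animation runs with the parameter values used above.

```python
import numpy as np


def frame(i, num_points=1000):
    """Return the (x, y) data that animate(i) hands to line.set_data."""
    x = np.linspace(0, 2, num_points)
    y = np.cos(2 * np.pi * (x - 0.01 * i))
    return x, y


# The wave moves 0.01 units per frame and has period 1,
# so the picture repeats every 100 frames.
_, y0 = frame(0)
_, y100 = frame(100)
print(np.allclose(y0, y100))

frames, interval_ms, fps = 200, 20, 30
print(frames * interval_ms / 1000.0)  # seconds per pass in the live window
print(frames / float(fps))            # seconds of video in the saved file
```

With 200 frames, the cosine wave therefore completes two full cycles per pass.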

[Figure: a frame from the animated cosine curve]

You can embed this MP4 file in an HTML page for display and press the play button in the bottom-left corner to see the animation.

There is an interesting demonstration of a double pendulum animation by Jake Vanderplas at https://jakevdp.github.io/blog/2012/08/18/matplotlib-animation-tutorial/ and a dynamic image animation at http://matplotlib.org/examples/animation/dynamic_image2.html.

So far in this book, we have discussed visualization methods that involve plotting in Python or creating external formats (such as MP4). One of the reasons why JavaScript-based visualization methods are popular is that you can present them on the Web and also associate event-driven animation with them. Scalable Vector Graphics (SVG) is gaining popularity for many reasons, one of which is the ability to scale to any size without losing detail.

Visualization methods using HTML5

A simple illustration of SVG to display circles using feGaussianBlur is shown in the following code:

  <svg width="230" height="120" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <filter id="blurMe">
       <feGaussianBlur in="SourceGraphic" stdDeviation="5" />
    </filter>

    <circle cx="60"  cy="80" r="60" fill="#E90000" />
    <circle cx="190" cy="80" r="60" fill="#E90000"
      filter="url(#blurMe)" />
    <circle cx="360"  cy="80" r="60" fill="#4E9B01" />
    <circle cx="490" cy="80" r="60" fill="#4E9B01"
      filter="url(#blurMe)" />
    <circle cx="660"  cy="80" r="60" fill="#0080FF" />
    <circle cx="790" cy="80" r="60" fill="#0080FF"
      filter="url(#blurMe)" />
  </svg>

The first two circles are drawn with the radius as 60 and are filled with the same color, but the second circle uses the blurring filter. Similarly, adjacent circles in green and blue also follow the same behavior (for a colored effect, refer to http://knapdata.com/dash/html/svg_circle.html), as shown in the following image:

[Figure: pairs of red, green, and blue circles; the second circle of each pair is blurred]
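
Markup of this kind can also be generated programmatically. The following is a minimal sketch that builds an equivalent set of circles with Python's standard xml.etree.ElementTree module; the function name blurred_pairs_svg and the layout arithmetic (gap, pitch) are assumptions made for the illustration, and the canvas is sized so that all the circles fit.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def blurred_pairs_svg(colors, radius=60, gap=10):
    """Build an SVG with one sharp and one blurred circle per color."""
    ET.register_namespace("", SVG_NS)
    pitch = 2 * radius + gap  # horizontal distance between circle centers
    width = pitch * 2 * len(colors) + gap
    svg = ET.Element("{%s}svg" % SVG_NS,
                     {"width": str(width), "height": str(2 * radius + 2 * gap)})
    blur = ET.SubElement(svg, "{%s}filter" % SVG_NS, {"id": "blurMe"})
    ET.SubElement(blur, "{%s}feGaussianBlur" % SVG_NS,
                  {"in": "SourceGraphic", "stdDeviation": "5"})
    for index in range(2 * len(colors)):
        attrs = {"cx": str(gap + radius + index * pitch),
                 "cy": str(gap + radius), "r": str(radius),
                 "fill": colors[index // 2]}
        if index % 2 == 1:  # every second circle is blurred
            attrs["filter"] = "url(#blurMe)"
        ET.SubElement(svg, "{%s}circle" % SVG_NS, attrs)
    return ET.tostring(svg, encoding="unicode")


markup = blurred_pairs_svg(["#E90000", "#4E9B01", "#0080FF"])
print(markup)
```

Building the tree with a library, rather than concatenating strings, guarantees that the tags are balanced and the attribute values are escaped.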

How can we use this blurring concept when the data presentation calls for a parts-of-whole visualization, but the parts do not add up to a whole? What does this mean? Let's consider two examples. In the first example, we'll consider a class of students enrolled in foreign languages (in some cases, more than one language). If we were to represent the distribution as follows, how would we do it?

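[Figure: four partially filled circles showing the share of students enrolled in each language (English 54%, German 42%, Spanish 65%, Japanese 27%)]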

You can generate the SVG format via a Python program, as shown in the following code:

import os
display_prog = 'more' # Command to execute to display images.
svcount=1

class Scene:
    def __init__(self,name="svg",height=400,width=1200):
        self.name = name
        self.items = []
        self.height = height
        self.width = width
        return

    def add(self,item): self.items.append(item)

    def strarray(self):
        var = ['<html>\n<body>\n<svg height="%d" width="%d" >\n' % (self.height,self.width),
               '  <g id="settings">\n',
               '    <filter id="dropshadow" height="160%">\n',
               '     <feGaussianBlur in="SourceAlpha" stdDeviation="5"></feGaussianBlur>\n',
               '       <feOffset dx="0" dy="3" result="offsetblur"></feOffset>\n',
               '       <feMerge>\n',
               '          <feMergeNode></feMergeNode>\n',
               '          <feMergeNode in="SourceGraphic"></feMergeNode>\n',
               '       </feMerge>\n',
               '    </filter>\n']
        for item in self.items: var += item.strarray()
        var += [' </g>\n</svg>\n</body>\n</html>']
        return var

    def write_svg(self,filename=None):
        if filename:
            self.svgname = filename
        else:
            self.svgname = self.name + ".html"
        file = open(self.svgname,'w')
        file.writelines(self.strarray())
        file.close()
        return

    def display(self,prog=display_prog):
        os.system("%s %s" % (prog,self.svgname))
        return        

def colorstr(rgb): return "#%x%x%x" % (rgb[0]//16,rgb[1]//16,rgb[2]//16)

class Text:
    def __init__(self, x,y,txt, color, isItbig, isBold):
        self.x = x
        self.y = y
        self.txt = txt
        self.color = color
        self.isItbig = isItbig 
        self.isBold = isBold
    def strarray(self):
        if ( self.isItbig == True ):
          if ( self.isBold == True ):
            retval = [' <text y="%d" x="%d" style="font-size:18px;font-weight:bold;fill:%s">%s</text>\n' %(self.y, self.x, self.color,self.txt) ]
          else:
            retval = [' <text y="%d" x="%d" style="font-size:18px;fill:%s">%s</text>\n' %(self.y, self.x, self.color,self.txt) ]
        else:
          if ( self.isBold == True ):
            retval = [' <text y="%d" x="%d" style="fill:%s;font-weight:bold;">%s</text>\n' %(self.y, self.x, self.color,self.txt) ]
          else:
            retval = [' <text y="%d" x="%d" style="fill:%s">%s</text>\n' %(self.y, self.x, self.color,self.txt) ]
        return retval

class Circle:
    def __init__(self,center,radius,color, perc):
        self.center = center #xy tuple
        self.radius = radius #scalar
        self.color = color   #rgb tuple in range(0,256)
        self.perc = perc     #percentage of the circle to fill
        return

    def strarray(self):
        global svcount
        diam = self.radius+self.radius
        # top edge of the clipping rectangle; a lower percentage starts further down
        fillamt = self.center[1]-self.radius - 6 + (100.0 - self.perc)*1.9
        xpos = self.center[0] - self.radius
        retval = ['  <circle cx="%d" cy="%d" r="%d"\n' %
                (self.center[0],self.center[1],self.radius),
                '    style="stroke: %s;stroke-width:2;fill:white;filter:url(#dropshadow)"  />\n' % colorstr(self.color),
               '  <circle clip-path="url(#dataseg-%d)" fill="%s" cx="%d" cy="%d" r="%d"\n' %
                (svcount, colorstr(self.color),self.center[0],self.center[1],self.radius),
                '    style="stroke:rgb(0,0,0);stroke-width:0;z-index:10000;"  />\n',
               '<clipPath id="dataseg-%d"> <rect height="%d" width="%d" y="%d" x="%d"></rect>' %(svcount,diam, diam,fillamt,xpos),
               '</clipPath>\n'
                ]
        svcount += 1
        return retval

def languageDistribution():
    scene = Scene('test')
    scene.add(Circle((140,146),100,(0,128,0),54))
    scene.add(Circle((370,146),100,(232,33,50),42))
    scene.add(Circle((600,146),100,(32,119,180),65))
    scene.add(Circle((830,146),100,(255,128,0),27))
    scene.add(Text(120,176,"English", "white", False, True))
    scene.add(Text(120,196,"Speaking", "#e2e2e2", False, False))
    scene.add(Text(340,202,"German", "black", False, True))
    scene.add(Text(576,182,"Spanish", "white", False, True))
    scene.add(Text(804,198,"Japanese", "black", False, True))

    scene.add(Text(120,88,"54%", "black", True, True))
    scene.add(Text(350,88,"42%", "black", True, True))
    scene.add(Text(585,88,"65%", "black", True, True))
    scene.add(Text(815,88,"27%", "black", True, True))

    scene.write_svg()
    scene.display()
    return

if __name__ == '__main__': languageDistribution()

The preceding example gives an idea of how to create custom SVG methods for visualization. There are many other SVG writers in Python today, but none of them demonstrates the kind of display that we have shown here. There are also many different ways to create custom visualization methods in other languages, such as Julia. Julia has been around for almost three years now and is considered suitable for numerical and scientific computing.

How is Julia different from Python?

Julia is a dynamic programming language. However, its performance is comparable to that of C because Julia uses a just-in-time (JIT) compiler based on LLVM (Low Level Virtual Machine). In Python, by contrast, you may have to use Cython to combine C and Python code.

Some notable advantages of Julia are as follows:

  • Performance comparable to C
  • A built-in package manager
  • Lisp-like macros
  • The ability to call Python functions using the PyCall package
  • The ability to call C functions directly
  • A design aimed at distributed computing
  • User-defined types that are as fast as built-ins

The only disadvantage is that you have to learn a new language, although there are some similarities with C and Python.

D3.js (where D3 is short for DDD, which stands for data-driven documents) is one of the frameworks that competes with Python for visualization.

D3.js for visualization

D3.js is a JavaScript library for presenting data on the Web and helps in displaying data, using HTML, SVG, and CSS.

D3.js attaches data to Document Object Model (DOM) elements; therefore, you can use CSS3, HTML, and SVG to showcase your data. Furthermore, as JavaScript has event listeners, you can make the data interactive.

Mike Bostock created D3.js during his PhD work at the Stanford Visualization Group. First, Mike worked with the Stanford Visualization Group to produce Protovis, which then eventually became D3. Mike Bostock, Vadim Ogievetsky, and Jeffrey Heer produced a paper titled D3: Data-Driven Documents, which can be accessed at http://vis.stanford.edu/papers/d3.

In practice, the underlying principle of D3.js is to use CSS-style selectors to select DOM nodes and then manipulate them with jQuery-style chained method calls. Here is an example:

d3.selectAll("p")            // select all <p> elements
  .style("color", "#FF8000") // set style "color" to value "#FF8000"
  .attr("class", "tin")      // set attribute "class" to value "tin"
  .attr("x", 20);            // set attribute "x" to 20

One of the many advantages of D3 is that by simply working with the DOM, you can create a stunning representation of data. Another advantage is that by combining the power of JavaScript with today's computing power, you can quickly add navigational behavior. There is a large collection of such visualizations available at http://bost.ocks.org/mike/. One example of a D3 visualization plot is shown here:

[Figure: example D3 visualization with multiple series and multiple axes]

There are many visualization examples that you can produce, and among the examples in the gallery (http://christopheviau.com/d3list/gallery.html#visualizationType=lollipop), my favorite is the one that tells the story about different aggregations using multiple series and multiple axes, which can be viewed at http://tinyurl.com/p988v2u (also shown in the preceding image).

Dashboards

Python has many advantages compared to D3, and when you combine the two, you get the best of both. For instance, Python offers some very good packages for numerical and scientific computing. For this reason, it has been very popular in academia.

A few interesting data visualization and collaboration tools have emerged lately, and one such tool is Plotly (https://plot.ly). The Python dashboard collection can be accessed at https://plot.ly/python/dashboard/. As this is fairly new, we have not had a chance to explore further what one can do with it. Splunk offers an SDK to create Python-based dashboards at http://dev.splunk.com/view/SP-CAAADSR, and Pyxley is a collection of packages that combine the power of Python and JavaScript to create web-based dashboards. One of the examples from Splunk Dashboard is shown here:

[Figure: example dashboard]

One of the examples of Plotly is shown in the preceding image. It demonstrates how you can generate a visualization that looks pretty, is easy to understand, and is navigable at http://tinyurl.com/pwmg5zr.
