A walk around the googleVis package

googleVis is an R package that interfaces R with the Google Charts API. This means that you can create Google charts from within R through high-level functions, without having to make service calls and parse the returned objects yourself. Unlike traditional R plots, Google charts are displayed in a browser. In fact, the chart creation functions do not display a plot directly; they generate HTML code.

When working in R outside a Shiny application, calling plot() with the generated object as its argument automatically opens a browser with the corresponding chart. The following is an example of this:

library(googleVis)
data(iris)

iris.table <- aggregate(Petal.Length ~ Species, data=iris, FUN="mean")

column.chart <- gvisColumnChart(iris.table,"Species","Petal.Length")
plot(column.chart)

As mentioned previously, gvisColumnChart() does not draw a plot itself; it returns a list containing the HTML code that will render the plot afterwards. This can be seen, if required, by typing the name of the object where the output was stored (in this case, column.chart) or by calling the function directly without storing its result. Either way, the generated list, including the HTML code, is displayed in the console.
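
As a minimal sketch of that inspection, assuming the googleVis package is installed, the returned object can be examined like any other list. The tag argument of print() is used here to show only the chart portion of the HTML rather than the whole page:

```r
library(googleVis)

# Same aggregated table as in the example above
iris.table <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
column.chart <- gvisColumnChart(iris.table, "Species", "Petal.Length")

# The result is a list carrying the class "gvis"; nothing has been drawn yet
class(column.chart)
str(column.chart, max.level = 2)

# Print only the chart section of the generated HTML
print(column.chart, tag = "chart")
```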

This small example already shows the advantages and disadvantages of the package. The main advantage is that it lets us create very attractive graphics (with tooltips and other kinds of in-graphic interaction) with very little effort. The drawback is that the possibilities for customization are limited and, compared to native R plotting options, harder to code. Nevertheless, if what you need to display can be done with Google charts, the easiest and nicest way to do it will probably be in R.

googleVis in R

The googleVis functions tend to have unusual argument syntax. Although most of the arguments and functionality of Google charts have been adapted to R's coding style, some parameters still have to be defined in a way that is unusual for R, especially when a layout needs to differ from the default.

Another unusual feature of the googleVis functions is that they always receive a data frame. This data frame must already contain the data ready to plot, as the functions only pick the variables needed for the chart from the data frame passed. This is because all these functions merely transform the data frame into a JSON data object and perform no calculations, unlike, for example, boxplots in the graphics package (to which the whole vector can be passed directly and R calculates the values needed to draw the plot).
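
To illustrate the difference, the following sketch uses only base R: boxplot() accepts the raw values and derives the statistics itself, whereas a googleVis function would need a data frame in which those statistics have already been computed. The column names low, q1, q3, and high are arbitrary choices for this illustration:

```r
# Base R computes the summary statistics internally from the raw values
boxplot(Petal.Length ~ Species, data = iris)

# For googleVis, the values to plot have to be computed beforehand
groups <- split(iris$Petal.Length, iris$Species)

summary.table <- data.frame(
  Species = names(groups),
  low  = sapply(groups, min),
  q1   = sapply(groups, quantile, probs = 0.25),
  q3   = sapply(groups, quantile, probs = 0.75),
  high = sapply(groups, max),
  row.names = NULL
)

# One row per species, already in the shape a gvis function expects
summary.table
```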

Note

The options parameter of the googleVis functions

Almost every function in googleVis makes it possible to change some of the default layout settings. In R, this is done by passing the desired value under the corresponding name inside the options argument. Unfortunately, the full documentation of the options that can be edited is not available in the R package documentation.

However, it can be found at https://developers.google.com/chart/interactive/docs/gallery. All the options listed there for each chart type can be edited in the way described. Some of the following examples illustrate how to do this.
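
As a hedged sketch of the syntax, the following passes a few options to gvisColumnChart(). The option names (title, legend, width, height, colors, and vAxis) are taken from the Google Charts column chart documentation; the values are arbitrary. Simple options are passed as plain R values, while options that Google Charts expects as arrays or objects are written as strings in JavaScript notation:

```r
library(googleVis)

iris.table <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)

styled.chart <- gvisColumnChart(
  iris.table, xvar = "Species", yvar = "Petal.Length",
  options = list(
    title  = "Mean petal length by species",  # simple string option
    legend = "none",                          # hide the legend
    width  = 600,                             # size in pixels
    height = 400,
    colors = "['#cc6633']",                   # array, written as a string
    vAxis  = "{title: 'Petal length (cm)', minValue: 0}"  # object, as a string
  )
)

# Open the browser only in an interactive session
if (interactive()) plot(styled.chart)
```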

An overview of some functions

As with the graphics package covered previously, only some of the graphical possibilities will be covered in depth in this section. However, googleVis provides a very good demo that can be run by typing the following:

demo(googleVis)

This statement runs commented examples in which the user can clearly see what each function does and how its call must be constructed. For further questions, the documentation can be found at http://cran.r-project.org/web/packages/googleVis/googleVis.pdf.

The following topics will be covered in this section. They were chosen mainly for their novelty: as the demo shows, googleVis provides all kinds of graphics, but these are probably the ones that are either unusual or different from traditional plotting options:

  • Candlesticks
  • Geolocalized visualizations
  • Treemaps
  • Motion charts

Candlesticks

Candlesticks are charts designed for financial analysis that describe the behavior of a variable within a period of time. Originally, they were created to describe the behavior of stocks per day. Although they look very similar to boxplots, candlesticks should not be confused with them, as they represent completely different things.

In candlesticks, only four values from the series that they represent are displayed: the first one, the last one, the highest one, and the lowest one. The graph mainly consists of a rectangle whose height is the difference between the first and the last value. The lower of these two values forms the bottom side of the rectangle, while the higher one forms the top side.

The fill color of the rectangle differs according to whether the first value is greater or smaller than the last value. Lastly, two lines are drawn from the rectangle: one from its top up to the highest value and one from its bottom down to the lowest value.
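
The construction described above can be sketched in base R. The helper ohlc() and the sample series are hypothetical and serve only to show which four values a candlestick uses and how the rectangle and the two lines are derived from them:

```r
# Hypothetical helper: reduce a series to the four values a candlestick shows
ohlc <- function(x) {
  c(open = x[1], close = x[length(x)], low = min(x), high = max(x))
}

series <- c(10, 14, 9, 12, 15, 13)   # made-up values for one period
candle <- ohlc(series)

# Rectangle: spans from the smaller to the larger of open and close
body.bottom <- min(candle[["open"]], candle[["close"]])
body.top    <- max(candle[["open"]], candle[["close"]])

# Lines: from the rectangle's bottom down to low, and from its top up to high
lower.line <- c(candle[["low"]], body.bottom)
upper.line <- c(body.top, candle[["high"]])

# The fill color depends on whether the series closed above its opening value
rising <- candle[["close"]] > candle[["open"]]
```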

The googleVis function that creates the HTML for candlestick charts receives a data frame and, for each of the values needed to draw a candlestick (that is, low, high, open, and close), a string with the name of the corresponding variable. The categorical variable (which could be, for example, a date) goes under xvar. gvisCandlestickChart() will plot one candlestick per row, that is, one per value of xvar.

The following example builds an artificial dataset that satisfies the necessary conditions (for example, the low value is in fact the lowest one in each row) and then passes each variable to its corresponding argument in gvisCandlestickChart():

library(googleVis)
#Artificial dataset generation
example.data <- data.frame(year = 2005:2014, open = runif(10,0,100), close = runif(10,0,100))
example.data$low <- apply(example.data[,2:3],1, function(x) min(x) - runif(1,0,10))
example.data$high <- apply(example.data[,2:3],1, function(x) max(x) + runif(1,0,10))

#Plotting
candlestick.chart <- gvisCandlestickChart(example.data, xvar = "year", low="low",open="open",
close="close",high="high")
plot(candlestick.chart)

Geolocalized visualizations

Although there are other possibilities to display geolocalized visualizations, googleVis is arguably the most convenient one, as Google charts are well integrated with Google Maps and therefore provide very simple ways to display visualizations with maps and georeferenced data.

There are several ways to plot geolocalized data in R, but they can be divided into two main groups: those that use latlong (latitude and longitude) values, and those that refer to a geographical area by name or code (for example, a country name). Most of the functions that create visualizations based on geolocalized data accept both alternatives as locationvar. An example of each is given next.

In the first one, an artificial data frame with approximate latlong values inside the USA is plotted. Here, region is set to US inside the options argument. The default for this argument is world (that is, display of the whole world):

library(googleVis)
#Artificial Dataset generation

latitudes <- runif(10,27,49)
longitudes <- runif(10,-125,-72)
values <- runif(10,0,100)

us.dataset <- data.frame(lat=latitudes,long=longitudes,val=values)

#Generate a latlong variable as expected in 'locationvar'

us.dataset$latlong <- paste(us.dataset$lat,us.dataset$long,sep=":")

#Map HTML creation

us.map <- gvisGeoChart(us.dataset, locationvar="latlong",sizevar="val",
options = list(region="US"))

#Plotting

plot(us.map)

Alternatively, codes or names representing geographical regions can be used. In the following example, an artificial dataset for Brazil, Argentina, Peru, and Paraguay is built. After this, the same function as before is called, but with some differences. Apart from changing the region, displayMode is set to regions. As a result, the whole surface of each country is painted with the corresponding color instead of a dot being drawn:

#Artificial Dataset Generation

countries <- c("BR","AR","PE","PY")
value1 <- runif(4,0,10)
value2 <- round(runif(4,0,100))

sa.dataset <- data.frame(countries=countries,val1=value1,val2=value2)

#Plot of the Map. '005' is the region code for South America

southamerica.map <- gvisGeoChart(sa.dataset, locationvar="countries",sizevar="val1",
hovervar="val2",
options = list(region="005",displayMode="regions"))

#Plotting

plot(southamerica.map)
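
Locations can also be given as full names. The following sketch assumes that English country names are recognized by the underlying GeoChart (ISO codes, as above, are the less ambiguous choice) and uses colorvar, another argument of gvisGeoChart(), to drive the fill color; the values are random:

```r
library(googleVis)

# Artificial dataset keyed by country name instead of ISO code
names.dataset <- data.frame(
  country = c("Brazil", "Argentina", "Peru", "Paraguay"),
  value   = round(runif(4, 0, 100))
)

names.map <- gvisGeoChart(
  names.dataset, locationvar = "country", colorvar = "value",
  options = list(region = "005", displayMode = "regions")
)

# Open the browser only in an interactive session
if (interactive()) plot(names.map)
```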

Treemaps

Treemaps are very useful visualizations for hierarchies, that is, for subelements that belong to a greater element. They display the relationship between three dimensions: the hierarchy, the color, and the size.

They are used in many different areas, such as computer science (for instance, to display directories and subdirectories), economics (a very good example is available at MIT's Observatory of Economic Complexity, http://atlas.media.mit.edu/explore/tree_map/), and news (http://newsmap.jp/), among others.

gvisTreeMap() is the function that creates treemaps in googleVis. Its structure can be clearly seen in the following code. Firstly, idvar, the variable that holds the name of each element, is expected. In this case, it is the regions variable. Each element must also point to the element it depends on or belongs to. This must be specified in another column and passed to the function in the parentvar argument.

As you can see from the following code, the root node, that is, the node that does not belong to any other node, has an NA value in this column. gvisTreeMap() only accepts one root node. The size variable determines the size of each of the squares. However, sizes are only compared among elements of the same parent node, for example, Asia, America, and Europe, or South America and North America. Neither the values of Asia and South America nor those of Japan and Brazil are compared with each other. The sum of the child nodes does not need to equal the value of the parent node:

library(googleVis)

#Generate random data with dependencies

regions <- c("World","America","Europe","Asia","South America",
"North America","Western Europe","Eastern Europe", "Middle East",
"Far East", "Argentina","Brazil","USA","Canada", "Germany",
"France","Hungary","Russia","Israel","Saudi Arabia","China","Japan")

dependency <- c(NA,"World","World","World","America","America","Europe","Europe",
"Asia","Asia","South America","South America","North America",
"North America", "Western Europe", "Western Europe",
"Eastern Europe", "Eastern Europe", "Middle East", "Middle East",
"Far East", "Far East")

size <- runif(22,1,100)
color <- runif(22,1,100)

frame <- data.frame(regions=regions,dependency=dependency,size=size,color=color)

#Plot treemap

treemap <- gvisTreeMap(frame, "regions","dependency","size","color")

plot(treemap)
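
The structural requirements described above (exactly one root, and every parent present among the elements) can be checked before building the chart. This is a base R sketch on a reduced version of the data; check.tree() is a hypothetical helper, not part of googleVis, and the uniqueness check rests on the assumption that element names must be unique for parents to be resolved:

```r
# Hypothetical helper: verify that ids/parents form a single-rooted hierarchy
check.tree <- function(ids, parents) {
  roots <- ids[is.na(parents)]
  missing.parents <- setdiff(parents[!is.na(parents)], ids)
  list(
    single.root     = length(roots) == 1,
    unique.ids      = !any(duplicated(ids)),   # assumed requirement
    parents.present = length(missing.parents) == 0
  )
}

regions    <- c("World", "America", "Europe", "South America", "Brazil")
dependency <- c(NA, "World", "World", "America", "South America")

str(check.tree(regions, dependency))

# A parent that is not itself an element breaks the hierarchy
broken <- check.tree(c("World", "Brazil"), c(NA, "South America"))
broken$parents.present
```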

Tip

Left-clicking on a square shows one level down (left-clicking on Asia displays Middle East and Far East) while right-clicking shows one level up.

Motion chart

Originally developed by Hans Rosling at Gapminder and now offered by Google under the name of motion chart, this is a visualization whose main advantage lies in the number of variables it can display at the same time without compromising visual clarity.

In general terms, it describes the evolution of a series of variables over time. It consists mainly of bubbles whose positions depend on their values for the variables represented on the X and Y axes and whose color and size depict the values of two other variables. These last two are optional; if they are not used, all the bubbles have the same size and color.

Note

A very impressive example of a problem described with motion charts is given by Rosling himself in this video: https://www.youtube.com/watch?v=jbkSRLYSojo.

For the following example, the additional WDI package is installed. WDI is a package that retrieves data from the World Bank API. As this type of visualization requires a time variable, WDI data is very easy to display in this kind of graph. For this example, some arbitrary indicators and countries were taken. In this case, a variable was assigned to every option, including size and color:

#Install WDI to obtain data from the World Bank API and load it together with googleVis

install.packages("WDI")
library(WDI)
library(googleVis)

# Load some data

indicators <- c("BM.KLT.DINV.GD.ZS","BG.GSR.NFSV.GD.ZS","EN.ATM.CO2E.PP.GD","NY.GDP.MKTP.CD")
countries <- c("AR","BR","DE","US","CA","FR","GB","CN","RU","JP")

frame <- WDI(country = countries, indicator = indicators, start = 2005, end=2013)

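#Change indicator names just to make it easier to understand.
#Matching by name avoids relying on the column order returned by WDI()

names(frame)[match(indicators, names(frame))] <- paste0("indicator",1:4)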

#Graph HTML Creation

motionchart <- gvisMotionChart(data = frame, idvar = "iso2c", timevar = "year", xvar = "indicator1", yvar = "indicator2", sizevar = "indicator3", colorvar = "indicator4")

#Plotting

plot(motionchart)

This visualization is similar to a small dashboard, as it provides us with the possibility of changing the variables of the different indicators (each of them has a small drop-down menu with all the available variables in the dataset) or even changing the type of visualization shown by clicking on one of the icons in the top-right corner.
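
For experimenting without a connection to the World Bank API, a sketch based on the Fruits example data that ships with googleVis can be used instead. It assumes a googleVis version that still provides gvisMotionChart(). Only idvar and timevar are set here; the remaining columns should then be selectable through the chart's drop-down menus:

```r
library(googleVis)

# Fruits is a small example data frame bundled with googleVis
data(Fruits)
str(Fruits)

fruit.motion <- gvisMotionChart(Fruits, idvar = "Fruit", timevar = "Year")

# Open the browser only in an interactive session
if (interactive()) plot(fruit.motion)
```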

googleVis in Shiny

googleVis in Shiny has two particular characteristics that are worth mentioning. Firstly, it has its own reactive rendering function, renderGvis(), which only works for googleVis visualizations. The next example, a Shiny web application built entirely with googleVis, shows clearly how this works.

Another particularity of googleVis is that, instead of plotOutput(), it uses htmlOutput() in UI.R. This makes perfect sense considering that the output of all the googleVis functions is mainly HTML code.
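
A minimal single-file sketch of this pairing, assuming the shiny and googleVis packages are installed (the input and output ids are arbitrary), could look as follows:

```r
library(shiny)
library(googleVis)

ui <- fluidPage(
  selectInput("measure", "Measure:", names(iris)[1:4]),
  htmlOutput("chart")            # placeholder for the generated HTML
)

server <- function(input, output) {
  output$chart <- renderGvis({   # re-runs whenever input$measure changes
    means <- aggregate(iris[input$measure], by = iris["Species"], FUN = mean)
    gvisColumnChart(means, xvar = "Species", yvar = input$measure)
  })
}

app <- shinyApp(ui, server)
# Launch with runApp(app), ideally in an external browser
```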

A small example of googleVis in Shiny

Taking the World Bank example from the motion chart section, the following is a Shiny application built entirely with googleVis visualizations. You can reproduce it like any other example, simply by creating the same files that appear here.

Note

For reasons that exceed the scope of this book, the following example only works properly in a separate browser. This means that, when running it, you should select Run External in newer versions of RStudio or, in older ones, click on Open in Browser and test the application from the browser window.

In global.R, the WDI library is used, which is mainly an interface to the World Bank API through which data for different indicators can be retrieved by year and country. In this script, firstly, all the indicators are retrieved with WDIsearch() and some of them are chosen (the choice was arbitrary). After this, the data for these indicators is retrieved for an arbitrary list of countries between 2005 and 2013.

Finally, an indicator vector and a country vector are created. These vectors are named just to illustrate how a named vector works in UI.R; this is not strictly necessary. Have a look at the following code for global.R:

#Load the required libraries

library(WDI)
library(reshape2)
library(googleVis)

#Load all indicators

all.indicators <- as.data.frame(WDIsearch())

#Take 6 indicators

used.indicators <- all.indicators[c(1:3,12,14,15),]

#Retrieve Data from indicators

countries <- c("AR","BR","DE","US","CA","FR","GB","CN","RU","JP")


frame <- WDI(country = countries, indicator = as.character(used.indicators[,1])
             , start = 2005, end=2013)

#Create indicator's vector

indicators.vector <- as.character(used.indicators[,1])
names(indicators.vector) <- as.character(used.indicators[,2])

#Create countries' vector

countries.vector <- unique(frame$iso2c)
names(countries.vector) <- unique(frame$country)

In UI.R, the input options are defined by the data retrieved in global.R. As explained, UI.R uses the named character vectors in checkboxGroupInput() and selectInput(). If a named vector is passed, the names are displayed in the application's frontend, while the input variable takes the value of the selected element. For sliderInput(), the minimum and maximum values are taken directly from the dataset created in global.R.

Tip

This strategy of deriving the values from the data, instead of hardcoding them, is much more flexible. If the countries or indicators are changed in global.R, UI.R will keep working.
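
How a named vector is treated can be seen by printing the HTML that selectInput() generates. In this sketch, the two-element vector is made up for illustration; in the application it would come from global.R:

```r
library(shiny)

# Names are what the user sees; values are what the server receives
example.countries <- c("Argentina" = "AR", "Brazil" = "BR")

country.input <- selectInput("country", "Select a country:", example.countries)

# The generated markup holds the value in the option tag and the name as its label
cat(as.character(country.input))
```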

In the output section, a tabset with two tabs is displayed: one for the intensity map, and the second one for the motion chart. The following code is for UI.R:

library(shiny)

# Starting line
shinyUI(fluidPage(
  
  # Application title
  titlePanel("World Bank Dashboard with GoogleVis"),
  
  # Sidebar
  sidebarLayout(
  sidebarPanel(
      #Country selection
      checkboxGroupInput("countries","Select the countries:",
                  countries.vector,
                  selected=countries.vector),
      
      #Years selection
      sliderInput("years","Select the year range",min(frame$year),max(frame$year),
                  value = c(min(frame$year),max(frame$year))),
      #Map variable selection
      selectInput("map.var","Select the variable to plot in the map",indicators.vector)),
    
  #The plot created in server.R is displayed
    mainPanel(
      #htmlOutput("MotionChart")
      tabsetPanel(
        tabPanel("Map Chart",htmlOutput("Map")),
        tabPanel("Motion Chart",htmlOutput("MotionChart"))
    )
  )
  
  )
))

In server.R, a subset of the data is first created according to the filters applied. After this, each visualization function is fed differently according to its needs. To create the map, a sum aggregation by country code is performed for every variable. This is needed because the original dataset is split by year, whereas the map needs one value per item (that is, per country).

After this, the variable chosen in the drop-down menu (selectInput()) is used as the intensity variable (passed in the sizevar argument). This piece of code could be optimized, as some variables (basically, all those that will not be used) are needlessly aggregated in the aggregation phase.

Note

Unfortunately, there is no reference to link to here. The optimization depends mainly on how the aggregation is coded. Basically, the aggregation expression can be built dynamically so that only the needed variable is taken, but this would have required an explanation of expression objects, which belong to a more advanced stage of R.
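
One possible way to aggregate only the selected variable, without resorting to expression objects, is to index the data frame by column name. This is a sketch on a small made-up data frame, not the code used in the application; selected.var stands in for input$map.var:

```r
# Made-up data in the same long format (one row per country and year)
frame <- data.frame(
  iso2c   = rep(c("AR", "BR"), each = 3),
  country = rep(c("Argentina", "Brazil"), each = 3),
  year    = rep(2005:2007, times = 2),
  ind.a   = c(1, 2, 3, 4, 5, 6),
  ind.b   = c(10, 20, 30, 40, 50, 60)
)

selected.var <- "ind.b"   # stands in for input$map.var

# Only the selected column is summed; the other indicators are never touched
aggregated <- aggregate(frame[selected.var],
                        by = frame[c("iso2c", "country")],
                        FUN = sum)
aggregated
```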

The dataset for the motion chart, on the contrary, does not need any modification to work. This is why the chart creation function is called directly with the corresponding variables passed to it. The following is the code for server.R:

library(shiny)

#initialization of server.R
shinyServer(function(input, output) {

  frame.sset <- reactive({subset(frame,iso2c %in% input$countries &
    year >= input$years[1] &
    year <= input$years[2])})
  
  #Map generation: aggregate by country, then build the intensity map
  output$Map <- renderGvis({
    aggregated.frame <- aggregate(.~iso2c + country,frame.sset()[,-3], sum)
    map <- gvisGeoChart(aggregated.frame, locationvar="iso2c",sizevar=input$map.var,
      hovervar="country",
      options = list(region="world",displayMode="regions"))
    return(map)
    })
  
  output$MotionChart <- renderGvis({
    mchart <- gvisMotionChart(frame.sset(), "country","year")
    return(mchart)
  })
  
})