A Sankey diagram is a powerful way of displaying your data. In particular, Sankey diagrams are a convenient way of showing flows from their origin to their end.
A famous example of this kind of diagram is Charles Minard's 1869 chart showing the number of men in Napoleon's 1812 Russian campaign army, their movements, and the temperatures they encountered on the return path.
In a Sankey diagram, a given amount is shown on the leftmost side of the plot and, moving to the right (which can be interpreted as the flow of time), this amount is either split into parts or simply reduced. The latter is the case for Minard's diagram, where the band narrows as soldiers die during the campaign, while a separate line plot at the bottom tracks the temperature on the return path.
In order to get started with this recipe, you will need to install and load the networkD3 and jsonlite packages:
install.packages(c("networkD3", "jsonlite"))
library(networkD3)
library(jsonlite)
The first package implements the sankeyNetwork() function that we will leverage in our recipe, while the second is only required to parse the dataset we will use from JSON into data frames.
Our example concerns energy flow from production to final usage or waste, using the original dataset provided by Christopher Gandrud, creator of the networkD3 package.
In order to make this dataset available, we first need to download it and then convert it from JSON to an ordinary list:
Create a URL object pointing to the data source:
URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/",
              "master/JSONdata/energy.json")
Create an Energy object where we can store the data from the defined source:
Energy <- jsonlite::fromJSON(URL)
This Energy list will now be composed of two data frames: one for nodes (that is, vertices) and one for links (that is, edges):
List of 2
 $ nodes:'data.frame': 48 obs. of 1 variable:
  ..$ name: chr [1:48] "Agricultural 'waste'" "Bio-conversion" "Liquid" "Losses" ...
 $ links:'data.frame': 68 obs. of 3 variables:
  ..$ source: int [1:68] 0 1 1 1 1 6 7 8 10 9 ...
  ..$ target: int [1:68] 1 2 3 4 5 2 4 9 9 4 ...
  ..$ value : num [1:68] 124.729 0.597 26.862 280.322 81.144 ...
The links data frame is a list of weighted edges: each row exposes a starting point and an end point, and the link between them is weighted by a value attribute.
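To see how JSON of this shape turns into a list of two data frames, here is a minimal sketch that parses a small inline JSON string instead of downloading the real file. The node names and values are made up for illustration; only the nodes/links layout mirrors the energy dataset.

```r
library(jsonlite)

# Hypothetical miniature of the energy data: same shape, made-up values.
json_text <- '{
  "nodes": [{"name": "Coal"}, {"name": "Electricity"}, {"name": "Losses"}],
  "links": [
    {"source": 0, "target": 1, "value": 60},
    {"source": 0, "target": 2, "value": 40}
  ]
}'

# fromJSON() simplifies each array of objects into a data frame.
toy <- fromJSON(json_text)
str(toy)
```

Each array of JSON objects becomes one data frame, which is why Energy ends up as a list holding a nodes data frame and a links data frame.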
If you want to apply this recipe to your own data, which I hope you do, you should arrange it in two distinct data frames with the following structure:
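Nodes: a data frame listing the nodes. Links refer to nodes by their zero-based position in this data frame: for a link going from the first node to the fourth, you should write 0 as the starting point (and 3 within the to argument); be aware that the first node has value 0 and not 1.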
Note that the second data frame is properly called an edge list, where each observation represents an edge of your network.
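If your own flows are recorded with node names rather than indices, the zero-based edge list can be derived in base R. The following is a minimal sketch under that assumption; the flows data frame, its from and to columns, and the values in it are hypothetical.

```r
# Hypothetical flows written with node names rather than indices.
flows <- data.frame(
  from  = c("Coal", "Coal", "Gas", "Electricity"),
  to    = c("Electricity", "Losses", "Electricity", "Homes"),
  value = c(60, 40, 30, 90),
  stringsAsFactors = FALSE
)

# One row per distinct node name.
nodes <- data.frame(name = unique(c(flows$from, flows$to)),
                    stringsAsFactors = FALSE)

# match() returns 1-based positions, so subtract 1 to get zero-based indices.
links <- data.frame(
  source = match(flows$from, nodes$name) - 1L,
  target = match(flows$to, nodes$name) - 1L,
  value  = flows$value
)
print(links)
```

The resulting nodes and links data frames have the same column layout as Energy$nodes and Energy$links, so they can be passed to sankeyNetwork() with Source = "source", Target = "target", Value = "value", and NodeID = "name".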
1. Call the sankeyNetwork() function on the Energy data:
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "name",
              units = "TWh", fontSize = 12, nodeWidth = 30)
This will result in the following Sankey diagram:
2. Adjust the size of the labels through the fontSize parameter:
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "name",
              units = "TWh", fontSize = 10, nodeWidth = 30)
3. Adjust the width of the nodes through nodeWidth:
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes,
              Source = "source", Target = "target",
              Value = "value", NodeID = "name",
              units = "TWh", fontSize = 12, nodeWidth = 5)
4. Use the export control in the Viewer pane to save your diagram as an HTML file.
In step 1 we call the sankeyNetwork() function, which will produce an interactive Sankey diagram in your RStudio Viewer pane, where the node alignment can be customized and flows can be highlighted by clicking on them.
In step 4 we save the Sankey diagram as a web page, which lets you embed it in websites while preserving its interactive features.
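The diagram can also be written to an HTML file from code rather than from the Viewer pane. The sketch below uses the saveNetwork() function from networkD3 on a small made-up dataset so that it runs without downloading anything; the toy nodes and links and the output path are assumptions for illustration, not part of the original recipe.

```r
library(networkD3)

# Toy data (hypothetical): three nodes, two links, zero-based indices.
nodes <- data.frame(name = c("Source", "Use", "Losses"))
links <- data.frame(source = c(0, 0), target = c(1, 2), value = c(70, 30))

diagram <- sankeyNetwork(Links = links, Nodes = nodes,
                         Source = "source", Target = "target",
                         Value = "value", NodeID = "name",
                         units = "TWh", fontSize = 12, nodeWidth = 30)

# selfcontained = FALSE writes the JavaScript dependencies to a folder next
# to the HTML file; selfcontained = TRUE needs pandoc to bundle everything.
out_file <- file.path(tempdir(), "sankey.html")
saveNetwork(diagram, file = out_file, selfcontained = FALSE)
```

With the real data, the same call should work by storing the result of the step 1 sankeyNetwork() call in a variable and passing it to saveNetwork().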