operations are a bonus in that the same data being used for D&R can be used for other
parallel computing purposes. Tessera connects to Spark using the SparkR package [19],
which exposes the Spark API in the R console.
Support in Tessera for Spark is, at the time of this writing, very experimental. It has
been implemented and works, and adding it is a testament to Tessera’s flexibility
in being back-end agnostic, but it has only been tested with rather small datasets.
3.6 Discussion
In this chapter, we have presented one point of view regarding methodology and
computational tools for deep statistical analysis and visualization of large complex data. D&R is
attractive because of its simplicity and its ability to make a wide array of methods available
without needing to implement scalable versions of them. D&R also builds on approaches
that are already very popular with small data, particularly implementations of the
split-apply-combine paradigm such as the plyr and dplyr R packages. D&R as implemented in
datadr is future-proof by design: because it is back-end agnostic, it can adopt improved
back-end technology as it comes along. Together, these factors give D&R a high chance of success. However,
there is a great need for more research and software development to extend D&R to more
statistical domains and make it easier to program.
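
The connection between D&R and the split-apply-combine paradigm can be illustrated with a minimal sketch in base R. This is not the datadr interface, only an assumed small-data analogue of the same three steps: the built-in iris data, the per-species division, the linear model, and the helper name fit_subset are all choices made here for illustration.

```r
# Divide: split the built-in iris data into one subset per species.
subsets <- split(iris, iris$Species)

# Apply: run the same ordinary small-data method on each subset independently.
# fit_subset is a hypothetical helper name used only in this sketch.
fit_subset <- function(d) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = d)
  data.frame(n = nrow(d), slope = unname(coef(fit)["Sepal.Width"]))
}
per_subset <- lapply(subsets, fit_subset)

# Recombine: stack the per-subset results into a single data frame.
result <- do.call(rbind, per_subset)
result$Species <- rownames(result)
print(result)
```

Because each subset is processed independently, the apply step is the part that a distributed back end could run in parallel, and the method applied to each subset needs no scalable reimplementation.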
Handbook of Big Data
