To do any complex or meaningful analysis, we need to be able to pass vector or matrix data into R to operate on and analyze.
Let's see how to do this.
We must first complete the recipe, Setting up R to talk to Clojure, and have Rserve running. We must also have the Clojure-specific parts of that recipe done and the connection to Rserve made.
We'll also need access to the clojure.string namespace:
(require '[clojure.string :as str])
To make passing values into R easier, we'll first define a protocol and then use it to pass a vector of numbers to R:
First, we define a protocol, ToR. Any data types that we want to marshal into R must implement this, as follows:
(defprotocol ToR
  (->r [x] "Convert an item to R."))
Next, we implement the protocol for sequences, vectors, and numeric types:
(extend-protocol ToR
  clojure.lang.ISeq
  (->r [coll] (str "c(" (str/join \, (map ->r coll)) ")"))
  clojure.lang.PersistentVector
  (->r [coll] (->r (seq coll)))
  java.lang.Integer
  (->r [i] (str i))
  java.lang.Long
  (->r [l] (str l))
  java.lang.Float
  (->r [f] (str f))
  java.lang.Double
  (->r [d] (str d)))
Now that we can marshal vectors, we can write a wrapper around R's mean function:
(defn r-mean
  ([coll] (r-mean coll *r-cxn*))
  ([coll r-cxn]
   (.. r-cxn (eval (str "mean(" (->r coll) ")")) asDouble)))
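With these in place, we can call the wrapper like any other function (the second result varies from run to run because the input is random):
user=> (r-mean [1.0 2.0 3.0])
2.0
user=> (r-mean (map (fn [_] (rand)) (range 5)))
0.3966653617356786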
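The collections passed to r-mean above are flat. For matrix data, one possible approach, sketched here under stated assumptions, is to flatten the rows and wrap them in a call to R's matrix function with byrow=TRUE. The helper name matrix->r is hypothetical and not part of the recipe; the sketch assumes a non-empty sequence of equal-length rows of numbers and only builds the R expression string, without sending it to Rserve.

```clojure
(require '[clojure.string :as str])

(defn matrix->r
  "Hypothetical helper: render a seq of equal-length rows of numbers
  as the source of an R matrix() call, filled row by row."
  [rows]
  (let [nrow  (count rows)            ; number of rows in the matrix
        cells (apply concat rows)]    ; all cells in row-major order
    (str "matrix(c(" (str/join "," cells) "),nrow=" nrow ",byrow=TRUE)")))

(matrix->r [[1 2 3] [4 5 6]])
;=> "matrix(c(1,2,3,4,5,6),nrow=2,byrow=TRUE)"
```

The resulting string could be embedded in a larger R expression in the same way r-mean embeds the output of ->r.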
For most data types, marshaling to R simply means converting the value to a string. However, for sequences and vectors, it's a little more complicated. Clojure has to convert each of the sequence's items to an R string, join the items with commas, and wrap the result in a call to R's c function, which builds a vector.
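To see what this marshaling produces without a running Rserve, the following minimal sketch repeats a trimmed-down copy of the protocol (the numeric types are reduced to Long and Double for brevity, which is a simplification of this sketch) and shows the R source strings that result. It is meant for a fresh REPL, since re-evaluating defprotocol replaces any earlier definition of ToR.

```clojure
(require '[clojure.string :as str])

;; Trimmed-down copy of the ToR protocol from this recipe, repeated
;; here so that the snippet runs on its own.
(defprotocol ToR
  (->r [x] "Convert an item to R."))

(extend-protocol ToR
  clojure.lang.ISeq
  (->r [coll] (str "c(" (str/join \, (map ->r coll)) ")"))
  clojure.lang.PersistentVector
  (->r [coll] (->r (seq coll)))
  java.lang.Long
  (->r [l] (str l))
  java.lang.Double
  (->r [d] (str d)))

;; A number becomes its string form.
(->r 42)                              ;=> "42"

;; A vector becomes a call to R's c with comma-separated items.
(->r [1.0 2.0 3.0])                   ;=> "c(1.0,2.0,3.0)"

;; This is the expression that r-mean would build for that vector.
(str "mean(" (->r [1.0 2.0 3.0]) ")") ;=> "mean(c(1.0,2.0,3.0))"
```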
This is a perfect place to use protocols. Defining methods in order to marshal more data types to R is simple. For example, we can define a naïve method to work with strings as shown here:
(extend-protocol ToR
  java.lang.String
  (->r [s] (str \' s \')))
Of course, this method isn't without its problems. If a string has a quote within it, for instance, it must be escaped. Also, having to marshal data types back and forth in this manner can be computationally expensive, especially for large or complex data types.
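As a rough illustration of the escaping problem, the sketch below quotes a string for R by escaping backslashes and single quotes before wrapping it in single quotes. The function name r-quote is hypothetical, and the sketch deliberately handles only those two characters; whether that is sufficient for a given data set is an assumption to verify.

```clojure
(require '[clojure.string :as str])

(defn r-quote
  "Hypothetical helper: wrap s in single quotes as an R string literal,
  escaping backslashes and single quotes."
  [s]
  (str \'
       (-> s
           (str/replace "\\" "\\\\") ; backslash -> two backslashes (must run first)
           (str/replace "'" "\\'"))  ; single quote -> backslash + single quote
       \'))

(println (r-quote "it's")) ; prints 'it\'s'
(println (r-quote "a\\b")) ; prints 'a\\b'
```

A function like this could serve as the body of the String implementation of ->r, for example (->r [s] (r-quote s)).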