Once you've done the work of slicing, dicing, cleaning, and aggregating your datasets, you might want to save them. Incanter by itself doesn't have a good way to do this. However, with the help of some Clojure libraries, it's not difficult at all.
We'll need to include a number of dependencies in our project.clj file:
(defproject inc-dsets "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]
                 [org.clojure/data.csv "0.1.2"]
                 [org.clojure/data.json "0.2.5"]])
We'll also need to include these libraries in our script or REPL:
(require '[incanter.core :as i]
         '[incanter.io :as i-io]
         '[clojure.data.csv :as csv]
         '[clojure.data.json :as json]
         '[clojure.java.io :as io])
Also, we'll use the same data that we introduced in the Selecting columns with $ recipe.
This process is really as simple as getting the data and saving it. We'll pull out the data for the year 2000 from the larger dataset, and we'll save this subset in both formats:
(def data2000 (i/$ [:Indicator-Code :Indicator-Name :2000] chn-data))
To save a dataset as a CSV, all in one statement, open a file and use clojure.data.csv/write-csv to write the column names and data to it:
(with-open [f-out (io/writer "data/chn-2000.csv")]
  (csv/write-csv f-out [(map name (i/col-names data2000))])
  (csv/write-csv f-out (i/to-list data2000)))
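The JSON counterpart can be sketched in much the same way. This is a minimal sketch, assuming Incanter 1.5.x, where a dataset exposes its rows as maps under the :rows key. The sample-2000 dataset (with made-up rows) is a hypothetical stand-in for data2000 so that the snippet runs on its own, and the output path is an assumption that mirrors the CSV file name.

```clojure
(require '[incanter.core :as i]
         '[clojure.data.json :as json]
         '[clojure.java.io :as io])

;; Hypothetical stand-in for data2000, with made-up rows, so that the
;; snippet is self-contained. In the recipe you would use data2000.
(def sample-2000
  (i/dataset [:Indicator-Code :Indicator-Name :2000]
             [["X.1" "First example indicator" 1.5]
              ["X.2" "Second example indicator" 2.5]]))

;; Assumed output path, chosen to mirror the CSV file name.
(def out-path "data/chn-2000.json")

;; Create the data/ directory if it does not exist yet.
(io/make-parents out-path)

(with-open [f-out (io/writer out-path)]
  ;; json/write takes the data first and the writer second,
  ;; the reverse of csv/write-csv.
  (json/write (:rows sample-2000) f-out))
```

Each row map should come out as one JSON object, with the column names as its keys.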
For CSV and JSON, as well as many other data formats, the process is very similar. Get the data, open the file, and serialize data into it. There will be differences in how the output function wants the data (to-list or :rows), and there will be differences in how the output function is called (for instance, whether the file handle is the first or second argument). But generally, outputting datasets will be very similar and relatively simple.
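To make the difference in data shape concrete, here is a small sketch that pulls both views out of the same dataset. It assumes Incanter 1.5.x. The tiny dataset and the as-lists and as-maps names are hypothetical and exist only for illustration.

```clojure
(require '[incanter.core :as i])

;; Hypothetical two-row dataset, used only to compare the two shapes.
(def tiny (i/dataset [:code :value] [["a" 1] ["b" 2]]))

;; A sequence of rows, each holding the values in column order.
;; This is the shape that csv/write-csv expects.
(def as-lists (i/to-list tiny))

;; A sequence of maps, one per row, keyed by column name.
;; json/write serializes each of these as a JSON object.
(def as-maps (:rows tiny))
```

The list form drops the column names, which is why the CSV code writes them separately as a header row, whereas the map form carries them along with every row.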
In Chapter 1, Importing Data for Analysis, we talked about how to read CSV files (Reading CSV data into Incanter datasets recipe) and JSON files (Reading JSON data into Incanter datasets recipe) into Incanter.