Sometimes, we are more interested in how values change over time, or across some other progression, than we are in the values themselves. This information is latent in the data, but making it explicit makes it easier to work with and visualize.
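As a minimal illustration of making change explicit, here is a plain-Clojure sketch (no Incanter) that computes the difference between each consecutive pair of values along a progression. The function name `successive-deltas` and the sample numbers are hypothetical, chosen only for illustration.

```clojure
;; Hypothetical helper: given values ordered along some progression
;; (e.g. yearly counts), return the change between each adjacent pair.
(defn successive-deltas
  "Returns a lazy seq of (x[i+1] - x[i]) for each adjacent pair in xs."
  [xs]
  ;; map stops at the shorter seq, so n values yield n-1 deltas
  (map - (rest xs) xs))

(successive-deltas [100 120 115 130]) ;; => (20 -5 15)
```

The recipe below does the same kind of subtraction, but between two columns (2010 and 2000) rather than between neighbouring values.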
First, we'll use these dependencies in our project.clj file:
(defproject statim "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])
We also need to require Incanter in our script or REPL:
(require '[incanter.core :as i] 'incanter.io)
Finally, we'll use the Virginia census data. You can download the file from http://www.ericrochester.com/clj-data-analysis/data/all_160_in_51.P3.csv:
(def data-file "data/all_160_in_51.P3.csv")
For this recipe, we'll take some census data and add a column to show the change in population between the 2000 and 2010 censuses:
(def data (incanter.io/read-dataset data-file :header true))
;; Replace anything that isn't an integer (such as an empty cell) with 0.
(defn check-int [x] (if (integer? x) x 0))
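A quick way to see what `check-int` accepts is to call it on a few kinds of values. This sketch repeats the definition so it runs on its own. Whether the CSV reader turns an empty cell into `nil` or an empty string is an assumption here; `check-int` maps both to 0. Note that a floating-point value is also replaced with 0, because `integer?` is false for doubles, so this cleaner only suits columns parsed as integers.

```clojure
;; Same definition as in the recipe, repeated so this snippet is self-contained.
(defn check-int [x] (if (integer? x) x 0))

(check-int 7780)   ;; => 7780  (integers pass through unchanged)
(check-int "")     ;; => 0     (an empty string is not an integer)
(check-int nil)    ;; => 0     (neither is nil)
(check-int 7780.0) ;; => 0     (nor is a double)
```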
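(def growth-rates
  (->> data
       ;; clean the 2000 population column, replacing non-integers with 0
       (i/$map check-int :POP100.2000)
       ;; subtract those values from the 2010 population (:POP100)
       (i/minus (i/sel data :cols :POP100))
       ;; wrap the differences in a one-column dataset
       (i/dataset [:POP.DELTA])
       ;; append that column to the original data
       (i/conj-cols data)))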
user=> (i/sel growth-rates :cols [:NAME :POP100 :POP100.2000 :POP.DELTA] :rows (range 5))
|           :NAME | :POP100 | :POP100.2000 | :POP.DELTA |
|-----------------+---------+--------------+------------|
|   Abingdon town |    8191 |         7780 |      411.0 |
|    Accomac town |     519 |          547 |      -28.0 |
|    Alberta town |     298 |          306 |       -8.0 |
| Alexandria city |  139966 |       128283 |    11683.0 |
|   Allisonia CDP |     117 |              |      117.0 |
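The printed rows can be sanity-checked by hand: each :POP.DELTA should equal :POP100 minus the cleaned :POP100.2000. This sketch copies the five rows into plain maps (no Incanter); representing Allisonia's blank 2000 value as `nil` is an assumption.

```clojure
;; Same cleaning rule as check-int in the recipe.
(defn check-int [x] (if (integer? x) x 0))

;; The five rows printed above, copied by hand into plain maps.
(def sample-rows
  [{:NAME "Abingdon town"   :POP100 8191   :POP100.2000 7780   :POP.DELTA 411.0}
   {:NAME "Accomac town"    :POP100 519    :POP100.2000 547    :POP.DELTA -28.0}
   {:NAME "Alberta town"    :POP100 298    :POP100.2000 306    :POP.DELTA -8.0}
   {:NAME "Alexandria city" :POP100 139966 :POP100.2000 128283 :POP.DELTA 11683.0}
   {:NAME "Allisonia CDP"   :POP100 117    :POP100.2000 nil    :POP.DELTA 117.0}])

(defn delta-matches?
  "True when :POP.DELTA equals :POP100 minus the cleaned :POP100.2000."
  [row]
  ;; == compares numerically, so 411.0 and 411 are equal
  (== (:POP.DELTA row)
      (- (:POP100 row) (check-int (:POP100.2000 row)))))

(every? delta-matches? sample-rows) ;; => true
```

The last row shows a side effect of replacing missing values with 0: Allisonia's delta equals its entire 2010 population, so places with no 2000 figure will appear to have grown more than they did.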
This was a pretty straightforward process, but let's look at it line-by-line to make sure everything's clear. We'll follow the steps of the ->> macro.
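The key property of ->> is that it inserts each result as the last argument of the next form. Expanding a small ->> form makes this visible. In this sketch, `data`, `f`, `g`, `a`, and `b` are placeholder symbols that are never evaluated; `clojure.walk/macroexpand-all` is used so the nested expansion is shown in full.

```clojure
(require '[clojure.walk :as walk])

;; Each form receives the previous result as its final argument.
(walk/macroexpand-all '(->> data (f a) (g b)))
;; => (g b (f a data))

;; The same idea with real core functions: both forms return 9.
(->> [1 2 3] (map inc) (reduce +))
(reduce + (map inc [1 2 3]))
```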
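First, we pass the dataset to i/$map, which applies the check-int function we defined earlier to each value in the :POP100.2000 column, replacing empty (non-integer) values with 0:
(->> data (i/$map check-int :POP100.2000)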
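In plain Clojure, the $map step can be pictured by treating the dataset as a vector of row maps and pulling out one cleaned value per row. This is an analogy, not Incanter's implementation; the two sample rows come from the table above, and using `nil` for the missing 2000 value is an assumption.

```clojure
(defn check-int [x] (if (integer? x) x 0))

;; Hypothetical stand-in for the dataset: one map per row.
(def rows
  [{:POP100 8191 :POP100.2000 7780}
   {:POP100 117  :POP100.2000 nil}])

;; Roughly what (i/$map check-int :POP100.2000 data) yields:
;; the :POP100.2000 column, with each value passed through check-int.
(def pop-2000 (map (comp check-int :POP100.2000) rows))
;; => (7780 0)
```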
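Next, we use incanter.core/minus to subtract those cleaned 2000 values from the 2010 population column, :POP100. Because ->> threads the previous result in as the last argument, this computes :POP100 minus :POP100.2000:
(i/minus (i/sel data :cols :POP100))
Then we turn the resulting sequence of differences into a new dataset with a single column, :POP.DELTA:
(i/dataset [:POP.DELTA])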
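Argument order matters in the subtraction step. This plain-Clojure sketch uses element-wise `map -` as a stand-in for `i/minus`, with the first three rows of the table as sample values, to show that 2010 minus 2000 gives growth as a positive number, while swapping the arguments flips every sign.

```clojure
;; Sample values from the first three rows of the output table.
(def pop-2010 [8191 519 298])
(def pop-2000 [7780 547 306])

;; ->> supplies the 2000 values last, so the call is effectively
;; (minus pop-2010 pop-2000): later census minus earlier census.
(map - pop-2010 pop-2000) ;; => (411 -28 -8)

;; Reversing the arguments would report growth as a negative number.
(map - pop-2000 pop-2010) ;; => (-411 28 8)
```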
Finally, we add the new column to the original data with incanter.core/conj-cols. This function takes two datasets with the same number of rows, and it returns a new dataset with the columns from both of the input datasets:
(i/conj-cols data))
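The same-number-of-rows requirement is what keeps each delta aligned with its place. This plain-Clojure sketch, which is not Incanter's implementation, adds a column to a seq of row maps; the helper name `add-column` and the two rows are hypothetical. It checks the lengths explicitly because `map` would otherwise silently drop the extra rows or values.

```clojure
;; Hypothetical rows and deltas, in the same order and of the same length.
(def rows   [{:NAME "Abingdon town" :POP100 8191}
             {:NAME "Accomac town"  :POP100 519}])
(def deltas [411.0 -28.0])

(defn add-column
  "Pairs each row with the value at the same position and stores it under k."
  [rows k values]
  ;; map would truncate to the shorter input, so refuse mismatched lengths
  (assert (= (count rows) (count values)) "row count mismatch")
  (map #(assoc %1 k %2) rows values))

(add-column rows :POP.DELTA deltas)
;; => ({:NAME "Abingdon town", :POP100 8191, :POP.DELTA 411.0}
;;     {:NAME "Accomac town", :POP100 519, :POP.DELTA -28.0})
```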