While we can filter datasets before we import them into Incanter, Incanter makes it easy to filter and create new datasets from the existing ones. We'll take a look at its query language in this recipe.
We'll use the same dependencies, imports, and data as we did in the Selecting columns with $ recipe.
Once we have the data, we query it using the $where
function:
user=> (def land-use (i/$where {:Indicator-Code "AG.LND.AGRI.ZS"} chn-data)) user=> (i/nrow land-use) 1 user=> (i/$ [:Indicator-Code :2000] land-use) ("AG.LND.AGRI.ZS" "56.2891584865366")
user=> (i/$ (range 5) [:Indicator-Code :1962]
(i/$where {:1962 {:ne ""}} chn-data))
| :Indicator-Code | :1962 |
|-------------------+-------------------|
| AG.AGR.TRAC.NO | 55360 |
| AG.LND.AGRI.K2 | 3460010 |
| AG.LND.AGRI.ZS | 37.0949187612906 |
| AG.LND.ARBL.HA | 103100000 |
| AG.LND.ARBL.HA.PC | 0.154858284392508 |
Incanter's query language is even more powerful than this, but these examples should show you the basic structure and give you an idea of the possibilities.
To better understand how to use $where
, let's break apart the last example:
($i/where {:1962 {:ne ""}} chn-data)
The query is expressed as a hashmap from fields to values (highlighted). As we saw in the first example, the value can be a raw value, either a literal or an expression. This tests for inequality.
($i/where {:1962 {:ne ""}} chn-data)
Each test pair is associated with a field in another hashmap (highlighted).
In this example, both the hashmaps shown only contain one key-value pair. However, they might contain multiple pairs, which will all be ANDed together.
Incanter supports a number of test operators. The basic boolean tests are :$gt
(greater than), :$lt
(less than), :$gte
(greater than or equal to), :$lte
(less than or equal to), :$eq
(equal to), and :$ne
(not equal). There are also some operators that take sets as parameters: :$in
and :$nin
(not in).
The last operator—:$fn
—is interesting. It allows you to use any predicate function. For example, this will randomly select approximately half of the dataset:
(def random-half (i/$where {:Indicator-Code {:$fn (fn [_] (< (rand) 0.5))}} chnchn-data))
For full details of the query language, see the documentation for incanter.core/query-dataset
(http://liebke.github.com/incanter/core-api.html#incanter.core/query-dataset).