Clojure has a number of tools for working with agents. One of them is validators. When an agent's message function returns a value, the validator function assigned to that agent receives the new value before the agent's state is updated. If the validator returns true, all is well: the agent is updated and processing continues. However, if the validator returns false or throws an exception, an error is raised on the agent and its state is left unchanged.
This can be a handy tool to make sure that the data assigned to your agent conforms to your expectations, and it can be an important check on the consistency and validity of your data.
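As a minimal sketch of this behavior, separate from the recipe itself, consider a counter agent whose state must never go negative. The names counter and wait-for-error are hypothetical and exist only for this illustration. The polling helper is there because a failed agent rejects new actions, so waiting with await after the bad update could block.

```clojure
;; Hypothetical example: a counter agent whose state must never go negative.
(def counter (agent 0))
(set-validator! counter (fn [n] (>= n 0)))

(defn wait-for-error
  "Polls ag until it reports an error or timeout-ms elapses.
  Returns the error, or nil on timeout."
  [ag timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (or (agent-error ag)
          (when (< (System/currentTimeMillis) deadline)
            (Thread/sleep 10)
            (recur))))))

;; A valid update: the validator accepts 1, so the state changes.
(send counter inc)
(await counter)

;; An invalid update: (- 1 10) is negative, so the validator rejects it.
(send counter - 10)
(def err (wait-for-error counter 2000))

;; The rejected value was never installed, so the state is still 1.
(def value-after-rejection @counter)

;; A failed agent refuses new actions until it is restarted.
(restart-agent counter 0)
```

Here err holds the exception raised by the failed validation, and value-after-rejection shows that the agent kept its last valid state.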
For this recipe, we'll read data from a CSV file and convert the values in some of the columns to integers. We'll use a validator to ensure that this actually happens.
For this recipe, we'll use the same dependencies and requirements that we used in the Managing program complexity with STM recipe. We'll also use the lazy-read-csv and with-header functions from that recipe, along with its data file. We'll keep that filename bound to data-file.
This recipe will be built from a number of shorter functions:
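;; The columns whose values should be converted to integers.
(def int-rows
  [:GEOID :SUMLEV :STATE :POP100 :HU100 :POP100.2000
   :HU100.2000 :P035001 :P035001.2000])

;; Tests whether x is a boxed integer. Clojure 1.9 and later ship a core
;; int? predicate, so this definition replaces it with a warning there.
(defn int? [x]
  (or (instance? Integer x) (instance? Long x)))

;; Reads x with the Clojure reader, falling back to x itself if reading fails.
(defn try-read-string [x]
  (try
    (read-string x)
    (catch Exception ex
      x)))

;; Agent action for the caster: converts the int-rows columns of row,
;; sends the result to sink, and makes the result the caster's new state.
(defn coerce-row [_ row sink]
  (let [cast-row (apply assoc row
                        (mapcat (fn [k] [k (try-read-string (k row))])
                                int-rows))]
    (send sink conj cast-row)
    cast-row))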
The read-row function takes the first item off the input, sends it to the coerce-row agent, queues itself to read another item of the input, and sets its value to the rest of the input:

(defn read-row [rows caster sink]
  (when-let [[item & items] (seq rows)]
    (send caster coerce-row item sink)
    (send *agent* read-row caster sink)
    items))
The validate function is the validator for the coerce-row agent. It checks that the integer fields are either integers or empty strings:

(defn int-val? [x]
  (or (int? x) (empty? x)))

(defn validate [row]
  (or (nil? row)
      (reduce #(and %1 (int-val? (%2 row))) true int-rows)))
(defn agent-ints [input-file]
  (let [reader (agent (seque (with-header (lazy-read-csv input-file))))
        caster (agent nil)
        sink (agent [])]
    (set-validator! caster validate)
    (send reader read-row caster sink)
    {:reader reader :caster caster :sink sink}))
We can run this and look at the first item accumulated in the :sink agent:

user=> (def ags (agent-ints data-file))
#'user/ags
user=> (first @(:sink ags))
{:SUMLEV 160, :P035001 2056, :HU100.2000 3788, :HU100 4271,
 :NAME "Abingdon town", :GEOID 5100148, :NECTA "", :CBSA "", :CSA "",
 :P035001.2000 2091, :POP100.2000 7780, :CNECTA "", :POP100 8191,
 :COUNTY "", :STATE 51}
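To see what a failed validation looks like from the outside, the following is a small sketch that uses two hypothetical in-memory rows instead of the CSV file, and a simplified single-column stand-in for coerce-row named coerce. It assumes the default agent error mode, in which a failed validation leaves the agent in a failed state. The row names, the coerce function, and the one-column validator are assumptions made for this illustration only.

```clojure
;; Hypothetical in-memory rows standing in for the parsed CSV data.
(def good-row {:NAME "A" :POP100 "10"})
(def bad-row  {:NAME "B" :POP100 "oops"})

(defn coerce
  "Simplified stand-in for coerce-row that handles a single column."
  [_ row sink]
  (let [cast-row (update row :POP100
                         #(try (read-string %) (catch Exception _ %)))]
    (send sink conj cast-row)
    cast-row))

(def sink (agent []))
(def caster (agent nil))
(set-validator! caster
                (fn [row] (or (nil? row) (integer? (:POP100 row)))))

(send caster coerce good-row sink)
(send caster coerce bad-row sink)

;; Poll briefly (up to about two seconds) until the failure is visible.
(loop [tries 200]
  (when (and (nil? (agent-error caster)) (pos? tries))
    (Thread/sleep 10)
    (recur (dec tries))))

;; Let the sink finish any work that was already dispatched to it.
(await sink)

(println @sink)                         ; only the converted good row
(println (class (agent-error caster)))  ; the validation failure
(println @caster)                       ; the last row that passed validation
```

In this sketch the bad row never reaches the sink: messages sent from inside an agent action are only dispatched once that action's new state has been accepted.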
The agent-ints function is pretty busy. It defines the agents, sets everything up, and returns the map containing the agents.
Let's break it down:
(let [reader (agent (seque (with-header (lazy-read-csv input-file))))
      caster (agent nil)
      sink (agent [])]
These lines define the agents. One reads in the data, one converts it to integers, and one accumulates the results. This figure illustrates that process:
Next, read-row simply gets the first item of the input and sends it to the caster agent. The coerce-row function tries to change the data in the columns listed in int-rows to integers. It then passes the results to the sink agent. Before the caster agent's update is complete, however, its new state is passed to its validator function, validate.
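The conversion step is easiest to understand by trying it on a few strings. The sketch below repeats the recipe's try-read-string so that it runs on its own; the sample inputs are hypothetical. One detail worth noticing is that non-numeric text does not come back as the original string. The reader parses it as a symbol, and that is the kind of value the validator is there to catch.

```clojure
(defn try-read-string [x]
  (try
    (read-string x)
    (catch Exception _
      x)))

;; Numeric text is read as a number.
(def from-number (try-read-string "8191"))

;; An empty string cannot be read, so the original string is returned.
(def from-empty (try-read-string ""))

;; Non-numeric text is read as a symbol, not returned as a string.
(def from-text (try-read-string "Abingdon"))

(println from-number (pr-str from-empty) (class from-text))
```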
The validator allows nil rows (for the agent's initial state) or rows whose integer fields contain either integers or empty strings. Finally, the sink agent is called with conj. It accumulates the converted results.
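The validator's rules can be checked directly on a few rows. This is a sketch under stated assumptions: int-cols is a hypothetical two-column subset of int-rows, the sample rows are made up, and the core integer? predicate is swapped in for the recipe's int? helper so that the block stands alone.

```clojure
;; Hypothetical two-column subset of the recipe's int-rows.
(def int-cols [:POP100 :HU100])

;; Same shape as the recipe's int-val?, using core integer? instead of int?.
(defn int-val? [x]
  (or (integer? x) (empty? x)))

(defn validate [row]
  (or (nil? row)
      (reduce #(and %1 (int-val? (%2 row))) true int-cols)))

;; The agent's initial state is accepted.
(println (validate nil))

;; Integers and empty strings are both accepted.
(println (validate {:NAME "A" :POP100 8191 :HU100 ""}))

;; A field left as a non-empty string is rejected.
(println (validate {:NAME "B" :POP100 "n/a" :HU100 4271}))
```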