Often, simply querying data isn't enough. The data might not be in a form you can use, for instance, and you'll need to transform it first. Cascalog makes this straightforward.
For this recipe, we'll define a custom operation and use it to split year ranges of the form '2000–2010' into two fields.
We'll use the same dependencies and inclusions that we used in the Initializing Cascalog and Hadoop for distributed processing recipe. We'll also use the Doctor Who companion data from that recipe.
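First, we define the custom operation, which splits its input on an en dash (#"\u2013"). If the input isn't a range (that is, it's just a year), then the year is returned for both the start and end of the range. Note that the code always prefixes the end of a range with the first two characters of the start year, so it expects the end year in two-digit form:

(defmapfn split-range [date-range]
  (let [[from to] (string/split (str date-range) #"\u2013" 2)]
    [from (if (nil? to)
            from
            (str (.substring from 0 2) to))]))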
We can then call the operation in a query, using :> to bind its two outputs to ?from and ?to:

user=> (?<- (stdout)
            [?n ?name ?from ?to]
            (actor ?n ?name ?range)
            (split-range ?range :> ?from ?to))
…
RESULTS
-----------------------
1  William Hartnell   1963  1966
2  Patrick Troughton  1966  1969
3  Jon Pertwee        1970  1974
…
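The splitting logic can also be tried at a plain REPL, without Cascalog or Hadoop. The following is a minimal sketch, not part of the recipe: split-range* and split-range-full* are hypothetical names introduced here. The first is an ordinary function with the same body as split-range; the second is one possible variant that copies only as many leading characters from the start year as the end year is missing, so it would also accept a four-digit end year such as 2000–2010.

```clojure
(require '[clojure.string :as string])

;; Hypothetical plain-function twin of split-range, for REPL experiments.
;; Same body as the recipe's operation: the end of a range always gets the
;; first two characters of the start year prepended.
(defn split-range*
  [date-range]
  (let [[from to] (string/split (str date-range) #"\u2013" 2)]
    [from (if (nil? to)
            from
            (str (.substring from 0 2) to))]))

;; Hypothetical variant (an assumption, not the recipe's code): prepend only
;; the characters the end year lacks, so full four-digit end years pass through.
(defn split-range-full*
  [date-range]
  (let [[from to] (string/split (str date-range) #"\u2013" 2)
        missing   (- (count from) (count (or to from)))]
    [from (cond
            (nil? to)      from
            (pos? missing) (str (subs from 0 missing) to)
            :else          to)]))

(split-range* "1963\u201366")           ; => ["1963" "1966"]
(split-range* 2005)                     ; => ["2005" "2005"]
(split-range-full* "2000\u20132010")    ; => ["2000" "2010"]
```

Because both functions return a two-element vector, either body could be wrapped in defmapfn and bound to two output variables with :> in the same way as split-range.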