Chapter 2. Clojure elements: Data structures and functions

This chapter covers

  • Clojure’s core data structures
  • Clojure functions
  • Program flow with Clojure

In the previous chapter, you read about some features of the Clojure language that make it interesting. You saw some code, but it probably looked a little alien. It’s now time to set that right. This chapter and the next address the basics of writing code in Clojure. This one will give an overview of the various data structures that make up the core of the language and walk you through the fundamentals of the structure and flow of Clojure programs. By the end of the next chapter, you’ll be able to read most Clojure code and write your own programs.

2.1. Coding at the REPL

Unlike many other languages, Clojure doesn’t have to be typed into files and compiled all at once. Instead, you can interactively build a working program an expression at a time and try out code immediately. This form of interactive development is possible through the read-evaluate-print loop (REPL). It’s an interactive shell similar to those provided by languages such as Ruby and Python. In this section we’ll introduce you to interacting with a live Clojure environment through the REPL, which will enable you to follow along with the lessons in this chapter and the next. We encourage you to read these chapters near a REPL, copy the examples, explore different approaches, and try some code of your own and see what happens.

If you haven’t already done so, get a Clojure REPL up and running—see appendix A for instructions. (If you don’t want to wait, go to http://www.tryclj.com.)

2.1.1. Clojure REPL

Clojure programs are usually not all typed out in one go. In fact, these days, programs in most languages are often written using test-driven development (TDD). This technique allows the programmer to build up a larger program from smaller units of tested code. Doing this keeps programmer productivity high because the focus is always on one piece of the program at any given time. You write the test for something, write just enough code to make the test pass, and repeat the process. This style of development also has the added benefit of leaving behind a set of regression tests that can be used later. It ensures that as the program is modified and enhanced, nothing breaks existing functionality.

Clojure code can also be written with a TDD approach; indeed, it often is. The Clojure REPL adds a fantastic tool that allows you to be even more productive than when using plain TDD. This combination of using the REPL alongside the typical TDD style results in far shorter code-test-debug cycles.

The REPL prompt (the text behind the cursor that waits for keyboard input) is the name of the active namespace followed by the > symbol. When you first start the REPL, you’ll see the following prompt:

user>

As this prompt shows, Clojure puts you into the default namespace of user. You can type Clojure code at this prompt. When you finish typing in a form (a single valid expression, also called a symbolic expression or s-expression)[1] and press Enter, the Clojure reader accepts the stream of characters from the prompt (or any other source) and converts it into Clojure data structures. The data structures are evaluated to produce the result of the program, which is usually another data structure. The Clojure printer attempts to print the result in a format that can be read back by the reader. Finally, Clojure loops back and waits for more input.

[1] If you want to be super-pedantic, there’s a distinction between “form” and “expression.” A form is a single readable unit, such as a number, a pair of matched quotation marks for a string, or a pair of matched parentheses. Forms are what matter when reading. An expression is something with a value, such as a data structure. Expressions are what matter when evaluating. This distinction between forms and expressions, reading and evaluating, is another symptom of Lisp’s nonsyntax as explained in chapter 1: it’s possible to read forms without knowing how to evaluate them as expressions!

Let’s look at a concrete example of REPL interaction:
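A session consistent with the description that follows might look like the sketch below. The parameter names operand1 and operand2, the inputs to the two calls, and the string on the last input line are illustrative assumptions. The prompt is left off and each printed value is shown on a ;=> comment line so the forms can be pasted directly into a REPL.

```clojure
(+ 1 2)
;=> 3
(def my-addition (fn [operand1 operand2] (+ operand1 operand2)))
;=> #'user/my-addition
(my-addition 1 2)
;=> 3
(my-addition 100 30)
;=> 130
(+ 1 2) "Two forms on one line!"
;=> 3
;=> "Two forms on one line!"
```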

Lines starting with => are the printed values of the expression evaluated on the previous prompt. The first expression adds 1 and 2. The second expression defines a namespace-qualified global user/my-addition, which contains an addition function. That funny-looking #'user/my-addition is a var (created and returned by def). A var is a named, mutable container that holds a value, in this case the addition function. You’ll learn more about vars later. For now, just know that if you want to save the value of an expression to refer to it later, use (def variable-name "value to save").

The third and fourth expressions invoke the newly defined addition function and return the results. There’s no explicit “return” from the function—the value returned from a function is always the last expression evaluated in the function.

Notice from the final three lines that the REPL isn’t run per prompt or per line but per form. Clojure reads until it sees a complete form, then evaluates and prints, and then if there are still characters in its buffer it reads another form, evaluates, and prints. The REPL will not prompt the user for more input until it runs out of complete forms to read (which is exactly what happens when Clojure is reading code out of a file instead of from the prompt).

Functions like my-addition are usually created first in the REPL and then tested with various inputs. Once you’re satisfied that the function works, you copy the test cases into an appropriate test file. You also copy the function definition into an appropriate source file and run the tests. At any time you can modify the definition of the function in the REPL by redefining it, and your tests will run using the new definition. This is because the REPL is a long-running process with the various definitions present in memory. That means that functions using any such redefined functions will exhibit the new behavior.
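As a rough sketch of that last point, assume a hypothetical second function, add-ten, that calls my-addition. Redefining my-addition (here deliberately wrongly, so the change is visible) alters what add-ten returns without add-ten itself being touched.

```clojure
;; First definition: plain addition.
(def my-addition (fn [a b] (+ a b)))

;; add-ten is a hypothetical caller used only for this illustration.
(def add-ten (fn [x] (my-addition x 10)))
(add-ten 1)
;=> 11

;; Redefine my-addition; add-ten picks up the new definition on its next call.
(def my-addition (fn [a b] (* a b)))
(add-ten 1)
;=> 10
```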

Various editors can integrate with the REPL and provide convenient ways to evaluate code from inside files being edited. This kind of integration further increases the productivity of the REPL-based TDD cycle. (Chapter 10 has much more detail on testing and TDD using Clojure.)

Now that you’re somewhat comfortable interacting with a Clojure environment via the REPL, it’s time for you to write some more code. We’ll begin with the traditional “Hello, world!” program, and before ending the section, we’ll address a few more points about Clojure syntax.

2.1.2. “Hello, world!”

Let’s get started with a simple program. To keep with tradition, we’ll examine a program that prints “Hello, world!” as shown here:

user> (println "Hello, world!")
Hello, world!
=> nil

Pretty simple, right? But there are still a few points to note. First, notice that “Hello, world!” was printed on a line by itself, with no => in front of it, and that a second line says nil. What’s going on? The function println is unusual (well, for Clojure) because it’s a side-effecting function: it prints a string to standard-out and then returns nil. Normally you want to create pure functions that only return a result and don’t modify the world (for example, by writing to the console). So the Hello, world! line was printed during the REPL’s evaluation phase, and the => nil line was printed during the REPL’s print phase.

From now on we’ll omit the user> prompt and begin result lines with ;=> if there’s no ambiguity between return values and printed side effects. This is a convention designed to facilitate easy copy-pasting of Clojure code into files and REPLs.

Magic REPL variables

There are also four magic REPL variables that you should keep in mind to save you some typing as you experiment in the REPL: *1, *2, *3, and *e. These variables hold the value of the last, second-last, and third-last successfully evaluated forms (that is, the lines starting with =>) and the last error. Each time a new form is evaluated successfully, its value is put in *1, and the old *1 moves to *2, and the old *2 moves to *3. For example:
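A short session along these lines shows the rotation. The string literals are arbitrary placeholders, and the results assume the forms are entered one at a time at a REPL (these variables are bound only there, not when code is loaded from a file).

```clojure
"first"
;=> "first"
"second"
;=> "second"
"third"
;=> "third"
*3
;=> "first"
```

Evaluating *3 is itself a successful evaluation, so its result becomes the new *1 and the older values shift down again.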

If there’s an error, the number variables stay the same and the error is bound to *e:

())
;=> ()
RuntimeException Unmatched delimiter: )  clojure.lang.Util.runtimeException (Util.java:221)
*1
;=> ()
*e
;=> #<ReaderException clojure.lang.LispReader$ReaderException: java.lang.RuntimeException: Unmatched delimiter: )>

Before moving on to the various topics planned for this chapter, let’s look at a couple of facilities provided by Clojure that can help with the learning process itself.

2.1.3. Looking up documentation using doc, find-doc, and apropos

Thanks to a feature of Clojure called metadata, all functions have documentation available at runtime, even in the REPL. You’ll learn more in chapter 3 about adding documentation to functions you define yourself and about custom metadata, but before we introduce these concepts formally, we’ll review a family of functions you can use to search for and read Clojure documentation as you explore in the REPL: doc, find-doc, and apropos.

doc

Clojure provides a useful macro called doc that allows you to look up the documentation associated with any other function or macro. It accepts the name of the entity you’re trying to learn about. Here’s an example:

user> (doc +)
-------------------------
clojure.core/+
([] [x] [x y] [x y & more])
  Returns the sum of nums. (+) returns 0.

Note that it prints not only the documentation string but also what arguments can be passed to the function or macro. The line ([] [x] [x y] [x y & more]) is the argument specification. Each pair of square brackets describes a possible way of calling the function. For example, the + function can be called in any of the following ways:
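A sketch of calls matching each entry of that argument specification (the particular numbers are arbitrary):

```clojure
(+)                  ; matches []
;=> 0
(+ 1)                ; matches [x]
;=> 1
(+ 1 2)              ; matches [x y]
;=> 3
(+ 1 2 3)            ; matches [x y & more]
;=> 6
(+ 1 2 3 4 5 6 7 8)  ; also matches [x y & more]
;=> 36
```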

The & symbol in a function argument specification means “and any number of optional arguments.” Functions like this are called variadic functions. You’ll learn more about using and defining variadic functions in chapter 3.

find-doc

The find-doc function accepts a string, which can be a regular expression (regex) pattern. It then finds the documentation for all functions or macros whose names or documentation match the supplied pattern. Whereas doc is useful when you know the name of the function or macro you want to look up, find-doc is useful if you aren’t sure of the name. Here’s an example:

user> (find-doc "lazy")
-------------------------
clojure.core/concat
([] [x] [x y] [x y & zs])
  Returns a lazy seq representing the concatenation of...
-------------------------
clojure.core/cycle
([coll])
  Returns a lazy (infinite!) sequence of repetitions of...
... more results

These two forms, doc and find-doc, are quite useful at the REPL when you want to quickly look up what a function does or you want to find the right options. You may see a lot of functions and documentation you don’t understand yet, but rest assured we’ll cover it all eventually.

apropos

A related function is apropos, which works much like find-doc but matches against names only and returns just the names of the functions that match the search pattern. Here’s an example:

user> (apropos 'doc)
(find-doc doc *remote-javadocs* javadoc add-remote-javadoc add-local-javadoc *local-javadocs*)

2.1.4. A few more points on Clojure syntax

In chapter 1, we discussed the unique, parentheses-heavy syntax that Clojure employs. We examined why it exists and what it makes possible. Before we start examining the various constructs of the Clojure language, let’s cover a few more key points about Clojure syntax:

  • Prefix notation
  • Whitespace and comments
  • Case sensitivity

Prefix notation

Clojure code uses prefix notation (also called Polish notation) to represent function calls. For those who are new to Lisp, this definitely takes a little getting used to, especially when it comes to using math functions such as +, /, *, and the like. Instead of writing 1 + 2, Clojure represents this evaluation as (+ 1 2). Prefix notation is less familiar than the mathematical form we all learned at school.

Regular functions, on the other hand, don’t have this problem. In a language such as Ruby, you’d call an add function as follows:

add(1, 2)

If you look closely, this is also prefix notation because the name of the function appears first, followed by arguments. The advantage of prefix notation for functions is that the function always appears as the first symbol, and everything else that follows can be treated as arguments to it. The Clojure version moves the parentheses (and drops the unnecessary comma, because whitespace is sufficient to delimit the arguments):

(add 1 2)

In most languages, mathematical functions like addition and subtraction are special cases built into the language as operators to make it possible to represent math in the more familiar infix notation. Clojure avoids this special case by not having any operators at all. Instead, math operators are just Clojure functions. All functions work the same way, whether they’re math related or not.

By avoiding special cases and relying on the same prefix notation for all functions, Clojure maintains its regularity and gives you all the advantages that come from having no syntax. We discussed this aspect of the language in some detail in chapter 1. The main advantage we talked about was that it makes it easy to generate and manipulate code. For example, consider the regular way in which Clojure structures the conditional cond form (you can think of this as a set of if-then-else clauses in other languages):

(def x 1)
;=> #'user/x
(cond
   (> x 0) "greater!"
   (= x 0) "zero!"
   (< x 0) "lesser!")
;=> "greater!"

This is a nested list, and it contains an even number of expressions that appear in pairs. The first element of each pair is a test expression, and the second is the respective expression that’s evaluated and returned if the test expression succeeds. Generating such a simple list is easy, especially when compared to a case statement in a language like Java.

This is the reason Clojure uses prefix notation, and most programmers new to this way of calling functions will get used to it in no time. Now, let’s discuss two more aspects of writing Clojure code: whitespace and comments.

Whitespace

As you’ve seen, Clojure uses parentheses (and braces and square brackets) to delimit fragments of code. Unlike languages such as Ruby and Java, it doesn’t need commas to delimit elements of a list (such as a vector or arguments passed to a function). You can use commas if you like, because Clojure treats them as whitespace and ignores them. So the following function calls are all equivalent:

(+ 1 2 3 4 5)
;=> 15
(+ 1, 2, 3, 4, 5)
;=> 15
(+ 1,2,3,4,5)
;=> 15
(+ 1,,,,,2,3 4,,5)
;=> 15

Although Clojure ignores commas, it sometimes uses them to make things easier for the programmer to read. For instance, if you have a hash map like the following

(def a-map {:a 1 :b 2 :c 3})
;=> #'user/a-map

and ask for its value at the REPL, the Clojure printer echoes it with commas:

user> a-map
{:a 1, :c 3, :b 2}

The results are easier to read, especially if you’re looking at a large amount of data. By the way, if you’re wondering why the ordering of the key-value pairs is different, it’s because hash maps aren’t ordered, and the Clojure printer doesn’t print them in any specific order. It makes no difference to the actual hash map, just how it looks after being printed. We’ll talk more about hash maps later in this chapter. Now let’s look at comments.

Comments

As in most Lisps, single-line comments in Clojure are denoted using semicolons. To comment out a line of text, put one or more semicolons at the beginning. Here’s an example:

;; This function does addition.
(defn add [x y]
  (+ x y))

How many semicolons?

As an aside, some folks use the following convention relating to comment markers. Single semicolons are used when the comment appears after some program text. Double semicolons are used, as shown previously, to comment out an entire line of text. And finally, triple semicolons are used for block comments. These are just conventions, of course, and you’re free to decide what works for you.

Clojure provides a rather convenient macro that can be used for multiline comments. The macro is called comment, and here’s an example:

(comment
  (defn this-is-not-working [x y]
    (+ x y)))
;=> nil

This causes the whole s-expression to be treated as a comment. Specifically, the comment macro ignores forms passed in and returns nil.

As a final note on syntax, let’s address case sensitivity.

Case sensitivity

Like the majority of modern programming languages (including Java), Clojure is case sensitive. This is unlike most Lisps, however, which are usually not case sensitive.

Now that we’ve covered Clojure syntax, you’re ready to learn about writing programs in the language. We’ll begin by surveying the built-in data structures Clojure makes available and the functions that manipulate them. Then you’ll learn how to fill those functions with definition and control-flow forms, such as let, if, when, cond, loop, and others.

2.2. Clojure data structures

In this section, we’re going to explore the various built-in data types and data structures of Clojure. We’ll start with nil, Boolean values, characters, and strings, and end with Clojure sequences.

2.2.1. nil, truth, and falsehood

You’ve seen these in action in the last several pages, so let’s run a quick recap. Clojure’s nil is equivalent to null in Java and nil in Ruby. It means “nothing.” Calling a function on nil may lead to a NullPointerException, although core Clojure functions try to do something reasonable when operating on nil.

Boolean values are simple. Everything other than false and nil is considered true. There’s an explicit true value, which can be used when needed.
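A small sketch of this rule follows. The helper truthy? is a hypothetical name introduced only for this illustration; it reports which branch an if would take for a given value.

```clojure
;; Hypothetical helper: returns true when `if` treats x as true, false otherwise.
(def truthy? (fn [x] (if x true false)))

(truthy? false)
;=> false
(truthy? nil)
;=> false
(truthy? true)
;=> true
(truthy? 0)     ; zero is not false in Clojure
;=> true
(truthy? "")    ; neither is the empty string
;=> true
(truthy? [])    ; nor an empty collection
;=> true
```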

2.2.2. Characters and strings

Clojure characters are Java characters (unsigned 16-bit UTF-16 code units). Clojure has a reader macro, the backslash, which can be used to denote characters, like \a or \g. (There are other reader macros; you’ll learn more about reader macros in section 2.3.4.)

Clojure strings are Java strings. They’re denoted using double quotes (because a single quote is a reader macro, which as you saw earlier means something else entirely). For this reason, it’s useful to know the API provided by the Java String class. Some examples are

(.contains "clojure-in-action" "-")

and

(.endsWith "program.clj" ".clj")

both of which return what you’d expect: true. Note the leading periods in .contains and .endsWith. This is Clojure syntax for calling a nonstatic Java method, and chapter 5 focuses entirely on Java interop.

2.2.3. Clojure numbers

The basics of Clojure numbers are easy: most of the time the numbers you’ll be using in Clojure are 64-bit integers (Java primitive longs) or 64-bit floating-point numbers (Java primitive doubles). When you need a bigger range, you can use big integers (arbitrary-precision integers) or big decimals (arbitrary-precision decimals).

Clojure also adds another less-common type of number: the ratio. A ratio is created when one integer is divided by another and the result isn’t a whole number; the ratio is kept in lowest terms. For example, executing the code (/ 4 9) returns a ratio 4/9. Table 2.1 summarizes Clojure numbers.
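As a quick sketch of integer division (the operands other than 4 and 9 are arbitrary choices), the ratio? predicate from clojure.core shows when a ratio is actually produced:

```clojure
(/ 4 9)
;=> 4/9
(/ 6 4)            ; reduced to lowest terms
;=> 3/2
(/ 4 2)            ; divides evenly, so the result is an ordinary integer
;=> 2
(ratio? (/ 6 4))
;=> true
(ratio? (/ 4 2))
;=> false
```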

Table 2.1. Syntax of Clojure numbers

| Type | Casting function | Range and implementation | Syntax examples | Contagiousness |
|---|---|---|---|---|
| Integer | long | Signed 64 bits (Java long) | Base 10: 42. Base 16: 0x2a 0x2A 0X2a (case of letters never matters). Base 8: 052 (leading zero). Any base from 2 to 36: 2r101010 10r42 36r16. Negative numbers: -42 -0x2a -052 -36r16 | 0 (Lowest) |
| Big integer | bigint | Infinite (like a Java BigInteger, but actually a clojure.lang.BigInt) | Base 10: 42N. Base 16: 0x2aN. Base 8: 052N. Note: the radix (XrX) syntax of normal integers isn’t supported! | 1 |
| Ratio | rationalize | Infinite: big integer numerator and denominator | 1/3 -2/4 | 2 |
| Big decimal | bigdec | Exact decimal number of arbitrary magnitude, good for financial calculations (Java BigDecimal) | 2.78M 278e-2M +0.278E1M | 3 |
| Floating point | double | IEEE 754 double-precision floating point (Java double) | 2.78 278e-2 +0.278E1 | 4 (Highest) |

When different number types are mixed together in the same arithmetic operations, the number type with the highest “contagiousness” will “infect” the result with its type. Here’s an illustration of this principle:

(+ 1 1N)
;=> 2N

(+ 1 1N 1/2)
;=> 5/2
(+ 1 1N 1/2 0.5M)
;=> 3.0M
(+ 1 1N 1/2 0.5M 0.5)
;=> 3.5

There’s one more subtlety of Clojure integers. Sometimes an arithmetic operation on integers produces a result too large to represent as a Clojure integer (that is, in 64 bits); this is called overflow. The only arithmetic operations that can overflow in Clojure are adding, subtracting, and multiplying integers (dividing integers that don’t divide evenly produces a ratio instead). Normally when an overflow occurs Clojure throws a java.lang.ArithmeticException. If you’d like Clojure to autopromote the result to a big integer instead, you should use a set of alternative math functions: +', -', *', inc' (increment), and dec' (decrement). Note that these are spelled like their normal, exception-throwing counterparts except for a single quote at the end. Here’s an example:

user> (inc 9223372036854775807)
ArithmeticException integer overflow  clojure.lang.Numbers.throwIntOverflow (Numbers.java:1424)
user> (inc' 9223372036854775807)
;=> 9223372036854775808N

2.2.4. Symbols and keywords

Symbols are the identifiers in a Clojure program, the names that signify values. For example, in the form (+ 1 2) the + is a symbol signifying the addition function. Because Clojure separates reading and evaluating, symbols have two distinct aspects: their existence in the program data structure after reading and the value that they resolve to. Symbols by themselves are just names with an optional namespace, but when an expression is evaluated they’re replaced with the value they signify.

It’s easy to have an intuitive sense of what a valid symbol looks like, but their syntax is difficult to explain precisely. Basically, a symbol is any run of alphanumeric characters or the following characters: * ! _ ? $ % & = < > + - and the period. But there are a few restrictions. Symbols can’t start with a number; if they start with -, +, or ., they can’t have a number as the second character (so they aren’t confused with number literals); and they can optionally have a single / in the middle (and nowhere else!) to separate the namespace and name parts.

Here are some representative examples of valid symbols: foo, foo/bar, ->Bar, -foo, foo?, foo-bar, and foo+bar. And here are some invalid attempts at symbols: /bar, /foo, and +1foo.

In a program, symbols normally resolve to something else that isn’t a symbol. But it’s possible to treat a symbol as a value itself and not an identifier by quoting the symbol with a leading single-quote character. The quote tells the reader that the next form is literal data and not code for it to evaluate later. Notice the difference:

arglebarg
CompilerException java.lang.RuntimeException: Unable to resolve symbol: arglebarg in this context.
'arglebarg
;=> arglebarg

In the first example the symbol arglebarg isn’t bound to anything, so an attempt to evaluate it throws an error. The second example is evaluating the symbol arglebarg itself as a value.

Essentially what’s happening when you quote a symbol is you’re treating the symbol as data and not code. In practice you’ll almost never quote symbols to use them as data because Clojure has a special type specifically for this use case: the keyword. A keyword is sort of like an autoquoted symbol: keywords never reference some other value and always evaluate to themselves. Keyword syntax is almost like symbol syntax, except keywords always begin with a colon. Here are some examples of keywords: :foo, :foo/bar, :->foo, and :+. You’ll end up using keywords very often in your Clojure code, typically as keys in hash maps and as enumerated values.

You can construct keywords and symbols from strings using the keyword and symbol functions, which take a string of the name and optionally a string of the namespace. Likewise, you can examine keywords and symbols using the name and namespace functions. For example:
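A sketch of these four functions in use; the strings "foo" and "bar" are arbitrary. When two strings are given, the namespace comes first and the name second.

```clojure
(keyword "foo")
;=> :foo
(keyword "foo" "bar")   ; namespace "foo", name "bar"
;=> :foo/bar
(symbol "foo" "bar")
;=> foo/bar
(name :foo/bar)
;=> "bar"
(namespace :foo/bar)
;=> "foo"
(namespace :foo)        ; no namespace part
;=> nil
(name 'foo/bar)         ; works on symbols too
;=> "bar"
```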

We’ve discussed all the scalar Clojure types. Let’s now talk about some Clojure collections.

2.2.5. Lists

Lists are the basic collection data structure in Clojure. If you’re familiar with lists from other languages, Clojure lists are singly linked lists, meaning that it’s easy to go from the first to the last element of a list but impossible to go backward from the last to the first element. This means that you can only add or remove items from the “front” of the list. But this also means that multiple different lists can share the same “tails.” This makes lists the simplest possible immutable data structure.

Use the list function to create a list and the list? function to test for list types:

(list 1 2 3 4 5)
;=> (1 2 3 4 5)
(list? *1)
;=> true

Use the conj function to create a new list with another value added to it:

(conj (list 1 2 3 4 5) 6)
;=> (6 1 2 3 4 5)

The conj function is the generic “add an item to a collection” function in Clojure. It always adds an item to a collection in the fastest way possible for that collection. So on lists conj adds to the beginning, as you saw, but with other collections it may add to the end or even (for unordered collections) nowhere in particular. You’ll see more of conj when we look at the other collection types.

conj can take multiple arguments; it adds each argument to the list in the order in which it’s supplied. Note that this means the new items appear in the new list in reverse order, because lists can grow only from the front:
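A sketch of conj with several arguments (the values are arbitrary), alongside the equivalent chain of single-argument calls:

```clojure
(conj (list 1 2 3) 4 5 6)
;=> (6 5 4 1 2 3)

;; The same result, adding one item at a time.
(conj (conj (conj (list 1 2 3) 4) 5) 6)
;=> (6 5 4 1 2 3)
```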

You can treat a list like a stack, too. Use peek to return the head of the list and pop to return the tail:
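A sketch of the stack operations on a small list (the values are arbitrary):

```clojure
(peek (list 1 2 3))
;=> 1
(pop (list 1 2 3))
;=> (2 3)
(peek (list))      ; peeking at an empty list gives nil
;=> nil
;; (pop (list)) throws IllegalStateException: Can't pop empty list
```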

Finally, you can count the number of items in a list in constant time using the count function:

(count (list))
;=> 0
(count (list 1 2 3 4))
;=> 4

Lists are special

As you learned earlier, Clojure code is represented using Clojure data structures. The list is special because each expression of Clojure code is a list. The list may contain other data structures such as vectors, but the list is the primary one.

In practice, this implies that lists are treated differently. Clojure assumes that the first symbol appearing in a list represents the name of a function (or a macro). The remaining expressions in the list are considered arguments to the function. Here’s an example:

(+ 1 2 3)

This list contains the symbol for plus (which evaluates to the addition function), followed by the numbers 1, 2, and 3 (number literals, which evaluate to themselves). Once the reader reads and parses this, the list is evaluated by applying the addition function to the numbers 1, 2, and 3. This evaluates to 6, and this result is returned as the value of the expression (+ 1 2 3).

This has another implication. What if you wanted to define three-numbers as a list containing the numbers 1, 2, and 3? You can try that:

(def three-numbers (1 2 3))
; CompilerException java.lang.ClassCastException: java.lang.Long cannot be cast to clojure.lang.IFn, compiling:(NO_SOURCE_FILE:1)

The reason for this error is that Clojure is trying to treat the list (1 2 3) the same way as it treats all lists. The first element is considered a function, and here the integer 1 isn’t a function. What you want here is for Clojure not to treat the list as code. You want to say, “This list isn’t code, so don’t try to apply normal rules of evaluation to it.” Notice you had the same problem with the arglebarg symbol earlier, where you wanted to treat the symbol as data instead of as code. The solution is the same, too—quoting:

(def three-numbers '(1 2 3))
;=> #'user/three-numbers

In practice you won’t use lists for data very often in your Clojure code unless you’re writing a macro. The same way that Clojure has a special data type for the symbol-as-data use case (the keyword type), Clojure also has a superpowered counterpart to the humble list that’s more appropriate for use as data: the vector.

2.2.6. Vectors

Vectors are like lists, except for two things: they’re denoted using square brackets, and they’re indexed by number. Vectors can be created using the vector function or literally using the square bracket notation:

(vector 10 20 30 40 50)
;=> [10 20 30 40 50]
(def the-vector [10 20 30 40 50])
;=> #'user/the-vector

Vectors being indexed by numbers means that you have fast random access to the elements inside a vector. The functions that allow you to get these elements are get and nth. If the-vector is a vector of several elements, the following is how you’d use these functions:

(get the-vector 2)
;=> 30
(nth the-vector 2)
;=> 30
(get the-vector 10)
;=> nil

(nth the-vector 10)
IndexOutOfBoundsException   clojure.lang.PersistentVector.arrayFor (PersistentVector.java:107)

As shown here, the difference between nth and get is that nth throws an exception if the index is out of bounds, whereas get returns nil. There are also several ways to modify a vector (that is, return a new one with the change). The most commonly used one is assoc, which accepts the index at which to associate a new value, along with the value itself:
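A sketch of assoc on the vector defined earlier (the replacement values 25 and 60 are arbitrary):

```clojure
(def the-vector [10 20 30 40 50])

(assoc the-vector 2 25)   ; replace the item at index 2
;=> [10 20 25 40 50]
(assoc the-vector 5 60)   ; an index equal to the length appends
;=> [10 20 30 40 50 60]
the-vector                ; the original vector is unchanged
;=> [10 20 30 40 50]
;; (assoc the-vector 6 70) throws IndexOutOfBoundsException,
;; because index 6 is beyond the position just after the last element.
```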

You saw how the conj function works on lists earlier. It also works on vectors. Notice that the new element ends up at the end of the sequence this time, because that’s the fastest spot on a vector:

(conj [1 2 3 4 5] 6)
;=> [1 2 3 4 5 6]

peek and pop work, too, and they also look at the end of the vector instead of the beginning like with lists:
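A sketch of peek and pop on a small vector (the values are arbitrary):

```clojure
(peek [1 2 3])
;=> 3
(pop [1 2 3])
;=> [1 2]
(peek [])          ; peeking at an empty vector gives nil
;=> nil
;; (pop []) throws IllegalStateException: Can't pop empty vector
```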

Vectors have another interesting property: they’re functions that take a single argument. The argument is assumed to be an index, and when the vector is called with a number, the value associated with that index is looked up inside itself. Here’s an example:

(the-vector 3)
;=> 40

The advantage of this is that vectors can be used where functions are expected. This helps a lot when using functional composition to create higher-level functions. We’ll revisit this aspect of vectors in the next chapter.
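As one small illustration, the vector can be handed to map, a core function that applies a function to each item of a collection. Here the vector itself plays the role of the function, and the indexes 0, 2, and 4 are arbitrary.

```clojure
(def the-vector [10 20 30 40 50])

;; Look up several indexes at once by using the vector as the function.
(map the-vector [0 2 4])
;=> (10 30 50)
```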

2.2.7. Maps

Maps are similar to associative arrays or dictionaries in languages like Python, Ruby, and Perl. A map is a collection of key-value pairs. The keys can be pretty much any kind of object, and a value can be looked up inside a map with its key. Maps are denoted using braces. Here’s an example of a map using keywords as keys, which, as it turns out, is a common pattern:

(def the-map {:a 1 :b 2 :c 3})
;=> #'user/the-map

Maps can also be constructed using the hash-map function:[2]

(hash-map :a 1 :b 2 :c 3)
;=> {:a 1, :c 3, :b 2}

[2] Map literals and the hash-map function aren’t exactly equivalent, because Clojure actually has two different map implementations: hash-map and array-map. Array maps store keys and values in insertion order and perform lookups by scanning instead of hashing. This is faster for small maps, so smaller map literals (approximately 10 keys or fewer) actually become an array map instead of a hash map. If you assoc too many keys to an array map, you’ll eventually get a hash map instead. (The opposite is not true, however: a hash map will never return an array map if it gets too small.) Transparently replacing the implementation of a data structure is a common performance trick in Clojure made possible by the use of immutable data structures and pure functions. The hash-map and array-map functions will always return the corresponding structure, regardless of the number of arguments you call them with.

Here, the-map is a collection of key-value pairs. The keys are :a, :b, and :c. The values are 1, 2, and 3. In the literal, each key is immediately followed by its value, which establishes which value associates with which key. The values can be looked up like this:

(the-map :b)
;=> 2

The reason this is valid Clojure code is because a Clojure map is also a function. It accepts a key as its parameter, which is used to look up the associated value inside itself. Clojure keywords (like :a and :b) are also functions: they accept an associative collection, such as a map or vector, and look themselves up in the collection, for example:
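A sketch of keyword lookup on the map defined earlier. The key :z and the default value 26 are arbitrary; the optional second argument is returned when the key is absent.

```clojure
(def the-map {:a 1 :b 2 :c 3})

(:b the-map)
;=> 2
(:z the-map)       ; missing key
;=> nil
(:z the-map 26)    ; missing key with a default
;=> 26
```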

The advantage of both maps and keywords being functions is that it makes function composition more flexible. Both these kinds of objects can be used where functions are needed, resulting in less and clearer code.

Like all Clojure data structures, maps are immutable. Several functions return a modified copy of a map; assoc and dissoc are the ones most commonly used. Here’s an example of inserting a new key-value pair into a map (more precisely, assoc returns a new map and leaves the original unchanged):

(def updated-map (assoc the-map :d 4))
;=> #'user/updated-map
updated-map
;=> {:d 4, :a 1, :b 2, :c 3}
(dissoc updated-map :a)
;=> {:b 2, :c 3, :d 4}

Before wrapping up this section, let’s look at some rather convenient functions that can make working with maps easy. First, let’s look at what you want to accomplish. Imagine you had an empty map, and you wanted to store user details in it. With one entry, the map might look like this:

(def users {:kyle {
              :date-joined "2009-01-01"
              :summary {
                :average {
                  :monthly 1000
                  :yearly 12000}}}})

Note the use of nested maps. Because maps are immutable, if you wanted to update Kyle’s summary for his monthly average, you couldn’t simply drill down to the spot on the map and update it in place as you would in most other languages. Instead, you would need to go down to the place you want to change, create the changed map, and assoc the changes into the intermediate maps on your way back up to the root. Doing this all the time would be tedious and error prone.

Fortunately, Clojure provides three functions that make updating nested collections easy. The first one is called assoc-in, and here it is in action:

(assoc-in users [:kyle :summary :average :monthly] 3000)
;=> {:kyle {:date-joined "2009-01-01", :summary {:average {:monthly 3000, :yearly 12000}}}}

This is helpful, because you don’t have to write a new function to set a new value rather deep in the user’s map. The general form of assoc-in is

(assoc-in map [key & more-keys] value)

If any nested map doesn’t exist along the way, it gets created and correctly associated.
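To see the intermediate maps being created, here is a small sketch. The name sparse-users is hypothetical and stands for a user record that has no :summary entry yet:

```clojure
;; sparse-users is a made-up example map with no nested :summary yet.
(def sparse-users {:kyle {:date-joined "2009-01-01"}})

;; assoc-in creates the missing :summary and :average maps on the way down.
(assoc-in sparse-users [:kyle :summary :average :monthly] 3000)
;=> {:kyle {:date-joined "2009-01-01", :summary {:average {:monthly 3000}}}}

;; The same happens when starting from an empty map.
(assoc-in {} [:a :b :c] 1)
;=> {:a {:b {:c 1}}}
```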

The next convenience function reads values out of such nested maps. This function is called get-in:

(get-in users [:kyle :summary :average :monthly])
;=> 1000

The final function that’s relevant to this discussion is called update-in, which can be used to update values in such nested maps. To see it in action, imagine you wanted to increase Kyle’s monthly average by 500:

(update-in users [:kyle :summary :average :monthly] + 500)
;=> {:kyle {:date-joined "2009-01-01", :summary {:average {:monthly 1500, :yearly 12000}}}}

The general form of update-in is

(update-in map [key & more-keys] update-function & args)

This works similarly to assoc-in, in that the keys are used to find what to update with a new value. Instead of supplying the new value itself, you supply a function that accepts the old value as its first argument, followed by any additional arguments you pass to update-in. The function is applied to these arguments, and the result becomes the new value. The + function here does that job: it takes the old monthly average value of 1000 and adds it to the supplied argument of 500.

Many Clojure programs use the map as a core data structure. Often, programmers used to objects in the stateful (data) sense of the word use maps in their place. This is a natural choice and works well.

2.2.8. Sequences

A sequence isn’t a collection type. Rather, a sequence is an interface (called ISeq) that exposes a “one thing followed by more things” abstraction. This interface is implemented pervasively by Clojure’s data structures, functions, and macros. The sequence abstraction allows all data structures to look and act like lists, even if the underlying values are some other collection type (such as a vector or hash map) or are even created lazily as they’re needed.

The ISeq interface provides three functions: first, rest, and cons. Here’s how first and rest work:

first returns the first element of the sequence like peek does for lists but the same way on all collection types. rest returns the sequence without the first element just like pop does on lists but the same way on all collection types and without throwing an exception for empty collections.
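A short sketch of first and rest on a few collection types might look like the following. The literal values are illustrative, and the order of entries in a map isn’t guaranteed in general (a small map literal like this one happens to keep the order in which it was written):

```clojure
;; first and rest behave the same way on lists, vectors, and maps.
(first (list 1 2 3))
;=> 1
(rest (list 1 2 3))
;=> (2 3)
(first [1 2 3])
;=> 1
(rest [1 2 3])
;=> (2 3)
(first {:a 1 :b 2})
;=> [:a 1]
(rest {:a 1 :b 2})
;=> ([:b 2])

;; On an empty collection, first returns nil and rest returns an empty sequence.
(first [])
;=> nil
(rest [])
;=> ()
```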

cons (short for construct) creates new sequences given an element and an existing sequence:

(cons 1 [2 3 4 5])
;=> (1 2 3 4 5)

cons adds an item to the beginning of a sequence (even on vectors) and the original sequence is the “tail” of the new sequence. Notice that this is exactly how conj would work on a list: the sequence abstraction cons uses allows it to act as though all sequential structures it touches are listlike.

Note that the sequence abstraction is usually lazy, meaning that functions like first, rest, and cons don’t do extra work to create lists, even though the result prints like a list (that is, surrounded by parentheses). Observe:

(list? (cons 1 (list 2 3)))
;=> false

The sequence abstraction allows everything to seem as though real lists were being manipulated but avoids actually creating any new data structures (such as actual lists) or doing any unnecessary work (such as creating items farther down the sequence that are never used).

Now that you have a solid foundation in the data structures of Clojure, it’s time to write some programs that use them.

2.3. Program structure

In this section, we’ll examine several constructs that are part of the Clojure language. Most of those that we discuss here are categorized as structural forms because they lend structure to the code; they set up local names, allow for looping and recursion, and the like. We’ll begin with the most fundamental aspect of structuring Clojure code, namely the function.

2.3.1. Functions

Clojure is a functional language, which means that functions are first-class citizens of the language. For functions to be first class, the language should allow them to

  • Be created dynamically
  • Be passed as arguments to functions
  • Be returned from other functions
  • Be stored as values inside other data structures

Clojure functions comply with all of these requirements.

If you’re used to programming in a language like C++ or Java, this will be a different experience. To start, let’s see how to define Clojure functions.

Function definition

Clojure offers the convenient defn macro, which allows traditional-looking function definitions, such as the following:

(defn addition-function [x y]
  (+ x y))

In reality, the defn macro expands to a combination of calls to def and fn, where fn is itself another macro and def is a special form. Here, def creates a var with the specified name and binds it to a new function object. This function has a body as specified in the defn form. Here’s what the equivalent expanded form looks like:

(def addition-function
  (fn [x y]
    (+ x y)))

The fn macro accepts a sequence of arguments in square brackets, followed by the body of the function. The fn form can be used directly to define anonymous functions. The def form shown here assigns the function created using fn to the var addition-function.

Variable arity

To define functions of variable arity, parameter lists can use the & symbol. An example is the addition function from Clojure core, where the parameters are defined as

[x y & more]

This allows + to handle any number of arguments. Functions are explained in more detail in chapter 3. Now you’ll learn about a form that helps in structuring the innards of functions themselves.

2.3.2. The let form

Consider the following function that calculates the average number of pets owned by the previously declared users:

(defn average-pets []
  (/ (apply + (map :number-pets (vals users))) (count users)))

Don’t worry yet about all that’s going on here. Observe that the body of the function is quite a long, complex-looking line of code. Such code can take several seconds to read. It would be nice if you could break it down into pieces, to make the intent of the code clearer. The let form allows you to introduce locally named things into your code by binding a symbol to a value. Consider the following alternate implementation:
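One way such an implementation might look is sketched below. The users map here is hypothetical sample data (the names and :number-pets values are made up so the snippet runs on its own), and evaluating it will replace any earlier definition of users in your REPL:

```clojure
;; Hypothetical sample data: the :number-pets values are invented for this sketch.
(def users {:kyle {:number-pets 2}
            :siva {:number-pets 4}
            :rob  {:number-pets 3}})

(defn average-pets []
  (let [user-data  (vals users)                       ; every user's detail map
        pet-counts (map :number-pets user-data)       ; pet count for each user
        total      (apply + pet-counts)]              ; sum of all pet counts
    (/ total (count users))))

(average-pets)
;=> 3
```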

Here, user-data, pet-counts, and total are namespace-less symbols that resolve to specific values but only in the scope of the let. Unlike vars created by def, these bindings can’t be changed, only shadowed by other bindings in nested scopes. Now the computation is much clearer, and it’s easy to read and maintain this code. Although this is a trivial example, you can imagine more complex use cases. Further, the let form can be used to name things that might be needed more than once in a piece of code. Indeed, you can introduce a local value computed from previously named values, within the same form, for instance:

(let [x 1
      y 2
      z (+ x y)]
  z)
;=> 3

More specifically, the let form accepts as its first argument a vector containing an even number of forms, followed by zero or more forms that get evaluated when the let is evaluated. The value of the last expression is returned.

Underscore identifier

Before moving on, it’s worth discussing the situation where you might not care about the return value of an expression. Typically, such an expression is called purely for its side effect. A trivial example is calling println, because you don’t care that it returns nil. If you do this inside a let form for any reason, you’ll need to specify an identifier in which to hold the return value. The code might look like this:

(defn average-pets []
  (let [user-data (vals users)
        pet-counts (map :number-pets user-data)
        value-from-println (println "total  pets:" pet-counts)
        total (apply + pet-counts)]
    (/ total (count users))))

In this code, the only reason you create value-from-println is that the let form needs a name to bind the value of each expression. In such cases where you don’t care about the value, you can just use a single underscore as the identifier name. Take a look at the following:

(defn average-pets []
  (let [user-data (vals users)
        pet-counts (map :number-pets user-data)
        _ (println "total  pets:" pet-counts)
        total (apply + pet-counts)]
    (/ total (count users))))

The underscore identifier can be used in any situation where you don’t care about the value of something. There’s nothing special about the underscore symbol: this is merely a Clojure convention to signal that the programmer doesn’t care about the symbol’s value and isn’t planning to use it later, but syntax requires the programmer to provide a binding symbol.

Although this example works for debugging, it isn’t particularly widespread in production code. The underscore identifier will be even more useful when we explore Clojure’s destructuring support in the next chapter.

We’ve covered the basics of the let form. We’re going to explore immutability and mutation a lot more, starting in chapter 3. For now, let’s continue with learning about the do form.

2.3.3. Side effects with do

In a pure functional language, programs are free of side effects. The only way to “do something” is for a function to compute a value and return it. Calling a function doesn’t alter the state of the world in any way. Consider the following code snippet:

(defn do-many-things []
  (do-first-thing)
  (do-another-thing)
  (return-final-value))

In a world without state and side effects, the do-many-things function would be equivalent to this one:

(defn do-many-things-equivalent []
  (return-final-value))

The calls to do-first-thing and do-another-thing can be eliminated without change in behavior, even without knowing what they do. This is because in a stateless world without side effects, the only thing that “does something” in do-many-things is the last function call to return-final-value, which presumably computes and returns a value. In such a world, there’d be no reason to ever call a series of functions (as shown in the first example), because only the last one would ever do anything useful.

The real world is full of state, and side effects are a necessity. For example, printing something to the console or to a log file is a side effect that changes the state of the world. Storing something in a database alters the state of the world and is another example of a side effect.

To combine multiple s-expressions into a single form, Clojure provides the do form. It can be used for any situation as described previously where some side effect is desired and the higher-order form accepts only a single s-expression. As an example, consider the if block:

(if (is-something-true?)
  (do
    (log-message "in true branch")
    (store-something-in-db)
    (return-useful-value)))

Because the consequent part of the if form accepts only a single s-expression, it would be impossible without the do shown here to get the true case to call all three functions (log-message, store-something-in-db, and return-useful-value).

The do form is a convenient way to combine multiple s-expressions into one. This is a common idiom in macros, and plenty of core Clojure forms are macros that accept multiple forms as parameters and combine them into one using an implicit do. Examples are fn, let, doseq, loop, try, when, binding, dosync, and locking.

Now that you know how to create blocks of code using do, let’s look at reader macros before moving on to the constructs that control program flow.

2.3.4. Reader macros

The Clojure reader converts program text into Clojure data structures. It does this by recognizing that characters such as parentheses, braces, and the like are special and that they form the beginning (and ending) of lists, hash maps, and vectors. These rules are built into the reader.

Other characters are special also, because they signal to the reader that the form that follows them should be treated in a special way. In a sense, these characters extend the capability of the reader, and they’re called reader macros. The simplest (and most traditional) example of a reader macro is the comment character (;). When the reader encounters a semicolon, it treats the rest of that line of code as a comment and ignores it. Table 2.2 shows the available reader macros in Clojure.

Table 2.2. Clojure’s reader macros and their descriptions

Reader macro character    Description of reader macro

Quote (')                 Quotes the form following it, same as (quote )
Character (\)             Yields a character literal
Comment (;)               Single-line comment
Meta (^)                  Associates metadata for the form that follows
Deref (@)                 Dereferences the agent or ref that follows
Dispatch (#)              #{}  Constructs a set
                          #""  Constructs a regex pattern
                          #^   Associates metadata for the form that follows (deprecated by ^)
                          #'   Resolves the var for the symbol that follows, same as (var )
                          #()  Constructs an anonymous function
                          #_   Skips the following form
Syntax quote (`)          Used in macros to render s-expressions
Unquote (~)               Unquotes forms inside syntax-quoted forms
Unquote splice (~@)       Unquotes a list inside a syntax form, but inserts the elements of the list without the surrounding parentheses

You don’t have to understand all of these now: you’ll learn about each of these reader macros in the relevant section in the book. For instance, we’ll use the last three quite heavily in chapter 7, which examines macros.

Reader macros are implemented as entries in a read table. An entry in this table is essentially a reader macro character associated with the macro function that describes how the form that follows is to be treated. Most Lisps expose this read table to the programmers, allowing them to manipulate it or add new reader macros. Clojure doesn’t do this, and so you can’t define your own reader macros. Starting with Clojure 1.4, Clojure does let you define your own data literals, something that we’ll examine in the next chapter.

In this section, you saw various structural constructs provided by Clojure. In the next section, you’ll see forms that control the execution flow of Clojure programs.

2.4. Program flow

Like most other languages, the basics of Clojure are simple to learn, with few special forms and indeed few constructs that control the flow of execution. In this section, we’ll begin with conditional program execution, with the if special form and other macros built on top of the if form, and then we’ll look at various functional constructs that allow for looping and working on sequences of data. Specifically, we’ll consider loop/recur, followed by a few macros that use loop/recur internally to make it convenient to process sequences. We’ll close this chapter with a few higher-order functions that apply other functions to sequences of data.

2.4.1. Conditionals

A conditional form is one that causes Clojure to either execute or not execute associated code. In this section, we’ll examine if, if-not, cond, when, and when-not. We’ll also briefly look at logical functions.

if

The most basic example of this is the if form. In Clojure, the general form of if looks like this:

(if test consequent alternative)

This shows that the if form accepts a test expression, which is evaluated to determine what to do next. If the test is true, the consequent is evaluated. If the test is false, and if an alternative form is provided, then it is evaluated instead (otherwise nil is returned). Because the consequent and alternative clauses of the if form can only be a single s-expression, you can use the do form to have it do multiple things. Here’s an example:

(if (> 5 2)
  "yes"
  "no")
;=> "yes"

if is a special form, which means that the Clojure language implements it internally as a special case. In a language that provides the if special form and a macro system, all other conditional forms can be implemented as macros, which is what Clojure does. Let’s visit a few such macros.

if-not

The if-not macro does the inverse of what the if special form does. The general structure of this macro is

(if-not test consequent alternative)

Here, if the test is false, the consequent is evaluated; else if it’s true and the alternative is provided, it’s evaluated instead. Here’s a quick example:

(if-not (> 5 2) "yes" "no")
;=> "no"

cond

cond allows you to flatten nested trees of if conditions. The general form looks like the following:

(cond & clauses)

Here’s a simple example of using cond:

(def x 1)
;=> #'user/x
(cond
   (> x 0)  "greater!"
   (= x 0)  "zero!"
   :default "lesser!")
;=> "greater!"

As you can see, the clauses are pairs of expressions, each of the form test consequent. Each test expression is evaluated in sequence, and when one returns true (actually anything other than false or nil), the associated consequent is evaluated and returned. To provide a fallback for the case where no test returns a truthy value, end with a test that is always truthy (for example, the keyword :default); its consequent is then evaluated and returned instead.

when

Here’s the general form of the when macro:

(when test & body)

This convenient macro is an if (without the alternative clause), along with an implicit do. This allows multiple s-expressions to be passed in as the body. Here’s how it might be used:

(when (> 5 2)
  (println "five")
  (println "is")
  (println "greater")
  "done")

five
is
greater
;=> "done"

Note that there’s no need to wrap the three functions in the body inside a do, because the when macro takes care of this. You’ll find this a common pattern, and it’s a convenience that most macros provide to their callers.

when-not

when-not is the opposite of when, in that it evaluates its body if the test returns false or nil. The general form looks similar to that of when:

(when-not test & body)

Here’s an example:

(when-not (< 5 2)
  (println "two")
  (println "is")
  (println "smaller")
  "done")

two
is
smaller
;=> "done"

These are some of the many forms that allow programs to handle different kinds of conditional situations. Except for the if special form, they’re all implemented as macros, which also implies that the programmer is free to implement new ones, suited to the domain of the program. In the next section, you’ll see a little more detail about writing test expressions using logical functions.

2.4.2. Logical functions

Any expression that returns a truthy or falsey value can be used for the test expression in all the previously mentioned conditional forms. To write compound test expressions, Clojure provides some logical operators. Let’s examine the logical and first.

and accepts zero or more forms. It evaluates each in turn, and if any form returns nil or false, and returns that value. If none of the forms return false or nil, then and returns the value of the last form. and short-circuits its arguments: once one returns a falsey value, the remaining forms aren’t evaluated. A simple rule to remember the return value of and is that it returns the “deciding” value, which is the last value it had to examine, or true if there are no values.
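A few calls to and, sketched with arbitrary keyword and literal arguments, illustrate the “deciding value” rule:

```clojure
;; With no forms there is nothing to examine, so and returns true.
(and)
;=> true

;; Every form is truthy, so the last one decides.
(and :a :b :c)
;=> :c

;; The first falsey value decides, and :c is never evaluated.
(and :a nil :c)
;=> nil
(and :a false :c)
;=> false

;; 0 and the empty string are both truthy in Clojure.
(and 0 "")
;=> ""
```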

or works in the opposite way. It also accepts zero or more forms and evaluates them one by one. If any returns a logical true, it returns it as the value of the or. If none return a logical true, then or returns the last value. or also short-circuits its arguments. Here are some examples:

(or)
;=> nil
(or :a :b :c)
;=> :a
(or :a nil :c)
;=> :a
(or nil false)
;=> false
(or false nil)
;=> nil

Another point of interest is that both and and or are also macros. This means that they’re not built into the Clojure language but come as part of the core library. It also means that you can write your own macros that behave like and or or and they would be indistinguishable from the language. We’ll explore this more in chapter 7.

Finally, Clojure provides a not function that inverts the logical value of whatever is passed in as an argument. It always returns the exact values true or false. Here are some examples:

(not true)
;=> false
(not 1)
;=> false
(not nil)
;=> true

As a relevant side note, Clojure provides all the usual comparison and equality functions. Examples are <, <=, >, >=, and =. They all work the way you’d expect them to, with an additional feature: they take any number of arguments. The < function, for instance, checks to see if the arguments are in increasing order. Here are a couple quick examples:

(< 2 4 6 8)
;=> true

(< 2 4 3 8)
;=> false

The = function is the same as Java’s equals, but it works for a wider range of objects including nil, numbers, and sequences. Note that it’s a single = symbol and not ==, which is commonly used in many programming languages.

= (single-equals) vs. == (double-equals)

But Clojure does also have a == (double-equals) function that can only compare numbers. The difference between = and == is very subtle. = can compare any two Clojure values but gives unintuitive results when comparing numbers among the three different categories of numbers: integer (including ratio), big decimal, and floating point. Here are some examples:

(= 1 1N 1/1)
;=> true
(= 0.5 1/2)
;=> false
(= 0.5M 0.5)
;=> false
(= 0.5M 1/2)
;=> false

You’re probably scratching your head at those false values! If you’re comparing numbers from different classes, you can use == instead, but all arguments must be numbers:
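As a sketch, here are the same kinds of comparisons as in the = examples, this time using ==. The particular values are illustrative choices:

```clojure
;; == compares numeric value across integer, ratio, big decimal, and floating point.
(== 1 1N 1/1)
;=> true
(== 0.5 1/2)
;=> true
(== 0.5M 0.5)
;=> true
(== 0.5M 1/2)
;=> true
```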

A simple rule of thumb: if you know that everything you’re comparing is a number and you expect different categories of numbers, use ==; otherwise, use =.[3]

[3] You may be wondering why Clojure has this wart. Clojure (and Java) has a contract stipulating that values that compare equal have the same hash code. This makes it possible to do extremely fast hash-based equality checks even when comparing large collections against one another. But it’s difficult to write a fast hash function that would produce the same value for different number categories. Further, even if they could hash the same, numbers in different categories aren’t complete replacements for one another because they have different precision, range, and arithmetic behavior. It would lead to surprising results if a floating-point number were used in place of a big decimal or a ratio. If you want a deeper dive into issues surrounding Clojure’s notion of equality, read the document Equality by Andy Fingerhut at https://github.com/jafingerhut/thalia/blob/master/doc/other-topics/equality.md.

These logical functions are sufficient to create compound logical expressions from simple ones. Our next stop in this section is iterations—not strictly the kind supported by imperative languages such as C++ and Java but the functional kind.

2.4.3. Functional iteration

Most functional languages don’t have traditional iteration constructs like for because typical implementations of for require mutation of the loop counter. Instead, they use recursion and function application to process lists of things. We’ll start this section by looking at the familiar while form, followed by examining Clojure’s loop/recur looping construct. Then we’ll examine a few convenient macros such as doseq and dotimes, which are built on top of loop/recur.

while

Clojure’s while macro works in a similar fashion to those seen in imperative languages such as Ruby and Java. The general form is as follows:

(while test & body)

Consider the case where you have a function request-on-queue? that checks to see if a message has arrived on a messaging system you’re using and another function pop-request-queue that retrieves such a message. The following shows a way to set up the request-handling loop:

(while (request-on-queue?)
  (handle-request (pop-request-queue)))

Here, requests will continue to be processed as long as they keep appearing on the request queue. The while loop will end if request-on-queue? returns either false or nil, presumably because something else happened elsewhere in the system. Note that the only way for a while loop to end is for a side effect to cause the test expression to return a logically false value (that is, either false or nil).

Now, let’s move on to another looping construct—one that’s somewhat different from imperative languages, because it relies on what appears to be recursion.

loop/recur

Clojure doesn’t have traditional for loops for iteration; instead, programs can achieve similar behavior through the use of higher-level functions such as map and other functions in the sequence library. The Clojure version of iterative flow control is loop and the associated recur. Here’s an example of calculating the factorial of a number n using loop/recur:
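A minimal sketch of such a function follows. The name fact-loop is an assumption made for this example; the bindings and the recur arguments match the ones discussed in the text after it, and the function assumes n is at least 1:

```clojure
;; fact-loop is a hypothetical name; it assumes n >= 1.
(defn fact-loop [n]
  (loop [current n fact 1]          ; current counts down, fact accumulates
    (if (= current 1)
      fact                          ; done: return the accumulated product
      (recur (dec current) (* fact current)))))

(fact-loop 5)
;=> 120
```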

Here’s the general form of the loop:

(loop bindings & body)

loop sets up bindings that work exactly like the let form does. In this example, [current n fact 1] works the same way if used with a let form: current gets bound to the value of n, and fact gets bound to a value of 1. Then it executes the supplied body inside the lexical scope of the bindings. In this case, the body is the if form.

Now let’s talk about recur. Its semantics are similar to those of the let form’s bindings:

(recur bindings)

The bindings are computed, and each value is bound to the respective name as described in the loop form. Execution then returns to the start of the loop body. In this example, recur has two binding values, (dec current) and (* fact current), which are computed and rebound to current and fact. The if form then executes again. This continues until the if condition causes the looping to end by not calling recur anymore.

recur is a special form in Clojure, and despite looking recursive, it doesn’t consume the stack. It’s the preferred way of doing self-recursion, as opposed to a function calling itself by name. The reason for this is that Clojure currently doesn’t have tail-call optimization, though it’s possible that this will be added at some point in the future if the Java virtual machine (JVM) were to support it. recur can be used only from tail positions of code, and if an attempt is made to use it from any other position, the compiler will complain. For instance, this will cause Clojure to complain:
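The sketch below shows one such invalid definition, where a println follows the if and pushes recur out of tail position. The name fact-loop-invalid is hypothetical. So that the snippet can run as a whole, the definition is quoted and handed to eval inside a try; typing the defn directly at the REPL would report the compiler error instead:

```clojure
;; fact-loop-invalid is a hypothetical name. The println after the if means
;; the recur inside the if is no longer in tail position.
(def invalid-form
  '(defn fact-loop-invalid [n]
     (loop [current n fact 1]
       (if (= current 1)
         fact
         (recur (dec current) (* fact current)))
       (println "Done, current value:" current))))

;; eval compiles the quoted form; the compiler rejects it with an exception.
(def compile-result
  (try
    (eval invalid-form)
    :compiled
    (catch Exception e
      :rejected)))

compile-result
;=> :rejected
```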

The specific error you’ll see is

CompilerException java.lang.UnsupportedOperationException: Can only recur from tail position, compiling:(NO_SOURCE_PATH:5:7)

This will tip you off that you have a recur being used from a nontail position of loop, and such errors in code are easy to fix.

As you’ve seen, loop/recur is simple to understand and use. recur is more powerful and can cause execution to return to any recursion point. Recursion points can be set up, as you saw in the example, by a loop form or by a function form (enabling you to create self-recursive functions). You’ll see the latter in action in the next chapter. By the way, another point to note is that by using recur, you’re being explicit about where you want the recursion to occur in a tail-recursive manner. This improves the readability of the code.

Now let’s look at a few macros that Clojure provides that make it easy to work with sequences without having to use loop/recur directly.

doseq and dotimes

Imagine that you have a list of users and you wish to generate expense reports for each user. You could use the looping construct from the previous section, but instead there’s a convenient way to achieve the same effect in the following dispatch-reporting-jobs function:

(defn run-report [user]
  (println "Running report for" user))

(defn dispatch-reporting-jobs [all-users]
  (doseq [user all-users]
    (run-report user)))

Here, the form of interest is doseq. The simplest form accepts a vector containing two terms, where the first term is a new symbol, which will be sequentially bound to each element in the second term (which must be a sequence). The body will be executed for each element in the sequence and then the entire form will return nil. In this case, dispatch-reporting-jobs will call run-report for each user present in the sequence all-users.

dotimes is similar. It’s a convenience macro that accepts a vector containing a symbol and a number n, followed by the body. The symbol is set to numbers from 0 to (n – 1), and the body is evaluated for each number. Here’s an example:

(dotimes [x 5]
  (println "X is" x))

This will print the numbers 0 through 4 and return nil.

Despite the convenience of these macros, they’re not used as much as you’d imagine, especially if you’re coming from an imperative background. In Clojure, the most common pattern of computing things from lists of data is using higher-level functions such as map, filter, and reduce. We’ll look at these briefly in the remainder of this section.

map

Don’t be confused by the name here: map the function is different from “map” the data structure! The simplest use of map accepts a unary function and a sequence of data elements. A unary function is a function that accepts only one argument. map applies this function to each element of the sequence and returns a new sequence that contains all the returned values, for example:

(map inc [0 1 2 3])
;=> (1 2 3 4)

Even though this is a common way of using map, it’s even more general than this. map accepts a function that can take any number of arguments, along with the same number of sequences. It collects the result of applying the function to corresponding elements from each sequence. If the sequences are of varying lengths, map works through the shortest one:
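A brief sketch with + and two vectors (the values are arbitrary) shows both the element-wise behavior and what happens when one sequence is shorter:

```clojure
;; + is called with one element from each sequence at a time.
(map + [0 1 2 3] [0 1 2 3])
;=> (0 2 4 6)

;; The second vector has only three elements, so map stops after three results.
(map + [0 1 2 3] [0 1 2])
;=> (0 2 4)
```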

In other languages, this would have required much more code: an iteration block, a list that collects the return values, and a condition that checks to see if the list is exhausted. A single call to map does all this.

filter and remove

filter does something similar to map—it collects values. But it accepts a predicate function and a sequence and returns only those elements of the sequence that return a logically true value when the predicate function is called on them. Here’s an example that returns only valid expenses that aren’t zero in value:
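One possible version of such a function is sketched here. The sample expenses are made-up values, and non-zero? is a local helper introduced only for this example:

```clojure
(defn non-zero-expenses [expenses]
  ;; non-zero? is a local predicate: true for any number other than zero.
  (let [non-zero? (fn [e] (not (zero? e)))]
    (filter non-zero? expenses)))

(non-zero-expenses [-2 -1 0 1 2 3])
;=> (-2 -1 1 2 3)
```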

remove is the opposite of filter: where filter uses the predicate to decide what to keep, remove uses it to decide what to drop. You can rewrite the non-zero-expenses function using remove:
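One way to write it, again assuming expenses is a sequence of numbers, is to pass the core predicate zero? straight to remove:

```clojure
;; Drop every expense for which zero? returns true.
(defn non-zero-expenses [expenses]
  (remove zero? expenses))

(non-zero-expenses [-2 10 0 0 34 9])
;=> (-2 10 34 9)
```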

For several kinds of calculations, you’ll need to operate on only those expenses that aren’t zero. non-zero-expenses is a function that selects all such values, and it does so in one line of code (three words!).

reduce

In its simplest form, reduce is a higher-order function that accepts a function of arity two and a sequence of data elements. The function is applied to the first two elements of the sequence, producing the first result. The same function is then called again with this result and the next element of the sequence. This then repeats with the following element, until the last element is processed.

Here you’ll write the factorial function using reduce:

(defn factorial [n]
  (let [numbers (range 1 (+ n 1))]
    (reduce * numbers)))

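range is a Clojure function that returns a sequence of numbers starting from the first argument (inclusive) and ending before the second argument (exclusive). When called with a single argument, it starts from 0 and treats that argument as the exclusive end. For instance: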

(range 10)
;=> (0 1 2 3 4 5 6 7 8 9)

This is why numbers is computed by calling range with 1 and (+ n 1). The rest is easy; you reduce the sequence using the multiply (*) function.

Let’s examine how this works when factorial is called with 5:

(factorial 5)
;=> 120

numbers is set to the result of calling range on 1 and 6, which is the sequence of the numbers 1, 2, 3, 4, and 5. This sequence is what reduce operates on, along with the multiplication function. The result of multiplying 1 and 2 (which is 2) is multiplied by 3 (resulting in 6). That’s then multiplied by 4 (resulting in 24), which is finally multiplied by 5, resulting in 120.

If you’re ever having trouble visualizing the steps of a reduction, you can replace the reduce function with the reductions function: reduce returns only the final reduced value, but reductions returns a sequence of every intermediate value. You’ll rewrite the factorial function to use reductions:
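A sketch of that rewrite might look like the following. The name factorial-steps is chosen here only for illustration; the body is the earlier factorial with reduce swapped for reductions.

```clojure
;; Same as factorial, but returns every intermediate product.
(defn factorial-steps [n]
  (let [numbers (range 1 (+ n 1))]
    (reductions * numbers)))

(factorial-steps 5)
;=> (1 2 6 24 120)
```

The last element of the returned sequence is the value that reduce alone would have produced.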

reduce is a powerful function, and as shown here, it accomplishes in a single line of code what might require several lines in other languages.

for

What book can be complete without mentioning for in the context of iteration? We said earlier that few functional languages have a traditional for construct. Clojure does have for, but it isn’t quite like what you might be used to. In Clojure, for is used for list comprehensions, a syntactic feature that allows sequences to be constructed out of existing ones. The general form of the for construct follows:

(for seq-exprs body-expr)

seq-exprs is a vector specifying one or more binding-form/collection-expr pairs. body-expr can use the bindings set up in seq-exprs to construct each element of the list. Consider the following example that generates a list of labels for each square on a chessboard:

(def chessboard-labels
  (for [alpha "abcdefgh"
        num (range 1 9)]
    (str alpha num)))

The str function concatenates the string values of the arguments passed to it. Now chessboard-labels is a lazy sequence with all 64 labels:

chessboard-labels
;=> ("a1" "a2" "a3" "a4" "a5"  ...  "h6" "h7" "h8")

The for seq-exprs can take modifiers :let, :when, and :while. To see an example of :when in use, first consider a function that checks to see if a number is prime:
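A minimal sketch of such a function, built from some and Math/sqrt and assuming x is an integer of at least 2, might look like this. The local names divisors and remainders are chosen only for illustration.

```clojure
;; Trial division: x is prime if no integer from 2 up to the
;; square root of x divides it evenly.
;; Assumes x is an integer >= 2; smaller inputs are not handled.
(defn prime? [x]
  (let [divisors   (range 2 (inc (int (Math/sqrt x))))
        remainders (map (fn [d] (rem x d)) divisors)]
    (not (some zero? remainders))))
```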

Although there are more efficient ways to test for a prime number, this implementation will suffice for this example. By the way, some is a core function that returns the first logical true value returned when the specified predicate is called with each element of the specified collection. We’ll revisit this function shortly. Also, Math/sqrt is code that calls the sqrt static method on the Math Java class. This is an example of Clojure’s Java interop, and chapter 5 is dedicated to it.

Now you’ll use for to write a function primes-less-than, which returns a lazy sequence of all primes from 2 up to and including the number passed in:

(defn primes-less-than [n]
  (for [x (range 2 (inc n))
        :when (prime? x)]
    x))

Notice how you specify a condition in the for form using the :when option. You can test this function:

(primes-less-than 50)
;=> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47)

Let’s look at another, slightly more complex example. You’ll use the prime? function to find all pairs of numbers from 2 up to a given number, say 5, such that the sum of each pair is prime. Here it is:

(defn pairs-for-primes [n]
  (let [z (range 2 (inc n))]
    (for [x z y z :when (prime? (+ x y))]
      (list x y))))

Now test it out:

(pairs-for-primes 5)
;=> ((2 3) (2 5) (3 2) (3 4) (4 3) (5 2))

As you can see, Clojure’s for is a powerful construct, and it can be used to create arbitrary lists. A great advantage of this feature is that it’s almost declarative. For instance, the code in pairs-for-primes reads almost like a restatement of the problem itself.

Our next stop isn’t strictly about program flow but about a couple of macros that are useful in writing other functions and macros.

2.4.4. Threading macros

You’re going to learn a lot about macros in this book, starting with an introduction to them in chapter 7. From a developer point of view, several macros are extremely useful. You’ve seen some already, and in this section you’ll see two more, which make writing code a lot more convenient and result in more readable code as well. They’re called threading macros.

Thread-first

Imagine that you need to calculate the savings that would be available to a user several years from now based on some amount the user invests today. You can use the formula for compound interest to calculate this:

final-amount = principle * (1 + rate/100) ^ time-periods

You can write a function to calculate this:
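A direct translation of the formula might look like the following sketch. The parameter names mirror the formula, and Math/pow is the Java static method for exponentiation.

```clojure
;; Compound interest: principle * (1 + rate/100) ^ time-periods.
;; Math/pow returns a double, so the result is a double as well.
(defn final-amount [principle rate time-periods]
  (* (Math/pow (+ 1 (/ rate 100)) time-periods) principle))
```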

You can test that it works by calling it at the REPL:

(final-amount 100 20 1)
;=> 120.0
(final-amount 100 20 2)
;=> 144.0

This is fine, but the function definition is difficult to read, because it’s written inside out, thanks to the prefix nature of Clojure’s syntax. This is where the thread-first macro (named ->) helps, as shown in the following code:

(defn final-amount-> [principle rate time-periods]
  (-> rate
      (/ 100)
      (+ 1)
      (Math/pow time-periods)
      (* principle)))

It works the same, and you can confirm this on the REPL:

(final-amount-> 100 20 1)
;=> 120.0
(final-amount-> 100 20 2)
;=> 144.0

The thread-first macro takes the first argument supplied and places it in the second position of the next expression, that is, in the position of that form’s first argument (hence the name thread-first). It then takes the entire resulting expression and moves it into the second position of the following expression, and so on, until all expressions are exhausted. So when the macro expands in the case of the final-amount-> function, the form looks like this:

(* (Math/pow (+ (/ rate 100) 1) time-periods) principle)

To be more accurate, the call to Java’s Math/pow is also expanded, but we’ll explore that in chapter 5. For now, it’s enough to see that the expanded form is exactly like the one we manually defined in final-amount earlier. The advantage is that final-amount-> is much easier to write and read. This is an example of how a macro can manipulate code to make it easier to read. Doing something like this is nearly impossible in most other languages.
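If you want to check an expansion yourself, one option is macroexpand-all from the clojure.walk namespace. The sketch below expands only the first two steps of the threading so that the Java interop call is left out of the picture.

```clojure
(require '[clojure.walk :refer [macroexpand-all]])

;; The form is quoted so that it is expanded rather than evaluated;
;; rate is therefore just a symbol here, not a bound value.
(macroexpand-all '(-> rate (/ 100) (+ 1)))
;=> (+ (/ rate 100) 1)
```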

In the next section, we’ll examine a related macro, called thread-last.

Thread-last

The thread-last macro (named ->>) is a cousin of the thread-first macro. Instead of taking the first expression and moving it into the second position of the next expression, it moves it into the last place. It then repeats the process for all the expressions provided to it. Examine a version of the factorial function again:

(defn factorial [n]
  (reduce * (range 1 (+ 1 n))))

This is also written in the inside-out syntax, and it isn’t immediately obvious what the sequence of operations is. Here’s the same function rewritten using the ->> macro:

(defn factorial->> [n]
  (->> n
       (+ 1)
       (range 1)
       (reduce *)))

You can check that it works by testing it at the REPL:

(factorial->> 5)
;=> 120

This macro expands the factorial->> function to

(reduce * (range 1 (+ 1 n)))

This ensures that it works the same way as factorial defined previously. The main advantage of this macro (similar to the -> macro) is that it lets developers focus on the sequence of operations, rather than ensuring they’re writing the nested expressions correctly. It’s also easy to read and maintain the resulting function.

A far more common use of this macro is when working with sequences of data elements and using higher-order functions such as map, reduce, and filter. Each of these functions accepts the sequence as its last argument, so the thread-last macro is perfect for the job.
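As an illustration, the following sketch threads a sequence through filter, map, and reduce to sum the squares of the even numbers from 1 to 10. The computation itself is chosen only for this example.

```clojure
;; Each step receives the previous result as its last argument.
(->> (range 1 11)
     (filter even?)
     (map (fn [x] (* x x)))
     (reduce +))
;=> 220
```

Written without the macro, the same computation reads inside out: (reduce + (map (fn [x] (* x x)) (filter even? (range 1 11)))).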

While we’re on the topic of threading macros, Clojure 1.5 introduced two related ones called some-> and some->>. These two behave exactly the same as the respective ones we just discussed, but computation ends if the result of any step in the expansion is nil.
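A small sketch of the difference follows. It uses keywords as lookup functions on nested maps, and the map contents are invented for this example.

```clojure
;; Every step returns a non-nil value, so the whole chain runs.
(some-> {:a {:b 1}} :a :b inc)
;=> 2

;; (:a {}) is nil, so the remaining steps are skipped and nil is returned.
;; With plain ->, inc would be called on nil and throw an exception.
(some-> {} :a :b inc)
;=> nil
```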

Thread-as

Another threading macro introduced in Clojure 1.5 is thread-as (named as->). The threading macros you’ve seen so far don’t give you any control over the position of the previous expression: it is either first or last, depending on which threading macro you use. as-> is more flexible: you supply it a name, and it will bind the result of each successive form to that name so you can use it in the next. For example:
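The sketch below is written to match the expansion shown next. The symbol <> is simply the name chosen for the binding; any other symbol would work as well.

```clojure
;; <> is bound to the initial map, then rebound to the result of each form.
(as-> {"a" [1 2 3 4]} <>
  (<> "a")        ; look up "a" in the map
  (conj <> 10)    ; <> in the last position
  (map inc <>))   ; <> in the last position again
;=> (2 3 4 5 11)
```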

This may look like magic, but the macro is actually quite simple. This is what the previous example expands to:

(let [<> {"a" [1 2 3 4]}
      <> (<> "a")
      <> (conj <> 10)
      <> (map inc <>)]
  <>)

The as-> macro is really just a more compact way of chaining a series of let bindings to the same name.

Conditional threading

The final set of threading macros we’ll look at were also introduced in Clojure 1.5: cond-> and cond->>. These threading macros are exactly like -> and ->>, except each form is guarded by a conditional (which is not threaded) and can be skipped if the conditional is false. Here’s an example:
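A sketch using cond-> follows. It is written to mirror the equivalent as-> form given later in this section, so the values of x and y and the messages are taken from there.

```clojure
;; Each test is evaluated on its own; it is not threaded.
;; When a test is true, the current vector is threaded into the paired form.
(let [x 1 y 2]
  (cond-> []
    (odd? x)          (conj "x is odd")
    (zero? (rem y 3)) (conj "y is divisible by 3")
    (even? y)         (conj "y is even")))
;=> ["x is odd" "y is even"]
```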

Notice that when the left-hand condition form of a pair is false, the right-hand threaded form is skipped. cond-> is superficially similar to cond because both accept predicate-result pairs, but cond stops evaluating pairs as soon as it finds a truthy predicate, whereas cond-> evaluates every conditional. cond->> is the same as cond->, except that the value is threaded into the last position like ->> instead of into the first position like ->.

The succinctness of cond-> and cond->> also makes them hard to grasp at first. It may help to see an equivalent implementation that makes more explicit what the predicates are doing and where the threading occurs:

(let [x 1 y 2]
  (as-> [] <>
        (if (odd? x)          (conj <> "x is odd")            <>)
        (if (zero? (rem y 3)) (conj <> "y is divisible by 3") <>)
        (if (even? y)         (conj <> "y is even")           <>)))
;=> ["x is odd" "y is even"]

The conditional threading macros are handy when you need to build up a data structure based on a large number of other factors, such as building a map of configuration settings or running different map and filter functions conditionally over the same data structure.

In this section, you saw various ways to control the execution flow of Clojure programs. We started off with conditionals and explored the associated logical functions. We then addressed the idea of looping—not directly as imperative for loops do in other languages but through a recursive form and through higher-order functions. Armed with this knowledge, you could write a lot of code without ever missing imperative constructs.

2.5. Summary

This was a long chapter! We started out by getting up and running with the Clojure REPL and then addressed the basics of writing code in the language. Specifically, we addressed forms that structure code, such as functions, let, and looping. We also looked at execution control forms, such as if, when, and cond. We also visited some of the data types and data structures that come built into the language. Understanding these equips you to use and create the right data abstractions in your programs.

Armed with this knowledge, you can probably write a fair amount of Clojure code already. The material from the next chapter, combined with this one, should enable you to write almost any basic program using the core of Clojure. And from there we’ll dive into more intermediate concepts.

In the next chapter, we’re going to explore more building blocks of Clojure. We’ll begin with a deep dive into functions in an attempt to understand Clojure’s support for functional programming. We’ll also explore the idea of scope and show how to organize your programs with namespaces. Finally, we’ll explore a concept somewhat unique to Clojure (well, uncommon in imperative languages such as Java and C++ at least) called destructuring.
