Topics in This Chapter
13.1 The Main Collections Traits
13.2 Mutable and Immutable Collections
13.6 Operators for Adding or Removing Elements
13.9 Reducing, Folding, and Scanning
13.14 Interoperability with Java Collections
In this chapter, you will learn about the Scala collections library from a library user’s point of view. In addition to arrays and maps, which you have already encountered, you will see other useful collection types. There are many methods that can be applied to collections, and this chapter presents them in an orderly way.
The key points of this chapter are:
• All collections extend the Iterable
trait.
• The three major categories of collections are sequences, sets, and maps.
• Scala has mutable and immutable versions of most collections.
• A Scala list is either empty, or it has a head and a tail which is again a list.
• Sets are unordered collections.
• Use a LinkedHashSet
to retain the insertion order or a SortedSet
to iterate in sorted order.
• +
adds an element to an unordered collection; +:
and :+
prepend or append to a sequence; ++
concatenates two collections; -
and --
remove elements.
• The Iterable
and Seq
traits have dozens of useful methods for common operations. Check them out before writing tedious loops.
• Mapping, folding, and zipping are useful techniques for applying a function or operation to the elements of a collection.
Figure 13–1 shows the most important traits that make up the Scala collections hierarchy.
An Iterable
is any collection that can yield an Iterator
with which you can access all elements in the collection:
val coll = ... // some Iterable
val iter = coll.iterator
while (iter.hasNext)
do something with iter.next()
This is the most basic way of traversing a collection. However, as you will see throughout this chapter, usually there are more convenient ways.
A Seq
is an ordered sequence of values, such as an array or list. An IndexedSeq
allows fast random access through an integer index. For example, an ArrayBuffer
is indexed but a linked list is not.
A Set
is an unordered collection of values. In a SortedSet
, elements are always visited in sorted order.
A Map
is a set of (
key,
value)
pairs. A SortedMap
visits the entries as sorted by the keys. See Chapter 4 for more information.
This hierarchy is similar to that in Java, with a couple of welcome improvements:
1. Maps are a part of the hierarchy and not a separate hierarchy.
2. IndexedSeq
is the supertype of arrays but not of lists, allowing you to tell the two apart.
In Java, both ArrayList
and LinkedList
implement a common List
interface, making it difficult to write efficient code when random access is preferred, for example when searching in a sorted sequence. This was a flawed design decision in the original Java collections framework. In a later version, a marker interface RandomAccess
was added to deal with this problem.
Each Scala collection trait or class has a companion object with an apply
method for constructing an instance of the collection. For example,
Iterable(0xFF, 0xFF00, 0xFF0000)
Set(Color.RED, Color.GREEN, Color.BLUE)
Map(Color.RED -> 0xFF0000, Color.GREEN -> 0xFF00, Color.BLUE -> 0xFF)
SortedSet("Hello", "World")
This is called the “uniform creation principle.”
There are methods toSeq
, toSet
, toMap
, and so on, as well as a generic to[C]
method, that you can use to translate between collection types.
val coll = Seq(1, 1, 2, 3, 5, 8, 13)
val set = coll.toSet
val buffer = coll.to[ArrayBuffer]
Note
You can use the ==
operator to compare any sequence, set, or map with another collection of the same kind. For example, Seq(1, 2, 3) == (1 to 3)
yields true
. But comparing different kinds, for example, Seq(1, 2, 3) == Set(1, 2, 3)
always yields false
. In that case, use the sameElements
method.
Scala supports both mutable and immutable collections. An immutable collection can never change, so you can safely share a reference to it, even in a multi-threaded program. For example, there is a scala.collection.mutable.Map
and a scala.collection.immutable.Map
. Both have a common supertype scala.collection.Map
(which, of course, contains no mutation operations).
Note
When you have a reference to a scala.collection.immutable.Map
, you know that nobody can change the map. If you have a scala.collection.Map
, then you can’t change it, but someone else might.
Scala gives a preference to immutable collections. The companion objects in the scala.collection
package produce immutable collections. For example, scala.collection.Map("Hello" -> 42)
is an immutable map.
Moreover, the scala
package and the Predef
object, which are always imported, have type aliases List
, Set
, and Map
that refer to the immutable traits. For example, Predef.Map
is the same as scala.collection.immutable.Map
.
Tip
With the statement
import scala.collection.mutable
you can get an immutable map as Map
and a mutable one as mutable.Map
.
If you had no prior experience with immutable collections, you may wonder how you can do useful work with them. The key is that you can create new collections out of old ones. For example, if numbers
is an immutable set, then
numbers + 9
is a new set containing the numbers
together with 9
. If 9
was already in the set, you just get a reference to the old set. This is particularly natural in recursive computations. For example, here we compute the set of all digits of an integer:
def digits(n: Int): Set[Int] =
if (n < 0) digits(-n)
else if (n < 10) Set(n)
else digits(n / 10) + (n % 10)
This method starts out with a set containing a single digit. At each step, another digit is added. However, adding the digit doesn’t mutate a set. Instead, in each step, a new set is constructed.
Figure 13–2 shows the most important immutable sequences.
A Vector
is the immutable equivalent of an ArrayBuffer
: an indexed sequence with fast random access. Vectors are implemented as trees where each node has up to 32 children. For a vector with one million elements, one needs four layers of nodes. (Since 103 » 210, 106 » 324.) Accessing an element in such a list will take 4 hops, whereas in a linked list it would take an average of 500,000.
A Range
represents an integer sequence, such as 0,1,2,3,4,5,6,7,8,9
or 10,20,30
. Of course a Range
object doesn’t store all sequence values but only the start, end, and increment. You construct Range
objects with the to
and until
methods, as described in Chapter 2.
We discuss lists in the next section, and streams in Section 13.12, “Streams,” on page 189.
See Figure 13–3 for the most useful mutable sequences.
We discussed array buffers in Chapter 3. Stacks, queues, and priority queues are standard data structures that are useful for implementing certain algorithms. If you are familiar with these structures, the Scala implementations won’t surprise you.
In Scala, a list is either Nil
(that is, empty) or an object with a head
element and a tail
that is again a list. For example, consider the list
val digits = List(4, 2)
The value of digits.head
is 4
, and digits.tail
is List(2)
. Moreover, digits.tail.head
is 2
and digits.tail.tail
is Nil
.
The ::
operator makes a new list from a given head and tail. For example,
9 :: List(4, 2)
is List(9, 4, 2)
. You can also write that list as
9 :: 4 :: 2 :: Nil
Note that ::
is right-associative. With the ::
operator, lists are constructed from the end:
9 :: (4 :: (2 :: Nil))
In Java or C++, one uses an iterator to traverse a linked list. You can do this in Scala as well, but it is often more natural to use recursion. For example, the following function computes the sum of all elements in a linked list of integers:
def sum(lst: List[Int]): Int =
if (lst == Nil) 0 else lst.head + sum(lst.tail)
Or, if you prefer, you can use pattern matching:
def sum(lst: List[Int]): Int = lst match {
case Nil => 0
case h :: t => h + sum(t) // h is lst.head, t is lst.tail
}
Note the ::
operator in the second pattern. It “destructures” the list into head and tail.
Note
Recursion works so naturally because the tail of a list is again a list.
Of course, for this particular example, you do not need to use recursion at all. The Scala library already has a sum
method:
List(9, 4, 2).sum // Yields 15
If you want to mutate list elements in place, you can use a ListBuffer
, a data structure that is backed by a linked list with a reference to the last node. This makes it efficient to add or remove elements at either end of the list.
However, adding or removing elements in the middle is not efficient. For example, suppose you want to remove every second element of a mutable list. With a Java LinkedList
, you use an iterator and call remove
after every second call to next
. There is no analogous operation on a ListBuffer
. Of course, removing multiple elements by their index positions is very inefficient in a linked list. Your best bet is to generate a new list with the result (see Exercise 3 on page 194).
Note
There are deprecated LinkedList
and DoubleLinkedList
classes and an internal MutableList
class that you should not be using.
A set is a collection of distinct elements. Trying to add an existing element has no effect. For example,
Set(2, 0, 1) + 1
is the same as Set(2, 0, 1)
.
Unlike lists, sets do not retain the order in which elements are inserted. By default, sets are implemented as hash sets in which elements are organized by the value of the hashCode
method. (In Scala, as in Java, every object has a hashCode
method.)
For example, if you iterate over
Set(1, 2, 3, 4, 5, 6)
the elements are visited in the order
5 1 6 2 3 4
You may wonder why sets don’t retain the element order. It turns out that you can find elements much faster if you allow sets to reorder their elements. Finding an element in a hash set is much faster than in an array or list.
A linked hash set remembers the order in which elements were inserted. It keeps a linked list for this purpose. For example,
val weekdays = scala.collection.mutable.LinkedHashSet("Mo", "Tu", "We", "Th", "Fr")
If you want to iterate over elements in sorted order, use a sorted set:
val numbers = scala.collection.mutable.SortedSet(1, 2, 3, 4, 5)
A bit set is an implementation of a set of non-negative integers as a sequence of bits. The ith bit is 1 if i is present in the set. This is an efficient implementation as long as the maximum element is not too large. Scala provides both mutable and immutable BitSet
classes.
The contains
method checks whether a set contains a given value. The subsetOf
method checks whether all elements of a set are contained in another set.
val digits = Set(1, 7, 2, 9)
digits contains 0 // false
Set(1, 2) subsetOf digits // true
The union
, intersect
, and diff
methods carry out the usual set operations. If you prefer, you can write them as |
, &
, and &~
. You can also write union as ++
and difference as --
. For example, if we have the set
val primes = Set(2, 3, 5, 7)
then digits union primes
is Set(1, 2, 3, 5, 7, 9)
, digits & primes
is Set(2, 7)
, and digits -- primes
is Set(1, 9)
.
When you want to add or remove an element, or a number of elements, the operators to use depend on the collection type. Table 13–1 provides a summary.
Generally, +
is used for adding an element to an unordered collection, while +:
and :+
add an element to the beginning or end of an ordered collection.
Vector(1, 2, 3) :+ 5 // Yields Vector(1, 2, 3, 5)
1 +: Vector(1, 2, 3) // Yields Vector(1, 1, 2, 3)
Note that +:
, like all operators ending in a colon, is right-associative, and that it is a method of the right operand.
These operators return new collections (of the same type as the original ones) without modifying the original. Mutable collections have a +=
operator that mutates the left-hand side. For example,
val numbers = ArrayBuffer(1, 2, 3)
numbers += 5 // Adds 5 to numbers
With an immutable collection, you can use +=
or :+=
with a var
, like this:
var numbers = Set(1, 2, 3)
numbers += 5 // Sets numbers to the immutable set numbers + 5
var numberVector = Vector(1, 2, 3)
numberVector :+= 5 // += does not work since vectors don't have a + operator
To remove an element, use the -
operator:
Set(1, 2, 3) - 2 // Yields Set(1, 3)
You can add multiple elements with the ++
operator:
coll ++ coll2
yields a collection of the same type as coll
that contains both coll
and coll2
. Similarly, the --
operator removes multiple elements.
Tip
As you can see, Scala provides many operators for adding and removing elements. Here is a summary:
1. Append (:+
) or prepend (+:
) to a sequence.
2. Add (+
) to an unordered collection.
3. Remove with the -
operator.
4. Use ++
and --
for bulk add and remove.
5. Mutations are += ++= -= --=
.
6. For lists, many Scala programmers prefer the ::
and :::
operators.
7. Stay away from ++: +=: ++=:
.
Note
For lists, you can use +:
instead of ::
for consistency, with one exception: Pattern matching (case h::t
) does not work with the +:
operator.
Table 13–2 gives a brief overview of the most important methods of the Iterable
trait, sorted by functionality.
The Seq
trait adds several methods to the Iterable
trait. Table 13–3 shows the most important ones.
Note that these methods never mutate a collection. They return a collection of the same type as the original. This is sometimes called the “uniform return type” principle.
You may want to transform all elements of a collection. The map
method applies a function to a collection and yields a collection of the results. For example, given a list of strings
val names = List("Peter", "Paul", "Mary")
you get a list of the uppercased strings as
names.map(_.toUpperCase) // List("PETER", "PAUL", "MARY")
This is exactly the same as
for (n <- names) yield n.toUpperCase
If the function yields a collection instead of a single value, you may want to concatenate all results. In that case, use flatMap
. For example, consider
def ulcase(s: String) = Vector(s.toUpperCase(), s.toLowerCase())
Then names.map(ulcase)
is
List(Vector("PETER", "peter"), Vector("PAUL", "paul"), Vector("MARY", "mary"))
List("PETER", "peter", "PAUL", "paul", "MARY", "mary")
Tip
If you use flatMap
with a function that returns an Option
, the resulting collection contains all values v for which the function returns Some(
v)
.
Note
The map
and flatMap
methods are important because they are used for translating for
expressions. For example, the expression
for (i <- 1 to 10) yield i * i
is translated to
(1 to 10).map(i => i * i)
and
for (i <- 1 to 10; j <- 1 to i) yield i * j
becomes
(1 to 10).flatMap(i => (1 to i).map(j => i * j))
Why flatMap
? See Exercise 9 on page 195.
The transform
method is the in-place equivalent of map
. It applies to mutable collections, and replaces each element with the result of a function. For example, the following code changes all buffer elements to uppercase:
val buffer = ArrayBuffer("Peter", "Paul", "Mary")
buffer.transform(_.toUpperCase)
If you just want to apply a function for its side effect and don’t care about the function values, use foreach
:
names.foreach(println)
The collect
method works with partial functions—functions that may not be defined for all inputs. It yields a collection of all function values of the arguments on which it is defined. For example,
"-3+4".collect { case '+' => 1 ; case '-' => -1 } // Vector(-1, 1)
The groupBy
method yields a map whose keys are the function values, and whose values are the collections of elements whose function value is the given key. For example,
val words = ...
val map = words.groupBy(_.substring(0, 1).toUpper)
builds a map that maps "A"
to all words starting with A, and so on.
The map
method applies a unary function to all elements of a collection. The methods that we discuss in this section combine elements with a binary function. The call c.reduceLeft(op)
applies op
to successive elements, like this:
.
.
.
op
/
op coll(3)
/
op coll(2)
/
coll(0) coll(1)
For example,
List(1, 7, 2, 9).reduceLeft(_ - _)
is
-
/
- 9
/
- 2
/
1 7
or
((1 - 7) - 2) - 9 = 1 - 7 - 2 - 9 = -17
The reduceRight
method does the same, but it starts at the end of the collection. For example,
List(1, 7, 2, 9).reduceRight(_ - _)
is
1 - (7 - (2 - 9)) = 1 - 7 + 2 - 9 = -13
Often, it is useful to start the computation with an initial element other than the initial element of a collection. The call coll.foldLeft(init)(op)
computes
.
.
.
op
/
op coll(2)
/
op coll(1)
/
init coll(0)
For example,
List(1, 7, 2, 9).foldLeft(0)(_ - _)
is
0 - 1 - 7 - 2 - 9 = -19
Note
The initial value and the operator are separate “curried” parameters so that Scala can use the type of the initial value for type inference in the operator. For example, in List(1, 7, 2, 9).foldLeft("")(_ + _)
, the initial value is a string, so the operator must be a function (String, Int) => String
.
You can also write the foldLeft
operation with the /:
operator, like this:
(0 /: List(1, 7, 2, 9))(_ - _)
The /:
is supposed to remind you of the shape of the tree.
Note
With the /:
operator, the initial value is the first operand. Note that, since the operator ends with a colon, it is a method of the second operand.
There is a foldRight
or :
variant as well, computing
.
.
.
op
/
coll(n-3) op
/
coll(n-2) op
/
coll(n-1) init
These examples don’t seem to be very useful. Of course, coll.reduceLeft(_ + _)
or coll.foldLeft(0)(_ + _)
computes the sum, but you can get that directly with coll.sum
.
Folding is sometimes attractive as a replacement for a loop. Suppose, for example, we want to count the frequencies of the letters in a string. One way is to visit each letter and update a mutable map.
val freq = scala.collection.mutable.Map[Char, Int]()
for (c <- "Mississippi") freq(c) = freq.getOrElse(c, 0) + 1
// Now freq is Map('i' -> 4, 'M' -> 1, 's' -> 4, 'p' -> 2)
Here is another way of thinking about this process. At each step, combine the frequency map and the newly encountered letter, yielding a new frequency map. That’s a fold:
.
.
.
op
/
op 's'
/
op 'i'
/
empty map 'M'
What is op
? The left operand is the partially filled map, and the right operand is the new letter. The result is the augmented map. It becomes the input to the next call to op
, and at the end, the result is a map with all counts. The code is
(Map[Char, Int]() /: "Mississippi") {
(m, c) => m + (c -> (m.getOrElse(c, 0) + 1))
}
Note that this is an immutable map. We compute a new map at each step.
Note
It is possible to replace any while
loop with a fold. Build a data structure that combines all variables updated in the loop, and define an operation that implements one step through the loop. I am not saying that this is always a good idea, but you may find it interesting that loops and mutations can be eliminated in this way.
Finally, the scanLeft
and scanRight
methods combine folding and mapping. You get a collection of all intermediate results. For example,
(1 to 10).scanLeft(0)(_ + _)
Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)
The methods of the preceding section apply an operation to adjacent elements in the same collection. Sometimes, you have two collections, and you want to combine corresponding elements. For example, suppose you have a list of product prices and corresponding quantities:
val prices = List(5.0, 20.0, 9.95)
val quantities = List(10, 2, 1)
The zip
method lets you combine them into a list of pairs. For example,
prices zip quantities
is a List[(Double, Int)]
:
List[(Double, Int)] = List((5.0, 10), (20.0, 2), (9.95, 1))
The method is called “zip” because it combines the two collections like the teeth of a zipper.
Now it is easy to apply a function to each pair.
(prices zip quantities) map { p => p._1 * p._2 }
The result is a list of prices:
List(50.0, 40.0, 9.95)
The total price of all items is then
((prices zip quantities) map { p => p._1 * p._2 }) sum
If one collection is shorter than the other, the result has as many pairs as the shorter collection. For example,
List(5.0, 20.0, 9.95) zip List(10, 2)
is
List((5.0, 10), (20.0, 2))
The zipAll
method lets you specify defaults for the shorter list:
List(5.0, 20.0, 9.95).zipAll(List(10, 2), 0.0, 1)
is
List((5.0, 10), (20.0, 2), (9.95, 1))
The zipWithIndex
method returns a list of pairs where the second component is the index of each element. For example,
is
Vector(('S', 0), ('c', 1), ('a', 2), ('l', 3), ('a', 4))
This can be useful if you want to compute the index of an element with a certain property. For example,
"Scala".zipWithIndex.max
is ('l', 3)
. The index of the value with the largest encoding is
"Scala".zipWithIndex.max._2
You can obtain an iterator from a collection with the iterator
method. This isn’t as common as in Java or C++ because you can usually get what you need more easily with one of the methods from the preceding sections.
However, iterators are useful for collections that are expensive to construct fully. For example, Source.fromFile
yields an iterator because it might not be efficient to read an entire file into memory. There are a few Iterable
methods that yield an iterator, such as grouped
or sliding
.
When you have an iterator, you can iterate over the elements with the next
and hasNext
methods.
while (iter.hasNext)
do something with iter.next()
If you prefer, you can use a for
loop instead:
for (elem <- iter)
do something with elem
Both loops end up moving the iterator to the end of the collection, after which it is no longer usable.
Sometimes, you want to be able to look at the next element before deciding whether to consume it. In that case, use the buffered
method to turn an Iterator
into a BufferedIterator
. The head
method yields the next element without advancing the iterator.
val iter = scala.io.Source.fromFile(filename).buffered
while (iter.hasNext && iter.head.isWhitespace) iter.next
// Now iter points to the first non-whitespace character
The Iterator
class defines a number of methods that work identically to the methods on collections. In particular, all Iterable
methods listed in Section 13.7, “Common Methods,” on page 180 are available, except for head
, headOption
, last
, lastOption
, tail
, init
, takeRight
, and dropRight
. After calling a method such as map
, filter
, count
, sum
, or even length
, the iterator is at the end of the collection, and you can’t use it again. With other methods, such as find
or take
, the iterator is past the found element or the taken ones.
If you find it too tedious to work with an iterator, you can use a method such as toArray
, toIterable
, toSeq
, toSet
, or toMap
to copy the values into a collection.
In the preceding sections, you saw that an iterator is a “lazy” alternative to a collection. You get the elements as you need them. If you don’t need any more elements, you don’t pay for the expense of computing the remaining ones.
However, iterators are fragile. Each call to next
mutates the iterator. Streams offer an immutable alternative. A stream is an immutable list in which the tail is computed lazily—that is, only when you ask for it.
Here is a typical example:
def numsFrom(n: BigInt): Stream[BigInt] = n #:: numsFrom(n + 1)
The #::
operator is like the ::
operator for lists, but it constructs a stream.
When you call
val tenOrMore = numsFrom(10)
you get a stream object that is displayed as
Stream(10, ?)
The tail is unevaluated. If you call
tenOrMore.tail.tail.tail
you get
Stream(13, ?)
Stream methods are executed lazily. For example,
val squares = numsFrom(1).map(x => x * x)
yields
Stream(1, ?)
You have to call squares.tail
to force evaluation of the next entry.
If you want to get more than one answer, you can invoke take
followed by force
, which forces evaluation of all values. For example,
squares.take(5).force
produces Stream(1, 4, 9, 16, 25)
.
Of course, you don’t want to call
squares.force // No!
That call would attempt to evaluate all members of an infinite stream, causing an OutOfMemoryError
.
You can construct a stream from an iterator. For example, the Source.getLines
method returns an Iterator[String]
. With that iterator, you can only visit the lines once. A stream caches the visited lines so you can revisit them:
val words = Source.fromFile("/usr/share/dict/words").getLines.toStream
words // Stream(A, ?)
words(5) // Aachen
words // Stream(A, A's, AOL, AOL's, Aachen, ?)
Note
Scala streams are quite different from streams in Java 8. However, lazy views, covered in the next section, are conceptually equivalent to Java 8 streams.
In the preceding section, you saw that stream methods are computed lazily, delivering results only when they are needed. You can get a similar effect with other collections by applying the view
method. This method yields a collection on which methods are applied lazily. For example,
val palindromicSquares = (1 to 1000000).view
.map(x => x * x)
.filter(x => x.toString == x.toString.reverse)
yields a collection that is unevaluated. (Unlike a stream, not even the first element is evaluated.) When you call
palindromicSquares.take(10).mkString(",")
then enough squares are generated until ten palindromes have been found, and then the computation stops. Unlike streams, views do not cache any values. If you call palindromicSquares.take(10).mkString(",")
again, the computation starts over.
As with streams, use the force
method to force evaluation of a lazy view. You get back a collection of the same type as the original.
The apply
method forces evaluation of the entire view. Instead of calling lazyView(i)
, call lazyView.take(i).last
.
When you obtain a view into a slice of a mutable collection, any mutations affect the original collection. For example,
ArrayBuffer buffer = ...
buffer.view(10, 20).transform(x => 0)
clears the given slice and leaves the other elements unchanged.
At times you may need to use a Java collection, and you will likely miss the rich set of methods that you get with Scala collections. Conversely, you may want to build up a Scala collection and then pass it to Java code. The JavaConversions
object provides a set of conversions between Scala and Java collections.
Give the target value an explicit type to trigger the conversion. For example,
import scala.collection.JavaConversions._
val props: scala.collection.mutable.Map[String, String] = System.getProperties()
If you are worried about unwanted implicit conversions, just import the ones you need, for example:
import scala.collection.JavaConversions.propertiesAsScalaMap
Table 13–4 shows the conversions from Scala to Java collections.
And Table 13–5 shows the opposite conversions from Java to Scala collections.
Note that the conversions yield wrappers that let you use the target interface to access the original type. For example, if you use
val props: scala.collection.mutable.Map[String, String] = System.getProperties()
then props
is a wrapper whose methods call the methods of the underlying Java object. If you call
props("com.horstmann.scala") = "impatient"
then the wrapper calls put("com.horstmann.scala", "impatient")
on the underlying Properties
object.
It is hard to write correct concurrent programs, yet concurrency is often required nowadays to keep all processors of a computer busy. Scala offers a particularly attractive solution for manipulating large collections. Such tasks often parallelize naturally. For example, to compute the sum of all elements, multiple threads can concurrently compute the sums of different sections; in the end, these partial results are summed up. Of course it is troublesome to schedule these concurrent activities—but with Scala, you don’t have to. If coll
is a large collection, then
coll.par.sum
computes the sum concurrently. The par
method produces a parallel implementation of the collection. That implementation parallelizes the collection methods whenever possible. For example,
coll.par.count(_ % 2 == 0)
counts the even numbers in coll
by evaluating the predicate on subcollections in parallel and combining the results.
You can parallelize a for
loop by applying .par
to the collection over which you iterate, like this:
for (i <- (0 until 100000).par) print(s" $i")
Try it out—the numbers are printed in the order they are produced by the threads working on the task.
In a for/yield
loop, the results are assembled in order. Try this:
(for (i <- (0 until 100000).par) yield i) == (0 until 100000)
Caution
If parallel computations mutate shared variables, the result is unpredictable. For example, do not update a shared counter:
var count = 0
for (c <- coll.par) { if (c % 2 == 0) count += 1 } // Error!
The parallel collections returned by the par
method belong to types that extend the ParSeq
, ParSet
, or ParMap
traits. These are not subtypes of Seq
, Set
, or Map
, and you cannot pass a parallel collection to a method that expects a sequential collection.
You can convert a parallel collection back to a sequential one with the seq
method.
val result = coll.par.filter(p).seq
Not all methods can be parallelized. For example, reduceLeft
and reduceRight
require that each operator is applied in sequence. There is an alternate method, reduce
, that operates on parts of the collection and combines the results. For this to work, the operator must be associative—it must fulfill (a op
b) op
c = a op
(b op
c). For example, addition is associative but subtraction is not: (a – b) – c ≠ a – (b – c).
Similarly, there is a fold
method that operates on parts of the collection. Unfortunately, it is not as flexible as foldLeft
or foldRight
—both arguments of the operator must be elements. That is, you can do coll.par.fold(0)(_ + _)
, but you cannot do a more complex fold such as the one at the end of Section 13.9, “Reducing, Folding, and Scanning,” on page 184.
To solve this problem, there is an even more general aggregate
that applies an operator to parts of the collection, and then uses another operator to combine the results. For example, str.par.aggregate(Set[Char]())(_ + _, _ ++ _)
is the equivalent of str.foldLeft(Set[Char]())(_ + _)
, forming a set of all distinct characters in str
.
Note
By default, parallel collections use a global fork-join pool, which is well suited for processor-intensive calculations. If you carry out parallel computation steps that block, you should choose a different “execution context”—see Chapter 17.
1. Write a function that, given a string, produces a map of the indexes of all characters. For example, indexes("Mississippi")
should return a map associating 'M'
with the set {0}
, 'i'
with the set {1, 4, 7, 10}
, and so on. Use a mutable map of characters to mutable sets. How can you ensure that the set is sorted?
2. Repeat the preceding exercise, using an immutable map of characters to lists.
3. Write a function that removes every second element from a ListBuffer
. Try it two ways. Call remove(i)
for all even i
starting at the end of the list. Copy every second element to a new list. Compare the performance.
4. Write a function that receives a collection of strings and a map from strings to integers. Return a collection of integers that are values of the map corresponding to one of the strings in the collection. For example, given Array("Tom", "Fred", "Harry")
and Map("Tom" -> 3, "Dick" -> 4, "Harry" -> 5)
, return Array(3, 5)
. Hint: Use flatMap
to combine the Option
values returned by get
.
5. Implement a function that works just like mkString
, using reduceLeft
.
6. Given a list of integers lst
, what is (lst : List[Int]())(_ :: _)
? (List[Int]() /: lst)(_ :+ _)
? How can you modify one of them to reverse the list?
7. In Section 13.10, “Zipping,” on page 187, the expression (prices zip quantities) map { p => p._1 * p._2 }
is a bit inelegant. We can’t do (prices zip quantities) map { _ * _ }
because _ * _
is a function with two arguments, and we need a function with one argument that is a tuple. The tupled
method of the Function
object changes a function with two arguments to one that takes a tuple. Apply tupled
to the multiplication function so you can map it over the list of pairs.
8. Write a function that turns an array of Double
values into a two-dimensional array. Pass the number of columns as a parameter. For example, with Array(1, 2, 3, 4, 5, 6)
and three columns, return Array(Array(1, 2, 3), Array(4, 5, 6))
. Use the grouped
method.
9. The Scala compiler transforms a for
/yield
expression
for (i <- 1 to 10; j <- 1 to i) yield i * j
to invocations of flatMap
and map
, like this:
(1 to 10).flatMap(i => (1 to i).map(j => i * j))
Explain the use of flatMap
. Hint: What is (1 to i).map(j => i * j)
when i
is 1
, 2
, 3
?
What happens when there are three generators in the for
/yield
expression?
10. The method java.util.TimeZone.getAvailableIDs
yields time zones such as Africa/Cairo
and Asia/Chungking
. Which continent has the most time zones? Hint: groupBy
.
11. Harry Hacker reads a file into a string and wants to use a parallel collection to update the letter frequencies concurrently on portions of the string. He uses the following code:
val frequencies = new scala.collection.mutable.HashMap[Char, Int]
for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1
Why is this a terrible idea? How can he really parallelize the computation? (Hint: Use aggregate
.)