Chapter 13. Collections

In this chapter, you will learn about the Scala collections library from a library user’s point of view. In addition to arrays and maps, which you have already encountered, you will see other useful collection types. There are many methods that can be applied to collections, and this chapter presents them in an orderly way.

The key points of this chapter are:

• All collections extend the Iterable trait.

• The three major categories of collections are sequences, sets, and maps.

• Scala has mutable and immutable versions of most collections.

• A Scala list is either empty, or it has a head and a tail which is again a list.

• Sets are unordered collections.

• Use a LinkedHashSet to retain the insertion order or a SortedSet to iterate in sorted order.

+ adds an element to an unordered collection; +: and :+ prepend or append to a sequence; ++ concatenates two collections; - and -- remove elements.

• The Iterable and Seq traits have dozens of useful methods for common operations. Check them out before writing tedious loops.

• Mapping, folding, and zipping are useful techniques for applying a function or operation to the elements of a collection.

13.1 The Main Collections Traits

Figure 13–1 shows the most important traits that make up the Scala collections hierarchy.

Image

Figure 13–1 Key traits in the Scala collections hierarchy

An Iterable is any collection that can yield an Iterator with which you can access all elements in the collection:

val coll = ... // some Iterable
val iter = coll.iterator
while (iter.hasNext)
   do something with iter.next()

This is the most basic way of traversing a collection. However, as you will see throughout this chapter, usually there are more convenient ways.

A Seq is an ordered sequence of values, such as an array or list. An IndexedSeq allows fast random access through an integer index. For example, an ArrayBuffer is indexed but a linked list is not.

A Set is an unordered collection of values. In a SortedSet, elements are always visited in sorted order.

A Map is a set of (key, value) pairs. A SortedMap visits the entries as sorted by the keys. See Chapter 4 for more information.

This hierarchy is similar to that in Java, with a couple of welcome improvements:

1. Maps are a part of the hierarchy and not a separate hierarchy.

2. IndexedSeq is the supertype of arrays but not of lists, allowing you to tell the two apart.


Image Note

In Java, both ArrayList and LinkedList implement a common List interface, making it difficult to write efficient code when random access is preferred, for example when searching in a sorted sequence. This was a flawed design decision in the original Java collections framework. In a later version, a marker interface RandomAccess was added to deal with this problem.


Each Scala collection trait or class has a companion object with an apply method for constructing an instance of the collection. For example,

Iterable(0xFF, 0xFF00, 0xFF0000)
Set(Color.RED, Color.GREEN, Color.BLUE)
Map(Color.RED -> 0xFF0000, Color.GREEN -> 0xFF00, Color.BLUE -> 0xFF)
SortedSet("Hello", "World")

This is called the “uniform creation principle.”

There are methods toSeq, toSet, toMap, and so on, as well as a generic to[C] method, that you can use to translate between collection types.

val coll = Seq(1, 1, 2, 3, 5, 8, 13)
val set = coll.toSet
val buffer = coll.to[ArrayBuffer]


Image Note

You can use the == operator to compare any sequence, set, or map with another collection of the same kind. For example, Seq(1, 2, 3) == (1 to 3) yields true. But comparing different kinds, for example, Seq(1, 2, 3) == Set(1, 2, 3) always yields false. In that case, use the sameElements method.


13.2 Mutable and Immutable Collections

Scala supports both mutable and immutable collections. An immutable collection can never change, so you can safely share a reference to it, even in a multi-threaded program. For example, there is a scala.collection.mutable.Map and a scala.collection.immutable.Map. Both have a common supertype scala.collection.Map (which, of course, contains no mutation operations).


Image Note

When you have a reference to a scala.collection.immutable.Map, you know that nobody can change the map. If you have a scala.collection.Map, then you can’t change it, but someone else might.


Scala gives a preference to immutable collections. The companion objects in the scala.collection package produce immutable collections. For example, scala.collection.Map("Hello" -> 42) is an immutable map.

Moreover, the scala package and the Predef object, which are always imported, have type aliases List, Set, and Map that refer to the immutable traits. For example, Predef.Map is the same as scala.collection.immutable.Map.


Image Tip

With the statement

import scala.collection.mutable

you can get an immutable map as Map and a mutable one as mutable.Map.


If you had no prior experience with immutable collections, you may wonder how you can do useful work with them. The key is that you can create new collections out of old ones. For example, if numbers is an immutable set, then

numbers + 9

is a new set containing the numbers together with 9. If 9 was already in the set, you just get a reference to the old set. This is particularly natural in recursive computations. For example, here we compute the set of all digits of an integer:

def digits(n: Int): Set[Int] =
  if (n < 0) digits(-n)
  else if (n < 10) Set(n)
  else digits(n / 10) + (n % 10)

This method starts out with a set containing a single digit. At each step, another digit is added. However, adding the digit doesn’t mutate a set. Instead, in each step, a new set is constructed.

13.3 Sequences

Figure 13–2 shows the most important immutable sequences.

Image

Figure 13–2 Immutable sequences

A Vector is the immutable equivalent of an ArrayBuffer: an indexed sequence with fast random access. Vectors are implemented as trees where each node has up to 32 children. For a vector with one million elements, one needs four layers of nodes. (Since 103 » 210, 106 » 324.) Accessing an element in such a list will take 4 hops, whereas in a linked list it would take an average of 500,000.

A Range represents an integer sequence, such as 0,1,2,3,4,5,6,7,8,9 or 10,20,30. Of course a Range object doesn’t store all sequence values but only the start, end, and increment. You construct Range objects with the to and until methods, as described in Chapter 2.

We discuss lists in the next section, and streams in Section 13.12, “Streams,” on page 189.

See Figure 13–3 for the most useful mutable sequences.

Image

Figure 13–3 Mutable sequences

We discussed array buffers in Chapter 3. Stacks, queues, and priority queues are standard data structures that are useful for implementing certain algorithms. If you are familiar with these structures, the Scala implementations won’t surprise you.

13.4 Lists

In Scala, a list is either Nil (that is, empty) or an object with a head element and a tail that is again a list. For example, consider the list

val digits = List(4, 2)

The value of digits.head is 4, and digits.tail is List(2). Moreover, digits.tail.head is 2 and digits.tail.tail is Nil.

The :: operator makes a new list from a given head and tail. For example,

9 :: List(4, 2)

is List(9, 4, 2). You can also write that list as

9 :: 4 :: 2 :: Nil

Note that :: is right-associative. With the :: operator, lists are constructed from the end:

9 :: (4 :: (2 :: Nil))

In Java or C++, one uses an iterator to traverse a linked list. You can do this in Scala as well, but it is often more natural to use recursion. For example, the following function computes the sum of all elements in a linked list of integers:

def sum(lst: List[Int]): Int =
  if (lst == Nil) 0 else lst.head + sum(lst.tail)

Or, if you prefer, you can use pattern matching:

def sum(lst: List[Int]): Int = lst match {
  case Nil => 0
  case h :: t => h + sum(t) // h is lst.head, t is lst.tail
}

Note the :: operator in the second pattern. It “destructures” the list into head and tail.


Image Note

Recursion works so naturally because the tail of a list is again a list.


Of course, for this particular example, you do not need to use recursion at all. The Scala library already has a sum method:

List(9, 4, 2).sum // Yields 15

If you want to mutate list elements in place, you can use a ListBuffer, a data structure that is backed by a linked list with a reference to the last node. This makes it efficient to add or remove elements at either end of the list.

However, adding or removing elements in the middle is not efficient. For example, suppose you want to remove every second element of a mutable list. With a Java LinkedList, you use an iterator and call remove after every second call to next. There is no analogous operation on a ListBuffer. Of course, removing multiple elements by their index positions is very inefficient in a linked list. Your best bet is to generate a new list with the result (see Exercise 3 on page 194).


Image Note

There are deprecated LinkedList and DoubleLinkedList classes and an internal MutableList class that you should not be using.


13.5 Sets

A set is a collection of distinct elements. Trying to add an existing element has no effect. For example,

Set(2, 0, 1) + 1

is the same as Set(2, 0, 1).

Unlike lists, sets do not retain the order in which elements are inserted. By default, sets are implemented as hash sets in which elements are organized by the value of the hashCode method. (In Scala, as in Java, every object has a hashCode method.)

For example, if you iterate over

Set(1, 2, 3, 4, 5, 6)

the elements are visited in the order

5 1 6 2 3 4

You may wonder why sets don’t retain the element order. It turns out that you can find elements much faster if you allow sets to reorder their elements. Finding an element in a hash set is much faster than in an array or list.

A linked hash set remembers the order in which elements were inserted. It keeps a linked list for this purpose. For example,

val weekdays = scala.collection.mutable.LinkedHashSet("Mo", "Tu", "We", "Th", "Fr")

If you want to iterate over elements in sorted order, use a sorted set:

val numbers = scala.collection.mutable.SortedSet(1, 2, 3, 4, 5)

A bit set is an implementation of a set of non-negative integers as a sequence of bits. The ith bit is 1 if i is present in the set. This is an efficient implementation as long as the maximum element is not too large. Scala provides both mutable and immutable BitSet classes.

The contains method checks whether a set contains a given value. The subsetOf method checks whether all elements of a set are contained in another set.

val digits = Set(1, 7, 2, 9)
digits contains 0 // false
Set(1, 2) subsetOf digits // true

The union, intersect, and diff methods carry out the usual set operations. If you prefer, you can write them as |, &, and &~. You can also write union as ++ and difference as --. For example, if we have the set

val primes = Set(2, 3, 5, 7)

then digits union primes is Set(1, 2, 3, 5, 7, 9), digits & primes is Set(2, 7), and digits -- primes is Set(1, 9).

13.6 Operators for Adding or Removing Elements

When you want to add or remove an element, or a number of elements, the operators to use depend on the collection type. Table 13–1 provides a summary.

Image

Table 13–1 Operators for Adding and Removing Elements

Generally, + is used for adding an element to an unordered collection, while +: and :+ add an element to the beginning or end of an ordered collection.

Vector(1, 2, 3) :+ 5 // Yields Vector(1, 2, 3, 5)
1 +: Vector(1, 2, 3) // Yields Vector(1, 1, 2, 3)

Note that +:, like all operators ending in a colon, is right-associative, and that it is a method of the right operand.

These operators return new collections (of the same type as the original ones) without modifying the original. Mutable collections have a += operator that mutates the left-hand side. For example,

val numbers = ArrayBuffer(1, 2, 3)
numbers += 5 // Adds 5 to numbers

With an immutable collection, you can use += or :+= with a var, like this:

var numbers = Set(1, 2, 3)
numbers += 5 // Sets numbers to the immutable set numbers + 5
var numberVector = Vector(1, 2, 3)
numberVector :+= 5 // += does not work since vectors don't have a + operator

To remove an element, use the - operator:

Set(1, 2, 3) - 2 // Yields Set(1, 3)

You can add multiple elements with the ++ operator:

coll ++ coll2

yields a collection of the same type as coll that contains both coll and coll2. Similarly, the -- operator removes multiple elements.


Image Tip

As you can see, Scala provides many operators for adding and removing elements. Here is a summary:

1. Append (:+) or prepend (+:) to a sequence.

2. Add (+) to an unordered collection.

3. Remove with the - operator.

4. Use ++ and -- for bulk add and remove.

5. Mutations are += ++= -= --=.

6. For lists, many Scala programmers prefer the :: and ::: operators.

7. Stay away from ++: +=: ++=:.



Image Note

For lists, you can use +: instead of :: for consistency, with one exception: Pattern matching (case h::t) does not work with the +: operator.


13.7 Common Methods

Table 13–2 gives a brief overview of the most important methods of the Iterable trait, sorted by functionality.

Image
Image

Table 13–2 Important Methods of the Iterable Trait

The Seq trait adds several methods to the Iterable trait. Table 13–3 shows the most important ones.

Image

Table 13–3 Important Methods of the Seq Trait


Image Note

Note that these methods never mutate a collection. They return a collection of the same type as the original. This is sometimes called the “uniform return type” principle.


13.8 Mapping a Function

You may want to transform all elements of a collection. The map method applies a function to a collection and yields a collection of the results. For example, given a list of strings

val names = List("Peter", "Paul", "Mary")

you get a list of the uppercased strings as

names.map(_.toUpperCase) // List("PETER", "PAUL", "MARY")

This is exactly the same as

for (n <- names) yield n.toUpperCase

If the function yields a collection instead of a single value, you may want to concatenate all results. In that case, use flatMap. For example, consider

def ulcase(s: String) = Vector(s.toUpperCase(), s.toLowerCase())

Then names.map(ulcase) is

List(Vector("PETER", "peter"), Vector("PAUL", "paul"), Vector("MARY", "mary"))

but names.flatMap(ulcase) is

List("PETER", "peter", "PAUL", "paul", "MARY", "mary")


Image Tip

If you use flatMap with a function that returns an Option, the resulting collection contains all values v for which the function returns Some(v).



Image Note

The map and flatMap methods are important because they are used for translating for expressions. For example, the expression

for (i <- 1 to 10) yield i * i

is translated to

(1 to 10).map(i => i * i)

and

for (i <- 1 to 10; j <- 1 to i) yield i * j

becomes

(1 to 10).flatMap(i => (1 to i).map(j => i * j))

Why flatMap? See Exercise 9 on page 195.


The transform method is the in-place equivalent of map. It applies to mutable collections, and replaces each element with the result of a function. For example, the following code changes all buffer elements to uppercase:

val buffer = ArrayBuffer("Peter", "Paul", "Mary")
buffer.transform(_.toUpperCase)

If you just want to apply a function for its side effect and don’t care about the function values, use foreach:

names.foreach(println)

The collect method works with partial functions—functions that may not be defined for all inputs. It yields a collection of all function values of the arguments on which it is defined. For example,

"-3+4".collect { case '+' => 1 ; case '-' => -1 } // Vector(-1, 1)

The groupBy method yields a map whose keys are the function values, and whose values are the collections of elements whose function value is the given key. For example,

val words = ...
val map = words.groupBy(_.substring(0, 1).toUpper)

builds a map that maps "A" to all words starting with A, and so on.

13.9 Reducing, Folding, and Scanning Image

The map method applies a unary function to all elements of a collection. The methods that we discuss in this section combine elements with a binary function. The call c.reduceLeft(op) applies op to successive elements, like this:

            .
           .
          .
          op
         /  
        op   coll(3)
       /  
      op   coll(2)
     /  
coll(0)  coll(1)

For example,

List(1, 7, 2, 9).reduceLeft(_ - _)

is

             -
           /   
          -     9
        /   
       -     2
     /   
    1     7

or

((1 - 7) - 2) - 9 = 1 - 7 - 2 - 9 = -17

The reduceRight method does the same, but it starts at the end of the collection. For example,

List(1, 7, 2, 9).reduceRight(_ - _)

is

1 - (7 - (2 - 9)) = 1 - 7 + 2 - 9 = -13

Often, it is useful to start the computation with an initial element other than the initial element of a collection. The call coll.foldLeft(init)(op) computes

          .
         .
        .
        op
       /  
      op   coll(2)
     /  
    op   coll(1)
   /  
init   coll(0)

For example,

List(1, 7, 2, 9).foldLeft(0)(_ - _)

is

0 - 1 - 7 - 2 - 9 = -19


Image Note

The initial value and the operator are separate “curried” parameters so that Scala can use the type of the initial value for type inference in the operator. For example, in List(1, 7, 2, 9).foldLeft("")(_ + _), the initial value is a string, so the operator must be a function (String, Int) => String.


You can also write the foldLeft operation with the /: operator, like this:

(0 /: List(1, 7, 2, 9))(_ - _)

The /: is supposed to remind you of the shape of the tree.


Image Note

With the /: operator, the initial value is the first operand. Note that, since the operator ends with a colon, it is a method of the second operand.


There is a foldRight or : variant as well, computing

      .
       .
        .
        op
       /  
coll(n-3)  op
          /  
   coll(n-2)  op
             /  
      coll(n-1)  init            

These examples don’t seem to be very useful. Of course, coll.reduceLeft(_ + _) or coll.foldLeft(0)(_ + _) computes the sum, but you can get that directly with coll.sum.

Folding is sometimes attractive as a replacement for a loop. Suppose, for example, we want to count the frequencies of the letters in a string. One way is to visit each letter and update a mutable map.

val freq = scala.collection.mutable.Map[Char, Int]()
for (c <- "Mississippi") freq(c) = freq.getOrElse(c, 0) + 1
// Now freq is Map('i' -> 4, 'M' -> 1, 's' -> 4, 'p' -> 2)

Here is another way of thinking about this process. At each step, combine the frequency map and the newly encountered letter, yielding a new frequency map. That’s a fold:

                .
               .
              .
             op
            /  
           op   's'
          /  
         op   'i'
        /  
empty map  'M'

What is op? The left operand is the partially filled map, and the right operand is the new letter. The result is the augmented map. It becomes the input to the next call to op, and at the end, the result is a map with all counts. The code is

(Map[Char, Int]() /: "Mississippi") {
  (m, c) => m + (c -> (m.getOrElse(c, 0) + 1))
}

Note that this is an immutable map. We compute a new map at each step.


Image Note

It is possible to replace any while loop with a fold. Build a data structure that combines all variables updated in the loop, and define an operation that implements one step through the loop. I am not saying that this is always a good idea, but you may find it interesting that loops and mutations can be eliminated in this way.


Finally, the scanLeft and scanRight methods combine folding and mapping. You get a collection of all intermediate results. For example,

(1 to 10).scanLeft(0)(_ + _)

yields all partial sums:

Vector(0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55)

13.10 Zipping

The methods of the preceding section apply an operation to adjacent elements in the same collection. Sometimes, you have two collections, and you want to combine corresponding elements. For example, suppose you have a list of product prices and corresponding quantities:

val prices = List(5.0, 20.0, 9.95)
val quantities = List(10, 2, 1)

The zip method lets you combine them into a list of pairs. For example,

prices zip quantities

is a List[(Double, Int)]:

List[(Double, Int)] = List((5.0, 10), (20.0, 2), (9.95, 1))

The method is called “zip” because it combines the two collections like the teeth of a zipper.

Now it is easy to apply a function to each pair.

(prices zip quantities) map { p => p._1 * p._2 }

The result is a list of prices:

List(50.0, 40.0, 9.95)

The total price of all items is then

((prices zip quantities) map { p => p._1 * p._2 }) sum

If one collection is shorter than the other, the result has as many pairs as the shorter collection. For example,

List(5.0, 20.0, 9.95) zip List(10, 2)

is

List((5.0, 10), (20.0, 2))

The zipAll method lets you specify defaults for the shorter list:

List(5.0, 20.0, 9.95).zipAll(List(10, 2), 0.0, 1)

is

List((5.0, 10), (20.0, 2), (9.95, 1))

The zipWithIndex method returns a list of pairs where the second component is the index of each element. For example,

"Scala".zipWithIndex

is

Vector(('S', 0), ('c', 1), ('a', 2), ('l', 3), ('a', 4))

This can be useful if you want to compute the index of an element with a certain property. For example,

"Scala".zipWithIndex.max

is ('l', 3). The index of the value with the largest encoding is

"Scala".zipWithIndex.max._2

13.11 Iterators

You can obtain an iterator from a collection with the iterator method. This isn’t as common as in Java or C++ because you can usually get what you need more easily with one of the methods from the preceding sections.

However, iterators are useful for collections that are expensive to construct fully. For example, Source.fromFile yields an iterator because it might not be efficient to read an entire file into memory. There are a few Iterable methods that yield an iterator, such as grouped or sliding.

When you have an iterator, you can iterate over the elements with the next and hasNext methods.

while (iter.hasNext)
  do something with iter.next()

If you prefer, you can use a for loop instead:

for (elem <- iter)
  do something with elem

Both loops end up moving the iterator to the end of the collection, after which it is no longer usable.

Sometimes, you want to be able to look at the next element before deciding whether to consume it. In that case, use the buffered method to turn an Iterator into a BufferedIterator. The head method yields the next element without advancing the iterator.

val iter = scala.io.Source.fromFile(filename).buffered
while (iter.hasNext && iter.head.isWhitespace) iter.next
  // Now iter points to the first non-whitespace character

The Iterator class defines a number of methods that work identically to the methods on collections. In particular, all Iterable methods listed in Section 13.7, “Common Methods,” on page 180 are available, except for head, headOption, last, lastOption, tail, init, takeRight, and dropRight. After calling a method such as map, filter, count, sum, or even length, the iterator is at the end of the collection, and you can’t use it again. With other methods, such as find or take, the iterator is past the found element or the taken ones.

If you find it too tedious to work with an iterator, you can use a method such as toArray, toIterable, toSeq, toSet, or toMap to copy the values into a collection.

13.12 Streams Image

In the preceding sections, you saw that an iterator is a “lazy” alternative to a collection. You get the elements as you need them. If you don’t need any more elements, you don’t pay for the expense of computing the remaining ones.

However, iterators are fragile. Each call to next mutates the iterator. Streams offer an immutable alternative. A stream is an immutable list in which the tail is computed lazily—that is, only when you ask for it.

Here is a typical example:

def numsFrom(n: BigInt): Stream[BigInt] = n #:: numsFrom(n + 1)

The #:: operator is like the :: operator for lists, but it constructs a stream.

When you call

val tenOrMore = numsFrom(10)

you get a stream object that is displayed as

Stream(10, ?)

The tail is unevaluated. If you call

tenOrMore.tail.tail.tail

you get

Stream(13, ?)

Stream methods are executed lazily. For example,

val squares = numsFrom(1).map(x => x * x)

yields

Stream(1, ?)

You have to call squares.tail to force evaluation of the next entry.

If you want to get more than one answer, you can invoke take followed by force, which forces evaluation of all values. For example,

squares.take(5).force

produces Stream(1, 4, 9, 16, 25).

Of course, you don’t want to call

squares.force // No!

That call would attempt to evaluate all members of an infinite stream, causing an OutOfMemoryError.

You can construct a stream from an iterator. For example, the Source.getLines method returns an Iterator[String]. With that iterator, you can only visit the lines once. A stream caches the visited lines so you can revisit them:

val words = Source.fromFile("/usr/share/dict/words").getLines.toStream
words // Stream(A, ?)
words(5) // Aachen
words // Stream(A, A's, AOL, AOL's, Aachen, ?)


Image Note

Scala streams are quite different from streams in Java 8. However, lazy views, covered in the next section, are conceptually equivalent to Java 8 streams.


13.13 Lazy Views Image

In the preceding section, you saw that stream methods are computed lazily, delivering results only when they are needed. You can get a similar effect with other collections by applying the view method. This method yields a collection on which methods are applied lazily. For example,

val palindromicSquares = (1 to 1000000).view
  .map(x => x * x)
  .filter(x => x.toString == x.toString.reverse)

yields a collection that is unevaluated. (Unlike a stream, not even the first element is evaluated.) When you call

palindromicSquares.take(10).mkString(",")

then enough squares are generated until ten palindromes have been found, and then the computation stops. Unlike streams, views do not cache any values. If you call palindromicSquares.take(10).mkString(",") again, the computation starts over.

As with streams, use the force method to force evaluation of a lazy view. You get back a collection of the same type as the original.


Image Caution

The apply method forces evaluation of the entire view. Instead of calling lazyView(i), call lazyView.take(i).last.


When you obtain a view into a slice of a mutable collection, any mutations affect the original collection. For example,

ArrayBuffer buffer = ...
buffer.view(10, 20).transform(x => 0)

clears the given slice and leaves the other elements unchanged.

13.14 Interoperability with Java Collections

At times you may need to use a Java collection, and you will likely miss the rich set of methods that you get with Scala collections. Conversely, you may want to build up a Scala collection and then pass it to Java code. The JavaConversions object provides a set of conversions between Scala and Java collections.

Give the target value an explicit type to trigger the conversion. For example,

import scala.collection.JavaConversions._
val props: scala.collection.mutable.Map[String, String] = System.getProperties()

If you are worried about unwanted implicit conversions, just import the ones you need, for example:

import scala.collection.JavaConversions.propertiesAsScalaMap

Table 13–4 shows the conversions from Scala to Java collections.

Image

Table 13–4 Conversions from Scala Collections to Java Collections

And Table 13–5 shows the opposite conversions from Java to Scala collections.

Image

Table 13–5 Conversions from Java Collections to Scala Collections

Note that the conversions yield wrappers that let you use the target interface to access the original type. For example, if you use

val props: scala.collection.mutable.Map[String, String] = System.getProperties()

then props is a wrapper whose methods call the methods of the underlying Java object. If you call

props("com.horstmann.scala") = "impatient"

then the wrapper calls put("com.horstmann.scala", "impatient") on the underlying Properties object.

13.15 Parallel Collections

It is hard to write correct concurrent programs, yet concurrency is often required nowadays to keep all processors of a computer busy. Scala offers a particularly attractive solution for manipulating large collections. Such tasks often parallelize naturally. For example, to compute the sum of all elements, multiple threads can concurrently compute the sums of different sections; in the end, these partial results are summed up. Of course it is troublesome to schedule these concurrent activities—but with Scala, you don’t have to. If coll is a large collection, then

coll.par.sum

computes the sum concurrently. The par method produces a parallel implementation of the collection. That implementation parallelizes the collection methods whenever possible. For example,

coll.par.count(_ % 2 == 0)

counts the even numbers in coll by evaluating the predicate on subcollections in parallel and combining the results.

You can parallelize a for loop by applying .par to the collection over which you iterate, like this:

for (i <- (0 until 100000).par) print(s" $i")

Try it out—the numbers are printed in the order they are produced by the threads working on the task.

In a for/yield loop, the results are assembled in order. Try this:

(for (i <- (0 until 100000).par) yield i) == (0 until 100000)


Image Caution

If parallel computations mutate shared variables, the result is unpredictable. For example, do not update a shared counter:

var count = 0
for (c <- coll.par) { if (c % 2 == 0) count += 1 } // Error!


The parallel collections returned by the par method belong to types that extend the ParSeq, ParSet, or ParMap traits. These are not subtypes of Seq, Set, or Map, and you cannot pass a parallel collection to a method that expects a sequential collection.

You can convert a parallel collection back to a sequential one with the seq method.

val result = coll.par.filter(p).seq

Not all methods can be parallelized. For example, reduceLeft and reduceRight require that each operator is applied in sequence. There is an alternate method, reduce, that operates on parts of the collection and combines the results. For this to work, the operator must be associative—it must fulfill (a op b) op c = a op (b op c). For example, addition is associative but subtraction is not: (ab) – ca – (bc).

Similarly, there is a fold method that operates on parts of the collection. Unfortunately, it is not as flexible as foldLeft or foldRight—both arguments of the operator must be elements. That is, you can do coll.par.fold(0)(_ + _), but you cannot do a more complex fold such as the one at the end of Section 13.9, “Reducing, Folding, and Scanning,” on page 184.

To solve this problem, there is an even more general aggregate that applies an operator to parts of the collection, and then uses another operator to combine the results. For example, str.par.aggregate(Set[Char]())(_ + _, _ ++ _) is the equivalent of str.foldLeft(Set[Char]())(_ + _), forming a set of all distinct characters in str.


Image Note

By default, parallel collections use a global fork-join pool, which is well suited for processor-intensive calculations. If you carry out parallel computation steps that block, you should choose a different “execution context”—see Chapter 17.


Exercises

1. Write a function that, given a string, produces a map of the indexes of all characters. For example, indexes("Mississippi") should return a map associating 'M' with the set {0}, 'i' with the set {1, 4, 7, 10}, and so on. Use a mutable map of characters to mutable sets. How can you ensure that the set is sorted?

2. Repeat the preceding exercise, using an immutable map of characters to lists.

3. Write a function that removes every second element from a ListBuffer. Try it two ways. Call remove(i) for all even i starting at the end of the list. Copy every second element to a new list. Compare the performance.

4. Write a function that receives a collection of strings and a map from strings to integers. Return a collection of integers that are values of the map corresponding to one of the strings in the collection. For example, given Array("Tom", "Fred", "Harry") and Map("Tom" -> 3, "Dick" -> 4, "Harry" -> 5), return Array(3, 5). Hint: Use flatMap to combine the Option values returned by get.

5. Implement a function that works just like mkString, using reduceLeft.

6. Given a list of integers lst, what is (lst : List[Int]())(_ :: _)? (List[Int]() /: lst)(_ :+ _)? How can you modify one of them to reverse the list?

7. In Section 13.10, “Zipping,” on page 187, the expression (prices zip quantities) map { p => p._1 * p._2 } is a bit inelegant. We can’t do (prices zip quantities) map { _ * _ } because _ * _ is a function with two arguments, and we need a function with one argument that is a tuple. The tupled method of the Function object changes a function with two arguments to one that takes a tuple. Apply tupled to the multiplication function so you can map it over the list of pairs.

8. Write a function that turns an array of Double values into a two-dimensional array. Pass the number of columns as a parameter. For example, with Array(1, 2, 3, 4, 5, 6) and three columns, return Array(Array(1, 2, 3), Array(4, 5, 6)). Use the grouped method.

9. The Scala compiler transforms a for/yield expression

for (i <- 1 to 10; j <- 1 to i) yield i * j

to invocations of flatMap and map, like this:

(1 to 10).flatMap(i => (1 to i).map(j => i * j))

Explain the use of flatMap. Hint: What is (1 to i).map(j => i * j) when i is 1, 2, 3?

What happens when there are three generators in the for/yield expression?

10. The method java.util.TimeZone.getAvailableIDs yields time zones such as Africa/Cairo and Asia/Chungking. Which continent has the most time zones? Hint: groupBy.

11. Harry Hacker reads a file into a string and wants to use a parallel collection to update the letter frequencies concurrently on portions of the string. He uses the following code:

val frequencies = new scala.collection.mutable.HashMap[Char, Int]
for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1

Why is this a terrible idea? How can he really parallelize the computation? (Hint: Use aggregate.)

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset