Chapter 4. Having fun with functional data structures

This chapter covers

  • Introducing type parameterization with covariance and contravariance
  • Having fun with higher-order functions
  • Creating your own function objects in Scala
  • Introducing the Scala collection hierarchy and parallel collections

In this chapter you’ll switch gears to begin a fun and interesting part of the Scala language: Scala collections, which broadly support two categories of data structures, immutable and mutable.

To understand and benefit from Scala collections, you need to know two concepts: type parameterization and higher-order functions. Type parameterization allows you to create types that take another type as a parameter (similar to Java generics). Higher-order functions let you create functions that take other functions as parameters. These two concepts allow you to create generic and reusable components, like Scala collections.

The Scala collection library is one of Scala’s most powerful features. The library implements all the common data structures you need, making it essential for every Scala developer. A recent addition to the collections library is parallel collections. Scala parallel collections allow you to solve data parallelism problems in Scala with ease. You’ll see how the Scala parallel collections library helps when working with large datasets, so buckle up! This will be a fun and exciting ride.

4.1. Introducing type parameterization

In programming, type parameterization allows you to define methods or classes in terms of a type that will be specified later. It’s like creating a placeholder for the type. Parameterized types should not be a new concept to you if you’ve worked with Java or C# generics. Scala provides additional extensions to generics because of its sound and powerful type system.

In chapter 3 you implemented the query interface to the MongoDB driver, and one of the methods you exposed was findOne, which retrieves a single document from the collection. The problem with that method is that it returns null when the collection is empty, but this little fact isn’t clear from the method declaration. One way to solve the problem would be to add a comment, although adding a comment to a method is the weakest form of documentation. Wouldn’t it be better if you could explicitly communicate that sometimes the method might not work as intended?

In a situation like this, Scala developers use something called an Option. Option is a collection type in Scala. Unlike other collection types, an Option contains a maximum of one element. It represents one of two possible values: None and Some. None means “no value” and Some represents “some value.” By returning Option from a method, you can communicate that the method might not return a value at all times.

For this section, forget that Scala provides an Option type so that you can build something similar on your own. You’ll create a function that will return the index position of an element in a list. It also needs to work for all kinds of lists. The list could contain integers, longs, or strings. How can you build a function that works for all types of lists? Using type parameterization, you can make the type information configurable. Here’s how the position function looks with type parameterization:
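
def position[A](xs: List[A], value: A): Int = {
  xs.indexOf(value)
}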

Here A denotes a type that is determined only when the function is invoked. Both the List and the value you’re looking for need to be of the same type. Unlike Java and C#, Scala uses square brackets ([]) to declare the type parameter. When this function is invoked with a list of integers, A represents the type Int. Similarly, if you invoke it with a list of strings, the type parameter A is replaced with String. Now test your position method with different types of parameters:

scala> val xs = List("one", "two", "three")
xs: List[java.lang.String] = List(one, two, three)

scala> position(xs, "two")
res2: Int = 1

scala> val ys = List(20, 30, 40)
ys: List[Int] = List(20, 30, 40)

scala> position(ys, 40)
res6: Int = 2

scala> position[Int](ys, 300)
res11: Int = -1

Even though you can explicitly specify the type value for the type parameter as in the last example, it’s optional. Scala type inference determines the value of the type parameter based on the arguments passed to the function.

Currently your position function returns -1 when there’s no matching element. Here, instead of returning the Int result, you’ll return a new type that clearly expresses the behavior of the method. You’ll create a container called Maybe that will wrap the result. Your position method will look like this:

def position[A](xs: List[A], value: A): Maybe[Int] = {
  val index = xs.indexOf(value)
  if(index != -1) Just(index) else Nil
}

In the case of a valid index (not -1), the position method returns Just (you’ll see it shortly); otherwise Nil. Just and Nil are two implementations of the Maybe container, and the following listing shows how they’re implemented.

Listing 4.1. Version 1 of Maybe.scala
sealed abstract class Maybe[+A] {
  def isEmpty: Boolean
  def get: A
}

final case class Just[A](value: A) extends Maybe[A] {
  def isEmpty = false
  def get = value
}

case object Nil extends Maybe[Nothing] {
  def isEmpty = true
  def get = throw new NoSuchElementException("Nil.get")
}

Most of this code should be familiar to you, except the type parameter part. The Maybe class is declared as covariant in type A. The next section explores type variance in detail.

4.2. Type variance with covariance and contravariance

Type variance complements type parameterization with a mechanism for specifying constraints, such as covariance and contravariance, to the type system. The subtyping relationship gives rise to the question of variance: how do two types that have a subtype relationship between them relate when they’re used as type parameters? In Scala, you can use type parameters with classes and traits, as demonstrated previously. When using type parameters for classes or traits, you can add a + sign along with the type parameter to make it covariant (like the Maybe class in the previous example). Covariance allows subclasses to override and use narrower types than their superclass in covariant positions such as the return value. Here the Nil object is a subclass of Maybe with scala.Nothing as the type parameter. The reason for using scala.Nothing here is that the get method in the Nil object throws an exception. Because the type parameter A of Maybe is covariant, you can return a narrower type from its subclasses, and there’s no narrower type than Nothing because it sits at the bottom of Scala’s type hierarchy.
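
You can see covariance at work in the REPL; a small sketch, assuming the Maybe hierarchy from listing 4.1 is defined in your session:

scala> val result: Maybe[Int] = Nil
result: Maybe[Int] = Nil

Because Maybe is covariant in A and Nothing is a subtype of Int, Maybe[Nothing] (the type of Nil) is a subtype of Maybe[Int], so the assignment type-checks.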

Usefulness of Nothing in Scala

In Scala all methods and functions need to have a return type. There’s no void in Scala. In situations where you have methods that don’t return anything, you use scala.Unit as a return type of the method. Scala uses scala.Nothing as a return type in situations where the return type isn’t clear, such as when an exception is thrown:

scala> def throwException = throw new RuntimeException(
                      "Always throws exception")
throwException: Nothing

When you see a method returning Nothing, that means that method won’t return successfully. Here’s another example, invoking sys.exit (which calls System.exit under the hood) to terminate the runtime:

scala> def kill = sys.exit(1)
kill: Nothing

In Scala, an immutable List is covariant in its type parameter, so List[String] is a subtype of List[Any]. You can take an instance, List[String], and assign it to a List of Any:

scala> val everything: List[Any] = List("one", "two", "three")
everything: List[Any] = List(one, two, three)

Covariance has lots of useful advantages. Look at the following method:

def ++(that: GenTraversableOnce[A]): List[A]

That method takes a type of iterator (GenTraversableOnce is a template trait for all objects that could be iterated once) and returns a concatenated collection. Similarly, collection.Seq, collection.Iterable, and collection.Traversable provide the same method:

def ++(that: GenTraversableOnce[A]): Traversable[A]
def ++(that: GenTraversableOnce[A]): Iterable[A]
def ++(that: GenTraversableOnce[A]): Seq[A]

Traversable is the parent trait for all the collection types in Scala, and the ++ method is defined only in this trait. In the preceding example Seq, Iterable, and List inherit the definition from Traversable. Still, depending on the type of collection you’re dealing with, each returns the same type of collection back, because Traversable is defined with a covariant type parameter. You’ll explore the Scala collection hierarchy later in this chapter.

The opposite of covariance is contravariance. In covariance, subtyping goes downward, as you saw in the example of List; in contravariance it’s the opposite: the subtyping relationship is reversed, so subtypes go upward. Contravariance comes in handy for input positions such as function parameters, as you’ll see with the Function1 trait shortly.

Mutable objects need to be invariant

A type parameter is invariant when it’s neither covariant nor contravariant. All Scala mutable collection classes are invariant. An example can explain why mutable objects need to be invariant. By now, you should be comfortable using collection.immutable.List; the mutable counterpart of List is collection.mutable.ListBuffer. Because ListBuffer is mutable, it’s declared as invariant as follows:

final class ListBuffer[A] ...{ ... }

Notice that when you declare an invariant type, you drop the - or + sign. Because it’s declared as invariant, you can’t assign ListBuffer from one type to another. The following code will throw a compilation error:

scala> val mxs: ListBuffer[String] = ListBuffer("pants")
mxs: scala.collection.mutable.ListBuffer[String] =
          ListBuffer(pants)
scala> val everything: ListBuffer[Any] = mxs
<console>:6: error: type mismatch;
 found   : scala.collection.mutable.ListBuffer[String]
 required: scala.collection.mutable.ListBuffer[Any]
       val everything: ListBuffer[Any] = mxs

Even though String is a subtype of scala.Any, Scala still doesn’t let you assign mxs to everything. To understand why, assume ListBuffer is covariant and the following code snippet works without any compilation problem:

scala> val mxs: ListBuffer[String] = ListBuffer("pants")
mxs: scala.collection.mutable.ListBuffer[String] =
          ListBuffer(pants)
scala> val everything: ListBuffer[Any] = mxs

scala> everything += 1
res4: everything.type = ListBuffer(1, pants)

Can you spot the problem? Because everything has the type ListBuffer[Any], you can store an integer value into what is actually a collection of strings. This is a disaster waiting to happen; it’s exactly what happens with Java arrays. To avoid these kinds of problems, it’s always a good idea to make mutable objects invariant. The next question is what happens with immutable collections. It turns out that for immutable objects, covariance isn’t a problem at all. If you replace ListBuffer with the immutable List, you can take an instance of List[String] and assign it to a List[Any] without a problem.

scala> val xs: List[String] = List("pants")
xs: List[String] = List(pants)
scala> val everything: List[Any] = xs
everything: List[Any] = List(pants)

The only reason this assignment is safe is that List is immutable. You can prepend 1 to the xs List, and you’ll get back a new List of type List[Any].

scala> 1 :: xs
res5: List[Any] = List(1, pants)

Again, this addition is safe because the cons (::) method always returns a new List, and its type is determined by the types of the elements in the List. The only common supertype of an integer value and a reference value is scala.Any. This is an important property to remember about type variance when dealing with mutable and immutable objects.

The best way to understand contravariance is to see the problem that comes when it’s absent. Try to spot the problem in the following Java code example:

Object[] arr = new Integer[1];   // compiles, thanks to Java's covariant arrays
arr[0] = "Hello, there!";        // throws ArrayStoreException at runtime

You end up assigning a string to an array that actually holds integers. Java fortunately catches this error at runtime by throwing an ArrayStoreException. Scala stops these kinds of errors at compile time because its mutable types, such as Array, are invariant, so the bad assignment is rejected before the program runs. A good example of contravariance in action is the Function1 trait defined in Scala:

trait Function1[-P, +R] { ... }

Scala uses the minus sign (-) to denote contravariance and the plus sign (+) for covariance. In this case, Function1 is contravariant in P and covariant in R. In Scala, functions are also values and have type. For example, Function1 type represents any function that takes one parameter. The important question is why the Function1 trait is contravariant for the parameter and covariant for the return type.

To answer, consider what will happen when the opposite is the case—for instance, covariant for parameter and contravariant for return value. Now imagine you have a covariant parameter, as in the following example:

val addOne: Function1[Any, Int] = { x: Int => x + 1 }

Because Int is a subtype of scala.Any, the covariant parameter should allow the preceding code to compile. But the loophole in this code is that you can now invoke addOne with any type of parameter as long as it’s a subtype of scala.Any. Doing so could cause all sorts of problems for a function that expects only Int. Scala, being a type-safe language, doesn’t allow you to do that. The only other option you have is to declare the parameter type as invariant, but that would make the Function1 trait inflexible. A contravariant parameter type is the only viable option to create a type-safe function.

You can’t use a contravariant return type because you again get into a problem. Consider the following example:

val asString: Int => Int = { x: Int => (x.toString: Any) }

This code is not valid because Any is a supertype of Int, and a contravariant return type would allow you to go from a narrower type to a more generic type. If that were allowed, the following piece of code would have to be valid too, right?

asString(10) + 20

But you end up adding 20 to a string value, and that could be problematic. Scala’s strong type system implementation stops you from making these kinds of mistakes when dealing with parameter types and return types. To have a flexible and type-safe Function1 trait, the only possible implementation would be to have the parameter contravariant and the return type covariant. In the next section you’ll learn to set bounds for type parameters—another important concept associated with type parameters.
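
To see these variance rules paying off, note that a Function1[Any, String] can be used anywhere a Function1[Int, CharSequence] is expected; a small sketch:

val toStr: Any => String = x => x.toString

// Compiles: the parameter type widens from Int to Any (contravariance),
// and the result type narrows from CharSequence to String (covariance).
val f: Int => CharSequence = toStr

The substituted function accepts at least everything the expected type promises to accept, and returns no more than the expected type promises to return.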

4.3. Lower and upper type bounds

In Scala, type parameters may be constrained by a type bound. Such a type bound limits the concrete types the type variable can take. An example will illustrate this concept more clearly. The position function throws an exception if you invoke the get method on the Nil object:

scala> val xs = List("one", "two", "three")
xs: List[java.lang.String] = List(one, two, three)

scala> position(xs, "two").get
res3: Int = 1

scala> position(List(), "two").get
java.util.NoSuchElementException: Nil.get
  at Nil$.get(<console>:7)
  at Nil$.get(<console>:5)
  at .<init>(<console>:10)

Wouldn’t it be better to pass a default value to fall back on in case the element isn’t found? That way you can control the outcome in case of an error. You can add a new method to the Maybe abstract class that takes a default value:

sealed abstract class Maybe[+A] {
  def isEmpty: Boolean
  def get: A
  def getOrElse(default: A): A = {
    if(isEmpty) default else get
  }
}

Here getOrElse returns the default value if the isEmpty method returns true. In case of a Nil instance, isEmpty will always return true. But if you try to compile this code, you’ll get the following compiler error:

covariant type A occurs in contravariant position in type => A of value default

Because A is a covariant type, Scala doesn’t allow the covariant type as an input parameter. You’ll also get the following compilation error if A is contravariant because it’s used as a return type for get:

contravariant type A occurs in covariant position in type => A of method get

You could solve this problem in two ways: make the Maybe class invariant and lose all the subtyping between Just and Nil, or use a type bound. I’m not willing to give up the nice subtyping, so let’s go with a type bound.

Scala provides two types of type bound: lower and upper. An upper type bound T <: A declares that type variable T is a subtype of a type A, and A is the upper bound. To create a function that can only take subclasses of Maybe, you can write something like the following:

def defaultToNull[A <: Maybe[_]](p: A) = {
  p.getOrElse(null)
}

The function defaultToNull takes a parameter of type A, which is constrained to be a subtype of Maybe.

Because Maybe takes a type parameter, you have to provide one when declaring the upper type bound. If you don’t care about the type parameter, you can use the _ placeholder, as in the last example.

A lower bound sets the lower limit of a type parameter. The lower bound T >: A declares that the type parameter T is constrained to be a supertype of type A. You can use a lower type bound to implement the getOrElse method: it lets you widen the type of the default parameter to a supertype of A. The following listing shows the complete Maybe class.

Listing 4.2. Complete Maybe.scala
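sealed abstract class Maybe[+A] {
  def isEmpty: Boolean
  def get: A
  def getOrElse[B >: A](default: B): B = {
    if(isEmpty) default else get
  }
}

final case class Just[A](value: A) extends Maybe[A] {
  def isEmpty = false
  def get = value
}

case object Nil extends Maybe[Nothing] {
  def isEmpty = true
  def get = throw new NoSuchElementException("Nil.get")
}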

The Maybe class is defined with a covariant type parameter A so that its subclasses can return more specialized types. The getOrElse method returns the value contained by Just, or the default value when it’s empty. Because the default value is taken as a parameter, you have to give it a lower bound, B >: A, to satisfy the rule that a covariant type parameter can’t appear in a contravariant position such as a method parameter.

Just and Nil are the two subclasses that represent success and error situations. The sealed modifier restricts anyone else from creating a subclass of Maybe (modifiers are covered in chapter 3).

When getting the index, you can always invoke the getOrElse method and avoid any unnecessary exceptions:

scala> position(List(), "something").getOrElse(-1)
res6: Int = -1

scala> position(List("one", "two", "three"), "three").getOrElse(-1)
res8: Int = 2

Scala’s type system prevents you from making mistakes that could be easily caught during the compilation phase. You’ve learned the important role that covariance and contravariance play in building nice, type-safe applications. But remember that the approaches discussed here work only for immutable objects. When it comes to mutable objects, it’s always safer to make them invariant.

You’ve only scratched the surface of Scala’s type system. But I think you have enough to understand the data structures discussed in this chapter. You’ll see more examples and explanations of Scala’s type system throughout the book.

4.4. Higher-order functions, including map, flatMap, and friends

A function is called higher order if it takes a function as an argument or returns a function as a result. You’ve already seen some examples with Scala’s immutable List. Now consider one of its methods, map. This method allows you to build a new list by applying a function to all elements of a given List:

class List[+A] ...  {

    def map[B](f: A => B) : List[B]
}

Here A represents the type of List in which the map is defined, and B is the type of the new List. To define a function argument, you have to use the => sign. In this case, f is a function that takes a single parameter of type A and returns a result of type B. In the following examples you’re creating a new List by adding 1 to all the elements of a given list in various ways:

scala> List(1, 2, 3) map { (x: Int) => x + 1 }
res12: List[Int] = List(2, 3, 4)

scala> List(1, 2, 3) map { _ + 1 }
res13: List[Int] = List(2, 3, 4)

scala> def addOne(num:Int) = num + 1
addOne: (num: Int)Int

scala> List(1, 2, 3) map addOne
res11: List[Int] = List(2, 3, 4)

In the first case you’re passing an anonymous function that takes x as a parameter and adds 1 to it. In the second case you’re using a function literal, where the underscore placeholder represents the parameter. In the last example, you’re passing an existing method without referring to the parameters that flow between the map function and the addOne function. This is a good example of pointfree-style[1] programming and function composition. Behind the scenes, Scala converts the addOne method into a function object that map invokes for each element (eta-expansion, which you’ll meet again later in this chapter). Scala also supports call-by-name invocation, where a parameter is wrapped inside a function and evaluated each time it’s accessed; the sidebar in this section explains it in detail.

1 “Pointfree,” June 5, 2011 (last modified), www.haskell.org/haskellwiki/Pointfree.

You haven’t yet looked at an example where a function returns another function. Let’s fix that by refactoring the addOne function to use a nested function that abstracts the increment-by-1 operation:

def addOne(num: Int) = {
  def ++ = (x:Int) => x + 1
  ++(num)
}
scala> List(1, 2, 3) map addOne
res15: List[Int] = List(2, 3, 4)
Call-by-value, call-by-reference, and call-by-name method invocation

Java supports two types of method invocation: call-by-reference and call-by-value. In call-by-reference invocation, a function is invoked with a reference to an object. In the case of Scala, it’s any subtype of AnyRef. In the case of call-by-value invocation, the function is invoked with a value type. In Scala these are value types like Int or Float. Remember, Scala unboxes and boxes value types from and to an object depending on how it’s used in the code. Along with these standard ones, Scala also provides additional method invocation mechanisms called call-by-name and call-by-need. In call-by-name method invocation, Scala passes a method parameter as a function. Let’s look at an example that demonstrates this feature. In the following example, you have a log function, which logs a message when log is enabled:

def log(m: String) = if(logEnabled) println(m)

But to retrieve a log message, you have to look it up in the error queue—a time-consuming operation:

def popErrorMessage = { popMessageFromASlowQueue() }
log("The error message is " + popErrorMessage).

Here the parameter will always be evaluated even if log isn’t enabled, and maybe you don’t want that. You can use call-by-name to avoid the unnecessary computation:

def log(m: => String) = if(logEnabled) println(m)

Now by using =>, you’ve made the parameter a call-by-name, and it won’t affect the way you’re invoking the log function. But Scala will pass the parameter as a function that will be evaluated every time the parameter is accessed. If the log isn’t enabled, the parameter will never be evaluated. Later in this chapter you’ll see how lazy collections like Stream use the call-by-name method invocation pattern.

Here the nested function ++ returns another function that takes Int as a parameter and returns Int. If you evaluate the ++ function in REPL, you’ll see that the return type of the function is Int => Int:

scala> def ++ = (x:Int) => x + 1
$plus$plus: (Int) => Int

How can you implement a function like map that works for any type of list? You have a couple of ways to implement the map function—one uses recursion and the other uses a for-comprehension. The following is the implementation based on recursion, where using pattern matching you’re extracting elements from the list and applying the supplied function:

def map[A, B](xs: List[A], f: A => B): List[B] = {
  xs match {
    case List() => Nil
    case head :: tail => f(head) :: map(tail, f)
  }
}

You’re returning Nil (the empty List) when the List is empty; when it isn’t, you’re using pattern matching to separate the List into head and tail. The head is the first element of the List, and the tail is the remaining List. You’re using cons (::) to prepend the result of the f function at each recursive step.

If you try the map function with List(1, 2, 3) and the addOne function as parameters, the execution steps look like this:

case 1 :: List(2, 3) => addOne(1) :: map(List(2, 3), addOne)
case 2 :: List(3) => 2 :: addOne(2) :: map(List(3), addOne)
case 3 :: List() => 2 :: 3 :: addOne(3) :: map(List(), addOne)

At this point the empty list will match the first case, and the final expression will look like this:

2 :: 3 :: 4 :: Nil

Now each result of the function f is prepended, starting from the empty list, to build a new list:

scala> map(List(1, 2, 3), addOne)
res0: List[Int] = List(2, 3, 4)
How does head :: tail work?

This pattern is called the Infix Operation pattern, where the expression head :: tail is shorthand for the constructor pattern ::(head, tail). The immutable List in Scala is defined as an abstract sealed class, and its only two subclasses are Nil and ::. Both are case classes. As you already know, one of the biggest benefits of Scala case classes is that they can participate in pattern matching. So ::(head, tail) matches the constructor of the :: case class that takes head as the first element and the list as the second element, which in this case is tail.

The associativity of the cons (::) operator is right instead of left. In the expression 2 :: 3 :: 4 :: Nil you’re invoking :: (cons) on Nil first, then on the result returned by that first cons operation, and so on. You could also write the expression this way: Nil.::(4).::(3).::(2). In Scala, the associativity of an operator is determined by the operator’s last character. If the operator ends with :, then it’s right-associative. All other operators are left-associative. The associativity of operators confuses many Scala newcomers. Always remember that associativity is determined by the last character of the operator.
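
You can verify the desugared form in a fresh REPL (where Nil is Scala’s empty List):

scala> Nil.::(4).::(3).::(2)
res0: List[Int] = List(2, 3, 4)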

Another simple way to implement the map function is to use a for-comprehension; here’s the implementation:

def map1[A, B](f: A => B, xs: List[A]): List[B] = for(x <- xs) yield f(x)

Another interesting method defined for List is flatMap. This method builds a new collection by applying a function to all elements of this list and concatenating the result. The following shows how the flatMap method is declared in the List class:

class List[+A] { ...
    def flatMap[B](f: A => GenTraversableOnce[B]): List[B]
}

GenTraversableOnce represents all collection types that could be iterated either sequentially or in parallel. All the traits that start with Gen are introduced to the collection library to implement operations that are applicable to both sequential and parallel collections. Here you’re focusing on sequential collections—later you’ll learn about parallel collections.

The flatMap method works similarly to the map method with the added ability to flatten a collection of collections into a single collection. In the following example you’re creating a list of characters from a list of strings:

scala> List("one","two", "three", "") flatMap { _.toList }
res5: List[Char] = List(o, n, e, t, w, o, t, h, r, e, e)

As mentioned earlier, String is treated as a Seq-like collection in Scala, and it exposes a method called toList to convert to a List. In the previous example you invoked toList on each element in the list. If you use map instead of flatMap, you get the following result:

scala> List("one","two", "three", "") map { _.toList }
res7: List[List[Char]] = List(List(o, n, e), List(t, w, o), List(t, h, r, e,
     e), List())

As you can see, flatMap takes the result of the map and flattens it to a single list. Here’s how you could implement the flatMap function for List:

def flatten[B](xss: List[List[B]]): List[B] = {
  xss match {
    case List() => Nil
    case head :: tail => head ::: flatten(tail)
  }
}

def flatMap[A, B](xs: List[A])(f: A => List[B]) : List[B] = {
  flatten(map(xs, f))
}

Here flatMap is implemented by combining map with a new function called flatten that takes a List of Lists and flattens it into a single List. The ::: method, also defined on List, concatenates the contents of two Lists. Let’s try the flatMap function:

scala> flatMap(List("one", "two", "three")) { _.toList }
res9: List[Char] = List(o, n, e, t, w, o, t, h, r, e, e)

The flatMap function in that example is declared a little differently: it has two parameter lists, one for the List and another for the f parameter:

def flatMap[A, B](xs: List[A])(f: A => List[B]) : List[B]

This is called currying.[2] Currying allows you to chain functions one after another, each applied to a single parameter list. You’ll explore currying in detail in chapter 5. An additional benefit of separating the parameters of a function into multiple lists is that it lets you pass the function argument as a concise closure, { _.toList }.

2 Daniel Spiewak (blog), “Function Currying in Scala,” March 2008, http://mng.bz/u0sJ.
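
Separating the parameter lists also helps type inference: because xs is supplied in the first list, the compiler already knows A when it checks the function in the second list, so the parameter type can often be omitted. A small sketch using the flatMap just defined:

scala> flatMap(List(1, 2, 3)) { x => List(x, -x) }
res10: List[Int] = List(1, -1, 2, -2, 3, -3)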

What’s the difference between a lambda and a closure?

A lambda is an anonymous function—a function without a name. You’ve already seen some examples of it. Closure is any function that closes over the environment in which it’s defined. Let’s use an example to explore this fact, applying a percentage to a list of amounts:

scala> List(100, 200, 300) map { _ * 10/100 }
res34: List[Int] = List(10, 20, 30)

In this case the function you’re passing to the map function is a lambda. Now assume that the percentage value could change over time, so save the current percentage value in a variable.

scala> var percentage = 10
percentage: Int = 10
scala> val applyPercentage = (amount: Int) =>
amount * percentage/100
applyPercentage: (Int) => Int = <function1>

In this case applyPercentage is a closure because it keeps track of the environment in which it’s created, like a percentage variable:

scala> percentage = 20
percentage: Int = 20
scala> List(100, 200, 300) map applyPercentage
res33: List[Int] = List(20, 40, 60)

Lambdas and closures are different concepts, but they’re closely related.

The downside of using a recursive solution is that it can throw a stack overflow error on large datasets. An alternative is to implement the function using tail recursion so that the Scala compiler can perform tail call optimization and transform the recursion into a loop. In tail recursion you perform your calculation first and then execute the recursive call, passing the result of the current step to the next step. Here’s the implementation of the flatten function using tail recursion:

import scala.annotation.tailrec

def flatten3[B](xss: List[List[B]]): List[B] = {
  @tailrec
  def _flatten3(oldList: List[List[B]], newList: List[B]): List[B] =
    oldList match {
      case List() => newList
      case head :: tail => _flatten3(tail, newList ::: head)
    }
  _flatten3(xss, Nil)
}
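
A quick REPL check (a small sketch, assuming the definition above is in scope):

scala> flatten3(List(List(1, 2), List(3), List(4, 5)))
res0: List[Int] = List(1, 2, 3, 4, 5)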

In this case the flatten function is implemented using a nested function that follows the tail recursion pattern. The result of newList ::: head is passed as a parameter to the recursive call so that the Scala compiler can optimize it. You’ll learn more about tail recursion in the next chapter. In the next section you’ll explore another new concept called fold that allows you to process a data structure in some order and build a return value.

4.5. Using foldLeft and foldRight

Two other interesting methods defined in List are foldLeft and foldRight. These two operations allow you to perform binary operations on all the elements of the List. Both of these methods in List are declared as follows:

class List[+A] {
     ...

    def foldLeft[B](z: B)(f: (B, A) => B): B

    def foldRight[B](z: B)(f: (A, B) => B): B
}

The main difference between these methods is the way the parameters are passed to the function f. In foldLeft, z represents the start value, and the accumulated value is passed as the first parameter to f. For foldRight the accumulated value is passed as the second parameter, with z as the initial value. The function f is a combining function that takes two parameters and produces a single result value.
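
To see how the two folds group their operations differently, try a noncommutative operator like subtraction:

scala> List(1, 2, 3).foldLeft(0) { _ - _ }
res0: Int = -6

scala> List(1, 2, 3).foldRight(0) { _ - _ }
res1: Int = 2

foldLeft computes ((0 - 1) - 2) - 3 = -6, going left to right, whereas foldRight computes 1 - (2 - (3 - 0)) = 2, going right to left.

To understand how fold works, look at the previous flatten implementation one more time. Notice that it’s similar to the recursive implementation of map: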

def map[A, B](xs: List[A], f: A => B): List[B] = {
  xs match {
    case List() => Nil
    case head :: tail => f(head) :: map(tail, f)
  }
}

def flatten[B](xss: List[List[B]]): List[B] = {
  xss match {
    case List() => Nil
    case head :: tail => head ::: flatten(tail)
  }
}

There’s a common pattern in these functions: you do one thing when List is empty and something else when List isn’t empty. The only thing that’s different between these functions is the binary operation. You can avoid this duplication by using foldRight:

def map2[A, B](xs: List[A])(f: A => B): List[B] = {
  val startValue = List.empty[B]
  xs.foldRight(startValue) { f(_) :: _ }
}

def flatten2[B](xss: List[List[B]]): List[B] = {
  val startValue = List.empty[B]
  xss.foldRight(startValue) { _ ::: _ }
}

The startValue is set to an empty List, and the combining function applies the binary operator to all elements. The reason for using foldRight here is the right associativity of the :: and ::: operators: both cases start with an empty List and invoke either :: or ::: on the parameters. I’m sure by now you’re familiar with the underscore (_) as a placeholder for parameters. This is a good example of how higher-order functions help libraries provide common operations that can be used in many different ways without duplicating code.

Avoid foldRight as much as possible because it uses nontail recursion and can potentially throw a stack overflow error. (In some cases the Scala compiler transforms recursive functions into loops; you’ll learn about this in the next chapter.) One alternative approach is to use foldLeft and reverse the result. For example, you can implement the same map using foldLeft and then reversing:

def map3[A, B](xs: List[A])(f: A => B): List[B] = {
  val startValue = List.empty[B]
  xs.foldLeft(startValue)((a, x) => f(x) :: a).reverse
}

The foldLeft method applies a binary operator to a start value and all elements of a List going from left to right. Here’s how to use foldLeft to calculate the sum and length of a List:

scala> List(1, 2, 3, 4).foldLeft(0) { _ + _ }
res25: Int = 10

scala> List(1, 2, 3, 4).foldLeft(0) { (a, b) => a + 1 }
res27: Int = 4

The first example calculates the sum of all the elements, and the second calculates the length of the List. The second example can’t use underscore because you aren’t using all the function parameters to compute the length of the List.

Both foldLeft and foldRight have symbolic alias versions, /: (foldLeft) and :\ (foldRight), that you can use instead. But Scala programmers tend to prefer foldLeft and foldRight over the symbolic versions.

Folding is a great way to process collections of data to build a return value. Where map and flatMap allow you to transform contents of a collection, folding allows you to do structural transformations. You can use folding in many interesting ways to solve your day-to-day programming problems. The following example uses foldLeft to determine whether an element exists in a List:

def exists[A](xs: List[A], e: A) =
  xs.foldLeft(false)((a, x) => a || (x == e))
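
Try it in the REPL:

scala> exists(List(1, 2, 3), 2)
res0: Boolean = true

scala> exists(List(1, 2, 3), 5)
res1: Boolean = false

Note that foldLeft always visits every element, so this version doesn’t short-circuit once a match is found.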

It must be obvious by now that higher-order functions provide a nice way to encode common programming idioms as a part of the library. Using foldLeft and foldRight, you can perform any type of binary operation without creating additional functions. Next time you feel the need to create a loop, step back and think about ways you could use higher abstractions like maps or folds.

Just because the Scala List collection is used for all the examples doesn’t mean it should be your go-to collection type. For a given problem, always try to choose the right collection type. Later in this chapter I introduce you to other Scala collection types.

All the goodness of higher-order functions is possible in Scala because of its function objects. The next section explores how function objects work in Scala.

4.6. Building your own function objects

A function object is an object that you can use as a function. That doesn’t help you much, so why don’t I show you an example? Here you create a function object that wraps the foldLeft method you saw in the previous example:

object foldl {
  def apply[A, B](xs: Traversable[A], defaultValue: B)(op: (B, A) => B) =
            (defaultValue /: xs)(op)
}

Now you can use the foldl function object like any function you’ve used before:

scala> foldl(List("1", "2", "3"), "0") { _ + _ }
res0: java.lang.String = 0123

scala> foldl(IndexedSeq("1", "2", "3"), "0") { _ + _ }
res24: java.lang.String = 0123

scala> foldl(Set("1", "2", "3"), "0") { _ + _ }
res25: java.lang.String = 0123

You’ve already seen a few examples of apply and how to use it. It’s no different when creating a function object: to treat an object as a function object, all you have to do is declare an apply method. In Scala, <object>(<arguments>) is syntactic sugar for <object>.apply(<arguments>). You can also have multiple argument lists. And because you’ve declared the collection parameter as Traversable, the parent trait for all collection types in Scala, you can pass any collection type as an argument.

The expression (defaultValue /: xs)(op) might look a little cryptic, but the idea is to demonstrate /:, the alternative syntax for foldLeft. Remember that when an operator ends with :, right associativity kicks in.

When declaring function objects, it’s a good idea to extend one of the Function traits defined in the Scala library. In Scala, the Function1 trait is declared as follows:

trait Function1[-T1, +R] extends AnyRef {
  def apply(v: T1): R
}

Here 1 stands for “function with one parameter.” Similarly, Scala defines a function trait for two or more parameters. In the following example, you’re building an increment function that takes an integer parameter and increments its value by 1:

object ++ extends Function1[Int, Int]{
  def apply(p: Int): Int = p + 1
}

The shorthand and equivalent implementation of this function would be:

val ++ = (x: Int) => x + 1

There’s one more way you could define your function object: using the alternative function-type notation, =>.

object ++ extends (Int => Int) {
  def apply(p: Int): Int = p + 1
}

In the last case you’re using the shorthand notation of Function1; that is, Int => Int. You use a similar syntactic notation when declaring higher-order functions. You can use function objects anywhere you’ve used lambdas or closures before. Here you’re invoking your map function with your new ++ function:

scala> map(List(10, 20, 30), ++)
res1: List[Int] = List(11, 21, 31)

This is the same as invoking it with an anonymous function:

scala> map(List(10, 20, 30), (x: Int) => x + 1)
res2: List[Int] = List(11, 21, 31)

It’s also the same as passing the Function trait:

scala> map(List(10, 20, 30), new Function1[Int, Int] {
              def apply(p: Int) = p + 1
          })
res3: List[Int] = List(11, 21, 31)

When passing an existing function (not a function object) as a parameter, Scala creates a new anonymous function object with an apply method, which invokes the original function. This is called eta-expansion.

One interesting subtrait of the Function1[-P, +R] trait is PartialFunction, which allows you to define a function that’s not defined for all P and allows you to compose with other partial functions to create a complete function that’s defined for all values of input parameter(s). You’ll explore PartialFunction in the next chapter, where you’ll have a good use case for it.

Function1 is also defined as Function

Because the Function1 trait is used so often, Scala defines a type alias called Function. You won’t find any reference to this alias in the scaladoc because it’s defined inside the Predef object as follows:

type Function[-A, +B] = Function1[A, B]

type is a keyword in Scala, and it’s used to create type aliases. It’s similar to typedef in C. I discuss the type keyword at length in chapter 6, where we’ll explore using it to create abstract type members in Scala.

Another helpful use of a type alias is as a shorthand for a complicated type:

type MILS = Map[Int, List[String]]
val mapping: MILS = Map(
    1 -> List("one", "uno"), 2 -> List("two", "dos"))

Function traits also let you compose two functions to create a new function. This is important because in functional programming you tend to solve problems by combining functions. The following example composes two functions to create a new one:

val addOne: Int => Int = x => x + 1
val addTwo: Int => Int = x => x + 2
val addThree = addOne compose addTwo

Composing the addOne and addTwo functions together creates the addThree function. This is similar to the following implementation:

val addThree: Int => Int = x => addOne(addTwo(x))
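
A quick REPL check: addThree applies addTwo first, then addOne:

scala> addThree(1)
res0: Int = 4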

The compose method allows you to chain functions together to create new functions. You’ll look into function composition in much more detail in the next chapter. Now it’s time for you to explore the Scala collection hierarchy.

4.7. Scala collection hierarchy

One of the major features of the new Scala 2.8 release is its new, improved collection library.[3], [4] You’re going to explore Scala collections in detail in this section.

3 Martin Odersky, SID#3 “New collection classes,” revised, July 2010,www.scala-lang.org/sid/3.

4 Martin Odersky, A. Moors, “Fighting Bit Rot with Types,” 2009, http://mng.bz/104f.

The Scala collection classes are part of the scala.collection package and its subpackages. The scala.collection.mutable package defines collection classes that provide side-effect operations that can change the state of the collection in place. On the other hand, the collections in scala.collection.immutable are guaranteed to be immutable for everyone. The immutable collections are sometimes called persistent data structures because you can be certain that accessing the same collection will yield the same result over time. Here persistent has nothing to do with a database; it means that an immutable collection, once created, stays unchanged for the duration of the program run. Any change to an immutable collection produces a new collection without touching the existing one.

The generic subpackage provides building blocks for implementing various collections. Typically, collection classes in mutable and immutable packages use the classes in the generic package for implementation of some of their operations. Normally you don’t have to deal with the classes in the generic package unless you’re planning to create custom collection classes on your own.

A collection class defined in the package scala.collection can be either mutable or immutable. For example, scala.collection.Map[A, +B] is a supertype of both collection.mutable.Map[A, B] and collection.immutable.Map[A, +B]. Generally, the root collections in the package scala.collection define the same interface as the immutable collections, and the mutable collections add methods on top that allow in-place mutation. In the case of the mutable Map, it provides methods like += and -= that add and remove elements, changing the collection in the process. Even though you can use a root collection type as a reference to an immutable collection, it’s always better to be explicit about the kind of collection (mutable or immutable) you’re dealing with so that readers can understand the code easily. Here you’re assigning both mutable and immutable collections to a value of type collection.Map, and it’s not clear from the type of mapping whether the map it refers to can be changed by others:

scala> val mapping: collection.Map[String, String] = Map("Ron" -> "admin",
     "Sam" -> "Analyst")
mapping: scala.collection.Map[String,String] =
Map(Ron -> admin, Sam -> Analyst)

scala> val mapping: collection.Map[String, String] =
     collection.mutable.Map("Ron" -> "admin", "Sam" -> "Analyst")
mapping: scala.collection.Map[String,String] = Map(Sam -> Analyst, Ron ->
     admin)

Scala automatically imports immutable collections, but you have to explicitly import mutable collection types, as you did for collection.mutable.Map. Figure 4.1 shows a simplified version of the Scala collection library.

Figure 4.1. Scala collection hierarchy with three main types of collections: Set, Seq, and Map

The root of the hierarchy is the trait Traversable. This base trait implements the behavior common to all collection types in Scala (see table 4.1). The only abstract method defined in the Traversable trait is foreach:

def foreach[U](f: Elem => U)

Table 4.1. Useful methods defined in the Traversable trait

  xs.size - The number of elements in the collection.
  xs ++ ys - A collection consisting of the elements of both xs and ys.
  xs map f - The collection obtained from applying the function f to every element in xs.
  xs flatMap f - The collection obtained from applying the collection-valued function f to every element in xs and concatenating the results.
  xs filter p - The collection consisting of those elements of xs that satisfy the predicate p.
  xs find p - An Option containing the first element in xs that satisfies p, or None if no element qualifies.
  (z /: xs)(op) - Apply binary operation op between successive elements of xs, going left to right and starting with z. /: is an alias for foldLeft.
  (xs :\ z)(op) - Apply binary operation op between successive elements of xs, going right to left and starting with z. :\ is an alias for foldRight.
  xs.head - The first element of the collection (or some element, if no order is defined).
  xs.tail - The rest of the collection except xs.head.
  xs mkString sep - Produces a string that shows all elements of xs between separators sep.

Let’s see an example where you can use this knowledge. You’ve already seen examples of how to use Java collections with Scala, but here’s another simple example where you could wrap any Java collection to Traversable so that you could use it in Scala freely:

import java.util.{Collection => JCollection, ArrayList }

class JavaToTraversable[E](javaCollection: JCollection[E]) extends
                                                           Traversable[E] {
  def foreach[U](f : E => U): Unit = {
    val iterator = javaCollection.iterator
    while(iterator.hasNext) {
      f(iterator.next)
    }
  }
}

You’re providing a concrete implementation of the only abstract method in the Traversable trait, foreach. With just this one concrete method, you can use all the methods defined in the Traversable trait, such as map, foldLeft, and filter:

scala> val jCol = new ArrayList[Int]
jCol: java.util.ArrayList[Int] = []

scala> (1 to 5) foreach { jCol.add(_) }

scala> jCol
res3: java.util.ArrayList[Int] = [1, 2, 3, 4, 5]

scala> val jtoT = new JavaToTraversable(jCol)
jtoT: JavaToTraversable[Int] = line3(1, 2, 3, 4, 5)

scala> jtoT map { _ * 10 } filter { _ > 20 }
res10: Traversable[Int] = List(30, 40, 50)

In Scala a traversable collection can be finite or infinite; hasDefiniteSize tells you whether a collection is guaranteed to be finite. You’ll see examples of an infinite collection later in this chapter.

4.8. Mutable and immutable collections

A collection in the scala.collection package can be either mutable or immutable. You read about the difference between mutable and immutable collection classes in the previous section; here you’ll work with both. But before looking into specific collection classes, let’s peek at the Iterable trait. This trait comes right after Traversable in the hierarchy. It implements the foreach method you saw in the last section in terms of a new abstract method it exposes, called iterator, and it adds methods like takeRight and dropRight to those defined in the Traversable trait. takeRight returns a collection consisting of the last n elements for a given n, and dropRight does the opposite:

scala> Iterable(1, 2, 3, 4, 5) dropRight 2
res0: Iterable[Int] = List(1, 2, 3)

scala> Iterable(1, 2, 3, 4, 5) takeRight 2
res1: Iterable[Int] = List(4, 5)

The most interesting things about the Iterable trait are its three subtraits: Seq, Set, and Map. One thing common to these subtraits is that they all implement the PartialFunction trait, which means all of them have an apply method. You’ll now explore these traits and their subclasses in detail.

4.9. Working with List and ListBuffer

The elements in a sequence are indexed, from 0 to length - 1, where length is the number of elements in the sequence collection. Because Seq also implements PartialFunction, it has an apply method (the Function trait defines an apply method), and it’s a partial function from Int to the element type. The reason it’s partial is that you may not have an element for every Int value. In the following example you’re trying to access an element that exists in the sequence and one that doesn’t:

scala> val languages = Seq("Scala", "Haskell", "OCaml", "ML")
languages: Seq[java.lang.String] = List(Scala, Haskell, OCaml, ML)

scala> languages(1)
res11: java.lang.String = Haskell

scala> languages(10)
java.lang.IndexOutOfBoundsException
    at scala.collection.LinearSeqLike$class.apply(LinearSeqLike.scala:78)
    at scala.collection.immutable.List.apply(List.scala:46)
Using a collection as PartialFunction

Along with the standard apply method, the PartialFunction trait in Scala defines a couple of interesting methods, including andThen and orElse. For example, to avoid situations where your Seq doesn’t have elements, you could use orElse, which works as a fallback when the partial function isn’t defined for a given value. Here’s how to use orElse:

val default: PartialFunction[Int, String] = {
  case _ => "Is it a functional language?"
}
val languagesWithDefault = languages orElse default

Now if you try to access an index that doesn’t exist, your default is used instead.

languagesWithDefault(10) will produce "Is it a functional language?"
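
The sidebar’s other method, andThen, runs a second function on the result of the first. A small sketch building on the same example:

val loud = languagesWithDefault andThen { _.toUpperCase }
loud(1)   // "HASKELL"
loud(10)  // "IS IT A FUNCTIONAL LANGUAGE?"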

This is a good example of function composition and how to use it in Scala. You’ll explore PartialFunction and other function composition parts and their usage throughout the book.

If the sequence is mutable, like ListBuffer, then along with the apply method it offers an update method, and an assignment to an index is turned into an update method call. In the following code snippet you’re creating a ListBuffer and updating it:
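
scala> import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ListBuffer

scala> val buf = ListBuffer(1, 2, 3)
buf: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2, 3)

scala> buf(2) = 30

scala> buf.update(2, 30)

scala> buf
res2: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2, 30)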

The assignment syntax buf(2) = 30 and the explicit call buf.update(2, 30) are identical. The two main subtraits of Seq are IndexedSeq and LinearSeq, and they provide different performance characteristics. IndexedSeq collections like Vector provide efficient apply and length operations, whereas LinearSeq collections have efficient head and tail operations. The most common subclasses of LinearSeq are List and Stream. You’ve already seen examples of List. You’ll explore Stream shortly.

What type of collection should I use?

Scala provides various types of collections, and every collection type has different performance characteristics, so it’s important to select the appropriate collection type for your problem. If you’re not sure which type of collection to use, reach for Vector. Overall, Vector has better all-around performance characteristics than the other collection types.

One interesting subcategory of Sequences in Scala is Buffers. Buffers are always mutable, and most of the collections I talk about here are internally built using Buffers. The two common subclasses of Buffers are mutable.ListBuffer and mutable.ArrayBuffer.

4.9.1. Working with Set and SortedSet

Set is an iterable collection type that contains no duplicate elements. Set provides the contains method to check whether a given element exists in the Set, and an apply method that does the same thing. Here’s how to use them:

scala> val frameworks = Set("Lift", "Akka", "Playframework", "Scalaz")
frameworks: scala.collection.mutable.Set[java.lang.String] =
Set(Lift, Playframework, Akka, Scalaz)

scala> frameworks contains "Lift"
res36: Boolean = true

scala> frameworks contains "Scalacheck"
res37: Boolean = false

scala> frameworks("Lift")
res38: Boolean = true

To add or remove elements to or from an immutable Set, use + and -, respectively. Using these methods on a mutable Set isn’t a good idea, because each call will create a new Set rather than update the Set itself. A better way to change mutable Sets is with the += and -= methods (for other useful methods available on Set, see table 4.2). Here are some examples of adding and removing elements from both immutable and mutable Sets:

Table 4.2. Useful methods defined in immutable and mutable Sets

  xs contains x - Test whether x is an element of xs.
  xs ++ ys - Set containing all elements of xs and additional elements from ys. It won't add the duplicate entries from ys.
  xs & ys, xs intersect ys - The set intersection of xs and ys.
  xs | ys, xs union ys - The set union of xs and ys.
  xs &~ ys, xs diff ys - The set difference of xs and ys.
  xs ++= ys - Add elements from ys to set xs as a side effect and return xs itself. Only for mutable Set.
  xs(x) = b, xs.update(x, b) - If Boolean argument b is true, add x to xs; otherwise remove x from xs.
  xs.clear() - Remove all elements from xs.

scala> val frameworks = Set() + "Akka" + "Lift" + "Scalaz"
frameworks: scala.collection.immutable.Set[java.lang.String] = Set(Akka,
     Lift, Scalaz)

scala> val frameworks = Set("Akka",  "Lift", "Scalaz") - "Lift"
frameworks: scala.collection.immutable.Set[java.lang.String] = Set(Akka,
     Scalaz)

scala> val mFrameworks = collection.mutable.Set[String]()
mFrameworks: scala.collection.mutable.Set[String] = Set()

scala> mFrameworks += "Akka" += "Lift"
res5: mFrameworks.type = Set(Lift, Akka)

scala> mFrameworks += "Scalacheck"
res12: mFrameworks.type = Set(Lift, Akka, Scalacheck)

Along with the add and remove methods, you can perform other Set operations, such as union, intersect, and diff. One interesting subtrait of Set is SortedSet. When iterator or foreach is called on a SortedSet, it produces its elements in order. In the following code snippet you’re concatenating two sets, one using Set and the other using SortedSet. In the case of SortedSet, the ordering of the elements is maintained:

scala> Set(1, 2, 3) ++ Set(3, 4, 5)
res15: scala.collection.immutable.Set[Int] = Set(5, 1, 2, 3, 4)

scala> import collection.immutable.SortedSet
import collection.immutable.SortedSet

scala> SortedSet(1, 2, 3) ++ SortedSet(3, 5, 4)
res18: scala.collection.immutable.SortedSet[Int] = TreeSet(1, 2, 3, 4, 5)

4.9.2. Working with Map and Tuple

Maps are iterable collections of key-value pairs. A key-value pair is represented by scala.Tuple2, a tuple of two elements. Unlike other collections, a tuple is a heterogeneous collection where you can store elements of various types. Here’s how you can create an instance of an immutable Map with two key-value pairs:

scala> val m = Map((1, "1st"), (2, "2nd"))
m: scala.collection.immutable.Map[Int,java.lang.String] =
     Map(1 -> 1st, 2 -> 2nd)

An alternative way to create a similar Map would be to use the key -> value syntax:

scala> val m = Map(1 -> "1st", 2 -> "2nd")
m: scala.collection.immutable.Map[Int,java.lang.String] =
                                           Map(1 -> 1st, 2 -> 2nd)

Most of the operations on Map are similar to those on Set. In the case of Map, the apply method returns the value for a given key, and if the key doesn’t exist, it throws an exception. Here’s an example of how to use the apply method:

scala> m(1)
res20: java.lang.String = 1st

scala> m(3)
java.util.NoSuchElementException: key not found: 3
    at scala.collection.MapLike$class.default(MapLike.scala:226)
    at scala.collection.immutable.Map$Map2.default(Map.scala:88)

A better way to retrieve the value associated with a key is to use the get method defined on Map. Instead of returning the value directly, it wraps the value in a container called Option:

def get(key: A): Option[B]

Option is similar to the Maybe construct you created at the beginning of the chapter, although the Scala Option provides many more features than your Maybe. You can think of Option as a List of at most one element: when a key exists in the Map, get returns Some of the value; otherwise, it returns None. In the following example you’re using get to retrieve the value for a key:

scala> m.get(1)
res22: Option[java.lang.String] = Some(1st)

scala> m.get(3)
res23: Option[java.lang.String] = None

You can use get to extract the element from the Option, or getOrElse to supply a default. To get all the keys and values from the Map, use m.keys and m.values (table 4.3 lists some of the useful methods defined in mutable and immutable Maps); both return iterable collections. Scala Map also defines methods like filter, which takes a predicate and returns a Map of all the key-value pairs for which the predicate is true. In the following code snippet, you’re filtering out all the rock artists:

Table 4.3. Useful methods defined in immutable and mutable Map

Method                         Description
ms getOrElse (k, d)            The value associated with key k in map ms, or the default value d if not found.
ms + (k -> v)                  The map containing all mappings of ms as well as the mapping k -> v from key k to value v.
ms ++ kvs                      The map containing all mappings of ms as well as all key-value pairs of kvs.
ms filterKeys p                A map view containing only those mappings in ms where the key satisfies predicate p.
ms mapValues f                 A map view resulting from applying function f to each value associated with a key in ms.
ms(k) = v, ms.update(k, v)     Adds mapping from key k to value v to map ms as a side effect, overwriting any previous mapping of k.
ms getOrElseUpdate(k, d)       If key k is defined in map ms, returns its associated value. Otherwise, updates ms with the mapping k -> d and returns d.
ms.clear()                     Removes all mappings from ms.
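Here's a quick sketch of a few of these methods in action, reusing the immutable map m from above; the last few lines assume a mutable map created via scala.collection.mutable.Map:

import scala.collection.mutable

m.getOrElse(3, "none")         // "none": key 3 is absent, so the default is returned
m + (3 -> "3rd")               // a new map containing all of m plus 3 -> "3rd"

val mm = mutable.Map(1 -> "1st")
mm(2) = "2nd"                  // same as mm.update(2, "2nd")
mm.getOrElseUpdate(3, "3rd")   // adds 3 -> "3rd" as a side effect and returns "3rd"
mm.clear()                     // removes all mappings

Scala Map also defines methods like filter, which takes a predicate and returns a Map of all the key-value pairs for which the predicate holds. In the following code snippet, you're selecting all the rock artists: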
scala> val artists  = Map(
"Pink Floyd" -> "Rock", "Led Zeppelin" -> "Rock",
"Michael Jackson" -> "Pop", "Above & Beyond" -> "Trance")
artists: scala.collection.immutable.Map[java.lang.String,java.lang.String] =
     Map(Pink Floyd -> Rock, Led Zeppelin -> Rock, Michael Jackson -> Pop,
     Above & Beyond -> Trance)

scala> artists filter { (t) => t._2 == "Rock" }
res26: scala.collection.immutable.Map[java.lang.String,java.lang.String] =
     Map(Pink Floyd -> Rock, Led Zeppelin -> Rock)

The filter method in Map calls the predicate by passing the key-value pair as an instance of scala.Tuple2. Scala's Tuple2 defines the methods _1 and _2 to retrieve the first and second elements from the tuple. Another way you could select all the rock artists would be to use a for-comprehension, like so:

scala> for(t <- artists; if(t._2 == "Rock")) yield t
res31: scala.collection.immutable.Map[java.lang.String,java.lang.String] =
     Map(Pink Floyd -> Rock, Led Zeppelin -> Rock)

This brings up an interesting point of how a for-comprehension is similar to the map, filter, and foreach methods you’ve learned about so far. In the next section you’ll see how a for-comprehension is translated in Scala.

4.9.3. Under the hood of for-comprehension

You’ve learned that for-comprehensions are made up of generators, value definitions, and guards. But I haven’t talked about the under-the-hood translation that happens and how a for-comprehension combines pattern matching with filter, map, flatMap, and foreach. This knowledge will help you understand how to combine simple functions to create something powerful. As always, a better way is to look at what’s going on underneath. This time you’ll create a case class to represent the artists and use a for-comprehension to create a list of rock artists. Here’s the code snippet:

case class Artist(name: String, genre: String)

val artists = List(Artist("Pink Floyd", "Rock"),
                   Artist("Led Zeppelin", "Rock"),
                   Artist("Michael Jackson", "Pop"),
                   Artist("Above & Beyond", "trance")
                  )
for(Artist(name, genre) <- artists; if(genre == "Rock"))
yield name

Under the hood, this for-comprehension will get translated to something like the following:

artists withFilter {
  case Artist(name, genre) => genre == "Rock"
} map {
  case Artist(name, genre) => name
}

How does this translation work? Every generator p <- e gets translated to a call to withFilter on e, like e.withFilter { case p => true; case _ => false }. In the generator, the left-hand side, Artist(name, genre), is a pattern, and because case classes can be used in pattern matching, you used one for the Artist example. The withFilter method returns an instance of the class List.WithFilter, which supports all the filtering operations. The yield part of the for-comprehension is turned into a call to the map method on the output of the filter. For example, for(Artist(name, genre) <- artists) yield name gets translated into something like the following:

artists withFilter {
  case Artist(name, genre) => true; case _ => false
} map {
  case Artist(name, genre) => name
}

For-comprehensions without yield (the imperative version) are translated into a foreach method call on the output of the filter. Here's an example, followed by its translation:

for(Artist(name, genre) <- artists) println(name + "," + genre)

artists withFilter {
  case Artist(name, genre) => true; case _ => false
} foreach {
  case Artist(name, genre) => println(name + "," + genre)}

When you have multiple generators in the for-comprehension, things become a little more involved and interesting. Let's say that along with the artists you'd also like to store the albums produced by those artists, and you're only interested in rock albums. Create another case class to store the artists with their albums; using a for-comprehension, you can then easily select all the rock albums. Here's the code (it follows the sidebar below):

Why use withFilter but not filter?

The answer lies in strict versus nonstrict processing of the filter. The filter method is strict: it processes all the elements of the list up front and selects only those that satisfy the predicate. Nonstrict processing, by contrast, does the computation on an as-needed basis. Starting with Scala 2.8, for-comprehensions are nonstrict. In this example you have a list of numbers, and you want to control the processing based on a flag:

val list = List(1, 2, 3)
var go = true
val x = for(i <- list; if(go)) yield {
    go = false
    i
}
println(x)

You’d expect that x would be List(1), but if you run the same code in Scala 2.7.*, you’ll see that x is List(1, 2, 3). The reason is that prior to Scala 2.8, for-comprehensions were translated into something like the following:

val y = list filter {
  case i => go
} map {
  case i => {
    go = false
    i
  }
}
println(y)

As you can see, when the filter processes the elements, go is true; hence it returns all the elements. The fact that you’re making the go flag false has no effect on the filter because the filter is already done. In Scala 2.8 this problem is fixed using withFilter. When withFilter is used, the condition is evaluated every time an element is accessed inside a map method.
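To see the fix in action, you can write the 2.8-style translation by hand. Because withFilter evaluates the predicate lazily as map consumes each element, flipping the flag inside map stops the remaining elements from being processed (a quick sketch continuing the example above):

go = true    // reset the flag from the earlier run
val z = list withFilter {
  case i => go
} map {
  case i =>
    go = false
    i
}
println(z)   // List(1): after the first element sets go to false,
             // the remaining elements no longer satisfy the predicate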

case class Artist(name: String, genre: String)
case class ArtistWithAlbums(artist: Artist, albums: List[String])

val artistsWithAlbums = List(
                   ArtistWithAlbums(Artist("Pink Floyd", "Rock"),
                        List("Dark side of the moon", "Wall")),
                   ArtistWithAlbums(Artist("Led Zeppelin", "Rock"),
                        List("Led Zeppelin IV", "Presence")),
                   ArtistWithAlbums(Artist("Michael Jackson", "Pop"),
                        List("Bad", "Thriller")),
                   ArtistWithAlbums(Artist("Above & Beyond", "trance"),
                        List("Tri-State", "Sirens of the Sea"))
                  )
for { ArtistWithAlbums(artist, albums) <- artistsWithAlbums
      album <- albums
      if(artist.genre == "Rock")
    } yield album

For each artist you’re iterating through all the albums and checking to see if the genre is Rock. When you have multiple generators, Scala uses flatMap instead of map. The preceding code gets translated to the following:

artistsWithAlbums flatMap {
  case ArtistWithAlbums(artist, albums) => albums withFilter {
                                     album => artist.genre == "Rock"
                             } map { case album => album }
}

The reason flatMap is used here is that the output of map has to be flattened for each generator. With flatMap you get List(Dark side of the moon, Wall, Led Zeppelin IV, Presence) as the result; with map you'd get List(List(Dark side of the moon, Wall), List(Led Zeppelin IV, Presence), List(), List()) as output.
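You can verify the difference by writing the map-based version by hand and flattening its output afterward (a quick sketch over the same data):

val perArtist = artistsWithAlbums map {
  case ArtistWithAlbums(artist, albums) =>
    albums filter { _ => artist.genre == "Rock" }
}
// perArtist: List(List(Dark side of the moon, Wall),
//                 List(Led Zeppelin IV, Presence), List(), List())

perArtist.flatten
// List(Dark side of the moon, Wall, Led Zeppelin IV, Presence)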

Before leaving Scala collections, I’d like to discuss one more thing, and that’s the usage of Scala Option. You’ve already seen some of the methods defined in Scala collections that return Option, and it’s time to see what they are.

4.9.4. Use Option, not null

If you’ve worked with Java or a Ruby-like language, then you understand the pain of working with null or nil (in the case of Ruby) in code. In Ruby, things are a little better in the sense that Nil is a singleton object, and you can invoke methods on Nil. But in Java if a variable reference is null, you get a NullPointerException. To avoid the issue many programmers clutter their codebase with null checks and make the code difficult to read.

Scala takes a different approach to solving this problem, using something called Option.[5] Option implements the Null Object pattern. Option is also a Monad. I talk about Monads at length in the next chapter, but for now think of a Monad as a simple container. Option is an abstract class in Scala that defines two subclasses, Some and None. You implemented something similar in listing 4.2. Every now and then you encounter a situation where a method needs to return either a value or nothing. In Java or Ruby you typically do this by returning null or nil. In Scala the recommended way is to return an Option: the function returns an instance of Some when it has a value and None otherwise. The get method in Scala's Map does exactly that. When the given key exists, it returns the value wrapped in Some, and it returns None when the key doesn't exist. Here's an example:

5 Daniel Spiewak (blog), “The ‘Option’ Pattern,” 2008, http://mng.bz/AsUQ.

scala> val artists = Map("Pink Floyd" -> "Rock", "Led Zeppelin" -> "Rock",
     "Michael Jackson" -> "Pop", "Above & Beyond" -> "Trance")

artists: scala.collection.immutable.Map[java.lang.String,java.lang.String] =
     Map(Pink Floyd -> Rock, Led Zeppelin -> Rock, Michael Jackson -> Pop,
     Above & Beyond -> Trance)

scala> artists.get("Pink Floyd")
res33: Option[java.lang.String] = Some(Rock)

scala> artists.get("Abba")
res34: Option[java.lang.String] = None

You can use Option with pattern matching in Scala, and it also defines methods like map, filter, and flatMap so that you can easily use it in a for-comprehension.
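Here's a quick sketch of both styles, reusing the artists map from the previous example:

artists.get("Pink Floyd") match {
  case Some(genre) => println("genre: " + genre)   // prints genre: Rock
  case None        => println("unknown artist")
}

// Because Option defines map, filter, and flatMap, it slots straight
// into a for-comprehension; a missing key short-circuits to None
val genres = for {
  g1 <- artists.get("Pink Floyd")
  g2 <- artists.get("Abba")        // None, so the whole result is None
} yield (g1, g2)                   // genres: Option[(String, String)] = None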

When should you use Either rather than Option?

scala.Either represents one of two possible meaningful results, unlike Option, which represents either a single meaningful result or no result at all. Either provides two subclasses: Left and Right. By convention, Left represents failure and Right is akin to Some. In the following code you're trying to make a socket connection, which, as you know, might fail or succeed depending on whether a server is available to accept it. You can easily wrap these kinds of operations in a function called throwableToLeft:

def throwableToLeft[T](block: => T): Either[Throwable, T] =
  try {
    Right(block)
  } catch {
    case ex: Throwable => Left(ex)
  }

When creating a new Socket connection, you can wrap using throwableToLeft as in the following:

scala> val r = throwableToLeft {
  new java.net.Socket("localhost", 4444)
}
scala> r match {
       case Left(e) => e.printStackTrace
       case Right(e) => println(e)
       }

When an exception occurs, you create an instance of Left; otherwise, you create a Right. Most programmers are used to throwing exceptions, so using Either can take some getting used to. In particular, throwing an exception isn't a good idea in concurrent programming, and using the Either pattern is a good way to send and receive failures between processes and threads.

4.10. Working with lazy collections: views and streams

To understand lazy collections, step back and examine their opposite: strict collections. So far you've looked at collections that are strict, meaning they evaluate their elements eagerly. The following example adds 1 to every element of the List but returns only the head of the result:

scala> List(1, 2, 3, 4, 5).map( _ + 1).head
res43: Int = 2

The problem with this code is that even though you’re looking for only the head element of the output (another instance of List) produced by map, you’re still processing all the elements from 1 to 5 in the List. To make it clearer, break the previous line like this:

scala> val newList = List(1, 2, 3, 4, 5).map( _ + 1)
newList: List[Int] = List(2, 3, 4, 5, 6)

scala> newList.head
res44: Int = 2

Sometimes this isn’t a big issue, but other times you may want to save space and time by not creating intermediate collections and operations unless required. Scala offers a couple of interesting and useful ways to create more on-demand collections using View and Stream. Let’s start with views.

4.10.1. Convert a strict collection to a nonstrict collection with views

The more technical term for on-demand collections is nonstrict collections. You’ll see an example shortly. Sometimes nonstrict collections are called lazy collections, but lazy usually refers to nonstrict functions that cache results.

Tip

Prior to Scala 2.8, views were called projections. This is one of the issues you may face when migrating pre-2.8 code that uses projections.

Almost all collections expose a method called view, which returns a nonstrict view of the collection you’re working on. To process the previous List example on demand, you could do the following:

scala> List(1, 2, 3, 4, 5).view.map( _ + 1).head
res46: Int = 2

In this case, a call to map produces another view without doing the calculation; the calculation is deferred until you invoke head on it. Another interesting aspect of laziness is that it can sometimes help you avoid errors.[6] The following example processes the elements of a List, dividing 2 by each element. One of the elements will cause a divide-by-zero error:

6 Daniel Sobral (blog), “Strict Ranges?,” October 2009, http://mng.bz/nM4s.

scala> def strictProcessing = List(-2, -1, 0, 1, 2) map { 2 / _ }
strictProcessing: List[Int]

scala> strictProcessing(0)
java.lang.ArithmeticException: / by zero
    at $anonfun$strictProcessing$1.apply(<console>:6)

Even though you’re interested in only the first element of the list, the entire collection is processed, causing the exception for the third element. Using View would avoid the exception when accessing the first element:

scala> def nonStrictProcessing = List(-2, -1, 0, 1, 2).view  map { 2 / _ }
nonStrictProcessing: scala.collection.SeqView[Int,Seq[_]]
scala> nonStrictProcessing(0)
res50: Int = -1

You can skip the error element and process the other elements, but the moment you access the error element you’ll get an exception:

scala> nonStrictProcessing(3)
res52: Int = 2

scala> nonStrictProcessing(2)
java.lang.ArithmeticException: / by zero
    at $anonfun$nonStrictProcessing$1.apply(<console>:6)

To force strict processing on a view, invoke the force method on it, which, like the strict version, will throw ArithmeticException:

scala> nonStrictProcessing.force
java.lang.ArithmeticException: / by zero
...

The nonstrict method of processing collection elements is a useful and handy way to improve performance, especially when the operation is time-consuming. In lazy functional languages like Haskell and Clean, almost every construct is evaluated lazily. But because Scala isn't a lazy functional programming language, you have to take extra steps to simulate the equivalent kind of laziness, using call-by-name functions or partially applied functions. An example will demonstrate this idea. The following code snippet has a time-consuming operation called tweets, which takes a handle and searches for Twitter messages containing that handle name:

import scala.io._
import scala.xml.XML

def tweets(handle: String) = {
  println("processing tweets for " + handle)

  val source = Source.fromURL(new
    java.net.URL("http://search.twitter.com/search.atom?q=" + handle))
  val iterator = source.getLines()
  val builder = new StringBuilder
  for(line <- iterator) builder.append(line)
  XML.loadString(builder.toString)
}

Using Source you get the Twitter search result in XML and create an XML node instance from it. Even though it doesn’t take that much time, for argument’s sake let’s consider this operation to be expensive and time-consuming. Now you need to process these Twitter search results for multiple users. The most obvious solution would be to create a Map that stores the tweets with the handle name, as in the following:

scala> val allTweets = Map("nraychaudhuri" -> tweets("nraychaudhuri"),
                      "ManningBooks" -> tweets("ManningBooks"),
                      "bubbl_scala" -> tweets("bubbl_scala")
                       )
processing tweets for nraychaudhuri
processing tweets for ManningBooks
processing tweets for bubbl_scala

The problem with this approach is that while creating the Map, you're invoking the tweets function for every user. But because the tweets function is time-consuming, you'd like to invoke it only when you need it for a given user. An alternative way to solve the problem is to use partially applied functions:

scala> val allTweets = Map(
"nraychaudhuri" -> tweets _ , "ManningBooks" -> tweets _,
"bubbl_scala" -> tweets _)

allTweets: scala.collection.immutable.Map[java.lang.String,(String) =>
     scala.xml.Elem] = Map(nraychaudhuri -> <function1>, ManningBooks ->
     <function1>, bubbl_scala -> <function1>)

In this case you’ve created the map using a partial function. A function becomes a partial function when you don’t specify all the arguments it needs. For example, if you invoke tweets with an argument you get messages, but if you omit the argument you get a function back:

scala> tweets("ManningBooks")
processing tweets for ManningBooks

scala> tweets _
res73: (String) => scala.xml.Elem = <function1>

To omit an argument you have to specify _; otherwise, you’ll get an error from Scala:

scala> tweets
<console>:19: error: missing arguments for method tweets in object $iw;
follow this method with '_' if you want to treat it as a partially applied
     function
       tweets

In this example, if you use view, you achieve the laziness you're looking for, and your tweets function gets called only when you need the value:

scala> allTweets.view.map{ t => t._2(t._1)}.head
processing tweets for nraychaudhuri

Inside a Map, values are stored as Tuple2, a tuple of two elements: _1 is the handle and _2 is the value, in this case a partially applied function. You're invoking the tweets function by passing the handle name. If you want to process the messages for Manning Books, you can use a for-comprehension, as in the following:

for(t <- allTweets; if(t._1 == "ManningBooks")) t._2(t._1)

Note that starting with Scala 2.8, for-comprehensions are now nonstrict for standard operations.
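The same deferred-evaluation trick works outside collections, too. A call-by-name parameter (=> A) defers evaluation, and pairing it with a lazy val caches the first result. Here's a minimal sketch (the memo helper is mine, not part of the standard library):

def memo[A](a: => A): () => A = {
  lazy val cached = a   // the by-name argument is evaluated at most once
  () => cached
}

val manningTweets = memo(tweets("ManningBooks"))
manningTweets()   // prints "processing tweets for ManningBooks" and fetches
manningTweets()   // returns the cached result without calling tweets again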

4.10.2. Working with Streams

The class Stream implements lazy lists in Scala, where elements are evaluated only when they're needed. A Stream is like a List in that elements are stored in two parts, a head and a tail, but the tail of a Stream isn't computed until it's needed. If you want, you can build an infinite list in Scala using Stream, and it will consume memory only as you use it. Because Stream extends LinearSeq, almost all the List methods are available to you. The following example zips each element of a List with its index:

scala> List("zero", "one", "two", "three", "four",
       "five").zip(Stream.from(0))

res88: List[(java.lang.String, Int)] = List((zero,0), (one,1), (two,2),
     (three,3), (four,4), (five,5))

Here the from method defined in the Stream object creates an infinite stream starting at 0 and incrementing by 1. Scala Streams also enjoy the same benefits as views when it comes to memory consumption and performance. Let’s look at an example to demonstrate that, using the Fibonacci sequence.[7] In mathematics, the Fibonacci numbers are the numbers in the following sequence:

7 The Fibonacci sequence example is taken from www.haskell.org/soe/.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34 ...

By definition, the first two numbers are 0 and 1, and each remaining number is the sum of the previous two. The most common way to implement the Fibonacci sequence is to use recursion; here’s the code:

def fib(n: Int): Int = n match {
  case 0 => 0
  case 1 => 1
  case n => fib(n - 1) + fib(n - 2)
}

This function will return the Fibonacci number for a given n value. To see the efficiency problem of this implementation, you have to try the function for a value of n greater than 20. To clearly understand why it’s not an efficient solution, look at what happens for fib 8:

fib(8)
fib(7) + fib(6)
(fib(6) + fib(5)) + (fib(5) + fib(4))
((fib(5) + fib(4)) + (fib(4) + fib(3))) + ((fib(4) + fib(3)) + (fib(3) +
     fib(2)))
...

As you can see, the calculation is growing exponentially, and unfortunately many of the steps are computed repeatedly. One way to implement the Fibonacci sequence is to build it using an infinite stream. If you take a Fibonacci sequence starting from 0 (0, 1, 1, 2, 3...) and zip with its tail, you get a sequence of pairs (tuples): ((0, 1), (1, 2), (2, 3)...). If you use map to sum the two parts of the pair, you get a number in the Fibonacci series. To implement this, use recursive streams. Here’s the code snippet:

val fib: Stream[Int]  = Stream.cons(0, Stream.cons(1,
                               fib.zip(fib.tail).map(t => t._1 + t._2)))

You’re using the apply method of the cons object with a head of 0 and another Stream as a tail. Now if you compare the timing of both implementations, you’ll see that the stream-based solution performs better than the previous one.

The question that remains is how Stream can manage to evaluate the tail when it’s required and not eagerly evaluated. The answer is a call-by-name parameter. A close look at the signature of the apply method in the cons object will show that the second parameter is declared as a function that takes no arguments and returns a stream:

def apply[A](hd: A, tl: => Stream[A]) = new Cons(hd, tl)

Here tl is a call-by-name parameter, which is encoded as a function that takes no arguments and returns a stream. Note that when pattern matching a Stream, the usual cons pattern (::) doesn't work; you have to use #::.
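Here's a short sketch showing #:: both as a constructor, rewriting the Fibonacci stream above, and as a pattern:

val fibs: Stream[Int] =
  0 #:: 1 #:: fibs.zip(fibs.tail).map { case (a, b) => a + b }

fibs.take(8).toList    // List(0, 1, 1, 2, 3, 5, 8, 13)

fibs match {
  case h #:: rest => println("starts with " + h)   // starts with 0
  case _          => println("empty stream")
}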

4.11. Divide and conquer with parallel collections

So far you’ve looked at collections that use either an eager or lazy evaluation strategy, and the elements of the collection are processed sequentially. Now you’ll learn about Scala parallel collections, in which elements of a collection are evaluated in parallel.

The Scala parallel collections are implemented in terms of split and combine operations. The split operation divides a parallel collection into small Iterable collections, until it hits the threshold defined for the given collection. Then a set of tasks is created to perform the operation in parallel on the small Iterable collections. These tasks are implemented in terms of the Fork/Join framework.[8] The Fork/Join framework figures out the number of CPU cores available and uses threads to execute the tasks. At the end, the output of each task is combined to produce the final result. Figure 4.2 shows how a map operation is performed on ParVector, the parallel version of the Vector collection.

8 Java Tutorials, “Fork/Join,” http://mng.bz/0dnU.

Figure 4.2. Parallel collections implemented in terms of split and combine operations using the Fork/Join framework

In figure 4.2 the ParVector(10, 20, 30, 40, 50, 60, 70, 80, 90) is split into smaller Iterable collections, and each small Iterable collection implements the scala.collection.parallel.Splitter trait. The threshold operation defined for each parallel collection provides an estimate of the minimum number of elements the collection should have before splitting stops. Once the split operation is over, each collection is handed to a task that performs the operation. For example, TaskA takes 10, 20 as input and performs the map operation on them to produce 5, 10. At the end, the output of each task is combined into the final result; each parallel collection type provides a combiner that knows how to combine smaller collections to produce the output. To figure out how many workers are used to perform the operation, try the following snippet inside the REPL:

scala> import scala.collection.parallel.immutable._
import scala.collection.parallel.immutable._

scala> ParVector(10, 20, 30, 40, 50, 60, 70, 80, 90).map {x =>
     println(Thread.currentThread.getName); x / 2 }

ForkJoinPool-1-worker-2
ForkJoinPool-1-worker-2
ForkJoinPool-1-worker-3
ForkJoinPool-1-worker-3
ForkJoinPool-1-worker-3
ForkJoinPool-1-worker-3
ForkJoinPool-1-worker-1
ForkJoinPool-1-worker-0
ForkJoinPool-1-worker-0
res2: scala.collection.parallel.immutable.ParVector[Int] = ParVector(5, 10,
     15, 20, 25, 30, 35, 40, 45)

I’ve changed the map method a little bit to print out the name of the thread executing the operation. Because I’m running on a quad-core Macbook Pro, the Fork/Join framework used four different worker threads to execute the map operation in parallel. The details of Fork/Join are completely taken care of for you so you can focus on solving problems.

Configuring parallel collections

The engine responsible for scheduling the parallel collection operations is called TaskSupport. Each parallel collection is configured with a task support object that is responsible for keeping track of the thread pool, load-balancing, and scheduling of tasks.

Scala provides a few implementations of task support out of the box:

  • ForkJoinTaskSupport—This uses the fork/join thread pool used by JVM 1.6.
  • ThreadPoolTaskSupport—Less efficient than ForkJoinTaskSupport; it uses the normal thread pool to execute tasks.
  • ExecutionContextTaskSupport—This is the default task support for all the parallel collection types in Scala. It's implemented in the scala.concurrent package and uses the fork/join thread pool behind the scenes.

To change the task support associated with a given collection, set its tasksupport property, as in the following:

import scala.collection.parallel._

val pv = immutable.ParVector(1, 2, 3)
pv.tasksupport = new ForkJoinTaskSupport(
  new scala.concurrent.forkjoin.ForkJoinPool(4))

In this case tasksupport is changed to a ForkJoinTaskSupport backed by a fork/join pool with four worker threads.

Parallel collections are well-suited for data-parallel problems and can improve the performance of an operation considerably, without your having to worry about concurrency. The map operation is a perfect example of an operation that can be parallelized, because it doesn't depend on the order in which the elements of a collection are processed; Scala parallel collections don't guarantee any order of execution. On the other hand, foldLeft isn't suited for parallel collections, because its elements need to be processed in a certain order. The following example demonstrates that foldLeft is executed by a single thread even when performed on a ParVector:

scala> ParVector(10, 20, 30, 40, 50, 60, 70, 80, 90).foldLeft(0) {(a,x) =>
     println(Thread.currentThread.getName); a + x }
Thread-14
Thread-14
Thread-14
Thread-14
Thread-14
Thread-14
Thread-14
Thread-14
Thread-14
res3: Int = 450
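If you do need a parallelizable reduction, pick an operation that doesn't promise an order. The fold method with an associative operator and a neutral element can be split across tasks and combined, so it runs in parallel (a quick sketch):

// fold's operator must be associative and z neutral, because partial
// results from different tasks may be combined in any order
ParVector(10, 20, 30, 40, 50, 60, 70, 80, 90).fold(0)(_ + _)    // 450
ParVector(10, 20, 30, 40, 50, 60, 70, 80, 90).reduce(_ + _)     // 450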

Note that in the absence of side effects, parallel collections have the same semantics as the sequential collections. The next section explores the types of available parallel collections.

4.11.1. Parallel collection hierarchy

Parallel collections were added to the collections library in Scala 2.9. All the parallel collection classes are implemented in a separate class hierarchy (see figure 4.3). At the top of the parallel collection library is ParIterable. This trait implements the common behavior of all the parallel collections.

Figure 4.3. Scala parallel collection hierarchy

Why there are Gen* classes in the scala.collection package

The Gen* classes implement operations that can be implemented in both the sequential and the parallel collections libraries. These classes form a hierarchy similar to the class hierarchy in figure 4.1. If you want your code not to care whether it receives a parallel or sequential collection, prefix the type with Gen: GenTraversable, GenIterable, GenSeq, and so on. These can be either parallel or sequential.

The Scala parallel collections library implements parallel versions of almost all the collection types available in the scala.collection package, both mutable and immutable. I say almost because you won't find a parallel implementation of LinearSeq-type collections like List, because they aren't well-suited for parallel execution.
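You can see this for yourself: invoking par on a List copies its elements into a ParVector, because there's no ParList to convert to (a quick sketch):

val p = List(1, 2, 3).par   // builds ParVector(1, 2, 3) by copying the list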

To use any of these collection types you have to import the scala.collection.parallel.immutable or scala.collection.parallel.mutable package:

scala> import scala.collection.parallel.mutable._
import scala.collection.parallel.mutable._

scala> ParArray(1, 2, 3, 4).sum
res4: Int = 10

You don’t always have to start with a parallel collection implementation; you can easily switch between sequential and parallel as you need them.

4.11.2. Switching between sequential and parallel collections

Scala collections provide the par method on all sequential collection types to create a parallel version of the collection, and on the other side, all the parallel collection types implement the seq method to create a sequential version. The following example filters out all the odd numbers by converting the sequential Vector to a ParVector using the par method:

val vs = Vector.range(1, 100000)
vs.par.filter(_ % 2 == 0)

In this case the output will be an instance of ParVector of even numbers. To get back the sequential version of Vector, you have to invoke the seq method. The following example converts Vector to its parallel counterpart to perform a filter operation and then converts back to Vector again:

Vector.range(1, 100000).par.filter(_ % 2 == 0).seq

This kind of conversion has the additional benefit that you can optimize part of a codebase to use parallel collections without changing the type of the collection used across the code.

But parallel collections aren’t a silver bullet. You shouldn’t expect that everything will perform better by switching to parallel collections. In fact, in some cases it might perform worse than the sequential version. Consider two points before switching to parallel collections:

  • Type of operation
  • Overhead of conversion

First, not all operations are parallelizable, so switching to a parallel collection won't improve the performance of every operation. An ideal candidate is one that doesn't assume any order of execution and doesn't have side effects. Operations like map, flatMap, filter, and forall are good examples of methods that are easily parallelized.

Second, there’s an overhead of creating a parallel version of a sequential collection and using the Fork/Join framework. If it takes less time to perform the operation than to create a parallel collection, then using the parallel version will reduce your performance. It also depends on the type of collection you’re using. Converting Seq to ParSeq is much faster than converting List to Vector because there’s no parallel List implementation, so when you invoke par on List you get Vector back.

4.12. Summary

You’ve learned a few important concepts in this chapter. The knowledge of type parameterization helped in exploring type-variance concepts and the type-safety features of Scala. Understanding this concept is important for building generic, type-safe, reusable components in Scala. The chapter also explored the use and importance of higher-order functions such as map and filter and how they help users of the Scala library—in particular the collection library that provides rich and useful APIs. Using higher-order functions, you can easily encapsulate common programming idioms.

This chapter also introduced the Scala collections library and the new changes made to the API starting with Scala 2.8. The Scala collections library is a vast set of APIs, and you saw only the most important and commonly used ones. You need to explore the scaladoc for other Collection classes and APIs, because almost all common, useful functions are already encoded in the library.

When working with collections it’s also important to understand the performance and memory requirements of individual collection types. Knowing the difference between strict and nonstrict processing will help you decide which type of collection would be useful and when.

The next chapter explores functional programming. You’ll learn what functional programming is and how to do functional programming in Scala. Understanding functional programming will help you build functional, immutable, and simple solutions.
