© Raul Estrada and Isaac Ruiz 2016

Raul Estrada and Isaac Ruiz, Big Data SMACK, 10.1007/978-1-4842-2175-4_3

3. The Language: Scala

Raul Estrada and Isaac Ruiz1

(1)Mexico City, Mexico

The main part of the SMACK stack is Spark, but sometimes the S is for Scala. You can develop Spark applications in four languages: Java, Scala, Python, and R. Because Apache Spark is written in Scala, and this book is focused on streaming architecture, we are going to show examples only in Scala.

Other Apache Spark books present their examples in all four languages, but for the SMACK stack, discussing Scala alone is enough to develop a robust streaming pipeline. It is important to mention that Scala runs on the JVM, so Scala programs can use any Java library.

If you came here without previous Scala knowledge, welcome to the crash course. It is always good to learn a new programming language. We are not going to study Scala as the first programming language, however. This chapter is organized as a series of exercises in the language. If you already know Scala, try to follow the exercises to improve your knowledge.

As said by many, programming is just about algorithms and data structures. This chapter covers all the Scala data structures. The next chapter covers the algorithms—that is, the Akka actor model.

Functional Programming

Our goal in this chapter is not to learn Scala for its own sake, but to reach fully functional thinking in its purest expression. It is an open secret that each SMACK technology is independent and autonomous from the others. However, each could be developed (or replaced) in Java or Scala.

The truth is that each and every one of the SMACK technologies can be developed ad hoc. Yes, the sun shines for everyone in the streaming pipeline world. You can develop from scratch any SMACK technology or replace one as your project needs.

How to write an entire Akka project is beyond this book’s scope, but you should understand how it works to make good architectural decisions.

You need to be clear on these rules:

  • Scala collections and Java collections are different

  • Spark collections and Scala collections are different

There are three fundamentals (among many others) in functional programming:

  • Predicates

  • Literal functions

  • Implicit loops

Predicate

A predicate is a function that takes one or more parameters and returns a single Boolean value.

This is an example (with body definition):

def isEven (i: Int) = if (i % 2 == 0) true else false

Here is another example (without body definition):

def isPrime (p: Long)

Note that a predicate could also take no parameters at all, but that would be weird. If a function doesn’t receive an input, then it must be obtaining its data from a global variable or a shared context; both are strongly discouraged (even prohibited) in functional programming. Yes, we know that it could take a random number or read the system time to make its decision, but these are special cases.
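
Since isPrime was declared above without a body, here is one possible body as a hedged sketch; the trial-division approach and the exact bounds are our own choice, not the book’s:

// a naive trial-division primality test
def isPrime(p: Long): Boolean =
  p > 1 && (2L to math.sqrt(p.toDouble).toLong).forall(p % _ != 0)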

Literal Functions

In functional programming, functions are first-class citizens. In the 21st century it may sound archaic, but programming languages that discriminate against functions still exist, usually because they are low-level languages.

The rule of thumb is to think of it as algebra. In algebra, functions can be composed; you can perform operations with functions and pass functions as parameters to other functions. If you have problems with algebra, then sorry, this book (and programming) is not for you.... Just kidding. In this case, you can think of functions as traditional object-oriented programming (OOP) objects. Following that idea, mathematics and computer science define a higher-order function as a function that does at least one of the following:

  • Takes functions as arguments (as parameters)

  • Returns a function as a result

For example, the isEven function could be rewritten as this:

(i: Int) => i % 2 == 0            

In this code, the => symbol should be thought of as a transformer.

This is a function literal (an anonymous function); it has no name and can be passed to, or returned from, higher-order functions. Simple, isn’t it?

Yes, in mathematics, as in life, definitions are difficult but necessary to support and generalize our theories. With examples, everything is clear.
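
To make the definition concrete, here is a small sketch of our own (the names applyTwice and multiplier are invented for this example): applyTwice takes a function as an argument, and multiplier returns one.

scala> def applyTwice(f: Int => Int, x: Int) = f(f(x))   // takes a function
applyTwice: (f: Int => Int, x: Int)Int

scala> def multiplier(n: Int) = (x: Int) => x * n        // returns a function
multiplier: (n: Int)Int => Int

scala> applyTwice(multiplier(3), 2)
res0: Int = 18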

Implicit Loops

As a final step, the isEven function could be rewritten as this:

_ % 2 == 0                  

The _ symbol denotes the parameter, or the thing (object, function, entity) to be used as input.

Combined with the filter method, over a list, we find expressions like these:

        scala> val oneToTen = List.range(1, 10)
        oneToTen: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
        scala> val evens = oneToTen.filter(_ % 2 == 0)
        evens: List[Int] = List(2, 4, 6, 8)

The filter call contains an implicit loop. Yes, in functional programming we try to avoid explicit loops. If your code has a lot of fors and whiles, it can probably be simplified.

Functional code is elegant and concise, but of course, there are some memory optimizations that are easier to express with structured programming. Historically, though, code readability has proved more valuable in economic terms (time and money) than hardware optimization, and hardware keeps getting cheaper.

Collections Hierarchy

At the top of the Scala collections hierarchy is the Traversable trait (as shown in Figure 3-1). Everything that mixes in the Traversable trait must implement this method:

Figure 3-1. The Scala collection’s top hierarchy
        def foreach[U](f: Elem => U)

The Iterable trait implements foreach in terms of an iterator:

        def foreach[U](f: Elem => U): Unit = {
          val ite = iterator
          while (ite.hasNext) f(ite.next())
        }

As you can see, the Iterable trait has three children: Seq, Set, and Map.

Sequences

The Seq trait represents sequences.

As shown in Figure 3-2, Seq has three children: IndexedSeq, LinearSeq, and Buffer.

Figure 3-2. The Seq children

A sequence is an iterable that has a length and whose elements have fixed index positions, starting from zero.

LinearSeq and IndexedSeq don’t add any new operations, but each has different performance.

LinearSeq is the list. As you know from functional programming, its efficient operations are head, tail, and isEmpty.

IndexedSeq is the array. As you know from structured programming, its efficient operations are apply (indexed access), length, and update. So, if you have an array of rooms called room, and you write room(101), you access the room at index 101.

Buffer is an important mutable sequence. Buffers allow you to update existing elements and to insert, remove, and add new elements at the end.
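
Here is a quick sketch of our own showing the three flavors side by side:

scala> val linear = List(1, 2, 3)                  // LinearSeq: fast head/tail
linear: List[Int] = List(1, 2, 3)

scala> val indexed = Vector("a", "b", "c")         // IndexedSeq: fast indexed access
indexed: scala.collection.immutable.Vector[String] = Vector(a, b, c)

scala> indexed(2)
res0: String = c

scala> val buf = collection.mutable.Buffer(1, 2)   // Buffer: mutable, grows as needed
buf: scala.collection.mutable.Buffer[Int] = ArrayBuffer(1, 2)

scala> buf += 3
res1: buf.type = ArrayBuffer(1, 2, 3)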

Maps

A map is an iterable consisting of pairs. Each pair consists of a key and a value (also called mappings or associations). The Map family is shown in Figure 3-3.

Figure 3-3. The Map family

Scala offers an implicit conversion that lets you write key -> value as an alternate syntax for the pair (key, value).

For example, Map("uno" -> 1, "dos" -> 2, "tres" -> 3) is the same as Map(("uno", 1), ("dos", 2), ("tres", 3)), but is easier to read.

Sets

A set is an iterable that contains no duplicate elements. As you can see in Figure 3-4, the Set hierarchy is similar to the Map family.

Figure 3-4. The Set family
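
Here is a minimal sketch of our own showing that behavior: duplicates are silently dropped, and apply works as a membership test.

scala> val s = Set(1, 2, 2, 3, 3, 3)
s: scala.collection.immutable.Set[Int] = Set(1, 2, 3)

scala> s(2)          // contains
res0: Boolean = true

scala> s(7)
res1: Boolean = false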

Choosing Collections

Many programmers argue that the Scala type system is difficult and cumbersome. In fact, as you saw, when it comes to collections you only have to choose among three types:

  • Sequence

  • Map

  • Set

The actual decision is to choose between the mutable and immutable versions.
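
The practical difference, in a short sketch of our own: operations on an immutable collection return a new collection, whereas a mutable collection is modified in place.

scala> val imm = List(1, 2, 3)
imm: List[Int] = List(1, 2, 3)

scala> val bigger = 0 :: imm      // a new list; imm is untouched
bigger: List[Int] = List(0, 1, 2, 3)

scala> val mut = collection.mutable.ListBuffer(1, 2, 3)
mut: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2, 3)

scala> mut += 4                   // the buffer itself changes
res0: mut.type = ListBuffer(1, 2, 3, 4)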

Sequences

There are only two sequences: the LinearSeq (list) and the IndexedSeq (array). The true effort is to learn the names used, not the hierarchy itself (see Table 3-1).

Table 3-1. The Sequence Collections

             Immutable   Mutable
IndexedSeq   Vector      ArrayBuffer
LinearSeq    List        ListBuffer

Immutable Sequences

  • LinearSeq

    • List: The list as we know from the functional world.

    • Queue: The FIFO data structure of the traditional computer science books.

    • Stack: The LIFO data structure of the traditional computer science books.

    • Stream: Infinite, lazy and persistent; our everyday flow.

  • IndexedSeq

    • Range: A limited list of integers.

    • String: The well-known and limited char sequence.

    • Vector: Immutable, indexed, the sedan model of the lists.

Mutable Sequences
  • LinearSeq

    • LinkedList: Those traditionally used as an introduction to the C/C++ pointers.

    • DoubleLinkedList: LinkedList with the “previous” method implemented.

    • ListBuffer: The mutable, buffer version of List; the list-backed counterpart of ArrayBuffer.

    • MutableList: A list for those non-functional rebels.

    • Queue: The FIFO for non-functional guys.

    • Stack: The LIFO for non-functional fellas.

  • IndexedSeq

    • Array: A sequence whose length is fixed but whose elements can be modified.

    • ArrayBuffer: An indexed sequence that grows and shrinks as needed.

    • ArrayStack: LIFO implementation when performance matters.

    • StringBuilder: Efficient string manipulation for those with a limited memory budget (see the sketch after this list).
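
Since StringBuilder does not appear again in this chapter, here is a minimal sketch of our own showing its typical use:

// build a string incrementally without creating intermediate String objects
val sb = new StringBuilder("Big")
sb ++= " Data"            // append a String
sb.append('!')            // append a single Char
println(sb.toString)      // prints: Big Data!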

Maps

You have to choose between a mutable map and an immutable map.

  • Mutable maps

    • HashMap: A map whose internal implementation is a hash table.

    • LinkedHashMap: Elements are returned as they were inserted.

    • ListMap: Elements are returned in the reverse of their insertion order.

    • Map: The map as everybody knows it; key-value pairs.

  • Immutable maps

    • HashMap: A map whose internal implementation is a hash trie.

    • ListMap: Elements are returned in the reverse of their insertion order.

    • Map: The map as everybody knows it; key-value pairs.

    • SortedMap: The keys are stored in a sorted order.

    • TreeMap: A sorted map; the red-black tree of the traditional computer science books (see the sketch after this list).
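
Here is a small sketch of our own showing the sorted behavior: whatever the insertion order, a TreeMap returns its keys sorted.

import scala.collection.immutable.TreeMap

// keys come back in alphabetical order, regardless of how they were inserted
val ordered = TreeMap("uno" -> 1, "tres" -> 3, "dos" -> 2)
ordered.keys.foreach(println)   // prints: dos, tres, uno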

Sets

You have to choose between a mutable set and an immutable set.

  • Mutable sets

    • BitSet: Used to save memory, but only integers are allowed.

    • HashSet: A set implemented using a hash table.

    • LinkedHashSet: The elements are returned as they were inserted.

    • TreeSet: The AVL tree of the traditional computer science books.

    • Set: The mutable vanilla set.

    • SortedSet: A mutable set that keeps its elements in sorted order.

  • Immutable sets

    • BitSet: To save (more) memory, only integers are allowed.

    • HashSet: A set implemented using a hash trie.

    • ListSet: A set for the public; a list for those who know it.

    • TreeSet: An immutable set implemented using a tree.

    • Set: The immutable vanilla set.

    • SortedSet: An immutable set that keeps its elements in sorted order.

Traversing

foreach is the standard method for traversing collections in Scala. Its complexity is O(n); that is, the computation time has a linear relation with the number of elements in the input. We also have the traditional for and the iterators, as in Java.

foreach

In Scala, the foreach method takes a function as an argument. This function must take only one parameter and return nothing (such a function is called a procedure). foreach applies it to every element of the collection, one at a time. The parameter type of the function must match the type of the elements in the collection.

scala> val zahlen = Vector("Eins", "Zwei", "Drei")
zahlen: scala.collection.immutable.Vector[String] = Vector(Eins, Zwei, Drei)


scala> zahlen.foreach(s => print(s))
EinsZweiDrei

This function takes one character and prints it:

scala> def printAChar(c: Char) { print(c) }            
printAChar: (c: Char)Unit

The function is applied to a string (a sequence of chars):

scala> "SMACK".foreach( c => printAChar(c) )
SMACK

The type inference is a useful tool in these modern times:

scala> "SMACK".foreach( printAChar )
SMACK

This is the same as the preceding example, but with a literal function:

scala> "SMACK".foreach( (c: Char) => print(c) )
SMACK

This is the same as the preceding example, but using type inference with a literal function:

scala> "SMACK".foreach( print )
SMACK

Finally, this example splits a sentence into an array of words, to which you can apply foreach as well:

scala> "SMACK: Spark Mesos Akka Cassandra Kafka".split(" ")
Array[String] = Array(SMACK:, Spark, Mesos, Akka, Cassandra, Kafka)

for

As in all modern functional programming languages, we can explore all the elements of a collection with a for loop.

Remember, foreach and for are not designed to produce new collections. If you want a new collection, use the for/yield combo.

As we stated earlier, if it’s iterable, then it’s traversable (inheritance 101):

scala> val smack = Traversable("Spark", "Mesos", "Akka", "Cassandra", "Kafka")
smack: Traversable[String] = List(Spark, Mesos, Akka, Cassandra, Kafka)


scala> for (f <- smack) println(f)
Spark
Mesos
Akka
Cassandra
Kafka


scala> for (f <- smack) println( f.toUpperCase )
SPARK
MESOS
AKKA
CASSANDRA
KAFKA

To build a new collection, use the for/yield construct:

scala> val smack = Array("Spark", "Mesos", "Akka", "Cassandra", "Kafka")
smack: Array[java.lang.String] = Array(Spark, Mesos, Akka, Cassandra, Kafka)


scala> val upSmack = for (s <- smack) yield s.toUpperCase
upSmack: Array[java.lang.String] = Array(SPARK, MESOS, AKKA, CASSANDRA, KAFKA)

This for/yield construct is called a for comprehension.
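
For comprehensions also accept guards (an if condition inside the for); this sketch of our own combines a guard with yield:

scala> val smack = List("Spark", "Mesos", "Akka", "Cassandra", "Kafka")
smack: List[String] = List(Spark, Mesos, Akka, Cassandra, Kafka)

scala> val longOnes = for (s <- smack if s.length > 5) yield s.toUpperCase
longOnes: List[String] = List(CASSANDRA)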

Now, let’s iterate a map with a for loop:

scala> val smack = Map("S" ->"Spark", "M" -> "Mesos", "A" -> "Akka", "C" ->"Cassandra", "K" -> "Kafka")
smack: scala.collection.immutable.Map[String,String] = Map(A -> Akka, M -> Mesos, C -> Cassandra, K -> Kafka, S -> Spark)


scala> for ((k,v) <- smack) println(s"letter: $k, means: $v")
letter: A, means: Akka
letter: M, means: Mesos
letter: C, means: Cassandra
letter: K, means: Kafka
letter: S, means: Spark

Iterators

To iterate a collection in Java, you use hasNext() and next(). In Scala you rarely call them directly, because the map and foreach methods cover most cases.

You mainly use iterators in Scala when reading very large streams; a file is the most common example. As a rule of thumb, you use iterators when it’s not convenient to load the whole data structure into memory.
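
For example, here is a minimal sketch of our own (the file name access.log is hypothetical): getLines returns an Iterator[String], so the file is consumed line by line instead of being loaded whole.

import scala.io.Source

// lazily walk a potentially huge file, keeping only what we print
val lines = Source.fromFile("access.log").getLines()   // Iterator[String]
lines.filter(_.contains("ERROR")).take(10).foreach(println)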

Once it has been used, an iterator remains “exhausted,” as shown in the following:

scala> val iter = Iterator("S","M","A","C","K")
iter: Iterator[String] = non-empty iterator


scala> iter.foreach(println)
S
M
A
C
K
scala> iter.foreach(println)

As you can see, the last line didn’t produce any output, because the iterator is exhausted.

Mapping

Another way to transform collections, besides for/yield, is to call the map method with a function as its argument, as follows:

scala> val smack = Vector("spark", "mesos", "akka", "cassandra", "kafka")

smack: scala.collection.immutable.Vector[String] = Vector(spark, mesos, akka, cassandra, kafka)

// the long way
scala> val cap = smack.map(e => e.capitalize)
cap: scala.collection.immutable.Vector[String] = Vector(Spark, Mesos, Akka, Cassandra, Kafka)


// the short way
scala> val cap = smack.map(_.capitalize)
cap: scala.collection.immutable.Vector[String] = Vector(Spark, Mesos, Akka, Cassandra, Kafka)


//producing a Vector of Int
scala> val lens = smack.map(_.size)
lens: scala.collection.immutable.Vector[Int] = Vector(5, 5, 4, 9, 5)


//producing a Vector of XML elements
scala> val elem = smack.map(smack => <li>{smack}</li>)
elem: scala.collection.immutable.Vector[scala.xml.Elem] = Vector(<li>spark</li>, <li>mesos</li>, <li>akka</li>, <li>cassandra</li>, <li>kafka</li>)

Note that Scala relies on type inference: there is no general rule for the collection type returned by a mapping operation; it depends on the collection you start with.

You can say that you are a seasoned Scala functional programmer if you can identify which construct to use in each case: for/yield or map.

scala> val smack = List("spark", "mesos", "akka", "cassandra", "kafka")
smack: List[String] = List(spark, mesos, akka, cassandra, kafka)


// capitalize with map
scala> val m = smack.map(_.capitalize)
m: List[String] = List(Spark, Mesos, Akka, Cassandra, Kafka)


// capitalize with for/yield
scala> val y = for (s <- smack) yield s.capitalize
y: List[String] = List(Spark, Mesos, Akka, Cassandra, Kafka)

Flattening

In functional programming, the flattening process occurs when you convert a list of lists (also called sequence of sequences or multilist) into one list. The following is an example:

scala> val allies = List(List("Java","Scala"), List("Javascript","PHP"))
allies: List[List[String]] = List(List(Java, Scala), List(Javascript, PHP))


scala> val languages = allies.flatten
languages: List[String] = List(Java, Scala, Javascript, PHP)

The power of (functional) programming is its expressiveness and simplicity. Here we flatten, capitalize, and sort, all in one expression:

scala> val jargon = allies.flatten.map(_.toUpperCase).sorted
jargon: List[String] = List(JAVA, JAVASCRIPT, PHP, SCALA)

When you work with connected nodes (for example, friends of friends), flattening helps you gather the whole network:

val webFriends = List("Java", "JS")
val javaFriends = List("Scala", "Clojure", "Ceylon")
val jsFriends = List("PHP", "Ceylon")


val friendsOfFriends = List( javaFriends, jsFriends)

scala> val uniqueFriends = friendsOfFriends.flatten.distinct
uniqueFriends: List[String] = List(Scala, Clojure, Ceylon, PHP)

As you may guess, flattening a list of strings produces a list of their chars:

scala> val stuff = List("SMACK", "Scala")
stuff: List[String] = List(SMACK, Scala)


scala> stuff.flatten
List[Char] = List(S, M, A, C, K, s, c, a, l, a)

If a collection contains Option elements, flattening removes the None values and unwraps the Some values:

scala> val boxes = Vector(Some("Something"), None, Some(3.14), None)
boxes: scala.collection.immutable.Vector[Option[Any]] = Vector(Some(Something), None, Some(3.14), None)


scala> boxes.flatten
res1: scala.collection.immutable.Vector[Any] = Vector(Something, 3.14)

Filtering

In functional programming, filtering traverses a collection and builds a new collection with the elements that match specific criteria. The criterion must be a predicate, which you apply to each element of the collection. For example:

scala> val dozen = List.range(1, 13)
dozen: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)


scala> val multiplesOf3 = dozen.filter(_ % 3 == 0)
multiplesOf3: List[Int] = List(3, 6, 9, 12)


scala> val languages = Set("Java", "Scala", "Clojure", "Ceylon")
languages: scala.collection.immutable.Set[String] = Set(Java, Scala, Clojure, Ceylon)


scala> val c = languages.filter(_.startsWith("C"))
c: scala.collection.immutable.Set[String] = Set(Clojure, Ceylon)


scala> val s = languages.filter(_.length < 6)
s: scala.collection.immutable.Set[String] = Set(Java, Scala)

Filtering has the following two rules:

  1. The filter doesn’t modify the collection. You must keep the result in a new one.

  2. Only the elements whose predicate returns true are kept.
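
Both rules can be seen in one short sketch of our own:

scala> val nums = List(1, 2, 3, 4)
nums: List[Int] = List(1, 2, 3, 4)

scala> val evens = nums.filter(_ % 2 == 0)
evens: List[Int] = List(2, 4)

scala> nums            // the original list is untouched
res0: List[Int] = List(1, 2, 3, 4)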

Extracting

In this section, we are going to examine the methods to extract subsequences. The following are examples.

// We declare an array of Int from 0 to 9
scala> val magic = (0 to 9).toArray
magic: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


// Without the first N elements
scala> val d = magic.drop(3)
d: Array[Int] = Array(3, 4, 5, 6, 7, 8, 9)


// Without the elements matching a predicate
scala> val dw = magic.dropWhile(_ < 4)
dw: Array[Int] = Array(4, 5, 6, 7, 8, 9)


// Without the last N elements
scala> val dr = magic.dropRight(4)
dr: Array[Int] = Array(0, 1, 2, 3, 4, 5)


// Just the first N elements
scala> val t = magic.take(5)
t: Array[Int] = Array(0, 1, 2, 3, 4)


// Just the first elements matching a predicate (from the left)
scala> val tw = magic.takeWhile(_ < 4)
tw: Array[Int] = Array(0, 1, 2, 3)


// Just the last N elements
scala> val tr = magic.takeRight(3)
tr: Array[Int] = Array(7, 8, 9)


// the subsequence between the index A and B
scala> val sl = magic.slice(1,7)
sl: Array[Int] = Array(1, 2, 3, 4, 5, 6)

The classic functional List methods (head, tail, init, last, and their Option-boxed variants) are also available:

// head, the first element
scala> val h = magic.head
h: Int = 0


// the head boxed (to prevent errors)
scala> val hb = magic.headOption
hb: Option[Int] = Some(0)


// the list without the last element
scala> val in = magic.init
in: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8)


// the last element
scala> val ta = magic.last
ta: Int = 9


// the last boxed (to prevent errors)
scala> val lo = magic.lastOption
lo: Option[Int] = Some(9)


// all the list without the first element (known as tail)
scala> val t = magic.tail
t: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

Splitting

For fans of the database perspective, there are methods to split lists into groups. Here we split a sample into two groups, as follows.

// Here, a sample list
scala> val sample = List(-12, -9, -3, 12, 18, 15)
sample: List[Int] = List(-12, -9, -3, 12, 18, 15)


// lets separate our sample in two groups
scala> val teens = sample.groupBy(_ > 10)
teens: scala.collection.immutable.Map[Boolean,List[Int]] = Map(false -> List(-12, -9, -3), true -> List(12, 18, 15))


// to access the generated groups
scala> val t = teens(true)
t: List[Int] = List(12, 18, 15)


scala> val f = teens(false)
f: List[Int] = List(-12, -9, -3)


// partition does the same job as groupBy but returns a tuple of two Lists
scala> val teens = sample.partition(_ > 10)
teens: (List[Int], List[Int]) = (List(12, 18, 15),List(-12, -9, -3))


// span splits the list into the longest prefix satisfying the predicate and the rest
scala> val negs = sample.span(_ < 0)
negs: (List[Int], List[Int]) = (List(-12, -9, -3),List(12, 18, 15))


// splitAt generates two lists: the first N elements and the rest
scala> val splitted = sample.splitAt(2)
splitted: (List[Int], List[Int]) = (List(-12, -9),List(-3, 12, 18, 15))


// the tuple returned by partition can be destructured into two values
scala> val (foo, bar) = sample.partition(_ > 10)
foo: List[Int] = List(12, 18, 15)
bar: List[Int] = List(-12, -9, -3)

Unicity

If you want to remove duplicates in a collection and keep only the unique elements, you have two options. The following are some examples.

scala> val duplicated = List("A", "Y", "Y", "X", "X", "Z")
duplicated: List[String] = List(A, Y, Y, X, X, Z)


// The first option is using distinct
scala> val u = duplicated.distinct
u: List[String] = List(A, Y, X, Z)


// the second is converting the collection to a Set, where duplicates are not allowed
scala> val s = duplicated.toSet
s: scala.collection.immutable.Set[String] = Set(A, Y, X, Z)

Merging

For merging and subtracting collections, use methods such as ++, ++=, union, intersect, and diff. The following are some examples.

// The ++= method could be used in any mutable collection
scala> val nega = collection.mutable.ListBuffer(-30, -20, -10)
nega: scala.collection.mutable.ListBuffer[Int] = ListBuffer(-30, -20, -10)


// The elements are appended to the original collection in place, since it is mutable
scala> nega ++= Seq(10, 20, 30)
res0: nega.type = ListBuffer(-30, -20, -10, 10, 20, 30)


scala> val tech1 = Array("Scala", "Spark", "Mesos")
tech1: Array[String] = Array(Scala, Spark, Mesos)


scala> val tech2 = Array("Akka", "Cassandra", "Kafka")
tech2: Array[String] = Array(Akka, Cassandra, Kafka)


// The ++ method merges two collections and returns a new collection
scala> val smack = tech1 ++ tech2
smack: Array[String] = Array(Scala, Spark, Mesos, Akka, Cassandra, Kafka)

We have the classic Set operations from Set Theory.

scala> val lang1 = Array("Java", "Scala", "Ceylon")
lang1: Array[String] = Array(Java, Scala, Ceylon)


scala> val lang2 = Array("Java", "JavaScript", "PHP")
lang2: Array[String] = Array(Java, JavaScript, PHP)


// intersection, the elements in both collections
scala> val inter = lang1.intersect(lang2)
inter: Array[String] = Array(Java)
// union, all the elements of both collections (duplicates included)
scala> val addition = lang1.union(lang2)
addition: Array[String] = Array(Java, Scala, Ceylon, Java, JavaScript, PHP)


// to remove the duplicates we use distinct
scala> val unionDistinct = lang1.union(lang2).distinct
unionDistinct: Array[String] = Array(Java, Scala, Ceylon, JavaScript, PHP)

The diff method results depend on which sequence it’s called on (in set theory, A-B is different from B-A):

// difference, the elements in one set that are not in the other
scala> val dif1 = lang1 diff lang2
dif1: Array[String] = Array(Scala, Ceylon)


scala> val dif2 = lang2 diff lang1
dif2: Array[String] = Array(JavaScript, PHP)

Lazy Views

In functional programming, we call something "lazy" when it is not computed until it is needed. A lazy view is a version of a collection whose transformations are computed only when their results are actually needed.

By contrast, with a strict collection (the default, as in Java), all the memory is allocated immediately when the collection is created.

The difference between these two lines could save a lot of memory:

scala> 0 to 25
res0: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)


scala> (0 to 25).view
res1: scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] = SeqView(...)

To force the memory allocation of a view, use the force method:

scala> val v = (0 to 25).view
v: scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] = SeqView(...)


scala> val f = v.force
f: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)

Mixing views with the map method can significantly improve the performance of your programs. In the following example, increasing the upper bound makes the strict version struggle, while the view defers the work.

scala> (0 to 100).map { _ * 3 }
res0: scala.collection.immutable.IndexedSeq[Int] = Vector(0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72...


scala> (0 to 100).view.map { _ * 3 }
res1: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)

Good programmers (functional or SQL) know the benefits of views well:

  • Performance (the reason that you’re reading this book)

  • The data structure is similar to database views

Database views were created to allow modifications on big result sets and tables without compromising the performance.

// lets create an array
scala> val bigData = Array("B", "I", "G", "-", "D", "A", "T", "A")
bigData: Array[String] = Array(B, I, G, -, D, A, T, A)


// and a view over the first elements
scala> val view = bigData.view.slice(0, 4)
view: scala.collection.mutable.IndexedSeqView[String,Array[String]] = SeqViewS(...)


// we modify the VIEW
scala> view(0) = "F"
scala> view(1) = "A"
scala> view(2) = "S"
scala> view(3) = "T"


// voilà, our original array was modified
scala> bigData
res0: Array[String] = Array(F, A, S, T, D, A, T, A)

Sorting

To sort, you use the sorted method, or sortWith with a comparison operator such as <, <=, >, or >=. The following are some examples.

// sorting Strings
scala> val foo = List("San Francisco", "London", "New York", "Tokio").sorted
foo: List[String] = List(London, New York, San Francisco, Tokio)


// sorting numbers
scala> val bar = List(10, 1, 8, 3.14, 5).sorted
bar: List[Double] = List(1.0, 3.14, 5.0, 8.0, 10.0)


// ascending
scala> List(10, 1, 8, 3.14, 5).sortWith(_ < _)
res0: List[Double] = List(1.0, 3.14, 5.0, 8.0, 10.0)


// descending
scala> List(10, 1, 8, 3.14, 5).sortWith(_ > _)
res0: List[Double] = List(10.0, 8.0, 5.0, 3.14, 1.0)


// ascending alphabetically
scala> List("San Francisco", "London", "New York", "Tokio").sortWith(_ < _)
res0: List[String] = List(London, New York, San Francisco, Tokio)


// descending alphabetically
scala> List("San Francisco", "London", "New York", "Tokio").sortWith(_ > _)
res0: List[String] = List(Tokio, San Francisco, New York, London)


// ascending by length
scala> List("San Francisco", "London", "New York", "Tokio").sortWith(_.length < _.length)
res0: List[String] = List(Tokio, London, New York, San Francisco)


// descending by length
scala> List("San Francisco", "London", "New York", "Tokio").sortWith(_.length > _.length)
res0: List[String] = List(San Francisco, New York, London, Tokio)

Streams

Just as views are the lazy version of collections, streams are the lazy version of lists. Here we taste some stream power:

scala> val torrent = (0 to 900000000).toStream
torrent: scala.collection.immutable.Stream[Int] = Stream(0, ?)


scala> torrent.head
res0: Int = 0


scala> torrent.tail
res1: scala.collection.immutable.Stream[Int] = Stream(1, ?)


scala> torrent.take(3)
res2: scala.collection.immutable.Stream[Int] = Stream(0, ?)


scala> torrent.filter(_ < 100)
res3: scala.collection.immutable.Stream[Int] = Stream(0, ?)


scala> torrent.filter(_ > 100)
res4: scala.collection.immutable.Stream[Int] = Stream(101, ?)


scala> torrent.map{_ * 2}
res5: scala.collection.immutable.Stream[Int] = Stream(0, ?)


scala> torrent(5)
res6: Int = 5

Arrays

Scala is a strongly typed language. It infers the array type if it’s not specified.

// with numeric values, the widest data type determines the collection type
scala> Array(6.67e-11,  3.1415,  333F,  666L)
res0: Array[Double] = Array(6.67E-11, 3.1415, 333.0, 666.0)


// we can force the type manually
scala> Array[Number] (6.67e-11,  3.1415,  333F,  666L)
res0: Array[Number] = Array(6.67E-11, 3.1415, 333.0, 666)

There are several ways to create and initialize arrays:

// from Range
scala> val r = Array.range(0, 16)
r: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)


// from Range with step
scala> val rs = Array.range(-16, 16, 3)
rs: Array[Int] = Array(-16, -13, -10, -7, -4, -1, 2, 5, 8, 11, 14)


// with fill
scala> val f = Array.fill(3)("ha")
f: Array[String] = Array(ha, ha, ha)


// with tabulate
scala> val t = Array.tabulate(9)(n => n * n)
t: Array[Int] = Array(0, 1, 4, 9, 16, 25, 36, 49, 64)


// from List
scala> val a = List("Spark", "Mesos", "Akka", "Cassandra", "Kafka").toArray
a: Array[String] = Array(Spark, Mesos, Akka, Cassandra, Kafka)


// from String
scala> val s = "ELONGATION".toArray
s: Array[Char] = Array(E, L, O, N, G, A, T, I, O, N)


// Scala Arrays correspond to Java Arrays
scala> val bigData = Array("B", "I", "G", "-", "D", "A", "T", "A")
bigData: Array[String] = Array(B, I, G, -, D, A, T, A)


scala> bigData(0) = "F"
scala> bigData(1) = "A"
scala> bigData(2) = "S"
scala> bigData(3) = "T"
scala> bigData
bigData: Array[String] = Array(F, A, S, T, D, A, T, A)

ArrayBuffers

An ArrayBuffer is an array with dynamic size. The following are some examples.

// initialization with some elements
val cities = collection.mutable.ArrayBuffer("San Francisco", "New York")


// += to add one element
cities += "London"


// += to add multiple elements
cities += ("Tokio", "Beijing")


// ++= to add another collection
cities ++= Seq("Paris", "Berlin")


// append, to add multiple elements
cities.append("Sao Paulo", "Mexico")

Queues

The queue follows the first-in, first-out (FIFO) data structure. The following are some examples.

// to use it we need to import it from collection mutable
scala> import scala.collection.mutable.Queue
import scala.collection.mutable.Queue


// here we create a Queue of Strings
scala> var smack = new Queue[String]
smack: scala.collection.mutable.Queue[String] = Queue()


// += operator, to add an element
scala> smack += "Spark"
res0: scala.collection.mutable.Queue[String] = Queue(Spark)


// += operator, to add multiple elements
scala> smack += ("Mesos", "Akka")
res1: scala.collection.mutable.Queue[String] = Queue(Spark, Mesos, Akka)


// ++= operator, to add a Collection
scala> smack ++= List("Cassandra", "Kafka")
res2: scala.collection.mutable.Queue[String] = Queue(Spark, Mesos, Akka, Cassandra, Kafka)


// the Queue power: enqueue
scala> smack.enqueue("Scala")
scala> smack
res3: scala.collection.mutable.Queue[String] =
Queue(Spark, Mesos, Akka, Cassandra, Kafka, Scala)


// its counterpart, dequeue
scala> smack.dequeue
res4: String = Spark


// dequeue removes the first element of the queue
scala> smack
res5: scala.collection.mutable.Queue[String] = Queue(Mesos, Akka, Cassandra, Kafka, Scala)


// dequeue, will take the next element
scala> val next = smack.dequeue
next: String = Mesos


// we verify that everything runs as the book says
scala> smack
res6: scala.collection.mutable.Queue[String] = Queue(Akka, Cassandra, Kafka, Scala)

The dequeueFirst and dequeueAll methods dequeue the elements matching the predicate.

scala> val smack = Queue("Spark", "Mesos", "Akka", "Cassandra", "Kafka")
smack: scala.collection.mutable.Queue[String] = Queue(Spark, Mesos, Akka, Cassandra, Kafka)


// remove the first element containing a k
scala> smack.dequeueFirst(_.contains("k"))
res0: Option[String] = Some(Spark)


scala> smack
res1: scala.collection.mutable.Queue[String] = Queue(Mesos, Akka, Cassandra, Kafka)


// remove all the elements beginning with A
scala> smack.dequeueAll(_.startsWith("A"))
res2: scala.collection.mutable.Seq[String] = ArrayBuffer(Akka)


scala> smack
res3: scala.collection.mutable.Queue[String] = Queue(Mesos, Cassandra, Kafka)

Stacks

The stack follows the last-in, first-out (LIFO) data structure. The following are some examples.

// to use it we need to import it from collection mutable
scala> import scala.collection.mutable.Stack
import scala.collection.mutable.Stack


// here we create a Stack of Strings
scala> var smack = Stack[String]()
smack: scala.collection.mutable.Stack[String] = Stack()


// push, to add elements at the top
scala> smack.push("Spark")
res0: scala.collection.mutable.Stack[String] = Stack(Spark)
scala> smack.push("Mesos")
res1: scala.collection.mutable.Stack[String] = Stack(Mesos, Spark)


// push, to add multiple elements
scala> smack.push("Akka", "Cassandra", "Kafka")
res2: scala.collection.mutable.Stack[String] = Stack(Kafka, Cassandra, Akka, Mesos, Spark)


// pop, to take the last element inserted
scala> val top = smack.pop
top: String = Kafka
scala> smack
res3: scala.collection.mutable.Stack[String] = Stack(Cassandra, Akka, Mesos, Spark)


// top, to access the last element without extracting it
scala> smack.top
res4: String = Cassandra


// "Cassandra" is still on the top
scala> smack
res5: scala.collection.mutable.Stack[String] = Stack(Cassandra, Akka, Mesos, Spark)


// size, the Seq method to know the number of elements
scala> smack.size
res6: Int = 4


// isEmpty, another Seq method
scala> smack.isEmpty
res7: Boolean = false


// clear, to empty the whole stack at once
scala> smack.clear
scala> smack
res9: scala.collection.mutable.Stack[String] = Stack()

Ranges

Ranges are most commonly used with loops, as shown in the following examples.

// to, to make a range from a to b (upper limit is included)
scala> 0 to 6
res0: scala.collection.immutable.Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6)


// until, to make a range from 0 to 6 (upper limit not included)
scala> 0 until 6
res1: scala.collection.immutable.Range = Range(0, 1, 2, 3, 4, 5)


// by, to specify a step (in this case, every 3)
scala> 0 to 21 by 3
res2: scala.collection.immutable.Range = Range(0, 3, 6, 9, 12, 15, 18, 21)


// to also works with chars
scala> 'a' to 'k'
res3: scala.collection.immutable.NumericRange.Inclusive[Char] = NumericRange(a, b, c, d, e, f, g, h, i, j, k)


// a Range toList
scala> val l = (0 to 16).toList
l: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)


// a Range toArray
scala> val a = (0 to 16).toArray
a: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)


// a Range toSet
scala> val s = (0 to 10).toSet
s: scala.collection.immutable.Set[Int] = Set(0, 5, 10, 1, 6, 9, 2, 7, 3, 8, 4)


// Array has a range method (upper limit excluded)
scala> val a = Array.range(0, 17)
a: Array[Int] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)


// Vector has a range method (upper limit excluded)
scala> val v = Vector.range(0, 10)
v: collection.immutable.Vector[Int] = Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)


// List has a range method (upper limit excluded)
scala> val l = List.range(0, 17)
l: List[Int] = List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)


// A list with numbers in a range with a step of 5
scala> val l = List.range(0, 50, 5)
l: List[Int] = List(0, 5, 10, 15, 20, 25, 30, 35, 40, 45)


// An ArrayBuffer with characters in a range
scala> val ab = collection.mutable.ArrayBuffer.range('a', 'f')
ab: scala.collection.mutable.ArrayBuffer[Char] = ArrayBuffer(a, b, c, d, e)


// An old fashioned for loop using a range
scala> for (i <- 1 to 5) println(i)
1
2
3
4
5

Summary

Since all the examples in this book are in Scala, we need to reinforce it before beginning our study. This chapter provided a review of Scala. We studied the fundamental parts of the language. Programming is about data structures and algorithms. In this chapter, we discussed the Scala type system (the data structures) and the principal concepts of functional programming.

The use of object-oriented programming (OOP) in past decades promised an era of reusable software components. Things no longer work that way. Now components interoperate by exchanging immutable data structures (lists, maps, and sets), which is closer to functional programming.

In the next chapter, we review an actor model implementation called Akka. To fully understand the examples, you need to know the Scala programming language.
