Looser coupling with type classes

So far, we have been reading and writing simple types to the database. Let's imagine that we want to add a gender column to our physicists table. We will store the gender as an ENUM in the database. Our table definition is now as follows:

mysql> CREATE TABLE physicists (
        id INT(11) AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(32) NOT NULL,
        gender ENUM('Female', 'Male') NOT NULL
);

How can we represent genders in Scala? A good way of doing this is with an enumeration:

// Gender.scala

object Gender extends Enumeration {
  val Male = Value
  val Female = Value
}

However, we now have a problem when deserializing objects from the database: JDBC has no built-in mechanism for converting a SQL ENUM to our Scala Gender.Value type. We could work around this by manually converting the string every time we need to read gender information:

resultsStream.map { 
  rs => Gender.withName(rs.getString("gender")) 
}.toVector

However, we would need to write this everywhere that we want to read the gender field. This goes against the DRY (don't repeat yourself) principle, leading to code that is difficult to maintain. If we decide to change the way gender is stored in the database, we would need to find every instance in the code where we read the gender field and change it.

A somewhat better solution would be to add a getGender method to the ResultSet class using the pimp my library idiom that we have used extensively in this chapter. This solution is still not optimal: we are adding unnecessary specificity to ResultSet, which becomes coupled to the structure of our database.
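For illustration only, such an enrichment might look like the following sketch (the GenderedFields class name and the accompanying conversion are hypothetical, not part of the code we will develop in this chapter):

// A possible "pimp my library" enrichment (sketch)

import java.sql.ResultSet

class GenderedFields(val underlying:ResultSet) {
  // Only meaningful for tables that have a gender column, so this
  // couples the wrapper to our particular schema.
  def getGender:Gender.Value =
    Gender.withName(underlying.getString("gender"))
}

Together with an implicit conversion in Implicits, this would let us write results.getGender, but every accessor added this way bakes more of our schema into the result set wrapper.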

We could create a subclass of ResultSet using inheritance, such as PhysicistResultSet, that can read the fields in a specific table. However, this approach is not composable: if we had another table that kept track of pets, with name, species, and gender fields, we would have to either reimplement the code for reading gender in a new PetResultSet or factor out a GenderedResultSet superclass. As the number of tables grows, the inheritance hierarchy would become unmanageable. A better approach would let us compose the functionality that we need. In particular, we want to decouple the process of extracting Scala objects from a result set from the code for iterating over a result set.

Type classes

Scala provides an elegant solution using type classes. Type classes are a very powerful arrow in the Scala architect's quiver. However, they can present a bit of a learning curve, especially as there is no direct equivalent in object-oriented programming.

Instead of presenting an abstract explanation, I will dive into an example: I will describe how we can leverage type classes to convert fields in a ResultSet to Scala types. The aim is to define a read[T](field) method on ResultSet that knows exactly how to deserialize to objects of type T. This method will replace and extend the getXXX methods in ResultSet:

// results is a ResultSet instance
val name = results.read[String]("name")
val gender = results.read[Gender.Value]("gender")

We start by defining an abstract SqlReader[T] trait that exposes a read method to read a specific field from a ResultSet and return an instance of type T:

// SqlReader.scala

import java.sql._

trait SqlReader[T] {
  def read(results:ResultSet, field:String):T
}

We now need to provide a concrete implementation of SqlReader[T] for every type T that we want to read. Let's provide concrete implementations for reading Gender.Value and String fields. We will place the implementations in the SqlReader companion object:

// SqlReader.scala

object SqlReader {
  implicit object StringReader extends SqlReader[String] {
    def read(results:ResultSet, field:String):String =
      results.getString(field)
  }

  implicit object GenderReader extends SqlReader[Gender.Value] {
    def read(results:ResultSet, field:String):Gender.Value =
      Gender.withName(StringReader.read(results, field))
  }
}

We can now use the StringReader and GenderReader objects to read from a result set:

import SqlReader._
val name = StringReader.read(results, "name")
val gender = GenderReader.read(results, "gender")

This is already somewhat better than using the following:

Gender.withName(results.getString("gender"))

This is because the code to map from a ResultSet field to Gender.Value is centralized in a single place: GenderReader. However, it would be great if we could tell Scala to use GenderReader whenever it needs to read a Gender.Value, and StringReader whenever it needs to read a String value. This is exactly what type classes do.

Coding against type classes

We defined a SqlReader[T] trait that abstracts how to read an object of type T from a field in a ResultSet. How do we tell Scala that it needs to use this SqlReader object to convert from the ResultSet fields to the appropriate Scala type?

The key is the implicit keyword that we used to prefix the GenderReader and StringReader object definitions. It lets us write:

implicitly[SqlReader[Gender.Value]].read(results, "gender")
implicitly[SqlReader[String]].read(results, "name")

By writing implicitly[SqlReader[T]], we are telling the Scala compiler to find an object (or class instance) that extends SqlReader[T] and is marked as implicit. Try this out by pasting the following into a Scala console, for instance:

scala> :paste

import Implicits._ // Connection to RichConnection conversion
SqlUtils.usingConnection("test") {
  _.withQuery("select * from physicists") {
    rs => {
      rs.next() // advance to first record
      implicitly[SqlReader[Gender.Value]].read(rs, "gender")
    }
  }
}

Of course, using implicitly[SqlReader[T]] everywhere is not particularly elegant. Let's use the pimp my library idiom to add a read[T] method to ResultSet. We first define a RichResultSet class that we can use to "pimp" the ResultSet class:

// RichResultSet.scala

import java.sql.ResultSet

class RichResultSet(val underlying:ResultSet) {
  def read[T : SqlReader](field:String):T = {
    implicitly[SqlReader[T]].read(underlying, field)
  }
}

The only unfamiliar part of this should be the read[T : SqlReader] generic definition. We are stating here that read will accept any type T, provided an implicit instance of SqlReader[T] exists. This is called a context bound.
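For reference, a context bound is syntactic sugar for an implicit parameter. The following sketch shows an equivalent definition of read written out explicitly:

// Equivalent definition without the context bound: the compiler
// rewrites read[T : SqlReader] into an additional implicit parameter.
class RichResultSet(val underlying:ResultSet) {
  def read[T](field:String)(implicit reader:SqlReader[T]):T =
    reader.read(underlying, field)
}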

We must also add an implicit conversion method to the Implicits object to convert from ResultSet to RichResultSet. You should be familiar with this pattern by now, so I will not dwell on it here; a minimal sketch of the conversion is shown after the example below. You can now call results.read[T](fieldName) for any T for which you have a SqlReader[T] implicit object defined:

import Implicits._

SqlUtils.usingConnection("test") { connection =>
  connection.withQuery("SELECT * FROM physicists") {
    results =>
      val resultStream = SqlUtils.stream(results)
      resultStream.map { row => 
        val name = row.read[String]("name")
        val gender = row.read[Gender.Value]("gender")
        (name, gender)
      }.toVector
  }
}
//=> Vector[(String, Gender.Value)] = Vector((Albert Einstein,Male), (Marie Curie,Female))
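
As promised, here is a minimal sketch of the ResultSet to RichResultSet conversion (the method name pimpResultSet is arbitrary; your Implicits object may look slightly different):

// Implicits.scala (sketch)

import java.sql.ResultSet

object Implicits {
  // ... existing conversions, such as Connection to RichConnection ...
  implicit def pimpResultSet(results:ResultSet):RichResultSet =
    new RichResultSet(results)
}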

Let's summarize the steps needed for type classes to work. We will do this in the context of deserializing from SQL, but you will be able to adapt these steps to solve other problems:

  • Define an abstract generic trait that provides the interface for the type class, for example, SqlReader[T]. Any functionality that is independent of T can be added to this base trait.
  • Create the companion object for the base trait and add implicit objects extending the trait for each T, for example,
    implicit object StringReader extends SqlReader[String].
  • Type classes are always used in generic methods. A method that relies on the existence of a type class for an argument must contain a context bound in the generic definition, for example, def read[T : SqlReader](field:String):T. To access the type class in this method, use the implicitly keyword: implicitly[SqlReader[T]].

When to use type classes

Type classes are useful when you need a particular behavior for many different types, but exactly how this behavior is implemented varies between these types. For instance, we need to be able to read several different types from ResultSet, but exactly how each type is read differs between types: for strings, we must read from ResultSet using getString, whereas for integers, we must use getInt followed by wasNull.
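For instance, a reader for integer columns might look like the following sketch (IntReader is not part of the code developed in this chapter, and how to handle SQL NULLs is a design decision left to you):

implicit object IntReader extends SqlReader[Int] {
  def read(results:ResultSet, field:String):Int = {
    val value = results.getInt(field)
    // getInt returns 0 for SQL NULL, so we check wasNull explicitly.
    if (results.wasNull) {
      throw new IllegalArgumentException(s"Field $field was NULL")
    }
    value
  }
}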

A good rule of thumb: if you catch yourself thinking, "Oh, I could just write a generic method to do this. Ah, but wait, I will have to write the Int implementation as a specific edge case as it behaves differently. Oh, and the Gender implementation. I wonder if there's a better way?", then type classes might be useful.

Benefits of type classes

Data scientists frequently have to deal with new input streams, changing requirements, and new data types. Having an object-relational mapping layer that is easy to extend or alter is therefore critical to responding to changes efficiently. Minimizing coupling between code entities and maintaining a clear separation of concerns are essential if the code is to be changed easily in response to new data.

With type classes, we maintain orthogonality between accessing records in the database (through the ResultSet class) and how individual fields are transformed to Scala objects: both can vary independently. The only coupling between these two concerns is through the SqlReader[T] interface.

This means that both concerns can evolve independently: to read a new data type, we just need to implement a SqlReader[T] object. Conversely, we can add functionality to ResultSet without needing to reimplement how fields are converted. For instance, we could add a getColumn method that returns a Vector[T] of all the values of a field in a ResultSet instance:

// A method we might add to RichResultSet
def getColumn[T : SqlReader](field:String):Vector[T] = {
  // Stream over the rows of the underlying ResultSet and read the
  // requested field from each one (assumes the ResultSet to
  // RichResultSet conversion is in scope).
  val resultStream = SqlUtils.stream(underlying)
  resultStream.map { _.read[T](field) }.toVector
}
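
With that conversion in scope, results.getColumn[String]("name") would, for instance, return every physicist's name as a Vector[String].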

Note how we could do this without increasing the coupling to the way in which individual fields are read.
