Extracting objects from the database

We now have a database populated with a few users. Let's query this database from the REPL:

scala> import com.mongodb.casbah.Imports._
import com.mongodb.casbah.Imports._

scala> val collection = MongoClient()("github")("users")
MongoCollection = users

scala> val maybeUser = collection.findOne
Option[collection.T] = Some({ "_id" : { "$oid" : "562e922546f953739c43df02"} , "github_id" : 1 , "login" : "mojombo" , "repos" : ...

The findOne method returns a single DBObject wrapped in an Option, or None if the collection is empty. Since we know that our collection is not empty, we can use the get method to extract the object:

scala> val user = maybeUser.get
collection.T = { "_id" : { "$oid" : "562e922546f953739c43df02"} , "github_id" : 1 , "login" : "mojombo" , "repos" : ...
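Calling get on None throws a NoSuchElementException, so outside the REPL it is usually safer to handle the empty case explicitly. The following is a minimal sketch of this idea. It uses a plain Map as a stand-in for DBObject so that it runs without a database; FindOneSketch and describeUser are hypothetical names introduced for illustration:

```scala
object FindOneSketch {
  // A plain Map stands in for a DBObject here (an assumption made so that
  // the example runs without MongoDB or Casbah).
  type Doc = Map[String, AnyRef]

  /** Describe the result of a findOne-style lookup without calling .get. */
  def describeUser(maybeUser: Option[Doc]): String = maybeUser match {
    case Some(user) => s"Found a user with ${user.size} field(s)"
    case None       => "The collection is empty"
  }
}
```

Pattern matching forces us to decide what happens when the collection is empty, rather than discovering the problem at runtime.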

As you learned earlier in this chapter, DBObject is a map-like object with keys of type String and values of type AnyRef:

scala> user("login")
AnyRef = mojombo

In general, we want to restore compile-time type information as early as possible when importing objects from the database: we do not want to pass AnyRefs around when we can be more specific. We can use the getAs method to extract a field and cast it to a specific type:

scala> user.getAs[String]("login")
Option[String] = Some(mojombo)

If the field is missing in the document or if the value cannot be cast, getAs will return None:

scala> user.getAs[Int]("login")
Option[Int] = None
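To make this behavior concrete, here is a rough sketch of a getAs-like helper written against a plain Map. This is not Casbah's implementation: GetAsSketch, Doc, and readAs are hypothetical names, and the type test relies on a ClassTag rather than on anything Casbah-specific:

```scala
import scala.reflect.ClassTag

object GetAsSketch {
  // Hypothetical stand-in for a document: a plain Map, not a DBObject.
  type Doc = Map[String, Any]

  /** Return the field as a T, or None if the field is missing
    * or holds a value of a different type.
    */
  def readAs[T: ClassTag](doc: Doc, key: String): Option[T] =
    doc.get(key).collect {
      case value: T => value // runtime type test performed via the ClassTag
    }
}
```

With a document such as Map("login" -> "mojombo"), readAs[String] on "login" returns the value wrapped in Some, whereas readAs[Int] on the same key returns None, mirroring the two REPL calls above.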

The astute reader may note that the interface provided by getAs[T] is similar to the read[T] method that we defined on a JDBC result set in Chapter 5, Scala and SQL through JDBC.

If getAs fails (for instance, because the field is missing), we can use the orElse method on the returned Option to recover:

scala> val loginName = user.getAs[String]("login") orElse {       
  println("No login field found. Falling back to 'name'")
  user.getAs[String]("name")
}
loginName: Option[String] = Some(mojombo)
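The argument to Option's orElse is passed by name, so the fallback block only runs when the first lookup yields None. The following small sketch demonstrates this with plain Options; OrElseSketch, fallback, and resolve are hypothetical names used only for this illustration:

```scala
object OrElseSketch {
  // Counts how many times the fallback has been evaluated.
  var fallbackCalls = 0

  def fallback(): Option[String] = {
    fallbackCalls += 1
    Some("fallback-name")
  }

  /** Prefer `primary`; the fallback is evaluated only if `primary` is None. */
  def resolve(primary: Option[String]): Option[String] =
    primary orElse fallback()
}
```

This is why the message in the REPL session above is not printed when the login field is present.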

The getAsOrElse method allows us to substitute a default value if the field is missing or the cast fails:

scala> user.getAsOrElse[Int]("id", 5)
Int = 1392879

Note that we can also use getAsOrElse to throw an exception:

scala> user.getAsOrElse[String]("name", 
  throw new IllegalArgumentException(
    "Missing value for name")
)
java.lang.IllegalArgumentException: Missing value for name
...
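Throwing from the default position only makes sense if the default is evaluated lazily, as it is for Option's getOrElse. The sketch below assumes that getAsOrElse behaves the same way and reproduces the pattern on a plain Map; DefaultSketch and fieldOrElse are hypothetical names, not part of Casbah:

```scala
object DefaultSketch {
  // Hypothetical stand-in for a document with String fields only.
  type Doc = Map[String, String]

  /** Return the field if present. `default` is a by-name parameter,
    * so it is evaluated (and may throw) only when the field is absent.
    */
  def fieldOrElse(doc: Doc, key: String, default: => String): String =
    doc.get(key).getOrElse(default)
}
```

Because the type of a throw expression is Nothing, it conforms to any expected type, which is why it can be passed where a String default is required.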

Arrays embedded in documents can be cast to List[T] objects, where T is the type of elements in the array:

scala> user.getAsOrElse[List[DBObject]]("repos",
  List.empty[DBObject])
List[DBObject] = List({ "github_id" : 26899533 , "name" : "30daysoflaptops.github.io" ...
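The accessors above can be combined to convert a whole document into a strongly typed value in one place. The following is a sketch under stated assumptions: a plain Map stands in for DBObject, and UserSketch, User, and toUser are hypothetical names that do not appear elsewhere in this chapter:

```scala
object UserSketch {
  // Hypothetical stand-ins: a Map for the document and a small case class.
  type Doc = Map[String, Any]
  case class User(login: String, githubId: Int, repoCount: Int)

  /** Build a User, or return None if any field is missing or mistyped. */
  def toUser(doc: Doc): Option[User] =
    for {
      login <- doc.get("login").collect { case s: String => s }
      id    <- doc.get("github_id").collect { case i: Int => i }
      count <- doc.get("repos").collect { case rs: List[_] => rs.size }
    } yield User(login, id, count)
}
```

The for comprehension short-circuits on the first None, so the rest of the program only ever sees fully formed User instances.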

Retrieving a single document at a time is not very useful. To retrieve all the documents in a collection, use the .find method:

scala> val userIterator = collection.find()
userIterator: collection.CursorType = non-empty iterator

This returns an iterator of DBObjects. To actually fetch the documents from the database, you need to materialize the iterator by transforming it into a collection, using, for instance, .toList:

scala> val userList = userIterator.toList
List[DBObject] = List({ "_id" : { "$oid": ...
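The laziness of Scala iterators can be seen without a database. The sketch below uses an Iterator[Int] in place of a cursor (it does not model how the MongoDB driver batches results); LazyIteratorSketch and countingMap are hypothetical names:

```scala
object LazyIteratorSketch {
  // Counts how many elements have actually been processed.
  var processed = 0

  /** Map over an iterator, recording each element as it is consumed. */
  def countingMap(source: Iterator[Int]): Iterator[Int] =
    source.map { x =>
      processed += 1
      x * 2
    }
}
```

No element is processed until the iterator is consumed, for instance with .toList. An iterator can also be traversed only once: after materializing it, it is empty.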

Let's bring all of this together. We will write a toy program that prints the average number of repositories per user in our collection. The code works by fetching every document in the collection, extracting the number of repositories from each document, and then averaging over these:

// RepoNumber.scala

import com.mongodb.casbah.Imports._

object RepoNumber {

  /** Extract the number of repos from a DBObject
    * representing a user.
    */   
  def extractNumber(obj:DBObject):Option[Int] = {
    val repos = obj.getAs[List[DBObject]]("repos") orElse {
      println("Could not find or parse 'repos' field")
      None
    }
    repos.map { _.size }
  }

  val collection = MongoClient()("github")("users")

  def main(args: Array[String]): Unit = {
    val userIterator = collection.find()

    // Convert from documents to Option[Int]
    val repoNumbers = userIterator.map { extractNumber }

    // Convert from Option[Int] to Int
    val wellFormattedNumbers = repoNumbers.collect { 
      case Some(v) => v 
    }.toList

    // Calculate summary statistics
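    // Use sum rather than reduce: reduce throws on an empty list,
    // which would prevent the count == 0 check below from being reached.
    val sum = wellFormattedNumbers.sum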
    val count = wellFormattedNumbers.size
    
    if (count == 0) {
      println("No repos found")
    }
    else {
      val mean = sum.toDouble / count.toDouble
      println(s"Total number of users with repos: $count")
      println(s"Total number of repos: $sum")
      println(s"Mean number of repos: $mean")
    }
  }
}

Let's run this through SBT:

> runMain RepoNumber
Total number of users with repos: 500
Total number of repos: 9649
Mean number of repos: 19.298

The code starts with the extractNumber function, which extracts the number of repositories from each DBObject. The return value is None if the document does not contain the repos field, or if that field cannot be cast to a list of DBObjects.

The main body of the code starts by creating an iterator over the DBObjects in the collection. This iterator is then mapped through the extractNumber function, which transforms it into an iterator of Option[Int]. We then run .collect on this iterator to keep only the values that are not None, converting from Option[Int] to Int in the process. Only then do we materialize the iterator to a list using .toList. The resulting list, wellFormattedNumbers, has the List[Int] type. Finally, we compute the mean of this list and print it to the screen.
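The filtering and averaging steps can be isolated from the database and tested on their own. The following sketch makes the empty case explicit by returning an Option from the mean; MeanSketch, defined, and mean are hypothetical names introduced for this illustration:

```scala
object MeanSketch {
  /** Keep only the defined values, as the collect step in the text does. */
  def defined(values: Iterator[Option[Int]]): List[Int] =
    values.collect { case Some(v) => v }.toList

  /** Mean of a list of integers, or None when the list is empty. */
  def mean(xs: List[Int]): Option[Double] =
    if (xs.isEmpty) None
    else Some(xs.sum.toDouble / xs.size)
}
```

Returning Option[Double] lets the caller decide how to report an empty collection, instead of dividing by zero or failing midway through the calculation.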

Note that, besides the extractNumber function, none of this program deals with Casbah-specific types: the iterator returned by .find() is just a Scala iterator. This makes Casbah straightforward to use: the only data type that you need to familiarize yourself with is DBObject (compare this with JDBC's ResultSet, which we had to explicitly wrap in a stream, for instance).
