We now have a database populated with a few users. Let's query this database from the REPL:
scala> import com.mongodb.casbah.Imports._
import com.mongodb.casbah.Imports._

scala> val collection = MongoClient()("github")("users")
MongoCollection = users

scala> val maybeUser = collection.findOne
Option[collection.T] = Some({ "_id" : { "$oid" : "562e922546f953739c43df02"} , "github_id" : 1 , "login" : "mojombo" , "repos" : ...
The findOne method returns a single DBObject wrapped in an Option, or None if the collection is empty. We can use the get method to extract the object. Note that get throws a NoSuchElementException when called on None, so it is only safe here because we know the collection is populated:
scala> val user = maybeUser.get
collection.T = { "_id" : { "$oid" : "562e922546f953739c43df02"} , "github_id" : 1 , "login" : "mojombo" , "repos" : ...
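Since get fails on an empty Option, it is often safer to handle both cases explicitly. The following is a minimal sketch in plain Scala; the Doc type is a hypothetical stand-in (an ordinary Map, not a Casbah DBObject) so that the example runs without a database, and the function names are illustrative only:

```scala
object OptionHandling {
  // Hypothetical stand-in for a document; not a Casbah type.
  type Doc = Map[String, Any]

  // Pattern matching makes both the Some and the None case explicit.
  def describe(maybeUser: Option[Doc]): String = maybeUser match {
    case Some(user) => s"Found user with ${user.size} fields"
    case None => "The collection is empty"
  }

  // flatMap looks up the field only if a document is present.
  def loginOf(maybeUser: Option[Doc]): Option[Any] =
    maybeUser.flatMap { _.get("login") }

  def main(args: Array[String]): Unit = {
    val found: Option[Doc] = Some(Map("github_id" -> 1, "login" -> "mojombo"))
    val empty: Option[Doc] = None
    println(describe(found))
    println(describe(empty))
    println(loginOf(found))
  }
}
```

The same pattern match should apply unchanged to the Option returned by findOne, since it only relies on the standard Option type.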
As you learned earlier in this chapter, DBObject is a map-like object with keys of type String and values of type AnyRef:
scala> user("login")
AnyRef = mojombo
In general, we want to restore compile-time type information as early as possible when importing objects from the database: we do not want to pass AnyRef values around when we can be more specific. We can use the getAs method to extract a field and cast it to a specific type:
scala> user.getAs[String]("login")
Option[String] = Some(mojombo)
If the field is missing in the document or if the value cannot be cast, getAs will return None:
scala> user.getAs[Int]("login")
Option[Int] = None
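To make the "missing or wrong type gives None" behavior concrete, here is a minimal sketch that imitates it in plain Scala. It is not Casbah's implementation: Doc is a hypothetical stand-in for a document, and getString, getInt, User, and toUser are illustrative names. The sketch also shows one way to restore type information early, by converting a document to a case class as soon as it is read:

```scala
object TypedFields {
  // Hypothetical stand-in for a document; not a Casbah type.
  type Doc = Map[String, Any]

  // Some(value) only if the key exists and holds a String.
  def getString(doc: Doc, key: String): Option[String] =
    doc.get(key).collect { case s: String => s }

  // Some(value) only if the key exists and holds an Int.
  def getInt(doc: Doc, key: String): Option[Int] =
    doc.get(key).collect { case i: Int => i }

  // Hypothetical typed representation of a user.
  case class User(githubId: Int, login: String)

  // Yields None as soon as any field is missing or has the wrong type.
  def toUser(doc: Doc): Option[User] =
    for {
      id <- getInt(doc, "github_id")
      login <- getString(doc, "login")
    } yield User(id, login)

  def main(args: Array[String]): Unit = {
    val doc: Doc = Map("github_id" -> 1, "login" -> "mojombo")
    println(getString(doc, "login")) // field present with the right type
    println(getInt(doc, "login"))    // field present with the wrong type
    println(getInt(doc, "missing"))  // field absent
    println(toUser(doc))
  }
}
```

With Casbah, the two helper functions would presumably be replaced by calls to getAs[String] and getAs[Int], leaving the for-comprehension as it is.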
The astute reader may note that the interface provided by getAs[T] is similar to the read[T] method that we defined on a JDBC result set in Chapter 5, Scala and SQL through JDBC.
If getAs fails (for instance, because the field is missing), we can use the orElse method of Option to recover. The block passed to orElse is only evaluated when the first lookup returns None:
scala> val loginName = user.getAs[String]("login") orElse {
  println("No login field found. Falling back to 'name'")
  user.getAs[String]("name")
}
loginName: Option[String] = Some(mojombo)
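The session above prints no fallback message because the login field was found. A minimal sketch in plain Scala of why this happens: orElse takes its argument by name, so the fallback runs only when the first Option is empty. The fallback function and the fallbackCalls counter are hypothetical names introduced for illustration:

```scala
object Fallbacks {
  // Counts how often the fallback is actually evaluated.
  var fallbackCalls = 0

  def fallback(): Option[String] = {
    fallbackCalls += 1
    Some("fallback-name")
  }

  def main(args: Array[String]): Unit = {
    val present: Option[String] = Some("mojombo")
    val missing: Option[String] = None

    // The fallback is not evaluated here, since a value is present.
    println(present orElse fallback())

    // The fallback is evaluated here, since there is no value.
    println(missing orElse fallback())

    // getOrElse unwraps the Option, substituting a plain default.
    println(missing.getOrElse("unknown"))

    println(s"Fallback evaluated $fallbackCalls time(s)")
  }
}
```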
The getAsOrElse method allows us to substitute a default value if the field is missing or the cast fails:
scala> user.getAsOrElse[Int]("github_id", 5)
Int = 1
Note that we can also use getAsOrElse to throw an exception:
scala> user.getAsOrElse[String]("name",
  throw new IllegalArgumentException(
    "Missing value for name")
)
java.lang.IllegalArgumentException: Missing value for name
...
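Passing a throw expression as the default is only useful if the default is evaluated lazily, that is, only when the field cannot be read. The sketch below imitates this in plain Scala, assuming that getAsOrElse takes its default by name in the same way that Option.getOrElse does. Doc and getStringOrElse are hypothetical stand-ins, not Casbah definitions:

```scala
import scala.util.Try

object DefaultsSketch {
  // Hypothetical stand-in for a document; not a Casbah type.
  type Doc = Map[String, Any]

  // The default is a by-name parameter (=> String): it is evaluated
  // only if the key is missing or does not hold a String.
  def getStringOrElse(doc: Doc, key: String, default: => String): String =
    doc.get(key).collect { case s: String => s }.getOrElse(default)

  def main(args: Array[String]): Unit = {
    val doc: Doc = Map("login" -> "mojombo")

    // The field exists, so the throw expression is never evaluated.
    println(getStringOrElse(doc, "login",
      throw new IllegalArgumentException("Missing value for login")))

    // The field is missing, so the exception is thrown. Wrapping the
    // call in Try turns the exception into a value we can inspect.
    val attempt = Try(getStringOrElse(doc, "name",
      throw new IllegalArgumentException("Missing value for name")))
    println(attempt.isFailure)
  }
}
```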
Arrays embedded in documents can be cast to List[T] objects, where T is the type of elements in the array:
scala> user.getAsOrElse[List[DBObject]]("repos", List.empty[DBObject])
List[DBObject] = List({ "github_id" : 26899533 , "name" : "30daysoflaptops.github.io" ...
Retrieving a single document at a time is not very useful. To retrieve all the documents in a collection, use the .find method:
scala> val userIterator = collection.find()
userIterator: collection.CursorType = non-empty iterator
This returns an iterator of DBObjects. To actually fetch the documents from the database, you need to materialize the iterator by transforming it into a collection, using, for instance, .toList:
scala> val userList = userIterator.toList
List[DBObject] = List({ "_id" : { "$oid": ...
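The distinction between building an iterator and materializing it can be illustrated with an ordinary Scala iterator. This is a minimal sketch that does not touch the database: the demo function is a hypothetical name, and the counter stands in for the work of fetching a document:

```scala
object LazyIterators {
  /** Returns the number of elements processed before materialization,
    * the number processed after it, and the materialized list. */
  def demo(): (Int, Int, List[Int]) = {
    var processed = 0

    // map on an iterator is lazy: no element is processed yet.
    val mapped = Iterator(1, 2, 3).map { x =>
      processed += 1
      x * 2
    }
    val before = processed

    // toList walks the iterator, processing every element.
    val materialized = mapped.toList

    (before, processed, materialized)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, list) = demo()
    println(s"Processed before toList: $before")
    println(s"Processed after toList: $after")
    println(list)
  }
}
```

An iterator can only be traversed once, so it is worth keeping a reference to the materialized list if the documents are needed more than once.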
Let's bring all of this together. We will write a toy program that prints the average number of repositories per user in our collection. The code works by fetching every document in the collection, extracting the number of repositories from each document, and then averaging over these:
// RepoNumber.scala

import com.mongodb.casbah.Imports._

object RepoNumber {

  /** Extract the number of repos from a DBObject
    * representing a user.
    */
  def extractNumber(obj: DBObject): Option[Int] = {
    val repos = obj.getAs[List[DBObject]]("repos") orElse {
      println("Could not find or parse 'repos' field")
      None
    }
    repos.map { _.size }
  }

  val collection = MongoClient()("github")("users")

  def main(args: Array[String]): Unit = {
    val userIterator = collection.find()

    // Convert from documents to Option[Int]
    val repoNumbers = userIterator.map { extractNumber }

    // Convert from Option[Int] to Int
    val wellFormattedNumbers = repoNumbers.collect {
      case Some(v) => v
    }.toList

    // Calculate summary statistics.
    // sum returns 0 for an empty list, whereas reduce would throw
    // before we reach the emptiness check below.
    val sum = wellFormattedNumbers.sum
    val count = wellFormattedNumbers.size

    if (count == 0) {
      println("No repos found")
    }
    else {
      val mean = sum.toDouble / count.toDouble
      println(s"Total number of users with repos: $count")
      println(s"Total number of repos: $sum")
      println(s"Mean number of repos: $mean")
    }
  }
}
Let's run this through SBT:
> runMain RepoNumber
Total number of users with repos: 500
Total number of repos: 9649
Mean number of repos: 19.298
The code starts with the extractNumber function, which extracts the number of repositories from each DBObject. The return value is None if the document does not contain the repos field, or if that field cannot be read as a list of documents.
The main body of the code starts by creating an iterator over the DBObjects in the collection. This iterator is then mapped through the extractNumber function, which transforms it into an iterator of Option[Int]. We then run .collect on this iterator to keep all the values that are not None, converting from Option[Int] to Int in the process. Only then do we materialize the iterator to a list using .toList. The resulting list, wellFormattedNumbers, has the List[Int] type. We then just take the mean of this list and print it to the screen.
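The last two steps, dropping the None values and averaging, can be isolated from the database code and tested on their own. The following is a minimal sketch in plain Scala; SummaryStats and mean are hypothetical names, and returning Option[Double] is one possible way to represent the empty case without a special branch at the call site:

```scala
object SummaryStats {
  // The mean is undefined for an empty sequence, so return None then.
  def mean(values: Seq[Int]): Option[Double] =
    if (values.isEmpty) None
    else Some(values.sum.toDouble / values.size)

  def main(args: Array[String]): Unit = {
    // Stand-in for the iterator of Option[Int] produced by extractNumber.
    val repoNumbers: Iterator[Option[Int]] = Iterator(Some(3), None, Some(5))

    // Keep only the defined values, unwrapping them in the process.
    val wellFormattedNumbers = repoNumbers.collect { case Some(v) => v }.toList

    mean(wellFormattedNumbers) match {
      case Some(m) => println(s"Mean number of repos: $m")
      case None => println("No repos found")
    }
  }
}
```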
Note that, besides the extractNumber function, none of this program deals with Casbah-specific types: the iterator returned by .find() is just a Scala iterator. This makes Casbah straightforward to use: the only data type that you need to familiarize yourself with is DBObject (compare this with JDBC's ResultSet, which we had to explicitly wrap in a stream, for instance).