Chapter 3. Writing programs using MongoDB

This chapter covers

Introducing the MongoDB API through Ruby
Understanding how the drivers work
Using the BSON format and MongoDB network protocol
Building a complete sample application

It’s time to get practical. Though there’s much to learn from experimenting with the MongoDB shell, you can see the real value of this database only after you’ve built something with it. That means jumping into programming and taking a first look at the MongoDB drivers. As mentioned before, MongoDB, Inc. provides officially supported, Apache-licensed MongoDB drivers for all of the most popular programming languages. The driver examples in the book use Ruby, but the principles we’ll illustrate are universal and easily transferable to other drivers. Throughout the book we’ll illustrate most commands with the JavaScript shell, but examples of using MongoDB from within an application will be in Ruby.

We’re going to explore programming in MongoDB in three stages. First, you’ll install the MongoDB Ruby driver and we’ll introduce the basic CRUD (create, read, update, delete) operations. This process should go quickly and feel familiar because the driver API is similar to that of the shell. Next, we’re going to delve deeper into the driver, explaining how it interfaces with MongoDB. Without getting too low-level, this section will show you what’s going on behind the scenes with the drivers in general. Finally, you’ll develop a simple Ruby application for monitoring Twitter. Working with a real-world data set, you’ll begin to see how MongoDB works in the wild. This final section will also lay the groundwork for the more in-depth examples presented in part 2 of the book.

New to Ruby?

Ruby is a popular and readable scripting language. The code examples have been designed to be as explicit as possible so that even programmers unfamiliar with Ruby can benefit. Any Ruby idioms that may be hard to understand will be explained in the book. If you’d like to spend a few minutes getting up to speed with Ruby, start with the official 20-minute tutorial at http://mng.bz/THR3.

3.1. MongoDB through the Ruby lens

Normally when you think of drivers, what comes to mind are low-level bit manipulations and obtuse interfaces. Thankfully, the MongoDB language drivers are nothing like that; instead, they’ve been designed with intuitive, language-sensitive APIs so that many applications can sanely use a MongoDB driver as the sole interface to the database. The driver APIs are also fairly consistent across languages, which means that developers can easily move between languages as needed; anything you can do in the JavaScript API, you can do in the Ruby API. If you’re an application developer, you can expect to find yourself comfortable and productive with any of the MongoDB drivers without having to concern yourself with low-level implementation details.

In this first section, you’ll install the MongoDB Ruby driver, connect to the database, and learn how to perform basic CRUD operations. This will lay the groundwork for the application you’ll build at the end of the chapter.

3.1.1. Installing and connecting

You can install the MongoDB Ruby driver using RubyGems, Ruby’s package management system.

Many newer operating systems come with Ruby already installed. You can check if you already have Ruby installed by running ruby --version from your shell. If you don’t have Ruby installed on your system, you can find detailed installation instructions at www.ruby-lang.org/en/downloads.

You’ll also need Ruby’s package manager, RubyGems. You may already have this as well; check by running gem --version. Instructions for installing RubyGems can be found at http://docs.rubygems.org/read/chapter/3. Once you have RubyGems installed, run:

gem install mongo

This should install both the mongo and bson^[1] gems. You should see output like the following (the version numbers will likely be newer than what’s shown here):

¹
BSON, explained in the next section, is the JSON-inspired binary format that MongoDB uses to represent documents. The bson Ruby gem serializes Ruby objects to and from BSON.

Fetching: bson-3.2.1.gem (100%)
Building native extensions.  This could take a while...
Successfully installed bson-3.2.1
Fetching: mongo-2.0.6.gem (100%)
Successfully installed mongo-2.0.6
2 gems installed

We also recommend you install the bson_ext gem, though this is optional. bson_ext is an official gem that contains a C implementation of BSON, enabling more efficient handling of BSON in the MongoDB driver. This gem isn’t installed by default because installation requires a compiler. Rest assured, if you’re unable to install bson_ext, your programs will still work as intended.

You’ll start by connecting to MongoDB. First, make sure that mongod is running; starting the mongo shell is a quick way to confirm that you can connect. Next, create a file called connect.rb and enter the following code:

require 'rubygems'
require 'mongo'

$client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'tutorial')
Mongo::Logger.logger.level = ::Logger::ERROR
$users = $client[:users]
puts 'connected!'

The first two require statements ensure that you’ve loaded the driver. The remaining lines instantiate the client to localhost and connect to the tutorial database, set the driver’s log level to ERROR so that later operations produce less output, store a reference to the users collection in the $users variable, and print the string connected!. We place a $ in front of each variable to make it global so that it’ll be accessible outside of the connect.rb script. Save the file and run it:

$ ruby connect.rb
D, [2015-06-05T12:32:38.843933 #33946] DEBUG -- : MONGODB | Adding
     127.0.0.1:27017 to the cluster. | runtime: 0.0031ms
D, [2015-06-05T12:32:38.847534 #33946] DEBUG -- : MONGODB | COMMAND |
     namespace=admin.$cmd selector={:ismaster=>1} flags=[] limit=-1 skip=0
     project=nil | runtime: 3.4170ms
connected!

If no exceptions are raised, you’ve successfully connected to MongoDB from Ruby and you should see connected! printed to your shell. That may not seem glamorous, but connecting is the first step in using MongoDB from any language. Next, you’ll use that connection to insert some documents.

3.1.2. Inserting documents in Ruby

To run interesting MongoDB queries you first need some data, so let’s create some (this is the C in CRUD). All of the MongoDB drivers are designed to use the most natural document representation for their language. In JavaScript, JSON objects are the obvious choice, because JSON is a document data structure; in Ruby, the hash data structure makes the most sense. The native Ruby hash differs from a JSON object in only a couple of small ways; most notably, where JSON separates keys and values with a colon, Ruby uses a hash rocket (=>).^[2]

²
In Ruby 1.9, you may optionally use a colon as the key-value separator, like hash = {foo: 'bar'}, but we’ll stick with the hash rocket in the interest of backward compatibility.
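
To make the comparison concrete, here is a minimal sketch that needs only Ruby’s standard json library, not MongoDB. It builds the same document with both separators and shows that they serialize to identical JSON, even though the colon form creates symbol keys rather than string keys. The variable names are ours, chosen for illustration.

```ruby
require 'json'

# Hash-rocket syntax with string keys, as used throughout this chapter.
rocket = {"last_name" => "smith", "age" => 30}

# Ruby 1.9+ colon syntax. Note that the keys are symbols, not strings.
colon = {last_name: "smith", age: 30}

# Both hashes serialize to the same JSON text.
rocket_json = rocket.to_json
colon_json  = colon.to_json

puts rocket_json                 # {"last_name":"smith","age":30}
puts rocket_json == colon_json   # true
puts rocket == colon             # false: string keys differ from symbol keys
```

The last line is the detail worth remembering: the two Ruby hashes are not equal to each other, so pick one key style and use it consistently in your own code.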

If you’re following along, you can continue adding code to the connect.rb file. Alternatively, a nice approach is to use Ruby’s interactive shell, irb. irb is a REPL (Read, Evaluate, Print Loop) console, in which you can type Ruby code to have it dynamically executed, making it ideal for experimentation. Anything you write in irb can be put in a script, so we recommend using it to learn new things, then copying your commands when you’d like them executed in a program. You can launch irb and require connect.rb so that you’ll immediately have access to the connection, database, and collection objects initialized therein. You can then run Ruby code and receive immediate feedback. Here’s an example:

$ irb -r ./connect.rb
irb(main):017:0> id = $users.insert_one({"last_name" => "mtsouk"})
=> #<Mongo::Operation::Result:70275279152800 documents=[{"ok"=>1, "n"=>1}]>
irb(main):014:0> $users.find().each do |user|
irb(main):015:1* puts user
irb(main):016:1> end
{"_id"=>BSON::ObjectId('55e3ee1c5ae119511d000000'), "last_name"=>"knuth"}
{"_id"=>BSON::ObjectId('55e3f13d5ae119516a000000'), "last_name"=>"mtsouk"}
=> #<Enumerator: #<Mongo::Cursor:0x70275279317980
@view=#<Mongo::Collection::View:0x70275279322740 namespace='tutorial.users
@selector={} @options={}>>:each>

irb gives you a command line shell with a prompt followed by > (this may look a little different on your machine). The prompt allows you to type in commands; in the previous code, the user input is the text that follows each prompt. When you run a command in irb it will print out the value returned by the command, if there is one; that’s what is shown after => above.

Let’s build some documents for your users collection. You’ll create two documents representing two users, Smith and Jones. Each document, expressed as a Ruby hash, is assigned to a variable:

smith = {"last_name" => "smith", "age" => 30}
jones = {"last_name" => "jones", "age" => 40}

To save the documents, you’ll pass them to the collection’s insert_one method. Each call to insert_one returns a result object (like the one shown in the irb session above), which you’ll store in a variable:

smith_id = $users.insert_one(smith)
jones_id = $users.insert_one(jones)

You can verify that the documents have been saved with a simple query using the users collection’s find() method:

irb(main):013:0> $users.find("age" => {"$gt" => 20}).to_a
=> [{"_id"=>BSON::ObjectId('55e3f7dd5ae119516a000002'), "last_name"=>"smith",
     "age"=>30}, {"_id"=>BSON::ObjectId('55e3f7e25ae119516a000003'),
     "last_name"=>"jones", "age"=>40}]

The return values for these queries will appear at the prompt if run in irb. If the code is being run from a Ruby file, prepend Ruby’s p method to print the output to the screen:

p $users.find( :age => {"$gt" => 20}).to_a

You’ve successfully inserted two documents from Ruby. Let’s now take a closer look at queries.

3.1.3. Queries and cursors

Now that you’ve created documents, it’s on to the read operations (the R in CRUD) provided by MongoDB. The Ruby driver defines a rich interface for accessing data and handles most of the details for you. The queries we show in this section are fairly simple selections, but keep in mind that MongoDB allows more complex queries, such as text searches and aggregations, which are described in later chapters.

You’ll see how this is so by looking at the standard find method. Here are two possible find operations on your data set:

$users.find({"last_name" => "smith"}).to_a
$users.find({"age" => {"$gt" => 30}}).to_a

The first query searches for all user documents where the last_name is smith, and the second query matches all documents where age is greater than 30. Try entering the second query in irb:

2.1.4 :020 > $users.find({"age" => {"$gt" => 30}})
 => #<Mongo::Collection::View:0x70210212601420 namespace='tutorial.users
@selector={"age"=>{"$gt"=>30}} @options={}>

The results are returned in a Mongo::Collection::View object, which extends Iterable and makes it easy to iterate through the results. We’ll discuss cursors in more detail in Section 3.2.3. In the meantime, you can fetch the results of the $gt query:

cursor = $users.find({"age" => {"$gt" => 30}})
cursor.each do |doc|
  puts doc["last_name"]
end

Here you use Ruby’s each iterator, which passes each result to a code block. The last_name attribute is then printed to the console. The $gt used in the query is a MongoDB operator; the $ character has no relation to the $ placed before global Ruby variables like $users. Also, if there are any documents in the collection without last_name, you might notice that nil (Ruby’s null value) is printed out; this indicates the lack of a value and it’s normal to see this.

The fact that you even have to think about cursors here may come as a surprise given the shell examples from the previous chapter. But the shell uses cursors the same way every driver does; the difference is that the shell automatically iterates over the first 20 cursor results when you call find(). To get the remaining results, you can continue iterating manually by entering the it command.
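
If the idea of pulling results in batches feels abstract, the following sketch imitates it in plain Ruby. It is only an analogy: the Enumerator stands in for a cursor, the 50 made-up documents stand in for a query result, and next_batch is a hypothetical helper, not part of the driver or the shell.

```ruby
# A stand-in for a cursor: documents are produced only when asked for.
cursor = Enumerator.new do |yielder|
  1.upto(50) { |n| yielder << {"n" => n} }
end

# Pull up to `size` documents, stopping early when the cursor is exhausted.
def next_batch(cursor, size = 20)
  batch = []
  size.times { batch << cursor.next }
  batch
rescue StopIteration
  batch
end

first  = next_batch(cursor)  # like the shell's initial output after find()
second = next_batch(cursor)  # like typing "it"
third  = next_batch(cursor)  # the final, shorter batch
puts [first.length, second.length, third.length].inspect  # [20, 20, 10]
```

Calling to_a or each on a driver view, as in the earlier examples, simply keeps pulling until nothing is left.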

3.1.4. Updates and deletes

Recall from chapter 2 that updates require at least two arguments: a query selector and an update document. Here’s a simple example using the Ruby driver:

$users.find({"last_name" => "smith"}).update_one({"$set" => {"city" =>
"Chicago"}})

This update finds the first user with a last_name of smith and, if found, sets the value of city to Chicago. This update uses the $set operator. You can run a query to show the change:

$users.find({"last_name" => "smith"}).to_a

The view allows you to decide whether you only want to update one document or all documents matching the query. In the preceding example, even if you had several users with the last name of smith, only one document would be updated. To apply the update to a particular smith, you’d need to add more conditions to your query selector. But if you actually want to apply the update to all smith documents, you must replace the update_one with the update_many method:

$users.find({"last_name" => "smith"}).update_many({"$set" => {"city" =>
"Chicago"}})

Deleting data is much simpler. We’ve discussed how it works in the MongoDB shell, and the Ruby driver is similar. To review: you build a view with find and then call one of its delete methods. The query selector passed to find determines which documents are eligible for removal; with an empty selector, every document in the collection matches. Here, you’re removing a user document where the age attribute is greater than or equal to 40:

$users.find({"age" => {"$gte" => 40}}).delete_one

This deletes only the first document matching the criteria. If you want to delete all documents matching the criteria, you’d have to run this:

$users.find({"age" => {"$gte" => 40}}).delete_many

The drop method goes further and removes the entire collection, including any remaining documents:

$users.drop

3.1.5. Database commands

In the previous chapter you saw the centrality of database commands. There, we looked at the two stats commands. Here, we’ll look at how you can run commands from the driver using the listDatabases command as an example. This is one of a number of commands that must be run on the admin database, which is treated specially when authentication is enabled. For details on authentication and the admin database, see chapter 10.

First, you instantiate a Ruby database object referencing the admin database. You then pass the command’s query specification to the command method:

$admin_db = $client.use('admin')
$admin_db.command({"listDatabases" => 1})

Note that this code still depends on what we put in the connect.rb script above because it expects the MongoDB connection to be in $client. The response wraps a Ruby hash listing all the existing databases and their sizes on disk:

#<Mongo::Operation::Result:70112905054200 documents=[{"databases"=>[
{
    "name"=>"local",
    "sizeOnDisk"=>83886080.0,
    "empty"=>false
},
{
    "name"=>"tutorial",
    "sizeOnDisk"=>83886080.0,
    "empty"=>false
},
{
    "name"=>"admin",
    "sizeOnDisk"=>1.0,
    "empty"=>true
}], "totalSize"=>167772160.0, "ok"=>1.0}]>
 => nil

This may look a little different with your version of irb and the MongoDB driver, but it should still be easy to access. Once you get used to representing documents as Ruby hashes, the transition from the shell API is almost seamless.

Most drivers provide you convenient functionality that wraps database commands. You may recall from the previous chapter that remove doesn’t actually drop the collection. To drop a collection and all its indexes, use the collection’s drop method:

db = $client.use('tutorial')
db['users'].drop

It’s okay if you’re still feeling shaky about using MongoDB with Ruby; you’ll get more practice in section 3.3. But for now, we’re going to take a brief intermission to see how the MongoDB drivers work. This will shed more light on some of MongoDB’s design and prepare you to use the drivers effectively.

3.2. How the drivers work

At this point it’s natural to wonder what’s going on behind the scenes when you issue commands through a driver or via the MongoDB shell. In this section, you’ll see how the drivers serialize data and communicate it to the database.

All MongoDB drivers perform three major functions. First, they generate MongoDB object IDs. These are the default values stored in the _id field of all documents. Next, the drivers convert any language-specific representation of documents to and from BSON, the binary data format used by MongoDB. In the previous examples, the driver serializes all the Ruby hashes into BSON and then deserializes the BSON that’s returned from the database back to Ruby hashes.

The drivers’ final function is to communicate with the database over a TCP socket using the MongoDB wire protocol. The details of the protocol are beyond the scope of this discussion. But the style of socket communication, in particular whether writes on the socket wait for a response, is important, and we’ll explore the topic in this section.

3.2.1. Object ID generation

Every MongoDB document requires a primary key. That key, which must be unique for all documents in each collection, is stored in the document’s _id field. Developers are free to use their own custom values as the _id, but when not provided, a MongoDB object ID will be used. Before sending a document to the server, the driver checks whether the _id field is present. If the field is missing, an object ID will be generated and stored as _id.

MongoDB object IDs are designed to be globally unique, meaning they’re guaranteed to be unique within a certain context. How can this be guaranteed? Let’s examine this in more detail.

You’ve probably seen object IDs in the wild if you’ve inserted documents into MongoDB, and at first glance they appear to be a string of mostly random text, like 4c291856238d3b19b2000001. You may not have realized that this text is the hex representation of 12 bytes, and actually stores some useful information. These bytes have a specific structure, as illustrated in figure 3.1.

Figure 3.1. MongoDB object ID format

The most significant four bytes carry a standard Unix (epoch) timestamp^[3]. The next three bytes store the machine ID, which is followed by a two-byte process ID. The final three bytes store a process-local counter that’s incremented each time an object ID is generated. The counter means that IDs generated in the same process and second won’t be duplicated.

³
Many Unix machines (we’re including Linux when we say Unix machine) store time values in a format called Unix Time or POSIX time; they just count up the number of seconds since 00:00 on January 1st, 1970, called the epoch. This means that a timestamp can be stored as an integer. For example, 2010-06-28 21:47:02 is represented as 1277761622 (or 0x4c291856 in hexadecimal), the number of seconds since the epoch.
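
You can check this layout yourself without the driver. The following sketch slices the sample ID from this section into its four fields using plain Ruby string and integer methods; decode_object_id is a hypothetical helper written for this illustration, not a driver method.

```ruby
# Split a 24-character hex object ID into the four fields of the layout.
def decode_object_id(hex)
  {
    :timestamp  => Time.at(hex[0, 8].to_i(16)).utc,  # bytes 0-3
    :machine_id => hex[8, 6],                        # bytes 4-6
    :process_id => hex[14, 4].to_i(16),              # bytes 7-8
    :counter    => hex[18, 6].to_i(16)               # bytes 9-11
  }
end

parts = decode_object_id('4c291856238d3b19b2000001')
puts parts[:timestamp]   # 2010-06-28 21:47:02 UTC
puts parts[:machine_id]  # 238d3b
puts parts[:process_id]  # 6578
puts parts[:counter]     # 1
```

The first eight hex characters, 4c291856, are exactly the hexadecimal timestamp from the footnote.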

Why does the object ID have this format? It’s important to understand that these IDs are generated in the driver, not on the server. This is different than many RDBMSs, which increment a primary key on the server, thus creating a bottleneck for the server generating the key. If more than one driver is generating IDs and inserting documents, they need a way of creating unique identifiers without talking to each other. Thus, the timestamp, machine ID, and process ID are included in the identifier itself to make it extremely unlikely that IDs will overlap.

You may already be considering the odds of this happening. In practice, you would encounter other limits before inserting documents at the rate required to overflow the counter for a given second (2²⁴, or roughly 16.7 million, per second). It’s slightly more conceivable (though still unlikely) to imagine that if you had many drivers distributed across many machines, two machines could have the same machine ID. For example, the Ruby driver uses the following:

@@machine_id = Digest::MD5.digest(Socket.gethostname)[0, 3]

For this to be a problem, they would still have to have started the MongoDB driver’s process with the same process ID, and have the same counter value in a given second. In practice, don’t worry about duplication; it’s extremely unlikely.
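
To tie the pieces together, here is a rough sketch of a generator that follows the 12-byte layout described in this section. It is not the driver’s implementation: the SketchObjectId class and its next_hex method are hypothetical names, and the only line taken from the driver is the machine ID expression quoted above.

```ruby
require 'digest/md5'
require 'socket'

# Illustrative generator following the 12-byte layout from this section.
# This is NOT the driver's code; the class name is hypothetical.
class SketchObjectId
  MACHINE_ID  = Digest::MD5.digest(Socket.gethostname)[0, 3]  # 3 bytes
  COUNTER_MAX = 2**24                                         # 3-byte counter wraps here

  @counter = 0
  @lock    = Mutex.new

  def self.next_hex(time = Time.now)
    # Take the current counter value and advance it under a lock.
    count = @lock.synchronize do
      current = @counter
      @counter = (@counter + 1) % COUNTER_MAX
      current
    end
    bytes  = [time.to_i].pack('N')              # 4-byte big-endian timestamp
    bytes << MACHINE_ID                         # 3-byte machine ID
    bytes << [Process.pid & 0xFFFF].pack('n')   # 2-byte process ID
    bytes << [count].pack('N')[1, 3]            # low 3 bytes of the counter
    bytes.unpack('H*').first                    # 24 hex characters
  end
end

first_id  = SketchObjectId.next_hex
second_id = SketchObjectId.next_hex
puts first_id
puts second_id
```

Two IDs generated back to back share the same machine and process bytes and differ in the counter, which is what keeps them distinct within a single second.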

One of the incidental benefits of using MongoDB object IDs is that they include a timestamp. Most of the drivers allow you to extract the timestamp, thus providing the document creation time, with resolution to the nearest second, for free. Using the Ruby driver, you can call an object ID’s generation_time method to get that ID’s creation time as a Ruby Time object:

irb> require 'mongo'
irb> id = BSON::ObjectId.from_string('4c291856238d3b19b2000001')
=> BSON::ObjectId('4c291856238d3b19b2000001')
irb> id.generation_time
=> 2010-06-28 21:47:02 UTC

Naturally, you can also use object IDs to issue range queries on object creation time. For instance, if you wanted to query for all documents created during June 2013, you could create two object IDs whose timestamps encode those dates and then issue a range query on _id. Because Ruby provides methods for generating object IDs from any Time object, the code for doing this is trivial:^[4]

⁴
This example will actually not work; it’s meant as a thoughtful exercise. By now you should have enough knowledge to create meaningful data for the query to return something. Why not take the time and try it out?

jun_id = BSON::ObjectId.from_time(Time.utc(2013, 6, 1))
jul_id = BSON::ObjectId.from_time(Time.utc(2013, 7, 1))
$users.find({'_id' => {'$gte' => jun_id, '$lt' => jul_id}})
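
Why does a range on _id behave like a range on creation time? Because the timestamp occupies the most significant bytes, so IDs compare in time order first. The sketch below demonstrates this with plain hex strings rather than the bson gem. The boundary_id_hex helper is hypothetical, and padding the remaining eight bytes with zeros is an assumption made for this illustration.

```ruby
# Hypothetical helper: a 24-character hex ID whose first four bytes encode
# the given time and whose remaining eight bytes are zero.
def boundary_id_hex(time)
  format('%08x', time.to_i) + '0' * 16
end

jun = boundary_id_hex(Time.utc(2013, 6, 1))
jul = boundary_id_hex(Time.utc(2013, 7, 1))

# A made-up ID created in mid-June. For equal-length lowercase hex strings,
# string order matches byte order, so a simple comparison is enough.
mid_june = format('%08x', Time.utc(2013, 6, 15, 12).to_i) + '238d3b19b2000001'
puts jun <= mid_june && mid_june < jul   # true
```

Any ID generated during June sorts between the two boundaries, whatever its machine ID, process ID, or counter happen to be.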

As mentioned before, you can also set your own value for _id. This might make sense in cases where one of the document’s fields is important and always unique. For instance, in a collection of users you could store the username in _id rather than an object ID. There are advantages to both ways, and it comes down to your preference as a developer.

3.3. Building a simple application

Next you’ll build a simple application for archiving and displaying Tweets. You can imagine this being a component in a larger application that allows users to keep tabs on search terms relevant to their businesses. This example will demonstrate how easy it is to consume JSON from an API like Twitter’s and convert that to MongoDB documents. If you were doing this with a relational database, you’d have to devise a schema in advance, probably consisting of multiple tables, and then declare those tables. Here, none of that’s required, yet you’ll still preserve the rich structure of the Tweet documents, and you’ll be able to query them effectively.

Let’s call the app TweetArchiver. TweetArchiver will consist of two components: the archiver and the viewer. The archiver will call the Twitter search API and store the relevant Tweets, and the viewer will display the results in a web browser.

3.3.1. Setting up

This application requires four Ruby libraries. The source code repository for this chapter includes a file called Gemfile, which lists these gems. Change your working directory to chapter3 and make sure an ls command shows the Gemfile. You can then install them from your system command line like this:

gem install bundler
bundle install

The first command ensures that the bundler gem is installed; the second installs the other gems using Bundler’s package management tools. Bundler is a widely used Ruby tool for ensuring that the gems you use match some predetermined versions: the versions that match our code examples.

Our Gemfile lists the mongo, twitter, bson and sinatra gems, so these will be installed. The mongo gem we’ve used already, but we include it to be sure we have the right version. The twitter gem is useful for communicating with the Twitter API. The sinatra gem is a framework for running a simple web server in Ruby, and we discuss it in more detail in section 3.3.3.

We provide the source code for this example separately, but introduce it gradually to help you understand it. We recommend you experiment and try new things to get the most out of the example.

It’ll be useful to have a configuration file that you can share between the archiver and viewer scripts. Create a file called config.rb (or copy it from the source code) that looks like this:

DATABASE_HOST   = 'localhost'
DATABASE_PORT   = 27017
DATABASE_NAME   = "twitter-archive"
COLLECTION_NAME = "tweets"
TAGS = ["#MongoDB", "#Mongo"]

CONSUMER_KEY    = "replace me"
CONSUMER_SECRET = "replace me"
TOKEN           = "replace me"
TOKEN_SECRET    = "replace me"

First you specify the names of the database and collection you’ll use for your application. Then you define an array of search terms, which you’ll send to the Twitter API.

Twitter requires that you register a free account and an application for accessing the API, which can be accomplished at http://apps.twitter.com. Once you’ve registered an application, you should see a page with its authentication information, perhaps on the API keys tab. You will also have to click the button that creates your access token. Use the values shown to fill in the consumer and API keys and secrets.

3.3.2. Gathering data

The next step is to write the archiver script. You start with a TweetArchiver class. You’ll instantiate the class with a search term. Then you’ll call the update method on the TweetArchiver instance, which issues a Twitter API call and saves the results to a MongoDB collection.

Let’s start with the class’s constructor:

def initialize(tag)
  client  = Mongo::Client.new(["#{DATABASE_HOST}:#{DATABASE_PORT}"],
                              :database => DATABASE_NAME)
  @tweets = client[COLLECTION_NAME]
  # Compound index: tags ascending, id descending
  @tweets.indexes.create_one({:tags => 1, :id => -1})
  @tag = tag
  @tweets_found = 0

  # Credentials come from the constants defined in config.rb
  @client = Twitter::REST::Client.new do |config|
    config.consumer_key        = CONSUMER_KEY
    config.consumer_secret     = CONSUMER_SECRET
    config.access_token        = TOKEN
    config.access_token_secret = TOKEN_SECRET
  end
end

The initialize method instantiates a MongoDB client connected to your database and the collection object you’ll use to store the Tweets.

You’re creating a compound index on tags ascending and id descending. Because you’re going to want to query for a particular tag and show the results from newest to oldest, an index with tags ascending and id descending will make that query use the index both for filtering results and for sorting them. As you can see here, you indicate index direction with 1 for ascending and -1 for descending. Don’t worry if this doesn’t make sense now—we discuss indexes with much greater depth in chapter 8.
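
To picture the ordering such an index imposes, you can sort a few made-up entries in plain Ruby. This is only an in-memory illustration of the sort order, not a description of how MongoDB stores indexes, and the sample tags and IDs are invented.

```ruby
# Made-up entries standing in for index keys (tags, id).
entries = [
  {:tags => "#MongoDB", :id => 101},
  {:tags => "#Mongo",   :id => 205},
  {:tags => "#MongoDB", :id => 307},
  {:tags => "#Mongo",   :id => 150}
]

# Order by tags ascending (1), then by id descending (-1).
ordered = entries.sort_by { |e| [e[:tags], -e[:id]] }

ordered.each { |e| puts "#{e[:tags]} #{e[:id]}" }
# #Mongo 205
# #Mongo 150
# #MongoDB 307
# #MongoDB 101
```

All entries for one tag sit together, already arranged from the largest id to the smallest, so a query for a single tag can read its results in the desired order without a separate sort.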

You’re also configuring the Twitter client with the authentication information from config.rb. This step hands these values to the Twitter gem, which will use them when calling the Twitter API. Ruby has somewhat unique syntax often used for this sort of configuration; the config variable is passed to a Ruby block, in which you set its values.

MongoDB allows you to insert data regardless of its structure. With a relational database, each table needs a well-defined schema, which requires planning out which values you would like to store. In the future, Twitter may change its API so that different values are returned, which will likely require a schema change if you want to store these additional values. Not so with MongoDB. Its schema-less design allows you to save the document you get from the Twitter API without worrying about the exact format.

The Ruby Twitter library returns Ruby hashes, so you can pass these directly to your MongoDB collection object. Within your TweetArchiver, you add the following instance method:

def save_tweets_for(term)
  @client.search(term).each do |tweet|
    @tweets_found += 1
    tweet_doc = tweet.to_h
    tweet_doc[:tags] = term
    tweet_doc[:_id] = tweet_doc[:id]
    @tweets.insert_one(tweet_doc)
  end
end

Before saving each Tweet document, make two small modifications. To simplify later queries, add the search term to a tags attribute. You also set the _id field to the ID of the Tweet, replacing the default primary key and ensuring that each Tweet is added only once. Then you pass the modified document to the insert_one method.
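
If you’d like to try those two modifications without calling Twitter or MongoDB, the following sketch applies them to a plain hash. The prepare_tweet_doc helper and the sample Tweet are hypothetical, made up for this illustration.

```ruby
# Hypothetical helper mirroring the two modifications made in save_tweets_for.
# It works on a plain hash, so no Twitter or MongoDB connection is needed.
def prepare_tweet_doc(tweet_hash, term)
  doc = tweet_hash.dup    # leave the caller's hash untouched
  doc[:tags] = term       # remember which search term found the Tweet
  doc[:_id]  = doc[:id]   # reuse the Tweet's own ID as the primary key
  doc
end

sample = {:id => 1234567890, :text => "Trying out #MongoDB"}  # made-up Tweet
doc = prepare_tweet_doc(sample, "#MongoDB")
puts doc.inspect
```

Because _id now carries the Tweet’s ID, the database can recognize a Tweet it has already stored.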

To use this code in a class, you need some additional code. First, you must configure the MongoDB driver so that it connects to the correct mongod and uses the desired database and collection. This is simple code that you’ll replicate often as you use MongoDB. Next, you must configure the Twitter gem with your developer credentials. This step is necessary because Twitter restricts its API to registered developers. The next listing also provides an update method, which gives the user feedback and calls save_tweets_for.

Listing 3.1. archiver.rb—A class for fetching Tweets and archiving them in MongoDB

All that remains is to write a script to run the TweetArchiver code against each of the search terms. Create a file called update.rb (or copy it from the provided code) containing the following:

$LOAD_PATH << File.dirname(__FILE__)
require 'config'
require 'archiver'

TAGS.each do |tag|
  archive = TweetArchiver.new(tag)
  archive.update
end

Next, run the update script:

ruby update.rb

You’ll see some status messages indicating that Tweets have been found and saved. You can verify that the script works by opening the MongoDB shell and querying the collection directly:

> use twitter-archive
switched to db twitter-archive
> db.tweets.count()
30

What’s important here is that you’ve managed to store Tweets from Twitter searches in only a few lines of code.^[5] Next comes the task of displaying the results.

⁵
It’s possible to accomplish this in far fewer lines of code. Doing so is left as an exercise to the reader.

3.3.3. Viewing the archive

You’ll use Ruby’s Sinatra web framework to build a simple app to display the results. Sinatra allows you to define the endpoints for a web application and directly specify the response. Its power lies in its simplicity. For example, the content of the index page for your application can be specified with the following:

get '/' do
  "response"
end

This code specifies that GET requests to the / endpoint of your application return the value of response to the client. Using this format, you can write full web applications with many endpoints, each of which can execute arbitrary Ruby code before returning a response. You can find more information, including Sinatra’s full documentation, at http://sinatrarb.com.

We’ll now introduce a file called viewer.rb and place it in the same directory as the other scripts. Next, make a subdirectory called views, and place a file there called tweets.erb. After these steps, the project’s file structure should look like this:

- config.rb
- archiver.rb
- update.rb
- viewer.rb
- /views
  - tweets.erb

Again, feel free to create these files yourself or copy them from the code examples. Now edit viewer.rb with the code in the following listing.

Listing 3.2. viewer.rb—Sinatra application for displaying the Tweet archive

The first lines require the necessary libraries along with your config file. Next there’s a configuration block that creates a connection to MongoDB and stores a reference to your tweets collection in the constant TWEETS.

The real meat of the application is in the lines beginning with get '/' do. The code in this block handles requests to the application’s root URL. First, you build your query selector. If a tags URL parameter has been provided, you create a query selector that restricts the result set to the given tags. Otherwise, you create a blank selector, which returns all documents in the collection. You then issue the query. By now, you should know that what gets assigned to the @tweets variable isn’t a result set but a cursor. You’ll iterate over that cursor in your view.
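
The selector logic is easy to try in isolation. Here is a minimal sketch, assuming the URL parameter is named tag (as in the links generated by tweets.erb) and arrives in a hash keyed by symbols; selector_for is a hypothetical helper, not code from the listing.

```ruby
# Hypothetical helper: build a query selector from request parameters.
def selector_for(params)
  tag = params[:tag]
  if tag && !tag.empty?
    {:tags => tag}   # restrict results to Tweets saved under this tag
  else
    {}               # blank selector: match every document
  end
end

puts selector_for({:tag => "#MongoDB"}).inspect
puts selector_for({}).inspect
```

Whatever hash this returns can be handed straight to find, which is why a blank selector is a convenient way to say "everything."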

The last line renders the view file tweets.erb (see the next listing).

Listing 3.3. tweets.erb—HTML with embedded Ruby for rendering the Tweets

<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
  <style>
    body {
      width: 1000px;
      margin: 50px auto;
      font-family: Palatino, serif;
      background-color: #dbd4c2;
      color: #555050;
    }
    h2 {
      margin-top: 2em;
      font-family: Arial, sans-serif;
      font-weight: 100;
    }
  </style>
</head>
<body>
<h1>Tweet Archive</h1>
<% TAGS.each do |tag| %>
  <a href="/?tag=<%= URI.encode_www_form_component(tag) %>"><%= tag %></a>
<% end %>
<% @tweets.each do |tweet| %>
  <h2><%= tweet['text'] %></h2>
  <p>
    <a href="http://twitter.com/<%= tweet['user']['screen_name'] %>">
      <%= tweet['user']['screen_name'] %>
    </a>
    on <%= tweet['created_at'] %>
  </p>
  <img src="<%= tweet['user']['profile_image_url'] %>" width="48" />
<% end %>
</body>
</html>

Most of the code is just HTML with some ERB (embedded Ruby) mixed in. The Sinatra app runs the tweets.erb file through an ERB processor and evaluates any Ruby code between <% and %> in the context of the application.
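To see ERB evaluation in isolation, you can run a template through Ruby’s standard erb library without Sinatra. The following is a small sketch under stated assumptions: the sample tweets array is made-up data standing in for documents read from the cursor, and it’s a local variable here, whereas the Sinatra view reads the @tweets instance variable. The call to ERB::Util.html_escape isn’t part of the listing; it’s included to show one way to escape user-supplied text.

```ruby
require 'erb'

# Made-up sample data standing in for documents returned by the cursor.
tweets = [
  { 'text' => 'MongoDB <3 Ruby', 'user' => { 'screen_name' => 'alice' } },
  { 'text' => 'Hello, ERB',      'user' => { 'screen_name' => 'bob' } }
]

# <% ... %> runs Ruby code; <%= ... %> runs it and inserts the result.
template = <<~TEMPLATE
  <% tweets.each do |tweet| %>
  <h2><%= ERB::Util.html_escape(tweet['text']) %></h2>
  <p><%= tweet['user']['screen_name'] %></p>
  <% end %>
TEMPLATE

# Passing the current binding lets the template see the local variable tweets.
html = ERB.new(template).result(binding)
puts html
```

Sinatra performs the equivalent step for you when the route calls erb, using the application instance as the evaluation context.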

The important parts come near the end, with the two iterators. The first of these cycles through the list of tags to display links for restricting the result set to a given tag. The second iterator, beginning with the @tweets.each code, cycles through each Tweet to display the Tweet’s text, creation date, and user profile image. You can see results by running the application:

$ ruby viewer.rb

If the application starts without error, you’ll see the standard Sinatra startup message that looks something like this:

$ ruby viewer.rb
[2013-07-05 18:30:19] INFO  WEBrick 1.3.1
[2013-07-05 18:30:19] INFO  ruby 1.9.3 (2012-04-20) [x86_64-darwin10.8.0]
== Sinatra/1.4.3 has taken the stage on 4567 for development with backup from
     WEBrick
[2013-07-05 18:30:19] INFO  WEBrick::HTTPServer#start: pid=18465 port=4567

You can then point your web browser to http://localhost:4567. The page should look something like the screenshot in figure 3.2. Try clicking on the links at the top of the screen to narrow the results to a particular tag.

Figure 3.2. Tweet Archiver output rendered in a web browser

That’s the extent of the application. It’s admittedly simple, but it demonstrates some of the ease of using MongoDB. You didn’t have to define your schema in advance, you took advantage of secondary indexes to make your queries fast and prevent duplicate inserts, and you had a relatively simple integration with your programming language.

3.4. Summary

You’ve just learned the basics of talking to MongoDB through the Ruby programming language. You saw how easy it is to represent documents in Ruby, and how similar Ruby’s CRUD API is to that of the MongoDB shell. We dove into some internals, exploring how the drivers in general are built and looking in detail at object IDs, BSON, and the MongoDB network protocol. Finally, you built a simple application to show the use of MongoDB with real data. Though using MongoDB in the real world often requires more complexity, the prospect of writing applications with the database should be in reach.

Beginning with chapter 4, we’re going to take everything you’ve learned so far and drill down. Specifically, you’ll investigate how you might build an e-commerce application in MongoDB. That would be an enormous project, so we’ll focus solely on a few sections of the back end. We’ll present some data models for that domain, and you’ll see how to insert and query that kind of data.
