Day 2: Creating and Querying Views

In CouchDB, a view is a window into the documents contained in a database. Views are the principal way that documents are accessed in all but trivial cases, such as those individual CRUD operations you saw on Day 1. Today, you’ll discover how to create the functions that make up a view. You’ll also learn how to perform ad hoc queries against views using cURL. Finally, you’ll import music data, which will make the views more salient and demonstrate how to use couchrest, a popular Ruby library for working with CouchDB.

Accessing Documents Through Views

A view consists of mapper and reducer functions that are used to generate an ordered list of key-value pairs. Both keys and values can be any valid JSON. The simplest view is called _all_docs. It is provided out-of-the-box for all databases and contains an entry for each document in the database, keyed by its string _id.

To retrieve all of the things in the database, issue a GET request for the _all_docs view.

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_all_docs"
 {
  "total_rows": 1,
  "offset": 0,
  "rows": [
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "2ac58771c197f70461056f7c7e0001f9",
  "value": {
  "rev": "7-d37c47883f4d30913c6a38644410685d"
  }
  }
  ]
 }

You can see in the previous output the one document we’ve created so far. The response is a JSON object that contains an array of rows. Each row is an object with three fields:

  • id is the document’s _id.
  • key is the JSON key produced by the mapreduce functions.
  • value is the associated JSON value, also produced through mapreduce.

In the case of _all_docs, the id and key fields match, but for custom views this will almost never be the case.

By default, views will include only metadata for documents in the value field rather than each document’s content. To retrieve all of the document’s fields, add the include_docs=true URL parameter.

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_all_docs?include_docs=true"
 {
  "total_rows": 1,
  "offset": 0,
  "rows": [
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "2ac58771c197f70461056f7c7e0001f9",
  "value": {
  "rev": "7-d37c47883f4d30913c6a38644410685d"
  },
  "doc": {
  "_id": "2ac58771c197f70461056f7c7e0001f9",
  "_rev": "7-d37c47883f4d30913c6a38644410685d",
  "name": "The Beatles",
  "albums": [
  {
  "title": "Help!",
  "year": 1965
  },
  {
  "title": "Sgt. Pepper's Lonely Hearts Club Band",
  "year": 1967
  },
  {
  "title": "Abbey Road",
  "year": 1969
  }
  ]
  }
  }
  ]
 }

Here you can see that the other properties, name and albums, have been added to the value object in the output. With this basic structure in mind, let’s make our own views.

Writing Your First View

Now that we’ve gotten a rough overview of how views work, let’s try creating our own views. To start, we’ll reproduce the behavior of the _all_docs view, and after that, we’ll make increasingly complex views to extract deeper information from our documents for indexing.

To create a view, open a browser to Fauxton[33] as we did in Day 1. Next, open the music database by clicking the link. Click the plus symbol next to Design Documents and then click New View. That should bring you to a page that resembles the figure that follows.

images/couchdb-fauxton-new-view.png

The code in the Map Function box on the right should look like this:

 function(doc) {
  emit(_doc.id, 1);
 }

Change the emitter function to emit(null, doc). If you click the Options button in the upper right and then click Run Query, CouchDB will execute this function once for each document in the database, passing in that document as the doc parameter each time. This will generate a result that looks like the figure.

images/couchdb-fauxton-view-result.png

The secret to this output, and all views, is the emit function (this works just like the MongoDB function of the same name). emit takes two arguments: the key and the value. A given map function may call emit one time, many times, or no times for a given document. In the previous case, the map function emits the key-value pair null/doc. As you see in the output table, the key is indeed null, and the value is the same object you saw on Day 1 when we requested it directly from cURL.

To make a mapper that achieves the same thing as _all_docs, we need to emit something a little different. Recall that _all_docs emits the document’s _id field for the key and a simple object containing only the _rev field for the value. With that in mind, change the Map Function code to the following, and then click Run.

 function(doc) {
  emit(doc._id, { rev: doc._rev });
 }

The output table should now resemble the following JSON payload, echoing the same key-value pair we saw earlier when enumerating records via _all_docs:

 {
 "id"​: ​"2ac58771c197f70461056f7c7e0001f9"​,
 "key"​: ​"2ac58771c197f70461056f7c7e0001f9"​,
 "value"​: {
 "rev"​: ​"8-e1b7281f6adcd82910c6473be2d4e2ec"
  },
 "_id"​: ​"2ac58771c197f70461056f7c7e0001f9"
 }

Saving a View as a Design Document

When using CouchDB in production, you should store your views in design documents. A design document is a real document in the database, just like the Beatles document we created earlier. As such, it can show up in views and be replicated to other CouchDB servers in the usual fashion. To save a design document in Fauxton, click the Create Document and Build Index button, and then fill in the Design Document and Index name fields.

Design documents always have IDs that start with _design/ and contain one or more views. The index name distinguishes this view from others housed in the same design document. Deciding which views belong in which design documents is largely application-specific and subject to taste. As a general rule, you should group views based on what they do relative to your data. You’ll see examples of this as we create more interesting views.

Finding Artists by Name

Now that we’ve covered the basics of view creation, let’s develop an application-specific view. Recall that our music database stores artist information, including a name field that contains the band’s name. Using the normal GET access pattern or the _all_docs view, we can access documents by their _id values, but we’re more interested in looking up bands by name.

In other words, today we can look up the document with _id equal to 2ac58771c197f70461056f7c7e0001f9, but how do we find the document with name equal to The Beatles? For this, we need a view. In Fauxton, head back to the New View page, enter artists as the name of the design document and by_name as the name of the view. Then enter the following Map Function code and click Create Document and Build Index.

 function​(doc) {
 if​ (​'name'​ ​in​ doc) {
  emit(doc.name, doc._id);
  }
 }

This function checks whether the current document has a name field and, if so, emits the name and document _id as the relevant key-value pair. This should produce a table like this:

Key

Value

"The Beatles"

"2ac58771c197f70461056f7c7e0001f9"

Finding Albums by Name

Finding artists by name is pretty useful, but we can do more. This time, let’s make a view that lets us find albums. This will be the first example where the map function will emit more than one result per document.

Return to the New View page once again (by clicking the plus sign next to Design Documents). Set the design doc name to albums and the index name to by_name; then enter the following mapper and click Create Document and Build Index:

 function​(doc) {
 if​ (​'name'​ ​in​ doc && ​'albums'​ ​in​ doc) {
  doc.albums.forEach(​function​(album){
 var
  key = album.title || album.name,
  value = { by: doc.name, album: album };
  emit(key, value);
  });
  }
 }

This function checks whether the current document has a name field and an albums field. If so, it emits a key-value pair for each album where the key is the album’s title or name and the value is a compound object containing the artist’s name and the original album object. The row field should contain an array with these key/value pairs:

Key

Value

"Abbey Road"

{by: "The Beatles", album: {title: "Abbey Road", year: 1969}}

"Help!"

{by: "The Beatles", album: {title: "Help!", year: 1965}}

"Sgt. Pepper’s Lonely Hearts Club Band"

{by: "The Beatles", album: {title: "Sgt. Pepper’s Lonely Hearts Club Band", year: 1967}}

Just as we did with the Artists By Name view, click the Create Document and Build Index button. This time, for Design Document, enter albums, and for the index name enter by_name. Click Save to persist the change. Now let’s see how to query these documents.

Querying Our Custom Artist and Album Views

Now that we have a couple of custom design documents saved, let’s jump back to the command line and query them with the curl command. We’ll start with the Artists By Name view. On the command line, execute the following:

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_design/artists/_view/by_name"
 {
  "total_rows": 1,
  "offset": 0,
  "rows": [
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "The Beatles",
  "value": "2ac58771c197f70461056f7c7e0001f9"
  }
  ]
 }

To query a view, construct the path /<database_name>/_design/<design_doc>/_view/<view_name>, replacing the parts as appropriate. In our case, we’re querying the by_name view in the artists design document of the music database. No surprise here that the output includes our one document, keyed by the band name.

Next, let’s try to find albums by name:

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_design/albums/_view/by_name"
 {
  "total_rows": 3,
  "offset": 0,
  "rows": [
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "Abbey Road",
  "value": {
  "by": "The Beatles",
  "album": {
  "title": "Abbey Road",
  "year": 1969
  }
  }
  },
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "Help!",
  "value": {
  "by": "The Beatles",
  "album": {
  "title": "Help!",
  "year": 1965
  }
  }
  },
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "Sgt. Pepper's Lonely Hearts Club Band",
  "value": {
  "by": "The Beatles",
  "album": {
  "title": "Sgt. Pepper's Lonely Hearts Club Band",
  "year": 1967
  }
  }
  }
  ]
 }

CouchDB will ensure that the records are presented in alphanumerical order by the emitted keys. In effect, this is the indexing that CouchDB offers. When designing your views, it’s important to pick emitted keys that will make sense when ordered. Requesting a view in this fashion returns the whole set, but what if you want just a subset? One way to do that is to use the key URL parameter. When you specify a key, only rows with that exact key are returned.

 $ ​​curl​​ ​​'${COUCH_ROOT_URL}/music/_design/albums/_view/by_name?key="Help!"'
 {
  "total_rows": 3,
  "offset": 1,
  "rows": [
  {
  "id": "2ac58771c197f70461056f7c7e0001f9",
  "key": "Help!",
  "value": {
  "by": "The Beatles",
  "album": {
  "title": "Help!",
  "year": 1965
  }
  }
  }
  ]
 }

Notice the total_rows and offset fields in the response. The total_rows field counts the total number of records in the view, not just the subset returned for this request. The offset field tells us how far into that full set the first record presented appears. Based on these two numbers and the length of the rows, we can calculate how many more records there are in the view on both sides. Requests for views can be sliced a few other ways beyond the keys parameter, but to really see them in action, we’re going to need more data.

Importing Data Into CouchDB Using Ruby

Importing data is a recurring problem that you’ll face no matter what database you end up using. CouchDB is no exception here. In this section, we’ll use Ruby to import structured data into our music database. Through this, you’ll see how to perform bulk imports into CouchDB, and it will also give us a nice pool of data to work with when we create more advanced views.

We’ll use music data from Jamendo.com,[34] a site devoted to hosting freely licensed music. Jamendo provides all their artist, album, and track data in a structured XML format, making it ideal for importing into a document-oriented database such as CouchDB. Download a GZIPped version of the XML:

 $ ​​curl​​ ​​-O​​ ​​https://imgjam.com/data/dbdump_artistalbumtrack.xml.gz

The XML file is over 200 MB. To parse Jamendo’s XML file, we’ll use the libxml-ruby gem.

Rather than writing our own Ruby-CouchDB driver or issuing HTTP requests directly, we’ll use a popular Ruby gem called couchrest that wraps these calls into a convenient Ruby API. We’ll be using only a few methods from the API, but if you want to continue using this driver for your own projects, you can check out the documentation.[35] On the command line, install the necessary gems:

 $ ​​gem​​ ​​install​​ ​​libxml-ruby​​ ​​couchrest

Just as we did for Wikipedia data in Chapter 3, HBase, we’ll use a streaming XML parser to process documents sequentially for insert as they’re streamed in through standard input. Here’s the code:

 require ​'libxml'
 require ​'couchrest'
 
 include​ LibXML
 
 class​ JamendoCallbacks
 include​ XML::SaxParser::Callbacks
 
 def​ initialize
  @db = CouchRest.database!(​"http://localhost:5984/music"​)
  @count = 0
  @max = 10000 ​# maximum number to insert
  @stack = []
  @artist = ​nil
  @album = ​nil
  @track = ​nil
  @tag = ​nil
  @buffer = ​nil
 end
 def​ on_start_element(element, attributes)
 case​ element
 when​ ​'artist'
  @artist = { ​:albums​ => [] }
  @stack.push @artist
 when​ ​'album'
  @album = { ​:tracks​ => [] }
  @artist[​:albums​].push @album
  @stack.push @album
 when​ ​'track'
  @track = { ​:tags​ => [] }
  @album[​:tracks​].push @track
  @stack.push @track
 when​ ​'tag'
  @tag = {}
  @track[​:tags​].push @tag
  @stack.push @tag
 when​ ​'Artists'​, ​'Albums'​, ​'Tracks'​, ​'Tags'
 # ignore
 else
  @buffer = []
 end
 end
 
 def​ on_characters(chars)
  @buffer << chars ​unless​ @buffer.nil?
 end
 
 def​ on_end_element(element)
 case​ element
 when​ ​'artist'
  @stack.pop
  @artist[​'_id'​] = @artist[​'id'​] ​# reuse Jamendo's artist id for doc _id
  @artist[​:random​] = rand
  @db.save_doc(@artist, ​false​, ​true​)
  @count += 1
 if[email protected]? && @count >= @max
  on_end_document
 end
 if​ @count % 500 == 0
  puts ​" ​​#{​@count​}​​ records inserted"
 end
 when​ ​'album'​, ​'track'​, ​'tag'
  top = @stack.pop
  top[​:random​] = rand
 when​ ​'Artists'​, ​'Albums'​, ​'Tracks'​, ​'Tags'
 # ignore
 else
 if​ @stack[-1] && @buffer
  @stack[-1][element] = @buffer.join.force_encoding(​'utf-8'​)
  @buffer = ​nil
 end
 end
 end
 
 def​ on_end_document
  puts ​"TOTAL: ​​#{​@count​}​​ records inserted"
  exit(0)
 end
 end
 
 parser = XML::SaxParser.io(ARGF)
 parser.callbacks = JamendoCallbacks.new
 parser.parse

A few things that you should make note of:

  1. At the very beginning of the script, we bring in all of the gems we need.

  2. The standard way to use LibXML is by defining a callbacks class. Here we define a JamendoCallbacks class to encapsulate our SAX handlers for various events.

  3. The first thing our class does during initialization is connect to our local CouchDB server using the couchrest API and then create the music database (if it doesn’t exist already). After that, it sets up some instance variables for storing state information during the parse. Note that if you set the @max parameter to nil, all documents will be imported, not just the first 100.

  4. Once parsing has started, the on_start_element method will handle any opening tags. Here we watch for certain especially interesting tags, such as <artist>, <album>, <track>, and <tag>. We specifically ignore certain container elements—<Artists>, <Albums>, <Tracks>, and <Tags>—and treat all others as properties to be set on the nearest container items.

  5. Whenever the parser encounters character data, we buffer it to be added as a property to the current container element (the end of @stack).

  6. Much of the interesting stuff happens in the on_end_element method. Here, we close out the current container element by popping it off the stack. If the tag closes an <artist> element, we take the opportunity to save off the document in CouchDB with the @db.save_doc method. For any container element, we also add a random property containing a freshly generated random number. We’ll use this later when selecting a random track, album, or artist.

  7. Ruby’s ARGF stream combines standard input and any files specified on the command line. We feed this into LibXML and specify an instance of our JamendoCallbacks class to handle the tokens—start tags, end tags, and character data—as they’re encountered.

To run the script, pipe the unzipped XML content into the import script:

 $ ​​zcat​​ ​​<​​ ​​dbdump_artistalbumtrack.xml.gz​​ ​​|​​ ​​ruby​​ ​​import_from_jamendo.rb
 TOTAL: 10000 records inserted

The script will begin importing 10,000 records (you can adjust the limit by changing the value of the @max variable in the script). When the import has finished, drop back down to the command line and we’ll see how our views look. First, let’s pull up a few artists. The limit URL parameter specifies that we want only that number of documents in the response (or less).

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_design/artists/_view/by_name?limit=5"
 {"total_rows":10001,"offset":0,"rows":[
 {"id":"5385","key":" A.n.K.h // ","value":"5385"},
 {"id":"354581","key":" E2U","value":"354581"},
 {"id":"457184","key":" EL VECINO","value":"457184"},
 {"id":"338059","key":" ENIGMA63","value":"338059"},
 {"id":"378976","key":" John Ov3rblast","value":"378976"}
 ]}

The previous request started at the very beginning of the list of artists. To jump to the middle, we can use the startkey parameter:

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_design/artists/_view/by_name?
 limit=5&startkey=%22C%22"
 {"total_rows":10001,"offset":1320,"rows":[
 {"id":"363267","key":"C-74","value":"363267"},
 {"id":"357962","key":"c-dio","value":"357962"},
 {"id":"350911","key":"C-Jay L'infidel J.B","value":"350911"},
 {"id":"1188","key":"c-nergy","value":"1188"},
 {"id":"832","key":"C. Glen Williams","value":"832"}
 ]}

Previously, we started with artists whose names began with C. Specifying an endkey provides another way to limit the returned content. Here we specify that we want artists only between C and D:

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_design/artists/_view/by_name?
 startkey=%22C%22&endkey=%22D%22&limit=5"
 {"total_rows":10001,"offset":1320,"rows":[
 {"id":"363267","key":"C-74","value":"363267"},
 {"id":"357962","key":"c-dio","value":"357962"},
 {"id":"350911","key":"C-Jay L'infidel J.B","value":"350911"},
 {"id":"1188","key":"c-nergy","value":"1188"},
 {"id":"832","key":"C. Glen Williams","value":"832"}
 ]}

To get the rows in reverse order, use the descending URL parameter. Be sure to reverse your startkey and endkey as well.

 $ ​​curl​​ ​​"${COUCH_ROOT_URL}/music/_design/artists/_view/by_name?
 startkey=%22D%22&endkey=%22C%22&limit=5&descending=true"
 {"total_rows":10001,"offset":8174,"rows":[
 {"id":"1689","key":"czskamaarù","value":"1689"},
 {"id":"341426","key":"CZAQU","value":"341426"},
 {"id":"360640","key":"Cystoflo","value":"360640"},
 {"id":"355941","key":"CYRUS DA VIRUS","value":"355941"},
 {"id":"427004","key":"Cyrix Project","value":"427004"}
 ]}

A number of other URL parameters are available for modifying view requests, but these are the most common and are the ones you’ll reach for the most often. Some of the URL parameters have to do with grouping, which comes from the reducer part of CouchDB mapreduce views. You’ll explore these tomorrow.

Day 2 Wrap-Up

Today we covered some good ground. You learned how to create basic views in CouchDB and save them into design documents. You explored different ways of querying views to get subsets of the indexed content. Using Ruby and a popular gem called couchrest, we imported structured data and used it to support our views. Tomorrow, we’ll expand on these ideas by creating more advanced views by adding reducers then move on to other APIs that CouchDB supports.

Day 2 Homework

Find

  1. We’ve seen that the emit method can output keys that are strings. What other types of values does it support? What happens when you emit an array of values as a key?

  2. Find a list of available URL parameters (like limit and startkey) that can be appended to view requests and what they do.

Do

  1. The import script import_from_jamendo.rb assigned a random number to each artist by adding a property called random. Create a mapper function that will emit key-value pairs where the key is the random number and the value is the band’s name. Save this in a new design document named _design/random with the index name artist.

  2. Craft a cURL request that will retrieve a random artist.

    Hint: You’ll need to use the startkey parameter, and you can produce a random number on the command line using ‘ruby -e ’puts rand’‘.

  3. The import script also added a random property for each album, track, and tag. Create three additional views in the _design/random design document with the index names album, track, and tag to match the earlier artist view.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset