Chapter 7. Updates, atomic operations, and deletes

This chapter covers

  • Updating documents
  • Processing documents atomically
  • Applying complex updates to a real-world example
  • Using update operators
  • Deleting documents

To update is to write to existing documents. Doing this effectively requires a thorough understanding of the kinds of document structures available and of the query expressions made possible by MongoDB. Having studied the e-commerce data model in the last two chapters, you should have a good sense of the ways in which schemas are designed and queried. We’ll use all of this knowledge in our study of updates.

Specifically, we’ll look more closely at why we model the category hierarchy in such a denormalized way, and how MongoDB’s updates make that structure reasonable. We’ll explore inventory management and solve a few tricky concurrency issues in the process. You’ll get to know a host of new update operators, learn some tricks that take advantage of the atomicity of update operations, and experience the power of the findAndModify command. In this case atomicity refers to MongoDB’s ability to search for a document and update it, with the guarantee that no other operation will interfere, a powerful property. After numerous examples, there will be a section devoted to the nuts and bolts of each update operator, which will expand on the examples to give you the full range of options for how you update. We’ll also discuss how to delete data in MongoDB, and conclude with some notes on concurrency and optimization.

Most of the examples in this chapter are written for the JavaScript shell. The section where we discuss atomic document processing, though, requires a good bit more application-level logic, so for that section we’ll switch over to Ruby.

By the end of the chapter, you’ll have been exposed to the full range of MongoDB’s CRUD operations, and you’ll be well on your way to designing applications that best take advantage of MongoDB’s interface and data model.

7.1. A brief tour of document updates

If you need to update a document in MongoDB, you have two ways of going about it. You can either replace the document altogether, or you can use update operators to modify specific fields within the document. As a way of setting the stage for the more detailed examples to come, we’ll begin this chapter with a simple demonstration of these two techniques. We’ll then provide reasons for preferring one over the other.

To start, recall the sample user document we developed in chapter 4. The document includes a user’s first and last names, email address, and shipping addresses. Here’s a simplified example:

{
  _id: ObjectId("4c4b1476238d3b4dd5000001"),
  username: "kbanker",
  email: "[email protected]",
  first_name: "Kyle",
  last_name: "Banker",
  hashed_password: "bd1cfa194c3a603e7186780824b04419",
  addresses: [
    {
      name: "work",
      street: "1 E. 23rd Street",
      city: "New York",
      state: "NY",
      zip: 10010
    }
  ]
}

You’ll undoubtedly need to update an email address from time to time, so let’s begin with that.

Please note that your ObjectId values might be a little different. Make sure that you’re using valid ones and, if needed, manually add documents that will help you follow the commands of this chapter. Alternatively, you can use the following method to find a valid document, get its ObjectId, and use it elsewhere:

doc = db.users.findOne({username: "kbanker"})
user_id = doc._id

7.1.1. Modify by replacement

To replace the document altogether, you first query for the document, modify it on the client side, and then issue the update with the modified document. Here’s how that looks in the JavaScript shell:

user_id = ObjectId("4c4b1476238d3b4dd5003981")
doc = db.users.findOne({_id: user_id})
doc['email'] = '[email protected]'
print('updating ' + user_id)
db.users.update({_id: user_id}, doc)

With the user’s _id at hand, you first query for the document. Next you modify the document locally, in this case changing the email attribute. Then you pass the modified document to the update method. The final line says, “Find the document in the users collection with the given _id, and replace that document with the one we’ve provided.” The thing to remember is that the update operation replaces the entire document, which is why it must be fetched first. If multiple users update the same document, the last write will be the one that will be stored.
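The last-write-wins behavior is worth seeing concretely. The following sketch simulates two clients performing replacement updates on the same document in plain JavaScript; the stored variable and the example field values are illustrative, not part of MongoDB's API.

```javascript
// Simulation of "last write wins" with replacement updates.
// Two clients read the same stored document, each modifies a different
// field locally, and each writes back its entire copy.
let stored = { email: 'old@example.com', first_name: 'Kyle' };

const copyA = Object.assign({}, stored);  // client A reads the document
const copyB = Object.assign({}, stored);  // client B reads the document

copyA.email = 'new@example.com';          // A edits the email
copyB.first_name = 'Kai';                 // B edits the first name

stored = copyA;  // A replaces the stored document
stored = copyB;  // B replaces it again; A's email edit is silently lost

console.log(stored.email, stored.first_name);  // old@example.com Kai
```

Because each replacement carries the whole document, B's write overwrites A's change even though the two edits touched different fields.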

7.1.2. Modify by operator

That’s how you modify by replacement; now let’s look at modification by operator:

user_id = ObjectId("4c4b1476238d3b4dd5000001")
db.users.update({_id: user_id},
  {$set: {email: '[email protected]'}})

The example uses $set, one of several special update operators, to modify the email address in a single request to the server. In this case, the update request is much more targeted: find the given user document and set its email field to [email protected].

Syntax note: updates vs. queries

Users new to MongoDB sometimes have difficulty distinguishing between the update and query syntaxes. Targeted updates always begin with the update operator, and this operator is almost always a verb-like construct (set, push, and so on). Take the $addToSet operator, for example:

db.products.update({}, {$addToSet: {tags: 'Green'}})

If you add a query selector to this update, note that the query operator is semantically adjectival (less than, equal to, and so on) and comes after the field name to query on (price, in this case):

db.products.update({price: {$lte: 10}},
   {$addToSet: {tags: 'cheap'}})

This last example updates only those documents whose price is less than or equal to 10, adding 'cheap' to their tags.

Update operators use prefix notation, whereas query operators usually use infix notation: the update operator $addToSet comes before the field it modifies, while the query operator $lte appears inside the document supplied as the value of the price field.
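To build intuition for operator semantics, here is a toy JavaScript model of how $set, $inc, and $addToSet each modify a plain object. This illustrates only what the operators mean; it is not how the MongoDB server implements updates.

```javascript
// A toy model of update-operator semantics on a plain object.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      switch (op) {
        case '$set':      // overwrite (or create) the field
          doc[field] = value;
          break;
        case '$inc':      // numeric increment, treating a missing field as 0
          doc[field] = (doc[field] || 0) + value;
          break;
        case '$addToSet': // append only if the value isn't already present
          doc[field] = doc[field] || [];
          if (!doc[field].includes(value)) doc[field].push(value);
          break;
        default:
          throw new Error('unsupported operator: ' + op);
      }
    }
  }
  return doc;
}

const product = { tags: ['Green'], total_reviews: 2 };
applyUpdate(product, { $addToSet: { tags: 'Green' } });   // already present
applyUpdate(product, { $inc: { total_reviews: 1 } });
console.log(product.tags.length, product.total_reviews);  // 1 3
```

Note that $addToSet left the tags array untouched because 'Green' was already a member, which is exactly the set semantics the operator's name suggests.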

7.1.3. Both methods compared

How about another example? This time you want to increment the number of reviews on a product. Here’s how you’d do that as a document replacement:

product_id = ObjectId("4c4b1476238d3b4dd5003982")
doc = db.products.findOne({_id: product_id})
doc['total_reviews'] += 1       // add 1 to the value in total_reviews
db.products.update({_id: product_id}, doc)

And here’s the targeted approach:

db.products.update({_id: product_id}, {$inc: {total_reviews: 1}})

The replacement approach, as before, fetches the product document from the server, modifies it, and then resends it. The update statement here is similar to the one you used to update the email address. By contrast, the targeted update uses a different update operator, $inc, to increment the value in total_reviews.

7.1.4. Deciding: replacement vs. operators

Now that you’ve seen a couple of updates in action, can you think of some reasons why you might use one method over the other? Which one do you find more intuitive? Which do you think is better for performance? What happens when multiple threads are updating simultaneously—are they isolated from one another?

Modification by replacement is the more generic approach. Imagine that your application presents an HTML form for modifying user information. With document replacement, data from the form post, once validated, can be passed right to MongoDB; the code to perform the update is the same regardless of which user attributes are modified. For instance, if you were going to build a MongoDB object mapper that needed to generalize updates, then updates by replacement would probably make for a sensible default.[1]

1

This is the strategy employed by most MongoDB object mappers, and it’s easy to understand why. If users are given the ability to model entities of arbitrary complexity, then issuing an update via replacement is much easier than calculating the ideal combination of special update operators to employ.

But targeted modifications generally yield better performance. For one thing, there’s no need for the initial round-trip to the server to fetch the document to modify. And, just as important, the document specifying the update is generally small. If you’re updating via replacement and your documents average 200 KB in size, that’s 200 KB received and sent to the server per update! Recall chapter 5 when you used projections to fetch only part of a document. That isn’t an option if you need to replace the document without losing information. Contrast that with the way updates are specified using $set and $push in the previous examples; the documents specifying these updates can be less than 100 bytes each, regardless of the size of the document being modified. For this reason, the use of targeted updates frequently means less time spent serializing and transmitting data.

In addition, targeted operations allow you to update documents atomically. For instance, if you need to increment a counter, updates via replacement are far from ideal. What if the document changes in between when you read and write it? The only way to make your updates atomic is to employ some sort of optimistic locking. With targeted updates, you can use $inc to modify a counter atomically. This means that even with a large number of concurrent updates, each $inc will be applied in isolation, all or nothing.[2]

2

The MongoDB documentation uses the term atomic updates to signify what we’re calling targeted updates. This new terminology is an attempt to clarify the use of the word atomic. In fact, all updates issued to the core server occur atomically, isolated on a per-document basis. The update operators are called atomic because they make it possible to query and update a document in a single operation.

Optimistic locking

Optimistic locking, or optimistic concurrency control, is a technique for ensuring a clean update to a record without having to lock it. The easiest way to understand this technique is to think of a wiki. It’s possible to have more than one user editing a wiki page at the same time. But you never want a situation where a user is editing and updating an out-of-date version of the page. Thus, an optimistic locking protocol is used. When users try to save their changes, a timestamp is included in the attempted update. If that timestamp is older than the latest saved version of the page, the user’s update can’t go through. But if no one has saved any edits to the page, the update is allowed. This strategy allows multiple users to edit at the same time, which is much better than the alternative concurrency strategy of requiring each user to take out a lock to edit any one page.

With pessimistic locking, a record is locked from the time it’s first accessed in a transaction until the transaction is finished, making it inaccessible to other transactions during that time.
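As a sketch, here is optimistic locking in plain JavaScript, using a version counter in place of the timestamp described above. The in-memory store object and the function name are illustrative.

```javascript
// Optimistic locking sketched with a version counter (a timestamp, as in
// the wiki example, works the same way).
const store = { page: { _id: 'page', body: 'original text', version: 1 } };

function saveWithVersion(id, newBody, expectedVersion) {
  const doc = store[id];
  if (doc.version !== expectedVersion) return false;  // someone saved first
  doc.body = newBody;
  doc.version += 1;  // bump the version so stale saves are rejected
  return true;
}

// Two editors both read version 1; only the first save goes through.
const savedA = saveWithVersion('page', 'edit by user A', 1);
const savedB = saveWithVersion('page', 'edit by user B', 1);
console.log(savedA, savedB);  // true false
```

User B's save is rejected because the version read at edit time is no longer current, forcing B to re-read the latest copy before trying again.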

Now that you understand the kinds of available updates, you’ll be able to appreciate the strategies we’ll introduce in the next section. There, we’ll return to the e-commerce data model to answer some of the more difficult questions about operating on that data in production.

7.2. E-commerce updates

It’s easy to provide stock examples for updating this or that attribute in a MongoDB document. But with a production data model and a real application, complications will arise, and the update for any given attribute might not be a simple one-liner. In the following sections, we’ll use the e-commerce data model you saw in the last two chapters to provide a representative sample of the kinds of updates you’d expect to make in a production e-commerce site. You may find certain updates intuitive and others not so much. But overall, you’ll develop a better understanding of the schema developed in chapter 4 and an improved understanding of the features and limitations of MongoDB’s update language.

7.2.1. Products and categories

Here you’ll see a couple of examples of targeted updates in action, first looking at how you calculate average product ratings and then at the more complicated task of maintaining the category hierarchy.

Average product ratings

Products are amenable to numerous update strategies. Assuming that administrators are provided with an interface for editing product information, the easiest update involves fetching the current product document, merging that data with the user’s edits, and issuing a document replacement. At other times, you may only need to update a couple of values, where a targeted update is clearly the way to go. This is the case with average product ratings. Because users need to sort product listings based on average product rating, you store that rating in the product document itself and update the value whenever a review is added or removed.

Here’s one way of issuing this update in JavaScript:

product_id = ObjectId("4c4b1476238d3b4dd5003981")
count = 0
total = 0
db.reviews.find({product_id: product_id}, {rating: 1}).forEach(
  function(review) {
    total += review.rating
    count++
  })
average = total / count
db.products.update({_id: product_id},
  {$set: {total_reviews: count, average_review: average}})

This code fetches the rating field from each of the product’s reviews and computes the average. Because you’re already iterating over every review, you also count them as you go, which saves an extra database call to the count function. With the total number of reviews and their average rating in hand, the code issues a targeted update using $set.

If you don’t want to hardcode an ObjectId, you can find a specific ObjectId as follows and use it afterwards:

product_id = db.products.findOne({sku: '9092'}, {_id: 1})._id

Performance-conscious users may balk at the idea of re-aggregating all product reviews for each update. Much of this depends on the ratio of reads to writes; it’s likely that more users will see product reviews than write their own, so it makes sense to re-aggregate on a write. The method provided here, though conservative, will likely be acceptable for most situations, but other strategies are possible. For instance, you could store an extra field on the product document that caches the review ratings total, making it possible to compute the average incrementally. After inserting a new review, you’d first query for the product to get the current total number of reviews and the ratings total. Then you’d calculate the average and issue an update using a selector like the following:

db.products.update({_id: product_id},
  {
    $set: {
      average_review: average,
      ratings_total: total
    },
    $inc: {
      total_reviews: 1
    }
  })

This example uses the $inc operator, which increments the field passed in by the given value—1, in this case.

Only by benchmarking against a system with representative data can you say whether this approach is worthwhile. But the example shows that MongoDB frequently provides more than one valid path. The requirements of the application will help you decide which is best.
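The incremental strategy just described amounts to a little arithmetic. The following sketch, using the hypothetical cached fields mentioned above (ratings_total and total_reviews), shows how the new average is computed from the cached totals without re-aggregating every review.

```javascript
// Incremental recomputation of the average rating: fold one new review's
// rating into the cached totals instead of re-scanning all reviews.
function foldInRating(product, newRating) {
  const ratingsTotal = product.ratings_total + newRating;
  const totalReviews = product.total_reviews + 1;
  return {
    ratings_total: ratingsTotal,
    total_reviews: totalReviews,
    average_review: ratingsTotal / totalReviews
  };
}

// A product with 4 reviews totaling 18 stars receives a new 5-star review.
const updated = foldInRating({ ratings_total: 18, total_reviews: 4 }, 5);
console.log(updated.average_review);  // 4.6
```

The three resulting values are exactly what the $set/$inc update shown above would write to the product document.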

The category hierarchy

With many databases, there’s no easy way to represent a category hierarchy. This is true of MongoDB, although the document structure does help the situation somewhat. Documents encourage a strategy that optimizes for reads because each category can contain a list of its denormalized ancestors. The one tricky requirement is keeping all the ancestor lists up to date. Let’s look at an example to see how this is done.

First you need a generic method for updating the ancestor list for any given category. Here’s one possible solution:

var generate_ancestors = function(_id, parent_id) {
  ancestor_list = []
  var cursor = db.categories.find({_id: parent_id})
  while(cursor.size() > 0) {
    parent = cursor.next()
    ancestor_list.push(parent)
    parent_id = parent.parent_id
    cursor = db.categories.find({_id: parent_id})
  }
  db.categories.update({_id: _id}, {$set: {ancestors: ancestor_list}})
}

This method works by walking backward up the category hierarchy, making successive queries to each node’s parent_id attribute until reaching the root node (where parent_id is null). All the while, it builds an in-order list of ancestors, storing that result in the ancestor_list array. Finally, it updates the category’s ancestors attribute using $set.
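If the shell isn't handy, you can model the ancestor walk in plain JavaScript. The following sketch uses an in-memory map of categories with string _id values for brevity; it mirrors the backward walk that generate_ancestors performs, though it collects only IDs where the real code stores whole parent documents.

```javascript
// An in-memory model of the backward walk in generate_ancestors. The
// categories map stands in for the categories collection.
const categories = {
  home:     { _id: 'home',     parent_id: null },
  outdoors: { _id: 'outdoors', parent_id: 'home' },
  tools:    { _id: 'tools',    parent_id: 'outdoors' }
};

function ancestorIds(parentId) {
  const ancestors = [];
  while (parentId !== null) {        // stop at the root (parent_id null)
    const parent = categories[parentId];
    ancestors.push(parent._id);      // collect the nearest ancestor first
    parentId = parent.parent_id;
  }
  return ancestors;
}

console.log(ancestorIds(categories.tools.parent_id));  // [ 'outdoors', 'home' ]
```

Each loop iteration corresponds to one find() round-trip in the shell version, which is why deep hierarchies cost one query per level.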

Now that you have that basic building block, let’s look at the process of inserting a new category. Imagine you have a simple category hierarchy that looks like the one in figure 7.1.

Figure 7.1. An initial category hierarchy

Suppose you want to add a new category called Gardening and place it under the Home category. You insert the new category document and then run your method to generate its ancestors:

parent_id = ObjectId("8b87fb1476238d3b4dd50003")
category = {
  parent_id: parent_id,
  slug: "gardening",
  name: "Gardening",
  description: "All gardening implements, tools, seeds, and soil."
}
db.categories.save(category)
generate_ancestors(category._id, parent_id)

Note that save() writes the generated _id back into the category document object; that ID is then used in the call to generate_ancestors(). Figure 7.2 displays the updated tree.

Figure 7.2. Adding a Gardening category

That’s easy enough. But what if you now want to place the Outdoors category underneath Gardening? This is potentially complicated because it alters the ancestor lists of a number of categories. You can start by changing the parent_id of Outdoors to the _id of Gardening. This turns out to be not too difficult provided that you already have both an outdoors_id and a gardening_id available:

db.categories.update({_id: outdoors_id}, {$set: {parent_id: gardening_id}})

Because you’ve effectively moved the Outdoors category, all the descendants of Outdoors are going to have invalid ancestor lists. You can rectify this by querying for all categories with Outdoors in their ancestor lists and then regenerating those lists. MongoDB’s power to query into arrays makes this trivial:

db.categories.find({'ancestors._id': outdoors_id}).forEach(
  function(category) {
    generate_ancestors(category._id, outdoors_id)
  })

That’s how you handle an update to a category’s parent_id attribute, and you can see the resulting category arrangement in figure 7.3.

Figure 7.3. The category tree in its final state

But what if you update a category name? If you change the name of Outdoors to The Great Outdoors, you also have to change Outdoors wherever it appears in the ancestor lists of other categories. You may be justified in thinking, “See? This is where denormalization comes to bite you,” but it should make you feel better to know that you can perform this update without recalculating any ancestor list. Here’s how:

doc = db.categories.findOne({_id: outdoors_id})
doc.name = "The Great Outdoors"
db.categories.update({_id: outdoors_id}, doc)
db.categories.update(
  {'ancestors._id': outdoors_id},
  {$set: {'ancestors.$': doc}},
  {multi: true})

You first grab the Outdoors document, alter the name attribute locally, and then update via replacement. Next, you use the updated Outdoors document to replace its occurrences in the various ancestor lists. The {multi: true} parameter enables a multi-update, which causes the update to affect every document matching the selector; without it, the update affects only the first matching document. Here, you want to update each category that has the Outdoors category in its ancestor list.

The positional operator is more subtle. You have no way of knowing in advance where in a given category’s ancestor list the Outdoors category will appear. You therefore need the update to target the matched element’s position in the array dynamically, whatever that position happens to be in each document. The positional operator (the $ in ancestors.$) does exactly this: it stands for the index of the first array element matched by the query selector, which makes the update possible.

Here’s another example of this technique. Say you want to change a field of a user address (the example document shown in section 7.1) that has been labeled as “work.” You can accomplish this with a query like the following:

db.users.update({
      _id: ObjectId("4c4b1476238d3b4dd5000001"),
      'addresses.name': 'work'},
      {$set: {'addresses.$.street': '155 E 31st St.'}})

Because of the need to update individual subdocuments within arrays, you’ll always want to keep the positional operator at hand. In general, these techniques for updating the category hierarchy will be applicable whenever you’re dealing with arrays of subdocuments.
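Conceptually, the positional operator resolves to an array index at update time. The following plain JavaScript sketch models that behavior against the user document from section 7.1; the setAtMatch helper is illustrative, not a MongoDB API.

```javascript
// A model of positional-operator resolution: find the index of the first
// array element matching the selector, then apply the $set at that index.
const user = {
  addresses: [
    { name: 'home', street: '12 Main St.' },
    { name: 'work', street: '1 E. 23rd Street' }
  ]
};

function setAtMatch(arr, nameToMatch, field, value) {
  const i = arr.findIndex(el => el.name === nameToMatch);  // the "$" index
  if (i !== -1) arr[i][field] = value;
  return i;
}

setAtMatch(user.addresses, 'work', 'street', '155 E 31st St.');
console.log(user.addresses[1].street);  // 155 E 31st St.
```

The update works no matter where the "work" address sits in the array, which is precisely what ancestors.$ buys you in the category example.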

7.2.2. Reviews

Not all reviews are created equal, which is why this application allows users to vote on them. These votes are elementary; they indicate that the given review is helpful. You’ve modeled reviews so that they cache the total number of helpful votes and keep a list of each voter’s ID. The relevant section of each review document looks like this:

{
  helpful_votes: 3,
  voter_ids: [
    ObjectId("4c4b1476238d3b4dd5000041"),
    ObjectId("7a4f0376238d3b4dd5000003"),
    ObjectId("92c21476238d3b4dd5000032")
  ]
}

You can record user votes using targeted updates. The strategy is to use the $push operator to add the voter’s ID to the list and the $inc operator to increment the total number of votes, both in the same JavaScript console update operation:

db.reviews.update({_id: ObjectId("4c4b1476238d3b4dd5000041")}, {
    $push: {
      voter_ids: ObjectId("4c4b1476238d3b4dd5000001")
    },
    $inc: {
      helpful_votes: 1
    }
  })

This is almost correct. But you need to ensure that the update happens only if the voting user hasn’t yet voted on this review, so you modify the query selector to match only when the voter_ids array doesn’t contain the ID you’re about to add. You can easily accomplish this using the $ne query operator:

query_selector = {
  _id: ObjectId("4c4b1476238d3b4dd5000041"),
  voter_ids: {
    $ne: ObjectId("4c4b1476238d3b4dd5000001")
  }
}
db.reviews.update(query_selector, {
    $push: {
      voter_ids: ObjectId("4c4b1476238d3b4dd5000001")
    },
    $inc : {
      helpful_votes: 1
    }
  })

This is an especially powerful demonstration of MongoDB’s update mechanism and how it can be used with a document-oriented schema. Voting, in this case, is both atomic and efficient. The update is atomic because selection and modification occur in the same query. The atomicity ensures that, even in a high-concurrency environment, it will be impossible for any one user to vote more than once. The efficiency lies in the fact that the test for voter membership and the updates to the counter and the voter list all occur in the same request to the server.
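The following sketch models the voting logic in plain JavaScript: the push and the increment happen only when the $ne-style membership test passes. Keep in mind that this only models the semantics; the real atomicity guarantee comes from MongoDB evaluating the match and the modification as a single server-side operation.

```javascript
// A model of the guarded vote: modify only when the selector matches.
const review = { helpful_votes: 3, voter_ids: ['u1', 'u2', 'u3'] };

function castVote(doc, voterId) {
  if (doc.voter_ids.includes(voterId)) return false;  // the $ne guard fails
  doc.voter_ids.push(voterId);  // $push
  doc.helpful_votes += 1;       // $inc
  return true;
}

console.log(castVote(review, 'u4'));  // true  (first vote counts)
console.log(castVote(review, 'u4'));  // false (repeat vote is rejected)
console.log(review.helpful_votes);    // 4
```

A false return corresponds to MongoDB reporting zero matched documents for the update, which is how an application can tell the user has already voted.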

Now, if you do end up using this technique to record votes, it’s especially important that any other updates to the review document also be targeted—updating by replacement could result in an inconsistency. Imagine, for instance, that a user updates the content of their review and that this update occurs via replacement. When updating by replacement, you first query for the document you want to update. But between the time that you query for the review and replace it, it’s possible that a different user might vote on the review. This is called a race condition. This sequence of events is illustrated in figure 7.4.

Figure 7.4. When a review is updated concurrently via targeted and replacement updates, data can be lost.

It should be clear that the document replacement at T3 will overwrite the votes update happening at T2. It’s possible to avoid this by using the optimistic locking technique described earlier, but doing so requires additional application code to implement and it’s probably easier to ensure that all updates in this case are targeted.

7.2.3. Orders

The atomicity and efficiency of updates that you saw in reviews can also be applied to orders. Specifically, you’re going to see the MongoDB calls needed to implement an add_to_cart function using a targeted update. This is a three-step process. First, you construct the product document that you’ll store in the order’s line-item array. Then you issue a targeted update, indicating that this is to be an upsert—an update that will insert a new document if the document to be updated doesn’t exist. (We’ll describe upserts in detail in the next section.) The upsert will create a new order object if it doesn’t yet exist, seamlessly handling both initial and subsequent additions to the shopping cart.[3]

3

We’re using the terms shopping cart and order interchangeably because they’re both represented using the same document. They’re formally differentiated only by the document’s state field (a document with a state of CART is a shopping cart).

Let’s begin by constructing a sample document to add to the cart:

cart_item = {
  _id:  ObjectId("4c4b1476238d3b4dd5003981"),
  slug: "wheel-barrow-9092",
  sku:  "9092",
  name: "Extra Large Wheel Barrow",
  pricing: {
    retail: 5897,
    sale:   4897
  }
}

You’ll most likely build this document by querying the products collection and then extracting whichever fields need to be preserved as a line item. The product’s _id, sku, slug, name, and pricing fields should suffice. Next you’ll ensure that there’s an order for the customer with a state of 'CART' by using the parameter {upsert: true}. This operation will also increment the order’s sub_total using the $inc operator:

selector = {
  user_id: ObjectId("4c4b1476238d3b4dd5000001"),
  state: 'CART'
}
update = {
  $inc: {
    sub_total: cart_item['pricing']['sale']
  }
}
db.orders.update(selector, update, {upsert: true})

Initial upsert to create order document

To make the code clearer, you’re constructing the query selector and the update document separately. The update document increments the order’s sub_total by the sale price of the cart item. Of course, the first time a user executes the add_to_cart function, no shopping cart will exist. That’s why you use an upsert here: the upsert constructs the document implied by the query selector, with the update applied to it. Therefore, the initial upsert will produce an order document like this:

{
  user_id: ObjectId("4c4b1476238d3b4dd5000001"),
  state: 'CART',
  sub_total: 4897
}

You then perform an update of the order document to add the line item if it’s not already on the order:

selector = {user_id: ObjectId("4c4b1476238d3b4dd5000001"),
    state: 'CART',
    'line_items._id':
        {'$ne': cart_item._id}
    }

update = {'$push': {'line_items': cart_item}}
db.orders.update(selector, update)

Another update for quantities

Next you’ll issue another targeted update to ensure that the item quantities are correct. You need this update to handle the case where the user clicks Add to Cart on an item that’s already in the cart. In this case the previous update won’t add a new item to the cart, but you’ll still need to adjust the quantity:

selector = {
  user_id: ObjectId("4c4b1476238d3b4dd5000001"),
  state: 'CART',
  'line_items._id': ObjectId("4c4b1476238d3b4dd5003981")
}
update = {
  $inc: {
    'line_items.$.quantity': 1
  }
}
db.orders.update(selector, update)

We use the $inc operator to update the quantity on the individual line item. The update is facilitated by the positional operator, $, introduced previously. Thus, after the user clicks Add to Cart twice on the wheelbarrow product, the cart should look like this:

{
  user_id: ObjectId("4c4b1476238d3b4dd5000001"),
  state: 'CART',
  line_items: [
    {
      _id:  ObjectId("4c4b1476238d3b4dd5003981"),
      quantity:  2,
      slug: "wheel-barrow-9092",
      sku:  "9092",
      name: "Extra Large Wheel Barrow",
      pricing: {
        retail: 5897,
        sale:   4897
      }
    }
  ],
  sub_total: 9794
}

There are now two wheelbarrows in the cart, and the subtotal reflects that.
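To see how the three updates cooperate, here is an in-memory JavaScript sketch of the whole add_to_cart flow for a single user's cart. The order variable stands in for the user's cart document, and the shortened string _id is illustrative.

```javascript
// An in-memory sketch of the three-step add_to_cart flow for one user.
let order = null;

function addToCart(cartItem) {
  // Step 1: upsert the cart and increment its subtotal.
  if (order === null) order = { state: 'CART', sub_total: 0, line_items: [] };
  order.sub_total += cartItem.pricing.sale;
  // Step 2: push the line item only if it's absent (the $ne guard).
  if (!order.line_items.some(li => li._id === cartItem._id)) {
    order.line_items.push(Object.assign({ quantity: 0 }, cartItem));
  }
  // Step 3: positional $inc on the matched line item's quantity.
  order.line_items.find(li => li._id === cartItem._id).quantity += 1;
}

const wheelbarrow = { _id: 'wb-9092', pricing: { sale: 4897 } };
addToCart(wheelbarrow);
addToCart(wheelbarrow);
console.log(order.sub_total, order.line_items[0].quantity);  // 9794 2
```

The first call takes the upsert path and creates the cart; the second skips the push and only increments the subtotal and the quantity, matching the document shown above.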

There are still more operations you’ll need in order to fully implement a shopping cart. Most of these, such as removing an item from the cart or clearing a cart altogether, can be implemented with one or more targeted updates. If that’s not obvious, the upcoming subsection describing each query operator should make it clear. As for the actual order processing, that can be handled by advancing the order document through a series of states and applying each state’s processing logic. We’ll demonstrate this in the next section, where we explain atomic document processing and the findAndModify command.

7.3. Atomic document processing

One tool you won’t want to do without is MongoDB’s findAndModify command.[4] This command allows you to atomically update a document and return it in the same round-trip. An atomic update is one where no other operation can interrupt or interleave itself with the update. What if another user tries to change the document after you find it but before you modify it? The find might no longer apply. An atomic update prevents this case; all other operations must wait for the atomic update to finish.

4

The way this command is identified can vary by environment. The shell helper is invoked camel case as db.orders.findAndModify, whereas Ruby uses underscores: find_and_modify. To confuse the issue even more, the core server knows the command as findandmodify. You’ll use this final form if you ever need to issue the command manually.

Every update in MongoDB is atomic; what findAndModify adds is that it also returns the affected document atomically. Why is this useful? If you fetch and then update a document (or update and then fetch it), another client can modify the document between those two operations. Without findAndModify, you therefore can’t know the true state of the document you updated, either before or after the update, even though the update itself is atomic. The alternative is the optimistic locking mentioned in section 7.1, but that would require additional application logic to implement.

This atomic update capability is a big deal because of what it enables. For instance, you can use findAndModify to build job queues and state machines. You can then use these primitive constructs to implement basic transactional semantics, which greatly expand the range of applications you can build using MongoDB. With these transaction-like features, you can construct an entire e-commerce site on MongoDB—not just the product content, but the checkout mechanism and the inventory management as well.

To demonstrate, we’ll look at two examples of the findAndModify command in action. First, we’ll show how to handle basic state transitions on the shopping cart. Then we’ll look at a slightly more involved example of managing a limited inventory.

7.3.1. Order state transitions

All state transitions have two parts: a query ensuring a valid initial state, and an update that effects the change of state. Let’s skip forward a few steps in the order process and assume that the user is about to click the Pay Now button to authorize the purchase. If you’re going to authorize the user’s credit card synchronously on the application side, you need to ensure these four things:

1.  You authorize for the amount that the user sees on the checkout screen.

2.  The cart’s contents never change while in the process of authorization.

3.  Errors in the authorization process return the cart to its previous state.

4.  If the credit card is successfully authorized, the payment information is posted to the order, and that order’s state is transitioned to PRE-SHIPPING.

The state transitions that you’ll use are shown in figure 7.5.

Figure 7.5. Order state transitions

Prepare the order for checkout

The first step is to get the order into the new PRE-AUTHORIZE state. You use findAndModify to find the user’s current order object and ensure that the object is in a CART state:

newDoc = db.orders.findAndModify({
    query: {
      user_id: ObjectId("4c4b1476238d3b4dd5000001"),
      state: 'CART'
    },
    update: {
      $set: {
        state: 'PRE-AUTHORIZE'
      }
    },
    'new': true
  })

If successful, findAndModify will return the modified order object to newDoc.[5] Once the order is in the PRE-AUTHORIZE state, the user won’t be able to edit the cart’s contents. This is because all updates to the cart always ensure a state of CART. findAndModify is useful here because you want to know the state of the document exactly when you changed its state to PRE-AUTHORIZE. What would happen to the total calculations if another thread was also attempting to move the user through the checkout process?

5

By default, the findAndModify command returns the document as it appears prior to the update. To return the modified document, you must specify 'new': true as in this example.
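To see why the match-and-update atomicity matters, here is a toy in-memory model (plain JavaScript, not the real driver or server) of two callers racing to move the same order out of the CART state. Because the server performs the match and the update as one indivisible step, the race loser simply finds no document in the CART state:

```javascript
// Toy in-memory stand-in for the orders collection.
const orders = [{ _id: 1, user_id: 'u1', state: 'CART', total: 99000 }];

// A sketch of findAndModify's contract: match and update atomically,
// returning the updated document (as with 'new': true) or null.
function findAndModify(query, set) {
  const doc = orders.find(o =>
    Object.keys(query).every(k => o[k] === query[k]));
  if (!doc) return null;
  Object.assign(doc, set);   // on the real server, no other op can interleave here
  return doc;
}

const first  = findAndModify({ user_id: 'u1', state: 'CART' },
                             { state: 'PRE-AUTHORIZE' });
const second = findAndModify({ user_id: 'u1', state: 'CART' },
                             { state: 'PRE-AUTHORIZE' });
console.log(first.state);   // 'PRE-AUTHORIZE'
console.log(second);        // null -- the second caller loses the race cleanly
```

Only one caller ever obtains the document in the CART state, which is exactly the guarantee the checkout flow relies on.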

Verify the order and authorize

Now, in the preauthorization state, you take the returned order object and recalculate the various totals. Once you have those totals, you issue a new findAndModify that only transitions the document’s state to AUTHORIZING if the new totals match the old totals. Here’s what that findAndModify looks like:

oldDoc = db.orders.findAndModify({
    query: {
      user_id: ObjectId("4c4b1476238d3b4dd5000001"),
      total: 99000,
      state: "PRE-AUTHORIZE"
    },
    update: {
      '$set': {
        state: "AUTHORIZING"
      }
    }
  })

If this second findAndModify fails, then you must return the order’s state to CART and report the updated totals to the user. But if it succeeds, you know that the total to be authorized is the same total that was presented to the user. This means you can move on to the actual authorization API call. Thus, the application now issues a credit card authorization request on the user’s credit card. If the credit card fails to authorize, you record the failure and, as before, return the order to its CART state.

Finishing the order

If the authorization is successful, you write the authorization information to the order and transition it to the next state. The following strategy does both in the same findAndModify call. Here, the example uses a sample document representing the authorization receipt, which is attached to the original order:

auth_doc = {
  ts: new Date(),
  cc: "3432003948293040",            // card and gateway transaction IDs are
  id: "2923838291029384483949348",   // stored as strings to avoid numeric precision loss
  gateway: "Authorize.net"
}
db.orders.findAndModify({
    query: {
      user_id: ObjectId("4c4b1476238d3b4dd5000001"),
      state: "AUTHORIZING"
    },

    update: {
      $set: {
        state: "PRE-SHIPPING",
        authorization: auth_doc
      }
    }
  })

It’s important to be aware of the MongoDB features that facilitate this transactional process. There’s the ability to modify any one document atomically. There’s the guarantee of consistent reads along a single connection. And finally, there’s the document structure itself, which allows these operations to fit within the single-document atomicity that MongoDB provides. In this case, that structure allows you to fit line items, products, pricing, and user ownership into the same document, ensuring that you only ever need to operate on that one document to advance the sale.

This ought to strike you as impressive. But it may lead you to wonder, as it did us, whether any multi-object transaction-like behavior can be implemented with MongoDB. The answer is a cautious affirmative and can be demonstrated by looking into another e-commerce centerpiece: inventory management.

7.3.2. Inventory management

Not every e-commerce site needs strict inventory management. Most commodity items can be replenished in enough time to allow any order to go through regardless of the actual number of items on hand. In cases like these, managing inventory is easily handled by managing expectations; as soon as only a few items remain in stock, adjust the shipping estimates.

One-of-a-kind items present a different challenge. Imagine you’re selling concert tickets with assigned seats or handmade works of art. These products can’t be hedged; users will always need a guarantee that they can purchase the products they’ve selected. Here we’ll present a possible solution to this problem using MongoDB. This will further illustrate the creative possibilities in the findAndModify command and the judicious use of the document model. It will also show how to implement transactional semantics across multiple documents. Although you’ll only see a few of the key MongoDB calls used by this process, the full source code for the InventoryFetcher class is included with this book.

The way you model inventory can be best understood by thinking about a real store. If you’re in a gardening store, you can see and feel the physical inventory; dozens of shovels, rakes, and clippers may line the aisles. If you take a shovel and place it in your cart, that’s one less shovel available for the other customers. As a corollary, no two customers can have the same shovel in their shopping carts at the same time. You can use this simple principle to model inventory. For every physical piece of inventory in your warehouse, you store a corresponding document in an inventory collection. If there are 10 shovels in the warehouse, there are 10 shovel documents in the database. Each inventory item is linked to a product by sku, and each of these items can be in one of four states: AVAILABLE (0), IN_CART (1), PRE_ORDER (2), or PURCHASED (3).

Here’s a method that inserts three shovels, three rakes, and three sets of clippers as available inventory. The examples in this section are in Ruby, since transactions require more logic, so it’s useful to see a more concrete example of how an application would implement them:

AVAILABLE = 0   # numeric state constant, per the states listed above

3.times do
  $inventory.insert_one({:sku => 'shovel',   :state => AVAILABLE})
  $inventory.insert_one({:sku => 'rake',     :state => AVAILABLE})
  $inventory.insert_one({:sku => 'clippers', :state => AVAILABLE})
end

We’ll handle inventory management with a special inventory fetching class. We’ll first look at how this fetcher works and then we’ll peel back the covers to reveal its implementation.

Inventory fetcher

The inventory fetcher can add arbitrary sets of products to a shopping cart. Here you create a new order object and a new inventory fetcher. You then ask the fetcher to add three shovels and one set of clippers to a given order by passing an order ID and two documents specifying the products and quantities you want to the add_to_cart method. The fetcher hides the complexity of this operation, which is altering two collections at once:

$order_id = BSON::ObjectId('561297c5530a69dbc9000000')
$orders.insert_one({
    :_id => $order_id,
    :username => 'kbanker',
    :item_ids => []
  })

@fetcher = InventoryFetcher.new({
    :orders => $orders,
    :inventory => $inventory
  })

@fetcher.add_to_cart($order_id,
    {:sku => "shovel", :quantity => 3},
    {:sku => "clippers", :quantity => 1})

$orders.find({"_id" => $order_id}).each do |order|
  puts "\nHere's the order:"
  p order
end

The add_to_cart method will raise an exception if it fails to add every item to a cart. If it succeeds, the order should look like this:

{
  "_id" => BSON::ObjectId('4cdf3668238d3b6e3200000a'),
  "username" => "kbanker",
  "item_ids" => [
    BSON::ObjectId('4cdf3668238d3b6e32000001'),
    BSON::ObjectId('4cdf3668238d3b6e32000004'),
    BSON::ObjectId('4cdf3668238d3b6e32000007'),
    BSON::ObjectId('4cdf3668238d3b6e32000009')
     ]
   }

The _id of each physical inventory item will be stored in the order document. You can query for each of these items like this:

puts "\nHere's each item:"
order['item_ids'].each do |item_id|
  $inventory.find({"_id" => item_id}).each do |item|
    p item
  end
end

Looking at each of these items individually, you can see that each has a state of 1, corresponding to the IN_CART state. You should also notice that each item records the time of the last state change with a timestamp. You can later use this timestamp to expire items that have been in a cart for too long. For instance, you might give users 15 minutes to check out from the time they add products to their cart:

{
  "_id" => BSON::ObjectId('4cdf3668238d3b6e32000001'),
  "sku"=>"shovel",
  "state"=>1,
  "ts"=>"Sun Nov 14 01:07:52 UTC 2010"
}
{
  "_id"=>BSON::ObjectId('4cdf3668238d3b6e32000004'),
  "sku"=>"shovel",
  "state"=>1,
  "ts"=>"Sun Nov 14 01:07:52 UTC 2010"
}
{
  "_id"=>BSON::ObjectId('4cdf3668238d3b6e32000007'),
  "sku"=>"shovel",
  "state"=>1,
  "ts"=>"Sun Nov 14 01:07:52 UTC 2010"
}
Inventory management

If this InventoryFetcher’s API makes any sense, you should have at least a few hunches about how you’d implement inventory management. Unsurprisingly, the findAndModify command resides at its core. The full source code for the InventoryFetcher is included with the source code of this book. We’re not going to look at every line of code, but we’ll highlight the three key methods that make it work.

First, when you pass a list of items to be added to your cart, the fetcher attempts to transition each item from the state of AVAILABLE to IN_CART. If at any point this operation fails (if any one item can’t be added to the cart), the entire operation is rolled back. Have a look at the add_to_cart method that you invoked earlier:

def add_to_cart(order_id, *items)
  item_selectors = []
  items.each do |item|
    item[:quantity].times do
      item_selectors << {:sku => item[:sku]}
    end
  end
  transition_state(order_id, item_selectors,
      {:from => AVAILABLE, :to => IN_CART})
end

The *items syntax in the method arguments allows the user to pass in any number of objects, which are placed in an array called items. This method doesn’t do much. It takes the specification for items to add to the cart and expands the quantities so that one item selector exists for each physical item that will be added to the cart. For instance, this document, which says that you want to add two shovels

{:sku => "shovel", :quantity => 2}

becomes this:

[{:sku => "shovel"}, {:sku => "shovel"}]
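This expansion step is simple enough to sketch in plain JavaScript (the function name is ours, chosen for illustration):

```javascript
// Expand {sku, quantity} specifications into one query selector per
// physical item -- the same transformation add_to_cart performs.
function expandSelectors(items) {
  const selectors = [];
  for (const item of items) {
    for (let i = 0; i < item.quantity; i++) {
      selectors.push({ sku: item.sku });
    }
  }
  return selectors;
}

console.log(expandSelectors([{ sku: 'shovel', quantity: 2 }]));
// [ { sku: 'shovel' }, { sku: 'shovel' } ]
```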

You need a separate query selector for each item you want to add to your cart. Thus, the method passes the array of item selectors to another method called transition _state. For example, the previous code specifies that the state should be transitioned from AVAILABLE to IN_CART:

def transition_state(order_id, selectors, opts={})
  items_transitioned = []
  begin # use a begin/end block so we can do error recovery

    for selector in selectors do
      query = selector.merge({:state => opts[:from]})
      physical_item = @inventory.find_one_and_update(
          query,
          {
            '$set' => {
              :state => opts[:to],          # target state
              :ts => Time.now.utc           # record the client time of the change
            }
          })


      if physical_item.nil?
        raise InventoryFetchFailure
      end

      items_transitioned << physical_item['_id']   # push item into array
      @orders.update_one({:_id => order_id}, {
          '$push' => {
            :item_ids => physical_item['_id']
          }
        })
    end # of for loop

  rescue Mongo::Error::OperationFailure, InventoryFetchFailure
    rollback(order_id, items_transitioned, opts[:from], opts[:to])
    raise InventoryFetchFailure, "Failed to add #{selector[:sku]}"
  end

  return items_transitioned.size
end

To transition state, each selector gets an extra condition, {:state => AVAILABLE}, and the selector is then passed to findAndModify, which, if matched, sets a timestamp and the item’s new state. The method then saves the list of items transitioned and updates the order with the ID of the item just added.

Graceful failure

If the findAndModify command fails and returns nil, then you raise an InventoryFetchFailure exception. If the command fails because of networking errors, you rescue the inevitable Mongo::Error::OperationFailure exception. In both cases, you rescue by rolling back all the items transitioned thus far and then raise an InventoryFetchFailure, which includes the SKU of the item that couldn’t be added. You can then rescue this exception on the application layer to fail gracefully for the user.

All that now remains is to examine the rollback code:

def rollback(order_id, item_ids, old_state, new_state)
  @orders.update_one({"_id" => order_id},
                 {"$pullAll" => {:item_ids => item_ids}})

  item_ids.each do |id|
    @inventory.find_one_and_update(
      {
        "_id" => id,
        :state => new_state
      },
      {
        "$set" => {
          :state => old_state,
          :ts => Time.now.utc
        }
      })
  end
end

You use the $pullAll operator to remove all of the IDs just added to the order’s item_ids array. You then iterate over the list of item IDs and transition each one back to its old state. The $pullAll operator as well as many other array update operators are covered in further detail in section 7.4.2.

The transition_state method can be used as the basis for other methods that move items through their successive states. It wouldn’t be difficult to integrate this into the order transition system that you built in the previous subsection, but that must be left as an exercise for the reader.

One scenario ignored in this implementation is the case when it’s impossible to roll back all the inventory items to their original state. This could occur if the Ruby driver was unable to communicate with MongoDB, or if the process running the rollback halted before completing. This would leave inventory items in an IN_CART state, but the orders collection wouldn’t have the inventory. In such cases managing transactions becomes difficult. These could eventually be fixed, however, by the shopping cart timeout mentioned earlier that removes items that have been in the shopping cart longer than some specified period.
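The timeout sweep suggested above is straightforward to model. Here's a plain-JavaScript, in-memory sketch (the `sweepExpired` helper is hypothetical; in a real deployment this would be a periodic job issuing a multi-update whose selector matches `state: IN_CART` with a `ts` older than the cutoff):

```javascript
const AVAILABLE = 0, IN_CART = 1;        // states as defined earlier
const FIFTEEN_MIN = 15 * 60 * 1000;

// Return any item stuck IN_CART longer than 15 minutes to AVAILABLE.
function sweepExpired(inventory, now) {
  for (const item of inventory) {
    if (item.state === IN_CART && now - item.ts > FIFTEEN_MIN) {
      item.state = AVAILABLE;            // in MongoDB: an update with $set
      item.ts = now;
    }
  }
}

const now = Date.now();
const inventory = [
  { sku: 'shovel', state: IN_CART, ts: now - 20 * 60 * 1000 }, // stale
  { sku: 'rake',   state: IN_CART, ts: now - 60 * 1000 }       // fresh
];
sweepExpired(inventory, now);
console.log(inventory.map(i => i.state)); // [ 0, 1 ]
```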

You may justifiably ask whether this system is robust enough for production. This question can’t be answered easily without knowing more particulars, but what can be stated assuredly is that MongoDB provides enough features to permit a usable solution when you need transaction-like behavior. MongoDB was never intended to support transactions with multiple collections, but it allows the user to emulate such behavior with find_one_and_update and optimistic concurrency control. If you find yourself attempting to manage transactions often, it may be worth rethinking your schema or even using a different database. Not every application fits with MongoDB, but if you carefully plan your schema you can often obviate your need for such transactions.

7.4. Nuts and bolts: MongoDB updates and deletes

To understand updates in MongoDB, you need a holistic understanding of MongoDB’s document model and query language, and the examples in the preceding sections are great for helping with that. But here, as promised in this chapter’s introduction, we get down to brass tacks. This mostly involves brief summaries of each feature of the MongoDB update interface, but we also include several notes on performance. For brevity’s sake, most of the upcoming examples will be in JavaScript.

7.4.1. Update types and options

As we’ve shown in our earlier examples, MongoDB supports both targeted updates and updates via replacement. The former are defined by the use of one or more update operators; the latter by a document that will be used to replace the document matched by the update’s query selector.

Note that an update will fail if the update document is ambiguous. This is a common gotcha with MongoDB and an easy mistake to make given the syntax. Here, we’ve combined an update operator, $addToSet, with replacement-style semantics, {name: "Pitchfork"}:

db.products.update({}, {name: "Pitchfork", $addToSet: {tags: 'cheap'}})

If your intention is to change the document’s name, you must use the $set operator:

db.products.update({},
  {$set: {name: "Pitchfork"}, $addToSet: {tags: 'cheap'}})
Multidocument updates

An update will, by default, only update the first document matched by its query selector. To update all matching documents, you need to explicitly specify a multidocument update. In the shell, you can express this by adding the parameter multi: true. Here’s how you’d add the cheap tags to all documents in the products collection:

db.products.update({}, {$addToSet: {tags: 'cheap'}}, {multi: true})

Updates are atomic at a document level, which means that a statement that has to update 10 documents might fail for some reason after updating the first 3 of them. The application has to deal with such failures according to its policy.

With the Ruby driver (and most other drivers), you can express multidocument updates in a similar manner:

@products.update_many({},
    {'$addToSet' => {'tags' => 'cheap'}})
Upserts

It’s common to need to insert an item if it doesn’t exist but update it if it does. You can handle this normally tricky-to-implement pattern using upserts. If the query selector matches, the update takes place normally. But if no document matches the query selector, a new document will be inserted. The new document’s attributes will be a logical merging of the query selector and the targeted update document.[6]

6

Note that upserts don’t work with replacement-style update documents.

Here’s a simple example of an upsert using the shell, setting the upsert: true parameter to allow an upsert:

db.products.update({slug: 'hammer'},
                   {$addToSet: {tags: 'cheap'}}, {upsert: true})

And here’s an equivalent upsert in Ruby:

@products.update_one({'slug' => 'hammer'},
  {'$addToSet' => {'tags' => 'cheap'}}, {:upsert => true})

As you’d expect, upserts can insert or update only one document at a time. You’ll find upserts incredibly valuable when you need to update atomically and when there’s uncertainty about a document’s prior existence. For a practical example, see section 7.2.3, which describes adding products to a cart.
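The "logical merging" an upsert performs on insert can be illustrated with a toy function (ours, for illustration only; the real server does this merge internally, and only the $addToSet and $inc cases are modeled here):

```javascript
// Sketch of the document an upsert inserts when no document matches:
// the query selector's fields, plus the effect of the update operators.
function simulateUpsertInsert(selector, update) {
  const doc = { ...selector };                 // selector fields seed the new doc
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    doc[field] = [value];                      // $addToSet on a missing field creates an array
  }
  for (const [field, value] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + value;    // $inc on a missing field starts from 0
  }
  return doc;
}

console.log(simulateUpsertInsert({ slug: 'hammer' },
                                 { $addToSet: { tags: 'cheap' } }));
// { slug: 'hammer', tags: [ 'cheap' ] }
```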

7.4.2. Update operators

MongoDB supports a host of update operators. Here we provide brief examples of each of them.

Standard update operators

This first set of operators is the most generic, and each works with almost any data type.

$inc

You use the $inc operator to increment or decrement a numeric value:

db.products.update({slug: "shovel"}, {$inc: {review_count: 1}})
db.users.update({username: "moe"}, {$inc: {password_retries: -1}})

You can also use $inc to add or subtract from numbers arbitrarily:

db.readings.update({_id: 324}, {$inc: {temp: 2.7435}})

$inc is as efficient as it is convenient. Because it rarely changes the size of a document, an $inc usually occurs in-place on disk, affecting only the key-value pair specified.[7] Note that this in-place behavior applies only to the MMAPv1 storage engine; the WiredTiger storage engine works differently, using a write-ahead transaction log in combination with checkpoints to ensure data persistence.

7

Exceptions to this rule arise when the numeric type changes. If the $inc results in a 32-bit integer being converted to a 64-bit integer, then the entire BSON document will have to be rewritten.

As demonstrated in the code for adding products to a shopping cart, $inc works with upserts. For example, you can change the preceding update to an upsert like this:

db.readings.update({_id: 324}, {$inc: {temp: 2.7435}}, {upsert: true})

If no reading with an _id of 324 exists, a new document will be created with that _id and a temp with the value of the $inc, 2.7435.

$set and $unset

If you need to set the value of a particular key in a document, you’ll want to use $set. You can set a key to a value having any valid BSON type. This means that all of the following updates are possible:

db.readings.update({_id: 324}, {$set: {temp: 97.6}})
db.readings.update({_id: 325}, {$set: {temp: {f: 212, c: 100}}})
db.readings.update({_id: 326}, {$set: {temps: [97.6, 98.4, 99.1]}})

If the key being set already exists, then its value will be overwritten; otherwise, a new key will be created.

$unset removes the provided key from a document. Here’s how to remove the temp key from the reading document:

db.readings.update({_id: 324}, {$unset: {temp: 1}})

You can also use $unset on embedded documents and on arrays. In both cases, you specify the inner object using dot notation. If you have these two documents in your collection

{_id: 325, 'temp': {f: 212, c: 100}}
{_id: 326, temps: [97.6, 98.4, 99.1]}

then you can remove the Fahrenheit reading in the first document and the “zeroth” element in the second document like this:

db.readings.update({_id: 325}, {$unset: {'temp.f': 1}})
db.readings.update({_id: 326}, {$pop: {temps: -1}})

This dot notation for accessing subdocuments and array elements can also be used with $set.

Using $unset with arrays

Note that using $unset on individual array elements may not work exactly as you want it to. Instead of removing the element altogether, it merely sets that element’s value to null. To completely remove an array element, see the $pull and $pop operators:

db.readings.update({_id: 325}, {$unset: {'temp.f': 1}})
db.readings.update({_id: 326}, {$unset: {'temps.0': 1}})
$rename

If you need to change the name of a key, use $rename:

db.readings.update({_id: 324}, {$rename: {'temp': 'temperature'}})

You can also rename a subdocument:

db.readings.update({_id: 325}, {$rename: {'temp.f': 'temp.fahrenheit'}})
$setOnInsert

During an upsert, you sometimes need to be careful not to overwrite data that you care about. In this case it would be useful to specify that you only want to modify a field when the document is new, and you perform an insert, not when an update occurs. This is where the $setOnInsert operator comes in:

db.products.update({slug: 'hammer'}, {
    $inc: {
      quantity: 1
    },
    $setOnInsert: {

      state: 'AVAILABLE'
    }
  }, {upsert: true})

You want to increment the quantity for a certain inventory item without interfering with state, which has a default value of 'AVAILABLE'. If an insert is performed, then quantity will be set to 1, and state will be set to its default value. If an update is performed, then only the increment to quantity occurs. The $setOnInsert operator was added in MongoDB v2.4 to handle this case.

Array update operators

The centrality of arrays in MongoDB’s document model should be apparent. Naturally, MongoDB provides a handful of update operators that apply exclusively to arrays.

$push, $pushAll, and $each

If you need to append values to an array, $push is your friend. By default, it will add a single element to the end of an array. For example, adding a new tag to the shovel product is easy enough:

db.products.update({slug: 'shovel'}, {$push: {tags: 'tools'}})

If you need to add a few tags in the same update, you can use $each in conjunction with $push:

db.products.update({slug: 'shovel'},
  {$push: {tags: {$each: ['tools', 'dirt', 'garden']}}})

Note you can push values of any type onto an array, not just scalars. For an example, see the code in section 7.3.2 that pushed a product onto the shopping cart’s line items array.

Prior to MongoDB version 2.4, you pushed multiple values onto an array by using the $pushAll operator. This approach still works in 2.4 and later versions, but it’s deprecated (and was removed entirely in MongoDB 3.6), so prefer $push with $each. A $pushAll operation can be run like this:

db.products.update({slug: 'shovel'},
  {$pushAll: {'tags': ['tools', 'dirt', 'garden']}})
$slice

The $slice operator was added in MongoDB v2.4 to make it easier to manage arrays of values with frequent updates. It’s useful when you want to push values onto an array but don’t want the array to grow too big. It must be used in conjunction with the $push and $each operators, and it allows you to truncate the resulting array to a certain size, removing the oldest elements first. The argument passed to $slice is an integer that must be less than or equal to zero; its absolute value is the number of items that should remain in the array after the update.

These semantics can be confusing, so let’s look at a concrete example. Suppose you want to update a document that looks like this:

{
  _id: 326,
  temps: [92, 93, 94]
}

You update this document with this command:

db.temps.update({_id: 326}, {
    $push: {
      temps: {
        $each: [95, 96],
        $slice: -4
      }
    }
  })

Beautiful syntax. Here you pass -4 to the $slice operator. After the update, your document looks like this:

{
  _id: 326,
  temps: [93, 94, 95, 96]
}

After pushing values onto the array, you remove values from the beginning until only four are left. If you’d passed -1 to the $slice operator, the resulting array would be [96]. If you’d passed 0, it would have been [], an empty array. Note also that starting with MongoDB 2.6 you can pass a positive number as well. If a positive number is passed to $slice, it’ll remove values from the end of the array instead of the beginning. In the previous example, if you used $slice: 4 your result would’ve been temps: [92, 93, 94, 95].
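These semantics map directly onto plain array operations, so they can be modeled in a few lines of JavaScript (the helper name is ours):

```javascript
// Model of $push + $each + $slice: append the new values, then truncate.
function pushWithSlice(arr, each, slice) {
  const combined = arr.concat(each);
  if (slice === 0) return [];
  return slice < 0
    ? combined.slice(slice)      // negative: keep the last |slice| items
    : combined.slice(0, slice);  // positive (2.6+): keep the first slice items
}

console.log(pushWithSlice([92, 93, 94], [95, 96], -4)); // [ 93, 94, 95, 96 ]
console.log(pushWithSlice([92, 93, 94], [95, 96], 4));  // [ 92, 93, 94, 95 ]
console.log(pushWithSlice([92, 93, 94], [95, 96], -1)); // [ 96 ]
```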

$sort

Like $slice, the $sort operator was added in MongoDB v2.4 to help with updating arrays. When you use $push and $slice, you sometimes want to order the documents before slicing them off from the start of the array. Consider this document:

{
  _id: 300,
  temps: [
    { day: 6, temp: 90 },
    { day: 5, temp: 95 }
  ]
}

You have an array of subdocuments. When you push a subdocument onto this array and slice it, you first want to make sure it’s ordered by day, so you retain the higher day values. You can accomplish this with the following update:

db.temps.update({_id: 300}, {
    $push: {
      temps: {
        $each: [
          { day: 7, temp: 92 }
        ],
        $slice: -2,
        $sort: {
          day: 1
        }
      }
    }
  })

When this update runs, you first sort the temps array on day so that the lowest value is at the beginning. Then you slice the array down to two values. The result is the two subdocuments with the higher day values:

{
  _id: 300,
  temps: [
    { day: 6, temp: 90 },
    { day: 7, temp: 92 }
  ]
}

Used in this context, the $sort operator requires a $push, an $each, and a $slice. Though useful, this definitely handles a corner case, and you may not find yourself using the $sort update operator often.
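The whole pipeline (push the new subdocuments, sort, then slice) can likewise be modeled on a plain array, which makes the ordering guarantee easy to see:

```javascript
// Model of $push with $each, $sort: {day: 1}, and a negative $slice.
function pushSortSlice(arr, each, sortKey, slice) {
  const combined = arr.concat(each);
  combined.sort((a, b) => a[sortKey] - b[sortKey]); // ascending, like $sort: {day: 1}
  return combined.slice(slice);                     // negative slice keeps the tail
}

const temps = [{ day: 6, temp: 90 }, { day: 5, temp: 95 }];
const result = pushSortSlice(temps, [{ day: 7, temp: 92 }], 'day', -2);
console.log(result); // [ { day: 6, temp: 90 }, { day: 7, temp: 92 } ]
```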

$addToSet and $each

$addToSet also appends a value to an array, but it does so in a more discerning way: the value is added only if it doesn’t already exist in the array. Thus, if your shovel has already been tagged as a tool, then the following update won’t modify the document at all:

db.products.update({slug: 'shovel'}, {$addToSet: {'tags': 'tools'}})

If you need to add more than one value to an array uniquely in the same operation, you must use $addToSet with the $each operator. Here’s how that looks:

db.products.update({slug: 'shovel'},
  {$addToSet: {tags: {$each: ['tools', 'dirt', 'steel']}}})

Only those values in $each that don’t already exist in tags will be appended. Note that $each can only be used with the $addToSet and $push operators.

$pop

The most elementary way to remove an item from an array is with the $pop operator. If $push appends an item to an array, a subsequent $pop will remove that last item pushed. Though it’s frequently used with $push, you can use $pop on its own. If your tags array contains the values ['tools', 'dirt', 'garden', 'steel'], then the following $pop will remove the steel tag:

db.products.update({slug: 'shovel'}, {$pop: {'tags': 1}})

Like $unset, $pop’s syntax is {$pop: {'nameOfArrayField': 1}}, where the key is the name of the array field, not the element to remove. But unlike $unset, $pop takes a second possible value of -1, which removes the first element of the array instead of the last. Here’s how to remove the tools tag from the array:

db.products.update({slug: 'shovel'}, {$pop: {'tags': -1}})

One possible point of frustration is that you can’t return the value that $pop removes from the array. Thus, despite its name, $pop doesn’t work exactly like the stack operation you might have in mind.

$bit

If you ever use bitwise operations in your application code, you may find yourself wishing that you could use the same operations in an update. Bitwise operations are used to perform logic on a value at the individual bit level. One common case (particularly in C programming) is to use bitwise operations to pass flags through a variable. In other words, if the fourth bit in an integer is 1, then some condition applies. There’s often a clearer and more usable way to handle these operations, but this kind of storage does keep size to a minimum and matches how existing systems work. MongoDB includes the $bit operator to make bitwise OR and AND operations possible in updates.

Let’s look at an example of storing bit-sensitive values in MongoDB and manipulating them in an update. Unix file permissions are often stored in this way. If you run ls -l in a Unix system, you’ll see flags like drwxr-xr-x. The first flag, d, indicates the file is a directory. r denotes read permissions, w denotes write permissions, and x denotes execute permissions. There are three blocks of these flags, denoting these permissions for the user, the user’s group, and everyone, respectively. Thus the example given says that the user has all permissions but others have only read and execute permissions.

A permission block is often described with a single number, formed by interpreting its three flags as the bits of a binary number. The x value is 1, the w value is 2, and the r value is 4. Thus you can use 7 to indicate a binary 111, or rwx. You can use 5 to indicate a binary 101, or r-x. And you can use 3 to indicate a binary 011, or -wx.

Let’s store a variable in MongoDB that uses these characteristics. Start with the document:

{
  _id: 16,
  permissions: 4
}

The 4 in this case denotes binary 100, or r--. You can use a bitwise OR operation to add write permissions:

db.permissions.update({_id: 16}, {$bit: {permissions: {or: NumberInt(2)}}})

In the JavaScript shell you must wrap the value in NumberInt() because the shell represents numbers as doubles by default. The resulting document contains a binary 100 ORed with a binary 010, resulting in 110, which is decimal 6:

{
  _id: 16,
  permissions: 6
}

You can also use and instead of or for a bitwise AND operation. This is another corner-case operator that you might not use often but that can be useful in certain situations.
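The arithmetic above is exactly what JavaScript's own bitwise operators compute, so you can check the example without a server:

```javascript
// The permission bits from the example, written with JS bitwise operators
// (the same logic $bit applies server-side).
const READ = 4, WRITE = 2, EXECUTE = 1;

let permissions = READ;              // binary 100, i.e. r--
permissions = permissions | WRITE;   // what {$bit: {permissions: {or: NumberInt(2)}}} does
console.log(permissions);            // 6, binary 110, i.e. rw-

// A bitwise AND masks permissions away instead:
console.log(permissions & READ);     // 4 -- only the read bit survives
```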

$pull and $pullAll

$pull is $pop’s more sophisticated cousin. With $pull, you specify exactly which array element to remove by value, not by position. Returning to the tags example, if you need to remove the tag dirt, you don’t need to know where in the array it’s located; you simply tell the $pull operator to remove it:

db.products.update({slug: 'shovel'}, {$pull: {tags: 'dirt'}})

$pullAll works similarly to $pushAll, allowing you to provide a list of values to remove. To remove both the tags dirt and garden, you can use $pullAll like this:

db.products.update({slug: 'shovel'},
  {$pullAll: {'tags': ['dirt', 'garden']}})

A powerful feature of $pull is the fact that you can pass in a query as an argument to choose which elements are pulled. Consider the document:

{_id: 326, temps: [97.6, 98.4, 100.5, 99.1, 101.2]}

Suppose you want to remove temperatures greater than 100. A query to do so might look like this:

db.readings.update({_id: 326}, {$pull: {temps: {$gt: 100}}})

This alters the document to the following:

{_id: 326, temps: [97.6, 98.4, 99.1]}
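Conceptually, a $pull with a query condition keeps every element that does not match the condition, which is easy to model with a filter:

```javascript
// Model of {$pull: {temps: {$gt: 100}}}: drop every element matching the query.
const temps = [97.6, 98.4, 100.5, 99.1, 101.2];
const pulled = temps.filter(t => !(t > 100));
console.log(pulled); // [ 97.6, 98.4, 99.1 ]
```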
Positional updates

It’s common to model data in MongoDB using an array of subdocuments, but it wasn’t so easy to manipulate those subdocuments until the positional operator came along. The positional operator allows you to update a subdocument in an array identified by using dot notation in your query selector. For example, suppose you have an order document that looks like this:

{
  _id: ObjectId("6a5b1476238d3b4dd5000048"),
  line_items: [

    {
      _id: ObjectId("4c4b1476238d3b4dd5003981"),
      sku: "9092",
      name: "Extra Large Wheelbarrow",
      quantity: 1,
      pricing: {
        retail: 5897,
        sale: 4897
      }
    },
    {
      _id: ObjectId("4c4b1476238d3b4dd5003982"),
      sku: "10027",
      name: "Rubberized Work Glove, Black",
      quantity: 2,
      pricing: {
        retail: 1499,
        sale: 1299
      }
    }
  ]
}

You want to be able to set the quantity of the second line item, with the SKU of 10027, to 5. The problem is that you don’t know where in the line_items array this particular subdocument resides. You don’t even know whether it exists. You can use a simple query selector and the positional operator to solve both these problems:

query  = {
  _id: ObjectId("6a5b1476238d3b4dd5000048"),
  'line_items.sku': "10027"
}
update = {
  $set: {
    'line_items.$.quantity': 5
  }
}
db.orders.update(query, update)

The positional operator is the $ that you see in the line_items.$.quantity string. When the query selector matches, MongoDB internally replaces the $ with the index of the first matching array element (here, the subdocument whose SKU is 10027), so the update is applied to the correct element.

If your data model includes subdocuments, you’ll find the positional operator useful for performing nuanced document updates.
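Conceptually, the server resolves $ to the index of the first array element matched by the query selector. A plain-JavaScript sketch of that resolution (MongoDB does this internally; this is only an illustration):

```javascript
// Sketch of how the positional operator resolves to an array index.
const order = {
  line_items: [
    {sku: "9092",  quantity: 1},
    {sku: "10027", quantity: 2}
  ]
};

// The query clause {'line_items.sku': "10027"} identifies a position...
const idx = order.line_items.findIndex(item => item.sku === "10027");

// ...and 'line_items.$.quantity' effectively means 'line_items.<idx>.quantity'.
if (idx !== -1) {
  order.line_items[idx].quantity = 5;
}

console.log(order.line_items[1].quantity); // 5
```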

7.4.3. The findAndModify command

With so many fleshed-out examples of using the findAndModify command earlier in this chapter, it only remains to enumerate its options when using it in the JavaScript shell. Here’s an example of a simple findAndModify:

doc = db.orders.findAndModify({
     query: {
       user_id: ObjectId("4c4b1476238d3b4dd5000001"),
     },
     update: {
       $set: {
         state: "AUTHORIZING"
       }
     }
   })

There are a number of options for altering this command’s functionality. Of the following, the only options required are query and either update or remove:

  • query —A document query selector. Defaults to {}.
  • update —A document specifying an update. Defaults to {}.
  • remove —A Boolean value that, when true, removes the object and then returns it. Defaults to false.
  • new —A Boolean that, if true, returns the modified document as it appears after the update has been applied. Defaults to false, meaning the original document is returned.
  • sort —A document specifying a sort direction. Because findAndModify will modify only one document at a time, the sort option can be used to help control which matching document is processed. For example, you might sort by {created_at: -1} to process the most recently created matching document.
  • fields —If you only need to return a subset of fields, use this option to specify them. This is especially helpful with larger documents. The fields are specified as they’d be in any query. See the section on fields in chapter 5 for examples.
  • upsert —A Boolean that, when true, treats findAndModify as an upsert. If the document sought doesn’t exist, it will be created. Note that if you want to return the newly created document, you also need to specify {new: true}.
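To make the interplay of query, sort, update, remove, and new concrete, here's a simplified in-memory model of findAndModify's behavior over an array of documents. This is a sketch of the semantics only, not MongoDB's implementation: it supports exact-match queries, a single numeric sort field, and $set, and omits everything else.

```javascript
// Simplified in-memory model of findAndModify semantics.
function findAndModify(docs, {query = {}, sort, update, remove = false, new: returnNew = false}) {
  // query: keep documents whose fields exactly match the selector.
  let matches = docs.filter(d =>
    Object.keys(query).every(k => d[k] === query[k]));

  // sort: order matches by a single field, 1 ascending or -1 descending.
  if (sort) {
    const [field, dir] = Object.entries(sort)[0];
    matches = matches.slice().sort((a, b) => (a[field] - b[field]) * dir);
  }

  // Only one document is ever modified.
  const doc = matches[0];
  if (!doc) return null;

  if (remove) {
    docs.splice(docs.indexOf(doc), 1);
    return doc;                      // the removed document is returned
  }

  const original = {...doc};         // snapshot before modification
  Object.assign(doc, update.$set);   // apply the $set portion in place
  return returnNew ? doc : original; // new: true returns the updated doc
}

const orders = [
  {_id: 1, user_id: "u1", state: "CART", created_at: 10},
  {_id: 2, user_id: "u1", state: "CART", created_at: 20}
];

// Sort descending by created_at to process the most recent matching order.
const result = findAndModify(orders, {
  query: {user_id: "u1"},
  sort: {created_at: -1},
  update: {$set: {state: "AUTHORIZING"}},
  new: true
});

console.log(result._id, result.state); // 2 AUTHORIZING
```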

7.4.4. Deletes

You’ll be relieved to learn that removing documents poses few challenges. You can remove an entire collection or you can pass a query selector to the remove method to delete only a subset of a collection. Deleting all reviews is simple:

db.reviews.remove({})

But it’s much more common to delete only the reviews of a particular user:

db.reviews.remove({user_id: ObjectId('4c4b1476238d3b4dd5000001')})

All calls to remove take an optional query specifier for selecting exactly which documents to delete. As far as the API goes, that’s all there is to say. But you’ll have a few questions surrounding the concurrency and atomicity of these operations. We’ll explain that in the next section.

7.4.5. Concurrency, atomicity, and isolation

It’s important to understand how concurrency works in MongoDB. Prior to MongoDB v2.2, the locking strategy was rather coarse; a single global reader-writer lock reigned over the entire mongod instance. What this meant was that at any moment in time, MongoDB permitted either one writer or multiple readers (but not both). In MongoDB v2.2 this was changed to a database-level lock, meaning these semantics apply at the database level rather than across the entire MongoDB instance; a database can have either one writer or multiple readers. In MongoDB v3.0, the WiredTiger storage engine works at the collection level and offers document-level locking. Other storage engines may offer other characteristics.

The locking characteristics sound a lot worse than they are in practice because quite a few concurrency optimizations exist around this lock. One is that the database keeps an internal map of which documents are in RAM. For requests to read or write documents not in RAM, the database yields to other operations until the document can be paged into memory.

A second optimization is the yielding of write locks. The issue is that if any one write takes a long time to complete, all other read and write operations will be blocked for the duration of the original write. All inserts, updates, and removes take a write lock. Inserts rarely take a long time to complete. But updates that affect, say, an entire collection, as well as deletes that affect a lot of documents, can run long. The current solution to this is to allow these long-running ops to yield periodically for other readers and writers. When an operation yields, it pauses itself, releases its lock, and resumes later.

Despite these optimizations, MongoDB’s locking can affect performance in workloads where there are both heavy reads and heavy writes. A good but naive way to avoid trouble is to place heavily trafficked collections in separate databases, especially when you’re using the MMAPv1 storage engine. But as mentioned earlier, the situation with MongoDB v3.0 is a lot better because WiredTiger works on the collection level instead of the database level.

When you’re updating and removing documents, this yielding behavior can be a mixed blessing. It’s easy to imagine situations where you’d want all documents updated or removed before any other operation takes place. For these cases, you can use a special option called $isolated to keep the operation from yielding. You add the $isolated operator to the query selector like this:

db.reviews.remove({user_id: ObjectId('4c4b1476238d3b4dd5000001'),
  $isolated: true})

The same can be applied to any multi-update. This forces the entire multi-update to complete in isolation:

db.reviews.update({$isolated: true}, {$set: {rating: 0}}, {multi: true})

This update sets each review’s rating to 0. Because the operation happens in isolation, the operation will never yield, ensuring a consistent view of the system at all times.

Note that if an operation using $isolated fails halfway through, there’s no implicit rollback. Half the documents will have been updated while the other half will still have their original value. Prior to MongoDB v2.2 the $isolated operator was called $atomic, a name that was deprecated presumably because these operations aren’t classically atomic in this failure scenario. This, combined with the fact that the $isolated operator doesn’t work in sharded collections, means that you should use it with care.

7.4.6. Update performance notes

The following information only applies to the MMAPv1 storage engine, which is currently the default storage engine. Chapter 10 talks about WiredTiger, which works differently and more efficiently than MMAPv1. If you’re curious about WiredTiger, you’re free to read chapter 10 right now!

Experience shows that having a basic mental model of how updates affect a document on disk helps users design systems with better performance. The first thing you should understand is the degree to which an update can be said to happen “in-place.” Ideally, an update will affect the smallest portion of a BSON document on disk because this leads to the greatest efficiency. But this isn’t always what happens.

There are essentially three kinds of updates to a document on disk. The first, and most efficient, takes place when only a single value is updated and the size of the overall BSON document doesn’t change. This often happens with the $inc operator. Because $inc is only incrementing an integer, the size of that value on disk won’t change. If the integer represents an int, it’ll always take up four bytes on disk; long integers and doubles will require eight bytes. But altering the values of these numbers doesn’t require any more space and, therefore, only that one value within the document must be rewritten on disk.

The second kind of update changes the size or structure of a document. A BSON document is literally represented as a byte array, and the first four bytes of the document always store the document’s size. Thus, if you use the $push operator on a document, you’re both increasing the overall document’s size and changing its structure. This requires that the entire document be rewritten on disk. That isn’t horribly inefficient by itself, but it’s worth keeping in mind: if you have extremely large documents, say around 4 MB, and you’re pushing values onto arrays in those documents, that’s potentially a lot of work on the server side. This means that if you intend to do a lot of updates, it’s best to keep your documents small.
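You can observe that length prefix directly. The smallest possible BSON document, {}, is exactly five bytes: a four-byte little-endian length (which includes itself) followed by a null terminator. A quick sketch using Node's Buffer:

```javascript
// The empty BSON document {}: 4-byte little-endian length, then 0x00.
const emptyDoc = Buffer.from([0x05, 0x00, 0x00, 0x00, 0x00]);

// The first four bytes store the total document size, including themselves.
const size = emptyDoc.readInt32LE(0);

console.log(size); // 5
```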

The final kind of update is a consequence of rewriting a document. If a document is enlarged and can no longer fit in its allocated space on disk, not only does it need to be rewritten, but it must also be moved to a new space. This moving operation can be expensive if it occurs often. MongoDB attempts to mitigate it by dynamically adjusting a padding factor on a per-collection basis: if, within a given collection, lots of updates require documents to be relocated, the internal padding factor is increased. The padding factor is multiplied by the size of each inserted document to determine how much extra space to allocate beyond the document itself, which may reduce the number of future relocations. As of MongoDB 2.6, a power-of-2 strategy is used to size the initial allocation of a new document, which is more efficient than the method just described, and MongoDB 3.0 makes power-of-2 sizing the default record allocation strategy for MMAPv1.

To see a given collection’s padding factor, run the collection stats command:

db.tweets.stats()
{
  "ns" : "twitter.tweets",
  "count" : 53641,
  "size" : 85794884,
  "avgObjSize" : 1599.4273783113663,
  "storageSize" : 100375552,
  "numExtents" : 12,
  "nindexes" : 3,
  "lastExtentSize" : 21368832,
  "paddingFactor" : 1.2,
  "flags" : 0,
  "totalIndexSize" : 7946240,
  "indexSizes" : {
    "_id_" : 2236416,
    "user.friends_count_1" : 1564672,
    "user.screen_name_1_user.created_at_-1" : 4145152
  },
  "ok" : 1
}

This collection of tweets has a padding factor of 1.2, which indicates that when a 100-byte document is inserted, MongoDB will allocate 120 bytes on disk. The default padding value is 1, which indicates that no extra space will be allocated.
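The allocation arithmetic, both the padding-factor method and the power-of-2 strategy mentioned earlier, can be sketched as follows (the rounding and the minimum bucket size here are illustrative, not MongoDB's exact internal record sizing):

```javascript
// Padding-factor allocation: record size = document size * paddingFactor.
function paddedAllocation(docSize, paddingFactor) {
  return Math.ceil(docSize * paddingFactor);
}

// Power-of-2 allocation: round the document size up to the next power of 2.
function powerOf2Allocation(docSize) {
  let size = 32;                 // illustrative minimum bucket size
  while (size < docSize) size *= 2;
  return size;
}

console.log(paddedAllocation(100, 1.2)); // 120
console.log(powerOf2Allocation(100));    // 128
```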

Now, a brief word of warning. The considerations mentioned here apply especially to deployments where the data size exceeds RAM or where an extreme write load is expected. In these cases, rewriting or moving a document carries a particularly high cost. As you scale your MongoDB applications, think carefully about the best way to use update operations like $inc to avoid these costs.

7.5. Reviewing update operators

Table 7.1 lists the update operators we’ve discussed previously in this chapter.

Table 7.1. Operators

Standard Operators

$inc Increment fields by the given values.
$set Set fields to the given values.
$unset Unset the passed-in fields.
$rename Rename fields to the given names.
$setOnInsert In an upsert, set fields only when an insert occurs.
$bit Perform a bitwise update of a field.
Array Operators
$ Update the subdocument at the position discovered by the query selector.
$push Add a value to an array.
$pushAll Add an array of values to an array. Deprecated in favor of $each.
$addToSet Add a value to an array but do nothing if it’s a duplicate.
$pop Remove first or last item from an array.
$pull Remove values from an array that match a given query.
$pullAll Remove multiple values from an array.
Array Operator Modifiers
$each Used with $push and $addToSet to apply these operators to multiple values.
$slice Used with $push and $each to slice the updated array down to a certain size.
$sort Used with $push, $each, and $slice to sort subdocuments in an array before slicing.
Isolation Operators
$isolated Don’t allow other operations to interleave with an update of multiple documents.

7.6. Summary

We’ve covered a lot in this chapter. The variety of updates may at first feel like a lot to take in, but the power that these updates represent should be reassuring. The fact is that MongoDB’s update language is as sophisticated as its query language. You can update a simple document as easily as you can a complex, nested structure. When needed, you can atomically update individual documents and, in combination with findAndModify, build transactional workflows.

If you’ve finished this chapter and feel like you can apply the examples here on your own, you’re well on your way to becoming a MongoDB guru.
