Day 1: CRUD and Nesting

We’ll spend today working on some CRUD operations and finish up by performing nested queries in MongoDB. As usual, we won’t walk you through the installation steps, but if you visit the Mongo website,[26] you can download a build for your OS or find instructions on how to build from source. If you have OS X, we recommend installing via Homebrew (brew install mongodb). If you use a Debian/Ubuntu variant, try MongoDB’s own apt-get package.

To prevent typos, Mongo requires you to first create the directory where mongod will store its data. A common location is /data/db. Ensure the user you run the server under has permission to read and write to this directory. If it’s not already running, you can fire up the Mongo service by running mongod.

Command-Line Fun

To create a new database named book, first run this command in your terminal. It will connect to the MySQL-inspired command-line interface.

 $ mongo book

Typing help in the console is a good start. We’re currently in the book database, but you can view others via show dbs and switch databases with the use command.

Creating a collection in Mongo is as easy as adding an initial record to the collection. Because Mongo is schemaless, there is no need to define anything up front; merely using it is enough. What’s more, our book database doesn’t really exist until we first add values into it. The following code creates/inserts a towns collection:

 > db.towns.insert({
   name: "New York",
   population: 22200000,
   lastCensus: ISODate("2016-07-01"),
   famousFor: [ "the MOMA", "food", "Derek Jeter" ],
   mayor : {
     name : "Bill de Blasio",
     party : "D"
   }
 })

In the previous section, we said documents were JSON (well, really BSON under the hood), so we add new documents as JSON (as we will do later on with CouchDB and, to a lesser extent, DynamoDB).

With the show collections command, you can verify the collection now exists.

 > show collections
 towns

We just created the towns collection by storing an object in it. We can list the contents of a collection via find. We formatted the output here for readability, but yours may just output as a single wrapped line.

 > db.towns.find()
 {
   "_id" : ObjectId("59093bc08c87e2ff4157bd9f"),
   "name" : "New York",
   "population" : 22200000,
   "lastCensus" : ISODate("2016-07-01T00:00:00Z"),
   "famousFor" : [ "the MOMA", "food", "Derek Jeter" ],
   "mayor" : {
     "name" : "Bill de Blasio",
     "party" : "D"
   }
 }

Unlike a relational database, Mongo does not support server-side joins. A single JavaScript call will retrieve a document and all of its nested content, free of charge.

You may have noticed that the JSON output of your newly inserted town contains an _id field of type ObjectId. This is akin to SERIAL incrementing a numeric primary key in PostgreSQL. The ObjectId is always 12 bytes, composed of a timestamp, client machine ID, client process ID, and a 3-byte incremented counter. The figure shows how bytes are laid out.

[Figure: byte layout of an ObjectId (images/mongo-object-id.png)]

What’s great about this autonumbering scheme is that each process on every machine can handle its own ID generation without colliding with other mongod instances. This design choice exhibits Mongo’s generally distributed nature.
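To make the layout concrete, here is a small sketch in plain JavaScript (no Mongo server needed) that splits the hex form of an ObjectId into the four parts just described. It assumes the legacy split of 4 bytes for the timestamp, 3 for the machine ID, 2 for the process ID, and 3 for the counter; parseObjectId is a hypothetical helper of ours, not a function Mongo provides, and newer MongoDB releases may lay out the middle bytes differently.

```javascript
// Split a 24-character hex ObjectId string into its parts.
// Assumption: legacy 4/3/2/3-byte layout. parseObjectId is a hypothetical helper.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);   // bytes 0-3: Unix timestamp
  return {
    createdAt: new Date(seconds * 1000),           // timestamp as a Date
    machineId: hex.slice(8, 14),                   // bytes 4-6, kept as hex
    processId: parseInt(hex.slice(14, 18), 16),    // bytes 7-8
    counter: parseInt(hex.slice(18, 24), 16)       // bytes 9-11
  };
}

const parts = parseObjectId("59093bc08c87e2ff4157bd9f");
console.log(parts.createdAt.toISOString());
console.log(parts.machineId, parts.processId, parts.counter);
```

Because the leading bytes are a timestamp, IDs generated later sort after earlier ones, at least down to the second.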

JavaScript

Mongo’s native tongue is JavaScript. You’ll use it when doing things as complex as mapreduce queries or as simple as asking for help.

 > db.help()
 > db.towns.help()

These commands will list available functions related to the given object. db is a JavaScript object that contains information about the current database. db.x is a JavaScript object representing a collection (named x). Commands are just JavaScript functions.

 > typeof db
 object
 > typeof db.towns
 object
 > typeof db.towns.insert
 function

If you want to inspect the source code for a function, call it without parameters or parentheses (think more Python than Ruby).

 > db.towns.insert
 function (obj, options, _allowDot) {
   if (!obj)
     throw Error("no object passed to insert!");

   var flags = 0;

   // etc.
 }

Let’s populate a few more documents into our towns collection by creating our own JavaScript function.

 function insertCity(
   name, population, lastCensus,
   famousFor, mayorInfo
 ) {
   db.towns.insert({
     name: name,
     population: population,
     lastCensus: ISODate(lastCensus),
     famousFor: famousFor,
     mayor : mayorInfo
   });
 }

You can just paste the code for the function into the shell. Then we can call it.

 > insertCity("Punxsutawney", 6200, '2016-01-31',
   ["Punxsutawney Phil"], { name : "Richard Alexander" }
 )
 > insertCity("Portland", 582000, '2016-09-20',
   ["beer", "food", "Portlandia"], { name : "Ted Wheeler", party : "D" }
 )

We should now have three towns in our collection, which you can confirm by calling db.towns.find as before.

Reading: More Fun in Mongo

Earlier, we called the find function without params to get all documents. To access a specific one, you only need to set an _id property. _id is of type ObjectId, and so to query, you must convert a string by wrapping it in an ObjectId(str) function.

 > db.towns.find({ "_id" : ObjectId("59094288afbc9350ada6b807") })
 {
   "_id" : ObjectId("59094288afbc9350ada6b807"),
   "name" : "Punxsutawney",
   "population" : 6200,
   "lastCensus" : ISODate("2016-01-31T00:00:00Z"),
   "famousFor" : [ "Punxsutawney Phil" ],
   "mayor" : { "name" : "Richard Alexander" }
 }

The find function also accepts an optional second parameter: a fields object we can use to filter which fields are retrieved. If we want only the town name (along with _id), pass in name with a value resolving to 1 (or true).

 > db.towns.find({ _id : ObjectId("59094288afbc9350ada6b807") }, { name : 1 })
 {
   "_id" : ObjectId("59094288afbc9350ada6b807"),
   "name" : "Punxsutawney"
 }

To retrieve all fields except name, set name to 0 (or false or null).

 > db.towns.find({ _id : ObjectId("59094288afbc9350ada6b807") }, { name : 0 })
 {
   "_id" : ObjectId("59094288afbc9350ada6b807"),
   "population" : 6200,
   "lastCensus" : ISODate("2016-01-31T00:00:00Z"),
   "famousFor" : [ "Punxsutawney Phil" ],
   "mayor" : { "name" : "Richard Alexander" }
 }
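If the include/exclude rules feel slippery, the following plain JavaScript sketch may help. It models how a fields object shapes a result for flat documents: naming fields with 1 keeps only those fields, naming fields with 0 keeps everything else, and _id comes along unless it is switched off explicitly. The project function is a hypothetical helper of ours that runs without a server; it is not Mongo’s implementation and it ignores nested paths.

```javascript
// Hypothetical helper that mimics how a fields object shapes a flat document.
function project(doc, fields) {
  const named = Object.keys(fields).filter((key) => key !== "_id");
  const including = named.some((key) => fields[key]); // any truthy value means "include mode"
  const out = {};
  for (const key of Object.keys(doc)) {
    if (key === "_id") {
      // _id is returned unless it is explicitly set to 0/false
      if (fields._id === undefined || fields._id) out._id = doc._id;
    } else if (including ? fields[key] : !(key in fields)) {
      out[key] = doc[key];
    }
  }
  return out;
}

const town = {
  _id: "abc",
  name: "Punxsutawney",
  population: 6200,
  mayor: { name: "Richard Alexander" }
};

console.log(project(town, { name: 1 }));          // _id and name only
console.log(project(town, { name: 0 }));          // everything except name
console.log(project(town, { _id: 0, name: 1 }));  // name only
```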

As in PostgreSQL, in Mongo you can construct ad hoc queries on the basis of field values, ranges, or a combination of criteria. To find all towns that begin with the letter P and have a population less than 10,000, you can use a Perl-compatible regular expression (PCRE)[28] and a range operator. This query should return the JSON object for Punxsutawney, but including only the name and population fields:

 > db.towns.find(
   { name : /^P/, population : { $lt : 10000 } },
   { _id: 0, name : 1, population : 1 }
 )
 { "name" : "Punxsutawney", "population" : 6200 }

Conditional operators in Mongo follow the format of field : { $op : value }, where $op is an operation like $ne (not equal to) or $gt (greater than). You may want a terser syntax, like field < value. But this is JavaScript code, not a domain-specific query language, so queries must comply with JavaScript syntax rules (later today you’ll see how to use the shorter syntax in a certain case, but we’ll skip that for now).

The good news about the querying language being JavaScript is that you can construct operations as you would objects. Here, we build criteria where the population must be between 10,000 and 1 million people.

 > var population_range = {
   $lt: 1000000,
   $gt: 10000
 }
 > db.towns.find(
   { name : /^P/, population : population_range },
   { name: 1 }
 )
 { "_id" : ObjectId("59094292afbc9350ada6b808"), "name" : "Portland" }
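To see why the field : { $op : value } shape composes so easily, here is a minimal sketch in plain JavaScript that evaluates such criteria objects against in-memory data. The operators table and the satisfies function are hypothetical names of ours, and only a handful of operators are modeled; this is an illustration of the idea, not how Mongo evaluates queries internally.

```javascript
// Hypothetical comparison table for a few Mongo-style operators.
const operators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne: (a, b) => a !== b
};

// True when value satisfies every { $op: operand } pair in criteria.
function satisfies(value, criteria) {
  return Object.entries(criteria).every(([op, operand]) => {
    if (!(op in operators)) throw new Error("unsupported operator: " + op);
    return operators[op](value, operand);
  });
}

// In-memory stand-ins for the towns inserted earlier.
const towns = [
  { name: "New York", population: 22200000 },
  { name: "Punxsutawney", population: 6200 },
  { name: "Portland", population: 582000 }
];

const populationRange = { $lt: 1000000, $gt: 10000 };
const inRange = towns
  .filter((t) => /^P/.test(t.name) && satisfies(t.population, populationRange))
  .map((t) => t.name);

console.log(inRange);
```

Because the criteria are ordinary objects, you can build them up in variables, merge them, or generate them from user input before handing them to find.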

In addition to number ranges, we can also retrieve date ranges. For example, we can find all names with a lastCensus greater than or equal to June 1, 2016, like this:

 > db.towns.find(
   { lastCensus : { $gte : ISODate('2016-06-01') } },
   { _id : 0, name: 1 }
 )
 { "name" : "New York" }
 { "name" : "Portland" }

Notice how we again suppressed the _id field in the output explicitly by setting it to 0.

Digging Deep

Mongo loves nested array data. You can query by matching exact values:

 > db.towns.find(
   { famousFor : 'food' },
   { _id : 0, name : 1, famousFor : 1 }
 )
 { "name" : "New York", "famousFor" : [ "the MOMA", "food", "Derek Jeter" ] }
 { "name" : "Portland", "famousFor" : [ "beer", "food", "Portlandia" ] }

as well as matching partial values:

 > db.towns.find(
   { famousFor : /MOMA/ },
   { _id : 0, name : 1, famousFor : 1 }
 )
 { "name" : "New York", "famousFor" : [ "the MOMA", "food", "Derek Jeter" ] }

or query by all matching values:

 > db.towns.find(
   { famousFor : { $all : ['food', 'beer'] } },
   { _id : 0, name:1, famousFor:1 }
 )
 { "name" : "Portland", "famousFor" : [ "beer", "food", "Portlandia" ] }

or the lack of matching values:

 > db.towns.find(
   { famousFor : { $nin : ['food', 'beer'] } },
   { _id : 0, name : 1, famousFor : 1 }
 )
 { "name" : "Punxsutawney", "famousFor" : [ "Punxsutawney Phil" ] }
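The four array queries above differ only in the test applied to the famousFor array. As a rough mental model, here they are as plain JavaScript array checks over in-memory copies of our towns. The namesWhere helper is hypothetical, and the sketch assumes every document has a famousFor array.

```javascript
// In-memory stand-ins for the three towns inserted earlier.
const towns = [
  { name: "New York", famousFor: ["the MOMA", "food", "Derek Jeter"] },
  { name: "Punxsutawney", famousFor: ["Punxsutawney Phil"] },
  { name: "Portland", famousFor: ["beer", "food", "Portlandia"] }
];

// Hypothetical helper: names of the towns whose famousFor array passes a test.
const namesWhere = (test) =>
  towns.filter((town) => test(town.famousFor)).map((town) => town.name);

// famousFor : 'food' -> some element equals the value
const exact = namesWhere((list) => list.includes("food"));
// famousFor : /MOMA/ -> some element matches the pattern (case-sensitive)
const partial = namesWhere((list) => list.some((item) => /MOMA/.test(item)));
// $all -> every listed value is present
const all = namesWhere((list) => ["food", "beer"].every((v) => list.includes(v)));
// $nin -> none of the listed values is present
const none = namesWhere((list) => !["food", "beer"].some((v) => list.includes(v)));

console.log(exact, partial, all, none);
```

Note that the regular expression is case-sensitive: a lowercase /moma/ would match nothing here unless you add the i flag.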

But the true power of Mongo stems from its ability to dig down into a document and return the results of deeply nested subdocuments. To query a subdocument, your field name is a string separating nested layers with a dot. For instance, you can find towns with mayors from the Democratic Party:

 > db.towns.find(
   { 'mayor.party' : 'D' },
   { _id : 0, name : 1, mayor : 1 }
 )
 { "name" : "New York", "mayor" : { "name" : "Bill de Blasio", "party" : "D" } }
 { "name" : "Portland", "mayor" : { "name" : "Ted Wheeler", "party" : "D" } }

or those with mayors who don’t have a party:

 > db.towns.find(
   { 'mayor.party' : { $exists : false } },
   { _id : 0, name : 1, mayor : 1 }
 )
 { "name" : "Punxsutawney", "mayor" : { "name" : "Richard Alexander" } }
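A dotted field name is simply a path to walk through the document. The following plain JavaScript sketch shows one way to picture that walk; getPath is a hypothetical helper of ours, and it only descends through nested objects (real Mongo also looks inside arrays along the path, which this sketch does not attempt).

```javascript
// Hypothetical helper: follow a dotted path such as "mayor.party" into a document.
// Returns undefined as soon as any layer along the way is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

const towns = [
  { name: "New York", mayor: { name: "Bill de Blasio", party: "D" } },
  { name: "Punxsutawney", mayor: { name: "Richard Alexander" } },
  { name: "Portland", mayor: { name: "Ted Wheeler", party: "D" } }
];

// 'mayor.party' : 'D'
const democrats = towns
  .filter((t) => getPath(t, "mayor.party") === "D")
  .map((t) => t.name);

// 'mayor.party' : { $exists : false }
const noParty = towns
  .filter((t) => getPath(t, "mayor.party") === undefined)
  .map((t) => t.name);

console.log(democrats, noParty);
```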

The previous queries are great if you want to find documents with a single matching field, but what if you need to match several fields of a subdocument?

elemMatch

We’ll round out our dig with the $elemMatch directive. Let’s create another collection that stores countries. This time we’ll override each _id to be a string of our choosing rather than an auto-generated identifier.

 > db.countries.insert({
   _id : "us",
   name : "United States",
   exports : {
     foods : [
       { name : "bacon", tasty : true },
       { name : "burgers" }
     ]
   }
 })
 > db.countries.insert({
   _id : "ca",
   name : "Canada",
   exports : {
     foods : [
       { name : "bacon", tasty : false },
       { name : "syrup", tasty : true }
     ]
   }
 })
 > db.countries.insert({
   _id : "mx",
   name : "Mexico",
   exports : {
     foods : [{
       name : "salsa",
       tasty : true,
       condiment : true
     }]
   }
 })

To validate the countries were added, we can execute the count function, expecting the number 3.

 > db.countries.count()
 3

Let’s find a country that not only exports bacon but exports tasty bacon.

 > db.countries.find(
   { 'exports.foods.name' : 'bacon', 'exports.foods.tasty' : true },
   { _id : 0, name : 1 }
 )
 { "name" : "United States" }
 { "name" : "Canada" }

But this isn’t what we wanted. Mongo returned Canada because it exports bacon and exports tasty syrup. $elemMatch helps us here. It specifies that if a document (or nested document) matches all of our criteria, the document counts as a match.

 > db.countries.find(
   {
     'exports.foods' : {
       $elemMatch : {
         name : 'bacon',
         tasty : true
       }
     }
   },
   { _id : 0, name : 1 }
 )
 { "name" : "United States" }
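The difference between the two queries comes down to whether both conditions must hold for the same array element. Here is a plain JavaScript sketch of that distinction over in-memory copies of our three countries. It is only a model of the matching rule, written with ordinary array methods, and the variable names are ours.

```javascript
// In-memory stand-ins for the countries collection.
const countries = [
  { name: "United States", exports: { foods: [
    { name: "bacon", tasty: true }, { name: "burgers" } ] } },
  { name: "Canada", exports: { foods: [
    { name: "bacon", tasty: false }, { name: "syrup", tasty: true } ] } },
  { name: "Mexico", exports: { foods: [
    { name: "salsa", tasty: true, condiment: true } ] } }
];

// Dotted criteria: each condition may be satisfied by a DIFFERENT element.
const dotted = countries
  .filter((c) =>
    c.exports.foods.some((f) => f.name === "bacon") &&
    c.exports.foods.some((f) => f.tasty === true))
  .map((c) => c.name);

// $elemMatch-style: a SINGLE element must satisfy every condition.
const elemMatch = countries
  .filter((c) =>
    c.exports.foods.some((f) => f.name === "bacon" && f.tasty === true))
  .map((c) => c.name);

console.log(dotted);
console.log(elemMatch);
```

Canada passes the first filter because its bacon satisfies one condition and its syrup satisfies the other; it fails the second because no single food satisfies both.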

$elemMatch criteria can utilize advanced operators, too. You can find any country that exports a tasty food that also has a condiment label:

 > db.countries.find(
   {
     'exports.foods' : {
       $elemMatch : {
         tasty : true,
         condiment : { $exists : true }
       }
     }
   },
   { _id : 0, name : 1 }
 )
 { "name" : "Mexico" }

Mexico is just what we wanted.

Boolean Ops

So far, all of our criteria are implicitly and operations. If you try to find a country with the name United States and an _id of mx, Mongo will yield no results.

 > db.countries.find(
   { _id : "mx", name : "United States" },
   { _id : 1 }
 )

However, searching for one or the other with $or will return two results. Think of this layout like prefix notation: OR A B.

 > db.countries.find(
   {
     $or : [
       { _id : "mx" },
       { name : "United States" }
     ]
   },
   { _id:1 }
 )
 { "_id" : "us" }
 { "_id" : "mx" }
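As a quick illustration of implicit and versus $or, here is a plain JavaScript sketch over in-memory data. The helpers matchesAll and matchesAny are hypothetical names of ours, and they compare only simple top-level values for equality.

```javascript
const countries = [
  { _id: "us", name: "United States" },
  { _id: "ca", name: "Canada" },
  { _id: "mx", name: "Mexico" }
];

// Implicit and: every field in the criteria object must equal the document's value.
const matchesAll = (doc, criteria) =>
  Object.entries(criteria).every(([field, value]) => doc[field] === value);

// $or: at least one of the alternative criteria objects must match.
const matchesAny = (doc, alternatives) =>
  alternatives.some((criteria) => matchesAll(doc, criteria));

const both = countries.filter((c) =>
  matchesAll(c, { _id: "mx", name: "United States" }));
const either = countries.filter((c) =>
  matchesAny(c, [{ _id: "mx" }, { name: "United States" }]));

console.log(both.length);
console.log(either.map((c) => c._id));
```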

There are so many operators in Mongo that we can’t cover them all here, but we hope this has given you a taste of MongoDB’s powerful querying capabilities. The table is not a complete list of the commands but it does cover a good chunk of them.

Command       Description
$regex        Match by any PCRE-compliant regular expression string (or just use the // delimiters as shown earlier)
$ne           Not equal to
$lt           Less than
$lte          Less than or equal to
$gt           Greater than
$gte          Greater than or equal to
$exists       Check for the existence of a field
$all          Match all elements in an array
$in           Match any elements in an array
$nin          Does not match any elements in an array
$elemMatch    Match all fields in an array of nested documents
$or           Or
$nor          Not or
$size         Match array of given size
$mod          Modulus
$type         Match if field is a given datatype
$not          Negate the given operator check

You can find all the commands on the MongoDB online documentation or grab a cheat sheet from the Mongo website. We will revisit querying in the days to come.

Updating

We have a problem. New York and Punxsutawney are unique enough, but did we add Portland, Oregon, or Portland, Maine (or Texas or the others)? Let’s update our towns collection to add some U.S. states.

The update(criteria,operation) function requires two parameters. The first is a criteria query—the same sort of object you would pass to find. The second parameter is either an object whose fields will replace the matched document(s) or a modifier operation. In this case, the modifier is to $set the field state with the string OR.

 > db.towns.update(
   { _id : ObjectId("59094292afbc9350ada6b808") },
   { $set : { "state" : "OR" } }
 );

You may wonder why the $set operation is even required. Mongo doesn’t think in terms of attributes; it has only an internal, implicit understanding of attributes for optimization reasons. But nothing about the interface is attribute-oriented. Mongo is document-oriented. You will rarely want something like this (notice the lack of $set operation):

 > db.towns.update(
   { _id : ObjectId("59094292afbc9350ada6b808") },
   { state : "OR" }
 );

This would replace the entire matching document with the document you gave it ({ state : "OR" }). Because you didn’t give it a command like $set, Mongo assumes you just want to switch them up, so be careful.
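To make the difference tangible, here is a plain JavaScript sketch that models the two behaviors on an in-memory object. The applyUpdate function is a hypothetical helper that understands only $set and whole-document replacement; it is a model of the outcome, not of Mongo’s update machinery.

```javascript
// Hypothetical model of update's second parameter.
// With $set, the listed fields are merged into the existing document.
// Without a modifier, the document is replaced (only _id survives).
function applyUpdate(doc, update) {
  if (update.$set !== undefined) {
    return { ...doc, ...update.$set };
  }
  return { _id: doc._id, ...update };
}

const portland = { _id: 1, name: "Portland", population: 582000 };

const patched = applyUpdate(portland, { $set: { state: "OR" } });
const replaced = applyUpdate(portland, { state: "OR" });

console.log(Object.keys(patched));   // existing fields plus state
console.log(Object.keys(replaced));  // just _id and state
```

In the second case the name and population are gone, which is rarely what you intended.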

We can verify our update was successful by finding it (note our use of findOne to retrieve only one matching object).

 > db.towns.findOne({ _id : ObjectId("59094292afbc9350ada6b808") })
 {
   "_id" : ObjectId("59094292afbc9350ada6b808"),
   "famousFor" : [
     "beer",
     "food",
     "Portlandia"
   ],
   "lastCensus" : ISODate("2016-09-20T00:00:00Z"),
   "mayor" : {
     "name" : "Ted Wheeler",
     "party" : "D"
   },
   "name" : "Portland",
   "population" : 582000,
   "state" : "OR"
 }

You can do more than $set a value. $inc (increment a number) is a pretty useful one. Let’s increment Portland’s population by 1,000.

 > db.towns.update(
   { _id : ObjectId("59094292afbc9350ada6b808") },
   { $inc : { population : 1000 } }
 )

There are more directives than this, such as the $ positional operator for arrays. New operations are added frequently and are updated in the online documentation. The list includes the major directives.

Command       Description
$set          Sets the given field with the given value
$unset        Removes the field
$inc          Increments the given field by the given number
$pop          Removes the last (or first) element from an array
$push         Adds the value to an array
$pushAll      Adds all values to an array
$addToSet     Similar to push, but won’t duplicate values
$pull         Removes matching values from an array
$pullAll      Removes all matching values from an array
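To get a feel for a few of these modifiers without touching a database, here is a plain JavaScript sketch that applies them to an in-memory object. The modifiers table and the modify function are hypothetical names of ours; the sketch handles only top-level fields and simple values and makes no attempt to mirror Mongo’s validation or error handling.

```javascript
// Hypothetical in-memory versions of a few update modifiers.
const modifiers = {
  $set: (doc, field, value) => { doc[field] = value; },
  $unset: (doc, field) => { delete doc[field]; },
  $inc: (doc, field, amount) => { doc[field] = (doc[field] || 0) + amount; },
  $push: (doc, field, value) => { (doc[field] = doc[field] || []).push(value); },
  $addToSet: (doc, field, value) => {
    const list = (doc[field] = doc[field] || []);
    if (!list.includes(value)) list.push(value);  // skip duplicates
  },
  $pull: (doc, field, value) => {
    doc[field] = (doc[field] || []).filter((item) => item !== value);
  }
};

// Apply an update object to a deep copy of doc and return the copy.
function modify(doc, update) {
  const copy = JSON.parse(JSON.stringify(doc));
  for (const [op, changes] of Object.entries(update)) {
    if (!(op in modifiers)) throw new Error("unsupported modifier: " + op);
    for (const [field, value] of Object.entries(changes)) {
      modifiers[op](copy, field, value);
    }
  }
  return copy;
}

const portland = {
  name: "Portland",
  population: 582000,
  famousFor: ["beer", "food", "Portlandia"]
};

console.log(modify(portland, { $inc: { population: 1000 } }).population);
console.log(modify(portland, { $addToSet: { famousFor: "food" } }).famousFor);
console.log(modify(portland, { $push: { famousFor: "food" } }).famousFor);
console.log(modify(portland, { $pull: { famousFor: "beer" } }).famousFor);
```

The $addToSet call leaves the array unchanged because food is already present, while $push appends a second copy.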

References

As we mentioned previously, Mongo isn’t built to perform joins. Because of its distributed nature, joins in Mongo would be pretty inefficient operations. Still, it’s sometimes useful for documents to reference each other. In these cases, the Mongo community suggests that you use a construct like { $ref : "collection_name", $id : "reference_id" }. For example, we can update the towns collection to contain a reference to a document in countries.

 > db.towns.update(
   { _id : ObjectId("59094292afbc9350ada6b808") },
   { $set : { country: { $ref: "countries", $id: "us" } } }
 )

Now you can retrieve Portland from your towns collection.

 > var portland = db.towns.findOne(
   { _id : ObjectId("59094292afbc9350ada6b808") }
 )

Then, to retrieve the town’s country, you can query the countries collection using the stored $id.

 > db.countries.findOne({ _id: portland.country.$id })

Better yet, in JavaScript, you can ask the town document for the name of the collection stored in the field’s reference.

 > var portlandCountryRef = portland.country.$ref;
 > db[portlandCountryRef].findOne({ _id: portland.country.$id })

The last two queries are equivalent; the second is just a bit more data-driven.
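The data-driven lookup is easy to picture with plain objects. In this sketch, fakeDb is a hypothetical in-memory stand-in for a database (collection names mapped to arrays of documents) and resolve is a helper of ours that follows a { $ref, $id } pair. Nothing here talks to Mongo; it only mirrors the two-step lookup.

```javascript
// Hypothetical in-memory database: collection name -> array of documents.
const fakeDb = {
  countries: [
    { _id: "us", name: "United States" },
    { _id: "mx", name: "Mexico" }
  ],
  towns: [
    { name: "Portland", country: { $ref: "countries", $id: "us" } }
  ]
};

// Follow a { $ref, $id } reference: pick the collection by name,
// then find the document whose _id equals the stored $id.
function resolve(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const portland = fakeDb.towns[0];
const country = resolve(fakeDb, portland.country);
console.log(country.name);
```

The application performs the second lookup itself, which is exactly the work a join would otherwise do on the server.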

Deleting

Removing documents from a collection is simple. Just replace the find function with a call to remove, and all documents that match the given criteria will be removed. It’s important to note that the entire matching document will be removed, not just a matching element or a matching subdocument.

We recommend running find to verify your criteria before running remove. Mongo won’t think twice before running your operation. Let’s remove all countries that export bacon that isn’t tasty.

 > var badBacon = {
   'exports.foods' : {
     $elemMatch : {
       name : 'bacon',
       tasty : false
     }
   }
 }
 > db.countries.find(badBacon)
 {
   "_id" : "ca",
   "name" : "Canada",
   "exports" : {
     "foods" : [
       {
         "name" : "bacon",
         "tasty" : false
       },
       {
         "name" : "syrup",
         "tasty" : true
       }
     ]
   }
 }

Everything looks good. Let’s remove it.

 > db.countries.remove(badBacon)
 > db.countries.count()
 2

Now when you run count, verify we are left with only two countries. If so, our parameter-targeted delete was successful!

Reading with Code

Let’s close out this day with one more interesting query option: code. You can request that MongoDB run a decision function across your documents. We placed this last because it should always be a last resort. These queries run quite slowly, you can’t index them, and Mongo can’t optimize them. But sometimes it’s hard to beat the power of custom code.

Let’s say that we’re looking for a city with a population between 6,000 and 600,000 people.

 > db.towns.find(function() {
   return this.population > 6000 && this.population < 600000;
 })

That should return Portland and Punxsutawney. Mongo even has a shortcut for simple decision functions.

 > db.towns.find("this.population > 6000 && this.population < 600000")

You can run custom code with other criteria using the $where clause. In this example, the query also filters for towns famous for groundhogs named Phil.

 > db.towns.find({
   $where: "this.population > 6000 && this.population < 600000",
   famousFor: /Phil/
 })

A word of warning: Mongo will blindly run this function against each document despite there being no guarantee that the given field exists in every document. For example, if your function reads this.mayor.party and mayor is missing in even a single document, the JavaScript throws and the entire query fails. (A comparison against a missing top-level field, such as this.population > 6000, merely evaluates to false.) Be careful when you write custom JavaScript functions, be comfortable using JavaScript before attempting custom code, and in general avoid these sorts of operations in production.
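One defensive habit is to check that a field exists before reaching into it. The sketch below uses plain JavaScript to imitate a decision function being called with this bound to each document. The Ghost Town document and the run helper are hypothetical, and how the server reports such an error is not modeled here; the point is only the difference between the unguarded and guarded functions.

```javascript
const towns = [
  { name: "Portland", population: 582000, mayor: { party: "D" } },
  { name: "Ghost Town" }  // hypothetical document with no mayor or population
];

// Unguarded: reading a property of a missing subdocument throws a TypeError.
function unguarded() {
  return this.mayor.party === "D";
}

// Guarded: confirm the subdocument exists before reaching into it.
function guarded() {
  return this.mayor !== undefined && this.mayor.party === "D";
}

// Hypothetical runner: call fn with `this` bound to each document in turn.
const run = (fn) =>
  towns.filter((doc) => fn.call(doc)).map((doc) => doc.name);

let failed = false;
try {
  run(unguarded);
} catch (err) {
  failed = err instanceof TypeError;
}

console.log(failed);
console.log(run(guarded));
```

The unguarded version dies on the first document that lacks a mayor, while the guarded version simply skips it.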

Day 1 Wrap-Up

Today we took a peek at our first document database, MongoDB. We saw how we can store nested structured data as JSON objects and query that data to any depth. You learned that a document can be envisioned as a schemaless row in the relational model, keyed by a generated _id. A set of documents is called a collection in Mongo, similar to a table in PostgreSQL but also quite different.

Unlike the previous styles we’ve encountered, with collections of sets of simple datatypes, Mongo stores complex, denormalized documents, stored and retrieved as collections of arbitrary JSON structures. Mongo tops off this flexible storage strategy with a powerful query mechanism unconstrained by any predefined schema.

Its denormalized nature makes a document database a superb choice for storing data with unknown qualities, while other styles (such as relational or columnar) prefer, or sometimes even demand, that you know your data models in advance and require schema migrations to add or edit fields.

Day 1 Homework

Find

  1. Bookmark the online MongoDB documentation and read up on something you found intriguing today.
  2. Look up how to construct regular expressions in Mongo.
  3. Acquaint yourself with command-line db.help and db.collections.help output.
  4. Find a Mongo driver in your programming language of choice (Ruby, Java, PHP, Go, Elixir, and so on).

Do

  1. Print a JSON document containing { "hello" : "world" }.

  2. Select a town via a case-insensitive regular expression containing the word new.

  3. Find all cities whose names contain an e and are famous for food or beer.

  4. Create a new database named blogger with a collection named articles. Insert a new article with an author name and email, creation date, and text.

  5. Update the article with an array of comments, containing a comment with an author and text.

  6. Run a query from an external JavaScript file that you create yourself.
