Day 1: CRUD and Nesting


We’ll spend today working on some CRUD operations and finish up by performing nested queries in MongoDB. As usual, we won’t walk you through the installation steps, but if you visit the Mongo website,^[26] you can download a build for your OS or find instructions on how to build from source. If you have OS X, we recommend installing via Homebrew (brew install mongodb). If you use a Debian/Ubuntu variant, try MongoDB’s own apt-get package.

To prevent typos, Mongo requires you to first create the directory where mongod will store its data. A common location is /data/db. Ensure the user you run the server under has permission to read and write to this directory. If it’s not already running, you can fire up the Mongo service by running mongod.

Command-Line Fun

To create a new database named book, first run this command in your terminal. It will connect to the MySQL-inspired command-line interface.

$ mongo book

Typing help in the console is a good start. We’re currently in the book database, but you can view others via show dbs and switch databases with the use command.

Creating a collection in Mongo is as easy as adding an initial record to the collection. Because Mongo is schemaless, there is no need to define anything up front; merely using it is enough. What’s more, our book database doesn’t really exist until we first add values into it. The following code creates/inserts a towns collection:

	> db.towns.insert({
	name: "New York",
	population: 22200000,
	lastCensus: ISODate("2016-07-01"),
	famousFor: [ "the MOMA", "food", "Derek Jeter" ],
	mayor : {
	name : "Bill de Blasio",
	party : "D"
	}
	})

In the previous section, we said documents were JSON (well, really BSON under the hood), so we add new documents as JSON (as we will do later on with CouchDB and, to a lesser extent, DynamoDB).

With the show collections command, you can verify the collection now exists.

> show collections

towns

We just created the towns collection by storing an object in it. We can list the contents of a collection via find. We formatted the output here for readability, but yours may just output as a single wrapped line.

> db.towns.find()

	{
	"_id" : ObjectId("59093bc08c87e2ff4157bd9f"),
	"name" : "New York",
	"population" : 22200000,
	"lastCensus" : ISODate("2016-07-01T00:00:00Z"),
	"famousFor" : [ "the MOMA", "food", "Derek Jeter" ],
	"mayor" : {
	"name" : "Bill de Blasio",
	"party" : "D"
	}
	}

Unlike a relational database, Mongo does not support server-side joins. A single JavaScript call will retrieve a document and all of its nested content, free of charge.

You may have noticed that the JSON output of your newly inserted town contains an _id field of type ObjectId. This is akin to SERIAL incrementing a numeric primary key in PostgreSQL. The ObjectId is always 12 bytes, composed of a timestamp, client machine ID, client process ID, and a 3-byte incremented counter. The figure shows how bytes are laid out.

What’s great about this autonumbering scheme is that each process on every machine can handle its own ID generation without colliding with other mongod instances. This design choice exhibits Mongo’s generally distributed nature.
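Because the leading bytes of an ObjectId hold a timestamp, you can recover a document's creation time from its _id alone. The following is a minimal plain-JavaScript sketch, assuming the classic layout in which the first 4 of the 12 bytes are a big-endian count of seconds since the Unix epoch; objectIdTimestamp is a hypothetical helper name, not part of Mongo. Inside the shell, ObjectId values offer a getTimestamp() method for the same purpose.

```javascript
// Decode the creation time embedded in an ObjectId's 24-character hex string.
// Assumption: the first 4 bytes (8 hex characters) are a big-endian
// timestamp in seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters, got: " + hex);
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The _id of the New York document shown earlier.
const created = objectIdTimestamp("59093bc08c87e2ff4157bd9f");
console.log(created.toISOString()); // 2017-05-03T02:09:04.000Z
```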

JavaScript

Mongo’s native tongue is JavaScript. You’ll use it when doing things as complex as mapreduce queries or as simple as asking for help.

	> db.help()
	> db.towns.help()

These commands will list available functions related to the given object. db is a JavaScript object that contains information about the current database. db.x is a JavaScript object representing a collection (named x). Commands are just JavaScript functions.

	> typeof db
	object
	> typeof db.towns
	object
	> typeof db.towns.insert
	function

If you want to inspect the source code for a function, call it without parameters or parentheses (think more Python than Ruby).

	> db.towns.insert
	function (obj, options, _allowDot) {
	if (!obj)
	throw Error("no object passed to insert!");

	var flags = 0;

	// etc.
	}

Let’s populate a few more documents into our towns collection by creating our own JavaScript function.

mongo/insertCity.js

	function insertCity(
	name, population, lastCensus,
	famousFor, mayorInfo
	) {
	db.towns.insert({
	name: name,
	population: population,
	lastCensus: ISODate(lastCensus),
	famousFor: famousFor,
	mayor : mayorInfo
	});
	}

You can just paste the code for the function into the shell. Then we can call it.

	> insertCity("Punxsutawney", 6200, '2016-01-31',
	["Punxsutawney Phil"], { name : "Richard Alexander" }
	)
	> insertCity("Portland", 582000, '2016-09-20',
	["beer", "food", "Portlandia"], { name : "Ted Wheeler", party : "D" }
	)

We should now have three towns in our collection, which you can confirm by calling db.towns.find() as before.

All of the practical exercises for Mongo in this chapter will involve accessing it either through the Mongo shell or through JavaScript code. If you’re more inclined to visual representations of data—and the systems around data—you may want to explore more UI-driven tools. One very notable Mongo-specific tool is Robo 3T,^[27] previously known as Robomongo, which is a desktop app that enables you to visualize MongoDB datasets, monitor servers, engage in user management, edit data directly, and so on.

The authors themselves are largely disinclined toward UI-driven tools like this for databases, but Robo 3T is extremely well done, and if a nice UI brings you closer to grasping Mongo or any other database, we say go for it.

Reading: More Fun in Mongo

Earlier, we called the find function without params to get all documents. To access a specific one, you only need to set an _id property. _id is of type ObjectId, and so to query, you must convert a string by wrapping it in an ObjectId(str) function.

	> db.towns.find({ "_id" : ObjectId("59094288afbc9350ada6b807") })
	{
	"_id" : ObjectId("59094288afbc9350ada6b807"),
	"name" : "Punxsutawney",
	"population" : 6200,
	"lastCensus" : ISODate("2016-01-31T00:00:00Z"),
	"famousFor" : [ "Punxsutawney Phil" ],
	"mayor" : { "name" : "Richard Alexander" }
	}

The find function also accepts an optional second parameter: a fields object we can use to filter which fields are retrieved. If we want only the town name (along with _id), pass in name with a value resolving to 1 (or true).

	> db.towns.find({ _id : ObjectId("59094288afbc9350ada6b807") }, { name : 1 })
	{
	"_id" : ObjectId("59094288afbc9350ada6b807"),
	"name" : "Punxsutawney"
	}

To retrieve all fields except name, set name to 0 (or false or null).

	> db.towns.find({ _id : ObjectId("59094288afbc9350ada6b807") }, { name : 0 })
	{
	"_id" : ObjectId("59094288afbc9350ada6b807"),
	"population" : 6200,
	"lastCensus" : ISODate("2016-01-31T00:00:00Z"),
	"famousFor" : [ "Punxsutawney Phil" ],
	"mayor" : { "name" : "Richard Alexander" }
	}

As in PostgreSQL, in Mongo you can construct ad hoc queries on the basis of field values, ranges, or a combination of criteria. To find all towns that begin with the letter P and have a population less than 10,000, you can use a Perl-compatible regular expression (PCRE)^[28] and a range operator. This query should return the JSON object for Punxsutawney, but including only the name and population fields:

	> db.towns.find(
	{ name : /^P/, population : { $lt : 10000 } },
	{ _id: 0, name : 1, population : 1 }
	)
	{ "name" : "Punxsutawney", "population" : 6200 }

Conditional operators in Mongo follow the format of field : { $op : value }, where $op is an operation like $ne (not equal to) or $gt (greater than). You may want a terser syntax, like field < value. But this is JavaScript code, not a domain-specific query language, so queries must comply with JavaScript syntax rules (later today you’ll see how to use the shorter syntax in a certain case, but we’ll skip that for now).

The good news about the querying language being JavaScript is that you can construct operations as you would objects. Here, we build criteria where the population must be between 10,000 and 1 million people.

	> var population_range = {
	$lt: 1000000,
	$gt: 10000
	}
	> db.towns.find(
	{ name : /^P/, population : population_range },
	{ name: 1 }
	)
	{ "_id" : ObjectId("59094292afbc9350ada6b808"), "name" : "Portland" }
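To make the field : { $op : value } format concrete, here is a small plain-JavaScript sketch of how a range object like population_range could be evaluated against a single value. It illustrates the semantics only and is not how Mongo implements them; the comparators table and the satisfies function are hypothetical names, and only a handful of operators are covered.

```javascript
// Hypothetical evaluator for a small subset of Mongo-style comparison operators.
const comparators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne: (a, b) => a !== b
};

// A value satisfies the criteria only if every operator in the object holds.
function satisfies(value, criteria) {
  return Object.keys(criteria).every(function (op) {
    if (!Object.prototype.hasOwnProperty.call(comparators, op)) {
      throw new Error("unsupported operator: " + op);
    }
    return comparators[op](value, criteria[op]);
  });
}

const populationRange = { $lt: 1000000, $gt: 10000 };
console.log(satisfies(582000, populationRange));   // Portland: true
console.log(satisfies(6200, populationRange));     // Punxsutawney: false
console.log(satisfies(22200000, populationRange)); // New York: false
```

Because the criteria are ordinary objects, they can be built up, stored in variables, and reused across queries, which is the point the shell example makes.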

In addition to number ranges, we can also retrieve date ranges. For example, we can find all names with a lastCensus greater than or equal to June 1, 2016, like this:

	> db.towns.find(
	{ lastCensus : { $gte : ISODate('2016-06-01') } },
	{ _id : 0, name: 1 }
	)
	{ "name" : "New York" }
	{ "name" : "Portland" }

Notice how we again suppressed the _id field in the output explicitly by setting it to 0.

Digging Deep

Mongo loves nested array data. You can query by matching exact values:

	> db.towns.find(
	{ famousFor : 'food' },
	{ _id : 0, name : 1, famousFor : 1 }
	)
	{ "name" : "New York", "famousFor" : [ "the MOMA", "food", "Derek Jeter" ] }
	{ "name" : "Portland", "famousFor" : [ "beer", "food", "Portlandia" ] }

as well as matching partial values:

	> db.towns.find(
	{ famousFor : /MOMA/ },
	{ _id : 0, name : 1, famousFor : 1 }
	)
	{ "name" : "New York", "famousFor" : [ "the MOMA", "food", "Derek Jeter" ] }

or query by all matching values:

	> db.towns.find(
	{ famousFor : { $all : ['food', 'beer'] } },
	{ _id : 0, name:1, famousFor:1 }
	)
	{ "name" : "Portland", "famousFor" : [ "beer", "food", "Portlandia" ] }

or the lack of matching values:

	> db.towns.find(
	{ famousFor : { $nin : ['food', 'beer'] } },
	{ _id : 0, name : 1, famousFor : 1 }
	)
	{ "name" : "Punxsutawney", "famousFor" : [ "Punxsutawney Phil" ] }

But the true power of Mongo stems from its ability to dig down into a document and return the results of deeply nested subdocuments. To query a subdocument, your field name is a string separating nested layers with a dot. For instance, you can find towns with mayors from the Democratic Party:

	> db.towns.find(
	{ 'mayor.party' : 'D' },
	{ _id : 0, name : 1, mayor : 1 }
	)
	{ "name" : "New York", "mayor" : { "name" : "Bill de Blasio", "party" : "D" } }
	{ "name" : "Portland", "mayor" : { "name" : "Ted Wheeler", "party" : "D" } }

or those with mayors who don’t have a party:

	> db.towns.find(
	{ 'mayor.party' : { $exists : false } },
	{ _id : 0, name : 1, mayor : 1 }
	)
	{ "name" : "Punxsutawney", "mayor" : { "name" : "Richard Alexander" } }
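The dotted field names used in these two queries can be pictured as a walk through nested objects. The sketch below mimics that lookup in plain JavaScript over an in-memory copy of our three towns; getPath is a hypothetical helper, and it deliberately ignores arrays, which Mongo's own dot notation also traverses.

```javascript
// Walk a dotted path such as "mayor.party" through nested plain objects.
// Returns undefined as soon as any layer is missing.
function getPath(doc, path) {
  return path.split(".").reduce(function (node, key) {
    return node !== null && typeof node === "object" ? node[key] : undefined;
  }, doc);
}

// In-memory stand-ins for the documents inserted earlier.
const towns = [
  { name: "New York", mayor: { name: "Bill de Blasio", party: "D" } },
  { name: "Punxsutawney", mayor: { name: "Richard Alexander" } },
  { name: "Portland", mayor: { name: "Ted Wheeler", party: "D" } }
];

// Comparable to { 'mayor.party' : 'D' }
const democrats = towns
  .filter((town) => getPath(town, "mayor.party") === "D")
  .map((town) => town.name);

// Comparable to { 'mayor.party' : { $exists : false } }
const noParty = towns
  .filter((town) => getPath(town, "mayor.party") === undefined)
  .map((town) => town.name);

console.log(democrats); // [ 'New York', 'Portland' ]
console.log(noParty);   // [ 'Punxsutawney' ]
```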

The previous queries are great if you want to find documents with a single matching field, but what if you need to match several fields of a subdocument?

elemMatch

We’ll round out our dig with the $elemMatch directive. Let’s create another collection that stores countries. This time we’ll override each _id to be a string of our choosing rather than an auto-generated identifier.

	> db.countries.insert({
	_id : "us",
	name : "United States",
	exports : {
	foods : [
	{ name : "bacon", tasty : true },
	{ name : "burgers" }
	]
	}
	})
	> db.countries.insert({
	_id : "ca",
	name : "Canada",
	exports : {
	foods : [
	{ name : "bacon", tasty : false },
	{ name : "syrup", tasty : true }
	]
	}
	})

	> db.countries.insert({
	_id : "mx",
	name : "Mexico",
	exports : {
	foods : [{
	name : "salsa",
	tasty : true,
	condiment : true
	}]
	}
	})

To validate the countries were added, we can execute the count function, expecting the number 3.

	> db.countries.count()
	3

Let’s find a country that not only exports bacon but exports tasty bacon.

	> db.countries.find(
	{ 'exports.foods.name' : 'bacon', 'exports.foods.tasty' : true },
	{ _id : 0, name : 1 }
	)
	{ "name" : "United States" }
	{ "name" : "Canada" }

But this isn’t what we wanted. Mongo returned Canada because it exports bacon and exports tasty syrup. $elemMatch helps us here. It specifies that if a document (or nested document) matches all of our criteria, the document counts as a match.

	> db.countries.find(
	{
	'exports.foods' : {
	$elemMatch : {
	name : 'bacon',
	tasty : true
	}
	}
	},
	{ _id : 0, name : 1 }
	)
	{ "name" : "United States" }
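The difference between the two queries comes down to whether each condition may be satisfied by a different array element. A plain-JavaScript sketch over in-memory copies of the three countries makes this visible; matchesDotted and matchesElem are hypothetical names that imitate the two query styles and are not Mongo functions.

```javascript
// In-memory stand-ins for the countries collection.
const countries = [
  { _id: "us", name: "United States", exports: { foods: [
    { name: "bacon", tasty: true }, { name: "burgers" } ] } },
  { _id: "ca", name: "Canada", exports: { foods: [
    { name: "bacon", tasty: false }, { name: "syrup", tasty: true } ] } },
  { _id: "mx", name: "Mexico", exports: { foods: [
    { name: "salsa", tasty: true, condiment: true } ] } }
];

// Dotted criteria: each condition is checked independently, so
// different elements of the array may satisfy different conditions.
function matchesDotted(country) {
  const foods = country.exports.foods;
  return foods.some((food) => food.name === "bacon") &&
         foods.some((food) => food.tasty === true);
}

// $elemMatch-style criteria: a single element must satisfy every condition.
function matchesElem(country) {
  return country.exports.foods.some(
    (food) => food.name === "bacon" && food.tasty === true
  );
}

console.log(countries.filter(matchesDotted).map((c) => c.name));
// [ 'United States', 'Canada' ]
console.log(countries.filter(matchesElem).map((c) => c.name));
// [ 'United States' ]
```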

$elemMatch criteria can utilize advanced operators, too. You can find any country that exports a tasty food that also has a condiment label:

	> db.countries.find(
	{
	'exports.foods' : {
	$elemMatch : {
	tasty : true,
	condiment : { $exists : true }
	}
	}
	},
	{ _id : 0, name : 1 }
	)
	{ "name" : "Mexico" }

Mexico is just what we wanted.

Boolean Ops

So far, all of our criteria have been implicit AND operations: every condition must hold for a document to match. If you try to find a country with the name United States and an _id of mx, Mongo will yield no results.

	> db.countries.find(
	{ _id : "mx", name : "United States" },
	{ _id : 1 }
	)

However, searching for one or the other with $or will return two results. Think of this layout like prefix notation: OR A B.

	db.countries.find(
	{
	$or : [
	{ _id : "mx" },
	{ name : "United States" }
	]
	},
	{ _id:1 }
	)
	{ "_id" : "us" }
	{ "_id" : "mx" }

There are so many operators in Mongo that we can’t cover them all here, but we hope this has given you a taste of MongoDB’s powerful querying capabilities. The following table is not a complete list of the commands, but it does cover a good chunk of them.

Command	Description
$regex	Match by any PCRE-compliant regular expression string (or just use the // delimiters as shown earlier)
$ne	Not equal to
$lt	Less than
$lte	Less than or equal to
$gt	Greater than
$gte	Greater than or equal to
$exists	Check for the existence of a field
$all	Match all elements in an array
$in	Match any elements in an array
$nin	Does not match any elements in an array
$elemMatch	Match all fields in an array of nested documents
$or	or
$nor	Not or
$size	Match array of given size
$mod	Modulus
$type	Match if field is a given datatype
$not	Negate the given operator check

You can find all the commands on the MongoDB online documentation or grab a cheat sheet from the Mongo website. We will revisit querying in the days to come.

Updating

We have a problem. New York and Punxsutawney are unique enough, but did we add Portland, Oregon, or Portland, Maine (or Texas or the others)? Let’s update our towns collection to add some U.S. states.

The update(criteria,operation) function requires two parameters. The first is a criteria query—the same sort of object you would pass to find. The second parameter is either an object whose fields will replace the matched document(s) or a modifier operation. In this case, the modifier is to $set the field state with the string OR.

	db.towns.update(
	{ _id : ObjectId("4d0ada87bb30773266f39fe5") },
	{ $set : { "state" : "OR" } }
	);

You may wonder why the $set operation is even required. Mongo doesn’t think in terms of attributes; it has only an internal, implicit understanding of attributes for optimization reasons. But nothing about the interface is attribute-oriented. Mongo is document-oriented. You will rarely want something like this (notice the lack of $set operation):

	db.towns.update(
	{ _id : ObjectId("4d0ada87bb30773266f39fe5") },
	{ state : "OR" }
	);

This would replace the entire matching document with the document you gave it ({ state : "OR" }). Because you didn’t give it a command like $set, Mongo assumes you just want to switch them up, so be careful.

We can verify our update was successful by finding it (note our use of findOne to retrieve only one matching object).

db.towns.findOne({ _id : ObjectId("4d0ada87bb30773266f39fe5") })

	{
	"_id" : ObjectId("4d0ada87bb30773266f39fe5"),
	"famousFor" : [
	"beer",
	"food",
	"Portlandia"
	],
	"lastCensus" : ISODate("2016-09-20T00:00:00Z"),
	"mayor" : {
	"name" : "Ted Wheeler",
	"party" : "D"
	},
	"name" : "Portland",
	"population" : 582000,
	"state" : "OR"
	}

You can do more than $set a value. $inc (increment a number) is a pretty useful one. Let’s increment Portland’s population by 1,000.

	db.towns.update(
	{ _id : ObjectId("4d0ada87bb30773266f39fe5") },
	{ $inc : { population : 1000} }
	)
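The contrast between replacing a document and modifying it with $set or $inc can be sketched in plain JavaScript. This is an illustration of the behavior described above, not Mongo's implementation: applyUpdate is a hypothetical function, it handles only $set and $inc, and it works on a shallow copy of an in-memory object.

```javascript
// Hypothetical model of update semantics for a single document.
function applyUpdate(doc, update) {
  const keys = Object.keys(update);
  const isModifier = keys.length > 0 && keys.every((key) => key.startsWith("$"));

  if (!isModifier) {
    // No modifiers: the whole document is replaced; only _id survives.
    return Object.assign({ _id: doc._id }, update);
  }

  // Modifiers: start from a copy of the existing document and adjust fields.
  const result = Object.assign({}, doc);
  if (update.$set) {
    Object.assign(result, update.$set);
  }
  if (update.$inc) {
    for (const field of Object.keys(update.$inc)) {
      result[field] = (result[field] || 0) + update.$inc[field];
    }
  }
  return result;
}

const portland = { _id: 1, name: "Portland", population: 582000 };

console.log(applyUpdate(portland, { $set: { state: "OR" } }));
// { _id: 1, name: 'Portland', population: 582000, state: 'OR' }
console.log(applyUpdate(portland, { state: "OR" }));
// { _id: 1, state: 'OR' }
console.log(applyUpdate(portland, { $inc: { population: 1000 } }).population);
// 583000
```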

There are more directives than this, such as the $ positional operator for arrays. New operations are added frequently and are updated in the online documentation. The list includes the major directives.

Command	Description
$set	Sets the given field with the given value
$unset	Removes the field
$inc	Increments the given field by the given number
$pop	Removes the last (or first) element from an array
$push	Adds the value to an array
$pushAll	Adds all values to an array (deprecated and removed in MongoDB 3.6 in favor of $push with $each)
$addToSet	Similar to push, but won’t duplicate values
$pull	Removes matching values from an array
$pullAll	Removes all matching values from an array

Mongo, and schemaless databases in general, are not very friendly when it comes to misspellings. If you haven’t run across this problem yet, you probably will at some point, so be warned. You can draw parallels between static and dynamic programming languages. You define static up front, while dynamic will accept values you may not have intended, even nonsensical types like person_name = 5.

Documents are schemaless, so Mongo has no way of knowing if you intended to insert pipulation into your city or meant to query on lust_census; it will happily insert those fields or return no matching values. This can get you in trouble later on when you try to find a document that matches the condition population > 10000 and the result set is incomplete because Mongo doesn’t even know that the object was intended to have a population field.

This is less of a problem when you use Mongo in a more programmatic and less ad-hoc way, as we’re doing here. But keep in mind that flexibility has its price. Caveat emptor.
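One lightweight guard against misspelled field names is to check documents against a list of expected fields in your own code before inserting them. The sketch below is one possible approach in plain JavaScript; the townFields list and the unknownFields helper are assumptions made for illustration and are not Mongo features.

```javascript
// Hypothetical list of the field names we expect on a town document.
const townFields = [
  "_id", "name", "population", "lastCensus",
  "famousFor", "mayor", "state", "country"
];

// Return any top-level field names that are not in the allowed list.
function unknownFields(doc, allowed) {
  return Object.keys(doc).filter((key) => !allowed.includes(key));
}

const typo = { name: "Portland", pipulation: 582000 };
console.log(unknownFields(typo, townFields)); // [ 'pipulation' ]

const fine = { name: "Portland", population: 582000 };
console.log(unknownFields(fine, townFields)); // []
```

Calling such a check before each insert trades a little of Mongo's flexibility for an early warning.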

References

As we mentioned previously, Mongo isn’t built to perform joins. Because of its distributed nature, joins in Mongo would be pretty inefficient operations. Still, it’s sometimes useful for documents to reference each other. In these cases, the Mongo community suggests that you use a construct like { $ref : "collection_name", $id : "reference_id" }. For example, we can update the towns collection to contain a reference to a document in countries.

	> db.towns.update(
	{ _id : ObjectId("59094292afbc9350ada6b808") },
	{ $set : { country: { $ref: "countries", $id: "us" } } }
	)

Now you can retrieve Portland from your towns collection.

	> var portland = db.towns.findOne(
	{ _id : ObjectId("59094292afbc9350ada6b808") }
	)

Then, to retrieve the town’s country, you can query the countries collection using the stored $id.

> db.countries.findOne({ _id: portland.country.$id })

Better yet, in JavaScript, you can ask the town document for the name of the collection stored in the field’s reference.

	> var portlandCountryRef = portland.country.$ref;
	> db[portlandCountryRef].findOne({ _id: portland.country.$id })

The last two queries are equivalent; the second is just a bit more data-driven.
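The data-driven lookup can be wrapped in a small helper. The following plain-JavaScript sketch runs against in-memory arrays rather than a live server; store and resolveRef are hypothetical names, and the helper assumes every reference has the { $ref, $id } shape shown above.

```javascript
// In-memory stand-in for a database: collection name -> array of documents.
const store = {
  towns: [
    { name: "Portland", country: { $ref: "countries", $id: "us" } }
  ],
  countries: [
    { _id: "us", name: "United States" },
    { _id: "mx", name: "Mexico" }
  ]
};

// Follow a { $ref, $id } reference to the document it points at.
// Returns null when no document in the target collection has that _id.
function resolveRef(database, ref) {
  const collection = database[ref.$ref];
  if (!collection) {
    throw new Error("unknown collection: " + ref.$ref);
  }
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const town = store.towns[0];
console.log(resolveRef(store, town.country).name); // United States
```

Note that following a reference this way is a second query issued by the client, not a join performed by the server.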

Deleting

Removing documents from a collection is simple. Just replace the find function with a call to remove, and all documents that match the given criteria will be removed. It’s important to note that the entire matching document will be removed, not just a matching element or a matching subdocument.

We recommend running find to verify your criteria before running remove. Mongo won’t think twice before running your operation. Let’s remove all countries that export bacon that isn’t tasty.

	> var badBacon = {
	'exports.foods' : {
	$elemMatch : {
	name : 'bacon',
	tasty : false
	}
	}
	}
	> db.countries.find(badBacon)
	{
	"_id" : "ca",
	"name" : "Canada",
	"exports" : {
	"foods" : [
	{
	"name" : "bacon",
	"tasty" : false
	},

	{
	"name" : "syrup",
	"tasty" : true
	}
	]
	}
	}

Everything looks good. Let’s remove it.

	> db.countries.remove(badBacon)
	> db.countries.count()
	2

The count confirms that we are left with only two countries, so our parameter-targeted delete was successful!

Reading with Code

Let’s close out this day with one more interesting query option: code. You can request that MongoDB run a decision function across your documents. We placed this last because it should always be a last resort. These queries run quite slowly, you can’t index them, and Mongo can’t optimize them. But sometimes it’s hard to beat the power of custom code.

Let’s say that we’re looking for a city with a population between 6,000 and 600,000 people.

	> db.towns.find(function() {
	return this.population > 6000 && this.population < 600000;
	})

That should return Portland and Punxsutawney. Mongo even has a shortcut for simple decision functions.

> db.towns.find("this.population > 6000 && this.population < 600000")

You can run custom code with other criteria using the $where clause. In this example, the query also filters for towns famous for groundhogs named Phil.

	db.towns.find({
	$where: "this.population > 6000 && this.population < 600000",
	famousFor: /Phil/
	})

A word of warning: Mongo will blindly run this function against each document despite there being no guarantee that the given field exists in every document. For example, if you assume a population field exists and population is missing in even a single document, the entire query will fail because the JavaScript cannot properly execute. Be careful when you write custom JavaScript functions, be comfortable using JavaScript before attempting custom code, and in general avoid these sorts of operations in production.
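One way to stay on the safe side is to have the decision function check that a field exists and has the expected type before comparing it. The sketch below shows such a guarded predicate in plain JavaScript, exercised against ordinary objects rather than a live server; inPopulationRange is a hypothetical name, and inside a $where function the same checks would read from this instead of a parameter.

```javascript
// Guarded predicate: only compare population when it is actually a number.
function inPopulationRange(doc) {
  return typeof doc.population === "number" &&
         doc.population > 6000 &&
         doc.population < 600000;
}

const sample = [
  { name: "New York", population: 22200000 },
  { name: "Punxsutawney", population: 6200 },
  { name: "Portland", population: 582000 },
  { name: "Nowhere" } // no population field at all
];

console.log(sample.filter(inPopulationRange).map((town) => town.name));
// [ 'Punxsutawney', 'Portland' ]
```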

Day 1 Wrap-Up

Today we took a peek at our first document database, MongoDB. We saw how we can store nested structured data as JSON objects and query that data to any depth. You learned that a document can be envisioned as a schemaless row in the relational model, keyed by a generated _id. A set of documents is called a collection in Mongo, similar to a table in PostgreSQL but also quite different.

Unlike the previous styles we’ve encountered, with collections of sets of simple datatypes, Mongo stores complex, denormalized documents, stored and retrieved as collections of arbitrary JSON structures. Mongo tops off this flexible storage strategy with a powerful query mechanism unconstrained by any predefined schema.

Its denormalized nature makes a document database a superb choice for storing data with unknown qualities, while other styles (such as relational or columnar) prefer, or sometimes even demand, that you know your data models in advance and require schema migrations to add or edit fields.

Day 1 Homework

Find

Bookmark the online MongoDB documentation and read up on something you found intriguing today.
Look up how to construct regular expressions in Mongo.
Acquaint yourself with command-line db.help and db.collections.help output.
Find a Mongo driver in your programming language of choice (Ruby, Java, PHP, Go, Elixir, and so on).

Do

Print a JSON document containing { "hello" : "world" }.
Select a town via a case-insensitive regular expression containing the word new.
Find all cities whose names contain an e and are famous for food or beer.
Create a new database named blogger with a collection named articles. Insert a new article with an author name and email, creation date, and text.
Update the article with an array of comments, containing a comment with an author and text.
Run a query from an external JavaScript file that you create yourself.
