DynamoDB: The “Big Easy” of NoSQL

Six out of the seven databases that we cover in this book are easy enough to run on your laptop. But running systems such as HBase, CouchDB, and others in production, and using them for applications handling massive workloads, is a much different matter. Even databases that are famously easy to operate at smaller scale, such as Redis, present major challenges in production environments, usually requiring skilled (and expensive!) admins and operations specialists on hand.

DynamoDB is a different story—and an outlier in this book. You don’t have to install it, start it, or maintain it. You can sign up for an AWS account, create a DynamoDB table, and just go. As you’ll see, DynamoDB does require some operations-style thinking and preparation, but you’ll never need to provide it an XML configuration à la HBase or set up a complex cluster à la Mongo. DynamoDB is a database that runs itself and yet is capable of fulfilling some of your most ambitious “webscale” dreams, offering consistently fast performance no matter how much data you’re storing.

So just how “webscale” are we talking here? Some facts:

You can store as many items as you want in any DynamoDB table (more on tables and items later).
Each item (the equivalent of a row in an SQL database) can hold as many attributes as you want, although there is a hard size limit of 400 KB per item (that limit will likely grow in the future).
If you get data modeling right, which will occupy a decent chunk of this chapter, you should experience very little performance degradation even when your tables store petabytes of data.
Over 100,000 AWS customers currently use DynamoDB.
DynamoDB handles well over a trillion total requests a day (across all AWS customers).

But DynamoDB isn’t interesting just because it’s big and cloud-based and managed by experts you don’t have to hire and fire. It’s also a system very much worth learning in its own right, providing a familiar yet unique data model and an array of features you won’t find in any of the other databases in this book. While some folks may be reluctant to trust a cloud database that’s managed by someone else, a variety of forward-thinking tech companies, including Airbnb, Adobe, Siemens, and Comcast, have taken the plunge and use DynamoDB as one of the core databases driving their platforms.

DynamoDB and the (Almost) Ops-Free Lifestyle

If you’re running a database yourself or as part of a team, you can expect many of the following to keep you awake at night: ensuring speedy, predictable performance; handling unforeseeable hardware outages and network failures; scaling out disk capacity to meet unexpected spikes in demand; and enabling application developers to quickly get up and running using your database. On top of that, DBAs with a background in NoSQL aren’t exactly a dime a dozen, so hiring, training, and scaling out a team of DBAs for Mongo, HBase, or another NoSQL database is nothing to sneeze at. DynamoDB doesn’t completely rid your life of these kinds of issues, but if you use it right it can take an enormous bite out of them.

There are also a number of secondary reasons why you might want to consider DynamoDB:

You can use it in any of AWS’s many datacenters across the entire globe. As of July 2017, AWS offers DynamoDB in forty-two Availability Zones (AZs) in sixteen geographic regions, with plans to expand into at least eight more AZs in three additional regions.
All data in DynamoDB is stored on high-performing Solid State Disks (SSDs) and automatically replicated across multiple availability zones within an AWS region (which guarantees redundancy even within a single region).
You can expect genuine downtime out of DynamoDB only in the rare event that an entire AWS datacenter goes down.

Datacenter outages are the Achilles’ heel of the cloud, and a very real risk that you should keep in mind. We’ve all experienced Netflix, Instagram, and other widely used services going down for hours at a time due to outages in Amazon’s massive us-east-1 region in Northern Virginia. AWS and DynamoDB aren’t perfect, but their track record is exceedingly good, if not downright pristine. Using a database like DynamoDB won’t grant you a completely ops-free lifestyle, but it may just enable you to refocus a huge chunk of your attention and resources onto other things, and for that reason alone it’s worth a look.

Technologically, DynamoDB originally drew heavily on concepts derived from a distributed, eventually consistent storage system called Dynamo created to address Amazon’s own data storage problems (and massive ones at that). Amazon’s theoretical research into the distributed database domain resulted in the so-called “Dynamo paper” (actually titled Dynamo: Amazon’s Highly Available Key-value Store),^[44] which exerted a seminal influence on widely used NoSQL databases such as Riak, Cassandra, and Voldemort.

It’s unclear how faithful DynamoDB is to the concepts in the Dynamo paper, as Amazon keeps most under-the-hood implementation details under wraps, but the paper itself is a treasure trove of rich theoretical explorations of distributed database concepts. Throughout this book, we’ll be careful to always use the term DynamoDB to distinguish the public-facing AWS service from the internal Dynamo and its associated paper.

The Core Storage Concepts: Tables, Items, and More

DynamoDB’s data model is a bit tricky to define using standard “NoSQL” categories. It strongly resembles the data model of a key-value store such as Redis in that it wasn’t really built to provide the rich queryability of an RDBMS such as Postgres. Although DynamoDB does have some interesting querying features, which we’ll learn about shortly, it really soars when you know what you’re looking for in advance, which is a hallmark of key-value stores. If you’re building an application that uses DynamoDB, you should always strive to architect it so that your data is associated with certain “natural” keys that allow for easy discoverability—for example, the ability to find user data on the basis of unique usernames.

There are aspects of DynamoDB’s data model, however, that are reminiscent of RDBMSs such as Postgres. The first point of overlap is that all data in DynamoDB is stored in tables that you have to create and define in advance, though tables have some flexible elements and can be modified later. You can create, modify, and delete DynamoDB tables at will using an interface called the control plane. If you’re used to interfaces like Postgres’s psql, which we explored in Chapter 2, PostgreSQL, then the control plane should be familiar to you.
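As a rough illustration of a control plane operation, the sketch below builds the arguments for a CreateTable request in Python. The table name `Users`, the `username` key, and the helper names are hypothetical, and the throughput values are placeholders. The `create_table` helper assumes the third-party boto3 library and configured AWS credentials, so it is defined here but not called.

```python
def build_create_table_request(table_name, partition_key, sort_key=None):
    """Build keyword arguments for a DynamoDB CreateTable call.

    Only the key attributes are declared up front; all other
    attributes can vary from item to item.
    """
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [
        {"AttributeName": partition_key, "AttributeType": "S"}
    ]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": sort_key, "AttributeType": "S"}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        # Placeholder capacity values, chosen only for illustration.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 1,
            "WriteCapacityUnits": 1,
        },
    }


def create_table(request):
    """Send the request to AWS (assumes boto3 and valid credentials)."""
    import boto3  # third-party; imported lazily so the sketch runs without it

    client = boto3.client("dynamodb")
    return client.create_table(**request)


# Hypothetical table keyed on a "natural" key: the username.
users_request = build_create_table_request("Users", "username")
```

Keeping the request as a plain dictionary makes it easy to inspect before anything is sent to AWS.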

The second point of overlap is that you store items inside of tables. Items roughly correspond to rows in RDBMSs; they consist of one or more attributes, which roughly correspond to RDBMS columns. Earlier in the book, you learned about databases such as Mongo and Couch that have no concept whatsoever of predefined tables. DynamoDB requires you to define only some aspects of tables, most importantly the structure of keys and local secondary indexes, while retaining a schemaless flavor.
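To make the schemaless flavor concrete, here is a minimal sketch of two items bound for the same table that share only the key attribute. The table name, attribute names, and helper functions are hypothetical. The typed wrappers (`S`, `N`, `BOOL`) follow the attribute-value format used by DynamoDB's low-level API, and the sketch covers only those three types.

```python
def to_attribute_value(value):
    """Wrap a Python str, number, or bool in DynamoDB's typed format."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def build_put_item_request(table_name, item):
    """Build keyword arguments for a PutItem call from a plain dict."""
    return {
        "TableName": table_name,
        "Item": {name: to_attribute_value(v) for name, v in item.items()},
    }


# Both items carry the key attribute "username"; everything else differs.
alice = build_put_item_request(
    "Users", {"username": "alice", "email": "alice@example.com"}
)
bob = build_put_item_request(
    "Users", {"username": "bob", "age": 42, "is_admin": True}
)
```

Neither `email`, `age`, nor `is_admin` has to be declared when the table is created; only the key attribute does.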

The last point of overlap with RDBMSs is that DynamoDB enables you to query data based on secondary indexes rather than solely on the basis of a primary key (think back to secondary indexes in Postgres). This means that you can perform queries in DynamoDB that are essentially equivalent to SQL queries like these:

	/* Remember: you can't actually use SQL syntax with DynamoDB;
	   these examples are just for show */
	SELECT * FROM chevys WHERE make = 'nova';
	SELECT * FROM pro_sports_teams WHERE city = 'cleveland';
	SELECT * FROM presidents WHERE first_name = 'Jethro'; /* OOPS! None found! */

You can even perform range queries:

	SELECT * FROM pearl_jam_albums WHERE
	  title <= 'Ten';
	SELECT * FROM john_cusack_films WHERE
	  title BETWEEN 'Better Off Dead' AND 'High Fidelity';
	SELECT * FROM oscar_wilde_quotes WHERE
	  quote LIKE 'I have nothing to declare%';
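For comparison, here is a hedged sketch of how queries in this spirit might be expressed as arguments to DynamoDB's Query operation. A Query needs an equality condition on the partition key, and range conditions such as `BETWEEN` and `begins_with` apply to the sort key. The `actor` and `author` partition keys below are therefore assumptions added for illustration, since the SQL versions have no counterpart for them, and `build_query_request` is a hypothetical helper.

```python
def build_query_request(table_name, pk_name, pk_value,
                        sk_name=None, sk_condition=None, sk_values=None):
    """Build keyword arguments for a DynamoDB Query call.

    sk_condition is a condition on the placeholder #sk, and sk_values
    maps its value placeholders (such as ":low") to strings.
    """
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}
    expression = "#pk = :pk"
    if sk_name is not None:
        names["#sk"] = sk_name
        values.update({ph: {"S": v} for ph, v in sk_values.items()})
        expression += " AND " + sk_condition
    return {
        "TableName": table_name,
        "KeyConditionExpression": expression,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


# SELECT * FROM chevys WHERE make = 'nova';
by_make = build_query_request("chevys", "make", "nova")

# title BETWEEN 'Better Off Dead' AND 'High Fidelity'
cusack = build_query_request(
    "john_cusack_films", "actor", "John Cusack",
    sk_name="title",
    sk_condition="#sk BETWEEN :low AND :high",
    sk_values={":low": "Better Off Dead", ":high": "High Fidelity"},
)

# quote LIKE 'I have nothing to declare%'
wilde = build_query_request(
    "oscar_wilde_quotes", "author", "Oscar Wilde",
    sk_name="quote",
    sk_condition="begins_with(#sk, :prefix)",
    sk_values={":prefix": "I have nothing to declare"},
)
```

The `#pk` and `#sk` placeholders are used so that attribute names never collide with DynamoDB's reserved words.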

Now that we have a basic outline of DynamoDB’s “key-value plus” data model, a question naturally emerges: How does DynamoDB fit in with the so-called CAP theorem that we discussed in the last chapter? Are we dealing with an eventually consistent database that may turn up stale data from time to time (such as CouchDB and others in the NoSQL landscape)? Or are we dealing with a strongly consistent, ACID-compliant, transactional database that only ever returns the most up-to-date value that we’re seeking?

The answer: Yes, please! Everybody gets a car! DynamoDB actually supports both consistency models. Even better, you can specify which consistency model you want on a per-read basis.

So when you query DynamoDB, your application can say either...

I want the most up-to-date value, even if it costs me some extra latency or, heaven forbid, the value isn’t currently available at all, or
I’ve got a tight schedule, so give me what you’ve got right now, even if it’s a bit stale.

Always bear in mind, however, that “stale” in the universe of DynamoDB doesn’t mean hours; it probably means milliseconds, and the trade-off may be acceptable in plenty of cases (make sure to run it by your CTO, though). The flexibility, however, is nice, and the ability to query the same data using both models can really come in handy.
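The per-read choice comes down to a single flag on the request. The sketch below builds arguments for a GetItem call, using the API's `ConsistentRead` parameter. The `Users` table, the `username` key, and the helper name are hypothetical.

```python
def build_get_item_request(table_name, key, strongly_consistent=False):
    """Build keyword arguments for a DynamoDB GetItem call.

    strongly_consistent=True asks for the most up-to-date value;
    False (the default) accepts a possibly stale one.
    """
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


user_key = {"username": {"S": "alice"}}

# "Give me what you've got right now, even if it's a bit stale."
fast_read = build_get_item_request("Users", user_key)

# "I want the most up-to-date value, even if it costs extra latency."
fresh_read = build_get_item_request(
    "Users", user_key, strongly_consistent=True
)
```

Because the flag lives on the request rather than on the table, the same item can be read both ways from different parts of an application.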

The downside of strongly consistent reads, as in other systems, is that they may not be available in case of network, hardware, or other outages. Death, taxes, and the CAP theorem: there’s no escaping them. The only real “solution” is to use strong consistency only when truly necessary and to design your application to be prepared to deal with an unresponsive database (rare as it may be with DynamoDB). Consistent reads also “cost” twice as much in terms of read capacity as non-consistent reads.
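The two-to-one cost ratio can be sketched as a small calculation. This assumes the provisioned-capacity accounting in which one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, with larger items rounded up to the next 4 KB. The function name is hypothetical, and treating 4 KB as 4,096 bytes is an assumption.

```python
import math

READ_UNIT_BYTES = 4096  # assumed size of one 4 KB read unit


def read_capacity_units(item_size_bytes, strongly_consistent):
    """Estimate the read capacity consumed by reading one item."""
    # Item size is rounded up to the next 4 KB boundary (at least one unit).
    units = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    # An eventually consistent read costs half as much.
    return units if strongly_consistent else units / 2


small_strong = read_capacity_units(1000, strongly_consistent=True)
small_eventual = read_capacity_units(1000, strongly_consistent=False)
large_strong = read_capacity_units(6000, strongly_consistent=True)
large_eventual = read_capacity_units(6000, strongly_consistent=False)
```

Under these assumptions, a 6,000-byte item spans two read units, so a strongly consistent read of it consumes twice what an eventually consistent read does.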

Another important thing to note about consistency is that DynamoDB supports only item-level consistency, which is analogous to row-level consistency in RDBMSs. There are no atomic operations across items, which means no consistency for batch operations. And when you run queries against indexes or whole tables, do not ever expect that the result set will be 100 percent up-to-date. Item-level consistency is a good thing to have, but if consistency across items is a necessity for your use case, you should explore other databases.

As you can see, these kinds of querying capabilities take DynamoDB beyond what you’d find in a more straightforward key-value store (like the one you’ll see in the next chapter, on Redis). So we’ll call DynamoDB’s data model key-value plus for short to account for these borrowings from the relational paradigm.

In spite of these SQL-flavored capabilities, though, there are firm limits to the DynamoDB/RDBMS parallels. Most importantly, if you need querying capabilities that go beyond the simple ones in the previous example, you’ll have to implement them on the application side, or just use a different database (or use other cloud services in conjunction with DynamoDB, as we’ll do on Day 3). Furthermore, DynamoDB has no concept of things like joins between tables; the table is the highest level at which data can be grouped and manipulated, and any join-style capabilities that you need will have to be implemented on the application side, which has its own downsides.
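As a rough idea of what an application-side join involves, the sketch below combines two lists of items that have already been fetched from two separate tables. The `users` and `orders` data, the shared `username` attribute, and the `join_on` helper are all hypothetical, and items are shown as plain dictionaries for brevity.

```python
def join_on(left_items, right_items, key):
    """Inner-join two lists of dict items on a shared attribute.

    The application has to fetch both lists itself; DynamoDB plays
    no part in combining them.
    """
    # Index the right-hand items by the join attribute.
    index = {}
    for item in right_items:
        index.setdefault(item[key], []).append(item)

    joined = []
    for left in left_items:
        for right in index.get(left[key], []):
            # On a name clash, the left item's value wins.
            joined.append({**right, **left})
    return joined


users = [
    {"username": "alice", "city": "Cleveland"},
    {"username": "bob", "city": "Akron"},
]
orders = [
    {"username": "alice", "order_id": "o-1"},
    {"username": "alice", "order_id": "o-2"},
    {"username": "carol", "order_id": "o-3"},
]

orders_with_city = join_on(orders, users, "username")
```

One downside is visible even here: both result sets must be held in application memory, and the work of fetching them costs at least one request per table.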

So that provides a little bit of background, historical and technological, for DynamoDB. It’s time to dig much deeper using real interactions with the database.
