DynamoDB: The “Big Easy” of NoSQL

Six out of the seven databases that we cover in this book are easy enough to run on your laptop. But running systems such as HBase, CouchDB, and others in production, and using them for applications handling massive workloads, is a much different matter. Even databases that are famously easy to operate at smaller scale, such as Redis, present major challenges in production environments, usually requiring skilled (and expensive!) admins and operations specialists on hand.

DynamoDB is a different story—and an outlier in this book. You don’t have to install it, start it, or maintain it. You can sign up for an AWS account, create a DynamoDB table, and just go. As you’ll see, DynamoDB does require some operations-style thinking and preparation, but you’ll never need to provide it an XML configuration à la HBase or set up a complex cluster à la Mongo. DynamoDB is a database that runs itself and yet is capable of fulfilling some of your most ambitious “webscale” dreams, offering consistently fast performance no matter how much data you’re storing.

So just how “webscale” are we talking here? Some facts:

  • You can store as many items as you want in any DynamoDB table (more on tables and items later).

  • Each item (the equivalent of a row in an SQL database) can hold as many attributes as you want, although there is a hard size limit of 400 KB per item (that limit will likely grow in the future).

  • If you get data modeling right, which will occupy a decent chunk of this chapter, you should experience very little performance degradation even when your tables store petabytes of data.

  • Over 100,000 AWS customers currently use DynamoDB.

  • DynamoDB handles well over a trillion total requests a day (across all AWS customers).

But DynamoDB isn’t interesting just because it’s big and cloud-based and managed by experts you don’t have to hire and fire. It’s also a system very much worth learning in its own right, providing a familiar yet unique data model and an array of features you won’t find in any of the other databases in this book. While some folks may be reluctant to trust a cloud database that’s managed by someone else, a variety of forward-thinking tech companies, including Airbnb, Adobe, Siemens, and Comcast, have taken the plunge and use DynamoDB as one of the core databases driving their platforms.

DynamoDB and the (Almost) Ops-Free Lifestyle

If you’re running a database yourself or as part of a team, you can expect many of the following to keep you awake at night: ensuring speedy, predictable performance; handling unforeseeable hardware outages and network failures; scaling out disk capacity to meet unexpected spikes in demand; and enabling application developers to quickly get up and running using your database. On top of that, DBAs with a background in NoSQL aren’t exactly a dime a dozen, so hiring, training, and scaling out a team of DBAs for Mongo, HBase, or another NoSQL database is nothing to sneeze at. DynamoDB doesn’t completely rid your life of these kinds of issues, but if you use it right it can take an enormous bite out of them.

There are also a number of secondary reasons why you might want to consider DynamoDB:

  • You can use it in any of AWS’s many datacenters across the entire globe. As of July 2017, AWS offers DynamoDB in forty-two Availability Zones (AZs) in sixteen geographic regions, with plans to expand into at least eight more AZs in three additional regions.

  • All data in DynamoDB is stored on high-performing solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS region (which guarantees redundancy even within a single region).

  • You can expect genuine downtime out of DynamoDB only in the rare event that an entire AWS datacenter goes down.

Datacenter outages are the Achilles heel of the cloud, and a very real risk that you should keep in mind. We’ve all experienced Netflix, Instagram, and other widely used services going down for hours at a time due to outages in Amazon’s massive us-east-1 region in Northern Virginia. AWS and DynamoDB aren’t perfect, but their track record is exceedingly good, if not downright pristine. Using a database like DynamoDB won’t grant you a completely ops-free lifestyle, but it may just enable you to refocus a huge chunk of your attention and resources onto other things, and for that reason alone it’s worth a look.

The Core Storage Concepts: Tables, Items, and More

DynamoDB’s data model is a bit tricky to define using standard “NoSQL” categories. It strongly resembles the data model of a key-value store such as Redis in that it wasn’t really built to provide the rich queryability of an RDBMS such as Postgres. Although DynamoDB does have some interesting querying features, which we’ll learn about shortly, it really soars when you know what you’re looking for in advance, which is a hallmark of key-value stores. If you’re building an application that uses DynamoDB, you should always strive to architect it so that your data is associated with certain “natural” keys that allow for easy discoverability—for example, the ability to find user data on the basis of unique usernames.
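
As a minimal sketch of this kind of key-based lookup, the snippet below builds the parameters for a GetItem request in the shape that boto3's low-level DynamoDB client accepts. The Users table and its username key are hypothetical names chosen for illustration, and nothing here contacts AWS.

```python
def build_get_user_request(username):
    """Build GetItem parameters for a hypothetical Users table keyed on username."""
    return {
        "TableName": "Users",  # hypothetical table name
        # The full primary key must be supplied; "S" marks a string value.
        "Key": {"username": {"S": username}},
    }


request = build_get_user_request("alice")

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   response = boto3.client("dynamodb").get_item(**request)
```

Because the lookup names the exact key, the application never has to search for the item, which is the access pattern the paragraph above recommends designing around.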

There are aspects of DynamoDB’s data model, however, that are reminiscent of RDBMSs such as Postgres. The first point of overlap is that all data in DynamoDB is stored in tables that you have to create and define in advance, though tables have some flexible elements and can be modified later. You can create, modify, and delete DynamoDB tables at will using an interface called the control plane. If you’re used to interfaces like Postgres’s psql, which we explored in Chapter 2, PostgreSQL, then the control plane should be familiar to you.

The second point of overlap is that you store items inside of tables. Items roughly correspond to rows in RDBMSs; they consist of one or more attributes, which roughly correspond to RDBMS columns. Earlier in the book, you learned about databases such as Mongo and Couch that have no concept whatsoever of predefined tables. DynamoDB requires you to define only some aspects of tables, most importantly the structure of keys and local secondary indexes, while retaining a schemaless flavor.
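
To make the split between predefined and schemaless parts concrete, here is a small sketch. The table definition follows the parameter shape of boto3's create_table call, and the table, attribute names, and throughput numbers are all assumptions made up for this example. Only the key attribute is declared; everything else varies per item.

```python
# Parameters in the shape of boto3's create_table call. Only the key
# attribute is declared up front; all names here are hypothetical.
create_table_params = {
    "TableName": "Users",
    "AttributeDefinitions": [
        {"AttributeName": "username", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "username", "KeyType": "HASH"},  # partition key
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
}

# Two items for the same table. Both carry the key attribute, but their
# remaining attributes differ, which the table definition does not forbid.
items = [
    {"username": {"S": "alice"}, "email": {"S": "alice@example.com"}},
    {"username": {"S": "bob"}, "favorite_color": {"S": "green"}, "age": {"N": "42"}},
]


def has_key_attributes(item, key_schema):
    """Return True if the item supplies every attribute named in the key schema."""
    return all(entry["AttributeName"] in item for entry in key_schema)


valid = [has_key_attributes(item, create_table_params["KeySchema"]) for item in items]
```

An item that lacked the username attribute would fail this check, which mirrors the one structural requirement the table imposes.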

The last point of overlap with RDBMSs is that DynamoDB enables you to query data based on secondary indexes rather than solely on the basis of a primary key (think back to secondary indexes in Postgres). This means that you can perform queries in DynamoDB that are essentially equivalent to SQL queries like these:

 /* Remember: you can't actually use SQL syntax with DynamoDB;
  these examples are just for show */
 SELECT * FROM chevys WHERE make = 'nova';
 SELECT * FROM pro_sports_teams WHERE city = 'cleveland';
 SELECT * FROM presidents WHERE first_name = 'Jethro'; /* OOPS! None found! */
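
For comparison, a rough sketch of how the second query might be expressed as a DynamoDB Query request. The parameter names are those accepted by boto3's low-level client, but the index name city-index is an assumption: it presumes a secondary index whose partition key is city already exists on the table.

```python
def build_index_query(table, index, attribute, value):
    """Build Query parameters for an equality match on an index's partition key."""
    return {
        "TableName": table,
        "IndexName": index,  # hypothetical secondary index
        "KeyConditionExpression": "#attr = :val",
        # Placeholders keep the expression valid even when an attribute
        # name collides with a DynamoDB reserved word.
        "ExpressionAttributeNames": {"#attr": attribute},
        "ExpressionAttributeValues": {":val": {"S": value}},
    }


query_params = build_index_query(
    "pro_sports_teams", "city-index", "city", "cleveland"
)

# With AWS credentials configured, the request could be sent like this:
#   import boto3
#   response = boto3.client("dynamodb").query(**query_params)
```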

You can even perform range queries:

 SELECT * FROM pearl_jam_albums WHERE
  title <= 'Ten';
 SELECT * FROM john_cusack_films WHERE
  title BETWEEN 'Better Off Dead' AND 'High Fidelity';
 SELECT * FROM oscar_wilde_quotes WHERE
  quote LIKE 'I have nothing to declare%';

As you can see, these kinds of querying capabilities take DynamoDB beyond what you’d find in a more straightforward key-value store (like the one you’ll see in the next chapter, on Redis). So we’ll call DynamoDB’s data model key-value plus for short to account for these borrowings from the relational paradigm.
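
One caveat is worth sketching. A DynamoDB Query needs an equality condition on the partition key, and range conditions such as <=, BETWEEN, and begins_with (the closest counterpart to a prefix LIKE) apply to the sort key. The snippet below builds such requests in the shape boto3's low-level client accepts. The table layouts are assumptions invented for illustration: a films table with partition key actor and sort key title, and a quotes table with partition key author and sort key quote.

```python
def build_sort_key_query(table, pk_name, pk_value, sk_name, sk_condition, sk_values):
    """Build Query parameters: partition-key equality plus one sort-key condition.

    sk_condition refers to the sort key as #sk and to its operands by the
    placeholder names used as keys in sk_values.
    """
    values = {":pk": {"S": pk_value}}
    values.update({name: {"S": value} for name, value in sk_values.items()})
    return {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk AND " + sk_condition,
        "ExpressionAttributeNames": {"#pk": pk_name, "#sk": sk_name},
        "ExpressionAttributeValues": values,
    }


# Roughly: title BETWEEN 'Better Off Dead' AND 'High Fidelity'
between_query = build_sort_key_query(
    "films", "actor", "John Cusack", "title",
    "#sk BETWEEN :low AND :high",
    {":low": "Better Off Dead", ":high": "High Fidelity"},
)

# Roughly: quote LIKE 'I have nothing to declare%'
prefix_query = build_sort_key_query(
    "quotes", "author", "Oscar Wilde", "quote",
    "begins_with(#sk, :prefix)",
    {":prefix": "I have nothing to declare"},
)
```

Either dictionary could be passed to a boto3 client's query method as keyword arguments once a table with the assumed keys exists.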

In spite of these SQL-flavored capabilities, though, there are firm limits to the DynamoDB/RDBMS parallels. Most importantly, if you need querying capabilities that go beyond the simple ones in the previous examples, you’ll have to implement them on the application side, or just use a different database (or use other cloud services in conjunction with DynamoDB, as we’ll do on Day 3). Furthermore, DynamoDB has no concept of things like joins between tables; the table is the highest level at which data can be grouped and manipulated, and any join-style capabilities that you need will have to be implemented on the application side, which has its own downsides.
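
To give a feel for what an application-side join involves, here is a minimal sketch in plain Python. The users and orders lists stand in for items already fetched from two separate tables; the table contents and the shared username attribute are invented for this example.

```python
def join_on(left_items, right_items, key):
    """Join two lists of dict items on a shared attribute (a simple hash join)."""
    # Index the right-hand items by the join attribute.
    index = {}
    for item in right_items:
        index.setdefault(item[key], []).append(item)

    # Emit one merged row per matching pair; unmatched left items are dropped.
    joined = []
    for left in left_items:
        for right in index.get(left[key], []):
            joined.append({**right, **left})
    return joined


# Stand-ins for items read from two hypothetical tables.
users = [
    {"username": "alice", "email": "alice@example.com"},
    {"username": "bob", "email": "bob@example.com"},
]
orders = [
    {"username": "alice", "order_id": "o-1"},
    {"username": "alice", "order_id": "o-2"},
]

rows = join_on(users, orders, "username")
```

The downsides mentioned above show up quickly: the application has to read both sets of items before joining them, and nothing guarantees the two reads see a consistent snapshot of the data.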

So that provides a little bit of background, historical and technological, for DynamoDB. It’s time to dig much deeper using real interactions with the database.
