Basics of scaling a traditional database

While our Hello World application doesn't use a data store such as a database, most real-world applications will. Many types of databases are available, but more often than not, the right choice is to start with a traditional relational database such as MySQL or PostgreSQL. We will talk about NoSQL databases toward the end of this chapter. At scale they are a good supplement to relational databases, but in most cases, starting with a different data store is a premature optimization.

In terms of architecture, it is best to break out our stack so that the databases sit in their own layer. This is called a three-tier architecture. The first tier, the Presentation tier, is often a browser or a mobile device. The second, the Application tier, is what we have built so far in this book, that is, our Node.js service. Finally, the Data tier holds the data stores used by our application, including our databases. The data tier is often more critical and harder to manage than the application tier because it is stateful: it holds data that isn't source-controlled the way our application and infrastructure code is.

AWS offers the Relational Database Service (RDS) (http://amzn.to/2gOGi8s), which lets you create managed relational databases. The service supports a variety of relational database engines, ranging from MySQL to Oracle. Using RDS is a compelling option because Amazon takes much of the administration burden off your plate. It handles some of the most critical tasks, including daily backups through snapshots and periodic maintenance to patch the database engine or the operating system.
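To make the "managed" part concrete, here is a minimal sketch, assuming the AWS SDK for Python (boto3), of the request parameters for the RDS `create_db_instance` call with automated daily backups and a weekly maintenance window. The instance identifier, instance class, storage size, username, and time windows are hypothetical placeholders, and in practice the password should come from a secrets store rather than from code.

```python
def managed_mysql_params(instance_id: str, password: str) -> dict:
    """Build keyword arguments for boto3's RDS create_db_instance call.

    All values are illustrative placeholders; adjust them for your workload.
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.micro",  # hypothetical size
        "AllocatedStorage": 20,  # storage in GiB
        "MasterUsername": "admin",
        "MasterUserPassword": password,
        # RDS takes a daily snapshot in this UTC window and keeps it for 7 days.
        "BackupRetentionPeriod": 7,
        "PreferredBackupWindow": "03:00-04:00",
        # Weekly UTC window in which RDS may apply engine or OS patches.
        "PreferredMaintenanceWindow": "sun:05:00-sun:06:00",
    }
```

With boto3 installed and credentials configured, the request would be sent with `boto3.client("rds").create_db_instance(**managed_mysql_params("hello-world-db", secret))`. Keeping the parameters in a plain function makes them easy to review and test without touching AWS.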

One of the biggest concerns when adding a database to our system is its availability and its ability to scale. You need to handle the following:

The ability to survive the loss of the instance hosting your database in case of a hardware or network outage
The ability to scale up your database to handle more read requests
The ability to scale up your database to handle more write requests

RDS makes all of these concerns straightforward to address. To survive a major database failure, your database must not be a single point of failure. If you created a MySQL, PostgreSQL, or Oracle instance, you will want to use an option called Multi-AZ (http://amzn.to/2fLZANI). When enabled, this option creates a synchronously replicated standby copy of your database in a different Availability Zone. If the primary database suffers an outage, RDS automatically fails over to the standby and updates the database endpoint's DNS record to point to it, so the web application can recover by reconnecting to the same endpoint, typically within a minute or two.
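The failover is not entirely invisible to the application: open connections to the old primary are dropped, and new ones can fail until the DNS record has switched over. A common way to ride this out is to retry with exponential backoff. The sketch below is a generic helper under that assumption; `ConnectionError` is a stand-in for your database driver's own error type (for example, `OperationalError` in drivers that follow Python's DB-API), and the retry count and delays are arbitrary.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_reconnect(
    operation: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff on connection errors.

    `operation` should open a fresh connection each time it runs, so that a
    retry after a Multi-AZ failover looks up the endpoint again.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return operation()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable mainly so the backoff schedule can be checked without actually waiting.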

With regard to scaling read requests, RDS lets you create read replicas of your database (http://amzn.to/2fuhFwA). At the application level, this means using two different types of connections: one for write requests, which go to the primary instance, and one for read requests. For the reads, you will likely want to connect to all your read replicas and round-robin your SELECT statements across them. Keep in mind that replication to read replicas is asynchronous, so a replica may briefly return slightly stale data.
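The read/write split can be sketched as a small router that sends every non-SELECT statement to the primary and cycles through the replicas for reads. This is a minimal illustration: the class name and hostnames are hypothetical, the SQL classification is deliberately naive (a `SELECT ... FOR UPDATE`, for instance, belongs on the primary), and a real application would hand out pooled connections rather than hostnames.

```python
import itertools
from typing import Iterable


class ReadWriteRouter:
    """Pick an endpoint per statement: writes to the primary, reads round-robin."""

    def __init__(self, primary: str, replicas: Iterable[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    @staticmethod
    def is_read(sql: str) -> bool:
        # Naive check: only statements starting with SELECT are treated as reads.
        return sql.lstrip().upper().startswith("SELECT")

    def endpoint_for(self, sql: str) -> str:
        return next(self._replicas) if self.is_read(sql) else self.primary
```

For example, with two replicas, three consecutive SELECT statements go to the first replica, the second, then the first again, while every INSERT or UPDATE goes to the primary.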

Scaling write requests is the hardest problem to address. Early on, the easiest way to handle that constraint is to scale your RDS instances vertically, that is, to move them to a larger, more powerful instance class.
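Vertical scaling comes down to a single API call. As a sketch, assuming boto3, these are the arguments for the RDS `modify_db_instance` call that moves an instance to a different class; the identifier and class names are placeholders.

```python
def resize_instance_params(
    instance_id: str, new_class: str, apply_immediately: bool = False
) -> dict:
    """Build keyword arguments for boto3's RDS modify_db_instance call."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": new_class,  # e.g. a larger class in the same family
        # False defers the change to the next maintenance window. True applies it
        # now, which briefly interrupts service while the instance is replaced
        # (with Multi-AZ, RDS modifies the standby first and then fails over).
        "ApplyImmediately": apply_immediately,
    }
```

A hypothetical invocation would be `boto3.client("rds").modify_db_instance(**resize_instance_params("hello-world-db", "db.r5.xlarge"))`, which leaves the resize for the next maintenance window.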

In addition to the traditional MySQL, PostgreSQL, and Oracle databases, AWS offers its own relational database called Aurora. Aurora is compatible with MySQL and PostgreSQL (you select the compatibility mode when you create the database). If you are developing against one of those two databases, Aurora is often the better choice.

AWS advertises up to five times the throughput of standard MySQL and up to three times that of standard PostgreSQL. The data is also more durable, as each chunk of the database volume is replicated six ways across three Availability Zones. In addition, Aurora works with the concept of clusters. Instead of connecting to individual instances, as described previously with read replicas, you connect to the cluster's write and read endpoints. Behind the scenes, AWS automatically includes new replicas behind the read endpoint when you scale up your cluster, and even promotes one of your replicas if your primary instance fails. Finally, Aurora lets you store up to 64 TB through an auto-scaling storage feature, so you won't have to worry about running out of storage over time. You can read more about AWS Aurora at http://amzn.to/2fjc9kj.
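Because the cluster exposes stable endpoints, the application needs only two hostnames instead of a list of replicas. As a minimal sketch, the helper below pulls them out of the response to boto3's `describe_db_clusters` call (its `Endpoint` and `ReaderEndpoint` fields); the cluster name used in the usage note is hypothetical.

```python
from typing import NamedTuple


class ClusterEndpoints(NamedTuple):
    writer: str  # cluster endpoint: follows the current primary, even after failover
    reader: str  # reader endpoint: spreads new connections across Aurora Replicas


def cluster_endpoints(describe_response: dict, cluster_id: str) -> ClusterEndpoints:
    """Extract writer/reader endpoints from a describe_db_clusters response."""
    for cluster in describe_response.get("DBClusters", []):
        if cluster["DBClusterIdentifier"] == cluster_id:
            return ClusterEndpoints(cluster["Endpoint"], cluster["ReaderEndpoint"])
    raise KeyError(f"cluster {cluster_id!r} not found in response")
```

The response would come from something like `boto3.client("rds").describe_db_clusters(DBClusterIdentifier="hello-world-cluster")`. Since the reader endpoint balances at the DNS level, it distributes connections rather than individual queries, so long-lived connection pools may stay pinned to one replica.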

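At this point, assuming your application has a database, your architecture could consist of an ELB in front of an Auto Scaling group of application instances, which in turn talk to an RDS or Aurora data layer with a standby and read replicas.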

Depending on how big you expect your databases to become, keep in mind that at a large scale, some features are best avoided, such as joins and transactions. You can read more on this topic by searching the web for database denormalization.
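Denormalization trades extra write-time work and duplicated data for simpler reads. The following is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL or PostgreSQL, with hypothetical `users` and `posts` tables: each post stores a copy of its author's name, so listing posts reads a single table without a JOIN, at the cost of updating every copy when a user is renamed.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- author_name duplicates users.name so reads never need a JOIN.
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL,
            author_name TEXT NOT NULL,
            title TEXT NOT NULL
        );
        """
    )


def add_post(conn: sqlite3.Connection, author_id: int, title: str) -> None:
    # Copy the author's current name into the post at write time.
    (name,) = conn.execute(
        "SELECT name FROM users WHERE id = ?", (author_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO posts (author_id, author_name, title) VALUES (?, ?, ?)",
        (author_id, name, title),
    )


def list_posts(conn: sqlite3.Connection) -> list:
    # Single-table read: no JOIN against users.
    return conn.execute("SELECT title, author_name FROM posts ORDER BY id").fetchall()


def rename_user(conn: sqlite3.Connection, user_id: int, new_name: str) -> None:
    # The price of denormalizing: every duplicated copy must be updated too.
    conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
    conn.execute(
        "UPDATE posts SET author_name = ? WHERE author_id = ?", (new_name, user_id)
    )
```

For example, after inserting user 1 named Ada and calling `add_post(conn, 1, "Scaling reads")`, `list_posts(conn)` returns `[("Scaling reads", "Ada")]` without touching the `users` table.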

Our stack is now very scalable. The ELB service scales automatically with incoming traffic; behind it, our application runs in an Auto Scaling group that adds instances as needed; and finally, we saw how to scale the data layer, vertically for writes and horizontally for reads. The next step in the evolution of our stack will be to optimize performance and costs by taking advantage of more managed services.
