Why Aggregates?

The avid reader will probably be wondering what all of this has to do with Aggregates and Aggregate Design. That's a good question, and there is a direct relation, so let's explore it. The Relational Model uses tables to store data. Those tables are composed of rows, where each row usually represents an instance of a concept of interest to the application. Additionally, each row can point to rows in other tables of the same database, and the consistency of these relationships can be maintained through referential integrity. This model is fine; however, it lacks one very basic concept: the object.

Indeed, when we talk about the Relational Model, we're mainly talking about tables, rows, and relationships between rows. And when we talk about the Object-Oriented Model, we're mainly talking about compositions of objects. So every time we fetch data (a set of rows) from a relational database, we run a translation process responsible for building an in-memory representation we can operate on. The same applies in the opposite direction: every time we need to store an object in the database, we run another translation process that turns that object into a set of rows, possibly spread across several tables. This translation means that persisting a single object may require several queries against your database. As such, without a specific tool such as transactions, it's impossible to guarantee that the data will be persisted consistently. This problem is the so-called impedance mismatch.
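The translation described above can be sketched in a few lines. This is a minimal illustration, not an Object-Relational Mapper: the `Order` and `OrderLine` classes, the table names, and the `save_order` and `load_order` helpers are all hypothetical, and SQLite is used only because it ships with Python. Note how storing one object takes several statements, which is why they are wrapped in a transaction.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: int
    lines: list[OrderLine] = field(default_factory=list)


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical two-table layout: one row per order, one row per line.
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE order_lines ("
        "order_id INTEGER NOT NULL REFERENCES orders(id), "
        "product TEXT NOT NULL, quantity INTEGER NOT NULL)"
    )


def save_order(conn: sqlite3.Connection, order: Order) -> None:
    # Object -> rows: a single object becomes several INSERT statements.
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order.order_id,))
        conn.executemany(
            "INSERT INTO order_lines (order_id, product, quantity) "
            "VALUES (?, ?, ?)",
            [(order.order_id, line.product, line.quantity) for line in order.lines],
        )


def load_order(conn: sqlite3.Connection, order_id: int) -> Order:
    # Rows -> object: rebuild the in-memory composition from the result set.
    rows = conn.execute(
        "SELECT product, quantity FROM order_lines "
        "WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return Order(order_id, [OrderLine(product, qty) for product, qty in rows])
```

Even in this tiny case the mapping code is longer than the domain classes themselves, which hints at how much work the translation becomes for a realistic model.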

Impedance Mismatch
The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being used by a program written in an object-oriented programming language or style, particularly when objects or class definitions are mapped in a straightforward way to database tables or relational schemata.
Extracted from Wikipedia

The impedance mismatch is not an easy problem to solve, so we highly discourage trying to solve it on your own. It would be a huge undertaking, and it's simply not worth the effort. Luckily, there are some libraries out there that take care of this translation process. They're commonly known as Object-Relational Mappers (which we've discussed in earlier chapters) and their primary concern is to ease the process of translating from the Relational Model to the Object-Oriented Model, and vice versa.

This issue isn't limited to relational databases; it also affects NoSQL persistence engines. Many NoSQL engines use documents. An Entity is translated into a document representation (JSON, XML, binary, and so on) and then persisted. The main difference from an RDBMS is that if a main Entity (such as Order) has other related Entities (such as OrderLines), you can more easily design a single JSON document that contains all the information. With this approach, the whole Entity is persisted with a single request to your NoSQL engine, so you don't need transactions.
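As a rough sketch of the single-document idea, the following turns an Order and its OrderLines into one JSON string and back. The class shapes and the `to_document` and `from_document` helpers are assumptions made for illustration; no particular NoSQL engine or client library is implied.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    order_id: str
    lines: list[OrderLine] = field(default_factory=list)


def to_document(order: Order) -> str:
    # asdict() recurses into the nested OrderLine objects, so the main
    # Entity and its related Entities end up in one JSON document.
    return json.dumps(asdict(order), sort_keys=True)


def from_document(document: str) -> Order:
    data = json.loads(document)
    lines = [OrderLine(**line) for line in data["lines"]]
    return Order(data["order_id"], lines)
```

Because the whole Order travels as one string, a store only has to write one value, which is what makes the single-request approach possible.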

Nevertheless, whether you're using NoSQL or an RDBMS for fetching and persisting your Entities, you'll need one or more queries. In order to ensure data consistency, those queries or requests need to be executed as a single operation. Only then can we guarantee that the data will be consistent.

What does consistent mean? It means that all data persisted into our database must comply with all business rules, also known as invariants. An example of a business invariant is a rule such as this one on GitHub: a user can have unlimited public repositories but no private repositories. However, if this user pays $12 per month, they can have up to 10 private repositories.
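An invariant like the repository rule can be expressed directly in code. The sketch below is purely illustrative: the `Account` class, its attribute names, and the `InvariantViolation` exception are hypothetical, and the limit of 10 is simply the figure quoted in the text.

```python
from dataclasses import dataclass

# Figure taken from the example in the text; not an official API value.
MAX_PRIVATE_REPOS_ON_PAID_PLAN = 10


class InvariantViolation(Exception):
    """Raised when an operation would break a business rule."""


@dataclass
class Account:
    paid_plan: bool = False
    public_repos: int = 0
    private_repos: int = 0

    def private_repo_limit(self) -> int:
        # Free accounts get no private repositories at all.
        return MAX_PRIVATE_REPOS_ON_PAID_PLAN if self.paid_plan else 0

    def add_public_repo(self) -> None:
        # Public repositories are unlimited, so there is nothing to check.
        self.public_repos += 1

    def add_private_repo(self) -> None:
        # Check the rule before changing state, so the object never
        # holds data that violates the invariant.
        if self.private_repos >= self.private_repo_limit():
            raise InvariantViolation("private repository limit reached")
        self.private_repos += 1
```

Keeping the check inside the object means every caller goes through the same rule, instead of each caller having to remember it.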

Relational databases provide three main tools for helping us with data consistency:

* Referential integrity: Foreign keys, nullability checks, and so on.
* Transactions: Run multiple queries as a single operation. The problem with transactions is the same as that of branches and merges in your code repository. Keeping a branch has a performance cost (memory, CPU, storage, indexing, and so on). If too many people are touching the same data concurrently, conflicts will occur and transaction commits will fail.
* Locking: Block rows or tables. Other queries on the same tables and rows must wait for the lock to be released. Locking has a negative impact on the performance of your application.
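The first two tools can be seen working together in a small experiment. This sketch uses SQLite from the Python standard library, with hypothetical `orders` and `order_lines` tables; note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection. Locking is not shown here.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE order_lines ("
        "order_id INTEGER NOT NULL REFERENCES orders(id), "
        "product TEXT NOT NULL)"
    )
    return conn


def save_order_with_dangling_line(conn: sqlite3.Connection) -> bool:
    """Try to store an order plus a line that points at a missing order."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute("INSERT INTO orders (id) VALUES (1)")
            # Order 999 does not exist, so referential integrity rejects this.
            conn.execute(
                "INSERT INTO order_lines (order_id, product) VALUES (999, 'book')"
            )
    except sqlite3.IntegrityError:
        return False
    return True


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The foreign key rejects the second statement, and the transaction then undoes the first one as well, so the database never ends up holding half of the operation.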

Suppose we have an e-commerce application we want to expand to other countries and regions, and suppose the release goes fairly well and sales increase. An evident side effect of the release is that the database must be able to handle the additional load. As seen earlier, there are two scaling methods: up or out.

Scaling up means we improve the hardware infrastructure we have (for example, a better CPU, more memory, or better hard disks). Scaling out means adding more machines, organized in a cluster, to do specific work. In this case, we could have a cluster of databases.

But relational databases aren't designed to scale horizontally, since we can't easily configure them to save one set of rows on a given machine and another set of rows on a different one. Relational databases are easy to scale up, but the Relational Model doesn't scale horizontally.

In the NoSQL world, data consistency is a bit more difficult: transactions and referential integrity aren't generally supported, while locking is supported but generally not encouraged.

NoSQL databases aren't affected as drastically by the impedance mismatch. They match well with Aggregate Design because they enable us to easily store and retrieve single units atomically. For example, when using a key-value store such as Redis, an Aggregate could be serialized and stored under a specific key. On a document-oriented store such as Elasticsearch, an Aggregate would be serialized into JSON and persisted as a document. As mentioned before, the problem comes when multiple documents must be updated at once.

For that reason, when any object is persisted with a single representation (one document, so no multiple queries are needed), it's easy to distribute those single units across several machines, called nodes, which make up a cluster of NoSQL databases. These databases are generally regarded as easy to distribute, which means that this style of database is easy to scale horizontally.
