© Kasun Indrasiri and Prabath Siriwardena 2018
Kasun Indrasiri and Prabath Siriwardena, Microservices for the Enterprise, https://doi.org/10.1007/978-1-4842-3858-5_5

5. Data Management

Kasun Indrasiri1  and Prabath Siriwardena1
(1)
San Jose, CA, USA
 

In most business use cases, the service logic is built on top of an underlying persistent layer. Often a database is used as the persistent layer, acting as the system of record for a given service. As we’ve discussed, microservices are built as autonomous entities and should have control over the data layer that they operate on. This essentially means that microservices cannot depend on a data layer that is owned by or shared with another entity. So, in the process of building autonomous services, each microservice also requires an isolated persistent layer. In this chapter, we discuss the commonly used patterns and best practices for transforming enterprise applications built around centralized or shared databases into microservices based on decentralized databases.

Monolithic Applications and Shared Databases

In the context of an enterprise architecture based on monolithic applications and services, often a single centralized database (or a few) is shared among multiple applications and services. For example, as depicted in Figure 5-1, all services of the retail system share a centralized database (Retail DB). So, all the information related to products, customers, orders, payments, and so on, is centrally managed through the Retail database.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig1_HTML.jpg
Figure 5-1

The microservices of the online retail application share a single database

In fact, a centralized shared database makes it easier for the application to combine data from multiple tables and formulate different business representations. Powerful query languages such as SQL innately support all the features required for sharing data and building different data compositions. For example, using SQL we can join multiple tables on various complex conditions and create a composite view of different entities.

When working with shared business entities, implementing business interactions among them in a transactional manner becomes quite important. A transaction is an atomic unit of work that either fails or succeeds. The key characteristics of transactions are known as ACID, which is an acronym for atomicity (all changes to the data are performed as if they are a single operation), consistency (data is in a consistent state when a transaction starts and when it ends), isolation (the intermediate state of a transaction is invisible to other transactions), and durability (after a transaction successfully completes, changes to data persist and are not undone, even in the event of a system failure).

A centralized database makes ACID transactions across multiple entities trivially easy. In a relational database, every SQL statement must execute in the scope of a transaction. Hence it’s quite trivial to model a complex transactional scenario that involves multiple tables. Most Relational Database Management Systems (RDBMS) support such capabilities out of the box.
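
As a minimal sketch of this, the following JDBC snippet (the connection URL, table names, and column names are hypothetical) updates two tables in a single local transaction; either both changes are committed or both are rolled back.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LocalTransactionExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust for your environment.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/retaildb", "user", "password")) {
            con.setAutoCommit(false); // begin the transaction
            try (PreparedStatement insertOrder = con.prepareStatement(
                         "INSERT INTO ORDERS (ORDER_ID, CUSTOMER_ID) VALUES (?, ?)");
                 PreparedStatement reduceStock = con.prepareStatement(
                         "UPDATE PRODUCT SET STOCK = STOCK - 1 WHERE PRODUCT_ID = ?")) {
                insertOrder.setLong(1, 11234);
                insertOrder.setLong(2, 501);
                insertOrder.executeUpdate();

                reduceStock.setLong(1, 9001);
                reduceStock.executeUpdate();

                con.commit();   // both changes become visible atomically
            } catch (Exception e) {
                con.rollback(); // neither change is applied
                throw e;
            }
        }
    }
}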

Despite the advantages of the centralized shared database architecture, it has drawbacks too. It is a single point of failure, it creates a potential performance bottleneck because heavy application traffic is directed to a single database, and it creates tight dependencies between applications, as they share the same database tables. So, you cannot build autonomous and independent microservices if you are using a shared persistent layer or database. Hence, with microservices, you need to decentralize data management and each microservice has to fully own the data that it operates on.

A Database per Microservice

The microservices architecture encourages microservices to own the data that they operate on, and those databases shouldn’t be shared with any other service. Therefore, a given microservice will have an isolated datastore or use an isolated persistence service (e.g., from a cloud provider).

Having a database per microservice gives us a lot of freedom when it comes to microservices autonomy. For instance, microservice owners can modify the database schema as per the business requirements, without worrying about external consumers of the database, because no external application can access the database directly. This also gives microservices developers the freedom to select the technology to be used as the persistent layer of their microservices. Different microservices can use different persistent store technologies, such as an RDBMS, a NoSQL database, or other cloud services.

However, having a database per service introduces a new set of challenges. When it comes to the realization of any business scenario, sharing data between microservices and implementing transactions within and across service boundaries becomes quite challenging.

Sharing Data Between Microservices

With a monolithic database, it is quite easy to do any arbitrary data composition because all the data resides in a single shared database. However, in the microservices context, every piece of data is owned by a single service (a single system of record). A system of record (or persistent layer) cannot be directly accessed by any other service or system. The only way to access the data owned by another microservice is through a service interface or API. Other systems, which access the data through the published API, may use a read-only local cache to keep the data locally.

To cater to these requirements, we need to come up with suitable techniques to share data between microservices, as most business scenarios would require it.

Eliminating Shared Tables

Sharing tables between multiple services/applications is quite a common pattern in a monolithic database. As discussed earlier, when we share a table between two or more microservices, a change to the schema of that table could affect all dependent microservices. For example, as shown in Figure 5-2, the Order Processing and Shipping services share the same table, TRACKING_INFO, which keeps track of the order status. Both services may read or write to the same table and the underlying central database provides all the required functionality (such as ACID transactions). However, if we need to change the schema of the TRACKING_INFO table, that change affects both the Order Processing and Shipping services. Also, it is not possible to keep service-specific data (data that a service does not want to share) in the shared table. These types of shared table scenarios are not compatible with microservices data management fundamentals. The persistent store/database of a service should be independent and only one microservice should operate on it. Therefore, with a microservices architecture, we need to get rid of such shared tables.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig2_HTML.jpg
Figure 5-2

Data management between two services using a shared table

So, how do we get rid of shared tables? If you think in terms of the microservice data-handling principles that we discussed earlier, a given piece of data must be owned by a single service. Hence, in this example (depicted in Figure 5-3), the tracking information should be split into two tables. One table should hold the data relevant to the Order Processing microservice and the other table should contain the data relevant to the Shipping microservice. There can be shared data duplicated across these two tables, and the services are responsible for keeping that data in sync using the published APIs of those services (no direct database access). We discuss these synchronization techniques in detail later in this chapter.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig3_HTML.jpg
Figure 5-3

Splitting shared data and managing it as independent entities

Another variation of shared tables is when the shared data represents a separate business entity. In the previous example, the shared data (tracking information) doesn’t represent a business entity. So, let’s take a different example of sharing customer data between the Order Processing and Product Management services. In this case, both of these services use data from a shared table (the CUSTOMER table) in their business logic. We can now identify that customer information is not just a table but a completely different business entity. We can treat it as a business-capability-oriented entity and model it as a microservice. As shown in Figure 5-4, we can introduce the Customer microservice and let it own the customer data, and the other services can consume customer data through an API exposed by the Customer service.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig4_HTML.jpg
Figure 5-4

Services share the customer information through a service built on top of the customer database

So, we can identify the key steps involved in eliminating tables that share data between multiple microservices.
  1. Identify the shared table and identify the business capability of the data stored in that shared table.

  2. Move the shared table to a dedicated database and, on top of that database, create a new service (business capability) identified in the previous step.

  3. Remove all direct database access from other services and only allow them to access the data via the service’s published API.


With this design, we need a dedicated owner for the newly created shared service, who can modify its service interface or schema. This also helps us discover new business boundaries between these services, which keeps our microservices-based application ready for new requirements.

Shared Data

Storing data across multiple tables and connecting it through foreign keys (FK) is a very common technique in relational databases. A foreign key is a column or combination of columns that is used to establish and enforce a link between the data in two tables. You can create a foreign key by defining a FOREIGN KEY constraint when you create or modify a table. Foreign keys enforce referential integrity between the data stored in multiple tables, which means that if a foreign key contains a value, this value refers to an existing record in the related table.

For example, Figure 5-5 illustrates the Order Processing and Product Management services, which use the ORDER and PRODUCT tables. A given order contains multiple products and the ORDER table refers to those products using a foreign key, which points to the primary key of the PRODUCT table. With the foreign key constraint, you can only add a value to the foreign key column of the ORDER table if it refers to an existing PRODUCT entity.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig5_HTML.jpg
Figure 5-5

Shared database - Foreign key relationship between tables
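
As a minimal sketch of such a constraint (executed here through JDBC, with hypothetical table and column names; ORDER is a reserved word in SQL, so the table is named ORDER_ITEM), the PRODUCT_ID column of the order table can only hold values that already exist in the PRODUCT table.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ForeignKeyExample {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/retaildb", "user", "password");
             Statement stmt = con.createStatement()) {
            // ORDER_ITEM.PRODUCT_ID must refer to an existing row in PRODUCT
            stmt.executeUpdate(
                "CREATE TABLE ORDER_ITEM (" +
                "  ORDER_ID   BIGINT NOT NULL," +
                "  PRODUCT_ID BIGINT NOT NULL," +
                "  QUANTITY   INT," +
                "  CONSTRAINT FK_PRODUCT FOREIGN KEY (PRODUCT_ID)" +
                "    REFERENCES PRODUCT (PRODUCT_ID))");
        }
    }
}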

With monolithic shared databases, using a foreign key and joining data is quite trivial. But when you want to have independent services and use a database per service, having this kind of link for referential integrity is virtually impossible. So, with a microservices architecture, we need to find different ways of handling this scenario. Let’s take a look at some of the commonly used techniques to achieve these requirements.

Synchronous Lookups

When you have a dedicated database for each microservice, if one service needs to access the data of another, it can simply access the published API of that microservice and retrieve the required data. For example, as shown in Figure 5-6, the Order Processing service keeps the product IDs that are part of a given order. If the Order Processing service requires the detailed information of those products, then from its application logic, it has to invoke the Product Management service and retrieve the product information.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig6_HTML.jpg
Figure 5-6

Using synchronous lookups to the service interface to access the data owned by other services

This technique is quite trivial to understand, and at the implementation level you only need to write extra logic to make an external service call. We need to keep in mind that, unlike with a database, we no longer have the referential integrity of a foreign key constraint. This means that the service developers have to take care of the consistency of the data that they put into the table. For example, when you create an order, you need to make sure (possibly by calling the Product Management service) that the products referenced by that order actually exist in the PRODUCT table.
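
A minimal sketch of such a synchronous lookup, assuming the Product Management service exposes a RESTful endpoint at a hypothetical URL, could look like this (using the standard Java HTTP client):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProductLookupClient {
    private final HttpClient client = HttpClient.newHttpClient();

    // Synchronously look up product details owned by the Product Management service.
    public String getProduct(String productId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://product-mgt-service/products/" + productId))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 404) {
            // No such product: the caller must not reference it from an order,
            // since there is no foreign key constraint to enforce integrity.
            throw new IllegalArgumentException("Unknown product: " + productId);
        }
        return response.body(); // e.g., a JSON representation of the product
    }
}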

Using Asynchronous Events

Sharing data using synchronous lookups from other microservices may be expensive in certain business scenarios. As an alternative, we can leverage an event-driven architecture (the publisher-subscriber pattern) to share data between services. For example, for the same scenario of the Order Processing and Product Management services (see Figure 5-7), we can introduce an event-driven communication pattern in which an event bus is used as the messaging infrastructure. If there is an update to a product, the Product Management service (publisher) updates its product table and publishes an event into the event bus. The Order Processing service (subscriber) has subscribed to the product-updates topic and, therefore, as the Product Management service publishes product update events to that topic, the Order Processing service will receive them. It can then update its local cache of product information and use that cache when implementing its own business logic.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig7_HTML.jpg
Figure 5-7

Using asynchronous events to share data between microservices

As the event bus, you can select any asynchronous messaging technology (we discussed asynchronous messaging techniques in detail in Chapter 3, “Inter-Service Communication”), such as Kafka or an AMQP broker (such as RabbitMQ), and you can use different subscription techniques to ensure the delivery of the event to the subscriber (such as durable subscriptions).

With this approach, you can eliminate synchronous service calls from one service to the other, but since we are using a local cache, the data can be stale. Hence, asynchronous event-based data sharing is an eventual consistency model. Eventual consistency ensures that the data of each service becomes consistent eventually (you may get stale data for a certain amount of time). The time taken by the services to reach a consistent state may or may not be defined. Therefore, we should use this pattern only for use cases that can tolerate eventual consistency.
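
As a rough sketch, assuming Kafka is used as the event bus and a hypothetical product-updates topic, the Order Processing service could maintain its local cache as follows:

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ProductUpdateSubscriber {
    // Local, read-only cache of product data owned by the Product Management service.
    private final Map<String, String> productCache = new ConcurrentHashMap<>();

    public void run() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // assumed broker address
        props.put("group.id", "order-processing");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("product-updates"));
            while (true) {
                for (ConsumerRecord<String, String> record :
                        consumer.poll(Duration.ofMillis(500))) {
                    // Key = product ID, value = latest product representation.
                    productCache.put(record.key(), record.value());
                }
            }
        }
    }
}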

Shared Static Data

When it comes to storing and sharing immutable, read-only metadata, conventional monolithic databases are often used and the data is shared through a shared table. For example, data such as U.S. states, lists of countries, etc., is often used as shared static data. With a microservices approach, since we don’t want to share databases, we need to think about where to keep the shared static data.

One might think that having another microservice to serve the static data would solve this problem, but it is overkill to have a service just to retrieve some static information that does not change over time. Hence, sharing static data is often done with shared libraries. For example, if a given service wants to use the static metadata, it has to import the shared library into the service code.
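
For example, a shared library (the library name and the country list here are purely illustrative) could expose the static data as a simple enum that every service imports:

// Part of a shared library (e.g., retail-static-data.jar) that each service imports.
public enum Country {
    US("United States"),
    LK("Sri Lanka"),
    DE("Germany");

    private final String displayName;

    Country(String displayName) {
        this.displayName = displayName;
    }

    public String displayName() {
        return displayName;
    }
}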

Data Composition

Composing data from multiple entities and creating different views is a very common requirement in data management. With monolithic databases (RDBMS in particular), it is trivially easy to build compositions of multiple tables using joins in SQL statements. So, you can seamlessly compose different data views out of the existing entities and use them in your services.

However, in the microservices context, when you introduce the database-per-microservice approach, building data compositions becomes very complex. You can no longer use built-in constructs such as joins to compose data that is dispersed among multiple databases owned by different services.

Let’s take a closer look at some of the commonly used techniques to do data composition with microservices.

Composite Services or Client-Side Mashups

When you have to join data from multiple microservices, you are only allowed to access the service APIs. So, to create a composition of data from multiple microservices, you can create a composite service on top of the existing microservices. The composite service is responsible for invoking the downstream services and performing the runtime composition of the data retrieved via those service calls.

For example, let’s consider the example shown in Figure 5-8. Suppose that we need to create a composition of the orders placed and have the details of the customers who have placed those orders. Here we have two services—Order Processing and Customer—which have their own databases to hold the orders and customer information.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig8_HTML.jpg
Figure 5-8

Data composition using a composite service that calls the downstream services and aggregates the data

The requirement that we have at hand is to create a join of the orders and customers. With the composite service approach, we can create a new service—the Customer-Order composite service—and call the Order Processing and Customer microservices from it. You need to implement the runtime data composition logic as well as the communication logic (for example, to invoke RESTful Order Processing and Customer microservices) inside the composite service.

One other alternative is to implement the same runtime data composition at the client side. Basically, rather than having a composite service, the consumers/client applications call the required downstream services and build the composition themselves. This is often known as a client-side mashup.

Composite services or client-side mashups are suitable when the data to be joined is relatively small. Since this is a runtime composition, if you load a lot of data into memory, the runtime of the composite service will require a lot of memory. Therefore, we need to select this approach based on the data composition scenario that we have to implement.
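
A minimal sketch of such a composite service, assuming hypothetical RESTful endpoints for the Order Processing and Customer services, could invoke both downstream services in parallel and merge the results:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

public class CustomerOrderCompositeService {
    private final HttpClient client = HttpClient.newHttpClient();

    // Runtime composition: fetch the order and its customer in parallel and merge them.
    public String getCustomerOrder(String orderId, String customerId) {
        CompletableFuture<String> order =
                fetch("http://order-processing-service/orders/" + orderId);
        CompletableFuture<String> customer =
                fetch("http://customer-service/customers/" + customerId);
        // A naive merge of the two JSON payloads; a real service would map them to a view model.
        return "{\"order\":" + order.join() + ",\"customer\":" + customer.join() + "}";
    }

    private CompletableFuture<String> fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder().uri(URI.create(url)).GET().build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }
}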

Tip

Data composition with composite services or client-side mashup is suitable for joins of 1:m type, where a row from one table can have multiple matching rows in another table.

Joins with Materialized View Using Asynchronous Events

There are certain data composition scenarios where you need to materialize the view with pre-joined data coming from multiple microservices. For example, consider the scenario illustrated in Figure 5-9. Here we have the Order Processing and Customer services, and we need to materialize the customer-order join/view. The materialized view will be used for a specific business function, which requires the join between orders and customers.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig9_HTML.jpg
Figure 5-9

Joins with materialized view using asynchronous events

The Order Processing and Customer services publish order and customer update events to the event bus/broker. There is a service that has subscribed to those events and materializes the join of orders and customers. That service (Customer-Order-View-Sync) maintains a de-normalized join between the orders and customers, which is built ahead of time rather than in real time. As shown in Figure 5-9, the Customer-Order-View-Sync service also has a component that operates on the Customer-Order View Cache, which serves all the external queries.

Tip

Data composition with materializing view is suitable for compositions with large numbers of rows (m:n joins with high cardinality) on each side.

The denormalized data can be kept in a cache or other such storage, and it can be consumed by other microservices as a read-only datastore.
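
The following sketch shows the idea behind such a view-sync service, using an in-memory map as a stand-in for the Customer-Order View Cache (the event fields and class names are illustrative):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Maintains the de-normalized customer-order view that serves external queries.
public class CustomerOrderViewSync {
    // Keyed by order ID; each value is a pre-joined customer-order record.
    private final Map<String, CustomerOrderView> view = new ConcurrentHashMap<>();

    public void onOrderEvent(String orderId, String customerId, String orderStatus) {
        view.compute(orderId, (id, existing) -> {
            CustomerOrderView row = (existing != null) ? existing : new CustomerOrderView();
            row.customerId = customerId;
            row.orderStatus = orderStatus;
            return row;
        });
    }

    public void onCustomerEvent(String customerId, String customerName) {
        // Update every row of the view that refers to this customer.
        view.values().stream()
            .filter(row -> customerId.equals(row.customerId))
            .forEach(row -> row.customerName = customerName);
    }

    public CustomerOrderView query(String orderId) {
        return view.get(orderId); // read-only access for external queries
    }

    public static class CustomerOrderView {
        public String customerId;
        public String customerName;
        public String orderStatus;
    }
}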

Transactions with Microservices

Transactions are an important concept in software applications. They allow you to group a set of operations that should be executed together in an all-or-none manner (i.e., they all execute or are all rolled back in the event of a failure). Transactions are quite commonly used in the context of a database, but are not limited to it. In this chapter, we mainly focus on database transactions.

ACID (Atomicity, Consistency, Isolation, and Durability) is a set of properties of database transactions intended to guarantee validity even in the event of a failure. Therefore, a sequence of database operations that satisfies the ACID properties can be considered a transaction.

Monolithic applications are often built on top of a single centralized relational database and transactions are used to keep data in a consistent state across multiple tables. Using the ACID properties in the application provides the capability of beginning a transaction to perform changes like inserts, updates, and deletes and allows committing or rolling back the transaction. With monolithic applications and centralized databases, it is quite straightforward to begin a transaction, change data in multiple rows (which can span multiple tables), and finally commit the transaction.

With microservices, the business requirements related to transactions don’t change drastically. However, unlike with a centralized database, each microservice has its own database, and transactional boundaries may span multiple services and databases. Therefore, implementing such transactional scenarios is no longer as straightforward as it is with monolithic applications and centralized databases.

Avoiding Distributed Transactions with Two-Phase Commit

Distributed transactions are built around the concept of using a centralized process called a transaction manager to orchestrate the steps of a transaction. The main algorithm that is used in implementing distributed transactions is known as two-phase commit (2PC). Let’s look at the details of the two-phase commit protocol. The changes required by a transaction are sent to each participant and initially stored temporarily at each participant of the transaction. Then the transaction manager initiates the voting/commit-request phase.

Voting/Commit-Request Phase
  • The transaction manager/coordinator sends a prepare request to all the services that participate in a given transaction.

  • The transaction manager will wait until all services reply yes or no.

Commit Phase
  • Based on the responses received in the first phase, if all the services have responded with a yes, the transaction manager commits the transaction. Once the commit message is received, all the participants persist the temporarily stored changes.

  • If any of the services respond with a no (or don’t respond at all), the transaction manager invokes the rollback operations on all the participating services.
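
The following is a bare-bones sketch of a 2PC coordinator (the participant interface is hypothetical); note that it omits the timeouts, durable logging, and recovery logic that make real 2PC implementations complex:

import java.util.List;

public class TwoPhaseCommitCoordinator {
    // Hypothetical participant interface; real implementations use XA or a similar protocol.
    interface Participant {
        boolean prepare();  // vote yes (true) or no (false)
        void commit();
        void rollback();
    }

    public void execute(List<Participant> participants) {
        // Phase 1: voting/commit-request - collect every participant's vote.
        boolean allVotedYes = true;
        for (Participant participant : participants) {
            allVotedYes &= participant.prepare();
        }

        // Phase 2: commit if all voted yes, otherwise roll back everyone.
        if (allVotedYes) {
            participants.forEach(Participant::commit);
        } else {
            participants.forEach(Participant::rollback);
        }
    }
}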

The distributed transactions method addresses most of the transaction requirements that we discussed earlier, but it comes with inherent limitations that hinder the usage of distributed transactions with 2PC for most microservices transactional behaviors. Some of the limitations of the two-phase commit method are:
  • The transaction manager is a single point of failure. If it fails, all pending transactions will never complete.

  • If a given participant fails to respond, then the entire transaction will be blocked.

  • A commit can fail after voting. The 2PC protocol assumes that if a given participant has responded with a yes, then it can definitely commit the transaction too. This is not the case in most practical scenarios.

  • Given the distributed and autonomous nature of microservices, using a distributed transactions/two-phase commit for implementing transactional business use cases is a complex, error-prone task that can hinder the scalability of the entire system.

Tip

Avoid using distributed transactions with two-phase commit for microservices transactions.

It is better if you can avoid using distributed transactions with two-phase commit when implementing transactions across multiple microservices. However, the requirement of building transactional business scenarios, which span multiple microservices, is still valid. There are several other alternatives you can use.

Publishing Events Using Local Transactions

Asynchronous event-based data management is quite common in microservices data management. There are certain transactional behaviors that you can implement in an event-driven architecture that allow you to achieve atomicity. For example, suppose that the Order Processing service illustrated in Figure 5-10 is responsible for updating the ORDER table and publishing an event to the event bus in a transactional manner (i.e., the order update and the event publication happen together).
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig10_HTML.jpg
Figure 5-10

Using local transactions to update a database table and create an event

Here, we have used an event table to store the order update events. So, the Order Processing service can start a local transaction, which includes order update operations, and add an event to the ORDER_EVENT table. Both of these operations will be executed in the same local transaction boundary.
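
A minimal sketch of this first part of the technique (with hypothetical ORDERS and ORDER_EVENT tables) updates the order and inserts the event row in the same local JDBC transaction:

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

public class OrderEventPublisher {
    private final DataSource dataSource;

    public OrderEventPublisher(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Update the order and record the corresponding event in the same local transaction.
    public void updateOrderStatus(long orderId, String status) throws Exception {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement updateOrder = con.prepareStatement(
                         "UPDATE ORDERS SET STATUS = ? WHERE ORDER_ID = ?");
                 PreparedStatement insertEvent = con.prepareStatement(
                         "INSERT INTO ORDER_EVENT (ORDER_ID, EVENT_TYPE, PUBLISHED) " +
                         "VALUES (?, ?, FALSE)")) {
                updateOrder.setString(1, status);
                updateOrder.setLong(2, orderId);
                updateOrder.executeUpdate();

                insertEvent.setLong(1, orderId);
                insertEvent.setString(2, "ORDER_" + status);
                insertEvent.executeUpdate();

                con.commit(); // the order change and the event row are stored atomically
            } catch (Exception e) {
                con.rollback();
                throw e;
            }
        }
    }
}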

There is a dedicated service/process that is responsible for consuming the ORDER_EVENT table and publishing the event to the event bus. It can also use local transactions to read the events, publish them to an event bus/message broker, and update the order event table. The event consumer service is responsible for reading events from the event bus and handling them in the transaction boundary of that service.

This approach avoids the use of distributed transactions with 2PC, but it has some limitations, such as the dependency on a database that supports transactions (many NoSQL databases do not support multi-operation transactions).

Database Log Mining

When a service performs various database operations on the data it owns, all the transaction details are recorded in the database transaction or commit log. Therefore, we can consider the database the single source of truth and extract data changes from its transaction or commit log. For example, as shown in Figure 5-11, the Order Processing service performs various database operations and they are recorded in the database transaction log. The DBTransactionLogProc application can mine the transaction log of the Order database and create events that correspond to each transaction. These events are then published into an event bus or a message broker.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig11_HTML.jpg
Figure 5-11

Achieving atomicity by using a database transaction log and publishing events

Other applications can consume these events and we can maintain eventual consistency between all services.

Data-management solutions such as Change Data Capture (CDC) leverage database transaction log processing techniques. For example, solutions such as Debezium or LinkedIn Databus use this technique to build CDC pipelines.

The transaction log mining technique is quite effective, as we use the database as the single source of truth. Every successful database operation is recorded in the database transaction log. However, when it comes to implementation and processing, the transaction log differs drastically from one database to another, because there is no standard format for transaction logs and each database has its own proprietary way of recording transactions. Therefore, most data-management solutions based on database transaction log mining must have a dedicated implementation for each database type.

Event Sourcing

By using the techniques that we discussed earlier (such as publishing events using local transactions), we can persist each state-changing event of an entity as part of a sequence of events. All such events are stored in an event bus, and subscribers can derive the state of that entity by processing the sequence of events that have taken place on that entity. For example, as shown in Figure 5-12, the Order Processing service publishes the changes taking place on the Order entity as events (rather than updating the database table with the order status). The state-changing events—such as order created, updated, paid, shipped, etc.—are published into the event bus.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig12_HTML.jpg
Figure 5-12

Event sourcing

The subscriber applications and services can recreate the state of an order by simply replaying the events that have taken place on that order. For example, a subscriber can retrieve all the events related to order11234 and derive the current state of that order.
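
As a simple illustration, the current state of an order can be derived by folding over its event sequence (the event model below is hypothetical):

import java.util.List;

public class OrderStateReplayer {
    public enum OrderStatus { CREATED, UPDATED, PAID, SHIPPED, CANCELLED }

    public static class OrderEvent {
        final String orderId;
        final OrderStatus status;
        OrderEvent(String orderId, OrderStatus status) {
            this.orderId = orderId;
            this.status = status;
        }
    }

    // Derive the current state of an order by replaying its events in order.
    public OrderStatus currentStatus(String orderId, List<OrderEvent> eventLog) {
        OrderStatus current = null;
        for (OrderEvent event : eventLog) {
            if (event.orderId.equals(orderId)) {
                current = event.status; // the last event seen wins
            }
        }
        return current;
    }

    public static void main(String[] args) {
        OrderStateReplayer replayer = new OrderStateReplayer();
        List<OrderEvent> log = List.of(
                new OrderEvent("order11234", OrderStatus.CREATED),
                new OrderEvent("order11234", OrderStatus.PAID),
                new OrderEvent("order11234", OrderStatus.SHIPPED));
        System.out.println(replayer.currentStatus("order11234", log)); // SHIPPED
    }
}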

Saga

So far, we have discussed several asynchronous event-driven architecture-based solutions that we can leverage to avoid distributed transactions with two-phase commit. For fully synchronous messaging scenarios, we can build transaction behavior across multiple services using Sagas.

Before we jump into the theoretical aspects of Saga, let’s look at a real-world example of using it. (There is also a good conference talk1 by Caitie McCaffrey on how the Saga pattern is applied to real-world scenarios.) Consider a travel agent service (see Figure 5-13), which allows you to plan a vacation. The travel agent service takes the duration, location, and other details through the travel agent app and books flight, hotel, and car rental services. The bookings of the flight, hotel, and car rental have to be done in a transactional manner (book all three of them together or, if one of the bookings fails, cancel the rest). As we discussed earlier, if the travel agent service is built on top of a centralized database, implementing this scenario with transactions is quite trivial.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig13_HTML.jpg
Figure 5-13

Transactions on a centralized database

With microservices, building this scenario requires us to have transactional safety across multiple service invocations. As you have seen in the previous sections, using distributed transactions with two-phase commit has inherent limitations and won’t be a suitable approach to solve this problem.

Saga aims to solve the distributed transactions problem by breaking a given transaction into a sequence of sub-transactions and corresponding compensating transactions. All transactions in a Saga either complete successfully or, in the event of a failure, the compensating transactions are run to roll back everything that was done as part of the Saga.

Note

A Saga is a long-lived transaction that can be written as a sequence of transactions that can be interleaved. All transactions in a sequence complete successfully or the compensating transactions are executed to amend a partial execution. The Saga2 pattern was introduced in a paper published in 1987 by Hector Garcia-Molina and Kenneth Salem.

Now, let’s try to use the Saga pattern in our travel booking scenario. We can model (see Figure 5-14) this use case as a collection of sub-transactions—booking the airline, booking the hotel, and booking the car rental. Each of these sub-transactions operates on a single transaction boundary and each sub-transaction has an associated compensating transaction that can semantically undo it. For example, for each service, we can list the transaction and the compensating transaction as follows.
  • T1: Book flight, C1: Cancel flight

  • T2: Book hotel, C2: Cancel hotel

  • T3: Book car rental, C3: Cancel car rental

For each service, there’s a dedicated transaction boundary and it will operate on top of a dedicated database (having a database for each service is not mandatory though).

A Saga can be represented as a directed acyclic graph that consists of all the sub-transactions and compensating transactions. The travel-booking Saga contains the set of sub-transactions and compensating transactions. The travel agent service contains a component called Saga Execution Coordinator (SEC), which executes the book flight, book hotel, and book car rental transactions. If any of these operations fails at a given step, then SEC rolls back the entire transaction by executing the corresponding compensating transactions.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig14_HTML.jpg
Figure 5-14

Using Sagas
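
The following sketch captures the core idea of this execution: run the sub-transactions in order and, on failure, run the compensating transactions in reverse order. A real SEC would also persist each step in the Saga log before and after execution, which is omitted here.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class TravelBookingSaga {
    // A sub-transaction (T-i) paired with its compensating transaction (C-i).
    interface SagaStep {
        void execute();     // e.g., book flight
        void compensate();  // e.g., cancel flight
    }

    // Executes the steps in order; on failure, runs the compensations in reverse order.
    public void run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        try {
            for (SagaStep step : steps) {
                step.execute();
                completed.push(step);
            }
        } catch (RuntimeException failure) {
            while (!completed.isEmpty()) {
                completed.pop().compensate(); // semantically undo T-i with C-i
            }
            throw failure;
        }
    }
}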

Sagas at the conceptual level are quite trivial, and most centralized workflow solutions, such as Business Process Model and Notation (BPMN) solutions, are in fact based on the same concepts. However, building the Saga pattern for a microservices-based system in a decentralized fashion is quite challenging. Therefore, let’s take a closer look at how a Saga can be implemented in the microservices context.

The implementation of Saga requires a Saga log, which is a distributed log that the Saga Execution Coordinator interacts with.

Saga Log

The Saga log is a distributed log that’s used to persist every transaction/operation during the execution of a given Saga. At a high level, the Saga log contains various state-changing operations, such as Begin Saga, End Saga, Abort Saga, Begin T-i, End T-i, Begin C-i, and End C-i.

The Saga log is often implemented using a distributed log and systems such as Kafka are commonly used for the implementation.

Saga Execution Coordinator (SEC)

The SEC is the main component that orchestrates the entire logic and is responsible for the execution of the Saga. All the steps in a given Saga are recorded in the Saga log and the SEC writes and interprets the records of the Saga log. It also executes the sub-transactions/operations (e.g., invoke the hotel service and make a reservation) and the corresponding compensating transactions when necessary. While the steps related to the Saga are recorded in the Saga log, the orchestration logic (which can be represented as a directed acyclic graph) is part of the SEC process (orchestration can be built using your own custom logic or can be built on top of a standard, such as BPMN).

It is very important to understand that, unlike the coordinator in 2PC, the SEC is not a special process that has central control of the entire execution. It certainly operates as a centralized runtime, but the runtime is dumb and the execution logic is kept in the distributed Saga log. However, we need to make sure that the SEC is up and running at all times. In the event of an SEC failure, a new SEC process should be started based on the same distributed Saga log.

Now that you have a good understanding of the SEC and the Saga log, let’s dive into the execution of a distributed Saga.

Executing a Distributed Saga

A distributed Saga is a Directed Acyclic Graph (DAG) and the SEC’s primary task is to execute that DAG. Suppose that, in our travel-booking scenario (see Figure 5-15), we have the SEC built into the travel agent service.

The SEC can start processing the Saga, which is recorded in the distributed log.
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig15_HTML.jpg
Figure 5-15

Execution steps of a successful Saga

Once the travel agent service gets a travel-booking request, SEC initiates a Saga by writing to the Saga log with a Start Saga instruction along with any other metadata required to process the Saga. Once the record is durably committed into the log, SEC can move into the next instruction.

Then, based on the DAG of the Saga, SEC can pick one of the airline, hotel, or car rental transactions (given that all three can work in parallel). Suppose that airline transactions are executed first. In that case, the SEC logs a Start Airline message to the Saga log. Then, the SEC executes the book flight operation.

Once the SEC receives the response from the airline service, it commits the End Airline message along with the response from the airline service, which we may need during the latter part of the Saga.

Similarly, the same set of steps continues until we have successfully executed all three operations on the airline, hotel, and car rental services.

Finally, since we have completed everything successfully, SEC commits the End Saga message into the Saga log. This is a successful Saga execution.

Now let’s look at a Saga in a failure scenario. In Figure 5-16, we have the same set of steps that we discussed earlier, but in this case, the car rental process fails (imagine there are no cars available on the specified dates, for example).
../images/461146_1_En_5_Chapter/461146_1_En_5_Fig16_HTML.jpg
Figure 5-16

Execution steps of an unsuccessful Saga

Since we have detected a failure of a particular sub-transaction (i.e., the car reservation), we need to roll back all the other sub-transactions we made so far. So, in the Saga log you can find the Start Car Rental record, and now the car reservation has failed. We have to walk through the inverted DAG of the Saga that we executed so far and perform the rollback.

When the car rental service returns an error for the car reservation, the SEC commits the Abort Car Rental message to the Saga log.

Since this is a failure, the SEC has to initiate the rollback operations for the current Saga. The SEC can roll back the sub-transactions that are completed so far by inverting the DAG of this Saga and processing the Saga log backward.

So, the SEC looks for any records with car, hotel, and airline in the Saga log and finds the records of the hotel and airline sub-transactions. The SEC then executes the compensating transactions for the hotel and airline. It can use the information stored in the Saga log, such as the reservation number, to execute the compensating transactions.

After a successful execution of the compensating transaction, the SEC commits the [Comp] Airline and [Comp] Hotel messages into the Saga log.

Finally, the SEC can commit the Saga completion into the Saga log.

SEC failures are relatively easy to handle, as we can recreate the DAG and the state by processing the Saga log. Also, in order for the Saga to function as expected, each sub-transaction should be sent at most once and each compensating transaction at least once (compensating transactions therefore need to be idempotent).

It’s also important to keep in mind that, at any given point in time, the system may not be in a fully consistent state, but with time it will converge to a consistent state (eventual consistency). We execute the sub-transactions and complete them as we walk through the Saga DAG. But when we come across an issue, we roll back the transaction. In the example of an unsuccessful Saga, we booked the airline and hotel, and then later we cancelled both bookings.

Note

The core concepts of the Saga pattern are intended to achieve eventual consistency.

The Saga pattern is not just for database transactions. The Saga concept is widely used in practice with solutions such as workflow, payment processing, financial systems, etc. Also, the Saga pattern is good for any use case that requires approval and human interaction.

The Saga pattern for microservices is supported in most workflow solutions that can operate in a microservices environment. In Chapter 7, “Integrating Microservices,” we explore how such workflow and business processes are used in the context of a microservices architecture.

Polyglot Persistence

With decentralized data management, you can take advantage of using the most appropriate persistence technique for each use case. Based on the use case, one microservice can use a SQL database, while another service can leverage a NoSQL database.

For example, a microservice of a social media application may use a relational/SQL database to store its user information, while its multimedia storage is based on a NoSQL database.

Caching

As part of the data-management techniques for microservices, caching plays a critical role, as it improves the availability, scalability, and performance of a given microservice. At each microservice level, we can cache the business entities that a given service operates on. Usually such business entities (or objects) do not change frequently (e.g., a product information service caches product names and details, which are frequently used for product searches). Such data can usually be cached on demand (when we first access the product information from the underlying datastore). Also, you can cache any service-level metadata (configurations or static data) during service startup.

One of the most important aspects of caching is not to use a central caching layer that is shared between microservices. However, the instances of a given microservice will all have the same data requirements, so it makes sense to share a caching layer across these instances.

There are quite a few caching solutions out there, but Redis3, Ehcache4, Hazelcast5, and Coherence6 are some of the more popular caching implementations. Redis in particular has been widely used in open source and container-native microservice caching scenarios. Redis is an open source, in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, and geospatial indexes with radius queries. While Redis is mainly used for caching in the microservices context, it can also be used as a database or message broker (publisher-subscriber messaging).
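
As a minimal sketch of on-demand (cache-aside) caching with Redis via the Jedis client (the key format, endpoint, and loader function are assumptions), a product lookup could first consult the cache and fall back to the service’s own datastore:

import java.util.function.Function;
import redis.clients.jedis.Jedis;

public class ProductCache {
    private final Jedis jedis = new Jedis("localhost", 6379); // assumed Redis endpoint
    private final Function<String, String> productLoader;     // loads from the service's own database

    public ProductCache(Function<String, String> productLoader) {
        this.productLoader = productLoader;
    }

    // On-demand (cache-aside) lookup of product details.
    public String getProduct(String productId) {
        String key = "product:" + productId;
        String cached = jedis.get(key);
        if (cached != null) {
            return cached;                                // cache hit
        }
        String product = productLoader.apply(productId);  // cache miss: read the datastore
        jedis.set(key, product);                          // populate the cache (a TTL could also be set)
        return product;
    }
}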

Summary

In this chapter, we discussed the decentralized data-management techniques that we can leverage in a microservices architecture. The database-per-microservice pattern gives us a set of advantages, and several challenges too. Conventional data sharing across multiple tables using SQL, dependencies between tables such as foreign key constraints, and so on are no longer applicable when each microservice can only operate on its own private database.

We discussed several techniques that we can use to share data between microservices, such as runtime lookups by accessing the service interface, asynchronous event-based data sharing using local caches, and maintaining a materialized view using event-driven communication.

Transactions are one of the most challenging aspects of distributed data management with microservices. You are no longer able to define transaction boundaries that span multiple business services (multiple tables), as each service has a dedicated database and services are not allowed to access external databases directly. Using distributed transactions with two-phase commit is not an option due to its inherent limitations related to scalability. Sagas provide an alternative to distributed transactions with two-phase commit. With the use of sub-transactions, each associated with a corresponding compensating transaction, we can build transactionally safe business scenarios that span multiple microservices.
