© Kasun Indrasiri and Prabath Siriwardena 2018
Kasun Indrasiri and Prabath Siriwardena, Microservices for the Enterprise, https://doi.org/10.1007/978-1-4842-3858-5_2

2. Designing Microservices

Kasun Indrasiri and Prabath Siriwardena, San Jose, CA, USA

Steve Jobs believed that design is not just what something looks like and feels like, but how it works. How a microservice works within itself and interacts with other microservices depends highly on its design. Most of the architectural concepts and design principles discussed in terms of microservices are not unique to microservices; they have been around for some time, even during the early days when SOA (Service Oriented Architecture) was popular. Some even call microservices SOA done right! The fundamental issue with SOA was that people didn’t get the design right. They got caught up in the hype and left behind the key design principles. Over time, SOA became just another buzzword, while the original need for it was left unaddressed. Microservices as a concept has emerged to fill this vacuum. Unless you pay close attention to microservices architectural concepts and design principles, you are not doing microservices!

Sir Charles Antony Richard Hoare is the British computer scientist who developed the Quicksort algorithm. In his 1980 Turing Award lecture, he said that there are two ways to design software: one way is to make the software so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies; the first method is far more difficult. In a microservices design, you need to worry about both the inner and the outer architecture. The inner architecture defines how you design the microservice itself, while the outer architecture defines how it communicates with other microservices. Unless you keep both designs simple and easy to evolve, the system becomes error prone and drifts away from the key microservices design goals. Time to production, scalability, complexity localization, and resiliency are at the core of any microservices design, and unless you keep the design simple, it is hard to meet these expectations.

Domain-Driven Design

Domain-driven design (DDD) is not a new concept introduced with microservices; it has been around for quite some time. Eric Evans coined the term in his book, Domain-Driven Design: Tackling Complexity in the Heart of Software. As microservices became a mainstream architectural pattern, people started to realize the applicability of domain-driven design concepts to designing microservices. Domain-driven design plays a key role in scoping out microservices.

Note

An in-depth explanation of domain-driven design is out of the scope of this book. This chapter focuses on the application of domain-driven design for building microservices. Readers who are keen on learning more about domain-driven design are encouraged to go through the book by Eric Evans. In addition to Eric’s book, we also recommend reading the book Patterns, Principles, and Practices of Domain-Driven Design, by Scott Millett and Nick Tune.

What is domain-driven design? It is mainly about modeling complex business logic, or building an abstraction over it. The domain is at the heart of domain-driven design. All software we develop relates to some user activity or interest. Eric Evans says that the subject area to which the user applies the program is the domain of the software. Some domains involve the physical world. In the retail business, you find buyers, sellers, suppliers, partners, and many other entities. Some domains are intangible. For example, in the cryptocurrency domain, a Bitcoin wallet application deals with intangible assets. Whatever it is, the domain is related to the business, not the software. Of course, software can be a domain itself when you build software for the software domain, for example a configuration management program.

Throughout this book we’ll be using many examples to elaborate on these concepts. Let’s say we have an enterprise retailer that is building an e-commerce application. The retailer has four main departments: inventory and order management, customer management, delivery, and billing and finance. Each department may have multiple sections. The order processing section of the inventory and order management department accepts an order, locks the items in the inventory, and then passes the control to billing and finance to work on the payment. Once the payment is successfully processed, the delivery department makes the order ready for delivery. The customer management department takes the ownership of managing all customer personal data and all the interactions with the customer. See Figure 2-1.
Figure 2-1. Divide and conquer

One key principle behind domain-driven design is divide and conquer . The retail domain is the core business domain in our example. Each department can be treated as a sub-domain. Identifying the core business domain and the related sub-domains is critically important. This helps us build an e-commerce application for our retailer following microservices architectural principles. One of the key challenges many architects face in building a microservices architecture is to come up with the right level of granularity for each service. Domain-driven design helps here. As the name implies, under domain-driven design, domain is the king!

Let’s take a step back and look at Conway’s law. It says that any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure. This justifies identifying sub-domains in an enterprise in terms of departments. A given department is formed for a purpose and has its own internal communication structure, as well as a communication structure to other departments. Even a single department can have multiple sections, and we can identify each such section as a sub-domain. See Figure 2-1 for the details.

Let’s see how we can map this domain structure into the microservices architecture. Possibly we can start building our e-commerce application with four microservices (see Figure 2-2): Order Processing, Customer, Delivery, and Billing.
Figure 2-2. Communication between microservices

Suppose that these microservices communicate with each other via messages, which are sent as events. A request first hits the Order Processing microservice, which, once it locks the items in the inventory, fires the ORDER_PROCESSING_COMPLETED event. Events are a way of communicating between microservices: multiple other services can listen to the ORDER_PROCESSING_COMPLETED event and, once notified, act on it accordingly. As per Figure 2-2, the Billing microservice receives the ORDER_PROCESSING_COMPLETED event and starts processing the payment. Amazon, for example, does not process payments at the time an order is placed, but only when it is ready to ship. Just like Amazon, the Order Processing microservice fires the ORDER_PROCESSING_COMPLETED event only when the order is ready to ship. The event itself contains the data required by the Billing microservice to process the payment; in this particular example, it carries the customer ID and the payment method. The Billing microservice stores customer payment options, including credit card information, in its own repository, so it can now independently process the payment.

Note

Using events to communicate between microservices is one of the most commonly used patterns in inter-microservice communication. It removes point-to-point connections between microservices; the communication happens via a messaging system instead. Each microservice, once it is done with its own processing, publishes an event to a topic, while the other microservices, which register themselves as listeners on the topics they are interested in, act accordingly once they receive a notification. Messaging technologies used in microservices, microservice integration patterns, and event-driven messaging patterns are covered in Chapter 3, “Inter-Service Communication,” Chapter 7, “Integrating Microservices,” and Chapter 10, “APIs, Events, and Streams,” respectively.

Once the Billing microservice completes processing the payment, it will fire the PAYMENT_PROCESSING_COMPLETED event and the Delivery microservice will capture it. This event carries the customer ID, order ID, and invoice. Now the Delivery microservice loads the customer delivery address from its own repository and prepares the order for delivery. Even though the Customer microservice is shown in Figure 2-2, it is not being used during the order processing flow. When new customers are on-boarded into the system or existing customers want to update their personal data, the Customer microservice will be used.
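The event-driven flow described above can be sketched in a few lines of code. The following is a minimal, in-memory illustration; in a real deployment the topics would live in a message broker such as Kafka or RabbitMQ, and all the class, function, and field names here are illustrative, not taken from the book's reference implementation.

```python
# Toy sketch of event-driven choreography between the microservices in
# Figure 2-2. The EventBus stands in for a real message broker.
from collections import defaultdict

class EventBus:
    """Maps a topic name to the list of listener callbacks registered on it."""
    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, topic, listener):
        self._listeners[topic].append(listener)

    def publish(self, topic, event):
        for listener in self._listeners[topic]:
            listener(event)

bus = EventBus()
audit_log = []

# Billing listens for ORDER_PROCESSING_COMPLETED, processes the payment using
# the customer ID and payment method carried in the event, and then fires
# PAYMENT_PROCESSING_COMPLETED carrying the invoice.
def billing_service(event):
    audit_log.append(f"billing: charged customer {event['customer_id']}")
    bus.publish("PAYMENT_PROCESSING_COMPLETED", {
        "customer_id": event["customer_id"],
        "order_id": event["order_id"],
        "invoice": {"amount": 49.90},
    })

# Delivery listens for PAYMENT_PROCESSING_COMPLETED and prepares the order.
def delivery_service(event):
    audit_log.append(f"delivery: dispatching order {event['order_id']}")

bus.subscribe("ORDER_PROCESSING_COMPLETED", billing_service)
bus.subscribe("PAYMENT_PROCESSING_COMPLETED", delivery_service)

# Order Processing locks the inventory items, then fires the event. It does
# not know, or care, which services react to it.
bus.publish("ORDER_PROCESSING_COMPLETED", {
    "customer_id": "C100",
    "order_id": "O1",
    "payment_method": "VISA **** 1111",
})

print(audit_log)
```

Note that Order Processing never calls Billing or Delivery directly; removing either listener would not require any change to the publisher, which is exactly the loose coupling the event-driven style buys us.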

A project faces serious problems when its language is fractured.—Eric Evans

Each microservice in Figure 2-2 belongs to a business domain. Inventory and order management is the domain of the Order Processing microservice; customer management is the domain of the Customer microservice; delivery is the domain of the Delivery microservice; and billing and finance is the domain of the Billing microservice. Each of these domains or departments can have its own internal communication structure, along with its own terminology to represent business activities.

Each domain can be modeled independently. The more independent a domain is from the others, the more flexibility it gains to evolve on its own. Domain-driven design defines best practices and guidelines on how to model a given domain. It highlights the need for a ubiquitous language to define the domain model. The ubiquitous language is a shared team language, used by both domain experts and developers. In fact, ubiquitous means that the same language must be used everywhere within a given context (or, to be precise, within a bounded context, which we discuss in the section to follow), from conversations to code. This bridges the communication gap between domain experts and developers. Domain experts are fluent in their own jargon but have limited or no understanding of the technical terms used in software development, while developers know how to describe a system in technical terms but have limited or no domain expertise. The ubiquitous language fills this gap and brings everyone onto the same page.

The terminology defined by the ubiquitous language must be bounded by the corresponding context. The context is related to a domain. For example, the ubiquitous language can be used to define an entity called customer. The definition of the customer entity in the inventory and order management domain does not necessarily need to be the same as in the customer management domain. For example, the customer entity in the inventory and order management domain may have properties such as order history, open orders, and scheduled orders, while the customer entity in the customer management domain has properties such as first name, last name, home address, email address, mobile number, etc. The customer entity in the billing & finance domain may have properties like credit card number, billing address, billing history, and scheduled payments. Any term defined by ubiquitous language must only be interpreted under the corresponding context.
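The point that one ubiquitous-language term maps to different models in different bounded contexts can be made concrete. The sketch below uses hypothetical field names drawn from the properties listed above; it is illustrative only, not a prescribed data model.

```python
# The same term, "customer", modeled differently in three bounded contexts.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CustomerInOrderManagement:
    """'Customer' inside the inventory and order management context."""
    customer_id: str
    order_history: List[str] = field(default_factory=list)
    open_orders: List[str] = field(default_factory=list)
    scheduled_orders: List[str] = field(default_factory=list)

@dataclass
class CustomerInCustomerManagement:
    """'Customer' inside the customer management context."""
    customer_id: str
    first_name: str
    last_name: str
    email_address: str
    home_address: str

@dataclass
class CustomerInBilling:
    """'Customer' inside the billing and finance context."""
    customer_id: str
    credit_card_number: str
    billing_address: str

# Only the identifier is common; each context interprets "customer"
# strictly within its own boundary.
```

Trying to force these three views into one shared Customer class would couple the contexts together, which is precisely what bounded contexts exist to prevent.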

Note

A typical software project involves domain experts only during the requirements-gathering phase. A business analyst (BA) translates the business use cases into a technical requirements specification, completely owns the requirements, and provides no feedback cycle; the model is developed as heard by the business analyst. One key aspect of domain-driven design is to encourage ongoing, in-depth communication between domain experts and developers. This goes well beyond the initial requirements-gathering phase and ultimately produces a domain model well understood by both the domain experts and the developers.

Let’s delve deeper into this example. In our architecture, a given microservice belongs to a single business domain, and communication between microservices happens via message passing. The message passing can be based on an event-driven architecture or simply happen over HTTP. Each message from one microservice to another carries domain objects. For example, the ORDER_PROCESSING_COMPLETED event carries the order domain object, while the PAYMENT_PROCESSING_COMPLETED event carries the invoice domain object (see Figure 2-2). The definitions of these domain objects must be carefully derived via domain-driven design, with collaboration between domain experts and developers.

Note

Domain-driven design has its own inherent challenges. One challenge is to keep domain experts involved in the project throughout its execution. It also takes a considerable amount of time to build the ubiquitous language, which requires good collaboration between domain experts and developers. Unlike developing a monolithic application, which spans all the domains, building a solution for a given domain and encapsulating domain-specific business logic requires a change in developer mindset, which is also challenging.

Bounded Context

As we discussed, one of the most challenging parts of a microservices design is to scope out a microservice. This is where SOA and its implementations defined the scope poorly. An SOA design takes the entire enterprise into consideration. It does not worry about individual business domains, but rather about the enterprise as a whole. It does not treat inventory and order management, billing and finance, delivery, and customer management as separate, independent domains; it treats the complete system as one enterprise e-commerce application.

Figure 2-3 illustrates the layered architecture of an e-commerce application as done by an SOA architect. For anyone with an SOA background, this should look quite familiar. What we have here is a monolithic application. Even though the service layer exposes some functionality as services, none of them are decoupled from each other. The scoping of services was not done based on the business domains they belong to. For example, the Order Processing service may also deal with billing and delivery. In Chapter 1, “The Case for Microservices,” we discussed the deficiencies of such a monolithic architecture.
Figure 2-3. A layered architecture of an e-commerce application

As we discussed in the previous section, domain-driven design helps scope out microservices. The scoping of a microservice is done around a bounded context. The bounded context is at the heart of a microservices design. Eric Evans first introduced bounded context as a design pattern in his book Domain-Driven Design: Tackling Complexity in the Heart of Software. The idea is that any given domain consists of multiple bounded contexts, and each bounded context encapsulates related functionalities into domain models and defines integration points to other bounded contexts. In other words, each bounded context has an explicit interface, where it defines what models to share with other contexts. By explicitly defining what models should be shared, and not sharing the internal representation, we can avoid the potential pitfalls that can result in tight coupling. These modular boundaries are great candidates for microservices. In general, microservices should cleanly align to bounded contexts. If the service boundaries are aligned to the bounded contexts of the corresponding domain, and the microservices represent those bounded contexts, that’s a great indication that the microservices are loosely coupled and strongly cohesive.

Note

A bounded context is an explicit boundary within which a domain model exists. Inside the boundary all terms and phrases of the ubiquitous language have a specific meaning, and the model reflects the language with exactness.1

Let’s extend our previous example with bounded contexts. There we identified four domains: inventory and order management, billing and finance, delivery, and customer management. Each microservice we designed is attached to one of those domains. Even though we have a one-to-one relationship between a microservice and a domain, we know by now that one domain can have more than one bounded context, and hence more than one microservice. For example, if you take the inventory and order management domain, we have the Order Processing microservice, but we can also have multiple other microservices based on different bounded contexts (e.g., an Inventory microservice). To identify them, we need to take a closer look at the key functions provided under the inventory and order management domain and identify the corresponding bounded contexts.

Note

It is recommended that bounded contexts maintain their separation by each having its own team, codebase, and database schema.

The inventory and order management department of an enterprise takes care of managing stocks and makes sure customer demand can be met with the existing stocks. It should also know when to order more stocks from suppliers to optimize the sales as well as the storage facilities. Whenever it receives a new order, it has to update the inventory and lock the corresponding items for delivery. Once the payment is done and confirmed by the billing department, the delivery department has to locate the item in its warehouse and make it available for pick up and delivery. At the same time, whenever the available quantity of an item in the store reaches some threshold value, the inventory and order management department should contact the suppliers to get more, and once received, should update the inventory.

One of the key highlights of domain-driven design is the collaboration between domain experts and developers. Unless you properly understand how an inventory management department works within an enterprise, you will never identify the corresponding bounded contexts. With our limited understanding of inventory management, based on what was discussed before, we can identify the following three bounded contexts.
  • Order processing: This bounded context encapsulates the functionality related to processing an order, which includes locking the ordered items in the inventory, recording orders against the customer, and so on.

  • Inventory: Inventory itself can be treated as a bounded context. It takes care of updating the stocks upon receiving items from suppliers and releasing items for delivery.

  • Supplier management: This bounded context encapsulates the functionality related to managing suppliers. Upon releasing an item for delivery, supplier management checks whether there are enough stocks in the inventory, and if not, it notifies the corresponding suppliers.

Figure 2-4 illustrates multiple microservices under the inventory and order management domain, each representing one of the bounded contexts. Here the service boundaries are aligned to the bounded contexts of the corresponding domain, and communication between bounded contexts happens only via message passing against well-defined interfaces. As per Figure 2-4, the Order Processing microservice first updates the Inventory microservice to lock the items in the order and then triggers the ORDER_PROCESSING_COMPLETED event. The Billing microservice, listening to the ORDER_PROCESSING_COMPLETED event, executes payment processing and then triggers the PAYMENT_PROCESSING_COMPLETED event. The Supplier Management microservice, listening to the PAYMENT_PROCESSING_COMPLETED event, checks whether the number of items in stock is above the minimum threshold and, if not, notifies the suppliers. The Delivery microservice, listening to the same event, executes its operations to look up the items (probably sending instructions to warehouse bots) and then groups the items together to build the order and make it ready for delivery. Once that’s done, the Delivery microservice triggers the ORDER_DISPATCHED event, which notifies the Order Processing microservice to update the order status.
Figure 2-4. Communication between bounded contexts

A good design scopes one microservice to a single bounded context. Any microservice spanning multiple bounded contexts deviates from the original goals. When one microservice encapsulates the business logic behind a well-defined interface and represents a single bounded context, bringing in new changes has no or minimal impact on the complete system.

As we discussed, communication between microservices can happen via events. Under domain-driven design, these events are known as domain events. Domain events are triggered as a result of state changes in bounded contexts, and other bounded contexts can respond to them in a loosely coupled manner. The bounded contexts firing events need not worry about the behaviors that take place as a result of those events, while the bounded contexts handling the events need not worry about where the events came from. Domain events can be used between bounded contexts within a domain or between domains.

Context Map

A bounded context helps encapsulate business logic within a service boundary and helps define the service interface. When the number of bounded contexts in an enterprise grows, it can easily become a nightmare to figure out how they are connected. A context map helps visualize the relationships between bounded contexts. Conway’s law, discussed earlier, is another reason why we should build a context map: any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure. In other words, we will have different teams working on different bounded contexts. This may result in very good communication within a team, but not between teams. When communication between teams is lacking, design decisions made on a bounded context do not get propagated properly to the other parties. Having a context map helps each team track changes happening in the bounded contexts they rely on.

Vaughn Vernon, in his book Implementing Domain-Driven Design, presents multiple approaches to expressing a context map. The easiest way is to come up with a diagram showing the mapping between two or more existing bounded contexts, as in Figure 2-5. Keep in mind that each bounded context in Figure 2-5 has a corresponding microservice. We use a line between two bounded contexts, with an identifier at each end, either U or D, to show the relationship between the corresponding bounded contexts: U is for upstream and D is for downstream.
Figure 2-5. Context map

In the relationship between the Order Processing bounded context and the Billing bounded context, Order Processing is the upstream context, while Billing is the downstream context. The upstream context has more control over the downstream context; in other words, the upstream context defines the domain model passed between the two contexts, and the downstream context should be well aware of any changes happening to the upstream context. Figure 2-4 shows the exact messages passed between these two bounded contexts. There is no direct coupling: the communication between the Order Processing bounded context and the Billing bounded context happens via eventing. The upstream bounded context, Order Processing, defines the structure of the event, and any downstream bounded context interested in that event must be compliant with it.

The relationship between the Billing bounded context and the Supplier Management bounded context mirrors the one between Order Processing and Billing: Billing is the upstream context, while Supplier Management is the downstream context, and the communication between the two happens via eventing, as shown in Figure 2-4. The communication between the Order Processing bounded context and the Inventory bounded context, however, is synchronous. Inventory is the upstream context, while Order Processing is the downstream context. In other words, the contract for the communication between the two is defined by the Inventory bounded context. Not all the relationships shown in Figure 2-5 need an explanation, as most are self-explanatory.

Let’s step back and delve a little deeper into the Order Processing and Inventory bounded contexts. A bounded context has its own domain model, which is defined as the result of a long exercise carried out by domain experts and developers. You may recall that the same domain object may reside in different bounded contexts with different definitions. For example, the order entity in the Order Processing bounded context has properties such as order ID, customer ID, line items, delivery address, and payment option, while the order entity in the Inventory bounded context has only properties such as order ID and line items. Even though references to the customer, the delivery address, and the payment option are required by the Order Processing interface to maintain the history of all the orders against a customer, none of them is needed by the Inventory bounded context. Each bounded context should know how to manage such situations to avoid conflicts in its domain model. In the following section, we discuss some patterns to follow in order to maintain relationships between multiple bounded contexts.

Relational Patterns

Domain-driven design has led to multiple patterns that facilitate communication between bounded contexts. The same patterns are applicable when designing microservices that are well aligned with bounded contexts. These relational patterns for bounded contexts were first introduced by Eric Evans in his book Domain-Driven Design: Tackling Complexity in the Heart of Software.

Anti-Corruption Layer

Let’s revisit the scenario we discussed in the previous section, where the order entity has two different definitions under the Order Processing and Inventory bounded contexts. For the communication between these two bounded contexts, the contract is defined by the Inventory bounded context (see Figure 2-5). When Order Processing updates the Inventory, it has to translate its own order entity into the order entity understood by the Inventory bounded context. We use the anti-corruption layer (ACL) pattern to address this concern: it provides a translation layer between the two contexts (see Figure 2-6).
Figure 2-6. Anti-corruption layer pattern
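In its simplest form, an anti-corruption layer is just a translation boundary. The sketch below shows the order-entity translation described above; the class and field names are illustrative assumptions, not part of any published contract.

```python
# A minimal anti-corruption layer as a plain translation function: the rich
# order entity of the Order Processing context is translated into the leaner
# order entity defined by the Inventory context (the upstream contract).
from dataclasses import dataclass
from typing import Dict

@dataclass
class OrderProcessingOrder:
    """Order as modeled inside the Order Processing bounded context."""
    order_id: str
    customer_id: str
    line_items: Dict[str, int]   # item code -> quantity
    delivery_address: str
    payment_option: str

@dataclass
class InventoryOrder:
    """Order as defined by the Inventory bounded context's contract."""
    order_id: str
    line_items: Dict[str, int]

def to_inventory_order(order: OrderProcessingOrder) -> InventoryOrder:
    """Anti-corruption layer: expose only what the Inventory contract needs,
    so neither context's internal model leaks into the other."""
    return InventoryOrder(order_id=order.order_id,
                          line_items=dict(order.line_items))

rich = OrderProcessingOrder("O1", "C100", {"SKU-7": 2},
                            "221B Baker St", "VISA")
lean = to_inventory_order(rich)
print(lean)
```

If the Inventory contract changes, only `to_inventory_order` has to change; the Order Processing domain model stays untouched, which is the whole point of the pattern.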

Let’s see how this pattern is applicable in an enterprise-grade microservices deployment. Imagine a scenario where your microservice has to invoke a service exposed by a monolithic application whose design does not follow domain-driven design. The best way to facilitate the communication between our microservice and the monolithic application is via an anti-corruption layer. This helps keep the microservice design much cleaner (or less corrupted).

There are multiple ways to implement the anti-corruption layer. One approach is to build the translation into the microservice itself, implementing the anti-corruption layer in whatever language the microservice is written in. This approach has multiple drawbacks. The microservice development team has to own the implementation of the anti-corruption layer, and hence has to worry about any changes happening on the monolithic application’s side. If we instead implement the translation layer as another microservice, we can delegate its implementation and ownership to a different team, which only needs to understand the translation and nothing else. This approach is commonly known as the sidecar pattern.

As shown in Figure 2-7, the sidecar pattern takes its name from the sidecar attached to a motorcycle. You can attach different sidecars (of different colors or designs) to the same motorcycle, provided that the interface between the two is unchanged. The same applies in the microservices world: our microservice resembles the motorcycle, while the translation layer resembles the sidecar. If any change happens to the monolithic application, we only need to change the sidecar implementation to stay compliant with it; no changes to the microservice are needed.
Figure 2-7. Sidecar

The communication between the microservice and the sidecar happens over a network protocol (not as a local, in-process call), but because the microservice and the sidecar are co-located on the same host, the traffic never leaves that host. We discuss multiple microservices deployment patterns later in the book, in Chapter 8, “Deploying and Running Microservices”. Also keep in mind that the sidecar itself is another microservice. Figure 2-8 shows how to use a sidecar as an anti-corruption layer.
Figure 2-8. Sidecar acting as the anti-corruption layer
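One common way to realize this co-location, discussed further in Chapter 8, is to run the microservice and its sidecar as two containers in the same Kubernetes pod, sharing one network namespace. The fragment below is a sketch under that assumption; the image names and ports are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-processing
spec:
  containers:
    # The main microservice.
    - name: order-processing
      image: example/order-processing:1.0    # hypothetical image
      ports:
        - containerPort: 8080
    # The sidecar acting as the anti-corruption layer. The main container
    # reaches it at localhost:9090, so the traffic never leaves the host.
    - name: acl-sidecar
      image: example/acl-translator:1.0      # hypothetical image
      ports:
        - containerPort: 9090
```

Because containers in a pod share the loopback interface, the microservice calls the sidecar over localhost, which preserves the "over the network, but never off the host" property described above.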

An anti-corruption layer is one possible way to use the sidecar pattern. The sidecar is also used in several other use cases, such as a Service Mesh. We discuss what a Service Mesh is and how the sidecar pattern is used in Service Meshes in detail in Chapter 9, “Service Mesh”.

Shared Kernel

Even though we discussed how important it is to have clear boundaries between bounded contexts, there are cases in which we need to share domain models. This can happen when two or more bounded contexts have certain things in common and it adds a lot of overhead to maintain separate object models in each. For example, each bounded context (or microservice) has to authorize the user who invokes its operations. Different domains could use their own authorization policies, but in many cases they share the same domain model as the authorization service (which is itself a microservice, or a separate bounded context). In such cases, the domain model of the authorization service acts as the shared kernel. Since there is a shared code dependency (probably wrapped into a library), making the shared kernel pattern work in practice requires all the teams who use it to collaborate well with each other (see Figure 2-9).
Figure 2-9. Shared kernel
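The authorization example above can be sketched as follows. Assume the shared kernel is shipped as a small, versioned library that both the Billing and Delivery teams depend on; all names and role strings here are hypothetical.

```python
# A sketch of a shared kernel: an authorization domain model reused by two
# bounded contexts. In practice this would live in a shared library that
# both teams evolve only through close collaboration.
from dataclasses import dataclass
from typing import FrozenSet

# ---- shared kernel (e.g., a hypothetical 'authz-model' library) ----
@dataclass(frozen=True)
class Principal:
    """The shared authorization model: who is calling, with which roles."""
    user_id: str
    roles: FrozenSet[str]

def is_authorized(principal: Principal, required_role: str) -> bool:
    return required_role in principal.roles

# ---- the Billing context reuses the shared model... ----
def charge_card(principal: Principal) -> str:
    if not is_authorized(principal, "billing:write"):
        return "denied"
    return "charged"

# ---- ...and so does the Delivery context ----
def dispatch_order(principal: Principal) -> str:
    if not is_authorized(principal, "delivery:write"):
        return "denied"
    return "dispatched"

clerk = Principal("U7", frozenset({"billing:write"}))
print(charge_card(clerk), dispatch_order(clerk))
```

The trade-off is visible even in this toy: any change to `Principal` or `is_authorized` affects both contexts at once, which is why the pattern demands tight coordination between the teams that share the kernel.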

Conformist

We already discussed in this chapter the responsibilities of upstream and downstream bounded contexts. Let’s quickly recap: the upstream context has more control over the downstream context and defines the domain model passed between the two contexts, while the downstream context should be well aware of any changes happening to the upstream context. The conformist pattern states that the downstream context (the conformist) has to conform to the contract defined by the upstream context.

The conformist pattern looks similar to the shared kernel, in that both patterns involve a shared domain model. The difference is in the decision-making and the development process. A shared kernel is the result of collaboration between two teams that coordinate tightly, while the conformist pattern deals with integration with a team that is not interested in collaboration, possibly a third-party service over which you have no control. For example, you may use the PayPal API to process payments. PayPal is never going to change its API to fit you; rather, your bounded context has to comply with it. If this integration makes your domain model look ugly, you can introduce an anti-corruption layer to isolate the integration in one place.

Customer/Supplier

The conformist pattern has its drawbacks: the downstream context (or service) has no say in how the interface between itself and the upstream context should look, and there is no collaboration between the teams working on the upstream and downstream bounded contexts. The customer/supplier pattern is a step forward in building better communication between these two teams and finding a way to build the interface collaboratively. It's not total collaboration as in the shared kernel pattern, but something like a customer/supplier relationship. The downstream context is the customer, and the upstream context is the supplier.

A customer does not have complete say over what a supplier does. But, then again, the supplier cannot totally ignore customer feedback. A good supplier will always listen to its customers, extract the positives, give feedback back to them, and produce the best products to address their needs; there is no point in producing something useless to its customers. This is the level of collaboration expected between upstream and downstream contexts adhering to the customer/supplier pattern. It lets the downstream contexts provide suggestions and request changes to the interface between the two contexts. Following this pattern, more responsibility falls on the upstream context: a given upstream context may serve more than one downstream context, so you need to be extra careful that a suggestion from one downstream context does not break the contract between the upstream context and another downstream context.

Partnership

When we have two or more teams building microservices under different bounded contexts, but moving toward the same overall goal, and there are notable interdependencies between them, the partnership pattern is an ideal way to build collaboration. Teams can collaborate in making decisions over the technical interfaces, release schedules, and anything of common interest. The partnership pattern is also applicable to any teams using the shared kernel pattern: the collaboration required to build the shared kernel can be established via a partnership. Keep in mind, though, that the output of the partnership pattern is not necessarily a shared kernel; it can be any set of interdependent services with nothing concrete to share.

Published Language

Microservices or bounded contexts that follow the published language pattern agree on a published language to communicate. Here, published means that the language is readily available to the community that might be interested in using it. This can be XML, JSON, or any other language corresponding to the domain (the core domain) where the microservices operate. For example, there are domain-specific languages used in the financial, e-business, and many other domains.

This pattern highlights the importance of using a well-documented shared language to express necessary domain information as a common medium of communication. Figure 2-10 shows how translation works between the published language and the context-specific languages. There we have a Java-to-JSON parser at the Order Processing microservice end, which knows how to create a JSON document from a Java object model, and a JSON-to-C# parser at the Inventory microservice end to build a C# object model from a JSON document.
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig10_HTML.png
Figure 2-10

Published language pattern
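The translation out of a context-specific model into the published language can be sketched in a few lines of Java. This is a minimal, hand-rolled illustration; the `OrderTranslator` and `Order` names are hypothetical, and a real service would normally use a JSON library such as Jackson or Gson rather than formatting strings by hand.

```java
public class OrderTranslator {

    // Context-specific Java model inside the Order Processing microservice.
    record Order(String id, int quantity) {}

    // Translate the Java object model into the published language (JSON).
    static String toPublishedLanguage(Order order) {
        return String.format("{\"id\":\"%s\",\"quantity\":%d}", order.id(), order.quantity());
    }

    public static void main(String[] args) {
        System.out.println(toPublishedLanguage(new Order("o-1001", 3)));
    }
}
```

At the Inventory end, a symmetric parser would rebuild its own C# model from this JSON document; neither side ever sees the other's internal types, only the published language.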

Open Host Service

In the anti-corruption layer pattern, we have a translation layer between upstream and downstream microservices (or bounded contexts). When there are multiple downstream services, each of them has to handle the translation, as shown in Figure 2-11: both the Delivery and Supplier Management microservices have to translate the object model they get from the upstream Billing microservice into their own respective domain models. If each of these downstream microservices has its own domain model, that's fine; we cannot avoid the translation happening at each end. But if many downstream microservices are doing the same translation, the effort is duplicated. The open host service pattern suggests an approach to overcome this.
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig11_HTML.jpg
Figure 2-11

Anti-corruption layer pattern with multiple downstream services

One way to implement the open host service pattern is to expose the upstream microservice functionality via an API, and the API does the translation. Now, all the downstream microservices, which share the same domain model, can talk to the API (instead of the upstream microservice) and follow either the conformist or customer/supplier pattern.

Figure 2-12 shows the implementation of the open host service pattern using an API. We discuss the role of an API gateway in a microservices architecture in Chapter 10, "APIs, Events and Streams".
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig12_HTML.png
Figure 2-12

Open host service pattern

Separate Ways

Let's revisit the microservices design we did for our e-commerce application. There we have a Customer microservice and an Order Processing microservice (see Figure 2-2). Think of a customer portal that talks to the Customer microservice and displays the user's profile. It may be useful for the end user to see his/her order history along with the profile data. But the Customer microservice does not have direct access to the order history of a given customer; that is under the control of the Order Processing microservice. One way to facilitate this is to integrate the Order Processing microservice with the Customer microservice and change the domain model of the Customer microservice to return the order history along with the profile data, which is a costly integration.

Integration is always expensive, and sometimes the benefit is small. The separate ways pattern suggests avoiding such costly integrations and finding other ways to cater to such requirements. In this particular scenario, for example, we can avoid the integration between the Order Processing and Customer microservices by providing, along with the profile data, a link in the customer portal to retrieve the order history; that link talks directly to the Order Processing microservice (see Figure 2-13).
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig13_HTML.jpg
Figure 2-13

Separate ways pattern

Big Ball of Mud

Most of the time you don't get a chance to work on a green-field project. More often, you find some kind of legacy system that strongly resists integrating with other systems in a standard manner. These systems do not have clear boundaries or clean domain models. The big ball of mud pattern highlights the need to identify such systems and treat them in a special context. We should not try to apply sophisticated modeling to these contexts; rather, we should find a way to integrate via an API or some kind of service interface and use the anti-corruption layer pattern at the downstream service end.

Design Principles

Domain-driven design helps us scope out microservices along with bounded contexts. At its core in any microservices design, time to production, scalability, complexity localization, and resiliency are key elements. Adhering to the following design principles helps a microservice reach those design goals.

High Cohesion And Loose Coupling

Cohesion is a measure of how well a given system is self-contained. Gary McLean Hall, in his book Adaptive Code, presents cohesion as a measure of the contextual relationship between variables in a method, methods in a class, classes in a module, modules in a solution, solutions in a subsystem, and subsystems in a system. This fractal relationship is important because a lack of cohesion at any scope is problematic. Cohesion can be low or high based on the strength of the contextual relationships. In a microservices architecture, if we have one microservice addressing two or more unrelated problems, or in other words, problems with weak contextual relationships, the result is a system with low cohesion. Systems with low cohesion are brittle in nature: when we build one microservice to address many loosely related requirements, chances are high that we will have to change the implementation frequently.

How do we know, given two requirements, if we have a strong contextual relationship? This is the whole point of the exercise we carried out in the previous section under domain-driven design. If your requirements fall under the same bounded context, then those do have a strong contextual relationship. If the service boundaries of a microservice are aligned to the bounded context of the corresponding domain, then it will produce a highly cohesive microservice.

Note

A highly cohesive and loosely coupled system naturally follows the single responsibility principle, which states that a component or a system should have only one reason to change.

Cohesion and coupling are two related properties of a system design. A highly cohesive system is naturally loosely coupled. Coupling is a measure of the interdependence between different systems or, in our case, microservices. High interdependency between microservices results in a tightly coupled system, while low interdependency produces a loosely coupled one. Tightly coupled systems lead to a brittle architecture: a change done in one system affects all the related systems, and if one system goes down, all the related systems become dysfunctional. In a highly cohesive system, we group all the related or interdependent functionalities together into one system (or one bounded context), so it does not need to rely heavily on other systems.

A microservices architecture must be highly cohesive and loosely coupled, by definition.

Resilience

Resilience is a measure of the capacity of a system, or of individual components in a system, to recover quickly from a failure. In other words, it is an attribute of a system that enables it to deal with failure in a way that doesn't cause the entire system to fail. A microservices architecture is naturally a distributed system: a collection of computers (or nodes) connected over the network, with no shared memory, that appears to its users as a single coherent system. In a distributed system, failures are not rare; the network is, and always will be, unreliable. An underwater fiber-optic cable may get damaged by a submarine, a router can overheat, a load balancer can start to malfunction, computers may run out of memory, a power failure can take a database server down, an earthquake may take an entire data center down, and for a thousand and one other reasons the communication in a distributed system may fail.

Note

In 1994, Peter Deutsch, a Sun fellow at the time, drafted seven assumptions that architects and designers of distributed systems are likely to make, which prove wrong in the long run and result in all sorts of trouble and pain for the solution and the architects who made them. In 1997, James Gosling added another such fallacy. These assumptions are now collectively known as the eight fallacies of distributed computing: 1. The network is reliable. 2. Latency is zero. 3. Bandwidth is infinite. 4. The network is secure. 5. Topology doesn't change. 6. There is one administrator. 7. Transport cost is zero. 8. The network is homogeneous.2

Failures are unavoidable. Ariel Tseitlin, in his ACM paper, "Anti-Fragile Organization,"3 talks about embracing failures to improve resilience and maximize availability, taking Netflix as an example. One way Ariel highlights to increase the resilience of a system is to reduce uncertainty by regularly inducing failures. Netflix embraces this idea and takes an aggressive approach, writing programs that cause failures and running them in production on a daily basis (the Netflix Simian Army). Google, too, goes beyond simple tests to mimic server failures; as part of its annual Disaster Recovery Test (DiRT) exercises, it has simulated large-scale disasters such as earthquakes.

Note

Netflix has taken three important steps to build a more anti-fragile system and organization. The first is to treat every engineer as an operator of the corresponding service. The second is to treat each failure as an opportunity to learn, and the third is to foster a blameless culture.

The most common way to counter failures in a distributed system is via redundancy. Each component in the distributed system has a redundant counterpart; if one component fails, the counterpart takes over. Not every component, however, can recover from a failure with zero downtime. Apart from redundancy, a developer mindset focused on recovery-oriented development helps in building more resilient software. The following sections present a set of patterns, initially introduced by Michael T. Nygard in his book, Release It, for building resilient software. Today, these patterns are part and parcel of microservices development, and many microservices frameworks have first-class support for implementing them. What follows is a detailed explanation of resilient communication patterns; we revisit them in Chapter 7, where we discuss how they are used in practice with the most common microservice development frameworks.

Timeouts

Almost all the applications we build today make remote calls over the network. It can be an HTTP request to a web service endpoint, a database call over JDBC, or an authentication request to an LDAP server. Anything over the network is fallible; hence, we should not wait indefinitely for a response from the remote endpoint. For example, in a database call, if we decide to wait indefinitely for a response, that takes one connection out of the database connection pool for the duration of the wait. With a few more like that, the overall performance of the application starts to degrade.

A timeout decides how long we are willing to wait for a response. Every remote call we make from our microservice must have a timeout. Neither very long timeouts nor very short ones are going to help; figuring out the best fit is a learning exercise. Make sure to write a log entry every time a connection times out; that helps you adjust the timeout value in the future.

Let's see how this could happen in practice. From our customer portal, to load suggestions for the logged-in user based on his/her previous order patterns, we have to call the Customer Suggestions microservice. This service has to talk to a database internally to load the suggestions. If the connection to the database times out, what should we return? If we build the microservice with a recovery-oriented development mindset, we should not just return an error; rather, we should return a default set of suggestions, which will not break anything in the customer portal. From a business point of view this may not be effective, but from the end user's experience point of view it's a good pattern to follow.
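A minimal Java sketch of this timeout-plus-fallback idea follows. The `SuggestionClient` class, the default suggestion values, and the timeout figures are all illustrative, not part of any real API; production code would more likely use the timeout support built into its HTTP or database client.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SuggestionClient {

    // Default suggestions returned when the remote call fails or times out.
    static final List<String> DEFAULT_SUGGESTIONS = List.of("bestsellers", "new-arrivals");

    // Daemon threads so a hung remote call cannot keep the JVM alive.
    private static final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Run the remote call with a hard timeout; fall back to defaults on any failure.
    static List<String> suggestionsWithFallback(Callable<List<String>> remoteCall, long timeoutMillis) {
        Future<List<String>> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            future.cancel(true); // release the thread blocked on the slow call
            return DEFAULT_SUGGESTIONS;
        }
    }

    public static void main(String[] args) {
        // A fast call returns the real suggestions.
        System.out.println(suggestionsWithFallback(() -> List.of("running-shoes"), 500));
        // A slow call hits the timeout and degrades gracefully to the defaults.
        System.out.println(suggestionsWithFallback(() -> { Thread.sleep(5_000); return List.of(); }, 100));
    }
}
```

The key point is the catch block: the portal always gets a usable list, never an exception, even when the database behind the Customer Suggestions microservice is slow.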

Circuit Breaker

A circuit breaker protects electrical appliances by controlling the flow of electricity (see Figure 2-14). If the flow of electricity is higher than a certain threshold value, the circuit breaker breaks the flow and protects the appliances behind it from damage. The circuit breaker pattern brings this same concept to the software world. If our microservice keeps timing out against one endpoint all the time, there is no point in continuing to try, at least for some time. The circuit breaker pattern suggests wrapping such operations with a component that can circumvent calls when the system is not healthy. This component maintains a threshold value for failures and, once it is met, breaks the circuit; no more calls hit the wrapped operation. After a configured time interval, it closes the circuit again to see whether the operation still returns errors; if it does not, the circuit stays closed for subsequent operations.
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig14_HTML.jpg
Figure 2-14

MCBs (miniature circuit breakers), mostly used in homes
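The mechanics described above can be captured in a small class. This is a deliberately simplified sketch (single-threaded semantics via `synchronized`, no distinct half-open state, hypothetical class name); production systems would reach for a library such as Resilience4j or Hystrix rather than rolling their own.

```java
import java.util.function.Supplier;

public class CircuitBreaker {

    enum State { CLOSED, OPEN }

    private final int failureThreshold;
    private final long retryIntervalMillis;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long retryIntervalMillis) {
        this.failureThreshold = failureThreshold;
        this.retryIntervalMillis = retryIntervalMillis;
    }

    // Wrap a fallible operation; while the circuit is open, reject fast with the fallback.
    synchronized <T> T call(Supplier<T> operation, T fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < retryIntervalMillis) {
                return fallback; // still open: don't even try the operation
            }
            state = State.CLOSED; // retry interval elapsed: allow a trial call
            consecutiveFailures = 0;
        }
        try {
            T result = operation.get();
            consecutiveFailures = 0; // success resets the failure count
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                state = State.OPEN; // too many failures: trip the breaker
                openedAt = System.currentTimeMillis();
            }
            return fallback;
        }
    }

    synchronized State state() { return state; }
}
```

After the threshold is hit, callers get the fallback immediately without the wrapped operation ever being invoked, which is exactly the fast rejection the pattern is after.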

Bulkheads

Bulkheads are used in ships and boats to build watertight compartments, so that if one compartment takes in water, people can move to another compartment for safety. Damage in one compartment will not take the whole ship down, because the compartments are isolated from each other.

The bulkheads pattern borrows this same concept. It highlights how we can allocate dedicated resources, such as thread pools or connection pools, for outbound connections. If we have one single thread pool for all outbound endpoints and one endpoint happens to be slow in responding, releasing the corresponding thread back to the pool takes more time. If this repeats consistently, it impacts requests directed to the other endpoints as well, since more and more threads end up waiting on the slow endpoint. Following the bulkheads pattern, we can have one thread pool per endpoint, or logically group endpoints together with a pool each. This prevents an issue in one bad endpoint from propagating to the good endpoints. The bulkheads pattern partitions capacity to preserve partial functionality when bad things happen.
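A per-endpoint thread pool can be sketched in a few lines of Java. The `Bulkheads` class and endpoint names here are hypothetical; the point is only that each endpoint gets its own bounded pool, so a slow endpoint can exhaust its own threads but nobody else's.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Bulkheads {

    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerEndpoint;

    Bulkheads(int threadsPerEndpoint) {
        this.threadsPerEndpoint = threadsPerEndpoint;
    }

    // One bounded pool per endpoint: a slow endpoint can only exhaust its own threads.
    <T> Future<T> submit(String endpoint, Callable<T> call) {
        ExecutorService pool = pools.computeIfAbsent(endpoint, e ->
                Executors.newFixedThreadPool(threadsPerEndpoint, r -> {
                    Thread t = new Thread(r, "pool-" + e);
                    t.setDaemon(true); // don't keep the JVM alive for stuck calls
                    return t;
                }));
        return pool.submit(call);
    }
}
```

With this partitioning, saturating the "billing" pool with slow calls has no effect on requests submitted against the "inventory" pool.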

Steady State

The steady state pattern highlights the need to adhere to a design that lets your system run in a steady state for a long time without frequent interventions. Every time your DevOps team has to touch the system, they increase the possibility of introducing new errors, no matter how experienced they are. In March 2017, there was a massive AWS (Amazon Web Services) outage: a four-hour outage of the S3 system that caused disruptions, slowdowns, and failure-to-load errors across the United States. Amazon later published4 the cause of the issue, which was a trivial human error. An authorized S3 team member using an established playbook executed a command intended to remove a small number of servers for one of the S3 subsystems used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.

The design of a system must keep humans away from it whenever possible. Everything, from development to deployment, must be automated, including cleaning up resources that accumulate on production servers. The best example is logs: accumulating log files can easily fill up the filesystem. Later in this chapter we discuss an approach to managing logs effectively in a microservices deployment.

We also tend to store temporary states and time-bound tokens in the database, which causes the database to grow heavily over time. Even after these tokens and temporary states are no longer valid, they still sit in the database, eating space and slowing down the system. The design of the system should include a way to clean up such data periodically via an automated process.

In-memory caching is another area to look into. Memory is a limited, precious resource in a running system. Letting an in-memory cache grow infinitely causes system performance to degrade and, ultimately, the system runs out of memory. It is always recommended to put an upper limit (in terms of the number of elements) on the cache and use an LRU (least recently used) strategy to clean it up. With the least recently used strategy, when the cache hits its upper limit, the least recently used element is evicted to make room for the new one. A periodic flush is another strategy to clean up the cache.
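In Java, a bounded LRU cache of this kind drops out almost for free from `LinkedHashMap`, whose access-order constructor and `removeEldestEntry` hook exist precisely for this purpose. The `BoundedCache` name and the limit of two entries are illustrative; a production service would more likely use a cache library such as Caffeine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true makes iteration order LRU
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every put; returning true evicts the LRU entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes the most recently used entry
        cache.put("c", 3);  // evicts "b", the least recently used entry
        System.out.println(cache.keySet());
    }
}
```

Because the upper limit is enforced on every insert, the cache reaches a steady state by construction: the mechanism that accumulates entries is paired with the mechanism that recycles them.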

The steady state pattern says that, for every mechanism that accumulates a resource, some other mechanism must recycle that resource.

Fail Fast

The fail fast pattern highlights the need to decide early in the flow of execution whether a request is going to fail or be rejected. For example, if the load balancer already knows that a given node is down, there is no point in sending requests there again and again to find out whether it's up. Rather, that node can be marked as faulty until a valid heartbeat is heard from it. The circuit breaker pattern can also be used to implement a fail fast strategy: we can isolate faulty endpoints, and any requests going out to such endpoints can be rejected without retrying, until the circuit breaker decides that it's time to recheck.
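A toy load balancer illustrates the idea of rejecting immediately rather than retrying against known-bad nodes. The `FailFastBalancer` class, node names, and quarantine interval are all assumptions made for the sketch; real load balancers drive this from heartbeats or health checks.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FailFastBalancer {

    private final List<String> nodes;
    private final Map<String, Long> faultyUntil = new ConcurrentHashMap<>();
    private int next = 0;

    FailFastBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    // Quarantine a node for a while instead of probing it on every request.
    void markFaulty(String node, long quarantineMillis) {
        faultyUntil.put(node, System.currentTimeMillis() + quarantineMillis);
    }

    // Round-robin over the nodes, skipping quarantined ones; if none are
    // healthy, fail fast with an exception rather than queueing or retrying.
    synchronized String pick() {
        for (int i = 0; i < nodes.size(); i++) {
            String node = nodes.get(next++ % nodes.size());
            Long until = faultyUntil.get(node);
            if (until == null || until < System.currentTimeMillis()) {
                return node;
            }
        }
        throw new IllegalStateException("no healthy node: failing fast instead of retrying");
    }
}
```

The caller learns instantly that the request cannot be served, instead of burning a timeout against each dead node in turn.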

Let It Crash

There are many cases where doctors decide to amputate the leg of a person after a serious accident to save that person's life, preventing serious damage from propagating from the leg to other parts of the body. In the Erlang5 world, this is called the "let it crash" philosophy. It may sometimes be useful to abandon a subsystem to preserve the stability of the overall system. The let it crash approach suggests getting back to a clean startup as rapidly as possible when recovery is difficult and unreliable after a failure. This is a very common strategy in microservice deployments: a given microservice addresses a limited scope of the overall system, so taking it down and booting it up again has a minimal impact on the system. This is well supported by the one-microservice-per-host strategy with containers. Having a rapid server startup time, ideally a few milliseconds, is also key to making this strategy successful.

Handshaking

Handshaking is mostly used to share requirements between two parties before establishing a communication channel. This happens prior to establishing a TCP (transmission control protocol) connection, commonly known as the TCP three-way handshake, and we also see a handshake before establishing a TLS (transport layer security) connection; these are the two most popular handshaking protocols in computer science. The handshaking pattern suggests that a server use a handshake to protect itself by throttling its own workload. When a microservice is behind a load balancer, it can use this handshaking technique to inform the load balancer whether it is ready to accept more requests. Each server hosting the microservice can provide a lightweight health check endpoint, which the load balancer pings periodically to see whether the corresponding microservice is ready to accept requests.
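The readiness decision behind such a health check endpoint can be sketched as below. The `ReadinessProbe` class and the in-flight-request rule are assumptions for illustration; the real signal could equally be thread pool saturation, queue depth, or response-time percentiles, and the status codes mirror the common 200/503 convention.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ReadinessProbe {

    private final int capacity; // max in-flight requests we are willing to serve
    private final AtomicInteger inFlight = new AtomicInteger();

    ReadinessProbe(int capacity) {
        this.capacity = capacity;
    }

    void requestStarted()  { inFlight.incrementAndGet(); }
    void requestFinished() { inFlight.decrementAndGet(); }

    // HTTP status the health check endpoint returns to the load balancer's ping:
    // 200 = send me traffic, 503 = back off for now.
    int healthStatus() {
        return inFlight.get() < capacity ? 200 : 503;
    }
}
```

Wired into a lightweight HTTP endpoint, this gives the load balancer the handshake it needs: traffic flows only while the service reports it can take more.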

Test Harness

Not all the failures in a distributed system can be caught in development or QA (quality assurance) testing. Integration testing possibly looks like a better option, but it has its own limitations. Most of the time, we build integration tests according to a specification provided by the corresponding service endpoint. Mostly it covers success scenarios, and even for failure cases it defines exactly what to expect, for example, the error codes. Not all systems work per their specifications all the time. The test harness pattern suggests an approach to integration testing that allows us to test most of the failure modes, even those outside the service specifications.

The test harness is another remote endpoint that stands in for each of the remote endpoints you need to connect to from your microservice. The difference between the test harness and the service endpoint is that the test harness exists only to test failures; it does not worry about the application logic. The test harness has to be written so that it is capable of generating all sorts of errors covering all seven layers of the OSI (open systems interconnection) model. For example, the test harness may send connection-refused responses, invalid TCP packets, slow responses, responses with incorrect content types (XML instead of JSON), and many other errors that we would never expect from the service endpoint under normal circumstances.
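The essence of a test harness, reduced to a sketch: a fault-injecting stand-in that deliberately misbehaves. The `FaultyEndpoint` class, its fault modes, and the simplified `Response` record are all assumptions for illustration; a real harness would serve these faults over an actual socket so the client's networking stack is exercised too.

```java
public class FaultyEndpoint {

    enum Fault { SLOW_RESPONSE, WRONG_CONTENT_TYPE, GARBAGE_BODY, CONNECTION_REFUSED }

    // A stand-in for an HTTP response, reduced to the fields the client cares about.
    record Response(int status, String contentType, String body, long delayMillis) {}

    // Produce responses the real service would never send under normal circumstances.
    static Response respond(Fault fault) {
        switch (fault) {
            case SLOW_RESPONSE:
                return new Response(200, "application/json", "{\"ok\":true}", 30_000); // stalls the client
            case WRONG_CONTENT_TYPE:
                return new Response(200, "application/xml", "<ok>true</ok>", 0); // XML where JSON is expected
            case GARBAGE_BODY:
                return new Response(200, "application/json", "\u0000\u0001 not json", 0);
            case CONNECTION_REFUSED:
            default:
                throw new IllegalStateException("connection refused");
        }
    }
}
```

Pointing your microservice's integration tests at a harness like this verifies that the timeout, circuit breaker, and fallback code paths actually fire, instead of only testing the happy path the specification describes.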

Shed Load

If we look at the way TCP (transmission control protocol) works, it provides a listen queue per port. When connections flood a given port, they are queued. Each queue has a maximum length and, when that limit is reached, no more connections are accepted; any new attempt to establish a connection is rejected with an ICMP RST (reset) packet. This is how TCP sheds load at its own layer. The applications running on top of the TCP layer pull requests from the listen queue. In practice, most applications are exhausted by connections before the listen queue reaches its maximum. The shed load pattern suggests that applications or services should also be modeled after TCP.

The application should shed load when it finds that it is falling behind a given SLA (service level agreement). Usually, when an application is exhausted and its running threads are blocked on certain resources, the response time starts to degrade; such indicators show whether a given service is falling behind an SLA. If so, this pattern advocates shedding load, or notifying the load balancer that the service is not ready to accept more requests. This can be combined with the handshaking pattern to build a better solution.
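A concurrency limit is one simple way to shed load at the application layer, mirroring TCP's full listen queue. This sketch uses a `Semaphore` whose non-blocking `tryAcquire` rejects excess requests immediately; the `LoadShedder` name and the use of a fixed concurrency limit (rather than a measured SLA signal) are simplifying assumptions.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class LoadShedder {

    private final Semaphore permits;

    LoadShedder(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    // Reject immediately when at capacity instead of queueing the request;
    // the caller can map the rejected value to an HTTP 503 response.
    <T> T handle(Supplier<T> request, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected; // shed load: no waiting, no queueing
        }
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }
}
```

Rejected requests fail in microseconds rather than piling up behind blocked threads, which keeps the latency of the requests that are accepted within the SLA.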

Observability

Collecting data is cheap, but not having it when you need it can be expensive. In March 2016, Amazon was down for 20 minutes and the estimated revenue loss was $3.75 million. In January 2017, a system outage at Delta Airlines caused the cancellation of more than 170 flights and an estimated loss of $8.5 million. In both cases, if the right level of data had been collected, such behavior could have been predicted, or the root cause could have been identified and recovery could have begun as soon as the failure happened. The more information we have, the better decisions we can make.

Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs6. A company I know used to monitor its employees' effective time at work by calculating the time difference between the ins and outs recorded when they swiped their ID cards at the front door. This strategy is effective only if all the employees support it and make themselves observable. At the end of each week, the Human Resources (HR) department sent the effective work time by date to each employee. In most cases, the figures were completely incorrect. The reason is that most people would go out in groups for lunch and tea, and when they went out and came in, only one person would swipe the card to open the door. Even though monitoring was in place, it didn't produce the expected results, because the employees were not cooperating, or observable.

There is another company I know that used to track its employees' in and out times, and the places they worked from within the company, by when they connected to the company's wireless network and the location of the wireless endpoint. Even with this approach, we are not tracking employees, but their laptops. We can keep our laptops on our desks and spend the day at the ping-pong table, or go shopping and come back to take the laptop home. Both of these examples highlight one important fact: monitoring is only effective if we have an observable system in place.

Observability is one of the most important aspects that needs to be baked into any microservices design. We may need to track the throughput of each microservice, the number of successful/failed requests, the utilization of CPU, memory, and other network resources, and some business-related metrics. Chapter 13, "Observability", includes a detailed discussion on the observability of microservices.
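Making a service observable starts with emitting such metrics from inside the code. The following is a bare-bones sketch of a counter registry; the `Metrics` class and the metric names are hypothetical, and real services would use an instrumentation library such as Micrometer or Prometheus client libraries, which add gauges, histograms, and an export endpoint on top of the same idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class Metrics {

    private static final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Cheap, thread-safe counter increments; values are exposed later via a metrics endpoint.
    static void increment(String name) {
        counters.computeIfAbsent(name, n -> new LongAdder()).increment();
    }

    static long value(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        // Instrumented request-handling code would call these on every request.
        Metrics.increment("orders.requests.success");
        Metrics.increment("orders.requests.success");
        Metrics.increment("orders.requests.failed");
        System.out.println(Metrics.value("orders.requests.success") + " / "
                + Metrics.value("orders.requests.failed"));
    }
}
```

The external output (the counter values) lets an operator infer the internal state (how the service is coping), which is precisely the definition of observability given above.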

Automation

One of the key rationales behind a microservices architecture is less time to production and shorter feedback cycles. We cannot meet such goals without automation. A good microservices architecture would only look good on paper (or a whiteboard) if not for the timely advancements in DevOps and tooling around automation. No idea is a good idea if it doesn't appear at the right time. Microservices came along as a good idea because it had all the tooling support at the time it started to become mainstream, in the form of Docker, Ansible, Puppet, Chef, and many more.

Tooling around automation can be divided into two broad categories—continuous integration tools and continuous deployment tools. Continuous integration enables software development teams to work collaboratively, without stepping on each other's toes. They can automate builds and source code integration to maintain source code integrity. They also integrate with DevOps tools to create an automated code delivery pipeline. Forrester, one of the top analyst firms, in its latest report7 on continuous integration tools, identifies the top ten tools in the domain: Atlassian Bamboo, AWS CodeBuild, CircleCI, CloudBees Jenkins, Codeship, GitLab CI, IBM UrbanCode Build, JetBrains TeamCity, Microsoft VSTS, and Travis CI.

The continuous delivery tools bundle applications, infrastructure, middleware, and the supporting installation processes and dependencies into release packages that transition across the lifecycle. The latest Forrester report8 on continuous delivery and release automation highlights 15 most significant vendors in the domain: Atlassian, CA Technologies, Chef Software, Clarive, CloudBees, Electric Cloud, Flexagon, Hewlett Packard Enterprise (HPE), IBM, Micro Focus, Microsoft, Puppet, Red Hat, VMware, and XebiaLabs.

12-Factor App

A microservices architecture is not just built around design principles; some call it a culture. It's the result of many other collaborative efforts. Yes, the design is a key element, but we also need to worry about collaboration between developers and domain experts, communication between teams and team members, continuous integration and delivery, and many other issues. The 12 Factor App is a manifesto9 published by Heroku in 2012. This manifesto is a collection of best practices and guidelines for building and maintaining scalable, maintainable, and portable applications. Even though these best practices were initially derived from the applications deployed on the Heroku cloud platform, today they have become a mantra for any successful microservices deployment. The 12 factors discussed next are quite common and natural, so chances are high that you are adhering to them, knowingly or unknowingly.

Codebase

The codebase factor highlights the importance of maintaining all your source code in a version control system and having one code repository per application. Here, the application could be our microservice. Having one repository per microservice helps release it independently from other microservices. The microservice must be deployed to multiple environments (test, staging, and production) from the same repository. Per-service repositories also help the governance aspects of the development process.

Dependencies

The dependencies factor says that in your application you should explicitly declare and isolate your dependencies and never rely on implicit system-wide dependencies. In practice, if you are building a Java-based microservice, you must declare all your dependencies either with Maven in a pom.xml file or with Gradle in a build.gradle file. Maven and Gradle are two very popular build automation tools, but with recent developments, Gradle seems to have the edge over Maven and is used by Google, Netflix, LinkedIn, and many other top companies. Netflix, in its microservices development process, uses Gradle with its own build automation tool called Nebula10. In fact, Nebula is a set of Gradle plugins developed by Netflix.

Managing dependencies for microservices has gone beyond just having them declared under build automation tools. Most of the microservices deployments today rely on containers, for example Docker. If you are new to Docker and containers, do not worry about it now; we discuss those in Chapter 8, when we talk about microservices deployment patterns in detail. With Docker, you can declare not just the core dependencies your microservice needs to run, but also other external dependencies with specific versions, such as the MySQL version, Java version, and so on.

Configuration

The configuration factor emphasizes the need to decouple environment-specific settings from the code into configuration. For example, the connection URL of the LDAP or database server involves environment-specific parameters and certificates. These settings should not be baked into the code. Also, even with configuration files, never commit any kind of credentials into source repositories. This is a common mistake some developers make; they think that since they are using private GitHub repositories, only they have access, which is not true. When you commit your credentials to a private GitHub repository, even though it’s private, those credentials are stored on external servers in cleartext.
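As a minimal sketch of this factor (the variable names are illustrative), environment-specific settings, including credentials, can be read from environment variables rather than baked into the code or committed to the repository:

```java
public class DatabaseConfig {
    // Read a setting from the environment, falling back to a default
    // suitable for local development; never hardcode production values.
    static String get(String name, String fallback) {
        String value = System.getenv(name);
        return (value != null) ? value : fallback;
    }

    public static void main(String[] args) {
        String url = get("DB_URL", "jdbc:mysql://localhost:3306/orders");
        String password = get("DB_PASSWORD", ""); // injected at deploy time
        System.out.println("Connecting to " + url);
    }
}
```

The same binary then runs unchanged in test, staging, and production; only the environment differs.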

Backing Services

A backing service is any kind of service our application consumes during its normal operations. It can be a database, a caching implementation, an LDAP server, a message broker, an external service endpoint, or any other kind of external service (see Figure 2-15). The backing services factor says that these backing services should be treated as attached resources. In other words, they should be pluggable into our microservices implementation. We should be able to change the database, LDAP server, or any external endpoint just by editing a configuration file or setting an environment variable. This factor is very much related to the previous one.
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig15_HTML.png
Figure 2-15

Backing services
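A minimal sketch of treating a backing service as an attached resource (the Cache interface and class names here are hypothetical): the implementation is chosen by configuration, so it can be swapped without touching the code that uses it.

```java
import java.util.HashMap;
import java.util.Map;

// A backing service behind an interface; implementations are interchangeable.
interface Cache {
    void put(String key, String value);
    String get(String key);
}

// Local stand-in used for development and tests.
class InMemoryCache implements Cache {
    private final Map<String, String> store = new HashMap<>();
    public void put(String key, String value) { store.put(key, value); }
    public String get(String key) { return store.get(key); }
}

public class CacheFactory {
    // Pick the implementation from configuration; in production this could
    // return a Redis- or Memcached-backed Cache instead.
    public static Cache fromConfig(String provider) {
        if ("in-memory".equals(provider)) {
            return new InMemoryCache();
        }
        throw new IllegalArgumentException("Unknown cache provider: " + provider);
    }
}
```

The provider string would come from configuration (the previous factor), so switching backing services never requires a code change.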

Build, Release, Run

This is the fifth factor and it highlights the importance of having a clear separation among the build, release, and run phases of an application. Let’s look at how Netflix builds its microservices11. They have a clear separation between these phases (see Figure 2-16). It starts with the developer who builds and locally tests using Nebula. Nebula is a build automation tool developed by Netflix; in fact it is a set of Gradle plugins. Then the changes are committed to the corresponding Git repository. Then a Jenkins job executes Nebula, which builds, tests, and packages the application for deployment. For those who are new to Jenkins, it is a leading open source automation server that facilitates continuous integration and deployment. Once the build is ready, it is baked into an Amazon Machine Image (AMI).
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig16_HTML.png
Figure 2-16

Netflix build process

Spinnaker is another tool used in Netflix’s release process. It’s in fact a continuous delivery platform for releasing software changes with high velocity and confidence, developed by Netflix, and later made open source. Once the AMI is ready to deploy, Spinnaker makes it available for deployment to tens, hundreds, or thousands of instances, and deploys it in the test environment. From there, the development teams will typically exercise the deployment using a set of automated integration tests.

Processes

The sixth factor states that the processes should be stateless and should avoid using sticky sessions. Stateless means that an application should not assume any data to be in memory before or after it executes an operation. Any microservices deployment compliant with the sixth factor should be designed to be stateless. In a typical enterprise-grade microservices deployment, you will find that multiple instances of a given microservice spin up and down based on the load it gets. If we were to maintain some kind of state in memory in those microservices, it would be cumbersome to replicate the state across all the microservice instances and would add a lot of complexity. Stateless microservices can be replicated on demand, and no coordination is required between instances during bootstrap or even at runtime, which leads us to the shared nothing architecture.
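As an illustrative sketch (the class name and store are hypothetical), a stateless service keeps nothing in process memory between requests: each request carries the identifier it needs, and state lives in an external backing store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StatelessCartService {
    // Stand-in for an external backing store (e.g., Redis). In a real
    // deployment this would NOT live inside the service process, so any
    // instance of the service could serve any request.
    private final Map<String, Integer> backingStore = new ConcurrentHashMap<>();

    // The request carries the cart identifier; no sticky session is needed.
    public int addItem(String cartId, int quantity) {
        return backingStore.merge(cartId, quantity, Integer::sum);
    }
}
```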

Shared Nothing Architecture

The shared nothing architecture is a well-established principle in distributed computing. It states that a given node in a system should share neither the disk nor the memory with other nodes, which goes well beyond being just stateless. This helps build highly scalable systems. A scalable system should be able to produce increased throughput against increased load when more nodes or resources are introduced to the system. If the nodes in a system share resources, then introducing more nodes puts more load on those shared resources, so the gain in total throughput diminishes.

In a typical web application, the disk is shared between nodes for two main purposes. One is to share some common configuration files between the nodes and, in most cases, mounting a shared disk drive to each node does this. If use of a shared drive is not possible, we can build some kind of replication mechanism between those nodes, possibly using a shared repository. One node may commit its changes to a shared Git or Subversion repository, and the other nodes will periodically pull updates. There is another approach, which is quite common these days with the new advancements in DevOps engineering. We can use configuration management tools like Puppet12 or Chef13 to manage configurations centrally and automate the distribution to all the nodes in the system. In today’s microservices deployments, we use a similar kind of approach with slight variations. No configuration changes are done on running servers; instead, a new container will be created with the new configuration using Puppet or Chef and will be deployed to the corresponding servers. This is the same approach Netflix follows.

The second purpose of having a shared disk is for the database. In a traditional web application as well as in a microservices deployment, we cannot totally eliminate the need for having a shared database. But to avoid the issues in scalability, we can have a separate database server, which can scale independently. Then again, even though we share the same database between different nodes of the same microservice, it is not encouraged to share the same database between different microservices. In Chapter 5, “Data Management”, we talk about different data sharing strategies related to microservices.

Port Binding

The port binding factor highlights the need for completely self-contained applications. If you take a traditional web application, let’s say a Java EE application, it is deployed in some kind of Java EE container, for example a Tomcat, WebLogic, or WebSphere server, as a WAR (Web Application aRchive) file. The web application does not worry about how people (or systems) access it, under which transport (HTTP or HTTPS), or on which port. Those decisions are made at the web container level (Tomcat, WebLogic, or WebSphere server configuration). A WAR file cannot exist on its own; it relies on the underlying container to do the transport/port bindings. It is not self-contained.

This seventh factor says your application itself has to do the port binding and expose itself as a service, without relying on a third-party container. This is very common in microservice deployments. For example, Spring Boot14, a popular Java-based microservices framework, lets you build microservices as self-contained, self-executable JAR files. There are many other microservice frameworks (Dropwizard15 and MSF4J16) out there that do the same. Chapter 4, “Developing Services” covers how to develop and deploy microservices with Spring Boot.
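Port binding can be sketched with nothing but the JDK’s built-in HTTP server (the /health endpoint is illustrative): the service binds its own port and exposes itself without any external container.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class SelfContainedService {
    // The application itself binds the port; no Tomcat/WebLogic required.
    public static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/health", exchange -> {
                byte[] body = "OK".getBytes();
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

Frameworks such as Spring Boot do essentially this behind the scenes when they produce a self-executable JAR.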

Concurrency

There are two ways an application can scale: vertically or horizontally. To scale an application vertically, you add more resources to each node. For example, you add more CPU power or more memory. This method is becoming less popular these days, as people are looking toward running software on commodity hardware. To scale an application horizontally, you need not worry about the resources each node has; rather, you increase the number of nodes in the system. The eighth factor, concurrency, says that an application should be able to scale horizontally, or scale out.

The ability to scale dynamically (horizontally) is another important aspect of today’s microservice deployments. The system that controls the deployment will spin server instances up and down as the load rises and falls, to increase or decrease the throughput of the entire system. Unless each microservice instance can scale horizontally, dynamic scalability is not possible.

Disposability

The 9th factor talks about the ability of an application to spin up fast and shut down gracefully when required. If we cannot spin up an application fast enough, it’s really hard to achieve dynamic scalability. Most of today’s microservice deployments rely on containers and expect the startup time to be milliseconds. When you architect a microservice, you need to make sure that it adds minimal overhead to the server startup time. This is further encouraged by following the one-microservice-per-host (container) model, which we discuss in detail later in the book, in Chapter 8. This is opposed to the model of having multiple monolithic applications per server, where each application contributes to the server startup time, which is usually measured in minutes (not even in seconds).
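Graceful shutdown can be sketched in plain Java with a shutdown hook (what the cleanup actually does is service-specific; the body here is illustrative):

```java
public class DisposableService {
    // Register cleanup work that runs when the process is asked to stop
    // (e.g., when the container orchestrator sends SIGTERM).
    public static Thread registerShutdownHook(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "graceful-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerShutdownHook(() -> {
            // Drain in-flight requests, close connections, flush logs.
            System.out.println("Shutting down gracefully");
        });
        System.out.println("Service started");
    }
}
```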

Everything at Google runs on containers. In 2014, Google spun up two billion containers per week. That means Google was firing up, on average, some 3,300 containers every second17.

Dev/Prod Parity

The 10th factor states the importance of ensuring that the development, staging, and production environments stay identical, as much as possible. In reality many do have fewer resources on development servers, while making staging and production servers identical. When you do not have the same level of resources that you have in the staging and production environments as in the development environment, you sometimes need to wait until staging deployment to discover issues. We have noticed that some companies do not have a cluster in the development environment, and later waste hundreds of developer hours debugging state replication issues in the staging cluster.

It’s not just the number of nodes or hardware resources; this also applies to the other services that your application relies on. For example, you should not have a MySQL database in your development environment if you plan to have Oracle in production; don’t have Java 1.6 in your development environment if you plan to have Java 1.8 in production. Most of the microservice deployments today rely on container-based (for example, Docker) deployments to avoid such mishaps, as all third-party dependencies are packaged in the container itself. One of the key fundamentals behind microservices is rapid development and deployment. An early feedback cycle is extremely important, and adhering to the 10th factor helps us get there.

Logs

The 11th factor says that you need to treat logs as event streams. Logs play two roles in an application. They help identify what is going on in the system and isolate any issues, and they can be used as audit trails. Most of the traditional applications push logs into files, and then these files are pushed into log management applications, like Splunk18 and Kibana19. Managing logs in a microservices environment is challenging, as there are many instances of microservices. To cater to a single request, there can be multiple requests generated between these microservices. It is important to have the ability to trace and correlate a given request between all the microservices in the system.

Log aggregation20 is a common pattern followed by many microservice deployments. This pattern suggests introducing a centralized logging service, which aggregates logs from all the microservice instances in the environment. The administrators can search and analyze the logs from the central server and can also configure alerts that are triggered when certain messages appear in the logs. We can further improve this pattern by using a messaging system, as shown in Figure 2-17. This decouples the logging service from all the other microservices. Each microservice will publish a log event (with a correlation ID) to a message queue and the logging service will read from it.
../images/461146_1_En_2_Chapter/461146_1_En_2_Fig17_HTML.jpg
Figure 2-17

Publishing logs from multiple microservices to a centralized logging server

Even though traditional applications use filesystem-based logging, it is quite discouraged in a microservices environment. Going by the ninth factor, a given microservice should be disposable at any given moment. This introduces the concept of immutable servers. An immutable server is never modified after it is first launched. If we are not modifying a server, then we cannot write anything to its filesystem. Following the immutable server pattern helps reproduce the same server instance from a configuration and dispose of it at any given time.
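The queue-based log aggregation pattern in Figure 2-17 can be sketched as follows, with an in-memory BlockingQueue standing in for the message broker and a correlation ID attached to each event (the class and field names are hypothetical):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class LogPublisher {
    // Stand-in for the message broker queue that the centralized
    // logging service consumes from.
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Publish a log event tagged with the correlation ID of the request,
    // so the logging service can trace it across microservices.
    public static void publish(String correlationId, String service, String message) {
        queue.offer(correlationId + " " + service + " " + message);
    }
}
```

In a real deployment the queue would be an external broker, keeping the logging service decoupled from every microservice that publishes to it.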

Admin Processes

This 12th factor highlights the need for running administrative tasks as one-off processes. These tasks could be a database migration (due to a new version of the application) or a one-time script that needs to be run along with the application itself. The origin of this 12th factor seems to be a little biased toward interpreted languages like Ruby, which support and encourage an interactive programming shell. Once the application is up, developers can SSH into those servers and run certain scripts via these interactive programming consoles. If we adhere to the 12th factor, we should completely avoid such administrative tasks done remotely via SSH, and instead introduce an admin process and make the admin tasks part of it. In a microservices deployment, just as you run your microservices in different containers, this admin process can also run in its own container.

Beyond the 12 Factor App

It is awesome to see how the original 12 factors introduced in 2012 (at a time when microservices were not mainstream and Docker was not even born) relate to the microservice deployments becoming mainstream today. In 2016, Kevin Hoffman from Pivotal introduced21 three more factors to the original set, which we discuss next.

API First

Anyone coming from an SOA background must be familiar with the two approaches commonly used for service development: contract first and code first. With contract first, we first develop the service interface in a programming-language-independent manner. In the SOAP world, this produces a WSDL (Web Services Description Language) document, and in the REST world, it could be an OpenAPI22 document (formerly known as Swagger). The OpenAPI specification is a powerful definition format for describing RESTful APIs.

With this factor, Kevin highlights the need to start any application development following the API first approach, which is an extension of the contract first approach. In an environment where you have multiple development teams working on multiple microservices under different schedules, having the API defined first for each microservice helps all the teams build their microservices against those APIs. This improves the productivity of the development process, without worrying too much about the implementation details of the other microservices one has to integrate with. If the implementation of a given microservice is not available at the time you are ready to test, you can simply mock it against the published API. There are many tools out there, in different programming languages, to create such mock instances.
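As an illustrative fragment (the path and fields are hypothetical), an API-first team might publish an OpenAPI definition like the following before writing any implementation, so other teams can code and mock against it:

```yaml
openapi: 3.0.0
info:
  title: Order Processing API
  version: 1.0.0
paths:
  /orders/{orderId}:
    get:
      parameters:
        - name: orderId
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: The order details
```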

Telemetry

According to Wikipedia, telemetry is an automated communications process by which measurements and other data are collected at remote or inaccessible points and transmitted to the receiving equipment for monitoring. When it comes to software, this is extremely useful for tracking the health of production servers and identifying any issues. Kevin Hoffman suggests a great analogy to highlight the need for telemetry in applications: think of your applications like unmanned space shuttles launched into space. This analogy is so powerful that it requires no further explanation of the importance of telemetry.

Telemetry data can be categorized into three areas: application performance monitoring, domain-specific data, and health and system logs. The number of HTTP requests, the number of database calls, and the time it takes to serve each request over time are all recorded under the application performance monitoring category. The domain-specific data category records data related to business functions. For example, the Order Processing microservice will push data related to orders being processed, including the number of new orders, open orders, and closed orders by date. The data related to server startup, shutdown, memory consumption, and CPU utilization falls into the health and system logs category.
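A minimal sketch of the application performance monitoring category (the metric names are illustrative): simple counters and timings that a microservice could push to a monitoring backend.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class Telemetry {
    private static final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Increment a named counter, e.g. "http.requests" or "db.calls".
    public static void increment(String metric) {
        counters.computeIfAbsent(metric, k -> new LongAdder()).increment();
    }

    public static long count(String metric) {
        LongAdder adder = counters.get(metric);
        return (adder == null) ? 0 : adder.sum();
    }

    // Time a unit of work, count it, and return its duration in milliseconds.
    public static long time(String metric, Runnable work) {
        long start = System.nanoTime();
        work.run();
        increment(metric);
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

In practice such measurements would be shipped off the host periodically, in the spirit of the unmanned-shuttle analogy, rather than inspected on the server itself.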

Security

This is a notable missing factor in the original 12 factors. Any application or microservice should worry about security from the early stages of its design. There are multiple perspectives on securing microservices. The key driving force behind microservices is the speed to production (or the time to market). One should be able to introduce a change to a service, test it, and instantly deploy it in production. To make sure we do not introduce security vulnerabilities at the code level, we need a proper plan for static code analysis and dynamic testing, and most importantly, those tests should be part of the continuous delivery (CD) process. Any vulnerability should be identified early in the development lifecycle, with short feedback cycles.

There are multiple microservices deployment patterns (which we discuss later in the book, in Chapter 8), but the most commonly used one is the service-per-host model. The host does not necessarily mean a physical machine; most probably it would be a container (Docker). We need to worry about container-level security here: how to isolate a container from other containers, and what level of isolation we have between the container and the host operating system.

And last, but not least, we need to worry about application-level security. The design of the microservice should address how we authenticate users, how we control access, and how we secure the communication channels between microservices. We discuss microservices security in detail in Chapter 11, “Microservices Security Fundamentals” and Chapter 12, “Securing Microservices”.

Summary

In this chapter we discussed the essential concepts related to the design of microservices. The chapter started with a discussion of domain-driven design principles, which are the key ingredients in modeling microservices from the business point of view. Then we delved deep into microservices design principles and finally concluded with the 12-factor app, which is a collection of best practices and guidelines to build and maintain scalable, maintainable, and portable applications.

Any business functionality implemented as a microservice is consumed by external parties via messages. Microservices can leverage messaging styles such as synchronous messaging and asynchronous messaging, based on the business use cases. In the next chapter, we discuss messaging technologies and protocols in detail.
