In Chapter 7, “Integrating Microservices,” we discussed that microservices have to communicate with each other, and that inter-service communication is one of the key challenges in realizing the microservices architecture. In the conventional Service Oriented Architecture (SOA), the centralized Enterprise Service Bus (ESB) facilitated most of the inter-service communication requirements. With the shift to smart endpoints and dumb pipes in the microservices architecture, service developers now have to take care of all the complexities of inter-service communication. Service Mesh has emerged as a pattern to overcome most of these challenges. It does so by providing a generic distributed layer that encapsulates the commodity features of inter-service communication that are not part of the business logic of the service.
Note
Some of the concepts and examples in this chapter may require prior knowledge of Docker and Kubernetes. If you are new to either of these technologies, we recommend you read Chapter 8, “Deploying and Running Microservices” first.
In this chapter, we delve deep into the motivation and key concepts behind the Service Mesh pattern. Then we discuss some of the main Service Mesh implementations with real-world examples.
Why Service Mesh?
The main motivation behind the inception of Service Mesh is the challenges we started to encounter after the elimination of the centralized ESB architecture, which led us to build smart endpoints and dumb pipes.
As with many emerging technologies, there is a lot of hype around the microservices architecture. Most people think that microservices architecture is the answer to all the problems they had with the previous SOA/ESB architecture. However, when we observe real-world microservices implementations, we can see that most of the functionalities that a centralized bus (ESB) supports are now implemented at the microservices level. We are more or less solving the same set of fundamental problems, but we are solving them in a different dimension with microservices.
When you implement the same scenario using microservices, you no longer have a centralized integration/ESB layer, but a set of microservices. You have to implement all these functionalities at each microservice level.
Business Logic implements the business functionalities, computations, and service composition/integration logic.
Network Functions take care of the inter-service communication mechanisms (basic service invocation through a given protocol, apply resiliency and stability patterns, service discovery, and instrumentation for observability). These network functions are built on top of the underlying operating system (OS) level network stack.
Now think about the effort involved in implementing such a microservice. Implementing the functionalities related to service-to-service communication from scratch is a nightmare. Rather than focusing on the business logic, you will have to spend a lot of time building service-to-service communication functionalities. This is even worse if you use multiple technologies to build microservices, because you need to duplicate the same efforts across different languages (e.g., the circuit breaker has to be implemented in Java, Node, Python, etc.).
Since most of the inter-service communication requirements are quite generic across all microservices implementations, we can think about offloading all such tasks to a different layer, so that we can keep the service code independent. That’s where Service Mesh comes into the picture.
What Is a Service Mesh?
In a nutshell, a Service Mesh is an inter-service communication infrastructure. With a Service Mesh, a given microservice won’t directly communicate with the other microservices. Rather, all service-to-service communication takes place on top of a software component called the Service Mesh proxy (or sidecar proxy). A sidecar, or Service Mesh proxy, is a software component that is co-located with the service in the same VM or pod (Kubernetes). The sidecar proxy layer is known as the Data Plane. All these sidecar proxies are controlled via a Control Plane, which is where all the configuration related to inter-service communication is applied.
Sidecar Pattern
A sidecar is a software component that is co-located with the primary application but runs in its own process or container, providing a network interface to connect to it, which makes it language-agnostic. All the core application functionalities are implemented as part of the main application logic, while other commodity crosscutting features, which are not related to the business logic, are facilitated by the sidecar. Usually all the inbound and outbound communication of the application takes place via the sidecar proxy.
Service Mesh provides the built-in support for some network functions such as resiliency, service discovery, etc. Therefore, service developers can focus more on the business logic while most of the work related to the network communication is offloaded to the Service Mesh. For instance, you don’t need to worry about circuit breaking anymore when your microservice calls another service. That comes as part of the Service Mesh.
Service Mesh is language-agnostic. The microservice to Service Mesh proxy communication always happens over standard protocols such as HTTP1.x/2.x, gRPC, etc. You can write your microservices from any technology and they will still work with the Service Mesh.
The service directly communicates with the sidecar proxy, and it should therefore be capable of performing primitive network functions (such as calling an HTTP service), but it does not need to take care of application-level network functions (circuit breakers, etc.).
It’s important to identify the boundaries and responsibilities between a service and a sidecar proxy. As we discussed in Chapter 7, some of the capabilities provided in the Service Mesh are also provided by the microservices development languages. We need to be cautious when implementing a given capability at each layer. Let’s look next at each of these layers and their responsibilities in detail.
Business Logic
The service implementation should contain the realization of the business functionalities of a given service. This includes logic related to its business functions, computations, integration with other services/systems (including legacy, proprietary, and SaaS) or service compositions, complex routing logics, type mapping logic between different business entities, etc.
Primitive Network Functions
Although we offload most of the network functions to the Service Mesh, a given service must contain the basic high-level network interactions to connect with the Service Mesh sidecar proxy. Hence, a given service implementation will have to use some kind of a network library (unlike the ESB world, where you just have to use a very simple abstraction) to initiate network calls (to the Service Mesh proxy only). In most cases, microservices development frameworks embed the required network libraries to be used for these functions (e.g. basic HTTP transport).
Application Network Functions
There are application functionalities that are tightly coupled to the network, such as circuit breaking, timeouts, service discovery, etc. Those are explicitly separated from the service code/business logic, and Service Mesh facilitates those functionalities out-of-the-box.
Most of the initial microservices implementations simply ignored the gravity of the network functions offered from a centralized ESB layer, and they implemented all such functionalities from scratch at each microservice level. Now they have started realizing the importance of having a similar shared functionality as a distributed mesh.
Control Plane
All Service Mesh proxies are centrally managed by a control plane. This is quite useful when supporting Service Mesh capabilities such as access control, observability, service discovery, etc. All the changes you make at the control plane are pushed into sidecar proxies.
Functionalities of a Service Mesh
As we saw earlier, the Service Mesh offers a set of application network functions while the primitive network functions (such as calling the sidecar over the localhost network) are still implemented at the microservices level. There is no hard and fast rule on what functionalities should be offered from a Service Mesh. However, some of the commonly offered features by a Service Mesh are mentioned in the following sections.
Resiliency for Inter-Service Communications
The network communication capabilities, such as circuit-breaking, retries and timeouts, fault injection, fault handling, load balancing, and failover are supported as part of the Service Mesh. With microservices, we used to have such capabilities implemented as part of the service logic. With the Service Mesh in place, you will not have to build such network functions as part of your service code.
Service Discovery
The services that you run with the Service Mesh need to be discoverable via a logical naming scheme (no hardcoded hosts or ports). Therefore, Service Mesh works with a given service discovery tool to support service registration and discovery. Most Service Mesh implementations come with out-of-the-box capabilities to support service discovery. For example, Istio comes with built-in support for service discovery using the underlying Kubernetes and etcd1. If you already have a service discovery solution such as Consul2, it can also be integrated with the Service Mesh.
Routing
Some primitive routing capabilities, such as routing based on certain headers, versions, etc., are supported by the Service Mesh. We have to be really careful about what we implement at the Service Mesh routing layer, so as not to have any business logic as part of the Service Mesh routing logic.
Observability
When you use a Service Mesh, all your services automatically become observable without any changes to your code. Metrics, monitoring, distributed logging, distributed tracing, and service visualizations are available out-of-the-box. Since all the traffic data is captured at the sidecar proxy level, the sidecar proxy can publish that data to the relevant control plane components, which analyze it and publish it to the corresponding observability tools.
Security
Service Mesh supports Transport Layer Security (TLS) for service-to-service communication and Role-Based Access Control (RBAC). Also, existing Service Mesh implementations are constantly adding more security-related capabilities.
Deployment
Almost all Service Mesh implementations are closely integrated with containers and container management systems. Docker and Kubernetes are the de facto standards for deployment options with Service Meshes. However, running inside VMs is also possible.
Inter-Service Communication Protocols
Service Meshes support different communication protocols, such as HTTP1.x, HTTP2, and gRPC. The service has to communicate with the sidecar using the same protocol as the service it wants to proxy to. The Service Mesh takes care of most of the low-level communication details, while your service code uses primitive network capabilities to invoke the sidecar. Now let’s look at some of the popular Service Mesh implementations out there.
Istio
Istio3 is an open platform to connect, manage, and secure microservices. It provides a communication infrastructure for inter-microservices communication, with resiliency, routing, load balancing, service-to-service authentication, observability, and more, without requiring any changes in your service code.
You can add your service to the Istio Service Mesh simply by deploying the Istio sidecar proxy alongside it. As we discussed earlier, the Istio sidecar proxy is responsible for all network communication between microservices, and it is configured and managed using the Istio control plane. Istio’s deployment model is closely tied to Kubernetes, but deploying it on some other systems is also possible.
Istio Architecture
Data plane: The data plane is composed of a set of sidecar proxies that route and control all network communication between microservices. Istio’s data plane is mainly composed of the Envoy proxy (which is developed by Lyft).
Control plane: The control plane is responsible for managing and configuring sidecar proxies to change their network communication behaviors. Control plane is composed of Pilot, Mixer, and Citadel components.
Istio Proxy
In its data plane, Istio uses an enhanced version of the Envoy5 proxy, which is a high-performance proxy developed in C++, to mediate all inbound and outbound traffic for all the services in the Service Mesh. Istio leverages Envoy’s many built-in features such as dynamic service discovery, load balancing, TLS termination, HTTP/2 and gRPC proxying, circuit breakers, health checks, staged rollouts with percentage-based traffic split, fault injection, and rich metrics.
Envoy is deployed as a sidecar alongside your microservice and it takes care of all the ingress and egress network communication of your microservice.
Mixer
Mixer allows you to fully decouple your service/application code from policy decision-making, so that you can move policy decisions out of the application layer into the configuration instead, which is under operator control. The application code instead does a fairly simple integration with Mixer, and Mixer takes responsibility for interfacing with the backend systems.
Mixer provides three main capabilities in the Istio ecosystem: precondition checking, quota management, and telemetry reporting. The Istio proxy sidecar logically calls Mixer before each request to perform precondition checks, and after each request to report telemetry. The sidecar has local caching, such that a large percentage of precondition checks can be served from the cache. Additionally, the sidecar buffers outgoing telemetry so that it calls Mixer only infrequently.
Pilot
Pilot maintains a canonical representation of services in the mesh that is independent of the underlying platform. Pilot abstracts platform-specific service discovery mechanisms and synthesizes them into a standard format consumable by any sidecar that conforms to the Envoy data plane APIs.
Citadel
Citadel provides strong service-to-service and end-user authentication using mutual TLS, with built-in identity and credential management. It can be used to upgrade unencrypted traffic in the Service Mesh and provide operators with the ability to enforce policies based on service identity rather than on network controls.
Note
The scope of this book is to give an introduction to Istio as a Service Mesh implementation. For low-level details and further information on Istio, we recommend you follow the Istio documentation7 and read the book entitled Introducing Istio Service Mesh for Microservices8 by Christian Posta and Burr Sutter.
Using Istio
In this section, we take a closer look at some of the capabilities of Istio with some use cases. We will only cover a selected set of commonly used microservices scenarios; for other scenarios, it is highly recommended that you refer to the Istio official documentation.
Note
Istio examples are heavily dependent on Docker and Kubernetes. Therefore, it is recommended that you read Chapter 8 if you are not familiar with Docker or Kubernetes.
Running Your Services with Istio
Running your microservice with Istio is trivially easy. If you are running your service on Kubernetes, then as the first step you need to create the Docker image for your service. Once you have the Docker image, then you need to create the Kubernetes artifacts to deploy the service.
For example, suppose that you want to develop a simple hello service, and you have created the Docker image and the Kubernetes artifacts to deploy it. In the following Kubernetes descriptor, you can find the configuration for the Kubernetes service and deployment components. In addition, you need to include two Istio-specific configurations, namely VirtualService and Gateway.
A VirtualService defines the rules that control how requests for a service are routed in an Istio Service Mesh. A Gateway configures a load balancer for HTTP/TCP traffic, most commonly operating at the edge of the mesh to enable ingress traffic for an application.
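Such a combined descriptor might be sketched as follows. This is an illustrative example only: the service name (hello-service), image name, and ports are hypothetical, and the Istio resources use the networking.istio.io/v1alpha3 API current at the time of writing.

```yaml
# Kubernetes Service: stable virtual endpoint for the hello-service pods
apiVersion: v1
kind: Service
metadata:
  name: hello-service
spec:
  selector:
    app: hello-service
  ports:
  - port: 80
    targetPort: 8080
---
# Kubernetes Deployment: runs the (hypothetical) hello-service container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-service-v1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-service
      version: v1
  template:
    metadata:
      labels:
        app: hello-service
        version: v1
    spec:
      containers:
      - name: hello-service
        image: example/hello-service:1.0   # illustrative image name
        ports:
        - containerPort: 8080
---
# Istio Gateway: load balancer at the edge of the mesh for ingress traffic
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: hello-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
# Istio VirtualService: routes requests arriving via the Gateway
# to the hello-service Kubernetes service
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: hello-service
spec:
  hosts:
  - "*"
  gateways:
  - hello-gateway
  http:
  - route:
    - destination:
        host: hello-service
        port:
          number: 80
```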
Now you want to deploy this service on Istio. To do that, you need to inject the Istio sidecar into your deployment. This can either be done automatically as part of the Istio installation or as a manual process (for example, with `istioctl kube-inject -f hello-service.yaml | kubectl apply -f -`). To understand the behavior properly, let’s use manual sidecar injection.
That’s all you have to do. Now you can access your service via the NodePort or an ingress (if any), and the traffic flows through the Istio Service Mesh. (If needed, you can verify this by enabling tracing at the Istio level. We discuss how to do that in the next couple of sections.)
Istio’s BookInfo sample use case is comprised of four polyglot services—Product Page, Reviews, Details, and Ratings. Let’s now move on to some of the requirements of this use case for which we can leverage Istio.
Note
You can try most of the following Istio examples by following the guidelines given in the Istio documentation10.
Traffic Management with Istio
When one service calls another service, or when a given service is exposed to external clients, you can apply Istio’s traffic-management capabilities to route the traffic based on different mechanisms. Istio decouples traffic flow from infrastructure scaling, letting you specify via Pilot what rules you want the traffic to follow, rather than which specific pods/VMs should receive traffic. The traffic-management feature also includes dynamic request routing for A/B testing, gradual rollouts, canary releases, failure recovery using timeouts, retries, circuit breakers, and fault injection.
A VirtualService defines a set of traffic routing rules to apply when a host is addressed. Each routing rule defines matching criteria for traffic of a specific protocol. If the traffic is matched, it is sent to a named destination service (or subset/version of it) defined in the registry. The source of traffic can also be matched in a routing rule. This allows routing to be customized for specific client contexts.
A DestinationRule configures the set of policies to be applied to a request after VirtualService routing has occurred. These rules specify configuration for load balancing, connection pool size from the sidecar, and outlier detection settings to detect and evict unhealthy hosts from the load balancing pool.
A ServiceEntry is commonly used to enable requests to services outside of an Istio Service Mesh.
A Gateway configures a load balancer for HTTP/TCP traffic, most commonly operating at the edge of the mesh to enable ingress traffic for an application.
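As a quick illustration of one of these resources, a ServiceEntry that allows mesh services to call an external API might be sketched as follows (the hostname is hypothetical; fields follow the networking.istio.io/v1alpha3 API):

```yaml
# ServiceEntry: registers an external host in Istio's service registry
# so that sidecars permit and manage traffic to it
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-api
spec:
  hosts:
  - api.example.com          # illustrative external hostname
  location: MESH_EXTERNAL    # the service lives outside the mesh
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
```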
Request Routing
Let’s try to build a simple request routing scenario using the Istio’s BookInfo example. The Istio’s BookInfo example consists of four separate microservices, each with multiple versions. Suppose that we need to apply a routing rule that routes all traffic to v1 (version 1) of the Ratings service.
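With the networking.istio.io/v1alpha3 API, such a rule can be sketched as a VirtualService that sends all traffic to the v1 subset, together with a DestinationRule that defines that subset by pod label:

```yaml
# VirtualService: route all traffic for the Ratings service to subset v1
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
---
# DestinationRule: define the v1 subset based on the version label
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ratings
spec:
  host: ratings
  subsets:
  - name: v1
    labels:
      version: v1
```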
Resiliency
As part of resilient inter-service communication techniques, you can use timeouts when calling another service via the Istio sidecar proxy. For example, suppose that you want to apply a timeout when you call the Reviews service of the Istio Bookinfo example. Then you can include the timeout configuration as part of your virtual service created for the Reviews service.
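A sketch of such a timeout configuration for the Reviews service follows; the 0.5-second value is illustrative, and the v2 subset is assumed to be defined by a corresponding DestinationRule:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2     # assumes a DestinationRule defines subset v2
    # Fail the call if the Reviews service does not respond within 0.5s
    timeout: 0.5s
```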
Similarly, there are a number of other service resiliency related capabilities that you can apply when you invoke your services via Istio.
Fault Injection
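Fault injection lets you test the resiliency of your application by deliberately injecting delays or aborts at the mesh level, without changing any service code. For example, a fixed delay can be injected into all requests to the Ratings service. This is a sketch using the networking.istio.io/v1alpha3 API; the delay value and match-all scope are illustrative:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        # Inject a fixed 7-second delay into 100% of requests
        percent: 100
        fixedDelay: 7s
    route:
    - destination:
        host: ratings
        subset: v1   # assumes a DestinationRule defines subset v1
```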
Policy Enforcement
We introduced Istio Mixer earlier in this chapter as one of the main components of Istio, which is responsible for policy enforcement and telemetric collection. Let’s look at how we can leverage policy enforcement with respect to rate limiting.
For example, suppose that you need to configure Istio to rate-limit traffic to the Product Page service based on the IP address of the originating client. You will use the X-Forwarded-For request header as the client IP address.
This memquota handler defines three different rate-limit schemes. The default, if no overrides match, is 500 requests per one second (1s). The first override is 1 request (maxAmount) every 5s (validDuration) if the destination is reviews.
This QuotaSpecBinding binds the QuotaSpec you created to the services you want to apply it to. The Product Page service is explicitly bound to request-count. Note that you must define the namespace since it differs from the namespace of the QuotaSpecBinding.
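Mixer’s rate-limiting configuration evolved across Istio releases; as a sketch in the config.istio.io/v1alpha2 style described in the text, the pieces fit together roughly like this (quota names and namespaces are illustrative):

```yaml
# memquota handler: default of 500 requests per 1s, with an override
# of 1 request per 5s when the destination is the reviews service
apiVersion: config.istio.io/v1alpha2
kind: memquota
metadata:
  name: handler
  namespace: istio-system
spec:
  quotas:
  - name: requestcount.quota.istio-system
    maxAmount: 500
    validDuration: 1s
    overrides:
    - dimensions:
        destination: reviews
      maxAmount: 1
      validDuration: 5s
---
# quota instance: dimensions used to match overrides; the client IP
# is taken from the x-forwarded-for header, as described in the text
apiVersion: config.istio.io/v1alpha2
kind: quota
metadata:
  name: requestcount
  namespace: istio-system
spec:
  dimensions:
    source: request.headers["x-forwarded-for"] | "unknown"
    destination: destination.labels["app"] | destination.service | "unknown"
---
# rule: dispatch the quota instance to the memquota handler
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: istio-system
spec:
  actions:
  - handler: handler.memquota
    instances:
    - requestcount.quota
---
# QuotaSpec: each request consumes one unit of the requestcount quota
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpec
metadata:
  name: request-count
  namespace: istio-system
spec:
  rules:
  - quotas:
    - charge: 1
      quota: requestcount
---
# QuotaSpecBinding: apply the quota to the Product Page service;
# the service namespace must be stated explicitly because it differs
# from the QuotaSpecBinding's own namespace
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
  name: request-count
  namespace: istio-system
spec:
  quotaSpecs:
  - name: request-count
    namespace: istio-system
  services:
  - name: productpage
    namespace: default
```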
Observability
When you are using Istio, making your services observable is trivially easy. For example, suppose that you want to enable distributed tracing for your microservices application. Then you need to install the corresponding add-on (such as Zipkin11 or Jaeger12) into your Istio installation. Now when you send requests to your microservices, they will go through the sidecar proxies. Sidecar proxies are capable of sending tracing spans automatically to either Zipkin or Jaeger.
Although Istio proxies send spans automatically, they need some hints to tie the entire trace together. Applications need to propagate the appropriate HTTP headers (such as x-request-id and the B3 headers x-b3-traceid, x-b3-spanid, x-b3-parentspanid, and x-b3-sampled) so that when the proxies send span information to Zipkin or Jaeger, the spans can be correlated correctly into a single trace.
Similarly, other aspects of observability can also be supported with no or minimal changes to your code. Refer to the Istio documentation13 for more details on how you can make your services observable with Istio. We discuss observability in detail, with respect to microservices, in Chapter 13, “Observability”.
Security
Istio provides several security-related capabilities out of the box, including:
Mutual TLS (mTLS) authentication between services
Whitelisting and blacklisting of services
Access control with denials
Role-Based Access Control (RBAC)
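As an illustration of the first capability, with the authentication.istio.io/v1alpha1 API of the Istio versions current at the time of writing, namespace-wide mutual TLS can be sketched as an authentication Policy plus a matching client-side DestinationRule (the namespace is illustrative):

```yaml
# Require mTLS for all workloads in the default namespace
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: default
  namespace: default
spec:
  peers:
  - mtls: {}
---
# Configure clients in the namespace to originate mTLS connections
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: default
  namespace: default
spec:
  host: "*.default.svc.cluster.local"
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
```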
For further details on these security use cases, refer to the Istio documentation14.
Now that you have had a closer look at Istio, let’s look at another popular Service Mesh implementation, Linkerd.
Linkerd
Linkerd is an open source network proxy designed to be deployed as a Service Mesh: a dedicated layer for managing, controlling, and monitoring service-to-service communication within an application.
Linkerd takes care of the difficult, error-prone parts of cross-service communication—including latency-aware load balancing, connection pooling, TLS, instrumentation, and request-level routing. Linkerd is built on top of Netty and Finagle.
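As a sketch of how Linkerd’s configuration looks in practice, Linkerd 1.x supports file-based service discovery via the io.l5d.fs namer: each file under a discovery directory names a service and lists its host/port instances. Paths, ports, and the service name are illustrative:

```yaml
# linkerd.yaml (excerpt)
namers:
# File-based service discovery: each file under ./disco names a service
- kind: io.l5d.fs
  rootDir: disco

routers:
- protocol: http
  # Resolve /svc/<name> requests through the file-based namer
  dtab: |
    /svc => /#/io.l5d.fs;
  servers:
  - port: 4140
    ip: 0.0.0.0

# The file disco/web would be a plain-text file with one
# "host port" entry per service instance, e.g.:
#   127.0.0.1 9999
```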
With the previous file-based service discovery configuration, Linkerd can resolve the host and port (9999) for the service web.
failureAccrual is configured under the client section of Linkerd’s configuration, so failures related to the backends on that route are tracked per client; hosts that fail repeatedly are temporarily removed from the load-balancing pool, which makes the communication resilient.
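A sketch of such a failureAccrual policy, using Linkerd 1.x’s consecutive-failures accrual (the threshold and backoff values are illustrative):

```yaml
routers:
- protocol: http
  client:
    # Mark a backend host as dead after 5 consecutive failures,
    # and retry it only after a constant 10-second backoff
    failureAccrual:
      kind: io.l5d.consecutiveFailures
      failures: 5
      backoff:
        kind: constant
        ms: 10000
```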
Should We Use Service Mesh?
At the time this book was written, there was a lot of traction toward using Service Mesh. However, production usage of Service Mesh was still rare. Therefore, we don’t yet have a sufficient understanding of the real pros and cons. Let’s look at some of the key areas that we need to be mindful of when using a Service Mesh.
Pros
Developers can focus more on business functionality than on inter-service communication: Most of the commodity features are implemented outside microservice code and are reusable.
Observability out of the box: Services are innately observable via the sidecar. Hence, distributed tracing, logging, metrics, etc. require no additional effort from the service developer.
Polyglot-services friendly: More freedom when it comes to selecting a microservices implementation language. You don’t need to worry about whether a given language supports or has libraries to build network application functions.
Centrally managed decentralized system: Most of the capabilities can be managed centrally via the control plane and pushed into decentralized sidecar runtimes.
Kubernetes friendly: If you are already running Kubernetes in your enterprise, adopting Service Mesh into your architecture is quite straightforward.
Cons
Complexity: Having a Service Mesh drastically increases the number of runtime instances that you have in a given microservices implementation.
Adding extra hops: Each service call has to go through an extra hop (through the Service Mesh sidecar proxy).
Service Meshes address a subset of problems: Service Mesh only addresses a subset of inter-service communication problems, and there are a lot of complex problems it doesn’t address, such as complex routing, transformation/type mapping, and integrating with other services and systems. These must be solved by your microservice’s business logic.
Immature: Service Mesh technologies are still too new to be declared fully production-ready for large-scale deployments.
Summary
In this chapter, we took a detailed look at the concept of a Service Mesh and the key reasons behind its inception. Service Mesh tries to simplify some of the inter-service communication complexities and service governance requirements. When we use a Service Mesh, developers don’t need to worry about inter-service communication and most of the other crosscutting features of a service, such as security, observability, etc. Each microservice runs with a co-located sidecar, which is controlled by a central control plane. The sidecars are controlled using a predefined configuration, which is pushed via the control plane. Istio is one of the most commonly used implementations of the Service Mesh. The Service Mesh concept is relatively new and yet to be fully battle-tested, so we need to be very aware of its pros and cons.