Chapter 11. Operations and Continuous Delivery of Microservices

Deployment and operation are additional components of the continuous delivery pipeline (see section 10.1). When the software has been tested in the context of the pipeline, the microservices go into production. There, monitoring and logging collect information that can be used for the further development of the microservices.

The operation of a microservice-based system is more laborious than the operation of a deployment monolith. There are many more deployable artifacts that all have to be surveilled. Section 11.1 discusses the typical challenges associated with the operation of microservice-based systems in detail. Logging is the topic of section 11.2. Section 11.3 focuses on the monitoring of the microservices. Deployment is treated in section 11.4. Section 11.6 shows necessary measures for directing a microservice from the outside, and finally, section 11.7 describes suitable infrastructures for the operation of microservices.

The challenges associated with operation should not be underestimated. It is in this area where the most complex problems associated with the use of microservices frequently arise.

11.1 Challenges Associated with the Operation of Microservices

There are a number of challenges associated with the operation of microservices. The main challenges are covered in this section.

Numerous Artifacts

Teams that have so far only run deployment monoliths are confronted with the problem that microservice-based systems consist of many additional deployable artifacts. Each microservice is brought into production independently and is therefore a separate deployable artifact. Fifty, one hundred, or more microservices are definitely possible. The concrete number depends on the size of the project and the size of the microservices. Such a number of deployable artifacts is rarely encountered outside of microservice-based architectures. All these artifacts have to be versioned independently, because only then can the code currently running in production be tracked. Besides, this enables bringing a new version of each microservice into production independently.

When there are so many artifacts, there has to be a correspondingly high number of continuous delivery pipelines. They comprise not only the deployment in production but also the different testing phases. In addition, many more artifacts have to be surveilled in production by logging and monitoring. This is only possible when all these processes are mostly automated. For a small number of artifacts, manual interventions might still be acceptable. Such an approach is simply not possible any more for the large number of artifacts contained in a microservice-based architecture.

The challenges in the areas of deployment and infrastructure are the most difficult ones encountered when introducing microservices. Many organizations are not sufficiently proficient in automation, although automation is also very advantageous in other architectural approaches and should already be routine.

There are different approaches for achieving the necessary automation.

Delegate into Teams

The easiest option is to delegate this challenge to the teams that are responsible for the development of the microservices. In that case each team not only has to develop its microservice but also has to take care of its operation. The teams have the choice to either implement appropriate automation themselves or to adopt automation approaches from other teams.

The team does not even have to cover all areas. When the team believes it can achieve reliable operation without evaluating log data, it can decide not to implement a system for evaluating log data. A reliable operation without surveilling the log output is hardly possible, though. However, this risk is then within the responsibility of the respective team.

This approach only works when the teams have a lot of knowledge regarding operation. Another problem is that the wheel is reinvented over and over by the different teams: each team implements automation independently and might use different tools to do so. This approach entails the danger that the already laborious operation of the microservices becomes even more laborious due to the heterogeneous approaches taken by the teams. Moreover, the teams have to do this work themselves, which interferes with the rapid implementation of new features. However, the decentralized decision about which technologies to use increases the independence of the teams.

Unify Tools

Because of the higher efficiency, unification can be a sensible approach for deployment. The easiest way to obtain uniform tools is to prescribe one tool for each area: deployment, test, monitoring, logging, and the deployment pipeline. In addition, there can be guidelines and best practices such as immutable servers or the separation of build environment and deployment environment. This enables all microservices to be implemented identically and facilitates operation, since the teams only need to be familiar with one tool for each area.

Specify Behavior

Another option is to specify the behavior of the system. For example, when log output is supposed to be evaluated in a uniform manner across services, it is sufficient to define a uniform log format. The log framework does not necessarily have to be prescribed. Of course, it is sensible to offer a configuration that generates this output format for at least one log framework. This increases the motivation of the teams to use this log framework. In this way uniformity is not forced but emerges on its own since the teams will minimize their own effort. When a team regards the use of another log framework or programming language that necessitates another log framework as more advantageous, it can still use these technologies.

Defining uniform formats for log output has an additional advantage: the information can be delivered to different tools that process log files differently. This enables operations to screen log files for errors while the business stakeholders create statistics. Operation and business stakeholders can use different tools that use the uniform format as shared basis.

Similarly, behavior can be defined for the other areas of operation such as deployment, monitoring, or the deployment pipeline.

Micro and Macro Architecture

Which decisions can be made by the team and which have to be made for the overall project correspond to the separation of the architecture into micro and macro architecture (see section 12.3). Decisions the team can make belong to micro architecture while decisions that are made across all teams for the overall project are part of the macro architecture. Technologies or the desired behavior for logging can be either part of the macro or the micro architecture.

Templates

Templates offer the option to unify microservices in these areas and to increase the productivity of the teams. Based on a very simple microservice, a template demonstrates how the technologies can be used and how microservices are integrated into the operation infrastructure. The example can simply respond to a request with a constant value since the domain logic is not the point here.

The template makes it easy and fast for a team to implement a new microservice. At the same time, each team can easily make use of the standard technology stack. So the uniform technical solution is at the same time the most attractive one for the teams. Templates achieve a large degree of technical uniformity between microservices without prescribing the technology used. In addition, incorrect use of the technology stack is avoided when the template demonstrates the correct use.

A template should contain the complete infrastructure in addition to the code for an exemplary microservice. This refers to the continuous delivery pipeline, the build, the continuous integration platform, the deployment in production, and the necessary resources for running the microservice. The build and the continuous delivery pipeline are especially important, since the deployment of a large number of microservices is only possible when these are automated.

The template can be very complex when it really contains the complete infrastructure—even if the respective microservice is very simple. It is not necessarily required to provide a complete and perfect solution at once. The template can also be built up in a stepwise manner.

The template can be copied into each project. This entails the problem that changes to the template are not propagated into the existing microservices. On the other hand, this approach is much easier to implement than an approach that enables the automated adoption of changes. Besides, such an approach would create dependencies between the template and practically all microservices. Such dependencies should be avoided for microservices.

Templates fundamentally facilitate the creation of new microservices. Accordingly, teams are more likely to create new microservices and can more easily split a microservice into multiple smaller ones. Thus templates help to keep microservices small. When the microservices are rather small, the advantages of a microservice-based architecture can be exploited even better.

11.2 Logging

By logging, an application can easily provide information about which events occurred. These can be errors, but they can also be events like the registration of a new user that are mostly interesting for statistics. Finally, log data can help developers to locate errors by providing detailed information.

In normal systems logs have the advantage that they can be written very easily and that the data can be persisted without huge effort. Besides, log files are human-readable and can be easily searched.

Logging for Microservices

For microservices writing and analyzing log files is hardly sufficient:

• Many requests can only be handled by the interplay of multiple microservices. In that case the log file of a single microservice is not sufficient to understand the complete sequence of events.

• The load is often distributed across multiple instances of one microservice. Therefore, the information contained in the log file of an individual instance is not very useful.

• Finally, due to increased load, new releases, or crashes, new instances of a microservice start constantly. The data from a log file can get lost when a virtual machine is shut down and its hard disk is subsequently deleted.

It is not necessary for microservices to write logs into their file system, because the information cannot be analyzed there anyhow. Only writing to the central log server is definitely necessary. This also has the advantage that the microservices utilize less local storage.

Usually, applications just log text strings. The centralized logging parses these strings and extracts relevant pieces of information like time stamps or server names. Often parsing goes even beyond that and scrutinizes the texts more closely. If it is possible, for instance, to determine the identity of the current user from the logs, all information about a user can be selected from the log data of the microservices. In a way, the microservice hides the relevant information in a string that the log system subsequently takes apart again. To facilitate parsing, log data can be transferred in a data format like JSON. In that case the data is already structured during logging and is not first packaged into a string that then has to be laboriously parsed. Likewise, it is sensible to have uniform standards: when a microservice logs something as an error, then an error should really have occurred. In addition, the semantics of the other log levels should be uniform across all microservices.
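As a sketch of such structured logging, the following Python example emits every log entry as one JSON object per line instead of a plain string, so a central log server can index the fields without parsing. The service name and the field names are invented for the example:

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "service": "order-service",  # illustrative service name
            "level": record.levelname,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# emits e.g. {"timestamp": "...", "service": "order-service",
#             "level": "INFO", "message": "new user registered"}
logger.info("new user registered")
```

Because the format is uniform, any log framework in any language can be configured to produce the same JSON fields.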

Technologies for Logging via the Network

Microservices can support central logging by sending log data directly via the network. Most log libraries support such an approach. Special protocols like GELF (Graylog Extended Log Format)1 can be used for this, or long-established protocols like syslog, which is the basis for logging in UNIX systems. Tools like the logstash-forwarder,2 Beaver,3 or Woodchuck4 send local log files via the network to a central log server. They are sensible in cases where the log data is also supposed to be stored locally in files.

1. https://www.graylog.org/

2. https://github.com/elastic/logstash-forwarder

3. https://github.com/python-beaver/python-beaver

4. https://github.com/danryan/woodchuck
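To illustrate logging via the network, the following sketch sends a minimal, uncompressed GELF message over UDP. The host, port, and extra field are assumptions for the example, not prescriptions from the text:

```python
import json
import socket

def send_gelf(host, port, short_message, level=6, **extra):
    """Send one uncompressed GELF message over UDP (a minimal sketch)."""
    payload = {
        "version": "1.1",
        "host": socket.gethostname(),
        "short_message": short_message,
        "level": level,  # syslog severity, 6 = informational
    }
    # GELF requires custom fields to be prefixed with an underscore
    payload.update({"_" + key: value for key, value in extra.items()})
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(json.dumps(payload).encode("utf-8"), (host, port))
    sock.close()
    return payload

# 12201 is the usual UDP port of a GELF input in Graylog or Logstash
send_gelf("127.0.0.1", 12201, "new user registered", user_id="42")
```

In practice, a GELF appender of the log library would take over this work; the sketch only shows how little the protocol demands.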

ELK for Centralized Logging

Logstash, Elasticsearch, and Kibana can serve as tools for the collection and processing of logs on a central server (see Figure 11.1). These tools form the ELK stack (Elasticsearch, Logstash, Kibana).

• With the aid of Logstash5 log files can be parsed and collected from servers in the network. Logstash is a very powerful tool. It can read data from a source, modify or filter the data, and finally write it into a sink. Apart from importing logs from the network and storing them in Elasticsearch, Logstash supports many other data sources and data sinks. For example, data can be read from message queues or databases or written into them. Finally, Logstash can also parse data and supplement it; for example, time stamps can be added to each log entry, or individual fields can be cut out and further processed.

5. https://www.elastic.co/products/logstash
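As a sketch of such a source-filter-sink pipeline (the ports and hosts are assumptions for the example), a minimal Logstash configuration that reads GELF messages from the network, enriches each event, and writes it into Elasticsearch could look like this:

```
input {
  # receive GELF messages from the microservices via UDP
  gelf { port => 12201 }
}
filter {
  # enrich every event with a static field
  mutate { add_field => { "environment" => "production" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```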

• Elasticsearch6 stores log data and makes it available for analyses. Elasticsearch can not only search the data with full text search; it can also search individual fields of structured data and permanently store the data like a database. Finally, Elasticsearch offers statistical functions and can use them to analyze data. As a search engine, Elasticsearch is optimized for fast response times so that the data can be analyzed quasi-interactively.

6. https://www.elastic.co/products/elasticsearch


Figure 11.1 ELK Infrastructure for Log Analysis

• Kibana7 is a web user interface that enables the analysis of data from Elasticsearch. In addition to simple queries, statistical evaluations, visualizations, and diagrams can be created.

7. https://www.elastic.co/products/kibana

All three tools are open source projects and are available under the Apache 2.0 license.

Scaling ELK

Especially in case of microservices, log data often accumulates in large amounts. Therefore, in microservice-based architectures the system for the central processing of logs should be highly scalable. Good scalability is one of the advantages of the ELK stack:

• Elasticsearch can distribute the indices into shards. Each data set is stored in a single shard. As the shards can be located on different servers, this makes load balancing possible. In addition, shards can be replicated across several servers to improve the fail-safety of the system. Besides, a read access can be directed to an arbitrary replica of the data. Therefore, replicas can serve to scale read access.

• Logstash can write logs into different indices. Without additional configuration, Logstash writes the data for each day into a different index. Since current data is usually read more frequently, this reduces the amount of data that has to be searched for a typical request and therefore improves performance. Besides, there are other possibilities for distributing the data across indices, for instance according to the geographic origin of the user. This also helps to optimize the amount of data that has to be searched.

• Log data can be buffered in a broker prior to processing by Logstash. The broker serves as a buffer: it stores the messages when there are so many log messages that they cannot be immediately processed. Redis8 is often used as the broker; it is a fast in-memory database.

8. http://redis.io/

Graylog

The ELK stack is not the only solution for the analysis of log files. Graylog9 is also an open source solution and likewise utilizes Elasticsearch for storing log data. Besides, it uses MongoDB for metadata. Graylog defines its own format for log messages: the already mentioned GELF (Graylog Extended Log Format) standardizes the data that is transmitted via the network. For many log libraries and programming languages there are extensions for GELF. Likewise, the respective information can be extracted from log files, or it can be delivered via the UNIX tool syslog. Logstash also supports GELF as in- and output format, so Logstash can be combined with Graylog. Graylog has a web interface that makes it possible to analyze the information from the logs.

9. https://www.graylog.org/

Splunk

Splunk10 is a commercial solution that has been on the market for a long time. Splunk presents itself as a solution that not only analyzes log files but can generally analyze machine data and big data. For processing logs, Splunk gathers the data via forwarders, prepares it for searching via an indexer, and search heads take over the processing of search requests. Its intention to serve as an enterprise solution is underlined by its security concept. Customized analyses are possible, as are alerts in case of certain problems. Splunk can be extended by numerous plugins. Besides, there are apps that provide ready-made solutions for certain infrastructures, such as Microsoft Windows Server. The software does not necessarily have to be installed in your own computing center but is also available as a cloud solution.

10. http://www.splunk.com/

Stakeholders for Logs

There are different stakeholders for logging. However, the analysis options of the log servers are so flexible and the analyses so similar that one tool is normally sufficient. The stakeholders can create their own dashboards with the information that is relevant to them. For specific requirements the log data can be passed on to other systems for evaluation.

Correlation IDs

Often multiple microservices work together on a request. The path the request takes through the microservices has to be traceable for analysis. For filtering all log entries belonging to a certain customer or a certain request, a correlation ID can be used. This ID unambiguously identifies a request to the overall system and is passed along during all communication between microservices. In this manner, the log entries of all systems for a single request are easy to find in the central log system, and the processing of the request can be tracked across all microservices.

Such an approach can, for instance, be implemented by transferring a request ID with each message, within the headers or within the payloads. Many projects implement the transfer in their own code without using a framework. For Java there is the library tracee,11 which implements the transfer of the IDs. Some log frameworks support a context that is logged together with each log message. In that case it is only necessary to put the correlation ID into the context when receiving a message. This eliminates the need to pass the correlation ID on from method to method. However, when the correlation ID is bound to the thread, problems can arise when the processing of a request involves several threads. Setting the correlation ID in the context ensures that all log messages contain the correlation ID. How the correlation ID is logged has to be uniform across all microservices so that the search for a request in the logs works across all microservices.

11. https://github.com/tracee/tracee
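One way to implement such a context is sketched below in Python: a context variable holds the ID of the current request, and a logging filter copies it into every log record. The header name `X-Correlation-Id` and the service name are assumptions for the example:

```python
import contextvars
import logging
import sys
import uuid

# context variable holding the correlation ID of the current request;
# unlike thread-local storage it also works across asyncio tasks
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Copy the correlation ID from the context into every log record."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s [%(correlation_id)s] %(levelname)s %(message)s",
)
logger = logging.getLogger("order-service")
logger.addFilter(CorrelationFilter())

def handle_request(headers):
    """Reuse the ID of the incoming call or start a new trace."""
    cid = headers.get("X-Correlation-Id", str(uuid.uuid4()))
    correlation_id.set(cid)
    logger.info("processing order")  # automatically carries the ID
    # the same ID must be set on the headers of all outgoing calls
    return {"X-Correlation-Id": cid}
```

Because every microservice logs the ID in the same bracketed position, a single search in the central log system finds all entries for one request.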

Zipkin: Distributed Tracing

Performance evaluations also have to be made across microservices. When the complete path of a request is traceable, it can be identified which microservice represents a bottleneck and requires an especially long time for processing requests. With the aid of distributed tracing, it can be determined which microservice needs how much time for answering a request and where optimization should start. Zipkin12 enables exactly this type of investigation.13 It comprises support for different network protocols so that a request ID is automatically passed on via these protocols. In contrast to correlation IDs, the objective is not to correlate log entries but to analyze the time behavior of the microservices. For this purpose, Zipkin offers suitable analysis tools.

12. https://github.com/openzipkin/zipkin

13. https://blog.twitter.com/2012/distributed-systems-tracing-with-zipkin


Try and Experiment

• Define a technology stack that enables a microservice-based architecture to implement logging:

How should the log messages be formatted?

Define a logging framework if necessary.

Determine a technology for collecting and evaluating logs.

This section listed a number of tools for the different areas. Which properties are especially important? The objective is not a complete product evaluation, but a general weighing of advantages and disadvantages.

Chapter 13, “Example of a Microservice-Based Architecture,” shows an example for a microservice-based architecture, and in section 13.15 there are suggestions about how the architecture can be supplemented with a log analysis.

• How does your current project handle logging? Is it possible to implement parts of these approaches and technologies in your project also?


11.3 Monitoring

Monitoring surveils the metrics of a microservice and uses information sources other than logging. Monitoring uses mostly numerical values that provide information about the current state of the application and indicate how this state changes over time. Such values can represent the number of processed calls over a certain time, the time needed for processing the calls, or also system values like the CPU or memory utilization. If certain thresholds are surpassed or not reached, this indicates a problem and can trigger an alarm so that somebody can solve the problem. Or even better: The problem is solved automatically. For example, an overload can be addressed by starting additional instances.

Monitoring offers feedback from production that is not only relevant for operation but also for developers or the users of the system. Based on the information from monitoring they can better understand the system and therefore make informed decisions about how the system should be developed further.

Basic Information

Basic monitoring information should be mandatory for all microservices. This makes it easier to get an overview of the state of the system. All microservices should deliver the required information in the same format. Besides, components of the microservice system can likewise use the values. Load balancing, for instance, can use a health check to avoid accessing microservices that cannot process calls.

The basic values all microservices should provide can comprise the following:

• There should be a value that indicates the availability of the microservice. In this manner the microservice signals whether it is capable of processing calls at all (“alive”).

• Detailed information regarding the availability of the microservice is another important metric. One relevant piece of information is whether all microservices used by the microservice are accessible and whether all other resources are available (“health”). This information not only indicates whether the microservice functions but also provides hints about which part of a microservice is currently unavailable and why it failed. Importantly, it becomes apparent whether the microservice is unavailable because of the failure of another microservice or because the respective microservice itself is having a problem.

• Information about the version of a microservice and additional meta information like the contact partner or the libraries used and their versions as well as other artifacts can also be provided as metrics. This can cover part of the documentation (see section 7.15). Alternatively, it can be checked which version of the microservice is actually currently in production. This facilitates the search for errors. Besides, an automated continuous inventory of the microservices and other software used is possible, which simply queries these values.
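These basic values can be exposed via a small HTTP endpoint. The following sketch builds a JSON health document and serves it under `/health`; the service name, version, contact, and dependency names are invented for the example:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# static metadata; in a real service this would come from the build process
SERVICE_INFO = {"name": "order-service", "version": "1.3.2",
                "contact": "team-order@example.com"}

def check_dependencies():
    """Report the state of every used resource (all assumed healthy here)."""
    return {"customer-service": "UP", "database": "UP"}

def health_document():
    dependencies = check_dependencies()
    status = "UP" if all(v == "UP" for v in dependencies.values()) else "DOWN"
    return {"status": status, "dependencies": dependencies, **SERVICE_INFO}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(health_document()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# HTTPServer(("", 8080), HealthHandler).serve_forever() would start the endpoint
```

A load balancer can poll such an endpoint and take an instance out of rotation when the status is not "UP".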

Additional Metrics

Additional metrics can likewise be recorded by monitoring. Among the possible values are, for instance, response times, the frequency of certain errors, or the number of calls. These values are usually specific for a microservice so that they do not necessarily have to be offered by all microservices. An alarm can be triggered when certain thresholds are reached. Such thresholds are different for each microservice.

Nevertheless, a uniform interface for accessing the values is sensible when all microservices are supposed to use the same monitoring tool. Uniformity can reduce expenditure tremendously in this area.

Stakeholders

There are different stakeholders for the information from monitoring:

• Operations wants to be informed about problems in a timely manner to enable smooth operation of the microservice. In case of acute problems or failures it wants to get an alarm, at any time of day or night, via different means like a pager or SMS. Detailed information is only necessary when the error has to be analyzed more closely, often together with the developers. Operations is interested not only in observing the values from the microservice itself, but also in monitoring values of the operating system, the hardware, or the network.

• Developers mostly focus on information from the application. They want to understand how the application functions in production and how it is utilized by the users. From this information they deduce optimizations, especially at the technical level. Therefore, they need very specific information. If the application is, for instance, too slow in responding to a certain type of call, the system has to be optimized for this type of call. To do so it is necessary to obtain as much information as possible about exactly this type of call. Other calls are not as interesting. Developers evaluate this information in detail. They might even be interested in analyzing calls of just one specific user or a circle of users.

• The business stakeholders are interested in the business success and the resulting business numbers. Such information can be provided by the application specifically for the business stakeholders. The business stakeholders then generate statistics based on this information and therefore prepare business decisions. On the other hand, they are usually not interested in technical details.

The different stakeholders are not only interested in different values but also analyze them differently. Standardizing the data format is sensible to support different tools and enables all stakeholders to access all data.

Figure 11.2 shows an overview of a possible monitoring setup for a microservice-based system. The microservice offers the data via a uniform interface. Operations uses monitoring to surveil, for instance, threshold values. Development utilizes detailed monitoring to understand processes within the application. Finally, the business stakeholders look at the business data. The individual stakeholders might use more or less similar approaches: they can, for instance, use the same monitoring software with different dashboards, or entirely different software.


Figure 11.2 Stakeholders and Their Monitoring Data

Correlate with Events

In addition, it can be sensible to correlate data with an event, such as a new release. This requires that information about the event has to be handed over to monitoring. When a new release creates markedly more revenue or causes decisively longer response times, this is an interesting realization.

Monitoring = Tests?

In a certain way monitoring is another version of testing (see section 10.4). While tests look at the correct functioning of a new release in a test environment, monitoring examines the behavior of the application in a production environment. The integration tests should also be reflected in monitoring. When a problem causes an integration test to fail, there can be an associated alarm in monitoring. Besides, monitoring should also be activated for test environments to pinpoint problems already in the tests. When the risk associated with deployments is reduced by suitable measures (see section 11.4), the monitoring can even take over part of the tests.

Dynamic Environment

Another challenge when working with microservice-based architectures is that microservices come and go. During the deployment of a new release, an instance can be stopped and started anew with a new software version. When servers fail, instances shut down, and new ones are started. For this reason, monitoring has to run separately from the microservices. Otherwise, stopping a microservice would influence the monitoring infrastructure or might even cause it to fail. Besides, microservices form a distributed system: the values of a single instance are not meaningful on their own. Only by collecting the values of multiple instances does the monitoring information become relevant.

Concrete Technologies

Different technologies can be used for monitoring microservices:

• Graphite14 can store numerical data and is optimized for processing time-series data. Such data occurs frequently during monitoring. The data can be analyzed in a web application. Graphite stores the data in its own database. After some time, the data is automatically deleted. Monitoring values are accepted by Graphite in a very simple format via a socket interface.

14. http://graphite.wikidot.com/
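Graphite's plaintext protocol takes one line per value: a metric path, the value, and a UNIX timestamp, sent to the Carbon listener (TCP port 2003 by default). A sketch with an invented metric name:

```python
import socket
import time

def graphite_line(metric, value, timestamp=None):
    """Format one value in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (metric, value, timestamp)

def send_to_graphite(metric, value, host="127.0.0.1", port=2003):
    """Send one metric value to a Graphite/Carbon socket."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(graphite_line(metric, value).encode("ascii"))

# e.g. send_to_graphite("shop.order.count", 42)
```

The simplicity of this format is the reason practically any microservice, in any language, can deliver values to Graphite.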

• Grafana15 extends Graphite by alternative dashboards and other graphical elements.

15. http://grafana.org/

• Seyren16 extends Graphite by a functionality for triggering alarms.

16. https://github.com/scobal/seyren

• Nagios17 is a comprehensive solution for monitoring and can be an alternative to Graphite.

17. http://www.nagios.org/

• Icinga18 originally was a fork of Nagios and therefore covers a very similar use case.

18. https://www.icinga.org/

• Riemann19 focuses on the processing of event streams. It uses a functional programming language to define logic for the reaction to certain events. For this purpose, a fitting dashboard can be configured. Messages can be sent by SMS or email.

19. http://riemann.io/

• Packetbeat20 uses an agent that records the network traffic on the computer to be monitored. This enables Packetbeat to determine with minimal effort which requests take how long and which nodes communicate with each other. It is especially interesting that Packetbeat uses Elasticsearch for data storage and Kibana for analysis. These tools are also widely used for analyzing log data (see section 11.2). Having only one stack for the storage and analysis of logs and monitoring reduces the complexity of the environment.

20. https://www.elastic.co/products/beats

• In addition, there are different commercial tools. Among those are HP’s Operations Manager,21 IBM Tivoli,22 CA Opscenter23 and BMC Remedy.24 These tools are very comprehensive, have been on the market for a long time, and offer support for many different software and hardware products. Such platforms are often used enterprise-wide, and introducing them into an organization is usually a very complex project. Some of these solutions can also analyze and monitor log files. Due to their large number and the high dynamics of the environment, it can be sensible for microservices to establish their own monitoring tools, even if an enterprise-wide standard exists already. When the established processes and tools require a high manual expenditure for administration, this expenditure might not be feasible any more in the face of the large number of microservices and the dynamics of the microservice environment.

21. http://www8.hp.com/us/en/software-solutions/operations-manager-infrastructure-monitoring/

22. http://www-01.ibm.com/software/tivoli/

23. http://www3.ca.com/us/opscenter.aspx

24. http://www.bmc.com/it-solutions/remedy-itsm.html

• Monitoring can be moved to the Cloud. In this manner no extra infrastructure has to be installed, which facilitates the introduction of monitoring tools and the monitoring of the applications. An example is New Relic.25

25. http://newrelic.com/

These tools are, first of all, useful for operations and for developers. Business monitoring can be performed with different tools. Such monitoring is not only based on current trends and data, but also on historical values. Therefore, the amount of data is markedly larger than for operations and development. The data can be exported into a separate database or investigated with big data solutions. In fact, the analysis of data from web servers is one of the areas where big data solutions were first used.

Enabling Monitoring in Microservices

Microservices have to deliver data that is displayed in the monitoring solutions. It is possible to provide the data via a simple interface like HTTP with a data format such as JSON. Then the monitoring tools can read out and import the data. For this purpose, the developers can write adaptors as scripts. This makes it possible to supply different tools with data via the same interface.
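As a sketch of such a uniform interface, the following class uses only the JDK’s built-in HTTP server to publish counters as JSON under /metrics. This is an illustration, not a real library; the metric names and the endpoint path are invented for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

// Sketch: a microservice exposes its metrics as JSON over HTTP so that
// arbitrary monitoring tools (or small adaptor scripts) can poll them.
public class MetricsEndpoint {

    // Sorted, thread-safe map of counter values.
    private final Map<String, AtomicLong> counters = new ConcurrentSkipListMap<>();

    public void increment(String name) {
        counters.computeIfAbsent(name, n -> new AtomicLong()).incrementAndGet();
    }

    // Render the current counter values as a JSON object.
    public String toJson() {
        return counters.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + e.getValue().get())
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Serve the metrics at /metrics using the JDK's built-in HTTP server.
    public HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = toJson().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A monitoring tool or an adaptor script can then poll the endpoint periodically and forward the values.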

Metrics

In the Java world, the Metrics26 framework can be used. It offers functionality for recording custom values in the application and handing them over to a monitoring tool.

26. https://github.com/dropwizard/metrics

StatsD

StatsD27 can collect values from different sources, perform calculations, and hand over the results to monitoring tools. This makes it possible to condense data before it is passed on, which reduces the load on the monitoring tool. There are also many client libraries that facilitate the sending of data to StatsD.

27. https://github.com/etsy/statsd
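StatsD’s line protocol is deliberately simple: a metric is a plain-text UDP datagram of the form name:value|type, where the type is “c” for counters, “g” for gauges, and “ms” for timing values. The following sketch is not an official client; the metric names are invented for the example.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Sketch of a minimal StatsD client using the documented line protocol.
public class StatsdClient {

    private final String host;
    private final int port;

    public StatsdClient(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // "name:value|c" increments a counter.
    static String counter(String name, long value) {
        return name + ":" + value + "|c";
    }

    // "name:value|ms" records a timing value in milliseconds.
    static String timing(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    // Fire-and-forget: UDP keeps the overhead in the microservice minimal,
    // at the price that lost datagrams go unnoticed.
    public void send(String line) throws Exception {
        byte[] data = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length,
                    InetAddress.getByName(host), port));
        }
    }
}
```

Because the protocol is so simple, clients like this exist for practically every language, which fits the technology freedom of microservices.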

collectd

collectd28 collects statistics about a system—for instance, the CPU utilization. The data can be analyzed with one of the available front ends or stored in monitoring tools. collectd can collect data from an HTTP JSON data source and send it on to the monitoring tool. Via different plugins, collectd can collect data from the operating system and the basic processes.

28. https://collectd.org/

Technology Stack for Monitoring

A technology stack for monitoring comprises different components (see Figure 11.3):

• Within the microservice itself data has to be recorded and provided to monitoring. For this purpose, a library can be used that directly contacts the monitoring tool. Alternatively, the data can be offered via a uniform interface—for example, JSON via HTTP—and another tool collects the data and sends it on to the monitoring tool.


Figure 11.3 Parts of a Monitoring System

• In addition, if necessary, there should be an agent to record the data from the operating system and the hardware and pass it on to monitoring.

• The monitoring tool stores and visualizes the data and can, if needed, trigger an alarm. Different aspects can be covered by different monitoring applications.

• For analyses of historical data or analyses with complex algorithms, a separate solution based on big data tools can be created in parallel.

Effects on the Individual Microservice

A microservice also has to be integrated into the infrastructure. It has to hand over monitoring data to the monitoring infrastructure and provide some mandatory data. This can be ensured by a suitable template for the microservice and by tests.


Try and Experiment

• Define a technology stack that enables the implementation of monitoring in a microservice-based architecture. To do so, define the stakeholders and the data that is relevant for each of them. Each stakeholder needs a tool for analyzing the data that is relevant for him/her. Finally, define with which tools the data is recorded and how it is stored. This section listed a number of tools for the different areas. In conjunction with further research it is possible to assemble a technology stack that is well suited for an individual project.

Chapter 13 shows an example for a microservice-based architecture, and in section 13.15 there is also a suggestion about how the architecture can be extended by monitoring. How does your current project handle monitoring? Can some of the technologies presented in this section also be advantageous for your project? Which? Why?


11.4 Deployment

Independent deployment is a central aim of microservices. Besides, the deployment has to be automated because manual deployment or even just manual corrections are not feasible due to the large number of microservices.

Deployment Automation

There are different possibilities for automating deployment:

• Installation scripts can be used that only install the software on the computer. Such scripts can, for instance, be implemented as shell scripts. They can install necessary software packages, generate configuration files, and create user accounts. Such scripts can be problematic when they are called repeatedly. In that case the installation finds a computer on which the software is already installed. However, an update is different from a fresh installation. In such a situation a script can fail—for example, because user accounts or configuration files are already present and cannot easily be overwritten. When the scripts are supposed to handle updates as well, developing and testing them gets more laborious.

• Immutable servers are an option to handle these problems. Instead of updating the software on the servers, each server is completely deployed anew. This facilitates not only the automation of deployment but also the exact reproduction of the software installed on a server: only fresh installations have to be considered. A fresh installation is easier to reproduce than an update, which can start from many different configuration states and should lead to the same state from any of them. Approaches like Docker29 make it possible to tremendously reduce the expenditure for installing software. Docker is a kind of lightweight virtualization. It also optimizes the handling of virtual hard drives: if there is already a virtual hard drive with the correct data, it is reused instead of installing the software anew. When installing a package like Java, Docker first looks for a virtual hard drive that already contains this installation. Only when none exists is the installation really performed. Should only a configuration file change when going from an old to a new version of an immutable server, Docker will reuse the old virtual hard drives behind the scenes and only supplement the new configuration file. This does not only reduce the consumption of hard drive space but also profoundly speeds up the installation of the servers. Docker also decreases the time a virtual machine needs for booting. These optimizations turn immutable servers in conjunction with Docker into an interesting option: the new deployment of the servers is very fast, and the new server can also be booted rapidly.

29. https://www.docker.com/

• Other possibilities are tools like Puppet,30 Chef,31 Ansible,32 or Salt.33 They are specialized for installing software. Scripts for these tools describe what the system is supposed to look like after the installation. During an installation run the tool will take the necessary steps to transfer the system into the desired state. During the first run on a fresh system the tool completely installs the software. If the installation is run a second time immediately afterwards, it will not change the system any further since the system is already in the desired state. Besides, these tools can uniformly install a large number of servers in an automated manner and are also able to roll out changes to a large number of servers.

30. http://puppetlabs.com/

31. https://www.chef.io/

32. http://www.ansible.com/

33. http://www.saltstack.com/

• Operating systems from the Linux area possess package managers like rpm (RedHat), dpkg (Debian/Ubuntu), or zypper (SuSE). They make it possible to centrally roll out software onto a large number of servers. The file formats used are very simple, so that it is easy to generate a package in a fitting format. The configuration of the software poses a problem, though. Package managers usually support scripts that are executed during installation. Such scripts can generate the necessary configuration files. Alternatively, there can be an extra package with the individual configuration for each host. The installation tools mentioned in the previous bullet point can also use package managers for installing the actual software so that they themselves only generate the configuration files.
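To illustrate the declarative style of tools like Puppet, Chef, Ansible, or Salt, a hypothetical Ansible playbook might describe the desired state of a server as follows. The host group, package, file, and service names are invented for the example; running such a playbook twice is safe because the tool only changes what deviates from the described state.

```yaml
# Desired state, not installation steps: Ansible compares the actual
# system with this description and only performs the missing changes.
- hosts: order-service            # hypothetical host group
  tasks:
    - name: install Java runtime
      apt:
        name: openjdk-7-jre-headless
        state: present
    - name: deploy configuration file
      template:
        src: application.conf.j2
        dest: /opt/app/application.conf
    - name: ensure the service is running
      service:
        name: order-service
        state: started
```

The same playbook can be rolled out to a large number of servers, which is exactly the scenario these tools are built for.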

Installation and Configuration

Section 7.10 already described tools that can be used for configuring microservices. In general, it is hard to separate the installation from the software configuration. The installation has to generate a configuration. Therefore, many of the tools such as Puppet, Chef, Ansible, or Salt can also create configurations and roll them out onto servers. Thus, these solutions are an alternative to the configuration solutions that are specialized for microservices.

Risks Associated with Microservice Deployments

Microservices are supposed to make easy and independent deployment possible. Nevertheless, it can never be excluded that problems arise in production. The microservice-based architecture by itself already helps to reduce the risk. When a microservice fails as a result of a problem with a new version, this failure should be limited to the functionality of this microservice. Apart from that, the system should keep working. This is made possible by the stability patterns and resilience described in section 9.5. Already for this reason the deployment of a microservice is much less risky than the deployment of a monolith. In the case of a monolith it is much harder to limit a failure to a certain functionality: if a new version of the deployment monolith has a memory leak, the entire process will break down, so that the entire monolith will not be available any more. A memory leak in a microservice only influences this microservice. There are challenges for which microservices are not helpful per se: schema changes in relational databases, for instance, are problematic because they often take very long and might fail—especially when the database already contains a lot of data. However, as microservices have their own data storage, a schema migration is at least always limited to just one microservice.

Deployment Strategies

To further reduce the risk associated with a microservice deployment there are different strategies:

• A rollback brings the old version of a microservice back into production. Handling the database can be problematic: often the old version of the microservice does not work anymore with the database schema created by the newer version. When there is already data in the database that uses the new schema, it can get very difficult to recreate the old state without losing the new data. Besides, the rollback is hard to test.

• A roll forward brings a new version of a microservice into production that no longer contains the error. The procedure is identical to the deployment of any other new version of the microservice, so that no special measures are necessary. The change is rather small, so that the deployment and the passage through the continuous delivery pipeline should take place rapidly.

• Continuous deployment is even more radical: each change to a microservice is brought into production once it has successfully passed the continuous delivery pipeline. This further reduces the time necessary for the correction of errors. Besides, there are fewer changes per release, which further decreases the risk and makes it easier to track which code changes caused a problem. Continuous deployment is the logical consequence when the deployment process works so well that going into production is just a formality. Moreover, the team will pay more attention to the quality of their code when each change really goes into production.

• A blue/green deployment builds up a completely new environment with the new version of a microservice. The team can completely test the new version and then bring it into production. Should problems occur, the old version, which is kept for this purpose, can be used again. Also in this scenario there are challenges in case of changes to the database schema: when switching from one version of the microservice to the other, the database has to be switched as well. Data that has been written into the old database between the build-up of the new environment and the switch has to be transferred into the new database.

• Canary releasing is based on the idea of deploying the new version initially on just one server in a cluster. When the new version runs without trouble there, it can also be deployed on the other servers. The database has to support the old and the new version of the microservice in parallel.

• Microservices can also run blindly in production. In that case they receive all requests, but they may not change data, and the calls they send out are not passed on. By monitoring, log analyses, and comparison with the old version, it is possible to determine whether the new service has been implemented correctly.
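For canary releasing, the routing between old and new version can be handled by the load balancer. A hypothetical nginx configuration fragment might direct roughly one request in ten to the canary instance; all host and upstream names here are invented for the example.

```nginx
# Hypothetical load-balancer fragment: weight 9 vs. weight 1 means
# roughly ten percent of the traffic reaches the canary instance
# running the new version of the microservice.
upstream order-service {
    server order-v1.internal:8080 weight=9;
    server order-v2-canary.internal:8080 weight=1;
}
```

When monitoring shows that the canary behaves correctly, the weights can gradually be shifted until the new version receives all traffic.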

Theoretically, such procedures can also be implemented with deployment monoliths. However, in practice this is very difficult. With microservices it is easier since they are much smaller deployment units. Microservices require less comprehensive tests. Installing and starting microservices is much faster. Therefore, microservices can more rapidly pass through the continuous delivery pipeline into production. This will have positive effects for roll forward or rollback because problems require less time to fix. A microservice needs fewer resources in operation. This is helpful for canary releasing or blue/green deployment since new environments have to be built up. If this is possible with fewer resources, these approaches are easier to implement. For a deployment monolith it is often very difficult to build up an environment at all.

11.5 Combined or Separate Deployment? (Jörg Müller)

by Jörg Müller, Hypoport AG

The question of whether different services are rolled out together or independently from each other is more relevant than sometimes suspected. We learned this in the context of a project that started approximately five years ago.

The term “microservices” was not yet important in our industry. However, achieving a good modularization was our goal right from the start. The entire application consisted initially of a number of web modules coming in the shape of typical Java web application archives (WAR). These comprised in turn multiple modules that had been split based on domain as well as technical criteria. In addition to modularization we relied from the start on continuous deployment as a method for rolling out the application. Each commit goes straight into production.

Initially, it seemed an obvious choice to build an integrated deployment pipeline for the entire application. This enabled integration tests across all components. A single version for the entire application ensured controlled behavior, even if multiple components of the application were changed simultaneously. Finally, the pipeline itself was easier to implement. The latter was an important reason: since there were relatively few tools for continuous deployment at the time, we had to build most of it ourselves.

However, after some time the disadvantages of our approach became obvious. The first consequence was a longer and longer run time of our deployment pipeline. The larger the number of components that were built, tested, and rolled out, the longer the process took. The advantages of continuous deployment rapidly diminished as the run time of the pipeline became longer. The first countermeasure was an optimization: only changed components were built and tested. However, this increased the complexity of the deployment pipeline tremendously. At the same time, other problems, like the run time for changes to central components or the size of the artifacts, could not be improved this way.

But there was also a subtler problem. A combined rollout with integrative tests offered a strong security net. It was easy to perform refactorings across multiple modules. However, this often changed interfaces between modules just because it was so easy to do. This is, in principle, a good thing. However, it had the consequence that it became very frequently necessary to start the entire system. Especially when working on the developer machine, this turned into a burden. The requirements for the hardware got very high, and the turnaround times lengthened considerably.

The approach got even more complicated when more than one team worked with this integrated pipeline. The more components were tested in one pipeline, the more frequently errors were uncovered. This blocked the pipeline since the errors had to be fixed first. At the time when only one team was dependent on the pipeline, it was easy to find somebody who took over responsibility and fixed the problem. When there were several teams, this responsibility was not so clear any more. This meant that errors in the pipeline persisted for a longer time. Simultaneously, the variety of technologies increased. Again, the complexity rose. This pipeline now needed very specialized solutions. Therefore, the expenditure for maintenance increased, and the stability decreased. The value of continuous deployment got hard to put into effect.

At this time it became obvious that the combined deployment in one pipeline could not be continued any more. All new services, regardless of whether they were microservices or larger modules, now had their own pipeline. However, it caused a lot of expenditure to separate the previous pipeline that was based on shared deployment into multiple pipelines.

In a new project it can be the right decision to start with a combined deployment. This especially holds true when the borders between the individual services and their interfaces are not yet well known. In such a case good integrative tests and simple refactoring can be very useful. However, starting at a certain size an independent deployment is obligatory. Indications for this are the number of modules or services, the run time and stability of the deployment pipeline, and, last but not least, how many teams work on the overall system. If these indications are overlooked and the right point in time to separate the deployment is missed, it can easily happen that one builds a monolith that consists of many small microservices.

11.6 Control

Interventions in a microservice might be necessary at run time. For instance, a problem with a microservice might require restarting the respective microservice. Likewise, a start or a stop of a microservice might be necessary. These are ways for operation to intervene in case of a problem or for a load balancer to terminate instances that cannot process requests any more.

Different measures can be used for control:

• When a microservice runs in a virtual machine, the virtual machine can be shut down or restarted. In that case the microservice itself does not have to make special arrangements.

• The operating system supports services that are started together with the operating system. Usually, services can also be stopped, started, or restarted by means of the operating system. In that case the installation only has to register the microservice as a service. Working with services is nothing unusual for operations, so this approach is usually sufficient.

• Finally, an interface can be used that enables restarting or shutting down, for instance via REST. Such an interface has to be implemented by the microservice itself. This is supported by several libraries in the microservices area—for instance by Spring Boot, which is used to implement the example in Chapter 13. Such an interface can be called with simple HTTP tools like curl.
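A minimal sketch of such a self-implemented control interface, built only on the JDK’s HTTP server rather than Spring Boot, might look as follows. The endpoint path is chosen for the example, and a real implementation would of course terminate the process after responding.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a control interface implemented by the microservice itself.
// A POST to /shutdown marks the service as shutting down; a real
// implementation would then stop accepting requests and exit.
public class ControlEndpoint {

    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private HttpServer server;

    public boolean isShuttingDown() {
        return shuttingDown.get();
    }

    // Start the control interface; returns the actual port (useful when
    // an ephemeral port 0 is requested).
    public int start(int port) throws Exception {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/shutdown", exchange -> {
            if ("POST".equals(exchange.getRequestMethod())) {
                shuttingDown.set(true);
                exchange.sendResponseHeaders(200, -1); // 200, no body
            } else {
                exchange.sendResponseHeaders(405, -1); // method not allowed
            }
            exchange.close();
        });
        server.start();
        return server.getAddress().getPort();
    }

    public void stop() {
        server.stop(0);
    }
}
```

Operations can then trigger a shutdown with a simple HTTP tool, for instance curl with a POST request to the /shutdown path.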

Technically, the implementation of control mechanisms is not a big problem, but they have to be present for operating the microservices. When they are identically implemented for all microservices, this can reduce the expenditure for operating the system.

11.7 Infrastructure

Microservices have to run on a suitable platform. It is best to run each microservice in a separate virtual machine (VM). Otherwise it is difficult to assure an independent deployment of the individual microservices.

When multiple microservices run on a virtual machine, the deployment of one microservice can influence another microservice. The deployment can generate a high load or introduce changes to the virtual machine that also concern other microservices running on the virtual machine.

Besides, microservices should be isolated from each other to achieve a better stability and resilience. When multiple microservices are running on one virtual machine, one microservice can generate so much load that the other microservices fail. However, precisely that should be prevented: When one microservice fails, this failure should be limited to this one microservice and not affect additional microservices. The isolation of virtual machines is helpful for limiting the failure or the load to one microservice.

Scaling microservices is likewise easier when each microservice runs in an individual virtual machine. When the load is too high, it is sufficient to start a new virtual machine and register it with the load balancer.

In case of problems it is also easier to analyze the error when all processes on a virtual machine belong to one microservice. Each metric on the system then unambiguously belongs to this microservice.

Finally, the microservice can be delivered as hard drive image when each microservice runs on its own virtual machine. Such a deployment has the advantage that the entire environment of the virtual machine is exactly in line with the requirements of the microservice and that the microservice can bring along its own technology stack up to its own operating system.

Virtualization or Cloud

It is hardly possible to install new physical hardware upon the deployment of a new microservice. Besides, microservices profit from virtualization or a Cloud, since this renders the infrastructures much more flexible. New virtual machines for scaling or testing environments can easily be provided. In the continuous delivery pipeline microservices are constantly started to perform different tests. Moreover, in production new instances have to be started depending on the load.

Therefore, it should be possible to start a new virtual machine in a completely automated manner. Starting new instances with simple API calls is exactly what a Cloud offers. A cloud infrastructure should be available in order to really be able to implement a microservice-based architecture. Virtual machines that are provided by operation via manual processes are not sufficient. This also demonstrates that microservices can hardly be run without modern infrastructures.

Docker

When there is an individual virtual machine for each microservice, it is laborious to generate a test environment containing all microservices. Even creating an environment with relatively few microservices can be a challenge for a developer machine, since the usage of RAM and CPU is very high for such an environment. In fact, it is hardly sensible to use an entire virtual machine for one microservice. In the end, the microservice should just run and integrate into logging and monitoring. Therefore, solutions like Docker are convenient: a Docker container does without many of the features a complete operating system normally provides.

Instead, Docker34 offers a very lightweight virtualization. For this purpose Docker uses different technologies:

34. https://www.docker.com/

• In place of a complete virtualization Docker employs Linux Containers.35 Support for similar mechanisms in Microsoft Windows has been announced. This enables the implementation of a lightweight alternative to virtual machines: all containers use the same kernel, and there is only one instance of the kernel in memory. Processes, networks, file systems, and users are separate from each other. In comparison to a virtual machine with its own kernel and often also many operating system services, a container has a profoundly lower overhead. It is easily possible to run hundreds of Linux containers on a simple laptop. Besides, a container starts much more rapidly than a virtual machine with its own kernel and complete operating system: the container does not have to boot an entire operating system; it just starts a new process. The container itself does not add a lot of overhead since it only requires a custom configuration of the operating system resources.

35. https://linuxcontainers.org/

• In addition, the file system is optimized: basic read-only file systems can be used. At the same time additional file systems can be added to the container, which also enables writing. One file system can be put on top of another file system. For instance, a basic file system can be generated that contains an operating system. If software is installed in the running container or if files are modified, the container only has to store these additional files in a small container-specific file system. In this way the memory requirement for the containers on the hard drive is significantly reduced.

Besides, additional interesting possibilities arise: for example, a basic file system can be started with an operating system, and subsequently software can be installed. As mentioned, only the changes to the file system that are introduced by the installation of the software are saved. Based on this delta a file system can be generated. Then a container can be started that puts a file system with this delta on top of the basic file system containing the operating system—and afterwards additional software can be installed in yet another layer. In this manner each “layer” in the file system can contain specific changes. The real file system at run time is composed from numerous such layers. This enables reusing software installations very efficiently.

Figure 11.4 shows an example for the file system of a running container: The lowest level is an Ubuntu Linux installation. On top there are changes that have been introduced by installing Java. Then there is the application. For the running container to be able to write changes into the file system, there is a file system on top into which the container writes files. When the container wants to read a file, it will move through the layers from top to bottom until it finds the respective data.


Figure 11.4 Filesystems in Docker
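Such a layered image is typically described by a Dockerfile, where each instruction creates one file-system layer. The following hypothetical example mirrors the structure of Figure 11.4 (base system, Java, application); image, package, and file names are invented for the illustration.

```dockerfile
# Each instruction creates a file-system layer. If only the application
# or its configuration changes, Docker reuses the cached base and Java
# layers and rebuilds just the layers below them.
FROM ubuntu:14.04

# This layer is cached and shared by all images installing the same JDK.
RUN apt-get update && apt-get install -y openjdk-7-jre-headless

# Changes here invalidate only these layers, not the JDK layer above.
COPY application.jar /opt/app/application.jar
COPY application.conf /opt/app/application.conf

CMD ["java", "-jar", "/opt/app/application.jar"]
```

At run time Docker adds a writable layer on top, into which the running container writes its files.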

Docker Container versus Virtualization

Docker containers offer a very efficient alternative to virtualization. However, they are not “real” virtualization: each container has separate resources, its own memory, and its own file systems, but all containers share, for instance, one kernel. Therefore, this approach has some disadvantages. A Docker container can only use Linux and only the same kernel as the host operating system—consequently Windows applications, for instance, cannot be run on a Linux machine this way. The separation of the containers is not as strict as in the case of real virtual machines; an error in the kernel would, for example, affect all containers. Moreover, Docker does not run natively on Mac OS X or Windows. Nevertheless, Docker can be installed on these platforms; behind the scenes a virtual machine with Linux is used. Microsoft has announced a version for Windows that can run Windows containers.

Communication between Docker Containers

Docker containers have to communicate with each other. For example, a web application communicates with its database. For this purpose, containers export network ports that other containers use. Besides, file systems can be shared: one container writes data that other containers can read.

Docker Registry

Docker images comprise the data of a virtual hard drive. Docker registries enable saving and downloading Docker images. This makes it possible to save Docker images as result of a build process and subsequently to roll them out on servers. Because of the efficient storage of images, it is easily possible to distribute even complex installations in a performant manner. Besides, many cloud solutions can directly run Docker containers.

Docker and Microservices

Docker constitutes an ideal running environment for microservices. It hardly limits the technology used, as every type of Linux software can run in a Docker container. Docker registries make it possible to easily distribute Docker containers. At the same time the overhead of a Docker container is negligible in comparison to a normal process. Since microservices require a multitude of virtual machines, these optimizations are very valuable. On the one hand, Docker is very efficient, and on the other hand, it does not limit the technology freedom.


Try and Experiment

• At https://docs.docker.com/engine/getstarted/ the Docker online tutorial can be found. Complete the tutorial—it demonstrates the basics of working with Docker. The tutorial can be completed quickly.


Docker and Servers

There are different possibilities to use Docker for servers:

• On a Linux server Docker can be installed, and afterwards one or multiple Docker containers can be run. Docker then serves as a solution for provisioning the software. For a cluster, new servers are started on which, again, the Docker containers are installed. In this case Docker only serves for the installation of the software on the servers.

• Docker containers are run directly on a cluster. Which physical computer a certain Docker container is located on is decided by the software for cluster administration. Such an approach is supported by the scheduler Apache Mesos.36 It administrates a cluster of servers and directs jobs to the respective servers. Mesosphere37 enables the running of Docker containers with the aid of the Mesos scheduler. Besides, Mesos supports many additional kinds of jobs.

36. http://mesos.apache.org/

37. http://mesosphere.com/

• Kubernetes38 likewise supports the execution of Docker containers in a cluster. However, the approach taken is different from Mesos. Kubernetes offers a service that distributes pods in the cluster. Pods are interconnected Docker containers, which are supposed to run on a physical server. As basis Kubernetes requires only a simple operating system installation—Kubernetes implements the cluster management.

38. http://kubernetes.io/

• CoreOS39 is a very lightweight server operating system. With etcd it supports the cluster-wide distribution of configurations. fleetd enables the deployment of services in a cluster—including redundant installation, failure security, dependencies, and shared deployment on a node. All services have to be deployed as Docker containers while the operating system itself remains essentially unchanged.

39. http://coreos.com/

• Docker Machine40 enables the installation of Docker on different virtualization and cloud systems. Besides, Docker Machine can configure the Docker command line tool in such a manner that it communicates with such a system. Together with Docker Compose41 multiple Docker containers can be combined into an overall system. The example application employs this approach—compare section 13.6 and section 13.7. Docker Swarm42 adds a way to configure and run clusters with this tool stack: individual servers can be installed with Docker Machine and combined into a cluster with Docker Swarm. Docker Compose can run each Docker container on a specific machine in the cluster.

40. https://docs.docker.com/machine/

41. http://docs.docker.com/compose/

42. http://docs.docker.com/swarm/
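A Compose file along these lines could describe such an overall system. This is a sketch using the older Compose file format without a version key, as used by Docker Compose at the time; the service and image names are placeholders:

```yaml
# Hypothetical docker-compose.yml: two microservices combined
# into one overall system.
order:
  image: example/order:1.0     # placeholder image name
  links:
    - customer                 # order calls the customer service
  ports:
    - "8080:8080"              # expose the order service
customer:
  image: example/customer:1.0  # placeholder image name
```

Running `docker-compose up` then starts both containers and wires them together; against a Swarm cluster, Compose can place each container on a specific machine.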

Kubernetes, CoreOS, Docker Compose, Docker Machine, Docker Swarm, and Mesos, of course, influence how the software is run, so these solutions require changes to the operation procedures compared to virtualization. These technologies solve challenges that were previously addressed by virtualization solutions. Modern virtualization technologies run virtual machines on a node in a cluster and handle the cluster management themselves. The container technologies mentioned above distribute containers across the cluster instead. Cluster handling is thus done by different software, which requires a fundamental change in the operations procedures.

PaaS

PaaS (platform as a service) is based on a fundamentally different approach. Deploying an application can be as simple as updating it in version control: the PaaS fetches the changes, builds the application, and rolls it out onto the servers. These servers are installed by the PaaS and represent a standardized environment. The actual infrastructure, that is, the virtual machines, is hidden from the application. The PaaS offers a standardized environment for the application. The environment also takes care of scaling, for instance, and can offer services such as databases and messaging systems. Because of the uniform platform, PaaS systems limit the technology freedom that is normally an advantage of microservices: only technologies supported by the PaaS can be used. On the other hand, deployment and scaling are further facilitated.
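With a Heroku-style PaaS, for example, the application only declares how it is started; the platform builds and rolls it out after every push to version control. The following Procfile is a sketch of this idea; the jar name is made up:

```
# Hypothetical Procfile: tells the PaaS which process to start.
web: java -jar order-microservice.jar --server.port=$PORT
```

A deployment is then triggered simply by pushing to the platform's Git remote (for instance, `git push heroku master` on Heroku); the PaaS detects the change, builds the application, and distributes it onto its standardized servers.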

Microservices impose high demands on the infrastructure. Automation is an essential prerequisite for operating the numerous microservices. A PaaS offers a good basis for this since it profoundly facilitates automation. Using a PaaS can be especially sensible when developing home-grown automation would be too laborious and there is not enough knowledge about how to build the necessary infrastructure. However, the microservices have to restrict themselves to the features offered by the PaaS. When the microservices have been developed for the PaaS from the start, this is not very laborious. However, if existing microservices have to be ported, considerable expenditure can ensue.

Nanoservices (Chapter 14, “Technologies for Nanoservices”) have different operating environments, which, for example, restrict the technology choice even further. On the other hand, they are often even easier to operate and even more efficient with regard to resource usage.

11.8 Conclusion

Operating a microservice-based system is one of the central challenges of working with microservices (section 11.1). Such a system contains a tremendous number of microservices and therefore of operating system processes. Fifty or one hundred virtual machines are no rarity. Responsibility for operation can be delegated to the teams; however, this approach creates a higher overall expenditure. Standardizing operations is the more sensible strategy. Templates are one way to achieve uniformity without exerting pressure: they turn the uniform approach into the easiest one.

For logging (section 11.2) a central infrastructure has to be provided that collects the logs from all microservices. There are different technologies available for this. To trace a call across the different microservices, a correlation ID can be used that unambiguously identifies each call.
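The correlation ID idea can be sketched in a few lines. This is an illustrative example, not a specific library's API; the header name "X-Correlation-Id" is a common convention rather than a standard:

```python
import logging
import uuid

def handle_request(headers):
    """Sketch: reuse the caller's correlation ID if present,
    otherwise start a new one, log it, and pass it downstream."""
    correlation_id = headers.get("X-Correlation-Id") or str(uuid.uuid4())
    # Every log entry of this call carries the same ID, so the
    # central log infrastructure can reassemble the whole call.
    logging.info("processing request [correlation-id=%s]", correlation_id)
    # Hand the same ID on to any downstream microservices.
    return {"X-Correlation-Id": correlation_id}

outgoing = handle_request({"X-Correlation-Id": "abc-123"})
print(outgoing["X-Correlation-Id"])  # abc-123
```

Because every microservice logs the ID and forwards it unchanged, a search for one correlation ID in the central log store yields the complete trace of a call.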

Monitoring (section 11.3) has to offer at least basic information such as the availability of the microservice. Additional metrics can, for instance, provide an overview of the overall system or can be useful for load balancing. Metrics can be individually defined for each microservice. There are different stakeholders for the monitoring: operations, developers, and business stakeholders. They are interested in different values and use, where necessary, their own tools for evaluating the microservices' data. Each microservice has to offer an interface with which the different tools can fetch values from the application. This interface should be identical for all microservices.

The deployment of microservices (section 11.4) has to be automated. Simple scripts, especially in conjunction with immutable servers, special deployment tools, or package managers can be used for this purpose.

Microservices are small deployment units. They are safeguarded by stability and resilience patterns against the failure of other microservices. Therefore, the risk associated with deployments is already reduced by the microservice-based architecture itself. Strategies like rollback, roll forward, continuous deployment, blue/green deployment, or letting a new version run along in production without exposing it to users can further reduce the risk. Such strategies are easy to implement with microservices since the deployment units are small and the resource consumption of microservices is low. Therefore, deployments are faster, and environments for blue/green deployment or canary releasing are much easier to provide.

Control (section 11.6) comprises simple intervention options like starting, stopping, and restarting of microservices.

Virtualization or the Cloud are good infrastructure options for microservices (section 11.7). Only a single microservice should run on each VM to achieve better isolation, stability, and scaling. Docker is especially interesting because a Docker container consumes far fewer resources than a VM. This makes it possible to give each microservice its own Docker container even when the number of microservices is large. PaaS systems are likewise interesting: they enable very simple automation. However, they also restrict the choice of technologies.

This chapter focused only on the specifics of continuous delivery and operation in a microservices environment. Continuous delivery is one of the most important reasons for introducing microservices. At the same time, operation poses the biggest challenges.

Essential Points

• Operation and continuous delivery are central challenges for microservices.

• The microservices should handle monitoring, logging, and deployment in a uniform manner. This is the only way to keep the effort reasonable.

• Virtualization, Cloud, PaaS, and Docker are interesting infrastructure alternatives for microservices.
