Microservices are truly distributed systems with a fluid deployment topology. Without sophisticated monitoring in place, operations teams may run into trouble managing large-scale microservices. Traditional monolithic application deployments are limited to a known number of services, instances, machines, and so on, which is easier to manage than a large number of microservice instances potentially running across different machines. To complicate matters further, these services dynamically change their topologies. A centralized logging capability addresses only part of the issue. It is important for operations teams to understand the runtime deployment topology as well as the behavior of the systems, and this demands more than centralized logging can offer.
In general, application monitoring is a collection of metrics, their aggregation, and their validation against certain baseline values. If there is a service-level breach, monitoring tools generate alerts and send them to administrators. With hundreds or thousands of interconnected microservices, traditional monitoring does not really offer true value. A one-size-fits-all approach to monitoring, or monitoring everything through a single pane of glass, is not easy to achieve in large-scale microservices.
One of the main objectives of microservice monitoring is to understand the behavior of the system from a user experience point of view. This will ensure that the end-to-end behavior is consistent and is in line with what is expected by the users.
Similar to the fragmented logging issue, the key challenge in monitoring microservices is that there are many moving parts in a microservice ecosystem.
The typical issues are summarized here:
Many of the traditional monitoring tools are good at monitoring monolithic applications but fall short when monitoring large-scale, distributed, interlinked microservice systems. Many traditional monitoring systems are agent based and preinstall agents on the target machines or application instances. This poses two challenges:
Many traditional tools need baseline metrics. Such systems work with preset rules, such as: if CPU utilization goes above 60% and remains at that level for 2 minutes, then an alert should be sent to the administrator. It is extremely hard to preconfigure these values in large, Internet-scale deployments.
New-generation monitoring applications learn the application's behavior by themselves and set automatic threshold values. This frees up administrators from doing this mundane task. Automated baselines are sometimes more accurate than human forecasts:
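As a rough illustration of the difference, a preset rule hard-codes its threshold, whereas a self-learning monitor derives the threshold from observed behavior. The following sketch is purely illustrative (the class and method names are not from any particular monitoring product): it flags a sample as anomalous when it exceeds a baseline learned from a rolling window.

import java.util.ArrayDeque;
import java.util.Deque;

class AutoBaseline {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int windowSize;

    AutoBaseline(int windowSize) { this.windowSize = windowSize; }

    // Returns true if the new sample deviates more than 3 sigma from the learned baseline.
    boolean isAnomaly(double sample) {
        boolean anomaly = false;
        if (window.size() == windowSize) {
            double mean = window.stream().mapToDouble(Double::doubleValue).average().orElse(0);
            double variance = window.stream()
                    .mapToDouble(v -> (v - mean) * (v - mean)).average().orElse(0);
            double threshold = mean + 3 * Math.sqrt(variance);   // learned, not preconfigured
            anomaly = sample > threshold;
            window.removeFirst();
        }
        window.addLast(sample);
        return anomaly;
    }
}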
As shown in the diagram, the key areas of microservices monitoring are:
Metrics collection: this is done either by running agents on the source machines, by streaming data from the sources, or by polling at regular intervals (a minimal collection sketch follows this list).
Aggregation and correlation of metrics: generally, this is done by an intermediary that accepts the metrics.
Processing metrics and generating actionable insights: these tools may use big data and stream analytics solutions.
Alerting and dashboards: dashboards and alerting tools are capable of handling these requirements.
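As a minimal sketch of the collection and reporting step, the following uses the Dropwizard Metrics library purely as an illustration; the metric names and the console reporter are assumptions, and a real deployment would push to a central collector instead of the console.

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

public class MetricsCollectionSketch {
    public static void main(String[] args) throws Exception {
        MetricRegistry registry = new MetricRegistry();

        // Report collected metrics every 10 seconds; in a real setup this reporter
        // would stream to the central aggregator rather than the console.
        ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build();
        reporter.start(10, TimeUnit.SECONDS);

        Timer searchTimer = registry.timer("search.latency");
        registry.counter("search.requests").inc();

        try (Timer.Context ignored = searchTimer.time()) {
            Thread.sleep(100);   // stands in for the actual service call being measured
        }
        Thread.sleep(11_000);    // keep the JVM alive long enough for one report
    }
}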
Microservice monitoring is typically done with three approaches. A combination of these is really required for effective monitoring:
There are many tools available to monitor microservices. There are also overlaps between many of these tools. The selection of monitoring tools really depends upon the ecosystem that needs to be monitored. In most cases, more than one tool is required to monitor the overall microservice ecosystem.
The objective of this section is to familiarize ourselves with a number of common microservices-friendly monitoring tools:
When there are a large number of microservices with dependencies, it is important to have a monitoring tool that can show the dependencies among microservices. It is not a scalable approach to statically configure and manage these dependencies. There are many tools that are useful in monitoring microservice dependencies, as follows:
This section will explore Spring Cloud Hystrix as a library for fault-tolerant and latency-tolerant microservice implementations. Hystrix is based on the fail-fast and rapid-recovery principles. If there is an issue with a service, Hystrix helps isolate it and recover quickly by falling back to a preconfigured fallback service. Hystrix is another battle-tested library from Netflix and is based on the circuit breaker pattern.
Read more about the circuit breaker pattern at https://msdn.microsoft.com/en-us/library/dn589784.aspx.
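Before wiring in Hystrix, it may help to see the pattern itself in miniature. The following is a hand-rolled sketch of a circuit breaker, purely illustrative and far simpler than what Hystrix actually provides: it fails fast while the circuit is open and recovers rapidly through a fallback.

import java.util.function.Supplier;

// Minimal circuit breaker: open after N consecutive failures, retry after a cool-down.
class SimpleCircuitBreaker<T> {
    private final int failureThreshold;
    private final long retryAfterMillis;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    T call(Supplier<T> primary, Supplier<T> fallback) {
        boolean open = consecutiveFailures >= failureThreshold
                && System.currentTimeMillis() - openedAt < retryAfterMillis;
        if (open) {
            return fallback.get();               // fail fast, do not touch the failing service
        }
        try {
            T result = primary.get();            // closed (or half-open trial) call
            consecutiveFailures = 0;             // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis();   // trip the circuit
            }
            return fallback.get();               // rapid recovery via the fallback
        }
    }
}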
In this section, we will build a circuit breaker with Spring Cloud Hystrix. Perform the following steps to change the Search API Gateway service to integrate it with Hystrix:
Annotate the Spring Boot application class of the Search API Gateway with @EnableCircuitBreaker. This tells Spring Cloud Hystrix to enable a circuit breaker for this application. It also exposes the /hystrix.stream endpoint for metrics collection.

Next, add a getHub method annotated with @HystrixCommand to a component class. This tells Spring that this method is prone to failure. Spring Cloud libraries wrap such methods to handle fault tolerance and latency tolerance by enabling a circuit breaker. A Hystrix command is typically accompanied by a fallback method; in case of failure, Hystrix automatically diverts traffic to the configured fallback. As shown in the following code, getHub will fall back to getDefaultHub:

@Component
class SearchAPIGatewayComponent {
    @LoadBalanced
    @Autowired
    RestTemplate restTemplate;

    // Wrapped by Hystrix; on failure, traffic is diverted to getDefaultHub
    @HystrixCommand(fallbackMethod = "getDefaultHub")
    public String getHub() {
        String hub = restTemplate.getForObject("http://search-service/search/hub", String.class);
        return hub;
    }

    public String getDefaultHub() {
        return "Possibly SFO";
    }
}
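Hystrix commands can also be tuned through command properties. The following variation is not part of the original example; it is only a sketch showing how a timeout and a request-volume threshold could be set on the same command using the javanica annotations (the values are illustrative):

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;

// Optional tuning of the same command (illustrative values)
@HystrixCommand(fallbackMethod = "getDefaultHub",
    commandProperties = {
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000"),
        @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10")
    })
public String getHub() {
    return restTemplate.getForObject("http://search-service/search/hub", String.class);
}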
The getHub method of SearchAPIGatewayController calls the getHub method of SearchAPIGatewayComponent, as follows:

@RequestMapping("/hubongw")
String getHub() {
    logger.info("Search Request in API gateway for getting Hub, forwarding to search-service ");
    return component.getHub();
}
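The snippet above refers to component and logger, which are declared elsewhere in the controller. A minimal sketch of the surrounding class, with assumed field names and annotations, would look like this:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SearchAPIGatewayController {
    private static final Logger logger = LoggerFactory.getLogger(SearchAPIGatewayController.class);

    @Autowired
    SearchAPIGatewayComponent component;   // delegates to the Hystrix-wrapped getHub

    @RequestMapping("/hubongw")
    String getHub() {
        logger.info("Search Request in API gateway for getting Hub, forwarding to search-service ");
        return component.getHub();
    }
}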
Annotate the Spring Boot application class of the Hystrix Dashboard application with the @EnableHystrixDashboard annotation. In this example, the dashboard application runs on port 9999, so open the URL http://localhost:9999/hystrix.

In this case, the Search API Gateway is running on port 8095. Hence, the hystrix.stream URL to enter in the dashboard will be http://localhost:8095/hystrix.stream, as shown:
Fire a transaction by hitting http://localhost:8095/hubongw; the dashboard will then show the getHub circuit. To know the meaning of each of these parameters, visit the Hystrix wiki at https://github.com/Netflix/Hystrix/wiki/Dashboard.
In the previous example, the /hystrix.stream endpoint of our microservice was given to the Hystrix Dashboard. The Hystrix Dashboard can only monitor one microservice at a time. If there are many microservices, the dashboard URL has to be changed every time we switch the microservice to monitor. Looking at one instance at a time is tedious, especially when there are many instances of a microservice or multiple microservices.
We need a mechanism to aggregate data coming from multiple /hystrix.stream endpoints and consolidate it into a single dashboard view. Turbine does exactly this. Turbine is another server that collects Hystrix streams from multiple instances and consolidates them into a single /turbine.stream. The Hystrix Dashboard can now point to /turbine.stream to get the consolidated information:
Turbine currently works only with different hostnames; each instance has to be run on a separate host. If you are testing multiple services locally on the same host, update the host file (/etc/hosts) to simulate multiple hosts. Once done, bootstrap.properties has to be configured as follows:

eureka.instance.hostname: localdomain2
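For local testing, the simulated hostnames usually just resolve to the loopback address; an illustrative /etc/hosts addition would be:

127.0.0.1   localdomain1
127.0.0.1   localdomain2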
This example showcases how to use Turbine to monitor circuit breakers across multiple instances and services. We will use the Search service and Search API Gateway in this example. Turbine internally uses Eureka to resolve service IDs that are configured for monitoring.
Perform the following steps to build and execute this example:
Add @EnableTurbine to the main Spring Boot application class. In this example, both Turbine and the Hystrix Dashboard are configured to run in the same Spring Boot application. This is possible by adding the following annotations to the newly created Turbine application:

@EnableTurbine
@EnableHystrixDashboard
@SpringBootApplication
public class TurbineServerApplication {
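The class body is truncated above. Assuming the standard Spring Boot bootstrap, the complete class would look roughly like this (import packages as provided by Spring Cloud Netflix):

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.dashboard.EnableHystrixDashboard;
import org.springframework.cloud.netflix.turbine.EnableTurbine;

@EnableTurbine
@EnableHystrixDashboard
@SpringBootApplication
public class TurbineServerApplication {
    public static void main(String[] args) {
        // Standard Spring Boot entry point; starts Turbine and the dashboard together
        SpringApplication.run(TurbineServerApplication.class, args);
    }
}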
Add the following configuration to the .yaml or property file to point to the instances that we are interested in monitoring:

spring:
  application:
    name: turbineserver
turbine:
  clusterNameExpression: new String('default')
  appConfig: search-service,search-apigateway
server:
  port: 9090
eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
The preceding configuration instructs the Turbine server to consult the Eureka server to resolve the search-service and search-apigateway services. The search-service and search-apigateway service IDs are the names used to register the services with Eureka. Turbine uses these names to resolve the actual service hosts and ports by checking with the Eureka server. It then uses this information to read /hystrix.stream from each of these instances. Turbine reads all the individual Hystrix streams, aggregates them, and exposes them under the Turbine server's /turbine.stream URL.

The clusterNameExpression of new String('default') places all instances in the default cluster; named clusters can instead be configured with the aggregator, as follows:

turbine:
  aggregator:
    clusterConfig: [comma separated clusternames]
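When named clusters are used, the cluster to display is selected with the cluster request parameter on the Turbine stream, for example http://localhost:9090/turbine.stream?cluster=SEARCH-SERVICE (the cluster name here is only illustrative). With the default cluster expression used earlier, no cluster parameter is required.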
Update SearchComponent in the Search service to add another circuit breaker, as follows:

@HystrixCommand(fallbackMethod = "searchFallback")
public List<Flight> search(SearchQuery query){
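The fallback method itself is not shown in this excerpt. A minimal sketch, assuming searchFallback simply degrades to an empty result list, might be:

// Hypothetical fallback body; assumes java.util.ArrayList and java.util.List are imported.
public List<Flight> searchFallback(SearchQuery query) {
    return new ArrayList<Flight>();   // degrade gracefully with an empty result
}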
Also add @EnableCircuitBreaker to the main application class of the Search service.

Add the following to bootstrap.properties of the Search service. This is required because all the services are running on the same host:

eureka.instance.hostname: localdomain1
Similarly, add the following to bootstrap.properties of the Search API Gateway service. This is to make sure that both services use different hostnames:

eureka.instance.hostname: localdomain2
Run two instances of search-apigateway: one on localdomain1:8095 and another one on localdomain2:8096. We will also run one instance of search-service on localdomain1:8090, using the following commands:

java -jar -Dserver.port=8096 -Deureka.instance.hostname=localdomain2 -Dserver.address=localdomain2 target/chapter7.search-apigateway-1.0.jar
java -jar -Dserver.port=8095 -Deureka.instance.hostname=localdomain1 -Dserver.address=localdomain1 target/chapter7.search-apigateway-1.0.jar
java -jar -Dserver.port=8090 -Deureka.instance.hostname=localdomain1 -Dserver.address=localdomain1 target/chapter7.search-1.0.jar
Start the Turbine server and open the Hystrix Dashboard at http://localhost:9090/hystrix. Instead of giving /hystrix.stream, this time we will point to /turbine.stream. In this example, the Turbine stream is running on port 9090. Hence, the URL to be given in the Hystrix Dashboard is http://localhost:9090/turbine.stream.

Fire a few transactions by hitting http://localhost:8095/hubongw and http://localhost:8096/hubongw. Once this is done, the dashboard page will show the getHub service.
Also run the website project, chapter7.website, and execute the search transaction using the website at http://localhost:8001.

After executing the preceding search, the dashboard page will show search-service as well. This is shown in the following screenshot:
As we can see in the dashboard, search-service comes from the Search microservice, and getHub comes from the Search API Gateway. As we have two instances of the Search API Gateway, getHub is coming from two hosts, as indicated by the Hosts count of 2.