Handling logs, events, and metrics in ECS

In the previous section, we added an extra step to our deployment process to identify and export our application's version. For EC2 and CodeDeploy, we built a version string from the deployment execution information, which lets us easily correlate the logs with the deployment that produced them. For ECS, because we are working with immutable containers, what matters most is being able to identify the exact container image in the ECR registry. Therefore, we will update our code to use the container's tag as our application version.
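To make the tag-as-version idea concrete, the following minimal sketch splits an image reference of the form registry/repository:tag into its parts. The tag is the piece we treat as the application version. The account ID, region, and the tag value abc123 are hypothetical placeholders, and the helper split_image_reference is illustrative rather than part of the helloworld code.

```python
def split_image_reference(image):
    """Split 'registry/repository:tag' into (registry, repository, tag)."""
    # Everything before the first '/' is the registry host.
    registry, _, path = image.partition("/")
    # A tag, if present, follows the last ':' in the final path segment.
    last_segment = path.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repository, tag = path.rsplit(":", 1)
    else:
        repository, tag = path, "latest"  # Docker's default tag
    return registry, repository, tag


# Hypothetical ECR image reference; the tag is what becomes HELLOWORLD_VERSION.
image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/helloworld:abc123"
print(split_image_reference(image))
```

Because an immutable image is never retagged, the tag identifies exactly one image. That makes it a reliable version string to attach to logs.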

In addition, we collected logs on EC2 instances using the awslogs agent. With ECS, we could do something similar by mounting the /var/log volume onto the ECS host and running the same agent, but there is a much better way to do it.

ECS has many settings that we haven't explored yet, among them the ability to configure environment variables and to change how logs are managed. We will edit the troposphere script helloworld-ecs-service-cf-template.py created in the last chapter so that the logs the container writes to its console (stdout and stderr) are sent directly to CloudWatch Logs.

With your text editor, open the file helloworld-ecs-service-cf-template.py.

We will first update the troposphere.ecs import so that it also brings in the LogConfiguration and Environment classes:

from troposphere.ecs import ( 
    TaskDefinition, 
    ContainerDefinition, 
    LogConfiguration,
    Environment, 
)

We will use these classes inside the TaskDefinition section. Locate the ContainerDefinition within the TaskDefinition and, right after the port mapping definition, add the following to define our HELLOWORLD_VERSION variable and the logging configuration:

            PortMappings=[ecs.PortMapping(
                ContainerPort=3000)],
            # Expose the image tag to the application as its version
            Environment=[
                Environment(Name='HELLOWORLD_VERSION', Value=Ref("Tag"))
            ],
            # Send the container's stdout/stderr to CloudWatch Logs
            LogConfiguration=LogConfiguration(
                LogDriver="awslogs",
                Options={
                    'awslogs-group': "/aws/ecs/helloworld",
                    'awslogs-region': Ref("AWS::Region"),
                }
            ),
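To see what these troposphere objects amount to in the generated template, the sketch below hand-writes the equivalent CloudFormation container properties as a plain Python dictionary and prints them as JSON. It approximates what troposphere should emit for this fragment; it is not troposphere's own output. The function name container_logging_properties is hypothetical.

```python
import json


def container_logging_properties(log_group="/aws/ecs/helloworld"):
    """Approximate CloudFormation JSON for the container properties added above."""
    return {
        # Port the helloworld container listens on
        "PortMappings": [{"ContainerPort": 3000}],
        # Ref("Tag") becomes {"Ref": "Tag"}, resolved from the stack parameter
        "Environment": [
            {"Name": "HELLOWORLD_VERSION", "Value": {"Ref": "Tag"}},
        ],
        # The awslogs driver ships stdout/stderr to this CloudWatch Logs group
        "LogConfiguration": {
            "LogDriver": "awslogs",
            "Options": {
                "awslogs-group": log_group,
                "awslogs-region": {"Ref": "AWS::Region"},
            },
        },
    }


print(json.dumps(container_logging_properties(), indent=2))
```

Comparing this shape with the JSON that troposphere actually generates is a quick sanity check that the new properties landed inside the container definition.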

Once those changes are in place, we will create the log group using the command-line interface:

$ aws logs create-log-group --log-group-name /aws/ecs/helloworld

In the last chapter, we created our cluster with all the permissions needed for this chapter, so we don't need to do anything else to get our logs, events, and metrics sent to CloudWatch.
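If you are working with a cluster whose instance role lacks these permissions, the awslogs driver needs at least the logs:CreateLogStream and logs:PutLogEvents actions. The sketch below builds a minimal IAM policy document scoped to our log group, assuming you attach it to the ECS container instance role yourself. The region and account ID are placeholders, and the helper awslogs_policy is hypothetical.

```python
import json


def awslogs_policy(region, account_id, log_group="/aws/ecs/helloworld"):
    """Build a minimal IAM policy letting the awslogs driver write to one log group."""
    # The trailing ':*' covers the log streams inside the group.
    arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": arn,
            }
        ],
    }


print(json.dumps(awslogs_policy("us-east-1", "123456789012"), indent=2))
```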

We can save the changes and commit them:

$ git commit -am "Configuring logging"
$ git push

You can then generate the new CloudFormation template and commit it to the templates directory of our helloworld application:

$ cd helloworld
$ curl -L http://bit.ly/2v3fryS | python > templates/helloworld-ecs-service-cf.template

$ git commit -am "Configuring logging"
$ git push

Thanks to our pipeline, a new version of the container will soon be deployed, and you will be able to observe the logs it produces in the /aws/ecs/helloworld log group, along with its metrics in CloudWatch.

Our monitoring infrastructure is now looking good: we are collecting and indexing metrics, events, and logs. In most cases, this is enough to get started. We can improve it by creating dashboards that display some of the key metrics and by searching our logs for a particular event or time frame. As applications grow more complex, however, these types of monitoring architectures commonly reach their limits. Sometimes you will want to group logs to find out which types of errors happen most often, or to run complex queries. You may also want a more hybrid storage strategy, keeping your logs indexed for just a few days while archiving them on S3 for a much longer period. To do that, we will need a logging infrastructure made up of Elasticsearch, Kibana, and Kinesis Firehose.
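As an illustration of the kind of grouping query that motivates a dedicated log store, the sketch below counts error records by type over a handful of JSON log lines. Elasticsearch performs this kind of aggregation at scale. The log format (level and type fields) and the sample lines are hypothetical; the helloworld application does not necessarily log this way.

```python
import json
from collections import Counter


def top_error_types(lines, n=3):
    """Count error records by their 'type' field and return the n most common."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip lines that are not valid JSON
        if isinstance(record, dict) and record.get("level") == "error":
            counts[record.get("type", "unknown")] += 1
    return counts.most_common(n)


# Hypothetical log lines
sample = [
    '{"level": "error", "type": "Timeout"}',
    '{"level": "info", "msg": "request served"}',
    '{"level": "error", "type": "DBConnection"}',
    '{"level": "error", "type": "Timeout"}',
    'plain text line',
]
print(top_error_types(sample))  # [('Timeout', 2), ('DBConnection', 1)]
```

Running this over a few lines is easy. Doing it across days of logs from many containers, interactively, is where an indexed store such as Elasticsearch with Kibana on top pays off.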

Creating a health-check endpoint
It is good practice to create a route in your application dedicated to monitoring. Your load balancers and ECS tasks can then use this endpoint to validate that the application is in a working state. The code behind that route will commonly check that the application can connect to the databases, storage, and other services it depends on before returning an HTTP 200 (OK) to signal that the application is healthy.
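The following sketch shows this pattern using only Python's standard library. A /health route runs a set of dependency checks and returns 200 when all of them pass and 503 otherwise. It is a minimal sketch under stated assumptions, not the helloworld implementation. The /health path, the check names, and the always-true checks are hypothetical placeholders for real connectivity tests.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_health_checks(checks):
    """Run each named check; return (HTTP status, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # A check that raises counts as a failure
    status = 200 if all(results.values()) else 503
    return status, json.dumps(results)


# Hypothetical dependency checks; replace with real connectivity tests.
CHECKS = {
    "database": lambda: True,
    "cache": lambda: True,
}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = run_health_checks(CHECKS)
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=3000):
    """Serve the health endpoint until interrupted."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling serve() starts the server. A load balancer health check pointed at /health then sees a 503 as soon as any dependency check fails. Returning the per-check results in the body also makes it easier to see which dependency is down.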
