Using CloudWatch events and Lambda to create alerts on custom metrics

In the previous section, we added two alarms to our CloudFormation template. Whenever possible, it is good practice to keep your monitoring configuration alongside the resources it monitors. Unfortunately, that isn't always easy to do. For instance, we are keeping track of the disk space usage of our EC2 instances, and those instances are created by our Auto Scaling group. Because of that, adding alerts for that metric in our troposphere code is a lot more complicated, as we don't have some of the critical information, such as the instance ID, at template-creation time. To get around that issue, we are going to see how to create alerts based on infrastructure changes.

As we saw earlier, whenever a change occurs in your AWS infrastructure, an event is emitted in real time to CloudWatch Events. This includes EC2 instance state changes. We will create a rule to capture those events and send them to a Lambda function that will create or delete our alarms.

We will implement that using the serverless framework (https://serverless.com/) that we looked at in Chapter 5, Scaling Your Infrastructure.

We will first create a new serverless application. In Chapter 5, Scaling Your Infrastructure, we demonstrated how to create a helloworld application using Node.js. Lambda and Serverless are both able to handle other languages as well, including Python. We will use Python and the boto3 library to manage the creation of our alarms. To get started, we need to create a new application using the following command:

serverless create --template aws-python \
      --name disk-free-monitoring \
      --path disk-free-monitoring

This will create all the boilerplate we need inside a directory called disk-free-monitoring:

$ cd disk-free-monitoring  

The directory contains two files: handler.py and serverless.yml. The handler file will contain the code of our Lambda function while serverless.yml will have the information about how to deploy and configure our function. We will start there.

With your text editor, open the serverless.yml file.

The file is broken up into different sections.

The first change we will make is to add IAM permissions to our function. We want our function to be able to create and delete alarms. For that, find the provider block in the configuration file and add the following:

provider:
  name: aws
  runtime: python2.7
  iamRoleStatements:
    - Effect: "Allow"
      Action:
        - "cloudwatch:PutMetricAlarm"
        - "cloudwatch:DeleteAlarms"
      Resource: "*"

Toward the middle of the file, a section defines the name of the handler:

functions: 
  hello: 
    handler: handler.hello 

While ultimately we could create a function and call it hello, we can also come up with something more descriptive about the action. We will change the name to alarm as follows:

functions: 
  alarm: 
    handler: handler.alarm 

Lastly, we need to define how our function will get triggered. After the handler definition, add the following (events and handler are aligned):

    events:
      - cloudwatchEvent:
          event:
            source:
              - "aws.ec2"
            detail-type:
              - "EC2 Instance State-change Notification"
            detail:
              state:
                - running
                - stopping
                - shutting-down
                - stopped
                - terminated

We will now edit the handler.py file.

When you first open the file, it shows a basic hello function. We won't keep any of it. As a first step, delete everything in that file.

We will start our file with the import and initialization of the boto3 library:

import boto3 
client = boto3.client('cloudwatch') 

We will now create a function called alarm, matching the handler value we defined in serverless.yml (handler.alarm). The function takes two arguments, event and context:

def alarm(event, context): 

The event argument will contain a JSON document with the details of the EC2 instance state change that was received. You can see sample events using the CloudWatch Events web interface. With your browser, open https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#rules:action=create and then select the event source and detail type you want to match, as shown in this screenshot:

In our case, we want to extract two pieces of information: the instance-id and the state. We will do that as follows:

    instance = event['detail']['instance-id'] 
    state = event['detail']['state'] 
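For local experimentation, you can exercise that extraction logic against a trimmed-down event. The snippet below is a sketch: the instance ID is made up, and only the fields our handler reads are included (real events carry many more):

```python
# Abbreviated EC2 state-change event; only the keys our handler
# reads are included (real events carry many more fields).
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {
        "instance-id": "i-0123456789abcdef0",  # made-up instance ID
        "state": "running"
    }
}

instance = sample_event['detail']['instance-id']
state = sample_event['detail']['state']
print(instance, state)
```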

We want to create alarms when an instance is running and delete them when they are in one of the other states listed in the serverless.yml file (stopping, shutting-down, stopped, terminated). We will create two alarms: a warning email alert when the partition is filled to 60% and a page for when we reach 80%.

We will do that by creating two functions, put_alarm and delete_alarms. For now, we will simply call them as follows:

    if state == "running": 
        warning = put_alarm(instance, 60, 'alert-email') 
        critical = put_alarm(instance, 80, 'alert-sms') 
        return warning, critical 
    else: 
        return delete_alarms(instance) 

We can now define our two functions, starting with the put_alarm function:

def put_alarm(instance, threshold, sns): 

The function takes three arguments, the instance ID, the threshold of the alarm, and the topic information.

We will first define the sns_prefix variable, the common ARN prefix of our SNS topics. You can find your topic ARNs (and therefore the region and account ID to use) with the following command:

$ aws sns list-topics

In our case, the prefix is as follows:

    sns_prefix = 'arn:aws:sns:us-east-1:511912822958:'
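Hardcoding the region and account ID makes the function less portable. As an aside, you could build the prefix from the execution environment instead; the helper below is a minimal sketch, and the runtime lookups mentioned in the comments are one possible way to fill in its arguments:

```python
def build_sns_prefix(region, account_id):
    # Common ARN prefix shared by all SNS topics in this
    # region/account; the topic name gets appended to it.
    return 'arn:aws:sns:{}:{}:'.format(region, account_id)

# At runtime you could pass, for example:
#   region = os.environ['AWS_REGION']  # set by the Lambda runtime
#   account_id = boto3.client('sts').get_caller_identity()['Account']
print(build_sns_prefix('us-east-1', '511912822958'))
```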

The next step will be to create the alarm. We will want to store the response so that we can return that to the Lambda execution:

    response = client.put_metric_alarm( 

We now need to provide all the information needed to create the alarm, starting with its name. The name of the alarm has to be unique to the AWS account. We will make sure this is the case by using the instance ID and sns suffix to generate the alarm name:

        AlarmName='DiskSpaceUtilization-{}-{}'.format(instance, sns), 

We now need to provide the details of the metric to monitor as follows. We will first provide the metric name and namespace followed by the dimensions. In the dimensions section, we are able to limit the monitoring to only our instance ID thanks to the information provided by CloudWatch through the event variable:

        MetricName='DiskSpaceUtilization', 
        Namespace='System/Linux', 
        Dimensions=[ 
            { 
                "Name": "InstanceId", 
                "Value": instance 
            }, 
            { 
                "Name": "Filesystem", 
                "Value": "/dev/xvda1" 
            }, 
            { 
                "Name": "MountPath", 
                "Value": "/" 
            } 
        ], 

We are going to define the threshold information as follows:

        Statistic='Average', 
        Period=300, 
        Unit='Percent', 
        EvaluationPeriods=2, 
        Threshold=threshold, 
        ComparisonOperator='GreaterThanOrEqualToThreshold', 
        TreatMissingData='missing', 

In this particular case, we want to have two consecutive executions of five minutes where the average disk usage is higher than 60 or 80% to trigger the alarms. Finally, we are going to specify the topics to send the message to when the alert triggers and recovers:

        AlarmActions=[
            sns_prefix + sns,
        ],
        OKActions=[
            sns_prefix + sns,
        ]
    )
    return response

The function finishes by returning the response. We will now create the function that deletes the alarms; we will call it delete_alarms. The code is a lot simpler: we call the boto3 method delete_alarms and provide it with an array containing the names of the two alarms we created:

def delete_alarms(instance): 
    names = [ 
        'DiskSpaceUtilization-{}-alert-email'.format(instance), 
        'DiskSpaceUtilization-{}-alert-sms'.format(instance) 
    ] 
    return client.delete_alarms(AlarmNames=names) 
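Note that delete_alarms must reproduce exactly the names that put_alarm generated, suffixes included; if the two formats drift apart, alarms will be orphaned. A quick, self-contained sanity check of that naming convention (using a made-up instance ID):

```python
def alarm_names(instance):
    # Mirror the AlarmName format used in put_alarm for both topics
    return ['DiskSpaceUtilization-{}-{}'.format(instance, sns)
            for sns in ('alert-email', 'alert-sms')]

names = alarm_names('i-0123456789abcdef0')  # made-up instance ID
print(names)
```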

The handler.py file is done, but to make this code work, we need to create a few extra files. The first file we want to add is requirements.txt. This file defines the libraries required by our Python code to run. In our case, we need boto3.

In the same directory as handler.py and serverless.yml, create a file and call it requirements.txt. In it, add the following:

boto3==1.4.4 

Serverless doesn't handle those requirements files out of the box; we will rely on the serverless-python-requirements plugin. To install it, we need to create a package.json file in the same directory as the other files and put the following in it:

{ 
  "name": "disk-free-monitoring", 
  "version": "1.0.0", 
  "description": "create cloudwatch alarms for disk space", 
  "repository": "tbd", 
  "license": "ISC", 
  "dependencies": { 
    "serverless-python-requirements": "^2.3.3" 
  } 
} 

We can now run the npm install command to download the plugin.
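One detail worth double-checking: Serverless only activates a plugin if it is also declared in serverless.yml. Assuming the package.json above, make sure the file contains a plugins section such as the following:

```yaml
plugins:
  - serverless-python-requirements
```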

With those two extra files created, we are ready to deploy our application as follows:

$ serverless deploy
Serverless: Packaging service...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
.....
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (1.17 KB)...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
.....................
Serverless: Stack update finished...
Service Information
service: disk-free-monitoring
stage: dev
region: us-east-1
api keys:
  None
endpoints:
  None
functions:
  alarm: disk-free-monitoring-dev-alarm  

From that point on, any EC2 instance created in us-east-1 will automatically get two dedicated alarms while it is running.

We won't show it in the book, but there are many ways you can improve this script, such as looking at the EC2 tags of your instances to determine whether an instance is a production system.

Lastly, we will take a closer look at a service that AWS calls the Personal Health Dashboard.
