Creating a CloudWatch role in Ansible

We first need to go into the roles directory of our Ansible repository:

$ cd roles  

We will use the ansible-galaxy command to generate our new role:

$ ansible-galaxy init cloudwatch
- cloudwatch was created successfully  
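The command generates a standard role skeleton. Depending on your version of Ansible, the layout should look roughly as follows (for this minimal role, only tasks/main.yml matters):

cloudwatch/
├── README.md
├── defaults/main.yml
├── files/
├── handlers/main.yml
├── meta/main.yml
├── tasks/main.yml
├── templates/
├── tests/inventory
├── tests/test.yml
└── vars/main.yml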

We will create a minimal role that allows us to report some of those missing stats. With your text editor, open the file cloudwatch/tasks/main.yml.

We will use an open source tool called cloudwatchmon. You can access its source code and documentation on the project's GitHub page at http://bit.ly/2pYjhI9. The tool is written in Python and is available through pip. To install pip packages, Ansible provides a pip module. After the initial comment at the top of the file, add the following:

--- 
# tasks file for cloudwatch 

- name: Installing cloudwatchmon 
  pip: 
    name: cloudwatchmon

This tool is meant to run as a cron job. We will use the cron module of Ansible to create a cron entry that we will call cloudwatchmon.

After the call to the pip module, call the cron module as follows:

- name: Execute cloudwatchmon every 5min 
  cron: 
    name: "cloudwatchmon" 
    minute: "*/5" 
    job: "/usr/local/bin/mon-put-instance-stats.py --auto-scaling --loadavg-percpu --mem-util --disk-space-util --disk-path=/ --from-cron" 

In this case, we are configuring our job to run the mon-put-instance-stats.py script every five minutes. We are also specifying, in the command itself, the list of metrics we want to collect. The mem-util option reports the percentage of memory utilization, while disk-space-util reports the percentage of disk space used on the partition specified by disk-path (here, /). You can refer to the documentation of the script for the full list of options available.
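If you do not want to wait for the first scheduled run, you can sanity check the setup directly on a host once the role has been applied there. The path and options below are exactly the ones configured above; the crontab formatting may vary slightly:

$ sudo crontab -l
#Ansible: cloudwatchmon
*/5 * * * * /usr/local/bin/mon-put-instance-stats.py --auto-scaling --loadavg-percpu --mem-util --disk-space-util --disk-path=/ --from-cron

You can also invoke the script by hand with the same options to confirm that metrics reach CloudWatch without waiting for the next five-minute boundary.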

Percentage versus raw values
There are two ways to report these resource usages. You can report utilization as a percentage (for example, "the partition is 23% full") or as a raw value (for example, "there are 2 GB free on that partition"). For our purposes, suffice it to say that monitoring your infrastructure using percentages tends to speed up iteration, as you can create more generic alerts that apply to any host. That said, the right choice tends to change over time, as your different applications will often have different constraints requiring different types of hardware.

Before committing our change, we are going to go one directory up and edit the file nodeserver.yml:

$ cd ..  

We need to include the new role we just created in our service playbook. We can do that simply by adding a new entry to the roles section, as shown here:

--- 
- hosts: "{{ target | default('localhost') }}" 
  become: yes 
  roles:
    - nodejs 
    - codedeploy 
    - cloudwatch
    - { role: awslogs, name: messages, file: /var/log/messages } 
    - { role: awslogs, 
        name: helloworld, 
        file: /var/log/helloworld/helloworld.log, 
        datetime_format: "%Y-%m-%dT%H:%M:%S.%f" }
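If Ansible is installed on your workstation, you can optionally check the playbook for syntax errors before committing. This only validates the YAML and role references; it does not apply anything:

$ ansible-playbook nodeserver.yml --syntax-check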

We can save all the changes and commit them:

$ git add roles/cloudwatch nodeserver.yml
$ git commit -m "Adding new role for CloudWatch monitoring"
$ git push  

Since Ansible pulls changes every 10 minutes, within 15 minutes at most we should start seeing a new section in CloudWatch called Linux System, containing the new metrics of our hosts.
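If you would rather not wait for the next scheduled pull, you can trigger one manually on an instance. A sketch, assuming your instances run ansible-pull on a schedule against your Ansible repository (replace the URL with your own repository):

$ sudo ansible-pull -U https://github.com/<your-user>/ansible nodeserver.yml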

Now that we have all the visibility we need on our EC2 instances, we can put some alarms in place.
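For example, a percentage-based alarm on the new disk space metric could be created with the AWS CLI along the following lines. This is only a sketch: the alarm name, instance ID, SNS topic, and threshold are placeholders, and you should check the exact namespace, metric names, and dimensions that cloudwatchmon publishes in your account before relying on it:

$ aws cloudwatch put-metric-alarm \
    --alarm-name helloworld-disk-space-high \
    --namespace "System/Linux" \
    --metric-name DiskSpaceUtilization \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold 80 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts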

In many cases, especially with applications exposed to the internet, you will occasionally observe strange behavior without being able to easily understand how the application got into that state. One of the most useful pieces of information to have in those cases is the access log of your load balancer.

Currently, our load balancer exposes several metrics in CloudWatch, but they don't tell the full story. Which routes are causing 5xx errors? What is the latency? Where are the users coming from? How aggressively are they using your application? To gain access to those insights, we will make a few changes to our ELB and ALB instances.
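Both load balancer types deliver their access logs to an S3 bucket that you configure through load balancer attributes. A sketch with the AWS CLI, assuming a bucket named my-access-logs that the load balancer service is allowed to write to, a classic ELB named helloworld-elb, and an ALB referenced by its ARN (all of these names are placeholders):

$ aws elb modify-load-balancer-attributes \
    --load-balancer-name helloworld-elb \
    --load-balancer-attributes '{"AccessLog":{"Enabled":true,"S3BucketName":"my-access-logs","EmitInterval":5}}'

$ aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/helloworld-alb/0123456789abcdef \
    --attributes Key=access_logs.s3.enabled,Value=true Key=access_logs.s3.bucket,Value=my-access-logs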
