Creating an ECS cluster

Creating an ECS cluster is very similar to what we did in Chapter 5, Scaling Your Infrastructure, when we created an Auto Scaling group to run our helloworld application. The main difference is that there is one more level of abstraction: ECS runs our containers as tasks, which are grouped into services.

Each of those tasks may exist in multiple copies in order to handle the traffic, as shown in the following diagram:

In order to do that, the ECS service provides an orchestration layer.

That orchestration layer is in charge of managing the life cycle of containers, including upgrading or downgrading them and scaling them up or down. It also distributes the containers of every service optimally across all instances of the cluster. Finally, it exposes a discovery mechanism that interacts with other services, such as ALB and ELB, to register and deregister containers.

Task placement strategies
While, by default, the entire orchestration system is managed by AWS, you also have the ability to customize it through the creation of a task placement strategy. This lets you configure the orchestrator to optimize for instance count or for load distribution, or to add constraints and make sure, for instance, that certain tasks are launched on the same instances.
You can read more about task placement strategies at http://amzn.to/2kn2OXO. In addition, AWS maintains a collection of open source projects geared toward container management and orchestration. You can check those out at https://blox.github.io.
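
To give you a flavor of what such a customization looks like, here is a minimal sketch of an ECS service definition with placement options, written with troposphere. It assumes a troposphere version that exposes PlacementStrategy and PlacementConstraint; the service, cluster, and task definition names are placeholders (we build the real service later in this chapter):

from troposphere.ecs import PlacementConstraint, PlacementStrategy, Service

# Hypothetical service illustrating placement options: "spread" distributes
# tasks across Availability Zones, "binpack" then packs them onto as few
# instances as possible based on memory, and the constraint restricts
# placement to t2 instances.
service = Service(
    "HelloworldService",            # placeholder name
    Cluster="staging-cluster",      # placeholder; normally a Ref or ImportValue
    DesiredCount=2,
    TaskDefinition="helloworld",    # placeholder task definition reference
    PlacementStrategies=[
        PlacementStrategy(Type="spread", Field="attribute:ecs.availability-zone"),
        PlacementStrategy(Type="binpack", Field="memory"),
    ],
    PlacementConstraints=[
        PlacementConstraint(
            Type="memberOf",
            Expression="attribute:ecs.instance-type =~ t2.*",
        ),
    ],
)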

We will create a new script to generate our ECS cluster.

The filename will be ecs-cluster-cf-template.py.

This template starts almost exactly like the template we created in Chapter 5, Scaling Your Infrastructure, for the Auto Scaling group:

"""Generating CloudFormation template.""" 
 
from ipaddress import ip_network
from ipify import get_ip
from troposphere import ( Base64, Export, Join, Output, Parameter, Ref, Sub, Template, ec2 )
from troposphere.autoscaling import ( AutoScalingGroup, LaunchConfiguration, ScalingPolicy )
from troposphere.cloudwatch import ( Alarm, MetricDimension )
from troposphere.ecs import Cluster
from troposphere.iam import ( InstanceProfile, Role )

The only new import is the Cluster one from the ECS module. Exactly like in Chapter 5, Scaling Your Infrastructure, we will extract our IP address in order to use it later for the SSH security group, create our template variable, and add a description to the stack:

PublicCidrIp = str(ip_network(get_ip())) 
 
t = Template() 
 
t.add_description("Effective DevOps in AWS: ECS Cluster") 

We will now proceed with adding our parameters, which are the exact same parameters as in Chapter 5, Scaling Your Infrastructure: the SSH key pair, the VPC ID, and its subnets:

t.add_parameter(Parameter( 
    "KeyPair", 
    Description="Name of an existing EC2 KeyPair to SSH", 
    Type="AWS::EC2::KeyPair::KeyName", 
    ConstraintDescription="must be the name of an existing EC2 KeyPair.", 
)) 
 
t.add_parameter(Parameter( 
    "VpcId", 
    Type="AWS::EC2::VPC::Id", 
    Description="VPC" 
)) 
 
t.add_parameter(Parameter( 
    "PublicSubnet", 
    Description="PublicSubnet", 
    Type="List<AWS::EC2::Subnet::Id>", 
    ConstraintDescription="PublicSubnet" 
))  

Next, we will look at creating our security group resources:

t.add_resource(ec2.SecurityGroup( 
    "SecurityGroup", 
    GroupDescription="Allow SSH and private network access", 
    SecurityGroupIngress=[ 
        ec2.SecurityGroupRule( 
            IpProtocol="tcp", 
            FromPort=0, 
            ToPort=65535, 
            CidrIp="172.16.0.0/12", 
        ), 
        ec2.SecurityGroupRule( 
            IpProtocol="tcp", 
            FromPort="22", 
            ToPort="22", 
            CidrIp=PublicCidrIp, 
        ), 
    ], 
    VpcId=Ref("VpcId") 
)) 

There is one important difference here. In Chapter 5, Scaling Your Infrastructure, we opened up port 3000, since that's the port our application uses. Here, we are opening every port to the CIDR 172.16.0.0/12, which is the private IP space of our internal network. This gives our ECS cluster the ability to run multiple helloworld containers on the same host, each bound to a different port.
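
That dynamic port binding comes from the port mappings of the task's container definitions: leaving the host port at 0 lets the ECS agent pick an ephemeral host port for each running container. Here is a minimal sketch; the real task definition for helloworld is created later in the chapter, and the resource name and image used here are placeholders:

from troposphere.ecs import ContainerDefinition, PortMapping, TaskDefinition

# Hypothetical task definition: HostPort=0 tells ECS to map the container's
# port 3000 to a random ephemeral port on the instance, so several copies of
# the same container can coexist on one host.
task_definition = TaskDefinition(
    "HelloworldTaskDefinition",      # placeholder name
    ContainerDefinitions=[
        ContainerDefinition(
            Name="helloworld",
            Image="helloworld:latest",   # placeholder image
            Cpu=256,
            Memory=256,
            PortMappings=[PortMapping(ContainerPort=3000, HostPort=0)],
        ),
    ],
)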

We will now create our cluster resource; this can simply be done with the following call:

t.add_resource(Cluster( 
    'ECSCluster', 
)) 

Next, we will focus on configuring the instances of the cluster, starting with their IAM role. Overall, this is one of the more complex resources to create in ECS, as the cluster will need to perform a number of interactions with other AWS services. We could create a completely custom policy for it, or simply import the policies that AWS created:

t.add_resource(Role(
    'EcsClusterRole',
    ManagedPolicyArns=[
        'arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM',
        'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly',
        'arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role',
        'arn:aws:iam::aws:policy/CloudWatchFullAccess'
    ],
    AssumeRolePolicyDocument={
        'Version': '2012-10-17',
        'Statement': [{
            'Action': 'sts:AssumeRole',
            'Principal': {'Service': 'ec2.amazonaws.com'},
            'Effect': 'Allow',
        }]
    }
))

We can now tie our role with the instance profile, as follows:

t.add_resource(InstanceProfile( 
    'EC2InstanceProfile', 
    Roles=[Ref('EcsClusterRole')], 
)) 

The next step is to create our launch configuration. This is what it looks like:

t.add_resource(LaunchConfiguration(
    'ContainerInstances',
    UserData=Base64(Join('', [
        "#!/bin/bash -xe\n",
        "echo ECS_CLUSTER=",
        Ref('ECSCluster'),
        " >> /etc/ecs/ecs.config\n",
        "yum install -y aws-cfn-bootstrap\n",
        "/opt/aws/bin/cfn-signal -e $? ",
        "         --stack ",
        Ref('AWS::StackName'),
        "         --resource ECSAutoScalingGroup ",
        "         --region ",
        Ref('AWS::Region'),
        "\n"])),
    ImageId='ami-04351e12',
    KeyName=Ref("KeyPair"),
    SecurityGroups=[Ref("SecurityGroup")],
    IamInstanceProfile=Ref('EC2InstanceProfile'),
    InstanceType='t2.micro',
    AssociatePublicIpAddress='true',
))

In this example, we don't install Ansible like we did before. Instead, we use an ECS-optimized AMI (you can read more about it at http://amzn.to/2jX0xVu) and rely on the UserData field to configure the ECS agent and start it.
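
Note that the hardcoded ImageId is region-specific and will eventually go stale, as AWS regularly releases new ECS-optimized AMIs. One way to avoid that, as an addition of ours rather than part of the original template, is to look up the currently recommended image at template-generation time through the public SSM parameter that AWS publishes, for example with boto3:

import boto3

# Assumption: the ECS-optimized Amazon Linux AMI is published under this
# public SSM parameter; use the amazon-linux-2 path for the newer image.
ssm = boto3.client('ssm', region_name='us-east-1')
parameter = ssm.get_parameter(
    Name='/aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id'
)
print(parameter['Parameter']['Value'])  # e.g. ami-xxxxxxxx, usable as ImageId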

Now that we have our launch configuration, we can create our Auto Scaling group resources.

When working with ECS, scaling is needed at the following two levels:

  • The containers level, as we will need to run more containers of a given service if the traffic spikes
  • The underlying infrastructure level

Containers, through their task definitions, set a requirement for CPU and memory. They will require, for example, 1,024 CPU units (which represents one core) and 256 memory units (which means 256 MB of RAM). If the ECS instances are close to being filled up on one of those two constraints, the ECS Auto Scaling group needs to add more instances. The following diagram shows how the ECS orchestrator routes pending tasks onto the instances of the ECS Auto Scaling group:
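
To make those reservation metrics concrete, here is a quick back-of-the-envelope calculation of our own, using the t2.micro instances of this cluster: a t2.micro registers 1,024 CPU units, so three tasks reserving 256 CPU units each put the cluster's CPUReservation at 75 percent, right at the scale-up threshold we will configure shortly:

# Illustration only: how the CPUReservation metric relates to task sizes.
registered_cpu_units = 1024   # CPU units offered by a single t2.micro
reserved_per_task = 256       # CPU units requested by each task
running_tasks = 3

cpu_reservation = 100.0 * running_tasks * reserved_per_task / registered_cpu_units
print(cpu_reservation)  # 75.0 -> at the "High" threshold below; one more task triggers scale-up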

In terms of implementation, the process is very similar to what we did in Chapter 5, Scaling Your Infrastructure.

We first create the AutoScalingGroup resource:

t.add_resource(AutoScalingGroup( 
    'ECSAutoScalingGroup', 
    DesiredCapacity='1', 
    MinSize='1', 
    MaxSize='5', 
    VPCZoneIdentifier=Ref("PublicSubnet"), 
    LaunchConfigurationName=Ref('ContainerInstances'), 
)) 

Next, we will create ScalingPolicies and Alarms to monitor the CPU and memory reservation metrics. In order to accomplish that, we will take advantage of the fact that we generate our stack with Python, and use nested for loops, as follows:

states = { 
    "High": { 
        "threshold": "75", 
        "alarmPrefix": "ScaleUpPolicyFor", 
        "operator": "GreaterThanThreshold", 
        "adjustment": "1" 
    }, 
    "Low": { 
        "threshold": "30", 
        "alarmPrefix": "ScaleDownPolicyFor", 
        "operator": "LessThanThreshold", 
        "adjustment": "-1" 
    } 
} 
 
for reservation in {"CPU", "Memory"}: 
    for state, value in states.items():
        t.add_resource(Alarm( 
            "{}ReservationToo{}".format(reservation, state), 
            AlarmDescription="Alarm if {} reservation too {}".format( 
                reservation, 
                state), 
            Namespace="AWS/ECS", 
            MetricName="{}Reservation".format(reservation), 
            Dimensions=[ 
                MetricDimension( 
                    Name="ClusterName", 
                    Value=Ref("ECSCluster") 
                ), 
            ], 
            Statistic="Average", 
            Period="60", 
            EvaluationPeriods="1", 
            Threshold=value['threshold'], 
            ComparisonOperator=value['operator'], 
            AlarmActions=[ 
                Ref("{}{}".format(value['alarmPrefix'], reservation))] 
        )) 
        t.add_resource(ScalingPolicy( 
            "{}{}".format(value['alarmPrefix'], reservation), 
            ScalingAdjustment=value['adjustment'], 
            AutoScalingGroupName=Ref("ECSAutoScalingGroup"), 
            AdjustmentType="ChangeInCapacity", 
        )) 

Finally, we will output a small amount of resource information, namely the cluster name, the VPC ID, and the public subnets:

t.add_output(Output( 
    "Cluster", 
    Description="ECS Cluster Name", 
    Value=Ref("ECSCluster"), 
    Export=Export(Sub("${AWS::StackName}-id")), 
)) 
 
t.add_output(Output( 
    "VpcId", 
    Description="VpcId", 
    Value=Ref("VpcId"), 
    Export=Export(Sub("${AWS::StackName}-vpc-id")), 
)) 
 
t.add_output(Output( 
    "PublicSubnet", 
    Description="PublicSubnet", 
    Value=Join(',', Ref("PublicSubnet")), 
    Export=Export(Sub("${AWS::StackName}-public-subnets")), 
)) 
 
print(t.to_json()) 

CloudFormation provides a number of pseudo-parameters, such as AWS::StackName. Throughout the chapter, we will rely on it to make our templates generic enough to be used across different environments and services. In the preceding code, the outputs are exported under names derived from the stack name, so other stacks will be able to import the cluster name, VPC ID, and subnets of whichever cluster stack they target. If needed, we could reuse that exact same template to create another cluster for another environment.
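
As an illustration of how those exports get consumed, a downstream template (such as the ALB template we will write next) can import them by referencing the cluster stack's name. This is a minimal sketch using troposphere's ImportValue; the ClusterStackName parameter is a hypothetical name of ours:

from troposphere import ImportValue, Parameter, Sub, Template

t = Template()

# Hypothetical parameter: the name of the ECS cluster stack whose exports
# we want to consume (for example, staging-cluster).
t.add_parameter(Parameter(
    "ClusterStackName",
    Type="String",
    Default="staging-cluster",
    Description="Name of the ECS cluster stack",
))

# Resolves to the value we exported as "<cluster stack name>-vpc-id".
vpc_id = ImportValue(Sub("${ClusterStackName}-vpc-id"))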

The script is complete and it should look like this: http://bit.ly/2vatFi9

As before, we can now commit our script and create our stack by first generating our template:

$ git add ecs-cluster-cf-template.py
$ git commit -m "Adding Troposphere script to generate an ECS cluster"
$ git push
$ python ecs-cluster-cf-template.py > ecs-cluster-cf.template

To create our stack, we need three parameters: the key pair, the VPC ID, and the subnets. In the previous chapters, we used the web interface to create those stacks. Here, we will see how to get that information using the CLI.

To get the VPC ID and subnet IDs, we can use the following:

$ aws ec2 describe-vpcs --query 'Vpcs[].VpcId'
[
"vpc-f7dc4093"
]
$ aws ec2 describe-subnets --query 'Subnets[].SubnetId'
[
"subnet-4decfe66",
"subnet-3e905948",
"subnet-82ba3fbf",
"subnet-4f3bdb17"
]

We can now create our stack by combining those outputs. Since ECS clusters can run a variety of containers and, through that, run a number of applications and services, we will aim for one ECS cluster per environment, starting with staging. In order to differentiate each environment, we will rely on the stack name. Therefore, it is important to call your stack staging-cluster as shown here:

$ aws cloudformation create-stack \
    --stack-name staging-cluster \
    --capabilities CAPABILITY_IAM \
    --template-body file://ecs-cluster-cf.template \
    --parameters \
    ParameterKey=KeyPair,ParameterValue=EffectiveDevOpsAWS \
    ParameterKey=VpcId,ParameterValue=vpc-f7dc4093 \
    ParameterKey=PublicSubnet,ParameterValue=subnet-3e905948\,subnet-4decfe66\,subnet-4f3bdb17\,subnet-82ba3fbf
{
"StackId": "arn:aws:cloudformation:us-east-1:511912822958:stack/staging/6b2a2510-e21a-11e6-a834-50d501eed2b3"
}
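
The create-stack call returns as soon as stack creation has started; the resources themselves are built asynchronously. If you want to block until the cluster is fully created before moving on (an optional addition of ours, not part of the original workflow), you can poll the stack status, for example with boto3:

import boto3

# Block until CloudFormation reports the stack as CREATE_COMPLETE,
# then print its final status.
cloudformation = boto3.client('cloudformation')
waiter = cloudformation.get_waiter('stack_create_complete')
waiter.wait(StackName='staging-cluster')
stack = cloudformation.describe_stacks(StackName='staging-cluster')['Stacks'][0]
print(stack['StackStatus'])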

We are now going to add a load balancer. In the previous chapter, we used an ELB for our Auto Scaling group; we also mentioned the existence of the ALB service. This time, we will create an ALB instance to proxy our application's traffic.
