The frontend layer

With the subnets in place, we can start thinking about our VPC inhabitants.

The frontend or application layer consists of our Auto Scaling Groups, and the first decision we face is that of an EC2 instance type.

The profile of the frontend application largely dictates the choice between a memory-, compute-, or storage-optimized instance. With some help from fellow developers (in the case of an in-house application) and a suitable performance testing tool (or service), you should be able to ascertain which system resource the application makes most use of.

Let us assume we have picked the C4 Compute Optimized instance class, which AWS suggests for web servers. The next question is: what size?

Well, one way to guess our way through is to take the average number of requests per second we would like to support, deploy the minimum number of instances we can afford (two, for resilience) of the smallest size available in the chosen class, and run a load test against them. Ideally, the average utilization across the two nodes should remain under 50% to allow for traffic spikes and for failure events where the remaining host takes all the load. If the results come in far below that mark, we should look at a different class with smaller instance types for better value; otherwise, we keep increasing the C4 size.
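To make that headroom rule concrete, here is a minimal back-of-the-envelope sketch in Python; the per-instance throughput and peak figures are hypothetical placeholders to be replaced with your own load-test results.

import math

def instances_needed(peak_rps, rps_per_instance, target_utilization=0.5):
    # Smallest fleet (never fewer than two, for resilience) that keeps
    # average utilization at or below the target during peak traffic.
    return max(2, math.ceil(peak_rps / (rps_per_instance * target_utilization)))

# Hypothetical load-test result: one instance sustains ~300 req/s at full load.
print(instances_needed(peak_rps=250, rps_per_instance=300))   # -> 2
print(instances_needed(peak_rps=900, rps_per_instance=300))   # -> 6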

Next comes the question of Auto Scaling. We have the right class and instance size to work with, and now we need scaling thresholds. Firstly, if you are fortunate enough to have predictable loads, then your problems end here with the use of Scheduled Actions:


You can simply tell AWS "scale me up at X o'clock, then back down at Y." The rest of us have to set alarms and thresholds.
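As an illustration, a Scheduled Action could be created with a few lines of boto3; this is only a sketch, and the group name, times, and capacities below are made-up values.

import boto3

autoscaling = boto3.client('autoscaling')

# Scale the group up ahead of the morning peak (08:00 UTC)...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName='frontend-asg',     # hypothetical group name
    ScheduledActionName='scale-up-morning',
    Recurrence='0 8 * * *',                  # cron expression, UTC
    MinSize=2,
    DesiredCapacity=4,
)

# ...and back down in the evening (20:00 UTC).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName='frontend-asg',
    ScheduledActionName='scale-down-evening',
    Recurrence='0 20 * * *',
    MinSize=2,
    DesiredCapacity=2,
)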

We've already decided that 50% average utilization (let us say CPU) is our upper limit, and by the time we reach it scaling should already be in progress; otherwise, if one of our two nodes fails, the other will have to run at maximum capacity. As an example, a CloudWatch alarm could be >40% average CPU for five minutes, triggering an Auto Scaling Group action to increase the group size by 50% (which, with two nodes, is one instance).
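Translated into boto3, such an alarm and policy pair might look roughly like the following; the group name and numbers are assumptions taken from the example above.

import boto3

autoscaling = boto3.client('autoscaling')
cloudwatch = boto3.client('cloudwatch')

# Simple scaling policy: grow the group by 50%, but by at least one instance.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName='frontend-asg',      # hypothetical group name
    PolicyName='scale-up-cpu',
    PolicyType='SimpleScaling',
    AdjustmentType='PercentChangeInCapacity',
    ScalingAdjustment=50,
    MinAdjustmentMagnitude=1,
    Cooldown=300,    # see the Cooldown tip below
)

# Alarm: average CPU across the group above 40% for five minutes.
cloudwatch.put_metric_alarm(
    AlarmName='frontend-cpu-high',
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'frontend-asg'}],
    Statistic='Average',
    Period=300,
    EvaluationPeriods=1,
    Threshold=40.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=[policy['PolicyARN']],
)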

Tip

In order to prevent unnecessary scaling events, it is important to adjust the value of the Cooldown period. It should reflect the expected time a newly launched instance will take to become fully operational and start affecting the CloudWatch metric.

For even finer control over how Auto Scaling reacts to the alarm, we could use Step Scaling (ref: http://docs.aws.amazon.com/autoscaling/latest/userguide/as-scale-based-on-demand.html). Step Adjustments allow for a varied response based on the severity of the threshold breach. For example, if the load increases from 40% to 50%, then scale up by only a single instance, but if the jump is from 40% to 70%, go straight to two or more, as sketched after the following tip.

Tip

With Step Scaling the Cooldown period is set via the Instance Warmup option.
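A hedged boto3 sketch of such a step scaling policy follows; the step boundaries are offsets from the 40% alarm threshold, and the group name and adjustment values are illustrative only. The CloudWatch alarm's AlarmActions would then point at this policy's ARN instead of the simple scaling policy.

import boto3

autoscaling = boto3.client('autoscaling')

# Step scaling: 40-50% CPU adds one instance, 50-70% adds two, >70% adds three.
autoscaling.put_scaling_policy(
    AutoScalingGroupName='frontend-asg',      # hypothetical group name
    PolicyName='scale-up-cpu-stepped',
    PolicyType='StepScaling',
    AdjustmentType='ChangeInCapacity',
    StepAdjustments=[
        {'MetricIntervalLowerBound': 0.0,
         'MetricIntervalUpperBound': 10.0,
         'ScalingAdjustment': 1},
        {'MetricIntervalLowerBound': 10.0,
         'MetricIntervalUpperBound': 30.0,
         'ScalingAdjustment': 2},
        {'MetricIntervalLowerBound': 30.0,
         'ScalingAdjustment': 3},
    ],
    EstimatedInstanceWarmup=300,   # takes the place of the Cooldown period
)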

While we aim to scale up relatively quickly to prevent any service disruption, scaling down should be timely enough to save on hourly charges, but not so premature that it causes a scaling loop.

The CloudWatch alarm for scaling down should act over a much longer period of time than the five minutes we used earlier. Also, the gap between the scale-up threshold and the scale-down threshold should be wide enough that instances do not launch only to be terminated shortly after.
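A possible scale-down counterpart, again only a sketch with assumed names, thresholds, and evaluation windows:

import boto3

autoscaling = boto3.client('autoscaling')
cloudwatch = boto3.client('cloudwatch')

# Scale down by one instance at a time, with a longer cooldown.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName='frontend-asg',      # hypothetical group name
    PolicyName='scale-down-cpu',
    PolicyType='SimpleScaling',
    AdjustmentType='ChangeInCapacity',
    ScalingAdjustment=-1,
    Cooldown=600,
)

# Note the longer evaluation window (15 minutes) and the wide gap below the
# 40% scale-up threshold, both intended to avoid a scaling loop.
cloudwatch.put_metric_alarm(
    AlarmName='frontend-cpu-low',
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'frontend-asg'}],
    Statistic='Average',
    Period=300,
    EvaluationPeriods=3,           # 3 x 5 minutes = 15 minutes
    Threshold=20.0,
    ComparisonOperator='LessThanThreshold',
    AlarmActions=[policy['PolicyARN']],
)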

EC2 instance utilization is just one example of a trigger; it is also worth considering ELB metrics such as the total request count, non-2XX responses, or response latency. If you choose any of those, ensure that your scale-down alarms also react to the INSUFFICIENT_DATA state, which is observed during periods of no traffic (perhaps late at night).
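For example, a scale-down alarm built on a Classic ELB metric could be wired to the scaling policy through both AlarmActions and InsufficientDataActions; the load balancer name, threshold, and policy ARN below are placeholders for your own values.

import boto3

cloudwatch = boto3.client('cloudwatch')

# With no traffic the ELB metric is not published at all, so the alarm is
# also wired to trigger the scale-down policy from the INSUFFICIENT_DATA state.
cloudwatch.put_metric_alarm(
    AlarmName='frontend-requests-low',
    Namespace='AWS/ELB',
    MetricName='RequestCount',
    Dimensions=[{'Name': 'LoadBalancerName', 'Value': 'frontend-elb'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=3,
    Threshold=1000.0,
    ComparisonOperator='LessThanThreshold',
    AlarmActions=['<scale-down-policy-arn>'],
    InsufficientDataActions=['<scale-down-policy-arn>'],
)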
