Chapter 6. Stage 3: Optimize

Although traceability and predictability are important elements in financial governance policies, cost control and cost reduction are typically the focus of any financial governance exercise.

Having fully understood the nature of your platform and implemented sufficient controls, the next step is to see how you can take advantage of the cloud platform in order to optimize your usage and therefore minimize cost without affecting the quality of service or the traceability and predictability put in place.

As discussed earlier, the fluidity and ease of control of cloud platforms can cause real difficulties for maintaining financial governance. However, used correctly, these same characteristics become tools for cost reduction.

Cloud platforms are designed for automation, with infrastructure created and destroyed dynamically on demand. Careful use of these facilities can result in a highly optimized system that is extremely cost effective while always meeting the requirements of the business.

Optimizing for Performance

Part of the optimization process is to try to ensure optimal performance. However, when we optimize for performance, it is important to remember that we are optimizing not only the speed of query execution, but also the timeliness of the execution. One of the costliest resources to the business is data scientist time; this can often be a hidden cost of running a data-processing platform. So, the less waiting these people must do, the better.

However, timeliness does not always mean “as quickly as possible.” It is more a matter of understanding when the results are needed and ensuring that they are available by that time and optimizing the cost of delivery to have them ready by then.
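The idea of delivering results "by the time they are needed" at the lowest cost can be illustrated with a minimal sketch. The cluster names, prices, and runtime estimates below are hypothetical and serve only to show the selection logic; in practice the runtime estimates would come from your own workload history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ClusterOption:
    name: str
    hourly_cost: float    # assumed price per hour for the whole cluster
    runtime_hours: float  # assumed estimated runtime of the job on this cluster

    @property
    def total_cost(self) -> float:
        return self.hourly_cost * self.runtime_hours


def cheapest_option_meeting_deadline(
    options: List[ClusterOption], hours_until_needed: float
) -> Optional[ClusterOption]:
    """Return the lowest-cost option that finishes in time, or None if none can."""
    feasible = [o for o in options if o.runtime_hours <= hours_until_needed]
    return min(feasible, key=lambda o: o.total_cost, default=None)


# Hypothetical options: bigger clusters finish sooner but cost more overall.
options = [
    ClusterOption("small", hourly_cost=2.0, runtime_hours=10.0),
    ClusterOption("medium", hourly_cost=5.0, runtime_hours=5.0),
    ClusterOption("large", hourly_cost=12.0, runtime_hours=3.0),
]

overnight = cheapest_option_meeting_deadline(options, hours_until_needed=12)
urgent = cheapest_option_meeting_deadline(options, hours_until_needed=6)
```

With a 12-hour window the small cluster is selected; with a 6-hour window the medium cluster becomes the cheapest option that still delivers on time.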

Reinventing Capacity Management

Moving to a cloud platform requires you to fundamentally reinterpret what is meant by capacity management. As discussed, capacity management was traditionally a matter of planning what capacity was going to be needed during the lifetime of the infrastructure being purchased, allowing some extra capacity for unexpected growth, and then building your system to meet that capacity level. In other words, the objective was always to have spare capacity.

In the cloud world, the opposite view should be taken, because you can create and destroy infrastructure on demand and pay only for what you use. The objective should be to carry as little spare capacity as possible.

Your goal during the optimization stage should be to build systems that continuously provide capacity only slightly above what is needed (cloud capacity versus real capacity, as demonstrated in Figure 6-1), while maintaining the traceability and predictability put in place in earlier stages.

Figure 6-1. Traditional capacity management versus cloud capacity management
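The contrast between the two approaches can be made concrete with a small calculation. This is a sketch under stated assumptions: the demand series, the 30% growth margin, and the 10% headroom are invented figures, not values from any provider.

```python
import math
from typing import List


def traditional_capacity(demand: List[int], margin_pct: int = 30) -> List[int]:
    """Fixed capacity sized up front for peak demand plus a growth margin."""
    fixed = math.ceil(max(demand) * (100 + margin_pct) / 100)
    return [fixed] * len(demand)


def cloud_capacity(demand: List[int], headroom_pct: int = 10) -> List[int]:
    """Capacity that follows demand, staying slightly above it."""
    return [math.ceil(d * (100 + headroom_pct) / 100) for d in demand]


def spare_capacity(capacity: List[int], demand: List[int]) -> int:
    """Total unused capacity (in unit-hours) across the whole period."""
    return sum(c - d for c, d in zip(capacity, demand))


# Hypothetical demand in capacity units for eight consecutive hours.
hourly_demand = [10, 10, 20, 40, 80, 40, 20, 10]

traditional_spare = spare_capacity(traditional_capacity(hourly_demand), hourly_demand)
cloud_spare = spare_capacity(cloud_capacity(hourly_demand), hourly_demand)
```

The traditional model pays for peak-sized capacity in every hour, so most of what it provisions sits unused; the cloud model carries only the headroom.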

Financial Governance Tools Provided by Cloud Service Providers

Cloud service providers offer very limited functionality in the area of optimization. In general, their position is that they provide reporting information with full alerting and the ability to programmatically react to those alerts to manage infrastructure—any action that could be taken automatically to optimize cost becomes your responsibility.

Many companies will create their own custom scripts to carry out automation. For simple tasks, this can be a good solution; scripts are generally quick to create and can be tailored to very specific requirements. The downside is that the scripts then need managing and maintaining as cloud platforms evolve.

Another approach is to build optimization into the application that you are running on the platform, making it aware of its capacity or availability requirements and adjusting the platform in real time to meet those requirements. Done well, this can be a very sophisticated solution, but it is a complex development task and carries higher risk and overhead than the aforementioned scripting approach.

Financial Governance Tools Provided by Cloud Management Platforms

In general, optimization is where people turn to third-party tools as a solution. Tools such as CloudCheckr or Cloudability offer a wide range of optimization tasks that you can easily configure and manage, which we cover shortly.

Again, these tools look to move control of complex tasks from a technical to a management level, removing the risk of developing and managing automation scripts on an ongoing basis, bringing continuous improvements in the functionality they can offer, and integrating with the reporting and alerting solutions the platforms offer.

Waste Reduction

Optimizations usually focus on reducing the amount of waste within the system. This can include the following:

Removing orphaned or unused infrastructure

Removing infrastructure that has been left behind when other infrastructure was terminated (e.g., disk volumes, ideally combined with auto-snapshot before deletion) or infrastructure that has sat idle for a specified amount of time.

Resizing underutilized infrastructure

Adjusting the size of infrastructure that has spare resources to an appropriate level. This requires careful policy creation because capacity must take into account expected spikes in usage.

Starting/stopping infrastructure based on schedules

Automating the creation and destruction of systems to fit around usage patterns. For example, creating development environments for use during office hours or extending production platforms during peak trading hours.
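The three kinds of waste reduction above can be sketched as simple checks over an inventory of resources. The record types, thresholds, and office hours here are assumptions made for illustration; a real script would read the inventory from your cloud provider's API or from a management tool.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Optional, Set


@dataclass
class Volume:
    volume_id: str
    attached_to: Optional[str]  # instance ID, or None if detached


@dataclass
class Instance:
    instance_id: str
    last_active: datetime
    peak_cpu_percent: float  # highest CPU seen over the review window


def orphaned_volumes(volumes: List[Volume], live_ids: Set[str]) -> List[Volume]:
    """Volumes that are detached or point at an instance that no longer exists."""
    return [v for v in volumes if v.attached_to not in live_ids]


def idle_instances(
    instances: List[Instance], now: datetime, max_idle: timedelta
) -> List[Instance]:
    """Instances with no activity for at least max_idle."""
    return [i for i in instances if now - i.last_active >= max_idle]


def resize_candidates(
    instances: List[Instance], peak_cpu_threshold: float = 40.0
) -> List[Instance]:
    """Instances whose peak usage stays low; peak is used so spikes are respected."""
    return [i for i in instances if i.peak_cpu_percent < peak_cpu_threshold]


def within_schedule(
    now: datetime, start: time = time(8, 0), stop: time = time(18, 0)
) -> bool:
    """True on weekdays during the assumed office hours."""
    return now.weekday() < 5 and start <= now.time() < stop
```

A scheduler could call `within_schedule` periodically and start or stop development environments accordingly, while the other functions produce candidate lists for review (or for snapshot and deletion) rather than deleting anything themselves.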

Cost Optimization

You can undertake other automation tasks to minimize the cost of the infrastructure being used. The varying types of cloud charging models are discussed in more detail in a moment, but tooling can apply automatic system management to ensure that the type of infrastructure being used is the best value while meeting the levels of resilience and availability required by the system.

Effective use of practices such as reserved instances or spot instances could reduce costs by up to 80%.

Traceability Management

You can also use automation to apply rules that will ensure that levels of traceability defined are being met. For example, you can configure policies to automatically destroy any elements that are created that do not meet the tagging policy in place.
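A tagging-policy check of this kind might look like the following sketch. The required tag names and the resource dictionary are hypothetical; the function only reports violations, leaving the decision to warn, stop, or destroy to the policy you configure.

```python
from typing import Dict, List, Set

# Hypothetical tagging policy: every resource must carry these tags.
REQUIRED_TAGS: Set[str] = {"owner", "cost-center", "environment"}


def missing_tags(tags: Dict[str, str], required: Set[str] = REQUIRED_TAGS) -> Set[str]:
    """Required tags that are absent or blank on a resource."""
    return {t for t in required if not tags.get(t, "").strip()}


def non_compliant(resources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each non-compliant resource ID to the sorted list of tags it lacks."""
    violations: Dict[str, List[str]] = {}
    for resource_id, tags in resources.items():
        missing = missing_tags(tags)
        if missing:
            violations[resource_id] = sorted(missing)
    return violations


# Hypothetical inventory keyed by resource ID.
inventory = {
    "vm-001": {"owner": "data-team", "cost-center": "cc-42", "environment": "prod"},
    "vm-002": {"owner": "data-team", "environment": ""},
    "vm-003": {},
}

violations = non_compliant(inventory)
```

Here `vm-001` passes, while `vm-002` and `vm-003` would be flagged for whatever automated action the policy prescribes.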

Financial Governance Tools Provided by Cloud-Native Data Platforms

It is in the area of optimization that cloud-native data platforms come into their own. Optimization was often the original objective in their creation: to remove the complexity and inefficiency of running a diverse set of big data activities on a cloud platform.

Cloud-native data platforms optimize your cloud usage by taking two approaches: ensuring the most efficient use of resources, and ensuring that resources are bought at optimal cost.

Resource Efficiency

Cloud-native data platforms ensure that the minimum amount of resources is being used by doing the following:

Ensuring efficient start up and shut down of infrastructure

Many cloud providers charge by the second, so there are savings to be made by ensuring that infrastructure is destroyed as soon as it is not needed. Analytic data platforms manage this by ensuring that infrastructure is destroyed as soon as any workload is completed. An argument against destroying infrastructure immediately can be that there might be requirements to access that infrastructure later to retrieve additional information. Analytic data platforms reduce the risk of this problem by providing capture and backup of logs from any systems that are destroyed.

Ensuring efficient sharing of resources that might be underutilized

Minimizing the need for creation, management, and termination of many platforms. Using an existing platform also speeds up processing time because there is no need to wait several minutes while the platform is created.

Appropriate sizing of infrastructure

Ensuring rightsized infrastructure is used in order to meet performance needs at optimal cost. Analytic data platforms will include workload-aware autoscaling; that is, the dynamic scaling up of infrastructure specifically to meet the needs of the workload being carried out, scaling down the infrastructure as soon as it is completed. This is more efficient than standard cloud autoscaling, which is driven by physical metrics such as memory usage or CPU usage and therefore has no concept of the work being undertaken on the platform.
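The difference between metric-driven and workload-aware autoscaling can be sketched as two target-size functions. This is an illustration under assumed inputs (pending task-hours, a deadline, and task slots per node), not the algorithm used by any particular platform.

```python
import math


def metric_based_target(
    current_nodes: int,
    cpu_percent: float,
    scale_up_at: float = 80.0,
    scale_down_at: float = 30.0,
) -> int:
    """Standard autoscaling: react to a physical metric, one node at a time."""
    if cpu_percent > scale_up_at:
        return current_nodes + 1
    if cpu_percent < scale_down_at and current_nodes > 1:
        return current_nodes - 1
    return current_nodes


def workload_aware_target(
    pending_task_hours: float,
    hours_until_deadline: float,
    slots_per_node: int,
    min_nodes: int = 0,
    max_nodes: int = 100,
) -> int:
    """Size the cluster from the work that remains and when it is due."""
    if hours_until_deadline <= 0:
        raise ValueError("hours_until_deadline must be positive")
    if pending_task_hours <= 0:
        return min_nodes  # nothing left to do, so scale down immediately
    needed_slots = math.ceil(pending_task_hours / hours_until_deadline)
    nodes = math.ceil(needed_slots / slots_per_node)
    return max(min_nodes, min(max_nodes, nodes))


# 120 task-hours of queued work, due in 2 hours, 8 task slots per node.
target = workload_aware_target(120, 2, slots_per_node=8)
```

The metric-based function can only nudge the cluster up or down in response to load it has already observed, whereas the workload-aware function jumps straight to the size the queued work requires and drops to the minimum as soon as the queue is empty.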

Resource Cost Optimizations

Cloud platforms offer various pricing models depending on the level of commitment you want to make, and cloud-native data platforms work across them. There are three basic models; some platforms might use slightly different terminology, but the models are the same:

On demand

Available immediately when you request it with no commitment; ability to destroy when no longer needed

Reserved

An upfront payment or usage commitment is made in return for an agreed rate below the on-demand cost

Spot

Reduced cost offered on spare capacity available within the cloud platform, typically through an auction-type system

Each of these cost models is best suited for a different use case. On demand suits immediacy without the need for commitment; reserved suits situations in which you know the infrastructure will be in use for the majority of the time; spot works for situations in which cost is a driver and the workload is not time sensitive. Spot instances (Figure 6-2) can be terminated at any point when the current price rises above your bid, so a level of resilience needs to be built in to handle this. Cloud-native data platforms handle that resilience by building in systems to replace any spot instances that are terminated (in some cases that can include looking in alternative regions for right-priced spot instances) or by building a platform that is a mix of spot and on-demand instances to ensure that the core of the platform will never be terminated.

Figure 6-2. Example of spot instance management from Qubole platform
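The trade-offs between the three models can be explored with some simple arithmetic. The following is a sketch using invented hourly prices; it assumes a reserved instance is paid for in every hour of the term whether or not it is used, while on-demand capacity is paid for only while running.

```python
def reserved_break_even_utilization(
    on_demand_hourly: float, reserved_effective_hourly: float
) -> float:
    """Fraction of the term an instance must run before reserved is cheaper.

    Reserved costs reserved_effective_hourly for every hour of the term;
    on demand costs on_demand_hourly only for the hours actually used.
    """
    return reserved_effective_hourly / on_demand_hourly


def blended_hourly_cost(
    total_nodes: int,
    spot_fraction: float,
    on_demand_hourly: float,
    spot_hourly: float,
) -> float:
    """Hourly cost of a cluster with an on-demand core and spot nodes on top."""
    spot_nodes = int(total_nodes * spot_fraction)
    core_nodes = total_nodes - spot_nodes  # never terminated by the spot market
    return core_nodes * on_demand_hourly + spot_nodes * spot_hourly


# Hypothetical prices, in dollars per node-hour.
break_even = reserved_break_even_utilization(
    on_demand_hourly=1.00, reserved_effective_hourly=0.60
)
mixed_cost = blended_hourly_cost(
    total_nodes=10, spot_fraction=0.5, on_demand_hourly=1.00, spot_hourly=0.25
)
all_on_demand_cost = blended_hourly_cost(
    total_nodes=10, spot_fraction=0.0, on_demand_hourly=1.00, spot_hourly=0.25
)
```

With these assumed prices, reserved capacity pays off only for infrastructure that runs more than 60% of the time, and a half-spot cluster costs noticeably less per hour than an all-on-demand one while keeping a core that cannot be reclaimed.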

Cloud-native data platforms understand cloud platform cost models. In line with the cost policies that you set in the data platform, they ensure that infrastructure is created at the lowest cost that still achieves your performance needs. This allows you to take advantage of the reduced cost models offered by the cloud platforms without having to understand the details or manage the process.
