Chapter 14. Optimize Phase: Adjusting to Hit Goals

As you enter the optimize phase, your focus moves to making decisions in near real time, identifying anomalies, and spending efficiently. You use goal setting on metrics so your organization can understand how well it is performing against set expectations.

In this chapter we’ll expand on why you need goals, how they should be implemented in the FinOps world, and how to set them. We’ll also introduce the goals of cost optimization, which set the stage for the remainder of the optimize phase.

Why Do You Set Goals?

Every journey into the cloud tells a different story. For some, the speed with which all services are deployed into the cloud is more important than the cost of doing so. For others, the primary goal is to take it more slowly and ensure that the budget is maintained during the whole process. Others may already be in the cloud and are discovering that they’re spending much more than expected. They understandably feel a need to get things under control.

It’s worth noting that most organizations have multiple cloud stories going on at the team level. Some teams might be cloud native and looking to maintain their costs. Others are migrating or implementing new projects and are focused more on delivery time than on dollars spent.

By setting goals at the corporate level, and for each individual team, you’re able to keep track of business decisions, set expectations, and identify where within your cloud spend things aren’t working out as well as you hoped.

The First Goal Is Good Cost Allocation

Before you can set further goals, you must have a clear understanding of the current state. If you implement a clear tagging and account separation strategy as you scale out resources in the cloud, it’ll be easier to make sense of what you are dealing with and how best to optimize it.

Every organization should be tracking spend by cost center/business unit. This not only enables each individual team to understand their impact on the overall cloud spend but also allows the business to determine which teams are driving any cost changes.

We’ve already covered how to implement a clear cost allocation strategy that helps identify spend by cost center. These cost allocation strategies need to remain consistent. If they don’t, a team reviewing its historical cloud spend won’t be able to distinguish changes in its own costs from changes in how much of the overall cloud spend the allocation strategy assigns to it.

A team-by-team breakdown of costs highlights when changes are occurring. Without individual team-level reporting, data can be misleading. For instance, if one team is optimizing their cloud spend while another grows their footprint by a similar amount, it can appear as if costs are flat overall. If a team makes changes to the infrastructure and there’s no resulting change to the graphs tracking cost metrics, then you’re tracking the wrong metrics.
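The masking effect described above can be shown with a small sketch. The team names and dollar figures are hypothetical and chosen only for illustration: the organization-wide total does not move, while the per-team view shows two large, offsetting changes.

```python
# Hypothetical monthly spend per team, in dollars (illustrative numbers only).
spend_by_team = {
    "team_a": {"last_month": 120_000, "this_month": 90_000},   # optimizing
    "team_b": {"last_month": 80_000, "this_month": 110_000},   # growing
}


def total_change(spend):
    """Change in total spend across all teams between the two months."""
    last = sum(team["last_month"] for team in spend.values())
    this = sum(team["this_month"] for team in spend.values())
    return this - last


def change_by_team(spend):
    """Change in spend for each team between the two months."""
    return {
        name: team["this_month"] - team["last_month"]
        for name, team in spend.items()
    }


print(total_change(spend_by_team))    # 0: the total looks flat
print(change_by_team(spend_by_team))  # {'team_a': -30000, 'team_b': 30000}
```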

Is Savings the Goal?

As we’ve mentioned numerous times (because it’s very important), savings is not always the goal. Many FinOps practitioners start out focused on ways to reduce their cloud bill, and that’s great. But you must always remember that FinOps is about enabling accountability, ownership of spend investment decisions, and ultimately innovation. As you set goals, you should ask yourself: How can I spend the right amount of money to deliver the biggest benefits to the business? What are the benefits I’m seeking? Do they include increased revenue, faster project delivery, cost efficiency in other areas, or even just happier customers? Could spending more here reduce costs elsewhere, such as labor?

An organization may need to move out of a data center within a certain timeframe, and the cost of missing that deadline is much higher than the cost of migrating quickly into the cloud. Or in another case, it may be crucial for a business to deliver a feature ahead of a deadline, which requires using managed services like Amazon Aurora instead of running its own MySQL. Aurora costs more than running MySQL yourself, but it brings with it the benefit of removing time-consuming administration tasks (e.g., hardware provisioning, database setup, patching, and backups), which changes the TCO (total cost of ownership) comparison and allows the business more time to dedicate to the migrations.

The optimize phase is about measuring your spending against your goals and determining what decisions you can make now and which actions you should schedule for later to clean up.

To encourage a conversation about how your goals should align to savings versus the speed of innovation, we introduce the Iron Triangle.

The Iron Triangle: Good, Fast, Cheap

The Iron Triangle (or “Project Triangle,” as some call it), shown in Figure 14-1, is a model of the constraints of project management: speed, cost, and quality. The common formulation is: “Good, fast, cheap—pick any two.” It contends that a project manager can trade among constraints in determining the way a project is completed. A change in one constraint necessitates changes in the others to compensate.

Figure 14-1. The Iron Triangle

For example, a migration project can be completed more quickly if you increase the budget or cut the scope. Similarly, increasing scope may require equivalent increases in budget and schedule. Cutting budget without adjusting schedule or scope will lead to lower quality.

The Iron Triangle is a perfect metaphor for visualizing the process that occurs around cloud decision making—there’s often a trade-off among good, fast, and cheap. The more a team aligns with one end of the triangle, the less they focus on the other areas. But this isn’t a bad thing. You want teams to make intentional choices toward one end of an axis to support their business goals.

  • Good measures the quality of your services. Giving more resources to a service running in the cloud can result in improved availability, disaster readiness, performance, and scalability.

  • Fast measures how quickly your teams move. That speed can be very valuable when you’re getting the latest updates or features out to customers, quickly migrating services into the cloud, or cutting time to market for new products.

  • Cheap measures the amount of cloud spend. A focus on cheap (due to COGS constraints or budget caps) will result in the lowest cloud costs, but it comes at the expense of quality and speed, that is, good and fast.

The way this plays out in the real world was reflected in a recent project at Atlassian. One team oversized resources to gather extra performance data, and, as expected, that resulted in improved migration speed. Had the team stopped to resize resources for cost savings, the overall project would have been delayed. Delivery speed was more important than the amount of potential savings on the table. Once the project was delivered, the focus shifted away from speed, and the team actively began reviewing the deployment to discover possible savings.

We must emphasize that the Iron Triangle isn’t intended to be used only at the organization level. It must extend to the team level as well. It’s common, and useful, for a cost-aware organization to have a single team and/or service operating with a speed focus. While overall the organization is trying to maintain or increase savings in the cloud, the increase in savings can be reinvested in the team moving more quickly to develop the next competitive advantage for the organization.

Hitting Goals with OKRs

OKRs, or objectives and key results, are a framework for defining and tracking objectives and their outcomes. For each OKR, there’s an objective to be achieved, along with a set of metrics called key results that will measure the achievement of that objective. OKRs typically have a shelf life of a quarter.

During a FinOps Foundation presentation, Joe Daly, formerly Director of Cloud Services at Nationwide, said, “We call our shots with OKRs and focus on results to provide clarity, accountability, and measurable outcomes.”

Daly also made clear that the most important thing with OKRs is to focus on results. When an enterprise is moving to the cloud, there can be a massive amount of disruption as teams try to quickly assimilate a mountain of new knowledge. And that can be very intimidating for teams who have done things another way for years.

OKR Focus Area #1: Credibility

When Daly’s FinOps team at Nationwide was relatively new, he offered this advice: “Credibility is probably the most important area to focus on when you’re starting up a FinOps practice. Credibility equates to trust, and if you don’t have that trust, you’re constantly trying to prop up the services you provide.”

Daly supports his credibility objective by providing transparency from the end user all the way down to the code. A key part of that is regular spend updates (daily, weekly, monthly) at whatever granularity each stakeholder needs. These updates must be simple and easy to understand. The numbers must also tie to what their accountants will report to them at the end of the month.

OKR Focus Area #2: Maintainable

All too often, new FinOps teams approach their work in unmaintainable ways, such as not enforcing automated tagging. As Daly says, “Meaningful data needs to be managed by the people to whom it’s meaningful. What we’ve done is create a tag repository that’s maintained by the application product managers and business-facing folks, so that you can tie all the application data and business data to the resources without depending on engineers, to whom the data is not as meaningful.”

Two examples of key results his team has set in this area are being able to tie application name, application owner, and business group to each resource and automating routine, time-consuming tasks like chargeback.

OKR Focus Area #3: Control

The goal here is to focus on establishing control while also enabling speed. In Daly’s words, “We push accountability for usage control to the application/product teams.” They’ve accomplished this by establishing a direct chargeback model, sharing knowledge, and encouraging user adoption, while being sure to implement policies to protect against autoscale nightmares.

Some examples of key results his team has set in this area are to double their tag compliance rate, to establish chargeback for cloud and container usage, and to automate compliance policies.
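A key result such as tag compliance rate is only useful if it can be measured. The following is a minimal sketch of one way to compute it, assuming each resource is represented as a dictionary of its tags. The tag keys mirror the three attributes mentioned earlier (application name, application owner, business group), but the exact key names and the sample resources are hypothetical.

```python
# Hypothetical tag keys; real key names depend on your tagging standard.
REQUIRED_TAGS = ("application_name", "application_owner", "business_group")


def is_compliant(tags):
    """A resource is compliant when every required tag has a non-empty value."""
    return all(tags.get(key) for key in REQUIRED_TAGS)


def tag_compliance_rate(resources):
    """Fraction of resources that carry all required tags."""
    if not resources:
        return 0.0
    compliant = sum(1 for tags in resources if is_compliant(tags))
    return compliant / len(resources)


# Illustrative inventory: two fully tagged resources, one with an empty
# owner tag, and one with the owner tag missing entirely.
resources = [
    {"application_name": "checkout", "application_owner": "alice", "business_group": "retail"},
    {"application_name": "search", "application_owner": "", "business_group": "retail"},
    {"application_name": "billing", "business_group": "finance"},
    {"application_name": "reports", "application_owner": "bob", "business_group": "finance"},
]

print(tag_compliance_rate(resources))  # 0.5
```

Tracking this number over time gives the team a baseline against which a goal like "double the compliance rate" can be checked.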

Teams have different motivators that will drive spend and savings. Engineers quite rightly set goals around more performance, higher availability, and faster deployments of new features. Finance teams focus appropriately on spending, savings, and efficiency of spend. And business leaders predictably remain focused on overall business performance. When setting goals for FinOps, you can’t just leave these teams to individually set their goals. Not only will it not achieve the correct outcome for the business, but it will also drive up friction between teams.

When you have engineering and finance talking together you usually get a solution that works for both departments, as opposed to engineers just getting in the room and coming up with a solution that only works for the engineers. Likewise, finance comes in developing policies that make life harder for the engineers. Getting people to work together to build better-focused OKRs will drive better results for everyone.

Joe Daly, formerly Director of Cloud Services at Nationwide

Business leaders need to decide how much they are willing to spend, and when they should forgo savings (and potentially even incur waste) in the interest of speed. But whether it’s about moving IT out of the data center as soon as possible or getting a new service out to customers, there should always be tracked spending expectations. Speed at all costs works only until spend grows much higher than is acceptable.

Instead of waiting for cloud costs to breach uncomfortable thresholds and then struggling to respond, setting early expectations and tracking cloud spend as projects progress will help avoid bill shock. Engineers are then given more freedom within agreed cost bounds. FinOps teams can help identify the easy wins to keep budgets on track, while finance and business leaders are able to reassess budgets and targets.

In FinOps Certified Practitioner training sessions, the topic of the FinOps team’s role in the optimize phase is about helping to make the important decisions about cost that keep spending in bounds (no matter how fast you’re trying to go) and then being able to identify places to optimize later when ideal cost paths aren’t followed (due to speed OR due to sloppiness, lack of skill, desire for higher quality, etc.). So we find the low-hanging fruit to clean up, but we also inject an important set of cost checks early in the process for governance as well.

Rob Martin, Director of Learning at the FinOps Foundation

The entire business needs to find this happy medium, spending enough to enable technology teams while keeping spending within acceptable bounds. Ultimately, you’re building toward a FinOps nirvana, where unit economics will enable you to determine the amount of business value you gain from cloud spend.

Discussing goals with FinOps practitioners at other organizations can help you with building out your own goals. Hearing the goals that are important to them, and why, can assist you in ensuring you are setting the right goals and aligning with the general FinOps community.

Goals as Target Lines

Goals aren’t just a single value—they’re a trend over time. If you set a goal as a single value such as x dollars per month, you have to wait until the end of the month to see how you are tracking. By breaking this goal into daily values, you can draw target lines on your cloud spend graphs that enable near-real-time analysis. Target lines are critical in metric-driven cost optimization, which we will cover later in Chapter 22.
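As a rough sketch of breaking a monthly goal into daily values, the function below spreads the goal evenly across the days of the month and returns a cumulative target for each day. The even split, the function name, and the $300,000 goal are assumptions made for illustration; a real target line might be weighted toward weekdays or follow a planned ramp.

```python
import calendar


def daily_targets(monthly_goal, year, month):
    """Spread a monthly goal evenly across the days of the month.

    Returns a list of cumulative target values, one per day, so that the
    last entry equals the monthly goal.
    """
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    per_day = monthly_goal / days
    return [per_day * day for day in range(1, days + 1)]


targets = daily_targets(monthly_goal=300_000, year=2023, month=6)
print(len(targets))  # 30 (days in June)
print(targets[9])    # 100000.0 (cumulative target after day 10)
```

Plotting these cumulative values next to cumulative actual spend shows on any given day whether you are ahead of or behind the goal.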

Metrics should always have target lines to provide more context to the data. Where you set the target line will be based on your organization’s cloud journey, how aggressive you are in maintaining spend, and the value you see in spending more to enable innovation.

Organizations currently focusing on the “fast” corner of the Iron Triangle might set their targets pretty high. Breaching the targets will only be informational and will result in raising the forecast. On the other hand, a cautious organization (focusing on the “cheap” corner of the Iron Triangle) will be setting targets fairly close to its existing spend trajectory, and breaches in spend will be followed up by quick actions to get back on track.

If you graph your spend over time as in Figure 14-2, you can determine a few basic things:

  • You’re currently spending between $400,000 and $530,000 per day.

  • Overall, your cloud spend is increasing.

  • Last month’s spend increased at a steeper rate than previous months.

But these basic facts about the cloud spend are only data points. You can’t determine any performance or business expectations from the graph.

Figure 14-2. Graph of daily spend over a few months

However, if you add a target line, as shown in Figure 14-3, you can determine a lot more:

  • Your spend has been under target historically.

  • Last month you overspent against your target.

  • You haven’t revised your target over the previous months.

  • You will be over target again this month if your current trend doesn’t change.

Adding the target line allows you to get more context about the graphed data points. A target line does not always need to be linear; consider a target that has a quick ramp-up followed by a plateau, or delayed growth followed by a rapid rise. Where possible, you should always try to include a target line within your charts to understand the impacts of the data points on the organization. This plays into the language of FinOps, as discussed in Chapter 4.

Figure 14-3. Graph of daily spend over a few months with the addition of a target line
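The last observation in the list, that you will be over target if the current trend continues, can be approximated with a simple run-rate projection. This is a sketch under stated assumptions: it extrapolates the average daily spend so far, which ignores seasonality and growth, and the spend and target figures are hypothetical (picked to sit in the same range as the daily spend in the figures).

```python
def projected_month_end(spend_to_date, days_elapsed, days_in_month):
    """Extrapolate month-end spend from the average daily spend so far."""
    return spend_to_date / days_elapsed * days_in_month


def will_breach(spend_to_date, days_elapsed, days_in_month, monthly_target):
    """True when the run-rate projection exceeds the monthly target."""
    projection = projected_month_end(spend_to_date, days_elapsed, days_in_month)
    return projection > monthly_target


# Hypothetical month: $6M spent after 12 of 30 days, against a $14M target.
print(projected_month_end(6_000_000, 12, 30))      # 15000000.0
print(will_breach(6_000_000, 12, 30, 14_000_000))  # True
```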

Budget Variances

Generally, if your metrics vary significantly from your targets, there’s an issue. To maintain budgets, teams should take breaches of targets seriously and react to changes in metric performance as early as possible.

It might not seem important when metrics come in well under target, but you should still strive to set your targets at the level you actually expect. This will build confidence in the numbers being tracked and ensure that all details are being built into the target-setting phase.

Sometimes teams need to make choices, and those choices will affect their budgets. Consider a case in which one team is behind on a project. Their plan was to deploy smaller cloud resources to keep costs down, but to get the project back on track, they decided to deploy larger resources and complete things faster. This will result in a short-term breach of targets, but it will have a business benefit of getting the project delivered on time. This is what we classify as an expected anomaly. A decision is made ahead of time to exceed the planned budgets to achieve a desired business outcome.

An unexpected anomaly occurs when teams deploy resources with the expectation that they will maintain budget, and then find they are trending over budget. An unexpected anomaly typically means something will need to change: either what was deployed, or the targets set for the team.

Say, for example, that new features in your products could drive up customer interaction with your services, which would in turn increase costs. If that increase in cost also increases your earnings, then this would be a good anomaly to see in your cloud spend. (We’ll discuss unit economics, a process that allows you to track earnings against your costs, in Chapter 26.)

You must also be able to track anomalies that might not directly result in a change in cloud spend. If one of your teams starts using a new cloud service offering, replacing the usual one, you can learn of this through anomaly reports that show your cost by cloud service offering. Anomalies in this report can be very significant for companies that require sign-off—for security or compliance reasons—before using new services.
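A report like this can be as simple as comparing cost by service between two periods. The sketch below assumes cost data is already grouped into a mapping of service name to cost. The service names and amounts are hypothetical, and a real report would be built from your provider's billing export.

```python
def new_services(previous_costs, current_costs):
    """Return services with spend this period that had none last period."""
    return sorted(
        service
        for service, cost in current_costs.items()
        if cost > 0 and previous_costs.get(service, 0) == 0
    )


# Hypothetical cost by service for two consecutive months, in dollars.
previous = {"compute": 5_000, "object_storage": 1_200}
current = {"compute": 4_800, "object_storage": 1_250, "managed_database": 300}

print(new_services(previous, current))  # ['managed_database']
```

Total spend barely moves between the two months in this example, so a report on total cost alone would not surface the new service.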

Tracking your cost optimizations allows you to identify unexpected changes in optimization performance, which could indicate that part of your FinOps strategy is not being followed correctly. Anomalies in the optimization reports allow the FinOps team to react early in order to maintain the savings that an organization has come to expect. Using automated, machine learning–based anomaly detection is key to finding those needles in your cloud haystack quickly.
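To make the idea of flagging anomalies concrete, here is a deliberately simple sketch. It is not machine learning: it substitutes a rolling mean and standard deviation check, flagging any day that deviates sharply from the preceding week. The window size, threshold, sample data, and the `approved_days` parameter (day indexes for which an overspend was agreed ahead of time) are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(daily_spend, window=7, threshold=3.0):
    """Flag days whose spend deviates sharply from the preceding window.

    A day is flagged when it is more than `threshold` standard deviations
    away from the mean of the previous `window` days.
    """
    anomalies = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(daily_spend[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


def classify_anomalies(daily_spend, approved_days=(), window=7, threshold=3.0):
    """Label each flagged day 'expected' if it was approved ahead of time."""
    return {
        day: "expected" if day in approved_days else "unexpected"
        for day in find_anomalies(daily_spend, window, threshold)
    }


# Hypothetical daily spend (in thousands of dollars) with one spike.
spend = [100, 102, 98, 101, 99, 103, 100, 101, 160, 102]

print(find_anomalies(spend))                         # [8]
print(classify_anomalies(spend))                     # {8: 'unexpected'}
print(classify_anomalies(spend, approved_days={8}))  # {8: 'expected'}
```

The same check can be run on optimization metrics, such as daily realized savings, to catch drops in savings as well as spikes in spend.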

Using Less Versus Paying Less

When you know what you are spending versus what you are expecting, the next obvious question is what to do when you are over forecast. Well, there are two ways to reduce your cloud bill, and to best understand them, let’s revisit how the cloud charges you.

We covered how cloud providers charge you in Chapter 5. Remember, the basic formula for cloud spend is Spend = Usage × Rate. To reduce spend, you can either reduce the amount of usage you have or use cloud service provider–specific offerings to reduce the rate you pay for your resources.

You reduce the usage by either reducing the size of provisioned resources (e.g., smaller virtual machines with less virtual CPU [vCPU] and memory) or turning them off/removing them when not needed. You generally earn a rate reduction by making a commitment to your cloud service provider for a set amount of usage over a period of time. In return, they reduce the rate you pay for those resources.
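The two levers can be illustrated with the formula Spend = Usage × Rate. The hours and hourly rates below are hypothetical and are not real provider prices: the sketch simply compares halving usage, lowering the rate, and doing both.

```python
def spend(usage_hours, hourly_rate):
    """Spend = Usage x Rate."""
    return usage_hours * hourly_rate


# Hypothetical baseline: one resource running all month (720 hours)
# at an on-demand rate of $0.50 per hour.
baseline = spend(usage_hours=720, hourly_rate=0.50)

# Usage reduction: turn the resource off half the time.
less_usage = spend(usage_hours=360, hourly_rate=0.50)

# Rate reduction: a commitment lowers the hourly rate (assumed 25% lower).
lower_rate = spend(usage_hours=720, hourly_rate=0.375)

# Both levers applied together.
both = spend(usage_hours=360, hourly_rate=0.375)

print(baseline, less_usage, lower_rate, both)  # 360.0 180.0 270.0 135.0
```

Because the two factors multiply, the savings compound when both levers are applied.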

We’ll be covering tactics and strategies for each of these two levers—usage and rate reduction—in Chapters 15 and 16.

Conclusion

The inform phase of the FinOps lifecycle helps you understand where things are. As you move into the optimize phase, you set goals about where you expect to be. Using these goals enables the business to focus on items that need attention.

To summarize:

  • The optimize phase is not always about reducing costs—it’s about spending efficiently.

  • Goals give context to metrics, allowing people to understand not only where things are, but also where they should be.

  • Anomalies detected early can be addressed quickly, avoiding billing surprises.

  • You can reduce costs through usage reduction or rate reduction.

When reducing cloud spend, it’s important to understand what is actually needed. Reducing spend at the cost of innovation, or at the cost of impacting an important project, should always be avoided. Reducing costs in the cloud is complex, but we’ll help by breaking it down over the next few chapters.
