Hands-on lab 3 – Current billing file

What is in the current billing file? How can it be compared to the current market? Today, analyzing a cloud billing file is very difficult. Billing files are very detailed. They usually cover many different services, with different locations, billing methods, terms, and quantities, and they use very cryptic ways of identifying exactly which product or service is being referenced. Many teams download spreadsheets and CSV files and analyze them line by line, which is time-consuming and prone to error. Most automated tools cannot compare and drive insight across the entire market, so many efforts take days or weeks to normalize and compare billing data. Cloud solutions have services that last fractions of a second, hours, or days. Taking weeks to analyze, compare, and design solutions is not acceptable in the cloud industry. Automation and enablement are requirements.

  1. In the bottom left of the design board, there are a few icons that can start leading us to visually-driven insight. The first icon should be currently selected. The first icon shows the logical visualization of everything in the file:
  1. This billing file has four main locations as described earlier, three in the US and one in Europe. Please click on the icon located furthest to the right in that same row at the bottom left of the design board:

This view visualizes the billing file line by line with a total at the bottom. The following screenshot shows a partial view of the Bill of Material (BOM) view. Please confirm you have selected the correct tab by comparing to the following screenshot:

  1. Please scroll to the bottom of the page, using the small slider on the right side of the page or the mouse wheel. At the bottom of the page, a green oval holds the total for the billing term specified in the loaded billing file, and a second green oval holds the total monthly recurring cost (MRC). Since this is an AWS bill that has services for a term of one month or less, the MRC will match the total. The oval labeled NRC (non-recurring cost) has a total of $0.00. This confirms that no reserved instances are being consumed. Please confirm that views match up to the following screenshot:

The imported billing file shows a total of $65,337.13. This is the rolled-up total for all locations contained in the billing file. It is important to be able to understand the stories the data is telling. It is also very important to understand what questions still need to be asked and what answers still need to be found. For example, how much of this bill is allocated to each site? Which site is primary? What products and services are currently deployed at each site?

  1. Please click on the icon in the bottom left of the screen; this time, please choose the icon located second from the right:

This icon will bring up a list that can be searched and filtered, again enabling quick visual insight utilizing various ways to align and compare data. Please confirm views have changed to the correct location by comparing to the following screenshot:

  1. The columns can be sorted by clicking the heading for each column. Please confirm which service has the highest MRC cost by clicking MRC twice. The first time will arrange it from low to high. The second click will reverse the order and arrange it from high to low. Please confirm via the following screenshot. Which service is the most expensive? Which site is it deployed in? What is the second highest and where is it deployed?
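The two-click sort described in this step can be reproduced outside the tool. The following is a minimal sketch, assuming the line items have been exported as a list of records. The field names (`site`, `service`, `mrc`) and every value in the sample rows are hypothetical placeholders, not figures from the NeBu Systems bill:

```python
from operator import itemgetter

# Hypothetical line items: sites, services, and MRC values are invented
# placeholders for illustration and are NOT taken from the billing file.
line_items = [
    {"site": "SITE-A", "service": "service-1", "mrc": 120.00},
    {"site": "SITE-B", "service": "service-2", "mrc": 950.50},
    {"site": "SITE-A", "service": "service-3", "mrc": 0.00},
    {"site": "SITE-C", "service": "service-4", "mrc": 410.25},
]

# First click on the MRC header: low to high.
ascending = sorted(line_items, key=itemgetter("mrc"))

# Second click on the MRC header: high to low.
descending = sorted(line_items, key=itemgetter("mrc"), reverse=True)

# The top two rows answer "which service is most expensive, and where?"
most_expensive, second_highest = descending[0], descending[1]
print(most_expensive["service"], most_expensive["site"])
print(second_highest["service"], second_highest["site"])
```

Keeping the site next to each service in the sorted output is what makes the follow-up questions about location possible.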

AmazonElastiCache appears to be the highest-cost line item in the bill. This service is currently deployed in USW1 (AWS San Jose). Knowing that caching is the highest-cost item in the entire billing file raises some interesting optimization questions. Caching is typically employed to keep the cost of other services down:

    • Is caching working as planned?
    • Is it deployed correctly?
    • Is it refreshing content and removing stale content as planned?
    • Does the caching service offset other more expensive services as intended?
    • Should this much content in San Jose be caching this often?
    • Is San Jose the primary location that should be serving a majority of the content?

Quickly visualizing data in this way enables attention, focus, and effort to be placed where they are most effective, based on the insight revealed. Cloud architecture requires a keen sense of utilizing only what is needed, only when it is needed. It requires being as mindful of economic impact as of technical details.

The bill also has AWSDirectConnect as the second most expensive line item. This line item is deployed in a different location from the caching service. AWSDirectConnect is deployed from US East 1 (Northern Virginia), not US West 1. AWSDirectConnect is used to connect client locations directly to AWS. What types of questions surface knowing these details?

    • How does the direct-connected location on the east coast relate to the west coast location that appears to be caching a lot of content?
    • Is the US East 1 location the primary location or is the US West 1 location primary?
    • There were four sites represented in the billing file. Is one of the other locations primary?
    • Why is there 2445 GB of data being transferred in one month across the AWSDirectConnect link? Big transfer or backup job?
    • 2445 GB transferred in less than 720 hours per month equates to a fully utilized 7 Mbps-8 Mbps line. Is there a more cost-effective solution for low bandwidth connectivity?
    • What location does the AWS US East 1 location directly connect to?
    • Can/should the services connecting to AWS be moved into a cloud service to eliminate the monthly cost associated with AWSDirectConnect?
    • At current market pricing, a direct connection to a 10 Gbps port is $2.41/hour. Using 720 hours as a standard month, the monthly cost shown in the bill would equate to a total of three 10 Gbps ports carrying a total of about 7 Mbps. What is the story that this is telling?
    • The actual cost is for transfer out. AWS does not charge for inbound transfer. At a current average of $0.02 per GB of outbound transfer, 2445 GB should account for less than $50.00 total for the month. Again, what is the story behind such an anomaly?
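The arithmetic behind the last few questions can be checked with a short sketch. It uses only the figures quoted in the bullets (2445 GB, 720 hours, $2.41 per port-hour, $0.02 per GB). The one assumption is decimal units, where 1 GB is 10^9 bytes:

```python
# Figures quoted in the bullets above.
gb_transferred = 2445        # GB moved across the link in the month
hours_per_month = 720        # the "standard month" used in the text
port_rate_per_hour = 2.41    # USD per hour for one 10 Gbps port
egress_rate_per_gb = 0.02    # USD per GB, average outbound transfer rate

# Assumption: decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bits/s).
seconds_per_month = hours_per_month * 3600
avg_mbps = gb_transferred * 10**9 * 8 / seconds_per_month / 10**6

# Port-hour charges for one port and for three ports over the month.
port_cost_per_month = port_rate_per_hour * hours_per_month
three_ports_per_month = 3 * port_cost_per_month

# What the outbound transfer alone would be expected to cost.
egress_cost = gb_transferred * egress_rate_per_gb

print(f"average throughput: {avg_mbps:.2f} Mbps")
print(f"one 10 Gbps port:   ${port_cost_per_month:,.2f} per month")
print(f"three ports:        ${three_ports_per_month:,.2f} per month")
print(f"outbound transfer:  ${egress_cost:,.2f} per month")
```

The average throughput lands between 7 and 8 Mbps, and the transfer charge stays under $50.00, which is why port-hours costing thousands of dollars per month stand out as an anomaly.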
  1. As great cloud architects, diving deeper is a must. There is more to this story. Please click on the Contract column header at the far left:

By clicking this header, you can sort the table by this column using alphabetical order. Clicking one time will sort A-Z. Clicking a second time will sort Z-A. Please click one time only. Please scroll down to find USE1 for US East 1. Please confirm the view matches the following screenshot:

More very interesting data points rise in this view. Unfortunately, at this time, we appear to be finding more questions than answers. Please look at the types of services (second column) and the monthly costs (last column on the right). What stands out? What is the story being told?

  • Fairly normal infrastructure is deployed that could be used in either primary or backup locations including DB, block storage, compute, S3, DNS, and so on
  • Costs are minimal and in some cases $0.00
  • It appears that this site would be set up as a redundant site; maybe a warm site that has some data, but not thousands of GBs worth of data

Some questions appear when looking at some of the additional detail:

  • Why are there thousands of GBs and thousands of dollars' worth of data being transferred out of this site when there is very little data stored in this location?
  • The amount of data stored in this redundant/backup location does not appear to match up with what is expected of a consumer spending roughly $65,000 per month on AWS cloud services.
  • Compute costs are zero, or close to it. Has the data that is replicated there been validated? Has it been verified to work as planned? When was the last time it was checked and tested?

The solution has both DynamoDB and RDS deployed. In the cloud realm in particular, different types of databases can be utilized for different purposes. For example, DynamoDB is strictly a NoSQL database, whereas RDS can be one of six database engine types. DynamoDB is a multi-tenant database solution with much lower costs. RDS is a single-tenant solution at much higher costs. The two have completely different pricing models.

  1. At the top of the same page in current view, there is a Text Search box. Please type dyn in the search box:

The filtered results immediately change to only show locations with DynamoDB deployed. Please confirm that views match the following screenshot:

The filtered detail shows that DynamoDB is deployed, or at least enabled, in all four locations in the billing file. There is very little, if any, activity in the last month, or maybe longer:

    • Why are these services enabled and not used, or used very little?
    • Do these services present any added risk, as they are likely partially configured, or set to basic defaults, and not locked down at this point?
    • How do these relate to RDS, if at all? Is RDS also partially configured?
    • Which service is primary for the business?
    • Which site is primary and backup for the database service that is supposed to be utilized?
  1. Please replace dyn in the Text Search box with RDS:

The filtered results immediately change to only show locations with RDS deployed. Please confirm that views match the following screenshot:

The filtered detail shows that RDS is only deployed in two locations based on the data in the billing file. US West 1 appears to be the primary location, with nearly $3,000.00 in monthly spend associated. The only other site is in Europe, with less than $250.00 in monthly spend. Again, with some answers found, more questions are added to the list:

    • RDS can be set up in a multi-zone deployment. Based on the data, that does not appear to be the case for this deployment. Should this be verified?
    • How does a single-zone deployment of RDS affect suggestions for the future state?
    • How would a multi-zone RDS deployment affect economics and risk?
    • Which site appears to be primary, based on database activity?
    • Database activity may not be the ideal way to determine which location is production, but based on the details seen so far, it points to the answer with very high probability.
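The Text Search filtering used in the last two steps can be approximated as a case-insensitive substring match. This is only a sketch: whether the platform matches against every column is an assumption, the helper name `text_search` is hypothetical, and the site labels in the sample rows are placeholders:

```python
def text_search(rows, query):
    """Keep rows where any column contains the query, ignoring case."""
    needle = query.lower()
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]

# Hypothetical rows: site labels are placeholders. Only the service names
# mirror the database services discussed in the text.
rows = [
    {"contract": "site-1", "service": "AmazonDynamoDB"},
    {"contract": "site-1", "service": "AmazonRDS"},
    {"contract": "site-2", "service": "AmazonDynamoDB"},
    {"contract": "site-2", "service": "AmazonS3"},
]

dynamo_rows = text_search(rows, "dyn")   # typing "dyn" in the search box
rds_rows = text_search(rows, "RDS")      # replacing it with "RDS"

print([row["contract"] for row in dynamo_rows])
print([row["contract"] for row in rds_rows])
```

Comparing the two result sets side by side shows which sites have one database service but not the other.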

As a cloud architect, many hats must be worn. We are investigators at times. The accountant, technician, risk manager, and strategist hats are never far away. Modern cloud architects must have as much or more skill in business finance and economics as they do technical prowess.

As the investigation into NeBu Systems' current state progresses, the details examined must continually work to align NeBu Systems strategically, economically, and technically. NeBu appears to be overspending in some areas and potentially not spending enough in others. The technical mix appears to be solid for the most part, with definite areas to improve. As NeBu Systems described earlier, things do feel like they have grown quickly, without the best governance and with limited change management.

Up to this point, most of the examination has been across all locations included in the billing file. This has helped NeBu Systems gain a better understanding of where they are overall and what they are consuming, and has identified some ways to focus optimization efforts that may help control service sprawl and escalating costs.

In the next section, a deeper dive into individual locations within the billing file is needed. Comparing existing deployments to the current market will quickly provide insight that will help solidify direction and next steps for NeBu Systems as they continue their transformation to cloud.

Please click the first icon in the bottom-left corner to switch back to the design view. In the next section, current state data for the primary location will be considered and compared to current market real-time data to help expose additional insight:

The view should switch to display all four NeBu Systems locations, with any infrastructure and services currently deployed at each location. Please confirm the view matches the following screenshot. From this view, each NeBu location and the services deployed there can be individually compared to the current market to identify options that NeBu can use to optimize designs item by item.

Each service has its own characteristics, deployment size, level of utilization, technical detail, and economic impact. Each compute service has its own performance characteristics and reliability/availability trends over time. Each of these data points will help the NeBu Systems cloud architects align strategy, economics, and technical requirements:

The primary locations appear to be USW1 and USW2. This next section will focus on the optimization of USW1. The current view shows compute, storage, services, and connectivity. Please click on the affectionately named hamburger menu at the top right of the USW1 current state design. Please confirm menu location in the following screenshot:

It will take a minute or two for the view to change. In real time, every line item is analyzed and compared to the current market. Once the view changes, a BOM view should show on the right, with the design visualization on the left. Please confirm the view has changed as shown in the following screenshot:

Please scroll through the line items on the right to the bottom of the page. Three ovals should now be visible. Please confirm the current view matches the following screenshot, with the three ovals now visible at the bottom of the page:

Again, what stories can be told with the data?

  • The billing data shows nearly $31,000.00 total spent for this location during the billing period
  • The monthly recurring cost (MRC) is also nearly $31,000.00
  • The MRC equals the total, meaning that all services have a term of one month or less
  • None of the spend is NRC, meaning NeBu Systems is not currently utilizing any reserved instances

The data mentioned provides a good high-level overview of the current state services and current state spending for NeBu Systems. More detail is needed as optimization efforts are explored:

  • What is driving the cost of the solution?
  • Are there any strategic, economic, or technical factors that highlight where the focus should be placed as future state considerations are made?
  • How does performance factor in?
  • Can the footprint be consolidated to help control cost?
  • Are the correct or optimal instance types being used?
  • Are consumption models matching up with strategy?

Please click on the middle icon at the bottom left of the screen. This will change the view to show how each service contributes to the overall cost of the solution. Larger blocks mean that items with larger block size account for larger portions of the overall spend. This provides a visual way for cloud architects to quickly identify places to focus and find alternatives to re-align strategy, economics, and technology:

Please confirm that the view has changed to match the following screenshot. A couple of very large blocks quickly point out that a small number of services contribute a majority of the cost in the current state:

The purple block is Other Costs. These costs are AWS-specific services that may lead to vendor lock-in. These services are generally not the same from provider to provider. There may be alternatives that could be used by other providers. Additional time and effort are needed to investigate each of these further. The Other Costs block accounts for 31% of the total solution cost monthly.

The second large block (dark blue) is associated with the m3-xlarge-linux instance type. This single instance type is contributing 28% to the overall solution cost each month. There may be more than one instance deployed, but this type of instance is contributing significantly to the overall NeBu Systems solution in US West 1. Some interesting questions come to mind based on these two additional data points:

  • What services are AWS-specific?
  • Is lock-in to AWS an issue? Does it need to be resolved?
  • The M3 instance types are older instances that have now been updated to newer versions. Should these be upgraded?
  • Why have the M3 instances not been upgraded to a more current version?
  • M3 instances are a general use compute type with SSD storage. Is it better to split the applications into more cost-effective compute types that match the applications?
  • Would smaller instance types match NeBu Systems' strategy better technically and/or economically?
  • What are these instances doing? Are they still critical to the solution?
  • As upgrades and changes for a future state are considered, what new services may align better strategically, economically, and technically to NeBu Systems' current direction?

A general idea is now understood regarding how NeBu Systems has deployed their infrastructure and services. Several questions have been raised with very good opportunities for optimization coming into focus quickly.

Please click on the first icon at the bottom left of the screen to change the view to the IQ view:

The view should change to show the Summary tab by default. This view provides high-level details such as location, cost, and price-to-performance for the entire solution. Please confirm the view has changed to match the following screenshot:

In this view, the location is confirmed as San Jose, CA, which is US West 1. Again, the total cost is shown. Two new pieces of data are shown in this view. First, the solution is labeled (from contract) in the first column under Solution Set. This distinguishes data from the current state billing file from market data that is compared in real time. The second new piece of data is the $/BCU red text in the middle of the screen. This red number is the cost per Burstorm Compute Unit (BCU), an average cost per unit of performance based on the benchmark data discussed in depth at the end of the previous chapter. $/BCU will be used in later steps to compare solutions and individual solution components to the current market options available. These comparisons will help quickly identify options that have lower price-to-performance ratios. Lower $/BCU numbers are more desirable if all other criteria are equal.

Please click on the Details tab in the gray bar at the top of the provider response window. The location of the tab is shown in the following screenshot:

The view should now have detailed solution data for each line item in the current state solution. Please confirm that the solution detail is shown. The following screenshot is included for reference:

The preceding view shows the services portion of the current state bill related to Amazon-specific services. 30% of NeBu Systems' current monthly spend ($9,424.00) is associated with the AWS-specific services. Some of these services have already been discussed in detail earlier in this chapter.

Scrolling down through the same window shows the same level of detail for the infrastructure components and services. Please match views with the following screenshot. Scrolling to the bottom exposes two ovals. The green oval shows the total for the infrastructure components ($21,510.00). The second oval contains price-to-performance data. This view, by default, is set to prioritize based on price. The price-to-performance oval will show performance data as a cumulative total for the current state solution (2086.1):
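As a quick cross-check, the two subtotals quoted so far can be combined to confirm the location total and the share attributed to AWS-specific services. This sketch uses only the figures from the text:

```python
# Subtotals quoted in the text for US West 1.
aws_specific_services = 9424.00   # USD, AWS-specific services
infrastructure = 21510.00         # USD, infrastructure components

# Location total and the share attributed to AWS-specific services.
location_total = aws_specific_services + infrastructure
services_share = aws_specific_services / location_total * 100

print(f"location total: ${location_total:,.2f}")
print(f"AWS-specific share: {services_share:.1f}%")
```

The sum lands just under $31,000, and the AWS-specific share rounds to the 30% stated earlier.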

The preceding compute details answer some of the previous questions. NeBu Systems noticed a very large portion of the bill was committed to m3-xlarge-linux instances. In this view, 21 instances are shown. The detail for each is also shown (4 cores, 15 GB RAM, 80 GB of storage). Quick math shows this to be the largest grouping of total cores and RAM (84 cores and 315 GB RAM). Depending on application requirements and the number of applications, this group may be able to be changed to more cost-effective, more specialized instance types that better match the applications and NeBu Systems' strategy:

  • Which instance type would be more beneficial based on performance and pricing data?
  • Is there a way to re-stack applications to utilize a more advantageous instance type and/or size?
  • What does this cost to deploy on updated infrastructure?
  • Are there any applications that can now be purchased as a service?
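The quick math for the m3-xlarge group is a simple multiplication of the per-instance detail by the instance count. A short sketch using the figures shown in the compute details:

```python
# Per-instance detail shown for m3-xlarge in the compute view.
instance_count = 21
cores_per_instance = 4
ram_gb_per_instance = 15
storage_gb_per_instance = 80

# Group totals: per-instance detail multiplied by the instance count.
total_cores = instance_count * cores_per_instance
total_ram_gb = instance_count * ram_gb_per_instance
total_storage_gb = instance_count * storage_gb_per_instance

print(f"{total_cores} cores, {total_ram_gb} GB RAM, "
      f"{total_storage_gb} GB storage")
```

These group totals are the footprint that any re-stacking or consolidation plan has to accommodate.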

Please change from prioritizing on price to prioritizing on price performance. With the data prioritized this way, compute options can be compared to find opportunities to optimize the instance type. Depending on actual utilization data, an m3-large may be more beneficial than an m3-xlarge. If RAM utilization is low, the m3-large may be the instance of choice:

Additional data that may also be helpful is how the individual types rank based on price performance data. The following real-time data is available by looking through the ongoing benchmark data. The arrows have been placed over the price-to-performance details for the m3-xlarge and the m3-large instance types:

NeBu Systems applications tend to be more RAM-intensive. m3-large may not be the right instance type based on the workload type. What does the current market have available? Is there a high-performing, lower-cost instance type that matches up to the NeBu Systems workload?

Please click on the hamburger menu above the word Summary as follows:

A menu will appear with a switch for Exact-Match. The switch should be on by default. Please click to flip the switch to OFF. Please use the following screenshot as a reference:

Exact-Match | OFF asks the platform to compare the solution to external solution providers. Showing a provider that is not an exact 100% match is allowed when the switch is off. Once the app has refreshed the new data view, the following screenshot should now match the current view:

The new data allows for comparisons to be made using current market data. As in previous chapters, Google is lower cost than several others, including Azure and AWS. The following screenshot shows a few interesting insights:

  • Google is the low-cost provider
  • Azure and AWS are very similar in cost
  • AWS, again, appears to be the higher-performing solution with a lower $/BCU
  • The lowest-cost AWS solution is in US West 2 (Boardman, OR), not the NeBu current location of US West 1 (San Jose, CA)

Cloud architects must often weigh risk and economics. This book has discussed that economics must offset risk. The higher the risk, the lower the cost must be to make it worth absorbing the risk. Migrating to Boardman may feel a bit risky. However, it is still in the US West geography with the same provider. If the cost is significantly lower and/or the performance is significantly higher, such that applications can be consolidated or re-stacked, the move may be worth the effort. NeBu Systems has a strategy of getting the highest performance at the lowest cost. The move to Boardman may be a foundational piece for realigning strategy, economics, and technology:

There should be a list of several solutions from several providers that matches the following screenshot. In this section, the current state billing file data for US West 1 is compared to the current market. Please check the Compare box for AWS and the Compare box for (from contract) toward the bottom of the list. Please see the marked boxes in the following screenshot for reference:

Before comparing, there are a couple of interesting data points worth mentioning in this view. NeBu Systems is focused on finding the best performance at the lowest cost. Changing providers is an option if the price and/or performance is worth the risk:

  • The current state bill is one of the most expensive options presented
  • The current state option is $10,000+ higher than current market with the same provider
  • The current state services are potentially much slower than current services from the same provider

The data in this view makes comparisons very easy for NeBu Systems. This data alone may lead NeBu to conclude that focusing on staying with AWS is the right option and migrating to Boardman may make a lot of sense as well. A direct comparison between the two AWS locations is the next logical step.

Please change to the Compare view at the top of the provider response table as follows:

The view will immediately change to align each unique line item side by side. The following screenshot is shown for reference if needed:

It becomes very clear quickly that staying with AWS and migrating to Boardman has many benefits. Please scroll to the bottom and look at the ovals with the summarized cost and price-to-performance data:

  • Boardman has a much lower $/BCU ratio, $3.50 versus $8.83 for current state
  • Boardman is lower infrastructure cost, $11,295 versus $21,510 for current state
  • Changing the prioritized view from price performance to performance only shows that Boardman is also significantly faster, 2698.2 versus 2086.1 for current state
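The size of each gap is easier to judge as a percentage. The sketch below applies a plain percentage-change calculation to the three pairs of figures quoted in the bullets. The helper name `pct_change` is illustrative, and nothing here attempts to reproduce how the platform derives $/BCU:

```python
def pct_change(current, proposed):
    """Percentage change from current to proposed (negative = decrease)."""
    return (proposed - current) / current * 100

# Current state (US West 1) versus Boardman, as quoted in the bullets.
cost_change = pct_change(21510.00, 11295.00)   # infrastructure cost, USD
bcu_change = pct_change(8.83, 3.50)            # $/BCU
perf_change = pct_change(2086.1, 2698.2)       # performance score

monthly_savings = 21510.00 - 11295.00

print(f"infrastructure cost: {cost_change:+.1f}%")
print(f"$/BCU:               {bcu_change:+.1f}%")
print(f"performance:         {perf_change:+.1f}%")
print(f"monthly savings:     ${monthly_savings:,.2f}")
```

The infrastructure cost drops by a little under half, $/BCU by roughly 60%, and the performance score rises by nearly 30%.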

If these comparisons are difficult, refer to the first couple of sections of the hands-on labs, as each of these comparisons was detailed in those sections.

It is also very interesting to look at some of the side-by-side comparisons to see what is suggested based on the current state data available. Please look at the following example:

  • The first line is the current state solution. A total of 21 m3-xlarge instances were deployed accounting for $8,279.04 with a performance score of 413.1.
  • The second line is the potential future state solution utilizing 21 t2-xlarge instances for only $2,848.27.
  • The difference between the two options is 66% less cost and a 20% increase in performance.
  • T2 instances may work very well for NeBu Systems' strategy, as most of the applications are RAM-intensive, not CPU-intensive. The T2 series instances could stay at base CPU performance levels, controlling costs quite well. T2 prices vary depending on how CPU performance and load increase. Staying at base performance would allow NeBu Systems to utilize the RAM fully without increasing costs. Key note: understanding how economics and technology relate enables the simultaneous alignment of strategy, economics, and technology.
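The 66% figure can be verified from the two line totals. In the sketch below, the per-instance costs are derived by dividing each total by 21. The implied T2 performance score is also a derived value, obtained by applying the stated 20% increase to the current score of 413.1, since the text does not quote the T2 score directly:

```python
# Line totals quoted in the side-by-side comparison.
instance_count = 21
m3_total = 8279.04     # USD per month, 21 x m3-xlarge (current state)
t2_total = 2848.27     # USD per month, 21 x t2-xlarge (proposed)
m3_performance = 413.1

# Cost reduction as a percentage of the current state cost.
cost_reduction_pct = (m3_total - t2_total) / m3_total * 100

# Per-instance monthly cost for each option (derived, not quoted).
m3_per_instance = m3_total / instance_count
t2_per_instance = t2_total / instance_count

# Derived: score implied by the stated 20% increase, not a quoted figure.
implied_t2_performance = m3_performance * 1.20

print(f"cost reduction: {cost_reduction_pct:.1f}%")
print(f"m3-xlarge: ${m3_per_instance:.2f} per instance per month")
print(f"t2-xlarge: ${t2_per_instance:.2f} per instance per month")
print(f"implied t2 performance score: {implied_t2_performance:.1f}")
```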