Chapter 19
Allocating and Pricing Data Center Resources with Power-Aware Combinatorial Auctions

BENJAMIN LUBIN and DAVID C. PARKES

19.1 Introduction

As data centers become ever more essential to not only our economy but also our daily lives, it is increasingly important for them to be well managed. Three specific aspects are of paramount importance: ensuring that the right resources be allocated to the right use, ensuring that power is used only when it is providing real benefit, and making sure that the prices charged to users for service are providing incentives that support such allocative and power efficiency. In this chapter, we look at a method for employing a combinatorial market mechanism to achieve these ends.

In 2006, US data centers used about 61 billion kWh; that is, 1.5% of the 4 trillion kWh consumed in total. This is the amount of energy used by 5.8 million average US households (5% of all households) [1]. Producing this power resulted in 37 million metric tons of CO₂, or 0.6% of the 5.9 billion metric tons released from all sources. That is roughly 16% of that produced by the burning of jet fuel and more than that used to power TVs. This electricity cost $4.5 billion and required a peak load capacity of about 7 GW, more than double the level of consumption in 2000 [2, 3]. This pace of rising data center energy consumption has slowed, with global data center power usage rising 56% over its 2005 level by 2010 [4], but this modestly slower growth rate is still alarmingly fast and a serious cause for concern. Peak usage has reached approximately 30 GW, and of this, an average of only 6–12% of electricity goes to actual computation, the rest powering idle computers and support systems such as cooling [5].

Given rising power consumption and its associated financial and environmental costs, data-center operators realize that the established practice of running large numbers of significantly underutilized servers is no longer acceptable and are eager for energy-saving solutions.

Market paradigms have often been proposed as useful for allocating limited computational resources and satisfying multicriteria objectives. The earliest work on such markets was for time sharing on the PDP-1 in the 1960s by Sutherland [6]. In the intervening years, there have been proposals to use such methods in high performance computing and grid computing, as well as in data centers.

However, existing proposals have deficiencies that can render them impractical for modern data centers. Here, we describe a general method for overcoming these concerns and illustrate its applicability to one specific environment. Our method provides the following:

  1. A Realistic Model of Resources. We support a fine granularity of computational entity (e.g., core vs server, which is especially important as multicore machines become the norm) and fine control over the power state of machines (e.g., Dynamic Voltage and Frequency Scaling (DVFS), not just on/off). We also handle heterogeneous applications running on heterogeneous classes of servers.
  2. A Realistic Representation of Goals. We use a general form of utility function expressed in terms of outcome percentile and derived from standard long-term service-level agreements (SLAs) that are programmatically interpreted as short-term utility functions in a dynamic environment.
  3. Principled Optimization. We adopt mixed-integer programming (MIP) for optimization of data center resources, providing a carefully formulated MIP model that can scale to large problem sizes.1

We show that a market-based approach provides a natural, feasible, and advantageous framework for representing the milieu of physical and computational resources, and the applications that consume these resources, in modern day data centers.

Experimental results indicate that such a system can robustly and scalably improve net profits of our data-center prototype by up to 137%. We envision reallocating resources in the data center in roughly 10-min periods, so it is important that our mechanism be able to compute an allocation within one time period, permitting the allocation to be realized on time. For large instances of 1000 machines and 10 applications (all associated with different customers), each with a realistic demand of 2700 transactions a second, when limiting MIP solve time to 10 min, we achieve an average solve time of 5.16 min, with a negligible approximation imposed on the instances that time out. Thus a dedicated machine can optimize the usage on 1000 others, providing an acceptable 0.1% overhead factor.

19.1.1 Related Work

In many ways, the modern problem of allocating data center resources is a recapitulation of the problem of allocating slices of the early time-sharing systems of the 1960s. The modern need for expressive ways for users to describe their requirements has parallels in early methods such as IBM's Job Control Language (JCL). Consequently, there is a long history of proposed methods for describing users' computational requirements and for allocating resources based on those requirements. Most of these systems have not taken a market approach. Rather, they have assumed that resource owners require effective ways to allocate their hardware and that they would thus gather enough information to approximately solve an appropriate bin-packing problem. Pricing has typically been a separate consideration—either because the system owner and users are in the same firm or because it has been handled by a separate human-mediated contracting process.

There are a number of limitations to such an approach. First, just because the hardware may be owned by the same firm as the users does not guarantee that appropriate prioritization of jobs will be manifest in the system; competition for resources occurs even within cooperative organizations. Thus hardware ownership may have less importance than is often assumed because, inevitably, there are two sides to the allocation problem: those who need resources and those who have them. If the allocator can make an accurate estimate of user needs, then treating the system as monolithic loses nothing. However, this is difficult in practice, and thus it is often essential for users to provide this information. As soon as they do so, they may be incentivized to misrepresent their needs in order to obtain more resources or a better price. If, as is ideal, we wish to confront this potential gaming of the system, the full allocation problem falls into the field of mechanism design, which considers the problem of designing systems for gathering the information necessary for optimal allocation when agents may attempt to game the system. Viewed through this lens, pricing creates powerful incentives for participants and is thus also essential for moderating manipulation of the system and thereby enabling high quality outcomes. In this chapter, we adopt this view and construct a market mechanism that attempts to balance the “buyer” and “seller” views of the data center. Our method is directly scalable to the allocation of a thousand machines and could be replicated in a straightforward way to handle arbitrary numbers of such pools. We view this approach as better capturing the full scope of the allocation problem, although clearly the simpler problem of allocation on behalf of a single monolithic owner is still important and challenging in its own right.

Chase et al. [8] and Chen et al. [9] present a market-based system for data center resource allocation and are able to experimentally demonstrate significant energy savings over static allocations. However, their greedy clearing mechanism imposes restrictions on the form of utility that can be modeled, SLAs are not directly represented, and demand/utility computations occur with respect to mean statistics instead of distributional information. Their model does not handle the heterogeneity of data-center machines or modern power-throttling architectures (instead simply turning machines on and off) and allocates servers rather than cores. The nonlinear cost model that we use is related to the one provided by Chen et al. [9]. But rather than identifying total-value maximizing allocations with respect to SLAs, they treat SLAs as constraints and attempt to find the cheapest allocation subject to meeting the implied quality constraints.

One branch of work on resource allocation in data centers has focused on utility computing, which seeks to provide access to data centers in a way that is analogous to that of a public utility (e.g., gas, water, power) [10, 11]. In general, utility computing views computational resources as easily substituted for one another, whereas we allow only particular machines to be configured for, and capable of running, certain applications. Rather than argue for a more radical shift in how computation is bought, sold, and deployed, we propose a more gradual evolutionary step by creating a market that handles the heterogeneity in present data centers and encompasses and generalizes the present contract format (SLAs).

There is an extensive literature on using market-based methods in related contexts, including computational grids and high performance computing. Yeo and Buyya [12] provide an abstract model of a market-based resource manager as composed of various parts, including a market model, an economic model, pricing, tendering, bidding, buyer/seller focus, a trading environment, QoS attributes, resource composition, an execution service/support, an accounting mechanism, management control, and a job model. The authors offer an extensive survey of existing systems according to these properties. In another survey, Broberg et al. [13] contrast several different market-based allocation systems and evaluate their pros and cons. These authors argue that fixed pricing can be problematic and that optimal combinatorial bidding and pricing can be expensive, and they propose a particular ad hoc pricing mechanism that seeks a compromise between expressivity and scalability. In general, all allocation methods must grapple with this trade-off; we handle it by appeal to modern mixed-integer optimization algorithms that are “anytime” and can thus be stopped early at high quality approximate solutions.

A market in our setting is combinatorial in nature; participants must be able to bid (indirectly, via SLAs) on “packages” of items, where the value for a package need not be a linear combination of the values for its constituent items. There is a long literature on combinatorial auctions; an excellent overview is provided in the book by Cramton et al. [14].

19.2 A Market Model of Data Center Allocation

In this section, we describe how to represent the energy-aware data-center allocation problem as a market. Using a market for the internal allocation both ensures that resources will be used in an economically optimal way and opens a path to exposing customers to the mechanism as well. In turn, this can enable better information revelation and thus higher efficiency allocations and/or higher revenue. To begin, we describe the type of environment in which we see such a system operating and define some needed terminology.

Typical data centers have thousands of servers, many of which will share the same hardware and software configuration. We call such equivalence classes “machine groups” and assume that this partitioning is performed by a separate offline process. The owner of a data center typically contracts (either internally or externally) to provide these resources to a set of applications (each associated with a customer), each with time-varying load and utility and a range of resource requirements and importance.

In present use, each application is associated with an SLA that is negotiated between the customer and the data-center provider. Such an agreement specifies a price, a performance objective (e.g., a cap on the 95th percentile of response time), and a penalty for failing to meet the objective. The SLA is useful for assigning a relative importance to the applications, but despite its quantitative nature, it is generally used at present in only a qualitative way, as a guideline for personnel when manually configuring data-center operations. Yet, SLAs suggest a direction toward application utility functions that are highly relevant to obtaining reasonable performance in a power-constrained environment [15, 16]. Here, we introduce a system that adopts SLAs for the purpose of utility-based optimization of resource configurations.

When allocating resources in the data center, we seek to optimize the operator's business value for the data center: that is, the revenue net of costs. This means assigning (portions of) the machines from discrete machine groups to the various applications as well as specifying the power for each machine and thus restraining overall consumption. For this, we use a detailed model of the power saving modes available to modern servers and assume access to monitors of both power consumption and application usage.

Figure 19.1 The data center market model.

Our market allocates goods (cores of machines from the various machine groups) to applications, as illustrated in Figure 19.1. The market is repeatedly cleared over brief periods, allocating resources to maximize the value reflected in short-term bids. These short-term bids are generated automatically from predictions about future supply and demand as well as applications’ long-term SLAs. In our experiments, we have used 10-min intervals, chosen to match the demand volatility we see in the applications we are modeling. The winner-determination problem requires optimization and could be approximately solved via heuristic search. We choose to formulate it as a mixed-integer program and solve it via a commercial solver, specifically IBM ILOG CPLEX. The forms of the buyer's (or customer's) value model and the seller's cost model have been chosen to ease formulation of the problem as a MIP, as described in the following.2

The formulation presented here is myopic in that it predicts demand in the next period and seeks the optimal allocation for that period. Instead, we could formulate a stochastic optimization problem that predicts multiple time steps forward and seeks an optimal allocation over that longer time period. However, such a formulation would be complex and difficult to solve with much fidelity over the short time frame available to the solver, and thus we view a single time-step solution as a reasonable simplification. Moreover, if reasoning about future time periods, we would ideally permit participants' bids to capture changes in value over time, leading to a scheduled market instead of the spot market presented here. We leave this important extension for future work.

For each period, we use a myopic net revenue maximization objective:

19.1 $\max \; \sum_{a \in A} V_a \;-\; \kappa E \;-\; H$

where $V_a$ is the value of the chosen allocation of machines to application (associated with a particular buyer) $a$, $\kappa$ is the dollar cost of a kilowatt-hour of energy, $E$ is the total energy used to establish and maintain the chosen allocation for the current period, and $H$ is the dollar cost for the hardware.

The objective is thus quite straightforward, and the complexity comes from the constraints that define and restrict these variables. We begin by defining the buyer value, $V_a$, that is, the value associated with application $a$ of some buyer. We note that throughout this section we treat $V_a$ as the buyer's true value for the service he/she receives. In practice, this value is not known a priori to the mechanism, which must instead rely on the buyer's reported values. The existence of strategic opportunities for customers to misreport values will depend on the interaction with the pricing rule used to charge for the service. We defer discussion of pricing rules and their incentive properties to Section 19.5.

19.2.1 Buyer Valuation Model

The contracts signed for data center provisioning are typically in terms of SLAs. We model a per-application SLA contract as a piecewise linear function for the value of receiving a given response time at a given demand percentile. This form has a number of advantages. As in contracts commonly used in practice, it is expressed in terms of an outcome metric percentile (e.g., 95th percentile). This enables the most extreme cases to be ignored as outliers but captures “bad-case” behavior in a way that the mean does not. While typical industry SLAs specify a single value for service exceeding a threshold quality at this percentile, we generalize this notion to allow a range of possible service levels. We are thus able to capture the value of degraded-but-still-useful service in a way that traditional SLAs cannot. Figure 19.2 shows an example of an SLA value curve of this form for two different applications A and B.

Rather than having bidders interact directly with the market, we use a proxy that operates on their behalf. This shields the human buyers from having to deal with low-level details, letting them instead concentrate on the high-level value function with which they are familiar. The bidding proxy takes the SLA thus provided and combines it with supply and demand prediction and an application performance model, to represent the SLA as a short-term bid; that is, a bid that is appropriate for the period of time for which the market is cleared.

Figure 19.2 SLAs as provided by two different applications.

In order to construct this short-term bid, the bidding proxy needs a model of how a given supply of machines (and thus transaction capacity) and application demand for the next planning episode will translate to the long-term response-time distribution (and in turn to, e.g., its 95th percentile) and thus to the value curve associated with an SLA.

As a very simple example, let us assume that transaction processing is described as an M/M/1 queue (exponential interarrival and service times). In this case, the response-time distribution is exponential with mean response time $1/(\mu - \lambda)$, where $\mu$ is the supply and $\lambda$ is the demand, both in transactions per unit time. The response time at a percentile $p$ is given by the exponential quantile function: $-\ln(1-p)/(\mu - \lambda)$. The proxy composes the customer's SLA (Fig. 19.2) with this response-time model, resulting in a value function over both supply and demand at, for example, the 95th percentile (Fig. 19.3). As depicted in the figure, the value is scaled by predicted demand relative to the mean statistic. This is necessary to translate the long-term value specified in the SLA into a short-term bid. Thus we, on behalf of the bidder, estimate future demand and compute an appropriate bid for the transactions expected in the current period. With this conversion, the market tracks the long-term statistics that are a customer's true concern (as reflected in the SLA). We note that this bid conversion is not done for strategic reasons, but rather to generate value-based information that can be used to drive good short-term allocation decisions.3
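To make this composition concrete, the following Python sketch (our own illustration; the helper names, SLA control points, and traffic numbers are hypothetical rather than taken from the chapter) computes the 95th-percentile response time of an M/M/1 queue and evaluates a piecewise linear SLA curve at that response time.

    import numpy as np

    def mm1_percentile_response_time(supply, demand, p=0.95):
        """Response time at percentile p for an M/M/1 queue.

        supply and demand are in transactions per second; the response-time
        distribution is exponential with rate (supply - demand), so the
        p-th percentile is -ln(1 - p) / (supply - demand).
        """
        if supply <= demand:
            return float("inf")  # unstable queue: no finite percentile
        return -np.log(1.0 - p) / (supply - demand)

    def sla_value(response_time, breakpoints):
        """Piecewise linear SLA value curve (cf. Figure 19.2), given as a list
        of (response_time, value) control points sorted by response time."""
        times, values = zip(*breakpoints)
        return float(np.interp(response_time, times, values))

    # Hypothetical numbers: 75 tx/s of supply against 45 tx/s of demand.
    sla_points = [(0.0, 100.0), (0.5, 100.0), (2.0, 20.0), (5.0, 0.0)]
    r95 = mm1_percentile_response_time(supply=75.0, demand=45.0, p=0.95)
    print(r95, sla_value(r95, sla_points))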

Figure 19.3 Application value by supply and demand.

In addition, the bidding proxy needs a predictive model of application demand over the next period to construct its short-term bid. We have found it sufficient to simply use statistics gathered over a small window of previous periods to provide a Gaussian model of the distribution of possible demand in the next period via maximum likelihood estimation (MLE) of the mean and standard deviation. The bidding proxy draws equally weighted samples from this Gaussian demand prediction model and takes a slice from the value model (Fig. 19.3) for each sample, in a process of inverse transform sampling. These slices are averaged to produce a single consumption-value curve that manifests our demand model. By taking a piecewise linear approximation of this curve (obtained by chaining the control points of the originally supplied response-time/value curve through these transformations), we arrive at a utility curve, which is provided to the market in a given period as a bid. An example of such a bid is shown in Figure 19.4.
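A minimal sketch of this proxy pipeline follows, reusing the hypothetical mm1_percentile_response_time and sla_value helpers from the previous listing; the window size, sample count, and supply grid are illustrative assumptions, not values prescribed by the chapter.

    from statistics import NormalDist, mean, stdev

    def demand_samples(recent_demands, n_samples=20):
        """Gaussian fit to a window of recent demand, then equally weighted
        samples taken at evenly spaced quantiles of the fitted distribution
        (inverse transform sampling)."""
        dist = NormalDist(mean(recent_demands), stdev(recent_demands))
        return [dist.inv_cdf((i + 0.5) / n_samples) for i in range(n_samples)]

    def short_term_bid(recent_demands, supply_grid, sla_points, p=0.95):
        """Average the value obtained at each candidate supply level over the
        sampled demands; the resulting (supply, value) points approximate the
        consumption-value curve submitted as the short-term bid (Figure 19.4)."""
        samples = demand_samples(recent_demands)
        bid = []
        for s in supply_grid:
            values = [sla_value(mm1_percentile_response_time(s, d, p), sla_points)
                      for d in samples]
            bid.append((s, sum(values) / len(values)))
        return bid

    # Example: demand observed over the last six 10-min periods (tx/s).
    window = [41.0, 44.5, 43.0, 46.0, 48.5, 45.0]
    curve = short_term_bid(window, supply_grid=[50, 60, 75, 100], sla_points=sla_points)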

Figure 19.4 Expected short-term value for a single application.

Through these transformations, we arrive at a value function, a mapping from potential resource bundles to agent value, which is used by the winner-determination algorithm to clear the market. In particular, the bidder proxy creates a function

19.2 $V_a = v_a(q_a)$

for application $a$, where $v_a$ is the piecewise linear function we have established through the process just described and $q_a$ is the number of cycles provided to application $a$ by the chosen allocation. To formulate this function, any standard MIP representation for a piecewise linear function can be used, which will induce auxiliary constraints and variables in order to account for the various line segments.
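As one illustration of such a representation (the convex-combination encoding with one binary variable per segment; this is a standard textbook formulation and not necessarily the one used in the chapter's MIP), consider the following PuLP sketch:

    import pulp

    def add_piecewise_value(prob, name, breakpoints):
        """Encode y = f(x) for a piecewise linear f given by control points
        [(x0, y0), ..., (xn, yn)] using the convex-combination formulation.
        Returns (x_expr, y_expr) to be tied to other model variables."""
        xs, ys = zip(*breakpoints)
        n = len(breakpoints)
        lam = [pulp.LpVariable(f"{name}_lam_{i}", lowBound=0) for i in range(n)]
        seg = [pulp.LpVariable(f"{name}_seg_{j}", cat="Binary") for j in range(n - 1)]
        prob += pulp.lpSum(lam) == 1
        prob += pulp.lpSum(seg) == 1
        # A weight may be nonzero only if an adjacent segment is selected.
        for i in range(n):
            adjacent = ([seg[i - 1]] if i > 0 else []) + ([seg[i]] if i < n - 1 else [])
            prob += lam[i] <= pulp.lpSum(adjacent)
        x_expr = pulp.lpSum(l * xi for l, xi in zip(lam, xs))
        y_expr = pulp.lpSum(l * yi for l, yi in zip(lam, ys))
        return x_expr, y_expr

    # Hypothetical value curve for one application, in (cycles, dollars).
    prob = pulp.LpProblem("toy_value_model", pulp.LpMaximize)
    q_a, V_a = add_piecewise_value(prob, "app_A", [(0, 0), (1e9, 50), (3e9, 90), (6e9, 100)])
    # In the full model, q_a would be constrained to equal the cycle supply
    # expression of Equation 19.3 and V_a would enter the objective (19.1).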

Importantly, the value function is specified in terms of cycles, not in terms of the underlying machine groups and power modes that are supplying them. This results in a significant dimensionality reduction and simplifies the winner-determination problem. However, in cases where differences in the underlying hardware architecture are important to application performance (e.g., all cycles are not equal and Xeon cycles perform differently from Atom cycles), an extension to a more complex higher dimensional model is available; see Reference 17 for details.

Next, we turn our attention to modeling the argument to our function $v_a$: the total number of cycles $q_a$ provided to application $a$ in a period, defined as

19.3 $q_a = \sum_{g \in G_a} \sum_{m \in M_g} \sum_{m' \in M_g} c_{g,m'}(T - t_{m \to m'}) \, n_{g,m,m',a}$

where $G_a$ is the set of machine groups that can support application $a$, $M_g$ is the set of power modes available to machines in group $g$, $c_{g,m'}(\cdot)$ is the number of cycles provided by a machine from group $g$ in mode $m'$ over a period of time given by its argument, $T$ is the amount of time in the current period, $t_{m \to m'}$ is the amount of time it takes to transition from mode $m$ to mode $m'$, and each $n_{g,m,m',a}$ variable defines a quantity of cores (i.e., goods) allocated from group $g$ that were in mode $m$ and are now in mode $m'$ (described in more detail in the following text). With these definitions, the expression then captures the total cycles provided to application $a$, accounting for reductions due to time spent in transitioning power level and considering all machine groups and power levels.
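For a fixed (already solved) allocation, the cycle total of Equation 19.3 can be evaluated directly; the dictionary layout and parameter names in the sketch below are our own illustrative choices.

    def cycles_provided(app, alloc, cycle_rate, period, transition_time, groups_for):
        """Total cycles delivered to application `app` in one period (Eq. 19.3).

        alloc[(g, m_from, m_to, a)]     -> cores allocated in that bucket
        cycle_rate[(g, m_to)]           -> cycles/second of a core of group g at mode m_to
        transition_time[(m_from, m_to)] -> seconds lost switching power modes
        groups_for[app]                 -> machine groups able to host the application
        """
        total = 0.0
        for (g, m_from, m_to, a), cores in alloc.items():
            if a != app or g not in groups_for[app]:
                continue
            productive = max(period - transition_time[(m_from, m_to)], 0.0)
            total += cores * cycle_rate[(g, m_to)] * productive
        return total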

19.2.2 Defining The Goods in the Market

Within each machine group, we track only the number of cores in each power state. An allocation of some quantity of such cores is ultimately mapped into an assignment of cores on physical machines in postprocessing. This avoids the creation of immaterial distinctions that would only complicate winner determination and is similar to complexity-reduction methods that have been used to good effect in recent high profile government bandwidth auctions [18, 19]. We currently use a fast but potentially only approximate greedy assignment for this postprocessing, but more sophisticated methods could be used if the identities of machines in a group are important.

To properly encode the data-center cost model, described in the next section, we need a representation of goods that captures not just static power states but power-state transitions, enabling us to account for resultant changes in energy usage, cycle loss, and increases in failure rate. Consequently, our goods are manifest in $n_{g,m,m',a}$ variables that capture the number of cores in a given machine group $g$ starting in mode $m$ in the previous period, transitioning to (the possibly identical) mode $m'$ in the current period, and assigned to a given application $a$.

Constraints are defined to ensure that an allocation of these goods will be physically implementable; for example, on present day platforms it is required that all cores on the same physical machine be at the same power level:

19.4 $\sum_{a} n_{g,m,m',a} + \bar{n}_{g,m,m'} = k_g \big( s_{g,m,m'} + \bar{s}_{g,m,m'} \big) \qquad \forall g, m, m'$
19.5 $\bar{n}_{g,m,m'} = k_g \, \bar{s}_{g,m,m'} + u_{g,m,m'} \qquad \forall g, m, m'$

where $k_g$ is the number of cores per machine in group $g$, the $\bar{n}_{g,m,m'}$ are variables counting the unassigned cores, $s_{g,m,m'}$ and $\bar{s}_{g,m,m'}$ count sold and unsold machines, respectively, and the $u_{g,m,m'}$ count the unsold cores on partially sold machines. Additionally, we need to restrain available supply, through the machine counts:

19.6 $\sum_{m \in M_g} \sum_{m' \in M_g} \big( s_{g,m,m'} + \bar{s}_{g,m,m'} \big) \le N_g \qquad \forall g$

where $N_g$ is the number of machines in group $g$.
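A PuLP sketch of this bookkeeping is shown below. Because the chapter does not spell out its exact constraint set, this is one plausible encoding under the stated requirements (cores are sold in (group, previous-mode, new-mode) buckets, core counts are tied to whole machines, and machine counts respect the group inventory); the variable and parameter names are illustrative, not the chapter's own.

    import pulp

    def add_supply_side(prob, groups, modes, apps, cores_per_machine, machines_in_group):
        """Create core and machine variables and tie them together so that an
        allocation is physically implementable (in the spirit of Eqs. 19.4-19.6)."""
        cores, machines = {}, {}
        for g in groups:
            for m in modes[g]:
                for m2 in modes[g]:
                    machines[g, m, m2] = pulp.LpVariable(
                        f"machines_{g}_{m}_{m2}", lowBound=0, cat="Integer")
                    for a in apps:
                        cores[g, m, m2, a] = pulp.LpVariable(
                            f"cores_{g}_{m}_{m2}_{a}", lowBound=0, cat="Integer")
                    # Cores sold out of this bucket cannot exceed the cores on the
                    # machines placed in it; leftover cores on those machines idle
                    # at the same power level as the rest of the machine.
                    prob += (pulp.lpSum(cores[g, m, m2, a] for a in apps)
                             <= cores_per_machine[g] * machines[g, m, m2])
            # Machines used across all mode transitions are bounded by inventory.
            prob += (pulp.lpSum(machines[g, m, m2]
                                for m in modes[g] for m2 in modes[g])
                     <= machines_in_group[g])
        return cores, machines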

Figure 19.5 Power and speed under low power.

Figure 19.6 Power and speed under high power.

19.2.3 Seller Cost Model

On the supply side of the market, we explicitly model both the hardware and energy costs of running the data center's machines in their various power states. Our model captures the power consumed and performance attained by each machine as a function of the number of active and inactive cores, as measured empirically on an IBM BladeCenter HS21 Server (Figs. 19.5 and 19.6). Modern dynamic voltage and frequency scaling (DVFS) enabled machines can have their most efficient state at less than full power: for example, a maximum of 64 versus 50 MCycles/W with four cores active (taking the ratio of the curves in each figure).

We define the energy requirements (i.e., power over the time period) of the active machines and cores as follows:4

19.7 $E = \gamma \sum_{g} \sum_{m \in M_g} \sum_{m' \in M_g} \Big[ \big( e_{m \to m'} + \rho_{g,m'}(T - t_{m \to m'}) \big)\, s_{g,m,m'} + \alpha_{g,m'}(T - t_{m \to m'}) \sum_{a} n_{g,m,m',a} \Big]$

where $e_{m \to m'}$ is the energy required to go from power state $m$ to $m'$, $\rho_{g,m'}$ is the base power for an active machine, and $\alpha_{g,m'}$ is the incremental energy needed to run a fully loaded core in this power state. Both of these are parameterized by the time argument $(T - t_{m \to m'})$ so as to only include energy for that period of time when the machine has reached its destination power level, excluding that used in the transition period (energy for the transition is accounted for by $e_{m \to m'}$ and on some hardware may occur at full power regardless of the proximal power levels). Here $\gamma$ accounts for the typically two- to threefold increase in energy needed to run power supply units, uninterruptible power supplies, network switches and storage, and, most importantly, cooling equipment. All elements of this expression are ultimately constants, except for the $s_{g,m,m'}$ and $n_{g,m,m',a}$ variables.
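Under the reconstruction of Equation 19.7 given above, the per-period energy of a candidate allocation could be evaluated as in the sketch below (plain Python rather than MIP terms; all parameter names are our own assumptions, with powers in watts and times in seconds so the result is in joules before any conversion to kWh).

    def period_energy(machines, cores, period, transition_energy, base_power,
                      core_power, transition_time, overhead=2.5):
        """Energy charged for one period, in the spirit of Eq. 19.7.

        machines[(g, m, m2)]  -> active machines of group g switched from mode m to m2
        cores[(g, m, m2, a)]  -> cores in that bucket serving application a
        base_power[(g, m2)]   -> watts for an active machine at mode m2
        core_power[(g, m2)]   -> incremental watts per fully loaded core at mode m2
        transition_energy[(m, m2)], transition_time[(m, m2)] -> per-switch costs
        overhead plays the role of gamma: cooling, UPS, switches, storage.
        """
        total = 0.0
        for (g, m, m2), count in machines.items():
            productive = max(period - transition_time[(m, m2)], 0.0)
            total += count * (transition_energy[(m, m2)] + base_power[(g, m2)] * productive)
        for (g, m, m2, _a), count in cores.items():
            productive = max(period - transition_time[(m, m2)], 0.0)
            total += count * core_power[(g, m2)] * productive
        return overhead * total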

We stipulate the hardware costs for active cores across the full data center as follows:

19.8 $H = \sum_{g} \sum_{m \in M_g} \sum_{m' \in M_g} \big( h_g(T - t_{m \to m'}) + f_{m \to m'} \big)\, s_{g,m,m'}$

where $h_g$ is the prorated cost for each machine (again accounting for only that period of time cycles are actually being provided, as opposed to power level transitions) and includes not only the amortized server cost but also supporting equipment, buildings, and personnel; and $f_{m \to m'}$ accounts for the cost associated with an increased failure rate on a state transition due to, for example, spinning up/down hard drives. We expect each of these numbers to be easily obtainable through a principled evaluation of existing business practices and capital investments.

Episodic formulations share a common problem: they may refuse to bear large transition costs that create a temporary loss in utility, despite a long-term gain. Consequently, we also include sell-side tracking of the power state of machines over previous periods (similarly to the buyer-demand prediction), which can be used to predict the expected time that a machine transitioned to a new state will stay within it. This can be used to amortize transition costs over a good estimate of their appropriate time frame. This adjustment can be calculated exogenously to the rest of the system and is used to adjust the $e_{m \to m'}$ and $f_{m \to m'}$ constants.

A system administrator might, in addition, wish to specify additional restrictions on the allocation based on technical requirements that are not externally visible or those that are unstated by buyers but can reasonably be inferred as being part of their intent. Considerable flexibility is possible; some examples include min/max cores/machines for a given application, min/max energy used in a given machine group or for a given application, and max cores in a given machine group that can be allocated to a given application if a certain number is already allocated to a specific alternative application (anti-colocation).

19.3 Experimental Results

We have evaluated our market-based system in a set of simulation experiments to establish computational tractability and effective allocative behavior over a wide range of environments. Each experiment has been performed on a 3.2 GHz dual-processor dual-core workstation with 8 GB of memory and IBM ILOG CPLEX 11.1. Each data point is the mean of 10 randomly generated time-dependent demand traces.

Our synthetic traces are the sum of two sinusoidal curves (e.g., 1-day period with 9000 peak transactions/min plus 1-week period with 36,000 peak transactions/min) and a noise term drawn from a Gaussian with a standard deviation equal to 25% of the signal. These match well with real customer traces, where request density is time dependent and oscillates over both days and weeks [20]. Unlike captured data, synthetic traces enable us to test not only robustness to wide variation in absolute load level but also different amounts of correlation among applications. Each transaction is assumed to use 300 MCycles, which is representative of the processing needed to produce a custom database report. Lastly, each allocation period is 10 min, which is fast enough to react to dynamic changes in the load, but without being so short as to require hysteresis in the model beyond that implicit in the transition costs.

Because no allocator in the literature has comparable capabilities, we adopt as a benchmark a sophisticated greedy allocator, which operates as follows:

  1. Put all the machines in their highest efficiency state.
  2. Determine the target supply for each application by calculating what is required to produce its ideal response time at its 95th percentile of demand.
  3. Allocate cores (from feasible machine groups) to the applications, weighted by the marginal value of supply to each application. If an application's supply of high efficiency cores is exhausted, then bump one of the machines supporting it into a higher power state. Stop when either all the targets have been met or all the cores/states have been allocated.
  4. Consider each application in turn and trim the allocation until the expected value at the 95th percentile of demand is greater than or equal to the expected cost.
  5. Place remaining machines in their lowest power state.

For exposition purposes we consider a simple scenario with two applications (i.e., two customers) and three machine groups (each capable of supporting the first, second, and both applications, respectively), for a simulated week of time-varying demand. We also provide results where we vary the price of energy, to demonstrate the flexibility that a market-based allocation scheme can bring to bear.

Figure 19.7 Energy used and response time as the cost of energy is varied under market and heuristic algorithms. Bars indicate one standard error over 10 random traces at each price point.

Figure 19.7 shows the effect of varying the price of energy under both the market and the static-allocation algorithm. We can see that, as expected, under both algorithms, the energy used falls and consequently the mean response time rises as the price of energy is increased. However, bidding proxies in the market (representing customers) find it profitable to purchase enough energy to maintain a near-optimal response time until the price finally reaches such a point that such high energy usage can no longer be sustained and more energy-frugal allocations are chosen.

Figure 19.8 Buyer value and seller revenue net cost as the cost of energy is varied under market and heuristic algorithms. (a) Buyer value by energy price and (b) seller cost by energy price.

Figure 19.9 Application allocation by energy cost under market and static algorithms.

In Figure 19.8, we see the impact of the choice of these allocations on buyer (i.e., customer) and seller value, as judged by SLAs and revenue net of cost, respectively.

The greedy allocation is cheaper to provide because of its static power levels, but it also results in significantly lower buyer value over a wide range of prices. At its maximum, revenue net of cost is 137% higher under the market model, although margins become slim when energy is expensive.

It is also important to consider distributive effects to customers in the data-center setting. In this scenario, the A application induces a larger load than B, but with a smaller marginal value for cycles. Consequently, as energy prices rise, the static allocator quickly devotes the limited resources that can be afforded to the B allocation, thereby starving the A application, as seen in Figure 19.9. The market allocation not only maintains the allocation for the B application but also recognizes that some resources can profitably be given to A. This is made possible by switching machines between their most efficient modes to conserve energy and their high power modes to track spikes in demand. Figure 19.10 shows that in this setting the static allocator has placed all the machines in the high efficiency “low power” mode, whereas the market has made use of both modes. When the price for power is low, the most efficient allocation is to maintain quite a few machines in the high power state. However, as the price crosses 40¢/kWh, there is a phase change and it becomes much more efficient to run mostly in the low power mode. Beyond about 60¢/kWh, it becomes impossible to afford the energy needed to maintain a supply sufficient to keep a low response time and the optimal allocation shrinks.

Figure 19.10 Power mode allocation by energy cost under market and static algorithms.

19.3.1 Scalability and Complexity

To evaluate the scalability of our MIP formulation, we evaluated 10 instances of a scenario with 200 quad-core machines in each of five groups for a total of 1000 machines. We configured 10 applications, each with a demand of some 2700 transactions per second, to draw upon these resources, with each group supporting three of the applications in a ring topology. We restricted the optimizer to no more than 10 min of computation per instance, taking advantage of the anytime nature of modern MIP solvers. Four of the instances were capped at 10 min, and the average solve time was 5.16 min over all 10 instances, well within the time of a single period. Further, the approximation resulted in only a 0.08% revenue loss when compared to the optimal solution, which would have taken an additional 29 min on average for these difficult cases. Thus a single machine is capable of optimizing the usage on 1000 others, providing an acceptable 0.1% overhead factor. For a data center with many more machines, one could decompose them into multiple machine pools, each of a size around 1000.

We have also investigated the effect on runtime of the structure of the bipartite graph that defines which application can be supplied by each machine group. For this, we use a scenario with five applications and five machine groups, where supply is set so as to be just sufficient to meet demand. The average solve time of the winner-determination problem increases nonlinearly as we vary the number of edges in the bipartite graph. A graph with 30% of the edges (already highly connected for current data centers) takes only 3.8% of the time needed to clear the market with a complete graph. With 50% connectivity, the computation time has risen to 58.8%, and with 60% connectivity, the timing has already risen to 86.6%. Interestingly, eliminating even a relatively substantial number of edges does not produce a correspondingly large increase in the application response time, as the system can still find an allocation that supports the demand. With 60% of the edges, we are only 8% above the response time of the complete graph. Consequently, moderately restricting the number of machine groups on which an application can run (e.g., for administrative reasons) will only modestly decrease solution quality.

We have shown that a market mechanism can allocate resources under realistic load conditions and more efficiently than the static allocators consistent with current practice. Further, we have shown that such a market can be cleared using a modest amount of computing power within an allocation window that is short enough to capture the essential dynamics of typical loads. So far we have described and evaluated methods for participants to describe their value to the market (the “bidding language” problem) and a method for clearing the market (the “winner-determination” problem). In the rest of the chapter, we first discuss an important class of generalizations to our model and then turn to the essential task of devising prices for the market that have the necessary computational, economic, and game theoretic properties.

19.4 Going Beyond Processing and Power

So far we have considered only two properties of the data center in our resource allocation: CPU cycles and the energy needed to produce them. In general, such a system can handle any single aspect of the computational resources: CPU, memory, disk space, or bandwidth. In many cases, loads are bound by only a single such attribute, and the system described will be effective; it need not be CPU as we have illustrated but could be another attribute with suitable changes to the response-time model. However, more generally, there will be situations where application performance is heavily dependent on multiple attributes or where the data center is being used by a heterogeneous mixture of applications each with its own attribute performance profile.

To model the cost structure for more attributes is straightforward. We introduce additional variables that track how much of each attribute from a given machine group has been allocated to an application $a$. As these attributes can be reasonably modeled as having linear costs, these variables can be included in the objective with cost constants that vary by machine group (e.g., some groups have expensive memory or high speed bandwidth; others do not).

In the formulation we have described up until now, the aggregate goods transferred from one side of the market to the other are total CPU cycles, and these are then assumed by the buyer proxy to be evenly divided across its application's load. There is a simplifying assumption here, in that we have modeled all cycles as fungible (see Reference 17 for a relaxation of this assumption). If we maintain this assumption in our multiple attribute model, we have a variable for the total amount of each attribute assigned to each application. Consequently, the buyer proxy must submit a more complex buyer value function:

19.9 $V_a = v_a\big(q_a^{\mathrm{cpu}}, q_a^{\mathrm{mem}}, q_a^{\mathrm{disk}}, q_a^{\mathrm{net}}\big)$

where each of the $q_a^{X}$ variables is the aggregate amount of attribute $X$ given to application $a$. As before, we decompose $v_a$ into a response-time model and a value model over the resulting time:

19.10 $v_a\big(q_a^{\mathrm{cpu}}, q_a^{\mathrm{mem}}, q_a^{\mathrm{disk}}, q_a^{\mathrm{net}}\big) = w_a\Big(R_a\big(q_a^{\mathrm{cpu}}, q_a^{\mathrm{mem}}, q_a^{\mathrm{disk}}, q_a^{\mathrm{net}}\big)\Big)$

Thus, moving to multiple attributes does not change the piecewise linear SLA value curve $w_a$ that the buyer enters into the system, as may, for example, be seen in Figure 19.2. We still assume value to be a direct function of response time.

However, we previously assumed that the response time, $R_a$, was a one-dimensional function of CPU cycles, and now it must be a multidimensional function of multiple attributes. The most straightforward model for this is multiplicative, where the previous queuing model over CPU is multiplicatively altered by a factor for each of the attributes. Adopting a softmax function to encode a threshold of quantity needed for good performance is a reasonable approach for modeling the factors.
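One way to realize such a multiplicative model, using a smooth logistic "soft step" per attribute as a stand-in for the softmax-style threshold mentioned above, is sketched below; it reuses the hypothetical mm1_percentile_response_time helper from Section 19.2.1's sketch and is an illustration rather than the chapter's own model.

    import math

    def attribute_factor(quantity, threshold, sharpness=10.0):
        """Close to 1 when `quantity` comfortably exceeds `threshold`; grows
        rapidly as the attribute falls below the amount needed for good
        performance (a smooth threshold)."""
        return 1.0 + math.exp(-sharpness * (quantity / threshold - 1.0))

    def multi_attribute_response_time(cpu_supply, demand, attrs, needs, p=0.95):
        """Multiplicative composition: the CPU queueing percentile is inflated
        by one factor per non-CPU attribute (memory, disk, bandwidth, ...).

        attrs and needs map attribute names to supplied and required quantities.
        """
        base = mm1_percentile_response_time(cpu_supply, demand, p)
        penalty = 1.0
        for name, supplied in attrs.items():
            penalty *= attribute_factor(supplied, needs[name])
        return base * penalty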

The bidder proxy would then compose this response-time model with the user-specified value model, yielding a nonlinear multivariate value function. This is sampled to produce a multivariate piecewise linear value function, which is submitted into the market as the buyer's bid.

So there is a straightforward way to extend both the buyer and seller side of the market. The catch comes from the complexity of clearing this market. Methods for encoding a single-dimensional, piecewise linear function into a MIP are very effective if there are a modest number of inflection points. Such methods extend to multiple dimensions but scale in an exponential manner. So working with a small number of attributes can be coupled with exact (or almost exact) MIP solutions. But this approach will be too slow for more than a few attributes, and it will make more sense to skip the linearization step and approximately solve the resulting nonlinear programming problem directly. Such an approach will not guarantee the fully efficient solution, but in the data-center context, local optima may well be sufficient.

19.5 Pricing

We have shown how buyers can specify value over complex combinations of resources and how sellers can specify their costs for providing these resources. And we have shown that a MIP formulation can be used to clear the resulting market and produce an allocation. But what payment should be asked of the buyer in return for these resources?

The simplest solution would be to have fixed prices associated with each resource type and simply charge the users for the amount of resources they actually use. This is the same pricing model as that used in Amazon EC2 (note that here decisions about the quantity and quality of machines to buy are optimized by the system; on EC2, such decisions are left to the user, who typically does not have the information needed to make an optimal choice). Fixed prices have the advantage of being simple for buyers to understand; they are often a fixed markup over the seller cost, making them easy for sellers to compute. However, they are not adaptive and thus do not take advantage of the information revelation that occurs in the market. Sellers may set the markup too high and thereby miss out on profitable business when buyers are unwilling to meet the price. Alternatively, they may price too low, forgoing revenue they might otherwise enjoy by leaving much of the surplus from trade with the buyers. However, it is important to emphasize that even when using fixed exogenous prices, a market such as the one described here can still be useful as an internal allocation method that attempts to establish the proper trade-off between performance and consumption.

Another alternative is to still use easily understood linear prices (e.g., a single price per unit of each type of good, as opposed to more complex prices based on bundles of goods), but to determine what these prices should be using the bids submitted to the system. This has the advantage of adaptivity, by choosing prices based on the information revealed by the buyers in the course of interacting with the system. However, because the market is combinatorial in nature, we cannot actually find perfect market clearing prices without using nonlinear bundle prices (which we consider shortly). One way around this is to find and use a set of linear prices that come as close to clearing the market as possible using a linear programming approximation; see Reference 21 for details.

However, such formulations are very complex. While participants will immediately understand the linear prices produced, the same cannot be said for the process used to arrive at them. Consequently, we may instead choose to simply charge the buyers their stated value: the so-called “first-price auction.” This is both adaptive in the above sense and simple—it is instantly recognizable to anyone who has seen a single good “English” auction for, for example, art, cars, or cattle. There is, however, a problem. As is common, we will assume quasi-linearity in the sequel; that is, a buyer's utility is exactly his/her value minus the payment he/she is charged. Suppose buyers act so as to maximize their expected utility and may, therefore, submit bids to the system inconsistent with their true value, if they believe they will get either a better set of resources or a lower payment. The first-price rule gives buyers a strong incentive to engage in exactly such manipulations [22].

The classic solution to this problem is to instead use the Vickrey-Clarke-Groves (VCG) mechanism [23]. This mechanism clears at the efficient trade and charges each buyer its marginal impact on the economy; that is, each agent $i$ is charged the difference between the value of the optimal trade without $i$ and with $i$, as measured by the total reported value of the agents other than $i$. This mechanism has many desirable properties. It is individually rational, meaning that agents are never forced to incur a negative profit. It is budget balanced, meaning that it will never run at a deficit. And, it is strategyproof, meaning that agents have a dominant strategy to truthfully report their bids to the system, solving the problem identified in the first-price auction.
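In code, the VCG payment rule can be stated compactly given a winner-determination oracle; the `solve` interface below is a placeholder abstraction of our own, not part of the chapter's implementation.

    def vcg_payments(bidders, solve):
        """VCG (Clarke pivot) payments.

        solve(participants) must return (allocation, value_by_bidder): the
        welfare-maximizing trade among `participants` and each participant's
        reported value in that trade.
        """
        allocation, value = solve(bidders)
        payments = {}
        for i in bidders:
            others = [j for j in bidders if j != i]
            _, value_without_i = solve(others)
            # Charge i its marginal impact: what the others would get without i,
            # minus what the others get in the trade chosen with i present.
            payments[i] = (sum(value_without_i.values())
                           - sum(v for j, v in value.items() if j != i))
        return allocation, payments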

The VCG payment rule has justifiably received an enormous amount of attention in the literature. However, it too has drawbacks, many of which are listed in an oft-cited paper by Rothkopf [24]. Chief among these is that it can produce unacceptably low revenue. Consider an example from Ausubel and Milgrom [25], with two items for sale, A and B. Bidder 1 wants the AB bundle for $1. Bidder 2 wants just A for $1, and Bidder 3 wants just B for $1. VCG gives the goods to Bidders 2 and 3 but charges each of them $0, their marginal impact on the rest of the economy. For these reasons, adopting VCG may not be appropriate in the data center setting, despite its many desirable properties.

Another consequence of the low revenue of VCG is that it can often be profitable for sets of participants to come together and manipulate their bids so as to subvert the system. Informally, VCG is not “collusionproof”—it is not strategyproof for coalitions of agents working together. Payment rules that prevent this are said to choose payments in the “Core,” and such rules are now available for combinatorial markets of the type proposed here [26]. However, such prices are typically not strategyproof (i.e., the VCG payments, which are the only ones that are strategyproof in sufficiently complex settings, are not in the core). Because of this, it is now common to choose those payments in the buyer-optimal core that are closest to the VCG payments, in an attempt to mitigate the strategic incentives that must inevitably arise [27]; see also Reference 28 for related ideas in the context of two-sided combinatorial markets. Such pricing represents an attempt to balance the need for both reasonable revenue and reasonable incentive properties and as such has recently been used in the United Kingdom to clear combinatorial bandwidth auctions [18, 19, 29].

While appealing in theory and offering increasing practicality, core payments still retain one major disadvantage. They are exceedingly complex and hard for buyers to understand. This is less of a problem when they can be imposed by governments on large players purchasing exceedingly valuable resources, as in bandwidth auctions. But it is less clear that they can be adopted in more modest settings.

One such setting is the sale of sponsored-search advertising on sites such as Google. Most systems in this space have adopted a simple yet effective payment rule known as the generalized second price (GSP) auction [30]. In these settings, the goods being sold can be easily sorted into a value order that is known to be consistent with all the bids—specifically, ads placed at the top of the page are more valuable to every bidder than those placed further down. The basic rule is then very simple: the market is cleared efficiently at the bids, and players are charged not their bid, but the bid of the player they displaced into the next lower spot. That is, they are charged the price required for them to win their slot holding the other bids constant—or equivalently, a separate second-price auction is effectively applied at each slot in turn from highest value to lowest, among only those bidders who have not already been satisfied. The GSP mechanism is easily understood, reasonably efficient in practice, results in relatively small amounts of strategic behavior (even though it is not strategyproof), and generates reasonable amounts of revenue for the seller. In short, for settings that meet its requirements, it is an extremely effective mechanism.

A payment rule similar in spirit to the GSP rule can easily be constructed for our setting. First, the efficient allocation at the reported bid is determined, as we have described. We then consider each agent-allocation pair in turn, ordered from highest reported value to lowest. For each such pair, we find the highest value for the buyer's allocation among all buyers further down the list and charge this for the payment. In other words, we charge each buyer the price needed to win the bundle away from the next-most eager bidder. While not strategyproof, or core, there is reason to believe such a rule should work well in practice, and it is certainly simple.
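A sketch of this GSP-like rule follows; the winner list and reported-value callback are placeholders of our own for the outputs of winner determination described earlier.

    def gsp_like_payments(winners, reported_value):
        """Order winners by their reported value for their own allocation and
        charge each the highest value any later (less eager) bidder reports
        for that same allocation; zero if nobody further down wants it.

        winners: list of (bidder, bundle) pairs from the efficient allocation
        reported_value(bidder, bundle): that bidder's reported value for bundle
        """
        ordered = sorted(winners, key=lambda w: reported_value(w[0], w[1]),
                         reverse=True)
        payments = {}
        for idx, (bidder, bundle) in enumerate(ordered):
            later = ordered[idx + 1:]
            payments[bidder] = max(
                (reported_value(other, bundle) for other, _ in later),
                default=0.0)
        return payments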

Any of the above payment rules could potentially be adopted for use with our mechanism. As we have discussed, each comes with its own pros and cons. Considering them in balance, we can easily see a role for several of the choices. Clearly fixed prices will continue to be important; we argue that they can indeed be coupled with more complex bidding and allocation systems. We believe core-based rules will continue to garner significant academic interest and, provided that implementations can hide their complexity, may well see practical application in these settings. But we see the fastest immediate impact in GSP-like rules that are simple and should have reasonable properties in practice.

19.6 Conclusions

We have established that suitably designed combinatorial markets can find practical application to power-managed resource allocation in data centers. Further, it is possible to inject revenue-based utility functions directly into the present data-center business/allocation model without the large changes associated with utility computing; this creates the streamlined migration path required for rapid industry adoption. Such markets obviate the need for operators to divine their customers’ value profile, quantify the trade-offs of multiobjective optimization, and facilitate the use of combinatorial optimization in a scalable way, provided carefully designed models are used.

There are many intriguing avenues for future work in this space. First, the results presented here are in simulation; a real-world implementation and trial is thus a clear direction to take. One initial step in this direction might be to evaluate the market approach on a model of a Google compute cluster based on the real-world demand traces they have recently released [31]. More generally, one possibility is to consider richer SLA models, such as models that capture the importance of the timing of failures. For most applications, it is better to suffer an occasional transaction failure over a long period of time than to become completely unresponsive for even a short time period—yet the standard percentile-based SLA model does not distinguish between these cases. The present system is straightforwardly scalable by simply deploying multiple copies of the market. However, this approach makes each market an island. Consequently, very large installations may need a way to transfer loads across these islands, and a higher level market is a reasonable way to perform such coordination. Thus it makes sense to consider a hierarchical allocation paradigm, where a market clears at each level, a compelling generalization of the present approach.

Acknowledgments

Our profound thanks to Jeff Kephart and Rajarshi Das, our coauthors on much of the work described herein. Any errors are our own. An earlier version of this work appeared in IJCAI-09 [32]. We would also like to thank IBM Research where much of this work took place.

References

  1. US EPA. Report to Congress on Server and Data Center Energy Efficiency, Aug. 2007.
  2. US DOE and EPA. Carbon Dioxide Emissions from the Generation of Electric Power in the United States, Jul. 2000.
  3. US EIA. Emissions of Greenhouse Gases Report: Carbon Dioxide Emissions, Dec. 2008.
  4. J. Koomey. Growth in Data Center Electricity Use 2005 to 2010. Analytics Press, Oakland, CA, Aug. 1, 2011.
  5. J. Glanz. Power, pollution and the Internet. New York Times, Sept. 22, 2012.
  6. I. E. Sutherland. “A futures market in computer time,” Communications of the ACM, 11(6), 1968, 449–451.
  7. T. Sandholm. “Expressive commerce and its application to sourcing: how we conducted $35 billion of generalized combinatorial auctions,” AI Magazine, 28(3), 2007, 45.
  8. J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and R. P. Doyle. Managing energy and server resources in hosting centers. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP-01), pp. 103–116. ACM, New York, NY, 2001.
  9. Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam. Managing server energy and operational costs in hosting centers. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 303–314. ACM, New York, NY, 2005.
  10. C. Low and A. Byde. Market-based approaches to utility computing. Technical Report 23, HP Laboratories, Bristol, Feb. 2006.
  11. A. Byde. A comparison between mechanisms for sequential compute resource auctions. In Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-06), pp. 1199–1201, 2006.
  12. C. S. Yeo and R. Buyya. “A taxonomy of market-based resource management systems for utility-driven cluster computing,” Software, Practice & Experience, 36(13), 2006, 1381–1419.
  13. J. Broberg, S. Venugopal, and R. Buyya. “Market-oriented grids and utility computing: the state-of-the-art and future directions,” Journal of Grid Computing, 6(3), 2007, 255–276.
  14. P. Cramton, Y. Shoham, and R. Steinberg, editors. Combinatorial Auctions. MIT Press, Cambridge, MA, Jan. 2006.
  15. J. O. Kephart and R. Das. “Achieving self-management via utility functions,” IEEE Internet Computing, 11(1), 2007, 40–48.
  16. M. Steinder, I. Whalley, J. E. Hanson, and J. O. Kephart. Coordinated management of power usage and runtime performance. In Proceedings of the 9th International Symposium on Integrated Network Management (NOMS-08), pp. 387–394, 2008.
  17. M. Guevara, B. Lubin, and B. C. Lee. Navigating heterogeneous processors with market mechanisms. In Proceedings of the 19th International Symposium on High Performance Computer Architecture (HPCA-13), 2013. Forthcoming.
  18. P. Cramton. A review of the L-band auction. Technical report, Office of Communications, United Kingdom, Aug. 2008.
  19. P. Cramton. A review of the 10–40 GHz auction. Technical report, Office of Communications, United Kingdom, Sept. 2008.
  20. G. Pacifici, W. Segmuller, M. Spreitzer, and A. Tantawi. “CPU demand for web serving: measurement analysis and dynamic estimation,” Performance Evaluation, 65(6), 2008, 531–553.
  21. B. Lubin, D. Parkes, J. Shneidman, S. Lahaie, R. Cavallo, and A. Juda. “ICE: an expressive iterative combinatorial exchange,” Journal of Artificial Intelligence Research, 33, 2008, 33–77.
  22. B. Lubin and D. C. Parkes. Quantifying the strategyproofness of mechanisms via metrics on payoff distributions. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI-09), 2009.
  23. V. Krishna. Auction Theory. Academic Press, University Park, PA, 2002.
  24. M. H. Rothkopf. “Thirteen reasons why the Vickrey-Clarke-Groves process is not practical,” Operations Research, 55(2), 2007, 191–197.
  25. L. M. Ausubel and P. Milgrom. “Ascending auctions with package bidding,” Frontiers of Theoretical Economics, 1(1), 2002, 1–42.
  26. R. Day and P. Milgrom. “Core-selecting package auctions,” International Journal of Game Theory, 36(3), 2008, 393–407.
  27. R. W. Day and P. Cramton. The Quadratic Core-Selecting Payment Rule for Combinatorial Auctions. Working Paper, Sept. 2008.
  28. D. C. Parkes, J. R. Kalagnanam, and M. Eso. Achieving budget-balance with Vickrey-based payment schemes in exchanges. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 4–10, 2001.
  29. P. Cramton, E. Kwerel, G. Rosston, and A. Skrzypacz. “Using spectrum auctions to enhance competition in wireless services,” Journal of Law and Economics, 54(4), 2011, S167–S188.
  30. B. Edelman, M. Ostrovsky, and M. Schwarz. “Internet advertising and the generalized second price auction: selling billions of dollars worth of keywords,” American Economic Review, 97(1), 2007, 242–259.
  31. C. Reiss, J. Wilkes, and J. L. Hellerstein. Google cluster-usage traces: format + schema. Technical report, Google Inc., Mountain View, CA, Nov. 2011. Revised 2012.03.20. Posted at http://code.google.com/p/googleclusterdata/wiki/TraceVersion2.
  32. B. Lubin, J. Kephart, R. Das, and D. C. Parkes. Expressive power-based resource allocation for data centers. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09), 2009.