Chapter 10. Performance Modeling: Tools for Predicting Performance

TCA applies the scientific method to determining server capacity: establish a hypothesis, then run an experiment with live hardware and real or projected usage data to test the validity of your expected result. Using this method, basic capacity planning and real-time evaluation of existing systems can be done with an acceptable degree of certainty. Provided the usage data, hardware, and network resources are available for testing, TCA can also be used to anticipate changes in load and cost, and to determine where adjustments can be made in the name of either higher performance or lower financial impact.

When an established server’s load grows to the point where performance is reduced, the most common solution is to add more resources, with the expectation that things like additional processor capacity or bandwidth will have a direct impact on the performance problem. Although this can be expensive, it is usually considered a good problem to have. A system that has reached its critical limit for performance is probably generating revenue, or at least experiencing high traffic. Additional hardware costs are justifiable on the surface, and hardware installation rarely involves a major reconfiguration of the server’s software components. The expectation is that users will not experience an interruption in their service—just an improvement in its response time.

The attitude that more hardware solves the problem can get software engineers and network managers in trouble, however, because adding more physical resources does not always address the real issue.

For example, consider an e-commerce server that is taking too long to process orders and is generating errors for some users who try to place an order. The IT staff may decide that because the processor load is often near maximum, the bottleneck is the result of a lack of hardware resources. However, the bottleneck might actually be caused by the server application: perhaps the order processing application is designed to handle only a finite number of simultaneous open order input transactions. If this aspect of the application is not captured in the TCA evaluation, the results will not reveal that increasing processor capacity will have no effect on the problem.

Predicting and Evaluating Performance Through TCA

TCA is highly valuable for evaluating existing systems and gauging the improvement produced by changes in code or upgrades in hardware. By changing the parameters within an Excel spreadsheet, projections can easily be made. By running TCA evaluations against old and new versions of software, as illustrated in the Chapter 9 section "Real World Example—Shop.Microsoft.com," changes are validated in a measurable way. TCA is a significant improvement over completing an entire system, including both hardware and software, and then attempting to make improvements to meet requirements after the fact. However, because it depends on actual or projected data and system responses, it is a somewhat reactive method of evaluating performance.
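To make the spreadsheet arithmetic behind such projections concrete, the following sketch shows the kind of calculation a TCA spreadsheet performs: multiply a measured per-user CPU cost by the number of concurrent users and compare the result to the available capacity. The cost and capacity figures here are hypothetical placeholders, not values taken from the Shop.Microsoft.com example.

cost_per_user_mcycles = 0.116     # hypothetical CPU cost per concurrent user (Mcycles/sec)
cpu_capacity_mcycles = 2 * 1000   # hypothetical capacity: two 1000-MHz CPUs

def predicted_load(concurrent_users):
    """Predicted CPU demand and utilization for a given number of concurrent users."""
    demand = concurrent_users * cost_per_user_mcycles
    return demand, demand / cpu_capacity_mcycles

for users in (1000, 5000, 10000):
    demand, utilization = predicted_load(users)
    print(f"{users:>6} users -> {demand:7.1f} Mcycles/sec ({utilization:.0%} of CPU capacity)")

Changing any one parameter, exactly as you would in the Excel spreadsheet, immediately shows its effect on the projection.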

When the overall architecture of a server system requires multiple changes to improve performance, a company’s existing TCA criteria may not be designed to take all of these elements into consideration.

What about a new business launching a new Web service? In the last chapter, we discussed the need to have real user data in order to complete the TCA experiment. In addition, validating the impact of more bandwidth, processor capacity, or other physical improvements to the system can be expensive because those resources must be physically present. The cost of having hardware for testing purposes is often hard to justify, especially when the system being tested is not yet generating revenue.

Advanced Performance Modeling

To address this issue, a more advanced method of performance modeling is required. The purpose of this more advanced form is to facilitate testing of multiple hardware and network resource configurations without actually having all of the resources present. This allows an enterprise to run extensive tests, including what-if scenarios, in order to make informed decisions about their physical resource purchases. The difference between these two methods is most evident in their end results: TCA tests known data and existing resources to the point of failure, defining the physical limits of the system. Advanced performance modeling tests possible configuration scenarios with a greater number of variables, and can be used not only to predict the physical limits of the system, but also to suggest possible improvements prior to those limits being reached.

Performance modeling also allows software engineers to test models of their code before the code is complete. This can be achieved by defining the design of the software in a language such as the Unified Modeling Language (UML), an industry-standard design formalism. By creating code objects that encapsulate the performance characteristics of the software in terms of expected resource utilization and workflow, engineers can accomplish performance engineering through the creation and evaluation of performance models.
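As a rough illustration of this idea, the sketch below describes a design step only by its expected resource demands; a performance model can be evaluated from such annotations long before the real code exists. The names and figures are hypothetical and are not tied to any particular modeling tool.

from dataclasses import dataclass

@dataclass
class Step:
    name: str
    cpu_ops: float        # expected CPU operations for this step
    disk_kb: float        # expected disk traffic in KB
    net_kb: float         # expected network traffic in KB

# A hypothetical checkout workflow, described only by its expected resource use.
checkout = [
    Step("validate cart",   cpu_ops=0.5e6, disk_kb=0,  net_kb=2),
    Step("query inventory", cpu_ops=1.2e6, disk_kb=64, net_kb=8),
    Step("confirm order",   cpu_ops=0.8e6, disk_kb=16, net_kb=4),
]

total_cpu_mops = sum(s.cpu_ops for s in checkout) / 1e6
total_net_kb = sum(s.net_kb for s in checkout)
print(f"expected cost per checkout: {total_cpu_mops:.1f} million CPU operations, "
      f"{total_net_kb:.0f} KB of network traffic")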

Another obvious benefit of this virtual model approach is the flexibility of testing during the design phase. Software engineers can establish the time and resource constraints of their system and test different architecture options before committing to writing finalized code. Doing this reduces the chance that code will have to be massaged or rewritten a second time to meet performance requirements—and therefore reduces both time to release and possible introduction of errors.

Performance Modeling Technology

One goal of performance modeling is to be truly proactive in performance engineering—to examine a proposed system in its entirety, from hardware and network resources to code optimization, before completely building any one component. In this section, we’ll discuss the following:

  • Scenarios in which performance modeling can replace other methods of performance assessment and engineering

  • Different methods of modeling and when they are appropriate

  • A brief look at currently available performance modeling tools

  • A detailed look at the toolkit approach, represented by Microsoft’s Indy project

Modeling Scenarios

How can performance modeling improve business practices and efficiency, as well as the overall performance of systems? Let’s look at some of the common issues facing client-server systems.

Capacity Planning

Anticipating the need for more resources involves both business and engineering decisions. Once a company has determined an acceptable estimate for future growth, the process of improving available resources follows close behind. As our introductory example showed, simply adding more hardware does not always translate into more capacity, and performance can still suffer, taking company profits along with it. Using TCA for capacity planning can be highly accurate, provided the data in the TCA structure is accurate to begin with, but it still requires verification of the projected results and, possibly, unnecessary expenditures to perform that verification. Through more detailed definition of the user load, hardware, and software components of a system, performance modeling can eliminate the verification step and suggest alternative ways to improve capacity before the need arises.

Bottleneck Analysis

Evaluating a system that suddenly reaches a plateau and does not respond to hardware upgrades can be a frustrating process. Using performance modeling to detail each transaction and the associated hardware requirements for those transactions can expose bottlenecks. Before an actual performance failure occurs, it can also be used to test higher potential system loads and predict where a bottleneck will appear, while showing ways to alleviate the problem before it is reached.

Hardware Configuration

Detailed performance models can very accurately predict the behavior of proposed hardware upgrades, using performance counters and simulating transactions against the virtual hardware. The question of whether to spend money to upgrade from 2-CPU to 4-CPU servers can be answered in a performance model with a few clicks of a mouse, rather than through trial and error.

Architectural Assessment

One way to assess the performance quality of two radically different architectural structures is to build both and examine the actual results. This is hardly feasible in the real world of software engineering or multi-client services, however. Both cost and time would be prohibitive. TCA can be used to produce estimates regarding overall system performance based on hardware and network architecture, but it still requires verification. In this case, performance modeling can decrease both the time and money spent on architectural engineering by providing varying levels of detail, depending on the architecture questions to be answered.

User Scenarios

TCA is a simple and effective tool for examining what happens when the user load on a given system increases. It is more difficult, however, to use TCA to track the impact of changes in typical user behavior. Performance modeling adds the ability to test different user habits by relying on models of user actions instead of real or borrowed user data patterns.

Performance Modeling Methods

There are three main methods of modeling that can be used to predict, assess, and interpret performance during system construction. These methods are analytical, statistical, and simulation. Understanding the requirements and results of each method is critical for making decisions about which is best used for a given task.

Analytical Modeling

The process of analytical modeling involves the use of mathematical expressions to represent the interactions that take place on a system. In some cases mathematical notations such as queuing networks and Petri nets are employed to represent the architecture of a system.

Typically, each component in the model, whether it represents a hardware device, network transaction, section of software code, or user activity, is represented in the underlying system in mathematical terms. Complex equations, representing the relationships between these components, are developed and then solved to determine the projected value of a given variable in the model.

For example, if you have a Web server that serves only simple HTML pages to clients connecting over the Internet, you would need to gather information about the server’s hardware configuration (CPU, disk speed, and memory usage), the content being served (including file sizes for the HTML, image, and other elements of the Web pages), the network capacity at the server (the real bandwidth available to the server), and various scenarios for client activity (including low and high load situations, average number of page views, average amount of time between page requests, and various network connection speeds at the client level). Given this data, you can derive a control equation that accurately relates all of these elements in terms of time and resource utilization.

Once the control equation (which might be represented in a spreadsheet with embedded calculations) has been established, you can change the values of different elements in order to see how those changes impact other elements within the boundaries of the equation.
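The following sketch shows what such a control equation might look like for the simple HTML-only Web server described above. All parameter values are hypothetical; a real model would use measured figures for the specific hardware, content, and client mix, but the structure is the same: demand per request multiplied by request rate, compared against capacity.

cpu_ops_per_request = 2.0e6       # hypothetical CPU operations to serve one page
cpu_speed = 1.0e9                 # operations per second (a 1-GHz CPU)
page_size_kb = 10.0               # HTML page plus images
server_bandwidth_kbps = 100_000   # real bandwidth available to the server (100 Mbps)

def utilization(requests_per_sec):
    """CPU and network utilization implied by a given request rate."""
    cpu_util = requests_per_sec * cpu_ops_per_request / cpu_speed
    net_util = requests_per_sec * page_size_kb * 8 / server_bandwidth_kbps
    return cpu_util, net_util

def cpu_response_time(requests_per_sec):
    """Simple single-queue estimate: response time grows sharply near saturation."""
    service_time = cpu_ops_per_request / cpu_speed
    cpu_util, _ = utilization(requests_per_sec)
    if cpu_util >= 1.0:
        return float("inf")       # past saturation, the queue grows without bound
    return service_time / (1.0 - cpu_util)

for rate in (100, 300, 450, 490):
    cpu_u, net_u = utilization(rate)
    print(f"{rate} req/sec: CPU {cpu_u:.0%}, network {net_u:.0%}, "
          f"CPU response {cpu_response_time(rate) * 1000:.1f} ms")

In this hypothetical configuration the CPU saturates long before the network link does, which is exactly the kind of relationship a control equation is meant to expose.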

You will recognize this as the basic methodology behind TCA. Knowing the system’s typical behavior, TCA can predict system performance based on possible changes in user behavior, or changes in hardware configuration (provided the data values are accurately modified to reflect the new hardware’s capacity and response times).

Analytical approaches can provide an initial estimate of the performance of new systems, but typically can produce results for only a limited set of what-if scenarios. The use of a mathematical notation to describe the most challenging performance effects of a system, such as queuing delays and resource contention, requires substantial expertise on the part of the performance engineer.

Statistical Modeling

Statistical modeling in performance engineering relies on known performance metrics for existing systems. You can use this data as a basis to predict the behavior of a new, not yet built system. This is achieved by analyzing the measured data with techniques such as regression analysis. Then the resulting statistical models can be used to extrapolate the performance of the system in new configurations.

For example, a hosting company may have a number of existing e-commerce servers for its clients already in production. To recommend hardware and network capacity for a new client, aggregate statistical data about all of the production servers could be analyzed and a recommendation made, either to use similar hardware and bandwidth, or to use different elements in the new system in hopes of improving the current systems’ performance.

As with any statistical analysis, the accuracy of the resulting projections increases with the sample size of the existing data. For this reason, the hosting company in the example would want to measure performance on several systems before arriving at an average model. One barrier to using statistical analysis in performance engineering is existing data that is insufficient in either quantity or detail. Reviewing the published site statistics for a comparable, competing e-commerce vendor might yield valuable statistical data about the number of visitors and those visitors’ traffic patterns, but it probably would not include information about the site’s hardware and network connection capacity, or the software used to provide the site’s services. In this case, a combination of statistical and analytical models (again, similar to TCA when used to project the behavior of a new service) would be required to accurately assess performance prior to launch.

Statistical models include assumptions about how the performance extrapolates. For example, the performance engineer might assume that CPU utilization increases linearly with load. Although this might be true for the observed performance measurements, CPU utilization behaves differently as it approaches saturation. Experience is the only way to avoid these pitfalls.
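A minimal sketch of this technique, using ordinary linear regression over hypothetical measurements, is shown below. It also illustrates the saturation pitfall: the fitted line extrapolates cheerfully toward and past 100 percent CPU utilization, where the real system would instead be queuing.

import numpy as np

# Hypothetical (load, CPU %) measurements taken from an existing production system.
load    = np.array([100, 200, 300, 400])       # requests per second
cpu_pct = np.array([11.0, 21.5, 32.0, 42.5])   # observed CPU utilization

# Fit a straight line to the observed range.
slope, intercept = np.polyfit(load, cpu_pct, 1)

for projected_load in (600, 900, 1200):
    predicted = slope * projected_load + intercept
    note = ""
    if predicted > 85:
        note = "  <- near or past saturation; the linear model is no longer valid"
    print(f"{projected_load:>5} req/sec -> predicted CPU {predicted:5.1f}%{note}")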

Simulation

Simulation may be done in two ways. The first involves building a proposed system and subjecting it to simulated load patterns to assess its performance. The accuracy of simulation modeling in this manner is rivaled only by real-world experience, because the hardware and software in place are the real tools used to provide the service being tested. Load generation can be done through the creation and automation of use cases, or by using commercially available traffic generation tools such as ACT. Stress-testing a real system not only exposes the performance problems, but demonstrates the aftermath of a performance failure as well.

For example, consider a server application that processes telephone calls for an IP telephony system. Its performance requirements run the gamut from lightning-fast CPU response time to 100-percent bandwidth availability on the network connection, because it must complete all of its requests to the telephone network within a 200-millisecond time window. Such a system might perform flawlessly under minimal user load, but an increase in the load that results in either queuing on the CPU or congestion on the network would result in telephone service outages for its users. By simulating call-processing load on the actual system before deployment, you could determine the real limits of the system and avoid overuse by setting strict limits on the number of clients who use that server for the handling of their calls.

The trade-off for accurate results in this type of simulation modeling is the fact that improvements must be made to a complete, or nearly complete, system. In the preceding example, the only way to avoid service failures with the existing call processing server is to limit the potential number of simultaneous users. Alternately, the system engineers can test alternate hardware or software configurations, but must purchase the resources or rewrite the code to do so.

The second method of simulation modeling involves writing software that accurately represents the behavior of hardware devices, and using another software tool to process existing code or simulated software behavior (such as UML) through the hardware device simulations. This eliminates the hardware costs associated with simulating only the load on a system, but it can be extremely time-consuming if a performance engineer has to develop the constructs and applications for processing the simulation from scratch.
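The core of such a tool is a discrete-event simulation. The toy example below illustrates the idea with a single simulated CPU device and a stream of requests; it is only a sketch of the technique, using hypothetical figures, and is not a representation of any particular tool's evaluation engine.

import random

random.seed(1)
SERVICE_TIME = 0.008     # simulated seconds of CPU work per request (hypothetical)
ARRIVAL_RATE = 110.0     # requests per second offered to the simulated CPU
NUM_REQUESTS = 5000

now = 0.0                # simulation clock
cpu_free_at = 0.0        # time at which the simulated CPU device becomes idle
waits = []

for _ in range(NUM_REQUESTS):
    now += random.expovariate(ARRIVAL_RATE)   # next request arrives
    start = max(now, cpu_free_at)             # queue behind any earlier requests
    waits.append(start - now)                 # time spent waiting for the CPU
    cpu_free_at = start + SERVICE_TIME        # the CPU is busy until the work completes

print(f"offered CPU utilization: {ARRIVAL_RATE * SERVICE_TIME:.0%}")
print(f"average queuing delay:   {1000 * sum(waits) / len(waits):.1f} ms")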

Advanced Performance Modeling

Each method of modeling has advantages and disadvantages. In the real world, it is necessary to balance the potential quality of test results against the time and money required to run those tests. Advanced performance modeling tools, such as the Indy example later in this chapter, combine these methods to provide flexibility and scalability at lower cost and with a gentler learning curve.

Performance Modeling Tools

A few tools are available today that use models to evaluate and predict performance. Most of them are customized solutions, which can be purchased either as an entire system, or built to specification by the vendor. They typically feature very large libraries of existing hardware models, graphical drag-and-drop interfaces, and the services of consultants to assist engineering staff in applying the techniques provided by the software.

The power and features of these tools come at a high price—often hundreds of thousands of dollars in licensing, and more in training for the engineer who will ultimately be responsible for the tool’s use. More emphasis is typically placed on hardware than software.

The direction of most performance modeling tools on the market today is toward solving one or two specific problems, or addressing a single application’s performance. For performance modeling to be more widely used, and accessible to a larger audience, it must be more modular and support a broader range of user skill levels.

Indy: A Performance Technology Infrastructure

In this chapter, we’ll use examples from a Microsoft performance modeling project called Indy to compare the results available through TCA (from the previous chapter) and those available through more advanced forms of performance modeling. Indy is designed to tackle the problem of performance engineering by creating a performance technology infrastructure.

Performance modeling is a multidimensional space, with different modeling tools having different requirements for level of detail, modeling technique, and target audience. No one tool or static application can meet all these needs. However, a toolkit approach allows construction of specialized tools from a basic infrastructure combined with customized components. This approach also permits an infinite range of complexity in those tools and models. Thus, a simple question can be answered quickly, or a critical system component can be tested in great detail.

Indy Concepts

Indy uses a simulation-based approach to performance modeling with analytical shortcuts to improve its performance. That is, it uses internal models of each of the major devices of the system being modeled, and simulates how these devices behave and interact. After a simulated run, the performance of each device can be examined in detail.

Indy comes with a predefined library of device models. Each model can in turn have sub-models. Thus, the Indy model of a server farm might consist of various server models connected by a network model. The server models can in turn contain models of their CPUs, disks, and network interfaces. Instances of these hardware models can then be arranged in a system topology that matches the hardware configuration of the real server farm.

Given a model of the system configuration, we also need to model the input load that it will experience. The input load is referred to as its workload. For example, the Indy model of an e-commerce site might have a workload defined in terms of how often it receives requests for the various pages and actions on the site.

Finally, we must model how the various components in the system will react to the workload: that is, we must define the behavior of the system. Indy provides a range of ways to define this behavior, but here we will concentrate on an XML-based scripting language called Transaction Modeling Language (TML). A TML script defines the transactions that a Web site will support in terms of their component actions (such as computation, disk operations, or network traffic), and on what devices these actions will run.

Indy Architecture

Figure 10-1 illustrates the basic architecture of Indy and its components.

Figure 10-1. Indy architecture

Kernel

At the heart of Indy is the kernel, which interacts with and controls the other components via well-defined APIs. The kernel must be present in all tools produced with the Indy toolkit. It includes the central evaluation engine that is used to produce simulation results. As noted above, the current Indy kernel uses an event-based evaluation engine that combines direct simulation with some hybrid shortcut techniques to improve performance. However, for other purposes a different evaluation engine could be used: for example, one that uses analytical or statistical modeling techniques.

Hardware DLLs

The hardware DLLs implement the models of the individual system devices, such as CPUs and disks. A library of models is available to choose from, and additional models can be easily added. Multiple models might be available for a particular device, differing in the level of detail with which they model performance. A more detailed model can give more accurate results and allow more performance effects to be considered, but it may in turn require more information at run time and may take longer to simulate.

Workload DLLs

The workload DLLs are responsible for defining and injecting events into the kernel representing the workload for a particular simulated run. As shown, a variety of workload DLLs can be used. In this chapter we describe the use of a workload DLL that interprets TML scripts to create a workload. Alternate workload DLLs could be used to interpret UML diagrams or produce a customized workload for a specialized tool.

Interface Definitions

Information about the configuration of other components is stored by the kernel in a metadirectory. For example, different versions of the same basic disk can use the same hardware model, but with different performance characteristics. Similarly, performance characteristics of a workload, such as how often a transaction occurs or how much network traffic it causes, can be varied without having to recode the workload DLL or TML script.

Front-end EXEs

The front-end executable combines the kernel, the hardware and workload DLLs, and the metadirectory with an appropriate user interface to form the final tool. In a production environment we might choose to export data about a simulated run to an Excel spreadsheet or a SQL database. However, in this chapter we will concentrate on a graphical interface called IndyView, which is intended to be used by performance engineers who require more detailed access to information about an Indy simulation. You will see samples of its output later in this chapter.

IndyView

In this section we will see some elements of the IndyView interface used to examine a performance model of the IBuySpy sample Web site. We will also see the underlying XML code used to represent various aspects of the model.

System Topology

Consider a simple configuration of the IBuySpy sample Web site, as shown in Figure 10-2:

Figure 10-2. Physical topology for a typical IBuySpy sample Web site

The system consists of an IIS server and a SQL server (which together make up the IBuySpy sample Web site), connected to one another over the LAN. The IIS server is also connected to the Internet, and processes requests from remote clients via the Internet.

Figure 10-3 is an example of how this simple topology for the IBuySpy Web site can be represented using the IndyView interface.

Figure 10-3. Indy topology for the IBuySpy sample Web site

Examining this in more detail, the IIS server (iissrv) includes Network Interface Card (NIC) devices and CPU devices. Devices that are functionally identical can be represented in multiples, as the CPUs are shown here. The SQL server (sqlsrv) includes NIC, CPU and disk devices. Although it is obvious that the real IIS server also has a disk, its performance is not relevant to the model being tested, and so it has not been included.

On the sqlsrv device, below the CPU x 2 heading, there are two more devices: a CPU performance counter, and the CpuModel:Pentium 1000MHz device, which defines the behavior of this particular type of CPU, including how quickly it can process calculations. Similarly, below the Disk x 1 heading, you find both a disk performance counter and a DiskModel: HP NetRAID-4M device, which has its access speed and other hardware performance factors predefined.

For the purposes of the performance being modeled here, the client’s hardware configuration does not need to be detailed. Therefore, the client device is just a black box, referenced in the transactions as the requester or recipient of data.

The net and lan devices, located at the same hierarchical level as the two servers, represent the properties of the network connections for the Internet and LAN segments respectively.

Finally, the links between each of the computer devices and the network devices are detailed, including which interface on each computer connects to which network device. This ensures that the test results can distinguish between the different devices’ activities at any point in the transaction.

The following XML code underlies the diagram in Figure 10-3:

<?xml version="1.0" encoding="utf-8"?>
<system name="IBuySpy">
    <active_device type="computer" name="iissrv" count="1">
        <active_device type="generic" name="lan_nic_send" count="1"/>
        <active_device type="generic" name="lan_nic_recv" count="1"/>
        <active_device type="generic" name="net_nic_send" count="1"/>
        <active_device type="generic" name="net_nic_recv" count="1"/>
        <active_device type="cpu" name="cpu" count="2">
            <rct name="cpu"/>
            <use_template name="CpuModel:Pentium 1000MHz"/>
        </active_device>
    </active_device>
    <active_device type="computer" name="sqlsrv" count="1">
        <active_device type="generic" name="lan_nic_send" count="1"/>
        <active_device type="generic" name="lan_nic_recv" count="1"/>
        <active_device type="cpu" name="cpu" count="2">
            <rct name="cpu"/>
            <use_template name="CpuModel:Pentium 1000MHz"/>
        </active_device>
        <active_device type="generic" name="disk" count="1">
            <rct name="disk"/>
            <use_template name="DiskModel:HP NetRAID-4M"/>
        </active_device>
    </active_device>
    <open_device name="client"/>
    <passive_device type="network" name="net" ports="100">
        <use_template name="NetModel2:OptimumCapacity"/>
    </passive_device>
    <passive_device type="network" name="lan" ports="100">
        <use_template name="LanModel:Ethernet"/>
    </passive_device>
    <link active="client" passive="net" fromport="0" toport="0"/>
    <link active="iissrv[?].2" passive="net" fromport="0" toport="99"/>
    <link active="iissrv[?].3" passive="net" fromport="0" toport="99"/>
    <link active="iissrv[?].0" passive="lan" fromport="0" toport="99"/>
    <link active="iissrv[?].1" passive="lan" fromport="0" toport="99"/>
    <link active="sqlsrv[?].0" passive="lan" fromport="0" toport="99"/>
    <link active="sqlsrv[?].1" passive="lan" fromport="0" toport="99"/>
</system>

The device models referenced in this code are defined in the metadirectory of hardware configurations. By changing the underlying properties of one of the referenced devices, the same script can be used to test different architectural options. Similarly, the number of devices can be changed to examine the performance impact of factors such as the number of CPUs in a server.

IBuySpy Search Transaction

Having constructed our topology, we can now define the transactions that it will support. Here we see a simple example of a transaction written in TML to simulate the request and processing of a search page on IBuySpy.

<tml>

    ...

    <!-- BasicSearch
      Request the .aspx and then the two gifs
    -->
    <transaction name="BasicSearch" frequency="BasicSearchFreq">
      <include name="ChooseClientSpeed" />
      <action name="net_msg_sync_async" connection="net" service="Client" 
               saveschedule="clientstate">
        <param name="linkspeed" value="transaction.ClientSpeed" />
        <param name="msgsize" value="HttpRequestSize*3" />
        <peer name="target" service="IIS" saveschedule="iisstate" />
      </action>
      <action name="compute" service="IIS" useserver="iisstate">
        <param name="cpuops" value="BasicSearchCpu" />
      </action>
      <action name="net_msg_async_sync" connection="net" service="IIS" 
               useserver="iisstate">
        <param name="linkspeed" value="transaction.ClientSpeed" />
        <param name="msgsize" value="BasicSearchSize" /> 
          <!-- just the .aspx page size -->
        <peer name="target" service="Client" useserver="clientstate" />
      </action>
      <fork>
        <branch>
          <action name="net_msg_sync_async" connection="net" 
               service="Client" saveschedule="clientstate">
            <param name="linkspeed" value="transaction.ClientSpeed" />
            <param name="msgsize" value="HttpRequestSize" />
            <peer name="target" service="IIS" saveschedule="iisstate" />
          </action>
          <action name="net_msg_async_sync" connection="net" service="IIS" 
               useserver="iisstate">
            <param name="linkspeed" value="transaction.ClientSpeed" />
            <param name="msgsize" value="0.04" /> <!-- 1x1.gif -->
            <peer name="target" service="Client" useserver="clientstate" />
          </action>
        </branch>
        <branch>
          <action name="net_msg_sync_async" connection="net" service="Client" 
               saveschedule="clientstate">
            <param name="linkspeed" value="transaction.ClientSpeed" />
            <param name="msgsize" value="HttpRequestSize" />
            <peer name="target" service="IIS" saveschedule="iisstate" />
          </action>
          <action name="net_msg_async_sync" connection="net" service="IIS" 
               useserver="iisstate">
            <param name="linkspeed" value="transaction.ClientSpeed" />
            <param name="msgsize" value="1.52" /> <!-- thumbs/image.gif -->
            <peer name="target" service="Client" useserver="clientstate" />
          </action>
        </branch>
      </fork>
    </transaction>

    ...
</tml>

The transaction definition begins with a name and a relative frequency with which the transaction occurs. Then the actions within the transaction are listed in the order in which they occur. In this example script, the individual actions (each of which begins with an <action name=...> tag and ends with </action>) are:

  1. net_msg_sync_async: Send an HTTP request message from the client service (representing all of the possible client machines on the Internet) to the IIS service over the Internet, using variable parameters for the link speed and message size. This message is sent synchronously (that is, the client waits for a response), but is received asynchronously (the server can handle many simultaneous requests).

  2. compute: Process the HTTP request on the IIS server with the variable parameter of how many CPU operations are required.

  3. net_msg_async_sync: Send a message back from the IIS service to the client service over the Internet, using variable parameters of link speed and message size. In the comments, we see that the value BasicSearchSize is just the ASPX page size, meaning that the variable BasicSearchSize has been previously defined as a workload parameter that contains the network size of the ASPX file. This variable can then be easily modified from within IndyView.

  4. At this point, the script dictates a fork into two branches, which will be executed simultaneously:

    The first branch, containing the actions net_msg_sync_async and net_msg_async_sync, makes up the request and response for a GIF file (referred to in the comment as 1x1.gif) with a size of 0.04 KB.

    The second branch, containing the actions net_msg_sync_async and net_msg_async_sync, makes up the request and response for a GIF file (referred to in the comment as thumbs/image.gif) with a size of 1.52 KB.

Hard-coding parameter values in this way results in a script that will require editing if any of the values change. For parameters whose values a user might want to change frequently, it makes more sense to use a workload variable, as with the ASPX page size and CPU cost.

We can drill down to another level of detail to see what information is embedded in one of the service definitions in the script:

<service name="IIS">
    <serverlist>
        <server name="iissrv" />
    </serverlist>
    <actionscheduling>
        <schedule action="compute" policy="roundrobin">
            <target device="cpu" />
        </schedule>
        <schedule action="net_msg_async_sync" connection="net" 
            policy="random">
            <target device="nic_send" />
        </schedule>
    </actionscheduling>
</service>

This tells us that for the service IIS, the device iissrv (defined in the system topology) is to be used. Actions can be scheduled on iissrv’s sub-devices. In this script, when the action compute is required by a transaction, the target sub-device is one of the two CPUs, chosen using a round-robin policy. When traffic must be sent out to the Internet (the net device in the system topology script), the target device is the NIC dedicated to sending.
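The two policies named here behave as you would expect. The short sketch below illustrates them with a pair of hypothetical CPU devices; it shows only the selection policies themselves, not Indy's scheduler.

import itertools
import random

cpus = ["cpu0", "cpu1"]

# Round-robin: alternate between the devices in a fixed order.
round_robin = itertools.cycle(cpus)
print("round-robin:", [next(round_robin) for _ in range(6)])

# Random: pick a device independently for each action.
random.seed(2)
print("random:     ", [random.choice(cpus) for _ in range(6)])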

When the script is processed by the Indy kernel, using the device definitions in the system topology, IndyView can produce a number of different visual representations of the transaction as a whole, or it can focus on specific devices and how they are affected throughout the flow of the transaction. Figure 10-4, Figure 10-5, Figure 10-6, and Figure 10-7 were all produced using a sample model of the IBuySpy sample Web site.

Figure 10-4. Control flow of Basic Search Transaction

After we have defined a transaction in TML, we can use IndyView to visualize it with a transaction flow diagram, as shown in Figure 10-4. This simple view can be used to inspect and debug the TML, and would typically be used by a performance engineer in the development stage of a model.

This search flow diagram shows the order and dependencies of each action in the transaction. The sequence of actions includes a request from the client to the Web server, computation on the Web server, a response from the Web server to the client, and then two image fetches in parallel being received by the client. Clicking one of the actions in the flow diagram opens a window that provides more information. In this figure, two such windows are open: one showing a message from the server to the client (background) and the other showing the computation on an iissrv CPU (foreground).

Figure 10-5. Time-space analysis

Figure 10-5 shows the time and resource requirements of events taking place on all system devices. This diagram can be used by a performance engineer to visualize the level of utilization of individual devices and determine possible performance problems by simple inspection of detailed events. Time goes left to right, and each line represents a device (identified in the device list on the left side of the screen), while boxes represent events. The color map, which corresponds to the type of event, is shown on the diagram below the toolbar. Lines in the window connect communication events where both partners in the communication are visible in the window. Black circles represent communication events in which one partner of the communication is not visible. Since clients are not shown in this view, communications with them are displayed this way. The device numbers on the left are derived from the topology script. The most utilized resources in the diagram are the IIS server CPUs (iissrv[0000].0004-0005) and the SQL disk (sqlsrv[0000].0004).

Rather than looking at all of the actions taking place on the entire system, we can use the Transaction Analysis view of IndyView to examine how long each of the individual actions in a particular instance of a transaction takes, and what resources it requires, as shown in Figure 10-6.

Figure 10-6. Search transaction analysis

The top half of the screen shows the control flow of a search transaction, stretched to represent the actual time taken by each action. In addition, the panel to the left shows the start time and name of each transaction in the simulated run, allowing each of them to be examined individually.

In the lower half, the events that take place during the selected transaction are highlighted in green. The panel to the left of this section shows just the devices involved in this particular transaction: iissrv[0000].0002 represents the sending activity on the NIC that connects the first IIS server to the Internet; iissrv[0000].0003 represents the receiving activity on the same NIC; and iissrv[0000].0004 is the first of the CPUs.

A performance engineer can use the previous views to construct and debug a performance model. Then, additional IndyView screens can be used to evaluate and analyze the performance impact of various scenarios. Figure 10-7 shows a diagram similar to that produced by the Windows monitoring tool System Monitor.

This screen shows the predicted CPU utilization for the SQL server (black line), displayed simultaneously with the utilization of the backbone network (highlighted blue line, averaging around 5 percent). For more information on performance counters, refer to Chapter 4.

Figure 10-7. Performance counter prediction
Figure 10-8. Predicted queue lengths

IndyView includes a statistics engine that allows users to examine any performance metric of the system being modeled, using either a built-in graphing tool or an export of the data to an Excel spreadsheet or a database. The two graphs in Figure 10-8 show the predicted average queue size for different event types during a sample run of IBuySpy on a particular system topology. The top graph shows event queues on the IIS server, while the bottom graph shows event queues on the SQL server. Looking at the scales on the graphs, it is clear that the bottleneck of the system is the IIS processor, since the average computation queue size is 21.4. By comparison, very little queuing is taking place on its NICs, and the SQL server also has very small average queue sizes.

A performance engineer would typically use this view to predict possible methods of improving overall system performance. By changing the system topology script to include more processors or a set of load-balanced IIS servers, improvements in performance would immediately become visible. In addition, since the overall model takes into account the actual behavior of the other devices in the system, improving the CPU capacity of the IIS server in the model would then show the next possible bottleneck in the system’s performance.

TCA vs. Performance Modeling Conclusions

In Chapter 9, we used verification tests to confirm the costs predicted by the TCA model (see Figure 9-9). For purposes of comparison, we used Indy to define a performance model of IBuySpy’s concurrent user capacity, using the numbers we obtained from TCA as event costs. The results are shown here side by side:

Table 10-1. Comparing TCA and Indy Predictions

Concurrent Users    TCA Predicted Mcycles    Indy Predicted Mcycles    Measured Mcycles
1000                116.0                    115.8                     120.2
10000               1160.0                   1166.0                    1147.0
14653               1699.0                   1694.0                    1661.0

For this simple model, Indy accurately tracks the results of both TCA and the measurements. This shows that two completely different performance-modeling techniques, namely the analytical model of TCA and the hybrid simulation approach of Indy, can accurately model the same system. We will now explore the areas in which Indy extends the capabilities of TCA.

Building What-if Scenarios Using Indy

As we discussed earlier, one of the major advantages of performance modeling is the ability to configure each minute element of the overall client/server interaction, in order to test different scenarios before a particular configuration or code architecture is chosen. In the following examples, two key performance issues—bottleneck analysis and architectural evaluation—are evaluated using Indy.

Figure 10-9. Bottleneck analysis

What-if Scenario 1: Bottleneck Analysis

Figure 10-9 shows an example of the type of bottleneck analysis possible with Indy. The graph shows the predicted performance of an e-commerce site as we change the number of Web servers. The site is being stress-tested to show the maximum achievable throughput for purchase transactions. As we would expect, increasing the number of Web servers increases the total throughput of the system in terms of purchase transactions per second. However, we reach a plateau at seven Web servers: beyond this point, adding extra Web servers does not increase the throughput of the system. When we use Indy to look at the simulated queuing delays in each of the active components, we see that the SQL server has reached saturation point. After this point the system throughput will remain the same until we increase the number of SQL servers or their performance.

Given this conclusion, further tests using this existing set of transactions could be performed to determine how much of an improvement hardware changes on the SQL servers might provide, or how many more SQL servers could feasibly be added before other elements like network performance were affected.
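The plateau itself follows from a simple bound: total throughput cannot exceed the capacity of whichever tier saturates first. The sketch below uses hypothetical per-tier capacities chosen only to reproduce a plateau at seven Web servers; Indy derives the equivalent limits from its device and transaction models rather than from fixed numbers like these.

WEB_SERVER_CAPACITY = 10.0   # purchase transactions/sec one Web server can drive (hypothetical)
SQL_CAPACITY = 70.0          # purchase transactions/sec the SQL tier can sustain (hypothetical)

def max_throughput(num_web_servers):
    """Throughput is bounded by whichever tier saturates first."""
    return min(num_web_servers * WEB_SERVER_CAPACITY, SQL_CAPACITY)

for n in range(1, 10):
    print(f"{n} Web servers -> {max_throughput(n):5.1f} transactions/sec")
# Beyond seven Web servers the SQL tier is the bottleneck and throughput stays flat.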

What-if Scenario 2: Architectural Improvements

Another feature of Indy is the ability to model how architectural changes will affect the performance of a system. For example, imagine we are running IBuySpy on an e-commerce site with only two old 450-MHz CPU Web servers with a static load-balancer. For the standard user mix we have used in this chapter, we can use Indy to determine a maximum throughput of 46.8 transactions per second. Christmas is coming, so we decide to add a third server to the mix. This is a more modern Web server with a 1-GHz CPU. Despite more than doubling the total CPU horsepower of our Web servers, Indy predicts that they will only support a maximum throughput of 53.5 transactions per second. The problem is that we are still using round-robin load balancing, so that only one-third of our transactions are benefiting from the faster CPU. If we change to using a dynamic load balancing technique that takes account of relative server load, Indy predicts our throughput will increase to 73.4 transactions per second. This type of modeling of dissimilar server types, combined with the dynamic runtime behavior of a load-balancing system, would be impossible in TCA.
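A back-of-envelope calculation shows why round-robin wastes the faster server. With equal splitting, the system saturates when the slowest server is full, so the upper bound is the number of servers times the slowest server's capacity; a load-aware balancer can instead use the sum of all capacities. The per-server figures below are hypothetical, and this simple bound ignores the back-end and contention effects that Indy models, so it does not reproduce the numbers quoted above.

capacities = [23.0, 23.0, 51.0]   # hypothetical transactions/sec: two 450-MHz servers, one 1-GHz server

# Round-robin: every server receives an equal share of the traffic,
# so the slowest server limits the whole system.
round_robin_bound = len(capacities) * min(capacities)

# Load-aware balancing: traffic is shared in proportion to capacity.
load_aware_bound = sum(capacities)

print(f"round-robin upper bound: {round_robin_bound:.1f} transactions/sec")
print(f"load-aware upper bound:  {load_aware_bound:.1f} transactions/sec")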

Conclusion

The Indy system is just one example of an advanced performance-modeling tool. The toolkit approach, in which users can rely on an included library of hardware and network models, expands Indy’s usability to include non-experts in the area of performance modeling. At the same time, the power of TML to infinitely customize objects related to one’s own code and hardware makes Indy a valuable tool for very advanced software performance engineers.

In any engineering effort, the ability to predict success with certainty reduces the bottom line for both time and money, and improves the confidence of system architects and business managers alike. Using the principles discussed in this book, you should now be able to think about your own development and production processes with an eye toward how you can increase performance through careful consideration, rather than simply through trial and error.
