Chapter 23. Tuning and Optimization Techniques

Performance tuning and optimization are frequently overlooked by administrators in an era where hardware has become relatively inexpensive and the processing power of systems in many cases far exceeds the utilization demands of an organization. However, in cost-cutting efforts, organizations are consolidating servers, sometimes collapsing five to 10 servers into a single server. In other cases, organizations are fine-tuning servers to optimize system performance in an effort to minimize the operational inefficiencies that frequently lead to premature server failures.

This chapter focuses on tips, tricks, and best practices to tune and optimize a Windows Server 2003 system and the Windows 2003 networking environment.

Understanding Capacity Analysis

Most capacity analysis works to minimize unknown or immeasurable variables, such as the number of gigabytes or terabytes of storage the system will need in the next few months or years, in order to size a system adequately. The high number of unknown variables exists largely because network environments, business policies, and personnel are constantly changing. As a result, capacity analysis is as much an art as it is a science, and it relies heavily on experience and insight.

If you’ve ever found yourself specifying configuration requirements for a new server or estimating whether your configuration will have enough power to sustain various workloads now and in the foreseeable future, proper capacity analysis can help in the design and configuration of an efficiently running network environment. These capacity-analysis processes help you weed out the unknowns and make decisions as accurately as possible. They do so by giving you a greater understanding of your Windows Server 2003 environment. This knowledge and understanding can then be used to reduce time and costs associated with supporting and designing an infrastructure. The result is that you gain more control over the environment, reduce maintenance and support costs, minimize fire-fighting, and make more efficient use of your time.

A business depends on its network systems for a variety of operations, such as performing transactions or providing security, so that the business functions as efficiently as possible. Systems that are underused are probably wasting money and are of little value. On the other hand, systems that are overworked or can’t handle workloads prevent the business from completing tasks or transactions in a timely manner, can cause a loss of opportunity, and might keep users from being productive. Either way, these systems are typically of little benefit to the business. To keep network systems well tuned for the given workloads, capacity analysis seeks a balance between the resources available and the workload required of the resources. The balance provides just the right amount of computing power for given and anticipated workloads.

This concept of balancing resources extends beyond the technical details of server configuration to include issues such as gauging the number of administrators that might be needed to maintain various systems in your environment. Many of these questions relate to capacity analysis, and the answers aren’t readily known because they can’t be predicted with complete accuracy.

Capacity analysis provides the processes to guide you through lessening the burden and dispelling some of the mysteries of estimating resource requirements. These processes include vendor guidelines, industry benchmarks, analysis of present system resource use, and more. Through these processes, you’ll gain as much understanding as possible of the network environment and step away from the compartmentalized or limited understanding of the systems. In turn, you’ll also gain more control over the systems and increase your chances of successfully maintaining the reliability, serviceability, and availability of your system.

To proactively manage your system, first establish systemwide policies and procedures that shape service levels and user expectations. After these policies and procedures are classified and defined, you can start characterizing system workloads, which will help gauge acceptable baseline performance values.

Best Practice for Establishing Policy and Metric Baselines

If you first begin defining policies regarding desired service levels and objectives, the resulting procedures are more easily created and implemented. Essentially, policies and procedures define how the system is supposed to be used—establishing guidelines to help users understand that the system can’t be used in any way they see fit. Many benefits are derived from these policies and procedures. For example, in an environment where policies and procedures are working successfully and where network performance becomes sluggish, it would be safer to assume that groups of people aren’t playing a multiuser network game, that several individuals aren’t sending enormous e-mail attachments to everyone in the global address list, or that a rogue Web or FTP server hasn’t been placed on the network.

Network performance is a product of both how individuals use the system and how well the IT department has optimized it. Therefore, it’s equally important to gain an understanding of user expectations and requirements through interviews, questionnaires, surveys, and more. Some examples of operational policies pertaining to end users that can be implemented in a networking environment include the following:

  • Only certain applications will be supported and allowed on the network.

  • E-mail message size can’t exceed 2MB.

  • Beta software can be installed only on lab equipment (that is, not on client machines or servers in the production environment).

  • All computing resources are for business use only (in other words, no gaming or personal use of computers is allowed).

  • All home directories will be limited to 300MB per user.

  • Users must request assistance through a managed helpdesk rather than try to apply patches, fixes, or conduct system repairs on their own.

Policies and procedures, however, aren’t just for end users. They can also be established and applied to IT personnel. In this scenario, policies and procedures can serve as guidelines for technical issues, rules of engagement, or simply an internal set of rules. The following list provides some examples of policies and procedures that might be applied to the IT personnel:

  • System backups must include System State data and should be completed by 5 a.m. each workday.

  • Routine system maintenance should be performed only on Saturday mornings between 5 and 8 a.m.

  • Basic technical support requests should be attended to within two business days.

  • Priority technical support requests should be attended to within four hours of the request.

  • Technical support staff should use Remote Desktop on client machines first before attempting to solve the problem locally.

  • Any planned downtime for servers must be approved by the IT management at least one week in advance.

Benchmark Baselines

If you’ve begun defining policies and procedures, you’re already cutting down the number of immeasurable variables and the amount of empirical data you must gather to support your decision-making process. The next step in preparing for capacity analysis is to begin gathering baseline performance values.

Baselines give you a starting point against which to compare results. For the most part, determining baseline performance levels involves working with hard numbers that represent the health of a system. These hard numbers are complemented by other inputs, such as workload characterization, vendor requirements or recommendations, industry-recognized benchmarks, and the data that you collect yourself.

Workload Characterization

It is unlikely that each system in your environment is a separate entity that has its own workload characterization. Most, if not all, network environments have systems that depend on other systems or are even intertwined among different workloads. This makes workload characterization difficult at best.

Workloads are defined by how processes or tasks are grouped, the resources they require, and the type of work being performed. Departmental functions, time of day, the type of processing required (batch or real-time), companywide functions (such as payroll), volume of work, and much more can be characterized as examples of workloads.

So why is workload characterization so important? Identifying system workloads enables you to determine the appropriate resource requirements for each of them. This way, you can properly plan the resources according to the performance levels the workloads expect and demand.

Benchmarks for Performance Analysis

Benchmarks are a means to measure the performance of a variety of products, including operating systems, virtually all computer components, and even entire systems. Many companies rely on benchmarks to gain competitive advantage because so many professionals rely on them to help determine what’s appropriate for their network environment.

As you would suspect, sales and marketing departments often exploit the benchmark results to exaggerate the performance or benefit of a technology solution. For this reason, it’s important to investigate the benchmark results and the companies or organizations that produced the results. Check to make sure that the benchmarks are consistent with other benchmarks produced by third-party organizations (such as magazines, benchmark organizations, and in-house testing labs). If none are available, you should try to gain insight from other IT professionals or run benchmarks on the product yourself before implementing it in production.

Although some suspicion might arise from benchmarks because of these sales and marketing techniques, the real purpose of benchmarks is to point out the performance levels that you can expect when using the product. Benchmarks can be extremely beneficial for decision-making, but they shouldn’t be your sole source for evaluating and measuring performance. When consulting benchmark results during capacity analysis, use them only as a guideline or starting point, and pay close attention to how they are interpreted.

A number of companies and organizations provide benchmark statistics and benchmark-related information, along with tools for evaluating product performance.

Leveraging Capacity-Analysis Tools

A growing number of tools originating and evolving from the Windows NT 4.0, Windows 2000, and Unix operating system platforms can be used in data collection and analysis on Windows Server 2003. Some of these tools are even capable of forecasting system capacity, depending on the amount of information they are given.

Microsoft also offers some handy utilities that are either inherent to Windows Server 2003 or are sold as separate products. Some of these utilities are included with the operating system, such as Task Manager, Network Monitor, and Performance Console (also known as Performance Monitor). Data that is collected from these applications can be exported to other applications, such as Microsoft Excel or Access, for inventory and analysis. Other Microsoft utilities that are sold separately are Systems Management Server (SMS) and Microsoft Operations Manager (MOM).

Built-in Toolset

Windows Server 2003’s arsenal of utilities for capacity analysis includes command-line and GUI-based tools. This section discusses the Task Manager, Network Monitor, and Performance Console, which are bundled with the Windows Server 2003 operating system.

Task Manager

The Windows Server 2003 Task Manager is similar to its Windows 2000 predecessor in that it offers multifaceted functionality. You can view and monitor processor, memory, application, and process information in real-time for a given system. This utility is great for getting a quick view of key system health indicators with the lowest performance overhead.

To begin using Task Manager, use any of the following methods:

  • Press Ctrl+Shift+Esc.

  • Right-click the taskbar and select Task Manager.

  • Press Ctrl+Alt+Delete and then click Task Manager.

When you start the Task Manager, you’ll see a screen similar to that in Figure 23.1.


Figure 23.1. The Task Manager window after initialization.

The Task Manager window contains the following five tabs:

  • Applications—This tab lists the user applications that are currently running. You also can start and end applications under this tab.

  • Processes—Under this tab, you can find performance metric information of the processes currently running on the system.

  • Performance—This tab can be a graphical or tabular representation of key system parameters in real-time.

  • Networking—This tab displays the network traffic coming to and from the machine. The displayed network usage metric is a percentage of total available network capacity for a particular adapter.

  • Users—This tab displays users who are currently logged on to the system.

In addition to the Task Manager tabs, the Task Manager is, by default, configured with a status bar at the bottom of the window. This status bar, shown in Figure 23.2, displays the number of running processes, CPU utilization percentage, and the amount of memory currently being used.


Figure 23.2. All processes currently running on the system.

As you can see, the Task Manager presents a variety of valuable real-time performance information. This tool is particularly useful for determining what processes or applications are problematic and gives you an overall picture of system health.

There are limitations, however, which prevent it from becoming a useful tool for long-term or historical analysis. For example, the Task Manager can’t store collected performance information; it is capable of monitoring only certain aspects of the system’s health, and the information that is displayed pertains only to the local machine. For these reasons alone, the Task Manager doesn’t make a prime candidate for capacity-planning purposes (you must be logged on locally or connected via Terminal Services to gauge performance with the Task Manager).
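
For the kind of scriptable or remote data collection that the Task Manager cannot provide, the typeperf utility included with Windows Server 2003 can sample counters from the command line and write them to a file. This is only a rough sketch; the server name, counter selection, and file path are placeholders to adapt to your environment:

  typeperf "\Processor(_Total)\% Processor Time" "\Memory\Available Bytes" -s SERVER01 -si 10 -sc 360 -f CSV -o C:\PerfLogs\server01_baseline.csv

This samples the two counters from SERVER01 every 10 seconds for an hour (360 samples) and writes the results to a CSV file that can later be opened in Excel for review.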

Network Monitor

There are two versions of Network Monitor that you can use to check network performance. The first is bundled within Windows Server 2003, and the other is a part of Systems Management Server (SMS). Although both have the same interface, like the one shown in Figure 23.3, the one bundled with the operating system is slightly scaled down in terms of functionality when compared to the SMS version.


Figure 23.3. The unified interface of the Network Monitor.

The Network Monitor that is built into Windows Server 2003 is designed to monitor only the local machine’s network activity. This utility design stems from security concerns regarding the ability to capture and monitor traffic on remote machines. If the operating system version had this capability, anyone who installed the Network Monitor would possibly be able to use it to gain unauthorized access to the system. Therefore, this version captures only frame types traveling into or away from the local machine.

To install the Network Monitor, perform the following steps:

  1. Double-click the Add/Remove Programs applet on the Control Panel.

  2. In the Add/Remove Programs window, click Add/Remove Windows Components.

  3. Within the Windows Components Wizard, select Management and Monitoring Tools and then click Details.

  4. In the Management and Monitoring Tools window, select Network Monitor Tools and then click OK and then Next.

  5. If you are prompted for additional files, insert your Windows Server 2003 CD or type a path to the location of the files on the network. You might be prompted to install the Phone Book Services (PBS) at this point. Choose Yes to continue with the PBS installation.

  6. After the installation, locate and start the Network Monitor by choosing Start, Programs, Administrative Tools, Network Monitor.

As described previously, the SMS version of the Network Monitor is a full version of the one integrated into Windows Server 2003. The most significant difference between the two versions is that the SMS version can run indiscriminately throughout the network (that is, it can monitor and capture network traffic to and from remote machines). It is also equipped to locate routers on the network, provide name-to-IP address resolution, and generally monitor all the traffic traveling throughout the network.

Because the SMS version of Network Monitor is capable of capturing and monitoring all network traffic, it poses possible security risks. Any unencrypted network traffic can be compromised; therefore, it’s imperative that you limit the number of IT personnel who have the necessary access to use this utility.

On the other hand, the SMS version of Network Monitor is more suitable for capacity-analysis purposes because it is flexible enough to monitor network traffic from a centralized location. It also allows you to monitor in real-time and capture for historical analysis. For all practical purposes, however, it wouldn’t make much sense to install SMS just for the Network Monitor capabilities, especially considering that you can purchase more robust third-party utilities.

The Performance Console

Many IT professionals rely on the Performance Console because it is bundled with the operating system, and it allows you to capture and monitor every measurable system object within Windows Server 2003. This tool is a Microsoft Management Console (MMC) snap-in, so using the tool involves little effort to become familiar with it. You can find and start the Performance Console from within the Administrative Tools group on the Start menu.

The Performance Console, shown in Figure 23.4, is by far the best utility provided in the operating system for capacity-analysis purposes. With this utility, you can analyze data from virtually all aspects of the system, both in real-time and historically. This data can be viewed through charts, reports, and logs. Logs can be stored for later use so that you can scrutinize data from specific periods of time.


Figure 23.4. The Performance Console startup screen.

Because the Performance Console is available to everyone running the operating system and it has a lot of built-in functionality, most administrators choose to use this built-in utility.
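
The counter logs that the Performance Console builds through its Performance Logs and Alerts snap-in can also be created from the command line with the logman and relog utilities that ship with Windows Server 2003. The log name, counters, sample interval, and paths in this sketch are arbitrary examples:

  rem Define a counter log that samples two counters every 15 seconds
  logman create counter CapBaseline -c "\Processor(_Total)\% Processor Time" "\Memory\Pages/sec" -si 15 -o C:\PerfLogs\CapBaseline

  rem Start the log before the measurement window and stop it afterward
  logman start CapBaseline
  logman stop CapBaseline

  rem Convert the binary log to CSV for analysis in Excel or Access
  rem (adjust the .blg file name to match the file logman actually produced)
  relog C:\PerfLogs\CapBaseline.blg -f CSV -o C:\PerfLogs\CapBaseline.csv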

Third-Party Toolset

Without a doubt, many third-party utilities are excellent for capacity-analysis purposes. Most of them provide additional functionality not found in the Windows Server 2003 Performance Console, but they cost more too. You might want to evaluate some third-party utilities to get a more thorough understanding of how they might offer more features than the Performance Console. Generally speaking, these utilities enhance the functionality that is inherent to Performance Console, such as scheduling, an enhanced level of reporting functionality, superior storage capabilities, the ability to monitor non-Windows systems, or algorithms for future trend analysis.

Some of these third-party tools are listed in Table 23.1.

Table 23.1. Third-Party Capacity-Planning Tools

Utility Name        Company                  Web Site
AppManager Suite    NetIQ Corporation        http://www.netiq.com/solutions/
OpenView            Hewlett-Packard          http://www.openview.hp.com/
PATROL              BMC Software             http://www.bmc.com/products
PerfMan             Information Systems      http://www.infosysman.com/
RoboMon             Heroix                   http://www.robomon.com/
Unicenter TNG       Computer Associates      http://www3.ca.com/Solutions/Solution.asp?id=315

Although it might be true that most third-party products do add more functionality to your capacity-analysis procedures, there are still pros and cons to using them over the free Performance Console. The most obvious is the expense of purchasing the software licenses for monitoring the enterprise, but some less obvious factors include the following:

  • The number of administrators needed to support the product in capacity-analysis procedures is high.

  • Some third-party products have high learning curves associated with them. This increases the need for either vendor or in-house training just to support the product.

The key is to decide what you need to adequately and efficiently perform capacity-analysis procedures in your environment. You might find that the Performance Console is more than adequate, or you might find that your network environment requires a third-party product that can encompass all its intricacies.

Identifying and Analyzing Core Analysis and Monitoring Elements

The capacity analysis and performance optimization process can be intimidating because there can be an enormous amount of data to work with. In fact, it can easily become unwieldy if not done properly. The process is not just about monitoring and reading counters; it is also an art.

As you monitor and catalog performance information, keep in mind that more information does not necessarily yield better optimizations. Tailor the number and types of counters that are being monitored based on the server’s role and functionality within the network environment. It’s also important to monitor the four common contributors to bottlenecks: memory, processor, disk, and network subsystems. When monitoring application servers like Microsoft Exchange systems, it is equally important to understand the various roles each server plays (front-end server, back-end server, bridgehead gateway server, and so on) to keep the number of counters being monitored to a minimum.
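
One practical way to keep the counter set small and consistent across servers is to list the core memory, processor, disk, and network counters in a settings file and feed that file to the typeperf utility. The file name and the exact counter selection below are only an illustrative starting point, not a required set. A counter file, here called counters.txt, lists one counter path per line:

  \Memory\Pages/sec
  \Memory\Available Bytes
  \Processor(_Total)\% Processor Time
  \PhysicalDisk(_Total)\% Disk Time
  \Network Interface(*)\Bytes Total/sec

The whole set can then be sampled every 30 seconds and logged to a CSV file with a single command (stop the collection with Ctrl+C):

  typeperf -cf counters.txt -si 30 -f CSV -o C:\PerfLogs\baseline.csv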

Memory Subsystem Optimizations

As with earlier versions of Windows, Windows Server 2003 tends to use whatever amount of memory you throw at it. However, its efficient memory management outperforms that of its predecessors. Nevertheless, fine-tuning system memory can go a long way toward making sure that each Windows Server 2003 system has an adequate amount of memory.

Memory management is performed by Windows Server 2003 and is directly related to how well applications on the server perform. Windows Server 2003 also has greatly enhanced memory management and the way it uses virtual memory. This reduces memory fragmentation and enables more users to be supported on a single server or cluster of servers.

Using the Performance Monitor Console, you can track a number of important memory-related counters that help you establish an accurate representation of the system’s memory requirements. The primary memory counters that provide information about hard page faults (pages that cause information to be swapped between memory and the hard disk) are

  • Memory—Pages/sec. The values of this counter should range from 5 to 20. Values consistently higher than 10 are indicative of potential performance problems whereas values consistently higher than 20 might cause noticeable and significant performance hits.

  • Memory—Page Faults/sec. This counter, together with the Memory—Cache Faults/sec and Memory—Transition Faults/sec counters, can provide valuable information about page faults that are not resolved from disk because the memory manager has kept those pages on the standby list (these are known as transition faults). Most systems today can handle a large number of page faults, but it is important to correlate these numbers with the Pages/sec counter to determine whether each application is configured with enough memory.

Figure 23.5 shows some of the various memory and process counters.


Figure 23.5. Memory-related counters in Windows Server 2003.

Improving Virtual Memory Usage

Calculating the correct amount of virtual memory is one of the more challenging aspects of planning a server’s memory requirements. While trying to anticipate growing usage demands, it is critical that the server has an adequate amount of virtual memory for all applications and the operating system.

Virtual memory refers to the amount of disk space that is used by Windows Server 2003 and applications as physical memory gets low or when applications need to swap data out of physical memory. Windows Server 2003 uses 1.5 times the amount of RAM as the default minimum paging file size, which is adequate for many systems. However, it is important to monitor memory counters to determine if this amount is truly sufficient for that particular server’s resource requirements. Another important consideration is the maximum size setting for the paging file. As a best practice, this setting should be at least 50 percent more than the minimum value to allow for paging file growth should the system require it. If the minimum and maximum settings are configured with the same value, there is a greater risk that the system could experience severe performance problems or even crash.
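
As a quick worked example of these guidelines, a server with 2GB of RAM would start with a minimum paging file of roughly 3,072MB (1.5 times the amount of RAM) and a maximum of at least 4,608MB (50 percent above the minimum). The exact figures for any particular server should still be validated against the memory counters described in this chapter.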

The most indicative sign of low virtual memory is the presence of warning events, such as Event 9582 logged by the Microsoft Exchange Information Store service, that can severely impact and degrade the Exchange server’s message-processing abilities. These warning events indicate that the largest free block of virtual memory has dropped below 32MB. If unnoticed or left unattended, these conditions might cause services to stop or the entire system to fail.

To get an accurate portrayal of how an Exchange server is using virtual memory, monitor the following counters (found under the MSExchangeIS object) in the Performance MMC tool:

  • VM Largest Block Size. This counter should consistently be above 32MB.

  • VM Total 16MB Free Blocks. This counter should remain greater than three 16-MB blocks.

  • VM Total Free Blocks. This value is specific to your messaging environment.

  • VM Total Large Free Block Bytes. This counter should stay above 50MB.

Other important counters to watch closely are as follows:

  • Memory—Available Bytes. This counter can be used to establish whether the system has adequate amounts of RAM. The recommended absolute minimum value is 4MB.

  • Paging File—% Usage. This counter validates the amount of the paging file used in a predetermined interval. High usage values might indicate that you need more physical memory or need to increase the size of the paging file.

Monitoring Processor Usage

Analyzing the processor usage can reveal invaluable information about system performance and provide reliable results that can be used for baselining purposes. There are two major processor counters that are used for capacity analysis of a Windows Server 2003 system.

  • % Privileged Time. Indicates the percentage of non-idle processor time spent in privileged mode. The recommended value is less than 55 percent.

  • % Processor Time. Specifies the use of each processor or the total processor utilization. If these values are consistently higher than 50%–60%, you should consider upgrading options or segmenting workloads.

Optimizing the Disk Subsystem Configuration

Many factors, such as the type of file system, the physical disk configuration, database size, and log file placement, need to be considered when you are trying to optimize the disk subsystem configuration. Many of these choices are specific to the configuration of the existing network environment.

Choosing the File System

Among the file systems supported by Windows Server 2003 (FAT and NTFS), it is recommended you use only NTFS on all servers, especially those in production environments. Simply put, NTFS provides the best security, scalability, and performance features. For instance, NTFS supports file and directory-level security, large file sizes (files of up to 16TB), large disk sizes (disk volumes of up to 16TB), fault tolerance, disk compression, error detection, and encryption.
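
If an existing server still has a FAT volume, it can be converted to NTFS in place with the convert utility built into Windows Server 2003; the drive letter below is only an example, and a verified backup beforehand is prudent because the conversion is one way (returning to FAT requires reformatting the volume):

  convert E: /FS:NTFS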

Choosing the Physical Disk Configuration

Windows Server 2003, like its predecessors, supports RAID (Redundant Array of Inexpensive Disks). The levels of RAID supported by the operating system are

  • RAID 0 (striping)

  • RAID 1 (mirroring)

  • RAID 5 (striping with parity)

Two Recommended Basic RAID Levels to Use

There are various levels of RAID, but for the context of enterprise servers there are two recommended basic levels to use: RAID 1 and RAID 5. Other forms of RAID, such as RAID 0+1 or 1+0, are also optimal solutions for enterprise servers, but these more advanced levels are supported only when using a hardware-based RAID controller. Therefore, only RAID 1 and RAID 5 are discussed in this chapter.

The deployment of the correct RAID level is of utmost importance because each RAID level has a direct effect on the performance of the server. From the viewpoint of pure performance, RAID level 0 by far gives the best performance. However, fault tolerance and the reliability of system access are other factors that contribute to overall performance. The skillful administrator is one who strikes a balance between performance and fault tolerance without sacrificing one for the other.

Disk Mirroring (RAID 1)

In this type of configuration, data is mirrored from one disk to the other participating disk in the mirror set. Data is written simultaneously to both disks, and read requests can be serviced by either disk, so read operations are significantly faster than on systems with no RAID configuration or on configurations with a greater degree of fault tolerance. Write performance, however, is slower because a single drive controller must write the same data to two or more disks.

Besides adequate performance, RAID 1 also provides a good degree of fault tolerance. If one drive fails, the RAID controller can automatically detect the failure and run solely on the remaining disk with minimal interruption.

The biggest drawback to RAID 1 is the amount of storage capacity that is lost. RAID 1 uses 50% of the total drive capacity for the two drives.

Well Suited

RAID 1 is particularly well suited for the boot drive as well as for volumes containing log files for application and database servers.

Disk Striping with Parity (RAID 5)

In a RAID 5 configuration, data and parity information are striped across all participating disks in the array. RAID 5 requires a minimum of three disks. Even if one of the drives fails within the array, the server can still remain operational.

After a drive fails, Windows Server 2003 continues to operate using the data contained on the other drives. The parity information describes the data that is missing because of the failure, and either Windows Server 2003 or the hardware RAID controller uses it to rebuild the missing data onto a spare or new drive.

RAID 5 is most commonly used for the data drive because it is a great compromise among performance, storage capacity, and redundancy. The overall space used to store the striped parity information is equal to the capacity of one drive. For example, a RAID 5 volume with three 200GB disks can store up to 400GB of data.
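
In general, the usable capacity of a RAID 5 array is (N - 1) multiplied by the capacity of a single drive, where N is the number of drives in the array, because the equivalent of one drive is consumed by the striped parity information.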

Hardware Versus Software RAID

Hardware RAID (configured at the disk controller level) is recommended over software RAID (configured from within Windows Server 2003) because of its faster performance, greater support of different RAID levels, and capability to recover more easily from hardware failures.

Monitoring the Disk Subsystem

Windows Server 2003 application servers typically rely heavily on the disk subsystem, and it is therefore a critical component to design and monitor properly. Although the disk object monitoring counters are enabled by default in Windows Server 2003, it is recommended that you disable these counters until you are ready to monitor them, because their resource requirements can influence overall system performance. The syntax to disable and re-enable these counters is as follows:

  • diskperf -n disables the counter

  • diskperf -y [\\computer_Name] re-enables the counter

Nevertheless, it is important to gather disk subsystem performance statistics over time.

The primary performance-related counters for the disk subsystem are located within the Physical and Logical Disk objects. Critical counters to monitor include, but are not limited to, the following:

  • Physical Disk—% Disk Time. Analyzes the percentage of elapsed time that the selected disk spends on servicing read or write requests. Ideally this value should remain below 50 percent.

  • Logical Disk—% Disk Time. Displays the percentage of elapsed time that the selected disk spends fulfilling read or write requests. It is recommended that this value be 60%–70% or lower.

  • Current Disk Queue Length (both Physical and Logical Disk objects). This counter has different performance indicators depending on the monitored disk drive. On disk drives storing application databases, this value should be lower than the number of spindled drives divided by two. On disk drives storing filesystem data, this value should be lower than one.

Monitoring the Network Subsystem

The network subsystem is by far one of the most difficult subsystems to monitor because of the many different variables. The number of protocols used in the network, the network interface cards (NICs), network-based applications, topologies, subnetting, and more, play vital roles in the network, but they also add to its complexity when you’re trying to determine bottlenecks. Each network environment has different variables; therefore, the counters that you’ll want to monitor will vary.

The information that you’ll want to gain from monitoring the network pertains to network activity and throughput. You can find this information with the Performance Console alone, but it will be difficult at best. Instead, it’s important to use other tools, such as the Network Monitor, in conjunction with Performance Console to get the best representation of network performance possible. You might also consider using third-party network analysis tools such as sniffers to ease monitoring and analysis efforts. Using these tools simultaneously can broaden the scope of monitoring and more accurately depict what is happening on the wire.

Because the TCP/IP suite is the underlying set of protocols for a Windows Server 2003 network subsystem, this discussion of capacity analysis focuses on those protocols. The TCP/IP counters are added when the protocol is installed (which it is by default).

There are several different network performance objects relating to the TCP/IP protocol, including ICMP, IP, Network Interface, NetBT, TCP, UDP, and more. Other counters such as FTP Server and WINS Server are added after these services are installed. Because entire books are dedicated to optimizing TCP/IP, this section focuses on a few important counters that you should monitor for capacity-analysis purposes.

First, examining error counters, such as Network Interface: Packets Received Errors or Packets Outbound Errors, is extremely useful in determining whether traffic is easily traversing the network. A greater number of errors indicates that packets must be retransmitted, causing more network traffic. If a high number of errors persists on the network, throughput will suffer. This might be caused by a bad NIC or unreliable links.

If network throughput appears to be slowing because of excessive traffic, you should keep a close watch on the traffic being generated from network-based services such as the ones described in Table 23.2.

Table 23.2. Network-based Service Counters to Monitor Network Traffic

Counter                              Description
NBT Connection: Bytes Total/sec      Monitors the network traffic generated by NBT connections
Redirector: Bytes Total/sec          Processes data bytes received for statistical calculations
Server: Bytes Total/sec              Monitors the network traffic generated by the Server service

Optimizing Performance by Server Roles

In addition to monitoring the common set of bottlenecks (memory, processor, disk subsystem, and network subsystem), the functional roles of the server influence what other counters you should monitor. The following sections outline some of the most common roles for Windows Server 2003 that also require the use of additional performance counters.

Terminal Services Server

Windows Server 2003 Terminal Services comes in two flavors: Remote administration mode and Application server mode. Remote administration mode is used to manage and service a server remotely and has minimal resource requirements, so this discussion focuses primarily on the Application server mode.

Terminal Services has its own performance object for the Performance Console called the Terminal Services Session object. It provides resource statistics such as errors, cache activity, network traffic from Terminal Services, and other session-specific activity. Many of these counters are similar to those found in the Process object. Some examples include % Privileged Time, % Processor Time, % User Time, Working Set, Working Set Peak, and so on.
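
To see exactly which counters the Terminal Services Session object exposes on a particular server, you can enumerate it with the typeperf utility; this is simply a quick way to list the available counters before deciding which ones to chart:

  typeperf -q "Terminal Services Session"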

For More Information on Terminal Services

You can find more information on Terminal Services in Chapter 20, “Leveraging Thin Client Terminal Services.”

Three important areas to always monitor for Terminal Services capacity analysis are the memory, processor, and application processes for each session. Application processes are by far the hardest to monitor and control because of the extreme variances in programmatic behavior. For example, all applications might be 32-bit, but some might not be certified to run on Windows Server 2003. You might also have in-house applications running on Terminal Services that might be poorly designed or too resource-intensive for the workloads they are performing.

Domain Controllers

A Windows Server 2003 domain controller (DC) houses the Active Directory (AD) and might have additional roles such as being responsible for one or more Flexible Single Master Operation (FSMO) roles (schema master, domain naming master, relative ID master, PDC Emulator, or infrastructure master) or a global catalog (GC) server. Also, depending on the size and design of the system, a DC might serve many other functional roles. In this section, AD, replication, and DNS monitoring will be explored.

Monitoring AD

Active Directory is the heart of Windows Server 2003 systems. It’s used for many different facets, including, but not limited to, authentication, authorization, encryption, and Group Policies. Because AD plays a central role in a Windows Server 2003 network environment, it must perform its responsibilities as efficiently as possible. Each facet by itself can be optimized, but this section focuses on the NTDS and Database objects.

The NTDS object provides various AD performance indicators and statistics that are useful for determining AD’s workload capacity. Many of these counters can be used to determine current workloads and how these workloads might affect other system resources. There are relatively few counters in this object, so it’s recommended that you monitor each one in addition to the common set of bottleneck objects. With this combination of counters, you can determine whether the system is overloaded.

Another performance object that you should use to monitor AD is the Database object. This object is not installed by default, so you must manually add it to be able to start gathering more information on AD.

To load the Database object, perform the following steps:

  1. Copy the performance DLL (esentprf.dll) located in %SystemRoot%\System32 to any directory (for example, c:\esent).

  2. Launch the Registry Editor (Regedt32.exe).

  3. Create the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ESENT.

  4. Create the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ESENT\Performance.

  5. Select the ESENT\Performance subkey.

  6. Create the value Open using data type REG_SZ and string equal to OpenPerformanceData.

  7. Create the value Collect using the data type REG_SZ and string equal to CollectPerformanceData.

  8. Create the value Close using the data type REG_SZ and string equal to ClosePerformanceData.

  9. Create the value Library using the data type REG_SZ and string equal to c:\esent\esentprf.dll.

  10. Exit the Registry Editor.

  11. Open a command prompt and change directory to %SystemRoot%\System32.

  12. Run Lodctr.exe Esentprf.ini at the command prompt.
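
On Windows Server 2003, the same Registry values can also be created from a command prompt with the built-in reg.exe utility, which may be easier to repeat across several domain controllers. This is only a sketch of the equivalent commands, assuming the DLL was copied to c:\esent as in step 1:

  reg add HKLM\SYSTEM\CurrentControlSet\Services\ESENT\Performance /v Open /t REG_SZ /d OpenPerformanceData
  reg add HKLM\SYSTEM\CurrentControlSet\Services\ESENT\Performance /v Collect /t REG_SZ /d CollectPerformanceData
  reg add HKLM\SYSTEM\CurrentControlSet\Services\ESENT\Performance /v Close /t REG_SZ /d ClosePerformanceData
  reg add HKLM\SYSTEM\CurrentControlSet\Services\ESENT\Performance /v Library /t REG_SZ /d c:\esent\esentprf.dll
  lodctr Esentprf.ini

Note that reg add creates the ESENT and Performance keys automatically if they do not already exist.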

After you complete the Database object installation, you can execute the Performance Console and use the Database object to monitor AD. Some of the relevant counters contained within the Database object to monitor AD are described in Table 23.3.

Table 23.3. AD Performance Counters

Database Counter              Description
Cache % Hit                   The percentage of page requests for the database file that were fulfilled by the database cache without causing a file operation. If this percentage is low (85% or lower), you might consider adding more memory.
Cache Page Fault Stalls/sec   The number of page faults per second that cannot be serviced because there are no pages available for allocation from the database cache. This number should be low if the system is configured with the proper amount of memory.
Cache Page Faults/sec         The number of page requests per second for the database file that require the database cache manager to allocate a new page from the database cache.
Cache Size                    The amount of system memory used by the database cache manager to hold commonly used information from the database to prevent file operations.
File Operations Pending       The number of reads and writes issued by the database cache manager to the database file or files that the operating system is currently processing. High numbers might indicate memory shortages or an insufficient disk subsystem.

Monitoring DNS

The domain name system (DNS) has been the primary name resolution mechanism in Windows 2000 and continues to be with Windows Server 2003. There are numerous counters available for monitoring various aspects of DNS in Windows Server 2003. The two most important categories in terms of capacity analysis are name resolution response times and workloads, as well as replication performance.

The counters listed in Table 23.4 are used to compute name query traffic and the workload that the DNS server is servicing. These counters should be monitored along with the common set of bottlenecks to determine the system’s health under various workload conditions. If users are noticing slower responses, you can compare the query workload usage growth with your performance information from memory, processor, disk subsystem, and network subsystem counters.

Table 23.4. Counters to Monitor DNS

Counter                          Description
Dynamic Update Received/sec      The average number of dynamic update requests received by the DNS server in each second
Recursive Queries/sec            The average number of recursive queries received by the DNS server in each second
Recursive Query Failure/sec      The average number of recursive query failures in each second
Secure Update Received/sec       The average number of secure update requests received by the DNS server in each second
TCP Query Received/sec           The average number of TCP queries received by the DNS server in each second
TCP Response Sent/sec            The average number of TCP responses sent by the DNS server in each second
Total Query Received/sec         The average number of queries received by the DNS server in each second
Total Response Sent/sec          The average number of responses sent by the DNS server in each second
UDP Query Received/sec           The average number of UDP queries received by the DNS server in each second
UDP Response Sent/sec            The average number of UDP responses sent by the DNS server in each second

Comparing results with other DNS servers in the environment can also help you to determine whether you should relinquish some of the name query responsibility to other DNS servers that are less busy.
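
A quick way to make that comparison is to sample the same query counter on each DNS server from a single workstation; the server names, interval, and sample count here are placeholders:

  typeperf "\DNS\Total Query Received/sec" -s DNS01 -si 30 -sc 10
  typeperf "\DNS\Total Query Received/sec" -s DNS02 -si 30 -sc 10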

Replication performance is another important aspect of DNS. Windows Server 2003 supports legacy DNS replication, also known as zone transfers, which populate information from the primary DNS to any secondary servers. There are two types of legacy DNS replication: incremental (propagating only changes to save bandwidth) and full (the entire zone file is replicated to secondary servers).

Full zone transfers (AXFR) occur on the initial transfers and then the incremental zone transfers (IXFR) are performed thereafter. The performance counters for both AXFR and IXFR (see Table 23.5) measure both requests and the successful transfers. It is important to note that if your network environment integrates DNS with non-Windows systems, it is recommended to have those systems support IXFR.

Table 23.5. DNS Zone Transfer Counters

Counter                   Description
AXFR Request Received     Total number of full zone transfer requests received by the DNS Server service when operating as a master server for a zone
AXFR Request Sent         Total number of full zone transfer requests sent by the DNS Server service when operating as a secondary server for a zone
AXFR Response Received    Total number of full zone transfer responses received by the DNS Server service when operating as a secondary server for a zone
AXFR Success Received     Total number of full zone transfers received by the DNS Server service when operating as a secondary server for a zone
AXFR Success Sent         Total number of full zone transfers successfully sent by the DNS Server service when operating as a master server for a zone
IXFR Request Received     Total number of incremental zone transfer requests received by the master DNS server
IXFR Request Sent         Total number of incremental zone transfer requests sent by the secondary DNS server
IXFR Response Received    Total number of incremental zone transfer responses received by the secondary DNS server
IXFR Success Received     Total number of successful incremental zone transfers received by the secondary DNS server
IXFR Success Sent         Total number of successful incremental zone transfers sent by the master DNS server

If your network environment is fully Active Directory–integrated, the counters listed in Table 23.5 will all be zero.

Monitoring AD Replication

Measuring AD replication performance is a complex process because of the many variables associated with replication. They include, but aren’t limited to, the following:

  • Intrasite versus intersite replication

  • The compression being used (if any)

  • Available bandwidth

  • Inbound versus outbound replication traffic

Fortunately, there are performance counters for every possible AD replication scenario. These counters are located within the NTDS object and are prefixed by the primary process that is responsible for AD replication—the Directory Replication Agent (DRA).
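
As a rough illustration, a few of the DRA counters can be sampled together to compare inbound and outbound replication volume on a domain controller; verify the exact counter names in the Performance Console first, because availability can vary by service pack:

  typeperf "\NTDS\DRA Inbound Bytes Total/sec" "\NTDS\DRA Outbound Bytes Total/sec" "\NTDS\DRA Pending Replication Synchronizations" -si 60 -sc 30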

Summary

Although most organizations pay little attention to performance tuning and optimization, performing basic steps to monitor a server and to track performance statistics helps organizations better understand the operation of their systems. By setting realistic business operation policies and monitoring system performance, administrators have a better idea of what normal operation looks like when a problem arises, and they can statistically analyze that information to isolate problems more easily.

Additionally, performance optimization helps an organization minimize bottlenecks or inefficiencies that can lead to reliability problems and possibly system failure. Performance management helps an organization improve operational effectiveness, and creates a network that can run more efficiently.
